ICNALE: The International Corpus Network of Asian Learners of English
A collection of controlled essays and speeches produced by learners of English in 10 countries and areas in Asia
Project Leader: Dr. Shin'ichiro Ishikawa, Kobe University, Japan





About Data

Last updated 2017/08/12











Data Collection Protocol (For the ICNALE Spoken Monologue)

In the ICNALE Spoken project, participants are given the tasks below:

    The ICNALE Spoken 
   1    Respond to the personal attribute questions on the data collection sheet 
   2   Respond to the learning history questions on the data collection sheet 
   3    Answer the vocabulary size test on the data collection sheet 
   4    Take a telephone interview and respond to the questions below:
  Q1: Student Number
  Q2: Name
  Q3: Country
  Q4: College
  Q5: Self introduction (60 sec speech)
  Q6: Topic 1, Trial 1 (60 sec speech after 20 sec preparation)
  Q7: Topic 1, Trial 2 (60 sec speech after 10 sec preparation)
  Q8: Topic 2, Trial 1 (60 sec speech after 20 sec preparation)
  Q9: Topic 2, Trial 2 (60 sec speech after 10 sec preparation)
  Q10: Self-evaluation (0 to 5)


Data Collection Sheet for the ICNALE Spoken Monologue

Instruction for Participants of the ICNALE Spoken Monologue

To collect speech data under identical conditions, we use the ICNALE ASCS (Automatic Speech Collection System), which gives the same prompts with the same timing to all participants. Speech data is automatically stored on an online server, to which project members have easy access.



Fig. The ICNALE ASCS (Speech data collection and management system)







Data Collection Protocol (For the ICNALE Written Essays)

In the ICNALE Written project, participants are given the tasks below:


    The ICNALE Written 
   1     Respond to the personal attribute questions on the data collection sheet 
   2   Respond to the learning history questions on the data collection sheet 
   3    Answer the vocabulary size test on the data collection sheet 
   4    Write essays about two topics 


Data Collection Sheet for the ICNALE Written Essays

Instruction for Participants of the ICNALE Written Essays







Survey of Participants' L2 Proficiency


Level Indices

Participants' L2 proficiency levels are classified into four bands based on the CEFR.

Original Level Indices in CEFR

A (Basic Users)
  A1 (Breakthrough)
  A2 (Waystage)

B (Independent Users)
  B1 (Threshold)
  B2 (Vantage)

C (Advanced Users)
  C1 (Effective Operational Proficiency)
  C2 (Mastery)

Four Level Indices Adopted in the ICNALE

A2 (Waystage)
B1_1 (Threshold, Lower)
B1_2 (Threshold, Upper)
B2+ (Vantage or Higher)




Level Estimation

All participants took an L2 vocabulary size test (VST) covering the top 5K word levels (Nation & Beglar, 2007) and were also asked to report their scores or grades in any L2 proficiency test that they had taken before.

When participants reported scores in a proficiency test, the scores were converted directly to CEFR levels using the official score conversion tables offered by the individual testing institutes. In this case, their VST scores were not used for level estimation.

When participants did not report scores in proficiency tests, their VST scores were converted to expected TOEIC scores and then to CEFR levels. The VST/TOEIC conversion formula was obtained from a linear regression model of 268 participants who had taken both the TOEIC test and the VST.




Fig. Conversion of proficiency or vocabulary test scores to CEFR-levels





Proficiency Test Scores and the CEFR Levels

The ratio of learners who have taken established L2 proficiency tests is almost 100% in Japan and Korea, 30 to 50% in other EFL countries, and less than 10% in ESL countries. These tests include:


TOEIC
Test of English for International Communication, which is administered by ETS, is quite popular in countries such as Japan and Korea. It comprises a reading section and a listening section, and the full mark is 990. The TOEIC/CEFR conversion table is offered by ETS.

TOEFL
Test of English as a Foreign Language, which is also administered by ETS, is popular in many EFL countries in Asia. The PBT (Paper-Based Test) version comprises listening, structure, and reading sections and the full mark is 677. The iBT (Internet-Based Test) version comprises reading, listening, speaking, and writing sections, and the full mark is 120. The PBT/ iBT conversion table and the TOEFL/ CEFR conversion table are offered by ETS.

IELTS
International English Language Testing System, which is co-administered by Cambridge ESOL, British Council, and IDP Education, is used widely in the world. It comprises reading, listening, speaking, and writing sections, and the full mark is 9.0. The IELTS/CEFR conversion table is offered by Cambridge ESOL.

TEPS
Test of English Proficiency, which is developed by the Language Education Institute at Seoul National University, is administered in Korea. As direct conversion between TEPS and CEFR is not available, TEPS scores are converted to TOEIC scores based on the TEPS/TOEIC conversion table, and then to CEFR levels.

CET/ TEM
The College English Test (CET) and the Test for English Majors (TEM) are national English tests administered in China. College students, excluding those majoring in English, are required to pass the CET4 within four years. Students who proceed to graduate school are required or recommended to pass the CET6. English major students are required to pass the TEM4 and encouraged to pass the TEM6. CET6 and TEM4 are generally said to be of the same difficulty level (Hui, 2010). As direct conversion between CET/TEM and CEFR is not available, CET/TEM scores are converted to STEP grades (Inoue, 2002; Miyauchi, 2005) and then to CEFR levels based on the STEP/CEFR conversion table.




VST Scores and the CEFR Levels


Vocabulary Size Test (Nation & Beglar, 2007)
Nation's VST has been widely used in EFL education, and it has been suggested that L2 vocabulary knowledge measured by the VST is robustly correlated with general L2 proficiency. As Meara & Milton (2003) and Milton (2010) state that it is appropriate to measure the vocabulary size of non-native speakers with a ceiling of 5,000 words, we used the fifty test items in the 1000 to 5000 word levels of the VST Monolingual Version (14,000 words). Although the original VST is a pencil-and-paper test, we prepared an Excel version, which is integrated into the Data Collection Sheets. This makes it possible for all participants to take the test before they do the writing or speaking tasks.


There seem to be no reliable conversion guidelines between vocabulary size and CEFR levels. For instance, based on the analysis of Greek and Hungarian EFL learners, Meara & Milton (2003) relate the size of 2500+ words to B1, that of 3250+ words to B2, that of 3750+ words to C1, and that of 4500+ words to C2. However, this conversion proved to greatly overestimate the proficiency of Asian learners.

Therefore, we fitted a linear regression model to the data of 268 Asian participants who had taken both the TOEIC test and the VST to obtain a conversion formula.


TOEIC = 10.495 * VST + 289 (R^2 = .21)
 


Fig. Scatter plot of VST and TOEIC scores of 268 participants

Thus, participants' VST scores were converted to the (estimated) TOEIC scores, and finally to the CEFR levels.
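
As a worked illustration of the regression above, the following minimal Python sketch applies the reported coefficients to a VST score. The function name estimate_toeic_from_vst and the 0 to 50 range check (based on the fifty test items mentioned earlier) are assumptions made for illustration and are not part of the ICNALE tools.

```python
def estimate_toeic_from_vst(vst_score: int) -> float:
    """Estimate a TOEIC score from a VST score with the reported regression."""
    # Assumption: the VST used here has fifty items, so valid scores are 0-50.
    if not 0 <= vst_score <= 50:
        raise ValueError("VST score must be between 0 and 50")
    # Coefficients as reported in the text: TOEIC = 10.495 * VST + 289
    return 10.495 * vst_score + 289


if __name__ == "__main__":
    # Print the estimated TOEIC score for a participant with a VST score of 25.
    print(round(estimate_toeic_from_vst(25), 1))
```

Because the reported fit is R^2 = .21, an estimate of this kind is best read as a rough indication rather than a precise score.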



Mutual Conversion Table Adopted in the Project

Levels                    TOEFL PBT  TOEFL iBT  TOEIC  IELTS  STEP   TEPS  CET/TEM     VST
A2 (Waystage)             -486       -56        -545   3+     3+     ---   ---         -24
B1_1 (Threshold, Lower)   487+       57+        550+   4+     2+     417+  CET4        25+
B1_2 (Threshold, Upper)   527+       72+        670+   4+     2+     513+  CET4        36+
B2+ (Vantage or Higher)   567+       87+        785+   5+     Pre1+  608+  CET6/TEM4   47+

As all the participants are college students, we set the minimum proficiency level at A2, not A1.
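
The level estimation procedure and the conversion table can be sketched in Python as below. This is an illustrative sketch, not the project's own implementation: the names THRESHOLDS, band, and estimate_level are hypothetical, and only the TOEFL PBT, TOEFL iBT, TOEIC, and VST columns are encoded, since the other columns contain blank cells or values shared by two bands. The VST bounds are copied from the table's VST column rather than re-derived from the regression formula.

```python
from typing import Optional, Tuple

# Lower bounds copied from the mutual conversion table. Any score below the
# B1_1 bound is treated as A2, the minimum level used in the project.
THRESHOLDS = {
    "TOEFL_PBT": ((567, "B2+"), (527, "B1_2"), (487, "B1_1")),
    "TOEFL_IBT": ((87, "B2+"), (72, "B1_2"), (57, "B1_1")),
    "TOEIC": ((785, "B2+"), (670, "B1_2"), (550, "B1_1")),
    "VST": ((47, "B2+"), (36, "B1_2"), (25, "B1_1")),
}


def band(test: str, score: float) -> str:
    """Return the ICNALE level band for a score on one of the encoded tests."""
    for lower_bound, level in THRESHOLDS[test]:
        if score >= lower_bound:
            return level
    return "A2"


def estimate_level(proficiency: Optional[Tuple[str, float]] = None,
                   vst: Optional[int] = None) -> str:
    """Prefer a reported proficiency test score; fall back to the VST score."""
    if proficiency is not None:
        test, score = proficiency
        return band(test, score)  # the VST score is ignored in this case
    if vst is None:
        raise ValueError("either a proficiency score or a VST score is needed")
    return band("VST", vst)


if __name__ == "__main__":
    print(estimate_level(proficiency=("TOEIC", 700), vst=10))
    print(estimate_level(vst=47))
```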



CEFR Levels of the NNS Participants in the ICNALE Spoken Monologue

                                                                  CEFR-based L2 Proficiency Levels
Code   Country/ Area            # of Participants  # of Speeches  A2         B1_1       B1_2       B2+
TOEIC                                                             -545       550+       670+       785+
TOEFL                                                             -56 (486)  57 (487)+  72 (527)+  87 (567)+
VST                                                               -24        25+        36+        47+
Inner Circle
ENS*   USA, UK, CAN, AUS, NZ    150                600            N/A        N/A        N/A        N/A
Outer Circle
HKG    Hong Kong                50                 200            0.0%       2.0%       46.0%      52.0%
PAK    Pakistan                 100                400            5.0%       6.0%       88.0%      1.0%
PHL    Philippines              100                400            0.0%       7.0%       81.0%      12.0%
SIN    Singapore                50                 200            0.0%       0.0%       58.0%      42.0%
Expanding Circle
CHN    China                    150                600            9.3%       32.0%      52.0%      6.7%
IDN    Indonesia                100                400            26.0%      37.0%      34.0%      3.0%
JPN    Japan                    150                600            20.0%      31.3%      28.7%      20.0%
KOR    Korea                    100                400            6.0%       15.0%      43.0%      36.0%
THA    Thailand                 50                 200            4.0%       38.0%      50.0%      8.0%
TWN    Taiwan                   100                400            17.0%      41.0%      25.0%      17.0%
Total  ---                      1,100              4,400          --         --         --         --




CEFR Levels of the NNS Participants in the ICNALE Written Essays

                                                                  CEFR-based L2 Proficiency Levels
Code   Country/ Area            # of Participants  # of Essays    A2         B1_1       B1_2       B2+
TOEIC                                                             -545       550+       670+       785+
TOEFL                                                             -56 (486)  57 (487)+  72 (527)+  87 (567)+
VST                                                               -24        25+        36+        47+
Inner Circle
ENS*   USA, UK, CAN, AUS, NZ    200                400            N/A        N/A        N/A        N/A
Outer Circle
HKG    Hong Kong                100                200            1.0%       30.0%      52.0%      17.0%
PAK    Pakistan                 200                400            9.0%       45.5%      44.0%      1.5%
PHL    Philippines              200                400            1.0%       5.5%       88.0%      5.5%
SIN    Singapore                200                400            0.0%       0.0%       67.0%      33.0%
Expanding Circle
CHN    China                    400                800            12.5%      58.0%      26.3%      3.3%
IDN    Indonesia                200                400            16.0%      41.0%      41.5%      1.5%
JPN    Japan                    400                800            38.5%      44.8%      12.3%      4.5%
KOR    Korea                    300                600            25.0%      20.3%      29.3%      25.3%
THA    Thailand                 400                800            29.8%      44.8%      25.0%      0.5%
TWN    Taiwan                   200                400            14.5%      43.5%      30.5%      11.5%
Total  ---                      2,800              5,600          --         --         --         --









Survey of Participants' Attribute Information

Using a questionnaire, we collected a range of background information on participants, covering (A) Basic Attributes and (B) Motivation in L2 Learning. These data are available only in the download version.



(A) Basic Attributes


Sex

Age
Grade
Years of Study
Major (Occupation)
Academic Genres: Humanities, Social Sciences, Science and Technology, Life Science
 



(B) Motivation in L2 Learning

Participants were required to answer the twelve question items below on a scale from 1 (Strongly disagree) to 6 (Strongly agree). These items are intended to survey the strength of two kinds of motivation: integrative and instrumental.


Fig. Two kinds of L2 learning motivations


The former concerns interest in the L2 itself or its culture and the desire to interact with or be integrated into its speech community, while the latter concerns the desire to achieve some practical goal (passing an exam, getting a better grade, getting a better job, gaining a job skill, getting a promotion, etc.) by using the L2 as an instrument. Integrative and instrumental motivations roughly correspond to intrinsic and extrinsic motivations. Some studies claim that integrative (intrinsic) motivation is often more effective than its counterpart.



I study English because ...

Q01  I find pleasure when I understand the content sufficiently.
Q02  I want to get a better job in future.
Q03  Learning content is more important than being awarded high grades.
Q04  I want to be socially acknowledged.
Q05  Being awarded high grades is important for me.
Q06  Learning English is what we have to do anyway.
Q07  I want to achieve a good mark in the tests.
Q08  I am interested in the content, even if it is difficult.
Q09  Learning something new is fun, even if it is difficult.
Q10  I find pleasure in discovering something new.
Q11  I want to get a better grade than others.
Q12  Increasing English knowledge is fun.

 



In order to grasp participants' overall motivational tendency, we calculated four kinds of motivation scores based on the points of the questions above.


INTM (Integrative Motivation Score) is the average of the questions below:
  Q01  I find pleasure when I understand the content sufficiently.
  Q03  Learning content is more important than being awarded high grades.
  Q08  I am interested in the content, even if it is difficult.
  Q09  Learning something new is fun, even if it is difficult.
  Q10  I find pleasure in discovering something new.
  Q12  Increasing English knowledge is fun.

INSM (Instrumental Motivation Score) is the average of the questions below:
  Q02  I want to get a better job in future.
  Q04  I want to be socially acknowledged.
  Q05  Being awarded high grades is important for me.
  Q06  Learning English is what we have to do anyway.
  Q07  I want to achieve a good mark in the tests.
  Q11  I want to get a better grade than others.


INTM+INSM (Motivation Strength Score) is the sum of INTM and INSM

INTM-INSM (Integrative Motivation Orientation Score) is the difference between the two scores (INTM minus INSM)
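
The four motivation scores can be computed along the lines of the following Python sketch. The function name motivation_scores and the representation of answers as a dictionary from question number to a 1-6 point are assumptions made for illustration; the item groupings are those listed above.

```python
from typing import Dict

# Question numbers contributing to each score, as listed above.
INTM_ITEMS = (1, 3, 8, 9, 10, 12)
INSM_ITEMS = (2, 4, 5, 6, 7, 11)


def motivation_scores(answers: Dict[int, int]) -> Dict[str, float]:
    """Compute INTM, INSM, their sum, and their difference.

    `answers` maps a question number (1-12) to a point from 1 to 6.
    """
    intm = sum(answers[q] for q in INTM_ITEMS) / len(INTM_ITEMS)
    insm = sum(answers[q] for q in INSM_ITEMS) / len(INSM_ITEMS)
    return {
        "INTM": intm,
        "INSM": insm,
        "INTM+INSM": intm + insm,  # Motivation Strength Score
        "INTM-INSM": intm - insm,  # Integrative Motivation Orientation Score
    }


if __name__ == "__main__":
    # Hypothetical participant: 6 on every integrative item, 3 on the rest.
    sample = {q: (6 if q in INTM_ITEMS else 3) for q in range(1, 13)}
    print(motivation_scores(sample))
```

Under this definition, a positive INTM-INSM value indicates a more integrative orientation and a negative value a more instrumental one.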










Survey of Participants' L2 Learning History

Using a different questionnaire, we also investigated participants' L2 learning history. This data is available only in the download version.


(C) L2 Learning Experiences

Participants were required to answer the twenty-two question items below on a scale from 1 (Strongly disagree) to 6 (Strongly agree). The questions are intended to survey three basic elements in L2 learning: When (Period), Where (In or out of class), and What (Skill Type).





In my primary school days...

Q13  I often used English in class.
Q14  I often used English outside class.


In my secondary school days...

Q15  I listened to English a lot in class.
Q16  I read English a lot in class.
Q17  I spoke English a lot in class.
Q18  I wrote English a lot in class.
Q19  I listened to English a lot outside class.
Q20  I read English a lot outside class.
Q21  I spoke English a lot outside class.
Q22  I wrote English a lot outside class.


In my college days...

Q23  I listen to English a lot in class.
Q24  I read English a lot in class.
Q25  I speak English a lot in class.
Q26  I write English a lot in class.
Q27  I listen to English a lot outside class.
Q28  I read English a lot outside class.
Q29  I speak English a lot outside class.
Q30  I write English a lot outside class.


So far in my life,...

Q31  I have been taught by English native speakers.
Q32  I have been taught English pronunciation.
Q33  I have been taught speaking or presentation.
Q34  I have been taught essay writing.
 


In order to grasp participants' overall tendency in L2 learning experiences, we calculated thirteen kinds of learning experience scores based on the points of the questions above.



PRIMARY (Amount of English learning in primary school period) is the average of the questions below:
  Q13  In my primary school days, I often used English in class.
  Q14  In my primary school days, I often used English outside class.


SECONDARY (Amount of English learning in secondary school period) is the average of the questions below:
  Q15 In my secondary school days, I listened to English a lot in class.
  Q16 In my secondary school days, I read English a lot in class.
  Q17 In my secondary school days, I spoke English a lot in class.
  Q18 In my secondary school days, I wrote English a lot in class.
  Q19 In my secondary school days, I listened to English a lot outside class.
  Q20 In my secondary school days, I read English a lot outside class.
  Q21 In my secondary school days, I spoke English a lot outside class.
  Q22 In my secondary school days, I wrote English a lot outside class.


COLLEGE (Amount of English learning in college period) is the average of the questions below:
  Q23  In my college days, I listen to English a lot in class.
  Q24  In my college days, I read English a lot in class.
  Q25  In my college days, I speak English a lot in class.
  Q26  In my college days, I write English a lot in class.
  Q27  In my college days, I listen to English a lot outside class.
  Q28  In my college days, I read English a lot outside class.
  Q29  In my college days, I speak English a lot outside class.
  Q30  In my college days, I write English a lot outside class.

INSCHOOL (Amount of English learning in class) is the average of the questions below:
  Q13  In my primary school days, I often used English in class.
  Q15  In my secondary school days, I listened to English a lot in class.
  Q16  In my secondary school days, I read English a lot in class.
  Q17  In my secondary school days, I spoke English a lot in class.
  Q18  In my secondary school days, I wrote English a lot in class.
  Q23  In my college days, I listen to English a lot in class.
  Q24  In my college days, I read English a lot in class.
  Q25  In my college days, I speak English a lot in class.
  Q26  In my college days, I write English a lot in class.

OUTSCHOOL (Amount of English learning outside class) is the average of the questions below:
  Q14  In my primary school days, I often used English outside class.
  Q19  In my secondary school days, I listened to English a lot outside class.
  Q20  In my secondary school days, I read English a lot outside class.
  Q21  In my secondary school days, I spoke English a lot outside class.
  Q22  In my secondary school days, I wrote English a lot outside class.
  Q27  In my college days, I listen to English a lot outside class.
  Q28  In my college days, I read English a lot outside class.
  Q29  In my college days, I speak English a lot outside class.
  Q30  In my college days, I write English a lot outside class.


LISTENING (Amount of studying L2 listening) is the average of the questions below:
  Q15  In my secondary school days, I listened to English a lot in class.
  Q19  In my secondary school days, I listened to English a lot outside class.
  Q23  In my college days, I listen to English a lot in class.
  Q27  In my college days, I listen to English a lot outside class.


READING (Amount of studying L2 reading) is the average of the questions below:
  Q16  In my secondary school days, I read English a lot in class.
  Q20  In my secondary school days, I read English a lot outside class.
  Q24  In my college days, I read English a lot in class.
  Q28  In my college days, I read English a lot outside class.


SPEAKING (Amount of studying L2 speaking) is the average of the questions below:
  Q17  In my secondary school days, I spoke English a lot in class.
  Q21  In my secondary school days, I spoke English a lot outside class.
  Q25  In my college days, I speak English a lot in class.
  Q29  In my college days, I speak English a lot outside class.


WRITING (Amount of studying L2 writing) is the average of the questions below:
  Q18 In my secondary school days, I wrote English a lot in class.
  Q22 In my secondary school days, I wrote English a lot outside class.
  Q26  In my college days, I write English a lot in class.
  Q30  In my college days, I write English a lot outside class.


NS (Experience of being taught by ENS) is the score of the question below:
  Q31  So far in my life, I have been taught by English native speakers.


PRONUNCIATION (Experience of being taught pronunciation) is the score of the question below:
  Q32  So far in my life, I have been taught English pronunciation.

PRESENTATION (Experience of being taught presentation) is the score of the question below:
  Q33  So far in my life, I have been taught speaking or presentation.

ESSAY WRITING (Experience of being taught essay writing) is the score of the question below:
  Q34  So far in my life, I have been taught essay writing.
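
The thirteen learning experience scores defined above can be derived from the raw answers as in the following Python sketch. The names EXPERIENCE_ITEMS and learning_experience_scores, and the dictionary format of the answers, are hypothetical; the question groupings follow the lists above.

```python
from typing import Dict

# Question numbers contributing to each learning experience score.
EXPERIENCE_ITEMS = {
    "PRIMARY": [13, 14],
    "SECONDARY": [15, 16, 17, 18, 19, 20, 21, 22],
    "COLLEGE": [23, 24, 25, 26, 27, 28, 29, 30],
    "INSCHOOL": [13, 15, 16, 17, 18, 23, 24, 25, 26],
    "OUTSCHOOL": [14, 19, 20, 21, 22, 27, 28, 29, 30],
    "LISTENING": [15, 19, 23, 27],
    "READING": [16, 20, 24, 28],
    "SPEAKING": [17, 21, 25, 29],
    "WRITING": [18, 22, 26, 30],
    "NS": [31],
    "PRONUNCIATION": [32],
    "PRESENTATION": [33],
    "ESSAY WRITING": [34],
}


def learning_experience_scores(answers: Dict[int, int]) -> Dict[str, float]:
    """Average the 1-6 points of the questions belonging to each score.

    `answers` maps a question number (13-34) to a point from 1 to 6.
    Single-question scores are simply that question's point.
    """
    return {
        name: sum(answers[q] for q in items) / len(items)
        for name, items in EXPERIENCE_ITEMS.items()
    }


if __name__ == "__main__":
    # Hypothetical participant: 6 on in-class items, 2 on everything else.
    in_class = set(EXPERIENCE_ITEMS["INSCHOOL"])
    sample = {q: (6 if q in in_class else 2) for q in range(13, 35)}
    for name, value in learning_experience_scores(sample).items():
        print(name, value)
```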

  









Text Processing

In order to avoid errors in text processing, we

  (1) changed 2-byte characters to 1-byte characters (e.g., 2-byte quotation marks)
  (2) replaced some non-letter characters with substitutes
  (3) deleted ^n (new lines), ^t (tabs), and so on
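
For readers preparing comparable data, the three steps above could be approximated with the Python sketch below. It is not the procedure actually used in the project. It assumes that Unicode NFKC normalization is an acceptable stand-in for the 2-byte to 1-byte conversion (it also rewrites other compatibility characters, such as ligatures), that the SUBSTITUTES table is a hypothetical example of step (2), and that new lines and tabs are replaced by a single space rather than removed outright so that adjacent words do not merge.

```python
import unicodedata

# Hypothetical substitution table for step (2): curly quotes to plain quotes.
SUBSTITUTES = {
    "\u2018": "'",
    "\u2019": "'",
    "\u201c": '"',
    "\u201d": '"',
}


def clean_text(text: str) -> str:
    """Normalize a learner text roughly following steps (1) to (3)."""
    # (1) Map full-width (2-byte) forms to their half-width equivalents.
    text = unicodedata.normalize("NFKC", text)
    # (2) Replace selected non-letter characters with plain substitutes.
    text = text.translate(str.maketrans(SUBSTITUTES))
    # (3) Drop new lines and tabs by collapsing all whitespace runs
    #     into single spaces (an assumption, see the note above).
    return " ".join(text.split())


if __name__ == "__main__":
    raw = "\uff21\uff22\uff23\t\u201cok\u201d\nend"
    print(clean_text(raw))
```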









Sound Processing (For the ICNALE Spoken Monologue)

In order to ensure the anonymity and privacy of participants, we have morphed the original sound data by changing its pitch and formants. We have also released the ICNALE ASMS (Automatic Speech Morphing System) so that more researchers can compile varied learner speech corpora. You can now download and test it yourself.


Fig. The ICNALE ASMS