ICNALE: The International Corpus Network of Asian Learners of English
A collection of controlled essays and speeches produced by learners of English in 10 countries and areas in Asia
Project Leader: Dr. Shin'ichiro Ishikawa, Kobe University, Japan





The ICNALE for Download

Last updated 2017/6/1


Recent updates
The Data Developed by the ICNALE Team

2017.6.1
The ICNALE Edited Essays 0.3 (formerly called the ICNALE-Proofread) has been released. It now includes 440 learners' essays and the same number of edited essays. Rating information is also included.

2017.4.13
The ICNALE-Proofread 0.2 has been released. It comprises 300 essays written by EFL learners and the versions of those essays edited by professional proofreaders, a unique dataset for error analysis.

2015.12.03
The ICNALE-SW 1.1 has been released. It comprises the ICNALE-Spoken V1.2 and the ICNALE-Written V2.1.

2015.10.17
The ICNALE-SW 1.0 has been released. It comprises the ICNALE-Spoken V1.1 and the ICNALE-Written V2.1.

2015.05.31
The ICNALE-Spoken 1.0 has been released.

2014.12.31
The ICNALE-Spoken Baby 1.3 has been released.
Data from Indonesian learners were newly added. The new Baby includes transcripts and audio files of 2,900 speeches by 650 learners in Asia and 75 English native speakers.

2014.11.30
The ICNALE ASMS (Automatic Speech Morphing System) Version 1.0 has been released.
This standalone software changes the pitch and formants of collected speech data, which helps protect participants' anonymity.


The Data Developed by the 3rd Party

2016.06.18
Dr./Assoc. Prof. Ryo NAGATA of Konan University designed a new phrase structure annotation system for parsing learner English and used it to tag part of the ICNALE-Written. You can now download the ICNALE-PSA (phrase structure annotation), Sample Data, which includes 134 parsed texts (33,913 tokens).

What is included in the ICNALE for Download?



Registered users can download the whole corpus and analyze it with concordancers such as AntConc and WordSmith Tools or with their own analysis programs.

Released by the ICNALE Development Team

The ICNALE-SW_1.1 Texts [17MB]
  ---- ICNALE_SW_1.1_Merged Texts (Plain/ Tagged)
  ---- ICNALE_SW_1.1_Unmerged Texts (Plain)
  ---- ICNALE_SW_V1.1_Infosheet
  ---- ICNALE_SW_V1.1_Release Note

The ICNALE_SW_1.1_Sounds [Caution! Approx. 1GB]
  ---- mp3 sound files
__________________________________________

Released by the 3rd Party

The ICNALE-PSA (phrase structure annotated edition), Sample Data [280KB] (Compiled by Dr./Assoc. Prof. Ryo NAGATA of Konan University)







How to Obtain the Data

Registered users can obtain the whole corpus data and freely use it for academic purposes. If you plan to use the data for commercial purposes, please contact the project team in advance.


Three Steps

1. Firstly, download the data you need.

The ICNALE-SW 1.1 Texts (December 2015) [17MB]

The ICNALE-SW 1.1 Sounds (December 2015) [Caution! 1GB]

The ICNALE Edited Essays 0.3 (June 2017) [5.0MB]

The ICNALE Automatic Speech Morphing System (November 2014) [15MB]

The ICNALE-PSA, Sample Data (June 2016) (Developed by Dr./Assoc. Prof. Ryo NAGATA) [290KB]


2. Then, register using the ICNALE User Registration Form to obtain the passwords for unzipping the files.
You do NOT need to register separately for each dataset.
If you cannot reach the registration page, please send your name, your institution, and your position (e.g., Prof./Grad Student/Undergrad/Independent researcher) directly to the project leader.


3. You will receive a password within a few days. If you do not receive a reply, please contact the project team.
If you use Mac OS, you may need software such as StuffIt or ZipEZ to unzip/uncompress the downloaded files. Or ask a friend who uses Windows to unzip them for you :-)
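If you have Python installed, the standard library can also unzip the archives on any platform. This is a minimal sketch, assuming the archives use traditional ZIP (ZipCrypto) password protection; Python's `zipfile` module cannot decrypt AES-encrypted archives and raises an error for them, in which case use one of the tools above instead. The function name, archive name, and password shown are hypothetical placeholders.

```python
import zipfile
from pathlib import Path


def extract_protected_zip(zip_path, dest_dir, password):
    """Extract all members of a password-protected ZIP archive into dest_dir.

    Assumes traditional ZIP (ZipCrypto) encryption: the standard-library
    zipfile module cannot decrypt AES-encrypted archives and raises an
    error for them. Returns the list of member names in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # pwd must be bytes; it is simply ignored for unencrypted members
        zf.extractall(path=dest, pwd=password.encode("utf-8"))
        return zf.namelist()


# Example (hypothetical archive name and password; use your own):
# extract_protected_zip("ICNALE_SW_1.1_Texts.zip", "ICNALE", "password-you-received")
```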
 


Text Encoding

All texts are encoded in UTF-8 with a byte order mark (BOM). When using a concordancer, you may need to set the character encoding before conducting an analysis.
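If you analyze the files with your own programs, the BOM needs the same care. In Python, for example, the built-in `utf-8-sig` codec removes a leading BOM when reading and otherwise behaves like plain UTF-8. The helper name below is hypothetical; it is a minimal sketch, not part of the ICNALE distribution.

```python
from pathlib import Path


def read_icnale_text(path):
    """Read a UTF-8 ICNALE text file and drop the leading BOM, if any.

    The "utf-8-sig" codec strips a byte order mark at the start of the
    file; without one it behaves exactly like plain "utf-8".
    """
    return Path(path).read_text(encoding="utf-8-sig")
```

Reading the same file with `encoding="utf-8"` would instead leave a stray `"\ufeff"` character at the start of the text, which can end up glued to the first word of the first essay.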



AntConc: Global Settings > Character Settings > Edit


When using WordSmith with the default settings, you will be asked whether to convert each file to Unicode. Please choose No.
Alternatively, you can uncheck the "Convert from UTF8" option beforehand.



Codes for Individual Files 


Ex)  S_CHN_PTJ1_001_A2_0.txt
Fields: S/W  Country  Topic/Trial  Serial  CEFR

S/W
  S: Speech
  W: Writing
Country
  CHN, ENS, HKG, IDN, JPN, KOR, PAK, PHL, SIN, THA, TWN
Topic/Trial (topic code followed by trial digit, e.g. PTJ1)
  PTJ: part-time job
  SMK: non-smoking
  0: Essay
  1: Speech (Trial 1)
  2: Speech (Trial 2)
Serial
  001-999
CEFR
  For NNS
    A2_0: A2
    B1_1: B1 Lower
    B1_2: B1 Upper
    B2_0: B2+
  For NS
    XX_1: Students
    XX_2: Teachers
    XX_3: Others
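Because every field in the code list above has a fixed set of values, individual file names can be decoded mechanically, for example to select files by country or level. A minimal Python sketch, assuming every individual file name follows the pattern exactly and that NS files carry an XX_1 to XX_3 code in the CEFR position; the regular expression, dictionaries, and function name are hypothetical helpers.

```python
import re

# Hypothetical pattern for individual file names such as S_CHN_PTJ1_001_A2_0.txt
INDIVIDUAL_FILE = re.compile(
    r"^(?P<mode>[SW])_"
    r"(?P<country>[A-Z]{3})_"
    r"(?P<topic>PTJ|SMK)(?P<trial>[012])_"
    r"(?P<serial>\d{3})_"
    r"(?P<level>A2_0|B1_1|B1_2|B2_0|XX_[123])\.txt$"
)

MODES = {"S": "Speech", "W": "Writing"}
TRIALS = {"0": "Essay", "1": "Speech (Trial 1)", "2": "Speech (Trial 2)"}
LEVELS = {
    "A2_0": "A2", "B1_1": "B1 Lower", "B1_2": "B1 Upper", "B2_0": "B2+",
    "XX_1": "NS: Students", "XX_2": "NS: Teachers", "XX_3": "NS: Others",
}


def parse_individual_name(filename):
    """Split an individual ICNALE file name into its coded fields."""
    m = INDIVIDUAL_FILE.match(filename)
    if m is None:
        raise ValueError(f"not an individual ICNALE file name: {filename!r}")
    fields = m.groupdict()
    # Add readable labels next to the raw codes
    fields["mode_label"] = MODES[fields["mode"]]
    fields["trial_label"] = TRIALS[fields["trial"]]
    fields["level_label"] = LEVELS[fields["level"]]
    return fields
```

For the example name `S_CHN_PTJ1_001_A2_0.txt`, this yields mode `S`, country `CHN`, topic `PTJ`, trial `1`, serial `001`, and level `A2_0`.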

 


Codes for Merged Files 

Ex)  S_CHN_PTJ_A2_0.txt
Fields: S/W  Country  Topic/Trial  CEFR  + Extension

S/W
  S: Speech
  W: Writing
Country
  CHN, ENS, HKG, IDN, JPN, KOR, PAK, PHL, SIN, THA, TWN
Topic/Trial
  PTJ: part-time job
  SMK: non-smoking
CEFR
  For NNS
    A2_0: A2
    B1_1: B1 Lower
    B1_2: B1 Upper
    B2_0: B2+
  For NS
    XX_1: Students
    XX_2: Teachers
    XX_3: Others
Extension
  .txt: text
  .vert: vertical tagged text
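Merged file names can be filtered in the same way, for instance to pick only the plain-text files for one country. A minimal Python sketch; since the example name above carries no trial digit, the pattern treats that digit as optional (an assumption), and all names below are hypothetical helpers.

```python
import re

# Hypothetical pattern for merged file names such as S_CHN_PTJ_A2_0.txt
MERGED_FILE = re.compile(
    r"^(?P<mode>[SW])_(?P<country>[A-Z]{3})_"
    r"(?P<topic>PTJ|SMK)(?P<trial>[012]?)_"   # trial digit assumed optional
    r"(?P<level>A2_0|B1_1|B1_2|B2_0|XX_[123])"
    r"\.(?P<ext>txt|vert)$"
)


def parse_merged_name(filename):
    """Split a merged ICNALE file name into its coded fields."""
    m = MERGED_FILE.match(filename)
    if m is None:
        raise ValueError(f"not a merged ICNALE file name: {filename!r}")
    fields = m.groupdict()
    fields["trial"] = fields["trial"] or None   # None when no digit is present
    fields["tagged"] = fields["ext"] == "vert"  # .vert = vertical tagged text
    return fields


def select_merged(filenames, **criteria):
    """Keep the names whose parsed fields match every criterion given."""
    selected = []
    for name in filenames:
        try:
            fields = parse_merged_name(name)
        except ValueError:
            continue  # skip files outside the naming scheme
        if all(fields.get(key) == value for key, value in criteria.items()):
            selected.append(name)
    return selected
```

For example, `select_merged(names, country="JPN", tagged=False)` keeps only the untagged Japanese files from a list of names.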







POS Tag 


POS (part-of-speech) tagging was conducted with the Sketch Engine system. The tagging scheme adopted is English PennTB-Tree Tagger 2.0.

Each token (word form) and its grammatical attributes are presented on one line.



Word form   POS-Tag    Lemma-pos (lempos)
 


Below is a sample of the tagged data.

<s>
Now         RB        now-r
many        JJ        many-j
parents     NNS       parent-n
and         CC        and-c
teachers    NNS       teacher-n
disagree    VVP       disagree-v
that        IN/that   that-x
college     NN        college-n
students    NNS       student-n
have        VHP       have-v
their       PP$       their-p
own         JJ        own-j
part-time   JJ        part-time-j
jobs        NNS       job-n
<g/>
.           SENT      .-x
</s>
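Tagged .vert files in this layout can be read with a few lines of code, for example to count POS tags. A minimal Python sketch, assuming one token per line with three whitespace-separated columns (word form, POS tag, lempos), `<s>` and `</s>` around each sentence, and other markup lines such as `<g/>` that carry no token; all function names are hypothetical.

```python
from collections import Counter


def parse_vert(text):
    """Parse vertical text into sentences of (word, tag, lempos) tuples."""
    sentences = []
    current = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "<s>" or line.startswith("<s "):
            current = []                  # a new sentence starts
        elif line == "</s>":
            sentences.append(current)     # close the current sentence
            current = []
        elif line.startswith("<") and line.endswith(">"):
            continue                      # other markup such as <g/>
        else:
            word, tag, lempos = line.split()
            current.append((word, tag, lempos))
    if current:                           # tokens after the last </s>
        sentences.append(current)
    return sentences


def split_lempos(lempos):
    """Split a lempos such as 'part-time-j' into ('part-time', 'j')."""
    lemma, _, pos_class = lempos.rpartition("-")
    return lemma, pos_class


def tag_counts(sentences):
    """Count POS tags across all parsed sentences."""
    return Counter(tag for sentence in sentences for _, tag, _ in sentence)
```

Splitting lempos at the last hyphen keeps hyphenated lemmas such as `part-time` intact. On the sample sentence above, `tag_counts` finds four NNS tokens and three JJ tokens.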

 







Participants' Information Sheet

The participants' information sheet lists detailed data about individual participants and their essays or speeches.


About essays or speeches
Code... File code
PTJ (wds)... The number of words in the participant's essay or speech on the PTJ (part-time job) topic
SMK (wds)... The number of words in the participant's essay or speech on the SMK (non-smoking) topic

About participants' background
Country... Participant's country or area
Sex ... Participant's sex
Age... Participant's age
Grade... Participant's school grade (1, 2, 3, 4...)
Major (Occupation)... For students, their college major; for employed people, their occupation.
Academic Genres... Only for students: Humanities, Social Sciences, Science and Technology, and Life Science

About participants' proficiency
Proficiency Test... Test name such as TOEIC or TOEFL
Score... Score on the test above
VST... Score on the vocabulary size test (maximum score: 50). This test measures participants' L2 lexical knowledge up to the 5,000-word level.
CEFR... CEFR level (A2, B1_1, B1_2, or B2+), estimated from participants' scores on the proficiency test or the vocabulary size test
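Since the Code column links each row of the sheet to a file, the sheet can be used to build learner subgroups, for example all files at one CEFR level. A minimal Python sketch, assuming the sheet has been exported to CSV with a header row containing the column names "Code" and "CEFR" as listed above; the function name and parameters are hypothetical.

```python
import csv
from collections import defaultdict


def codes_by_cefr(csv_file, code_column="Code", level_column="CEFR"):
    """Group file codes by CEFR level from a CSV export of the info sheet.

    Assumes a header row containing the column names given as parameters;
    adjust them if your export uses different headings.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row[level_column].strip()].append(row[code_column].strip())
    return dict(groups)
```

If the exported CSV also starts with a BOM, open it with `open(path, encoding="utf-8-sig", newline="")`; otherwise the first header may be read as `"\ufeffCode"` and the lookup by column name fails.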

About participants' motivation
INTM... Integrative Motivation Score
INSM... Instrumental Motivation Score
INTM+INSM... Strength of Motivation
INTM-INSM... Integrative Motivation Orientation Score
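The two composite measures are simple arithmetic on INTM and INSM, whatever scale the sheet uses for them. A minimal Python sketch; reading the sign of INTM-INSM as "more integrative" or "more instrumental" is an interpretation added here for illustration, and the function names are hypothetical.

```python
def motivation_summary(intm, insm):
    """Return (strength, orientation) from the INTM and INSM scores.

    strength    = INTM + INSM  (strength of motivation)
    orientation = INTM - INSM  (integrative motivation orientation score)
    """
    strength = intm + insm
    orientation = intm - insm
    return strength, orientation


def orientation_label(orientation):
    """Hypothetical reading of the sign of INTM - INSM."""
    if orientation > 0:
        return "more integrative"
    if orientation < 0:
        return "more instrumental"
    return "balanced"
```

For example, INTM = 4.5 and INSM = 3.0 give a strength of 7.5 and an orientation score of 1.5.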

About participants' L2 learning experiences
Primary... How much the participant studied English in primary school (1 to 6 points)
Secondary... How much the participant studied English in secondary school (1 to 6 points)
College... How much the participant has studied English in college (1 to 6 points)
Inschool... How much the participant has studied English in class (1 to 6 points)
Outschool... How much the participant has studied English outside class, e.g., at home or in the community (1 to 6 points)
Listening... How much the participant has studied listening (1 to 6 points)
Reading... How much the participant has studied reading (1 to 6 points)
Speaking... How much the participant has studied speaking (1 to 6 points)
Writing... How much the participant has studied writing (1 to 6 points)
NS... How much the participant has been taught by native English speakers (1 to 6 points)
Pronunciation... How much the participant has been taught pronunciation (1 to 6 points)
Presentation... How much the participant has been taught presentation (1 to 6 points)
Essay Writing... How much the participant has been taught essay writing (1 to 6 points)