Laboratorio de Lingüística Informática
ACCESS TO CORPUS ONLINE (Mozilla Firefox recommended)
The Spanish Learner Oral Corpus comprises 40 interviews with learners of Spanish who speak more than nine different mother tongues.
Almost all learners were between 18 and 26 years old and were enrolled in Spanish courses at levels A2 and B1 of the Common European Framework of Reference.
The corpus contains 55,567 words* (counting only the learners' turns) and a total of 13 hours and 36 minutes of recordings.
In addition, 4 interviews with native speakers (control group) were collected, amounting to 9,389 words* and 1 hour and 22 minutes of recordings (see table below).
Each recording has been synchronized with its orthographic transcription at the utterance level. Files include the transcription and metadata with sociolinguistic information (e.g. the learner's origin or educational level) and data concerning the learner's knowledge of Spanish (e.g. proficiency level, time, place or learning context).
Transcriptions also include error tags which have been used in the error analysis of the oral production.
Moreover, transcriptions have been POS-tagged with the GRAMPAL analyzer (Moreno and Guirao, 2006) in order to analyze how frequently each word category is used.
*This figure is the raw word count, in which a "word" is every element between two white spaces; thus, a lexical unit such as es decir ('I mean') counts as 2 words.
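The raw count described in this note can be illustrated with a minimal Python sketch. The function name `raw_word_count` is hypothetical, and this is not necessarily the tool used to compute the corpus figures; it only shows the counting rule, under which every whitespace-delimited element is one word.

```python
def raw_word_count(text: str) -> int:
    """Count every element delimited by white space as one word."""
    # str.split() with no argument splits on any run of whitespace
    # and ignores leading and trailing whitespace.
    return len(text.split())


# The lexical unit "es decir" is two whitespace-delimited elements.
print(raw_word_count("es decir"))  # 2
```

Under this rule, multi-word lexical units always contribute more than one word to the total.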
File | Sex | L1 | Level | Length (mm:ss) | Length of L1 group (h:mm:ss) | Nº of turns |
Romance languages |
PORMA2 | M | Portuguese | A2 | 25:10 | 1:26:52 | 524 |
PORWA2_1 | W | Portuguese | A2 | 20:09 | | 328 |
PORWA2_2 | W | Portuguese (Brazilian) | A2 | 19:51 | | 462 |
PORWB1 | W | Portuguese (Brazilian) | B1 | 21:42 | | 496 |
ITAMA2 | M | Italian | A2 | 20:45 | 1:13:25 | 540 |
ITAWA2 | W | Italian | A2 | 13:09 | | 304 |
ITAMB1 | M | Italian | B1 | 23:16 | | 436 |
ITAWB1 | W | Italian | B1 | 16:15 | | 280 |
FREMA2 | M | French | A2 | 24:08 | 1:23:17 | 584 |
FREWA2 | W | French | A2 | 20:31 | | 250 |
FREMB1 | M | French | B1 | 21:56 | | 566 |
FREWB1 | W | French | B1 | 16:46 | | 522 |
Germanic |
ENGWA2 | W | English | A2 | 15:04 | 1:20:39 | 348 |
ENGMB1 | M | English | B1 | 18:44 | | 436 |
ENGWB1_1 | W | English | B1 | 18:02 | | 347 |
ENGWB1_2 | W | English | B1 | 28:49 | | 733 |
DUTMA2 | M | Dutch | A2 | 18:19 | 1:16:46 | 454 |
DUTWA2_1 | W | Dutch | A2 | 17:33 | | 180 |
DUTWA2_2 | W | Dutch | A2 | 23:05 | | 582 |
DUTWB1 | W | Dutch | B1 | 17:49 | | 370 |
GERMA2 | M | German | A2 | 18:23 | 1:13:24 | 306 |
GERWA2 | W | German | A2 | 19:45 | | 526 |
GERWB1_1 | W | German | B1 | 15:35 | | 284 |
GERWB1_2 | W | German | B1 | 19:41 | | 336 |
Slavic |
POLMA2_1 | M | Polish | A2 | 22:20 | 1:32:25 | 510 |
POLMA2_2 | M | Polish | A2 | 30:28 | | 656 |
POLMB1 | M | Polish | B1 | 26:46 | | 443 |
POLWB1 | W | Polish | B1 | 12:51 | | 268 |
Sino-Tibetan |
CHIWA2_1 | W | Chinese | A2 | 18:48 | 1:17:27 | 478 |
CHIWA2_2 | W | Chinese | A2 | 18:45 | | 450 |
CHIMB1 | M | Chinese | B1 | 18:56 | | 425 |
CHIWB1 | W | Chinese | B1 | 20:58 | | 449 |
Languages |
JAPWA2 | W | Japanese | A2 | 28:52 | 1:32:41 | 552 |
JAPWB1_1 | W | Japanese | B1 | 16:28 | | 466 |
JAPWB1_2 | W | Japanese | B1 | 20:59 | | 498 |
JAPWB1_3 | W | Japanese | B1 | 26:22 | | 679 |
Other |
FINWA2 | W | Finnish | A2 | 20:27 | 1:19:05 | 544 |
HUNWA2 | W | Hungarian | A2 | 21:28 | | 164 |
KORWB1 | W | Korean | B1 | 21:14 | | 462 |
TURWB1 | W | Turkish | B1 | 15:56 | | 288 |
Spanish |
SPAM_1 | M | Spanish | - | 18:57 | 1:22:29 | 401 |
SPAM_2 | M | Spanish | - | 26:47 | | 626 |
SPAW_1 | W | Spanish | - | 16:49 | | 307 |
SPAW_2 | W | Spanish | - | 19:56 | | 333 |
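The group length given on the first row of each L1 group appears to be the sum of the individual interview lengths in that group. The following Python sketch checks this for the four Portuguese interviews, using the values from the table; the helper names `to_seconds` and `format_hms` are illustrative assumptions, not part of any corpus tooling.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' duration string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def format_hms(total_seconds: int) -> str:
    """Format a number of seconds as 'h:mm:ss'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Interview lengths of PORMA2, PORWA2_1, PORWA2_2 and PORWB1 (from the table).
portuguese = ["25:10", "20:09", "19:51", "21:42"]
group_total = format_hms(sum(to_seconds(length) for length in portuguese))
print(group_total)  # 1:26:52
```

The same check can be repeated for any other group by replacing the list of interview lengths.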
SELECTED PUBLICATIONS