Laboratorio de Lingüística Informática

Spanish Learner Oral Corpus

ACCESS TO CORPUS ONLINE (preferably with Mozilla Firefox)

The Spanish Learner Oral Corpus comprises 40 interviews with learners of Spanish representing more than 9 mother tongues.

Almost all learners were between 18 and 26 years old and were enrolled in Spanish courses at the A2 and B1 levels of the Common European Framework of Reference.

The corpus contains 55,567 words* (counting only the learners' turns) and a total of 13 hours and 36 minutes of recordings.

In addition, 4 interviews with native speakers (control group) were collected, totaling 9,389 words* and 1 hour and 22 minutes of recordings (see table below).

Each recording has been synchronized with its orthographic transcription at the utterance level. Each file includes the transcription and metadata with sociolinguistic information (e.g. the learner's origin or educational level) and data on the learner's knowledge of Spanish (e.g. proficiency level, and the time, place or context of learning).

The transcriptions also include error tags, which were used in the error analysis of the learners' oral production.

Moreover, the transcriptions have been POS-tagged with the GRAMPAL analyzer (Moreno and Guirao, 2006) in order to analyze how frequently each word category is used.

*This figure corresponds to the raw word count, in which a "word" is every element between two white spaces; thus, a lexical unit such as es decir ('I mean') counts as 2 words.
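
The counting rule in the note above can be illustrated with a minimal sketch. It assumes that tokens are separated by whitespace only; how punctuation or error tags were handled in the actual count is not stated here, and the function name `raw_word_count` is hypothetical.

```python
def raw_word_count(text: str) -> int:
    """Count whitespace-delimited elements in a string."""
    # str.split() with no arguments splits on any run of whitespace
    # and ignores leading and trailing whitespace.
    return len(text.split())


# The multiword unit "es decir" is counted as two words under this rule.
print(raw_word_count("es decir"))  # 2
```

Under this definition, multiword lexical units inflate the count relative to a count of lexical units, which is why the figure is labeled as raw.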

File      Sex  L1                      Level  Length (mm:ss)  L1 group length (h:mm:ss)  No. of turns

Romance languages
PORMA2    M    Portuguese              A2     25:10           1:26:52                    524
PORWA2_1  W    Portuguese              A2     20:09                                      328
PORWA2_2  W    Portuguese (Brazilian)  A2     19:51                                      462
PORWB1    W    Portuguese (Brazilian)  B1     21:42                                      496
ITAMA2    M    Italian                 A2     20:45           1:13:25                    540
ITAWA2    W    Italian                 A2     13:09                                      304
ITAMB1    M    Italian                 B1     23:16                                      436
ITAWB1    W    Italian                 B1     16:15                                      280
FREMA2    M    French                  A2     24:08           1:23:17                    584
FREWA2    W    French                  A2     20:31                                      250
FREMB1    M    French                  B1     21:56                                      566
FREWB1    W    French                  B1     16:46                                      522

Germanic languages
ENGWA2    W    English                 A2     15:04           1:20:39                    348
ENGMB1    M    English                 B1     18:44                                      436
ENGWB1_1  W    English                 B1     18:02                                      347
ENGWB1_2  W    English                 B1     28:49                                      733
DUTMA2    M    Dutch                   A2     18:19           1:16:46                    454
DUTWA2_1  W    Dutch                   A2     17:33                                      180
DUTWA2_2  W    Dutch                   A2     23:05                                      582
DUTWB1    W    Dutch                   B1     17:49                                      370
GERMA2    M    German                  A2     18:23           1:13:24                    306
GERWA2    W    German                  A2     19:45                                      526
GERWB1_1  W    German                  B1     15:35                                      284
GERWB1_2  W    German                  B1     19:41                                      336

Slavic languages
POLMA2_1  M    Polish                  A2     22:20           1:32:25                    510
POLMA2_2  M    Polish                  A2     30:28                                      656
POLMB1    M    Polish                  B1     26:46                                      443
POLWB1    W    Polish                  B1     12:51                                      268

Sino-Tibetan languages
CHIWA2_1  W    Chinese                 A2     18:48           1:17:27                    478
CHIWA2_2  W    Chinese                 A2     18:45                                      450
CHIMB1    M    Chinese                 B1     18:56                                      425
CHIWB1    W    Chinese                 B1     20:58                                      449

Languages of Japan
JAPWA2    W    Japanese                A2     28:52           1:32:41                    552
JAPWB1_1  W    Japanese                B1     16:28                                      466
JAPWB1_2  W    Japanese                B1     20:59                                      498
JAPWB1_3  W    Japanese                B1     26:22                                      679

Other languages
FINWA2    W    Finnish                 A2     20:27           1:19:05                    544
HUNWA2    W    Hungarian               A2     21:28                                      164
KORWB1    W    Korean                  B1     21:14                                      462
TURWB1    W    Turkish                 B1     15:56                                      288

Spanish (control group)
SPAM_1    M    Spanish                 -      18:57           1:22:29                    401
SPAM_2    M    Spanish                 -      26:47                                      626
SPAW_1    W    Spanish                 -      16:49                                      307
SPAW_2    W    Spanish                 -      19:56                                      333
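
The L1 group length in the table appears to be the sum of the individual file lengths in that group. As a rough check, the sketch below adds up the four Portuguese files and converts the total to h:mm:ss. The helper names `to_seconds` and `to_hmmss` are hypothetical and are not part of the corpus tools.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_hmmss(total_seconds: int) -> str:
    """Format a number of seconds as 'h:mm:ss'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# File lengths of PORMA2, PORWA2_1, PORWA2_2 and PORWB1, taken from the table.
portuguese = ["25:10", "20:09", "19:51", "21:42"]
total = sum(to_seconds(length) for length in portuguese)
print(to_hmmss(total))  # 1:26:52
```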

SELECTED PUBLICATIONS