
Laboratorio de Lingüística Informática

Comparison of LLI-UAM's corpora


| | CORLEC | C-ORAL-ROM | CHIEDE | ARABIC-SPANISH CORPUS | MAVIR CORPUS | C-ORAL-CHINA | C-ORAL-JAPÓN |
|---|---|---|---|---|---|---|---|
| Compilation date | 1990-92 | 2001-04 | 2008 | 2005 | 2006-08 | 2010-11 | 2010-11 |
| Type of corpus | Oral | Oral | Oral | Written | Oral | Oral | Oral |
| Languages | Spanish | Spanish, Portuguese, Italian, French | Spanish | Spanish, Arabic, English | Spanish, English | Chinese | Japanese |
| Number of words | 1,100,000 | 312,000 for each language | 60,000 | 4,000 for each language | 103,000 | 140,000 characters | 235,000 characters |
| Type of recording | Analog | Digital | Digital | | Digital | Digital | Digital |
| Annotation levels | Features of speech | Prosody and morphology; partial semantics and pragmatics | Prosody, morphology and phonology | Structure (paragraphs, sentences and tokens), categories and partial pragmatics | Prosody | Prosody | Prosody |
| Text-sound alignment | No | Yes | Yes | | Yes | Yes | Yes |
| Participants' permission | No | Yes | Yes | Not necessary | Yes | Yes | Yes |
| Validation | No | Yes, internal and external | Yes, internal | | Yes, internal | Yes, internal | Yes, internal |
| Search engine | No | Yes | Yes | No | No | Yes | Yes |
| User guide | No | Yes | Yes | No | Yes | Yes | Yes |
| Phonological transcription | No | No | Yes | No | No | Pinyin | No |


