Laboratorio de Lingüística Informática
At LLI, work on the Arabic language focuses on three linguistic resources:
The corpus is made up of UNO, in which named entities were annotated.
The following table shows the lexicon size:
Nouns include common nouns, adjectives, pronouns, adverbial nouns and quantifiers. Particles include prepositions, conjunctions, interjections and adverbials. Verbs are annotated according to tense: present, past and imperative.
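The three categories described above can be written as a small lookup structure. The following is only an illustrative sketch: the identifiers (Category, SUBCATEGORIES, is_valid_tag) and the tag spellings are assumptions made for this example, not the lab's actual annotation format, and the first category is labeled NOUN here.

```python
from enum import Enum


class Category(Enum):
    """Top-level lexicon categories, as listed in the text."""
    NOUN = "noun"
    PARTICLE = "particle"
    VERB = "verb"


# Subcategories per category. The grouping follows the description above;
# the string labels themselves are hypothetical.
SUBCATEGORIES = {
    Category.NOUN: ("common noun", "adjective", "pronoun",
                    "adverbial noun", "quantifier"),
    Category.PARTICLE: ("preposition", "conjunction",
                        "interjection", "adverbial"),
    Category.VERB: ("present", "past", "imperative"),
}


def is_valid_tag(category: Category, subcategory: str) -> bool:
    """Return True if the subcategory is listed under the given category."""
    return subcategory in SUBCATEGORIES[category]
```

For instance, is_valid_tag(Category.VERB, "past") holds, while is_valid_tag(Category.PARTICLE, "past") does not, since "past" is only listed as a verb annotation.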
Following participation in the Cross-Language Evaluation Forum (CLEF), an acoustic database of questions in several languages, such as Spanish, Arabic and Thai, was created to train speech recognition systems.