The Computational Linguistics Laboratory (LLI, from its Spanish name, Laboratorio de Lingüística Informática) is a research group recognized by the Universidad Autónoma de Madrid (UAM).

The Computational Linguistics Laboratory was created at the UAM-IBM Scientific Centre after Francisco Marcos Marín joined as Professor of General Linguistics in 1981. In the 1980s, the work had a double goal. On one side was the collaboration with IBM on immediate projects such as spelling checkers, vocabularies, and tool development for the new personal computers.

On the other side, work began on the application of computers to philology, especially to unified and critical editions. This second line of work later led to programs for electronic critical editions, such as UNITE, and to more extensive projects such as ADMYTE (Digital Archive of Manuscripts and Electronic Texts).

The work started at the UAM-IBM Scientific Centre spread to the equivalent IBM centre in Heidelberg, thanks to a grant awarded by the Alexander von Humboldt Foundation to Francisco Marcos Marín. Between 1985 and 1987, the first major application of the computer programs to text editing was carried out on the Libro de Alexandre. The work done between Madrid and Germany led the group to make contact with other European groups that were starting out in linguistic and computational activities, in particular with the group that began the EUROTRA machine translation project, supported by the European Commission of the time.

Although the laboratory was formed at the UAM-IBM Scientific Centre, it became firmly established with the EUROTRA project. Researchers who had worked at the Centre, such as Antonio Moreno Sandoval, were joined by others, such as Fernando Sánchez León and Flora Ramírez Bustamante, whose roles have been crucial to the laboratory's activity ever since.

In the early 1990s, the work on EUROTRA was combined with work on the digital archives supported by the Sociedad Estatal del Quinto Centenario. This explains the split in the Laboratory's work and projects between a philological-textual orientation on one side and corpus linguistics on the other. Numerous links are being forged between the two, without neglecting openings towards new possibilities.

That is why the laboratory is a centre in constant motion, always open to collaborations and consortia. The LLI-UAM group occupies its own place in the interdisciplinary research area between Computer Science and Language, both in Spain and in the wider Spanish-speaking community.

Since 2000, the LLI has specialized in compiling corpora: parallel corpora (Arabic-Spanish-English), spontaneous speech corpora (C-ORAL-ROM), children's speech corpora (CHIEDE), multimodal corpora (MAVIR), oral corpora of foreign/second language learners (Spanish Learner Oral Corpus and French Learner Oral Corpus) and specialized language corpora (MultiMedica). The LLI has also created several linguistic resources: acoustic databases, applications of corpora for foreign/second language teaching and learning (Textos de español oral, UAM Publishing Services, 2010), electronic dictionaries (Japanese-English-Spanish, and French prepositions), and a morphological analyzer of Arabic verbs (JABALÍN).

The LLI collaborates closely with several researchers and professors from the UAM Departments of Computer Science Engineering and Telecommunications Engineering. Since December 2009, the LLI has collaborated with the Instituto de Ingeniería del Conocimiento, a private, non-profit research and development institution on the UAM Cantoblanco campus.


Compilation of speech and written corpora, multilingual and multimodal

Linguistic annotation at all levels: phonological, morphological, syntactic, semantic and pragmatic

Electronic dictionaries

Tools for linguistic corpus management (oral and written, present-day and diachronic)

Computational grammars
Acoustic databases
Information retrieval
Machine translation

Electronic tools for linguistic and/or philological studies


PhD Student
Xiaohan Zhang

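Análisis de los tiempos verbales del español empleados por estudiantes chinos mediante técnicas de Lingüística de Corpus (Analysis of the Spanish verb tenses used by Chinese students, using corpus linguistics techniques)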

Defended in July 2022

Antonio Moreno Sandoval
Paula Gozalo Gómez

PhD Student
Nuria Aldama

Disambiguating Spanish se constructions with machine learning techniques

Defended on 10th December 2021

Antonio Moreno Sandoval

PhD Student
Patricia Elhazaz Walsh

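Análisis de la fluidez lectora y la interlengua oral en un corpus de aprendices de inglés como lengua extranjera (Analysis of reading fluency and oral interlanguage in a corpus of learners of English as a foreign language)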

Defended on 29th January 2021

Leonardo Campillos Llanos
Daniel Bolaños Alonso


PhD Student
Blanca Carbajo Coronado

Provisional title
Tratamiento computacional de las relaciones de causa-efecto en español con técnicas de aprendizaje automático (Computational treatment of cause-effect relations in Spanish using machine learning techniques)

Antonio Moreno Sandoval


