INFORMATION – Laboratorio de Lingüística Informática

HISTORY

The Computational Linguistics Laboratory (LLI, from its Spanish name, Laboratorio de Lingüística Informática) is a research group recognized by the Universidad Autónoma de Madrid (UAM).

The Computational Linguistics Laboratory was created at the UAM-IBM Research Centre after Francisco Marcos Marín joined as Professor of General Linguistics in 1981. In the 1980s, the work had a twofold goal. On the one hand, the laboratory collaborated with IBM on short-term projects such as spelling checkers, vocabularies, and tool development for the new personal computers.

On the other hand, work began on applying computers to philology, especially to unified and critical editions. This second line of work later led to programs for electronic critical editions, such as UNITE, and to larger projects such as ADMYTE, the Digital Archive of Manuscripts and Electronic Texts.

The work started at the UAM-IBM Scientific Centre spread to the equivalent IBM centre in Heidelberg, thanks to a grant awarded by the Alexander von Humboldt Foundation to Francisco Marcos Marín. Between 1985 and 1987, the first major application of the computer programs to text editing was carried out on the Libro de Alexandre. The work done between Madrid and Germany brought the group into contact with other European groups that were starting out in computational linguistics, in particular the group behind the EUROTRA machine translation project, supported by the then European Commission.

Although the laboratory was formed at the UAM-IBM Scientific Centre, it was the EUROTRA project that put it on a solid footing. Researchers who had worked at the Centre, such as Antonio Moreno Sandoval, were joined by others, such as Fernando Sánchez León and Flora Ramírez Bustamante, whose roles have been crucial to the laboratory's activity ever since.

In the early 1990s, the work on EUROTRA was combined with the work on digital archives supported by the Sociedad Estatal del Quinto Centenario. This explains the division of the Laboratory's work and projects into two orientations: philological and textual on the one hand, and corpus linguistics on the other. Numerous links are being forged between the two, without neglecting new possibilities.

That is why the laboratory is in a state of permanent restlessness, always open to collaborations and consortia. The LLI-UAM group occupies its own place in the interdisciplinary research area between Computer Science and Language, both in Spain and in the wider Spanish-speaking community.

Since 2000, the LLI has specialized in compiling corpora: parallel corpora (Arabic-Spanish-English), spontaneous speech corpora (C-ORAL-ROM), children’s speech corpora (CHIEDE), multimodal corpora (MAVIR), oral corpora for foreign/second language learners (Spanish Learner Oral Corpus and French Learner Oral Corpus) and specialized language corpora (MultiMedica). The LLI has also created several linguistic resources: acoustic databases, applications of corpora for foreign/second language teaching and learning (Textos de español oral, UAM Publishing Services, 2010), electronic dictionaries (Japanese-English-Spanish, and French prepositions), and a morphological analyzer of Arabic verbs (JABALÍN).

The LLI collaborates closely with several researchers and professors in the Departments of Computer Science Engineering and Telecommunications Engineering at the UAM. Since December 2009, the LLI has collaborated with the Instituto de Ingeniería del Conocimiento, a private, non-profit research and development institution on the UAM Cantoblanco campus.