HISTORY
The Computational Linguistics Laboratory (LLI, from its Spanish name, Laboratorio de Lingüística Informática) is a research group recognized by the Universidad Autónoma de Madrid (UAM).
The Computational Linguistics Laboratory was created at the UAM-IBM Research Centre after Francisco Marcos Marín joined as Professor of General Linguistics in 1981. In the 1980s, the work had a twofold goal. On one side, there was the collaboration with IBM on projects of immediate application, such as spelling checkers, vocabularies, and tools for the new personal computers.
On the other side, work began on applying computers to philology, especially to unified editions and textual criticism. This second line of work later led to programs for electronic critical editions, such as UNITE, and to larger projects such as ADMYTE, Manuscript Digital File and Electronic Texts.
The work begun at the UAM-IBM Scientific Centre spread to the equivalent IBM centre in Heidelberg, thanks to a grant awarded to Francisco Marcos Marín by the Alexander von Humboldt Foundation. Between 1985 and 1987, the first major application of these computer programs to text editing was carried out on the Libro de Alexandre. The work done between Madrid and Germany brought the group into contact with other European groups that were starting out in linguistics and computing, in particular the group that launched the EUROTRA machine translation project, supported by the European Commission of the time.
Although the laboratory was formed at the UAM-IBM Scientific Centre, it was the EUROTRA project that gave it a solid foundation. Alongside researchers who had worked at the Centre, such as Antonio Moreno Sandoval, new members joined the laboratory, such as Fernando Sánchez León and Flora Ramírez Bustamante, who have played crucial roles in its activity ever since.
In the early 1990s, the work on EUROTRA was combined with work on the digital archives supported by the Sociedad Estatal del Quinto Centenario. This explains why the Laboratory's work and projects split into two orientations: philological-textual on one side, and corpus linguistics on the other. Numerous links are being forged between the two, without losing sight of new possibilities.
The laboratory is therefore in constant motion, always open to collaborations and consortia. The LLI-UAM group holds a distinctive place in the interdisciplinary research area between Computer Science and Language, both in Spain and across the Spanish-speaking world.
Since 2000, the LLI has specialized in compiling corpora: parallel corpora (Arabic-Spanish-English), spontaneous speech corpora (C-ORAL-ROM), children’s speech corpora (CHIEDE), multimodal corpora (MAVIR), oral corpora of foreign/second language learners (Spanish Learner Oral Corpus and French Learner Oral Corpus) and specialized language corpora (MultiMedica). The LLI has also created several linguistic resources: acoustic databases, corpus-based materials for foreign/second language teaching and learning (Textos de español oral, UAM Publishing Services, 2010), electronic dictionaries (Japanese-English-Spanish and of French prepositions), and a morphological analyzer of Arabic verbs (JABALÍN).
The LLI collaborates closely with researchers and professors from the UAM Departments of Computer Science Engineering and Telecommunications Engineering. Since December 2009, the LLI has collaborated with the Instituto de Ingeniería del Conocimiento, a private, non-profit research and development institution on the UAM Cantoblanco campus.
RESEARCH LINES
Compilation of spoken and written corpora, including multilingual and multimodal corpora
Linguistic annotation at all levels: phonological, morphological, syntactic, semantic and pragmatic
Tools for managing linguistic corpora (oral and written; contemporary and diachronic)
Electronic tools for linguistic and/or philological studies
Terminology
COMPLETED Ph.D. THESES
PhD Student
Xiaohan Zhang
Title
Análisis de los tiempos verbales del español empleados por estudiantes chinos mediante técnicas de Lingüística de Corpus
Defended in July 2022
Directors
Antonio Moreno Sandoval
Paula Gozalo Gómez
PhD Student
Nuria Aldama
Title
Disambiguating Spanish se constructions with machine learning techniques
Defended in December 2021
Director
Antonio Moreno Sandoval
PhD Student
Patricia Elhazaz Walsh
Title
Análisis de la fluidez lectora y la interlengua oral en un corpus de aprendices de inglés como lengua extranjera
Defended on January 29, 2021
Directors
Leonardo Campillos Llanos
Daniel Bolaños Alonso
PhD Student
Yuanyi Liu
Title
Diccionario de terminología médica español-chino basado en corpus
Defended on September 4, 2018
Director
Antonio Moreno Sandoval
PhD Student
Marta Vacas Matos
Title
Diseño y compilación de un corpus multimodal de análisis pragmático para la aplicación a la enseñanza de español L2/LE
Defended on September 9, 2017
Directors
Antonio Moreno Sandoval
Paula Gozalo Gómez
PhD Student
Carlos Herrero Zorita
Title
Modality in spoken Spanish and Japanese: a corpus-based study and automatic annotation
Defended on May 11, 2017
Director
Antonio Moreno Sandoval
PhD Student
Emi Takamori
Title
Análisis de usos de partículas japonesas basado en corpus de estudiantes españoles
Defended on June 18, 2014
Director
Antonio Moreno Sandoval
PhD Student
Alicia González Martínez
Title
A computational model of modern standard Arabic verbal morphology based on generation
Defended on January 29, 2013
Director
Antonio Moreno Sandoval
PhD Student
Leonardo Campillos Llanos
Title
La expresión oral en español lengua extranjera: interlengua y análisis de errores basado en corpus
Defended on December 17, 2012
Directors
Antonio Moreno Sandoval
Paula Gozalo Gómez
PhD Student
Ana Valverde Mateos
Title
Análisis de errores de aprendientes de francés lengua extranjera (FLE) basado en corpus orales
Defended on June 4, 2012
Directors
Antonio Moreno Sandoval
Concepción Sanz Miguel (UCLM)
PhD Student
Yang Dong
Title
Compilación de un corpus de habla espontánea de chino putonghua para la aplicación en la enseñanza como lengua segunda a hispanohablantes
Defended in 2011
Director
Antonio Moreno Sandoval
PhD Student
Ana González Ledesma
Title
Los marcadores del discurso en el corpus C-ORAL-ROM: anotación pragmática, estrategias computacionales de etiquetado y aplicaciones a otros campos
Defended in 2010
Director
Antonio Moreno Sandoval
PhD Student
Marta Garrote Salazar
Title
CHIEDE: corpus de habla infantil espontánea del español
Defended in 2008
Director
Antonio Moreno Sandoval
PhD Student
Doaa Ahmed Samy
Title
Recursos bilingües de Ingeniería Lingüística para el procesamiento del español y árabe
Defended in 2005
Director
Antonio Moreno Sandoval
PhD Student
Manuel Alcántara Pla
Title
Anotación y recuperación de información semántica eventiva en corpus
Defended in 2005
Director
Antonio Moreno Sandoval
ONGOING Ph.D. THESIS
PhD Student
Blanca Carbajo Coronado
Provisional Title
Tratamiento computacional de las relaciones de causa-efecto en español con técnicas de aprendizaje automático
Director
Antonio Moreno Sandoval
CONTACT
Contact person
Antonio Moreno Sandoval
Telephone
(+34) 91 497 52 50 / (+34) 91 497 87 07
Department of Linguistics, Modern Languages, Logic and Philosophy of Science
Facultad de Filosofía y Letras – Universidad Autónoma de Madrid
Cantoblanco campus, Carretera de Colmenar, km. 16, 28049 Madrid