Laboratorio de Lingüística Informática

HISTORY

The Computational Linguistics Laboratory (LLI, from its Spanish name, Laboratorio de Lingüística Informática) is a research group recognized by the Universidad Autónoma de Madrid (UAM).

The Computational Linguistics Laboratory was created at the UAM-IBM Scientific Centre after Francisco Marcos Marín joined as Professor of General Linguistics in 1981. In the 1980s, its work had a twofold aim. On the one hand, it collaborated with IBM on near-term projects such as spelling checkers, vocabularies, and development tools for the new personal computers.

On the other hand, work began on applying computers to philology, especially to unified and critical editions. This second line of work later led to software for electronic critical editions, such as UNITE, and to larger projects such as ADMYTE, a digital archive of manuscripts and electronic texts.

The work begun at the UAM-IBM Scientific Centre extended to the corresponding IBM centre in Heidelberg, thanks to a grant awarded to Francisco Marcos Marín by the Alexander von Humboldt Foundation. Between 1985 and 1987, the first major application of these computer programs to text editing was carried out on the Libro de Alexandre. The work done between Madrid and Germany brought the group into contact with other European groups that were beginning to work in linguistics and computing, in particular the group that launched EUROTRA, the machine translation project supported by the then European Commission.

Although the laboratory was formed at the UAM-IBM Scientific Centre, it was with the EUROTRA project that it became firmly established. Alongside researchers who had worked at the Centre, such as Antonio Moreno Sandoval, others joined the laboratory, such as Fernando Sánchez León and Flora Ramírez Bustamante, whose roles have been crucial to its activity ever since.

In the early 1990s, the work on EUROTRA was combined with work on the digital archives supported by the Sociedad Estatal del Quinto Centenario. As a result, the Laboratory's work and projects divided into two strands: a philological and textual orientation on the one hand, and corpus linguistics on the other. Numerous links are being forged between the two, while the laboratory remains open to new directions.

The laboratory is thus a centre in constant exploration, always open to collaborations and consortia. The LLI-UAM group holds a distinctive place in the interdisciplinary research area between computer science and language, both in Spain and across the Spanish-speaking world.

Since 2000, the LLI has specialized in compiling corpora: parallel corpora (Arabic-Spanish-English), spontaneous speech corpora (C-ORAL-ROM), children’s speech corpora (CHIEDE), multimodal corpora (MAVIR), oral corpora of foreign/second language learners (Spanish Learner Oral Corpus and French Learner Oral Corpus), and specialized language corpora (MultiMedica). The LLI has also created several linguistic resources: acoustic databases, applications of corpora to foreign/second language teaching and learning (Textos de español oral, UAM Publishing Services, 2010), electronic dictionaries (a Japanese-English-Spanish dictionary and a dictionary of French prepositions), and a morphological analyzer of Arabic verbs (JABALÍN).

The LLI collaborates closely with several researchers and professors from the UAM Departments of Computer Science Engineering and Telecommunications Engineering. Since December 2009, the LLI has collaborated with the Instituto de Ingeniería del Conocimiento, a private, non-profit research and development institution on the UAM Cantoblanco campus.

RESEARCH LINES

Compilation of spoken and written corpora, including multilingual and multimodal corpora

Acoustic databases

Linguistic annotation at all levels: phonological, morphological, syntactic, semantic and pragmatic

Treebanks

Information retrieval

Electronic dictionaries

Machine translation

Tools for managing linguistic corpora (oral and written, contemporary and diachronic)

Electronic tools for linguistic and/or philological studies

Computational grammars

Terminology

COMPLETED Ph.D. THESES

PhD Student
Xiaohan Zhang

Title

Análisis de los tiempos verbales del español empleados por estudiantes chinos mediante técnicas de Lingüística de Corpus

Defended in July 2022


Directors
Antonio Moreno Sandoval
Paula Gozalo Gómez

PhD Student
Nuria Aldama

Title

Disambiguating Spanish se constructions with machine learning techniques

Defended in December 2021

Director
Antonio Moreno Sandoval

PhD Student
Patricia Elhazaz Walsh

Title
Análisis de la fluidez lectora y la interlengua oral en un corpus de aprendices de inglés como lengua extranjera

Defended on January 29, 2021

Directors
Leonardo Campillos Llanos
Daniel Bolaños Alonso

PhD Student
Yuanyi Liu

Title
Diccionario de terminología médica español-chino basado en corpus

Defended on September 4, 2018

Director
Antonio Moreno Sandoval

PhD Student

Marta Vacas Matos

Title
Diseño y compilación de un corpus multimodal de análisis pragmático para la aplicación a la enseñanza de español L2/LE

Defended on September 9, 2017

Directors
Antonio Moreno Sandoval
Paula Gozalo Gómez

PhD Student

Carlos Herrero Zorita

Title
Modality in spoken Spanish and Japanese: a corpus-based study and automatic annotation

Defended on May 11, 2017

Director
Antonio Moreno Sandoval

PhD Student

Emi Takamori

Title
Análisis de usos de partículas japonesas basado en corpus de estudiantes españoles

Defended on June 18, 2014

Director
Antonio Moreno Sandoval

PhD Student

Alicia González Martínez

Title
A computational model of Modern Standard Arabic verbal morphology based on generation

Defended on January 29, 2013

Director
Antonio Moreno Sandoval

PhD Student

Leonardo Campillos Llanos

Title
La expresión oral en español lengua extranjera: interlengua y análisis de errores basado en corpus

Defended on December 17, 2012

Directors
Antonio Moreno Sandoval
Paula Gozalo Gómez

PhD Student
Ana Valverde Mateos

Title
Análisis de errores de aprendientes de francés lengua extranjera (FLE) basado en corpus orales

Defended on June 4, 2012

Directors
Antonio Moreno Sandoval
Concepción Sanz Miguel (UCLM)

PhD Student
Marta Garrote Salazar

Title
CHIEDE: corpus de habla infantil espontánea del español

Defended in 2008

Director
Antonio Moreno Sandoval

PhD Student

Doaa Ahmed Samy

Title
Recursos bilingües de Ingeniería Lingüística para el procesamiento del español y árabe

Defended in 2005

Director
Antonio Moreno Sandoval

PhD Student

Manuel Alcántara Pla

Title
Anotación y recuperación de información semántica eventiva en corpus

Defended in 2005

Director
Antonio Moreno Sandoval

ONGOING Ph.D. THESIS

PhD Student
Blanca Carbajo Coronado

Provisional Title

Tratamiento computacional de las relaciones de causa-efecto en español con técnicas de aprendizaje automático

Director
Antonio Moreno Sandoval

CONTACT

Contact person

Antonio Moreno Sandoval

Telephone
(+34) 91 497 52 50 / (+34) 91 497 87 07

Department of Linguistics, Modern Languages, Logic and Philosophy of Science
Facultad de Filosofía y Letras – Universidad Autónoma de Madrid

Cantoblanco campus, Carretera de Colmenar, km. 16, 28049 Madrid