Linguistic Analyses of the Spanish Language through Electronic Resources
Funded by UAM and the Santander Bank
Period: 1 July 2013 through 31 December 2014
Principal Investigator in Spain: Antonio Moreno Sandoval (UAM)
Principal Investigators in Japan: Hiroto Ueda (University of Tokyo), Toshihiro Takagaki (Tokyo University of Foreign Studies), Antonio Ruiz Tinoco (Sophia University)
This project is a collaboration between the Laboratorio de Lingüística Informática (LLI) and three Japanese institutions: Tokyo University of Foreign Studies (TUFS), University of Tokyo, and Sophia University. The goals of the project are: 1. to reuse electronic linguistic resources developed by each team, and 2. to integrate corpora and electronic resources to perform linguistic analyses on the Spanish language.
Goals of the project
- Spanish corpora developed by each team at LLI, TUFS, Sophia University and the University of Tokyo.
- Part-of-Speech (PoS) analyzers such as the GRAMPAL tagger (LLI-UAM).
- The LETRAS tool, developed by Hiroto Ueda (University of Tokyo), for advanced text search. Its features include word and linguistic pattern search, frequency counts, and Key Word in Context (KWIC) display.
- To gather and reuse the different corpora and taggers to carry out research on the Spanish language.
- To develop and improve electronic tools for language analysis (e.g. Part-of-Speech analyzers).
- To share didactic materials for Spanish.
- To publish the research results obtained jointly by the participating researchers.
All of this will strengthen institutional relations between the universities involved, which are highly ranked both nationally and internationally. An international agreement and an exchange program for teachers and students already exist between TUFS and UAM.
- Design of resource integration: analysis of needs and costs for integrating the corpora into the electronic search interface.
- Adaptation of existing resources to the format needed for the electronic search interface.
- Integration of electronic linguistic resources: the GRAMPAL PoS analyzer (Laboratorio de Lingüística Informática) and the LETRAS tool (developed by Professor Hiroto Ueda, University of Tokyo).
- Indexing of the corpora in the database of the search interface.
- Development of the web interface, following the methodology from the LLI.
- Linguistic analyses using the electronic resources (e.g. frequency lists or analysis of syntactic patterns).
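To illustrate the kind of analysis named in the last task, a frequency list and a Key Word in Context (KWIC) view can be sketched in a few lines of Python. This is only a minimal sketch under simple assumptions (whitespace-and-punctuation tokenization, no PoS tagging); it does not reproduce GRAMPAL or LETRAS, and the function names and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep runs of word characters.
    # In Python 3, \w also matches accented Spanish letters such as "ñ".
    return re.findall(r"\w+", text.lower())


def frequency_list(tokens):
    # Return (word, count) pairs sorted from most to least frequent.
    return Counter(tokens).most_common()


def kwic(tokens, keyword, window=3):
    # Collect each occurrence of the keyword with up to `window`
    # tokens of context on either side.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Hypothetical sample text, not taken from any project corpus.
sample = "La casa es grande. La niña vive en la casa azul."
tokens = tokenize(sample)

for word, count in frequency_list(tokens)[:3]:
    print(word, count)

for left, key, right in kwic(tokens, "casa", window=2):
    print(f"{left:>10} [{key}] {right}")
```

A real corpus interface would index the texts in a database rather than scan a token list, and would search over PoS-tagged tokens to support syntactic patterns.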
- UAM: Laboratory of Computational Linguistics
- Antonio Moreno-Sandoval (Principal investigator)
- Leonardo Campillos Llanos (Postdoctoral researcher)
- Carlos Herrero Zorita (PhD student)
- Paula Gozalo Gómez (Teacher of Spanish as a Foreign Language at UAM Language Service)
- Théophile Ambadiang (Associate professor)
- Tokyo University of Foreign Studies
- Toshihiro Takagaki (Professor)
- Ryo Tsutahara (PhD student)
- Sophia University
- Antonio Ruiz Tinoco (Professor)