Laboratorio de Lingüística Informática
This project proposes to create a set of tools and resources for multilingual corpus linguistics:
A database model for storing annotated and aligned parallel multilingual corpora, based on the model developed under the ET10-63 project.
A language-independent statistical package for sentence alignment.
A software package for text retrieval and corpus browsing.
A part-of-speech tagger for Spanish.
A 1-million-word parallel trilingual (English, French, Spanish) subcorpus of the ITU corpus, annotated for part of speech and aligned at the sentence level (POS annotation manually corrected).
Mono- and multilingual lexical resources (lexicons, term banks).