Laboratorio de Lingüística Informática
(MLAP-93/20)
PROJECT SUMMARY
This project proposes to create a set of tools and resources for multilingual corpus linguistics.
TOOLS
A database model for storing annotated and aligned parallel multilingual corpora, based on the model developed under the ET10-63 project.
A language-independent statistical package for sentence alignment.
A software package for text retrieval and corpus browsing.
A part-of-speech tagger for Spanish.
RESOURCES
A 1M-word parallel trilingual (English, French, Spanish) subcorpus of the ITU corpus, part-of-speech annotated and sentence aligned (POS annotation manually corrected).
Mono- and multilingual lexical resources (lexicons, term banks).
PARTNERSHIP
Full partners
Subcontractors
IBM-France
ETSI Telecomunicación, UPM, Spain