
Laboratorio de Lingüística Informática

Corpus Resources And Terminology ExtRaction


(MLAP-93/20)

PROJECT SUMMARY

This project proposes to create a set of tools and resources for multilingual corpus linguistics.

TOOLS

A database model for storing annotated, aligned parallel multilingual corpora, based on the model developed under the ET10-63 project.

A language-independent statistical package for sentence alignment.

A software package for text retrieval and corpus browsing.

A part-of-speech tagger for Spanish.

RESOURCES

A 1M-word parallel trilingual (English, French, Spanish) subcorpus of the ITU corpus, part-of-speech annotated and sentence-aligned, with the POS annotation manually corrected.

Mono- and multilingual lexical resources (lexicons, term banks).

PARTNERSHIP



