Laboratorio de Lingüística Informática

C-ORAL-ROM

C-ORAL-ROM is a multilingual corpus of spoken Romance languages: French, Italian, Portuguese and Spanish. The project was funded by the EU within the V Framework Programme (IST-2000-26228), and the consortium comprises nine partners coordinated by the University of Florence. The most significant feature of C-ORAL-ROM is the spontaneity of its texts: they were recorded in real contexts and without a script. Each subcorpus is made up of 300,000 words, with the same textual distribution to guarantee comparability and representativeness. The resource is presented in different formats: an orthographic transcription, an XML-tagged version and the text-sound alignment. Partial linguistic annotation of the texts and programs to handle the corpus are also provided.

CONSULT ONLINE

HOW TO OBTAIN THE PRODUCT

The corpus is available in two formats:

  1. Book + DVD published by John Benjamins
  2. For R&D purposes, through ELDA

ESSENTIAL DATA

CORPUS SAMPLE

C-ORAL-ROM ELE

SELECTED PUBLICATIONS


MAIN REFERENCE

OTHER PUBLICATIONS