Spoken and Written Corpus Development

The corpus-building methodology is a systematized process developed by the Universidad Autónoma de Madrid, Spain.

This service is offered through an agreement signed by the client and the LLI. The work covers every stage of development: corpus design, data capture, and the subsequent analysis, annotation and enrichment of the texts.

Preliminary design, taking into account sociolinguistic features (age, sex, demographic data, linguistic origin, education, etc.) and the communicative context. This information can be modified depending on the aims of the research, and the design can be adapted to different variables.

Data collection (recordings, video captures, editing)

Orthographic transcription (including both the normative form and the utterance as actually spoken)

Prosodic annotation: pause marks, vowel lengthening, overlaps, interruptions, intonation, etc.

Alignment of text and sound units at the sentence level

Semi-automatic morphological annotation (morphological information and lemmas)

Automatic phonological annotation
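
As an illustration only, the stages above could feed a record like the following Python sketch, which holds the speaker's sociolinguistic features, the orthographic transcription, the prosodic marks, the sentence-level time alignment and the morphological annotation for one utterance. All class names, field names and values here are hypothetical and invented for the example; they do not describe the LLI's actual file formats, tools or tagsets.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Speaker:
    # Sociolinguistic features from the preliminary design stage.
    speaker_id: str
    age: Optional[int] = None
    sex: Optional[str] = None
    linguistic_origin: Optional[str] = None
    education: Optional[str] = None


@dataclass
class Token:
    # One word with its morphological annotation.
    form: str   # the word as transcribed
    lemma: str  # its dictionary form
    morph: str  # free-form tag; no particular tagset is assumed


@dataclass
class Utterance:
    speaker: Speaker
    transcription: str  # orthographic transcription
    start_s: float      # alignment: start time in the recording (seconds)
    end_s: float        # alignment: end time in the recording (seconds)
    prosodic_marks: List[str] = field(default_factory=list)
    tokens: List[Token] = field(default_factory=list)

    def duration(self) -> float:
        """Length of the aligned sound segment in seconds."""
        return self.end_s - self.start_s

    def lemmas(self) -> List[str]:
        """Lemmas of the utterance, in token order."""
        return [t.lemma for t in self.tokens]


# Invented sample data, for illustration only.
example = Utterance(
    speaker=Speaker(speaker_id="S01", age=34, sex="F"),
    transcription="las casas",
    start_s=0.0,
    end_s=1.5,
    prosodic_marks=["pause"],
    tokens=[
        Token(form="las", lemma="el", morph="DET"),
        Token(form="casas", lemma="casa", morph="NOUN"),
    ],
)
```

Keeping the alignment times and the annotation layers on the same record means a search over lemmas can lead straight back to the matching stretch of audio, which is the practical point of aligning text and sound.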