Spoken and Written Corpus Development
The methodology for building corpora is a systematized process
developed by the Universidad Autónoma de Madrid,
Spain. This service is offered through an agreement signed by the
client and the LLI. The work covers every stage of development:
corpus design, data capture, and the subsequent analysis, annotation
and enrichment of the texts.
- Preliminary design, taking into account the sociolinguistic
features of the speakers (age, sex, demographic data, linguistic
origin, education, etc.) and the communicative context. These
features can be adjusted to the aims of the research, and the design
can be adapted to different variables.
- Data collection (recordings, video captures, editing).
- Orthographic transcription (covering the normative form as well as
the actual pronunciation).
- Prosodic annotation: pause marks, vowel lengthening, overlaps,
interruptions, intonation, etc.
- Alignment of text-sound units.
- Semi-automatic morphological annotation (morphological information
- Automatic phonological annotation.
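
To make the stages above more concrete, the following is a minimal sketch of how the resulting data might be organised in Python. It is not the LLI's actual format: the class names, field names and sample values are all hypothetical, chosen only to mirror the items in the list (speaker features from the preliminary design, the two forms of the transcription, prosodic marks, and text-sound alignment expressed as start and end times).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Speaker:
    """Sociolinguistic features of one speaker (illustrative fields only)."""
    speaker_id: str
    age: int
    sex: str
    linguistic_origin: str
    education: str


@dataclass
class Utterance:
    """One transcribed unit aligned to a span of the recording."""
    speaker_id: str
    start_s: float   # text-sound alignment: start time in seconds
    end_s: float     # text-sound alignment: end time in seconds
    normative: str   # normative orthographic form
    as_spoken: str   # form as actually pronounced
    prosody: List[str] = field(default_factory=list)  # e.g. "pause", "overlap"

    def duration(self) -> float:
        """Length of the aligned audio span in seconds."""
        return self.end_s - self.start_s


def utterances_by_speaker(utterances: List[Utterance]) -> Dict[str, List[Utterance]]:
    """Group utterances by speaker id, preserving their original order."""
    grouped: Dict[str, List[Utterance]] = {}
    for utt in utterances:
        grouped.setdefault(utt.speaker_id, []).append(utt)
    return grouped


# Invented sample data, for illustration only.
speaker = Speaker("S01", 34, "F", "Madrid", "university")
sample = Utterance("S01", 12.5, 14.0, "para nada", "pa na", ["pause"])
print(speaker.speaker_id, sample.as_spoken, sample.duration())
```

Keeping the speaker features in a separate record from the utterances means the design variables can be changed for a given study without touching the transcriptions themselves.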