Spoken and Written Corpus Development
Our corpus development methodology is a systematic process developed at the Universidad Autónoma de Madrid, Spain.
This service is provided under an agreement signed by the client and the LLI. It covers every stage of development: corpus design, data capture, and the subsequent analysis, annotation and enrichment of the texts.
• Preliminary design, taking into account sociolinguistic features (age, sex, demographic data, linguistic origin, education, etc.) and the communicative context. These variables can be modified to suit the aims of the research, and the design can be adapted to other variables.
• Data collection (audio recordings, video capture, editing)
• Orthographic transcription (recording both the normative form and what was actually uttered)
• Prosodic annotation: pauses, vowel lengthening, overlaps, interruptions, intonation, etc.
• Sentence-level alignment of text and audio units
• Semi-automatic morphological annotation (morphological tags and lemmas)
• Automatic phonological annotation
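The stages above produce several layers of information that have to stay linked to one another. As a minimal sketch, assuming a simple in-memory schema (the classes `Speaker`, `Token` and `Segment`, the function `find_overlaps`, and the tag values are all hypothetical, not the LLI's actual format), each sentence-level segment aligned with the audio can point to its speaker's sociolinguistic metadata and carry the transcription plus its lemmatized tokens. Overlaps between speakers can then be flagged directly from the alignment timestamps.

```python
from dataclasses import dataclass, field


@dataclass
class Speaker:
    # Sociolinguistic variables fixed at design time (illustrative subset only)
    speaker_id: str
    age: int
    sex: str
    origin: str
    education: str


@dataclass
class Token:
    form: str   # word as it appears in the orthographic transcription
    lemma: str  # lemma from the (semi-automatic) morphological annotation
    pos: str    # morphological tag; the tagset here is hypothetical


@dataclass
class Segment:
    # One sentence-level unit aligned with the audio; times are in seconds
    speaker_id: str
    start: float
    end: float
    text: str
    tokens: list[Token] = field(default_factory=list)


def find_overlaps(segments: list[Segment]) -> list[tuple[int, int]]:
    """Return index pairs of segments by different speakers whose time spans overlap."""
    ordered = sorted(range(len(segments)), key=lambda i: segments[i].start)
    overlaps = []
    for pos, i in enumerate(ordered):
        for j in ordered[pos + 1:]:
            if segments[j].start >= segments[i].end:
                break  # sorted by start time, so no later segment can overlap i
            if segments[i].speaker_id != segments[j].speaker_id:
                overlaps.append((i, j))
    return overlaps


if __name__ == "__main__":
    speakers = {
        "S1": Speaker("S1", 34, "F", "Madrid", "university"),
        "S2": Speaker("S2", 58, "M", "Sevilla", "secondary"),
    }
    segments = [
        Segment("S1", 0.0, 2.5, "yo creo que sí",
                [Token("yo", "yo", "PRON"), Token("creo", "creer", "VERB"),
                 Token("que", "que", "SCONJ"), Token("sí", "sí", "ADV")]),
        Segment("S2", 2.1, 3.0, "claro"),
        Segment("S1", 3.2, 5.0, "pues eso"),
    ]
    for i, j in find_overlaps(segments):
        a, b = segments[i], segments[j]
        print(f"overlap: {a.speaker_id} '{a.text}' / {b.speaker_id} '{b.text}'")
```

Keeping speaker metadata in a separate table keyed by `speaker_id`, rather than copying it into every segment, makes it easy to adjust the design variables later without touching the aligned transcription.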
PROCEEDINGS
8º Congreso de Lingüística General
1st Conference on Digital Humanities:
• Paul Rayson (Lancaster University, CLARIN Ambassador): Linking digital humanities with NLP and corpus linguistics
• Antonio Moreno Sandoval (LLI-UAM): Herramientas digitales para literatura: el diccionario de lemas y formas del Quijote (Digital tools for literature: the dictionary of lemmas and word forms of the Quijote).
• Alicia González (Universität Hamburg): Digital humanities for Classical Arabic: applications for historians and philologists.