Laboratorio de Lingüística Informática
The MAVIR corpus is a collection of audio and video recordings with corresponding orthographic transcriptions, all computationally processed. Its main purpose is to support research in Natural Language Processing and Speech Technology.
The recordings come from lectures and talks on language technology held within the framework of the MAVIR Consortium ('Improving Access and Visibility of Multilingual Information Online for the Comunidad de Madrid').
The corpus comprises 13 recordings (audio and video) in Spanish and English. The data were collected during the first, second and third MAVIR Conferences, held in Madrid in 2006, 2007 and 2008 respectively.
The following table lists each recording with its title, length, transcription word count and language:
File | Title | Length | Words | Language |
mavir01 | Challenges for Information Extraction | 1h 07' 39" | 9,113 | English |
mavir02 | Proceso de innovación de tecnologías de acceso a la información: ¿Cómo llegar al mercado? | 1h 14' 32" | 13,422 | Spanish |
mavir03 | España y los buscadores: un mercado potencial | 38' 11" | 6,681 | Spanish |
mavir04 | Aplicaciones en dominios médico y cultural | 57' 22" | 9,310 | Spanish |
mavir05 | On-demand Information Extraction | 36' 08" | 4,461 | English |
mavir06 | Buscador General Panhispánico | 29' 09" | 4,332 | Spanish |
mavir07 | Tecnología de la Web Semántica | 21' 47" | 3,831 | Spanish |
mavir08 | Premio MAVIR 2007 | 18' 55" | 3,356 | Spanish |
mavir09 | Buenas prácticas en presencia web para grupos de investigación | 1h 10' 03" | 11,179 | Spanish |
mavir10 | Multimedia Retrieval and Evaluation | 1h 27' 24" | 15,659 | English |
mavir11 | Premio MAVIR 2008 | 20' 20" | 3,130 | Spanish |
mavir12 | Beyond Text-based Multimedia Retrieval | 1h 07' 40" | 11,168 | Spanish |
mavir13 | Buscando cangrejos en Flickr | 43' 38" | 7,837 | Spanish |
TOTAL |  | 10h 38' 48" | 103,479 |  |
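For readers who want to work with the figures in the table, the following minimal Python sketch shows one way to turn a Length cell into seconds and derive an average speaking rate for a recording. The function names `length_to_seconds` and `words_per_minute` are hypothetical helpers written for this illustration; they are not part of the corpus distribution, and the parsing assumes the hours/minutes/seconds notation used in the table.

```python
import re

# Matches lengths written as in the table, e.g. 1h 07' 39" or 38' 11".
# The hour part is optional; a typographic apostrophe and a missing
# closing double quote are tolerated (assumption made for robustness).
LENGTH_RE = re.compile(r"""(?:(\d+)h\s*)?(\d+)['’]\s*(\d+)"?""")


def length_to_seconds(text: str) -> int:
    """Convert a table length such as 1h 07' 39" into whole seconds."""
    match = LENGTH_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised length: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)


def words_per_minute(length: str, words: str) -> float:
    """Average speaking rate for one row (words divided by minutes)."""
    word_count = int(words.replace(",", ""))
    return word_count / (length_to_seconds(length) / 60)


# Two rows copied from the table above: (file, length, words).
rows = [
    ("mavir01", "1h 07' 39\"", "9,113"),
    ("mavir03", "38' 11\"", "6,681"),
]

for name, length, words in rows:
    print(f"{name}: {words_per_minute(length, words):.1f} words/min")
```

The same two helpers could be applied to every row to compare speaking rates across talks or languages.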
The MAVIR Consortium, which distributes the corpus, recognizes that the copyright in the contents belongs to each speaker.
The corpus is freely available for non-commercial, non-profit purposes, provided that the data are not modified and that all research publications acknowledge the use of the MAVIR corpus.
Copyright © by MAVIR Consortium. All rights reserved.