Laboratorio de Lingüística Informática


MAVIR Corpus

The MAVIR corpus is a collection of audio and video recordings, together with their orthographic transcriptions, that have been computationally processed. Its main purpose is to support research in Natural Language Processing and Speech Technology.

Consult the MAVIR corpus

The recordings come from lectures and talks on language technology held within the framework of the MAVIR Consortium ('Improving Access and Visibility of Multilingual Information Online for the Comunidad de Madrid').

The corpus consists of 13 audio and video recordings in Spanish and English. The data were collected during the I, II and III MAVIR Conferences, held in Madrid in 2006, 2007 and 2008 respectively.

The following table shows the data regarding the recordings and transcriptions:

File | Title | Length | Words | Language
mavir01 | Challenges for Information Extraction | 1h 07' 39" | 9,113 | English
mavir02 | Proceso de innovación de tecnologías de acceso a la información: ¿Cómo llegar al mercado? | 1h 14' 32" | 13,422 | Spanish
mavir03 | España y los buscadores: un mercado potencial | 38' 11" | 6,681 | Spanish
mavir04 | Aplicaciones en dominios médico y cultural | 57' 22" | 9,310 | Spanish
mavir05 | On-demand Information Extraction | 36' 08" | 4,461 | English
mavir06 | Buscador General Panhispánico | 29' 09" | 4,332 | Spanish
mavir07 | Tecnología de la Web Semántica | 21' 47" | 3,831 | Spanish
mavir08 | Premio MAVIR 2007 | 18' 55" | 3,356 | Spanish
mavir09 | Buenas prácticas en presencia web para grupos de investigación | 1h 10' 03" | 11,179 | Spanish
mavir10 | Multimedia Retrieval and Evaluation | 1h 27' 24" | 15,659 | English
mavir11 | Premio MAVIR 2008 | 20' 20" | 3,130 | Spanish
mavir12 | Beyond Text-based Multimedia Retrieval | 1h 07' 40" | 11,168 | Spanish
mavir13 | Buscando cangrejos en Flickr | 43' 38" | 7,837 | Spanish
TOTAL | | 10h 38' 48" | 103,479 |
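The word counts in the table can be cross-checked with a few lines of Python. The sketch below is only illustrative: the `RECORDINGS` list and the `words_by_language` helper are hypothetical names introduced here, not part of the corpus distribution, and the figures are copied by hand from the table (including the language label given there for each file).

```python
from collections import defaultdict

# (file, word count, language) copied by hand from the table above.
# This list is an illustrative assumption, not a file shipped with the corpus.
RECORDINGS = [
    ("mavir01", 9113, "English"),
    ("mavir02", 13422, "Spanish"),
    ("mavir03", 6681, "Spanish"),
    ("mavir04", 9310, "Spanish"),
    ("mavir05", 4461, "English"),
    ("mavir06", 4332, "Spanish"),
    ("mavir07", 3831, "Spanish"),
    ("mavir08", 3356, "Spanish"),
    ("mavir09", 11179, "Spanish"),
    ("mavir10", 15659, "English"),
    ("mavir11", 3130, "Spanish"),
    ("mavir12", 11168, "Spanish"),
    ("mavir13", 7837, "Spanish"),
]


def words_by_language(recordings):
    """Sum the transcription word counts for each language label."""
    totals = defaultdict(int)
    for _file, words, language in recordings:
        totals[language] += words
    return dict(totals)


totals = words_by_language(RECORDINGS)
total_words = sum(totals.values())

print(totals)
print(total_words)
```

Summed this way, the 13 per-file counts come to 103,479 words, of which 29,233 are in the recordings labelled English and 74,246 in those labelled Spanish.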

The MAVIR Consortium, which distributes the corpus, recognizes that the copyright of the contents belongs to each speaker.

The corpus is freely available for non-commercial, non-profit purposes, provided that the data are not modified and that all research publications acknowledge the use of the MAVIR corpus.

Copyright © by MAVIR Consortium. All rights reserved.
