Multimodal and Multilingual Advanced Answers Search: Linguistic Resources
Funded by CICYT
Project TIN2007-67407-C03-02
October 2007 to September 2010
The project aims to create a multimodal (text and voice) and multilingual answer search platform that integrates the modules developed by the participating groups. The starting hypothesis is that the answer search performance of current systems can be improved by working on the modules that make up the architecture of such a system: in particular, the multilingual IR modules, enhanced indexing, faster information access, better extraction and arrangement of answers, and question analysis. We deal with web information, encyclopaedic resources and news. Linguists' work is therefore essential to develop and/or adapt appropriate resources, as well as to integrate lexical and software resources.
We also aim to apply these techniques and this methodology to other areas, such as ontologies and information retrieval, named entities and voice interaction, investigating ways of adapting these tasks to new domains and languages.
Project goals
The main tasks of the LLI-UAM in BRAVO are:
- Creation of new multilingual resources in Arabic, Spanish and Japanese.
- Design and annotation of a Spanish speech corpus of questions.
- Definition of a model for question classification.
- Addition of linguistic resources to improve the handling of spontaneous speech, in order to adapt a speech recognizer to question formulation.
Results
Researchers
- Principal investigator: Antonio Moreno Sandoval
- Computer technician: José María Guirao Miras
- Other professors:
  - Théophile Ambadiang
  - Mohamed El-Madkouri
  - Chieko Kimura
  - Paula Gozalo Gómez
- Other researchers:
  - Manuel Alcántara
  - Doaa Samy
  - Ana González Ledesma
  - Marta Garrote Salazar
Papers
2011
- MORENO-SCHNEIDER, J., GARROTE-SALAZAR, M., MARTÍNEZ, P. and MARTÍNEZ FERNÁNDEZ, J.L. "Some experiments in evaluating ASR systems applied to multimedia retrieval", in Detyniecki, M., García-Serrano, A. and Nürnberger, A. (Eds.), Adaptive Multimedia Retrieval. Understanding Media and Adapting to the User. 7th International Workshop, AMR 2009, Madrid, Spain, September 24-25, 2009, Revised Selected Papers, Springer-Verlag, Lecture Notes in Computer Science, 6535, ISBN: 978-3-642-184, pp. 12-23.
2010
- CAMPILLOS LLANOS, L., GOZALO GÓMEZ, P., GUIRAO MIRAS, J. Mª and MORENO SANDOVAL, A. Español oral en contexto. Vol. 1. Textos de español oral. Material de ELE basado en corpus. Comprensión auditiva. Madrid: Servicio de publicaciones de la Universidad Autónoma de Madrid. 2010. ISBN 978-84-8344-181-7.
- GARROTE, M. and MORENO SANDOVAL, A. "Chiede. A spontaneous child language corpus of Spanish". In Moneglia and Panunzi (eds.): Bootstrapping Information from Corpora in a Cross-Linguistic Perspective. Firenze University Press, pp. 121-140. ISBN 978-88-8453-518-4.
- GARROTE, M. Los corpus de habla infantil. Metodología y análisis. Servicio de publicaciones de la Universidad Autónoma de Madrid. ISBN 978-84-8344-187-9.
- VICENTE-DÍEZ, M., DE PABLO, C., MARTÍNEZ, P., MORENO-SCHNEIDER, J. and GARROTE-SALAZAR, M. "Are Passages Enough? The MIRACLE Team Participation in QA@CLEF2009", in Peters, C., Di Nunzio, G.M., Kurimo, M., Mandl, Th., Mostefa, D., Peñas, A. and Roda, G. (Eds.), Multilingual Information Access Evaluation I - Text Retrieval Experiments. Springer-Verlag, ISBN: 978-3-642-157, vol. 6241, pp. 281-288.
2009
- ALCÁNTARA PLÁ, M. and DECLERCK, T. Proceedings of the EACL 2009 Workshop on Semantic Representation of Spoken Language; Athens: ACL, 2009.
- CAMPILLOS, L. and ALCÁNTARA, M. "Speech Disfluencies in Formal Context. Analysis Based on Spontaneous Speech Corpora", in Corpus Linguistics Conference, Liverpool. 2009.
- GONZÁLEZ LEDESMA, A. Los marcadores del discurso en el corpus C-ORAL-ROM: anotación pragmática, estrategias computacionales de etiquetado y aplicaciones a otros campos. 2009. Universidad Autónoma de Madrid.
- MORENO SANDOVAL, A. and GUIRAO MIRAS, J.M. "Frecuencia y distintividad en el uso lingüístico: casos tomados de la lematización verbal de corpus de distintos registros", in Actas del I Congreso Internacional de Lingüística de Corpus (CILC-09), Universidad de Murcia, 2009.
2008
- ALCÁNTARA PLÁ, M. "El análisis lingüístico en la transcripción automática de la lengua hablada, el Proyecto COAST", in Actas del VIII Congreso de Lingüística General: El valor de la diversidad [meta]lingüística, Madrid, 2008.
- CAMPILLOS, L. "Las expresiones causales en el corpus de habla espontánea C-ORAL-ROM". In Actas del 8º Congreso de Lingüística General, Universidad Autónoma de Madrid, 25-28 June 2008.
- DE PABLO SÁNCHEZ, C., MARTÍNEZ FERNÁNDEZ, J.L., GONZÁLEZ LEDESMA, A., SAMY, D., MARTÍNEZ, P., MORENO, A. and ALJUMAILY, H. "Combining Wikipedia and newswire text for Question Answering in Spanish", in Carol Peters, Valentin Jijkoun, Thomas Mandl, Henning Müller, Douglas W. Oard, Anselmo Peñas, Vivien Petras, Diana Santos (Eds.): Advances in Multilingual and Multimodal Information Retrieval, 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, Budapest, Hungary, September 19-21, 2007, Revised Selected Papers. Lecture Notes in Computer Science 5152, Springer, 2008, ISBN 978-3-540-85759-4, pp. 352-355.
- GARROTE, M., GUIRAO, J.M. and MORENO, A. "Extracción de unidades distintivas en adultos y niños de un corpus de lengua oral espontánea". In Actas del 8º Congreso de Lingüística General, Universidad Autónoma de Madrid, 25-28 June 2008.
- GONZÁLEZ LEDESMA, A. and SAMY, D. "Marcadores discursivos en árabe y español: un estudio computacional basado en corpus paralelos con anotación pragmática". In Actas del 8º Congreso de Lingüística General, Universidad Autónoma de Madrid, 25-28 June 2008.
- GOZALO, P. "Reflexiones sobre el futuro. Los datos del español no nativo". In Actas del 8º Congreso de Lingüística General, Universidad Autónoma de Madrid, 25-28 June 2008.
- MORENO SANDOVAL, A., T. TOLEDANO, D., DE LA TORRE, R., GARROTE, M. and GUIRAO, J.M. "Developing a Phonemic and Syllabic Frequency Inventory for Spontaneous Spoken Castilian Spanish and their Comparison to Text-Based Inventories". In Proceedings of LREC 2008, Marrakech, 28-30 May 2008.
- SAMY, D. and GONZÁLEZ LEDESMA, A. "Pragmatic Annotation of Discourse Markers in a Multilingual Parallel Corpus (Arabic-Spanish-English)". In Proceedings of LREC 2008, Marrakech, 28-30 May 2008.
- SEGURA BEDMAR, I., MARTÍNEZ, P. and SAMY, D. "Detección de fármacos genéricos en textos biomédicos", March 2008, Revista Española para el procesamiento del lenguaje natural (SEPLN), ISSN: 1135-5948, pp. 27-34.
- SEGURA BEDMAR, I., MARTÍNEZ, P. and SAMY, D. "A preliminary approach to recognize generic drug names by combining UMLS resources and USAN naming conventions", Ohio, USA, June 2008, Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing (BioNLP), Association for Computational Linguistics, ISBN: 978-1-932432-, pp. 100-101.
- SEGURA BEDMAR, I., SAMY, D., MARTÍNEZ FERNÁNDEZ, J.L. and MARTÍNEZ, P. "Detecting Semantic Relations between Nominals using Support Vector Machines and Linguistic-Based Rules", Portugal, November 2007, On the Move to Meaningful Internet Systems 2007: OTM 2007 Workshops, Springer Berlin / Heidelberg, ISBN: 978-3-540-768, ISSN: 0302-9743, pp. 1267-1273.
- VICENTE DÍEZ, M., SAMY, D. and MARTÍNEZ, P. "An empirical approach to a preliminary successful identification and resolution of temporal expressions in Spanish news corpora", Proceedings of the Sixth International Language Resources and Evaluation Conference (LREC'08), Marrakech, Morocco, May 2008, European Language Resources Association (ELRA), ISBN: 2-9517408-4-0, pp. 2153-2158.