Laboratorio de Lingüística Informática

MultiMedica


Funded by CICYT
Project TIN2010-20644-C03-03
January 2011 to June 2014

The project's main goal is to process health texts written for the general public in languages such as Spanish, Arabic and Japanese. It produces two tools: a generic search tool for finding information about diseases and drugs, and a specialised tool for translation and terminology teaching in the biomedical domain, built on terminology extraction from the comparable Spanish, Arabic and Japanese corpus.

The UAM team's work has focused on developing linguistic resources (medical corpora in three languages) and creating an automatic medical term extractor for Spanish, Japanese and Arabic.


Project web page (Faculty of Computer Science, Universidad Carlos III de Madrid)

Multilingual prototype of term query tool and automatic medical term extractor


Current stage of the project

A search tool and a medical term extractor were developed to consult the corpus (click on the image below to access the interface):


Click here to consult the corpus

The following data were collected:


Summary table

CORPUS                                              Documents   Words/characters
JAPANESE                                                 3746            1131304
  Kampo Medicine (Oriental Medicine in Japan)             719             214757
  Kansenshogaku Zasshi (Infectious Diseases Journal)      858             244879
  Kanzo (Liver Diseases Journal)                         1446             432674
  ORLTokyo (Japanese Otolaryngology)                      623             203705
  Sanfujinka no shinpo (Advances in Obstetrics)           100              35289
ARABIC                                                  43526            2559323
  Altibbi                                               43278            2460733
  Alawsat                                                  68              58610
  Youm7                                                    83              18948
  ElKhabar                                                 97              21032
SPANISH                                                  4204            4031174
  Harrison                                               3841            3696484
  OCU-Salud                                               297             310894
  Tu otro médico                                           66              23796
TOTAL                                                   51476            7721801



Featured publications

MORENO SANDOVAL, A., L. CAMPILLOS LLANOS, C. HERRERO ZORITA, J. M. GUIRAO MIRAS, A. GONZÁLEZ MARTÍNEZ, D. SAMY and E. TAKAMORI (2014) "An online tool for enhancing NLP of a biomedical corpus". 6º Congreso Internacional de Lingüística de Corpus (6th International Conference on Corpus Linguistics - CILC 2014). Las Palmas de Gran Canaria, May 22-24 2014.

MORENO SANDOVAL, A., and L. CAMPILLOS LLANOS (2013) "Design and annotation of MultiMedica - a multilingual text corpus of the biomedical domain". In Procedia - Social and Behavioral Sciences, 95, pp. 33-39 (Selected proceedings of the 5th International Conference on Corpus Linguistics 2013, University of Alicante, Spain. March 14-16 2013). Berlin: Elsevier. ISSN: 1877-0428.


Publications

2015

CAMPILLOS LLANOS, L. and H. UEDA (2015) "Frecuencia y dispersión léxicas en textos médicos divulgativos en español". In Ibérica, 29 (accepted, to be published in 2015).

HERRERO-ZORITA, C., MOLINA, C. and MORENO-SANDOVAL, A. (2015) "Medical term formation in English and Japanese: A study of the suffixes -gram, -graph and -graphy". In Review of Cognitive Linguistics, 13 (1) (accepted, to be published in April 2015).

2014

HERRERO ZORITA, C., L. CAMPILLOS LLANOS, and A. MORENO SANDOVAL (2014) "Collecting and POS-tagging a lexical resource of Japanese biomedical terms from a corpus". Procesamiento del Lenguaje Natural, 52, pp. 29-36. ISSN: 1989-7553.

MORENO SANDOVAL, A., L. CAMPILLOS LLANOS, C. HERRERO ZORITA, J. M. GUIRAO MIRAS, A. GONZÁLEZ MARTÍNEZ, D. SAMY and E. TAKAMORI (2014) "An online tool for enhancing NLP of a biomedical corpus". 6th International Conference on Corpus Linguistics (CILC 2014). Las Palmas de Gran Canaria, 22-24 May 2014.

2013

CAMPILLOS LLANOS, L., MORENO SANDOVAL, A., and J. M. GUIRAO (2013) "An automatic term extractor for biomedical terms in Spanish". In Proceedings of the 5th International Symposium on Languages in Biology and Medicine (LBM 2013). 12th and 13th December 2013. Tokyo, Japan. Best poster award.

HERRERO ZORITA, C., A. MORENO SANDOVAL, and L. CAMPILLOS LLANOS (2013) "Technology and Terminology. The Case of Japanese Medical Terms". Via Japan International Conference: Japan-Imprinted Discourses. Universidad Autónoma de Madrid. 24th October 2013.

HERRERO ZORITA, C. (2013) "An initial approach on medical term formation in Japanese through the usage of corpora". In Andrew Hardie and Robbie Love (eds.) Proceedings of the 7th Corpus Linguistics Conference 2013, pp. 339-341. Lancaster University (United Kingdom). 23rd-26th July 2013. Lancaster: UCREL.

LANA-SERRANO, S., D. SÁNCHEZ-CISNEROS, L. CAMPILLOS LLANOS, and I. SEGURA-BÉDMAR (2013) "Recognizing Chemical Compounds and Drugs: a Rule-Based Approach Using Semantic Information". In M. Krallinger, F. Leitner, O. Rabal, M. Vázquez, J. Oyarzábal, and A. Valencia (eds.) Proceedings of the Fourth BioCreative Challenge Evaluation Workshop vol. 2, pp. 121-128. Washington, DC, USA. 8th October 2013. ISBN: 978-84-933255-8-9.

MORENO SANDOVAL, A., and L. CAMPILLOS LLANOS (2013) "Design and annotation of MultiMedica - a multilingual text corpus of the biomedical domain". In Procedia - Social and Behavioral Sciences, 95, pp. 33-39 (Selected proceedings of the 5th International Conference in Corpus Linguistics 2013, University of Alicante, Spain. 14th-16th March 2013). Berlin: Elsevier. ISSN: 1877-0428.

MORENO SANDOVAL, A., L. CAMPILLOS LLANOS, A. GONZÁLEZ MARTÍNEZ, and J. M. GUIRAO (2013) "An affix-based method for automatic term recognition from a medical corpus of Spanish". In Andrew Hardie and Robbie Love (eds.) Proceedings of the 7th Corpus Linguistics Conference 2013. Lancaster University (United Kingdom), pp. 214-217. 23rd-26th July 2013. Lancaster: UCREL.

SÁNCHEZ-CISNEROS, D., S. LANA-SERRANO, I. SEGURA-BÉDMAR, L. CAMPILLOS LLANOS, and P. MARTÍNEZ FERNÁNDEZ (2013) "A web prototype for detecting chemical compounds and drugs". In Adrian Paschke, Albert Burger, Paolo Romano, M. Scott Marshall, Andrea Splendiani (eds.) Proceedings of the 6th International Workshop on Semantic Web Applications and tools for life sciences (SWAT4LS), 10th December 2013. Edinburgh, UK. ISSN: 1613-0073.

2012

LANA-SERRANO, S., D. SÁNCHEZ-CISNEROS, P. MARTÍNEZ FERNÁNDEZ, A. MORENO SANDOVAL, and L. CAMPILLOS LLANOS (2012) "An Approach for Detecting Modality and Negation in Texts by Using Rule-based Techniques". CLEF (Online Working Notes/Labs/Workshop) 2012. Rome, Italy, September, 2012. ISSN 2038-4963.

SAMY, D., A. MORENO SANDOVAL, C. BUENO-DÍAZ, M. GARROTE-SALAZAR, and J. M. GUIRAO (2012) "Medical Term Extraction in an Arabic Medical Corpus". In N. Calzolari, K. Choukri, T. Declerck, M. Uğur Doğan, B. Maegaard, J. Mariani, J. Odijk and S. Piperidis (eds.) Proceedings of the 8th Language Resources and Evaluation Conference 2012. 23-25 May 2012. Istanbul, Turkey. ISBN 978-2-9517408-7-7.

SÁNCHEZ-CISNEROS, D., S. LANA SERRANO, A. MORENO SANDOVAL, L. CAMPILLOS LLANOS, P. MARTÍNEZ FERNÁNDEZ, and I. SEGURA-BEDMAR (2012) "Prototipo de buscador de información médica en corpus multilingües y extractor de información sobre fármacos". Conference of the Spanish Society for Natural Language Processing (Sociedad Española para el Procesamiento del Lenguaje Natural, SEPLN 2012). Published in Procesamiento del Lenguaje Natural, nº 49, September 2012, pp. 209-212. ISSN 1135-5948.
