Laboratorio de Lingüística Informática
Funded by CICYT
Project TIN2010-20644-C03-03
January 2011 to June 2014
The project's main goal is the processing of popular-science texts about health topics in languages such as Spanish, Arabic and Japanese. The result is a generic search tool for finding information about diseases and drugs, and a specific tool for translation and terminology teaching in the biomedical domain, based on terminology extraction applied to a comparable corpus in Spanish, Arabic and Japanese.
The work done by the UAM team has focused on developing linguistic resources (medical corpora in three languages) and creating an automatic medical term extractor for Spanish, Japanese and Arabic.
Project web page (Faculty of Computer Science, Universidad Carlos III de Madrid)
Multilingual prototype of term query tool and automatic medical term extractor
Current stage of the project
A search tool and a medical term extractor were developed to query the corpus (click on the image below to access the interface):
The following data banks were collected:
Spanish corpus: it comprises three resources, each reflecting a different type of popular medical text.
Japanese corpus: it collects abstracts from medical journals in different specialities, from Oriental Medicine to Obstetrics and Gynaecology. Corpus size is measured in Japanese characters (kanji and kana).
Arabic corpus: it comprises documents from a medical web portal (Altibbi), a Jordanian initiative equivalent to North American portals such as Healthline. This resource contains medical articles and popular health news, with a certain degree of oversight by medical doctors. Most of the documents in the Arabic corpus were taken from Altibbi, complemented with texts from the health sections of three newspapers published in three geographical and dialect areas of the Arab world: Al-Awsat (Saudi Arabia), Youm7 (Egypt) and El Khabar (Algeria).
Summary table
Kampo Medicine (Oriental Medicine in Japan)
Kansenshogaku Zasshi (Infectious Diseases Journal)
Kanzo (Liver Diseases Journal)
ORLTokyo (Japanese Otolaryngology)
Sanfujinka no shinpo (Advances in Obstetrics)
Altibbi
Alawsat
Youm7
ElKhabar
Harrison
OCU-Salud
Tu otro médico
MORENO SANDOVAL, A., L. CAMPILLOS LLANOS, C. HERRERO ZORITA, J. M. GUIRAO MIRAS, A. GONZÁLEZ MARTÍNEZ, D. SAMY and E. TAKAMORI (2014) "An online tool for enhancing NLP of a biomedical corpus". 6º Congreso Internacional de Lingüística de Corpus (6th International Conference on Corpus Linguistics - CILC 2014). Las Palmas de Gran Canaria, May 22-24 2014.
MORENO SANDOVAL, A., and L. CAMPILLOS LLANOS (2013) "Design and annotation of MultiMedica - a multilingual text corpus of the biomedical domain". In Procedia - Social and Behavioral Sciences, 95, pp. 33–39 (Selected proceedings of the 5th International Conference in Corpus Linguistics 2013, University of Alicante, Spain. March 14-16 2013). Berlin: Elsevier. ISSN: 1877-0428.
2015
CAMPILLOS LLANOS, L. and H. UEDA (2015) "Frecuencia y dispersión léxicas en textos médicos divulgativos en español". In Ibérica, 29 (accepted, to be published in 2015).
HERRERO-ZORITA, C., MOLINA, C. and MORENO-SANDOVAL, A. (2015) "Medical term formation in English and Japanese: A study of the suffixes –gram, -graph and –graphy". In Review of Cognitive Linguistics, 13 (1) (accepted, to be published in April, 2015)
2014
HERRERO ZORITA, C., L. CAMPILLOS LLANOS, and A. MORENO SANDOVAL (2014) "Collecting and POS-tagging a lexical resource of Japanese biomedical terms from a corpus". Procesamiento del Lenguaje Natural, 52, pp. 29-36. ISSN: 1989-7553.
MORENO SANDOVAL, A., L. CAMPILLOS LLANOS, C. HERRERO ZORITA, J. M. GUIRAO MIRAS, A. GONZÁLEZ MARTÍNEZ, D. SAMY and E. TAKAMORI (2014) "An online tool for enhancing NLP of a biomedical corpus". 6th International Conference on Corpus Linguistics (CILC 2014). Las Palmas de Gran Canaria, 22-24 May 2014.
2013
CAMPILLOS LLANOS, L., MORENO SANDOVAL, A., and J. M. GUIRAO (2013) "An automatic term extractor for biomedical terms in Spanish". In Proceedings of the 5th International Symposium on Languages in Biology and Medicine (LBM 2013). 12th and 13th December 2013. Tokyo, Japan. Best poster award.
HERRERO ZORITA, C., A. MORENO SANDOVAL, and L. CAMPILLOS LLANOS (2013) "Technology and Terminology. The Case of Japanese Medical Terms". Via Japan International Conference: Japan-Imprinted Discourses. Universidad Autónoma de Madrid. 24th October 2013
HERRERO ZORITA, C. (2013) "An initial approach on medical term formation in Japanese through the usage of corpora". In Andrew Hardie and Robbie Love (eds.) Proceedings of the 7th Corpus Linguistics Conference 2013, pp. 339-341. Lancaster University (United Kingdom). 23rd-26th July 2013. Lancaster: UCREL.
LANA-SERRANO, S., D. SÁNCHEZ-CISNEROS, L. CAMPILLOS LLANOS, and I. SEGURA-BÉDMAR (2013) "Recognizing Chemical Compounds and Drugs: a Rule-Based Approach Using Semantic Information". In M. Krallinger, F. Leitner, O. Rabal, M. Vázquez, J. Oyarzábal, and A. Valencia (eds.) Proceedings of the Fourth BioCreative Challenge Evaluation Workshop vol. 2, pp. 121-128. Washington, DC, USA. 8th October 2013. ISBN: 978-84-933255-8-9.
MORENO SANDOVAL, A., and L. CAMPILLOS LLANOS (2013) "Design and annotation of MultiMedica - a multilingual text corpus of the biomedical domain". In Procedia - Social and Behavioral Sciences, 95, pp. 33-39 (Selected proceedings of the 5th International Conference in Corpus Linguistics 2013, University of Alicante, Spain. 14th-16th March 2013). Berlin: Elsevier. ISSN: 1877-0428.
MORENO SANDOVAL, A., L. CAMPILLOS LLANOS, A. GONZÁLEZ MARTÍNEZ, and J. M. GUIRAO (2013) "An affix-based method for automatic term recognition from a medical corpus of Spanish". In Andrew Hardie and Robbie Love (eds.) Proceedings of the 7th Corpus Linguistics Conference 2013. Lancaster University (United Kingdom), pp. 214-217. 23rd-26th July 2013. Lancaster: UCREL.
SÁNCHEZ-CISNEROS, D., S. LANA-SERRANO, I. SEGURA-BÉDMAR, L. CAMPILLOS LLANOS, and P. MARTÍNEZ FERNÁNDEZ (2013) "A web prototype for detecting chemical compounds and drugs". In Adrian Paschke, Albert Burger, Paolo Romano, M. Scott Marshall, Andrea Splendiani (eds.) Proceedings of the 6th International Workshop on Semantic Web Applications and tools for life sciences (SWAT4LS), 10th December 2013. Edinburgh, UK. ISSN: 1613-0073.
2012
LANA-SERRANO, S., D. SÁNCHEZ-CISNEROS, P. MARTÍNEZ FERNÁNDEZ, A. MORENO SANDOVAL, and L. CAMPILLOS LLANOS (2012) "An Approach for Detecting Modality and Negation in Texts by Using Rule-based Techniques". CLEF (Online Working Notes/Labs/Workshop) 2012. Rome, Italy, September, 2012. ISSN 2038-4963.
SAMY, D., A. MORENO SANDOVAL, C. BUENO-DÍAZ, M. GARROTE-SALAZAR, and J. M. GUIRAO (2012) "Medical Term Extraction in an Arabic Medical Corpus". In N. Calzolari, K. Choukri, T. Declerck, M. Uğur Doğan, B. Maegaard, J. Mariani, J. Odijk and S. Piperidis (eds.) Proceedings of the 8th Language Resources and Evaluation Conference 2012. 23-25 May 2012. Istanbul, Turkey. ISBN 978-2-9517408-7-7.
SÁNCHEZ-CISNEROS, D., S. LANA-SERRANO, A. MORENO SANDOVAL, L. CAMPILLOS LLANOS, P. MARTÍNEZ FERNÁNDEZ, and I. SEGURA-BÉDMAR (2012) "Prototipo de buscador de información médica en corpus multilingües y extractor de información sobre fármacos". Conference of the Spanish Society for Natural Language Processing (Sociedad Española para el Procesamiento del Lenguaje Natural, SEPLN 2012). Published in Procesamiento del Lenguaje Natural, no. 49, September 2012, pp. 209-212. ISSN 1135-5948.