
Laboratorio de Lingüística Informática

THE JABALÍN MORPHOLOGICAL ANALYZER OF THE ARABIC VERB


The Jabalín morphological analyzer is an application for generating and analyzing verbs in Modern Standard Arabic and Classical Arabic. It was created in 2012 by Alicia González Martínez (linguist) and Susana López Hervás (computer engineer), under the supervision of Dr Antonio Moreno Sandoval.

Jabalín is primarily an application for generating verbs in Modern Standard Arabic. The generation system is implemented in Python 3. It includes a lexicon of 15,452 verb lemmas that can be used as input to the system. The system first generates every verb as morphologically regular, starting from the verbal root. Irregularities are treated as phonological alterations that affect only the surface form, and so they are handled at a later stage. The system classifies all verbs into only two conjugational classes.
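The two-stage idea described above (regular generation from the root, followed by phonological adjustment) can be illustrated with a minimal sketch. This is not Jabalín's actual code: the pattern string, the function names, the simplified Latin transliteration, and the single toy rule are all assumptions made for illustration only.

```python
import re

# Hypothetical pattern for the perfective, 3rd person masculine singular.
# The digits 1, 2, 3 are slots for the three root consonants.
PERFECTIVE_3MS = "1a2a3a"


def interdigitate(root, pattern):
    """Stage 1: fill the root consonants into the pattern, treating every
    verb as if it were regular."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in pattern)


# Stage 2: toy phonological rules applied to the regular form.
# Only one illustrative rule is given (awa -> long a, as in hollow verbs);
# a real system would need a much larger, carefully ordered rule set.
PHONOLOGICAL_RULES = [
    (re.compile("awa"), "ā"),
]


def apply_phonology(form):
    """Rewrite the regular form into its surface form."""
    for rule, replacement in PHONOLOGICAL_RULES:
        form = rule.sub(replacement, form)
    return form


def generate(root):
    """Generate the surface form for a triliteral root."""
    return apply_phonology(interdigitate(root, PERFECTIVE_3MS))


if __name__ == "__main__":
    print(generate(("k", "t", "b")))  # regular root, unchanged by stage 2
    print(generate(("q", "w", "l")))  # hollow root, altered by stage 2
```

Keeping the two stages separate means the generation step never needs to know about irregular verbs; all irregularity lives in the rule list.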

The output of the system is a lexicon of 1,684,268 verbal forms along with their corresponding morphological information. Of these, 749,051 forms (about 44% of the lexicon) have been evaluated to verify the accuracy of the generation process. For the evaluation, we took the lexicon of the ElixirFM Morphological Analyzer (Smrž 2012) as a gold standard. The evaluation task consisted of comparing each verbal form generated by Jabalín with the corresponding ElixirFM form and determining whether they match. Matching forms are counted as successfully generated. The results of the evaluation are shown in the following table.



            No. of forms   % of total   % of evaluable forms
Correct          745,436       44.26%                 99.52%
Incorrect          3,615        0.21%                  0.48%
No data          935,217       55.53%                      -
Total          1,684,268            -                      -
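The comparison and the percentages in the table can be sketched as follows. This is an illustrative outline, not the evaluation script used by the authors: the function names and the dictionary layout (forms keyed by some identifier shared between both lexicons) are assumptions, as is the reading of "No data" as forms with no counterpart in the gold standard. Only the counts are taken from the table.

```python
def evaluate(generated, gold):
    """Compare generated forms against a gold-standard lexicon.

    Both arguments are hypothetical dicts mapping a shared key
    (for example a lemma plus a morphological tag) to a surface form.
    """
    counts = {"correct": 0, "incorrect": 0, "no_data": 0}
    for key, form in generated.items():
        if key not in gold:
            counts["no_data"] += 1      # nothing to compare against
        elif gold[key] == form:
            counts["correct"] += 1
        else:
            counts["incorrect"] += 1
    return counts


def percentages(counts):
    """Return (% of total, % of evaluable forms) for correct and incorrect."""
    total = sum(counts.values())
    evaluable = counts["correct"] + counts["incorrect"]
    result = {}
    for label in ("correct", "incorrect"):
        result[label] = (
            round(100 * counts[label] / total, 2),
            round(100 * counts[label] / evaluable, 2),
        )
    return result


if __name__ == "__main__":
    # Counts reported in the table.
    reported = {"correct": 745_436, "incorrect": 3_615, "no_data": 935_217}
    print(sum(reported.values()))
    print(percentages(reported))
```

Running the script on the reported counts reproduces the total of 1,684,268 forms and the percentages given for the correct and incorrect rows.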



ONLINE INTERFACE OF THE JABALÍN ANALYZER


The generation system, implemented in Python, is available under the GNU GPL license:

DOWNLOAD JABALÍN SOURCE CODE FROM github.com

DOWNLOAD JABALÍN SOURCE CODE FROM sourceforge.net


The Jabalín project includes some additional features:

The Jabalín Transliteration System

Quantitative Data extracted from the lexicons



RELEVANT PUBLICATIONS