Three-level Morphological Analyzer for Arabic Verbs and Particles

F.T. Al-Raisi, A.M. Al-Hafeedh, S.M. Al-Farsi, and H.Z. Zidoum (Sultanate of Oman)


Arabic processing, morphological analysis, computational linguistics


This paper presents a Three-level Morphological Analyzer (MA). Our approach mimics the morphological processing carried out by an expert human linguist. Hence, great emphasis is placed on the analysis and representation of Arabic linguistic rules, a step that is crucial for building a reliable MA. In the Three-level MA, surface words (tokens) are first stemmed to produce the corresponding stems, and roots are then generated from the resulting stems. Stemming follows a multi-affix approach: the stemming algorithm performs iterative light stemming, stripping one part of the prefix or suffix at a time. Indeed, from a linguistic point of view, a prefix or suffix is not just a single string of characters; it is a combination of letters that may represent several distinct entities. Light stemming helps extract the information carried by each of these entities by considering them separately. The root-generating algorithm identifies the form of a stem, from which it extracts the root. It also transforms deviated stems so that they can be treated uniformly. The MA is equipped with a comprehensive-coverage lexicon to ensure the correctness of its results.
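The two steps summarized above (iterative light stemming, then root extraction from the stem's form) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The affix inventories, the slot order, the minimum stem length of three letters, the future-prefix constraint, the pattern list, and every function name (`light_stem`, `extract_roots`, `analyze`) are hypothetical assumptions chosen for illustration. The sketch has no lexicon check and does not transform deviated (weak) stems, so it will over-strip some words (for example, Form VI perfects that begin with ت).

```python
MIN_STEM = 3  # assumption: a verb stem keeps at least three letters

# Hypothetical, deliberately tiny affix inventories (not the paper's rules).
IMPERFECT = ("ي", "ت", "ن", "أ")
# Prefix slots, outermost first; each slot is stripped at most once.
PREFIX_SLOTS = [
    ("conjunction", ("و", "ف")),
    ("future", ("س",)),
    ("imperfect", IMPERFECT),
]
# Suffix slots, outermost first (object pronoun, then subject marker).
SUFFIX_SLOTS = [
    ("object_pronoun", ("ها", "هم", "ه")),
    ("subject_marker", ("ون", "وا", "نا", "ت")),
]

# Hypothetical pattern list: ف, ع, ل stand for the three root consonants.
PLACEHOLDERS = "فعل"
PATTERNS = [
    ("فعل", "Form I"),
    ("فاعل", "Form III"),
    ("تفاعل", "Form VI"),
    ("انفعل", "Form VII"),
    ("افتعل", "Form VIII"),
    ("استفعل", "Form X"),
]


def _strip_one(word, forms, at_start):
    """Strip the first matching affix form if a long enough stem remains."""
    for form in forms:
        if at_start and word.startswith(form):
            rest = word[len(form):]
        elif not at_start and word.endswith(form):
            rest = word[: -len(form)]
        else:
            continue
        if len(rest) >= MIN_STEM:
            return form, rest
    return None


def light_stem(word):
    """Strip one affix component per slot, recording each (label, morpheme)."""
    prefixes, suffixes = [], []
    for label, forms in PREFIX_SLOTS:
        hit = _strip_one(word, forms, at_start=True)
        if hit is None:
            continue
        morpheme, rest = hit
        # Assumed rule: the future prefix sa- only precedes an imperfect prefix.
        if label == "future" and not rest.startswith(IMPERFECT):
            continue
        prefixes.append((label, morpheme))
        word = rest
    for label, forms in SUFFIX_SLOTS:
        hit = _strip_one(word, forms, at_start=False)
        if hit is not None:
            morpheme, word = hit
            suffixes.append((label, morpheme))
    return prefixes, word, suffixes


def extract_roots(stem):
    """Return every (form, root) pair whose pattern the stem fits."""
    matches = []
    for pattern, form in PATTERNS:
        if len(pattern) != len(stem):
            continue
        root = []
        for s, p in zip(stem, pattern):
            if p in PLACEHOLDERS:
                root.append(s)  # placeholder slot: capture a root letter
            elif s != p:
                break  # fixed pattern letter does not match this stem
        else:
            matches.append((form, "".join(root)))
    return matches


def analyze(word):
    """Token -> (prefixes, stem, suffixes) -> candidate roots."""
    prefixes, stem, suffixes = light_stem(word)
    return {"prefixes": prefixes, "stem": stem,
            "suffixes": suffixes, "roots": extract_roots(stem)}
```

For example, `analyze("واستخرجوا")` strips the conjunction و and the subject marker وا, leaving the stem استخرج. That stem fits the Form X pattern استفعل and yields the root خرج. Keeping each prefix and suffix component in its own slot is what lets every stripped piece be labeled separately, rather than treating a whole prefix as one opaque string.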

