A Rule-based Extensible Stemmer for Information Retrieval with Application to Arabic

H.M. Harmanani, W.T. Keirouz, and S. Raheel (Lebanon)


Natural Language Processing, Information Retrieval


This paper presents a new, extensible method for information retrieval and content analysis in natural languages (NL). The method is stem-based: stems are extracted using a set of language-dependent rules that are interpreted by a rule engine. The rule engine allows the system to be adapted to any natural language by modifying the NL semantic rules and grammar. The system has been fully tested on Arabic and partially tested on English, Hebrew, and Persian. We validate our approach using a database-backed prototype.
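
To illustrate the general idea of language-dependent rules interpreted by a generic engine, here is a minimal Python sketch. It is not the authors' rule format or engine, which the abstract does not specify. The `Rule` type, the `apply_rules` function, the `min_stem` threshold, and the small rule sets are all hypothetical, and the affix lists are deliberately tiny. The sketch only handles affix stripping and does not model the semantic rules or grammar mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One affix-stripping rule (hypothetical format, for illustration only)."""
    kind: str      # "prefix" or "suffix"
    affix: str     # non-empty affix to strip
    min_stem: int  # minimum number of characters that must remain


def apply_rules(word: str, rules: list[Rule]) -> str:
    """Strip at most one prefix, then at most one suffix, in rule order."""
    stem = word
    for kind in ("prefix", "suffix"):
        for rule in rules:
            if rule.kind != kind:
                continue
            if kind == "prefix" and stem.startswith(rule.affix):
                candidate = stem[len(rule.affix):]
            elif kind == "suffix" and stem.endswith(rule.affix):
                candidate = stem[:-len(rule.affix)]
            else:
                continue
            # Reject the strip if it would leave too short a stem.
            if len(candidate) >= rule.min_stem:
                stem = candidate
                break  # first applicable rule of this kind wins
    return stem


# Assumed toy rule sets; the engine itself contains no language knowledge.
ENGLISH_RULES = [
    Rule("suffix", "ing", 3),
    Rule("suffix", "ed", 3),
    Rule("suffix", "s", 3),
]
ARABIC_RULES = [
    Rule("prefix", "ال", 3),  # definite article
    Rule("suffix", "ات", 3),  # feminine plural ending
    Rule("suffix", "ون", 3),  # masculine plural ending
]
```

In this sketch, switching languages means passing a different rule list to the same function, for example `apply_rules("المعلمون", ARABIC_RULES)` versus `apply_rules("walking", ENGLISH_RULES)`. The `min_stem` guard keeps short words such as "sing" from being reduced to a single letter.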
