Modified Light Stemming Algorithm for Arabic Language
Rafal Ali Sameer*
Department of Computer Science, College of Science, University of Baghdad, Baghdad, Iraq
Stemming is a pre-processing step in text mining applications and is essential in most information retrieval systems. The goal of stemming is to reduce the different grammatical forms of a word, and sometimes its derivationally related forms, to a common base form (root or stem); for example, reducing a noun, adjective, verb, or adverb to its base form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if that stem is not itself a valid root. As with other languages, an effective stemming algorithm is needed for indexing and retrieving Arabic documents, yet Arabic stemming algorithms are not widely available. This paper proposes a modified light stemming algorithm for the Arabic language. The algorithm performs preprocessing operations and then matches the resulting word against Arabic patterns to obtain its stem. The results show that the proposed algorithm is efficient.
Keywords: Stemming, stop words, light stemming algorithm.
The Modified Algorithm for Root Retrieval in the Arabic Language
Rafal Ali Sameer*
Department of Computer Science, College of Science, University of Baghdad, Baghdad, Iraq