Abstract:
This article aims to describe the RICeST Stemmer for the Persian language,
developed at the Regional Information Center for Science and Technology
(RICeST). We applied linguistic knowledge and standard algorithms to extract
machine-readable rules. In addition, plural suffixes and exceptions, which
include compound nouns, were taken into account. The different parts of the
Singular-stemmer and their functions are described.
Machine summary:
This article aims to describe the RICeST Stemmer for the Persian language, developed at the Regional Information Center for Science and Technology (RICeST).
Keywords: Singular-Stemmer Software, Information Retrieval, Singular-stemmer Algorithm, Plural Suffixes, RICeST, RICeST Stemmer
Introduction
Language is in contact with the outside world in two ways: speech and writing.
To build tools, models, or techniques that are applicable to information retrieval, it is important to pay attention to linguistics and natural language processing.
Neshat (2000), in her article on the difficulties of the Persian language and script, argues that singular and plural subjects cannot provide a suitable solution to these problems in information science (Alizadeh, 2009).
Jalili (2004) introduces Persian stemmer tools in his article and names some of their applications, such as automatic classification of content in large archives, word indexing, text search in search engines, and machine translation.
Therefore, we applied linguistic knowledge and standard algorithms, supporting 10 suffixes and almost 2,000 exceptions (including irregular nouns), to build the RICeST Stemmer.
Singular-Stemmer
To perform stemming, a list of ten plural suffixes was compiled; these suffixes can attach to nouns, adjectives, or pronouns.
Because plural suffixes in Persian are no longer than three letters, there is no need to check more than the last three letters of a word.
As can be seen, adding ‘ha’ or ‘an’ to the end of a singular word forms a plural noun (Anvari & Ahmadi Givi, 2006).
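The suffix check described above can be illustrated with a minimal Python sketch. It is not the RICeST implementation: the function name `singular_stem`, the two-entry suffix tuple (only ‘ha’ and ‘an’, the two suffixes named in the text), and the tiny exception set are assumptions made for illustration. The actual list of ten suffixes and the roughly 2,000 exceptions are not reproduced here. Trimming a trailing zero-width non-joiner is likewise an assumption about how the input is written.

```python
# Illustrative subset: the text names only 'ha' and 'an'; the full list of
# ten plural suffixes is not given in this section.
PLURAL_SUFFIXES = ("ها", "ان")

# No plural suffix is longer than three letters, so only the last three
# letters of a word need to be inspected.
MAX_SUFFIX_LEN = 3

# Hypothetical sample entries standing in for the exception list:
# words that end in a plural-looking suffix but are not plurals.
EXCEPTIONS = {"تهران", "باران"}

ZWNJ = "\u200c"  # zero-width non-joiner, often written before 'ha'


def singular_stem(word: str) -> str:
    """Return the singular form of `word` under the assumptions above."""
    # Exceptions are returned unchanged before any suffix is examined.
    if word in EXCEPTIONS:
        return word

    tail = word[-MAX_SUFFIX_LEN:]
    # Try longer suffixes first so a short suffix cannot shadow a long one.
    for suffix in sorted(PLURAL_SUFFIXES, key=len, reverse=True):
        if tail.endswith(suffix) and len(word) > len(suffix):
            # Drop the suffix, then any joiner character left behind.
            return word[: -len(suffix)].rstrip(ZWNJ)
    return word
```

Under these assumptions, a word such as "کتابها" (books) loses its final ‘ha’, while an exception entry such as "تهران" (Tehran) is left untouched even though it ends in ‘an’. Checking the exception list first is a design choice of this sketch; the order used by the original software is not stated in this section.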