Abstract:
Heterogeneous data of all kinds are growing rapidly on the web. Because web search results span many data types, classifying the results is a common way to locate the desired data. Many machine learning methods have been used to classify textual data. The main challenges in text classification are the cost of the classifier and the accuracy of classification. The vector space model (VSM) is a traditional representation in information retrieval (IR) and for text data; in this representation, the computational cost depends on the dimension of the vectors. Another problem is selecting effective features and pruning unwanted terms. Latent semantic indexing (LSI) is used to transform the VSM into an orthogonal semantic space that takes term relations into account. Experimental results showed that the LSI semantic space achieves better performance in both computation time and classification accuracy. This suggests that the semantic topic space contains less noise, which increases accuracy, while the lower vector dimension also reduces computational complexity.
Keywords: Persian Text Classification, Vector Space Model (VSM), Latent Semantic Indexing (LSI).
Machine-generated summary:
Experimental results showed that the LSI semantic space achieves better performance in both computation time and classification accuracy.
Therefore, word2vec first constructs a vocabulary from the training text data and then learns vector representations of words with its neural network model.
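The two stages described above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the toy corpus and window size are hypothetical, and the (center, context) pairs shown here would, in real word2vec training, be fed to a shallow neural network that learns the embeddings.

```python
# Stage 1 of word2vec: build a vocabulary from the training text.
# Stage 2 (sketched): generate skip-gram (center, context) training pairs.
# Corpus and window size are hypothetical toy values.
sentences = [["the", "bank", "raised", "rates"],
             ["the", "market", "fell"]]

# Vocabulary: every distinct word mapped to an integer index.
vocab = {w: i for i, w in enumerate(sorted({w for s in sentences for w in s}))}

def skipgram_pairs(sentences, window=1):
    """Emit (center, context) pairs within a fixed window around each word."""
    pairs = []
    for sent in sentences:
        for i, center in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    pairs.append((center, sent[j]))
    return pairs

pairs = skipgram_pairs(sentences)
print(len(vocab), pairs[:2])
```

The neural network then adjusts each word's vector so that words appearing in similar contexts end up close together in the embedding space.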
Rajan, Ramalingam, Ganesan, Palanivel & Palaniappan (2009) developed a Tamil text classification system based on the VSM and a neural network (NN) model.
Uysal and Gunal (2014) proposed a method based on genetic algorithm oriented latent semantic features (GALSF) for improving the representation of documents in text classification.
They developed a text classification system to compare the performance of the standard VSM against their proposed VSM variant that uses a title-vector-based document representation.
Pilevar, Feili & Soltani (2009) used the Learning Vector Quantization network for Persian document classification.
Persian Text Classification
The Vector Space Model (VSM) has been used in IR and NLP for many years (Wong, Ziarko, Raghavan & Wong, 1987).
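As a concrete illustration of the VSM, the sketch below builds TF-IDF document vectors over a toy English corpus (the documents, weighting details, and helper name are assumptions, not taken from the paper; a real system would use a preprocessed Persian corpus). It also shows why classification cost depends on vocabulary size: every document vector has one component per vocabulary term.

```python
import math
from collections import Counter

# Toy corpus standing in for preprocessed Persian documents (hypothetical data).
docs = [
    ["economy", "bank", "market"],
    ["football", "league", "goal"],
    ["bank", "loan", "market", "economy"],
]

# The vocabulary fixes the dimension of every document vector.
vocab = sorted({t for d in docs for t in d})

def tfidf_vector(doc, docs, vocab):
    """Represent one document as a TF-IDF vector over the vocabulary."""
    tf = Counter(doc)
    n = len(docs)
    vec = []
    for term in vocab:
        df = sum(1 for d in docs if term in d)          # document frequency
        idf = math.log(n / df) if df else 0.0            # inverse doc frequency
        vec.append(tf[term] / len(doc) * idf)            # normalized tf * idf
    return vec

vectors = [tfidf_vector(d, docs, vocab) for d in docs]
# Each vector has len(vocab) components, so classifier cost grows with
# vocabulary size -- the dimensionality problem that LSI addresses.
print(len(vocab), len(vectors[0]))
```

With a realistic Persian vocabulary of tens of thousands of terms, these vectors become very high-dimensional and sparse, which motivates the LSI reduction discussed next.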
Latent semantic analysis uses the singular value decomposition (SVD) to decompose a large term-document matrix, keeping only the k largest singular values and their corresponding orthogonal singular vectors.
Through this SVD low-rank approximation, LSI automatically transforms the original textual data into a smaller semantic space by exploiting some of the implicit higher-order structure in the associations of words with text objects (Landauer & Dumais, 2008; Landauer & Dumais, 2006).
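The rank-k truncation can be sketched with NumPy's SVD on a toy term-document matrix (the matrix values and k are hypothetical; this illustrates the general technique, not the paper's exact setup):

```python
import numpy as np

# Toy term-document matrix A (rows = terms, columns = documents);
# entries are hypothetical term frequencies.
A = np.array([[2., 0., 1., 0.],
              [1., 0., 2., 0.],
              [0., 3., 0., 1.],
              [0., 1., 0., 2.]])

# SVD: A = U * diag(s) * Vt, singular values in s sorted descending.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k LSI approximation: keep only the k largest singular values.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Document representations in the k-dimensional semantic space:
# k components per document instead of one per vocabulary term.
docs_k = np.diag(s[:k]) @ Vt[:k, :]
print(A_k.shape, docs_k.shape)  # (4, 4) (2, 4)
```

Classification then operates on the k-dimensional columns of `docs_k`, which is where the reported savings in computation time come from, while the truncation discards the smallest singular directions, which tend to carry noise.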
The experiments showed that the reduced LSI semantic space yields better performance in Persian text classification.
Improving VSM text classification with a title-vector-based document representation method.