International Journal of Advances in Computer Science and Its Applications
Author(s): MAYANK KALBHOR, SANJAY AGARWAL
Features play a vital role in classifying objects into different classes, so identifying the best features is the backbone of the classification process. In text classification, the features are simply words, and the resulting feature space has very high dimensionality, so finding the most appropriate feature set is a major challenge. This paper analyzes several feature selection methods for multi-class text classification and evaluates their results with different classifiers on an email classification task. We run our experiments on the 20 Newsgroups and PU corpora datasets. The experiments cover well-known feature selection methods such as Term Selection, Document Frequency, Mutual Information, Odds Ratio, and Chi-square. This paper concludes that Mutual Information and Chi-square are the most appropriate for text classification.
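As an illustration of the two methods singled out above, the following minimal sketch scores one term against one class. The abstract does not state the formulas used, so the sketch assumes the common definitions based on a 2x2 term/class contingency table: a is the number of documents in the class that contain the term, b those outside the class that contain it, c those in the class that lack it, and d those outside the class that lack it. The function names and the four-document toy corpus are hypothetical and are not taken from the paper's experiments.

```python
import math


def contingency(docs, labels, term, cls):
    """Count the 2x2 term/class table (a, b, c, d) over tokenized documents."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == cls
        if has_term and in_class:
            a += 1      # term present, document in class
        elif has_term:
            b += 1      # term present, document outside class
        elif in_class:
            c += 1      # term absent, document in class
        else:
            d += 1      # term absent, document outside class
    return a, b, c, d


def chi_square(a, b, c, d):
    """Chi-square statistic for term/class independence (assumed definition)."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: term or class is constant
    return n * (a * d - c * b) ** 2 / denom


def mutual_information(a, b, c, d):
    """Pointwise mutual information log(P(t, c) / (P(t) P(c))) (assumed definition)."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")  # term never co-occurs with the class
    return math.log(a * n / ((a + c) * (a + b)))


# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"free", "offer", "win"},
    {"free", "money"},
    {"meeting", "agenda"},
    {"project", "meeting", "free"},
]
labels = ["spam", "spam", "ham", "ham"]

table = contingency(docs, labels, "free", "spam")
print(table)                                  # (2, 1, 0, 1)
print(round(chi_square(*table), 4))           # 1.3333
print(round(mutual_information(*table), 4))   # 0.2877
```

In a feature selection setting, scores of this kind would typically be computed for every term and class, combined per term (for example by taking the maximum over classes), and used to keep only the top-ranked terms. That aggregation step is a common convention and is assumed here, not described in the abstract.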