Introduction
This paper is a comparative study of feature selection methods in text categorization. Four methods were
evaluated: document frequency (DF), information gain (IG), mutual information (MI) and the χ² test (CHI).
A Support Vector Machine (SVM) and a k-nearest neighbor (KNN) classifier were selected as the evaluating
classifiers. We found that IG, MI and CHI performed poorly in our test, though they behave well in English
text categorization. We analyzed the reasons theoretically and put forward possible solutions. A further
experiment showed that the combined feature selection method is effective.
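
As a rough illustration of three of the scoring criteria named above (DF, MI and CHI), the following Python sketch computes them from a term/class contingency table. It uses the common textbook forms of the formulas, which may differ from the exact variants used in the paper; the toy corpus, labels and all function names are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df

def contingency(docs, labels, term, cls):
    """Return (A, B, C, D) for a term and a class:
    A: in class, has term      B: not in class, has term
    C: in class, lacks term    D: not in class, lacks term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cls = label == cls
        if has_term and in_cls:
            a += 1
        elif has_term:
            b += 1
        elif in_cls:
            c += 1
        else:
            d += 1
    return a, b, c, d

def chi_square(a, b, c, d):
    """Chi-square statistic: N (AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def mutual_information(a, b, c, d):
    """Pointwise MI estimate: log(A N / ((A+C)(A+B)))."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")  # term never co-occurs with the class
    return math.log(a * n / ((a + c) * (a + b)))

# Hypothetical toy corpus: tokenized documents with one label each.
docs = [
    ["ball", "team", "win"],
    ["team", "score"],
    ["stock", "market"],
    ["market", "win"],
]
labels = ["sport", "sport", "finance", "finance"]

df = document_frequency(docs)
team_table = contingency(docs, labels, "team", "sport")  # (2, 0, 0, 2)
win_table = contingency(docs, labels, "win", "sport")    # (1, 1, 1, 1)

print("DF(team) =", df["team"])
print("CHI(team, sport) =", chi_square(*team_table))
print("CHI(win, sport)  =", chi_square(*win_table))
print("MI(team, sport)  =", mutual_information(*team_table))
```

In this toy corpus, "team" appears only in the "sport" documents and so receives a high χ² score, while "win" is evenly spread across both classes and scores zero. A selection step would then keep the top-scoring terms under whichever criterion is chosen.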