Description: This paper is a comparative study of feature selection methods in text categorization. Four methods were evaluated: document frequency (DF), information gain (IG), mutual information (MI), and the χ² test (CHI). A Support Vector Machine (SVM) and a k-nearest neighbor (KNN) classifier were used as the evaluation classifiers. We found that IG, MI, and CHI performed poorly in our tests, even though they perform well in English text categorization. We analyzed the reasons theoretically and put forward possible solutions. A further experiment showed that the combined feature selection method is effective.
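To make the four criteria concrete, here is a minimal Python sketch that scores a single term, assuming the standard textbook definitions of DF, IG, MI and CHI rather than the exact variants used in this paper. For a term t and category c it counts A (documents in c containing t), B (documents outside c containing t), C (documents in c lacking t) and D (documents outside c lacking t), with N = A + B + C + D. The use of natural logarithms, taking the maximum over categories for MI and CHI, and the helper names `term_scores` and `select_top_k` are all illustrative assumptions. The combined method mentioned above is not specified here, so it is not sketched.

```python
import math
from collections import Counter


def _plogp_sum(probs):
    # Sum of p * ln(p), skipping zero probabilities (0 * ln 0 is taken as 0).
    return sum(p * math.log(p) for p in probs if p > 0)


def term_scores(docs, labels, term):
    """Score one term with DF, IG, MI and CHI (hypothetical helper).

    docs   -- list of token sets, one per document (already word-segmented)
    labels -- category label of each document, same length as docs
    MI is -inf when the term never occurs; MI/CHI use the max over categories (assumption).
    """
    n = len(docs)
    cats = sorted(set(labels))
    has = [term in d for d in docs]
    df = sum(has)                                          # DF: documents containing the term
    cat_n = Counter(labels)                                # documents per category
    cat_t = Counter(l for l, h in zip(labels, has) if h)   # per category, documents containing the term

    # IG = H(C) - P(t) H(C|t) - P(~t) H(C|~t)
    ig = -_plogp_sum(cat_n[cat] / n for cat in cats)
    if df > 0:
        ig += (df / n) * _plogp_sum(cat_t[cat] / df for cat in cats)
    if df < n:
        ig += ((n - df) / n) * _plogp_sum(
            (cat_n[cat] - cat_t[cat]) / (n - df) for cat in cats)

    mi_max, chi_max = float("-inf"), 0.0
    for cat in cats:
        a = cat_t[cat]              # A: in cat, contains term
        b = df - a                  # B: not in cat, contains term
        c = cat_n[cat] - a          # C: in cat, lacks term
        d = n - a - b - c           # D: not in cat, lacks term
        if a > 0:                   # MI(t, c) = ln(A*N / ((A+C)(A+B)))
            mi_max = max(mi_max, math.log(a * n / ((a + c) * (a + b))))
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        if denom > 0:               # CHI(t, c) = N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D))
            chi_max = max(chi_max, n * (a * d - c * b) ** 2 / denom)
    return {"DF": df, "IG": ig, "MI": mi_max, "CHI": chi_max}


def select_top_k(docs, labels, k, method="IG"):
    # Rank the whole vocabulary by one criterion and keep the k highest-scoring terms.
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: term_scores(docs, labels, t)[method], reverse=True)
    return ranked[:k]
```

For example, with the four toy documents {"股票", "上涨"}, {"股票", "下跌"}, {"比赛", "进球"}, {"比赛", "股票"} labelled finance, finance, sports, sports, the term "比赛" occurs only in sports documents, so it receives the maximal IG of ln 2 and a CHI score equal to N = 4, and `select_top_k(docs, labels, 1)` returns ["比赛"]. In a real experiment the selected terms would then become the feature space for the SVM or KNN classifier.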
File list:
中文文本分类中特征抽取方法的比较研究.pdf (A Comparative Study of Feature Extraction Methods in Chinese Text Categorization)