Improving the accuracy of text classification by the different classifier with multiple confidence threshold values


Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/52332

Title:	Improving the accuracy of text classification by the different classifier with multiple confidence threshold values
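Other Titles:	利用不同分類器與多重靜態門檻值來改善文件分類的準確度 (Improving the accuracy of document classification using different classifiers and multiple static threshold values)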
Authors:	黃蕙華 (Huang, Hui-hua)
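Contributors:	淡江大學資訊工程學系博士班 (Tamkang University, Department of Computer Science and Information Engineering, Ph.D. Program); 葛煥昭 (Keh, Huan-chao)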
Keywords:	關聯式分類 (Association Classification); 文件分類 (Text Classification); 文字採擷 (Text Mining)
Date:	2010
Issue Date:	2010-09-23 17:33:21 (UTC+8)
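Abstract:	When classifying with Associative Classification (AC), records that cannot be classified by any Class Association Rule (CAR) are usually assigned directly to a predefined default class, so that no record is left unclassified. When building an AC classifier from CARs, however, the rule confidence threshold is hard to set: set too high, it prunes many potentially useful rules, leaving many records that no CAR can classify; set too low, it admits rules that cause misclassifications. Both situations reduce classification accuracy. To avoid the errors caused by the default class and by low-confidence rules, and so improve accuracy, we propose using two different classifiers together, each performing a different task at a different stage according to its characteristics. This thesis first classifies the training documents with a Naive Bayes classifier and uses the resulting average accuracy rate to set a threshold, selecting the CARs that satisfy it. Because every selected CAR is more accurate than the Naive Bayes classifier, these CARs can be used to further improve the classification result. Documents that no selected CAR can classify are classified by the Naive Bayes classifier. Experiments show that combining the strengths of the two classifiers yields better classification performance than using a single classifier alone.

Each type of classifier has its own advantages as well as certain shortcomings. In this thesis, we combine the Associative classifier and the Naive Bayes classifier so that each compensates for the other's shortcomings, thereby improving the accuracy of text classification. We first classify the training cases with the Naive Bayes classifier and then set a separate confidence threshold for the class association rules (CARs) of each class, based on the accuracy rate the Naive Bayes classifier achieved on that class. Since every selected CAR of a class is more accurate than the Naive Bayes classifier is on that class, these selected CARs can further improve the classification result. Cases that no selected CAR can classify are classified with the Naive Bayes classifier. The experimental results show that combining the advantages of these two classifiers yields better classification results than either classifier alone.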
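The two-stage scheme described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: documents are treated as sets of terms; the Naive Bayes model is a small Laplace-smoothed classifier written here only to keep the example self-contained; candidate CARs are supplied as (antecedent, class) pairs instead of being mined (a real AC system would mine them, for example with an Apriori-style miner); a rule's confidence is the usual conf(X → c) = support(X ∪ {c}) / support(X); the per-class threshold is assumed to be Naive Bayes's per-class accuracy on the training set; and matching rules are tried in descending confidence order. All names (`NaiveBayes`, `per_class_accuracy`, `confidence`, `select_rules`, `classify`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Assumption: a document is a frozenset of terms; a corpus is a list of (terms, label).
Doc = frozenset


class NaiveBayes:
    """Hypothetical, minimal Naive Bayes over term sets with Laplace smoothing."""

    def fit(self, corpus):
        self.class_counts = Counter(label for _, label in corpus)
        self.term_counts = defaultdict(Counter)
        self.vocab = set()
        for terms, label in corpus:
            self.term_counts[label].update(terms)  # also creates an entry for every class
            self.vocab.update(terms)
        self.total_terms = {c: sum(tc.values()) for c, tc in self.term_counts.items()}
        self.n_docs = len(corpus)
        return self

    def predict(self, terms):
        best, best_score = None, -math.inf
        v = len(self.vocab)
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n_docs)  # log prior
            for t in terms:  # log likelihood of each term, add-one smoothed
                score += math.log((self.term_counts[c][t] + 1) / (self.total_terms[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


def per_class_accuracy(model, corpus):
    """Assumed threshold: fraction of each class's training docs that NB labels correctly."""
    hits, totals = Counter(), Counter()
    for terms, label in corpus:
        totals[label] += 1
        hits[label] += model.predict(terms) == label
    return {c: hits[c] / totals[c] for c in totals}


def confidence(antecedent, label, corpus):
    """conf(X -> c): share of docs containing X that carry label c."""
    covered = [lab for terms, lab in corpus if antecedent <= terms]
    return covered.count(label) / len(covered) if covered else 0.0


def select_rules(candidates, corpus, thresholds):
    """Keep CARs strictly more confident than NB's accuracy on their class."""
    selected = []
    for antecedent, label in candidates:
        conf = confidence(antecedent, label, corpus)
        # A class with no threshold gets 1.0, so none of its rules can pass.
        if conf > thresholds.get(label, 1.0):
            selected.append((antecedent, label, conf))
    # Assumption: try higher-confidence, then longer, rules first.
    selected.sort(key=lambda r: (-r[2], -len(r[0])))
    return selected


def classify(terms, rules, nb):
    """First matching selected CAR wins; otherwise fall back to Naive Bayes."""
    for antecedent, label, _ in rules:
        if antecedent <= terms:
            return label
    return nb.predict(terms)


if __name__ == "__main__":
    train = [
        (Doc({"goal", "match", "team"}), "sport"),
        (Doc({"goal", "team", "coach"}), "sport"),
        (Doc({"stock", "market", "goal"}), "finance"),
        (Doc({"stock", "bank", "rate"}), "finance"),
    ]
    nb = NaiveBayes().fit(train)
    thresholds = per_class_accuracy(nb, train)
    candidates = [(Doc({"stock"}), "finance"), (Doc({"goal"}), "sport")]
    rules = select_rules(candidates, train, thresholds)
    print(thresholds, rules)
    print(classify(Doc({"stock", "rate"}), rules, nb))
```

The strict `>` comparison mirrors the abstract's requirement that selected CARs be more accurate than the Naive Bayes classifier on their class, and the Naive Bayes fallback takes the place of the predefined default class that a plain AC classifier would use for unmatched documents.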
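Appears in Collections:	[資訊工程學系暨研究所 (Department and Graduate Institute of Computer Science and Information Engineering)] 學位論文 (Theses and Dissertations)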

