Classifying Chinese Text Documents by Association Rules

Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/35034

Title:	利用關聯式法則將中文文件分類 (Classifying Chinese documents using association rules)
Other Titles:	Classifying Chinese text documents by association rule
Authors:	李卓銘;Lee, Cho-ming
Contributors:	Tamkang University, Department of Computer Science and Information Engineering, In-service Master's Program; 黃連進 (Huang, Lain-jinn)
Keywords:	document classification; association rule; text mining
Date:	2007
Issue Date:	2010-01-11 05:56:04 (UTC+8)
Abstract:	An improved TFIDF formula is used to compute the weight of each feature term. From the resulting weighting table, the system computes, for every document, the sum of its term weights with respect to each category, so documents that have not yet been labeled can be classified. In parallel, association rule mining finds feature terms that co-occur in the same document (for example, combining two words into a new term) and treats them as new rules, enlarging the keyword set. The occurrences of each new rule in each category of the training documents are counted, and the rules that help classification are selected by their confidence and support and stored in a rule base. These rules are then used to correct documents assigned to the wrong category, raising classification accuracy.
Besides using the improved TFIDF to down-weight noise terms that are distributed too widely, which reduces the impact of such terms left over by incomplete preprocessing, the thesis relies mainly on the rules obtained by association rule mining. Redundant rules are filtered out for each possible case; when correcting misclassifications, rules are applied in order of decreasing confidence and then decreasing rule length; and the order in which categories are assigned is adjusted. Together these steps increase classification accuracy. Experimental results show that correction with the proposed method substantially improves document classification.
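The record gives neither the thesis's improved TFIDF formula nor its exact rule-application procedure, so the following is only a minimal Python sketch of the general pipeline the abstract describes, under stated assumptions. Plain TF-IDF stands in for the improved weighting. The toy training set, the thresholds `min_support`, `min_confidence` and `max_len`, and the helper names `tfidf_weights`, `classify`, `mine_rules` and `correct` are all hypothetical. Terms are assumed to be already segmented (Chinese word segmentation is outside the sketch), and "correction" is modeled as letting the highest-ranked rule whose terms all occur in a document override the weight-based label.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy training set: (category, pre-segmented feature terms).
TRAIN = [
    ("sports", ["team", "goal", "fans"]),
    ("sports", ["team", "goal", "coach"]),
    ("finance", ["stock", "fans", "market"]),
    ("finance", ["stock", "bank"]),
]


def tfidf_weights(train):
    """Build a (term, category) -> weight table from plain TF-IDF.

    Stand-in for the thesis's improved TFIDF, whose formula is not given here.
    Per-document TF-IDF values are summed into the category's entry.
    """
    n_docs = len(train)
    df = Counter(t for _, terms in train for t in set(terms))
    weights = defaultdict(float)
    for cat, terms in train:
        for term, count in Counter(terms).items():
            tf = count / len(terms)
            weights[(term, cat)] += tf * math.log(n_docs / df[term])
    return weights


def classify(terms, weights, categories):
    """Sum a document's term weights per category; return the highest-scoring one."""
    scores = {c: sum(weights.get((t, c), 0.0) for t in set(terms)) for c in categories}
    return max(scores, key=scores.get)  # ties go to the earlier category in `categories`


def mine_rules(train, min_support=0.25, min_confidence=0.8, max_len=3):
    """Mine rules {co-occurring terms} -> category, filtered by support and confidence."""
    n_docs = len(train)
    itemset_count = Counter()  # documents containing the itemset
    rule_count = Counter()     # documents containing the itemset AND in the category
    for cat, terms in train:
        for k in range(2, max_len + 1):
            for itemset in combinations(sorted(set(terms)), k):
                itemset_count[itemset] += 1
                rule_count[(itemset, cat)] += 1
    rules = []
    for (itemset, cat), count in rule_count.items():
        support = count / n_docs
        confidence = count / itemset_count[itemset]
        if support >= min_support and confidence >= min_confidence:
            rules.append((itemset, cat, confidence, support))
    # Application order: confidence descending, then rule length descending;
    # support descending as a final tie-breaker is an added assumption.
    rules.sort(key=lambda r: (-r[2], -len(r[0]), -r[3]))
    return rules


def correct(terms, predicted, rules):
    """Replace the weight-based label with the first matching rule's category, if any."""
    present = set(terms)
    for itemset, cat, _confidence, _support in rules:
        if set(itemset) <= present:
            return cat
    return predicted


weights = tfidf_weights(TRAIN)
categories = sorted({cat for cat, _ in TRAIN})
rules = mine_rules(TRAIN)

doc = ["goal", "fans", "bank"]
predicted = classify(doc, weights, categories)
final = correct(doc, predicted, rules)
print(predicted, "->", final)
```

With this toy data the category-weight sum labels the document `finance`, and the only matching rule, {fans, goal} → sports, overrides it. The sketch does not model the redundant-rule filtering or the category-order adjustment mentioned in the abstract.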
Appears in Collections:	[Graduate Institute & Department of Computer Science and Information Engineering] Thesis

