Using the notion of keywords, we can derive suitable classification rules, in the form of category keywords, from a set of documents that have already been labeled, and then use those rules to classify unlabeled documents. Training starts from the sample documents: we compute how often each feature term occurs and how it is distributed across the samples, and use these statistics to judge whether the term is representative of a category; if it is, the term is adopted as a classification rule. A document may also carry a large amount of noise. To filter this noise out, this paper proposes an improved TFIDF scheme that revises how keyword weights are computed, and combines it with association rules to find compound keywords (pairs of words treated as one new keyword) that help classification and are used to adjust document weights. Finally, different categories are given different priorities according to the characteristics of the document data. The experimental results in this paper show that, with the proposed modifications, the efficiency of document classification improves substantially.
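The two building blocks named in the abstract can be illustrated with a minimal sketch. It assumes the textbook TF-IDF formula and a plain support/confidence test for word pairs; it is not the improved weighting or the exact rule-mining procedure of this paper, whose details the abstract does not give. The function names, thresholds, and toy documents are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_weights(docs):
    """Standard TF-IDF (assumed baseline, not the paper's improved variant).

    docs is a list of token lists; returns one {term: weight} dict per document.
    """
    n = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


def compound_keywords(docs, labels, min_support=0.5, min_confidence=0.8):
    """Find word pairs that co-occur often and point to a single category.

    support    = share of all documents that contain the pair and carry the label
    confidence = share of documents containing the pair that carry the label
    Both thresholds are illustrative assumptions.
    """
    n = len(docs)
    pair_total = Counter()
    pair_by_label = Counter()
    for doc, label in zip(docs, labels):
        for pair in combinations(sorted(set(doc)), 2):
            pair_total[pair] += 1
            pair_by_label[(pair, label)] += 1
    rules = {}
    for (pair, label), count in pair_by_label.items():
        support = count / n
        confidence = count / pair_total[pair]
        if support >= min_support and confidence >= min_confidence:
            rules[pair] = label
    return rules


# Toy labeled corpus (hypothetical data, for illustration only).
docs = [
    ["stock", "market", "rise"],
    ["stock", "market", "fall"],
    ["team", "match", "win"],
    ["team", "match", "market"],
]
labels = ["finance", "finance", "sports", "sports"]

weights = tfidf_weights(docs)
rules = compound_keywords(docs, labels)
```

In this toy corpus "market" appears in three of the four documents, so it receives a lower weight than "stock" in the first document, while the pair ("market", "stock") still identifies the finance category. This is the kind of effect a compound keyword is meant to capture.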