Sending and receiving e-mail has become one of the main modes of communication in the modern world. With the sharp rise in advertising e-mail, recipients' mailboxes are often flooded with messages they never asked for. In the past, all unsolicited commercial e-mail was simply classified as spam. However, a survey conducted by ALS in Taiwan between June 28 and July 28, 2006 found that 27.4% of respondents had completed a purchase after receiving an advertising e-mail. Some commercial e-mails therefore give recipients information and help they actually want, while others are a nuisance and a waste of time. This makes customizable e-mail classification the main theme of this research.

This thesis builds an e-mail classification system around two machine-learning methods: the C4.5 decision tree and the Probabilistic Neural Network (PNN). Conventional e-mail classifiers usually select keywords by how often they occur, but many high-frequency keywords do not truly represent the category they are meant to identify. This research therefore combines CKIP Chinese word segmentation with TF-IDF weighting to extract the keywords that genuinely characterize each category of e-mail. These keywords are used together with 14 sending features as the criteria for classifying e-mail.
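The contrast between frequency-based and TF-IDF-based keyword selection can be illustrated with a minimal Python sketch. The sketch assumes the e-mails have already been segmented into words. The thesis uses CKIP for that step, but the tokens here are toy English words. It also treats all e-mails of one category as a single document when computing TF-IDF, so a term that occurs in every category gets zero weight. This category-level weighting, the function name `category_keywords`, and the `top_k` cutoff are illustrative assumptions, not the thesis's exact formulation.

```python
import math
from collections import Counter


def category_keywords(docs_by_category, top_k=3):
    """Rank terms per category by TF-IDF (hypothetical helper).

    docs_by_category maps a category name to a list of e-mails, each e-mail
    already segmented into a list of words. ASSUMPTION: every category is
    pooled into one "document", so IDF is computed over categories.
    """
    # Term counts per category, pooling all e-mails of that category.
    counts = {cat: Counter(w for email in emails for w in email)
              for cat, emails in docs_by_category.items()}
    n_categories = len(counts)
    # Document frequency: the number of categories in which each term occurs.
    df = Counter(w for c in counts.values() for w in c)
    keywords = {}
    for cat, c in counts.items():
        total = sum(c.values())
        # TF = relative frequency in the category; IDF = log(N / df).
        scores = {w: (n / total) * math.log(n_categories / df[w])
                  for w, n in c.items()}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        # Drop zero-weight terms (those present in every category).
        keywords[cat] = [w for w in ranked[:top_k] if scores[w] > 0]
    return keywords


# Toy data: "free" is among the most frequent words in every category.
docs = {
    "travel": [["free", "flight", "discount"], ["flight", "hotel", "free"]],
    "shopping": [["free", "discount", "coupon"], ["coupon", "shoes", "free"]],
    "finance": [["free", "loan", "rate"], ["loan", "card", "free"]],
}
keywords = category_keywords(docs, top_k=2)
print(keywords)
# {'travel': ['flight', 'hotel'], 'shopping': ['coupon', 'shoes'], 'finance': ['loan', 'card']}
```

In this toy data, "free" appears in all three categories, so its IDF is log(3/3) = 0 and it is never selected. A purely frequency-based rule would have picked it for every category.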
Commercial e-mail is divided into nine customizable categories. The system is evaluated on five indicators: overall accuracy, normal e-mail precision, normal e-mail recall (detection rate), customizable e-mail precision, and customizable e-mail recall. The results show that the system also performs well when tested on personal, everyday e-mail.
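The five indicators can be computed from true and predicted labels as in the following sketch. The abstract does not give the exact formulas, so the sketch makes two assumptions. First, `"normal"` marks non-advertising mail. Second, a customizable e-mail counts as correct only when its predicted category matches its true category. The function name `evaluate` and the label strings are hypothetical.

```python
def evaluate(y_true, y_pred, normal="normal"):
    """Compute five indicators (hypothetical helper; assumed definitions).

    Labels equal to `normal` denote normal mail; any other label is one of
    the customizable categories. A customizable prediction is a hit only if
    the predicted category equals the true category.
    """
    pairs = list(zip(y_true, y_pred))

    def ratio(num, den):
        # Guard against empty denominators.
        return num / den if den else 0.0

    correct = sum(t == p for t, p in pairs)
    pred_normal = sum(p == normal for _, p in pairs)
    true_normal = sum(t == normal for t, _ in pairs)
    hit_normal = sum(t == p == normal for t, p in pairs)
    pred_custom = len(pairs) - pred_normal
    true_custom = len(pairs) - true_normal
    hit_custom = correct - hit_normal  # correct predictions that are not normal mail
    return {
        "overall_accuracy": ratio(correct, len(pairs)),
        "normal_precision": ratio(hit_normal, pred_normal),
        "normal_recall": ratio(hit_normal, true_normal),
        "custom_precision": ratio(hit_custom, pred_custom),
        "custom_recall": ratio(hit_custom, true_custom),
    }


y_true = ["normal", "normal", "travel", "shopping", "finance", "normal"]
y_pred = ["normal", "travel", "travel", "finance", "finance", "normal"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

With these toy labels, four of six predictions are correct. Both mails predicted as normal really are normal, so normal precision is 1.0. Only two of the four mails predicted as customizable carry the right category, so customizable precision is 0.5.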