Semantic Parsing Spam Filter

Please use this persistent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/34105

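Title:	Semantic Parsing Spam Filter (語意解析垃圾郵件過濾器)
Other title:	Semantic processing model for spam filter
Author:	謝文軒 (Hsieh, Wen-hsuan)
Contributor:	Master's Program, Department of Information Management, Tamkang University; 梁德昭 (Liang, Te-chao)
Keywords:	spam; semantic processing; feature extraction
Date:	2006
Uploaded:	2010-01-11 04:55:29 (UTC+8)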
Abstract:	As network infrastructure has matured, the online population has grown explosively and many convenient network applications have emerged. Spam is the negative side of this growth: its sheer volume and offensive content are a constant nuisance to everyday e-mail users. This study develops a client-side, semantics-based mail filter that handles messages in both Chinese and English, does not require building large mail blacklists or whitelists in advance, and supports adaptive (incremental) learning. The goal is high accuracy while keeping both the training and classification phases fast enough for practical use in real environments.
Mail filtering is similar to document classification. The first problem is extracting features whose number and characteristics adequately represent a message; an automatic classification algorithm then uses those features to decide whether the message is spam or legitimate mail (ham). For feature extraction, each message is first segmented into words, the result is passed through a stop-word filter based on part of speech, and the lexical features of spam are then extracted with a sliding window combined with keyword combinations. Classification uses a Bayesian classifier.
Because this feature-extraction method reaches into the semantic level, its accuracy is higher than that of keyword-based feature extraction; in the experiments, the filter reached 92% accuracy. The cost is speed: because of the semantic feature-extraction procedure, both the training and classification phases are slower than with keyword-based extraction.
Appears in collections:	[Department of Information Management] Theses and Dissertations
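The abstract names the pipeline (word segmentation, POS-based stop-word filtering, sliding-window keyword combinations, Bayesian classification) but not its parameters. The following is a minimal Python sketch of that kind of pipeline, not the thesis's implementation. It assumes segmentation and POS tagging are already done by a separate tool, uses a hypothetical stop-tag set `STOP_POS` and a hypothetical window size of 3, and interprets "keyword combinations" as pairs of words that co-occur inside the window. The classifier is a standard multinomial naive Bayes with add-one smoothing; `extract_features` and `NaiveBayesFilter` are illustrative names.

```python
import math
from collections import Counter, defaultdict

# Hypothetical set of POS tags treated as stop words (function words).
# The thesis's actual tag set is not given in the abstract.
STOP_POS = {"PARTICLE", "PREP", "CONJ", "PRON", "DET"}


def extract_features(tagged_tokens, window=3):
    """Build features from an already segmented and POS-tagged message.

    tagged_tokens: list of (word, pos) pairs; segmentation and tagging are
    assumed to be done upstream by a separate tool.
    window: hypothetical sliding-window size.
    Returns the kept words plus "left|right" pairs of words that co-occur
    inside the window.
    """
    # Step 1: POS-based stop-word filtering.
    words = [w.lower() for w, pos in tagged_tokens if pos not in STOP_POS]
    features = list(words)
    # Step 2: slide the window; pair each word with the next window-1 words.
    for i, left in enumerate(words):
        for right in words[i + 1 : i + window]:
            features.append(f"{left}|{right}")
    return features


class NaiveBayesFilter:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # messages seen per class
        self.feat_counts = defaultdict(Counter)  # feature frequencies per class
        self.total_feats = Counter()             # total feature count per class
        self.vocab = set()                       # every feature seen in training

    def train(self, features, label):
        # Only counters are updated, so calling train() on newly labelled
        # mail extends the model without retraining from scratch.
        self.doc_counts[label] += 1
        self.feat_counts[label].update(features)
        self.total_feats[label] += len(features)
        self.vocab.update(features)

    def log_score(self, features, label):
        # log P(label) + sum of log P(feature | label); unseen features are skipped.
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / n_docs)
        denom = self.total_feats[label] + len(self.vocab)
        for f in features:
            if f in self.vocab:
                score += math.log((self.feat_counts[label][f] + 1) / denom)
        return score

    def classify(self, features):
        # Return the class with the highest log-score.
        return max(self.doc_counts, key=lambda c: self.log_score(features, c))


if __name__ == "__main__":
    filt = NaiveBayesFilter()
    filt.train(extract_features([("free", "ADJ"), ("money", "N"), ("now", "ADV")]), "spam")
    filt.train(extract_features([("meeting", "N"), ("at", "PREP"), ("noon", "N")]), "ham")
    print(filt.classify(extract_features([("free", "ADJ"), ("money", "N")])))  # → spam
```

Because naive Bayes only needs per-class counts, `train` can be called on each newly labelled message at any time; this is one simple way to obtain the adaptive-learning property the abstract asks for. The window pairs also show where the extra cost comes from: each kept word yields up to `window - 1` additional features, which is consistent with the slower training and classification the abstract reports relative to keyword-only features.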
