Tamkang University Institutional Repository: Item 987654321/34105
    Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/34105


    Title: 語意解析垃圾郵件過濾器
    Other Titles: Semantic processing model for spam filter
    Authors: 謝文軒;Hsieh, Wen-hsuan
    Contributors: Master's Program, Department of Information Management, Tamkang University
    梁德昭;Liang, Te-chao
    Keywords: spam; semantic processing; feature extraction
    Date: 2006
    Issue Date: 2010-01-11 04:55:29 (UTC+8)
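    Abstract: As network infrastructure has matured, the online population has surged and many convenient network applications have emerged. Spam is a negative by-product of this growth: its sheer volume and offensive content are a constant nuisance. This study aims to develop a client-side mail filtering technique that handles both Chinese and English mail, does not require a large mail blacklist to be built in advance, and supports adaptive learning, achieving high accuracy while keeping the training and classification phases fast enough for use in real environments. Mail filtering is similar to document classification. The first problem is how to extract features that are sufficient, in both number and character, to represent a message; an automatic classification algorithm then uses these features to decide whether the message is spam. For feature extraction, this study takes the word-segmentation output, filters stop words mainly by part of speech, and applies a sliding window combined with keyword combinations to extract the textual features of spam. The classification algorithm is a Bayesian classifier. Because the feature extraction algorithm reaches into the semantic level, its accuracy is higher than that of keyword-based feature extraction. In our experiments the filtering mechanism reached an accuracy of 92%; however, because of the semantic feature extraction procedure, both the training phase and the classification phase are slower than with keyword-based feature extraction.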
    In the information age, networks provide many convenient applications, but spam is an unwelcome exception. The huge volume of spam and its offensive content disturb people who use e-mail in daily life. This thesis develops a client-side, semantic-based spam filter that handles mail in Chinese or English and does not need a large black/white list to be built in advance. It supports adaptive learning to reach a high precision rate while keeping the training and classification phases fast, so that it can be used in real environments. Mail filtering is similar to document classification. The first problem is how to extract enough features to represent a message accurately. Based on these features, an automatic classification algorithm then classifies the message as spam or ham. We use a sliding window to extract features and a Bayesian algorithm for classification. Because the feature extraction method reaches into the semantic layer, its precision rate is higher than that of keyword-based feature extraction.
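    The pipeline described in the abstract (part-of-speech stop-word filtering, sliding-window keyword combinations, Bayesian classification with incremental training) can be illustrated with a minimal sketch. This is not the thesis implementation: the tag set `STOP_POS`, the window size, the pairing rule, and the simplified naive Bayes scoring (which looks only at features present in the message) are all assumptions made for illustration, and the input is assumed to be already segmented and part-of-speech tagged.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical part-of-speech tags treated as stop words (assumption).
STOP_POS = {"DET", "PREP", "CONJ", "PRON", "PUNCT"}


def extract_features(tagged_tokens, window=3):
    """Return single words plus unordered word pairs that co-occur
    inside a sliding window over the stop-word-filtered tokens."""
    words = [w.lower() for w, pos in tagged_tokens if pos not in STOP_POS]
    features = set(words)
    for start in range(len(words)):
        span = words[start:start + window]
        for a, b in combinations(span, 2):
            if a != b:
                features.add("+".join(sorted((a, b))))
    return features


class NaiveBayesFilter:
    """Simplified naive Bayes over binary features with add-one smoothing.
    Calling train() repeatedly updates the counts incrementally."""

    def __init__(self):
        self.feature_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, tagged_tokens, label):
        self.message_counts[label] += 1
        self.feature_counts[label].update(extract_features(tagged_tokens))

    def spam_log_odds(self, tagged_tokens):
        n_spam = self.message_counts["spam"]
        n_ham = self.message_counts["ham"]
        # Smoothed log prior odds of spam versus ham.
        score = math.log((n_spam + 1) / (n_ham + 1))
        for feature in extract_features(tagged_tokens):
            p_spam = (self.feature_counts["spam"][feature] + 1) / (n_spam + 2)
            p_ham = (self.feature_counts["ham"][feature] + 1) / (n_ham + 2)
            score += math.log(p_spam / p_ham)
        return score

    def is_spam(self, tagged_tokens):
        return self.spam_log_odds(tagged_tokens) > 0


# Toy usage with made-up, pre-tagged messages.
spam_filter = NaiveBayesFilter()
spam_filter.train(
    [("buy", "VERB"), ("cheap", "ADJ"), ("pills", "NOUN"), ("now", "ADV")], "spam"
)
spam_filter.train(
    [("meeting", "NOUN"), ("at", "PREP"), ("noon", "NOUN"), ("tomorrow", "NOUN")], "ham"
)
print(spam_filter.is_spam([("cheap", "ADJ"), ("pills", "NOUN")]))
```

    Pair features such as `cheap+pills` carry more context than isolated keywords, which is the intuition behind the higher accuracy the abstract reports, at the cost of the extra extraction time it also notes.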
    Appears in Collections:[Graduate Institute & Department of Information Management] Thesis

