Tamkang University Institutional Repository: Item 987654321/111154
RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library & TKU Library IR team.


    Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/111154


    Title: 垃圾郵件分類及特徵選擇組合之分析研究
    Other title: Analysis of combinations of the spam classification and feature selection
    Author: Cheng, Yi-Teng (鄭奕騰)
    Contributors: Master's Program, Department of Information Management, Tamkang University (淡江大學資訊管理學系碩士班); Jou, Chichang (周清江)
    Keywords: e-mail categorization; concept drift; feature selection; combination analysis
    Date: 2016
    Upload time: 2017-08-24 23:45:20 (UTC+8)
    Abstract: Spam e-mail is mainly handled by spam classification: a set of feature words is first selected according to a selection metric, and a classification algorithm then decides whether an incoming e-mail is spam. The problem has nevertheless not been solved completely, so the characteristics of feature word selection metrics and classification algorithms need further analysis to achieve better classification effectiveness. This study uses two feature word selection metrics, TFIDF (Term Frequency–Inverse Document Frequency) and IG (Information Gain), and two classification algorithms, Weighted Naive Bayes and SVM (Support Vector Machine). The metrics and the algorithms are each applied independently, under the intersection operator, or under the union operator, and experiments compare the classification effectiveness of the resulting 16 combinations under concept drift. For the best combination in each experiment, we also analyse its classification effectiveness at different accumulated numbers of e-mails (that is, at different points in time) and discuss its overall stability.
    Appears in collections: [Department and Graduate Institute of Information Management] Theses
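
    The feature selection step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy corpus, the function names, the way TFIDF is aggregated per term (summed over documents), the top-k cutoff, and the rule for merging two classifier verdicts are all assumptions made for illustration. The sketch also assumes that the 16 combinations arise as 4 feature set options times 4 classifier options, which the abstract implies but does not spell out.

```python
import math
from collections import Counter
from itertools import product

# Toy corpus (hypothetical data, not from the thesis): tokenized e-mails and labels.
DOCS = [
    ["free", "money", "now"],
    ["free", "offer", "now"],
    ["meeting", "schedule", "now"],
    ["project", "meeting", "notes"],
]
LABELS = ["spam", "spam", "ham", "ham"]


def tfidf_scores(docs):
    """Per-term TFIDF summed over all documents (one possible aggregate; an assumption)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, tf in Counter(doc).items():
            scores[term] += tf * math.log(n / df[term])
    return scores


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain_scores(docs, labels):
    """Information gain of each term's presence/absence with respect to the class."""
    n = len(docs)
    base = entropy(labels)
    scores = {}
    for term in {t for doc in docs for t in doc}:
        present = [y for doc, y in zip(docs, labels) if term in doc]
        absent = [y for doc, y in zip(docs, labels) if term not in doc]
        conditional = (len(present) / n) * entropy(present) \
            + (len(absent) / n) * entropy(absent)
        scores[term] = base - conditional
    return scores


def top_k(scores, k):
    """The k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:k]}


def feature_sets(docs, labels, k):
    """Feature word sets: each metric alone, their intersection, and their union."""
    by_tfidf = top_k(tfidf_scores(docs), k)
    by_ig = top_k(info_gain_scores(docs, labels), k)
    return {
        "TFIDF": by_tfidf,
        "IG": by_ig,
        "TFIDF_AND_IG": by_tfidf & by_ig,
        "TFIDF_OR_IG": by_tfidf | by_ig,
    }


def combine_verdicts(wnb_is_spam, svm_is_spam, mode):
    """Hypothetical merge of two classifier verdicts (an assumption, not the thesis's rule)."""
    rules = {
        "WNB": wnb_is_spam,
        "SVM": svm_is_spam,
        "WNB_AND_SVM": wnb_is_spam and svm_is_spam,  # spam only if both agree
        "WNB_OR_SVM": wnb_is_spam or svm_is_spam,    # spam if either says so
    }
    return rules[mode]


FEATURE_MODES = ["TFIDF", "IG", "TFIDF_AND_IG", "TFIDF_OR_IG"]
CLASSIFIER_MODES = ["WNB", "SVM", "WNB_AND_SVM", "WNB_OR_SVM"]
# Assumed reading of "16 combinations": 4 feature options x 4 classifier options.
COMBINATIONS = list(product(FEATURE_MODES, CLASSIFIER_MODES))
```

    Calling `feature_sets(DOCS, LABELS, 2)` returns the four candidate feature word sets for the toy corpus, and `COMBINATIONS` lists every pairing of a feature option with a classifier option. The classifiers themselves are left out; in practice each pairing would be trained and evaluated on successive batches of e-mail to observe behaviour under concept drift.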


