    Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/74362

    Title: 順序型變數轉換在決策樹之應用
    Other Titles: Transformation of ordinal variables with applications in decision trees
    Authors: 陳宇邦;Chen, Yu-Pang
    Contributors: Master's Program, Department of Statistics, Tamkang University
    陳景祥;Chen, Ching-Hsiang
    Keywords: ordinal variable; decision tree; variable transformation; surrogate variables; Euclidean distance; data mining; CART; C4.5; QUEST
    Date: 2011
    Issue Date: 2011-12-28 18:24:50 (UTC+8)
    Abstract: In practical data mining analysis, we frequently encounter ordinal-scale variables. Many of them are created by researchers for convenience, by cutting a continuous variable into intervals, and this grouping discards part of the information carried by the original continuous variable. Conversely, when ordinal variables with numeric coding are analyzed, the traditional practice is to treat them as if they were continuous, even though the two kinds of variable do not carry the same amount of information. Conflating them in this way is not well justified.
      We therefore propose a method that transforms ordinal variables into weighted quasi-continuous variables by means of surrogate variables, plane coordinates, and Euclidean distances, with the aim of reducing the information loss that ordinal coding causes. The transformation is then applied to three decision tree algorithms: CART, C4.5, and QUEST. On several real-world data sets, the transformed quasi-continuous variables improve the classification accuracy of these decision trees, which suggests that they recover part of the information lost in the original ordinal variables.
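The abstract names the ingredients of the transformation (a surrogate variable, plane coordinates, Euclidean distance, weighting) but does not give the formula. The sketch below is only one possible reading, written to make the idea concrete; it is not the thesis's actual procedure. It assumes that each observation is placed in the plane as (scaled ordinal code, scaled surrogate value) and that the quasi-continuous score is the weighted Euclidean distance of that point from the origin. The function names, the min-max scaling, the `weight` parameter, and the toy data are all hypothetical.

```python
import math


def min_max_scale(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def quasi_continuous(ordinal, surrogate, weight=0.5):
    """Hypothetical transformation, NOT the thesis's formula.

    Each observation becomes a point (x, s) in the plane, where x is the
    scaled ordinal code and s is the scaled surrogate value. The score is
    the weighted Euclidean distance from the origin:
        sqrt(weight * x^2 + (1 - weight) * s^2)
    """
    if len(ordinal) != len(surrogate):
        raise ValueError("ordinal and surrogate must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    x = min_max_scale(ordinal)
    s = min_max_scale(surrogate)
    return [
        math.sqrt(weight * xi ** 2 + (1.0 - weight) * si ** 2)
        for xi, si in zip(x, s)
    ]


# Toy data (made up): three ordinal levels and a continuous surrogate.
ordinal = [1, 1, 2, 2, 3, 3]
surrogate = [10.0, 14.0, 18.0, 25.0, 31.0, 40.0]
scores = quasi_continuous(ordinal, surrogate)
```

Under this reading, observations that share an ordinal level receive different scores whenever their surrogate values differ, so a tree algorithm can place split points inside a level instead of only between levels.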
    Appears in Collections:[Graduate Institute & Department of Statistics] Thesis

    All items in the institutional repository are protected by copyright, with all rights reserved.