Transformation of Ordinal Variables with Applications in Decision Trees

Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/74362

Title:	順序型變數轉換在決策樹之應用
Other title:	Transformation of ordinal variables with applications in decision trees
Author:	Chen, Yu-Pang (陳宇邦)
Contributor:	Tamkang University, Department of Statistics, Master's Program; Chen, Ching-Hsiang (陳景祥)
Keywords:	ordinal variable; decision tree; variable transformation; surrogate variables; Euclidean distance; data mining; CART; C4.5; QUEST
Date:	2011
Uploaded:	2011-12-28 18:24:50 (UTC+8)
Abstract:	In practical data mining analysis, ordinal-scale variables come up frequently. Most ordinal variables are created by researchers for convenience, by cutting a continuous variable into intervals, and this reduction means the resulting ordinal variable loses part of the information in the original continuous variable. Moreover, when an ordinal variable with numeric coding is analyzed, the traditional practice is to treat it directly as a continuous variable, even though the two do not carry the same amount of information. This study therefore proposes a transformation for ordinal variables based on surrogate variables and the concept of plane coordinates: Euclidean distances are used to convert the original ordinal variable into a weighted quasi-continuous variable, with the aim of reducing the information loss caused by the ordinal coding. The transformation is then applied to three decision tree algorithms, CART, C4.5, and QUEST, for comparison. On several real-world data sets, the transformed quasi-continuous variables effectively improve the classification accuracy of these decision trees, which indicates that they can compensate for information lost in the original ordinal variables.
Appears in collections:	[Department of Statistics and Graduate Institute] Theses and Dissertations
