Parallelized CHAID Decision Tree for Processing Big Data

Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/102332

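Title:	Parallelized CHAID Decision Tree for Processing Big Data
Other Titles:	Paralleled CHAID decision tree algorithm with big-data capability
Author:	Tsai, Yu-Ju (蔡育儒)
Contributors:	Master's Program, Department of Statistics, Tamkang University; 陳景祥
Keywords:	data mining; classifiers; CHAID decision tree; parallelization
Date:	2014
Uploaded:	2015-05-04 09:53:09 (UTC+8)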
Abstract:	As technology advances, the era of Big Data has arrived. With data volumes growing rapidly, improving computing speed has become an important area of development. Shortening data processing and analysis time allows predictions and decisions to be made earlier, and parallel processing is one way to reduce that time. This study combines the decision tree method, which is widely used in data mining, with parallel computation. We rewrite the merging and variable-evaluation steps of the CHAID decision tree algorithm to use multi-core computation, which shortens tree construction time. Our simulation results show that when more than one CPU core is available, the computation time of the CHAID decision tree is markedly shorter than in the single-core case, and the time saved becomes more pronounced as the amount of data grows.
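The idea of spreading the variable-evaluation step across CPU cores can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`chi_square_statistic`, `best_predictor`, `best_predictor_parallel`) and the toy data layout are assumptions made for illustration. To stay within the standard library, it ranks predictors by the raw Pearson chi-square statistic instead of the Bonferroni-adjusted p-value that CHAID uses, and it omits the category-merging step entirely.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor


def chi_square_statistic(task):
    """Pearson chi-square statistic for one categorical predictor vs. the target."""
    name, predictor, target = task
    n = len(target)
    cell = Counter(zip(predictor, target))  # observed count per (category, class)
    row = Counter(predictor)                # marginal count per predictor category
    col = Counter(target)                   # marginal count per target class
    stat = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            stat += (cell.get((r, c), 0) - expected) ** 2 / expected
    return name, stat


def best_predictor(predictors, target, map_fn=map):
    """Score every predictor independently and return the (name, statistic) maximum.

    map_fn is the hook for parallelism: each predictor is a separate task,
    so any map-like function (serial or pool-based) can evaluate them.
    """
    tasks = [(name, values, target) for name, values in predictors.items()]
    results = list(map_fn(chi_square_statistic, tasks))
    return max(results, key=lambda item: item[1])


def best_predictor_parallel(predictors, target, workers=None):
    """Same selection, with one task per predictor spread over worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return best_predictor(predictors, target, map_fn=pool.map)
```

Because each predictor is scored independently of the others, the work divides naturally into one task per variable. On platforms that start worker processes by spawning, `best_predictor_parallel` should be called from under an `if __name__ == "__main__":` guard.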
Appears in Collections:	[Department of Statistics and Graduate Institute] Theses and Dissertations

Files in This Item:

File	Size	Format	Views
index.html	0Kb	HTML	511

All items in the institutional repository are protected by their original copyright.