English  |  正體中文  |  简体中文  |  全文筆數/總筆數 : 51258/86283 (59%)
造訪人次 : 8027735      線上人數 : 67
RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library & TKU Library IR team.
搜尋範圍 查詢小技巧:
  • 您可在西文檢索詞彙前後加上"雙引號",以獲取較精準的檢索結果
  • 若欲以作者姓名搜尋,建議至進階搜尋限定作者欄位,可獲得較完整資料
  • 進階搜尋
    請使用永久網址來引用或連結此文件: http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/34127

    題名: 有效率的分散式循序樣本探勘系統
    其他題名: An efficient distributed system for mining sequential patterns
    作者: 黃揚智;Huang, Yang-chih
    貢獻者: 淡江大學資訊管理學系碩士班
    張昭憲;Chang, Jau-shien
    關鍵詞: 分散式探勘;平行探勘;循序樣本探勘;資料探勘;distributed mining;parallel mining;sequential pattern mining;Data mining
    日期: 2006
    上傳時間: 2010-01-11 04:57:08 (UTC+8)
    摘要: 本研究以提昇循序樣本探勘(sequential pattern mining)之效率為目標,發展以多部電腦為基礎的分散式探勘系統。分散式探勘首重節點間的負載平衡,同時需降低節點間通訊負擔。負載平衡與通訊負擔的降低均有賴於準確的任務工作量預測,此外,也應避免透過繁複的任務重分配來解決負載失衡的狀況。
    針對上述重點,本研究首先針對相關文獻中常利用之工作量預測方法進行分析,發現以序列(sequence)支持度(support)為基礎之預測方式與實際狀況差異甚大,將嚴重影響負載平衡。此外,我們也發現不同特性之資料庫可能適用於不同之探勘演算法,此發現可導致全新的分散式演算法設計。根據上述結果,我們發展了一套新的分散式探勘演算法,Segmented Dynamic Load Balance algorithm (SDLB)。演算法中對於任務工作分配採取分段實施,並依狀況進行靜態與動態分配,期能在負載平衡與降低通訊負擔間取得協調。此外,本研究亦導入全新的演算法切換概念,能讓各節點依照分配任務之特性,選用合適之探勘演算法,以進一步提升總體效率。由實驗結果顯示,與前人研究相較,本系統能有效縮短執行時間,並取得更良好之加速比,此結果也顯示本系統具有處理超大型資料庫之潛力。
    In this thesis, an effective distributed mining system is developed to increase the efficiency of sequential pattern mining. Two important concerns for distributed mining, load balance and communication overhead reduction, are carefully examined in our works. To achieve a good load balance, accurate predictions on work load of subtasks are indispensable. In addition, the independency of subtask should be effectively controlled to decrease the communication overhead among nodes.
    According to above objectives, the work load prediction methods used in the previous works are analyzed at first. We finds that the commonly-used prediction method does not show a well metrics for work load. Eventually, a skew presented in task partition occurs frequently due to the inaccurate prediction. Besides, we also found that, the efficiency of performing a sequence mining algorithm on a transaction database is strongly related to the distribution of sequential patterns in this database. To increase the mining efficiency, we should choose a proper mining algorithm for each database but not apply a single algorithm for all databases with different features. Base on the above observations, a novel distributed data mining algorithm, Segmented Dynamic Load Balance algorithm (SDLB), is developed. Different previous works, SDLB divides subtask dispatches into several stages. According to different situations, the static and dynamic load balance method are applied adeptly to prevent the task partition from skew and reduce the communication overhead simultaneously. Furthermore, SDLB inducts a brand-new concept that the nodes may adopt different basic mining algorithm for subtasks with different characteristics to promote the mining efficiency. In comparison with the previous works, the experimental results shows SDLB can effectively reduce the runtime and obtain a better speed-up ratio. This result demonstrates the potentials of SDLB for mining sequential pattern in Very Large Data Bases (VLDBs).
    顯示於類別:[資訊管理學系暨研究所] 學位論文


    檔案 大小格式瀏覽次數



    DSpace Software Copyright © 2002-2004  MIT &  Hewlett-Packard  /   Enhanced by   NTU Library & TKU Library IR teams. Copyright ©   - 回饋