RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library & TKU Library IR team.
    Please use this permanent URL to cite or link to this document: http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/34166

    Title: 以動態任務分配為基礎之分散式循序樣本探勘系統
    Other title: A distributed sequential pattern mining system based on dynamic task partition
    Author: 周定賢; Chou, Ting-hsien
    Contributors: Master's Program, Department of Information Management, Tamkang University
    張昭憲; Chang, Jau-shien
    Date: 2005
    Uploaded: 2010-01-11 04:59:25 (UTC+8)
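    Abstract: Sequential pattern mining finds frequently occurring patterns in a database and also identifies the order in which the items of each pattern occur. Its complexity is far higher than that of association-rule-based market basket analysis. Many sequential pattern mining methods have been proposed [1,10-16], but as databases keep growing, the efficiency of these methods is once again being challenged. To improve mining efficiency on large databases, distributed mining, which links multiple computers over a network, has begun to attract attention [2][4].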
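The abstract defines sequential pattern mining in terms of frequent patterns and the order of their items. A subsequence-containment test makes that definition concrete. The sketch below is a minimal illustration under assumptions the abstract does not state:

- A data sequence is modeled as an ordered list of itemsets.
- A pattern is supported by a sequence when each of the pattern's itemsets is a subset of a strictly later itemset of that sequence.
- The toy database and the helper names `contains`, `support`, and `seq` are hypothetical.

```python
from typing import FrozenSet, List, Sequence

# Assumed data model: a sequence is an ordered list of itemsets
# (e.g. the items bought together in one transaction).
Itemset = FrozenSet[str]
Seq = List[Itemset]


def contains(data_seq: Seq, pattern: Seq) -> bool:
    """True if each itemset of `pattern` is a subset of a distinct itemset
    of `data_seq`, with the matches appearing in the same order."""
    pos = 0
    for element in pattern:
        # Greedily match this pattern element at the earliest possible position.
        while pos < len(data_seq) and not element <= data_seq[pos]:
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1  # the next pattern element must match strictly later
    return True


def support(db: Sequence[Seq], pattern: Seq) -> int:
    """Number of data sequences in `db` that contain `pattern`."""
    return sum(contains(s, pattern) for s in db)


def seq(*itemsets: str) -> Seq:
    """Hypothetical helper: seq("a", "bc") builds [{a}, {b, c}]."""
    return [frozenset(x) for x in itemsets]


# Toy database of four customer sequences (illustrative only).
db = [
    seq("a", "abc", "ac", "d", "cf"),
    seq("ad", "c", "bc", "ae"),
    seq("ef", "ab", "df", "c", "b"),
    seq("e", "g", "af", "c", "b", "c"),
]
```

Here `support(db, seq("a", "c"))` returns 4 and `support(db, seq("ab", "c"))` returns 2. A pattern counts as frequent when its support reaches a user-chosen minimum support threshold.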
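    To accelerate sequential pattern mining on large databases, this study develops an efficient mining algorithm on a distributed architecture and builds a practical mining system on top of it. First, this study proposes the concept of a task queue, which combines the advantages of static and dynamic task allocation. It mitigates the task-skew problem of static allocation and also reduces the frequent communication overhead of dynamic allocation. Second, to make the consolidation of results after mining more efficient, this study uses idle nodes to merge the mining results. In addition, we adopt PrefixSpan [1] as the base algorithm so that the independence among tasks can be effectively controlled. To evaluate system performance, we ran distributed mining experiments with 2, 4, 8, 16, and 32 computers. The results show that the system effectively reduces mining time and achieves a good speedup ratio. These results verify the effectiveness of the proposed method and show the system's potential for handling large databases.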
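PrefixSpan [1] grows patterns prefix by prefix and recurses into each prefix's projected database, which holds the suffixes that follow the prefix. A projected database contains everything needed to mine its prefix, so each one can be sent to a different node as a self-contained task. This is one way to read the abstract's point about task independence.

The sketch below rests on several assumptions:

- Every sequence element is a single item. Full PrefixSpan also handles itemsets.
- The work is split into one task per frequent 1-item prefix.
- `split_into_tasks` and `run_task` are hypothetical names, not the thesis's code.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Simplification: each element of a sequence is a single item, e.g. "abcb".
Pattern = Tuple[str, ...]
Task = Tuple[Pattern, int, List[List[str]]]  # (prefix, support, projected DB)


def project(db: List[List[str]], item: str) -> List[List[str]]:
    """For each sequence containing `item`, keep the suffix after its first occurrence."""
    projected = []
    for s in db:
        if item in s:
            suffix = s[s.index(item) + 1:]
            if suffix:  # an empty suffix cannot support a longer pattern
                projected.append(suffix)
    return projected


def frequent_items(db: List[List[str]], min_sup: int) -> List[Tuple[str, int]]:
    """Items contained in at least `min_sup` sequences (each sequence counted once)."""
    counts = Counter(item for s in db for item in set(s))
    return sorted((i, c) for i, c in counts.items() if c >= min_sup)


def prefixspan(db: List[List[str]], min_sup: int, prefix: Pattern = ()) -> Dict[Pattern, int]:
    """All frequent patterns extending `prefix`, mined from its projected DB `db`."""
    result: Dict[Pattern, int] = {}
    for item, sup in frequent_items(db, min_sup):
        pattern = prefix + (item,)
        result[pattern] = sup
        result.update(prefixspan(project(db, item), min_sup, pattern))
    return result


def split_into_tasks(db: List[List[str]], min_sup: int) -> List[Task]:
    """Hypothetical task split: one task per frequent item."""
    return [((item,), sup, project(db, item)) for item, sup in frequent_items(db, min_sup)]


def run_task(task: Task, min_sup: int) -> Dict[Pattern, int]:
    """Mine one task using only its own projected database (no shared state)."""
    prefix, sup, projected = task
    return {prefix: sup, **prefixspan(projected, min_sup, prefix)}


db = [list("abcb"), list("acb"), list("bac"), list("ab")]
merged: Dict[Pattern, int] = {}
for task in split_into_tasks(db, min_sup=2):  # each task could run on its own node
    merged.update(run_task(task, min_sup=2))
```

Merging the per-task results reproduces the single-machine result, because no task reads another task's data. For example, the pattern `("a", "b")` has support 3 in this toy database.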
    Sequential pattern mining discovers frequent patterns in a database and identifies the order in which the items of each pattern occur. It is more complex than traditional association rule mining. Many efficient sequential pattern mining methods have been proposed in the past few years. However, their efficiency is being challenged as real-life databases grow drastically in size. To alleviate the problem of mining large databases, researchers have begun to perform mining on distributed architectures.
    In this thesis, a distributed sequential pattern mining system is developed to speed up the mining process. The proposed system is based on a novel concept, the "task queue", which effectively reduces both the task skew of static task partition and the communication overhead of dynamic task partition. To collect the mining results efficiently, the first idle node is assigned to collect the resulting patterns. In addition, to maintain the independence of dispatched tasks, we adopt PrefixSpan as the base algorithm. Finally, we performed a series of experiments on 1, 2, 4, 8, 16, and 32 processors. The experimental results show a good speedup ratio, clearly demonstrating that the system has the potential to handle large databases.
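The abstract does not spell out how the task queue works, so the following is only a toy model of one plausible hybrid scheme:

- Static part: each node is first dealt a fixed share of tasks.
- Dynamic part: the remaining tasks stay in a shared queue, and idle nodes pull small batches from it.

This is not the thesis's algorithm. The parameters `initial`, `chunk`, and `latency`, the per-request cost model, and the task costs are all assumptions for illustration. Static dealing is treated as free. The printed speedup is the serial time (the sum of task costs) divided by the simulated makespan.

```python
import heapq
from typing import List, Tuple


def simulate(costs: List[float], workers: int, initial: int, chunk: int,
             latency: float = 0.0) -> Tuple[float, int]:
    """Toy simulation of a hybrid static/dynamic schedule (assumed model).

    Static part: each worker is dealt `initial` tasks round-robin up front.
    Dynamic part: remaining tasks wait in a shared queue; whenever a worker
    goes idle it requests the next `chunk` tasks, paying `latency` per request.
    Returns (makespan, number_of_queue_requests).
    """
    if chunk < 1:
        raise ValueError("chunk must be at least 1")
    n_static = min(initial * workers, len(costs))
    finish = [0.0] * workers
    for i in range(n_static):
        finish[i % workers] += costs[i]

    # Min-heap of (time the worker becomes idle, worker id).
    heap = [(t, w) for w, t in enumerate(finish)]
    heapq.heapify(heap)
    pos, requests = n_static, 0
    while pos < len(costs):
        t, w = heapq.heappop(heap)  # the earliest-idle worker asks for more work
        batch = costs[pos:pos + chunk]
        pos += len(batch)
        requests += 1
        heapq.heappush(heap, (t + latency + sum(batch), w))
    return max(t for t, _ in heap), requests


costs = [9.0] + [1.0] * 11  # one large task plus eleven small ones (made-up, skewed)
for name, initial, chunk in [("static", 3, 1), ("dynamic", 0, 1), ("hybrid", 1, 2)]:
    makespan, requests = simulate(costs, workers=4, initial=initial, chunk=chunk, latency=0.5)
    print(name, makespan, requests, round(sum(costs) / makespan, 2))
```

With these invented costs, the three schedules compare as follows:

- Static: finishes at 11.0 with no queue requests.
- Pure dynamic (one task per request): finishes at 9.5 with 12 requests.
- Hybrid: finishes at 9.0 with 4 requests.

This mirrors, in miniature, the trade-off between task skew and communication overhead that the abstract describes.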
    Appears in Collections: [Department and Graduate Institute of Information Management] Theses




