    Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/111175


    Title: 基於MapReduce程式架構下的分散式循序樣式探勘方法之研究
    Other title: A study of distributed sequential pattern mining method based on MapReduce programming model
    Author: 陳智翔 (Chen, Jhih-Siang)
    Contributors: Tamkang University, Department of Information Management, Master's Program; 徐煥智
    Keywords: Hadoop; MapReduce; sequential pattern; data mining
    Date: 2016
    Uploaded: 2017-08-24 23:45:53 (UTC+8)
    Abstract: Sequential pattern mining is a data mining method for discovering frequent sequential patterns in a large sequence database. Conventional sequential pattern mining methods fall into two categories: Apriori-like (candidate-generation) methods and pattern-growth methods. These algorithms mainly run in a standalone, single-machine environment, which leads to several drawbacks: long scanning times for large databases, limited scalability, and low efficiency on massive datasets. To improve both the performance and the scalability of sequential pattern mining, this study presents a distributed sequential pattern mining method based on the Hadoop platform and the MapReduce programming model.
    The mining task is decomposed into many distributed tasks. The Map function mines the sequential patterns in each subset of the database, and the Reduce function then merges all of the identified patterns. This simplifies the search space and achieves higher mining efficiency.
    The study further discusses how the user-specified minimum support threshold affects the distributed mining process. The experiments show that the threshold should be set differently in the Map and Reduce phases; otherwise some frequent patterns may be lost.
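The Map/Reduce decomposition and the threshold effect described in the abstract can be illustrated with a small, single-process simulation. This is a sketch under stated assumptions, not the thesis's implementation: sequences contain single items rather than itemsets, each mapper uses a PrefixSpan-style pattern-growth miner (our choice), the reducer simply sums the local counts it receives, and the names `map_phase`, `reduce_phase`, `map_ratio` and `reduce_ratio` are hypothetical. With the same relative threshold (0.5) in both phases, the globally frequent pattern ('a', 'b') (support 4 of 8) is dropped, because its single occurrence in the second partition falls below that partition's local threshold and is never emitted. Lowering the Map threshold recovers it.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]
SeqDB = List[List[str]]  # assumption: each event is a single item, not an itemset


def prefixspan(db: SeqDB, min_count: int, prefix: Pattern = ()) -> Dict[Pattern, int]:
    """Pattern-growth miner: all patterns with support >= min_count."""
    results: Dict[Pattern, int] = {}
    # Support = number of (projected) sequences containing the item at least once.
    counts = Counter()
    for seq in db:
        counts.update(set(seq))
    for item, count in counts.items():
        if count < min_count:
            continue
        pattern = prefix + (item,)
        results[pattern] = count
        # Project the database: keep the suffix after the first occurrence of item.
        projected = [seq[seq.index(item) + 1:] for seq in db if item in seq]
        results.update(prefixspan(projected, min_count, pattern))
    return results


def map_phase(partition: SeqDB, map_ratio: float) -> List[Tuple[Pattern, int]]:
    """Hypothetical mapper: mine one partition with a local relative threshold."""
    local_min = max(1, math.ceil(map_ratio * len(partition)))
    return list(prefixspan(partition, local_min).items())


def reduce_phase(mapped: List[List[Tuple[Pattern, int]]], total: int,
                 reduce_ratio: float) -> Dict[Pattern, int]:
    """Hypothetical reducer: sum emitted local counts, apply the global threshold."""
    global_min = math.ceil(reduce_ratio * total)
    merged: Dict[Pattern, int] = defaultdict(int)
    for outputs in mapped:
        for pattern, count in outputs:
            merged[pattern] += count
    return {p: c for p, c in merged.items() if c >= global_min}


def distributed_mine(partitions: List[SeqDB], map_ratio: float,
                     reduce_ratio: float) -> Dict[Pattern, int]:
    total = sum(len(p) for p in partitions)
    # On a cluster, each map_phase call would be a separate map task.
    mapped = [map_phase(p, map_ratio) for p in partitions]
    return reduce_phase(mapped, total, reduce_ratio)


partition_1 = [["a", "b", "c"], ["a", "c", "b"], ["a", "b"], ["c"]]
partition_2 = [["a", "b"], ["b", "a"], ["c", "a"], ["c", "b"]]

if __name__ == "__main__":
    same = distributed_mine([partition_1, partition_2], 0.5, 0.5)
    lower = distributed_mine([partition_1, partition_2], 0.25, 0.5)
    print(sorted(same))   # [('a',), ('b',), ('c',)]; ('a', 'b') is lost
    print(sorted(lower))  # [('a',), ('a', 'b'), ('b',), ('c',)]
```

Lowering `map_ratio` trades more intermediate output for completeness: at a local minimum count of 1, every pattern is emitted with its exact count, so the merged result matches mining the whole database at once. Another common design (not necessarily the one used in this thesis) keeps the thresholds equal and adds a second pass that recounts every candidate over the full database, as in the SON partition algorithm for frequent itemsets.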
    Appears in Collections: [Department and Graduate Institute of Information Management] Theses and Dissertations
