    Please use this identifier to cite or link to this item: http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/111175


    Title: 基於MapReduce程式架構下的分散式循序樣式探勘方法之研究
    Other Titles: A study of distributed sequential pattern mining method based on MapReduce programming model
    Authors: 陳智翔;Chen, Jhih-Siang
    Contributors: Tamkang University, Department of Information Management, Master's Program
    徐煥智
    Keywords: Hadoop; MapReduce; sequential pattern; data mining
    Date: 2016
    Issue Date: 2017-08-24 23:45:53 (UTC+8)
    Abstract: Sequential pattern mining is a data mining method for extracting frequent sequential patterns from large sequence databases. Conventional approaches fall into two categories: candidate-generation (Apriori-like) methods and pattern-growth methods. These algorithms are designed mainly for a single-machine environment, which leads to drawbacks on massive datasets: long database scanning times, limited scalability, and low efficiency. To improve the performance and scalability of sequential pattern mining, this study proposes a distributed sequential pattern mining method built on the Hadoop platform and the MapReduce programming model.
    The mining task is decomposed into many distributed tasks: the Map function mines all sequential patterns in its subset of the database, and the Reduce function then merges the patterns that were found. This reduces the search space and yields higher mining efficiency.
    The study also examines in more detail how the user-specified minimum support threshold affects the distributed mining process. The experiments indicate that the threshold should be set differently in the Map and Reduce phases; otherwise some frequent patterns may be lost.
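The interaction between the Map and Reduce thresholds can be illustrated with a small sketch. This is not the thesis's algorithm, which the abstract does not spell out: it runs in a single Python process rather than on Hadoop, it uses brute-force subsequence enumeration in place of a real miner, and the function names `local_patterns` and `merge_patterns`, the toy data, and the threshold values are all hypothetical. It also assumes one plausible reading of the finding, namely that the Map-phase threshold needs to be lower than the Reduce-phase one.

```python
from collections import Counter
from itertools import combinations


def local_patterns(partition, min_count, max_len=3):
    """Map step (sketch): count subsequences within one partition and
    keep those supported by at least min_count sequences."""
    counts = Counter()
    for seq in partition:
        seen = set()
        for k in range(1, min(max_len, len(seq)) + 1):
            # combinations() preserves order, so each index tuple
            # picks out a subsequence of seq.
            for idx in combinations(range(len(seq)), k):
                seen.add(tuple(seq[i] for i in idx))
        counts.update(seen)  # a sequence supports a pattern at most once
    return {p: c for p, c in counts.items() if c >= min_count}


def merge_patterns(partial_results, min_count):
    """Reduce step (sketch): sum the local counts per pattern and keep
    patterns whose summed support reaches min_count."""
    total = Counter()
    for partial in partial_results:
        total.update(partial)  # adds counts for matching keys
    return {p: c for p, c in total.items() if c >= min_count}


# Toy database split into two partitions (hypothetical data).
partitions = [
    [["a", "b", "c"], ["a", "b"], ["a", "c"]],
    [["a", "b"], ["b", "c"], ["c", "a"]],
]
global_min = 3  # pattern must occur in at least 3 of the 6 sequences

# Same proportional threshold in Map (2 of 3 sequences per partition).
same = merge_patterns([local_patterns(p, 2) for p in partitions], global_min)

# Relaxed Map threshold: every locally observed pattern is forwarded.
relaxed = merge_patterns([local_patterns(p, 1) for p in partitions], global_min)

print(sorted(same))
print(sorted(relaxed))
```

In this toy data the pattern `("a", "b")` occurs in three of the six sequences, so it is globally frequent. With the stricter Map threshold the second partition never reports its single occurrence, the summed count stays at two, and the pattern is dropped in Reduce. With the relaxed Map threshold the full count reaches Reduce and the pattern is kept. The trade-off is that a lower Map threshold forwards more candidate patterns to the Reduce phase.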
    Appears in Collections: [Department and Graduate Institute of Information Management] Theses and Dissertations

