

    Please use this permanent URL to cite or link to this document: http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/111175


    Title: 基於MapReduce程式架構下的分散式循序樣式探勘方法之研究
    Other title: A study of distributed sequential pattern mining method based on MapReduce programming model
    Author: 陳智翔; Chen, Jhih-Siang
    Contributors: Master's Program, Department of Information Management, Tamkang University
    徐煥智
    Keywords: Hadoop; MapReduce; sequential pattern (循序樣式); data mining (資料探勘)
    Date: 2016
    Uploaded: 2017-08-24 23:45:53 (UTC+8)
    Abstract: Sequential pattern mining is a data mining method for obtaining frequent sequential patterns from a large sequence database. Conventional sequential pattern mining methods fall into two categories: candidate-generation (Apriori-like) methods and pattern-growth methods. These algorithms are mainly executed in a standalone, single-machine environment, which leads to drawbacks such as long scanning times on large databases, limited scalability, and low efficiency on massive datasets. To improve the performance and scalability of sequential pattern mining, this study presents a distributed sequential pattern mining method based on the Hadoop platform and the MapReduce programming model.
    The mining task is decomposed into many distributed tasks: the Map function mines the sequential patterns in a subset of the database, and the Reduce function then merges all the identified patterns. This reduces the search space and achieves higher mining efficiency.
    This study also examines further how the user-specified minimum support threshold influences the distributed mining process. According to our experiments, the threshold should be set differently in the Map and Reduce phases; otherwise some frequent patterns may be lost.
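
The Map/Reduce split and the threshold issue described in the abstract can be illustrated with a small Python sketch. This is a toy single-process illustration, not the thesis's algorithm: the abstract does not give the actual mining procedure or threshold values, so the function names (`map_phase`, `reduce_phase`, `mine`), the brute-force subsequence enumeration, the `max_len` limit, and the example data are all assumptions made for illustration. Each sequence is simplified to a list of single items.

```python
from collections import Counter
from itertools import combinations


def subsequences(seq, max_len):
    """Return the distinct order-preserving subsequences of seq, up to max_len items."""
    found = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(seq)), k):
            found.add(tuple(seq[i] for i in idx))
    return found


def map_phase(partition, local_min_count, max_len=2):
    """Mine one partition and emit patterns whose local support meets the local threshold."""
    counts = Counter()
    for seq in partition:
        # A set is passed in, so each sequence supports a pattern at most once.
        counts.update(subsequences(seq, max_len))
    return {p: c for p, c in counts.items() if c >= local_min_count}


def reduce_phase(mapped, global_min_count):
    """Sum the per-partition counts and keep patterns meeting the global threshold."""
    total = Counter()
    for local in mapped:
        total.update(local)  # a dict is passed in, so counts are added
    return {p: c for p, c in total.items() if c >= global_min_count}


def mine(partitions, local_min_count, global_min_count):
    mapped = [map_phase(p, local_min_count) for p in partitions]
    return reduce_phase(mapped, global_min_count)


# Hypothetical data: 6 sequences split into 2 partitions of 3.
partitions = [
    [["a", "b"], ["a", "b"], ["a", "c"]],
    [["a", "b"], ["c", "a"], ["c", "d"]],
]

# Global threshold: 3 of 6 sequences (50%). The same 50% applied to a
# 3-sequence partition rounds up to a local count of 2.
same = mine(partitions, local_min_count=2, global_min_count=3)
relaxed = mine(partitions, local_min_count=1, global_min_count=3)

print(sorted(same))     # only ('a',) survives
print(sorted(relaxed))  # ('a',), ('a', 'b'), ('b',), ('c',)
```

In this example the pattern `('a', 'b')` is supported by 3 of the 6 sequences, so it is globally frequent. With the same relative threshold in both phases, the second partition discards its single occurrence before the Reduce step, the merged count drops to 2, and the pattern is lost. Relaxing the Map-side threshold recovers it, at the cost of emitting more candidates.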
    Appears in collections: [Department and Graduate Institute of Information Management] Theses

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    5

    All items in the institutional repository are protected by their original copyright.
