Tamkang University Institutional Repository: Item 987654321/87999
    Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/87999


    Title: 以查核點提升Hadoop雲端計算系統容錯效能之研究
    Other Titles: Using checkpoints to improve fault tolerance for Hadoop cloud computing environment
    Authors: 陳廷豪;Chen, Ting-Hau
    Contributors: Master's Program, Department of Computer Science and Information Engineering, Tamkang University
    林其誼;Lin, Chi-Yi
    Keywords: Checkpoint; Intermediate data; Data locality; Hadoop; MapReduce
    Date: 2012
    Issue Date: 2013-04-13 11:56:01 (UTC+8)
    Abstract: The MapReduce paradigm for large-scale, data-intensive computing has gained great popularity in recent years. Hadoop, an open-source implementation of MapReduce, can be set up easily and rapidly on commodity hardware to form a massive computing cluster. In such a large cluster, task failures and node failures are not anomalies, yet they have a substantial impact on Hadoop's performance. Although Hadoop automatically restarts failed tasks and compensates for slow tasks through speculative execution, many researchers have identified shortcomings in Hadoop's fault tolerance. In this research, we investigate how to strengthen fault tolerance during MapReduce execution so as to reduce the prolonged job completion times and overall performance degradation caused by failure recovery. We address the problem by designing a simple checkpointing mechanism for Map tasks. When the mechanism is enabled, the progress of processing an input data block is obtained from the Progress and Heartbeat reports of the cloud system; once that progress reaches a certain percentage, the mapper creates a checkpoint that stores the intermediate data produced so far. If the mapper fails after the checkpoint has been created, it can resume from the checkpointed progress instead of restarting the task from scratch. To further speed up recovery from node failures, we relocate the TaskTracker so that the recovering node has data locality, saving the time of moving the input data block. If all the nodes holding the input data block are busy, we preferentially select a node with rack locality for task replication and migration, which also accelerates recovery. Extensive simulations show that, although the proposed approach costs more storage space and network traffic, it outperforms native Hadoop in terms of job completion time.
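    To make the checkpoint idea concrete, below is a minimal Java sketch of a progress-threshold checkpoint for a Map task. It is an illustration only, not the thesis implementation and not the Hadoop API: the class name CheckpointingMapper, the file-based checkpoint store, and the 50% threshold are all assumptions made for this sketch. In the mechanism described in the abstract, the progress value would come from Hadoop's Progress and Heartbeat reports, and the checkpoint would persist the mapper's intermediate key/value data rather than just an input offset.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // Hypothetical illustration of the progress-threshold checkpoint idea;
    // not the thesis code and not a Hadoop API.
    public class CheckpointingMapper {
        // Assumed threshold: checkpoint once half the input block is done
        // (the actual percentage is a tunable parameter).
        private static final double CHECKPOINT_THRESHOLD = 0.5;

        private final Path checkpointFile;
        private long recordsProcessed = 0;
        private boolean checkpointed = false;

        public CheckpointingMapper(Path checkpointFile) throws IOException {
            this.checkpointFile = checkpointFile;
            if (Files.exists(checkpointFile)) {
                // A checkpoint exists: resume after the records it covers.
                recordsProcessed = Long.parseLong(
                        new String(Files.readAllBytes(checkpointFile)).trim());
                checkpointed = true;
            }
        }

        /** First record index still to be processed (0 if no checkpoint). */
        public long resumeOffset() {
            return recordsProcessed;
        }

        /** Process one input record; totalRecords approximates the block size. */
        public void processRecord(String record, long totalRecords) throws IOException {
            // ... the user-defined map function would run on `record` here ...
            recordsProcessed++;

            double progress = (double) recordsProcessed / totalRecords;
            if (!checkpointed && progress >= CHECKPOINT_THRESHOLD) {
                // Persist the progress reached so far (the thesis mechanism
                // also stores the intermediate data produced by the mapper).
                Files.write(checkpointFile,
                        Long.toString(recordsProcessed).getBytes());
                checkpointed = true;
            }
        }

        public static void main(String[] args) throws IOException {
            Path cp = Path.of("mapper-0.ckpt");
            CheckpointingMapper m = new CheckpointingMapper(cp);
            // Simulate a mapper that crashes after 60 of 100 records.
            for (long i = m.resumeOffset(); i < 60; i++) {
                m.processRecord("record-" + i, 100);
            }
            // A restarted mapper resumes from the checkpoint (record 50),
            // not from record 0.
            System.out.println("resume at record "
                    + new CheckpointingMapper(cp).resumeOffset());
        }
    }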
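    Similarly, the following is a hedged sketch of the locality-aware recovery policy described in the abstract: when re-executing a failed task, prefer an idle node that already holds a replica of the task's input block (data locality); if every replica holder is busy, prefer an idle node in the same rack as a replica (rack locality); only then fall back to an arbitrary idle node. This ordering minimizes how far the input block must move during recovery. All names here (Node, pickRecoveryNode) are hypothetical, not the Hadoop scheduler API.

    import java.util.*;

    // Hypothetical sketch of locality-aware node selection for recovery;
    // not the thesis code and not a Hadoop API.
    public class LocalityAwareScheduler {

        static class Node {
            final String name;
            final String rack;
            final boolean busy;
            final Set<String> localBlocks;   // blocks replicated on this node

            Node(String name, String rack, boolean busy, Set<String> localBlocks) {
                this.name = name;
                this.rack = rack;
                this.busy = busy;
                this.localBlocks = localBlocks;
            }
        }

        /** Pick a node to re-run a failed task whose input is `blockId`. */
        static Optional<Node> pickRecoveryNode(List<Node> nodes, String blockId) {
            // 1) Data locality: an idle node that already stores a replica,
            //    so the input block does not have to be moved at all.
            for (Node n : nodes) {
                if (!n.busy && n.localBlocks.contains(blockId)) {
                    return Optional.of(n);
                }
            }
            // 2) Rack locality: an idle node in the same rack as a replica
            //    holder, so the block copy stays within one rack.
            Set<String> replicaRacks = new HashSet<>();
            for (Node n : nodes) {
                if (n.localBlocks.contains(blockId)) {
                    replicaRacks.add(n.rack);
                }
            }
            for (Node n : nodes) {
                if (!n.busy && replicaRacks.contains(n.rack)) {
                    return Optional.of(n);
                }
            }
            // 3) Last resort: any idle node (block must cross racks).
            for (Node n : nodes) {
                if (!n.busy) {
                    return Optional.of(n);
                }
            }
            return Optional.empty();
        }
    }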
    Appears in Collections: [Graduate Institute & Department of Computer Science and Information Engineering] Thesis

    Files in This Item:

    File: index.html    Size: 0 KB    Format: HTML

    All items in the Tamkang University Institutional Repository are protected by copyright, with all rights reserved.

