    Please use this permanent URL to cite or link to this item: http://tkuir.lib.tku.edu.tw:8080/dspace/handle/987654321/105687


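    Title: Design of a Load-Balancing Algorithm for the HDFS Distributed File System (HDFS分散式檔案系統之負載均衡演算法設計)
    Other title: A load-balancing algorithm for Hadoop Distributed File System
    Author: Lin, Ying-Chen (林映辰)
    Contributors: Tamkang University, Master's Program, Department of Computer Science and Information Engineering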
    林其誼
    Keywords: Hadoop; Distributed file system; Load Balancing; Cloud Computing
    Date: 2015
    Upload time: 2016-01-22 15:02:40 (UTC+8)
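    Abstract: With the spread of Internet use and the falling cost of storage devices, many enterprises have begun to offer cloud services. Examples include the Hadoop Distributed File System (HDFS), Hadoop MapReduce, and HBase, released by the Apache Software Foundation; among these, HDFS is especially popular. HDFS has a master-slave architecture consisting of a single NameNode and multiple Racks that hold DataNodes. The NameNode is the master and manages the DataNodes and users, while the DataNodes are slaves and store the files.
    HDFS stores files in a distributed fashion: each file is split into fixed-size Blocks, and by default three Replicas of each Block are stored on different DataNodes. However, Block Placement, the policy by which HDFS stores Blocks, does not consider the storage utilization of the DataNodes, which can leave the storage load unbalanced, so the Balancer is needed to correct it. The Balancer is a NameNode tool that the administrator runs by command. It repeatedly moves Blocks from DataNodes with higher storage utilization to DataNodes with lower storage utilization until every DataNode's storage utilization lies within the average plus or minus a threshold. While it runs, however, the Balancer must move a large number of Blocks, which consumes network resources. Moreover, previous research has identified the NameNode as the performance bottleneck of the system, so having the NameNode run the Balancer further degrades HDFS performance.
    Therefore, this research designs a new storage load-balancing algorithm that addresses all factors affecting storage utilization. The algorithm adds a new node, the BalanceNode, which pairs DataNodes with high storage utilization with DataNodes with low storage utilization, so that the latter can take over part of the storage load of the former.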
    With the advancement of the Internet and increasing data demands, many enterprises are offering cloud services to their customers. Among the various cloud computing platforms, the Apache Hadoop project has been widely adopted by large organizations and enterprises. In the Hadoop ecosystem, Hadoop Distributed File System (HDFS), Hadoop MapReduce, and HBase are the open-source equivalents of Google's Google File System (GFS), MapReduce framework, and BigTable, respectively. Because it meets the need for horizontally scalable storage in the big data era, HDFS has received significant attention from researchers. An HDFS cluster follows a master-slave architecture: each cluster has a single NameNode and a number of DataNodes. The NameNode is the master, responsible for managing the DataNodes and client accesses. The DataNodes are slaves, responsible for storing data.
    As the name suggests, HDFS stores files in a distributed manner. Files are divided into fixed-size blocks, and in the default configuration each block has three replicas stored on three different DataNodes to provide fault tolerance. However, Hadoop's default strategy for allocating new blocks does not take the DataNodes' utilization into account, which can lead to load imbalance in HDFS. To cope with this problem, the NameNode has a built-in tool called Balancer, which is executed by the system administrator. Balancer iteratively moves blocks from DataNodes with high utilization to those with low utilization until each DataNode's disk utilization falls within a configurable range centered on the average utilization. The primary cost of using Balancer is the bandwidth consumed while moving blocks. In addition, previous research shows that the NameNode is the performance bottleneck of HDFS, so frequent execution of Balancer by the NameNode may further degrade the performance of HDFS.
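    The stopping rule described above (every DataNode within the average utilization plus or minus a threshold) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the actual HDFS Balancer: the function names, the fixed `block_size`, and the "fullest node to emptiest node" selection rule are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple


def utilization(used: Dict[str, int], capacity: Dict[str, int]) -> Dict[str, float]:
    """Per-node disk utilization as a fraction of that node's capacity."""
    return {node: used[node] / capacity[node] for node in used}


def balance(
    used: Dict[str, int],
    capacity: Dict[str, int],
    threshold: float,
    block_size: int = 1,
    max_moves: int = 10_000,
) -> Tuple[List[Tuple[str, str]], Dict[str, int]]:
    """Simulate threshold-based balancing.

    Assumption (not from the source): each step moves one block from the
    most-utilized node to the least-utilized node. Returns the list of
    (source, target) moves and the final per-node usage.
    """
    used = dict(used)  # work on a copy so the caller's data is untouched
    moves: List[Tuple[str, str]] = []
    # Cluster-wide average utilization; constant because moves preserve total usage.
    average = sum(used.values()) / sum(capacity.values())

    for _ in range(max_moves):
        util = utilization(used, capacity)
        # Stop once every node lies within average +/- threshold.
        if all(abs(u - average) <= threshold for u in util.values()):
            break
        source = max(util, key=util.get)
        target = min(util, key=util.get)
        if source == target or used[source] < block_size:
            break  # nothing left to move
        used[source] -= block_size
        used[target] += block_size
        moves.append((source, target))
    return moves, used


if __name__ == "__main__":
    capacity = {"dn1": 100, "dn2": 100, "dn3": 100}
    used = {"dn1": 90, "dn2": 50, "dn3": 10}
    moves, final = balance(used, capacity, threshold=0.12, block_size=5)
    print(len(moves), final)
```

    The number of moves returned is a rough proxy for the bandwidth cost mentioned above: a tighter threshold yields a more even cluster but requires more block transfers.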
    Therefore, in this research we design a new load-balancing algorithm that considers all the situations that may influence the load-balancing state. The proposed algorithm introduces a new role, the BalanceNode, which matches heavily loaded nodes with lightly loaded nodes so that the lightly loaded nodes can take over part of the load from the heavily loaded ones. The simulation results show that our algorithm not only achieves a well-balanced state in HDFS but also does so at minimal movement cost.
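    The abstract does not specify how the BalanceNode matches nodes, so the following is only a hypothetical sketch of one possible pairing rule: classify nodes as heavily or lightly loaded relative to the average plus or minus a threshold, then greedily pair the most loaded node with the least loaded one. The function name `match_nodes`, the greedy rule, and the use of a plain mean of utilizations (which assumes equal-capacity nodes) are assumptions, not the thesis's method.

```python
from typing import Dict, List, Tuple


def match_nodes(util: Dict[str, float], threshold: float) -> List[Tuple[str, str]]:
    """Pair heavily loaded nodes with lightly loaded nodes.

    Hypothetical rule: a node is "heavy" if its utilization exceeds
    average + threshold and "light" if it is below average - threshold.
    The heaviest node is paired with the lightest, the second heaviest
    with the second lightest, and so on. Unpaired nodes are left out.
    """
    # Plain mean of utilizations; assumes all nodes have equal capacity.
    average = sum(util.values()) / len(util)
    heavy = sorted(
        (node for node, u in util.items() if u > average + threshold),
        key=lambda node: util[node],
        reverse=True,  # most loaded first
    )
    light = sorted(
        (node for node, u in util.items() if u < average - threshold),
        key=lambda node: util[node],  # least loaded first
    )
    return list(zip(heavy, light))


if __name__ == "__main__":
    util = {"dn1": 0.9, "dn2": 0.8, "dn3": 0.5, "dn4": 0.2, "dn5": 0.1}
    print(match_nodes(util, threshold=0.1))
```

    Doing this pairing on a separate node keeps the matching work off the NameNode, which is the motivation the abstract gives for introducing the BalanceNode.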
    Appears in collections: [Department of Computer Science and Information Engineering] Theses

    Files in this item:

    File        Description  Size  Format  Views
    index.html               0Kb   HTML    53

    All items in the institutional repository are protected by their original copyright.
