Using Software-Defined Networking to Improve the Performance of Hadoop Clusters


Please use this identifier to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/105697

Title:	以軟體定義網路改善Hadoop叢集之運作效能
Other Titles:	Using software-defined networking to improve the performance of Hadoop clusters
Authors:	廖振宇;Liao, Jhen-Yu
Contributors:	Master's Program, Department of Computer Science and Information Engineering, Tamkang University;林其誼;Lin, Chi-Yi
Keywords:	Hadoop;Shuffle;MapReduce;Software-Defined Networking
Date:	2015
Issue Date:	2016-01-22 15:02:56 (UTC+8)
Abstract:	With the rapid development of Internet technologies, the era of Big Data has arrived. To handle vast amounts of data, many enterprises and organizations worldwide are adopting the cloud computing model. The massive computing and storage capacity of cloud computing comes from large clusters of servers in data centers, most of which use Hadoop MapReduce as their core data-processing tool. Previous research has pointed out that transferring intermediate data to the Reducers during the shuffle phase of MapReduce can congest the data center network and thus degrade overall computation performance. To address this issue, some studies have given preliminary evidence that introducing Software-Defined Networking (SDN) into a Hadoop cluster is a feasible solution. Specifically, if MapReduce job scheduling is combined with SDN, network resources can be adjusted dynamically to prevent congestion during the shuffle phase, so that MapReduce jobs complete faster. In this research we therefore build a small-scale experimental Hadoop cluster with two Open vSwitches and one Floodlight Controller and run distributed MapReduce computation on it. Exploiting SDN's ability to control network bandwidth, we match shuffle traffic to a flow entry with a higher transmission rate, so that packets carrying intermediate data are sent as fast as possible even when the network is congested, which shortens the shuffle phase and reduces Hadoop MapReduce execution time. To test the idea, we designed four experiments and compared their results. During the experiments we inject additional traffic to interfere with the MapReduce computation, and the results show that accelerating shuffle packets significantly reduces MapReduce execution time even under this interference. Finally, because flow entries in an SDN environment are normally created and deleted from a terminal, we designed an SDN App for Hadoop MapReduce that lets users create and delete flow entries quickly and conveniently. The App is built with Node.js and the REST APIs provided by the Floodlight Controller.
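The flow-entry idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation (the record says the App was written with Node.js, and gives no code); it only shows what a request body for a Floodlight static flow entry that steers shuffle traffic into a faster queue might look like. Everything specific here is an assumption: the field names follow the Floodlight Static Flow Pusher conventions as commonly documented for v1.x, the REST path varies between Floodlight versions, port 13562 is the usual default of Hadoop 2.x's `mapreduce.shuffle.port`, and the switch DPID, queue ID, and output port are hypothetical values.

```python
import json
import urllib.request

# Assumption: Hadoop 2.x default for mapreduce.shuffle.port. The thesis
# record does not say which port or match fields were actually used.
SHUFFLE_PORT = 13562

# Assumption: Floodlight v1.x Static Flow Pusher path on the default REST
# port; other Floodlight versions use a different path.
PUSHER_URL = "http://127.0.0.1:8080/wm/staticflowpusher/json"


def shuffle_flow_entry(switch_dpid, name, out_port, queue_id,
                       shuffle_port=SHUFFLE_PORT, priority=32768):
    """Build a static flow entry that sends shuffle traffic to a given queue.

    Shuffle data is served by the map-side node, so the bulk data packets
    carry the shuffle port as their TCP *source* port (an assumption about
    which direction is worth accelerating).
    """
    return {
        "switch": switch_dpid,
        "name": name,
        "priority": str(priority),
        "eth_type": "0x0800",   # IPv4
        "ip_proto": "0x06",     # TCP
        "tcp_src": str(shuffle_port),
        "active": "true",
        # Hypothetical queue/port; the queue must already exist on the switch.
        "actions": f"set_queue={queue_id},output={out_port}",
    }


def delete_payload(name):
    """Body for removing a previously pushed entry by its name."""
    return {"name": name}


def send(payload, method="POST", url=PUSHER_URL):
    """Send a payload to the controller (POST to create, DELETE to remove)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

With a controller running, `send(shuffle_flow_entry("00:00:00:00:00:00:00:01", "shuffle-fast", out_port=2, queue_id=1))` would create the entry and `send(delete_payload("shuffle-fast"), method="DELETE")` would remove it, assuming the controller accepts these field names. Building the payload separately from sending it keeps the matching logic easy to inspect without a live controller.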
Appears in Collections:	[Department of Computer Science and Information Engineering] Theses and Dissertations

