Investigating Data Fault Tolerance and Recovery in Cloud Computing Systems

Please use this permanent URL to cite or link to this item: https://tkuir.lib.tku.edu.tw/dspace/handle/987654321/88128

Title:	雲端運算系統中資料容錯與恢復之探討
Other title:	Investigating data fault tolerance and recovery in cloud computing systems
Author:	王竣星; Wang, Chung-Hsing
Contributors:	Master's Program, Department of Electrical Engineering, Tamkang University; 莊博任
Keywords:	Fault-tolerance; Diskless Checkpoint; Exclusive-OR; Recovery; Backup
Date:	2013
Uploaded:	2013-04-13 12:02:03 (UTC+8)
Abstract:	With the rapid progress of information technology, the volume of transmitted data is growing many times over, which makes preserving data all the more important. When a server fails, another server can be used immediately to repair the lost data and ensure its correctness. Checkpointing is an important backup technique in fault-tolerant systems; the backup data and the recovery time differ depending on the backup scheme used.
MA [11] improves on the traditional neighbor-based scheme by using mutual assistance to raise the fault-tolerance rate, which can reach $k/(2k+1)$, where $k$ is the number of failures. DDC [12] computes the backup with matrices: given a fixed square matrix $M$ and the original data $B$, the encoded checkpoint $C$ is produced by the multiplication $MB = C$. When $B$ is lost, it is recovered from $M^{-1}MB = M^{-1}C$, that is, $B = M^{-1}C$. The G.J. code [13] is based on the interval distance between any two servers, that is, the difference between their ID numbers. These differences are used to generate a partial sum restricted (PSR) sequence, in which no number is repeated and no sum of two or more numbers equals any later number in the sequence. Under the PSR strategy, the data of a failed server can be recovered immediately.
The method we propose places six servers in a group and numbers the servers within each group. Each server $P_i$ sends its data to the buffers of servers $P_{i+2}$ and $P_{i+3}$. Each group can tolerate failures of any three or fewer of its servers, so the fault-tolerance rate reaches 3/6 (50%). Because the groups run in parallel and do not affect one another, the backup and recovery time is the same as for a single group no matter how many groups there are, and each buffer needs only one XOR operation. We built a simulation framework in C; the simulation results show that the proposed method achieves a considerably higher fault-tolerance success rate than the other methods and takes less time. The grouping concept also addresses a drawback of the other schemes, which require a large amount of computation in the buffers, and thereby saves computation time.
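The backup step of the proposed method can be illustrated with a minimal C sketch. It is not the author's simulation code, and it makes several assumptions the abstract does not state: server numbers wrap around modulo six, each buffer holds the XOR of the two data words it receives (one XOR per buffer), and a failed server loses both its data and its buffer. The names `build_buffers` and `recover_group` are hypothetical. The recovery loop is a simple peeling procedure chosen for illustration; it does not claim to reproduce the thesis's recovery procedure or its 3/6 guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GROUP_SIZE 6 /* six servers per group, as stated in the abstract */

/* Backup step: server i sends its data to the buffers of servers i+2 and i+3.
 * ASSUMPTION: indices wrap modulo GROUP_SIZE, and each buffer stores the XOR
 * of the two data words it receives, so buffer[j] = data[j-2] ^ data[j-3]. */
void build_buffers(const uint32_t data[GROUP_SIZE], uint32_t buffer[GROUP_SIZE])
{
    assert(data != NULL && buffer != NULL);
    for (int j = 0; j < GROUP_SIZE; j++) {
        int sender_a = (j + GROUP_SIZE - 2) % GROUP_SIZE; /* j is sender_a + 2 */
        int sender_b = (j + GROUP_SIZE - 3) % GROUP_SIZE; /* j is sender_b + 3 */
        buffer[j] = data[sender_a] ^ data[sender_b];      /* a single XOR */
    }
}

/* Illustrative recovery (ASSUMPTION, not the thesis's procedure).
 * Under the layout above:
 *   buffer[f+2] = data[f] ^ data[f-1]
 *   buffer[f+3] = data[f] ^ data[f+1]
 * so data[f] can be rebuilt whenever one of those buffers survived and the
 * other operand is known. The loop repeats until no more words can be rebuilt.
 * failed[i] is true if server i lost both its data and its buffer.
 * Returns the number of data words that could not be rebuilt. */
int recover_group(uint32_t data[GROUP_SIZE],
                  const uint32_t buffer[GROUP_SIZE],
                  const bool failed[GROUP_SIZE])
{
    bool known[GROUP_SIZE];
    for (int i = 0; i < GROUP_SIZE; i++) {
        known[i] = !failed[i];
    }

    bool progress = true;
    while (progress) {
        progress = false;
        for (int f = 0; f < GROUP_SIZE; f++) {
            if (known[f]) {
                continue;
            }
            int buf2 = (f + 2) % GROUP_SIZE;
            int prev = (f + GROUP_SIZE - 1) % GROUP_SIZE;
            int buf3 = (f + 3) % GROUP_SIZE;
            int next = (f + 1) % GROUP_SIZE;

            if (!failed[buf2] && known[prev]) {
                data[f] = buffer[buf2] ^ data[prev];
                known[f] = true;
                progress = true;
            } else if (!failed[buf3] && known[next]) {
                data[f] = buffer[buf3] ^ data[next];
                known[f] = true;
                progress = true;
            }
        }
    }

    int missing = 0;
    for (int i = 0; i < GROUP_SIZE; i++) {
        if (!known[i]) {
            missing++;
        }
    }
    return missing;
}
```

For example, if only server 4 fails, its data is rebuilt as `buffer[0] ^ data[3]`, since under these assumptions `buffer[0]` holds `data[4] ^ data[3]`.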
Appears in collections:	[Department and Graduate Institute of Electrical Engineering] Theses and Dissertations

Files in this item:

File	Size	Format	Views
index.html	0Kb	HTML	280

All items in the institutional repository are protected by the original copyright.
