In cloud computing systems that handle huge volumes of data, fault tolerance is critically important. To enhance data fault tolerance in cloud systems, this paper introduces a new group-based data backup and recovery scheme. The scheme uses efficient diskless checkpointing to maintain data correctness through alternative processors when a processor fails. The basic idea is to place six processors in a transmission group, with each processor sending its data to only two member processors. In the face of processor failure, this practice reduces the required data backup volume and recovery time, and achieves a fault-tolerance ratio of up to 3/6. Our scheme attains this performance gain mainly because (1) each processor receives only two pieces of backup data from its group and hence performs only one XOR during data backup, and (2) all groups work independently and in parallel, so the required data backup and recovery time is reduced to that of a single group. To compare the performance of our scheme with related schemes, we carry out extensive simulation runs, whose results indicate improved survival counts, fault-tolerance ratios, and computation overhead for our scheme.
Journal of Information Science and Engineering 33(1), pp. 183-198
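
The group mechanism described in the abstract can be illustrated with a minimal sketch. The abstract does not say which two members each processor sends to, so the ring assignment below (processor i sends to processors i+1 and i+2) is an assumption made purely for illustration. The function names `backup` and `recover` are likewise hypothetical, and each processor's checkpoint is modeled as a single integer. Under this assumed layout every processor receives exactly two pieces of data and performs one XOR, as the abstract states.

```python
GROUP_SIZE = 6  # six processors per transmission group, as in the abstract


def backup(data):
    """Return the parity word held by each processor.

    Assumed layout (not taken from the paper): processor i sends its data to
    processors i+1 and i+2, so processor i receives data from i-1 and i-2 and
    stores their XOR. That is two received items and one XOR per processor.
    """
    n = len(data)
    return [data[(i - 1) % n] ^ data[(i - 2) % n] for i in range(n)]


def recover(data, parity, failed):
    """Try to rebuild the data of failed processors from surviving parities.

    A failed processor loses both its own data and the parity it held
    (diskless checkpointing keeps both in memory). Returns the full data
    list, or None if this sketch cannot recover the failure pattern.
    """
    n = len(data)
    alive = {i for i in range(n) if i not in failed}
    known = {i: data[i] for i in alive}  # only survivors' data is readable

    progress = True
    while progress and len(known) < n:
        progress = False
        for j in range(n):
            if j in known:
                continue
            # parity[j+1] = d[j] ^ d[j-1]   and   parity[j+2] = d[j+1] ^ d[j]
            candidates = (
                ((j + 1) % n, (j - 1) % n),
                ((j + 2) % n, (j + 1) % n),
            )
            for holder, partner in candidates:
                if holder in alive and partner in known:
                    known[j] = parity[holder] ^ known[partner]
                    progress = True
                    break

    if len(known) < n:
        return None
    return [known[i] for i in range(n)]


if __name__ == "__main__":
    checkpoints = [11, 22, 33, 44, 55, 66]
    parities = backup(checkpoints)
    # Three non-adjacent failures out of six.
    print(recover(checkpoints, parities, failed={0, 2, 4}))
```

In this sketch, three alternating failures in a group of six are recoverable, which is consistent with the "up to 3/6" ratio, while three adjacent failures are not. Which failure patterns the published scheme tolerates depends on its actual assignment and should be taken from the paper itself.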