Publication | Closed Access
Heterogeneity-aware data regeneration in distributed storage systems
Citations: 35 | References: 14 | Year: 2014
Unknown Venue
Cluster Computing, Flexible Regeneration Scheme, Engineering, Storage Management, Network Analysis, Heterogeneity-aware Data Regeneration, Storage Systems, Data Science, Network Traffic, Combinatorial Optimization, Network Optimization, Data Management, File Systems, Distributed Systems, Computer Science, Network Science, Graph Theory, Cloud Computing, Business, Distributed Data Store, Big Data
Distributed storage systems provide large-scale, reliable data storage services by spreading redundancy across a large group of storage nodes. In such large systems, node failures occur on a regular basis. When a node fails or leaves the system, the redundant data should be regenerated at a replacement node as soon as possible to maintain the same level of redundancy. Previous studies aim to minimize the network traffic in the regeneration process, but in practical networks, where link capacities vary over a wide range, minimizing network traffic does not always mean minimizing regeneration time. Considering the heterogeneous link capacities, Li et al. proposed a tree-structured regeneration scheme, called RCTREE, to bypass the low-capacity links encountered in direct transmissions. However, we find that RCTREE may rapidly lose data integrity after several regenerations. In this paper, we reconsider the problem of minimizing regeneration time in networks with heterogeneous link capacities. We derive the minimum amount of data that must be transmitted through each link to preserve data integrity. We prove that building an optimal regeneration tree is NP-complete and propose a heuristic algorithm that finds a near-optimal solution. We further introduce a flexible regeneration scheme, which allows providers to generate different amounts of coded data. Simulation results show that the flexible tree-structured regeneration scheme can reduce the regeneration time significantly.
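The gap between minimizing traffic and minimizing regeneration time can be illustrated with a small numeric sketch. It is not the paper's algorithm: the node names, data sizes, and link capacities are hypothetical, the relay simply forwards data without any coding, and the time model (all links transfer in parallel, so the slowest link determines the total time) is an assumption made for illustration.

```python
def regeneration_time(links):
    """Return the regeneration time in seconds.

    links: list of (data_mb, capacity_mb_per_s) tuples, one per used link.
    Assumed model: all links transfer in parallel, so the slowest one dominates.
    """
    return max(data / capacity for data, capacity in links)


def total_traffic(links):
    """Return the total amount of data (MB) sent over all used links."""
    return sum(data for data, _ in links)


# Hypothetical scenario: providers A and B each hold 64 MB needed by newcomer N.
# Direct (star) transfer: A->N is fast, B->N is a low-capacity link.
direct = [(64, 100), (64, 10)]

# Tree transfer: B sends its 64 MB to A over a fast link, and A relays it
# together with its own 64 MB to N (plain forwarding, no coding assumed).
tree = [(64, 80), (128, 100)]

print(total_traffic(direct), regeneration_time(direct))
print(total_traffic(tree), regeneration_time(tree))
```

In this made-up instance the tree moves more data in total than the direct transfer, yet finishes sooner because it avoids the slow B->N link. How much data each link must actually carry to preserve data integrity is what the paper derives, and is not modeled here.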