Concepedia

Publication | Closed Access

Analysis of Long-Running Replicated Systems

97

Citations

14

References

2006

Year

Abstract

We address the problem of using replication to reliably maintain state in a distributed system for time spans that far exceed the lifetimes of individual replicas. This scenario is relevant for any system comprised of a potentially large and selectable number of replicated components, each of which may be highly unreliable, where the goal is to have enough replicas to keep the system "alive" (meaning at least one replica is working or available) for a certain expected period of time, i.e., the system's lifetime. In particular, this applies to recent efforts to build highly available storage systems based on the peer-to-peer paradigm. We model notions of replica loss and replica repair in such systems by a simple Markov chain model, and derive an expression for the lifetime of the replicated state. We then apply this model to study the impact of practical considerations like storage and bandwidth limits on the system, and describe methods to optimally choose system parameters so as to maximize lifetime. Our analysis sheds light on the efficacy of various replication strategies.

References

YearCitations

Page 1