Publication | Open Access
Practical scalable consensus for pseudo-synchronous distributed systems
Citations: 19
References: 27
Year: 2015
Keywords: Cluster Computing, Engineering, Verification, Fault Tolerance, Practical Scalable Consensus, Fault-tolerant Messaging, Formal Verification, Self-stabilization, Era Algorithm, Byzantine Fault, Synchronization Protocol, Distributed Environment, Systems Engineering, Parallel Computing, Consensus Algorithm, Computer Engineering, Distributed Systems, Computer Science, Distributed Computing, Cloud Computing, Formal Methods, Parallel Programming, Agreement Algorithm
The ability to consistently handle faults in a distributed environment requires, among a small set of basic routines, an agreement algorithm allowing surviving entities to reach a consensual decision between a bounded set of volatile resources. This paper presents an algorithm that implements an Early Returning Agreement (ERA) in pseudo-synchronous systems, which optimistically allows a process to resume its activity while guaranteeing strong progress. We prove the correctness of our ERA algorithm and expose its logarithmic behavior, an extremely desirable property for any algorithm that targets future exascale platforms. We detail a practical implementation of this consensus algorithm in the context of an MPI library, and evaluate both its efficiency and scalability through a set of benchmarks and two fault-tolerant scientific applications.
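To give a feel for why logarithmic behavior matters at scale, the following is a minimal sketch of a fault-free, tree-based agreement. It is not the ERA algorithm from the paper: the binary-tree layout, the bitwise-AND combining rule, and the names `tree_depth` and `tree_agree` are all assumptions made for illustration, and no failure detection or early return is modeled. It only shows that a fan-in followed by a fan-out over a tree reaches a common decision in a number of message steps proportional to the logarithm of the process count.

```python
def tree_depth(n):
    """Depth of a heap-ordered binary tree with n >= 1 nodes (root is rank 0)."""
    return n.bit_length() - 1  # equals floor(log2(n))


def tree_agree(contributions):
    """Hypothetical fault-free agreement over a binary tree.

    Each rank contributes an integer; the decision is the bitwise AND of all
    contributions (an assumed combining rule, not taken from the paper).
    Returns (decision, steps), where steps counts the message hops on the
    critical path: one fan-in up the tree plus one fan-out back down.
    """
    n = len(contributions)
    if n == 0:
        raise ValueError("at least one process is required")
    partial = list(contributions)
    # Fan-in: children always have a higher rank than their parent, so
    # walking ranks downward folds complete subtrees into each parent.
    for rank in range(n - 1, 0, -1):
        parent = (rank - 1) // 2
        partial[parent] &= partial[rank]
    decision = partial[0]
    # Fan-out: the root's decision travels back down the same tree, so
    # the critical path is twice the tree depth.
    steps = 2 * tree_depth(n)
    return decision, steps


if __name__ == "__main__":
    for n in (8, 1024, 1_000_000):
        decision, steps = tree_agree([0b111] * (n - 1) + [0b101])
        print(n, bin(decision), steps)
```

In this sketch, growing the process count from 1,024 to 1,000,000 raises the critical path from 20 to 38 message steps, which is the kind of scaling the abstract refers to when it calls logarithmic behavior desirable for exascale platforms.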