Publication | Closed Access
Preventing useless checkpoints in distributed computations
Citations: 60
References: 22
Year: 2002
Unknown Venue
Cluster Computing, Engineering, Verification, Fault Tolerance, Fault-tolerant Messaging, Formal Verification, Systems Engineering, Fault Recovery, Parallel Computing, Distributed Computations, Local Checkpoint, Distributed Systems, Computer Science, Useless Checkpoint, High Availability Software, Fault-tolerant Network, Consistent Global Checkpoint, Distributed Computing, Formal Methods, Parallel Programming
A useless checkpoint is a local checkpoint that cannot be part of any consistent global checkpoint. This paper addresses the following problem: given a set of processes that take (basic) local checkpoints independently and in an unknown way, design a communication-induced checkpointing protocol that directs processes to take additional (forced) local checkpoints so that no local checkpoint is useless. A general and efficient protocol solving this problem is proposed, and several existing protocols that solve the same problem are shown to be particular instances of it. The design of this general protocol is motivated by the use of communication-induced checkpointing protocols in distributed applications based on consistent global checkpoints. Examples of such applications include the detection of stable or unstable properties, rollback recovery, and the determination of distributed breakpoints.
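To make the idea of a forced checkpoint concrete, the following is a minimal sketch of a classic index-based rule, often attributed to Briatico, Ciuffoletti and Simoncini (BCS). It is not the general protocol proposed in the paper; it is shown only as one well-known example of a communication-induced rule of this kind. The class `Process` and its method names are hypothetical, chosen for illustration. Each process keeps an integer checkpoint index, piggybacks it on every message, and takes a forced checkpoint before delivering a message whose index is ahead of its own.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process applying a BCS-style index-based rule."""
    pid: int
    clock: int = 0  # index of the latest local checkpoint
    checkpoints: list = field(default_factory=list)  # (index, kind) pairs

    def __post_init__(self):
        # Every process starts with an initial checkpoint of index 0.
        self.checkpoints.append((0, "initial"))

    def basic_checkpoint(self):
        # Checkpoint taken independently by the application.
        self.clock += 1
        self.checkpoints.append((self.clock, "basic"))

    def send(self):
        # Piggyback the current checkpoint index on the outgoing message.
        return self.clock

    def receive(self, piggybacked_clock):
        # Take a forced checkpoint BEFORE delivery if the sender is ahead.
        if piggybacked_clock > self.clock:
            self.clock = piggybacked_clock
            self.checkpoints.append((self.clock, "forced"))


p, q = Process(0), Process(1)
p.basic_checkpoint()   # p takes basic checkpoint 1
q.receive(p.send())    # q is at index 0 < 1, so it takes forced checkpoint 1
q.receive(p.send())    # q is already at index 1, so no forced checkpoint
```

Under this rule, a message sent after a checkpoint with index n is never delivered before the receiver has a checkpoint with index at least n, which is the intuition for why checkpoints sharing an index can be grouped into a consistent global checkpoint. The sketch omits message transport, state saving, and the case where a process skips an index.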