Publication | Closed Access
Independent checkpointing and concurrent rollback for recovery in distributed systems: an optimistic approach
Citations: 194
References: 12
Year: 2003
Unknown Venue
Engineering, Verification, Transactional System, Fault Tolerance, Transaction Processing, Fault-tolerant Messaging, Formal Verification, Concurrency Control, Self-stabilization, Checkpoint Algorithm, Systems Engineering, Fault Recovery, Independent Checkpointing, Parallel Computing, Site Recovery Algorithms, Distributed Systems, Computer Science, Concurrent Rollback, High Availability Software, Data Restoration, Business, Parallel Programming, Real-time Systems, Asynchronous Systems, Distributed Transaction
A checkpoint algorithm is presented that benefits from the research in concurrency control, commit, and site recovery algorithms in transaction processing. In the authors' approach, a number of checkpointing processes, a number of rollback processes, and computations on operational processes can proceed concurrently while tolerating the failure of an arbitrary number of processes. Each process takes checkpoints independently. During recovery after a failure, a process invokes a two-phase rollback algorithm. In the first phase it collects information about relevant message exchanges in the system; in the second phase it uses that information to determine both the set of processes that must roll back and the set of checkpoints up to which rollback must occur. Concurrent rollbacks are completed in the order of the priorities of the recovering processes. The proposed solution is optimistic in the sense that it performs well when failures are infrequent, because it minimizes overhead during normal processing.
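The two-phase idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a generic rollback-propagation fixpoint, and every name in it (`Message`, `collect_messages`, `compute_rollback`, the interval numbering) is a hypothetical choice made for illustration. It assumes each process numbers its checkpoints 0, 1, 2, ..., that "interval i" is the execution following checkpoint i, and that each process logs the interval in which every message was sent and received. It ignores concurrent recoveries, priorities, and messages that are lost rather than orphaned.

```python
from typing import Dict, Iterable, List, NamedTuple


class Message(NamedTuple):
    sender: str
    send_interval: int  # sender's checkpoint interval when the message was sent
    receiver: str
    recv_interval: int  # receiver's checkpoint interval when it was delivered


def collect_messages(logs: Dict[str, List[Message]]) -> List[Message]:
    """Phase 1 (sketch): merge the message records logged by every process."""
    seen = set()
    merged = []
    for log in logs.values():
        for m in log:
            if m not in seen:  # sender and receiver may both have logged it
                seen.add(m)
                merged.append(m)
    return merged


def compute_rollback(latest: Dict[str, int],
                     failed: Iterable[str],
                     messages: List[Message]) -> Dict[str, int]:
    """Phase 2 (sketch): find which processes roll back, and to which checkpoint.

    latest[p] is the index of p's most recent checkpoint. The result maps each
    process that must roll back to the checkpoint it restarts from.
    """
    # line[p] is the checkpoint p restarts from; latest[p] + 1 stands for
    # "p keeps its current state" (assumption of this sketch).
    line = {p: n + 1 for p, n in latest.items()}
    for p in failed:
        line[p] = latest[p]  # a failed process loses its current interval

    changed = True
    while changed:  # terminates: line values only decrease and are >= 0
        changed = False
        for m in messages:
            send_undone = m.send_interval >= line[m.sender]
            recv_kept = m.recv_interval < line[m.receiver]
            if send_undone and recv_kept:
                # Orphan message: the receive must be undone as well.
                line[m.receiver] = m.recv_interval
                changed = True
    return {p: c for p, c in line.items() if c <= latest[p]}


if __name__ == "__main__":
    a_to_b = Message("A", 2, "B", 1)
    b_to_c = Message("B", 1, "C", 1)
    c_to_a = Message("C", 0, "A", 0)
    logs = {"A": [a_to_b, c_to_a], "B": [a_to_b, b_to_c], "C": [b_to_c, c_to_a]}
    latest = {"A": 2, "B": 2, "C": 1, "D": 0}
    print(compute_rollback(latest, ["A"], collect_messages(logs)))
```

In the usage example, the failure of A undoes its send to B, which forces B back to checkpoint 1; that in turn undoes B's send to C, so C also restarts from checkpoint 1, while D, which exchanged no messages, is unaffected. Deferring all of this bookkeeping to recovery time is what keeps normal processing cheap in an optimistic scheme.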