Publication | Closed Access
The performance of consistent checkpointing
Citations: 355
References: 27
Year: 2003
Cluster Computing, Engineering, Transparent Fault Tolerance, Verification, Consistent Checkpoints, Fault Tolerance, Fault-tolerant Messaging, Formal Verification, Data Consistency, Systems Engineering, Fault Recovery, Parallel Computing, Computer Engineering, Distributed Systems, Computer Science, Distributed Processing, Distributed Computing, Parallel Programming, Consistent Checkpointing
Consistent checkpointing provides transparent fault tolerance for long-running distributed applications. Performance measurements of an implementation of consistent checkpointing are described. The measurements show that consistent checkpointing performs remarkably well. Eight computation-intensive distributed applications were executed on a network of 16 diskless Sun-3/60 workstations, and the performance without checkpointing was compared to the performance with consistent checkpoints taken at two-minute intervals. For six of the eight applications, the running time increased by less than 1% as a result of the checkpointing. The highest overhead measured was 5.8%. Incremental checkpointing and copy-on-write checkpointing were the most effective techniques in lowering the running time overhead. It is argued that these measurements show that consistent checkpointing is an efficient way to provide fault tolerance for long-running distributed applications.
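The abstract names incremental checkpointing as one of the two most effective techniques. The sketch below illustrates the general idea only: at each checkpoint, write just the pages that changed since the previous one. It is not the paper's implementation. The class `IncrementalCheckpointer`, the page size, and the use of SHA-256 digests to detect changed pages are all assumptions made for illustration. Real systems usually detect modified pages through memory-protection hardware rather than hashing, and the sketch covers neither copy-on-write nor the coordination between processes that makes a checkpoint consistent.

```python
import hashlib

PAGE_SIZE = 4096  # hypothetical page size, chosen for illustration


class IncrementalCheckpointer:
    """Toy incremental checkpointer for a single process image.

    Change detection by hashing is an assumption of this sketch,
    standing in for hardware dirty-page tracking.
    """

    def __init__(self, page_size: int = PAGE_SIZE) -> None:
        self.page_size = page_size
        self.digests: dict[int, str] = {}  # page index -> digest at last checkpoint
        self.store: dict[int, bytes] = {}  # page index -> most recently saved copy

    def checkpoint(self, memory: bytes) -> int:
        """Save only the pages that changed; return how many were written."""
        pages = [
            memory[i:i + self.page_size]
            for i in range(0, len(memory), self.page_size)
        ]
        written = 0
        for index, page in enumerate(pages):
            digest = hashlib.sha256(page).hexdigest()
            if self.digests.get(index) != digest:
                self.store[index] = page
                self.digests[index] = digest
                written += 1
        # Forget pages that no longer exist if the image shrank.
        for index in [i for i in self.store if i >= len(pages)]:
            del self.store[index]
            del self.digests[index]
        return written

    def restore(self) -> bytes:
        """Rebuild the image as of the most recent checkpoint."""
        return b"".join(self.store[i] for i in sorted(self.store))


# Usage: the first checkpoint writes every page; later ones write only changes.
checkpointer = IncrementalCheckpointer(page_size=4)
state = bytearray(b"aaaabbbbcccc")
first = checkpointer.checkpoint(bytes(state))   # all three pages are new
state[5] = ord("X")                             # modify one byte in page 1
second = checkpointer.checkpoint(bytes(state))  # only page 1 is rewritten
```

When an application modifies only a small part of its memory between checkpoints, the second and later checkpoints write correspondingly little data, which is the effect the abstract credits with lowering running-time overhead.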