Publication | Closed Access
Checkpointing multicomputer applications
Citations: 64
References: 15
Year: 2002
Unknown Venue
Cluster Computing, Engineering, Distributed Algorithms, Computer Architecture, Fault Tolerance, Fault-tolerant Messaging, Checkpointed Image, Systems Engineering, Parallel Computing, Networked Computer Systems, Checkpointing Scheme, Distributed Systems, Computer Science, Distributed Processing, Distributed Computing, Parallel Programming, Real-time Systems, Multicomputer Applications, Asynchronous Systems, System Software, Minimal Message Logging
The authors present a checkpointing scheme that is transparent, imposes overhead only during checkpoints, requires minimal message logging, and allows for quick resumption of execution from a checkpointed image. Since checkpointing multicomputer applications poses requirements different from those posed by checkpointing general distributed systems, existing distributed checkpointing schemes are inadequate for multicomputer checkpointing. The proposed checkpointing scheme makes use of special properties of multicomputer interconnection networks to satisfy this set of requirements. The proposed algorithm is efficient both when taking checkpoints and when recovering from checkpointed images.