Publication | Closed Access
ReVive: cost-effective architectural support for rollback recovery in shared-memory multiprocessors
Citations: 108
References: 25
Year: 2003
Venue: Unknown
Cluster Computing, Engineering, Computer Architecture, Fault Tolerance, Hardware Cost, Architectural Support, Fault-tolerant Messaging, Hardware Security, Shared Memory, Fault Recovery, Parallel Computing, Revive Performs, Memory Management, Computer Engineering, Computer Science, Virtual Memory, High Availability Software, Program Analysis, Cloud Computing, Parallel Programming, Rollback Recovery, Parity Protection, System Software, Transactional Memory
This paper presents ReVive, a novel general-purpose rollback recovery mechanism for shared-memory multiprocessors. ReVive carefully balances the conflicting requirements of availability, performance, and hardware cost. It performs checkpointing, logging, and distributed parity protection, all of them memory-based, and enables recovery from a wide class of errors, including the permanent loss of an entire node. To maintain high performance, ReVive includes specialized hardware that performs frequent operations, such as log and parity updates, in the background. To keep the cost low, more complex checkpointing and recovery functions are performed in software, while the hardware modifications are limited to the directory controllers of the machine. Our simulation results on a 16-processor system indicate that the average error-free execution time overhead of using ReVive is only 6.3%, while the achieved availability is better than 99.999% even when errors occur as often as once per day.
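The interplay of logging, checkpointing, and parity described in the abstract can be illustrated with a toy model. The sketch below is not the paper's design: the class `ParityGroup`, its method names, the single parity array, and the log kept outside the protected memory are all assumptions made for illustration. It only shows the general idea that an XOR parity word lets one lost node's memory be rebuilt, and that an undo log lets memory be rolled back to the last checkpoint.

```python
class ParityGroup:
    """Toy model (hypothetical): several node memories guarded by one XOR parity array."""

    def __init__(self, num_nodes, words_per_node):
        self.mem = [[0] * words_per_node for _ in range(num_nodes)]
        self.parity = [0] * words_per_node  # parity[a] = XOR of mem[n][a] over all nodes n
        self.log = []                       # undo log of (node, addr, old_value)

    def write(self, node, addr, value):
        old = self.mem[node][addr]
        self.log.append((node, addr, old))  # save the old value before overwriting it
        self.parity[addr] ^= old ^ value    # incremental parity update
        self.mem[node][addr] = value

    def checkpoint(self):
        # Current memory becomes the recovery point; older undo records are discarded.
        self.log.clear()

    def rollback(self):
        # Undo writes in reverse order, keeping parity consistent at each step.
        while self.log:
            node, addr, old = self.log.pop()
            self.parity[addr] ^= self.mem[node][addr] ^ old
            self.mem[node][addr] = old

    def rebuild_node(self, lost):
        # Reconstruct a lost node's memory from parity and the surviving nodes.
        for addr in range(len(self.parity)):
            value = self.parity[addr]
            for n, node_mem in enumerate(self.mem):
                if n != lost:
                    value ^= node_mem[addr]
            self.mem[lost][addr] = value
```

In this toy model, recovery from a node loss is `rebuild_node` followed by `rollback`. The abstract states that ReVive's logs are themselves memory-based, so a faithful model would also have to protect the log with parity; the sketch omits that for brevity.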