Publication | Closed Access
Strategies for storage of checkpointing data using non-dedicated repositories on Grid systems
Citations: 14
References: 10
Year: 2005
Unknown Venue
Cluster Computing, Engineering, Computer Architecture, Fault Tolerance, Data Grid, Fault-tolerant Messaging, Information Dispersal, Parity Information, Grid Database, Systems Engineering, Parallel Computing, Data Management, Computer Science, Non-dedicated Repositories, High Availability Software, Distributed Computing, Cloud Computing, Parallel Programming, Grid Systems, Distributed Data Store, Large Amounts
Dealing with the large amounts of data generated by long-running parallel applications is one of the most challenging aspects of Grid Computing. Periodic checkpoints might be taken to guarantee application progression, producing even more data. The classical approach is to employ high-throughput checkpoint servers connected to the computational nodes by high-speed networks. In the case of Opportunistic Grid Computing, we do not want to be forced to rely on such dedicated hardware. Instead, we want to use the shared Grid nodes to store application data in a distributed fashion.

In this work, we evaluate several strategies to store checkpoints on distributed non-dedicated repositories. We consider the tradeoff among computational overhead, storage overhead, and degree of fault-tolerance of these strategies. We compare the use of replication, parity information, and information dispersal (IDA). We used InteGrade, an object-oriented Grid middleware, to implement the storage strategies and perform evaluation experiments.
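To make the storage-overhead versus fault-tolerance tradeoff concrete, the following is a minimal Python sketch of the parity strategy, together with the textbook stored-size ratios for the three approaches. It is an illustration only and not the InteGrade implementation; every function name here (`split_with_parity`, `recover`, and the `stored_ratio_*` helpers) is hypothetical, and the sketch assumes a single XOR parity fragment, fragments padded to equal length, and an IDA scheme in which any `m` of `n` fragments suffice to rebuild the data.

```python
from functools import reduce


def _xor_columns(frags):
    # XOR the fragments byte by byte (all fragments have equal length).
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*frags))


def split_with_parity(data: bytes, m: int) -> list:
    """Split data into m equal-size fragments plus one XOR parity fragment."""
    frag_len = -(-len(data) // m)  # ceiling division
    padded = data.ljust(frag_len * m, b"\0")  # zero-pad the last fragment
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(m)]
    return frags + [_xor_columns(frags)]


def recover(frags: list, original_len: int) -> bytes:
    """Rebuild the data when at most one fragment (marked None) is lost."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment tolerates only one loss")
    frags = list(frags)
    if missing:
        # XOR of all surviving fragments equals the lost one.
        present = [f for f in frags if f is not None]
        frags[missing[0]] = _xor_columns(present)
    # Drop the parity fragment and the padding.
    return b"".join(frags[:-1])[:original_len]


# Total bytes stored divided by checkpoint size, per strategy.
def stored_ratio_replication(copies: int) -> float:
    return float(copies)          # tolerates copies - 1 lost repositories


def stored_ratio_parity(m: int) -> float:
    return (m + 1) / m            # tolerates 1 lost repository


def stored_ratio_ida(m: int, n: int) -> float:
    return n / m                  # tolerates n - m lost repositories


if __name__ == "__main__":
    checkpoint = b"checkpoint state of a long-running job"
    fragments = split_with_parity(checkpoint, 4)
    fragments[2] = None  # simulate one repository leaving the Grid
    assert recover(fragments, len(checkpoint)) == checkpoint
```

Under these assumptions, parity over 4 fragments stores 1.25 times the checkpoint size and survives one lost repository, whereas two full replicas store 2 times the size for the same tolerance. IDA generalizes this by trading additional encoding work for a tunable number of tolerated losses.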