Publication | Closed Access
Fault injection framework for system resilience evaluation
Citations: 31
References: 6
Year: 2009
Unknown Venue
Cluster Computing, Availability, Engineering, Survivable System, Computer Architecture, Robustness Testing, Software Engineering, Fault Tolerance, Software Analysis, Reliability Engineering, Systems Engineering, Fault Recovery, Parallel Computing, Computer Engineering, Computer Science, Large Scale System, Fault Injection Framework, High Availability Software, Large Scale Systems, Software Testing, Cloud Computing, Resilience Analysis, System Resilience, High Availability, Fault Injection, System Software
As high-performance computing (HPC) systems increase in size and complexity, they become more difficult to manage. The enormous component counts of these large systems pose significant challenges for system reliability and availability. This, in turn, is driving research into the resilience of large-scale systems, which seeks to curb the effects of more frequent failures at scale by masking the inevitable faults in these systems. The basic premise is that failure must be accepted as a reality of large-scale systems and coped with accordingly through system resilience.