Publication | Closed Access
A measurement-based model for estimation of resource exhaustion in operational software systems
Citations: 176
References: 23
Year: 2003
Unknown Venue
Software Maintenance, Software Reliability Testing, Engineering, Software Systems, Software Engineering, System Reliability, System Workload State, Software Analysis, Resource Exhaustion, Reliability Engineering, Data Science, Software Aging, Measurement-based Model, Systems Engineering, Software Engineering Economics, Quantitative Management, Software Measurement, Computer Science, Software Design, Software Evolution, System Workload, High Availability Software, Operating Systems, Reliability Modelling, Program Analysis, Software Testing, Software Metric, Operational Software Systems, Software System, System Software
Software systems are known to suffer from outages due to transient errors. Recently, the phenomenon of "software aging", in which the state of the software system degrades with time, has been reported (S. Garg et al., 1998). The primary causes of this degradation are the exhaustion of operating system resources, data corruption and numerical error accumulation. This degradation may eventually lead to performance degradation of the software, a crash/hang failure, or both. Earlier work in this area to detect aging and to estimate its effect on system resources did not take the system workload into account. In this paper, we propose a measurement-based model to estimate the rate of exhaustion of operating system resources as a function of both time and the system workload state. A semi-Markov reward model is constructed based on workload and resource usage data collected from the UNIX operating system. We first identify different workload states using statistical cluster analysis and build a state-space model. Corresponding to each resource, a reward function is then defined for the model based on the rate of resource exhaustion in the different states. The model is then solved to obtain trends, the estimated exhaustion rates, and the time-to-exhaustion for the resources. With the help of this measure, proactive fault management techniques such as "software rejuvenation" (Y. Huang et al., 1995) may be employed to prevent unexpected outages.
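The modelling steps described in the abstract can be illustrated with a minimal sketch. It assumes the workload states have already been identified by clustering, and that the embedded transition probabilities, mean sojourn times, and per-state exhaustion rates have been estimated from measurements. The function names and all numeric values below are hypothetical and are not taken from the paper. The sketch uses the standard steady-state result for semi-Markov processes: the long-run fraction of time spent in state i is proportional to the stationary probability of the embedded chain multiplied by the mean sojourn time in that state.

```python
# Hypothetical sketch of a steady-state semi-Markov reward calculation.
# All names and numbers are illustrative assumptions, not the paper's data.

def embedded_stationary(P, iters=100_000, tol=1e-13):
    """Stationary distribution of the embedded Markov chain with
    row-stochastic transition matrix P (assumed irreducible).
    Iterates the lazy chain 0.5*I + 0.5*P, which has the same stationary
    distribution and converges even when P itself is periodic."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        nxt = [0.5 * a + 0.5 * b for a, b in zip(pi, step)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

def time_fractions(P, sojourn):
    """Long-run fraction of time in each workload state:
    p_i = pi_i * h_i / sum_j(pi_j * h_j)."""
    pi = embedded_stationary(P)
    weighted = [p * h for p, h in zip(pi, sojourn)]
    total = sum(weighted)
    return [w / total for w in weighted]

def expected_exhaustion_rate(P, sojourn, rates):
    """Expected steady-state reward rate, where the reward in each state
    is the resource exhaustion rate measured in that state."""
    return sum(p * r for p, r in zip(time_fractions(P, sojourn), rates))

def time_to_exhaustion(available, rate):
    """Estimated time until the resource is exhausted at a constant rate."""
    if rate <= 0:
        return float("inf")  # the resource is not being depleted
    return available / rate

# Two hypothetical workload states that alternate: "low" and "high".
P = [[0.0, 1.0],
     [1.0, 0.0]]
sojourn = [3.0, 1.0]   # mean minutes spent per visit (assumed)
rates = [1.0, 5.0]     # KB of memory consumed per minute (assumed)

rate = expected_exhaustion_rate(P, sojourn, rates)
print(rate, time_to_exhaustion(100.0, rate))
```

In this toy example the system spends three quarters of its time in the low-workload state, so the workload-weighted exhaustion rate is well below the rate observed in the high-workload state alone. A purely time-based trend estimate would not separate the two.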