Dependability measurement and modeling of a multicomputer system
Year: 1993 | Citations: 72 | References: 25
Keywords: Engineering, Software Systems, Computer Architecture, System Reliability, Dependable System Architecture, Software Analysis, Optimal System Design, Operations Research, Reliability Engineering, Markov Reward Models, Error Data, Systems Engineering, Failure Detection, Dependability Analysis, Maintainability Engineering, Reliability, Computer Engineering, Networked Computer Systems, Computer Science, Dependability Measurement, Dependability Modelling, Reliability Management Systems Design, Reliability Modelling, Program Analysis, Reliability Management, Real-time Systems, Measurement-based Analysis, System Software
A measurement-based analysis of error data collected from a DEC VAXcluster multicomputer system is presented. Basic dependability characteristics, such as error/failure distributions and hazard rates, are obtained both for the individual machines and for the VAXcluster as a whole. Markov reward models are developed to analyze error/failure behavior and to evaluate the performance loss due to errors/failures. Correlation analysis is then performed to quantify relationships among errors/failures across machines and across time. It is found that shared resources constitute a major reliability bottleneck. It is shown that, for the measured system, the homogeneous Markov model, which assumes constant failure rates, overestimates the transient reward rate for short-term operation and underestimates it for long-term operation. Correlation analysis shows that errors are highly correlated across machines and across time. Although the failure correlation coefficient is low, its effect on system unavailability is significant.
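To illustrate what the homogeneous (constant-rate) Markov reward model computes, here is a minimal Python sketch. It assumes a continuous-time Markov chain described by a generator matrix `Q`, a reward rate per state (the fraction of full capacity delivered in that state), and uniformization as the transient solver; the abstract does not say which solution method the authors used. The three states, all transition rates, the reward values, and the function name `transient_reward_rate` are hypothetical placeholders, not figures from the VAXcluster data.

```python
import math

# Hypothetical state encoding for illustration only (not from the paper):
# 0 = all machines up, 1 = degraded (errors / one machine down), 2 = cluster down.


def transient_reward_rate(Q, rewards, p0, t, eps=1e-12, max_terms=100_000):
    """Expected reward rate sum_i p_i(t) * r_i of a homogeneous CTMC.

    Q       -- generator matrix (list of rows; each row sums to 0)
    rewards -- reward rate r_i earned per unit time in state i
    p0      -- initial state distribution
    Solved by uniformization: p(t) = sum_k Poisson(k; lam*t) * p0 P^k,
    with P = I + Q / lam and lam = max_i |q_ii|.
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))
    if lam == 0.0 or t == 0.0:
        # No transitions possible (or no time elapsed): p(t) = p0.
        return sum(p * r for p, r in zip(p0, rewards))
    if lam * t > 700:
        # exp(-lam*t) would underflow; a production solver would rescale.
        raise ValueError("lam * t too large for this simple sketch")
    # Uniformized DTMC transition matrix P = I + Q / lam.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    weight = math.exp(-lam * t)   # Poisson probability of k = 0 jumps
    vec = list(p0)                # row vector p0 * P^k, starting at k = 0
    p_t = [weight * v for v in vec]
    cumulative, k = weight, 0
    # Add terms until the omitted Poisson tail mass is below eps.
    while cumulative < 1.0 - eps and k < max_terms:
        k += 1
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k
        cumulative += weight
        p_t = [a + weight * b for a, b in zip(p_t, vec)]
    return sum(p * r for p, r in zip(p_t, rewards))


# Hypothetical constant rates (per hour) and rewards (fraction of capacity).
Q3 = [
    [-0.01, 0.01, 0.00],
    [0.50, -0.52, 0.02],
    [0.20, 0.00, -0.20],
]
REWARDS3 = [1.0, 0.5, 0.0]
P0 = [1.0, 0.0, 0.0]

for hours in (1, 10, 100, 1000):
    print(hours, transient_reward_rate(Q3, REWARDS3, P0, hours))
```

Because every rate in `Q` is constant, the transient reward rate of such a model converges to its steady-state value as a sum of exponential terms. The abstract reports that, for the measured system, this constant-rate baseline overestimates the transient reward rate in short-term operation and underestimates it in long-term operation.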