Publication | Closed Access
Correlating instrumentation data to system states: a building block for automated diagnosis and control
Citations: 473
References: 27
Year: 2004
Unknown Venue
Software Maintenance; Engineering; Diagnosis; Service Monitoring; Software Engineering; System Metric; System Diagnosis; Software Analysis; Instrumentation And Control; Reliability Engineering; Data Science; Data Mining; Management; Systems Engineering; Statistics; Performance Metric; Predictive Analytics; Knowledge Discovery; Instrumentation Data; Conformance Checking; Computer Science; System States; Statistical Induction Techniques; Performance Monitoring; Program Analysis; Diagnostic System; Software Testing; Building Block; Automated Performance Diagnosis; System Monitoring; Data Modeling; Performance Forecasting
The study aims to develop and evaluate statistical induction tools for automated performance diagnosis and management of Internet server platforms using offline and online analysis of instrumentation metrics. The authors employ Tree‑Augmented Bayesian Networks to map metric combinations and thresholds to Service Level Objective compliance in a three‑tier web service, and evaluate these models for offline forensic diagnosis and limited online forecasting under stable workloads. Experimental results show that compact TAN models accurately capture performance patterns, are computationally efficient, and provide interpretable insights, making them strong candidates for automated diagnosis and control.
This paper studies the use of statistical induction techniques as a basis for automated performance diagnosis and performance management. The goal of the work is to develop and evaluate tools for offline and online analysis of system metrics gathered from instrumentation in Internet server platforms. We use a promising class of probabilistic models (Tree-Augmented Bayesian Networks or TANs) to identify combinations of system-level metrics and threshold values that correlate with high-level performance states, specifically compliance with Service Level Objectives (SLOs) for average-case response time, in a three-tier Web service under a variety of conditions.

Experimental results from a testbed show that TAN models involving small subsets of metrics capture patterns of performance behavior in a way that is accurate and yields insights into the causes of observed performance effects. TANs are extremely efficient to represent and evaluate, and they have interpretability properties that make them excellent candidates for automated diagnosis and control. We explore the use of TAN models for offline forensic diagnosis, and in a limited online setting for performance forecasting with stable workloads.
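
The abstract does not spell out how a TAN is learned, so the following is only a minimal sketch of the usual structure-learning step, under stated assumptions: metrics have already been discretized, the class label is SLO compliance (1) or violation (0), and the metric names (`cpu`, `disk`, `net`) and function names are hypothetical. It scores each pair of metrics by conditional mutual information given the SLO state and keeps a maximum spanning tree over those scores. A full TAN would additionally direct the tree edges away from a chosen root, make the class a parent of every metric, and fit conditional probability tables; none of that is shown here.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_mutual_information(xs, ys, cs):
    """Estimate I(X; Y | C) in nats from three equal-length discrete sequences."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    total = 0.0
    for (x, y, c), count in n_xyc.items():
        # p(x, y, c) * log( p(x, y | c) / (p(x | c) * p(y | c)) )
        ratio = count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)])
        total += (count / n) * math.log(ratio)
    return total


def tan_tree_edges(columns, labels):
    """Return undirected edges of a maximum spanning tree over the metrics.

    `columns` maps a (hypothetical) metric name to its discretized samples;
    `labels` holds the SLO state for each sample. Edge weights are the
    conditional mutual information between two metrics given the label.
    """
    names = list(columns)
    weight = {
        frozenset((a, b)): conditional_mutual_information(
            columns[a], columns[b], labels
        )
        for a, b in combinations(names, 2)
    }
    in_tree = [names[0]]
    edges = []
    # Prim's algorithm: repeatedly attach the outside metric with the
    # heaviest edge to any metric already in the tree.
    while len(in_tree) < len(names):
        a, b = max(
            ((a, b) for a in in_tree for b in names if b not in in_tree),
            key=lambda edge: weight[frozenset(edge)],
        )
        edges.append((a, b))
        in_tree.append(b)
    return edges


if __name__ == "__main__":
    # Toy, made-up samples: cpu and disk move together within each SLO
    # state, while net varies independently of both.
    slo_ok = [0, 0, 0, 0, 1, 1, 1, 1]
    metrics = {
        "cpu": [0, 0, 1, 1, 0, 0, 1, 1],
        "disk": [0, 0, 1, 1, 0, 0, 1, 1],
        "net": [0, 1, 0, 1, 0, 1, 0, 1],
    }
    print(tan_tree_edges(metrics, slo_ok))
```

In this toy data the first edge selected links `cpu` and `disk`, because they are the only pair that remains dependent once the SLO state is known. The tree keeps the model compact: each metric depends on the class and on at most one other metric.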