Publication | Closed Access
Learning a Hierarchical Monitoring System for Detecting and Diagnosing Service Issues
Citations: 48
References: 22
Year: 2015
Venue: Unknown
Keywords: Diagnosing Service Issues, Engineering, Machine Learning, Business Intelligence, Diagnosis, Service Monitoring, System Diagnosis, Reliability Engineering, Data Science, Data Mining, Intelligent Service, Management, Systems Engineering, Data Integration, Data Management, Reliability, Hierarchical Monitoring System, Knowledge Discovery, Computer Science, Intelligent Analytics, Monitoring System, Software Testing, Monitoring, System Monitoring, Intelligent Service System, Industrial Informatics, Network Monitoring, Big Data, Event-driven Monitoring
We propose a machine-learning-based framework for building a hierarchical monitoring system that detects and diagnoses service issues. We demonstrate the framework by building a monitoring system for a distributed data storage and computing service consisting of tens of thousands of machines. The solution has been deployed in production as an end-to-end system, spanning telemetry collection from individual machines through to a visualization tool that service operators use to examine the detection outputs. We present evaluation results on detecting 19 customer-impacting issues from the past three months.