Publication | Closed Access
Detecting large-scale system problems by mining console logs
1.2K Citations | 34 References | 2009
Unknown Venue
Software Maintenance, Engineering, Software Engineering, Source Code Analysis, Hadoop File System, Console Logs, Software Analysis, Data Science, Data Mining, Fuzzing, Log Management, Knowledge Discovery, Computer Science, Debugger, Static Program Analysis, Access Log Analysis, Log Analysis, Program Analysis, Software Testing, System Software
Console logs rarely help operators detect problems in large-scale datacenter services because they contain a voluminous intermixing of messages from many independently developed components. The study proposes a general methodology for mining console logs to detect system runtime problems automatically. The method parses logs by combining source-code analysis with information retrieval to build composite features, applies machine learning to those features to detect problems, and distills the results into an operator-friendly decision tree. These richer features enable analyses that prior methods cannot perform. The approach accurately detects real problems in Darkstar and the Hadoop File System with few false positives, processes 24 million Hadoop log lines in 3 minutes, and works on logs of any size without requiring software changes, human input, or knowledge of the software's internals.
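A minimal sketch of the parsing and feature-construction step described above, assuming Python since the page contains no code. In the described method the message templates come from analyzing the service's source code; here the templates, the HDFS-style log lines, and the function names (`parse_line`, `message_count_vectors`) are all hypothetical and written by hand. Each matched line yields a message type and an identifier (a block ID), and counting message types per identifier is one plausible form of the composite features mentioned above; the exact feature definitions are not given on this page.

```python
import re
from collections import defaultdict

# Hypothetical message templates. The described method would derive these
# from logging statements in the source code; here they are hand-written.
# Each template captures the identifier variable in the named group "ident".
TEMPLATES = {
    "allocate": re.compile(r"allocateBlock: (?P<ident>blk_-?\d+)"),
    "receiving": re.compile(r"Receiving block (?P<ident>blk_-?\d+) src: \S+ dest: \S+"),
    "stored": re.compile(r"addStoredBlock: blockMap updated: \S+ is added to (?P<ident>blk_-?\d+)"),
    "deleting": re.compile(r"Deleting block (?P<ident>blk_-?\d+) file \S+"),
}
MESSAGE_TYPES = list(TEMPLATES)  # fixes the column order of the count vectors


def parse_line(line):
    """Return (message_type, identifier) for the first matching template, or None."""
    for msg_type, pattern in TEMPLATES.items():
        match = pattern.search(line)
        if match:
            return msg_type, match.group("ident")
    return None


def message_count_vectors(lines):
    """Group parsed messages by identifier and count occurrences of each message type."""
    counts = defaultdict(lambda: [0] * len(MESSAGE_TYPES))
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # lines matching no template are ignored in this sketch
        msg_type, ident = parsed
        counts[ident][MESSAGE_TYPES.index(msg_type)] += 1
    return dict(counts)


# Invented sample lines for illustration only (not real Hadoop output).
SAMPLE_LOG = [
    "INFO allocateBlock: blk_1",
    "INFO Receiving block blk_1 src: /10.0.0.1:50010 dest: /10.0.0.2:50010",
    "INFO addStoredBlock: blockMap updated: 10.0.0.2:50010 is added to blk_1",
    "INFO allocateBlock: blk_2",
    "INFO Receiving block blk_2 src: /10.0.0.1:50010 dest: /10.0.0.3:50010",
    "INFO Receiving block blk_2 src: /10.0.0.1:50010 dest: /10.0.0.4:50010",
    "WARN unrelated heartbeat message",
]
```

Here `message_count_vectors(SAMPLE_LOG)` maps `blk_1` to `[1, 1, 1, 0]` and `blk_2` to `[1, 2, 0, 0]`, and the unmatched heartbeat line is skipped. Grouping by identifier is what separates the intermixed messages of many components into per-object sequences that a learner can compare.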
Surprisingly, console logs rarely help operators detect problems in large-scale datacenter services, because they often consist of a voluminous intermixing of messages from many software components written by independent developers. We propose a general methodology to mine this rich source of information to automatically detect system runtime problems. We first parse console logs by combining source code analysis with information retrieval to create composite features. We then analyze these features using machine learning to detect operational problems. We show that our method enables analyses that are impossible with previous methods because of its superior ability to create sophisticated features. We also show how to distill the results of our analysis to an operator-friendly one-page decision tree showing the critical messages associated with the detected problems. We validate our approach using the Darkstar online game server and the Hadoop File System, where we detect numerous real problems with high accuracy and few false positives. In the Hadoop case, we analyze 24 million lines of console logs in 3 minutes. Our methodology works on textual console logs of any size and requires no changes to the service software, no human input, and no knowledge of the software's internals.
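The abstract does not name the machine-learning technique used for detection. As an assumption, the sketch below uses PCA-based subspace detection: it learns the dominant direction of variation across per-identifier count vectors and flags vectors with a large residual outside that subspace. Principal components are computed by power iteration in pure Python to avoid dependencies. The function names, the sample vectors, and the fixed threshold of 1.0 are hypothetical; a real deployment would choose the threshold statistically rather than by hand.

```python
import math


def _mat_vec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def _normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def principal_components(rows, k, iters=500):
    """Mean and top-k principal directions via power iteration with deflation."""
    n, m = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - mean[j] for j in range(m)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(m)] for i in range(m)]
    components = []
    for _ in range(k):
        v = _normalize([1.0] * m)
        for _ in range(iters):
            w = _mat_vec(cov, v)
            if not any(w):
                break  # covariance annihilates v: no variance left in this direction
            v = _normalize(w)
        eigenvalue = sum(a * b for a, b in zip(v, _mat_vec(cov, v)))
        components.append(v)
        # Deflate so the next power iteration converges to the next component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(m)] for i in range(m)]
    return mean, components


def residual_scores(rows, k):
    """Squared norm of each centered row after projecting out the top-k components."""
    mean, components = principal_components(rows, k)
    scores = []
    for r in rows:
        c = [x - mu for x, mu in zip(r, mean)]
        for v in components:
            proj = sum(a * b for a, b in zip(c, v))
            c = [a - proj * b for a, b in zip(c, v)]
        scores.append(sum(x * x for x in c))
    return scores


def detect_anomalies(rows, k=1, threshold=1.0):
    """Indices of rows whose residual exceeds a hand-picked threshold (hypothetical)."""
    return [i for i, s in enumerate(residual_scores(rows, k)) if s > threshold]


# Hypothetical count vectors over (allocate, receive, store, delete).
# Normal blocks are stored as many times as they are received.
NORMAL = [[1, r, r, 0] for r in (1, 2, 3, 4, 5) for _ in range(4)]
ROWS = NORMAL + [[1, 3, 1, 0]]  # received three times but stored only once
```

With these vectors, `detect_anomalies(ROWS)` returns `[20]`, the index of the inconsistent block: the normal rows vary only along the shared receive/store direction, so their residuals stay near zero, while the mismatched row falls well outside that subspace.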