Publication | Closed Access
Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis
Citations: 549
References: 20
Year: 2009
Unknown Venue
Software Maintenance, Anomaly Detection, Execution Anomalies, Engineering, Information Security, Software Engineering, Software Analysis, Formal Verification, Data Science, Systems Engineering, Execution Anomaly Detection, Log Files, Log Management, Runtime Verification, Distributed Systems, Computer Science, Static Program Analysis, Software Design, Access Log Analysis, Log Analysis, Anomalies Detection, Program Analysis, Software Testing, Formal Methods, Event-driven Monitoring, System Software
Detecting execution anomalies is important for the maintenance, development, and performance tuning of large-scale distributed systems. Execution anomalies include both workflow errors and performance problems. System logs produced by distributed systems are commonly used for troubleshooting and problem diagnosis. However, manually inspecting these logs to detect anomalies is infeasible given the increasing scale and complexity of distributed systems, so there is great demand for automatic anomaly detection techniques based on log analysis. In this paper, we propose an unstructured log analysis technique for anomaly detection. As part of the technique, we propose a novel algorithm that converts free-form text messages in log files to log keys without relying heavily on application-specific knowledge. The log keys correspond to the log-print statements in the source code, which provide cues about system execution behavior. After converting log messages to log keys, we learn a Finite State Automaton (FSA) from training log sequences to represent the normal workflow of each system component. At the same time, we learn a performance measurement model that characterizes normal execution performance based on the timing information of the log messages. With these learned models, we can automatically detect anomalies in newly input log files. Experiments on Hadoop and SILK show that the technique can effectively detect execution anomalies.
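The workflow part of this pipeline can be illustrated with a minimal sketch. It is not the paper's algorithm: log keys are approximated here by masking numeric-looking fields with a regular expression, and the FSA is approximated by the set of key-to-key transitions observed in training. The function names (`to_log_key`, `learn_transitions`, `find_anomalies`), the `<*>` placeholder, and the sample log lines are all hypothetical, and the timing-based performance model is left out.

```python
import re
from collections import defaultdict

# Assumption: variable fields are decimal numbers, dotted numbers
# (such as IP addresses), or 0x-prefixed hex values.
_VARIABLE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+(?:\.\d+)*)\b")


def to_log_key(message):
    """Mask variable fields so messages from one print statement share a key."""
    return _VARIABLE.sub("<*>", message)


def learn_transitions(sequences):
    """Record every (previous key, next key) pair seen in normal sequences."""
    allowed = defaultdict(set)
    for seq in sequences:
        keys = ["<START>"] + [to_log_key(m) for m in seq]
        for prev, nxt in zip(keys, keys[1:]):
            allowed[prev].add(nxt)
    return allowed


def find_anomalies(sequence, allowed):
    """Return (index, key) for each message reached by an unseen transition."""
    anomalies = []
    prev = "<START>"
    for i, message in enumerate(sequence):
        key = to_log_key(message)
        if key not in allowed.get(prev, set()):
            anomalies.append((i, key))
        prev = key
    return anomalies


# Hypothetical training data standing in for normal executions.
training = [
    ["Open file 12", "Read 100 bytes", "Close file 12"],
    ["Open file 7", "Read 2048 bytes", "Close file 7"],
]
model = learn_transitions(training)

# A new sequence that skips the read step is flagged at the close message.
print(find_anomalies(["Open file 3", "Close file 3"], model))
```

A transition set is a deliberately coarse stand-in for a learned automaton: it only checks adjacent pairs of log keys, so it cannot capture longer-range workflow constraints.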