Publication | Closed Access
Fault Analysis and Debugging of Microservice Systems: Industrial Survey, Benchmark System, and Empirical Study
Citations: 345
References: 36
Year: 2018
Software Maintenance, Engineering, Industrial Survey, Visualization Techniques, Software Engineering, Microservice Systems, Software Analysis, Reliability Engineering, Data Science, Fault Analysis, Systems Engineering, Software Aspect, Failure Detection, Reliability, Microservices Design, Computer Engineering, Computer Science, Debugger, Software Visualization, Software Design, Fault Management, Program Analysis, Trace Visualization, Software Testing, Fault Injection, System Software
The complexity and dynamism of microservice systems create unique challenges for fault analysis and debugging, yet research on these issues remains limited. We survey industry practices, build a medium-size benchmark microservice system, and replicate 22 industrial fault cases on it. We then run an empirical study to evaluate whether state-of-the-art tracing and visualization techniques improve on current debugging practices. The results indicate that proper tracing and visualization improve debugging, and that there is a strong need for intelligent, data-driven trace analysis and visualization to further aid fault localization and exploration.
The complexity and dynamism of microservice systems pose unique challenges to a variety of software engineering tasks such as fault analysis and debugging. Despite the prevalence and importance of microservices in industry, there is limited research on the fault analysis and debugging of microservice systems. To fill this gap, we conduct an industrial survey to learn about typical faults of microservice systems, current debugging practices, and the challenges developers face in practice. We then develop a medium-size benchmark microservice system (to our knowledge, the largest and most complex open source microservice system) and replicate 22 industrial fault cases on it. Based on the benchmark system and the replicated fault cases, we conduct an empirical study to investigate the effectiveness of existing industrial debugging practices and whether they can be further improved by introducing state-of-the-art tracing and visualization techniques for distributed systems. The results show that current industrial practices of microservice debugging can be improved by employing proper tracing and visualization techniques and strategies. Our findings also suggest that there is a strong need for more intelligent trace analysis and visualization, e.g., by combining trace visualization with improved fault localization, and by employing data-driven and learning-based recommendation for guided visual exploration and comparison of traces.