Publication | Closed Access
How Long Will it Take to Mitigate this Incident for Online Service Systems?
12 Citations | 41 References | 2021
Unknown Venue
Engineering, Machine Learning, Information Security, Incident-TTM Prediction, Cyber Crime, Incident TTM, Large Language Model, Text Mining, Natural Language Processing, Data Science, Denial-of-service Attack, Management, Multi-task Learning, Incident Management, Machine Translation, Large AI Model, Cybercrime, Sequence Modelling, Predictive Analytics, NLP Task, Knowledge Discovery, Computer Science, Deep Learning, Accurate TTM Prediction, Online Service Systems, Cyberwarfare
Online service systems may encounter a large number of incidents, which should be mitigated as soon as possible to minimize service disruption time and ensure high service availability. The ability to predict the TTM (Time To Mitigation) of incidents can help service teams better organize their maintenance efforts. Although there are many traditional bug-fixing-time prediction methods, we find that they are not readily applicable to incident-TTM prediction because of the characteristics of incidents. To better understand how incidents are mitigated, we conduct the first empirical study of incident TTM on 20 large-scale online service systems at Microsoft. We investigate the time distribution across the main stages of the incident life cycle and explore the factors affecting TTM. Based on our empirical findings, we propose TTMPred, a deep-learning-based approach for incident-TTM prediction in a continuous triage scenario. TTMPred uses a two-level attention-based bidirectional GRU model to capture both the semantic information in text data and the temporal information in incremental discussions. With a novel continuous loss function, it builds a regression model that aims to predict TTM as accurately as possible at each prediction time point. Our experiments on four large-scale online service systems at Microsoft show that TTMPred is effective and significantly outperforms the compared approaches. For example, TTMPred improves on the state-of-the-art regression-based approach by 25.66% on average in terms of MAE (Mean Absolute Error).
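The abstract reports accuracy as MAE, with a prediction made at each time point of a continuous triage scenario. The following is a minimal sketch, not TTMPred itself, of how such predictions could be scored. It assumes TTM is measured in hours and that an incident receives a fresh TTM prediction whenever new discussion arrives, each compared against the same true TTM. The helper names `mae`, `continuous_mae`, and `relative_improvement` are hypothetical, and the improvement formula assumes the percentage is taken relative to the baseline's MAE.

```python
from typing import Sequence


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean Absolute Error: the average of |predicted - actual|."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def continuous_mae(step_predictions: Sequence[float], actual_ttm: float) -> float:
    """Hypothetical helper: MAE over the TTM predictions made for ONE incident
    at successive time points (e.g. after each new discussion entry).
    Every prediction is compared against the same true TTM (assumed hours)."""
    return mae(step_predictions, [actual_ttm] * len(step_predictions))


def relative_improvement(baseline_mae: float, new_mae: float) -> float:
    """Percentage reduction in MAE relative to the baseline (an assumed definition)."""
    if baseline_mae <= 0:
        raise ValueError("baseline MAE must be positive")
    return (baseline_mae - new_mae) / baseline_mae * 100.0


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study:
    # an incident with a true TTM of 10 hours, predicted at three time points.
    print(continuous_mae([16.0, 12.0, 10.5], 10.0))  # errors 6, 2, 0.5 -> mean of 8.5 / 3
    print(relative_improvement(4.0, 3.0))            # 25.0
```

Averaging the error over every time point, rather than scoring only the final prediction, rewards a model that is accurate early in an incident's life cycle as well as late, which matches the abstract's goal of accurate prediction at each time point.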