Publication | Closed Access
Analysis of link failures in an IP backbone
Citations: 321
References: 0
Year: 2002
Venue: Unknown
Reliability Engineering, Network Science, Availability, Engineering, Fault-tolerant Network, Edge Computing, Survivable Network, Network Robustness, Network Analysis, Sprint IP Backbone, Robust Routing, Computer Science, IP Backbones, IP Backbone, Advanced Networking, Delay-tolerant Networking
IP backbones are provisioned for excellent performance in loss, delay, and availability, yet failures such as fiber cuts or router crashes can degrade performance and disrupt service, which makes a realistic link‑failure model essential. This study investigates the occurrence of failures in Sprint’s IP backbone and their potential impact on emerging services such as Voice‑over‑IP. The authors derived failure frequency and duration from IS‑IS routing updates collected at three backbone points, analyzed the distributions of inter‑failure time and failure duration, and measured routing and service reconvergence times in a controlled link‑failure scenario. Link failures occur as part of everyday operation and are mostly brief (less than 10 minutes). Packet‑forwarding disruption depends on both routing protocol dynamics and the design of router architectures and control planes. These findings inform a network‑wide availability metric that the authors consider better suited to SLAs for emerging applications.
Today's IP backbones are provisioned to provide excellent performance in terms of loss, delay, and availability. However, performance degradation and service disruption are likely in the case of failures such as fiber cuts and router crashes. In this paper, we investigate the occurrence of failures in Sprint's IP backbone and their potential impact on emerging services such as Voice-over-IP (VoIP). We first examine the frequency and duration of failure events derived from IS-IS routing updates collected from three different points in the Sprint IP backbone. We observe that link failures occur as part of everyday operation, and the majority of them are short-lived (less than 10 minutes). We also discuss various statistics, such as the distribution of inter-failure time and the distribution of link failure durations, which are essential for constructing a realistic link failure model. Next, we present an analysis of routing and service reconvergence time during a controlled link failure scenario in our backbone. Our results indicate that disruption to packet forwarding after link failures depends not only on routing protocol dynamics, but also on the design of routers' architectures and control planes. Thus, our results offer insights into two basic components for defining network-wide availability, which we consider a more appropriate metric for service-level agreements to support emerging applications.
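The failure statistics named in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the event format (timestamp in seconds, link identifier, "down"/"up" state), the function names, and the choice to measure inter-failure time between successive failure starts on the same link are all assumptions made for illustration, and the sample events are invented.

```python
from collections import defaultdict


def failure_stats(events):
    """Compute failure durations and inter-failure times, in seconds.

    `events` is an iterable of (timestamp_seconds, link_id, state) tuples,
    with state either "down" or "up". This format is hypothetical; it is
    not the IS-IS update format used in the paper.
    """
    down_since = {}                     # link_id -> time the current failure began
    failure_starts = defaultdict(list)  # link_id -> start times of all its failures
    durations = []

    for ts, link, state in sorted(events):
        if state == "down" and link not in down_since:
            down_since[link] = ts
            failure_starts[link].append(ts)
        elif state == "up" and link in down_since:
            durations.append(ts - down_since.pop(link))

    # Assumption: inter-failure time is the gap between successive
    # failure starts on the same link.
    inter_failure = []
    for starts in failure_starts.values():
        inter_failure.extend(b - a for a, b in zip(starts, starts[1:]))

    return durations, inter_failure


def fraction_shorter_than(durations, threshold_seconds):
    """Fraction of failures whose duration is below the threshold."""
    if not durations:
        return 0.0
    return sum(d < threshold_seconds for d in durations) / len(durations)


# Invented sample events, for illustration only.
sample_events = [
    (0, "A-B", "down"), (120, "A-B", "up"),
    (500, "C-D", "down"), (2300, "C-D", "up"),
    (1000, "A-B", "down"), (1030, "A-B", "up"),
]
durations, gaps = failure_stats(sample_events)
short_fraction = fraction_shorter_than(durations, 10 * 60)  # under 10 minutes
```

With real data, the `durations` and `gaps` lists would feed the empirical distributions the abstract describes as inputs to a link failure model.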