Recovery from Link Failures in Networks with Arbitrary Topology via Diversity Coding
S. N. Avci, X. Hu, E. Ayanoglu
TL;DR
This paper addresses fast, reliable recovery from single link failures in networks with arbitrary topology. It introduces diversity coding as a hitless recovery mechanism and develops an overlay design that groups links into basic diversity coding topologies while minimizing the added spare capacity. The design is evaluated against conventional methods in terms of spare capacity percentage (SCP), restoration time (RT), and Quality of Recovery (QoR). Across multiple network topologies, the approach yields substantially faster restoration and competitive or superior QoR, and in some cases the lowest spare capacity. The work highlights the practical potential for near-hitless restoration with reduced signaling and outlines avenues for extending the method to multiple failures using erasure coding.
Abstract
Link failures in wide area networks are common. To recover from such failures, a number of methods such as SONET rings, protection cycles, and source rerouting have been investigated. Two important considerations in such approaches are the recovery time and the spare capacity needed to complete the recovery. Usually, these techniques attempt to achieve a recovery time of less than 50 ms. In this paper we introduce an approach that provides link failure recovery in a hitless manner, that is, without any appreciable delay. This is achieved by means of a method called diversity coding. We present an algorithm for the design of an overlay network to achieve recovery from single link failures in arbitrary networks via diversity coding. This algorithm is designed to minimize the spare capacity needed for recovery. We compare this algorithm against conventional techniques in terms of recovery time, spare capacity, and a joint metric called Quality of Recovery (QoR). QoR incorporates both the spare capacity percentages and the worst-case recovery times. Based on these results, we conclude that the proposed technique provides much shorter recovery times while requiring similar spare capacity, and better QoR performance overall.
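The idea behind diversity coding can be illustrated with a minimal sketch. It assumes the simplest form of the technique: N data links protected by one extra link that carries the bitwise XOR of the N data streams. The function names (`xor_bytes`, `encode_parity`, `recover`) and the fixed-size block framing are hypothetical choices made for illustration only. The sketch does not reproduce the overlay design algorithm of the paper.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length blocks (equal length is an assumption)."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(data_links: List[bytes]) -> bytes:
    """Block sent on the protection link: the XOR of all data-link blocks."""
    return reduce(xor_bytes, data_links)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild the block of at most one failed data link (marked as None).

    The receiver needs only what already arrived on the surviving links
    and on the protection link; no message is sent back to the source.
    """
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) > 1:
        raise ValueError("a single parity link can repair only one failure")
    if not missing:
        return [block for block in received if block is not None]
    survivors = [block for block in received if block is not None]
    # XOR of the parity block with every surviving block yields the lost block.
    rebuilt = reduce(xor_bytes, survivors, parity)
    repaired = list(received)
    repaired[missing[0]] = rebuilt
    return [block for block in repaired if block is not None]


# Hypothetical example: three data links, the second one fails.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = encode_parity(data)
restored = recover([data[0], None, data[2]], parity)
```

Because the decoding uses only data that is already arriving at the destination, no failure notification or rerouting is required, which is why the recovery can be described as hitless. The cost is the spare capacity of the protection link, and that is the quantity the overlay design seeks to minimize.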
