On the Resilience of Fast Failover Routing Against Dynamic Link Failures

Wenkai Dai; Klaus-Tycho Foerster; Stefan Schmid

TL;DR

This initial work re-investigates the resilience of failover routing against link flapping by categorizing link failures into static, semi-dynamic, and dynamic types, shedding light on the capabilities and limitations of failover routing under these scenarios.

Abstract

Modern communication networks feature local fast failover mechanisms in the data plane, to swiftly respond to link failures with pre-installed rerouting rules. This paper explores resilient routing meant to tolerate $\leq k$ simultaneous link failures, ensuring packet delivery, provided that the source and destination remain connected. While past theoretical works studied failover routing under static link failures, i.e., links which permanently and simultaneously fail, real-world networks often face link flapping--dynamic down states caused by, e.g., numerous short-lived software-related faults. Thus, in this initial work, we re-investigate the resilience of failover routing against link flapping, by categorizing link failures into static, semi-dynamic (removing the assumption that links fail simultaneously), and dynamic (removing the assumption that links fail permanently) types, shedding light on the capabilities and limitations of failover routing under these scenarios. We show that $k$-edge-connected graphs exhibit $(k-1)$-resilient routing against dynamic failures for $k \leq 5$. We further show that this result extends to arbitrary $k$ if it is possible to rewrite $\log k$ bits in the packet header. Rewriting $3$ bits suffices to cope with $k$ semi-dynamic failures. However, on general graphs, tolerating $2$ dynamic failures becomes impossible without bit-rewriting. Even by rewriting $\log k$ bits, resilient routing cannot resolve $k$ dynamic failures, demonstrating the limitation of local fast rerouting.

Paper Structure (11 sections, 17 theorems, 6 figures, 1 table, 2 algorithms)

Introduction and Related Work
Contributions
Organization
Preliminaries
Ideal Resilience Against Dynamic Failures
Background on Ideal Resilience against Static Failures
Ideal Resilience without Rewriting Bits in Packet Header
Ideal Resilience by Packet Header Rewriting
Perfect Resilience Against Dynamic Failures
Conclusions and Future Work
First Insights for Ideal Resilience against Static Failures

Key Result

Theorem 2

Given a $k$-connected graph $G$, with $k\leq 3$, any circular-arborescences routing is $(k-1)$-resilient against dynamic failures.
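As an illustration of the smallest case, the following is a minimal sketch of circular-arborescence routing on a ring, which is $2$-connected, with a single flapping link. It is not the paper's construction: the helper names (`ring_arborescences`, `circular_route`, `flapping_link`), the random up/down schedule, and the choice to track the current arborescence in a simulation variable (rather than inferring it from the incoming port) are all assumptions made for illustration.

```python
import random

def ring_arborescences(n):
    """Two arc-disjoint arborescences rooted at t = 0 on the ring 0..n-1."""
    t0 = {v: v - 1 for v in range(1, n)}        # arcs (v, v-1)
    t1 = {v: (v + 1) % n for v in range(1, n)}  # arcs (v, v+1 mod n)
    return [t0, t1]

def circular_route(trees, source, dest, link_down, max_steps):
    """Follow T_i; on a down link, bounce to T_{(i+1) mod k} at the same node.

    link_down(link, step) reports whether an undirected link is down at a
    given step. A bounce consumes one step here, which is a simplification.
    """
    node, i, path = source, 0, [source]
    for step in range(max_steps):
        if node == dest:
            return path
        nxt = trees[i][node]
        if link_down(frozenset((node, nxt)), step):
            i = (i + 1) % len(trees)  # switch to the next arborescence
            continue
        node = nxt
        path.append(node)
    return path if node == dest else None

def flapping_link(failed, seed, max_steps):
    """Hypothetical dynamic failure: `failed` is down at random steps."""
    rng = random.Random(seed)
    schedule = [rng.random() < 0.5 for _ in range(max_steps)]
    return lambda link, step: link == failed and schedule[step]

n = 6
trees = ring_arborescences(n)
failed = frozenset((2, 3))
budget = 4 * n
print(circular_route(trees, 4, 0, flapping_link(failed, 1, budget), budget))
```

On the ring, a packet that bounces off the flapping link moves away from it along the other arborescence and never meets it again, so delivery does not depend on when the link recovers. This only exercises the case $k=2$ with one failure and is not evidence for larger $k$.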

Figures (6)

Figure 1: Illustration of the main ideas in the proof of Theorem \ref{thm: 4-resilience}. When each arborescence $T\in \mathcal{T}$ with $\mathcal{T}=\left\lbrace T_1, T_2, T_3, T_4 \right\rbrace$ contains at least one failure in $F$ with $\left| F\right| \leq 3$, its meta-graph $H_F=\left(V_F, E_F\right)$ can be represented by one of the four subfigures in Fig. \ref{fig:4-resilience-proof-idea}, where each node $T^j_i\in V_F$, $0\leq j\leq 3$ and $1\leq i\leq 4$, denotes an arborescence $T_{\left( i+j\right) \mod 4}\in \mathcal{T}$, and each edge $\left\lbrace T^j_i, T^\ell_i\right\rbrace \in E_F$ (solid line) represents a failure in $F$ that is shared by the two arborescences $T^j_i\in \mathcal{T}$ and $T^\ell_i\in \mathcal{T}$. The circular-arborescence routing with the ordering $\left\langle T_1, T_2, T_3, T_4 \right\rangle$, denoted by dashed (blue) arcs, always includes a bounce from $T^j_i\in V_F$ to $T^\ell_i\in V_F$, where $T^\ell_i$ has degree one in $H_F$, indicating a potential well-bouncing. After the circular-arborescence routing switches from $T^j_i\in V_F$ to $T^\ell_i\in V_F$, a canonical routing along $T^\ell_i$ might not arrive at the destination $t$ directly, even if the arc $\left(T^j_i, T^\ell_i \right)$ implies a well-bouncing, since the failure currently encountered during the canonical routing along $T^j_i\in V_F$ may differ from the right failure that leads to the well-bouncing. However, we can prove that the routing eventually hits the right failure of the well-bouncing to approach $t$ by repeating the circular-arborescence routing of $\left\langle T_1, T_2, T_3, T_4 \right\rangle$ at most twice.
Figure 2: Counter-example for achieving $1$-resilience against dynamic failures in a $2$-connected graph $G$ when each node must employ a link-circular routing function. With link-circular routing, a node has only two possible orderings of its neighbors, i.e., clockwise and counter-clockwise in the shown drawing. For example, the clockwise and counter-clockwise orderings for $a$ are $\left<t,c,d\right>$ and $\left<t,d,c\right>$, respectively. If $a$ and $b$ use the clockwise (resp., counter-clockwise) orderings, then for a dynamic failure $F=\{c,b\}$ (resp., $F=\{b,d\}$) and a packet originating at $c$ (resp., $d$), a forwarding loop $\left(c,a,d,b,c\right)$ (resp., $\left(d,a,c,b,d \right)$) occurs, where $\{c,b\}$ (resp., $\{b,d\}$) is down only when the packet is first sent and recovers afterwards. However, if $b$ and $a$ use the clockwise and counter-clockwise orderings, respectively, and the node $c$ sends its original packet to $v\in \{a,b\}$, then the static failure $F=\left\lbrace v,t\right\rbrace$ implies a forwarding loop $\left(c,v,d, v' \right)$ with $v'\in\{a,b\}\setminus \{v\}$. Analogous arguments apply if $a$ and $b$ each reverse the ordering of their routing functions.
Figure 3: Counter-example for applying HDR-$3$-Bits (DBLP:journals/ton/ChiesaNMGMSS17, presented in Algorithm \ref{alg: three_bits}) against dynamic failures. For the $4$-connected graph $G=\left(V,E \right)$ and the four arc-disjoint arborescences $\{T_1, \ldots,T_4\}$ shown in this figure, the HDR-$3$-Bits algorithm results in a routing loop for the dynamic failures $F=\left\lbrace \{a,b\}, \{b,c\}, \{c,d\}\right\rbrace$. A packet originating at the node $x\in V$ is routed along $T_1$ until it hits the first failure $(b,a)$, and is then bounced to $T_2$ to follow the directed path $\left(b,c,d \right)$ until it hits the second failure $\left( c,d\right)$. The packet then starts at $c$ a reverse DFS traversal of $T_2$ and hits the failure $\left(c,b \right)$. Algorithm \ref{alg: three_bits} interprets $\left(c,b \right)$ as the first failure and concludes that the current arborescence $T_i$ is $T_4$. Thus, the algorithm selects $T_1$ instead of $T_2$ as the arborescence following $T_i$, and the packet follows the directed path $\left(c, x,b \right)$ on $T_1$ to hit the failure $\left(b,a \right)$ again, so that the same steps repeat and form a routing loop.
Figure 4: Example of applying the $2$-resilient source-matched routing algorithm of Dai et al. (DBLP:conf/spaa/DaiF023) to a graph $G=\left(V,E \right)$, shown as bold lines without arrows in Fig. \ref{fig:fig_SPAA}, for the source-destination pair $\left(s,t \right)$. Its kernel graph $\mathcal{G}$ is obtained by excluding the four red bold lines $\left\lbrace\{v_1,v_4\}, \{v_2,v_3\},\{u_1,u_4\}, \{u_2,u_3\} \right\rbrace$, as shown in DBLP:conf/spaa/DaiF023, where a kernel graph $\mathcal{G}$ is a subgraph of $G$ such that, for any set $F\subseteq E$ of two failures, if $s$ and $t$ are connected in $G\setminus F$, then they are also connected in $\mathcal{G}\setminus F$. By DBLP:conf/spaa/DaiF023, a forwarding scheme $\Pi^{(s,t)}$ defines a link-circular forwarding function at each node of $\mathcal{G}$, and we can easily verify that $\Pi^{(s,t)}$ is $2$-resilient against static failures. In this figure, $\Pi^{(s,t)}$ is illustrated by solid (red) arcs, dotted (green) arcs, and dashed (blue) arcs, such that, at a node $v$, a packet from an incoming arc $(u,v)$ is forwarded to an outgoing arc $(v,w)$ that has the same dash pattern (color) as $(u,v)$. If an outgoing arc $(v,w)$ has failed, then the arc $(w,v)$ is treated as an incoming arc and forwarding continues on the dash pattern (color) of $(w,v)$, while a packet originating at $s$ can arbitrarily select either the solid (red) arc $(s,v_{10})$ or the dashed (blue) arc $(s,u_{10})$ to start. However, this forwarding scheme $\Pi^{(s,t)}$ is not $2$-resilient against semi-dynamic failures. For the semi-dynamic failures $F=\left\lbrace \{v_1,v_2\}, \{v_7,v_9\} \right\rbrace$, by starting at $s$ and following the forwarding rules (red arcs), the packet goes through $\left(s, v_{10}, v_0, v_5, v_1, v_2, v_7 \right)$ to meet the first failure $\left(v_7, v_9 \right)$, and is then rerouted by the dashed forwarding rules (green arcs) to traverse $\left(v_7, v_2 \right)$ and hit the second failure $\left(v_2,v_1 \right)$. Now, $\Pi^{(s,t)}$ leaves the packet stuck in the connected component on $\{v_2, v_7\}$, although the graph $G\setminus F$ still contains a path from $v_7$ to $t$, e.g., $\left(v_7, v_2, v_3, v_4, v_8, v_9, v_{11}, t \right)$, implying that $\Pi^{(s,t)}$ is not $2$-resilient against semi-dynamic failures. Moreover, after adapting $\Pi^{(s,t)}$ by additionally enforcing clockwise link-circular routing at $v_1$ and $v_2$ to include $\left\lbrace \{v_1,v_4\}, \{v_2,v_3\} \right\rbrace$, we can easily verify that it becomes a $2$-resilient source-matched routing against semi-dynamic failures.
Figure 5: Counter-example topology $G$ for a $2$-resilient source-matched routing scheme against dynamic failures, where $s$ is the source and $t$ is the destination. Let $V'=\left\lbrace v_0,\ldots, v_5 \right\rbrace$ and $U'=\left\lbrace u_0, \ldots, u_5 \right\rbrace$. By symmetry, w.l.o.g., we can assume $\pi_{s}\left(\bot \right) =v_0$ when $F_{s}=\emptyset$. Then, we can show that each node $v\in V'\cup \{s\}$ must use a link-circular routing function, which has only two possible orderings of its neighbors, i.e., clockwise and counter-clockwise in the shown drawing. For instance, the clockwise and counter-clockwise orderings for $v_1$ are $\left<v_{0},v_2,v_4\right>$ and $\left<v_0,v_4,v_2\right>$, respectively. We can further show that $v_1$ and $v_3$ must have the same type of ordering (clockwise or counter-clockwise); otherwise a routing loop can occur, e.g., if $v_1$ and $v_3$ select the clockwise and counter-clockwise orderings, respectively, then a loop $\left(s,v_{0},v_1,v_2,v_3,v_4, v_1,v_0,s\right)$ occurs for a static failure $F=\left\lbrace s,u_0 \right\rbrace$. When $v_1$ and $v_3$ both use the clockwise (resp., counter-clockwise) ordering, for a dynamic failure $\{v_2, v_3\}\in F$ (resp., $\{v_3, v_4\}\in F$), let $\left(v_2, v_3 \right)$ (resp., $\left(v_4, v_3 \right)$) be down and $\left(v_3, v_2 \right)$ (resp., $\left(v_3, v_4 \right)$) be up. Then, a routing loop $\left( s,v_{0},v_1,v_2,v_1,v_4, v_3,v_2,v_1\right)$ (resp., $\left( s,v_{0},v_1,v_4,v_1,v_2, v_3,v_4,v_1\right)$) appears, and the packet originating at $s$ cannot reach $t$ even though there is an $s$-$t$ path containing no dynamic failure. A similar proof can be given when $\pi_{s}\left(\bot \right) =u_0$ for $F_{s}=\emptyset$.
...and 1 more figure
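The forwarding loop described in the caption of Figure 2 can be replayed with a short simulation. This is a sketch under assumptions: the topology ($a$ and $b$ each adjacent to $t$, $c$, $d$) and the ordering at $b$ are inferred from the loop $\left(c,a,d,b,c\right)$ stated in the caption, the orderings at $c$ and $d$ and the rule that $c$ first tries $b$ are chosen for illustration, and the names `ORDER`, `forward`, and `trace` are hypothetical.

```python
# Cyclic neighbor orderings (assumed; only the ordering at a is given
# explicitly in the caption, as <t, c, d>).
ORDER = {
    "a": ["t", "c", "d"],
    "b": ["t", "d", "c"],  # inferred: b must forward packets from d to c
    "c": ["b", "a"],       # assumed: c tries b first when originating
    "d": ["a", "b"],
}

def forward(node, came_from, link_up):
    """Link-circular rule: next neighbor after `came_from` whose link is up."""
    order = ORDER[node]
    start = 0 if came_from is None else order.index(came_from) + 1
    for k in range(len(order)):
        cand = order[(start + k) % len(order)]
        if link_up(node, cand):
            return cand
    return None

def trace(source, link_down_at, max_hops=12):
    """Return the node sequence visited; stops at t or after max_hops."""
    node, came_from, path = source, None, [source]
    for step in range(max_hops):
        if node == "t":
            break
        nxt = forward(node, came_from,
                      lambda u, v: not link_down_at(frozenset((u, v)), step))
        if nxt is None:
            break
        came_from, node = node, nxt
        path.append(node)
    return path

CB = frozenset(("c", "b"))
flapping = lambda link, step: link == CB and step == 0  # down only at the start
static = lambda link, step: link == CB                  # permanently down

print(trace("c", flapping, 8))
print(trace("c", static))
```

Under these assumptions, the permanently failed link is detected again at $b$, which then forwards to $t$, whereas the link that recovers after the first hop sends the packet back to $c$ and the cycle repeats for as long as the hop budget allows.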

Theorems & Definitions (19)

Definition 1: $k$-Resilient Failover Routing Problem
Theorem 2
Theorem 3
Lemma 4
Theorem 5
Theorem 6
Theorem 7
Theorem 8
Theorem 9: robroute16infocom
Theorem 10
...and 9 more
