FERN: Leveraging Graph Attention Networks for Failure Evaluation and Robust Network Design

Chenyi Liu, Vaneet Aggarwal, Tian Lan, Nan Geng, Yuan Yang, Mingwei Xu, Qing Li

TL;DR

FERN is a unified learning-based framework for scalable Failure Evaluation and Robust Network design. It supports a broad range of robust network design problems, including robust network validation, network upgrade optimization, and fault-tolerant traffic engineering.

Abstract

Robust network design, which aims to guarantee network availability under various failure scenarios while optimizing performance and cost objectives, has received significant attention. Existing approaches often rely on model-based mixed-integer optimization, which is hard to scale, or employ deep learning to solve specific engineering problems, with limited generalizability. In this paper, we show that failure evaluation provides a common kernel that improves the tractability and scalability of existing solutions. By approximating this common kernel with a neural network built on graph attention networks, we develop a unified learning-based framework, FERN, for scalable Failure Evaluation and Robust Network design. FERN represents rich problem inputs as a graph and captures both local and global views by attentively extracting features from the graph. It allows a broad range of robust network design problems, including the robust network validation, network upgrade optimization, and fault-tolerant traffic engineering problems discussed in this paper, to be recast with respect to the common kernel and thus computed efficiently using neural networks over a small set of critical failure scenarios. Extensive experiments on real-world network topologies show that FERN can efficiently and accurately identify key failure scenarios for both OSPF and optimal routing schemes, and that it generalizes well to different topologies and input traffic patterns. It can speed up these three robust network design problems by more than 80x, 200x, and 10x, respectively, with a negligible performance gap.
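
To make the "failure evaluation" kernel concrete, the sketch below shows the kind of exhaustive evaluation that a learned predictor is meant to avoid: it enumerates every set of k failed links, reroutes all demands on shortest paths, and records the resulting maximum link utilization (MLU). This is not FERN itself and not the authors' code. It assumes undirected links, a single shortest path per demand (no ECMP), and one capacity shared by both directions; the names `shortest_path`, `mlu_under_failure`, and `evaluate_failures` are hypothetical.

```python
import heapq
from itertools import combinations


def shortest_path(adj, src, dst):
    """Dijkstra over adj = {node: {neighbor: weight}}; returns a node list or None."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def mlu_under_failure(links, demands, failed):
    """MLU after removing `failed` links.

    links:   {(u, v): (weight, capacity)}, treated as undirected
    demands: {(src, dst): volume}
    failed:  set of link keys taken from `links`
    """
    adj = {}
    for (u, v), (w, _cap) in links.items():
        if (u, v) in failed:
            continue
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    load = {e: 0.0 for e in links}
    for (s, t), vol in demands.items():
        path = shortest_path(adj, s, t)
        if path is None:
            return float("inf")  # a demand is disconnected
        for a, b in zip(path, path[1:]):
            e = (a, b) if (a, b) in links else (b, a)
            load[e] += vol
    return max(load[e] / links[e][1] for e in links)


def evaluate_failures(links, demands, k=1):
    """Brute force: MLU for every combination of k simultaneously failed links."""
    return {
        failed: mlu_under_failure(links, demands, set(failed))
        for failed in combinations(sorted(links), k)
    }


# Toy triangle topology: unit weights, capacity 10 on every link.
links = {("A", "B"): (1, 10), ("A", "C"): (1, 10), ("B", "C"): (1, 10)}
demands = {("A", "C"): 5, ("A", "B"): 5}

scores = evaluate_failures(links, demands, k=1)
worst = max(scores.values())
# Normalize by the worst-case failure, as in the Figure 1 caption.
normalized = {failed: mlu / worst for failed, mlu in scores.items()}
```

The number of scenarios grows combinatorially with k, which is why restricting attention to a small set of critical failures matters on large topologies.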

Paper Structure (28 sections, 1 theorem, 21 equations, 20 figures, 8 tables)

This paper contains 28 sections, 1 theorem, 21 equations, 20 figures, 8 tables.

Key Result

Theorem 1

Eq. (eq:network-upgrade-F) holds for all failures $x \in X$ if it holds on the following set of critical failures $X_C$: $X_C = \left\{ x \mid F_{\theta}(x, G, D, r^o) \geq \frac{F_{\theta}(x^w, G, D, r^o)}{\mathrm{MLU}(x^w, G, D, r^o)} \right\}$.
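
The set $X_C$ in Theorem 1 is a simple threshold filter, which the following minimal sketch illustrates. It assumes that $F_\theta$ is the learned failure-impact predictor and that $x^w$ denotes the worst-case failure scenario whose MLU has been computed exactly; this excerpt does not define either symbol, so both readings are assumptions. The function name `critical_failures` and the scenario labels are hypothetical.

```python
def critical_failures(pred, worst, worst_mlu):
    """Return the scenarios x with F_theta(x) >= F_theta(x^w) / MLU(x^w).

    pred:      {scenario: predicted score F_theta(x, G, D, r^o)}
    worst:     key of the assumed worst-case scenario x^w
    worst_mlu: exact MLU(x^w, G, D, r^o), assumed known and positive
    """
    threshold = pred[worst] / worst_mlu
    return {x for x, score in pred.items() if score >= threshold}


# Made-up predicted scores for three failure scenarios.
pred = {"f1": 0.9, "f2": 0.4, "f3": 0.7}
critical = critical_failures(pred, worst="f1", worst_mlu=1.5)
print(sorted(critical))  # ['f1', 'f3']
```

Only the scenarios that pass the filter need to be checked against the constraint, so the work scales with the size of $X_C$ and not with the full failure set $X$.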

Figures (20)

  • Figure 1: Distributions of MLU increase on large-scale topology for OSPF and optimal (MCF) routing schemes under 2 simultaneous link failures. MLU under failure scenarios are normalized by the MLU under the worst-case failure scenario.
  • Figure 2: Overall structure of FERN.
  • Figure 3: Comparison of failure impact between optimal reroute and simplified reroute.
  • Figure 4: Working process of GAT-based failure impact prediction model.
  • Figure 5: Accuracy (ROC curves) of FERN classification model on seen topologies.
  • ...and 15 more figures

Theorems & Definitions (2)

  • Theorem 1
  • proof