Intrusion Tolerance for Networked Systems through Two-Level Feedback Control

Kim Hammar, Rolf Stadler

TL;DR

This work introduces TOLERANCE, a two-level feedback-control architecture for intrusion-tolerant systems that jointly optimizes local intrusion recovery and global replication management. The authors map the two control problems to classical operations-research problems (the machine replacement problem and the inventory replenishment problem), prove that the optimal strategies on both levels have threshold structure, and develop efficient algorithms to compute them. They implement a proof of concept on a three-layer testbed using a reconfigurable MinBFT consensus protocol and evaluate it against 10 types of network intrusions; the results show improved service availability and reduced operational cost compared with state-of-the-art intrusion-tolerant systems. The work also provides a rigorous framework for analyzing recovery and replication under partial synchrony and Byzantine-style threats, and outlines directions for extending the model with game-theoretic analysis and online learning of intrusion-detection models.

Abstract

We formulate intrusion tolerance for a system with service replicas as a two-level optimal control problem. On the local level, node controllers perform intrusion recovery, and on the global level, a system controller manages the replication factor. The local and global control problems can be formulated as classical problems in operations research, namely, the machine replacement problem and the inventory replenishment problem. Based on this formulation, we design TOLERANCE, a novel control architecture for intrusion-tolerant systems. We prove that the optimal control strategies on both levels have threshold structure and design efficient algorithms for computing them. We implement and evaluate TOLERANCE in an emulation environment where we run 10 types of network intrusions. The results show that TOLERANCE can improve service availability and reduce operational cost compared with state-of-the-art intrusion-tolerant systems.
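To make the two-level threshold structure concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The class names, the thresholds `alpha` and `beta`, the action labels, and the use of the expected number of healthy nodes as the global state are all hypothetical choices. The abstract only states that the optimal strategies on both levels have threshold structure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NodeController:
    """Local level (assumed form): recover once the compromise belief reaches alpha."""
    alpha: float  # hypothetical recovery threshold in [0, 1]

    def act(self, belief: float) -> str:
        return "recover" if belief >= self.alpha else "wait"


@dataclass
class SystemController:
    """Global level (assumed form): add a node when too few nodes are expected healthy."""
    beta: float  # hypothetical threshold on the expected number of healthy nodes

    def act(self, beliefs: List[float]) -> str:
        # Each belief is the estimated probability that a node is compromised,
        # so (1 - belief) is the estimated probability that it is healthy.
        expected_healthy = sum(1.0 - b for b in beliefs)
        return "add_node" if expected_healthy <= self.beta else "no_op"


def control_step(
    beliefs: List[float], node_ctrl: NodeController, sys_ctrl: SystemController
) -> Tuple[List[str], str]:
    """One time step: every node decides locally, then the system decides globally."""
    local_actions = [node_ctrl.act(b) for b in beliefs]
    global_action = sys_ctrl.act(beliefs)
    return local_actions, global_action


if __name__ == "__main__":
    beliefs = [0.1, 0.8, 0.3]  # made-up belief states for three nodes
    print(control_step(beliefs, NodeController(alpha=0.5), SystemController(beta=2.0)))
```

In this sketch each level reduces to a single comparison against a scalar, which is what makes threshold strategies cheap to evaluate at run time once the thresholds have been computed.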

Paper Structure (35 sections, 4 theorems, 30 equations, 18 figures, 8 tables, 2 algorithms)

Key Result

Proposition 1

TOLERANCE provides correct service if the following holds:

Figures (18)

  • Figure 1: Two-level feedback control for intrusion tolerance; node controllers with strategies $\pi_{1},\ldots,\pi_{N_t}$ compute belief states $b_1,\ldots,b_{N_t}$ and make local recovery decisions; a global system controller with strategy $\pi$ receives belief states and manages the replication factor $N_t$.
  • Figure 2: The TOLERANCE architecture; $N_t$ nodes provide a replicated service to a client population; service responses are coordinated through an intrusion-tolerant consensus protocol; local node controllers decide when to perform recovery and a global system controller manages the replication factor $N_t$.
  • Figure 3: State transition diagram of node $i$ (\ref{eq:recovery_dynamics}): disks represent states; arrows represent state transitions; labels indicate probabilities and conditions for state transition; self-transitions are not shown.
  • Figure 4: The optimal value function $V^{\star}_{i,t}(b_{i,t})$ for Prob. \ref{prob_1}; the dashed red lines indicate the alpha-vectors \cite{smallwood_1,krishnamurthy_2016} and the solid black lines indicate $V^{\star}_{i,t}(b_{i,t})$; the parameters for computing $V^{\star}_{i,t}(b_{i,t})$ are listed in Appendix \ref{appendix:hyperparameters}.
  • Figure 5: Probability that a node is compromised ($\mathbb{C}$) or crashed ($\emptyset$) by time-step $t$ if no recoveries occur; the curves relate to $p_{\mathrm{A},i}$ (\ref{eq:recovery_dynamics}); hyperparameters are listed in Appendix \ref{appendix:hyperparameters}.
  • ...and 13 more figures
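
The belief states $b_1,\ldots,b_{N_t}$ mentioned in the Figure 1 caption can be illustrated with a generic Bayesian filter. The Python sketch below is a simplification under stated assumptions, not the paper's exact recursion. It tracks only two states (healthy and compromised), ignores crashes and recoveries, and assumes a healthy node becomes compromised with a fixed probability `p_attack` per time step. The function name and all parameter names are hypothetical.

```python
def belief_update(
    belief: float,
    lik_healthy: float,
    lik_compromised: float,
    p_attack: float,
) -> float:
    """Return the updated probability that a node is compromised.

    belief:          prior probability that the node is compromised
    lik_healthy:     likelihood of the new observation if the node is healthy
    lik_compromised: likelihood of the new observation if the node is compromised
    p_attack:        assumed per-step probability that a healthy node is compromised
    """
    # Prediction step: a compromised node stays compromised (no recovery here),
    # and a healthy node becomes compromised with probability p_attack.
    prior_compromised = belief + (1.0 - belief) * p_attack
    prior_healthy = 1.0 - prior_compromised

    # Correction step: Bayes' rule with the observation likelihoods.
    numerator = lik_compromised * prior_compromised
    denominator = numerator + lik_healthy * prior_healthy
    if denominator == 0.0:
        # The observation is impossible under both states; keep the prediction.
        return prior_compromised
    return numerator / denominator


if __name__ == "__main__":
    b = 0.0
    # Made-up likelihoods: each observation is 9x more likely under compromise.
    for _ in range(3):
        b = belief_update(b, lik_healthy=0.1, lik_compromised=0.9, p_attack=0.1)
        print(round(b, 3))
```

An uninformative observation (equal likelihoods) leaves only the prediction step, so the belief then grows purely through `p_attack`.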

Theorems & Definitions (8)

  • Proposition 1
  • Proof (Sketch)
  • Theorem 1
  • Proof
  • Corollary 1
  • Proof
  • Theorem 2
  • Proof