Table of Contents
Fetching ...

A Formal Framework for Predicting Distributed System Performance under Faults (Extended Version)

Ziwei Zhou, Si Liu, Zhou Zhou, Peixin Wang, MIn Zhang

TL;DR

This work tackles predicting distributed system performance under faults by introducing a formal Maude-based framework that separates system models from a reusable fault library. It integrates a fault injector with model composition and transformation steps, enabling end-to-end performance prediction via statistical model checking using QuaTEx and PVeStA, while preserving the absence of nondeterminism through the $pmaude$ property. The authors implement an automated tool, PerF, and demonstrate its predictive accuracy across six diverse systems (2PC+CTP, Raft, PowerDNS, Cassandra, RAMP, Fast-HotStuff), with model-based results aligning closely to deployment evaluations under multiple fault scenarios. The approach enables proactive, design-time exploration of performance under realistic fault conditions and complements deployment fault injection by broadening the fault space that can be analyzed before implementation.

Abstract

Today's distributed systems operate in complex environments that inevitably involve faults and even adversarial behaviors. Predicting their performance under such environments directly from formal designs remains a longstanding challenge. We present the first formal framework that systematically enables performance prediction of distributed systems across diverse faulty scenarios. Our framework features a fault injector together with a wide range of faults, reusable as a library, and model compositions that integrate the system and the fault injector into a unified model suitable for statistical analysis of performance properties such as throughput and latency. We formalize the framework in Maude and implement it as an automated tool, PERF. Applied to representative distributed systems, PERF accurately predicts system performance under varying fault settings, with estimations from formal designs consistent with evaluations on real deployments.

A Formal Framework for Predicting Distributed System Performance under Faults (Extended Version)

TL;DR

This work tackles predicting distributed system performance under faults by introducing a formal Maude-based framework that separates system models from a reusable fault library. It integrates a fault injector with model composition and transformation steps, enabling end-to-end performance prediction via statistical model checking using QuaTEx and PVeStA, while preserving the absence of nondeterminism through the property. The authors implement an automated tool, PerF, and demonstrate its predictive accuracy across six diverse systems (2PC+CTP, Raft, PowerDNS, Cassandra, RAMP, Fast-HotStuff), with model-based results aligning closely to deployment evaluations under multiple fault scenarios. The approach enables proactive, design-time exploration of performance under realistic fault conditions and complements deployment fault injection by broadening the fault space that can be analyzed before implementation.

Abstract

Today's distributed systems operate in complex environments that inevitably involve faults and even adversarial behaviors. Predicting their performance under such environments directly from formal designs remains a longstanding challenge. We present the first formal framework that systematically enables performance prediction of distributed systems across diverse faulty scenarios. Our framework features a fault injector together with a wide range of faults, reusable as a library, and model compositions that integrate the system and the fault injector into a unified model suitable for statistical analysis of performance properties such as throughput and latency. We formalize the framework in Maude and implement it as an automated tool, PERF. Applied to representative distributed systems, PERF accurately predicts system performance under varying fault settings, with estimations from formal designs consistent with evaluations on real deployments.
Paper Structure (34 sections, 2 theorems, 4 figures, 2 tables)

This paper contains 34 sections, 2 theorems, 4 figures, 2 tables.

Key Result

theorem thmcountertheorem

The composed system model with fault injection guarantees AND.

Figures (4)

  • Figure 1: The workflow of our fault-injection framework. Arrows of different types in (a)--(d) represent the possible outcomes after a fault handler executes its designated operation to the message (Step 3).
  • Figure 2: The pipeline of PerF. Sketched files are provided by users, while dashed files are generated automatically by the tool. The composed model with the fault injector in Step B corresponds to Fig. \ref{['fig:overview']}.
  • Figure 3: Model-based performance predictions vs. deployment-level evaluations across six distributed systems and their variants. The scaling factor 1 t.u.$=T$ ms means that one time unit in the model corresponds to $T$ ms in the deployment.
  • Figure 4: An example run of PerF with the SMC result.

Theorems & Definitions (5)

  • theorem thmcountertheorem
  • definition thmcounterdefinition: AND DBLP:journals/pacmpl/LiuMOZB22
  • proof
  • theorem thmcountertheorem
  • proof