Table of Contents
Fetching ...

Efficient Interaction-Based Offline Runtime Verification of Distributed Systems with Lifeline Removal

Erwan Mahe, Boutheina Bannour, Christophe Gaston, Pascale Le Gall

TL;DR

This work tackles offline runtime verification for distributed systems by modeling expected communications with interactions and handling partial observations through lifeline removal. It introduces a formal framework linking global interaction traces to local multi-traces, plus an on-the-fly transformation that prunes the specification as subsystems become unobserved. The authors prove correctness, present two optimizations—Partial Order Reduction via one-unambiguous actions and Local Analyses—that significantly reduce search-space and verification time, and validate these enhancements with substantial experiments using the HIBOU tool. The approach enables conformance checking without clock synchronization, delivering practical offline RV capabilities for large, distributed environments with incomplete data.

Abstract

Runtime Verification (RV) refers to a family of techniques in which system executions are observed and confronted to formal specifications, with the aim of identifying faults. In Offline RV, observation is done in a first step and verification in a second, on a static artifact collected during observation. In this paper, we define an approach to offline RV of Distributed Systems (DS) against interactions. Interactions are formal models describing communications within a DS. DS are composed of subsystems deployed on different machines and interacting via message passing. Therefore, observing executions of a DS entails logging a collection of local execution traces, one for each subsystem, that we call a multi-trace. A major challenge in analyzing multi-traces is that there are no practical means to synchronize the ends of observations of all local traces. We address this via an operation, called lifeline removal, which we apply on-the-fly on the specification during verification once a local trace has been entirely analyzed. This operation removes from the interaction the specification of actions occurring on the subsystem that is no-longer observed. This may allow further execution of the specification via removing deadlocks due to the partial orders of actions. We prove the correctness of the resulting RV algorithm and introduce two optimization techniques which we also prove correct. We implement a Partial Order Reduction (POR) technique via the selection of a one-unambiguous action (as a unique first step to a linearization) which existence is determined via another use of the lifeline removal operator. Additionally, Local Analyses (LOC) i.e., the verification of local traces, can be leveraged during the global multi-trace analysis to prove failure more quickly. Experiments illustrate the application of our RV approach and the benefits of our optimizations.

Efficient Interaction-Based Offline Runtime Verification of Distributed Systems with Lifeline Removal

TL;DR

This work tackles offline runtime verification for distributed systems by modeling expected communications with interactions and handling partial observations through lifeline removal. It introduces a formal framework linking global interaction traces to local multi-traces, plus an on-the-fly transformation that prunes the specification as subsystems become unobserved. The authors prove correctness, present two optimizations—Partial Order Reduction via one-unambiguous actions and Local Analyses—that significantly reduce search-space and verification time, and validate these enhancements with substantial experiments using the HIBOU tool. The approach enables conformance checking without clock synchronization, delivering practical offline RV capabilities for large, distributed environments with incomplete data.

Abstract

Runtime Verification (RV) refers to a family of techniques in which system executions are observed and confronted to formal specifications, with the aim of identifying faults. In Offline RV, observation is done in a first step and verification in a second, on a static artifact collected during observation. In this paper, we define an approach to offline RV of Distributed Systems (DS) against interactions. Interactions are formal models describing communications within a DS. DS are composed of subsystems deployed on different machines and interacting via message passing. Therefore, observing executions of a DS entails logging a collection of local execution traces, one for each subsystem, that we call a multi-trace. A major challenge in analyzing multi-traces is that there are no practical means to synchronize the ends of observations of all local traces. We address this via an operation, called lifeline removal, which we apply on-the-fly on the specification during verification once a local trace has been entirely analyzed. This operation removes from the interaction the specification of actions occurring on the subsystem that is no-longer observed. This may allow further execution of the specification via removing deadlocks due to the partial orders of actions. We prove the correctness of the resulting RV algorithm and introduce two optimization techniques which we also prove correct. We implement a Partial Order Reduction (POR) technique via the selection of a one-unambiguous action (as a unique first step to a linearization) which existence is determined via another use of the lifeline removal operator. Additionally, Local Analyses (LOC) i.e., the verification of local traces, can be leveraged during the global multi-trace analysis to prove failure more quickly. Experiments illustrate the application of our RV approach and the benefits of our optimizations.
Paper Structure (30 sections, 3 theorems, 47 equations, 18 figures)

This paper contains 30 sections, 3 theorems, 47 equations, 18 figures.

Key Result

Theorem 1

For any $L \subseteq \mathcal{L}$, any $H \subseteq L$ and any $i \in \mathbb{I}(L)$, we have:

Figures (18)

  • Figure 1: Example interaction
  • Figure 2: Application of lifeline removal on the example from Fig.\ref{['fig:interaction_example']}...
  • Figure 3: Illustrating algebraic properties of $\mathsf{rmv}$ on the example from Fig.\ref{['fig:interaction_example']}
  • Figure 4: Clocks and characterization of executed behaviors
  • Figure 5: Multi-trace projection on the example from Fig.\ref{['fig:colocalization_schema']}
  • ...and 13 more figures

Theorems & Definitions (44)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • proof
  • proof
  • Definition 5
  • Definition 6
  • proof
  • Definition 7
  • ...and 34 more