Fault-Tolerant Design and Multi-Objective Model Checking for Real-Time Deep Reinforcement Learning Systems

Guoxin Su; Thomas Robinson; Hoa Khanh Dam; Li Liu; David S. Rosenblum

Fault-Tolerant Design and Multi-Objective Model Checking for Real-Time Deep Reinforcement Learning Systems

Guoxin Su, Thomas Robinson, Hoa Khanh Dam, Li Liu, David S. Rosenblum

Abstract

Deep reinforcement learning (DRL) has emerged as a powerful paradigm for solving complex decision-making problems. However, DRL-based systems still face significant dependability challenges particularly in real-time environments due to the simulation-to-reality gap, out-of-distribution observations, and the critical impact of latency. Latency-induced faults, in particular, can lead to unsafe or unstable behaviour, yet existing fault-tolerance approaches to DRL systems lack formal methods to rigorously analyse and optimise performance and safety simultaneously in real-time settings. To address this, we propose a formal framework for designing and analysing real-time switching mechanisms between DRL agents and alternative controllers. Our approach leverages Timed Automata (TAs) for explicit switch logic design, which is then syntactically converted to a Markov Decision Process (MDP) for formal analysis. We develop a novel convex query technique for multi-objective model checking, enabling the optimisation of soft performance objectives while ensuring hard safety constraints for MDPs. Furthermore, we present MOPMC, a GPU-accelerated software tool implementing this technique, demonstrating superior scalability in both model size and objective numbers.

Fault-Tolerant Design and Multi-Objective Model Checking for Real-Time Deep Reinforcement Learning Systems

Abstract

Paper Structure (42 sections, 7 theorems, 7 equations, 9 figures, 5 tables, 1 algorithm)

This paper contains 42 sections, 7 theorems, 7 equations, 9 figures, 5 tables, 1 algorithm.

Introduction
Motivations and Example
Real-Time DRL System Characteristics
Example: Autonomous-Driving Agent
Switch Architecture and Design Choices
Formal Design Framework
TA-Based Switch Modelling
Incremental Design by Synchronised Composition
Syntactic-Converted MDP as Analysis Model
Design Choice Analysis
Convex Query on Multi-Objective MDP
Geometric Preliminaries
Multi-Objective MDP: Basic Queries
Convex Query
Policy-Value Iteration for Hyperplane Computation
...and 27 more sections

Key Result

Lemma 4.1

Let $\Psi\subseteq \mathbb{R}^n$ be a convex set of points. For any $\boldsymbol{v}\not\in \Psi$, there is a weight vector $\boldsymbol{w}$ such that $\boldsymbol{w}\cdot\boldsymbol{v}> \boldsymbol{w}\cdot\boldsymbol{x}$ for all $\boldsymbol{x}\in \Psi$. We say that $\boldsymbol{w}$separates$\boldsy

Figures (9)

Figure 1: DRL Agent for Autonomous Driving
Figure 2: Real-time Switch Architecture
Figure 3: Overview of the Switch Design Approach
Figure 4: $\mathcal{A}_{\mathit{sw}0}$: Base Model of a Real-time Switch
Figure 5: TAs for Synchronised Composition with $\mathcal{A}_{\mathit{SW}0}$
...and 4 more figures

Theorems & Definitions (14)

Definition 3.1: Timed automata
Definition 3.3: Synchronised composition
Definition 3.4: Markov decision process
Definition 3.5: Scheduler
Definition 3.6: Reward function
Definition 3.7: Total reward
Lemma 4.1: Separating and supporting hyperplane theorems boyd2004convex
Lemma 4.2: Polytope for multi-objective MDP Etessami2008Forejt2011
Definition 4.3: Convex query
Theorem 4.4: Correctness of Algorithm \ref{['alg:main-convex-query']}
...and 4 more

Fault-Tolerant Design and Multi-Objective Model Checking for Real-Time Deep Reinforcement Learning Systems

Abstract

Fault-Tolerant Design and Multi-Objective Model Checking for Real-Time Deep Reinforcement Learning Systems

Authors

Abstract

Table of Contents

Key Result

Figures (9)

Theorems & Definitions (14)