
Fault Identification Enhancement with Reinforcement Learning (FIERL)

Valentina Zaccaria, Davide Sartor, Simone Del Favero, Gian Antonio Susto

TL;DR

The paper targets fault identification by decoupling passive fault detection (PFD) from control-input design and reframing active fault detection (AFD) as an input-design problem solved with constrained reinforcement learning (CRL). FIERL formulates this as a constrained Markov decision process (CMDP) in which the reward $\mathcal{M}(z^{est}_t,z_t)$ promotes accurate fault estimates and the cost $\mathcal{D}(y_t,y^{ref}_t)$ bounds the disturbance to reference tracking; the optimal policy is $\pi^* = \arg\max_{\pi} \mathbb{E}_{\tau \sim \pi}[\sum_t \gamma^t \mathcal{M}(z^{est}_t,z_t)]$, subject to a constraint on the tracking cost. The framework is instantiated for linear actuator faults with a Gaussian passive observer, which yields Kalman-like update rules, and is demonstrated on a three-tank benchmark, where FIERL outperforms a naive perturbation controller and generalizes to fault dynamics not seen in training. While the approach is robust and applies to continuous sets of fault modes, it is computationally intensive and lacks global convergence guarantees, motivating future work on richer constraint costs and on extensions to non-linear or over-actuated systems.
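The CMDP above can be made concrete with a small sketch. It assumes, as in the standard CMDP form, that the constraint bounds the expected discounted tracking cost, $J_{\mathcal{D}} \le d$; the paper's exact constraint may instead be a per-step bound such as the tracking threshold $\Delta y^{max}$. It also uses Lagrangian relaxation, a common CRL baseline, which is not necessarily the method FIERL uses. The functions `fault_metric` and `tracking_cost`, and the parameters `cost_limit`, `lam` and `lr`, are hypothetical.

```python
import numpy as np


def fault_metric(z_est, z):
    """Hypothetical reward M: negative squared fault-estimation error."""
    return -float(np.sum((np.asarray(z_est) - np.asarray(z)) ** 2))


def tracking_cost(y, y_ref):
    """Hypothetical cost D: absolute deviation from the reference output."""
    return float(np.sum(np.abs(np.asarray(y) - np.asarray(y_ref))))


def discounted_return(values, gamma):
    """Sum over t of gamma**t * values[t] for one trajectory."""
    values = np.asarray(values, dtype=float)
    return float(np.sum(gamma ** np.arange(len(values)) * values))


def lagrangian_objective(rewards, costs, gamma, lam, cost_limit):
    """Scalarized objective J_M - lam * (J_D - cost_limit) for one trajectory."""
    j_m = discounted_return(rewards, gamma)
    j_d = discounted_return(costs, gamma)
    return j_m - lam * (j_d - cost_limit), j_m, j_d


def dual_update(lam, j_d, cost_limit, lr):
    """Projected gradient ascent on the multiplier: lam grows while the
    tracking constraint is violated and never drops below zero."""
    return max(0.0, lam + lr * (j_d - cost_limit))


# Example: a three-step trajectory whose tracking cost exceeds the limit.
rewards = [fault_metric([1.0], [0.0]), -0.5, -0.25]
costs = [tracking_cost(1.0, 1.0), 0.2, 0.1]
obj, j_m, j_d = lagrangian_objective(rewards, costs, gamma=0.9, lam=1.0, cost_limit=0.2)
lam = dual_update(1.0, j_d, cost_limit=0.2, lr=1.0)
```

In a full training loop, the policy parameters would be updated to increase the Lagrangian while the multiplier follows `dual_update`; here a single trajectory stands in for the expectation over $\tau \sim \pi$.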

Abstract

This letter presents a novel approach to Active Fault Detection (AFD) that explicitly separates the task into two parts: Passive Fault Detection (PFD) and control input design. This formulation is very general, and most existing AFD literature can be viewed through this lens. With this separation, PFD methods can be leveraged to provide components that make efficient use of the available information, while the control input is designed to optimize the gathering of information. The core contribution of this work is FIERL, a general simulation-based approach for designing such control strategies, which uses Constrained Reinforcement Learning (CRL) to optimize the performance of arbitrary passive detectors. The control policy is learned without requiring knowledge of the passive detector's inner workings, which makes FIERL broadly applicable; it is, however, especially useful when paired with an efficiently designed passive component. Unlike most AFD approaches, FIERL can handle fairly complex scenarios such as continuous sets of fault modes. FIERL is tested on a benchmark problem for actuator fault diagnosis, where it proves fairly robust and generalizes to fault dynamics not seen in training.
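The separation described in the abstract, where the control policy treats the passive detector as a black box, can be sketched as a closed loop. The sketch assumes a scalar toy plant $y_{t+1} = a y_t + b\,\theta u_t + v_t$, where $\theta$ is an actuator-effectiveness parameter ($\theta = 1$ when healthy), and a Gaussian belief over $\theta$ updated with a scalar Kalman step. This is only in the spirit of the paper's Kalman-like observer, not its actual equations. `GaussianFaultEstimator`, `run_episode` and `offset_policy` are hypothetical names, and the fixed-offset policy is a stand-in rather than the paper's naive baseline.

```python
import numpy as np


class GaussianFaultEstimator:
    """Hypothetical passive component: Gaussian belief N(mu, p) over a scalar
    actuator effectiveness theta in y[t+1] = a*y[t] + b*theta*u[t] + noise."""

    def __init__(self, a, b, mu0=1.0, p0=0.1, r=1e-4, q=1e-6):
        self.a, self.b = a, b    # known plant parameters
        self.r, self.q = r, q    # measurement-noise and fault random-walk variances
        self.mu, self.p = mu0, p0

    def update(self, y, u, y_next):
        """One scalar Kalman step on theta using the transition (y, u, y_next)."""
        self.p += self.q                     # predict: theta may drift slowly
        h = self.b * u                       # y_next - a*y is linear in theta
        s = h * h * self.p + self.r          # innovation variance
        k = self.p * h / s                   # Kalman gain
        self.mu += k * (y_next - self.a * y - h * self.mu)
        self.p *= 1.0 - k * h                # posterior variance
        return self.mu, self.p


def run_episode(policy, estimator, theta, a, b, y_ref, steps, noise_std, seed=0):
    """Closed loop: the policy sees only the estimator's belief and the output."""
    rng = np.random.default_rng(seed)
    y = y_ref
    for _ in range(steps):
        u = policy(estimator.mu, estimator.p, y, y_ref)
        y_next = a * y + b * theta * u + noise_std * rng.standard_normal()
        estimator.update(y, u, y_next)
        y = y_next
    return estimator.mu, estimator.p


def offset_policy(mu, p, y, y_ref):
    """Hypothetical stand-in controller: proportional feedback plus a constant
    offset; it ignores the belief (mu, p), unlike a learned FIERL policy."""
    return 0.5 * (y_ref - y) + 0.2


# Example: a 30% loss of actuator effectiveness (theta = 0.7).
est = GaussianFaultEstimator(a=0.9, b=1.0, r=0.01 ** 2)
mu, p = run_episode(offset_policy, est, theta=0.7, a=0.9, b=1.0,
                    y_ref=1.0, steps=200, noise_std=0.01)
print(f"theta estimate {mu:.3f} +/- {np.sqrt(p):.3f}")
```

Because `run_episode` only calls `estimator.update` and reads `mu` and `p`, any other passive detector exposing the same interface could be swapped in without changing the policy, which is the property the abstract emphasizes.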

Paper Structure (12 sections, 22 equations, 3 figures)

Figures (3)

  • Figure 1: FIERL: depiction of the offline policy training phase (left) and of the control flow during deployment (right).
  • Figure 2: Experimental results for a random testing episode. The top two panels show the evolution of the fault estimate distributions for FIERL and for the naive approach. With the RL policy, the fault estimate quickly approaches the true fault values after jumps, is generally more accurate, and maintains a tighter distribution. Because of how the passive component is designed, a single bursty action provides more information on the fault mode than applying the equivalent control over a longer period of time. This agrees with the RL policy's behaviour, shown in the fourth panel, where a large action is followed by a sequence of small negative ones in order to meet the tracking performance requirements (shown in the third panel).
  • Figure 3: Diagnosis performance as a function of the tracking threshold $\Delta y^{max}$. Each boxplot shows the distribution of average returns over test episodes for FIERL under the corresponding control requirement. The color map shows the probability of violating the constraints purely due to drifts caused by process noise. As $\Delta y^{max}$ shrinks, diagnosis performance degrades quickly and training becomes unstable. This is because the starting policy, no matter how conservatively it is initialized, is almost certainly infeasible, whereas a feasible starting policy is needed to ensure the convergence bounds of achiam2017constrained.
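The Figure 2 caption's claim that a single bursty action is more informative than the equivalent control spread over time can be checked in a simplified setting. Assume the fault is a constant scalar $\theta$ observed through $z_t = b\,\theta u_t + v_t$, with a Gaussian prior of variance $P_0$ and Gaussian noise of variance $R$ (an assumption, not the paper's three-tank observer). The posterior precision is then $1/P_0 + \sum_t b^2 u_t^2 / R$, so for a fixed total input $\sum_t u_t$ it is largest when the input is concentrated in a single step. `posterior_variance` is a hypothetical helper.

```python
import numpy as np


def posterior_variance(inputs, b=1.0, p0=0.1, r=1e-4):
    """Posterior variance of a constant scalar fault theta with prior variance p0
    after observing z_t = b * theta * u_t + v_t, v_t ~ N(0, r), for each u_t.
    Equivalent to running a scalar Kalman filter with no process noise."""
    inputs = np.asarray(inputs, dtype=float)
    precision = 1.0 / p0 + np.sum((b * inputs) ** 2) / r
    return float(1.0 / precision)


# Same total input (0.3), applied as one burst or spread over three steps.
bursty = [0.3, 0.0, 0.0]
spread = [0.1, 0.1, 0.1]
print(posterior_variance(bursty), posterior_variance(spread))
```

With the default values, the burst leaves a variance of roughly $1/910$, against roughly $1/310$ for the spread input.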