Trustworthy and Explainable Deep Reinforcement Learning for Safe and Energy-Efficient Process Control: A Use Case in Industrial Compressed Air Systems

Vincent Bezold, Patrick Wagner, Jakob Hofmann, Marco Huber, Alexander Sauer

TL;DR

The paper tackles energy inefficiency and safety risks in industrial compressed air systems by introducing a trustworthy DRL controller for multi-compressor setups. It combines a PPO-based policy with a physics-informed simulation and a multi-level explainability pipeline (input perturbation, gradient sensitivity, and SHAP, including time-resolved attributions) to check plausibility and provide transparency. Empirical results show the agent lowers average pressure, respects safety boundaries, and achieves about 4% energy savings without explicit physics models, with explanations consistently highlighting pressure and forecast inputs as primary drivers. This work provides a transferable framework for interpretable RL in energy-critical industrial processes and lays groundwork for safer real-world deployment.

Abstract

This paper presents a trustworthy reinforcement learning approach for the control of industrial compressed air systems. We develop a framework that enables safe and energy-efficient operation under realistic boundary conditions and introduce a multi-level explainability pipeline combining input perturbation tests, gradient-based sensitivity analysis, and SHAP (SHapley Additive exPlanations) feature attribution. An empirical evaluation across multiple compressor configurations shows that the learned policy is physically plausible, anticipates future demand, and consistently respects system boundaries. Compared to the installed industrial controller, the proposed approach reduces unnecessary overpressure and achieves energy savings of approximately 4% without relying on explicit physics models. The results further indicate that system pressure and forecast information dominate policy decisions, while compressor-level inputs play a secondary role. Overall, the combination of efficiency gains, predictive behavior, and transparent validation supports the trustworthy deployment of reinforcement learning in industrial energy systems.
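
To make the feature-attribution step of the pipeline concrete, the following is a minimal sketch of exact Shapley values for a three-input toy policy. Everything in it is an assumption for illustration: `toy_policy`, its coefficients, the input names, and the baseline are hypothetical and are not the paper's model or data, and the exact enumeration shown here is the textbook definition rather than the approximation a SHAP library would use on a neural network policy.

```python
from itertools import combinations
from math import factorial


def toy_policy(pressure_bar, forecast, compressor_level):
    """Hypothetical stand-in for a trained policy (not the paper's model).

    Returns a requested output level clipped to [0, 1].
    """
    raw = 0.6 * (8.0 - pressure_bar) + 0.5 * forecast + 0.05 * compressor_level
    return min(1.0, max(0.0, raw))


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        args = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(*args)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Assumed operating point and baseline (illustrative values only)
x = (7.0, 0.6, 1.0)         # pressure [bar], normalized forecast, compressor level
baseline = (7.5, 0.2, 0.0)

phi = exact_shapley(toy_policy, x, baseline)
for name, contribution in zip(("pressure", "forecast", "compressor_level"), phi):
    print(f"{name}: {contribution:+.3f}")
```

By construction, the attributions sum to the difference between the policy output at `x` and at the baseline. In this toy setup pressure and forecast receive the largest shares only because the coefficients were chosen that way; the sketch illustrates the mechanics, not the paper's finding.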

Paper Structure

This paper contains 27 sections, 11 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Comparison of convergence speed and stability for Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) in the compressor control environment. The PPO algorithm exhibits substantially faster and more stable convergence, while SAC fails to reach comparable performance even after hyperparameter tuning.
  • Figure 2: Comparison of system pressure and demand for a representative day based on real factory data, including the compressor-level supplied volumetric flow rates (control schedule). The RL-based optimized control maintains a lower average pressure than the installed industrial controller, while consistently adhering to the 8 bar system limit. The additional compressor-level traces reveal how the RL policy allocates supply across individual compressors over time and avoids unnecessary overproduction that would increase pressure. This reduction in average pressure results in about 4% energy savings.
  • Figure 3: Input perturbation testing across different configurations. Each curve shows the compressor output level selected by the agent in response to varying forecasted volumetric flow rates, under fixed pressure conditions.
  • Figure 4: Direct comparison of normalized mean absolute SHAP values (feature attribution) and gradient-based sensitivity scores (saliency) across all experimental configurations. Each bar plot visualizes both methods for each feature, allowing assessment of consistency in feature importance ranking. System pressure and the first forecasted demand consistently dominate in both approaches, supporting the physical plausibility and interpretability of the learned policy.
  • Figure 5: SHAP summary plot for the 3C3F configuration after only five training iterations. At this early checkpoint, the feature importance profile deviates markedly from the well-trained agent: compressor level 1 is incorrectly identified as the most relevant input, while pressure and forecast features play a subordinate role. This illustrates that feature attributions are only meaningful if the agent has learned plausible control logic.
  • ...and 3 more figures
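
The input perturbation test described in the Figure 3 caption can be sketched as a simple sweep: hold the pressure fixed, vary the forecasted volumetric flow rate, and record the output level the agent selects. The sketch below is a hedged illustration only. `toy_policy`, its thresholds, the flow-rate units, and the pressure values are hypothetical placeholders; in practice the trained agent's deterministic action function would take the policy's place.

```python
import math


def toy_policy(pressure_bar, forecast_flow):
    """Hypothetical policy returning a discrete compressor output level 0..3.

    Not the paper's agent: a higher forecast or a lower pressure raises the level.
    """
    demand_score = forecast_flow / 10.0 + (7.5 - pressure_bar)
    return max(0, min(3, math.floor(demand_score)))


def perturbation_sweep(policy, pressure_bar, forecasts):
    """Evaluate the policy over a range of forecasts at one fixed pressure."""
    return [policy(pressure_bar, q) for q in forecasts]


def is_non_decreasing(values):
    """Plausibility check: more forecasted demand should never lower the output."""
    return all(a <= b for a, b in zip(values, values[1:]))


forecasts = list(range(0, 45, 5))   # assumed flow-rate grid (arbitrary units)
pressures = [6.5, 7.0, 7.5]         # assumed fixed pressures in bar

curves = {p: perturbation_sweep(toy_policy, p, forecasts) for p in pressures}
for p, levels in curves.items():
    print(f"pressure {p} bar -> levels {levels}, monotone: {is_non_decreasing(levels)}")
```

Each entry in `curves` corresponds to one response curve of the kind the caption describes. A monotonicity check like `is_non_decreasing` is one possible way to turn the visual plausibility inspection into an automated test.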