Optimizing Sensor Redundancy in Sequential Decision-Making Problems

Jonas Nüßlein, Maximilian Zorn, Fabian Ritz, Jonas Stein, Gerhard Stenzel, Julian Schönberger, Thomas Gabor, Claudia Linnhoff-Popien

TL;DR

This work tackles sensor dropout robustness in reinforcement learning by optimizing backup sensor configurations under a cost cap $C$. It introduces a second-order approximation of the expected return $\mathbb{E}_{d,\pi,x}[R]$ and a QUBO formulation that balances performance gains with budget penalties, solved via Tabu Search. A momentum-based sampling strategy estimates pairwise dropout effects $\hat{R}_{(i,j)}$, and the method is validated across eight OpenAI Gym environments plus a Unity-based RobotArmGrasping domain, showing the approximation tracks real returns well and yields near-optimal sensor configurations in practice. The approach provides a practical, scalable pathway to plan cost-aware sensor redundancy for robust sequential decision-making in real-world RL deployments.

Abstract

Reinforcement Learning (RL) policies are designed to predict actions based on current observations to maximize cumulative future rewards. In real-world applications (i.e., non-simulated environments), sensors are essential for measuring the current state and providing the observations on which RL policies rely to make decisions. A significant challenge in deploying RL policies in real-world scenarios is handling sensor dropouts, which can result from hardware malfunctions, physical damage, or environmental factors like dust on a camera lens. A common strategy to mitigate this issue is the use of backup sensors, though this comes with added costs. This paper explores the optimization of backup sensor configurations to maximize expected returns while keeping costs below a specified threshold, C. Our approach uses a second-order approximation of expected returns and includes penalties for exceeding cost constraints. We then optimize this quadratic program using Tabu Search, a meta-heuristic algorithm. The approach is evaluated across eight OpenAI Gym environments and a custom Unity-based robotic environment (RobotArmGrasping). Empirical results demonstrate that our quadratic program effectively approximates real expected returns, facilitating the identification of optimal sensor configurations.
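
The optimization step described above can be illustrated with a minimal sketch. It follows the sign convention of the approximation $\mathbb{E}_{d,\pi,x}[R] \approx -x^T Q x + \hat{R}(d)$, so maximizing the return corresponds to minimizing $x^T Q x$. Everything else is an assumption made for illustration: the binary encoding of $x$, the names `objective` and `tabu_search`, the toy values of `Q`, `costs`, and `budget`, and the squared-overspend penalty, which is a simplification and not the penalty construction used in the paper. The search loop is a bare-bones single-bit-flip Tabu Search, not the solver the authors used.

```python
import numpy as np

def objective(x, Q, costs, budget, penalty):
    """Value to minimize: x^T Q x plus a penalty for exceeding the budget.

    The squared-overspend penalty is an illustrative assumption.
    """
    overspend = max(0.0, float(costs @ x) - budget)
    return float(x @ Q @ x) + penalty * overspend ** 2

def tabu_search(Q, costs, budget, penalty=10.0, iters=200, tenure=3):
    """Minimal single-bit-flip Tabu Search over binary vectors."""
    n = len(costs)
    x = np.zeros(n, dtype=int)              # start with no backup sensors
    best_x = x.copy()
    best_val = objective(x, Q, costs, budget, penalty)
    tabu_until = np.zeros(n, dtype=int)     # iteration until which a bit is tabu
    for it in range(1, iters + 1):
        move, move_val = None, np.inf
        for i in range(n):
            cand = x.copy()
            cand[i] ^= 1                    # flip bit i
            val = objective(cand, Q, costs, budget, penalty)
            allowed = tabu_until[i] < it or val < best_val  # aspiration rule
            if allowed and val < move_val:
                move, move_val = i, val
        if move is None:                    # every move is tabu this round
            continue
        x[move] ^= 1                        # take the best allowed move,
        tabu_until[move] = it + tenure      # even if it worsens the value
        if move_val < best_val:
            best_x, best_val = x.copy(), move_val
    return best_x, best_val

# Toy instance (hypothetical numbers): three candidate backup sensors.
Q = np.diag([-5.0, -3.0, -1.0])    # negative diagonal = gain in expected return
costs = np.array([2.0, 2.0, 1.0])
budget = 3.0
example_x, example_val = tabu_search(Q, costs, budget)
print(example_x, example_val)
```

In this toy instance the first two sensors cannot both be backed up without exceeding the budget, so the penalty steers the search toward a cheaper combination.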

Paper Structure

This paper contains 14 sections, 1 theorem, 13 equations, 6 figures, 2 tables, 2 algorithms.

Key Result

Proposition 1

The optimization problem described in $(2)$ is NP-hard.
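
One practical consequence of NP-hardness is that no polynomial-time exact algorithm is known, and exhaustive search grows as $2^n$ in the number of binary choices. The sketch below shows such an exhaustive baseline for small instances. It does not reproduce problem $(2)$ itself; it is paraphrased from the abstract (maximize a return estimate while keeping total cost within the budget) and assumes one yes/no backup decision per sensor. The function name `best_config_bruteforce`, the `value` callback, and the toy numbers are hypothetical.

```python
from itertools import product

def best_config_bruteforce(value, costs, budget):
    """Enumerate all 2^n binary configurations and keep the best feasible one.

    `value` maps a configuration tuple to an estimated return (assumed given).
    A configuration is treated as feasible when its total cost is <= budget.
    """
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=len(costs)):
        total_cost = sum(c * xi for c, xi in zip(costs, x))
        if total_cost > budget:
            continue                      # over budget: skip
        v = value(x)
        if v > best_val:
            best_x, best_val = x, v
    return best_x, best_val

# Toy instance (hypothetical numbers): additive gains per backup sensor.
gains = (5.0, 3.0, 1.0)
bf_x, bf_val = best_config_bruteforce(
    lambda x: sum(g * xi for g, xi in zip(gains, x)),
    costs=(2.0, 2.0, 1.0),
    budget=3.0,
)
print(bf_x, bf_val)
```

With three sensors this checks only 8 configurations, but each additional sensor doubles the count, which is why a heuristic solver becomes attractive for larger instances.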

Figures (6)

  • Figure 1: An illustration of our approach for optimizing the backup sensor configuration.
  • Figure 2: Proof-of-concept: this plot shows the real expected return and the approximated expected return $\mathbb{E}_{d,\pi,x}[R] \approx - \: x^T \: Q \: x + \hat{R}(d)$ when using a backup sensor configuration $x$. The configurations (x-axis) are sorted according to the real return.
  • Figure 3: Solution landscapes of all possible backup sensor configurations for Acrobot-v1 (upper), LunarLander-v2 (middle) and Hopper-v2 (lower).
  • Figure 4: Illustration of the random cube positioning in the RobotArmGrasping domain during training and evaluation. In this example, we traced 500 initial cube coordinates during the evaluation of a PPO agent. The color of the dots indicates whether the cube could successfully be grasped at these initial positions.
  • Figure 5: An illustration of our environment RobotArmGrasping.
  • ...and 1 more figures
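
As a reading aid for the approximation in the Figure 2 caption, the quadratic term can be expanded under the assumption that the entries of $x$ are binary, as the QUBO formulation suggests, so that $x_i^2 = x_i$. The index names $i$ and $j$ are introduced here for illustration only:

$$
x^T Q x \;=\; \sum_{i} Q_{ii}\, x_i \;+\; \sum_{i<j} \left(Q_{ij} + Q_{ji}\right) x_i x_j ,
\qquad x_i \in \{0, 1\}
$$

$$
\mathbb{E}_{d,\pi,x}[R] \;\approx\; \hat{R}(d) \;-\; \sum_{i} Q_{ii}\, x_i \;-\; \sum_{i<j} \left(Q_{ij} + Q_{ji}\right) x_i x_j
$$

Under this reading, the diagonal entries of $Q$ act as per-sensor terms and the off-diagonal entries as pairwise interaction terms, which is what makes the approximation second order.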

Theorems & Definitions (2)

  • Proposition 1
  • proof