When Sensors Fail: Temporal Sequence Models for Robust PPO under Sensor Drift

Kevin Vogt-Lowell; Theodoros Tsiligkaridis; Rodney Lafuente-Mercado; Surabhi Ghatti; Shanghua Gao; Marinka Zitnik; Daniela Rus

TL;DR

Under a stochastic sensor failure process, the authors prove a high-probability bound on infinite-horizon reward degradation that quantifies how robustness depends on policy smoothness and failure persistence.

Abstract

Real-world reinforcement learning systems must operate under distributional drift in their observation streams, yet most policy architectures implicitly assume fully observed and noise-free states. We study robustness of Proximal Policy Optimization (PPO) under temporally persistent sensor failures that induce partial observability and representation shift. To respond to this drift, we augment PPO with temporal sequence models, including Transformers and State Space Models (SSMs), to enable policies to infer missing information from history and maintain performance. Under a stochastic sensor failure process, we prove a high-probability bound on infinite-horizon reward degradation that quantifies how robustness depends on policy smoothness and failure persistence. Empirically, on MuJoCo continuous-control benchmarks with severe sensor dropout, we show Transformer-based sequence policies substantially outperform MLP, RNN, and SSM baselines in robustness, maintaining high returns even when large fractions of sensors are unavailable. These results demonstrate that temporal sequence reasoning provides a principled and practical mechanism for reliable operation under observation drift caused by sensor unreliability.
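To make "temporally persistent sensor failures" concrete, the following is a minimal sketch, not the paper's failure model. It assumes each sensor follows an independent two-state Markov chain, and that a failed reading is zero-filled. The names `persistent_dropout_masks`, `apply_mask`, `p_fail`, and `p_recover` are hypothetical and introduced only for illustration.

```python
import numpy as np


def persistent_dropout_masks(n_steps, n_sensors, p_fail=0.1, p_recover=0.1, seed=0):
    """Sample a (n_steps, n_sensors) boolean array; True means the sensor is available.

    Assumption (not from the paper): each sensor is an independent two-state
    Markov chain, so failures persist for a random number of steps.
    """
    rng = np.random.default_rng(seed)
    available = np.ones(n_sensors, dtype=bool)  # all sensors start healthy
    masks = np.empty((n_steps, n_sensors), dtype=bool)
    for t in range(n_steps):
        u = rng.random(n_sensors)
        # A healthy sensor stays healthy with probability 1 - p_fail;
        # a failed sensor recovers with probability p_recover.
        available = np.where(available, u >= p_fail, u < p_recover)
        masks[t] = available
    return masks


def apply_mask(obs, mask):
    """Zero out unavailable readings (zero-filling is an illustrative choice)."""
    return np.where(mask, obs, 0.0)
```

Under this assumed chain, a small `p_recover` makes outages long-lived, which is the regime where a policy that only sees the current observation has the least to work with and a history-based policy has the most to gain.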

Paper Structure (30 sections, 3 theorems, 33 equations, 3 figures, 2 tables)

Introduction
Related Work
Sensor Failure Model
Sequence-based PPO Agents
Motivation.
Transformer-based PPO agent
History buffer.
Encoder.
Attention pooling.
RNN/SSM-based PPO Agent
Overview.
Generic recurrent encoder.
Theory
Loss variables.
Interpretation of the bound.
...and 15 more sections

Key Result

Theorem 5.6

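Assume the assumptions labeled assump:1:bounded through assump:5:indep hold. Fix $\delta\in(0,1)$. Then, with probability at least $1-\delta$, [displayed bound missing from this extract]. Moreover, the mean satisfies [displayed bound missing from this extract].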

Figures (3)

Figure 1: Sample PPO training curves on HalfCheetah-v4 under full (left) and 60% partial (right) observability. Lines represent median episodic return and shaded regions denote inter-quartile ranges across 8 random seeds. Training curves generated under partial observability rise more slowly and plateau at lower returns than those produced using fully observed states.
Figure 2: Evaluation episodic returns for PPO agents on MuJoCo environments under full (left) and 60% partial (right) observability, with task complexity roughly increasing from top to bottom. Each violin shows the distribution of pooled episodic returns from 100 episodes across 8 random seeds. Black markers denote the median with 95% bootstrapped CI. While all models suffer performance degradation under partial observability, the Transformer agent demonstrates greater robustness.
Figure 3: PPO training curves for Hopper-v4, Walker2d-v4, and Ant-v4 under full (left) and 60% partial (right) observability. Lines represent median episodic return and shaded regions denote inter-quartile ranges across 8 random seeds.
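The Figure 2 caption reports a median with a 95% bootstrapped CI over pooled episodic returns. The captions do not say which bootstrap variant was used, so the following is a sketch that assumes a plain percentile bootstrap. The function name `bootstrap_median_ci` and its parameters are hypothetical.

```python
import numpy as np


def bootstrap_median_ci(returns, n_boot=2000, level=0.95, seed=0):
    """Return (median, lower, upper) using a percentile bootstrap.

    Assumption: episodes are resampled with replacement as if independent;
    the paper's exact procedure is not given in the captions.
    """
    returns = np.asarray(returns, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row of idx is one bootstrap resample of the pooled episodes.
    idx = rng.integers(0, returns.size, size=(n_boot, returns.size))
    boot_medians = np.median(returns[idx], axis=1)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(boot_medians, [alpha, 1.0 - alpha])
    return float(np.median(returns)), float(lower), float(upper)
```

For a setup like the one in the caption, `returns` would be the pooled episodic returns for a single agent and environment, for example 100 episodes from each of 8 seeds.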

Theorems & Definitions (5)

Theorem 5.6: High-probability reward-degradation bound
Lemma A.1: Pointwise Wasserstein bound on the per-step loss
Lemma A.2
Proof: Proof of the main theorem (label thm:main)
Remark A.3: Signed version
