Exploration and Adaptation in Non-Stationary Tasks with Diffusion Policies

Gunbir Singh Baveja

TL;DR

The paper investigates applying Diffusion Policy to non-stationary, vision-based RL scenarios, where task dynamics and objectives evolve over time. It integrates a conditional diffusion model with a visual encoder to produce contextually appropriate action sequences in a closed loop, enabling on-the-fly adaptation from high-dimensional observations. Across Procgen (CoinRun, Maze) and PointMaze, the diffusion approach generally outperforms PPO and DQN in mean and maximum rewards with lower variability, though it incurs high computational costs and faces limitations under extreme non-stationarity. Ablation studies reveal modest gains from deeper encoders and adaptive noise schedules, while the discussion highlights opportunities for autoregressive comparisons, transformer encoders, and broader stability benchmarks to further enhance robustness and efficiency.

Abstract

This paper investigates the application of Diffusion Policy in non-stationary, vision-based RL settings, specifically targeting environments where task dynamics and objectives evolve over time. Our work is grounded in practical challenges encountered in dynamic real-world scenarios such as robotic assembly lines and autonomous navigation, where agents must adapt control strategies from high-dimensional visual inputs. We apply Diffusion Policy, which leverages iterative stochastic denoising to refine latent action representations, to benchmark environments including Procgen and PointMaze. Our experiments demonstrate that, despite increased computational demands, Diffusion Policy consistently outperforms standard RL methods such as PPO and DQN, achieving higher mean and maximum rewards with reduced variability. These findings underscore the approach's capability to generate coherent, contextually relevant action sequences in continuously shifting conditions, while also highlighting areas for further improvement in handling extreme non-stationarity.

Paper Structure

This paper contains 21 sections, 1 equation, 1 figure, and 2 tables.

Figures (1)

  • Figure 1: Unified Representation and Action Decoding in Diffusion Policy. Visual observations and state information (e.g., goal and achievement metrics) are encoded into a unified feature representation through an observation encoder and state encoder. This representation forms the input to a diffusion-based process, where noise is injected, denoised, and refined to decode action sequences. The framework effectively combines multimodal inputs for robust policy generation in complex navigation tasks.
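
The pipeline in the Figure 1 caption (encode the observation and state, fuse them, then denoise an action sequence) can be illustrated with a minimal sketch. It is not the paper's implementation: the one-layer `encode` function, the random untrained weights, the linear noise schedule, and every dimension below are assumptions chosen for illustration, and the sampler is standard DDPM ancestral sampling rather than anything confirmed by the source.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return (betas, alphas, alpha_bars) for a linear DDPM noise schedule."""
    step = (beta_end - beta_start) / (num_steps - 1)
    betas = [beta_start + i * step for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def encode(vec, weights):
    """Hypothetical one-layer encoder, tanh(W @ vec); a stand-in for a real network."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]


def predict_noise(actions, t, num_steps, cond, weights):
    """Untrained placeholder for a noise predictor eps(a_t, t, cond)."""
    return encode(actions + cond + [t / num_steps], weights)


def sample_actions(cond, eps_weights, horizon, action_dim, num_steps, rng):
    """DDPM ancestral sampling of a flat action vector, conditioned on `cond`."""
    betas, alphas, alpha_bars = linear_beta_schedule(num_steps)
    a = [rng.gauss(0.0, 1.0) for _ in range(horizon * action_dim)]  # pure noise
    for t in reversed(range(num_steps)):
        eps = predict_noise(a, t, num_steps, cond, eps_weights)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the last step
        a = [(x - coef * e) / math.sqrt(alphas[t]) + sigma * rng.gauss(0.0, 1.0)
             for x, e in zip(a, eps)]
    # Reshape the flat vector into `horizon` steps of `action_dim` values each.
    return [a[i * action_dim:(i + 1) * action_dim] for i in range(horizon)]


def random_matrix(rows, cols, rng, scale=0.1):
    """Random weights standing in for trained parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def demo(seed=0, horizon=4, action_dim=2, num_steps=20):
    """Run encode -> fuse -> denoise once on synthetic inputs (all sizes assumed)."""
    rng = random.Random(seed)
    image = [rng.random() for _ in range(48)]   # flattened 4x4x3 "observation"
    state = [rng.random() for _ in range(4)]    # e.g. goal and achievement metrics
    obs_feat = encode(image, random_matrix(8, 48, rng))
    state_feat = encode(state, random_matrix(4, 4, rng))
    cond = obs_feat + state_feat                # fused representation (concatenation)
    n = horizon * action_dim
    eps_weights = random_matrix(n, n + len(cond) + 1, rng)
    return sample_actions(cond, eps_weights, horizon, action_dim, num_steps, rng)
```

Calling `demo()` returns a list of `horizon` action vectors. In a closed-loop setting, one would typically execute only the first few of these, re-encode the newest observation, and sample again, which is how such a policy could react to changing dynamics; with the untrained weights used here the output is meaningless as a control signal and only demonstrates the data flow.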