DIAR: Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation

Jaehyun Park; Yunho Kim; Sejin Kim; Byung-Jun Lee; Sundong Kim

TL;DR

This work introduces Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation (DIAR), an offline reinforcement learning framework that addresses Q-value overestimation by combining Q-network learning with a value function guided by a diffusion model.

Abstract

We propose a novel offline reinforcement learning (offline RL) approach, introducing the Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation (DIAR) framework. We address two key challenges in offline RL: out-of-distribution samples and long-horizon problems. We leverage diffusion models to learn state-action sequence distributions and incorporate value functions for more balanced and adaptive decision-making. DIAR introduces an Adaptive Revaluation mechanism that dynamically adjusts decision lengths by comparing current and future state values, enabling flexible long-term decision-making. Furthermore, we address Q-value overestimation by combining Q-network learning with a value function guided by a diffusion model. The diffusion model generates diverse latent trajectories, enhancing policy robustness and generalization. As demonstrated in tasks like Maze2D, AntMaze, and Kitchen, DIAR consistently outperforms state-of-the-art algorithms in long-horizon, sparse-reward environments.
