H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps

Haoyi Niu; Tianying Ji; Bingqi Liu; Haocheng Zhao; Xiangyu Zhu; Jianying Zheng; Pengfei Huang; Guyue Zhou; Jianming Hu; Xianyuan Zhan

TL;DR

The paper addresses the challenge of transferring RL policies from imperfect simulators to real-world tasks when only limited offline data are available. It introduces H2O+, a dynamics-aware hybrid offline-and-online RL algorithm that blends value anchors learned from offline data with online, simulator-based learning through a mixed value update. H2O+ avoids the excessive conservatism of prior methods, does not require explicit dynamics-gap metrics, and is compatible with strong offline RL backbones. The approach is validated on MuJoCo benchmarks with engineered dynamics gaps and on a real-world wheeled robot platform, where it shows superior or robust performance compared to SAC, CQL, IQL, DARC, and H2O. These results suggest practical potential for cross-domain policy transfer in real-world robotics and control settings that lack high-fidelity simulators or large offline datasets.
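To make the idea of a mixed value update concrete, the sketch below computes a Bellman target that interpolates between a state value V(s') learned in-sample from offline data (the "anchor") and a SAC-style soft bootstrap Q(s', a') - alpha * log pi(a'|s') computed with the current policy. It is a generic illustration under stated assumptions, not the exact H2O+ update: the function name `mixed_value_target` and the fixed mixing weight `lam` are hypothetical, and how H2O+ actually weights the two terms is not specified here.

```python
import numpy as np


def mixed_value_target(reward, done, v_next_offline, q_next_policy, log_prob_next,
                       gamma=0.99, alpha=0.2, lam=0.5):
    """Hypothetical mixed Bellman target (illustration only, not the exact H2O+ rule).

    reward, done   : arrays of shape (batch,)
    v_next_offline : V(s') from a value function fit in-sample on offline data
    q_next_policy  : Q(s', a') with a' sampled from the current policy
    log_prob_next  : log pi(a' | s') for those sampled actions
    lam            : hypothetical mixing weight in [0, 1];
                     lam = 1 -> pure offline anchor, lam = 0 -> pure soft Q-learning
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    reward = np.asarray(reward, dtype=float)
    not_done = 1.0 - np.asarray(done, dtype=float)
    # SAC-style soft bootstrap from the online (simulator-driven) critic.
    soft_q = np.asarray(q_next_policy, dtype=float) - alpha * np.asarray(log_prob_next, dtype=float)
    # Convex combination of the offline value anchor and the online bootstrap.
    bootstrap = lam * np.asarray(v_next_offline, dtype=float) + (1.0 - lam) * soft_q
    # Terminal transitions (done = 1) do not bootstrap.
    return reward + gamma * not_done * bootstrap


# Usage: two transitions, the second one terminal.
targets = mixed_value_target(reward=[1.0, 0.0], done=[0.0, 1.0],
                             v_next_offline=[10.0, 10.0], q_next_policy=[12.0, 12.0],
                             log_prob_next=[0.0, 0.0], gamma=0.9, lam=0.5)
print(targets)  # approximately [10.9, 0.0]
```

In this toy form, a larger `lam` leans on the offline anchor, which is not exposed to simulator modeling errors but is limited by dataset coverage, while a smaller `lam` relies more on the simulator-driven critic.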

Abstract

Solving complex real-world tasks with reinforcement learning (RL) without high-fidelity simulation environments or large amounts of offline data can be quite challenging. Online RL agents trained in imperfect simulation environments can suffer from severe sim-to-real issues. Although offline RL approaches bypass the need for simulators, they often place demanding requirements on the size and quality of the offline datasets. The recently emerged hybrid offline-and-online RL paradigm provides an attractive framework that enables the joint use of limited offline data and imperfect simulators for transferable policy learning. In this paper, we develop a new algorithm, called H2O+, which offers great flexibility to bridge various choices of offline and online learning methods, while also accounting for dynamics gaps between the real and simulation environments. Through extensive simulation and real-world robotics experiments, we demonstrate superior performance and flexibility over advanced cross-domain online and offline RL algorithms.
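For contrast with H2O+, which the TL;DR says does not require an explicit dynamics-gap metric, the cross-domain baseline DARC quantifies the gap directly. DARC trains two domain classifiers, one on (s, a, s') and one on (s, a), and adds to each simulator reward the difference of their logits. The sketch below assumes both classifiers already output the probability that a sample came from the real domain; `darc_reward_correction` is a hypothetical helper name, and classifier training is omitted.

```python
import numpy as np


def _logit(p, eps=1e-6):
    # Clip to avoid log(0) when a classifier is over-confident.
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return np.log(p) - np.log1p(-p)


def darc_reward_correction(p_real_given_sas, p_real_given_sa):
    """DARC-style reward correction for simulator transitions.

    delta_r = [log p(real|s,a,s') - log p(sim|s,a,s')]
            - [log p(real|s,a)    - log p(sim|s,a)]
    Negative values mark transitions that are relatively less likely under
    the real dynamics than under the simulator.
    """
    return _logit(p_real_given_sas) - _logit(p_real_given_sa)


# Usage: the first transition looks "real", the second looks simulator-specific.
delta_r = darc_reward_correction([0.8, 0.2], [0.5, 0.5])
print(delta_r)  # approximately [1.386, -1.386], i.e. +/- log(4)
```

The correction is added to simulator rewards only; it illustrates the kind of explicit gap estimate that, per the TL;DR, H2O+ does not need.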

Paper Structure (24 sections, 8 equations, 5 figures, 2 tables)

Introduction
Related Work
Reinforcement Learning with Imperfect Simulators
Policy Learning by Combining Offline and Online RL
Preliminaries
Reinforcement Learning
Hybrid Offline-and-Online RL with Imperfect Simulator
Method
Separate Considerations for Offline and Online Learning
Dynamics-Aware Mixed Value Update
Discussion and Comparison with H2O
Experiments
Experimental Setups
Algorithmic implementation of H2O+
Simulation experiments
...and 9 more sections

Figures (5)

Figure 1: Original environments and some illustrations of the modified dynamics
Figure 2: Average returns for MuJoCo HalfCheetah and Walker2d tasks
Figure 3: The real-robot experiment results of the “standing still” (b) and “moving forward” (c) tasks
Figure 4: Comparison of H2O / H2O+ simulation data quality in real-world tasks (Top: standing still; Bottom: moving forward)
Figure 5: Different choices of offline RL backbone for state-value function learning
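Figure 5 compares offline RL backbones for learning the state-value function. As one concrete backbone, assumed here for illustration, IQL fits V(s) by expectile regression on Q(s, a) over dataset actions, using the asymmetric loss L_tau(u) = |tau - 1(u < 0)| * u^2 with u = Q(s, a) - V(s). The sketch implements that loss and, for a single state, finds the tau-expectile that minimizes it; `expectile_loss` and `fit_expectile` are hypothetical names, not taken from the paper's code.

```python
import numpy as np


def expectile_loss(q_values, v_value, tau=0.7):
    """IQL-style asymmetric squared loss, averaged over dataset actions."""
    diff = np.asarray(q_values, dtype=float) - v_value
    # Positive errors (Q above V) are weighted by tau, negative ones by 1 - tau.
    weight = np.where(diff < 0.0, 1.0 - tau, tau)
    return float(np.mean(weight * diff ** 2))


def fit_expectile(q_values, tau=0.7, iters=100):
    """Minimize expectile_loss over a scalar V by iteratively reweighted averaging."""
    q = np.asarray(q_values, dtype=float)
    v = float(q.mean())
    for _ in range(iters):
        # Same weighting rule as expectile_loss, evaluated at the current V.
        weight = np.where(q < v, 1.0 - tau, tau)
        v_new = float(np.sum(weight * q) / np.sum(weight))
        if np.isclose(v_new, v):
            break
        v = v_new
    return v


q_samples = [0.0, 1.0, 2.0, 3.0]  # Q(s, a) for four dataset actions at one state
print(fit_expectile(q_samples, tau=0.5))  # 1.5 (the mean)
print(fit_expectile(q_samples, tau=0.9))  # about 2.5 (shifted toward the best actions)
```

With tau = 0.5 the loss is symmetric and V recovers the mean Q; as tau approaches 1, V approaches the best in-dataset action value without querying out-of-distribution actions.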
