Learning from Sparse Offline Datasets via Conservative Density Estimation

Zhepeng Cen; Zuxin Liu; Zitong Wang; Yihang Yao; Henry Lam; Ding Zhao

TL;DR

Conservative Density Estimation (CDE) is a novel training algorithm for offline RL that handles out-of-distribution extrapolation errors by explicitly imposing constraints on the state-action occupancy stationary distribution, which addresses the support mismatch issue in marginal importance sampling.

Abstract

Offline reinforcement learning (RL) offers a promising direction for learning policies from pre-collected datasets without requiring further interactions with the environment. However, existing methods struggle to handle out-of-distribution (OOD) extrapolation errors, especially in sparse reward or scarce data settings. In this paper, we propose a novel training algorithm called Conservative Density Estimation (CDE), which addresses this challenge by explicitly imposing constraints on the state-action occupancy stationary distribution. CDE overcomes the limitations of existing approaches, such as the stationary distribution correction method, by addressing the support mismatch issue in marginal importance sampling. Our method achieves state-of-the-art performance on the D4RL benchmark. Notably, CDE consistently outperforms baselines in challenging tasks with sparse rewards or insufficient data, demonstrating the advantages of our approach in addressing the extrapolation error problem in offline RL.
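
As a toy illustration of the support mismatch issue mentioned in the abstract, the following sketch uses a hypothetical four-element state-action space (all numbers are made up for illustration and do not come from the paper). It shows that an importance-weighted estimate over the dataset distribution silently drops any reward that the target occupancy collects where the dataset has zero mass. This is only a generic importance-sampling example, not the CDE algorithm itself.

```python
import numpy as np

# Hypothetical discrete state-action space with 4 entries (illustrative only).
d_data = np.array([0.5, 0.5, 0.0, 0.0])          # dataset occupancy: no mass on entries 2 and 3
d_target = np.array([0.25, 0.25, 0.25, 0.25])    # occupancy of some target policy
reward = np.array([0.0, 1.0, 2.0, 3.0])

# Ground-truth value under the target occupancy.
true_value = float(np.dot(d_target, reward))

# Marginal importance weights w = d_target / d_data are only defined on the data support.
support = d_data > 0
w = np.zeros_like(d_data)
w[support] = d_target[support] / d_data[support]

# Importance-sampling estimate: expectation under the dataset of w * reward.
is_value = float(np.dot(d_data, w * reward))

# Probability mass of the target occupancy that the dataset never covers.
missing_mass = float(d_target[~support].sum())

print(true_value, is_value, missing_mass)
```

In this toy setup the estimate misses everything the target occupancy earns outside the data support, which is the kind of mismatch that motivates constraining the occupancy on out-of-distribution state-actions.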

Paper Structure (41 sections, 8 theorems, 56 equations, 6 figures, 14 tables, 2 algorithms)

Introduction
Related Work
Method
Preliminaries
Conservative Density Estimation
Policy Evaluation and Improvement
Policy Extraction
Theoretical Analysis
Experiment
Results on D4RL sparse reward tasks
Comparative experiments in scarce data setting
Parameter studies
Conclusion
Supplementary Derivations and Proofs
Derivation of Eq. (\ref{eq:cde_unconstrained}) and Eq. (\ref{eq:IS estimation})
...and 26 more sections

Key Result

Proposition 1

Under assumption ass:f, the inner maximization problem $\max_{w\geq 0} {\mathcal{L}}'(w,v,\lambda)$ admits a closed-form solution expressed in terms of the regularized advantage function $\tilde{A}(s,a) := A(s,a) - \bm{1}\{(s,a) \in \text{supp}(\mu)\} \cdot\lambda(s,a)$, where $\bm{1}\{\cdot\}$ is the indicator function.
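
The definition of the regularized advantage above can be sketched numerically as follows. This is a minimal illustration, assuming the advantage $A(s,a)$, the multiplier $\lambda(s,a)$, and a boolean mask for membership in $\text{supp}(\mu)$ are already available as arrays; the function name `regularized_advantage` and the sample values are hypothetical and not taken from the paper.

```python
import numpy as np

def regularized_advantage(advantage, in_mu_support, lam):
    """Elementwise A~(s,a) = A(s,a) - 1{(s,a) in supp(mu)} * lambda(s,a).

    advantage:      array of A(s,a) values
    in_mu_support:  boolean array, True where (s,a) lies in supp(mu)
    lam:            array of lambda(s,a) values
    """
    advantage = np.asarray(advantage, dtype=float)
    indicator = np.asarray(in_mu_support, dtype=float)  # True -> 1.0, False -> 0.0
    lam = np.asarray(lam, dtype=float)
    return advantage - indicator * lam

# Hypothetical batch of three state-action pairs.
A = np.array([1.0, 0.5, -0.2])
in_mu = np.array([False, True, True])
lam = np.array([0.3, 0.3, 0.1])

A_tilde = regularized_advantage(A, in_mu, lam)
print(A_tilde)
```

Pairs outside $\text{supp}(\mu)$ keep their original advantage, while pairs inside it are penalized by $\lambda(s,a)$.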

Figures (6)

Figure 1: The results on sub-datasets with different dataset sizes.
Figure 2: Heatmaps of agents with different levels of conservatism in the maze2d-large environment. Yellow denotes high occupancy probability. The starting point of each trajectory may vary, but the destination (red star) is the same. A smaller $\tilde{\epsilon}$ indicates a more conservative policy. Yellow accumulation points other than the destination indicate that the agent is stuck in those regions.
Figure 3: Performance with different values of $\zeta$.
Figure 4: The training curves of CDE. The shaded region indicates the standard deviation of mean values across different seeds. Here we report the normalized reward scores for MuJoCo tasks measured by dense rewards instead of the success rate reported in previous tables.
Figure 5: The results on sub-datasets with different dataset sizes for MuJoCo medium-expert tasks.
...and 1 more figures

Theorems & Definitions (17)

Proposition 1
Proposition 2
Proposition 3: Upper bound of concentrability ratio on OOD state-actions
Theorem 1: Upper bound of function approximated concentrability ratio
Theorem 2: The upper bound of performance gap
proof
proof
proof
proof
proof
...and 7 more
