Curiosity-Diffuser: Curiosity Guide Diffusion Models for Reliability
Zihao Liu, Xing Liu, Yizhai Zhang, Zhengxiong Liu, Panfeng Huang
TL;DR
The paper tackles instability and hallucinations in neural imitation learning by introducing Curiosity-Diffuser, which guides a conditional diffusion model with an RND-based reliability signal to keep generated trajectories close to the training data. By decoupling policy learning from guidance and combining curiosity-based and reward-based cues, the method improves reliability and task performance in offline diffusion settings, particularly when demonstrations are plentiful and of high quality. A novel K-Sim metric provides a practical, efficient way to quantify how closely generated behaviors align with the training distribution. The approach shows promise on MuJoCo and AntMaze tasks and points toward safer, more reliable robotic imitation learning in real-world deployments.
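The TL;DR does not spell out how K-Sim is computed. As a rough illustration only, the sketch below assumes a k-nearest-neighbor reading: for each generated state, take the mean Euclidean distance to its `k` nearest dataset states and map it to a score in (0, 1] with `exp(-d / bandwidth)`. The function name `knn_similarity`, the exponential mapping, and the `k` and `bandwidth` parameters are hypothetical choices, not the paper's definition.

```python
import numpy as np


def knn_similarity(generated, dataset, k=5, bandwidth=1.0):
    """Hypothetical k-NN similarity score in (0, 1]; higher means closer to the data.

    This is an illustrative stand-in, not the paper's K-Sim definition.
    """
    # Pairwise Euclidean distances, shape (n_generated, n_dataset).
    dists = np.linalg.norm(generated[:, None, :] - dataset[None, :, :], axis=-1)
    # The k smallest distances per row (np.partition leaves them unordered,
    # which is fine because only their mean is used).
    knn = np.partition(dists, k - 1, axis=1)[:, :k]
    # Assumed mapping from mean k-NN distance to a similarity score.
    return float(np.exp(-knn.mean(axis=1) / bandwidth).mean())


rng = np.random.default_rng(0)
dataset = rng.normal(size=(500, 3))                    # stand-in "training" states
near = dataset[:50] + 0.05 * rng.normal(size=(50, 3))  # behaviors close to the data
far = near + 5.0                                       # behaviors far from the data

print(knn_similarity(near, dataset), knn_similarity(far, dataset))
```

The pairwise-distance matrix costs O(n_generated × n_dataset) memory; for large datasets, a KD-tree such as `scipy.spatial.cKDTree(dataset).query(generated, k=k)` returns the same k nearest distances more efficiently.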
Abstract
One of the bottlenecks in robotic intelligence is the instability of neural network models, which, unlike control models, lack a well-defined domain of convergence and stability guarantees. This poses risks when such models are applied in the physical world. Specifically, imitation policies based on neural networks may hallucinate, producing inaccurate behaviors that compromise the safety of real-world applications. To address this issue, this paper proposes Curiosity-Diffuser, which guides a conditional diffusion model to generate trajectories with lower curiosity, thereby improving the reliability of the policy. The core idea is to use a Random Network Distillation (RND) curiosity module to assess whether the model's behavior aligns with the training data, and then to minimize curiosity through classifier-guided diffusion, reducing overgeneralization during inference. Additionally, we propose a computationally efficient metric for evaluating policy reliability, which measures the similarity between the generated behaviors and the training dataset, to facilitate research on reliability learning. Finally, simulations verify the effectiveness and applicability of the proposed method across a variety of scenarios, showing that Curiosity-Diffuser significantly improves task performance and produces behaviors that are more similar to the training data. The code for this work is available at: github.com/CarlDegio/Curiosity-Diffuser
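The curiosity-minimizing guidance described in the abstract can be sketched in a toy setting. This is a minimal sketch under several assumptions that are not from the paper: the RND target is a fixed random `tanh` projection, the predictor is a linear model fit by least squares on the training data (a real RND module trains a neural predictor by gradient descent), and the curiosity gradient is written out analytically instead of via autodiff. The guidance step uses the standard classifier-guidance shift of the reverse-step mean, μ + αΣ∇J(μ), with J taken as negative curiosity. `LinearRND`, `guided_mean`, and the identity "denoiser" in the demo are hypothetical stand-ins for the trained components.

```python
import numpy as np


class LinearRND:
    """Toy RND curiosity module (hypothetical; not the paper's architecture)."""

    def __init__(self, data, n_features=16, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed, randomly initialized target network: t(x) = tanh(x @ A).
        self.A = rng.normal(size=(data.shape[1], n_features))
        # Predictor p(x) = x @ W[:-1] + W[-1], fit to the target on training data only.
        X1 = np.hstack([data, np.ones((len(data), 1))])
        self.W, *_ = np.linalg.lstsq(X1, np.tanh(data @ self.A), rcond=None)

    def _residual(self, x):
        t = np.tanh(x @ self.A)
        p = x @ self.W[:-1] + self.W[-1]
        return p - t, t

    def curiosity(self, x):
        # Prediction error ||p(x) - t(x)||^2 per row: low where training data was seen.
        r, _ = self._residual(x)
        return np.sum(r ** 2, axis=-1)

    def curiosity_grad(self, x):
        # Analytic gradient of the curiosity w.r.t. x, one row per sample:
        # 2 * (W r - A ((1 - t^2) * r)), written in row-vector form.
        r, t = self._residual(x)
        return 2.0 * (r @ self.W[:-1].T - ((1.0 - t ** 2) * r) @ self.A.T)


def guided_mean(mu, sigma2, scale, rnd):
    # Classifier-guidance shift mu + scale * Sigma * grad J(mu) with J = -curiosity,
    # so the mean moves toward low-curiosity (in-distribution) regions.
    return mu - scale * sigma2 * rnd.curiosity_grad(mu)


rng = np.random.default_rng(42)
data = rng.normal(scale=0.5, size=(2000, 4))     # stand-in training states
rnd = LinearRND(data, seed=1)
far = 3.0 + rng.normal(scale=0.5, size=(64, 4))  # out-of-distribution samples

x = far.copy()
for _ in range(200):
    # Identity "denoiser" stand-in: only the guidance term moves x in this demo.
    x = guided_mean(x, sigma2=1.0, scale=0.01, rnd=rnd)

print(rnd.curiosity(data).mean(), rnd.curiosity(far).mean(), rnd.curiosity(x).mean())
```

Because the linear predictor only matches the saturating target where it was fit, curiosity grows away from the training data, and the guidance term pulls samples back toward regions the data covers, which is the intuition behind reducing overgeneralization. In a full sampler, `mu` would come from the trained denoiser at each reverse step, and fresh Gaussian noise would be added after the shift.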
