Curiosity-Diffuser: Curiosity Guide Diffusion Models for Reliability

Zihao Liu, Xing Liu, Yizhai Zhang, Zhengxiong Liu, Panfeng Huang

TL;DR

The paper tackles instability and hallucinations in neural imitation learning by introducing Curiosity-Diffuser, which guides a conditional diffusion model with an RND-based reliability signal so that generated trajectories stay close to the training data. By decoupling policy learning from guidance and combining curiosity-based and reward-based cues, the method improves reliability and task performance in offline diffusion settings, particularly when demonstrations are plentiful and of high quality. A novel K-Sim metric provides a practical, efficient way to quantify how closely inferred behaviors align with the training distribution. The approach shows promise in MuJoCo and AntMaze tasks and offers a path toward safer, more reliable robotic imitation learning in real-world deployments.

Abstract

One of the bottlenecks in robotic intelligence is the instability of neural network models, which, unlike control models, lack a well-defined convergence domain and stability. This leads to risks when applying intelligence in the physical world. Specifically, imitation policies based on neural networks may generate hallucinations, leading to inaccurate behaviors that impact the safety of real-world applications. To address this issue, this paper proposes the Curiosity-Diffuser, aimed at guiding the conditional diffusion model to generate trajectories with lower curiosity, thereby improving the reliability of the policy. The core idea is to use a Random Network Distillation (RND) curiosity module to assess whether the model's behavior aligns with the training data, and then to minimize curiosity through classifier-guided diffusion to reduce overgeneralization during inference. Additionally, we propose a computationally efficient metric for evaluating the reliability of the policy, measuring the similarity between the generated behaviors and the training dataset, to facilitate research on reliability learning. Finally, simulations verify the effectiveness and applicability of the proposed method in a variety of scenarios, showing that Curiosity-Diffuser significantly improves task performance and produces behaviors that are more similar to the training data. The code for this work is available at: github.com/CarlDegio/Curiosity-Diffuser
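
As a rough illustration of the RND idea described in the abstract, the Python sketch below scores novelty as the prediction error between a frozen random target network and a predictor fit to the training data. It is not the authors' implementation. The class name `RNDCuriosity`, the layer sizes, and the synthetic data are assumptions. So is the closed-form ridge fit of the predictor's output layer, which stands in for gradient-based training to keep the example short and deterministic.

```python
import numpy as np


def hidden(x, W, b):
    """One tanh hidden layer: tanh(xW + b)."""
    return np.tanh(x @ W + b)


class RNDCuriosity:
    """Toy RND module (hypothetical names and sizes, for illustration only)."""

    def __init__(self, in_dim, n_hidden=64, out_dim=8, ridge=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        # Target network: randomly initialised and never trained.
        self.Wt = rng.normal(size=(in_dim, n_hidden))
        self.bt = rng.normal(size=n_hidden)
        self.Vt = rng.normal(size=(n_hidden, out_dim)) / np.sqrt(n_hidden)
        # Predictor: its own random hidden layer plus a trainable output layer.
        self.Wp = rng.normal(size=(in_dim, n_hidden))
        self.bp = rng.normal(size=n_hidden)
        self.Vp = np.zeros((n_hidden, out_dim))
        self.ridge = ridge

    def target(self, x):
        return hidden(x, self.Wt, self.bt) @ self.Vt

    def fit(self, x):
        """Distil the target into the predictor on the training inputs.

        Assumption: ridge regression on the output layer replaces the
        gradient-descent training normally used for RND.
        """
        phi = hidden(x, self.Wp, self.bp)
        gram = phi.T @ phi + self.ridge * np.eye(phi.shape[1])
        self.Vp = np.linalg.solve(gram, phi.T @ self.target(x))
        return self

    def curiosity(self, x):
        """Squared prediction error per input; larger means more novel."""
        pred = hidden(x, self.Wp, self.bp) @ self.Vp
        return np.sum((pred - self.target(x)) ** 2, axis=-1)


# Synthetic stand-in for state-action vectors from an offline dataset.
rng = np.random.default_rng(1)
train = rng.normal(scale=0.3, size=(1000, 4))
model = RNDCuriosity(in_dim=4).fit(train)

in_dist = rng.normal(scale=0.3, size=(200, 4))  # same distribution as training
ood = in_dist + 3.0                             # shifted far from the data

print("mean curiosity, in-distribution:", model.curiosity(in_dist).mean())
print("mean curiosity, out-of-distribution:", model.curiosity(ood).mean())
```

The sketch computes only the score. In the method described above, it is the gradient of such a score that steers diffusion sampling toward low-curiosity outputs.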

Paper Structure

This paper contains 16 sections, 8 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: The motivation behind the Curiosity-Diffuser method is that imitation learning can produce hallucinations due to model inference instability when facing new states. We observed that curiosity mechanisms can evaluate the novelty of states. Therefore, based on the guided diffusion framework, we propose the Curiosity-Diffuser method, which combines curiosity-based guidance and reward-based guidance. Through experiments, we demonstrate that the introduction of the curiosity mechanism enhances the reliability of execution.
  • Figure 2: The overall framework of Curiosity-Diffuser is illustrated as follows. In part (a), we train the classifier for guidance. The dataset is used to train both the Prediction Network of the RND module and the Reward Predictor of the reward prediction module. The gradients from these two modules yield two types of inference signals: one generated in the vicinity of the training data, and the other driving the model toward task completion. In part (b), a diffusion model is trained to mimic the policy embedded within the dataset, enabling the recovery of executable plans from noise samples. During inference, the classifier’s gradient is combined with the diffusion process to steer the generation of a specific policy. Finally, in part (c), we evaluate the generated results using the computationally efficient $K_{sim}$ metric, which verifies that Curiosity-Diffuser produces data with fewer hallucinations. In the accompanying figure, arrow positions represent states while arrow directions indicate actions, emphasizing that the search for the closest state–action pair prioritizes the state.
  • Figure 3: (a) HalfCheetah, Walker2d, and Hopper from the Gym-MuJoCo tasks of the D4RL dataset. (b) AntMaze from the D4RL dataset.
  • Figure 4: Visualization of the running paths of the two methods in the AntMaze medium task. The start and end points are shown as blue and green points, respectively, and the trajectory is drawn as a curve shading from light red to dark red.
  • Figure 5: Ablation study of the guidance weight $\lambda$. The blue dots mark the sampled experimental settings and are connected by a blue curve to indicate the trend; the red dotted line marks the success rate obtained with reward guidance only.
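
The Figure 2 caption notes that the search for the closest state–action pair prioritizes the state. The exact definition of $K_{sim}$ is not given in this summary, so the Python sketch below shows only one hypothetical way to implement a state-first nearest-neighbor comparison. The function name `state_first_similarity`, the parameter `k`, the Euclidean distance, and the toy data are all assumptions, not the paper's metric.

```python
import numpy as np


def state_first_similarity(gen_states, gen_actions, data_states, data_actions, k=3):
    """Hypothetical state-first similarity score (lower = closer to the data).

    For each generated (state, action) pair:
      1. find the k dataset entries whose states are nearest to the generated state;
      2. among those entries only, return the smallest action distance.
    """
    # Pairwise Euclidean state distances, shape (n_generated, n_dataset).
    d_state = np.linalg.norm(
        gen_states[:, None, :] - data_states[None, :, :], axis=-1
    )
    k = min(k, data_states.shape[0])
    # Indices of the k nearest dataset states for each generated state.
    nearest = np.argpartition(d_state, k - 1, axis=1)[:, :k]
    # Actions stored with those neighbours, shape (n_generated, k, action_dim).
    neighbour_actions = data_actions[nearest]
    d_action = np.linalg.norm(
        neighbour_actions - gen_actions[:, None, :], axis=-1
    )
    return d_action.min(axis=1)


# Toy dataset in which the demonstrated action is always 2 * state.
data_states = np.linspace(0.0, 1.0, 11)[:, None]
data_actions = 2.0 * data_states

# Two generated pairs at the same state: one matches the data, one does not.
gen_states = np.array([[0.5], [0.5]])
gen_actions = np.array([[1.0], [3.0]])

scores = state_first_similarity(gen_states, gen_actions, data_states, data_actions)
print(scores)
```

Restricting the action comparison to state neighbors means that an action is judged only against what the dataset did in similar states, which is one reading of the state priority the caption mentions.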