Robust Learning of Diffusion Models with Extremely Noisy Conditions

Xin Chen, Gillian Dobbie, Xinyu Wang, Feng Liu, Di Wang, Jingfeng Zhang

TL;DR

The paper tackles the challenge of extremely noisy conditioning in conditional diffusion models. It introduces pseudo conditions that are progressively refined via temporal ensembling and a Reverse-time Diffusion Condition (RDC) to strengthen memorization and conditioning under noise. Empirical results demonstrate state-of-the-art robustness on class-conditioned image generation and visuomotor policy tasks across diverse noise regimes. The approach offers practical robustness for real-world applications with unreliable observations and noisy labels.

Abstract

Conditional diffusion models gain generative controllability by incorporating external conditions. However, their performance degrades significantly under noisy conditions, such as corrupted labels in image generation or unreliable observations or states in control policy generation. This paper introduces a robust learning framework to address extremely noisy conditions in conditional diffusion models. We empirically demonstrate that existing noise-robust methods fail when the noise level is high. To overcome this, we propose learning pseudo conditions as surrogates for clean conditions and refining them progressively via temporal ensembling. Additionally, we develop a Reverse-time Diffusion Condition (RDC) technique, which diffuses pseudo conditions to reinforce the memorization effect and further facilitate the refinement of the pseudo conditions. Experimentally, our approach achieves state-of-the-art performance across a range of noise levels on both class-conditional image generation and visuomotor policy generation tasks. The code is available via the project page https://robustdiffusionpolicy.github.io
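The abstract names temporal ensembling as the mechanism for refining pseudo conditions. As a rough illustration only, the sketch below shows the generic temporal-ensembling update: an exponential moving average of per-sample predictions with startup bias correction. It is not the paper's implementation. The function name `temporal_ensemble_update`, the momentum `alpha = 0.9`, and the use of class-probability vectors as the pseudo condition are all assumptions made for this example.

```python
import numpy as np

def temporal_ensemble_update(ensemble, probs, epoch, alpha=0.9):
    """One generic temporal-ensembling step (hypothetical helper, not from the paper).

    ensemble: running accumulator Z, shape (n_samples, n_classes), initialised to zeros
    probs:    class probabilities predicted in the current epoch, same shape
    epoch:    1-based epoch counter, used for startup bias correction
    alpha:    EMA momentum (assumed value; the source does not state one)
    """
    # Exponential moving average of the predictions seen so far.
    ensemble = alpha * ensemble + (1.0 - alpha) * probs
    # Divide by (1 - alpha^epoch) to undo the bias from the zero initialisation.
    pseudo = ensemble / (1.0 - alpha ** epoch)
    return ensemble, pseudo

# Usage: refine pseudo conditions for two samples over a few epochs.
Z = np.zeros((2, 3))
rng = np.random.default_rng(0)
for t in range(1, 6):
    # Stand-in for the prediction head's output in epoch t (random for illustration).
    p = rng.dirichlet(np.ones(3), size=2)
    Z, pseudo_conditions = temporal_ensemble_update(Z, p, t)
```

Because each refined vector is a weighted average of probability vectors, it stays on the probability simplex, and averaging over epochs damps the epoch-to-epoch fluctuations of any single prediction.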

Paper Structure

This paper contains 27 sections, 18 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: (a) Structure of our robust diffusion model: a lightweight prediction head that predicts pseudo conditions $\hat{y}$ is added at the output of the U-Net encoder of the diffusion model, and temporal ensembling is then adopted to update the pseudo conditions. (b) Learning dynamics of conditional diffusion models on CIFAR-10 under $40\%$ symmetric noise. Y-axis: controllability (Top-1 ACC of generated images, 1k images/class, 10 classes); X-axis: training epochs. Generations are evaluated using a pretrained CIFAR-10 classifier (Top-1 ACC $92.89\%$, silver dash-dot line). We compare the pseudo condition (PC, orange curve) and PC with Reverse-time Diffusion Condition (RDC, green curve), both with early stopping (star markers), against TDSM (Na et al., ICLR 2024; blue curve) and the vanilla conditional diffusion model (Karras et al., NeurIPS 2022; gray curve).
  • Figure 2: Visualization of different types of diffusion dynamics on the condition. Columns correspond to diffusion timesteps from $t=0.0$ to $t=1.0$. (a) Standard conditional diffusion: $t$ from 0 to 1, $\mathbf{y}_t$ remains fixed. (b) Conditional diffusion with diffused condition: $t$ from 0 to 1, $\mathbf{y}_t$ gradually spreads to a Gaussian distribution. (c) Conditional diffusion with reverse-time diffusion condition (RDC): $t$ from 0 to 1, $\mathbf{y}_t$ evolves toward the original distribution of $\mathbf{y}$.
  • Figure 3: 2-D toy case. Comparison of robust condition learning methods on 2-D synthetic data with four classes (red for class 1, blue for class 2, green for class 3, yellow for class 4). From left to right, three methods are trained on and sample from 2-D data with symmetric noise at noise levels of $20\%$, $40\%$, $60\%$, and $80\%$.
  • Figure 4: Results on Push-T with $80\%$ camera distortion. Each row shows one policy with nine key frames sampled at equal intervals.
  • Figure 5: Example visualizations of the noisy Push-T dataset. Each subfigure shows the effect of condition noise: the left image is the original image observation, and the right image is the corresponding version with added camera distortion.
  • ...and 2 more figures
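The captions for Figures 1 and 3 refer to symmetric noise at rates such as $40\%$. As a minimal sketch of what such corruption usually means for class labels, the example below flips each label with probability `noise_rate` to a different class chosen uniformly at random. The helper name `add_symmetric_noise` is hypothetical, and the convention used here (flipping only to other classes, rather than resampling over all classes) is an assumption; the source does not say which variant the paper uses.

```python
import numpy as np

def add_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Corrupt integer labels with symmetric noise (hypothetical helper).

    Each label is flipped with probability noise_rate to one of the other
    num_classes - 1 classes, chosen uniformly. This "other classes only"
    convention is an assumption, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()
    # Decide independently for every sample whether its label is corrupted.
    flip = rng.random(labels.shape[0]) < noise_rate
    # An offset in 1..num_classes-1 guarantees the new label differs from the old one.
    offsets = rng.integers(1, num_classes, size=int(flip.sum()))
    noisy[flip] = (labels[flip] + offsets) % num_classes
    return noisy

# Usage: a CIFAR-10-sized label set (10 classes) with 40% symmetric noise.
clean = np.repeat(np.arange(10), 1000)
noisy = add_symmetric_noise(clean, noise_rate=0.4, num_classes=10)
observed_rate = float((noisy != clean).mean())
```

Under this convention, `observed_rate` concentrates around the requested `noise_rate`, since every flipped label is guaranteed to change.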