Habitizing Diffusion Planning for Efficient and Effective Decision Making

Haofei Lu, Yifei Shen, Dongsheng Li, Junliang Xing, Dongqi Han

TL;DR

This work tackles the slow inference of diffusion planning for decision making by introducing Habi, a two-stage framework that habitizes diffusion planners into fast habitual policies. Using a Bayesian behavior formulation, Habi learns a lightweight prior over actions and aligns it with the diffusion planner's posterior via an ELBO-based objective, augmented with adaptive KL weighting and a critic that supervises habitual decisions. The resulting Habitual Inference (HI) policy runs at over 800 Hz on a laptop CPU while achieving competitive or superior performance on the D4RL offline RL benchmarks. The paper also analyzes prior/posterior alignment and action-sampling effects, and visualizes the resulting action distributions. By connecting cognitively inspired habit formation with engineering efficiency, the approach lets diffusion-based decision making run at real-time frequencies on modest hardware without sacrificing performance.

Abstract

Diffusion models have shown great promise in decision-making, an approach known as diffusion planning. However, their slow inference limits their potential for broader real-world applications. Here, we introduce Habi, a general framework that transforms powerful but slow diffusion planning models into fast decision-making models. Habi mimics the cognitive process in the brain by which costly goal-directed behavior gradually transitions to efficient habitual behavior through repeated practice. Even on a laptop CPU, the habitized model achieves an average decision-making frequency of 800+ Hz (orders of magnitude faster than previous diffusion planners) on the standard offline reinforcement learning benchmark D4RL, while maintaining comparable or even higher performance than its corresponding diffusion planner. Our work offers a fresh perspective on leveraging powerful diffusion models for real-world decision-making tasks. We also provide robust evaluations and analysis, offering insights from both biological and engineering perspectives for efficient and effective decision-making.

Paper Structure

This paper contains 34 sections, 25 equations, 13 figures, 5 tables.

Figures (13)

  • Figure 1: Performance vs. Frequency. Performance is normalized across MuJoCo, AntMaze, and Kitchen tasks from D4RL. Decision frequency (Hz) is measured on a laptop CPU (Apple M2, MacBook). Habitual Inference (HI), a lightweight model generated by our Habi, achieves an optimal balance between performance and speed. See Table \ref{tab: main-table} for more results.
  • Figure 2: An illustrative example of the process of habitization in playing the Minesweeper game. With practice, one's decision-making relies less on deliberate goal-directed planning and more on context-dependent habitual behavior.
  • Figure 3: The diagram of Habi. (a) During the Habitization (Training) stage, Habi learns to reconstruct actions from plans generated by a diffusion planner, with the decision spaces of habits (prior) and planning (posterior) aligned via KL divergence in the latent space. Trainable parts include the Prior Encoder, Posterior Encoder, Decoder, and Critic. (b) During the Habitual Inference (HI) stage, only the lightweight prior encoder and latent decoder are required, enabling fast, high-quality habitual behaviors for decision-making.
  • Figure 4: Visualized results of Table \ref{tab: main-table}. HI consistently performs on par with the best models while being highly efficient.
  • Figure 5: Action distributions of Diffusion Planner (DV) and Habitual Inference (HI). Visualization of the action distributions from a state-of-the-art diffusion planner (DV, 2.8 Hz, top) \cite{lu2025what} and its corresponding Habitual Inference policy (HI, 1532.6 Hz, middle) generated by our Habi framework. Here, HI shows a probabilistic generation capacity while roughly aligning with the action distribution from the diffusion planner (DV). More examples are deferred to Appendix \ref{appendix:more_visual}.
  • ...and 8 more figures
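
The habitization objective described in the Figure 3 caption (reconstruct the planner's action while aligning prior and posterior through a KL term in latent space) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes diagonal Gaussian latents and a squared-error reconstruction term, uses a fixed KL weight in place of the adaptive weighting the paper mentions, and omits the critic. The function names `gaussian_kl` and `habitization_loss` are hypothetical.

```python
import math


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians, summed over latent dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        # Closed-form KL for one dimension, with variances given as log-variances.
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def habitization_loss(action_pred, action_plan, posterior, prior, kl_weight=1.0):
    """Reconstruction error plus weighted KL between posterior and prior.

    Assumptions (not taken from the paper): squared-error reconstruction and a
    fixed `kl_weight`; `posterior` and `prior` are (mean, log-variance) pairs.
    """
    recon = sum((a - b) ** 2 for a, b in zip(action_pred, action_plan)) / len(action_plan)
    kl = gaussian_kl(posterior[0], posterior[1], prior[0], prior[1])
    return recon + kl_weight * kl, recon, kl


# Toy example: a 2-D latent and a 2-D action.
posterior = ([0.5, -0.2], [0.0, 0.0])  # encoder that sees the planner's output
prior = ([0.0, 0.0], [0.0, 0.0])       # lightweight encoder that sees only the state
loss, recon, kl = habitization_loss([0.1, 0.3], [0.0, 0.5], posterior, prior)
print(f"loss={loss:.3f} recon={recon:.3f} kl={kl:.3f}")
```

In this sketch, driving the KL term toward zero is what would let the prior encoder stand in for the posterior at inference time, so that only the prior encoder and decoder need to run, as in stage (b) of the caption.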