Dynamic Rank Adjustment in Diffusion Policies for Efficient and Flexible Training

Xiatao Sun, Shuo Yang, Yinxing Chen, Francis Fan, Yiyan Liang, Daniel Rakita

TL;DR

DRIFT tackles the training inefficiency of diffusion policies trained from scratch by introducing a dynamic, SVD-based rank modulation that splits weight matrices into trainable and frozen subspaces, paired with a rank scheduler to progressively adjust the number of trainable ranks. The DRIFT-DAgger algorithm combines offline bootstrapping with online expert interventions, achieving improved sample efficiency and faster training while maintaining competitive performance. Across simulations and real-world tasks, the approach reduces batch training time and preserves task success, with sigmoid-based rank scheduling offering the best balance between speed and accuracy. These results suggest a practical path to deploying large diffusion-policy models in robotics by leveraging intrinsic low-rank structures without reinitializing adapters or incurring prohibitive computation.

Abstract

Diffusion policies trained via offline behavioral cloning have recently gained traction in robotic motion generation. While effective, these policies typically require a large number of trainable parameters. This model size affords powerful representations but also incurs high computational cost during training. Ideally, it would be beneficial to dynamically adjust the trainable portion as needed, balancing representational power with computational efficiency. For example, while overparameterization enables diffusion policies to capture complex robotic behaviors via offline behavioral cloning, the increased computational demand makes online interactive imitation learning impractical due to longer training time. To address this challenge, we present a framework, called DRIFT, that uses the Singular Value Decomposition to enable dynamic rank adjustment during diffusion policy training. We implement and demonstrate the benefits of this framework in DRIFT-DAgger, an imitation learning algorithm that can seamlessly slide between an offline bootstrapping phase and an online interactive phase. We perform extensive experiments to better understand the proposed framework, and demonstrate that DRIFT-DAgger achieves improved sample efficiency and faster training with minimal impact on model performance. The project website is available at: https://apollo-lab-yale.github.io/25-RSS-DRIFT-website/.
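The SVD-based split described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function name `split_by_rank` is hypothetical, and the choice of treating the top-`r` singular directions as the trainable part (with the remainder frozen) is an assumption made for illustration.

```python
import numpy as np

def split_by_rank(weight, trainable_rank):
    """Split a weight matrix into a rank-r part and a residual via SVD.

    Assumption for this sketch: the top-r singular directions form the
    trainable subspace and the remaining directions are kept frozen.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    r = trainable_rank
    # Top-r singular directions: the part that would keep receiving gradients.
    trainable = (u[:, :r] * s[:r]) @ vt[:r, :]
    # Remaining directions: held fixed in this sketch.
    frozen = (u[:, r:] * s[r:]) @ vt[r:, :]
    return trainable, frozen

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 6))
trainable, frozen = split_by_rank(w, trainable_rank=2)
# The two parts always sum back to the original matrix.
print(np.allclose(trainable + frozen, w))
```

Because both parts come from the same decomposition, changing `trainable_rank` only moves the boundary between them; nothing has to be reinitialized, which is the property the TL;DR contrasts with adapter-based approaches.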

Paper Structure

This paper contains 25 sections, 13 equations, 11 figures, 8 tables, 1 algorithm.

Figures (11)

  • Figure 1: This paper explores balancing overparameterization and training efficiency in diffusion policies by dynamically adjusting the frozen and trainable portions of weight matrices. In the top section of the figure, the learner, trained offline via behavioral cloning with full-rank training, attempts to insert the upper drawer box into the container but fails due to collisions with both the container and the lower drawer box. In the bottom section, after efficient online adaptation with reduced trainable ranks, the learner improves its performance and successfully completes the task.
  • Figure 2: DRIFT-DAgger combines offline policy bootstrapping with online adaptation. The gating function, following the nomenclature of HG-DAgger [kelly_hg_dagger], refers to expert intervention and demonstration when the learner reaches undesirable states during online adaptation. Compared to BC, DRIFT-DAgger reduces the need for expert labeling while maintaining high performance. The trainable rank reduction accelerates batch training, improving the usability and practicality of online adaptation for large models without sacrificing performance.
  • Figure 3: Experimental results of DRIFT-DAgger with different decay functions for the rank scheduler. We use HG-DAgger (HG) as a baseline for comparison.
  • Figure 4: Experimental results of DRIFT-DAgger with different terminal ranks $r_{\text{min}}$.
  • Figure 5: The upper row shows the simulation scenarios from the robosuite and Manipulation with Viewpoint Selection (MVS) tasks. The lower row plots success rate against the number of expert labels. HG, D(L), D(LR), and D(RR) denote, respectively, HG-DAgger; DRIFT-DAgger with LoRA adapters that are instantiated with rank $r_{\text{min}}$ only when switching to online mode; DRIFT-DAgger with LoRA and the rank scheduler; and DRIFT-DAgger with rank modulation and the rank scheduler.
  • ...and 6 more figures
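
The captions for Figures 3 and 4 refer to decay functions for the rank scheduler and to a terminal rank $r_{\text{min}}$. As a rough illustration only, a sigmoid-shaped schedule could look like the sketch below. The function name `sigmoid_rank`, the `steepness` parameter, and the rescaling that pins the endpoints to `r_max` and `r_min` are all assumptions; this excerpt does not give the paper's exact formula.

```python
import math

def sigmoid_rank(step, total_steps, r_max, r_min, steepness=10.0):
    """Number of trainable ranks at a training step (hypothetical schedule).

    Decays from r_max at step 0 to r_min at total_steps along a sigmoid
    curve. The steepness value and endpoint rescaling are assumptions.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)

    def gate(p):
        # Decreasing sigmoid centered at the midpoint of training.
        return 1.0 / (1.0 + math.exp(steepness * (p - 0.5)))

    # Rescale so the curve is exactly 1 at progress 0 and 0 at progress 1.
    scaled = (gate(progress) - gate(1.0)) / (gate(0.0) - gate(1.0))
    return r_min + round((r_max - r_min) * scaled)

ranks = [sigmoid_rank(s, 100, r_max=256, r_min=64) for s in range(0, 101, 25)]
print(ranks)
```

A schedule of this shape keeps most ranks trainable early on, reduces them quickly around the midpoint, and then settles at the terminal rank, which is one plausible reading of why a sigmoid decay would trade off speed and accuracy differently from a linear or exponential one.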