Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion

An Zhao, Shengyuan Zhang, Ling Yang, Zejian Li, Jiale Wu, Haoran Xu, AnYang Wei, Perry Pengyun GU, Lingyun Sun

TL;DR

This work tackles slow sampling in diffusion-based LiDAR scene completion by introducing Distillation-DPO, a teacher–student framework that uses preference data pairs to distill a high-quality, multi-step teacher into a lightweight student. Training uses on-policy preference signals: the student's own completed scenes are ranked into winning and losing samples, and the student is optimized through the difference between its score function and that of a pre-trained teacher across noised timesteps. On SemanticKITTI, the method achieves more than a 5x speedup with improved Chamfer Distance and Jensen-Shannon Divergence compared to the state-of-the-art LiDiff, while maintaining competitive Earth Mover’s Distance. The approach combines score distillation, direct preference optimization, and LiDAR-specific evaluation metrics, offering a practical, preference-aligned route to fast, high-quality LiDAR scene completion. The code is publicly available. Potential extensions include applying the method to semantic scene completion (SSC) and further improving real-time performance while preserving quality.

Abstract

The application of diffusion models to 3D LiDAR scene completion is limited by their slow sampling speed. Score distillation accelerates diffusion sampling but degrades performance, while post-training with direct preference optimization (DPO) boosts performance using preference data. This paper proposes Distillation-DPO, a novel diffusion distillation framework for LiDAR scene completion with preference alignment. First, the student model generates paired completion scenes from different initial noises. Second, using LiDAR scene evaluation metrics as the preference, we construct winning and losing sample pairs. This construction is reasonable because most LiDAR scene metrics are informative but non-differentiable, so they cannot be optimized directly. Third, Distillation-DPO optimizes the student model by exploiting the difference in score functions between the teacher and student models on the paired completion scenes. This procedure is repeated until convergence. Extensive experiments demonstrate that, compared to state-of-the-art LiDAR scene completion diffusion models, Distillation-DPO achieves higher-quality scene completion while accelerating completion by more than 5-fold. To the best of our knowledge, our method is the first to explore preference learning in distillation, and it provides insights into preference-aligned distillation. Our code is publicly available at https://github.com/happyw1nd/DistillationDPO.
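
The second step, ranking two student completions with a non-differentiable metric, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Chamfer Distance (here the squared-distance variant) against the ground-truth scene is the preference metric, and the names `chamfer_distance` and `build_preference_pair` are hypothetical. Toy random point clouds stand in for student outputs.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Assumption: squared Euclidean distances, averaged in each direction.
    """
    # Pairwise squared distances, shape (N, M).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    # Mean nearest-neighbour distance from a to b plus from b to a.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def build_preference_pair(candidate_a, candidate_b, ground_truth):
    """Return (winner, loser); the lower Chamfer Distance to ground truth wins."""
    cd_a = chamfer_distance(candidate_a, ground_truth)
    cd_b = chamfer_distance(candidate_b, ground_truth)
    if cd_a <= cd_b:
        return candidate_a, candidate_b
    return candidate_b, candidate_a


# Toy data: two "completions" of the same scene with different noise levels.
rng = np.random.default_rng(0)
ground_truth = rng.uniform(-1.0, 1.0, size=(256, 3))
close = ground_truth + rng.normal(scale=0.01, size=ground_truth.shape)
far = ground_truth + rng.normal(scale=0.2, size=ground_truth.shape)

winner, loser = build_preference_pair(far, close, ground_truth)
```

Because the metric is used only to rank the two samples, it never needs to be differentiated; the gradient signal comes from the score-function difference described in the third step. The dense pairwise computation above is practical only for small point sets, so a nearest-neighbour structure such as a KD-tree would be needed for full LiDAR scenes.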

Paper Structure

This paper contains 27 sections, 20 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: An example of Distillation-DPO for LiDAR scene completion on the SemanticKITTI dataset. (a) The input sparse LiDAR scan. (b) The corresponding ground truth scene. (c) Completion results of the existing state-of-the-art (SOTA) model, LiDiff. (d) Completion results of the proposed Distillation-DPO. Compared to LiDiff, Distillation-DPO completes a scene more than 5 times faster while achieving higher completion quality (lower Chamfer Distance).
  • Figure 2: The overall structure of Distillation-DPO. (1) The student model generates completed scenes from the sparse scan with different initial noise levels $\lambda$. (2) The winning sample $\mathcal{G}_t^w$ and the losing sample $\mathcal{G}_t^l$ are chosen. (3) The sparse scan, $\mathcal{G}_t^w$ and $\mathcal{G}_t^l$ are input to $\boldsymbol{\epsilon}_\theta$, $\boldsymbol{\epsilon}_\phi^w$ and $\boldsymbol{\epsilon}_\phi^l$. (4) The models $\boldsymbol{\epsilon}_\theta^w$ and $\boldsymbol{\epsilon}_\theta^l$ are optimized on $\mathcal{G}_t^w$ and $\mathcal{G}_t^l$, respectively. (5) The student model is optimized by the DPO gradient.
  • Figure 3: Qualitative results on SemanticKITTI. Compared to LiDiff, Distillation-DPO achieves faster and higher-quality completion.