Fast LiDAR Data Generation with Rectified Flows

Kazuto Nakashima, Xiaowen Liu, Tomoya Miyawaki, Yumi Iwashita, Ryo Kurazume

TL;DR

This work tackles the computational cost of diffusion-based LiDAR generation by introducing R2Flow, a rectified-flow model that learns straight trajectories and enables fast, high-fidelity sampling by solving the ODE $d\mathbf{x}_t/dt = v_\theta(\mathbf{x}_t,t)$ along the linear path $\mathbf{x}_t = t\mathbf{x}_1+(1-t)\mathbf{x}_0$. It operates in pixel space on 2-channel equirectangular LiDAR images (range and reflectance) and employs a Transformer-based velocity estimator (HDiT) with architectural tweaks for panoramic LiDAR data, using reflow and timestep distillation to achieve few-step sampling. The method is evaluated on the KITTI-360 dataset for unconditional generation and compared against GAN and diffusion baselines across multiple fidelity and diversity metrics, showing competitive results with substantially fewer sampling steps. The findings highlight R2Flow’s potential as a practical LiDAR data prior for restoration, sim-to-real, and domain-adaptation tasks in robotics, and suggest avenues for raydrop-aware modeling and broader application to sparse-to-dense completion and anomaly detection.
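The linear path and the sampling ODE can be illustrated with a small NumPy sketch. This is not the authors' implementation: the trained estimator $v_\theta$ is replaced by a hypothetical stand-in (`toy_velocity`) that returns the exact straight-line velocity $\mathbf{x}_1-\mathbf{x}_0$ for one noise/data pair, and the fixed-step Euler solver and the array shapes are assumptions made for illustration only.

```python
import numpy as np


def linear_path(x0, x1, t):
    """Interpolate x_t = t * x1 + (1 - t) * x0 between noise x0 and data x1."""
    return t * x1 + (1.0 - t) * x0


def target_velocity(x0, x1):
    """Time derivative of the linear path: d x_t / dt = x1 - x0."""
    return x1 - x0


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = np.array(x0, dtype=np.float64)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x


# Hypothetical toy setup: one noise/data pair with 2 channels (range, reflectance).
rng = np.random.default_rng(0)
x0 = rng.standard_normal((2, 4, 8))        # sample from the latent space p_0
x1 = rng.uniform(-1.0, 1.0, (2, 4, 8))     # stand-in for a data sample from p_1


def toy_velocity(x, t):
    """Stand-in for a trained v_theta: a perfectly straight, constant field."""
    return target_velocity(x0, x1)


one_step = euler_sample(toy_velocity, x0, num_steps=1)
many_steps = euler_sample(toy_velocity, x0, num_steps=32)
```

In this idealized case the one-step and 32-step results coincide, because Euler integration is exact when the velocity is constant along the trajectory. A learned velocity field is only approximately straight, so the step count a real model needs depends on how straight its flows are.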

Abstract

Building LiDAR generative models holds promise as powerful data priors for restoration, scene manipulation, and scalable simulation in autonomous mobile robots. In recent years, approaches using diffusion models have emerged, significantly improving training stability and generation quality. Despite their success, diffusion models require numerous iterations of running neural networks to generate high-quality samples, making the increasing computational cost a potential barrier for robotics applications. To address this challenge, this paper presents R2Flow, a fast and high-fidelity generative model for LiDAR data. Our method is based on rectified flows that learn straight trajectories, simulating data generation with significantly fewer sampling steps compared to diffusion models. We also propose an efficient Transformer-based model architecture for processing the image representation of LiDAR range and reflectance measurements. Our experiments on unconditional LiDAR data generation using the KITTI-360 dataset demonstrate the effectiveness of our approach in terms of both efficiency and quality.
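The image representation of range and reflectance measurements mentioned above can be sketched as an equirectangular projection of a point cloud. This is a minimal illustration, not the authors' preprocessing: the function name `points_to_equirect`, the 64 x 1024 resolution, and the vertical field of view are hypothetical defaults chosen for the example.

```python
import numpy as np


def points_to_equirect(points, reflectance, height=64, width=1024,
                       fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (N, 3) points and (N,) reflectance to a (2, H, W) image.

    Channel 0 holds range, channel 1 holds reflectance. The resolution and
    field-of-view defaults are assumptions, not values from the paper.
    """
    xyz = np.asarray(points, dtype=np.float64)
    refl = np.asarray(reflectance, dtype=np.float64)
    depth = np.linalg.norm(xyz, axis=1)
    valid = depth > 0                      # drop points at the sensor origin
    xyz, refl, depth = xyz[valid], refl[valid], depth[valid]

    azimuth = np.arctan2(xyz[:, 1], xyz[:, 0])        # in [-pi, pi]
    elevation = np.arcsin(xyz[:, 2] / depth)
    fov_up, fov_down = np.deg2rad(fov_up_deg), np.deg2rad(fov_down_deg)

    # Columns wrap around horizontally; rows outside the vertical FOV are dropped.
    col = np.floor(0.5 * (1.0 - azimuth / np.pi) * width).astype(np.int64) % width
    row = np.floor((fov_up - elevation) / (fov_up - fov_down) * height).astype(np.int64)
    inside = (row >= 0) & (row < height)
    flat = row[inside] * width + col[inside]
    depth, refl = depth[inside], refl[inside]

    # Sort by pixel index, then by range, and keep the nearest point per pixel.
    order = np.lexsort((depth, flat))
    flat, depth, refl = flat[order], depth[order], refl[order]
    pixels, first = np.unique(flat, return_index=True)

    image = np.zeros((2, height * width))  # empty pixels stay at zero
    image[0, pixels] = depth[first]
    image[1, pixels] = refl[first]
    return image.reshape(2, height, width)


# Two points fall on the same ray; the nearer one (5 m) should be kept.
points = np.array([[10.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 8.0, 0.0]])
reflectance = np.array([0.2, 0.7, 0.4])
image = points_to_equirect(points, reflectance)
```

Pixels with no return stay at zero here, which is one simple way to mark missing measurements; how dropped rays are actually encoded is a design choice this sketch does not take from the source.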

Paper Structure

This paper contains 10 sections, 6 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Comparison of LiDAR generative models. Diffusion models have demonstrated realistic LiDAR data generation, but the previous methods [ran2024towards, nakashima2024lidar] suffer from a trade-off between quality and sampling efficiency in their iterative generation process. Our approach consistently generates high-quality samples across different numbers of iterations. $\dag$ Our improved version with APE [peebles2023scalable].
  • Figure 2: Architectural comparison of LiDAR diffusion models and ours. Our approach R2Flow is categorized into the pixel-space iteration approach.
  • Figure 3: Scene interpolation using R2Flow inversion. The samples at both ends were reconstructed from real samples via inversion. The middle four samples were generated from interpolated latent variables.
  • Figure 4: Schematic overview of our velocity estimator. (a) Straight flows are learned to transport samples between the latent space $p_0$ and the image space $p_1$. (b) Overall architecture to estimate the velocity fields $\mathbf{v}_t$ from the intermediate state $\mathbf{x}_t$ and the timestep $t$. The Interp layers fuse the current tokens and skipped tokens at each spatial location with learnable weights. (c) The details of the building blocks. The Circular MHSA (multi-head self-attention) layer uses a global attention kernel at the bottleneck and a sliding local window [hassani2023neighborhood] for other stages.
  • Figure 5: Distribution of point clouds in bird's-eye view. We calculated the marginal distribution of 1,000 random samples generated by LiDM [ran2024towards]. With APE, the distribution gets closer to that of the dataset.
  • ...and 3 more figures