Gait Sequence Upsampling using Diffusion Models for Single LiDAR Sensors
Jeongho Ahn, Kazuto Nakashima, Koki Yoshino, Yumi Iwashita, Ryo Kurazume
TL;DR
The paper tackles gait recognition with sparse LiDAR data by introducing LidarGSU, a diffusion-based upsampling framework that performs distance-independent inpainting on orthographic gait representations and enforces temporal consistency via a video-aware denoiser. By leveraging conditional DDPMs with a tailored 3D U-Net noise predictor and a continuous time schedule, the method improves both generative fidelity (PSNR, SSIM, and temporal consistency) and recognition accuracy on SUSTech1K and on real-world 2v-gait-v2 data. Key contributions include the distance-independent inpainting strategy, a video-based noise prediction module, and an efficient diffusion pipeline that yields robust gait upsampling across varying LiDAR resolutions and distances. The approach demonstrates practical impact for LiDAR-based security and robotics systems, enabling reliable gait-based identification under challenging sensing conditions and with low-cost hardware.
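The noising half of a continuous-time conditional DDPM can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cosine form of `alpha_bar`, the flattened toy frame, and the helper names `q_sample`, `predict_x0`, and `training_pair` are all hypothetical. In the conditional setting described above, a denoiser (here, a 3D U-Net over the frame sequence) would receive x_t, t, and the sparse input and be trained to predict the noise; that network is not shown.

```python
import math
import random
from typing import List, Tuple


def alpha_bar(t: float, s: float = 0.008) -> float:
    """Signal fraction at continuous time t in [0, 1].

    Assumption: a cosine schedule is used for illustration; the paper's
    actual continuous-time schedule may differ.
    """
    def f(u: float) -> float:
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0.0)


def q_sample(x0: List[float], t: float, eps: List[float]) -> List[float]:
    """Forward diffusion: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, with a = alpha_bar(t)."""
    a = alpha_bar(t)
    return [math.sqrt(a) * x + math.sqrt(1 - a) * e for x, e in zip(x0, eps)]


def predict_x0(xt: List[float], t: float, eps_hat: List[float]) -> List[float]:
    """Invert the forward process given a noise estimate eps_hat (e.g. from a denoiser)."""
    a = alpha_bar(t)
    return [(x - math.sqrt(1 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps_hat)]


def training_pair(x0: List[float], rng: random.Random) -> Tuple[float, List[float], List[float]]:
    """Hypothetical helper: draw t ~ U[0, 1) and Gaussian noise.

    A denoiser would learn to predict eps from (x_t, t, sparse condition).
    """
    t = rng.random()
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    return t, eps, q_sample(x0, t, eps)


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [0.2, -0.5, 0.9, 0.0]  # toy flattened "dense" depth frame
    t, eps, xt = training_pair(x0, rng)
    # With a perfect noise estimate, the clean frame is recovered up to rounding.
    print(all(abs(a - b) < 1e-9 for a, b in zip(predict_x0(xt, t, eps), x0)))
```

One practical consequence of sampling t from a continuous interval rather than a fixed set of integer steps is that the reverse process can later be discretized with a different number of steps than were used in training.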
Abstract
Recently, 3D LiDAR has emerged as a promising alternative to traditional RGB cameras for gait-based person identification, owing to its robustness under varying lighting conditions and its ability to capture 3D geometric information. However, long capture distances or the use of low-cost LiDAR sensors often result in sparse human point clouds, which degrades identification performance. To address this problem, we propose LidarGSU, a sparse-to-dense upsampling model for pedestrian point clouds in LiDAR-based gait recognition, designed to improve the generalization capability of existing identification models. Our method builds on diffusion probabilistic models (DPMs), which have shown high fidelity in generative tasks such as image completion. Specifically, we apply DPMs in an inpainting manner within a video-to-video translation framework, using sparse sequential pedestrian point clouds as conditional masks. We conducted extensive experiments on the SUSTech1K dataset to evaluate the generative quality and recognition performance of the proposed method. Furthermore, we demonstrate the applicability of our upsampling model on a real-world dataset captured with a low-resolution sensor across varying measurement distances.
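Turning a sparse pedestrian point cloud into an image plus a mask of observed pixels can be sketched with an orthographic projection onto a fixed metric grid. This is a minimal illustration under stated assumptions, not the authors' preprocessing. The axis convention (x pointing away from the sensor, z up), the centroid centring, the 2 m extent, and the name `orthographic_depth` are all hypothetical. Because each pixel covers a fixed physical area, the silhouette keeps the same pixel scale at any range. Distance only reduces how many pixels receive points, and the resulting holes are what the mask leaves for the inpainting model to fill.

```python
import math
from typing import List, Tuple

# Assumed convention: (x = range from sensor, y = lateral, z = up), in metres.
Point = Tuple[float, float, float]


def orthographic_depth(points: List[Point], size: int = 64, extent: float = 2.0):
    """Hypothetical helper: project points orthographically onto a size x size
    grid spanning `extent` metres on the vertical plane facing the sensor.

    Returns (depth, mask): depth[row][col] holds the smallest centred range value
    among points in that pixel (0.0 where empty); mask[row][col] is 1 if any
    point landed there, else 0.
    """
    depth = [[0.0] * size for _ in range(size)]
    mask = [[0] * size for _ in range(size)]
    if not points:
        return depth, mask
    n = len(points)
    # Centre on the centroid so the image does not depend on where the person stands.
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    cell = extent / size  # metres per pixel, fixed regardless of range
    for x, y, z in points:
        col = math.floor((y - cy) / cell + size / 2)
        row = math.floor((cz - z) / cell + size / 2)  # rows grow downward, so z is flipped
        if 0 <= row < size and 0 <= col < size:
            d = x - cx
            # Keep the point nearest to the sensor when several share a pixel.
            if mask[row][col] == 0 or d < depth[row][col]:
                depth[row][col] = d
                mask[row][col] = 1
    return depth, mask


if __name__ == "__main__":
    pts = [(5.0, 0.0, 0.0), (5.1, 0.1, 0.5), (4.9, -0.1, -0.5)]
    near_depth, near_mask = orthographic_depth(pts, size=8)
    far_pts = [(x + 10.0, y, z) for x, y, z in pts]
    far_depth, far_mask = orthographic_depth(far_pts, size=8)
    # The same points moved 10 m farther away occupy the same pixels.
    print(near_mask == far_mask)
```

A perspective range image would instead shrink the silhouette as the person walks away. The fixed metric grid is one simple way to keep the inpainting task's geometry comparable across distances.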
