Real-Time LiDAR Super-Resolution via Frequency-Aware Multi-Scale Fusion

June Moh Goo, Zichao Zeng, Jan Boehm

TL;DR

FLASH targets high-quality LiDAR perception from low-resolution sensors through dual-domain processing that combines Frequency-Aware Window Attention with Adaptive Multi-Scale Fusion. Global frequency-domain analysis via FFT extends the effective receptive field beyond the local attention window while local spatial attention preserves fine geometry, enabling single-pass, real-time LiDAR range-image super-resolution that outperforms uncertainty-enhanced baselines such as TULIP with Monte Carlo Dropout on KITTI. Key contributions are the dual-domain attention mechanism and a learned, position-specific fusion module that replaces fixed skip connections. Empirically, FLASH achieves state-of-the-art results on MAE, Chamfer Distance, IoU, and F1 across near and far ranges, which makes it practical for autonomous systems without costly stochastic inference.

Abstract

LiDAR super-resolution addresses the challenge of achieving high-quality 3D perception from cost-effective, low-resolution sensors. While recent transformer-based approaches like TULIP show promise, they remain limited to spatial-domain processing with restricted receptive fields. We introduce FLASH (Frequency-aware LiDAR Adaptive Super-resolution with Hierarchical fusion), a novel framework that overcomes these limitations through dual-domain processing. FLASH integrates two key innovations: (i) Frequency-Aware Window Attention, which combines local spatial attention with global frequency-domain analysis via FFT, capturing both fine-grained geometry and periodic scanning patterns at log-linear complexity; and (ii) Adaptive Multi-Scale Fusion, which replaces conventional skip connections with learned position-specific feature aggregation, enhanced by CBAM attention for dynamic feature selection. Extensive experiments on KITTI demonstrate that FLASH achieves state-of-the-art performance across all evaluation metrics, surpassing even uncertainty-enhanced baselines that require multiple forward passes. Notably, FLASH outperforms TULIP with Monte Carlo Dropout while maintaining single-pass efficiency, which enables real-time deployment. The consistent superiority across all distance ranges validates that our dual-domain approach effectively handles uncertainty through architectural design rather than computationally expensive stochastic inference, making it practical for autonomous systems.
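
To make the "global frequency-domain analysis via FFT" idea concrete, the following is a minimal NumPy sketch of a frequency branch acting on a single-channel feature map. It is not the authors' implementation: the function name `frequency_branch` and the complex `weight` array (standing in for a learned per-frequency filter) are assumptions introduced here for illustration, and the local window-attention branch is omitted.

```python
import numpy as np

def frequency_branch(x, weight):
    """Mix a (H, W) feature map globally by filtering it in the Fourier domain.

    x:      real array of shape (H, W), e.g. one channel of a range-image feature map
    weight: complex array of shape (H, W // 2 + 1); a hypothetical learned filter
            (an assumption for this sketch, not taken from the paper)
    """
    spectrum = np.fft.rfft2(x)          # 2D real FFT, cost O(HW log(HW))
    filtered = spectrum * weight        # scale each frequency independently
    return np.fft.irfft2(filtered, s=x.shape)  # back to the spatial domain

H, W = 16, 1024                         # low-resolution range-image size from the paper

# An all-ones filter leaves the input unchanged.
rng = np.random.default_rng(0)
x = rng.standard_normal((H, W))
identity = np.ones((H, W // 2 + 1), dtype=complex)
same = frequency_branch(x, identity)

# A filter that keeps only the zero-frequency term spreads a single impulse
# over every output position, which illustrates the global receptive field.
impulse = np.zeros((H, W))
impulse[0, 0] = 1.0
dc_only = np.zeros((H, W // 2 + 1), dtype=complex)
dc_only[0, 0] = 1.0
spread = frequency_branch(impulse, dc_only)
```

In this sketch a single element-wise multiplication in the frequency domain lets every output position depend on every input position, and the cost is dominated by the two FFTs, which is where the log-linear complexity comes from. With the zero-frequency-only filter, `spread` is the constant `1 / (H * W)` everywhere.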

Paper Structure

This paper contains 20 sections, 11 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overview of FLASH performance. (a–c) Visual comparison of range image super-resolution (16×1024 → 64×1024). (d) Left: radar chart comparing performance metrics against competing methods on the KITTI dataset (each metric is normalized to its own scale). Right: bar graphs showing that even with MC Dropout enhancement, TULIP variants fail to match FLASH's IoU and F1 scores.
  • Figure 2: FLASH architecture overview. The encoder-decoder network processes low-resolution range images (16×1024) to produce high-resolution outputs (64×1024) through Enhanced Swin-Transformer blocks with Frequency-Aware Window Attention (FA) and Multi-Scale Fusion (MSF) at skip connections. The FA module (right) employs dual-branch processing combining spatial window attention with frequency domain analysis via FFT.
  • Figure 3: Multi-Scale Fusion (MSF) module. Encoder and decoder features are processed through parallel multi-scale convolutions (1×1, 3×3, 5×5). Adaptive weights are generated for position-specific fusion, followed by CBAM refinement for enhanced feature selection.
  • Figure 4: Qualitative results on KITTI. Comparison of super-resolution methods showing (top) noise suppression near the sensor, (middle) edge preservation on large vehicles, and (bottom) fine structural detail recovery on a van's rear section. FLASH demonstrates superior performance in preserving geometric sharpness and reducing artifacts compared to SwinIR, LiDAR-SR, and TULIP.
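
The position-specific fusion described in the Figure 3 caption can be sketched roughly as follows. This is an illustrative simplification rather than the paper's module: fixed box filters of size 1, 3, and 5 stand in for the learned 1×1, 3×3, and 5×5 convolutions, the encoder and decoder maps are simply summed (an assumption), the `logits` array is a hypothetical stand-in for the learned weight generator, and the CBAM refinement step is left out. All function names are invented for this sketch.

```python
import numpy as np

def box_filter(x, k):
    """Average each position over a k x k neighbourhood (k odd), using edge padding."""
    pad = k // 2
    padded = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)

def softmax(z, axis=0):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_scale_fuse(enc, dec, logits):
    """Fuse encoder and decoder maps with per-position weights over three scales.

    enc, dec: (H, W) feature maps from the skip connection and the decoder
    logits:   (3, H, W) hypothetical scores, one per scale branch and position
    """
    merged = enc + dec                                   # assumed combination rule
    branches = np.stack([box_filter(merged, k) for k in (1, 3, 5)])  # (3, H, W)
    weights = softmax(logits, axis=0)                    # sums to 1 at each position
    return (weights * branches).sum(axis=0)

rng = np.random.default_rng(0)
enc = rng.standard_normal((16, 64))
dec = rng.standard_normal((16, 64))

# Strongly favouring the first branch at every position recovers enc + dec.
favour_first = np.zeros((3, 16, 64))
favour_first[0] = 50.0
fused = multi_scale_fuse(enc, dec, favour_first)
```

Because the weights are computed per position, the fusion can in principle rely on the small-kernel branch near sharp edges and on the larger kernels in smooth regions, which is the behaviour a fixed skip connection cannot express.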