TDFANet: Encoding Sequential 4D Radar Point Clouds Using Trajectory-Guided Deformable Feature Aggregation for Place Recognition

Shouyi Lu; Guirong Zhuo; Haitao Wang; Quan Zhou; Huanyu Zhou; Renbo Huang; Minqing Huang; Lianqing Zheng; Qiang Shu

TL;DR

TDFANet tackles place recognition with sequential 4D radar by combining dynamic point removal, BEV feature encoding, ego-velocity–guided trajectory alignment, and a multi-scale spatio-temporal deformable transformer to aggregate features across time. The approach yields a compact global descriptor via GeM pooling and is trained with a metric-learning objective, achieving state-of-the-art performance on a real multi-sensor radar dataset. Core contributions include a trajectory-guided alignment strategy, a spatio-temporal pyramid deformable architecture, and the first end-to-end framework for sequential 4D radar place recognition, validated under dynamic and long-term appearance changes. This work advances radar-based localization robustness in challenging conditions and provides a dataset and codebase to spur further research.
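Generalized-mean (GeM) pooling, the operator named above for producing the global descriptor, can be illustrated with a small sketch. The flat per-channel layout, the exponent `p = 3`, and the function name `gem_pool` are assumptions made for illustration; the summary does not state the settings used in TDFANet.

```python
def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pooling over each channel of a feature map.

    feature_map: list of C channels, each a flat list of H*W activations
                 (hypothetical layout chosen for this sketch).
    p:           pooling exponent; p = 1 is average pooling, and large p
                 approaches max pooling. The value 3.0 is an assumed default.
    Returns a list of C values, one per channel (the global descriptor).
    """
    descriptor = []
    for channel in feature_map:
        # Clamp to a small positive value so fractional powers stay defined.
        clamped = [max(x, eps) for x in channel]
        mean_pow = sum(x ** p for x in clamped) / len(clamped)
        descriptor.append(mean_pow ** (1.0 / p))
    return descriptor
```

With `p = 1` the function reduces to a plain per-channel average, and raising `p` moves the result toward the channel maximum, so the exponent controls how strongly the descriptor emphasizes the most salient BEV cells.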

Abstract

Place recognition is essential for loop closure and global positioning in autonomous vehicles and mobile robots. Despite recent advances in place recognition using 2D cameras or 3D LiDAR, it remains unclear how best to use 4D radar for this task, even though the sensor is increasingly popular for its robustness to adverse weather and lighting conditions. Compared to LiDAR point clouds, radar data are far sparser, noisier, and of much lower resolution, which limits how well they represent scenes and poses significant challenges for 4D radar-based place recognition. This work addresses these challenges by leveraging multi-modal information from sequential 4D radar scans and by effectively extracting and aggregating spatio-temporal features. Our approach follows a principled pipeline that comprises (1) dynamic point removal and ego-velocity estimation from the points' velocity measurements, (2) bird's eye view (BEV) feature encoding of the refined point cloud, (3) feature alignment using the motion trajectory of the BEV feature maps, computed from the ego-velocity, and (4) extraction and aggregation of multi-scale spatio-temporal features from the aligned BEV feature maps. Real-world experimental results validate the feasibility of the proposed method and demonstrate its robustness in dynamic environments. Source code is available.
