Enabling Visual Recognition at Radio Frequency

Haowen Lai; Gaoxiang Luo; Yifei Liu; Mingmin Zhao

TL;DR

PanoRadar is a LiDAR-like RF imaging system that rotates a mmWave radar to form a dense cylindrical aperture, enabling 3D RF imaging and, for the first time at radio frequency, visual recognition tasks such as surface normal estimation, semantic segmentation, and object detection. It combines robust motion estimation to compensate for platform movement, learning-based elevation-resolution enhancement using 2D convolutions, and cross-modal supervision from LiDAR to recover high-fidelity 3D structure. The approach achieves accurate range imaging (mean absolute error ≈ $15.76\text{ cm}$, median error $3.39\text{ cm}$), competitive surface normal, semantic segmentation, and object detection metrics, and strong cross-building generalization across 12 buildings, yielding a practical, low-cost RF perception pipeline. The work also presents a dataset of 11,033 synchronized RF–LiDAR scenes (461 GB) and opens avenues for RF-based perception in robotics and harsh environments, potentially complementing or replacing LiDAR in certain applications.

Abstract

This paper introduces PanoRadar, a novel RF imaging system that brings RF resolution close to that of LiDAR, while providing resilience against conditions challenging for optical signals. Our LiDAR-comparable 3D imaging results enable, for the first time, a variety of visual recognition tasks at radio frequency, including surface normal estimation, semantic segmentation, and object detection. PanoRadar utilizes a rotating single-chip mmWave radar, along with a combination of novel signal processing and machine learning algorithms, to create high-resolution 3D images of the surroundings. Our system accurately estimates robot motion, allowing for coherent imaging through a dense grid of synthetic antennas. It also exploits the high azimuth resolution to enhance elevation resolution using learning-based methods. Furthermore, PanoRadar tackles 3D learning via 2D convolutions and addresses challenges due to the unique characteristics of RF signals. Our results demonstrate PanoRadar's robust performance across 12 buildings.

Paper Structure (18 sections, 1 theorem, 16 equations, 24 figures, 4 tables)

Introduction
Related Work
Overview
Cylindrical Array Imaging
Motion Estimation and Imaging
AoA and Doppler Effect
Robust Motion Estimation
Efficient Compensation and Imaging
Enhanced Imaging with ML
Resolution Enhancement with ML
Visual Recognition with ML
Panoramic Learning
Ground Truth Labels
Implementation and Dataset
Evaluation
...and 3 more sections

Key Result

Lemma 1

Consider a circular array of radius $r$ and a reflector at angle $\theta=0$. Because of the array's finite resolution, this reflector also contributes to the image at a nearby angle $\theta_s$, with a voltage $E(\theta_s)$ expressed in terms of $J_0$, the zeroth-order Bessel function of the first kind. The 3-dB beamwidth of this beam is therefore the angular resolution of the circular array, expressed in terms of the diameter $d=2r$ of the circular array.
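
The lemma's claim can be checked numerically. The following is a minimal sketch, not the authors' implementation. It assumes a full circle of isotropic antennas, a single far-field reflector at $\theta=0$, and monostatic (round-trip) propagation. The function names are illustrative, and the wavelength and radius in the demo are hypothetical values. Under these idealized assumptions the normalized beamformer output approaches $|J_0(4kr\sin(\theta_s/2))|$ with $k=2\pi/\lambda$; the lemma's exact expression may use a different convention.

```python
import numpy as np


def circular_array_pattern(theta_s, radius, wavelength, n_antennas=1024):
    """Normalized beamformer magnitude at scan angles theta_s (radians).

    Assumes isotropic antennas on a full circle and one far-field
    reflector at theta = 0 observed with round-trip propagation.
    """
    k = 2.0 * np.pi / wavelength
    phi = 2.0 * np.pi * np.arange(n_antennas) / n_antennas  # antenna angles
    # Round-trip phase of the echo received at each antenna position.
    received = np.exp(1j * 2.0 * k * radius * np.cos(phi))
    # Steering weights that focus the array toward each scan angle.
    steering = np.exp(
        -1j * 2.0 * k * radius * np.cos(phi[None, :] - theta_s[:, None])
    )
    return np.abs((steering * received[None, :]).mean(axis=1))


def beamwidth_3db(theta_s, pattern):
    """Full 3-dB width of the main lobe, assuming theta_s starts at 0
    (the peak) and increases; the lobe is symmetric about 0."""
    level = 1.0 / np.sqrt(2.0)  # -3 dB in voltage
    i = np.nonzero(pattern < level)[0][0]  # first sample below -3 dB
    t0, t1 = theta_s[i - 1], theta_s[i]
    p0, p1 = pattern[i - 1], pattern[i]
    half = t0 + (level - p0) * (t1 - t0) / (p1 - p0)  # linear interpolation
    return 2.0 * half


if __name__ == "__main__":
    wavelength = 0.0039  # hypothetical: roughly a 77 GHz carrier, in meters
    radius = 0.1         # hypothetical rotation radius, in meters
    theta = np.linspace(0.0, 0.01, 1001)
    pattern = circular_array_pattern(theta, radius, wavelength)
    width = beamwidth_3db(theta, pattern)
    print("3-dB beamwidth (deg):", np.degrees(width))
    print("beamwidth * d / wavelength:", width * 2.0 * radius / wavelength)
```

Under these assumptions the measured beamwidth scales as roughly $0.36\,\lambda/d$, so doubling the diameter halves the azimuth beamwidth. Real antennas are directional and see the reflector from only part of the circle, so a practical system would be expected to deviate from this idealized figure.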

Figures (24)

Figure 1: RF imaging and visual recognition with PanoRadar. This figure illustrates the capabilities of our system, showing (a) the 3D panoramic LiDAR range image as a reference, and (b) the RF-based prediction generated by our system. Our LiDAR-comparable results enable a variety of visual recognition tasks, including (c) surface normal estimation, (d) semantic segmentation, and (e) object detection and human localization. Additionally, we present (f) the LiDAR 3D point cloud color-coded with manually-annotated semantic labels, and (g) the predicted RF-based point cloud color-coded by the corresponding predicted semantic categories, which offers an enriched understanding of the 3D surroundings.
Figure 2: PanoRadar design. Left: our system rotates a single-chip mmWave radar using a motor, with its linear antenna array placed vertically. Right: this rotation emulates a dense cylindrical array of antennas.
Figure 3: PanoRadar architecture with four components: a 3D imaging system for cylindrical arrays, a motion estimation and compensation algorithm, a resolution enhancement and range image estimation model, and visual recognition heads for downstream tasks.
Figure 4: RF imaging results with a stationary robot. Our beamforming results capture humans in a rough shape, with limited elevation resolution.
Figure 5: Distortion of the imaging results due to robot motion. 2D visualizations show that the distortion gets worse as the robot starts to move.
...and 19 more figures

Theorems & Definitions (1)

Lemma 1
