Radon Implicit Field Transform (RIFT): Learning Scenes from Radar Signals

Daqian Bao, Alex Saad-Falcon, Justin Romberg

TL;DR

RIFT introduces a Radon Implicit Field Transform that fuses the Generalized Radon Transform forward model with an Implicit Neural Representation to learn a scene's complex reflectivity directly from radar signals. By modeling $\rho(\boldsymbol{x})$ with an INR and optimizing against radar measurements through a data-fidelity objective that leverages an $L_1$ loss for stability, RIFT enables high-quality scene reconstruction and viewpoint interpolation from a fraction of the conventional data. The work defines new metrics, including $p$-RMSE for phase (viewpoint interpolation) and $m$-SSIM/$m$-COS/tIoU for magnitude-based scene recovery, and demonstrates that RIFT achieves substantial data-efficiency gains across simple and complex scenes, a weak-target case in the far field, and a real-simulator scenario. These results highlight the potential for compact, data-efficient radar scene learning while noting stability challenges and the need for real-world datasets to fully validate the approach.

Abstract

Data acquisition in array signal processing (ASP) is costly because achieving high angular and range resolutions necessitates large antenna apertures and wide frequency bandwidths, respectively. The data requirements for ASP problems grow multiplicatively with the number of viewpoints and frequencies, significantly increasing the burden of data collection, even for simulation. Implicit Neural Representations (INRs), neural network-based models of 3D objects and scenes, offer compact and continuous representations with minimal radar data. They can interpolate to unseen viewpoints and potentially address the sampling cost in ASP problems. In this work, we select Synthetic Aperture Radar (SAR) as a case from ASP and propose the Radon Implicit Field Transform (RIFT). RIFT consists of two components: a classical forward model for radar (Generalized Radon Transform, GRT), and an INR-based scene representation learned from radar signals. This method can be extended to other ASP problems by replacing the GRT with appropriate algorithms corresponding to different data modalities. In our experiments, we first synthesize radar data using the GRT. We then train the INR model on this synthetic data by minimizing the reconstruction error of the radar signal. After training, we render the scene using the trained INR and evaluate our scene representation against the ground truth scene. Due to the lack of existing benchmarks, we introduce two main new error metrics: phase-Root Mean Square Error (p-RMSE) for radar signal interpolation, and magnitude-Structural Similarity Index Measure (m-SSIM) for scene reconstruction. These metrics adapt traditional error measures to account for the complex nature of radar signals. Compared to traditional scene models in radar signal processing, with only a 10% data footprint, our RIFT model achieves up to 188% improvement in scene reconstruction.
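The pipeline described in the abstract (synthesize radar data with a forward model, then score a candidate scene by how well it reproduces that data) can be illustrated with a small sketch. Everything below is an assumption made for illustration rather than the paper's implementation: the forward model is a generic point-scatterer sum with a two-way phase delay standing in for the GRT, the scene is a plain list of reflectivities where RIFT would query an INR, and `forward_model`, `l1_loss`, and `phase_rmse` are hypothetical names. In particular, `phase_rmse` uses the RMSE of the wrapped phase difference, which is only one plausible reading of a phase-based error and may differ from the paper's p-RMSE definition.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def forward_model(points, reflectivity, antennas, freqs):
    """Toy point-scatterer forward model (assumed form, not the paper's GRT).

    For every antenna position and frequency, sum the complex reflectivity
    of each scene point with a phase delay for the two-way path length.
    """
    signal = []
    for antenna in antennas:
        for f in freqs:
            k = 2.0 * math.pi * f / C  # wavenumber
            s = 0j
            for p, rho in zip(points, reflectivity):
                r = math.dist(antenna, p)  # one-way range to the point
                s += rho * cmath.exp(-2j * k * r)
            signal.append(s)
    return signal


def l1_loss(pred, target):
    """Mean absolute error between two complex signals."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def phase_rmse(pred, target):
    """RMSE of the wrapped phase difference (assumed definition)."""
    errs = [cmath.phase(p * t.conjugate()) for p, t in zip(pred, target)]
    return math.sqrt(sum(e * e for e in errs) / len(errs))


# Hypothetical two-point scene observed from three viewpoints at two frequencies.
points = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0)]
rho_true = [1.0 + 0.0j, 0.0 + 0.5j]
antennas = [(10.0 * math.cos(t), 10.0 * math.sin(t), 5.0) for t in (0.0, 0.5, 1.0)]
freqs = [1.0e9, 1.1e9]

# "Measured" data synthesized from the true scene.
measured = forward_model(points, rho_true, antennas, freqs)

# A candidate scene whose first reflectivity is slightly wrong.
candidate = forward_model(points, [0.8 + 0.0j, 0.0 + 0.5j], antennas, freqs)

print("L1 data-fidelity loss:", l1_loss(candidate, measured))
print("phase RMSE (rad):", phase_rmse(candidate, measured))
```

In a training loop, the candidate reflectivities would come from a differentiable scene model and the loss would be minimized by gradient descent; the sketch only evaluates the objective for a fixed candidate.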

Paper Structure

This paper contains 27 sections, 13 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Workflow chart of the RIFT architecture. The diagram illustrates how RIFT models physical radar sensing, transforms the learned scene into radar signals through the GRT segment, and iteratively refines the scene representation via backpropagation.
  • Figure 2: Visualizations of the "cube" scene. (a): Ground truth of a cube with 2 m edges. (b)-(e): Scene reconstruction by the baseline with 100, 200, 500, and 1000 viewpoints, respectively. (f): Scene reconstruction by RIFT(N) with 100 viewpoints. The m-SSIM score and p-RMSE of the reconstruction in (f) are 0.6395 and 5.4986, respectively, which are 274% and 11% better than those of the reconstruction in (e) while using only 10% of the viewpoints. (g), (h): Scene reconstruction by RIFT(N/S) with 1000 viewpoints, shown for reference.
  • Figure 3: Visualizations of the data requirements of different models. (a)-(d): Scene reconstruction by the baseline least-squares model with 100, 200, 500, and 1000 viewpoints. (e)-(h): Scene reconstruction by our RIFT(N) model with 100, 200, 500, and 1000 viewpoints.
  • Figure 4: Visualizations of the "mini parking lot" scene from Section \ref{Results}: (a) Scene reconstructed by the baseline model. (b) Scene reconstructed by RIFT. (c) Ground truth scene visualized with the same granularity (defined in Section \ref{grt}) as the scene reconstruction. Under the same number of inputs, the RIFT reconstruction scores up to 300% higher than the baseline while using only 40% of the data samples. Detailed results are in Table \ref{scene_1_table}.
  • Figure 5: Visualizations for Weak Target Detection: (a)-(d) Scene reconstruction by the baseline with no difference in reflectivity and with 2$\times$, 3$\times$, and 4$\times$ differences in reflectivity. (e)-(h) Scene reconstruction by RIFT with no difference in reflectivity and with 2$\times$, 3$\times$, and 4$\times$ differences in reflectivity.
  • ...and 5 more figures