PathFinder: Attention-Driven Dynamic Non-Line-of-Sight Tracking with a Mobile Robot

Shenbagaraj Kannapiran; Sreenithy Chandran; Suren Jayasuriya; Spring Berman

TL;DR

This work tackles dynamic non-line-of-sight tracking with a moving platform by introducing PathFinder, a data-driven approach that uses a standard RGB camera on a drone to recover a hidden person’s 2D trajectory. It combines a plane extraction pipeline with a dual-branch transformer network (NLOS-Patch) that processes multiple relay walls (planes) via example packing, followed by a plane-wise optimization to fuse information across planes in the global frame. Key contributions include PlaneRecNet-based plane extraction, the NLOS-Patch network with MPP-T and DPP-T, synthetic and real-world drone datasets, and a real-time inference pipeline that outperforms state-of-the-art baselines in dynamic-camera NLOS tracking. The method demonstrates high-accuracy NLOS tracking in real-world environments and provides a dataset release to support further research in low-cost NLOS imaging under motion.

Abstract

The study of non-line-of-sight (NLOS) imaging is growing due to its many potential applications, including rescue operations and pedestrian detection by self-driving cars. However, implementing NLOS imaging on a moving camera remains an open area of research. Existing NLOS imaging methods rely on time-resolved detectors and laser configurations that require precise optical alignment, making it difficult to deploy them in dynamic environments. This work proposes a data-driven approach to NLOS imaging, PathFinder, that can be used with a standard RGB camera mounted on a small, power-constrained mobile robot, such as an aerial drone. Our experimental pipeline is designed to accurately estimate the 2D trajectory of a person who moves in a Manhattan-world environment while remaining hidden from the camera's field of view. We introduce a novel approach to process a sequence of successive frames from a moving camera's line-of-sight (LOS) video using an attention-based neural network that performs inference in real time. The method also includes a preprocessing selection metric that analyzes images from a moving camera that contain multiple vertical planar surfaces, such as walls and building facades, and extracts the planes that return maximum NLOS information. We validate the approach on in-the-wild scenes using a drone for video capture, thus demonstrating low-cost NLOS imaging in dynamic capture environments.

Paper Structure (21 sections, 2 equations, 8 figures, 2 tables)

Introduction
Problem Statement
NLOS Tracking Pipeline
Plane Extraction Pipeline
NLOS-Patch Network
Patchify
Factorized Positional Embedding
Masked Self-Attention
Loss Functions
Optimization Pipeline
Datasets for NLOS-Patch Network Training
Synthetic Dataset
Real-World Dataset and Hardware Configuration
Experiments
Training Procedure
...and 6 more sections

Figures (8)

Figure 1: NLOS imaging task addressed by our method, which estimates a person's 2D trajectory by leveraging the light scatter information in a drone's capture of several relay walls.
Figure 2: Inference pipeline for NLOS object tracking. Raw images, stereo image pairs, and IMU data are input to VIO to estimate the camera pose. PlaneRecNet generates plane masks from consecutive images. Homography from feature matching is applied to plane masks, creating difference images and plane IDs $k$. The raw image at time step $i+1$ and the difference image between time steps $i$ and $i+1$ are input to MPP-T and DPP-T networks, which compute the estimates $\mathbf{X}_m$ and $\mathbf{V}_m$ for each example plane $m$ (see the NLOS-Patch Network section). These estimates, along with the camera pose, plane IDs, and unit vector normal to each plane, are input to an optimization layer to compute the NLOS object's trajectory. The figure on the right shows details of the Plane-Patch Transformer architecture.
Figure 3: (a) Overhead view of a sample synthetic NLOS scene simulated using Blender, showing the camera (lower left), human character (NLOS object), and sources of ambient lighting in the room. (b) Samples of the eight characters from the Mixamo library that were used for synthetic data generation. (c) Samples of three sets of relay walls with different materials that were used for synthetic data generation.
Figure 4: (a) Real-world data collection setup: A drone captures images of a relay wall while a person (NLOS object) is hidden from view. The person's ground-truth position is obtained using motion capture cameras. (b) Side view of the setup. (c) Helmet mounted with IR markers for ground-truth data collection. (d) Samples of FOV regions in the dataset, with different surface textures and types of objects present.
Figure 5: Customized drone equipped with Intel RealSense cameras (highlighted in red) for visual inertial odometry (VIO) during indoor flight. Raw data, including color camera images, stereo images, and IMU data, are extracted for VIO and camera pose estimation. The onboard Jetson Nano (highlighted in yellow) facilitates real-time processing.
...and 3 more figures
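
The Figure 2 caption describes aligning consecutive frames with a homography and forming a per-plane difference image. The following is a minimal NumPy sketch of that one step, not the authors' implementation. It assumes the homography `H` has already been estimated (the paper obtains it from feature matching), that images are single-channel float arrays, and that nearest-neighbour sampling is acceptable. The function names `warp_homography` and `plane_difference` are hypothetical.

```python
import numpy as np


def warp_homography(img, H, out_shape):
    """Warp `img` into the target frame with inverse mapping.

    H maps homogeneous source pixels (x, y, 1) to target pixels.
    Sampling is nearest-neighbour (a simplifying assumption).
    Returns the warped image and a mask of target pixels that had a source.
    """
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    tgt = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # 3 x N
    src = np.linalg.inv(H) @ tgt  # back-project target pixels into the source
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros(h * w, dtype=float)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h, w), valid.reshape(h, w)


def plane_difference(prev_img, next_img, H, plane_mask):
    """Difference image for one plane between time steps i and i+1.

    prev_img is aligned to next_img with H, then subtracted. Pixels outside
    the plane mask, or without a valid source pixel, are set to zero.
    """
    warped, valid = warp_homography(prev_img, H, next_img.shape)
    keep = valid & plane_mask
    return np.where(keep, next_img - warped, 0.0)


# Toy usage: the camera shifts the view by (2, 1) pixels and one wall pixel
# brightens slightly between frames.
rng = np.random.default_rng(0)
prev_img = rng.random((6, 8))
next_img = np.zeros((6, 8))
next_img[1:, 2:] = prev_img[:-1, :-2]   # same content, translated
next_img[3, 4] += 0.5                   # the only real intensity change
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
plane_mask = np.ones((6, 8), dtype=bool)  # stand-in for a PlaneRecNet mask
diff = plane_difference(prev_img, next_img, H, plane_mask)
```

After alignment, apparent changes caused by camera motion cancel, and only the intensity change on the wall remains in `diff`. A practical pipeline would likely use an image library's perspective warp with interpolation rather than this nearest-neighbour loop-free version.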
