3D Reconstruction from Transient Measurements with Time-Resolved Transformer
Yue Li, Shida Sun, Yu Hong, Feihu Xu, Zhiwei Xiong
TL;DR
This paper introduces the Time-Resolved Transformer (TRT), a transformer-based architecture that exploits local and global spatio-temporal correlations in transient measurements for photon-efficient 3D reconstruction. It defines two attention mechanisms: spatio-temporal self-attention encoders, which extract local and global features, and spatio-temporal cross-attention decoders, which fuse those features into deep representations used to reconstruct LOS and NLOS scenes. The two task-specific variants, TRT-LOS and TRT-NLOS, achieve state-of-the-art performance on synthetic and real-world data, supported by a dedicated transient denoiser for NLOS imaging and large synthetic LOS datasets for training. The approach generalizes robustly across different imaging systems and sensor noise levels, advancing practical 3D imaging in challenging environments.
Abstract
Transient measurements, captured by time-resolved systems, are widely employed in photon-efficient reconstruction tasks, including line-of-sight (LOS) and non-line-of-sight (NLOS) imaging. However, 3D reconstruction from these measurements remains challenging due to the low quantum efficiency of sensors and high noise levels, particularly for long-range or complex scenes. To boost 3D reconstruction performance in photon-efficient imaging, we propose a generic Time-Resolved Transformer (TRT) architecture. Unlike existing transformers designed for high-dimensional data, TRT has two attention designs tailored to spatio-temporal transient measurements. Specifically, the spatio-temporal self-attention encoders explore both local and global correlations within transient data by splitting or downsampling input features into different scales. Then, the spatio-temporal cross-attention decoders integrate the local and global features in the token space, yielding deep features with high representation capability. Building on TRT, we develop two task-specific embodiments: TRT-LOS for LOS imaging and TRT-NLOS for NLOS imaging. Extensive experiments demonstrate that both embodiments significantly outperform existing methods on synthetic data and on real-world data captured by different imaging systems. In addition, we contribute a large-scale, high-resolution synthetic LOS dataset with various noise levels and capture a set of real-world NLOS measurements using a custom-built imaging system, enhancing the data diversity in this field. Code and datasets are available at https://github.com/Depth2World/TRT.
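The encoder and decoder idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`local_tokens`, `global_tokens`, `attention`), the window size, the pooling stride, and the feature shape are all assumptions chosen for illustration, and the learned query/key/value projections, multi-head structure, and normalization layers of a real transformer are omitted. Local tokens come from splitting a feature volume into non-overlapping windows, global tokens come from average-pool downsampling, and a cross-attention step lets the local tokens query the global ones.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention. q: (..., Nq, C); k, v: (..., Nk, C)."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def local_tokens(feat, w):
    """Split a (T, H, W, C) volume into non-overlapping w*w*w windows.

    Returns an array of shape (num_windows, w**3, C). Assumes T, H, W are
    divisible by w (hypothetical simplification).
    """
    T, H, W, C = feat.shape
    x = feat.reshape(T // w, w, H // w, w, W // w, w, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # group the window indices first
    return x.reshape(-1, w ** 3, C)

def global_tokens(feat, s):
    """Downsample a (T, H, W, C) volume by average pooling with stride s.

    Returns an array of shape (N, C), one token per pooled cell.
    """
    T, H, W, C = feat.shape
    x = feat.reshape(T // s, s, H // s, s, W // s, s, C).mean(axis=(1, 3, 5))
    return x.reshape(-1, C)

# Toy spatio-temporal feature volume: 8 time bins, 8x8 pixels, 4 channels.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 8, 4))

loc = local_tokens(feat, w=4)    # (8, 64, 4): 8 windows of 64 tokens each
glo = global_tokens(feat, s=4)   # (8, 4): 8 coarse tokens for the whole volume

# Self-attention "encoders": local attention runs inside each window,
# global attention runs over the downsampled tokens.
loc_enc = attention(loc, loc, loc)
glo_enc = attention(glo, glo, glo)

# Cross-attention "decoder": local tokens (queries) attend to global
# tokens (keys and values), fusing the two scales in token space.
loc_flat = loc_enc.reshape(-1, loc_enc.shape[-1])
fused = attention(loc_flat, glo_enc, glo_enc)

print(loc_enc.shape, glo_enc.shape, fused.shape)
```

Restricting attention to windows keeps the local branch cheap (each of the 8 windows attends over only 64 tokens), while the downsampled branch sees the entire volume at coarse resolution. Having local queries read from global keys and values is one plausible way to combine the two; the paper's decoders may fuse the features differently.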
