Learnable Burst-Encodable Time-of-Flight Imaging for High-Fidelity Long-Distance Depth Sensing
Manchao Bao, Shengjiang Fang, Tao Yue, Xuemei Hu
TL;DR
The paper tackles long-distance ToF depth imaging by introducing Burst-Encodable Time-of-Flight (BE-ToF), which uses burst-mode modulation so that the measurable phase range spans an entire burst, thereby avoiding the phase wrapping typical of iToF. It proposes an end-to-end learnable framework that jointly optimizes binarized coding functions and a Restormer-based depth reconstruction network. Hardware-friendly constraints are enforced through a double-well loss and a first-order-difference loss, and Fisher guidance is added to strengthen SNR-aware reconstruction. The approach is validated through synthetic experiments and real-world hardware prototypes, achieving state-of-the-art MAE across distances and SNRs with single-frequency modulation. The work enables robust long-range depth sensing with more practical hardware requirements, with potential impact on autonomous systems and robotics; the authors also acknowledge limitations in imaging range and discuss privacy considerations.
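To make the two hardware-friendly constraints concrete, here is a minimal sketch of what such regularizers on a sampled coding function could look like. The exact formulas are not given in this summary, so the following are assumptions: code values lie in [0, 1], the double-well term is W(x) = x²(1 − x)² (zero only at 0 and 1, which pushes codes toward binary values), and the first-order-difference term is the mean absolute difference between neighbouring samples (penalizing rapid switching). The function names and weights (`lam_dw`, `lam_fd`) are hypothetical.

```python
from typing import Sequence


def double_well_loss(code: Sequence[float]) -> float:
    """Mean of x^2 * (1 - x)^2 over samples (assumed form).

    Zero if and only if every sample is exactly 0 or 1, so minimizing it
    drives a continuous code toward a binary, hardware-implementable one.
    """
    return sum((x * x) * (1.0 - x) ** 2 for x in code) / len(code)


def first_order_difference_loss(code: Sequence[float], circular: bool = True) -> float:
    """Mean absolute difference between neighbouring samples (assumed form).

    With circular=True the code is treated as periodic, so the last sample is
    also compared with the first. Requires len(code) >= 2 when circular=False.
    """
    n = len(code)
    pairs = range(n) if circular else range(n - 1)
    return sum(abs(code[(i + 1) % n] - code[i]) for i in pairs) / len(pairs)


def coding_regularizer(code: Sequence[float], lam_dw: float = 1.0, lam_fd: float = 0.1) -> float:
    """Hypothetical weighted sum of the two constraint terms."""
    return lam_dw * double_well_loss(code) + lam_fd * first_order_difference_loss(code)


if __name__ == "__main__":
    binary_code = [0, 0, 1, 1]       # already binary: no double-well penalty
    soft_code = [0.5, 0.5, 0.5, 0.5]  # maximally non-binary but perfectly smooth
    print(double_well_loss(binary_code), first_order_difference_loss(binary_code))
    print(double_well_loss(soft_code), first_order_difference_loss(soft_code))
```

The two terms pull in different directions: the double-well term favours binary levels, while the difference term discourages the many on/off transitions that a light source and sensor would have to follow. In a learned pipeline these would be added to the reconstruction loss with tuned weights.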
Abstract
Long-distance depth imaging holds great promise for applications such as autonomous driving and robotics. Direct time-of-flight (dToF) imaging offers high-precision, long-distance depth sensing, yet demands ultra-short-pulse light sources and high-resolution time-to-digital converters. In contrast, indirect time-of-flight (iToF) imaging often suffers from phase wrapping and low signal-to-noise ratio (SNR) as the sensing distance increases. In this paper, we introduce a novel ToF imaging paradigm, termed Burst-Encodable Time-of-Flight (BE-ToF), which enables high-fidelity, long-distance depth imaging. Specifically, the BE-ToF system emits light pulses in burst mode and estimates the phase delay of the reflected signal over the entire burst period, thereby effectively avoiding the phase wrapping inherent to conventional iToF systems. Moreover, to address the low SNR caused by light attenuation over increasing distances, we propose an end-to-end learnable framework that jointly optimizes the coding functions and the depth reconstruction network. A specialized double-well function and a first-order difference term are incorporated into the framework to ensure that the coding functions are implementable in hardware. The proposed approach is rigorously validated through comprehensive simulations and real-world prototype experiments, demonstrating its effectiveness and practical applicability.
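The phase-wrapping argument can be illustrated with an idealized, noise-free model. In single-frequency iToF with modulation frequency f, the measured phase is φ = 4πfd/c modulo 2π, so depth is only unambiguous up to c/(2f). If, as the abstract describes, the delay is instead estimated over an entire burst period T, the ambiguity moves out to cT/2. The sketch below assumes perfect phase/delay estimation and ignores the learned coding functions, noise, and the actual burst structure; the function names and parameter values are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def itof_measured_depth(true_depth_m: float, mod_freq_hz: float) -> float:
    """Depth recovered by an idealized single-frequency iToF pixel.

    The round-trip phase 4*pi*f*d/c is only known modulo 2*pi, so the
    recovered depth wraps every c / (2 f) metres.
    """
    phase = (4.0 * math.pi * mod_freq_hz * true_depth_m / C) % (2.0 * math.pi)
    return C * phase / (4.0 * math.pi * mod_freq_hz)


def burst_measured_depth(true_depth_m: float, burst_period_s: float) -> float:
    """Depth from a delay estimated over a whole burst period (assumed model).

    The round-trip delay 2d/c is only known modulo the burst period T,
    so wrapping occurs only beyond c * T / 2 metres.
    """
    delay = (2.0 * true_depth_m / C) % burst_period_s
    return C * delay / 2.0


if __name__ == "__main__":
    f = 20e6       # 20 MHz modulation: unambiguous range of about 7.49 m
    period = 1e-6  # 1 us burst period: unambiguous range of about 149.9 m
    for d in (5.0, 30.0, 100.0):
        print(f"{d:6.1f} m -> iToF {itof_measured_depth(d, f):7.3f} m, "
              f"burst {burst_measured_depth(d, period):7.3f} m")
```

With these illustrative numbers, a target at 30 m is reported by the idealized iToF pixel at roughly 0.02 m (wrapped four times), while the burst-period estimate returns 30 m. In practice longer delay windows come at the cost of lower per-sample SNR, which is the problem the learned coding and reconstruction framework is meant to address.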
