Streaming quanta sensors for online, high-performance imaging and vision
Tianyi Zhang, Matthew Dutson, Vivek Boominathan, Mohit Gupta, Ashok Veeraraghavan
TL;DR
The paper tackles the data-bandwidth and processing bottlenecks of ultra-fast SPAD-based quanta image sensors (QIS) by introducing a compact streaming representation that is updated with every binary frame and stores information at multiple time scales. A feed-forward neural network reconstructs intensity frames from an 8-channel streaming exposure stack in real time (10–30 fps), yielding a ~100× bandwidth reduction and a 10^4–10^5× reduction in computation relative to prior methods. The approach enables online image reconstruction on QIS and supports downstream vision tasks (detection, tracking, pose estimation) with real-time performance; it is validated on synthetic and real data using a semi-realistic QIS dataset. The work shows how streaming perception can bridge high-speed sensing and practical vision systems, and it outlines limitations and avenues for end-to-end streaming architectures and alternative representations. By lowering data and compute requirements, the method makes real-time QIS imaging and inference feasible in resource-constrained settings.
Abstract
Recently, quanta image sensors (QIS) -- ultra-fast, zero-read-noise binary image sensors -- have demonstrated remarkable imaging capabilities in many challenging scenarios. Despite their potential, the adoption of these sensors is severely hampered by (a) high data rates and (b) the need for new computational pipelines to handle the unconventional raw data. We introduce a simple, low-bandwidth computational pipeline to address these challenges. Our approach is based on a novel streaming representation with a small memory footprint that efficiently captures intensity information at multiple temporal scales. Updating the representation requires only 16 floating-point operations per pixel, so it can be computed online at the native frame rate of the binary frames. We use a neural network operating on this representation to reconstruct videos in real time (10-30 fps). We illustrate why such a representation is well suited to these emerging sensors, and how it offers low latency and high frame rate while retaining flexibility for downstream computer vision. Our approach yields a significant reduction in data bandwidth (~100X) and enables real-time image reconstruction and computer vision, with a $10^4$-$10^5\times$ reduction in computation relative to the existing state-of-the-art approach at comparable quality. To the best of our knowledge, our approach is the first to achieve online, real-time image reconstruction on QIS.
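
To make the idea of a per-frame, multi-time-scale update concrete, the following is a minimal sketch of one way such a representation could be maintained. It assumes the representation is a bank of 8 exponential moving averages with different decay rates; the abstract does not state the update rule, so the rule itself, the decay values in ALPHAS, and the function names are all hypothetical. Under this assumption each channel costs one subtraction and one fused multiply-add per pixel, which would be in line with the stated 16 operations per pixel for 8 channels.

```python
import numpy as np

# ASSUMPTION: 8 channels, each an exponential moving average (EMA) with its own
# decay rate. These rates are illustrative and are not taken from the paper.
ALPHAS = np.array([2.0 ** -k for k in range(1, 9)], dtype=np.float32)


def init_stack(height, width):
    """Allocate an all-zero streaming stack of shape (8, height, width)."""
    return np.zeros((len(ALPHAS), height, width), dtype=np.float32)


def update_stack(stack, binary_frame):
    """Fold one binary frame (height, width) of 0/1 values into the stack in place.

    Per channel and pixel: s <- s + alpha * (b - s), i.e. one subtraction and
    one multiply-add. Channels with small alpha integrate over long time
    scales; channels with large alpha track recent frames.
    """
    b = binary_frame.astype(np.float32)[None, :, :]
    a = ALPHAS[:, None, None]
    stack += a * (b - stack)
    return stack


def demo(num_frames=1000, seed=0):
    """Stream simulated binary frames with a constant detection probability."""
    rng = np.random.default_rng(seed)
    flux = np.full((4, 4), 0.3, dtype=np.float32)  # hypothetical per-pixel probability
    stack = init_stack(4, 4)
    for _ in range(num_frames):
        frame = rng.random((4, 4)) < flux  # Bernoulli photon detections
        update_stack(stack, frame)
    return stack
```

In this sketch only the fixed-size stack is kept in memory and the binary frames are discarded after each update, which is the property that allows a downstream network to read the stack at video rate instead of consuming every binary frame.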
