bit2bit: 1-bit quanta video reconstruction via self-supervised photon prediction

Yehe Liu, Alexander Krull, Hector Basevi, Ales Leonardis, Michael W. Jenkins

Abstract

Quanta image sensors, such as SPAD arrays, are an emerging sensor technology that produces 1-bit arrays representing photon detection events over exposures as short as a few nanoseconds. In practice, raw data are post-processed with heavy spatiotemporal binning to create more useful and interpretable images, at the cost of degraded spatiotemporal resolution. In this work, we propose bit2bit, a new method for reconstructing high-quality image stacks at the original spatiotemporal resolution from sparse binary quanta image data. Inspired by recent work on Poisson denoising, we develop an algorithm that creates a dense image sequence from sparse binary photon data by predicting the probability distribution of photon arrival locations. However, we show that, because the data are binary, the assumption of a Poisson distribution is inadequate. Instead, we model the process as a Bernoulli lattice process derived from a truncated Poisson distribution. This leads us to propose a novel self-supervised solution based on a masked loss function. We evaluate our method on both simulated and real data. On simulated data from a conventional video, we achieve a mean PSNR of 34.35 dB with extremely photon-sparse binary input (<0.06 photons per pixel per frame). We also present a novel dataset containing a wide range of real SPAD high-speed videos under various challenging imaging conditions, including strong and weak ambient light, strong motion, and ultra-fast events. The dataset will be made available to the community, and we use it to demonstrate the promise of our approach. Our method substantially surpasses state-of-the-art methods, such as Quanta Burst Photography (QBP), in both reconstruction quality and throughput. Our approach significantly enhances the visualization and usability of the data, enabling the application of existing analysis techniques.
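The modelling argument above can be made concrete with a small forward model. If a pixel receives $N \sim \mathrm{Poisson}(\lambda)$ photons during one exposure but can only report whether at least one arrived, the recorded bit is Bernoulli with $P(b=1) = 1 - P(N=0) = 1 - e^{-\lambda}$. A Poisson likelihood on the bits would instead treat their mean as $\lambda$, which breaks down as $\lambda$ grows: for $\lambda = 2$ the expected photon count is 2, but the bit is at most 1 and fires with probability $1 - e^{-2} \approx 0.86$. The NumPy sketch here simulates binary frames from a flux video and implements the naive binning baseline, inverting the Bernoulli model with $\hat\lambda = -\ln(1 - \bar b)$. The function names and the use of NumPy are illustrative assumptions, not part of bit2bit.

```python
import numpy as np


def simulate_spad_frames(signal, rng=None):
    """Simulate 1-bit quanta frames from a nonnegative photon-flux video.

    signal: array (T, H, W) of expected photons per pixel per frame (lambda).
    Returns uint8 frames with 1 wherever at least one photon arrived.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = rng.poisson(signal)           # Poisson photon arrivals per pixel
    return (counts > 0).astype(np.uint8)   # detector reports at most one event


def detection_probability(lam):
    """P(b = 1) = 1 - exp(-lambda), i.e. the Poisson truncated to one bit."""
    return -np.expm1(-lam)


def binned_flux_estimate(bits, t_bin):
    """Naive baseline (hypothetical helper): average t_bin consecutive frames,
    then invert the Bernoulli model to estimate lambda per pixel."""
    n = bits.shape[0] - bits.shape[0] % t_bin          # drop incomplete bin
    q = bits[:n].reshape(-1, t_bin, *bits.shape[1:]).mean(axis=1)
    q = np.clip(q, 0.0, 1.0 - 1e-6)                    # avoid log(0) if every frame fired
    return -np.log1p(-q)                               # lambda_hat = -ln(1 - q)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lam = np.full((1000, 8, 8), 0.05)                  # 0.05 photons per pixel per frame
    bits = simulate_spad_frames(lam, rng)
    print(bits.mean(), detection_probability(0.05))
    lam_hat = binned_flux_estimate(bits, t_bin=100)    # shape (10, 8, 8)
    print(lam_hat.shape, lam_hat.mean())
```

Binning 100 frames here yields a usable flux estimate only by reducing 1000 frames to 10, which illustrates the loss of temporal resolution that the abstract attributes to heavy binning.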

Paper Structure

This paper contains 42 sections, 6 equations, 22 figures, and 9 tables.

Figures (22)

  • Figure 1: Visualization of the reconstruction task. a. A signal in spacetime generates discrete photons through a Poisson process. Real detectors can only count one photon at a time. The discrete nature of photons and the discrete counting process introduce shot noise, resulting in a sparse binary map. Our goal is to predict the underlying signal from this information-sparse data. b. Real SPAD raw data captured by a detector. The highlighted box indicates a zoomed-in region, revealing sparse photon detection events. To the right is a cross-section along the time and height dimensions, showing a similar noisy binary pattern. c. Our method reconstructs a video from the data in b at the original spatiotemporal resolution (Video S1). d. Left: the effect of directly accumulating raw data frames, showing shot noise and motion artifacts. Right: additional keyframe pairs are provided for reference.
  • Figure 2: Example results from our method using real SPAD data. The top row displays raw SPAD data, and the middle row shows the corresponding reconstructions using our method. CPU Fan + motion: imaged under camera motion. Additional paired raw-data and reconstruction keyframes are shown below. H&E slide: slide moving under a microscope. Sonicating bubbles: a humidifier generating bubbles, water droplets, and mist. USAF 1951 + drill: resolution target spinning on a drill. Plasma ball: firing plasma. A color-coded accumulation of 50 frames is shown on the right. More examples are provided in the supplementary material.
  • Figure 3: Overview of the sampling/masking strategy. The raw data are processed in 3D so that space and time are treated similarly. Data pairs are created by taking a random 3D crop from the raw data and then randomly splitting the positive values into an input matrix and a target matrix. The split ratio is controlled by a parameter p. A mask is created by flipping the bits in the input image, which prevents gradient back-propagation from locations of 1s in the input. This process is repeated indefinitely, each time creating a new data pair equivalent to independent observations of the underlying signal.
  • Figure 4: Real data examples of photon splitting and the effect of the masked loss. a. Example of splitting a randomly selected raw quanta image frame. The raw data consist only of binary pixels indicating the locations of photon detection events. The Split panel shows the Input (black) and Target (white) of the split. The Mask is calculated by inverting the Input and is applied to the loss. b. Comparison of training results with unmasked and masked loss. Without the masked loss, the network learns that whenever a pixel location has a photon in the input, it never has a photon in the target. This deterministic relationship leads to artifacts: pixel locations where the input is 1 appear dark in the network output. The masked loss effectively addresses this problem.
  • Figure 5: Results of ablation studies. a. Group normalization substantially improved the PSNR. b, c. The choice of the lower (b) and upper (c) bounds of the thinning probability p affects the reconstruction quality (.9x6 denotes 0.999999 = $1-10^{-6}$, etc.). d. A fixed large p led to performance degradation, despite the single-photon prediction suggested in GAP. e. A large model size could negatively impact PSNR. Roman numerals indicate the corresponding images in Fig. S3. Numerical values are given in Tables S2-S6.
  • ...and 17 more figures
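The pair-generation and masking steps described in the Figure 3 and Figure 4 captions can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: p is assumed to be the probability that a detected photon is kept in the input (the remainder go to the target), p is assumed to be drawn uniformly between a lower and an upper bound for each crop, and the masked loss is assumed to be a per-voxel Bernoulli negative log-likelihood (binary cross-entropy). Names such as `make_training_pair` and `masked_bce` are hypothetical. Because input and target are disjoint, the target is always 0 wherever the input is 1; multiplying the per-voxel loss by the inverted input removes exactly those voxels, so in an autodiff framework no gradient flows from them.

```python
import numpy as np


def random_crop_3d(raw, size, rng):
    """Take a random (t, h, w) crop from a (T, H, W) binary stack."""
    t, h, w = size
    T, H, W = raw.shape
    t0 = rng.integers(0, T - t + 1)
    y0 = rng.integers(0, H - h + 1)
    x0 = rng.integers(0, W - w + 1)
    return raw[t0:t0 + t, y0:y0 + h, x0:x0 + w]


def make_training_pair(raw, size, p_low, p_high, rng):
    """Split the detected photons of one crop into input/target plus a loss mask.

    Assumption: p is the probability that a photon stays in the input,
    drawn uniformly from [p_low, p_high] for each crop.
    """
    crop = random_crop_3d(raw, size, rng).astype(bool)
    p = rng.uniform(p_low, p_high)
    to_input = rng.random(crop.shape) < p
    inp = crop & to_input            # photons kept as network input
    tgt = crop & ~to_input           # remaining photons become the target
    mask = ~inp                      # flipped input bits: 0 where input has a photon
    return inp.astype(np.float32), tgt.astype(np.float32), mask.astype(np.float32)


def masked_bce(pred_prob, target, mask, eps=1e-7):
    """Bernoulli NLL (assumed loss form) averaged over unmasked voxels only."""
    pred_prob = np.clip(pred_prob, eps, 1.0 - eps)
    nll = -(target * np.log(pred_prob) + (1 - target) * np.log(1 - pred_prob))
    return float((nll * mask).sum() / max(mask.sum(), 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    raw = (rng.random((50, 32, 32)) < 0.1).astype(np.uint8)   # toy binary stack
    inp, tgt, mask = make_training_pair(raw, (8, 16, 16), 0.5, 0.9, rng)
    pred = np.full(inp.shape, 0.05, dtype=np.float32)         # stand-in network output
    print(masked_bce(pred, tgt, mask))
```

In a PyTorch or JAX training loop the same idea would apply: multiply the per-voxel loss by the mask before reducing, and draw a fresh crop and split at every step so that each pair acts as a new observation of the underlying signal.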