Machine vision with small numbers of detected photons per inference

Shi-Yuan Ma; Jérémie Laydevant; Mandar M. Sohoni; Logan G. Wright; Tianyu Wang; Peter L. McMahon

Abstract

Machine vision, including object recognition and image reconstruction, is a central technology in many consumer devices and scientific instruments. The design of machine-vision systems has been revolutionized by the adoption of end-to-end optimization, in which the optical front end and the post-processing back end are jointly optimized. However, while machine vision currently works extremely well in moderate-light or bright-light situations -- where a camera may detect thousands of photons per pixel and billions of photons per frame -- it is far more challenging in very low-light situations. We introduce photon-aware neuromorphic sensing (PANS), an approach for end-to-end optimization in highly photon-starved scenarios. The training incorporates knowledge of the low photon budget and the stochastic nature of light detection when the average number of photons per pixel is near or less than 1. We report a proof-of-principle experimental demonstration in which we performed low-light image classification using PANS, achieving 73% (82%) accuracy on FashionMNIST with an average of only 4.9 (17) detected photons in total per inference, and 86% (97%) on MNIST with 8.6 (29) detected photons -- orders of magnitude more photon-efficient than conventional approaches. We also report simulation studies showing how PANS could be applied to other classification, event-detection, and image-reconstruction tasks. By taking into account the statistics of measurement results for non-classical states or alternative sensing hardware, PANS could in principle be adapted to enable high-accuracy results in quantum and other photon-starved setups.
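To give a sense of the photon-starved regime the abstract describes, the following minimal sketch simulates conventional direct imaging with only a handful of detected photons per frame. It is an illustration, not the authors' code: it assumes independent Poisson (shot-noise) statistics per pixel, and the function name `sample_photon_frame` and the rectangular test image are hypothetical.

```python
import numpy as np

def sample_photon_frame(image, n_det, rng):
    """Draw one shot-noise-limited frame with n_det detected photons on average."""
    image = np.asarray(image, dtype=float)
    # Scale intensities so the expected photon counts sum to n_det.
    mean_counts = n_det * image / image.sum()
    # Assumption: per-pixel counts are independent Poisson variables.
    return rng.poisson(mean_counts)

rng = np.random.default_rng(0)
image = np.zeros((28, 28))
image[8:20, 10:18] = 1.0  # hypothetical bright rectangle standing in for an object

# 2000 independent frames of the same object, each with about 5 photons in total.
frames = np.stack(
    [sample_photon_frame(image, n_det=5.0, rng=rng) for _ in range(2000)]
)
mean_total = frames.sum(axis=(1, 2)).mean()
print("average detected photons per frame:", mean_total)
```

With 5 photons spread over 96 lit pixels, the mean count per lit pixel is about 0.05, so almost every pixel in a given frame records nothing, which is why downstream inference on such frames is hard.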

Paper Structure (6 sections, 2 equations, 5 figures)

Introduction
Photon-aware neuromorphic sensing (PANS) with highly restricted photon counts
Quantifying information loss at the detection bottleneck
Active PANS using structured illumination
Passive PANS with optical linear operations
Discussion

Figures (5)

Figure 1: Detection bottleneck in optical sensing and photon-aware neuromorphic sensing (PANS) under limited photon counts. A, Conceptual optical sensing pipeline. An object (illustrated by a cat) interacts with a probe signal (e.g., light) shaped by a configurable optical front end (green), which may operate in active (controlled illumination) or passive (incoming-signal modulation) modes. The detector then converts the incident optical signal into digital data via single-photon detection (SPD). When photon budgets are highly limited, this conversion presents a detection bottleneck with significant information loss (lower schematic), which cannot be recovered by subsequent digital processing. A digital back end (e.g., a post-processing neural network) extracts task-relevant information from the detected data. B, Direct imaging vs. PANS under limited photons. Top: in a conventional direct-imaging pipeline, photon-limited image frames (shown as repeated realizations across independent trials 1, 2, 3) exhibit strong shot noise, making downstream inference challenging (e.g., cat vs. dog). Bottom: PANS introduces a parameterized optical front end that transforms the optical field before detection, producing photon-efficient feature measurements; the front end and digital back end are jointly optimized end-to-end through the stochastic detection bottleneck. C, Photon detection bottleneck. With mean photon energy $\lambda$, SPD produces a discrete stochastic digital readout $a$. PANS faithfully models this stochastic forward propagation and applies gradient estimation to enable estimated backpropagation (backprop) through the detection bottleneck, allowing end-to-end optimization under photon-budget constraints.
Figure 2: Active photon-aware neuromorphic sensing (PANS) demonstrated on FashionMNIST object classification. A, Direct imaging (conventional approach). Uniform illumination probes the object, and single-photon detectors directly capture an image frame with $d_{\text{obj}}$ pixels. $N_\text{illu}$ and $N_\text{det}$ denote average total illumination and detection photon budgets, respectively. The example object image (a sneaker) is taken from the FashionMNIST dataset. B, Image frames degrade with decreasing $N_\text{det}$ (denoted above each column) for a pullover (top) and a shirt (bottom). Frames become increasingly noisy as the photon budget decreases. C, Quantifying information loss at the detection bottleneck. As $N_\text{det}$ decreases (left to right), three metrics decline: mutual information with labels (top), Fisher discriminant ratio (FDR; middle), and the test accuracy using a convolutional neural network (bottom). D, Active PANS (our approach). $d_\mathrm{f}$ illumination patterns are projected onto the object, producing a $d_\mathrm{f}$-dimensional feature vector through single-photon detection (see Fig. 3 and Appendix 12 for details of the experimental protocol). E, Experimental results on FashionMNIST. Top: confusion matrices at two different photon budgets. Bottom: test accuracy vs. $N_{\mathrm{det}}$ (left) and $N_{\mathrm{illu}}$ (right). $N_{\mathrm{illu}}$ is the total illumination incident on the object (uniform for direct imaging; sum of pattern intensities for structured illumination; Appendix 5). Red markers: active PANS experiment (mean $\pm$ std over 30 trials per image) for $d_\text{f} = 3, 4, 6, 10, 16, 24, 32$; light red shade: corresponding simulation (mean $\pm$ 3 std). Blue curve: direct imaging baseline (from C). Green curve: conventional E2E without photon-aware modeling (non-PA E2E; Appendix 8C). F, 2D t-SNE (van der Maaten and Hinton, 2008) visualization comparing feature distributions: active PANS (red boxes) versus direct imaging (blue boxes) at different $N_\text{det}$ values, with test accuracies shown.
Figure 3: Stochastic single-shot inference and experimental validation of active PANS. A, Stochastic single-shot inference under extreme photon constraints. $d_\mathrm{f}$ learned illumination patterns $\{\vec{w}_i\}_{i=1}^{d_\mathrm{f}}$ are sequentially projected onto an object with transmission $\vec{x}$, each producing a binary single-photon detection (SPD) readout. Because detection is highly stochastic at these photon levels, $n_\mathrm{T}$ independent trials on the same object yield different feature vectors (bottom), each processed by the digital back end. Despite this trial-to-trial variability, the system consistently identifies the correct class across trials. The aggregate output distribution over $n_\mathrm{T}$ inferences (right) reflects the classification confidence (Appendix 13). B--C, Experimental classification accuracy (red, mean $\pm$ std over $n_\mathrm{T}=30$ trials) versus number of illumination patterns $d_\mathrm{f}$ for FashionMNIST (B) and MNIST (C) under single-shot operation. The FashionMNIST data correspond to the red markers in Fig. 2E, here plotted against $d_\mathrm{f}$. Simulation results (light red band, mean $\pm\,3$ std) show close agreement with experiment. Annotations indicate the average total detected photon budget per inference $N_{\mathrm{det}}$ at selected $d_\mathrm{f}$ (Appendix 13).
Figure 4: Proposed real-time image sensing with active photon-aware neuromorphic sensing (PANS) in simulation. A, Conceptual wavelength-multiplexed implementation for flow-cytometric cell sorting. Multiple static illumination patterns at distinct optical wavelengths (illustrated as $\vec{w}_1,\vec{w}_2,\vec{w}_3$ with different colors) are applied simultaneously; wavelength demultiplexing routes each channel to a dedicated photon counter, producing activations $(a_1,a_2,a_3)$ in parallel for real-time digital processing. B, Simulated test accuracy for cell-organelle classification versus total detected photons $N_{\mathrm{det}}$ under different detector dark-count rates (DCRs), compared with an ideal direct-imaging baseline. C, Example real-time sequence. Left: representative frames as a cell traverses the illumination field (top to bottom). Right: corresponding model outputs over time across classes (Mem.: membrane; Nuc.: nucleolus; Mit.: mitochondria; Null: no cell present). D, Barcode identification task. The illumination field spans a 10-bar window (red box); the goal is to decide whether the target subsequence "1010" appears at any position. E, Simulated test accuracy for barcode identification versus $N_{\mathrm{det}}$ under multiple DCR values. Direct imaging accounts only for ideal shot noise (no dark counts or additional detector noise), highlighting the robustness of active PANS under realistic counting noise.
Figure 5: Diagram and applications of passive photon-aware neuromorphic sensing (PANS) in simulation. A, Passive PANS vs. direct imaging for sensing images transmitted through a scattering multimode fiber (MMF). Input images propagate through the MMF, emerging as speckle patterns that scramble spatial information. In passive PANS, speckles pass through a passive optical encoder before detection; in direct imaging, speckle frames are captured directly at the image plane with $N_\text{det}$ photons per frame. Both schemes use post-processing neural networks for classification or reconstruction. B, Classification accuracy on MNIST speckle images versus $N_\text{det}$, with dark-count rates (DCRs) of 1%, 5%, and 10% for passive PANS. Direct imaging (blue curve) is simulated with ideal shot noise only. Inset: direct imaging accuracy at higher $N_\text{det}$. Red markers show passive PANS with $d_\mathrm{f}=4,5,6,8,10,16,24,32$. C, Average structural similarity index (SSIM) of images reconstructed from scattered speckles, evaluated with different DCRs. Passive PANS data points: $d_\mathrm{f}=4,6,8,10,32,48,64$. D, Example images showing original MNIST digits (top), corresponding speckle patterns (middle), and reconstructed images (bottom) from passive PANS ($d_\mathrm{f} = 64$, DCR = 1%). E, Transient event detection: fleeting objects in noisy backgrounds are identified in a monitored scene (left). Right: test accuracy vs. $N_\text{det}$. Passive PANS: $d_\mathrm{f}=2,4,5,6,8,10$. F, Tissue blood flow detection via speckle contrast imaging. G, Compact nebula classification. H, Optical fiber end-face contamination inspection. Insets show direct imaging performance at higher photon counts.
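
The stochastic single-shot readout described in the captions of Figures 1C and 3A can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the mean detected photon number for pattern $i$ is $\lambda_i = \vec{w}_i \cdot \vec{x}$, that photon statistics are Poissonian, and that an ideal click detector fires with probability $1 - e^{-\lambda_i}$ (no dark counts). The function name `spd_features` and the random stand-ins for the learned patterns are hypothetical, and the gradient-estimation step used for training is not shown.

```python
import numpy as np

def spd_features(patterns, x, rng, n_trials=1):
    """Sample binary click vectors for object x under d_f illumination patterns."""
    # Assumed mean detected photon number per pattern: lambda_i = w_i . x
    lam = patterns @ x
    # Assumption: Poisson statistics and an ideal click detector,
    # so P(click) = 1 - exp(-lambda_i).
    p_click = 1.0 - np.exp(-lam)
    # Each trial draws an independent 0/1 readout for every pattern.
    clicks = rng.random((n_trials, lam.size)) < p_click
    return clicks.astype(np.int8), p_click

rng = np.random.default_rng(1)
d_f, d_obj = 4, 784
x = rng.random(d_obj)                       # hypothetical object transmission in [0, 1]
patterns = rng.random((d_f, d_obj))         # random stand-ins for learned patterns
patterns /= (patterns @ x)[:, None]         # rescale so every lambda_i equals 1

# 30 independent trials on the same object, mirroring n_T = 30 in Figure 3.
features, p_click = spd_features(patterns, x, rng, n_trials=30)
print(features[:3])   # feature vectors differ from trial to trial
print(p_click)        # each entry is 1 - exp(-1), about 0.632
```

Each row of `features` is one $d_\mathrm{f}$-dimensional binary feature vector of the kind a digital back end would classify; repeating the draw shows the trial-to-trial variability that the captions describe.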
