Deep Feature-specific Imaging

Yizhou Lu, Andreas Velten

TL;DR

DeepFSI addresses the challenge of Poisson-dominated noise in photon-counting sensors by introducing an end-to-end optical-electronic framework that learns measurement masks $\boldsymbol{M}$ through backpropagation under realistic noise. By unfreezing the masks of traditional FSI and optimizing the sensing layer jointly with a task-specific classifier, DeepFSI delivers higher feature fidelity and task performance than predefined FSI masks, including under additive Gaussian noise, and remains robust across mask counts and photon budgets. The approach is validated on MNIST with simulations and hardware single-pixel camera (SPC) experiments, and extended to CIFAR10 with a Vision Transformer-based end-to-end variant (OViT), showing generalization to more complex tasks. These results suggest a practical pathway for noise-robust, photon-limited computational imaging by co-optimizing optics and inference, with potential for real-world imaging systems and downstream computer-vision applications.

Abstract

Modern photon-counting sensors are increasingly dominated by Poisson noise, yet conventional Feature-Specific Imaging (FSI) is optimized for additive Gaussian noise, so it performs suboptimally and loses its advantages under Poisson noise. To address this, we introduce DeepFSI, a novel end-to-end optical-electronic framework. DeepFSI "unfreezes" traditional FSI masks, enabling a deep neural network to learn globally optimal measurement masks by computing gradients directly under realistic Poisson and additive noise conditions. Our simulations demonstrate DeepFSI's superior feature fidelity and task performance compared to conventional FSI with predefined masks, especially in Poisson-noise-dominant environments. DeepFSI also exhibits enhanced robustness to design choices and performs well under additive Gaussian noise, representing a significant advance for noise-robust computational imaging in photon-limited applications.
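
As an illustration of the noise model described above, the following Python sketch simulates photon counts for a scene $\boldsymbol{x}$ under masks $\boldsymbol{M}$: Poisson shot noise on the expected counts, plus additive Gaussian read noise. The sigmoid parameterization of the masks, the unit-intensity normalization, and the names `simulate_measurements`, `mask_logits`, `photon_budget`, and `read_sigma` are assumptions made for this example, not details taken from the paper. It covers only the forward pass; how gradients are propagated through the sampling step is not specified here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def simulate_measurements(x, mask_logits, photon_budget, read_sigma, rng):
    """Simulate noisy counts y for scene x under masks M = sigmoid(mask_logits).

    All parameter names and the sigmoid constraint are illustrative assumptions.
    """
    masks = sigmoid(mask_logits)                  # (num_masks, num_pixels), values in (0, 1)
    x = x / x.sum()                               # normalise scene to unit total intensity
    expected = photon_budget * (masks @ x)        # mean photon count per mask
    counts = rng.poisson(expected).astype(float)  # Poisson (shot) noise
    counts += rng.normal(0.0, read_sigma, size=counts.shape)  # additive Gaussian noise
    return counts, expected


rng = np.random.default_rng(0)
scene = rng.random(64)                   # toy 8x8 scene, flattened
logits = rng.normal(size=(8, 64))        # 8 masks with unconstrained parameters
counts, expected = simulate_measurements(scene, logits, 1e4, 0.0, rng)
```

Setting `read_sigma` to zero gives the purely Poisson case; lowering `photon_budget` makes the shot noise larger relative to the signal, which is the photon-limited regime the abstract refers to.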

Paper Structure

This paper contains 18 sections, 4 equations, 10 figures.

Figures (10)

  • Figure 1: Single-pixel camera configuration. $\boldsymbol{x}$: field of view, $\boldsymbol{M}$: coding masks, $\boldsymbol{y}$: photon counts at the sensor, $\boldsymbol{M}^{-1}$: reconstruction operator, $\tilde{\boldsymbol{x}}$: reconstructed object. (a) Raster scan: only one pixel (white) is scanned in each measurement, and the measured data require no reconstruction. (b) Basis scan: the sum of all white pixels is measured in each measurement, and a decoding step is required to reconstruct the field of view.
  • Figure 2: (a) The general configuration of scanner-classifier networks. (b) The node-level architecture of our scanner (green diamonds)-classifier (blue circles) network. Noise is applied after the sensing matrix $\boldsymbol{M}$. Unlike other coding schemes, DeepFSI has a trainable scanner in which $\boldsymbol{M}$ is not fixed and can be optimized using the gradient of the classifier.
  • Figure 3: OViT overview. The input image is first divided into fixed-size grayscale patches. Poisson noise is then added after the initial linear projection to mimic optical coding within a vision system. These noisy embeddings are subsequently fed into a Vision Transformer. The picture was photographed by the author (Lu).
  • Figure 4: Experimental configuration of the single-pixel camera: The light path is illustrated by the blue arrows in the diagram. Specifically, our setup utilizes only one branch reflected by the DMD.
  • Figure 5: Raw image (left) vs. SPC-observed image (right).
  • ...and 5 more figures
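
The two scanning modes in the Figure 1 caption can be sketched in a few lines of Python. In this toy, noise-free example, a raster scan uses identity masks and returns the scene directly, while a basis scan uses 0/1 patterns and needs the decoding step $\boldsymbol{M}^{-1}$. The choice of Hadamard-derived 0/1 patterns, the toy scene, and the helper names `hadamard` and `measure` are assumptions for illustration; the caption does not say which basis the paper uses.

```python
import numpy as np


def hadamard(n):
    """Sylvester Hadamard matrix of order n (n must be a power of two)."""
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.kron(h, np.array([[1, 1], [1, -1]]))
    return h


def measure(masks, x, rng=None):
    """Return M @ x, Poisson-sampled when an rng is supplied."""
    mean = masks @ x
    return mean if rng is None else rng.poisson(mean).astype(float)


n = 16
x = np.arange(1, n + 1, dtype=float) * 10.0   # toy scene: photons per pixel

raster_masks = np.eye(n)                 # one open (white) pixel per measurement
basis_masks = (hadamard(n) + 1) / 2      # 0/1 patterns with many open pixels

# Raster scan: the measurements already are the image.
x_raster = measure(raster_masks, x)

# Basis scan: each measurement sums many pixels, so decode with M^{-1}.
x_basis = np.linalg.solve(basis_masks, measure(basis_masks, x))
```

Passing a generator, for example `measure(basis_masks, x, np.random.default_rng(0))`, replaces the exact sums with Poisson-distributed counts, so the decoded image is then only an estimate of `x`.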