
Compute-first optical detection for noise-resilient visual perception

Jungmin Kim, Nanfang Yu, Zongfu Yu

TL;DR

This work introduces a compute-first optical detection paradigm in which an optical processor concentrates scene light through a linear, unitary transform $P$ before detection, improving the robustness of visual perception tasks to detection noise. The authors formulate a discrete, unitary model $\mathbf{E}^\mathrm{(out)} = P \mathbf{E}^\mathrm{(in)}$ with detection noise sources $\Delta I_\mathrm{photon}$ and $\Delta I_\mathrm{dark}$, and demonstrate improved MNIST classification accuracy under dark noise using both machine-learned and manually designed optical transforms. A key finding is that signal concentration improves resilience to dark noise but not to photon shot noise, which scales with the output intensity; stronger compression via block-wise Fourier processing can further boost robustness at the cost of information loss. The paper also validates a practical incoherent meta-imaging system with trainable metalenses that yields higher-contrast images and better downstream machine perception under noise, illustrating the potential of optical computing to advance infrared machine vision in industrial and defense contexts. Overall, the work argues for pre-detection optical computation as a route to noise-resilient perception and supports it with theory, simulation, and a practical diffractive-optics implementation.

Abstract

In the context of visual perception, the optical signal from a scene is transferred into the electronic domain by detectors in the form of image data, which are then processed for the extraction of visual information. In noisy and weak-signal environments such as thermal imaging for night vision applications, however, the performance of neural computing tasks faces a significant bottleneck due to the inherent degradation of data quality upon noisy detection. Here, we propose a concept of optical signal processing before detection to address this issue. We demonstrate that spatially redistributing optical signals through a properly designed linear transformer can enhance the detection noise resilience of visual perception tasks, as benchmarked with the MNIST classification. Our idea is supported by a quantitative analysis detailing the relationship between signal concentration and noise robustness, as well as its practical implementation in an incoherent imaging system. This compute-first detection scheme can pave the way for advancing infrared machine vision technologies widely used for industrial and defense applications.
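The compute-first idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `fourier_opu` and `detect`, the flat-phase input field, the Gaussian model for dark noise, the Poisson model for photon shot noise, and the random toy scene are all assumptions made for illustration. It shows a unitary transform (here a 2D Fourier transform) that conserves total optical power while concentrating it onto a few detector pixels.

```python
import numpy as np

def fourier_opu(intensity_in):
    """Hypothetical optical processing unit: unitary 2D Fourier transform of the field."""
    e_in = np.sqrt(intensity_in)             # assumption: flat-phase input field
    e_out = np.fft.fft2(e_in, norm="ortho")  # norm="ortho" makes the DFT unitary
    return np.abs(e_out) ** 2                # detectors respond to intensity

def detect(intensity, sigma_dark=0.0, exposure=None, rng=None):
    """Hypothetical detector: x = I_out + dI_dark + dI_photon."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(intensity, dtype=float).copy()
    if exposure is not None:
        # Assumed shot-noise model: Poisson counts with mean I * exposure, rescaled.
        x = rng.poisson(x * exposure) / exposure
    if sigma_dark > 0:
        # Assumed dark-noise model: zero-mean Gaussian, independent of the signal.
        x = x + rng.normal(0.0, sigma_dark, size=x.shape)
    return x

rng = np.random.default_rng(0)
scene = rng.uniform(0.4, 0.6, size=(28, 28))  # weak-contrast toy scene, not MNIST
i_out = fourier_opu(scene)

print("total power in/out:", scene.sum(), i_out.sum())
print("peak intensity in/out:", scene.max(), i_out.max())
noisy = detect(i_out, sigma_dark=0.1, rng=rng)
```

Because the dark-noise term in this sketch has a fixed standard deviation per pixel, raising the peak intensity raises the per-pixel signal-to-noise ratio at the bright pixels, whereas the Poisson term grows with the intensity itself.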

Paper Structure (17 sections, 12 equations, 5 figures)


Figures (5)

  • Figure 1: Concept of optical compute-first detection system for visual perception. a, Conventional procedure: the wave signal from a scene is converted to image data by a photodetector (PD) array, with additional detection noise. Subsequently, a digital processor processes the image data, extracting a latent feature of the scene. b, Proposed scheme: the wave signal undergoes primary modulation ahead of detection through an optical processing unit (OPU). It is detected and then post-processed in the digital domain to produce the final visual information. $\vb E^\mathrm{(in,out)}$, input and output state of waves; $\vb x$, detected value in electronic domain; $\vb y$, target feature.
  • Figure 2: Noise robustness achieved by optical signal processing. a-g, 2D representations ($28^2$ pixels) of optical intensities before detection, $I_\alpha^\mathrm{(out)}$: ideal image of digit 0 (a), random-matrix-multiplied image (b), 2D Fourier image (c), block-wise 2D Fourier image (d), images with machine-trained unitary matrices (e and f) from the initialization with b and c, respectively, and sampled ($7^2$ pixels) image from d by max-pooling (g). h,i, Detected images with two different types of noise $x_\alpha = I_\alpha^\mathrm{(out)} + \Delta I_\mathrm{dark} + \Delta I_\mathrm{photon}$: dark noise (h, $\Delta I_\mathrm{photon}\sim 0$) and photon shot noise (i, $\Delta I_\mathrm{dark}= 0$), applied to a, e, f, and g from left to right. j,k, MNIST classification accuracies according to increasing test noise levels: dark noise power (j) and shot exposure time (k), for various optical processing types (ideal image a, black; machine-trained operations e and f, red and orange; fixed block-wise Fourier operations g with different segmentation numbers 10 and 7, blue and green, respectively). Grey dashed lines indicate the applied noise level in h and i. The test accuracy is calculated over $10^4$ balanced test samples with 20 repetitions. $\Delta I\sim0.17$ is the intensity contrast in ideal images (a).
  • Figure 3: Concentration-induced noise robustness. a,b, Output intensity distributions from an input example in class 0 after applying block-wise Fourier operations (a) and then max-pooling (b), with different segmentation numbers $N_\mathrm{seg}=2$ (left) to $13$ (right). c, Shannon entropy distributions given a dataset and the operation with different $N_\mathrm{seg}$. d, MNIST classification accuracies as a function of $N_\mathrm{seg}$ with different test noise levels, from $I_\mathrm{dark}=0$ (black line) to $I_\mathrm{dark}=1$ (orange line). The average entropy per pixel is overlaid.
  • Figure 4: Training noise-induced emergence of hub detectors. a-c, Shannon entropy distributions for trained $\mathrm{U}(28^2)$ operations with the same random initialization but different training noise levels $\sigma_\mathrm{dark}^\mathrm{(tr)}=0.01$ (a), $0.1$ (b), and $0.2$ (c). Red circles and squares indicate the pixels with minimum and maximum entropy of each network, respectively. d, MNIST classification accuracies according to the cumulative pruning of pixels (i.e., enforced zero output to the digital network regardless of input) in ascending (filled circles) or descending (empty squares) order of entropy.
  • Figure 5: Incoherent meta-imaging systems. a, Illustrations of a conventional $4f$ system (lenses; L1 and L3) and a meta-imaging system with additional trainable phase masks (metalenses; ML1-3). b-d, Pure images without noise (b) and noisy images with dark noise power $\sigma_\mathrm{dark}=\Delta I/4$ (c) and $\Delta I/2$ (d), obtained by the $4f$ (upper) and the optimized meta-imaging (lower) systems for digits 0 to 9. e, MNIST classification accuracies of the conventional images (black) and the meta-images (red) as a function of dark noise power. $\lambda$, wavelength; $f_0=300\lambda$ and $\mathrm{NA}\sim0.22$, focal length and numerical aperture of L1 and L3; $\Delta I\sim0.051$, constant for the intensity contrast in conventional images. f-j, Example real-world IR images: a scene of pedestrians (f) from the LLVIP dataset [Jia2021]; its modified images based on the pixel-wise intensity ranges for the $4f$ (g) and the optimized meta-imaging (h) systems with the same level of additional Gaussian noise; and the object detection results (magenta boxes, i and j) using the YOLOv3 model [yolov3] for g and h, respectively. Each image is normalized with its minimum and maximum values.
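The block-wise Fourier operation followed by max-pooling, as described in the captions of Figures 2 and 3, might be sketched as below. This is a hedged illustration rather than the paper's code: the helper name `blockwise_fourier_maxpool` is hypothetical, the input field is assumed to have flat phase, the toy scene is random rather than MNIST, and the sketch assumes that the segmentation number divides the image size evenly (the source does not say how other values such as $N_\mathrm{seg}=10$ are handled).

```python
import numpy as np

def blockwise_fourier_maxpool(intensity_in, n_seg):
    """Unitary 2D FFT inside each of n_seg x n_seg blocks, then max-pool each block."""
    h, w = intensity_in.shape
    if h % n_seg or w % n_seg:
        raise ValueError("this sketch assumes n_seg divides the image size")
    bh, bw = h // n_seg, w // n_seg
    e_in = np.sqrt(intensity_in)  # assumption: flat-phase input field
    # Rearrange to (block row, block col, row in block, col in block).
    blocks = e_in.reshape(n_seg, bh, n_seg, bw).transpose(0, 2, 1, 3)
    e_out = np.fft.fft2(blocks, axes=(-2, -1), norm="ortho")  # unitary per block
    i_out = np.abs(e_out) ** 2
    pooled = i_out.max(axis=(-2, -1))                  # one value per block
    full = i_out.transpose(0, 2, 1, 3).reshape(h, w)   # back to image layout
    return full, pooled

rng = np.random.default_rng(0)
scene = rng.uniform(0.4, 0.6, size=(28, 28))  # toy scene, not MNIST
full, pooled = blockwise_fourier_maxpool(scene, n_seg=7)

print("full shape:", full.shape, "pooled shape:", pooled.shape)
print("fraction of power kept after pooling:", pooled.sum() / full.sum())
```

The block transform itself conserves total power; the pooling step is where information and power are discarded, which is the trade-off between compression and information loss that the TL;DR mentions.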