Compute-first optical detection for noise-resilient visual perception
Jungmin Kim, Nanfang Yu, Zongfu Yu
TL;DR
This work introduces a compute-first optical detection paradigm in which a linear, unitary transform $P$ concentrates scene light before detection, improving robustness to detection noise in visual perception tasks. The authors formulate a discrete, unitary model $\mathbf{E}^{(out)} = P \mathbf{E}^{(in)}$ with detection noise sources $\Delta I_{\text{photon}}$ and $\Delta I_{\text{dark}}$, and demonstrate enhanced MNIST classification performance under dark noise using both machine-learned and manually designed optical transforms. A key finding is that signal concentration improves resilience to dark noise but not to photon shot noise, which scales with the output intensity; stronger compression via block-wise Fourier processing can further boost robustness at the cost of information loss. The paper also validates a practical incoherent meta-imaging system with trainable metalenses that yields higher-contrast images and better downstream machine perception under noise, illustrating the potential of optical computing to advance infrared machine vision in industrial and defense contexts. Overall, the work makes the case for pre-detection optical computation as a route to noise-resilient perception, supported by theory, simulation, and a practical diffractive-optics implementation.
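The contrast between dark noise and shot noise can be illustrated with a toy calculation. The sketch below is not the authors' simulation: it assumes a uniformly lit input, uses a unitary DFT matrix as a stand-in for $P$, and models dark noise as a fixed variance per lit pixel and shot noise as a variance proportional to intensity. The pixel count, noise levels, and helper names (`unitary_dft`, `summed_snr`) are hypothetical choices made for illustration.

```python
import numpy as np

# Toy illustration only: sizes, noise levels, and the choice of P are assumptions.
N = 64                 # number of detector pixels (hypothetical)
SIGMA_DARK = 0.05      # dark-noise standard deviation per pixel (hypothetical)
SHOT_SCALE = 0.01      # shot-noise variance per unit intensity (hypothetical)


def unitary_dft(n):
    """Return the n x n unitary DFT matrix, used here as an example P."""
    return np.fft.fft(np.eye(n), norm="ortho")


def summed_snr(intensity, sigma_dark, shot_scale):
    """SNR of the total signal summed over the pixels that carry light.

    Dark noise adds a fixed variance per lit pixel; shot noise adds a
    variance proportional to the intensity collected.
    """
    lit = intensity > 1e-12
    signal = intensity[lit].sum()
    variance = lit.sum() * sigma_dark**2 + shot_scale * signal
    return signal / np.sqrt(variance)


P = unitary_dft(N)

# Uniform input field with unit total power, spread over all N pixels.
E_in = np.ones(N, dtype=complex) / np.sqrt(N)
E_out = P @ E_in  # E_out = P E_in: the power lands on a single pixel

I_in = np.abs(E_in) ** 2
I_out = np.abs(E_out) ** 2

# Compare SNR after vs. before the transform for each noise source alone.
dark_gain = summed_snr(I_out, SIGMA_DARK, 0.0) / summed_snr(I_in, SIGMA_DARK, 0.0)
shot_gain = summed_snr(I_out, 0.0, SHOT_SCALE) / summed_snr(I_in, 0.0, SHOT_SCALE)

print(f"dark-noise SNR gain: {dark_gain:.2f}")
print(f"shot-noise SNR gain: {shot_gain:.2f}")
```

Under these assumptions, the unitary transform conserves total power, so the shot-noise SNR is unchanged, while the dark-noise SNR improves by $\sqrt{N}$ because the same signal is read out from one pixel instead of $N$.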
Abstract
In the context of visual perception, the optical signal from a scene is transferred into the electronic domain by detectors in the form of image data, which are then processed for the extraction of visual information. In noisy and weak-signal environments such as thermal imaging for night vision applications, however, the performance of neural computing tasks faces a significant bottleneck due to the inherent degradation of data quality upon noisy detection. Here, we propose a concept of optical signal processing before detection to address this issue. We demonstrate that spatially redistributing optical signals through a properly designed linear transformer can enhance the detection noise resilience of visual perception tasks, as benchmarked on MNIST classification. Our idea is supported by a quantitative analysis detailing the relationship between signal concentration and noise robustness, as well as by its practical implementation in an incoherent imaging system. This compute-first detection scheme can pave the way for advancing infrared machine vision technologies widely used for industrial and defense applications.
