Opto-Electronic Convolutional Neural Network Design Via Direct Kernel Optimization
Ali Almuallem, Harshana Weligampola, Abhiram Gnanasambandam, Wei Xu, Dilshan Godaliyadda, Hamid R. Sheikh, Stanley H. Chan, Qi Guo
TL;DR
This work tackles the prohibitive cost of end-to-end optimization in opto-electronic CNNs with a two-stage design: first train a conventional electronic CNN, then realize the optical front-end as a metasurface array optimized directly to reproduce the kernels of the CNN's first convolutional layer. This Direct Kernel Optimization (DKO) approach shrinks the design search space and the training burden while maintaining accuracy; on monocular depth estimation, the two-stage method outperforms end-to-end training under the same budget. Key contributions include formulating the optical front-end as a kernel-mimicking metasurface, using a differentiable optical simulator for phase optimization, and validating the approach in simulation on KITTI with Monodepth2. The results indicate substantial reductions in computation and parameter count, with practical implications for scalable, fast, energy-efficient hybrid vision systems that use optical preprocessing for dense prediction tasks.
Abstract
Opto-electronic neural networks integrate optical front-ends with electronic back-ends to enable fast and energy-efficient vision. However, conventional end-to-end optimization of both the optical and electronic modules is limited by costly simulations and large parameter spaces. We introduce a two-stage strategy for designing opto-electronic convolutional neural networks (CNNs): first, train a standard electronic CNN; then realize the optical front-end, implemented as a metasurface array, through direct kernel optimization of the CNN's first convolutional layer. This approach reduces computational and memory demands by a factor of hundreds and improves training stability compared to end-to-end optimization. On monocular depth estimation, the proposed two-stage design achieves twice the accuracy of end-to-end training under the same training time and resource constraints.
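
The second stage can be illustrated with a toy sketch. This is not the authors' simulator or metasurface model, neither of which is specified in the abstract. It assumes a 1D phase-only mask whose point spread function (PSF) is the squared magnitude of its discrete Fourier transform, a non-negative target kernel normalized to sum to 1, a squared-error loss, and gradient descent with step halving. All function names (`psf`, `loss_and_grad`, `fit_phase`) and the target values are hypothetical.

```python
import cmath
import math


def psf(phase):
    """Toy far-field model (assumption): intensity of the DFT of a phase-only mask."""
    n = len(phase)
    field = [cmath.exp(1j * p) for p in phase]
    spectrum = [
        sum(field[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    # Dividing by n**2 makes the intensities sum to 1 (Parseval).
    intensity = [abs(s) ** 2 / n**2 for s in spectrum]
    return intensity, spectrum, field


def loss_and_grad(phase, target):
    """Squared error between the PSF and the target kernel, with its analytic gradient."""
    n = len(phase)
    intensity, spectrum, field = psf(phase)
    residual = [intensity[k] - target[k] for k in range(n)]
    loss = sum(r * r for r in residual)
    grad = []
    for m in range(n):
        g = 0.0
        for k in range(n):
            # Derivative of spectrum[k] with respect to phase[m].
            d_spec = 1j * field[m] * cmath.exp(-2j * math.pi * k * m / n)
            d_intensity = (2 / n**2) * (spectrum[k].conjugate() * d_spec).real
            g += 2 * residual[k] * d_intensity
        grad.append(g)
    return loss, grad


def fit_phase(target, steps=200, step_size=1.0):
    """Gradient descent on the phase; a step is kept only if it lowers the loss."""
    n = len(target)
    # Deterministic non-zero start: an all-zero phase is a stationary point of this loss.
    phase = [0.3 * math.sin(1.7 * m + 0.5) for m in range(n)]
    loss, grad = loss_and_grad(phase, target)
    initial_loss = loss
    for _ in range(steps):
        trial = [p - step_size * g for p, g in zip(phase, grad)]
        trial_loss, trial_grad = loss_and_grad(trial, target)
        if trial_loss < loss:
            phase, loss, grad = trial, trial_loss, trial_grad
        else:
            step_size *= 0.5  # step too large: halve it and retry
    return phase, initial_loss, loss


# Hypothetical stand-in for one first-layer kernel of the pretrained CNN.
target = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]
phase, initial_loss, final_loss = fit_phase(target)
print(f"kernel-matching loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

The only trainable quantity here is the phase profile, and the loss compares the simulated PSF with a fixed kernel, so no images or network weights enter the optimization. That is the sense in which the search space is smaller than in end-to-end training. A phase-only mask may not reproduce an arbitrary kernel exactly, so the loss need not reach zero, and the toy sidesteps how signed kernel weights would be handled.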
