Sim2Real in Reconstructive Spectroscopy: Deep Learning with Augmented Device-Informed Data Simulation
Jiyi Chen, Pengyu Li, Yutong Wang, Pei-Cheng Ku, Qing Qu
TL;DR
The paper tackles reconstructive spectroscopy under severe training-data constraints by bridging the sim-to-real gap. It introduces a Sim2Real framework that combines Hierarchical Data Augmentation, which perturbs the device response, with a lightweight network, ReSpecNN, trained entirely on augmented simulated data, enabling fast, accurate spectral reconstruction on real measurements. Empirical results on real-world data show accuracy comparable to NNLS-TV with order-of-magnitude faster inference, highlighting practical benefits for on-chip, real-time spectroscopy. The work also discusses limitations, such as extreme outliers, and outlines avenues for improving robustness through adversarial augmentation and selective fine-tuning with limited real data. Overall, Sim2Real offers a scalable path to deploying DL-based reconstructive spectroscopy on resource-constrained devices without requiring large labeled real-world datasets.
Abstract
This work proposes Sim2Real, a deep learning (DL)-based framework for spectral signal reconstruction in reconstructive spectroscopy, with an emphasis on efficient data sampling and fast inference. It addresses the challenge of reconstructing real-world spectral signals in the extreme setting where only device-informed simulated data are available for training. Such simulated data are much easier to collect than real-world data but exhibit large distribution shifts from their real-world counterparts. To leverage the simulated data effectively, a hierarchical data augmentation strategy is introduced to mitigate the adverse effects of this domain shift, and a corresponding neural network is designed to reconstruct spectral signals from the augmented data. Experiments on a real dataset measured from our spectrometer device demonstrate that Sim2Real achieves a significant inference speed-up while performing on par with state-of-the-art optimization-based methods.
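To make the idea of training on augmented, device-informed simulated data concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a linear measurement model in which a measurement is the device response matrix applied to the spectrum, plus noise. The two perturbation levels (a per-detector gain and a per-entry jitter), the Gaussian-peak spectrum generator, and all function names and parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_spectrum(n_wavelengths, rng, max_peaks=3):
    """Hypothetical spectrum generator: a normalized sum of Gaussian peaks."""
    grid = np.linspace(0.0, 1.0, n_wavelengths)
    n_peaks = rng.integers(1, max_peaks + 1)
    spectrum = np.zeros(n_wavelengths)
    for _ in range(n_peaks):
        center = rng.uniform(0.1, 0.9)
        width = rng.uniform(0.02, 0.1)
        height = rng.uniform(0.2, 1.0)
        spectrum += height * np.exp(-0.5 * ((grid - center) / width) ** 2)
    return spectrum / spectrum.max()


def perturb_response(R, rng, gain_std=0.05, entry_std=0.02):
    """Assumed two-level perturbation of the device response matrix R.

    Level 1: one multiplicative gain per detector (row of R).
    Level 2: independent multiplicative jitter on every entry.
    """
    gains = 1.0 + gain_std * rng.standard_normal((R.shape[0], 1))
    jitter = 1.0 + entry_std * rng.standard_normal(R.shape)
    # Responses are physical intensities, so clip negatives to zero.
    return np.clip(R * gains * jitter, 0.0, None)


def make_training_pairs(R, n_samples, rng, noise_std=0.01):
    """Build (measurement, spectrum) pairs, each with a freshly perturbed R."""
    n_detectors, n_wavelengths = R.shape
    spectra = np.empty((n_samples, n_wavelengths))
    measurements = np.empty((n_samples, n_detectors))
    for i in range(n_samples):
        x = simulate_spectrum(n_wavelengths, rng)
        R_aug = perturb_response(R, rng)
        noise = noise_std * rng.standard_normal(n_detectors)
        spectra[i] = x
        measurements[i] = R_aug @ x + noise
    return measurements, spectra


rng = np.random.default_rng(0)
# Placeholder response matrix: 16 detectors, 64 wavelength bins.
R_nominal = rng.uniform(0.0, 1.0, size=(16, 64))
measurements, spectra = make_training_pairs(R_nominal, 100, rng)
```

A reconstruction network would then be trained to map `measurements` to `spectra`. Because each sample sees a different perturbed response, the network is discouraged from overfitting to the nominal simulated device, which is the intuition behind using augmentation to reduce the sim-to-real gap.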
