Multimodal Fusion Network for Micro-displacement Measurement via Michelson Interferometer

Zixing Jia; Jiawei Li; Ziping Chen; Xin Li

TL;DR

This work tackles the intrinsic $\lambda/2$ displacement ambiguity of single-wavelength Michelson interferometry with a multimodal fusion network (MFN) that jointly uses spatial, spectral, and temporal cues. The architecture combines three image branches (raw interferogram, frame difference, and FFT), a MobileViT backbone, and an LSTM-based temporal encoder. It feeds two parallel heads, one regressing the sub-$\lambda/2$ displacement and one classifying the integer interference order, with a soft orthogonality constraint to separate the two tasks. Trained on about $2\times10^5$ simulated interferograms and fine-tuned with as few as about 500 real images, MFN achieves a displacement precision of 4.84 ± 0.15 nm and about 98% order-classification accuracy, maintains about 16 nm RMSE under severe noise, and runs at about 10 ms per image, far faster than traditional heuristic fitting. This data-driven, hardware-light approach removes the need for dual-wavelength hardware or complex phase fitting, offering a robust, cost-efficient route to real-time interferometric metrology in research and industry.
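As a rough illustration of the three image branches named above, the following sketch derives a raw, a frame-difference, and an FFT view from two consecutive frames. The function name `build_branches`, the log-magnitude spectrum, the 64×64 frame size, and the synthetic fringe pattern are all assumptions made for illustration; the summary does not specify how the authors preprocess each branch.

```python
import numpy as np

def build_branches(prev_frame, frame):
    """Return (raw, difference, spectrum) views of one interferogram frame.

    Hypothetical preprocessing: the source names the three branches but
    not how each one is computed.
    """
    raw = frame.astype(np.float64)
    # Frame-difference branch: change relative to the previous frame.
    diff = raw - prev_frame.astype(np.float64)
    # FFT branch: log-magnitude of the centred 2-D spectrum (assumed form).
    spectrum = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(raw))))
    return raw, diff, spectrum

# Synthetic straight fringes, used only to exercise the function.
x = np.linspace(0.0, 1.0, 64)
xx, _ = np.meshgrid(x, x)

def fringes(phase):
    return 0.5 * (1.0 + np.cos(2.0 * np.pi * 8.0 * xx + phase))

raw, diff, spectrum = build_branches(fringes(0.0), fringes(0.3))
```

In a full pipeline each of the three arrays would presumably go to its own image branch before fusion; that wiring is not shown here.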

Abstract

We propose a multimodal fusion network (MFN) for precise micro-displacement measurement using a modified Michelson interferometer. The model resolves the intrinsic half-wave displacement ambiguity that limits conventional single-wavelength interferometry through a dual-head learning mechanism: one head regresses the sub-half-wave displacement, and the other classifies the integer interference order. Unlike dual-wavelength or iterative fitting methods, which require high signal quality and long computation times, MFN achieves robust, real-time prediction directly from interferometric images. Trained on $2\times10^5$ simulated interferograms and fine-tuned on only about 500 real experimental images (about 0.24% of the data), the model attains a displacement precision of 4.84(15) nm and an order-classification accuracy of 98%. Even under severe noise, MFN maintains stable accuracy (about 16 nm RMSE), whereas conventional heuristic algorithms exhibit errors exceeding 100 nm. These results demonstrate that MFN offers a fast, noise-tolerant, and cost-efficient solution for single-wavelength interferometric metrology, eliminating the need for multi-wavelength hardware or complex phase fitting.
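The half-wave ambiguity and the way the two heads resolve it can be illustrated with a minimal sketch. It assumes the ideal two-beam Michelson response $I \propto 1 + \cos(4\pi d/\lambda)$ and a He-Ne wavelength of 632.8 nm; the wavelength and the names `intensity` and `combine_heads` are illustrative assumptions, not values or identifiers taken from the paper.

```python
import math

WAVELENGTH_NM = 632.8  # assumed He-Ne laser; the abstract does not state the wavelength

def intensity(d_nm, wavelength_nm=WAVELENGTH_NM):
    """Ideal two-beam Michelson intensity for mirror displacement d (normalised)."""
    # The optical path difference is 2*d, so the phase is 4*pi*d/lambda.
    return 0.5 * (1.0 + math.cos(4.0 * math.pi * d_nm / wavelength_nm))

def combine_heads(order, residual_nm, wavelength_nm=WAVELENGTH_NM):
    """Total displacement from the integer order and the sub-half-wave residual."""
    half_wave = wavelength_nm / 2.0
    if not 0.0 <= residual_nm < half_wave:
        raise ValueError("residual must lie in [0, lambda/2)")
    return order * half_wave + residual_nm

# Two displacements one half-wave apart give the same intensity, which is
# why a single intensity reading cannot fix the interference order.
d_a = 100.0
d_b = d_a + WAVELENGTH_NM / 2.0
same_intensity = math.isclose(intensity(d_a), intensity(d_b), abs_tol=1e-9)

# The order head supplies the missing integer; the regression head the rest.
total_nm = combine_heads(order=3, residual_nm=100.0)
```

The split mirrors the dual-head description: the classification output selects the half-wave interval, and the regression output locates the displacement within it.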

