Multimodal Fusion Network for Micro-displacement Measurement via Michelson Interferometer

Zixing Jia, Jiawei Li, Ziping Chen, Xin Li

TL;DR

This work tackles the intrinsic $\lambda/2$ phase ambiguity in single-wavelength Michelson interferometry by introducing a multimodal fusion network (MFN) that jointly leverages spatial, spectral, and temporal cues. The architecture comprises three image branches (raw interferogram, frame-difference, FFT), a MobileViT backbone, and an LSTM-based temporal encoder. It feeds two parallel heads, one for sub-$\lambda/2$ displacement regression and one for integer interference-order classification, with a soft orthogonality constraint to separate the two tasks. Trained on ~$2\times10^5$ simulated interferograms and fine-tuned with as few as ~500 real images, MFN achieves ~4.84±0.15 nm displacement precision and ~98% order-classification accuracy, maintains ~16 nm RMSE under severe noise, and runs at ~10 ms per image, far faster than traditional heuristic fitting. This data-driven, hardware-light approach eliminates the need for dual-wavelength hardware or complex phase fitting, offering a robust, cost-efficient solution for real-time interferometric metrology in research and industry.
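As an illustration of the three image branches named above, the following minimal Python sketch builds raw, frame-difference, and FFT inputs from two consecutive frames. The function name `build_branch_inputs`, the use of the previous frame as the difference reference, and the centred log-magnitude spectrum are assumptions made for illustration; the summary does not specify these preprocessing details.

```python
import numpy as np


def build_branch_inputs(frame, prev_frame):
    """Return the three single-channel inputs as a dict of 2-D arrays.

    Hypothetical helper: the exact preprocessing used by MFN is not
    given in the summary, so every choice below is an assumption.
    """
    frame = np.asarray(frame, dtype=np.float64)
    prev_frame = np.asarray(prev_frame, dtype=np.float64)
    if frame.ndim != 2 or frame.shape != prev_frame.shape:
        raise ValueError("expected two 2-D frames of identical shape")

    # Frame-difference branch: change relative to the previous frame (assumed).
    diff = frame - prev_frame

    # FFT branch: centred log-magnitude spectrum (assumed representation).
    spectrum = np.fft.fftshift(np.fft.fft2(frame))
    fft_mag = np.log1p(np.abs(spectrum))

    return {"raw": frame, "diff": diff, "fft": fft_mag}


if __name__ == "__main__":
    # Synthetic vertical fringes, shifted slightly between frames.
    x = np.linspace(0.0, 4.0 * np.pi, 64)
    prev = np.tile(0.5 * (1.0 + np.cos(x)), (64, 1))
    curr = np.tile(0.5 * (1.0 + np.cos(x + 0.3)), (64, 1))
    inputs = build_branch_inputs(curr, prev)
    print({name: arr.shape for name, arr in inputs.items()})
```

Each array could then be passed to its own single-channel CNN branch, matching the layout described for Figure 1.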

Abstract

We propose a multimodal fusion network (MFN) for precise micro-displacement measurement using a modified Michelson interferometer. The model resolves the intrinsic half-wave displacement ambiguity that limits conventional single-wavelength interferometry by introducing a dual-head learning mechanism: one head performs sub-half-wave displacement regression, and the other classifies integer interference orders. Unlike dual-wavelength or iterative fitting methods, which require high signal quality and long computation time, MFN achieves robust, real-time prediction directly from interferometric images. Trained on $2\times10^5$ simulated interferograms and fine-tuned with only about 0.24% of real experimental data (about 500 images), the model attains a displacement precision of 4.84(15) nm and an order-classification accuracy of 98%. Even under severe noise, MFN maintains stable accuracy (about 16 nm RMSE), whereas conventional heuristic algorithms exhibit errors exceeding 100 nm. These results demonstrate that MFN offers a fast, noise-tolerant, and cost-efficient solution for single-wavelength interferometric metrology, eliminating the need for multi-wavelength hardware or complex phase fitting.
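To make the dual-head idea concrete, the sketch below shows one way the two outputs could be recombined into an absolute displacement, assuming the decomposition $d = m\,\lambda/2 + \delta$ with integer order $m$ and residual $0 \le \delta < \lambda/2$. The function names and the 632.8 nm He-Ne wavelength are assumptions for illustration; the abstract does not state the exact convention used by the authors.

```python
HE_NE_WAVELENGTH_NM = 632.8  # common He-Ne line, assumed here


def combine_heads(order, residual_nm, wavelength_nm=HE_NE_WAVELENGTH_NM):
    """Absolute displacement from the order and the sub-half-wave residual.

    Hypothetical helper: assumes d = order * (lambda / 2) + residual.
    """
    half_wave = wavelength_nm / 2.0
    if not 0.0 <= residual_nm < half_wave:
        raise ValueError("residual must lie in [0, lambda/2)")
    return order * half_wave + residual_nm


def split_displacement(displacement_nm, wavelength_nm=HE_NE_WAVELENGTH_NM):
    """Inverse mapping, e.g. for turning a known displacement into labels."""
    half_wave = wavelength_nm / 2.0
    order = int(displacement_nm // half_wave)
    residual_nm = displacement_nm - order * half_wave
    return order, residual_nm


if __name__ == "__main__":
    order, residual = split_displacement(1049.2)
    print(order, residual, combine_heads(order, residual))
```

Under this convention, an error of one in the predicted order shifts the result by a full $\lambda/2$ (about 316 nm at the assumed wavelength), which is why the order is treated as a classification target rather than regressed.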

Paper Structure

This paper contains 13 sections, 12 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Architecture of the proposed multimodal fusion network. The model integrates: (1) three single-channel CNNs for raw, differential, and FFT-transformed interferograms (local features); (2) a MobileViT backbone for joint global feature extraction; and (3) an LSTM encoder for temporal numerical descriptors (activated only during fine-tuning). The multimodal feature vector branches into two parallel heads: a displacement regression head that utilizes features from the two image-based branches (CNN + MobileViT) to achieve high-precision prediction of sub-$\lambda/2$ displacements, and an order classification head that combines features from all three branches (CNN + MobileViT + LSTM) to learn the complex mapping from images and noise statistics to interference orders. Both heads are regularized by an orthogonality loss to ensure feature disentanglement.
  • Figure 2: (Left) Photograph of the modified Michelson interferometer setup. (Right) Schematic of the modified Michelson interferometer. Key components include: a He–Ne laser, beam expander and collimator, convex lens, prism-based beam splitter, movable mirror mounted on a piezoelectric actuator, fixed mirror, and CCD camera. Red lines indicate the optical path.
  • Figure 3: (Left) Photograph of the voltage–displacement ($V$–$d$) characterization setup, showing the piezoelectric actuator mounted on the capacitive micrometer inside an acrylic enclosure. (Right) A brief description of the two experimental objectives: the first experiment obtains images aligned with voltage, and the second obtains voltage aligned with displacement. Together they yield the required dataset of images paired with displacements.
  • Figure 4: Interferograms at different values of $\eta$. In the experiment, the parameter range was obtained using the original fitting formula, based on three times the standard deviation of the parameters derived from the actual image ($I$). Simulated images were then generated through randomization, incorporating noise sources such as laser power drift, photon Poisson noise, CCD readout Gaussian noise, the optical system's inherent PSF, salt-and-pepper noise, and background gradients caused by uneven illumination. The total illumination including these noise sources is denoted $I'$, and $\eta$ controls the overall noise intensity. The final luminance output is $I_{\mathrm{final}} = I + \eta\,(I' - I)$.
  • Figure 5: Left: Average displacement prediction accuracy (RMSE in nanometers) versus noise intensity. Due to the high computational cost of the HAA, its performance was evaluated on sampled interferogram subsets for each noise level. In contrast, the MFN was evaluated on all available sequences at every $\eta$, enabling the estimation of Type-A uncertainties shown as error bars. Right: Inference time per image versus noise intensity. The MFN model was trained and tested on an NVIDIA L40 GPU. For the HAA, the iterative optimization process was capped at a maximum runtime of $10^3$ s per image to prevent unbounded fitting time; without this restriction, convergence failures frequently occurred at high noise intensities, with runtimes exceeding $10^4$ s and no valid solution found. The results highlight that the MFN maintains stable precision and achieves over four orders of magnitude acceleration compared with the traditional HAA, even under severe noise degradation.
  • ...and 1 more figure
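The noise-intensity rule in the Figure 4 caption, $I_{\mathrm{final}} = I + \eta\,(I' - I)$, can be sketched in a few lines of Python. The function name `blend_noise` is hypothetical, and the degraded image $I'$ in the usage example is produced with simple Gaussian noise as a stand-in; the full noise model listed in the caption (Poisson noise, PSF, salt-and-pepper noise, background gradients) is not reproduced here.

```python
import numpy as np


def blend_noise(clean, degraded, eta):
    """Interpolate between the clean image I and the degraded image I'.

    Implements I_final = I + eta * (I' - I) from the Figure 4 caption.
    eta = 0 returns the clean image, eta = 1 the fully degraded one.
    """
    clean = np.asarray(clean, dtype=np.float64)
    degraded = np.asarray(degraded, dtype=np.float64)
    if clean.shape != degraded.shape:
        raise ValueError("clean and degraded images must share a shape")
    return clean + eta * (degraded - clean)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 4.0 * np.pi, 64)
    clean = np.tile(0.5 * (1.0 + np.cos(x)), (64, 1))
    # Stand-in for I': additive Gaussian noise only (an assumption).
    degraded = clean + rng.normal(0.0, 0.2, size=clean.shape)
    for eta in (0.0, 0.5, 1.0):
        out = blend_noise(clean, degraded, eta)
        print(eta, float(np.abs(out - clean).mean()))
```

Because the blend is linear, the deviation from the clean image scales in proportion to $\eta$, which gives a single knob for sweeping noise intensity as in Figure 5.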