
Physics-Informed Machine Learning for Efficient Sim-to-Real Data Augmentation in Micro-Object Pose Estimation

Zongcai Tan, Lan Wei, Dandan Zhang

TL;DR

The paper tackles the data bottleneck in microrobot pose estimation by introducing a physics-informed deep generative framework that combines wave-optics-based physical rendering with PixelGAN refinement to generate high-fidelity microscope images for sim-to-real data augmentation. By grounding image synthesis in Fourier optics, depth-aware rendering, and pixel-wise domain adaptation, the method generates images in real time (0.022 s/frame) and improves SSIM by 35.6% over AI-only approaches, while pose estimators trained on its synthetic data come within 5.0%/5.4% (pitch/roll) of models trained on real data. The approach generalises to unseen poses and offers a practical, interpretable way to reduce labelling costs and accelerate the development of microrobotic perception and control, with potential extensions to reinforcement learning and haptic-enabled manipulation. Overall, the work provides a digital-twin-based solution for high-fidelity sim-to-real data augmentation in optical microrobotics that balances physical accuracy with computational efficiency.
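The TL;DR credits much of the realism to Fourier optics and depth-aware rendering. The paper's exact optical model is not reproduced here, so the following is only a minimal sketch under stated assumptions: a coherent, pupil-limited imaging model (real brightfield microscopes are partially coherent), angular-spectrum defocus, and a depth map split into a fixed number of layers that are each defocused by their distance from the focal plane. The function names, `na`, `pixel_size`, and `n_layers` are hypothetical illustration parameters, not the authors' API.

```python
import numpy as np

def defocus_transfer_function(shape, pixel_size, wavelength, na, defocus):
    """Coherent transfer function: circular pupil (cutoff NA/wavelength) times an
    angular-spectrum defocus phase. All lengths share one unit, e.g. micrometres."""
    ny, nx = shape
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    fxx, fyy = np.meshgrid(fx, fy)
    f2 = fxx**2 + fyy**2
    pupil = f2 <= (na / wavelength) ** 2  # objective aperture limits the passband
    # Axial spatial frequency; clipped at 0 so evanescent terms stay finite (they lie outside the pupil when NA < 1)
    fz = np.sqrt(np.maximum(1.0 / wavelength**2 - f2, 0.0))
    return pupil * np.exp(2j * np.pi * defocus * fz)

def render_depth_aware(amplitude, depth, focal_depth, pixel_size, wavelength, na, n_layers=8):
    """Hypothetical layered renderer: bin pixels by depth, defocus each layer by its
    signed distance to the focal plane, sum the coherent fields, return intensity."""
    amplitude = np.asarray(amplitude, dtype=float)
    depth = np.asarray(depth, dtype=float)
    d_min, d_max = float(depth.min()), float(depth.max())
    span = max(d_max - d_min, 1e-12)  # avoid division by zero for a flat depth map
    layer = np.minimum(((depth - d_min) / span * n_layers).astype(int), n_layers - 1)
    field = np.zeros(amplitude.shape, dtype=complex)
    for k in range(n_layers):
        mask = layer == k
        if not mask.any():
            continue
        z = d_min + (k + 0.5) * span / n_layers - focal_depth  # layer centre relative to focus
        h = defocus_transfer_function(amplitude.shape, pixel_size, wavelength, na, z)
        field += np.fft.ifft2(np.fft.fft2(amplitude * mask) * h)
    return np.abs(field) ** 2
```

Because the transfer function has unit magnitude inside the pupil, defocus only redistributes energy: for a single in-focus layer the result is a band-limited image, and moving `focal_depth` away spreads the same energy over a wider blur, which is the depth-dependent behaviour the framework aims to reproduce.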

Abstract

Precise pose estimation of optical microrobots is essential for enabling high-precision object tracking and autonomous biological studies. However, current methods rely heavily on large, high-quality microscope image datasets, which are difficult and costly to acquire due to the complexity of microrobot fabrication and labour-intensive labelling. Digital twin systems offer a promising path for sim-to-real data augmentation, yet existing techniques struggle to replicate complex optical microscopy phenomena, such as diffraction artifacts and depth-dependent imaging. This work proposes a novel physics-informed deep generative learning framework that, for the first time, integrates wave-optics-based physical rendering and depth alignment into a generative adversarial network (GAN) to efficiently synthesise high-fidelity microscope images for microrobot pose estimation. Our method improves the structural similarity index (SSIM) by 35.6% compared to purely AI-driven methods, while maintaining real-time rendering speeds (0.022 s/frame). The pose estimator (CNN backbone) trained on our synthetic data achieves 93.9%/91.9% (pitch/roll) accuracy, just 5.0%/5.4% (pitch/roll) below that of an estimator trained exclusively on real data. Furthermore, our framework generalises to unseen poses, enabling data augmentation and robust pose estimation for novel microrobot configurations without additional training data.
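The image-quality numbers above (and the SSIM, PSNR, and MSE heatmaps of Figure 5) rest on standard full-reference metrics. The sketch below is a simplified version for illustration: it uses a uniform 7×7 window and population statistics, whereas common library implementations (e.g. Gaussian-weighted SSIM following Wang et al., 2004) differ slightly, so values will not match the paper exactly. `relative_gain` assumes the 35.6% figure is a relative improvement over the baseline SSIM, which the abstract does not state explicitly; all function names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def ssim(x, y, data_range=1.0, win=7, k1=0.01, k2=0.03):
    """Mean SSIM over all valid win x win windows (uniform weighting, simplified)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2  # stabilising constants

    def local_mean(img):
        return sliding_window_view(img, (win, win)).mean(axis=(-2, -1))

    mx, my = local_mean(x), local_mean(y)
    vx = local_mean(x * x) - mx**2       # local variances
    vy = local_mean(y * y) - my**2
    cxy = local_mean(x * y) - mx * my    # local covariance
    num = (2 * mx * my + c1) * (2 * cxy + c2)
    den = (mx**2 + my**2 + c1) * (vx + vy + c2)
    return float(np.mean(num / den))

def mse(x, y):
    return float(np.mean((np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) ** 2))

def psnr(x, y, data_range=1.0):
    err = mse(x, y)
    return float("inf") if err == 0 else float(10 * np.log10(data_range**2 / err))

def relative_gain(new, baseline):
    """Percentage improvement of `new` over `baseline` (assumed definition)."""
    return 100.0 * (new - baseline) / baseline
```

Under this reading, a 35.6% SSIM improvement means the physics-informed images score 1.356 times the SSIM of the AI-only baseline against the same real frames.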

Paper Structure

This paper contains 20 sections, 6 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Concept overview of the physics-informed machine learning network for efficient sim-to-real microscopy data generation.
  • Figure 2: Workflow of the visualization rendering system. A virtual optical microscope was constructed in Isaac Sim from real-time optical path parameters and predicted robot poses. From the initial CAD images and depth maps captured by a virtual camera, the wave-optics-based visualization rendering module generates high-fidelity simulated images. A sim-to-real module using PixelGAN (Isola et al., 2017) then further reduces the reality gap of the virtual images.
  • Figure 3: Alignment of rendered and experimental image features based on Laplacian of Gaussian (LoG) analysis. LoG values and normalised depth are extracted for each dataset (left). Peak-LoG frames (corresponding to the focal plane) are used to segment the datasets. To enable one-to-one pairing, the data within each segment are balanced (middle), yielding aligned image pairs for downstream training (right).
  • Figure 4: Qualitative evaluation of image generation methods across varying poses and depths, demonstrating the visual fidelity of simulated microscope images compared to real experimental images. The comparison includes real experimental images (red), CAD renderings (yellow), physically rendered images (green), and GAN-generated images (blue).
  • Figure 5: Heatmap of evaluation metrics (SSIM, PSNR, MSE) across different robot poses and depths. The horizontal axis represents the robot's posture angles, denoted Pa_Rb, where a and b are the pitch and roll angles in degrees (e.g., P0_R60 means pitch = 0° and roll = 60°); the vertical axis indicates the height offset relative to the focal plane. Each cell corresponds to a specific combination of pose and depth.
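Figure 3 aligns rendered and experimental image stacks by locating each stack's LoG peak (the focal plane) and balancing the frames on either side. The following is a minimal sketch of that idea under several assumptions not confirmed by the source: each stack is ordered by depth, the per-frame "LoG value" is the variance of the LoG response, and "balancing" means evenly subsampling the longer segment. All function names and the `sigma` value are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def log_focus(img, sigma=1.5):
    """Focus score: variance of the Laplacian-of-Gaussian response (assumed statistic)."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    p = np.pad(np.asarray(img, dtype=float), r, mode="reflect")  # avoid zero-padding edge artifacts
    g = np.apply_along_axis(np.convolve, 0, p, k, mode="valid")  # separable Gaussian blur
    g = np.apply_along_axis(np.convolve, 1, g, k, mode="valid")
    lap = g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:] - 4 * g[1:-1, 1:-1]
    return float(lap.var())

def split_at_focus(frames):
    """Index of the sharpest frame plus the frame indices below and above it."""
    peak = int(np.argmax([log_focus(f) for f in frames]))
    return peak, list(range(peak)), list(range(peak + 1, len(frames)))

def even_pick(seq, n):
    """Pick n roughly evenly spaced items from seq (assumed balancing rule)."""
    if n == 0:
        return []
    idx = np.linspace(0, len(seq) - 1, n).round().astype(int)
    return [seq[i] for i in idx]

def pair_stacks(sim_frames, real_frames):
    """One-to-one (sim_idx, real_idx) pairs: focal frames matched, each side balanced."""
    ps, sim_lo, sim_hi = split_at_focus(sim_frames)
    pr, real_lo, real_hi = split_at_focus(real_frames)
    pairs = [(ps, pr)]
    for s_seg, r_seg in ((sim_lo, real_lo), (sim_hi, real_hi)):
        n = min(len(s_seg), len(r_seg))
        pairs.extend(zip(even_pick(s_seg, n), even_pick(r_seg, n)))
    return pairs
```

Anchoring on the focal plane rather than on absolute depth makes the pairing tolerant of a constant offset between the simulated and experimental stage heights; even subsampling then keeps the depth spacing of the longer segment roughly uniform.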