Physics-Informed Machine Learning for Efficient Sim-to-Real Data Augmentation in Micro-Object Pose Estimation
Zongcai Tan, Lan Wei, Dandan Zhang
TL;DR
The paper tackles the data bottleneck in microrobot pose estimation by introducing a physics-informed deep generative framework that combines wave optics-based physical rendering with PixelGAN refinement to generate high-fidelity microscope images for sim-to-real data augmentation. By grounding image synthesis in Fourier optics, depth-aware rendering, and pixel-wise domain adaptation, the method delivers real-time image generation (0.022 s/frame) and a 35.6% SSIM improvement over AI-only approaches. Pose estimators trained on the synthetic data come within 5.0% (pitch) and 5.4% (roll) of the accuracy of models trained on real data. The approach generalises to unseen poses and offers a practical, interpretable way to reduce labelling costs and accelerate the development of microrobotic perception and control, with potential extensions to reinforcement learning and haptic-enabled manipulation. Overall, the work provides a digital-twin-based solution for high-fidelity sim-to-real data augmentation in optical microrobotics that balances physical accuracy with computational efficiency.
Abstract
Precise pose estimation of optical microrobots is essential for enabling high-precision object tracking and autonomous biological studies. However, current methods rely heavily on large, high-quality microscope image datasets, which are difficult and costly to acquire because microrobot fabrication is complex and labelling is labour-intensive. Digital twin systems offer a promising path for sim-to-real data augmentation, yet existing techniques struggle to replicate complex optical microscopy phenomena, such as diffraction artifacts and depth-dependent imaging. This work proposes a novel physics-informed deep generative learning framework that, for the first time, integrates wave optics-based physical rendering and depth alignment into a generative adversarial network (GAN) to efficiently synthesise high-fidelity microscope images for microrobot pose estimation. Our method improves the structural similarity index (SSIM) by 35.6% compared to purely AI-driven methods, while maintaining real-time rendering speeds (0.022 s/frame). The pose estimator (CNN backbone) trained on our synthetic data achieves 93.9%/91.9% (pitch/roll) accuracy, just 5.0%/5.4% (pitch/roll) below that of an estimator trained exclusively on real data. Furthermore, our framework generalises to unseen poses, enabling data augmentation and robust pose estimation for novel microrobot configurations without additional training data.
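To make the idea of wave optics-based, depth-dependent rendering concrete, the following is a minimal sketch of a generic scalar Fourier-optics image model: a circular pupil is multiplied by a defocus phase, the point spread function (PSF) is obtained by an inverse FFT, and the object is blurred by FFT convolution. It is not the rendering pipeline of this work, whose details are not given in this section. The function names (`defocused_psf`, `render`) and all parameter values (numerical aperture, wavelength, pixel size, defocus distance) are assumptions chosen only for illustration.

```python
import numpy as np

def defocused_psf(n=64, na=0.5, wavelength=0.5, pixel=0.25, defocus=0.0):
    """Incoherent PSF of a circular pupil with a defocus phase.

    Lengths are in micrometres; all default values are illustrative assumptions.
    """
    fx = np.fft.fftfreq(n, d=pixel)            # spatial frequencies (cycles/um)
    fxx, fyy = np.meshgrid(fx, fx)
    rho2 = fxx**2 + fyy**2
    cutoff = na / wavelength                   # coherent cutoff frequency
    pupil = (rho2 <= cutoff**2).astype(float)  # ideal circular aperture
    # Axial wavenumber of each plane-wave component (angular spectrum method);
    # clipping avoids a negative square root outside the propagating band.
    kz = 2.0 * np.pi * np.sqrt(np.clip(1.0 / wavelength**2 - rho2, 0.0, None))
    field = np.fft.ifft2(pupil * np.exp(1j * kz * defocus))
    psf = np.abs(field) ** 2                   # intensity PSF, peak at index [0, 0]
    return psf / psf.sum()                     # normalise to unit energy

def render(obj, psf):
    """Blur an object by circular convolution with the PSF via FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(obj) * np.fft.fft2(psf)))

# Hypothetical test object: a bright square on a dark background.
obj = np.zeros((64, 64))
obj[28:36, 28:36] = 1.0

psf_focus = defocused_psf(defocus=0.0)         # in-focus PSF
psf_defocus = defocused_psf(defocus=5.0)       # 5 um out of focus (assumed value)
img_focus = render(obj, psf_focus)
img_defocus = render(obj, psf_defocus)
```

In this sketch the defocused PSF is broader and has a lower peak than the in-focus one, which is the depth-dependent blur that a purely geometric renderer would miss. A learned refinement stage, such as the GAN described above, would then operate on images like `img_defocus`.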
