Fusing Structure from Motion and Simulation-Augmented Pose Regression from Optical Flow for Challenging Indoor Environments
Felix Ott, Lucas Heublein, David Rügamer, Bernd Bischl, Christopher Mutschler
TL;DR
This work tackles robust indoor localization from monocular imagery by fusing absolute poses derived from Structure from Motion (SfM) or Absolute Pose Regression (APR) with relative poses from Relative Pose Regression (RPR) based on optical flow. The authors introduce recurrent fusion networks that align and smooth the combined pose stream, compare them against Pose Graph Optimization (PGO), and report substantial improvements on a large, challenging warehouse-like dataset. A key contribution is simulation-augmented pre-training, which uses synthetic data to initialize the APR and RPR networks and improves generalization to unseen configurations. The results show that recurrent fusion, especially with a strongly typed RNN (TRNN) cell and two stacked layers, consistently outperforms PGO and non-fusion baselines. A public Industry dataset and the synthetic pre-training setup support broader applicability. Overall, the approach makes localization more robust to environmental changes, motion dynamics, and feature-poor scenes, with practical implications for robotics and warehouse automation.
Abstract
The localization of objects is a crucial task in various applications such as robotics, virtual and augmented reality, and the transportation of goods in warehouses. Recent advances in deep learning have enabled localization using monocular cameras. While structure from motion (SfM) predicts the absolute pose from a point cloud, absolute pose regression (APR) methods learn a semantic understanding of the environment through neural networks. However, both approaches face challenges caused by the environment, such as motion blur, lighting changes, repetitive patterns, and featureless structures. This study aims to address these challenges by incorporating additional information and regularizing the absolute pose using relative pose regression (RPR) methods. RPR methods face challenges of their own, e.g., motion blur. The optical flow between consecutive images is computed using the Lucas-Kanade algorithm, and the relative pose is predicted using a small auxiliary recurrent convolutional network. Fusing absolute and relative poses is a complex task because of the mismatch between the global and local coordinate systems. State-of-the-art methods use pose graph optimization (PGO) to regularize the absolute pose predictions with relative poses. In this work, we propose recurrent fusion networks that optimally align absolute and relative pose predictions to improve the absolute pose prediction. We evaluate eight different recurrent units and construct a simulation environment to pre-train the APR and RPR networks for better generalization. Additionally, we record a large database of different scenarios in a challenging large-scale indoor environment that mimics a warehouse with transportation robots. We conduct hyperparameter searches and experiments to show the effectiveness of our recurrent fusion method compared to PGO.
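The mismatch between global and local coordinate systems mentioned above can be illustrated with a minimal sketch. It assumes planar poses (x, y, heading), whereas the paper works with full camera poses. The helper names `compose`, `wrap_angle`, and `blend` are hypothetical. The fixed-weight blend is only a stand-in to show what a fusion step consumes; it is not the recurrent fusion network proposed here, nor PGO.

```python
import math


def wrap_angle(a):
    """Map an angle in radians to the interval (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def compose(abs_pose, rel_pose):
    """Propagate a global pose by a motion given in that pose's local frame.

    abs_pose: (x, y, theta) in the global frame (e.g., from SfM or APR).
    rel_pose: (dx, dy, dtheta) in the local frame of abs_pose (e.g., from RPR).
    """
    x, y, th = abs_pose
    dx, dy, dth = rel_pose
    # Rotate the local displacement into the global frame before adding it.
    gx = x + math.cos(th) * dx - math.sin(th) * dy
    gy = y + math.sin(th) * dx + math.cos(th) * dy
    return (gx, gy, wrap_angle(th + dth))


def blend(apr_pose, propagated_pose, w):
    """Convex blend of two global poses with weight w on apr_pose.

    Assumption: a fixed scalar weight, used purely for illustration.
    """
    x = w * apr_pose[0] + (1.0 - w) * propagated_pose[0]
    y = w * apr_pose[1] + (1.0 - w) * propagated_pose[1]
    # Interpolate the heading along the shortest angular difference.
    dth = wrap_angle(propagated_pose[2] - apr_pose[2])
    return (x, y, wrap_angle(apr_pose[2] + (1.0 - w) * dth))


if __name__ == "__main__":
    previous = (1.0, 2.0, math.pi / 2)  # global pose at time t-1
    relative = (1.0, 0.0, 0.0)          # one unit forward in the local frame
    propagated = compose(previous, relative)
    apr_now = (1.2, 3.1, math.pi / 2)   # made-up absolute prediction at time t
    print(blend(apr_now, propagated, 0.5))
```

In this sketch the relative displacement is rotated by the previous heading before it is added, so "one unit forward" moves the pose along the global y axis when the heading is 90 degrees. A learned fusion model would replace the fixed weight `w` with a data-dependent combination over a sequence of such pose pairs.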
