sqrtVINS: Robust and Ultrafast Square-Root Filter-based 3D Motion Tracking

Yuxiang Peng, Chuchu Chen, Kejian Wu, Guoquan Huang

TL;DR

This work develops sqrtVINS, a square-root covariance filtering-based visual-inertial system designed for embedded, 32-bit platforms. It introduces an LLT-based SRF update that preserves the triangular structure and delivers substantial computational gains, paired with an ultrafast dynamic initialization that achieves robust results within 100 ms using only 3 keyframes. The system couples a structure-aware, sliding-window SRF with stochastic cloning, feature management, online calibration, and an efficient feature-bearing measurement update, achieving over 2x speedups and improved numerical stability relative to state-of-the-art methods, validated on diverse datasets. The authors also provide a full open-source implementation, enabling rapid adoption and extension to multi-sensor and dynamic environments in real-world applications.

Abstract

In this paper, we develop and open-source, for the first time, a square-root filter (SRF)-based visual-inertial navigation system (VINS), termed sqrtVINS, which is ultra-fast, numerically stable, and capable of dynamic initialization even under extreme conditions (i.e., an extremely small time window). Despite recent advancements in VINS, resource constraints and numerical instability on embedded (robotic) systems with limited precision remain critical challenges. A square-root covariance-based filter offers a promising solution by providing numerical stability, efficient memory usage, and guaranteed positive semi-definiteness. However, canonical SRFs suffer from inefficiencies caused by disruptions in the triangular structure of the covariance matrix during updates. The proposed method significantly improves VINS efficiency with a novel Cholesky decomposition (LLT)-based SRF update, which fully exploits the system structure to preserve the triangular form. Moreover, we design a fast, robust, dynamic initialization method, which first recovers the minimal states without triangulating 3D features and then efficiently performs iterative SRF updates to refine the full states, enabling seamless VINS operation. The proposed LLT-based SRF is extensively verified through numerical studies, demonstrating superior numerical stability and achieving robust, efficient performance on 32-bit single-precision floats, operating at twice the speed of state-of-the-art (SOTA) methods. Our initialization method, tested on both mobile workstations and Jetson Nano computers, achieves a high initialization success rate even within a 100 ms window under minimal conditions. Finally, the proposed sqrtVINS is extensively validated across diverse scenarios, demonstrating strong efficiency, robustness, and reliability. The full open-source implementation is released to support future research and applications.
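To make the idea of a structure-preserving, Cholesky-based square-root update concrete, here is a minimal NumPy sketch. It is a generic textbook-style construction, not the authors' implementation: it assumes a lower-triangular factor `L` with $\mathbf P = \mathbf L \mathbf L^\top$ (the paper's figures use an upper-triangular factor), and the function name `llt_sqrt_update` is hypothetical. The posterior covariance $\mathbf P - \mathbf P \mathbf H^\top \mathbf S^{-1} \mathbf H \mathbf P$ is written as $\mathbf L \mathbf C \mathbf L^\top$ with $\mathbf C = \mathbf I - \mathbf A^\top \mathbf S^{-1} \mathbf A$ and $\mathbf A = \mathbf H \mathbf L$. Factoring $\mathbf C = \mathbf F \mathbf F^\top$ then gives the new factor $\mathbf L \mathbf F$, which stays lower-triangular because it is a product of two lower-triangular matrices.

```python
import numpy as np

def llt_sqrt_update(L, H, R):
    """Square-root covariance update via a Cholesky (LLT) factorization.

    Assumption (illustrative, not taken from the paper): P = L @ L.T with
    L lower-triangular. Returns L_new such that
    L_new @ L_new.T = P - P H^T S^{-1} H P, with S = H P H^T + R.
    """
    n = L.shape[0]
    A = H @ L                                  # m x n
    S = A @ A.T + R                            # innovation covariance
    C = np.eye(n) - A.T @ np.linalg.solve(S, A)
    F = np.linalg.cholesky(C)                  # lower-triangular, C = F F^T
    return L @ F                               # lower * lower stays lower

# Small demonstration on random data.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
P = M @ M.T + 4.0 * np.eye(4)                  # symmetric positive definite
L = np.linalg.cholesky(P)
H = rng.standard_normal((2, 4))
R = 0.1 * np.eye(2)

L_new = llt_sqrt_update(L, H, R)
S = H @ P @ H.T + R
P_ref = P - P @ H.T @ np.linalg.solve(S, H @ P)  # standard Kalman form
print(np.allclose(L_new @ L_new.T, P_ref))
```

Because the posterior is always formed as `L_new @ L_new.T`, it is positive semi-definite by construction, which is the property the abstract attributes to square-root covariance filters.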

Paper Structure

This paper contains 32 sections, 3 theorems, 42 equations, 14 figures, 8 tables, 1 algorithm.

Key Result

Lemma 1

Let the measurement model be given by $\mathbf z_k = \mathbf h(\mathbf x_k) + \mathbf n_k$, with measurement residual as: … where the measurement noise is assumed to be $\mathbf n_k \sim \mathcal{N}(\mathbf 0, \mathbf R_k)$ and …

Notation: throughout the paper, $\hat{\mathbf{x}}$ denotes the estimate of a random variable $\mathbf{x}$, while $\tilde{\mathbf{x}}$ is the corresponding error state. …

Figures (14)

  • Figure 1: Visualization of the matrix computation and its structure evolution during the proposed LLT-based SRF update.
  • Figure 2: An illustration of the marginalization process of $\sqrt{\mathrm{VINS}}$ with different state ordering. The blue blocks represent the marginalized state $\mathbf{x}_M$ and its covariance block $\mathbf{U}_M$, while the pink blocks represent the other states. The left side shows an extreme case where the marginalized state is at the bottom, allowing direct extraction of the upper-triangular square-root covariance $\mathbf{U}^*$. The right side shows the marginalized state in the middle, requiring an additional QR operation. Placing marginalized states further toward the bottom improves structure preservation and reduces the QR operation cost.
  • Figure 3: An illustration of two-view geometry for featureless initialization. The pink and blue planes represent two epipolar planes formed by the features $\mathbf{f}_1$ and $\mathbf{f}_2$, and their corresponding camera frames $\{C_1\}$ and $\{C_2\}$. The bearing observations are denoted by $\mathbf{b}$, while $\mathbf{n}_1$ and $\mathbf{n}_2$ represent the normal directions of the two epipolar planes. $\{I_0\}$ indicates the initial reference frame and $\mathbf{t}$ is the direction of the relative pose.
  • Figure 4: Simulated 2.4 km UD-ARL trajectory.
  • Figure 5: Top: Orientation/position errors of different estimators performed on UD-ARL dataset. 'd' is for double; 'f' is for float. While most estimators perform similarly and are hard to distinguish from the plot, SRIF(f) shows a clear drop in accuracy over time. Bottom: Condition numbers of the square-root information matrix (purple line) and the Cholesky decomposed matrix $\mathbf{C}$ (green line, see Eq. \ref{eq:update}), presented in both standard (scientific) and logarithmic scales.
  • ...and 9 more figures
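
The state-ordering effect described in the Figure 2 caption can be illustrated with a short NumPy sketch. It assumes an upper-triangular factor $\mathbf U$ with $\mathbf P = \mathbf U^\top \mathbf U$; the function name `marginalize_sqrt_cov` and the returned flag are hypothetical and only serve the illustration. When the marginalized states sit at the bottom, the leading block of $\mathbf U$ is already the factor of the remaining covariance. Otherwise, dropping the marginalized columns breaks the triangular shape and a QR factorization is needed to restore it.

```python
import numpy as np

def marginalize_sqrt_cov(U, marg_idx):
    """Marginalize states from an upper-triangular square-root covariance.

    Assumption (illustrative): P = U.T @ U with U upper-triangular.
    Returns (U_star, used_qr), where U_star.T @ U_star equals the
    covariance of the kept states.
    """
    n = U.shape[0]
    marg = set(marg_idx)
    keep = np.array([i for i in range(n) if i not in marg], dtype=int)
    k = keep.size
    if np.array_equal(keep, np.arange(k)):
        # Marginalized states are at the bottom: direct extraction, no QR.
        return U[:k, :k].copy(), False
    # Marginalized states are in the middle: the kept columns are no longer
    # triangular, so re-triangularize them with a QR factorization.
    R = np.linalg.qr(U[:, keep], mode="r")
    return R, True

# Small demonstration on random data.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
P = M @ M.T + 6.0 * np.eye(6)
U = np.linalg.cholesky(P).T                    # upper-triangular, P = U^T U

U_bottom, qr_bottom = marginalize_sqrt_cov(U, [4, 5])
U_middle, qr_middle = marginalize_sqrt_cov(U, [2, 3])
print(qr_bottom, qr_middle)
```

In this sketch the QR is applied to all kept columns for simplicity. A structure-aware implementation could restrict it to the rows and columns below the first marginalized state, which is consistent with the caption's remark that placing marginalized states nearer the bottom reduces the QR cost.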

Theorems & Definitions (6)

  • Lemma 1
  • proof
  • Proposition 1
  • proof
  • Proposition 2
  • proof