Dense Matchers for Dense Tracking

Tomáš Jelínek, Jonáš Šerých, Jiří Matas

TL;DR

Dense long-term tracking across wide baselines is addressed by extending the MFT framework to work with dense matchers DKM and RoMa. The method constructs logarithmically spaced flow chains and selects the most reliable chain, while integrating the outputs of DKM/RoMa via calibrated occlusion and uncertainty signals. An ensemble strategy that combines RAFT-based occlusion with RoMa-based position yields strong, competitive results against non-causal trackers while remaining fully causal. This work demonstrates the versatility of MFT for dense tracking and points to future work in co-training and richer dense-tracking datasets.

Abstract

Optical flow is a useful input for various applications, including 3D reconstruction, pose estimation, tracking, and structure-from-motion. Despite its utility, the field of dense long-term tracking, especially over wide baselines, has not been extensively explored. This paper extends the concept of combining multiple optical flows over logarithmically spaced intervals as proposed by MFT. We demonstrate the compatibility of MFT with different optical flow networks, yielding results that surpass their individual performance. Moreover, we present a simple yet effective combination of these networks within the MFT framework. This approach proves to be competitive with more sophisticated, non-causal methods in terms of position prediction accuracy, highlighting the potential of MFT in enhancing long-term tracking applications.
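The chaining idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the helper names `log_spaced_deltas` and `select_most_reliable`, the cap of 32 frames, and the use of a single scalar uncertainty score per candidate are all assumptions made for illustration. The sketch works on a sparse set of N points rather than dense flow fields, and it assumes that each candidate chain has already been composed into a position, an accumulated uncertainty, and an occlusion flag.

```python
import numpy as np


def log_spaced_deltas(t, max_delta=32):
    """Return frame gaps 1, 2, 4, ... that fit before frame t.

    Hypothetical helper; the cap `max_delta` is an assumed parameter.
    """
    deltas = []
    d = 1
    while d <= min(t, max_delta):
        deltas.append(d)
        d *= 2
    return deltas


def select_most_reliable(positions, uncertainty, occluded):
    """Pick, per point, the candidate chain with the lowest uncertainty.

    positions:   (K, N, 2) candidate positions from K chains
    uncertainty: (K, N)    accumulated uncertainty score per chain
    occluded:    (K, N)    boolean occlusion flag per chain
    """
    # An occluded candidate can only win if every candidate is occluded.
    score = np.where(occluded, np.inf, uncertainty)
    best = np.argmin(score, axis=0)          # (N,) index of the winning chain
    idx = np.arange(positions.shape[1])
    chosen = positions[best, idx]            # (N, 2) selected positions
    all_occluded = occluded.all(axis=0)      # (N,) point hidden in all chains
    return chosen, all_occluded, best
```

For frame 10, `log_spaced_deltas(10)` gives the gaps 1, 2, 4 and 8, so the candidate chains would pass through frames 9, 8, 6 and 2. Selecting per point, rather than per frame, lets different points follow different chains.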

Paper Structure

This paper contains 26 sections, 5 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Illustration of the MFT flow chaining as defined in Equation \ref{eq:flowchain}. The optical flows are evaluated on the points in the outbound nodes of their respective arcs.
  • Figure 2: Visual comparison of selected dense tracking methods: (a) reference frame #0; (b)-(h) predicted positions of points in frame #140. All blue points are invisible in frame #140; blue points in (b)-(h) thus indicate false matches. Green points are visible both in frame #0 and frame #140. Red points highlight the points on the body of the lioness. Different shades are used to identify different points. The sequence is available at https://cmp.felk.cvut.cz/~serycjon/MFT/visuals/ugsJtsO9w1A-00.00.24.457-00.00.29.462_HD.mp4.
  • Figure 3: First frames of two selected TAP-Vid DAVIS sequences. Dots represent ground-truth tracking points; shades of green show the improvement in $<\delta^x_{avg}$ achieved by the Selective RoMa Position Prediction ensemble over methods (a)-(c), and shades of red show the converse.