Rethinking RAFT for Efficient Optical Flow

Navid Eslami, Farnoosh Arefi, Amir M. Mansourian, Shohreh Kasaei

TL;DR

This work tackles the challenges of large displacements and repetitive patterns in optical flow estimation by extending the RAFT framework. It introduces two modules: the Amorphous Lookup Operator (ALO), for flexible, far-range similarity querying, and the Attention-based Feature Localizer (AFL), which incorporates global context to disambiguate poorly textured regions. Ef-RAFT improves accuracy over RAFT by roughly 10% on Sintel and 5–6% on KITTI, at the cost of a roughly 33% reduction in speed and a roughly 13% increase in memory usage, while keeping a lean parameter count. The proposed methods offer a practical path to more accurate and efficient optical flow in real-world, memory-constrained settings, and the code is publicly available for reuse and extension.

Abstract

Despite significant progress in deep learning-based optical flow methods, accurately estimating large displacements and repetitive patterns remains a challenge. The limitations of local features and similarity search patterns used in these algorithms contribute to this issue. Additionally, some existing methods suffer from slow runtime and excessive graphics memory consumption. To address these problems, this paper proposes a novel approach based on the RAFT framework. The proposed Attention-based Feature Localization (AFL) approach incorporates the attention mechanism to handle global feature extraction and address repetitive patterns. It introduces an operator for matching pixels with corresponding counterparts in the second frame and assigning accurate flow values. Furthermore, an Amorphous Lookup Operator (ALO) is proposed to enhance convergence speed and improve RAFT's ability to handle large displacements by reducing data redundancy in its search operator and expanding the search space for similarity extraction. The proposed method, Efficient RAFT (Ef-RAFT), achieves significant improvements of 10% on the Sintel dataset and 5% on the KITTI dataset over RAFT. Remarkably, these enhancements are attained with a modest 33% reduction in speed and a mere 13% increase in memory usage. The code is available at: https://github.com/n3slami/Ef-RAFT

Paper Structure

This paper contains 17 sections, 5 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Diagram of RAFT (Teed and Deng, 2020), which comprises three main components. 1) Feature encoders extract per-pixel features from the input frames. 2) A correlation layer constructs a correlation volume with dimensions $W \times H \times W \times H$ by computing the inner product of feature vectors for all pairs. 3) An update operator recurrently refines the optical flow estimate by using the current estimate to retrieve values from the set of correlation volumes.
  • Figure 2: Structure of the grid used in the original lookup operator (left), compared to the transformed lookup operator used in the ALO (right).
  • Figure 3: Scalar parameter calculation network for the ALO.
  • Figure 4: Definition of the $x_p^{\pm}$ and $y_p^{\pm}$ values for a pixel $p$ in a poorly textured region. The orange points depict the pixels that may cause us to err in our estimation of $\Delta x_p$ and $\Delta y_p$.
  • Figure 5: Qualitative comparison between the proposed method and RAFT. Frames with orange and blue labels are from the Sintel and KITTI datasets, respectively.
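
The correlation volume described in the Figure 1 caption and the regular-grid lookup in Figure 2 (left) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `all_pairs_correlation` and `grid_lookup` are hypothetical, the lookup works on a single correlation volume rather than a set of them, and it rounds to the nearest pixel where RAFT interpolates. The transformed grid used by the ALO is not reproduced, since its parameterization is not given in the text above.

```python
import numpy as np

def all_pairs_correlation(f1, f2):
    """All-pairs correlation volume from two (H, W, D) feature maps.

    Entry [i, j, k, l] is the inner product of the feature at pixel (i, j)
    in frame 1 with the feature at pixel (k, l) in frame 2. The array is
    stored as (H, W, H, W), i.e. row-major, rather than W x H x W x H.
    """
    return np.einsum("ijd,kld->ijkl", f1, f2)

def grid_lookup(corr, flow, radius=1):
    """Simplified regular-grid lookup (assumption: nearest-pixel sampling).

    corr: (H, W, H, W) correlation volume.
    flow: (H, W, 2) current flow estimate, channel 0 = x, channel 1 = y.
    Returns (H, W, (2*radius+1)**2) correlation values read from a square
    grid centred on each pixel's current match; out-of-frame taps are 0.
    """
    H, W = corr.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    # Centre of the search grid: pixel position displaced by the flow.
    cx = np.rint(xs + flow[..., 0]).astype(int)
    cy = np.rint(ys + flow[..., 1]).astype(int)
    side = 2 * radius + 1
    out = np.zeros((H, W, side * side), dtype=corr.dtype)
    k = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ty, tx = cy + dy, cx + dx
            valid = (ty >= 0) & (ty < H) & (tx >= 0) & (tx < W)
            # Clip so indexing is safe, then zero out the invalid taps.
            vals = corr[ys, xs, np.clip(ty, 0, H - 1), np.clip(tx, 0, W - 1)]
            out[..., k] = np.where(valid, vals, 0.0)
            k += 1
    return out
```

Because the volume holds one value per pair of pixels, its size grows with the square of the number of pixels, which is why the memory cost of the lookup stage matters. With a zero flow field, the centre tap of `grid_lookup` (index 4 for `radius=1`) is simply the correlation of each pixel with the same location in the second frame.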