HMAFlow: Learning More Accurate Optical Flow via Hierarchical Motion Field Alignment

Dianbo Ma, Kousuke Imamura, Ziyan Gao, Xiangjie Wang, Satoshi Yamane

TL;DR

This work presents a novel method, dubbed HMAFlow, to improve optical flow estimation in challenging scenes, particularly those involving small objects, and demonstrates that the model achieves the best generalization performance compared to other state-of-the-art methods.

Abstract

Optical flow estimation is a fundamental and long-standing visual task. In this work, we present a novel method, dubbed HMAFlow, to improve optical flow estimation in challenging scenes, particularly those involving small objects. The proposed model mainly consists of two core components: a Hierarchical Motion Field Alignment (HMA) module and a Correlation Self-Attention (CSA) module. In addition, we rebuild 4D cost volumes by employing a Multi-Scale Correlation Search (MCS) layer and replacing average pooling in common cost volumes with a search strategy utilizing multiple search ranges. Experimental results demonstrate that our model achieves the best generalization performance compared to other state-of-the-art methods. Specifically, compared with RAFT, our method achieves relative error reductions of 14.2% and 3.4% on the clean pass and final pass of the Sintel online benchmark, respectively. On the KITTI test benchmark, HMAFlow surpasses RAFT and GMA in the Fl-all metric by relative margins of 6.8% and 7.7%, respectively. To facilitate future research, our code will be made available at https://github.com/BooTurbo/HMAFlow.
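As a rough illustration of the cost-volume idea mentioned in the abstract, the sketch below builds an all-pairs 4D correlation volume in the style popularized by RAFT and then reads it with several search radii instead of pooling it. This is not the authors' MCS layer: the function names, the zero padding, and the radii are assumptions made for illustration only, and the code is dependency-free Python for readability, whereas a real implementation would use a tensor library.

```python
import math


def all_pairs_correlation(feat1, feat2):
    """Build a 4D volume C[i][j][k][l] = <feat1[i][j], feat2[k][l]> / sqrt(D).

    feat1 and feat2 are nested lists of shape [H][W][D] holding one
    D-dimensional feature vector per pixel.
    """
    h1, w1 = len(feat1), len(feat1[0])
    h2, w2 = len(feat2), len(feat2[0])
    scale = 1.0 / math.sqrt(len(feat1[0][0]))

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return [[[[scale * dot(feat1[i][j], feat2[k][l]) for l in range(w2)]
              for k in range(h2)]
             for j in range(w1)]
            for i in range(h1)]


def lookup(volume, i, j, center, radius):
    """Read a (2r+1) x (2r+1) window around `center` for source pixel (i, j).

    Positions outside the second frame are filled with 0.0 (an assumed
    padding choice).
    """
    plane = volume[i][j]
    h2, w2 = len(plane), len(plane[0])
    ck, cl = center
    window = []
    for dk in range(-radius, radius + 1):
        row = []
        for dl in range(-radius, radius + 1):
            k, l = ck + dk, cl + dl
            inside = 0 <= k < h2 and 0 <= l < w2
            row.append(plane[k][l] if inside else 0.0)
        window.append(row)
    return window


def multi_range_lookup(volume, i, j, center, radii=(1, 2, 3, 4)):
    """Collect one window per search radius (hypothetical radii)."""
    return [lookup(volume, i, j, center, r) for r in radii]


# Tiny 2x2 feature maps with 2-channel features (made-up numbers).
f1 = [[[1.0, 0.0], [0.0, 1.0]],
      [[1.0, 1.0], [0.0, 0.0]]]
f2 = f1
vol = all_pairs_correlation(f1, f2)
windows = multi_range_lookup(vol, 0, 0, center=(0, 0), radii=(1, 2))
```

Each entry of `windows` is a square grid of correlation scores centred on the current match estimate, so a larger radius covers a wider displacement range at the same resolution.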

Paper Structure

This paper contains 16 sections, 6 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Visual comparisons with RAFT 2020raft on the Sintel 2012Sintel dataset. Our model provides more precise estimations for small targets and sharp edges, demonstrating the effectiveness of the proposed novel modules.
  • Figure 2: The overall framework of the proposed HMAFlow. It mainly consists of two key modules: 1) the Hierarchical Motion Field Alignment (HMA) module, and 2) the Correlation Self-Attention (CSA) module. Additionally, we develop a Multi-Scale Correlation Search (MCS) layer to extend the original 4D cost volume into two levels of multi-scale cost volumes (4 layers per level). For the optical flow regressor, we adopt the convolutional GRU 2014GRU network.
  • Figure 3: Illustration of the Multi-scale Search strategy. We apply multiple search ranges to perform lookup operations on each of the two-level base 4D cost volumes separately, with each level generating a 3D pyramid-shaped cost volume.
  • Figure 4: The structure of the Correlation Self-Attention (CSA) module. After the alignment process, the $1/8$ resolution 3D cost volumes are fed into the CSA module. The CSA module uses only a single optimized attention block: because the input 3D volumes are already of very high quality, one block is sufficient, and it balances performance against computational cost.
  • Figure 5: Visual comparisons on the Sintel 2012Sintel online benchmark. We compare the proposed HMAFlow with two representative algorithms, i.e., RAFT 2020raft and GMA 2021gma. As shown, our model excels at identifying small objects, clearly distinguishing the boundaries between objects, and providing more accurate and robust estimations. In contrast, the other two methods tend to blur the boundaries between objects and sometimes fail to recover small objects.
  • ...and 1 more figures
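
The Figure 4 caption refers to a single attention block applied to the aligned cost volumes. The captions do not specify that block's design, so the following is only a generic sketch of single-head scaled dot-product self-attention over a list of feature vectors, with identity query/key/value projections assumed for brevity. The names `softmax` and `self_attention` are illustrative and are not taken from the HMAFlow code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    tokens: list of N feature vectors, each of length D.
    Queries, keys, and values are the tokens themselves (identity
    projections, an assumption made to keep the sketch short).
    Returns N output vectors of length D.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in tokens:
        # Similarity of this query to every key, scaled by 1/sqrt(D).
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([sum(wt * v[c] for wt, v in zip(weights, tokens))
                        for c in range(d)])
    return outputs


# Two made-up 2-dimensional tokens.
tokens = [[1.0, 0.0], [0.0, 1.0]]
attended = self_attention(tokens)
```

Each output vector is a convex combination of the input vectors, weighted toward the inputs most similar to the query; in a learned block, trainable linear projections would replace the identity maps.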