Memory-Efficient Optical Flow via Radius-Distribution Orthogonal Cost Volume

Gangwei Xu, Shujun Chen, Hao Jia, Miaojie Feng, Xin Yang

TL;DR

The paper addresses the memory cost of full 4D cost volumes in high-resolution optical flow by introducing MeFlow, which uses a local orthogonal cost volume to decompose the 2D search into two 1D searches. It applies vertical and horizontal attention to the target features and uses a radius-distribution multi-scale lookup to model large displacements cheaply, rebuilding the cost volume around the current flow estimate at every GRU iteration. The resulting network is competitive on Sintel and KITTI and scales to 4K inputs. On $1080 \times 1920$ DAVIS images it produces results comparable to RAFT while consuming $6 \times$ less memory.

Abstract

The full 4D cost volume in Recurrent All-Pairs Field Transforms (RAFT) or global matching by Transformer achieves impressive performance for optical flow estimation. However, the memory consumption of these approaches increases quadratically with input resolution, rendering them impractical for high-resolution images. In this paper, we present MeFlow, a novel memory-efficient method for high-resolution optical flow estimation. The key to MeFlow is a recurrent local orthogonal cost volume representation, which decomposes the 2D search space dynamically into two 1D orthogonal spaces, enabling our method to scale effectively to very high-resolution inputs. To preserve essential information in the orthogonal space, we utilize self-attention to propagate feature information from the 2D space to the orthogonal space. We further propose a radius-distribution multi-scale lookup strategy to model the correspondences of large displacements at a negligible cost. We verify the efficiency and effectiveness of our method on the challenging Sintel and KITTI benchmarks, and on real-world 4K ($2160 \times 3840$) images. Our method achieves competitive performance on both Sintel and KITTI benchmarks, while maintaining the highest memory efficiency on high-resolution inputs.
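
To make the quadratic growth concrete, the following back-of-the-envelope sketch compares the size of an all-pairs 4D cost volume with the size of a two-line (horizontal plus vertical) local lookup. The function name and all default values are assumptions chosen for illustration: features at 1/8 of the input resolution, a lookup radius of 4, 4 pyramid levels, and 32-bit floats. They are not MeFlow's reported configuration, and the orthogonal figure counts only the correlation values produced per iteration, not the feature maps or attention buffers.

```python
def cost_volume_bytes(height, width, downsample=8, radius=4, levels=4,
                      bytes_per_value=4):
    """Rough byte counts for a full 4D cost volume vs. an orthogonal lookup.

    All defaults are illustrative assumptions, not values from the paper.
    """
    h, w = height // downsample, width // downsample
    n = h * w  # number of feature-map pixels

    # All-pairs volume: one correlation value for every (source, target) pair.
    full_4d = n * n * bytes_per_value

    # Orthogonal lookup: two 1D lines per pixel, each with 2*radius + 1
    # samples, repeated for every pyramid level.
    orthogonal = n * 2 * (2 * radius + 1) * levels * bytes_per_value

    return full_4d, orthogonal


if __name__ == "__main__":
    full, orth = cost_volume_bytes(2160, 3840)
    print(f"full 4D volume:    {full / 1e9:.1f} GB")
    print(f"orthogonal lookup: {orth / 1e6:.1f} MB")
```

Under these assumptions a 4K frame pair gives roughly 67 GB for the all-pairs volume against roughly 37 MB for the two-line lookup. Doubling both image dimensions multiplies the first figure by 16 and the second by 4, which is the scaling gap the abstract refers to.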

Paper Structure (16 sections, 6 equations, 8 figures, 7 tables)

Figures (8)

  • Figure 1: Comparison of our local orthogonal cost volume and the global static cost volume in Flow1D. Flow1D searches only two static orthogonal lines (green lines) for every pixel in the source image. In contrast, our method can dynamically search the entire 2D space based on the currently updated flow (green point).
  • Figure 2: Comparisons with Flow1D on the DAVIS dataset. Flow1D struggles to handle large motions. In contrast, our method performs well in these regions. Flow visualization is based on the color wheel shown in the corner of the first flow map.
  • Figure 3: Overview of the proposed MeFlow. We apply vertical and horizontal attention to the multi-scale target features to generate multi-scale attended features. Then, in each iteration, we index the multi-scale vertically attended features along the horizontal direction and the horizontally attended features along the vertical direction, based on the currently updated flow (green point). Specifically, the proposed radius-distribution multi-scale (MS) orthogonal lookup indexes finer-resolution features at small radii and coarser-resolution features at large radii. Finally, we construct a dynamic orthogonal cost volume by performing 1D correlation between the source image feature and the dynamically indexed attended features.
  • Figure 4: RAFT's multi-scale strategy vs. our radius-distribution multi-scale strategy.
  • Figure 5: Comparisons on high-resolution ($1080 \times 1920$) images from DAVIS. We achieve results comparable to RAFT while consuming $6 \times$ less memory. We achieve more accurate results than the SOTA memory-efficient Flow1D (indicated by red arrows).
  • ...and 3 more figures
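
The lookup described in the Figure 1 and Figure 3 captions can be sketched as follows. This is a minimal single-scale illustration, not the authors' implementation: it omits the vertical and horizontal attention and the radius-distribution multi-scale lookup, rounds the flow to the nearest pixel rather than interpolating, and scales the dot product by $\sqrt{C}$ as an assumed normalization. The function names `orthogonal_cost_volume` and `_correlate` are hypothetical.

```python
import numpy as np


def _correlate(src, tgt, x, y):
    """Dot-product correlation between src and tgt sampled at integer (x, y)."""
    H, W, C = src.shape
    valid = (x >= 0) & (x < W) & (y >= 0) & (y < H)
    # Clip so the indexing is safe, then zero out the out-of-bounds samples.
    sampled = tgt[np.clip(y, 0, H - 1), np.clip(x, 0, W - 1)]  # (H, W, C)
    corr = (src * sampled).sum(axis=-1) / np.sqrt(C)  # assumed normalization
    return np.where(valid, corr, 0.0)


def orthogonal_cost_volume(src, tgt, flow, radius):
    """Correlate each source pixel with two 1D lines in the target features.

    src, tgt: (H, W, C) feature maps; flow: (H, W, 2) holding (dx, dy).
    Returns (H, W, 2 * (2 * radius + 1)): horizontal costs, then vertical.
    """
    H, W, _ = src.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Centre of the search: where the current flow says each pixel lands.
    cx = xs + np.rint(flow[..., 0]).astype(int)
    cy = ys + np.rint(flow[..., 1]).astype(int)

    offsets = range(-radius, radius + 1)
    horizontal = [_correlate(src, tgt, cx + d, cy) for d in offsets]
    vertical = [_correlate(src, tgt, cx, cy + d) for d in offsets]
    return np.stack(horizontal + vertical, axis=-1)
```

Because the two lines are re-centred on the flow estimate at every call, repeated calls with an updated `flow` can reach any 2D displacement, while each call stores only `2 * (2 * radius + 1)` values per pixel instead of one value per target pixel.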