Memory-Efficient Optical Flow via Radius-Distribution Orthogonal Cost Volume
Gangwei Xu, Shujun Chen, Hao Jia, Miaojie Feng, Xin Yang
TL;DR
The paper tackles the memory burden of full 4D cost volumes in high-resolution optical flow methods by introducing MeFlow, which uses a local orthogonal cost volume to decompose the 2D search into two 1D searches along the horizontal and vertical directions. It combines vertical and horizontal local attention with a radius-distribution multi-scale lookup to model large displacements efficiently, and the cost volume is updated at each GRU iteration. The resulting network achieves competitive performance on Sintel and KITTI and scales to 4K inputs, with substantially lower peak memory usage than RAFT while maintaining or improving accuracy in challenging scenarios and on very high-resolution data.
Abstract
Methods built on the full 4D cost volume, such as Recurrent All-Pairs Field Transforms (RAFT) or Transformer-based global matching, achieve impressive performance for optical flow estimation. However, their memory consumption increases quadratically with input resolution, rendering them impractical for high-resolution images. In this paper, we present MeFlow, a novel memory-efficient method for high-resolution optical flow estimation. The key to MeFlow is a recurrent local orthogonal cost volume representation, which dynamically decomposes the 2D search space into two 1D orthogonal spaces, enabling our method to scale effectively to very high-resolution inputs. To preserve essential information in the orthogonal space, we use self-attention to propagate feature information from the 2D space to the orthogonal space. We further propose a radius-distribution multi-scale lookup strategy to model the correspondences of large displacements at negligible cost. We verify the efficiency and effectiveness of our method on the challenging Sintel and KITTI benchmarks and on real-world 4K ($2160\!\times\!3840$) images. Our method achieves competitive performance on both Sintel and KITTI, while maintaining the highest memory efficiency on high-resolution inputs.
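To make the scaling argument concrete, the following Python sketch counts cost-volume entries for an all-pairs volume and for a pair of 1D (horizontal and vertical) local volumes. It is a back-of-the-envelope estimate, not the authors' implementation: the 1/8 feature stride follows RAFT's usual setup, while the search radius of 4, the float32 storage, and all function names are assumptions made for illustration. It ignores feature maps, attention, and multi-scale lookup, so it bounds only the storage of the cost volume itself.

```python
def feature_positions(height, width, stride=8):
    """Number of feature-map positions after downsampling (stride is an assumption)."""
    return (height // stride) * (width // stride)


def all_pairs_entries(height, width, stride=8):
    """Entries in a single-level all-pairs 4D cost volume: one per pair of positions."""
    n = feature_positions(height, width, stride)
    return n * n


def orthogonal_entries(height, width, radius=4, stride=8):
    """Entries in two local 1D cost volumes (horizontal + vertical).

    Each position stores 2 * radius + 1 candidates per direction.
    The radius value is hypothetical, chosen only for illustration.
    """
    n = feature_positions(height, width, stride)
    return n * 2 * (2 * radius + 1)


def to_gib(entries, bytes_per_entry=4):
    """Convert an entry count to GiB, assuming float32 storage."""
    return entries * bytes_per_entry / 2**30


if __name__ == "__main__":
    h, w = 2160, 3840  # the 4K resolution quoted in the abstract
    full = all_pairs_entries(h, w)
    ortho = orthogonal_entries(h, w)
    print(f"all-pairs : {full} entries, {to_gib(full):.2f} GiB")
    print(f"orthogonal: {ortho} entries, {to_gib(ortho) * 1024:.2f} MiB")
    print(f"ratio     : {full // ortho}x")
```

Under these assumptions, a 4K input gives 129,600 feature positions, so the all-pairs volume holds about 1.68e10 entries (roughly 62.6 GiB in float32), whereas the two 1D volumes hold about 2.3 million entries (under 9 MiB). Doubling both image dimensions multiplies the all-pairs count by 16 but the orthogonal count by only 4, which is the quadratic-versus-linear gap the abstract describes.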
