Efficient Correlation Volume Sampling for Ultra-High-Resolution Optical Flow Estimation
Karlis Martins Briedis, Markus Gross, Christopher Schroers
TL;DR
This work targets the memory and compute bottleneck of dense all-pairs correlation volumes in ultra-high-resolution optical flow. It introduces a block-sparse, patch-major correlation volume sampler that is updated incrementally across RAFT iterations. The sampler achieves sub-quadratic memory complexity $O(P^{1.5})$ with $P = H \times W$ and matches RAFT's sampling operator exactly. In isolated-sampling and end-to-end evaluations, it performs on par with the default RAFT sampling while using up to 95% less memory. It also outperforms on-demand sampling by up to 90%. These gains yield up to 50% end-to-end inference savings in memory-constrained settings, especially at high resolutions. Applied to SEA-RAFT with a cascaded inference extension and evaluated on 8K data, it achieves state-of-the-art accuracy and efficiency, making high-fidelity optical flow practical for ultra-high-resolution video tasks.
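A back-of-the-envelope sketch makes the quadratic term concrete by counting the entries of a dense RAFT-style correlation pyramid. It rests on assumptions that are not stated in the text above. Features are taken at 1/8 of the input resolution, so `p` here counts feature-map pixels rather than image pixels. The pyramid has 4 levels and halves (with floor) only the second image's dimensions. Entries are float32. "8K" is taken to mean 7680×4320. The function names are hypothetical, and the sketch says nothing about the memory of the proposed sampler.

```python
def dense_corr_entries(height, width, downsample=8, levels=4):
    """Count entries of a RAFT-style all-pairs correlation pyramid.

    Assumptions (illustrative only): features at 1/downsample input resolution,
    and each pyramid level halves (with floor) only the image-2 dimensions.
    """
    h, w = height // downsample, width // downsample
    p = h * w  # number of feature-map pixels in image 1
    # Level k stores a (h >> k) x (w >> k) cost map for each of the p source pixels.
    return sum(p * (h >> k) * (w >> k) for k in range(levels))


def dense_corr_bytes(height, width, bytes_per_entry=4):
    """Memory in bytes, assuming float32 (4-byte) entries."""
    return dense_corr_entries(height, width) * bytes_per_entry


for name, (wd, ht) in {"1080p": (1920, 1080), "8K": (7680, 4320)}.items():
    gb = dense_corr_bytes(ht, wd) / 1e9
    print(f"{name}: {gb:,.2f} GB")
```

Under these assumptions the estimate is about 5.56 GB at 1080p and about 1,427.55 GB (roughly 1.4 TB) at 8K. The 8K input has 16 times as many pixels, and the dense volume grows roughly 256-fold, which is why storing it in full stops being an option.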
Abstract
Recent optical flow estimation methods often employ local cost sampling from a dense all-pairs correlation volume. This results in quadratic computational and memory complexity in the number of pixels. A memory-efficient alternative that computes costs on demand exists, but it is slower in practice. Prior methods therefore typically process images at reduced resolutions and miss fine-grained details. To address this, we propose a more efficient implementation of all-pairs correlation volume sampling that still computes exactly the mathematical operator defined by RAFT. Our approach outperforms on-demand sampling by up to 90% while maintaining low memory usage. It also performs on par with the default implementation while using up to 95% less memory. As cost sampling makes up a significant portion of the overall runtime, this can translate to up to 50% savings in total end-to-end model inference in memory-constrained environments. Our evaluation of existing methods includes an 8K ultra-high-resolution dataset and an additional inference-time modification of the recent SEA-RAFT method. With this, we achieve state-of-the-art results at high resolutions in both accuracy and efficiency.
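The abstract contrasts two ways of evaluating the same lookup. One precomputes the full volume. The other computes only the needed costs on demand. The toy pure-Python sketch below shows that both give identical samples and differ only in what is stored versus recomputed. It is a simplification under stated assumptions. It uses integer flow offsets, a single pyramid level, and zero padding outside the image. RAFT itself samples bilinearly at fractional positions over a multi-level pyramid. All names and sizes are hypothetical, and this is not the paper's implementation.

```python
import math
import random

# Toy setup (illustrative): feature maps as nested lists f[y][x] -> list of D floats.
H, W, D, RADIUS = 6, 7, 4, 2
rng = random.Random(0)
f1 = [[[rng.uniform(-1, 1) for _ in range(D)] for _ in range(W)] for _ in range(H)]
f2 = [[[rng.uniform(-1, 1) for _ in range(D)] for _ in range(W)] for _ in range(H)]
SCALE = 1.0 / math.sqrt(D)  # RAFT normalizes dot products by sqrt(feature dim)


def corr(a, b):
    """Correlation of two feature vectors: scaled dot product."""
    return sum(x * y for x, y in zip(a, b)) * SCALE


def build_dense_volume(f1, f2):
    """Precompute every pixel pair: volume[y1][x1][y2][x2], H*W*H*W entries."""
    return [[[[corr(f1[y1][x1], f2[y2][x2]) for x2 in range(W)] for y2 in range(H)]
             for x1 in range(W)] for y1 in range(H)]


def window(y, x, flow):
    """Integer positions in image 2 within RADIUS of (y, x) + flow."""
    cy, cx = y + flow[0], x + flow[1]
    return [(cy + dy, cx + dx) for dy in range(-RADIUS, RADIUS + 1)
            for dx in range(-RADIUS, RADIUS + 1)]


def inside(ty, tx):
    return 0 <= ty < H and 0 <= tx < W


def sample_dense(volume, y, x, flow):
    """Read the local window from the precomputed volume (zero outside the image)."""
    return [volume[y][x][ty][tx] if inside(ty, tx) else 0.0
            for ty, tx in window(y, x, flow)]


def sample_on_demand(f1, f2, y, x, flow):
    """Compute only the (2r+1)^2 dot products this query needs."""
    return [corr(f1[y][x], f2[ty][tx]) if inside(ty, tx) else 0.0
            for ty, tx in window(y, x, flow)]


volume = build_dense_volume(f1, f2)
flow = (1, -2)
dense = sample_dense(volume, 3, 4, flow)
lazy = sample_on_demand(f1, f2, 3, 4, flow)
print(dense == lazy, len(dense))  # True 25
```

The dense path pays quadratic memory once and then only reads values. The on-demand path stores nothing extra but recomputes dot products on every RAFT iteration. That trade-off is why the two existing implementations sit at opposite ends of the memory and runtime spectrum.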
