Efficient Correlation Volume Sampling for Ultra-High-Resolution Optical Flow Estimation

Karlis Martins Briedis, Markus Gross, Christopher Schroers

TL;DR

This work targets the memory and compute bottleneck of dense all-pairs correlation volumes in ultra-high-resolution optical flow. It introduces a block-sparse, patch-major correlation volume sampler that is updated incrementally across RAFT iterations, achieving sub-quadratic memory complexity $O(P^{1.5})$ with $P=H\times W$ while computing exactly the same operator as RAFT. In isolated-sampling and end-to-end evaluations, it performs on par with the default RAFT sampling while using up to 95% less memory, and it outperforms on-demand sampling by up to 90%, which yields significant end-to-end speedups, especially at high resolutions. When applied to SEA-RAFT and tested on 8K data with a cascaded inference extension, it achieves state-of-the-art accuracy and efficiency, making high-fidelity optical flow practical for ultra-high-resolution video tasks.

Abstract

Recent optical flow estimation methods often employ local cost sampling from a dense all-pairs correlation volume. This results in quadratic computational and memory complexity in the number of pixels. Although an alternative memory-efficient implementation with on-demand cost computation exists, this is slower in practice and therefore prior methods typically process images at reduced resolutions, missing fine-grained details. To address this, we propose a more efficient implementation of the all-pairs correlation volume sampling, still matching the exact mathematical operator as defined by RAFT. Our approach outperforms on-demand sampling by up to 90% while maintaining low memory usage, and performs on par with the default implementation with up to 95% lower memory usage. As cost sampling makes up a significant portion of the overall runtime, this can translate to up to 50% savings for the total end-to-end model inference in memory-constrained environments. Our evaluation of existing methods includes an 8K ultra-high-resolution dataset and an additional inference-time modification of the recent SEA-RAFT method. With this, we achieve state-of-the-art results at high resolutions both in accuracy and efficiency.
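To make the quadratic growth concrete, the following is a rough back-of-the-envelope sketch, not a measurement from the paper. It assumes features are computed at 1/8 of the input resolution and stored as 32-bit floats; both values, like the function names, are illustrative assumptions. It counts only the finest level of the correlation volume, and `growth` ignores constant factors, so it shows scaling behaviour only.

```python
def dense_volume_entries(height, width, downsample=8):
    """Entries in a dense all-pairs volume: one per (source, target) pixel pair.

    `downsample` is an assumed feature stride, not a value taken from the text.
    """
    p = (height // downsample) * (width // downsample)  # feature-map pixels
    return p * p


def dense_volume_bytes(height, width, downsample=8, bytes_per_value=4):
    """Memory for the finest level only, assuming 4-byte floats."""
    return dense_volume_entries(height, width, downsample) * bytes_per_value


def growth(scale, exponent):
    """Growth of an O(P^exponent) cost when both image sides are multiplied by `scale`."""
    return float(scale * scale) ** exponent


if __name__ == "__main__":
    for name, (h, w) in {"1080p": (1080, 1920), "4K": (2160, 3840), "8K": (4320, 7680)}.items():
        gib = dense_volume_bytes(h, w) / 2**30
        print(f"{name}: {gib:,.1f} GiB for the dense volume")
    # Doubling both sides multiplies P by 4.
    print("dense, O(P^2):      x", growth(2, 2.0))
    print("proposed, O(P^1.5): x", growth(2, 1.5))
```

Under these assumptions the dense volume for an 8K frame is on the order of a terabyte, which is why dense sampling is not an option at that resolution without downsampling the input.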

Paper Structure

This paper contains 39 sections, 3 equations, 12 figures, 10 tables.

Figures (12)

  • Figure 1: SEA-RAFT optical flow prediction on ultra-high-resolution frames. Estimating optical flow at a downsampled resolution loses many object details, while directly processing the high-resolution frames fails to estimate large motion. With our proposed extensions, both large motion and small details can be estimated, and inference is additionally $30\%$ faster thanks to our efficient correlation volume sampling algorithm.
  • Figure 2: Overview of the correlation volume sampling. Given a map of correlation between the features of a single source pixel and the features of every pixel in another image, bilinear sampling is used to extract local matching costs around a point of interest. When repeated for every source pixel, the costs are stored in a dense all-pairs correlation volume, where each row and column correspond to a source and target pixel, respectively. This is repeated on multiple levels of scale.
  • Figure 3: Sampling patterns of a single image over all RAFT iterations. Dark regions correspond to cells that have not been sampled while lighter values indicate more sampled values per block.
  • Figure 4: Algorithm overview. It consists of input preprocessing and 3 steps per iteration: a) determining blocks that need to be computed; b) computing selected blocks with block sparse matrix-matrix multiplication; c) sampling computed blocks.
  • Figure 5: Runtime and peak memory consumption depending on the RAFT input width. The standard deviation is shown as a shaded area; dotted lines mark the memory capacity of different hardware.
  • ...and 7 more figures
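
The three per-iteration steps named in the Figure 4 caption (determine the needed blocks, compute them, sample from them) can be illustrated with a deliberately tiny pure-Python sketch. This is not the authors' implementation: it handles a single source pixel, uses plain loops instead of block-sparse matrix multiplication, and all names (`needed_blocks`, `compute_block`, `sample`, `lookup`), the block size, and the zero padding outside the image are assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def needed_blocks(cx, cy, radius, block, height, width):
    """Step a: target-image blocks touched by the bilinear lookup window."""
    fx, fy = math.floor(cx), math.floor(cy)
    blocks = set()
    for y in range(max(fy - radius, 0), min(fy + radius + 1, height - 1) + 1):
        for x in range(max(fx - radius, 0), min(fx + radius + 1, width - 1) + 1):
            blocks.add((y // block, x // block))
    return blocks


def compute_block(src_feat, tgt, by, bx, block):
    """Step b: correlate one source feature with every pixel of one block."""
    h, w = len(tgt), len(tgt[0])
    return {
        (y, x): dot(src_feat, tgt[y][x])
        for y in range(by * block, min((by + 1) * block, h))
        for x in range(bx * block, min((bx + 1) * block, w))
    }


def sample(cache, block, cx, cy, radius, height, width):
    """Step c: bilinearly sample a (2r+1) x (2r+1) window from cached blocks."""
    def value(y, x):
        if not (0 <= y < height and 0 <= x < width):
            return 0.0  # assumed zero padding outside the image
        return cache[(y // block, x // block)][(y, x)]

    fx, fy = math.floor(cx), math.floor(cy)
    ax, ay = cx - fx, cy - fy
    window = []
    for dy in range(-radius, radius + 1):
        row = []
        for dx in range(-radius, radius + 1):
            y, x = fy + dy, fx + dx
            row.append((1 - ay) * (1 - ax) * value(y, x)
                       + (1 - ay) * ax * value(y, x + 1)
                       + ay * (1 - ax) * value(y + 1, x)
                       + ay * ax * value(y + 1, x + 1))
        window.append(row)
    return window


def lookup(src_feat, tgt, cache, cx, cy, radius=1, block=4):
    """One iteration: blocks already in `cache` are reused, not recomputed."""
    h, w = len(tgt), len(tgt[0])
    for by, bx in needed_blocks(cx, cy, radius, block, h, w):
        if (by, bx) not in cache:
            cache[(by, bx)] = compute_block(src_feat, tgt, by, bx, block)
    return sample(cache, block, cx, cy, radius, h, w)


# Toy data: an 8x8 target map with 2-channel features [y, x].
tgt = [[[float(y), float(x)] for x in range(8)] for y in range(8)]
cache = {}
first = lookup([1.0, 2.0], tgt, cache, cx=2.5, cy=1.25)
second = lookup([1.0, 2.0], tgt, cache, cx=3.0, cy=1.5)  # reuses cached blocks
```

In this toy setup the correlation is the linear function `y + 2x`, so bilinear interpolation reproduces it exactly, which makes the result easy to check by hand. Only two of the four 4x4 blocks are ever computed, and the second lookup adds none, which mirrors the idea of updating the sampled blocks incrementally across iterations.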