WAFT: Warping-Alone Field Transforms for Optical Flow

Yihan Wang, Jia Deng

TL;DR

WAFT is a simple and flexible meta-architecture with minimal inductive biases and reliance on custom designs that achieves the best zero-shot generalization on KITTI, while being up to 4.1x faster than methods with similar performance.

Abstract

We introduce Warping-Alone Field Transforms (WAFT), a simple and effective method for optical flow. WAFT is similar to RAFT but replaces the cost volume with high-resolution warping, achieving better accuracy with lower memory cost. This design challenges the conventional wisdom that constructing cost volumes is necessary for strong performance. WAFT is a simple and flexible meta-architecture with minimal inductive biases and reliance on custom designs. Compared with existing methods, WAFT ranks 1st on the Spring, Sintel, and KITTI benchmarks and achieves the best zero-shot generalization on KITTI, while being up to 4.1x faster than methods with similar performance. Code and model weights are available at https://github.com/princeton-vl/WAFT.
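
To make "warping" concrete, the following is a minimal NumPy sketch of backward warping, not the authors' implementation. It assumes a flow field stored as per-pixel (dx, dy) offsets from frame 1 to frame 2, bilinear interpolation, and clamping of out-of-range coordinates to the image border; the function name `backward_warp` and the array layout are illustrative choices, not taken from the WAFT code base.

```python
import numpy as np


def backward_warp(feat2, flow):
    """Sample frame-2 features at the locations the current flow points to.

    feat2: (H, W, C) feature map of frame 2.
    flow:  (H, W, 2) flow estimate; flow[y, x] = (dx, dy) in pixels (assumed layout).
    Returns an (H, W, C) array whose entry at (y, x) is the bilinear sample
    of feat2 at (y + dy, x + dx), with coordinates clamped to the border.
    """
    H, W, _ = feat2.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")

    # Target coordinates in frame 2, clamped so every lookup stays in bounds.
    x = np.clip(xs + flow[..., 0], 0, W - 1)
    y = np.clip(ys + flow[..., 1], 0, H - 1)

    # Integer corners surrounding each target coordinate.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)

    # Fractional offsets, broadcast over the channel axis.
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]

    # Bilinear interpolation: blend along x, then along y.
    top = (1 - wx) * feat2[y0, x0] + wx * feat2[y0, x1]
    bottom = (1 - wx) * feat2[y1, x0] + wx * feat2[y1, x1]
    return (1 - wy) * top + wy * bottom


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat1 = rng.standard_normal((4, 5, 3))
    feat2 = np.roll(feat1, 1, axis=1)       # frame 2 = frame 1 shifted right by 1 px
    flow = np.zeros((4, 5, 2))
    flow[..., 0] = 1.0                      # the matching flow: dx = 1, dy = 0
    warped = backward_warp(feat2, flow)
    # Away from the clamped right border, warping frame 2 recovers frame 1.
    print(np.allclose(warped[:, :-1], feat1[:, :-1]))
```

In this sketch each frame-1 pixel reads one interpolated feature vector from frame 2, so the working memory is a single (H, W, C) map, which is what makes the lookup cheap enough to run at high resolution.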

Paper Structure

This paper contains 40 sections, 3 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: The meta-architecture of WAFT consists of an input encoder and a recurrent update module. We first extract image features from the input encoder, and then use these features to iteratively update the flow estimate for $T$ steps. At each step, we perform feature indexing through a lightweight backward warping on the feature of frame 2, removing the dependency on expensive cost volume used by previous work.
  • Figure 2: For each pixel, the full cost volume calculates its visual similarity to all pixels in the other frame through correlation. The partial cost volume restricts the search range to the neighborhood of the corresponding pixel, marked by a red box. Compared with them, warping only uses the information from the corresponding pixel, offering better time and memory efficiency. This efficiency enables high-resolution processing, which leads to improved accuracy.
  • Figure 3: Visualizations of different methods on Spring (Mehl et al., 2023). WAFT, benefiting from high-resolution indexing, obtains sharper boundaries and lower errors than low-resolution approaches.
  • Figure 4: Visualizations on Spring, KITTI, and Sintel public benchmarks (from left to right).
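
The comparison in the Figure 2 caption can be turned into rough arithmetic. The sketch below counts how many frame-2 locations each frame-1 pixel consults under the three indexing schemes. The function name `lookups_per_pixel`, the square search window of side 2r + 1, and the example feature-map size and radius are assumptions chosen for illustration, not values from the paper.

```python
def lookups_per_pixel(mode, height, width, radius=0):
    """Number of frame-2 locations consulted for one frame-1 pixel.

    mode:   "full"    -> correlate against every pixel of the other frame
            "partial" -> correlate within a (2*radius + 1)^2 neighborhood
            "warp"    -> read only the single corresponding location
                         (bilinear sampling touches its 4 integer neighbors,
                         but it is still one sampled location)
    """
    if mode == "full":
        return height * width
    if mode == "partial":
        return (2 * radius + 1) ** 2
    if mode == "warp":
        return 1
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Hypothetical 64 x 128 feature map and search radius 4.
    H, W, r = 64, 128, 4
    for mode in ("full", "partial", "warp"):
        print(mode, lookups_per_pixel(mode, H, W, radius=r))
    # prints: full 8192, partial 81, warp 1
```

The full cost volume is the only scheme whose per-pixel count grows with image size, so its total cost grows quadratically in the number of pixels, while the partial cost volume and warping stay constant per pixel.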