High Resolution Multi-Scale RAFT (Robust Vision Challenge 2022)

Azin Jahedi; Maximilian Luz; Lukas Mehl; Marc Rivinius; Andrés Bruhn

TL;DR

The paper presents MS-RAFT+, an extension of MS-RAFT that adds a finer scale to the coarse-to-fine estimation. Computing correlation costs on demand, rather than precomputing an all-pairs cost volume, makes this fourth scale feasible: the flow is estimated at half the original resolution and then brought to full resolution by MS-RAFT's shared $\times 2$ convex upsampler. The method keeps a four-scale feature extractor and a shared refinement network, and it saves memory at the cost of on-demand computation and longer training and inference times. Training proceeds in three phases on a mix of datasets and uses a robust multi-scale loss with $q=0.7$ to improve cross-dataset generalization. In the Robust Vision Challenge 2022, MS-RAFT+ ranks first on VIPER and second on KITTI, Sintel, and Middlebury, which places it first in the overall ranking.
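To make the role of the exponent $q$ concrete, here is a minimal sketch of a robust multi-scale loss in plain Python. The summary only states $q=0.7$; the penalty form `(|residual|_1 + eps) ** q`, the value of `eps`, and the per-scale `weights` are assumptions for illustration, not the authors' exact formulation.

```python
def robust_flow_loss(pred, gt, q=0.7, eps=0.01):
    """Mean robust penalty over all pixels of one flow field.

    pred, gt: H x W nested lists of (u, v) flow vectors.
    The form (L1 residual + eps) ** q and eps = 0.01 are assumptions.
    """
    total, count = 0.0, 0
    for pred_row, gt_row in zip(pred, gt):
        for (pu, pv), (gu, gv) in zip(pred_row, gt_row):
            residual = abs(pu - gu) + abs(pv - gv)  # per-pixel L1 flow error
            total += (residual + eps) ** q          # q < 1 grows sublinearly
            count += 1
    return total / count


def multiscale_loss(preds, gts, weights, q=0.7, eps=0.01):
    """Weighted sum of the robust loss over scales (weights are hypothetical)."""
    return sum(
        w * robust_flow_loss(p, g, q, eps)
        for p, g, w in zip(preds, gts, weights)
    )
```

With $q<1$ the penalty grows more slowly than the plain L1 error, so a few large residuals contribute less to the total than they would under a linear penalty.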

Abstract

In this report, we present our optical flow approach, MS-RAFT+, that won the Robust Vision Challenge 2022. It is based on the MS-RAFT method, which successfully integrates several multi-scale concepts into single-scale RAFT. Our approach extends this method by exploiting an additional finer scale for estimating the flow, which is made feasible by on-demand cost computation. This way, it can not only operate at half the original resolution, but also use MS-RAFT's shared convex upsampler to obtain full resolution flow. Moreover, our approach relies on an adjusted fine-tuning scheme during training. This in turn aims at improving the generalization across benchmarks. Among all participating methods in the Robust Vision Challenge, our approach ranks first on VIPER and second on KITTI, Sintel, and Middlebury, resulting in the first place of the overall ranking.
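The idea behind on-demand cost computation can be sketched as follows: instead of storing a correlation value for every pair of pixels, costs are computed only inside a small window around each pixel's current flow target. The plain-Python sketch below is illustrative only; the function name, the integer-valued flow, the zero cost for out-of-bounds positions, and the `radius` parameter are assumptions (practical implementations typically sample at sub-pixel positions and run on the GPU).

```python
def on_demand_correlation(f1, f2, flow, radius=1):
    """Compute local matching costs without an all-pairs cost volume.

    f1, f2: H x W x C nested lists of feature vectors for frames 1 and 2.
    flow:   H x W nested list of integer (u, v) displacements (assumption).
    Returns an H x W x (2*radius+1)**2 nested list of dot-product costs.
    """
    h, w = len(f1), len(f1[0])
    costs = []
    for y in range(h):
        row = []
        for x in range(w):
            u, v = flow[y][x]
            cx, cy = x + u, y + v  # current match position in frame 2
            window = []
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ty, tx = cy + dy, cx + dx
                    if 0 <= ty < h and 0 <= tx < w:
                        # Dot product of the two feature vectors
                        window.append(
                            sum(a * b for a, b in zip(f1[y][x], f2[ty][tx]))
                        )
                    else:
                        window.append(0.0)  # out of bounds: zero cost
            row.append(window)
        costs.append(row)
    return costs
```

For an H x W feature map, an all-pairs volume holds (H·W)² entries, whereas this lookup produces only H·W·(2·radius+1)² values per call. The price is that the costs must be recomputed whenever the flow estimate changes, which matches the trade-off of lower memory for more computation.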

Paper Structure (9 sections, 2 figures, 2 tables)

Introduction
Approach
Architecture
Training
Inference
Evaluation
MS-RAFT+ vs. MS-RAFT
Cold warm-start vs. Cold-start
Robust Vision Challenge

Figures (2)

Figure 1: Our coarse-to-fine scheme. Best viewed as PDF.
Figure 2: Final leaderboard of Robust Vision Challenge 2022 (Top 5).
