SciFlow: Empowering Lightweight Optical Flow Models with Self-Cleaning Iterations

Jamie Menjay Lin, Jisoo Jeong, Hong Cai, Risheek Garrepalli, Kai Wang, Fatih Porikli

TL;DR

This work tackles real-time on-device optical flow for lightweight models, where iterative refinement often suffers from error propagation. It introduces Self-Cleaning Iterations (SCI), which provides a warping-consistency-based quality map to guide iterative correction, and Regression Focal Loss (RFL), a confidence-weighted regression loss that concentrates learning on high-error regions; together, they form SciFlow. SCI derives a dense quality map from feature warping and feeds it into the refinement loop, while RFL uses a confidence map $M_{conf} = e^{-\lVert f_{gt} - f_{pred} \rVert^2}$ to weight the loss as $l_i = \lVert (1 + \alpha (1 - M_{conf})^{\beta}) (f_{gt} - f_i) \rVert_1$, with training leveraging final-iteration confidence. When applied to two lightweight baselines, SciFlow yields substantial reductions in $EPE$ and $Fl\text{-}all$ on Sintel and KITTI (e.g., up to 6.3% and 10.5% in-domain, 6.2% and 13.5% cross-domain) with negligible inference overhead, enabling accurate, real-time optical flow on mobile and AR/VR devices. These improvements are complemented by on-device demonstrations showing minimal latency/power impact and potential for broader adoption in low-resource vision tasks.
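
As a rough illustration of the RFL weighting summarized in the TL;DR, the following NumPy sketch computes the confidence map from the final-iteration prediction and applies the resulting per-pixel weight to an L1 loss at every iteration. The function names, the (H, W, 2) array layout, the mean reduction over pixels, and the default values of `alpha` and `beta` are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def confidence_map(flow_gt, flow_pred):
    """Per-pixel confidence exp(-||f_gt - f_pred||^2) for flows of shape (H, W, 2)."""
    sq_err = np.sum((flow_gt - flow_pred) ** 2, axis=-1)
    return np.exp(-sq_err)

def regression_focal_loss(flow_gt, flow_preds, alpha=1.0, beta=1.0):
    """Return one weighted L1 loss per refinement iteration.

    flow_preds is a list of (H, W, 2) predictions, one per iteration.
    alpha and beta are placeholder values (assumption, not from the paper).
    """
    # The confidence map is taken from the final-iteration prediction.
    m_conf = confidence_map(flow_gt, flow_preds[-1])
    # Low confidence (large error) gives a weight above 1.
    weight = 1.0 + alpha * (1.0 - m_conf) ** beta
    losses = []
    for flow_i in flow_preds:
        l1 = np.abs(flow_gt - flow_i).sum(axis=-1)  # per-pixel L1 error
        # Mean over pixels is an assumed reduction for this sketch.
        losses.append(float(np.mean(weight * l1)))
    return losses
```

Pixels whose final prediction is already accurate keep a weight close to 1, while pixels with large residual error are weighted up to `1 + alpha`. A training framework would also need to decide whether gradients flow through the confidence map; this sketch does not address that.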

Abstract

Optical flow estimation is crucial to a variety of vision tasks. Despite substantial recent advancements, achieving real-time on-device optical flow estimation remains a complex challenge. First, an optical flow model must be sufficiently lightweight to meet computation and memory constraints to ensure real-time performance on devices. Second, the necessity for real-time on-device operation imposes constraints that weaken the model's capacity to adequately handle ambiguities in flow estimation, thereby intensifying the difficulty of preserving flow accuracy. This paper introduces two synergistic techniques, Self-Cleaning Iteration (SCI) and Regression Focal Loss (RFL), designed to enhance the capabilities of optical flow models, with a focus on addressing optical flow regression ambiguities. These techniques prove particularly effective in mitigating error propagation, a prevalent issue in optical flow models that employ iterative refinement. Notably, these techniques add negligible to zero overhead in model parameters and inference latency, thereby preserving real-time on-device efficiency. The effectiveness of our proposed SCI and RFL techniques, collectively referred to as SciFlow for brevity, is demonstrated across two distinct lightweight optical flow model architectures in our experiments. Remarkably, SciFlow enables substantial reduction in error metrics (EPE and Fl-all) over the baseline models by up to 6.3% and 10.5% for in-domain scenarios and by up to 6.2% and 13.5% for cross-domain scenarios on the Sintel and KITTI 2015 datasets, respectively.

Paper Structure

This paper contains 21 sections, 9 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: A zoomed-in demonstration of the effect of Self-Cleaning Iterations (SCI) against error propagation, a prevalent issue in iterative refinement for optical flow models. (a) The baseline model (RAFT-Small (Teed and Deng, 2020), one choice of model architecture) suffers from error propagation over iterations, especially near the arm and legs. (b) Applying SCI to the baseline model produces a "self-cleaning" effect over iterations, at negligible additional overhead in computation and model size. (c) Applying both SCI and RFL to the baseline model makes the "self-cleaning" effect even more visible, particularly around the arm and feet. On top of "Base+SCI", RFL changes only the training loss, so it adds no inference overhead.
  • Figure 2: Overview of our proposed approach. Self-Cleaning Iterations (SCI) enables the network to "self-assess" the quality of its flow prediction and then to "self-clean" that prediction within the standard iterative refinement process used by many optical flow models. Regression Focal Loss (RFL) derives a confidence map and guides the network to focus more on regions of high residual regression error during the iterations. ${\small{\textcircled{\scriptsize{C}}}}$ stands for the concatenation operator.
  • Figure 3: Concept for SCI map creation. (a) A pair of image features is taken as input. (b) The ground-truth flows point to their matches in the left sub-figure, while the estimated flows point incorrectly for some features in the right sub-figure. (c) F2' is derived by warping F2 with the estimated flows. (d) The tensor-wise difference between F1 and F2' is evaluated with a Gaussian kernel (Eq. \ref{eq:gaussian}) to measure their affinity. (e) A dense SCI map is produced as output.
  • Figure 4: Regression Focal Loss. Conventional optical flow training uses an equal-weight loss across all pixels (Eq. \ref{eq:optical_each_l1}). Regression Focal Loss instead generates a confidence map from the optical flow prediction and the ground truth (Eq. \ref{eq:conf}) and applies it to the optical flow loss (Eq. \ref{eq:rfl_l1}) so that the model can focus on difficult areas in the dataset.
  • Figure 5: Qualitative results on the Sintel (train) dataset using the RAFT-Small architecture (trained with C+T). The first and second rows are the input images. The third row is the ground truth. The fourth row is the output of RAFT-Small. The fifth and sixth rows are the outputs of RAFT-Small + SCI and RAFT-Small + SCI + RFL, respectively.
  • ...and 1 more figure
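
The SCI map construction outlined in the Figure 3 caption (warp F2 by the estimated flow, compare with F1, apply a Gaussian kernel) can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the exact kernel in the paper's Eq. `eq:gaussian` is not reproduced in this summary, so the form `exp(-||F1 - F2'||^2 / (2 * sigma^2))`, the `sigma` parameter, the function names, the (H, W, C) layout, and the nearest-neighbor warping (chosen for brevity; bilinear sampling is the more common choice) are all hypothetical.

```python
import numpy as np

def warp_nearest(feat2, flow):
    """Warp feat2 (H, W, C) toward frame 1 using flow (H, W, 2).

    flow[..., 0] is the horizontal (x) displacement and flow[..., 1] the
    vertical (y) displacement. Nearest-neighbor lookup with border clamping
    is a simplification for this sketch.
    """
    h, w, _ = feat2.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x_src = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    y_src = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return feat2[y_src, x_src]

def sci_map(feat1, feat2, flow, sigma=1.0):
    """Dense quality map in (0, 1]; values near 1 mean F1 and warped F2 agree."""
    feat2_warped = warp_nearest(feat2, flow)  # F2' in Figure 3
    sq_diff = np.sum((feat1 - feat2_warped) ** 2, axis=-1)
    # Assumed Gaussian kernel form; sigma is a hypothetical parameter.
    return np.exp(-sq_diff / (2.0 * sigma ** 2))

# Usage: frame 2 features are frame 1 features shifted one pixel to the right.
feat1 = np.arange(4 * 5 * 3, dtype=float).reshape(4, 5, 3)
feat2 = np.roll(feat1, 1, axis=1)
true_flow = np.zeros((4, 5, 2))
true_flow[..., 0] = 1.0
q_good = sci_map(feat1, feat2, true_flow)           # correct flow
q_bad = sci_map(feat1, feat2, np.zeros((4, 5, 2)))  # wrong (zero) flow
```

With the correct flow, the map is 1 everywhere except the clamped right border column; with the wrong flow, it drops toward 0, which is the signal the refinement loop can use to identify unreliable predictions.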