Cost Function Unrolling in Unsupervised Optical Flow

Gal Lifshitz, Dan Raviv

TL;DR

The paper tackles the non-differentiability of the $L^1$ Total Variation (TV) term in unsupervised learning, particularly for optical flow with occlusions, by introducing Cost Unrolling, a differentiable proxy derived from an ADMM-inspired unrolling of the TV objective. The method replaces the non-differentiable TV term with a differentiable smoothness loss $\mathcal{L}_\text{sm}^T$, obtained by unrolling $T$ ADMM steps and accumulating their constraints; $T=2$ is reported to be effective. It achieves faster convergence and improved performance on synthetic piecewise-constant (PC) signals, image denoising, and unsupervised optical flow benchmarks (MPI Sintel, KITTI 2015) without modifying model architectures or increasing inference cost, including up to a 15.82% reduction in end-point error (EPE) on occluded pixels, where the smoothness constraint is dominant. The results indicate that higher-quality gradients during training can yield sharper motion boundaries and better predictions, suggesting applicability to other non-differentiable regularizers in deep learning.

Abstract

Steepest descent algorithms, which are commonly used in deep learning, use the gradient as the descent direction, either as-is or after a direction shift using preconditioning. In many scenarios calculating the gradient is numerically hard due to complex or non-differentiable cost functions, specifically next to singular points. In this work we focus on the derivation of the Total Variation semi-norm commonly used in unsupervised cost functions. Specifically, we derive a differentiable proxy to the hard L1 smoothness constraint in a novel iterative scheme which we refer to as Cost Unrolling. Producing more accurate gradients during training, our method enables finer predictions of a given DNN model through improved convergence, without modifying its architecture or increasing computational complexity. We demonstrate our method in the unsupervised optical flow task. Replacing the L1 smoothness constraint with our unrolled cost during the training of a well known baseline, we report improved results on both MPI Sintel and KITTI 2015 unsupervised optical flow benchmarks. Particularly, we report EPE reduced by up to 15.82% on occluded pixels, where the smoothness constraint is dominant, enabling the detection of much sharper motion edges.
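To make the "singular points" mentioned above concrete, the following worked equations show where the $L^1$ smoothness term fails to be differentiable, together with the Huber and Charbonnier relaxations that appear as baselines in the figures. This is a sketch in standard textbook notation rather than the paper's own equations: the anisotropic form of the TV sum and the parameters $\epsilon$ and $\delta$ are assumptions made for illustration.

```latex
% TV of a flow field F, written as the L1 norm of its spatial gradient
% (anisotropic form, assumed here for illustration)
\mathrm{TV}(\mathbf{F}) = \|\nabla \mathbf{F}\|_1 = \sum_i \left| (\nabla \mathbf{F})_i \right|

% Subdifferential of |x|: a single value away from zero, an interval at zero
\partial |x| =
\begin{cases}
  \{\operatorname{sign}(x)\}, & x \neq 0 \\
  [-1, 1],                    & x = 0
\end{cases}

% Common smooth relaxations (\epsilon > 0 and \delta > 0 are illustrative)
\phi_{\text{Charbonnier}}(x) = \sqrt{x^2 + \epsilon^2}
\qquad
\phi_{\text{Huber}}(x) =
\begin{cases}
  \dfrac{x^2}{2\delta},      & |x| \le \delta \\
  |x| - \dfrac{\delta}{2},   & |x| > \delta
\end{cases}
```

The singular point is $(\nabla \mathbf{F})_i = 0$, which is exactly where a smooth flow should sit. The relaxations above remove the kink by altering the penalty near zero, whereas the abstract describes replacing the term with an unrolled proxy.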

Paper Structure

This paper contains 22 sections, 23 equations, 10 figures, 7 tables, 2 algorithms.

Figures (10)

  • Figure 1: Sintel Final benchmark qualitative example. Training an optical flow model using our cost function enables the detection of sharper motion boundaries through improved convergence, without modifying the model's architecture or increasing complexity. Displayed are the predicted Sintel Final benchmark 'Market 1' flows (bottom) and errors (top) of both our method and the ARFlow [liu2020learning] baseline, with close-ups on specific regions. White regions indicate high measured errors.
  • Figure 2: Unrolled smoothness constraint block diagram. In each training iteration, given a flow prediction, its spatial gradient $\mathbf{\nabla F}$ is computed. Starting from $\boldsymbol{\beta}^{(0)}=\mathbf{Q}^{(0)}=\mathbf{0}$, the Soft Thresholding and Multipliers Update steps are carried out for $t \in \{1,...,T\}$ to produce $\{\mathbf{Q}^{(t)},\boldsymbol{\beta}^{(t)} \}_{t=1}^{T}$, which are then used together with $\mathbf{\nabla F}$ to construct our smoothness constraint $\mathcal{L}_\text{sm}^T$ as in (\ref{eq:loss_sm}).
  • Figure 3: Piecewise-constant (PC) signal prediction - training. In each experiment we train a simple model to predict a full, randomly generated 1D PC signal given a fraction of its samples and smoothness regularization. We train using our method, as well as standard TV and common $L^1$ relaxations. Displayed are the mean and standard deviation of gradient norms (b) and prediction errors (c), measured during training over several independent experiments with the best configurations. An example of gradients recorded at a selected point is given in (a).
  • Figure 4: Image denoising qualitative examples. Displayed are images corresponding to two example scenarios - Zebra (top) and Peppers (bottom). From left to right: original, corrupted and reconstructed images obtained by TV regularization (DIP-TV) and our method. The corrupted images are labeled with the added noise levels, and each reconstruction is labeled with its corresponding PSNR [dB] with respect to the original image. Patches and corresponding pSSIM scores are highlighted in colored bounding boxes.
  • Figure 5: Unrolled cost vs. TV relaxations - qualitative examples. Shown are Sintel (top) and KITTI (bottom) flows predicted by the ARFlow and SMURF baselines, respectively, using our method vs. standard TV, Huber and Charbonnier baselines. The reference images and GT occlusion masks (white denotes occluded pixels) are also given. Training with our smoothness regularizer produces flows that are significantly more accurate in the occluded regions, which correlate with motion boundaries.
  • ...and 5 more figures
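
The unrolled smoothness constraint described in the Figure 2 caption can be sketched in a few lines of plain Python for a 1D signal. This is a minimal illustration under stated assumptions, not the authors' implementation: the forward difference stands in for the spatial gradient, the update rules are the standard ADMM soft-thresholding and multiplier steps, and the accumulated quadratic form of the loss, along with the parameter names `lam` and `rho` and their default values, is hypothetical, since the page does not reproduce the loss equation.

```python
def soft_threshold(x, tau):
    """Shrink x toward zero by tau: sign(x) * max(|x| - tau, 0)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def forward_diff(signal):
    """1D stand-in for the spatial gradient: differences of neighbours."""
    return [b - a for a, b in zip(signal, signal[1:])]


def unrolled_smoothness(signal, T=2, lam=0.1, rho=1.0):
    """Accumulate a quadratic penalty over T unrolled ADMM-style steps.

    Assumed loss form (hypothetical, for illustration only):
        (rho / 2T) * sum_t || grad - Q^(t) + beta^(t) ||^2
    """
    grad = forward_diff(signal)
    q = [0.0] * len(grad)      # Q^(0) = 0
    beta = [0.0] * len(grad)   # beta^(0) = 0
    total = 0.0
    for _ in range(T):
        # Soft Thresholding step: closed-form minimiser over the auxiliary Q.
        q = [soft_threshold(g + b, lam / rho) for g, b in zip(grad, beta)]
        # Multipliers Update step: accumulate the constraint residual.
        beta = [b + g - qi for b, g, qi in zip(beta, grad, q)]
        # Quadratic term built from the gradient and the current Q, beta.
        total += sum((g - qi + b) ** 2 for g, qi, b in zip(grad, q, beta))
    return 0.5 * rho * total / T


if __name__ == "__main__":
    flat = [1.0, 1.0, 1.0, 1.0]   # constant signal: no gradient to penalise
    step = [0.0, 0.0, 1.0, 1.0]   # single sharp edge
    print(unrolled_smoothness(flat))
    print(unrolled_smoothness(step))
```

Each accumulated term is quadratic in the signal's gradient once `q` and `beta` are held fixed, which is what makes a proxy of this kind differentiable at points where the $L^1$ term is not. Whether gradients should also flow through `q` and `beta` in an autodiff framework is a design choice this sketch does not address.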