Cost Function Unrolling in Unsupervised Optical Flow
Gal Lifshitz, Dan Raviv
TL;DR
The paper tackles the non-differentiability of the $L^1$ Total Variation (TV) term in unsupervised learning, particularly for optical flow with occlusions, by introducing Cost Unrolling, a differentiable proxy derived from an ADMM-inspired unrolling of the TV objective. The method replaces the non-differentiable TV term with a differentiable smoothness loss $\text{L}_\text{sm}^T$, obtained by unrolling $T$ ADMM steps and accumulating their constraints, with $T=2$ shown to be effective. It achieves faster convergence and improved performance on synthetic piecewise-constant (PC) signals, image denoising, and unsupervised optical flow benchmarks (MPI Sintel, KITTI 2015), without modifying model architectures or increasing inference cost. On occluded pixels, where the smoothness constraint dominates, end-point error (EPE) drops by up to 15.82%. The results indicate that higher-quality gradients during training can yield sharper motion boundaries and better predictions, suggesting applicability to other non-differentiable regularizers in deep learning.
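To make the idea of accumulating ADMM constraints concrete, here is a minimal NumPy sketch on a 1D signal. It is not the authors' implementation: the function names, the zero initialization of the auxiliary and dual variables, the update order, and the parameters `lam` and `rho` are all assumptions based on the standard ADMM splitting of a TV objective (auxiliary variable $q \approx Df$, scaled dual variable $\beta$).

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * |x| (element-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def forward_diff(f):
    """First-order finite differences D f of a 1D signal."""
    return f[1:] - f[:-1]


def tv_l1(f):
    """Plain L1 total variation; non-differentiable wherever D f == 0."""
    return np.abs(forward_diff(f)).sum()


def unrolled_smoothness(f, T=2, lam=0.1, rho=1.0):
    """Hypothetical unrolled proxy for lam * TV(f).

    Runs T ADMM-style updates of the auxiliary variable q and the scaled
    dual variable beta with f held fixed, and averages the quadratic
    constraint residuals. In a training loop, q and beta would be treated
    as constants (stop-gradient), an assumption of this sketch, so the
    loss is a smooth quadratic in D f.
    """
    d = forward_diff(f)
    q = np.zeros_like(d)      # assumed initialization
    beta = np.zeros_like(d)   # assumed initialization
    loss = 0.0
    for _ in range(T):
        q = soft_threshold(d + beta, lam / rho)          # q-update
        loss += 0.5 * rho * np.sum((d - q + beta) ** 2)  # accumulate constraint
        beta = beta + d - q                              # dual update
    return loss / T


step = np.array([0.0, 0.0, 1.0, 1.0])
print(tv_l1(step), unrolled_smoothness(step, T=2))
```

Under these assumptions, the first unrolled term has gradient `rho * clip(D f, -lam/rho, lam/rho)` with respect to `D f`, which matches the $L^1$ subgradient `lam * sign(D f)` for large differences while remaining well defined near zero.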
Abstract
Steepest descent algorithms, which are commonly used in deep learning, use the gradient as the descent direction, either as-is or after a direction shift using preconditioning. In many scenarios calculating the gradient is numerically hard due to complex or non-differentiable cost functions, specifically next to singular points. In this work we focus on the derivation of the Total Variation semi-norm commonly used in unsupervised cost functions. Specifically, we derive a differentiable proxy to the hard L1 smoothness constraint in a novel iterative scheme which we refer to as Cost Unrolling. Producing more accurate gradients during training, our method enables finer predictions of a given DNN model through improved convergence, without modifying its architecture or increasing computational complexity. We demonstrate our method in the unsupervised optical flow task. Replacing the L1 smoothness constraint with our unrolled cost during the training of a well known baseline, we report improved results on both MPI Sintel and KITTI 2015 unsupervised optical flow benchmarks. Particularly, we report EPE reduced by up to 15.82% on occluded pixels, where the smoothness constraint is dominant, enabling the detection of much sharper motion edges.
