DCVNet: Dilated Cost Volume Networks for Fast Optical Flow
Huaizu Jiang, Erik Learned-Miller
TL;DR
DCVNet proposes a single-pass optical flow model that uses multiple dilated cost volumes to capture both small and large displacements without sequential refinement. A U-Net converts the concatenated dilated volumes into interpolation weights, enabling a weighted combination of candidate displacements to produce the flow. The approach achieves competitive accuracy on Sintel and KITTI while maintaining real-time performance (30 fps) on a mid-range GPU, thanks to efficient cost-volume construction and a lightweight decoding stage. Training combines SceneFlow pre-training with targeted fine-tuning, and ablations demonstrate the benefit of multiple dilation factors and supervised interpolation weights. This method offers a fast alternative to coarse-to-fine or recurrent cost-volume strategies in optical flow estimation.
Abstract
The cost volume, which captures the similarity of possible correspondences across two input images, is a key ingredient in state-of-the-art optical flow approaches. When sampling correspondences to build the cost volume, a large neighborhood radius is required to handle large displacements, which introduces a significant computational burden. To address this, coarse-to-fine or recurrent processing of the cost volume is usually adopted, so that correspondence sampling in a local neighborhood with a small radius suffices. In this paper, we propose an alternative: constructing cost volumes with different dilation factors to capture small and large displacements simultaneously. A U-Net with skip connections converts the dilated cost volumes into interpolation weights over all captured candidate displacements, and the optical flow is obtained as the resulting weighted combination. Our proposed model, DCVNet, only needs to process the cost volume once in a simple feedforward manner and does not rely on a sequential processing strategy. DCVNet obtains accuracy comparable to existing approaches and achieves real-time inference (30 fps on a mid-range GTX 1080 Ti GPU). The code and model weights are available at https://github.com/neu-vi/ezflow.
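The two ideas in the abstract, sampling correspondences at several dilation factors and turning the result into interpolation weights over candidate displacements, can be illustrated with a small NumPy sketch. This is not the DCVNet implementation: the function names, the dot-product similarity, the zero padding, and the radius and dilation values are all assumptions made for illustration, and a temperature-scaled softmax over the raw costs stands in for the learned U-Net that produces the weights in the paper.

```python
import numpy as np

def dilated_cost_volume(f1, f2, radius=1, dilation=1):
    """Hypothetical helper: dot-product cost volume with dilated sampling.

    f1, f2: feature maps of shape (C, H, W).
    Returns costs of shape (K, H, W) and candidate offsets (K, 2) as (dx, dy),
    where K = (2 * radius + 1) ** 2 and each offset is a multiple of `dilation`.
    """
    _, H, W = f1.shape
    pad = radius * dilation
    # Zero-pad f2 so every shifted window stays in bounds (an assumption).
    f2p = np.pad(f2, ((0, 0), (pad, pad), (pad, pad)))
    costs, offsets = [], []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            oy, ox = dy * dilation, dx * dilation
            # shifted[:, y, x] == f2[:, y + oy, x + ox] (zero outside the image)
            shifted = f2p[:, pad + oy:pad + oy + H, pad + ox:pad + ox + W]
            costs.append((f1 * shifted).sum(axis=0))
            offsets.append((ox, oy))
    return np.stack(costs), np.array(offsets, dtype=np.float64)

def softmax_weights(cost, temperature=1.0):
    """Stand-in for the learned U-Net: softmax over the candidate axis."""
    logits = cost / temperature
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=0, keepdims=True)

def flow_from_weights(weights, offsets):
    """Flow as the weighted combination of candidate displacements."""
    return np.einsum("khw,kc->chw", weights, offsets)  # (2, H, W): (u, v)

# Toy example: f2 is f1 shifted right by 2 pixels, so the true flow is (2, 0).
rng = np.random.default_rng(0)
f1 = rng.standard_normal((64, 8, 12))
f1 /= np.linalg.norm(f1, axis=0, keepdims=True)  # unit-norm features per pixel
f2 = np.roll(f1, shift=2, axis=2)

# Concatenate volumes built with dilation 1 (small) and 2 (larger displacements).
volumes, offsets = zip(*(dilated_cost_volume(f1, f2, radius=1, dilation=d)
                         for d in (1, 2)))
cost = np.concatenate(volumes)        # (18, H, W)
candidates = np.concatenate(offsets)  # (18, 2)

weights = softmax_weights(cost, temperature=0.02)
flow = flow_from_weights(weights, candidates)
```

In this toy setup the displacement of 2 pixels is only reachable through the dilation-2 volume, which is the point of combining several dilation factors at a fixed small radius. Away from the right image border, where the wrapped-around columns have no valid match, the recovered flow is close to (2, 0).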
