Kornia: an Open Source Differentiable Computer Vision Library for PyTorch

Edgar Riba, Dmytro Mishkin, Daniel Ponsa, Ethan Rublee, Gary Bradski

TL;DR

Kornia addresses the gap in PyTorch where standard computer vision algorithms are CPU-bound and non-differentiable, hindering integration into deep learning pipelines. The authors present Kornia, a PyTorch-based library of differentiable, GPU-accelerated operators and modules (color, filters, geometry, features, losses) that mirror OpenCV-like functionality and can be embedded as network layers. They demonstrate practical use cases including batch image processing benchmarks, image registration by gradient descent, multi-view depth estimation, and differentiable local-feature matching with adversarial attack demonstrations, illustrating the practicality and performance benefits of differentiable CV within DL workflows. The work emphasizes production readiness and scalability through batching, distributed execution, and PyTorch TorchScript integration, aiming to broaden adoption and collaboration in open-source CV within DL contexts.

Abstract

This work presents Kornia, an open source computer vision library consisting of a set of differentiable routines and modules for solving generic computer vision problems. The package uses PyTorch as its main backend, both for efficiency and to take advantage of reverse-mode automatic differentiation to define and compute the gradients of complex functions. Inspired by OpenCV, Kornia is composed of a set of modules containing operators that can be inserted inside neural networks to train models to perform image transformations, camera calibration, epipolar geometry, and low-level image processing techniques, such as filtering and edge detection, that operate directly on high-dimensional tensor representations. Examples of classical vision problems implemented using our framework are provided, including a benchmark comparing it to existing vision libraries.
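
To make the idea of a differentiable, batched operator concrete, the following is a minimal sketch of a Sobel edge filter written directly with PyTorch primitives. It is not Kornia's implementation or API: the function name `sobel_magnitude`, the `eps` parameter, and the replicate padding are assumptions made for illustration. It only shows that such an operator runs on whole batches, on whichever device holds the input, and lets gradients flow back to the pixels.

```python
import torch
import torch.nn.functional as F


def sobel_magnitude(images: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Sobel gradient magnitude for a batch shaped (B, C, H, W).

    Hypothetical helper for illustration; not taken from Kornia.
    """
    b, c, h, w = images.shape
    kx = torch.tensor(
        [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]],
        dtype=images.dtype,
        device=images.device,
    )
    ky = kx.t()
    # One (x, y) kernel pair per channel, applied as a grouped convolution.
    weight = torch.stack([kx, ky]).unsqueeze(1).repeat(c, 1, 1, 1)  # (2C, 1, 3, 3)
    # Replicate-pad by one pixel so the output keeps the input resolution.
    padded = F.pad(images, (1, 1, 1, 1), mode="replicate")
    grads = F.conv2d(padded, weight, groups=c).reshape(b, c, 2, h, w)
    gx, gy = grads[:, :, 0], grads[:, :, 1]
    # eps keeps the square root differentiable where the gradient is zero.
    return torch.sqrt(gx * gx + gy * gy + eps)


# The same code runs on CPU or GPU, depending on where the batch lives.
device = "cuda" if torch.cuda.is_available() else "cpu"
batch = torch.rand(4, 3, 32, 32, device=device, requires_grad=True)
edges = sobel_magnitude(batch)
edges.mean().backward()  # gradients reach the input pixels
```

Because the operator is built only from differentiable tensor operations, it could be placed inside a network like any other layer; `batch.grad` holds the derivative of the mean edge response with respect to every input pixel.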

Paper Structure

This paper contains 13 sections, 3 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: The library implements routines for low-level image processing tasks using native PyTorch operators and their custom optimization. The library is intended for large-scale vision projects, data augmentation, or creating computer vision layers inside neural networks that allow backpropagating error through them. The above results are obtained from a given batch of images using data parallelism on the GPU.
  • Figure 2: Left: Python script showing our image processing API. Notice that the API is transparent to the device, and can be easily combined with other PyTorch components. Right: Results of the benchmark comparing Kornia to other state-of-the-art vision libraries. We measure the elapsed time for computing Sobel edges (lower is better).
  • Figure 3: Results of the image registration by gradient descent. Each column represents a different level of the image pyramid used to optimize the loss function. Row 1: the original source image; Row 2: the original destination image; Row 3: the source image warped to the destination at the end of the optimization loop at that specific scale level; Row 4: the photometric error between the image warped using the estimated homography and the image warped using the ground truth homography. The algorithm starts to converge at the lower scales, refining the solution as it moves to the upper levels of the pyramid.
  • Figure 4: Results of the depth estimation by gradient descent, showing the depth map produced from the given set of calibrated camera images over different scales. Each column represents a level of a multi-resolution image pyramid. Row 1 to 3: the source images, where the 2nd row is the reference view; Row 3: the images from row 1 and 3 warped to the reference camera given the depth at that particular scale level. Row 4 & 5: the estimated depth map and the per-pixel error compared to the ground truth depth map in the reference camera. The data used for these experiments was extracted from the SceneNet RGB-D dataset (McCormac et al., ICCV 2017), which contains photorealistic indoor image trajectories.
  • Figure 5: Targeted adversarial attack on image matching. From top to bottom: original images, which do not match; images optimized by gradient descent to have local features that match; the result of the attack: matching features (Hessian detector + SIFT descriptor). The matching features shown are those which survived RANSAC geometric verification.
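
The registration experiment in Figure 3 treats the warp parameters as learnable variables and minimizes a photometric error by gradient descent. A heavily simplified sketch of that idea follows. It assumes a translation-only warp at a single scale rather than the homography and image pyramid described in the caption, and it uses plain PyTorch rather than Kornia's API. The helper names (`make_blob`, `warp_translate`, `register`), the synthetic Gaussian image, and the optimizer settings are all hypothetical choices made for illustration.

```python
import torch
import torch.nn.functional as F


def make_blob(size: int = 64, sigma: float = 0.3) -> torch.Tensor:
    """Synthetic smooth test image (1, 1, size, size): a centered Gaussian."""
    coords = torch.linspace(-1.0, 1.0, size)
    xx, yy = coords.view(1, -1), coords.view(-1, 1)
    return torch.exp(-(xx**2 + yy**2) / (2 * sigma**2)).view(1, 1, size, size)


def warp_translate(image: torch.Tensor, shift: torch.Tensor) -> torch.Tensor:
    """Differentiably resample `image` at x + shift (normalized coordinates)."""
    eye = torch.eye(2, dtype=image.dtype, device=image.device)
    theta = torch.cat([eye, shift.view(2, 1)], dim=1).unsqueeze(0)  # (1, 2, 3)
    grid = F.affine_grid(theta, list(image.shape), align_corners=False)
    return F.grid_sample(image, grid, padding_mode="border", align_corners=False)


def register(src: torch.Tensor, dst: torch.Tensor,
             steps: int = 300, lr: float = 0.02) -> torch.Tensor:
    """Estimate the translation aligning `src` to `dst` by gradient descent."""
    shift = torch.zeros(2, requires_grad=True)
    optimizer = torch.optim.Adam([shift], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        # Photometric error between the warped source and the destination.
        loss = F.mse_loss(warp_translate(src, shift), dst)
        loss.backward()
        optimizer.step()
    return shift.detach()


src = make_blob()
true_shift = torch.tensor([0.2, -0.1])
dst = warp_translate(src, true_shift)  # destination with a known offset
estimated_shift = register(src, dst)
```

A coarse-to-fine variant would run the same loop on downsampled copies of both images first and reuse the estimate at each finer level, which is the role the pyramid columns play in the figure.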