Aligning Motion-Blurred Images Using Contrastive Learning on Overcomplete Pixels

Leonid Pogorelyuk, Stefan T. Radev

TL;DR

A new contrastive objective is proposed for learning overcomplete pixel-level features that are invariant to motion blur. A simple U-Net trained with this objective is shown to produce local features useful for aligning the frames of an unseen video captured with a moving camera under realistic and challenging conditions.

Abstract

We propose a new contrastive objective for learning overcomplete pixel-level features that are invariant to motion blur. Other invariances (e.g., pose, illumination, or weather) can be learned by applying the corresponding transformations to unlabeled images during self-supervised training. We show that a simple U-Net trained with our objective can produce local features useful for aligning the frames of an unseen video captured with a moving camera under realistic and challenging conditions. Using a carefully designed toy example, we also show that the overcomplete pixels can encode the identity of objects in an image and the pixel coordinates relative to these objects.
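
To make the idea of a pixel-level contrastive objective more concrete, here is a minimal NumPy sketch. It is not the authors' loss: the Figure 1 caption only states that aligned features are encouraged to match in $L^{\infty}$ and that randomly remapped features are encouraged not to match. The hinge form, the `margin` parameter, and the function name `linf_contrastive_loss` are assumptions made for illustration.

```python
import numpy as np

def linf_contrastive_loss(feat1, feat2, idx_pos, idx_neg, margin=1.0):
    """Hypothetical pixel-level contrastive loss (illustrative, not the paper's).

    feat1, feat2: (N, C) per-pixel features of two views (H*W flattened to N).
    idx_pos: (P, 2) int pairs (i, j): pixel i of view 1 corresponds to pixel j of view 2.
    idx_neg: (Q, 2) int pairs of pixels that do NOT correspond.
    """
    # Gather the paired features by index and take the L-infinity distance
    # (largest absolute difference over channels) for every pair.
    d_pos = np.abs(feat1[idx_pos[:, 0]] - feat2[idx_pos[:, 1]]).max(axis=1)
    d_neg = np.abs(feat1[idx_neg[:, 0]] - feat2[idx_neg[:, 1]]).max(axis=1)
    # Assumed form: pull corresponding pixels together, and push
    # non-corresponding pixels at least `margin` apart (hinge).
    return d_pos.mean() + np.maximum(0.0, margin - d_neg).mean()

# Toy data: view 2 holds the same features as view 1, stored in permuted order.
rng = np.random.default_rng(0)
n, channels = 16, 12
feat1 = rng.normal(size=(n, channels))
perm = rng.permutation(n)
feat2 = np.empty_like(feat1)
feat2[perm] = feat1                      # feat2[perm[i]] == feat1[i]

idx_pos = np.stack([np.arange(n), perm], axis=1)              # true correspondences
idx_neg = np.stack([np.arange(n), np.roll(perm, 1)], axis=1)  # deliberately mismatched
loss = linf_contrastive_loss(feat1, feat2, idx_pos, idx_neg)
print(f"loss with correct pairing: {loss:.3f}")
```

In this toy setup the matching term is exactly zero because the paired features are identical, so any remaining loss comes from mismatched pairs that are closer than the margin. In training, `feat1` and `feat2` would instead be network outputs for two differently blurred and noised views of the same region.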

Paper Structure

This paper contains 11 sections, 6 equations, 3 figures.

Figures (3)

  • Figure 1: Training process illustration. (a) Two regions of the same image (from the Image Matching Challenge 2022) are selected. (b,c) After transforming the selected regions into rectangles, noise and motion blur are added to each. (d) A fraction of the pixel locations from region 2 is remapped (e.g., using gather) to align with the corresponding locations in region 1. (e) The rest of the pixel locations are randomly remapped. (f,g) Regions 1 and 2 are passed through the network to generate overcomplete features of the same image size but with more channels (just three channels are shown). (h) A fraction of the features of region 2 are remapped to align with features from region 1 and are encouraged to match in $L^{\infty}$. The rest of the features (not shown) are randomly remapped and encouraged not to match (contrastive loss).
  • Figure 2: A challenging image alignment example. (a) A static frame from a video of a farmers market (Troy, NY) taken from afar. (b) Three channels of the overcomplete representation generated by our network. (c and d) Frames 1 and 2 belong to a video captured with a moving global shutter camera. The frames are about 10 seconds apart and exhibit motion blur and overexposure. We applied a standard procedure of first finding the closest matches between the overcomplete features of the two frames, then using RANSAC to find matches corresponding to the same affine transformation between the two images. The colored dots correspond to all the locations successfully matched between the two frames despite the extreme motion blur.
  • Figure 3: A toy example to interpret the overcomplete pixels learned by the network. (a) Patterns A (top) and B (bottom) used in this example. (b and c) Two noisy training views of the same image composed of several randomly affine-transformed A or B patterns. (d) Two A and two B patterns with superimposed polar grids used to illustrate the affine transformation applied to the shapes. (e) Three out of twelve channels of the overcomplete output that encode information that resembles the radial distance from the centers of the shapes in pre-transformation coordinates. (f) More channels that encode radial distance and some of the overlap between adjacent areas (yellow areas). (g) Channels that encode mostly the polar angle (azimuth) by sectors similar to straight red lines in (d). (h) Channels that encode whether the center of the shape belongs to pattern A (red/magenta) or B (yellow/green), as well as some azimuth information.
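
The matching procedure mentioned in the Figure 2 caption (closest feature matches followed by RANSAC for a common affine transformation) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the mutual nearest-neighbor check, the iteration count, and the pixel tolerance are all hypothetical choices, and the data is synthetic.

```python
import numpy as np

def mutual_nearest_neighbors(f1, f2):
    """Index pairs (i, j) where f1[i] and f2[j] are each other's closest feature."""
    d = ((f1[:, None, :] - f2[None, :, :]) ** 2).sum(axis=2)  # (N1, N2) squared distances
    nn12 = d.argmin(axis=1)          # best match in f2 for each row of f1
    nn21 = d.argmin(axis=0)          # best match in f1 for each row of f2
    i = np.arange(len(f1))
    keep = nn21[nn12] == i           # keep only mutually consistent matches
    return np.stack([i[keep], nn12[keep]], axis=1)

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix A with dst ~= src @ A[:, :2].T + A[:, 2]."""
    X = np.hstack([src, np.ones((len(src), 1))])
    A_T, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A_T.T

def apply_affine(A, pts):
    return pts @ A[:, :2].T + A[:, 2]

def ransac_affine(src, dst, n_iter=500, tol=2.0, seed=0):
    """Find the affine map supported by the most matches (needs at least 3 matches)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        sample = rng.choice(len(src), size=3, replace=False)  # minimal sample
        A = fit_affine(src[sample], dst[sample])
        inliers = np.linalg.norm(apply_affine(A, src) - dst, axis=1) < tol
        if inliers.sum() > best.sum():
            best = inliers
    # Refit on every inlier of the best hypothesis.
    return fit_affine(src[best], dst[best]), best

# Synthetic stand-in for two frames: 60 feature locations, 45 of which move
# consistently under one affine map; the other 15 are placed far away to
# imitate wrong matches.
rng = np.random.default_rng(0)
n = 60
pts1 = rng.uniform(0, 100, size=(n, 2))
A_true = np.array([[0.9, -0.2, 5.0], [0.2, 0.9, -3.0]])
pts2 = apply_affine(A_true, pts1)
pts2[45:] = rng.uniform(200, 300, size=(15, 2))
f1 = rng.normal(size=(n, 12))                  # 12-channel "overcomplete" features
f2 = f1 + 0.01 * rng.normal(size=(n, 12))      # slightly perturbed copies

matches = mutual_nearest_neighbors(f1, f2)
A_est, inliers = ransac_affine(pts1[matches[:, 0]], pts2[matches[:, 1]])
print(f"{inliers.sum()} of {len(matches)} matches agree on one affine map")
```

The brute-force distance matrix is only practical for a subsample of pixel locations; for full-resolution feature maps an approximate nearest-neighbor search would be the more realistic choice.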