Retrieval Robust to Object Motion Blur

Rong Zou, Marc Pollefeys, Denys Rozumnyi

TL;DR

This work tackles object instance retrieval under motion blur, a setting where existing methods struggle due to blur distortions. It introduces BRIDGE, a blur-robust descriptor generator, and BLISS, a blur-level guided contrastive learning strategy, trained with multiple losses to produce blur-invariant representations. The authors release synthetic and real-world blur datasets to benchmark blur-robust retrieval and demonstrate that their approach outperforms state-of-the-art methods across blur levels and transfers well to real video data. The results highlight the practical potential for robust object retrieval in surveillance, sports analytics, and wildlife monitoring, while also outlining avenues for handling multiple objects and broader blur types in future work.

Abstract

Moving objects are frequently seen in daily life and usually appear blurred in images due to their motion. While general object retrieval is a widely explored area in computer vision, it primarily focuses on sharp and static objects, and retrieval of motion-blurred objects in large image collections remains unexplored. We propose a method for object retrieval in images that are affected by motion blur. The proposed method learns a robust representation capable of matching blurred objects to their deblurred versions and vice versa. To evaluate our approach, we present the first large-scale datasets for blurred object retrieval, featuring images with objects exhibiting varying degrees of blur in various poses and scales. We conducted extensive experiments, showing that our method outperforms state-of-the-art retrieval methods on the new blur-retrieval datasets, which validates the effectiveness of the proposed approach. Code, data, and model are available at https://github.com/Rong-Zou/Retrieval-Robust-to-Object-Motion-Blur.

Paper Structure (24 sections, 12 equations, 11 figures, 10 tables)

This paper contains 24 sections, 12 equations, 11 figures, 10 tables.

Figures (11)

  • Figure 1: Retrieval robust to object motion blur. We present the first approach designed for retrieval in the presence of object motion blur. Our method captures blur-invariant features and generates blur-robust representations. We show retrieval results on the newly created datasets for the task of blur-robust retrieval: synthetic (top) and real (bottom). All retrieved images match the query and are marked with green boxes.
  • Figure 1: Examples of hard distractors in our distractor set. Each row displays a query image on the left, 8 hard distractors in the middle (marked with red boxes), and 2 matching images from database on the right (1 blurred and 1 sharp, marked with green boxes).
  • Figure 2: Overview of the proposed method for blur-robust retrieval. Our model takes a query image as input. Then, it generates a descriptor robust to object motion blur through the Blur Robust Image Descriptor Generator (BRIDGE) module that encodes the input image using a CNN backbone followed by generalized mean-pooling (GeM), extracting a feature vector. This vector is subsequently fed into three heads: the blur estimation head, classification head, and localization head. Each head processes the feature vector to extract relevant information for identifying motion-blurred objects. The resulting three processed feature vectors are concatenated and passed through the final fully connected (FC) layer to produce the blur-robust descriptor. During training, the Blur Level-based Image Sample Selection (BLISS) mechanism is employed to select contrastive samples based on the query blur level (BL, Eq. \ref{eq:bl}) $b_q$ and a specified blur level range $r$. Then, these selected image samples are input into the BRIDGE module to extract descriptors for subsequent contrastive learning.
  • Figure 2: Retrieval results of different methods by query and database blur level (BL) on our synthetic dataset (without 1M distractors).
  • Figure 3: The introduced synthetic (a) and real-world (b) datasets. Rows 1-2: different trajectories of the same object. Rows 2-3: two different objects from the same category with similar shapes. Rows 4-6: objects from different categories that share similar textures. Columns correspond to blur levels 1 to 6; the levels differ between the synthetic (BL) and real-world (BLr) datasets.
  • ...and 6 more figures
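
The Figure 2 caption mentions two mechanisms that a small example may make more concrete: GeM pooling and blur-level-based sample selection. The sketch below is illustrative only and is not the authors' implementation. `gem_pool` follows the standard generalized-mean formula, with the exponent `p = 3.0` chosen here as an assumption. `select_by_blur_level` is a hypothetical stand-in for BLISS: it assumes that "a specified blur level range $r$" means keeping candidates whose blur level lies within $r$ of the query blur level $b_q$, which the caption does not state explicitly.

```python
from typing import List, Sequence


def gem_pool(feature_map: Sequence[Sequence[float]],
             p: float = 3.0, eps: float = 1e-6) -> List[float]:
    """Generalized mean (GeM) pooling: one pooled value per channel.

    feature_map[c] holds the flattened spatial activations of channel c.
    p = 1 reduces to average pooling; large p approaches max pooling.
    The default p = 3.0 is an assumption, not a value from the paper.
    """
    pooled = []
    for channel in feature_map:
        # Clamp to a small positive value so fractional powers are defined.
        clamped = [max(v, eps) for v in channel]
        mean_pow = sum(v ** p for v in clamped) / len(clamped)
        pooled.append(mean_pow ** (1.0 / p))
    return pooled


def select_by_blur_level(query_bl: float,
                         candidate_bls: Sequence[float],
                         r: float) -> List[int]:
    """Hypothetical BLISS-style filter (assumed rule, see lead-in).

    Returns indices of candidates whose blur level is within r of the
    query blur level b_q.
    """
    return [i for i, b in enumerate(candidate_bls) if abs(b - query_bl) <= r]


if __name__ == "__main__":
    # Two channels with two spatial positions each.
    print(gem_pool([[1.0, 3.0], [2.0, 2.0]], p=1.0))
    # Query at blur level 3, range 1, candidates at various levels.
    print(select_by_blur_level(3, [1, 2, 3, 4, 6], 1))
```

In a real model the pooled vector would feed the three heads described in the caption; the pooling is shown on plain lists here only to keep the example dependency-free.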