
Enhancing Weakly-Supervised Object Detection on Static Images through (Hallucinated) Motion

Cagri Gungor, Adriana Kovashka

TL;DR

This study introduces an approach that enhances WSOD methods by integrating motion information. It leverages motion hallucinated from static images to improve WSOD on image datasets, uses a Siamese network to learn better representations with motion, addresses camera motion through motion normalization, and selectively trains on images based on object motion.

Abstract

While motion has garnered attention in various tasks, its potential as a modality for weakly-supervised object detection (WSOD) in static images remains unexplored. Our study introduces an approach to enhance WSOD methods by integrating motion information. This method involves leveraging hallucinated motion from static images to improve WSOD on image datasets, utilizing a Siamese network for enhanced representation learning with motion, addressing camera motion through motion normalization, and selectively training on images based on object motion. Experimental validation on the COCO and YouTube-BB datasets demonstrates improvements over a state-of-the-art method.
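The abstract names contrastive representation learning with a Siamese network, but it does not give the loss. The sketch below is a minimal illustration built on three assumptions. First, each image yields two embeddings, one from the RGB branch and one from the hallucinated-motion branch. Second, the two embeddings of the same image form a positive pair. Third, the objective is InfoNCE-style, which is a common choice that this summary does not confirm. The function name `info_nce` and the `temperature` value are hypothetical.

```python
import numpy as np


def info_nce(rgb_emb: np.ndarray, motion_emb: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE-style loss between RGB and motion embeddings (hypothetical sketch).

    rgb_emb, motion_emb: arrays of shape (N, D); row i of each comes from image i.
    Assumption: row i of motion_emb is the positive for row i of rgb_emb,
    and all other rows in the batch act as negatives.
    """
    # L2-normalise so the dot product becomes cosine similarity
    a = rgb_emb / np.linalg.norm(rgb_emb, axis=1, keepdims=True)
    b = motion_emb / np.linalg.norm(motion_emb, axis=1, keepdims=True)
    logits = a @ b.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    # Row-wise log-softmax over candidate motion embeddings
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive pair for row i sits on the diagonal (same image, other modality)
    return float(-np.mean(np.diag(log_probs)))


rgb = np.eye(4)
print(info_nce(rgb, rgb))                      # matched pairs give a loss close to 0
print(info_nce(rgb, np.roll(rgb, 1, axis=0)))  # mismatched pairs give a much larger loss
```

When the loss is minimised, each image's RGB embedding is pulled toward its own motion embedding and pushed away from the motion embeddings of other images. This is one way motion could act as an extra training signal for the detector's features.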

Paper Structure

This paper contains 9 sections, 8 equations, 3 figures, and 1 table.

Figures (3)

  • Figure 1: This figure illustrates the design of the Siamese WSOD network and the contrastive learning that leverages the motion modality to improve representation learning.
  • Figure 2: Visualization of the motion normalization approach.
  • Figure 3: Visualization of image selection based on motion. The top-row image is not selected because it shows no object motion, while the second-row image is selected because of its noticeable motion.
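Figures 2 and 3 describe motion normalization and motion-based image selection only visually. The sketch below is one plausible reading, not the authors' implementation. It assumes a dense flow field of shape (H, W, 2) and estimates camera motion as the per-channel median flow, on the assumption that background pixels dominate the frame. It then subtracts that estimate and keeps an image only if enough pixels still move. The names `normalize_motion` and `has_object_motion` are hypothetical, and so are the thresholds `mag_thresh` and `min_fraction`.

```python
import numpy as np


def normalize_motion(flow: np.ndarray) -> np.ndarray:
    """Remove an estimate of camera (global) motion from a flow field.

    flow: array of shape (H, W, 2) holding per-pixel (dx, dy).
    Assumption: most pixels are background, so the per-channel median
    approximates the motion induced by the camera.
    """
    camera = np.median(flow.reshape(-1, 2), axis=0)
    return flow - camera


def has_object_motion(flow: np.ndarray, mag_thresh: float = 1.0,
                      min_fraction: float = 0.02) -> bool:
    """Hypothetical selection rule: keep the image if enough pixels still move."""
    residual = normalize_motion(flow)
    magnitude = np.linalg.norm(residual, axis=-1)  # per-pixel motion magnitude
    return float((magnitude > mag_thresh).mean()) >= min_fraction


pan = np.full((8, 8, 2), 3.0)   # pure camera panning, no object motion
moving = pan.copy()
moving[2:4, 2:4, 0] += 5.0      # a small patch moves relative to the background
print(has_object_motion(pan), has_object_motion(moving))  # False True
```

With pure panning, the residual is zero everywhere and the image is skipped. The moving patch covers 4 of 64 pixels, which clears the 2% threshold. The sketch uses a median rather than a mean so that the moving object itself does not bias the camera-motion estimate.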