
Learning Gaussian Data Augmentation in Feature Space for One-shot Object Detection in Manga

Takara Taniguchi, Ryosuke Furuta

TL;DR

A data augmentation method in feature space that increases the variation of the query, addressing the large variation in characters' poses and facial expressions in target images when only one query image is available as a reference.

Abstract

We tackle one-shot object detection in Japanese manga. The rising global popularity of Japanese manga has made the object detection of character faces increasingly important, with potential applications such as automatic colorization. However, obtaining sufficient data for training conventional object detectors is challenging due to copyright restrictions. Additionally, new characters appear every time a new volume of manga is released, making it impractical to retrain object detectors each time to detect these new characters. Therefore, one-shot object detection, where only a single query (reference) image is required to detect a new character, is an essential task in the manga industry. One challenge with one-shot object detection in manga is the large variation in the poses and facial expressions of characters in target images, despite having only one query image as a reference. Another challenge is that the frequency of character appearances follows a long-tailed distribution. To overcome these challenges, we propose a data augmentation method in feature space to increase the variation of the query. The proposed method augments the feature from the query by adding Gaussian noise, with the noise variance at each channel learned during training. The experimental results show that the proposed method improves the performance for both seen and unseen classes, surpassing data augmentation methods in image space.
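The abstract states the augmentation in one sentence: Gaussian noise is added to the query feature, with a variance learned for each channel. A minimal Python sketch of that forward pass follows. It assumes a C×H×W query feature map stored as nested lists, a fresh noise sample for every element, and a per-channel scale stored as a log standard deviation (`log_std`, a hypothetical parameter name) so that σ stays positive. Whether the original implementation samples noise per element or per channel, and how it parameterizes the variance, is not stated in this summary.

```python
import math
import random
from typing import List

# Query feature map as nested lists indexed [channel][row][col] (C x H x W).
FeatureMap = List[List[List[float]]]


def gaussian_feature_augment(
    query: FeatureMap, log_std: List[float], rng: random.Random
) -> FeatureMap:
    """Return a perturbed copy of `query`: x + sigma_c * eps, eps ~ N(0, 1).

    Assumptions (not confirmed by the source): one learnable value per
    channel stored as a log standard deviation, so sigma_c = exp(log_std[c])
    is always positive, and a fresh noise sample for every element.
    """
    if len(log_std) != len(query):
        raise ValueError("expected one log_std entry per channel")
    augmented: FeatureMap = []
    for channel, s in zip(query, log_std):
        sigma = math.exp(s)  # the same scale is shared by all H x W positions
        augmented.append(
            [[x + sigma * rng.gauss(0.0, 1.0) for x in row] for row in channel]
        )
    return augmented


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy query feature with C = 2 channels and H = W = 2.
    query = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
    # Channel 0 gets sigma = e^-1 (about 0.37), channel 1 gets sigma = e^-3 (about 0.05).
    for _ in range(3):
        print(gaussian_feature_augment(query, [-1.0, -3.0], rng))
```

Each call yields a different perturbed view of the same query, so sampling several views per training step is one way to expose the detector to more query variation than a single reference image provides; channels with a larger learned σ are perturbed more strongly.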

Paper Structure

This paper contains 20 sections, 8 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: Key idea of our method. In one-shot object detection in manga, there is a large variation in the poses and facial expressions of characters in target images, despite having only one query image as a reference. We propose a Gaussian noise-based data augmentation in the feature space, assuming that feature vectors from the same class are normally distributed in the feature space even though they exhibit large variations in the image space.
  • Figure 2: Overview of the proposed method. After extracting feature maps from the query and target images, Gaussian noise is added to the query feature to augment it according to the Gaussian distribution. Then, the target feature and the augmented query feature are fed into the RPN, IHR module, and RCNN head to obtain detection results, similar to BHRL. The variance of the Gaussian noise is optimized per channel during training.
  • Figure 3: Distribution of the number of appearances for each character.
  • Figure 4: Qualitative comparisons between the proposed and other augmentation methods for seen classes (thr = 320).
  • Figure 5: Qualitative comparisons between the proposed and other augmentation methods for unseen classes (thr = 100).
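The Figure 2 caption states that the noise variance is optimized per channel during training. One standard way to make a noise scale trainable is the reparameterization trick: draw ε ~ N(0, 1) independently of the parameters, form z = f + exp(s)·ε, and backpropagate through z, which gives ∂L/∂s_c = Σ_i (∂L/∂z_{c,i}) · ε_{c,i} · exp(s_c). The Python sketch below assumes that scheme and flattens each channel's spatial positions into one list. `toy_loss` is a hypothetical squared-error stand-in, because the paper's actual objective (the RPN, IHR, and RCNN head losses, and any term that regulates σ) is not given in this summary; all function names are illustrative.

```python
import math
import random
from typing import List

# One list of values per channel (spatial positions flattened): feats[c][i].
Features = List[List[float]]


def augment_with_noise(feats: Features, log_std: List[float], eps: Features) -> Features:
    """Reparameterized sample z[c][i] = f[c][i] + exp(s_c) * eps[c][i].

    eps is drawn outside this function, so z is a deterministic,
    differentiable function of log_std.
    """
    return [
        [f + math.exp(s) * e for f, e in zip(ch, ch_eps)]
        for ch, s, ch_eps in zip(feats, log_std, eps)
    ]


def toy_loss(z: Features, target: Features) -> float:
    """Hypothetical stand-in for the detection loss: 0.5 * sum of squared errors."""
    return 0.5 * sum((a - b) ** 2 for zc, tc in zip(z, target) for a, b in zip(zc, tc))


def log_std_grad(feats: Features, log_std: List[float], eps: Features, target: Features) -> List[float]:
    """Chain rule through z: dL/ds_c = sum_i dL/dz[c][i] * eps[c][i] * exp(s_c)."""
    z = augment_with_noise(feats, log_std, eps)
    grads = []
    for zc, tc, ec, s in zip(z, target, eps, log_std):
        dl_dz = [a - b for a, b in zip(zc, tc)]  # gradient of toy_loss w.r.t. z
        grads.append(math.exp(s) * sum(g * e for g, e in zip(dl_dz, ec)))
    return grads


def sgd_step(log_std: List[float], grads: List[float], lr: float) -> List[float]:
    """One plain gradient-descent update of the per-channel log std."""
    return [s - lr * g for s, g in zip(log_std, grads)]


if __name__ == "__main__":
    rng = random.Random(0)
    feats = [[0.2, -0.4, 1.0], [0.0, 0.3, -0.1]]  # C = 2 channels, 3 positions each
    target = [[0.0, 0.0, 1.0], [0.1, 0.1, 0.1]]
    log_std = [-0.5, -1.5]
    eps = [[rng.gauss(0.0, 1.0) for _ in ch] for ch in feats]
    grads = log_std_grad(feats, log_std, eps, target)
    print("grad:", grads)
    print("updated log_std:", sgd_step(log_std, grads, lr=0.1))
```

With this toy loss alone, the expected gradient with respect to each s_c is positive (it equals σ_c² times the number of elements in the channel), so plain gradient descent would drive σ toward zero; how the paper keeps the learned variance useful is not described in this summary.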