Dark Side Augmentation: Generating Diverse Night Examples for Metric Learning

Albert Mohwald, Tomas Jenicek, Ondřej Chum

TL;DR

Nighttime image retrieval suffers from scarce and non-diverse night data, which limits descriptors trained by metric learning. The authors introduce a light-weight HEDN GAN that translates day images to night under an edge-consistency constraint, jointly trains an edge detector that works on night images, and uses diverse anchor mining to diversify training examples. Their pipeline, which generates synthetic night data during training, yields state-of-the-art results on Tokyo 24/7 while maintaining performance on Oxford and Paris, without relying on paired day-night training data. The work demonstrates that edge-aware generation combined with diverse anchors can substantially improve the nighttime robustness of global descriptors and is readily applicable to other metric-learning tasks.

Abstract

Image retrieval methods based on CNN descriptors rely on metric learning from a large number of diverse examples of positive and negative image pairs. Domains with limited availability and variability of training data, such as night-time images, suffer from poor retrieval performance even with methods that perform well on standard benchmarks. We propose to train a GAN-based synthetic-image generator that translates available day-time image examples into night images. Such a generator is used in metric learning as a form of augmentation, supplying training data to the scarce domain. Various types of generators are evaluated and analyzed. We contribute a novel light-weight GAN architecture that enforces consistency between the original and the translated image through edge consistency. The proposed architecture also allows simultaneous training of an edge detector that operates on both night and day images. To further increase the variability of the training examples and to maximize the generalization of the trained model, we propose a novel method of diverse anchor mining. The proposed method improves over the state-of-the-art results on the standard Tokyo 24/7 day-night retrieval benchmark while preserving the performance on the Oxford and Paris datasets. This is achieved without the need for training pairs of matching day and night images. The source code is available at https://github.com/mohwald/gandtr.

Paper Structure (34 sections, 3 figures, 8 tables)

Figures (3)

  • Figure 1: Examples of day-to-night translations with various generators. Each row consists of (left to right) the source image, and images translated by: CycleGAN, CyEDA, and the proposed RCFNGAN and HEDNGAN. All models are trained on the SfM dataset, except for CyEDA where a model pre-trained on BDD100k is used.
  • Figure 3: One training step with unpaired day and night images (left block) of our HEDN GAN architecture. The day-to-night generator translates the input day image (top left) into a fake night image (center), enforcing edge consistency by an L1 loss between the HED and HEDN outputs (top right). The night discriminator predicts whether the generated night image (center) and the input night image (bottom left) are real or fake. The HEDN edge detector (student) is trained by the HED edge detector (teacher, not trained) to output night-image edge maps while preserving day-image edge maps.
  • Figure 4: Training data generation and photometric normalization during embedding network fine-tuning. A mined diverse anchor (center left day image) is randomly translated into a night image (gray block, trained generator from Figure 3). The randomly translated image is used to mine a set of five negative images. The contrastive loss is applied on the global descriptors of the positive pair (L2) and of the negative pairs (ReLU($\mu$ - L2)).
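
The two loss terms named in the captions of Figures 3 and 4 can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The function names, the averaging over pixels in the edge term, and the margin value used in the example are assumptions made for illustration. The contrastive terms follow the caption literally (L2 for the positive pair, ReLU($\mu$ - L2) for each negative pair); many contrastive-loss formulations square both terms, which the caption does not specify.

```python
import math


def edge_consistency_l1(edges_day, edges_fake_night):
    """Mean absolute difference between two edge maps (lists of rows).

    Assumption: the L1 loss is averaged over pixels; the caption only
    says "L1 loss between HED and HEDN outputs".
    """
    total = 0.0
    count = 0
    for row_day, row_night in zip(edges_day, edges_fake_night):
        for e_day, e_night in zip(row_day, row_night):
            total += abs(e_day - e_night)
            count += 1
    return total / count


def l2_distance(a, b):
    """Euclidean distance between two global descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(anchor, positive, negatives, mu):
    """Contrastive loss as written in the Figure 4 caption.

    Positive pair contributes its L2 distance; each negative pair
    contributes ReLU(mu - L2), i.e. it is penalized only when it lies
    closer to the anchor than the margin mu.
    """
    pos_term = l2_distance(anchor, positive)
    neg_terms = [max(0.0, mu - l2_distance(anchor, n)) for n in negatives]
    return pos_term + sum(neg_terms)


if __name__ == "__main__":
    # Hypothetical 2-D descriptors and margin, chosen only for illustration.
    anchor = [1.0, 0.0]      # descriptor of the translated (night) anchor
    positive = [0.8, 0.6]    # descriptor of the matching image
    negatives = [[0.0, 1.0], [0.6, 0.8], [-1.0, 0.0]]
    print(contrastive_loss(anchor, positive, negatives, mu=0.7))
```

In this form a negative that is already farther from the anchor than the margin contributes nothing to the loss, so only hard negatives produce a training signal.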