Meta-Learning for Color-to-Infrared Cross-Modal Style Transfer

Evelyn A. Stump, Francesco Luzi, Leslie M. Collins, Jordan M. Malof

TL;DR

Cross-modal style transfer (CMST) is explored as a way to use large and diverse color imagery datasets to train DNN-based object detectors for IR imagery, and it is found to be highly effective for such detectors.

Abstract

Recent object detection models for infrared (IR) imagery are based upon deep neural networks (DNNs) and require large amounts of labeled training imagery. However, publicly available datasets that can be used for such training are limited in their size and diversity. To address this problem, we explore cross-modal style transfer (CMST) to leverage large and diverse color imagery datasets so that they can be used to train DNN-based object detectors for IR imagery. We evaluate six contemporary stylization methods on four publicly available IR datasets (the first comparison of its kind) and find that CMST is highly effective for DNN-based detectors. Surprisingly, we find that existing data-driven methods are outperformed by a simple grayscale stylization (an average of the color channels). Our analysis reveals that existing data-driven methods are either too simplistic or introduce significant artifacts into the imagery. To overcome these limitations, we propose meta-learning style transfer (MLST), which learns a stylization by composing and tuning well-behaved analytic functions. We find that MLST leads to more complex stylizations without introducing significant image artifacts and achieves the best overall detector performance on our benchmark datasets.
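The grayscale baseline mentioned in the abstract (an average of the color channels) can be illustrated with a minimal sketch. The function name `grayscale_stylize` and the nested-list image representation are assumptions made here to keep the example dependency-free; they are not taken from the paper.

```python
from typing import List, Tuple

RGBPixel = Tuple[float, float, float]


def grayscale_stylize(rgb_image: List[List[RGBPixel]]) -> List[List[float]]:
    """Return a single-channel image whose pixels are the mean of R, G and B.

    `rgb_image` is a list of rows, each row a list of (r, g, b) tuples.
    This representation is a hypothetical choice for illustration only.
    """
    return [[(r + g + b) / 3.0 for (r, g, b) in row] for row in rgb_image]


if __name__ == "__main__":
    tiny = [[(0, 30, 60), (255, 255, 255)]]
    print(grayscale_stylize(tiny))
```

An unweighted mean is used because that is how the abstract describes the baseline; luminance-weighted conversions are a different operation.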

Paper Structure (10 sections, 4 equations, 5 figures, 2 tables, 2 algorithms)

Figures (5)

  • Figure 1: An example of (a) an RGB image, (b) a real IR image, and (c) a synthetic IR image produced by a style transfer model. Images are from the FLIR dataset.
  • Figure 2: Diagram of our proposed MLST model. RGB imagery is stylized by a composition of $k$ functions ($O^{n}$) and function parameters ($\mu_k^n, \rho_k^n$) sampled from a distribution over $N$ possible functions, where $n \in \{1, 2, \cdots, N\}$. $\mu_k^n$ and $\rho_k^n$ are sampled from a learned probability distribution, where $\rho$ determines the probability of application for each function and $\mu$ is the function-specific parameter. This composition of functions, $f_\theta(\cdot)$, is termed a Stylization Policy. The loss of an adversarial critic (red line) is used as the training signal for the policy.
  • Figure 3: Example imagery produced by each CMST method we evaluated. Each column of the matrix contains five representative images, all stylized by the same stylization algorithm. Gray and WCT2 produce very similar results, CyCADA introduces artifacts into the imagery, and InfraGAN saturates images when deployed on out-of-domain images. MLST introduces no artifacts and produces a relatively complex nonlinear stylization.
  • Figure 4: How MLST is trained for stylization (top) and for augmentation (bottom). For stylization, only RGB imagery is acted upon by the policy during training. For augmentation, IR imagery is acted upon by the policy during training.
  • Figure 5: Differences in the learned policies of MLST when used for stylization (blue), augmentation (orange), and unsupervised MLST (yellow). The top plot shows the expected number of times each operation in the dictionary is applied to an image, $E[T^n]$. The lower plot shows the average parameter value of each operation, $E[\mu^{(n)}]$. Note that the invert and identity operations have no parameter, so their $E[\mu^{(n)}]$ is set to zero.
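
The stylization policy described in the captions for Figures 2 and 5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that at each of the $k$ steps exactly one operation is drawn from a categorical distribution with probabilities $\rho_k^n$, and that $\mu_k^n$ is a fixed scalar rather than a sample from a learned distribution. Only `identity` and `invert` are named in the captions; `brightness` and `gamma` are hypothetical operations added for illustration, as are all function names.

```python
import random
from typing import Callable, List

Image = List[List[float]]  # single-channel intensities in [0, 1]


def _map(img: Image, fn: Callable[[float], float]) -> Image:
    """Apply fn to every pixel and clamp the result to [0, 1]."""
    return [[min(1.0, max(0.0, fn(p))) for p in row] for row in img]


def identity(img: Image, mu: float) -> Image:  # no parameter used
    return [row[:] for row in img]


def invert(img: Image, mu: float) -> Image:  # no parameter used
    return _map(img, lambda p: 1.0 - p)


def brightness(img: Image, mu: float) -> Image:  # hypothetical operation
    return _map(img, lambda p: p + mu)


def gamma(img: Image, mu: float) -> Image:  # hypothetical operation
    return _map(img, lambda p: p ** mu)


# The dictionary of N operations (here N = 4).
OPS = [identity, invert, brightness, gamma]


def apply_policy(img: Image, rho: List[List[float]], mu: List[List[float]],
                 rng: random.Random) -> Image:
    """Compose k operations; rho[k][n] and mu[k][n] follow the caption's indices.

    Assumption: one operation per step, chosen categorically with weights rho[k].
    """
    out = img
    for rho_k, mu_k in zip(rho, mu):
        n = rng.choices(range(len(OPS)), weights=rho_k, k=1)[0]
        out = OPS[n](out, mu_k[n])
    return out


def expected_applications(rho: List[List[float]]) -> List[float]:
    """E[T^n] under the categorical assumption: the sum over steps of rho[k][n]."""
    return [sum(step[n] for step in rho) for n in range(len(rho[0]))]
```

With a deterministic policy such as `rho = [[0, 1, 0, 0], [0, 0, 1, 0]]`, the image is first inverted and then brightened, and `expected_applications(rho)` reports one application each for those two operations. In the paper the policy parameters are learned using the adversarial critic's loss as the training signal; that training loop is not sketched here.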