Towards Improved Proxy-based Deep Metric Learning via Data-Augmented Domain Adaptation

Li Ren, Chen Chen, Liqiang Wang, Kien Hua

TL;DR

The paper presents a novel proxy-based DML framework that aligns the sample and proxy distributions to improve the efficiency of proxy-based DML losses, and proposes the Data-Augmented Domain Adaptation (DADA) method to bridge the domain gap between the group of samples and the proxies.

Abstract

Deep Metric Learning (DML) plays an important role in modern computer vision research, where we learn a distance metric for a set of image representations. Recent DML techniques use proxies that interact with the corresponding image samples in the embedding space. However, existing proxy-based DML methods focus on learning individual proxy-to-sample distances, while the overall distribution of samples and proxies receives little attention. In this paper, we present a novel proxy-based DML framework that aligns the sample and proxy distributions to improve the efficiency of proxy-based DML losses. Specifically, we propose the Data-Augmented Domain Adaptation (DADA) method to bridge the domain gap between the group of samples and the proxies. To the best of our knowledge, we are the first to leverage domain adaptation to boost the performance of proxy-based DML. We show that our method can be easily plugged into existing proxy-based DML losses. Our experiments on popular benchmarks, including CUB-200-2011, CARS196, Stanford Online Products, and In-Shop Clothes Retrieval, show that our learning algorithm significantly improves the existing proxy losses and achieves superior results compared to existing methods.
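
Proxy-based losses score every sample against one learnable proxy per class; these are the "individual proxy-to-sample distances" mentioned above. As a point of reference, here is a minimal NumPy sketch of one such loss, Proxy-Anchor (presumably the "PA" baseline in Figure 5), written from its commonly published form with the usual defaults alpha = 32 and delta = 0.1. It is not the authors' code and does not include any of DADA's alignment terms.

```python
import numpy as np

def l2_normalize(v, axis=-1, eps=1e-12):
    # Project rows onto the unit sphere so dot products become cosine similarities.
    return v / np.maximum(np.linalg.norm(v, axis=axis, keepdims=True), eps)

def proxy_anchor_loss(embeddings, labels, proxies, alpha=32.0, delta=0.1):
    """Proxy-Anchor loss for one batch (reference sketch, not DADA itself).

    embeddings: (N, D) sample embeddings
    labels:     (N,) integer class labels in [0, C)
    proxies:    (C, D) one learnable proxy per class
    """
    sim = l2_normalize(embeddings) @ l2_normalize(proxies).T       # (N, C) cosine similarities
    num_classes = proxies.shape[0]
    pos_mask = labels[:, None] == np.arange(num_classes)[None, :]  # sample i belongs to proxy c

    # Pull positives above margin delta, push negatives below -delta.
    pos_exp = np.where(pos_mask, np.exp(-alpha * (sim - delta)), 0.0)
    neg_exp = np.where(~pos_mask, np.exp(alpha * (sim + delta)), 0.0)

    # Positive term averages only over proxies that have a positive sample in the batch.
    with_pos = pos_mask.any(axis=0)
    pos_term = np.log1p(pos_exp.sum(axis=0))[with_pos].sum() / max(with_pos.sum(), 1)
    # Negative term averages over all proxies.
    neg_term = np.log1p(neg_exp.sum(axis=0)).sum() / num_classes
    return float(pos_term + neg_term)
```

With orthonormal proxies, embeddings that sit exactly on their own proxy give a much lower loss than embeddings shifted onto a neighbouring class's proxy; a plug-in method such as DADA would add its own terms on top of a loss like this one.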

Paper Structure (32 sections, 1 theorem, 20 equations, 11 figures, 3 tables, 1 algorithm)

Key Result

Theorem A.1

Let $\mathcal{H}$ be a hypothesis space of VC-dimension $d$, and let $\mathcal{U}_S$, $\mathcal{U}_M$, and $\mathcal{U}_T$ be samples of size $m$ drawn from $\mathcal{D}_S$, $\mathcal{D}_M$, and $\mathcal{D}_T$, respectively. Then $\forall h \in \mathcal{H}$, where $\tilde{\lambda} := \epsilon_S(h^*) + \epsilon_M(h^*) + \epsilon_T(h^*)$ denotes the combined risk of the optimal hypothesis $h^*$, and …
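
The combined risk $\tilde{\lambda}$ can be made concrete with a small empirical sketch. The code below assumes the 0-1 loss, replaces the true risks $\epsilon_S, \epsilon_M, \epsilon_T$ with empirical risks on labeled samples, and takes $h^*$ to be the minimizer of the combined risk over a finite candidate set. That last choice is the usual convention in domain-adaptation bounds but is an assumption here, and the helper names are hypothetical.

```python
import numpy as np

def empirical_risk(h, X, y):
    # 0-1 risk of hypothesis h on one labeled sample: the fraction of mistakes.
    return float(np.mean(h(X) != y))

def combined_risk(h, samples):
    # Sum of empirical risks over the three samples (U_S, U_M, U_T).
    return sum(empirical_risk(h, X, y) for X, y in samples)

def lambda_tilde(hypotheses, samples):
    # Empirical stand-in for the combined risk of h* (assumed to be the minimizer
    # of the combined risk): the smallest combined risk over a finite candidate set.
    return min(combined_risk(h, samples) for h in hypotheses)
```

For instance, with 1-D threshold classifiers, $\tilde{\lambda}$ is 0 whenever a single threshold labels all three samples correctly, and it grows as the samples disagree on where the boundary should be.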

Figures (11)

  • Figure 1: The intuition of our Data-Augmented Domain Adaptation (DADA). The classes are labeled with unique colors. The initial distribution gap between the data samples and corresponding proxies causes ambiguity for proxy-based deep metric learning. Our proposed method solves this problem by aligning the data samples and proxies, assuming they are from different data domains. We further augment the data to a dense manifold with mixed features to support this alignment.
  • Figure 2: The mechanism of our adversarial learning. Each class is shown in a unique color. Left: the initial space. Middle: the training mechanism and progress of our proposed method. Right: the adapted space after training. In the discriminator training phase, the surface boundaries of the classifiers are trained to discriminate the domains with Domain-level Discriminators and the sample classes with Category-level Discriminators. In the generator training phase, the adversarial learning signals push the samples and proxies to fool the Domain-level Discriminators, while the class predictions of the Category-level Discriminators are maintained.
  • Figure 3: Overview of our framework. The input images are embedded with a CNN encoder. The proxies are randomly sampled and mixed with the embeddings of the image samples. A domain-level classifier $f_D(\cdot)$ and a category-level classifier $f_C(\cdot)$ are then trained to predict the domain and the class of each sample and proxy. In the adversarial training paradigm, the features and proxies are moved to fool $f_C(\cdot)$ and to minimize the discrepancy of the predictions of $f_D(\cdot)$. The dashed lines on the surface represent the boundaries of our discriminators. The gradient from adversarial learning pushes the data samples in the direction opposite to the separation imposed by the boundary, which aligns the data and the proxies.
  • Figure 4: The optimized Beta distributions on the different datasets used in our experiments.
  • Figure 5: t-SNE visualization of the sample representations and the corresponding proxy vectors on part of the CUB-200-2011 training set. Each class is shown in a unique color. The embedding space of our method is well clustered, and the proxies are well separated and close to the clustered samples, whereas the proxies of the original PA are not well separated and still keep their own distribution.
  • ...and 6 more figures
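
The figure captions describe the mechanism only at a high level. Proxies are mixed with sample embeddings using a ratio drawn from a Beta distribution (whose optimized form Figure 4 shows). A domain-level discriminator $f_D(\cdot)$ predicts whether a point comes from the sample or the proxy domain, and a category-level classifier $f_C(\cdot)$ predicts its class. The NumPy sketch below evaluates the corresponding losses for one batch and follows the division of roles given in the Figure 2 caption. Several choices are assumptions, not the paper's specification: linear discriminators, a soft domain target equal to the mixing ratio, mixup-style soft class targets, and a flipped-label adversarial loss for the generator phase (a gradient-reversal layer is a common alternative). All function names are hypothetical.

```python
import numpy as np

def mix_samples_and_proxies(x, y, proxies, a, b, rng):
    """Hypothetical mixing step: interpolate each embedding with a randomly drawn proxy.

    x: (N, D) sample embeddings, y: (N,) labels, proxies: (C, D).
    Returns mixed features, soft domain targets (1 = sample, 0 = proxy; assumed),
    and mixup-style soft class targets.
    """
    n, c = x.shape[0], proxies.shape[0]
    lam = rng.beta(a, b, size=(n, 1))          # mixing ratio per pair, lam ~ Beta(a, b)
    j = rng.integers(0, c, size=n)             # randomly sampled proxy indices
    mixed = lam * x + (1.0 - lam) * proxies[j]
    eye = np.eye(c)
    class_target = lam * eye[y] + (1.0 - lam) * eye[j]
    domain_target = lam[:, 0]
    return mixed, domain_target, class_target

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, t, eps=1e-7):
    # Binary cross-entropy with soft targets t in [0, 1].
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)))

def domain_losses(feats, domain_target, w, bias):
    # Assumed linear domain discriminator f_D.
    p = sigmoid(feats @ w + bias)
    d_loss = bce(p, domain_target)          # discriminator phase: predict the domain
    g_loss = bce(p, 1.0 - domain_target)    # generator phase: fool f_D via flipped targets
    return d_loss, g_loss

def category_loss(feats, class_target, W):
    # Assumed linear category classifier f_C: soft-target cross-entropy,
    # kept low in both phases so class predictions are maintained.
    logits = feats @ W
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean((class_target * log_probs).sum(axis=1)))
```

The sketch only evaluates the losses. In an actual training loop, an autodiff framework would presumably alternate between minimizing `d_loss` plus the category loss with respect to the discriminators, and `g_loss` plus the category loss with respect to the encoder and the proxies.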

Theorems & Definitions (2)

  • Theorem A.1
  • proof