Anchor-based Maximum Discrepancy for Relative Similarity Testing

Zhijian Zhou, Liuhua Peng, Xunye Tian, Feng Liu

TL;DR

The paper introduces Anchor-based Maximum Discrepancy (AMD), a kernel-based framework for relative similarity testing with three distributions $\mathbb{U},\mathbb{P},\mathbb{Q}$. AMD defines the relative similarity as the maximum discrepancy across a space of deep kernels, effectively learning both the hypothesis and the kernel in Phase I, followed by a Phase II unified AMD test using wild bootstrap to assess significance. The method comes with theoretical guarantees, including consistency of the AMD estimator and asymptotic power advantages when the learned direction aligns with the true relative similarity, and is validated on benchmarks (MNIST/CIFAR-10) and practical tasks like relative model evaluation and adversarial perturbation detection. The work demonstrates improved test power across regimes, provides open-source code, and discusses limitations such as potential overfitting and runtime, while outlining directions to extend relative similarity testing to more distributions.

Abstract

Relative similarity testing aims to determine which of two distributions, P or Q, is closer to an anchor distribution U. Existing kernel-based approaches often test relative similarity with a fixed kernel under a manually specified alternative hypothesis, e.g., Q is closer to U than P. Although kernel selection is known to be important to kernel-based testing methods, the manually specified hypothesis poses a significant challenge for kernel selection in relative similarity testing: once the hypothesis is specified first, we can always find a kernel such that the hypothesis is rejected. This challenge makes relative similarity testing ill-defined when we want to select a good kernel after the hypothesis is specified. In this paper, we cope with this challenge by learning a proper hypothesis and a kernel simultaneously, instead of learning a kernel after manually specifying the hypothesis. We propose an anchor-based maximum discrepancy (AMD), which defines the relative similarity as the maximum discrepancy between the distances of (U, P) and (U, Q) in a space of deep kernels. Based on AMD, our testing incorporates two phases. In Phase I, we estimate the AMD over the deep kernel space and infer the potential hypothesis. In Phase II, we assess the statistical significance of the potential hypothesis, where we propose a unified testing framework to derive thresholds for tests over different possible hypotheses from Phase I. Lastly, we validate our method theoretically and demonstrate its effectiveness via extensive experiments on benchmark datasets. Code is publicly available at: https://github.com/zhijianzhouml/AMD.
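
To make the Phase I idea concrete, the following is a minimal sketch under strong simplifying assumptions, not the authors' implementation. It replaces the deep kernel space with a small grid of Gaussian bandwidths, measures each distance with the standard unbiased squared-MMD estimate, and omits any regularization or significance testing. The names `gaussian_gram`, `mmd2_unbiased`, `phase_one_sketch`, and the `direction` sign convention are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2_unbiased(x, y, bandwidth):
    """Standard unbiased estimate of squared MMD between samples x and y."""
    n, m = len(x), len(y)
    kxx = gaussian_gram(x, x, bandwidth)
    kyy = gaussian_gram(y, y, bandwidth)
    kxy = gaussian_gram(x, y, bandwidth)
    term_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))  # drop diagonal terms
    term_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * kxy.mean()

def phase_one_sketch(u, p, q, bandwidths):
    """Assumed stand-in for Phase I: search a bandwidth grid (not deep kernels)
    for the largest absolute gap MMD^2(U, P) - MMD^2(U, Q), and record its sign."""
    gaps = np.array(
        [mmd2_unbiased(u, p, b) - mmd2_unbiased(u, q, b) for b in bandwidths]
    )
    best = int(np.argmax(np.abs(gaps)))
    # Illustrative convention: +1 means Q looks closer to U, -1 means P looks closer.
    direction = 1 if gaps[best] > 0 else -1
    return bandwidths[best], direction, float(gaps[best])
```

A usage example: with `u` and `q` drawn from nearby Gaussians and `p` from a shifted one, `phase_one_sketch(u, p, q, [0.5, 1.0, 2.0])` should return `direction == 1`. The selected direction is only a candidate hypothesis; whether it is statistically significant is the job of Phase II, which this sketch does not attempt.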

Paper Structure

This paper contains 24 sections, 9 theorems, 129 equations, 8 figures, 13 tables, 1 algorithm.

Key Result

Theorem 2

Let $\mathcal{M}$ be a set of probability measures over the space $\mathcal{X}\subseteq \mathbb{R}^d$, and let $\mathcal{K}$ be a kernel space consisting of characteristic kernels. For every $\mathbb{U}, \mathbb{P}, \mathbb{Q}, \mathbb{W}\in\mathcal{M}$, the three random variables $d(\mathbb{U},\mat

Figures (8)

  • Figure 1: Comparison between the AMD test and baselines. We set $\mathbb{U} = \nu\mathbb{P} + (1-\nu)\mathbb{Q}$ with $\nu\in[0,1]$. When $\nu < 0.5$, $\mathbb{Q}$ is closer to $\mathbb{U}$, and previous approaches perform well in terms of rejection rates (i.e., test power), as this aligns with the prespecified alternative hypothesis $\bm{H}'_1: d(\mathbb{U}, \mathbb{P}) > d(\mathbb{U}, \mathbb{Q})$; however, when $\nu > 0.5$, their performance deteriorates because $\mathbb{P}$ is closer to $\mathbb{U}$. In comparison, our AMD test performs well for both $\nu < 0.5$ and $\nu > 0.5$ by adjusting the alternative hypothesis with $F$. Notably, when $\nu=0.5$ (i.e., no relative similarity relationship exists), all methods control the rejection rate (type-I error) at level $\alpha=0.05$ (black dashed line). The $p$-values align with the findings derived from the rejection rates, demonstrating the effectiveness of the AMD test.
  • Figure 2: Probability $\beta = \Pr[F\cdot d^{\kappa^*_m}(\mathbb{U},\mathbb{P},\mathbb{Q})>0]$ versus sample size with parameter $\nu=0.3$.
  • Figure 3: Influence of the regularization parameter $\lambda$ in AMD relative similarity testing.
  • Figure 4: Test power comparison between AMD and MMD-D in identifying which unlabeled variant of ImageNet the ResNet50 model (pre-trained on ImageNet) performs better on.
  • Figure 5: Comparisons in detecting whether the adversarial perturbations on CIFAR-10 exceed 4/255.
  • ...and 3 more figures
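
The mixture anchor used in Figure 1, $\mathbb{U} = \nu\mathbb{P} + (1-\nu)\mathbb{Q}$, can be simulated along the following lines. This is a hedged sketch rather than the paper's data pipeline: it assumes only finite samples of $\mathbb{P}$ and $\mathbb{Q}$ are available as 2-D arrays and resamples them with replacement, and the function name `sample_mixture` is hypothetical.

```python
import numpy as np

def sample_mixture(p_samples, q_samples, nu, size, rng):
    """Draw `size` rows from nu*P + (1-nu)*Q by resampling the given samples.

    Assumption: p_samples and q_samples are 2-D arrays with matching columns,
    and resampling with replacement is an acceptable proxy for fresh draws.
    """
    from_p = rng.random(size) < nu                 # True with probability nu
    p_idx = rng.integers(0, len(p_samples), size)  # candidate row from P
    q_idx = rng.integers(0, len(q_samples), size)  # candidate row from Q
    return np.where(from_p[:, None], p_samples[p_idx], q_samples[q_idx])
```

Setting `nu=0.5` gives the null case described in the caption, where neither $\mathbb{P}$ nor $\mathbb{Q}$ is closer to $\mathbb{U}$; values below 0.5 place $\mathbb{Q}$ closer and values above 0.5 place $\mathbb{P}$ closer.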

Theorems & Definitions (16)

  • Definition 1
  • Theorem 2
  • Theorem 3
  • Lemma 4
  • Theorem 5
  • proof
  • Definition 6
  • Theorem 7
  • proof
  • Theorem 1
  • ...and 6 more