Learning for Transductive Threshold Calibration in Open-World Recognition

Qin Zhang, Dongsheng An, Tianjun Xiao, Tong He, Qingming Tang, Ying Nian Wu, Joseph Tighe, Yifan Xing, Stefano Soatto

TL;DR

OpenGCN is a Graph Neural Network-based transductive threshold calibration method with enhanced adaptability and robustness. It infers distance thresholds transductively, incorporating test-time information.

Abstract

In deep metric learning for visual recognition, calibrating the distance threshold is crucial for achieving the desired model performance in terms of true positive rate (TPR) or true negative rate (TNR). However, calibrating this threshold is challenging in open-world scenarios, where the test classes can be entirely disjoint from those encountered during training. We define open-world threshold calibration as the problem of finding distance thresholds for a trained embedding model that achieve target performance metrics over unseen open-world test classes. Existing posthoc threshold calibration methods rely on inductive inference and require a calibration dataset whose distance distribution is similar to that of the test data, so they often prove ineffective in open-world scenarios. To address this, we introduce OpenGCN, a Graph Neural Network-based transductive threshold calibration method with enhanced adaptability and robustness. OpenGCN learns to predict pairwise connectivity for the unlabeled test instances embedded in a graph in order to determine their TPR and TNR at various distance thresholds, allowing for transductive inference of the distance thresholds that also incorporates test-time information. Extensive experiments across open-world visual recognition benchmarks validate OpenGCN's superiority over existing posthoc calibration methods for open-world threshold calibration.
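
To make the quantities in the abstract concrete, the following is a minimal sketch of how TPR and TNR depend on a distance threshold when pair labels are available. It assumes that a pair is predicted "same class" when its distance is at most the threshold; the function name and the toy data are hypothetical. This is the labeled, inductive-style measurement, not OpenGCN itself, which predicts these rates for unlabeled test data.

```python
def tpr_tnr_at_threshold(distances, same_class, threshold):
    """Compute (TPR, TNR) over labeled pairs at one distance threshold.

    distances:  pairwise embedding distances, one per pair
    same_class: True if the pair shares a class label, else False
    Assumption: a pair is predicted "same class" when distance <= threshold.
    """
    tp = fn = tn = fp = 0
    for distance, is_same in zip(distances, same_class):
        predicted_same = distance <= threshold
        if is_same and predicted_same:
            tp += 1
        elif is_same:
            fn += 1
        elif predicted_same:
            fp += 1
        else:
            tn += 1
    # Rates are undefined (NaN) when there are no positive or no negative pairs.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


# Toy example: three same-class pairs and three different-class pairs.
toy_distances = [0.1, 0.2, 0.6, 0.4, 0.8, 0.9]
toy_same_class = [True, True, True, False, False, False]
tpr, tnr = tpr_tnr_at_threshold(toy_distances, toy_same_class, threshold=0.5)
```

Raising the threshold increases TPR and lowers TNR, which is why the threshold has to be chosen against a target operating point.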

Paper Structure (11 sections, 2 theorems, 12 equations, 4 figures, 7 tables)

Key Result

Theorem 1

(Correspondence between $s^\mathrm{avg}$ and $\text{TPR}^k$) Let $\mathcal{N}$ be a cluster with high purity, where the majority class is $k$. For each sample $i\in\mathcal{N}$, when both $|\mathcal{N}|$ and $|\mathcal{N}_i|$ are sufficiently large, $\text{TPR}^k$ can be approximated as: … where $\mathfrak{a}_i^\mathrm{avg}=\frac{1}{|\mathcal{N}_i|}\sum_{j\in \mathcal{N}_i}a_{ij}$, and …

Figures (4)

  • Figure 1: This figure illustrates the open-world threshold calibration problem. In open-world recognition, the embedding model is trained on closed-set classes but tested on distinct open-world classes. When applying the model to open-world classes, it often produces less compact embeddings than those encountered during training, necessitating the calibration of the distance threshold for achieving the desired TPR and TNR trade-off. However, the absence of prior knowledge about open-world test classes and distributions makes it challenging to find the optimal distance threshold, denoted as $d^\mathrm{opt}$. Best viewed in color.
  • Figure 2: This figure distinguishes between (left) inductive and (right) transductive threshold calibration methods in open-world scenarios with disjoint test-time classes. Inductive methods rely on a labeled hold-out dataset with the same distance distribution as the test data to learn general calibration rules. Transductive methods, however, also use the test information for more specific calibration, as indicated by the red arrow. Best viewed in color.
  • Figure 3: OpenGCN training workflow: (a) During pre-training, OpenGCN jointly optimizes pairwise connectivity together with the instance-specific neighborhood and average densities. (b) During fine-tuning, the 2-layer MLP is reset and retrained, while the other weights remain frozen. Solid blue and dashed red arrows represent forward and backward propagation, respectively. At test time, we employ the trained OpenGCN model and MLP head to predict the TPR and TNR as functions of the distance threshold, specifically for each test distribution. We then follow Eq. \ref{eq:optimization} and use grid search to find the optimal distance threshold for each test dataset. Best viewed in color.
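
The grid-search step mentioned in the Figure 3 caption can be sketched as follows. The optimization objective itself is not reproduced on this page, so this sketch assumes a simple stand-in: pick the grid threshold whose estimated TPR is closest to a target TPR. The function name, the grid, and the curve values are hypothetical; in the paper's setting the curve would come from the model's predictions on the test data.

```python
def calibrate_threshold(thresholds, tpr_curve, target_tpr):
    """Grid search over candidate thresholds.

    thresholds: candidate distance thresholds (the grid)
    tpr_curve:  estimated TPR at each candidate threshold, same length
    target_tpr: desired TPR operating point
    Assumed objective: minimize |TPR(d) - target_tpr| over the grid.
    Ties are resolved in favor of the earliest threshold in the grid.
    """
    if len(thresholds) != len(tpr_curve) or not thresholds:
        raise ValueError("thresholds and tpr_curve must be non-empty and equal length")
    best_index = min(
        range(len(thresholds)),
        key=lambda i: abs(tpr_curve[i] - target_tpr),
    )
    return thresholds[best_index]


# Hypothetical estimated TPR curve on a coarse grid of distance thresholds.
grid = [0.2, 0.4, 0.6, 0.8]
estimated_tpr = [0.35, 0.62, 0.88, 0.97]
chosen = calibrate_threshold(grid, estimated_tpr, target_tpr=0.9)
```

The same search applies to a TNR target by passing an estimated TNR curve instead; a finer grid trades compute for a closer match to the target.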

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2