MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence

Hongduan Tian; Feng Liu; Tongliang Liu; Bo Du; Yiu-ming Cheung; Bo Han

TL;DR

MOKD tackles cross-domain few-shot classification by reframing the prototypical NCC objective as a kernel dependence problem. It introduces a bi-level framework that first maximizes the test power of an optimized HSIC kernel (opt-HSIC) and then optimizes the dependence between representations and labels while regularizing dependence among samples via a γ-weighted HSIC(Z,Z) term. The approach yields better generalization to unseen domains and clearer data clustering on Meta-Dataset, with empirical gains over strong baselines and evidence that test power maximization and kernel dependence control are crucial. This method provides a scalable, principled way to learn class-specific representations under domain shift, with practical kernel bandwidth selection and efficient HSIC estimation.
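The objective described above can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the MOKD repository: the function names (`gaussian_kernel`, `hsic_biased`, `mokd_style_loss`), the Gaussian kernel on representations, the simple indicator kernel on labels, and the standard biased HSIC estimator. The bandwidth is fixed in this sketch, so the test power maximization step is not shown.

```python
import numpy as np


def gaussian_kernel(x, bandwidth):
    """Gaussian kernel matrix for the rows of x (bandwidth is a free parameter)."""
    sq = np.sum(x ** 2, axis=1, keepdims=True)
    d2 = np.maximum(sq + sq.T - 2.0 * x @ x.T, 0.0)  # squared distances, clipped at 0
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def hsic_biased(k, l):
    """Standard biased HSIC estimator: trace(K H L H) / (m - 1)^2."""
    m = k.shape[0]
    h = np.eye(m) - np.ones((m, m)) / m  # centering matrix
    return float(np.trace(k @ h @ l @ h)) / (m - 1) ** 2


def mokd_style_loss(z, labels, gamma, bandwidth=1.0):
    """Hypothetical loss: -HSIC(Z, Y) + gamma * HSIC(Z, Z).

    Assumption: the label kernel is a plain indicator of class equality,
    which may differ from the label kernel defined in the paper.
    """
    k_z = gaussian_kernel(z, bandwidth)
    k_y = (labels[:, None] == labels[None, :]).astype(float)
    return -hsic_biased(k_z, k_y) + gamma * hsic_biased(k_z, k_z)


# Toy usage: two tight clusters, with labels that match or ignore the clusters.
z = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1]])
aligned = np.array([0, 0, 1, 1])
misaligned = np.array([0, 1, 0, 1])
loss_aligned = mokd_style_loss(z, aligned, gamma=0.1)
loss_misaligned = mokd_style_loss(z, misaligned, gamma=0.1)
```

In this toy case the loss is lower when the labels agree with the cluster structure, because the HSIC(Z, Y) term is larger while the HSIC(Z, Z) term is identical for both labelings.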

Abstract

In cross-domain few-shot classification, nearest centroid classifier (NCC) aims to learn representations to construct a metric space where few-shot classification can be performed by measuring the similarities between samples and the prototype of each class. An intuition behind NCC is that each sample is pulled closer to the class centroid it belongs to while pushed away from those of other classes. However, in this paper, we find that there exist high similarities between NCC-learned representations of two samples from different classes. In order to address this problem, we propose a bi-level optimization framework, maximizing optimized kernel dependence (MOKD) to learn a set of class-specific representations that match the cluster structures indicated by labeled data of the given task. Specifically, MOKD first optimizes the kernel adopted in Hilbert-Schmidt independence criterion (HSIC) to obtain the optimized kernel HSIC (opt-HSIC) that can capture the dependence more precisely. Then, an optimization problem regarding the opt-HSIC is addressed to simultaneously maximize the dependence between representations and labels and minimize the dependence among all samples. Extensive experiments on Meta-Dataset demonstrate that MOKD can not only achieve better generalization performance on unseen domains in most cases but also learn better data representation clusters. The project repository of MOKD is available at: https://github.com/tmlr-group/MOKD.
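For readers unfamiliar with NCC, the classification rule in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`l2_normalize`, `class_prototypes`, `ncc_predict`) are hypothetical, prototypes are taken as per-class means of support representations, and cosine similarity is assumed as the similarity measure.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against zero rows)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def class_prototypes(support_z, support_y):
    """Return the sorted class labels and the mean representation of each class."""
    classes = np.unique(support_y)
    protos = np.stack([support_z[support_y == c].mean(axis=0) for c in classes])
    return classes, protos


def ncc_predict(query_z, support_z, support_y):
    """Assign each query to the class whose prototype is most cosine-similar."""
    classes, protos = class_prototypes(support_z, support_y)
    sims = l2_normalize(query_z) @ l2_normalize(protos).T  # (n_query, n_classes)
    return classes[np.argmax(sims, axis=1)]


# Toy usage with two classes labeled 3 and 7.
support_z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
support_y = np.array([3, 3, 7, 7])
query_z = np.array([[2.0, 0.1], [0.2, 3.0]])
predictions = ncc_predict(query_z, support_z, support_y)
```

The same cosine-similarity matrix, computed between samples rather than between samples and prototypes, is the kind of quantity one would inspect to check for high similarity between samples of different classes.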

Paper Structure (46 sections, 3 theorems, 26 equations, 9 figures, 11 tables, 1 algorithm)

Introduction
Preliminary
Motivation: Theoretically Understand NCC via the Kernel HSIC Measure
A Lower Bound of NCC-based Loss
HSIC as a Lower Bound of NCC-based Loss
Perform CFC Tasks with HSIC
Maximizing Optimized Kernel Dependence
Backbone Pre-training
Problem Formulation for MOKD
Bandwidth Selection for Test Power Maximization
Experiments
Main Results
Experimental Analysis
Conclusion
More Related Work
...and 31 more sections

Key Result

Theorem 3.2

Given a set of normalized support representations $\mathcal{Z}=\{\boldsymbol{z}_i\}_{i=1}^{|\mathcal{D}_{\mathcal{T}}|}=\{h_{\theta}\circ f_{\phi^*}(\boldsymbol{x}_i)\}_{i=1}^{|\mathcal{D}_{\mathcal{T}}|}$ and the corresponding labels $\{y_i\}_{i=1}^{|\mathcal{D}_{\mathcal{T}}|}$ that includes $N_C$ where $\boldsymbol{z}^{+}$ denotes the data samples belonging to the same class as $\boldsymbol{z}_

Figures (9)

Figure 1: Heatmaps of similarity matrices of representations respectively learned with NCC-based loss and MOKD. The left of each figure describes the similarities among all support data representations and the right side describes the similarities between query data and support data representations. As shown in (a), NCC-learned representations of samples are not only similar to samples belonging to their own class but also similar to samples from other classes. (b) shows that the undesirable high similarities existing between samples from different classes are significantly alleviated and the cluster structures of the given set of data are well explored by applying MOKD.
Figure 2: Results of analysis on MOKD. (a) Comparison of running time between MOKD and URL. (b) Accuracy gaps between MOKD with and without test power maximization.
Figure 3: Quantitative analysis of $\gamma$. (a) Effect of $\gamma$ on accuracy on the ImageNet dataset; (b) effect of $\gamma$ on accuracy on the MNIST dataset; (c) performance gaps between MOKD with and without ${\rm HSIC}(Z, Z)$; (d) test accuracy curves of MOKD with and without ${\rm HSIC}(Z, Z)$ on ImageNet.
Figure 4: Heatmap visualization of representation similarity on Omniglot. The results indicate that MOKD helps learn more discriminative sample clusters than those learned with the NCC loss.
Figure 5: Test accuracy curves of Meta-Dataset with respect to the steps under "train on all datasets" settings. As shown in the figures, MOKD evidently achieves a better learning process and convergence performance compared with URL baseline.
...and 4 more figures

Theorems & Definitions (8)

Theorem 3.2: Lower bound of NCC-based loss
Definition 3.3: Label kernel [sslhsic]
Theorem 3.4
Remark 3.5
Lemma 2.1
proof
proof
proof
