MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence
Hongduan Tian, Feng Liu, Tongliang Liu, Bo Du, Yiu-ming Cheung, Bo Han
TL;DR
MOKD tackles cross-domain few-shot classification by reframing the prototypical NCC objective as a kernel dependence problem. It introduces a bi-level framework that first optimizes the HSIC kernel by maximizing its test power, yielding the optimized-kernel HSIC (opt-HSIC). It then maximizes the dependence between representations and labels while regularizing the dependence among samples through a γ-weighted HSIC(Z,Z) term. On Meta-Dataset, the approach generalizes better to unseen domains in most cases and produces clearer data clusters, with empirical gains over strong baselines and evidence that both test power maximization and kernel dependence control are crucial. The method offers a scalable, principled way to learn class-specific representations under domain shift, with practical kernel bandwidth selection and efficient HSIC estimation.
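The dependence objective described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the MOKD implementation: it uses the biased empirical HSIC estimator tr(KHLH)/(n-1)^2, a Gaussian kernel with a fixed, hand-picked bandwidth `sigma` on the representations Z, a delta kernel (equivalent to a linear kernel on one-hot labels) on Y, and an arbitrary `gamma`. All function names are hypothetical, and the kernel-optimization stage that produces opt-HSIC is omitted.

```python
import math


def gaussian_gram(points, sigma):
    """Gaussian (RBF) Gram matrix: K[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    n = len(points)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            gram[i][j] = math.exp(-sq / (2.0 * sigma ** 2))
    return gram


def label_gram(labels):
    """Delta kernel on labels: 1 if same class, else 0 (inner product of one-hot vectors)."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def center(gram):
    """Double centering H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]


def hsic(k_gram, l_gram):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(k_gram)
    kc = center(k_gram)
    # tr(K H L H) = tr((H K H) L) by cyclicity of the trace; tr(A B) = sum_ij A_ij B_ji.
    trace = sum(kc[i][j] * l_gram[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


def mokd_style_objective(z, labels, sigma=1.0, gamma=0.5):
    """HSIC(Z, Y) - gamma * HSIC(Z, Z): reward label dependence, penalize sample dependence."""
    k_z = gaussian_gram(z, sigma)
    return hsic(k_z, label_gram(labels)) - gamma * hsic(k_z, k_z)


if __name__ == "__main__":
    z = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    print(mokd_style_objective(z, [0, 0, 1, 1]))  # labels agree with the clusters
    print(mokd_style_objective(z, [0, 1, 0, 1]))  # labels cut across the clusters
```

Because the γ-weighted HSIC(Z,Z) term depends only on the representations, changing the labels affects only the first term: representations whose clusters agree with the labels score higher than the same representations paired with mismatched labels.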
Abstract
In cross-domain few-shot classification, the \emph{nearest centroid classifier} (NCC) is used to learn representations that form a metric space in which few-shot classification is performed by measuring the similarity between each sample and the prototype of each class. The intuition behind NCC is that each sample is pulled closer to the centroid of its own class while being pushed away from the centroids of other classes. However, in this paper, we find that NCC-learned representations of two samples from different classes can still be highly similar. To address this problem, we propose a bi-level optimization framework, \emph{maximizing optimized kernel dependence} (MOKD), to learn a set of class-specific representations that match the cluster structures indicated by the labeled data of the given task. Specifically, MOKD first optimizes the kernel adopted in the \emph{Hilbert-Schmidt independence criterion} (HSIC) to obtain the optimized kernel HSIC (opt-HSIC), which captures the dependence more precisely. Then, an optimization problem based on opt-HSIC is solved to simultaneously maximize the dependence between representations and labels and minimize the dependence among all samples. Extensive experiments on Meta-Dataset demonstrate that MOKD not only achieves better generalization performance on unseen domains in most cases but also learns better-clustered data representations. The project repository of MOKD is available at: \href{https://github.com/tmlr-group/MOKD}{https://github.com/tmlr-group/MOKD}.
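The NCC step described in the abstract can be illustrated with a toy sketch. Assumptions: embeddings are small, nonzero plain-Python vectors standing in for backbone features; similarity is cosine similarity (a common choice for NCC heads, assumed here rather than taken from the paper); and `max_cross_class_similarity` is a hypothetical diagnostic for the cross-class similarity issue the abstract describes. None of these names come from the MOKD repository.

```python
import math


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def prototypes(support, labels):
    """Class centroid (mean embedding) of the support samples for each label."""
    sums, counts = {}, {}
    for x, y in zip(support, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + a for s, a in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def ncc_predict(query, protos):
    """Assign the query to the class whose prototype is most cosine-similar."""
    return max(protos, key=lambda y: cosine(query, protos[y]))


def max_cross_class_similarity(support, labels):
    """Largest cosine similarity between two support samples with different labels."""
    best = -1.0
    for i in range(len(support)):
        for j in range(i + 1, len(support)):
            if labels[i] != labels[j]:
                best = max(best, cosine(support[i], support[j]))
    return best


if __name__ == "__main__":
    support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    labels = ["a", "a", "b", "b"]
    protos = prototypes(support, labels)
    print(ncc_predict([0.8, 0.2], protos))
    print(max_cross_class_similarity(support, labels))
```

On real features, a high value from the diagnostic would correspond to the problem that motivates MOKD: samples from different classes whose representations remain close even though NCC pulls each sample toward its own centroid.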
