Task-Specific Preconditioner for Cross-Domain Few-Shot Learning

Suhyun Kang, Jungwon Park, Wonseok Lee, Wonjong Rhee

TL;DR

This work tackles cross-domain few-shot learning by introducing Task-Specific Preconditioned gradient descent (TSP), a mechanism that adapts optimization to the target task via a positive-definite (PD) preconditioner. TSP meta-learns Domain-Specific Preconditioners (DSPs) for each seen domain and combines them with task-coefficients derived from a Dataset Classifier to form a Task-Specific Preconditioner that guides gradient descent toward the steepest-descent direction in the target task’s geometry. Key contributions include a PD design for DSPs, a bi-level optimization framework to learn both DSPs and task-coefficients, and extensive experiments on Meta-Dataset showing state-of-the-art results in both multi-domain and single-domain settings, with ablations highlighting the importance of the PD constraint. The approach enables robust cross-domain adaptation by leveraging multi-domain knowledge to tailor the optimization process to each cross-domain few-shot classification task.

Abstract

Cross-Domain Few-Shot Learning (CDFSL) methods typically parameterize models with task-agnostic and task-specific parameters. To adapt task-specific parameters, recent approaches have utilized fixed optimization strategies, despite their potential sub-optimality across varying domains or target tasks. To address this issue, we propose a novel adaptation mechanism called Task-Specific Preconditioned gradient descent (TSP). Our method first meta-learns Domain-Specific Preconditioners (DSPs) that capture the characteristics of each meta-training domain, which are then linearly combined using task-coefficients to form the Task-Specific Preconditioner. The preconditioner is applied to gradient descent, making the optimization adaptive to the target task. We constrain our preconditioners to be positive definite, guiding the preconditioned gradient toward the direction of steepest descent. Empirical evaluations on the Meta-Dataset show that TSP achieves state-of-the-art performance across diverse experimental scenarios.
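
The combination step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (`make_pd`, `task_specific_preconditioner`, `pgd_step`), the `A A^T + eps*I` construction of a positive-definite matrix, the fixed coefficients, the toy quadratic loss, and the step-size choice are all assumptions made for illustration. In the paper the DSPs are meta-learned and the task-coefficients come from a Dataset Classifier.

```python
import numpy as np

def make_pd(rng, m, eps=1e-3):
    """Hypothetical stand-in for a meta-learned DSP.

    A @ A.T is positive semi-definite; adding eps * I makes it
    symmetric positive definite.
    """
    a = rng.standard_normal((m, m))
    return a @ a.T + eps * np.eye(m)

def task_specific_preconditioner(dsps, coeffs):
    """Linearly combine DSPs with task-coefficients: P = sum_k p_k * P_k."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.tensordot(coeffs, np.stack(dsps), axes=1)

def pgd_step(theta, grad, precond, lr):
    """One preconditioned gradient descent step: theta - lr * P @ grad."""
    return theta - lr * (precond @ grad)

rng = np.random.default_rng(0)
m, K = 4, 3                                  # parameter size, number of domains
dsps = [make_pd(rng, m) for _ in range(K)]   # placeholders for learned DSPs
coeffs = np.array([0.6, 0.3, 0.1])           # assumed coefficients, sum to 1
P = task_specific_preconditioner(dsps, coeffs)

# Toy loss 0.5 * ||theta||^2, whose gradient is theta itself (assumption).
theta = rng.standard_normal(m)
grad = theta.copy()

# Illustrative step size: 1 / largest eigenvalue of P keeps the step stable.
lr = 1.0 / np.linalg.eigvalsh(P).max()
new_theta = pgd_step(theta, grad, P, lr)

loss_before = 0.5 * float(theta @ theta)
loss_after = 0.5 * float(new_theta @ new_theta)
```

Because `P` is positive definite, the inner product of `grad` with `P @ grad` is strictly positive, so the update always moves in a descent direction for the loss. This is the property the positive-definite constraint is meant to guarantee.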

Paper Structure

This paper contains 49 sections, 3 theorems, 26 equations, 6 figures, 16 tables, 3 algorithms.

Key Result

Theorem 1

Let $p_k \in [0, 1], k=1,\cdots,K$, be the task-coefficients satisfying $\sum^K_{k=1}p_k=1$. For the Domain-Specific Preconditioners $\mathbf{P}_k \in \mathbb{R}^{m \times m}, k=1,\cdots,K$, Task-Specific Preconditioner $\mathbf{P}$ defined as $\mathbf{P} = \sum^K_{k=1}p_k \cdot \mathbf{P}_k$ is pos
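
The statement above is cut off in this listing. Assuming it concludes that $\mathbf{P}$ is positive definite whenever every $\mathbf{P}_k$ is (an assumption, since the remainder of the theorem is not shown here), the standard argument takes one line. For any nonzero $\mathbf{x} \in \mathbb{R}^m$:

```latex
\[
\mathbf{x}^\top \mathbf{P}\,\mathbf{x}
  = \sum_{k=1}^{K} p_k \,\mathbf{x}^\top \mathbf{P}_k\,\mathbf{x}
  > 0 .
\]
```

Each term is non-negative because $p_k \ge 0$ and $\mathbf{x}^\top \mathbf{P}_k \mathbf{x} > 0$. Since $\sum_{k=1}^{K} p_k = 1$, at least one $p_k$ is strictly positive, so the sum is strictly positive.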

Figures (6)

  • Figure 1: All experiments are conducted based on TSA. (a) The optimal optimization strategy can vary significantly depending on the nature of the target task, leading to notable differences in performance on the Meta-Dataset. (b) Accuracy on seen and unseen domains of the Meta-Dataset. Compared to the baseline of using gradient descent, adopting a preconditioner without a PD constraint can be unreliable. With a PD constraint, it becomes reliable to adapt the preconditioner to the target task. Further details on these preconditioners are provided in Appendix A.
  • Figure 2: Illustration of forming a Task-Specific Preconditioner based on three DSPs that have been meta-trained for three meta-training domains.
  • Figure 3: (a) PGD with Domain-Specific Preconditioner (DSP) in the inner-level optimization. During meta-training, for a training task $\mathcal{T}$, the DSP is chosen based on the domain label $d_{\mathcal{T}}$, and each task-specific parameter $\theta^l$ is optimized using PGD with the selected DSP $\mathbf{P}^l_{d_{\mathcal{T}}}$. (b) PGD with Task-Specific Preconditioner. During meta-testing, for a test task, each Task-Specific Preconditioner $\mathbf{P}^l_{\mathcal{T}}$ is constructed using DSPs and task-coefficients generated by the Dataset Classifier. Each task-specific parameter $\theta^l$ is then optimized using PGD with $\mathbf{P}^l_{\mathcal{T}}$.
  • Figure 4: Learning curves of PGD with and without the PD constraint across both seen and unseen domains. Further details on the preconditioners used in this figure can be found in Appendix A.
  • Figure 5: Task-coefficient values used in the construction of the Task-Specific Preconditioner. Columns represent DSPs trained on one of the eight training domains of Meta-Dataset. (a) Rows represent five test tasks randomly sampled from each of four domains: two seen domains (ImageNet, Birds) and two unseen domains (Traffic Sign, MSCOCO). (b) Rows represent five test tasks randomly sampled from each of three seen domains: Omniglot, Aircraft, and Textures. (c) Rows represent five test tasks randomly sampled from each of three seen domains: Quick Draw, Fungi, and VGG Flower. (d) Rows represent five test tasks randomly sampled from each of three unseen domains: MNIST, CIFAR-10, and CIFAR-100.
  • ...and 1 more figure

Theorems & Definitions (5)

  • Theorem 1
  • Lemma 1
  • proof
  • Theorem 1
  • proof