Effects of Structural Allocation of Geometric Task Diversity in Linear Meta-Learning Models

Saptati Datta, Nicolas W. Hengartner, Yulia Pimonova, Natalie E. Klein, Nicholas Lubbers

TL;DR

Theory and simulation both show that meta-learning prediction degrades when a larger fraction of between-task variability lies in orthogonal, non-informative directions, even when the overall geometric variability of tasks is held fixed.

Abstract

Meta-learning aims to leverage information across related tasks to improve prediction on unlabeled data for new tasks when only a small number of labeled observations are available ("few-shot" learning). Increased task diversity is often believed to enhance meta-learning by providing richer information across tasks. However, recent work by Kumar et al. (2022) shows that increasing task diversity, quantified through the overall geometric spread of task representations, can in fact degrade meta-learning prediction performance across a range of models and datasets. In this work, we build on this observation by showing that meta-learning performance is affected not only by the overall geometric variability of task parameters, but also by how this variability is allocated relative to an underlying low-dimensional structure. Similar to Pimonova et al. (2025), we decompose task-specific regression effects into a structurally informative component and an orthogonal, non-informative component. We show theoretically and through simulation that meta-learning prediction degrades when a larger fraction of between-task variability lies in orthogonal, non-informative directions, even when the overall geometric variability of tasks is held fixed.
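
The idea of holding overall variability fixed while changing how it is allocated can be illustrated with a small simulation. The sketch below is not the paper's model: the Gaussian task effects, the random subspace, and the names `simulate_task_effects`, `variance_split`, `total_var`, and `frac_orth` are all assumptions introduced here for illustration. It draws task-specific coefficient vectors whose expected squared norm is fixed, with a chosen fraction of that variability placed orthogonal to a low-dimensional subspace, and then recovers the split by projection.

```python
import numpy as np


def simulate_task_effects(num_tasks, dim, rank, total_var, frac_orth, seed=0):
    """Draw zero-mean task coefficient vectors with a fixed expected squared norm.

    Hypothetical setup (an assumption, not the paper's specification):
    a fraction `frac_orth` of `total_var` lies orthogonal to a random
    `rank`-dimensional subspace; the remainder lies inside it.
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal basis; the first `rank` columns span the informative subspace.
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    u, v = q[:, :rank], q[:, rank:]
    proj = u @ u.T  # orthogonal projector onto the informative subspace

    # Per-coordinate standard deviations so the two parts sum to `total_var`.
    sd_in = np.sqrt((1.0 - frac_orth) * total_var / rank)
    sd_out = np.sqrt(frac_orth * total_var / (dim - rank))

    inside = (rng.standard_normal((num_tasks, rank)) * sd_in) @ u.T
    outside = (rng.standard_normal((num_tasks, dim - rank)) * sd_out) @ v.T
    return inside + outside, proj


def variance_split(betas, proj):
    """Average squared norm of the projected part and of the orthogonal remainder."""
    inside = betas @ proj  # proj is symmetric, so this projects each row
    outside = betas - inside
    in_var = float(np.mean(np.sum(inside ** 2, axis=1)))
    out_var = float(np.mean(np.sum(outside ** 2, axis=1)))
    return in_var, out_var


if __name__ == "__main__":
    for frac in (0.1, 0.5, 0.9):
        betas, proj = simulate_task_effects(5000, 10, 2, 11.8, frac)
        in_var, out_var = variance_split(betas, proj)
        print(f"frac_orth={frac}: total={in_var + out_var:.2f}, "
              f"orthogonal share={out_var / (in_var + out_var):.2f}")
```

In this toy setup the total stays near `total_var` for every value of `frac_orth`, while the share of variability outside the subspace changes, which is the contrast the abstract draws between overall geometric variability and its allocation.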

Paper Structure

This paper contains 25 sections, 2 theorems, 105 equations, 7 figures, 1 table, 2 algorithms.

Key Result

Lemma 5.1

Let the error variance be fixed at $\sigma = \sigma^\star$, which is assumed to be known for simplicity. Under the marginal posterior laws $\pi(\varphi \mid \mathcal{D})$ and $\pi(\mathbf{P} \mid \mathcal{D})$, where

Figures (7)

  • Figure 1: Density of $\log\left(\sin^2(\theta_1)\right)$, representing the distance between the true $\mathbf{P}_0$ and posterior samples of $\mathbf{P}$, for different values of $\varphi_0$.
  • Figure 2: Top: density of $R^2$ values across $100$ datasets with $n=50$ data points, comparing meta-learning prediction for tasks generated with $\varphi_0 \in \{0.2,\,0.15,\,0.1,\,0.05,\,0.02,\,0.01\}$. Bottom: density of $\mathrm{trace}\!\left(\Sigma_y\right)$ values across $100$ datasets, comparing uncertainty in meta-learning prediction for tasks generated with different values of $\varphi_0$.
  • Figure 3: Density of $\log\left(\sin^2(\theta_1)\right)$, representing the distance between the true $\mathbf{P}_0$ and posterior samples of $\mathbf{P}$, for different pairs $(\varphi_0, k)$ with $k / \mathrm{trace}(\Sigma_{0}) = 0.169$ (dotted), $0.423$ (dashed), and $0.847$ (solid), where $\mathrm{trace}\!\left(\Sigma_{0}\right) = 11.8$.
  • Figure 4: Top: density of $R^2$ values across $100$ datasets with $n=50$ data points, comparing meta-learning prediction for tasks generated using $(\varphi_0, k) = (0.1, 2), (0.05, 5), (0.02, 10)$, with corresponding $k / \mathrm{trace}(\Sigma_{0}) = 0.169$ (dotted), $0.423$ (dashed), and $0.847$ (solid). Bottom: density of $\mathrm{trace}\!\left(\Sigma_y\right)$ values across the same datasets, under the same task generation settings.
  • Figure 5: The $x$-axis shows $\log\left(\sin^2(\theta_1)\right)$ and the $y$-axis shows the density of these values. The figure illustrates the decline of $\sin^2\theta_1(\mathbf{P}_{[t]}, \mathbf{P}^\star)$ as the number of tasks $S$ and the number of samples per task $n_s$ increase, under a high-dimensional setting with $n_s = 50$ (red) and a moderate-dimensional setting with $n_s = 100$ (black) samples per task.
  • ...and 2 more figures
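
Several captions above measure the distance between two subspaces through $\sin^2(\theta_1)$. As a rough illustration of how such a quantity can be computed, the sketch below takes $\theta_1$ to be the largest principal angle between two subspaces of equal dimension; which principal angle the paper intends is not stated in this listing, so that choice, along with the function name `sin2_largest_principal_angle` and its basis-matrix inputs, is an assumption.

```python
import numpy as np


def sin2_largest_principal_angle(basis_a, basis_b):
    """Squared sine of the largest principal angle between two column spaces.

    Assumes both inputs have full column rank and the same number of columns.
    """
    # Orthonormalize each basis so the projections below are orthogonal projections.
    qa, _ = np.linalg.qr(np.asarray(basis_a, dtype=float))
    qb, _ = np.linalg.qr(np.asarray(basis_b, dtype=float))
    # Part of span(B) left over after projecting onto span(A); its singular
    # values are the sines of the principal angles.
    resid = qb - qa @ (qa.T @ qb)
    sines = np.linalg.svd(resid, compute_uv=False)
    # Singular values are returned in descending order; clip rounding overshoot.
    return float(min(sines[0] ** 2, 1.0))


if __name__ == "__main__":
    a = np.array([[1.0], [0.0]])
    b = np.array([[np.cos(np.pi / 6)], [np.sin(np.pi / 6)]])
    # Two lines in the plane, 30 degrees apart.
    print(np.log(sin2_largest_principal_angle(a, b)))
```

Working with the residual after projection, rather than with one minus a squared cosine, keeps the result accurate when the two subspaces nearly coincide, which matters when the quantity is plotted on a log scale.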

Theorems & Definitions (8)

  • Definition 3.1: Geometric task diversity
  • Definition 3.2: Structural task diversity
  • Lemma 5.1
  • Theorem 5.2
  • Definition 1.1: Episodic few-shot learning
  • proof : Proof of Lemma 1 (i)
  • proof : Proof of Lemma 1 (ii)
  • proof