Few-shot Multi-Task Learning of Linear Invariant Features with Meta Subspace Pursuit

Chaozhi Zhang, Lin Liu, Xiaoqun Zhang

TL;DR

The paper tackles data scarcity in multi-task linear regression by positing a low-rank, task-invariant subspace shared across tasks: $\bm{\Theta}^{*}=\mathbf{W}^{*}\mathbf{B}^{*}$, where $\mathbf{B}^{*}$ spans an $s$-dimensional subspace. It introduces Meta Subspace Pursuit (Meta-SP), an iterative rank-$s$ subspace learning algorithm that alternates per-task gradient updates with hard thresholding of the concatenated coefficient matrix, recovering $\bm{\Theta}$ and, from its right singular vectors, the invariant subspace $\mathbf{B}$. The authors provide RIP-based guarantees and convergence rates showing that, with $m=\Omega(s\log s)$ samples per task, the estimator converges to a noise floor of $O\left(\sqrt{\frac{dT\sigma^2}{m}}\right)$. In low-data regimes Meta-SP outperforms several baselines: on simulated data and on a real PM2.5 air-quality dataset it achieves superior accuracy and computational efficiency when data are scarce, supporting the practical value of learning a shared invariant representation for few-shot multi-task learning.
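The alternation described above can be illustrated with a minimal numpy sketch. It rests on assumptions not taken from the paper: each task uses the least-squares loss $\frac{1}{2m}\|\mathbf{y}_t - \mathbf{X}_t\bm{\theta}_t\|^2$, the task coefficients are stacked as rows of a $T \times d$ matrix, the iterate starts at zero with a fixed step size, and hard thresholding is implemented as truncation to the top $s$ singular values. The names `meta_sp_sketch` and `rank_s_threshold` are hypothetical; this is a projected-gradient illustration, not the authors' Algorithm 1.

```python
import numpy as np


def rank_s_threshold(theta: np.ndarray, s: int) -> np.ndarray:
    """Hard thresholding: best rank-s approximation via truncated SVD."""
    u, sv, vt = np.linalg.svd(theta, full_matrices=False)
    return (u[:, :s] * sv[:s]) @ vt[:s, :]


def meta_sp_sketch(xs: np.ndarray, ys: np.ndarray, s: int,
                   step: float = 1.0, n_iter: int = 200):
    """Hypothetical sketch of alternating per-task gradient steps and rank-s truncation.

    xs: shape (T, m, d), one design matrix per task (assumed layout).
    ys: shape (T, m), one response vector per task.
    Returns (theta_hat, b_hat): theta_hat is T x d; b_hat is s x d with
    orthonormal rows spanning the estimated task-invariant subspace.
    """
    n_tasks, m, d = xs.shape
    theta = np.zeros((n_tasks, d))
    for _ in range(n_iter):
        # Per-task residuals y_t - X_t theta_t, computed for all tasks at once.
        residual = ys - np.einsum("tmd,td->tm", xs, theta)
        # Gradient of (1/2m)||y_t - X_t theta_t||^2 with respect to theta_t.
        grad = -np.einsum("tmd,tm->td", xs, residual) / m
        # Hard thresholding on the concatenated (stacked) coefficient matrix.
        theta = rank_s_threshold(theta - step * grad, s)
    # Top-s right singular vectors span the estimated invariant subspace.
    _, _, vt = np.linalg.svd(theta, full_matrices=False)
    return theta, vt[:s, :]
```

Because the rank projection acts on the whole stacked matrix, tasks share information through its common row space; the step size must keep each per-task gradient step stable, i.e. stay below about $2/\lambda_{\max}(\mathbf{X}_t^{\top}\mathbf{X}_t/m)$.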

Abstract

Data scarcity poses a serious threat to modern machine learning and artificial intelligence, as their practical success typically relies on the availability of big datasets. One effective strategy to mitigate the issue of insufficient data is to first harness information from other data sources possessing certain similarities in the study design stage, and then employ the multi-task or meta learning framework in the analysis stage. In this paper, we focus on multi-task (or multi-source) linear models whose coefficients across tasks share an invariant low-rank component, a popular structural assumption considered in the recent multi-task or meta learning literature. Under this assumption, we propose a new algorithm, called Meta Subspace Pursuit (abbreviated as Meta-SP), that provably learns this invariant subspace shared by different tasks. Under this stylized setup for multi-task or meta learning, we establish both the algorithmic and statistical guarantees of the proposed method. Extensive numerical experiments are conducted, comparing Meta-SP against several competing methods, including popular, off-the-shelf model-agnostic meta learning algorithms such as ANIL. These experiments demonstrate that Meta-SP achieves superior performance over the competing methods in various aspects.

Paper Structure

This paper contains 19 sections, 8 theorems, 42 equations, 15 figures, 7 tables, and 1 algorithm.

Key Result

Lemma 2.4

Let $\mathbf{S}_1, \cdots, \mathbf{S}_m \in \mathbb{R}^{d_1 \times d_2}$ be independent, centered random matrices of the same size, where $\mathbb{E} \mathbf{S}_j = \bm{0}$ and $\|\mathbf{S}_j\| \le L$ for every $j = 1, \ldots, m$. Let $\mathbf{Z} \coloneqq \sum_{j=1}^{m} \mathbf{S}_j$ and define the matrix variance statistic $v(\mathbf{Z}) \coloneqq \max\{\|\mathbb{E}(\mathbf{Z}\mathbf{Z}^{\top})\|, \|\mathbb{E}(\mathbf{Z}^{\top}\mathbf{Z})\|\}$. Then we have:
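The conclusion of Lemma 2.4 is cut off in this excerpt. For reference, the standard matrix Bernstein inequality in Tropp (2015) gives, under exactly these hypotheses and with $v(\mathbf{Z})$ the matrix variance statistic, both an expectation bound and a tail bound; the paper may state only one of them, or a variant:

$$
v(\mathbf{Z}) = \max\Big\{\Big\|\sum_{j=1}^{m}\mathbb{E}(\mathbf{S}_j\mathbf{S}_j^{\top})\Big\|,\ \Big\|\sum_{j=1}^{m}\mathbb{E}(\mathbf{S}_j^{\top}\mathbf{S}_j)\Big\|\Big\},
$$

$$
\mathbb{E}\|\mathbf{Z}\| \le \sqrt{2\,v(\mathbf{Z})\log(d_1+d_2)} + \tfrac{1}{3}L\log(d_1+d_2),
\qquad
\mathbb{P}\{\|\mathbf{Z}\| \ge t\} \le (d_1+d_2)\exp\!\left(\frac{-t^2/2}{v(\mathbf{Z}) + Lt/3}\right), \quad t \ge 0.
$$

The first equality holds because the $\mathbf{S}_j$ are independent and centered, so all cross terms $\mathbb{E}(\mathbf{S}_j\mathbf{S}_k^{\top})$ with $j \ne k$ vanish.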

Figures (15)

  • Figure 1: Evolution of $\mathbf{Dist}_1$ (left) and $\mathbf{Dist}_2$ (right) with the number of tasks $T$ for $s=5$, $m=25$ and $\sigma=1$.
  • Figure 2: Evolution of $\mathbf{Dist}_1$ (left) and $\mathbf{Dist}_2$ (right) with the number of tasks $T$ for $s=5$, $m=5$ and $\sigma=1$.
  • Figure 3: Evolution of $\mathbf{Dist}_1$ (left) and $\mathbf{Dist}_2$ (right) with the sample size $m$ for $s=5$, $T=800$ and $\sigma=1$.
  • Figure 4: Evolution of $\mathbf{Dist}_1$ (left) and $\mathbf{Dist}_2$ (right) with the noise level $\sigma$ for $s=5$, $T=400$ and $m=25$.
  • Figure 5: The empirical minimum amount of $(m, T)$ required for the sine angle distance between the estimated and the true task-invariant subspaces to be $\leq 0.1$. The horizontal axis represents the value of $m$, while the vertical axis represents the value of $T$.
  • ...and 10 more figures
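Figure 5 uses the sine angle distance between the estimated and true task-invariant subspaces. The definitions of $\mathbf{Dist}_1$ and $\mathbf{Dist}_2$ are not given in this excerpt, so the sketch below illustrates only one standard choice, assumed here: the sine of the largest principal angle, $\|(\mathbf{I}-\mathbf{U}\mathbf{U}^{\top})\widehat{\mathbf{U}}\|_2$, where $\mathbf{U}$ and $\widehat{\mathbf{U}}$ are orthonormal bases (cf. Definition A.1) of the row spaces of $s \times d$ matrices $\mathbf{B}$ and $\widehat{\mathbf{B}}$. The $s \times d$ layout and the function names are hypothetical.

```python
import numpy as np


def orthonormal_basis(b: np.ndarray) -> np.ndarray:
    """Orthonormal basis (as columns) of the row space of a full-rank s x d matrix b."""
    q, _ = np.linalg.qr(b.T)  # reduced QR: q has shape (d, s)
    return q


def sin_theta_distance(b_hat: np.ndarray, b_true: np.ndarray) -> float:
    """Sine of the largest principal angle between the row spaces of b_hat and b_true."""
    u_hat = orthonormal_basis(b_hat)
    u_true = orthonormal_basis(b_true)
    # Component of the estimated basis lying outside the true subspace: (I - U U^T) U_hat.
    residual = u_hat - u_true @ (u_true.T @ u_hat)
    # Spectral norm of that component equals the sine of the largest principal angle.
    return float(np.linalg.norm(residual, ord=2))
```

The distance is invariant to the choice of basis, so rescaling or rotating the rows of $\widehat{\mathbf{B}}$ leaves it unchanged; it is $0$ for identical subspaces and $1$ when some direction of one subspace is orthogonal to the other.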

Theorems & Definitions (16)

  • Definition 2.3: Restricted Isometry Property
  • Lemma 2.4: Matrix Bernstein inequality (Tropp, 2015)
  • Theorem 2.5
  • Theorem 3.1
  • Theorem 3.2
  • Proof: Proof of Theorem \ref{ripr}
  • Definition A.1: Orthonormal basis of a subspace
  • Definition A.2: SVD basis of a matrix
  • Proposition A.3
  • Proposition A.4
  • ...and 6 more
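Definition 2.3 appears here only by name. For orientation, a common form of the restricted isometry property in low-rank matrix recovery is shown below; it is not necessarily the paper's exact definition, whose rank level, normalization, and constants may differ. A linear map $\mathcal{A}: \mathbb{R}^{d_1 \times d_2} \to \mathbb{R}^{N}$ satisfies the rank-$r$ RIP with constant $\delta_r \in (0,1)$ if

$$
(1-\delta_r)\,\|\mathbf{X}\|_F^2 \;\le\; \|\mathcal{A}(\mathbf{X})\|_2^2 \;\le\; (1+\delta_r)\,\|\mathbf{X}\|_F^2
\quad \text{for all } \mathbf{X} \text{ with } \operatorname{rank}(\mathbf{X}) \le r .
$$

Informally, $\mathcal{A}$ nearly preserves the Frobenius norm of every low-rank matrix; this is the type of condition under which gradient-plus-hard-thresholding iterations are typically shown to converge to a rank-$r$ target.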