Meta-Learning and Representation Learner: A Short Theoretical Note
Mouad El Bouchattaoui
TL;DR
This note presents a formal perspective on meta-learning: learning across tasks is treated as optimizing a meta-knowledge $\omega$ that governs inner-task learning. Training is bi-level, with an inner loss $\mathcal{L}^{task}$ and an outer loss $\mathcal{L}^{meta}$ evaluated over source tasks drawn from a task environment $E$; the resulting $\omega^*$ is then transferred to target tasks. The note develops a representation-learning angle by decomposing a hypothesis $h$ as $h=g\circ f$ with $f\in F$ and $g\in G$, and defines a representation learner $\mathcal{A}$ that maps meta-samples to $F$ by minimizing the empirical objective $E^{*}_{G}(F,\mathbf{z})=\frac{1}{n}\sum_{i=1}^{n} \inf_{g\in G} \langle l_{g\circ f} \rangle_{z_i}$. Finally, it derives generalization guarantees from the $\epsilon$-covering numbers $C(\epsilon, l_G)$ and $C^{*}_{l_G}(\epsilon,F)$, and states two theorems that bound the required $n$ and $m$ under permissibility assumptions, thereby relating the amount of multi-task data to the risk incurred on transfer.
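To make the empirical objective $E^{*}_{G}(F,\mathbf{z})$ concrete, here is a minimal Python sketch. It rests on assumptions that are not part of the note: $G$ is replaced by a small finite list of candidate heads so that the infimum becomes a plain minimum, the loss is the squared error, and each $z_i$ is a pair of arrays `(x, y)`. The names `empirical_representation_loss`, `heads`, and `tasks` are hypothetical and chosen only for illustration.

```python
import numpy as np

def squared_loss(pred, target):
    # Pointwise squared error (an assumed choice of loss l).
    return (pred - target) ** 2

def empirical_representation_loss(f, heads, tasks, loss=squared_loss):
    """Sketch of E*_G(f, z) = (1/n) * sum_i min_{g in heads} <l_{g o f}>_{z_i}.

    f     : candidate representation, maps inputs to features
    heads : finite list standing in for G (assumption: inf becomes min)
    tasks : list of n samples z_i = (x_i, y_i), one per source task
    """
    per_task_best = []
    for x, y in tasks:
        features = f(x)
        # <l_{g o f}>_{z_i}: average loss of g o f over the examples of task i
        avg_losses = [np.mean(loss(g(features), y)) for g in heads]
        # Best head for this task, given the shared representation f
        per_task_best.append(min(avg_losses))
    # Average over the n tasks
    return float(np.mean(per_task_best))

# Toy environment: every task is y = a * x**2 for some task-specific a.
x = np.array([-1.0, 0.0, 1.0, 2.0])
tasks = [(x, a * x**2) for a in (1.0, -1.0, 2.0)]

# Linear heads g_a(u) = a * u; the default argument binds a per lambda.
heads = [lambda u, a=a: a * u for a in (-1.0, 1.0, 2.0)]

loss_square = empirical_representation_loss(lambda v: v**2, heads, tasks)
loss_identity = empirical_representation_loss(lambda v: v, heads, tasks)
print(loss_square, loss_identity)
```

In this toy setting the representation $f(x)=x^2$ lets some head fit every task exactly, so its objective is zero, while the identity representation leaves a positive residual on each task. A representation learner $\mathcal{A}$ in the sense above would prefer the former.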
Abstract
Meta-learning, or "learning to learn," is a subfield of machine learning whose goal is to develop models and algorithms that learn from a variety of tasks and improve their own learning process over time. Unlike traditional machine learning methods, which focus on a single specific task, meta-learning leverages experience from previous tasks to improve learning on future ones. This is particularly beneficial when the data available for a new task is limited but abundant data exists for related tasks. By extracting and exploiting the structure shared across these tasks, meta-learning algorithms can converge faster and perform better with less data. These notes are mainly inspired by \cite{vanschoren2018meta}, \cite{baxter2019learning}, and \cite{maurer2005algorithmic}.
