Function Encoders: A Principled Approach to Transfer Learning in Hilbert Spaces

Tyler Ingebrand, Adam J. Thorpe, Ufuk Topcu

TL;DR

This work introduces a geometric framework for inductive transfer in a Hilbert space $\mathcal{H}$, defining three transfer types: interpolation within the convex hull $C_h$, extrapolation to the linear span $\operatorname{span}\{f_{S_i}\}$, and extrapolation to the full space $\mathcal{H}$. It proposes function encoders that learn a fixed neural-basis representation, enabling efficient online adaptation by expressing target tasks as linear combinations of basis functions with coefficients computed via least-squares, a method supported by a universal function space approximation theorem. The authors prove that any function in a separable Hilbert space can be approximated arbitrarily well by finite neural-basis combinations, and they generalize inner products to various output spaces. Empirically, FE(LS) matches or outperforms baselines across polynomial, CIFAR, 7-Scenes, and MuJoCo tasks, with especially strong performance on Type 2 and Type 3 transfer, highlighting the method's ability to transfer knowledge to unseen tasks without retraining.
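
The least-squares step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the monomials $1, x, x^2$ stand in for the learned neural basis functions, the inner products are replaced by sample means over the data, and the names `basis`, `solve`, and `least_squares_coefficients` are hypothetical.

```python
import random

def basis(x):
    # Assumption: monomials 1, x, x^2 stand in for learned neural basis functions.
    return [1.0, x, x * x]

def solve(A, b):
    """Solve the linear system A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def least_squares_coefficients(xs, ys):
    """Coefficients c minimizing sum_i (y_i - sum_j c_j g_j(x_i))^2."""
    G = [basis(x) for x in xs]
    k, n = len(G[0]), len(xs)
    # Empirical Gram matrix <g_i, g_j> and right-hand side <g_i, f> as sample means.
    gram = [[sum(row[i] * row[j] for row in G) / n for j in range(k)]
            for i in range(k)]
    rhs = [sum(row[i] * y for row, y in zip(G, ys)) / n for i in range(k)]
    return solve(gram, rhs)

# A new "task": noise-free samples of a quadratic the basis can represent.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [2.0 - 3.0 * x + 0.5 * x * x for x in xs]
coeffs = least_squares_coefficients(xs, ys)
```

Because the target lies in the span of the stand-in basis, `coeffs` recovers approximately `[2, -3, 0.5]`. The basis stays fixed; only this small linear solve is repeated for each new task.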

Abstract

A central challenge in transfer learning is designing algorithms that can quickly adapt and generalize to new tasks without retraining. Yet the conditions for when and how algorithms can effectively transfer to new tasks are poorly characterized. We introduce a geometric characterization of transfer in Hilbert spaces and define three types of inductive transfer: interpolation within the convex hull, extrapolation to the linear span, and extrapolation outside the span. We propose a method grounded in the theory of function encoders to achieve all three types of transfer. Specifically, we introduce a novel training scheme for function encoders using least-squares optimization, prove a universal approximation theorem for function encoders, and provide a comprehensive comparison with existing approaches such as transformers and meta-learning on four diverse benchmarks. Our experiments demonstrate that the function encoder outperforms state-of-the-art methods on four benchmark tasks and on all three types of transfer.
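
The three transfer types can be written out using the standard definitions of convex hull and linear span. The following is a sketch only: the symbols $f_T$ (target function), $n$ (number of source functions), and $\alpha_i$ (combination weights) are notation assumed here for illustration, and the paper's formal statements in Definitions 5 to 7 may differ in detail.

```latex
\begin{align*}
\text{Type 1:}\quad & f_T \in \operatorname{conv}\{f_{S_1},\dots,f_{S_n}\}
  = \Big\{ \sum_{i=1}^{n} \alpha_i f_{S_i} \;:\; \alpha_i \ge 0,\ \sum_{i=1}^{n} \alpha_i = 1 \Big\} \\
\text{Type 2:}\quad & f_T \in \operatorname{span}\{f_{S_1},\dots,f_{S_n}\}
  = \Big\{ \sum_{i=1}^{n} \alpha_i f_{S_i} \;:\; \alpha_i \in \mathbb{R} \Big\} \\
\text{Type 3:}\quad & f_T \in \mathcal{H}, \text{ possibly outside the span}
\end{align*}
```

The sets are nested: the convex hull is contained in the span, which is contained in $\mathcal{H}$, so each type relaxes a constraint of the previous one.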

Paper Structure

This paper contains 38 sections, 3 theorems, 51 equations, 15 figures, 1 algorithm.

Key Result

Theorem 1

Let $K \subset \mathbb{R}^n$ be compact. Define the inner product $\langle f,g \rangle_\mathcal{H} := \int_K f(x)^\top g(x) dx$ and the induced norm $\lVert f \rVert_\mathcal{H}:=\sqrt{\langle f, f \rangle_\mathcal{H}}$. Let $\mathcal{H}=\{f:K \to \mathbb{R}^m | f \; \text{continuous}, \lVert f \rVe
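
The inner product defined in Theorem 1 can be estimated numerically. The sketch below is an illustration under assumptions not taken from the paper: it fixes $K = [0,1]^2$ and $m = 2$, uses a midpoint-rule grid in place of the exact integral, and the function names are hypothetical.

```python
from itertools import product

def inner_product(f, g, n=50):
    """Midpoint-rule estimate of <f, g> = integral over K of f(x)^T g(x) dx.

    Assumption: K is the unit square [0, 1]^2, split into n * n equal cells.
    """
    pts = [(i + 0.5) / n for i in range(n)]
    cell_area = 1.0 / (n * n)
    total = 0.0
    for x in product(pts, repeat=2):
        # f(x)^T g(x): dot product of the two vector-valued outputs.
        total += sum(fi * gi for fi, gi in zip(f(x), g(x)))
    return total * cell_area

def norm(f, n=50):
    """Induced norm ||f|| = sqrt(<f, f>)."""
    return inner_product(f, f, n) ** 0.5

def f(x):
    return (x[0], x[1])

def g(x):
    return (1.0, x[0])

ip = inner_product(f, g)   # exact value of the integral is 3/4
f_norm = norm(f)           # exact value is sqrt(2/3)
```

For these two functions the integrand is $x_1 + x_1 x_2$, whose integral over the unit square is $3/4$; the grid estimate agrees closely.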

Figures (15)

  • Figure 1: The Categorization of Transfer Learning. Black points are functions present in the training set. Purple points indicate type 1 transfer, interpolation within the convex hull. Orange points represent type 2 transfer, extrapolation to the linear span. The red point is type 3 transfer, extrapolation to the Hilbert space.
  • Figure 2: Empirical Results on the Polynomial Dataset. While many approaches demonstrate moderate type 1 transfer, only the function encoder successfully achieves all three types, as illustrated by its orders of magnitude advantage over other approaches.
  • Figure 3: Qualitative Analysis of Transfer on the Polynomial Dataset. In this illustrative example, we visualize the function encoder and one baseline, the auto encoder, on each of the three types of transfer. We observe that both approaches achieve reasonable performance for type 1 transfer. For type 2 transfer, the target function is much larger in magnitude than any function in the training set. The auto encoder fails at this function because it has only learned to output functions from the training function space. In contrast, the function encoder generalizes to the entire span of the training function space by design. For type 3 transfer, the target function is a cubic function. The auto encoder nonetheless outputs a function that is similar to the ones seen during training. When using a function encoder with only three basis functions, the basis functions only span the three-dimensional space of quadratic functions, and so its approximation is the best quadratic to fit the data. When using 100 basis functions, the basis functions span the space of quadratics, but additionally have 97 unconstrained dimensions. Due to the use of least squares, the function encoder with 100 basis functions optimally uses these extra 97 dimensions to fit the new function. Therefore, it is able to reasonably approximate this function as well, despite having never seen a cubic function during training.
  • Figure 4: Empirical Results on the CIFAR Dataset. The training curves show that the two ad-hoc baselines appear to perform best, and many algorithms fail to converge on all or some seeds. However, when measuring type 1 transfer, the function encoder performs best, achieving slightly better performance than Siamese networks. For type 3 transfer, few-shot classification of unseen classes, the function encoder again performs best, albeit with performance similar to that of Siamese networks. The key idea is that function encoders perform comparably to ad-hoc approaches despite being designed for a more general setting.
  • Figure 5: Empirical Results on the 7-Scenes Dataset. Many approaches converge during training. As expected, all approaches perform much worse at type 1 transfer, indicating a degree of over-fitting. The function encoder performs best at both type 1 and type 3 transfer, indicating its ability to optimally use the learned features for unseen data.
  • ...and 10 more figures
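
The type 3 behavior described in the Figure 3 caption can be reproduced in miniature. The sketch below makes several assumptions that are not from the paper: Legendre polynomials on $[-1, 1]$ stand in for learned basis functions, the integral inner product is approximated with a midpoint rule, and because the basis is orthogonal the least-squares coefficients are computed one basis function at a time as $\langle f, g_j \rangle / \langle g_j, g_j \rangle$. The names `legendre`, `project`, and `cubic` are hypothetical.

```python
def legendre(x):
    # First four Legendre polynomials, mutually orthogonal on [-1, 1].
    return [1.0, x, 0.5 * (3 * x * x - 1), 0.5 * (5 * x ** 3 - 3 * x)]

def project(f, k, n=2000):
    """Project f onto the first k basis functions.

    Returns (coefficients, squared residual norm), both estimated with a
    midpoint rule on n cells of [-1, 1].
    """
    h = 2.0 / n
    xs = [-1.0 + (i + 0.5) * h for i in range(n)]
    coeffs = []
    for j in range(k):
        num = sum(f(x) * legendre(x)[j] for x in xs) * h   # <f, g_j>
        den = sum(legendre(x)[j] ** 2 for x in xs) * h     # <g_j, g_j>
        coeffs.append(num / den)
    resid = sum(
        (f(x) - sum(c * p for c, p in zip(coeffs, legendre(x)))) ** 2
        for x in xs
    ) * h
    return coeffs, resid

def cubic(x):
    return x ** 3

# Three basis functions span only the quadratics: best quadratic fit.
c3, r3 = project(cubic, 3)
# One extra basis direction is enough to capture the cubic here.
c4, r4 = project(cubic, 4)
```

With three basis functions the fit is roughly $0.6x$, the best quadratic approximation of $x^3$ on $[-1, 1]$, and a residual remains. With the fourth basis function available the residual is near zero. In the paper's setting the extra basis functions are learned rather than chosen by hand, so how well they cover an unseen function is an empirical question.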

Theorems & Definitions (11)

  • Definition 1: Domain
  • Definition 2: Task
  • Definition 3: Dataset
  • Definition 4: Transfer Learning
  • Definition 5: Type 1, Interpolation in the Convex Hull
  • Definition 6: Type 2, Extrapolation to the Linear Span
  • Definition 7: Type 3, Extrapolation to $\mathcal{H}$
  • Theorem 1
  • Theorem 1: Restated
  • Theorem 2: ufat
  • ...and 1 more