On Transferring Transferability: Towards a Theory for Size Generalization

Eitan Levin, Yuxin Ma, Mateo Díaz, Soledad Villar

TL;DR

The paper develops a unifying theory for transferring learning across object sizes by embedding finite-sized problems into a common limit space $V_\infty$ with a limit group $\mathsf{G}_\infty$. It proves that transferability is equivalent to continuity of the limit extension $f_\infty$ under a symmetrized metric, enabling size generalization bounds that connect model Lipschitzness, data geometry, and sampling. The framework is instantiated across sets, graphs, and point clouds, leading to concrete transferable architectures (e.g., GGNN, Continuous GGNN, SVD-DS) and refined variants of DeepSet and IGNs, along with a principled path to design new transferable models. Empirical results on size-generalization tasks show that aligning model inductive biases with the limit space yields robust performance across increasing input sizes, with trade-offs in computational efficiency depending on the chosen representation. Overall, the work provides both a theoretical foundation and practical tools for achieving reliable size generalization in diverse domains such as graphs, sets, and 3D point clouds, anchored by $V_\infty$-level continuity and robust generalization guarantees.

Abstract

Many modern learning tasks require models that can take inputs of varying sizes. Consequently, dimension-independent architectures have been proposed for domains where the inputs are graphs, sets, and point clouds. Recent work on graph neural networks has explored whether a model trained on low-dimensional data can transfer its performance to higher-dimensional inputs. We extend this body of work by introducing a general framework for transferability across dimensions. We show that transferability corresponds precisely to continuity in a limit space formed by identifying small problem instances with equivalent large ones. This identification is driven by the data and the learning task. We instantiate our framework on existing architectures, and implement the necessary changes to ensure their transferability. Finally, we provide design principles for designing new transferable models. Numerical experiments support our findings.

Paper Structure

This paper contains 104 sections, 36 theorems, 182 equations, 10 figures, 2 tables.

Key Result

Proposition 3.2

Let $\mathbb{V},\mathbb{U}$ be consistent sequences and let $(f_n\colon {V}_n\to{U}_n)$ be maps between them.

Figures (10)

  • Figure 1: Two examples of consistent sequences on sets. (top) Zero-padding consistent sequence for sets. (bottom) Duplication consistent sequence for sets.
  • Figure 2: Transferability of invariant networks on sets under $(\mathbb{V}_{\mathrm{dup}}, \|\cdot\|_{\overline{1}})$. The plots show outputs of untrained, randomly initialized models on input sets of increasing size $n$. Each set consists of $n$ i.i.d. samples from $\mathcal{N}(0,1)$, a distribution with non-compact support. Error bars indicate one standard deviation above and below the mean over $100$ random samples. (a), (b), (c): Model output $f_n(X_n)$ vs. set size $n$. For normalized DeepSet, the dashed line represents the limiting value $f_\infty(\mu) = \sigma\left( \int \rho(x)\, d\mu(x) \right)$ for $\mu = \mathcal{N}(0,1)$, computed via numerical integration. While the outputs of DeepSet and PointNet diverge as $n$ increases, the transferable model, normalized DeepSet, converges to the theoretical limit, i.e., $f_n(X_n) \to f_\infty(\mu)$. (d): Convergence error $|f_n(X_n) - f_\infty(\mu)|$ vs. set size $n$ for normalized DeepSet (both axes in log scale), demonstrating the $O(n^{-1/2})$ convergence rate predicted by Proposition \ref{prop:convergence_transferability}. See Appendix \ref{appen:NN_sets} for further discussion.
  • Figure 3: Duplication consistent sequence for graphs
  • Figure 4: Transferability of equivariant GNNs with respect to $(\mathbb{V}^G_{\mathrm{dup}}, \|\cdot\|_{\mathrm{op},2})$. The plots show outputs of untrained, randomly initialized models for two sequences of input graph signals $(A_n, X_n)$: (dashed lines) Fully-connected weighted graphs $A_n = \frac{\mathbbm{1}_n\mathbbm{1}_n^{\top}}{2}$, $X_n = \mathbbm{1}_n$. (solid lines) $A_n$ is drawn i.i.d. from the Erdős–Rényi model $G(n,1/2)$, with $X_n = \mathbbm{1}_n$. These two sequences represent different samplings of the same underlying constant graphon signal, where $W \equiv 1/2$ and $f \equiv 1$. Error bars indicate one standard deviation above and below the mean over $100$ random samples. (dashed lines): For the fully connected sequence, each finite graph signal exactly induces the underlying graphon signal. The outputs of all compatible models ((a), (c), (d)) remain constant over $n$, whereas the output of the incompatible model (b) does not. (solid lines): The outputs of all transferable models ((a), (d)) converge to the same limit as the fully connected sequence (dashed lines), while the output of the discontinuous model (c) does not.
  • Figure 5: Size generalization experiments: Mean test MSE (over 10 random runs) against test input dimensionality $n$. Error bars indicate the min/max range in (a)(b)(d), and $20^{th}/80^{th}$ percentiles in (c) for legibility.
  • ...and 5 more figures
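
The set experiment described in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration, not the authors' code: the one-layer `tanh` maps `rho` and `sigma`, the feature width of 16, the helper names, and the use of Gauss-Hermite quadrature for the limit are all assumptions made here. It checks two things: mean pooling gives the same output on a set and on its duplicated copy (while sum pooling scales with the number of copies), and the normalized model approaches $\sigma\left(\int \rho\, d\mu\right)$ for $\mu = \mathcal{N}(0,1)$ as $n$ grows.

```python
import numpy as np

# Hypothetical, untrained weights; the architecture is an assumption for illustration.
_init = np.random.default_rng(0)
W1 = _init.normal(size=(1, 16))      # inner layer weights
b1 = _init.normal(size=16)           # inner layer bias
w2 = _init.normal(size=16) / 4.0     # outer weights, scaled to avoid tanh saturation


def rho(x):
    """Per-element feature map R -> R^16, applied to a 1-D array of set elements."""
    return np.tanh(x[:, None] * W1 + b1)


def sigma(h):
    """Outer map R^16 -> R applied to the pooled features."""
    return np.tanh(h @ w2)


def deepset_sum(x):
    """Sum-pooled DeepSet: pooled features grow with the set size."""
    return sigma(rho(x).sum(axis=0))


def deepset_mean(x):
    """Normalized (mean-pooled) DeepSet."""
    return sigma(rho(x).mean(axis=0))


def duplicate(x, k):
    """Duplication embedding: repeat every element of the set k times."""
    return np.repeat(x, k)


def limit_value(deg=100):
    """sigma(E[rho(X)]) for X ~ N(0,1), via Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(deg)
    # hermegauss uses the weight exp(-x^2 / 2), so divide by sqrt(2*pi).
    expectation = (weights[:, None] * rho(nodes)).sum(axis=0) / np.sqrt(2 * np.pi)
    return sigma(expectation)


def mean_abs_error(n, trials, seed=1):
    """Average |f_n(X_n) - f_inf(mu)| over independent N(0,1) sets of size n."""
    rng = np.random.default_rng(seed)
    target = limit_value()
    errors = [abs(deepset_mean(rng.normal(size=n)) - target) for _ in range(trials)]
    return float(np.mean(errors))


# Compatibility with duplication: mean pooling is unchanged, sum pooling is not.
x = np.array([0.3, -1.2, 2.0])
print("mean-pooled:", deepset_mean(x), deepset_mean(duplicate(x, 3)))
print("sum-pooled: ", deepset_sum(x), deepset_sum(duplicate(x, 3)))

# Convergence of the normalized model toward the limit as n grows.
for n in (10, 100, 1000, 10000):
    print(n, mean_abs_error(n, trials=20))
```

On a log-log plot, the printed errors should fall roughly along a line of slope $-1/2$, matching the rate quoted in the caption. The exact values depend on the random weights chosen here.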

Theorems & Definitions (87)

  • Definition 2.1
  • Example 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition 2.5
  • Example 2.6: Example \ref{ex:set_cs} continued
  • Definition 2.7
  • Definition 3.1
  • Proposition 3.2
  • Proposition 4.2
  • ...and 77 more