Model Stitching by Functional Latent Alignment

Ioannis Athanasiadis, Anmar Karmush, Michael Felsberg

TL;DR

Assessing functional similarity across independently trained networks is challenging due to potential reliance on task cues. The paper introduces Functional Latent Alignment (FuLA), a model stitching objective that optimizes an affine transform to align latent representations not only at the stitching point but also across downstream layers via a hierarchy of functional hints, formalized as a weighted, normalized loss over depths. FuLA is shown to be more reliable than task-based stitching (SLM/TLM) in adversarial training, shortcut scenarios, and cross-layer settings, uncovering non-trivial alignments missed by stitch-level methods. The method offers a principled, task-agnostic measure of functional similarity with potential implications for robustness, transferability, and model interoperability in real-world AI systems.
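The objective described above (an affine transform fitted so that representations agree at the stitching point and at later depths, under a weighted, normalized loss) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the squared-error mismatch, the normalization by the target's squared norm, the uniform weights, and the names `affine`, `hint_loss`, `downstream`, and `weights` are all assumptions made for illustration.

```python
import numpy as np

def affine(z, W, b):
    """Apply a stitching transform row-wise: z @ W + b."""
    return z @ W + b

def hint_loss(z_a, z_b, W, b, downstream, weights):
    """Weighted, normalized mismatch at the stitch and at each later depth.

    Hypothetical sketch, not the paper's exact loss.
    z_a: (n, d_a) activations of the front model at the stitching layer
    z_b: (n, d_b) activations of the end model at the same layer
    downstream: list of callables, the end model's layers after the stitch
    weights: one weight for the stitch plus one per downstream layer
    """
    assert len(weights) == len(downstream) + 1
    stitched, target = affine(z_a, W, b), z_b
    total = 0.0
    for k, w in enumerate(weights):
        if k > 0:
            # Push both streams through the same downstream layer.
            layer = downstream[k - 1]
            stitched, target = layer(stitched), layer(target)
        err = np.sum((stitched - target) ** 2)
        norm = np.sum(target ** 2) + 1e-12  # assumed normalization
        total += w * err / norm
    return total

# Toy check with random "layers" (assumed, for illustration only).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))
M1 = rng.normal(size=(4, 4))
M2 = rng.normal(size=(4, 3))
layers = [lambda h: np.maximum(h @ M1, 0.0), lambda h: h @ M2]
weights = [1 / 3, 1 / 3, 1 / 3]

loss_identity = hint_loss(z, z, np.eye(4), np.zeros(4), layers, weights)
loss_shifted = hint_loss(z, z, np.eye(4), np.ones(4), layers, weights)
```

In this toy setup an identity transform on identical activations gives zero loss, while a shifted transform is penalized at the stitch and at every later depth where the shift changes the activations. In practice `W` and `b` would be fitted by gradient descent on such a loss with both networks frozen.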

Abstract

Evaluating functional similarity involves quantifying the degree to which independently trained neural networks learn functionally similar representations. Reliably inferring the functional similarity of these networks remains an open problem with far-reaching implications for AI. Model stitching has emerged as a promising paradigm, where an optimal affine transformation aligns two models to solve a task, with the stitched model serving as a proxy for functional similarity. In this work, we draw inspiration from the knowledge distillation literature and propose Functional Latent Alignment (FuLA) as a novel optimality condition for model stitching. We revisit previously explored functional similarity testbeds and introduce a new one, based on which FuLA emerges as an overall more reliable method of functional similarity. Specifically, our experiments in (a) adversarial training, (b) shortcut training and, (c) cross-layer stitching, reveal that FuLA is less prone to artifacts tied to training on task cues while achieving non-trivial alignments that are missed by stitch-level matching.

Paper Structure

This paper contains 19 sections, 5 equations, 30 figures, 1 table.

Figures (30)

  • Figure 1: Conceptual visualization of different degrees of functional alignment. $f_3\circ f_2 \circ f_1$ is a composition of functions mapping the input domain $\bm{Z}$ into the output domain $\bm{Y}$ through two intermediate domains $A_1$ and $A_2$. Let $T_{1}$, $T_2$ and $T_{3}$ be three different transformations mapping domain $\bm{X}$ to $\bm{Z}$ that differ only in how they map the subdomain $\bm{X}^{(2)}$, sending it to ${\bm{Z}^{(2)}}'$, ${\bm{Z}^{(2)}}''$ and ${\bm{Z}^{(2)}}'''$ respectively, each equidistant from $\bm{Z}_2$. Under the DM, all three transformations are treated as equivalent. In contrast, the task-oriented objectives regard $T_1$ as the least optimal while considering $T_2$ and $T_3$ equally performant, since both preserve the input-output relationship of the composition.
  • Figure 2: The proposed model stitching by FuLA in relation to other model stitching settings. In the example of the figure, stitching is performed at the second layers between identical architectures.
  • Figure 3: DM in practice.
  • Figure 4: Cross-task stitching under AT -- ResNet18.
  • Figure 5: Same-task stitching under AT $(\alpha=1)$ -- ResNet18.
  • ...and 25 more figures