Deep Model Merging: The Sister of Neural Network Interpretability -- A Survey

Arham Khan, Todd Nief, Nathaniel Hudson, Mansi Sakarvadia, Daniel Grzenda, Aswathy Ajith, Jordan Pettyjohn, Kyle Chard, Ian Foster

TL;DR

This survey addresses how the geometry of loss landscapes governs the merging of neural network models and the interpretability of their internal representations. It introduces a taxonomy of model merging techniques (ensembling, weight aggregation, and neuron alignment) and links their success to phenomena in loss landscape geometry such as mode convexity, mode determinism, mode directedness, and mode connectivity. By synthesizing empirical findings, the paper connects insights from model merging to model interpretability and robustness, and sketches promising directions for future work at this intersection. The study also highlights practical implications for large-scale training regimes and security considerations for open pretrained checkpoints.

Abstract

We survey the model merging literature through the lens of loss landscape geometry, connecting observations from empirical studies of model merging and loss landscape analysis to the phenomena that govern neural network training and the emergence of their internal representations. We distill recurring empirical observations from these fields into descriptions of four major characteristics of loss landscape geometry: mode convexity, determinism, directedness, and connectivity. We argue that insights into the structure of learned representations gained from model merging have applications to model interpretability and robustness, and we subsequently propose promising new research directions at the intersection of these fields.

Paper Structure

This paper contains 36 sections, 5 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: A taxonomy of model merging methods. Ensemble methods do not manipulate model parameters, but rather combine prediction logits or model components in novel ways. Neuron Alignment techniques attempt to resolve potential misalignment between model representations by permuting their units before computing a linear combination of the source model parameters. Weight Aggregation methods compute a linear combination of model parameters.
  • Figure 3: Permutation Symmetry: An illustration of symmetry in dense linear layers. Swapping two hidden neurons, together with their incoming and outgoing weights, does not change the layer's output.
  • Figure 4: Mode Convexity: Within an objective basin, one can linearly interpolate between known solutions to discover equally performant models with diverse behaviors.
  • Figure 5: RegMean finds a new set of merged parameters that closely approximate the activation maps of the given source models, in a manner analogous to least-squares regression.
  • Figure 6: Mode Connectivity: Many objective basins are equivalent up to a permutation in neurons. Models in distinct objective basins can be "transported" close to one another by applying appropriate permutations to their units.
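The distinction Figure 1 draws between ensembling and weight aggregation can be made concrete with a small sketch. It assumes two hypothetical, randomly initialized two-layer ReLU networks (not trained models from the survey) and uniform mixing coefficients; the helper names relu_mlp, ensemble_predict, and weight_average are illustrative only. Ensembling averages the models' outputs, while weight aggregation averages their parameters and runs a single network.

```python
import numpy as np

def relu_mlp(x, params):
    """Two-layer ReLU network: W2 @ relu(W1 @ x + b1) + b2."""
    W1, b1, W2, b2 = params
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def ensemble_predict(x, models):
    """Ensembling: average the models' outputs; parameters are left untouched."""
    return np.mean([relu_mlp(x, params) for params in models], axis=0)

def weight_average(models, coeffs=None):
    """Weight aggregation: a linear combination of corresponding parameter tensors."""
    if coeffs is None:
        coeffs = [1.0 / len(models)] * len(models)
    return tuple(
        sum(c * params[i] for c, params in zip(coeffs, models))
        for i in range(len(models[0]))
    )

rng = np.random.default_rng(4)

def random_params(d_in=3, hidden=8, d_out=2):
    """Hypothetical randomly initialized source model (not a trained network)."""
    return (rng.normal(size=(hidden, d_in)), rng.normal(size=hidden),
            rng.normal(size=(d_out, hidden)), rng.normal(size=d_out))

models = [random_params() for _ in range(2)]
x = rng.normal(size=3)
y_ensemble = ensemble_predict(x, models)
y_merged = relu_mlp(x, weight_average(models))
# The two generally differ: ReLU is nonlinear, so averaging outputs is not
# the same as running one network with averaged parameters.
print(y_ensemble, y_merged)
```

For identical source models the two coincide; for distinct nonlinear models they generally differ.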
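The permutation symmetry in Figure 3 can be checked numerically. A minimal sketch, assuming a hypothetical two-layer ReLU network with random weights: permuting the hidden units (rows of W1 and entries of b1) and applying the same permutation to the columns of W2 leaves the network's output unchanged.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """Two-layer ReLU MLP: x -> W2 @ relu(W1 @ x + b1) + b2."""
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 6, 3
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
b2 = rng.normal(size=d_out)

# Random reordering of the hidden units.
perm = rng.permutation(d_hidden)
# Permute the first layer's rows (and bias) and the second layer's columns.
W1_p, b1_p = W1[perm], b1[perm]
W2_p = W2[:, perm]

x = rng.normal(size=d_in)
y_orig = mlp(x, W1, b1, W2, b2)
y_perm = mlp(x, W1_p, b1_p, W2_p, b2)
print(np.allclose(y_orig, y_perm))  # True
```

The hidden activations themselves are reordered (h becomes h[perm]), but the next layer's permuted columns undo that reordering, so the two parameter vectors represent the same function.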
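Mode convexity (Figure 4) is commonly probed by evaluating the loss along the straight line between two solutions, theta(alpha) = (1 - alpha) * theta_A + alpha * theta_B, and measuring how far it rises above the chord joining the endpoint losses. The sketch below is an illustration under strong assumptions: it uses least-squares linear regression, whose loss is convex in the weights, so the barrier is zero by construction. Real network losses are not globally convex, and the two "source models" here are hypothetical fits on disjoint halves of synthetic data.

```python
import numpy as np

def mse_loss(w, X, y):
    """Mean squared error of a linear model y ~ X @ w."""
    r = X @ w - y
    return float(np.mean(r ** 2))

def interpolate(w_a, w_b, alpha):
    """Two-model weight aggregation: (1 - alpha) * w_a + alpha * w_b."""
    return (1.0 - alpha) * w_a + alpha * w_b

def loss_barrier(w_a, w_b, X, y, num_points=21):
    """Largest gap between the loss along the linear path and the chord
    joining the two endpoint losses (<= 0 means no barrier)."""
    la, lb = mse_loss(w_a, X, y), mse_loss(w_b, X, y)
    gaps = []
    for alpha in np.linspace(0.0, 1.0, num_points):
        path_loss = mse_loss(interpolate(w_a, w_b, alpha), X, y)
        chord = (1.0 - alpha) * la + alpha * lb
        gaps.append(path_loss - chord)
    return max(gaps)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)

# Two hypothetical "source models": least-squares fits on disjoint halves.
w_a, *_ = np.linalg.lstsq(X[:100], y[:100], rcond=None)
w_b, *_ = np.linalg.lstsq(X[100:], y[100:], rcond=None)

print(loss_barrier(w_a, w_b, X, y) <= 1e-12)  # True: this loss is convex in w
```

In a non-convex landscape the same measurement can return a large positive barrier, which is what separates solutions lying in distinct basins.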
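The least-squares view of RegMean in Figure 5 can be sketched for a single linear layer y = x W. Assuming source layers W_i and input activations X_i recorded for each source model, minimizing sum_i ||X_i W - X_i W_i||^2 gives the closed form W* = (sum_i X_i^T X_i)^(-1) sum_i X_i^T X_i W_i. This simplified sketch uses synthetic data and omits any regularization of the Gram matrices that a practical implementation may apply; regmean_linear and activation_error are hypothetical helper names, not a library API.

```python
import numpy as np

def regmean_linear(weights, inputs):
    """Closed-form least-squares merge of linear layers y = x @ W.

    Solves  min_W  sum_i ||X_i @ W - X_i @ W_i||_F^2,
    whose solution is W* = (sum_i G_i)^{-1} sum_i G_i @ W_i with G_i = X_i^T X_i.
    """
    grams = [X.T @ X for X in inputs]
    lhs = sum(grams)
    rhs = sum(G @ W for G, W in zip(grams, weights))
    return np.linalg.solve(lhs, rhs)

def activation_error(W, weights, inputs):
    """Summed squared distance between merged and source activations."""
    return sum(float(np.sum((X @ W - X @ Wi) ** 2)) for X, Wi in zip(inputs, weights))

rng = np.random.default_rng(2)
d_in, d_out = 8, 3
weights = [rng.normal(size=(d_in, d_out)) for _ in range(2)]
# Each hypothetical source model saw a differently scaled input distribution.
inputs = [rng.normal(size=(50, d_in)) * rng.uniform(0.5, 2.0, size=d_in) for _ in range(2)]

W_merged = regmean_linear(weights, inputs)
W_avg = sum(weights) / len(weights)  # plain weight averaging, for comparison
print(activation_error(W_merged, weights, inputs) <= activation_error(W_avg, weights, inputs))  # True
```

Because W_merged is the exact minimizer of the summed activation error, it can never do worse on that objective than simple averaging; the Gram matrices weight each source model by the input directions it actually sees.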
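Figure 6's idea of "transporting" one model toward another with permutations can be illustrated by a toy version of weight matching for a single hidden layer: describe each hidden unit by its incoming weights, bias, and outgoing weights, then pick the permutation of model B's units that maximizes total similarity to model A's units before averaging. This is a sketch under stated assumptions: model B is a hypothetical permuted, slightly perturbed copy of A, and the permutation search is brute force, standing in for the linear assignment solver one would use at realistic widths.

```python
import itertools
import numpy as np

def mlp(x, W1, b1, W2):
    """One-hidden-layer ReLU network (output bias omitted for brevity)."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

def unit_features(W1, b1, W2):
    """Describe each hidden unit by its incoming weights, bias, and outgoing weights."""
    return np.concatenate([W1, b1[:, None], W2.T], axis=1)

def match_units(feat_a, feat_b):
    """Permutation pi maximizing sum_k <feat_a[k], feat_b[pi[k]]>.

    Brute force over all permutations, so only usable for tiny widths.
    """
    sim = feat_a @ feat_b.T
    n = sim.shape[0]
    best = max(itertools.permutations(range(n)),
               key=lambda pi: sum(sim[k, pi[k]] for k in range(n)))
    return np.array(best)

def permute_hidden(W1, b1, W2, pi):
    """Reorder hidden units; by permutation symmetry the function is unchanged."""
    return W1[pi], b1[pi], W2[:, pi]

rng = np.random.default_rng(3)
d_in, hidden, d_out = 4, 5, 2
W1a = rng.normal(size=(hidden, d_in))
b1a = rng.normal(size=hidden)
W2a = rng.normal(size=(d_out, hidden))

# Hypothetical model B: a hidden-unit permutation of A plus small weight noise,
# standing in for a network that learned the same features in a different order.
p = rng.permutation(hidden)
eps = 1e-3
W1b = W1a[p] + eps * rng.normal(size=(hidden, d_in))
b1b = b1a[p] + eps * rng.normal(size=hidden)
W2b = W2a[:, p] + eps * rng.normal(size=(d_out, hidden))

# Align B's units to A's, then average the aligned parameters.
pi = match_units(unit_features(W1a, b1a, W2a), unit_features(W1b, b1b, W2b))
W1b_al, b1b_al, W2b_al = permute_hidden(W1b, b1b, W2b, pi)

x = rng.normal(size=d_in)
y_a = mlp(x, W1a, b1a, W2a)
y_aligned = mlp(x, (W1a + W1b_al) / 2, (b1a + b1b_al) / 2, (W2a + W2b_al) / 2)
y_naive = mlp(x, (W1a + W1b) / 2, (b1a + b1b) / 2, (W2a + W2b) / 2)
print(np.abs(y_aligned - y_a).max(), np.abs(y_naive - y_a).max())
```

After alignment the averaged model stays close to A, whereas naively averaging unaligned weights mixes unrelated units.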