Unsupervised Model Tree Heritage Recovery

Eliahu Horwitz; Asaf Shul; Yedid Hoshen

TL;DR

The paper tackles the problem of tracing model heritage across vast public model repositories by proposing Unsupervised MoTHer Recovery, which relies on model weights rather than metadata. It introduces the Model Tree and Model Graph data structures and derives distance-based and directional statistics from weights, notably the full fine-tuning distance $\ell_{FT}$, the LoRA distance $\ell_{LoRA}$, and a kurtosis-based directional score, to recover parent-child relations with a minimum directed spanning tree (MDST) algorithm. The approach is validated on a large synthetic MoTHer dataset and a real-world Stable Diffusion tree, demonstrating high accuracy, robustness to pruning and quantization, and scalability through clustering and component-wise recovery. The work has practical implications for attribution, IP protection, and transparent model ecosystems, while acknowledging limitations such as training-stage supervision and the need for web-scale deployment. Overall, MoTHer is a step toward unsupervised, data-free reconstruction of model heritage in complex, multi-tree ecosystems.
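To make the two kinds of weight statistics concrete, here is a minimal sketch of a pairwise weight distance and a kurtosis-based directional score. It rests on assumptions not stated in this summary: the distance is taken to be a plain Euclidean distance over flattened weights (a stand-in for $\ell_{FT}$), the directional score is taken to be a sum of per-layer kurtosis values, and the "parent" and "child" weights are toy data. The function names `weight_distance`, `kurtosis`, and `directional_score` are hypothetical.

```python
import math
import random


def weight_distance(w_a, w_b):
    """Euclidean distance between two flattened weight vectors.

    Assumed stand-in for the full fine-tuning distance l_FT.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w_a, w_b)))


def kurtosis(values):
    """Fourth standardized moment; close to 3.0 for Gaussian data."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    fourth = sum((v - mean) ** 4 for v in values) / n
    return fourth / (var ** 2)


def directional_score(layers):
    """Sum of per-layer kurtosis (hypothetical aggregation rule)."""
    return sum(kurtosis(layer) for layer in layers)


rng = random.Random(0)

# Toy "parent": Gaussian weights with a few injected outliers.
parent = [rng.gauss(0.0, 1.0) for _ in range(2000)]
for i in range(0, 2000, 100):
    parent[i] *= 8.0

# Toy "child": the same weights with the outliers shrunk again,
# mimicking the loss of outliers during specialization.
child = [w / 8.0 if i % 100 == 0 else w for i, w in enumerate(parent)]

parent_score = directional_score([parent])
child_score = directional_score([child])
```

In this toy setup the model with more outliers gets the higher score, which matches the convention in the figure captions that the node with the highest directional score is treated as the root.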

Abstract

The number of models shared online has recently skyrocketed, with over one million public models available on Hugging Face. Sharing models allows other users to build on existing models, using them as initialization for fine-tuning, improving accuracy, and saving compute and energy. However, it also raises important intellectual property issues, as fine-tuning may violate the license terms of the original model or that of its training data. A Model Tree, i.e., a tree data structure rooted at a foundation model and having directed edges between a parent model and other models directly fine-tuned from it (children), would settle such disputes by making the model heritage explicit. Unfortunately, current models are not well documented, with most model metadata (e.g., "model cards") not providing accurate information about heritage. In this paper, we introduce the task of Unsupervised Model Tree Heritage Recovery (Unsupervised MoTHer Recovery) for collections of neural networks. For each pair of models, this task requires: i) determining if they are directly related, and ii) establishing the direction of the relationship. Our hypothesis is that model weights encode this information; the challenge is to decode the underlying tree structure given the weights. We discover several properties of model weights that allow us to perform this task. By using these properties, we formulate the MoTHer Recovery task as finding a directed minimal spanning tree. In extensive experiments, we demonstrate that our method successfully reconstructs complex Model Trees.

Paper Structure (35 sections, 6 equations, 26 figures, 7 tables)

This paper contains 35 sections, 6 equations, 26 figures, 7 tables.

Introduction
Related Works
Model Trees and Model Graphs
Definition.
Task definition.
Model Graph Priors
Estimating node distance from model weights
LoRA fine-tuning.
Estimating edge direction from weights
Intuition.
Model Tree Heritage Recovery
Warm-up: A simplified Model Graph
Grandparent-Parent-Child (GPC).
Parent-Child-Child (PC2).
Parent-Child-Stranger (PCS).
...and 20 more sections

Figures (26)

Figure 1: Weight Distance vs. Model Tree Edge Distance: For every pair of models, we plot the weight distance and the corresponding edge distance on the Model Tree. Our weight distances $\ell_{FT}$ and $\ell_{LoRA}$ almost perfectly correlate with the number of edges between models in a Model Tree. This correlation confirms these weight distances are good indicators for determining parent-child relations, i.e., models that were fine-tuned from one another. We use a Model Tree that is $3$ levels deep and contains $21$ models
Figure 2: Directional Weight Score: We plot the change in the directional weight score throughout the pre-training (generalization) stage (left) and fine-tuning (specialization) stage (right). In all cases, the directional score is almost monotonic, indicating the increasing number of weight outlier values during generalization and the decreasing number during specialization. This confirms that our directional weight score is effective for determining the direction of an edge. For the fine-tuning, we used publicly available, pre-trained backbones as initialization
Figure 3: Recovering a Simplified Model Graph: We enumerate all possible Model Graphs of size $3$ (left). On the right, we demonstrate a Model Graph Recovery process. (a) A set of $3$ models with no prior knowledge regarding their relations. (b) Place edges between the nodes with the lowest weight distance. (c) Designate the node with the highest directional weight score as the root
Figure 4: MoTHer Recovery Overview: Our proposed Model Graphs and Model Trees are new data structures for describing the heredity training relations between models. In these structures, heredity relations are represented as directed edges. We introduce the task of MoTHer Recovery, whose goal is to uncover the unknown structure of Model Graphs based on the weights of a set of input models. Our algorithm works as follows: (a) Cluster into different Model Trees based on the pairwise weight distances. (b) For each cluster, i) use $\ell_{FT}$ or $\ell_{LoRA}$ to create a pairwise distance matrix $D$ for placing edges, and ii) create a binary directional matrix $K$ based on the kurtosis to determine the edge direction. (c) To recover the final Model Tree, run a minimum directed spanning tree (MDST) algorithm on the merged prior matrix $M$. The final recovered Model Graph is the union of the recovered Model Trees
Figure 5: MoTHer Dataset Overview: Our dataset simulates a Model Graph consisting of over $20$ Model Trees with a total of over $500$ models fine-tuned on varying datasets with different hyperparameters. We distinguish between $4$ main disjoint sub-graphs, differing in backbone and fine-tuning paradigm. We visualize the ground truth structure of a single sub-graph that contains $105$ models across $5$ Model Trees. The different colors represent the different Model Trees, each rooted in a different foundation model. In practice, this structure is unknown and we are only given the set of models, without knowing their relations or their origin. Note that all $105$ models use the same ViT architecture, making it non-trivial to recover the structure
...and 21 more figures
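
Steps (b) and (c) of the Figure 4 caption can be illustrated with a small sketch for a single cluster. It is not the authors' implementation: the rule used to merge the distance matrix $D$ and the binary directional matrix $K$ into $M$ (distance plus a fixed penalty for edges that point from a lower to a higher directional score) is an assumption, the `penalty` parameter and all function names are hypothetical, and an exhaustive search stands in for a proper MDST algorithm such as Chu-Liu/Edmonds, so it is only feasible for a handful of models. The clustering step (a) is omitted.

```python
from itertools import product


def merge_priors(dist, scores, penalty):
    """Build the cost of each candidate edge parent -> child.

    Assumed merge rule: distance plus a penalty when the candidate
    parent has the lower directional score.
    """
    n = len(dist)
    cost = [[0.0] * n for _ in range(n)]
    for p in range(n):
        for c in range(n):
            wrong_way = 1.0 if scores[p] < scores[c] else 0.0
            cost[p][c] = dist[p][c] + penalty * wrong_way
    return cost


def reaches_root(node, parent, root):
    """Follow parent pointers; fail if a cycle is hit before the root."""
    seen = set()
    while node != root:
        if node in seen:
            return False
        seen.add(node)
        node = parent[node]
    return True


def brute_force_mdst(cost):
    """Exhaustive minimum directed spanning tree for tiny inputs.

    Stand-in for Chu-Liu/Edmonds; tries every root and every
    assignment of one parent per non-root node.
    """
    n = len(cost)
    best_total, best_edges = float("inf"), None
    for root in range(n):
        others = [v for v in range(n) if v != root]
        for choice in product(range(n), repeat=len(others)):
            parent = dict(zip(others, choice))
            if not all(reaches_root(v, parent, root) for v in others):
                continue
            total = sum(cost[p][c] for c, p in parent.items())
            if total < best_total:
                best_total = total
                best_edges = sorted((p, c) for c, p in parent.items())
    return best_edges


# Toy grandparent -> parent -> child chain (models 0, 1, 2).
D = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0]]
scores = [5.0, 4.0, 3.0]  # toy directional scores, highest at the root
M = merge_priors(D, scores, penalty=10.0)
edges = brute_force_mdst(M)  # [(0, 1), (1, 2)]
```

Here the distances alone cannot tell which end of the chain is the root, since $D$ is symmetric; the directional penalty is what makes the tree rooted at model 0 the cheapest one.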
