Unsupervised Model Tree Heritage Recovery
Eliahu Horwitz, Asaf Shul, Yedid Hoshen
TL;DR
The paper addresses the problem of tracing model heritage in an era of vast public model repositories. It proposes Unsupervised MoTHer Recovery, which relies on model weights rather than metadata. It introduces Model Tree and Model Graph structures and derives distance-based and directional statistics from the weights, notably the full-fine-tuning distance $\ell_{FT}$, the LoRA distance $\ell_{LoRA}$, and a kurtosis-based directional score. These statistics are used to recover parent-child relations within a minimum directed spanning tree (MDST) framework. The approach is validated on a large synthetic MoTHer dataset and on a real-world Stable Diffusion tree. It achieves high accuracy, is robust to pruning and quantization, and scales through clustering and component-wise recovery. The work has practical implications for attribution, IP protection, and transparent model ecosystems. Its limitations include the need for training-stage supervision and the still-open challenge of web-scale deployment. Overall, MoTHer is a step toward unsupervised, data-free reconstruction of model heritage in complex, multi-tree ecosystems.
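To illustrate how a symmetric weight distance and a directional statistic can be combined into directed edge costs, here is a minimal sketch. It does not reproduce the paper's $\ell_{FT}$ or its kurtosis score. The distance is a hypothetical stand-in (mean absolute weight difference). The direction rule is also an assumption made for illustration: kurtosis is taken to grow from parent to child. The `penalty` weight and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def excess_kurtosis(w: Sequence[float]) -> float:
    """Excess kurtosis of a flat weight vector, using population moments."""
    n = len(w)
    mean = sum(w) / n
    m2 = sum((x - mean) ** 2 for x in w) / n
    if m2 == 0.0:
        return 0.0  # constant weights: kurtosis undefined, treat as neutral
    m4 = sum((x - mean) ** 4 for x in w) / n
    return m4 / m2 ** 2 - 3.0


def weight_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical stand-in for a fine-tuning distance: mean |a_i - b_i|."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def directed_costs(
    models: Dict[str, List[float]], penalty: float = 1.0
) -> Dict[Tuple[str, str], float]:
    """Cost of every candidate edge parent -> child.

    The distance is symmetric, so direction comes only from kurtosis. Under the
    ASSUMPTION that kurtosis grows from parent to child, an edge whose child has
    lower kurtosis than its parent is charged an extra `penalty`.
    """
    kurt = {name: excess_kurtosis(w) for name, w in models.items()}
    costs: Dict[Tuple[str, str], float] = {}
    for parent, pw in models.items():
        for child, cw in models.items():
            if parent == child:
                continue
            wrong_direction = kurt[child] < kurt[parent]
            extra = penalty if wrong_direction else 0.0
            costs[(parent, child)] = weight_distance(pw, cw) + extra
    return costs


if __name__ == "__main__":
    base = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.2]
    child = base[:-1] + [1.0]  # toy "fine-tune" that introduces one weight outlier
    costs = directed_costs({"base": base, "child": child})
    # base -> child is cheaper than child -> base because kurtosis rose
    print(costs[("base", "child")] < costs[("child", "base")])  # True
```

Because the distance term is identical in both directions, the direction decision rests entirely on the kurtosis cue in this toy setup. A real system would need the paper's actual statistics.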
Abstract
The number of models shared online has recently skyrocketed, with over one million public models available on Hugging Face. Sharing models allows other users to build on existing models, using them as initialization for fine-tuning, improving accuracy, and saving compute and energy. However, it also raises important intellectual property issues, as fine-tuning may violate the license terms of the original model or those of its training data. A Model Tree would settle such disputes by making the model heritage explicit. A Model Tree is a tree data structure rooted at a foundation model, with directed edges from a parent model to each model directly fine-tuned from it (its children). Unfortunately, current models are not well documented, and most model metadata (e.g., "model cards") do not provide accurate information about heritage. In this paper, we introduce the task of Unsupervised Model Tree Heritage Recovery (Unsupervised MoTHer Recovery) for collections of neural networks. For each pair of models, this task requires i) determining whether they are directly related, and ii) establishing the direction of the relationship. Our hypothesis is that model weights encode this information; the challenge is to decode the underlying tree structure from the weights. We discover several properties of model weights that allow us to perform this task. Using these properties, we formulate MoTHer Recovery as finding a directed minimum spanning tree. In extensive experiments, we demonstrate that our method successfully reconstructs complex Model Trees.
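Finding a directed minimum spanning tree (a minimum spanning arborescence) over candidate parent-to-child edge costs is classically solved with the Chu-Liu/Edmonds algorithm. The sketch below is a plain recursive version of that algorithm, not the authors' code. It assumes the root (e.g., the foundation model) is known and that every non-root node has at least one incoming edge. The function names and the `(parent, child) -> cost` dictionary format are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Edge = Tuple[Hashable, Hashable]


def _find_cycle(parent: Dict[Hashable, Hashable]) -> Optional[List[Hashable]]:
    """Return the nodes of one cycle in a child -> parent map, or None."""
    visited = set()
    for start in parent:
        if start in visited:
            continue
        path, on_path = [], set()
        node = start
        # Walk up parent pointers until the root, an explored node, or a repeat.
        while node in parent and node not in visited:
            visited.add(node)
            path.append(node)
            on_path.add(node)
            node = parent[node]
        if node in on_path:
            return path[path.index(node):]
    return None


def min_spanning_arborescence(
    nodes: Iterable[Hashable], costs: Dict[Edge, float], root: Hashable
) -> Dict[Hashable, Hashable]:
    """Chu-Liu/Edmonds: return a child -> parent map of minimum total cost."""
    nodes = set(nodes)
    # 1. Greedily pick the cheapest incoming edge for every non-root node.
    best_in: Dict[Hashable, Hashable] = {}
    for (u, v), c in costs.items():
        if v == root or u == v or u not in nodes or v not in nodes:
            continue
        if v not in best_in or c < costs[(best_in[v], v)]:
            best_in[v] = u
    cycle = _find_cycle(best_in)
    if cycle is None:
        return best_in  # the greedy choice is already a tree
    # 2. Contract the cycle into one super-node and re-weight entering edges.
    cyc = set(cycle)
    cnode = ("cycle", frozenset(cyc))
    new_costs: Dict[Edge, float] = {}
    origin: Dict[Edge, Edge] = {}
    for (u, v), c in costs.items():
        if u in cyc and v in cyc:
            continue
        if v in cyc:  # entering: pay only the extra cost over best_in[v]
            key, c2 = (u, cnode), c - costs[(best_in[v], v)]
        elif u in cyc:  # leaving the cycle
            key, c2 = (cnode, v), c
        else:
            key, c2 = (u, v), c
        if key not in new_costs or c2 < new_costs[key]:
            new_costs[key] = c2
            origin[key] = (u, v)
    # 3. Solve the contracted problem, then expand the super-node.
    sub = min_spanning_arborescence((nodes - cyc) | {cnode}, new_costs, root)
    parent = {v: best_in[v] for v in cyc}
    for v, p in sub.items():
        u_orig, v_orig = origin[(p, v)]
        # For the super-node this replaces the cycle edge into the entry node.
        parent[v_orig] = u_orig
    return parent


if __name__ == "__main__":
    costs = {("A", "B"): 10.0, ("A", "C"): 12.0, ("B", "C"): 1.0, ("C", "B"): 1.0}
    # The greedy step picks the cycle B <-> C; contraction breaks it, giving
    # B's parent A and C's parent B.
    print(min_spanning_arborescence(["A", "B", "C"], costs, "A"))
```

If the root is unknown, one option is to add a virtual root connected to every model with a uniform cost large enough that only one such edge is used, and then read off the real root as that edge's child.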
