We Should Chart an Atlas of All the World's Models
Eliahu Horwitz, Nitzan Kurer, Jonathan Kahana, Liel Amar, Yedid Hoshen
TL;DR
The paper proposes the Model Atlas, a directed graph whose nodes are machine learning models and whose edges are the weight transformations that connect them, to address the documentation gap among the millions of public models. It argues that charting this atlas enables model forensics, meta-ML research, and model discovery. Because most edges and node features are still missing, the authors argue that scaling the atlas requires symmetry-agnostic weight-space learning. They introduce a symmetry-agnostic charting framework that includes a greedy $O(n^2)$ algorithm, temporal priors, and probing-based representations, and they apply it to real-world repositories such as Hugging Face. The authors also discuss open challenges, visualization strategies, and alternative viewpoints, and they provide initial datasets to support community-driven development. Overall, the paper frames the Model Atlas as a practical blueprint for organizing the global model landscape to improve provenance, transferability, and reuse.
Abstract
Public model repositories now contain millions of models, yet most models remain undocumented and effectively lost. In this position paper, we advocate for charting the world's model population in a unified structure we call the Model Atlas: a graph that captures models, their attributes, and the weight transformations that connect them. The Model Atlas enables applications in model forensics, meta-ML research, and model discovery, challenging tasks given today's unstructured model repositories. However, because most models lack documentation, large atlas regions remain uncharted. Addressing this gap motivates new machine learning methods that treat models themselves as data, inferring properties such as functionality, performance, and lineage directly from their weights. We argue that a scalable path forward is to bypass the unique parameter symmetries that plague model weights. Charting all the world's models will require a community effort, and we hope its broad utility will rally researchers toward this goal.
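To make the proposed structure concrete, the following is a minimal sketch of an atlas as a directed graph: nodes are models with attributes, and each edge records the weight transformation that produced a child model from its parent. The class names, attribute keys, model names, and transformation labels are all illustrative assumptions, not the authors' implementation or data format, and the sketch assumes each model has at most one parent.

```python
from dataclasses import dataclass, field


@dataclass
class ModelNode:
    """A model in the atlas. Attribute keys are hypothetical examples."""
    name: str
    attributes: dict = field(default_factory=dict)


class ModelAtlas:
    """Directed graph: nodes are models, edges are weight transformations."""

    def __init__(self):
        self.nodes = {}  # model name -> ModelNode
        self.edges = {}  # (parent name, child name) -> transformation label

    def add_model(self, name, **attributes):
        self.nodes[name] = ModelNode(name, dict(attributes))

    def add_transformation(self, parent, child, kind):
        if parent not in self.nodes or child not in self.nodes:
            raise KeyError("both models must be added before linking them")
        self.edges[(parent, child)] = kind

    def children(self, name):
        """Models derived directly from `name`."""
        return sorted(c for (p, c) in self.edges if p == name)

    def lineage(self, name):
        """Follow parent edges back to a root (assumes at most one parent per model)."""
        parents = {c: p for (p, c) in self.edges}
        path = [name]
        # Stop at a root, or if an edge would revisit a model already on the path.
        while path[-1] in parents and parents[path[-1]] not in path:
            path.append(parents[path[-1]])
        return path[::-1]


# Hypothetical models and transformation labels, for illustration only.
atlas = ModelAtlas()
atlas.add_model("base", task="language-modeling")
atlas.add_model("base-ft", task="summarization")
atlas.add_model("base-ft-q4")
atlas.add_transformation("base", "base-ft", "fine-tuning")
atlas.add_transformation("base-ft", "base-ft-q4", "quantization")

print(atlas.lineage("base-ft-q4"))  # ['base', 'base-ft', 'base-ft-q4']
```

In this toy atlas every edge is given by hand. In the setting the abstract describes, most edges and attributes are undocumented and would instead have to be inferred from the weights themselves.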
