We Should Chart an Atlas of All the World's Models

Eliahu Horwitz, Nitzan Kurer, Jonathan Kahana, Liel Amar, Yedid Hoshen

TL;DR

The paper proposes the Model Atlas, a directed graph that represents the population of machine learning models and the weight transformations between them, to address the documentation gap across millions of public models. It argues that charting this atlas enables model forensics, meta-ML visualization, and model discovery. Because most edges and node features are still missing, the authors argue that scaling requires symmetry-agnostic weight-space learning. They introduce a symmetry-agnostic charting framework that includes a greedy $O(n^2)$ algorithm, temporal priors, and probing-based representations, and apply it to real-world repositories such as Hugging Face. The authors also discuss open challenges, visualization strategies, and alternative viewpoints, and provide initial datasets to facilitate community-driven development. Overall, the Model Atlas is framed as a practical blueprint for organizing and leveraging the global model landscape to improve provenance, transferability, and reuse.

Abstract

Public model repositories now contain millions of models, yet most models remain undocumented and effectively lost. In this position paper, we advocate for charting the world's model population in a unified structure we call the Model Atlas: a graph that captures models, their attributes, and the weight transformations that connect them. The Model Atlas enables applications in model forensics, meta-ML research, and model discovery, challenging tasks given today's unstructured model repositories. However, because most models lack documentation, large atlas regions remain uncharted. Addressing this gap motivates new machine learning methods that treat models themselves as data, inferring properties such as functionality, performance, and lineage directly from their weights. We argue that a scalable path forward is to bypass the unique parameter symmetries that plague model weights. Charting all the world's models will require a community effort, and we hope its broad utility will rally researchers toward this goal.
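The graph described in the abstract can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the authors' implementation: the class names (`ModelNode`, `TransformEdge`, `ModelAtlas`), the `task` attribute, and the model identifiers are all hypothetical, and `None` is used here to stand for an uncharted (unknown) attribute.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ModelNode:
    """One model in the atlas. All names here are hypothetical."""
    model_id: str
    task: Optional[str] = None  # None marks an unknown (uncharted) attribute


@dataclass
class TransformEdge:
    """Directed edge: `child` was derived from `parent` by a weight transformation."""
    parent: str
    child: str
    transformation: Optional[str] = None  # e.g. "fine-tuning"; None if unknown


@dataclass
class ModelAtlas:
    nodes: Dict[str, ModelNode] = field(default_factory=dict)
    edges: List[TransformEdge] = field(default_factory=list)

    def add_model(self, node: ModelNode) -> None:
        self.nodes[node.model_id] = node

    def add_edge(self, parent: str, child: str,
                 transformation: Optional[str] = None) -> None:
        # Both endpoints must already be registered as nodes.
        if parent not in self.nodes or child not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(TransformEdge(parent, child, transformation))

    def children(self, model_id: str) -> List[str]:
        """Models derived directly from `model_id`."""
        return [e.child for e in self.edges if e.parent == model_id]

    def uncharted_nodes(self) -> List[str]:
        """Node ids whose `task` attribute is still unknown."""
        return sorted(n.model_id for n in self.nodes.values() if n.task is None)

    def uncharted_edges(self) -> List[Tuple[str, str]]:
        """(parent, child) pairs whose transformation type is still unknown."""
        return [(e.parent, e.child) for e in self.edges if e.transformation is None]


# Toy atlas with made-up model ids.
atlas = ModelAtlas()
atlas.add_model(ModelNode("base", task="text-generation"))
atlas.add_model(ModelNode("base-ft"))  # task not documented
atlas.add_model(ModelNode("base-ft-q4", task="text-generation"))
atlas.add_edge("base", "base-ft", "fine-tuning")
atlas.add_edge("base-ft", "base-ft-q4")  # transformation not documented
```

In this toy setup, `uncharted_nodes()` and `uncharted_edges()` list exactly the entries that a method inferring properties from weights would need to fill in.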

Paper Structure

This paper contains 18 sections, 1 equation, 18 figures, 3 tables, 1 algorithm.

Figures (18)

  • Figure 1: Position overview: With millions of public models, it becomes important to move beyond individual models and study entire populations (left). The Model Atlas formalizes this shift by representing models as nodes in a graph, with directed edges denoting weight transformations (e.g., fine-tuning). Node size and color, as well as edge color, encode node and edge-level features; light blue indicates missing or unknown information. The atlas enables a range of applications, including model forensics, meta-ML research, and model discovery (center). In practice, most edges and features are unknown. This motivates ML methods that take models as input and infer their properties, thereby completing the missing atlas regions (right). Zoom in to view edges, best viewed in color.
  • Figure 1: Atlas structure recovery (pseudo-code)
  • Figure 2: Growth in Hugging Face models: The number of public models is growing rapidly, but most remain undocumented and effectively lost. We advocate for charting them in a Model Atlas.
  • Figure 3: The Model Atlas - Stable Diffusion vs. Llama: We visualize the atlas of the top 30% most downloaded models in the Stable Diffusion (SD) and Llama regions. Node size reflects cumulative monthly downloads, and color denotes the transformation type relative to the parent model. The atlas reveals that the Llama region has a more complex structure and a wider diversity of transformation techniques (e.g., quantization, merging) compared to SD. Zoom in to view edges, best viewed in color.
  • Figure 4: Model Atlas illustration: Each node in the Model Atlas represents a distinct model state, and each directed edge denotes a weight transformation from one model to another. Edge features encode information about the transformation, node features capture properties of the model itself.
  • ...and 13 more figures