Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem

Shayne Longpre, Christopher Akiki, Campbell Lund, Atharva Kulkarni, Emily Chen, Irene Solaiman, Avijit Ghosh, Yacine Jernite, Lucie-Aimée Kaffee

TL;DR

The paper leverages a comprehensive, longitudinal dataset of Hugging Face Model Hub downloads (2020–2025) to map how power concentrates and diffuses across models, developers, and nations in the open AI ecosystem. Using rolling-window usage signals, annotated metadata, and economic concentration metrics (the Herfindahl-Hirschman Index, HHI, and the Gini coefficient), it reveals a dramatic shift from US industry dominance toward unaffiliated developers, online communities, and Chinese developers, alongside the rise of intermediaries that re-package existing models. It documents a parallel technical transformation toward larger, multimodal, and quantized models, with growing use of mixture-of-experts architectures and substantial declines in data transparency and open-source alignment. The work contributes a publicly released dataset and dashboard to support ongoing monitoring, governance, and informed policy discussion about openness, participation, and competition in open AI ecosystems.
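The HHI and Gini measures named above can both be computed from per-entity download totals (per model, per developer, or per nation). The sketch below is illustrative, not the authors' code: the function names are hypothetical, the input is assumed to be a plain list of download counts, and the paper's exact HHI normalization is not stated here (the plain sum of squared fractional shares used below already lies in [1/N, 1]). A `top_k_share` helper is included because the paper also reports the share of downloads held by ranked segments.

```python
from collections.abc import Sequence


def download_shares(counts: Sequence[float]) -> list[float]:
    """Convert raw download counts into fractional shares that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need a positive total download count")
    return [c / total for c in counts]


def hhi(counts: Sequence[float]) -> float:
    """Herfindahl-Hirschman Index on fractional shares: sum of squared shares, in [1/N, 1]."""
    return sum(s * s for s in download_shares(counts))


def gini(counts: Sequence[float]) -> float:
    """Gini coefficient: 0 when all entities are equal, (N - 1) / N when one entity holds everything."""
    xs = sorted(counts)  # the rank-weighted formula below requires ascending order
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one entity and a positive total")
    # G = 2 * sum_i(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def top_k_share(counts: Sequence[float], k: int) -> float:
    """Share of all downloads captured by the k most-downloaded entities."""
    shares = sorted(download_shares(counts), reverse=True)
    return sum(shares[:k])


# Usage with made-up counts for four hypothetical developers (not data from the paper):
counts = [700, 200, 50, 50]
print(round(hhi(counts), 4))          # 0.535
print(round(gini(counts), 4))         # 0.525
print(round(top_k_share(counts, 1), 4))  # 0.7
```

HHI is dominated by the largest shares, while Gini summarizes inequality across the whole distribution, which is why tracking both can reveal different aspects of concentration.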

Abstract

Since 2019, the Hugging Face Model Hub has been the primary global platform for sharing open-weight AI models. By releasing a dataset of the complete history of weekly model downloads (June 2020 to August 2025) alongside model metadata, we provide the most rigorous examination to date of concentration dynamics and evolving characteristics in the open model economy. Our analysis spans 851,000 models, over 200 aggregated attributes per model, and 2.2B downloads. We document a fundamental rebalancing of economic power: US open-weight industry dominance by Google, Meta, and OpenAI has declined sharply in favor of unaffiliated developers, community organizations, and, as of 2025, Chinese industry, with DeepSeek and Qwen models potentially heralding a new consolidation of market power. We identify statistically significant shifts in model properties: a 17X increase in average model size and rapid growth in multimodal generation (3.4X), quantization (5X), and mixture-of-experts architectures (7X), alongside concerning declines in data transparency, with open-weight models surpassing truly open-source models for the first time in 2025. We also expose a newly emerged layer of developer intermediaries that quantize and adapt base models for both efficiency and artistic expression. To enable continued research and oversight, we release the complete dataset with an interactive dashboard for real-time monitoring of concentration dynamics and evolving properties in the open model economy.
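Measuring a rebalancing of power from weekly download records starts with aggregating those records into per-period shares. The following is a minimal sketch under stated assumptions: the record layout `(model_id, developer, week_start, downloads)` and the function name are hypothetical, each week is assigned to the calendar year of its start date, and no filtering for inauthentic downloads is applied. It does not reproduce the paper's Rolling Window Filter or attribution rules.

```python
from collections import defaultdict
from datetime import date

# Hypothetical record layout: (model_id, developer, week_start, downloads).
WeeklyRecord = tuple[str, str, date, int]


def yearly_developer_shares(records: list[WeeklyRecord]) -> dict[int, dict[str, float]]:
    """Sum weekly downloads per developer per calendar year, then normalize to shares."""
    totals: dict[int, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for _model_id, developer, week_start, downloads in records:
        # A week is credited to the year its start date falls in (an assumption).
        totals[week_start.year][developer] += downloads

    shares: dict[int, dict[str, float]] = {}
    for year, by_developer in sorted(totals.items()):
        year_total = sum(by_developer.values())
        shares[year] = (
            {dev: n / year_total for dev, n in by_developer.items()} if year_total else {}
        )
    return shares


# Usage with made-up records (not from the paper's dataset):
records = [
    ("org-a/model-1", "org-a", date(2021, 3, 1), 900),
    ("org-b/model-2", "org-b", date(2021, 3, 1), 100),
    ("org-a/model-1", "org-a", date(2025, 3, 3), 200),
    ("org-c/model-3", "org-c", date(2025, 3, 3), 800),
]
print(yearly_developer_shares(records))
# {2021: {'org-a': 0.9, 'org-b': 0.1}, 2025: {'org-a': 0.2, 'org-c': 0.8}}
```

The resulting per-year share dictionaries are exactly the kind of input the concentration measures (HHI, Gini, top-k share) operate on.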

Paper Structure

This paper contains 29 sections, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Top: The Top 12 Nation Map, ranking nations by the all-time downloads of their models. Bottom: The Top 10 download leaderboard for countries, developers, and models, with their download percentages. Both the map and the leaderboard use the Rolling Window Filter to mitigate inauthentic downloads. The All-time section reflects all-time downloads, whereas the Aug. 2024 - Aug. 2025 section reflects all downloads for models created within the last year (August 2024 to August 2025). Symbols indicate details about the model: 🧩 = embedding and classification models; 📝 = text generation; 🖼 = image generation; 🎙 = speech generation; 🎥 = video generation; 🌐 = international/online organization; 👤 = unaffiliated user.
  • Figure 2: Top: Developer Download Share over time, using the Rolling Window Filter and applying Recursive Model Attribution. Where Google, Meta, and OpenAI once dominated market share (2021-2024), their influence has subsided as developers beyond the Top 20 have gradually grown their combined share from 20% to over 50%. Bottom: National Download Share over time, using the Rolling Window Filter and applying Recursive Model Attribution. Where the US and Europe once dominated market share (2021-2023), unaffiliated users, China, and Germany have now become prominent contributors. Both plots use a 1-year Rolling Window Filter to better estimate authentic usage.
  • Figure 3: Top: Model economic concentration over time. Middle: Developer economic concentration over time. Bottom: National economic concentration over time. In each plot we measure the share of downloads allocated to ranked segments of the open model economy. Economic measures of concentration are also displayed in purple (the Gini coefficient) and pink (the Herfindahl-Hirschman Index, scaled 0-1). Across levels of abstraction (model, developer, nation), economic concentration first declined significantly but has started to rise again in 2025.
  • Figure 4: The proportion of downloads allocated to each developer organization type in each country. We find that development in the US, China, and the UK is heavily skewed toward industry, whereas development in Germany, France, the rest of Asia and Europe, and Online is more balanced toward non-profits, universities, and community contributors.
  • Figure 5: The distribution of sizes for models downloaded in each year (left, pink) and created in each year (right, green) is shifting over time. Creation statistics begin only in 2022, as Hugging Face did not record model creation times before then. We log-scale the downloads distribution before the violin plots' kernel density estimation to smooth it and improve the visibility of the various model-size peaks. The lines within each distribution mark the 25th, 50th, and 75th percentiles. We find that the mean sizes of both downloaded and created models are rising, though the medians are rising far less quickly, as shown in the temporal-shifts table.
  • ...and 4 more figures
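Figure 2 applies "Recursive Model Attribution", which the captions do not define. One plausible reading, sketched below purely as an assumption, is that a derivative model's downloads (for example, a quantized re-upload by an intermediary) are credited to the developer of the root base model it descends from. The lineage map is hypothetical input, assumed to come from metadata such as the `base_model` field many Hub model cards declare; models with multiple parents (merges) are out of scope; and treating the repo owner as the developer is a simplification. Whether the paper credits the root, the intermediary, or splits credit is not established here.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Callable, Mapping


def root_model(model_id: str, base_of: Mapping[str, str | None]) -> str:
    """Follow base-model links until reaching a model with no recorded parent."""
    seen: set[str] = set()
    current = model_id
    while base_of.get(current) is not None:
        if current in seen:
            raise ValueError(f"lineage cycle detected at {current!r}")
        seen.add(current)
        current = base_of[current]
    return current


def developer_from_repo_id(model_id: str) -> str:
    """Treat the owner in an 'owner/name' repo ID as the developer (a simplifying assumption)."""
    return model_id.split("/", 1)[0]


def attribute_downloads(
    downloads: Mapping[str, int],
    base_of: Mapping[str, str | None],
    developer_of: Callable[[str], str] = developer_from_repo_id,
) -> dict[str, int]:
    """Credit each model's downloads to the developer of its root base model."""
    credited: dict[str, int] = defaultdict(int)
    for model_id, count in downloads.items():
        credited[developer_of(root_model(model_id, base_of))] += count
    return dict(credited)


# Usage with hypothetical repo IDs and counts:
base_of = {
    "org-a/base-model": None,
    "repackager/base-model-GGUF": "org-a/base-model",
    "artist/base-model-GGUF-style-lora": "repackager/base-model-GGUF",
    "org-b/other-model": None,
}
downloads = {
    "org-a/base-model": 500,
    "repackager/base-model-GGUF": 300,
    "artist/base-model-GGUF-style-lora": 50,
    "org-b/other-model": 150,
}
print(attribute_downloads(downloads, base_of))
# {'org-a': 850, 'org-b': 150}
```

Under this reading, intermediaries' uploads raise the base developer's share rather than their own, so comparing attributed and unattributed shares is one way to gauge how much usage flows through re-packagers.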