Towards Semantic Versioning of Open Pre-trained Language Model Releases on Hugging Face
Adekunle Ajibode, Abdul Ali Bangash, Filipe Roseiro Cogo, Bram Adams, Ahmed E. Hassan
TL;DR
The paper investigates how open PTLM releases on Hugging Face are named, versioned, and documented, revealing pervasive inconsistencies and a lack of semantic versioning. Using a mixed-methods approach, it analyzes 52,227 PTLMs, identifies 148 naming conventions across 12 segment-types, and shows that only a minority of releases explicitly signal variant-type or training-dataset provenance. It demonstrates widespread implicit versioning in model binaries and extensive gaps in model cards and dataset metadata, arguing for a multidimensional, provenance-rich semantic versioning framework for PTLMs. The work outlines concrete recommendations for standardizing naming, enhancing metadata, and providing tools (e.g., version calculators, SBOM-like practices) to improve reproducibility, trust, and interoperability in the model registry ecosystem.
Abstract
The proliferation of open Pre-trained Language Models (PTLMs) on model registry platforms like Hugging Face (HF) presents both opportunities and challenges for companies building products around them. Similar to traditional software dependencies, PTLMs continue to evolve after a release. However, current release practices for PTLMs on model registry platforms are plagued by a variety of inconsistencies, such as ambiguous naming conventions and inaccessible model training documentation. Given the knowledge gap on current PTLM release practices, our empirical study uses a mixed-methods approach to analyze the releases of 52,227 PTLMs on the most well-known model registry, HF. Our results reveal 148 different naming practices for PTLM releases, with 40.87% of changes to model weight files not represented in the adopted name-based versioning practice or their documentation. In addition, we identified that the 52,227 PTLMs are derived from only 299 different base models (the original models that were modified to create the 52,227 PTLMs), with Fine-tuning and Quantization being the most prevalent modification methods applied to these base models. Significant gaps in release transparency, in terms of training dataset specifications and model card availability, still exist, highlighting the need for standardized documentation. While we identified a model naming practice explicitly differentiating between major and minor PTLM releases, we did not find any significant difference in the types of changes that went into either type of release, suggesting that major/minor version numbers for PTLMs are often chosen arbitrarily. Our findings provide valuable insights to improve PTLM release practices, nudging the field towards more formal semantic versioning practices.
