Exploring the Lifecycle and Maintenance Practices of Pre-Trained Models in Open-Source Software Repositories
Matin Koohjani, Diego Elias Costa
TL;DR
The paper tackles the challenge of understanding how pre-trained models (PTMs) are adopted, maintained, and tested in open-source software, addressing lifecycle gaps arising from inconsistent semantic versioning. It proposes an exploratory study built on the PeaTMOSS dataset to mine PTMs used in production-oriented OSS projects via Hugging Face and PyTorch Hub, guided by five research questions on characteristics, usage, evolution, testing, and issues. The approach combines call-site mining, PageRank centrality, Kaplan-Meier survival analysis, and qualitative coding with Cohen's kappa as the inter-rater reliability measure to map PTM roles, update patterns, and testing practices. The anticipated contributions include actionable guidance for PTM governance and maintenance in OSS and a publicly available replication package to support reproducibility and further research.
Abstract
Pre-trained models (PTMs) are becoming a common component in open-source software (OSS) development, yet their roles, maintenance practices, and lifecycle challenges remain underexplored. This report presents a plan for an exploratory study of how PTMs are used, maintained, and tested in OSS projects, focusing on models hosted on platforms such as Hugging Face and PyTorch Hub. We plan to mine software repositories that use PTMs and analyze their codebases, historical data, and reported issues to characterize PTM usage and the associated maintenance practices. The study aims to provide actionable insights for improving the use and sustainability of PTMs in open-source projects, and to take a step toward a foundation for advancing software engineering practices in the context of model dependencies.
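To make the idea of mining a codebase for PTM usage concrete, the following is a minimal sketch of call-site detection using Python's standard `ast` module. It is not the study's actual tooling: the function names `find_ptm_call_sites` and `_dotted_name` are hypothetical, and the choice to match only `*.from_pretrained(...)` and `torch.hub.load(...)` is an assumption made for illustration.

```python
import ast

# Assumption: only these two loading patterns count as PTM call sites here.
HF_LOADER_ATTR = "from_pretrained"       # e.g. AutoModel.from_pretrained(...)
TORCH_HUB_LOAD = ("torch", "hub", "load")  # torch.hub.load(repo, model)


def _dotted_name(node):
    """Return an attribute chain such as ('torch', 'hub', 'load'), or None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return tuple(reversed(parts))
    return None  # e.g. a call on a subscript or on another call's result


def find_ptm_call_sites(source):
    """Return sorted (line, callee, string_args) tuples for PTM-loading calls."""
    sites = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = _dotted_name(node.func)
        if name is None:
            continue
        is_hf = len(name) >= 2 and name[-1] == HF_LOADER_ATTR
        is_hub = name == TORCH_HUB_LOAD
        if not (is_hf or is_hub):
            continue
        # Keep only positional string literals; identifiers built at
        # runtime cannot be resolved by static inspection.
        string_args = tuple(
            arg.value
            for arg in node.args
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str)
        )
        sites.append((node.lineno, ".".join(name), string_args))
    return sorted(sites)


if __name__ == "__main__":
    sample = (
        "from transformers import AutoModel\n"
        "import torch\n"
        'm = AutoModel.from_pretrained("bert-base-uncased")\n'
        'r = torch.hub.load("pytorch/vision", "resnet18")\n'
    )
    for line, callee, args in find_ptm_call_sites(sample):
        print(line, callee, args)
```

A purely syntactic match like this misses aliased imports (for example `from torch.hub import load`) and model identifiers passed through variables, so a real mining pipeline would likely need name resolution on top of it.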
