Documenting Ethical Considerations in Open Source AI Models
Haoyu Gao, Mansooreh Zahedi, Christoph Treude, Sarita Rosenstock, Marc Cheong
TL;DR
This study empirically investigates how ethical considerations are documented in open-source AI artefacts on GitHub and Hugging Face. By building an ethics-focused keyword set and applying thematic analysis to 265 documents, it identifies six themes—data quality concerns, model behavioural risks, model risk mitigation, model use cases, references to other materials, and others—with a heavy emphasis on behavioural risks and use restrictions. The findings reveal that most documentation is brief and often lacks concrete mitigation guidance, highlighting a gap between proposed documentation frameworks and actual practice. The authors provide a replication package and practical recommendations for developers, researchers, educators, and policymakers to improve ethical documentation in OSS AI projects, aiming to enhance transparency and accountability in open-source ecosystems.
Abstract
Background: The development of AI-enabled software depends heavily on AI model documentation, such as model cards, because software engineers and model developers differ in domain expertise. From an ethical standpoint, AI model documentation conveys critical information on ethical considerations, along with mitigation strategies, so that downstream developers can deliver ethically compliant software. However, knowledge of such documentation practices remains scarce. Aims: The objective of our study is to investigate how developers document ethical aspects of open source AI models in practice, with the aim of providing recommendations for future documentation endeavours. Method: We selected three sources of documentation on GitHub and Hugging Face, and developed a keyword set to identify ethics-related documents systematically. After filtering an initial set of 2,347 documents, we identified 265 relevant ones and performed thematic analysis to derive the themes of ethical considerations. Results: Six themes emerged, the three largest being model behavioural risks, model use cases, and model risk mitigation. Conclusions: Our findings reveal that open source AI model documentation focuses on articulating ethical problem statements and use case restrictions. We further provide suggestions to various stakeholders for improving documentation practice regarding ethical considerations.
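The keyword-based identification step described under Method can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword list, the function names, and the sample documents are hypothetical placeholders, and the study's actual keyword set is the one in its replication package.

```python
import re

# Hypothetical keywords for illustration only; not the study's keyword set.
ETHICS_KEYWORDS = ["ethic", "bias", "fairness", "harm", "misuse"]


def build_pattern(keywords):
    """Compile one case-insensitive pattern that matches any keyword at the start of a word."""
    alternation = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b(?:{alternation})", re.IGNORECASE)


def filter_candidates(documents, keywords=ETHICS_KEYWORDS):
    """Return the names of documents whose text contains at least one keyword match."""
    pattern = build_pattern(keywords)
    return [name for name, text in documents.items() if pattern.search(text)]


# Hypothetical sample documents (e.g. README or model card text keyed by name).
sample_docs = {
    "model_a/README.md": "This model may exhibit Bias against some groups.",
    "model_b/README.md": "Install with pip and run the demo script.",
    "model_c/README.md": "Do not use this model for harmful purposes.",
}

candidates = filter_candidates(sample_docs)
print(candidates)
```

A filter of this kind only surfaces candidate documents. The reduction from 2,347 documents to 265 relevant ones reported in the abstract involves relevance filtering that this sketch does not attempt to reproduce.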
