Documenting Ethical Considerations in Open Source AI Models

Haoyu Gao, Mansooreh Zahedi, Christoph Treude, Sarita Rosenstock, Marc Cheong

TL;DR

This study empirically investigates how ethical considerations are documented in open source AI artefacts on GitHub and Hugging Face. By building an ethics-focused keyword set and applying thematic analysis to 265 documents, it identifies six themes (data quality concerns, model behavioural risks, model risk mitigation, model use cases, references to other materials, and others), with a heavy emphasis on behavioural risks and use restrictions. The findings reveal that most documentation is brief and often lacks concrete mitigation guidance, highlighting a gap between proposed documentation frameworks and actual practice. The authors provide a replication package and practical recommendations for developers, researchers, educators, and policymakers to improve ethical documentation in open source AI projects, with the aim of enhancing transparency and accountability in open source ecosystems.

Abstract

Background: The development of AI-enabled software heavily depends on AI model documentation, such as model cards, due to different domain expertise between software engineers and model developers. From an ethical standpoint, AI model documentation conveys critical information on ethical considerations along with mitigation strategies for downstream developers to ensure the delivery of ethically compliant software. However, knowledge on such documentation practice remains scarce. Aims: The objective of our study is to investigate how developers document ethical aspects of open source AI models in practice, aiming at providing recommendations for future documentation endeavours. Method: We selected three sources of documentation on GitHub and Hugging Face, and developed a keyword set to identify ethics-related documents systematically. After filtering an initial set of 2,347 documents, we identified 265 relevant ones and performed thematic analysis to derive the themes of ethical considerations. Results: Six themes emerge, with the three largest ones being model behavioural risks, model use cases, and model risk mitigation. Conclusions: Our findings reveal that open source AI model documentation focuses on articulating ethical problem statements and use case restrictions. We further provide suggestions to various stakeholders for improving documentation practice regarding ethical considerations.
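The keyword-based identification step described in the Method can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the keyword list, the function names, and the prefix-matching rule are all assumptions made for this example, and the study's actual keyword set is the one in its replication package.

```python
import re

# Hypothetical keywords for illustration only; NOT the study's keyword set.
ETHICS_KEYWORDS = ["ethic", "bias", "fairness", "harm", "misuse"]


def build_pattern(keywords):
    """Compile a case-insensitive pattern that matches any keyword as a word prefix."""
    alternation = "|".join(re.escape(k) for k in keywords)
    # \b anchors the start of a word; \w* lets "ethic" also match "ethical", "ethics".
    return re.compile(r"\b(?:" + alternation + r")\w*", re.IGNORECASE)


def matched_keywords(text, pattern):
    """Return the distinct matched words in a document, lowercased and sorted."""
    return sorted({m.group(0).lower() for m in pattern.finditer(text)})


def filter_documents(docs, keywords=ETHICS_KEYWORDS):
    """Keep only documents with at least one keyword hit.

    `docs` maps a document name to its raw text; the result maps each
    retained name to the words that triggered the match.
    """
    pattern = build_pattern(keywords)
    candidates = {}
    for name, text in docs.items():
        hits = matched_keywords(text, pattern)
        if hits:
            candidates[name] = hits
    return candidates


if __name__ == "__main__":
    sample = {
        "model_card.md": "This model may exhibit Bias and raises ethical risks.",
        "install.md": "Install the package with pip.",
    }
    print(filter_documents(sample))
```

A match only marks a document as a candidate. Keyword hits can be false positives, which is consistent with the study narrowing 2,347 initial documents down to 265 relevant ones, so the matched words are returned alongside each document to support later review.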

Paper Structure

This paper contains 22 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Data Collection Steps
  • Figure 2: Top-10 False Positive Keywords
  • Figure 3: Pairwise Document Cosine Distance
  • Figure 4: Steps for Coding Process
  • Figure 5: Coding Example for Raw Document (databricks/dolly-v2.7b)