Blockchain-Enabled Accountability in Data Supply Chain: A Data Bill of Materials Approach

Yue Liu, Dawen Zhang, Boming Xia, Julia Anticev, Tunde Adebayo, Zhenchang Xing, Moses Machao

TL;DR

This paper introduces DataBOM, a Data Bill of Materials framework, to address traceability, verifiability, and reproducibility challenges in AI data supply chains. It maps DataBOM onto a three-tier blockchain architecture with on-chain registries and off-chain data repositories, enabled by smart contracts and identity services. The authors define an interaction protocol and minimal metadata requirements, and validate feasibility and performance through a case study and throughput/latency experiments. The work contributes a novel architecture, protocol, and evaluation demonstrating that blockchain-based DataBOM can improve accountability in data-intensive AI development, with future directions toward automated metadata extraction and integration with SBOM-based AI governance.

Abstract

In the era of advanced artificial intelligence, highlighted by large-scale generative models like GPT-4, ensuring the traceability, verifiability, and reproducibility of datasets throughout their lifecycle is paramount for research institutions and technology companies. These organisations increasingly rely on vast corpora to train and fine-tune advanced AI models, resulting in intricate data supply chains that demand effective data governance mechanisms. The challenge intensifies as diverse stakeholders use assorted tools, often without adequate measures to ensure the accountability of data and the reliability of outcomes. In this study, we adapt the concept of the "Software Bill of Materials" to data governance and management to address these challenges, and introduce the "Data Bill of Materials" (DataBOM), which captures the dependency relationships between datasets and stakeholders by storing specific metadata. We demonstrate a platform architecture for providing blockchain-based DataBOM services, present the interaction protocol for stakeholders, and discuss the minimal requirements for DataBOM metadata. The proposed solution is evaluated for feasibility through a case study and for performance through quantitative analysis.

Paper Structure

This paper contains 13 sections and 5 figures.

Figures (5)

  • Figure 1: Platform architecture for blockchain-based DataBOM.
  • Figure 2: Interaction protocol for blockchain-based DataBOM.
  • Figure 3: Data supply chain in research projects.
  • Figure 4: Throughput for two RESTful APIs.
  • Figure 5: Response time for two RESTful APIs.