Leveraging Ontologies to Document Bias in Data

Mayra Russo, Maria-Esther Vidal

TL;DR

The paper addresses the lack of a formal, machine-readable description of biases in ML pipelines. It introduces Doc-BiasO, an ontology that aggregates bias concepts and their measurement across datasets, models, and tasks, reusing existing Semantic Web vocabularies to foster interoperability. The authors provide a design with layered architecture, competency questions, and an instantiation example (popularity bias in recommender systems), plus an automatic evaluation showing syntactic correctness, logical coherence, and reasonable coverage. The work aims to support trustworthy AI by clarifying terminology, enabling bias-aware documentation, and guiding future standardization.

Abstract

Machine Learning (ML) systems are capable of reproducing and often amplifying undesired biases. This underscores the importance of practices that enable the study and understanding of the intrinsic characteristics of ML pipelines, and it has prompted the emergence of documentation frameworks built on the idea that "any remedy for bias starts with awareness of its existence". However, a resource that can formally describe these pipelines in terms of the biases detected is still missing. To fill this gap, we present the Doc-BiasO ontology, a resource that aims to create an integrated vocabulary of biases defined in the fair-ML literature and their measures, as well as to incorporate relevant terminology and the relationships between them. Following ontology engineering best practices, we re-use existing vocabularies on machine learning and AI to foster knowledge sharing and interoperability between the actors concerned with its research, development, and regulation, among others. Overall, our main objective is to help clarify existing terminology in bias research as it rapidly expands to all areas of AI, and to improve the interpretation of bias in data and its downstream impact.
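
To make the idea of a machine-readable bias description more concrete, the following is a minimal sketch of how a documented bias and its measure could be stored as subject-predicate-object triples and then queried in the style of a competency question. It uses only plain Python. Every identifier in it (the `ex:` prefix, `ex:hasBias`, `ex:measuredBy`, `ex:MovieRatings`, `ex:ExampleMeasure`) is a hypothetical placeholder chosen for illustration and is not taken from the Doc-BiasO vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement, as in an RDF graph."""
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers for illustration only; none of these names
# come from the Doc-BiasO ontology or the vocabularies it re-uses.
GRAPH = {
    Triple("ex:PopularityBias", "rdf:type", "ex:Bias"),
    Triple("ex:MovieRatings", "rdf:type", "ex:Dataset"),
    Triple("ex:MovieRatings", "ex:hasBias", "ex:PopularityBias"),
    Triple("ex:ExampleMeasure", "rdf:type", "ex:BiasMeasure"),
    Triple("ex:PopularityBias", "ex:measuredBy", "ex:ExampleMeasure"),
}


def objects(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(
        t.obj for t in graph if t.subject == subject and t.predicate == predicate
    )


def biases_with_measures(graph, dataset):
    """Competency-question style lookup: which biases are documented for
    the dataset, and which measures are recorded for each of them?"""
    return {
        bias: objects(graph, bias, "ex:measuredBy")
        for bias in objects(graph, dataset, "ex:hasBias")
    }


if __name__ == "__main__":
    print(biases_with_measures(GRAPH, "ex:MovieRatings"))
```

In practice an ontology like this would be serialized in a Semantic Web format and queried with a dedicated query language; the in-memory set above only illustrates the shape of the data and of the question being asked.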

Paper Structure

This paper contains 12 sections, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: Types of Bias. Core categories of bias in relation to AI systems as per the NIST report 933006.
  • Figure 2: Conceptualization of the Doc-BiasO Ontology. Core concepts in the ontology are represented as classes in color-coded boxes that indicate their originating vocabularies, while object properties are drawn as directed arrows between classes. Purple boxes mark the relevant and prominently re-used vocabularies implemented in the representation of the universe of discourse.
  • Figure 3: