Situated Ground Truths: Enhancing Bias-Aware AI by Situating Data Labels with SituAnnotate

Delfina Sol Martinez Pandiani, Valentina Presutti

TL;DR

SituAnnotate addresses cultural bias in data labeling by introducing a situated grounding ontology that encodes contextual factors (annotator, time, place, remuneration, roles) alongside labels. It uses scenario-based requirement elicitation aligned with the Dolce Ultralight ontology to enable robust, machine-readable reasoning about annotation contexts. A case study in image annotation demonstrates the IAS module's ability to trace and compare meanings across annotation situations, with SPARQL-based evaluation showing all competency questions passing and supporting human-readable explanations. The approach improves interpretability and bias-awareness in AI data pipelines, providing provenance and contextual insight to support ethical, adaptable downstream systems.

Abstract

In the contemporary world of AI and data-driven applications, supervised machines often derive their understanding, which they mimic and reproduce, through annotations--typically conveyed in the form of words or labels. However, such annotations are often divorced from or lack contextual information, and as such hold the potential to inadvertently introduce biases when subsequently used for training. This paper introduces SituAnnotate, a novel ontology explicitly crafted for 'situated grounding,' aiming to anchor the ground truth data employed in training AI systems within the contextual and culturally-bound situations from which those ground truths emerge. SituAnnotate offers an ontology-based approach to structured and context-aware data annotation, addressing potential bias issues associated with isolated annotations. Its representational power encompasses situational context, including annotator details, timing, location, remuneration schemes, annotation roles, and more, ensuring semantic richness. Aligned with the foundational Dolce Ultralight ontology, it provides a robust and consistent framework for knowledge representation. As a method to create, query, and compare label-based datasets, SituAnnotate empowers downstream AI systems to undergo training with explicit consideration of context and cultural bias, laying the groundwork for enhanced system interpretability and adaptability, and enabling AI models to align with a multitude of cultural contexts and viewpoints.
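
The idea of storing each label together with the situation that produced it can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: SituAnnotate itself is an ontology queried with SPARQL, and every class, field, and value below (`AnnotationSituation`, `Annotation`, `labels_by_context`, the sample image and labels) is hypothetical and not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnnotationSituation:
    """Context in which labels were produced (hypothetical field names)."""
    situation_id: str
    annotator: str
    country: str
    date: str                          # ISO 8601 date string
    remuneration: Optional[str] = None


@dataclass(frozen=True)
class Annotation:
    """A label attached to an entity, playing a role within one situation."""
    entity: str
    label: str
    role: str
    situation: AnnotationSituation


def labels_by_context(annotations, entity, key):
    """Group the labels given to `entity` by one attribute of the situation."""
    grouped = defaultdict(set)
    for ann in annotations:
        if ann.entity == entity:
            grouped[getattr(ann.situation, key)].add(ann.label)
    return dict(grouped)


# Invented sample data: two situations annotating the same image.
sit_a = AnnotationSituation("s1", "annotator_1", "IT", "2023-05-01", "paid_per_label")
sit_b = AnnotationSituation("s2", "annotator_2", "AR", "2023-06-12")

annotations = [
    Annotation("img_001", "celebration", "evoked_concept", sit_a),
    Annotation("img_001", "protest", "evoked_concept", sit_b),
    Annotation("img_001", "crowd", "depicted_object", sit_b),
]

by_country = labels_by_context(annotations, "img_001", "country")
```

Because the situation travels with every label, a disagreement such as "celebration" versus "protest" for the same image can be traced back to who annotated it, where, and when, rather than being averaged away as noise.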

Paper Structure (48 sections, 7 figures)

This paper contains 48 sections, 7 figures.

Figures (7)

  • Figure 1: SituAnnotate at a glance: Core concepts connecting annotations, annotation situations, and annotators.
  • Figure 2: A detailed view of the SituAnnotate Ontology's core building block, the AnnotationSituation class.
  • Figure 3: Deep Dive into the Annotation class: annotation instances connect annotated entities to lexical entries by fulfilling a specific AnnotationRole in a certain AnnotationSituation.
  • Figure 4: Deep Dive into the Annotator class: SituAnnotate allows the formal representation of different types of annotators and relevant characteristics that may influence their annotation choices.
  • Figure 5: Specialization of the SituAnnotate pattern specifically for Image Annotation Situations (IAS), crucial in the field of Computer Vision (CV). Further modular specializations can be applied to capture details specific to certain types of annotation situations, such as object detection.
  • ...and 2 more figures