Situated Ground Truths: Enhancing Bias-Aware AI by Situating Data Labels with SituAnnotate
Delfina Sol Martinez Pandiani, Valentina Presutti
TL;DR
SituAnnotate addresses cultural bias in data labeling by introducing an ontology for situated grounding, which encodes contextual factors (annotator, time, place, remuneration, roles) alongside the labels themselves. The ontology is designed through scenario-based requirement elicitation and aligned with the Dolce Ultralight ontology, so that annotation contexts can be reasoned about consistently and in machine-readable form. A case study in image annotation demonstrates how the IAS module traces and compares meanings across annotation situations; in a SPARQL-based evaluation, all competency questions pass, and the query results support human-readable explanations. The approach improves interpretability and bias-awareness in AI data pipelines by supplying the provenance and contextual information that ethical, adaptable downstream systems need.
Abstract
In contemporary AI and data-driven applications, supervised systems often derive the understanding they mimic and reproduce from annotations, typically conveyed as words or labels. However, such annotations often lack contextual information, and can therefore inadvertently introduce biases when they are later used for training. This paper introduces SituAnnotate, a novel ontology explicitly crafted for 'situated grounding': it aims to anchor the ground truth data used to train AI systems in the contextual and culturally bound situations from which those ground truths emerge. SituAnnotate offers an ontology-based approach to structured, context-aware data annotation, addressing the bias issues that can arise from isolated annotations. It can represent the situational context of an annotation, including annotator details, timing, location, remuneration schemes, annotation roles, and more, which keeps the resulting data semantically rich. Aligned with the foundational Dolce Ultralight ontology, it provides a robust and consistent framework for knowledge representation. As a method to create, query, and compare label-based datasets, SituAnnotate allows downstream AI systems to be trained with explicit consideration of context and cultural bias. This lays the groundwork for improved system interpretability and adaptability, and enables AI models to align with a multitude of cultural contexts and viewpoints.
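To make the idea of a situated label concrete, the following is a minimal Python sketch, not the SituAnnotate ontology itself. All class names, field names, and sample values (AnnotationSituation, Annotation, labels_by_situation, contested_labels, "img-42", and so on) are hypothetical and chosen only for illustration. The actual ontology is expressed in a semantic web vocabulary and queried with SPARQL, which this sketch does not attempt to reproduce. It only shows how keeping the annotation situation attached to each label makes it possible to compare labels across contexts.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnnotationSituation:
    """Context in which labels were produced (hypothetical field names)."""
    situation_id: str
    annotator: str
    country: str
    year: int
    remuneration: Optional[str] = None  # e.g. "volunteer", "paid per image"


@dataclass(frozen=True)
class Annotation:
    """A label that stays tied to the situation it emerged from."""
    item_id: str
    label: str
    role: str  # e.g. "detected object", "evoked concept"
    situation: AnnotationSituation


def labels_by_situation(annotations, item_id):
    """Group the labels given to one item by the situation that produced them."""
    grouped = defaultdict(set)
    for ann in annotations:
        if ann.item_id == item_id:
            grouped[ann.situation.situation_id].add(ann.label)
    return dict(grouped)


def contested_labels(annotations, item_id):
    """Return labels that appear in some situations for an item but not in all."""
    grouped = labels_by_situation(annotations, item_id)
    if not grouped:
        return set()
    all_labels = set().union(*grouped.values())
    shared = set.intersection(*grouped.values())
    return all_labels - shared


# Illustrative, made-up data: the same image annotated in two situations.
s1 = AnnotationSituation("sit-1", "annotator-A", "IT", 2021, "volunteer")
s2 = AnnotationSituation("sit-2", "annotator-B", "AR", 2023, "paid per image")
annotations = [
    Annotation("img-42", "celebration", "evoked concept", s1),
    Annotation("img-42", "crowd", "detected object", s1),
    Annotation("img-42", "protest", "evoked concept", s2),
    Annotation("img-42", "crowd", "detected object", s2),
]

print(labels_by_situation(annotations, "img-42"))
print(contested_labels(annotations, "img-42"))
```

In this sketch, a label on which the two situations disagree is surfaced together with the context that produced it, rather than being silently merged into a single ground truth.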
