The Lifecycle of "Facts": A Survey of Social Bias in Knowledge Graphs
Angelie Kraft, Ricardo Usbeck
TL;DR
We survey the sources, measurement, and mitigation of social bias across the knowledge graph (KG) lifecycle, from crowd-sourced triples and hand-crafted ontologies to automated extraction and KG embeddings. The paper synthesizes measurement methods such as descriptive statistics, projection-based and update-based bias measures, and analogy tests, and discusses mitigation strategies such as data balancing, adversarial learning, and hard debiasing, highlighting their limitations and validation gaps. It emphasizes the downstream impact on link prediction and real-world tasks, and argues for transparency, representativeness, and multi-faceted evaluation to reduce harms. The findings underscore that KGs often reflect and amplify historical biases, and call for principled, context-aware approaches and ongoing governance to enable fairer, more reliable knowledge representations.
Abstract
Knowledge graphs are increasingly used in a wide range of downstream tasks or to augment statistical models in order to improve factuality. However, social biases are engraved in these representations and propagate downstream. We conducted a critical analysis of the literature concerning biases at different steps of the knowledge graph lifecycle. We investigated the factors that introduce bias, as well as the biases subsequently exhibited by knowledge graphs and their embeddings. We discuss the limitations of existing measurement and mitigation strategies and propose paths forward.
