The Lifecycle of "Facts": A Survey of Social Bias in Knowledge Graphs

Angelie Kraft; Ricardo Usbeck

TL;DR

We address the problem of social bias in knowledge graphs by surveying bias sources, measurement, and mitigation across the KG lifecycle—from crowd-sourced triples and hand-crafted ontologies to automated extraction and KG embeddings. The paper synthesizes methods such as descriptive statistics, projection-based and update-based bias measures, and analogy tests, and discusses mitigation strategies like data balancing, adversarial learning, and hard debiasing, highlighting limitations and validation gaps. It emphasizes the downstream impact on link prediction and real-world tasks, and argues for transparency, representativeness, and multi-faceted evaluation to reduce harms. The findings underscore that KGs often reflect and amplify historical biases, calling for principled, context-aware approaches and ongoing governance to enable fairer, more reliable knowledge representations.
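The phrase "projection-based bias measure" can be made concrete with a minimal sketch of the general idea. It assumes that entity embeddings are plain vectors and that a bias direction is defined as the difference between two anchor embeddings, for example the embeddings of two gender values. The function names, toy vectors, and entity labels are hypothetical illustrations. This is not the exact procedure of any specific work covered by the survey.

```python
import numpy as np


def bias_direction(anchor_a, anchor_b):
    """Hypothetical helper: unit vector pointing from anchor_b to anchor_a."""
    d = np.asarray(anchor_a, dtype=float) - np.asarray(anchor_b, dtype=float)
    norm = np.linalg.norm(d)
    if norm == 0:
        raise ValueError("anchor embeddings must differ to define a direction")
    return d / norm


def projection_scores(entity_embs, direction):
    """Hypothetical helper: scalar projection of each entity onto the direction."""
    return {name: float(np.dot(np.asarray(vec, dtype=float), direction))
            for name, vec in entity_embs.items()}


# Toy, made-up embeddings standing in for trained KG embeddings.
male = np.array([1.0, 0.0, 0.0])
female = np.array([0.0, 1.0, 0.0])
entities = {
    "engineer": np.array([0.9, 0.1, 0.3]),
    "nurse": np.array([0.2, 0.8, 0.1]),
}

direction = bias_direction(male, female)
scores = projection_scores(entities, direction)
print(scores)
```

A positive score places an entity nearer the first anchor along this axis, and a negative score places it nearer the second. Comparing such scores across, say, occupation entities is one way a projection-based measure can surface stereotypical associations in an embedding space.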

Abstract

Knowledge graphs are increasingly used in a plethora of downstream tasks or in the augmentation of statistical models to improve factuality. However, social biases are engraved in these representations and propagate downstream. We conducted a critical analysis of literature concerning biases at different steps of a knowledge graph lifecycle. We investigated factors introducing bias, as well as the biases that are rendered by knowledge graphs and their embedded versions afterward. Limitations of existing measurement and mitigation strategies are discussed and paths forward are proposed.

Paper Structure (33 sections, 1 figure, 1 table)

Introduction
Notes on Bias, Fairness, and Factuality
Bias
Unwanted Biases and Harms
Factuality versus Fairness
Entering the Lifecycle: Bias in Knowledge Graph Creation
Triples: Crowd-Sourcing of Facts
Ontologies: Manual Creation of Rules
Extraction: Automated Extraction of Information
A Note on Reporting Bias
Bias in Knowledge Graphs
Descriptive Statistics
Semantic Polarity
Bias in Knowledge Graph Embeddings
Stereotypical Analogies
...and 18 more sections

Figures (1)

Figure 1: Overview of the knowledge graph lifecycle as discussed in this paper. Exclamation marks indicate factors that introduce or amplify bias. We examine bias-inducing factors of triple crowd-sourcing, hand-crafted ontologies, and automated information extraction (Chapter 3), as well as the resulting social biases in KGs (Chapter 4) and KG embeddings, including approaches for measurement and mitigation (Chapter 5).
