Information Loss and Disparate Effects in Network Embeddings

Gabriel Chuang; Augustin Chaintreau

TL;DR

This work analyzes baseline network embeddings (those without fairness interventions) through the lens of stochastic block models (SBMs), showing that low-dimensional inner-product embeddings can lose information about the input graph. It characterizes three information-loss regimes for 2-block SBMs (dense, sparse, and intermediate) and proves that in the intermediate regime many distinct SBMs map to identical embeddings, forming equivalence classes whose slopes depend on community sizes and densities. The authors extend these ideas to higher-order SBMs via numerical characterizations, which show that identifiability hinges on density and PSD-like constraints. They also demonstrate that smaller or sparser communities are disproportionately densified in embeddings, leading to fairness-relevant disparities in tasks such as link prediction. Finally, they validate the theory on real data (Facebook100), connect the phenomenon to downstream tasks and fairness considerations, and discuss implications for selecting representations and for potential interventions.

Abstract

An extensive line of work studies fairness interventions for network embeddings, but less is known about their baseline behavior. In this work, we ask: how do baseline embeddings (without fairness interventions) produce disparate effects at the representation level? We analyze the asymptotic behavior of low-dimensional embeddings on stochastic block model (SBM) graphs, which encode both homophily and group structure. We characterize exact conditions under which embeddings cause information loss, showing that the amount of information loss depends directly on the graph's density and assortativity. Notably, very different graphs can produce identical embeddings in the limit, and this non-invertibility disproportionately affects smaller and sparser communities. As a result, simple downstream tasks, such as link prediction, introduce higher error rates for these communities, helping explain disparities widely observed in practice.
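To make the idea of information loss concrete, the following is a minimal sketch and not the authors' construction. It assumes, hypothetically, that an embedding models the edge probability between two nodes as the sigmoid of their inner product, and that all nodes in a block share one vector. Under that assumption, a 2-block SBM with within-block densities `p_in1`, `p_in2` and cross-block density `p_out` can be reproduced exactly only if the matrix of logits is positive semidefinite. In particular, a within-block density below 1/2 has a negative logit, while a vector's inner product with itself is never negative. All function and parameter names here are illustrative.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))


def is_exactly_representable(p_in1, p_out, p_in2, tol=1e-12):
    """2x2 PSD test on the logit block matrix [[a, b], [b, c]].

    Assumption (not from the source): edge probability = sigmoid(x_i . x_j).
    """
    a, b, c = logit(p_in1), logit(p_out), logit(p_in2)
    return a >= -tol and c >= -tol and a * c - b * b >= -tol


def block_vectors(p_in1, p_out, p_in2):
    """Cholesky factor of the logit matrix: one 2-D vector per block.

    Only valid when logit(p_in1) > 0 and the logit matrix is PSD.
    """
    a, b, c = logit(p_in1), logit(p_out), logit(p_in2)
    x1 = (math.sqrt(a), 0.0)
    x2 = (b / math.sqrt(a), math.sqrt(max(c - b * b / a, 0.0)))
    return x1, x2


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# A dense SBM (all densities above 1/2) passes the test and is recovered.
dense = (0.9, 0.6, 0.8)
x1, x2 = block_vectors(*dense)
recovered = (sigmoid(dot(x1, x1)), sigmoid(dot(x1, x2)), sigmoid(dot(x2, x2)))
print("dense representable:", is_exactly_representable(*dense))
print("dense recovered:", recovered)

# A sparse SBM fails the test: no inner-product vectors reproduce it exactly.
sparse = (0.3, 0.05, 0.1)
print("sparse representable:", is_exactly_representable(*sparse))
```

This check says only whether an exact fit exists under the assumed link function. It does not say which embedding a training procedure would converge to when no exact fit exists, which is the question the paper's analysis addresses.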
