Information Loss and Disparate Effects in Network Embeddings

Gabriel Chuang, Augustin Chaintreau

TL;DR

This work analyzes baseline network embeddings (those without fairness interventions) through the lens of stochastic block models (SBMs), showing that low-dimensional, inner-product embeddings can lose information about the input graph. It characterizes three information-loss regimes for 2-block SBMs (dense, sparse, and intermediate) and proves that in the intermediate regime many SBMs map to identical embeddings, forming equivalence classes whose slopes depend on community sizes and densities. The authors extend these ideas to SBMs with more than two blocks via numerical characterizations, revealing that identifiability hinges on density and on positive-semidefinite (PSD)-like constraints, and they demonstrate that embeddings disproportionately densify smaller or sparser communities, leading to fairness-relevant disparities in tasks such as link prediction. They validate the theory on real data (Facebook100), connect the phenomenon to downstream tasks and fairness considerations, and discuss implications for choosing representations and designing interventions.

Abstract

An extensive line of work studies fairness interventions for network embeddings, but less is known about their baseline behavior. In this work, we ask: how do baseline embeddings (without fairness interventions) produce disparate effects at the representation level? We analyze the asymptotic behavior of low-dimensional embeddings on stochastic block model (SBM) graphs, which encode both homophily and group structure. We characterize exact conditions under which embeddings cause information loss, showing that the amount of information loss depends directly on the graph's density and assortativity. Notably, very different graphs can produce identical embeddings in the limit, and this non-invertibility disproportionately affects smaller and sparser communities. As a result, simple downstream tasks, such as link prediction, introduce higher error rates for these communities, helping explain disparities widely observed in practice.
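To make the SBM parameterization concrete (block edge probabilities $[p,q;q,r]$ and community proportions $(a, 1-a)$, the notation of Theorem 4.1), here is a minimal sampler sketch in Python. The function names are hypothetical, and we assume a deterministic split in which the first $\mathrm{round}(a n)$ vertices form block 0; Definition 2.1 in the paper may assign blocks differently.

```python
import random


def sample_two_block_sbm(n, a, p, q, r, seed=None):
    """Sample an undirected 2-block SBM with block matrix [[p, q], [q, r]].

    Assumption (ours): the first round(a * n) vertices form block 0 and the
    rest form block 1. Each unordered pair i < j is joined independently
    with the probability for its pair of blocks.
    Returns (labels, edges), where edges is a set of (i, j) with i < j.
    """
    rng = random.Random(seed)
    size0 = round(a * n)
    labels = [0] * size0 + [1] * (n - size0)
    probs = [[p, q], [q, r]]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < probs[labels[i]][labels[j]]:
                edges.add((i, j))
    return labels, edges


def block_densities(labels, edges):
    """Empirical edge density for block pairs (0, 0), (0, 1), and (1, 1)."""
    sizes = [labels.count(0), labels.count(1)]
    possible = {
        (0, 0): sizes[0] * (sizes[0] - 1) // 2,
        (0, 1): sizes[0] * sizes[1],
        (1, 1): sizes[1] * (sizes[1] - 1) // 2,
    }
    observed = dict.fromkeys(possible, 0)
    for i, j in edges:
        observed[tuple(sorted((labels[i], labels[j])))] += 1
    return {k: observed[k] / possible[k] if possible[k] else 0.0 for k in possible}
```

For example, `sample_two_block_sbm(400, 0.25, 0.2, 0.05, 0.2, seed=0)` produces two groups with equal within-group density but sizes 100 and 300, the setting of Figure 2(a); `block_densities` then returns empirical densities close to 0.2, 0.05, and 0.2.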

Paper Structure

This paper contains 32 sections, 5 theorems, 40 equations, 10 figures, 1 algorithm.

Key Result

Theorem 4.1

Let $\mathcal{G}$ be a 2-block $\mathrm{SBM}\left(\begin{bmatrix} p & q \\ q & r \end{bmatrix}, (a, 1-a)\right)$, and let $\omega_i$ be the embeddings learned by Algorithm 1 at convergence. Then the kernel matrix $K(\mathcal{G}) \in \mathbb{R}^{n\times n}$ is block-constant, with block values $K_i \in \mathbb{R}$ that depend on $a, p, q, r$.
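One way to see why a block-constant inner-product kernel can lose information is sketched below under assumptions that are ours, not the paper's: that $K_{ij} = \omega_i \cdot \omega_j$, that a pair is scored as $\sigma(K_{ij})$, and that the loss is the cross-entropy of Definition 2.3. For a block pair with edge probability $P$, the expected loss $P\,\mathrm{softplus}(-k) + (1-P)\,\mathrm{softplus}(k)$ has derivative $\sigma(k) - P$ in $k$, so it is minimized at $k = \mathrm{logit}(P)$. Writing the block-constant kernel as $K = Z B Z^\top$, with $Z$ the $n \times 2$ block-indicator matrix, $K$ is PSD exactly when the $2 \times 2$ matrix $B$ is; hence the per-pair optima $\mathrm{logit}([p,q;q,r])$ can all be attained at once only if that matrix is PSD. The helper names are hypothetical, and this toy criterion is not the paper's characterization of $\Pi_d$, $\Pi_m$, $\Pi_s$.

```python
import math


def logit(x):
    """Inverse of the logistic sigmoid; requires 0 < x < 1."""
    return math.log(x / (1.0 - x))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def expected_pair_loss(k, prob):
    """Expected cross-entropy when a pair is scored sigmoid(k) and is an edge with probability prob."""
    return prob * softplus(-k) + (1.0 - prob) * softplus(k)


def target_block_matrix(p, q, r):
    """Per-block-pair minimizers of expected_pair_loss for the SBM [[p, q], [q, r]]."""
    return [[logit(p), logit(q)], [logit(q), logit(r)]]


def is_psd_2x2(m, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff its diagonal entries and determinant are all >= 0."""
    (k1, k2), (_, k3) = m
    return k1 >= -tol and k3 >= -tol and k1 * k3 - k2 * k2 >= -tol


def unconstrained_optimum_is_realizable(p, q, r):
    """True if the per-block-pair optima form a PSD matrix, i.e. a block-constant
    inner-product kernel (embedding dimension >= 2) can attain all of them at once."""
    return is_psd_2x2(target_block_matrix(p, q, r))
```

For instance, a dense assortative SBM such as $(p,q,r) = (0.9, 0.6, 0.9)$ passes this check, while $(0.3, 0.1, 0.3)$ fails because $\mathrm{logit}(0.3) < 0$ gives a negative diagonal entry. Under this toy criterion, any SBM with $p < 1/2$ or $r < 1/2$ fails for the same reason.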

Figures (10)

  • Figure 1: We characterize conditions under which the embedding process encodes full, partial, or no information about the input stochastic block model (SBM). In a dense regime $\Pi_d$, the embedding perfectly retains the information in the SBM. In a sparse regime $\Pi_s$, the embedding is degenerate and loses all information. In the intermediate regime $\Pi_m$, which covers most cases, community structure is encoded but only some edge-density information is retained; in this regime, very different SBMs map to the same representation in a way that leads to disparate group-level effects.
  • Figure 2: Two motivating examples. (a) Left: embedding two groups of equal density but different sizes yields a representation in which the smaller group appears much denser. (b) Right: similarly, sparser groups are made to appear denser.
  • Figure 3: Regions of 2-block SBM parameter space in which none, some, or all of the information is preserved in the embedding.
  • Figure 4: A 3-dimensional slice of the 6-dimensional parameter space of 3-block SBMs, showing SBMs with within-group edge probabilities of $(x,y,z)$ and between-group probability $0.3$. The regions of full, partial, and no information loss closely resemble the 2-block case.
  • Figure 5: Examples of equivalence classes in $(p,q,r)$ space for $a=0.66$.
  • ...and 5 more figures

Theorems & Definitions (11)

  • Definition 2.1: $k$-block Stochastic Block Model
  • Definition 2.2: Uniform Vertex Sampling
  • Definition 2.3: Cross-Entropy Loss
  • Theorem 4.1: Information Loss Regimes
  • Theorem 5.1: Identically-embedded 2-block SBMs
  • Corollary 5.2
  • Theorem 1.1: Information Loss Regimes
  • proof
  • Theorem 1.1: Identically-embedded 2-block SBMs
  • proof
  • ...and 1 more
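Definitions 2.2 and 2.3 can be illustrated with a generic training sketch. The specifics are our assumptions rather than the paper's Algorithm 1: each embedding scores a pair as $\sigma(\omega_i \cdot \omega_j)$, "uniform vertex sampling" is read as drawing two distinct vertices uniformly at random (so the expected loss is the plain average over pairs), and training is plain stochastic gradient descent. All function names are hypothetical.

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pair_loss(k, y):
    """Cross-entropy of predicting sigmoid(k) for a pair with label y in {0, 1}."""
    return y * softplus(-k) + (1 - y) * softplus(k)


def expected_loss(emb, edges):
    """Average pair loss over all pairs i < j; equals the expectation when a
    pair of distinct vertices is drawn uniformly at random."""
    n = len(emb)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            y = 1 if (i, j) in edges else 0
            total += pair_loss(dot(emb[i], emb[j]), y)
            count += 1
    return total / count


def sgd_step(emb, edges, rng, lr):
    """One stochastic gradient step on a uniformly sampled pair of distinct vertices."""
    i, j = rng.sample(range(len(emb)), 2)
    y = 1 if (min(i, j), max(i, j)) in edges else 0
    g = sigmoid(dot(emb[i], emb[j])) - y  # d(loss)/dk for k = emb[i] . emb[j]
    wi, wj = emb[i][:], emb[j][:]  # use pre-update values for both gradients
    emb[i] = [a - lr * g * b for a, b in zip(wi, wj)]
    emb[j] = [b - lr * g * a for a, b in zip(wi, wj)]
```

As a usage example, on two disjoint triangles with edges `{(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)}` and 2-dimensional embeddings initialized with `rng.gauss(0.0, 0.1)`, a few thousand calls to `sgd_step(emb, edges, rng, lr=0.1)` lower `expected_loss` below its value at initialization. With all-zero embeddings every pair is scored 0.5, so the loss equals $\log 2$.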