Debiasing Graph Representation Learning based on Information Bottleneck

Ziyi Zhang; Mingxuan Ouyang; Wanyu Lin; Hao Lan; Lei Yang

TL;DR

This work tackles fairness in graph representation learning without relying on adversarial training. It introduces GRAFair, a variational graph auto-encoder trained with a Conditional Fairness Bottleneck objective that minimizes sensitive information in node representations while preserving task-relevant information. By deriving tractable variational bounds and using a Gaussian posterior with the reparameterization trick, the approach achieves a favorable trade-off between utility and fairness without the instability of adversarial methods. Experiments on three real-world datasets (Bail, Credit and German) demonstrate improved fairness, robust performance, and competitive time efficiency across different GNN backbones. The study also discusses practical implications for deploying fair graph models and outlines extensions to multiple sensitive attributes and richer encoding schemes.

Abstract

Graph representation learning has shown superior performance in numerous real-world applications, such as finance and social networks. Nevertheless, most existing methods may make discriminatory predictions because they pay insufficient attention to fairness in their decision-making processes. This oversight has prompted a growing focus on fair representation learning. Among recent approaches, those based on adversarial learning often yield unstable or even counterproductive performance. To achieve fairness in a stable manner, we present the design and implementation of GRAFair, a new framework based on a variational graph auto-encoder. The crux of GRAFair is the Conditional Fairness Bottleneck, whose objective captures the trade-off between the utility of the representations and the sensitive information they contain. Applying variational approximation makes this optimization objective tractable. In particular, GRAFair can be trained, without adversarial training, to produce representations that are informative for downstream tasks while containing little sensitive information. Experiments on various real-world datasets demonstrate the effectiveness of the proposed method in terms of fairness, utility, robustness, and stability.
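The objective is only named on this page, so the following is a minimal sketch, under stated assumptions, of how a variational bound of this kind can become a per-node training loss. It assumes the Lagrangian form $I(X;Z\mid S) - \beta\, I(Y;Z\mid S)$. The compression term is then upper-bounded by $\mathrm{KL}(q(\mathbf z_i\mid\cdot)\,\|\,\mathcal N(\mathbf 0,\mathbf I))$ for a diagonal Gaussian posterior (valid when $Z$ depends on $S$ only through $X$), and the utility term is lower-bounded, up to a constant, by the log-likelihood of a decoder that receives both $\mathbf z_i$ and $s_i$. The logistic decoder, binary labels, and all function names are hypothetical; this is not GRAFair's released code.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    kl = 0.0
    for m, ls in zip(mu, log_sigma):
        sigma_sq = math.exp(2.0 * ls)
        # Closed form per dimension: 0.5 * (sigma^2 + mu^2 - 1 - log sigma^2)
        kl += 0.5 * (sigma_sq + m * m - 1.0 - 2.0 * ls)
    return kl


def decoder_prob(z, s, weights, bias):
    """Hypothetical logistic decoder: P(y = 1 | z, s) from the concatenation [z, s]."""
    features = list(z) + [s]
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def binary_cross_entropy(p, y, eps=1e-12):
    """Negative log-likelihood of a binary label y under Bernoulli(p)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def cfb_node_loss(mu, log_sigma, z, s, y, weights, bias, beta):
    """Assumed surrogate: KL compression term + beta * conditional prediction loss."""
    compression = gaussian_kl_to_standard_normal(mu, log_sigma)
    prediction = binary_cross_entropy(decoder_prob(z, s, weights, bias), y)
    return compression + beta * prediction
```

Under this assumed placement of $\beta$, larger values weight prediction more heavily relative to compression; where $\beta$ actually sits in GRAFair's objective should be checked against the paper.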

Paper Structure (36 sections, 24 equations, 4 figures, 6 tables, 1 algorithm)

Introduction
Related work
Graph neural networks
Fair representation learning on graphs
Adversarial learning
Preliminaries
Notations
Graph message passing
Information bottleneck
Conditional fairness bottleneck
Fairness definitions
Problem definition
Our Framework: GRAFair
Objective function
The solution to GRAFair
...and 21 more sections

Figures (4)

Figure 1: An illustration of gender bias in GNN for loan approval prediction.
Figure 2: An illustration of the proposed framework. GRAFair consists of two parts: an encoder and a decoder. The variational graph encoder maps the input graph data $\mathcal{G}$ to node representations $\mathbf Z$, learning the mean $\bm{\mu}_i$ and log standard deviation $\log{\bm{\sigma}_i}$ of each $\mathbf z_i$. By sampling $\bm \epsilon$ from a standard Gaussian distribution, the latent representation of node $i$ is obtained as $\mathbf z_i=\bm\mu_i+\bm\sigma_i \odot \bm\epsilon$. During training, the node representations $\mathbf Z$ sampled from the learned distribution, together with the sensitive attributes $\mathbf S$, are fed to the decoder, which uses them to predict the labels $\hat{\mathbf Y}$ for downstream tasks.
Figure 3: Time efficiency (in seconds) of different methods on the Bail, Credit and German datasets. Each value is the average training time per epoch.
Figure 4: Utility and fairness under different values of the hyper-parameter $\beta$ on the Bail dataset, with $\beta \in \{1,5,10,50,10^2,5\times 10^2,10^3,5\times 10^3, 10^4, 5\times 10^4\}$. Setting $\beta=10^3$ achieves a favorable trade-off between utility and fairness.
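Figure 2 describes the encoder only at the level of its outputs. The sketch below is a minimal stand-in, assuming one mean-aggregation message-passing step with self-loops followed by two hypothetical linear heads, to show how per-node $\bm\mu_i$ and $\log\bm\sigma_i$ could be produced and how the reparameterization $\mathbf z_i=\bm\mu_i+\bm\sigma_i\odot\bm\epsilon$ turns them into a sample. GRAFair is evaluated with several GNN backbones and this aggregation is not any of them specifically; the second head is treated as $\log\bm\sigma_i$, so $\bm\sigma_i=\exp(\log\bm\sigma_i)$.

```python
import math
import random


def mean_aggregate(features, neighbors):
    """One message-passing step: average each node's features with its neighbors' (self-loop included)."""
    out = []
    for i, x_i in enumerate(features):
        group = [x_i] + [features[j] for j in neighbors[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def linear(x, weight, bias):
    """y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def encode(features, neighbors, w_mu, b_mu, w_ls, b_ls):
    """Hypothetical variational encoder: aggregated features -> (mu_i, log_sigma_i) per node."""
    h = mean_aggregate(features, neighbors)
    return [(linear(h_i, w_mu, b_mu), linear(h_i, w_ls, b_ls)) for h_i in h]


def sample_latent(mu, log_sigma, rng):
    """Reparameterized draw z = mu + sigma * eps with eps ~ N(0, I) and sigma = exp(log_sigma)."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + math.exp(ls) * e for m, ls, e in zip(mu, log_sigma, eps)]


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    neighbors = [[1], [0, 2], [1]]  # adjacency list of a 3-node path graph
    identity = [[1.0, 0.0], [0.0, 1.0]]
    params = encode(features, neighbors, identity, [0.0, 0.0], identity, [-1.0, -1.0])
    rng = random.Random(0)
    Z = [sample_latent(mu, ls, rng) for mu, ls in params]
    print(Z)
```

Because the randomness enters only through $\bm\epsilon$, $\mathbf z_i$ is a deterministic, differentiable function of $\bm\mu_i$ and $\log\bm\sigma_i$, which is what allows gradients of the training loss to reach the encoder.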

Theorems & Definitions (3)

Definition 3.1
Definition 3.2
Definition 3.3
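The page lists Definitions 3.1 to 3.3 without their content. Work on the Bail, Credit and German datasets commonly reports group fairness as the statistical parity difference $\Delta_{SP}=|P(\hat y=1\mid s=0)-P(\hat y=1\mid s=1)|$ and the equal opportunity difference $\Delta_{EO}=|P(\hat y=1\mid y=1,s=0)-P(\hat y=1\mid y=1,s=1)|$, where lower is fairer. Assuming these are among the definitions used, a minimal sketch for binary predictions and a binary sensitive attribute follows; the function names are illustrative.

```python
def _rate(values):
    """Fraction of positive predictions in a non-empty group."""
    if not values:
        raise ValueError("empty group: the rate is undefined")
    return sum(values) / len(values)


def statistical_parity_difference(y_pred, s):
    """|P(y_hat = 1 | s = 0) - P(y_hat = 1 | s = 1)|."""
    group0 = [p for p, g in zip(y_pred, s) if g == 0]
    group1 = [p for p, g in zip(y_pred, s) if g == 1]
    return abs(_rate(group0) - _rate(group1))


def equal_opportunity_difference(y_pred, y_true, s):
    """|P(y_hat = 1 | y = 1, s = 0) - P(y_hat = 1 | y = 1, s = 1)|, i.e. the true-positive-rate gap."""
    group0 = [p for p, t, g in zip(y_pred, y_true, s) if g == 0 and t == 1]
    group1 = [p for p, t, g in zip(y_pred, y_true, s) if g == 1 and t == 1]
    return abs(_rate(group0) - _rate(group1))


if __name__ == "__main__":
    y_pred = [1, 0, 1, 1, 0, 0]
    y_true = [1, 1, 0, 1, 1, 0]
    s = [0, 0, 0, 1, 1, 1]
    print(statistical_parity_difference(y_pred, s))         # 2/3 - 1/3, about 0.333
    print(equal_opportunity_difference(y_pred, y_true, s))  # 0.5 - 0.5 = 0.0
```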
