Fairness Through Controlled (Un)Awareness in Node Embeddings

Dennis Vetter, Jasper Forth, Gemma Roig, Holger Dell

TL;DR

This study analyzes how CrossWalk’s hyperparameters $\alpha$ and $\beta$, together with node2vec’s $p$ and $q$, determine how easily sensitive attributes can be inferred from node embeddings, enabling either obfuscation or amplification of such signals to support different fairness notions. Using Pokec-based subgraphs with controlled geography, the authors evaluate Awareness, Disparity, and Performance via Label Propagation and 25-fold cross-validation. They show that high $\alpha$ and $\beta$ induce low awareness (better procedural justice) but can increase disparity and degrade performance on non-sensitive attributes. Conversely, low $\alpha$ and $\beta$ raise awareness (better distributive justice) at the cost of fairness across groups and overall embedding quality. The work provides parameterization guidelines and an integrated implementation to compare CrossWalk with node2vec, highlighting practical trade-offs and the potential for misuse if sensitive attributes are exploited.
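For readers unfamiliar with the role of $p$ and $q$, the standard node2vec walk bias can be sketched as follows. This is a minimal illustration of the general node2vec transition rule, not the authors' implementation; the toy graph and the function names `node2vec_weights` and `biased_walk` are hypothetical, and CrossWalk's $\alpha$/$\beta$ edge reweighting is deliberately left out.

```python
import random


def node2vec_weights(graph, prev, curr, p, q):
    """Unnormalized weights for the next step from `curr`, having come from `prev`."""
    weights = {}
    for nxt in graph[curr]:
        if nxt == prev:
            weights[nxt] = 1.0 / p   # step back to the previous node
        elif nxt in graph[prev]:
            weights[nxt] = 1.0       # stay within the neighborhood of `prev`
        else:
            weights[nxt] = 1.0 / q   # move further away from `prev`
    return weights


def biased_walk(graph, start, length, p, q, rng):
    """Sample one second-order random walk of at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        curr = walk[-1]
        neighbors = sorted(graph[curr])
        if not neighbors:
            break                    # dead end: stop early
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        w = node2vec_weights(graph, walk[-2], curr, p, q)
        step = rng.choices(neighbors, weights=[w[n] for n in neighbors])[0]
        walk.append(step)
    return walk


# Hypothetical toy graph as an adjacency dict (undirected, unweighted).
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

if __name__ == "__main__":
    print(node2vec_weights(GRAPH, "a", "b", p=2.0, q=0.5))
    print(biased_walk(GRAPH, "a", 6, p=2.0, q=0.5, rng=random.Random(0)))
```

A large $p$ discourages immediate backtracking and a small $q$ favors outward exploration, so these two parameters change which neighborhoods the walks (and therefore the embeddings) capture.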

Abstract

Graph representation learning is central to the application of machine learning (ML) models to complex graphs, such as social networks. Ensuring 'fair' representations is essential, due to the societal implications and the use of sensitive personal data. In this paper, we demonstrate how the parametrization of the CrossWalk algorithm influences the ability to infer sensitive attributes from node embeddings. By fine-tuning hyperparameters, we show that it is possible to either significantly enhance or obscure the detectability of these attributes. This functionality offers a valuable tool for improving the fairness of ML systems utilizing graph embeddings, making them adaptable to different fairness paradigms.

Paper Structure (14 sections, 6 figures, 2 tables)

This paper contains 14 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Impact of low awareness and high awareness configurations of the CrossWalk algorithm on node embeddings for a subgraph from the semi-distinct category, with sensitive attribute 'location'. The embeddings are visualized using t-SNE for dimensionality reduction, and the node colors correspond to sensitive attribute class.
  • Figure 2: Mean awareness over subgraphs for sensitive attribute 'location'. Error bars show the range of values over all node2vec parametrizations. The low and high awareness configurations consistently shift awareness below or above the node2vec baseline, respectively.
  • Figure 3: Classification F1 score for sensitive attribute 'location' by relative group size, for groups in all subgraphs. Low awareness makes classification of small groups virtually impossible; differences in F1 score between low and high awareness are more pronounced for smaller groups.
  • Figure 4: CrossWalk's impact on disparity of sensitive attribute 'location'. The low awareness configuration tends to lead to higher disparity, indicating an unequal impact on the different groups.
  • Figure 5: Classification F1 score for the control attribute 'age' by relative group size of groups in all subgraphs. Influencing embeddings for lower or higher awareness of the sensitive attribute makes classifying the control attribute more difficult.
  • ...and 1 more figure
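
The per-group quantities named in the captions above (F1 score by relative group size, and a disparity between groups) can be illustrated with a small sketch. The paper's exact definitions are not given in this excerpt, so the following rests on assumptions: per-group F1 is computed one-vs-rest, and disparity is taken to be the population variance of the per-group scores. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from statistics import pvariance


def per_group_f1(y_true, y_pred):
    """One-vs-rest F1 score for each group present in `y_true`."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def relative_group_sizes(y_true):
    """Fraction of all nodes that belong to each group."""
    counts = Counter(y_true)
    total = len(y_true)
    return {label: n / total for label, n in counts.items()}


def disparity(scores):
    """ASSUMPTION: disparity as the population variance of per-group scores."""
    return pvariance(list(scores.values()))


if __name__ == "__main__":
    # Hypothetical predictions of a sensitive attribute from embeddings.
    y_true = ["north"] * 6 + ["south"] * 3 + ["east"]
    y_pred = ["north"] * 6 + ["south", "south", "north"] + ["north"]
    scores = per_group_f1(y_true, y_pred)
    sizes = relative_group_sizes(y_true)
    for label in scores:
        print(label, round(sizes[label], 2), round(scores[label], 3))
    print("disparity:", round(disparity(scores), 4))
```

Reporting the score per group rather than a single aggregate is what makes an effect like the one in Figure 3 visible, where small groups are hit much harder than large ones.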