Fairness Through Controlled (Un)Awareness in Node Embeddings
Dennis Vetter, Jasper Forth, Gemma Roig, Holger Dell
TL;DR
This study analyzes how CrossWalk’s hyperparameters $\alpha$ and $\beta$, together with node2vec’s $p$ and $q$, affect how easily sensitive attributes can be inferred from node embeddings, so that such signals can be either obfuscated or amplified to support different fairness notions. Using Pokec-based subgraphs with controlled geography, the authors evaluate Awareness, Disparity, and Performance via Label Propagation and 25-fold cross-validation. They show that high $\alpha$ and $\beta$ induce low awareness (better procedural justice) but can increase disparity and degrade performance on non-sensitive attributes. Conversely, low $\alpha$ and $\beta$ raise awareness (better distributive justice) at the cost of fairness across groups and overall embedding quality. The work provides parameterization guidelines and an integrated implementation for comparing CrossWalk with node2vec, highlighting practical trade-offs and the potential for misuse if sensitive attributes are exploited.
Abstract
Graph representation learning is central to the application of machine learning (ML) models to complex graphs, such as social networks. Ensuring `fair' representations is essential because of the societal implications and the use of sensitive personal data. In this paper, we demonstrate how the parametrization of the \emph{CrossWalk} algorithm influences the ability to infer sensitive attributes from node embeddings. By fine-tuning hyperparameters, we show that it is possible to either significantly enhance or obscure the detectability of these attributes. This functionality offers a valuable tool for improving the fairness of ML systems that use graph embeddings, making them adaptable to different fairness paradigms.
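
For orientation, the node2vec parameters $p$ and $q$ mentioned in the TL;DR bias a second-order random walk: a step back to the previous node is weighted $1/p$, a step to a common neighbor of the previous node is weighted $1$, and any other step is weighted $1/q$. The following is a minimal sketch of that standard node2vec bias on an unweighted toy graph. The function names, the toy graph, and the parameter values are illustrative assumptions and are not taken from the paper's implementation; CrossWalk's own edge reweighting via $\alpha$ and $\beta$ is not reproduced here.

```python
import random


def node2vec_weights(graph, prev, cur, p, q):
    """Unnormalized node2vec transition weights out of `cur`,
    given that the walk arrived at `cur` from `prev`."""
    weights = {}
    for nxt in graph[cur]:
        if nxt == prev:
            weights[nxt] = 1.0 / p      # return to the previous node
        elif nxt in graph[prev]:
            weights[nxt] = 1.0          # stay close: common neighbor of prev
        else:
            weights[nxt] = 1.0 / q      # move away from prev
    return weights


def biased_walk(graph, start, length, p, q, rng=None):
    """Sample one walk of at most `length` nodes with the p/q bias."""
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])
        if not neighbors:
            break
        if len(walk) == 1:
            # No previous node yet, so the first step is uniform.
            walk.append(rng.choice(neighbors))
            continue
        w = node2vec_weights(graph, walk[-2], cur, p, q)
        walk.append(rng.choices(neighbors, weights=[w[n] for n in neighbors])[0])
    return walk


# Hypothetical toy graph as an adjacency dict (undirected, unweighted).
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

step_weights = node2vec_weights(toy_graph, prev="a", cur="b", p=2.0, q=0.5)
walk = biased_walk(toy_graph, start="a", length=5, p=2.0, q=0.5)
```

With these illustrative values ($p = 2$, $q = 0.5$), a walk that reached `b` from `a` is most likely to continue outward to `d`, less likely to move to the common neighbor `c`, and least likely to return to `a`.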
