Trustworthy Representation Learning via Information Funnels and Bottlenecks
João Machado de Freitas, Bernhard C. Geiger
TL;DR
Trustworthy representation learning requires balancing utility, fairness, and privacy. The authors introduce CPFSI, a Conditional Privacy Funnel with Side-Information, and derive amortized variational bounds to optimize a multi-objective Lagrangian that minimizes the information the representation carries about the sensitive attribute while preserving information about the input and utility for a downstream task. CPFSI extends prior information-theoretic objectives (IB, IBSI, CPF, CFB) and supports both fully supervised and semi-supervised learning on tabular data; at inference, the sensitive attribute can be intervened on for counterfactual fairness. Empirical results on Adult, Dutch, Credit, and COMPAS show that CPFSI achieves favorable utility-invariance-fidelity trade-offs, often outperforming baselines, and yields practical fairness gains with relatively small labeled datasets. The work provides a principled framework for robust, fair representations in data-scarce settings and suggests future extensions to domain adaptation and broader modalities.
Abstract
Ensuring trustworthiness in machine learning -- by balancing utility, fairness, and privacy -- remains a critical challenge, particularly in representation learning. In this work, we investigate a family of closely related information-theoretic objectives, including information funnels and bottlenecks, designed to extract invariant representations from data. We introduce the Conditional Privacy Funnel with Side-information (CPFSI), a novel formulation within this family, applicable in both fully and semi-supervised settings. Given the intractability of these objectives, we derive neural-network-based approximations via amortized variational inference. We systematically analyze the trade-offs between utility, invariance, and representation fidelity, offering new insights into the Pareto frontiers of these methods. Our results demonstrate that CPFSI effectively balances these competing objectives and frequently outperforms existing approaches. Furthermore, we show that intervening on sensitive attributes in CPFSI's predictive posterior enhances fairness while maintaining predictive performance. Finally, we focus on the real-world applicability of these approaches, particularly for learning robust and fair representations from tabular datasets in data-scarce environments -- a modality where these methods are often especially relevant.
