Separation-Utility Pareto Frontier: An Information-Theoretic Characterization

Shizhou Xu

TL;DR

This paper develops an information-theoretic framework for the separation–utility trade-off, showing that the optimal randomized frontier is the concave closure of the deterministic frontier and that the separation measure $v$ is precisely captured by the conditional mutual information $I(U;Z|Y)$. It introduces a direct, gradient-friendly CMI regularizer with a gradient-normalization scheme and a differentiable plug-in estimator, accompanied by finite-sample bias and concentration guarantees. The approach yields smoother, more stable Pareto frontiers and improved deployment metrics across four benchmarks (Adult, COMPAS, Bank, CelebA) while preserving or enhancing utility, and it demonstrates that post-hoc thresholding can mask underlying posterior dependence. The work provides provable guarantees, scalable optimization, and practical guidance for enforcing separation in deep learning systems.

Abstract

We study the Pareto frontier (optimal trade-off) between utility and separation, a fairness criterion requiring predictive independence from sensitive attributes conditional on the true outcome. Through an information-theoretic lens, we prove a characterization of the utility-separation Pareto frontier, establish its concavity, and thereby prove the increasing marginal cost of separation in terms of utility. In addition, we characterize the conditions under which this trade-off becomes strict, providing a guide for trade-off selection in practice. Based on the theoretical characterization, we develop an empirical regularizer based on conditional mutual information (CMI) between predictions and sensitive attributes given the true outcome. The CMI regularizer is compatible with any deep model trained via gradient-based optimization and serves as a scalar monitor of residual separation violations, offering tractable guarantees during training. Finally, numerical experiments support our theoretical findings: across COMPAS, UCI Adult, UCI Bank, and CelebA, the proposed method substantially reduces separation violations while matching or exceeding the utility of established baseline methods. This study thus offers a provable, stable, and flexible approach to enforcing separation in deep learning.
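As a rough illustration of the quantity being regularized, the sketch below computes a hard-count plug-in estimate of $I(U;Z\mid Y)$ for discrete predictions $U$, sensitive attributes $Z$, and outcomes $Y$. It is not the paper's differentiable soft plug-in estimator: the function name `plugin_cmi` and the use of empirical frequencies are assumptions made for illustration only. It relies on the standard identity $I(U;Z\mid Y)=\sum_y p(y)\sum_{u,z} p(u,z\mid y)\log\frac{p(u,z\mid y)}{p(u\mid y)\,p(z\mid y)}$, which is zero exactly when $U$ and $Z$ are independent given $Y$, i.e. when separation holds.

```python
import numpy as np

def plugin_cmi(u, z, y):
    """Hard-count plug-in estimate of I(U; Z | Y) in nats.

    Illustrative only: u, z, y are equal-length arrays of discrete labels.
    """
    u, z, y = map(np.asarray, (u, z, y))
    n = len(y)
    cmi = 0.0
    for y_val in np.unique(y):
        mask = y == y_val
        p_y = mask.sum() / n              # empirical p(y)
        u_s, z_s = u[mask], z[mask]       # samples within this outcome stratum
        for u_val in np.unique(u_s):
            p_u = np.mean(u_s == u_val)   # empirical p(u | y)
            for z_val in np.unique(z_s):
                p_z = np.mean(z_s == z_val)                       # empirical p(z | y)
                p_uz = np.mean((u_s == u_val) & (z_s == z_val))   # empirical p(u, z | y)
                if p_uz > 0:
                    cmi += p_y * p_uz * np.log(p_uz / (p_u * p_z))
    return cmi

# Toy data: every (u, z) pair appears once per outcome, so separation holds.
y = [0, 0, 0, 0, 1, 1, 1, 1]
z = [0, 0, 1, 1, 0, 0, 1, 1]
u_fair = [0, 1, 0, 1, 0, 1, 0, 1]
# Toy data: the prediction copies the sensitive attribute within each outcome.
u_leaky = list(z)

print(plugin_cmi(u_fair, z, y))   # zero: no conditional dependence
print(plugin_cmi(u_leaky, z, y))  # log(2): U reveals Z fully given Y
```

A count-based estimate like this has no useful gradient with respect to model parameters, which is presumably why the paper turns to a differentiable soft plug-in variant for training; the hard-count version is mainly useful as a post-hoc monitor.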

Paper Structure (36 sections, 9 theorems, 77 equations, 4 figures, 3 tables, 1 algorithm)

Introduction
Related works
Contributions
Methods: Regularization with Theoretical Guarantees
The Utility--Separation Pareto Frontier
Information-Theoretic Quantification of Separation Violation
When Is a Trade-off Necessary?
Training with Direct CMI Regularization
Differentiable Soft-Plug-in Estimator
Statistical Consistency and Explicit Bias Analysis
Numerical Experiment
Setup, Metrics, and Baselines
Results and Analysis
Bank Dataset: Frontier Shape and Operational Transfer
CelebA Dataset: Scalability and Posterior Separation
...and 21 more sections

Key Result

Proposition 2.1

For any $0\le v_1\le v_2 \le H(Z \mid Y)$, we have

Figures (4)

Figure 1: Bank Results: High Price of Fairness and Strict-Regime Stability. Top (Information Plane): The frontier estimate reveals a steep "knee" near the origin, indicating that the marginal cost of separation varies sharply. Normalized CMI provides a smooth, stable estimate of this dominant envelope, avoiding the large fold-level variance or collapse observed in the comparison baselines near the strict-separation boundary. Bottom (Operational Transfer): These advantages carry over to better deployment metrics on the test fold. CMI maintains high Accuracy and AUROC specifically in the critical strict-fairness regime (low EO gap), consistent with the theoretical generalization guarantees.
Figure 2: CelebA Results: Scalability and Posterior Separation. Top (Information Plane): In the randomized view (left), Normalized CMI dominates the envelope, avoiding the range collapse and instability seen in baselines on high-dimensional embeddings. The narrowing of separation in the deterministic view (right) confirms that post-hoc thresholding can "mask" underlying posterior dependence. Bottom (Operational Transfer): Operational metrics mirror the randomized frontier: CMI achieves superior Accuracy/AUROC specifically in the strict-fairness regime (small EO gap), confirming that better posterior separation translates to robust deployment performance.
Figure 3: Adult Results: Low Marginal Cost and Robust Transfer. Top (Information Plane): In contrast to Bank, the Adult frontier exhibits a low marginal cost of separation; high utility is preserved even as the violation approaches zero ($v \approx 0$). Normalized CMI traces a smooth, concave envelope, avoiding the non-monotonicity and variance observed in proxy-based baselines. Bottom (Operational Transfer): This favorable geometry transfers to deployment metrics. CMI achieves near-optimal Accuracy and AUROC even at negligible EO gaps, confirming that when the theoretical cost of fairness is low, the direct estimator can reliably recover the optimal trade-off.
Figure 4: COMPAS Results: Strict-Regime Coverage and Stability. Top (Information Plane): Normalized CMI generates a coherent, well-ordered Pareto traversal into the strict regime ($v \approx 0$), avoiding the fold-level variance and non-Pareto artifacts (e.g., non-monotonic segments) observed in several proxy-based baselines on this noisy benchmark. Bottom (Operational Transfer): The stability advantages are most pronounced in the strict-fairness region. As the EO gap is tightened, CMI retains high AUROC and Accuracy, whereas baselines suffer sharper degradation or require larger operational gaps to recover comparable utility.

Theorems & Definitions (21)

Proposition 2.1: Deterministic frontier is non-decreasing
Theorem 2.2: Randomized Frontier equals the Concave Closure
Proposition 2.3: CMI characterizes separation
Lemma 2.4: Mutual information controls dependence
Theorem 2.5: CMI controls average conditional dependence
Lemma 2.6: Budget identity and universal bounds
Lemma 2.7: Conditional Law Matching
Theorem 2.8: Necessary trade-off beyond $u_X^\star$
Proposition 3.1: Bias and Concentration of Sample CMI
proof
...and 11 more
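
To make the statement of Theorem 2.2 (the randomized frontier equals the concave closure of the deterministic frontier) more concrete, here is a minimal sketch that computes the upper concave envelope of a finite set of (separation violation, utility) points. The function name `concave_closure` and the toy points are hypothetical, and the sketch assumes the frontier is represented by finitely many sampled operating points; it illustrates the geometric operation only, not the paper's proof or construction.

```python
def concave_closure(points):
    """Vertices of the upper concave envelope of (v, utility) points.

    Illustrative helper: keeps the best utility per violation level v, then
    runs a left-to-right upper-hull scan (monotone chain).
    """
    best = {}
    for v, util in points:
        best[v] = max(best.get(v, util), util)
    hull = []
    for v, util in sorted(best.items()):
        # Drop the last vertex while it lies on or below the chord
        # joining its predecessor to the new point.
        while len(hull) >= 2:
            (v1, u1), (v2, u2) = hull[-2], hull[-1]
            cross = (v2 - v1) * (util - u1) - (u2 - u1) * (v - v1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append((v, util))
    return hull

# Toy deterministic operating points (integers keep the arithmetic exact).
deterministic = [(0, 0), (1, 3), (2, 3), (3, 6), (4, 6)]
print(concave_closure(deterministic))
# The point (2, 3) falls below the chord from (1, 3) to (3, 6) and is dropped.
```

Under the usual reading of a concave closure, each chord between adjacent envelope vertices corresponds to randomizing between the two deterministic operating points at its ends, so points that are dropped from the envelope are exactly those a mixture can improve upon.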
