Obliviator Reveals the Cost of Nonlinear Guardedness in Concept Erasure

Ramin Akbari, Milad Afshari, Vishnu Naresh Boddeti

TL;DR

The utility-erasure trade-off curves obtained by Obliviator outperform the baselines and demonstrate its strong generalizability: its erasure becomes more utility-preserving when applied to the better-disentangled representations learned by more capable models.

Abstract

Concept erasure aims to remove unwanted attributes, such as social or demographic factors, from learned representations, while preserving their task-relevant utility. Although the goal of concept erasure is protection against all adversaries, existing methods remain vulnerable to nonlinear ones. This vulnerability arises from their failure to fully capture the complex, nonlinear statistical dependencies between learned representations and unwanted attributes. Moreover, although the existence of a trade-off between utility and erasure is expected, its progression during the erasure process, i.e., the cost of erasure, remains unstudied. In this work, we introduce Obliviator, a post-hoc erasure method designed to fully capture nonlinear statistical dependencies. We formulate erasure from a functional perspective, leading to an optimization problem involving a composition of kernels that lacks a closed-form solution. Instead of solving this problem in a single shot, we adopt an iterative approach that gradually morphs the feature space to achieve a more utility-preserving erasure. Unlike prior methods, Obliviator guards unwanted attributes against nonlinear adversaries. Our gradual approach quantifies the cost of nonlinear guardedness and reveals the dynamics between attribute protection and utility preservation over the course of erasure. The utility-erasure trade-off curves obtained by Obliviator outperform the baselines and demonstrate its strong generalizability: its erasure becomes more utility-preserving when applied to the better-disentangled representations learned by more capable models.
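
To make the abstract's point about nonlinear dependence concrete, the sketch below compares a linear measure (Pearson correlation) with a kernel dependence measure on a toy pair where one variable is the square of the other. The kernel measure used here is the biased empirical Hilbert-Schmidt Independence Criterion (HSIC) with Gaussian kernels; this is an assumption chosen for illustration, since this page does not state which dependence measure Obliviator optimizes. The function names (`gaussian_gram`, `center`, `hsic`, `pearson`, `demo`), the bandwidth, and the toy data are all hypothetical and not taken from the paper.

```python
import math
import random


def gaussian_gram(values, bandwidth):
    """Gram matrix K[i][j] = exp(-(v_i - v_j)^2 / (2 * bandwidth^2)) for scalar samples."""
    scale = 2.0 * bandwidth ** 2
    return [[math.exp(-((a - b) ** 2) / scale) for b in values] for a in values]


def center(gram):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 11^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs, ys, bandwidth=0.5):
    """Biased empirical HSIC, trace(K H L H) / (n - 1)^2, for scalar samples."""
    n = len(xs)
    k_centered = center(gaussian_gram(xs, bandwidth))
    l_gram = gaussian_gram(ys, bandwidth)
    # trace(K H L H) = trace((H K H) L); both matrices are symmetric, so the
    # trace of the product equals the sum of elementwise products.
    total = sum(k_centered[i][j] * l_gram[i][j] for i in range(n) for j in range(n))
    return total / (n - 1) ** 2


def pearson(xs, ys):
    """Sample Pearson correlation, which only detects linear dependence."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def demo(seed=0, n=300):
    """Toy data (hypothetical): y = x^2 is fully determined by x yet nearly uncorrelated with it."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys_dependent = [x * x for x in xs]
    ys_independent = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return pearson(xs, ys_dependent), hsic(xs, ys_dependent), hsic(xs, ys_independent)


if __name__ == "__main__":
    corr, hsic_dep, hsic_ind = demo()
    print(f"Pearson(x, x^2)     = {corr:.4f}")
    print(f"HSIC(x, x^2)        = {hsic_dep:.5f}")
    print(f"HSIC(x, independent) = {hsic_ind:.5f}")
```

In this toy setting a linear probe sees almost no relationship between `x` and `x^2`, while the kernel statistic is clearly larger for the dependent pair than for the independent one. That gap is the kind of residual information a nonlinear adversary could exploit after a purely linear erasure.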

Paper Structure

This paper contains 36 sections, 1 theorem, 55 equations, 11 figures, 11 tables, 1 algorithm.

Key Result

Lemma 1

For any $\mathscr{L}\in\mathrm{HS}(\mathcal{F},\mathcal{G})$, $f\in\mathcal{F}$, and $g\in\mathcal{G}$,

Figures (11)

  • Figure 1: Erasure of Gender from Representation on Bias in Bios. Embeddings from a nonlinear adversary trained to extract gender information from the erased representation. Existing nonlinear methods fail to fully protect gender, as gender-specific distributions within each profession remain distinguishable. In contrast, Obliviator effectively guards gender by overlapping representations across gender, while preserving separability by profession.
  • Figure 2: Overview. Obliviator operates with two-step iterations: 1) Imposing Independence via RKHS: An encoder is trained with a multi-objective loss \ref{eq:loss_emp_iterative} to reduce statistical dependence on the unwanted attribute while preserving task-relevant information. 2) RKHS Disentanglement: Representations from the previous step are refined using functions derived from a constrained optimization in RKHS \ref{eq:evp_smooth_confined}. This refinement enhances the feature space's alignment with the target attribute, which facilitates the encoder's training in the next iteration. (Image source: https://muppet.fandom.com/wiki/Bert)
  • Figure 3: Finetuned+Supervised Erasure: Comparison of Obliviator with baselines for fine-tuned representations. Obliviator leverages $Y$ labels during the erasure, a scheme which we refer to as supervised erasure.
  • Figure 4: Frozen+Unsupervised Erasure: Comparison of Obliviator and baselines with frozen representations. In unsupervised erasure, we implicitly observe $Y$ information from $X$ and $X^i$ and thereby we observe a more noticeable trade-off compared to \ref{fig:sup}.
  • Figure 5: Supervised and unsupervised erasure on fine-tuned and frozen representations. (Sup: Supervised, Unsup: Unsupervised, Fd: Finetuned, and Fz: Frozen.)
  • ...and 6 more figures

Theorems & Definitions (2)

  • Lemma
  • proof