
Nearest-Neighbor Density Estimation for Dependency Suppression

Kathleen Anderson, Thomas Martinetz

TL;DR

This work combines a specialized variational autoencoder with a novel loss function driven by non-parametric nearest-neighbor density estimation, enabling direct optimization of independence. Evaluated on multiple datasets, the approach can outperform existing unsupervised techniques and even rival supervised methods in balancing information removal and utility.

Abstract

The ability to remove unwanted dependencies from data is crucial in various domains, including fairness, robust learning, and privacy protection. In this work, we propose an encoder-based approach that learns a representation that is independent of a sensitive variable while otherwise preserving essential data characteristics. Unlike existing methods that rely on decorrelation or adversarial learning, our approach explicitly estimates and modifies the data distribution to neutralize statistical dependencies. To achieve this, we combine a specialized variational autoencoder with a novel loss function driven by non-parametric nearest-neighbor density estimation, enabling direct optimization of independence. We evaluate our approach on multiple datasets, demonstrating that it can outperform existing unsupervised techniques and even rival supervised methods in balancing information removal and utility.

Paper Structure (20 sections, 12 equations, 5 figures, 3 tables)


Figures (5)

  • Figure 1: The pipeline used to translate input samples $x$ into invariant versions $x'$.
  • Figure 2: Depiction of the method described in \ref{sec:neighbor_density_KNN}. To estimate the density at $z$, one effectively counts the points that fall within its neighborhood $\mathcal{B}$.
  • Figure 3: Validation accuracy of an MLP on MNIST with noisy labels. The blue (upper) line shows performance after background removal, while the orange line is the baseline. A noise ratio of 0.2 means 20% of training labels were randomly replaced.
  • Figure 4: t-SNE embedding of the MNIST dataset with backgrounds. The top row is colored by background label, the bottom row by digit. From left to right, the columns show the original data, the VAE latent ($z_{vae}$), and the encoded latent ($z_{enc}$).
  • Figure 5: Images reconstructed from StyleGAN latents (see \ref{sec:ablation}). The bottom row shows images reconstructed from latents that have been translated using our nearest-neighbor divergence, to remove gender information. Note how the gender is not "removed", but rather shifted randomly for every image.
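
The counting idea in the Figure 2 caption can be illustrated with the textbook k-nearest-neighbor density estimator, $\hat{p}(z) = k / (N \cdot \mathrm{vol}(\mathcal{B}))$. The sketch below is a generic version of that estimator and not the paper's loss function. It assumes that $\mathcal{B}$ is the ball around $z$ whose radius reaches the $k$-th nearest sample. The names `unit_ball_volume` and `knn_density` are hypothetical and chosen only for this illustration.

```python
import math
import random


def unit_ball_volume(dim):
    """Volume of the unit ball in `dim` dimensions."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)


def knn_density(z, samples, k):
    """Textbook k-NN density estimate at z (hypothetical helper).

    Assumption: the ball B around z extends to the k-th nearest sample,
    so the estimate is k / (N * volume(B)).
    """
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    dim = len(z)
    # Euclidean distances from z to every sample, in ascending order.
    distances = sorted(math.dist(z, s) for s in samples)
    radius = distances[k - 1]
    if radius == 0.0:
        raise ValueError("k-th neighbor coincides with z; increase k")
    volume = unit_ball_volume(dim) * radius ** dim
    return k / (len(samples) * volume)


if __name__ == "__main__":
    # Uniform samples on the unit square have true density 1 everywhere
    # inside it, so the estimate at the center should land near 1.
    rng = random.Random(0)
    points = [(rng.random(), rng.random()) for _ in range(5000)]
    print(knn_density((0.5, 0.5), points, k=50))
```

A larger `k` gives a smoother but more biased estimate, while a smaller `k` follows local structure more closely at the cost of higher variance. Sorting all distances costs O(N log N) per query, so a practical implementation would likely use a spatial index or a batched distance computation instead.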