Multi-Group Proportional Representation in Retrieval

Alex Oesterling; Claudio Mayrink Verdun; Carol Xuan Long; Alexander Glynn; Lucas Monteiro Paes; Sajani Vithana; Martina Cardone; Flavio P. Calmon

TL;DR

This work introduces Multi-Group Proportional Representation (MPR), a novel metric that measures representation across intersectional groups and shows that optimizing MPR yields more proportional representation across multiple intersectional groups specified by a rich function class, often with minimal compromise in retrieval accuracy.

Abstract

Image search and retrieval tasks can perpetuate harmful stereotypes, erase cultural identities, and amplify social disparities. Current approaches to mitigate these representational harms balance the number of retrieved items across population groups defined by a small number of (often binary) attributes. However, most existing methods overlook intersectional groups determined by combinations of group attributes, such as gender, race, and ethnicity. We introduce Multi-Group Proportional Representation (MPR), a novel metric that measures representation across intersectional groups. We develop practical methods for estimating MPR, provide theoretical guarantees, and propose optimization algorithms to ensure MPR in retrieval. We demonstrate that existing methods optimizing for equal and proportional representation metrics may fail to promote MPR. Crucially, our work shows that optimizing MPR yields more proportional representation across multiple intersectional groups specified by a rich function class, often with minimal compromise in retrieval accuracy.
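The abstract describes MPR only in words, so the following is a minimal sketch under stated assumptions rather than the paper's estimator. It assumes MPR can be read as the largest gap, over a finite list of group functions, between a group's average in the retrieved set and its average in a curated reference set. The function name `mpr_finite_class`, the toy arrays, and the three group functions are all hypothetical.

```python
import numpy as np


def mpr_finite_class(retrieved, curated, group_fns):
    """Largest absolute gap between retrieved and curated group averages.

    Assumption: the function class is a finite list of callables, each
    mapping an (n, d) attribute array to n group-membership scores.
    """
    gaps = [
        abs(np.mean(c(retrieved)) - np.mean(c(curated)))
        for c in group_fns
    ]
    return max(gaps)


# Toy data (hypothetical): each row is an item, columns are two binary attributes.
retrieved = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
curated = np.array([[1, 1], [0, 0], [1, 1], [0, 0]])

group_fns = [
    lambda x: x[:, 0],            # first attribute alone
    lambda x: x[:, 1],            # second attribute alone
    lambda x: x[:, 0] * x[:, 1],  # intersection of both attributes
]

marginal_gap = mpr_finite_class(retrieved, curated, group_fns[:2])
full_gap = mpr_finite_class(retrieved, curated, group_fns)
print(marginal_gap, full_gap)
```

In this toy case each attribute on its own is perfectly balanced between the two sets, yet the intersectional group is absent from the retrieved items. That is the kind of gap the abstract says attribute-by-attribute balancing can overlook.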

Paper Structure (30 sections, 11 theorems, 55 equations, 33 figures, 6 tables, 1 algorithm)

Introduction
A Multi-Group Proportional Representation Metric
Preliminaries.
Multi-Group Proportional Representation.
Curated Datasets for Proportional Representation.
Computing Multi-Group Proportional Representation
Error in approximation $Q$ via a curated dataset.
Computing MPR via Mean Square Error (MSE) Minimization.
Computing MPR for Bounded-Norm Linear Regression.
Promoting Multi-Group Proportional Representation in Retrieval Tasks
Multi-Group Optimized Proportional Retrieval.
Numerical Experiments
Datasets.
Benchmarks.
Experimental Setup.
...and 15 more sections

Key Result

Proposition 1

Let $\mathcal{R}(q)=\{\boldsymbol{x}^r_i\}_{i = 1}^{k}$ be a set of $k$ retrieved samples, $\mathcal{D}_{C}=\{\boldsymbol{x}^c_i\}_{i = 1}^{m}$ be a curated dataset comprised of $m$ i.i.d. samples from a target representation distribution $Q$, and $\delta>0$. If $\mathcal{C} = \{c: \mathcal{R}^d\tim

Figures (33)

Figure 1: Fraction of Top-$k$ cosine similarity vs. fraction of Top-$k$ MPR, averaged over 10 queries for $k=50$ retrieved images. From left to right: CelebA, UTKFaces, Occupations. Values are normalized so that the MPR and similarity of Top-$k$ retrieval sit at the point (1,1). MOPR Pareto-dominates baselines and significantly closes the MPR gap.
Figure 2: Comparison of the linear program, with and without top-$k$ rounding, to the integer program. Solving the relaxed program is much more computationally efficient and, after rounding, achieves performance similar to solving the integer program.
Figure 3: Additional comparisons of the relaxed problem with top-$k$ selection to the integer program, for "chef", "nurse", "artist", "lawyer", "teacher", "engineer", "architect", "scientist", and "programmer".
Figure 4: Similarity vs. MPR for the quadratic program (Eqn. \ref{eqn:mopr_closed}) and MOPR. MOPR closely approximates the quadratic program along the Pareto frontier. Measured over a single query "A photo of a lawyer" for 50 retrieved samples on CelebA.
Figure 5: Similarity vs. MPR for MSE-estimated (Prop. \ref{prop:equiv}) and closed-form (Prop. \ref{prop:closed_form_MPR}) measures of MPR. For the class of linear models, a linear regression oracle perfectly achieves the analytical solution for MPR. Measured over a single query "A photo of a lawyer" for 50 retrieved samples on CelebA.
...and 28 more figures

Theorems & Definitions (21)

Definition 1
Remark 1
Remark 2
Definition 2: Curated Dataset
Proposition 1: Generalization Gap of MPR
Proposition 2: Query Budget Guarantee
Proposition 3
Proposition 4
Lemma 1
proof
...and 11 more
