
Multi-Group Proportional Representation in Retrieval

Alex Oesterling, Claudio Mayrink Verdun, Carol Xuan Long, Alexander Glynn, Lucas Monteiro Paes, Sajani Vithana, Martina Cardone, Flavio P. Calmon

TL;DR

This work introduces Multi-Group Proportional Representation (MPR), a novel metric that measures representation across intersectional groups and shows that optimizing MPR yields more proportional representation across multiple intersectional groups specified by a rich function class, often with minimal compromise in retrieval accuracy.

Abstract

Image search and retrieval tasks can perpetuate harmful stereotypes, erase cultural identities, and amplify social disparities. Current approaches to mitigate these representational harms balance the number of retrieved items across population groups defined by a small number of (often binary) attributes. However, most existing methods overlook intersectional groups determined by combinations of group attributes, such as gender, race, and ethnicity. We introduce Multi-Group Proportional Representation (MPR), a novel metric that measures representation across intersectional groups. We develop practical methods for estimating MPR, provide theoretical guarantees, and propose optimization algorithms to ensure MPR in retrieval. We demonstrate that existing methods optimizing for equal and proportional representation metrics may fail to promote MPR. Crucially, our work shows that optimizing MPR yields more proportional representation across multiple intersectional groups specified by a rich function class, often with minimal compromise in retrieval accuracy.
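The abstract's central point, that balancing single attributes can miss imbalances in their combinations, can be illustrated with a small sketch. It assumes, for illustration only, that MPR is the largest gap, over a class $\mathcal{C}$ of group-indicator functions, between a group's share of the retrieved set and its share of a reference set. Here $\mathcal{C}$ is taken to be the indicators of every intersection of categorical attributes. The function names (`intersectional_groups`, `in_group`, `mpr_indicator_class`) and the toy attributes are hypothetical, not the authors' code.

```python
from itertools import combinations, product


def intersectional_groups(attr_values):
    """All intersectional groups: a fixed value for each attribute in a
    non-empty subset of attributes, e.g. {"gender": "f", "skin": "dark"}."""
    names = sorted(attr_values)
    groups = []
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            for values in product(*(attr_values[a] for a in subset)):
                groups.append(dict(zip(subset, values)))
    return groups


def in_group(item, group):
    """Indicator c(x): True iff the item matches every attribute value in group."""
    return all(item[a] == v for a, v in group.items())


def mpr_indicator_class(retrieved, reference, groups):
    """Largest |share in retrieved - share in reference| over the given groups,
    together with the first group attaining it (None if every gap is zero)."""
    best_gap, worst_group = 0.0, None
    for g in groups:
        p_ret = sum(in_group(x, g) for x in retrieved) / len(retrieved)
        p_ref = sum(in_group(x, g) for x in reference) / len(reference)
        gap = abs(p_ret - p_ref)
        if gap > best_gap:
            best_gap, worst_group = gap, g
    return best_gap, worst_group


if __name__ == "__main__":
    attrs = {"gender": ["f", "m"], "skin": ["light", "dark"]}
    groups = intersectional_groups(attrs)  # 4 single-attribute + 4 intersections
    reference = [{"gender": g, "skin": s} for g in ("f", "m") for s in ("light", "dark")]
    retrieved = [{"gender": "f", "skin": "dark"}, {"gender": "m", "skin": "light"}] * 2
    marginals = [g for g in groups if len(g) == 1]
    print(mpr_indicator_class(retrieved, reference, marginals))  # (0.0, None)
    print(mpr_indicator_class(retrieved, reference, groups))
    # (0.25, {'gender': 'f', 'skin': 'light'})
```

In this toy case every single-attribute marginal is perfectly balanced, so a gap measured only over marginal groups is zero, while the intersectional class exposes a gap of 0.25. The abstract targets function classes richer than a finite set of indicators; this sketch only shows why intersections matter.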

Paper Structure (30 sections, 11 theorems, 55 equations, 33 figures, 6 tables, 1 algorithm)

Key Result

Proposition 1

Let $\mathcal{R}(q)=\{\boldsymbol{x}^r_i\}_{i = 1}^{k}$ be a set of $k$ retrieved samples, let $\mathcal{D}_{C}=\{\boldsymbol{x}^c_i\}_{i = 1}^{m}$ be a curated dataset consisting of $m$ i.i.d. samples from a target representation distribution $Q$, and let $\delta>0$. If $\mathcal{C} = \{c: \mathbb{R}^d\times \ldots$
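Proposition 1 bounds the error incurred by estimating MPR with a finite curated dataset $\mathcal{D}_C$ instead of $Q$ itself, but its statement is cut off here. As a stand-in, the sketch below uses the textbook Hoeffding-plus-union-bound argument for a finite class of $[0,1]$-valued functions. It illustrates how such a gap shrinks with $m$ and grows with $\log(1/\delta)$; it is not necessarily the bound stated in Proposition 1, and the function names are hypothetical. The key step: the retrieved-set averages are fixed, so MPR measured against $\mathcal{D}_C$ differs from MPR measured against $Q$ by at most $\sup_{c\in\mathcal{C}} |\hat{\mathbb{E}}_{\mathcal{D}_C}[c] - \mathbb{E}_Q[c]|$.

```python
import math


def finite_class_gap_bound(num_functions, m, delta):
    """Hoeffding + union bound for a finite class of [0,1]-valued functions.

    With probability >= 1 - delta over m i.i.d. curated samples from Q,
    every c in the class satisfies |mean of c over D_C - E_Q[c]| <= eps.
    Because the retrieved-set averages are fixed, MPR measured against D_C
    then differs from MPR measured against Q by at most eps.
    """
    if num_functions < 1 or m < 1 or not 0 < delta < 1:
        raise ValueError("need num_functions >= 1, m >= 1 and 0 < delta < 1")
    return math.sqrt(math.log(2 * num_functions / delta) / (2 * m))


def curated_size_for(num_functions, eps, delta):
    """Smallest m for which finite_class_gap_bound(num_functions, m, delta) <= eps."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    return math.ceil(math.log(2 * num_functions / delta) / (2 * eps ** 2))


if __name__ == "__main__":
    print(round(finite_class_gap_bound(8, 1000, 0.05), 3))  # 0.054
    print(curated_size_for(8, 0.05, 0.05))                  # 1154
```

The bound depends on the class only through $\log|\mathcal{C}|$, so enumerating many intersectional groups costs relatively few extra curated samples; infinite classes need a complexity measure in place of $|\mathcal{C}|$.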

Figures (33)

  • Figure 1: Fraction of Top-$k$ cosine similarity vs fraction of Top-$k$ MPR, averaged over 10 queries with $k=50$ images retrieved. From left to right: CelebA, UTKFaces, Occupations. Values are normalized so that Top-$k$ MPR and similarity correspond to the point (1,1). MOPR Pareto-dominates the baselines and significantly closes the MPR gap.
  • Figure 2: Comparison of the linear program, with and without top-$k$ selection, to the integer program. Solving the relaxed program is much more computationally efficient and, after rounding, achieves performance similar to that of solving the integer program.
  • Figure 3: Additional comparisons of the relaxed problem with top-$k$ selection to the integer program. Queries: "chef", "nurse", "artist", "lawyer", "teacher", "engineer", "architect", "scientist", and "programmer".
  • Figure 4: Similarity vs MPR for the quadratic program (Eqn. \ref{eqn:mopr_closed}) and MOPR. MOPR closely approximates the quadratic program along the Pareto frontier. Measured on a single query, "A photo of a lawyer", with 50 retrieved samples on CelebA.
  • Figure 5: Similarity vs MPR for the MSE-estimated (Prop. \ref{prop:equiv}) and closed-form (Prop. \ref{prop:closed_form_MPR}) measures of MPR. For the class of linear models, a linear regression oracle exactly attains the analytical MPR value. Measured on a single query, "A photo of a lawyer", with 50 retrieved samples on CelebA.
  • ...and 28 more figures
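The Figure 5 caption refers to a closed-form MPR for linear models. Assuming the class is $\{x \mapsto \langle w, x\rangle : \|w\|_2 \le 1\}$ on feature vectors (an assumption; the exact class in the paper's closed-form proposition is not shown here), Cauchy-Schwarz gives $\mathrm{MPR} = \|\mu_R - \mu_C\|_2$, the distance between the mean retrieved and mean curated features. The sketch computes this and checks it against a crude random search over unit vectors; that search is a swapped-in sanity check, not the paper's linear regression oracle. Function names are hypothetical.

```python
import math
import random


def mean_vector(points):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    d = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(d)]


def linear_mpr_closed_form(retrieved, curated):
    """Sup over ||w||_2 <= 1 of |<w, mu_R - mu_C>|.

    By Cauchy-Schwarz this equals ||mu_R - mu_C||_2, attained at
    w = (mu_R - mu_C) / ||mu_R - mu_C||_2 (any unit w if the means coincide).
    """
    mu_r, mu_c = mean_vector(retrieved), mean_vector(curated)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_r, mu_c)))


def linear_mpr_random_search(retrieved, curated, trials=2000, seed=0):
    """Lower bound on the same supremum from random unit directions w."""
    rng = random.Random(seed)
    mu_r, mu_c = mean_vector(retrieved), mean_vector(curated)
    diff = [a - b for a, b in zip(mu_r, mu_c)]
    best = 0.0
    for _ in range(trials):
        w = [rng.gauss(0.0, 1.0) for _ in diff]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            continue  # degenerate draw; skip it
        best = max(best, abs(sum(wi * di for wi, di in zip(w, diff))) / norm)
    return best


if __name__ == "__main__":
    retrieved = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    curated = [[1.0, 0.0], [0.0, 1.0]]
    print(linear_mpr_closed_form(retrieved, curated))    # about 0.2357
    print(linear_mpr_random_search(retrieved, curated))  # at most the value above
```

In the usage example the mean gap is $(1/6, -1/6)$, so the closed form gives $\sqrt{2}/6 \approx 0.236$, and the random search approaches it from below. The closed form avoids any search, which is why a linear class makes MPR cheap to evaluate.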

Theorems & Definitions (21)

  • Definition 1
  • Remark 1
  • Remark 2
  • Definition 2: Curated Dataset
  • Proposition 1: Generalization Gap of MPR
  • Proposition 2: Query Budget Guarantee
  • Proposition 3
  • Proposition 4
  • Lemma 1
  • Proof
  • ...and 11 more