Equitable Evaluation via Elicitation

Elbert Du, Cynthia Dwork, Lunjia Hu, Reid McIlroy-Young, Han Shao, Linjun Zhang

TL;DR

An interactive AI for skill elicitation that accurately determines skills while allowing individuals to speak in their own voice. Enforcing equitability ensures that the covariance between self-presentation manner and skill evaluation error is small.

Abstract

Individuals with similar qualifications and skills may vary in their demeanor, or outward manner: some tend toward self-promotion while others are modest to the point of omitting crucial information. Comparing the self-descriptions of equally qualified job-seekers with different self-presentation styles is therefore problematic. We build an interactive AI for skill elicitation that provides accurate determination of skills while simultaneously allowing individuals to speak in their own voice. Such a system can be deployed, for example, when a new user joins a professional networking platform, or when matching employees to needs during a company reorganization. To obtain sufficient training data, we train an LLM to act as synthetic humans. Elicitation mitigates endogenous bias arising from individuals' own self-reports. To address systematic model bias we enforce a mathematically rigorous notion of equitability ensuring that the covariance between self-presentation manner and skill evaluation error is small.

Paper Structure

This paper contains 33 sections, 2 theorems, 13 equations, 3 figures, 2 tables, 3 algorithms.

Key Result

Theorem 3.1

Let $\mathcal{F}=\{f_\phi: \phi \in \Phi\}$ be the hypothesis class, and suppose the minimizer $f^*$ of the loss function of Equation loss:ma lies within $\epsilon^*$ of $\mathcal{F}$. Then $f^*$ is $(\mathcal{C}, \epsilon+\epsilon^*)$-multi-accurate at every iteration of the interactive procedure. ∎
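To make the multi-accuracy condition concrete, here is a minimal sketch of an empirical check. It assumes the standard formulation, in which a predictor $f$ is $(\mathcal{C}, \alpha)$-multi-accurate when $|\mathbb{E}[c(x)(f(x)-y)]| \le \alpha$ for every test function $c \in \mathcal{C}$; the paper's exact definition and its class $\mathcal{C}$ are not shown in this summary and may differ. The function names, the array layout, and the toy data are all hypothetical.

```python
import numpy as np


def multiaccuracy_violations(preds, labels, test_values):
    """Return |mean(c(x) * (f(x) - y))| for each test function c.

    preds:       shape (n,), predictions f(x_i)
    labels:      shape (n,), true skill values y_i
    test_values: shape (k, n), row j holds c_j(x_i) for every sample
    (Assumed layout, not taken from the paper.)
    """
    residual = np.asarray(preds, dtype=float) - np.asarray(labels, dtype=float)
    tests = np.atleast_2d(np.asarray(test_values, dtype=float))
    return np.abs(tests @ residual) / residual.shape[0]


def is_multiaccurate(preds, labels, test_values, alpha):
    """True if every test function's violation is at most alpha."""
    violations = multiaccuracy_violations(preds, labels, test_values)
    return bool(np.all(violations <= alpha))


# Toy data: a centered "manner" score (negative = modest, positive = self-promoting).
manner = np.array([-1.0, -1.0, 1.0, 1.0])
labels = np.array([0.0, 1.0, 0.0, 1.0])

# Two test functions: the constant function and the manner score itself.
tests = np.vstack([np.ones_like(manner), manner])

# A predictor whose error tracks manner, and one whose error does not.
manner_biased = labels + 0.2 * manner
manner_neutral = labels + np.array([0.1, -0.1, 0.1, -0.1])

biased_violations = multiaccuracy_violations(manner_biased, labels, tests)
neutral_violations = multiaccuracy_violations(manner_neutral, labels, tests)
```

Because the manner score in this toy example has mean zero, the violation for that test function equals the absolute empirical covariance between manner and evaluation error, which is the quantity the abstract asks to keep small. The manner-biased predictor passes the constant test (its average error is zero) but fails the manner test for any `alpha` below 0.2.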

Figures (3)

  • Figure 1: Accuracy on test samples during training with the fairness correction applied at all epochs (blue), with the fairness correction applied at all previous epochs (blue dashed), with the fairness correction never applied (red), and with the fairness correction applied only at step $t$ (green). Error bars are bootstrap confidence intervals.
  • Figure 2: Fairness loss (Equation loss:ma) on test samples during training with the fairness correction (blue), with the fairness correction applied at all previous epochs (blue dashed), and with the fairness correction never applied (red). Error bars are bootstrap confidence intervals.
  • Figure 3: Z-scores on test samples during training with the fairness correction applied at all epochs (blue), with the fairness correction applied only at step $t$ (purple), and with the fairness correction never applied (red).

Theorems & Definitions (2)

  • Theorem 3.1
  • Lemma D.1: Theorem 5.6 from gopalan2023loss