Estimating Wage Disparities Using Foundation Models

Keyon Vafa, Susan Athey, David M. Blei

TL;DR

The paper addresses how to estimate wage disparities by leveraging foundation-model representations of labor-market histories, identifying omitted-variable bias that arises when fine-tuning for prediction alone. It derives conditions for $\sqrt{n}$-consistent, representation-based estimators and introduces three debiased fine-tuning methods to mitigate bias. Through semi-synthetic PSID-based experiments and an empirical PSID application, the authors show that rich history representations yield more accurate wage-gap estimates and reveal history factors omitted by traditional econometric summaries. The approach has broad implications for causal estimation and policy-relevant social-science analyses, suggesting a path to more robust decomposition and treatment-effect inferences using large pretrained representations.

Abstract

The rise of foundation models marks a paradigm shift in machine learning: instead of training specialized models from scratch, foundation models are first trained on massive datasets before being adapted or fine-tuned to make predictions on smaller datasets. Initially developed for text, foundation models have also excelled at making predictions about social science data. However, while many estimation problems in the social sciences use prediction as an intermediate step, they ultimately require different criteria for success. In this paper, we develop methods for fine-tuning foundation models to perform these estimation problems. We first characterize an omitted variable bias that can arise when a foundation model is only fine-tuned to maximize predictive accuracy. We then provide a novel set of conditions for fine-tuning under which estimates derived from a foundation model are root-n-consistent. Based on this theory, we develop new fine-tuning algorithms that empirically mitigate this omitted variable bias. To demonstrate our ideas, we study gender wage decomposition. This is a statistical estimation problem from econometrics where the goal is to decompose the gender wage gap into components that can and cannot be explained by career histories of workers. Classical methods for decomposing the wage gap employ simple predictive models of wages which condition on coarse summaries of career history that may omit factors that are important for explaining the gap. Instead, we use a custom-built foundation model to decompose the gender wage gap, which captures a richer representation of career history. Using data from the Panel Study of Income Dynamics, we find that career history explains more of the gender wage gap than standard econometric models can measure, and we identify elements of career history that are omitted by standard models but are important for explaining the wage gap.
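The decomposition described in the abstract can be illustrated with a minimal Python sketch. It assumes the common Oaxaca-Blinder-style convention, which this summary does not spell out. Under that convention, a wage model fit only on group 0 predicts what group-1 workers would earn given their histories. The difference between group 1's actual and predicted mean wage is the unexplained gap, and the remainder of the raw gap is the explained part. The helper `fit_ols` and the simulated data are hypothetical stand-ins for a fine-tuned foundation model and the PSID. This is a plain plug-in (outcome-only) estimator, not the paper's debiased method.

```python
import numpy as np

def decompose_gap(y, a, x, fit_wage_model):
    """Split the mean wage gap between group a=1 and group a=0 into explained
    and unexplained parts (plug-in, outcome-only estimator).

    Assumption: Oaxaca-Blinder-style convention with the wage model fit on group 0.
    """
    y, a, x = np.asarray(y, float), np.asarray(a, int), np.asarray(x, float)
    mu0 = fit_wage_model(x[a == 0], y[a == 0])  # wage model fit on group 0 only
    counterfactual = mu0(x[a == 1])             # group-0 wages predicted for group-1 histories
    total = y[a == 1].mean() - y[a == 0].mean()
    unexplained = y[a == 1].mean() - counterfactual.mean()
    explained = counterfactual.mean() - y[a == 0].mean()
    return total, explained, unexplained

def fit_ols(x, y):
    """Hypothetical stand-in wage model: ordinary least squares with an intercept."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda x_new: np.column_stack([np.ones(len(x_new)), x_new]) @ coef

# Simulated data: group 1 has shifted histories plus a 0.1 premium not explained by history.
rng = np.random.default_rng(0)
n = 5000
a = rng.integers(0, 2, n)
x = rng.normal(loc=a[:, None] * 0.5, size=(n, 3))
y = x @ np.array([0.3, 0.2, 0.1]) + 0.1 * a + rng.normal(scale=0.1, size=n)
total, explained, unexplained = decompose_gap(y, a, x, fit_ols)
```

Because both parts share the counterfactual mean, explained plus unexplained always equals the raw gap; how the gap is split depends entirely on how well the wage model captures history, which is where a richer representation of career history enters.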

Paper Structure (15 sections, 2 theorems, 74 equations, 7 figures, 8 tables)

Key Result

Theorem 1

Consider a sequence of wage models $\hat{\mu}_{n,0}:\mathbb{R}^D \to \mathbb{R}$, propensity models $\hat{e}_n:\mathbb{R}^D \to (0, 1)$, and representations $\lambda_n: \mathcal{X} \to \mathbb{R}^D$. Denote by $\psi$ the true wage gap unexplained by history and by $\hat{\psi}_n$ the representation-based estimator of $\psi$. Assume the following: … Then, …
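Theorem 1 involves a wage model $\hat{\mu}_{n,0}$ and a propensity model $\hat{e}_n$ applied to a representation of history. As a hedged illustration, assume the unexplained gap has the ATT-like form $\psi = \mathbb{E}[Y \mid A=1] - \mathbb{E}[\mu_0(\lambda(X)) \mid A=1]$, where $\mu_0$ is the group-0 wage model and $e$ is the probability of belonging to group 1 given the representation. The sketch below computes the outcome-only estimate and one standard doubly robust (AIPW-style) estimate of that quantity. The function name, the clipping threshold, and the simulated data are assumptions; the paper's exact estimator and conditions may differ.

```python
import numpy as np

def unexplained_gap_estimates(y, a, mu0_hat, e_hat, clip=1e-3):
    """Outcome-only and AIPW-style estimates of an ATT-like unexplained gap.

    Assumed estimand: E[Y | A=1] - E[mu0(R) | A=1], R = representation.
    y: wages; a: group indicator (1 = group whose gap is measured);
    mu0_hat: predicted group-0 wage given the representation, for every worker;
    e_hat: predicted P(a = 1 | representation), for every worker.
    """
    y, a = np.asarray(y, float), np.asarray(a, float)
    mu0_hat = np.asarray(mu0_hat, float)
    e_hat = np.clip(np.asarray(e_hat, float), clip, 1 - clip)  # keep odds finite
    resid = y - mu0_hat
    n1 = a.sum()
    outcome_only = np.sum(a * resid) / n1
    odds = e_hat / (1 - e_hat)  # reweights group-0 residuals toward group 1's histories
    aipw = np.sum(a * resid - (1 - a) * odds * resid) / n1
    return outcome_only, aipw

# Simulated check: true gap 0.2, wrong wage model, correct propensity model.
rng = np.random.default_rng(1)
n = 50_000
r = rng.normal(size=n)                            # 1-D stand-in for a representation
e_true = 1 / (1 + np.exp(-0.5 * r))               # true P(a = 1 | r)
a = rng.binomial(1, e_true)
y = r + 0.2 * a + rng.normal(scale=0.1, size=n)
bad_mu0 = np.zeros(n)                             # deliberately misspecified wage model
outcome_only, aipw = unexplained_gap_estimates(y, a, bad_mu0, e_true)
```

With a deliberately wrong wage model, the outcome-only estimate absorbs the effect of history, while the AIPW-style estimate stays near the true 0.2 because the correct propensity odds reweight group-0 residuals. Figure 3 reports that on the paper's semi-synthetic data the AIPW estimator nonetheless performed worse than the outcome-only estimator, so this toy example says nothing about that setting.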

Figures (7)

  • Figure 1: Debiased fine-tuning methods are better at estimating the unexplained wage gap than standard supervised fine-tuning (Devlin et al., 2018) across 270 semi-synthetic experiments. For each semi-synthetic experiment, the true unexplained wage gap is known, and each method provides a different estimate of this gap. This figure compares each method's average error for estimating this gap, evaluated via MSE between the true and estimated unexplained gap. Specifically, the Y-axis compares each method's estimation error to the error from estimates derived from a model using standard supervised fine-tuning (larger values on the Y-axis correspond to larger improvements). Bars represent 95% confidence intervals.
  • Figure 2: CAREER finds omitted variables from a worker's job history that are important for explaining the gender wage gap. These omitted variables, which are identified by a regression tree as being most predictive of wage, are correlated with both wage and gender.
  • Figure 3: The AIPW estimator on semi-synthetic data performs worse than the outcome-only estimator. For each semi-synthetic experiment, the true unexplained wage gap is known, and each method provides a different estimate of this gap. This figure compares each method's average error for estimating this gap, evaluated via MSE between the true and estimated unexplained gap. Specifically, the Y-axis compares each method's estimation error to the error from estimates derived from a model using the outcome-only model and standard supervised fine-tuning (larger values on the Y-axis correspond to larger improvements). Bars represent 95% confidence intervals.
  • Figure 4: The R-learner evaluation metric is correlated with model estimation error across semi-synthetic experiments. The highest correlation occurs when more of the representation is shared in the underlying data. The models used to calculate this correlation are linear regression, supervised fine-tuning, and the three debiased fine-tuning approaches described in the framework section. Test-set bootstrapped standard errors are in parentheses. Full results for each of the 27 settings are in the appendix table of all semi-synthetic results.
  • Figure 5: Compared to methods that adjust the gender wage gap for summary statistics of history, the learned representations of history explain more of the gap for later years in the PSID survey. Test-set bootstrapped standard errors are in parentheses.
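The captions above evaluate each method by the MSE between true and estimated unexplained gaps across semi-synthetic experiments, relative to a baseline, with 95% intervals. The exact Y-axis formula is not given here, so the sketch below assumes a hypothetical metric, 1 - MSE_method / MSE_baseline (positive means lower error than the baseline), and a percentile bootstrap that resamples experiments for the interval. The data are simulated; only the count of 270 experiments comes from Figure 1.

```python
import numpy as np

def relative_mse_improvement(true_gaps, est_method, est_baseline, n_boot=2000, seed=0):
    """Relative MSE improvement of a method over a baseline, with a
    percentile-bootstrap 95% interval that resamples whole experiments.

    Hypothetical metric (assumption): 1 - MSE_method / MSE_baseline.
    """
    true_gaps = np.asarray(true_gaps, float)
    sq_m = (np.asarray(est_method, float) - true_gaps) ** 2
    sq_b = (np.asarray(est_baseline, float) - true_gaps) ** 2
    point = 1 - sq_m.mean() / sq_b.mean()
    rng = np.random.default_rng(seed)
    k = len(true_gaps)
    idx = rng.integers(0, k, size=(n_boot, k))  # each row is one bootstrap resample of experiments
    boot = 1 - sq_m[idx].mean(axis=1) / sq_b[idx].mean(axis=1)
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return point, (lo, hi)

# Simulated example with 270 experiments.
rng = np.random.default_rng(2)
true = rng.normal(0.1, 0.05, size=270)              # hypothetical true unexplained gaps
baseline = true + 0.05 + rng.normal(0, 0.02, 270)   # biased baseline estimates
method = true + rng.normal(0, 0.02, 270)            # less biased estimates
point, (lo, hi) = relative_mse_improvement(true, method, baseline)
```

Resampling experiments, rather than individual workers, treats each semi-synthetic setting as one unit, which matches how the captions average error across experiments.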

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • Proof