Estimating Wage Disparities Using Foundation Models
Keyon Vafa, Susan Athey, David M. Blei
TL;DR
The paper studies how to estimate wage disparities using foundation-model representations of labor-market histories. It characterizes an omitted-variable bias that can arise when a foundation model is fine-tuned for predictive accuracy alone, derives conditions under which representation-based estimators are $\sqrt{n}$-consistent, and introduces three debiased fine-tuning methods to mitigate the bias. In semi-synthetic experiments built on the Panel Study of Income Dynamics (PSID) and in an empirical PSID application, the authors find that rich history representations yield more accurate wage-gap estimates and reveal career-history factors omitted by traditional econometric summaries. The approach is relevant to causal estimation and policy-oriented social-science analysis more broadly, pointing toward more robust decompositions and treatment-effect estimates built on large pretrained representations.
Abstract
The rise of foundation models marks a paradigm shift in machine learning: instead of training specialized models from scratch, foundation models are first trained on massive datasets before being adapted or fine-tuned to make predictions on smaller datasets. Initially developed for text, foundation models have also excelled at making predictions about social science data. However, while many estimation problems in the social sciences use prediction as an intermediate step, they ultimately require different criteria for success. In this paper, we develop methods for fine-tuning foundation models to perform these estimation problems. We first characterize an omitted variable bias that can arise when a foundation model is only fine-tuned to maximize predictive accuracy. We then provide a novel set of conditions for fine-tuning under which estimates derived from a foundation model are root-n-consistent. Based on this theory, we develop new fine-tuning algorithms that empirically mitigate this omitted variable bias. To demonstrate our ideas, we study gender wage decomposition. This is a statistical estimation problem from econometrics where the goal is to decompose the gender wage gap into components that can and cannot be explained by career histories of workers. Classical methods for decomposing the wage gap employ simple predictive models of wages which condition on coarse summaries of career history that may omit factors that are important for explaining the gap. Instead, we use a custom-built foundation model to decompose the gender wage gap, which captures a richer representation of career history. Using data from the Panel Study of Income Dynamics, we find that career history explains more of the gender wage gap than standard econometric models can measure, and we identify elements of career history that are omitted by standard models but are important for explaining the wage gap.
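To make the decomposition idea concrete, here is a minimal sketch of a classical linear wage-gap decomposition of the kind the abstract contrasts with the foundation-model approach. It is not the authors' method. The function names (`fit_ols`, `predict`, `decompose_gap`), the single covariate standing in for a coarse career-history summary, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients, with an intercept column prepended."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict(beta, X):
    """Fitted values from coefficients returned by fit_ols."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ beta

def decompose_gap(X_a, y_a, X_b, y_b):
    """Split the mean wage gap (group a minus group b) into two parts.

    explained:   gap attributable to differing covariates, priced with
                 group a's fitted wage model.
    unexplained: gap remaining after group b's covariates are priced
                 with group a's model.
    """
    beta_a = fit_ols(X_a, y_a)
    raw_gap = y_a.mean() - y_b.mean()
    # Mean wage group b would earn under group a's wage model.
    counterfactual = predict(beta_a, X_b).mean()
    explained = predict(beta_a, X_a).mean() - counterfactual
    unexplained = counterfactual - y_b.mean()
    return raw_gap, explained, unexplained

# Simulated data (hypothetical): one covariate, e.g. years of experience.
rng = np.random.default_rng(0)
n = 2000
X_a = rng.normal(12.0, 3.0, size=(n, 1))
X_b = rng.normal(10.0, 3.0, size=(n, 1))
# Log wages: same return to experience, intercepts differ by 0.1.
y_a = 2.0 + 0.05 * X_a[:, 0] + rng.normal(0.0, 0.1, size=n)
y_b = 1.9 + 0.05 * X_b[:, 0] + rng.normal(0.0, 0.1, size=n)

raw_gap, explained, unexplained = decompose_gap(X_a, y_a, X_b, y_b)
print(f"raw gap {raw_gap:.3f} = explained {explained:.3f} "
      f"+ unexplained {unexplained:.3f}")
```

In this sketch, the explained share depends entirely on what the covariate matrix contains. If it leaves out aspects of career history that differ across groups and affect wages, their contribution is misattributed to the unexplained part, which is the limitation of coarse summaries that the abstract describes.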
