Attenuation Bias with Latent Predictors

Connor T. Jerzak, Stephen A. Jessee

TL;DR

This paper analyzes attenuation bias when regressors are latent traits estimated from indicators. It shows that conventional corrections (e.g., IV, MOC) can misadjust for latent-predictor error due to identification-based rescaling, and introduces a modular correlation-corrected estimator based on split indicators that yields consistent slopes under standard assumptions. The method uses two independent latent-trait estimates, derives a correction factor from their correlation, and can be applied with any latent-trait estimator, including additive scores, factor models, or ML approaches. Through theory, simulations, and empirical applications (e.g., political knowledge predicting duty to vote), the authors demonstrate substantial improvements in bias and often results close to full joint estimation, with open-source software provided for implementation. The work highlights the need to tailor error correction to latent predictors, offering a practical, scalable tool for robust inference in political science and related fields.

Abstract

Many core concepts in political science are latent and therefore can only be measured with error. Measurement error in a predictor attenuates slope coefficient estimates in regression, biasing them toward zero. We show that widely used strategies for correcting attenuation bias -- including instrumental variables and the method of composition -- are themselves biased when applied to latent regressors, sometimes even more than simple regression ignoring the measurement error altogether. We derive a correlation-based correction using split-sample measurement strategies. Rather than assuming a particular estimation strategy for the latent trait, our approach is modular and can be easily deployed with a wide variety of latent trait measurement strategies, including additive score, factor, or machine learning models, requiring no joint estimation while yielding consistent slopes under standard assumptions. Simulations and applications show stronger relationships after our correction, sometimes by as much as 50%. Open-source software implements the procedure. Results underscore that latent predictors demand tailored error correction; otherwise, conventional practice can exacerbate bias.
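The split-indicator idea described in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' released software: the function names (`standardize`, `ols_slope`, `correlation_corrected_slope`), the simulated data, and the specific form of the correction (averaging the two split-half slopes and dividing by the square root of the correlation between the two standardized estimates) are assumptions made here for illustration. It further assumes additive scores as the latent-trait estimator, a unit-variance latent trait, and independent errors of equal variance across the two halves.

```python
import numpy as np


def standardize(v):
    """Rescale to mean 0 and variance 1 (the identification-based rescaling)."""
    return (v - v.mean()) / v.std()


def ols_slope(y, x):
    """Slope from a simple OLS regression of y on x with an intercept."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def correlation_corrected_slope(y, x1, x2):
    """Hypothetical helper: average the slopes on two split-half estimates,
    then divide by the square root of their correlation."""
    z1, z2 = standardize(x1), standardize(x2)
    r = float(np.corrcoef(z1, z2)[0, 1])
    if r <= 0:
        raise ValueError("split-half correlation must be positive")
    avg_slope = 0.5 * (ols_slope(y, z1) + ols_slope(y, z2))
    return avg_slope / np.sqrt(r)


# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, m, beta = 20_000, 10, 0.4
x = rng.normal(size=n)                                    # latent trait, variance 1
items = x[:, None] + rng.normal(scale=2.0, size=(n, m))   # noisy indicators
y = beta * x + rng.normal(size=n)

# Uncorrected slope using the additive score built from all indicators.
naive = ols_slope(y, standardize(items.mean(axis=1)))

# Corrected slope using additive scores from two disjoint halves of the items.
half = m // 2
corrected = correlation_corrected_slope(
    y, items[:, :half].mean(axis=1), items[:, half:].mean(axis=1)
)
print(f"naive: {naive:.3f}  corrected: {corrected:.3f}  truth: {beta}")
```

Under these assumptions the uncorrected slope is pulled toward zero, while the corrected slope should land near the true coefficient. Any other latent-trait estimator (a factor model, an item-response model, an ML score) could in principle replace the two `mean(axis=1)` calls, provided the two estimates are built from disjoint sets of indicators.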

Paper Structure

This paper contains 20 sections, 22 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Grey points are uncorrected OLS slope coefficients from regressing duty-to-vote on estimated knowledge, where knowledge is constructed from each nonempty subset of the four indicators (1--4 items). Black dots are the mean slope within each indicator-count. Horizontal lines show: the unadjusted OLS estimate using all four items ($\hat{\beta}_{\hat{X}}$), the correlation-corrected estimate ($\hat{\beta}^{*}$), and the MOC estimate (using an item-response model). See Tables \ref{tab:cIVvOLS_ANES-Knowledge_Part1} and \ref{tab:cIVvuIV_ANES-Knowledge_Part1} for additional results varying $N$.
  • Figure 2: Simulation results varying $N$ and $M$, comparing the performance of the estimators in the simulations described in Section 5.
  • Figure A.I.1: This figure depicts the value of the factor $\frac{(1+\sigma^2_{U_2})^{\frac{1}{4}}}{(1+\sigma^2_{U_1})^{\frac{1}{4}}}$ from Equation \ref{eq:correlation-correction-different-variances2} as a function of $\sigma^2_{U_1}$ and $\sigma^2_{U_2}$. The left pane shows the values of this factor when each of these variances ranges between 0 and 1, while the right pane shows values as the variances range between 0 and 10.
  • Figure A.I.2: This plot displays the value of the factor $\left[ \frac{(1+\sigma^2_{U_2})^{\frac{1}{4}}}{(1+\sigma^2_{U_1})^{\frac{1}{4}}} + \frac{(1+\sigma^2_{U_1})^{\frac{1}{4}}}{(1+\sigma^2_{U_2})^{\frac{1}{4}}} \right] /2$ from Equation \ref{eq:factor-averaged} as a function of $\sigma^2_{U_1}$ and $\sigma^2_{U_2}$. The left pane shows the values of this factor when each of these variances ranges between 0 and 1, while the right pane shows values as the variances range between 0 and 10.
  • Figure A.III.1: Illustrative simulation results. The true coefficient (0.40) is highlighted in green. The mean of the estimates for each approach is highlighted in red.
  • ...and 6 more figures
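
The factors plotted in Figures A.I.1 and A.I.2 can be recovered with a short calculation. The following is a sketch under assumptions that are not stated in the captions themselves: the latent trait $X$ has unit variance, each split-half estimate is the standardized noisy measure $\hat{X}_j = (X + U_j)/\sqrt{1+\sigma^2_{U_j}}$ with errors $U_1, U_2$ independent of $X$, of the outcome error, and of each other, and the outcome follows $Y = \beta X + \varepsilon$. The symbol $\hat{\beta}_j$, denoting the OLS slope of $Y$ on $\hat{X}_j$, is notation introduced here.

$$
\begin{aligned}
\operatorname{plim} \hat{\beta}_1 &= \frac{\operatorname{Cov}(Y, \hat{X}_1)}{\operatorname{Var}(\hat{X}_1)} = \frac{\beta}{(1+\sigma^2_{U_1})^{\frac{1}{2}}}, \\
\operatorname{Corr}(\hat{X}_1, \hat{X}_2) &= \frac{1}{(1+\sigma^2_{U_1})^{\frac{1}{2}} (1+\sigma^2_{U_2})^{\frac{1}{2}}}, \\
\operatorname{plim} \frac{\hat{\beta}_1}{\sqrt{\operatorname{Corr}(\hat{X}_1, \hat{X}_2)}} &= \beta \, \frac{(1+\sigma^2_{U_2})^{\frac{1}{4}}}{(1+\sigma^2_{U_1})^{\frac{1}{4}}}.
\end{aligned}
$$

Under these assumptions, the multiplier on $\beta$ in the last line has the same form as the factor shown in Figure A.I.1, and it equals 1 when $\sigma^2_{U_1} = \sigma^2_{U_2}$. Averaging this expression with its counterpart for $\hat{\beta}_2$ gives a multiplier of the form shown in Figure A.I.2, which is never below 1 and stays close to 1 when the two error variances are similar.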