Do covariates explain why these groups differ? The choice of reference group can reverse conclusions in the Oaxaca-Blinder decomposition

Manuel Quintero, Advik Shreekumar, William T. Stephenson, Tamara Broderick

Abstract

Scientists often want to explain why an outcome differs between two groups. For instance, differences in patient mortality rates across two hospitals could be due to differences in the patients themselves (covariates) or differences in medical care (outcomes given covariates). The Oaxaca-Blinder decomposition (OBD) is a standard tool for teasing apart these factors. It is well known that the OBD requires choosing one of the groups as a reference, and that the numerical answer can vary with this choice. To the best of our knowledge, there has been no systematic investigation into whether the choice of OBD reference can yield different substantive conclusions, or into how common this issue is. In the present paper, we give existence proofs in real and simulated data that different OBD reference choices can yield substantively different conclusions, and that these differences are not driven entirely by model misspecification or small data. We prove that substantively different conclusions occur in up to half of the parameter space, but find these discrepancies rare in the real-data analyses we study. We explain this empirical rarity by examining how realistic data-generating processes can be biased towards parameters that do not change conclusions under the OBD.
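To make the reference dependence concrete, here is a minimal sketch of a standard twofold OBD with one covariate, fit by ordinary least squares. The function names (`ols_fit`, `twofold_obd`), the noise-free toy data, and the convention that the "reference" group supplies the coefficients used to value the covariate gap are assumptions for illustration only; this is not the authors' code, and conventions for naming the reference vary across the literature. Both reference choices reproduce the same total gap, yet the unexplained component changes sign.

```python
import numpy as np


def ols_fit(x, y):
    """Fit y = b0 + b1 * x by OLS; return (coefficients, covariate means incl. intercept)."""
    X = np.column_stack([np.ones(len(x)), x])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X.mean(axis=0)


def twofold_obd(x_a, y_a, x_b, y_b, reference):
    """Hypothetical helper: twofold Oaxaca-Blinder decomposition of mean(y_a) - mean(y_b).

    Assumed convention: `reference` names the group whose coefficients value the
    difference in covariate means. Returns (gap, explained, unexplained).
    """
    beta_a, xbar_a = ols_fit(x_a, y_a)
    beta_b, xbar_b = ols_fit(x_b, y_b)
    gap = np.mean(y_a) - np.mean(y_b)
    if reference == "B":
        explained = (xbar_a - xbar_b) @ beta_b
        unexplained = xbar_a @ (beta_a - beta_b)
    elif reference == "A":
        explained = (xbar_a - xbar_b) @ beta_a
        unexplained = xbar_b @ (beta_a - beta_b)
    else:
        raise ValueError("reference must be 'A' or 'B'")
    return gap, explained, unexplained


# Hypothetical noise-free toy data: group A follows y = 2x, group B follows y = 2 + x.
x_a = np.array([2.0, 3.0, 4.0])
y_a = 2.0 * x_a
x_b = np.array([0.0, 1.0, 2.0])
y_b = 2.0 + x_b

for ref in ("B", "A"):
    gap, expl, unexpl = twofold_obd(x_a, y_a, x_b, y_b, ref)
    print(f"reference {ref}: gap={gap:.2f} explained={expl:.2f} unexplained={unexpl:.2f}")
# reference B: gap=3.00 explained=2.00 unexplained=1.00
# reference A: gap=3.00 explained=4.00 unexplained=-1.00
```

In this sketch the two unexplained components differ by exactly (x̄_A − x̄_B)·(β_A − β_B), so a sign flip can only arise when both the covariate means and the coefficients differ between groups.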

Paper Structure

This paper contains 28 sections, 5 theorems, 33 equations, 5 figures, 8 tables, 1 algorithm.

Key Result

Proposition 5.3

The sign of the OBD unexplained component flips between the two reference choices if and only if the following two conditions hold:

Figures (5)

  • Figure 1: Linear fit of mortality on admission heart rate for each group when holding all other covariates fixed at their group means.
  • Figure 2: Population linear models of SBP on BMI by group and data samples.
  • Figure 3: Percentage of parameter space leading to sign flips. See the remark on computing the Irwin-Hall distribution for how we compute the percentage for the unexplained component.
  • Figure 4: Percentage of parameter space leading to unexplained component sign flips, without covariate standardization.
  • Figure 5: Histogram of the mortality outcome for the subgroup in the second heart-rate (HR) quartile.

Theorems & Definitions (12)

  • Definition 5.1
  • Remark 5.2
  • Definition 5.3
  • Proposition 5.3: OBD unexplained sign flips
  • Proposition 5.4
  • Proposition 5.5
  • Remark 5.6
  • Proposition 5.7
  • Proof
  • Lemma 1.1
  • ...and 2 more