Interplay between social contact and media exposure in the overestimation of racial diversity in the U.S.

Clara Eminente, Henrik Olsson, Ljubica Nedelkoska, Rafael Prieto-Curiel, Mirta Galesic, Elisa Omodei

Abstract

The general population systematically overestimates the size of minority groups, yet how these misperceptions vary across racial groups and geographical scales remains poorly understood. Using a purpose-built survey of the U.S. population, we examine overestimation of people of color (PoC) communities across four nested geographical scales: neighborhood, city, state, and nation. Our results demonstrate that overestimation is both scale- and group-dependent: the probability of overestimation increases progressively from local to national levels, and people of color overestimate their own group size more frequently than white people do at both the neighborhood and national levels. Among white respondents, we identify a scale-dependent divide in exposure mechanisms: direct interethnic social contact is the primary correlate of overestimation at local levels, whereas perceived frequency of coverage of people of color in news dominates at the national level. Furthermore, across both groups, frequent news consumption is associated with reduced rates of overestimation, while frequent social media use is associated with higher rates. These findings suggest that overestimation is real and present across scales and groups. This in turn can foster an 'illusion of diversity', potentially undermining support for equity-promoting policies by creating the erroneous belief that representation goals have already been achieved.

Paper Structure

This paper contains 8 sections, 6 equations, 17 figures, 13 tables.

Figures (17)

  • Figure 1: Geographical patterns in the overestimation of the size of the PoC population and sample representativeness. Panel a: Bootstrapped fraction of respondents who overestimate the size of the PoC population at each geographical resolution, split by racial group. Each boxplot is obtained by creating $1000$ bootstrapped samples of the labels assigned to respondents at that geographical level (neighborhood, town or city, state, and country) and, for each sample, computing the fraction of respondents who overestimate. The lower plot shows the magnitude of the difference between the distributions being compared, using Cliff's delta, a non-parametric effect size measure ranging from $-1$ to $1$. In this case, a positive value indicates that the white population tends to overestimate more often than the PoC population does, whereas a negative value indicates the opposite. Absolute values of $\delta$ between $0.5$ and $1$ indicate strong differences between the compared distributions (white background), values between $0.3$ and $0.5$ indicate moderate differences (light grey background), and values $|\delta|<0.3$ indicate negligible differences (dark grey background), often corresponding to non-statistically significant Mann-Whitney U tests (indicated by a cross in the plot). The dashed line at $f_{over}=0.5$ marks the threshold above which the majority of respondents are overestimating the size of PoC communities. Panels b-c: Representativeness of our sample in terms of gender, age, race of the respondents, and in terms of their geographical distribution. In panel b, we compare the quotas of each group in our sample and in the U.S. census, finding good alignment. Only binary gender identity is included due to sample size limitations, as further detailed in the data cleaning section of the Supplementary Information. In panel c, we overlay the population density distribution of the U.S. with that of our respondents. We exclude non-continental areas and Alaska for visualization purposes.
  • Figure 2: Probability of overestimation against exposure through social contact. Upper plots: distributions of the fraction of overestimating respondents (white on the left, PoC on the right) when estimating the size of the PoC population at different geographical levels. Respondents are split according to the composition of their social circle. Lower plots: magnitude of the difference between the compared distributions using Cliff's delta.
  • Figure 3: Probability of overestimation against news coverage about people of color. Panels a-b: distributions of the fraction of respondents who overestimate the size of the PoC population at different geographical levels, split according to the perceived frequency of news coverage about people of color. Panels c-d: distributions split according to the perceived tone of news coverage about people of color. Respondents are separated by racial group (white on the left, PoC on the right). The lower plots show the magnitude of the difference between the compared distributions using Cliff's delta.
  • Figure 4: Probability of overestimation against news consumption frequency and social media use. The upper plots in each panel show the distributions of the fraction of respondents who overestimate the size of the PoC population at different geographical levels. Respondents are first split by race (white on the left, Panels a and c; PoC on the right, Panels b and d), and then further split according to how often they consume news (Panels a, b) and how often they use social media (Panels c, d). The lower plots show the magnitude of the difference between the compared distributions using Cliff's delta.
  • Figure 5: Variable importance in the random forest models. A separate random forest classifier is trained for each combination of the respondent's race and geographical level. Each plot displays the relative importance of the model's predictor variables, quantified using SHAP values. Variables in bold are those examined individually in previous sections. Each point represents a single observation in the test set (i.e., a respondent's estimate at a given geographical level), where the position along the horizontal axis indicates the corresponding SHAP value, reflecting the magnitude and direction of that feature's contribution to classifying the observation as an overestimation. Point color encodes the feature value: blue denotes low values and red denotes high values of the respective variable. As an illustrative example, in the upper right panel (PoC respondents at the neighborhood level), respondents with higher levels of education (red points) are associated with negative SHAP values, whereas those with lower levels of education (blue points) exhibit positive SHAP values. This indicates that higher educational attainment is negatively associated with overestimation, while lower educational attainment shows the opposite relationship. Education ranks first among all predictor variables in this panel, indicating that it exerts the greatest overall influence (measured as the mean absolute SHAP value) on the model's prediction of overestimation, regardless of direction. Plot titles report the Precision-Recall Area Under the Curve (PR-AUC) and the positive class prevalence. Models outperform the baselines when PR-AUC $> f_{over}$.
  • ...and 12 more figures
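
The bootstrap and effect-size procedure described in the Figure 1 caption can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the synthetic 0/1 labels, and the handling of the exact boundary values $0.3$ and $0.5$ are assumptions made for illustration. Only the number of resamples ($1000$), the sign convention (positive when white respondents overestimate more often), and the magnitude bands come from the caption.

```python
import random


def bootstrap_fractions(labels, n_boot=1000, seed=0):
    """Resample 0/1 overestimation labels with replacement and return
    the fraction of overestimators (1s) in each bootstrapped sample."""
    rng = random.Random(seed)
    n = len(labels)
    return [sum(rng.choices(labels, k=n)) / n for _ in range(n_boot)]


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all (x, y) pairs.
    Ranges from -1 to 1; positive when xs tend to be larger than ys."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def magnitude(delta):
    """Bands from the caption; treating exactly 0.3 and 0.5 as the
    start of the higher band is an assumption."""
    a = abs(delta)
    if a >= 0.5:
        return "strong"
    if a >= 0.3:
        return "moderate"
    return "negligible"


# Synthetic labels (1 = respondent overestimates), NOT survey data.
white_labels = [1] * 55 + [0] * 45
poc_labels = [1] * 70 + [0] * 30

# 200 resamples keep the pairwise comparison fast; the caption uses 1000.
white_fracs = bootstrap_fractions(white_labels, n_boot=200, seed=1)
poc_fracs = bootstrap_fractions(poc_labels, n_boot=200, seed=2)

# White distribution first, so a positive delta means white respondents
# overestimate more often, matching the caption's sign convention.
delta = cliffs_delta(white_fracs, poc_fracs)
print(f"Cliff's delta = {delta:.3f} ({magnitude(delta)})")
```

The pairwise loop costs one comparison per pair of bootstrapped fractions, which is acceptable for a thousand resamples per group. A rank-based formulation would scale better for larger inputs.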