Population-Scale Network Embeddings Expose Educational Divides in Network Structure Related to Right-Wing Populist Voting

Malte Lüken, Javier Garcia-Bernardo, Sreeparna Deb, Flavio Hafner, Megha Khosla

Abstract

Administrative registry data can be used to construct population-scale networks whose ties reflect shared social contexts between persons. With machine learning, such networks can be encoded into numerical representations -- embeddings -- that automatically capture an individual's position within the network. We created embeddings for all persons in the Dutch population from a population-scale network that represents five shared contexts: neighborhood, work, family, household, and school. To assess the informativeness of these embeddings, we used them to predict right-wing populist voting. Embeddings alone predicted right-wing populist voting above chance level but performed worse than individual characteristics. Combining the best subset of embeddings with individual characteristics only slightly improved predictions. After transforming the embeddings to make their dimensions more sparse and orthogonal, we found that one embedding dimension was strongly associated with the outcome. Mapping this dimension back to the population network revealed that differences in educational ties and attainment corresponded to distinct network structures associated with right-wing populist voting. Our study contributes methodologically by demonstrating how population-scale network embeddings can be made interpretable, and substantively by linking structural network differences in education to right-wing populist voting.

Paper Structure

This paper contains 25 sections, 9 equations, 6 figures.

Figures (6)

  • Figure 1: Illustration of our approach. A: We created node embeddings for all persons in the population network (here displayed with four dimensions). Persons who could be linked to their voting behavior are indicated with a dashed contour. B: For those individuals, we predicted right-wing populist voting using network embeddings and individual characteristics. We identified one dimension (underlined) that was highly predictive of right-wing populist voting. C: We computed edge utilities for the entire population network, indicating the importance of an edge for predicting right-wing populist voting. When two connected nodes have similar embedding dimensions and a higher value in the populist dimension than in the other dimensions, edge utility is positive. When they have similar embedding dimensions and a relatively lower value in the populist dimension, edge utility is negative. When they have similar embedding dimensions but the populist dimension has a value similar to the other dimensions, or when their embedding dimensions are completely dissimilar, edge utility is zero.
  • Figure 2: Out-of-sample prediction performance for right-wing populist voting. Performance was measured with the macro AUC score (on the y-axis). The x-axis shows different feature sets used for prediction. Points indicate the posterior predictive mean. Vertical bars indicate 95% credible intervals. Colors indicate whether the predictions were made for all scores (purple) or only for a subset of the best-performing prediction models (XGBoost and embeddings with 100 walks per node of length 20 and 32 embedding dimensions, green).
  • Figure 3: Importance of the 10 most important variables predicting populist voting behavior in the LISS panel data. Importance was quantified with SHAP values for each predictor and observation. Individual SHAP values were aggregated for each decile to guarantee privacy of the panel subjects. Each point represents the average SHAP value of a decile. Color indicates the average SHAP value in the decile, normalized between zero and one. Grey-colored points indicate that the average value could not be published to prevent group disclosure. Predictors on the y-axis are ordered according to the mean of their absolute decile-averaged SHAP values, which are displayed on the right side of each panel. Panels contain results for predictor sets that included (A) only covariates, (B) only embeddings, or (C) embeddings and covariates. For (B--C), the results with untransformed versus DINE-transformed embeddings are shown in the two columns.
  • Figure 4: Edge utility scores (x-axis) for different network relation types (y-axis) and relation groups. Bold vertical bars indicate means and boxes span $\pm$ 2 standard deviations from the mean of the edge utility distributions of (A) neighbors, colleagues, and households, (B) classmates, and (C) family relations. The size of the circle on the right of each box indicates the number of relations of the respective type. Classmate relations typically refer to students enrolled at the same school in the same location in the same school year (but this differs per relation type). Special schools are dedicated to students with mental, physical, or learning disabilities. Examples of institutional households are elderly homes, student dorms, and prisons. For neighbor relations, "200 meters" refers to 20 random persons living within a 200-meter radius and "10 closest" refers to all persons living at the 10 closest addresses. HH = household. By-m. = related by marriage.
  • Figure 5: Edge utility strength for different person-level variables: highest achieved education level (A), the number of parents born outside the Netherlands (B), and gender (C). Bold vertical bars indicate means and boxes span $\pm$ 2 standard deviations from the mean of the edge utility strength distributions. The size of the circle on the right of each box indicates the number of persons in the respective category. The categories are mutually exclusive within each panel but not across panels.
  • ...and 1 more figures
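
The qualitative behavior of edge utility described in the Figure 1 caption can be illustrated with a small toy sketch. The formula below is an assumption made for illustration only and is not the definition used in the paper: it takes the contribution of one dimension to the dot product of two node embeddings and subtracts the average contribution of the remaining dimensions. The function name `edge_utility`, the index `POPULIST_DIM`, the four-dimensional vectors, and the use of non-negative embedding values are all hypothetical.

```python
import numpy as np

def edge_utility(x_u, x_v, dim):
    """Toy edge utility for one embedding dimension (illustrative assumption,
    not the paper's definition)."""
    x_u = np.asarray(x_u, dtype=float)
    x_v = np.asarray(x_v, dtype=float)
    # Per-dimension contribution to the dot product of the two embeddings.
    contrib = x_u * x_v
    # Average contribution of all dimensions other than `dim`.
    others = np.delete(contrib, dim)
    return float(contrib[dim] - others.mean())

POPULIST_DIM = 0  # hypothetical index of the "populist" dimension

# Similar embeddings, populist dimension higher than the others.
similar_high = edge_utility([0.9, 0.2, 0.2, 0.2], [0.9, 0.2, 0.2, 0.2], POPULIST_DIM)
# Similar embeddings, populist dimension lower than the others.
similar_low = edge_utility([0.1, 0.6, 0.6, 0.6], [0.1, 0.6, 0.6, 0.6], POPULIST_DIM)
# Similar embeddings, populist dimension equal to the others.
similar_flat = edge_utility([0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5], POPULIST_DIM)
# Completely dissimilar embeddings (no shared active dimension).
dissimilar = edge_utility([1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], POPULIST_DIM)

print(similar_high, similar_low, similar_flat, dissimilar)
```

Under these assumptions, the four cases reproduce the sign pattern in the caption: positive, negative, zero, and zero. The sketch relies on non-negative embedding values so that dissimilar nodes contribute nothing in any dimension.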