ML-EAT: A Multilevel Embedding Association Test for Interpretable and Transparent Social Science

Robert Wolfe, Alexis Hiniker, Bill Howe

TL;DR

ML-EAT is a three-level, interpretable Embedding Association Test framework that disaggregates bias into level-specific statistics, supplemented by EAT-Maps and a nine-pattern taxonomy for visualizing and interpreting intrinsic biases in language technologies. Applied to GloVe, HistWords, GPT-2, and CLIP, the approach reveals richer bias structure than the traditional WEAT, including historical shifts, prompting effects, and model anisotropy, thereby enhancing transparency in computational social science. The method generalizes SC-EAT, supports alternative association measures, and offers practical guidance for robust bias analysis and reporting, with implications for dataset curation and responsible AI deployment. Overall, ML-EAT makes bias measurement more observable and interpretable across static, diachronic, and multimodal embeddings, giving researchers and policymakers a concrete, operational toolkit.

Abstract

This research introduces the Multilevel Embedding Association Test (ML-EAT), a method designed for interpretable and transparent measurement of intrinsic bias in language technologies. The ML-EAT addresses issues of ambiguity and difficulty in interpreting the traditional EAT measurement by quantifying bias at three levels of increasing granularity: the differential association between two target concepts with two attribute concepts; the individual effect size of each target concept with two attribute concepts; and the association between each individual target concept and each individual attribute concept. Using the ML-EAT, this research defines a taxonomy of EAT patterns describing the nine possible outcomes of an embedding association test, each of which is associated with a unique EAT-Map, a novel four-quadrant visualization for interpreting the ML-EAT. Empirical analysis of static and diachronic word embeddings, GPT-2 language models, and a CLIP language-and-image model shows that EAT patterns add otherwise unobservable information about the component biases that make up an EAT; reveal the effects of prompting in zero-shot models; and can also identify situations when cosine similarity is an ineffective metric, rendering an EAT unreliable. Our work contributes a method for rendering bias more observable and interpretable, improving the transparency of computational investigations into human minds and societies.
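
The three levels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the use of cosine similarity as the association measure, and the pooled-standard-deviation normalization at Level 2 are assumptions made here for illustration, and the paper's exact formulas may differ. Each argument is assumed to be a NumPy array with one embedding per row, where `X` and `Y` are the two target groups and `A` and `B` are the two attribute groups.

```python
import numpy as np

def cosine_matrix(U, V):
    """Pairwise cosine similarity between the rows of U and the rows of V."""
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return U @ V.T

def level1_effect_size(X, Y, A, B):
    """Level 1: differential association of targets X, Y with attributes A, B.

    This follows the familiar WEAT-style effect size: the difference between
    the mean per-word associations of X and Y, divided by the standard
    deviation of the per-word associations over X and Y together.
    """
    s_X = cosine_matrix(X, A).mean(axis=1) - cosine_matrix(X, B).mean(axis=1)
    s_Y = cosine_matrix(Y, A).mean(axis=1) - cosine_matrix(Y, B).mean(axis=1)
    pooled = np.concatenate([s_X, s_Y])
    return (s_X.mean() - s_Y.mean()) / pooled.std(ddof=1)

def level2_effect_size(T, A, B):
    """Level 2: effect size of ONE target group T with attributes A and B.

    Assumed form (hypothetical): difference between the mean similarity of T
    to A and of T to B, normalized by the pooled standard deviation of the
    per-word mean similarities.
    """
    to_A = cosine_matrix(T, A).mean(axis=1)
    to_B = cosine_matrix(T, B).mean(axis=1)
    pooled = np.concatenate([to_A, to_B])
    return (to_A.mean() - to_B.mean()) / pooled.std(ddof=1)

def level3_association(T, G):
    """Level 3: association of one target group T with one attribute group G,
    summarized here as the mean and standard deviation of all pairwise
    cosine similarities."""
    sims = cosine_matrix(T, G)
    return sims.mean(), sims.std(ddof=1)
```

Under these assumptions, one test yields a single Level 1 value, two Level 2 values (`level2_effect_size(X, A, B)` and `level2_effect_size(Y, A, B)`), and four Level 3 summaries, one for each target-attribute pair.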

Paper Structure

This paper contains 34 sections, 11 equations, 3 figures, 12 tables.

Figures (3)

  • Figure 1: A visualization of the ML-EAT applied to the Career/Family and Young/Old EATs introduced by Caliskan et al. (2017). Where traditional EATs return a single effect size and $p$-value, the ML-EAT surfaces underlying patterns of bias in individual target group associations with the A or B attribute. A taxonomy of EAT patterns and EAT-Map visualizations provide a categorical and visual vocabulary for describing differences between EATs.
  • Figure 2: A taxonomy of EAT patterns to describe the associations of an EAT's target groups with its two attribute groups. Each pattern has a unique EAT-Map formed by shading cells corresponding to significant Level 2 tests, with target groups on the X-axis and attributes on the Y-axis.
  • Figure 3: The ML-EAT can clarify underlying patterns of bias in studies of historical bias. While Math/Arts gender bias in the 1990s appears to return to 1920s magnitudes based on Level 1 effect sizes, Level 2 makes clear that the underlying bias pattern (Nondirectional, with two small, non-significant effect sizes) has not changed, although Math does exhibit a small, non-significant effect with Male in the 1990s.
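
The EAT-Map construction in the Figure 2 caption can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. It assumes each Level 2 test has three possible outcomes (significant toward attribute A, significant toward attribute B, or not significant), which is one reading consistent with the nine patterns the abstract mentions. The names `level2_outcome`, `eat_map`, and `all_eat_maps`, the sign convention, and the `alpha=0.05` threshold are hypothetical. Pattern names are deliberately left out, since only the Nondirectional pattern is named in the captions above.

```python
from itertools import product

# Assumed outcomes of a Level 2 test for one target group:
# "A" = significant toward attribute A, "B" = significant toward B,
# None = not significant.
OUTCOMES = ("A", "B", None)

def level2_outcome(effect_size, p_value, alpha=0.05):
    """Classify one Level 2 test. The threshold alpha is an assumption.

    Sign convention (assumed): a positive effect size means the target
    group is more associated with attribute A than with attribute B.
    """
    if p_value >= alpha:
        return None
    return "A" if effect_size > 0 else "B"

def eat_map(x_outcome, y_outcome):
    """Build a 2x2 grid of booleans, True meaning a shaded cell.

    Rows are attributes (A, then B); columns are target groups (X, then Y),
    matching the axes described in the Figure 2 caption.
    """
    return [[x_outcome == attr, y_outcome == attr] for attr in ("A", "B")]

def all_eat_maps():
    """Enumerate every combination of the two Level 2 outcomes."""
    return {(x, y): eat_map(x, y) for x, y in product(OUTCOMES, repeat=2)}

def render(grid):
    """Render a grid as text, using '#' for shaded cells and '.' otherwise."""
    rows = ["  X Y"]
    for attr, row in zip(("A", "B"), grid):
        rows.append(attr + " " + " ".join("#" if cell else "." for cell in row))
    return "\n".join(rows)
```

Under these assumptions, the two Level 2 outcomes give 3 × 3 = 9 distinct grids. The grid with no shaded cells corresponds to two non-significant Level 2 tests, which is how the Figure 3 caption describes the Nondirectional pattern.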