ML-EAT: A Multilevel Embedding Association Test for Interpretable and Transparent Social Science
Robert Wolfe, Alexis Hiniker, Bill Howe
TL;DR
ML-EAT introduces a three-level, interpretable Embedding Association Test framework that disaggregates bias into level-specific statistics, supplemented by EAT-Maps and a nine-pattern taxonomy for visualizing and interpreting intrinsic biases in language technologies. Applied to GloVe, HistWords, GPT-2, and CLIP, the approach reveals richer bias structure than the traditional WEAT, including historical shifts, prompting effects, and model anisotropy, thereby enhancing transparency in computational social science. The method generalizes SC-EAT, supports alternative association measures, and offers practical guidance for robust bias analysis and reporting, with implications for dataset curation and responsible AI deployment. Overall, ML-EAT makes bias measurement more observable and interpretable across static, diachronic, and multimodal embeddings, contributing a concrete, operational toolkit for researchers and policymakers.
Abstract
This research introduces the Multilevel Embedding Association Test (ML-EAT), a method designed for interpretable and transparent measurement of intrinsic bias in language technologies. The ML-EAT addresses issues of ambiguity and difficulty in interpreting the traditional EAT measurement by quantifying bias at three levels of increasing granularity: the differential association between two target concepts with two attribute concepts; the individual effect size of each target concept with two attribute concepts; and the association between each individual target concept and each individual attribute concept. Using the ML-EAT, this research defines a taxonomy of EAT patterns describing the nine possible outcomes of an embedding association test, each of which is associated with a unique EAT-Map, a novel four-quadrant visualization for interpreting the ML-EAT. Empirical analysis of static and diachronic word embeddings, GPT-2 language models, and a CLIP language-and-image model shows that EAT patterns add otherwise unobservable information about the component biases that make up an EAT; reveal the effects of prompting in zero-shot models; and can also identify situations when cosine similarity is an ineffective metric, rendering an EAT unreliable. Our work contributes a method for rendering bias more observable and interpretable, improving the transparency of computational investigations into human minds and societies.
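
The three levels described above can be illustrated with a minimal sketch. The abstract does not state the formulas, so the choices here are assumptions: Level 1 uses the standard WEAT effect size with cosine similarity; Level 2 is one plausible set-level generalization of the SC-EAT effect size, standardized over the attribute words; Level 3 is the mean cosine similarity of each target set with each attribute set. The names `cosine_matrix` and `ml_eat_sketch` are hypothetical and do not come from the paper.

```python
import numpy as np

def cosine_matrix(U, V):
    """Pairwise cosine similarities between the rows of U and the rows of V."""
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return U @ V.T

def ml_eat_sketch(X, Y, A, B):
    """Illustrative three-level statistics for targets X, Y and attributes A, B.

    Each argument is a 2-D array with one embedding per row.
    The formulas are assumptions, not the paper's stated definitions.
    """
    sims = {
        "XA": cosine_matrix(X, A), "XB": cosine_matrix(X, B),
        "YA": cosine_matrix(Y, A), "YB": cosine_matrix(Y, B),
    }

    # Level 3 (assumed): mean similarity of each target set with each attribute set.
    level3 = {pair: float(m.mean()) for pair, m in sims.items()}

    # Per-word differential association s(w, A, B), as in the standard WEAT.
    s_X = sims["XA"].mean(axis=1) - sims["XB"].mean(axis=1)
    s_Y = sims["YA"].mean(axis=1) - sims["YB"].mean(axis=1)

    # Level 1: WEAT effect size (sample standard deviation over all target words).
    level1 = float((s_X.mean() - s_Y.mean())
                   / np.std(np.concatenate([s_X, s_Y]), ddof=1))

    # Level 2 (assumed): effect size of one target set with the two attribute
    # sets, standardized over the per-attribute-word mean similarities.
    def set_effect(TA, TB):
        per_attr = np.concatenate([TA.mean(axis=0), TB.mean(axis=0)])
        return float((TA.mean() - TB.mean()) / np.std(per_attr, ddof=1))

    level2 = {"X": set_effect(sims["XA"], sims["XB"]),
              "Y": set_effect(sims["YA"], sims["YB"])}

    return {"level1": level1, "level2": level2, "level3": level3}

# Toy 2-D vectors (made up for illustration): X leans toward A, Y toward B.
A = np.array([[1.0, 0.0], [1.0, 0.1]])
B = np.array([[0.0, 1.0], [0.1, 1.0]])
X = np.array([[1.0, 0.2], [0.9, 0.1]])
Y = np.array([[0.2, 1.0], [0.1, 0.9]])
result = ml_eat_sketch(X, Y, A, B)
```

In this toy setup the Level 1 statistic is positive, while the Level 2 values show which target set drives it and the Level 3 values show the four underlying associations. This is the kind of disaggregation the abstract describes.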
