BERT's Conceptual Cartography: Mapping the Landscapes of Meaning
Nina Haket, Ryan Daniels
TL;DR
This work operationalizes Conceptual Engineering (CE) by constructing conceptual landscapes that map the pragmatic usage of words through BERT-based contextual embeddings drawn from the spoken component of the British National Corpus. It combines PCA dimensionality reduction, Gaussian Mixture Models, and a suite of metrics (MEV, self-similarity, intra- and inter-group similarity) with qualitative analysis to reveal word-specific, context-driven landscapes. The findings show substantial variability across words and even within a single lemma, underscoring the need for word-by-word CE strategies rather than one-size-fits-all approaches. The methodology offers a framework for quantifying lexical landscapes that can inform ethical language design and downstream NLP tasks such as bias detection and sentiment analysis.
Abstract
Conceptual engineers want to make words better. However, they often underestimate how varied our usage of words is. In this paper, we take the first steps in exploring the contextual nuances of words by creating conceptual landscapes -- 2D surfaces representing the pragmatic usage of words -- that conceptual engineers can use to inform their projects. We use the spoken component of the British National Corpus and BERT to create contextualised word embeddings, and use Gaussian Mixture Models, a selection of metrics, and qualitative analysis to visualise and numerically represent lexical landscapes. Such an approach has not yet been used in the conceptual engineering literature. It provides a detailed examination of how different words manifest in various contexts, which is potentially useful to conceptual engineering projects. Our findings highlight the inherent complexity of conceptual engineering, revealing that each word exhibits a unique and intricate landscape. Conceptual engineers therefore cannot use a one-size-fits-all approach when improving words -- a task that may be practically intractable at scale.
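To make the similarity metrics concrete, the following is a minimal sketch of how self-similarity and intra- and inter-group similarity could be computed over a word's contextual embeddings. It rests on assumptions that are not confirmed by the text above: self-similarity is taken to be the mean pairwise cosine similarity across all occurrences of a word, and the groups are taken to be cluster labels such as those a Gaussian Mixture Model might assign. The function names, the toy 2D vectors, and the labels are all hypothetical, and the paper's exact formulas may differ.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def _mean(values):
    """Arithmetic mean, or NaN when there are no values to average."""
    return sum(values) / len(values) if values else float("nan")


def self_similarity(vectors):
    """Mean pairwise cosine similarity over all occurrences of one word.

    Assumed definition: higher values mean the word is used more
    uniformly across contexts.
    """
    return _mean([cosine(u, v) for u, v in combinations(vectors, 2)])


def group_similarities(vectors, labels):
    """Return (intra, inter) mean cosine similarity.

    Pairs sharing a label count as intra-group; pairs with different
    labels count as inter-group. Labels are assumed to come from some
    clustering step (hypothetical here).
    """
    intra, inter = [], []
    for (u, label_u), (v, label_v) in combinations(zip(vectors, labels), 2):
        (intra if label_u == label_v else inter).append(cosine(u, v))
    return _mean(intra), _mean(inter)


# Toy 2D stand-ins for contextual embeddings of one word (not real BERT output).
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
labels = [0, 0, 1, 1]  # hypothetical cluster assignments

overall = self_similarity(embeddings)
intra, inter = group_similarities(embeddings, labels)
print(overall, intra, inter)
```

In this toy case the two groups are internally identical and mutually orthogonal, so intra-group similarity is high while inter-group and overall self-similarity are low. That is the kind of pattern one would expect from a word whose usage splits into distinct contexts.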
