Effect of dimensionality change on the bias of word embeddings
Rohit Raj Rai, Amit Awekar
TL;DR
This paper examines how changing the dimensionality of word embeddings influences the bias in the resulting representations, a question that has not yet been thoroughly explored. It applies WEAT (the Word Embedding Association Test) to static word embedding methods (WEMs) and C-WEAT to contextualized WEMs, using English Wikipedia, across dimensionalities from $20$ to $1000$ for static models and from $128$ to $1024$ for contextual models. The results show that bias changes significantly with dimensionality and that the change follows no uniform pattern across target/attribute groups or WEM types. The findings highlight the importance of accounting for dimensionality when deploying embeddings in real-world systems and motivate future work that assesses this bias variation on downstream NLP tasks.
Abstract
Word embedding methods (WEMs) are extensively used for representing text data. The dimensionality of these embeddings varies across tasks and implementations. The effect of dimensionality change on the accuracy of downstream tasks is a well-explored question. However, how dimensionality change affects the bias of word embeddings remains to be investigated. Using the English Wikipedia corpus, we study this effect for two static (Word2Vec and fastText) and two context-sensitive (ELMo and BERT) WEMs. We make two observations. First, the bias of word embeddings varies significantly with dimensionality. Second, there is no uniformity in how a change in dimensionality affects the bias of word embeddings. These factors should be considered when selecting the dimensionality of word embeddings.
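
For readers unfamiliar with WEAT, the bias measure named in the TL;DR, the following is a minimal sketch of its effect size as commonly defined (the difference in mean association of two target word sets with two attribute word sets, divided by the standard deviation over all target words). It is not the authors' code: the function names, the use of NumPy, and the choice of the sample standard deviation (`ddof=1`) are assumptions made for illustration.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return (np.mean([cosine(w, a) for a in A])
            - np.mean([cosine(w, b) for b in B]))


def weat_effect_size(X, Y, A, B):
    """WEAT effect size for target sets X, Y and attribute sets A, B.

    Each argument is a sequence of embedding vectors of equal dimensionality.
    Assumption: the denominator uses the sample standard deviation (ddof=1).
    """
    s_X = np.array([association(x, A, B) for x in X])
    s_Y = np.array([association(y, A, B) for y in Y])
    pooled = np.concatenate([s_X, s_Y])
    return float((s_X.mean() - s_Y.mean()) / pooled.std(ddof=1))
```

In a dimensionality study of this kind, one would presumably compute this score for the same word sets using embeddings trained at each dimensionality and compare the resulting values; the sketch covers only the scoring step for static embeddings, not training or the contextualized C-WEAT variant.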
