Uncovering Implicit Bias in Large Language Models with Concept Learning Dataset

Leroy Z. Wang

TL;DR

The paper introduces a concept-learning dataset and uses in-context concept learning to probe implicit biases in how large language models reason about mathematical concepts. It reports a bias toward upward monotone quantifiers during concept learning, which is less apparent when the semantics are stated explicitly, suggesting that in-context probing can surface hidden biases that are less visible under standard prompts. The approach offers a practical tool for bias discovery and highlights how training data and task design can shape biased reasoning in LLMs. These findings have implications for bias mitigation and for evaluating mathematical reasoning in LLMs.

Abstract

We introduce a dataset of concept learning tasks that helps uncover implicit biases in large language models. Using in-context concept learning experiments, we find that language models may have a bias toward upward monotonicity in quantifiers; this bias is less apparent when the model is tested by direct prompting without concept learning components. This demonstrates that in-context concept learning can be an effective way to discover hidden biases in language models.
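
To make the contrast between the two evaluation modes concrete, here is a minimal sketch in Python. It is not the paper's dataset or prompt format: the function names, the "items are red" wording, the total of 40 items, and the yes/no answer format are all hypothetical choices made for illustration. The concept-learning prompt shows only labeled examples of a hidden concept, while the explicit prompt states the quantifier directly.

```python
import random

def label(count: int, c: int, upward: bool) -> bool:
    """Hidden concept: 'more than c' (upward) or 'fewer than c' (downward)."""
    return count > c if upward else count < c

def make_examples(c: int, upward: bool, n_examples: int,
                  total: int = 40, seed: int = 0):
    """Sample counts and label them with the hidden concept (illustrative only)."""
    rng = random.Random(seed)
    counts = [rng.randint(0, total) for _ in range(n_examples)]
    return [(k, label(k, c, upward)) for k in counts]

def concept_learning_prompt(examples, query_count: int, total: int = 40) -> str:
    """Few-shot prompt: the concept is never named, only demonstrated by labels."""
    lines = [
        f"{k} of {total} items are red. Answer: {'yes' if y else 'no'}"
        for k, y in examples
    ]
    lines.append(f"{query_count} of {total} items are red. Answer:")
    return "\n".join(lines)

def explicit_prompt(c: int, upward: bool, query_count: int, total: int = 40) -> str:
    """Direct prompt: the quantifier's meaning is spelled out in words."""
    relation = "more than" if upward else "fewer than"
    return (
        f"Is it true that {relation} {c} of the {total} items are red, "
        f"given that {query_count} of {total} items are red? Answer:"
    )
```

Under these assumptions, comparing a model's accuracy on `concept_learning_prompt` for upward versus downward concepts, and then repeating the comparison with `explicit_prompt`, mirrors the kind of contrast the abstract describes.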

Paper Structure

This paper contains 12 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: In-context concept learning helps uncover bias in monotonicity that is less noticeable in standard evaluation methods. In experiments with OLMo-2 and K2, models tend to have higher accuracies with upward monotone concepts during concept learning experiments (top). However, this bias is less noticeable in explicit semantics experiments (bottom).
  • Figure 2: In-context concept learning helps uncover bias in monotonicity that is less noticeable in standard evaluation methods. In experiments with OLMo-2 and K2, models tend to have higher accuracies with upward monotone concepts during concept learning experiments (top). However, this bias is less noticeable in explicit semantics experiments (bottom).
  • Figure 3: OLMo-2-32B concept learning results with cardinal concepts, for $c \in [3, 30]$. The bias toward upward monotonicity becomes stronger as $c$ increases.
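
The captions above refer to upward monotone concepts and to cardinal concepts indexed by $c$. As a rough illustration of the terminology, the sketch below assumes that $c$ is the threshold in quantifiers such as "more than $c$" and "fewer than $c$"; this reading, and all function names, are assumptions rather than definitions taken from the paper. A cardinal quantifier depends only on a count $k$, so it is upward monotone if truth at $k$ implies truth at $k + 1$, and downward monotone if truth at $k$ implies truth at $k - 1$.

```python
from typing import Callable, List, Tuple

# A cardinal quantifier is modeled as a predicate on a count k (assumption).
Quantifier = Callable[[int], bool]

def more_than(c: int) -> Quantifier:
    """'More than c': true when the count exceeds c."""
    return lambda k: k > c

def fewer_than(c: int) -> Quantifier:
    """'Fewer than c': true when the count is below c."""
    return lambda k: k < c

def is_upward_monotone(q: Quantifier, max_count: int) -> bool:
    """Check that q(k) implies q(k + 1) for every k in [0, max_count)."""
    return all(q(k + 1) for k in range(max_count) if q(k))

def is_downward_monotone(q: Quantifier, max_count: int) -> bool:
    """Check that q(k) implies q(k - 1) for every k in [1, max_count]."""
    return all(q(k - 1) for k in range(1, max_count + 1) if q(k))

def monotonicity_table(cs: range, max_count: int = 40) -> List[Tuple[int, bool, bool]]:
    """For each c, report (c, 'more than c' is upward, 'fewer than c' is downward)."""
    return [
        (c,
         is_upward_monotone(more_than(c), max_count),
         is_downward_monotone(fewer_than(c), max_count))
        for c in cs
    ]

if __name__ == "__main__":
    # Thresholds matching the range quoted in the Figure 3 caption.
    for row in monotonicity_table(range(3, 31)):
        print(row)
```

The check is brute force over counts up to a hypothetical `max_count`, which is enough to separate the two families: "more than $c$" passes the upward test and fails the downward one, and "fewer than $c$" does the reverse.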