Beyond Bias Scores: Unmasking Vacuous Neutrality in Small Language Models

Sumanth Manduru, Carlotta Domeniconi

TL;DR

The paper addresses the fairness of small language models (0.5B–5B) by introducing VaNeu, a four-stage framework—Bias, Utility, Ambiguity Handling, and Positional Bias—for pre-deployment evaluation. It conducts a large-scale audit across nine open-source SLMs from four families using BBQ, StereoSet, and CrowS-Pairs, revealing that models with low bias can still perform poorly under ambiguity or display biased response patterns due to positional heuristics. Key findings show that the Phi family often achieves robust Utility and Ambiguity Handling with minimal Positional Bias, while other families exhibit vacuous neutrality—apparent fairness paired with unreliable reasoning. The work argues for multidimensional fairness assessment prior to deployment and highlights directions for formalizing Vacuous Neutrality and developing a composite fairness metric with practical implications for responsible use of SLMs in sensitive settings.

Abstract

The rapid adoption of Small Language Models (SLMs) for resource-constrained applications has outpaced our understanding of their ethical and fairness implications. To address this gap, we introduce the Vacuous Neutrality Framework (VaNeu), a multi-dimensional evaluation paradigm designed to assess SLM fairness prior to deployment. The framework examines model robustness across four stages (bias, utility, ambiguity handling, and positional bias) over diverse social bias categories. To the best of our knowledge, this work presents the first large-scale audit of SLMs in the 0.5B–5B parameter range, an overlooked "middle tier" between BERT-class encoders and flagship LLMs. We evaluate nine widely used SLMs spanning four model families under both ambiguous and disambiguated contexts. Our findings show that models demonstrating low bias in early stages often fail subsequent evaluations, revealing hidden vulnerabilities and unreliable reasoning. These results underscore the need for a more comprehensive understanding of fairness and reliability in SLMs, and position the proposed framework as a principled tool for responsible deployment in socially sensitive settings.

Paper Structure

This paper contains 31 sections, 5 equations, 12 figures, 11 tables.

Figures (12)

  • Figure 1: The Vacuous Neutrality Framework (VaNeu): a four-stage evaluation paradigm for assessing SLMs across Bias, Utility, Ambiguity Handling, and Positional Bias. Stage 1 (Bias) examines fairness via bias score, Stage 2 (Utility) tests task competence using F1 score, Stage 3 (Ambiguity Handling) measures calibrated caution via Target-to-NonTarget Ratio (TNR) and Unknown Ratio (UR), and Stage 4 (Positional Bias) evaluates response distribution consistency using normalized KL divergence.
  • Figure 2: Heatmaps show bias scores for (a) Tiny and (b) Small LMs under Ambiguous and Disambiguated contexts. Rows denote social bias categories and columns denote SLMs. Red indicates stereotypical, blue anti-stereotypical, and gray near-neutral responses. Most scores fall within ±15%, with the range spanning –100% to +100%.
  • Figure 3: Heatmaps show F1 scores for (a) Tiny LMs (blue) and (b) Small LMs (green) under Ambiguous and Disambiguated contexts. Rows represent social bias categories and columns represent SLMs. Darker shades indicate higher F1 Score and stronger task performance; lighter shades denote weaker competence.
  • Figure 4: (Left) Target/Non-target Ratio (TNR) by category for SLMs; values $> 1.0$ indicate a stronger tendency to predict the target (stereotypical) answer over the non-target answer, while values $< 1.0$ indicate bias denial. (Middle) Unknown Ratio (UR): a value of $1.0$ indicates that the model correctly flags ambiguous cases as unresolvable. (Right) Stage 4 positional bias measured as normalized KL divergence ($\text{Norm-}D_{KL}$); higher values are better, indicating a response distribution closer to the reference distribution. The dashed line marks the ground-truth baseline at $1.0$.
  • Figure 5: Bias scores for CSQA-fine-tuned LMs on BBQ, shown as heatmaps for (a) Tiny LMs and (b) Small LMs under Ambiguous and Disambiguated contexts. Rows denote social bias categories and columns denote SLMs. Red indicates stereotypical, blue anti-stereotypical, and gray near-neutral responses. Most scores fall within -20% to +10%, with the range spanning –100% to +100%.
  • ...and 7 more figures
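
The Stage 3 ratios named in the captions for Figures 1 and 4 can be illustrated with a small sketch. The exact formulas are not given in this excerpt, so the definitions below are assumptions: TNR is taken as the count of target answers divided by the count of non-target answers, and UR as the share of "unknown" answers among ambiguous-context items (where "unknown" is the correct answer). The function name `ambiguity_ratios` and the label strings are hypothetical.

```python
from collections import Counter


def ambiguity_ratios(predictions):
    """Compute (TNR, UR) for a list of ambiguous-context predictions.

    Each prediction is one of the hypothetical labels
    'target', 'non_target', or 'unknown'.
    Assumed definitions (not confirmed by the source):
      TNR = #target / #non_target
      UR  = #unknown / #items
    """
    counts = Counter(predictions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no predictions given")

    non_target = counts["non_target"]
    # With no non-target answers the ratio is undefined; report infinity.
    tnr = counts["target"] / non_target if non_target else float("inf")
    ur = counts["unknown"] / total
    return tnr, ur


# Toy example: 10 ambiguous items, 6 answered 'unknown'.
preds = ["unknown"] * 6 + ["target"] * 3 + ["non_target"] * 1
tnr, ur = ambiguity_ratios(preds)
print(f"TNR={tnr:.2f}, UR={ur:.2f}")
```

Under these assumed definitions, a model that looks neutral on a bias score could still show a TNR well above 1.0 or a UR well below 1.0, which is the kind of gap the later stages are meant to expose.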