An Evaluation of LLMs for Detecting Harmful Computing Terms

Joshua Jacas, Hana Winchester, Alicia Boyd, Brittany Johnson

TL;DR

The paper investigates how LLM architecture affects the automatic detection of harmful computing terminology. It compares encoder, encoder-decoder, and decoder families by giving each model zero-shot prompts over a fixed set of 64 terms, assessing both accuracy and contextual reasoning. Findings show that decoder models such as Gemini Flash and Claude AI deliver the strongest contextual detection and the most nuanced guidance, while encoder models such as BERT excel at pattern recognition but struggle to commit to confident binary classifications. Descriptive, context-rich outputs outperform binary labels. These results inform the design of automated inclusive-language tools in technical domains and underscore the need for task-specific tuning and expanded datasets before robust deployment.

Abstract

Detecting harmful and non-inclusive terminology in technical contexts is critical for fostering inclusive environments in computing. This study explores the impact of model architecture on harmful language detection by evaluating models on a curated database of technical terms, each paired with specific use cases. We tested a range of encoder, decoder, and encoder-decoder language models, including BERT-base-uncased, RoBERTa-large-mnli, Gemini Flash 1.5 and 2.0, GPT-4, Claude AI Sonnet 3.5, T5-large, and BART-large-mnli. Each model was presented with a standardized prompt to identify harmful and non-inclusive language across 64 terms. Results reveal that decoder models, particularly Gemini Flash 2.0 and Claude AI, excel in nuanced contextual analysis, while encoder models like BERT exhibit strong pattern recognition but struggle with classification certainty. We discuss the implications of these findings for improving automated detection tools and highlight model-specific strengths and limitations in fostering inclusive communication in technical domains.
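The abstract describes one standardized prompt applied to every model and every term. A minimal Python sketch of such an evaluation harness might look like the following, under clearly stated assumptions. The prompt wording, the `TermCase` fields, the leading yes/no parsing rule, and the `keyword_stub` model are all hypothetical illustrations rather than the authors' code. The three example terms and their gold labels are illustrative only and are not drawn from the paper's 64-term database. Counting non-committal answers separately is one way to make the "classification certainty" problem measurable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TermCase:
    term: str       # computing term under evaluation
    use_case: str   # sentence showing the term in a specific context
    harmful: bool   # gold label (hypothetical in this sketch)


# Hypothetical wording: the paper's actual standardized prompt is not reproduced here.
PROMPT_TEMPLATE = (
    "Term: {term}\n"
    "Use case: {use_case}\n"
    "Is this term harmful or non-inclusive in this context? "
    "Answer 'yes' or 'no', then explain why."
)


def build_prompt(case: TermCase) -> str:
    """Fill one fixed template so every model sees identical input for a term."""
    return PROMPT_TEMPLATE.format(term=case.term, use_case=case.use_case)


def parse_label(response: str) -> Optional[bool]:
    """Read the leading yes/no; return None when the model does not commit."""
    words = response.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,:;!")
    if first == "yes":
        return True
    if first == "no":
        return False
    return None


def evaluate(model: Callable[[str], str], cases: list[TermCase]) -> dict[str, float]:
    """Accuracy over all cases (undecided counts as wrong) plus the undecided rate."""
    n = len(cases)
    if n == 0:
        return {"accuracy": 0.0, "undecided_rate": 0.0}
    correct = undecided = 0
    for case in cases:
        label = parse_label(model(build_prompt(case)))
        if label is None:
            undecided += 1
        elif label == case.harmful:
            correct += 1
    return {"accuracy": correct / n, "undecided_rate": undecided / n}


def keyword_stub(prompt: str) -> str:
    """Hypothetical stand-in for a real model call: matches a fixed word list only."""
    flagged = ("blacklist", "master", "slave")
    term_line = prompt.splitlines()[0].lower()
    return "yes" if any(word in term_line for word in flagged) else "unsure"


# Illustrative cases and labels only; not taken from the paper's 64-term database.
cases = [
    TermCase("blacklist", "Add the IP address to the blacklist.", True),
    TermCase("master", "Push your changes to the master branch.", True),
    TermCase("buffer", "The input overflowed the buffer.", False),
]
print(evaluate(keyword_stub, cases))
```

Replacing `keyword_stub` with a function that calls a real model would let several architectures be scored under identical prompts. This sketch only scores binary labels; how descriptive, free-text explanations should be graded is left open.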

Paper Structure

This paper contains 17 sections, 1 figure, and 1 table.

Figures (1)

  • Figure 1: How Model Types Process Harmful Terms