AnomaLLMy -- Detecting anomalous tokens in black-box LLMs through low-confidence single-token predictions

Waligóra Witold

TL;DR

AnomaLLMy addresses the problem of anomalous tokens degrading LLM reliability in black-box, API-only settings by using low-confidence single-token predictions derived from top-N log-probabilities. The method flags anomalies using three criteria with thresholds $H>1.0$, $P_{\text{tail}}>0.1$, and $P_1-P_2<0.5$, then validates the flagged tokens in a confirmation pass, all without relying on embeddings. Applied to cl100k_base with GPT-4, it identified 478 anomalous tokens (65 minor, 413 major) at a total cost of \$24.39, which shows that the approach is practically feasible while also exposing the runtime challenges caused by API rate limits. The work demonstrates a scalable way to improve tokenizer robustness and model reliability in API-based deployments, offering actionable pathways for filtering, perturbation, and future fine-tuning when available.
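
The three thresholds can be illustrated with a minimal sketch. The summary above does not define the symbols, so the definitions used here are assumptions: $H$ is taken as the Shannon entropy (in nats) over the returned top-N probabilities, $P_{\text{tail}}$ as the probability mass not covered by the top-N candidates, and $P_1-P_2$ as the gap between the two most likely candidates. The function name `confidence_signals` and its parameters are hypothetical, and the sketch reports each criterion separately because the summary does not say how the paper combines them.

```python
import math


def confidence_signals(top_logprobs, h_max=1.0, tail_max=0.1, gap_min=0.5):
    """Evaluate the three low-confidence criteria on top-N log-probabilities.

    Assumed definitions (not confirmed by the summary):
      H       = -sum(p * ln p) over the top-N probabilities, in nats
      P_tail  = 1 - sum of the top-N probabilities
      P1 - P2 = gap between the two most likely candidates
    """
    # Convert natural-log probabilities to probabilities, most likely first.
    probs = sorted((math.exp(lp) for lp in top_logprobs), reverse=True)

    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    p_tail = max(0.0, 1.0 - sum(probs))
    gap = probs[0] - (probs[1] if len(probs) > 1 else 0.0)

    # Report each criterion on its own; combining them is left to the caller.
    return {
        "high_entropy": entropy > h_max,
        "heavy_tail": p_tail > tail_max,
        "small_gap": gap < gap_min,
    }


# A confident prediction: one candidate dominates.
confident = confidence_signals([math.log(0.97), math.log(0.01), math.log(0.01)])

# An uncertain prediction: five equally likely candidates and a large tail.
uncertain = confidence_signals([math.log(0.15)] * 5)
```

In this sketch the confident example triggers none of the criteria and the uncertain one triggers all three. A token whose prediction triggers the criteria would then be a candidate for the confirmation pass mentioned above.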

Abstract

This paper introduces AnomaLLMy, a novel technique for the automatic detection of anomalous tokens in black-box Large Language Models (LLMs) with API-only access. Utilizing low-confidence single-token predictions as a cost-effective indicator, AnomaLLMy identifies irregularities in model behavior, addressing the issue of anomalous tokens degrading the quality and reliability of models. Validated on the cl100k_base dataset, the token set of GPT-4, AnomaLLMy detected 413 major and 65 minor anomalies, demonstrating the method's efficiency with just \$24.39 spent in API credits. The insights from this research are expected to be beneficial for enhancing the robustness and accuracy of LLMs, particularly in the development and assessment of tokenizers.

Paper Structure (12 sections, 1 table)