Harnessing Chain-of-Thought Metadata for Task Routing and Adversarial Prompt Detection

Ryan Marinelli, Josef Pichlmeier, Tamas Bisztray

TL;DR

This work introduces Number of Thoughts ($NofT$), a CoT-derived pre-prompt metric to quantify task difficulty, detect adversarial prompts, and enable routing in production LLMs. It derives $NofT$ from a distillation-enabled CoT system on the MathInstruct corpus and trains a Random Forest estimator using TF-IDF features to predict thought counts, enabling model routing and security applications. Key findings show that deviations in predicted $NofT$ correlate with adversarial prompts (achieving up to 95% accuracy on DIA-Bench), and that threshold-based routing across distilled DeepSeek variants (1.5B, 7B, 14B) reduces latency by up to 24.9% with minimal accuracy loss. The results demonstrate practical benefits for secure, efficient LLM deployment and motivate further exploration of CoT metadata for robust, resource-aware prompt routing.

Abstract

In this work, we propose a metric called Number of Thoughts (NofT) to estimate the difficulty of a task before prompting and to support Large Language Models (LLMs) in production contexts. By setting thresholds on the number of thoughts, this metric can discern the difficulty of prompts and support more effective prompt routing. A 2% decrease in latency is achieved when routing prompts from the MathInstruct dataset through quantized, distilled versions of DeepSeek with 1.7 billion, 7 billion, and 14 billion parameters. Moreover, this metric can be used to detect adversarial prompts used in prompt injection attacks with high efficacy. The Number of Thoughts can inform a classifier that achieves 95% accuracy in adversarial prompt detection. Our experiments and the datasets used are available on our GitHub page: https://github.com/rymarinelli/Number_Of_Thoughts/tree/main.
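The threshold-based routing described above can be illustrated with a minimal sketch. It assumes a predicted NofT value is already available for a prompt, for example from the estimator the TL;DR mentions. The tier names, the cut-off values, and the `route_prompt` helper are hypothetical and chosen only for illustration; the source does not state the thresholds actually used.

```python
from bisect import bisect_right

# Hypothetical tier labels, ordered from smallest to largest model.
MODEL_TIERS = ["distill-small", "distill-7b", "distill-14b"]

# Assumed NofT cut-offs between tiers, in ascending order (not from the paper).
THRESHOLDS = [3.0, 6.0]


def route_prompt(predicted_noft, thresholds=THRESHOLDS, tiers=MODEL_TIERS):
    """Return the model tier for a prompt given its predicted thought count."""
    if len(tiers) != len(thresholds) + 1:
        raise ValueError("need exactly one more tier than thresholds")
    # bisect_right counts the thresholds that are <= predicted_noft, so a
    # prediction exactly on a cut-off is sent to the larger tier.
    return tiers[bisect_right(thresholds, predicted_noft)]


if __name__ == "__main__":
    for noft in (1.0, 3.0, 9.5):
        print(noft, "->", route_prompt(noft))
```

Under this scheme, prompts with a low predicted thought count go to the smallest model and only prompts predicted to need long reasoning reach the largest one, which is the mechanism the abstract credits for the latency reduction. In practice the cut-offs would presumably be tuned on held-out data rather than fixed by hand.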

Paper Structure

This paper contains 29 sections, 8 equations, 2 figures, 11 tables.

Figures (2)

  • Figure 1: Predicted Thought Count and Prompt Difficulty
  • Figure 2: Determining Threshold