Tracing Thought: Using Chain-of-Thought Reasoning to Identify the LLM Behind AI-Generated Text
Shifali Agrahari, Sanasam Ranbir Singh
TL;DR
The paper tackles the problem of detecting AI-generated text and attributing it to a specific LLM. It proposes COT_Finetuned, a dual-task framework that uses Chain-of-Thought (CoT) reasoning to generate explanations while jointly predicting whether a text is AI-generated or human-written (Task A) and which LLM produced it (Task B). Experiments show that CoT improves performance: Bert+COT achieves a strong F1 score on Task A and reasonable attribution performance on Task B, underscoring the value of interpretability. The approach has practical implications for academic integrity and content moderation, since it provides transparent, model-aware detection of AI-generated content.
Abstract
In recent years, the detection of AI-generated text has become a critical area of research due to concerns about academic integrity, misinformation, and ethical AI deployment. This paper presents COT Fine-tuned, a novel framework for detecting AI-generated text and identifying the specific language model responsible for generating it. We propose a dual-task approach, where Task A involves classifying text as AI-generated or human-written, and Task B identifies the specific LLM behind the text. The key innovation of our method lies in the use of Chain-of-Thought (CoT) reasoning, which enables the model to generate explanations for its predictions, enhancing transparency and interpretability. Our experiments demonstrate that COT Fine-tuned achieves high accuracy in both tasks, with strong performance in LLM identification and human-AI classification. We also show that the CoT reasoning process contributes significantly to the model's effectiveness and interpretability.
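
To make the dual-task setup concrete, the following is a minimal sketch of how two classification objectives could be combined over a shared text representation. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the loss, so the summed cross-entropy, the `weight_b` parameter, the function names, and the placeholder LLM labels are all hypothetical. The CoT explanation-generation step is not modelled here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])

def joint_loss(logits_a, label_a, logits_b, label_b, weight_b=1.0):
    """Assumed joint objective: Task A loss plus weighted Task B loss.

    logits_a: 2 scores (index 0 = human, index 1 = AI)       -- Task A
    logits_b: one score per candidate LLM                     -- Task B
    weight_b: hypothetical trade-off weight, not from the paper
    """
    return cross_entropy(logits_a, label_a) + weight_b * cross_entropy(logits_b, label_b)

def predict(logits_a, logits_b, llm_names):
    """Return (Task A label, Task B label); Task B is reported only for AI text."""
    is_ai = logits_a[1] > logits_a[0]
    if not is_ai:
        return ("human", None)
    best = max(range(len(logits_b)), key=lambda i: logits_b[i])
    return ("AI", llm_names[best])

# Placeholder labels; the real label set depends on the dataset used.
names = ["llm_0", "llm_1", "llm_2"]
print(joint_loss([0.0, 0.0], 0, [0.0, 0.0, 0.0], 1))
print(predict([0.1, 2.0], [0.3, 1.5, 0.2], names))
```

In this sketch both heads are trained together, so errors on either task update the shared representation. Whether the original framework weights the two tasks equally is not stated in the abstract.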
