LinguistAgent: A Reflective Multi-Model Platform for Automated Linguistic Annotation

Bingru Li

TL;DR

LinguistAgent addresses the labor-intensive challenge of linguistic annotation in the Humanities by offering a no-code, Reflective Multi-Agent platform in which an Annotator and a Reviewer automate and benchmark metaphor identification. The platform supports Prompt Engineering, Retrieval-Augmented Generation, and Fine-tuning, and provides real-time token-level evaluation using precision ($P$), recall ($R$), and $F_1$. Case studies on metaphor identification demonstrate that the Reviewer loop improves detection, and a debug console with downloadable per-sample reports provides full traceability. This framework enables scalable, transparent, and reproducible annotation workflows for researchers, with open-source code available on GitHub.

Abstract

Data annotation remains a significant bottleneck in the Humanities and Social Sciences, particularly for complex semantic tasks such as metaphor identification. While Large Language Models (LLMs) show promise, a gap remains between their theoretical capability and their practical utility for researchers. This paper introduces LinguistAgent, an integrated, user-friendly platform that leverages a reflective multi-model architecture to automate linguistic annotation. The system implements a dual-agent workflow, comprising an Annotator and a Reviewer, to simulate a professional peer-review process. LinguistAgent supports comparative experiments across three paradigms: Prompt Engineering (Zero/Few-shot), Retrieval-Augmented Generation, and Fine-tuning. We demonstrate LinguistAgent's efficacy using the task of metaphor identification as an example, providing real-time token-level evaluation (Precision, Recall, and $F_1$ score) against human gold standards. The application and code are released at https://github.com/Bingru-Li/LinguistAgent.
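
To make the token-level evaluation concrete, the following is a minimal sketch of how Precision, Recall, and $F_1$ could be computed for one sentence against a gold standard. It is not taken from the LinguistAgent code base: the function name `token_prf`, the binary label scheme (1 = metaphorical token, 0 = literal token), the toy sentence, and the convention of returning 0.0 when a denominator is zero are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def token_prf(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Token-level precision, recall, and F1 for binary labels.

    Assumed (hypothetical) label scheme: 1 = metaphorical token, 0 = literal token.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same tokens")

    # Count true positives, false positives, and false negatives per token.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)

    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (invented for illustration): "Time is a thief"
tokens = ["Time", "is", "a", "thief"]
gold = [0, 0, 0, 1]  # human gold standard: only "thief" is metaphorical
pred = [1, 0, 0, 1]  # model output: also tags "Time" (one false positive)

p, r, f1 = token_prf(gold, pred)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

In this toy case there is one true positive, one false positive, and no false negative, so $P = 0.5$, $R = 1.0$, and $F_1 = 2PR/(P+R) \approx 0.67$. Averaging such per-sentence scores would give a running figure comparable in spirit to an average $F_1$ display, though how the platform aggregates scores is not specified here.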

Paper Structure

This paper contains 12 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: The architecture of LinguistAgent.
  • Figure 2: The user interface of LinguistAgent.
  • Figure 3: The reasoning of the Annotator and the critique of the Reviewer.
  • Figure 4: The live tagging display, the performance (Average F1 Score), and the progress bar.
  • Figure 5: The debug section, keeping logs of all raw model responses and error types.