SEVADE: Self-Evolving Multi-Agent Analysis with Decoupled Evaluation for Hallucination-Resistant Irony Detection

Ziqi Liu, Ziyang Zhou, Yilin Li, Mingxuan Hu, Yushan Pan, Zhijie Xu, Yangbin Chen

TL;DR

The core of the framework is a Dynamic Agentive Reasoning Engine (DARE), which utilizes a team of specialized agents grounded in linguistic theory to perform a multifaceted deconstruction of the text and generate a structured reasoning chain.

Abstract

Sarcasm detection is a crucial yet challenging Natural Language Processing task. Existing Large Language Model methods are often limited by single-perspective analysis, static reasoning pathways, and a susceptibility to hallucination when processing complex ironic rhetoric, which impacts their accuracy and reliability. To address these challenges, we propose **SEVADE**, a novel **S**elf-**Ev**olving multi-agent **A**nalysis framework with **D**ecoupled **E**valuation for hallucination-resistant sarcasm detection. The core of our framework is a Dynamic Agentive Reasoning Engine (DARE), which utilizes a team of specialized agents grounded in linguistic theory to perform a multifaceted deconstruction of the text and generate a structured reasoning chain. Subsequently, a separate lightweight rationale adjudicator (RA) performs the final classification based solely on this reasoning chain. This decoupled architecture is designed to mitigate the risk of hallucination by separating complex reasoning from the final judgment. Extensive experiments on four benchmark datasets demonstrate that our framework achieves state-of-the-art performance, with average improvements of **6.75%** in Accuracy and **6.29%** in Macro-F1 score.
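The decoupling described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the two toy agents, the `AgentFinding` record, the `intensity` field, and the 0.5 threshold are hypothetical and stand in for the LLM-based agents and the trained adjudicator that the paper uses. The only property the sketch is meant to show is that the adjudicator receives the reasoning chain and never the raw text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgentFinding:
    agent: str        # hypothetical agent name
    intensity: float  # hypothetical score in [0, 1] for how strongly the cue fires
    rationale: str    # short explanation contributed to the reasoning chain


Agent = Callable[[str], AgentFinding]


def incongruity_agent(text: str) -> AgentFinding:
    # Toy heuristic (assumption): positive wording about a negative situation.
    positive = {"love", "great", "wonderful"}
    negative = {"monday", "traffic", "waiting", "broken"}
    words = {w.strip(".,!?").lower() for w in text.split()}
    hit = bool(words & positive) and bool(words & negative)
    reason = "positive wording about a negative situation" if hit else "no sentiment clash"
    return AgentFinding("incongruity", 1.0 if hit else 0.0, reason)


def exaggeration_agent(text: str) -> AgentFinding:
    # Toy heuristic (assumption): repeated exclamation marks or shouted words.
    shouted = any(w.isupper() and len(w) > 2 for w in text.split())
    hit = "!!" in text or shouted
    reason = "emphatic punctuation or capitalization" if hit else "neutral emphasis"
    return AgentFinding("exaggeration", 1.0 if hit else 0.0, reason)


def build_reasoning_chain(text: str, agents: List[Agent]) -> List[AgentFinding]:
    # Reasoning stage: each agent inspects the text and contributes one finding.
    return [agent(text) for agent in agents]


def adjudicate(chain: List[AgentFinding]) -> bool:
    # Judgment stage: sees only the chain, never the original text.
    if not chain:
        return False
    mean_intensity = sum(f.intensity for f in chain) / len(chain)
    return mean_intensity >= 0.5  # hypothetical decision threshold


agents = [incongruity_agent, exaggeration_agent]
chain = build_reasoning_chain("I just LOVE waiting in traffic!!", agents)
is_sarcastic = adjudicate(chain)
```

Because `adjudicate` has no access to the input sentence, any claim it acts on must appear explicitly in the chain, which is the intuition behind separating reasoning from the final judgment.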

Paper Structure

This paper contains 22 sections, 6 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: Key limitations of prior LLM-based sarcasm detection: single-perspective analysis, hallucination risk in final judgment, and static, inflexible reasoning pathways.
  • Figure 2: The overall framework of SEVADE. On the left, a pool of Core Analysis Agents, each with a distinct analytical perspective, and Support Agents with auxiliary functions are maintained. In the middle, the Controller Agent dynamically selects, refines, and expands the agent team to generate a structured reasoning chain. On the right, the Rationale Adjudicator produces the final sarcasm prediction based solely on the reasoning chain, decoupling reasoning from judgment.
  • Figure 3: Visualization of agent dynamics in processing sarcastic and non-sarcastic samples. (a) shows the mean intensity score. (b) shows the activation rate, indicating agent participation frequency.
  • Figure 4: Cross-dataset generalization performance. The left chart shows models trained on IAC-V1 and tested on SemEval. The right chart shows the reverse scenario. Our proposed model is compared against BERT and RoBERTa.
  • Figure 5: The effect of model scale on performance, using the Qwen 2 and Llama 3 series.
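The two quantities plotted in Figure 3 can be computed from per-sample agent logs along the following lines. This is a sketch under stated assumptions: the log format (one dictionary per sample mapping each activated agent to its intensity score), the function name `agent_dynamics`, and the choice to average intensity only over samples where the agent was activated are all hypothetical and not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List


def agent_dynamics(runs: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Summarize agent behavior over a set of samples.

    runs: one dict per sample, mapping each activated agent's name to the
    intensity score it reported (hypothetical log format).
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for run in runs:
        for agent, intensity in run.items():
            totals[agent] += intensity
            counts[agent] += 1

    n_samples = len(runs)
    return {
        agent: {
            # Fraction of samples in which the agent took part.
            "activation_rate": counts[agent] / n_samples,
            # Average score over the samples where the agent was active (assumption).
            "mean_intensity": totals[agent] / counts[agent],
        }
        for agent in counts
    }


# Hypothetical logs for two samples and two agents.
logs = [{"incongruity": 0.8, "exaggeration": 0.2}, {"incongruity": 0.4}]
stats = agent_dynamics(logs)
```

Running the same function separately on the sarcastic and non-sarcastic subsets would give the per-class comparison that the figure visualizes.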