Arg-LLaDA: Argument Summarization via Large Language Diffusion Models and Sufficiency-Aware Refinement

Hao Li, Yizheng Sun, Viktor Schlegel, Kailai Yang, Riza Batista-Navarro, Goran Nenadic

TL;DR

Arg-LLaDA introduces a diffusion-based, iterative framework for argument summarization that leverages sufficiency-guided remasking to progressively refine outputs. By coupling a flexible mask-remask controller with a span-level sufficiency checker, the model targets under-supported or redundant content while preserving argumentative structure. Evaluations on ArgKP and ASE show consistent improvements over state-of-the-art baselines in automatic metrics and human judgments across coverage, faithfulness, and conciseness. The work demonstrates the value of iterative, sufficiency-aware generation for producing faithful, concise, and well-structured argument summaries.

Abstract

Argument summarization aims to generate concise, structured representations of complex, multi-perspective debates. While recent work has advanced the identification and clustering of argumentative components, the generation stage remains underexplored. Existing approaches typically rely on single-pass generation, offering limited support for factual correction or structural refinement. To address this gap, we introduce Arg-LLaDA, a novel large language diffusion framework that iteratively improves summaries via sufficiency-guided remasking and regeneration. Our method combines a flexible masking controller with a sufficiency-checking module to identify and revise unsupported, redundant, or incomplete spans, yielding more faithful, concise, and coherent outputs. Empirical results on two benchmark datasets demonstrate that Arg-LLaDA surpasses state-of-the-art baselines in 7 out of 10 automatic evaluation metrics. In addition, human evaluations reveal substantial improvements across three core dimensions (coverage, faithfulness, and conciseness), validating the effectiveness of our iterative, sufficiency-aware generation strategy.
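The remask-and-regenerate loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `refine`, `score_fn`, `fill_fn`, the `[MASK]` string, the 0.5 threshold, and the step limit are all hypothetical, and the toy scorer and regenerator below merely stand in for the sufficiency-checking module and the diffusion model.

```python
from typing import Callable, List

MASK = "[MASK]"  # hypothetical mask symbol, chosen for illustration only


def refine(
    tokens: List[str],
    score_fn: Callable[[List[str]], List[float]],
    fill_fn: Callable[[List[str]], List[str]],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
    max_steps: int = 4,      # assumed iteration budget
) -> List[str]:
    """Repeatedly mask low-scoring tokens and ask fill_fn to regenerate them."""
    current = list(tokens)
    for _ in range(max_steps):
        scores = score_fn(current)
        low = {i for i, s in enumerate(scores) if s < threshold}
        if not low:
            break  # every token is judged sufficiently supported
        masked = [MASK if i in low else t for i, t in enumerate(current)]
        current = fill_fn(masked)
    return current


# Toy stand-ins: a token counts as "supported" if it appears in the evidence set,
# and the toy regenerator simply drops masked positions instead of rewriting them.
evidence = {"school", "uniforms", "reduce", "bullying"}


def toy_score(toks: List[str]) -> List[float]:
    return [1.0 if t in evidence else 0.0 for t in toks]


def toy_fill(toks: List[str]) -> List[str]:
    return [t for t in toks if t != MASK]


draft = ["school", "uniforms", "always", "reduce", "bullying"]
result = refine(draft, toy_score, toy_fill)
print(result)
```

In this toy run the unsupported token "always" is masked and dropped in the first pass, and the loop stops on the second pass because no token falls below the threshold. A real system would replace `toy_fill` with a model that regenerates the masked span rather than deleting it.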

Paper Structure

This paper contains 32 sections, 6 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: Training and inference process of Arg-LLaDA. The model iteratively generates the output through a masked denoising diffusion process. At each timestep, the sufficiency diagnosis module assigns token-level sufficiency scores based on the claim-evidence context, guiding a selective masking controller to focus regeneration on unsupported or redundant spans. This allows Arg-LLaDA to perform semantically grounded and targeted refinement toward factually faithful and concise argument summaries.
  • Figure 2: Human evaluation results across three dimensions, using a 5-point Likert scale.