Argument Quality Assessment in the Age of Instruction-Following Large Language Models

Henning Wachsmuth, Gabriella Lapesa, Elena Cabrio, Anne Lauscher, Joonsuk Park, Eva Maria Vecchi, Serena Villata, Timon Ziegenbein

TL;DR

The authors argue that the capabilities of instruction-following large language models (LLMs) to leverage knowledge across contexts enable a much more reliable assessment of argument quality.

Abstract

The computational treatment of arguments on controversial issues has been subject to extensive NLP research, due to its envisioned impact on opinion formation, decision making, writing education, and the like. A critical task in any such application is the assessment of an argument's quality, but it is also particularly challenging. In this position paper, we start from a brief survey of argument quality research, where we identify the diversity of quality notions and the subjectiveness of their perception as the main hurdles towards substantial progress on argument quality assessment. We argue that the capabilities of instruction-following large language models (LLMs) to leverage knowledge across contexts enable a much more reliable assessment of argument quality. Rather than just being fine-tuned to chase leaderboards on assessment tasks, LLMs need to be instructed systematically with argumentation theories and scenarios as well as with ways to solve argument-related problems. We discuss the real-world opportunities and ethical issues emerging thereby.

Paper Structure (37 sections, 1 equation, 2 figures, 1 table)

Figures (2)

  • Figure 1: Organization of the surveyed argument quality research into three general directions (conceptual notions, influence factors, and computational models), their main aspects (e.g., notions of maximal and minimal quality), and specific concepts studied for these (e.g., agreement, preference, and deliberation).
  • Figure 2: Learning of representational spaces in NLP models (same color: same type of representation): (a) Traditional supervised learning: Input and output spaces are separated across tasks; representations are task-specific. (b) Classification/Regression transformer: The input space is shared across tasks; its representation can be learned on all tasks. (b') Generation transformer: Both spaces are shared across tasks; their representations can be learned on all tasks, but not task interactions. (c) Instruction-following transformer: One space for inputs, outputs, and tasks; representations can be learned jointly on all tasks.
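
To make setting (c) of Figure 2 more concrete, the following is a minimal sketch of how different argument quality tasks could be cast into one shared text space, so that the task description, the input, and the output are all plain text. It is not the authors' implementation. The names `Example` and `to_prompt`, the prompt layout, the task names, and the sample arguments are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Example:
    task: str         # hypothetical task identifier, used only for bookkeeping
    instruction: str  # natural-language description of the task
    argument: str     # input text to be assessed
    target: str       # expected output, expressed as text


def to_prompt(ex: Example) -> str:
    """Serialize the task description and the input into one text sequence."""
    return f"Instruction: {ex.instruction}\nArgument: {ex.argument}\nAnswer:"


# Made-up illustrative data: a rating task and a preference task.
examples = [
    Example(
        task="cogency_rating",
        instruction="Rate the cogency of the argument from 1 (low) to 5 (high).",
        argument="School uniforms reduce bullying because visible status differences disappear.",
        target="3",
    ),
    Example(
        task="pairwise_preference",
        instruction="Which argument is more convincing, A or B?",
        argument="A: Taxes fund schools. B: Taxes are bad.",
        target="A",
    ),
]

# Both tasks end up as (text, text) pairs in the same space, so a single
# model could in principle be trained or prompted on them jointly.
pairs = [(to_prompt(ex), ex.target) for ex in examples]
```

Because the task itself is part of the input text, adding a new quality notion in this sketch only requires a new instruction string rather than a new output layer.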