Decision-Making with Deliberation: Meta-reviewing as a Document-grounded Dialogue

Sukannya Purkayastha, Nils Dycke, Anne Lauscher, Iryna Gurevych

TL;DR

This work reframes meta-reviewing as a document-grounded decision-making dialogue, addressing the need for AI assistants that support, rather than replace, human meta-reviewers. It introduces ReMuSE, a Reward-based Multi-aspect Self-Editing framework that generates high-quality, knowledge-grounded synthetic dialogues from paper reviews, enabling fine-tuning of compact open models for meta-review assistance. Through extensive automatic and human evaluations, including a within-subject deployment study, the authors show that dialogue agents trained with ReMuSE can reduce meta-reviewing time by up to about 50% while improving content relevance and coverage. The study demonstrates the viability of grounding task-specific dialogue agents in expert reviews and highlights practical considerations, biases, and evaluation caveats for deploying AI-assisted peer-review tools in real-world settings.

Abstract

Meta-reviewing is a pivotal stage in the peer-review process, serving as the final step in determining whether a paper is recommended for acceptance. Prior research on meta-reviewing has treated this as a summarization problem over review reports. However, complementary to this perspective, meta-reviewing is a decision-making process that requires weighing reviewer arguments and placing them within a broader context. Prior research has demonstrated that decision-makers can be effectively assisted in such scenarios via dialogue agents. In line with this framing, we explore the practical challenges for realizing dialogue agents that can effectively assist meta-reviewers. Concretely, we first address the issue of data scarcity for training dialogue agents by generating synthetic data using Large Language Models (LLMs) based on a self-refinement strategy to improve the relevance of these dialogues to expert domains. Our experiments demonstrate that this method produces higher-quality synthetic data and can serve as a valuable resource towards training meta-reviewing assistants. Subsequently, we utilize this data to train dialogue agents tailored for meta-reviewing and find that these agents outperform off-the-shelf LLM-based assistants for this task. Finally, we apply our agents in real-world meta-reviewing scenarios and confirm their effectiveness in enhancing the efficiency of meta-reviewing.

Code available at: https://github.com/UKPLab/eacl2026-meta-review-as-dialog

Paper Structure

This paper contains 68 sections, 18 figures, 21 tables.

Figures (18)

  • Figure 1: Illustration of the process of meta-reviewing as a dialogue. Dialogues include requests to summarize opinions, weigh arguments, and contextualize them.
  • Figure 2: Overview of our Reward-based Multi-aspect Self-Editing (ReMuSE) method. ReMuSE consists of four steps: (1) Initial Dialogue Generation, in which we prompt an LLM with relevant documents (paper reviews) and instructions; (2) Evaluation of the dialogues by computing one or multiple measures (rewards); (3) Natural Language Feedback Generation based on the computed rewards; (4) Self-Refinement of the dialogues based on the feedback.
  • Figure 3: Correlation between human and automated evaluation metrics.
  • Figure 4: Comparison of utterances in human and synthetically generated dialogues in terms of (a) Token Distribution, (b) Specificity, (c) Q2 F1, and (d) K-Prec.
  • Figure 5: Correlation of the human evaluation metrics. We observe the strongest correlation between Helpfulness, Faithfulness, and Objectivity.
  • ...and 13 more figures
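
The four ReMuSE steps named in the Figure 2 caption (generate, evaluate, produce feedback, self-refine) can be pictured as a simple loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the `llm` callable, the prompt wording, the `token_overlap` reward, the threshold, and the round limit are all hypothetical stand-ins, since this page does not give the actual rewards or prompts.

```python
from typing import Callable, Dict, List

# Assumption: an "LLM" is any callable that maps a prompt string to text.
LLM = Callable[[str], str]
# Assumption: a reward maps (dialogue, reviews) to a score in [0, 1].
Reward = Callable[[str, List[str]], float]


def token_overlap(dialogue: str, reviews: List[str]) -> float:
    """Toy grounding reward (hypothetical): share of dialogue tokens found in the reviews."""
    review_tokens = set(" ".join(reviews).lower().split())
    tokens = dialogue.lower().split()
    if not tokens:
        return 0.0
    return sum(t in review_tokens for t in tokens) / len(tokens)


def make_feedback(scores: Dict[str, float], threshold: float) -> str:
    """Step 3: turn numeric rewards into a natural-language feedback string."""
    low = sorted(name for name, score in scores.items() if score < threshold)
    if not low:
        return ""  # every aspect is above the threshold, nothing to fix
    return "Improve the dialogue on: " + ", ".join(low) + "."


def remuse_sketch(
    llm: LLM,
    reviews: List[str],
    rewards: Dict[str, Reward],
    threshold: float = 0.5,
    max_rounds: int = 3,
) -> str:
    context = "\n".join(reviews)
    # Step 1: initial dialogue generation from the reviews and an instruction.
    dialogue = llm("Write a meta-reviewing dialogue grounded in these reviews:\n" + context)
    for _ in range(max_rounds):
        # Step 2: evaluate the dialogue with one or more rewards.
        scores = {name: fn(dialogue, reviews) for name, fn in rewards.items()}
        # Step 3: convert the rewards into feedback text.
        feedback = make_feedback(scores, threshold)
        if not feedback:
            break
        # Step 4: ask the model to refine its own dialogue given the feedback.
        dialogue = llm(
            "Reviews:\n" + context
            + "\nDialogue:\n" + dialogue
            + "\nFeedback:\n" + feedback
            + "\nRevise the dialogue."
        )
    return dialogue


if __name__ == "__main__":
    # A stub model stands in for a real LLM so the loop can be run offline.
    def stub_llm(prompt: str) -> str:
        return "the method is novel" if "Feedback:" in prompt else "hello there"

    reviews = ["The method is novel but the evaluation is weak."]
    print(remuse_sketch(stub_llm, reviews, {"grounding": token_overlap}))
```

Passing the model and the rewards in as plain callables keeps the loop independent of any particular LLM API, so a different reward or model can be swapped in without touching the control flow.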