SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks

Mengsay Loem, Masahiro Kaneko, Naoaki Okazaki

TL;DR

The paper studies whether interactive discussions, which are known to help at inference time, can also be used during LLM training to improve reasoning and chain-of-thought (CoT) verbalization. It introduces the SAIE framework, which pairs a trainable Learner with a fixed Partner. The Partner gives a supportive remark when the Learner's answer is incorrect and an adversarial remark when it is correct, and the Learner is updated from these interactions. Training has two phases: a Warm-up on a subset of the data, followed by a multi-round Discussion Phase. Evaluations on GSM8K, CommonsenseQA, and MMLU using Flan-T5 and GPT-3.5 show that SAIE consistently outperforms standard fine-tuning and single-remark baselines; combining supportive and adversarial remarks yields the strongest gains and improves CoT verbalization as measured by automatic ROUGE metrics and human judgments. At inference time, SAIE-trained models engage more effectively in self- and collaborative discussions and reach higher accuracy than baselines in interaction settings. The work also analyzes partner remarks and human-rated alignment, and discusses limitations related to model diversity, training dynamics, and computational cost, along with ethical considerations for adversarial feedback.

Abstract

Large Language Models (LLMs) can justify or critique their predictions through discussions with other models or humans, thereby enriching their intrinsic understanding of instances. While proactive discussions in the inference phase have been shown to boost performance, such interactions have not been extensively explored during the training phase. We hypothesize that incorporating interactive discussions into the training process can enhance the models' understanding and improve their reasoning and verbal expression abilities during inference. This work introduces the SAIE framework, which facilitates supportive and adversarial discussions between learner and partner models. The learner model receives responses from the partner, and its parameters are then updated based on this discussion. This dynamic adjustment process continues throughout the training phase, responding to the evolving outputs of the learner model. Our empirical evaluation across various tasks, including math problems, commonsense reasoning, and multi-domain knowledge, demonstrates that models fine-tuned with the SAIE framework outperform those trained with conventional fine-tuning approaches. Furthermore, our method enhances the models' reasoning capabilities, improving both individual and multi-agent inference performance.
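
The training procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ToyLearner`, `toy_partner`, `discussion_phase`, `train`, the remark strings, and the round count are all hypothetical names and values chosen for the example. The parameter update is replaced by a stub that only records the training example, and running the Discussion Phase on the data left over after Warm-up is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A partner maps (question, learner_answer, gold_answer) to a remark string.
Partner = Callable[[str, str, str], str]


@dataclass
class ToyLearner:
    """Hypothetical stand-in for a trainable model."""
    guesses: List[str]  # scripted answers, returned in order
    updates: List[Tuple[str, List[str], str]] = field(default_factory=list)
    _step: int = 0

    def answer(self, question: str, history: List[str]) -> str:
        guess = self.guesses[min(self._step, len(self.guesses) - 1)]
        self._step += 1
        return guess

    def update(self, question: str, history: List[str], gold: str) -> None:
        # Stub: a real learner would take a gradient step here.
        self.updates.append((question, list(history), gold))


def toy_partner(question: str, answer: str, gold: str) -> str:
    # Fixed (never updated) partner: challenge correct answers, support wrong ones.
    if answer == gold:
        return "adversarial: are you sure? Reconsider your reasoning."
    return "supportive: re-check your steps."


def discussion_phase(learner: ToyLearner, partner: Partner,
                     question: str, gold: str, rounds: int = 2) -> List[str]:
    history: List[str] = []
    for _ in range(rounds):
        ans = learner.answer(question, history)
        remark = partner(question, ans, gold)
        history += [f"learner: {ans}", f"partner: {remark}"]
    # Only the learner is updated, using the whole discussion.
    learner.update(question, history, gold)
    return history


def train(learner: ToyLearner, partner: Partner,
          data: List[Tuple[str, str]], warmup_size: int, rounds: int = 2) -> None:
    # Warm-up: plain supervised updates on a subset, with no discussion.
    for question, gold in data[:warmup_size]:
        learner.update(question, [], gold)
    # Discussion Phase: multi-round exchanges with the partner (assumed split).
    for question, gold in data[warmup_size:]:
        discussion_phase(learner, partner, question, gold, rounds)


learner = ToyLearner(guesses=["50", "54"])
train(learner, toy_partner, [("q0", "1"), ("q1", "54")], warmup_size=1)
```

In this toy run the Learner first answers "50" and gets a supportive remark, then answers "54" and gets an adversarial one; the recorded discussion is what a real update step would train on.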

Paper Structure

This paper contains 37 sections, 1 figure, 7 tables, and 1 algorithm.

Figures (1)

  • Figure 1: Overview of the SAIE Framework. The Partner provides a supportive remark if the Learner's answer is incorrect, and an adversarial remark if the answer is correct. Only the Learner model undergoes parameter updates based on these interactions. The question in this example is: 'Tom decides to renovate a house. There are 3 bedrooms and each bedroom takes 4 hours to renovate. The kitchen takes 50% longer than each bedroom. The living room took twice as much time as everything else combined. How long did everything take?'
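
For reference, the arithmetic behind the example question in the caption can be worked through as below. This is our own reading of the question, with "everything else" taken to mean the bedrooms plus the kitchen; the variable names are illustrative and the figure's dialogue itself is not reproduced here.

```python
bedrooms = 3
hours_per_bedroom = 4.0

bedroom_hours = bedrooms * hours_per_bedroom             # 3 * 4 = 12
kitchen_hours = 1.5 * hours_per_bedroom                  # 50% longer than one bedroom = 6
living_room_hours = 2 * (bedroom_hours + kitchen_hours)  # twice everything else = 36

total_hours = bedroom_hours + kitchen_hours + living_room_hours
print(total_hours)  # 54.0
```

Under the rule stated in the caption, a Learner answer of 54 hours would draw an adversarial remark from the Partner, while any other answer would draw a supportive one.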