SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks
Mengsay Loem, Masahiro Kaneko, Naoaki Okazaki
TL;DR
Interactive discussions are known to help LLMs at inference time but have rarely been used during training; this paper targets that gap to improve reasoning and chain-of-thought (CoT) verbalization. It introduces the SAIE framework, which pairs a trainable Learner with a fixed Partner that provides supportive or adversarial remarks. Training has two phases: a Warm-up on a subset of the data, followed by a multi-round Discussion Phase in which the Learner is updated from these interactions. Evaluations on GSM8K, CommonsenseQA, and MMLU using Flan-T5 and GPT-3.5 show that SAIE consistently outperforms standard fine-tuning and single-remark baselines. Combining supportive and adversarial remarks yields the strongest gains and improves CoT verbalization, as measured by automatic ROUGE metrics and human judgments. Inference-time experiments show that SAIE-trained models engage more effectively in self- and collaborative discussions, achieving higher accuracy than baselines in interaction settings, which suggests practical benefits for reasoning tasks. The work also analyzes partner remarks and human-rated alignment, and discusses limitations related to model diversity, training dynamics, and computational cost, along with ethical considerations for adversarial feedback.
Abstract
Large Language Models (LLMs) can justify or critique their predictions through discussions with other models or humans, thereby enriching their intrinsic understanding of instances. While proactive discussions in the inference phase have been shown to boost performance, such interactions have not been extensively explored during the training phase. We hypothesize that incorporating interactive discussions into the training process can enhance the models' understanding and improve their reasoning and verbal expression abilities during inference. This work introduces the SAIE framework, which facilitates supportive and adversarial discussions between learner and partner models. The learner model receives responses from the partner, and its parameters are then updated based on this discussion. This dynamic adjustment process continues throughout the training phase, responding to the evolving outputs of the learner model. Our empirical evaluation across various tasks, including math problems, commonsense reasoning, and multi-domain knowledge, demonstrates that models fine-tuned with the SAIE framework outperform those trained with conventional fine-tuning approaches. Furthermore, our method enhances the models' reasoning capabilities, improving both individual and multi-agent inference performance.
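The two-phase procedure described above can be illustrated with a toy sketch. This is not the authors' implementation: `ToyLearner`, `partner_remark`, `train`, the random choice of stance, and the `warmup_fraction` and `rounds` parameters are all hypothetical names and assumptions introduced for illustration, and the "parameter update" is replaced by a simple dictionary write so the control flow can run without any model.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyLearner:
    """Hypothetical stand-in for the trainable Learner (no real model)."""
    memory: dict = field(default_factory=dict)
    updates: int = 0

    def respond(self, question, context=""):
        # A real Learner would generate an answer with reasoning,
        # conditioned on the discussion so far; this toy ignores context.
        return self.memory.get(question, "unknown")

    def update(self, question, gold):
        # Placeholder for a gradient step on the Learner's parameters.
        self.memory[question] = gold
        self.updates += 1


def partner_remark(answer, stance):
    """Hypothetical fixed Partner: returns a supportive or adversarial remark."""
    if stance == "supportive":
        return f"I agree with '{answer}'."
    return f"I doubt '{answer}'; are you sure?"


def train(learner, data, warmup_fraction=0.5, rounds=2, seed=0):
    rng = random.Random(seed)
    n_warm = int(len(data) * warmup_fraction)
    log = []

    # Phase 1 (Warm-up): plain supervised updates on a subset.
    for question, gold in data[:n_warm]:
        learner.update(question, gold)

    # Phase 2 (Discussion): several Learner/Partner rounds, then an update.
    for question, gold in data[n_warm:]:
        context = ""
        for _ in range(rounds):
            answer = learner.respond(question, context)
            # Assumption: the stance is sampled at random each round.
            stance = rng.choice(["supportive", "adversarial"])
            remark = partner_remark(answer, stance)
            context += f"\nLearner: {answer}\nPartner: {remark}"
            log.append((question, stance))
        learner.update(question, gold)
    return log
```

In a real setting, `respond` and `update` would wrap generation and a fine-tuning step on a model such as Flan-T5, and the Partner would be a separate fixed model; how the stance is chosen and how the discussion enters the loss are details this sketch does not attempt to reproduce.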
