Rule-Guided Feedback: Enhancing Reasoning by Enforcing Rule Adherence in Large Language Models

Aissatou Diallo, Antonis Bikakis, Luke Dickens, Anthony Hunter, Rob Miller

TL;DR

Rule-Guided Feedback (RGF) presents a dual-agent framework where a Performer generates solutions under explicit rules and a Teacher enforces adherence through structured feedback. The method emphasizes rule compliance and proactive information seeking, potentially augmented by expert verification during inference. Across five tasks (Penguins In A Table, Checkmate In One, Shakespearean Sonnet Writing, GSM8K, StrategyQA), RGF outperforms direct prompting and several baselines, with ablations confirming the value of clarifying questions, expert validation, and bounded iteration. The work demonstrates that guided, rule-based interactions can improve LLM reasoning and reliability in open-ended, uncertain scenarios.

Abstract

In this paper, we introduce Rule-Guided Feedback (RGF), a framework designed to enhance Large Language Model (LLM) performance through structured rule adherence and strategic information seeking. RGF implements a teacher-student paradigm in which rule-following is enforced through established guidelines. Our framework employs a Teacher model that rigorously evaluates each student output against task-specific rules and, when it detects deviations, provides constructive guidance rather than direct answers. This iterative feedback loop serves two purposes: keeping solutions within defined constraints and encouraging proactive information seeking to resolve uncertainties. We evaluate RGF on diverse tasks including Checkmate-in-One puzzles, Sonnet Writing, Penguins-In-a-Table classification, GSM8K, and StrategyQA. Our findings suggest that structured feedback mechanisms can significantly enhance LLMs' performance across various domains.
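The Performer/Teacher loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `Turn`, `Verdict`, `rule_teacher`, `rgf_loop`, and `toy_performer` are hypothetical names, LLM calls are replaced by plain callables, rules are assumed to be checkable string predicates supplied up front (in RGF the Teacher generates them), clarifying questions are answered by an assumed `answer_question` callable, and the default budget of 5 iterations is only an assumption echoing the Figure 3 caption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical message types: the paper's real prompts and interfaces are not shown here.
@dataclass
class Turn:
    kind: str  # "answer" or "question" (a clarifying question from the Performer)
    text: str

@dataclass
class Verdict:
    accepted: bool
    feedback: str  # guidance about violated rules, never the correct answer

# Assumption: a rule is a description plus a checkable predicate over the answer text.
Rule = Tuple[str, Callable[[str], bool]]

def rule_teacher(rules: List[Rule]) -> Callable[[Turn], Verdict]:
    """Toy Teacher: accepts an answer only if it satisfies every rule."""
    def evaluate(turn: Turn) -> Verdict:
        violated = [desc for desc, check in rules if not check(turn.text)]
        if violated:
            return Verdict(False, "Revise; rule(s) violated: " + "; ".join(violated))
        return Verdict(True, "All rules satisfied.")
    return evaluate

def rgf_loop(
    task: str,
    performer: Callable[[str, List[str]], Turn],
    teacher: Callable[[Turn], Verdict],
    answer_question: Callable[[str], str],
    expert: Optional[Callable[[str], bool]] = None,
    max_iters: int = 5,
) -> Tuple[Optional[str], List[str]]:
    """Run bounded Performer/Teacher rounds; return (accepted answer or None, transcript)."""
    history: List[str] = []
    for _ in range(max_iters):
        turn = performer(task, history)
        if turn.kind == "question":
            # Information seeking: answer the clarifying question and let the Performer retry.
            history.append(f"Q: {turn.text}")
            history.append(f"A: {answer_question(turn.text)}")
            continue
        history.append(f"Answer: {turn.text}")
        verdict = teacher(turn)
        if verdict.accepted and (expert is None or expert(turn.text)):
            return turn.text, history
        # Pass back guidance (Teacher feedback or an expert rejection) for the next round.
        reason = verdict.feedback if not verdict.accepted else "An expert check rejected the answer."
        history.append(f"Feedback: {reason}")
    return None, history  # iteration budget exhausted

# Toy usage: a scripted "Performer" for a checkmate-in-one style task.
def toy_performer(task: str, history: List[str]) -> Turn:
    if not history:
        return Turn("question", "Whose move is it?")
    if not any(line.startswith("Feedback") for line in history):
        return Turn("answer", "Qh7")
    return Turn("answer", "Qh7#")

teacher = rule_teacher([("a mating move must end with '#'", lambda s: s.endswith("#"))])
answer, log = rgf_loop("Find mate in one.", toy_performer, teacher, lambda q: "White to move.")
print(answer)  # prints Qh7#
```

Capping `max_iters` keeps the dialogue bounded, and the optional `expert` callable stands in for inference-time expert verification; both are assumptions about how these components could be wired, not a description of the paper's code.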

Paper Structure

This paper contains 41 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Rule-Guided Feedback framework: Teacher LLM generates rules, provides feedback, and evaluates. Performer LLM generates new answers or questions based on rules and feedback.
  • Figure 2: Performance of RGF in terms of Mean Conversation Length in Accurate cases (MCA), Mean Conversation Length (MCL), and Dialogue Density (DD).
  • Figure 3: Performance across different iteration limits. Most tasks achieve optimal results within 5 iterations, with diminishing returns thereafter.
  • Figure 4: Rule violation rates with and without expert validation. Expert validation reduces rule violations by an average of 12% across tasks.
  • Figure 5: Comprehensive analysis of RGF components. Performance comparison showing relative contribution of each framework component and the impact of removing key components from RGF framework. Removing clarifying questions and expert validation consistently reduces performance across all tasks.
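The conversation-length statistics in Figure 2 can be sketched as follows, assuming MCL is the mean number of turns over all conversations and MCA is the same mean restricted to conversations that ended in a correct answer. The `Conversation` record, its turn unit, and the sample logs are hypothetical and are not results from the paper; Dialogue Density is not defined in this section, so it is left out.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Conversation:
    turns: int     # hypothetical unit: Performer/Teacher exchanges in one dialogue
    correct: bool  # whether the final answer was judged correct

def mean_conversation_length(convs: List[Conversation]) -> float:
    """MCL (assumed definition): mean number of turns over all conversations."""
    if not convs:
        return float("nan")
    return sum(c.turns for c in convs) / len(convs)

def mean_length_accurate(convs: List[Conversation]) -> float:
    """MCA (assumed definition): mean turns over conversations with a correct answer."""
    return mean_conversation_length([c for c in convs if c.correct])

# Toy logs for illustration only.
logs = [Conversation(2, True), Conversation(4, True), Conversation(5, False), Conversation(1, True)]
print(mean_conversation_length(logs))        # prints 3.0
print(round(mean_length_accurate(logs), 2))  # prints 2.33
```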