GraphIF: Enhancing Multi-Turn Instruction Following for Large Language Models with Relation Graph Prompt

Zhenhe Li, Can Lin, Ling Zheng, Wen-Da Wei, Junli Liang, Qi Song

TL;DR

GraphIF addresses the challenge of enforcing inter-turn constraints in multi-turn instruction following without fine-tuning. It models dialogues as directed relation graphs and uses graph prompts to refine initial responses. An agent-based relation extraction workflow iteratively identifies inter-turn relations and builds the graph. A relation graph prompt is then generated from the graph and used to rewrite the initial response into the final output. Experiments on MT-Eval* and StructFlowBench* show substantial improvements in CSR, ISR, DRFR, and WCSR for multiple backbone models, whereas memory-based baselines fail to capture inter-turn relations. GraphIF is training-free, plug-and-play, and scales across model sizes.
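
The four metrics are not defined on this page. Assuming they follow the StructFlowBench definitions (Constraint Satisfaction Rate, Instruction Satisfaction Rate, Decomposed Requirements Following Ratio, and Weighted Constraint Satisfaction Rate), they can be sketched as follows. The per-constraint judgments, the constraint type names, and the type weights are hypothetical inputs. The paper's exact weighting scheme is not given here.

```python
from typing import Dict, List, Tuple

# One judged constraint: (constraint type, whether the response satisfied it).
Constraint = Tuple[str, bool]
# One inner list per instruction (dialogue turn) in the evaluation set.
Judgments = List[List[Constraint]]


def csr(j: Judgments) -> float:
    """Mean, over instructions, of the fraction of that instruction's constraints satisfied."""
    return sum(sum(ok for _, ok in turn) / len(turn) for turn in j) / len(j)


def isr(j: Judgments) -> float:
    """Fraction of instructions whose constraints are all satisfied."""
    return sum(all(ok for _, ok in turn) for turn in j) / len(j)


def drfr(j: Judgments) -> float:
    """Satisfied constraints over all constraints, pooled across instructions."""
    satisfied = sum(ok for turn in j for _, ok in turn)
    total = sum(len(turn) for turn in j)
    return satisfied / total


def wcsr(j: Judgments, weights: Dict[str, float]) -> float:
    """Like DRFR, but each constraint counts with its type's weight (default 1.0; assumed scheme)."""
    num = sum(weights.get(t, 1.0) * ok for turn in j for t, ok in turn)
    den = sum(weights.get(t, 1.0) for turn in j for t, _ in turn)
    return num / den


if __name__ == "__main__":
    judgments = [
        [("format", True), ("refinement", False)],
        [("length", True)],
        [("recall", True), ("format", True), ("style", False)],
    ]
    # Hypothetical weights: inter-turn constraint types count double.
    weights = {"refinement": 2.0, "recall": 2.0}
    print(csr(judgments), isr(judgments), drfr(judgments), wcsr(judgments, weights))
```

On this toy data CSR is about 0.722, ISR is 1/3, DRFR is 4/6, and WCSR is 5/8. ISR never exceeds CSR because it credits an instruction only when every one of its constraints holds.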

Abstract

Multi-turn instruction following is essential for building intelligent conversational systems that can consistently adhere to instructions across dialogue turns. However, existing approaches to enhancing multi-turn instruction following primarily rely on collecting or generating large-scale multi-turn dialogue datasets to fine-tune large language models (LLMs). These approaches treat each response generation as an isolated task and fail to explicitly incorporate multi-turn instruction following into the optimization objective. As a result, instruction-tuned LLMs often struggle with complex long-distance constraints. In multi-turn dialogues, relational constraints across turns can be naturally modeled as labeled directed edges, making graph structures particularly suitable for modeling multi-turn instruction following. Despite this potential, leveraging graph structures to enhance the multi-turn instruction-following capabilities of LLMs remains unexplored. To bridge this gap, we propose GraphIF, a plug-and-play framework that models multi-turn dialogues as directed relation graphs and leverages graph prompts to enhance the instruction-following capabilities of LLMs. GraphIF comprises three key components: (1) an agent-based relation extraction module that captures inter-turn semantic relations via action-triggered mechanisms to construct structured graphs; (2) a relation graph prompt generation module that converts structured graph information into natural language prompts; and (3) a response rewriting module that refines initial LLM outputs using the generated graph prompts. Extensive experiments on two long multi-turn dialogue datasets demonstrate that GraphIF can be seamlessly integrated into instruction-tuned LLMs and leads to significant improvements across all four multi-turn instruction-following evaluation metrics.
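
The abstract models inter-turn constraints as labeled directed edges and converts the graph into natural-language prompts. A minimal sketch of that idea follows. It assumes edges point from the earlier turn that introduces a constraint to the later turn that must respect it. The `Relation` and `RelationGraph` classes, the relation labels, and the prompt template are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Relation:
    src: int     # earlier turn that introduces the constraint
    dst: int     # later turn that must respect it
    label: str   # relation type (hypothetical label set)
    detail: str  # natural-language statement of the constraint


@dataclass
class RelationGraph:
    num_turns: int
    edges: List[Relation] = field(default_factory=list)

    def add(self, rel: Relation) -> None:
        # Assumed convention: edges point forward in time, from an earlier turn to a later one.
        if not 0 <= rel.src < rel.dst < self.num_turns:
            raise ValueError(f"invalid edge {rel.src} -> {rel.dst}")
        self.edges.append(rel)

    def incoming(self, turn: int) -> List[Relation]:
        # All constraints that earlier turns impose on `turn`.
        return [e for e in self.edges if e.dst == turn]


def graph_prompt(graph: RelationGraph, turn: int) -> str:
    """Render the edges ending at `turn` as a plain-language prompt."""
    edges = sorted(graph.incoming(turn), key=lambda e: e.src)
    if not edges:
        return f"No inter-turn constraints apply to turn {turn}."
    lines = [f"- Turn {e.src} -> turn {e.dst} [{e.label}]: {e.detail}" for e in edges]
    header = f"The response to turn {turn} must also respect these constraints from earlier turns:"
    return "\n".join([header, *lines])


if __name__ == "__main__":
    g = RelationGraph(num_turns=3)
    g.add(Relation(1, 2, "refinement", "keep the formal tone requested in turn 1"))
    g.add(Relation(0, 2, "format", "answer as a numbered list"))
    print(graph_prompt(g, 2))
```

Running the example prints a header followed by the two constraints on turn 2, ordered by source turn. Rendering only the edges that end at the current turn keeps the prompt focused on the response being generated.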

Paper Structure

This paper contains 45 sections, 6 equations, 10 figures, and 10 tables.

Figures (10)

  • Figure 1: A comparison of two types of methods: an instruction-tuned LLM alone, and our proposed GraphIF, which uses graph structure to enhance multi-turn instruction following.
  • Figure 2: Framework overview of GraphIF. Given the dialogue history and the current user instruction, GraphIF first extracts semantic relations between dialogue turns through the Agent-Based Relation Extraction module, then employs Relation Graph Prompt Generation to generate constraint-aware prompts, and finally uses the Initial Response Rewrite module to refine the initial response.
  • Figure 3: Detailed constraint satisfaction results across different constraint types on StructFlowBench*.
  • Figure 4: A typical case demonstrating how GraphIF iteratively extracts relations and corrects errors in the initial response through generated relation graph prompts.
  • Figure 7: Detailed constraint satisfaction results across different constraint types on MT-Eval*.
  • ...and 5 more figures
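
Figure 2 describes the framework as three stages applied to a draft response. The sketch below wires them together around a generic text-in/text-out model callable. Everything concrete here is an assumption for illustration: the `ADD`/`FINISH` action format, the prompt wording, the step limit, and the function names are hypothetical and are not the paper's prompts or action set. It only shows the control flow: draft an initial response, iteratively extract inter-turn relations, render them as a graph prompt, and rewrite.

```python
from typing import Callable, List, Tuple

# Any text-in/text-out model; a real system would wrap an LLM API call here.
LLM = Callable[[str], str]
# (earlier turn index, relation label, constraint detail)
Edge = Tuple[int, str, str]


def extract_relations(llm: LLM, history: List[str], instruction: str,
                      max_steps: int = 8) -> List[Edge]:
    """Agent loop (hypothetical action format): one action per step until FINISH."""
    edges: List[Edge] = []
    context = "\n".join(f"[{i}] {t}" for i, t in enumerate(history))
    for _ in range(max_steps):
        found = "\n".join(f"{s}|{lab}|{d}" for s, lab, d in edges) or "(none)"
        action = llm(
            f"Dialogue history:\n{context}\nCurrent instruction: {instruction}\n"
            f"Relations found so far:\n{found}\n"
            "Reply with 'ADD <turn>|<label>|<detail>' or 'FINISH'."
        ).strip()
        if not action.startswith("ADD "):
            break  # FINISH (or any unrecognized action) ends extraction
        parts = [p.strip() for p in action[4:].split("|", 2)]
        if len(parts) != 3 or not parts[0].isdigit() or int(parts[0]) >= len(history):
            break  # malformed action or out-of-range turn: stop rather than guess
        edge = (int(parts[0]), parts[1], parts[2])
        if edge in edges:
            break  # a repeated action means no progress
        edges.append(edge)
    return edges


def render_prompt(edges: List[Edge]) -> str:
    """Turn extracted edges into a natural-language constraint list."""
    if not edges:
        return ""
    lines = [f"- (from turn {s}, {lab}) {d}" for s, lab, d in sorted(edges)]
    return "Constraints carried over from earlier turns:\n" + "\n".join(lines)


def respond(llm: LLM, history: List[str], instruction: str) -> str:
    """Initial response -> relation extraction -> graph prompt -> rewrite."""
    convo = "\n".join(history + [f"User: {instruction}", "Assistant:"])
    initial = llm(convo)
    prompt = render_prompt(extract_relations(llm, history, instruction))
    if not prompt:
        return initial  # nothing to enforce; keep the initial response
    return llm(
        f"{prompt}\n\nInstruction: {instruction}\nInitial response:\n{initial}\n"
        "Rewrite the initial response so that it satisfies every constraint above."
    )


if __name__ == "__main__":
    def scripted_llm(prompt: str) -> str:
        # Stand-in model for illustration only; it follows a fixed script.
        if prompt.startswith("Dialogue history"):
            return "ADD 0|format|answer as a numbered list" if "(none)" in prompt else "FINISH"
        if "Rewrite the initial response" in prompt:
            return "1. Paris"
        return "Paris"

    history = ["User: Answer every question as a numbered list.", "Assistant: Understood."]
    print(respond(scripted_llm, history, "What is the capital of France?"))  # 1. Paris
```

Because the model is passed in as a plain callable, the same wrapper can sit in front of any backbone without changing its weights. This matches the training-free, plug-and-play framing, though the real modules are certainly more elaborate than this sketch.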