GraphIF: Enhancing Multi-Turn Instruction Following for Large Language Models with Relation Graph Prompt
Zhenhe Li, Can Lin, Ling Zheng, Wen-Da Wei, Junli Liang, Qi Song
TL;DR
GraphIF targets the problem of enforcing inter-turn constraints in multi-turn instruction following without fine-tuning. It models a dialogue as a directed relation graph and uses graph prompts to refine the LLM's initial responses. An agent-based relation extraction workflow iteratively identifies inter-turn relations and builds the graph. A prompt generation step then converts the graph into a natural-language relation graph prompt, and a rewriting step uses that prompt to revise the initial response into the final output. Experiments on MT-Eval* and StructFlowBench* show substantial improvements on all four metrics (CSR, ISR, DRFR, and WCSR) across multiple backbones, whereas memory-based baselines fail to capture inter-turn relations. GraphIF is training-free, plug-and-play, and scalable across model sizes.
Abstract
Multi-turn instruction following is essential for building intelligent conversational systems that can consistently adhere to instructions across dialogue turns. However, existing approaches to enhancing multi-turn instruction following primarily rely on collecting or generating large-scale multi-turn dialogue datasets to fine-tune large language models (LLMs). These approaches treat each response generation as an isolated task and fail to explicitly incorporate multi-turn instruction following into the optimization objectives. As a result, instruction-tuned LLMs often struggle with complex long-distance constraints. In multi-turn dialogues, relational constraints across turns can be naturally modeled as labeled directed edges, making graph structures particularly suitable for modeling multi-turn instruction following. Despite this potential, leveraging graph structures to enhance the multi-turn instruction-following capabilities of LLMs remains unexplored. To bridge this gap, we propose GraphIF, a plug-and-play framework that models multi-turn dialogues as directed relation graphs and leverages graph prompts to enhance the instruction-following capabilities of LLMs. GraphIF comprises three key components: (1) an agent-based relation extraction module that captures inter-turn semantic relations via action-triggered mechanisms to construct structured graphs; (2) a relation graph prompt generation module that converts structured graph information into natural language prompts; and (3) a response rewriting module that refines initial LLM outputs using the generated graph prompts. Extensive experiments on two long multi-turn dialogue datasets demonstrate that GraphIF can be seamlessly integrated into instruction-tuned LLMs and leads to significant improvements across all four multi-turn instruction-following evaluation metrics.
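
To make the graph formulation concrete, the sketch below shows one way a dialogue's inter-turn relations could be stored as labeled directed edges and rendered as a natural-language prompt for the rewriting step. It is an illustration under stated assumptions, not the authors' implementation: the class and function names, the relation labels ("refinement", "recall"), the prompt wording, and the convention that edges point from an earlier turn to a later one are all hypothetical. The agent-based extraction step and the LLM call itself are omitted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Relation:
    """A labeled directed edge between two dialogue turns (hypothetical schema)."""
    source: int  # earlier turn that imposes or provides context
    target: int  # later turn that must respect the relation
    label: str   # relation type; the label vocabulary here is illustrative


@dataclass
class RelationGraph:
    """Directed relation graph over dialogue turns."""
    edges: List[Relation] = field(default_factory=list)

    def add_relation(self, source: int, target: int, label: str) -> None:
        # Assumption: relations only point from an earlier turn to a later one.
        if source >= target:
            raise ValueError("source turn must precede target turn")
        self.edges.append(Relation(source, target, label))

    def incoming(self, turn: int) -> List[Relation]:
        """Return all relations that constrain the given turn."""
        return [e for e in self.edges if e.target == turn]


def graph_prompt(graph: RelationGraph, turn: int) -> str:
    """Convert the relations affecting one turn into a natural-language prompt."""
    relations = graph.incoming(turn)
    if not relations:
        return f"Turn {turn} has no recorded relations to earlier turns."
    lines = [f"Turn {turn} is related to earlier turns as follows:"]
    for rel in sorted(relations, key=lambda r: r.source):
        lines.append(f"- It has a '{rel.label}' relation to turn {rel.source}.")
    return "\n".join(lines)


def rewrite_request(initial_response: str, prompt: str) -> str:
    """Assemble the text that would be sent to an LLM to rewrite a response."""
    return (
        f"{prompt}\n\n"
        f"Initial response:\n{initial_response}\n\n"
        "Rewrite the initial response so that it satisfies the relations above."
    )


if __name__ == "__main__":
    graph = RelationGraph()
    graph.add_relation(1, 3, "refinement")
    graph.add_relation(2, 3, "recall")
    print(rewrite_request("Draft answer for turn 3.", graph_prompt(graph, 3)))
```

Because the graph lives outside the model and only influences the prompt text, a design like this needs no training and can wrap any instruction-tuned backbone, which matches the plug-and-play property the abstract describes.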
