RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment
Xiaohan Wang, Xiaoyan Yang, Yuqi Zhu, Yue Shen, Jian Wang, Peng Wei, Lei Liang, Jinjie Gu, Huajun Chen, Ningyu Zhang
TL;DR
RuleAlign addresses the gap between LLM-based medical dialogue and physician-level diagnostic reasoning by aligning model outputs with explicit diagnostic rules. The authors build UrologyRD, a rule-driven dialogue dataset, and train models in two phases: supervised fine-tuning, followed by offline preference optimization that favors rule-compliant responses. They report that RuleAlign improves perplexity, ROUGE, and BLEU across multiple base models and raises multidimensional standardized patient (SP) testing scores, indicating better information gathering, guidance, and logical deduction. The work offers a scalable way to encode professional diagnostic rules into LLM behavior, along with a dataset and evaluation framework that can be extended in future work.
Abstract
Large Language Models (LLMs) like GPT-4, MedPaLM-2, and Med-Gemini achieve performance competitive with human experts across various medical benchmarks. However, they still face challenges in making professional diagnoses akin to physicians, particularly in efficiently gathering patient information and reasoning toward the final diagnosis. To this end, we introduce the RuleAlign framework, designed to align LLMs with specific diagnostic rules. We develop a medical dialogue dataset comprising rule-based communications between patients and physicians and design an alignment learning approach through preference learning. Experimental results demonstrate the effectiveness of the proposed approach. We hope that our work can serve as an inspiration for exploring the potential of LLMs as AI physicians.
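
The preference-learning step mentioned above can be illustrated with a minimal sketch. The text here does not state the exact objective, so the code below assumes a DPO-style pairwise loss, a common choice for offline preference optimization, in which a rule-compliant response is the "chosen" one and a rule-violating response is the "rejected" one. The function name, argument names, and the `beta` value are hypothetical and serve only as illustration.

```python
import math


def dpo_style_loss(policy_chosen_logp, policy_rejected_logp,
                   ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Pairwise preference loss for one (chosen, rejected) response pair.

    Assumption: a DPO-style objective. Inputs are summed token
    log-probabilities of each response under the trained policy and
    under a frozen reference model (for example, the model obtained
    after supervised fine-tuning).
    """
    # How much more the policy likes each response than the reference does.
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)

    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical log-probabilities for a single pair of responses.
loss = dpo_style_loss(
    policy_chosen_logp=-4.0,    # rule-compliant response
    policy_rejected_logp=-8.0,  # rule-violating response
    ref_chosen_logp=-5.0,
    ref_rejected_logp=-5.0,
)
print(f"loss = {loss:.4f}")
```

Under this assumed objective, the loss falls below log 2 once the policy favors the rule-compliant response more strongly than the reference model does, and rises above log 2 in the opposite case.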
