Fine-grained List-wise Alignment for Generative Medication Recommendation

Chenxiao Fan, Chongming Gao, Wentao Shi, Yaxin Gong, Zihao Zhao, Fuli Feng

TL;DR

FLAME tackles safe medication recommendation in multimorbidity by reframing prescription generation as a sequential, drug-by-drug decision process. It introduces step-wise GRPO with potential-based reward shaping, a two-stage drug filtering and list-editing framework, and a multi-source knowledge fusion strategy that injects structured clinical signals into a domain-specific LLM. The method achieves state-of-the-art accuracy, controllable safety–accuracy trade-offs, and robust generalization across time and institutions, demonstrating practical potential for clinical decision support. By combining fine-grained credit assignment with hybrid representations, FLAME offers a scalable, interpretable approach to DDI-aware prescription generation in real-world settings.

Abstract

Accurate and safe medication recommendations are critical for effective clinical decision-making, especially in multimorbidity cases. However, existing systems rely on point-wise prediction paradigms that overlook synergistic drug effects and potential adverse drug-drug interactions (DDIs). We propose FLAME, a fine-grained list-wise alignment framework for large language models (LLMs), enabling drug-by-drug generation of drug lists. FLAME formulates recommendation as a sequential decision process, where each step adds or removes a single drug. To provide fine-grained learning signals, we devise step-wise Group Relative Policy Optimization (GRPO) with potential-based reward shaping, which explicitly models DDIs and optimizes the contribution of each drug to the overall prescription. Furthermore, FLAME enhances patient modeling by integrating structured clinical knowledge and collaborative information into the representation space of LLMs. Experiments on benchmark datasets demonstrate that FLAME achieves state-of-the-art performance, delivering superior accuracy, controllable safety-accuracy trade-offs, and strong generalization across diverse clinical scenarios. Our code is available at https://github.com/cxfann/Flame.
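The potential-based shaping idea in the abstract can be illustrated with a minimal sketch. The paper's actual potential function is not given in this summary, so the sketch assumes a hypothetical one: the Jaccard overlap between the drugs generated so far and the ground-truth set. The function names, the drug identifiers, and the choice of discount factor 1 are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Set


def potential(prefix: List[str], ground_truth: Set[str]) -> float:
    """Hypothetical potential phi(n): Jaccard overlap between the drugs
    chosen so far and the ground-truth prescription (an assumption)."""
    chosen = set(prefix)
    union = chosen | ground_truth
    if not union:
        return 0.0
    return len(chosen & ground_truth) / len(union)


def shaped_step_rewards(drugs: List[str], ground_truth: Set[str]) -> List[float]:
    """Per-step shaping reward phi(n) - phi(n-1) for n = 1..N,
    taking the discount factor to be 1 (an assumption)."""
    rewards = []
    prev = potential([], ground_truth)
    for n in range(1, len(drugs) + 1):
        cur = potential(drugs[:n], ground_truth)
        rewards.append(cur - prev)  # positive if this drug improved the list
        prev = cur
    return rewards


if __name__ == "__main__":
    # Placeholder drug identifiers; "drug_x" is not in the ground truth.
    steps = shaped_step_rewards(["drug_a", "drug_x", "drug_b"], {"drug_a", "drug_b"})
    print(steps)       # one signal per generated drug
    print(sum(steps))  # telescopes to phi(N) - phi(0)
```

Under this assumed potential, a drug that belongs to the ground truth receives a positive signal and an incorrect one receives a negative signal, while the per-step signals sum to the difference between the final and initial potentials.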

Paper Structure

This paper contains 27 sections, 2 theorems, 21 equations, 7 figures, 3 tables.

Key Result

Theorem 4.1

Let the token-level reward for each token $t$ in the generated output $\mathbf{o}_i$ be defined under the following two schemes: Then, both reward formulations yield the same optimal policy.

Figures (7)

  • Figure 1: Contrasting advantage computation in GRPO and step-wise GRPO. (a) Outcome-based advantages in GRPO assign uniform rewards to all drugs in a completion. (b) Step-wise GRPO treats responses as sequential steps, where potential function variations at each step provide temporal signals, enabling finer-grained advantage allocation.
  • Figure 2: (a) Output is segmented by medication names into decision steps. (b) Each step is viewed as a state, with potentials $\varphi(n)$ derived from comparisons with ground truth $\mathcal{M}_{\text{GT}}$. (c) Potential differences are combined with outcome rewards to provide fine-grained training signals $\hat{A}$.
  • Figure 2: Ablation study. "w/o" denotes "without".
  • Figure 3: Illustration of the two-stage recommendation framework.
  • Figure 4: Overview of patient representation construction.
  • ...and 2 more figures
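The contrast drawn in Figure 1 can be sketched in a few lines. The group-normalized outcome advantage below follows the usual GRPO formulation (reward minus group mean, divided by group standard deviation); the use of the population standard deviation and the purely additive way of combining the outcome advantage with potential differences are assumptions made for illustration, since this summary does not give the paper's exact formula.

```python
from statistics import mean, pstdev
from typing import List


def outcome_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Group-relative advantage: one scalar per completion in the group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an implementation choice
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_uniform(advantage: float, num_steps: int) -> List[float]:
    """Outcome-based GRPO: every drug in a completion shares one advantage."""
    return [advantage] * num_steps


def stepwise_advantages(advantage: float, potentials: List[float]) -> List[float]:
    """Hypothetical step-wise variant: add phi(n) - phi(n-1) to the outcome
    advantage at step n. `potentials` holds [phi(0), ..., phi(N)]."""
    return [
        advantage + (potentials[n] - potentials[n - 1])
        for n in range(1, len(potentials))
    ]


if __name__ == "__main__":
    group = outcome_advantages([1.0, 0.0])          # two sampled completions
    print(broadcast_uniform(group[0], 2))           # same value at every step
    print(stepwise_advantages(group[0], [0.0, 0.5, 0.25]))  # differs per step
```

In the uniform case a harmful drug inside a good completion is credited exactly like the helpful ones; in the step-wise sketch the second step, where the assumed potential drops, receives a smaller advantage than the first.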

Theorems & Definitions (3)

  • Theorem 4.1: Optimal Policy Equivalence under Reward Reshaping
  • proof
  • Lemma A.1: Ng et al., 1999
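
For context, the classical result of Ng et al. (1999) that Lemma A.1 refers to can be stated as follows. This is the standard textbook form, written in generic MDP notation ($s$, $a$, $s'$, $\gamma$, $\Phi$, $F$, $R$) that may differ from the symbols used in the paper. A shaping term is potential-based if it is a discounted difference of a state potential $\Phi$:

$$F(s, a, s') = \gamma\,\Phi(s') - \Phi(s), \qquad R'(s, a, s') = R(s, a, s') + F(s, a, s').$$

Every policy that is optimal under the shaped reward $R'$ is also optimal under the original reward $R$, and vice versa. The intuition is that the shaping terms telescope along any trajectory $s_0, s_1, \dots, s_T$:

$$\sum_{t=0}^{T-1} \gamma^{t} F(s_t, a_t, s_{t+1}) = \gamma^{T}\,\Phi(s_T) - \Phi(s_0),$$

so the added return depends only on the start and end states, not on the actions taken in between.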