Knowledge-Augmented Long-CoT Generation for Complex Biomolecular Reasoning

Tianwen Lyu, Xiang Zhuang, Keyan Ding, Xinzhe Cao, Lei Liang, Wei Zhao, Qiang Zhang, Huajun Chen

TL;DR

The paper tackles the challenge of reliable, multi-step biomolecular reasoning where LLMs struggle with grounding and long-range dependencies. It introduces Bio-KCoT, a knowledge-augmented long-CoT framework that retrieves and prunes knowledge-graph–guided reasoning paths and integrates them into supervised fine-tuning and reinforcement learning. To support rigorous evaluation, it also introduces PrimeKGQA, a diverse biomolecular QA benchmark with varying reasoning depths. Across PrimeKGQA and external datasets, Bio-KCoT achieves state-of-the-art performance on deep, multi-hop tasks and demonstrates strong generalization with smaller models, highlighting the value of structured knowledge in biology-oriented reasoning.

Abstract

Understanding complex biomolecular mechanisms requires multi-step reasoning across molecular interactions, signaling cascades, and metabolic pathways. While large language models (LLMs) show promise in such tasks, their application to biomolecular problems is hindered by logical inconsistencies and the lack of grounding in domain knowledge. Existing approaches often exacerbate these issues: reasoning steps may deviate from biological facts or fail to capture long mechanistic dependencies. To address these challenges, we propose a Knowledge-Augmented Long-CoT Reasoning framework that integrates LLMs with knowledge graph-based multi-hop reasoning chains. The framework constructs mechanistic chains via guided multi-hop traversal and pruning on the knowledge graph; these chains are then incorporated into supervised fine-tuning to improve factual grounding and further refined with reinforcement learning to enhance reasoning reliability and consistency. Furthermore, to overcome the shortcomings of existing benchmarks, which are often restricted in scale and scope and lack annotations for deep reasoning chains, we introduce PrimeKGQA, a comprehensive benchmark for biomolecular question answering. Experimental results on both PrimeKGQA and existing datasets show that, although larger closed-source models still perform well on relatively simple tasks, our method has clear advantages as reasoning depth increases, achieving state-of-the-art performance on multi-hop tasks that demand traversal of structured biological knowledge. These findings highlight the effectiveness of combining structured knowledge with advanced reasoning strategies for reliable and interpretable biomolecular reasoning.

Paper Structure

This paper contains 32 sections, 10 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Comparison of different reasoning approaches for biomolecular problems. (a) Reasoning LLMs generate multi-step reasoning chains but often suffer from hallucinations and logical inconsistencies. (b) Retrieval-augmented generation and related knowledge-enhanced methods reduce hallucinations but remain dependent on the quality and coverage of their knowledge sources. (c) Our proposed Knowledge-Augmented Long-CoT framework integrates knowledge graph-guided reasoning, enabling logically coherent and reliable reasoning chains for complex biomolecular tasks.
  • Figure 2: Overview of our proposed framework. (a) Data Curation: Given a biomolecular question and candidate answers, we extract entities from the question ($E_Q$) and the correct answer ($E_A$). (b) Retrieving Reasoning Paths: The extracted entities are mapped onto KG nodes. Reasoning paths $\mathcal{P}(Q,A;d)$ are then retrieved using predefined templates. (c) CoT Trajectory Construction: The induced path $p$ provides semantic relations that guide the initial CoT generation. The generated trajectories are further refined through the pruning stage to ensure clarity and accuracy. (d) Training Pipeline: The curated $(Q,A,C_{\text{pruned}})$ pairs are used for supervised fine-tuning (SFT), followed by reinforcement learning (GRPO) to align the model’s reasoning and answer generation.
  • Figure 3: (a) Generalization results on additional biomedical benchmarks (BiomixQA, BioASQ, and MEDDDX). (b) Ablation study on the PrimeKGQA benchmark against the distilled CoT baseline.
  • Figure 4: The distribution of the PrimeKGQA test dataset across various tasks.
  • Figure 5: A detailed case study illustrating our three-stage methodology. The example demonstrates the transformation of a response, starting from the foundational KG path, to a verbose answer via CoT generation, and concluding with a concise, refined answer after CoT pruning.
  • ...and 4 more figures
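
The path-retrieval step in Figure 2(b), where reasoning paths $\mathcal{P}(Q,A;d)$ connect question entities $E_Q$ to answer entities $E_A$, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy triples, the entity names (`drug_X`, `gene_A`, and so on), and the helpers `build_adjacency` and `retrieve_paths` are all hypothetical, and the paper's predefined path templates are not modeled. The sketch only assumes that $d$ acts as an upper bound on the number of hops and enumerates simple directed paths within that bound.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples.
# Entity and relation names are placeholders, not taken from PrimeKG.
TRIPLES = [
    ("drug_X", "targets", "gene_A"),
    ("gene_A", "participates_in", "pathway_P"),
    ("pathway_P", "associated_with", "disease_D"),
    ("drug_X", "indicated_for", "disease_D"),
    ("gene_A", "interacts_with", "gene_B"),
]


def build_adjacency(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    adj = defaultdict(list)
    for head, relation, tail in triples:
        adj[head].append((relation, tail))
    return adj


def retrieve_paths(adj, question_entities, answer_entities, max_depth):
    """Enumerate simple paths from any question entity to any answer
    entity that use at most max_depth edges (assumed meaning of d)."""
    targets = set(answer_entities)
    paths = []

    def dfs(node, visited, path):
        # Record the path once an answer entity is reached, then stop extending it.
        if path and node in targets:
            paths.append(list(path))
            return
        # Respect the hop budget.
        if len(path) == max_depth:
            return
        for relation, neighbor in adj.get(node, []):
            if neighbor in visited:
                continue  # keep paths simple (no repeated entities)
            visited.add(neighbor)
            path.append((node, relation, neighbor))
            dfs(neighbor, visited, path)
            path.pop()
            visited.remove(neighbor)

    for start in question_entities:
        dfs(start, {start}, [])
    return paths


if __name__ == "__main__":
    adj = build_adjacency(TRIPLES)
    for p in retrieve_paths(adj, ["drug_X"], ["disease_D"], max_depth=3):
        print(" -> ".join(f"{h} [{r}] {t}" for h, r, t in p))
```

Each returned path is a list of triples, so its relation sequence can be read off directly and used to guide an initial chain-of-thought draft, in the spirit of Figure 2(c). A real pipeline would presumably also filter paths against templates and handle entity linking, both of which are omitted here.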