MedCoG: Maximizing LLM Inference Density in Medical Reasoning via Meta-Cognitive Regulation

Yu Zhao, Hao Guan, Yongcheng Jing, Ying Zhang, Dacheng Tao

TL;DR

MedCoG introduces a medical meta-cognition framework that regulates LLM reasoning to maximize inference density and efficiency. A Meta-Cognition Regulator assesses problem complexity, familiarity, and knowledge density to steer the Knowledge Executor toward Procedural (SCoT), Factual (KG verification), or Episodic (memory) knowledge, enabling on-demand reasoning. Two metrics, Inference Density and Inference Incremental Efficiency, quantify how effectively additional computation translates into performance gains, and experiments across five hard medical benchmarks show a 5.5x inference density improvement and superior accuracy. The Oracle study underscores the potential upper bound of meta-cognitive regulation, pointing to future work in adaptive calibration and knowledge evolution to further close the gap to ideal strategy selection.
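The routing idea can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the `MetaScores` container, the `select_knowledge` function, the 0-to-1 score range, the 0.5 thresholds, and the mapping from each score to a knowledge type are all hypothetical. The source only states that the three assessments steer the choice among procedural, factual, and episodic knowledge, not how.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetaScores:
    """Hypothetical self-assessment scores, each assumed to lie in [0, 1]."""
    complexity: float         # how hard the question looks
    familiarity: float        # how well the model believes it knows the topic
    knowledge_density: float  # how much factual knowledge the question needs


def select_knowledge(scores: MetaScores, threshold: float = 0.5) -> List[str]:
    """Pick knowledge types on demand (illustrative rule, not the paper's)."""
    selected = []
    if scores.complexity >= threshold:
        selected.append("procedural")  # e.g. structured chain-of-thought
    if scores.knowledge_density >= threshold:
        selected.append("factual")     # e.g. knowledge-graph verification
    if scores.familiarity < threshold:
        selected.append("episodic")    # e.g. recall of past cases
    return selected                    # empty list: answer directly


easy = select_knowledge(MetaScores(0.2, 0.9, 0.1))
hard = select_knowledge(MetaScores(0.9, 0.3, 0.8))
```

Under these assumed thresholds, an easy and familiar question triggers no extra knowledge, so it incurs no additional cost, while a hard and unfamiliar one triggers all three types.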

Abstract

Large Language Models (LLMs) have shown strong potential in complex medical reasoning yet face diminishing gains under inference scaling laws. While existing studies augment LLMs with various knowledge types, it remains unclear how effectively the additional costs translate into accuracy. In this paper, we explore how meta-cognition of LLMs, i.e., their self-awareness of their own knowledge states, can regulate the reasoning process. Specifically, we propose MedCoG, a Medical Meta-Cognition Agent with Knowledge Graph, where the meta-cognitive assessments of task complexity, familiarity, and knowledge density dynamically regulate the utilization of procedural, episodic, and factual knowledge. This LLM-centric on-demand reasoning aims to mitigate scaling laws by (1) reducing cost by avoiding indiscriminate scaling and (2) improving accuracy by filtering out distracting knowledge. To validate this, we empirically characterize the scaling curve and introduce inference density to quantify inference efficiency, defined as the ratio of theoretically effective cost to actual cost. Experiments demonstrate the effectiveness and efficiency of MedCoG on five hard sets of medical benchmarks, yielding 5.5x inference density. Furthermore, the Oracle study highlights the significant potential of meta-cognitive regulation.
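The definition of inference density as a ratio of theoretically effective cost to actual cost can be made concrete with a small sketch. It assumes a log-linear scaling curve, accuracy = a + b ln(cost), which is a common choice but not necessarily the form fitted in the paper. The function names and the baseline data points are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math


def fit_log_linear(costs, accuracies):
    """Least-squares fit of accuracy = a + b * ln(cost) (assumed curve form)."""
    xs = [math.log(c) for c in costs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def effective_cost(accuracy, a, b):
    """Cost the fitted curve would need to reach the given accuracy."""
    return math.exp((accuracy - a) / b)


def inference_density(accuracy, actual_cost, a, b):
    """Ratio of theoretically effective cost to actual cost."""
    return effective_cost(accuracy, a, b) / actual_cost


# Made-up baseline points: cost in arbitrary units, accuracy in [0, 1].
baseline_costs = [1.0, 2.0, 4.0, 8.0]
baseline_accs = [0.60, 0.65, 0.70, 0.75]
a, b = fit_log_linear(baseline_costs, baseline_accs)

# A method that reaches 0.75 accuracy at cost 4 instead of the curve's 8.
density = inference_density(accuracy=0.75, actual_cost=4.0, a=a, b=b)
```

In this toy setup the fitted curve needs a cost of 8 to reach 0.75 accuracy, so a method that gets there at a cost of 4 has a density of about 2. A method lying exactly on the curve has a density of 1.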

Paper Structure

This paper contains 34 sections, 6 equations, 6 figures, 12 tables.

Figures (6)

  • Figure 1: (a) Inference Cost-Accuracy Analysis on MedQA based on GPT-4o. (b) Inference Density Analysis. The inference scaling curve is fitted with $R^2=0.91$. MedCoG-Meta advances the Pareto Frontier and shows $5.5\times$ inference density. MedCoG-Oracle reveals the upper bound of meta-cognitive regulation.
  • Figure 2: The MedCoG framework for medical reasoning, composed of (1) a Meta-Cognition Regulator, which routes the reasoning strategies via Monitoring, Planning, and Evaluating, and (2) a Knowledge Executor, which provides Episodic (Recalling Experiences), Procedural (Knowing How), and Factual (Knowing What) knowledge. Based on its assessment of complexity, familiarity, and knowledge density, the regulator dynamically routes reasoning over the different knowledge types.
  • Figure 3: Strategy Distribution and Performance with different Monitoring backbones on MedQA-H and on different datasets.
  • Figure 4: Meta-Cognition score threshold study of Knowledge Density and Familiarity on MedQA. The Complexity study is in Figure \ref{fig:complexity_thres}.
  • Figure 5: Score Distributions of Meta-Monitor with different backbones on MedQA Full set and Hard set.
  • ...and 1 more figure