
Compartmentalised Agentic Reasoning for Clinical NLI

Maël Jullien, Lei Xu, Marco Valentino, André Freitas

TL;DR

The paper tackles schema collapse (answering by surface matching instead of applying the required inferential schema) in clinical trial natural language inference (CTNLI). It introduces CARENLI, a compartmentalised agentic framework that routes each premise-statement pair to a reasoning family, runs a family-specific solver that produces an explicit reasoning trace, and uses a verifier and a refiner to make auditable corrections. The authors extend the CTNLI benchmark to 200 items across four reasoning families. On it, CARENLI improves mean accuracy by approximately 34 percentage points across four backbone models, with the largest gains on Risk State Abstraction and Epistemic Verification. A component-level analysis shows that the solvers' structured inference schemas deliver most of the gains, while routing accuracy remains a critical bottleneck. Overall, CARENLI offers a practical pathway to more reliable, auditable clinical inference by enforcing schema-aligned reasoning trajectories. The authors note limitations on compositional reasoning and the need for broader validation.

Abstract

Large language models can produce fluent judgments for clinical natural language inference, yet they frequently fail when the decision requires the correct inferential schema rather than surface matching. We introduce CARENLI, a compartmentalised agentic framework that routes each premise-statement pair to a reasoning family and then applies a specialised solver with explicit verification and targeted refinement. We evaluate on an expanded CTNLI benchmark of 200 instances spanning four reasoning families: Causal Attribution, Compositional Grounding, Epistemic Verification, and Risk State Abstraction. Across four contemporary backbone models, CARENLI improves mean accuracy from about 23% with direct prompting to about 57%, a gain of roughly 34 percentage points, with the largest benefits on structurally demanding reasoning types. These results support compartmentalisation plus verification as a practical route to more reliable and auditable clinical inference.
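The route, solve, verify, refine control flow described above can be outlined in code. The following is a minimal Python sketch under stated assumptions, not the authors' implementation.

- The function names (`run_pipeline`, `toy_route`, `toy_causal_solver`, `toy_verify`, `toy_refine`) are hypothetical.
- The three-way label set is an assumption.
- The keyword router and the "randomised design" schema rule are toy stand-ins for the LLM-backed components.

In the paper, the verifier audits factual grounding and schema compliance. Here it only checks label validity, that a trace exists, and one made-up causal rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# The four reasoning families named in the paper.
FAMILIES = ("Causal Attribution", "Compositional Grounding",
            "Epistemic Verification", "Risk State Abstraction")
LABELS = ("entailment", "neutral", "contradiction")  # assumed label set


@dataclass
class SolverOutput:
    label: str        # provisional NLI label
    trace: List[str]  # explicit, auditable reasoning steps


@dataclass
class Verdict:
    ok: bool
    issues: List[str] = field(default_factory=list)


Router = Callable[[str, str], str]
Solver = Callable[[str, str], SolverOutput]
Verifier = Callable[[str, str, str, SolverOutput], Verdict]
Refiner = Callable[[str, str, str, SolverOutput, Verdict], SolverOutput]


def run_pipeline(premise: str, statement: str, route: Router,
                 solvers: Dict[str, Solver], verify: Verifier,
                 refine: Refiner) -> Tuple[str, SolverOutput, Verdict]:
    """Route, solve with the family's solver, verify, and refine only on failure."""
    family = route(premise, statement)
    if family not in FAMILIES or family not in solvers:
        raise ValueError(f"no solver available for {family!r}")
    draft = solvers[family](premise, statement)
    verdict = verify(premise, statement, family, draft)
    final = draft if verdict.ok else refine(premise, statement, family, draft, verdict)
    return family, final, verdict


# Hypothetical rule-based stand-ins for the LLM-backed components.
def toy_route(premise: str, statement: str) -> str:
    return "Causal Attribution" if "caused" in statement.lower() else "Epistemic Verification"


def toy_causal_solver(premise: str, statement: str) -> SolverOutput:
    design = "randomised" if "randomised" in premise.lower() else "observational"
    return SolverOutput("entailment", [f"design: {design}", "outcome co-occurs with treatment"])


def toy_verify(premise: str, statement: str, family: str, out: SolverOutput) -> Verdict:
    issues = []
    if out.label not in LABELS:
        issues.append(f"unknown label {out.label!r}")
    if not out.trace:
        issues.append("empty reasoning trace")
    # Toy schema check: a causal entailment must rest on a randomised design.
    if (family == "Causal Attribution" and out.label == "entailment"
            and "design: randomised" not in out.trace):
        issues.append("causal claim not licensed by the study design")
    return Verdict(ok=not issues, issues=issues)


def toy_refine(premise: str, statement: str, family: str,
               out: SolverOutput, verdict: Verdict) -> SolverOutput:
    # Minimal correction: change only the label, keep the trace, record why.
    return SolverOutput("neutral", out.trace + ["refined: " + "; ".join(verdict.issues)])


family, final, verdict = run_pipeline(
    "In this observational cohort, patients on drug A had fewer events.",
    "Drug A caused the reduction in events.",
    toy_route, {"Causal Attribution": toy_causal_solver}, toy_verify, toy_refine)
print(family, final.label, verdict.ok)  # Causal Attribution neutral False
```

The refiner runs only when the verifier reports issues, and it changes as little as possible. This mirrors the "minimal correction" role the paper gives it, though the specific rule here is invented for illustration.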

Paper Structure

This paper contains 39 sections, 4 equations, 19 figures, 6 tables.

Figures (19)

  • Figure 1: CARENLI: Compartmentalised agentic reasoning framework for Clinical Trial NLI. A Router assigns each premise-statement pair to a dominant reasoning type (Causal Attribution, Compositional Grounding, Epistemic Verification, or Risk State Abstraction). A reasoning-type-specific solver produces a provisional NLI label and an explicit reasoning trace. A verifier audits this output for factual grounding and schema compliance, and a refiner minimally corrects it when needed. The figure illustrates the pipeline on a causal attribution example. Unlike generic prompting strategies, CARENLI enforces formalised, structured reasoning trajectories that are explicitly grounded in clinical trial semantics and checked for logical consistency.
  • Figure 2: Overall accuracy on CTNLI tasks across all models and evaluation strategies (CARENLI, Oracle Router, CoT, and Direct). Results are averaged over four runs per configuration.
  • Figure 3: Overall accuracy of reasoning classification across models.
  • Figure 4: Verifier accuracy across reasoning families and models.
  • Figure 5: Causal Solver Prompt
  • ...and 14 more figures
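Figure 2 reports accuracy averaged over four runs per configuration. The headline result is a mean gain in percentage points across backbone models. The aggregation can be sketched in Python, assuming exact-match accuracy over NLI labels. The helper names, model names and data below are hypothetical; they are neither the paper's evaluation code nor its results.

```python
from statistics import mean
from typing import Dict, Sequence


def accuracy(preds: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of items whose predicted label matches the gold label."""
    if not gold or len(preds) != len(gold):
        raise ValueError("predictions and gold labels must be non-empty and aligned")
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


def config_accuracy(runs: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Mean accuracy over repeated runs of one (model, strategy) configuration."""
    return mean(accuracy(run, gold) for run in runs)


def mean_gain_pp(baseline: Dict[str, float], method: Dict[str, float]) -> float:
    """Gain of `method` over `baseline` in percentage points, averaged over shared models."""
    models = sorted(baseline.keys() & method.keys())
    if not models:
        raise ValueError("no backbone model is shared by both strategies")
    return 100 * mean(method[m] - baseline[m] for m in models)


# Toy, hypothetical data (not results from the paper).
gold = ["entailment", "contradiction", "entailment", "neutral"]
direct_runs = [
    ["entailment", "entailment", "entailment", "entailment"],    # 2 of 4 correct
    ["contradiction", "contradiction", "neutral", "neutral"],    # 2 of 4 correct
]
direct = {"model_x": config_accuracy(direct_runs, gold), "model_y": 0.25}
carenli = {"model_x": 0.75, "model_y": 0.75}
print(mean_gain_pp(direct, carenli))  # 37.5
```

The mean is linear. So when both strategies cover the same models, averaging the per-model differences gives the same figure as subtracting the two strategies' mean accuracies.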