Compartmentalised Agentic Reasoning for Clinical NLI
Maël Jullien, Lei Xu, Marco Valentino, André Freitas
TL;DR
The paper tackles schema collapse in clinical natural language inference (models answering by surface matching rather than applying the inferential schema a case requires) by introducing CARENLI, a compartmentalised agentic framework that routes each premise-statement pair to a reasoning family, runs a family-specific solver that produces an explicit trace, and uses a verifier and a refiner to make corrections auditable. The authors extend the CTNLI benchmark to 200 items across four reasoning families and show that CARENLI raises mean accuracy by about 34 percentage points (from roughly 23% to 57%) across four backbone models, with the largest gains on Risk State Abstraction and Epistemic Verification items. A component-by-component analysis shows that the solvers' structured inference schemas account for most of the gain, while routing accuracy remains a critical bottleneck. Overall, CARENLI offers a practical pathway to more reliable, auditable clinical inference by enforcing schema-aligned reasoning trajectories, although compositional reasoning remains difficult and broader validation is still needed.
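The route, solve, verify, refine flow described above can be sketched as a small control loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `carenli_pipeline` and `Trace`, the snake_case family identifiers, the label strings, and the bounded `max_rounds` loop are all hypothetical, and each callable stands in for what would in practice be a prompted LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical identifiers for the four reasoning families named in the paper.
FAMILIES = (
    "causal_attribution",
    "compositional_grounding",
    "epistemic_verification",
    "risk_state_abstraction",
)


@dataclass
class Trace:
    """Explicit reasoning trace from a family-specific solver (hypothetical schema)."""
    family: str
    steps: List[str] = field(default_factory=list)
    label: Optional[str] = None


Router = Callable[[str, str], str]
Solver = Callable[[str, str], Trace]
Verifier = Callable[[str, str, Trace], List[str]]  # empty list means "accepted"
Refiner = Callable[[str, str, Trace, List[str]], Trace]


def carenli_pipeline(premise: str, statement: str, route: Router,
                     solvers: Dict[str, Solver], verify: Verifier,
                     refine: Refiner, max_rounds: int = 2) -> Trace:
    # 1. Route the premise-statement pair to a reasoning family.
    family = route(premise, statement)
    if family not in solvers:
        raise ValueError(f"no solver registered for family {family!r}")
    # 2. Run the family-specific solver, which returns an explicit trace.
    trace = solvers[family](premise, statement)
    # 3. Each round verifies the current trace and, if the verifier reports
    #    issues, asks the refiner for a targeted correction (bounded by max_rounds).
    for _ in range(max_rounds):
        issues = verify(premise, statement, trace)
        if not issues:
            break
        trace = refine(premise, statement, trace, issues)
    return trace


# Toy stand-ins for LLM calls (all hypothetical, for illustration only).
def toy_route(premise: str, statement: str) -> str:
    return "risk_state_abstraction" if "risk" in statement.lower() else "causal_attribution"


def make_toy_solver(family: str) -> Solver:
    def solve(premise: str, statement: str) -> Trace:
        return Trace(family, ["extract relevant facts"], "contradiction")
    return solve


def toy_verify(premise: str, statement: str, trace: Trace) -> List[str]:
    # Flags any trace that never names the schema it applied.
    return [] if any(s.startswith("schema:") for s in trace.steps) else ["missing schema step"]


def toy_refine(premise: str, statement: str, trace: Trace, issues: List[str]) -> Trace:
    # Targeted correction: append the missing schema step, keep everything else.
    return Trace(trace.family, trace.steps + [f"schema: {trace.family}"], trace.label)


solvers = {f: make_toy_solver(f) for f in FAMILIES}
result = carenli_pipeline("Adults with atrial fibrillation on anticoagulants.",
                          "Bleeding risk is elevated.", toy_route, solvers,
                          toy_verify, toy_refine)
print(result.family, result.label, result.steps)
```

Returning the verifier's findings as an explicit list of issues is one way to keep each correction auditable, and bounding `max_rounds` stops refinement from looping indefinitely. Because every later stage depends on the output of `route`, a wrong family choice propagates through the whole trace, which is consistent with the paper's observation that routing accuracy is a bottleneck.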
Abstract
Large language models can produce fluent judgments for clinical natural language inference, yet they frequently fail when the decision requires the correct inferential schema rather than surface matching. We introduce CARENLI, a compartmentalised agentic framework that routes each premise-statement pair to a reasoning family and then applies a specialised solver with explicit verification and targeted refinement. We evaluate on an expanded CTNLI benchmark of 200 instances spanning four reasoning families: Causal Attribution, Compositional Grounding, Epistemic Verification, and Risk State Abstraction. Across four contemporary backbone models, CARENLI improves mean accuracy from about 23% with direct prompting to about 57%, a gain of roughly 34 percentage points, with the largest benefits on structurally demanding reasoning types. These results support compartmentalisation plus verification as a practical route to more reliable and auditable clinical inference.
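The headline figures are accuracies, and the reported gain is an absolute difference in percentage points rather than a relative improvement. The sketch below shows that arithmetic with per-family and per-backbone aggregation. It is illustrative only: the `(family, gold, predicted)` record format, the label strings, and the unweighted mean over backbones are assumptions, since the exact aggregation is not specified here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# (family, gold_label, predicted_label) for one benchmark item; hypothetical format.
Record = Tuple[str, str, str]


def accuracy(records: Iterable[Record]) -> float:
    """Percentage of items whose predicted label matches the gold label."""
    records = list(records)
    if not records:
        raise ValueError("no records")
    return 100.0 * sum(gold == pred for _, gold, pred in records) / len(records)


def per_family_accuracy(records: Iterable[Record]) -> Dict[str, float]:
    """Accuracy computed separately for each reasoning family."""
    by_family = defaultdict(list)
    for rec in records:
        by_family[rec[0]].append(rec)
    return {family: accuracy(recs) for family, recs in by_family.items()}


def mean_over_backbones(acc_by_backbone: Dict[str, float]) -> float:
    """Unweighted mean across backbone models (an assumed aggregation)."""
    return mean(acc_by_backbone.values())


def gain_points(baseline: float, system: float) -> float:
    """Absolute gain in percentage points, not a relative improvement."""
    return system - baseline


# Toy records with hypothetical labels, not benchmark data.
toy = [
    ("causal_attribution", "entailment", "entailment"),
    ("causal_attribution", "contradiction", "entailment"),
    ("epistemic_verification", "entailment", "entailment"),
]
print(per_family_accuracy(toy))  # {'causal_attribution': 50.0, 'epistemic_verification': 100.0}

# Rounded headline figures from the abstract: ~23% (direct prompting) vs ~57% (CARENLI).
print(gain_points(23.0, 57.0))                          # 34.0 percentage points
print(round(100 * gain_points(23.0, 57.0) / 23.0, 1))   # 147.8 (% relative improvement)
```

Reporting the difference in percentage points keeps the comparison independent of the baseline level; the same 34-point gain corresponds to a relative improvement of roughly 148% over a 23% baseline.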
