Additive Large Language Models for Semi-Structured Text
Karthikeyan K, Raghuveer Thirukovalluru, David Carlson
TL;DR
CALM is an inherently interpretable approach to text classification on semi-structured clinical data. It enforces an additive decomposition of predictions across components, which yields faithful component-level explanations while remaining competitive with black-box LLM fine-tuning. The framework encodes each component with a shared LLM and adds up the per-component logits; two optional extensions, CALM$^2$ (pairwise interactions) and CALM-Distill, improve accuracy without sacrificing transparency. Across the MIMIC Admission Notes, ClinStructor, and LCD benchmarks, CALM, CALM$^2$, and CALM-Distill achieve strong predictive performance and provide both global and local interpretability, including feature importance scores and visualized risk trajectories for individual features. The work argues that additive, text-native models can support trustworthy deployment in high-stakes clinical settings by providing actionable explanations, and it also discusses robustness and practical extensions for greater expressivity. Overall, CALM offers a scalable, interpretable alternative to opaque LLM classifiers, with potential for real-world clinical auditing and decision support.
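One way to read the decomposition described above is as the pair of formulas below. This is a sketch in assumed notation, not taken from the paper: $x_1, \dots, x_K$ are the components of one input, $f_\theta$ stands for the shared LLM encoder with its output head, $g_\phi$ is a hypothetical pairwise interaction term, and $b$ is a bias.

```latex
\begin{align}
z_{\mathrm{CALM}}(x)   &= b + \sum_{k=1}^{K} f_\theta(x_k), \\
z_{\mathrm{CALM}^2}(x) &= b + \sum_{k=1}^{K} f_\theta(x_k)
                          + \sum_{1 \le j < k \le K} g_\phi(x_j, x_k).
\end{align}
```

Under this reading, each $f_\theta(x_k)$ is the contribution attributed to component $k$, so the explanation comes from the forward computation and does not need a post-hoc attribution method.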
Abstract
Large Language Models have advanced clinical text classification, but their opaque predictions remain a critical barrier to practical adoption in research and clinical settings where investigators and physicians need to understand which parts of a patient's record drive risk signals. To address this challenge, we introduce \textbf{CALM}, short for \textbf{Classification with Additive Large Language Models}, an interpretable framework for semi-structured text where inputs are composed of semantically meaningful components, such as sections of an admission note or question-answer fields from an intake form. CALM predicts outcomes as the additive sum of each component's contribution, making these contributions part of the forward computation itself and enabling faithful explanations at both the patient and population level. The additive structure also enables clear visualizations, such as component-level risk curves similar to those used in generalized additive models, making the learned relationships easier to inspect and communicate. Although CALM expects semi-structured inputs, many clinical documents already have this form, and similar structure can often be automatically extracted from free-text notes. CALM achieves performance comparable to conventional LLM classifiers while improving trust, supporting quality-assurance checks, and revealing clinically meaningful patterns during model development and auditing.
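The additive prediction described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_component_logit` is a hypothetical keyword scorer standing in for the shared LLM and its output head, the component names and weights are invented, and a binary outcome with a sigmoid link is assumed.

```python
import math
from typing import Callable, Dict, Tuple

# Hypothetical keyword weights; a real model would learn a logit per component.
TOY_WEIGHTS = {"sepsis": 1.5, "stable": -1.0, "ventilator": 2.0}


def toy_component_logit(name: str, text: str) -> float:
    """Score one component in isolation (toy stand-in, not a real LLM)."""
    return sum(TOY_WEIGHTS.get(word, 0.0) for word in text.lower().split())


def additive_predict(
    components: Dict[str, str],
    scorer: Callable[[str, str], float] = toy_component_logit,
    bias: float = 0.0,
) -> Tuple[float, Dict[str, float]]:
    """Return (probability, per-component contributions).

    Each component is scored independently, and the final logit is the bias
    plus the sum of those scores, so the contributions are the explanation.
    """
    contributions = {name: scorer(name, text) for name, text in components.items()}
    logit = bias + sum(contributions.values())
    prob = 1.0 / (1.0 + math.exp(-logit))  # assumes a binary outcome
    return prob, contributions


# Invented example record with three semi-structured components.
note = {
    "chief_complaint": "suspected sepsis",
    "history": "previously stable",
    "medications": "none recorded",
}
prob, contributions = additive_predict(note, bias=-0.5)
```

Because no component sees any other component's text, changing one field can only change that field's contribution, which is what makes per-patient explanations and population-level risk curves straightforward to read off.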
