CAST: Achieving Stable LLM-based Text Analysis for Data Analytics
Jinxiang Xie, Zihao Li, Wei He, Rui Ding, Shi Han, Dongmei Zhang
TL;DR
This paper tackles the instability of LLM-generated text analyses in tabular data contexts by formalizing Text Analysis for Data Analytics (TADA) and introducing CAST, a framework that constrains the model's latent reasoning path via Algorithmic Prompting (AP) and Thinking-before-Speaking (TbS). CAST reduces the entropy of reasoning paths, yielding more reproducible corpus-level summaries and row-level tags while maintaining or improving accuracy. The authors present a probabilistic view of stability, propose concrete design principles, introduce two stability metrics (CAST-S for bulleted summarization and CAST-T for tagging), and implement CAST as a single call with explicit intermediate commitments and self-validation. Across multilingual summarization and diverse tagging tasks, CAST is more stable than the baselines on multiple LLM backbones, its stability metrics align well with human judgments, and its efficiency is competitive, indicating practical potential for dependable LLM-driven data analytics pipelines.
Abstract
Text analysis of tabular data relies on two core operations: \emph{summarization} for corpus-level theme extraction and \emph{tagging} for row-level labeling. A critical limitation of employing large language models (LLMs) for these tasks is their inability to meet the high standards of output stability demanded by data analytics. To address this challenge, we introduce \textbf{CAST} (\textbf{C}onsistency via \textbf{A}lgorithmic Prompting and \textbf{S}table \textbf{T}hinking), a framework that enhances output stability by constraining the model's latent reasoning path. CAST combines (i) Algorithmic Prompting to impose a procedural scaffold over valid reasoning transitions and (ii) Thinking-before-Speaking to enforce explicit intermediate commitments before final generation. To measure progress, we introduce \textbf{CAST-S} and \textbf{CAST-T}, stability metrics for bulleted summarization and tagging, and validate their alignment with human judgments. Experiments across publicly available benchmarks on multiple LLM backbones show that CAST consistently achieves the best stability among all baselines, improving Stability Score by up to 16.2\%, while maintaining or improving output quality.
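To make the notion of output stability for tagging concrete, the sketch below scores how often repeated runs over the same rows assign identical tags. It is an illustrative assumption, not the paper's CAST-T metric, whose definition is not given in this section; the function name `pairwise_tag_agreement` and the sample tags are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def pairwise_tag_agreement(runs: Sequence[Sequence[str]]) -> float:
    """Mean fraction of rows given identical tags, over all pairs of runs.

    Illustrative proxy for run-to-run stability; NOT the paper's CAST-T.
    Each element of `runs` holds one tag per row, in the same row order.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to measure stability")
    n_rows = len(runs[0])
    if n_rows == 0 or any(len(r) != n_rows for r in runs):
        raise ValueError("all runs must tag the same non-empty set of rows")

    scores = []
    for a, b in combinations(runs, 2):
        # Count rows where both runs chose exactly the same tag.
        matches = sum(x == y for x, y in zip(a, b))
        scores.append(matches / n_rows)
    return sum(scores) / len(scores)


# Hypothetical tags from three repeated runs over the same four rows.
runs = [
    ["billing", "login", "billing", "other"],
    ["billing", "login", "billing", "other"],
    ["billing", "login", "shipping", "other"],
]
score = pairwise_tag_agreement(runs)
print(f"pairwise agreement: {score:.3f}")
```

A score of 1.0 means every run produced the same tag for every row. Exact string matching is the simplest possible choice here; a metric for free-form tags or bulleted summaries would also need to handle paraphrases, which this sketch does not attempt.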
