Decision Potential Surface: A Theoretical and Practical Approximation of LLM's Decision Boundary
Zi Liang, Zhiyao Wu, Haoyang Shang, Yulin Jin, Qingqing Ye, Huadi Zheng, Peizhao Hu, Haibo Hu
TL;DR
The paper tackles the infeasibility of directly constructing LLM decision boundaries by introducing the Decision Potential Surface (DPS), a landscape defined by the squared log-likelihood gap between the top two output sequences, whose 0-isohypse corresponds to the decision boundary. To make the approach scalable, it proposes $K$-DPS, which estimates the DPS using only $K$ samples per input and provides finite-sample, expected, and concentration error bounds that scale as $\mathcal{O}(1/\sqrt{K})$ with a tail term that decays exponentially. The authors prove that DPS captures the competition among candidate sequences, show that it enables visualization and yields actionable guarantees on approximation quality, and validate the method empirically across multiple open-source LLMs and corpora. This work enables practical, interpretable, and theoretically grounded analysis of LLM decision boundaries, with implications for understanding output variability and robustness and for developing boundary-aware diagnostics.
Abstract
The decision boundary, the subspace of inputs where a machine learning model assigns equal classification probabilities to two classes, is pivotal in revealing core model properties and interpreting model behaviors. While analyzing the decision boundary of large language models (LLMs) has attracted increasing attention recently, constructing it for mainstream LLMs remains computationally infeasible due to the enormous vocabulary-sequence sizes and the auto-regressive nature of LLMs. To address this issue, we propose the Decision Potential Surface (DPS), a new notion for analyzing LLM decision boundaries. DPS is defined on the confidences in distinguishing different sampled sequences for each input, which naturally captures the potential of the decision boundary. We prove that the zero-height isohypse of the DPS is equivalent to the decision boundary of an LLM, with enclosed regions representing decision regions. By leveraging DPS, we propose, for the first time in the literature, an approximate decision boundary construction algorithm, namely $K$-DPS, which requires only a finite number $K$ of sequence samples to approximate an LLM's decision boundary with negligible error. We theoretically derive upper bounds on the absolute error, the expected error, and the error concentration between $K$-DPS and the ideal DPS, demonstrating that these errors can be traded off against the number of samples. Our results are empirically validated by extensive experiments across various LLMs and corpora.
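To make the idea concrete, the following is a minimal sketch of a $K$-sample estimate, assuming (as the TL;DR describes) that the DPS value at an input is the squared log-likelihood gap between the top two output sequences. The paper's formal definition is not given in this section, so the function names (`k_dps`, `toy_log_likelihoods`, `estimate_k_dps`) and the three-sequence toy "model" are hypothetical illustrations rather than the authors' implementation.

```python
import math
import random
from typing import Dict


def k_dps(seq_logliks: Dict[str, float]) -> float:
    """Squared gap between the two largest log-likelihoods among distinct sequences."""
    if len(seq_logliks) < 2:
        raise ValueError("need at least two distinct sampled sequences")
    top1, top2 = sorted(seq_logliks.values(), reverse=True)[:2]
    return (top1 - top2) ** 2


def toy_log_likelihoods(x: float) -> Dict[str, float]:
    """Hypothetical stand-in for an LLM: log-probabilities of three sequences at input x."""
    logits = {"yes": 4.0 * x, "no": 4.0 * (1.0 - x), "maybe": 0.0}
    log_z = math.log(sum(math.exp(v) for v in logits.values()))
    return {seq: v - log_z for seq, v in logits.items()}


def estimate_k_dps(x: float, k: int, rng: random.Random) -> float:
    """Draw k sequences from the toy model and score only the distinct ones observed."""
    logliks = toy_log_likelihoods(x)
    seqs = list(logliks)
    weights = [math.exp(logliks[s]) for s in seqs]
    sampled = set(rng.choices(seqs, weights=weights, k=k))
    return k_dps({s: logliks[s] for s in sampled})


rng = random.Random(0)
for x in (0.1, 0.5, 0.9):
    # Values near zero indicate that x lies close to the (toy) decision boundary.
    print(x, estimate_k_dps(x, k=200, rng=rng))
```

In this toy setup the two leading sequences tie at `x = 0.5`, so the estimate there is zero, which mirrors the claim that the zero-height isohypse coincides with the decision boundary. With a small `k` the sampler can miss one of the two leading sequences, which is the kind of approximation error the paper's bounds are meant to control.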
