LUNA: A Model-Based Universal Analysis Framework for Large Language Models
Da Song, Xuan Xie, Jiayang Song, Derui Zhu, Yuheng Huang, Felix Juefei-Xu, Lei Ma
TL;DR
LUNA introduces a universal, model-based framework to analyze large language models by extracting abstract stochastic models (DTMCs or HMMs) from decoder-level traces and binding semantics to assess three trustworthiness perspectives: out-of-distribution detection, adversarial robustness, and hallucination. The approach combines PCA-based dimensionality reduction, grid or clustering state partitioning, and probabilistic modeling, yielding two families of metrics (abstract-model-wise and semantics-wise) to evaluate model quality and task-specific trustworthiness. Large-scale experiments across Alpaca-7b, Llama2-7b, and CodeLlama-13b-Instruct on diverse datasets demonstrate LUNA’s ability to distinguish normal from abnormal behavior and reveal how modeling choices influence performance, with DTMC generally outperforming HMM and clustering-based partitions often excelling in coverage and causality capture. The work provides actionable guidance for metric selection and configuration to drive trustworthy LLM deployment in software engineering and NLP applications, and offers a foundation for online monitoring, testing, and potential output repair in practice.
Abstract
Over the past decade, Artificial Intelligence (AI) has achieved great success and is now used in a wide range of academic and industrial fields. More recently, large language models (LLMs) have advanced rapidly, propelling AI to a new level and bringing intelligence to even more diverse applications and industrial domains, particularly software engineering and natural language processing. Nevertheless, a number of emerging trustworthiness concerns exhibited by LLMs have already received much attention; unless they are properly addressed, the widespread adoption of LLMs could be greatly hindered in practice. The distinctive characteristics of LLMs, such as the self-attention mechanism, extremely large model scale, and autoregressive generation schema, differ from those of classic AI software based on CNNs and RNNs and present new challenges for quality analysis. To date, universal and systematic analysis techniques for LLMs are still lacking despite the urgent industrial demand. Towards bridging this gap, we initiate an early exploratory study and propose LUNA, a universal analysis framework for LLMs that is designed to be general and extensible and that enables versatile analysis of LLMs from multiple quality perspectives in a human-interpretable manner. In particular, we first leverage data from the desired trustworthiness perspectives to construct an abstract model as an auxiliary analysis asset, supported by various abstract model construction methods. To assess the quality of the abstract model, we collect and define a number of evaluation metrics at both the abstract model level and the semantics level. Then the semantics, i.e., the degree to which the LLM satisfies the trustworthiness perspective, is bound to the abstract model and enriches it, enabling more detailed analysis applications for diverse purposes.
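To make the abstract model construction concrete, the following is a minimal sketch of one configuration named in the TL;DR: PCA-based dimensionality reduction, grid state partitioning, and a DTMC estimated from the resulting state sequences. It is an illustration under stated assumptions, not the authors' implementation: the function names (`pca_reduce`, `grid_partition`, `fit_dtmc`) are hypothetical, the grid resolution and number of components are arbitrary, and random vectors stand in for hidden states extracted from an LLM.

```python
import numpy as np


def pca_reduce(hidden_states, n_components):
    """Project vectors onto their top principal components (computed via SVD)."""
    centered = hidden_states - hidden_states.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def grid_partition(points, cells_per_dim):
    """Map each reduced point to an abstract state id on a regular grid."""
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    idx = np.floor((points - lo) / span * cells_per_dim).astype(int)
    idx = np.clip(idx, 0, cells_per_dim - 1)  # the maximum falls in the last cell
    # Flatten the per-dimension cell indices into a single state id.
    return np.ravel_multi_index(tuple(idx.T), (cells_per_dim,) * points.shape[1])


def fit_dtmc(state_traces, n_states):
    """Estimate a DTMC transition matrix from observed state sequences."""
    counts = np.zeros((n_states, n_states))
    for trace in state_traces:
        for src, dst in zip(trace[:-1], trace[1:]):
            counts[src, dst] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States with no observed outgoing transition keep an all-zero row.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


# Synthetic stand-in for per-token hidden states of three generations
# (assumption: 16-dimensional vectors; real traces would come from the LLM).
rng = np.random.default_rng(0)
traces = [rng.normal(size=(length, 16)) for length in (5, 8, 6)]

stacked = np.vstack(traces)
reduced = pca_reduce(stacked, n_components=2)
states = grid_partition(reduced, cells_per_dim=3)

# Split the flat state sequence back into one sequence per generation.
boundaries = np.cumsum([len(t) for t in traces])[:-1]
state_traces = np.split(states, boundaries)

transition = fit_dtmc(state_traces, n_states=3 ** 2)
```

Each row of `transition` is the empirical distribution over next abstract states. In this sketch, PCA and the grid are fitted on all traces jointly so that every generation shares the same state space; a clustering-based partition would replace only `grid_partition`.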
