Rethinking Code Complexity Through the Lens of Large Language Models
Chen Xie, Yuling Shi, Xiaodong Gu, Beijun Shen
TL;DR
Traditional code complexity metrics do not reliably predict the difficulty large language models face when processing code. The authors propose LM-CC, a model-aware complexity metric built from entropy-driven semantic units arranged in a hierarchical representation to capture semantic nonlinearity arising from compositional depth and branching. Empirical results show LM-CC correlates more strongly with LLM performance than existing metrics, and semantics-preserving reductions that lower LM-CC yield meaningful performance gains across program repair, code translation, and execution reasoning. This work enables model-aware evaluation, targeted refactoring, and adaptive prompting strategies for LLM-based code intelligence.
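The two ingredients named above, entropy-driven semantic units and a hierarchy whose depth and branching drive the score, can be pictured with a toy sketch. Everything in it is an assumption for illustration only: the per-token entropies are made-up inputs (in practice they might come from an LLM's next-token distribution), and the threshold boundary rule, the hypothetical `Unit` tree, and the hypothetical `toy_complexity` score (nesting level plus log2 of the branching factor) are not the authors' LM-CC definition, which this section does not specify.

```python
import math
from dataclasses import dataclass, field
from typing import List


def segment_by_entropy(entropies: List[float], threshold: float) -> List[List[int]]:
    """Group token indices into units. A token whose entropy exceeds
    `threshold` opens a new unit (hypothetical boundary rule)."""
    units: List[List[int]] = []
    for i, h in enumerate(entropies):
        if not units or h > threshold:
            units.append([i])  # high-uncertainty token starts a new unit
        else:
            units[-1].append(i)  # low-uncertainty token extends the current unit
    return units


@dataclass
class Unit:
    """A node in a hypothetical compositional hierarchy of semantic units."""
    name: str
    children: List["Unit"] = field(default_factory=list)


def toy_complexity(node: Unit, level: int = 0) -> float:
    """Illustrative score, NOT the LM-CC formula: each unit adds its nesting
    level, plus log2 of its branching factor when it has more than one child."""
    b = len(node.children)
    own = level + (math.log2(b) if b > 1 else 0.0)
    return own + sum(toy_complexity(c, level + 1) for c in node.children)


# Usage with made-up entropies and two hand-built trees.
print(segment_by_entropy([0.1, 2.0, 0.3, 0.2, 1.5, 0.1], threshold=1.0))
# prints [[0], [1, 2, 3], [4, 5]]
flat = Unit("f", [Unit("a"), Unit("b")])
nested = Unit("f", [Unit("loop", [Unit("a"), Unit("b")])])
print(toy_complexity(flat), toy_complexity(nested))  # prints 3.0 6.0
```

Both trees have two leaf units, yet the nested one scores higher, which crudely mirrors the intuition that compositional depth, not just size, adds difficulty.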
Abstract
Code complexity metrics such as cyclomatic complexity have long been used to assess software quality and maintainability. With the rapid advancement of large language models (LLMs) on code understanding and generation tasks, an important yet underexplored question arises: do these traditional complexity metrics meaningfully characterize the difficulty LLMs experience when processing code? In this work, we empirically demonstrate that, after controlling for code length, classical metrics exhibit no consistent correlation with LLM performance, revealing a fundamental mismatch with model-perceived difficulty. To address this gap, we propose LM-CC, a novel code complexity metric designed from the perspective of LLMs. The core premise of LM-CC is that LLM-perceived difficulty is driven by the nonlinearity of program semantics. Accordingly, we decompose programs into semantic units based on entropy, organize these units into a compositional hierarchy, and quantify complexity as a principled aggregation of compositional level and branching-induced divergence, capturing cumulative model uncertainty during code processing. Our extensive experiments show not only that LM-CC correlates more strongly with LLM performance than traditional metrics, but also that lowering it directly improves task performance.
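The length-controlled analysis can be illustrated with a first-order partial correlation, one standard way to remove a covariate's influence; this section does not state which statistic the authors used (a rank-based partial Spearman is a common alternative). The data below are synthetic and purely hypothetical: `z` stands in for code length, `x` for a classical complexity metric, and `y` for an LLM performance score.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes equal lengths and non-constant inputs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """Correlation of x and y after linearly controlling for z:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Synthetic data: metric and performance each equal length plus residuals
# that are uncorrelated with length and with each other.
z = [1.0, 2.0, 3.0, 4.0]  # "code length"
x = [2.0, 1.0, 2.0, 5.0]  # "complexity metric" = z + [1, -1, -1, 1]
y = [2.0, -1.0, 6.0, 3.0]  # "performance" = z + [1, -3, 3, -1]

print(round(pearson(x, y), 3))  # prints 0.333
print(partial_corr(x, y, z))  # close to 0, up to floating-point error
```

Here the raw correlation of 1/3 arises only because both series track length; once length is partialled out, the association disappears. For a Spearman-style analysis, replace each series with its ranks before calling `partial_corr`.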
