Rethinking Code Complexity Through the Lens of Large Language Models

Chen Xie, Yuling Shi, Xiaodong Gu, Beijun Shen

TL;DR

Traditional code complexity metrics do not reliably predict the difficulty large language models face when processing code. The authors propose LM-CC, a model-aware complexity metric built from entropy-driven semantic units arranged in a hierarchical representation to capture semantic nonlinearity arising from compositional depth and branching. Empirical results show LM-CC correlates more strongly with LLM performance than existing metrics, and semantics-preserving reductions that lower LM-CC yield meaningful performance gains across program repair, code translation, and execution reasoning. This work enables model-aware evaluation, targeted refactoring, and adaptive prompting strategies for LLM-based code intelligence.

Abstract

Code complexity metrics such as cyclomatic complexity have long been used to assess software quality and maintainability. With the rapid advancement of large language models (LLMs) in code understanding and generation tasks, an important yet underexplored question arises: do these traditional complexity metrics meaningfully characterize the difficulty LLMs experience when processing code? In this work, we empirically demonstrate that, after controlling for code length, classical metrics exhibit no consistent correlation with LLM performance, revealing a fundamental mismatch with model-perceived difficulty. To address this gap, we propose LM-CC, a novel code complexity metric designed from the perspective of LLMs. The core premise of LM-CC is that LLM-perceived difficulty is driven by the nonlinearity of program semantics. Accordingly, we decompose programs into semantic units based on entropy, organize these units into a compositional hierarchy, and quantify complexity as a principled aggregation of compositional level and branching-induced divergence, capturing cumulative model uncertainty during code processing. Our extensive experiments show not only that LM-CC correlates more strongly with LLM performance than traditional metrics, but also that lowering it directly improves task performance.
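
As a rough illustration of the entropy-based decomposition described in the abstract, the sketch below computes the Shannon entropy of a next-token distribution and opens a new semantic unit at each token whose entropy exceeds a threshold. The function names, the fixed threshold, and the example entropy values are all assumptions made for illustration; the paper's actual segmentation rule is not given here and may differ.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def split_into_units(tokens, entropies, threshold):
    """Group tokens into units, opening a new unit at each high-entropy token.

    Assumption: a token whose entropy exceeds `threshold` marks the start of
    a new semantic unit. This is an illustrative rule, not the paper's.
    """
    units = []
    current = []
    for tok, h in zip(tokens, entropies):
        if h > threshold and current:
            units.append(current)
            current = []
        current.append(tok)
    if current:
        units.append(current)
    return units

# Hypothetical per-token entropies; in practice they would come from an LLM.
tokens = ["if", "x", ">", "0", ":", "return", "x"]
entropies = [2.1, 0.3, 0.4, 0.2, 0.1, 1.8, 0.5]
units = split_into_units(tokens, entropies, threshold=1.5)
print(units)
```

With these made-up values, the two high-entropy tokens (`if` and `return`) each begin a unit, so the snippet splits into a condition unit and a return unit.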

Paper Structure

This paper contains 27 sections, 5 theorems, 5 equations, 3 figures, 3 tables, 1 algorithm.

Key Result

Theorem 3.3

Let $\mathcal{T} = (V, E)$ be a semantic hierarchy. Under the paper's context and branching assumptions, the total predictive entropy satisfies a relation whose displayed expression is missing from this extract. In that relation, $H_0(v)$ is the baseline entropy under ideal context access, and $\Phi(\mathcal{T}) = \delta \sum_{v \in V} \max(0, d(v) - d^*) + \gamma \sum_{v \in V} \max(0, b(v) - 1)$ is the structural penalty.
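
The structural penalty $\Phi(\mathcal{T})$ in the theorem statement can be evaluated with a single tree traversal. The sketch below is a minimal illustration under stated assumptions: $d(v)$ is read as the depth of node $v$ (root at depth 0), $b(v)$ as its number of children, $d^*$ as a depth threshold, and $\delta$, $\gamma$ as non-negative weights. None of these readings is confirmed by this summary, and the tree and parameter values are hypothetical.

```python
def structural_penalty(children, root, delta, gamma, d_star):
    """Compute delta * sum(max(0, d(v) - d*)) + gamma * sum(max(0, b(v) - 1)).

    `children` maps each node to a list of its child nodes; leaves may be
    omitted from the mapping. Depth and branching factor are assumed readings
    of d(v) and b(v).
    """
    depth_term = 0
    branch_term = 0
    stack = [(root, 0)]  # iterative depth-first traversal
    while stack:
        node, depth = stack.pop()
        kids = children.get(node, [])
        depth_term += max(0, depth - d_star)   # excess depth beyond d*
        branch_term += max(0, len(kids) - 1)   # extra branches beyond one
        for child in kids:
            stack.append((child, depth + 1))
    return delta * depth_term + gamma * branch_term

# Hypothetical hierarchy: a function body with a three-way branch.
tree = {"f": ["if", "ret"], "if": ["a", "b", "c"]}
phi = structural_penalty(tree, "f", delta=0.5, gamma=2.0, d_star=1)
print(phi)  # 7.5
```

Here the three nodes at depth 2 each exceed $d^* = 1$ by one, and the two internal nodes contribute one and two extra branches, giving $0.5 \cdot 3 + 2.0 \cdot 3 = 7.5$.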

Figures (3)

  • Figure 1: Comparison between Cyclomatic Complexity (CC) and our proposed LM-CC. While CC assigns identical values to code snippets with significantly different cognitive loads for LLMs (top), LM-CC effectively distinguishes them by capturing the model uncertainty on non-linear code semantics (bottom).
  • Figure 2: Hierarchical Semantic Decomposition Example. Left: source code with token-entropy annotations, where color-coded regions indicate elevated LLM uncertainty. Right: the induced hierarchical semantic representation, with each element color-matched to its semantic unit in the source code.
  • Figure 3: Ablation on the weighting factor $\alpha$ in LM-CC. Performance peaks at intermediate $\alpha$ values, while hierarchy-only ($\alpha\!\to\!0$) and branching-only ($\alpha\!\to\!1$) configurations perform substantially worse.
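
To make the role of $\alpha$ in the Figure 3 ablation concrete, the sketch below combines a hierarchy term and a branching term with a single weight. The convex-combination form is an assumption chosen only because it reproduces the two limits named in the caption (hierarchy-only as $\alpha \to 0$, branching-only as $\alpha \to 1$); the exact LM-CC formula is not given in this summary, and the function name and input values are hypothetical.

```python
def weighted_score(level_term, branching_term, alpha):
    """Blend two complexity components with weight alpha in [0, 1].

    Assumed form: (1 - alpha) * level_term + alpha * branching_term,
    so alpha = 0 keeps only the hierarchy term and alpha = 1 only branching.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * level_term + alpha * branching_term

# Hypothetical component values for one code snippet.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, weighted_score(4.0, 10.0, alpha))
```

Sweeping $\alpha$ over a grid like this and measuring correlation with task performance at each setting is one way an ablation of this kind could be run.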

Theorems & Definitions (11)

  • Theorem 3.3: Hierarchical Entropy Accumulation
  • proof
  • Corollary 3.4: LM-CC as Structural Complexity Proxy
  • proof
  • Proposition 2.1: Separation from Cyclomatic Complexity
  • proof
  • Proposition 2.2: Orthogonality to Code Length
  • proof
  • Proposition 2.3: Entropy Thresholding as Boundary Detection
  • proof
  • ...and 1 more