Quantitative Analysis of Technical Debt and Pattern Violation in Large Language Model Architectures

Tyler Slater

TL;DR

The paper addresses the risk that using AI-generated code to scaffold production systems incurs architectural debt and compromises long-term maintainability. It proposes an empirical framework that combines Hexagonal Architecture constraints, AST-based static analysis, and three metrics, Logical Lines of Code (LLOC), Maintainability Index (MI), and Architectural Violation Rate (AVR), to quantify architectural erosion across three model families. The findings show that the open-weights model (Llama 3 8B) incurs a high rate of architectural violations and implements less logic, while the proprietary GPT-5.1 achieves near-perfect architectural conformance, which underscores the need for architecture-guided safeguards. The study highlights a "Maintainability Paradox," in which brevity can be mistaken for quality, and advocates architecture-as-guardrails and automated linting to mitigate generative debt in AI-assisted software engineering. Future work aims to quantify remediation costs with a Debt Remediation Index and to explore automated refactoring workflows.

Abstract

As Large Language Models (LLMs) transition from code completion tools to autonomous system architects, their impact on long-term software maintainability remains unquantified. While existing research benchmarks functional correctness (pass@k), this study presents the first empirical framework to measure "Architectural Erosion" and the accumulation of Technical Debt in AI-synthesized microservices. We conducted a comparative pilot study of three state-of-the-art models (GPT-5.1, Claude 4.5 Sonnet, and Llama 3 8B) by prompting each to implement a standardized Book Lending Microservice under strict Hexagonal Architecture constraints. Using Abstract Syntax Tree (AST) parsing, we find that while proprietary models achieve high architectural conformance (0% violation rate for GPT-5.1), open-weights models exhibit critical divergence. Specifically, Llama 3 showed an 80% Architectural Violation Rate, frequently bypassing interface adapters to create illegal circular dependencies between the Domain and Infrastructure layers. Furthermore, we identified a phenomenon of "Implementation Laziness," in which open-weights models generated 60% fewer Logical Lines of Code (LLOC) than their proprietary counterparts, effectively omitting complex business logic to satisfy token constraints. These findings suggest that, without automated architectural linting, using smaller open-weights models for system scaffolding accelerates the accumulation of structural technical debt.
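To make the AST-based conformance check concrete, the following is a minimal sketch of how a forbidden Domain-to-Infrastructure import could be detected and aggregated into a violation rate. It is not the study's tooling: the layer names in `FORBIDDEN_FOR_DOMAIN`, the helper names, and the definition of the rate as the share of domain files containing at least one forbidden import are all assumptions made for illustration.

```python
import ast

# Hypothetical outer-layer package names; the study's actual layout is not given here.
FORBIDDEN_FOR_DOMAIN = {"infrastructure", "adapters"}


def imported_modules(source: str) -> list[str]:
    """Return the dotted names of all modules imported by `source`."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.append(node.module)
    return modules


def domain_violations(source: str) -> list[str]:
    """List imports in a domain-layer file that reach into an outer layer."""
    return [
        module
        for module in imported_modules(source)
        if any(part in FORBIDDEN_FOR_DOMAIN for part in module.split("."))
    ]


def violation_rate(domain_files: dict[str, str]) -> float:
    """Share of domain files with at least one forbidden import (assumed definition)."""
    if not domain_files:
        return 0.0
    bad = sum(1 for src in domain_files.values() if domain_violations(src))
    return bad / len(domain_files)


# Two hypothetical domain files: one clean, one importing a concrete repository.
samples = {
    "domain/book.py": "from dataclasses import dataclass\n",
    "domain/loan_service.py": "from infrastructure.db import SqlBookRepository\n",
}
print(violation_rate(samples))  # 0.5
```

A check of this kind only sees static import statements; dynamic imports or dependencies passed in at runtime would need a different analysis.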

