Kodezi Chronos: A Debugging-First Language Model for Repository-Scale Code Understanding

Ishraq Khan; Assad Chowdary; Sharoz Haseeb; Urvish Patel; Yousuf Zaii

TL;DR

Chronos-1 is a debugging-focused language model for repository-scale code understanding that combines persistent memory, Adaptive Graph-Guided Retrieval (AGR), and a seven-layer autonomous fix-test-refine loop. It reports state-of-the-art results on debugging benchmarks, including 80.33% on SWE-bench Lite and 65.3% fix success across 5,000 real-world bugs. The authors attribute these results to Persistent Debug Memory (PDM), robust multi-hop context retrieval, and execution-feedback loops. The work includes extensive ablations, theoretical guarantees, and adversarial analyses. It argues that a specialized model combining memory, reasoning, and automated testing can outperform general-purpose frontier models on debugging tasks. The authors position Chronos-1 for autonomous maintenance within CI/CD pipelines and IDEs, with deployment planned in Kodezi OS and via API over 2025–2026. The paper also discusses limitations and future work, including hardware-dependent and cross-language bugs, safety considerations, and broader adoption in production environments.
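The execution-feedback idea behind a fix-test-refine loop can be illustrated with a minimal Python sketch. It is not Chronos-1's seven-layer architecture, which this summary does not specify. The names `fix_test_refine`, `propose_fix`, `run_tests`, and `LoopResult` are hypothetical. The patch proposer is a toy that walks a fixed candidate list, whereas a model would condition each new attempt on the reported test failures.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class LoopResult:
    success: bool
    patch: Optional[str]
    iterations: int
    history: List[Tuple[str, List[str]]] = field(default_factory=list)


def fix_test_refine(
    propose_fix: Callable[[List[str]], str],
    run_tests: Callable[[str], List[str]],
    max_iterations: int = 5,
) -> LoopResult:
    """Propose a patch, run tests, feed failures back until tests pass or the budget runs out."""
    failures: List[str] = []
    history: List[Tuple[str, List[str]]] = []
    for iteration in range(1, max_iterations + 1):
        patch = propose_fix(failures)  # next attempt sees the previous failures
        failures = run_tests(patch)    # an empty list means every test passed
        history.append((patch, failures))
        if not failures:
            return LoopResult(True, patch, iteration, history)
    return LoopResult(False, None, max_iterations, history)


# Toy usage (hypothetical): each "patch" names a candidate body for f(x);
# the tests expect f(x) == x * x.
CANDIDATES = {"x + x": lambda x: x + x, "x * 2": lambda x: x * 2, "x * x": lambda x: x * x}
_remaining = iter(CANDIDATES)


def propose_fix(failures: List[str]) -> str:
    # A real model would read `failures`; this stub just tries the next candidate.
    return next(_remaining)


def run_tests(patch: str) -> List[str]:
    f = CANDIDATES[patch]
    cases = {2: 4, 3: 9}
    return [f"f({x})" for x, expected in cases.items() if f(x) != expected]


result = fix_test_refine(propose_fix, run_tests)
print(result.success, result.patch, result.iterations)  # True x * x 3
```

The iteration cap keeps the loop bounded even when no candidate passes. In that case the result reports failure along with the full per-attempt history.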

Abstract

Large Language Models (LLMs) have advanced code generation and software automation but remain constrained by inference-time context and lack structured reasoning over code, leaving debugging largely unsolved. While Claude 4.5 Opus achieves 74.40% on SWE-bench Verified and Gemini 3 Pro reaches 76.2%, both models remain below 20% on real multi-file debugging tasks. We introduce Kodezi Chronos-1, a language model purpose-built for debugging that integrates Adaptive Graph-Guided Retrieval (AGR) to navigate codebases of up to 10 million lines (92% precision, 85% recall), Persistent Debug Memory (PDM) trained on over 15 million sessions, and a seven-layer fix-test-refine architecture. On 5,000 real-world scenarios, Chronos-1 achieves 67.3% ± 2.1% fix accuracy, compared with 14.2% ± 1.3% for Claude 4.1 Opus and 13.8% ± 1.2% for GPT-4.1 (Cohen's d = 3.87). On SWE-bench Lite, Chronos-1 reaches a state-of-the-art 80.33% resolution rate (241 of 300), outperforming the next-best system by 20 percentage points, with repository-specific highs of 96.1% on Sympy and 90.4% on Django. Chronos-1 reduces debugging time by 40% and iterations by 65%, resolving complex multi-file and cross-repository bugs that require temporal analysis. Limitations remain for hardware-dependent and dynamic language errors. Chronos-1 will be available in Kodezi OS in Q4 2025 and via API in Q1 2026.
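The abstract reports AGR's retrieval quality as precision and recall. As a rough illustration only, the sketch below uses a simple stand-in for graph-guided retrieval and is not the paper's algorithm, which this summary does not describe. Starting from the bug location, it follows weighted edges of a toy dependency graph only while the product of edge weights stays at or above a threshold. As a result, high-confidence links are explored more hops deep than weak ones. It then scores the retrieved set against a hand-labeled relevant set. The graph, the edge weights, `adaptive_multi_hop`, and `precision_recall` are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical adjacency list: node -> [(neighbor, edge confidence in [0, 1])]
Graph = Dict[str, List[Tuple[str, float]]]


def adaptive_multi_hop(graph: Graph, seeds: List[str], threshold: float = 0.3, max_hops: int = 4) -> Set[str]:
    """Follow edges from the seeds while the cumulative path score stays >= threshold, up to max_hops."""
    best: Dict[str, float] = {s: 1.0 for s in seeds}
    frontier = deque((s, 1.0, 0) for s in seeds)
    while frontier:
        node, score, hops = frontier.popleft()
        if hops >= max_hops:
            continue  # hop budget exhausted on this path
        for neighbor, weight in graph.get(node, []):
            new_score = score * weight
            # Expand only confident paths; weak links end the walk early.
            if new_score >= threshold and new_score > best.get(neighbor, 0.0):
                best[neighbor] = new_score
                frontier.append((neighbor, new_score, hops + 1))
    return set(best)


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


graph: Graph = {
    "bug_site": [("caller", 0.9), ("logger", 0.2)],
    "caller": [("config", 0.8), ("utils", 0.3)],
    "config": [("schema", 0.7)],
}
retrieved = adaptive_multi_hop(graph, ["bug_site"])
relevant = {"bug_site", "caller", "config", "schema", "utils"}
print(sorted(retrieved), precision_recall(retrieved, relevant))
# ['bug_site', 'caller', 'config', 'schema'] (1.0, 0.8)
```

Raising `threshold` or lowering `max_hops` shrinks the retrieved set, which tends to trade recall for precision. In this toy graph, `threshold=0.6` drops `schema` and recall falls to 0.6.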
