Mellum: Production-Grade in-IDE Contextual Code Completion with Multi-File Project Understanding

Nikita Pavlichenko; Iurii Nazarov; Ivan Dolgov; Ekaterina Garanina; Dmitry Ustalov; Ivan Bondyrev; Kseniia Lysaniuk; Evgeniia Vu; Kirill Chekmenev; Joseph Shtok; Yaroslav Golubev; Anton Semenkin; Uladzislau Sazanovich

TL;DR

Mellum addresses the gap between research code LLMs and production-grade in-editor code completion with a 4B-parameter, open-weight model focused on in-editor use, built on a Llama-style architecture with a context window of up to 8,192 tokens. An end-to-end pipeline combines permissively licensed data, fill-in-the-middle pretraining, project-context supervised fine-tuning, and Direct Preference Optimization to align outputs with real-world IDE usage. Offline evaluations on JetComplete and public benchmarks, together with production telemetry from JetBrains IDE deployments, show that disciplined data governance and multi-stage training yield substantial quality gains and improved stopping behavior compared with larger, generic models, while maintaining low latency. Online metrics (RoCC and AR) demonstrate real-world productivity gains in cloud deployments. The Apache-2.0 release enables broad adoption, including by organizations with privacy constraints, and illustrates a practical path from research to production-scale, open code completion.

Abstract

We present the Mellum model family: open-weight code completion models designed for interactive use in JetBrains IDEs. Mellums have 4B parameters, adopt a Llama-style architecture, and are pre-trained on ~4T tokens of permissively licensed, multi-language code. Our studies show that (i) careful data curation and staged training significantly improve the model's quality, (ii) editor-critical capabilities such as context packing are necessary for high-quality suggestions, and (iii) a compact, task-focused model can meet the cost and latency constraints of interactive completion. In this paper, we describe an end-to-end industrial pipeline for producing contextualized in-editor completion: disciplined data governance, multi-stage training that includes fill-in-the-middle and project context via supervised fine-tuning, and alignment via direct preference optimization using feedback from real-world scenarios. Our quality evaluations include both large-scale offline benchmarks and online telemetry from production deployments in JetBrains IDEs. Mellums are released under the Apache-2.0 license on HuggingFace, with a public model card providing a reproducible reference for practitioners. Our experience offers a pragmatic blueprint for taking a focused, open model from a research prototype to production at scale for hundreds of thousands of users.
