Mi:dm K 2.5 Pro

KT Tech innovation Group

Abstract

The evolving LLM landscape demands capabilities beyond simple text generation: multi-step reasoning, long-context understanding, and agentic workflows. This shift challenges existing models in enterprise environments, especially in Korean-language and domain-specific scenarios where scaling alone is insufficient. We introduce Mi:dm K 2.5 Pro, a 32B-parameter flagship LLM designed to address enterprise-grade complexity through reasoning-focused optimization. Our methodology builds a robust data foundation via a quality-centric curation pipeline that uses abstract syntax tree (AST) analysis for code, gap-filling synthesis for mathematics, and an LLM-based quality evaluator. Pre-training scales the model via layer-predictor-based Depth Upscaling (DuS) and a progressive strategy that supports a 128K-token context window. Post-training introduces a specialized multi-stage pipeline, comprising Reasoning SFT, model merging, and asynchronous reinforcement learning (RL), to develop complex problem-solving skills. "Fusion Training" then rebalances these capabilities with conversational fluency, consistent response styling, and reliable tool use. Evaluations show that Mi:dm K 2.5 Pro achieves competitive performance against leading global and domestic models. In addition, it sets state-of-the-art results on Korean-specific benchmarks, demonstrating deep linguistic and cultural understanding. Finally, Responsible AI evaluations validate its safety against attacks, supporting a secure deployment profile that balances harmlessness and responsiveness.
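The abstract names AST analysis as one ingredient of the code-curation pipeline without giving details at this point. As a rough illustration only, the sketch below shows what an AST-based structural check on a Python file might look like. The helper names `ast_quality_signals` and `keep_file`, the chosen signals, and the `min_defs` threshold are all assumptions made for this example, not the pipeline described in the paper.

```python
import ast


def ast_quality_signals(source: str) -> dict:
    """Return simple structural signals for one Python file (hypothetical helper)."""
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):
        # A file that does not parse exposes no usable structure.
        return {"parses": False, "num_functions": 0,
                "num_classes": 0, "documented_ratio": 0.0}

    # Collect every function and class definition anywhere in the tree.
    defs = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    num_classes = sum(isinstance(node, ast.ClassDef) for node in defs)
    num_functions = len(defs) - num_classes
    documented = sum(ast.get_docstring(node) is not None for node in defs)

    return {
        "parses": True,
        "num_functions": num_functions,
        "num_classes": num_classes,
        "documented_ratio": documented / len(defs) if defs else 0.0,
    }


def keep_file(source: str, min_defs: int = 1) -> bool:
    """Keep a file if it parses and defines at least `min_defs` functions or classes.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    signals = ast_quality_signals(source)
    total_defs = signals["num_functions"] + signals["num_classes"]
    return signals["parses"] and total_defs >= min_defs
```

A filter of this kind would reject files that fail to parse or that contain no definitions at all; a production pipeline would presumably combine many more signals and cover languages other than Python.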

Paper Structure (57 sections, 12 figures, 20 tables)

This paper contains 57 sections, 12 figures, 20 tables.

Introduction
Data Foundations
High Quality Data Acquisition
Korean Data Acquisition Strategy.
Multilingual Support and Language Transfer.
Specialized Domain (STEM, Code, and Agentic).
Refinement Pipeline for Code
Programming Language Classifier.
Education Score Filter.
File-Level Low Quality Filter.
Execution Filter.
Difficulty Filter.
Task Classifier.
Code Data Distribution Analysis.
Structural Refinement and Synthesis for Math
...and 42 more sections

Figures (12)

Figure 1: Artificial Analysis Intelligence Index (AAII) v4.0 results
Figure 2: Refinement pipeline for code
Figure 3: Distribution report of score, difficulty, and task after refinement (distribution bias diagnosis based on top buckets).
Figure 4: Mathematical data distribution according to domain $\times$ conceptual difficulty $\times$ reasoning difficulty combinations.
Figure 5: Response length distribution (before vs. after rewriting)
...and 7 more figures
