K2-V2: A 360-Open, Reasoning-Enhanced LLM

K2 Team, Zhengzhong Liu, Liping Tang, Linghao Jin, Haonan Li, Nikhil Ranjan, Desai Fan, Shaurya Rohatgi, Richard Fan, Omkar Pangarkar, Huijuan Wang, Zhoujun Cheng, Suqi Sun, Seungwook Han, Bowen Tan, Gurpreet Gosal, Xudong Han, Varad Pimpalkhute, Shibo Hao, Ming Shan Hee, Joel Hestness, Haolong Jia, Liqun Ma, Aaryamonvikram Singh, Daria Soboleva, Natalia Vassilieva, Renxi Wang, Yingquan Wu, Yuekai Sun, Taylor Killian, Alexander Moreno, John Maggs, Hector Ren, Guowei He, Hongyi Wang, Xuezhe Ma, Yuqi Wang, Mikhail Yurochkin, Eric P. Xing

TL;DR

K2-V2 presents a 360-open, reasoning-enhanced LLM built from scratch, designed to be a strong open-base model for long-context reasoning and tool use. The work details a three-phase training lifecycle—pretraining, mid-training with synthetic thinking data, and simple supervised fine-tuning—to cultivate reasoning behaviors, extended context lengths, and robust evaluation. It introduces TxT360 data ecosystems (including TxT360-Midas and TxT360-3efforts), a custom in-house training stack, and a reasoning-focused evaluation regime (pass@k, long-context benchmarks) that demonstrates state-of-the-art or near state-of-the-art performance in mathematics, STEM, logic, and tool use for a 70B-scale dense model. The paper also reports extensive safety and alignment analyses, highlighting both strengths and areas for improvement, especially in jailbreak and prompt-extraction resilience, and emphasizes open scientific progress through full transparency of data and training processes. Collectively, K2-V2 aims to advance open science and practical reasoning-centric AI deployment by offering a robust, well-documented, and extensible foundation for future research and production use.

Abstract

We introduce K2-V2, a 360-open LLM built from scratch as a superior base for reasoning adaptation, while retaining the functions expected of general LLMs, such as conversation and knowledge retrieval. It stands as the strongest fully open model, rivals open-weight leaders in its size class, outperforms Qwen2.5-72B, and approaches the performance of Qwen3-235B. We actively infuse domain knowledge, reasoning, long-context, and tool use throughout the training process, which explicitly prepares the model for complex reasoning tasks. We demonstrate this potential using simple supervised fine-tuning, establishing a strong baseline that indicates significant headroom for advanced alignment. By releasing the full training history and data composition, we maximize the effectiveness of continuous training, a key open-source production scenario. We release the model weights and signature LLM360 artifacts, such as complete training data, to empower the community with a capable, reasoning-centric foundation.

Paper Structure

This paper contains 111 sections, 3 equations, 35 figures, and 11 tables.

Figures (35)

  • Figure 1: K2 outperforms similar-scale models on GPQA-Diamond, even among base models.
  • Figure 2: Simple SFT on K2 makes it rival large models on AIME 2025.
  • Figure 3: The training phases of K2 are designed to progressively enable specific capabilities. Each phase introduces different types of challenges that we have to address.
  • Figure 4: A comparison of the K2 tokenizer with a few multilingual LLM tokenizers. Our tokenizer is tailored towards English and Arabic (MSA), as shown by the lower fertility scores.
  • Figure 5: Overview of our CommonCrawl data-curation pipeline, showing filtering percentages (by document count) at each stage. Grey bars indicate filtered documents. Line-level cleaning and PII removal eliminate almost no documents and are omitted from the visualization. The document-level quality-filtering stage includes Repetition Removal, Document Filtering, and Line Correction. A Local Exact Deduplication step is applied prior to global fuzzy deduplication to reduce computational overhead.
  • ...and 30 more figures
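
The fertility score referenced in the Figure 4 caption is usually understood as the average number of tokens a tokenizer produces per word, so lower values mean text is encoded more compactly. The caption does not give the exact definition used, so the sketch below assumes whitespace-delimited words; `fertility` and the toy tokenizer `chunk_tokenizer` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Iterable


def fertility(texts: Iterable[str], tokenize: Callable[[str], list[str]]) -> float:
    """Average tokens per whitespace-delimited word over a corpus.

    Assumption: words are split on whitespace; the paper may count
    words differently (e.g. with language-specific segmentation).
    """
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_tokens += len(tokenize(text))
        total_words += len(text.split())
    if total_words == 0:
        raise ValueError("corpus contains no words")
    return total_tokens / total_words


def chunk_tokenizer(text: str, size: int = 4) -> list[str]:
    """Toy stand-in for a subword tokenizer: cut each word into
    pieces of at most `size` characters."""
    pieces = []
    for word in text.split():
        pieces.extend(word[i:i + size] for i in range(0, len(word), size))
    return pieces


corpus = ["reasoning models", "open data"]
word_level = fertility(corpus, str.split)        # one token per word
chunked = fertility(corpus, chunk_tokenizer)     # words split into pieces
```

To compare real tokenizers, the `tokenize` argument would be replaced by each tokenizer's own encode function, evaluated on the same held-out text per language.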