K2-V2: A 360-Open, Reasoning-Enhanced LLM
K2 Team, Zhengzhong Liu, Liping Tang, Linghao Jin, Haonan Li, Nikhil Ranjan, Desai Fan, Shaurya Rohatgi, Richard Fan, Omkar Pangarkar, Huijuan Wang, Zhoujun Cheng, Suqi Sun, Seungwook Han, Bowen Tan, Gurpreet Gosal, Xudong Han, Varad Pimpalkhute, Shibo Hao, Ming Shan Hee, Joel Hestness, Haolong Jia, Liqun Ma, Aaryamonvikram Singh, Daria Soboleva, Natalia Vassilieva, Renxi Wang, Yingquan Wu, Yuekai Sun, Taylor Killian, Alexander Moreno, John Maggs, Hector Ren, Guowei He, Hongyi Wang, Xuezhe Ma, Yuqi Wang, Mikhail Yurochkin, Eric P. Xing
TL;DR
K2-V2 is a 360-open, reasoning-enhanced LLM built from scratch and designed as a strong open base model for long-context reasoning and tool use. The work details a three-phase training lifecycle (pretraining, mid-training with synthetic thinking data, and simple supervised fine-tuning) that cultivates reasoning behaviors and extends the context length. It introduces the TxT360 data ecosystem (including TxT360-Midas and TxT360-3efforts), a custom in-house training stack, and a reasoning-focused evaluation regime (pass@k, long-context benchmarks) that shows state-of-the-art or near state-of-the-art performance in mathematics, STEM, logic, and tool use for a 70B-scale dense model. The paper also reports extensive safety and alignment analyses, highlighting both strengths and areas for improvement, especially in jailbreak and prompt-extraction resilience, and emphasizes open scientific progress through full transparency of data and training processes. Collectively, K2-V2 aims to advance open science and practical reasoning-centric AI deployment by offering a robust, well-documented, and extensible foundation for future research and production use.
Abstract
We introduce K2-V2, a 360-open LLM built from scratch as a superior base for reasoning adaptation, in addition to the functions expected of general LLMs, such as conversation and knowledge retrieval. It stands as the strongest fully open model, rivals open-weight leaders in its size class, outperforms Qwen2.5-72B, and approaches the performance of Qwen3-235B. We actively infuse domain knowledge, reasoning, long-context, and tool-use capabilities throughout the training process, which explicitly prepares the model for complex reasoning tasks. We demonstrate this potential using simple supervised fine-tuning, establishing a strong baseline that indicates significant headroom for advanced alignment. By releasing the full training history and data composition, we maximize the effectiveness of continuous training, a key open-source production scenario. We release the model weights and signature LLM360 artifacts, such as complete training data, to empower the community with a capable, reasoning-centric foundation.
