EXAONE Deep: Reasoning Enhanced Language Models

Kyunghoon Bae; Eunbi Choi; Kibong Choi; Stanley Jungkyu Choi; Yemuk Choi; Seokhee Hong; Junwon Hwang; Hyojin Jeon; Kijeong Jeon; Gerrard Jeongwon Jo; Hyunjik Jo; Jiyeon Jung; Hyosang Kim; Joonkee Kim; Seonghwan Kim; Soyeon Kim; Sunkyoung Kim; Yireun Kim; Yongil Kim; Youchul Kim; Edward Hwayoung Lee; Haeju Lee; Honglak Lee; Jinsik Lee; Kyungmin Lee; Sangha Park; Yongmin Park; Sihoon Yang; Heuiyeen Yeen; Sihyuk Yi; Hyeongu Yun

TL;DR

EXAONE Deep introduces three reasoning-focused LLMs at $2.4B$, $7.8B$, and $32B$ parameters, trained with supervised fine-tuning (SFT), direct preference optimization, and online reinforcement learning to enhance chain-of-thought (CoT) reasoning. The data strategy emphasizes long CoT sequences: a large SFT corpus (~$12\text{B}$ tokens) plus targeted preference and reinforcement learning datasets. Training is performed on NVIDIA H100 hardware, and the paper reports detailed FLOP budgets. Across benchmarks such as MATH-500, AIME, CSAT, GPQA Diamond, LiveCodeBench, and MMLU/MMLU-Pro, the $32B$ model remains competitive with leading open-weight models, the $7.8B$ model often surpasses similarly sized baselines, and the $2.4B$ variant outperforms distilled counterparts, indicating strong reasoning capabilities at multiple scales. The models are released under a research-oriented license, and the authors suggest extending reasoning capabilities to tasks with less well-defined answers in future work.

Abstract

We present the EXAONE Deep series, which exhibits superior capabilities in various reasoning tasks, including math and coding benchmarks. We train our models mainly on a reasoning-specialized dataset that incorporates long streams of thought processes. Evaluation results show that our smaller models, EXAONE Deep 2.4B and 7.8B, outperform other models of comparable size, while the largest model, EXAONE Deep 32B, demonstrates competitive performance against leading open-weight models. All EXAONE Deep models are openly available for research purposes and can be downloaded from https://huggingface.co/LGAI-EXAONE
