EXAONE 3.5: Series of Large Language Models for Real-world Use Cases

Soyoung An, Kyunghoon Bae, Eunbi Choi, Kibong Choi, Stanley Jungkyu Choi, Seokhee Hong, Junwon Hwang, Hyojin Jeon, Gerrard Jeongwon Jo, Hyunjik Jo, Jiyeon Jung, Yountae Jung, Hyosang Kim, Joonkee Kim, Seonghwan Kim, Soyeon Kim, Sunkyoung Kim, Yireun Kim, Yongil Kim, Youchul Kim, Edward Hwayoung Lee, Haeju Lee, Honglak Lee, Jinsik Lee, Kyungmin Lee, Woohyung Lim, Sangha Park, Sooyoun Park, Yongmin Park, Sihoon Yang, Heuiyeen Yeen, Hyeongu Yun

TL;DR

EXAONE 3.5 introduces three instruction-tuned large language models (32B, 7.8B, 2.4B) with a 32K-token context, designed for real-world use and robust long-context understanding. The paper details two-stage pre-training with context-length extension, a rigorous decontamination pipeline, and post-training alignment via supervised fine-tuning and preference optimization, all under a data-compliance framework. Evaluations across real-world benchmarks, long-context tasks, and general-domain problems show state-of-the-art or competitive performance for models of comparable size, with the smallest 2.4B model performing especially well in general-domain tasks. The work emphasizes responsible AI, transparent data governance, and open-access licensing, offering flexible model options for academia and industry.

Abstract

This technical report introduces the EXAONE 3.5 instruction-tuned language models, developed and released by LG AI Research. The EXAONE 3.5 language models are offered in three configurations: 32B, 7.8B, and 2.4B. These models feature several standout capabilities: 1) exceptional instruction following capabilities in real-world scenarios, achieving the highest scores across seven benchmarks, 2) outstanding long-context comprehension, attaining the top performance in four benchmarks, and 3) competitive results compared to state-of-the-art open models of similar sizes across nine general benchmarks. The EXAONE 3.5 language models are open to anyone for research purposes and can be downloaded from https://huggingface.co/LGAI-EXAONE. For commercial use, please reach out to the official contact point of LG AI Research: contact_us@lgresearch.ai.

Paper Structure

This paper contains 40 sections, 13 figures, and 15 tables.

Figures (13)

  • Figure 1: The procedure for constructing instruction-tuning data. First, we extract the core knowledge from large-volume web corpora and classify it within the taxonomy we defined in advance. Next, instruction-tuning data is generated based on that knowledge. To construct additional, more complex training data, we leverage an instruction-evolving method (Zeng et al., 2024) so that the final dataset covers various fields at varying levels of difficulty.
  • Figure 2: Overview of the preference optimization pipeline. (Top) Preference Data Creation: preference data $\{x, y_w, y_l\}$ are constructed by using a reward model to score the responses $y$ that a model generates for the prompt $x$. (Bottom) Preference Optimization: a sequential training process in which $M_0$, initialized from the SFT model, is trained with a direct alignment algorithm (DAA) to obtain $M_1$ and then $M_2$.
  • Figure 3: NIAH results of EXAONE 3.5 language models. The x-axis represents the token length of the input text, while the y-axis shows the relative position within the text, expressed as a percentage (0% corresponds to the beginning, and 100% to the end). The results are represented using a color-coded scheme: green indicates successful retrievals, and red represents unsuccessful ones. EXAONE 3.5 language models achieve near-perfect accuracy in retrieving information across various document depths and context lengths in English and Korean.
  • Figure 4: A summary of the decontamination method employed to train EXAONE 3.5 language models. Following the approach used for GPT-4, we increase the number of random samples to $N=10$ for stricter decontamination.
  • Figure 5: LLM-as-a-judge prompt for evaluating LongRAG
  • ...and 8 more figures
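
The preference data creation step in Figure 2 can be illustrated with a minimal sketch. The caption states only that responses are scored by a reward model to build $\{x, y_w, y_l\}$; taking the highest-scoring response as $y_w$ and the lowest-scoring one as $y_l$ is an assumption made here for illustration, and `build_preference_pair` and `reward_fn` are hypothetical names, not part of any EXAONE release.

```python
from typing import Callable, Dict, List


def build_preference_pair(
    prompt: str,
    responses: List[str],
    reward_fn: Callable[[str, str], float],
) -> Dict[str, str]:
    """Build one {x, y_w, y_l} record from reward-scored responses.

    Assumption: the best-scored response is 'chosen' (y_w) and the
    worst-scored response is 'rejected' (y_l). The source caption does
    not specify the selection rule.
    """
    if len(responses) < 2:
        raise ValueError("need at least two responses to form a pair")
    # Sort responses from lowest to highest reward for this prompt.
    ranked = sorted(responses, key=lambda y: reward_fn(prompt, y))
    return {"x": prompt, "y_w": ranked[-1], "y_l": ranked[0]}


def toy_reward(prompt: str, response: str) -> float:
    """Stand-in reward (response length); a real reward model goes here."""
    return float(len(response))


pair = build_preference_pair(
    "Explain RoPE briefly.",
    ["It rotates.", "RoPE encodes positions by rotating query/key vectors.", "No."],
    toy_reward,
)
```

Records of this shape are what a direct alignment algorithm such as DPO consumes; the sequential $M_0 \rightarrow M_1 \rightarrow M_2$ training in the figure would repeat optimization over such data.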
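
The substring-matching idea behind Figure 4 can be sketched as follows. The caption states only that the method is borrowed from GPT-4 and that $N=10$ random samples are used; the 50-character probe length and the alphanumeric-only normalization below are assumptions based on the publicly described GPT-4 procedure, and all function names are hypothetical.

```python
import random
from typing import List, Optional


def normalize(text: str) -> str:
    """Keep only letters and digits (assumed normalization step)."""
    return "".join(ch for ch in text if ch.isalnum())


def sample_probes(
    benchmark_example: str,
    n: int = 10,          # N=10 as stated in the Figure 4 caption
    length: int = 50,     # assumed probe length, not given in the caption
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Draw n random fixed-length substrings from a benchmark example."""
    rng = rng or random.Random(0)
    norm = normalize(benchmark_example)
    if len(norm) <= length:
        # Short examples are used whole.
        return [norm]
    starts = [rng.randrange(len(norm) - length + 1) for _ in range(n)]
    return [norm[s:s + length] for s in starts]


def is_contaminated(train_doc: str, probes: List[str]) -> bool:
    """Flag a training document if any non-empty probe occurs in it."""
    norm_doc = normalize(train_doc)
    return any(p and p in norm_doc for p in probes)


example = (
    "What is the capital of France? The capital of France is Paris, "
    "a city on the Seine river."
)
probes = sample_probes(example, n=10)
leaked = is_contaminated("Intro text. " + example + " More text.", probes)
clean = is_contaminated("Completely unrelated document about cooking pasta.", probes)
```

Drawing more probes per benchmark example raises the chance that a partially overlapping training document is caught, which is consistent with the caption's description of $N=10$ as the stricter setting.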