H2O-Danube3 Technical Report

Pascal Pfeiffer, Philipp Singer, Yauhen Babakhin, Gabor Fodor, Nischay Dhankhar, Sri Satish Ambati

TL;DR

This work addresses the need for compact, on-device LLMs by introducing H2O-Danube3-4B and H2O-Danube3-500M, decoder-only models trained on $6T$ and $4T$ tokens, respectively, using the Mistral tokenizer with a vocabulary of $32{,}000$ tokens and a context length of $8{,}192$ tokens. The models undergo three-stage, primarily English pretraining followed by supervised fine-tuning for chat. They are evaluated extensively on academic, chat, and fine-tuning benchmarks, where they perform competitively (e.g., GSM8K ≈ $50.14\%$ and HellaSwag above $80\%$ in the 10-shot setting) and prove practical to run on edge devices. The paper also outlines deployment pathways through 4-bit quantization (with 3-bit quantization as a more aggressive option), releases the models under the Apache 2.0 license, and provides tooling via H2O LLM Studio for on-device adaptation. Overall, H2O-Danube3 advances the democratization of LLMs by delivering performant, freely accessible, small-scale models suited to offline and mobile applications, lowering the barrier to entry for edge AI research and deployment.
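Why 4-bit (and 3-bit) quantization matters on a phone can be seen from a rough weight-storage estimate. The Python sketch below assumes nominal parameter counts taken from the model names (4e9 and 5e8) and counts only weight bytes. Real checkpoints differ slightly in size, and runtime overheads such as the KV cache, activations, and quantization scales are ignored. The helper `weight_memory_gb` is hypothetical and not part of any released tooling.

```python
# Back-of-the-envelope weight-memory estimate at different precisions.
# Assumptions (not from the paper): nominal parameter counts inferred from the
# model names; only weight storage is counted (no KV cache, activations,
# or per-block quantization scales/zero-points).

def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Hypothetical helper: approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

# Nominal sizes implied by the model names (assumption).
MODELS = {"H2O-Danube3-4B": 4e9, "H2O-Danube3-500M": 5e8}
# Bits per weight for an unquantized half-precision baseline and the two
# quantization levels mentioned in the text.
PRECISIONS = {"fp16": 16, "4-bit": 4, "3-bit": 3}

if __name__ == "__main__":
    for name, n in MODELS.items():
        for label, bits in PRECISIONS.items():
            print(f"{name:18s} {label:6s} ~{weight_memory_gb(n, bits):.2f} GB")
```

Under these assumptions, the 4B model's weights shrink from roughly 8 GB in fp16 to roughly 2 GB at 4 bits and 1.5 GB at 3 bits, which is what brings it within reach of smartphone memory budgets.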

Abstract

We present H2O-Danube3, a series of small language models consisting of H2O-Danube3-4B, trained on 6T tokens, and H2O-Danube3-500M, trained on 4T tokens. Our models are pre-trained on high-quality web data consisting primarily of English tokens in three stages with different data mixes, followed by a final supervised fine-tuning stage for the chat versions. The models achieve highly competitive results across a wide range of academic, chat, and fine-tuning benchmarks. Thanks to their compact architecture, H2O-Danube3 models run efficiently on a modern smartphone, enabling fast local inference even on mobile devices. We make all models openly available under the Apache 2.0 license, further democratizing LLMs by making them economically accessible to a wider audience.

Paper Structure (7 sections, 1 figure, 7 tables)

Figures (1)

  • Figure 1: Data stages for H2O-Danube3-4B. The model is trained in three stages with different data mixes. Web data makes up 90.6% of the first stage and decreases to 81.7% in the second stage and to 51.6% in the third. The first two stages contain the majority of the tokens (4.6T and 1.35T, respectively), while the third stage comprises 0.05T tokens.
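The stage sizes in the caption add up to the 6T-token budget stated for H2O-Danube3-4B (4.6T + 1.35T + 0.05T = 6.0T). The Python sketch below reproduces that check. It also estimates how many web tokens each stage contributes by multiplying stage size by web fraction. This assumes the percentages refer to token counts rather than documents, which the caption does not state explicitly, and the derived web-token figures are simple arithmetic, not numbers reported by the authors.

```python
# Per-stage token budget and web-data share for H2O-Danube3-4B, using the
# figures quoted in the Figure 1 caption. Assumption: the web percentages are
# fractions of tokens. Derived values are arithmetic, not reported results.

STAGES = [
    # (stage name, total tokens in trillions, fraction of web data)
    ("stage 1", 4.6, 0.906),
    ("stage 2", 1.35, 0.817),
    ("stage 3", 0.05, 0.516),
]

def summarize(stages):
    """Return (total tokens, web tokens, overall web share) across all stages."""
    total = sum(tokens for _, tokens, _ in stages)
    web = sum(tokens * frac for _, tokens, frac in stages)
    return total, web, web / total

if __name__ == "__main__":
    for name, tokens, frac in STAGES:
        print(f"{name}: {tokens:.2f}T tokens, ~{tokens * frac:.3f}T web ({frac:.1%})")
    total, web, share = summarize(STAGES)
    print(f"total: {total:.2f}T tokens, ~{web:.2f}T web ({share:.1%} overall)")
```

Under the token-fraction assumption, about 5.3T of the 6T tokens (roughly 88%) would be web data, almost all of it from the first two stages.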