H2O-Danube-1.8B Technical Report
Philipp Singer, Pascal Pfeiffer, Yauhen Babakhin, Maximilian Jeblick, Nischay Dhankhar, Gabor Fodor, Sri Satish Ambati
TL;DR
H2O-Danube is a family of open-source 1.8B-parameter decoder-only LLMs: H2O-Danube-1.8B, trained on 1T tokens, and the improved H2O-Danube2-1.8B, trained on an additional 2T tokens, which at the time of writing holds the top Open LLM Leaderboard ranking among models below 2B parameters. The work combines architectural choices inspired by Llama 2 and Mistral with staged data training, FP8 acceleration, and an SFT + DPO dialogue-tuning pipeline to produce competitive base and chat models. The models are released under Apache 2.0, which permits commercial use and community fine-tuning, and they perform strongly on commonsense reasoning, world knowledge, and reading comprehension benchmarks. This permissive release aims to democratize access to capable LLMs that run on consumer hardware and can be further improved by the research and developer community.
Abstract
We present H2O-Danube, a series of small 1.8B language models consisting of H2O-Danube-1.8B, trained on 1T tokens, and the incrementally improved H2O-Danube2-1.8B, trained on an additional 2T tokens. Our models exhibit highly competitive metrics across a multitude of benchmarks, and, as of the time of this writing, H2O-Danube2-1.8B achieves the top ranking on the Open LLM Leaderboard among all models below the 2B parameter range. The models follow core principles of Llama 2 and Mistral, and we leverage and refine various techniques for pre-training large language models. We additionally release chat models trained with supervised fine-tuning followed by direct preference optimization. We make all models openly available under the Apache 2.0 license, further democratizing LLMs and making them economically accessible to a wider audience.
