ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?

Hailin Chen, Fangkai Jiao, Xingxuan Li, Chengwei Qin, Mathieu Ravaut, Ruochen Zhao, Caiming Xiong, Shafiq Joty

TL;DR

The paper surveys a year of open-source LLM progress relative to ChatGPT, assessing whether open-source models have closed the gap in a broad set of tasks. It synthesizes results across general, agent-based, reasoning, long-context, medical, and structured-output domains, highlighting advances in instruction tuning, high-quality data, and retrieval augmentation. It also examines safety, hallucination, and data contamination challenges, arguing that while open-source LLMs are narrowing the gap, matching GPT-4 remains challenging and depends on data access and alignment practices. The authors propose practical guidelines for building robust open-source LLMs, including data curation, architecture choices, training regimes, and efficient inference strategies, and call for continued focus on transparency and scalable alignment as paths forward.

Abstract

Upon its release in late 2022, ChatGPT brought a seismic shift to the entire landscape of AI, both in research and commerce. Through instruction-tuning a large language model (LLM) with supervised fine-tuning and reinforcement learning from human feedback, it showed that a model could answer human questions and follow instructions on a broad panel of tasks. Following this success, interest in LLMs has intensified, with new LLMs appearing at frequent intervals across academia and industry, including from many start-ups focused on LLMs. While closed-source LLMs (e.g., OpenAI's GPT, Anthropic's Claude) generally outperform their open-source counterparts, progress on the latter has been rapid, with claims of achieving parity or even better performance on certain tasks. This has crucial implications not only for research but also for business. In this work, on the first anniversary of ChatGPT, we provide an exhaustive overview of this success, surveying all tasks where an open-source LLM has claimed to be on par with or better than ChatGPT.

Paper Structure (59 sections, 3 figures, 5 tables)

Figures (3)

  • Figure 1: Overview of different open-source LLMs on various general benchmarks.
  • Figure 2: Typology of LLM capabilities and the best-performing open-source LLMs. White boxes denote domains, blue boxes denote specific datasets, and orange boxes denote open-source LLMs.
  • Figure 3: LLM development timeline. The models below the arrow are closed-source while those above the arrow are open-source.