Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability

Haonan Li, Xudong Han, Zenan Zhai, Honglin Mu, Hao Wang, Zhenxuan Zhang, Yilin Geng, Shom Lin, Renxi Wang, Artem Shelmanov, Xiangyu Qi, Yuxia Wang, Donghai Hong, Youliang Yuan, Meng Chen, Haoqin Tu, Fajri Koto, Tatsuki Kuribayashi, Cong Zeng, Rishabh Bhardwaj, Bingchen Zhao, Yawen Duan, Yi Liu, Emad A. Alghamdi, Yaodong Yang, Yinpeng Dong, Soujanya Poria, Pengfei Liu, Zhengzhong Liu, Xuguang Ren, Eduard Hovy, Iryna Gurevych, Preslav Nakov, Monojit Choudhury, Timothy Baldwin

TL;DR

The paper tackles the imbalance between safety and capability in LLM evaluation by proposing Libra-Leaderboard, a framework that jointly ranks models on safety and performance. It combines a safety-oriented evaluation framework (Libra-Eval), a dynamic safety benchmark, and an interactive Safety Arena that collects real user feedback and supports adversarial testing. A central contribution is the distance-to-optimal-score ranking, exemplified by the Balance-Encouraging Metric $1 - \sqrt{\frac{(1 - x)^2 + (1 - y)^2}{2}}$, which rewards balanced improvements in safety and capability. Empirical results on 26 mainstream models reveal pervasive safety gaps and demonstrate the framework's potential to guide responsible AI development.

Abstract

To address this gap, we introduce Libra-Leaderboard, a comprehensive framework designed to rank LLMs through a balanced evaluation of performance and safety. Combining a dynamic leaderboard with an interactive LLM arena, Libra-Leaderboard encourages the joint optimization of capability and safety. Unlike traditional approaches that average performance and safety metrics, Libra-Leaderboard uses a distance-to-optimal-score method to calculate the overall rankings. This approach incentivizes models to achieve a balance rather than excelling in one dimension at the expense of the other. In the first release, Libra-Leaderboard evaluates 26 mainstream LLMs from 14 leading organizations, identifying critical safety challenges even in state-of-the-art models.
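
To make the contrast between averaging and the distance-to-optimal-score method concrete, the following is a minimal sketch of the Balance-Encouraging Metric quoted in the TL;DR. It assumes that x and y are the two scores (safety and capability), each normalized to [0, 1], with (1, 1) as the optimal point. The function names `balance_score` and `mean_score` and the example scores are illustrative, not taken from the paper or the Libra-Eval code.

```python
import math


def balance_score(x: float, y: float) -> float:
    """Distance-to-optimal score: 1 - sqrt(((1 - x)^2 + (1 - y)^2) / 2).

    Assumption: x and y are scores normalized to [0, 1], and the
    optimal point is (1, 1). The division by 2 keeps the result in [0, 1].
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("scores are assumed to lie in [0, 1]")
    return 1.0 - math.sqrt(((1.0 - x) ** 2 + (1.0 - y) ** 2) / 2.0)


def mean_score(x: float, y: float) -> float:
    """Plain average of the two scores, shown only for comparison."""
    return (x + y) / 2.0


if __name__ == "__main__":
    # Hypothetical score pairs with the same average but different balance.
    balanced = (0.7, 0.7)
    lopsided = (0.9, 0.5)
    for name, (x, y) in [("balanced", balanced), ("lopsided", lopsided)]:
        print(f"{name}: mean={mean_score(x, y):.3f} "
              f"balance={balance_score(x, y):.3f}")
```

In this sketch both hypothetical models average 0.7, but the distance-based score is 0.7 for the balanced pair and roughly 0.64 for the lopsided one, so a model that trades one dimension for the other ranks lower.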

Paper Structure

This paper contains 28 sections, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Overall Safety and Capability Scores on the LibrAI Leaderboard.
  • Figure 2: User interface of Libra-Leaderboard (left) and Arena (right).
  • Figure 3: Safety benchmark results from the Libra-Leaderboard. The task average scores and model average scores are displayed on the right and the bottom of the figure, respectively. Only the top-20 models and the bottom-20 tasks are included.
  • Figure 4: Results categorized by task type, with average scores shown on the right.
  • Figure 5: Visualization of three methods for combining safety and performance scores into a single metric. Contour lines represent sets of points with the same combined score for each method, showcasing the characteristics of each approach.
  • ...and 3 more figures