Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models

Lai Wei, Zhiquan Tan, Chenghai Li, Jindong Wang, Weiran Huang

TL;DR

This paper introduces Diff-eRank, a novel rank-based metric grounded in information theory and geometric principles. Diff-eRank assesses LLMs by analyzing their hidden representations, giving a quantitative measure of how efficiently they eliminate redundant information during training.

Abstract

Large Language Models (LLMs) have transformed natural language processing and extended their powerful capabilities to multi-modal domains. As LLMs continue to advance, it is crucial to develop diverse and appropriate metrics for their evaluation. In this paper, we introduce a novel rank-based metric, Diff-eRank, grounded in information theory and geometry principles. Diff-eRank assesses LLMs by analyzing their hidden representations, providing a quantitative measure of how efficiently they eliminate redundant information during training. We demonstrate the applicability of Diff-eRank in both single-modal (e.g., language) and multi-modal settings. For language models, our results show that Diff-eRank increases with model size and correlates well with conventional metrics such as loss and accuracy. In the multi-modal context, we propose an alignment evaluation method based on the eRank, and verify that contemporary multi-modal LLMs exhibit strong alignment performance based on our method. Our code is publicly available at https://github.com/waltonfuture/Diff-eRank.

Paper Structure (24 sections, 13 equations, 4 figures, 11 tables)

Figures (4)

  • Figure 1: Comparison of Diff-eRank and reduced loss when model scales up across various datasets. Both Diff-eRank and reduced loss show an upward trend when the model scales up.
  • Figure 2: Illustration of the eRank measurement in the MLLM framework. The evaluation encompasses the effective rank of image representations after the vision encoder ($\operatorname{eRank}_1$), post-connector representations ($\operatorname{eRank}_2$), as well as the output representations generated by the LLM including individual images ($\operatorname{eRank}_3$), textual data ($\operatorname{eRank}_4$), and the combined image-text pairs ($\operatorname{eRank}_5$).
  • Figure 3: Comparison of Diff-eRank with reduced loss and benchmark accuracy across different model families, including OPT (Zhang et al., 2022), Cerebras-GPT (Dey et al., 2023), and OpenELM (Mehta et al., 2024).
  • Figure 4: Different designs for Diff-eRank.

Theorems & Definitions (5)

  • Definition 3.1: Construction of Covariance Matrix
  • Definition 3.2: eRank (Roy & Vetterli, 2007)
  • Definition 3.3: Matrix Entropy
  • Definition 3.4: Diff-eRank
  • Definition 3.5: Diff-eRank of a Dataset
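
The definitions above are listed by name only. The following is a minimal sketch of how they might fit together, assuming the usual forms: (3.1) center the N hidden representations, scale each to unit norm, and average their outer products, giving a trace-1 covariance matrix; (3.2/3.3) take the effective rank as the exponential of the Shannon entropy of the normalized eigenvalue spectrum, which equals the exponential of the matrix entropy -tr(Σ log Σ) when tr(Σ) = 1; (3.4) define Diff-eRank as the eRank of an untrained model's representations minus that of the trained model; (3.5) average this over the samples of a dataset. These forms reflect our reading of the paper and the standard Roy and Vetterli effective rank. All function names are hypothetical and are not taken from the authors' repository. Random arrays stand in for real hidden states.

```python
import numpy as np


def normalized_covariance(reps: np.ndarray) -> np.ndarray:
    """Assumed form of Definition 3.1 (hypothetical helper).

    reps: N x d array of hidden representations (e.g. one row per token).
    Each centered row is scaled to unit L2 norm, so the result is PSD with trace 1.
    """
    centered = reps - reps.mean(axis=0, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    z = centered / np.clip(norms, 1e-12, None)  # guard against all-zero rows
    return z.T @ z / reps.shape[0]


def erank(cov: np.ndarray) -> float:
    """Effective rank: exp of the Shannon entropy of the normalized spectrum."""
    eig = np.linalg.eigvalsh(cov)   # symmetric PSD, so eigenvalues equal singular values
    eig = np.clip(eig, 0.0, None)   # drop tiny negative round-off
    p = eig / eig.sum()
    p = p[p > 0]                    # treat 0 * log 0 as 0
    return float(np.exp(-np.sum(p * np.log(p))))


def diff_erank(reps_untrained: np.ndarray, reps_trained: np.ndarray) -> float:
    """Assumed Definition 3.4: eRank before training minus eRank after training."""
    return erank(normalized_covariance(reps_untrained)) - erank(
        normalized_covariance(reps_trained)
    )


def dataset_diff_erank(pairs: list) -> float:
    """Assumed Definition 3.5: mean Diff-eRank over (untrained, trained) sample pairs."""
    return float(np.mean([diff_erank(u, t) for u, t in pairs]))


# Synthetic stand-ins: isotropic noise for an "untrained" model, and
# representations confined to a 2-D subspace for a "trained" model.
rng = np.random.default_rng(0)
untrained = rng.normal(size=(64, 16))
trained = rng.normal(size=(64, 2)) @ rng.normal(size=(2, 16))
print(diff_erank(untrained, trained))  # positive: the trained spectrum is more concentrated
```

Under these assumptions, a larger Diff-eRank means the trained model's representations are concentrated in fewer effective directions than the untrained model's, which matches the paper's reading of the metric as removal of redundant information.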