Towards Heterogeneous Long-tailed Learning: Benchmarking, Metrics, and Toolbox

Haohui Wang; Weijie Guan; Jianpeng Chen; Zi Wang; Dawei Zhou

TL;DR

HeroLT is a comprehensive long-tailed learning benchmark that integrates 18 state-of-the-art algorithms, 10 evaluation metrics, and 17 real-world datasets across 6 tasks and 4 data modalities. It enables effective and fair evaluation of newly proposed methods against existing baselines on varying dataset types.

Abstract

Long-tailed data distributions pose challenges in a variety of domains such as e-commerce, finance, biomedical science, and cyber security, where the performance of machine learning models is often dominated by head categories while tail categories are inadequately learned. This work aims to provide a systematic view of long-tailed learning from three pivotal angles: (A1) the characterization of data long-tailedness, (A2) the data complexity of various domains, and (A3) the heterogeneity of emerging tasks. We develop HeroLT, a comprehensive long-tailed learning benchmark integrating 18 state-of-the-art algorithms, 10 evaluation metrics, and 17 real-world datasets across 6 tasks and 4 data modalities. With these novel angles and extensive experiments (315 in total), HeroLT enables effective and fair evaluation of newly proposed methods against existing baselines on varying dataset types. We conclude by highlighting the significant applications of long-tailed learning and identifying several promising future directions. For accessibility and reproducibility, we open-source our benchmark HeroLT and corresponding results at https://github.com/SSSKJ/HeroLT.

Paper Structure (24 sections, 3 equations, 5 figures, 10 tables)

Introduction
HeroLT: Benchmarking Heterogeneous Long-Tailed Learning
Preliminaries and Problem Definition
Benchmark Angles in HeroLT
Angle 1: Long-Tailedness in Terms of Data Imbalance and Extreme Number of Categories.
Angle 2: Data Complexity with 17 Datasets across 4 Data Modalities.
Angle 3: Task Heterogeneity with 18 Algorithms on 6 Tasks.
Experiment Results and Analyses
Experiment Setting
Algorithm Performance on Object Recognition
Algorithm Performance on Multi-Label Text Classification
Algorithm Performance on Image Classification and Instance Segmentation
Algorithm Performance on Node Classification
Algorithm Performance on Regression
Related Work
...and 9 more sections

Figures (5)

Figure 1: The systematic view of heterogeneous long-tailed learning concerning three pivotal angles, including long-tailedness (colored in red), data complexity (green), and task heterogeneity (blue).
Figure 2: Illustration of synthetic long-tailed data. (a) Long-tailed distribution of categories. (b) Data satisfying Assumptions 1 and 2. (c) Data violating Assumption 1. (d) Data violating Assumption 2. (e) Data violating both Assumptions 1 and 2.
Figure 3: The data distributions on two commonly used datasets exhibit prominent long-tailed distributions.
Figure 4: The data distributions on two commonly used datasets exhibit prominent long-tailed distributions.
Figure 5: An example comparing the three long-tailedness metrics.
