TUDataset: A collection of benchmark datasets for learning with graphs

Christopher Morris, Nils M. Kriege, Franka Bause, Kristian Kersting, Petra Mutzel, Marion Neumann

TL;DR

To address inconsistent benchmarking in graph learning, the paper introduces TUDataset, a collection of more than 120 datasets with standardized loading, evaluation, and baseline tools. It provides graph kernel and GNN baselines and demonstrates reproducible experiments across diverse domains, from small molecules to social networks, including molecular regression tasks. The experimental study finds that classical graph kernels, particularly WL-based methods, remain competitive on many classification tasks and scale differently than GNNs, while neural models such as MPNN improve over baselines on large-scale molecular regression. Overall, TUDataset aims to standardize benchmarks and facilitate fair, comparable progress in graph representation learning.

Abstract

Recently, there has been an increasing interest in (supervised) learning with graph data, especially using graph neural networks. However, the development of meaningful benchmark datasets and standardized evaluation procedures is lagging, consequently hindering advancements in this area. To address this, we introduce the TUDataset for graph classification and regression. The collection consists of over 120 datasets of varying sizes from a wide range of applications. We provide Python-based data loaders, kernel and graph neural network baseline implementations, and evaluation tools. Here, we give an overview of the datasets, standardized evaluation procedures, and provide baseline experiments. All datasets are available at www.graphlearning.io. The experiments are fully reproducible from the code available at www.github.com/chrsmrrs/tudataset.

Paper Structure

This paper contains 11 sections, 4 tables.