First Place Solution of KDD Cup 2021 & OGB Large-Scale Challenge Graph Prediction Track

Chengxuan Ying, Mingqi Yang, Shuxin Zheng, Guolin Ke, Shengjie Luo, Tianle Cai, Chenglin Wu, Yuxin Wang, Yanming Shen, Di He

TL;DR

This work tackles predicting the HOMO-LUMO gap from molecular graphs in PCQM4M-LSC by deploying two architectures, Graphormer and ExpC*, augmented with rich 3D-aware features and RDKit-derived distances. The models are trained with 8-fold cross-validation and complemented by two full-data Graphormer retrains, culminating in a naive ensemble of 18 models that achieves a test MAE of $0.1200$, winning first place. Validation performance stabilizes around $0.096$–$0.103$ for Graphormer and around $0.101$ for ExpC*, underscoring the benefit of combining transformer-based GNNs with expanding convolution and feature engineering. The results demonstrate the practical impact of advanced graph representations and ensembling for large-scale molecular property prediction.

Abstract

In this technical report, we present our solution to the KDD Cup 2021 OGB Large-Scale Challenge, PCQM4M-LSC Track. We adopt Graphormer and ExpC as our base models. We train each model with 8-fold cross-validation, and additionally train two Graphormer models on the union of the training and validation sets with different random seeds. For the final submission, we use a naive ensemble of these 18 models, taking the average of their outputs. Using this method, our team MachineLearning achieved 0.1200 MAE on the test set, winning first place in the KDD Cup graph prediction track.
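The model count and the averaging step described above can be illustrated with a minimal sketch. It is not the authors' code: the helper names (`kfold_indices`, `naive_ensemble`, `mean_absolute_error`), the random shuffling, and the seed are assumptions made for illustration, and the abstract does not state how the folds were actually constructed.

```python
import numpy as np

# Two base models, each trained on 8 folds, plus two full-data Graphormer runs.
N_FOLDS = 8
n_models = 2 * N_FOLDS + 2  # 18 models in the final ensemble


def kfold_indices(n_samples, n_folds=N_FOLDS, seed=0):
    """Return (train_idx, held_out_idx) pairs for a shuffled k-fold split.

    Assumption: a plain random split; the report may have used a different scheme.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    splits = []
    for i in range(n_folds):
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        splits.append((train, folds[i]))
    return splits


def naive_ensemble(predictions):
    """Average per-model predictions (each of shape [n_molecules]) with equal weights."""
    return np.mean(np.stack(predictions, axis=0), axis=0)


def mean_absolute_error(y_true, y_pred):
    """MAE, the metric reported for the challenge."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```

Given a list of 18 prediction arrays for the test molecules, `naive_ensemble` returns the single array that would be submitted; no per-model weights are learned, which is what makes the ensemble "naive".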

Paper Structure

This paper contains 9 sections, 2 equations, 4 tables.