Open Polymer Challenge: Post-Competition Report

Gang Liu; Sobin Alosious; Subhamoy Mahajan; Eric Inae; Yihan Zhu; Yuhan Liu; Renzheng Zhang; Jiaxin Xu; Addison Howard; Ying Li; Tengfei Luo; Meng Jiang

TL;DR

The paper introduces the Open Polymer Challenge (OPC), the first large-scale, community-driven benchmark for polymer informatics. It provides properties derived from molecular dynamics (MD) simulations for about 10K polymers and frames the task as multi-task property prediction. The report describes the ADEPT data-generation pipeline, the five properties studied, and the competition design, including how data leakage and distribution shift were handled. The key finding is that careful data curation, diverse but simple feature engineering, and robust tree-based models perform well on small, noisy datasets. The results also point to the need for standardized simulation pipelines and better handling of glass transition temperature (Tg). The released datasets, code, and analyses give a practical foundation for molecular AI in sustainable polymer discovery and for best practices in building future large-scale polymer datasets.

Abstract

Machine learning (ML) offers a powerful path toward discovering sustainable polymer materials, but progress has been limited by the lack of large, high-quality, and openly accessible polymer datasets. The Open Polymer Challenge (OPC) addresses this gap by releasing the first community-developed benchmark for polymer informatics, featuring a dataset with 10K polymers and 5 properties: thermal conductivity, radius of gyration, density, fractional free volume, and glass transition temperature. The challenge centers on multi-task polymer property prediction, a core step in virtual screening pipelines for materials discovery. Participants developed models under realistic constraints that include small data, label imbalance, and heterogeneous simulation sources, using techniques such as feature-based augmentation, transfer learning, self-supervised pretraining, and targeted ensemble strategies. The competition also revealed important lessons about data preparation, distribution shifts, and cross-group simulation consistency, informing best practices for future large-scale polymer datasets. The resulting models, analysis, and released data create a new foundation for molecular AI in polymer science and are expected to accelerate the development of sustainable and energy-efficient materials. Along with the competition, we release the test dataset at https://www.kaggle.com/datasets/alexliu99/neurips-open-polymer-prediction-2025-test-data. We also release the data generation pipeline at https://github.com/sobinalosious/ADEPT, which simulates more than 25 properties, including thermal conductivity, radius of gyration, and density.
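To make the multi-task setting with uneven label coverage more concrete, the following is a minimal sketch of how predictions could be scored when each polymer has labels for only some of the five properties. It is an illustration under stated assumptions, not the competition's official metric: the property keys, the function names, and the toy values are all hypothetical, and missing labels are assumed to be encoded as NaN.

```python
from math import isnan, nan

# Hypothetical column order for the five properties named in the abstract.
PROPERTIES = [
    "thermal_conductivity",
    "radius_of_gyration",
    "density",
    "fractional_free_volume",
    "glass_transition_temperature",
]


def masked_mae_per_property(y_true, y_pred):
    """Return {property: MAE}, using only rows where a label exists.

    Rows are lists aligned with PROPERTIES; a missing label is NaN.
    Properties with no labels at all are left out of the result.
    """
    scores = {}
    for j, name in enumerate(PROPERTIES):
        errors = [
            abs(t[j] - p[j])
            for t, p in zip(y_true, y_pred)
            if not isnan(t[j])
        ]
        if errors:
            scores[name] = sum(errors) / len(errors)
    return scores


def mean_over_properties(scores):
    """Unweighted mean of the per-property errors (an assumed aggregation)."""
    return sum(scores.values()) / len(scores)


# Toy, made-up data: three polymers with sparse labels.
y_true = [
    [0.2, nan, 1.0, nan, 350.0],
    [nan, 15.0, 1.2, 0.35, nan],
    [0.4, nan, nan, nan, nan],
]
y_pred = [
    [0.3, 14.0, 1.1, 0.30, 340.0],
    [0.1, 16.0, 1.0, 0.37, 400.0],
    [0.4, 12.0, 0.9, 0.40, 300.0],
]

scores = masked_mae_per_property(y_true, y_pred)
overall = mean_over_properties(scores)
```

Because the properties live on very different scales, an unweighted mean like this is dominated by the property with the largest numeric range; a practical metric would likely normalize or reweight each property before averaging.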
