OpenReview Should be Protected and Leveraged as a Community Asset for Research in the Era of Large Language Models

Hao Sun, Yunyi Shen, Mihaela van der Schaar

TL;DR

OpenReview is proposed as a core, evolving community asset to advance AI research in the era of large language models. The paper details how OpenReview's real-world, multi-round deliberations can improve peer review, enable open-ended task evaluation, and enhance alignment research, while proposing standardized benchmarks and responsible data stewardship. It presents two supervision streams (scientific demonstrations and structured evaluations) for post-training AI systems to conduct and assess research with human-like rigor. It also addresses potential concerns about automation, urging continued human oversight to preserve the deliberative and values-guided nature of scientific critique.

Abstract

In the era of large language models (LLMs), high-quality, domain-rich, and continuously evolving datasets capturing expert-level knowledge, core human values, and reasoning are increasingly valuable. This position paper argues that OpenReview -- the continually evolving repository of research papers, peer reviews, author rebuttals, meta-reviews, and decision outcomes -- should be leveraged more broadly as a core community asset for advancing research in the era of LLMs. We highlight three promising areas in which OpenReview can uniquely contribute: enhancing the quality, scalability, and accountability of peer review processes; enabling meaningful, open-ended benchmarks rooted in genuine expert deliberation; and supporting alignment research through real-world interactions reflecting expert assessment, intentions, and scientific values. To better realize these opportunities, we suggest the community collaboratively explore standardized benchmarks and usage guidelines around OpenReview, inviting broader dialogue on responsible data use, ethical considerations, and collective stewardship.

Paper Structure

This paper contains 18 sections, 2 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Left: an overview of the OpenReview data generation process. Middle: the three main applications that this position paper argues OpenReview supports: regulating peer review, empowering research on open-ended tasks for LLMs and agents, and post-training for alignment and reasoning. Right: highlighted research opportunities around those use cases.
  • Figure 2: Growth trends at ICLR (2017–2025) in submissions, authors, and reviewers. While the number of reviewers has increased over time, it has not kept pace with the growth in submissions and authors, indicating a growing strain on the peer review process. The number of reviewers is estimated from the number of submissions, the total number of reviews received, and an average workload of 3 reviews per reviewer.
  • Figure 3: Distribution of the frequency of bad reviews under a Wright-Fisher-type selection model. The three time stages are marked by red vertical lines in the first two panels. First column: the modeled number of reviews. Second column: the selection applied at each time. Third through last columns: the distribution of the proportion of bad reviews.