Estimating the Causal Effect of Early ArXiving on Paper Acceptance

Yanai Elazar; Jiayao Zhang; David Wadden; Bo Zhang; Noah A. Smith

TL;DR

The paper tackles the question of whether releasing a paper on arXiv before peer review causally affects its likelihood of acceptance at ICLR. It adopts a causal-inference framework with a negative control outcome (based on post-release citations) and matching to adjust for observed confounders, aiming to account for unobserved quality. The primary analysis suggests a modest positive association between early arXiving and acceptance, but this effect diminishes when unobserved confounding is addressed via the NCO and, under QQ equi-confounding, becomes weak or non-significant in many settings. The findings indicate that early arXiving does not appear to confer a substantial or group-differentiated advantage, challenging the notion that anonymity periods are needed to ensure fairness; the authors advocate a randomized trial for a more definitive assessment. An epilogue notes policy shifts in related venues, highlighting the real-world relevance of understanding how preprint practices influence acceptance dynamics.

Abstract

What is the effect of releasing a preprint of a paper before it is submitted for peer review? No randomized controlled trial has been conducted, so we turn to observational data to answer this question. We use data from the ICLR conference (2018--2022) and apply methods from causal inference to estimate the effect of arXiving a paper before the reviewing period (early arXiving) on its acceptance to the conference. Adjusting for confounders such as topic, authors, and quality, we may estimate the causal effect. However, since quality is a challenging construct to estimate, we use the negative outcome control method, using paper citation count as a control variable to debias the quality confounding effect. Our results suggest that early arXiving may have a small effect on a paper's chances of acceptance. However, this effect (when existing) does not differ significantly across different groups of authors, as grouped by author citation count and institute rank. This suggests that early arXiving does not provide an advantage to any particular group.
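To make the debiasing idea concrete, here is a minimal sketch of a difference-in-differences style negative control adjustment. It assumes additive equi-confounding, which is simpler than the QQ equi-confounding mentioned in the TL;DR, and it is not the authors' implementation. The function name `nco_did_effect` and the toy arrays are hypothetical. The sketch subtracts the treated-versus-control gap in the negative control outcome from the gap in the real outcome.

```python
import numpy as np

def nco_did_effect(a, y, n):
    """Return (unadjusted, adjusted) effect estimates.

    a: binary treatment (1 = early arXived), y: binary outcome (1 = accepted),
    n: binary negative control outcome (e.g. a citation-based indicator).
    Assumes additive equi-confounding: the confounding bias in Y equals the
    treated-vs-control gap observed in N. This is an illustrative assumption.
    """
    a = np.asarray(a, dtype=bool)
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    unadjusted = y[a].mean() - y[~a].mean()   # naive gap in acceptance
    bias = n[a].mean() - n[~a].mean()         # gap attributed to confounding
    return unadjusted, unadjusted - bias

# Toy data (made up for illustration only)
a = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])
n = np.array([1, 1, 0, 0, 1, 0, 0, 0])
unadj, adj = nco_did_effect(a, y, n)
print(unadj, adj)
```

In this toy example the naive gap shrinks once the gap in the negative control outcome is subtracted, which is the qualitative pattern the abstract describes.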

Paper Structure (32 sections, 1 theorem, 13 equations, 4 figures, 7 tables)

Introduction
Problem Formulation
Causal Inference: Background
Assumptions
Estimating the Causal Effect
Causal Estimand
Effect Estimation
Negative Control Outcome and Difference-in-Difference
Matching
Choosing a Negative Control Outcome Variable
Validity of Citation Count as a Negative Control Outcome
The Effect of arXiving on Acceptance
Dataset
Primary Analysis: Controlling for Observed Confounders
Analysis on Author Subgroups
...and 17 more sections

Key Result

Theorem 1

Under the above assumptions, the ATET can be expressed in terms of observable distributions; see Theorem 1 (ATET under QQ equi-confounding) below.

Figures (4)

Figure 1: Causal graph of our problem. $A$ and $Y$ are binary treatment and effect variables, respectively: whether a paper was arXived before the review deadline, and whether the paper was accepted. As we cannot measure the unobserved confounders (e.g., quality), we estimate the effect of arXiving using a negative control outcome variable ($N$). Solid edges represent a directed causal effect, while dashed edges represent an association.
Figure 2: Effects of early arXiving (with $95\%$ bootstrap confidence intervals) estimated on the matched sample. "Unadj" refers to the estimate without any NCO, computed on the same subset of data used by the NCO in the same panel. Each NCO indicates whether the $n$-year citation count exceeds the $q$th quantile, for $n=1,2,3$ and $q\in\{0.5,0.75,0.9\}$. Estimated effects without NCO debiasing are shown in red. Note that the effects estimated using $N^{(1)}_{0.5}$, $N^{(2)}_{0.75}$, and $N^{(3)}_{0.9}$ are not significant at the $95\%$ level (confidence intervals contain $0$), and effects that are significant are smaller than their unadjusted counterparts (marked in red). This indicates that the NCOs explain a large part of the unadjusted effects.
Figure 3: Estimated effects in author subgroups on the matched sample. We use $N^{(n)}$ with $n \in \{1,2,3\}$ as the NCO and estimate the effect across submissions grouped by the minimum author institution rank and the maximum author citation count in the submission ("Unadj" refers to the estimate without any NCO). Note that for NCOs defined using the $75\%$ and $90\%$ quantiles, effects are not significant (confidence intervals contain $0$), and there is no evidence that the effects differ across subgroups (confidence intervals overlap).
Figure C.1: Empirical QQ-Plot ($\operatorname{qq}(u)$). The departure from the identity (dashed line) encodes unobserved confounding.
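The captions above describe NCOs built by thresholding citation counts at a quantile, and effects reported with bootstrap confidence intervals. The following is a rough sketch of those two steps, not the paper's code: the helper names (`quantile_nco`, `did_estimate`, `bootstrap_ci`), the simple additive estimator, the row-level resampling scheme, and the synthetic data are all assumptions made for illustration. The paper's bootstrap on the matched sample may resample differently.

```python
import numpy as np

def quantile_nco(citations, q):
    """Binary NCO: 1 if the citation count exceeds the q-th quantile, else 0."""
    citations = np.asarray(citations, dtype=float)
    threshold = np.quantile(citations, q)
    return (citations > threshold).astype(int)

def did_estimate(a, y, n):
    """Additive NCO adjustment (illustrative): gap in Y minus gap in N."""
    a = a.astype(bool)
    return (y[a].mean() - y[~a].mean()) - (n[a].mean() - n[~a].mean())

def bootstrap_ci(a, y, n, estimator=did_estimate, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling rows with replacement."""
    rng = np.random.default_rng(seed)
    a, y, n = (np.asarray(v) for v in (a, y, n))
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(a), size=len(a))
        if a[idx].min() == a[idx].max():
            continue  # skip resamples that contain only one treatment arm
        draws.append(estimator(a[idx], y[idx], n[idx]))
    alpha = 1.0 - level
    lo, hi = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Synthetic data, for illustration only (not from the ICLR dataset)
rng = np.random.default_rng(1)
a = rng.integers(0, 2, size=300)          # early arXiving indicator
citations = rng.poisson(4 + 2 * a)        # made-up citation counts
y = rng.binomial(1, 0.3 + 0.1 * a)        # made-up acceptance outcomes
n = quantile_nco(citations, 0.75)         # NCO at the 75% quantile
lo, hi = bootstrap_ci(a, y, n)
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")
```

An interval that contains $0$ corresponds to what the captions call an effect that is not significant at the $95\%$ level.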

Theorems & Definitions (1)

Theorem 1: ATET Under QQ Equi-Confounding (Theorem 1 from Sofer2016OnNO)
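
For orientation, the general shape of an identification argument under QQ equi-confounding can be sketched as follows. This is a hedged reconstruction from the standard negative-control setup, not the paper's exact statement: the notation ($Y^{0}$ for the untreated potential outcome, $F_{\cdot \mid A=a}$ for conditional distribution functions) is assumed here, and for discrete outcomes the inverses should be read as quantile functions, which requires extra care.

$$
\begin{aligned}
&\text{QQ equi-confounding:} && F_{Y^{0}\mid A=1}\circ F^{-1}_{Y^{0}\mid A=0}(v) = F_{N\mid A=1}\circ F^{-1}_{N\mid A=0}(v), \quad v\in[0,1],\\
&\text{with } F_{Y^{0}\mid A=0}=F_{Y\mid A=0},\ v = F_{Y\mid A=0}(y): && F_{Y^{0}\mid A=1}(y) = F_{N\mid A=1}\circ F^{-1}_{N\mid A=0}\circ F_{Y\mid A=0}(y),\\
&\text{so that} && \mathrm{ATET} = \mathbb{E}[Y\mid A=1] - \int y \,\mathrm{d}F_{Y^{0}\mid A=1}(y).
\end{aligned}
$$

Every quantity on the right-hand side is a distribution of observed variables, which is why the negative control outcome $N$ can stand in for the unobserved confounding.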
