Estimating the Causal Effect of Early ArXiving on Paper Acceptance
Yanai Elazar, Jiayao Zhang, David Wadden, Bo Zhang, Noah A. Smith
TL;DR
The paper asks whether releasing a paper on arXiv before peer review causally affects its likelihood of acceptance at ICLR. It adopts a causal-inference framework that uses matching to adjust for observed confounders and a negative control outcome (NCO), based on post-release citations, to account for unobserved paper quality. The primary analysis suggests a modest positive association between early arXiving and acceptance. This effect shrinks once unobserved confounding is addressed with the NCO, and under quantile-quantile (QQ) equi-confounding it becomes weak or non-significant in many settings. The findings indicate that early arXiving does not confer a substantial advantage, nor one that differs across author groups. This challenges the notion that anonymity periods are needed to ensure fairness. The authors advocate a randomized controlled trial for a more definitive assessment. An epilogue notes policy changes at related venues, underscoring the real-world relevance of understanding how preprint practices influence acceptance.
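Matching on observed confounders can be illustrated with a minimal exact-matching (stratification) sketch. It assumes each paper has already been assigned a discretized stratum key built from observed confounders. The function `exact_matching_att`, the tuple layout, and the toy records are hypothetical and not taken from the paper, whose matching procedure may differ. Within each stratum that contains both early-arXived and other papers, the sketch takes the difference in acceptance rates. It then averages these differences, weighted by the number of early-arXived papers, which targets the average treatment effect on the treated (ATT).

```python
from collections import defaultdict


def exact_matching_att(records):
    """Sketch: ATT of early arXiving on acceptance via exact matching.

    `records` is an iterable of (stratum, treated, accepted) tuples:
      stratum  -- hashable key of discretized observed confounders
                  (hypothetical, e.g. (year, topic, author_citation_bin))
      treated  -- True if the paper was arXived before the review period
      accepted -- 1 if the paper was accepted, 0 otherwise
    """
    # Group acceptance outcomes by stratum and by treatment arm.
    arms_by_stratum = defaultdict(lambda: {True: [], False: []})
    for stratum, treated, accepted in records:
        arms_by_stratum[stratum][bool(treated)].append(accepted)

    weighted_diff = 0.0
    n_matched_treated = 0
    for arms in arms_by_stratum.values():
        treated_y, control_y = arms[True], arms[False]
        # Keep only strata with both arms (common support); drop the rest.
        if not treated_y or not control_y:
            continue
        # Within-stratum difference in acceptance rates.
        diff = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
        # Weight by the number of treated papers so the average targets the ATT.
        weighted_diff += len(treated_y) * diff
        n_matched_treated += len(treated_y)

    if n_matched_treated == 0:
        raise ValueError("no stratum contains both early-arXived and other papers")
    return weighted_diff / n_matched_treated


# Toy, made-up records (not data from the paper).
toy = [
    (("2019", "nlp"), True, 1),
    (("2019", "nlp"), True, 0),
    (("2019", "nlp"), False, 0),
    (("2020", "vision"), True, 1),
    (("2020", "vision"), False, 1),
    (("2020", "vision"), False, 0),
    (("2021", "rl"), True, 1),  # no control paper in this stratum, so it is dropped
]
print(exact_matching_att(toy))  # 0.5
```

Exact matching only removes bias from the confounders encoded in the stratum key. It cannot address unobserved quality, which is what motivates the NCO adjustment.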
Abstract
What is the effect of releasing a preprint of a paper before it is submitted for peer review? No randomized controlled trial has been conducted, so we turn to observational data to answer this question. We use data from the ICLR conference (2018-2022) and apply methods from causal inference to estimate the effect of arXiving a paper before the reviewing period (early arXiving) on its acceptance to the conference. In principle, adjusting for confounders such as topic, authors, and quality would allow us to estimate the causal effect. However, quality is a difficult construct to measure. We therefore use the negative outcome control method, with paper citation count as the control variable, to remove the bias introduced by confounding through quality. Our results suggest that early arXiving may have a small effect on a paper's chances of acceptance. However, this effect (when it exists) does not differ significantly across groups of authors, as grouped by author citation count and institution rank. This suggests that early arXiving does not provide an advantage to any particular group.
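For intuition, the simplest negative-outcome-control identity is the additive equi-confounding case, which reduces to difference-in-differences. In the block below, $A$ denotes early arXiving, $Y$ acceptance, $W$ the citation-count control variable, and $Y(a)$ potential outcomes; these symbols are our own labels, not necessarily the paper's notation. The identity assumes that $A$ does not affect $W$, and that unobserved confounding shifts $Y(0)$ and $W$ by the same amount on a common scale. Because $A$ has no effect on $W$, any difference in $W$ between the two groups is attributed to confounding and subtracted off. The quantile-quantile (QQ) equi-confounding variant relaxes the common-scale requirement and is not shown here.

```latex
\begin{aligned}
&\text{Additive equi-confounding:}\quad
\mathbb{E}[Y(0)\mid A=1]-\mathbb{E}[Y(0)\mid A=0]
  = \mathbb{E}[W\mid A=1]-\mathbb{E}[W\mid A=0] \\[4pt]
&\Rightarrow\quad
\tau_{\mathrm{ATT}} = \mathbb{E}[Y(1)-Y(0)\mid A=1]
  = \big(\mathbb{E}[Y\mid A=1]-\mathbb{E}[Y\mid A=0]\big)
  - \big(\mathbb{E}[W\mid A=1]-\mathbb{E}[W\mid A=0]\big)
\end{aligned}
```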
