Position: AI/ML Influencers Have a Place in the Academic Process

Iain Xie Weissburg, Mehir Arora, Xinyi Wang, Liangming Pan, William Yang Wang

TL;DR

With the volume of AI/ML publications expanding rapidly, the paper examines whether social media influencers can shape discovery and citation patterns. It analyzes two influential curators, builds a large target-control dataset of over 8,000 papers with precise covariate matching, and employs statistical and causal inference, including a negative outcome control, to estimate the effect of influencer sharing on citations. The results show that papers endorsed by these influencers accumulate substantially more citations than comparable papers not shared by influencers, supporting a causal effect that is robust to unobserved confounding. The work highlights implications for scholarly communication, equity, and conference workflows, and advocates responsible curation and broader community discussion about information sharing in AI/ML research.

Abstract

As the number of accepted papers at AI and ML conferences reaches into the thousands, it has become unclear how researchers access and read research publications. In this paper, we investigate the role of social media influencers in enhancing the visibility of machine learning research, particularly the citation counts of papers they share. We have compiled a comprehensive dataset of over 8,000 papers, spanning tweets from December 2018 to October 2023, alongside controls precisely matched by 9 key covariates. Our statistical and causal inference analysis reveals a significant increase in citations for papers endorsed by these influencers, with median citation counts 2-3 times higher than those of the control group. Additionally, the study delves into the geographic, gender, and institutional diversity of highlighted authors. Given these findings, we advocate for a responsible approach to curation, encouraging influencers to uphold the journalistic standard that includes showcasing diverse research topics, authors, and institutions.

Paper Structure (16 sections, 4 equations, 11 figures, 5 tables)

This paper contains 16 sections, 4 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: The number of papers accepted to top AI/ML conferences (solid) and shared by influencers (dashed) from 2014 to 2023 (Li2023ConferenceAR).
  • Figure 2: Mean OpenReview scores of tweeted papers vs. non-tweeted controls from 8 major ML conferences, with the kernel density estimate of the joint distribution of scores and the identity line (dotted). This shows that paper quality in the two sets is roughly equivalent.
  • Figure 3: Distribution of citations in the two experimental datasets and their matched control samples. Citation counts are scaled with the natural logarithm using numpy.log1p. In both comparisons, papers shared by influencers have significantly higher citation counts at all three quartiles than papers in the control sets.
  • Figure 4: 2-Sample Q-Q Plots comparing the experiment and control set distributions across every quantile. To build the plot, citation counts are log-scaled, normalized to the control distribution (z-scores), sorted, and paired in order. The dotted line shows an equal distribution; any points above the line show a higher experimental quantile, and vice versa. The plots show that both experimental distributions are consistently higher, especially closer to the median.
  • Figure 5: Forest plot of ATET and confidence intervals approximated with negative outcome control (NOC). Larger positive values indicate a stronger positive causal effect of the treatment (influencer sharing) on the outcome (a paper being "highly-cited"). To ensure robust results, we vary the quantile threshold for "highly-cited" and "highly-scored" papers; "Unadj." values (red squares) show the effect estimate before applying NOC. No confidence interval contains 0%, which indicates a significant positive causal effect of influencer sharing on paper citations.
  • ...and 6 more figures
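
The captions for Figures 3 and 4 describe the scaling and pairing steps in enough detail to sketch them in code. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the function names `log_quartiles` and `qq_pairs` are hypothetical, the citation counts are synthetic, and the two samples are assumed to have equal size (as with one-to-one matched controls) so that sorted values can be paired in order.

```python
import numpy as np


def log_quartiles(citations):
    """Return the 25th, 50th, and 75th percentiles of log1p-scaled counts."""
    scaled = np.log1p(np.asarray(citations, dtype=float))
    return np.percentile(scaled, [25, 50, 75])


def qq_pairs(experiment, control):
    """Build the (control, experiment) point pairs for a 2-sample Q-Q plot.

    Steps follow the Figure 4 caption: log-scale, normalize both samples to
    the control distribution (z-scores), sort, and pair in order.
    Assumption: both samples have the same number of papers.
    """
    exp = np.log1p(np.asarray(experiment, dtype=float))
    ctl = np.log1p(np.asarray(control, dtype=float))
    if exp.shape != ctl.shape:
        raise ValueError("this sketch assumes equal-size samples")

    # Normalize with the control mean and standard deviation only.
    mu, sigma = ctl.mean(), ctl.std()
    if sigma == 0:
        raise ValueError("control sample has zero variance")

    ctl_z = np.sort((ctl - mu) / sigma)
    exp_z = np.sort((exp - mu) / sigma)
    return ctl_z, exp_z


# Synthetic citation counts, for illustration only (not data from the paper).
control = np.array([0, 1, 3, 7, 15])
experiment = np.array([1, 3, 7, 15, 31])

ctl_z, exp_z = qq_pairs(experiment, control)

# Fraction of paired quantiles lying above the identity line.
frac_above = float(np.mean(exp_z > ctl_z))
print(log_quartiles(control), log_quartiles(experiment), frac_above)
```

Plotting `exp_z` against `ctl_z` together with the identity line gives a plot of the kind the caption describes: points above the line are quantiles where the experimental sample is higher than the control.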