NLLG Quarterly arXiv Report 09/24: What are the most influential current AI Papers?
Christoph Leiter, Jonas Belouadi, Yanran Chen, Ran Zhang, Daniil Larionov, Aida Kostikova, Steffen Eger
TL;DR
The paper analyzes the most influential AI-related arXiv papers published between 2023-01-01 and 2024-09-30, using week-based, time-normalized citation counts to identify top works across cs.CL, cs.AI, cs.CV, and cs.LG. It contrasts the top-40 papers with a large random sample to study AI-generated content in scientific writing, employing word lists and detectors to track LLM-influenced language. The findings show a surge of new foundation-model papers and a shift toward multimodal architectures. NLP remains the dominant category, though by a shrinking margin, and markers of AI-generated content are less prevalent in top papers than in random samples. The study provides open code and data artifacts, enabling replication and ongoing monitoring of AI research influence and writing practices. Overall, the work highlights emerging architectural paradigms, evolving linguistic markers in technical writing, and the practical importance of tracking research impact in a rapidly changing AI landscape.
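To make "week-based, time-normalized citations" concrete, here is a minimal sketch of one plausible reading: papers are grouped by ISO publication week, and each citation count is converted to a z-score within its week, so that older papers are not favored simply for having had longer to accumulate citations. The z-score formulation, the function name `weekly_z_scores`, and the input layout are assumptions made for illustration; the summary above does not spell out the exact normalization.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev


def weekly_z_scores(papers):
    """Return {paper_id: z-score of its citation count within its publication week}.

    `papers` is a list of (paper_id, publication_date, citation_count) tuples.
    This input layout is hypothetical, chosen only for this sketch.
    """
    # Group papers by (ISO year, ISO week) of publication.
    by_week = defaultdict(list)
    for paper_id, published, citations in papers:
        iso = published.isocalendar()
        by_week[(iso[0], iso[1])].append((paper_id, citations))

    scores = {}
    for items in by_week.values():
        counts = [c for _, c in items]
        mu = mean(counts)
        sigma = pstdev(counts)  # population standard deviation within the week
        for paper_id, c in items:
            # A week with no spread (e.g. a single paper) gets a neutral score.
            scores[paper_id] = (c - mu) / sigma if sigma > 0 else 0.0
    return scores
```

Ranking papers by this score and keeping the 40 highest would give a top-40 list in the spirit of the report; any minimum-citation filters or tie-breaking rules would be further assumptions.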
Abstract
The NLLG (Natural Language Learning & Generation) arXiv reports assist in navigating the rapidly evolving landscape of NLP and AI research across the cs.CL, cs.CV, cs.AI, and cs.LG categories. This fourth installment captures a transformative period in AI history: from January 1, 2023, following ChatGPT's debut, through September 30, 2024. Our analysis reveals substantial new developments in the field, with 45% of the top 40 most-cited papers being new entries since our last report eight months ago, and offers insights into emerging trends and major breakthroughs, such as novel multimodal architectures, including diffusion and state space models. Natural Language Processing (NLP; cs.CL) remains the dominant main category in our list of top-40 papers, but its dominance is declining in favor of computer vision (cs.CV) and general machine learning (cs.LG). This report also presents novel findings on the integration of generative AI in academic writing, documenting its increasing adoption since 2022 while revealing an intriguing pattern: top-cited papers show notably fewer markers of AI-generated content than random samples. Furthermore, we track the evolution of AI-associated language, identifying declining trends in previously common indicators such as "delve".
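As a rough illustration of how the use of an indicator word such as "delve" might be tracked over time, the sketch below counts marker words per 1,000 tokens for each year. The word list `MARKER_WORDS`, the function name, and the simple lowercase tokenization are hypothetical choices for this example, not the report's actual word lists or detectors.

```python
import re
from collections import defaultdict

# Hypothetical marker list for illustration; "delve" is the only indicator named above.
MARKER_WORDS = {"delve", "delves", "delving", "delved"}


def marker_rate_per_year(abstracts):
    """Return {year: marker words per 1,000 tokens}.

    `abstracts` is an iterable of (year, text) pairs (an assumed layout).
    """
    tokens = defaultdict(int)
    hits = defaultdict(int)
    for year, text in abstracts:
        # Crude tokenization: runs of ASCII letters, lowercased.
        words = re.findall(r"[a-z]+", text.lower())
        tokens[year] += len(words)
        hits[year] += sum(1 for w in words if w in MARKER_WORDS)
    # Skip years with no tokens to avoid division by zero.
    return {y: 1000 * hits[y] / tokens[y] for y in tokens if tokens[y]}
```

Computing the same rate separately for the top-40 papers and for a random sample would allow the kind of comparison the abstract describes.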
