Paper Espresso: From Paper Overload to Research Insight

Mingzhe Du, Luu Anh Tuan, Dong Huang, See-kiong Ng

Abstract

The accelerating pace of scientific publishing makes it increasingly difficult for researchers to stay current. We present Paper Espresso, an open-source platform that automatically discovers, summarizes, and analyzes trending arXiv papers. The system uses large language models (LLMs) to generate structured summaries with topical labels and keywords, and provides multi-granularity trend analysis at daily, weekly, and monthly scales through LLM-driven topic consolidation. Over 35 months of continuous deployment, Paper Espresso has processed over 13,300 papers and publicly released all structured metadata, revealing rich dynamics in the AI research landscape: a mid-2025 surge in reinforcement learning for LLM reasoning, non-saturating topic emergence (6,673 unique topics), and a positive correlation between topic novelty and community engagement (2.0x median upvotes for the most novel papers). A live demo is available at https://huggingface.co/spaces/Elfsong/Paper_Espresso.

Paper Structure

This paper contains 26 sections, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: Monthly paper volume: arXiv total (red, left axis) vs. Paper Espresso (blue, right axis). Although Paper Espresso selects only community-trending papers ($\sim$2--3% of arXiv), the two curves exhibit a consistent co-trend, confirming that the curated subset tracks the broader publishing rhythm.
  • Figure 2: System architecture of Paper Espresso. The data ingestion layer fetches papers from the Hugging Face Daily Papers API and arXiv. The AI processing layer uses Google Gemini to generate structured summaries and trend analyses. The presentation layer provides an interactive Streamlit interface with multi-granularity browsing.
  • Figure 3: Bimonthly proportion (%) of the top-10 research topics from May 2023 to March 2026, smoothed with a Gaussian kernel ($\sigma=0.8$) for visual clarity. Trend arrows in the legend indicate each topic's recent trajectory.
  • Figure 4: Community engagement distribution. The histogram (red, left axis) shows a heavily right-skewed upvote distribution; the CDF (blue, right axis) confirms that 50% of papers receive $\le$13 upvotes and 90% receive $\le$52.
  • Figure 5: Topic emergence and diversity. Red bars show the number of new topics each month; the blue line tracks Shannon entropy of the monthly topic distribution, which remains flat around 7.9 bits, confirming sustained diversity.
  • ...and 5 more figures
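
The Shannon entropy reported in the Figure 5 caption can be illustrated with a minimal sketch. It assumes each paper in a month carries one or more topic labels and that entropy is computed over the pooled label frequencies. The function name `shannon_entropy_bits` and the sample labels are hypothetical, and the paper's exact procedure may differ.

```python
import math
from collections import Counter


def shannon_entropy_bits(topic_labels):
    """Shannon entropy (in bits) of the empirical topic distribution.

    topic_labels: iterable of topic strings, one entry per (paper, topic)
    assignment within a single month. This pooling is an assumption.
    """
    counts = Counter(topic_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no labels: define entropy as zero
    # H = -sum_i p_i * log2(p_i), where p_i is the share of topic i
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical labels for one month (not data from the paper).
month_labels = [
    "reinforcement learning", "video generation",
    "multimodal reasoning", "agents",
]
entropy = shannon_entropy_bits(month_labels)
```

For intuition, an entropy of H bits matches a uniform distribution over 2^H topics, so a value near 7.9 bits corresponds to roughly 240 equally weighted topics per month.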