LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models

Aida Kostikova, Zhipin Wang, Deidamea Bajri, Ole Pütz, Benjamin Paaßen, Steffen Eger

TL;DR

A data-driven, semi-automated, bottom-up review of research on the limitations of LLMs (LLLMs) from 2022 to early 2025. It finds that the share of LLM-related papers increases over fivefold in ACL and nearly eightfold in arXiv between 2022 and 2025.

Abstract

Research on large language models (LLMs) has grown rapidly, along with increasing concern about the models' limitations. In this survey, we conduct a data-driven, semi-automated review of research on limitations of LLMs (LLLMs) from 2022 to early 2025 using a bottom-up approach. From a corpus of 250,000 ACL and arXiv papers, we identify 14,648 relevant papers using keyword filtering, LLM-based classification (validated against expert labels), and topic clustering via two approaches, HDBSCAN+BERTopic and LLooM. We find that the share of LLM-related papers increases over fivefold in ACL and nearly eightfold in arXiv between 2022 and 2025. Over the same period, LLLMs research grows even faster, reaching over 30% of LLM papers by 2025. Reasoning remains the most studied limitation, followed by generalization, hallucination, bias, and security. The distribution of topics in the ACL dataset stays relatively stable over time, while arXiv shifts toward security risks, alignment, hallucinations, knowledge editing, and multimodality. We offer a quantitative view of trends in LLLMs research and release a dataset of annotated abstracts and a validated methodology, available at: https://github.com/a-kostikova/LLLMs-Survey.
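
The first stage of the pipeline described in the abstract, keyword filtering, can be pictured with a minimal sketch. The term lists, the function names, and the rule "keep an abstract that mentions both an LLM term and a limitation term" are illustrative assumptions, not the survey's actual keywords or criteria; the released repository is the authoritative source for those.

```python
import re

# Hypothetical term lists, chosen only for illustration.
# Terms are matched as case-insensitive word prefixes, so "bias" also hits "biased".
LLM_TERMS = ["large language model", "llm"]
LIMITATION_TERMS = ["limitation", "hallucinat", "bias", "fail"]


def matches_any(text, terms):
    """Return True if any term occurs in text at the start of a word."""
    return any(
        re.search(r"\b" + re.escape(term), text, re.IGNORECASE)
        for term in terms
    )


def keyword_filter(abstracts):
    """Keep abstracts mentioning both an LLM term and a limitation term."""
    return [
        abstract
        for abstract in abstracts
        if matches_any(abstract, LLM_TERMS)
        and matches_any(abstract, LIMITATION_TERMS)
    ]
```

A filter of this kind only narrows the candidate set cheaply; in the survey, the subsequent LLM-based classification and its validation against expert labels decide which papers count as relevant.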

Paper Structure

This paper contains 49 sections, 20 figures, 23 tables.

Figures (20)

  • Figure 1: Overview of the pipeline for our systematic literature review.
  • Figure 2: Distribution of papers over time in the crawled dataset, showing ACL papers, arXiv papers, and the total count (ACL + arXiv).
  • Figure 3: Comparison of clustering steps in HDBSCAN+BERTopic and LLooM. Both methods take evidence excerpts with appended keyphrases as input. For LLooM, we omit the Distill step, which is typically used to summarize full documents, as our input already consists of concise excerpts.
  • Figure 4: Confusion matrices comparing human agreement and the predictions of Llama-3.1-70b against the final labels. The human agreement matrix is aggregated over all pairwise annotator comparisons, making it symmetric.
  • Figure 5: Trends in LLM and LLM limitation research over time. Panel (i) shows the share of LLM and limitation papers among all crawled papers, while panel (ii) illustrates the proportion of limitation papers within LLM research. Note that the limitation trend in (ii) can rise even if it appears flatter in (i), as (ii) reflects growth relative to LLM research, not all papers.
  • ...and 15 more figures
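
The note in the Figure 5 caption rests on the two panels using different denominators, which a small worked example can make concrete. The yearly counts below are hypothetical placeholders chosen for round numbers, not figures from the survey, and the function name is an assumption.

```python
def shares(total_papers, llm_papers, limitation_papers):
    """Compute the three ratios contrasted in the Figure 5 caption."""
    return {
        # Panel (i): shares among all crawled papers.
        "llm_of_all": llm_papers / total_papers,
        "limitation_of_all": limitation_papers / total_papers,
        # Panel (ii): limitation papers as a share of LLM papers only.
        "limitation_of_llm": limitation_papers / llm_papers,
    }


# Hypothetical counts per year: (all crawled, LLM papers, limitation papers).
hypothetical_counts = {
    2022: (10_000, 500, 50),
    2025: (10_000, 4_000, 1_200),
}

by_year = {year: shares(*counts) for year, counts in hypothetical_counts.items()}

for year, ratios in by_year.items():
    print(year, {name: f"{value:.1%}" for name, value in ratios.items()})
```

With these made-up counts, the limitation share among all papers stays a small slice next to the LLM share in panel (i), while the same limitation papers make up a clearly growing fraction of LLM research in panel (ii).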