Mapping Literature Landscapes with Data-Driven Discovery: A Case Study on MOEA/D

Mingyu Huang, Shasha Zhou, Ke Li

TL;DR

LitLA presents a scalable, data-driven workflow for mapping large-scale literature by constructing a heterogeneous bibliographic knowledge graph and applying a multi-perspective analysis pipeline (descriptive statistics, topic modeling, citation and collaboration networks, and future trend forecasting). Using MOEA/D as a case study, it assembles a landscape of 5,404 papers, 10,532 researchers, 432 venues, 78,490 keywords, and 1,661 institutions, enabling insights into topics, collaborations, and evolving patterns. The work demonstrates that data-driven landscape analysis can complement traditional reviews, reveal emergent directions (e.g., LLM-related MOEA/D applications), and offer actionable foresight for researchers and funders, while acknowledging limitations related to data sources and model choices. Overall, LitLA provides a versatile toolkit for systematic, scalable exploration of scientific domains beyond MOEA/D, with potential to accelerate discovery and strategic planning in fast-growing fields.

Abstract

We are living in an era of "big literature", where scientific literature is expanding exponentially. While this growth presents new opportunities, it complicates mapping global scientific research landscapes, as manual review methods become infeasible. Recent advances in machine learning, complex networks, and natural language processing have enabled numerous data-driven discovery methods. Building upon these tools, we introduce LitLA, an end-to-end workflow for analyzing large-scale literature landscapes. This workflow first integrates diverse publication metadata into a bibliographic knowledge graph (KG) representing the research landscape. It then offers tools for exploratory analysis of various aspects of the landscape. We demonstrate the effectiveness of LitLA via a case study on follow-up work on the multi-objective evolutionary algorithm based on decomposition (MOEA/D). In doing so, we constructed the MOEA/D research landscape as a KG comprising over 5,400 papers, 10,000 authors, 1,600 institutions, and 78,000 keywords. With this landscape, we start with descriptive statistics, then investigate prominent topics pertaining to MOEA/D and examine their spatial-temporal and bilateral relationships. We then map the collaboration and citation networks to reveal the community's growth over time. Finally, we test whether learning from latent patterns in this landscape can hint at future research directions.
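
To make the KG construction step concrete, here is a minimal sketch of how publication metadata might be folded into a heterogeneous graph and then projected onto a weighted collaboration network. It is not the LitLA implementation: the record fields (`id`, `authors`, `venue`, `keywords`, `cites`), the relation names, and the helper functions `build_kg` and `coauthor_weights` are all assumptions made for illustration, and the three records are toy data.

```python
from collections import defaultdict
from itertools import combinations

# Toy metadata records; the field names are illustrative assumptions,
# not the schema used by LitLA.
records = [
    {"id": "p1", "authors": ["a1", "a2"], "venue": "v1",
     "keywords": ["decomposition"], "cites": []},
    {"id": "p2", "authors": ["a2", "a3"], "venue": "v1",
     "keywords": ["decomposition", "constraint"], "cites": ["p1"]},
    {"id": "p3", "authors": ["a1", "a2", "a3"], "venue": "v2",
     "keywords": ["many-objective"], "cites": ["p1", "p2"]},
]


def build_kg(records):
    """Return (nodes, edges) for a heterogeneous bibliographic graph.

    nodes maps a node id to its entity type; edges is a set of
    (source, relation, target) triples.
    """
    nodes = {}
    edges = set()
    for rec in records:
        pid = rec["id"]
        nodes[pid] = "paper"
        for author in rec["authors"]:
            nodes[author] = "author"
            edges.add((author, "writes", pid))
        nodes[rec["venue"]] = "venue"
        edges.add((pid, "published_in", rec["venue"]))
        for kw in rec["keywords"]:
            kw_id = "kw:" + kw  # prefix avoids clashes with other node ids
            nodes[kw_id] = "keyword"
            edges.add((pid, "has_keyword", kw_id))
        for cited in rec["cites"]:
            nodes.setdefault(cited, "paper")  # cited paper may lack metadata
            edges.add((pid, "cites", cited))
    return nodes, edges


def coauthor_weights(edges):
    """Project author-paper edges onto a weighted co-authorship network."""
    authors_of = defaultdict(set)
    for source, relation, target in edges:
        if relation == "writes":
            authors_of[target].add(source)
    weights = defaultdict(int)
    for authors in authors_of.values():
        for u, v in combinations(sorted(authors), 2):
            weights[(u, v)] += 1  # one shared paper adds one unit of weight
    return dict(weights)


nodes, edges = build_kg(records)
collab = coauthor_weights(edges)
```

Storing edges as typed triples keeps every relation in a single structure, so the citation network can be extracted the same way by filtering on the `cites` relation.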

Paper Structure

This paper contains 50 sections, 2 equations, 13 figures, 1 table.

Figures (13)

  • Figure 1: Overview of the LitLA workflow, featuring eight primary modules. (a) querying online literature databases for relevant literature. (b) collecting publication metadata from online databases. (c) constructing a bibliographic knowledge graph that represents the literature landscape from the collected metadata, wherein nodes are heterogeneous entities such as authors, papers, and venues, and edges indicate relationships among them. (d) general statistical analysis of the literature landscape, e.g., publications per year and geographical distribution of researchers. (e) topic modeling using paper embeddings and clustering algorithms. (f) citation network analysis that highlights the "skeleton" of existing research and characterizes its growth pattern. (g) collaboration network analysis to reveal collaboration patterns in the community. (h) future trend forecasting by learning from hidden patterns embedded in the research landscape.
  • Figure 2: General information on the surveyed MOEA/D literature. (A) Number of publications per year and its cumulative distribution. (B) Number of authors per year and its cumulative distribution. (C) Frequency of the top-20 subject categories. (D) Ring chart of publication type (inner) and citation intention (outer). (E) Number of publications in the 20 most popular venues.
  • Figure 3: (Top) Geographic distribution of MOEA/D researchers. (Bottom) The number of researchers in the $20$ most active regions.
  • Figure 4: Low-dimensional visualization of the MOEA/D literature landscape by projecting the paper embeddings using UMAP. Papers are colored and labeled by BERTopic topics. Outliers are shown in light gray.
  • Figure 5: (Left) Number of publications per year and (Right) relative percentages for (A) the $7$ topics on MOs variants, (B) the $3$ application domains. For the right panel, we truncated the time range to 2012-2023 as the number of publications before 2012 is relatively small.
  • ...and 8 more figures

Theorems & Definitions (1)

  • Definition 1: Literature Landscape