Mapping Literature Landscapes with Data-Driven Discovery: A Case Study on MOEA/D
Mingyu Huang, Shasha Zhou, Ke Li
TL;DR
LitLA presents a scalable, data-driven workflow for mapping large-scale literature. It constructs a heterogeneous bibliographic knowledge graph and applies a multi-perspective analysis pipeline (descriptive statistics, topic modeling, citation and collaboration network analysis, and future trend forecasting). Using MOEA/D as a case study, it assembles a landscape of 5,404 papers, 10,532 researchers, 432 venues, 78,490 keywords, and 1,661 institutions, enabling insights into topics, collaborations, and their evolution over time. The work demonstrates that data-driven landscape analysis can complement traditional reviews, reveal emerging directions (e.g., LLM-related MOEA/D applications), and offer actionable foresight for researchers and funders, while acknowledging limitations related to data sources and model choices. Overall, LitLA provides a versatile toolkit for systematic, scalable exploration of scientific domains beyond MOEA/D, with the potential to accelerate discovery and strategic planning in fast-growing fields.
Abstract
We are living in an era of "big literature", in which the scientific literature is expanding exponentially. While this growth presents new opportunities, it also makes mapping global research landscapes difficult, since manual review becomes infeasible at this scale. Recent advances in machine learning, complex networks, and natural language processing have enabled numerous data-driven discovery methods. Building upon these tools, we introduce LitLA, an end-to-end workflow for analyzing large-scale literature landscapes. The workflow first integrates diverse publication metadata into a bibliographic knowledge graph (KG) that represents the research landscape. It then offers tools for exploratory analysis of various aspects of this landscape. We demonstrate the effectiveness of LitLA through a case study of follow-up work on the multi-objective evolutionary algorithm based on decomposition (MOEA/D). In doing so, we construct the MOEA/D research landscape as a KG comprising over 5,400 papers, 10,000 authors, 1,600 institutions, and 78,000 keywords. Using this landscape, we begin with descriptive statistics, then investigate prominent topics pertaining to MOEA/D and interrogate their spatial-temporal and bilateral relationships. We then map the collaboration and citation networks to reveal how the community has grown over time. Finally, we investigate whether learning the latent patterns of this landscape can hint at future research directions.
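The first step of the workflow, integrating publication metadata into a bibliographic KG, can be illustrated with a small sketch. It is a minimal example under stated assumptions, not LitLA's implementation: the record fields (`id`, `authors`, `venue`, `keywords`, `cites`), the relation names, and the helpers `build_kg`, `coauthorship`, and `citation_counts` are all hypothetical, and the three toy records are invented purely for illustration. The KG is stored as typed (head, relation, tail) triples, from which two of the views mentioned above are derived: a weighted co-authorship network projected from author-paper edges, and per-paper citation counts.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy metadata records; the field names are assumptions, not LitLA's schema.
papers = [
    {"id": "P1", "authors": ["A. Smith", "B. Lee"], "venue": "Venue X",
     "keywords": ["decomposition", "many-objective"], "cites": []},
    {"id": "P2", "authors": ["B. Lee", "C. Wang"], "venue": "Venue Y",
     "keywords": ["decomposition", "weight vectors"], "cites": ["P1"]},
    {"id": "P3", "authors": ["C. Wang"], "venue": "Venue X",
     "keywords": ["weight vectors"], "cites": ["P1", "P2"]},
]


def build_kg(records):
    """Hypothetical helper: turn records into a set of typed (head, relation, tail) triples."""
    triples = set()
    for p in records:
        paper = ("paper", p["id"])
        for a in p["authors"]:
            triples.add((("author", a), "writes", paper))
        triples.add((paper, "published_in", ("venue", p["venue"])))
        for k in p["keywords"]:
            triples.add((paper, "has_keyword", ("keyword", k)))
        for c in p["cites"]:
            triples.add((paper, "cites", ("paper", c)))
    return triples


def coauthorship(triples):
    """Hypothetical helper: project 'writes' edges onto a weighted author-author network."""
    authors_of = defaultdict(set)  # paper node -> set of author names
    for head, rel, tail in triples:
        if rel == "writes":
            authors_of[tail].add(head[1])
    weights = defaultdict(int)  # (author, author) pair -> number of shared papers
    for names in authors_of.values():
        for a, b in combinations(sorted(names), 2):
            weights[(a, b)] += 1
    return dict(weights)


def citation_counts(triples):
    """Hypothetical helper: count incoming 'cites' edges per paper id."""
    counts = defaultdict(int)
    for _, rel, tail in triples:
        if rel == "cites":
            counts[tail[1]] += 1
    return dict(counts)


kg = build_kg(papers)
print(len(kg))               # 16 triples for the toy records above
print(coauthorship(kg))
print(citation_counts(kg))
```

Keeping every entity type in one triple set mirrors the heterogeneous structure of the KG; a real pipeline would more likely use a graph database or graph library and would also need entity disambiguation (e.g., distinct authors sharing a name), which this sketch ignores.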
