Monitoring Transformative Technological Convergence Through LLM-Extracted Semantic Entity Triple Graphs
Alexander Sternfeld, Andrei Kucharavy, Dimitri Percia David, Alain Mermoud, Julian Jang-Jaccard, Nathan Monnet
TL;DR
This work tackles the challenge of forecasting transformative Information and Communication Technologies (ICTs) by building a scalable, data-driven pipeline that uses LLMs to extract semantic entity triples from full-text sources and to construct a dynamic knowledge graph of technology concepts. It introduces noun stapling, a method for grouping semantically similar technology terms, together with graph-based convergence metrics to detect emerging patterns of technological convergence. The approach is validated on 278,625 arXiv preprints and 9,793 USPTO patent applications, yielding over 53k key terms and 23.8 million triples. The results reveal both established and emerging convergences, such as retrieval-augmented generation and conversational agents, and show that the method generalizes across scientific and patent data. The framework offers a scalable, interpretable way to monitor transformative potential in fast-moving ICT domains, with implications for proactive technology forecasting and policy planning by researchers, industry, and decision-makers.
Abstract
Forecasting transformative technologies remains a critical but challenging task, particularly in fast-evolving domains such as Information and Communication Technologies (ICTs). Traditional expert-based methods struggle to keep pace with short innovation cycles and ambiguous early-stage terminology. In this work, we propose a novel, data-driven pipeline to monitor the emergence of transformative technologies by identifying patterns of technological convergence. Our approach leverages advances in Large Language Models (LLMs) to extract semantic triples from unstructured text and construct a large-scale graph of technology-related entities and relations. We introduce a new method for grouping semantically similar technology terms (noun stapling) and develop graph-based metrics to detect convergence signals. The pipeline includes multi-stage filtering, domain-specific keyword clustering, and a temporal trend analysis of topic co-occurrence. We validate our methodology on two complementary datasets: 278,625 arXiv preprints (2017–2024) to capture early scientific signals, and 9,793 USPTO patent applications (2018–2024) to track downstream commercial developments. Our results demonstrate that the proposed pipeline can identify both established and emerging convergence patterns, offering a scalable and generalizable framework for technology forecasting grounded in full-text analysis.
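The abstract does not define the convergence metrics themselves. As a purely illustrative sketch, and not the authors' method, one simple proxy for a temporal trend analysis of topic co-occurrence is the yearly share of extracted triples whose subject and object belong to two different keyword clusters, followed by a least-squares slope over the years. The triple layout `(subject, relation, object, year)`, the `term_to_cluster` mapping (a hypothetical stand-in for the output of noun stapling and keyword clustering), and both function names are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical triple layout: (subject, relation, object, publication year).
Triple = Tuple[str, str, str, int]


def cluster_cooccurrence_by_year(
    triples: Iterable[Triple],
    term_to_cluster: Dict[str, str],
    cluster_a: str,
    cluster_b: str,
) -> Dict[int, float]:
    """Per year, the share of triples linking a term of cluster_a with a term of cluster_b.

    term_to_cluster is an assumed mapping from surface terms to cluster labels;
    terms missing from it are treated as unclustered and never count as a link.
    """
    links: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    target = {cluster_a, cluster_b}
    for subj, _rel, obj, year in triples:
        totals[year] += 1
        # Order-insensitive: A->B and B->A triples both count as a link.
        pair = {term_to_cluster.get(subj), term_to_cluster.get(obj)}
        if pair == target:
            links[year] += 1
    return {year: links[year] / totals[year] for year in sorted(totals)}


def trend_slope(series: Dict[int, float]) -> float:
    """Ordinary least-squares slope of the yearly share; a positive value suggests rising co-occurrence."""
    years = list(series)
    n = len(years)
    if n < 2:
        return 0.0  # A trend needs at least two distinct years.
    mean_x = sum(years) / n
    mean_y = sum(series.values()) / n
    num = sum((x - mean_x) * (series[x] - mean_y) for x in years)
    den = sum((x - mean_x) ** 2 for x in years)
    return num / den


if __name__ == "__main__":
    # Toy, made-up data for illustration only.
    clusters = {
        "dense retriever": "retrieval",
        "BM25": "retrieval",
        "LLM": "language models",
        "GPT-4": "language models",
    }
    triples = [
        ("dense retriever", "used_with", "BM25", 2021),
        ("LLM", "generates", "text", 2021),
        ("dense retriever", "augments", "LLM", 2022),
        ("GPT-4", "evaluated_on", "benchmark", 2022),
        ("BM25", "feeds", "GPT-4", 2023),
        ("LLM", "queries", "dense retriever", 2023),
    ]
    series = cluster_cooccurrence_by_year(triples, clusters, "retrieval", "language models")
    print(series)               # {2021: 0.0, 2022: 0.5, 2023: 1.0}
    print(trend_slope(series))  # 0.5
```

Normalizing by each year's total triple count keeps the signal from simply tracking corpus growth; a real pipeline would likely need a more robust statistic than a raw OLS slope, which is sensitive to sparse early years.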
