WebMap -- Large Language Model-assisted Semantic Link Induction in the Web

Shiraj Pokharel, Georg P. Roßrucker, Mario M. Kubek

TL;DR

This paper addresses the inadequacy of conventional web search for research tasks by proposing WebMap extensions that fuse LLM-powered semantic induction with a peer-to-peer overlay of Cluster Files (TRCs). It introduces local term proximity graphs derived from contextual embeddings, enabling more nuanced document clustering and topic signaling than co-occurrence alone. A semantic signpost is built within clusters using an extended HITS framework to identify authorities and hubs (keywords and source topics), guiding directed connections between documents. Subcluster detection via density-based methods and a discussion of limitations (text-centric content, cross-cluster navigation, and distributed data integrity) accompany a roadmap for future multimodal integration and enterprise deployments. Together, these contributions aim to deliver more accurate, navigable, and semantically organized web-scale research support.
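As a rough illustration of the authority/hub idea mentioned above, the following minimal sketch runs plain HITS on a small weighted, directed term graph. It is not the paper's extended variant: the function name `hits`, the edge weights, and the toy terms are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def hits(edges, iterations=50):
    """Plain HITS power iteration on a weighted directed graph.

    edges: dict mapping (source, target) -> positive weight (assumed format).
    Returns (authority, hub) dicts, each L2-normalised.
    """
    nodes = sorted({n for edge in edges for n in edge})
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: weighted sum of hub scores of nodes pointing in.
        new_auth = {n: 0.0 for n in nodes}
        for (src, dst), w in edges.items():
            new_auth[dst] += w * hub[src]
        norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}
        # Hub score: weighted sum of authority scores of nodes pointed to.
        new_hub = {n: 0.0 for n in nodes}
        for (src, dst), w in edges.items():
            new_hub[src] += w * auth[dst]
        norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return auth, hub


# Hypothetical toy term association graph (directed, weighted).
edges = {
    ("web", "search"): 1.0,
    ("engine", "search"): 1.0,
    ("web", "index"): 0.5,
}
authority, hub = hits(edges)
top_authority = max(authority, key=authority.get)  # candidate keyword
top_hub = max(hub, key=hub.get)                    # candidate source topic
```

In this toy graph, the term with the strongest incoming associations ends up as the top authority and the term with the strongest outgoing associations as the top hub, which mirrors the keyword versus source-topic reading described in the summary.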

Abstract

Carrying out research tasks is only inadequately supported, if not hindered, by current web search engines. This paper therefore proposes functional extensions of WebMap, a semantically induced overlay linking structure on the web to inherently facilitate research activities. These add-ons support the dynamic determination and regrouping of document clusters, the creation of a semantic signpost in the web, and the interactive tracing of topics back to their origins.

Paper Structure

This paper contains 11 sections, 2 equations, 3 figures, 1 table, and 1 algorithm.

Figures (3)

  • Figure 1: WebMap's architecture
  • Figure 2: Determining authorities (keywords) and hubs (source topics) in directed term association graphs
  • Figure 3: Multimodal distribution density (according to Bock:74)