Structured Linked Data as a Memory Layer for Agent-Orchestrated Retrieval

Andrea Volpini, Elie Raad, Beatrice Gamba, David Riccitelli

TL;DR

This paper investigates whether structured linked data, specifically Schema.org markup and dereferenceable entity pages served by a Linked Data Platform, can improve retrieval accuracy and answer quality in both standard and agentic RAG systems.
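The entity page template itself is not reproduced in this summary. The sketch below is a minimal illustration of the kind of Schema.org JSON-LD such a page could embed. Its three parts are a node with a dereferenceable `@id`, a `BreadcrumbList`, and `relatedLink` URIs that an agent could follow. The base URI `https://example.org/entity` and the helper names `entity_jsonld` and `script_tag` are hypothetical and do not come from the paper.

```python
import json
from typing import List, Tuple

# Hypothetical base URI for the Linked Data Platform (not taken from the paper).
BASE = "https://example.org/entity"


def entity_jsonld(slug: str, name: str, schema_type: str,
                  trail: List[Tuple[str, str]], related: List[str]) -> dict:
    """Build a Schema.org JSON-LD graph for one entity page.

    trail   -- breadcrumb path as (name, slug) pairs, root first
    related -- slugs of related entities, emitted as dereferenceable URIs
    """
    uri = f"{BASE}/{slug}"
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": schema_type,
                "@id": uri,   # dereferenceable: fetching this URI returns the entity page
                "name": name,
                "url": uri,
            },
            {
                "@type": "BreadcrumbList",
                "itemListElement": [
                    {"@type": "ListItem", "position": i, "name": n, "item": f"{BASE}/{s}"}
                    for i, (n, s) in enumerate(trail, start=1)
                ],
            },
            {
                "@type": "WebPage",
                "@id": f"{uri}#webpage",
                "about": {"@id": uri},
                # Links an agent can traverse to reach neighbouring entities.
                "relatedLink": [f"{BASE}/{r}" for r in related],
            },
        ],
    }


def script_tag(node: dict) -> str:
    """Wrap a JSON-LD graph in the script element used to embed it in HTML."""
    return '<script type="application/ld+json">\n' + json.dumps(node, indent=2) + "\n</script>"
```

The example puts the entity, its breadcrumbs, and its page in a single `@graph`. This keeps everything an agent needs in one block. It is one reasonable layout among several that Schema.org allows.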

Abstract

Retrieval-Augmented Generation (RAG) systems typically treat documents as flat text, ignoring the structured metadata and linked relationships that knowledge graphs provide. In this paper, we investigate whether structured linked data, specifically Schema.org markup and dereferenceable entity pages served by a Linked Data Platform, can improve retrieval accuracy and answer quality in both standard and agentic RAG systems. We conduct a controlled experiment across four domains (editorial, legal, travel, e-commerce) using Vertex AI Vector Search 2.0 for retrieval and the Google Agent Development Kit (ADK) for agentic reasoning. Our experimental design tests seven conditions: three document representations (plain HTML, HTML with JSON-LD, and an enhanced agent-optimized entity page) crossed with two retrieval modes (standard RAG and agentic RAG with multi-hop link traversal), plus an Enhanced+ condition that adds rich navigational affordances and entity interlinking. Our results show that JSON-LD markup alone provides only modest improvements. By contrast, our enhanced entity page format, which incorporates llms.txt-style agent instructions, breadcrumbs, and neural search capabilities, achieves substantial gains: a +29.6% accuracy improvement for standard RAG and +29.8% for the full agentic pipeline. The Enhanced+ variant, with richer navigational affordances, achieves the highest absolute scores (accuracy: 4.85/5, completeness: 4.55/5), though its incremental gain over the base enhanced format is not statistically significant. We release our dataset, evaluation framework, and enhanced entity page templates to support reproducibility.
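The seven conditions come from a 3 × 2 factorial design plus one extra cell. The sketch below enumerates them. The labels C1–C6 and C6+ appear in the figure captions, which place plain HTML at C1, enhanced pages at C3, C6 and C6+, and agentic RAG at C6 and C6+. The assignment of C2, C4 and C5 is an assumption: standard RAG comes first, then agentic RAG, each in representation order. `Condition` and `build_conditions` are illustrative names.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

REPRESENTATIONS = ["plain HTML", "HTML + JSON-LD", "enhanced entity page"]
RETRIEVAL_MODES = ["standard RAG", "agentic RAG"]


@dataclass(frozen=True)
class Condition:
    label: str
    representation: str
    retrieval: str


def build_conditions() -> List[Condition]:
    # 3 representations x 2 retrieval modes = 6 crossed conditions.
    # Assumed numbering: C1-C3 use standard RAG, C4-C6 agentic RAG,
    # each group ordered plain -> JSON-LD -> enhanced.
    conditions = [
        Condition(f"C{i}", rep, mode)
        for i, (mode, rep) in enumerate(product(RETRIEVAL_MODES, REPRESENTATIONS), start=1)
    ]
    # The seventh condition sits outside the cross: Enhanced+ pages with agentic RAG.
    conditions.append(Condition("C6+", "enhanced+ entity page", "agentic RAG"))
    return conditions
```

Printing `build_conditions()` yields one row per condition. This makes it easy to attach per-condition scores when reproducing the comparison plots.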

Paper Structure (57 sections, 7 figures, 7 tables)

Figures (7)

  • Figure 1: Before and after: plain HTML (left) vs. enhanced entity page (right) for a sample entity. The enhanced format adds structured breadcrumbs, related entity links with dereferenceable URIs, agent instructions in llms.txt style, and an embedded JSON-LD block, yielding a +29.6% accuracy improvement in standard RAG and +29.8% in the agentic pipeline.
  • Figure 2: System architecture. User queries are processed by a Google ADK agent that orchestrates three tools: vector search over Vertex AI, entity link traversal, and neural search via MCP. Documents are indexed in three formats, which, combined with the two retrieval modes, make up conditions C1–C6. The agent generates grounded answers using a ReAct-style reasoning loop.
  • Figure 3: Mean accuracy and completeness scores by experimental condition. Enhanced entity pages (C3, C6, C6+) dramatically outperform plain HTML and JSON-LD conditions. C6+ achieves the highest scores. Error bars show 95% confidence intervals.
  • Figure 4: Answer quality progression across conditions for the same factual query. C1 (plain HTML) produces a generic answer (1/5), while C6 and C6+ (enhanced entity pages + agentic RAG) follow links to related entities and retrieve comprehensive structured data (5/5). C6+ achieves the same peak score with lower variance.
  • Figure 5: Accuracy improvement waterfall showing the cumulative effect of each optimization layer: JSON-LD markup, agentic retrieval, and enhanced entity pages. The largest gains come from link materialization in enhanced pages, not from adding structured data alone.
  • ...and 2 more figures
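Figures 2 and 4 describe an agent that starts from vector-search hits and then follows related-entity links to further pages. The sketch below approximates only the retrieval side of that loop, under three stated assumptions:

- An in-memory dict stands in for dereferencing URIs on the Linked Data Platform.
- Token overlap stands in for Vertex AI Vector Search.
- No LLM, ADK agent or MCP tool is involved.

`keyword_search`, `traverse` and `gather_context` are hypothetical helpers, not the paper's tools.

```python
from collections import deque
from typing import Callable, Dict, List, Optional

# Hypothetical page shape: {"text": str, "links": [uri, ...]}.
Page = Dict[str, object]


def keyword_search(query: str, pages: Dict[str, Page], k: int = 1) -> List[str]:
    """Stand-in for vector search: rank page URIs by query-token overlap."""
    q = set(query.lower().split())

    def overlap(uri: str) -> int:
        return len(q & set(str(pages[uri]["text"]).lower().split()))

    return sorted(pages, key=overlap, reverse=True)[:k]


def traverse(seeds: List[str], fetch: Callable[[str], Optional[Page]],
             max_hops: int = 2, max_pages: int = 10) -> List[str]:
    """Breadth-first expansion from seed URIs over related-entity links.

    Seeds are hop 0; links found on a hop-h page are queued at hop h + 1,
    and pages at max_hops are read but not expanded further.
    """
    visited: List[str] = []
    seen = set(seeds)
    queue = deque((uri, 0) for uri in seeds)
    while queue and len(visited) < max_pages:
        uri, hop = queue.popleft()
        page = fetch(uri)            # "dereference" the URI
        if page is None:             # dangling link: nothing to read
            continue
        visited.append(uri)
        if hop >= max_hops:          # hop budget exhausted for this branch
            continue
        for link in page.get("links", []):
            if link not in seen:     # skip cycles such as back-links
                seen.add(link)
                queue.append((link, hop + 1))
    return visited


def gather_context(query: str, pages: Dict[str, Page], max_hops: int = 2) -> str:
    """Seed with the top search hit, follow links, and join page texts for an LLM."""
    seeds = keyword_search(query, pages, k=1)
    uris = traverse(seeds, pages.get, max_hops=max_hops)
    return "\n".join(str(pages[u]["text"]) for u in uris)
```

Breadth-first order keeps pages that are closer to the seed ahead of distant ones. The `max_hops` and `max_pages` budgets bound how much context a single query can pull in.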
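The reported gains can be read as relative changes in mean score over a baseline: gain = (s_treatment − s_baseline) / s_baseline × 100. This summary does not state which baseline each percentage uses, so that reading is an assumption. The helper below computes the quantity. It also decomposes a sequence of cumulative scores into the per-layer increments that a waterfall chart like Figure 5 displays. `relative_gain` and `waterfall` are hypothetical names, and no results from the paper are encoded.

```python
from typing import List, Tuple


def relative_gain(baseline: float, treatment: float) -> float:
    """Percentage change of a treatment mean over a baseline mean."""
    if baseline == 0:
        raise ValueError("baseline mean must be non-zero")
    return (treatment - baseline) / baseline * 100.0


def waterfall(stages: List[Tuple[str, float]]) -> List[Tuple[str, float, float]]:
    """Decompose cumulative mean scores into waterfall rows.

    stages[0] is the baseline; each later entry is the score after adding one
    optimization layer. Returns (layer, increment over the previous stage,
    cumulative % gain over the baseline) for every layer after the baseline.
    """
    if len(stages) < 2:
        raise ValueError("need a baseline and at least one layer")
    base = stages[0][1]
    rows = []
    prev = base
    for name, score in stages[1:]:
        rows.append((name, score - prev, relative_gain(base, score)))
        prev = score
    return rows
```

The absolute increments always sum to the total change from the baseline. This property is what lets a waterfall attribute the overall gain to individual layers such as JSON-LD markup, agentic retrieval and enhanced pages.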