PIPE-RDF: An LLM-Assisted Pipeline for Enterprise RDF Benchmarking

Suraj Ranganath

TL;DR

PIPE-RDF is a three-phase pipeline that constructs schema-specific NL-SPARQL benchmarks using reverse querying, category-balanced template generation, retrieval-augmented prompting, deduplication, and execution-based validation with repair.

Abstract

Enterprises rely on RDF knowledge graphs and SPARQL to expose operational data through natural language interfaces, yet public KGQA benchmarks do not reflect proprietary schemas, prefixes, or query distributions. We present PIPE-RDF, a three-phase pipeline that constructs schema-specific NL-SPARQL benchmarks using reverse querying, category-balanced template generation, retrieval-augmented prompting, deduplication, and execution-based validation with repair. We instantiate PIPE-RDF on a fixed-schema company-location slice (5,000 companies) derived from public RDF data and generate a balanced benchmark of 450 question-SPARQL pairs across nine categories. The pipeline achieves 100% parse and execution validity after repair, with pre-repair validity rates of 96.5%-100% across phases. We report entity diversity metrics, template coverage analysis, and cost breakdowns to support deployment planning. We release structured artifacts (CSV/JSONL, logs, figures) and operational metrics to support model evaluation and system planning in real-world settings. Code is available at https://github.com/suraj-ranganath/PIPE-RDF.
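
To make the "deduplication, and execution-based validation with repair" step concrete, the following is a minimal sketch of one way such a loop could look. It is not taken from the PIPE-RDF code base: the function names, the `execute` and `repair` callables, the `max_repairs` budget, and the whitespace-only normalization used for deduplication are all assumptions made for illustration. In a real setup `execute` would presumably submit the query to a SPARQL endpoint and `repair` would presumably call an LLM with the error message.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple


def normalize(query: str) -> str:
    """Collapse runs of whitespace so trivially reformatted queries compare equal.

    Case is left untouched because IRIs and literals are case-sensitive.
    """
    return re.sub(r"\s+", " ", query).strip()


def validate_with_repair(
    pairs: Iterable[Tuple[str, str]],
    execute: Callable[[str], list],                      # hypothetical: raises on parse/execution failure
    repair: Callable[[str, str, str], Optional[str]],    # hypothetical: returns a fixed query or None
    max_repairs: int = 2,                                # assumed retry budget
) -> List[Tuple[str, str]]:
    """Keep (question, query) pairs whose query executes, repairing failures when possible."""
    seen = set()
    kept = []
    for question, query in pairs:
        attempts = 0
        current: Optional[str] = query
        while current is not None:
            try:
                execute(current)          # execution-based validation
                break                     # query is valid, stop retrying
            except Exception as err:
                if attempts >= max_repairs:
                    current = None        # give up on this pair
                else:
                    attempts += 1
                    current = repair(question, current, str(err))
        if current is None:
            continue
        key = normalize(current)
        if key in seen:                   # deduplicate on the final (possibly repaired) query
            continue
        seen.add(key)
        kept.append((question, current))
    return kept
```

Passing `execute` and `repair` as callables keeps the sketch independent of any particular triple store or model provider, so the same loop can be exercised with simple stand-in functions.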

Paper Structure (17 sections, 11 figures, 8 tables)

Figures (11)

  • Figure 1: PIPE-RDF workload-construction pipeline: Phase 1 uses reverse query generation to build verified seeds; Phase 2 constructs category-specific seed banks; Phase 3 generates the balanced benchmark with execution-driven validation/repair.
  • Figure 2: Strategy coverage by category on Phase 3 outputs. The matrix shows broad operator coverage, with category-specific concentration on expected strategy patterns (e.g., COUNT for counting, ORDER for superlative, FILTER for difference).
  • Figure 3: Latency by category (stacked for single-column readability): top, LLM latency; bottom, SPARQL execution latency. LLM generation dominates end-to-end runtime.
  • Figure 4: Category-wise metrics: execution success (100%), non-empty rate, and normalized structural complexity.
  • Figure 5: Phase 3 strategy-conditioned error rates. Non-zero cells are concentrated in empty-result outcomes for sparse conjunctive structures.
  • ...and 6 more figures