
Suppressing Domain-Specific Hallucination in Construction LLMs: A Knowledge Graph Foundation for GraphRAG and QLoRA on River and Sediment Control Technical Standards

Takato Yasuno

Abstract

This paper addresses the challenge of answering technical questions derived from Japan's River and Sediment Control Technical Standards -- a multi-volume regulatory document covering survey, planning, design, and maintenance of river levees, dams, and sabo structures -- using open-source large language models running entirely on local hardware. We implement and evaluate three complementary approaches: Case A (plain 20B LLM baseline), Case B (8B LLM with QLoRA domain fine-tuning on 715 graph-derived QA pairs), and Case C (20B LLM augmented with a Neo4j knowledge graph via GraphRAG). All three cases use the Swallow series of Japanese-adapted LLMs and are evaluated on a 100-question benchmark spanning 8 technical categories, judged automatically by an independent LLM (Qwen2.5-14B, score 0--3). The key finding is a performance inversion: the 8B QLoRA fine-tuned model (Case B) achieves a judge average of 2.92/3 -- surpassing both the 20B plain baseline (Case A: 2.29/3, $+$0.63) and the 20B GraphRAG approach (Case C: 2.62/3, $+$0.30) -- while responding about 3$\times$ faster than Case A (14.2 s vs. 42.2 s). GraphRAG provides moderate gains ($+$0.33 over baseline) but is outperformed by domain-specific fine-tuning in both quality and efficiency. We document the full engineering pipeline, including knowledge graph construction (200 nodes, 268 relations), QLoRA training data generation from Neo4j relations, training on a single GPU (16 GB VRAM) using unsloth, GGUF Q4_K_M quantisation and Ollama deployment, and the graph retrieval and re-ranking design. High-level engineering lessons are distilled in the main body; implementation pitfalls and toolchain details are documented in Supplementary Materials.
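
The score gaps and the speed-up quoted in the abstract follow directly from the reported averages and latencies. The short check below recomputes them; the variable names are illustrative only, and the numbers are copied from the abstract rather than from any raw result files.

```python
# Judge averages (0-3 scale) and mean latencies as reported in the abstract.
scores = {"A": 2.29, "B": 2.92, "C": 2.62}
latency_s = {"A": 42.2, "B": 14.2}

# Differences in judge average between cases, rounded to two decimals.
gap_b_over_a = round(scores["B"] - scores["A"], 2)  # QLoRA vs. plain baseline
gap_b_over_c = round(scores["B"] - scores["C"], 2)  # QLoRA vs. GraphRAG
gap_c_over_a = round(scores["C"] - scores["A"], 2)  # GraphRAG vs. plain baseline

# Latency ratio of Case A to Case B (the "about 3x faster" figure).
speedup_b_vs_a = latency_s["A"] / latency_s["B"]

print(gap_b_over_a, gap_b_over_c, gap_c_over_a, round(speedup_b_vs_a, 2))
```

The latency ratio comes out slightly under 3, which is why the speed-up is best read as approximate.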

Paper Structure (70 sections, 15 equations, 5 figures, 8 tables)


Figures (5)

  • Figure 1: Knowledge graph schema (Node & Relation Map). The Structural Hierarchy (left, orange) encodes four-level document structure. The Domain Semantics (right, blue) encodes engineering entities and their mutual relations. Dashed grey arrows cross-link domain nodes to structural locations (DESCRIBED_IN, DEFINED_IN). In total: 9 node types, 200 nodes, 11 relation types, 268 relations.
  • Figure 2: GraphRAG inference pipeline (Case C). At query time, keywords extracted from the user question drive five parallel Neo4j Cypher queries whose results are deduplicated and scored. If fewer than 25 hits are returned, an adaptive retry doubles TOP_K and broadens the match to substring search. The top-scoring 80% of records (up to 2,000 chars) are prepended to the plain-LLM prompt before generation by Swallow-20B. Qwen2.5-14B then evaluates the output and returns $s_{\mathrm{J}} \in \{0,1,2,3\}$.
  • Figure 3: Approach trade-off: inference speed vs. answer quality (normalised). Case B (QLoRA FT) occupies the ideal upper-right quadrant, with the highest quality and the fastest inference.
  • Figure 4: Judge score distributions (0--3) for Cases A, B, and C across 100 questions.
  • Figure 5: Evolution of approaches: accuracy improvement across experimental phases. QLoRA FT (Case B) achieves the highest score at the final stage.
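
The retrieval post-processing described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `run_queries` is a hypothetical stand-in for the five Neo4j Cypher queries, the record shape (`id`, `text`, `score`) and the default `top_k` are invented for the example, merging first-pass and retry hits is an assumption, and dropping whole records that would exceed the 2,000-character budget (rather than truncating them) is also an assumption. Only the thresholds (25 hits, doubling TOP_K, 80%, 2,000 chars) come from the caption.

```python
from typing import Callable, Dict, List

MIN_HITS = 25       # retry threshold (from the Figure 2 caption)
KEEP_PERCENT = 80   # keep the top-scoring 80% of records (from the caption)
MAX_CHARS = 2000    # context budget in characters (from the caption)

# Hypothetical record shape: {"id": str, "text": str, "score": float}
Record = Dict[str, object]
# Hypothetical query runner: (keywords, top_k, substring_match) -> records
QueryRunner = Callable[[List[str], int, bool], List[Record]]


def dedupe(records: List[Record]) -> List[Record]:
    """Keep one record per id, preferring the highest score."""
    best: Dict[object, Record] = {}
    for rec in records:
        key = rec["id"]
        if key not in best or rec["score"] > best[key]["score"]:
            best[key] = rec
    return list(best.values())


def retrieve(keywords: List[str], run_queries: QueryRunner, top_k: int = 10) -> List[Record]:
    """First pass with exact matching; one adaptive retry if hits are scarce."""
    hits = dedupe(run_queries(keywords, top_k, False))
    if len(hits) < MIN_HITS:
        # Retry: double TOP_K and broaden to substring matching.
        # Merging with the first-pass hits is an assumption of this sketch.
        hits = dedupe(hits + run_queries(keywords, top_k * 2, True))
    return hits


def build_context(hits: List[Record]) -> str:
    """Rank by score, keep the top 80%, and stay within the character budget."""
    ranked = sorted(hits, key=lambda r: r["score"], reverse=True)
    keep = ranked[: max(1, len(ranked) * KEEP_PERCENT // 100)]
    parts: List[str] = []
    for rec in keep:
        candidate = "\n".join(parts + [str(rec["text"])])
        if len(candidate) > MAX_CHARS:
            break  # assumption: drop records that would overflow the budget
        parts.append(str(rec["text"]))
    return "\n".join(parts)


def build_prompt(question: str, context: str) -> str:
    """Prepend the retrieved context to the plain-LLM prompt."""
    return f"{context}\n\n{question}" if context else question
```

In use, `run_queries` would wrap the actual graph queries, and the string returned by `build_prompt` would be passed to the generator model; the judge step is outside the scope of this sketch.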