Simplifying Data Integration: SLM-Driven Systems for Unified Semantic Queries Across Heterogeneous Databases

Teng Lin

TL;DR

The paper tackles the challenge of unified semantic querying across heterogeneous databases under resource constraints. It presents a lightweight, SLM-driven Retrieval-Augmented Generation framework that fuses semantic-aware heterogeneous graph indexing, topology-enhanced retrieval, and SLM-driven structured data extraction, augmented with semantic entropy for uncertainty quantification. The approach introduces a MiniRAG-inspired indexing strategy, relational-table generation, and semantic operator synthesis to enable robust Multi-Entity QA across diverse data formats, achieving improved accuracy and efficiency. The work demonstrates cost-effective, domain-agnostic applicability, aiming to empower real-time analytics and knowledge-base construction in next-generation database systems.

Abstract

The integration of heterogeneous databases into a unified querying framework remains a critical challenge, particularly in resource-constrained environments. This paper presents a novel Small Language Model (SLM)-driven system that combines advances in lightweight Retrieval-Augmented Generation (RAG) and semantic-aware data structuring to enable efficient, accurate, and scalable query resolution across diverse data formats. By integrating MiniRAG's semantic-aware heterogeneous graph indexing and topology-enhanced retrieval with SLM-powered structured data extraction, our system addresses the limitations of traditional methods in handling Multi-Entity Question Answering (Multi-Entity QA) and complex semantic queries. Experimental results demonstrate superior performance in accuracy and efficiency, while the introduction of semantic entropy as an unsupervised evaluation metric provides robust insights into model uncertainty. This work pioneers a cost-effective, domain-agnostic solution for next-generation database systems.
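As a rough illustration of the semantic entropy idea mentioned above, the following minimal sketch groups several sampled answers into meaning-equivalent clusters and computes the Shannon entropy over the cluster frequencies. It is not the paper's implementation: the function names `semantic_entropy` and `normalized_match` are hypothetical, and the equivalence check here is a simple normalized string match standing in for whatever semantic comparison (for example, an entailment model) a real system would use.

```python
import math
from typing import Callable, List


def semantic_entropy(
    answers: List[str],
    same_meaning: Callable[[str, str], bool],
) -> float:
    """Entropy (in nats) over clusters of meaning-equivalent answers.

    `same_meaning` is an assumed, caller-supplied equivalence test.
    """
    if not answers:
        return 0.0

    # Greedy clustering: compare each answer to the first member of each cluster.
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            if same_meaning(cluster[0], answer):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])

    # Shannon entropy of the empirical cluster distribution.
    total = len(answers)
    return -sum(
        (len(c) / total) * math.log(len(c) / total) for c in clusters
    )


def normalized_match(a: str, b: str) -> bool:
    """Placeholder equivalence test: case- and whitespace-insensitive equality."""
    return a.strip().lower() == b.strip().lower()


if __name__ == "__main__":
    consistent = ["Paris", "paris ", "PARIS"]
    divided = ["Paris", "Lyon", "Paris", "Marseille"]
    print(semantic_entropy(consistent, normalized_match))
    print(semantic_entropy(divided, normalized_match))
```

Under these assumptions, answers that all fall into one cluster yield an entropy of zero, while answers spread across several clusters yield a higher value, which is the sense in which the metric can signal model uncertainty without labeled data.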

Paper Structure

This paper contains 12 sections.