CHASE: A Native Relational Database for Hybrid Queries on Structured and Unstructured Data

Rui Ma; Kai Zhang; Zhenying He; Yinan Jing; X. Sean Wang; Zhenqiang Chen

TL;DR

CHASE executes hybrid queries over structured and unstructured data in a native relational engine, with optimizations spanning semantic analysis, plan rewriting, operator design, and code generation. It introduces end-to-end optimizations for three query classes: vector KNN with structured data filter (VKNN-SF), distance-based range with structured data filter (DR-SF), and window vector KNN with structured data filter (W-VKNN-SF). These optimizations include new logical operators (map, updateState) and ANN-aware physical operators, all compiled to efficient machine code. Evaluations on LAION data report speedups of up to 7500x with consistently high recall, compared against PASE, pgvector, VBASE, and LingoDB variants. The work positions CHASE as a practical platform for scalable, accurate hybrid querying in real-world analytics and recommendation.

Abstract

Querying both structured and unstructured data has become a new paradigm in data analytics and recommendation. Unstructured data, such as text and videos, are converted to high-dimensional vectors and queried with approximate nearest neighbor search (ANNS). State-of-the-art database systems implement vector search as a plugin in the relational query engine, which tries to utilize the ANN index to enhance performance. After investigating a broad range of hybrid queries, we find that such designs may miss potential optimization opportunities and achieve suboptimal performance for certain queries. In this paper, we propose CHASE, a query engine that is natively designed to support efficient hybrid queries on structured and unstructured data. CHASE applies targeted designs and optimizations at multiple stages of query processing. First, semantic analysis is performed to categorize queries and optimize query plans dynamically. Second, new physical operators are implemented to avoid the redundant computations that existing operators incur. Third, compilation-based techniques are adopted for efficient machine code generation. Extensive evaluations using real-world datasets demonstrate that CHASE achieves substantial performance improvements, with speedups ranging from 13% to 7500 times compared to existing systems. These results highlight CHASE's potential as a robust solution for executing hybrid queries.
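To make the three hybrid query classes concrete, the following is a minimal brute-force sketch of their reference semantics in Python. It is not CHASE's implementation: CHASE relies on ANN indexes and compiled operators, whereas this sketch scans every row exactly. The function names, the row layout, the L2 distance, and the reading of the window variant as "top-k per partition" are all assumptions made for illustration, inferred from the query class names.

```python
import heapq
import math
from collections import defaultdict


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def vknn_sf(rows, query_vec, k, predicate):
    """VKNN-SF (assumed semantics): the k nearest rows among those passing the filter."""
    candidates = (r for r in rows if predicate(r))
    return heapq.nsmallest(k, candidates, key=lambda r: l2(r["vec"], query_vec))


def dr_sf(rows, query_vec, radius, predicate):
    """DR-SF (assumed semantics): all filtered rows within `radius` of the query."""
    return [r for r in rows if predicate(r) and l2(r["vec"], query_vec) <= radius]


def w_vknn_sf(rows, query_vec, k, predicate, partition_key):
    """W-VKNN-SF (assumed semantics): the k nearest filtered rows per partition."""
    groups = defaultdict(list)
    for r in rows:
        if predicate(r):
            groups[r[partition_key]].append(r)
    return {
        g: heapq.nsmallest(k, members, key=lambda r: l2(r["vec"], query_vec))
        for g, members in groups.items()
    }


# Hypothetical toy table: structured columns plus an embedding column.
rows = [
    {"id": 1, "category": "cat", "price": 10, "vec": (0.0, 0.0)},
    {"id": 2, "category": "cat", "price": 50, "vec": (1.0, 0.0)},
    {"id": 3, "category": "dog", "price": 20, "vec": (0.0, 2.0)},
    {"id": 4, "category": "dog", "price": 5, "vec": (3.0, 4.0)},
]
query = (0.0, 0.0)


def cheap(r):
    """Structured filter: keep rows priced at 20 or less."""
    return r["price"] <= 20


top2 = [r["id"] for r in vknn_sf(rows, query, 2, cheap)]
in_range = [r["id"] for r in dr_sf(rows, query, 2.5, cheap)]
per_cat = {g: [r["id"] for r in rs]
           for g, rs in w_vknn_sf(rows, query, 1, cheap, "category").items()}
```

An exact scan like this serves only as a correctness baseline; the cost of evaluating every distance is what ANN indexes and the plan and operator optimizations described in the abstract are meant to avoid.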

Paper Structure (25 sections, 6 equations, 13 figures, 7 tables, 2 algorithms)

Introduction
Background and Motivation
Background and Advances in Hybrid Queries Processing
Vector KNN with Structured Data Filter Queries
Distance-based Range with Structured Data Filter Queries
Window Vector KNN with Structured Data Filter Queries
Overview
Logical Plan Rewriting
Rewriting KNN-like Queries
Rewriting Entity-Centric VKNN-SF Queries
Rewriting Category-Driven VKNN-SF Queries
Physical Operators Optimization
Map Operator for KNN-like Queries
Index Scan Operator for DR-SF Queries
UpdateState Operator for Category-Driven VKNN-SF Queries
...and 10 more sections

Figures (13)

Figure 1: Hybrid query example
Figure 2: The query plan of PASE
Figure 3: The query plan of VBASE
Figure 4: The query plan of CHASE
Figure 5: Performance comparison for query plan of PASE, VBASE and CHASE
...and 8 more figures
