$\boldsymbol{Steiner}$-Hardness: A Query Hardness Measure for Graph-Based ANN Indexes

Zeyu Wang; Qitong Wang; Xiaoxing Cheng; Peng Wang; Themis Palpanas; Wei Wang

TL;DR

This work addresses the instability of query performance in graph-based ANN indexes by introducing Steiner-hardness, a graph-native hardness measure defined as the minimum query effort on a representative graph (MRNG) and computed via a reduction to Directed Steiner Tree (DST) formulations. The authors develop a framework for Minimum Effort (ME), adapt it to greedy graph search, incorporate decision costs, and map the resulting definitions to DST and related problems so that existing Steiner-tree solvers can compute them efficiently. They show that Steiner-hardness correlates more strongly with actual query effort than prior measures such as LID, and they use it to generate unbiased workloads whose index rankings differ from those of earlier benchmarks. The practical contribution is a principled, graph-aware metric and benchmark to guide index selection and robustness improvements in real-world high-dimensional similarity search systems.

Abstract

Graph-based indexes are widely used to accelerate approximate similarity search over high-dimensional vectors. However, their performance varies widely across queries, leading to an unstable quality of service for downstream applications. This calls for an effective measure of query hardness on graph indexes. Popular distance-based hardness measures such as LID are ineffective here because they ignore the graph structure. In this paper, we propose $Steiner$-hardness, a novel connection-based, graph-native query hardness measure. Specifically, we first propose a theoretical framework to analyze the minimum query effort on graph indexes and then define $Steiner$-hardness as the minimum effort on a representative graph. Moreover, we prove that $Steiner$-hardness is closely related to the classical Directed $Steiner$ Tree (DST) problem. Building on this, we design a novel algorithm that reduces our problem to DST problems and leverages their solvers to calculate $Steiner$-hardness efficiently. Compared with LID and other similar measures, $Steiner$-hardness shows a significantly better correlation with the actual query effort on various datasets. Additionally, an unbiased evaluation based on $Steiner$-hardness reveals new ranking results, indicating a meaningful direction for enhancing the robustness of graph indexes. This paper has been accepted by PVLDB 2025.
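To make the contrast with distance-based measures concrete, here is a minimal sketch of a LID estimate for a single query. It assumes the widely used maximum-likelihood estimator, $\widehat{LID} = -\left(\frac{1}{k}\sum_{i=1}^{k}\ln\frac{r_i}{r_k}\right)^{-1}$, where $r_i$ is the distance from the query to its $i$-th nearest neighbor; the abstract does not say which estimator the paper compares against, so this choice and the function name `lid_mle` are assumptions made for illustration. Note that the input is only a list of distances: no edge of the graph index enters the computation, which is the limitation the abstract points to.

```python
import math


def lid_mle(knn_distances):
    """Maximum-likelihood LID estimate from a query's k-NN distances.

    Hypothetical helper for illustration; not taken from the paper.
    Only distances are used, so the graph structure plays no role.
    """
    # Drop zero distances (e.g. the query itself) to keep the logarithm defined.
    dists = sorted(d for d in knn_distances if d > 0)
    if len(dists) < 2:
        raise ValueError("need at least two positive distances")
    r_k = dists[-1]  # distance to the farthest of the k neighbors
    mean_log_ratio = sum(math.log(d / r_k) for d in dists) / len(dists)
    if mean_log_ratio == 0:
        # All neighbors are equidistant; the estimate diverges.
        return math.inf
    return -1.0 / mean_log_ratio


if __name__ == "__main__":
    # Synthetic distances mimicking points spread uniformly in an 8-dimensional ball:
    # the i-th of k neighbors then lies at radius (i / k) ** (1 / 8).
    k, dim = 1000, 8
    synthetic = [(i / k) ** (1 / dim) for i in range(1, k + 1)]
    print(f"estimated LID: {lid_mle(synthetic):.2f}")
```

On the synthetic input the estimate lands close to the generating dimension of 8, which is a quick sanity check on the formula rather than a statement about any dataset in the paper.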


Paper Structure (26 sections, 5 theorems, 5 equations, 17 figures, 2 tables, 2 algorithms)


Introduction
Preliminaries
Problem Setting
Query on Graph Indexes
Related Work
ANN indexes
Hardness Measures
Theoretical Study of Query Complexity on Graph Indexes
Characterize the Effort of Answering a Query
Basic ME Definition
Adapt ME for Greedy Search
Incorporate Decision Cost into ME
Map ME Definitions to DST Problems
Calculate ME using Steiner Tree Solvers
Find the Critical Point $\delta_0$
...and 11 more sections

Key Result

Theorem 1

Given a graph $G(V,E)$, a query $q$ with its $k$NN $N_k$,

Figures (17)

Figure 1: Query performance variance on graph indexes. (a) Histograms of NDC to reach 90% recall on the Deep dataset. (b) A real example on a RAG task, where the low recall of hard queries impairs model accuracy.
Figure 2: Comparison of LID and ME on the same dataset.
Figure 3: The correlation between $Steiner$-hardness (b) and NDC to reach 90% recall is much stronger than LID (a).
Figure 4: An illustrative example of our ME definitions with $k$=5. The orange points and edges form $Y$. (a) $Acc$=100%, (b) $Acc$=80%, (c) entry point is limited to be in $N_k$, (d) $p$=0.4, (e) Limited range of candidates, (f) $ME-exhaustive$.
Figure 5: Query time breakdown on 1,000 queries ($k$=50).
...and 12 more figures

Theorems & Definitions (14)

Definition 1: $k$NN Query
Definition 2: $ME@Acc$
Definition 3: $ME_{\delta}^p@Acc$
Definition 4: critical point $\delta_0$
Definition 5: Decision cost
Definition 6: $ME_{\delta}^p@Acc-exhaustive$
Definition 7: Directed Steiner Tree (DST)
Theorem 1
Definition 8: vertex-focused Directed Steiner Network (vDSN)
Theorem 2
...and 4 more
