Bridging the Evaluation Gap: Standardized Benchmarks for Multi-Objective Search

Hadar Peer; Carlos Hernandez; Sven Koenig; Ariel Felner; Oren Salzman

Abstract

Empirical evaluation in multi-objective search (MOS) has historically suffered from fragmentation, relying on heterogeneous problem instances with incompatible objective definitions that make cross-study comparisons difficult. This standardization gap is further exacerbated by the realization that DIMACS road networks, a historical default benchmark for the field, exhibit highly correlated objectives that fail to capture diverse Pareto-front structures. To address this, we introduce the first comprehensive, standardized benchmark suite for exact and approximate MOS. Our suite spans four structurally diverse domains: real-world road networks, structured synthetic graphs, game-based grid environments, and high-dimensional robotic motion-planning roadmaps. By providing fixed graph instances, standardized start-goal queries, and both exact and approximate reference Pareto-optimal solution sets, this suite captures a full spectrum of objective interactions: from strongly correlated to strictly independent. Ultimately, this benchmark provides a common foundation to ensure future MOS evaluations are robust, reproducible, and structurally comprehensive.

Paper Structure (51 sections, 7 equations, 5 figures, 2 tables)

Introduction
Notations and Problem Definition
Objective correlation.
Related Work
Exact and Approximate Search Algorithms.
Extensions beyond the classical formulation.
Benchmarking and evaluation practice.
The Benchmark Suite
Benchmark Design Goals
Road Networks
DIMACS Road Networks
Objectives.
Structure.
Objective interaction.
Queries.
...and 36 more sections

Figures (5)

Figure 1: Illustration of the NetMaker graph construction for $\vert V \vert=20$, $a_{\min}=1$, $a_{\max}=4$, and $I_{\text{vertex}}=4$. Blue and red edges correspond to $E_{\mathrm{cyc}}$ and $E_{\mathrm{loc}}$, respectively. Vertices are arranged according to the Hamiltonian cycle for visualization; since locality is defined in vertex-ID space rather than cycle order, some locality edges appear to connect distant vertices in the drawing.
Figure 2: Games benchmark environments. Black regions denote obstacles. Red intensity reflects cost magnitude, with darker red indicating regions observed by multiple guards.
Figure 3: Pareto-front cardinality across queries as a function of the approximation parameter $\varepsilon$ for representative benchmarks from all benchmark families. Each point corresponds to a start–goal query.
Figure 4: (a) Panda RRG benchmark environment: a Franka Emika Panda manipulator in a tabletop scene featuring vertical support pillars and box-shaped obstacles. (b) Illustration of link-specific clearance distances ($d_1$, $d_2$, $d_3$) in a 2D setting. Unlike an aggregated clearance model, which collapses safety into a single minimum-distance penalty (here, $d_2$), a per-link formulation preserves the geometric risk profile across the entire manipulator.
Figure 5: Representative exact two-objective Pareto fronts for single queries from different benchmark families. The examples highlight substantial variation in objective-space geometry across the benchmark suite.
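
Figures 3 and 5 refer to exact Pareto fronts and to their cardinality under an approximation parameter $\varepsilon$. The following is a minimal sketch of both notions, not the benchmark's reference implementation. It assumes minimization, strictly positive costs, and the multiplicative form of $\varepsilon$-dominance that is common in approximate MOS (a vector $a$ covers $b$ if $a_i \le (1+\varepsilon)\, b_i$ for every objective $i$). All function names and the sample cost vectors are hypothetical.

```python
from typing import List, Tuple

Cost = Tuple[float, ...]  # one cost vector per solution path (minimization)


def dominates(a: Cost, b: Cost) -> bool:
    """a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def eps_dominates(a: Cost, b: Cost, eps: float) -> bool:
    """Assumed multiplicative definition: a_i <= (1 + eps) * b_i for all i."""
    return all(x <= (1.0 + eps) * y for x, y in zip(a, b))


def pareto_front(costs: List[Cost]) -> List[Cost]:
    """Exact front: drop duplicates, then drop every dominated vector."""
    unique = sorted(set(costs))
    return [c for c in unique if not any(dominates(o, c) for o in unique)]


def eps_cover(front: List[Cost], eps: float) -> List[Cost]:
    """Greedy subset in which every front vector is eps-dominated by a kept one.

    This is one simple way to build an approximate front; it is not
    guaranteed to be the smallest possible cover.
    """
    kept: List[Cost] = []
    for c in sorted(front):
        if not any(eps_dominates(k, c, eps) for k in kept):
            kept.append(c)
    return kept


# Hypothetical two-objective cost vectors for a single start-goal query.
sample = [(1, 10), (2, 8), (3, 9), (4, 4), (10, 1), (10, 1), (5, 5)]
front = pareto_front(sample)
sizes = {eps: len(eps_cover(front, eps)) for eps in (0.0, 0.5, 10.0)}
```

On this sample the cover shrinks as $\varepsilon$ grows, which is the kind of relationship Figure 3 plots per query. With $\varepsilon = 0$ the cover coincides with the exact front.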
