Bridging the Evaluation Gap: Standardized Benchmarks for Multi-Objective Search

Hadar Peer, Carlos Hernandez, Sven Koenig, Ariel Felner, Oren Salzman

Abstract

Empirical evaluation in multi-objective search (MOS) has historically suffered from fragmentation, relying on heterogeneous problem instances with incompatible objective definitions that make cross-study comparisons difficult. This standardization gap is further exacerbated by the realization that DIMACS road networks, a historical default benchmark for the field, exhibit highly correlated objectives that fail to capture diverse Pareto-front structures. To address this, we introduce the first comprehensive, standardized benchmark suite for exact and approximate MOS. Our suite spans four structurally diverse domains: real-world road networks, structured synthetic graphs, game-based grid environments, and high-dimensional robotic motion-planning roadmaps. By providing fixed graph instances, standardized start-goal queries, and both exact and approximate reference Pareto-optimal solution sets, this suite captures a full spectrum of objective interactions: from strongly correlated to strictly independent. Ultimately, this benchmark provides a common foundation to ensure future MOS evaluations are robust, reproducible, and structurally comprehensive.

Paper Structure (51 sections, 7 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Illustration of the NetMaker graph construction for $\vert V \vert=20$, $a_{\min}=1$, $a_{\max}=4$, and $I_{\text{vertex}}=4$. Blue and red edges correspond to $E_{\mathrm{cyc}}$ and $E_{\mathrm{loc}}$, respectively. Vertices are arranged according to the Hamiltonian cycle for visualization; since locality is defined in vertex-ID space rather than cycle order, some locality edges appear to connect distant vertices in the drawing.
  • Figure 2: Games benchmark environments. Black regions denote obstacles. Red intensity reflects cost magnitude, with darker red indicating regions observed by multiple guards.
  • Figure 3: Pareto-front cardinality across queries as a function of the approximation parameter $\varepsilon$ for representative benchmarks from all benchmark families. Each point corresponds to a start–goal query.
  • Figure 4: (a) Panda RRG benchmark environment of a Franka Emika Panda manipulator in a tabletop scene featuring vertical support pillars and box-shaped obstacles. (b) Illustration of link-specific clearance distances ($d_1$, $d_2$, $d_3$) in a 2D setting. Unlike an aggregated clearance model, which collapses safety into a single minimum-distance penalty (here, $d_2$), a per-link formulation preserves the geometric risk profile across the entire manipulator.
  • Figure 5: Representative exact two-objective Pareto fronts for single queries from different benchmark families. The examples highlight substantial variation in objective-space geometry across the benchmark suite.
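
The captions for Figures 3 and 5 refer to exact Pareto fronts and to front cardinality as a function of the approximation parameter $\varepsilon$. As a rough illustration of what those quantities mean, the following minimal sketch filters a list of solution cost vectors down to its exact Pareto front and then greedily thins it to an $\varepsilon$-approximate subset. It assumes the multiplicative notion of $\varepsilon$-dominance common in approximate MOS (a cost $u$ $\varepsilon$-dominates $v$ if $u_i \le (1+\varepsilon)\,v_i$ for every objective $i$); the function names are hypothetical, and the code is not taken from the benchmark suite or from any search algorithm evaluated in the paper.

```python
from typing import List, Sequence, Tuple

Cost = Tuple[float, ...]


def dominates(u: Sequence[float], v: Sequence[float], eps: float = 0.0) -> bool:
    """Weak (1 + eps)-dominance: u_i <= (1 + eps) * v_i for every objective i.

    Assumption: all objectives are minimized. With eps = 0 this is plain
    weak Pareto dominance.
    """
    return all(a <= (1.0 + eps) * b for a, b in zip(u, v))


def pareto_front(costs: Sequence[Cost]) -> List[Cost]:
    """Exact front: distinct cost vectors not dominated by any other distinct vector."""
    unique = sorted(set(costs))  # deduplicate, lexicographic order
    return [
        c for c in unique
        if not any(dominates(other, c) for other in unique if other != c)
    ]


def eps_approx_front(costs: Sequence[Cost], eps: float) -> List[Cost]:
    """Greedy subset of the exact front that eps-dominates every input cost vector.

    This is one simple way to build such a subset; it is not guaranteed
    to be the smallest possible one.
    """
    kept: List[Cost] = []
    for c in pareto_front(costs):
        # Keep c only if no already-kept vector covers it within factor (1 + eps).
        if not any(dominates(k, c, eps) for k in kept):
            kept.append(c)
    return kept


if __name__ == "__main__":
    # Hypothetical two-objective costs for a single start-goal query.
    sample = [(1, 10), (2, 8), (3, 9), (4, 4), (10, 1), (2, 8)]
    for eps in (0.0, 0.5, 10.0):
        print(eps, len(eps_approx_front(sample, eps)))
```

Counting the vectors returned for increasing values of `eps` gives the kind of cardinality-versus-$\varepsilon$ curve that Figure 3 plots per query. The quadratic-time filter is chosen for readability, not efficiency.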