Size Should not Matter: Scale-invariant Stress Metrics
Reyan Ahmed, Cesim Erten, Stephen Kobourov, Jonah Lotz, Jacob Miller, Hamlet Taraz
TL;DR
The paper addresses the problem that widely used stress metrics for graph drawings are often scale-sensitive, which can mislead comparisons between layout algorithms. It analyzes eight metrics, derives closed-form, scale-minimizing variants, and shows that scale-invariant metrics, particularly Scale-normalized Stress (SNS), align with intuitive and ground-truth layout quality across two diverse graph sets. In experiments on the Rome-Lib and SuiteSparse datasets using Neato, SFDP, and Random layouts, scale-sensitive metrics can produce incorrect orderings, while SNS reliably recovers the expected ranking and remains fast to compute. The authors advocate adopting scale-normalized stress as the standard metric for fair stress-based evaluation and provide open-source implementations. They also discuss limitations and directions for future work, including re-evaluating prior studies with scale-invariant metrics.
Abstract
The normalized stress metric measures how closely distances between vertices in a graph drawing match the graph-theoretic distances between those vertices. It is one of the most widely employed quality metrics for graph drawing, and is even the optimization goal of several popular graph layout algorithms. However, normalized stress can be misleading when used to compare the outputs of two or more algorithms, as it is sensitive to the size of the drawing compared to the graph-theoretic distances used. Uniformly scaling a layout will change the value of stress despite not meaningfully changing the drawing. In fact, the change in stress values can be so significant that a clearly better layout can appear to have a worse stress score than a random layout. In this paper, we study different variants for calculating stress used in the literature (raw stress, normalized stress, etc.) and show that many of them are affected by this problem, which threatens the validity of experiments that compare the quality of one algorithm to that of another. We then experimentally justify one of the stress calculation variants, scale-normalized stress, as one that fairly compares drawing outputs regardless of their size. We also describe an efficient computation for scale-normalized stress and provide an open-source implementation.
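To make the scale sensitivity concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes normalized stress is the sum over vertex pairs of ((‖x_i − x_j‖ − d_ij) / d_ij)², and it computes scale-normalized stress by evaluating that sum at the uniform scale factor that minimizes it. Setting the derivative with respect to the scale factor to zero gives the factor as Σ(e_ij/d_ij) / Σ(e_ij/d_ij)², where e_ij is the drawn distance. The function names and the three-vertex path example are hypothetical and chosen only for illustration; the sketch also assumes a connected graph and a layout in which not all vertices coincide.

```python
import math
from itertools import combinations


def normalized_stress(pos, dist, alpha=1.0):
    """Sum over pairs of ((alpha * drawn distance - graph distance) / graph distance)^2."""
    total = 0.0
    for i, j in combinations(range(len(pos)), 2):
        drawn = alpha * math.dist(pos[i], pos[j])
        total += ((drawn - dist[i][j]) / dist[i][j]) ** 2
    return total


def optimal_scale(pos, dist):
    """Scale factor minimizing normalized stress (assumed closed form, see lead-in)."""
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(pos)), 2):
        ratio = math.dist(pos[i], pos[j]) / dist[i][j]
        num += ratio
        den += ratio * ratio
    return num / den  # assumes not all vertices are drawn at the same point


def scale_normalized_stress(pos, dist):
    """Normalized stress evaluated at the stress-minimizing uniform scale."""
    return normalized_stress(pos, dist, optimal_scale(pos, dist))


# Hypothetical example: a path on three vertices, drawn on a line.
path_dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]   # graph-theoretic distances
layout = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]   # drawing that matches them exactly
scaled = [(10.0 * x, 10.0 * y) for x, y in layout]  # same drawing, 10x larger

print(normalized_stress(layout, path_dist))
print(normalized_stress(scaled, path_dist))
print(scale_normalized_stress(layout, path_dist))
print(scale_normalized_stress(scaled, path_dist))
```

In this example the two drawings differ only by a uniform scale, yet normalized stress changes from 0 to 243, while the scale-normalized value stays at 0 for both (up to floating-point rounding).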
