Kruskal-EDS: Edge Dynamic Stratification

Yves Mercadier

TL;DR

An effective complexity of $\Theta(m + p\cdot(m/k)\log(m/k))$ is proved, where $p \leq k$ is the number of strata actually processed; on sparse graphs or heavy-tailed weight distributions, $p \ll k$ and the algorithm achieves near-$\Theta(m)$ behaviour.

Abstract

We introduce \textbf{Kruskal-EDS} (\emph{Edge Dynamic Stratification}), a distribution-adaptive variant of Kruskal's minimum spanning tree (MST) algorithm that replaces the mandatory $\Theta(m\log m)$ global sort with a three-phase procedure inspired by Birkhoff's ergodic theorem. In Phase 1, a sample of $\sqrt{m}$ edges estimates the weight distribution in $\Theta(\sqrt{m}\log m)$ time. In Phase 2, all $m$ edges are assigned to $k$ strata in $\Theta(m\log k)$ time via binary search on quantile boundaries -- no global sort. In Phase 3, strata are sorted and processed in order; execution halts as soon as $n{-}1$ MST edges are accepted. We prove an effective complexity of $\Theta(m + p\cdot(m/k)\log(m/k))$, where $p \leq k$ is the number of strata actually processed. On sparse graphs or heavy-tailed weight distributions, $p \ll k$ and the algorithm achieves near-$\Theta(m)$ behaviour. We further derive the optimal strata count $k^* = \lceil\sqrt{m/\ln(m+1)}\,\rceil$, balancing partition overhead against intra-stratum sort cost. An extensive benchmark on 14 graph families demonstrates correctness on 12 test cases and practical speedups reaching $\mathbf{10\times}$ in wall-clock time and $\mathbf{33\times}$ in sort operations over standard Kruskal. A 3-dimensional TikZ visualisation of the complexity landscape illustrates the algorithm's adaptive behaviour as a function of graph density and weight distribution skewness.
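The three phases described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `optimal_strata` and `kruskal_eds`, the `(weight, u, v)` edge tuples, the seeded sampler, and the choice of evenly spaced sample quantiles as boundaries are all assumptions made for the example.

```python
import bisect
import math
import random


def optimal_strata(m):
    """k* = ceil(sqrt(m / ln(m + 1))), the strata count quoted in the abstract."""
    return max(1, math.ceil(math.sqrt(m / math.log(m + 1))))


def kruskal_eds(n, edges, k=None, seed=0):
    """Sketch of Kruskal-EDS on vertices 0..n-1 with edges as (weight, u, v).

    Returns (total_weight, accepted_edges, strata_visited).
    """
    m = len(edges)
    if m == 0:
        return 0, [], 0
    k = k or optimal_strata(m)
    rng = random.Random(seed)  # hypothetical: fixed seed for reproducibility

    # Phase 1: sort a sample of about sqrt(m) weights, read off k-1 boundaries.
    s = max(1, math.isqrt(m))
    sample = sorted(w for w, _, _ in rng.sample(edges, s))
    bounds = [sample[(i * s) // k] for i in range(1, k)]

    # Phase 2: binary-search every edge into its stratum (no global sort).
    strata = [[] for _ in range(k)]
    for edge in edges:
        strata[bisect.bisect_left(bounds, edge[0])].append(edge)

    # Phase 3: sort strata lazily, in increasing weight order, with union-find.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst, total, visited = [], 0, 0
    for stratum in strata:
        if len(mst) == n - 1:
            break  # early termination: heavier strata are never sorted
        visited += 1
        stratum.sort()
        for w, u, v in stratum:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                mst.append((w, u, v))
                total += w
                if len(mst) == n - 1:
                    break
    return total, mst, visited
```

Because every weight in stratum $i$ is no larger than any weight in stratum $i+1$, visiting the strata in order and sorting each one reproduces the edge order of standard Kruskal, whatever the quality of the sample; the sample only affects how evenly the strata are filled. For example, `kruskal_eds(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)])` returns a tree of total weight 7.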

Paper Structure (52 sections, 6 theorems, 8 equations, 4 figures, 3 tables, 1 algorithm)

Introduction
The sorting bottleneck.
The ergodic insight.
Contributions.
Background and Related Work
Classical MST Algorithms
Kruskal (1956).
Prim (1957).
Borůvka (1926).
Fredman-Tarjan (1987).
Chazelle (2000).
Linear MST (randomised).
Optimal deterministic MST.
Distribution-Aware Sorting and Partitioning
From Ergodic Sampling to Edge Stratification
...and 37 more sections

Key Result

Proposition 4.1

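By the Dvoretzky-Kiefer-Wolfowitz inequality, for any $\varepsilon>0$:

$$\Pr\Bigl(\sup_{w}\,\bigl|\hat{F}_s(w) - F(w)\bigr| > \varepsilon\Bigr) \leq 2\exp(-2s\varepsilon^2),$$

where $\hat{F}_s$ is the empirical distribution function of the $s$ sampled edge weights and $F$ is the true weight distribution function. With $s = \sqrt{m}$ (the Phase 1 sample size) and $\varepsilon = m^{-1/4}$, the probability that any stratum boundary misclassifies an edge (relative to the true quantile) is at most $2\exp(-2\sqrt{m}\cdot m^{-1/2}) = 2e^{-2}$.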
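The arithmetic behind this bound can be checked numerically. The sketch below evaluates the standard Dvoretzky-Kiefer-Wolfowitz bound $2\exp(-2s\varepsilon^2)$ and assumes the Phase 1 sample size $s = \sqrt{m}$ stated in the abstract; the function names `dkw_bound` and `eds_bound` are hypothetical helpers introduced only for this illustration.

```python
import math


def dkw_bound(s, eps):
    """DKW upper bound 2*exp(-2*s*eps^2) on P(sup |F_hat - F| > eps) for s samples."""
    return 2.0 * math.exp(-2.0 * s * eps * eps)


def eds_bound(m):
    """Bound for the assumed setting s = sqrt(m), eps = m^(-1/4)."""
    return dkw_bound(math.sqrt(m), m ** -0.25)


if __name__ == "__main__":
    for m in (100, 10_000, 1_000_000):
        print(m, eds_bound(m))
```

Under these assumptions the exponent simplifies to $-2\sqrt{m}\cdot m^{-1/2} = -2$ for every $m$, so the bound is the constant $2e^{-2} \approx 0.27$ and does not tighten as the graph grows.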

Figures (4)

Figure 1: The three phases of Kruskal-EDS. Phases 1 and 2 replace the global sort of standard Kruskal. Phase 3 terminates early once $n{-}1$ MST edges are found (red dashed arrow), potentially leaving the heaviest strata entirely unprocessed.
Figure 2: 3D complexity landscape of Kruskal-EDS. The surface shows the theoretical speedup ratio $T_{\mathrm{STD}}/T_{\mathrm{EDS}}$ as a joint function of graph density $\rho$ and weight distribution skewness $\sigma$. Colour encodes speedup: blue $=$ low ($\approx 1\times$), red $=$ high ($\geq 8\times$). Purple spheres mark empirical measurements from Table \ref{tab:benchmark}. The algorithm is most effective in the dense, low-skew regime (red) and near-neutral for sparse, heavily-skewed inputs (blue).
Figure 3: Distribution of MST edges across strata for three weight distributions. On uniform and normal distributions, the MST edges concentrate in the first 3--4 strata ($\approx 70$--$80\%$ in strata 0--3); on power-law, they cluster in strata 2--4 due to the heavy left tail. The dashed line marks the typical early-termination point (EDS halts once cumulative fraction reaches 1).
Figure 4: Sensitivity of EDS to the strata count $k$ (sparse graph, $n=500$, $m=600$, uniform weights). Time decreases as $k$ increases from 2 (too few, large strata) to $k^*\approx100$ (optimal balance), then increases again as the partition overhead dominates. The STD baseline (dashed) is flat at 0.778 ms.

Theorems & Definitions (14)

Definition 3.1: Minimum Spanning Tree
Definition 3.2: Weight distribution and quantiles
Definition 3.3: Stratification
Definition 3.4: Stratum processing count
Proposition 4.1: Quantile estimation accuracy
Theorem 5.1: Correctness of Kruskal-EDS
proof
Remark 5.2
Theorem 5.3: Complexity of Kruskal-EDS
proof
...and 4 more
