Table of Contents
Fetching ...

Kruskal-EDS: Edge Dynamic Stratification

Yves Mercadier

TL;DR

An effective complexity of $\Theta(m + p\cdot(m/k)\log(m/k)$ is proved, where $p \leq k$ is the number of strata actually processed and the algorithm achieves near-\Theta(m)$ behaviour.

Abstract

We introduce \textbf{Kruskal-EDS} (\emph{Edge Dynamic Stratification}), a distribution-adaptive variant of Kruskal's minimum spanning tree (MST) algorithm that replaces the mandatory $Θ(m\log m)$ global sort with a three-phase procedure inspired by Birkhoff's ergodic theorem. In Phase 1, a sample of $\sqrt{m}$ edges estimates the weight distribution in $Θ(\sqrt{m}\log m)$ time. In Phase 2, all $m$ edges are assigned to $k$ strata in $Θ(m\log k)$ time via binary search on quantile boundaries -- no global sort. In Phase 3, strata are sorted and processed in order; execution halts as soon as $n{-}1$ MST edges are accepted. We prove an effective complexity of $Θ(m + p\cdot(m/k)\log(m/k))$, where $p \leq k$ is the number of strata actually processed. On sparse graphs or heavy-tailed weight distributions, $p \ll k$ and the algorithm achieves near-$Θ(m)$ behaviour. We further derive the optimal strata count $k^* = \lceil\sqrt{m/\ln(m+1)}\,\rceil$, balancing partition overhead against intra-stratum sort cost. An extensive benchmark on 14 graph families demonstrates correctness on 12 test cases and practical speedups reaching $\mathbf{10\times}$ in wall-clock time and $\mathbf{33\times}$ in sort operations over standard Kruskal. A 3-dimensional TikZ visualisation of the complexity landscape illustrates the algorithm's adaptive behaviour as a function of graph density and weight distribution skewness.

Kruskal-EDS: Edge Dynamic Stratification

TL;DR

An effective complexity of is proved, where is the number of strata actually processed and the algorithm achieves near-\Theta(m)$ behaviour.

Abstract

We introduce \textbf{Kruskal-EDS} (\emph{Edge Dynamic Stratification}), a distribution-adaptive variant of Kruskal's minimum spanning tree (MST) algorithm that replaces the mandatory global sort with a three-phase procedure inspired by Birkhoff's ergodic theorem. In Phase 1, a sample of edges estimates the weight distribution in time. In Phase 2, all edges are assigned to strata in time via binary search on quantile boundaries -- no global sort. In Phase 3, strata are sorted and processed in order; execution halts as soon as MST edges are accepted. We prove an effective complexity of , where is the number of strata actually processed. On sparse graphs or heavy-tailed weight distributions, and the algorithm achieves near- behaviour. We further derive the optimal strata count , balancing partition overhead against intra-stratum sort cost. An extensive benchmark on 14 graph families demonstrates correctness on 12 test cases and practical speedups reaching in wall-clock time and in sort operations over standard Kruskal. A 3-dimensional TikZ visualisation of the complexity landscape illustrates the algorithm's adaptive behaviour as a function of graph density and weight distribution skewness.
Paper Structure (52 sections, 6 theorems, 8 equations, 4 figures, 3 tables, 1 algorithm)

This paper contains 52 sections, 6 theorems, 8 equations, 4 figures, 3 tables, 1 algorithm.

Key Result

Proposition 4.1

By the Dvoretzky-Kiefer-Wolfowitz inequality, for any $\varepsilon>0$: With $s = 2\sqrt{m}$ and $\varepsilon = m^{-1/4}$, the probability that any stratum boundary misclassifies an edge (relative to the true quantile) is at most $2\exp(-2\sqrt{m}\cdot m^{-1/2}) = 2e^{-2}$.

Figures (4)

  • Figure 1: The three phases of Kruskal-EDS. Phases 1 and 2 replace the global sort of standard Kruskal. Phase 3 terminates early once $n{-}1$ MST edges are found (red dashed arrow), potentially leaving the heaviest strata entirely unprocessed.
  • Figure 2: 3D complexity landscape of Kruskal-EDS. The surface shows the theoretical speedup ratio $T_{\mathrm{STD}}/T_{\mathrm{EDS}}$ as a joint function of graph density $\rho$ and weight distribution skewness $\sigma$. Colour encodes speedup: blue $=$ low ($\approx 1\times$), red $=$ high ($\geq 8\times$). Purple spheres mark empirical measurements from \ref{['tab:benchmark']}. The algorithm is most effective in the dense, low-skew regime (red) and near-neutral for sparse, heavily-skewed inputs (blue).
  • Figure 3: Distribution of MST edges across strata for three weight distributions. On uniform and normal distributions, the MST edges concentrate in the first 3--4 strata ($\approx 70$--$80\%$ in strata 0--3); on power-law, they cluster in strata 2--4 due to the heavy left tail. The dashed line marks the typical early-termination point (EDS halts once cumulative fraction reaches 1).
  • Figure 4: Sensitivity of EDS to the strata count $k$ (sparse graph, $n=500$, $m=600$, uniform weights). Time decreases as $k$ increases from 2 (too few, large strata) to $k^*\approx100$ (optimal balance), then increases again as the partition overhead dominates. The STD baseline (dashed) is flat at 0.778 ms.

Theorems & Definitions (14)

  • Definition 3.1: Minimum Spanning Tree
  • Definition 3.2: Weight distribution and quantiles
  • Definition 3.3: Stratification
  • Definition 3.4: Stratum processing count
  • Proposition 4.1: Quantile estimation accuracy
  • Theorem 5.1: Correctness of Kruskal-EDS
  • proof
  • Remark 5.2
  • Theorem 5.3: Complexity of Kruskal-EDS
  • proof
  • ...and 4 more