ArrowFlow: Hierarchical Machine Learning in the Space of Permutations

Ozgur Yilmaz

Abstract

We introduce ArrowFlow, a machine learning architecture that operates entirely in the space of permutations. Its computational units are ranking filters, learned orderings that compare inputs via Spearman's footrule distance and update through permutation-matrix accumulation, a non-gradient rule rooted in displacement evidence. Layers compose hierarchically: each layer's output ranking becomes the next layer's input, enabling deep ordinal representation learning without any floating-point parameters in the core computation. We connect the architecture to Arrow's impossibility theorem, showing that violations of social-choice fairness axioms (context dependence, specialization, symmetry breaking) serve as inductive biases for nonlinearity, sparsity, and stability. Experiments span UCI tabular benchmarks, MNIST, gene expression cancer classification (TCGA), and preference data, all against GridSearchCV-tuned baselines. ArrowFlow beats all baselines on Iris (2.7% vs. 3.3%) and is competitive on most UCI datasets. A single parameter, polynomial degree, acts as a master switch: degree 1 yields noise robustness (8-28% less degradation), privacy preservation (+0.5pp cost), and missing-feature resilience; higher degrees trade these for improved clean accuracy. ArrowFlow is not designed to surpass gradient-based methods. It is an existence proof that competitive classification is possible in a fundamentally different computational paradigm, one that elevates ordinal structure to a first-class citizen, with natural alignment to integer-only and neuromorphic hardware.
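The encoding pipeline named in the abstract (polynomial expansion, standardization, random projection, argsort; detailed in Figure 1 below) is easy to sketch. The following minimal Python example is an illustration, not the paper's implementation; the output dimension, seed, and use of scikit-learn transforms are assumptions.

```python
# Minimal sketch (out_dim, seed, and sklearn transforms are assumptions)
# of the encoding pipeline from the abstract / Figure 1:
# polynomial expansion -> standardization -> random projection -> argsort.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

def encode(X, degree=1, out_dim=16, seed=0):
    """Map real-valued rows of X to permutations of {0, ..., out_dim-1}."""
    Z = PolynomialFeatures(degree=degree, include_bias=False).fit_transform(X)
    Z = StandardScaler().fit_transform(Z)
    W = np.random.default_rng(seed).standard_normal((Z.shape[1], out_dim))
    return np.argsort(Z @ W, axis=1)  # one integer permutation per row

X = np.random.default_rng(1).normal(size=(5, 4))
print(encode(X, degree=1))  # integer rankings; degree is the "master switch"
```

From here on, the network sees only permutations: all comparisons and updates are the integer operations described below.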

Paper Structure

This paper contains 104 sections, 12 theorems, 24 equations, 6 figures, 29 tables, and 1 algorithm.

Key Result

Proposition 1

Given $d_F(\pi_x, r_j) = \sum_{p=1}^\ell |m_j[p]|$, the displacement $m_j[p]$ from Eq. \ref{eq:motion} is the unique per-position minimizer---the shift that zeroes item $\pi_x[p]$'s footrule contribution: for each $p$,
$$m_j[p] \;=\; \arg\min_{s \in \mathbb{Z}} \bigl|\, \mathrm{rank}(r_j, \pi_x[p]) - (p + s) \,\bigr|.$$
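A minimal Python sketch (ours, not the paper's code) makes the proposition concrete: it computes the motion vector and footrule distance for the input permutation of Figure 2 against an illustrative filter, then verifies numerically that each $m_j[p]$ is the unique shift zeroing that item's contribution.

```python
# Sketch of Proposition 1 (the filter r_j is an illustrative assumption).

def rank_positions(r):
    """Position of each item in ranking r."""
    return {item: p for p, item in enumerate(r)}

def motion(pi_x, r_j):
    """Signed displacement m_j[p] = rank(r_j, pi_x[p]) - p."""
    pos = rank_positions(r_j)
    return [pos[item] - p for p, item in enumerate(pi_x)]

def footrule(pi_x, r_j):
    """d_F(pi_x, r_j) = sum_p |m_j[p]|."""
    return sum(abs(m) for m in motion(pi_x, r_j))

pi_x = ["C", "A", "E", "B", "D"]  # input permutation from Figure 2
r_j = ["A", "B", "C", "D", "E"]   # illustrative filter
m = motion(pi_x, r_j)
print(m, footrule(pi_x, r_j))     # [2, -1, 2, -2, -1] and 8

# Verify: m[p] uniquely zeroes |rank(r_j, pi_x[p]) - (p + s)| over shifts s.
pos = rank_positions(r_j)
for p, item in enumerate(pi_x):
    best = min(range(-len(pi_x), len(pi_x) + 1),
               key=lambda s: abs(pos[item] - (p + s)))
    assert best == m[p]
```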

Figures (6)

  • Figure 1: ArrowFlow pipeline. Real-valued input $x$ is encoded via polynomial expansion, standardization, random projection, and argsort. $K$ independent networks receive different projections $W_k$. Each network (detail, right) is a stack of sort layers computing Spearman's footrule distance to learned filters. Predictions are combined by majority vote.
  • Figure 2: Ranking layer forward pass. Input $\pi_x = [C, A, E, B, D]$ is compared to three filters. The motion $m_j[p] = \mathrm{rank}(r_j, \pi_x[p]) - p$ measures signed displacement (red = forward, blue = backward). The footrule distance $D_j = \sum |m_j[p]|$ totals the displacements. Filters are ranked by proximity: $r_2$ (distance 2) is closest. The output $\pi'$ becomes the input to the next layer. (A code sketch of this forward pass follows the figure list.)
  • Figure 3: Permutation-matrix accumulation. (a) Accumulator starts as identity (encoding current filter $[A,B,C,D]$). (b) Training input $[C,A,D,B]$: each item votes for its position in the input (Def. \ref{def:vote-matrix}). (c) Votes sum additively (Eq. \ref{eq:accum-update}). (d) Weighted average and argsort yield the updated filter $[A,C,B,D]$ (Eq. \ref{eq:accum-reorder}), which has moved toward the training data. Convergence is guaranteed under Proposition \ref{prop:convergence}. (This update step is also reproduced in the sketch after the figure list.)
  • Figure 4: Multi-view ensemble. A single preprocessed input is projected through $K$ different matrices using diverse strategies (target-aware, random, calibrated). Each produces a different permutation---a different ordinal "view." Independent ArrowFlow networks are trained per view; predictions are combined by majority vote. Projection diversity approximates the independence condition of Theorem \ref{thm:ensemble}.
  • Figure 5: Permutation cones. The argsort encoding partitions $\mathbb{R}^d$ into $d!$ convex cones (Weyl chambers), separated by hyperplanes $\{x_i = x_j\}$. Shown for $d=3$: six orderings of three coordinates. All points within a cone map to the same permutation. The dotted circle illustrates the stability radius from Theorem \ref{thm:stability}: perturbations smaller than $\delta_{\min}(x)/2$ cannot cross a boundary.
  • ...and 1 more figure
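As noted in the captions of Figures 2 and 3, the following sketch reproduces both worked examples in plain Python. The forward pass ranks filters by footrule proximity; the accumulation step lets items vote for their observed positions and re-derives the filter by argsort of average position. The contents of filters $r_1$ and $r_3$ are assumptions chosen so that $r_2$ is closest at distance 2, matching Figure 2; the accumulation numbers match Figure 3 exactly.

```python
# Sketch of one forward pass (Fig. 2) and one accumulation step (Fig. 3).
# Filter contents beyond those stated in the captions are assumptions.
import numpy as np

def footrule(pi_x, r):
    pos = {item: p for p, item in enumerate(r)}
    return sum(abs(pos[item] - p) for p, item in enumerate(pi_x))

# --- Forward pass: rank filters by proximity to the input ---
pi_x = ["C", "A", "E", "B", "D"]
filters = {                       # hypothetical filter bank
    "r1": ["A", "B", "C", "D", "E"],
    "r2": ["C", "A", "B", "E", "D"],
    "r3": ["E", "D", "C", "B", "A"],
}
dists = {name: footrule(pi_x, r) for name, r in filters.items()}
pi_out = sorted(dists, key=dists.get)  # -> ['r2', 'r1', 'r3']
print(dists, pi_out)                   # pi_out feeds the next layer

# --- Accumulation: items vote for their positions in the training input ---
items = ["A", "B", "C", "D"]
cur_filter = ["A", "B", "C", "D"]
n = len(items)
acc = np.zeros((n, n))                 # rows: items, cols: positions
for p, item in enumerate(cur_filter):  # (a) identity encoding of filter
    acc[items.index(item), p] = 1.0
train = ["C", "A", "D", "B"]           # (b) training input
for p, item in enumerate(train):       # each item votes for its position
    acc[items.index(item), p] += 1.0   # (c) votes sum additively
avg_pos = acc @ np.arange(n) / acc.sum(axis=1)        # (d) weighted average,
new_filter = [items[i] for i in np.argsort(avg_pos)]  #     then argsort
print(new_filter)                      # -> ['A', 'C', 'B', 'D'] as in Fig. 3
```

Note that the accumulator itself holds only integer counts; division appears only transiently to extract the new argsort order.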

Theorems & Definitions (33)

  • Proposition 1: Motion as Discrete Gradient
  • Definition 2: Vote Matrix
  • Remark 1
  • Lemma 3: Footrule--Kendall Equivalence (Diaconis & Graham, 1977)
  • Theorem 4: Argsort Stability (probed numerically in the sketch after this list)
  • Corollary 5: Gaussian Stability Bound
  • Remark 2
  • Theorem 6: Ordinal Information Capacity
  • ...and 23 more
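Theorem 4 (Argsort Stability) can be probed numerically. The sketch below assumes the reading suggested by the Figure 5 caption: $\delta_{\min}(x)$ is the smallest gap between coordinates of $x$, and any perturbation smaller than $\delta_{\min}(x)/2$ leaves $\mathrm{argsort}(x)$ unchanged. We test with the $\ell_\infty$ norm; since $\|\varepsilon\|_\infty \le \|\varepsilon\|_2$, this also covers the Euclidean ball drawn in Figure 5.

```python
# Empirical probe of the argsort stability radius (the norm choice is our
# assumption): any eps with ||eps||_inf < delta_min(x)/2 preserves every
# pairwise order, because |eps_i - eps_j| <= 2*||eps||_inf < delta_min(x).
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.9, 0.1, 0.5])      # a point inside one permutation cone
delta_min = np.diff(np.sort(x)).min()  # smallest coordinate gap
base = tuple(np.argsort(x))

for _ in range(10_000):
    eps = rng.uniform(-1.0, 1.0, size=x.shape)
    eps *= 0.999 * (delta_min / 2) / np.abs(eps).max()  # just inside radius
    assert tuple(np.argsort(x + eps)) == base
print("argsort unchanged for all perturbations within the stability radius")
```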