A Trainable Centrality Framework for Modern Data

Minh Duc Vu, Mingshuo Liu, Doudou Zhou

TL;DR

We address the challenge of defining centrality in high-dimensional and non-Euclidean data by learning a depth-like notion that scales with modern representations. FUSE combines a global anchor-free ranking head with a local, density-based score-matching head, and introduces a homotopy interpolation to balance global structure and local density in a single forward pass. Across synthetic and real data, including images, time series, and text, FUSE recovers classical depth-like orderings, reveals multi-scale geometry, and achieves competitive outlier detection while remaining computationally efficient. The framework operates on fixed representations, enabling fast inference and broad applicability, with future work exploring alternative similarities and multimodal extensions.

Abstract

Measuring how central or typical a data point is underpins robust estimation, ranking, and outlier detection, but classical depth notions become expensive and unstable in high dimensions and are hard to extend beyond Euclidean data. We introduce Fused Unified centrality Score Estimation (FUSE), a neural centrality framework that operates on top of arbitrary representations. FUSE combines a global head, trained on pairwise distance-based comparisons to learn an anchor-free centrality score, with a local head, trained by denoising score matching to approximate a smoothed log-density potential. A single parameter $t \in [0, 1]$ interpolates between these two calibrated signals, yielding depth-like centrality scores from different views in one forward pass. On synthetic distributions, real image, time series, and text data, and standard outlier detection benchmarks, FUSE recovers meaningful classical orderings, reveals multi-scale geometric structure, and performs competitively with strong classical baselines while remaining simple and efficient.
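The interpolation between the two heads can be illustrated with a minimal sketch. The abstract does not give the exact form of the homotopy or of the calibration, so the code below makes assumptions: both heads are calibrated by empirical rank, the blend is a convex combination, and $t = 0$ corresponds to the global head. The function names `rank_calibrate` and `homotopy_score` are hypothetical and are not taken from the paper.

```python
import numpy as np

def rank_calibrate(scores):
    """Map raw scores to (0, 1] by empirical rank (assumed calibration)."""
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort().argsort()  # 0 for the smallest score
    return (ranks + 1) / len(scores)

def homotopy_score(global_scores, local_scores, t):
    """Blend calibrated global and local scores with a parameter t in [0, 1].

    Assumption: a convex combination with t = 0 giving the global view and
    t = 1 the local view; the paper's actual homotopy may differ.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    g = rank_calibrate(global_scores)
    l = rank_calibrate(local_scores)
    return (1.0 - t) * g + t * l

# Toy outputs of the two heads for three points (made-up numbers).
global_raw = [0.1, 0.5, 0.9]
local_raw = [3.0, 1.0, 2.0]
for t in (0.0, 0.5, 1.0):
    print(t, homotopy_score(global_raw, local_raw, t))
```

Rank calibration is used here only because it puts two scores with different units on a common scale; any monotone calibration would serve the same purpose in this sketch.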

Paper Structure

This paper contains 35 sections, 30 equations, 11 figures, 8 tables, 2 algorithms.

Figures (11)

  • Figure 1: Training (top): mapped data $\psi(X)$ pass through a shared encoder and branch into global $g_\theta$ and local $l_\theta$ heads. Inference (bottom): given $(X,t)$, the homotopy outputs a single centrality score $f(X,t)$. The dissimilarity $\delta$ can be defined on raw data or pretrained embeddings.
  • Figure 2: From top to bottom: Homotopy centrality contours on Normal, Student-$t$, Uniform, Gaussian mixture. $t \in \{0, 0.25, 0.5, 0.75, 1\}$.
  • Figure 3: Inference time per sample comparison of methods across increasing dimensions (left) and increasing sample sizes (right).
  • Figure 4: Comparison of centrality methods on CIFAR-10 (airplane).
  • Figure 5: Homotopy centrality on MNIST.
  • ...and 6 more figures