
NOMAD Projection

Brandon Duderstadt, Zach Nussbaum, Laurens van der Maaten

TL;DR

This paper introduces NOMAD Projection, a distributed nonlinear dimensionality reduction method for unstructured data visualization that can train across multiple GPUs by approximating an upper bound on the InfoNC-t-SNE loss. By leveraging a cluster-based ANN index for positive forces and a surrogate loss using cluster means for negative forces, NOMAD Projection dramatically improves scalability while preserving local and, to some extent, global structure. The authors provide theoretical bounds linking NOMAD Projection to InfoNC-t-SNE and demonstrate strong empirical performance on ArXiv, ImageNet, PubMed, and a 60-million-point Multilingual Wikipedia map, often outperforming or matching GPU-based baselines with significantly reduced wall-clock time. The work enables large-scale, explainable data visualizations and opens avenues for multi-node extensions and broader applications in contrastive learning and language modeling.
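
The mechanism described above, with positive forces along ANN-graph edges and negative forces against cluster means, can be illustrated with a small NumPy sketch that evaluates such a loss. It is not the authors' loss. It assumes an InfoNCE-style objective with the Cauchy kernel, in which the expected similarity to `m` uniformly drawn negatives is approximated by `m` times a size-weighted sum of similarities to cluster means. The value of `m`, the size weighting, and the helper names (`cauchy_kernel`, `cluster_means`, `mean_negative_loss`) are hypothetical, and every cluster is assumed to be non-empty.

```python
import numpy as np

def cauchy_kernel(a, b):
    """Cauchy kernel q(a, b) = 1 / (1 + ||a - b||^2) along the last axis (broadcasts)."""
    return 1.0 / (1.0 + np.sum((a - b) ** 2, axis=-1))

def cluster_means(y, labels, n_clusters):
    """Per-cluster mean embedding and fraction of points in each cluster."""
    counts = np.bincount(labels, minlength=n_clusters).astype(float)
    sums = np.zeros((n_clusters, y.shape[1]))
    np.add.at(sums, labels, y)  # unbuffered scatter-add of each point into its cluster row
    return sums / counts[:, None], counts / counts.sum()

def mean_negative_loss(y, edges, labels, n_clusters, m=5):
    """Hypothetical InfoNCE-style loss whose negatives are cluster means.

    y: (N, d) embedding; edges: (E, 2) positive pairs from the ANN graph;
    labels: (N,) cluster id per point; m: assumed number of negatives per edge.
    """
    mu, w = cluster_means(y, labels, n_clusters)
    yi, yj = y[edges[:, 0]], y[edges[:, 1]]
    q_pos = cauchy_kernel(yi, yj)  # (E,) similarity along ANN edges
    # Approximate E_k[q(y_i, y_k)] over uniform k by sum_r w_r * q(y_i, mu_r).
    q_neg = cauchy_kernel(yi[:, None, :], mu[None, :, :]) @ w  # (E,)
    return float(np.mean(-np.log(q_pos / (q_pos + m * q_neg))))

# Toy usage: 100 random 2-D points in 4 clusters, edges kept inside clusters.
rng = np.random.default_rng(0)
y = rng.normal(size=(100, 2))
labels = np.arange(100) % 4
edges = np.array([[i, (i + 4) % 100] for i in range(100)])
print(mean_negative_loss(y, edges, labels, n_clusters=4))
```

In this sketch the negative term costs one kernel evaluation per cluster rather than one per data point, and the only global state it needs is the small matrix of cluster means.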

Abstract

The rapid adoption of generative AI has driven an explosion in the size of datasets consumed and produced by AI models. Traditional methods for unstructured data visualization, such as t-SNE and UMAP, have not kept up with the pace of dataset scaling. This presents a significant challenge for AI explainability, which relies on methods such as t-SNE and UMAP for exploratory data analysis. In this paper, we introduce Negative Or Mean Affinity Discrimination (NOMAD) Projection, the first method for unstructured data visualization via nonlinear dimensionality reduction that can run on multiple GPUs at train time. We provide theory that situates NOMAD Projection as an approximate upper bound on the InfoNC-t-SNE loss, and empirical results that demonstrate NOMAD Projection's superior performance and speed profile compared to existing state-of-the-art methods. We demonstrate the scalability of NOMAD Projection by computing the first complete data map of Multilingual Wikipedia.

Paper Structure

This paper contains 14 sections, 1 theorem, 15 equations, 4 figures, 1 table.

Key Result

Theorem 1

Let $G$ be an ANN graph formed from a vector dataset. Let $P$ be a probability distribution over directed edges in $G$ with finite moments, $\xi$ be a uniform distribution over all edges in the complete digraph on nodes in $G$, $R$ be a partition of the support of $\xi$, and $q$ be the Cauchy kernel
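
For reference, the Cauchy kernel as used for low-dimensional similarities in t-SNE is given below, writing $y_i, y_j$ for embedding coordinates. This notation is an assumption; the excerpt ends before the paper's own definition.

```latex
q(y_i, y_j) = \frac{1}{1 + \lVert y_i - y_j \rVert_2^2}
```

It equals 1 when two points coincide and decays like $\lVert y_i - y_j \rVert_2^{-2}$ at large distances. This heavy tail distinguishes t-SNE's low-dimensional kernel from a Gaussian.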

Figures (4)

  • Figure 1: A visualization of Multilingual Wikipedia made using NOMAD Projection. Bright regions indicate regions of high data density.
  • Figure 2: NOMAD Projection's Distributed Training Strategy - First, input data is partitioned into clusters $C_1, C_2, ..., C_{|R|}$ during the creation of the ANN index. Clusters are then sharded across devices $D_1, ..., D_{\text{rank}}$. Since each cluster is a component of the ANN graph, no inter-device communication is required during positive spring force calculation. After every epoch, only the matrices of cluster means $\mu_{D_1}, ..., \mu_{D_\text{rank}}$ are all-gathered, minimizing the inter-device communication required for negative spring force calculation.
  • Figure 3: A comparison of the speed and performance of several GPU-accelerated data mapping algorithms. In all cases, NOMAD Projection achieves similar or superior neighborhood preservation and random triplet accuracy to existing methods when run for sufficiently many epochs. Notably, t-SNE-CUDA reaches high neighborhood preservation scores impressively quickly, but struggles to improve further with additional epochs of training. NOMAD Projection can use multiple GPUs to significantly improve its neighborhood preservation at the expense of random triplet accuracy. Note that t-SNE-CUDA does not use techniques for improving global coherence, such as early exaggeration or spectral initialization, which we conjecture lowers its random triplet accuracy.
  • Figure 4: A multiscale qualitative exploration of a NOMAD Projection of Multilingual Wikipedia. A detailed analysis is presented in the paper's section on Multilingual Wikipedia.
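
The distributed strategy in Figure 2 can be simulated on one machine. The sketch below is an illustration under stated assumptions, not the authors' implementation. Simulated devices are plain Python lists, the all-gather is a concatenation, and the round-robin cluster-to-device assignment and helper names (`shard_clusters`, `cluster_owner`, `local_cluster_means`, `all_gather`) are hypothetical. It checks the two properties the caption relies on: every positive edge has both endpoints on one device, and the gathered payload has one row per cluster regardless of dataset size.

```python
import numpy as np

def shard_clusters(n_clusters, n_devices):
    """Assign whole clusters to devices round-robin (hypothetical policy)."""
    return [np.arange(d, n_clusters, n_devices) for d in range(n_devices)]

def cluster_owner(shards, n_clusters):
    """Map each cluster id to the simulated device that holds it."""
    owner = np.empty(n_clusters, dtype=int)
    for device, clusters in enumerate(shards):
        owner[clusters] = device
    return owner

def local_cluster_means(y, labels, clusters):
    """Means of the clusters held by one device; only these rows are communicated."""
    return np.stack([y[labels == c].mean(axis=0) for c in clusters])

def all_gather(per_device):
    """Simulated all-gather: every device receives all devices' rows."""
    full = np.concatenate(per_device, axis=0)
    return [full.copy() for _ in per_device]

# Toy run: 1000 points in 8 equal clusters spread over 4 simulated devices.
rng = np.random.default_rng(0)
n_points, n_clusters, n_devices = 1000, 8, 4
labels = np.arange(n_points) % n_clusters
y = rng.normal(size=(n_points, 2))
shards = shard_clusters(n_clusters, n_devices)
owner = cluster_owner(shards, n_clusters)

# Positive edges join points of the same cluster, so both ends share a device
# and positive forces need no communication.
edges = np.array([[i, (i + n_clusters) % n_points] for i in range(n_points)])
assert np.all(owner[labels[edges[:, 0]]] == owner[labels[edges[:, 1]]])

# After an epoch, only cluster means cross device boundaries.
gathered = all_gather([local_cluster_means(y, labels, s) for s in shards])
print(gathered[0].shape)  # (8, 2): one row per cluster, independent of n_points
```

The gathered rows follow shard order rather than cluster id, which is harmless as long as any per-cluster weights are gathered in the same order. On real hardware the simulated `all_gather` would be replaced by a collective such as PyTorch's `torch.distributed.all_gather`.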
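
Figure 3 scores embeddings by neighborhood preservation and random triplet accuracy. Below is a small NumPy sketch of common definitions of these two metrics. It assumes exact brute-force k-nearest neighbours, squared Euclidean distance, and uniformly sampled triplets of distinct points. The paper's choice of k, number of triplets, and neighbour search may differ, and the function names are hypothetical. The brute-force distance matrix is O(N²) in memory, so this is only suitable for small samples.

```python
import numpy as np

def knn_indices(x, k):
    """Brute-force k nearest neighbours of every row, excluding the row itself."""
    d = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def neighborhood_preservation(x_high, y_low, k=10):
    """Mean fraction of each point's k high-dimensional neighbours kept in the embedding."""
    hi, lo = knn_indices(x_high, k), knn_indices(y_low, k)
    return float(np.mean([len(set(a) & set(b)) / k for a, b in zip(hi, lo)]))

def random_triplet_accuracy(x_high, y_low, n_triplets=10_000, seed=0):
    """Fraction of random triplets (i, j, l) whose 'is j closer to i than l?' answer survives."""
    rng = np.random.default_rng(seed)
    i, j, l = rng.integers(0, len(x_high), size=(3, n_triplets))
    keep = (i != j) & (i != l) & (j != l)  # discard degenerate triplets
    i, j, l = i[keep], j[keep], l[keep]

    def j_closer(z):
        return np.sum((z[i] - z[j]) ** 2, axis=1) < np.sum((z[i] - z[l]) ** 2, axis=1)

    return float(np.mean(j_closer(x_high) == j_closer(y_low)))

# Toy usage: score a naive projection that keeps the first two coordinates.
rng = np.random.default_rng(1)
x = rng.normal(size=(300, 16))
y = x[:, :2]
print(neighborhood_preservation(x, y), random_triplet_accuracy(x, y))
```

Neighborhood preservation rewards local fidelity, while random triplet accuracy probes coarse global ordering, which is why the two can trade off against each other as the caption describes.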

Theorems & Definitions (1)

  • Theorem 1