
Attraction-Repulsion Swarming: A Generalized Framework of t-SNE via Force Normalization and Tunable Interactions

Jingcheng Lu, Jeff Calder

TL;DR

The proposed ARS data visualization method is not gradient descent on the Kullback-Leibler (KL) divergence, and can be viewed solely as an interacting particle system driven by attraction and repulsion forces, which illustrates that the KL divergence is not an essential part of the t-SNE algorithm.

Abstract

We propose a new method for data visualization based on attraction-repulsion swarming (ARS) dynamics, which we call ARS visualization. ARS is a generalized framework based on viewing the t-distributed stochastic neighbor embedding (t-SNE) visualization technique as a swarm of interacting agents driven by attraction and repulsion. Motivated by recent developments in swarming, we modify the t-SNE dynamics to include a normalization by the total influence. This results in better-posed dynamics in which we can use a time step independent of the data size ($h=1$) and a simple iteration, without the array of optimization tricks employed in t-SNE. ARS also allows the attraction and repulsion kernels to be tuned separately, which gives the user control over the tightness within clusters and the spacing between them in the visualization. In contrast with t-SNE, our proposed ARS data visualization method is not gradient descent on the Kullback-Leibler divergence, and can be viewed solely as an interacting particle system driven by attraction and repulsion forces. We provide theoretical results illustrating how the choice of interaction kernel affects the dynamics, and experimental results that validate our method and compare it to t-SNE on the MNIST and Cifar-10 data sets.
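The idea of normalized attraction and repulsion with separately tunable kernels can be illustrated with a minimal NumPy sketch of one explicit time step. This is not the authors' implementation. The kernel form $(1+r^2)^{-p/2}$, the hypothetical exponents `a` and `b`, and the choice to normalize each force by the row sum of its own interaction weights are assumptions meant to mirror the description of total-influence normalization; they are not claimed to match the paper's ARS$_{a,b}$ parametrization.

```python
import numpy as np


def kernel(sq_dist, power):
    """Hypothetical heavy-tailed kernel (1 + r^2)^(-power/2); power=2 gives the Student-t kernel."""
    return (1.0 + sq_dist) ** (-power / 2.0)


def ars_step(Y, P, a=2.0, b=2.0, h=1.0):
    """One explicit Euler step of an ASSUMED normalized attraction-repulsion system.

    Y    : (N, d) array of embedding positions y_i.
    P    : (N, N) symmetric nonnegative affinities with zero diagonal and a
           positive sum in every row (assumption).
    a, b : hypothetical exponents for the attraction and repulsion kernels.
    h    : time step.
    """
    diff = Y[:, None, :] - Y[None, :, :]      # diff[i, j] = y_i - y_j
    sq = np.sum(diff ** 2, axis=-1)            # squared pairwise distances
    off_diag = 1.0 - np.eye(len(Y))            # mask out self-interaction

    # Attraction: pull y_i toward each y_j, weighted by p_ij times the attraction
    # kernel, then divide by the total attraction weight acting on i.
    w_att = P * kernel(sq, a)
    att = -np.einsum("ij,ijk->ik", w_att, diff) / w_att.sum(axis=1, keepdims=True)

    # Repulsion: push y_i away from every other y_j with the repulsion kernel,
    # divided by the total repulsion weight acting on i.
    w_rep = off_diag * kernel(sq, b)
    rep = np.einsum("ij,ijk->ik", w_rep, diff) / w_rep.sum(axis=1, keepdims=True)

    return Y + h * (att + rep)
```

Under these assumptions, with $h=1$ the attraction term alone moves each $\mathbf{y}_i$ to a weighted average of the points it is attracted to, which gives one intuition for why a step size that does not depend on $N$ can be usable. A run would start from a small random initialization and call `Y = ars_step(Y, P)` repeatedly. The dense $N \times N$ arrays keep the sketch short but limit it to small data sets.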

Paper Structure

This paper contains 15 sections, 1 theorem, 51 equations, 11 figures.

Key Result

Proposition 4.1

Let $\mathbf{y}_i(t)$, $i = 1,2,\cdots,N$, be the solution to the attraction-repulsion dynamics (eq:ARS dynamics) subject to the initial data $\mathbf{y}_i(0) = \mathbf{y}^0_i$. Then the spatial diameter of the solution satisfies
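The bound itself is missing from this summary. To observe diameter behavior numerically along an iteration, a small helper is enough. It assumes the usual swarming definition $D(t) = \max_{i,j} |\mathbf{y}_i(t) - \mathbf{y}_j(t)|$, which has not been checked against the paper's own definition; the function name is hypothetical.

```python
import numpy as np


def spatial_diameter(Y):
    """Largest pairwise Euclidean distance among the rows of an (N, d) array."""
    diff = Y[:, None, :] - Y[None, :, :]    # all pairwise differences y_i - y_j
    sq_dist = np.sum(diff ** 2, axis=-1)     # squared distances, shape (N, N)
    return float(np.sqrt(sq_dist.max()))
```

Recording `spatial_diameter(Y)` after every step of a simulation gives a discrete trace of $D(t)$ that can be compared with whatever bound the proposition states.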

Figures (11)

  • Figure 1: Examples of MNIST and Cifar-10 images.
  • Figure 2: Comparison of ARS and t-SNE on MNIST and Cifar-10.
  • Figure 3: ARS$_{2,2}$ vs. t-SNE applied to 1000 MNIST images of digits $0,1,2,3$, using $100$ steps of early exaggeration with $\alpha=40$. The top row corresponds to iteration $200$ and the bottom row to iteration $1000$. For t-SNE, the time step $h$ shown is used for the gradient descent iterations, while $h=1$ is used during early exaggeration.
  • Figure 4: ARS$_{2,2}$ vs. t-SNE applied to 2000 MNIST images of digits $0,1,2,3$, using $100$ steps of early exaggeration with $\alpha=40$. The top row corresponds to iteration $200$ and the bottom row to iteration $1000$. For t-SNE, the time step $h$ shown is used for the gradient descent iterations, while $h=1$ is used during early exaggeration.
  • Figure 5: t-SNE with uniform learning rate $h = 70$ for both early exaggeration and gradient descent.
  • ...and 6 more figures
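For reference on the t-SNE baseline described in the captions (a time step $h$ for gradient descent, with $h=1$ and exaggeration factor $\alpha$ during early exaggeration), the following sketches plain gradient descent on the standard t-SNE objective, whose gradient is $4\sum_j (p_{ij} - q_{ij})(1+|\mathbf{y}_i-\mathbf{y}_j|^2)^{-1}(\mathbf{y}_i - \mathbf{y}_j)$, with $p_{ij}$ scaled by $\alpha$ during exaggeration. It deliberately omits the momentum and per-coordinate gains used in common t-SNE implementations, so it is a simplified baseline rather than the exact configuration behind the figures. The function names are hypothetical, and the defaults are taken from the captions.

```python
import numpy as np


def tsne_gradient(Y, P, alpha=1.0):
    """Standard t-SNE gradient, with P scaled by alpha during early exaggeration."""
    diff = Y[:, None, :] - Y[None, :, :]           # diff[i, j] = y_i - y_j
    w = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))   # Student-t kernel
    np.fill_diagonal(w, 0.0)                       # no self-interaction
    Q = w / w.sum()                                # low-dimensional affinities q_ij
    coeff = (alpha * P - Q) * w
    return 4.0 * np.einsum("ij,ijk->ik", coeff, diff)


def tsne_gd(Y, P, h, n_iter=1000, exag_iters=100, alpha=40.0, h_exag=1.0):
    """Plain gradient descent with early exaggeration; momentum and gains omitted."""
    Y = np.array(Y, dtype=float)                   # work on a float copy
    for t in range(n_iter):
        if t < exag_iters:
            Y -= h_exag * tsne_gradient(Y, P, alpha)   # exaggerated phase
        else:
            Y -= h * tsne_gradient(Y, P)               # ordinary phase
    return Y
```

Here `P` is assumed to be the usual symmetric t-SNE affinity matrix with zero diagonal that sums to one; with $\alpha=1$ the gradient is that of the KL divergence between $P$ and $Q$.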

Theorems & Definitions (3)

  • Remark 2.1
  • Remark 3.1
  • Proposition 4.1