Archetypal Analysis++: Rethinking the Initialization Strategy

Sebastian Mair, Jens Sjölund

TL;DR

In an extensive empirical evaluation on 15 real-world data sets of varying sizes and dimensionalities, and under two pre-processing strategies, AA++ almost always outperforms all baselines, including the most frequently used ones.

Abstract

Archetypal analysis is a matrix factorization method with convexity constraints. Due to local minima, a good initialization is essential, but frequently used initialization methods either yield sub-optimal starting points or are prone to getting stuck in poor local minima. In this paper, we propose archetypal analysis++ (AA++), a probabilistic initialization strategy for archetypal analysis that sequentially samples points based on their influence on the objective function, similar to $k$-means++. In fact, we argue that $k$-means++ already approximates the proposed initialization method. Furthermore, we suggest adapting an efficient Monte Carlo approximation of $k$-means++ to AA++. In an extensive empirical evaluation on 15 real-world data sets of varying sizes and dimensionalities, and under two pre-processing strategies, we show that AA++ almost always outperforms all baselines, including the most frequently used ones.
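
The abstract compares AA++ to $k$-means++ and argues that $k$-means++ already approximates it. As a point of reference only, the following is a minimal sketch of standard $k$-means++ seeding (sampling each new seed with probability proportional to its squared distance to the closest seed chosen so far). It is not the authors' AA++ algorithm, which is not reproduced on this page; the function names `squared_distance` and `kmeanspp_seeding` are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeding(points, k, seed=0):
    """Return k indices into `points` chosen by k-means++ (D^2) sampling.

    Illustrative baseline only; AA++ itself is assumed to use a different
    sampling weight that is not shown here.
    """
    rng = random.Random(seed)
    n = len(points)
    # The first seed is drawn uniformly at random.
    chosen = [rng.randrange(n)]
    # Squared distance from every point to its closest seed so far.
    closest = [squared_distance(p, points[chosen[0]]) for p in points]
    while len(chosen) < k:
        if sum(closest) == 0:
            raise ValueError("fewer than k distinct points available")
        # Draw the next seed with probability proportional to `closest`.
        # Already-chosen points have weight zero and cannot be drawn again.
        nxt = rng.choices(range(n), weights=closest, k=1)[0]
        chosen.append(nxt)
        closest = [
            min(c, squared_distance(p, points[nxt]))
            for c, p in zip(closest, points)
        ]
    return chosen


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 5)]
    print(kmeanspp_seeding(data, k=3, seed=1))
```

Points far from all current seeds receive large weights, so the seeds tend to spread out over the data, which is the property the abstract relates to selecting points by their influence on the objective.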

Paper Structure (32 sections, 5 theorems, 12 equations, 17 figures, 1 table, 4 algorithms)

Key Result

Lemma 3.0

Let $\| \mathbf{X} - \mathbf{A} \mathbf{Z} \|_{\operatorname{F}}^2 > 0$, i.e., there are points yielding projection errors. Then, adding a point $\mathbf{x}\in\mathcal{X} \setminus \mathcal{Z}$ to the set of archetypes $\mathcal{Z}$ according to AA++ (Algorithm \ref{alg:AApp}) is guaranteed to decrease th

Figures (17)

  • Figure 1: Archetypal analysis in two dimensions with $k=4$ randomly initialized archetypes $\{\mathbf{z}_1,\ldots,\mathbf{z}_4\}$ shown in orange. The archetypes after optimization are depicted in blue.
  • Figure 2: A comparison of Uniform, FurthestSum, and the proposed AA++ when consecutively initializing $k=4$ archetypes. MSE denotes the mean squared error, i.e., Equation \ref{eq:sum_proj} multiplied by $n^{-1}$.
  • Figure 3: Approximation of the distance function in two dimensions. The true distance of the green point is depicted using a solid line whereas the approximation is shown as a (larger) dashed line. The red point has no distance to the convex hull, but the approximation yields a positive distance.
  • Figure 4: Results on California Housing, Covertype, FMA, KDD-Protein, Pose, RNA, and Song.
  • Figure 5: Aggregated statistics over 15 data sets (seven data sets from above and eight data sets from the appendix). Each table shows how often each initialization method yields the best result for various choices of $k$ under different settings. Best refers to the lowest error of a single seed and median refers to the median over many seeds. We report on the performance after initialization and overall during the optimization.
  • ...and 12 more figures
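
The situation described in the caption of Figure 3 can be reproduced with a small numeric sketch. The caption does not say which approximation is used; the code below assumes it is the distance to the nearest archetype (as in $k$-means++), and it uses only two archetypes so that the convex hull is a line segment with a closed-form projection. The helper names and the example coordinates are hypothetical.

```python
import math


def dist_to_segment(p, a, b):
    """Euclidean distance from the 2-D point p to the segment conv{a, b}."""
    abx, aby = b[0] - a[0], b[1] - a[1]
    denom = abx * abx + aby * aby
    # Position of the orthogonal projection along the segment, clipped to [0, 1].
    t = 0.0 if denom == 0 else ((p[0] - a[0]) * abx + (p[1] - a[1]) * aby) / denom
    t = min(1.0, max(0.0, t))
    projection = (a[0] + t * abx, a[1] + t * aby)
    return math.dist(p, projection)


def dist_to_nearest(p, archetypes):
    """Assumed approximation: distance from p to the closest archetype."""
    return min(math.dist(p, z) for z in archetypes)


if __name__ == "__main__":
    z1, z2 = (0.0, 0.0), (2.0, 0.0)   # two archetypes; their hull is a segment
    outside = (1.0, 1.0)              # plays the role of the green point
    inside = (1.0, 0.0)               # plays the role of the red point
    for name, p in [("outside", outside), ("inside", inside)]:
        print(name, dist_to_segment(p, z1, z2), dist_to_nearest(p, [z1, z2]))
```

For the point outside the hull, the nearest-archetype distance ($\sqrt{2}$) is larger than the true distance ($1$). For the point inside the hull, the true distance is $0$ while the nearest-archetype distance is $1$, matching the behaviour the caption describes.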

Theorems & Definitions (6)

  • Lemma 3.0
  • Proposition 3.0
  • Theorem 4.1: Adapted from bachem2016approximate
  • Lemma A.0
  • proof
  • Proposition A.0