A Survey on Archetypal Analysis
Aleix Alcacer, Irene Epifanio, Sebastian Mair, Morten Mørup
TL;DR
Archetypal Analysis (AA) provides an interpretable, geometry-based framework to represent each observation as a convex combination of a small set of extreme archetypes that lie on the data's convex hull. The paper surveys the mathematical formulation, optimization approaches, extensions (e.g., kernel AA, archetypoids, BiAA), robustness and missing-data strategies, and a wide array of applications across life sciences, physics/chemistry, climate science, computer science, and social sciences. It also discusses practical concerns such as initialization, model-order selection, scalability, and reproducibility, and outlines future directions including non-linear extensions, temporal dynamics, and automated selection of the number of archetypes. Overall, the work positions AA as a versatile, interpretable tool that complements clustering and matrix factorization, while identifying key limitations and open problems for ongoing research.
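The representation described above can be written in a common matrix form. This is a sketch whose notation is an assumption and is not taken from this section. Let the data be X in R^{M×N}, with the N observations as columns. The archetypes are A = XC, where each column of C in R^{N×K} is a convex combination of observations. Each observation is then reconstructed as a convex combination of the K archetypes through S in R^{K×N}:

```latex
\min_{\mathbf{C},\,\mathbf{S}} \;\; \lVert \mathbf{X} - \mathbf{X}\mathbf{C}\mathbf{S} \rVert_F^2
\quad \text{s.t.} \quad
\mathbf{C} \ge 0,\;\; \mathbf{1}^\top \mathbf{C} = \mathbf{1}^\top,\;\;
\mathbf{S} \ge 0,\;\; \mathbf{1}^\top \mathbf{S} = \mathbf{1}^\top
```

The column-sum constraints confine every archetype XC to the convex hull of the data and every reconstruction XCS to the convex hull of the archetypes. The problem is convex in C for fixed S and in S for fixed C, but it is not jointly convex.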
Abstract
Archetypal analysis (AA) was originally proposed in 1994 by Adele Cutler and Leo Breiman as a computational procedure for extracting distinct aspects, so-called archetypes, from observations, with each observational record approximated as a mixture (i.e., convex combination) of these archetypes. AA thereby provides straightforward, interpretable, and explainable representations for feature extraction and dimensionality reduction, facilitating the understanding of the structure of high-dimensional data and enabling a wide range of applications across the sciences. However, AA also faces challenges, particularly because the associated optimization problem is non-convex. This is the first survey to provide researchers and data mining practitioners with an overview of the methodologies and opportunities that AA offers. It covers the many applications of AA across disparate fields of science, best practices for modeling data with AA, and the method's limitations. The survey concludes by outlining crucial future research directions for AA.
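The non-convexity can be illustrated with a minimal sketch that assumes NumPy and the matrix form X ≈ XCS (observations as columns). The sketch uses a simple alternating projected-gradient scheme. It is not presented as Cutler and Breiman's procedure or as the method used in this survey. The functions `project_columns_to_simplex` and `fit_archetypes` are hypothetical helpers written for illustration. Each step solves a convex subproblem in S or in C with a 1/L step size. This makes the loss non-increasing, but the scheme only reaches a local solution that depends on the random initialization.

```python
import numpy as np


def project_columns_to_simplex(V):
    """Euclidean projection of each column of V onto the probability simplex
    (nonnegative entries summing to one), using the sort-based method."""
    n_rows, n_cols = V.shape
    U = np.sort(V, axis=0)[::-1]                 # each column sorted descending
    css = np.cumsum(U, axis=0) - 1.0             # cumulative sums minus one
    j = np.arange(1, n_rows + 1).reshape(-1, 1)  # 1-based positions
    cond = U - css / j > 0
    # rho: last (0-based) row index where cond holds, computed per column
    rho = n_rows - 1 - np.argmax(cond[::-1], axis=0)
    theta = css[rho, np.arange(n_cols)] / (rho + 1)
    return np.maximum(V - theta, 0.0)


def fit_archetypes(X, K, n_iter=500, seed=0):
    """Hypothetical helper: approximate X (M features x N observations) as
    X @ C @ S by alternating projected-gradient steps on S and C."""
    rng = np.random.default_rng(seed)
    _, N = X.shape
    C = project_columns_to_simplex(rng.random((N, K)))  # archetypes A = X @ C
    S = project_columns_to_simplex(rng.random((K, N)))  # mixing weights
    losses = []
    for _ in range(n_iter):
        A = X @ C
        # S-step: gradient of 0.5*||X - A S||_F^2 is A^T (A S - X)
        step_S = 1.0 / (np.linalg.norm(A.T @ A, 2) + 1e-12)
        S = project_columns_to_simplex(S - step_S * (A.T @ (A @ S - X)))
        # C-step: gradient of 0.5*||X - X C S||_F^2 is X^T (X C S - X) S^T
        grad_C = X.T @ (X @ C @ S - X) @ S.T
        lip_C = np.linalg.norm(X.T @ X, 2) * np.linalg.norm(S @ S.T, 2)
        C = project_columns_to_simplex(C - grad_C / (lip_C + 1e-12))
        losses.append(np.linalg.norm(X - X @ C @ S) ** 2)
    return C, S, losses


if __name__ == "__main__":
    # Toy data: 200 random mixtures of the corners of a triangle in 2-D
    rng = np.random.default_rng(1)
    corners = np.array([[0.0, 1.0, 0.5], [0.0, 0.0, 1.0]])
    X = corners @ rng.dirichlet(np.ones(3), size=200).T
    C, S, losses = fit_archetypes(X, K=3)
    print("estimated archetypes (columns):\n", X @ C)
    print("final squared reconstruction error:", losses[-1])
```

Restarting from several seeds and keeping the lowest final loss is one simple way to reduce sensitivity to the initialization.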
