Nested subspace learning with flags

Tom Szwagier, Xavier Pennec

TL;DR

The paper addresses the nestedness gap that arises when selecting subspace dimensions in Grassmannian-based learning. It introduces flag manifolds and the flag trick to lift fixed-dimension subspace problems into multilevel, nested formulations, yielding natural hierarchies of subspaces via the average of projection operators. The authors develop optimization on flag manifolds (steepest descent, with IRLS and Newton variants in the appendices) and demonstrate the approach on robust subspace recovery, trace ratio problems, and spectral clustering, showing improved consistency across levels and potential performance gains through ensemble-like aggregation. This framework offers a unifying, extensible path to multilevel subspace learning with interpretable hierarchies and broad applicability to ML tasks beyond PCA.

Abstract

Many machine learning methods look for low-dimensional representations of the data. The underlying subspace can be estimated by first choosing a dimension $q$ and then optimizing a certain objective function over the space of $q$-dimensional subspaces (the Grassmannian). Trying different $q$ yields in general non-nested subspaces, which raises an important issue of consistency between the data representations. In this paper, we propose a simple and easily implementable principle to enforce nestedness in subspace learning methods. It consists in lifting Grassmannian optimization criteria to flag manifolds (the space of nested subspaces of increasing dimension) via nested projectors. We apply the flag trick to several classical machine learning methods and show that it successfully addresses the nestedness issue.
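The lifting described in the abstract can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the helper names `nested_projectors` and `flag_trick_objective`, the random orthonormal basis, and the reconstruction-error loss `pca_loss` are all assumptions chosen for the example. It represents a flag by one orthonormal basis whose leading columns span the nested subspaces, and evaluates a Grassmannian criterion $f$ at the average of the nested projectors.

```python
import numpy as np

def nested_projectors(U, dims):
    """Orthogonal projectors onto span(U[:, :q]) for each q in dims.

    U is assumed to have orthonormal columns, so the subspaces are nested.
    """
    return [U[:, :q] @ U[:, :q].T for q in dims]

def flag_trick_objective(f, U, dims):
    """Evaluate a projector-based criterion f at the averaged nested projector."""
    projs = nested_projectors(U, dims)
    return f(sum(projs) / len(projs))

rng = np.random.default_rng(0)
p, n = 5, 200
X = rng.standard_normal((p, n))
X -= X.mean(axis=1, keepdims=True)      # center the data (columns are samples)

# Hypothetical flag: a random orthonormal basis obtained by QR
U, _ = np.linalg.qr(rng.standard_normal((p, 3)))
dims = (1, 2, 3)                        # increasing dimensions q_1 < q_2 < q_3

def pca_loss(P):
    """Mean squared reconstruction error, used here as an example criterion f."""
    return np.linalg.norm(X - P @ X) ** 2 / n

projs = nested_projectors(U, dims)
avg = sum(projs) / len(projs)
value = flag_trick_objective(pca_loss, U, dims)
```

Because the subspaces are nested, each projector absorbs the smaller ones ($\Pi_{k+1} \Pi_k = \Pi_k$), and the trace of the averaged projector equals the mean of the dimensions.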

Paper Structure

This paper contains 42 sections, 7 theorems, 49 equations, 9 figures, 1 table, 4 algorithms.

Key Result

Theorem 8

Let $X \in \mathbb{R}^{p\times n}$ be a centered $p$-dimensional ($p \geq 2$) dataset with $n$ samples. Let $q_{1:d} \coloneqq (q_1, q_2, \dots, q_d)$ be a sequence of increasing dimensions such that $0 < q_1 < q_2 < \dots < q_d < p$. Let $S \coloneqq \frac{1}{n} X X^\top$ be the sample covariance matrix. [...] More precisely, one has $\mathcal{S}_{1:d}^* = \left(\operatorname{Span}(v_1, \dots, v_{q_1}), \dots\right)$ [...]
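The nested structure stated in the theorem can be checked numerically. The sketch below rests on two assumptions, since the statement above is truncated: that $v_1, \dots, v_p$ denote eigenvectors of $S$ ordered by decreasing eigenvalue, and that the objective is the reconstruction error evaluated at the averaged projector. The helper names `avg_projector` and `flag_pca_loss` and the synthetic data are hypothetical. The code compares the loss of the flag built from leading eigenvectors with that of a random flag of the same signature.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 6, 500
scales = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.5])
X = scales[:, None] * rng.standard_normal((p, n))   # synthetic anisotropic data
X -= X.mean(axis=1, keepdims=True)                  # center
S = X @ X.T / n                                     # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)                # eigh returns ascending order
V = eigvecs[:, ::-1]                                # reorder to descending eigenvalues

dims = (1, 3, 4)                                    # increasing dimensions, all < p

def avg_projector(U, dims):
    """Average of the projectors onto span(U[:, :q]) for q in dims."""
    return sum(U[:, :q] @ U[:, :q].T for q in dims) / len(dims)

def flag_pca_loss(U, dims):
    """Mean squared reconstruction error under the averaged projector (assumed objective)."""
    P = avg_projector(U, dims)
    return np.linalg.norm(X - P @ X) ** 2 / n

principal_loss = flag_pca_loss(V, dims)             # flag of leading eigenvectors

Q, _ = np.linalg.qr(rng.standard_normal((p, p)))    # random orthonormal basis
random_loss = flag_pca_loss(Q, dims)                # random flag, same dimensions
```

Under these assumptions, the loss of the principal flag equals $\sum_j (1 - w_j)^2 \lambda_j$, where $w_j$ is the weight of the $j$-th eigenvector in the averaged projector, so the largest eigenvalues receive the smallest penalties and the random flag cannot do better.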

Figures (9)

  • Figure 1: Illustration of the subspace nestedness issue in three important machine learning problems: robust subspace recovery (left), linear discriminant analysis (middle) and sparse spectral clustering (right). For each dataset (top), we plot its projection onto the optimal 1D subspace (middle) and 2D subspace (bottom) obtained by solving the associated Grassmannian optimization problem. The 1D and 2D representations are inconsistent (in the sense that the 1D plot is not the projection of the 2D plot onto the horizontal axis), which is a pitfall for data analysis.
  • Figure 2: Illustration of the flag trick methodology. 1) We start with a subspace learning problem: $\mathop{\mathrm{arg\,min}}\limits_{\mathcal{S} \in \operatorname{Gr}(p, q)} f(\Pi_\mathcal{S})$. Trying different $q$'s yields in general non-nested subspaces, which raises an issue of consistency between data representations. 2) We convert the subspace learning problem into a nested subspace learning problem via the flag trick: $\mathop{\mathrm{arg\,min}}\limits_{\mathcal{S}_{1:d} \in \operatorname{Fl}(p, q_{1:d})} f(\frac{1}{d} \sum_{k=1}^d \Pi_{\mathcal{S}_k})$. We run a steepest descent on flag manifolds (Algorithm \ref{alg:GD}) and get a flag of nested subspaces. 3) We fit a machine learning algorithm (regression, classification, etc.) to the projected data at each dimension $q_k \in q_{1:d}$. We aggregate the estimators via ensembling methods (hard voting, soft voting, etc.) and get improved predictions.
  • Figure 3: Illustration of the nestedness issue in robust subspace recovery. Given a dataset consisting of a mixture of inliers (blue) and outliers (red), we plot its projection onto the optimal 1D subspace and 2D subspace obtained by solving the associated Grassmannian optimization problem \ref{eq:RSR_Gr} or flag optimization problem \ref{eq:RSR_Fl}. We can see that the Grassmann representations are not nested, while the flag representations are nested and robust to outliers.
  • Figure 4: Euclidean reconstruction errors (sorted in ascending order) on the corrupted digits dataset for robust subspace recovery ($\|x_i - \Pi_{\mathcal{S}^*} x_i\|$, left) and its flag-tricked version ($\|x_i - \Pi_{\mathcal{S}_{1:d}^*} x_i\|$, right). With the Grassmann-based method, the distributions of reconstruction errors for the inliers and outliers intersect, meaning that we cannot fully distinguish the inliers from the outliers (see the gray zone on the left plot). In contrast, with the flag-based method, the distributions of reconstruction errors for the inliers and outliers are clearly separated, meaning that we can easily distinguish the inliers from the outliers (see the white zone on the right plot). This phenomenon can be explained by the multilevel nature of the flag trick (see the main text for more details).
  • Figure 5: Illustration of the nestedness issue in linear discriminant analysis (trace ratio problem). Given a dataset with five clusters, we plot its projection onto the optimal 1D subspace and 2D subspace obtained by solving the associated Grassmannian optimization problem \ref{eq:TR_Gr} or flag optimization problem \ref{eq:TR_Fl}. We can see that the Grassmann representations are not nested, while the flag representations are nested and capture the distribution of clusters well. In this example, it is less the nestedness than the rotation of the optimal axes inside the 2D subspace that is critical to the analysis of the Grassmann-based method.
  • ...and 4 more figures
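Step 3 of the methodology in Figure 2 (fit one estimator per level, then aggregate by voting) can be sketched as follows. This is an illustration under stated assumptions, not the paper's pipeline: the flag is replaced by nested principal subspaces as a stand-in, the per-level estimator is a hypothetical nearest-centroid rule (`nearest_centroid_predict`), the data are synthetic, and predictions are evaluated on the training set only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_per_class = 4, 100
means = np.zeros((2, p))
means[0, 0], means[1, 0] = -5.0, 5.0                 # two well-separated classes
X = np.vstack([m + rng.standard_normal((n_per_class, p)) for m in means])  # (n, p)
y = np.repeat([0, 1], n_per_class)

# Stand-in for a learned flag: nested principal subspaces of the centered data
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
U = Vt.T                                             # columns sorted by decreasing singular value
dims = (1, 2, 3)                                     # one estimator per level q_k

def nearest_centroid_predict(Z, labels):
    """Assign each row of Z to the class with the closest centroid (hypothetical estimator)."""
    classes = np.unique(labels)
    centroids = np.stack([Z[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Fit and predict on the projected data at each level of the flag
votes = np.stack([nearest_centroid_predict(Xc @ U[:, :q], y) for q in dims])  # (d, n)

# Hard voting: majority over levels (binary labels, odd number of levels)
ensemble = (votes.mean(axis=0) > 0.5).astype(int)
accuracy = (ensemble == y).mean()
```

Soft voting would instead average per-level class scores before taking the arg max; the hard-voting rule above only works as written for binary labels and an odd number of levels.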

Theorems & Definitions (21)

  • Remark 1: Sequential Methods
  • Remark 2: Importance of Nestedness
  • Remark 3: Orthogonal Representation
  • Remark 4: Initialization
  • Remark 5: Optimization Variants
  • Remark 6: Complexity Analysis
  • Remark 7: Centering
  • Theorem 8: Nested PCA with Flag Manifolds
  • Remark 9: Weighted Projectors
  • Definition 10: Flag Trick
  • ...and 11 more