FORML: A Riemannian Hessian-free Method for Meta-learning on Stiefel Manifolds

Hadi Tabealhojeh, Soumava Kumar Roy, Peyman Adibi, Hossein Karshenas

TL;DR

This work targets the computational bottleneck of meta-learning on Riemannian manifolds by introducing FORML, a Hessian-free, first-order Riemannian meta-learning method on the Stiefel manifold. By constraining the final classification head to lie on $St(n,p)$ and employing a first-order gradient approximation, FORML avoids differentiating through full inner-loop trajectories while preserving effective representation reuse via an orthogonal head. The bi-level optimization trains the Stiefel head with a normalized cosine-distance forward pass, while the other layers operate in Euclidean space, yielding significant reductions in memory and compute. Empirically, FORML achieves competitive or superior performance to MAML across single-domain and cross-domain few-shot benchmarks, with additional benefits in deeper architectures and robust meta-learning dynamics.

Abstract

The meta-learning problem is usually formulated as a bi-level optimization in which the task-specific parameters and the meta-parameters are updated in the inner and outer loops, respectively. However, performing this optimization in Riemannian space, where the parameters and meta-parameters lie on Riemannian manifolds, is computationally intensive. Unlike Euclidean methods, Riemannian backpropagation requires computing second-order derivatives that include backward computations through Riemannian operators such as retraction and orthogonal projection. This paper introduces a Hessian-free approach that uses a first-order approximation of derivatives on the Stiefel manifold. Our method significantly reduces the computational load and memory footprint. We show how using a Stiefel fully-connected layer, which enforces an orthogonality constraint on the parameters of the last classification layer, as the head of the backbone network strengthens the representation reuse of gradient-based meta-learning methods. Our experimental results across various few-shot learning datasets demonstrate the superiority of our proposed method over state-of-the-art methods, especially MAML, its Euclidean counterpart.
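
The bi-level scheme with a first-order approximation can be illustrated on a toy problem. The following is a minimal NumPy sketch, not the authors' algorithm: it assumes a single orthonormal head `W` on $St(n,p)$, a least-squares task loss, a QR-based retraction, and a FOMAML-style outer update in which the query-set gradient at the adapted parameters is projected onto the tangent space at the meta-parameters. All function names (`project_tangent`, `retract_qr`, `adapt`, `first_order_meta_step`, `make_task`) and hyperparameters are hypothetical.

```python
import numpy as np

def project_tangent(W, G):
    """Project a Euclidean gradient G onto the tangent space of St(n, p) at W."""
    WtG = W.T @ G
    return G - W @ (0.5 * (WtG + WtG.T))

def retract_qr(W, V):
    """QR-based retraction: map W + V back onto the Stiefel manifold."""
    Q, R = np.linalg.qr(W + V)
    s = np.sign(np.diag(R))
    s[s == 0] = 1.0
    return Q * s  # flip column signs so that diag(R) is positive

def loss_and_grad(W, A, B):
    """Toy task loss 1/(2m) * ||A W - B||_F^2 and its Euclidean gradient."""
    R = A @ W - B
    return 0.5 * np.mean(np.sum(R ** 2, axis=1)), A.T @ R / A.shape[0]

def adapt(W, A, B, lr, steps):
    """Inner loop: a few Riemannian gradient-descent steps on the support set."""
    for _ in range(steps):
        _, g = loss_and_grad(W, A, B)
        W = retract_qr(W, -lr * project_tangent(W, g))
    return W

def first_order_meta_step(W0, tasks, inner_lr, outer_lr, steps):
    """Outer loop (assumed FOMAML-style): no backpropagation through adapt()."""
    meta_grad = np.zeros_like(W0)
    for A_s, B_s, A_q, B_q in tasks:
        W_task = adapt(W0, A_s, B_s, inner_lr, steps)
        _, g_q = loss_and_grad(W_task, A_q, B_q)  # gradient at adapted point only
        meta_grad += project_tangent(W0, g_q)     # treat it as a direction at W0
    meta_grad /= len(tasks)
    return retract_qr(W0, -outer_lr * meta_grad)

def make_task(rng, n, p, m):
    """Hypothetical synthetic task: targets generated by a random orthonormal map."""
    W_true, _ = np.linalg.qr(rng.standard_normal((n, p)))
    A = rng.standard_normal((2 * m, n))
    B = A @ W_true
    return A[:m], B[:m], A[m:], B[m:]

rng = np.random.default_rng(0)
n, p = 6, 3
W, _ = np.linalg.qr(rng.standard_normal((n, p)))  # meta-parameters on St(n, p)
for _ in range(5):
    tasks = [make_task(rng, n, p, m=10) for _ in range(4)]
    W = first_order_meta_step(W, tasks, inner_lr=0.1, outer_lr=0.1, steps=3)
```

Because the outer update never differentiates through `adapt`, no second-order terms and no derivatives of the retraction or projection are needed, which is the source of the memory and compute savings the abstract describes. The iterate stays on the manifold after every update because each step ends in a retraction.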

Paper Structure (20 sections, 20 equations, 2 figures, 7 tables)

Figures (2)

  • Figure 1: An illustrative schematic of the operations required in GD-based optimization on a Riemannian manifold. Let $P$ and $Q$ be points on the manifold $\mathcal{M}$ connected by a geodesic, shown as the light green dashed curve. The tangent spaces at $P$ and $Q$, i.e. $T_{P}\mathcal{M}$ and $T_{Q}\mathcal{M}$, are shown in orange. The vector $v_1 \in T_{P}\mathcal{M}$ is the orthogonal projection of the Euclidean vector $u$ at $P$. The retraction $R = R_{P}(v_1)$ is used to move back to the manifold from the tangent space at $P$. In a neighborhood of $P$, the retraction (shown in brown) identifies a point on the geodesic. The parallel transport $v_2 = \Gamma_{P \rightarrow Q}(v_1)$ maps $v_1 \in T_{P}\mathcal{M}$ to $v_2 \in T_{Q}\mathcal{M}$ by moving it in parallel along the geodesic connecting $P$ and $Q$ (blue dotted arrows).
  • Figure 2: A sample representation of the Stiefel fully connected layer for a 2D output space, where $\bm{W}=[\bm{w}_1,\bm{w}_2]$ is the orthogonal weight matrix (it lies on the Stiefel manifold) and $\bm{x}$ is the input vector of the layer. For this example, the Stiefel-layer equation becomes $\bm{W}^{T}\bm{x}=\bm{\gamma}=[\gamma_1,\gamma_2]$.
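
The forward pass in Figure 2 can be sketched in a few lines of NumPy. This is an illustration only: the function name `stiefel_fc_forward` and the `eps` guard are hypothetical, and the normalization shown (dividing by $\|\bm{x}\|$) is an assumption about what the "normalized cosine-distance forward pass" looks like, since the paper's exact layer equation is not reproduced in this summary. Because the columns of $\bm{W}$ have unit norm, $\gamma_i / \|\bm{x}\|$ is the cosine of the angle between $\bm{w}_i$ and $\bm{x}$.

```python
import numpy as np

def stiefel_fc_forward(W, x, eps=1e-12):
    """Return gamma = W^T x and its cosine-normalized version.

    W is assumed to have orthonormal columns (a point on the Stiefel manifold),
    so each gamma_i / ||x|| is the cosine similarity between w_i and x.
    """
    gamma = W.T @ x
    cos = gamma / (np.linalg.norm(x) + eps)  # eps avoids division by zero
    return gamma, cos

rng = np.random.default_rng(0)
# Build a 4x2 matrix with orthonormal columns via a reduced QR factorization.
W, _ = np.linalg.qr(rng.standard_normal((4, 2)))
x = rng.standard_normal(4)
gamma, cos = stiefel_fc_forward(W, x)
```

With orthonormal columns, every entry of `cos` lies in $[-1, 1]$ and the squared entries sum to at most 1, so the outputs are bounded regardless of the scale of the input features.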

Theorems & Definitions (6)

  • Definition III.1: Smooth Riemannian manifold
  • Definition III.2: Stiefel manifold
  • Definition III.3: Manifold optimization
  • Definition III.4: Orthogonal projection
  • Definition III.5: Exponential map and Retraction
  • Definition III.6: Parallel transport
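
Two of the operators named above, the orthogonal projection and the retraction, have standard closed forms on the Stiefel manifold $St(n,p)=\{\bm{X}\in\mathbb{R}^{n\times p} : \bm{X}^{T}\bm{X}=\bm{I}_p\}$. The sketch below uses the usual embedded-metric projection $\pi_{\bm{X}}(\bm{U}) = \bm{U} - \bm{X}\,\mathrm{sym}(\bm{X}^{T}\bm{U})$ and a QR-based retraction. These are common textbook choices and may differ from the exact forms in the paper's Definitions III.4 and III.5; the function names are hypothetical.

```python
import numpy as np

def sym(A):
    """Symmetric part of a square matrix."""
    return 0.5 * (A + A.T)

def project_tangent(X, U):
    """Orthogonal projection of a Euclidean matrix U onto T_X St(n, p)."""
    return U - X @ sym(X.T @ U)

def retract_qr(X, V):
    """QR-based retraction R_X(V): the Q factor of X + V with diag(R) > 0."""
    Q, R = np.linalg.qr(X + V)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs  # scale column j of Q by signs[j]

rng = np.random.default_rng(0)
n, p = 5, 2
X, _ = np.linalg.qr(rng.standard_normal((n, p)))  # a point on St(n, p)
U = rng.standard_normal((n, p))                   # arbitrary Euclidean direction
V = project_tangent(X, U)                         # tangent vector at X
Y = retract_qr(X, 0.1 * V)                        # new point on the manifold
```

A tangent vector $\bm{V}$ at $\bm{X}$ satisfies $\bm{X}^{T}\bm{V} + \bm{V}^{T}\bm{X} = \bm{0}$, and the retracted point `Y` again has orthonormal columns. Retracting the zero vector returns `X` itself, which is one of the defining properties of a retraction.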