FORML: A Riemannian Hessian-free Method for Meta-learning on Stiefel Manifolds

Hadi Tabealhojeh; Soumava Kumar Roy; Peyman Adibi; Hossein Karshenas

TL;DR

This work targets the computational bottleneck of meta-learning on Riemannian manifolds by introducing FORML, a Hessian-free, first-order Riemannian meta-learning method on the Stiefel manifold. By constraining the final classification head to lie on $St(n,p)$ and employing a first-order gradient approximation, FORML avoids differentiating through full inner-loop trajectories while preserving effective representation reuse via an orthogonal head. The bi-level optimization trains the Stiefel head with a normalized cosine-distance forward pass, while the other layers operate in Euclidean space, which reduces memory and compute. Empirically, FORML achieves competitive or superior performance to MAML across single-domain and cross-domain few-shot benchmarks, with additional benefits in deeper architectures and robust meta-learning dynamics.

Abstract

Meta-learning is usually formulated as a bi-level optimization problem in which the task-specific parameters and the meta-parameters are updated in the inner and outer loops, respectively. However, performing this optimization in Riemannian space, where the parameters and meta-parameters lie on Riemannian manifolds, is computationally intensive. Unlike Euclidean methods, Riemannian backpropagation requires computing second-order derivatives that include backward computations through Riemannian operators such as retraction and orthogonal projection. This paper introduces a Hessian-free approach that uses a first-order approximation of derivatives on the Stiefel manifold. Our method significantly reduces the computational load and memory footprint. We show how using a Stiefel fully-connected layer, which enforces an orthogonality constraint on the parameters of the last classification layer, as the head of the backbone network strengthens the representation reuse of gradient-based meta-learning methods. Our experimental results across various few-shot learning datasets demonstrate the superiority of our proposed method over state-of-the-art methods, especially MAML, its Euclidean counterpart.
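To make the Hessian-free idea concrete, the following NumPy sketch runs one first-order meta-update on the Stiefel manifold. It is an illustration under stated assumptions, not the authors' algorithm: the toy least-squares loss, the QR-based retraction, the choice to project the query gradient at the meta-parameter, and all function names (`project`, `retract`, `loss_grad`, `first_order_meta_step`) are hypothetical. The point it shows is that the outer update only needs the query-set gradient at the adapted parameters, so nothing is differentiated through the inner-loop retractions and projections.

```python
import numpy as np

def project(w, u):
    """Orthogonal projection of a Euclidean gradient u onto the tangent space at w."""
    a = w.T @ u
    return u - w @ (0.5 * (a + a.T))

def retract(w, v):
    """QR-based retraction (an assumed choice): map w + v back onto St(n, p)."""
    q, r = np.linalg.qr(w + v)
    s = np.sign(np.diag(r))
    s[s == 0] = 1.0
    return q * s  # flip column signs so that R has a positive diagonal

def loss_grad(w, x, y):
    """Euclidean gradient of the toy loss 0.5 * mean ||x @ w - y||^2 (assumption)."""
    return x.T @ (x @ w - y) / x.shape[0]

def first_order_meta_step(w_meta, tasks, inner_lr=0.1, outer_lr=0.1, inner_steps=3):
    meta_grad = np.zeros_like(w_meta)
    for xs, ys, xq, yq in tasks:
        w = w_meta.copy()
        # Inner loop: Riemannian gradient descent on the support set.
        for _ in range(inner_steps):
            w = retract(w, -inner_lr * project(w, loss_grad(w, xs, ys)))
        # First-order approximation: take the query gradient at the adapted
        # point and ignore how w depends on w_meta (no second-order terms).
        meta_grad += loss_grad(w, xq, yq)
    meta_grad /= len(tasks)
    # Outer loop: project at the meta-parameter, step, and retract.
    return retract(w_meta, -outer_lr * project(w_meta, meta_grad))

rng = np.random.default_rng(0)
n, p = 8, 3
w_meta, _ = np.linalg.qr(rng.standard_normal((n, p)))
tasks = []
for _ in range(4):
    xs, xq = rng.standard_normal((10, n)), rng.standard_normal((10, n))
    w_true, _ = np.linalg.qr(rng.standard_normal((n, p)))
    tasks.append((xs, xs @ w_true, xq, xq @ w_true))

w_new = first_order_meta_step(w_meta, tasks)
print(np.allclose(w_new.T @ w_new, np.eye(p)))  # orthonormality check
```

Because every update ends in a retraction, the meta-parameter stays on $St(n,p)$ without any explicit re-orthogonalization step.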

Paper Structure (20 sections, 20 equations, 2 figures, 7 tables)

Introduction
Related Works
Optimization-based meta-learning
Riemannian meta-learning
Hessian matrix, the curse of optimization-based methods
Preliminaries
Riemannian manifolds
Proposed Method
First Order approximation: From Euclidean space to Stiefel manifold
The proposed bi-level algorithm
Experiments and Results
Evaluation Scenarios and Datasets
Single-domain image classification
Cross-domain image classification
Experimental Details
...and 5 more sections

Figures (2)

Figure 1: An illustrative schematic of the operations required in GD-based optimization on a Riemannian manifold. Let $\textit{P}$ and $\textit{Q}$ represent points on the manifold $\pazocal{M}$ connected by a geodesic, shown by the light green dashed curve. The tangent spaces at $\textit{P}$ and $\textit{Q}$, i.e., $T_{\textit{P}} \pazocal{M}$ and $T_{\textit{Q}} \pazocal{M}$, are shown in orange. Vector $\textit{v}_1 \in T_{\textit{P}} \pazocal{M}$ is the result of the orthogonal projection of the Euclidean vector $\textit{u}$ at $\textit{P}$. The retraction operation $\textit{R}\!=\!R_{\textit{P}}(\textit{v}_1)$ is used to move back to the manifold from the tangent space at $\textit{P}$. In a neighborhood of $\textit{P}$, the retraction operation (shown in brown) identifies a point on the geodesic. The parallel transport $\textit{v}_2\!=\!\Gamma_{\textit{P}\rightarrow \textit{Q}}(\textit{v}_1)$ maps $\textit{v}_1 \in T_{\textit{P}} \pazocal{M}$ to $\textit{v}_2 \in T_{\textit{Q}} \pazocal{M}$ by moving it in parallel along the geodesic connecting $\textit{P}$ and $\textit{Q}$ (shown by the blue dotted arrows).
Figure 2: A sample representation of the Stiefel fully connected layer for a 2D output space, where $\bm{W}=[\bm{w}_1,\bm{w}_2]$ represents the orthogonal weight matrix (which lies on the Stiefel manifold) and $\bm{x}$ is the input vector of the Stiefel fully connected layer. For this example, equation (\ref{eqn:Stiefel-layer}) becomes $\bm{W}^{T}\bm{x}=\bm{\gamma}=[\gamma_1,\gamma_2]$.
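The Stiefel layer in Figure 2 can be sketched in a few lines of NumPy. This is a minimal illustration that assumes the "normalized cosine-distance forward pass" means dividing $\bm{W}^{T}\bm{x}$ by $\|\bm{x}\|$; the exact normalization in the paper's layer equation is not reproduced here, and the function name `stiefel_fc_forward` is hypothetical. Since the columns of $\bm{W}$ are orthonormal, each output $\gamma_i$ is then the cosine between $\bm{x}$ and $\bm{w}_i$.

```python
import numpy as np

def stiefel_fc_forward(w, x, eps=1e-12):
    """Cosine similarities between x and the orthonormal columns of w.

    w: (n, p) matrix with w.T @ w = I_p (a point on the Stiefel manifold).
    x: (n,) input vector or (batch, n) batch of inputs.
    """
    x_norm = np.linalg.norm(x, axis=-1, keepdims=True)
    # Columns of w already have unit norm, so only x needs normalizing.
    return (x / np.maximum(x_norm, eps)) @ w

# 2D output space as in Figure 2: W = [w_1, w_2] with orthonormal columns.
w = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
x = np.array([3.0, 4.0, 0.0])
gamma = stiefel_fc_forward(w, x)
print(gamma)  # approximately [0.6, 0.8]
```

The outputs are bounded in $[-1, 1]$ regardless of the scale of $\bm{x}$, which is the property a cosine-based head relies on.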

Theorems & Definitions (6)

Definition III.1: Smooth Riemannian manifold
Definition III.2: Stiefel manifold
Definition III.3: Manifold optimization
Definition III.4: Orthogonal projection
Definition III.5: Exponential map and Retraction
Definition III.6: Parallel transport
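The operators named in Definitions III.4 to III.6 can be checked numerically on $St(n,p)$. The sketch below follows the notation of Figure 1 ($\textit{P}$, $\textit{Q}$, $\textit{u}$, $\textit{v}_1$, $\textit{v}_2$) under two labeled assumptions: the retraction is the QR-based one, and parallel transport is replaced by the cheaper projection-based vector transport, which maps into the correct tangent space but is not true parallel transport along the geodesic. All function names are hypothetical.

```python
import numpy as np

def project(point, u):
    """Orthogonal projection of u onto the tangent space of St(n, p) at point."""
    a = point.T @ u
    return u - point @ (0.5 * (a + a.T))

def retract_qr(point, v):
    """QR-based retraction (assumed choice) from the tangent space back to St(n, p)."""
    q, r = np.linalg.qr(point + v)
    s = np.sign(np.diag(r))
    s[s == 0] = 1.0
    return q * s

def transport_by_projection(target, v):
    """Vector transport by projection: a stand-in for parallel transport (assumption)."""
    return project(target, v)

def is_tangent(point, v, tol=1e-8):
    """Tangent vectors at point satisfy point.T @ v + v.T @ point = 0."""
    a = point.T @ v
    return bool(np.allclose(a + a.T, 0.0, atol=tol))

rng = np.random.default_rng(0)
P, _ = np.linalg.qr(rng.standard_normal((6, 2)))  # a point on St(6, 2)
u = rng.standard_normal((6, 2))                    # a Euclidean vector at P
v1 = project(P, u)                                 # v1 in T_P M
Q = retract_qr(P, 0.5 * v1)                        # move along v1, land on the manifold
v2 = transport_by_projection(Q, v1)                # v2 in T_Q M

print(is_tangent(P, v1), is_tangent(Q, v2))
print(np.allclose(Q.T @ Q, np.eye(2)))
```

Every one of these operators would have to be differentiated through in a second-order Riemannian method, which is the cost a first-order approximation avoids.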
