Deep Extrinsic Manifold Representation for Vision Tasks

Tongtong Zhang; Xian Wei; Yuanxiang Li

TL;DR

This work introduces Deep Extrinsic Manifold Representation (DEMR), a framework that embeds manifold-valued outputs into a Euclidean space via an extrinsic embedding $J$, allowing standard neural networks to be trained with Euclidean losses while preserving manifold geometry through a learnable projection and an inverse map $J^{-1}$. The approach is analyzed theoretically, establishing feasibility, local bilipschitz properties, and asymptotic maximum likelihood estimation for key manifolds such as $SE(3)$ and the Grassmann manifold; the analysis links extrinsic distances to intrinsic geometry. Empirically, DEMR demonstrates strong performance on two vision tasks: relative point cloud transformation on $SE(3)$ and illumination subspace estimation on the Grassmann manifold, with 6D/9D embeddings outperforming Euclidean-output baselines and offering faster convergence than intrinsic methods. The results indicate that maintaining the geometric structure of outputs via extrinsic embeddings yields better generalization to unseen data and practical computational advantages, supporting broader applicability across manifold-valued vision tasks and suggesting a unified path for integrating extrinsic embeddings into deep learning architectures.
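To make the link between extrinsic distances and intrinsic geometry concrete, here is a minimal sketch for the rotation part of $SE(3)$. It assumes the simplest extrinsic embedding $J$, the inclusion of $SO(3)$ into $\mathbb{R}^{3\times 3}$ with the Frobenius norm; the paper's exact choice of $J$ may differ, and the helper names are illustrative. For a relative rotation angle $\theta$, the chordal (extrinsic) distance equals $2\sqrt{2}\sin(\theta/2)$, which is monotone in the geodesic distance $\theta \in [0, \pi]$.

```python
import numpy as np

def rotation_about_axis(axis, angle):
    """Rodrigues' formula: rotation matrix for an axis and an angle in radians."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def geodesic_distance(R1, R2):
    """Intrinsic distance on SO(3): the rotation angle of R1^T R2."""
    cos_theta = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

def extrinsic_distance(R1, R2):
    """Chordal distance after the (assumed) embedding J: SO(3) -> R^{3x3}."""
    return np.linalg.norm(R1 - R2, ord="fro")

R1 = np.eye(3)
R2 = rotation_about_axis([1, 2, 3], 1.2)
theta = geodesic_distance(R1, R2)
chordal = extrinsic_distance(R1, R2)
# The last two values agree: chordal = 2*sqrt(2)*sin(theta/2).
print(theta, chordal, 2 * np.sqrt(2) * np.sin(theta / 2))
```

Because the map $\theta \mapsto 2\sqrt{2}\sin(\theta/2)$ is strictly increasing on $[0, \pi]$, minimizing the Euclidean loss in the embedded space also drives the geodesic error toward zero.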

Abstract

Non-Euclidean data is frequently encountered across different fields, yet little literature addresses the fundamental challenge of training neural networks whose outputs are manifold representations. In this context, we introduce a technique called Deep Extrinsic Manifold Representation (DEMR) for visual tasks. DEMR incorporates extrinsic manifold embedding into deep neural networks to generate manifold representations. Rather than directly optimizing the complex geodesic loss, DEMR optimizes the computation graph within the embedded Euclidean space, which allows it to adapt to various architectural requirements. We support the proposed concept on two types of manifolds, $SE(3)$ and its associated quotient manifolds, with theoretical assurances regarding feasibility, asymptotic properties, and generalization capability. The experimental results show that DEMR effectively adapts to point cloud alignment, producing outputs in $SE(3)$, as well as to illumination subspace learning, with outputs on the Grassmann manifold.
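For outputs on the Grassmann manifold, one standard extrinsic embedding (assumed here; the paper's construction may differ) maps a $k$-dimensional subspace with orthonormal basis $X$ to its projection matrix $J(X) = XX^\top$. The sketch below checks that the extrinsic distance satisfies $\|XX^\top - YY^\top\|_F = \sqrt{2}\,\|\sin\boldsymbol{\theta}\|_2$, where $\boldsymbol{\theta}$ are the principal angles, and maps a noisy output in the embedded space back onto $Gr(k, n)$ via its top-$k$ eigenvectors. The dimensions $n = 9$, $k = 3$ and all function names are hypothetical.

```python
import numpy as np

def embed(X):
    """Assumed J: Gr(k, n) -> symmetric n x n matrices, X (orthonormal) -> X X^T."""
    return X @ X.T

def project_to_grassmann(A, k):
    """Map an n x n matrix back to Gr(k, n).

    Symmetrize, then keep the eigenvectors of the k largest eigenvalues;
    for a symmetric matrix this gives the nearest rank-k orthogonal
    projector in Frobenius norm.
    """
    S = (A + A.T) / 2.0
    _, eigvecs = np.linalg.eigh(S)  # eigenvalues returned in ascending order
    return eigvecs[:, -k:]

def principal_angles(X, Y):
    """Principal angles between span(X) and span(Y) for orthonormal X, Y."""
    s = np.linalg.svd(X.T @ Y, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

rng = np.random.default_rng(0)
n, k = 9, 3
X, _ = np.linalg.qr(rng.standard_normal((n, k)))
Y, _ = np.linalg.qr(rng.standard_normal((n, k)))

# Extrinsic distance versus its expression in terms of principal angles.
extrinsic = np.linalg.norm(embed(X) - embed(Y), ord="fro")
intrinsic_link = np.sqrt(2.0) * np.linalg.norm(np.sin(principal_angles(X, Y)))

# A noisy, network-style output in the embedded space, projected back.
noisy = embed(X) + 0.05 * rng.standard_normal((n, n))
X_hat = project_to_grassmann(noisy, k)
print(extrinsic, intrinsic_link, principal_angles(X, X_hat).max())
```

The identity follows from $\|XX^\top - YY^\top\|_F^2 = 2k - 2\|X^\top Y\|_F^2 = 2\sum_i \sin^2\theta_i$, so the Euclidean loss in the embedded space is again a monotone function of intrinsic quantities.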

Paper Structure (49 sections, 13 theorems, 11 equations, 4 figures, 3 tables)

Introduction
Contribution
DEMR
Problem Formulation
Estimation in the embedded space
Pipeline design
Reformulation of neural network for images
Projection $Pr$ onto the preimage of $J$
DIMR
The extrinsic embedding $J$
Matrix Lie Group
9D: SVD of rank 9
6D: cross product
The Quotient Manifold of Lie Group
DEMR as a generalization of previous research
...and 34 more sections
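The outline names two ways of turning raw network outputs into rotations ("9D: SVD" and "6D: cross product"). The sketch below assumes these follow the widely used constructions: Gram-Schmidt on two 3-vectors with the third column from a cross product, and the nearest rotation to a $3\times 3$ matrix via SVD with the determinant fixed to $+1$. This is an illustration under that assumption, not the authors' code; the function names are hypothetical.

```python
import numpy as np

def rotation_from_6d(v):
    """6D -> SO(3): Gram-Schmidt on two 3-vectors, third column by cross product."""
    a1 = np.asarray(v[:3], dtype=float)
    a2 = np.asarray(v[3:], dtype=float)
    b1 = a1 / np.linalg.norm(a1)
    u2 = a2 - np.dot(b1, a2) * b1          # remove the component along b1
    b2 = u2 / np.linalg.norm(u2)
    b3 = np.cross(b1, b2)                  # right-handed, so det = +1
    return np.stack([b1, b2, b3], axis=1)  # b1, b2, b3 as columns

def rotation_from_9d(v):
    """9D -> SO(3): nearest rotation in Frobenius norm via SVD.

    The last singular direction is flipped when needed so that det = +1.
    """
    M = np.asarray(v, dtype=float).reshape(3, 3)
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))     # +1 or -1
    return U @ np.diag([1.0, 1.0, d]) @ Vt

rng = np.random.default_rng(0)
R6 = rotation_from_6d(rng.standard_normal(6))
R9 = rotation_from_9d(rng.standard_normal(9))
```

Both maps are defined on almost all of their input space and return exact rotations, which lets the network itself stay unconstrained in $\mathbb{R}^6$ or $\mathbb{R}^9$.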

Key Result

Lemma 3.1

Suppose that $\mathcal{M}_1, \mathcal{M}_2$ are smooth and compact Riemannian manifolds, and $f:\mathcal{M}_1 \rightarrow \mathcal{M}_2$ is a diffeomorphism. Then $f$ is bilipschitz with respect to the Riemannian distance.
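As a toy illustration of Lemma 3.1 (not a proof, and not taken from the paper), take the compact manifold $S^1$ with arc-length distance and the diffeomorphism $f(\theta) = \theta + 0.3\sin\theta$, whose derivative $1 + 0.3\cos\theta$ lies in $[0.7, 1.3]$. The ratio $d(f(a), f(b)) / d(a, b)$ is then bounded by these extremes, so $f$ is bilipschitz with constants $0.7$ and $1.3$. The sketch checks this on random pairs; the choice of $f$ and the sample size are arbitrary.

```python
import numpy as np

def circle_distance(a, b):
    """Geodesic (arc-length) distance on the unit circle S^1."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def f(theta, eps=0.3):
    """A diffeomorphism of S^1: its lift has derivative 1 + eps*cos(theta) > 0."""
    return (theta + eps * np.sin(theta)) % (2 * np.pi)

rng = np.random.default_rng(1)
a = rng.uniform(0, 2 * np.pi, 100_000)
b = rng.uniform(0, 2 * np.pi, 100_000)
mask = circle_distance(a, b) > 1e-9  # skip (near-)coincident pairs
ratios = circle_distance(f(a), f(b))[mask] / circle_distance(a, b)[mask]
print(ratios.min(), ratios.max())  # both within [1 - eps, 1 + eps] = [0.7, 1.3]
```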

Figures (4)

Figure 1: Manifold regression explores the relationship between a manifold-valued variable and a value in vector space. A typical intrinsic manifold regression finds the best-fitted geodesic curve $\gamma$ on $\mathcal{M}$ via (a) minimizing a complex energy function of distance and smoothness, or (b) updating parameters in the local tangent bundle $T\mathcal{M}$. Extrinsic manifold regression (c) models the relationship in the extrinsically embedded space.
Figure 2: DEMR pipeline. Black arrows indicate the forward process; the optimization is shown in the red box.
Figure 3: DIMR pipeline with geodesic loss on $\mathcal{M}$, with the black arrow indicating the forward process.
Figure 4: Comparison of the cumulative distributions of position errors for the pose regression task on $SE(3)$.
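Figure 1(c) and Figure 2 describe regressing in the extrinsically embedded space and then projecting back onto the manifold. The sketch below mimics that flow on a toy problem: linear least squares on hypothetical features stands in for the network, $J$ is assumed to be the inclusion $SO(3) \to \mathbb{R}^9$, and the projection $Pr$ is assumed to be the SVD-based nearest rotation. The synthetic data (rotations about the $z$-axis plus Gaussian noise) is invented for illustration; this is not the authors' pipeline.

```python
import numpy as np

def rot_z(x):
    """Rotation by angle x about the z-axis."""
    c, s = np.cos(x), np.sin(x)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def project_to_so3(M):
    """Assumed Pr: nearest rotation to a 3x3 matrix (SVD, det fixed to +1)."""
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt

def features(x):
    """Hypothetical input features; a stand-in for a network backbone."""
    return np.stack([np.ones_like(x), np.cos(x), np.sin(x)], axis=1)

rng = np.random.default_rng(2)
x_train = rng.uniform(-np.pi, np.pi, 200)
Y_train = np.stack([rot_z(x).ravel() for x in x_train])  # J(R) as 9D targets
noise = 0.05 * rng.standard_normal(Y_train.shape)

# Extrinsic regression: ordinary least squares in R^9, no geodesic loss.
W, *_ = np.linalg.lstsq(features(x_train), Y_train + noise, rcond=None)

x_test = np.array([0.4, -2.0, 2.9])
raw = features(x_test) @ W  # predictions in the embedded space
R_pred = [project_to_so3(r.reshape(3, 3)) for r in raw]
```

The raw predictions are generally not rotations; only the final projection step enforces the manifold constraint, which is the division of labor the DEMR pipeline relies on.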

Theorems & Definitions (20)

Lemma 3.1
Proposition 3.2
Proposition 3.3
Proposition 3.4
proof
Proposition 3.5
Proposition 3.6
Corollary 3.7
Lemma 5.1
proof
...and 10 more