Intrinsic Dimension Estimating Autoencoder (IDEA) Using CancelOut Layer and a Projected Loss
Antoine Oriou, Philipp Krah, Julian Koellermeier
TL;DR
This work tackles intrinsic-dimension estimation for high-dimensional data that lie on linear or nonlinear manifolds and pairs it with faithful reconstruction from a reduced latent space. It introduces IDEA, a neural autoencoder with a re-weighted double CancelOut bottleneck and a projected reconstruction loss that enforces latent sparsity without sacrificing accuracy. Across synthetic Legendre-based datasets and benchmark manifolds, IDEA accurately identifies the true intrinsic dimension $d$ and reconstructs data using a compact latent space, often outperforming standard estimators and baselines. The method is further demonstrated on vertically resolved free-surface flow data, where IDEA achieves a low-dimensional, physically interpretable representation that competes with, and can surpass, traditional proper orthogonal decomposition (POD) approaches in efficiency and interpretability.
Abstract
This paper introduces the Intrinsic Dimension Estimating Autoencoder (IDEA), which identifies the underlying intrinsic dimension of a wide range of datasets whose samples lie on either linear or nonlinear manifolds. Beyond estimating the intrinsic dimension, IDEA can also reconstruct the original dataset after projecting it onto the corresponding latent space, which is structured using re-weighted double CancelOut layers. Our key contribution is the projected reconstruction loss term, which guides the training of the model by continuously assessing the reconstruction quality when an additional latent dimension is removed. We first assess the performance of IDEA on a series of theoretical benchmarks to validate its robustness. These experiments allow us to test its reconstruction ability and compare its performance with state-of-the-art intrinsic dimension estimators. The benchmarks show that our approach is accurate and versatile. Subsequently, we apply our model to data generated from the numerical solution of a vertically resolved one-dimensional free-surface flow, following a pointwise discretization of the vertical velocity profile in the horizontal direction, vertical direction, and time. IDEA succeeds in estimating the dataset's intrinsic dimension and then reconstructs the original solution by working directly within the projection space identified by the network.
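To make the idea of assessing reconstruction "under the removal of an additional latent dimension" concrete, the following is a minimal sketch and not the implementation used in this work. It assumes a CancelOut-style gate that multiplies each latent coordinate by the sigmoid of a learnable weight, and it compares the reconstruction error when keeping the `k` most open gates against the error when keeping only `k - 1`. The function names, the ranking of latent coordinates by gate weight, and the toy identity encoder and decoder are all hypothetical; the re-weighting and the double-layer arrangement are not modeled.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def cancelout(latent, weights):
    # Assumed gate form: each latent coordinate is scaled by sigmoid(weight).
    return [z * sigmoid(w) for z, w in zip(latent, weights)]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def projected_losses(x, encode, decode, weights, k):
    """Return (error keeping k latent coordinates, error keeping k - 1).

    Coordinates are ranked by gate weight (an assumption of this sketch);
    the dropped coordinates are set to zero before decoding.
    """
    z = cancelout(encode(x), weights)
    order = sorted(range(len(z)), key=lambda i: weights[i], reverse=True)

    def reconstruction_error(keep):
        kept = set(order[:keep])
        z_proj = [z[i] if i in kept else 0.0 for i in range(len(z))]
        return mse(x, decode(z_proj))

    return reconstruction_error(k), reconstruction_error(k - 1)


# Toy example: a 3-component sample that only uses two coordinates.
weights = [4.0, 2.0, -6.0]            # hypothetical gate weights
encode = lambda x: list(x)            # identity encoder (toy)
decode = lambda z: [zi / sigmoid(wi) for zi, wi in zip(z, weights)]  # undoes the gate

x = [1.0, 2.0, 0.0]
loss_k, loss_km1 = projected_losses(x, encode, decode, weights, k=2)
print(loss_k, loss_km1)
```

In this toy setting the error with two latent coordinates is essentially zero, while removing one more coordinate raises it to about 4/3, which is the kind of jump that signals an intrinsic dimension of two. How such terms are weighted and combined during training is specified in the body of the paper, not here.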
