Manifold Learning via Foliations and Knowledge Transfer
E. Tron, E. Fioresi
TL;DR
Addresses how to represent the geometry of high-dimensional data by using a trained CNN classifier to induce a foliation on data space via the Data Information Matrix (DIM) $D(x,w)$. The distribution $\mathcal{D}_x = \mathrm{span}_{i}\{\nabla_x \log p_i(y|x,w)\}$ on data space defines a learning foliation. Its singular points lie in a measure-zero set, so a regular foliation exists locally almost everywhere, and Frobenius-type integrability provides the leaves. Empirically, the leaves align with real data: moving along a leaf preserves meaningful predictions, while moving in orthogonal directions degrades accuracy. The spectrum of $D(x,w)$ serves as a proxy for the distance between datasets, so comparing DIM eigenvalues across datasets can inform knowledge transfer. The framework goes beyond the traditional manifold assumption, offering a geometric, information-theoretic approach to dimensionality reduction and transfer in deep classifiers.
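To make the link between $D(x,w)$ and $\mathcal{D}_x$ concrete, the sketch below assumes the usual DIM definition $D(x,w) = \sum_i p_i(y|x,w)\,\nabla_x \log p_i(y|x,w)\,\nabla_x \log p_i(y|x,w)^T$. Because every $p_i > 0$ under a softmax, the image of $D(x,w)$ is exactly $\mathcal{D}_x$, and since $\sum_i p_i \nabla_x \log p_i = \nabla_x \sum_i p_i = 0$, the rank of $D(x,w)$ is at most $C-1$ for $C$ classes. A small one-hidden-layer ReLU network with random weights stands in for the trained CNN; the function names and the toy sizes are hypothetical.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the class logits.
    e = np.exp(z - z.max())
    return e / e.sum()


def relu_mlp(x, W1, b1, W2, b2):
    """Hypothetical one-hidden-layer ReLU classifier (stand-in for the CNN).

    Returns the logits z(x) and the Jacobian dz/dx, which is exact inside
    the linear region of the ReLU network that contains x.
    """
    pre = W1 @ x + b1
    mask = (pre > 0).astype(float)        # active ReLU units at x
    z = W2 @ (pre * mask) + b2
    J = W2 @ (mask[:, None] * W1)          # shape (C, d)
    return z, J


def data_information_matrix(x, params):
    """D(x,w) = sum_i p_i * grad_x log p_i * (grad_x log p_i)^T."""
    z, J = relu_mlp(x, *params)
    p = softmax(z)
    C = p.shape[0]
    # Row i of G is grad_x log p_i = J^T (e_i - p), stored as a row vector.
    G = (np.eye(C) - p[None, :]) @ J       # shape (C, d)
    # Weighted sum of outer products: G^T diag(p) G.
    D = G.T @ (p[:, None] * G)             # shape (d, d)
    return D, G, p


# Toy setup (assumption): random weights, d = 20 inputs, C = 5 classes.
rng = np.random.default_rng(0)
d, hidden, C = 20, 32, 5
params = (
    rng.normal(size=(hidden, d)) / np.sqrt(d),
    rng.normal(size=hidden),
    rng.normal(size=(C, hidden)) / np.sqrt(hidden),
    np.zeros(C),
)
x = rng.normal(size=d)

D, G, p = data_information_matrix(x, params)
eigvals = np.linalg.eigvalsh(D)[::-1]      # DIM spectrum, descending
print("rank of D:", np.linalg.matrix_rank(D), "<= C - 1 =", C - 1)
print("top eigenvalues:", np.round(eigvals[:C], 6))
```

Since a ReLU network is piecewise linear, the Jacobian `J` is exact within the linear region containing `x`, so this toy case needs no automatic differentiation; for a real CNN one would obtain the gradients $\nabla_x \log p_i$ with an autodiff framework instead.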
Abstract
Understanding how real data is distributed in high-dimensional spaces is key to many tasks in machine learning. We aim to endow the data space with a natural geometric structure by employing a deep ReLU neural network trained as a classifier. Through the data information matrix (DIM), a variation of the Fisher information matrix, the model induces a singular foliation on the data space. We show that the singular points of this foliation are contained in a measure-zero set, and that a local regular foliation exists almost everywhere. Experiments show that the data are correlated with the leaves of this foliation. Moreover, we show the potential of our approach for knowledge transfer by using the spectrum of the DIM to measure distances between datasets.
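The abstract does not state how the DIM spectrum is turned into a distance. As a minimal sketch, assuming one averages the top eigenvalues of $D(x,w)$ over samples from each dataset and compares the averaged spectra with a Euclidean norm (both hypothetical choices, as are the random-weight ReLU network and the synthetic Gaussian "datasets"), the computation could look like this:

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the class logits.
    e = np.exp(z - z.max())
    return e / e.sum()


def dim(x, W1, b1, W2, b2):
    # DIM of a hypothetical one-hidden-layer ReLU classifier at x:
    # D = sum_i p_i g_i g_i^T with g_i = grad_x log p_i = J^T (e_i - p).
    pre = W1 @ x + b1
    mask = (pre > 0).astype(float)
    p = softmax(W2 @ (pre * mask) + b2)
    J = W2 @ (mask[:, None] * W1)
    G = (np.eye(len(p)) - p[None, :]) @ J
    return G.T @ (p[:, None] * G)


def mean_dim_spectrum(X, params, k):
    # Average of the top-k DIM eigenvalues (descending) over the rows of X.
    spectra = [np.linalg.eigvalsh(dim(x, *params))[::-1][:k] for x in X]
    return np.mean(spectra, axis=0)


def spectral_distance(spec_a, spec_b):
    # Hypothetical choice: Euclidean distance between averaged spectra.
    return float(np.linalg.norm(spec_a - spec_b))


rng = np.random.default_rng(1)
d, hidden, C = 20, 32, 5
params = (
    rng.normal(size=(hidden, d)) / np.sqrt(d),
    rng.normal(size=hidden),
    rng.normal(size=(C, hidden)) / np.sqrt(hidden),
    np.zeros(C),
)
# Synthetic stand-ins: two samples from one Gaussian, one shifted Gaussian.
A = rng.normal(size=(200, d))
B = rng.normal(size=(200, d))
S = rng.normal(loc=2.0, size=(200, d))

k = C - 1  # the DIM has rank at most C - 1, so only k eigenvalues matter
sA, sB, sS = (mean_dim_spectrum(X, params, k) for X in (A, B, S))
print("d(A, B) =", spectral_distance(sA, sB))
print("d(A, S) =", spectral_distance(sA, sS))
```

Keeping only the top $C-1$ eigenvalues discards the part of the spectrum that is zero by construction. No ordering between the two printed distances is claimed for this untrained toy model; with a trained classifier and real datasets, the question is whether such a spectral distance tracks how well knowledge transfers.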
