Neural Feature Learning in Function Space
Xiangxiang Xu, Lizhong Zheng
TL;DR
This work introduces a principled framework for neural feature learning built on a function-space feature geometry, which links statistical dependence to learned features through the canonical dependence kernel and the H-score. It then develops a nesting technique for decomposing and learning dependence components (and their modal decompositions) in multivariate settings, so that learned features can be assembled into different inference models without retraining. The approach is demonstrated on conditional inference, learning with side information, and multimodal learning with missing modalities, with theoretical results connecting it to maximum entropy, maximum likelihood estimation (MLE) in local regimes, classical regression, and multitask networks. Empirically, the authors verify the recovery of maximal-variance dependence modes on discrete, continuous, and sequential data, and show that posterior relations and conditional expectations can be reconstructed from the learned features. Overall, the framework offers a scalable, interpretable, and modular way to use neural feature extractors for representing rich multivariate dependence.
Abstract
We present a novel framework for learning system design with neural feature extractors. First, we introduce the feature geometry, which unifies statistical dependence and feature representations in a function space equipped with inner products. This connection defines function-space concepts on statistical dependence, such as norms, orthogonal projection, and spectral decomposition, exhibiting clear operational meanings. In particular, we associate each learning setting with a dependence component and formulate learning tasks as finding corresponding feature approximations. We propose a nesting technique, which provides systematic algorithm designs for learning the optimal features from data samples with off-the-shelf network architectures and optimizers. We further demonstrate multivariate learning applications, including conditional inference and multimodal learning, where we present the optimal features and reveal their connections to classical approaches.
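To make the link between dependence and features concrete, the following is a minimal numerical sketch for a toy discrete pair (X, Y). It is not the authors' implementation. It assumes the canonical dependence kernel takes the form i(x, y) = p(x, y) / (p(x) p(y)) - 1 and the H-score takes the form H(f, g) = E[f(X)^T g(Y)] - E[f(X)]^T E[g(Y)] - (1/2) tr(Λ_f Λ_g), with Λ_f = E[f(X) f(X)^T]; neither formula is stated in the abstract itself. The function name `h_score`, the toy table `p_xy`, and the SVD-based construction of the features are illustrative choices only; in the framework described above, the features would instead be produced by trained neural networks.

```python
import numpy as np

def h_score(f, g, p_xy):
    """H-score of features f (|X| x k) and g (|Y| x k) under joint pmf p_xy.

    Assumed form: E[f^T g] - E[f]^T E[g] - 0.5 * tr(Lambda_f Lambda_g).
    """
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    joint_term = np.sum(p_xy * (f @ g.T))   # E[f(X)^T g(Y)] under the joint
    mean_term = (p_x @ f) @ (p_y @ g)       # E[f(X)]^T E[g(Y)]
    lam_f = f.T @ (p_x[:, None] * f)        # E[f(X) f(X)^T]
    lam_g = g.T @ (p_y[:, None] * g)        # E[g(Y) g(Y)^T]
    return joint_term - mean_term - 0.5 * np.trace(lam_f @ lam_g)

# Hypothetical 3 x 3 joint distribution (entries sum to 1).
p_xy = np.array([[0.20, 0.05, 0.05],
                 [0.05, 0.25, 0.05],
                 [0.05, 0.05, 0.25]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
prod = np.outer(p_x, p_y)

# Canonical dependence kernel and its squared norm under p_x * p_y.
cdk = p_xy / prod - 1.0
cdk_norm_sq = np.sum(prod * cdk ** 2)

# Modal decomposition via the SVD of the sqrt-weighted kernel.
u, s, vt = np.linalg.svd(np.sqrt(prod) * cdk)

# Rank-k features built from the top-k modes (k = 1 here).
k = 1
f = np.sqrt(s[:k]) * u[:, :k] / np.sqrt(p_x)[:, None]
g = np.sqrt(s[:k]) * vt[:k].T / np.sqrt(p_y)[:, None]

score = h_score(f, g, p_xy)
print(score, 0.5 * np.sum(s[:k] ** 2), 0.5 * cdk_norm_sq)
```

Under these assumptions, the H-score of the top-k modal features equals half the sum of the top-k squared singular values, and it is bounded above by half the squared norm of the kernel. This is the sense in which maximizing the H-score corresponds to finding the best rank-k feature approximation of the dependence.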
