Nonparametric Identification of Latent Concepts
Yujia Zheng, Shaoan Xie, Kun Zhang
TL;DR
This work tackles the problem of identifying the latent concepts underlying observed data without assuming a parametric generative model. It introduces a nonparametric identifiability framework based on learning by comparison across diverse classes, covering both local (pairwise) and global (multi-class) comparisons. The authors prove that, under sufficient diversity across classes, class-dependent concepts can be identified up to permutation and invertible transformations, and that the structure connecting classes to concepts can also be recovered nonparametrically. When diversity is incomplete, the guarantees extend to partial identifiability. Experiments on synthetic and real-world datasets support the theory and show improved latent recovery.
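To make the identifiability claim concrete, one common way to formalize "up to permutation and invertible transformations" is sketched below. The notation ($z$, $\hat{z}$, $\pi$, $h_i$, $n$) is introduced here for illustration and is not taken from the paper. The sketch also assumes that the invertible transformations act component-wise, which the summary above does not state explicitly.

```latex
% Illustrative notation (assumed, not from the paper):
%   z = (z_1, ..., z_n)   true latent concepts
%   \hat{z}               concepts recovered by an estimator
%   \pi                   a permutation of {1, ..., n}
%   h_i                   an invertible scalar function
\hat{z}_{\pi(i)} = h_i(z_i), \qquad i = 1, \dots, n .
```

Under this reading, each recovered component depends on exactly one true concept, so the concepts are disentangled even though their ordering and individual parameterization remain undetermined.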
Abstract
We are born with the ability to learn concepts by comparing diverse observations. This helps us to understand the new world in a compositional manner and facilitates extrapolation, as objects naturally consist of multiple concepts. In this work, we argue that the cognitive mechanism of comparison, fundamental to human learning, is also vital for machines to recover true concepts underlying the data. This offers correctness guarantees for the field of concept learning, which, despite its impressive empirical successes, still lacks general theoretical support. Specifically, we aim to develop a theoretical framework for the identifiability of concepts with multiple classes of observations. We show that with sufficient diversity across classes, hidden concepts can be identified without assuming specific concept types, functional relations, or parametric generative models. Interestingly, even when conditions are not globally satisfied, we can still provide alternative guarantees for as many concepts as possible based on local comparisons, thereby extending the applicability of our theory to more flexible scenarios. Moreover, the hidden structure between classes and concepts can also be identified nonparametrically. We validate our theoretical results in both synthetic and real-world settings.
