Learning Visual-Semantic Subspace Representations
Gabriel Moreira, Manuel Marques, João Paulo Costeira, Alexander Hauptmann
TL;DR
This paper addresses learning image representations that respect semantic partial orders and enable logical reasoning by introducing a nuclear norm-based loss grounded in information-theoretic principles. The core idea is a joint low-rank formulation where $Z=YX$, with a loss $l(X)$ that combines the nuclear norm $\|Z\|_*$, a second nuclear-norm term weighted by $-\alpha$, and the regularizer $\beta\|X\|_2^2$; this loss yields a spectral embedding of the label Gram matrix $Y^\top Y$ and prevents representation collapse. The learned representations form a Boolean subspace lattice, enabling propositional queries via projection operators and supporting multi-label classification and complex retrieval tasks with logical queries. Empirical results on standard benchmarks and CelebA demonstrate competitive classification performance and effective retrieval with negations, while theoretical results guarantee orthogonalization of minterms and a spectral geometry aligned with semantics. Overall, the work provides a principled, interpretable, and modality-agnostic framework for visual-semantic representation learning with strong connections to symbolic reasoning.
Abstract
Learning image representations that capture rich semantic relationships remains a significant challenge. Existing approaches are either contrastive, lacking robust theoretical guarantees, or struggle to effectively represent the partial orders inherent to structured visual-semantic data. In this paper, we introduce a nuclear norm-based loss function, grounded in the same information-theoretic principles that have proved effective in self-supervised learning. We present a theoretical characterization of this loss, demonstrating that, in addition to promoting class orthogonality, it encodes the spectral geometry of the data within a subspace lattice. This geometric representation allows us to associate logical propositions with subspaces, ensuring that our learned representations adhere to a predefined symbolic structure.
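
To make the association between logical propositions and subspaces concrete, the following is a minimal sketch, not the paper's implementation. It assumes an idealized case in which the four minterms of two hypothetical attributes A and B lie along orthogonal axes of $\mathbb{R}^4$; the names `projector` and `score`, the attribute layout, and the example vector are illustrative assumptions. Under that assumption the projectors commute, so conjunction, disjunction, and negation reduce to simple projector algebra.

```python
import numpy as np

def projector(basis):
    """Orthogonal projector onto the column span of `basis`."""
    q, _ = np.linalg.qr(basis)  # orthonormalize the columns
    return q @ q.T

# Assumed layout (hypothetical): one orthogonal axis per minterm.
# axis 0: A and B, axis 1: A and not B, axis 2: not A and B, axis 3: neither
E = np.eye(4)
I = np.eye(4)

P_A = projector(E[:, [0, 1]])  # subspace for proposition A
P_B = projector(E[:, [0, 2]])  # subspace for proposition B

# Because the minterm axes are orthogonal, P_A and P_B commute,
# and Boolean operations become projector algebra.
P_not_B = I - P_B                      # negation: orthogonal complement
P_A_and_not_B = P_A @ P_not_B          # conjunction: subspace intersection
P_A_or_B = P_A + P_B - P_A @ P_B       # disjunction: subspace sum

def score(P, z):
    """Fraction of the squared norm of z captured by the subspace of P."""
    return float(np.linalg.norm(P @ z) ** 2 / np.linalg.norm(z) ** 2)

# A hypothetical embedding lying mostly along the "A and not B" axis.
z = np.array([0.1, 0.9, 0.0, 0.05])
query_score = score(P_A_and_not_B, z)
```

In this sketch, a retrieval query such as "A and not B" would rank items by `score(P_A_and_not_B, z)`. When projectors do not commute, intersections and sums of subspaces need a more careful construction than the products used here.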
