Binding via Reconstruction Clustering
Klaus Greff, Rupesh Kumar Srivastava, Jürgen Schmidhuber
TL;DR
The paper addresses the binding problem in representation learning by proposing Reconstruction Clustering (RC), an unsupervised framework that treats inputs as compositions of multiple objects and binds features according to their mutual predictability, as learned by a denoising autoencoder. RC alternates between updating object-specific representations and per-pixel object assignments, yielding soft or hard clusterings that align with ground-truth segments on synthetic binary-image datasets. It achieves robust binding, measured by adjusted mutual information (AMI > 0.5 across datasets and often > 0.8), converges rapidly, and generalizes to novel object configurations and domains. The work provides a mathematically grounded approach to dynamic, object-based binding that could extend to real-valued data and multiple modalities, and suggests that Gestalt principles may arise as emergent properties of learned representations.
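Binding quality is reported as adjusted mutual information (AMI) between ground-truth segment labels and cluster assignments. The following is a from-scratch sketch of AMI with the expected-MI correction of Vinh et al. (2010). The arithmetic-mean normalization, the degenerate-case handling, and the name `adjusted_mutual_info` are assumptions for illustration. The paper's exact scoring details, such as which pixels are scored, are not given here.

```python
import math
import numpy as np

def _entropy(counts, n):
    # Shannon entropy (nats) of a labeling, given its cluster sizes.
    p = counts[counts > 0] / n
    return -np.sum(p * np.log(p))

def adjusted_mutual_info(labels_true, labels_pred):
    """AMI with arithmetic-mean normalization (assumed, not from the paper)."""
    _, u = np.unique(labels_true, return_inverse=True)
    _, v = np.unique(labels_pred, return_inverse=True)
    n = len(u)
    # Contingency table: table[i, j] = #items in true segment i and cluster j.
    table = np.zeros((u.max() + 1, v.max() + 1))
    np.add.at(table, (u, v), 1)
    a, b = table.sum(axis=1), table.sum(axis=0)
    # Observed mutual information.
    nz = table > 0
    mi = np.sum(table[nz] / n * np.log(n * table[nz] / np.outer(a, b)[nz]))
    # Expected MI under random labelings with the same cluster sizes
    # (hypergeometric model), summed in log space for stability.
    lg = math.lgamma
    emi = 0.0
    for ai in a:
        for bj in b:
            for nij in range(max(1, int(ai + bj - n)), int(min(ai, bj)) + 1):
                log_p = (lg(ai + 1) + lg(bj + 1) + lg(n - ai + 1) + lg(n - bj + 1)
                         - lg(n + 1) - lg(nij + 1) - lg(ai - nij + 1)
                         - lg(bj - nij + 1) - lg(n - ai - bj + nij + 1))
                emi += nij / n * math.log(n * nij / (ai * bj)) * math.exp(log_p)
    denom = 0.5 * (_entropy(a, n) + _entropy(b, n)) - emi
    if abs(denom) < 1e-12:  # degenerate case, e.g. one cluster on both sides
        return 1.0
    return (mi - emi) / denom

# A relabeled but identical partition scores (numerically) 1.
print(adjusted_mutual_info([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
```

Because AMI subtracts the MI expected by chance, a random clustering scores near 0 and can even go negative. This is what makes thresholds such as AMI > 0.5 meaningful. scikit-learn's `adjusted_mutual_info_score` with `average_method="arithmetic"` computes the same quantity and may be preferable in practice.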
Abstract
Disentangled distributed representations of data are desirable for machine learning, since they are more expressive and can generalize from fewer examples. However, for complex data, the distributed representations of multiple objects present in the same input can interfere and lead to ambiguities, which is commonly referred to as the binding problem. We argue for the importance of the binding problem to the field of representation learning, and develop a probabilistic framework that explicitly models inputs as a composition of multiple objects. We propose an unsupervised algorithm that uses denoising autoencoders to dynamically bind features together in multi-object inputs through an Expectation-Maximization-like clustering process. The effectiveness of this method is demonstrated on artificially generated datasets of binary images, showing that it can even generalize to bind together new objects never seen by the autoencoder during training.
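The EM-like binding loop can be sketched for a flattened binary image as follows. The trained denoising autoencoder is replaced by a hypothetical callable `denoise`. Two further choices are assumptions, not a transcription of the authors' algorithm. In the M-like step, each group's soft-masked input $\gamma_k \odot x$ is passed through the autoencoder. In the E-like step, pixels are reassigned by a Bernoulli likelihood. The names `reconstruction_clustering`, `k`, and `n_iter` are also illustrative.

```python
import numpy as np

def reconstruction_clustering(x, denoise, k=3, n_iter=10, seed=0, eps=1e-6):
    """Soft-cluster the pixels of a flattened binary image x into k groups.

    denoise: hypothetical stand-in for the trained DAE; maps an array in
             [0, 1] to a same-shaped reconstruction in [0, 1].
    Returns (gamma, s): gamma[j, i] is the soft assignment of pixel i to
    group j; s[j, i] is group j's predicted probability that pixel i is on.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Random soft initial assignments; each pixel's column sums to 1.
    gamma = rng.random((k, x.size))
    gamma /= gamma.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # M-like step (assumed form): each group's soft-masked view of the
        # input is cleaned up by the autoencoder into that group's estimate.
        s = np.stack([denoise(gamma[j] * x) for j in range(k)])
        s = np.clip(s, eps, 1 - eps)  # keep likelihoods strictly positive
        # E-like step: reassign each pixel in proportion to how well each
        # group's estimate predicts its observed value (Bernoulli likelihood).
        lik = s ** x * (1 - s) ** (1 - x)
        gamma = lik / lik.sum(axis=0, keepdims=True)
    return gamma, s

# Toy usage with a hypothetical "denoiser" that merely sharpens its input.
x = np.array([1, 1, 0, 0, 1, 0, 1, 1])
gamma, s = reconstruction_clustering(x, lambda v: np.clip(1.5 * v, 0.0, 1.0), k=2)
hard = gamma.argmax(axis=0)  # hard clustering: one group index per pixel
```

With a trained autoencoder in place of the toy lambda, the groups would be expected to separate into objects because pixels of the same object are mutually predictable under the learned model. The sharpening lambda here only exercises the control flow and does not bind anything meaningful.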
