Binding via Reconstruction Clustering

Klaus Greff, Rupesh Kumar Srivastava, Jürgen Schmidhuber

TL;DR

The paper addresses the binding problem in representation learning by proposing Reconstruction Clustering (RC), an unsupervised framework that treats inputs as compositions of multiple objects and binds features via mutual predictability learned through a denoising autoencoder. RC alternates between updating object-specific representations and per-pixel object assignments, yielding soft/hard clustering that aligns with ground-truth segments on synthetic binary-image datasets. It demonstrates robust binding performance (AMI > 0.5 across datasets and often > 0.8) and rapid convergence, while also showing generalization to novel object configurations and domains. The work provides a mathematically grounded approach to dynamic, object-based binding that could extend to real-valued data and multiple modalities, and suggests connections to Gestalt principles as emergent properties of learned representations.

Abstract

Disentangled distributed representations of data are desirable for machine learning, since they are more expressive and can generalize from fewer examples. However, for complex data, the distributed representations of multiple objects present in the same input can interfere and lead to ambiguities, which is commonly referred to as the binding problem. We argue for the importance of the binding problem to the field of representation learning, and develop a probabilistic framework that explicitly models inputs as a composition of multiple objects. We propose an unsupervised algorithm that uses denoising autoencoders to dynamically bind features together in multi-object inputs through an Expectation-Maximization-like clustering process. The effectiveness of this method is demonstrated on artificially generated datasets of binary images, showing that it can even generalize to bind together new objects never seen by the autoencoder during training.
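The alternation described above (update one representation per object, then reassign pixels) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the image is flattened to a 1-D list of binary pixels, each pixel is scored with a Bernoulli likelihood, the soft assignments `gamma` are initialised at random, and `toy_reconstruct` is a hypothetical stand-in for a trained denoising autoencoder (it simply predicts each pixel from its two neighbours).

```python
import random


def toy_reconstruct(weighted):
    """Hypothetical stand-in for a trained denoising autoencoder.

    Predicts each pixel as the mean of its two neighbours in the
    weighted input, clipped so probabilities stay away from 0 and 1.
    """
    n = len(weighted)
    out = []
    for i in range(n):
        left = weighted[i - 1] if i > 0 else 0.0
        right = weighted[i + 1] if i < n - 1 else 0.0
        p = 0.5 * (left + right)
        out.append(min(max(p, 0.05), 0.95))
    return out


def bernoulli(x, p):
    """Likelihood of a binary pixel x under predicted probability p."""
    return p if x == 1 else 1.0 - p


def reconstruction_clustering(x, k=2, iterations=5, seed=0,
                              reconstruct=toy_reconstruct):
    """EM-like loop: returns soft assignments gamma[i][c] per pixel i, cluster c."""
    rng = random.Random(seed)

    # Assumption: random soft initialisation, normalised per pixel.
    gamma = []
    for _ in x:
        row = [rng.random() + 1e-6 for _ in range(k)]
        total = sum(row)
        gamma.append([g / total for g in row])

    for _ in range(iterations):
        # Representation update: each cluster reconstructs the input
        # weighted by its own current responsibilities.
        theta = [
            reconstruct([gamma[i][c] * x[i] for i in range(len(x))])
            for c in range(k)
        ]
        # Assignment update: a pixel moves towards the cluster whose
        # reconstruction predicts it best.
        for i, xi in enumerate(x):
            lik = [bernoulli(xi, theta[c][i]) for c in range(k)]
            total = sum(lik)
            gamma[i] = [value / total for value in lik]

    return gamma


if __name__ == "__main__":
    for row in reconstruction_clustering([1, 1, 0, 0, 1, 1], k=2):
        print([round(g, 3) for g in row])
```

Passing the reconstruction function as an argument keeps the loop independent of the model, so a real denoising autoencoder could be swapped in for the toy predictor without changing the clustering code.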

Paper Structure

This paper contains 27 sections, 20 equations, 15 figures, and 2 tables.

Figures (15)

  • Figure 1: (a) A Greeble. (b) Example demonstrating the binding problem. (c) An illustration of intra-object predictability. The missing pixels from the square can be predicted using other pixels constituting the box, but not from pixels constituting other objects.
  • Figure 2: (a) The assumed probabilistic structure. (b) A schematic illustration of one iteration of the RC algorithm.
  • Figure 3: One example from each of the six datasets. The input images are shown on the top row with the corresponding ground-truth grouping below.
  • Figure 4: Left: Mean AMI score over 1000 test samples for all datasets and various numbers of clusters $K$. Right: Convergence of the log-likelihood on the shapes dataset for different numbers of clusters, showing the mean (line) and standard deviation (shaded) over the test set.
  • Figure 5: The top plot shows the score and confidence for each of the 1000 test images from the shapes dataset, sorted by score. The confidence is the average value of $\max_k \gamma_{ik}$ for each evaluated pixel (non-background, non-overlap). The central part of the figure shows six examples (columns) along with the cluster assignments (indicated by different colors) over RC iterations. The corresponding ground-truth is shown at the bottom. The right vertical plot shows the log-likelihood over the RC iterations corresponding to the displayed cluster assignments. Similar plots for the other datasets are included in the Appendix.
  • ...and 10 more figures
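
The confidence measure in the Figure 5 caption, the average of $\max_k \gamma_{ik}$ over evaluated pixels, can be computed as in the sketch below. The function name `confidence`, the list-of-lists layout for `gamma`, and the boolean `evaluated` mask (true for non-background, non-overlap pixels) are assumptions made for illustration, not names from the paper.

```python
def confidence(gamma, evaluated):
    """Mean of max_k gamma[i][k] over the pixels flagged in `evaluated`.

    gamma:     per-pixel soft assignments, one row of K values per pixel.
    evaluated: per-pixel booleans; True marks pixels that are scored
               (assumed here to be non-background, non-overlap pixels).
    """
    peaks = [max(row) for row, keep in zip(gamma, evaluated) if keep]
    if not peaks:
        raise ValueError("no evaluated pixels")
    return sum(peaks) / len(peaks)


if __name__ == "__main__":
    gamma = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
    evaluated = [True, False, True]
    # The middle pixel is skipped, so the result is the mean of 0.9 and 0.8.
    print(confidence(gamma, evaluated))
```

A value close to 1 means nearly every evaluated pixel is assigned almost entirely to a single cluster, while a value near 1/K means the assignments are close to uniform.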