
Binding Dynamics in Rotating Features

Sindy Löwe, Francesco Locatello, Max Welling

TL;DR

This work investigates how Rotating Features can form object-centric representations through binding dynamics. It introduces cosine binding as an explicit, alignment-driven alternative to the original $\bm{\chi}$-binding, enabling a clearer view of the dynamics that support object grouping. Empirically, cosine binding achieves performance on par with $\bm{\chi}$-binding across datasets like Pascal VOC and FoodSeg103, while offering stronger interpretability and clearer links to neuroscience and self-attention. The authors also highlight substantial memory and time costs of the alignment-based approach and suggest spiking-neural-network implementations as a promising direction for scalable, biologically plausible binding. Overall, the study advances understanding of how alignment-based mechanisms can yield robust object-centric representations with practical implications for generalization and reasoning in neural models.

Abstract

In human cognition, the binding problem describes the open question of how the brain flexibly integrates diverse information into cohesive object representations. Analogously, machine learning research pursues models capable of strong generalization and reasoning by learning object-centric representations in an unsupervised manner. Drawing from neuroscientific theories, Rotating Features learn such representations by introducing vector-valued features that encapsulate object characteristics in their magnitudes and object affiliation in their orientations. The "$\bm{\chi}$-binding" mechanism, embedded in every layer of the architecture, has been shown to be crucial, but remains poorly understood. In this paper, we propose an alternative "cosine binding" mechanism, which explicitly computes the alignment between features and adjusts weights accordingly, and we show that it achieves equivalent performance. This allows us to draw direct connections to self-attention and biological neural processes, and to shed light on the fundamental dynamics that allow object-centric representations to emerge in Rotating Features.

Paper Structure

This paper contains 15 sections, 6 equations, 2 figures, and 1 table.

Figures (2)

  • Figure 1: Coincidence detection in biological neurons (left) compared to cosine binding in Rotating Features (right). In biological neurons, incoming spikes are integrated temporally. When the resulting membrane potential (dotted line) reaches threshold (dashed horizontal line), an output spike is emitted, and the membrane potential is reset. Synchronously arriving spikes (dark green) trigger an output spike, while incoming spikes that arrive immediately after an output spike or occur individually (light green) have no impact on the output. In cosine binding, groups of aligned input features result in an intermediate output of similar orientation (gray line, centered at zero). Then, inputs are weighted based on their alignment with this intermediate output (highlighted by different shades of green), leading to the masking of unaligned features. In both setups, synchronization regulates signal transmission. Aligned inputs jointly generate the output and are thus processed together, while unaligned inputs are masked to minimize their influence on this output.
  • Figure 2: The $\bm{\chi}$- and cosine binding mechanisms assign different orientations to objects. With $\bm{\chi}$-binding (left), Rotating Features adopt maximally separated orientations for different objects -- about $180^{\circ}$ apart for two objects, and roughly $120^{\circ}$ apart when three are present. The cosine binding mechanism (right), on the other hand, results in a tighter clustering of orientations for different objects. Nonetheless, both mechanisms create accurate segmentation masks in their predictions.
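
The cosine binding steps described in the Figure 1 caption can be illustrated with a small toy sketch. This is not the authors' implementation: the function name `cosine_binding`, the helper `unit`, the restriction to 2-D feature vectors and a single output feature, and the clamping of negative similarities to zero are all assumptions made for illustration. The paper's own equations may scale or normalize the alignment differently.

```python
import math


def cosine_binding(inputs, weights, eps=1e-8):
    """Toy cosine-binding step for a single output feature (hypothetical sketch).

    inputs  -- list of (x, y) rotating-feature vectors (2-D for readability)
    weights -- one scalar weight per input
    Returns the gated output vector and the per-input gates.
    """
    # Step 1: intermediate output, a plain weighted sum of the input vectors.
    px = sum(w * x for w, (x, _) in zip(weights, inputs))
    py = sum(w * y for w, (_, y) in zip(weights, inputs))
    p_norm = math.hypot(px, py)

    # Step 2: alignment (cosine similarity) of each input with that output.
    # ASSUMPTION: negative similarities are clamped to zero so that
    # unaligned inputs are masked; the paper may rescale differently.
    gates = []
    for x, y in inputs:
        cos = (x * px + y * py) / (math.hypot(x, y) * p_norm + eps)
        gates.append(max(0.0, cos))

    # Step 3: recompute the output with alignment-scaled weights.
    ox = sum(w * g * x for w, g, (x, _) in zip(weights, gates, inputs))
    oy = sum(w * g * y for w, g, (_, y) in zip(weights, gates, inputs))
    return (ox, oy), gates


def unit(angle_deg):
    """Unit vector at the given orientation (degrees)."""
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))


# Two inputs with similar orientation and one pointing the opposite way.
features = [unit(0), unit(10), unit(180)]
output, gates = cosine_binding(features, weights=[1.0, 1.0, 1.0])
print("gates:", [round(g, 3) for g in gates])
print("output:", tuple(round(v, 3) for v in output))
```

In this toy setup the two similarly oriented inputs receive gates close to one and jointly determine the output, while the opposing input is gated to zero. This mirrors the masking of unaligned features that the caption describes.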