Binding Dynamics in Rotating Features
Sindy Löwe, Francesco Locatello, Max Welling
TL;DR
This work investigates how Rotating Features can form object-centric representations through binding dynamics. It introduces cosine binding as an explicit, alignment-driven alternative to the original $\bm{\chi}$-binding, enabling a clearer view of the dynamics that support object grouping. Empirically, cosine binding performs on par with $\bm{\chi}$-binding on datasets such as Pascal VOC and FoodSeg103, while offering stronger interpretability and clearer links to neuroscience and self-attention. The authors also highlight the substantial memory and time costs of the alignment-based approach and suggest spiking-neural-network implementations as a promising direction for scalable, biologically plausible binding. Overall, the study advances understanding of how alignment-based mechanisms can yield robust object-centric representations, with practical implications for generalization and reasoning in neural models.
Abstract
In human cognition, the binding problem describes the open question of how the brain flexibly integrates diverse information into cohesive object representations. Analogously, machine learning research pursues models capable of strong generalization and reasoning by learning object-centric representations in an unsupervised manner. Drawing from neuroscientific theories, Rotating Features learn such representations by introducing vector-valued features that encapsulate object characteristics in their magnitudes and object affiliation in their orientations. The "$\bm{\chi}$-binding" mechanism, embedded in every layer of the architecture, has been shown to be crucial, but remains poorly understood. In this paper, we propose an alternative "cosine binding" mechanism, which explicitly computes the alignment between features and adjusts weights accordingly, and we show that it achieves equivalent performance. This allows us to draw direct connections to self-attention and biological neural processes, and to shed light on the fundamental dynamics that allow object-centric representations to emerge in Rotating Features.
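The idea of "computing the alignment between features and adjusting weights accordingly" can be illustrated with a small sketch. This is not the formulation used in the paper: the function names `cosine` and `cosine_bind`, the `reference` orientation, and the choice to rectify negative alignment to zero are all assumptions made purely for illustration. Each vector-valued feature contributes to the output in proportion to how well its orientation matches the reference, so features that share an orientation (and hence, in Rotating Features, an object) are processed together.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def cosine_bind(features, weights, reference):
    """Weighted sum of vector-valued features.

    Each weight is scaled by the rectified cosine alignment between the
    feature's orientation and `reference` (a hypothetical stand-in for the
    orientation the output unit currently attends to).
    """
    out = [0.0] * len(reference)
    for z, w in zip(features, weights):
        # Assumption: inputs pointing away from the reference are ignored.
        align = max(0.0, cosine(z, reference))
        for k in range(len(out)):
            out[k] += w * align * z[k]
    return out


# Two features oriented along x (one "object") and one along y (another).
features = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
weights = [0.5, 0.5, 0.5]
bound = cosine_bind(features, weights, reference=[1.0, 0.0])
```

In this toy setting, only the two x-oriented features contribute to `bound`; the orthogonal feature is suppressed regardless of its magnitude. The alignment-dependent weighting is loosely analogous to how similarity scores modulate contributions in self-attention.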
