Learning Interpretable Low-dimensional Representation via Physical Symmetry
Xuanjie Liu, Daniel Chin, Yichen Huang, Gus Xia
TL;DR
This work introduces SPS, a self-supervised framework that enforces physical symmetry on the latent dynamics of time-series data to learn interpretable, low-dimensional representations. By requiring the prior dynamics $R$ to be equivariant to transformations $S$ in latent space, SPS recovers human-aligned factors, such as a linear pitch factor from music audio and 3D Cartesian coordinates from monocular video, without domain-specific labels. A novel counterfactual representation augmentation mechanism expands the training data in latent space, boosting sample efficiency and aiding interpretability, even under incorrect symmetry assumptions. The approach is demonstrated on two domains (music and video) and extended with SPS+ to enable content–style disentanglement, suggesting that it is useful for learning compact, human-understandable representations in diverse time-series settings.
Abstract
We have recently seen great progress in learning interpretable music representations, ranging from basic factors, such as pitch and timbre, to high-level concepts, such as chord and texture. However, most methods rely heavily on music domain knowledge. It remains an open question what general computational principles give rise to interpretable representations, especially low-dimensional factors that agree with human perception. In this study, we take inspiration from modern physics and use physical symmetry as a self-consistency constraint for the latent space of time-series data. Specifically, physical symmetry requires the prior model that characterises the dynamics of the latent states to be equivariant with respect to certain group transformations. We show that physical symmetry leads the model to learn a linear pitch factor from unlabelled monophonic music audio in a self-supervised fashion. In addition, the same methodology can be applied to computer vision, learning a 3D Cartesian space from videos of a simple moving object without labels. Furthermore, physical symmetry naturally leads to counterfactual representation augmentation, a new technique that improves sample efficiency.
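The equivariance constraint can be illustrated with a toy sketch. This is not the authors' implementation: the learned prior is replaced here by a hand-written constant-velocity extrapolation, the group transformation is assumed to be a latent translation, and all names (`prior_r`, `prior_bad`, `translate_s`, `symmetry_loss`) are hypothetical. The sketch applies several random transformations $S$ to the same latent states, predicts with $R$, maps the prediction back with $S^{-1}$, and compares the result with the untransformed target, which loosely mirrors the idea of counterfactual representation augmentation.

```python
import numpy as np

def prior_r(z_prev, z_cur):
    # Toy stand-in for a learned prior R: constant-velocity extrapolation.
    return 2.0 * z_cur - z_prev

def prior_bad(z_prev, z_cur):
    # A prior that is NOT equivariant to translation, for contrast.
    return z_cur ** 2

def translate_s(z, delta):
    # Assumed group transformation S: translation in latent space.
    return z + delta

def symmetry_loss(prior, z_prev, z_cur, z_next, deltas):
    # For each sampled transformation, compare S^{-1}(R(S(z))) with the target.
    losses = []
    for delta in deltas:
        pred = prior(translate_s(z_prev, delta), translate_s(z_cur, delta))
        pred_back = translate_s(pred, -delta)  # inverse translation
        losses.append(np.mean((pred_back - z_next) ** 2))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
z_prev = np.array([0.0, 1.0, 2.0])
z_cur = np.array([0.5, 1.5, 2.5])
deltas = rng.normal(size=(4, 3))  # four counterfactual translations

# Equivariant prior: the loss vanishes up to floating-point error.
loss_equivariant = symmetry_loss(
    prior_r, z_prev, z_cur, prior_r(z_prev, z_cur), deltas
)

# Non-equivariant prior: the transformed predictions disagree with the target.
loss_broken = symmetry_loss(
    prior_bad, z_prev, z_cur, prior_bad(z_prev, z_cur), deltas
)
```

In this sketch, a prior that commutes with the transformation incurs essentially no penalty, whereas one that does not is penalised. In a trained model, such a penalty would presumably be minimised jointly over the encoder and the prior rather than evaluated on fixed functions.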
