Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding
Tianyu Chen, Xingcheng Fu, Yisen Gao, Haodong Qian, Yuecen Wei, Kun Yan, Haoyi Zhou, Jianxin Li
TL;DR
Galaxy Walker addresses the limitations of Euclidean-only vision-language models for astronomical data by introducing geometry-aware representations that span Euclidean, spherical, and hyperbolic spaces. The framework combines a geometry prompt, which injects multi-space geometric priors, with a geometry adapter, which uses a mixture-of-experts to fuse geometry-aware features into a pre-trained backbone; the model is trained in two stages. It achieves state-of-the-art performance on galaxy property estimation ($R^2$ up to $0.91$) and morphology classification (F1 gains of up to $+0.17$), significantly outperforming domain-specific models and general-purpose VLMs. The work demonstrates the value of incorporating non-Euclidean geometry into multimodal astronomy models and provides guidance on efficient adapter-based integration and modality-aware analysis. It also analyzes expert specialization and modality interactions to inform scalable deployment and future extension to larger, more capable models.
Abstract
Modern vision-language models (VLMs) build their patch embeddings and convolutional backbones in vector spaces, predominantly Euclidean ones, from the ground up. Extending VLMs to galaxy scale for understanding astronomical phenomena requires integrating spherical space for planetary orbits and hyperbolic space for black holes, which raises two formidable challenges: (a) current pre-trained models are confined to Euclidean space rather than a comprehensive geometric embedding, and (b) the predominant architectures lack backbones suited to anisotropic physical geometries. In this paper, we introduce Galaxy-Walker, a geometry-aware VLM for universe-level vision understanding tasks. We propose a geometry prompt that generates geometry tokens through random walks across diverse spaces on a multi-scale physical graph, along with a geometry adapter that compresses and reshapes the space anisotropy in a mixture-of-experts manner. Extensive experiments demonstrate the effectiveness of our approach: Galaxy-Walker achieves state-of-the-art performance in both galaxy property estimation ($R^2$ scores up to $0.91$) and morphology classification (up to $+0.17$ F1 improvement on challenging features), significantly outperforming both domain-specific models and general-purpose VLMs.
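To make the three geometries concrete, the sketch below computes the standard geodesic distance in Euclidean space, on the unit sphere, and in the Poincaré ball model of hyperbolic space, then blends them with a softmax gate. The distance formulas are textbook results; the function names and the `gated_distance` blend are hypothetical illustrations of mixture-of-experts-style weighting and are not taken from the Galaxy-Walker implementation.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance in flat (zero-curvature) space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def spherical_distance(u, v):
    """Great-circle distance on the unit sphere (inputs are normalized first)."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))


def poincare_distance(u, v):
    """Geodesic distance in the Poincare ball (curvature -1); needs ||u||, ||v|| < 1."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_distance(u, v, gate_logits):
    """Hypothetical illustration: softmax-weighted blend of the three distances.

    gate_logits holds one score per geometry, in the order
    (Euclidean, spherical, hyperbolic). This is an assumption made for
    this sketch, not the adapter described in the paper.
    """
    weights = softmax(gate_logits)
    distances = [
        euclidean_distance(u, v),
        spherical_distance(u, v),
        poincare_distance(u, v),
    ]
    return sum(w * d for w, d in zip(weights, distances))
```

For the same pair of points the three distances generally differ, and the hyperbolic distance grows without bound as either point approaches the boundary of the unit ball. That behavior is one reason hyperbolic space is often used for hierarchical or strongly curved structure.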
