Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding

Tianyu Chen, Xingcheng Fu, Yisen Gao, Haodong Qian, Yuecen Wei, Kun Yan, Haoyi Zhou, Jianxin Li

TL;DR

Galaxy Walker addresses the limitations of Euclidean-only vision-language models for astronomical data by introducing geometry-aware representations that span Euclidean, spherical, and hyperbolic spaces. The framework combines a geometry prompt that injects multi-space geometric priors with a geometry adapter that uses a mixture-of-experts to fuse geometry-aware features into a pre-trained backbone, trained in two stages. It achieves state-of-the-art performance on galaxy property estimation ($R^2$ up to $0.91$) and morphology classification (up to $+0.17$ F1), significantly outperforming domain-specific models and general-purpose VLMs. The work demonstrates the value of incorporating non-Euclidean geometry into multimodal astronomy models and provides guidance on efficient adapter-based integration and modality-aware analysis. It also analyzes expert specialization and modality interactions to inform scalable deployment and future extensions to larger, more capable models.

Abstract

Modern vision-language models (VLMs) build their patch embeddings and convolutional backbones in vector spaces, predominantly Euclidean ones, from the ground up. Extending VLMs to galaxy scale for understanding astronomical phenomena requires integrating spherical space for planetary orbits and hyperbolic space for black holes, which raises two formidable challenges: (a) current pre-trained models are confined to Euclidean space rather than a comprehensive geometric embedding, and (b) the predominant architectures lack suitable backbones for anisotropic physical geometries. In this paper, we introduce Galaxy-Walker, a geometry-aware VLM for universe-level vision understanding tasks. We propose a geometry prompt that generates geometry tokens by random walks across diverse spaces on a multi-scale physical graph, along with a geometry adapter that compresses and reshapes the space anisotropy in a mixture-of-experts manner. Extensive experiments demonstrate the effectiveness of our approach, with Galaxy-Walker achieving state-of-the-art performance in both galaxy property estimation ($R^2$ scores up to $0.91$) and morphology classification tasks (up to $+0.17$ F1 improvement on challenging features), significantly outperforming both domain-specific models and general-purpose VLMs.
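
To make the three geometries named in the abstract concrete, the sketch below computes the standard geodesic distance in each space: straight-line distance in Euclidean space, great-circle distance on the unit sphere, and the Poincaré-ball distance for hyperbolic space with curvature -1. These are textbook formulas rather than the paper's implementation; the function names and the choice of the Poincaré-ball model are assumptions made for illustration.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance in flat (zero-curvature) space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def spherical_distance(x, y):
    """Great-circle distance between two unit vectors on the unit sphere."""
    dot = sum(a * b for a, b in zip(x, y))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    dot = max(-1.0, min(1.0, dot))
    return math.acos(dot)


def poincare_distance(x, y):
    """Geodesic distance in the Poincare ball model (curvature -1).

    Both points must lie strictly inside the unit ball.
    """
    sq_norm_x = sum(a * a for a in x)
    sq_norm_y = sum(b * b for b in y)
    if sq_norm_x >= 1.0 or sq_norm_y >= 1.0:
        raise ValueError("points must have norm < 1 in the Poincare ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_x) * (1.0 - sq_norm_y)))


if __name__ == "__main__":
    print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))
    print(spherical_distance((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
    print(poincare_distance((0.0, 0.0), (0.5, 0.0)))
```

The same pair of points can be close in one geometry and far apart in another: hyperbolic distance grows without bound as a point approaches the boundary of the ball, while spherical distance is capped at $\pi$. This is the kind of difference a single Euclidean embedding cannot express.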

Paper Structure

This paper contains 13 sections, 13 equations, 20 figures, 8 tables.

Figures (20)

  • Figure 1: Geometries of the universe. While traditional VLMs are confined to flat Euclidean space, the actual universe exhibits rich geometric diversity including spherical and hyperbolic spaces, motivating our Galaxy Walker framework to incorporate multi-geometric representations.
  • Figure 2: The overall framework of Galaxy Walker. Left: The architecture integrates a Geometry Adapter with the pre-trained VLM backbone. The adapter includes a projection layer $\pi_\theta$ that processes various input modalities (e.g., Euclidean, Spherical, Hyperbolic embeddings, spectral data, and multi-band images), followed by $L$ transformer blocks enhanced with geometry-aware FFN experts. A gating network dynamically routes features to appropriate geometric experts. Two parallel heads (Numeric Head and LM Head) enable both regression and classification tasks. Right: Visualization of how different geometric spaces (Euclidean, Spherical, and Hyperbolic Walker) process astronomical data, demonstrating the distinct token arrangements and relationships in each geometry. The Geometry Prompt guides the model to utilize appropriate geometric representations for different astronomical features.
  • Figure 3: Visualization of geometry-specific expert contributions and case study analysis.
  • Figure 4: Analysis of modality contributions: (a) Performance impact when removing each modality; (b) Cross-modal correlation analysis.
  • Figure 5: Training dynamics of different Geometry Adapter integration strategies. Performance evolution during training (0k-20k steps) for different adapter integration densities in Qwen2-VL-2B, comparing sparse (every 4 layers), medium (every 2 layers), and dense (every layer) integration patterns. The plots show $R^2$ scores for physical property estimation ($\mathbf{M^*}$, $\mathbf{Z_{MW}}$, $\mathbf{t_{age}}$, $\mathbf{sSFR}$) and F1 scores for morphological classification tasks, revealing distinct convergence characteristics across different astronomical tasks.
  • ...and 15 more figures
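
The Figure 2 caption describes a gating network that routes features to geometry-aware FFN experts. A minimal sketch of that idea, assuming dense softmax routing over three toy experts, might look like the following. The expert functions, the linear gate, and all names here are hypothetical stand-ins chosen for illustration; the paper's actual experts are learned FFN layers, and its routing rule (for example top-k versus dense) is not specified in this excerpt.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _norm(v):
    return math.sqrt(sum(a * a for a in v))


def euclidean_expert(token):
    """Toy stand-in: leaves the feature vector unchanged."""
    return list(token)


def spherical_expert(token):
    """Toy stand-in: projects the feature vector onto the unit sphere."""
    n = _norm(token)
    return [a / n for a in token] if n > 0 else list(token)


def hyperbolic_expert(token):
    """Toy stand-in: exponential map at the origin of the Poincare ball
    (curvature -1), which sends any vector strictly inside the unit ball."""
    n = _norm(token)
    return [math.tanh(n) * a / n for a in token] if n > 0 else list(token)


def route(token, gate_weights, experts):
    """Mix expert outputs with softmax gate scores.

    gate_weights holds one row per expert; each row has len(token) entries.
    Returns (gate scores, mixed output vector).
    """
    logits = [sum(w * t for w, t in zip(row, token)) for row in gate_weights]
    scores = softmax(logits)
    outputs = [expert(token) for expert in experts]
    mixed = [
        sum(s * out[i] for s, out in zip(scores, outputs))
        for i in range(len(token))
    ]
    return scores, mixed


if __name__ == "__main__":
    experts = [euclidean_expert, spherical_expert, hyperbolic_expert]
    gate = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # untrained gate: uniform scores
    print(route([3.0, 4.0], gate, experts))
```

With an all-zero gate every expert receives the same weight. In a trained model the gate rows would be learned, so that different inputs lean on different geometric experts.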