Radio Astronomy in the Era of Vision-Language Models: Prompt Sensitivity and Adaptation

Mariia Drozdova; Erica Lastufka; Vitaliy Kinakh; Taras Holotyak; Daniel Schaerer; Slava Voloshynovskiy

TL;DR

This work investigates whether generic vision-language models (VLMs) can perform morphology-based classification of radio galaxies (FR-I vs. FR-II) on the MiraBest dataset, using prompting strategies and lightweight LoRA fine-tuning. It shows that VLMs carry useful priors for unfamiliar scientific imagery, but that their outputs can be highly sensitive to prompt design and decoding settings, revealing fragility in their reasoning. With around 15 million trainable parameters and no astronomy-specific pretraining, a LoRA-tuned Qwen-VL approaches the performance of specialized, domain-specific models. The study highlights both the potential of VLMs for scientific tasks and the caution their use requires, pointing to careful prompt engineering and targeted adaptation as practical paths forward.
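The 15M figure counts only the LoRA adapter weights. LoRA keeps each pretrained weight matrix W frozen and learns a low-rank update, so the adapted layer computes with W + BA, where A has shape r × d_in and B has shape d_out × r; each adapted matrix therefore adds r(d_in + d_out) trainable weights. The sketch below does this arithmetic for a hypothetical decoder (32 layers, hidden size 4096, adapters on the four attention projections). The layer sizes, rank, target modules, and helper names are illustrative assumptions, not the configuration used in the paper or the actual Qwen-VL architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearShape:
    """Shape of one frozen weight matrix W (d_out x d_in) that receives a LoRA adapter."""
    name: str
    d_in: int
    d_out: int


def lora_params(shape: LinearShape, rank: int) -> int:
    # LoRA learns W + B @ A with A: (rank x d_in) and B: (d_out x rank),
    # so only rank * (d_in + d_out) new weights are trainable; W stays frozen.
    return rank * (shape.d_in + shape.d_out)


def total_lora_params(shapes: list[LinearShape], rank: int, num_layers: int) -> int:
    # The same set of adapted projections is repeated in every transformer block.
    per_layer = sum(lora_params(s, rank) for s in shapes)
    return per_layer * num_layers


# Hypothetical decoder block with hidden size 4096, attention projections only.
# These sizes are illustrative and are NOT the configuration used in the paper.
HIDDEN = 4096
attention = [
    LinearShape("q_proj", HIDDEN, HIDDEN),
    LinearShape("k_proj", HIDDEN, HIDDEN),
    LinearShape("v_proj", HIDDEN, HIDDEN),
    LinearShape("o_proj", HIDDEN, HIDDEN),
]

if __name__ == "__main__":
    for r in (8, 16, 32):
        n = total_lora_params(attention, rank=r, num_layers=32)
        print(f"rank={r:2d}: {n:,} trainable LoRA parameters")
```

With rank 16, this hypothetical configuration yields 16,777,216 trainable parameters, the same order of magnitude as the 15M reported, while the frozen attention matrices it adapts hold about 2.1 billion weights.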

Abstract

Vision-Language Models (VLMs), such as recent Qwen and Gemini models, are positioned as general-purpose AI systems capable of reasoning across domains. Yet their capabilities in scientific imaging, especially on unfamiliar and possibly unseen data distributions, remain poorly understood. In this work, we assess whether generic VLMs, presumed to lack exposure to astronomical corpora, can perform morphology-based classification of radio galaxies using the MiraBest FR-I/FR-II dataset. We explore prompting strategies that use natural language and schematic diagrams and, to the best of our knowledge, are the first to introduce visual in-context examples within prompts in astronomy. Additionally, we evaluate lightweight supervised adaptation via LoRA fine-tuning. Our findings reveal three trends: (i) even prompt-based approaches can achieve good performance, suggesting that VLMs encode useful priors for unfamiliar scientific domains; (ii) however, outputs are highly unstable, i.e., they vary sharply with superficial changes to the prompt (layout, ordering) or to decoding settings (temperature), even when the semantic content is held constant; and (iii) with just 15M trainable parameters and no astronomy-specific pretraining, fine-tuned Qwen-VL achieves near state-of-the-art performance (3% error rate), rivaling domain-specific models. These results suggest that the apparent "reasoning" of VLMs often reflects prompt sensitivity rather than genuine inference, which calls for caution when using them in scientific domains. At the same time, with minimal adaptation, generic VLMs can rival specialized models, offering a promising but fragile tool for scientific discovery.
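Trend (ii) implies that aggregate accuracy alone can hide instability: two prompt variants may reach the same error rate while disagreeing on many individual images. One simple way to expose this, assuming every semantically equivalent prompt variant is run over the same ordered list of images, is to report each variant's error rate alongside a flip rate, the fraction of images whose predicted class is not identical across all variants. The flip-rate metric, the function names, the variant names, and the toy predictions below are illustrative assumptions, not the paper's evaluation protocol or results.

```python
from collections.abc import Mapping, Sequence


def error_rate(preds: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of images whose predicted class differs from the reference label."""
    if len(preds) != len(truth) or not truth:
        raise ValueError("need equally long, non-empty prediction and label lists")
    wrong = sum(p != t for p, t in zip(preds, truth))
    return wrong / len(truth)


def flip_rate(runs: Mapping[str, Sequence[str]]) -> float:
    """Fraction of images that do not get the same label under every prompt variant."""
    lengths = {len(preds) for preds in runs.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("every variant must label the same non-empty image list")
    # zip(*...) regroups predictions per image: one tuple of labels per image.
    per_image = list(zip(*runs.values()))
    unstable = sum(len(set(labels)) > 1 for labels in per_image)
    return unstable / len(per_image)


if __name__ == "__main__":
    # Made-up toy predictions for four images; NOT results from the paper.
    truth = ["FR-I", "FR-II", "FR-II", "FR-I"]
    runs = {  # hypothetical, semantically equivalent prompt/decoding variants
        "description_first": ["FR-I", "FR-II", "FR-II", "FR-II"],
        "classes_swapped": ["FR-II", "FR-II", "FR-II", "FR-I"],
        "temperature_0.7": ["FR-I", "FR-II", "FR-I", "FR-I"],
    }
    for name, preds in runs.items():
        print(f"{name}: error rate = {error_rate(preds, truth):.2f}")
    print(f"flip rate across variants = {flip_rate(runs):.2f}")
```

In this toy data every variant has a 25% error rate, yet three of the four images change label across variants, giving a flip rate of 0.75; a single accuracy number per prompt would not reveal that instability.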
