Scaling Laws for Galaxy Images

Mike Walmsley; Micah Bowles; Anna M. M. Scaife; Jason Shingirai Makechemu; Alexander J. Gordon; Annette M. N. Ferguson; Robert G. Mann; James Pearson; Jürgen J. Popp; Jo Bovy; Josh Speagle; Hugh Dickinson; Lucy Fortson; Tobias Géron; Sandor Kruk; Chris J. Lintott; Kameswara Mantha; Devina Mohan; David O'Ryan; Inigo V. Slijepcevic

TL;DR

This paper tests neural scaling laws in a non-ImageNet domain by pretraining on galaxy images labelled by Galaxy Zoo volunteers (842k images across 88 tasks). It demonstrates a robust power-law relationship between the amount of labelled in-domain data and upstream loss across multiple architectures, while parameter scaling offers diminishing returns beyond about 100M parameters. In downstream transfer, additional in-domain pretraining yields substantial, label-efficient improvements (an average relative error reduction of 31% across five tasks) and often achieves linear transfer performance close to that of full finetuning, outperforming ImageNet-only pretraining. The authors advocate a pragmatic middle ground: domain-specific pretraining followed by targeted downstream labelling. They release Zoobot encoders to enable community use.

Abstract

We present the first systematic investigation of supervised scaling laws outside of an ImageNet-like context: on images of galaxies. We use 840k galaxy images and over 100M annotations by Galaxy Zoo volunteers, comparable in scale to ImageNet-1K. We find that adding annotated galaxy images provides a power-law improvement in performance across all architectures and all tasks, while adding trainable parameters is effective only for some (typically more subjectively challenging) tasks. We then compare the downstream performance of finetuned models pretrained on ImageNet-12k alone against that of models additionally pretrained on our galaxy images. We achieve an average relative error rate reduction of 31% across 5 downstream tasks of scientific interest. Our finetuned models are more label-efficient and, unlike their ImageNet-12k-pretrained equivalents, often achieve linear transfer performance equal to that of end-to-end finetuning. We find relatively modest additional downstream benefits from scaling model size, implying that scaling alone is not sufficient to address our domain gap, and suggest that practitioners with qualitatively different images might benefit more from in-domain adaptation followed by targeted downstream labelling.
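As a rough illustration of the two quantities the abstract relies on, the sketch below fits a power law to loss versus dataset size and computes a relative error rate reduction. It assumes the simple form loss = a * N^(-alpha), fitted by least squares in log-log space; the paper's own fits may use a different functional form. The function names and all numbers are hypothetical and synthetic, not results from the paper.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * size**(-alpha) in log-log space.

    Assumed functional form for illustration only; returns (a, alpha).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Ordinary least squares for y = intercept + slope * x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error rate removed by the new model."""
    return (baseline_error - new_error) / baseline_error


# Synthetic data generated from a known power law (not from the paper).
sizes = [1e4, 1e5, 1e6]
losses = [2.0 * n ** -0.25 for n in sizes]
a, alpha = fit_power_law(sizes, losses)

# Hypothetical error rates chosen so the reduction is 31%.
reduction = relative_error_reduction(0.100, 0.069)
```

On noiseless synthetic data the fit recovers the generating exponent; with real measurements, the log-log fit is only a first check that a power law is a reasonable description.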

Paper Structure (21 sections, 2 equations, 12 figures, 2 tables)

Introduction
Data
Why are galaxy images different?
Datasets Used
Upstream Scaling Laws
Downstream Performance
Limitations
Conclusion
Scaling Law Fits
Training Details
Upstream Training
Downstream Training
Upstream Datasets
Galaxy Zoo Labels
Multi-Task Learning
...and 6 more sections

Figures (12)

Figure 1: Upstream and downstream scales investigated in this work (left) and upstream scaling results (right). We use more labelled galaxy images than any previous work and train larger models than all but one [dagli_astroformer_2023]. Data in Tables \ref{tab:scaling_law_fits} and \ref{tab:previous_models}.
Figure 2: Galaxy images are qualitatively different to common pretraining datasets (e.g. ImageNet). Pretraining on labelled galaxy images instead substantially improves performance on diverse downstream galaxy image tasks. See Sec. \ref{sec:downstream_performance} for details.
Figure 3: Illustrative galaxy images from our pretraining dataset, split by telescope.
Figure 4: Upstream performance on galaxy images when scaling dataset size and parameters.
Figure 5: Change in upstream test loss vs. number of labelled images (above) or number of parameters (below), split by galaxy task. More labelled data improves performance as a power law for every individual task. Adding parameters improves performance only on some tasks (e.g. 'spiral arm count') and has no effect on others (e.g. 'bar').
...and 7 more figures
