Scalable Cosmic AI Inference using Cloud Serverless Computing

Mills Staylor; Amirreza Dolatpour Fathkouhi; Md Khairul Islam; Kaleigh O'Hara; Ryan Ghiles Goudjil; Geoffrey Fox; Judy Fox

TL;DR

This paper tackles the bottleneck of scalable, cost-effective inference on massive astronomical image datasets by introducing CAI, a Cloud-based Astronomy Inference framework that uses AWS Lambda serverless computing to run a large foundation model (AstroMAE) for redshift prediction. CAI partitions the data and runs inference on the partitions in parallel, achieving near-linear scaling with dataset size. It processes a 12.6 GB dataset in 28 s and reaches a throughput of up to 18.04 billion bits per second (bps), at a cost under $5 per experiment. The authors evaluate inference on personal laptops, HPC clusters, and the cloud, and extend the experiments to 1 TB of data, demonstrating robust scalability and accessibility for the astronomy community. They also discuss limitations (e.g., Lambda memory limits and inter-function communication) and outline future work to integrate FMI and improve high-performance communication between functions.

Abstract

Large-scale astronomical image data processing and prediction are essential for astronomers, providing crucial insights into celestial objects, the universe's history, and its evolution. While modern deep learning models offer high predictive accuracy, they often demand substantial computational resources, which limits their accessibility. We introduce the Cloud-based Astronomy Inference (CAI) framework to address these challenges. This scalable solution integrates pre-trained foundation models with serverless cloud infrastructure through Function-as-a-Service (FaaS). CAI enables efficient and scalable inference on astronomical images without extensive hardware. Using a foundation model for redshift prediction as a case study, our extensive experiments cover user devices, HPC (High-Performance Computing) servers, and the cloud. Redshift prediction with the AstroMAE model demonstrates CAI's scalability and efficiency: inference on a 12.6 GB dataset takes only 28 seconds, compared to 140.8 seconds on HPC GPUs and 1793 seconds on HPC CPUs. CAI also achieves significantly higher throughput, reaching 18.04 billion bits per second (bps), and maintains near-constant inference times as data sizes increase, all at minimal computational cost (under $5 per experiment). We also process large-scale data up to 1 TB to show CAI's effectiveness at scale. CAI thus provides a highly scalable, accessible, and cost-effective inference solution for the astronomy community. The code is accessible at https://github.com/UVA-MLSys/AI-for-Astronomy.
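
The partition-and-fan-out idea described in the abstract can be illustrated with a minimal local sketch. This is not the authors' implementation: it assumes a thread pool as a stand-in for parallel serverless invocations, and the names partition, infer_partition, and run_parallel_inference are hypothetical. The worker returns a dummy value per image ID instead of loading a model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def partition(items: Sequence[int], partition_size: int) -> List[Sequence[int]]:
    """Split items into consecutive chunks of at most partition_size elements."""
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    return [items[i:i + partition_size] for i in range(0, len(items), partition_size)]


def infer_partition(chunk: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for one serverless invocation.

    A real worker would fetch its partition and run the model on it;
    here each image ID simply maps to a dummy prediction.
    """
    return [float(image_id) for image_id in chunk]


def run_parallel_inference(
    items: Sequence[int], partition_size: int, max_workers: int = 8
) -> List[float]:
    """Run infer_partition on every partition concurrently and merge the results."""
    chunks = partition(items, partition_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so predictions stay aligned.
        per_chunk = list(pool.map(infer_partition, chunks))
    return [pred for chunk_preds in per_chunk for pred in chunk_preds]


if __name__ == "__main__":
    predictions = run_parallel_inference(list(range(10)), partition_size=4)
    print(len(predictions))
```

Because each partition is processed independently, adding data adds partitions rather than lengthening any single invocation, which is the property the abstract relies on for near-constant inference time.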

Paper Structure (21 sections, 6 equations, 11 figures, 7 tables)

Introduction
Related Work
Problem Statement
Methodology
AstroMAE
Pretraining
Fine-tuning
Proposed Framework: CAI
Experiments
Dataset
Implementation
AstroMAE model
Evaluation Metrics
Experiment Setup
Performance Results and Analysis
...and 6 more sections

Figures (11)

Figure 1: The masked autoencoder architecture of AstroMAE.
Figure 2: AstroMAE fine-tuning architecture.
Figure 3: CAI framework overview using AWS Lambda Functions. It uses an AWS S3 bucket for data, code, and result storage. The state machine defines the workflow execution steps using AWS Lambda functions and distributed maps. Parallel execution is achieved through data partitions for almost linear high-performance inference scaling.
Figure 4: AWS Lambda Memory Usage by Partition Data Size. We empirically size the dataset based on the partition data size in MB.
Figure 5: The parameter counts for recent deep learning-based methods developed for astronomy images, capable of inference across diverse computing environments—including a personal laptop, HPC CPUs, HPC GPUs, and our proposed cloud-based framework, CAI. A pre-trained AstroMAE model is used for the inference scaling experiments.
...and 6 more figures
