Animating Petascale Time-varying Data on Commodity Hardware with LLM-assisted Scripting

Ishrat Jahan Eliza, Xuan Huang, Aashish Panta, Alper Sahistan, Zhimin Li, Amy A. Gooch, Valerio Pascucci

TL;DR

A user-friendly framework for creating 3D animations of petascale, time-varying data on a commodity workstation. It is demonstrated on large-scale NASA climate-oceanographic datasets and can seamlessly incorporate as much high-resolution data as needed for the final version.

Abstract

Scientists face significant visualization challenges as time-varying datasets grow in speed and volume, often requiring specialized infrastructure and expertise to handle massive datasets. Petascale climate models generated in NASA laboratories require a dedicated group of graphics and media experts and access to high-performance computing resources. Scientists may need to share scientific results with the community iteratively and quickly. However, the time-consuming trial-and-error process incurs significant data transfer overhead and far exceeds the time and resources allocated for typical post-analysis visualization tasks, disrupting the production workflow. Our paper introduces a user-friendly framework for creating 3D animations of petascale, time-varying data on a commodity workstation. Our contributions are: (i) a Generalized Animation Descriptor (GAD) with a keyframe-based adaptable abstraction for animation, (ii) efficient data access from cloud-hosted repositories to reduce data management overhead, (iii) a tailored rendering system, and (iv) an LLM-assisted conversational interface as a scripting module that allows domain scientists with no visualization expertise to create animations of their region of interest. We demonstrate the framework's effectiveness with two case studies: first, by generating animations in which sampling criteria are specified based on prior knowledge, and second, by generating AI-assisted animations in which sampling parameters are derived from natural-language user prompts. In all cases, we use large-scale NASA climate-oceanographic datasets that exceed 1 PB in size, yet achieve a fast turnaround time of 1 minute to 2 hours. Users can generate a rough draft of the animation within minutes, then seamlessly incorporate as much high-resolution data as needed for the final version.
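
The keyframe-based abstraction named in contribution (i) can be illustrated with a minimal sketch. The abstract does not give the GAD schema, so the `Keyframe` fields (`frame`, `timestep`, `camera`) and the use of linear interpolation are assumptions chosen for illustration, not the authors' format or method.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Keyframe:
    # Hypothetical fields; the real GAD schema is not described in the abstract.
    frame: int                          # output frame at which this keyframe applies
    timestep: float                     # dataset timestep shown at this keyframe
    camera: Tuple[float, float, float]  # camera position


def interpolate(k0: Keyframe, k1: Keyframe, frame: int) -> Keyframe:
    """Linearly blend two keyframes (requires k1.frame > k0.frame)."""
    t = (frame - k0.frame) / (k1.frame - k0.frame)

    def mix(a: float, b: float) -> float:
        return a + (b - a) * t

    camera = tuple(mix(a, b) for a, b in zip(k0.camera, k1.camera))
    return Keyframe(frame, mix(k0.timestep, k1.timestep), camera)


def expand(keyframes: List[Keyframe]) -> Iterator[Keyframe]:
    """Yield one in-between state per output frame, in frame order."""
    keys = sorted(keyframes, key=lambda k: k.frame)
    if not keys:
        return
    for k0, k1 in zip(keys, keys[1:]):
        # Frames in [k0.frame, k1.frame); the last keyframe is emitted below.
        for frame in range(k0.frame, k1.frame):
            yield interpolate(k0, k1, frame)
    yield keys[-1]
```

Under these assumptions, a rough draft could be produced from two or three keyframes, and refinement would amount to adding keyframes rather than rewriting the animation.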

Paper Structure (24 sections, 8 figures)

Figures (8)

  • Figure 1: Our versatile framework generates region-of-interest keyframe animations from petascale data using a generalized animation descriptor (GAD) file and flexible scripting. We also enable AI-assisted scripting, so that the user needs to describe little more than a region of interest to get the first image in minutes. a) The user first asks to see the "salinity of the Mediterranean Sea". After four iterations with the AI, the result in b) displays the salinity with a grayscale transfer function and streamlines that highlight the Meddies (Mediterranean eddies). c) Further chatting with the AI applies the same visualization parameters to the Red Sea, including the Bab el Mandeb Strait and the Gulf of Aden. Both b) and c) show regions of high salinity (white) with fast-moving currents (red) versus oceanic regions with lower salinity (gray) and slower currents (blue).
  • Figure 2: Our framework. Unlike traditional visualizations, where the user faces an initial big data management challenge with larger-than-disk, cloud-hosted datasets, our framework starts with the event description. An optional AI-assisted scripting mechanism presents an intuitive environment for translating the conceptual design into GAD for application-independent visualization. By hiding the complexity of data management and cross-application translation, our framework simplifies the animation production cycle, giving the user the illusion of getting a rendered video directly from a scripting interface with full remote dataset access.
  • Figure 3: Our novel Generalized Animation Descriptor (GAD) file format. GAD describes an animation as a sequence of keyframes stored in an individual file. The list of data is recorded separately and accessed by keyframe files through indexing. Our modular design splits data storage and rendering information into two pieces, allowing an independent description of animation design regardless of the dataset in use.
  • Figure 4: Our simple interactive viewer provides a quick interface for users to create and iteratively modify animations with just a few settings.
  • Figure 5: An ocean surface (Z=0) salinity map, ranging from 33 to 38 g/kg. This visualization represents one of 10,269 timesteps, each 20 GB in size, from a 1 PB dataset. The complete dataset includes five 3D scalar fields: temperature, salinity, and the east-west, south-north, and vertical velocity components.
  • ...and 3 more figures
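
The modular split described in the Figure 3 caption (a data list recorded separately, with keyframes referring to it by index) can be sketched as follows. The dictionary keys (`name`, `source`, `data_index`) and the placeholder URLs are hypothetical and chosen only for illustration; the caption does not specify the actual GAD layout.

```python
from typing import Any, Dict, List

# Part 1: the data list, recorded separately from the animation design.
# Names and URLs are placeholders, not real endpoints.
data_list: List[Dict[str, str]] = [
    {"name": "salinity", "source": "https://example.org/placeholder/salinity"},
    {"name": "temperature", "source": "https://example.org/placeholder/temperature"},
]

# Part 2: keyframes, which refer to data only through an index.
keyframes: List[Dict[str, Any]] = [
    {"frame": 0, "data_index": 0, "timestep": 0},
    {"frame": 30, "data_index": 0, "timestep": 100},
]


def resolve(keyframe: Dict[str, Any], data: List[Dict[str, str]]) -> Dict[str, str]:
    """Look up the dataset entry a keyframe points to."""
    return data[keyframe["data_index"]]


# The same keyframes can be replayed against another data list, because
# they store an index rather than a dataset location.
other_data_list = [{"name": "velocity", "source": "https://example.org/placeholder/velocity"}]
```

In this sketch, `resolve(keyframes[0], data_list)` returns the salinity entry, while `resolve(keyframes[0], other_data_list)` returns the velocity entry with no change to the keyframes themselves, which is the kind of dataset independence the caption describes.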