GenIE - Simulator-Driven Iterative Data Exploration for Scientific Discovery

Ashwin Gerard Colaco, Martin Boissier, Sriram Rao, Shubharoop Ghosh, Sharad Mehrotra, Tilmann Rabl

TL;DR

GenIE addresses the bottleneck of static, linear simulator workflows by integrating physics-based simulators as first-class database components. It introduces a simulation-aware data model with virtual attributes, a query-driven orchestration layer for cross-simulator workflows, and progressive refinement that delivers interactive results while reusing previously generated data. Case studies on wildfire smoke dispersion and hurricane hazard assessment demonstrate substantial speedups and less redundant computation, validating the approach while highlighting open challenges in multi-simulator optimization and usability. The work points toward interactive, data-driven discovery across scientific and engineering domains.
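To make virtual attributes and cross-simulator reuse more concrete, here is a minimal Python sketch. It is not GenIE's implementation or API: the classes `SimulatorAdapter` and `GeneratorDriver` and the method `resolve` are hypothetical, loosely modeled on the Generator Driver and Data Generator Adapters of Figure 2, and the two lambdas are toy formulas standing in for real wind and surge models. A value stays virtual until something asks for it; the driver then returns stored output for identical parameters or invokes the registered simulator, and one simulator's output can serve as another's input.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class SimulatorAdapter:
    """Hypothetical adapter wrapping one simulator behind a uniform call."""
    name: str
    run: Callable[..., float]  # toy stand-in for launching a real simulator


@dataclass
class GeneratorDriver:
    """Hypothetical driver that resolves virtual attributes on demand."""
    adapters: Dict[str, SimulatorAdapter] = field(default_factory=dict)
    cache: Dict[Tuple[str, Tuple[Tuple[str, Any], ...]], float] = field(default_factory=dict)
    invocations: int = 0

    def register(self, adapter: SimulatorAdapter) -> None:
        # Registration happens once; afterwards the simulator is queryable.
        self.adapters[adapter.name] = adapter

    def resolve(self, simulator: str, **params: Any) -> float:
        # Cache key: simulator name plus its sorted parameter settings.
        key = (simulator, tuple(sorted(params.items())))
        if key not in self.cache:
            # Invoke the simulator only when no stored output matches exactly.
            self.invocations += 1
            self.cache[key] = self.adapters[simulator].run(**params)
        return self.cache[key]


# Toy models only; these are not WRF-SFIRE, HYSPLIT, or real hurricane models.
driver = GeneratorDriver()
driver.register(SimulatorAdapter("wind", lambda pressure_drop: 6.0 * math.sqrt(pressure_drop)))
driver.register(SimulatorAdapter("surge", lambda wind_speed: 0.02 * wind_speed ** 2))

# Cross-simulator workflow: the wind output feeds the surge model.
wind = driver.resolve("wind", pressure_drop=49.0)
surge = driver.resolve("surge", wind_speed=wind)
# Repeating the same request is answered entirely from stored output.
surge_again = driver.resolve("surge", wind_speed=driver.resolve("wind", pressure_drop=49.0))
print(wind, surge, driver.invocations)
```

Because the second request repeats the same wind and surge parameters, only two simulator invocations happen in total. Exact-match caching is the simplest possible reuse policy; deciding when outputs for nearby parameters are close enough to reuse is a harder question this sketch does not address.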

Abstract

Physics-based simulators play a critical role in scientific discovery and risk assessment, enabling what-if analyses for events such as wildfires and hurricanes. Today, databases treat these simulators as external pre-processing steps: analysts must manually run a simulation, export the results, and load them into a database before analysis can begin. This linear workflow is inefficient, incurs high latency, and hinders interactive exploration, especially when the analysis itself dictates the need for new or refined simulation data. We envision a new database paradigm, called GenIE, that integrates multiple simulators into the database to enable dynamic orchestration of simulation workflows. By making the database "simulation-aware," GenIE can invoke simulators with appropriate parameters based on the user's query and analytical needs. This tight integration allows GenIE to avoid generating data irrelevant to the analysis, reuse previously generated data, and support iterative, incremental analysis in which results are progressively refined at interactive speeds. We present our vision for GenIE, designed as an extension to PostgreSQL, and demonstrate its potential benefits through two use cases: wildfire smoke dispersion analysis using WRF-SFIRE and HYSPLIT, and hurricane hazard assessment integrating wind, surge, and flood models. Our preliminary experiments show how GenIE can turn these slow, static analyses into interactive explorations by managing the trade-off between simulation accuracy and runtime across multiple integrated simulators. We conclude by highlighting the challenges and opportunities in realizing the full vision of GenIE as a cornerstone of next-generation scientific data analysis.
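The claim that a simulation-aware database can avoid generating irrelevant data and reuse earlier output can be illustrated with a small planner that works out, for each query, which grid cells are still missing. This Python sketch assumes a hypothetical `SimulatedTable` keyed by (x, y, hour) cells and a toy plume formula in place of HYSPLIT; it shows one plausible strategy, not the paper's orchestration algorithm.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical key for one stored result: (x, y, hour) on a toy grid.
Cell = Tuple[int, int, int]


class SimulatedTable:
    """Hypothetical table whose rows are produced on demand by a simulator."""

    def __init__(self, simulate: Callable[[Set[Cell]], Dict[Cell, float]]) -> None:
        self.simulate = simulate  # stand-in for launching a simulator run
        self.stored: Dict[Cell, float] = {}  # previously generated output
        self.generated = 0  # total number of cells actually simulated

    def query(self, xs: range, ys: range, hours: range) -> Dict[Cell, float]:
        wanted = {(x, y, h) for x in xs for y in ys for h in hours}
        # Keep only the cells this query needs that have not been generated yet.
        missing = {c for c in wanted if c not in self.stored}
        if missing:
            self.stored.update(self.simulate(missing))
            self.generated += len(missing)
        return {c: self.stored[c] for c in wanted}


def toy_plume(cells: Set[Cell]) -> Dict[Cell, float]:
    # Toy concentration that decays with distance and time (not a real model).
    return {(x, y, h): 100.0 / (1 + x + y + h) for (x, y, h) in cells}


table = SimulatedTable(toy_plume)
first = table.query(range(0, 4), range(0, 4), range(0, 2))   # 4 * 4 * 2 = 32 cells
after_first = table.generated
second = table.query(range(2, 6), range(0, 4), range(0, 2))  # overlaps at x = 2, 3
print(after_first, table.generated)
```

The second query overlaps the first at x = 2 and x = 3, so only the 16 cells with x = 4 or x = 5 are simulated. Cells that fall outside every query's range are never simulated at all.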

Paper Structure

This paper contains 21 sections, 1 equation, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Transformation from the current static workflow to GenIE's integrated approach. (a) Current workflow: users manually configure simulators, wait for completion, export results, load them into a database, and analyze them. This cycle must restart whenever any parameter changes. (b) GenIE workflow: users register simulators once, then query their output directly via SQL; GenIE automatically orchestrates simulator execution based on query needs.
  • Figure 2: GenIE Architecture (FMC notation) showing integration with native database systems and multiple simulator adapters. The Generator Driver sits within the DBMS and manages multiple Data Generator Adapters, each wrapping a physics-based simulator.
  • Figure 3: Impact of temporal resolution on HYSPLIT execution time and accuracy. Coarser time steps provide significant speedup with modest accuracy loss.
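The trade-off in Figure 3 between temporal resolution and accuracy suggests a progressive-refinement loop: return a coarse answer quickly, then rerun at finer time steps while the analyst keeps exploring. The Python sketch below uses forward-Euler integration of a toy decay equation, dC/dt = -kC, as a hypothetical stand-in for a dispersion run (it is not HYSPLIT). Each level halves the time step, so the step count doubles while the error against the exact solution C0 * exp(-kT) shrinks.

```python
import math
from typing import Iterator, Tuple


def simulate_decay(c0: float, k: float, t_end: float, dt: float) -> Tuple[float, int]:
    """Toy stand-in for one simulator run: forward Euler on dC/dt = -k * C."""
    steps = round(t_end / dt)
    c = c0
    for _ in range(steps):
        c += dt * (-k * c)
    return c, steps


def progressive(c0: float, k: float, t_end: float, dt0: float,
                levels: int) -> Iterator[Tuple[float, float, int]]:
    """Yield (dt, estimate, steps) from coarse to fine, halving dt at each level."""
    dt = dt0
    for _ in range(levels):
        estimate, steps = simulate_decay(c0, k, t_end, dt)
        yield dt, estimate, steps
        dt /= 2


exact = 100.0 * math.exp(-0.5 * 4.0)
results = list(progressive(c0=100.0, k=0.5, t_end=4.0, dt0=1.0, levels=5))
for dt, estimate, steps in results:
    print(f"dt={dt:<7} steps={steps:<3} estimate={estimate:7.3f} error={abs(estimate - exact):.3f}")
```

Because `progressive` is a generator, a caller can display each estimate as it arrives and stop iterating once the change between levels is small enough, rather than paying for the finest resolution up front.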