COSMOS: A Data-Driven Probabilistic Time Series simulator for Chemical Plumes across Spatial Scales

Arunava Nag, Floris van Breugel

TL;DR

COSMOS addresses the need for realistic odor time series across large spatial domains without the prohibitive cost of full CFD simulations. It combines a data-driven spatial prior for whiff onset with a logit-space AR(2) concentration model driven by empirical whiff statistics, using a Gaussian-plume-inspired spatial field to guide onset in space and time. Validation against outdoor desert and forest measurements, as well as CFD-derived plumes, shows that COSMOS reproduces key statistics (whiff frequency, duration, concentration) and yields agent behaviors similar to CFD-based scenarios, while being ~35× faster. This approach enables rapid, large-scale evaluation and learning of odor-tracking strategies, with potential extensions to 3D plume dynamics and speed-aware temporal scaling.
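To make the "logit-space AR(2)" idea concrete, here is a minimal sketch, not the authors' implementation. It runs a second-order autoregressive process in logit space and maps it back through a sigmoid, so the simulated concentration always stays inside a bounded range. The function name `simulate_whiff`, the coefficients `phi1` and `phi2`, the ceiling `c_max`, and the noise scale `noise_std` are all illustrative assumptions. In COSMOS the target mean and variability would come from the empirical whiff statistics, and that mapping is not spelled out here.

```python
import math
import random


def logit(p):
    """Map a value in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit: map a real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_whiff(n_steps, target_mean, noise_std,
                   c_max=10.0, phi1=0.9, phi2=-0.2, seed=0):
    """Sketch of one whiff's concentration trace (all parameters hypothetical).

    An AR(2) process fluctuates around the logit of the normalized target
    concentration; the sigmoid keeps every output strictly in (0, c_max).
    """
    rng = random.Random(seed)
    eps = 1e-6
    # Normalize the target concentration and clip away from 0 and 1.
    p_target = min(max(target_mean / c_max, eps), 1.0 - eps)
    mu = logit(p_target)

    z_prev1 = z_prev2 = mu  # start the process at its mean
    trace = []
    for _ in range(n_steps):
        # AR(2) update on deviations from the mean, plus Gaussian noise.
        z = (mu
             + phi1 * (z_prev1 - mu)
             + phi2 * (z_prev2 - mu)
             + rng.gauss(0.0, noise_std))
        trace.append(c_max * sigmoid(z))
        z_prev2, z_prev1 = z_prev1, z
    return trace


if __name__ == "__main__":
    trace = simulate_whiff(n_steps=200, target_mean=4.0, noise_std=0.3)
    print(min(trace), max(trace))
```

The chosen coefficients give a stationary process (the characteristic roots are 0.5 and 0.4), so the trace fluctuates smoothly around the target rather than drifting. Working in logit space is what removes the need for explicit clipping of the output.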

Abstract

The development of robust odor navigation strategies for automated environmental monitoring applications requires realistic simulations of odor time series for agents moving across large spatial scales. Traditional approaches that rely on computational fluid dynamics (CFD) methods can capture the spatiotemporal dynamics of odor plumes, but are impractical for large-scale simulations due to their computational expense. On the other hand, puff-based simulations, although computationally tractable for large scales and capable of capturing the stochastic nature of plumes, fail to reproduce naturalistic odor statistics. Here, we present COSMOS (Configurable Odor Simulation Model over Scalable Spaces), a data-driven probabilistic framework that synthesizes realistic odor time series from spatial and temporal features of real datasets. COSMOS generates similar distributions of key statistical features such as whiff frequency, duration, and concentration as observed in real data, while dramatically reducing computational overhead. By reproducing critical statistical properties across a variety of flow regimes and scales, COSMOS enables the development and evaluation of agent-based navigation strategies with naturalistic odor experiences. To demonstrate its utility, we compare odor-tracking agents exposed to CFD-generated plumes versus COSMOS simulations, showing that both their odor experiences and resulting behaviors are quite similar.

Paper Structure

This paper contains 12 sections, 20 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Overview of the COSMOS pipeline for generating realistic, stochastic odor experiences. (A) High-level schematic illustrating the primary inputs (trajectory data and wind measurements) and the core algorithmic modules in COSMOS. (B) Empirical measurements of odor encounters are used to derive a spatial prior (i-iv), estimating the probability of whiff onset at each location. (C) Using recent history and spatial prior probability, the posterior probability of a whiff onset is calculated, with red representing the current whiff onset being evaluated. (D) Empirical Whiff Duration (WD) data guides transitions into or maintenance of whiff states at each time step, with the current whiff length represented in red. (E) A second-order autoregressive model combined with logistic transforms refines concentration values using empirical Whiff Concentration (WC) and Whiff Standard Deviation (WSD), ensuring smooth, realistic odor fluctuations in the simulated time series. (F) Intermittency between whiffs is modeled using empirical binned data and memory (shown in blue). At the end of an intermittency period, the next position is evaluated for the probability of a new whiff onset.
  • Figure 2: COSMOS generates odor experiences whose statistics closely match the characteristics and distributions observed in real data. Throughout the figure, blue shows empirical data from the HWS dataset collected in the Black Rock Desert, whereas red corresponds to the COSMOS simulation results. Quantities with subscript $D$ represent desert data, and those with subscript $CO$ represent COSMOS. (A-i) Data-driven spatial heatmap representing the probability of whiff onset. (A-ii) Actual odor encounter locations from the HWS desert experiment [nag2024odour]. (A-iii) COSMOS simulation of an odor experience for the same trajectory as in A-ii. (B-i) Time series of odor concentration for actual and simulated odor. (B-ii) and (B-iii) present histograms of whiff count distributions (peak-normalized between 0 and 1) and average odor concentrations as a function of distance from the source for the real and simulated data, respectively. (C) Two-dimensional histograms comparing whiff statistics with distance from the source for both real and simulated data, highlighting key metrics identified in [nag2024odour]. The histograms were normalized between 0 and 1. (D) Wasserstein distance distributions between real and simulated whiff statistics, bootstrapped 1000 times. The red dotted lines indicate observed Wasserstein distance values, and the p-values quantify the similarity between real and predicted distributions; higher p-values indicate stronger similarity.
  • Figure 3: COSMOS simulations provide odor experiences with statistics that closely match those from a dataset generated by a computational fluid dynamics (CFD) simulation [rigolli2022learning]. Throughout the figure, blue represents data from the CFD dataset, while red corresponds to simulation results from COSMOS. Quantities with subscript $CFD$ represent the CFD data, and those with subscript $CO$ represent COSMOS. (A-i-iii) Snapshots of the CFD-simulated plume at different time steps [rigolli2022learning]. (A-iv) Data-driven heatmap representing the probability of whiff onset. (A-v) Actual odor experience for an agent moving through the CFD simulation. (A-vi) Simulated odor experience for the same trajectory as in A-v. (B-i) Example time series traces of actual and predicted odor. (B-ii) and (B-iii) present histograms of whiff count distributions (peak-normalized between 0 and 1) and average odor concentrations as a function of distance from the source for the real and simulated data, respectively. (C) Two-dimensional histograms comparing whiff statistics (the same statistics as in Fig. 2) with distance from the source for both real and simulated data. The histograms were normalized between 0 and 1. Wasserstein distance distributions between real and simulated whiff statistics, bootstrapped 1000 times as in the previous section and Fig. 2. The red dotted lines indicate observed Wasserstein distance values.
  • Figure 4: COSMOS can be used to test odor navigation strategies. To demonstrate this application, we used COSMOS to provide odor experiences to agents programmed with a cast-and-surge strategy starting from 150 different locations. Throughout the figure, blue represents simulations that used the CFD data to provide odor experiences for the moving agents, and red represents simulations that used COSMOS. (A) Representative trajectories with 6-second snippets of the odor concentration experienced along them. Black dots indicate locations where whiffs were detected. In (A-i), the green dot indicates the agent's starting location, the purple 'x' indicates the source position, and blue and red show the agent's trajectory. The grey scale $\bar{P}(wo|x,y)$ represents the odor probability field. (B-i-iii) UMAP projections of multiple trajectory features, calculated from 150 simulated trajectories using either the CFD data or the COSMOS simulator. The clusters show substantial overlap, with a silhouette score ($a_s$) of 0.01 and a normalized centroid distance ($d_{norm}$) of 0.09 for 150 trajectories. (C) CPU time measured for 150 trajectory simulations, normalized by the number of steps taken in each simulation, shows that COSMOS is 35 times faster than reading odor experiences from the CFD data.
  • Figure 5: COSMOS relies on spatially binned empirical data to determine whiff onset probabilities, whiff concentrations, durations, and intermittencies. (A) Empirical whiff characteristics from the Higher Wind Speed (HWS) candidate dataset, binned in 5 × 5 m bins: (i) Whiff Concentration (WC, a.u.), (ii) Whiff Duration (WD, s), (iii) Whiff Standard Deviation (WSD, a.u.), (iv) Whiff Intermittency (WI, s). (B) Stepwise Whiff Concentration modeling and smoothing. (C) Internal memory-based decision making and sampling of Whiff Intermittency to create experiences similar to the dense and sparse odor packets seen in HWS.
  • ...and 2 more figures
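
The distribution comparison described in Figure 2 (D) and Figure 3 can be illustrated with a short sketch. The captions do not specify the exact resampling procedure, so the version below is an assumption rather than the authors' method. It computes the first-order Wasserstein distance between two one-dimensional samples, then builds a null distribution by drawing both groups with replacement from the pooled data. The function names `wasserstein_1d` and `bootstrap_p_value` are hypothetical. Under this construction a high p-value means the observed distance is typical of two samples drawn from the same distribution, which agrees with the captions' reading that higher p-values indicate stronger similarity.

```python
import random


def wasserstein_1d(a, b):
    """First-order Wasserstein distance between two 1-D empirical samples.

    Computed as the integral of |F_a(x) - F_b(x)| over x, where F_a and F_b
    are the empirical cumulative distribution functions.
    """
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    i = j = 0
    dist = 0.0
    for left, right in zip(points, points[1:]):
        # Advance counters so i/len(a) and j/len(b) equal the CDFs at `left`.
        while i < len(a) and a[i] <= left:
            i += 1
        while j < len(b) and b[j] <= left:
            j += 1
        dist += abs(i / len(a) - j / len(b)) * (right - left)
    return dist


def bootstrap_p_value(real, sim, n_resamples=1000, seed=0):
    """Assumed procedure: pooled bootstrap null for the Wasserstein distance."""
    rng = random.Random(seed)
    observed = wasserstein_1d(real, sim)
    pooled = list(real) + list(sim)
    at_least_as_large = 0
    for _ in range(n_resamples):
        # Resample both groups with replacement from the pooled data.
        fake_real = rng.choices(pooled, k=len(real))
        fake_sim = rng.choices(pooled, k=len(sim))
        if wasserstein_1d(fake_real, fake_sim) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimate away from exactly zero.
    p_value = (at_least_as_large + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    real = [float(x) for x in range(50)]
    shifted = [x + 10.0 for x in real]
    print(bootstrap_p_value(real, shifted, n_resamples=200))
```

In practice `real` and `sim` would hold one whiff statistic each, for example the whiff durations from the measured and simulated time series.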