Hierarchical Gradient-Based Genetic Sampling for Accurate Prediction of Biological Oscillations

Heng Rao, Yu Gu, Jason Zipeng Zhang, Ge Yu, Yang Cao, Minghan Chen

TL;DR

This work addresses the prediction of biological oscillations from system coefficients, a regression task hampered by data imbalance and by sharp transitions at the boundary between oscillatory and non-oscillatory regimes. It introduces the Hierarchical Gradient-based Genetic Sampling (HGGS) framework, a two-layer approach: Gradient-based Filtering builds a balanced coarse dataset, and Multigrid Genetic Sampling refines boundaries and probes high-residual regions via multi-scale grids. In experiments on four biological systems, HGGS attains lower prediction error and better data diversity than seven baselines, reducing both non-oscillatory bias and boundary sensitivity. The method offers a data-efficient sampling strategy for highly constrained coefficient spaces and is presented as applicable to predicting system-level features other than oscillations.

Abstract

Biological oscillations are periodic changes in various signaling processes crucial for the proper functioning of living organisms. These oscillations are modeled by ordinary differential equations, with coefficient variations leading to diverse periodic behaviors, typically measured by oscillatory frequencies. This paper explores sampling techniques for neural networks to model the relationship between system coefficients and oscillatory frequency. However, the scarcity of oscillations in the vast coefficient space results in many samples exhibiting non-periodic behaviors, and small coefficient changes near oscillation boundaries can significantly alter oscillatory properties. This leads to non-oscillatory bias and boundary sensitivity, making accurate predictions difficult. While existing importance and uncertainty sampling approaches partially mitigate these challenges, they either fail to resolve the sensitivity problem or result in redundant sampling. To address these limitations, we propose the Hierarchical Gradient-based Genetic Sampling (HGGS) framework, which improves the accuracy of neural network predictions for biological oscillations. The first layer, Gradient-based Filtering, extracts sensitive oscillation boundaries and removes redundant non-oscillatory samples, creating a balanced coarse dataset. The second layer, Multigrid Genetic Sampling, utilizes residual information to refine these boundaries and explore new high-residual regions, increasing data diversity for model training. Experimental results demonstrate that HGGS outperforms seven comparative sampling methods across four biological systems, highlighting its effectiveness in enhancing sampling and prediction accuracy.
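
To make the first layer concrete, the following is a minimal sketch of the idea behind Gradient-based Filtering, not the authors' implementation. It assumes a hypothetical toy system (`toy_frequency`) in place of an ODE simulation, scores each sample by a finite-difference slope to its `k` nearest neighbours, and keeps only a fraction `keep_ratio` of the flat non-oscillatory samples. The scoring rule, the thresholds, and all function names are assumptions made for illustration.

```python
import math
import random


def toy_frequency(lam):
    """Hypothetical stand-in for simulating the ODE system.

    Oscillates (frequency > 0) only inside a small disc of the unit square.
    """
    x, y = lam
    d = math.hypot(x - 0.5, y - 0.5)
    return 1.0 + 2.0 * (0.2 - d) if d < 0.2 else 0.0


def latin_hypercube(n, dim, rng):
    """One point per stratum along each axis, strata shuffled per axis."""
    columns = []
    for _ in range(dim):
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]


def knn_gradient_scores(points, values, k):
    """Largest finite-difference slope from each point to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        scores.append(
            max(abs(values[i] - values[j]) / max(d, 1e-12) for d, j in dists[:k])
        )
    return scores


def gradient_filter(points, values, k=5, keep_ratio=0.2, rng=None):
    """Return indices of samples kept after filtering (assumed rule, see lead-in)."""
    rng = rng or random.Random(0)
    scores = knn_gradient_scores(points, values, k)
    # Flat samples: non-oscillatory and surrounded only by non-oscillatory neighbours.
    flat = [i for i, s in enumerate(scores) if s == 0.0 and values[i] == 0.0]
    # Everything else is oscillatory or sits next to the oscillation boundary.
    kept = [i for i, s in enumerate(scores) if s > 0.0 or values[i] > 0.0]
    kept += rng.sample(flat, int(keep_ratio * len(flat)))
    return sorted(kept)


rng = random.Random(42)
points = latin_hypercube(600, 2, rng)
values = [toy_frequency(p) for p in points]
kept = gradient_filter(points, values, k=5, keep_ratio=0.2, rng=rng)

before = sum(v > 0 for v in values) / len(values)
after = sum(values[i] > 0 for i in kept) / len(kept)
print(f"oscillatory fraction: {before:.2f} -> {after:.2f} "
      f"({len(points)} -> {len(kept)} samples)")
```

In this sketch every oscillatory or boundary-adjacent sample survives, while most of the flat non-oscillatory region is discarded, so the retained set is smaller and less dominated by zero-frequency samples. A second stage in the spirit of Multigrid Genetic Sampling would then add new samples where the trained model's residuals are largest; that stage is not sketched here.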

Paper Structure (19 sections, 19 equations, 6 figures, 3 tables, 1 algorithm)

Figures (6)

  • Figure 1: Examples of different sampling methods applied to the system coefficients of a cell cycle model [liu2012hybrid]. The orange-blue and black-white color bars indicate trends in oscillation and residual changes, respectively. (a) LHS performs a random sampling of 1000 points in the coefficient domain. (b) IS fails to introduce new samples. (c-d) SMOTE and US lack sufficient coverage in high-residual regions (red box); US also produces redundant samples spread across the domain. (e) Our proposed HGGS method extracts the boundary information (red, yellow boxes) and effectively generates new samples (tear triangle) concentrated in high-residual regions (red box).
  • Figure 2: Overview of Hierarchical Gradient-based Genetic Sampling framework: The input consists of initial samples obtained via LHS. Gradient-based Filtering is then employed to extract sensitive boundary information and eliminate redundant samples, generating balanced coarse data. Next, Multigrid Genetic Sampling constructs candidate sampling from stratified data to sharpen boundary precision. During training, new instances in high-residual areas are explored to continuously improve model performance. Finally, given any set of coefficients $\bm{\lambda}$, the model predicts the oscillatory frequency of the biological system.
  • Figure 3: Accuracy comparison of seven baseline methods (LHS, WRS, IS, IS$^\dag$, US-S, US-P) and our HGGS across four biological systems. HGGS obtained the lowest RMSEs across all testing subsets: minority, boundary, majority, and overall.
  • Figure 4: Ablation study for majority and minority classes across four oscillatory systems. Both Gradient-based Filtering (GF) and Multigrid Genetic Sampling (MGS) layers contribute to the improvement in model accuracy.
  • Figure 5: Sensitivity analysis on (a) number of neighbors $K$, (b) MGS ratio $n_{v1}:n_{v2}$, (c) MGS sampling size $n_s$, and (d) GF filtering ratio $r$. (a-c) are performed on the MPF system, while (d) is conducted on the Cell Cycle system.
  • ...and 1 more figure