Dynamic Sensor Selection for Biomarker Discovery

Joshua Pickard, Cooper Stansbury, Amit Surana, Lindsey Muir, Anthony Bloch, Indika Rajapakse

TL;DR

Biological time series data are high-dimensional, making biomarker selection challenging for accurately tracking system states. The authors develop an observability-based framework and Dynamic Sensor Selection (DSS) to allocate biomarkers over time, maximizing observability metrics $\mathcal{M}_1$, $\mathcal{M}_2$, and $\mathcal{M}_3$ under practical constraints. By combining data-driven dynamics (DMD, DGC), time-varying observability analyses, and biological priors (Hi-C constraints), they demonstrate improved state estimation for unmeasured genes, functional observability of phenotypes, and cross-domain applicability to neural activity and EEG data. The work offers a versatile toolkit for designing time-adaptive biomarker panels with broad implications for genomics, cellular reprogramming, and neuroscience, enabling more efficient and robust monitoring of dynamic biological processes through observability-guided sensor selection.

Abstract

Advances in methods of biological data collection are driving the rapid growth of comprehensive datasets across clinical and research settings. These datasets provide the opportunity to monitor biological systems in greater depth and at finer time steps than was achievable in the past. Classically, biomarkers are used to represent and track key aspects of a biological system. Biomarkers retain utility even with the availability of large datasets, since monitoring and interpreting changes in a vast number of molecules remains impractical. However, given the large number of molecules in these datasets, a major challenge is identifying the best biomarkers for a particular setting. Here, we apply principles of observability theory to establish a general methodology for biomarker selection. We demonstrate that observability measures effectively identify biologically meaningful sensors in a range of time series transcriptomics data. Motivated by the practical considerations of biological systems, we introduce the method of dynamic sensor selection (DSS) to maximize observability over time, thus enabling observability over regimes where system dynamics themselves are subject to change. This observability framework is flexible, capable of modeling gene expression dynamics and using auxiliary data, including chromosome conformation, to select biomarkers. Additionally, we demonstrate the applicability of this approach beyond genomics by evaluating the observability of neural activity. These applications demonstrate the utility of observability-guided biomarker selection across a wide range of biological systems, from agriculture and biomanufacturing to neural applications and beyond.
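To make the idea of maximizing observability over time concrete, the following is a minimal sketch, not the authors' implementation. It assumes a linear model fitted by least squares (a basic form of DMD), a sensor matrix that reads out individual state variables, and an illustrative score equal to the ratio of the smallest to the largest singular value of the observability matrix. The function names (`fit_dmd`, `greedy_select`, `dynamic_select`), the windowing scheme, and the score are assumptions introduced here for illustration; the paper's measures $\mathcal{M}_1$, $\mathcal{M}_2$, and $\mathcal{M}_3$ may be defined differently.

```python
import numpy as np

def fit_dmd(X):
    """Least-squares fit of x[t+1] ~ A x[t]; X has shape (n_states, n_times)."""
    return X[:, 1:] @ np.linalg.pinv(X[:, :-1])

def observability_matrix(A, sensors, horizon):
    """Stack C, CA, ..., CA^(horizon-1), where C reads out the chosen states."""
    C = np.eye(A.shape[0])[sensors]
    blocks, CAk = [], C
    for _ in range(horizon):
        blocks.append(CAk)
        CAk = CAk @ A
    return np.vstack(blocks)

def score(A, sensors, horizon):
    """Illustrative score (an assumption): sigma_min / sigma_max of O."""
    O = observability_matrix(A, sensors, horizon)
    s = np.linalg.svd(O, compute_uv=False)
    return s[-1] / s[0] if s[0] > 0 else 0.0

def greedy_select(A, n_sensors, horizon=None):
    """Add one sensor at a time, each time keeping the best-scoring candidate."""
    n = A.shape[0]
    # horizon >= n guarantees O has at least n rows, so s[-1] is the n-th value
    horizon = n if horizon is None else horizon
    chosen = []
    for _ in range(n_sensors):
        candidates = [i for i in range(n) if i not in chosen]
        best = max(candidates, key=lambda i: score(A, chosen + [i], horizon))
        chosen.append(best)
    return chosen

def dynamic_select(X, window, n_sensors):
    """Refit the model on consecutive windows and reselect sensors in each one."""
    picks = []
    for start in range(0, X.shape[1] - window + 1, window):
        A = fit_dmd(X[:, start:start + window])
        picks.append(greedy_select(A, n_sensors))
    return picks
```

Calling `dynamic_select(X, window, n_sensors)` on a snapshot matrix returns one sensor list per window, so the selected set is free to change whenever the fitted dynamics change. Each window must contain enough snapshots for the least-squares fit to be well posed.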

Paper Structure (34 sections, 39 equations, 9 figures, 2 tables, 1 algorithm)

Figures (9)

  • Figure 1: Framework for applying observability to biological data. (Box 1) Models of biological systems are constructed from experimental data, and sensor selection determines which low-dimensional representations of system trajectories will capture the most informative aspects of the system. (A) The thirteen state variables of the Tyson and Novak model are shown along with their first-order interactions. Each state is colored according to its individual contribution to observability, measured by $\mathcal{M}_1$ in the synthetic data. $G1R$, which has the highest average contribution to observability, is boxed in red since it is selected as the first sensor by the greedy sensor selection algorithm. (B) The ratio of singular values $\sigma_k/\sigma_1$ of $\mathcal{O}(\mathbf{x})$ measures $\mathcal{M}_1$ and increases with the number of sensors. This shows that when few sensors are used the smaller singular values are insignificant and the observability matrix $\mathcal{O}(\mathbf{x})$ is approximately low rank. (C) The effective rank of $\mathcal{O}(\mathbf{x})$ is shown when pairs of state variables are included in the sensor set. After $G1R$ is selected by the first iteration of the greedy algorithm, $G2R$ is the next best choice to maximize $\mathcal{M}_1$. (D) The observability $\mathcal{M}_1$ is shown over multiple iterations of the greedy algorithm. At each iteration, the observability increases and the contribution of the next sensor diminishes.
  • Figure 2: State-Dependent Observability. (A) A cell's progression through the cell cycle (whether transitioning through phases during proliferation or stalling in G1/G0 during quiescence) is mediated by CDK2 activity [spencer2013proliferation]. (B) The Andronov-Hopf oscillator demonstrates either asymptotically stable or periodic limit cycle behavior, depending on the parameter $\alpha$. (C) The transition from stable to periodic behavior in the Andronov-Hopf oscillator coincides with an increase in observability. Initial conditions used to construct the empirical observability Gramians were selected by sampling $x_1$ and $x_2$ from uniform distributions bounded by $\pm 1$, $\pm 2$, and $\pm 4$.
  • Figure 3: Biomarker Selection from Time Series Data. (A) DSS improves the estimation error of individual genes from biomarker data relative to the use of biomarkers that are fixed throughout time. (B) Constraining the sensor selection problem with Hi-C positions highly observable biomarker genes on chromosomes to more closely reflect the spatial distribution of genes within the nucleus, as indicated by the gray background. The positions of the top 10% of biomarkers selected with unconstrained DSS, DSS constrained by Hi-C data, and biomarkers common to both methods are shown in pink, green, and blue, respectively. (C) The time series neuron activity was collected for 10-minute segments on three consecutive days. The recorded activity extracted from twenty neurons is shown, with the activity of three neurons highlighted in red, green, and blue. (D) Throughout the three-day period, the observability contributed by each neuron varies greatly. The neuron indicated in red, which is initially the worst sensor, becomes the most observable as its overall activity becomes the largest on day 3. (E) The spatial position of 64 EEG leads colored by their contribution to observability. (F) The signals from each of the 64 EEG leads are ranked based on their observability, with the average rank representing each sensor's mean ranking across all six tasks.
  • Figure 4: Effective Rank of Observability Matrices. The effective rank $R$ of each sensor is shown using $\varepsilon=10^{-5}$. Although the system is observable according to Sedoglavic's algorithm [sedoglavic2001probabilistic], as noted in [liu2013observability], in practice some sensors are better than others.
  • Figure 5: Functionally Observable Cell Types. The top 10% of genes that contribute to the first 5 functionally observable modes of the observability matrix $\mathcal{O}$, which are obtained as the right singular vectors $\mathbf{V}$ from $\mathcal{O}=\mathbf{U}\Sigma\mathbf{V}^\top$, are enriched to identify which cell types are observable [xie2021gene]. In the recreation of Weintraub's reprogramming experiment, Fibroblasts are reprogrammed to myogenic lineages. The functionally observable modes are highly enriched for Fibroblasts, Myofibroblasts, and Myoblasts, which are progenitors of Myocytes. Myocytes, which come later in differentiation than Myoblasts, are not functionally observable in this data, which is consistent with the short duration over which the experiment is monitored. The strong enrichment for Fibroblasts and myogenic lineages indicates that the DSS-selected biomarkers make the early stages of the reprogramming process functionally observable.
  • ...and 4 more figures
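
The captions for Figures 4 and 5 refer to an effective rank computed with a threshold $\varepsilon$ and to modes taken from the right singular vectors of $\mathcal{O}=\mathbf{U}\Sigma\mathbf{V}^\top$. The sketch below shows one plausible way to compute both with NumPy. It assumes that the effective rank counts the singular values with $\sigma_k/\sigma_1 > \varepsilon$ and that a gene's contribution to a mode is the absolute value of its loading in that singular vector; neither definition is confirmed by the text shown here, and the function names are hypothetical.

```python
import numpy as np

def effective_rank(O, eps=1e-5):
    """Count singular values with sigma_k / sigma_1 > eps (assumed definition)."""
    s = np.linalg.svd(O, compute_uv=False)
    if s[0] == 0:
        return 0
    return int(np.sum(s / s[0] > eps))

def top_mode_genes(O, n_modes=5, top_frac=0.10):
    """For each leading right singular vector of O, return the indices of the
    states with the largest absolute loading (assumed notion of contribution)."""
    _, _, Vt = np.linalg.svd(O, full_matrices=False)
    # Number of states kept per mode: top_frac of all states, at least one
    k = max(1, int(round(top_frac * O.shape[1])))
    return [np.argsort(-np.abs(v))[:k] for v in Vt[:n_modes]]
```

In a workflow like the one described for Figure 5, the index lists returned by `top_mode_genes` would be mapped back to gene names and passed to a separate enrichment tool; that step is outside this sketch.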