Position: Biology is the Challenge Physics-Informed ML Needs to Evolve

Julien Martinelli

TL;DR

The paper argues that physics-informed ML (PIML) struggles to model complex biological dynamical systems due to uncertain priors, data heterogeneity, partial observability, and high dimensionality. It then proposes Biology-Informed ML (BIML), an evolution of PIML defined by four pillars—uncertainty quantification, contextualization, constrained latent structure inference, and scalability—and enabled by Foundation Models (FMs) and Large Language Models (LLMs) as integrative tools. BIML aims to ground mechanistic inference in biological context, leveraging priors and latent components while maintaining tractable, scalable inference. The authors illustrate BIML with a gene regulatory network inference use case and offer concrete recommendations for benchmarks, an application-driven research approach, and community-building to accelerate adoption in biology and related domains.

Abstract

Physics-Informed Machine Learning (PIML) has successfully integrated mechanistic understanding into machine learning, particularly in domains governed by well-known physical laws. This success has motivated efforts to apply PIML to biology, a field rich in dynamical systems but shaped by different constraints. Biological modeling, however, presents unique challenges: multi-faceted and uncertain prior knowledge, heterogeneous and noisy data, partial observability, and complex, high-dimensional networks. In this position paper, we argue that these challenges should not be seen as obstacles to PIML, but as catalysts for its evolution. We propose Biology-Informed Machine Learning (BIML): a principled extension of PIML that retains its structural grounding while adapting to the practical realities of biology. Rather than replacing PIML, BIML retools its methods to operate under softer, probabilistic forms of prior knowledge. We outline four foundational pillars as a roadmap for this transition: uncertainty quantification, contextualization, constrained latent structure inference, and scalability. Foundation Models and Large Language Models will be key enablers, bridging human expertise with computational modeling. We conclude with concrete recommendations to build the BIML ecosystem and channel PIML-inspired innovation toward challenges of high scientific and societal relevance.

Paper Structure

This paper contains 31 sections, 1 equation, 3 figures.

Figures (3)

  • Figure 1: Biology-specific challenges in dynamical systems discovery. While the field has mostly focused on problems arising from physics (top panel), the resulting methods are not geared towards the unique challenges inherent to biological data (lower panel).
  • Figure 2: The four pillars of Biology-Informed Machine Learning and the integrative role of Foundation Models. FMs and LLMs support each BIML pillar by embedding biological knowledge, guiding inference, and enabling scalable, uncertainty-aware modeling across heterogeneous and partially observed systems.
  • Figure 3: Growth of physics-informed ML terminology in biomedicine (2015–2024).