Toward AI-Driven Digital Organism: Multiscale Foundation Models for Predicting, Simulating and Programming Biology at All Levels

Le Song, Eran Segal, Eric Xing

TL;DR

Biology spans multiple scales from molecules to populations, yet traditional foundation models struggle to reason across this breadth. The authors propose AI-Driven Digital Organism (AIDO), a modular, multiscale system of foundation models designed to predict, simulate, and program biology across molecular, cellular, and phenotypic levels. They lay out a three-stage construction plan (divide-and-conquer, connect, align) and detail data types, new tokenizers, 2D positional encodings, and differentiable computation graphs to enable cross-scale integration, along with scalable HPC and standardized software. The framework aims to safely accelerate discovery and personalized medicine by enabling in silico design, testing, and optimization, while addressing explainability, trust, and safety. Open-source infrastructure, benchmarks, and a roadmap for future growth are proposed to foster a community-driven ecosystem for biologically grounded AI advances.

Abstract

We present an approach to using AI to model and simulate biology and life. Why is it important? Because biology is at work at the core of medicine, pharmacy, public health, longevity, agriculture and food security, environmental protection, and clean energy. Biology in the physical world is too complex to manipulate and always expensive and risky to tamper with. In this perspective, we lay out an engineering-viable approach to address this challenge by constructing an AI-Driven Digital Organism (AIDO), a system of integrated multiscale foundation models, in a modular, connectable, and holistic fashion to reflect biological scales, connectedness, and complexities. An AIDO opens up a safe, affordable, and high-throughput alternative platform for predicting, simulating, and programming biology at all levels from molecules to cells to individuals. We envision that an AIDO is poised to trigger a new wave of better-guided wet-lab experimentation and better-informed first-principles reasoning, which can eventually help us better decode and improve life.

Paper Structure

This paper contains 28 sections and 15 figures.

Figures (15)

  • Figure 1: Biology is a complex multiscale network.
  • Figure 2: An AI-driven digital organism (AIDO) is a system of multiscale foundation models for biology, which consists of 4 layers: data layer, foundation model system layer, downstream utility layer, and applications of various types of bioengineering problems.
  • Figure 3: Stage 1 of building an AIDO.
  • Figure 4: Tokenizer for complex biological data which leverages vector quantized variational autoencoder architecture. Tokenized or quantized biological data can then inter-operate better with other discrete information and operations in foundation models.
  • Figure 5: Stage 2 of building an AIDO.
  • ...and 10 more figures
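
The Figure 4 caption describes a tokenizer built on a vector quantized variational autoencoder. As a rough illustration of the quantization step alone, the sketch below maps each continuous embedding to the index of its nearest codebook vector, which is how continuous data becomes discrete tokens. The codebook values, the embeddings, and the function names are hypothetical and chosen for illustration; the encoder, decoder, and training losses of a real VQ-VAE are omitted, and nothing here is taken from the authors' implementation.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(embeddings: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each embedding, the index of its nearest codebook vector.

    The returned indices play the role of discrete tokens.
    """
    tokens = []
    for vec in embeddings:
        distances = [squared_distance(vec, code) for code in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


# Hypothetical 2-D codebook with three entries (assumed values).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Hypothetical encoder outputs for three inputs (assumed values).
embeddings = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.8]]

tokens = quantize(embeddings, codebook)
print(tokens)  # [0, 1, 2]
```

Once data is reduced to integer indices like these, it can be fed to a sequence model alongside other discrete symbols, which is the interoperability the caption refers to.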