
A Review of BioTree Construction in the Context of Information Fusion: Priors, Methods, Applications and Trends

Zelin Zang, Yongjie Xu, Chenrui Duan, Yue Yuan, Jinlin Wu, Zhen Lei, Stan Z. Li

TL;DR

This review addresses the challenge of constructing BioTrees by integrating biological priors with multimodal data through deep learning. It surveys data modalities, priors, classical methods, and DL-based approaches, highlighting how information fusion can improve accuracy and interpretability. The article outlines applications across phylogenetics, development, medicine, and ecology, and discusses current limitations and future directions for priors-informed, multimodal BioTree models. By consolidating data resources, priors, and methodological trends, the work aims to guide the development of scalable, interpretable BioTree tools at the intersection of biology and deep learning.

Abstract

Biological tree (BioTree) analysis is a foundational tool in biology, enabling the exploration of evolutionary and differentiation relationships among organisms, genes, and cells. Traditional tree construction methods, while instrumental in early research, face significant challenges in handling the growing complexity and scale of modern biological data, particularly in integrating multimodal datasets. Advances in deep learning (DL) offer transformative opportunities by enabling the fusion of biological prior knowledge with data-driven models. These approaches address key limitations of traditional methods, facilitating the construction of more accurate and interpretable BioTrees. This review highlights critical biological priors essential for phylogenetic and differentiation tree analyses and explores strategies for integrating these priors into DL models to enhance accuracy and interpretability. Additionally, the review systematically examines commonly used data modalities and databases, offering a valuable resource for developing and evaluating multimodal fusion models. Traditional tree construction methods are critically assessed, focusing on their biological assumptions, technical limitations, and scalability issues. Recent advancements in DL-based tree generation methods are reviewed, emphasizing their innovative approaches to multimodal integration and prior knowledge incorporation. Finally, the review discusses diverse applications of BioTrees in various biological disciplines, from phylogenetics to developmental biology, and outlines future trends in leveraging DL to advance BioTree research. By addressing the challenges of data complexity and prior knowledge integration, this review aims to inspire interdisciplinary innovation at the intersection of biology and DL.

Paper Structure (19 sections, 2 figures)

This paper contains 19 sections, 2 figures.

Figures (2)

  • Figure 1: Number of publications in Cell, Nature, and Science related to cell differentiation and phylogenetic trees from 1980 to 2025. Publications were retrieved from the Web of Science Core Collection using the following search queries: (a) Phylogenetic Tree (469 publications): TS=("phylogenetic tree" OR "evolutionary tree" OR "tree of life" OR "phylogenetic analysis" OR "tree-based" OR "phylogenetic reconstruction" OR "phylogenetic relationship" OR "evolutionary relationships"). (b) Cell Differentiation (1689 publications): TS=("cell differentiation" OR "cellular differentiation" OR "differentiation of cells" OR "trajectory inference" OR "lineage inference" OR "pseudotime inference" OR "cell lineage" OR "cell fate"). The blue bars represent publications related to cell differentiation, while the orange bars represent those related to phylogenetic trees. The data show an increasing trend in both fields, with more pronounced growth in cell differentiation.
  • Figure 2: Relevance Analysis of Papers on Cell Differentiation and Phylogenetic Trees to the "Information Fusion" Topic (Published in Cell, Nature, and Science). This analysis evaluates the relevance of papers to the topic of "Information Fusion" using the DeepSeek-70B large language model. Each bar represents the average relevance score for papers published in a given year, showing how closely research aligns with the "Information Fusion" theme over time. A relevance score of 0 means the paper is not relevant to the topic, while a score of 1 indicates high relevance. The code for the DeepSeek-70B analysis is available at https://github.com/zangzelin/code_info_fusion_biotree.
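
The per-year averaging described for Figure 2 can be illustrated with a minimal sketch. This is not the authors' code (which is in the repository linked above). The function name `mean_relevance_by_year`, the `(year, score)` record layout, and the example scores are all hypothetical, and the scores are assumed to have already been produced by the language model.

```python
from collections import defaultdict


def mean_relevance_by_year(records):
    """Average relevance scores per publication year.

    `records` is an iterable of (year, score) pairs, where each score is
    assumed to lie in [0, 1] (0 = not relevant, 1 = highly relevant).
    Returns a dict mapping year -> mean score, ordered by year.
    """
    totals = defaultdict(lambda: [0.0, 0])  # year -> [sum of scores, count]
    for year, score in records:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score {score!r} for year {year} is outside [0, 1]")
        totals[year][0] += score
        totals[year][1] += 1
    return {year: total / count for year, (total, count) in sorted(totals.items())}


if __name__ == "__main__":
    # Hypothetical scores, for illustration only; not data from the paper.
    example = [(2020, 0.0), (2020, 1.0), (2021, 0.5)]
    print(mean_relevance_by_year(example))
```

Each value in the returned dictionary corresponds to the height of one bar in a plot of this kind.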