Deep-learning-based pan-phenomic data reveals the explosive evolution of avian visual disparity
Jiao Sun
TL;DR
This study addresses the subjective biases of traditional morphometrics with a deep-learning, pan-phenomic approach that quantifies avian morphology using a CNN trained to recognise more than 10,000 bird species. The 512-dimensional weights of the final fully connected (fc) layer define a high-dimensional morphospace, which is reduced to the minimal subspace that explains a substantial portion of the variance; phenotypic relationships are measured by cosine similarity. The embedding encodes phenotypic convergence and shows that species richness is the primary driver of morphospace expansion, with a visual "early burst" in disparity after the K-Pg extinction. The work also shows that hierarchical taxonomic structure emerges even though the model is trained on flat labels, and that CNNs can learn holistic body plans rather than textures, offering a scalable framework for macroevolutionary analysis across taxa.
Abstract
The evolution of biological morphology is critical for understanding the diversity of the natural world, yet traditional analyses often introduce subjective biases in the selection and coding of morphological traits. This study uses deep learning, specifically a ResNet34 model capable of recognising over 10,000 bird species, to explore avian morphological evolution. We extract the weights of the model's final fully connected (fc) layer and investigate the semantic alignment between the high-dimensional embedding space learned by the model and biological phenotypes. The results demonstrate that this embedding space encodes phenotypic convergence. We then assess morphological disparity among taxa and evaluate its association with species richness, demonstrating that species richness is the primary driver of morphospace expansion. Moreover, the disparity-through-time analysis reveals a visual "early burst" after the K-Pg extinction. Although aimed mainly at evolutionary analysis, this study also provides insights into the interpretability of deep neural networks. We demonstrate that hierarchical semantic structures (biological taxonomy) emerge in the high-dimensional embedding space even though the model is trained only on flat labels. Furthermore, using adversarial examples, we provide evidence that, in this task, our model can overcome texture bias and learn holistic shape representations (body plans), challenging the prevailing view that CNNs rely primarily on local textures.
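
The embedding analysis described above (one fc weight vector per species, reduction to a variance-explaining subspace, cosine similarity between species) can be illustrated with a minimal sketch. This is not the authors' implementation: the random matrix stands in for the real ResNet34 fc weights, the 0.9 variance threshold is an assumed value, the function names are hypothetical, and whether similarity is computed before or after the reduction is not specified here.

```python
import numpy as np


def minimal_subspace(weights, variance_threshold=0.9):
    """Project rows onto the fewest principal components whose cumulative
    explained variance reaches `variance_threshold` (assumed value)."""
    centered = weights - weights.mean(axis=0)
    # Rows of vt are the principal axes; s holds the singular values.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # First index at which the cumulative variance reaches the threshold.
    k = int(np.searchsorted(np.cumsum(explained), variance_threshold)) + 1
    k = min(k, len(s))
    return centered @ vt[:k].T, k


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    return unit @ unit.T


# Stand-in for the fc weight matrix: one 512-dimensional row per species.
# A real analysis would read these rows from the trained classifier.
rng = np.random.default_rng(0)
fc_weights = rng.normal(size=(200, 512))

reduced, k = minimal_subspace(fc_weights, variance_threshold=0.9)
similarity = cosine_similarity_matrix(fc_weights)
```

Here `similarity[i, j]` close to 1 would indicate that species `i` and `j` occupy nearby directions in the embedding, which is the kind of signal a convergence analysis could build on.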
