The Phantom of the Elytra -- Phylogenetic Trait Extraction from Images of Rove Beetles Using Deep Learning -- Is the Mask Enough?
Roberta Hunt, Kim Steenstrup Pedersen
TL;DR
This study tackles the scalability challenge of phylogenetic trait extraction from images by comparing deep learning models trained on three morphological representations (segmentations, binary masks, and Fourier descriptors) using the Rove-Tree-11 dataset of rove beetle images. It finds that binary masks yield the strongest phylogenetic signal, while Fourier descriptors underperform, likely due to limited model capacity and outline distortions. The results suggest that shape may be more phylogenetically informative than dorsal texture in this taxon, though broader validation is needed. Overall, the work informs representation choices for automated morphological phylogenetics and highlights the need for explainability and cross-taxon assessment.
Abstract
Phylogenetic analysis traditionally relies on labor-intensive manual extraction of morphological traits, limiting its scalability for large datasets. Recent advances in deep learning offer the potential to automate this process, but the effectiveness of different morphological representations for phylogenetic trait extraction remains poorly understood. In this study, we compare the performance of deep learning models using three distinct morphological representations: full segmentations, binary masks, and Fourier descriptors of beetle outlines. We evaluate them on the Rove-Tree-11 dataset, a curated collection of images from 215 rove beetle species. Our results demonstrate that the mask-based model outperformed the others, achieving a normalized Align Score (lower is better) of 0.33 ± 0.02 on the test set, compared to 0.45 ± 0.01 for the Fourier-based model and 0.39 ± 0.07 for the segmentation-based model. The performance of the mask-based model likely reflects its ability to capture shape features while taking advantage of the depth and capacity of the ResNet50 architecture. These results also indicate that dorsal textural features, at least in this group of beetles, may be of lower phylogenetic relevance, though further investigation is necessary to confirm this. In contrast, the Fourier-based model suffered from reduced capacity and occasional inaccuracies in outline approximations, particularly in fine structures like legs. These findings highlight the importance of selecting appropriate morphological representations for automated phylogenetic studies and the need for further research into explainability in automatic morphological trait extraction.
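To make the distinction between the representations concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a segmentation whose background pixels are 0 and uses plain complex Fourier descriptors computed with NumPy; the abstract does not state which Fourier formulation was used, and the function names `binary_mask`, `fourier_descriptors`, and `reconstruct` are hypothetical. The toy outline illustrates how truncating to a few harmonics preserves the overall shape but discards fine structure, which is one plausible reading of the outline inaccuracies mentioned above.

```python
import numpy as np


def binary_mask(segmentation: np.ndarray) -> np.ndarray:
    """Collapse a segmentation to a 0/1 silhouette (assumes background == 0)."""
    if segmentation.ndim == 3:  # colour image: any non-zero channel is foreground
        return (segmentation.sum(axis=-1) > 0).astype(np.uint8)
    return (segmentation > 0).astype(np.uint8)


def fourier_descriptors(outline_xy: np.ndarray, n_harmonics: int) -> np.ndarray:
    """Complex Fourier coefficients of a closed outline, zeroing all
    harmonics whose absolute frequency exceeds n_harmonics."""
    z = outline_xy[:, 0] + 1j * outline_xy[:, 1]  # points as complex numbers
    n = len(z)
    coeffs = np.fft.fft(z) / n
    freqs = np.fft.fftfreq(n, d=1.0 / n)  # integer-valued frequencies
    return np.where(np.abs(freqs) <= n_harmonics, coeffs, 0)


def reconstruct(coeffs: np.ndarray) -> np.ndarray:
    """Rebuild outline points (as complex numbers) from the coefficients."""
    return np.fft.ifft(coeffs * len(coeffs))


# Toy outline: an ellipse (the "body") plus a small high-frequency
# component standing in for fine structures such as legs.
t = 2 * np.pi * np.arange(256) / 256
ellipse = 3.0 * np.cos(t) + 1j * 1.0 * np.sin(t)
detailed = ellipse + 0.3 * np.exp(1j * 20 * t)
outline = np.stack([detailed.real, detailed.imag], axis=1)

# Keeping only 5 harmonics recovers the ellipse and drops the fine detail.
smooth = reconstruct(fourier_descriptors(outline, n_harmonics=5))
lost_detail = np.max(np.abs(smooth - detailed))
```

In this sketch `lost_detail` measures how far the truncated outline deviates from the original one; keeping more harmonics reduces it at the cost of a longer descriptor.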
