Machine learning approaches to explore important features behind bird flight modes

Yukino Kawai, Tatsuya Hisada, Kozue Shiomi, Momoko Hayamizu

TL;DR

This study quantifies the relative importance of phenotypic features behind bird flight styles using Feature Importance (FI) and SHAP values, uses those importances as weights to construct weighted L1 distance matrices and, from them, neighbor-joining (NJ) trees, and highlights the complexity of constructing a biologically useful distance matrix from correlated phenotypic features.

Abstract

Birds exhibit a variety of flight styles, primarily classified as flapping, which is characterized by rapid up-and-down wing movements, and soaring, which involves gliding with wings outstretched. Each species usually performs a specific flight style, and this has been discussed in terms of morphological and physiological adaptation. However, it remains a challenge to evaluate the contribution of each factor to the difference in flight styles. In this study, using phenotypic data (such as body mass, wing length, and breeding periods) from 635 migratory bird species, we quantified the relative importance of each feature using Feature Importance (FI) and SHAP values, used these importances as weights to construct weighted L1 distance matrices, and built neighbor-joining (NJ) trees from those matrices. Comparison with traditional phylogenetic logistic regression revealed similarity in the top-ranked features, but also differences in the overall weight distributions and in the clustering patterns of the NJ trees. Our results highlight the complexity of constructing a biologically useful distance matrix from correlated phenotypic features, while the complementary nature of these weighting methods suggests the potential utility of multi-faceted approaches to assessing feature contributions.
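
As an illustration of the weighted L1 distance mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that each species is a row of already standardized feature values and that the importance scores (for example, mean FI or mean absolute SHAP values) are non-negative and are normalized to sum to one; the function name `weighted_l1_distance_matrix` and the toy numbers are hypothetical.

```python
import numpy as np


def weighted_l1_distance_matrix(X, weights):
    """Pairwise weighted L1 distances between the rows of X.

    D[i, j] = sum_k w_k * |X[i, k] - X[j, k]|, where w is `weights`
    rescaled to sum to 1 (the rescaling is an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    if X.ndim != 2 or w.shape != (X.shape[1],):
        raise ValueError("X must be (n_species, n_features) with one weight per feature")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()
    # Absolute differences for every pair of rows: shape (n, n, n_features).
    diffs = np.abs(X[:, None, :] - X[None, :, :])
    # Weighted sum over the feature axis gives the (n, n) distance matrix.
    return diffs @ w


# Toy example: three hypothetical species, two standardized features.
X_example = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]]
D_example = weighted_l1_distance_matrix(X_example, weights=[3.0, 1.0])
print(D_example)
```

With weights 3:1, the first feature dominates: the distance between the first two rows is 0.75 × 1 + 0.25 × 2 = 1.25. A matrix of this form can then be passed to any neighbor-joining implementation to obtain a tree.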

Paper Structure

This paper contains 13 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Flapping type (left) and soaring type (right).
  • Figure 2: Illustration of wing length $W$ and the distance $S$ from the carpal joint to the tip of the second primary feather, used to calculate HWI (Sheard et al., 2020).
  • Figure 3: Normalized mean FI and SHAP values of the bird features.
  • Figure 4: The NJ trees $T_{\rm ave}$, $T_{\rm FI}$, $T_{\rm SHAP}$, and $T_{\rm PL}$ constructed from the distance matrices $D_{\rm ave}$, $D_{\rm FI}$, $D_{\rm SHAP}$, and $D_{\rm PL}$, respectively. For ease of comparison, only the subtrees for the soaring species are colored and annotated. A and A$^\prime$ are clusters mainly consisting of the Accipitridae, Diomedeidae, Fregatidae, and Procellariidae families; B, B$^\prime$, and B$^{\prime\prime}$ are clusters mainly formed by the Falconidae family; C is a singleton cluster of the Anhingidae family; D and D$^\prime$ are clusters mainly consisting of the Accipitridae, Cathartidae, and Falconidae families; E and E$^\prime$ are clusters containing all species of the Ciconiidae and Pelecanidae families.
  • Figure 5: Histograms of the standardized values of each feature in the flapping and soaring species.