Towards AI-Guided Open-World Ecological Taxonomic Classification

Cheng Yaw Low, Heejoon Koo, Jaewoo Park, Kaleb Mesfin Asfaw, Meeyoung Cha

TL;DR

The paper tackles open-world ecological taxonomy by addressing the intertwined challenges of long-tailed distributions, fine-grained differentiation, spatiotemporal domain shifts, and open-set taxa in plant classification. It introduces TaxoNet, a domain-specific embedding backbone equipped with a dual-margin penalization loss and a norm-guided sampling strategy to balance learning signals across head and tail taxa, supplemented by adaptive margin scaling and open-set awareness. The approach yields significant macro-recall gains on three diverse plant datasets, improves domain generalization, and enhances unseen-taxa rejection, while also demonstrating limitations of general-purpose multimodal models in fine-grained plant taxonomy. Together, these contributions advance scalable, open-world plant biodiversity monitoring and motivate future development of domain-specialized multimodal foundations for ecological reasoning.
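Macro-recall, the metric named above, is the unweighted mean of per-class recall, so a rare taxon counts as much as a common one. The following is a minimal sketch of that standard definition; the function name `macro_recall` and the toy labels are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy long-tailed example (hypothetical labels): 8 head samples, 2 tail samples.
y_true = ["oak"] * 8 + ["ginkgo"] * 2
y_pred = ["oak"] * 8 + ["oak", "ginkgo"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                      # overall accuracy is dominated by the head class
print(macro_recall(y_true, y_pred))  # per-class recalls are 1.0 (oak) and 0.5 (ginkgo)
```

In this toy case a single tail-class error barely moves accuracy but costs a quarter of the macro-recall, which is why the metric suits long-tailed evaluation.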

Abstract

AI-guided classification of ecological families, genera, and species underpins global sustainability efforts such as biodiversity monitoring, conservation planning, and policy-making. Progress toward this goal is hindered by long-tailed taxonomic distributions from class imbalance, along with fine-grained taxonomic variations, test-time spatiotemporal domain shifts, and closed-set assumptions that can only recognize previously seen taxa. We introduce the Open-World Ecological Taxonomy Classification, a unified framework that captures the co-occurrence of these challenges in realistic ecological settings. To address them, we propose TaxoNet, an embedding-based encoder with a dual-margin penalization loss that strengthens learning signals from rare underrepresented taxa while mitigating the dominance of overrepresented ones, directly confronting interrelated challenges. We evaluate our method on diverse ecological domains: Google Auto-Arborist (urban trees), iNat-Plantae (Plantae observations from various ecosystems in iNaturalist-2019), and NAFlora-Mini (a curated herbarium collection). Our model consistently outperforms baselines, particularly for rare taxa, establishing a strong foundation for open-world plant taxonomic monitoring. Our findings further show that general-purpose multimodal foundation models remain constrained in plant-domain applications.
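To make the closed-set limitation concrete, the sketch below shows a generic open-set baseline: reject a prediction as "unknown" when the maximum softmax probability falls below a threshold. This is a common baseline and is not claimed to be TaxoNet's mechanism; the function names, the threshold of 0.5, and the class names are assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_open_set(logits, class_names, threshold=0.5):
    """Return (label, confidence); label is "unknown" if confidence < threshold.

    The threshold is an assumed hyperparameter that would be tuned on
    held-out data in practice.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown", probs[best]
    return class_names[best], probs[best]


names = ["Quercus", "Acer", "Ginkgo"]            # hypothetical known taxa
print(predict_open_set([4.0, 0.0, 0.0], names))  # confident: a known taxon is returned
print(predict_open_set([0.1, 0.0, 0.05], names)) # near-uniform scores: rejected as unknown
```

A purely closed-set classifier would always return one of the known names, even for the second input.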

Paper Structure

This paper contains 51 sections, 2 theorems, 15 equations, 6 figures, 11 tables, 1 algorithm.

Key Result

Proposition 1

Let $I_c$ be the set of indices of samples in the $c$-th class. Then, as $\sigma_c \to 0$, where $\mu_c$ is the class-wise mean and $\sigma_c$ is the class-wise standard deviation of the predicted probabilities, with $\overline{p}_c = \frac{1}{N_c} \sum_{i \in I_c} p_{i, c}$ and $\alpha = N_c (1 - \overline{p}_c)$
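The quantities named in Proposition 1 can be computed directly from a model's predicted probabilities. The sketch below is illustrative only: it assumes $p_{i,c}$ is the probability that sample $i$ assigns to its own class $c$, that $N_c = |I_c|$, and that $\sigma_c$ is the population standard deviation; the function name and the toy data are hypothetical.

```python
from statistics import fmean, pstdev


def class_statistics(probs, labels, c):
    """Compute (N_c, p_bar_c, sigma_c, alpha) for class c.

    probs[i][c] is the predicted probability of class c for sample i;
    labels[i] is the ground-truth class index of sample i.
    """
    # I_c: samples whose ground-truth class is c; keep their probability for c.
    p_c = [probs[i][c] for i, y in enumerate(labels) if y == c]
    n_c = len(p_c)
    p_bar = fmean(p_c)           # \overline{p}_c = (1 / N_c) * sum_{i in I_c} p_{i,c}
    sigma = pstdev(p_c)          # class-wise std (population form is an assumption)
    alpha = n_c * (1.0 - p_bar)  # alpha = N_c * (1 - \overline{p}_c)
    return n_c, p_bar, sigma, alpha


# Toy example (hypothetical): two samples of class 0, one of class 1.
probs = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
labels = [0, 0, 1]
print(class_statistics(probs, labels, 0))
```

Under these assumptions, $\alpha$ grows with both the class size and the average probability mass the class's samples place elsewhere, and $\sigma_c \to 0$ corresponds to all samples of the class receiving nearly the same probability.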

Figures (6)

  • Figure 1: The proposed Open-World Ecological Taxonomy Challenge, which organizes general, unique, and deployment-level challenges according to realistic ecological scenarios. The challenges targeted in this work are shown with their corresponding problem settings; for example, the open-set task (C1) involves recognizing both known and unknown taxa during inference.
  • Figure 2: Schematic comparison of softmax-based losses and the proposed dual-margin penalization loss: (a) Softmax losses cause prototype misalignment as head classes strongly repel tail embeddings; (b) the proposed loss regulates within-class attraction and between-class repulsion, inducing compact tail-class embeddings.
  • Figure 3: Embedding norm distribution for the 200 highest- and lowest-norm instances in AA-Central. High-norm examples come exclusively from head and between classes; low-norm examples come from tail classes or from classes with greater within-class variation (e.g., due to seasonal changes). Class Group is defined by training cardinality: head $>$ 2,000 samples; tail $<$ 100 samples; between otherwise.
  • Figure 4: Success and failure cases on Auto-Arborist and iNat-Plantae under open-world ecological conditions. Softmax prediction scores are annotated on each image.
  • Figure 5: Zero-shot chain-of-thought (CoT) prompt template used to evaluate MLLMs, instructing the models to perform hierarchical reasoning by first predicting the genus and then refining the prediction to the species level. This approach is inspired by sequential diagnosis prediction utilizing medical ontology-guided implicit regularization [koo2024next]. We also enforce a structured JSON output for evaluation. BioCLIP, by contrast, is evaluated via CLIP-style embedding similarity using its vision-language encoders (e.g., encoding class-name prompts such as "a photo of Quercus robur"), without CoT-based reasoning.
  • ...and 1 more figure
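The class grouping described in the Figure 3 caption (head $>$ 2,000 training samples, tail $<$ 100, between otherwise) can be sketched as follows. The thresholds come from the caption; the function name, parameter names, and toy labels are assumptions for illustration.

```python
from collections import Counter


def class_groups(train_labels, head_min=2000, tail_max=100):
    """Map each class to "head", "tail", or "between" by training cardinality."""
    counts = Counter(train_labels)
    groups = {}
    for cls, n in counts.items():
        if n > head_min:    # strictly more than 2,000 samples
            groups[cls] = "head"
        elif n < tail_max:  # strictly fewer than 100 samples
            groups[cls] = "tail"
        else:               # everything else, boundaries included
            groups[cls] = "between"
    return groups


# Toy label list (hypothetical class names) covering both boundary cases.
labels = ["a"] * 2001 + ["b"] * 2000 + ["c"] * 99 + ["d"] * 100
print(class_groups(labels))
```

Because both inequalities are strict, classes with exactly 2,000 or exactly 100 training samples fall into the between group.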

Theorems & Definitions (4)

  • Proposition 1
  • proof
  • Proposition 2
  • proof