Efficient Mixture of Geographical Species for On Device Wildlife Monitoring

Emmanuel Azuh Mensah, Joban Mand, Yueheng Ou, Min Jang, Kurtis Heimerl

TL;DR

The paper addresses efficient, on-device wildlife monitoring by conditioning vision-transformer subnetworks on geographic location. It presents a geography-aware mixture-of-experts framework that uses Space2Vec-based location embeddings to route activations to location-specific experts, together with a pruning mechanism that drops the experts left unused at each deployment location. The approach is evaluated on iNaturalist and iWildCam, showing that a subset of geographically specialized experts can maintain high accuracy while reducing inference load, especially with careful imputation for imbalanced data. This work advances practical edge deployment for ecosystem monitoring by enabling geo-conditioned, energy-efficient inference without extensive downstream fine-tuning.

Abstract

Efficient on-device models have become attractive for near-sensor insight generation, which is of particular interest to the ecological conservation community. For this reason, deep learning researchers are proposing more approaches to developing lower-compute models. However, because vision transformers are very new to the edge use case, some approaches remain unexplored, most notably conditional execution of subnetworks based on input data. In this work, we explore training a single species detector that uses conditional computation to bias structured subnetworks in a geographically aware manner. We propose a method for pruning the expert model per location and demonstrate conditional computation performance on two geographically distributed datasets: iNaturalist and iWildCam.
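
The per-location pruning idea can be illustrated with a minimal sketch. It assumes a routing log of `(location_id, expert_id)` pairs recorded on held-out data; the log format, the function name `experts_to_keep`, and the `min_share` threshold are hypothetical choices for illustration and are not taken from the paper.

```python
from collections import Counter


def experts_to_keep(routing_log, location, min_share=0.0):
    """Return the sorted expert ids worth keeping for one deployment location.

    routing_log: iterable of (location_id, expert_id) pairs, one per routed
        activation (hypothetical format, assumed for this sketch).
    min_share: an expert is kept only if its share of the location's routed
        activations is strictly greater than this value; 0.0 keeps every
        expert that fired at least once.
    """
    counts = Counter(e for loc, e in routing_log if loc == location)
    total = sum(counts.values())
    if total == 0:
        # No routing statistics for this location: nothing can be justified.
        return []
    return sorted(e for e, c in counts.items() if c / total > min_share)


# Toy log for a 4-expert layer; location ids mimic S2 cell identifiers.
log = [("2/1", 0), ("2/1", 0), ("2/1", 3), ("4/2", 1), ("4/2", 3), ("2/1", 0)]
kept = experts_to_keep(log, "2/1")
pruned = sorted(set(range(4)) - set(kept))
print("keep:", kept, "prune:", pruned)
```

In this toy log, experts 1 and 2 never fire for cell '2/1', so they are the candidates for removal at that location. A real deployment would gather the statistics per layer and would likely need a nonzero threshold to absorb routing noise.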

Paper Structure

This paper contains 21 sections, 2 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: The proposed model setup first fine-tunes MobileViTV2 [mehta2022_mvitv2] with geographical location embeddings from [wu2024torchspatial] corresponding to each image, which are included in a multimodal supervised contrastive formulation ($\mathcal{L}$) before each transformer layer MLP. We freeze the location encoder but fine-tune the location projection MLP. After fine-tuning, we follow the same approach as [mensah2024visionmixtureexpertswildlife] to derive the experts model. Experts not activated for a deployment location can then be pruned away.
  • Figure 2: S2 regions for the 6 identifiers '2/1', '2/2', '2/0', '0/', '1/', and '4/2', used in evaluating the effect of location-grouped species on subnetwork importance for iNaturalist-Geo-10K.
  • Figure 3: Frequency plot of the iWildcam2020-WILS dataset, demonstrating the difficulty of obtaining class-balanced camera trap datasets for developing models that generalize well at deployment time.
  • Figure 4: Density plot of the global coverage percentage of various species. iNaturalist species distribution prediction data suggests that many species have a local distribution, a characteristic useful in designing a geography-based mixture-of-experts model.
  • Figure 5: Sample cross-layer routing for S2 cell '2/1' at layers $1,3,5,7$ of a $64$-expert MoE model. Grey lines represent routes below the 90th percentile of utilization, blue lines represent routes between the 90th and 99.9th percentiles, and red lines represent routes above the 99.9th percentile.
  • ...and 5 more figures
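
The percentile colouring described for Figure 5 can be sketched as follows. This is only an illustration: the nearest-rank percentile definition, the inclusive thresholds, and the names `nearest_rank` and `color_routes` are assumptions, since the captions do not say how the percentiles were computed.

```python
import math


def nearest_rank(sorted_vals, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    k = max(1, math.ceil(pct * len(sorted_vals) / 100))
    return sorted_vals[k - 1]


def color_routes(utilization):
    """Map each route to 'grey', 'blue', or 'red' by its utilization percentile.

    utilization: dict of route id -> utilization count (hypothetical format).
    Thresholds are inclusive here (an assumption), so with fewer than 1000
    routes the most heavily used route still lands in the 'red' bucket.
    """
    vals = sorted(utilization.values())
    p90 = nearest_rank(vals, 90)
    p999 = nearest_rank(vals, 99.9)
    colors = {}
    for route, u in utilization.items():
        if u >= p999:
            colors[route] = "red"
        elif u >= p90:
            colors[route] = "blue"
        else:
            colors[route] = "grey"
    return colors


# Ten toy routes with utilization 1..10.
util = {f"route{i}": i for i in range(1, 11)}
colors = color_routes(util)
print(colors)
```

With only ten routes, the 99.9th percentile coincides with the maximum, which is why the sketch uses inclusive comparisons. On a 64-expert model with many cross-layer routes, the choice of percentile definition matters far less.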