Efficient Mixture of Geographical Species for On Device Wildlife Monitoring
Emmanuel Azuh Mensah, Joban Mand, Yueheng Ou, Min Jang, Kurtis Heimerl
TL;DR
The paper addresses efficient, on-device wildlife monitoring by conditioning vision-transformer subnetworks on geographic location. It presents a geography-aware mixture-of-experts framework that uses Space2Vec-based location embeddings to route activations to location-specific experts, with a pruning mechanism to drop unused experts per deployment location. The approach is evaluated on iNaturalist and iWildCam, showing that a subset of geographically specialized experts can maintain high accuracy while reducing inference load, especially with careful imputation for imbalanced data. This work advances practical edge deployment for ecosystem monitoring by enabling geo-conditioned, energy-efficient inference without extensive downstream fine-tuning.
Abstract
Efficient on-device models have become attractive for near-sensor insight generation, which is of particular interest to the ecological conservation community. For this reason, deep learning researchers are proposing more approaches to developing lower-compute models. However, because vision transformers are still new to the edge use case, some approaches remain unexplored, most notably conditional execution of subnetworks based on input data. In this work, we explore training a single species detector that uses conditional computation to bias structured subnetworks in a geographically aware manner. We propose a method for pruning the expert model per location and demonstrate conditional computation performance on two geographically distributed datasets: iNaturalist and iWildCam.
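To make the idea of location-conditioned routing and per-location pruning concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a gate that depends only on the deployment coordinates, so the set of active experts is fixed per site and the remaining experts can be dropped before deployment. The sinusoidal `encode_location` is a simplified stand-in for a Space2Vec-style embedding, and all names, sizes, and the toy experts are hypothetical.

```python
import math
import random

def encode_location(lat, lon, num_scales=4, min_scale=1.0, max_scale=180.0):
    """Simplified multi-scale sinusoidal encoding of (lat, lon) in degrees.
    Illustrative stand-in only; this is NOT the Space2Vec implementation."""
    feats = []
    for s in range(num_scales):
        # Wavelengths follow a geometric progression from min_scale to max_scale.
        scale = min_scale * (max_scale / min_scale) ** (s / max(num_scales - 1, 1))
        for coord in (lat, lon):
            feats.append(math.sin(coord / scale))
            feats.append(math.cos(coord / scale))
    return feats

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gate_weights(loc_feats, gate_matrix):
    """Linear gate: one logit per expert, normalized with a softmax."""
    logits = [sum(w * f for w, f in zip(row, loc_feats)) for row in gate_matrix]
    return softmax(logits)

def prune_experts(weights, keep):
    """Keep the `keep` highest-weighted experts and renormalize their weights."""
    top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:keep]
    total = sum(weights[i] for i in top)
    return {i: weights[i] / total for i in sorted(top)}

def mixture_output(x, experts, kept):
    """Weighted sum over the kept experts; pruned experts are never evaluated."""
    out = [0.0] * len(x)
    for i, w in kept.items():
        y = experts[i](x)
        out = [o + w * v for o, v in zip(out, y)]
    return out

rng = random.Random(0)
NUM_EXPERTS, FEAT_DIM = 8, 16  # hypothetical sizes; FEAT_DIM = 4 scales * 2 coords * 2
gate_matrix = [[rng.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(NUM_EXPERTS)]
# Toy experts: each multiplies its input by a different constant.
experts = [lambda x, c=i + 1: [c * v for v in x] for i in range(NUM_EXPERTS)]

loc = encode_location(47.6, -122.3)  # example deployment coordinates
kept = prune_experts(gate_weights(loc, gate_matrix), keep=2)
y = mixture_output([1.0, 2.0], experts, kept)
```

In this sketch only the two retained experts would need to be stored and executed on the device for that site. A gate that also depends on the input activations would require a different pruning criterion, for example expert usage statistics collected per location.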
