Priority prediction of Asian Hornet sighting report using machine learning methods
Yixin Liu, Jiaxin Guo, Jieyang Dong, Luoqian Jiang, Haoyuan Ouyang
TL;DR
This paper addresses prioritizing public sighting reports of Vespa mandarinia to enable targeted nest verification and destruction. It formulates a binary credibility prediction problem where $p^{(i)}=f(x^{(i)};\theta)$ is learned by a logistic regression with a weighted cross-entropy loss to handle extreme class imbalance. It extracts multimodal features from location, time, image, and text, building a feature vector $\mathbf{s}$ fed to the classifier. It then computes report priority via mutual influence $F_{i,j}$ and $Z_i=\sum_j F_{i,j} p_j$, enabling ranking that favors corroborated sightings. Experiments on the WSDA dataset from 2019–2021 show improved accuracy and prioritization performance, validating the method's practical value for pest-control resource allocation.
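To make the credibility model concrete, the following is a minimal sketch of logistic regression trained with a class-weighted cross-entropy loss, assuming plain batch gradient descent in NumPy. The function names, the learning rate, the step count, and the choice of positive-class weight are illustrative assumptions, not the authors' implementation or hyperparameters.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def weighted_cross_entropy(p, y, w_pos, w_neg=1.0, eps=1e-12):
    """Mean cross-entropy where positive and negative samples carry different weights."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(w_pos * y * np.log(p) + w_neg * (1.0 - y) * np.log(1.0 - p))


def fit_logistic(X, y, w_pos, w_neg=1.0, lr=0.1, n_steps=2000):
    """Fit p = sigmoid(X @ theta + bias) by gradient descent on the weighted loss.

    lr and n_steps are assumed values chosen for illustration only.
    """
    n, d = X.shape
    theta = np.zeros(d)
    bias = 0.0
    sample_w = np.where(y == 1, w_pos, w_neg)  # per-sample class weight
    for _ in range(n_steps):
        p = sigmoid(X @ theta + bias)
        grad_z = sample_w * (p - y) / n  # derivative of the loss w.r.t. each score
        theta -= lr * (X.T @ grad_z)
        bias -= lr * grad_z.sum()
    return theta, bias


if __name__ == "__main__":
    # Synthetic, heavily imbalanced data (hypothetical; not the WSDA dataset).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (500, 2)), rng.normal(2.5, 1.0, (10, 2))])
    y = np.concatenate([np.zeros(500), np.ones(10)])
    w_pos = (y == 0).sum() / (y == 1).sum()  # one common heuristic: inverse class ratio
    theta, bias = fit_logistic(X, y, w_pos)
    print(weighted_cross_entropy(sigmoid(X @ theta + bias), y, w_pos))
```

Weighting the rare positive class more heavily keeps the few confirmed sightings from being swamped by the many negative reports during training; the inverse class ratio used here is only one possible choice of weight.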
Abstract
An infamous invader of the North American ecosystem, the Asian giant hornet (Vespa mandarinia) is devastating not only to native bee colonies but also to local apiculture. One of the most effective ways to combat this harmful species is to locate and destroy its nests. By mobilizing the public to actively report possible sightings of the Asian giant hornet, the government can promptly send inspectors to confirm and, where possible, destroy the nests. However, such confirmation requires lab expertise, and manually checking the reports one by one is extremely labor-intensive. Moreover, given the public's limited knowledge of the Asian giant hornet and the randomness of report submission, only a few of the numerous reports prove positive, i.e. correspond to existing nests. Classifying or prioritizing the reports efficiently and automatically, so as to decide where to dispatch personnel, is therefore of great significance to the control of the Asian giant hornet. In this paper, we propose a machine-learning method to predict the priority of sighting reports. We model the optimal prioritization of sighting reports as a classification and prediction problem. We extract a variety of features from each report: location, time, image(s), and textual description. Based on these features, we propose a logistic regression classifier that predicts the credibility of a given report. Furthermore, our model quantifies the influence between reports to obtain a priority ranking of the reports. Extensive experiments on the public dataset from the WSDA (the Washington State Department of Agriculture) demonstrate the effectiveness of our method.
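The ranking step can be sketched as follows, using the priority score $Z_i=\sum_j F_{i,j} p_j$ stated in the TL;DR. The form of the mutual influence $F_{i,j}$ is not given in this section, so the sketch assumes a hypothetical Gaussian kernel over spatial distance and time difference; the function names and the length scales `length_km` and `length_days` are illustrative assumptions. The sum here also includes $j=i$ with $F_{i,i}=1$, which is likewise an assumption.

```python
import numpy as np


def influence_matrix(coords_km, days, length_km=10.0, length_days=30.0):
    """Hypothetical mutual influence F[i, j]: a Gaussian kernel that decays with
    the distance (km) and time gap (days) between reports i and j.
    This kernel and its length scales are assumptions, not the paper's definition."""
    d2 = ((coords_km[:, None, :] - coords_km[None, :, :]) ** 2).sum(axis=-1)
    t2 = (days[:, None] - days[None, :]) ** 2
    return np.exp(-d2 / (2.0 * length_km**2) - t2 / (2.0 * length_days**2))


def priority_scores(F, p):
    """Z_i = sum_j F[i, j] * p[j], with p the predicted credibilities."""
    return F @ p


def rank_reports(F, p):
    """Return report indices sorted from highest to lowest priority, plus the scores."""
    z = priority_scores(F, p)
    return np.argsort(-z, kind="stable"), z


if __name__ == "__main__":
    # Three made-up reports: two close in space and time, one isolated.
    coords_km = np.array([[0.0, 0.0], [1.0, 0.0], [200.0, 0.0]])
    days = np.array([0.0, 2.0, 0.0])
    p = np.array([0.5, 0.5, 0.6])  # made-up credibilities from a classifier
    order, z = rank_reports(influence_matrix(coords_km, days), p)
    print(order, z)
```

Under these assumptions, two nearby reports of moderate credibility reinforce each other and can outrank an isolated report with a slightly higher individual credibility, which is the corroboration effect the priority score is meant to capture.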
