Priority prediction of Asian Hornet sighting report using machine learning methods

Yixin Liu, Jiaxin Guo, Jieyang Dong, Luoqian Jiang, Haoyuan Ouyang

TL;DR

This paper addresses prioritizing public sighting reports of Vespa mandarinia to enable targeted nest verification and destruction. It formulates a binary credibility prediction problem where $p^{(i)}=f(x^{(i)};\theta)$ is learned by a logistic regression with a weighted cross-entropy loss to handle extreme class imbalance. It extracts multimodal features from location, time, image, and text, building a feature vector $\mathbf{s}$ fed to the classifier. It then computes report priority via mutual influence $F_{i,j}$ and $Z_i=\sum_j F_{i,j} p_j$, enabling ranking that favors corroborated sightings. Experiments on the WSDA dataset from 2019–2021 show improved accuracy and prioritization performance, validating the method's practical value for pest-control resource allocation.
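The credibility model described above can be illustrated with a minimal sketch, assuming a plain NumPy gradient-descent fit. The names `w_pos`, `fit_logistic`, and the synthetic data are hypothetical and not taken from the paper; in particular, the paper's exact class-weighting scheme is not specified in this summary, so weighting positives by the negative-to-positive ratio is only one common choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weighted_cross_entropy(p, y, w_pos, w_neg=1.0, eps=1e-12):
    """Cross-entropy where positive samples are up-weighted by w_pos."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(w_pos * y * np.log(p) + w_neg * (1 - y) * np.log(1 - p))

def fit_logistic(X, y, w_pos, lr=0.1, steps=2000):
    """Gradient descent on the weighted loss; returns (theta, bias)."""
    n, d = X.shape
    theta, bias = np.zeros(d), 0.0
    weights = np.where(y == 1, w_pos, 1.0)
    for _ in range(steps):
        p = sigmoid(X @ theta + bias)
        grad = weights * (p - y)          # d(loss)/d(logit) per sample
        theta -= lr * (X.T @ grad) / n
        bias -= lr * grad.mean()
    return theta, bias

# Synthetic, heavily imbalanced data (assumption: 10 positives in 200 reports).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.zeros(200)
y[:10] = 1
X[:10] += 2.0                             # shift positives so they are separable

w_pos = (y == 0).sum() / (y == 1).sum()   # hypothetical weighting: n_neg / n_pos
initial_loss = weighted_cross_entropy(np.full(200, 0.5), y, w_pos)
theta, bias = fit_logistic(X, y, w_pos)
p = sigmoid(X @ theta + bias)
final_loss = weighted_cross_entropy(p, y, w_pos)
```

Without the weight, a classifier on data this imbalanced can reach high accuracy by predicting every report as negative; up-weighting the rare positives makes missing them costly.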

Abstract

As an infamous invader of the North American ecosystem, the Asian giant hornet (Vespa mandarinia) is devastating not only to native bee colonies but also to local apiculture. One of the most effective ways to combat this harmful species is to locate and destroy its nests. By mobilizing the public to actively report possible sightings of the Asian giant hornet, the government could promptly send inspectors to confirm and possibly destroy the nests. However, such confirmation requires lab expertise, and manually checking the reports one by one consumes extensive human resources. Furthermore, given the public's limited knowledge of the Asian giant hornet and the randomness of report submission, only a few of the numerous reports prove positive, i.e. correspond to existing nests. How to classify or prioritize the reports efficiently and automatically, so as to determine the dispatch of personnel, is of great significance to the control of the Asian giant hornet. In this paper, we propose a machine-learning method to predict the priority of sighting reports. We model the optimal prioritization of sighting reports as a classification and prediction problem. We extract a variety of rich features from each report: location, time, image(s), and textual description. Based on these features, we propose a classification model based on logistic regression to predict the credibility of a given report. Furthermore, our model quantifies the influence between reports to obtain their priority ranking. Extensive experiments on the public dataset from the WSDA (the Washington State Department of Agriculture) demonstrate the effectiveness of our method.
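The priority score from the TL;DR, $Z_i=\sum_j F_{i,j} p_j$, can be sketched as a matrix-vector product. The definition of the mutual influence $F_{i,j}$ is not given in this summary, so the Gaussian kernel over report coordinates below, the `length_scale` parameter, and the inclusion of the diagonal term $j=i$ are all assumptions made purely for illustration.

```python
import numpy as np

def influence_matrix(coords, length_scale=1.0):
    """Hypothetical F[i, j]: Gaussian kernel on pairwise distance (an assumption)."""
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))

def priority_scores(F, p):
    """Z_i = sum_j F[i, j] * p[j], including the self term j = i."""
    return F @ p

# Three made-up reports with equal credibility: two close together, one isolated.
coords = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
p = np.array([0.6, 0.6, 0.6])

F = influence_matrix(coords)
Z = priority_scores(F, p)
ranking = np.argsort(-Z)   # report indices, highest priority first
```

With equal credibilities, the two neighbouring reports reinforce each other and outrank the isolated one, which is the corroboration effect the ranking is meant to capture.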

Paper Structure

This paper contains 10 sections, 12 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Given a sighting report, we first extract four features: locational, temporal, graphical, and textual. After constructing the feature vector of the report, we use a logistic regression model to predict and classify the credibility of the report. Finally, we quantitatively consider the interaction between the reports and determine the priority of the reports.
  • Figure 2: Cross-validation accuracy on report samples with imbalanced statuses under different settings of the scaling parameter $\tau$.
  • Figure 3: The priority rankings of the $7$ positive samples in the validation set under different ranking models. For these positive reports, the higher the priority predicted by the model, the better. Note that we use the sort position of a report among the $14$ reports in the testing set to indicate its priority.