Predicting BVD Re-emergence in Irish Cattle From Highly Imbalanced Herd-Level Data Using Machine Learning Algorithms

Niamh Mimnagh; Andrew Parnell; Conor McAloon; Jaden Carlson; Maria Guelbenzu; Jonas Brock; Damien Barrett; Guy McGrath; Jamie Tratalos; Rafael Moral

TL;DR

This study addresses the risk of BVD re-emergence in Ireland, where eradication is now well advanced, by evaluating a broad set of machine learning approaches on highly imbalanced herd-level data. It compares binary classification methods (GLMs, regularised regression, tree-based models, SVM) and anomaly detectors (LOF, ABOF, Mahalanobis, MCD, Isolation Forest, Autoencoders) under varying sample sizes and imbalance ratios, incorporating resampling and class weighting. Across simulations and real data (2013–2023), Random Forest and XGBoost emerge as the top performers. Random Forest achieves the highest sensitivity and AUC and correctly identifies 219 of 250 positive herds in 2023, while roughly halving the number of herds tested relative to blanket testing. The findings support targeted surveillance strategies that balance detection of re-emergence against practical testing burdens, and they illustrate both the value and the limitations of imbalanced-data ML approaches for livestock disease monitoring.

Abstract

Bovine Viral Diarrhoea (BVD) has been the focus of a successful eradication programme in Ireland, with the herd-level prevalence declining from 11.3% in 2013 to just 0.2% in 2023. As the country moves toward BVD freedom, the development of predictive models for targeted surveillance becomes increasingly important to mitigate the risk of disease re-emergence. In this study, we evaluate the performance of a range of machine learning algorithms, including binary classification and anomaly detection techniques, for predicting BVD-positive herds using highly imbalanced herd-level data. We conduct an extensive simulation study to assess model performance across varying sample sizes and class imbalance ratios, incorporating resampling, class weighting, and appropriate evaluation metrics (sensitivity, positive predictive value, F1-score and AUC). Random forest and XGBoost models consistently outperformed other methods, with the random forest model achieving the highest sensitivity and AUC across scenarios, including real-world prediction of 2023 herd status, where it correctly identified 219 of 250 positive herds while halving the number of herds that require testing compared to a blanket-testing strategy.
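
To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how sensitivity, positive predictive value (PPV) and F1-score can be computed from binary herd labels. It is not the authors' code. The function name `imbalance_metrics` and the toy label vectors are hypothetical and chosen only for illustration; the only figure taken from the abstract is the 219 of 250 positive herds detected in 2023.

```python
def imbalance_metrics(y_true, y_pred):
    """Return sensitivity, PPV and F1 for binary labels (1 = positive herd).

    Hypothetical helper for illustration; not taken from the paper.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    denom = ppv + sensitivity
    f1 = 2 * ppv * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "f1": f1}


# Sensitivity implied by the abstract: 219 of 250 positive herds identified.
reported_sensitivity = 219 / 250

# Hypothetical toy data: 2 positive herds among 10, one detected, one false alarm.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
toy = imbalance_metrics(y_true, y_pred)

print(f"reported sensitivity: {reported_sensitivity:.3f}")
print(toy)
```

In the toy data, 8 of 10 herds are classified correctly, yet only half of the positive herds are found. This is why metrics focused on the positive class are preferred over raw accuracy when positives are rare.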
