Machine learning augmented diagnostic testing to identify sources of variability in test performance
Christopher J. Banks, Aeron Sanchez, Vicki Stewart, Kate Bowen, Thomas Doherty, Oliver Tearne, Graham Smith, Rowland R. Kao
TL;DR
This study addresses limitations of the single intradermal comparative cervical tuberculin (SICCT) test for bovine tuberculosis by augmenting its interpretation with herd-level epidemiological risk factors, using a Histogram-based Gradient Boosted Tree. The model, trained on 1.3 million tests with temporal cross-validation, achieves an AUROC of 0.90 and increases herd-level sensitivity from 63.9% to 69.0% while maintaining 90.3% specificity, corresponding to earlier detection of 240 outbreaks in 2020. SHAP analysis identifies key risk factors (e.g., herd size, movements, prior breakdown, location) and shows that the importance of individual factors can vary over time. Simulation in two regions indicates potential regional benefits and trade-offs, supporting the case for field deployment subject to regulatory considerations and targeted testing policies.
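The temporal cross-validation mentioned above can be illustrated with a minimal sketch. The source does not specify the fold scheme, so the expanding-window design here (train on all earlier years, test on the next year) is an assumption, and the function name `temporal_splits`, its parameter `min_train_years`, and the toy `years` list are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def temporal_splits(
    years: Sequence[int], min_train_years: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for an expanding-window split.

    Each fold tests on one calendar year and trains only on records from
    strictly earlier years, so no future information leaks into training.
    This scheme is an assumption, not the paper's documented procedure.
    """
    distinct = sorted(set(years))
    for k in range(min_train_years, len(distinct)):
        test_year = distinct[k]
        train_idx = [i for i, y in enumerate(years) if y < test_year]
        test_idx = [i for i, y in enumerate(years) if y == test_year]
        yield train_idx, test_idx


# Hypothetical test years for six herd-test records (not real data).
years = [2017, 2017, 2018, 2018, 2019, 2020]
folds = list(temporal_splits(years))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Splitting by time rather than at random matters here because risk factors and their importance may drift between years; a random split would let the model see later records while being evaluated on earlier ones.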
Abstract
Diagnostic tests that can detect pre-clinical or sub-clinical infection are among the most powerful tools available for controlling infectious diseases. Considerable effort has been devoted to improving diagnostic testing for human, plant and animal diseases, including strategies for targeting diagnostic tests towards individuals who are more likely to be infected. We use machine learning to assess the surrounding risk landscape under which a diagnostic test is applied, in order to augment its interpretation. We apply this approach to predicting bovine tuberculosis incidents in cattle herds, exploiting the availability of exceptionally detailed testing records. We show that, without compromising test specificity, test sensitivity can be improved so that the proportion of infected herds detected increases by over 5 percentage points, equivalent to 240 additional infected herds detected in one year beyond those detected by the skin test alone. We also use feature importance analysis to assess the weighting of risk factors. While many factors are associated with increased risk of incidents, several notably suggest that in some herds there is a higher risk of infection going undetected.
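The idea of raising sensitivity without compromising specificity can be sketched as a threshold search over model risk scores. The source does not state how the operating point was chosen, so the procedure below is only an illustrative assumption; the helper names `sens_spec` and `lowest_threshold`, and the toy `truth` and `scores` values, are hypothetical and not taken from the study.

```python
from typing import List, Optional, Tuple


def sens_spec(truth: List[bool], predicted: List[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); both classes must be present in truth."""
    tp = sum(t and p for t, p in zip(truth, predicted))
    fn = sum(t and not p for t, p in zip(truth, predicted))
    tn = sum((not t) and (not p) for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    return tp / (tp + fn), tn / (tn + fp)


def lowest_threshold(
    truth: List[bool], scores: List[float], min_specificity: float
) -> Optional[float]:
    """Find the lowest score threshold that keeps specificity >= min_specificity.

    Lowering the threshold can only add positive calls, so specificity never
    increases as we go down; the search stops at the first violation.
    """
    best = None
    for thr in sorted(set(scores), reverse=True):
        predicted = [s >= thr for s in scores]
        _, spec = sens_spec(truth, predicted)
        if spec < min_specificity:
            break
        best = thr
    return best


# Toy herds (not real data): True means the herd is truly infected.
truth = [True, True, True, False, False, False, False]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

thr = lowest_threshold(truth, scores, min_specificity=0.75)
sens, spec = sens_spec(truth, [s >= thr for s in scores])
print(thr, sens, spec)
```

Fixing a specificity floor first and then maximising sensitivity mirrors the constraint described in the abstract: any gain in detected herds must not come at the cost of more false positives than the existing test produces.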
