Machine learning augmented diagnostic testing to identify sources of variability in test performance

Christopher J. Banks, Aeron Sanchez, Vicki Stewart, Kate Bowen, Thomas Doherty, Oliver Tearne, Graham Smith, Rowland R. Kao

TL;DR

This study addresses limitations of the SICCT bovine tuberculosis test by augmenting its interpretation with herd-level epidemiological risk factors using a Histogram-based Gradient Boosted Tree. The model, trained on 1.3 million tests with temporal cross-validation, achieves AUROC 0.90 and increases herd-level sensitivity from 63.9% to 69.0% while maintaining 90.3% specificity, corresponding to earlier detection of 240 outbreaks in 2020. SHAP analysis identifies key risk factors (e.g., herd size, movements, prior breakdown, location) and shows that the importance of factors can vary with time. Simulation in two regions indicates potential regional benefits and trade-offs, supporting the case for field deployment with regulatory considerations and targeted testing policies.
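The summary mentions temporal cross-validation but does not spell out the scheme. As a minimal sketch, assuming an expanding-window split by calendar year (an assumption, not something the source confirms), each fold could test on one year and train only on earlier years, so that no future information leaks into training. The function name `temporal_folds` and the per-test `years` list are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def temporal_folds(years: Sequence[int]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) index lists, one fold per test year.

    Assumption for illustration: an expanding window, where every fold
    trains on all records from years strictly before the test year.
    The earliest year is never used as a test year, since it would
    have no training data.
    """
    distinct_years = sorted(set(years))
    for test_year in distinct_years[1:]:
        train_idx = [i for i, y in enumerate(years) if y < test_year]
        test_idx = [i for i, y in enumerate(years) if y == test_year]
        yield train_idx, test_idx


# Hypothetical example: the year in which each herd test took place.
example_years = [2018, 2018, 2019, 2020, 2019]
example_folds = list(temporal_folds(example_years))
```

Each pair of index lists can then be used to fit a model on the training rows and score it on the held-out year.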

Abstract

Diagnostic tests that can detect pre-clinical or sub-clinical infection are one of the most powerful tools in our armoury of weapons to control infectious diseases. Considerable effort has been devoted to improving diagnostic testing for human, plant and animal diseases, including strategies for targeting the use of diagnostic tests towards individuals who are more likely to be infected. We use machine learning to assess the surrounding risk landscape under which a diagnostic test is applied, in order to augment its interpretation. We develop this approach to predict the occurrence of bovine tuberculosis incidents in cattle herds, exploiting the availability of exceptionally detailed testing records. We show that, without compromising test specificity, test sensitivity can be improved so that the proportion of infected herds detected improves by over 5 percentage points, or 240 additional infected herds detected in one year beyond those detected by the skin test alone. We also use feature importance testing to assess the weighting of risk factors. While many factors are associated with increased risk of incidents, of note are several factors that suggest that in some herds there is a higher risk of infection going undetected.
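The idea of improving sensitivity without compromising specificity can be illustrated with a small threshold search. This is a sketch under stated assumptions, not the authors' implementation: `scores` are hypothetical model outputs per herd test, `labels` are 1 for a confirmed breakdown and 0 otherwise, and `min_specificity` stands in for the herd-level specificity of the skin test alone.

```python
def herd_se_sp(scores, labels, threshold):
    """Return (sensitivity, specificity) when score >= threshold is called positive.

    Assumes labels contain at least one positive (1) and one negative (0).
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def choose_threshold(scores, labels, min_specificity):
    """Pick the threshold with the highest sensitivity among those whose
    specificity is at least min_specificity.

    Candidate thresholds are the observed scores. Returns a tuple
    (threshold, sensitivity, specificity), or None if no candidate
    meets the specificity constraint. Ties keep the lowest threshold.
    """
    best = None
    for t in sorted(set(scores)):
        se, sp = herd_se_sp(scores, labels, t)
        if sp >= min_specificity and (best is None or se > best[1]):
            best = (t, se, sp)
    return best


# Hypothetical toy data, not taken from the paper.
toy_scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 1, 0, 0, 1, 1, 1]
toy_choice = choose_threshold(toy_scores, toy_labels, min_specificity=0.75)
```

Fixing the specificity floor first and then maximising sensitivity keeps the false-positive burden no worse than the baseline test, which is the constraint the abstract describes.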

Paper Structure (1 section, 3 figures, 3 tables)

This paper contains 1 section, 3 figures, 3 tables.

Figures (3)

  • Figure 1: (a) Receiver operating characteristic (ROC) curve for the diagnostic model. Performance is consistently better than SICCT testing alone for all decision thresholds. (b) The decision threshold choice, such that the herd-level specificity (HSp) is maintained at the level of the SICCT test and the herd-level sensitivity (HSe) is maximised.
  • Figure 2: (a) Proportion (%) of herds by area that had a negative SICCT test result, but were correctly predicted by the diagnostic model to have a confirmed breakdown, over the year 2020. (b) Proportion of herd tests by area that were misclassified by the model in the year 2020.
  • Figure 3: The relative importance of model features (risk factors), as tested by SHAP importance testing, with a random control variable. Only features whose absolute SHAP values are significantly greater than those of the random feature (Mann-Whitney U test, $p<0.01$) are shown. Features marked * refer to previous tests or breakdowns and will be left-censored where the previous test or breakdown is before the first date in the dataset.
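The filtering step described in the Figure 3 caption, keeping only features whose absolute SHAP values are significantly greater than those of a random control feature, can be sketched as follows. This assumes a matrix of per-sample SHAP values has already been computed elsewhere; the function name, the feature names and the synthetic data are all hypothetical, and a one-sided test is assumed because the caption says "greater".

```python
import numpy as np
from scipy.stats import mannwhitneyu


def features_beating_control(shap_values, feature_names, control_name, alpha=0.01):
    """Return the features whose |SHAP| values are significantly greater
    than those of the random control feature.

    shap_values: array of shape (n_samples, n_features), assumed precomputed.
    Uses a one-sided Mann-Whitney U test per feature (an assumption).
    """
    abs_vals = np.abs(np.asarray(shap_values, dtype=float))
    control = abs_vals[:, feature_names.index(control_name)]
    kept = []
    for j, name in enumerate(feature_names):
        if name == control_name:
            continue
        _, p_value = mannwhitneyu(abs_vals[:, j], control, alternative="greater")
        if p_value < alpha:
            kept.append(name)
    return kept


# Synthetic illustration only: one influential feature, one negligible
# feature, and a random control column. None of these values come from the paper.
rng = np.random.default_rng(0)
n_samples = 500
names = ["herd_size", "negligible_feature", "random_control"]
synthetic_shap = np.column_stack([
    rng.normal(0.0, 1.0, n_samples),    # large attributions
    rng.normal(0.0, 0.001, n_samples),  # attributions smaller than the control
    rng.normal(0.0, 0.01, n_samples),   # control feature
])
significant = features_beating_control(synthetic_shap, names, "random_control")
```

Comparing against a random control gives a data-driven floor for importance, so features that contribute no more than noise are excluded from the plot.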