Predictive Modeling and Explainable AI for Veterinary Safety Profiles, Residue Assessment, and Health Outcomes Using Real-World Data and Physicochemical Properties

Hossein Sholehrasa, Xuan Xu, Doina Caragea, Jim E. Riviere, Majid Jaberi-Douraki

TL;DR

This work tackles the problem of predicting veterinary safety outcomes (Death vs. Recovery) from a large real-world OpenFDA CVM dataset by integrating VeDDRA ontology mappings and PubChem-derived physicochemical properties. It employs a multi-model pipeline that combines traditional ML, ensemble methods, transformer-based DL, and semi-supervised learning (SSL) with AUM-based pseudo-labeling, augmented by SHAP for interpretability. The results show that ensemble methods such as CatBoost, XGBoost, and Stacking achieve high predictive performance (F1 ≈ 0.94–0.95) and strong recall for fatal outcomes, while SSL improves minority-class detection; LLMs underperform without domain-specific fine-tuning. The study demonstrates a scalable, transparent framework for proactive veterinary pharmacovigilance and residue risk assessment, with practical impact for regulatory decisions, prescribing practices, and public health protection; code and configurations will be released on GitHub.

Abstract

The safe use of pharmaceuticals in food-producing animals is vital to protect animal welfare and human food safety. Adverse events (AEs) may signal unexpected pharmacokinetic or toxicokinetic effects, increasing the risk of violative residues in the food chain. This study introduces a predictive framework for classifying outcomes (Death vs. Recovery) using ~1.28 million reports (1987–2025 Q1) from the U.S. FDA's OpenFDA Center for Veterinary Medicine. A preprocessing pipeline merged relational tables and standardized AEs through VeDDRA ontologies. Data were normalized, missing values imputed, and high-cardinality features reduced; physicochemical drug properties were integrated to capture chemical-residue links. We evaluated supervised models, including Random Forest, CatBoost, XGBoost, ExcelFormer, and large language models (Gemma 3-27B, Phi 3-12B). Class imbalance was addressed with resampling techniques such as undersampling and oversampling, with a focus on prioritizing recall for fatal outcomes. Ensemble methods (Voting, Stacking) and CatBoost performed best, achieving precision, recall, and F1-scores of 0.95. Incorporating Average Uncertainty Margin (AUM)-based pseudo-labeling of uncertain cases improved minority-class detection, particularly in ExcelFormer and XGBoost. Interpretability via SHAP identified biologically plausible predictors, including lung, heart, and bronchial disorders, animal demographics, and drug physicochemical properties. These features were strongly linked to fatal outcomes. Overall, the framework shows that combining rigorous data engineering, advanced machine learning, and explainable AI enables accurate, interpretable predictions of veterinary safety outcomes. The approach supports the mission of the Food Animal Residue Avoidance Databank (FARAD) by enabling early detection of high-risk drug-event profiles, strengthening residue risk assessment, and informing regulatory and clinical decision-making.

Paper Structure

This paper contains 23 sections, 1 equation, 2 figures, 1 table.

Figures (2)

  • Figure 1: Overview of the SSL pipeline, illustrated with an AUM-based pseudo-labeling workflow. The model is trained on labeled data and assigns pseudo-labels to unlabeled cases; high-confidence predictions are then merged with the labeled set for retraining.
  • Figure 2: Comprehensive SHAP visualizations: (a–c) summary plots of the top 15 features for companion animals, livestock, and poultry; (d–f) top and bottom SHAP mean values for AE terms; (g–i) top and bottom SHAP mean values for active ingredients across animal groups.
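
The workflow in the Figure 1 caption (train on labeled data, pseudo-label unlabeled cases, merge the confident ones, retrain) can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the classifier is a hand-written logistic regression standing in for any model, the confidence score is a simple margin |P(1) − P(0)| from a single model rather than the paper's AUM computation, and the names `fit_logistic`, `predict_proba`, `pseudo_label_round`, and `margin_threshold`, as well as the synthetic data, are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Gradient-descent logistic regression; a stand-in for any classifier."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))          # predicted P(class 1)
        w -= lr * Xb.T @ (p - y) / len(y)          # mean log-loss gradient step
    return w

def predict_proba(w, X):
    """Return P(class 1) for each row of X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def pseudo_label_round(X_lab, y_lab, X_unlab, margin_threshold=0.8):
    """One round: train, pseudo-label, keep confident cases, retrain."""
    w = fit_logistic(X_lab, y_lab)                 # 1. train on labeled data
    p1 = predict_proba(w, X_unlab)                 # 2. score unlabeled cases
    margin = np.abs(2.0 * p1 - 1.0)                # |P(1) - P(0)|, assumed score
    keep = margin >= margin_threshold              # 3. high-confidence mask
    pseudo = (p1[keep] >= 0.5).astype(float)       #    their pseudo-labels
    X_new = np.vstack([X_lab, X_unlab[keep]])      # 4. merge with labeled set
    y_new = np.concatenate([y_lab, pseudo])
    return fit_logistic(X_new, y_new), keep        # 5. retrain on merged set

# Synthetic two-class data, purely illustrative (not the OpenFDA data).
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y_lab = np.array([0.0] * 20 + [1.0] * 20)
X_unlab = rng.normal(0, 2, (100, 2))

w, keep = pseudo_label_round(X_lab, y_lab, X_unlab)
print(f"merged {int(keep.sum())} of {len(keep)} unlabeled cases")
```

Raising `margin_threshold` admits fewer but cleaner pseudo-labels, while lowering it grows the training set at the risk of reinforcing early mistakes; in practice the round would typically be repeated until few new cases pass the threshold.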