VA-Calibration: Correcting for Algorithmic Misclassification in Estimating Cause Distributions

Sandipan Pramanik; Emily B. Wilson; Henry D. Kalter; Agbessi Amouzou; Robert E. Black; Li Liu; Jamie Perin; Abhirup Datta

Abstract

Accurate estimation of cause-specific mortality fractions (CSMFs), the percentage of deaths attributable to each cause in a population, is essential for global health monitoring. A challenge arises because computer-coded verbal autopsy (CCVA) algorithms, commonly used to estimate CSMFs, frequently misclassify the cause of death (COD). This misclassification is further complicated by structured patterns and substantial variation across countries. To address this, we introduce the R package 'vacalibration'. It implements a modular Bayesian framework that corrects for the misclassification, thereby yielding more accurate CSMF estimates from verbal autopsy (VA) questionnaire data. The package uses uncertainty-quantified CCVA misclassification matrix estimates derived from data collected in the CHAMPS project and available in the 'CCVA-Misclassification-Matrices' GitHub repository. Currently, these matrices cover three CCVA algorithms (EAVA, InSilicoVA, and InterVA) and two age groups (neonates aged 0-27 days, and children aged 1-59 months) across countries (specific estimates for Bangladesh, Ethiopia, Kenya, Mali, Mozambique, Sierra Leone, and South Africa, and a combined estimate for all other countries), enabling global calibration. The 'vacalibration' package also supports ensemble calibration when multiple algorithms are available. Implemented using 'RStan', the package offers rapid computation, uncertainty quantification, and seamless compatibility with openVA, a leading COD analysis software ecosystem. We demonstrate the package's flexibility with two real-world applications in COMSA-Mozambique and CA CODE. The package and its underlying methodology apply more broadly and can calibrate any discrete classifier or an ensemble of such classifiers.
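To make the idea of misclassification correction concrete, the sketch below shows a simplified point-estimate version of the problem. It assumes a known misclassification matrix M, where M[i][j] is the probability that a death from true cause i is assigned cause j by the algorithm, so the distribution of algorithm-assigned causes is q = M^T p for a true CSMF vector p. It is not the package's method: 'vacalibration' is an R package that fits a Bayesian model with uncertainty-quantified misclassification matrices, whereas this Python sketch uses a plain EM update with a fixed, made-up 3-cause matrix. The function names `predicted_distribution` and `calibrate_csmf` are hypothetical and do not come from the package.

```python
def predicted_distribution(p, M):
    """Distribution of algorithm-assigned causes implied by true CSMF p.

    M[i][j] = P(algorithm assigns cause j | true cause i); rows sum to 1.
    """
    n_pred = len(M[0])
    return [sum(p[i] * M[i][j] for i in range(len(p))) for j in range(n_pred)]


def calibrate_csmf(q, M, n_iter=5000):
    """Illustrative EM point estimate of the true CSMF (assumption: M is known).

    q is the uncalibrated CSMF (fractions of algorithm-assigned causes).
    Each iteration reweights p by how well each true cause explains q.
    """
    k = len(M)
    p = [1.0 / k] * k  # start from a uniform CSMF
    for _ in range(n_iter):
        mix = predicted_distribution(p, M)
        p = [
            p[i] * sum(q[j] * M[i][j] / mix[j] for j in range(len(q)) if mix[j] > 0)
            for i in range(k)
        ]
    return p


# Made-up misclassification matrix for three causes (illustration only).
M = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
]
true_p = [0.5, 0.2, 0.3]

# What an uncorrected analysis would report as the CSMF.
uncalibrated = predicted_distribution(true_p, M)

# Calibration recovers a CSMF close to true_p from the uncalibrated one.
calibrated = calibrate_csmf(uncalibrated, M)
print("uncalibrated:", [round(x, 3) for x in uncalibrated])
print("calibrated:  ", [round(x, 3) for x in calibrated])
```

In this toy setting the uncalibrated fractions differ from the true CSMF because of the off-diagonal entries of M, and inverting the relationship corrects that bias. In practice the matrix is itself estimated with uncertainty, which is why the package treats it as an informative prior rather than a fixed quantity.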

Paper Structure (38 sections, 7 equations, 5 figures, 1 table)

Introduction
Example dataset: COMSA-Mozambique verbal autopsy records
Characterizing algorithmic misclassification using CHAMPS data
Method
Modeling country-specific misclassification matrices
Base model: Parsimonious homogeneous modeling of misclassification matrices
Intrinsic accuracy (diagonal effect).
Systematic preference or pull (column effect).
General homogeneous misclassification matrices
Country-specific misclassification matrices
Misclassification for unobserved CHAMPS causes and countries
Modular VA-Calibration with country-specific misclassification matrix
Key objects and functions in the package
CCVA_missmat: CCVA misclassification matrices based on CHAMPS
cause_map: CCVA output to broad cause
...and 23 more sections

Figures (5)

Figure 1: Example VA survey dataset (records) in the standard WHO 2016 VA questionnaire format, provided as "who151_odk_export.csv" in the CrossVA package.
Figure 2: The VA survey data (records) in Figure 1, mapped using odk2EAVA for input to the EAVA algorithm.
Figure 3: The VA survey data (records) in Figure 1, mapped using odk2openVA_v151 for input to algorithms in 'openVA' (e.g., InSilicoVA, InterVA).
Figure 4: The top panel displays the average CHAMPS-based misclassification matrix estimates for each algorithm as stored. The middle panel shows the average misclassification matrix after correction. This is used for calibration in an uncertainty-quantified informative prior. The bottom panel compares the uncalibrated CSMF estimates with the corresponding uncertainty-quantified calibrated CSMF estimates. Grey rows and columns in the misclassification matrix indicate causes that are not calibrated.
Figure 5: Comparison of uncalibrated and calibrated estimates of cause-specific mortality fractions (CSMFs) across five studies in the CA CODE example, as presented in Table 1.
