SEANN: A Domain-Informed Neural Network for Epidemiological Insights

Jean-Baptiste Guimbaud, Marc Plantevit, Léa Maître, Rémy Cazabet

TL;DR

This paper presents SEANN, a domain-informed neural network that integrates Pooled Effect Sizes (PES) from meta-analyses as soft, perturbation-based constraints to improve learning in epidemiology, where data are scarce and noisy. By encoding three forms of PES (Standardized Regression Coefficients, Odds Ratios, and Risk Ratios) into a unified loss, SEANN promotes generalizable predictions and interpretable, literature-consistent relationships. In synthetic experiments, SEANN generalizes better under noise, recovers clearer dose–response patterns, and remains robust to missing confounders, outperforming a domain-agnostic DNN. The work advances trustworthy, exposome-aware deep learning and suggests extensions to large-scale epidemiological scoring tasks such as HELIX exposome analyses.

Abstract

In epidemiology, traditional statistical methods such as logistic regression, linear regression, and other parametric models are commonly employed to investigate associations between predictors and health outcomes. However, non-parametric machine learning techniques, such as deep neural networks (DNNs), coupled with explainable AI (XAI) tools, offer new opportunities for this task. Despite their potential, these methods are hampered by the limited availability of large, high-quality datasets in this field. To address these challenges, we introduce SEANN, a novel approach for informed DNNs that leverages a prevalent form of domain-specific knowledge: Pooled Effect Sizes (PES). PESs are reported in published meta-analyses in several forms and represent a quantitative form of scientific consensus. By integrating PESs directly into the learning procedure through a custom loss, we experimentally demonstrate significant improvements in the generalizability of predictive performance and in the scientific plausibility of extracted relationships, compared with a domain-knowledge-agnostic neural network, in a scarce and noisy data setting.
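
To make the idea of a custom loss informed by a pooled effect size more concrete, here is a minimal sketch of how an odds ratio could be turned into a soft penalty. It is not the authors' formulation, which this summary does not reproduce: the penalty form, the names `or_penalty` and `informed_loss`, the perturbation size `delta`, and the weight `lam` are all illustrative assumptions, and a toy logistic model stands in for the network.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Toy stand-in for a network: logistic model returning P(y=1 | x)."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def odds(p):
    return p / (1.0 - p)


def or_penalty(predict, x, feature, pooled_or, delta=1.0):
    """Squared gap between the log odds ratio the model implies for a
    `delta`-unit increase in `feature` and the pooled log odds ratio.
    (Assumed penalty form, chosen for illustration.)"""
    x_shift = list(x)
    x_shift[feature] += delta  # perturb one input, leave the others fixed
    implied_log_or = math.log(odds(predict(x_shift))) - math.log(odds(predict(x)))
    return (implied_log_or - delta * math.log(pooled_or)) ** 2


def bce(p, y):
    """Binary cross-entropy for one prediction."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def informed_loss(predict, data, feature, pooled_or, lam=1.0):
    """Mean predictive loss plus `lam` times the mean perturbation penalty."""
    n = len(data)
    fit = sum(bce(predict(x), y) for x, y in data) / n
    pen = sum(or_penalty(predict, x, feature, pooled_or) for x, _ in data) / n
    return fit + lam * pen
```

Because the penalty is added to the predictive loss rather than enforced exactly, the model can deviate from the pooled value when the data strongly disagree; `lam` controls how costly that deviation is. For a logistic model the implied log odds ratio for a one-unit shift equals the feature's weight, so the penalty vanishes when that weight matches the log of the pooled odds ratio.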

Paper Structure

This paper contains 16 sections, 16 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Performance comparison of SEANN and agnostic DNN with different noise levels in input features (experiment 1). The X-axis is the standard deviation of added noise.
  • Figure 2: Comparison of extracted relationships (Shapley values) between the agnostic DNN and SEANN (experiment 2). Only the beta coefficient for fish intake was supplied as external knowledge, yet SEANN captures both the mercury and fish intake relationships better than the agnostic DNN.
  • Figure 3: Comparison of extracted relationships (Shapley values) between the agnostic DNN and SEANN for the linear case (experiment 3). Without external knowledge, the extracted mercury effect runs opposite to the ground truth. With the constraint added, the duplicated variable captures the confounder, i.e., fish intake.
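
Figures 2 and 3 report extracted relationships as Shapley values. As a reminder of what that quantity measures, the sketch below computes exact Shapley values for a small model by averaging each feature's marginal contribution over all feature orderings. It is an illustration only: the paper's attribution tooling is not described in this summary, replacing absent features with a fixed `baseline` is an assumed convention, and the coefficients in `linear_model` are hypothetical, not results from the paper.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution of each feature
    over all orderings. Features not yet added keep their baseline value."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for j in order:
            current[j] = x[j]          # switch feature j from baseline to x
            value = model(current)
            phi[j] += value - prev     # marginal contribution in this ordering
            prev = value
    return [p / len(orderings) for p in phi]


def linear_model(x):
    # Hypothetical coefficients, chosen only for illustration.
    fish_intake, mercury = x
    return 0.8 * fish_intake - 0.5 * mercury


phi = shapley_values(linear_model, [2.0, 1.0], [0.0, 0.0])
```

For a linear model each value reduces to the coefficient times the feature's distance from the baseline, and the values sum to the difference between the prediction at `x` and at the baseline. Enumerating all orderings is only practical for a handful of features.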