Temporal Positive-unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation

Uchenna Akujuobi, Jun Chen, Mohamed Elhoseiny, Michael Spranger, Xiangliang Zhang

TL;DR

The paper addresses hypothesis generation (HG) in biomedicine by predicting future term-term connections on temporal, attributed graphs under a positive-unlabeled (PU) learning framework. It introduces the Temporal Relationship Predictor (TRP), a GRU- and GraphSAGE-based architecture that learns node-pair embeddings and combines an unbiased PU risk with variational prior estimation to predict new links over time. Key contributions include (i) applying PU learning to temporal graphs for biomedical HG, (ii) estimating the positive prior via deep variational inference, and (iii) evaluating on three real-world biomedical graphs, complemented by qualitative case studies. Empirical results show that TRP-UPU outperforms positive-negative (PN) and state-of-the-art PU baselines, improves as it is incrementally trained on more recent data, and predicts meaningful associations. The approach supports more timely, data-driven discovery, while predicted connections still require careful validation and attention to data quality and interpretability.

Abstract

Understanding the relationships between biomedical terms like viruses, drugs, and symptoms is essential in the fight against diseases. Many attempts have been made to introduce the use of machine learning to the scientific process of hypothesis generation (HG), which refers to the discovery of meaningful implicit connections between biomedical terms. However, most existing methods fail to truly capture the temporal dynamics of scientific term relations and also assume unobserved connections to be irrelevant (i.e., in a positive-negative (PN) learning setting). To break these limits, we formulate this HG problem as a future connectivity prediction task on a dynamic attributed graph via positive-unlabeled (PU) learning. Then, the key is to capture the temporal evolution of node pair (term pair) relations from just the positive and unlabeled data. We propose a variational inference model to estimate the positive prior, and incorporate it in the learning of node pair embeddings, which are then used for link prediction. Experimental results on real-world biomedical term relationship datasets and case study analyses on a COVID-19 dataset validate the effectiveness of the proposed model.
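
To make the PU risk idea concrete, the following is a minimal sketch of the standard unbiased PU risk estimator from the PU learning literature, with an optional non-negative clamp. It is not the paper's implementation: the function names, the choice of sigmoid loss, and the fixed `prior` argument are assumptions for illustration (the paper estimates the positive prior with a variational model rather than passing it in by hand), and the scores stand in for the output of a link predictor on node pairs.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)) for y in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values):
    return sum(values) / len(values)


def unbiased_pu_risk(pos_scores, unl_scores, prior, non_negative=False):
    """Estimate classification risk from positive and unlabeled scores only.

    pos_scores: classifier scores for observed (positive) pairs.
    unl_scores: classifier scores for unobserved (unlabeled) pairs.
    prior:      assumed fraction of positives among unlabeled pairs
                (hypothetical fixed value in this sketch).
    """
    # Cost of scoring known positives as negative, weighted by the prior.
    pos_risk = prior * mean([sigmoid_loss(s, +1) for s in pos_scores])

    # Negative-class risk: treat unlabeled data as negative, then subtract
    # the contribution of the positives hidden inside the unlabeled set.
    neg_risk = (
        mean([sigmoid_loss(s, -1) for s in unl_scores])
        - prior * mean([sigmoid_loss(s, -1) for s in pos_scores])
    )

    # Optional non-negative variant: clamp the negative-class term at zero,
    # since a true risk cannot be negative.
    if non_negative:
        neg_risk = max(0.0, neg_risk)

    return pos_risk + neg_risk


if __name__ == "__main__":
    positives = [2.0, 1.5, 0.3]
    unlabeled = [-1.0, 0.2, -2.5, 1.8]
    print(unbiased_pu_risk(positives, unlabeled, prior=0.3))
    print(unbiased_pu_risk(positives, unlabeled, prior=0.3, non_negative=True))
```

The unclamped estimate can drop below zero when a flexible model overfits the positives, which is the usual motivation for the clamped variant.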

Paper Structure

This paper contains 17 sections, 8 equations, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: The proposed TRP model. Block (a) shows the outer view of the model framework. The inner structure of the recurrent update block and neighborhood aggregation method are shown in block (b) and (c), respectively.
  • Figure 2: Stability comparison of TRP-PN, TRP-NNPU and TRP-UPU, showing the F1-S performance of the models (Y-axis) with different learning rates (X-axis) over 10 epochs.
  • Figure 3: F1-P per year. The models are incrementally trained with data before the evaluation year.
  • Figure 4: Pair embedding visualization. Blue points denote true positive samples, red points denote unobserved negatives, and green points denote unobserved positives.

Theorems & Definitions (2)

  • Definition 1
  • Definition 2