Learning Predictive Checklists with Probabilistic Logic Programming

Yukti Makhija, Edward De Brouwer, Rahul G. Krishnan

TL;DR

This work proposes a novel method for learning predictive checklists from diverse data modalities, such as images and time series, that outperforms various explainable machine learning techniques on prediction tasks involving image sequences, time series, and clinical notes.

Abstract

Checklists have been widely recognized as effective tools for completing complex tasks systematically. Although originally intended for procedural tasks, their interpretability and ease of use have led to their adoption for predictive tasks as well, including in clinical settings. However, designing checklists can be challenging, often requiring expert knowledge and manual rule design based on available data. Recent work has attempted to address this issue by using machine learning to generate predictive checklists automatically from data, although these approaches have been limited to Boolean data. We propose a novel method for learning predictive checklists from diverse data modalities, such as images and time series. Our approach relies on probabilistic logic programming, a learning paradigm that lets us match the discrete nature of checklists to continuous-valued data. We also propose a regularization technique that trades off the information captured by discrete concepts of continuous data against their interpretability, permitting a tunable level of interpretability for the learned checklist concepts. We demonstrate that our method outperforms various explainable machine learning techniques on prediction tasks involving image sequences, time series, and clinical notes.

Paper Structure

This paper contains 64 sections, 1 theorem, 18 equations, 13 figures, 14 tables.

Key Result

Proposition 4.1

The probability of the query $\hat{y}_i = y_i$ in the predictive checklist is given by where $\Sigma_d$ is the set of selection functions $\sigma : [d'] \rightarrow \{0,1\}$ such that $\sum_{j=1}^{d'}\sigma(j) = d$.
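Proposition 4.1 expresses the checklist's output probability as a sum over selection functions $\sigma \in \Sigma_d$. The Python sketch below rests on two assumptions: (i) a hypothetical threshold $M$ such that the checklist predicts positive when at least $M$ of its $d'$ concepts are true (Figure 1's example uses three checks), and (ii) concepts are independent Bernoulli variables with probabilities $p_j$. Under these assumptions it sums the probability of every selection function with at least $M$ ones. The function names are illustrative, not the authors' code.

```python
import math
from itertools import product


def checklist_prob_enumerate(p, M):
    """P(at least M of the d' concepts are true), summing over selection functions.

    Assumes independent concepts and a hypothetical M-of-d' decision rule.
    Each sigma in {0,1}^{d'} with sum(sigma) = d is one element of Sigma_d, and
    its probability is prod_j p_j^sigma(j) * (1 - p_j)^(1 - sigma(j)).
    Cost is O(2^{d'}), so this is only practical for small checklists.
    """
    total = 0.0
    for sigma in product((0, 1), repeat=len(p)):
        if sum(sigma) >= M:  # sigma lies in Sigma_d for some d >= M
            total += math.prod(pj if s else 1.0 - pj for pj, s in zip(p, sigma))
    return total


def checklist_prob_dp(p, M):
    """Same probability in O(d'^2) time via the distribution of the number of true concepts."""
    dist = [1.0]  # dist[k] = P(exactly k of the concepts seen so far are true)
    for pj in p:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1.0 - pj)  # concept j is false
            new[k + 1] += q * pj      # concept j is true
        dist = new
    return sum(dist[M:])


def query_prob(p, M, y):
    """Probability that the checklist prediction equals the label y in {0, 1}."""
    p_pos = checklist_prob_dp(p, M)
    return p_pos if y == 1 else 1.0 - p_pos


# Three concepts, each true with probability 0.5; "at least 2 of 3" has probability 4/8.
print(checklist_prob_enumerate([0.5, 0.5, 0.5], 2))  # 0.5
print(checklist_prob_dp([0.5, 0.5, 0.5], 2))         # 0.5
```

The enumeration mirrors the sum over $\Sigma_d$ directly but grows as $2^{d'}$. The dynamic program is a substitution made here for illustration, not necessarily how the authors compute the quantity. It yields the same tail probability in $O(d'^2)$ time and is a convenient cross-check on small inputs.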

Figures (13)

  • Figure 1: Example checklist learnt by our architecture. Three or more checks entail a positive neoplasm prediction. We identify key tokens in clinical notes that correspond to positive and negative concepts, where each concept is characterized by the presence of positive and absence of negative tokens.
  • Figure 2: Overview of our proposed ProbChecklist. Given $K$ data modalities as input for sample $i$, we train $K$ concept learners to obtain a vector of probabilistic concepts for each modality, $\mathbf{p_i^k} \in [0,1]^{d'_k}$. Next, we concatenate these vectors into the full concept probability vector $\mathbf{p_i}$ for sample $i$. To train the concept learners, we pass $\mathbf{p_i}$ through the probabilistic logic module. At inference time, we discretize $\mathbf{p_i}$ using the threshold parameter $\tau$ to obtain binary concepts $\mathbf{c_i}$, which are used to construct the complete predictive checklist.
  • Figure 3: Learnt checklist for the PhysioNet Sepsis Prediction Task (tabular). Reported performance: accuracy 65.69%, precision 0.527, recall 0.755, and specificity 0.6.
  • Figure 4: Results of ProbChecklist on MNIST Checklist Dataset: (a) Sensitivity Analysis (b) Interpretation of the concepts learnt.
  • Figure 5: Improvement in fairness metrics across gender and ethnicity on MIMIC III for the mortality prediction task after adding fairness regularization. We report $\Delta$FNR and $\Delta$FPR for all pairs of subgroups of sensitive features and the percentage decrease (% $\downarrow$) with respect to the unregularized checklist.
  • ...and 8 more figures
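Figure 2 describes the inference path (concatenate per-modality concept probabilities, threshold at $\tau$, build the checklist). Figures 3 and 5 report standard confusion-matrix metrics and pairwise subgroup gaps. The NumPy sketch below wires these steps together under several assumptions. Concepts are binarized with `p >= tau`, and the checklist predicts positive when at least a hypothetical number $M$ of checks are ticked, following Figure 1's "three or more checks" example. $\Delta$FNR and $\Delta$FPR are taken as absolute differences between subgroup rates. Function names and the toy arrays are hypothetical, not the authors' code or data.

```python
from itertools import combinations

import numpy as np


def checklist_predict(per_modality_probs, tau, M):
    """Hypothetical inference step: concatenate, threshold at tau, apply an M-of-d' rule."""
    p = np.concatenate(per_modality_probs, axis=1)  # full concept probabilities p_i
    c = (p >= tau).astype(int)                       # binary concepts c_i (assumes >= tau)
    y_hat = (c.sum(axis=1) >= M).astype(int)         # positive if at least M checks are ticked
    return c, y_hat


def _ratio(num, den):
    """Safe division that returns NaN when the denominator is zero."""
    return float(num) / float(den) if den else float("nan")


def confusion_metrics(y, y_hat):
    """Accuracy, precision, recall, specificity, FNR and FPR from binary labels."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    tp = np.sum((y == 1) & (y_hat == 1))
    tn = np.sum((y == 0) & (y_hat == 0))
    fp = np.sum((y == 0) & (y_hat == 1))
    fn = np.sum((y == 1) & (y_hat == 0))
    return {
        "accuracy": _ratio(tp + tn, len(y)),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "fnr": _ratio(fn, fn + tp),
        "fpr": _ratio(fp, fp + tn),
    }


def pairwise_rate_gaps(y, y_hat, groups):
    """Absolute FNR and FPR differences for every pair of subgroups."""
    y, y_hat, groups = map(np.asarray, (y, y_hat, groups))
    per_group = {g: confusion_metrics(y[groups == g], y_hat[groups == g])
                 for g in np.unique(groups)}
    return {
        (a, b): (abs(per_group[a]["fnr"] - per_group[b]["fnr"]),
                 abs(per_group[a]["fpr"] - per_group[b]["fpr"]))
        for a, b in combinations(sorted(per_group), 2)
    }


# Toy example: two modalities with 2 and 1 concepts, 4 samples.
p_text = np.array([[0.9, 0.2], [0.1, 0.4], [0.7, 0.8], [0.6, 0.3]])
p_series = np.array([[0.8], [0.3], [0.6], [0.2]])
c, y_hat = checklist_predict([p_text, p_series], tau=0.5, M=2)
y_true = np.array([1, 0, 0, 1])
groups = np.array(["a", "a", "b", "b"])
print(y_hat)  # [1 0 1 0]
print(confusion_metrics(y_true, y_hat))
print(pairwise_rate_gaps(y_true, y_hat, groups))
```

Keeping $\tau$ and $M$ as explicit arguments makes it easy to see how the threshold and the number of required checks shift the reported precision, recall and subgroup gaps.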

Theorems & Definitions (2)

  • Proposition 4.1
  • proof