Applications of 0-1 Neural Networks in Prescription and Prediction

Vrishabh Patil; Kara Hoppe; Yonatan Mintz

TL;DR

This work tackles learning personalized treatment policies from limited observational data by proposing Prescriptive Neural Networks (PNNs), shallow 0-1 networks trained with mixed-integer programming. PNNs integrate counterfactual estimation (inverse propensity weighting, IPW; the direct method, DM; and doubly robust estimation, DR) into a mixed-integer linear programming (MILP) framework to directly optimize policies while preserving interpretability, and they come with statistical consistency guarantees. Empirical results on simulated data and a postpartum hypertension case study show PNNs can reduce peak systolic blood pressure (SBP) and outperform competing prescriptive methods, with SHAP-based insights highlighting clinically plausible feature importance. The framework thereby provides a practical, auditable approach to prescriptive analytics in medicine, with extensions demonstrated for Warfarin dosing and MNIST prediction in the appendix.
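
As a rough illustration of the three counterfactual estimators named above, the following minimal Python sketch scores a fixed policy from logged observational data. The function names, the `propensity(x, t)` and `outcome_model(x, t)` callables, and the argument layout are assumptions made for illustration. They are not the paper's notation or implementation, and the paper embeds such estimates in a MILP objective rather than evaluating them after the fact.

```python
from typing import Callable, Hashable, Sequence

Policy = Callable[[Hashable], int]                 # x -> prescribed treatment
Propensity = Callable[[Hashable, int], float]      # (x, t) -> P(T = t | X = x)
OutcomeModel = Callable[[Hashable, int], float]    # (x, t) -> predicted outcome


def ipw_value(xs: Sequence, ts: Sequence[int], ys: Sequence[float],
              policy: Policy, propensity: Propensity) -> float:
    """Inverse propensity weighting: reweight samples whose logged
    treatment agrees with the policy by 1 / P(T = t | X = x)."""
    total = sum(y / propensity(x, t)
                for x, t, y in zip(xs, ts, ys) if policy(x) == t)
    return total / len(xs)


def dm_value(xs: Sequence, policy: Policy,
             outcome_model: OutcomeModel) -> float:
    """Direct method: average the outcome model's prediction for the
    treatment the policy would prescribe."""
    return sum(outcome_model(x, policy(x)) for x in xs) / len(xs)


def dr_value(xs: Sequence, ts: Sequence[int], ys: Sequence[float],
             policy: Policy, propensity: Propensity,
             outcome_model: OutcomeModel) -> float:
    """Doubly robust: direct-method prediction plus an IPW-weighted
    residual correction on samples where the policy matches the log."""
    total = 0.0
    for x, t, y in zip(xs, ts, ys):
        prescribed = policy(x)
        total += outcome_model(x, prescribed)
        if prescribed == t:
            total += (y - outcome_model(x, t)) / propensity(x, t)
    return total / len(xs)
```

Each function returns an estimate of the mean outcome under the policy. When the outcome is something like peak blood pressure, lower estimates are better, so a policy search would minimize the estimate rather than maximize it.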

Abstract

A key challenge in medical decision making is learning treatment policies for patients with limited observational data. This challenge is particularly evident in personalized healthcare decision making, where models need to take into account the intricate relationships between patient characteristics, treatment options, and health outcomes. To address this, we introduce prescriptive networks (PNNs), shallow 0-1 neural networks trained with mixed-integer programming that can be used with counterfactual estimation to optimize policies in medium-data settings. These models offer greater interpretability than deep neural networks and can encode more complex policies than common models such as decision trees. We show that PNNs can outperform existing methods both in synthetic data experiments and in a case study of assigning treatments for postpartum hypertension. In particular, PNNs are shown to produce policies that could reduce peak blood pressure by 5.47 mm Hg (p=0.02) over existing clinical practice, and by 2 mm Hg (p=0.01) over the next best prescriptive modeling technique. Moreover, PNNs were more likely than all other models to correctly identify clinically significant features, while existing models relied on potentially dangerous features, such as patient insurance information and race, that could lead to bias in treatment.
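
To make "shallow 0-1 neural network" concrete, here is a minimal Python sketch of how such a network could map a binarized patient vector to a treatment. It assumes one hidden layer of step-activated units followed by a linear scoring layer with an argmax over treatments. The layer shapes, the `>= 0` threshold convention, and all names are illustrative assumptions, not the paper's formulation.

```python
from typing import List, Sequence


def step(z: float) -> int:
    """Binary (0-1) activation: fires when the pre-activation is non-negative."""
    return 1 if z >= 0 else 0


def hidden_layer(x: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 biases: Sequence[float]) -> List[int]:
    """Apply one layer of 0-1 units; each row of `weights` is one unit."""
    return [step(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def prescribe(x: Sequence[float],
              hidden_w: Sequence[Sequence[float]],
              hidden_b: Sequence[float],
              out_w: Sequence[Sequence[float]],
              out_b: Sequence[float]) -> int:
    """Return the index of the treatment with the highest output score."""
    h = hidden_layer(x, hidden_w, hidden_b)
    scores = [sum(w * hi for w, hi in zip(row, h)) + b
              for row, b in zip(out_w, out_b)]
    return max(range(len(scores)), key=lambda t: scores[t])
```

Because the step activation has zero gradient almost everywhere, weights like `hidden_w` cannot be fit by ordinary backpropagation, which is one reason the weights are instead chosen by mixed-integer programming.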

Paper Structure (45 sections, 7 theorems, 30 equations, 9 figures, 18 tables)

Introduction
Problem Description
Contributions
Literature Review
Mixed-Integer Programming Formulation of ANNs
Binary Activations
Loss Functions for Prescription
Consistency of Prescriptive Networks
Numerical Experiments on Prescriptive Problems
Experimental Setup
Simulated Data
Personalized Postpartum Hypertension Treatments
Results
Simulated Data
Personalized Postpartum Hypertension Treatments
...and 30 more sections

Key Result

Proposition 1

In the binary activation case, the constraints l_0 output_const, l output_const, and L output_const can be reformulated as a set of MIP constraints. Specifically, for all $k\in [K]_1, i\in[n]_1$, constraint l_0 output_const can be reformulated as: For all $\ell \in [L-1]_1, k\in[K]_1, i\in[n]_1$, constraints l output_const can be reformulated as: And for all $t\in \mathcal{T}, i \in [n]_1$, Constraints L o

Figures (9)

Figure 1: Causal Graphs Detailing the Covariates, Treatments, and Outcomes Along with their Causal Structures for the Three Hypertension Experiments.
Figure 2: Results of the simulated dataset with experimental design 1. Panel A visualizes the results when the data is binarized using adapted encoding. Panel B visualizes the results when the data is binarized using one-hot encoded binarization. The left plots are policies learned from 100 datapoints and the right plots are policies learned from 500 datapoints.
Figure 3: Results of the simulated dataset with experimental designs 2 and 3. Panel A visualizes the results when the data is binarized using one-hot encoding, with the left plots learned from 100 datapoints and the right plots learned from 500 datapoints. Panel B visualizes the results when the data is binarized using adapted binarization, with the left plots learned from 100 datapoints and the right plots learned from 500 datapoints.
Figure 4: Results of experiments on the hypertension dataset for out-of-sample patients prescribed treatments based on policies learned by different models. Each panel displays the mean out-of-sample maximum systolic blood pressure. The left plots include the 90% confidence intervals and the right plots include the 95% confidence intervals.
Figure 5: Out-of-sample treatment assignments for policies learned by different models over the hypertension dataset. Each panel displays the number of patients who were prescribed each treatment.
...and 4 more figures

Theorems & Definitions (7)

Proposition 1
Proposition 2
Theorem 3
Proposition 4
Proposition 5
Proposition 6
Proposition 7
