A General Theory of Outcome Weighted Learning for Individualized Treatment Rules

Zhu Wang

TL;DR

This work develops a general relationship between population 0-1 risk and risks from a broad class of nonnegative surrogate losses using a constrained variational transformation and proposes two iteratively reweighted convex optimization algorithms.

Abstract

Personalized medicine aims to tailor treatments to individual patients, especially when responses to therapy are heterogeneous. A key objective is to learn individualized treatment rules that recommend the optimal treatment from patient characteristics. Outcome weighted learning (OWL) is an important framework because it recasts this task as a weighted classification problem that targets clinical benefit and can use modern machine learning tools. Existing OWL theory has focused on specific surrogate losses and Gaussian kernels. Matérn kernels, which allow adjustable smoothness and better match many real-world data structures, are often more suitable and include the Gaussian kernel as a limiting case. This work develops a general relationship between the population 0-1 risk and the risks of a broad class of nonnegative surrogate losses using a constrained variational transformation. The transform simplifies for convex losses and has simple closed-form expressions for certain nonconvex losses. A condition is established that ensures a nontrivial upper bound on the excess 0-1 risk. The paper establishes convergence rates for kernel-based OWL, under smoothness conditions with Matérn kernels or geometric noise conditions with Gaussian kernels, for both convex and nonconvex losses. It also proposes two iteratively reweighted convex optimization algorithms. Simulations and an application to the ACTG 175 data show strong performance.
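To make the "weighted classification" reading of OWL concrete, the following is a minimal sketch rather than the paper's method. It fits a linear rule by plain gradient descent on a reward-weighted logistic (binomial) loss. The simulated data, the linear rule, the reward shift that keeps weights nonnegative, and the helper name `fit_owl_linear` are all assumptions made for illustration; the paper's own algorithms are iteratively reweighted convex optimizations with kernels and are not reproduced here.

```python
import numpy as np

def fit_owl_linear(X, A, R, prop, lr=0.5, n_iter=500):
    """Hypothetical helper: fit f(x) = x @ w + b by minimizing the
    weighted logistic loss mean(wts * log(1 + exp(-A * f(X))))."""
    wts = (R - R.min()) / prop          # shift rewards so weights are >= 0
    wts = wts / wts.mean()              # rescale weights to mean 1
    Xb = np.column_stack([X, np.ones(len(X))])   # append intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        margin = A * (Xb @ beta)
        # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m))
        grad = -(wts * A / (1.0 + np.exp(margin))) @ Xb / len(X)
        beta -= lr * grad
    return beta

# Simulated randomized trial (assumed setup): treatment A in {-1, +1}
# assigned with probability 0.5; treatment +1 is better exactly when X[:, 0] > 0.
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1, 1, size=(n, 2))
A = rng.choice([-1.0, 1.0], size=n)
R = 3 + X[:, 1] + 2 * A * X[:, 0] + rng.normal(0, 0.5, n)

beta = fit_owl_linear(X, A, R, prop=0.5)
rule = np.sign(X @ beta[:2] + beta[2])           # estimated treatment rule
agreement = np.mean(rule == np.sign(X[:, 0]))    # match with the optimal rule
print(f"agreement with optimal rule: {agreement:.3f}")
```

Each patient acts as a classification example with label `A` and weight proportional to the observed reward divided by the treatment assignment probability, so the classifier is pushed toward treatments that were followed by good outcomes.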

Paper Structure

This paper contains 20 sections, 19 theorems, 35 equations, 2 figures, 5 tables, 2 algorithms.

Introduction
Relating Excess Risk to Excess Surrogate Risk
The $\psi$-transform for OWL
Policy-Calibration
The $\Psi$-Transform and the Relationship Between Excess Risks
The $\Psi$-Transform of Nonconvex Loss Functions
Learning Rates
Approximation Error Assumptions
Smoothness Assumption
Geometric Noise Assumption
Learning Rates for Convex Loss Functions
Learning Rates for Nonconvex Loss Functions
Computational Implementation
Algorithms for OWL
Algorithms for RWL
...and 5 more sections

Key Result

Theorem 2.1

Assume $T$ is classification-calibrated and $\psi$ is positive homogeneous, i.e., $\psi(c\theta) = c\,\psi(\theta)$ for all $c > 0$ and $\theta \geq 0$. Then $\psi(\mathcal{R}(f) - \mathcal{R}^*) \leq \mathcal{R}_T-\mathcal{R}_T^*$ and $\psi(\theta) = \theta \psi(1)$ for all $\theta$ …
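As a short worked step under the stated assumptions only: positive homogeneity by itself forces $\psi$ to be linear on positive arguments (take $c = \theta$ and evaluate at $1$), and substituting this into the displayed bound gives a linear comparison between the two excess risks. The division in the last line assumes $\psi(1) > 0$, which is not stated in the excerpt above.

```latex
\begin{align*}
\psi(\theta) &= \psi(\theta \cdot 1) = \theta\,\psi(1)
  && \text{for } \theta > 0, \\
\psi(1)\,\bigl(\mathcal{R}(f) - \mathcal{R}^*\bigr)
  &\le \mathcal{R}_T - \mathcal{R}_T^*
  && \text{when } \mathcal{R}(f) > \mathcal{R}^*, \\
\mathcal{R}(f) - \mathcal{R}^*
  &\le \frac{\mathcal{R}_T - \mathcal{R}_T^*}{\psi(1)}
  && \text{if additionally } \psi(1) > 0 .
\end{align*}
```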

Figures (2)

Figure 1: Loss function comparisons: $g\circ s$, where $s$ denotes the binomial loss. Various concave functions $g$ are shown, with their associated $\sigma$ selected to closely approximate the scaled smoothed ramp loss.
Figure 2: Example 1: Comparison of three kernel specifications—exponential, Matérn 3/2, and Gaussian—under both convex and nonconvex loss functions. Left panels: scaled target and estimate. Right panels: excess risks (log-scale) for the smooth and nonsmooth targets.
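For readers unfamiliar with the three kernels compared in Figure 2, here is a small sketch of their standard closed forms as functions of the distance `r`. The length-scale parameter `ell` and its scaling convention are assumptions; the paper may parameterize the kernels differently. The exponential kernel is the Matérn kernel with smoothness 1/2.

```python
import numpy as np

def exponential_kernel(r, ell=1.0):
    """Matérn kernel with smoothness nu = 1/2."""
    return np.exp(-r / ell)

def matern32_kernel(r, ell=1.0):
    """Matérn kernel with smoothness nu = 3/2."""
    s = np.sqrt(3.0) * r / ell
    return (1.0 + s) * np.exp(-s)

def gaussian_kernel(r, ell=1.0):
    """Gaussian (squared-exponential) kernel."""
    return np.exp(-r**2 / (2.0 * ell**2))

r = np.array([0.0, 0.1, 1.0])
k_exp = exponential_kernel(r)
k_m32 = matern32_kernel(r)
k_gau = gaussian_kernel(r)
print(np.column_stack([r, k_exp, k_m32, k_gau]))
```

Near `r = 0` the exponential kernel drops off linearly while the other two are flat, which is the sense in which a larger smoothness parameter yields smoother functions in the associated function space.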

Theorems & Definitions (22)

Definition 2.1: Classification-Calibration
Theorem 2.1
Definition 2.2: Policy-Calibration
Theorem 2.2
Corollary 2.3
Corollary 2.4
Definition 2.3: $\Psi$-Transform
Theorem 2.5
Theorem 2.6
Lemma 2.7
...and 12 more
