A General Theory of Outcome Weighted Learning for Individualized Treatment Rules

Zhu Wang

TL;DR

This work develops a general relationship between population 0-1 risk and risks from a broad class of nonnegative surrogate losses using a constrained variational transformation and proposes two iteratively reweighted convex optimization algorithms.

Abstract

Personalized medicine aims to tailor treatments to individual patients, especially when people respond heterogeneously to therapies. A key objective is to learn individualized treatment rules that recommend optimal treatments from patient characteristics. Outcome weighted learning (OWL) is an important framework because it reformulates the task as a weighted classification problem that targets clinical benefit and can use modern machine learning tools. Existing OWL theory has focused on specific surrogate losses and Gaussian kernels. Matérn kernels, which allow adjustable smoothness and better match many real-world data structures, are often more suitable and include the Gaussian kernel as a special case. This work develops a general relationship between population 0-1 risk and risks from a broad class of nonnegative surrogate losses using a constrained variational transformation. The transform simplifies for convex losses and provides simple expressions for certain nonconvex losses. A condition is established that ensures a nontrivial upper bound on the excess 0-1 risk. The paper establishes convergence rates for kernel-based OWL under smoothness conditions with Matérn kernels or geometric noise conditions with Gaussian kernels, for both convex and nonconvex losses. It also proposes two iteratively reweighted convex optimization algorithms. Simulations and an application to ACTG 175 show strong performance.
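
To make the weighted-classification reformulation concrete, the following is a minimal sketch of the empirical outcome-weighted 0-1 risk in the standard OWL formulation, where each observation is weighted by its outcome divided by the propensity of the treatment actually received. The function name `owl_weighted_risk`, the toy data, and the constant propensity of 0.5 are illustrative assumptions and do not come from the paper.

```python
from typing import Callable, Sequence


def owl_weighted_risk(
    X: Sequence[float],
    A: Sequence[int],
    R: Sequence[float],
    propensity: Sequence[float],
    rule: Callable[[float], int],
) -> float:
    """Empirical outcome-weighted 0-1 risk of a treatment rule.

    An observation contributes R_i / pi_i whenever the rule disagrees with
    the treatment received (A_i in {-1, +1}). Outcomes R_i are assumed
    nonnegative, with larger values being better.
    """
    total = 0.0
    for x, a, r, p in zip(X, A, R, propensity):
        if rule(x) != a:
            total += r / p
    return total / len(X)


# Hypothetical randomized trial: treatment +1 helps when x > 0.
X = [-2.0, -1.0, 1.0, 2.0]
A = [-1, 1, 1, -1]
R = [3.0, 1.0, 3.0, 1.0]
pi = [0.5, 0.5, 0.5, 0.5]


def good_rule(x: float) -> int:
    return 1 if x > 0 else -1


def bad_rule(x: float) -> int:
    return -1 if x > 0 else 1


risk_good = owl_weighted_risk(X, A, R, pi, good_rule)
risk_bad = owl_weighted_risk(X, A, R, pi, bad_rule)
print(risk_good, risk_bad)
```

A rule that agrees with the treatments received by patients with good outcomes attains the lower weighted risk, which is why minimizing this quantity (or a surrogate of it) targets clinical benefit.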

Paper Structure

This paper contains 20 sections, 19 theorems, 35 equations, 2 figures, 5 tables, 2 algorithms.

Key Result

Theorem 2.1

Assume $T$ is classification-calibrated and $\psi$ is positive homogeneous, i.e., $\psi(c\theta) = c\,\psi(\theta)$ for all $c > 0$ and $\theta \geq 0$. Then $\psi(\mathcal{R}(f) - \mathcal{R}^*) \leq \mathcal{R}_T-\mathcal{R}_T^*$ and $\psi(\theta) = \theta \psi(1)$ for all $\theta$ …
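
The linear form of $\psi$ follows directly from positive homogeneity. A short derivation on the domain $\theta \geq 0$ where homogeneity is assumed; the extra condition $\psi(1) > 0$ in the last step is an assumption added here only to show how the bound would be used:

$$
\begin{aligned}
\theta > 0:&\quad \psi(\theta) = \psi(\theta \cdot 1) = \theta\,\psi(1) \quad (\text{take } c = \theta), \\
\theta = 0:&\quad \psi(0) = \psi(c \cdot 0) = c\,\psi(0) \ \text{ for every } c > 0 \ \Longrightarrow\ \psi(0) = 0 = 0 \cdot \psi(1), \\
\psi(1) > 0:&\quad \psi(1)\,\bigl(\mathcal{R}(f) - \mathcal{R}^*\bigr) \leq \mathcal{R}_T - \mathcal{R}_T^* \ \Longrightarrow\ \mathcal{R}(f) - \mathcal{R}^* \leq \frac{\mathcal{R}_T - \mathcal{R}_T^*}{\psi(1)}.
\end{aligned}
$$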

Figures (2)

  • Figure 1: Loss function comparisons: $g\circ s$, where $s$ denotes the binomial loss. Various concave functions $g$ are shown, with their associated $\sigma$ selected to closely approximate the scaled smoothed ramp loss.
  • Figure 2: Example 1: Comparison of three kernel specifications—exponential, Matérn 3/2, and Gaussian—under both convex and nonconvex loss functions. Left panels: scaled target and estimate. Right panels: excess risks (log-scale) for the smooth and nonsmooth targets.
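
For reference, the three kernels compared in Figure 2 can be sketched in their common textbook parametrization. The length-scale convention, function names, and test points below are assumptions for illustration; the paper's exact parametrization may differ.

```python
import math
from typing import Sequence


def _dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def exponential_kernel(x, y, length_scale: float = 1.0) -> float:
    """Matern kernel with smoothness nu = 1/2 (least smooth of the three)."""
    return math.exp(-_dist(x, y) / length_scale)


def matern32_kernel(x, y, length_scale: float = 1.0) -> float:
    """Matern kernel with smoothness nu = 3/2."""
    s = math.sqrt(3.0) * _dist(x, y) / length_scale
    return (1.0 + s) * math.exp(-s)


def gaussian_kernel(x, y, length_scale: float = 1.0) -> float:
    """Gaussian kernel, the nu -> infinity limit of the Matern family."""
    r = _dist(x, y)
    return math.exp(-(r ** 2) / (2.0 * length_scale ** 2))


# Two points at (approximately) unit distance.
x, y = [0.0, 0.0], [0.6, 0.8]
k_exp = exponential_kernel(x, y)
k_m32 = matern32_kernel(x, y)
k_gauss = gaussian_kernel(x, y)
print(k_exp, k_m32, k_gauss)
```

The smoothness parameter controls how quickly similarity decays near zero distance, which is the knob that the smooth and nonsmooth targets in Example 1 probe.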

Theorems & Definitions (22)

  • Definition 2.1: Classification-Calibration
  • Theorem 2.1
  • Definition 2.2: Policy-Calibration
  • Theorem 2.2
  • Corollary 2.3
  • Corollary 2.4
  • Definition 2.3: $\Psi$-Transform
  • Theorem 2.5
  • Theorem 2.6
  • Lemma 2.7
  • ...and 12 more