Causal K-Means Clustering

Kwangho Kim; Jisu Kim; Edward H. Kennedy

TL;DR

This paper tackles the problem of identifying heterogeneous treatment effects when subgroup structure is unknown by introducing Causal K-Means Clustering, which clusters on the vector of conditional counterfactual means $\mu(X)$. It develops two estimators: a straightforward plug-in method and a bias-corrected semiparametric estimator based on efficient influence functions with cross-fitting, achieving fast $\sqrt{n}$-type rates under a margin condition. Theoretical results establish risk consistency for the plug-in approach and efficient, asymptotically normal behavior for the semiparametric estimator, including consistent codebooks and improved rates when nuisance functions are estimated flexibly. Empirical illustrations include a simulation study and a case study on adolescent substance-abuse treatment programs, revealing meaningful subgroup structures with distinct treatment effects. Overall, the framework provides a practical, flexible tool for uncovering and evaluating subgroup-specific causal effects in multi-treatment and outcome-wide settings.

Abstract

Causal effects are often characterized with population summaries. These might provide an incomplete picture when there are heterogeneous treatment effects across subgroups. Since the subgroup structure is typically unknown, it is more challenging to identify and evaluate subgroup effects than population effects. We propose a new solution to this problem: Causal k-Means Clustering, which harnesses the widely-used k-means clustering algorithm to uncover the unknown subgroup structure. Our problem differs significantly from the conventional clustering setup since the variables to be clustered are unknown counterfactual functions. We present a plug-in estimator which is simple and readily implementable using off-the-shelf algorithms, and study its rate of convergence. We also develop a new bias-corrected estimator based on nonparametric efficiency theory and double machine learning, and show that this estimator achieves fast root-n rates and asymptotic normality in large nonparametric models. Our proposed methods are especially useful for modern outcome-wide studies with multiple treatment levels. Further, our framework is extensible to clustering with generic pseudo-outcomes, such as partially observed outcomes or otherwise unknown functions. Finally, we explore finite sample properties via simulation, and illustrate the proposed methods in a study of treatment programs for adolescent substance abuse.
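The plug-in idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a randomized binary treatment so that $\mu_a(x) = \mathbb{E}[Y \mid X = x, A = a]$, uses a linear regression per arm as the "off-the-shelf" learner, fits the regressions on one half of the sample and clusters the other half, and runs a plain Lloyd-style k-means. The helper names (`fit_arm_regressions`, `predict_mu`, `lloyd_kmeans`) and the simulated data are hypothetical.

```python
import numpy as np

def fit_arm_regressions(X, A, Y):
    """Per-arm least-squares fit of E[Y | X, A=a]; a linear model is an illustrative choice."""
    coefs = {}
    for a in np.unique(A):
        mask = A == a
        design = np.column_stack([np.ones(mask.sum()), X[mask]])
        coefs[int(a)] = np.linalg.lstsq(design, Y[mask], rcond=None)[0]
    return coefs

def predict_mu(coefs, X):
    """Stack the estimated counterfactual means (mu_0(X), mu_1(X)) as columns."""
    design = np.column_stack([np.ones(len(X)), X])
    return np.column_stack([design @ coefs[a] for a in sorted(coefs)])

def assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) for each point."""
    sq_dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq_dists.argmin(axis=1)

def lloyd_kmeans(points, k, n_iter=200, seed=0):
    """Plain Lloyd iterations; returns the codebook (centers) and cluster labels."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(points, centers)

# Simulated data (assumption): mu_0(x) = x, mu_1(x) = -x, so the average effect is zero
# while the conditional effect -2x changes sign across the covariate range.
rng = np.random.default_rng(1)
n = 4000
X = rng.uniform(-1.0, 1.0, size=n)
A = rng.integers(0, 2, size=n)                      # randomized binary treatment
Y = np.where(A == 1, -X, X) + 0.5 * rng.normal(size=n)

half = n // 2
coefs = fit_arm_regressions(X[:half], A[:half], Y[:half])   # nuisance fit on first half
mu_hat = predict_mu(coefs, X[half:])                        # pseudo-outcomes on second half
centers, labels = lloyd_kmeans(mu_hat, k=2)                 # cluster the estimated mu(X)
cluster_effects = centers[:, 1] - centers[:, 0]             # mu_1 - mu_0 at each center
print(np.round(centers, 2), np.round(cluster_effects, 2))
```

In this toy setup the population-level effect is zero, yet the two cluster centers carry effects of opposite sign, which is the kind of structure a single summary would hide. The bias-corrected estimator in the paper replaces the raw regression predictions with influence-function-based corrections; that step is not shown here.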

Paper Structure (22 sections, 15 theorems, 116 equations, 5 figures)

Introduction
Heterogeneity in Treatment Effects
Understanding Heterogeneity via Cluster Analysis
Setup and estimands
Plug-in Estimator
Semiparametric Estimator
Proposed estimator
Asymptotic Properties
Illustration
Simulation Study
Case Study
Discussion
Acknowledgements
Simulation Study Details
Proofs
...and 7 more sections

Key Result

Theorem 3.1

Suppose $\mathbb{P}$ satisfies the margin condition (Definition 3.1) with some $\kappa > 0$, $\alpha > 0$. Let [definition missing from source]. Then, under Assumptions A1 (boundedness) and A2 (consistency), we have [bound missing from source] whenever $\widehat{\mu}$ is constructed from a separate independent sample.
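For orientation, the standard k-means quantities that "risk" and "codebook" refer to can be written as follows. The notation ($C$, $c_j$, $R$, $C^*$, $\widehat{C}$) is generic and assumed here, not necessarily the paper's, and this is not the bound stated in Theorem 3.1. A codebook is a set of $k$ centers $C = \{c_1, \dots, c_k\}$, and its risk is the expected squared distance from $\mu(X)$ to the nearest center:

$$
R(C) = \mathbb{E}\Big[\min_{1 \le j \le k} \lVert \mu(X) - c_j \rVert_2^2\Big],
\qquad
C^* \in \operatorname*{arg\,min}_{C} R(C).
$$

The excess risk of an estimated codebook $\widehat{C}$ is then $R(\widehat{C}) - R(C^*)$, which is nonnegative by the definition of $C^*$.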

Figures (5)

Figure 1: (a) An illustration of causal clustering for binary treatments, where $\mathbb{E}[Y^1-Y^0]=0$; (b) We aim to uncover true subgroup structure with six clusters, with units within each cluster being more homogeneous in terms of the CATE; (c) The histogram fails to reveal the details about the true subgroup structure.
Figure 2: Consider a scenario where a treatment is ineffective for the low-risk patients but beneficial for those whose baseline risk $\mu_0$ exceeds a certain threshold. For example, the treatment effect could be near-zero for a group with $\mu_0\approx 10$, highly beneficial for $\mu_0\approx 20$, and moderately beneficial for $\mu_0\approx 30$. In this case, given the same data, cluster analysis with the parametrization $\mu = (\mu_0, \mu_1 - \mu_0)$ in (b) makes it easier to understand how treatment effects vary with the baseline risk than $\mu = (\mu_0, \mu_1)$ in (a).
Figure 3: Illustration of the margin condition in Definition 3.1, where we control the probability mass in the shaded area within the red-dashed lines specified by $\kappa$.
Figure 4: (a) presents the finite sample performance of the plug-in estimator (pi) and the semiparametric estimator (eff) with respect to excess risk and codebook, across different sample sizes ($n=250, 1000, 5000$) and nuisance estimation rates ($1/4, 1/2$). (b) and (c) illustrate causal k-means clustering with the substance abuse dataset; (b) displays the four clusters in the counterfactual mean vector space, and (c) shows the density plots of pairwise CATE estimates $\widehat{\tau}_{2,1}$ (upper) and $\widehat{\tau}_{3,1}$ (lower) across the four clusters.
Figure 5: Finite sample performance of the plug-in estimator (pi) and the efficient semiparametric estimator (eff) with respect to the excess risk (a) and codebook (b), across different sample sizes ($n$ from $250$ to $10{,}000$) and nuisance estimation rates ($1/4, 1/2$). Each point is obtained with $500$ simulations.

Theorems & Definitions (32)

Definition 3.1: Margin condition
Theorem 3.1
Theorem 3.2
Corollary 3.3
Lemma 4.1
Lemma 4.2
Corollary 4.3
Theorem 4.4
Theorem 4.5
Lemma B.1
...and 22 more
