Interpretable clustering via optimal multiway-split decision trees

Hayato Suzuki, Shunnosuke Ikeda, Yuichi Takano

TL;DR

This work proposes an interpretable clustering method based on optimal multiway-split decision trees, formulated as a 0-1 integer linear optimization problem, which yields multiway-split decision trees with concise decision rules while maintaining competitive performance across various evaluation metrics.

Abstract

Clustering is a vital tool for uncovering latent data structures, and achieving both high accuracy and interpretability is essential. To this end, existing methods typically construct binary decision trees by solving mixed-integer nonlinear optimization problems, which often incurs significant computational cost and yields suboptimal solutions. Furthermore, binary decision trees frequently become excessively deep, which makes them difficult to interpret. To mitigate these issues, we propose an interpretable clustering method based on optimal multiway-split decision trees, formulated as a 0-1 integer linear optimization problem. This reformulation makes the optimization problem more tractable than existing models. A key feature of our method is the integration of a one-dimensional K-means algorithm for the discretization of continuous variables, allowing for flexible and data-driven branching. Extensive numerical experiments on publicly available real-world datasets demonstrate that our method outperforms baseline methods in terms of clustering accuracy and interpretability. Our method yields multiway-split decision trees with concise decision rules while maintaining competitive performance across various evaluation metrics.
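
The discretization step mentioned in the abstract can be illustrated with a minimal sketch of one-dimensional K-means (Lloyd's algorithm) that turns a continuous feature into ordered bins, each of which could serve as one branch of a multiway split. This is only an illustration under stated assumptions: the function names `kmeans_1d_thresholds` and `discretize`, the quantile-based initialization, and the toy data are hypothetical and not taken from the paper, and the abstract does not say how the number of bins is chosen or how the bins enter the 0-1 integer linear optimization problem.

```python
from bisect import bisect_right


def kmeans_1d_thresholds(values, k, n_iter=100):
    """Run Lloyd's algorithm on scalars and return the k - 1 cut points
    (midpoints between adjacent cluster centers), in increasing order."""
    distinct = sorted(set(values))
    if not 1 <= k <= len(distinct):
        raise ValueError("k must be between 1 and the number of distinct values")
    # Assumed initialization: evenly spaced quantiles of the distinct values,
    # which guarantees k different starting centers in sorted order.
    centers = [float(distinct[int((i + 0.5) * len(distinct) / k)]) for i in range(k)]
    for _ in range(n_iter):
        # In one dimension, the nearest-center regions are intervals whose
        # boundaries are the midpoints between adjacent centers.
        cuts = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
        sums = [0.0] * k
        counts = [0] * k
        for x in values:
            j = bisect_right(cuts, x)
            sums[j] += x
            counts[j] += 1
        # Move each center to the mean of its interval; keep it if the interval is empty.
        new_centers = [sums[j] / counts[j] if counts[j] else centers[j] for j in range(k)]
        if new_centers == centers:
            break
        centers = new_centers
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


def discretize(values, thresholds):
    """Map each value to the index of the bin it falls into."""
    return [bisect_right(thresholds, x) for x in values]


# Toy feature with three visible groups (made-up numbers).
feature = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8, 9.7, 10.0, 10.3]
thresholds = kmeans_1d_thresholds(feature, k=3)
bins = discretize(feature, thresholds)
```

Because the cut points come from the data rather than from a fixed grid, a feature with three natural groups produces a three-way branch directly, where a binary tree would need two nested splits on the same feature.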

Paper Structure (28 sections, 19 equations, 11 figures, 9 tables)

Figures (11)

  • Figure 1: An example of a feature graph
  • Figure 2: A multiway-split decision tree obtained from the red path in Fig. 1
  • Figure 3: Distribution of the target variable (house price of unit area)
  • Figure 4: Distribution of ground-truth labels (house price of unit area)
  • Figure 5: Distribution of the target variable (loyalty score)
  • ...and 6 more figures