A Hybrid Tsallis-Polarization Impurity Measure for Decision Trees: Theoretical Foundations and Empirical Evaluation

Edouard Lansiaux; Idriss Jairi; Hayfa Zgaya-Biau

Abstract

We introduce the Integrated Tsallis Combination (ITC), a hybrid impurity measure for decision tree learning that combines normalized Tsallis entropy with an exponential polarization component. While many existing measures sacrifice theoretical soundness for computational efficiency or vice versa, ITC provides a mathematically principled framework that balances both. The core innovation lies in the complementarity between the information-theoretic foundations of Tsallis entropy and the polarization component's sensitivity to distributional asymmetry. We establish key theoretical properties (concavity under explicit parameter conditions, proper boundary conditions, and connections to classical measures) and provide a rigorous justification for the hybridization strategy. In a comparative evaluation of 23 impurity measures on seven benchmark datasets with five-fold repetition, we show that simple parametric measures (Tsallis $\alpha=0.5$) achieve the highest average accuracy ($91.17\%$), while ITC variants yield competitive results ($88.38\%$ to $89.16\%$) with strong theoretical guarantees. Statistical analysis (Friedman test: $\chi^2=3.89$, $p=0.692$) reveals no significant global differences among top performers, indicating practical equivalence for many applications. ITC's value lies in its theoretical grounding: proven concavity under suitable conditions, flexible parameterization ($\alpha$, $\beta$, $\gamma$), and $O(K)$ computational cost. These make it a rigorous, generalizable alternative when theoretical guarantees are paramount. We provide guidelines for measure selection based on application priorities and release an open-source implementation to foster reproducibility and further research.

Paper Structure (24 sections, 4 theorems, 1 figure, 5 tables)

Introduction
Background and Related Work
Classical Impurity Measures
Parametric Generalizations
Probabilistic Divergences and Distance-Based Measures
Polarization and Hybrid Approaches
The ITC Impurity Measure
Mathematical Formulation
Theoretical Foundation for Hybridization
Theoretical Properties
Parameter Optimization and Sensitivity Analysis
Experimental Evaluation
Experimental Setup
Overall Performance Results
Statistical Significance Analysis
...and 9 more sections

Key Result

Theorem 1

For any $K \ge 2$, $\alpha > 0$, $\beta > 0$, $\gamma \in [0,1]$, ITC satisfies purity (zero if and only if $p_i = 1$ for some $i$), uniformity (maximized at $p_i = 1/K$ for all $i$), and symmetry (invariant under permutation of class labels). These properties follow directly from the corresponding
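The full ITC formula is not reproduced in this excerpt, so the sketch below illustrates the three boundary properties only for the normalized Tsallis entropy component, using the standard definition $S_\alpha(p) = (1 - \sum_i p_i^\alpha)/(\alpha - 1)$ divided by its value at the uniform distribution. The function names `tsallis_entropy` and `normalized_tsallis` are hypothetical and are not taken from the authors' implementation.

```python
import math

def tsallis_entropy(p, alpha):
    """Tsallis entropy (1 - sum_i p_i**alpha) / (alpha - 1); Shannon entropy at alpha = 1."""
    if abs(alpha - 1.0) < 1e-12:
        return -sum(x * math.log(x) for x in p if x > 0)
    return (1.0 - sum(x ** alpha for x in p if x > 0)) / (alpha - 1.0)

def normalized_tsallis(p, alpha):
    """Tsallis entropy rescaled so that the uniform distribution over K classes scores 1."""
    k = len(p)
    uniform = [1.0 / k] * k
    return tsallis_entropy(p, alpha) / tsallis_entropy(uniform, alpha)

alpha = 0.5  # illustrative value; any alpha > 0 works here
pure = normalized_tsallis([1.0, 0.0, 0.0], alpha)         # purity: one class holds all mass
uniform = normalized_tsallis([1/3, 1/3, 1/3], alpha)      # uniformity: maximal impurity
mixed = normalized_tsallis([0.2, 0.3, 0.5], alpha)
mixed_permuted = normalized_tsallis([0.5, 0.2, 0.3], alpha)  # symmetry: relabeled classes
```

Each evaluation is a single pass over the $K$ class probabilities, which is consistent with the $O(K)$ cost stated in the abstract. Whether the hybrid measure inherits these properties for all parameter values depends on the polarization term, which this sketch does not model.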

Figures (1)

Figure 1: Average ranks of the top 10 impurity measures. The horizontal axis represents the mean rank across datasets. ITC variants lie in the middle range, overlapping with many top measures.

Theorems & Definitions (5)

Theorem 1: Boundary Conditions
Theorem 2: Concavity
Remark 1
Theorem 3: Connection to Classical Measures
Theorem 4: Computational Complexity
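
The statement of Theorem 3 is not included in this excerpt. As background only, the following sketch checks two well-known facts about unnormalized Tsallis entropy that any connection to classical measures would plausibly build on: at $\alpha = 2$ it equals the Gini index, and as $\alpha \to 1$ it approaches Shannon entropy (in nats). The helper names are hypothetical.

```python
import math

def tsallis(p, alpha):
    """Unnormalized Tsallis entropy for alpha != 1."""
    return (1.0 - sum(x ** alpha for x in p if x > 0)) / (alpha - 1.0)

def gini(p):
    """Gini index 1 - sum_i p_i**2."""
    return 1.0 - sum(x * x for x in p)

def shannon(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

p = [0.2, 0.3, 0.5]
gini_gap = abs(tsallis(p, 2.0) - gini(p))                # exact identity at alpha = 2
shannon_gap = abs(tsallis(p, 1.0 + 1e-6) - shannon(p))   # shrinks as alpha approaches 1
```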