
A Hybrid Tsallis-Polarization Impurity Measure for Decision Trees: Theoretical Foundations and Empirical Evaluation

Edouard Lansiaux, Idriss Jairi, Hayfa Zgaya-Biau

Abstract

We introduce the Integrated Tsallis Combination (ITC), a hybrid impurity measure for decision tree learning that combines normalized Tsallis entropy with an exponential polarization component. While many existing measures sacrifice theoretical soundness for computational efficiency or vice versa, ITC provides a mathematically principled framework that balances both. The core innovation lies in the complementarity between Tsallis entropy's information-theoretic foundations and the polarization component's sensitivity to distributional asymmetry. We establish key theoretical properties (concavity under explicit parameter conditions, proper boundary conditions, and connections to classical measures) and provide a rigorous justification for the hybridization strategy. In a comparative evaluation of 23 impurity measures on seven benchmark datasets with five-fold repetition, we show that simple parametric measures (Tsallis with $\alpha=0.5$) achieve the highest average accuracy ($91.17\%$), while ITC variants yield competitive results ($88.38\%$ to $89.16\%$) with strong theoretical guarantees. Statistical analysis (Friedman test: $\chi^2=3.89$, $p=0.692$) reveals no significant global differences among top performers, indicating practical equivalence for many applications. ITC's value resides in its theoretical grounding: proven concavity under suitable conditions, flexible parameterization ($\alpha$, $\beta$, $\gamma$), and $O(K)$ computational cost make it a rigorous, generalizable alternative when theoretical guarantees are paramount. We provide guidelines for measure selection based on application priorities and release an open-source implementation to foster reproducibility and further research.
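For orientation, the standard Tsallis entropy that the abstract builds on can be written out as below. This is the textbook definition, not a formula quoted from the paper: the symbols $S_\alpha$ and $\tilde{S}_\alpha$ are notation assumed here, and normalizing by the value at the uniform distribution is one common convention that the paper may or may not follow. The exponential polarization component and the way the two parts are combined are not given in this excerpt, so they are left out.

```latex
% Tsallis entropy of a class distribution p = (p_1, ..., p_K), for alpha > 0, alpha != 1
S_\alpha(p) = \frac{1}{\alpha - 1}\Bigl(1 - \sum_{i=1}^{K} p_i^{\alpha}\Bigr)

% Value at the uniform distribution p_i = 1/K, used here as the normalizer (assumed convention)
S_\alpha^{\max} = \frac{1 - K^{1-\alpha}}{\alpha - 1},
\qquad
\tilde{S}_\alpha(p) = \frac{S_\alpha(p)}{S_\alpha^{\max}} \in [0, 1]

% Classical special cases
\lim_{\alpha \to 1} S_\alpha(p) = -\sum_{i=1}^{K} p_i \ln p_i \quad \text{(Shannon entropy)},
\qquad
S_2(p) = 1 - \sum_{i=1}^{K} p_i^{2} \quad \text{(Gini impurity)}
```

Each expression needs a single pass over the $K$ class probabilities, which is in line with the $O(K)$ cost stated in the abstract.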

Paper Structure

This paper contains 24 sections, 4 theorems, 1 figure, and 5 tables.

Key Result

Theorem 1

For any $K \ge 2$, $\alpha > 0$, $\beta > 0$, $\gamma \in [0,1]$, ITC satisfies purity (zero if and only if $p_i = 1$ for some $i$), uniformity (maximized at $p_i = 1/K$ for all $i$), and symmetry (invariant under permutation of class labels). These properties follow directly from the corresponding
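The three properties named in Theorem 1 can be checked numerically for the normalized Tsallis component on its own. The sketch below is an illustration under stated assumptions, not the paper's implementation: it uses the textbook Tsallis entropy divided by its value at the uniform distribution, the function names `tsallis_entropy` and `normalized_tsallis` are hypothetical, and the polarization term (not defined in this excerpt) is omitted, so nothing here verifies ITC itself.

```python
import math
from itertools import permutations


def tsallis_entropy(p, alpha):
    """Textbook Tsallis entropy; falls back to Shannon entropy at alpha = 1."""
    if abs(alpha - 1.0) < 1e-12:
        return -sum(x * math.log(x) for x in p if x > 0)
    return (1.0 - sum(x ** alpha for x in p if x > 0)) / (alpha - 1.0)


def normalized_tsallis(p, alpha):
    """Tsallis entropy divided by its value at the uniform distribution (assumed normalization)."""
    k = len(p)
    uniform = [1.0 / k] * k
    return tsallis_entropy(p, alpha) / tsallis_entropy(uniform, alpha)


alpha = 0.5  # the Tsallis setting highlighted in the abstract

# Purity: a one-hot distribution has zero impurity.
pure_value = normalized_tsallis([1.0, 0.0, 0.0], alpha)

# Uniformity: the uniform distribution attains the maximum, which is 1 after normalization.
uniform_value = normalized_tsallis([1 / 3, 1 / 3, 1 / 3], alpha)

# Symmetry: relabelling the classes leaves the value unchanged.
mixed = [0.7, 0.2, 0.1]
permuted_values = [normalized_tsallis(list(q), alpha) for q in permutations(mixed)]

print(pure_value, uniform_value, max(permuted_values) - min(permuted_values))
```

Both functions make one pass over the $K$ probabilities, so the cost per evaluation is linear in the number of classes.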

Figures (1)

  • Figure 1: Average ranks of the top 10 impurity measures. The horizontal axis represents the mean rank across datasets. ITC variants lie in the middle range, overlapping with many top measures.
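The Friedman test reported in the abstract is computed from these per-dataset ranks, and its statistic is referred to a chi-square distribution with $k-1$ degrees of freedom for $k$ compared measures. The sketch below checks the reported $p$-value from the reported statistic. It assumes 6 degrees of freedom, which would correspond to seven compared measures; the excerpt does not state how many measures entered the test, so that number is an inference. The helper `chi2_sf_even_df` is a hypothetical name and uses the closed-form chi-square tail that holds for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, valid for even df only.

    For df = 2m: P(X > x) = exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form applies to positive even df only")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


# Reported statistic from the abstract; df = 6 is an assumption (see lead-in).
p_value = chi2_sf_even_df(3.89, 6)
print(round(p_value, 3))  # 0.692
```

Under that assumption the result agrees with the reported $p=0.692$, far above conventional significance thresholds, which matches the statement that the top measures are not significantly different overall.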

Theorems & Definitions (5)

  • Theorem 1: Boundary Conditions
  • Theorem 2: Concavity
  • Remark 1
  • Theorem 3: Connection to Classical Measures
  • Theorem 4: Computational Complexity