A classification model based on a population of hypergraphs

Samuel Barton, Adelle Coster, Diane Donovan, James Lefevre

TL;DR

This work tackles classification by explicitly modeling high-order feature interactions through a population of hypergraphs. It constructs discrete incidence representations via normalization and discretization, builds a main-effects hypergraph, extends it to two-way and higher-order $\eta$-way interactions, and aggregates predictions from many models for robustness. Empirical results on Fisher's Iris and starch grain datasets show accuracy competitive with Random Forest, with larger gains on the more complex starch dataset, especially when using $\eta > 1$ and threshold-based decision rules. The approach highlights the value of higher-order interactions and ensemble hypergraph models for robust, generalizable classification, with the practical ability to rule out unlikely classes.
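As a rough illustration of the pipeline summarized above, the following is a minimal sketch and not the authors' implementation. It assumes that vertices are samples, that each main-effect hyperedge collects the samples falling into one (feature, bin) cell after min-max normalization and equal-width discretization, and that an $\eta$-way hyperedge is the intersection of $\eta$ main-effect hyperedges from distinct features. The function names (`discretize`, `main_effect_edges`, `eta_way_edges`) and the `n_bins` parameter are hypothetical and do not come from the paper.

```python
from itertools import combinations

import numpy as np


def discretize(X, n_bins=3):
    """Min-max normalize each column to [0, 1], then map values to bins 0..n_bins-1."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    Z = (X - lo) / span
    # The maximum value maps to n_bins, so clip it into the last bin.
    return np.minimum((Z * n_bins).astype(int), n_bins - 1)


def main_effect_edges(B, n_bins=3):
    """Assumed construction: one hyperedge per non-empty (feature, bin) cell."""
    edges = {}
    for j in range(B.shape[1]):
        for b in range(n_bins):
            members = frozenset(np.flatnonzero(B[:, j] == b).tolist())
            if members:
                edges[((j, b),)] = members
    return edges


def eta_way_edges(main_edges, eta):
    """Assumed construction: intersect eta main-effect hyperedges from distinct features."""
    out = {}
    for combo in combinations(sorted(main_edges), eta):
        features = {key[0][0] for key in combo}
        if len(features) < eta:
            continue  # skip combinations that reuse a feature
        members = frozenset.intersection(*(main_edges[k] for k in combo))
        if members:
            out[tuple(k[0] for k in combo)] = members
    return out


if __name__ == "__main__":
    X = [[0.0, 10.0], [0.5, 20.0], [1.0, 30.0], [0.1, 10.0]]
    B = discretize(X, n_bins=2)
    main = main_effect_edges(B, n_bins=2)
    print(eta_way_edges(main, eta=2))
```

In this sketch each hyperedge is stored as a set of sample indices keyed by the (feature, bin) cells that define it; an incidence matrix could be derived from the same dictionary. The aggregation over a population of hypergraphs and the threshold-based decision rules mentioned above are not covered here.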

Abstract

This paper introduces a novel hypergraph classification algorithm. The use of hypergraphs in this framework has been widely studied. In previous work, hypergraph models are typically constructed using distance- or attribute-based methods; that is, hyperedges are generated by connecting a set of samples that are within a certain distance of one another or share a common attribute. These methods, however, rarely focus on multi-way interactions directly. The algorithm presented in this paper addresses this problem by constructing hypergraphs that explore multi-way interactions of any order. We also increase the performance and robustness of the algorithm by using a population of hypergraphs. The algorithm is evaluated on two datasets and shows promising performance compared with a generic random forest classification algorithm.

Paper Structure (21 sections, 15 equations, 8 figures, 14 tables)

This paper contains 21 sections, 15 equations, 8 figures, 14 tables.

Figures (8)

  • Figure 1: Accuracy scores across different hypergraph population sizes.
  • Figure 2: The accuracy of the $\mathcal{H}_3$-algorithm (red) and the percentage of units classified (blue) across decision threshold values.
  • Figure 3: The accuracy scores achieved for different threshold values when using the class prediction technique for the $\mathcal{H}_1$ and $\mathcal{H}_2$-algorithms.
  • Figure 4: The number of species ruled out for different threshold values when using the class prediction technique for the $\mathcal{H}_1$ and $\mathcal{H}_2$-algorithms.
  • Figure 5: The accuracy scores achieved for different threshold values when using the probability distribution technique for the $\mathcal{H}_1$ and $\mathcal{H}_2$-algorithms.
  • ...and 3 more figures

Theorems & Definitions (9)

  • Definition 2.1
  • Definition 2.2
  • Example 2.3
  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Definition 3.4
  • Definition 4.1
  • Definition 4.2