Pre-Sorted Tsetlin Machine (The Genetic K-Medoid Method)

Jordan Morris

TL;DR

The paper tackles slow training and inference in Tsetlin Machines by introducing a three-stage pre-sort pipeline that partitions data into $K$ dispersed subproblems. It first selects $K$ representative datapoints per class via Binary Maximum Dispersion, then forms $K$ clusters with Binary K-Medoid using Hamming distance, and finally aligns these medoids across classes with an expedited Genetic Class Alignment to create $K$ independent Tsetlin Machines. Empirical results on MNIST-style datasets show improvements in accuracy (up to $10.0\,\text{pp}$) and large reductions in both training time (up to $383\times$) and inference time (up to $86\times$), with potential for one-shot inference and a reduced memory footprint in edge scenarios. The approach relies on cheap logical operations and hardware-friendly primitives (AND, XNOR, popcount) to enable scalable deployment and hardware optimization.
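The role of the hardware-friendly primitives can be illustrated with a minimal sketch. It assumes each binarized datapoint is packed into a Python integer, one bit per feature; the function names `hamming`, `xnor_matches`, and `nearest_medoid` are hypothetical and are not taken from the paper.

```python
from typing import List


def hamming(a: int, b: int) -> int:
    """Number of differing bits: XOR the two vectors, then popcount."""
    return bin(a ^ b).count("1")


def xnor_matches(a: int, b: int, n_bits: int) -> int:
    """Number of agreeing bits: XNOR masked to n_bits, then popcount."""
    mask = (1 << n_bits) - 1
    return bin(~(a ^ b) & mask).count("1")


def nearest_medoid(x: int, medoids: List[int]) -> int:
    """Index of the medoid closest to x in Hamming distance.

    Ties go to the lowest index (an arbitrary choice for this sketch).
    """
    return min(range(len(medoids)), key=lambda k: hamming(x, medoids[k]))


if __name__ == "__main__":
    x = 0b1110
    medoids = [0b0000, 0b1111]
    print(hamming(0b1010, 0b0110))          # differing bits
    print(xnor_matches(0b1010, 0b0110, 4))  # agreeing bits out of 4
    print(nearest_medoid(x, medoids))       # index of the closest medoid
```

For vectors of `n_bits` features, `hamming(a, b) + xnor_matches(a, b, n_bits)` always equals `n_bits`, so either primitive can drive the nearest-medoid lookup that routes a datapoint to one of the $K$ subproblems.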

Abstract

This paper proposes a machine learning pre-sort stage for traditional supervised learning using Tsetlin Machines. Initially, K data-points are identified from the dataset using an expedited genetic algorithm to solve the maximum dispersion problem. These are then used as the initial placement to run the K-Medoid clustering algorithm. Finally, an expedited genetic algorithm is used to align K independent Tsetlin Machines by maximising Hamming distance. For MNIST-level classification problems, results demonstrate up to a 10% improvement in accuracy, an approximately 383X reduction in training time and an approximately 86X reduction in inference time.
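The clustering stage described above can be sketched as a plain K-Medoid loop over bit-packed datapoints. This is an illustration under stated assumptions, not the paper's implementation: the initial medoids are simply passed in (the paper obtains them from its genetic maximum-dispersion step, which is not reproduced here), datapoints are Python integers with one bit per feature, and the names `k_medoid`, `assign`, and `max_iter` are hypothetical.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Hamming distance between two bit-packed vectors."""
    return bin(a ^ b).count("1")


def assign(points: List[int], medoids: List[int]) -> List[List[int]]:
    """Put every point in the cluster of its nearest medoid."""
    clusters: List[List[int]] = [[] for _ in medoids]
    for p in points:
        k = min(range(len(medoids)), key=lambda j: hamming(p, medoids[j]))
        clusters[k].append(p)
    return clusters


def k_medoid(points: List[int], initial_medoids: List[int],
             max_iter: int = 20) -> Tuple[List[int], List[List[int]]]:
    """Alternate assignment and medoid update until the medoids stop moving."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        clusters = assign(points, medoids)
        new_medoids = []
        for k, members in enumerate(clusters):
            if not members:
                # Keep the old medoid if its cluster is empty.
                new_medoids.append(medoids[k])
                continue
            # New medoid: the member with the smallest total distance
            # to all other members of the same cluster.
            new_medoids.append(
                min(members, key=lambda c: sum(hamming(c, q) for q in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    # Recompute the assignment so clusters match the returned medoids.
    return medoids, assign(points, medoids)


if __name__ == "__main__":
    data = [0b00000000, 0b00000001, 0b00000010,
            0b11111111, 0b11111110, 0b11111101]
    medoids, clusters = k_medoid(data, [0b00000001, 0b11111110])
    print(medoids, clusters)
```

Because a medoid is always an actual datapoint, every quantity stays a binary vector and the loop needs nothing beyond XOR and popcount. In the pipeline described in the abstract, each resulting cluster would then feed one of the K independent Tsetlin Machines.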

Paper Structure (18 sections, 12 figures, 3 tables)

Figures (12)

  • Figure 1: Tsetlin Automaton
  • Figure 2: Tsetlin Machine Clause
  • Figure 3: Mono Tsetlin Machine Architecture
  • Figure 4: Multi-Class Tsetlin Machine Architecture
  • Figure 5: Pre-Sort Architecture
  • ...and 7 more figures