Pre-Sorted Tsetlin Machine (The Genetic K-Medoid Method)
Jordan Morris
TL;DR
The paper tackles slow training and inference in Tsetlin Machines by introducing a three-stage pre-sort pipeline that partitions data into $K$ dispersed subproblems. It first selects $K$ representative datapoints per class via Binary Maximum Dispersion, then forms $K$ clusters with Binary K-Medoid using Hamming distance, and finally aligns these medoids across classes with an expedited Genetic Class Alignment to create $K$ independent Tsetlin Machines. Empirical results on MNIST-style datasets show accuracy improvements of up to $10.0\,\text{pp}$ and large reductions in both training time (up to $383\times$) and inference time (up to $86\times$), with potential for one-shot inference and a reduced memory footprint in edge scenarios. The approach relies on hardware-friendly logical primitives (AND, XNOR, popcount), which makes it suitable for scalable deployment and hardware optimization.
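To make the role of the logical primitives concrete, the following is a minimal sketch of how Hamming distance between two binary datapoints reduces to a bitwise operation followed by a popcount. It assumes each datapoint is packed into a Python integer; the function names `popcount`, `hamming_distance`, and `xnor_similarity` are illustrative and are not taken from the paper.

```python
def popcount(x: int) -> int:
    """Count the set bits of a non-negative integer."""
    return bin(x).count("1")


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where a and b differ (XOR, then popcount)."""
    return popcount(a ^ b)


def xnor_similarity(a: int, b: int, n_bits: int) -> int:
    """Number of bit positions where a and b agree (XNOR, then popcount).

    Python integers are unbounded, so the XNOR result is masked to n_bits.
    """
    mask = (1 << n_bits) - 1
    return popcount(~(a ^ b) & mask)
```

For two `n_bits`-wide vectors, `xnor_similarity(a, b, n_bits)` equals `n_bits - hamming_distance(a, b)`, so either form can serve as the distance primitive, depending on which gate is cheaper on the target hardware.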
Abstract
This paper proposes a machine learning pre-sort stage for traditional supervised learning using Tsetlin Machines. Initially, K data points are identified from the dataset using an expedited genetic algorithm to solve the maximum dispersion problem. These are then used as the initial placement to run the K-Medoid clustering algorithm. Finally, an expedited genetic algorithm is used to align K independent Tsetlin Machines by maximising Hamming distance. For MNIST-level classification problems, results demonstrate up to 10% improvement in accuracy, an approximately 383X reduction in training time, and an approximately 86X reduction in inference time.
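The first two stages described above can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the dispersion step here uses a simple greedy farthest-point heuristic as a stand-in for the expedited genetic algorithm, datapoints are assumed to be binary vectors packed into Python integers, and all function names are hypothetical.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Hamming distance between two bit-packed integers."""
    return bin(a ^ b).count("1")


def greedy_dispersed_seeds(points: List[int], k: int) -> List[int]:
    """Pick k mutually distant point indices.

    Stand-in for the genetic maximum-dispersion step: start from index 0
    and repeatedly add the point farthest from its nearest chosen seed.
    """
    seeds = [0]
    while len(seeds) < k:
        candidates = (i for i in range(len(points)) if i not in seeds)
        best = max(
            candidates,
            key=lambda i: min(hamming(points[i], points[s]) for s in seeds),
        )
        seeds.append(best)
    return seeds


def assign(points: List[int], medoids: List[int]) -> List[List[int]]:
    """Assign every point index to the cluster of its nearest medoid."""
    clusters: List[List[int]] = [[] for _ in medoids]
    for i, p in enumerate(points):
        nearest = min(
            range(len(medoids)),
            key=lambda c: hamming(p, points[medoids[c]]),
        )
        clusters[nearest].append(i)
    return clusters


def k_medoids(
    points: List[int], seeds: List[int], max_iter: int = 100
) -> Tuple[List[int], List[List[int]]]:
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(seeds)
    for _ in range(max_iter):
        clusters = assign(points, medoids)
        # New medoid: the member minimising total distance to its cluster.
        # An empty cluster keeps its previous medoid.
        new_medoids = [
            min(members, key=lambda m: sum(hamming(points[m], points[j]) for j in members))
            if members
            else medoids[c]
            for c, members in enumerate(clusters)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)
```

As a usage example, with `points = [0b000000, 0b000001, 0b000011, 0b111111, 0b111110, 0b111100]` and `k = 2`, the greedy step selects indices 0 and 3 as seeds, and `k_medoids` converges to medoids at indices 1 and 4 with clusters `[[0, 1, 2], [3, 4, 5]]`. In the pipeline the abstract describes, each resulting cluster would then be handed to its own Tsetlin Machine.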
