Omni TM-AE: A Scalable and Interpretable Embedding Model Using the Full Tsetlin Machine State Space

Ahmed K. Kadhim, Lei Jiao, Rishad Shafik, Ole-Christoffer Granmo

TL;DR

Omni TM-AE tackles interpretability and reusability of word embeddings by leveraging the full state space of the Tsetlin Machine, including literals traditionally excluded from clause formation. It enables single-phase training and produces interpretable, clause-based embeddings whose components derive from signed sums over all literals in the state matrix, including negations. Empirically, Omni TM-AE delivers competitive performance across semantic similarity, sentiment classification, and clustering tasks, outperforming several traditional embeddings and approaching contextual models like ELMo and BERT in some settings. This work demonstrates a principled balance between performance, scalability, and interpretability, offering a transparent alternative for NLP embeddings with diagnostic tractability.

Abstract

The increasing complexity of large-scale language models has amplified concerns regarding their interpretability and reusability. While traditional embedding models like Word2Vec and GloVe offer scalability, they lack transparency and often behave as black boxes. Conversely, interpretable models such as the Tsetlin Machine (TM) have shown promise in constructing explainable learning systems, though they previously faced limitations in scalability and reusability. In this paper, we introduce Omni Tsetlin Machine AutoEncoder (Omni TM-AE), a novel embedding model that fully exploits the information contained in the TM's state matrix, including literals previously excluded from clause formation. This method enables the construction of reusable, interpretable embeddings through a single training phase. Extensive experiments across semantic similarity, sentiment classification, and document clustering tasks show that Omni TM-AE performs competitively with and often surpasses mainstream embedding models. These results demonstrate that it is possible to balance performance, scalability, and interpretability in modern Natural Language Processing (NLP) systems without resorting to opaque architectures.

Paper Structure

This paper contains 16 sections, 3 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: (Left) State-transition diagram for a single Tsetlin automaton, illustrating the penalty/reward transitions between state 1 (“max-forgotten”) and state $2N$ (“max-memorizing”). (Right) Clause construction space: the $x$-axis indexes literals (vocabulary features and their negations), while the $y$-axis records each literal's current automaton state. Literals whose state lies above the threshold $N$ are selected to constitute the clause.
  • Figure 2: In the TM-AE structure, processing each example within an epoch consists of three phases: (1) Preparation, where input vectors are generated from documents and encoded into binary form; (2) Evaluation, where input $X$ is processed through a state matrix, weight matrix, and logical conditions to determine output predictions; and (3) Update, where the TAs in a clause adjust their states to include or exclude literals based on whether the clause matches the input data.
  • Figure 3: Visualization of a trained clause for the target class “happy,” where the word “fox” is the only literal exceeding the threshold state ($N = 128$, red line). Additional literals are present at lower levels and provide latent contextual information.
  • Figure 4: t-SNE visualization of word embeddings generated by the Omni model for 130 words grouped into 13 semantic clusters.
  • Figure 5: Distribution of literal states after training with a large vocabulary of 40,000 tokens (80,000 literals including negations). Within a few epochs, the model learns to reduce the literal states of original tokens to improve clause discrimination.
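
The thresholding rule described in the Figure 1 and Figure 3 captions can be illustrated with a minimal Python sketch. It assumes the standard two-action Tsetlin automaton: states run from 1 to $2N$, a literal is included in the clause when its state is above $N$, a reward pushes the state away from the decision boundary, and a penalty pushes it toward (and possibly across) the boundary. The class and function names (`TsetlinAutomaton`, `select_clause_literals`) are hypothetical and chosen for illustration; this is not the authors' implementation, and it omits the feedback rules that decide when rewards and penalties are issued.

```python
from dataclasses import dataclass


@dataclass
class TsetlinAutomaton:
    """Illustrative two-action automaton with states 1..2n (assumed layout)."""

    n: int      # threshold N; states 1..n exclude the literal, n+1..2n include it
    state: int  # current state, expected to lie in 1..2n

    def includes_literal(self) -> bool:
        # A literal is part of the clause only when its state is above N.
        return self.state > self.n

    def reward(self) -> None:
        # Reinforce the current action: move away from the decision boundary,
        # saturating at state 1 (exclude side) or state 2n (include side).
        if self.includes_literal():
            self.state = min(self.state + 1, 2 * self.n)
        else:
            self.state = max(self.state - 1, 1)

    def penalize(self) -> None:
        # Weaken the current action: move one step toward the boundary,
        # crossing it when the automaton sits right next to it.
        if self.includes_literal():
            self.state -= 1
        else:
            self.state += 1


def select_clause_literals(states: list[int], n: int) -> list[int]:
    """Return the indices of literals whose state lies above the threshold n."""
    return [index for index, state in enumerate(states) if state > n]


if __name__ == "__main__":
    # Hypothetical literal states with N = 128, as in the Figure 3 caption.
    print(select_clause_literals([1, 128, 129, 256], n=128))  # → [2, 3]
```

In this sketch, literals at states such as 1 or 128 are left out of the clause, yet their states remain available in the state matrix; this below-threshold information is what Omni TM-AE is described as exploiting.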