Harmonic Token Projection (HTP): A Vocabulary-Free, Training-Free, Deterministic, and Reversible Embedding Methodology

Tcharlies Schmitz

TL;DR

Harmonic Token Projection (HTP) is a vocabulary-free, training-free, deterministic, and reversible embedding framework that encodes tokens through harmonic projections of their Unicode representations. Each token is mapped to an integer $N_t$ in base $B=2^{16}$, decomposed into modular residues $r_i$, and projected onto the unit circle; the resulting transform $E(t)$ is bijective and can be inverted via the Chinese Remainder Theorem. Sentence representations are obtained through ITF-weighted harmonic pooling, which gives efficient, corpus-independent semantics with sub-millisecond CPU latency. On STS-B and multilingual STS-B, HTP is competitive with traditional baselines and comes close to lower-range transformer models, offering a transparent analytic alternative that complements data-driven approaches.
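
The token-level pipeline summarized above (integer in base $2^{16}$, modular residues, unit-circle projection, inversion via the Chinese Remainder Theorem) can be illustrated with a minimal Python sketch. It is not the paper's reference implementation. Several details are assumptions made only for illustration, because this summary does not specify them: the token is read as UTF-16 code units, the moduli are the smallest primes whose product exceeds $B^{L}$ for a chosen maximum token length $L$, and each residue is mapped to the angle $\theta_i = 2\pi r_i / m_i$. The function names are hypothetical.

```python
import math

BASE_BITS = 16  # B = 2**16, as stated in the text


def token_to_int(token: str) -> int:
    """Read the token's UTF-16 code units as digits of one base-2**16 integer.

    Assumption: UTF-16 (big-endian) code units serve as the base-B digits.
    """
    return int.from_bytes(token.encode("utf-16-be"), "big")


def int_to_token(n: int) -> str:
    """Inverse of token_to_int (assumes the token does not start with U+0000)."""
    n_units = (n.bit_length() + BASE_BITS - 1) // BASE_BITS
    return n.to_bytes(2 * n_units, "big").decode("utf-16-be")


def choose_moduli(max_units: int) -> list:
    """Hypothetical moduli: smallest primes whose product exceeds B**max_units."""
    bound = 1 << (BASE_BITS * max_units)
    moduli, product, candidate = [], 1, 2
    while product <= bound:
        # moduli holds every prime below candidate, so this is a primality test
        if all(candidate % p for p in moduli):
            moduli.append(candidate)
            product *= candidate
        candidate += 1
    return moduli


def embed(token: str, moduli: list) -> list:
    """Map each residue r_i = N_t mod m_i to a point (cos, sin) on the unit circle."""
    n = token_to_int(token)
    vec = []
    for m in moduli:
        theta = 2.0 * math.pi * (n % m) / m  # assumed angle convention
        vec.extend((math.cos(theta), math.sin(theta)))
    return vec


def invert(vec: list, moduli: list) -> str:
    """Recover the residues from the angles, then N_t via the CRT."""
    residues = []
    for i, m in enumerate(moduli):
        theta = math.atan2(vec[2 * i + 1], vec[2 * i])  # angle in [-pi, pi]
        residues.append(round(theta * m / (2.0 * math.pi)) % m)
    total = math.prod(moduli)
    n = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        n += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return int_to_token(n % total)


moduli = choose_moduli(max_units=8)  # covers tokens of up to 8 UTF-16 code units
vector = embed("harmonic", moduli)
print(len(vector), invert(vector, moduli))
```

In this sketch the round trip is exact only while the product of the moduli exceeds $N_t$ and floating-point error in each angle stays well below $\pi / m_i$, which is why small moduli are used here.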

Abstract

This paper introduces the Harmonic Token Projection (HTP), a reversible and deterministic framework for generating text embeddings without training, vocabularies, or stochastic parameters. Unlike neural embeddings that rely on statistical co-occurrence or optimization, HTP encodes each token analytically as a harmonic trajectory derived from its Unicode integer representation, establishing a bijective and interpretable mapping between discrete symbols and continuous vector space. The harmonic formulation provides phase-coherent projections that preserve both structure and reversibility, enabling semantic similarity estimation from purely geometric alignment. Experimental evaluation on the Semantic Textual Similarity Benchmark (STS-B) and its multilingual extension shows that HTP achieves a Spearman correlation of $\rho = 0.68$ in English, maintaining stable performance across ten languages with negligible computational cost and sub-millisecond latency per sentence pair. This demonstrates that meaningful semantic relations can emerge from deterministic geometry, offering a transparent and efficient alternative to data-driven embeddings.

Keywords: Harmonic Token Projection, reversible embedding, deterministic encoding, semantic similarity, multilingual representation.

Paper Structure

This paper contains 16 sections, 13 equations, 5 tables.