Polynomial Surrogate Training for Differentiable Ternary Logic Gate Networks

Sai Sandeep Damera, Ryan Matheu, Aniruddh G. Puranic, John S. Baras

TL;DR

Polynomial Surrogate Training (PST) represents each ternary neuron as a degree-$(2,2)$ polynomial with 9 learnable coefficients. The authors prove that the gap between the trained network and its discretized logic circuit is bounded by a data-independent commitment loss that vanishes at convergence.

Abstract

Differentiable logic gate networks (DLGNs) learn compact, interpretable Boolean circuits via gradient-based training, but all existing variants are restricted to the 16 two-input binary gates. Extending DLGNs to ternary Kleene logic ($K_3$) is desirable: in the resulting differentiable ternary logic gate networks (DTLGNs), the UNKNOWN state enables principled abstention under uncertainty. However, the support set of potential gates per neuron explodes to $19{,}683$, making the established softmax-over-gates training approach intractable. We introduce Polynomial Surrogate Training (PST), which represents each ternary neuron as a degree-$(2,2)$ polynomial with 9 learnable coefficients (a $2{,}187\times$ parameter reduction), and we prove that the gap between the trained network and its discretized logic circuit is bounded by a data-independent commitment loss that vanishes at convergence. Scaling experiments from 48K to 512K neurons on CIFAR-10 demonstrate that this hardening gap contracts with overparameterization. Ternary networks train $2$--$3\times$ faster than binary DLGNs and discover true ternary gates that are functionally diverse. On synthetic and tabular tasks we find that the UNKNOWN output acts as a Bayes-optimal uncertainty proxy, enabling selective prediction in which ternary circuits surpass binary accuracy once low-confidence predictions are filtered. More broadly, PST establishes a general polynomial-surrogate methodology whose parameterization cost grows only quadratically with logic valence, opening the door to many-valued differentiable logic.
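As a rough illustration of the parameterization described in the abstract, the following sketch evaluates a degree-$(2,2)$ polynomial neuron and checks the gate counts. It is not the authors' code. The trit encoding (FALSE $=-1$, UNKNOWN $=0$, TRUE $=+1$), the monomial basis, and the names `poly_neuron`, `interpolate_gate`, and `kleene_and` are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

# Assumption (not stated in the text above): trits are encoded as
# FALSE = -1, UNKNOWN = 0, TRUE = +1.
TRITS = (-1, 0, 1)

# Inverse of the 1-D Vandermonde matrix on nodes (-1, 0, 1) for the
# monomial basis (1, x, x^2):
#   c0 = f(0), c1 = (f(1) - f(-1)) / 2, c2 = (f(-1) + f(1)) / 2 - f(0).
V1_INV = [
    [Fraction(0), Fraction(1), Fraction(0)],
    [Fraction(-1, 2), Fraction(0), Fraction(1, 2)],
    [Fraction(1, 2), Fraction(-1), Fraction(1, 2)],
]


def poly_neuron(w, a, b):
    """Evaluate p_w(a, b) = sum over m, n in {0, 1, 2} of w[m][n] * a^m * b^n."""
    return sum(w[m][n] * a**m * b**n for m in range(3) for n in range(3))


def interpolate_gate(gate):
    """Return the 3x3 coefficients of the unique degree-(2,2) polynomial
    that agrees with `gate` on all nine trit input pairs."""
    table = [[gate(a, b) for b in TRITS] for a in TRITS]
    return [
        [
            sum(V1_INV[m][i] * table[i][j] * V1_INV[n][j]
                for i in range(3) for j in range(3))
            for n in range(3)
        ]
        for m in range(3)
    ]


def kleene_and(a, b):
    """Strong Kleene conjunction: the minimum under FALSE < UNKNOWN < TRUE."""
    return min(a, b)


# Nine coefficients reproduce the gate exactly on the ternary grid.
w_and = interpolate_gate(kleene_and)
matches = all(
    poly_neuron(w_and, a, b) == kleene_and(a, b)
    for a, b in product(TRITS, repeat=2)
)

# Gate counts: 3 possible outputs for each of the 3 * 3 input pairs.
num_ternary_gates = 3 ** (3 * 3)
num_coeffs = 3 * 3
reduction = num_ternary_gates // num_coeffs
```

Here `num_ternary_gates` is $3^9 = 19{,}683$ and `reduction` is $2{,}187$, matching the figures quoted in the abstract. A softmax-over-gates neuron would need one logit per candidate gate, whereas the polynomial neuron needs only the nine entries of `w`.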

Paper Structure

This paper contains 58 sections, 3 theorems, 22 equations, 13 figures, 13 tables, 1 algorithm.

Key Result

Theorem 1 (PST Hardening Gap Bound)

Let $\mathcal{N}$ be a $q$-logic PST network with $N$ neurons. Let $\mathbf{t}_j = [p_{\mathbf{w}_j}(a,b)]_{(a,b) \in \mathcal{Q}^2}$ be neuron $j$'s soft truth table and $\bar{\mathbf{t}}_j \in \Lambda_q$ its hardened truth table. Then

Figures (13)

  • Figure 1: End-to-end comparison of binary DLGN training (top) and the proposed Polynomial Surrogate Training pipeline for ternary logic gate networks (bottom). Input encoding maps normalized features to bits/trits using temperature thresholding. Training: binary DLGNs learn softmax distributions over 16 gates per neuron; PST instead learns 9 polynomial coefficients per neuron, parameterising the full space of $19{,}683$ ternary gates directly. Hardening: binary networks select the argmax gate; PST evaluates each polynomial on the $3{\times}3$ ternary grid, rounds to the nearest valid truth table, and recovers discrete gate coefficients via $\mathbf{w}_j^{\mathrm{hard}} = \mathbf{V}^{-1}\operatorname{round}_{\mathcal{T}}(\mathbf{V}\mathbf{w}_j)$. Inference: both pipelines produce logic circuits, which can be taped out as ultra-efficient ASICs for inference.
  • Figure 2: Per-class circuit accuracy (vlarge and huge). Blue: DLGN. Orange: TLGN. Solid: vlarge. Hatched: huge. TLGN-huge recovers on visually complex classes (bird $+32.3$ pp, cat $+37.2$ pp, frog $+26.9$ pp vs. vlarge), surpassing DLGN on bird and cat.
  • Figure 3: Ternary $U$ tracks Bayes-optimal uncertainty. Top row: Bayes posterior entropy, ternary $U$ density, and their overlay for Gaussians (separation${}=1.5$). Bottom row: as class separation increases from 0.5 to 3.0$\sigma$, $U$ fraction and Bayes error decrease in lockstep. Binary accuracy (red) matches the Bayes rate at every separation, confirming that neither architecture is capacity-limited; the ternary circuit chooses to abstain where the posterior is ambiguous.
  • Figure 4: Training loss curves (small--large + deeper). $2 \times 2$ grid, log-scale $y$-axis. Blue: DLGN. Dashed orange: TLGN task loss. Solid orange: TLGN total loss (task + $\lambda \cdot$ commitment). TLGN converges 1--2 orders of magnitude below DLGN at medium and large scales, while both remain comparable at small scale.
  • Figure 5: Training loss curves (vlarge and huge). Left: vlarge (192K neurons). Right: huge (512K neurons). At huge scale, DLGN finally achieves clean convergence (0.015), eliminating the noisy-plateau pathology. TLGN-huge reaches 0.004 (task), the lowest of any model.
  • ...and 8 more figures
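
The hardening step in the Figure 1 caption, $\mathbf{w}_j^{\mathrm{hard}} = \mathbf{V}^{-1}\operatorname{round}_{\mathcal{T}}(\mathbf{V}\mathbf{w}_j)$, can be sketched for a single neuron as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes trits are encoded as $-1, 0, +1$, that $\mathbf{V}$ is the Vandermonde matrix of the monomial basis on the $3{\times}3$ grid (applied here in its separable form), and that every table in $\{-1,0,1\}^9$ is valid, so that rounding to the nearest valid truth table reduces to entrywise rounding. All function names are hypothetical.

```python
from fractions import Fraction

# Assumed trit encoding: FALSE = -1, UNKNOWN = 0, TRUE = +1.
TRITS = (-1, 0, 1)

# 1-D Vandermonde matrix for the basis (1, x, x^2): V1[i][m] = TRITS[i] ** m.
V1 = [[x**m for m in range(3)] for x in TRITS]

# Its exact inverse, kept as fractions so hardened coefficients are exact.
V1_INV = [
    [Fraction(0), Fraction(1), Fraction(0)],
    [Fraction(-1, 2), Fraction(0), Fraction(1, 2)],
    [Fraction(1, 2), Fraction(-1), Fraction(1, 2)],
]


def soft_truth_table(w):
    """Apply V: evaluate the polynomial with coefficients w on the 3x3 grid."""
    return [
        [
            sum(V1[i][m] * w[m][n] * V1[j][n]
                for m in range(3) for n in range(3))
            for j in range(3)
        ]
        for i in range(3)
    ]


def coeffs_from_table(table):
    """Apply V^{-1}: recover the 3x3 coefficients from a 3x3 truth table."""
    return [
        [
            sum(V1_INV[m][i] * table[i][j] * V1_INV[n][j]
                for i in range(3) for j in range(3))
            for n in range(3)
        ]
        for m in range(3)
    ]


def round_to_trit(x):
    """Nearest trit to x; exact ties go to the smaller trit (arbitrary choice)."""
    return min(TRITS, key=lambda t: abs(x - t))


def harden(w):
    """Return (hardened truth table, hardened coefficients) for one neuron."""
    table = soft_truth_table(w)
    hard_table = [[round_to_trit(v) for v in row] for row in table]
    return hard_table, coeffs_from_table(hard_table)


# Example: start from exact Kleene AND (min), perturb every coefficient by
# 0.05, and check that hardening recovers the original gate.
and_table = [[min(a, b) for b in TRITS] for a in TRITS]
w_and = coeffs_from_table(and_table)
w_noisy = [[float(c) + 0.05 for c in row] for row in w_and]
t_hard, w_hard = harden(w_noisy)
```

In this example the perturbation moves each soft truth-table entry by at most $0.05 \cdot 9 = 0.45$, which is below the rounding threshold of $0.5$, so `t_hard` equals `and_table` and `w_hard` equals `w_and`.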

Theorems & Definitions (8)

  • theorem 1: PST Hardening Gap Bound
  • proof
  • remark 1: Scaling and Data-Independence
  • definition 1: Truth-Table Lattice
  • proposition 1: Covering Radius
  • proof
  • proposition 2: PST Architectural Scaling
  • remark 2