Differentiable Weightless Controllers: Learning Logic Circuits for Continuous Control

Fabian Kresse, Christoph H. Lampert

TL;DR

This work introduces Differentiable Weightless Controllers (DWCs), a differentiable, light-weight logic-based alternative to neural networks for continuous control, designed for FPGA deployment. DWCs encode real-valued observations with thermometer encoding, process them through sparse boolean LUTs, and produce continuous actions with a learnable head, enabling few-cycle latency and nanojoule energy per action. Trained with gradient-based RL and surrogate gradients, DWCs achieve parity with full-precision and quantized baselines on most MuJoCo tasks, with capacity limitations mainly evident in HalfCheetah. The approach also offers interpretability via sparse input connections and threshold analyses, while demonstrating substantial hardware efficiency and potential for formal verification in future work.

Abstract

We investigate whether continuous-control policies can be represented and learned as discrete logic circuits instead of continuous neural networks. We introduce Differentiable Weightless Controllers (DWCs), a symbolic-differentiable architecture that maps real-valued observations to actions using thermometer-encoded inputs, sparsely connected boolean lookup-table layers, and lightweight action heads. DWCs can be trained end-to-end by gradient-based techniques, yet compile directly into FPGA-compatible circuits with few- or even single-clock-cycle latency and nanojoule-level energy cost per action. Across five MuJoCo benchmarks, including high-dimensional Humanoid, DWCs achieve returns competitive with weight-based policies (full precision or quantized neural networks), matching performance on four tasks and isolating network capacity as the key limiting factor on HalfCheetah. Furthermore, DWCs exhibit structurally sparse and interpretable connectivity patterns, enabling a direct inspection of which input thresholds influence control decisions.
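As a rough illustration of the thermometer-encoded inputs mentioned above, the sketch below turns one real-valued observation into a bitvector with one bit per threshold. The function name `thermometer_encode`, the strict `>` comparison, and the evenly spaced thresholds are assumptions made for this example; the paper's actual threshold placement (see Figure 2) may differ.

```python
from typing import List, Sequence


def thermometer_encode(x: float, thresholds: Sequence[float]) -> List[int]:
    """Return one bit per threshold: 1 if x exceeds that threshold, else 0.

    With sorted thresholds the result is a run of 1s followed by a run of 0s,
    like the mercury column of a thermometer.
    """
    return [1 if x > t else 0 for t in thresholds]


# Hypothetical thresholds, evenly spaced on [-1, 1] (an assumption).
thresholds = [-0.75, -0.25, 0.25, 0.75]

bits = thermometer_encode(0.3, thresholds)
print(bits)  # [1, 1, 1, 0]
```

Larger observations switch on more bits, so each bit answers a single question of the form "is this input above threshold t?", which is what allows the downstream logic to be read in terms of input thresholds.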

Paper Structure

This paper contains 25 sections, 7 equations, 11 figures, 10 tables.

Figures (11)

  • Figure 1: Differentiable Weightless Controllers (DWCs): real-valued observations are thermometer-encoded into bitvectors, processed by two layers of multi-input boolean-output lookup tables (here drawn with 2 inputs), aggregated by group summation, and mapped via per-action memory lookups to final action values.
  • Figure 2: DWC thermometer threshold positions.
  • Figure 3: Mean return and standard deviation over ten models across the training steps, with ten evaluation episodes per data point and model. Except for HalfCheetah (see main text), training trajectories are comparable to the FP baseline.
  • Figure 4: Policy returns for FP, Quant and DWCs with varying LUT layer widths. In general, models with as few as 256 to 512 LUTs per layer already achieve returns on par with the FP baseline. Only for HalfCheetah do we observe a monotonically increasing median return with increasing LUT layer width. * indicates a special high-capacity model with 16k LUTs per layer and 255 bits per input dimension; see the ablation section (sec:ablation).
  • Figure 5: Reward performance under injected observation noise with varying noise level $\sigma$, for floating-point (FP) policies, QAT policies from kresse2025learningquantizedcontinuouscontrollers, and our DWCs on MuJoCo tasks. Bands show one standard deviation across trained models. The quantized models and DWCs perform better than, or on par with, the FP baseline under noise injection, except on Humanoid, where the smaller DWCs show reduced rewards at larger noise levels.
  • ...and 6 more figures
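
The inference path described in the Figure 1 caption (bitvector, two lookup-table layers, group summation, per-action memory lookup) can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `lut_layer`, `group_sum` and `dwc_forward`, the wiring, the truth tables, and the action tables are all hypothetical, and the LUTs read 2 inputs only because Figure 1 is drawn that way.

```python
from typing import List, Sequence, Tuple

# One LUT = (indices of the bits it reads, truth table with 2**k entries).
Lut = Tuple[Sequence[int], Sequence[int]]


def lut_layer(bits: Sequence[int], luts: Sequence[Lut]) -> List[int]:
    """Evaluate each LUT: its input bits form an address into its truth table."""
    out = []
    for wires, table in luts:
        address = 0
        for w in wires:
            address = (address << 1) | bits[w]
        out.append(table[address])
    return out


def group_sum(bits: Sequence[int], n_groups: int) -> List[int]:
    """Split the bits into equal contiguous groups and count the 1s in each."""
    size = len(bits) // n_groups
    return [sum(bits[g * size:(g + 1) * size]) for g in range(n_groups)]


def dwc_forward(bits, layer1, layer2, action_tables) -> List[float]:
    """Two LUT layers, then group summation, then one table lookup per action."""
    hidden = lut_layer(bits, layer1)
    out = lut_layer(hidden, layer2)
    counts = group_sum(out, len(action_tables))
    return [table[c] for table, c in zip(action_tables, counts)]


# Hypothetical 2-input truth tables, indexed by the address (a << 1) | b.
AND, OR, XOR = [0, 0, 0, 1], [0, 1, 1, 1], [0, 1, 1, 0]

layer1 = [((0, 1), AND), ((2, 3), AND), ((0, 3), XOR), ((1, 2), OR)]
layer2 = [((0, 1), OR), ((2, 3), AND), ((1, 2), AND), ((0, 3), XOR)]
# Each action owns a group of 2 output bits, so its count is 0, 1 or 2.
action_tables = [[-1.0, 0.0, 1.0], [-0.5, 0.0, 0.5]]

actions = dwc_forward([1, 1, 1, 0], layer1, layer2, action_tables)
print(actions)  # [1.0, -0.5]
```

Every step here is an index computation or a table read, which gives some intuition for why such a policy maps naturally onto FPGA lookup tables. Training is not shown; the sketch covers the forward pass only.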