Differentiable Weightless Controllers: Learning Logic Circuits for Continuous Control
Fabian Kresse, Christoph H. Lampert
TL;DR
This work introduces Differentiable Weightless Controllers (DWCs), a differentiable, lightweight, logic-based alternative to neural networks for continuous control, designed for FPGA deployment. DWCs encode real-valued observations with thermometer encoding, process them through sparsely connected boolean lookup tables (LUTs), and produce continuous actions with a learnable head, enabling few-cycle latency and nanojoule-level energy per action. Trained with gradient-based RL and surrogate gradients, DWCs match full-precision and quantized neural-network baselines on four of five MuJoCo tasks; on HalfCheetah, limited network capacity is the main constraint. The approach also offers interpretability via sparse input connections and threshold analyses, delivers substantial hardware efficiency, and leaves formal verification as a promising direction for future work.
Abstract
We investigate whether continuous-control policies can be represented and learned as discrete logic circuits instead of continuous neural networks. We introduce Differentiable Weightless Controllers (DWCs), a symbolic-differentiable architecture that maps real-valued observations to actions using thermometer-encoded inputs, sparsely connected boolean lookup-table layers, and lightweight action heads. DWCs can be trained end-to-end by gradient-based techniques, yet compile directly into FPGA-compatible circuits with few- or even single-clock-cycle latency and nanojoule-level energy cost per action. Across five MuJoCo benchmarks, including high-dimensional Humanoid, DWCs achieve returns competitive with weight-based policies (full-precision or quantized neural networks), matching performance on four tasks and isolating network capacity as the key limiting factor on HalfCheetah. Furthermore, DWCs exhibit structurally sparse and interpretable connectivity patterns, enabling direct inspection of which input thresholds influence control decisions.
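To make the three stages named above concrete, the following is a minimal inference-only sketch of a thermometer encoder, a sparsely wired boolean lookup table, and an action head. It is an illustration under assumptions, not the authors' implementation: the function names, the strict-inequality threshold rule, the bit ordering of the LUT address, the popcount-based affine head, and all numeric values are hypothetical, and training with surrogate gradients is not shown.

```python
from typing import List, Sequence


def thermometer_encode(x: float, thresholds: Sequence[float]) -> List[int]:
    """Bit i is 1 when x exceeds thresholds[i] (thresholds sorted ascending)."""
    return [1 if x > t else 0 for t in thresholds]


def lut_forward(bits: Sequence[int], wiring: Sequence[int], table: Sequence[int]) -> int:
    """Evaluate one boolean LUT: gather the wired input bits into an address, then look it up."""
    address = 0
    for source in wiring:
        # The first wired bit becomes the most significant bit of the address.
        address = (address << 1) | bits[source]
    return table[address]


def action_head(lut_outputs: Sequence[int], scale: float, offset: float) -> float:
    """Hypothetical head: affine map of the number of active output bits."""
    return scale * sum(lut_outputs) + offset


# Illustrative values only; none are taken from the paper.
thresholds = [-0.5, 0.0, 0.5]
bits = thermometer_encode(0.2, thresholds)  # 0.2 exceeds the first two thresholds

# Two 2-input LUTs, each wired to a sparse subset of the encoded bits:
# the first computes AND of bits 0 and 1, the second XOR of bits 1 and 2.
luts = [((0, 1), [0, 0, 0, 1]), ((1, 2), [0, 1, 1, 0])]
outputs = [lut_forward(bits, wiring, table) for wiring, table in luts]

action = action_head(outputs, scale=0.5, offset=-0.5)
```

Because each LUT reads only the few bits listed in its wiring, one can see directly which input thresholds an output depends on, which is the kind of inspection the abstract refers to.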
