Chimera: Neuro-Symbolic Attention Primitives for Trustworthy Dataplane Intelligence
Rong Fu, Wenxin Zhang, Xiaowen Ma, Kun Liu, Wangyu Wu, Ziyu Kong, Jia Yee Tan, Tailong Luo, Xianda Li, Zeli Su, Youjin Wang, Yongtai Liu, Simon Fong
TL;DR
Chimera tackles trustworthy in-network inference by mapping Transformer-style attention and neuro-symbolic reasoning onto programmable dataplane primitives, enabling line-rate, auditable decisions on commodity switches. It introduces kernelized linear attention with a two-layer key selection (a local SRAM window plus static TCAM indices) and a cascade fusion that enforces hard symbolic vetoes while retaining neural expressivity. A two-timescale mapping protocol combines fast dataplane adaptations with slow control-plane re-clustering, ensuring stability and minimal table churn under budgeted SRAM/TCAM resources. Empirical results on public traffic datasets show high classification and anomaly-detection performance with sub-microsecond latency and orders-of-magnitude throughput gains over CPU/GPU baselines, and ablations confirm the importance of the architectural choices. Overall, Chimera demonstrates that neuro-symbolic primitives can deliver high-fidelity, trustworthy inference within realistic dataplane budgets and with multi-pipeline scalability.
Abstract
Deploying expressive learning models directly on programmable dataplanes promises line-rate, low-latency traffic analysis but remains hindered by strict hardware constraints and the need for predictable, auditable behavior. Chimera introduces a principled framework that maps attention-oriented neural computations and symbolic constraints onto dataplane primitives, enabling trustworthy inference within the match-action pipeline. Chimera combines a kernelized, linearized attention approximation with a two-layer key-selection hierarchy and a cascade fusion mechanism that enforces hard symbolic guarantees while preserving neural expressivity. The design includes a hardware-aware mapping protocol and a two-timescale update scheme that together permit stable, line-rate operation under realistic dataplane budgets. The paper presents the Chimera architecture, a hardware mapping strategy, and empirical evidence showing that neuro-symbolic attention primitives can achieve high-fidelity inference within the resource envelope of commodity programmable switches.
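To make the two central ideas concrete, the following is a minimal software sketch of kernelized linear attention followed by a cascade fusion step with a hard symbolic veto. It is an illustration under stated assumptions, not Chimera's dataplane implementation: the feature map `elu(x) + 1` is a common choice for linear attention and is assumed here, the function names `feature_map`, `linear_attention`, and `cascade_fuse` are hypothetical, the action labels and threshold are placeholders, and the arithmetic uses floating point rather than the fixed-point table lookups a switch would require.

```python
import math


def feature_map(x):
    # Assumed positive feature map phi(x) = elu(x) + 1; the paper's kernel may differ.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention(query, keys, values):
    """Kernelized attention: phi(q)^T (sum_i phi(k_i) v_i^T) / phi(q)^T (sum_i phi(k_i)).

    The two sums are accumulated in one pass over the selected keys, so the
    cost is linear in the number of keys instead of quadratic.
    """
    phi_q = feature_map(query)
    d_v = len(values[0])
    kv = [[0.0] * d_v for _ in phi_q]  # running sum of phi(k) v^T
    z = [0.0] * len(phi_q)             # running sum of phi(k)
    for k, v in zip(keys, values):
        phi_k = feature_map(k)
        for i, pk in enumerate(phi_k):
            z[i] += pk
            for j, vj in enumerate(v):
                kv[i][j] += pk * vj
    denom = sum(pq * zi for pq, zi in zip(phi_q, z))
    return [
        sum(phi_q[i] * kv[i][j] for i in range(len(phi_q))) / denom
        for j in range(d_v)
    ]


def cascade_fuse(neural_score, symbolic_veto, threshold=0.5):
    # Hard veto first: a symbolic rule hit overrides the neural score entirely.
    if symbolic_veto:
        return "drop"
    # Otherwise the neural score decides (labels and threshold are hypothetical).
    return "flag" if neural_score >= threshold else "forward"


# Two selected keys (e.g. one from a local window, one from a static index).
score = linear_attention([0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]], [[1.0], [3.0]])[0]
decision = cascade_fuse(neural_score=0.9, symbolic_veto=True)
```

Because the feature map is strictly positive, the attention output is a convex combination of the value vectors, which keeps the result bounded by the values themselves. Placing the veto check before the neural threshold is what makes the symbolic guarantee hard rather than a weighted vote.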
