Chimera: Neuro-Symbolic Attention Primitives for Trustworthy Dataplane Intelligence

Rong Fu, Wenxin Zhang, Xiaowen Ma, Kun Liu, Wangyu Wu, Ziyu Kong, Jia Yee Tan, Tailong Luo, Xianda Li, Zeli Su, Youjin Wang, Yongtai Liu, Simon Fong

TL;DR

Chimera tackles trustworthy, in-network inference by mapping Transformer-style attention and neuro-symbolic reasoning onto programmable dataplane primitives, enabling line-rate, auditable decisions on commodity switches. It introduces kernelized linear attention with a two-layer key selection (local SRAM window + static TCAM indices) and a cascade fusion that enforces hard symbolic vetoes while retaining neural expressivity. A two-timescale mapping protocol combines fast dataplane adaptations with slow control-plane re-clustering, ensuring stability and minimal table churn under budgeted SRAM/TCAM resources. Empirical results on public traffic datasets show high classification and anomaly-detection performance with sub-microsecond latency and orders-of-magnitude throughput gains over CPU/GPU baselines, with ablations confirming the importance of the architectural choices. Overall, Chimera demonstrates that neuro-symbolic primitives can achieve high-fidelity, trustworthy inference within realistic dataplane budgets while scaling across multiple pipelines.

Abstract

Deploying expressive learning models directly on programmable dataplanes promises line-rate, low-latency traffic analysis but remains hindered by strict hardware constraints and the need for predictable, auditable behavior. Chimera introduces a principled framework that maps attention-oriented neural computations and symbolic constraints onto dataplane primitives, enabling trustworthy inference within the match-action pipeline. Chimera combines a kernelized, linearized attention approximation with a two-layer key-selection hierarchy and a cascade fusion mechanism that enforces hard symbolic guarantees while preserving neural expressivity. The design includes a hardware-aware mapping protocol and a two-timescale update scheme that together permit stable, line-rate operation under realistic dataplane budgets. The paper presents the Chimera architecture, a hardware mapping strategy, and empirical evidence showing that neuro-symbolic attention primitives can achieve high-fidelity inference within the resource envelope of commodity programmable switches.
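
The cascade fusion described above (hard symbolic guarantees combined with neural expressivity) can be illustrated with a minimal sketch. The names `RuleVerdict` and `cascade_fuse`, the blend weight `alpha`, and the convention that a veto forces the score to `veto_score` are all assumptions made for illustration; the summary does not specify the exact fusion rule.

```python
from dataclasses import dataclass


@dataclass
class RuleVerdict:
    """Outcome of the symbolic path (hypothetical structure)."""
    veto: bool         # True if a hard constraint fired
    soft_score: float  # rule-based score in [0, 1], used only when no veto fires


def cascade_fuse(neural_score: float, verdict: RuleVerdict,
                 alpha: float = 0.7, veto_score: float = 1.0) -> float:
    """Hard veto first, soft blend otherwise (assumed fusion rule)."""
    if verdict.veto:
        # A hard symbolic constraint overrides the neural path entirely.
        return veto_score
    # No veto: convex combination of the neural and symbolic scores.
    return alpha * neural_score + (1.0 - alpha) * verdict.soft_score


# A vetoed packet gets the forced score regardless of the neural output.
forced = cascade_fuse(0.1, RuleVerdict(veto=True, soft_score=0.0))
# Without a veto, the result lies between the two input scores.
blended = cascade_fuse(0.2, RuleVerdict(veto=False, soft_score=0.6))
```

Checking the veto before blending is what makes the symbolic guarantee hard in this sketch: no value of `neural_score` can change the output once a constraint fires.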

Paper Structure

This paper contains 39 sections, 5 theorems, 50 equations, 11 figures, 5 tables, 1 algorithm.

Key Result

Theorem A.1

Let $k(q,k)=\exp\!\left(q^\top k/\sqrt{d}\right)$ be the target attention kernel. Suppose $\phi(\cdot)$ is constructed via i.i.d. positive random features (for example, Performer-style positive random features), yielding an unbiased estimator $\phi(q)^\top\phi(k)$ of $k(q,k)$. Assume $|\phi(q)^\top\phi(k)|\le C$ almost surely for all $q,k$ in the domain. Then for any $\varepsilon\in(0,C)$ and failure probability $\delta\in(0,1)$, ...
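
The unbiasedness premise of Theorem A.1 can be checked numerically. The following is a minimal sketch, assuming the standard positive random feature construction $\phi(x)_i = \exp(\omega_i^\top x - \|x\|^2/2)/\sqrt{m}$ with $\omega_i \sim \mathcal{N}(0, I)$, and rescaling inputs by $d^{-1/4}$ so that the estimator targets $\exp(q^\top k/\sqrt{d})$. The function names and the feature count `m` are illustrative choices, not taken from the paper.

```python
import math
import random


def positive_features(x, omegas):
    """phi(x)_i = exp(w_i . x - |x|^2 / 2) / sqrt(m); every entry is positive."""
    m = len(omegas)
    half_sq_norm = sum(v * v for v in x) / 2.0
    return [
        math.exp(sum(wj * xj for wj, xj in zip(w, x)) - half_sq_norm) / math.sqrt(m)
        for w in omegas
    ]


def exact_kernel(q, k):
    """Target kernel exp(q . k / sqrt(d))."""
    d = len(q)
    return math.exp(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))


def estimate_kernel(q, k, m, rng):
    """Monte Carlo estimate phi(q) . phi(k) using m shared Gaussian projections."""
    d = len(q)
    scale = d ** -0.25  # (scale*q) . (scale*k) = q . k / sqrt(d)
    qs = [scale * v for v in q]
    ks = [scale * v for v in k]
    omegas = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    pq = positive_features(qs, omegas)
    pk = positive_features(ks, omegas)
    return sum(a * b for a, b in zip(pq, pk))


q = [0.3, -0.2, 0.1, 0.4]
k = [0.1, 0.2, -0.3, 0.2]
approx = estimate_kernel(q, k, m=5000, rng=random.Random(0))
target = exact_kernel(q, k)
```

With more features the estimate concentrates around the target, which is the behaviour a probabilistic bound in terms of $\varepsilon$ and $\delta$ quantifies.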

Figures (11)

  • Figure 1: Overview of the Chimera architecture for trustworthy dataplane intelligence. The pipeline executes within a P4 Programmable Switch across three primary stages: Partition, where the incoming Packet Stream is segmented into discrete units $X_1, \dots, X_k$; Map, which bifurcates into a Neural Path ($\phi$) for computing Linearized Attention via high-dimensional feature maps and a Symbolic Path ($\mathcal{R}$) that executes Rule Matching against hardware-resident Symbolic Constraints; and SumReduce, which aggregates partial results. These paths converge in the Cascade Fusion engine, which applies a Hard Veto / Soft Blend logic to ensure safety guarantees while maintaining neural expressivity. The final output is a Trustworthy Score representing a verified, line-rate inference result.
  • Figure 2: Transformation from standard attention to dataplane-native primitives. (a) Architectural comparison between infeasible exact attention and Chimera's linearized formulation. (b) Temporal unfolding of incremental state updates mapped to stateful ALU operations.
  • Figure 3: Two-layer key selection hierarchy and memory efficiency analysis. (Left) The architectural flow of Chimera: combining temporal locality in the SRAM-based Local Layer with structural prior knowledge in the TCAM-indexed Static Layer to perform sparse key selection. (Right) Comparative memory footprint showing Chimera's significant reduction in per-flow state compared to dense and linearized baselines.
  • Figure 4: Transformation from standard attention to dataplane-native primitives. (a) Architectural comparison between infeasible exact attention and Chimera's linearized formulation. (b) Temporal unfolding of incremental state updates mapped to stateful ALU operations.
  • Figure 5: Pareto frontier: CICIOT F1 versus per-flow state bits. Chimera is highlighted as Pareto-optimal, providing higher accuracy with lower per-flow state than competing dataplane models.
  • ...and 6 more figures
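
The incremental state updates mentioned in the Figure 2 caption can be sketched as a running-sum form of linearized attention: keep $S=\sum_i \phi(k_i)v_i^\top$ and $z=\sum_i \phi(k_i)$, then read out $\phi(q)^\top S / \phi(q)^\top z$. The class name `LinearAttentionState` and the elementwise exponential feature map `exp_features` are hypothetical placeholders; the actual feature map and register layout used on the switch are not given in this summary.

```python
import math


def exp_features(x):
    """Placeholder positive feature map (an assumption, not the paper's phi)."""
    return [math.exp(v) for v in x]


class LinearAttentionState:
    """Per-flow running sums; each update is a handful of add/multiply steps."""

    def __init__(self, feat_dim, value_dim):
        self.S = [[0.0] * value_dim for _ in range(feat_dim)]  # sum of phi(k) v^T
        self.z = [0.0] * feat_dim                              # sum of phi(k)

    def update(self, phi_k, v):
        """Fold one (key, value) pair into the state without storing history."""
        for i, f in enumerate(phi_k):
            self.z[i] += f
            row = self.S[i]
            for j, vj in enumerate(v):
                row[j] += f * vj

    def read(self, phi_q):
        """Return the normalized attention output for a query."""
        denom = sum(f * zi for f, zi in zip(phi_q, self.z))
        value_dim = len(self.S[0])
        if denom == 0.0:
            return [0.0] * value_dim  # no keys seen yet
        return [
            sum(phi_q[i] * self.S[i][j] for i in range(len(phi_q))) / denom
            for j in range(value_dim)
        ]


keys = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.4]]
vals = [[1.0], [2.0], [3.0]]
query = [0.2, 0.1]

state = LinearAttentionState(feat_dim=2, value_dim=1)
for key, val in zip(keys, vals):
    state.update(exp_features(key), val)
streamed = state.read(exp_features(query))
```

The state size depends only on the feature and value dimensions, not on the number of packets seen, which is why this form fits a fixed per-flow memory budget.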

Theorems & Definitions (5)

  • Theorem A.1: Kernel-feature approximation; probabilistic bound
  • Theorem A.2: Spectral-norm approximation for linearized attention
  • Theorem A.3: Accumulated quantization and numeric error bound
  • Theorem A.4: Coverage preservation of two-layer selection
  • Theorem A.5: Stability of the two-timescale control/data-plane protocol
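
As a generic illustration of the kind of effect Theorem A.3 addresses (not the theorem's actual bound, which is not reproduced in this summary), the sketch below accumulates values rounded to a fixed-point grid. Round-to-nearest with step $\Delta$ introduces at most $\Delta/2$ error per term, so $n$ accumulated terms deviate from the exact sum by at most $n\Delta/2$. The function names and the 8-fractional-bit step are illustrative assumptions.

```python
import random


def quantize(x, step):
    """Round x to the nearest multiple of step (fixed-point grid)."""
    return round(x / step) * step


def accumulated_error(values, step):
    """Difference between the quantized running sum and the exact sum."""
    exact = sum(values)
    approx = 0.0
    for v in values:
        approx += quantize(v, step)
    return approx - exact


rng = random.Random(1)
samples = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
step = 2.0 ** -8                          # 8 fractional bits (assumed width)
err = accumulated_error(samples, step)
worst_case = len(samples) * step / 2.0    # n * step / 2
```

In practice the observed error is usually far below the worst case because rounding errors of mixed sign partially cancel.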