EONSim: An NPU Simulator for On-Chip Memory and Embedding Vector Operations
Sangun Choi, Yunho Oh
TL;DR
EONSim closes a gap in NPU simulation by jointly modeling matrix and embedding vector operations, capturing the data-dependent, non-deterministic memory accesses that embedding workloads impose. It combines a validated analytic model for matrix computations with a detailed cycle-level memory simulator for embedding vector accesses. It also translates hardware-agnostic embedding traces into platform-specific addresses, so the same traces can be reused across platforms. The approach shows high fidelity against TPUv6e on DLRM. It also shows that on-chip memory management substantially affects embedding performance, which supports the need for flexible cache-like policies and profiling-based data pinning. The simulator enables architectural exploration of next-generation NPUs and their on-chip memory designs for embedding-heavy workloads, with practical implications for memory hierarchy design and for prefetching and caching strategies. EONSim is publicly available to support research in this area.
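The trace-translation idea can be illustrated with a minimal Python sketch. It assumes a hypothetical trace format of `(table_id, row_id)` lookups and a hypothetical `TableLayout` that records where each embedding table sits in device memory. Neither is EONSim's actual trace format or API. Changing the layout or the memory line size retargets the same hardware-agnostic trace to a different platform without regenerating it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableLayout:
    """Hypothetical placement of one embedding table in device memory."""
    base_addr: int   # byte address where row 0 of the table starts
    dim: int         # embedding vector length (elements)
    elem_bytes: int  # bytes per element, e.g. 4 for fp32


def translate_trace(trace, layouts, line_bytes=64):
    """Map hardware-agnostic (table_id, row_id) lookups to line-aligned byte addresses.

    A row occupies dim * elem_bytes bytes, so one lookup expands into
    every memory line that the row overlaps.
    """
    addrs = []
    for table_id, row_id in trace:
        t = layouts[table_id]
        row_bytes = t.dim * t.elem_bytes
        start = t.base_addr + row_id * row_bytes
        end = start + row_bytes  # exclusive upper bound
        first_line = start // line_bytes
        last_line = (end - 1) // line_bytes
        addrs.extend(line * line_bytes for line in range(first_line, last_line + 1))
    return addrs


# Hypothetical platform: two tables with different row sizes.
layouts = {
    0: TableLayout(base_addr=0x0, dim=32, elem_bytes=4),       # 128-byte rows
    1: TableLayout(base_addr=0x100000, dim=16, elem_bytes=4),  # 64-byte rows
}
trace = [(0, 0), (0, 3), (1, 2)]  # same trace reused for every platform
print(translate_trace(trace, layouts))
```

With 64-byte lines, this prints `[0, 64, 384, 448, 1048704]`: each 128-byte row of table 0 spans two lines, while the 64-byte row of table 1 fits in one. Calling `translate_trace(trace, layouts, line_bytes=128)` on the same trace yields one address per lookup instead.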
Abstract
Embedding vector operations are a key component of modern deep neural network workloads. Unlike matrix operations, which have deterministic access patterns, embedding vector operations exhibit input data-dependent and non-deterministic memory accesses. Existing neural processing unit (NPU) simulators focus on matrix computations with simple double-buffered on-chip memory systems and therefore cannot model realistic embedding behavior. Next-generation NPUs, however, call for more flexible on-chip memory architectures that can support the diverse access and management schemes required by embedding workloads. To enable flexible exploration and design of emerging NPU architectures, we present EONSim, an NPU simulator that holistically models both matrix and embedding vector operations. EONSim integrates a validated performance model for matrix computations with detailed memory simulation for embedding accesses, supporting various on-chip memory management policies. Validated against TPUv6e, EONSim achieves an average inference time error of 1.4% and an average on-chip memory access count error of 2.2%.
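To illustrate why the on-chip memory management policy matters for data-dependent embedding accesses, the following sketch counts on-chip hits for a toy row-access trace under two policies: an LRU cache and profiling-based pinning of the most frequently accessed rows. The function names, the row-granularity buffer, and the assumption that pinned rows are preloaded before execution are illustrative choices, not EONSim's implementation. The trace is made up and is not taken from the paper's workloads.

```python
from collections import Counter, OrderedDict


def lru_hits(accesses, capacity):
    """Count on-chip hits when the buffer acts as an LRU cache holding `capacity` rows."""
    cache = OrderedDict()
    hits = 0
    for row in accesses:
        if row in cache:
            hits += 1
            cache.move_to_end(row)  # mark as most recently used
        else:
            cache[row] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used row
    return hits


def pinned_hits(accesses, capacity, profile):
    """Count hits when the `capacity` most frequent rows in `profile` are pinned on chip.

    Pinned rows are assumed to be preloaded, so even their first access hits;
    every other row always goes to off-chip memory.
    """
    pinned = {row for row, _ in Counter(profile).most_common(capacity)}
    return sum(1 for row in accesses if row in pinned)


accesses = [1, 2, 3, 1, 4, 1, 5, 2, 1, 6]  # toy trace with one hot row
print("LRU hits:", lru_hits(accesses, capacity=2))
print("Pinned hits:", pinned_hits(accesses, capacity=2, profile=accesses))
```

On this skewed toy trace, one-off rows keep evicting the hot row `1` under LRU, which gives 1 hit. Pinning rows `1` and `2` gives 6 hits. Using the trace itself as the profile is optimistic, because pinning only pays off when the profile resembles the runtime access distribution.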
