Toward Foundation Models for Online Complex Event Detection in CPS-IoT: A Case Study
Liying Han, Gaofeng Dong, Xiaomin Ouyang, Lance Kaplan, Federico Cerutti, Mani Srivastava
TL;DR
This work formalizes online complex event detection (CED) in CPS-IoT and introduces a multimodal, online dataset with 10 complex event (CE) classes defined by finite-state rules. It compares three broad approaches (LLMs, neural architectures, and neurosymbolic methods) and finds that the Mamba state-space model delivers the best accuracy and generalization for long-span CE reasoning, outperforming the other neural architectures as well as the LLM-based and neurosymbolic detectors. The study finds that LLMs struggle with precise, real-time CE timing and are hampered by latency and hallucinations, while end-to-end Mamba models provide robust online inference and could serve as a strong backbone for CPS-IoT foundation models. Overall, the work advocates state-space approaches as scalable, data-efficient foundations for long-term CE reasoning in CPS-IoT, with practical implications for smart monitoring and autonomous systems.
Abstract
Complex events (CEs) play a crucial role in CPS-IoT applications, enabling high-level decision-making in domains such as smart monitoring and autonomous systems. However, most existing models focus on short-span perception tasks and lack the long-term reasoning required for CE detection. CEs consist of sequences of short-time atomic events (AEs) governed by spatiotemporal dependencies. Detecting them is difficult because sensor data are long and noisy, and because a detector must filter out irrelevant AEs while capturing meaningful patterns. This work explores CE detection as a case study for CPS-IoT foundation models capable of long-term reasoning. We evaluate three approaches: (1) leveraging large language models (LLMs), (2) employing various neural architectures that learn CE rules from data, and (3) adopting a neurosymbolic approach that integrates neural models with symbolic engines embedding human knowledge. Our results show that the state-space model Mamba, which belongs to the second category, outperforms all other methods in accuracy and in generalization to longer, unseen sensor traces. These findings suggest that state-space models could be a strong backbone for CPS-IoT foundation models for long-span reasoning tasks.
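To make the idea of a CE as an ordered sequence of AEs under a finite-state rule concrete, here is a minimal Python sketch. It is an illustration only, not one of the dataset's 10 CE classes and not the paper's detection method. The class `SequenceRule`, the function `detect_online`, the `max_gap` parameter, and all AE labels are hypothetical names introduced for this example. The rule assumes AEs arrive as already-classified labels, one per time step, and it ignores irrelevant AEs between matches.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class SequenceRule:
    """Hypothetical CE rule: the AEs in `pattern` must occur in order,
    with at most `max_gap` time steps between consecutive matched AEs."""
    name: str
    pattern: Tuple[str, ...]
    max_gap: int
    state: int = 0        # index of the next AE we are waiting for
    last_match: int = -1  # time step of the most recent matched AE

    def step(self, t: int, ae: str) -> bool:
        """Consume one AE label at time step t; return True if the CE completes."""
        # Temporal constraint: give up on a partial match that has gone stale.
        if self.state > 0 and t - self.last_match > self.max_gap:
            self.state = 0
        # Advance only on the expected AE; anything else is treated as irrelevant.
        if ae == self.pattern[self.state]:
            self.state += 1
            self.last_match = t
            if self.state == len(self.pattern):
                self.state = 0  # reset so the rule can fire again later
                return True
        return False


def detect_online(rule: SequenceRule, stream: Iterable[str]) -> List[int]:
    """Run the rule over an AE stream, one label at a time, and
    return the time steps at which the CE is detected."""
    detections = []
    for t, ae in enumerate(stream):
        if rule.step(t, ae):
            detections.append(t)
    return detections


# Assumed example stream: irrelevant AEs ("talk", "walk") are interleaved.
rule = SequenceRule("enter_room", ("open_door", "enter", "close_door"), max_gap=3)
stream = ["walk", "open_door", "talk", "enter", "walk", "close_door", "walk"]
print(detect_online(rule, stream))  # prints [5]
```

A hand-written rule like this assumes perfect AE labels. The neural and neurosymbolic approaches compared in the paper instead have to cope with noisy sensor data, which is where long-span reasoning becomes difficult.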
