Toward Foundation Models for Online Complex Event Detection in CPS-IoT: A Case Study

Liying Han, Gaofeng Dong, Xiaomin Ouyang, Lance Kaplan, Federico Cerutti, Mani Srivastava

TL;DR

This work formalizes online complex event detection (CED) in CPS-IoT and introduces a multimodal online dataset with 10 CE classes defined by finite-state rules. It compares three broad approaches (LLMs, neural architectures, and neurosymbolic methods) and finds that the Mamba state-space model delivers the best accuracy and generalization for long-span CE reasoning, outperforming the other neural architectures as well as the LLM-based and neurosymbolic detectors. The study reports that LLMs struggle with precise, real-time CE timing and are hampered by latency and hallucinations, while end-to-end Mamba models provide robust online inference and are a strong candidate backbone for CPS-IoT foundation models. Overall, the work advocates state-space approaches as scalable, data-efficient foundations for long-term CE reasoning in CPS-IoT, with practical implications for smart monitoring and autonomous systems.

Abstract

Complex events (CEs) play a crucial role in CPS-IoT applications, enabling high-level decision-making in domains such as smart monitoring and autonomous systems. However, most existing models focus on short-span perception tasks, lacking the long-term reasoning required for CE detection. CEs consist of sequences of short-time atomic events (AEs) governed by spatiotemporal dependencies. Detecting them is difficult due to long, noisy sensor data and the challenge of filtering out irrelevant AEs while capturing meaningful patterns. This work explores CE detection as a case study for CPS-IoT foundation models capable of long-term reasoning. We evaluate three approaches: (1) leveraging large language models (LLMs), (2) employing various neural architectures that learn CE rules from data, and (3) adopting a neurosymbolic approach that integrates neural models with symbolic engines embedding human knowledge. Our results show that the state-space model, Mamba, which belongs to the second category, outperforms all methods in accuracy and generalization to longer, unseen sensor traces. These findings suggest that state-space models could be a strong backbone for CPS-IoT foundation models for long-span reasoning tasks.
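
To make the idea of a CE defined over a stream of atomic events concrete, the following is a minimal sketch of a finite-state rule evaluated online, one AE at a time. The rule and the AE labels (`use_restroom`, `wash_hands`, `eat`) are hypothetical, loosely inspired by the workplace-hygiene scenario in Figure 1(a); they are not taken from the paper's dataset, and this is not the authors' implementation.

```python
from typing import Iterable, List


def detect_hygiene_violation(ae_stream: Iterable[str]) -> List[int]:
    """Label each time step with 1 where the complex event fires, else 0.

    Hypothetical rule (an assumption for illustration): 'use_restroom'
    followed later by 'eat' with no 'wash_hands' in between.
    Any other AE is treated as irrelevant and leaves the state unchanged.
    """
    state = "idle"
    labels: List[int] = []
    for ae in ae_stream:
        fired = 0
        if state == "idle":
            if ae == "use_restroom":
                state = "unwashed"
        elif state == "unwashed":
            if ae == "wash_hands":
                state = "idle"      # rule is reset, no violation
            elif ae == "eat":
                fired = 1           # CE is reported at the step that completes it
                state = "idle"
        labels.append(fired)
    return labels


# Hypothetical AE trace with irrelevant events interleaved.
stream = ["walk", "use_restroom", "walk", "sit", "eat",
          "use_restroom", "wash_hands", "eat"]
print(detect_hygiene_violation(stream))
```

The detector emits a label at every step and never looks ahead, which mirrors the online setting: the CE is only flagged at the moment its pattern completes, while irrelevant AEs in between are ignored.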

Paper Structure

This paper contains 27 sections, 6 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: (a) An intelligent smartwatch assistant monitors workplace hygiene and alerts users of potential violations. (b) In a smart facility, a surveillance system detects suspicious activities, such as unusual parcel hand-offs, using distributed cameras. (c) An illustration of the online CED task showing sensor processing and CE labeling.
  • Figure 2: Overview of the online CED pipeline.
  • Figure 3: Boxplot of positive $F_1$ scores on complex events with different training data sizes. Solid line in the box shows median; dashed line in the box shows mean.
  • Figure 4: Positive $F_1$ scores of different models tested on 5-min, 15-min, and 30-min CE sensor data.

Theorems & Definitions (2)

  • Definition 1
  • Definition 2