An Empirical Evaluation of Neural and Neuro-symbolic Approaches to Real-time Multimodal Complex Event Detection

Liying Han; Mani B. Srivastava

TL;DR

Empirically, the neuro-symbolic architecture significantly surpasses purely neural models, demonstrating superior performance in CE recognition, even with extensive training data and ample temporal context for neural approaches.

Abstract

Robots and autonomous systems require an understanding of complex events (CEs) from sensor data to interact with their environments and humans effectively. Traditional end-to-end neural architectures, despite processing sensor data efficiently, struggle with long-duration events due to limited context sizes and reasoning capabilities. Recent advances in neuro-symbolic methods, which integrate neural and symbolic models leveraging human knowledge, promise improved performance with less data. This study addresses the gap in understanding these approaches' effectiveness in complex event detection (CED), especially in temporal reasoning. We investigate neural and neuro-symbolic architectures' performance in a multimodal CED task, analyzing IMU and acoustic data streams to recognize CE patterns. Our methodology includes (i) end-to-end neural architectures for direct CE detection from sensor embeddings, (ii) two-stage concept-based neural models mapping sensor embeddings to atomic events (AEs) before CE detection, and (iii) a neuro-symbolic approach using a symbolic finite-state machine for CE detection from AEs. Empirically, the neuro-symbolic architecture significantly surpasses purely neural models, demonstrating superior performance in CE recognition, even with extensive training data and ample temporal context for neural approaches.
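To make approach (iii) concrete, the following is a minimal sketch of a symbolic finite-state machine that consumes a stream of atomic events and emits one CE label per step. It is not the authors' implementation. The AE names, the two-state design, and the choice to emit the label "1" at the step where the pattern completes are all assumptions made for illustration, loosely modeled on a hygiene rule ("Using Restroom" followed by "Eating" with no "Washing hands" in between).

```python
from typing import Iterable, List

# Hypothetical atomic-event (AE) names, chosen for illustration only.
RESTROOM, WASH, EAT, OTHER = "restroom", "wash", "eat", "other"


def detect_ce(aes: Iterable[str]) -> List[int]:
    """Return one CE label per AE: 1 where the assumed pattern completes, else 0.

    Assumed pattern: RESTROOM, later EAT, with no WASH in between.
    """
    state = "idle"  # "idle": nothing pending; "unwashed": restroom seen, no wash yet
    labels: List[int] = []
    for ae in aes:
        label = 0
        if state == "idle":
            if ae == RESTROOM:
                state = "unwashed"
        elif state == "unwashed":
            if ae == WASH:
                state = "idle"  # hands washed, the pattern is cancelled
            elif ae == EAT:
                label = 1       # pattern completed at this step
                state = "idle"
        labels.append(label)
    return labels


if __name__ == "__main__":
    print(detect_ce([RESTROOM, OTHER, EAT]))
    print(detect_ce([RESTROOM, WASH, EAT]))
```

Because the state carries the only memory the rule needs, unrelated AEs of any duration can occur between "restroom" and "eat" without affecting the result, which is the property that a fixed neural context window does not guarantee.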

Paper Structure (32 sections, 8 equations, 5 figures, 3 tables)

Introduction
Related Work
Complex Event Detection Task Formulation
Definitions
Complex Event Detection Task
Multimodal Complex Event dataset
Designing Complex Events for In-home Robots
Complex Event Simulator
Audio dataset
IMU dataset
Generating Ground-truth CE Labels
Real-time Complex Event Detection System
Overview
Multimodal Fusion Module
Complex Event Detector
...and 17 more sections

Figures (5)

Figure 1: An illustration of the real-time complex event detection task. The example on the right shows that "Using Restroom" and "Eating" without "Washing hands" triggers the complex event detection, but the CE label "1" for this complex event is attached only at the last action, "Washing hands".
Figure 2: Daily activity simulator. Each Stage has a set of $n$ Activities that may happen according to a predefined distribution, where Activity $i$ has a probability $p_i$ of taking place in that Stage. Each Activity is defined by a temporal combination of relevant AEs. For example, in the Daytime Stage, the Activities "Walk-only", "Sit-only", "Restroom", "Work", and "Drink-only" happen with probabilities $[0.27, 0.27, 0.02, 0.4, 0.04]$, respectively. Each Activity is defined by the pattern displayed on the right side.
Figure 3: Overview of (Left) the entire real-time CED system, (Middle) the multimodal fusion module, and (Right) the complex event detector module.
Figure 4: Example CE label sequences predicted by NN models. Positive CE labels are highlighted in red in both ground-truth (label_i) and corresponding prediction (pred_i).
Figure 5: Evaluation of NN models with various CE training data sizes.
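
The per-Stage sampling described in the Figure 2 caption can be sketched as follows. This is a hedged illustration and not the authors' simulator: the Activity names and probabilities are taken from the caption's Daytime example, while the function name `sample_activities`, the seeding scheme, and the independence between successive draws are assumptions.

```python
import random
from typing import List, Optional

# Daytime Stage example from the Figure 2 caption.
DAYTIME_ACTIVITIES = ["Walk-only", "Sit-only", "Restroom", "Work", "Drink-only"]
DAYTIME_PROBS = [0.27, 0.27, 0.02, 0.4, 0.04]


def sample_activities(n: int, seed: Optional[int] = None) -> List[str]:
    """Draw n Activities for the Daytime Stage (hypothetical helper).

    Assumes each draw is independent of the previous ones.
    """
    rng = random.Random(seed)  # local generator so results are reproducible per seed
    return rng.choices(DAYTIME_ACTIVITIES, weights=DAYTIME_PROBS, k=n)


if __name__ == "__main__":
    print(sample_activities(5, seed=0))
```

In a full simulator, each sampled Activity would then be expanded into its temporal pattern of AEs; that expansion is omitted here because the patterns are only shown in the figure itself.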
