Neuro Symbolic Knowledge Reasoning for Procedural Video Question Answering

Basura Fernando; Thanh-Son Nguyen; Hong Yang; Tzeh Yuan Neoh; Hao Zhang; Ee Yeo Keat

TL;DR

Knowledge Module Learning (KML) introduces a neurosymbolic framework for procedural knowledge reasoning in videos. It learns relation-specific neural knowledge modules (KMs), one per relation in a Procedural Knowledge Graph (PKG) such as HAS_TOOL and HAS_PURPOSE, and composes them into executable programs generated by large language models (LLMs). The approach decouples program synthesis from execution and grounds module behavior in the PKG's explicit relations, which yields interpretable intermediate states and uncertainty-aware multi-hop reasoning. Theoretical results establish a separation condition for the learned mappings and a deterministic bound on error accumulation across hops, providing stability guarantees for multi-step reasoning. Empirically, KML outperforms LLM-only and black-box baselines on the PKR-QA benchmark, and ablations and robustness analyses demonstrate the benefits of PKG grounding, LLM-generated programs, and learned KMs; code is publicly available for reproducibility. The work also extends to logical operators such as AND and NOT and discusses future directions toward richer logic and embodied reasoning.
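The execution pattern described above (relation modules chained into a multi-hop program, soft intermediate states, and AND/NOT composition) can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper: each knowledge module is replaced by a fixed weight table rather than a learned neural mapping, states are soft entity sets with scores in [0, 1], AND is a product t-norm, NOT is a complement over an explicit vocabulary, and the entity names, weights, program, and helper functions (`knowledge_module`, `run_program`, `soft_and`, `soft_not`) are all hypothetical. It is not the KML implementation.

```python
from typing import Callable, Dict, List, Tuple

# A soft entity set: entity name -> membership score in [0, 1].
SoftSet = Dict[str, float]

# Hypothetical relation weights standing in for learned knowledge modules.
# In KML each relation is a trained neural mapping; here it is a fixed table.
RELATION_WEIGHTS: Dict[str, Dict[Tuple[str, str], float]] = {
    "HAS_STEP": {("make_tea", "boil_water"): 0.95, ("make_tea", "steep_tea"): 0.9},
    "HAS_TOOL": {
        ("boil_water", "kettle"): 0.9,
        ("steep_tea", "teapot"): 0.8,
        ("steep_tea", "kettle"): 0.2,
    },
}


def knowledge_module(relation: str) -> Callable[[SoftSet], SoftSet]:
    """Return a module mapping a soft set of head entities to a soft set of tails."""
    weights = RELATION_WEIGHTS[relation]

    def apply(heads: SoftSet) -> SoftSet:
        tails: SoftSet = {}
        for (head, tail), w in weights.items():
            # Max-product aggregation: keep the strongest supporting edge per tail.
            tails[tail] = max(tails.get(tail, 0.0), heads.get(head, 0.0) * w)
        return tails

    return apply


def soft_and(a: SoftSet, b: SoftSet) -> SoftSet:
    # Product t-norm over entities present in both sets.
    return {e: a[e] * b[e] for e in a.keys() & b.keys()}


def soft_not(a: SoftSet, universe: List[str]) -> SoftSet:
    # Complement relative to an explicit entity vocabulary.
    return {e: 1.0 - a.get(e, 0.0) for e in universe}


def run_program(program: List[str], start: SoftSet) -> List[SoftSet]:
    """Execute a chain of relation hops, returning every intermediate state."""
    states = [start]
    for relation in program:
        states.append(knowledge_module(relation)(states[-1]))
    return states


# Hypothetical program an LLM might emit for "Which tools does the task use?"
trace = run_program(["HAS_STEP", "HAS_TOOL"], {"make_tea": 1.0})
for hop, state in enumerate(trace):
    print(hop, state)  # each hop's soft state is inspectable

# "Which tools are used for steeping but not for boiling water?"
hop_tool = knowledge_module("HAS_TOOL")
steep_tools = hop_tool({"steep_tea": 1.0})
boil_tools = hop_tool({"boil_water": 1.0})
only_steep = soft_and(steep_tools, soft_not(boil_tools, ["kettle", "teapot"]))
print(only_steep)
```

Keeping every intermediate state in `trace` is what makes each hop inspectable; the scores shrink multiplicatively along the chain, which is one simple way uncertainty can propagate across hops in a sketch like this.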

Abstract

In this work we present Knowledge Module Learning (KML) for understanding and reasoning over procedural tasks, which require models to learn structured and compositional procedural knowledge. KML is a neurosymbolic framework that learns the relation categories of a knowledge graph as neural knowledge modules and composes them into executable reasoning programs generated by large language models (LLMs). Each module encodes a specific procedural relation, capturing, for example, how tools relate to steps, what purpose each tool serves, and which steps make up each task. Given a question about a task shown in a video, KML performs multistep reasoning with transparent, traceable intermediate states. Our theoretical analysis establishes two desirable properties of KML. First, KML satisfies strong optimality conditions for modelling KG relations as neural mappings, providing a strong foundation for generalizable procedural reasoning. Second, the expected error of multistep reasoning is bounded. To evaluate the model, we construct a large procedural knowledge graph (PKG) spanning diverse instructional domains by integrating the COIN instructional video dataset and the COIN ontology, commonsense relations from ConceptNet, and structured extractions from LLMs, followed by expert verification. We then generate question-answer pairs by applying graph traversal templates over the PKG, constructing the PKR-QA benchmark for procedural knowledge reasoning. Experiments show that KML improves structured reasoning performance while providing interpretable step-by-step traces, outperforming LLM-only and black-box neural baselines. Code is publicly available at https://github.com/LUNAProject22/KML.
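The benchmark-construction step, generating question-answer pairs by applying graph traversal templates over the PKG, can be sketched as follows. This is a toy illustration under stated assumptions: the triples, the template strings, and the helpers `neighbors` and `generate_qa` are hypothetical, and the sketch covers only template instantiation by traversal, not the COIN, ConceptNet, and LLM integration or the expert verification used to build the actual PKG.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Hypothetical toy PKG fragment; not taken from the PKR-QA release.
PKG: List[Triple] = [
    ("make_tea", "HAS_STEP", "boil_water"),
    ("make_tea", "HAS_STEP", "steep_tea"),
    ("boil_water", "HAS_TOOL", "kettle"),
    ("steep_tea", "HAS_TOOL", "teapot"),
    ("kettle", "HAS_PURPOSE", "heat_liquid"),
    ("teapot", "HAS_PURPOSE", "hold_liquid"),
]

# Hypothetical templates: a question pattern plus the relation path that answers it.
TEMPLATES: List[Tuple[str, List[str]]] = [
    ("What is the purpose of the tools used in '{task}'?",
     ["HAS_STEP", "HAS_TOOL", "HAS_PURPOSE"]),
    ("Which tools are needed for '{task}'?", ["HAS_STEP", "HAS_TOOL"]),
    ("What is the purpose of '{task}'?", ["HAS_PURPOSE"]),  # no answer for a task
]


def neighbors(kg: List[Triple], heads: Set[str], relation: str) -> Set[str]:
    """One traversal hop: all tails reachable from `heads` via `relation`."""
    return {t for h, r, t in kg if r == relation and h in heads}


def generate_qa(kg: List[Triple], task: str) -> List[Tuple[str, List[str]]]:
    """Instantiate every template for `task` by walking its relation path."""
    pairs = []
    for pattern, path in TEMPLATES:
        frontier = {task}
        for relation in path:
            frontier = neighbors(kg, frontier, relation)
        if frontier:  # skip templates that have no answer for this task
            question = pattern.format(task=task.replace("_", " "))
            pairs.append((question, sorted(frontier)))
    return pairs


for question, answer in generate_qa(PKG, "make_tea"):
    print(question, "->", answer)
```

Because each template carries its own relation path, the same path doubles as a gold reasoning program for the generated question, which is one plausible way such a benchmark can supervise multi-hop traces.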
