Inferring Past Human Actions in Homes with Abductive Reasoning

Clement Tan, Chai Kiat Yeo, Cheston Tan, Basura Fernando

TL;DR

This work defines abductive past action inference: inferring plausible past human actions from a single image by leveraging current scene evidence. It introduces an object-relational representation framework and a set of architectures (GNNED, RBP, and BiGED) to reason over human–object relations, with BiGED achieving the strongest performance by fusing bilinear pooling and a relational graph encoder–decoder. Evaluations on the Action Genome/Charades-derived dataset show that object-relational approaches outperform end-to-end and vision–language baselines, though humans still outperform AI, highlighting the challenge and value of relational reasoning for abductive inference. The findings suggest practical impact for human–robot interaction and elder care, where understanding past actions from present evidence can improve safety and decision-making; code and data are released to facilitate further research.

Abstract

Abductive reasoning aims to make the most likely inference for a given set of incomplete observations. In this paper, we introduce "Abductive Past Action Inference", a novel research task aimed at identifying the past actions performed by individuals within homes to reach specific states captured in a single image, using abductive inference. The research explores three key abductive inference problems: past action set prediction, past action sequence prediction, and abductive past action verification. We introduce several models tailored for abductive past action inference, including a relational graph neural network, a relational bilinear pooling model, and a relational transformer model. Notably, the newly proposed object-relational bilinear graph encoder-decoder (BiGED) model emerges as the most effective among all methods evaluated, demonstrating good proficiency in handling the intricacies of the Action Genome dataset. The contributions of this research significantly advance the ability of deep learning models to reason about current scene evidence and make highly plausible inferences about past human actions. This advancement enables a deeper understanding of events and behaviors, which can enhance decision-making and improve system capabilities across various real-world applications such as Human-Robot Interaction and Elderly Care and Health Monitoring. Code and data are available at https://github.com/LUNAProject22/AAR

Paper Structure

This paper contains 34 sections, 13 equations, 8 figures, and 9 tables.

Figures (8)

  • Figure 1: Proposed object-relational approach for abductive past action inference. Models are tasked to: 1) abduct the set of past actions, 2) abduct the sequence of past actions, and 3) perform abductive past action verification.
  • Figure 2: The graph neural network encoder (left) and graph neural network decoder (right) architecture. The residual connections are shown with the $+$ sign.
  • Figure 3: The Bilinear Graph Encoder-Decoder (BiGED) architecture.
  • Figure 4: (left) The relational multi-head self-attention transformer. (right) The relational cross-attention transformer.
  • Figure 5: The context description and the textual prompt used for the GPT-3.5 turbo model.
  • ...and 3 more figures