Detection and Recognition: A Pairwise Interaction Framework for Mobile Service Robots

Mengyu Liang, Sarah Gillet Schlegel, Iolanda Leite

TL;DR

The authors argue that pairwise human interactions constitute a minimal yet sufficient perceptual unit for robot-centric social understanding. They adopt a two-stage framework: candidate interacting pairs are first identified from lightweight geometric and motion cues, and interaction types are then classified with a relation network.

Abstract

Autonomous mobile service robots, such as lawnmowers or cleaning robots, that operate in human-populated environments need to reason about local human-human interactions to support safe and socially aware navigation while fulfilling their tasks. For such robots, interaction understanding is not primarily a fine-grained recognition problem, but a perception problem under limited sensing quality and computational resources. Many existing approaches focus on holistic group activity recognition, which often requires large, complex models that may be unnecessary for mobile service robots. Others use pairwise interaction methods, which commonly rely on skeletal representations whose use in outdoor environments remains challenging. In this work, we argue that pairwise human interactions constitute a minimal yet sufficient perceptual unit for robot-centric social understanding. We study the problem of identifying interacting person pairs and classifying coarse-grained interaction behaviors sufficient for downstream group-level reasoning and service robot decision-making. To this end, we adopt a two-stage framework in which candidate interacting pairs are first identified based on lightweight geometric and motion cues, and interaction types are subsequently classified using a relation network. We evaluate the proposed approach on the JRDB dataset, where it achieves sufficient accuracy with reduced computational cost and model size compared to appearance-based methods. Additional experiments on the Collective Activity Dataset and a zero-shot test on a lawnmower-collected dataset further illustrate the generality of the proposed framework. These results suggest that pairwise geometric and motion cues provide a practical basis for interaction perception on mobile service robots and a promising method for integration into mobile robot navigation stacks in future work. Code will be released soon.
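
The control flow of the two-stage framework described in the abstract can be sketched roughly as follows. This is only an illustration of the detect-then-classify structure, not the authors' implementation: `two_stage_interactions`, `is_candidate`, and `classify_pair` are hypothetical names, and the gate and classifier are passed in as plain callables standing in for the Stage 1 geometric/motion filter and the Stage 2 relation network.

```python
from itertools import combinations


def two_stage_interactions(people, is_candidate, classify_pair):
    """Run a detect-then-classify pass over all person pairs in one frame.

    people:        dict mapping a track id to that person's observation
                   (any object the two callables understand).
    is_candidate:  hypothetical Stage 1 gate; returns True if a pair
                   should be treated as possibly interacting.
    classify_pair: hypothetical Stage 2 classifier; returns a coarse
                   interaction label for a candidate pair.
    """
    results = {}
    # Enumerate each unordered pair once, in a stable order.
    for i, j in combinations(sorted(people), 2):
        # Stage 1: cheap filter, so most pairs never reach the classifier.
        if not is_candidate(people[i], people[j]):
            continue
        # Stage 2: the more expensive classifier runs only on candidates.
        results[(i, j)] = classify_pair(people[i], people[j])
    return results


# Toy usage: observations are 1D positions, the gate is a distance
# threshold, and the classifier returns a fixed placeholder label.
people = {1: 0.0, 2: 1.0, 3: 10.0}
labels = two_stage_interactions(
    people,
    is_candidate=lambda a, b: abs(a - b) < 2.0,
    classify_pair=lambda a, b: "interacting",
)
```

The point of the split is that the number of pairs grows quadratically with the number of people, so a cheap first stage keeps the cost of the second stage bounded by the number of plausible pairs.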

Paper Structure

This paper contains 23 sections, 14 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of the proposed two-stage pairwise interaction recognition framework. Stage 1 performs interaction detection using a 7D geometric feature vector derived from bounding box configurations, producing candidate interacting person pairs. Stage 2 classifies coarse-grained interaction types by combining frozen visual appearance features extracted by EfficientNet with geometric–motion features computed from optical flow, using a relation network for explicit pairwise reasoning. The framework is designed for efficient and robust deployment on resource-constrained robotic platforms.
  • Figure 2: Qualitative zero-shot interaction recognition results on data collected from a mobile lawnmower platform. (a) The lawnmower moves rapidly through a group of pedestrians, inducing strong ego-motion and viewpoint changes, which lead to misclassifications of interaction types. (b) The lawnmower remains stationary, resulting in stable observations and correct interaction recognition for most person pairs. (c) The lawnmower approaches two individuals engaged in a face-to-face interaction, illustrating the intended use case of interaction-aware perception to support downstream robotic decision-making.
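
To make the Stage 1 input in Figure 1 more concrete, here is a minimal sketch of a 7D geometric feature computed from two bounding boxes. The seven components used here (normalized center offsets, distance, log size ratios, IoU, and area ratio) are an assumption chosen for illustration; this summary does not list the paper's actual feature definition, and `pair_geometry_7d` is a hypothetical name.

```python
import math


def pair_geometry_7d(box_a, box_b):
    """Return an illustrative 7D geometric feature for a pair of boxes.

    Each box is (x, y, w, h) in pixels, with (x, y) the top-left corner
    and w, h strictly positive. The feature layout is an assumption,
    not the paper's definition.
    """
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b

    # Box centers.
    cxa, cya = xa + wa / 2, ya + ha / 2
    cxb, cyb = xb + wb / 2, yb + hb / 2

    # Normalize offsets by mean box height so the features depend less
    # on how far the pair is from the camera.
    scale = (ha + hb) / 2
    dx = (cxb - cxa) / scale
    dy = (cyb - cya) / scale
    dist = math.hypot(dx, dy)

    # Relative size of the two boxes, symmetric around zero.
    log_w = math.log(wb / wa)
    log_h = math.log(hb / ha)

    # Intersection over union of the two boxes.
    inter_w = max(0.0, min(xa + wa, xb + wb) - max(xa, xb))
    inter_h = max(0.0, min(ya + ha, yb + hb) - max(ya, yb))
    inter = inter_w * inter_h
    area_a, area_b = wa * ha, wb * hb
    iou = inter / (area_a + area_b - inter)

    # Ratio of smaller to larger area, in (0, 1].
    area_ratio = min(area_a, area_b) / max(area_a, area_b)

    return [dx, dy, dist, log_w, log_h, iou, area_ratio]


# Two same-sized, non-overlapping boxes standing side by side.
features = pair_geometry_7d((0, 0, 10, 20), (20, 0, 10, 20))
```

Features of this kind need only detector output, which is what makes a geometric first stage cheap compared to appearance-based processing of every pair.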