SITUATE: Indoor Human Trajectory Prediction through Geometric Features and Self-Supervised Vision Representation

Luigi Capogrosso; Andrea Toaiari; Andrea Avogaro; Uzair Khan; Aditya Jivoji; Franco Fummi; Marco Cristani

TL;DR

Indoor human trajectory forecasting is challenged by dense indoor layouts and semantic constraints. SITUATE addresses this by fusing equivariant/invariant geometric feature learning with a self-supervised vision representation (BEiT) of scene layouts, enabling scene-aware and transformation-consistent predictions. The model combines EquiGCN and InvGCN blocks, scene tokens, and a principled feature initialization to predict future trajectories. It achieves state-of-the-art results on THÖR and Supermarket and competitive performance in outdoor scenarios, and ablations confirm the importance of the scene representation and regularization components. The approach is relevant to indoor robotics and location-based services because it provides accurate, context-aware forecasts that generalize across indoor and outdoor scenarios. The authors release public code for reproducibility.
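
The TL;DR says the scene layout is turned into scene tokens by BEiT. As a minimal sketch, assuming a ViT-style front end with BEiT-Base's 16x16 patches and mean pooling into a single scene vector (the pooling choice is an assumption, not a detail confirmed by this summary), the layout image can be cut into patch tokens as follows. The random projection `proj` is a hypothetical stand-in for the pretrained BEiT encoder, which in practice also runs transformer layers over the tokens.

```python
import numpy as np

PATCH = 16  # BEiT-Base uses 16x16 patches; assumed here for illustration

def patchify(scene: np.ndarray, patch: int = PATCH) -> np.ndarray:
    """Split an (H, W, C) image into a (num_patches, patch*patch*C) matrix, row-major over the patch grid."""
    h, w, c = scene.shape
    assert h % patch == 0 and w % patch == 0, "image sides must be multiples of the patch size"
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch, c) -> (gh, gw, patch, patch, c), then flatten each patch
    patches = scene.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch * patch * c)

def scene_vector(scene: np.ndarray, proj: np.ndarray):
    """Embed patches into scene tokens, then mean-pool them into one scene feature vector.

    `proj` is a hypothetical linear stand-in for a pretrained BEiT encoder."""
    tokens = patchify(scene) @ proj          # (num_patches, d)
    return tokens, tokens.mean(axis=0)       # (num_patches, d), (d,)

rng = np.random.default_rng(0)
layout = rng.random((224, 224, 3))  # toy stand-in for a rendered bird's-eye layout map
proj = rng.standard_normal((PATCH * PATCH * 3, 768)) / np.sqrt(PATCH * PATCH * 3)
tokens, s = scene_vector(layout, proj)
print(tokens.shape, s.shape)  # (196, 768) (768,)
```

A 224x224 map yields a 14x14 grid, hence 196 tokens; the pooled vector `s` is the kind of scene feature a downstream forecaster could consume.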

Abstract

Patterns of human motion in outdoor and indoor environments are substantially different due to the scope of the environment and the typical intentions of people therein. While outdoor trajectory forecasting has received significant attention, indoor forecasting is still an underexplored research area. This paper proposes SITUATE, a novel approach to indoor human trajectory prediction that leverages equivariant and invariant geometric features and a self-supervised vision representation. The geometric learning modules model the intrinsic symmetries and human movements inherent in indoor spaces. This is particularly important because indoor trajectories are often characterized by self-loops at various scales and rapid direction changes. Meanwhile, the vision representation module acquires spatial-semantic information about the environment to predict users' future locations more accurately. We evaluate our method through comprehensive experiments on the two most widely used indoor trajectory forecasting datasets, i.e., THÖR and Supermarket, obtaining state-of-the-art performance. Furthermore, we achieve competitive results in outdoor scenarios, showing that indoor-oriented forecasting models generalize better than outdoor-oriented ones. The source code is available at https://github.com/intelligolabs/SITUATE.
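
The abstract relies on two properties: geometric features that transform together with the trajectory (equivariant) and pattern features that do not change at all (invariant). The following sketch checks both numerically for a toy design; the function names and the time-mixing matrix `W` are illustrative assumptions, not SITUATE's actual layers.

```python
import numpy as np

def rotation(theta: float) -> np.ndarray:
    """2x2 matrix acting on row vectors: x @ R rotates x counterclockwise by theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

def geometric_features(traj: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Equivariant features (hypothetical design): center the (T, 2) trajectory,
    then mix its time steps with a (C, T) matrix W. W never touches the x/y axis,
    so f(X R + t) = f(X) R."""
    centered = traj - traj.mean(axis=0, keepdims=True)  # removes any translation
    return W @ centered  # (C, 2)

def pattern_features(G: np.ndarray) -> np.ndarray:
    """Invariant features: the Gram matrix G G^T is unchanged by rotations,
    because (G R)(G R)^T = G R R^T G^T = G G^T."""
    return G @ G.T  # (C, C)

rng = np.random.default_rng(0)
T, C = 8, 4
traj = np.cumsum(rng.standard_normal((T, 2)), axis=0)  # toy random-walk trajectory
W = rng.standard_normal((C, T))
R, t = rotation(0.7), np.array([3.0, -1.5])
moved = traj @ R + t  # rotate, then translate the whole trajectory

G, G_moved = geometric_features(traj, W), geometric_features(moved, W)
print(np.allclose(G_moved, G @ R))                                  # True: equivariant
print(np.allclose(pattern_features(G_moved), pattern_features(G)))  # True: invariant
```

Self-loops and sharp turns look the same to such features regardless of where in the room, or in which orientation, they occur, which is one plausible reading of why the abstract stresses symmetry.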

Paper Structure

This paper contains 22 sections, 6 equations, 2 figures, and 5 tables.

Introduction
Motivations for this paper.
Innovations in this paper.
Related Work
Indoor human trajectory prediction.
Equivariant and invariant graph neural networks.
Self-supervised vision representation.
Method
Mathematical background.
Motion prediction.
The SITUATE Prediction Network
Feature Initialization
Experiments
Evaluation Setup
Datasets.
...and 7 more sections

Figures (2)

Figure 1: Examples of trajectories from the Supermarket dataset [gabellini2019large], illustrating the difficulty of the indoor trajectory prediction task. In particular, the dataset showcases long trajectories (Person 4), self-loops (Persons 1 and 3), and confusing movements (Person 2), all performed in an environment that strongly affects people's paths. The red circle marks the starting point of each trajectory, and the yellow star marks its final point.
Figure 2: In SITUATE, we first produce a feature vector describing the scene using the self-supervised vision representation module. A feature initialization layer then initializes the geometric and pattern features, which are successively updated by the equivariant geometric feature learning and invariant pattern feature learning layers to obtain expressive feature representations. An invariant reasoning module infers the interaction graph used in equivariant geometric feature learning. Finally, an equivariant output layer produces the final prediction.
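
The data flow in the Figure 2 caption can be made concrete with a toy forward pass. This is a minimal NumPy sketch under loudly stated assumptions: all layer shapes, the tanh/sigmoid nonlinearities, the channel-gating step, and every weight name are hypothetical, the weights are random rather than trained, and the scene vector is assumed to be precomputed. It is not the authors' implementation; it only shows how invariant quantities (norms, distances, the scene vector) can steer equivariant geometric features without breaking equivariance.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T_PAST, T_FUT, C, D_SCENE, D_H = 3, 8, 12, 16, 32, 24  # toy sizes (assumed)

def init(*shape):
    """Scaled random array: a hypothetical stand-in for trained weights."""
    return rng.standard_normal(shape) / np.sqrt(shape[0])

W_init, W_h = init(C, T_PAST), init(C + D_SCENE, D_H)  # feature initialization
w_edge = init(2 * D_H + 1)                             # invariant reasoning module
W_msg, W_pat, W_gate = init(C, C), init(C + D_H, D_H), init(D_H, C)
W_out = init(T_FUT, C)                                 # equivariant output layer

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def invariants(G):
    """Per-channel squared norms of (n, C, 2) features: unchanged by rotations."""
    return (G ** 2).sum(axis=-1)  # (n, C)

def predict(past, scene_vec):
    """past: (n, T_PAST, 2) observed positions; scene_vec: (D_SCENE,) scene feature."""
    n = past.shape[0]
    center = past.mean(axis=(0, 1), keepdims=True)  # scene centroid, (1, 1, 2)
    last = past[:, -1]                              # last observed positions, (n, 2)
    # 1) Feature initialization: equivariant G; invariant H also sees the scene vector.
    G = np.einsum("ct,ntd->ncd", W_init, past - center)  # (n, C, 2)
    scene = np.broadcast_to(scene_vec, (n, D_SCENE))
    H = np.tanh(np.concatenate([invariants(G), scene], axis=1) @ W_h)  # (n, D_H)
    # 2) Invariant reasoning: soft interaction graph built from invariant inputs only.
    dist = np.linalg.norm(last[:, None] - last[None, :], axis=-1)  # (n, n)
    pair = np.concatenate([np.repeat(H[:, None], n, axis=1),
                           np.repeat(H[None, :], n, axis=0), dist[..., None]], axis=-1)
    A = sigmoid(pair @ w_edge)  # (n, n), entries in (0, 1)
    np.fill_diagonal(A, 0.0)
    # 3) Equivariant geometric update: relative features weighted by invariant scalars
    #    and mixed across channels only, never across the x/y axis.
    rel = G[None, :] - G[:, None]  # rel[i, j] = G_j - G_i, (n, n, C, 2)
    G = G + np.einsum("ij,ce,ijed->icd", A, W_msg, rel) / max(n - 1, 1)
    # 4) Invariant pattern update, then gate each geometric channel by an invariant scalar.
    H = np.tanh(np.concatenate([invariants(G), H], axis=1) @ W_pat)
    G = G * sigmoid(H @ W_gate)[..., None]
    # 5) Equivariant output layer: channels -> future steps, anchored at the last position.
    return np.einsum("fc,ncd->nfd", W_out, G) + last[:, None]  # (n, T_FUT, 2)

theta = 0.9
R = np.array([[np.cos(theta), np.sin(theta)], [-np.sin(theta), np.cos(theta)]])
t = np.array([2.0, -4.0])
past = np.cumsum(rng.standard_normal((N, T_PAST, 2)), axis=1)  # toy random walks
s = rng.standard_normal(D_SCENE)  # placeholder for a precomputed scene vector
pred = predict(past, s)
print(pred.shape)  # (3, 12, 2)
# Rigidly moving the observed trajectories moves the forecast identically (scene vector fixed).
print(np.allclose(predict(past @ R + t, s), pred @ R + t))  # True
```

The final check is the point of this design choice: every nonlinearity acts on invariant scalars and every learnable matrix mixes channels or time steps rather than coordinates, so rotating and translating the inputs rotates and translates the forecast in the same way. The check holds the scene vector fixed, so it says nothing about how a transformed layout map would behave.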
