BIRD: A Museum Open Dataset Combining Behavior Patterns and Identity Types to Better Model Visitors' Experience
Alexanne Worm, Florian Marchal, Sylvain Castagnos
TL;DR
The paper addresses the lack of comprehensive open data for modeling museum visitors by introducing BIRD, a dataset that fuses contextual, behavioral, and feedback information from 51 participants. The data were collected with eye-tracking and rich questionnaires as visitors explored a three-floor museum, and pre-processing produced standardized, multi-modal trajectories and interaction records suitable for identity analysis and recommender evaluations. A key contribution is demonstrating the dataset's utility for identity analysis via clustering that aligns with established profiles, while providing a flexible format for future research in trajectory prediction and personalization. The dataset and accompanying resources aim to advance human-centered museum analytics and enable reproducible evaluation of recommender systems and simulation models in cultural heritage contexts.
Abstract
Lack of data is a recurring problem in Artificial Intelligence, since data are essential for training and validating models. This is particularly true in the field of cultural heritage, where open datasets are relatively scarce and where the data collected do not always allow for holistic modeling of visitors' experience because they are ad hoc (i.e., restricted to the characteristics required to evaluate a specific model). To address this gap, we conducted a study between February and March 2019 aimed at obtaining comprehensive and detailed information about visitors, their visit experience, and their feedback. We equipped 51 participants with eye-tracking glasses and left them free to explore the 3 floors of the museum, for an average of 57 minutes, and to discover an exhibition of more than 400 artworks. On this basis, we built an open dataset combining contextual data (demographic data, preferences, visiting habits, motivations, social context, etc.), behavioral data (spatiotemporal trajectories, gaze data), and feedback (satisfaction, fatigue, liked artworks, verbatim comments, etc.). Our analysis made it possible to reconstruct visitor identities that combine the majority of the characteristics found in the literature and to reproduce the Veron and Levasseur profiles. This dataset will ultimately make it possible to improve the quality of recommended paths in museums by personalizing the number of points of interest (POIs), the time spent at each POI, and the amount of information provided to each visitor based on their level of interest.
