Robustness Testing of Multi-Modal Models in Varied Home Environments for Assistive Robots
Lea Hirlimann, Shengqiang Zhang, Hinrich Schütze, Philipp Wicke
TL;DR
This work addresses the robustness of multi-modal robotic models operating in home environments by introducing disturbances in the AI2Thor simulator that emulate real-world variability relevant to geriatric care. It evaluates three open-source models that perform strongly on ALFRED (HLSM, FILM, EmBERT) on disturbed tasks derived from ALFRED, including dimly lit conditions, glass doors, and mirror reflections. Preliminary results indicate that disturbances generally reduce Task Success and Goal Condition Success, with depth sensing offering some resilience for certain models (e.g., HLSM with depth data shows improved metrics in glass-wall scenarios). The study provides a methodology and initial findings to guide the development of robust models and collaboration with geriatrics practitioners, with the aim of improving the reliability and safety of assistive robots in real homes.
Abstract
The development of assistive robotic agents to support household tasks is advancing, yet the underlying models often operate in virtual settings that do not reflect real-world complexity. For assistive care robots to be effective in diverse environments, their models must be robust and integrate multiple modalities. Consider a caretaker needing assistance in a dimly lit room or navigating around a newly installed glass door. Models relying solely on visual input might fail in low light, while those using depth information could detect and avoid the door. This illustrates the need for models that can process multiple sensory inputs. Our ongoing study evaluates state-of-the-art robotic models in the AI2Thor virtual environment. We introduce disturbances, such as dimmed lighting and mirrored walls, to assess their impact on modalities such as movement and vision, and on object recognition. Our goal is to gather input from the Geriatronics community to understand and model the challenges faced by practitioners.
