MultiIoT: Benchmarking Machine Learning for the Internet of Things
Shentong Mo, Louis-Philippe Morency, Russ Salakhutdinov, Paul Pu Liang
TL;DR
The paper addresses the need for scalable benchmarks across the diverse landscape of IoT sensors. It introduces MultiIoT, a benchmark with 1.15M samples, 12 modalities, and 8 practical tasks, along with a taxonomy of modeling paradigms ranging from unimodal to multisensory language-grounded models. The key finding is that multisensory multitask models generally outperform unimodal and single-task baselines. Language-grounded variants give the strongest results and support zero-shot and few-shot transfer. The work highlights both opportunities for performance gains and open challenges: long-range temporal interactions, heterogeneous and noisy data, and the need for efficient, real-time inference. It releases open-source code and datasets to accelerate IoT ML research.
Abstract
The next generation of machine learning systems must be adept at perceiving and interacting with the physical world through a diverse array of sensory channels. In what is commonly referred to as the "Internet of Things (IoT)" ecosystem, sensory data from motion, thermal, geolocation, depth, wireless signals, video, and audio are increasingly used to model the states of physical environments and the humans inside them. Despite the potential for understanding human wellbeing, controlling physical devices, and interconnecting smart cities, the community has few benchmarks for building machine learning systems for IoT. Existing efforts are often specialized to a single sensory modality or prediction task, which makes it difficult to study and train large-scale models across many IoT sensors and tasks. To accelerate the development of new machine learning technologies for IoT, this paper proposes MultiIoT, the most expansive and unified IoT benchmark to date, encompassing over 1.15 million samples from 12 modalities and 8 real-world tasks. MultiIoT introduces unique challenges involving (1) generalizable learning from many sensory modalities, (2) multimodal interactions across long temporal ranges, (3) extreme heterogeneity due to the unique structure and noise topologies of real-world sensors, and (4) complexity during training and inference. We evaluate a comprehensive set of models on MultiIoT, including modality-specific and task-specific methods, multisensory and multitask supervised models, and large multisensory foundation models. Our results highlight opportunities for ML to make a significant impact in IoT, but many challenges persist in scalable learning from heterogeneous, long-range, and imperfect sensory modalities. We release all code and data to accelerate future research in machine learning for IoT.
