IoT-LM: Large Multisensory Language Models for the Internet of Things
Shentong Mo, Russ Salakhutdinov, Louis-Philippe Morency, Paul Pu Liang
TL;DR
IoT-LM addresses the challenge of learning from multisensory IoT data by conditioning a pre-trained large language model on sensor inputs through a dedicated multisensory encoder and a new multisensory multitask adapter. The authors introduce MultiIoT, a dataset of 1.15 million samples spanning 12 modalities and 8 tasks, and train the model jointly through multisensory pre-training and instruction tuning. The approach yields strong improvements across the 8 IoT tasks, enables zero-shot and few-shot transfer, and exhibits favorable scaling properties, establishing a foundation for interactive, reasoning-enabled IoT systems. By releasing data, models, and training code, IoT-LM aims to accelerate practical development of sensor-grounded language reasoning for smart devices and cities.
Abstract
The Internet of Things (IoT), a network of billions of smart physical devices embedded with sensors, software, and communication technologies, is a critical and rapidly expanding component of our modern world. The IoT ecosystem provides a rich source of real-world modalities such as motion, thermal, geolocation, imaging, depth, sensors, and audio to recognize the states of humans and physical objects. Machine learning presents a rich opportunity to automatically process IoT data at scale, enabling efficient inference for understanding human wellbeing, controlling physical devices, and interconnecting smart cities. To realize this potential, we introduce IoT-LM, an open-source large multisensory language model tailored for the IoT ecosystem. IoT-LM is enabled by two technical contributions. The first is MultiIoT, the most expansive unified IoT dataset to date, encompassing over 1.15 million samples from 12 modalities and 8 tasks prepared for multisensory pre-training and instruction tuning. The second is a new multisensory multitask adapter layer that conditions pre-trained large language models on multisensory IoT data. IoT-LM yields substantial improvements on 8 supervised IoT classification tasks and also demonstrates new interactive question-answering, reasoning, and dialog capabilities conditioned on IoT sensors. We release IoT-LM's data sources and new multisensory language modeling framework.
