IoT-LLM: a framework for enhancing Large Language Model reasoning from real-world sensor data

Tuo An; Yunjiao Zhou; Han Zou; Jianfei Yang

TL;DR

Problem: LLMs struggle with physical-world reasoning. Approach: IoT-LLM augments LLM perception with IoT sensor data and IoT-domain knowledge through data simplification, retrieval-augmented knowledge, and targeted prompting. Contributions: the first unified framework for IoT-sensory tasks; a benchmark of five IoT tasks; and a demonstration that retrieval-driven knowledge augmentation yields substantial performance gains across multiple models, including near-expert accuracy on several tasks. Significance: IoT-LLM enables scalable, generalizable IoT reasoning for real-world applications without manual expert engineering, highlighting the importance of perception in LLM-based embodied AI.
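The data-simplification step can be illustrated with a minimal sketch. It assumes that simplification means averaging non-overlapping windows of a 1-D sensor series and rounding the results so the series fits compactly into a text prompt. The helper names `simplify_series` and `describe_series`, the window size, and the rounding precision are hypothetical choices for illustration, not the paper's procedure.

```python
from statistics import mean
from typing import List, Sequence


def simplify_series(values: Sequence[float], window: int, decimals: int = 2) -> List[float]:
    """Average non-overlapping windows of a 1-D sensor series and round each mean.

    Hypothetical helper: the window size and rounding are illustrative choices.
    A trailing partial window is averaged over the samples it contains.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    chunks = [values[i:i + window] for i in range(0, len(values), window)]
    return [round(mean(chunk), decimals) for chunk in chunks]


def describe_series(name: str, values: Sequence[float], unit: str,
                    window: int, decimals: int = 2) -> str:
    """Render the simplified series plus min/max as a compact text description."""
    simplified = simplify_series(values, window, decimals)
    return (
        f"{name} ({unit}): {len(values)} samples averaged in windows of {window} -> "
        f"{simplified}; min={min(values):.{decimals}f}, max={max(values):.{decimals}f}"
    )


if __name__ == "__main__":
    # Toy accelerometer trace; the values are made up for illustration only.
    acc_z = [9.81, 9.79, 9.83, 12.4, 13.1, 11.9, 9.80, 9.82]
    print(describe_series("accelerometer z-axis", acc_z, "m/s^2", window=2))
```

Averaging shortens the series by roughly a factor of `window`, trading temporal detail for a shorter prompt; the min/max summary keeps extreme values that averaging would otherwise smooth away.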

Abstract

Large Language Models (LLMs) excel at textual tasks but often struggle with physical-world reasoning. Inspired by human cognition, where perception is fundamental to reasoning, we explore augmenting LLMs with enhanced perception using Internet of Things (IoT) data and pertinent knowledge. In this work, we systematically study LLMs' capability to address IoT-sensory tasks when their perception and knowledge base are augmented, and we propose a unified framework, IoT-LLM, to enhance this capability. IoT-LLM consists of three customized steps: preprocessing IoT data into suitable formats, expanding LLMs' knowledge via IoT-oriented retrieval-augmented generation (RAG), and activating LLMs' commonsense knowledge through chain-of-thought (CoT) prompting. To evaluate IoT-LLM, we design a benchmark comprising five real-world tasks with varying data types and reasoning complexities. Experimental results show that IoT-LLM significantly improves LLMs' reasoning on IoT-sensory tasks, with models such as GPT-4o-mini showing a 49.4% average improvement over previous methods.
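As a rough illustration of how the three steps could meet in a single prompt, the sketch below concatenates a task statement, retrieved knowledge snippets, retrieved demonstrations, the preprocessed data description, and a chain-of-thought cue. The function `build_prompt`, its section headings, and their ordering are hypothetical. The phrase "Let's think step by step." is a widely used zero-shot chain-of-thought cue, assumed here rather than taken from IoT-LLM.

```python
from typing import Sequence

# A widely used zero-shot chain-of-thought cue; assumed here, not taken from IoT-LLM.
COT_TRIGGER = "Let's think step by step."


def build_prompt(task: str, data_description: str,
                 knowledge: Sequence[str] = (),
                 demonstrations: Sequence[str] = ()) -> str:
    """Assemble a prompt from the task, retrieved knowledge, demonstrations, and data.

    Hypothetical layout: sections are separated by blank lines and empty
    sections are omitted.
    """
    sections = [f"Task: {task}"]
    if knowledge:
        sections.append("Relevant domain knowledge:\n"
                        + "\n".join(f"- {fact}" for fact in knowledge))
    if demonstrations:
        sections.append("Examples:\n" + "\n\n".join(demonstrations))
    sections.append(f"Sensor data:\n{data_description}")
    sections.append(f"Give a label and a brief justification. {COT_TRIGGER}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(build_prompt(
        task="Classify the heartbeat as normal or abnormal.",
        data_description="<simplified ECG description goes here>",
        knowledge=["A premature ventricular contraction often shows a wide QRS complex."],
        demonstrations=["Data: <example description> Answer: normal"],
    ))
```

Placing the data description after the retrieved context and ending on the reasoning cue is one reasonable ordering; the paper's own template may differ.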

Paper Structure

This paper contains 14 sections, 13 figures, and 3 tables.

IoT Sensory Tasks.
LLM baselines.
Baseline method.
IoT data simplification.
IoT data enrichment.
Construct IoT knowledge base.
Embed knowledge base into vector database.
IoT-oriented retrieval.
Seamless expansion to additional tasks.
Human Activity Recognition.
Industrial anomaly detection.
Heartbeat anomaly detection.
Human sensing task.
Indoor localization task.

Figures (13)

Figure 1: Inspired by human cognitive science, we augment LLMs with physical world perception from IoT data. Furthermore, by retrieving pertinent knowledge about IoT tasks, we enhance the reasoning capabilities of LLMs in executing real-world applications.
Figure 2: In our framework, IoT data is first preprocessed to create a data description. Next, relevant IoT domain knowledge and task-specific demonstrations are retrieved. These elements are then combined into a prompt, which is input into an LLM to generate the final output.
Figure 3: Response examples comparing the baseline method and our approach in heartbeat anomaly detection. The baseline method offers logically coherent but generalized analyses, whereas our method provides deeper insights and more precise descriptions of ventricular premature contraction characteristics, resulting in more professional and accurate responses.
Figure 4: Accuracy heatmap over $(n,m)$. Cells report accuracy (%) for candidates per retriever $n$ (rows) and kept passages after re-ranking $m$ (columns). Missing cells are infeasible ($m>2n$); the orange polyline marks the feasibility boundary $m{=}2n$. The optimum appears at a small keep set ($m{=}2$) with moderate candidates ($n{=}5$); accuracy degrades for larger $m$, while increasing $n$ beyond $\sim\!5$ yields diminishing returns.
Figure 5: Retrieval quantity sensitivity, shown as per-$n$ and per-$m$ views. Top: accuracy vs. kept passages $m$ for fixed $n\in\{1,2,3,5,7,10\}$. Bottom: accuracy vs. candidates $n$ for fixed $m\in\{1,2,3,4,6,10\}$. Across settings, accuracy rises steeply from $m{=}0$ to a small keep set, peaks at $m\in\{2,3\}$, and then declines as $m$ grows, consistent with noise and attention dilution. Increasing $n$ helps up to saturation around $n\!\approx\!5$, after which returns diminish. Together, these views show that a small $m$ and a moderate $n$ are sufficient and robust.
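Figures 4 and 5 vary two retrieval quantities: $n$ candidates per retriever and $m$ passages kept after re-ranking. The infeasible region $m > 2n$ suggests a candidate pool drawn from two retrievers, so the sketch below assumes two retrievers whose candidates are pooled, deduplicated, re-ranked, and truncated to $m$. The toy word-level and character-trigram Jaccard scorers stand in for real sparse and dense retrievers and a re-ranker; all function names are hypothetical. The defaults $n = 5$, $m = 2$ follow the optimum reported in Figure 4.

```python
from typing import Callable, List, Sequence

Retriever = Callable[[str, int], List[str]]
Scorer = Callable[[str, str], float]


def token_jaccard(query: str, passage: str) -> float:
    """Word-level Jaccard similarity (toy stand-in for a sparse retriever)."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    union = q | p
    return len(q & p) / len(union) if union else 0.0


def _trigrams(text: str) -> set:
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def trigram_jaccard(query: str, passage: str) -> float:
    """Character-trigram Jaccard similarity (toy stand-in for a dense retriever)."""
    q, p = _trigrams(query), _trigrams(passage)
    union = q | p
    return len(q & p) / len(union) if union else 0.0


def make_retriever(corpus: Sequence[str], score: Scorer) -> Retriever:
    """Return a retriever that ranks the whole corpus by `score` and keeps the top n."""
    def retrieve(query: str, n: int) -> List[str]:
        return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:n]
    return retrieve


def retrieve_and_rerank(query: str, retrievers: Sequence[Retriever],
                        rerank_score: Scorer, n: int = 5, m: int = 2) -> List[str]:
    """Pool n candidates from each retriever, deduplicate, re-rank, keep the top m."""
    if m > n * len(retrievers):
        # With two retrievers this is exactly the infeasible region m > 2n.
        raise ValueError("m exceeds the size of the candidate pool")
    pool: List[str] = []
    for retrieve in retrievers:
        for passage in retrieve(query, n):
            if passage not in pool:  # passages found by both retrievers are kept once
                pool.append(passage)
    ranked = sorted(pool, key=lambda p: rerank_score(query, p), reverse=True)
    return ranked[:m]  # may hold fewer than m passages if the retrievers overlapped


if __name__ == "__main__":
    corpus = [
        "Premature ventricular contractions show a wide QRS complex.",
        "Walking produces periodic peaks in vertical acceleration.",
        "WiFi CSI amplitude varies with a person's position indoors.",
    ]
    retrievers = [make_retriever(corpus, token_jaccard),
                  make_retriever(corpus, trigram_jaccard)]
    print(retrieve_and_rerank("wide QRS complex", retrievers,
                              rerank_score=trigram_jaccard, n=2, m=2))
```

Keeping $m$ small caps how many passages reach the prompt, which matches the figures' observation that accuracy declines as more passages are kept, while $n$ only controls how wide the pool is before re-ranking.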
