Thoughtful Things: Building Human-Centric Smart Devices with Small Language Models

Evan King, Haoxiang Yu, Sahil Vartak, Jenna Jacob, Sangsu Lee, Christine Julien

TL;DR

This paper tackles the usability and transparency gap in modern smart devices by proposing Thoughtful Things: devices that run small language models on-device, grounded in formal state models, to perform actions and provide explanations in response to unconstrained user commands. The authors present a five-step framework, spanning state modeling, random state generation, knowledge synthesis, bootstrapping, distillation, and integration, to train language models with fewer than 3B parameters that operate entirely on-device with no cloud dependence. They demonstrate two implementations, a lamp and a thermostat, deployed on Raspberry Pi 5 hardware, and evaluate accuracy, generalization, and runtime performance, showing that private, fully on-device operation is feasible. The work contributes a practical, scalable path toward human-centric smart devices and opens avenues for richer, private, explainable AI in everyday hardware.

Abstract

Everyday devices like light bulbs and kitchen appliances are now embedded with so many features and automated behaviors that they have become complicated to actually use. While such "smart" capabilities can better support users' goals, the task of learning the "ins and outs" of different devices is daunting. Voice assistants aim to solve this problem by providing a natural language interface to devices, yet such assistants cannot understand loosely-constrained commands, they lack the ability to reason about and explain devices' behaviors to users, and they rely on connectivity to intrusive cloud infrastructure. Toward addressing these issues, we propose thoughtful things: devices that leverage lightweight, on-device language models to take actions and explain their behaviors in response to unconstrained user commands. We propose an end-to-end framework that leverages formal modeling, automated training data synthesis, and generative language models to create devices that are both capable and thoughtful in the presence of unconstrained user goals and inquiries. Our framework requires no labeled data and can be deployed on-device, with no cloud dependency. We implement two thoughtful things (a lamp and a thermostat) and deploy them on real hardware, evaluating their practical performance.

Paper Structure (19 sections, 4 equations, 11 figures, 6 tables)


Figures (11)

  • Figure 1: Our prototype implementation of a "thoughtful lamp". The lamp leverages a fine-tuned small language model running on a Raspberry Pi to respond to loosely constrained user commands with appropriate actions (e.g., changing color; left to right) and explanations (e.g., describing its state and capabilities; right).
  • Figure 2: Visualizations of attention in a transformer model (Phi-2) given input user commands and device states. Generalist base models are pre-trained on large amounts of code and unconstrained text, so they learn semantic relationships between commands and relevant machine-readable state (e.g., "dark" and "off"). In this paper, we fine-tune these models to generate responses with device-specific actions and explanations that adhere to a real device's state model.
  • Figure 3: Thoughtful things are devices that respond to unconstrained user commands with appropriate actions (i.e., state changes) or explanations (i.e., descriptions of current state and capabilities). We accomplish this by combining a small, fine-tuned generative language model with a formal system model. The LLM flexibly synthesizes new states and explanations in response to diverse user commands, while the system model grounds responses in a device's true capabilities.
  • Figure 4: Overview of our framework. Our five-step process leverages a combination of formal modeling, training data synthesis, and fine-tuning and distillation of large language models to train a lightweight model capable of generating appropriate settings and explanations for individual smart devices in response to unconstrained user commands.
  • Figure 5: Overview of device state models, with examples included for a thermostat. We model a device based on a high-level state machine $S$ that describes valid transitions between states $m$ (left). A lower-level template $t_m$ associated with each state $m$ captures each setting $i$ and sensor $j$ and their valid ranges $\sigma_i, \gamma_j$ (center). At runtime, a snapshot $s_m$ describes the current state of the device (right). A thoughtful thing leverages a fine-tuned small language model to act by generating valid snapshots for new states and to explain by describing the snapshot of the device's current state.
  • ...and 6 more figures
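
The state model described in the Figure 5 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the state names, setting and sensor names, numeric ranges, and the helper functions `is_valid_snapshot` and `can_apply` are all hypothetical, chosen only to show how a state machine $S$, per-state templates $t_m$ with ranges $\sigma_i, \gamma_j$, and snapshots $s_m$ might fit together to ground a model's output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Template t_m: valid ranges for each setting (sigma_i) and sensor (gamma_j)."""
    settings: dict  # setting name -> (low, high)
    sensors: dict   # sensor name -> (low, high)


# State machine S: allowed transitions between high-level states m.
# All names and ranges below are hypothetical, for illustration only.
TRANSITIONS = {
    "off": {"off", "heat", "cool"},
    "heat": {"heat", "off"},
    "cool": {"cool", "off"},
}

ROOM_SENSOR = {"room_temp_c": (-10.0, 50.0)}
TEMPLATES = {
    "off": Template(settings={}, sensors=ROOM_SENSOR),
    "heat": Template(settings={"target_c": (15.0, 30.0)}, sensors=ROOM_SENSOR),
    "cool": Template(settings={"target_c": (15.0, 30.0)}, sensors=ROOM_SENSOR),
}


def in_range(value, bounds):
    """Return True if value lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


def is_valid_snapshot(snapshot):
    """Check a snapshot s_m against the template t_m of its state."""
    template = TEMPLATES.get(snapshot["state"])
    if template is None:
        return False
    settings, sensors = snapshot["settings"], snapshot["sensors"]
    # The snapshot must contain exactly the fields the template declares.
    if set(settings) != set(template.settings):
        return False
    if set(sensors) != set(template.sensors):
        return False
    return (
        all(in_range(settings[k], template.settings[k]) for k in settings)
        and all(in_range(sensors[k], template.sensors[k]) for k in sensors)
    )


def can_apply(current_state, proposed):
    """Accept a proposed snapshot only if S allows the transition and t_m is satisfied."""
    allowed = TRANSITIONS.get(current_state, set())
    return proposed["state"] in allowed and is_valid_snapshot(proposed)


# A snapshot such as a language model might generate for "I'm a bit cold".
proposed = {
    "state": "heat",
    "settings": {"target_c": 22.0},
    "sensors": {"room_temp_c": 18.5},
}
print(can_apply("off", proposed))
```

In this sketch the language model is free to propose any snapshot, and the check rejects proposals that leave a setting out of range or request a transition the state machine does not allow. This is one plausible way to read the caption's statement that the system model grounds responses in the device's true capabilities.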