Bridging the gap between natural user expression with complex automation programming in smart homes

Yingtian Shi; Xiaoyi Liu; Chun Yu; Tianao Yang; Cheng Gao; Chen Liang; Yuanchun Shi

TL;DR

This work tackles the challenge of enabling end-users to configure complex smart-home automation using natural expressions while ensuring executability. It introduces AwareAuto, a system that fuses context-aware multimodal sensing with two-step LLM inference (intent and feasibility) and a TA-pair grounding mechanism to generate deployable automation rules. Key contributions include standardized multimodal inputs, a structured prompt framework for both rule inference and grounding, a TA-pair representation of automation, and an interactive loop for controllability, achieving an overall 91.7% inference success on a realistic complex-task dataset. The approach demonstrates how grounding LLMs in real-world smart-home contexts can balance naturalness and expressiveness, paving the way for practical, user-friendly end-user programming of dynamic, multi-modal automation.

Abstract

A long-standing challenge in end-user programming (EUP) is the trade-off between natural user expression and the complexity of programming tasks. Although large language models (LLMs) can now handle semantic inference and natural language understanding, it remains under-explored how these capabilities can help end-users configure complex automation more naturally and easily. We propose AwareAuto, an EUP system that standardizes user expression and performs two-step inference with LLMs to generate automation. AwareAuto supports contextual, multimodal, and flexible user expression for configuring complex automation tasks (e.g., dynamic parameters, multiple conditional branches, and temporal constraints) that traditional EUP solutions cannot manage. On a dataset of realistic, complex rules, AwareAuto achieves 91.7% accuracy in matching user intentions and feasibility. We introduce user interaction to ensure system controllability and usability. We discuss the opportunities and challenges of incorporating LLMs into end-user programming techniques and of grounding LLMs in complex smart home contexts.

Paper Structure (31 sections, 10 figures, 1 table)

This paper contains 31 sections, 10 figures, 1 table.

Introduction
Backgrounds and Related Works
Naturalness and Expressiveness in End User Programming
LLMs Grounding Real-World Tasks
AI-enabled Smart Home Automation Programming
Identifying Challenges
User natural behavior understanding
Source of Complexity in Rules
Design and Implementations
Standardization of multimodal information
Reasoning logic of automation rules
Standardization of automation operation
Prompt design for rule inference
Grounding automation rules
TA pair design for automation
...and 16 more sections

Figures (10)

Figure 1: Source of complexity for automation rule generation tasks. All sources can be classified into six categories, depending on where the complexity is generated and whether the complexity depends on holistic or partial factors.
Figure 2: AwareAuto System Framework. The cooperation between the four subsystems of Sensing, Reasoning, Grounding, and Interaction transforms natural user expressions into complex automation rules that can be executed.
Figure 3: Standardization of user expressions. The standardized input contains the dynamic environment information and a natural-language description of the user's behavior, which facilitates inference by the downstream model.
Figure 4: Standardization of complex rules. Rule logic is managed with TA pairs: the trigger records all the relevant conditions, and the actions are stored as groups of records, with each group recording the id set of the trigger.
Figure 5: Structure of prompt. The Reasoning and Grounding stages use a similar prompt structure: the prompt opens with a formal specification of the output and the inference constraints, supplies the required static information in the middle, and closes with examples that reinforce the LLM's understanding of the task and the inference.
...and 5 more figures
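
The TA-pair structure described in the Figure 4 caption might be sketched as follows. This is a minimal illustration and not the paper's implementation: the class names, field names, example devices, and the rule that an action group fires only when all of its referenced triggers are active are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Trigger:
    """One recorded condition (hypothetical fields)."""
    trigger_id: str
    device: str
    condition: str


@dataclass
class ActionGroup:
    """A group of actions plus the id set of the triggers it depends on."""
    trigger_ids: frozenset
    actions: list


@dataclass
class Rule:
    """A rule as a set of triggers and trigger-indexed action groups."""
    triggers: dict = field(default_factory=dict)
    groups: list = field(default_factory=list)

    def add_trigger(self, trigger):
        self.triggers[trigger.trigger_id] = trigger

    def add_group(self, trigger_ids, actions):
        # Reject groups that reference conditions the rule never recorded.
        unknown = set(trigger_ids).difference(self.triggers)
        if unknown:
            raise KeyError(f"unknown trigger ids: {sorted(unknown)}")
        self.groups.append(ActionGroup(frozenset(trigger_ids), list(actions)))

    def fired_actions(self, active_ids):
        # Assumption: a group fires when every trigger it lists is active.
        active = set(active_ids)
        fired = []
        for group in self.groups:
            if group.trigger_ids <= active:
                fired.extend(group.actions)
        return fired


# Two conditional branches that share one trigger (illustrative values).
rule = Rule()
rule.add_trigger(Trigger("t1", "motion_sensor", "motion detected"))
rule.add_trigger(Trigger("t2", "clock", "time after 22:00"))
rule.add_group({"t1"}, ["turn on hallway light"])
rule.add_group({"t1", "t2"}, ["dim hallway light to 30%"])

print(rule.fired_actions({"t1"}))
print(rule.fired_actions({"t1", "t2"}))
```

Keeping conditions on the trigger side and referencing them by id from each action group lets several branches reuse the same condition without duplicating it, which is one plausible reading of how the caption's structure supports multiple conditional branches.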
