"What If Smart Homes Could See Our Homes?": Exploring DIY Smart Home Building Experiences with VLM-Based Camera Sensors

Sojeong Yun, Youn-kyung Lim

TL;DR

This work investigates how Vision-Language Model (VLM) camera sensors could transform DIY smart homes by enabling autonomous understanding of household contexts. Through a three-week diary-based experience prototyping study with 12 participants, the authors report three key outcomes: roles for VLM-based features (auto-monitoring, assistant, advisory), distinctive sensor characteristics (comprehensive sensing, inference, perspective-embodied sensing, unbounded values, interpretive capabilities) that reshape the DIY process, and user concerns (privacy, replacement of family interactions, over-dependence, and AI control). The authors offer design implications across the DIY workflow to support feature construction with VLM sensors and discuss implications for living with intelligent homes. The findings highlight both the potential to simplify DIY smart-home building and the need to address trust, privacy, and social-psychological dynamics to ensure user autonomy and well-being. Overall, the work provides a user-centered foundation for developing VLM-based DIY smart-home systems and identifies critical directions for real-world deployment and collaborative use.

Abstract

The advancement of Vision-Language Model (VLM) camera sensors, which enable autonomous understanding of household situations without user intervention, has the potential to completely transform the DIY smart home building experience. Will this simplify or complicate the DIY smart home process? Additionally, what features do users want to create using these sensors? To explore this, we conducted a three-week diary-based experience prototyping study with 12 participants. Participants recorded their daily activities, used GPT to analyze the images, and manually customized and tested smart home features based on the analysis. The study revealed three key findings: (1) participants' expectations for VLM camera-based smart homes, (2) the impact of VLM camera sensor characteristics on the DIY process, and (3) users' concerns. Through the findings of this study, we propose design implications to support the DIY smart home building process with VLM camera sensors, and discuss living with intelligence.

Paper Structure

This paper contains 43 sections, 13 figures, 5 tables.

Figures (13)

  • Figure 1: Three proposed concepts and the process of deriving them (All images were created using DALL-E 3)
  • Figure 2: Examples of cameras and mounts used by participants (From left: each camera setup for simulating smart glasses, a 360-degree rotating camera, and a fixed kitchen camera)
  • Figure 3: Step 1 of P3's Diary
  • Figure 4: Step 2 of P3's Diary, (a) capturing the necessary footage and key images, (b) determining the role of the camera sensor and the format of the output sensor data
  • Figure 5: Example photo of P3 utilizing ChatGPT to receive sensor data based on captured images and a prompt specifying the camera sensor’s role and the format of the output sensor data
  • ...and 8 more figures