IRIS: Wireless Ring for Vision-based Smart Home Interaction

Maruchi Kim, Antonio Glenn, Bandhav Veluri, Yunseo Lee, Eyoel Gebre, Aditya Bagaria, Shwetak Patel, Shyamnath Gollakota

TL;DR

IRIS addresses vision-based smart home control with a fully wireless ring that integrates a camera, BLE radio, IMU, and battery within the size, weight, and power (SWaP) limits of a ring form factor. It pairs this hardware with an ML pipeline (YOLO plus CODA for center-object detection, and DINOv2 semantic embeddings) that uses scene context to achieve instance-level device recognition. The system delivers real-time interaction (end-to-end latency of roughly 1 s or less) and 16–24 h of battery life. In a user study, a higher proportion of participants preferred IRIS over voice for toggling a device's state, granular control, and social acceptability. The work demonstrates practical, unobtrusive smart home control with potential impact on ring-based HCI, and outlines future directions in gesture expansion, on-device inference, and privacy-preserving deployment.

Abstract

Integrating cameras into wireless smart rings has been challenging due to size and power constraints. We introduce IRIS, the first wireless vision-enabled smart ring system for smart home interactions. Equipped with a camera, Bluetooth radio, inertial measurement unit (IMU), and an onboard battery, IRIS meets the small size, weight, and power (SWaP) requirements for ring devices. IRIS is context-aware, adapting its gesture set to the detected device, and can last for 16-24 hours on a single charge. IRIS leverages the scene semantics to achieve instance-level device recognition. In a study involving 23 participants, IRIS consistently outpaced voice commands, with a higher proportion of participants expressing a preference for IRIS over voice commands regarding toggling a device's state, granular control, and social acceptability. Our work pushes the boundary of what is possible with ring form-factor devices, addressing system challenges and opening up novel interaction capabilities.

Paper Structure (36 sections, 2 equations, 16 figures, 6 tables)


Figures (16)

  • Figure 1: IRIS hardware inside its 3D-printed enclosure, and placed beside a quarter. The battery sits inside the band of the ring. The ring diameter and band thickness are 17.5 mm and 2.9 mm, respectively.
  • Figure 2: IRIS in Action: Context-aware smart home control. This figure illustrates the application of IRIS, a custom-designed, camera-enabled wireless ring for context-aware control of the smart home. IRIS utilizes instance-based object detection for real-time interaction with the environment.
  • Figure 3: IRIS on a User's Hand. (A) Front View, (B) Top View.
  • Figure 4: IRIS image and IMU wireless data streaming. BLE packets are formed with the following structure: ST-Status Flags (Start of Frame, IMU Valid, Button State), AD-Accelerometer Data, GD-Gyroscope Data, CD-Camera Data.
  • Figure 5: IRIS pipeline. (1) A user points and clicks at the smart device they would like to control. (2) IRIS wirelessly streams the images to a smartphone, and (3) runs YOLO and DINOv2. (4) The centered object detection algorithm (CODA) filters the multiple objects YOLO may detect and outputs the one closest to the center of the frame. IRIS then queries the smart device library; because this example home contains two instances of Blinds, the class label alone cannot identify the device, so IRIS switches to the DINOv2 path. (5) The DINOv2 output embedding is matched against the Embedding + UUID database to find the stored embedding with the highest similarity, with the CODA output also passed in to reduce the search space. The highest similarity corresponds to the Blinds 2 UUID, and (6) the Home Device Manager controls Blinds 2.
  • ...and 11 more figures
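
The packet structure named in the Figure 4 caption (ST, AD, GD, CD) can be illustrated with a small pack/parse sketch. The caption gives only the field order and the three status flags; the bit positions, the one-byte status field, the three little-endian 16-bit values per sensor, and the names `RingPacket`, `pack_packet`, and `parse_packet` are all assumptions made for illustration, not the authors' wire format.

```python
import struct
from dataclasses import dataclass

# Hypothetical bit positions for the ST (status flags) byte.
FLAG_START_OF_FRAME = 0x01
FLAG_IMU_VALID = 0x02
FLAG_BUTTON_PRESSED = 0x04

# Assumed header: 1 status byte, 3 x int16 accelerometer (AD),
# 3 x int16 gyroscope (GD), little-endian, no padding.
HEADER = struct.Struct("<B3h3h")


@dataclass
class RingPacket:
    start_of_frame: bool
    imu_valid: bool
    button_pressed: bool
    accel: tuple   # (x, y, z) raw accelerometer counts
    gyro: tuple    # (x, y, z) raw gyroscope counts
    camera: bytes  # CD: chunk of image data filling the rest of the packet


def pack_packet(pkt: RingPacket) -> bytes:
    """Serialize a packet as ST | AD | GD | CD."""
    status = 0
    if pkt.start_of_frame:
        status |= FLAG_START_OF_FRAME
    if pkt.imu_valid:
        status |= FLAG_IMU_VALID
    if pkt.button_pressed:
        status |= FLAG_BUTTON_PRESSED
    return HEADER.pack(status, *pkt.accel, *pkt.gyro) + pkt.camera


def parse_packet(raw: bytes) -> RingPacket:
    """Parse ST | AD | GD | CD back into a RingPacket."""
    if len(raw) < HEADER.size:
        raise ValueError("packet shorter than the assumed header")
    status, ax, ay, az, gx, gy, gz = HEADER.unpack_from(raw)
    return RingPacket(
        start_of_frame=bool(status & FLAG_START_OF_FRAME),
        imu_valid=bool(status & FLAG_IMU_VALID),
        button_pressed=bool(status & FLAG_BUTTON_PRESSED),
        accel=(ax, ay, az),
        gyro=(gx, gy, gz),
        camera=raw[HEADER.size:],
    )
```

Under these assumptions, a receiver would start a new image buffer whenever the Start of Frame flag is set and append the camera bytes of each following packet until the next one arrives.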
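
Steps (4) to (6) of the Figure 5 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: detections and database entries are plain dictionaries with hypothetical keys, embeddings are short lists rather than real DINOv2 vectors, and cosine similarity is an assumed choice, since the caption says only "highest similarity".

```python
import math


def center_object(detections, width, height):
    """CODA-style filter: keep the detection whose box center is
    closest to the center of the frame."""
    if not detections:
        return None
    cx, cy = width / 2, height / 2

    def distance(det):
        x1, y1, x2, y2 = det["box"]
        return math.hypot((x1 + x2) / 2 - cx, (y1 + y2) / 2 - cy)

    return min(detections, key=distance)


def cosine(a, b):
    """Cosine similarity (assumed metric) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def resolve_device(detections, frame_size, query_embedding, database):
    """Return the UUID of the device the user is pointing at, or None."""
    target = center_object(detections, *frame_size)
    if target is None:
        return None
    # The detected class narrows the search space to matching devices.
    candidates = [e for e in database if e["label"] == target["label"]]
    if not candidates:
        return None
    if len(candidates) == 1:
        # Only one instance of this class: the label identifies it.
        return candidates[0]["uuid"]
    # Several instances: pick the stored embedding most similar to the query.
    best = max(candidates, key=lambda e: cosine(query_embedding, e["embedding"]))
    return best["uuid"]


# Hypothetical example: two blinds and one lamp registered in the home.
database = [
    {"uuid": "blinds-1", "label": "blinds", "embedding": [1.0, 0.0]},
    {"uuid": "blinds-2", "label": "blinds", "embedding": [0.0, 1.0]},
    {"uuid": "lamp-1", "label": "lamp", "embedding": [0.5, 0.5]},
]
detections = [
    {"label": "lamp", "box": (0, 0, 40, 40)},
    {"label": "blinds", "box": (140, 100, 180, 140)},
]
chosen = resolve_device(detections, (320, 240), [0.1, 0.9], database)
```

In this example the blinds box sits at the frame center, two blinds are registered, and the query embedding is closest to the second one, so `chosen` is `"blinds-2"`.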