
AromaGen: Interactive Generation of Rich Olfactory Experiences with Multimodal Language Models

Yunge Wen, Awu Chen, Jianing Yu, Jas Brooks, Hiroshi Ishii, Paul Pu Liang

Abstract

Smell's deep connection with food, memory, and social experience has long motivated researchers to bring olfaction into interactive systems. Yet most olfactory interfaces remain limited to fixed scent cartridges and pre-defined generation patterns, and the scarcity of large-scale olfactory datasets has further constrained AI-based approaches. We present AromaGen, an AI-powered wearable interface capable of real-time, general-purpose aroma generation from free-form text or visual inputs. AromaGen is powered by a multimodal LLM that leverages latent olfactory knowledge to map semantic inputs to structured mixtures of 12 carefully selected base odorants, released through a neck-worn dispenser. Users can iteratively refine generated aromas through natural language feedback via in-context learning. Through a controlled user study ($N = 26$), AromaGen matches human-composed mixtures in zero-shot generation and significantly surpasses them after iterative refinement, achieving a median similarity of 8/10 to real food aromas and reducing perceived artificiality to levels comparable to real food. AromaGen is a step towards real-world interactive aroma generation, opening new possibilities for communication, wellbeing, and immersive technologies.


Paper Structure

This paper contains 48 sections, 13 figures, and 5 tables.

Figures (13)

  • Figure 1: Formative study 1 stimuli.
  • Figure 2: Polar chart of aroma descriptors elicited across Formative Study 1, grouped by olfactory category. Word proximity to the center reflects frequency of use across all 10 participants. Recurring descriptors such as sweet, roasted, and fresh suggest a shared perceptual vocabulary that informed the selection of base odorants in AromaGen.
  • Figure 3: The 12 base odorants used in AromaGen's palette.
  • Figure 4: The AromaGen system pipeline. Users initiate zero-shot generation via multimodal inputs (text, image, or speech), which the system translates into an initial odorant mixture vector. Through human-in-the-loop iterative refinement, users can then adjust the aroma using natural language.
  • Figure 5: An example of AromaGen's iterative refinement and internal reasoning process: semantic decomposition (e.g., identifying food components), projection into a perceptual space (e.g., savory, sour), and constrained allocation to a ratio vector over base odorants. User feedback is incorporated via in-context learning, where high-level adjustments (e.g., "less sour") are translated into targeted updates of the aroma mixture.
  • ...and 8 more figures
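The pipeline described above represents an aroma as a ratio vector over 12 base odorants and translates high-level feedback (e.g., "less sour") into targeted updates of that vector. A minimal sketch of this data structure and update step is below; the odorant names, function names, and numeric values are illustrative assumptions, not the authors' implementation (the paper's actual palette appears in Figure 3, and the LLM, not hand-written rules, performs the mapping).

```python
# Hypothetical sketch of an AromaGen-style mixture vector (all names and
# numbers are assumptions for illustration, not the paper's implementation).
# An aroma is a non-negative ratio vector over 12 base odorants; a feedback
# step scales one component and renormalizes so the ratios sum to 1.

ODORANTS = [
    "citrus", "floral", "green", "sweet", "roasted", "smoky",
    "sour", "savory", "creamy", "fruity", "minty", "woody",
]  # illustrative 12-odorant palette

def normalize(mixture: dict) -> dict:
    """Rescale a non-negative mixture so its components sum to 1."""
    total = sum(mixture.values())
    if total == 0:
        raise ValueError("mixture must have at least one nonzero component")
    return {k: v / total for k, v in mixture.items()}

def apply_feedback(mixture: dict, component: str, factor: float) -> dict:
    """Scale one odorant (e.g., 'less sour' -> factor < 1), then renormalize."""
    if component not in mixture:
        raise KeyError(f"unknown odorant: {component}")
    adjusted = dict(mixture)
    adjusted[component] *= factor
    return normalize(adjusted)

# Zero-shot mixture for a hypothetical "lemon tart" prompt (made-up ratios)
mixture = normalize({"citrus": 4.0, "sweet": 3.0, "creamy": 2.0, "sour": 1.0})
refined = apply_feedback(mixture, "sour", 0.5)  # user feedback: "less sour"
```

In the actual system, the multimodal LLM would emit and revise such a vector through in-context learning rather than a fixed scaling rule; the sketch only shows the constraint that the mixture remains a valid ratio vector after each refinement.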