Expanding the Generative AI Design Space through Structured Prompting and Multimodal Interfaces
Nimisha Karnatak, Adrien Baranes, Rob Marchant, Huinan Zeng, Tríona Butler, Kristen Olson
TL;DR
Small business owners (SBOs) struggle to translate brand intent into prompts and to keep control over generated content. The authors propose ACAI, a multimodal, panel-based interface that composes a structured 'super prompt' from Branding, Audience and Goals, and Inspiration Board inputs, guided by Gemini 1.5 Pro, to produce brand-aligned ad briefs rather than raw visuals. A formative study with six SBOs identifies promptability, brand alignment, and user control as key pain points, which ACAI addresses through context-rich, constraint-aware design. The work demonstrates that structured, multimodal interaction can broaden accessibility and improve co-creative workflows in advertising, suggesting a shift from traditional prompt-based interfaces toward more inclusive AI design paradigms.
Abstract
Text-based prompting remains the predominant interaction paradigm in generative AI, yet it often introduces friction for novice users such as small business owners (SBOs), who struggle to articulate creative goals in domain-specific contexts like advertising. Through a formative study with six SBOs in the United Kingdom, we identify three key challenges: difficulties in expressing brand intuition through prompts, limited opportunities for fine-grained adjustment and refinement during and after content generation, and the frequent production of generic content that lacks brand specificity. In response, we present ACAI (AI Co-Creation for Advertising and Inspiration), a multimodal generative AI tool designed to support novice designers by moving beyond traditional prompt interfaces. ACAI features a structured input system composed of three panels: Branding, Audience and Goals, and the Inspiration Board. These inputs allow users to convey brand-relevant context and visual preferences. This work contributes to HCI research on generative systems by showing how structured interfaces can foreground user-defined context, improve alignment, and enhance co-creative control in novice creative workflows.
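To make the three-panel input structure concrete, the following is a minimal sketch of how panel inputs might be joined into one structured prompt. It is an illustration only, not ACAI's implementation: the class names, the field names (`business_name`, `tone`, `audience`, `goal`, `references`), the section layout, and the closing task line are all assumptions, and no model call is made.

```python
from dataclasses import dataclass, field
from typing import List


# All fields below are hypothetical; the source does not list the panels' contents.
@dataclass
class Branding:
    business_name: str
    tone: str


@dataclass
class AudienceAndGoals:
    audience: str
    goal: str


@dataclass
class InspirationBoard:
    # Text descriptions stand in for the visual references a user might supply.
    references: List[str] = field(default_factory=list)


def compose_super_prompt(
    branding: Branding,
    audience_goals: AudienceAndGoals,
    board: InspirationBoard,
) -> str:
    """Join the three panel inputs into one labeled, structured prompt string."""
    sections = [
        ("Branding", [f"Business: {branding.business_name}",
                      f"Tone: {branding.tone}"]),
        ("Audience and Goals", [f"Audience: {audience_goals.audience}",
                                f"Goal: {audience_goals.goal}"]),
        # Fall back to an explicit marker so the section is never empty.
        ("Inspiration Board", board.references or ["(none provided)"]),
    ]
    blocks = []
    for title, lines in sections:
        body = "\n".join(f"- {line}" for line in lines)
        blocks.append(f"## {title}\n{body}")
    # Assumed task instruction: request an ad brief rather than a raw visual.
    task = "Task: write a brand-aligned ad brief using only the context above."
    return "\n\n".join(blocks + [task])
```

Calling `compose_super_prompt(Branding("Acme Bakery", "warm"), AudienceAndGoals("local families", "weekend footfall"), InspirationBoard())` returns a single string with one labeled section per panel, followed by the task line. Keeping each panel as its own labeled section is one way to keep user-defined context visible and editable, in the spirit of the fine-grained control the formative study calls for.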
