DuetML: Human-LLM Collaborative Machine Learning Framework for Non-Expert Users

Wataru Kawabe, Yusuke Sugano

TL;DR

DuetML addresses the difficulty non-experts have in translating vague goals into ML tasks by having humans and multimodal LLMs (MLLMs) collaborate during ML prototyping. It uses two agents, a passive (reactive) agent and an active (proactive) agent, that monitor user interactions and training data to provide context-aware guidance within a GUI. In a comparative user study, DuetML improved the alignment between user-defined categories and target tasks without increasing cognitive load, and multimodal feedback enhanced the guidance. The work demonstrates that human-MLLM collaboration can democratize task-specific ML development and informs the future design of interactive AI systems.

Abstract

Machine learning (ML) models have significantly impacted various domains in our everyday lives. While large language models (LLMs) offer intuitive interfaces and versatility, task-specific ML models remain valuable for their efficiency and focused performance in specialized tasks. However, developing these models requires technical expertise, making it particularly challenging for non-expert users to customize them for their unique needs. Although interactive machine learning (IML) aims to democratize ML development through user-friendly interfaces, users struggle to translate their requirements into appropriate ML tasks. We propose human-LLM collaborative ML as a new paradigm bridging human-driven IML and machine-driven LLM approaches. To realize this vision, we introduce DuetML, a framework that integrates multimodal LLMs (MLLMs) as interactive agents collaborating with users throughout the ML process. Our system carefully balances MLLM capabilities with user agency by implementing both reactive and proactive interactions between users and MLLM agents. Through a comparative user study, we demonstrate that DuetML enables non-expert users to define training data that better aligns with target tasks without increasing cognitive load, while offering opportunities for deeper engagement with ML task formulation.

Paper Structure

This paper contains 33 sections and 10 figures.

Figures (10)

  • Figure 1: (A) Our system DuetML aims to assist users without a technical background in appropriately formulating tasks and creating training data in machine learning prototyping. (B) The system uses a multimodal large language model (MLLM)-based assistant to elicit user needs and guide them interactively toward appropriate training data.
  • Figure 2: Each agent's data processing flow in DuetML. The passive agent receives the dialogue history, including the latest prompt about the user's request and training data. The active agent receives dialogue history and the training data. The generated advice is appended to the dialogue history.
  • Figure 3: The overview of DuetML. Users can create training data for the classification model (A), train the model with it (B), and evaluate the trained model's performance (C) in the IML area. During this IML process, the passive agent provides advice in response to either the user's chat input (D) or button inputs related to training data (E) or inference results (F). Additionally, the active agent monitors the user's overall interaction and periodically offers advice. Users can toggle this feature on or off as they prefer (G). All the advice from the agents is shown in the chat area.
  • Figure 4: An example use case scenario of DuetML.
  • Figure 5: The baseline system for the user study. Apart from lacking DuetML's chat area, its UI design is consistent with DuetML.
  • ...and 5 more figures
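
The agent data flow described in the captions of Figures 2 and 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Message`, `Session`, `passive_step`, `active_step`, and `echo_mllm` are hypothetical, and the MLLM is replaced by a stub function so the example runs without any external service. The sketch only shows the pattern the captions state: the passive agent responds to the latest user prompt, the active agent acts on the dialogue history and training data when the user has left it enabled, and all advice is appended to the shared dialogue history.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Message:
    role: str      # "user", "passive_agent", or "active_agent" (hypothetical labels)
    content: str


@dataclass
class Session:
    # Shared dialogue history that both agents read from and append to.
    history: List[Message] = field(default_factory=list)
    # Assumed representation: category name -> list of sample identifiers.
    training_data: Dict[str, List[str]] = field(default_factory=dict)
    # User-controlled toggle for the active agent (item G in Figure 3).
    active_enabled: bool = True


# Hypothetical stand-in for an MLLM call: takes history and training data,
# returns advice text. A real system would call a multimodal model here.
MLLM = Callable[[List[Message], Dict[str, List[str]]], str]


def passive_step(session: Session, user_prompt: str, mllm: MLLM) -> str:
    """React to a user request (chat input or button press)."""
    session.history.append(Message("user", user_prompt))
    advice = mllm(session.history, session.training_data)
    session.history.append(Message("passive_agent", advice))
    return advice


def active_step(session: Session, mllm: MLLM) -> Optional[str]:
    """Offer unprompted advice; intended to be called periodically."""
    if not session.active_enabled:
        return None
    advice = mllm(session.history, session.training_data)
    session.history.append(Message("active_agent", advice))
    return advice


def echo_mllm(history: List[Message], training_data: Dict[str, List[str]]) -> str:
    """Stub model that only reports what it was given."""
    return f"seen {len(history)} messages, {len(training_data)} categories"
```

In this sketch, both agents share one history list, so advice from either agent becomes context for later calls. How often the active agent runs, and how images are encoded for the model, are left open because the captions do not specify them.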