MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf

Lingxiang Hu, Shurun Yuan, Xiaoting Qin, Jue Zhang, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

TL;DR

The paper investigates using LLMs as meeting delegates that represent individual participants in multi-person meetings. It introduces a prototype system with three components (information gathering, meeting engagement, and voice generation) and a benchmark dataset derived from real transcripts to evaluate the timing and content of delegated contributions across four scene types. Among the LLMs evaluated, GPT-4/4o show the most balanced performance, while other models lean toward either cautious or active engagement. Overall, about 60% of responses address at least one key point from the ground truth, and performance is sensitive to transcription noise and degraded by irrelevant or repetitive content. The study also proposes a phased deployment approach (Execute, Assist, Delegate) to balance autonomy with privacy, emphasizes privacy-by-design and human-in-the-loop governance, and presents real-world demos to illustrate the practical potential and challenges of applying LLMs to alleviate meeting burdens.

Abstract

In contemporary workplaces, meetings are essential for exchanging ideas and ensuring team alignment but often face challenges such as time consumption, scheduling conflicts, and inefficient participation. Recent advancements in Large Language Models (LLMs) have demonstrated their strong capabilities in natural language generation and reasoning, prompting the question: can LLMs effectively act as delegates for participants in meetings? To explore this, we develop a prototype LLM-powered meeting delegate system and create a comprehensive benchmark using real meeting transcripts. Our evaluation reveals that GPT-4/4o maintain balanced performance between active and cautious engagement strategies. In contrast, Gemini 1.5 Pro tends to be more cautious, while Gemini 1.5 Flash and Llama3-8B/70B display more active tendencies. Overall, about 60% of responses address at least one key point from the ground truth. However, improvements are needed to reduce irrelevant or repetitive content and enhance tolerance for transcription errors commonly found in real-world settings. Additionally, we implement the system in practical settings and collect real-world feedback from demos. Our findings underscore the potential and challenges of utilizing LLMs as meeting delegates, offering valuable insights into their practical application for alleviating the burden of meetings.
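
To make the "about 60% of responses address at least one key point" figure concrete, the following is a minimal sketch of how such a rate could be computed. It assumes that each response has already been scored with the number of ground-truth key points it covers; how that matching is done is not described in this summary, and the function name `key_point_hit_rate` and the sample counts are hypothetical, not taken from the paper.

```python
from typing import Sequence


def key_point_hit_rate(matched_counts: Sequence[int]) -> float:
    """Fraction of responses that cover at least one ground-truth key point.

    matched_counts[i] is the (assumed, pre-computed) number of key points
    that response i addresses.
    """
    if not matched_counts:
        raise ValueError("need at least one response")
    hits = sum(1 for n in matched_counts if n > 0)
    return hits / len(matched_counts)


# Hypothetical scores for five responses: three of them hit a key point.
counts = [2, 0, 1, 0, 3]
print(f"{key_point_hit_rate(counts):.0%}")  # prints "60%"
```

Note that this counts a response as a hit as soon as it covers one key point, so it says nothing about how many key points are covered in total.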

Paper Structure

This paper contains 16 sections, 10 figures, and 27 tables.

Figures (10)

  • Figure 1: Architecture of the meeting delegate system.
  • Figure 2: Workflow of an LLM-powered meeting delegate system. The process involves user input of meeting intent and shareable information prior to the meeting, real-time participation based on meeting transcripts, and response generation aligned with prompted instructions and meeting objectives.
  • Figure 3: Data statistics of the Matched Dataset.
  • Figure 4: Response Rate on Matched Dataset vs. Silence Rate on Mismatched Dataset.
  • Figure 5: Solution directions from error analysis of bad cases in Response (Silence) Rate for Matched and Mismatched Datasets.
  • ...and 5 more figures