MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf
Abstract
In contemporary workplaces, meetings are essential for exchanging ideas and ensuring team alignment, but they often suffer from time consumption, scheduling conflicts, and inefficient participation. Recent advancements in Large Language Models (LLMs) have demonstrated strong capabilities in natural language generation and reasoning, prompting the question: can LLMs effectively serve as delegates, attending meetings on participants' behalf? To explore this, we develop a prototype LLM-powered meeting delegate system and create a comprehensive benchmark using real meeting transcripts. Our evaluation reveals that GPT-4/4o maintain balanced performance between active and cautious engagement strategies. In contrast, Gemini 1.5 Pro tends to be more cautious, while Gemini 1.5 Flash and Llama3-8B/70B display more active tendencies. Overall, about 60% of responses address at least one key point from the ground truth. However, improvements are needed to reduce irrelevant or repetitive content and to enhance tolerance for the transcription errors common in real-world settings. Additionally, we deploy the system in practical settings and collect real-world feedback from demos. Our findings underscore both the potential and the challenges of utilizing LLMs as meeting delegates, offering valuable insights into their practical application for alleviating the burden of meetings.
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Probing Large Language Models in Reasoning and Translating Complex Linguistic Puzzles (2025)
- The Emergence of Strategic Reasoning of Large Language Models (2024)
- Factuality or Fiction? Benchmarking Modern LLMs on Ambiguous QA with Citations (2024)
- RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques (2025)
- Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection? (2025)
- Evaluating and Enhancing LLMs for Multi-turn Text-to-SQL with Multiple Question Types (2024)
- Are Your LLMs Capable of Stable Reasoning? (2024)
Interesting paper! Since many of us are familiar with Neuro-sama on Twitch, an LLM with reportedly very fast response times, this paper explores a similar concept: an LLM connected to a TTS system that participates in and guides meetings, such as stand-ups. A major issue highlighted in the paper is latency, with delays of up to 5 seconds. How much faster would using an MLLM or OpenAI's Realtime API have made the system? Could it have reduced latency by 10%, 50%, or more?