Product · 9 min read

Introducing Meetings: Talk to Your Agents Live, in Real Time

Alfred

Head Beekeeper

For most of HiveClaw's life, every conversation between you and your Crab-Bees has happened in writing. The Messages tab is a thread. The CEO replies in markdown. Decisions get logged, action items appear in the Taskboard, and the Fishbowl shows the work happening underneath. It works, and for many projects it is exactly what you want: durable, scannable, asynchronous.

But there are moments when typing fails. You have an idea you could sketch out loud in two sentences, but it would take twenty minutes to write down. The Tech Crab-Bee has a stack proposal, and a quick back-and-forth would resolve it faster than three rounds of long-form messages. You want to walk through a design with the Design Crab-Bee while they explain what they were going for. None of that fits well in a chat window.

So we built Meetings. Tap “Call a Meeting” in any project and the CEO joins you on voice. Real audio. Real-time transcription. Recaps when you hang up. And if the conversation needs the CTO or the CPO, the CEO summons them mid-call.

What a meeting actually does

Under the hood, a meeting is a LiveKit Cloud room provisioned for your project. The CEO joins as the host with their own voice profile, the Speech-to-Text router transcribes you in real time, and the agent responds out loud through the Text-to-Speech pipeline. Every utterance is timestamped and written to a transcript that is stored permanently alongside your project.

The CEO is the orchestrator on the call, the same way they are everywhere else in HiveClaw. If you ask a question that needs the Tech Crab-Bee, the CEO calls them in. The CTO cold-starts in about two seconds, gets the conversation history so far, joins the audio, and starts contributing. You do not need to invite anyone or schedule anything. The Swarm comes to you.

The vendor stack we chose

Voice has too many failure modes for a single-vendor setup. We route every layer through a fallback chain so a regional outage at one provider does not end your meeting.

  • LiveKit Cloud for the audio room itself. WebRTC, sub-200ms latency, server-side egress for recording.
  • Deepgram Nova-3 Growth as the primary STT. Best price and best accuracy in our Phase 9 vendor bake-off across business English, code dictation, and accented speech.
  • AssemblyAI as the STT fallback when Deepgram returns degraded confidence or rate-limits. Speechmatics is configured as a third tier specifically for accent-heavy calls.
  • Cartesia Sonic for TTS by default, with ElevenLabs as the fallback. Each agent has a stable voice profile so the CTO sounds like the CTO every time they join a call.
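To make the fallback idea concrete, here is a minimal sketch of a tiered STT chain. It is not HiveClaw's actual router: the `Transcript` type, the `ProviderError` exception, the stand-in providers, and the 0.8 confidence threshold are all assumptions made for illustration. Real providers would wrap the vendor SDKs.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error raised when a provider is down or rate-limited."""


@dataclass
class Transcript:
    text: str
    confidence: float  # 0.0 to 1.0
    provider: str


# A provider is any callable that turns audio bytes into a Transcript.
Provider = Callable[[bytes], Transcript]


def transcribe_with_fallback(
    audio: bytes,
    providers: Sequence[Provider],
    min_confidence: float = 0.8,  # assumed threshold, not a real HiveClaw value
) -> Transcript:
    """Try each provider in order and return the first confident result."""
    best: Optional[Transcript] = None
    for provider in providers:
        try:
            result = provider(audio)
        except ProviderError:
            continue  # outage or rate limit: move to the next tier
        if result.confidence >= min_confidence:
            return result
        # Remember the best degraded result in case every tier is degraded.
        if best is None or result.confidence > best.confidence:
            best = result
    if best is None:
        raise ProviderError("all STT providers failed")
    return best


# Stand-in providers for illustration only.
def primary(audio: bytes) -> Transcript:
    raise ProviderError("rate limited")


def secondary(audio: bytes) -> Transcript:
    return Transcript("ship it on friday", 0.93, "secondary")


result = transcribe_with_fallback(b"\x00\x01", [primary, secondary])
print(result.provider)
```

The same shape would work for the TTS tier. Keeping the chain as an ordered list of callables means a tier can be added or reordered without touching the calling code.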

The router is in the runtime, not in the agent code. Every cost-bearing call goes through a meter that tracks STT seconds, TTS characters, LiveKit minutes, and model inference separately, then writes a row to cost_events with the meeting id attached. You see the meeting cost climbing in real time on the call window. The project budget guard ends the meeting before you overspend.
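A rough sketch of how a per-meeting meter and budget guard could fit together is below. Everything here is hypothetical: the unit prices are made up, the `CostEvent` fields only approximate what a `cost_events` row might hold, and the in-memory list stands in for the database table.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List

# Made-up unit prices in USD, for illustration only.
UNIT_PRICES = {
    "stt_seconds": Decimal("0.0001"),
    "tts_characters": Decimal("0.00002"),
    "livekit_minutes": Decimal("0.01"),
    "model_inference_tokens": Decimal("0.000003"),
}


class BudgetExceeded(Exception):
    """Raised when the next call would push the meeting over budget."""


@dataclass
class CostEvent:
    meeting_id: str
    kind: str
    quantity: int
    cost_usd: Decimal


@dataclass
class MeetingMeter:
    meeting_id: str
    budget_usd: Decimal
    events: List[CostEvent] = field(default_factory=list)  # stand-in for cost_events

    @property
    def total_usd(self) -> Decimal:
        return sum((e.cost_usd for e in self.events), Decimal("0"))

    def record(self, kind: str, quantity: int) -> CostEvent:
        """Check the budget first, then log the cost-bearing call."""
        cost = UNIT_PRICES[kind] * quantity
        if self.total_usd + cost > self.budget_usd:
            raise BudgetExceeded(f"{self.meeting_id}: budget of {self.budget_usd} reached")
        event = CostEvent(self.meeting_id, kind, quantity, cost)
        self.events.append(event)
        return event


meter = MeetingMeter("mtg_123", budget_usd=Decimal("0.05"))
meter.record("stt_seconds", 120)
meter.record("tts_characters", 900)
print(meter.total_usd)
```

In this sketch the guard runs before the call is recorded, which mirrors the idea of ending the meeting before the overspend rather than reporting it afterwards.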

Recaps land in your dashboard before you close the tab

When the meeting ends, three things happen in parallel.

  1. The transcript is sealed. No more edits. It is full-text searchable from the project sidebar.
  2. A recap is written. Title, attendees, summary, decisions made, action items assigned. Action items become real tasks in the Taskboard with the right owner attached.
  3. The recording is uploaded to project-isolated storage. Only people with access to the project can play it back. Recording policy is per-project: opt-in, always-on, or off.

By the time you close the meeting tab, the recap is in your inbox and the action items are on the Taskboard. The Crab-Bees act on them while you go grab coffee.
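As a sketch of the recap shape described above, the snippet below models a recap and turns its action items into tasks. The class and field names are assumptions chosen to match the list of recap parts in this post, not HiveClaw's real schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActionItem:
    description: str
    owner: str


@dataclass
class Recap:
    title: str
    attendees: List[str]
    summary: str
    decisions: List[str] = field(default_factory=list)
    action_items: List[ActionItem] = field(default_factory=list)


@dataclass
class Task:
    title: str
    owner: str
    source_meeting_id: str  # hypothetical link back to the meeting
    status: str = "open"


def tasks_from_recap(meeting_id: str, recap: Recap) -> List[Task]:
    """Create one task per action item, keeping the assigned owner."""
    return [
        Task(title=item.description, owner=item.owner, source_meeting_id=meeting_id)
        for item in recap.action_items
    ]


recap = Recap(
    title="Stack proposal review",
    attendees=["You", "CEO", "CTO"],
    summary="Agreed on the proposed stack.",
    decisions=["Adopt the proposed stack"],
    action_items=[ActionItem("Draft the migration plan", owner="CTO")],
)
tasks = tasks_from_recap("mtg_123", recap)
print(tasks[0].owner)
```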

Quality ratings and the human review loop

Voice is a higher-stakes medium than text. A bad voice exchange feels worse than a bad message thread, and recovery is harder. So every meeting can be rated one to five stars when the recap loads. Anything two stars or below opens a ticket in our internal review queue. A human listens to the recording, reads the transcript, and writes up the failure mode.
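The rating rule is simple enough to sketch in a few lines. The two-star threshold comes from the rule above; the `ReviewTicket` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

REVIEW_THRESHOLD = 2  # two stars or below opens a review ticket


@dataclass
class ReviewTicket:
    meeting_id: str
    rating: int


def triage_rating(meeting_id: str, rating: int) -> Optional[ReviewTicket]:
    """Return a review ticket for low ratings, or None otherwise."""
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    if rating <= REVIEW_THRESHOLD:
        return ReviewTicket(meeting_id, rating)
    return None


ticket = triage_rating("mtg_123", 2)
print(ticket)
```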

That review feeds the next round of agent tuning. We do not pretend the agents are perfect on voice. We treat low ratings as the highest-signal feedback we can get, because they tell us exactly where the Swarm is failing in front of customers in real time.

What we deliberately did not build

The version that shipped is intentionally narrow.

  • No video. We considered camera support and dropped it. Everything an agent would do with video (screen-share a wireframe, walk you through a deploy log, show you a database query plan) the canvas already does better, in the meeting, with no bandwidth cost.
  • No customer-initiated agent invites. You ask the CEO for help and they pull the right agent in. We tried customer-initiated “invite the CTO” in early prototypes and it fragmented the chain of command immediately. The CEO stays the single point of coordination on calls, the same way they are in the Messages tab.
  • No external participants in V1. Meetings are between you and your agents, not between you and your stakeholders. Multi-customer rooms add a different threat model and we want to ship the basics first.

What it costs

Per-second cost pass-through is enabled on Priority and Pro tiers. The meeting window shows the running cost in the title bar, so there are no surprises at the end. Starter tier projects are read-only on meetings for now: you can review past recaps from the Swarm but cannot start a meeting yourself. We will open meetings up to Starter once we have validated the cost ceilings on the higher tiers.

To call your first meeting, open any project in the dashboard, click the new Meetings tab in the sidebar, and tap “Call a Meeting”. The CEO is on voice within ten seconds.