Advanced — Flickki

Anatomy of a conversation

Eight things happen between a ping and a booked job.

A walkthrough of the runtime path for any inbound conversation — a phone call, a WhatsApp message, a web chat session, an SMS, an email thread. The same loop handles all of them.

Inbound signal

A message arrives on one of your connected channels. The channel adapter normalises it into a single event shape — the rest of the runtime doesn't care whether it came from PSTN, WhatsApp, or a web widget.

Session opens

The runtime creates or joins a session for that conversation and attaches the compiled assistant that matches the rule you set.

Assistant greets

The assistant loads its tone, glossary, and tool list, then posts or plays its opening line on whatever channel the conversation is on.

Input → LLM loop

Audio gets transcribed, text flows straight in, attachments ride along as context. The LLM sees a rolling transcript and the tool schema and decides what to do next.

Tool dispatch

When the LLM decides to call a tool, the runtime invokes the server-side executor, captures the result, and feeds it back into the loop.

Reply streamed

Text replies post instantly as messages. Voice replies stream through TTS, interruptible at word level. Rich cards, links, and attachments ride the same path.

Session settles

On hangup, thread close, or idle timeout, minutes and message units are reconciled against your plan and your channel cost.

Transcript stored

Full transcript, structured collected fields, and the tool-call log land in your portal. The run is searchable and exportable.

Runtime model catalog

Pick the right tradeoff for each part of the loop.

Flickki can run a classic STT → LLM → TTS pipeline or a voice-native model that handles audio directly. These are the production and beta options currently wired into the runtime.

Workload Run by Sort Search

Workload	Model	Run by	Good for	Price	Quality	Latency
LLM	Claude Sonnet 4.6Default for careful setup and complex assistant behavior.	Anthropic API	Nuanced conversation, tool use, high-stakes drafting.	$$$$$3 in / $15 out per 1M tokens	★★★	450 ms
LLM	Gemini 2.5 FlashDefault in-call LLM for new assistants.	Google API	Low-cost real-time turns, extraction, routing, normal support calls.	$$0.10 in / $0.40 out per 1M tokens	★★☆	300 ms
LLM	Gemini 2.5 ProHigher-reasoning Google option.	Google API	Structured extraction, deeper reasoning, complicated policy handling.	$$$$1.25 in / $5 out per 1M tokens	★★★	700 ms
LLM	Llama 3.3 70BOpen-weight Llama served on Groq LPUs.	Groq API	Fast response starts with stronger open-weight reasoning.	$$$0.59 in / $0.79 out per 1M tokens	★★★	250 ms
LLM	Llama 3.1 8B InstantSmall open-weight model optimized for speed.	Groq API	Short transactional turns where latency matters more than depth.	$$0.05 in / $0.08 out per 1M tokens	★★☆	150 ms
LLM	GPT-4oOpenAI flagship chat model.	OpenAI API	General reasoning, tool use, fallback compatibility.	$$$$$2.50 in / $10 out per 1M tokens	★★★	500 ms
LLM	GPT-4o miniCheaper OpenAI chat model.	OpenAI API	Simple assistants, summarization, low-cost OpenAI routing.	$$0.15 in / $0.60 out per 1M tokens	★★☆	300 ms
Voice-native	Gemini Live 2.5Audio in and audio out without separate STT/TTS.	Google API	Lowest-latency voice loops, interruption-heavy conversations.	$$$0.50 in / $2 out per 1M tokens + audio	★★☆	120 ms
Voice-native	OpenAI RealtimeNative audio model with strong tool adherence.	OpenAI API	Warm voice, complex realtime tool use, premium voice interactions.	$$$$$$5 in / $20 out per 1M tokens + audio	★★★	200 ms
STT	Deepgram Nova-2 PhonecallDefault speech-to-text model for calls.	Deepgram API	Telephone audio, voice agents, noisy inbound calls.	$$$0.005 per audio minute	★★☆	250 ms
STT	Deepgram Nova-3Higher-quality Deepgram transcription.	Deepgram API	Cleaner transcripts when accuracy matters more than a small cost bump.	$$$0.007 per audio minute	★★★	300 ms
STT	Google Cloud Speech latest_shortGoogle STT optimized for short utterances.	Google Cloud	Short voice commands and faster turn-taking.	$$$$0.012 per audio minute	★★☆	250 ms
STT	Google Cloud Speech latest_longGoogle STT for general spoken input.	Google Cloud	Varied speech and longer utterances.	$$$$0.012 per audio minute	★★☆	350 ms
STT	OpenAI WhisperBatch-mode transcription fallback.	OpenAI API	Fallback transcription and non-realtime files.	$$$0.006 per audio minute	★★☆	800 ms
TTS	Google TTS Standard-IDefault neutral US English voice.	Google Cloud	Cheap, clear, neutral voice for everyday callers.	$$0.004 per 1K characters	★★☆	250 ms
TTS	Google TTS Neural2Male and female US English voices.	Google Cloud	Higher-quality Google voices with predictable pronunciation.	$$$0.016 per 1K characters	★★★	300 ms
TTS	OpenAI TTS-1Fast OpenAI speech model.	OpenAI API	OpenAI fallback voice with simple integration.	$$$0.015 per 1K characters	★★☆	400 ms
TTS	OpenAI TTS-1 HDHigher-quality OpenAI speech model.	OpenAI API	Polished voiceovers and slower premium TTS fallback.	$$$$0.030 per 1K characters	★★★	700 ms
TTS	Deepgram Aura-2Andromeda English voice.	Deepgram API	Fast TTS when STT and TTS should share one vendor.	$$$$0.030 per 1K characters	★★★	150 ms
TTS	Cartesia Sonic-2Katie, Blake, and Sarah voices.	Cartesia API	Lowest-latency TTS, warm voice agents, interruption-heavy calls.	$$$$$$0.065 per 1K characters	★★★	60 ms
TTS	ElevenLabs Turbo v2.5Fast ElevenLabs voice option.	ElevenLabs API	Recognizable premium voice quality with lower latency.	$$$$$0.050 per 1K characters	★★★	300 ms
TTS	ElevenLabs Multilingual v2Multilingual ElevenLabs voice option.	ElevenLabs API	Higher-quality multilingual speech and branded voices.	$$$$$$0.060 per 1K characters	★★★	550 ms
VAD	Silero VADOpen-source voice activity detector.	Flickki worker	Detecting when a caller starts and stops speaking.	FreeRuns locally in the worker	★★☆	10 ms

Prices are provider pass-through estimates from the current runtime registry.

A deeper toolbox

Power users get a sharper set of tools.

The interview is for normal people. If you already know what you're doing, Flickki gets out of your way and gives you a clean, versioned, pasteable source of truth.

sam.assistant.md

tools.yml

glossary.md

v14 · saved

12345 678910 1112131415 1617

---
name: Sam — front-desk assistant
business: Delaney Plumbing
starter: library/front-desk-assistant
llm: claude-sonnet-4-6
voice: elevenlabs/nicole
skills:
  - message.take
  - urgency.triage
  - appointment.book
tools:
  - calendar.book
  - webhook.post    # → our CRM
escalate_when:
  - caller says "flooding"
  - caller asks for the owner by name
---

markdown utf-8 LF ⏱ diffed against v13 · ✓ 8 evals passing

🛠️

Tools you can wire to anything

The webhook tool is a full escape hatch — point it at your Supabase function, your Zapier scenario, your internal API. Structured inputs and outputs mean the LLM knows exactly what it can do.

🔁

Versioning and evals

Every assistant keeps a history. Roll back, diff, replay a transcript against a new version. Catch regressions before they hit production.

For those who build things

Under the hood, it's just Markdown and tools.

The interview is for everyone. This is for people who want to know what happens between a message arriving and an outcome landing in your portal — and who want to wire the underlying pieces themselves.

Assistants are plain Markdown with YAML front-matter. Paste one in, edit it, diff it, check it into git. The interview just writes the same file under the hood.

---
name: Sam — front desk assistant
business: Delaney Plumbing
starter: library/front-desk-assistant
voice: elevenlabs/nicole
llm: claude-sonnet-4-6
skills:
  - message.take
  - urgency.triage
  - appointment.book
glossary:
  - hydrojet
  - backflow preventer
  - Outer Sunset
tools:
  - calendar.book
  - webhook.post    # → our CRM
escalate_when:
  - caller says "flooding"
  - caller asks for the owner by name
---

Channels in, tools in the middle, outcomes out.