Channels in, tools in the middle, outcomes out.

Every Flickki assistant is a real-time loop. A message arrives on a channel, the runtime spins up or joins an assistant, the LLM decides what to say and which tools to call, and the result lands on one of three outcomes — booked, escalated, or noted.

Tool calls are first-class. The LLM doesn't just talk — it books a slot, sends a message, pokes a webhook, transfers a call. Every invocation is logged, typed, and replayable.

Eight things happen between a ping and a booked job.

A walkthrough of the runtime path for any inbound conversation — a phone call, a WhatsApp message, a web chat session, an SMS, an email thread. The same loop handles all of them.

01

Inbound signal

A message arrives on one of your connected channels. The channel adapter normalises it into a single event shape — the rest of the runtime doesn't care whether it came from PSTN, WhatsApp, or a web widget.
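As a rough illustration of what "a single event shape" could look like, here is a minimal Python sketch. The `InboundEvent` fields, the adapter function names, and the payload keys are all assumptions made for this example; they are not Flickki's actual schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class InboundEvent:
    """One channel-agnostic event shape (field names are assumptions)."""
    channel: str                 # e.g. "voice", "whatsapp", "web_chat", "sms", "email"
    conversation_id: str         # stable id used to create or join a session
    sender: str
    text: Optional[str] = None   # None while voice audio is still untranscribed
    attachments: Tuple[str, ...] = ()


def from_whatsapp(payload: dict) -> InboundEvent:
    # Hypothetical WhatsApp webhook keys: chat_id, from, body, media.
    return InboundEvent(
        channel="whatsapp",
        conversation_id=payload["chat_id"],
        sender=payload["from"],
        text=payload.get("body"),
        attachments=tuple(payload.get("media", [])),
    )


def from_sms(payload: dict) -> InboundEvent:
    # Hypothetical SMS gateway keys: thread, sender, message.
    return InboundEvent(
        channel="sms",
        conversation_id=payload["thread"],
        sender=payload["sender"],
        text=payload.get("message"),
    )
```

Each adapter owns its channel's quirks, so everything downstream only ever handles `InboundEvent`.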

02

Session opens

The runtime creates or joins a session for that conversation and attaches the compiled assistant that matches the rule you set.
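The create-or-join step can be sketched in a few lines of Python. `SessionRegistry`, its method names, and the idea of keying routing rules by channel are assumptions for illustration only; the real rule matching may be richer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Session:
    conversation_id: str
    assistant: str                      # name of the attached assistant
    events: List[str] = field(default_factory=list)


class SessionRegistry:
    def __init__(self, rules: Dict[str, str]):
        self._rules = rules             # assumed rule shape: channel -> assistant name
        self._open: Dict[str, Session] = {}

    def open_or_join(self, conversation_id: str, channel: str) -> Session:
        """Return the live session for a conversation, creating it on first contact."""
        session = self._open.get(conversation_id)
        if session is None:
            session = Session(conversation_id, self._rules[channel])
            self._open[conversation_id] = session
        return session

    def close(self, conversation_id: str) -> None:
        self._open.pop(conversation_id, None)
```

A second message on the same conversation joins the existing session instead of spawning a new assistant.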

03

Assistant greets

The assistant loads its tone, glossary, and tool list, then posts or plays its opening line on whatever channel the conversation is on.

04

Input → LLM loop

Audio gets transcribed, text flows straight in, attachments ride along as context. The LLM sees a rolling transcript and the tool schema and decides what to do next.

05

Tool dispatch

When the LLM decides to call a tool, the runtime invokes the server-side executor, captures the result, and feeds it back into the loop.
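A minimal sketch of this dispatch step, assuming a registry of plain Python callables as the server-side executors. `ToolDispatcher`, `ToolCallRecord`, and the sample `calendar.book` executor are hypothetical names, not the runtime's real classes.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToolCallRecord:
    tool: str
    arguments: dict
    result: Any
    ok: bool
    duration_ms: float


class ToolDispatcher:
    def __init__(self) -> None:
        self._executors: Dict[str, Callable[..., Any]] = {}
        self.log: List[ToolCallRecord] = []   # every invocation is recorded here

    def register(self, name: str, executor: Callable[..., Any]) -> None:
        self._executors[name] = executor

    def dispatch(self, name: str, arguments: dict) -> ToolCallRecord:
        """Run one tool call and return a record to feed back into the LLM loop."""
        start = time.perf_counter()
        try:
            result, ok = self._executors[name](**arguments), True
        except Exception as exc:              # unknown tool or executor failure
            result, ok = {"error": str(exc)}, False
        record = ToolCallRecord(name, arguments, result, ok,
                                (time.perf_counter() - start) * 1000)
        self.log.append(record)
        return record


def book_slot(day: str, hour: int) -> dict:
    # Stand-in executor; a real one would call a calendar backend.
    return {"status": "booked", "slot": f"{day} {hour:02d}:00"}
```

Failures are returned as records rather than raised, so the LLM can see the error and decide what to say next.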

06

Reply streamed

Text replies post instantly as messages. Voice replies stream through TTS, interruptible at word level. Rich cards, links, and attachments ride the same path.

07

Session settles

On hangup, thread close, or idle timeout, minutes and message units are reconciled against your plan and your channel cost.

08

Transcript stored

Full transcript, structured collected fields, and the tool-call log land in your portal. The run is searchable and exportable.

Pick the right tradeoff for each part of the loop.

Flickki can run a classic speech-to-text (STT) → LLM → text-to-speech (TTS) pipeline or a voice-native model that handles audio directly. These are the production and beta options currently wired into the runtime.

| Workload | Model | Run by | Good for | Price | Quality | Latency |
|---|---|---|---|---|---|---|
| LLM | Claude Sonnet 4.6: Default for careful setup and complex assistant behavior. | Anthropic API | Nuanced conversation, tool use, high-stakes drafting. | $$$$ ($3 in / $15 out per 1M tokens) | ★★★ | 450 ms |
| LLM | Gemini 2.5 Flash: Default in-call LLM for new assistants. | Google API | Low-cost real-time turns, extraction, routing, normal support calls. | $ ($0.10 in / $0.40 out per 1M tokens) | ★★☆ | 300 ms |
| LLM | Gemini 2.5 Pro: Higher-reasoning Google option. | Google API | Structured extraction, deeper reasoning, complicated policy handling. | $$$ ($1.25 in / $5 out per 1M tokens) | ★★★ | 700 ms |
| LLM | Llama 3.3 70B: Open-weight Llama served on Groq LPUs. | Groq API | Fast response starts with stronger open-weight reasoning. | $$ ($0.59 in / $0.79 out per 1M tokens) | ★★★ | 250 ms |
| LLM | Llama 3.1 8B Instant: Small open-weight model optimized for speed. | Groq API | Short transactional turns where latency matters more than depth. | $ ($0.05 in / $0.08 out per 1M tokens) | ★★☆ | 150 ms |
| LLM | GPT-4o: OpenAI flagship chat model. | OpenAI API | General reasoning, tool use, fallback compatibility. | $$$$ ($2.50 in / $10 out per 1M tokens) | ★★★ | 500 ms |
| LLM | GPT-4o mini: Cheaper OpenAI chat model. | OpenAI API | Simple assistants, summarization, low-cost OpenAI routing. | $ ($0.15 in / $0.60 out per 1M tokens) | ★★☆ | 300 ms |
| Voice-native | Gemini Live 2.5: Audio in and audio out without separate STT/TTS. | Google API | Lowest-latency voice loops, interruption-heavy conversations. | $$ ($0.50 in / $2 out per 1M tokens + audio) | ★★☆ | 120 ms |
| Voice-native | OpenAI Realtime: Native audio model with strong tool adherence. | OpenAI API | Warm voice, complex realtime tool use, premium voice interactions. | $$$$$ ($5 in / $20 out per 1M tokens + audio) | ★★★ | 200 ms |
| STT | Deepgram Nova-2 Phonecall: Default speech-to-text model for calls. | Deepgram API | Telephone audio, voice agents, noisy inbound calls. | $$ ($0.005 per audio minute) | ★★☆ | 250 ms |
| STT | Deepgram Nova-3: Higher-quality Deepgram transcription. | Deepgram API | Cleaner transcripts when accuracy matters more than a small cost bump. | $$ ($0.007 per audio minute) | ★★★ | 300 ms |
| STT | Google Cloud Speech latest_short: Google STT optimized for short utterances. | Google Cloud | Short voice commands and faster turn-taking. | $$$ ($0.012 per audio minute) | ★★☆ | 250 ms |
| STT | Google Cloud Speech latest_long: Google STT for general spoken input. | Google Cloud | Varied speech and longer utterances. | $$$ ($0.012 per audio minute) | ★★☆ | 350 ms |
| STT | OpenAI Whisper: Batch-mode transcription fallback. | OpenAI API | Fallback transcription and non-realtime files. | $$ ($0.006 per audio minute) | ★★☆ | 800 ms |
| TTS | Google TTS Standard-I: Default neutral US English voice. | Google Cloud | Cheap, clear, neutral voice for everyday callers. | $ ($0.004 per 1K characters) | ★★☆ | 250 ms |
| TTS | Google TTS Neural2: Male and female US English voices. | Google Cloud | Higher-quality Google voices with predictable pronunciation. | $$ ($0.016 per 1K characters) | ★★★ | 300 ms |
| TTS | OpenAI TTS-1: Fast OpenAI speech model. | OpenAI API | OpenAI fallback voice with simple integration. | $$ ($0.015 per 1K characters) | ★★☆ | 400 ms |
| TTS | OpenAI TTS-1 HD: Higher-quality OpenAI speech model. | OpenAI API | Polished voiceovers and slower premium TTS fallback. | $$$ ($0.030 per 1K characters) | ★★★ | 700 ms |
| TTS | Deepgram Aura-2: Andromeda English voice. | Deepgram API | Fast TTS when STT and TTS should share one vendor. | $$$ ($0.030 per 1K characters) | ★★★ | 150 ms |
| TTS | Cartesia Sonic-2: Katie, Blake, and Sarah voices. | Cartesia API | Lowest-latency TTS, warm voice agents, interruption-heavy calls. | $$$$$ ($0.065 per 1K characters) | ★★★ | 60 ms |
| TTS | ElevenLabs Turbo v2.5: Fast ElevenLabs voice option. | ElevenLabs API | Recognizable premium voice quality with lower latency. | $$$$ ($0.050 per 1K characters) | ★★★ | 300 ms |
| TTS | ElevenLabs Multilingual v2: Multilingual ElevenLabs voice option. | ElevenLabs API | Higher-quality multilingual speech and branded voices. | $$$$$ ($0.060 per 1K characters) | ★★★ | 550 ms |
| VAD | Silero VAD: Open-source voice activity detector. | Flickki worker | Detecting when a caller starts and stops speaking. | Free (runs locally in the worker) | ★★☆ | 10 ms |

Prices are provider pass-through estimates from the current runtime registry.
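To make the tradeoff concrete, here is a back-of-envelope cost sketch in Python for one call on the default pipeline. The four rates are copied from the table above; the call profile (3 minutes, 4,000 input tokens, 600 output tokens, 1,500 spoken characters) is an invented example, and plan fees or channel costs are deliberately left out.

```python
# Rates from the table above, in USD.
STT_PER_MINUTE = 0.005      # Deepgram Nova-2 Phonecall
LLM_IN_PER_M = 0.10         # Gemini 2.5 Flash, per 1M input tokens
LLM_OUT_PER_M = 0.40        # Gemini 2.5 Flash, per 1M output tokens
TTS_PER_1K_CHARS = 0.004    # Google TTS Standard-I


def estimate_call_cost(minutes: float, tokens_in: int,
                       tokens_out: int, tts_chars: int) -> dict:
    """Provider pass-through estimate for one STT -> LLM -> TTS call."""
    stt = minutes * STT_PER_MINUTE
    llm = (tokens_in / 1_000_000 * LLM_IN_PER_M
           + tokens_out / 1_000_000 * LLM_OUT_PER_M)
    tts = tts_chars / 1_000 * TTS_PER_1K_CHARS
    return {"stt": stt, "llm": llm, "tts": tts, "total": stt + llm + tts}


# Assumed call profile, for illustration only.
cost = estimate_call_cost(minutes=3, tokens_in=4_000, tokens_out=600, tts_chars=1_500)
```

For that profile the total comes to roughly $0.0216, with STT ($0.015) and TTS ($0.006) dominating and the LLM contributing well under a tenth of a cent.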

Every starter is a bundle. Not a box.

The starter assistants on the assistants page are just the opening move. Each one gives you a style, useful skills, recommended tools, and sane boundaries — then it's yours to expand, fork, and rewire however you want.

Swap the LLM. Add a skill. Attach your own tools. Paste in your domain glossary. Wire a webhook to your internal API. Check the whole thing into git. Flickki doesn't care how fancy you get — it just runs the file.

Power users get a sharper set of tools.

The interview is for normal people. If you already know what you're doing, Flickki gets out of your way and gives you a clean, versioned, pasteable source of truth.

🛠️

Tools you can wire to anything

The webhook tool is a full escape hatch: point it at your Supabase function, your Zapier Zap, your internal API. Structured inputs and outputs mean the LLM knows exactly what it can do.
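One way to picture "structured inputs" is a declared field list that arguments are checked against before anything is sent. The sketch below is an assumption-heavy illustration in Python: the `LEAD_TOOL` definition, its field names, and the `https://example.com/crm/leads` URL are placeholders, and the request is only built, never sent.

```python
import json
import urllib.request

# Hypothetical tool definition: name, target URL, and typed input fields.
LEAD_TOOL = {
    "name": "webhook.post",
    "url": "https://example.com/crm/leads",   # placeholder endpoint
    "input": {"caller_name": str, "phone": str, "urgent": bool},
}


def validate(arguments: dict, schema: dict) -> dict:
    """Reject calls with missing or mistyped fields; drop undeclared extras."""
    missing = [key for key in schema if key not in arguments]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for key, expected in schema.items():
        if not isinstance(arguments[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")
    return {key: arguments[key] for key in schema}


def build_request(tool: dict, arguments: dict) -> urllib.request.Request:
    """Build (but do not send) the JSON POST for a validated tool call."""
    body = json.dumps(validate(arguments, tool["input"])).encode("utf-8")
    return urllib.request.Request(
        tool["url"],
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would perform the call; validating first means a malformed LLM tool call fails locally instead of reaching your API.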

🔁

Versioning and evals

Every assistant keeps a history. Roll back, diff, replay a transcript against a new version. Catch regressions before they hit production.
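Because an assistant is a text file, diffing and replaying can be approximated with very little code. This Python sketch uses the standard library's `difflib` for the diff; the `find_regressions` helper and the `choose_tool` callable standing in for "the new version" are hypothetical and only illustrate the idea of a replay check.

```python
import difflib
from typing import Callable, List, Tuple


def diff_versions(old: str, new: str) -> List[str]:
    """Unified diff between two versions of an assistant file."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="assistant@v1", tofile="assistant@v2", lineterm="",
    ))


def find_regressions(
    caller_turns: List[str],
    expected_tools: List[str],
    choose_tool: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Replay recorded caller turns and report where the tool choice changed."""
    changed = []
    for turn, expected in zip(caller_turns, expected_tools):
        actual = choose_tool(turn)   # stand-in for running the candidate version
        if actual != expected:
            changed.append((turn, expected, actual))
    return changed
```

An empty list from `find_regressions` means the candidate version made the same tool choices as the recorded run.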

Under the hood, it's just Markdown and tools.

The interview is for everyone. This is for people who want to know what happens between a message arriving and an outcome landing in your portal — and who want to wire the underlying pieces themselves.

Assistants are plain Markdown with YAML front-matter. Paste one in, edit it, diff it, check it into git. The interview just writes the same file under the hood.

---
name: Sam — front desk assistant
business: Delaney Plumbing
starter: library/front-desk-assistant
voice: elevenlabs/nicole
llm: claude-sonnet-4-6
skills:
  - message.take
  - urgency.triage
  - appointment.book
glossary:
  - hydrojet
  - backflow preventer
  - Outer Sunset
tools:
  - calendar.book
  - webhook.post    # → our CRM
escalate_when:
  - caller says "flooding"
  - caller asks for the owner by name
---
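To show how little machinery a file like this needs, here is a standard-library Python sketch that splits the front-matter from the Markdown body and reads the flat keys and lists used above. It handles only that small subset (scalars, one-level lists, trailing `#` comments) and is not a general YAML parser or Flickki's own loader; a real tool would more likely use a full YAML library.

```python
from typing import Dict, List, Tuple, Union

Value = Union[str, List[str]]


def split_front_matter(source: str) -> Tuple[str, str]:
    """Return (front_matter, body) for a file that starts with a --- block."""
    lines = source.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", source
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    raise ValueError("front-matter block is never closed")


def parse_flat_yaml(block: str) -> Dict[str, Value]:
    """Parse 'key: value' lines and 'key:' followed by '- item' lines."""
    data: Dict[str, Value] = {}
    current = None
    for raw in block.splitlines():
        line = raw.split(" #", 1)[0].rstrip()    # naive trailing-comment strip
        if not line.strip():
            continue
        if line.lstrip().startswith("- "):
            if current is None or not isinstance(data[current], list):
                raise ValueError(f"list item without a list key: {raw!r}")
            data[current].append(line.lstrip()[2:].strip())
        else:
            key, _, value = line.partition(":")
            current = key.strip()
            data[current] = value.strip() or []  # empty value starts a list
    return data


SAMPLE = """---
name: Sam
llm: claude-sonnet-4-6
tools:
  - calendar.book
  - webhook.post    # CRM
---
You are Sam, the front desk assistant.
"""

front, body = split_front_matter(SAMPLE)
meta = parse_flat_yaml(front)
```

After this runs, `meta["tools"]` is a plain list of tool names and `body` holds the Markdown instructions.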

Paste an assistant. See what it can do.

Flickki runs the file. Pick a channel, drop a Markdown assistant into the editor, and watch the runtime spin up a real session with real tools attached.

Free sign up →