Every Flickki assistant is a real-time loop. A message arrives on a channel, the runtime spins up or joins an assistant, the LLM decides what to say and which tools to call, and the result lands on one of three outcomes — booked, escalated, or noted.
Tool calls are first-class. The LLM doesn't just talk — it books a slot, sends a message, pokes a webhook, transfers a call. Every invocation is logged, typed, and replayable.
A walkthrough of the runtime path for any inbound conversation — a phone call, a WhatsApp message, a web chat session, an SMS, an email thread. The same loop handles all of them.
A message arrives on one of your connected channels. The channel adapter normalises it into a single event shape — the rest of the runtime doesn't care whether it came from PSTN, WhatsApp, or a web widget.
The runtime creates or joins a session for that conversation and attaches the compiled assistant that matches the rule you set.
The assistant loads its tone, glossary, and tool list, then posts or plays its opening line on whatever channel the conversation is on.
Audio gets transcribed, text flows straight in, attachments ride along as context. The LLM sees a rolling transcript and the tool schema and decides what to do next.
When the LLM decides to call a tool, the runtime invokes the server-side executor, captures the result, and feeds it back into the loop.
Text replies post instantly as messages. Voice replies stream through TTS, interruptible at word level. Rich cards, links, and attachments ride the same path.
On hangup, thread close, or idle timeout, minutes and message units are reconciled against your plan and your channel cost.
Full transcript, structured collected fields, and the tool-call log land in your portal. The run is searchable and exportable.
Flickki can run a classic STT → LLM → TTS pipeline or a voice-native model that handles audio directly. These are the production and beta options currently wired into the runtime.
| Workload | Model | Run by | Good for | Price | Quality | Latency |
|---|---|---|---|---|---|---|
| LLM | Claude Sonnet 4.6Default for careful setup and complex assistant behavior. | Anthropic API | Nuanced conversation, tool use, high-stakes drafting. | $$$$$3 in / $15 out per 1M tokens | ★★★ | 450 ms |
| LLM | Gemini 2.5 FlashDefault in-call LLM for new assistants. | Google API | Low-cost real-time turns, extraction, routing, normal support calls. | $$0.10 in / $0.40 out per 1M tokens | ★★☆ | 300 ms |
| LLM | Gemini 2.5 ProHigher-reasoning Google option. | Google API | Structured extraction, deeper reasoning, complicated policy handling. | $$$$1.25 in / $5 out per 1M tokens | ★★★ | 700 ms |
| LLM | Llama 3.3 70BOpen-weight Llama served on Groq LPUs. | Groq API | Fast response starts with stronger open-weight reasoning. | $$$0.59 in / $0.79 out per 1M tokens | ★★★ | 250 ms |
| LLM | Llama 3.1 8B InstantSmall open-weight model optimized for speed. | Groq API | Short transactional turns where latency matters more than depth. | $$0.05 in / $0.08 out per 1M tokens | ★★☆ | 150 ms |
| LLM | GPT-4oOpenAI flagship chat model. | OpenAI API | General reasoning, tool use, fallback compatibility. | $$$$$2.50 in / $10 out per 1M tokens | ★★★ | 500 ms |
| LLM | GPT-4o miniCheaper OpenAI chat model. | OpenAI API | Simple assistants, summarization, low-cost OpenAI routing. | $$0.15 in / $0.60 out per 1M tokens | ★★☆ | 300 ms |
| Voice-native | Gemini Live 2.5Audio in and audio out without separate STT/TTS. | Google API | Lowest-latency voice loops, interruption-heavy conversations. | $$$0.50 in / $2 out per 1M tokens + audio | ★★☆ | 120 ms |
| Voice-native | OpenAI RealtimeNative audio model with strong tool adherence. | OpenAI API | Warm voice, complex realtime tool use, premium voice interactions. | $$$$$$5 in / $20 out per 1M tokens + audio | ★★★ | 200 ms |
| STT | Deepgram Nova-2 PhonecallDefault speech-to-text model for calls. | Deepgram API | Telephone audio, voice agents, noisy inbound calls. | $$$0.005 per audio minute | ★★☆ | 250 ms |
| STT | Deepgram Nova-3Higher-quality Deepgram transcription. | Deepgram API | Cleaner transcripts when accuracy matters more than a small cost bump. | $$$0.007 per audio minute | ★★★ | 300 ms |
| STT | Google Cloud Speech latest_shortGoogle STT optimized for short utterances. | Google Cloud | Short voice commands and faster turn-taking. | $$$$0.012 per audio minute | ★★☆ | 250 ms |
| STT | Google Cloud Speech latest_longGoogle STT for general spoken input. | Google Cloud | Varied speech and longer utterances. | $$$$0.012 per audio minute | ★★☆ | 350 ms |
| STT | OpenAI WhisperBatch-mode transcription fallback. | OpenAI API | Fallback transcription and non-realtime files. | $$$0.006 per audio minute | ★★☆ | 800 ms |
| TTS | Google TTS Standard-IDefault neutral US English voice. | Google Cloud | Cheap, clear, neutral voice for everyday callers. | $$0.004 per 1K characters | ★★☆ | 250 ms |
| TTS | Google TTS Neural2Male and female US English voices. | Google Cloud | Higher-quality Google voices with predictable pronunciation. | $$$0.016 per 1K characters | ★★★ | 300 ms |
| TTS | OpenAI TTS-1Fast OpenAI speech model. | OpenAI API | OpenAI fallback voice with simple integration. | $$$0.015 per 1K characters | ★★☆ | 400 ms |
| TTS | OpenAI TTS-1 HDHigher-quality OpenAI speech model. | OpenAI API | Polished voiceovers and slower premium TTS fallback. | $$$$0.030 per 1K characters | ★★★ | 700 ms |
| TTS | Deepgram Aura-2Andromeda English voice. | Deepgram API | Fast TTS when STT and TTS should share one vendor. | $$$$0.030 per 1K characters | ★★★ | 150 ms |
| TTS | Cartesia Sonic-2Katie, Blake, and Sarah voices. | Cartesia API | Lowest-latency TTS, warm voice agents, interruption-heavy calls. | $$$$$$0.065 per 1K characters | ★★★ | 60 ms |
| TTS | ElevenLabs Turbo v2.5Fast ElevenLabs voice option. | ElevenLabs API | Recognizable premium voice quality with lower latency. | $$$$$0.050 per 1K characters | ★★★ | 300 ms |
| TTS | ElevenLabs Multilingual v2Multilingual ElevenLabs voice option. | ElevenLabs API | Higher-quality multilingual speech and branded voices. | $$$$$$0.060 per 1K characters | ★★★ | 550 ms |
| VAD | Silero VADOpen-source voice activity detector. | Flickki worker | Detecting when a caller starts and stops speaking. | FreeRuns locally in the worker | ★★☆ | 10 ms |
Prices are provider pass-through estimates from the current runtime registry.
The starter assistants on the assistants page are just the opening move. Each one gives you a style, useful skills, recommended tools, and sane boundaries — then it's yours to expand, fork, and rewire however you want.
Swap the LLM. Add a skill. Attach your own tools. Paste in your domain glossary. Wire a webhook to your internal API. Check the whole thing into git. Flickki doesn't care how fancy you get — it just runs the file.
The interview is for normal people. If you already know what you're doing, Flickki gets out of your way and gives you a clean, versioned, pasteable source of truth.
The webhook tool is a full escape hatch — point it at your Supabase function, your Zapier scenario, your internal API. Structured inputs and outputs mean the LLM knows exactly what it can do.
Every assistant keeps a history. Roll back, diff, replay a transcript against a new version. Catch regressions before they hit production.
The interview is for everyone. This is for people who want to know what happens between a message arriving and an outcome landing in your portal — and who want to wire the underlying pieces themselves.
Assistants are plain Markdown with YAML front-matter. Paste one in, edit it, diff it, check it into git. The interview just writes the same file under the hood.
--- name: Sam — front desk assistant business: Delaney Plumbing starter: library/front-desk-assistant voice: elevenlabs/nicole llm: claude-sonnet-4-6 skills: - message.take - urgency.triage - appointment.book glossary: - hydrojet - backflow preventer - Outer Sunset tools: - calendar.book - webhook.post # → our CRM escalate_when: - caller says "flooding" - caller asks for the owner by name ---
Flickki runs the file. Pick a channel, drop a Markdown assistant into the editor, and watch the runtime spin up a real Room with real tools attached.
Free sign up →