Latency is largely a function of the models you pick for reasoning. If you colocate models by self-hosting on GPUs, latency can be as low as 500-600 ms between bot and user turns. With models like Gemini-2.5-flash, it is around 800-1000 ms. It can be higher still with reasoning models and larger models like gpt-4.1.
We are more of a horizontal platform and support a wide variety of use cases. We serve large BPO call centres on our managed hosted service for both outbound and inbound calling.
There are also individual builders working on inbound use cases for personal use, or building their businesses on top of Dograh.
There is always a tradeoff between latency and reasoning. The bigger the model, the more we can get it to do through better instruction following, but that comes at the cost of increased latency. Colocated open-source smaller models do much better on latency, but their instruction following is not as strong, so we may have to tune prompts much more than we would for bigger models.
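If you want to see where a given setup lands, timing time-to-first-token per model is a decent first check. Here is a minimal sketch against an OpenAI-compatible endpoint (the model names are just examples, and this ignores STT/TTS, which also add to the per-turn time):

```python
# Rough time-to-first-token check per model; a minimal sketch, not a benchmark.
# Assumes an OpenAI-compatible endpoint and OPENAI_API_KEY in the environment.
import time
from openai import OpenAI

client = OpenAI()

def ttft_seconds(model: str, prompt: str) -> float:
    """Wall-clock seconds until the first content token arrives."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    return time.perf_counter() - start  # stream ended with no content tokens

for model in ("gpt-4.1", "gpt-4.1-mini"):
    ms = ttft_seconds(model, "Reply with one short sentence.") * 1000
    print(f"{model}: {ms:.0f} ms to first token")
```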
Hello. Thank you for trying out Dograh and being our 100th GitHub star. :)
1. A voice persona selector like Vapi's is in our pipeline.
2. The lag can come either from system resource constraints or from inference lag at the LLM providers. We are constantly trying to squeeze out every millisecond to combat latency issues.
Earlier I was using other platforms for production voice agents. One thing that became obvious was the cost: 60-70% of our total spend was the Vapi platform fee, and only 30-40% was actual LLM/STT/TTS usage. Platform cost dominated everything. That alone pushed us toward something self-hosted.
But when we switched to OSS stacks (Pipecat, LiveKit), we realized that even with great OSS, the plumbing was still painful and unavoidable: no standard way to extract variables from conversations (name/date/order ID), no straightforward tracing of LLM calls, no way to run AI-to-AI test loops, and no fast workflow iteration, since every change meant another redeploy.
The infrastructure glue kept ballooning, and each time it felt like rebuilding the same system from scratch.
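To make "extract variables" concrete, this is roughly the kind of glue we kept rewriting on every project. A minimal sketch, assuming an OpenAI-compatible chat endpoint; the model name and field names here are illustrative, not Dograh's API:

```python
# Post-call variable extraction; a minimal sketch of the glue described above.
# Assumes an OpenAI-compatible endpoint and OPENAI_API_KEY in the environment.
import json
from openai import OpenAI

client = OpenAI()

def extract_variables(transcript: str) -> dict:
    """Pull structured fields (name/date/order ID) out of a call transcript."""
    resp = client.chat.completions.create(
        model="gpt-4.1-mini",
        response_format={"type": "json_object"},  # force parseable JSON back
        messages=[
            {"role": "system",
             "content": "Extract caller_name, callback_date, and order_id "
                        "from this call transcript. Return a JSON object; "
                        "use null for anything not mentioned."},
            {"role": "user", "content": transcript},
        ],
    )
    return json.loads(resp.choices[0].message.content)
```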
Dograh came out of that combination of cost pain and integration pain.
Happy to dig deeper into anything.