Latency is largely a function of the models you pick for reasoning. If you colocate models by self-hosting on GPUs, latency can be as low as 500-600 ms between bot and user turns. With models like Gemini-2.5-flash, it is around 800-1000 ms. It can be higher still with reasoning models and larger models like gpt-4.1.
We are more of a horizontal platform and support a wide variety of use cases. We serve large BPO call centres on our managed hosted service for both outbound and inbound calling.
There are also individual builders working on inbound use cases for personal use, or building their businesses on top of Dograh.
There is always a tradeoff between latency and reasoning. The bigger the model, the more we can get it to do through better instruction following, but that comes at the cost of increased latency. Colocated open-source smaller models do much better on latency, but their instruction following is not as strong, so we may have to tune prompts much more than we would for bigger models.
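If you want to see where a given setup lands, timing time-to-first-token per model is a decent first check. Here is a minimal sketch against an OpenAI-compatible endpoint (the model names are just examples, and this ignores STT/TTS, which also add to the per-turn time):

```python
# Rough time-to-first-token check per model; a minimal sketch, not a benchmark.
# Assumes an OpenAI-compatible endpoint and OPENAI_API_KEY in the environment.
import time
from openai import OpenAI

client = OpenAI()

def ttft_seconds(model: str, prompt: str) -> float:
    """Wall-clock seconds until the first content token arrives."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    return time.perf_counter() - start  # stream ended with no content tokens

for model in ("gpt-4.1", "gpt-4.1-mini"):
    ms = ttft_seconds(model, "Reply with one short sentence.") * 1000
    print(f"{model}: {ms:.0f} ms to first token")
```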
Hello. Thank you for trying out Dograh and being our 100th GitHub star. :)
1. A voice persona selector like Vapi's is in our pipeline.
2. The lag can come either from system resource constraints or from inference lag at the LLM providers. We are constantly trying to squeeze out every millisecond to combat latency issues.
Earlier I was using other platforms for production voice agents. One thing that became obvious was the cost: 60-70% of our total spend was the Vapi platform fee, and only 30-40% was actual LLM/STT/TTS usage. Platform cost dominated everything. That alone pushed us toward something self-hosted.
But when we switched to OSS stacks (Pipecat, LiveKit), we realized that even with great OSS, the plumbing was still painful and unavoidable: no standard way to extract variables from conversations (name/date/order ID), no straightforward tracing of LLM calls, no way to run AI-to-AI test loops, and no fast workflow iteration, since every change meant another redeploy.
The infrastructure glue kept ballooning, and each time it felt like rebuilding the same system from scratch.
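To make "extract variables" concrete, this is roughly the kind of glue we kept rewriting on every project. A minimal sketch, assuming an OpenAI-compatible chat endpoint; the model name and field names here are illustrative, not Dograh's API:

```python
# Post-call variable extraction; a minimal sketch of the glue described above.
# Assumes an OpenAI-compatible endpoint and OPENAI_API_KEY in the environment.
import json
from openai import OpenAI

client = OpenAI()

def extract_variables(transcript: str) -> dict:
    """Pull structured fields (name/date/order ID) out of a call transcript."""
    resp = client.chat.completions.create(
        model="gpt-4.1-mini",
        response_format={"type": "json_object"},  # force parseable JSON back
        messages=[
            {"role": "system",
             "content": "Extract caller_name, callback_date, and order_id "
                        "from this call transcript. Return a JSON object; "
                        "use null for anything not mentioned."},
            {"role": "user", "content": transcript},
        ],
    )
    return json.loads(resp.choices[0].message.content)
```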
Dograh came out of that combination of cost pain and integration pain.
Happy to dig deeper into anything.