Hacker News | teilom's comments

+1 on interaction terms + tails: fanout × retries × context growth is where linear token math dies.
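To make the multiplication concrete, here is a toy model (my own simplification, not a formula from enzu or any provider): each turn's context grows by a fixed amount, every turn fans out into several calls, and a fraction of those calls get retried. All function names and parameters are hypothetical.

```python
def linear_estimate(turns: int, tokens_per_call: int) -> int:
    # Naive estimate: one call per turn, constant context size.
    return turns * tokens_per_call


def compounded_estimate(
    turns: int,
    base_context: int,
    growth_per_turn: int,
    fanout: int,
    retry_rate: float,
) -> float:
    # Assumed model: the context at turn t is base_context + t * growth_per_turn,
    # each turn issues `fanout` calls, and each call is retried with
    # probability `retry_rate` (one retry at most, paid in full).
    total = 0.0
    for t in range(turns):
        context = base_context + t * growth_per_turn
        total += fanout * (1 + retry_rate) * context
    return total
```

With 3 turns, a 1000-token base context, 500 tokens of growth per turn, fanout 2 and a 50% retry rate, the linear estimate says 3000 input tokens while the compounded one says 13500.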

One thing we do in enzu is make “budget as constraint” executable: we clamp `max_output_tokens` from the budget before the call, and in multi-step/RLM runs we adapt output caps downward as the budget depletes (so it naturally gets shorter/cheaper instead of spiraling). When token counting is unavailable we explicitly enter a “budget degraded” mode rather than pretending estimates are exact.
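A minimal sketch of the idea, not enzu's actual code: the class name, fields, and methods below are hypothetical and only illustrate clamping the output cap from what is left of the budget, plus flagging a degraded mode when the real token count is unavailable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenBudget:
    total: int              # tokens allowed for the whole run (input + output)
    spent: int = 0          # tokens charged so far
    degraded: bool = False  # True once any charge relied on an estimate

    @property
    def remaining(self) -> int:
        return max(self.total - self.spent, 0)

    def output_cap(self, requested: int, input_tokens: int) -> int:
        # Clamp before the call: never allow more output than the budget
        # can still pay for after the input tokens are accounted for.
        return max(min(requested, self.remaining - input_tokens), 0)

    def record(
        self,
        input_tokens: int,
        output_tokens: Optional[int],
        estimated_output: int = 0,
    ) -> None:
        # When the real output count is missing, fall back to an estimate
        # and say so explicitly instead of treating it as exact.
        if output_tokens is None:
            self.degraded = True
            output_tokens = estimated_output
        self.spent += input_tokens + output_tokens
```

In a multi-step run you would call `output_cap` before each step and pass the result as `max_output_tokens`; as `spent` grows, the cap shrinks on its own.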

Also agree p90/p95 cost/run matters more than averages; max-output caps are crude but effective.

Docs: https://github.com/teilomillet/enzu/blob/main/docs/PROD_MULT... and https://github.com/teilomillet/enzu/blob/main/docs/BUDGET_CO...


If you’re trying to estimate before prod, logging these 4 things in a pilot gets you 80% there:
- tokens/run (in+out)
- tool calls/run (and fanout)
- retry rate (timeouts/429s)
- context length over turns (P50/P95)
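A rough sketch of how those four numbers could be summarized from pilot logs. The record layout (`tokens_in`, `tokens_out`, `tool_calls`, `retries`, `context_by_turn`) is made up for illustration, and "retry rate" is assumed here to mean retries per tool call; adapt both to whatever you actually log.

```python
import math


def percentile(values, p):
    # Nearest-rank percentile: the smallest value with at least p% of
    # the data at or below it.
    if not values:
        raise ValueError("no data")
    ordered = sorted(values)
    rank = max(math.ceil(p / 100 * len(ordered)), 1)
    return ordered[rank - 1]


def summarize(runs):
    # One dict per pilot run, using the hypothetical keys named above.
    tokens = [r["tokens_in"] + r["tokens_out"] for r in runs]
    contexts = [c for r in runs for c in r["context_by_turn"]]
    calls = sum(r["tool_calls"] for r in runs)
    retries = sum(r["retries"] for r in runs)
    return {
        "tokens_p50": percentile(tokens, 50),
        "tokens_p95": percentile(tokens, 95),
        "tool_calls_per_run": calls / len(runs),
        "retry_rate": retries / calls if calls else 0.0,
        "context_p50": percentile(contexts, 50),
        "context_p95": percentile(contexts, 95),
    }
```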

Fanout × retries is the classic “bill exploder”, and P95 context growth is the stealth one. The point of “budget as contract” is deciding in advance what happens at limit (degraded mode / fallback / partial answer / hard fail), not discovering it from the invoice.
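As a sketch of what "deciding in advance" can look like in code: the enum, exception, and function below are hypothetical names of mine, not enzu's API. The policy is picked when the run is configured, and the limit handler just applies it.

```python
from enum import Enum


class OnLimit(Enum):
    DEGRADE = "degrade"      # keep going with a smaller output cap
    FALLBACK = "fallback"    # hand off to a cheaper path
    PARTIAL = "partial"      # return what has been produced so far
    HARD_FAIL = "hard_fail"  # stop and raise


class BudgetExceeded(RuntimeError):
    pass


def resolve_at_limit(policy, needed_tokens, remaining_tokens, partial_answer=""):
    # Under the limit: nothing to decide, proceed as requested.
    if needed_tokens <= remaining_tokens:
        return {"action": "proceed", "max_output_tokens": needed_tokens}
    # At the limit: apply the behavior agreed on before the run started.
    if policy is OnLimit.DEGRADE:
        return {"action": "degrade", "max_output_tokens": remaining_tokens}
    if policy is OnLimit.FALLBACK:
        return {"action": "fallback"}
    if policy is OnLimit.PARTIAL:
        return {"action": "partial", "answer": partial_answer}
    raise BudgetExceeded(
        f"needed {needed_tokens} tokens, only {remaining_tokens} remaining"
    )
```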


Background note I wrote (framing + “budget as contract”): https://github.com/teilomillet/enzu/blob/main/docs/BUDGETS_A...


Yo, I am building gofh. I am heavily inspired by the FastHTML (https://fastht.ml) Python library, which I prefer over Streamlit or Gradio.

The goal is to build something quick to start up with, allowing users to build a cool working POC web app within 100 lines.

I am new to Go and still learning. It would be cool if some of you came along or could give feedback.

Cheers,

