Hacker News | new | past | comments | ask | show | jobs | submit | edunteman's comments | login

The LLM detector in my brain went off too.

Every paragraph in the article is exactly what an LLM produces.

Your repo was actually a major point of reference! Thank you for open sourcing it. Ironically, when I first got into Zig I built a similar generator for Python bridging, which your project reminded me of: https://github.com/erik-dunteman/zigpy

The ultimate reason for not using a bindings generator was primarily to deeply understand NAPI.


Great to hear I could help :) Yeah, no worries, I totally understand.

Correct, your PATH resolves to your local tools as if it were unprotected bash, but syscalls are filtered/virtualized.

From a utilitarian perspective, can we swap this in instead of E2B or some other provider, since this doesn't require n microVM kernels and rootfs images hanging around?

Exactly, that'd be the intention. For compute-heavy or long-running jobs you'd still probably want a dedicated VM like on E2B, but for quick stuff, bVisor.

Hell yeah, love to hear it! Happy to answer any questions or issues you run into

The part that most resonates with me is the lingering feeling of "oh, but it must be my fault for underspecifying," which blocks the outright belief that models are just still sloppy at certain things.


Good question, I imagine you’d need to set up an ngrok endpoint to tunnel to local LLMs.

In those cases perhaps an open source (maybe even local) version would make more sense. For our hosted version we’d need to charge something, given storage requirements to run such a service, but especially for local models that feels wrong. I’ve been considering open source for this reason.


I’d love your opinion here!

Right now, we assume the first call is correct, and will eagerly take the first match we find while traversing the tree.

One of the worst things that could currently happen is that we cache a bad run, and now instead of occasional failures you get 100% failures.

A few approaches we've considered:

- Maintain a staging tree, and only promote to live if multiple sibling nodes (messages) look similar enough. The decision to promote could be via templating, regex, fuzzy matching, semantic similarity, or an LLM judge.

- Add some feedback APIs for a client to score end-to-end runs, so that a path could develop a reputation.
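To make the staging-tree idea concrete, here's a toy Python sketch. Everything here is made up for illustration (the `StagingNode` name, the threshold, the promotion count, and the use of `difflib` as the similarity check are all assumptions, not how Butter actually works; real promotion might use templating, regex, semantics, or an LLM judge instead):

```python
# Toy sketch: a node only promotes a message from staging to live
# once enough similar sibling observations accumulate.
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.9   # assumed tunable knob
PROMOTE_AFTER = 2            # promote once this many siblings agree

class StagingNode:
    def __init__(self):
        self.staged = []   # candidate messages observed so far
        self.live = None   # promoted (cached) message, if any

    def observe(self, message: str):
        """Record a run's message; promote if enough siblings agree."""
        if self.live is not None:
            return self.live
        self.staged.append(message)
        similar = [
            m for m in self.staged
            if SequenceMatcher(None, m, message).ratio() >= SIMILARITY_THRESHOLD
        ]
        if len(similar) >= PROMOTE_AFTER:
            self.live = message   # promote to the live tree
        return self.live

node = StagingNode()
node.observe("list files in /tmp")             # first sighting: stays staged
node.observe("list files in /tmp please")      # similar sibling: promotes
```

The nice property is that a single bad run can never poison the live tree on its own; it has to be corroborated by a near-identical sibling before anything is cached.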


I'd assume RL would be baked into the request structure. I'm surprised the OAI spec doesn't include it, but I suppose you could hijack a conversation flow to do so.


Very, very common approach!

Wrote more on that here: https://blog.butter.dev/the-messy-world-of-deterministic-age...


What a great overview!

I'd love your thoughts on my addition, autolearn.dev: Voyager behind MCP.

The proxy format is exactly what I needed!

Thanks


Awesome to hear you've done something similar. JSON artifacts from runs seem to be a common approach for building this in-house, similar to what we did with Muscle Mem. Detecting cache misses is a bit hard without seeing what the model sees, which is part of what inspired this proxy direction.

Thanks for the nice words!


I feel the same: we'll use it as long as we can, since it's customer-aligned, but I wouldn't be surprised if competitive pressure or COGS forces us to change in the future.

