Hacker News | new | past | comments | ask | show | jobs | submit | edunteman's comments | login

The LLM detector in my brain went off too.

Every paragraph in the article is exactly what an LLM produces.

Your repo was actually a major point of reference! Thank you for open sourcing it. Ironically, when I first got into Zig I built a similar generator for Python bridging, which your project reminded me of: https://github.com/erik-dunteman/zigpy

The ultimate reason for not using a bindings generator was primarily to deeply understand NAPI.


Great to hear I could help :) Yeah, no worries, I totally understand.

Correct, your PATH resolves to your local tools as if it were unprotected bash, but syscalls are filtered/virtualized.

From a utilitarian perspective, can we swap this in instead of E2B or some other provider, since this doesn't require n microVM kernels and rootfs images hanging around?

Exactly, that'd be the intention. For compute-heavy or long-running jobs you'd still probably want a dedicated VM like on E2B, but for quick stuff, bVisor.

Hell yeah, love to hear it! Happy to answer any questions or issues you run into

The part that most resonates with me is the lingering feeling of "oh, but it must be my fault for underspecifying," which blocks the outright belief that models are just still sloppy at certain things.


Good question, I imagine you’d need to set up an ngrok endpoint to tunnel to local LLMs.

In those cases perhaps an open source (maybe even local) version would make more sense. For our hosted version we’d need to charge something, given storage requirements to run such a service, but especially for local models that feels wrong. I’ve been considering open source for this reason.


I’d love your opinion here!

Right now, we assume the first call is correct, and will eagerly take the first match we find while traversing the tree.

One of the worst things that could currently happen is that we cache a bad run, and now instead of occasional failures you get 100% failures.

A few approaches we've considered:

- Maintain a staging tree, and only promote to live if multiple sibling nodes (messages) look similar enough. The decision to promote could be via templating, regex, fuzzy matching, semantic similarity, or an LLM judge.

- Add some feedback APIs for a client to score end-to-end runs, so that a path could develop a reputation.
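To make the staging-tree idea concrete, here's a toy Python sketch. Everything here is made up for illustration (the `StagingNode` name, the threshold, the promotion count, and the use of `difflib` as the similarity check are all assumptions, not how Butter actually works; real promotion might use templating, regex, semantics, or an LLM judge instead):

```python
# Toy sketch: a node only promotes a message from staging to live
# once enough similar sibling observations accumulate.
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.9   # assumed tunable knob
PROMOTE_AFTER = 2            # promote once this many siblings agree

class StagingNode:
    def __init__(self):
        self.staged = []   # candidate messages observed so far
        self.live = None   # promoted (cached) message, if any

    def observe(self, message: str):
        """Record a run's message; promote if enough siblings agree."""
        if self.live is not None:
            return self.live
        self.staged.append(message)
        similar = [
            m for m in self.staged
            if SequenceMatcher(None, m, message).ratio() >= SIMILARITY_THRESHOLD
        ]
        if len(similar) >= PROMOTE_AFTER:
            self.live = message   # promote to the live tree
        return self.live

node = StagingNode()
node.observe("list files in /tmp")             # first sighting: stays staged
node.observe("list files in /tmp please")      # similar sibling: promotes
```

The nice property is that a single bad run can never poison the live tree on its own; it has to be corroborated by a near-identical sibling before anything is cached.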


I'd assume RL would be baked into the request structure. I'm surprised the OAI spec doesn't include it, but I suppose you could hijack a conversation flow to do so.


Very, very common approach!

Wrote more on that here: https://blog.butter.dev/the-messy-world-of-deterministic-age...


What a great overview!

I'd love your thoughts on my addition, autolearn.dev: Voyager behind MCP.

The proxy format is exactly what I needed!

Thanks


Awesome to hear you've done something similar. JSON artifacts from runs seem to be a common approach for building this in-house, similar to what we did with Muscle Mem. Detecting cache misses is a bit hard without seeing what the model sees, which is part of what inspired this proxy direction.

Thanks for the nice words!


I feel the same: we'll use it as long as we can, since it's customer-aligned, but I wouldn't be surprised if competitive pressure or COGS forces us to change in the future.

