
If you ask an LLM to generate JSON or another language that has a grammar, it will sometimes produce invalid syntax. This pull request constrains the LLM so that it can only output valid syntax according to whatever grammar you supply. It's a modification to the sampling procedure.

What is the sampling procedure? Well, the way an LLM generates text is one token (short sequence of characters) at a time. First the giant neural net assigns a probability to every possible token (this is the hard part). Then a sampling procedure uses the probabilities to pick one of the tokens, and the process repeats.
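The loop described above can be sketched in a few lines. This is illustrative only: `toy_model` stands in for the expensive neural net, with a tiny hard-coded distribution instead of a real vocabulary.

```python
import random

def toy_model(context):
    # The real model assigns a probability to every token in its
    # vocabulary, conditioned on the text so far; here we just
    # hard-code a tiny distribution to stand in for it.
    return {"{": 0.5, "}": 0.3, '"': 0.2}

def sample_next_token(context):
    # The sampling procedure: given the model's probabilities,
    # pick one token at random, weighted by probability.
    probs = toy_model(context)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]

def generate(context, n):
    # The outer loop: sample a token, append it, repeat.
    for _ in range(n):
        context += sample_next_token(context)
    return context
```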

The sampling procedure is not a neural net and can be modified in many different ways. You might think that the sampling procedure should always simply pick the token with the highest probability (greedy sampling). You can do that, but it's usually better to pick at random weighted by the probabilities. This gives more diversity and is less likely to get stuck in loops. But this means that literally any token with nonzero probability might get picked, so you can see how this might lead to invalid JSON being generated sometimes. This pull request zeros out the probabilities of all the tokens that wouldn't be valid according to your grammar, so they can't be picked.
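The masking step is mechanically simple. A sketch, where `is_valid` stands in for whatever check the grammar machinery provides (in practice this comes from running the grammar's state machine over the output so far):

```python
import random

def constrained_sample(probs, is_valid):
    # Zero out every token the grammar disallows at this position,
    # renormalize what's left, then sample as usual.
    masked = {t: p for t, p in probs.items() if is_valid(t)}
    total = sum(masked.values())
    if total == 0:
        raise ValueError("grammar allows no token here")
    tokens = list(masked)
    weights = [masked[t] / total for t in tokens]
    return random.choices(tokens, weights=weights, k=1)[0]

# Toy example: right after an opening '{' in JSON, only '"' or '}'
# can legally follow, so everything else gets probability zero.
probs = {"{": 0.4, "}": 0.3, '"': 0.2, "x": 0.1}
token = constrained_sample(probs, lambda t: t in {'"', "}"})
```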

BTW there are lots of other interesting modifications to the sampling process you could consider. For example, maybe you can see that in the process of sampling tokens one after the other you might paint yourself into a corner and end up with no good options to choose from. So maybe it makes sense to allow backtracking. In fact, maybe at each sampling step we can consider multiple options, making a tree of possible outputs, and at the end we can pick the path through the tree with the highest overall probability. Of course we can't consider every option; it would be a complete tree with a branching factor of the number of possible tokens, which would grow exponentially. Let's prune the tree at each step and only consider the top, say, five paths we've seen so far. This is called "beam search". It's not normally used for LLMs because the neural net that generates the probabilities is very expensive to run and multiplying that cost by a factor of e.g. five is unpalatable. But it can be done, and produces somewhat better results. You could also consider using MCTS like chess engines do.
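For the curious, beam search itself is short. A toy sketch, where `score` stands in for the model's log-probability of a token given the prefix; in a real system that call is the expensive neural-net evaluation, which is why multiplying it by the beam width hurts:

```python
def beam_search(score, vocab, length, width=5):
    # Each beam is a (token sequence, cumulative log-prob) pair.
    beams = [([], 0.0)]
    for _ in range(length):
        candidates = []
        for seq, logp in beams:
            # Extend every surviving path by every possible token.
            for tok in vocab:
                candidates.append((seq + [tok], logp + score(seq, tok)))
        # Prune: keep only the `width` highest-probability paths.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:width]
    # Return the best complete path found.
    return beams[0][0]
```

With greedy sampling this would commit to one token per step; beam search instead keeps the top few partial sequences alive and picks the best overall path at the end.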



This is a sort of modern version of https://wiki.c2.com/?AlternateHardAndSoftLayers, one of the most useful software patterns.


Say more… I read the link, and it seems to be advocating for replacing specific business logic with a generic code interpreter?


I swear the content of that page was a lot more helpful the last time I looked at it.

Anyway, the idea is that if you can architect your program into layers (rather than spaghetti), it gets a lot more powerful if some of your layers are "soft" (dynamic typing, scripting, etc.) and some are "hard" (static typing, lower-level code, etc.). Some things, like UI, are too inconvenient to do in static languages and so are better as a soft layer, but you want that resting on top of something more proven and type-checked so it still catches bugs easily.

In this case, LLMs are a big blob of unproven weights whose workings we don't understand, i.e. they're not tested. But that's also the reason they can do everything they do.




