How do our base reptilian brains reason? We don't know the specifics, but unless it's magic, it's determined by some kind of logic. I doubt that logic is so unique that it can't eventually be reproduced in computers.
Reptiles didn't use language tokens, that's for sure. We don't have reptilian brains anyway; it's just that part of our brain architecture evolved from a common ancestor. The stuff that might function somewhat similarly to an LLM is most likely in the neocortex. But that's for neuroscientists to figure out, not computer scientists. Whatever the case, it had to have evolved. LLMs are intelligently designed by us, so we should be a little cautious in making that analogy.
This is a great idea but it is only possible if the model(s) can actually reason.
Currently, even GPT-4 struggles with:
- Scope
- Abduction (compared to deduction and induction, which it already appears capable of)
- Out-of-distribution questions
- Knowing what it doesn't know
Etc.
General understanding and in-context learning are incredible, but there are still missing pieces. A council of voices that all have the same blind spots will still get stuck.
How would this scale to a use case like writing code? I could imagine that some inputs would require a large number of neurons. Would this architecture be able to handle those inputs if it were scaled up?
I'm also curious whether this model architecture would grok more complex concepts at scale.
To get a chess engine such as Stockfish to press for wins against weaker opponents, you would want to turn up the contempt parameter, which raises the engine's aversion to draws.[1]
To be able to do this, you would want to use a UCI-compliant chess GUI such as Cute Chess.[2] It lets you set the command-line arguments and options for your engine, and you can play Human-vs-Human, Engine-vs-Engine, Engine-vs-Human, etc.
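If you'd rather script this than go through a GUI, here's a minimal sketch using the python-chess library (my choice, not something from the original comment; the binary path is also an assumption). Note that recent Stockfish builds have dropped the Contempt option, so the sketch checks whether the engine exposes it first:

```python
import chess
import chess.engine

# The binary name/path is an assumption; point this at your Stockfish build.
engine = chess.engine.SimpleEngine.popen_uci("stockfish")

# Classical Stockfish builds expose a "Contempt" UCI option (in centipawns;
# higher values make the engine more draw-averse). Recent NNUE-only builds
# have removed it, so check that it exists before configuring.
if "Contempt" in engine.options:
    engine.configure({"Contempt": 50})

# Smoke test: have the engine play one move from the starting position.
board = chess.Board()
result = engine.play(board, chess.engine.Limit(time=1.0))
print(board.san(result.move))

engine.quit()
```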
It appears that RAG actually dominates this method at 2k context lengths, but that this method outperforms it more and more the longer the context gets (see the graph titled "Retrieval Benchmark Results, by Document Length").
"Document length" is the length of the text that contains the answer. "Context length" is how much text the model can process to produce the answer, and this number is fixed across their experiments.
When the document length is 2k, it's likely smaller than the context window, and RAG can just retrieve the entire document for the model to read. When the document is longer, RAG needs to actually do some work to pick out the parts that contain the answer.
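To make that concrete, here's a toy sketch of the decision RAG faces; the chunk size, token budget, and keyword-overlap "retriever" are hypothetical stand-ins, not the paper's actual method:

```python
CONTEXT_BUDGET = 2000  # tokens the model may read per query (fixed in the experiments)
CHUNK_SIZE = 200       # hypothetical chunk size

def retrieve_top_chunks(doc_tokens, question, budget):
    """Naive stand-in retriever: rank fixed-size chunks by word overlap with the question."""
    chunks = [doc_tokens[i:i + CHUNK_SIZE] for i in range(0, len(doc_tokens), CHUNK_SIZE)]
    q_words = set(question.lower().split())
    chunks.sort(key=lambda c: len(q_words & {w.lower() for w in c}), reverse=True)
    picked = []
    for chunk in chunks:
        if len(picked) + len(chunk) > budget:
            break
        picked.extend(chunk)
    return picked

def build_context(doc_tokens, question):
    if len(doc_tokens) <= CONTEXT_BUDGET:
        # Short document (the ~2k case): retrieval is trivial; the whole
        # document fits, so the model reads all of it.
        return doc_tokens
    # Long document: the retriever has to guess which chunks hold the
    # answer, and any chunk it misses is evidence the model never sees.
    return retrieve_top_chunks(doc_tokens, question, CONTEXT_BUDGET)
```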
The "extended mind" can always query tokens across the entire document, though evidently worse than if they were included in the context.
I disagree. Mistral actually being open source, as well as the new criteria for what counts as an open source model, is pretty worth being aware of.
Plus, if they make a much larger Mistral model with the same level of performance relative to its size, it'll be one of the very best open source models.
It's also pretty insulting that Meta is calling their models "open source" when that's really debatable.