
> Second, let’s give credit where credit’s due. GPT-4 is exactly as impressive as users say. The details of the internal architecture can’t change that. If it works, it works. It doesn’t matter whether it’s one model or eight tied together.

If this is true, I wonder how likely it is that chain of thought is involved in passing data between the different models?



Chain of thought is an in-context technique. The ensemble-of-models concept that GPT-4 supposedly uses works within the model itself. My understanding of the relevant papers is poor, but my impression is that there are layers that help the model select which two of the 16 different “experts” should contribute to the generation of the next token.
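
To make that concrete, here's a minimal toy sketch of top-2 expert routing in a generic mixture-of-experts layer. Everything here (sizes, names, random weights) is made up for illustration; it's not OpenAI's actual implementation.

    import numpy as np

    n_experts = 16
    d_model = 8

    rng = np.random.default_rng(0)
    W_gate = rng.normal(size=(d_model, n_experts))   # learned router ("gating") weights
    experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

    def moe_layer(x):
        # x: hidden state for one token, shape (d_model,)
        scores = x @ W_gate                  # one routing score per expert
        top2 = np.argsort(scores)[-2:]       # indices of the 2 best-scoring experts
        weights = np.exp(scores[top2])
        weights /= weights.sum()             # softmax over just the 2 winners
        # Only the chosen experts process the token; their outputs are blended.
        return sum(w * (x @ experts[i]) for w, i in zip(weights, top2))

    y = moe_layer(rng.normal(size=d_model))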

The 16 models in the ensemble are all trained in the same way, but attend to the input data differently. It’s a little like how multi-headed attention works: the input embeddings are split into (typically) 8 parts, and each head’s set of query, key, and value vectors trains on its own 1/8th slice of the input embeddings. Although there is no “meaning” to these slices, the QKV vectors for each head will nonetheless learn different relationships between the inputs, simply because they were trained differently.
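
A toy sketch of that per-head split (in real implementations the full embedding is usually projected first rather than sliced directly, but the intuition is the same; all shapes and weights here are illustrative):

    import numpy as np

    n_heads, d_model = 8, 64
    d_head = d_model // n_heads                      # each head gets 1/8th of the embedding

    rng = np.random.default_rng(0)
    seq_len = 5
    x = rng.normal(size=(seq_len, d_model))

    heads = []
    for h in range(n_heads):
        sl = x[:, h * d_head:(h + 1) * d_head]       # this head's slice of the embeddings
        Wq, Wk, Wv = (rng.normal(size=(d_head, d_head)) for _ in range(3))
        Q, K, V = sl @ Wq, sl @ Wk, sl @ Wv
        scores = Q @ K.T / np.sqrt(d_head)
        attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
        heads.append(attn @ V)                       # (seq_len, d_head)

    out = np.concatenate(heads, axis=-1)             # back to (seq_len, d_model)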


I have often wondered about this: does padding and chopping to context size along arbitrary boundaries have a lasting effect on the overall data (due to misalignment at the ends, some words or lines getting chopped in the middle, etc.)? Does that net out because we do it consistently, or do we lose vital information that compounds across the entire training data set?
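
For concreteness, here's a toy version of the chopping being asked about; the token IDs and sizes are made up, and real pipelines differ in the details:

    # Chop a token stream into fixed-size training examples, padding the last one.
    # Boundaries fall wherever the count says, so words or lines can get split
    # across two examples.
    tokens = list(range(23))        # stand-in for a tokenized document
    context_size = 8
    pad_id = 0

    chunks = [tokens[i:i + context_size] for i in range(0, len(tokens), context_size)]
    chunks[-1] += [pad_id] * (context_size - len(chunks[-1]))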


Like a binary tree of models?


My guess is that it isn't doing a lot of self-interaction. I've been using it for coding, and the replies feel much too coherent to be either:

1. expanded from an outline (2-stage process)

or

2. chained between models

If it were multi-stage, I'd expect some Chinese-whispers-style drift, where the plot gets lost slightly between steps. The responses I'm getting from GPT-4 are focused and specific.

Likewise, if the responses were being chained between models, I'd expect visible seams in tone or content between them.

My guess for how they're architecting it, if it is 8 models, is either:

1. Each response handled by 1 model

or

2. Some kind of voting / confidence system that switches between models on the fly (rough sketch after this list)
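
Purely speculative sketch of those two guesses; the stub models and the confidence score are invented stand-ins, and only the routing logic is the point:

    import random

    class StubModel:
        def __init__(self, name):
            self.name = name
        def confidence(self, prompt, so_far):
            return random.random()            # stand-in for a real confidence score
        def next_token(self, prompt, so_far):
            return f"<{self.name}>"
        def generate(self, prompt):
            return f"full reply from {self.name}"

    models = [StubModel(f"expert_{i}") for i in range(8)]

    def route_whole_response(prompt):
        # Guess 1: pick one model up front and let it handle the entire reply.
        best = max(models, key=lambda m: m.confidence(prompt, []))
        return best.generate(prompt)

    def route_token_by_token(prompt, n_tokens=5):
        # Guess 2: at each step, the most confident model emits the next token.
        out = []
        for _ in range(n_tokens):
            winner = max(models, key=lambda m: m.confidence(prompt, out))
            out.append(winner.next_token(prompt, out))
        return out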


>the details of the internal architecture can't change that

GPT-4 is definitely useful, but this points towards bad news for OpenAI and potentially the entire field. A lot of people really wanted to believe that OpenAI had some secret sauce that actually pointed towards a path to true AI.

Turns out they just poached Google's own researchers and did a better job of turning Google research into a product (all the papers for this type of architecture came from Google Brain, and the authors are now at OpenAI). OpenAI is doing impressive work on the practical side of AI, but apparently nothing revolutionary in terms of research, which is why people are disappointed by this reveal.


True AI? What, like 'one model to rule them all'? Why does it matter how many models we are using? Do you have more than one computer in your computer? Is it a true computer?


“…true AI.”

What do you understand this to mean?


Is there a tool or technique called chain of thought, or are you talking about the colloquial concept as it relates to thinking?


It has a specific technical meaning, but it maps pretty much exactly onto what you'd think it is. [1]

Basically, it's a trick that recognizes that language transformers only perform computation when generating words, so for complex tasks you can get better results by asking the model to explain its chain of thought and only give the answer at the end. This has the effect of giving the model "time to think". If it didn't generate those words, it wouldn't have anything to hang that computation off of, since it is fundamentally a word-generation model.
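
A made-up illustration of the difference; ask_model here is just a stand-in for whatever model API you'd actually call:

    # Hypothetical helper; a stand-in for a real LLM API call.
    def ask_model(prompt: str) -> str:
        return "(model reply goes here)"

    question = "A jug holds 3 litres. How many jugs do I need to carry 13 litres?"

    # Direct prompting: the model has to emit the answer immediately.
    direct = ask_model(question + "\nAnswer with a single number.")

    # Chain-of-thought prompting: the model writes out its reasoning first,
    # which gives it extra generated tokens to hang the computation off of.
    cot = ask_model(
        question
        + "\nExplain your reasoning step by step, then give the final answer."
    )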

[1] https://arxiv.org/abs/2201.11903


It's a technique, but it could also be likened to how we think.

Here's a simple example: keyterms are extracted from an image using text detection. Those keyterms will sometimes have bad reads, where "aligned AI" might pop out as "aligned Al". A subsequent "internal thought" is then formed that asks, "What's wrong with the 'aligned Al' keyterm?" If an updated response is returned, we use it instead of the original output.
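
A rough sketch of that correction pass; ask_model and the prompt wording are hypothetical stand-ins:

    # Hypothetical stand-in for a real LLM call.
    def ask_model(prompt: str) -> str:
        return ""

    def fix_keyterm(term: str, context: str) -> str:
        prompt = (
            f"The term '{term}' was read from an image of the text below and may "
            f"contain OCR errors. If it looks wrong, reply with the corrected "
            f"term; otherwise reply with nothing.\n\n{context}"
        )
        corrected = ask_model(prompt).strip()
        # Use the model's correction only if it actually returned one.
        return corrected if corrected else term

    keyterms = ["aligned Al", "transformer"]   # "Al" is a bad read of "AI"
    keyterms = [fix_keyterm(t, "...article text...") for t in keyterms]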


They’re referring to a method of prompting the model that encourages it to think through something step by step rather than spit out the answer.

I think this is the paper that really kicked off this technique: https://arxiv.org/abs/2201.11903




