That's the point - it does not need to be different. If it demonstrates similar improvement to what we saw with GPT-1 --> GPT-2 --> GPT-3, then it will be enough to actually start using it. It's like the progression MNIST --> CIFAR-10 --> ImageNet --> the point where object recognition became good enough for real-world applications.
But in addition to making it bigger, we can also make it better: smarter attention, external data queries, better word encoding, better data quality, more than one data type as input, etc. There's plenty of room for improvement.