I find the "only 10% is real" part interesting. I'm starting to see two distinct camps emerge: one says transformer architectures are just search (or bad search), the other says they're the first commercial-grade compression of language into real-valued vectors. The trap, I think, is that the difference is somewhat subtle in how each camp discusses things.