
It's a next-word predictor in the same sense a Markov chain is, but a Markov chain couldn't do all the things ChatGPT does. ChatGPT has learned a huge number of syntax-level patterns remarkably well.
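For contrast, a Markov chain next-word predictor is just a lookup table from each word to the words observed to follow it — no context beyond the last word. A toy sketch (function names like `build_markov` are mine, purely illustrative):

```python
import random
from collections import defaultdict, Counter

def build_markov(tokens):
    """Map each word to a frequency table of the words seen to follow it."""
    table = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table

def generate(table, start, n, rng=None):
    """Walk the chain: sample one next word at a time from the follower
    counts of the most recent word only."""
    rng = rng or random.Random(0)
    out = [start]
    for _ in range(n):
        followers = table.get(out[-1])
        if not followers:
            break  # dead end: the last word never had a successor
        words, counts = zip(*followers.items())
        out.append(rng.choices(words, weights=counts)[0])
    return out
```

Because the next word depends only on the current word, such a model can't track long-range structure — which is the gap the comment is pointing at.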


Is it actually a next-word predictor? I thought the training loss was computed against a distribution over a set of words, not just one.


I'm not sure what distinction you're getting at, but transformers are trained with a "predict the missing/next word" objective, and text generation chooses the next word (token, actually) one at a time. Once it has chosen a word, it doesn't go back.
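Both comments can be reconciled in a small sketch, assuming the standard causal-LM setup: the model emits logits over the whole vocabulary (so the loss does involve every word), but cross-entropy compares that distribution to a single true next token, and decoding appends one token at a time with no backtracking. The function names here are hypothetical, not any library's API:

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution over the vocabulary."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def next_token_loss(logits, target_idx):
    """Cross-entropy: the distribution covers the whole vocabulary,
    but the training target is the one true next token."""
    return -math.log(softmax(logits)[target_idx])

def greedy_decode(step_fn, prompt, n):
    """Autoregressive generation: pick one token per step and never
    revise earlier choices. step_fn maps a token list to logits."""
    tokens = list(prompt)
    for _ in range(n):
        logits = step_fn(tokens)
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens
```

So "against a set of words" and "one word at a time" are both true: the loss is computed over the full vocabulary distribution, while the supervision signal and each decoding step concern exactly one token.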




