It's a next-word predictor like a Markov chain, but a Markov chain (which only conditions on the last word or two) couldn't do all the things ChatGPT does. ChatGPT conditions on the whole context, and it has learned lots of syntax-level patterns pretty well.
I'm not sure what distinction you're getting at, but transformers are trained on a word-prediction objective (BERT-style models fill in a masked word; GPT-style models like ChatGPT predict the next one), and text generation chooses the next word (token, actually) one at a time. Once it chooses a token, it doesn't go back and revise it.
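To make the one-token-at-a-time point concrete, here's a minimal sketch of that decoding loop. The "model" here is a toy bigram table (exactly the kind of Markov chain mentioned above, and a made-up corpus, not ChatGPT) because the point is the loop itself: each step predicts one next token from what's been generated so far, appends it, and never backtracks. A real transformer would condition on the whole token sequence instead of just the last token, but the generation loop is the same shape.

```python
from collections import defaultdict, Counter

# Toy training corpus (hypothetical, for illustration only).
corpus = "the cat sat on the mat and the cat slept".split()

# Bigram counts: for each word, how often each word follows it.
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def predict_next(token):
    """Greedy stand-in for the model: most frequent follower of the last token."""
    if token not in next_counts:
        return None  # dead end: token never appeared mid-corpus
    return next_counts[token].most_common(1)[0][0]

def generate(prompt, max_tokens=5):
    """Autoregressive loop: pick one token at a time, never revise earlier ones."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        nxt = predict_next(tokens[-1])
        if nxt is None:
            break
        tokens.append(nxt)  # committed -- nothing already emitted is changed
    return " ".join(tokens)

print(generate("the"))  # e.g. "the cat sat on the cat"
```

Real systems usually sample from the predicted distribution (with a temperature) instead of always taking the most frequent option, but either way each token is emitted once and kept.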