Hacker News

The success of the newest GPT models relies on RL to refine the latent space inside the LLM. There's a bottleneck when using humans to refine that space. The next model or subsequent models will surely use RL techniques like self-play to break through that bottleneck.
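To make "self-play" concrete: the idea is that a system generates its own training signal by playing against a copy of itself, so no human ranking is needed. A minimal sketch below, using tabular Q-learning on the toy game of Nim (take 1-3 objects; whoever takes the last one wins) rather than language — the game, hyperparameters, and function names are illustrative, not anything from an actual GPT pipeline:

```python
import random

# Toy self-play: tabular Q-learning on Nim (take 1-3 from a pile of N;
# taking the last object wins). Both sides share one Q-table, so every
# game produces training signal for winner and loser alike -- no human
# feedback anywhere in the loop.

N = 10  # starting pile size
ACTIONS = (1, 2, 3)

def train(episodes=20000, alpha=0.2, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(1, N + 1) for a in ACTIONS if a <= s}
    for _ in range(episodes):
        s = N
        history = []  # (state, action) per move, players alternating
        while s > 0:
            legal = [a for a in ACTIONS if a <= s]
            if rng.random() < eps:  # epsilon-greedy exploration
                a = rng.choice(legal)
            else:
                a = max(legal, key=lambda a: Q[(s, a)])
            history.append((s, a))
            s -= a
        # The player who took the last object won. Propagate +1 for the
        # winner's moves and -1 for the loser's, back through the game.
        r = 1.0
        for (s, a) in reversed(history):
            Q[(s, a)] += alpha * (r - Q[(s, a)])
            r = -r
    return Q

def best_move(Q, s):
    legal = [a for a in ACTIONS if a <= s]
    return max(legal, key=lambda a: Q[(s, a)])

Q = train()
print(best_move(Q, 10))
```

The design point is that the reward comes from the game's own win/loss rule, which is exactly the piece that's missing for open-ended language: there is no built-in referee that scores an utterance.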


> The success of the newest GPT models relies on RL to refine the latent space inside the LLM. There's a bottleneck when using humans to refine that space.

Human-based RL is used because humans know things about the real world and can rank language utterances accordingly. There's no "self-play" process that gives a system this sort of knowledge.
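The "humans rank utterances" step usually works by fitting a reward model on pairwise preferences and then letting RL optimize that scalar reward, so humans only rank a finite sample set. A hedged sketch (not any lab's actual pipeline) using a linear Bradley-Terry model — the `featurize` function and preference data are invented for illustration:

```python
import math
import random

# Sketch: turn pairwise human judgments ("A is better than B") into a
# trainable scalar reward via a Bradley-Terry model. The features and
# preference pairs below are made up purely for illustration.

def fit_reward_model(prefs, featurize, dim, epochs=200, lr=0.5, seed=0):
    """prefs: list of (preferred, rejected) utterance pairs from a human."""
    rng = random.Random(seed)
    w = [0.0] * dim  # linear reward model: r(x) = w . featurize(x)
    for _ in range(epochs):
        rng.shuffle(prefs)
        for win, lose in prefs:
            fw, fl = featurize(win), featurize(lose)
            # Bradley-Terry: P(win preferred) = sigmoid(r(win) - r(lose))
            margin = sum(wi * (a - b) for wi, a, b in zip(w, fw, fl))
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent on the log-likelihood of the human's choice.
            w = [wi + lr * (1.0 - p) * (a - b) for wi, a, b in zip(w, fw, fl)]
    return w

def reward(w, x, featurize):
    return sum(wi * fi for wi, fi in zip(w, featurize(x)))

# Toy features: does the reply apologize, and (scaled) length.
def featurize(s):
    return [1.0 if "sorry" in s else 0.0, min(len(s), 40) / 40.0]

# A human consistently preferred the apologetic reply in each pair.
prefs = [("sorry, that is wrong", "that is wrong"),
         ("sorry about that", "deal with it"),
         ("sorry, retrying", "nope")]
w = fit_reward_model(prefs, featurize, dim=2)
```

This also shows where the bottleneck and the knowledge live: the reward model only reflects what the human raters already knew, which is the point the comment above is making.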



