Does anyone know if LLMs have been used to augment their own training data?

I wonder what would happen if you trained an LLM on a small amount of data, then had it generate a lot of synthetic text that gets added back to the training data. I think of it as "dreaming". This seems like it would just add noise, but LLMs are able to improve their output by augmenting their own context (by "thinking out loud"), so maybe they can do the same with their own training data?



Yes, a lot of recent research uses LLM outputs as training data, and it's been an extremely successful line of work.


That's effectively what RLHF is: a means for an LLM to self-train on its own output, using only a small human-curated dataset as guidance for what counts as a "good" or "bad" output.
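
Here's a minimal toy sketch of that loop. To be clear, this is closer to rejection-sampling fine-tuning than real RLHF (which trains a reward model on the human preferences and then optimizes the LLM with RL, e.g. PPO), and the bigram "model" and reward heuristic here are just stand-ins:

    # Toy self-training loop guided by a small human-curated signal.
    # Everything here (the bigram "model", the reward heuristic) is
    # illustrative, not a real RLHF implementation.
    import random

    random.seed(0)

    corpus = ["the cat sat", "the dog ran"]   # tiny initial training set
    human_good = {"cat", "dog"}               # stand-in for curated judgments

    def train(data):
        """Fit a trivial bigram table: word -> possible next words."""
        table = {}
        for line in data:
            words = line.split()
            for a, b in zip(words, words[1:]):
                table.setdefault(a, []).append(b)
        return table

    def generate(table, start="the", length=3):
        words = [start]
        for _ in range(length - 1):
            nxt = table.get(words[-1])
            if not nxt:
                break
            words.append(random.choice(nxt))
        return " ".join(words)

    def reward(sample):
        """Proxy for human feedback: contains a curated 'good' token?"""
        return any(w in human_good for w in sample.split())

    data = list(corpus)
    for step in range(3):
        model = train(data)
        candidates = [generate(model) for _ in range(20)]
        accepted = [c for c in candidates if reward(c)]  # keep "good" outputs
        data.extend(accepted)                            # self-train on them
        print(f"step {step}: kept {len(accepted)}/20, dataset size {len(data)}")

The human signal only enters through the filter; everything the model trains on beyond the seed corpus is its own output.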


It's interesting that this conclusion is the exact opposite of a sibling comment, which proposes that a small, human-curated corpus may be more effective than big, synthetic datasets.


I have no "conclusion". I'm just wondering.


If it's training on the same data that it generates, there's no new information being added to the system. You'd be reinforcing everything it already gets right and wrong, which would lead to zero improvement.

That said, it's common to use a large model to generate synthetic training data for a smaller model (knowledge distillation). In this way, we're able to transfer knowledge from one model to another.
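
For example, here's a rough sketch of logit-level knowledge distillation in PyTorch. The tiny teacher/student networks and random inputs are placeholders; in practice you'd distill a real pretrained model, often by training the student directly on teacher-generated text:

    # Minimal knowledge-distillation sketch: a small "student" is trained
    # to match a larger "teacher"'s output distribution.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)

    teacher = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 10))
    student = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 10))

    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    T = 2.0  # softmax temperature; softer targets expose more of the teacher

    for step in range(200):
        x = torch.randn(32, 16)                 # stand-in for real inputs
        with torch.no_grad():
            teacher_logits = teacher(x)
        student_logits = student(x)
        # KL divergence between softened distributions (standard distillation loss)
        loss = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % 50 == 0:
            print(f"step {step}: distillation loss {loss.item():.4f}")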


You can find the answer by trying the following: generate random data from a known process, fit a linear regression (or any other model) to it, sample new points from the fitted model, add them to the training set, and refit. Repeat and see what happens.
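
Something like this (numpy, with a 1-D linear model as the toy case; the specific numbers are arbitrary):

    # Self-training experiment: fit a linear model, sample synthetic points
    # from the fit, fold them back into the training set, refit, repeat.
    import numpy as np

    rng = np.random.default_rng(0)

    # Ground truth: y = 2x + 1 + noise with sigma = 1.0
    x = rng.uniform(-3, 3, size=20)
    y = 2 * x + 1 + rng.normal(0, 1.0, size=20)

    for it in range(10):
        # Fit slope/intercept by least squares, estimate residual std
        a, b = np.polyfit(x, y, deg=1)
        sigma = np.std(y - (a * x + b))
        print(f"round {it}: a={a:.3f} b={b:.3f} sigma={sigma:.3f}")

        # Sample synthetic data from the fitted model and add it back in
        x_new = rng.uniform(-3, 3, size=20)
        y_new = a * x_new + b + rng.normal(0, sigma, size=20)
        x = np.concatenate([x, x_new])
        y = np.concatenate([y, y_new])

No new information about the true process enters after the first fit, so whatever error that fit has gets baked into every later round.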



