This paper is really important: it shows that transfer learning can be applied to a wide variety of NLP problems with great success. They show state-of-the-art results on nearly every major class of NLP problem.
The basic approach is the same as our ULMFiT (http://nlp.fast.ai/classification/2018/05/15/introducting-ul...) model - pre-train a language model (a model that predicts the next word in a sequence) on a large corpus, and then modify the language model slightly for whatever task you wish to do (e.g. text classification). Finally, fine-tune that model using your target corpus (e.g. texts labeled with classes).
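To make that recipe concrete, here's a rough sketch, assuming a toy LSTM backbone in plain PyTorch - it's not the ULMFiT code or the paper's model, and the class names and sizes are made up for illustration:

    # Stage 1: pre-train a language model; Stage 2: reuse its encoder under a new
    # classifier head and fine-tune on the labeled target corpus.
    import torch
    import torch.nn as nn

    VOCAB, EMB, HID, NUM_CLASSES = 10_000, 128, 256, 5

    class Encoder(nn.Module):
        """Shared backbone: token embedding + LSTM."""
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB, EMB)
            self.lstm = nn.LSTM(EMB, HID, batch_first=True)

        def forward(self, tokens):            # tokens: (batch, seq_len) of token ids
            out, _ = self.lstm(self.emb(tokens))
            return out                        # (batch, seq_len, HID)

    class LanguageModel(nn.Module):
        """Pre-training objective: predict the next token at every position."""
        def __init__(self, encoder):
            super().__init__()
            self.encoder = encoder
            self.head = nn.Linear(HID, VOCAB)

        def forward(self, tokens):
            return self.head(self.encoder(tokens))

    class Classifier(nn.Module):
        """Target task: same encoder, new head over the final hidden state."""
        def __init__(self, encoder):
            super().__init__()
            self.encoder = encoder
            self.head = nn.Linear(HID, NUM_CLASSES)

        def forward(self, tokens):
            return self.head(self.encoder(tokens)[:, -1])

    encoder = Encoder()
    lm = LanguageModel(encoder)
    # ... pre-train `lm` with cross-entropy on next-token prediction over the big corpus ...

    clf = Classifier(encoder)                 # reuses the pre-trained encoder weights
    # ... fine-tune `clf` with cross-entropy on the labeled target texts ...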
This new paper has two significant leaps over ULMFiT:
- Replace the RNN with a transformer model (a rough sketch of that swap follows this list)
- Apply to many more types of problems.
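Here's a rough idea of what the first change looks like, assuming a minimal causal self-attention LM in PyTorch - the paper's actual model is a much larger transformer decoder, so treat this purely as an illustration of swapping the LSTM for attention layers with a left-to-right mask:

    import torch
    import torch.nn as nn

    class TinyTransformerLM(nn.Module):
        def __init__(self, vocab=10_000, d_model=128, nhead=4, layers=2, max_len=512):
            super().__init__()
            self.tok = nn.Embedding(vocab, d_model)
            self.pos = nn.Embedding(max_len, d_model)     # learned positional embeddings
            block = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=512,
                                               batch_first=True)
            self.blocks = nn.TransformerEncoder(block, layers)
            self.head = nn.Linear(d_model, vocab)

        def forward(self, tokens):                        # tokens: (batch, seq_len)
            n = tokens.size(1)
            x = self.tok(tokens) + self.pos(torch.arange(n, device=tokens.device))
            # causal mask: -inf above the diagonal, so a position can't attend to the future
            mask = torch.triu(torch.full((n, n), float("-inf"), device=tokens.device),
                              diagonal=1)
            return self.head(self.blocks(x, mask=mask))   # next-token logits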
Note that although the original language model takes them a long time to train (a month on 8 GPUs), there's almost no reason for anyone else to create their own model from scratch, except if you need to use this approach on a language that doesn't have a pre-trained model yet. The transfer learning fine-tuning doesn't take anywhere close to as long as the language model pre-training, and you can just use the existing pre-trained weights.
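Reusing those existing weights is roughly a one-liner - a small sketch, reusing the toy `Encoder`/`Classifier` classes from the sketch above, with `pretrained_lm_encoder.pt` as a made-up stand-in for a released checkpoint:

    import torch

    encoder = Encoder()
    encoder.load_state_dict(torch.load("pretrained_lm_encoder.pt", map_location="cpu"))

    clf = Classifier(encoder)                  # only the classifier head starts from scratch
    optimizer = torch.optim.Adam(clf.parameters(), lr=1e-4)
    # ... a few passes over the labeled target data, rather than a month on 8 GPUs ...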
Is it really true that there is no reason to train your own language model? What if you have hundreds of millions of unlabeled examples of something that looks very different from formal language? E.g. you're analyzing Slack messages.
I tested adding things like tweets and Reddit comments to our language model, and it didn't help the target model at all, even when the target corpus used less formal language.
Note, however, that the fine-tuning stage adapts to the target corpus - it just doesn't require starting from random weights (so it's orders of magnitude faster).
>Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100x more data. We open-source our pretrained models and code.
Good guy Jeremy. Works hard during day, open sources it at night
Should be noted this idea is not that novel though - it's just replacing word vectors with a pre-trained model. Interesting that it works so well but not very surprising.
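To make the contrast concrete, here's a rough sketch in PyTorch (the `glove_matrix` stand-in and the commented-out checkpoint name are placeholders, not anything from the paper): word vectors only initialize the embedding layer, while a pre-trained language model supplies weights for the whole stack.

    import torch
    import torch.nn as nn

    VOCAB, EMB = 10_000, 300

    # Option A: word vectors - copy pre-trained vectors into just the embedding layer;
    # everything stacked on top still starts from random weights.
    glove_matrix = torch.randn(VOCAB, EMB)       # stand-in for loaded GloVe/word2vec vectors
    embed_only = nn.Sequential(
        nn.Embedding.from_pretrained(glove_matrix, freeze=False),
        nn.LSTM(EMB, 256, batch_first=True),     # random init, learned only on the target task
    )

    # Option B: pre-trained model - every layer (embeddings plus all layers above them)
    # starts from weights learned with the language-modelling objective.
    # full_model = SomeLanguageModel()
    # full_model.load_state_dict(torch.load("lm_checkpoint.pt"))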
It may not be novel, but then why do we still have commercial APIs (Microsoft, Google, IBM Watson, etc.) where there is pretty much no way to "fine-tune" them to your domain with a small set of supervised examples? We all know domain adaptation is a real problem.
Instead you either have to roll your own models in-house (which defeats the whole point of using a ready-made cloud solution) or deal with whatever accuracy you happen to get from those APIs.
IMHO this is an area where you can make some serious competitive headway in commoditised AI/ML. Do all the heavy lifting of pretraining and give your customers an API to "fine-tune" with. Who is currently doing this?
Wait, what is the difference between word vectors and a pre-trained model? Aren't word vectors generated by a neural network trained to predict the next word, or to recognize noise (noise-contrastive training)? How is a "pre-trained model" different from the training needed to generate word vectors?
The previous HN discussion on ULMFit may also be of interest: https://news.ycombinator.com/item?id=17076222