This paper is really important: it shows that transfer learning can be applied to a wide variety of NLP problems with great success. They show state-of-the-art results on nearly every major class of NLP problem.
The basic approach is the same as our ULMFiT (http://nlp.fast.ai/classification/2018/05/15/introducting-ul...) model - pre-train a language model (a model that predicts the next word in a sequence) on a large corpus, and then modify the language model slightly for whatever task you wish to do (e.g. text classification). Finally, fine-tune that model using your target corpus (e.g. texts labeled with classes).
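To make that recipe concrete, here's a rough sketch, assuming a toy LSTM backbone in plain PyTorch - it's not the ULMFiT code or the paper's model, and the class names and sizes are made up for illustration:

    # Stage 1: pre-train a language model; Stage 2: reuse its encoder under a new
    # classifier head and fine-tune on the labeled target corpus.
    import torch
    import torch.nn as nn

    VOCAB, EMB, HID, NUM_CLASSES = 10_000, 128, 256, 5

    class Encoder(nn.Module):
        """Shared backbone: token embedding + LSTM."""
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB, EMB)
            self.lstm = nn.LSTM(EMB, HID, batch_first=True)

        def forward(self, tokens):            # tokens: (batch, seq_len) of token ids
            out, _ = self.lstm(self.emb(tokens))
            return out                        # (batch, seq_len, HID)

    class LanguageModel(nn.Module):
        """Pre-training objective: predict the next token at every position."""
        def __init__(self, encoder):
            super().__init__()
            self.encoder = encoder
            self.head = nn.Linear(HID, VOCAB)

        def forward(self, tokens):
            return self.head(self.encoder(tokens))

    class Classifier(nn.Module):
        """Target task: same encoder, new head over the final hidden state."""
        def __init__(self, encoder):
            super().__init__()
            self.encoder = encoder
            self.head = nn.Linear(HID, NUM_CLASSES)

        def forward(self, tokens):
            return self.head(self.encoder(tokens)[:, -1])

    encoder = Encoder()
    lm = LanguageModel(encoder)
    # ... pre-train `lm` with cross-entropy on next-token prediction over the big corpus ...

    clf = Classifier(encoder)                 # reuses the pre-trained encoder weights
    # ... fine-tune `clf` with cross-entropy on the labeled target texts ...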
This new paper has two significant leaps over ULMFiT:
- Replace the RNN with a transformer model (a rough sketch of that swap follows this list)
- Apply to many more types of problems.
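Here's a rough idea of what the first change looks like, assuming a minimal causal self-attention LM in PyTorch - the paper's actual model is a much larger transformer decoder, so treat this purely as an illustration of swapping the LSTM for attention layers with a left-to-right mask:

    import torch
    import torch.nn as nn

    class TinyTransformerLM(nn.Module):
        def __init__(self, vocab=10_000, d_model=128, nhead=4, layers=2, max_len=512):
            super().__init__()
            self.tok = nn.Embedding(vocab, d_model)
            self.pos = nn.Embedding(max_len, d_model)     # learned positional embeddings
            block = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=512,
                                               batch_first=True)
            self.blocks = nn.TransformerEncoder(block, layers)
            self.head = nn.Linear(d_model, vocab)

        def forward(self, tokens):                        # tokens: (batch, seq_len)
            n = tokens.size(1)
            x = self.tok(tokens) + self.pos(torch.arange(n, device=tokens.device))
            # causal mask: -inf above the diagonal, so a position can't attend to the future
            mask = torch.triu(torch.full((n, n), float("-inf"), device=tokens.device),
                              diagonal=1)
            return self.head(self.blocks(x, mask=mask))   # next-token logits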
Note that although the original language model takes them a long time to train (a month on 8 GPUs), there's almost no reason for anyone else to create their own model from scratch, except if you need to use this approach on a language that doesn't have a pre-trained model yet. The transfer learning fine-tuning doesn't take anywhere close to as long as the language model pre-training, and you can just use the existing pre-trained weights.
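Reusing those existing weights is roughly a one-liner - a small sketch, reusing the toy `Encoder`/`Classifier` classes from the sketch above, with `pretrained_lm_encoder.pt` as a made-up stand-in for a released checkpoint:

    import torch

    encoder = Encoder()
    encoder.load_state_dict(torch.load("pretrained_lm_encoder.pt", map_location="cpu"))

    clf = Classifier(encoder)                  # only the classifier head starts from scratch
    optimizer = torch.optim.Adam(clf.parameters(), lr=1e-4)
    # ... a few passes over the labeled target data, rather than a month on 8 GPUs ...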
Is it really true that there is no reason to train your own language model? What if you have hundreds of millions of unlabeled examples of something that looks very different from formal language? E.g. you're analyzing Slack messages.
I tested adding things like tweets and Reddit comments to our language model, and it didn't help the target model at all, even when the target corpus used less formal language.
Note, however, that the fine-tuning stage adapts to the target corpus - it just doesn't require starting from random weights (so it's orders of magnitude faster).
>Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100x more data. We open-source our pretrained models and code.
Good guy Jeremy. Works hard during day, open sources it at night
Should be noted this idea is not that novel though - it's just replacing word vectors with a pre-trained model. Interesting that it works so well but not very surprising.
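To make the contrast concrete, here's a rough sketch in PyTorch (the `glove_matrix` stand-in and the commented-out checkpoint name are placeholders, not anything from the paper): word vectors only initialize the embedding layer, while a pre-trained language model supplies weights for the whole stack.

    import torch
    import torch.nn as nn

    VOCAB, EMB = 10_000, 300

    # Option A: word vectors - copy pre-trained vectors into just the embedding layer;
    # everything stacked on top still starts from random weights.
    glove_matrix = torch.randn(VOCAB, EMB)       # stand-in for loaded GloVe/word2vec vectors
    embed_only = nn.Sequential(
        nn.Embedding.from_pretrained(glove_matrix, freeze=False),
        nn.LSTM(EMB, 256, batch_first=True),     # random init, learned only on the target task
    )

    # Option B: pre-trained model - every layer (embeddings plus all layers above them)
    # starts from weights learned with the language-modelling objective.
    # full_model = SomeLanguageModel()
    # full_model.load_state_dict(torch.load("lm_checkpoint.pt"))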
It may not be novel, but then why do we still have commercial APIs (Microsoft, Google, IBM Watson, etc.) where there is pretty much no way to "fine-tune" them to your domain with a small set of supervised examples? We all know domain adaptation is a real problem.
Instead you either have to roll your own models in-house (which defeats the whole point of using a ready-made cloud solution) or deal with whatever accuracy you happen to get from those APIs.
IMHO this is an area where you can make some serious competitive headway in commoditised AI/ML. Do all the heavy lifting of pretraining and give your customers an API to "fine-tune" with. Who is currently doing this?
Wait, what is the difference between word vectors and a pre-trained model? Aren't word vectors generated by a neural network trained to predict the next word, or to recognize noise (noise-contrastive training)? How is a "pre-trained model" different from the training needed to generate word vectors?
The previous HN discussion on ULMFit may also be of interest: https://news.ycombinator.com/item?id=17076222