That definition of zero shot is wrong, even though it's commonly used.
Zero shot means testing out of distribution, not just on "a task" the model wasn't trained on. The latter is ill defined.
The original definition comes from a few papers, but the classic example is a classifier recognizing zebras despite never having been trained on zebras (though it may have been trained on horses). Zebras are out of distribution here, and importantly, out of the implicit distribution, not the target distribution.
The common improper usage confuses these two. A simple example might be training on 256x256 images and testing on 1024x1024. That's still in the implicit distribution (as long as the classes are identical). A very common example is training on a large dataset like LAION and then testing on COCO or ImageNet-1k. This is not zero shot, because the classes in ImageNet are in LAION (and in COCO). By that definition, any validation or test set would count as zero shot, since those samples were never seen in the training data and are thus outside the training set, which makes the definition useless. Remember that datasets are proxies for larger distributions.
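The class-overlap point can be made concrete with a toy sketch (the captions and class names below are hypothetical stand-ins, not real LAION data): if the evaluation set's class names already appear in the pretraining captions, evaluating on it is not zero shot in the out-of-distribution sense.

```python
# Toy illustration with made-up data: check whether eval classes are
# already covered by the pretraining corpus's captions.
pretrain_captions = [
    "a photo of a golden retriever dog in a park",
    "tabby cat sleeping on a couch",
    "a zebra grazing on the savanna",
]
eval_classes = {"dog", "cat", "zebra"}

# Classes whose names occur in at least one pretraining caption are
# inside the implicit training distribution, so testing on them is
# not zero shot.
seen = {c for c in eval_classes
        if any(c in caption.split() for caption in pretrain_captions)}
print(sorted(seen))  # -> ['cat', 'dog', 'zebra']
```

Real overlap audits need more than substring matching (synonyms, plurals, multilingual captions), but the principle is the same: the eval "distribution" was already sampled during training.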
Where it can sometimes get tricky is tasks (emergence has entered the chat). For example, you may not intend to train a generative model to do classification, but you probably did (it's very clear, in the math, if you're training density models: KLD, score matching, etc.). This can get hairy because it's very easy to train a model to do things you don't realize you're training it to do, and only find out later. Some people get upset about this, but it's the nature of frameworks with low interpretability. There's still a lot of mathematics we need to learn; it tends not to be an explicit focus in ML, but there are plenty in the community focused on it.
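A minimal sketch of the "you trained a classifier without meaning to" point, using 1-D Gaussians as stand-in class-conditional densities (purely illustrative numbers): any model of p(x | y) plus priors p(y) defines a classifier via Bayes' rule, p(y | x) ∝ p(x | y) p(y), whether or not classification was the stated objective.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, standing in for a learned p(x | y)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

# Hypothetical class-conditional densities p(x | y), equal priors p(y).
classes = {"horse": (0.0, 1.0), "zebra": (4.0, 1.0)}

def classify(x):
    # Bayes' rule: argmax over p(x | y) p(y); equal priors cancel.
    return max(classes, key=lambda y: gaussian_pdf(x, *classes[y]))

print(classify(3.5))  # -> zebra (closer to the "zebra" density)
print(classify(0.2))  # -> horse
```

The same argument carries over to real density models: fit p(x | y) per class (or a conditional model) and a classifier falls out of the math for free.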