The "default" alpha and beta are not the correct ones for a website A/B test.
If you're designing a drug, you'd better be very careful not to accidentally approve something that is useless. It would cost a ton of money and lives in the long run if it were no better than placebo. False rejection of the null is very bad. False acceptance of the null is not so bad.
By contrast, if you're doing an A/B test on a website, you're actually not in bad shape if you accidentally think that a red button is a bit better than a blue button, assuming that they're pretty close. False rejection of the null is okay.
However, you are screwed if you miss the chance that a red button gives you 50% more conversions. With websites, false acceptance of the null is very bad. It's okay to mistakenly think your button is effective, but it's very bad to mistakenly think that the button is ineffective.
Websites have the opposite cost-benefit calculation to science generally and shouldn't use the same parameters.
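To make the trade-off concrete, here's a minimal sketch of how the choice of alpha and beta feeds into the traffic you need, using the standard normal-approximation sample-size formula for a two-proportion test. The conversion rates and the specific alpha/beta pairs are made-up illustrations, not recommendations:

```python
from statistics import NormalDist

def sample_size(p1, p2, alpha, beta):
    """Approximate per-arm sample size for a two-proportion z-test
    (normal approximation, two-sided alpha)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(1 - beta)        # critical value for the power target
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ((z_alpha + z_beta) ** 2 * variance) / (p1 - p2) ** 2

# Hypothetical: baseline 4% conversion vs. a hoped-for 6%.
# "Scientific" defaults: guard hard against false rejection of the null.
n_science = sample_size(0.04, 0.06, alpha=0.05, beta=0.20)
# Website-style: tolerate false rejection, guard against false acceptance.
n_website = sample_size(0.04, 0.06, alpha=0.20, beta=0.05)
print(round(n_science), round(n_website))
```

The point isn't which number is smaller; it's that alpha and beta are knobs you are supposed to set from your own error costs, and the defaults bake in someone else's costs.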
Perhaps a simple conceptual tool is to consider risk as the product of cost and likelihood, and to choose an acceptable level of overall risk for Type I and Type II errors. The potential cost of each error then has to be part of the decision-making process.
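That "risk = cost × likelihood" idea can be sketched in a few lines. All the numbers here are invented placeholders (including the 50/50 prior on the null being true) just to show the mechanics:

```python
# Hypothetical costs: how bad is each mistake for this product?
COST_TYPE_1 = 1_000    # falsely declaring a winner (minor: the buttons are close)
COST_TYPE_2 = 50_000   # missing a real conversion lift (severe)
P_NULL_TRUE = 0.5      # assumed prior that there is no real difference

def expected_cost(alpha, beta):
    """Overall risk: cost x likelihood, summed over both error types."""
    return (P_NULL_TRUE * alpha * COST_TYPE_1
            + (1 - P_NULL_TRUE) * beta * COST_TYPE_2)

# Compare the "scientific" defaults with a website-friendly trade-off.
print(expected_cost(alpha=0.05, beta=0.20))  # tight alpha, loose beta
print(expected_cost(alpha=0.20, beta=0.05))  # loose alpha, tight beta
```

With these (assumed) costs, the website-friendly setting has the lower expected cost, which is exactly the point of the parent comments: the right alpha and beta fall out of the costs, not out of convention.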
Yes, of course, I can see that if you have an extremely small sample, then the _resolution_ of your results will suffer. However, I think it's much more important to ensure unbiased sampling than it is to ensure a large sample size.
For example, if you sample 1% of the population in a fair and unbiased way, that would tell you something with a much higher degree of confidence than if you sampled even 10% of the population in a biased way (or in a way such that you don't know whether you are biased or not).
Yes, but it's the resolution that would suffer, not necessarily the result. For example, if 65% of the population would vote for candidate 1, an unbiased sample of size 10 can only report support in multiples of 10%, so it would most likely indicate that 60% or 70% of the population would vote for candidate 1. A biased sample could literally tell you anything, regardless of how large it is (in absolute numbers).
Not every time, surely? A sample is randomly taken, so there will be variation. Even if it's unbiased, your samples will swing between all possible extremes. So you need to take a very large number of small samples.
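A quick simulation splits the difference between these two comments. Taking the 65% figure from above (purely illustrative), repeated unbiased samples of size 10 do swing around, occasionally to extremes, but the estimates can only land on multiples of 10% and most of the mass sits on 0.6 and 0.7:

```python
import random
from collections import Counter

random.seed(42)
TRUE_SUPPORT = 0.65  # hypothetical: 65% of the population backs candidate 1
N = 10               # tiny sample size

# Draw many independent unbiased samples of size 10 and tally the estimates.
estimates = Counter(
    sum(random.random() < TRUE_SUPPORT for _ in range(N)) / N
    for _ in range(10_000)
)
# Each estimate is a multiple of 10%; print the observed distribution.
for value in sorted(estimates):
    print(f"{value:.1f}: {estimates[value] / 10_000:.3f}")
```

So both are right: any single tiny sample is noisy, but the noise is quantifiable and unbiased, which is what bias destroys.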
You can't take an "unbiased" sample in the sense that you mean here.
There's only one way you can take a completely unbiased sample: if you know exactly how everyone will vote in advance and select them carefully on that basis. But if you already know how everyone will vote in advance, then sampling is a fruitless exercise.