There is one rule of thumb that I find quite useful and happen to use fairly often. It is probably not widely known or described in textbooks (I stumbled upon it on my own), so I regularly have to explain it. Next time I'll just point to this post.

The rule is the following: a proportion estimate obtained on a sample of $n$ points should only be trusted up to an error of $\frac{1}{\sqrt{n}}$.

For example, suppose that you read in the newspaper that "25% of students like statistics". Now, if this result has been obtained from a survey of 64 participants, you should actually interpret the answer as $25\% \pm \frac{1}{\sqrt{64}}$, that is, $25\% \pm 12.5\%$, which means that the actual percentage lies somewhere between 12.5% and 37.5%.
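The newspaper example can be checked in a couple of lines of Python (the function name `proportion_interval` is mine, just for illustration):

```python
import math

def proportion_interval(p_hat, n):
    """Approximate 95% interval for a proportion, via the 1/sqrt(n) rule."""
    err = 1 / math.sqrt(n)
    return p_hat - err, p_hat + err

low, high = proportion_interval(0.25, 64)
print(f"{low:.1%} .. {high:.1%}")  # 12.5% .. 37.5%
```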

As another example, in machine learning you often see cases where someone evaluates two classification algorithms on a test set of, say, 400 instances, measures that the first algorithm has an accuracy of 90% and the second an accuracy of, say, 92%, and boldly claims the dominance of the second algorithm. At this point, without going deeply into statistics, it is easy to figure that $\frac{1}{\sqrt{400}}$ is somewhere around 5%, hence the difference between 90% and 92% is not significant enough to celebrate.

### The Explanation

The derivation of the rule is fairly straightforward. Consider a Bernoulli-distributed random variable $X$ with parameter $p$. We then take an i.i.d. sample $(X_1, \dots, X_n)$ of size $n$, and use it to estimate $p$:

$$\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

The 95% confidence interval for this estimate, computed using the normal approximation, is then:

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

What remains is to note that $1.96 \approx 2$ and that $\hat{p}(1-\hat{p}) \leq 0.25$. By substituting those two approximations we immediately get that the interval is at most

$$\hat{p} \pm \frac{1}{\sqrt{n}}$$
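The substitution above can be verified numerically: for any value of $\hat{p}$, the rule-of-thumb interval is at least as wide as the normal-approximation interval. A quick sanity check (function names are mine):

```python
import math

def normal_ci_halfwidth(p_hat, n):
    """Half-width of the 95% normal-approximation confidence interval."""
    return 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)

def rule_of_thumb(n):
    """Half-width given by the 1/sqrt(n) rule."""
    return 1 / math.sqrt(n)

# The rule bounds the normal-approximation interval for every p_hat in [0, 1].
n = 100
assert all(normal_ci_halfwidth(p / 100, n) <= rule_of_thumb(n)
           for p in range(101))
print("rule of thumb is an upper bound")
```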

### Limitations

It is important to understand the limitations of the rule. In the cases where the true proportion is $p \approx 0.5$ and $n$ is large enough for the normal approximation to make sense ($n = 20$ is already good), the one-over-square-root-of-n rule is very close to a true 95% confidence interval.

When the true proportion is closer to 0 or 1, however, $\sqrt{p(1-p)}$ is not close to $0.5$ anymore, and the rule of thumb results in a conservatively large interval.

In particular, the true 95% confidence interval for $p = 0.9$ will be nearly two times smaller ($\approx 0.6/\sqrt{n}$). For $p = 0.99$ the actual interval is five times smaller ($\approx 0.2/\sqrt{n}$). However, given the simplicity of the rule, the fact that the true $p$ is rarely so close to 1, and the idea that it never hurts to be slightly conservative in statistical estimates, I'd say *the one-over-a-square-root-of-n rule* is a practically useful tool in most situations.
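How conservative the rule gets as $p$ drifts away from $0.5$ can be tabulated directly (a small sketch, with a hypothetical helper name `exact_halfwidth` for the normal-approximation interval):

```python
import math

def exact_halfwidth(p, n):
    """95% normal-approximation interval half-width for true proportion p."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

n = 10_000
for p in (0.5, 0.9, 0.99):
    ratio = (1 / math.sqrt(n)) / exact_halfwidth(p, n)
    print(f"p={p}: rule of thumb is {ratio:.1f}x wider")
# p=0.5:  about 1.0x (the rule is nearly exact)
# p=0.9:  about 1.7x (nearly two times wider)
# p=0.99: about 5.1x (five times wider)
```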

### Use in Machine Learning

The rule is quite useful for quickly interpreting performance indicators of machine learning models, such as precision, accuracy or recall; however, you should make sure you understand *what proportion* is actually being computed for each metric. Suppose we are evaluating a machine learning model on a test set of 10000 elements, 400 of which were classified as "positive" by the model. We measure the *accuracy* of the model by computing the proportion of correct predictions *on the whole set of 10000 elements*. Thus, the $n$ here is 10000 and we should expect the confidence interval of the resulting value to be under 1 percentage point. However, the *precision* of the model is measured by computing the proportion of correct predictions *among the 400 positives*. Here $n$ is actually 400 and the confidence interval will be around $1/\sqrt{400} = 0.05$.
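The contrast between the two metrics' error bars can be made explicit (the function name is mine, for illustration):

```python
import math

def rule_of_thumb_error(n):
    """Rule-of-thumb error bar for a proportion estimated from n samples."""
    return 1 / math.sqrt(n)

n_test, n_positive = 10_000, 400
print(f"accuracy  error bar: ±{rule_of_thumb_error(n_test):.3f}")      # ±0.010
print(f"precision error bar: ±{rule_of_thumb_error(n_positive):.3f}")  # ±0.050
```

The same model, the same test set, yet the precision estimate is five times less reliable than the accuracy estimate, simply because it is computed from fewer samples.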

The rule can also be used to *choose* the size of the test set for your model. Despite the popular tradition, using a *fraction* of your full dataset for testing (e.g. a "75%/25% split") is arbitrary and misguided. Instead, it is the *absolute size* of the test sample that you should care about most. For example, if you are happy with your precision estimates being within a 1% range, you only need to make sure your test set includes around 10000 positives. Spending an extra million examples on this estimate will improve its quality somewhat, but you might be better off leaving those examples for model training instead.