That is, if we are lucky and everything is done according to the proper tradition of doing statistics with 99% confidence intervals (a 1% significance level). In practice, things are probably even worse (many use 5%, for instance), but this is what you would expect if everyone used proper methodology.

Think about it...

Seriously? That is what concerns you? This doesn't even make any sense -- 99% confidence intervals only say that "science" is _off_ in 1% of the cases, not _wrong_. And people are wrong way more than 1% of the time, and all science is done and verified by people, so methodological inaccuracies are way down the list of reasons why science is wrong.

Also, to think science is _wrong_ assumes that there's some static thing called science. In reality science is a long-term social process, with a feedback loop, that produces useful knowledge. Some of this knowledge may be (and often is) wrong, most of it is irrelevant, but the fraction that is correct and relevant can be used to further the interests of people (and possibly mankind).

Formality in science has the same role as standards in engineering: it doesn't ensure useful or correct results, it just helps avoid stupid mistakes.

My 2 cents :)

Good points, all of them... I even changed the title to better reflect reality... Thinking of that 1% as an upper bound rather than the actual number resolves many of the things for me, at least. Thanks :)

Also - no, I have not lost any sleep over it. It just occurred to me that this is a somewhat interesting point that not many realize.

Things could be even worse, especially in life sciences:

"Why most published research findings are false" (J.P.A. Ioannidis, PLOS Medicine, 2005)

http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.0020124;jsessionid=9B627C3F78570091A27045A54EBC5BA9

Well, there are different kinds of science. There are purely formal fields like maths, statistics, or computer science (which some people do not, in fact, consider proper "science"), and most of the time these cannot be inherently right or wrong; we can only speak of formal correctness, which is mostly verifiable with certainty via peer review.

And then there are the experimental fields like natural sciences, life sciences or sociology, where we do care about statistical methodology and confidence intervals, as well as the honesty of the researchers. We therefore should expect a certain proportion of "false positive" conclusions. However, whenever these are going to have any practical significance, they would either have to have some "reasonable prior support", or be properly reconfirmed by follow-up work anyway.
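To put rough numbers on why prior support matters, here is a minimal Python sketch that applies Bayes' rule to statistically significant results. The function name `share_of_true_findings` and the example values for the prior and the statistical power are assumptions made purely for illustration, not figures taken from any study.

```python
def share_of_true_findings(prior: float, power: float, alpha: float) -> float:
    """Fraction of statistically significant results that reflect a real effect.

    prior -- assumed fraction of tested hypotheses that are actually true
    power -- probability that a real effect is detected (1 - beta)
    alpha -- significance level, i.e. the false positive rate for null effects
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical scenarios: same test (alpha = 5%, power = 80%), different priors.
for prior in (0.5, 0.1, 0.01):
    share = share_of_true_findings(prior, power=0.8, alpha=0.05)
    print(f"prior = {prior:<5} share of true findings = {share:.2f}")
```

Under these assumed values, hypotheses with good prior support (prior 0.5) yield about 94% true findings among the significant results, while long shots (prior 0.01) yield only about 14%, even though the same 5% threshold is applied in both cases.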

There is a related topic, which I find even more curious, namely that of the publication bias both in medical studies and in the field of algorithm development.

Let me also add that cases where statistical methodology is either used inappropriately or is close to being "inapplicable" are way more common than you might expect.

Consider a study where a researcher measures certain indicators for a number of objects with the purpose of determining the interrelations among the indicators. As trivial as this problem formulation might seem, contemporary statistical methodology, combined with contemporary scientific peer review, fails to provide even the expected 5% guarantee for the published findings.

Indeed, since there was no clear experimental statement before the data collection began, once the researcher obtains the dataset, she essentially starts looking for statistical relations. Suppose that during this process she performs 50 different hypothesis tests, about half of which are in fact complex methods, such as linear model fits and ANOVAs, that produce more than one p-value per invocation. It is not improbable that the researcher will not document the "uninteresting" tests, and will end up in a situation where she has essentially performed several hundred statistical tests, yet is aware of just some 20 or 30 of them. However, the number of tests you perform may seriously influence your final conclusion: after multiple testing correction your p-values may easily turn from seemingly very significant to completely insignificant. As a result, the researcher (who, as we remember, did not count her tests properly) can be misled by her own p-values.
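The effect of miscounting tests can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the p-values and the test counts (25 believed, 300 actually performed) are made up, the function names are hypothetical, the family-wise error formula assumes independent tests, and Bonferroni is used simply because it is the easiest correction to show.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n_tests independent tests
    when every null hypothesis is true (independence is an assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_adjust(p_values, n_tests_performed):
    """Bonferroni correction: multiply each p-value by the number of tests
    actually performed, capping the result at 1."""
    return [min(1.0, p * n_tests_performed) for p in p_values]


# Hypothetical p-values that the researcher considers worth reporting.
reported = [0.001, 0.004, 0.03]

# Correction using the number of tests the researcher remembers running...
believed = bonferroni_adjust(reported, 25)
# ...versus the number of p-values that were actually produced.
actual = bonferroni_adjust(reported, 300)

for p, b, a in zip(reported, believed, actual):
    print(f"raw p = {p:.3f}  adjusted for 25 tests = {b:.3f}  for 300 tests = {a:.3f}")

print(f"P(at least one false positive in 50 tests at 5%) = "
      f"{familywise_error_rate(0.05, 50):.2f}")
```

With these made-up numbers, the smallest p-value still looks significant at the 5% level when corrected for 25 tests, but not when corrected for 300.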

Furthermore, the peer reviewers will later have absolutely no way to verify exactly how many tests the researcher performed, because the number indicated in the paper may easily be less than the true one, not only due to the ignorance of the researcher, but simply due to the desire to make the results look nicer.

A follow-up study should make things clearer, but it is not always possible, and even when it is, it is still never possible to verify whether the claims were presupposed before the study or derived from the data.

As a result, it is only the common sense of the reader, rather than the statistics and protocols of the researcher and the reviewers, that should help to assess the validity of the claims in the paper.