
Comments on “Failure to Correct for Multiple Comparisons”—“Ten common statistical mistakes…” #8


This week’s installment of comments on Makin and Orban de Xivry’s Common Statistical Mistakes deals with #8: Failure to Correct for Multiple Comparisons. (Previous posts: #1-2, #3, #4, #5, #6, #7.)

Makin and Orban de Xivry’s description is rather complex, but the error is a simple one. To illustrate: suppose we have a control group (gray in the plot below) and several different test groups, for example populations exposed to different drugs, or different genes whose expression levels we’re measuring, or pixels of different intensities in an MRI image. We see that one of these (orange) is “significantly different” when compared to the control set, with p < 0.05, and we conclude that we’ve found something meaningful.

The problem is essentially the same as the previous topic of p-hacking. (I would have grouped these into one item if I were writing the eLife article.) This is convenient, since I spent too much time on last week’s post and so can spend less time on this one! I’ll phrase the underlying issue this way: A p-value isn’t simply a property of the data alone, but also carries with it aspects of the process.

The process, in the example above, is looking at multiple measurements. We can’t just ask, “is the probability less than 0.05 that this dataset would show a particular value?” Rather, we have to ask, “is the probability less than 0.05 that any of several datasets would show a particular value?” The probability of rolling a die and getting a “6” is 1/6; the probability of rolling 6 dice and finding at least one “6” is much higher (about 67%).
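As a quick check, that 67% is one line of arithmetic; a minimal sketch in Python:

```python
# Probability of at least one "6" in six rolls of a fair die:
# the complement of all six rolls avoiding "6".
p_any_six = 1 - (5 / 6) ** 6
print(p_any_six)  # ~0.665, i.e. about 67%
```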

Here’s another way to think about this: In the plot above, the mean and standard deviation of the orange points are 1.46 and 0.82, respectively; the control’s are 0.64 and 0.75. If I had only measured the control and orange datasets…

… the mean and standard deviation of the orange points are 1.46 and 0.82, respectively; the control’s are 0.64 and 0.75. Obviously, nothing has changed. However, in the latter case, p = 0.03; there’s a 3% chance that two draws from an identical random number generator (as I used) would give values as different as we see. In the former case, the numbers are exactly the same, but the true p-value, not a pairwise comparison but the actual probability of finding a dataset that’s “significantly different” from the control, is much more than 3%. (See Exercise 8.2.) For the p-value to mean anything, we have to account for the multiple comparisons we could make.
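For the curious, that pairwise p = 0.03 can be reproduced from the summary statistics alone; a sketch assuming a standard two-sample t-test with equal variances and 10 points per group (as in the plots):

```python
from scipy import stats

# Two-sample t-test from the summary statistics quoted above.
result = stats.ttest_ind_from_stats(mean1=1.46, std1=0.82, nobs1=10,
                                    mean2=0.64, std2=0.75, nobs2=10)
print(result.pvalue)  # ~0.03
```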

There are statistical treatments for correcting p-values for multiple comparisons (the Bonferroni correction is the simplest), but less abstractly, one could simulate the multiple random processes that occur in one’s experiment (e.g. the 5 test groups above) and calculate the likelihood of various observations. What does p < 0.05 actually look like? Even better, one can use the multiple comparisons as a starting point for new experiments that take away this freedom, experiments that look specifically at the group that may be demonstrating an effect.
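To make that simulation idea concrete, here’s a minimal sketch in Python (the five test groups and 10 points per group match the example above; the seed and simulation count are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_sims, n_groups, n_points = 10_000, 5, 10

# Count simulations in which at least one of five *null* test groups
# (drawn from the same distribution as the control) looks
# "significantly different" from the control by a naive t-test.
hits = 0
for _ in range(n_sims):
    control = rng.normal(1, 1, n_points)
    groups = rng.normal(1, 1, (n_groups, n_points))
    if min(stats.ttest_ind(control, g).pvalue for g in groups) < 0.05:
        hits += 1

print(hits / n_sims)  # roughly 0.2, far above the nominal 0.05
```

Demanding p < 0.05/5 = 0.01 for each comparison (the Bonferroni correction) brings the overall false-positive rate back down to roughly the intended 5%.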

Discussion topic 1. I claim that this is the same issue as p-hacking. Do you agree or disagree? Explain.

Discussion topic 2. Why is it difficult to detect whether researchers have committed this error when reading papers?

Exercise 8.1 You roll twenty-one twenty-one-sided dice. The probability that any particular one of them shows “12” is 1/21, which is less than 0.05. What’s the probability that at least one of the dice shows “12”? (You should be able to quickly write an equation for the exact solution.)

Exercise 8.2 Recreate the graphs above, where each set of points is 10 values drawn from a Gaussian distribution of mean = std. dev. = 1. Simulating this many times, what’s the probability that at least one dataset has a mean that differs from the control set’s mean by 0.82 or more (the difference between orange and gray above)?
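For readers who’d like a starting point, here’s a minimal simulation sketch (Python; the five test groups and the 0.82 threshold come from the example above):

```python
import numpy as np

rng = np.random.default_rng()
n_sims, n_groups, n_points = 10_000, 5, 10

# Fraction of simulations in which some test group's mean lands
# at least 0.82 away from the control group's mean.
count = 0
for _ in range(n_sims):
    control_mean = rng.normal(1, 1, n_points).mean()
    group_means = rng.normal(1, 1, (n_groups, n_points)).mean(axis=1)
    if np.abs(group_means - control_mean).max() >= 0.82:
        count += 1

print(count / n_sims)
```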

Today’s illustration…

Water, shore. You know the routine.

— Raghuveer Parthasarathy, November 20, 2020

