The burden of the “multiple testing burden”

I’ve written before on this site about the way metaphors influence scientists and motivate entire directions of research. I was thinking about this again during talks at the meeting of the American Society of Human Genetics, where I was repeatedly reminded of that albatross hanging around the neck of anyone looking genome-wide for variants that influence disease risk: the “multiple testing burden”.

Over and over again, I was told by speakers that an unfortunate side effect of looking at a whole genome is that there are so many things to look at: as all good statisticians know, we brush our teeth twice a day, floss once, and always correct for multiple comparisons. This correction (requiring more stringent significance thresholds in a genome-wide study than in a targeted study) is our “burden”.

This rather Puritanical point of view (which has always been a bit odd [1]) has one important upside: it encourages stringent thresholds for calling a “true” association. For example, in GWAS, the heuristic P-value threshold of 5×10^-8 has proved incredibly useful for avoiding follow-up work on false positives.
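(For concreteness: that heuristic is essentially a Bonferroni correction, α = 0.05 divided by the roughly one million effectively independent common variants in a genome. The sketch below shows just that arithmetic; the one-million figure is the conventional assumption rather than an exact count.)

```python
# A Bonferroni-style family-wise error correction: divide the desired
# error rate by the number of (effectively independent) tests.
alpha = 0.05          # family-wise error rate to tolerate across the study
n_tests = 1_000_000   # assumed number of effectively independent common SNPs

threshold = alpha / n_tests
print(f"genome-wide significance threshold: {threshold:.0e}")  # 5e-08
```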

However, this point of view also has one important downside: it implicitly suggests that the multiple testing “burden” can be “lifted” simply by looking at fewer things! But this makes no sense: taken to its logical extreme, it suggests that a reasonable study design would be to consider only rs78704525 (to pick a SNP at random) in all future studies. So much burden gone!

So I’d like to replace the idea of the “multiple testing burden” in genomics with something a bit more upbeat. Perhaps a “multiple testing party”. As in: “we collected 1,000 phenotypes and 1,000,000 genotypes on all study participants. Naturally, this required throwing a multiple testing party”. The more the merrier! [2]

----

[1] As has been pointed out, for example, by one of the first successful genome-wide association studies [WTCCC 2007]:

Classical multiple testing theory in statistics is concerned with the problem of ‘multiple tests’ of a single ‘global’ null hypothesis. This, we would argue, is a problem far removed from that which faces us in genome-wide association studies, where we face the problem of testing ‘multiple hypotheses’ (for a particular disease, one hypothesis for each SNP, or region of correlated SNPs, in the genome) and we thus do not subscribe to the view that one should correct significance levels for the number of tests performed to obtain ‘genome-wide significance levels’.

[2] More seriously, by simultaneously looking at millions of genetic variants and thousands of phenotypes, one can identify systematic biases in the data and potentially correct for them (cf. Leek et al. 2010), and in principle learn what proportion of variants influence each trait (cf. Storey 2001). This is not a “burden”; it is actually a huge advantage.
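To make the Storey point concrete, here is a minimal sketch of a Storey-style estimate of the proportion of true nulls. The idea: null p-values are uniform on [0, 1], so the density of p-values above a tuning parameter λ estimates the null fraction. The choice λ = 0.5 and the simulated p-values below are illustrative assumptions, not results from any real study.

```python
import numpy as np

def estimate_pi0(pvalues, lam=0.5):
    """Storey-style estimate of the proportion of true null hypotheses.

    Null p-values are uniform on [0, 1], so the fraction of p-values
    above `lam`, rescaled by the width of that interval, estimates
    the proportion of nulls (pi0).
    """
    pvalues = np.asarray(pvalues)
    pi0 = np.mean(pvalues > lam) / (1.0 - lam)
    return min(pi0, 1.0)  # pi0 is a proportion, so cap at 1

# Toy example with simulated data: 90% null p-values (uniform) and
# 10% non-null p-values skewed toward zero.
rng = np.random.default_rng(0)
p = np.concatenate([rng.uniform(size=9_000), rng.beta(0.5, 10, size=1_000)])
print(f"estimated proportion of nulls: {estimate_pi0(p):.2f}")  # ~0.90
```

An estimate like this tells you roughly how many of your million tests are even candidates for a real signal, which is exactly the kind of information a targeted, single-SNP study cannot provide.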

One thought on “The burden of the ‘multiple testing burden’”

  1. I have taken to referring to multiple tests as an “opportunity”: specifically, the opportunity to learn about the true distribution of the effects. So we have the “multiple testing opportunity” instead of the “multiple testing burden”. However, the multiple testing party sounds catchier!
