TL;DR: Using functional information improves fine-mapping in genome-wide association studies.
I recently reviewed a paper titled “Integrating Functional Data to Prioritize Causal Variants in Statistical Fine-Mapping Studies”, which has now been published. Overall I thought this was a useful contribution that improves on several methodological aspects of fine-mapping in genome-wide association studies.
(NB: none of the points from my actual review are still worth discussing, so the following are my somewhat rambling current thoughts)
The key to the method here is that 1) it explicitly includes a prior from functional genomic information and 2) it allows for multiple causal SNPs. The figure at the top of this post shows how different methods perform in identifying truly causal SNPs in simulations: the bottom left shows performance in simulations with a single causal variant at a locus, and the bottom right shows performance in simulations with multiple causal variants. Perhaps unsurprisingly, methods that explicitly assume a single causal variant (like fgwas, which I wrote) perform best in the former situation, while methods that allow multiple causal variants (like PAINTOR, by these authors) perform best in the latter situation.
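To make the idea of a functional prior concrete, here is a minimal toy sketch in Python. It is not the implementation of PAINTOR or fgwas. It assumes a single causal variant at the locus, takes made-up per-SNP log Bayes factors, and up-weights SNPs in a hypothetical annotation by an assumed 10-fold enrichment. The function names, the numbers, and the 95% credible-set cutoff are all illustrative assumptions.

```python
import math


def posterior_probs(log_bfs, annotated, log_enrichment=0.0):
    """Posterior probability that each SNP is the causal one.

    Assumes exactly one causal SNP at the locus. The prior for SNP i is
    proportional to exp(log_enrichment * annotated[i]), so a value of 0.0
    gives a flat prior.
    """
    log_w = [lbf + log_enrichment * a for lbf, a in zip(log_bfs, annotated)]
    m = max(log_w)  # subtract the max before exponentiating, for stability
    w = [math.exp(x - m) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


def credible_set(probs, level=0.95):
    """Smallest set of SNP indices whose posterior mass reaches `level`."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += probs[i]
        if cumulative >= level:
            break
    return chosen


# Made-up locus: six SNPs in tight LD with similar evidence of association.
log_bfs = [4.0, 4.0, 3.9, 3.8, 3.7, 3.6]
annotated = [1, 0, 0, 0, 0, 0]  # only the first SNP sits in the annotation

flat = posterior_probs(log_bfs, annotated)
informed = posterior_probs(log_bfs, annotated, log_enrichment=math.log(10))

print("flat prior:      ", [round(p, 3) for p in flat], len(credible_set(flat)))
print("functional prior:", [round(p, 3) for p in informed], len(credible_set(informed)))
```

In a toy locus like this, the annotation shifts posterior mass toward the annotated SNP, and the credible set can only stay the same size or shrink. Allowing multiple causal SNPs is a harder problem, since it means considering combinations of SNPs and the LD between them, and this sketch does not attempt it.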
One thing that I’ve been thinking about is the question: how much do we expect to gain from incorporating functional genomic information into GWAS? In this study, the authors are able to reduce the number of plausible causal variants at a locus by roughly 20% using annotations enriched roughly 5-10X for strong associations (see also a paper from Gusev et al. with similar results); in my own work, I’ve seen that this style of approach also increases the number of identified associations by around 5%. This shows that this line of work is on the right track, but personally, I had (perhaps naively) expected this approach to be more powerful. A couple of possibilities:
- Do we have the right functional annotations? One possibility is that we don’t have data from the tissues and experimental conditions that are most relevant for annotating disease-related SNPs; for example, perhaps we need to incorporate maps of transcription factor binding in stimulated immune cells, or from different developmental stages. Some nice work along these lines has been done by Fairfax et al.
- Do we need to get better at predicting which SNPs in an annotation are important? Another possibility is that we’re doing a poor job of distinguishing SNPs that matter from SNPs that don’t; e.g. two variants could both fall in an annotated transcription factor binding site, but only one might actually influence binding. Important work along these lines has been done by Moyerbrailean et al.
It seems unlikely that there’s going to be a “magic bullet” that works in all situations; rather, it will take progress on each of these points, plus the development of new methods that use all these different sources of information together.