Review: High Resolution Genomic Analysis of Human Mitochondrial RNA Sequence Variation

This post is part of an experiment where I will be posting summaries and critiques of the main points of papers I review for journals. Apologies in advance for any misunderstandings and errors on my end; please correct these in the comments.

TL;DR: A clever analysis of RNA sequencing data identifies natural genetic variation influencing mitochondrial tRNA processing in humans.

I recently reviewed a manuscript titled “High Resolution Genomic Analysis of Human Mitochondrial RNA Sequence Variation”, which has now been published. Overall I thought the paper was creative and surprising; I’d be interested in hearing other folks’ thoughts.

The experiment

The initial goal of this study seems to have been to use RNA-seq to quantify variation in mitochondrial RNA and DNA sequences. The authors sequenced cDNA libraries prepared from mRNA from whole blood in ~700 individuals, and focused specifically on sequencing reads that mapped to the mitochondrial genome. Since each individual in principle inherited a single mitochondrial genome from their mother, there should be essentially no sequence-level variation within individuals (modulo sequencing and mapping artifacts, more on this later).

The authors then did a simple analysis: they looked for positions in the mitochondrial transcriptome where they observed more than a single base in an individual. They identified ~600 such sites (some observed in multiple individuals), which they call “heteroplasmies”. Putting aside potential technical explanations for these sites, heteroplasmies could be due to either 1) variation at the DNA level (e.g. mutations that have occurred in mitochondria of the individual’s blood during their lifetime) or 2) variation at the RNA level (post-transcriptional modifications of RNA through mechanisms like RNA editing).
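To make the idea of "more than a single base in an individual" concrete, here is a minimal sketch of how such sites might be called from per-position read data. This is my own illustration, not the authors' pipeline: the function name, the input format, and both thresholds (`min_depth`, `min_minor_fraction`) are assumptions chosen for the example.

```python
from collections import Counter


def call_heteroplasmies(pileup, min_depth=100, min_minor_fraction=0.05):
    """Return positions where more than one base is observed in one individual.

    `pileup` maps a mitochondrial position to the string of bases seen in the
    reads covering it. Both thresholds are illustrative assumptions, not the
    values used in the paper.
    """
    calls = {}
    for pos, bases in pileup.items():
        counts = Counter(b for b in bases.upper() if b in "ACGT")
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        # Keep only alleles whose read fraction clears the threshold.
        alleles = {
            base: n / depth
            for base, n in counts.items()
            if n / depth >= min_minor_fraction
        }
        if len(alleles) > 1:
            calls[pos] = alleles
    return calls


# Toy data for one individual (positions are made up).
example = {
    100: "A" * 200,                        # a single base: not called
    200: "A" * 150 + "G" * 50,             # two alleles
    300: "A" * 120 + "T" * 40 + "C" * 40,  # more than two alleles
    400: "A" * 5 + "G" * 5,                # too shallow: skipped
}
calls = call_heteroplasmies(example)
```

Note that a caller like this cannot tell DNA-level variation, RNA-level modification, and mapping artifacts apart; it only reports that more than one base was seen.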

Main result: A genetic variant in MRPP3 influences processing of mitochondrial tRNAs

At 13 of the heteroplasmic sites, the authors noticed that their data contained more than two alleles (rather than the two you might expect from a new mutation or a simple RNA editing event). They also made an odd observation: 11 of these 13 sites fell in the ninth position of tRNA genes. By reference to what is known about tRNA biology, they argue that the particular patterns of mismatches they observe at these sites are caused by RNA methylation, which produces the observed mismatches via reverse-transcriptase errors.

Under this model, the proportion of non-reference alleles at a site is a quantitative measure of the fraction of mitochondria in an individual that is methylated at the site. The authors reasoned that as a quantitative phenotype, genetic variants influencing methylation levels might be mapped by standard human genetics methods. Shown at the top of the post is a “Manhattan plot” showing the authors’ results from a genome-wide association study of (putative) tRNA methylation in the mitochondria. The result is essentially every human geneticist’s dream: there’s a single strong peak centered on a nonsynonymous SNP in a biologically plausible gene (in this case, MRPP3, a gene involved in processing of mitochondrial tRNAs).
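As a rough sketch of the logic in this paragraph: the phenotype is the fraction of non-reference reads at a site, and the association test asks whether that fraction changes with the number of copies of an allele at a nuclear SNP. The code below is my own toy version with made-up data and hypothetical function names. A real genome-wide scan would repeat this for every SNP, include covariates, and compute p-values, none of which is shown here.

```python
def nonref_fraction(base_counts, ref_base):
    """Fraction of reads at a site that do not carry the reference base."""
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no reads at this site")
    return 1.0 - base_counts.get(ref_base, 0) / total


def slope_per_allele(dosages, phenotypes):
    """Ordinary least-squares slope of phenotype on allele dosage (0, 1 or 2)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    return sxy / sxx


# Toy data: six individuals, read counts at one putatively methylated site.
read_counts = [
    {"A": 90, "G": 10}, {"A": 90, "T": 10},
    {"A": 80, "G": 20}, {"A": 80, "C": 20},
    {"A": 70, "G": 30}, {"A": 70, "T": 30},
]
dosages = [0, 0, 1, 1, 2, 2]  # copies of the alternative allele at one SNP

phenotypes = [nonref_fraction(counts, "A") for counts in read_counts]
slope = slope_per_allele(dosages, phenotypes)
```

In this toy example each extra copy of the allele raises the non-reference fraction by about 0.1; the numbers are invented purely to show the calculation.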

Putting all of this together, it seems that there is variation in mitochondrial tRNA methylation (or some other modification that could cause similar reverse-transcriptase errors) among individuals in a population, and that this variation is partially due to a trans-acting genetic variant of relatively large effect. I found this quite impressive.

A note of caution regarding estimates of the total number of heteroplasmies

At various points in the paper, the authors include other results that are often interesting but not as important to the main conclusion. One of these that is worth thinking about is the overall number of heteroplasmic sites.

The authors estimate that in their samples, there are around 600 mitochondrial sites that have multiple alleles (note that this is a sum of DNA-level heteroplasmies and RNA-level heteroplasmies). I have a nagging suspicion that this is an overestimate.

The reason for this suspicion is that I’m worried about mapping errors from “nuclear mitochondrial DNA” (AKA Numt) sequences causing false inference of heteroplasmies. Examination of some of the reported sites suggests that the alleles of the “heteroplasmies” are indeed consistent with being due to mismapping of reads from nuclear sequences.

For example, below is a screenshot of the UCSC genome browser showing the region surrounding two “heteroplasmic” sites from Supplementary Table 1. I’m showing the sequence of the reference mtDNA (at the top), as well as the sequences of all relevant Numts (using the NumtS Sequence track). As you can see, at the two sites called by the authors, the alternative “allele” at the site matches the sequence of the Numt. My guess is that there is no mitochondrial sequence variation at these two sites, just mis-mapped sequencing reads that originated from the Numts.


It’s unclear how many of the sites identified by the authors are potentially affected by mapping errors (though note none of the 13 used in the mapping experiment described above have any indication of such problems to my eye). For people interested in quantifying the overall extent of the phenomenon observed by the authors, this seems like a potentially important source of error to take into account.
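The by-eye check described above could be automated along these lines. This is only a sketch under my own assumptions: the input format, the function name, and the toy positions are hypothetical, and a match between an alternative allele and a Numt base marks a site as suspicious rather than proving a mapping error.

```python
def flag_numt_candidates(sites, numt_bases):
    """Flag sites whose alternative allele matches an aligned Numt base.

    `sites` is a list of (position, ref_base, alt_bases) tuples and
    `numt_bases` maps a position to the set of bases seen in the Numt
    sequences aligned there. Both formats are assumptions for this example.
    """
    flagged = []
    for pos, ref, alts in sites:
        numt = numt_bases.get(pos, set())
        # An alternative allele that matches a Numt but not the mtDNA
        # reference is what mis-mapped nuclear reads would produce.
        shared = sorted(a for a in alts if a in numt and a != ref)
        if shared:
            flagged.append((pos, shared))
    return flagged


# Toy data (positions and bases are invented).
sites = [
    (1000, "A", ["G"]),       # alternative allele matches the Numt
    (2000, "C", ["T"]),       # Numt carries the reference base
    (3000, "G", ["A", "T"]),  # one of two alternative alleles matches
]
numt_bases = {1000: {"G"}, 2000: {"C"}, 3000: {"T"}}

flagged = flag_numt_candidates(sites, numt_bases)
```

Sites that come back flagged would be the ones to inspect more closely, for instance by checking whether the reads carrying the alternative allele map equally well to the nuclear genome.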

Y-chromosome “Adam” was not necessarily human


Metaphors in science play an important role in communicating results from one field to scientists in other fields and to the general public. In some cases, however, metaphors are so successful and so appealing that they actually obscure rather than enlighten.

In human population genetics, it is a simple fact that all of the Y chromosomes present in the world today can be traced back to a single common ancestor: if you follow my paternal line (my father’s father’s father’s father, and so on) and your paternal line back far enough, eventually they will overlap. At some point, a population geneticist had the clever idea of calling this common ancestor “Adam”. This is a biblical allusion, of course, and it probably was good for a bit of amusement a couple of decades ago. But it’s time to retire this metaphor, not only because it confuses the public (see a nice series of posts by Melissa Wilson Sayres on this topic here) and scientists in other fields, but because it confuses even practicing human population geneticists!

I was reminded of this when reading over a paper by Eran Elhaik, Dan Graur, and colleagues critiquing work on the human Y chromosome phylogeny by Mendez et al. The basic question being debated is: when did the most recent common ancestor (MRCA) of all Y chromosomes exist? Mendez et al. claimed that this Y chromosome was present around 300,000 years ago, and Elhaik et al. claim they arrived at this number incorrectly.

The details of these papers are not relevant for this post. The key thing I want to point out is an underlying assumption, perhaps most clearly expressed by Elhaik et al., who write:

[Mendez et al.] estimated the time to the most recent common ancestor (TMRCA) for the Y tree to be 338,000 years ago (95% CI=237,000–581,000). Such an extraordinarily early estimate contradicts all previous estimates in the literature and is over a 100,000 years older than the earliest fossils of anatomically modern humans. This estimate raises two astonishing possibilities

The implicit assumption here (the reason Elhaik et al. find the numbers “extraordinarily early” and “astonishing”) is that the individual carrying the most recent common ancestor of all human Y chromosomes (AKA “Adam”) should be an anatomically modern human. Amusingly, Elhaik et al. argue that to claim otherwise is analogous to claiming you have a unicorn in your backyard. But there is simply no reason that “Adam” must be a human. At the top of this post I’ve put a figure showing a hypothetical Y-chromosome genealogy superimposed on a hypothetical human phylogeny. In this (of course hypothetical) example, “Adam” existed well before the diversification of modern humans; this type of scenario is perfectly compatible with basic population genetic theory. From the point of view of population genetics, there is absolutely no reason that the common ancestor of all human Y chromosomes must have existed in an individual that we would identify as “human”.
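To see why theory places no such constraint, here is a small sketch based on the textbook neutral coalescent for a haploid, paternally inherited locus. It is my own illustration and not the method of either paper. The assumptions are a constant effective number of males (`n_males`) and a fixed generation time, and the numbers plugged in are hypothetical, not estimates. While k lineages remain, the expected waiting time to the next coalescence is n_males / C(k, 2) generations, so the expected time to the MRCA of n sampled chromosomes is 2 * n_males * (1 - 1/n) generations.

```python
def expected_tmrca_generations(n_sampled, n_males):
    """Expected TMRCA (in generations) under the standard neutral coalescent.

    Sums the expected waiting time while k lineages remain,
    n_males / C(k, 2), for k = n_sampled down to 2.
    """
    if n_sampled < 2:
        raise ValueError("need at least two sampled chromosomes")
    return sum(n_males / (k * (k - 1) / 2) for k in range(2, n_sampled + 1))


# Hypothetical values, chosen only to illustrate the calculation.
n_males = 5000           # assumed effective number of males
generation_years = 30    # assumed generation time
generations = expected_tmrca_generations(100, n_males)
years = generations * generation_years
```

The expected age depends only on the population size and the generation time; nothing in the calculation refers to when anatomically modern humans appeared. The distribution around this expectation is also wide, so much older ages are unremarkable under the same model.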

So why would anyone make this assumption? Note that Elhaik et al. made a YouTube video describing their results; this video leads with a bit of religious iconography. It seems plausible that by calling the most recent common ancestor of all Y chromosomes “Adam”, population geneticists have confused themselves into thinking that “he” must have been human.