Phylogenetic Diversity Theory Sheds Light on the Structure of Microbial Communities
July 22, 2013
It’s become sooo cliché to say this, but I just can’t help myself: It’s a very exciting time to be a microbial ecologist! You lovers of life writ large enough to be viewed with the naked eye have had all the fun so far. Spilling gut contents and watching predators eat prey to fashion food webs, hunkering down to observe the social behaviors among and within species in a tropical rainforest, catching, marking, and releasing things to understand how they move through space and time, counting, collecting, cataloging. Now it’s our turn!
But wait, when it comes to piecing together the interactions of microbial communities, we have no guts to spill, no behaviors to observe, and while I suppose that in theory one could capture, mark, and release a microbe, I would certainly never hope to recapture it. We have been doing a ton of the collecting, counting, and cataloging in recent years, thanks to cheap and easy 16S rDNA sequencing from diverse environmental samples. We have learned that there is a stupid amount of phylogenetic diversity almost everywhere we “look,” and we can infer, from the functional potential encoded in the genes of the few genomes we’ve sequenced so far, that they are interacting with each other and their environments in really interesting ways.
However, this paper argues, we are not yet very good at using our high-throughput sequencing data to answer questions about the fundamental ecological processes that drive microbial community assembly. Now, everything I know about ecology I learned from reading David Quammen’s Song of the Dodo in 1996. I honestly don’t even remember why – maybe it was the simplicity and applicability of the models, maybe it was Quammen’s excellent storytelling, but I fell in love with Island Biogeography. Not “devote the rest of my professional life to it” kind of love, but more like “I once went on the most awesome road trip, and every time I think back to it, I yearn for it again” kind of love. So when roughly 10 years later, the field of microbial community ecology went bananas, and I found myself smack dab in the middle of it, my thoughts immediately and frequently turned to Island Biogeography. How cool would it be to take these models that have been tested and tinkered with for decades and adapt them for microbial communities? Unfortunately, I was not equipped or inclined to actually do this sort of work. But, there are people out there like James O’Dwyer, Steven Kembel, and Jessica Green, who are.
Having said that, I’m hoping that the real ecologists will chime in about the nuts and bolts of the model described in this paper, because I just want to provide a context for it. The authors propose that their framework will allow us to address two issues.
The first, more pragmatic, issue is related to how we census microbial communities. We cannot simply stake off a grid and sit down for a few hours identifying and counting species. We have to scoop up the entire grid, put it in a blender, and extract DNA from it. Typically, even after millions of observations (sequences), you will still encounter new species. Think rainforest canopy fogging, like times 100 million. So, a Species Abundance Distribution (SAD), with # of species on the y-axis and abundance on the x-axis, will have a very long tail. No big deal, except that obtaining millions of sequences for every sample can still be tricky. For example, I recently sequenced 15 samples on an Illumina MiSeq. I obtained ~20 million high-quality 16S rDNA sequences. Ideally, this would be more than 1 million per sample. Unfortunately (and this is common), for reasons we don’t yet understand, the number of sequences per sample ranged from ~98,000 to ~2 million. Most methods (e.g., UniFrac) used to compare phylogenetic diversity (PD) between samples involve subsampling all samples to the smallest sample size. In this case, I’d be ignoring 18,323,722 sequences! That’s 92.5% of my data. And, forget about it if I want to compare my data to something collected 10 years ago, or to the samples of the future with their bajillion sequences per sample!
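To see where the waste comes from, here is a minimal sketch of subsampling (rarefying) every sample down to the smallest one. It is only an illustration: the function names and the tiny read counts are made up, and real pipelines do this with dedicated tools rather than plain Python.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, seed=0):
    """Randomly keep `depth` reads from one sample, without replacement.

    `otu_counts` maps a (hypothetical) taxon label to its read count.
    """
    rng = random.Random(seed)
    # Expand the counts into one entry per read, then draw `depth` of them.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    return Counter(rng.sample(reads, depth))


def rarefaction_loss(sample_sizes):
    """Return (depth, reads discarded, fraction discarded) when every
    sample is cut down to the size of the smallest sample."""
    depth = min(sample_sizes)
    total = sum(sample_sizes)
    kept = depth * len(sample_sizes)
    return depth, total - kept, (total - kept) / total


# Made-up read counts for three samples of very uneven depth.
sizes = [100, 400, 500]
depth, discarded, fraction = rarefaction_loss(sizes)
print(depth, discarded, fraction)

# Rarefying one made-up sample to a depth of 4 reads.
print(rarefy({"taxon_a": 5, "taxon_b": 3}, depth=4))
```

The more uneven the sequencing depth, the larger the discarded fraction, because every sample is held to the standard of the shallowest one.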
In walks the central result of this paper: an analytical method to obtain the expected phylogenetic diversity of a local sample from a larger community. This they term the Edge-length Abundance Distribution (EAD), and it is an analogue to the SAD. But, instead of counting species and plotting them against their abundance, we are now plotting the total amount of branch length leading to a given number of tips against that number of tips. Or something like that… Anyway, this EAD displays approximately power law behavior, which apparently means that we can use it to do ecology!
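To make the idea a little more concrete, here is a minimal sketch of how one might tabulate an EAD and use it to get an expected PD. This is not the authors' code. It assumes a toy rooted tree written as nested tuples (a representation invented for this example), it measures PD as the branch length connecting the sampled individuals back to the root, and it assumes individuals are drawn at random without replacement. Under that assumption, an edge with k descendant individuals out of N is hit by a sample of m individuals with probability 1 - C(N-k, m)/C(N, m).

```python
from collections import defaultdict
from math import comb

# Hypothetical toy tree. A leaf is (branch_length, abundance);
# an internal node is (branch_length, [children]).
TREE = (0.0, [
    (1.0, [(0.5, 3), (0.5, 1)]),
    (1.5, 2),
])


def collect_edges(node, edges):
    """Append (branch_length, descendant individuals) for every edge
    and return the number of individuals below this node."""
    length, payload = node
    if isinstance(payload, int):
        below = payload
    else:
        below = sum(collect_edges(child, edges) for child in payload)
    edges.append((length, below))
    return below


def edge_abundance_distribution(tree):
    """Map k -> total branch length of edges with k descendant individuals."""
    edges = []
    total = collect_edges(tree, edges)
    ead = defaultdict(float)
    for length, below in edges:
        if length > 0:  # skip the zero-length root edge
            ead[below] += length
    return dict(ead), total


def expected_pd(ead, total, m):
    """Expected PD of m individuals drawn without replacement from `total`."""
    return sum(
        length * (1 - comb(total - k, m) / comb(total, m))
        for k, length in ead.items()
    )


ead, total = edge_abundance_distribution(TREE)
print(ead)
for m in (1, 3, total):
    print(m, expected_pd(ead, total, m))
```

Because the expectation is a function of the sample size m, two samples of different depth can each be compared against the expectation for their own depth, with no reads thrown away.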
One thing we can do with it is use it to normalize the UniFrac distance between differently sized samples, so that’s nice because it makes the first issue go away. The other thing that it can be used for is to start testing hypotheses about the ecological processes that contribute to microbial community structure, and they provide some proof-of-principle examples of its use with human microbiome data. For example, they asked whether the microbiome of someone’s hand has more or less PD than expected if the microbiome were derived from a random sampling of all microbiomes. If the PD is lower than expected, we might hypothesize that some environmental filtering is taking place (ahem, hand sanitizer). I don’t know that any particularly mind-blowing ecological questions were answered in their proof-of-principle application, but now that we microbial ecologists have this phylogenetic framework, we can extend it, and most importantly, start designing experiments with these interesting ecological questions in mind.
I really, really enjoyed this paper. Since I’m a methods nerd, I’m going to talk first about the method, and then about why the biology is exciting here.
I’ve wasted hours of my life shuffling species around to make null distributions, so a method like this that allows us to exactly and quickly compute a null expectation is amazing! The derivation is extremely neat, but I found it initially confusing because I was stuck thinking about phylogenetic distance rather than phylogenetic diversity (PD). They don’t use the distance between species, but rather the ‘opposite’ of phylogenetic distance between species: the distance between the crown of a clade and the root of a phylogeny. I’m very much at the limit of my maths here, but this does make me wonder one thing. If all of these expectations are based on branch lengths for complete clades in the metacommunity phylogeny (i.e., it counts the tips descending from a node), how appropriate is that for situations where a community doesn’t contain all members of a clade? In such cases, is the expected variance in PD meaningful, or would it under-estimate what we see in practice, because not all species within a clade are going to be present in a particular community? I find it hard to imagine that I’ve hit upon a central problem with the paper, so I’d be grateful if someone could comment and clear up my confusion!
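For contrast, the brute-force shuffling approach that an analytical expectation replaces looks roughly like the sketch below. Everything in it is an assumption made for illustration: the toy tree, the abundances, and the function names are invented, and PD is measured as the branch length connecting the sampled tips to the root.

```python
import random
from statistics import mean, stdev

# Hypothetical toy tree: each edge is (branch_length, tips below that edge).
EDGES = [
    (0.5, {"A"}),
    (0.5, {"B"}),
    (1.0, {"A", "B"}),
    (1.5, {"C"}),
]
# Hypothetical number of individuals per tip in the source pool.
ABUNDANCE = {"A": 3, "B": 1, "C": 2}


def pd_of(tips_present):
    """Sum the lengths of edges with at least one sampled tip below them."""
    return sum(length for length, below in EDGES if below & tips_present)


def null_pd_draws(m, n_draws=2000, seed=0):
    """Draw m individuals without replacement, n_draws times, and
    record the PD of each random draw."""
    rng = random.Random(seed)
    individuals = [tip for tip, n in ABUNDANCE.items() for _ in range(n)]
    return [pd_of(set(rng.sample(individuals, m))) for _ in range(n_draws)]


def standardized_effect(observed_pd, draws):
    """(observed - null mean) / null sd, or None if the null has no spread."""
    spread = stdev(draws)
    if spread == 0:
        return None
    return (observed_pd - mean(draws)) / spread


draws = null_pd_draws(m=3)
print(mean(draws), stdev(draws))
# A negative value means the observed PD is lower than the random expectation.
print(standardized_effect(1.5, draws))
```

On a real tree with thousands of tips and millions of reads, each of those random draws is expensive, which is why a closed-form expectation is such a relief.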
Moving on to the biology. Community phylogeneticists spend a lot of time looking at the importance of how a source pool is defined spatially (look at this lovely paper someone wrote), and that we can find similar patterns in the human microbiome is wonderful. The classical explanation for clustering at wider scales (communities vs. other humans and other habitats) is habitat filtering, while the overdispersion found under tighter definitions of the source pool (community vs. other humans in the same habitat) suggests competition within a habitat type. I think it would be cool to have some more functional data on what these microbes are doing; this overdispersion might actually reflect facilitation, whereby different microbes are performing different ecosystem services and together they’re making a more stable community. That might sound a bit group-selection-y, but I’m certain it would be an interesting avenue to explore.
My heart kinda sank when I saw the paper for this week. One, I knew Will would have smarter things to say than me and two, I just don’t enjoy community phylogenetics papers… I ummed and ahhed over what to write for my post: to sit on the fence and mumble on about interesting facets of the paper, or to jump right in and have a poorly thought out and largely uninformed rant about community phylogenetics. I’m going to have a go at the second, but try to not fall flat on my face. Ho ho ho.
My biggest gripe with community phylogenetics (CP) is: what is a community? How can it be circumscribed? And the same goes for the metacommunity/regional pool. Of course, you can find out cool and interesting things about delimited communities (whole humans vs. noses, continents vs. ecoregions, whatever), but it all just feels so forced. The authors here appear well aware of this and devote admirably long tracts of the paper to highlighting that methods such as theirs are still quite dissociated from the actual ecological mechanisms and processes that drive species dynamics.
While the authors’ advances here, doing away with tedious null distributions and endless simulations, are definitely great, I just feel like there is still a gap to cross before these kinds of metrics really help us… either to understand some fundamentals of assembly processes or to serve more practical ends like guiding conservation decisions or informing public health policy. Yes, I’m being vague and I’m not even sure what I ultimately want or feel is possible to get out of similar analyses, but as it is, methods are getting more and more swanky but with no real advance. Yes, metacommunity size matters, no shit! It’s always about scale, scale, scale, scale. I’m guessing because communities and metacommunities are bordering on arbitrary concepts, any metric will always depend on scale? No?
On less vague and ranty notes… some other thoughts that struck me. How did the authors generate phylogenies of the microbiome? What breadth of microbial diversity is found in humans and how well characterised is it? How robust are these methods to these kinds of mega phylogenies? This kind of thing probably interests me more than applying some crazy metrics.
Another thing that I wonder about CP, and I’m fairly sure that some work on this kind of thing already exists, is what happens when you think about CP across trophic levels. Does bringing in trophic interactions help explain, or bring consistency to, the patterns observed? Because, of course, species don’t just interact with other co-occurring species in their clade.
I think I will stop here and leave the rant at that. Apologies to the authors, this was a well-written, balanced and indeed innovative paper… whose subject I just don’t happen to like. Probably my loss more than anybody else’s…