As the past two posts have attested, we are on the cusp of doing some actual, real-life analysis. If at all possible, I want to run the analyses using freely available, open source software. Fortunately, all of the major software packages for phylogenetically informed, quantitative analysis are free! Here are the ones I’m looking at:
- For constructing trees and some simple analyses: Mesquite. (All of the tree plots you’ve seen were made in this program.)
- For general statistical analysis: R, along with the smatr, ape and ggplot2 packages. I will make a serious effort to post all scripts that are used to generate graphics and analyses; I would encourage all of you to do the same.
- For additional analysis of quantitative data in a phylogenetic context: COMPARE.
For reconstructing ancestral states for quantitative characters, COMPARE should probably be your go-to software package. It reconstructs states in a likelihood framework, which has a number of advantages over squared-change parsimony (the method implemented in Mesquite for the last two plots). The big one is that you can assign a range and mean of values to each taxon, and it also allows calculation of 95% confidence intervals for reconstructed values. The program is fairly easy to learn. And because it runs on Java, it’s the ultimate cross-platform software package!
Good point; likelihood is certainly preferable. Mesquite will do likelihood, too, but the range and mean values are something that Mesquite won’t do.
To be specific, COMPARE uses pGLS (phylogenetic generalized least-squares regression) in a likelihood framework.
Also, if we want to test which model of evolution best fits the evolution of the character(s) of interest, I suggest using BayesTraits. For example, you can test for directional evolution vs. a Brownian model of evolution, or whether most of the change across the tree occurred early or late along branches.
You bring up an important point there, Randy. Even if we aren’t interested in testing the mode of evolution, we at least need to pay some attention to it to ensure that our data properly fit the assumptions of the different methods (e.g., the Brownian model of evolution assumed for PICs).
Another program to look into is Brownie, which allows the investigation of rates of morphological character evolution.
I should point out that another advantage of pGLS is that you can use it to remove the effects of phylogeny (just like PICs), but you have a choice of models, not just Brownian motion.
Another good point – choice of models is important (and a weakness of PICs). And to be utterly pedantic, these methods can never remove the effects of phylogeny, only account for the effects of phylogeny to one degree or another. 😉
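For anyone who wants to see where the Brownian assumption enters PICs, here is a minimal sketch of the standard independent-contrasts calculation in Python. The tree, branch lengths, and trait values are made up for illustration, and the nested-tuple tree format and the function name `independent_contrasts` are my own hypothetical choices, not something taken from COMPARE, Mesquite, or ape.

```python
from math import sqrt

def independent_contrasts(node):
    """Return (node value, extra branch length, list of standardized contrasts).

    A tip is a number (the trait value). An internal node is a pair
    ((left_subtree, left_branch_length), (right_subtree, right_branch_length)).
    """
    if isinstance(node, (int, float)):
        return float(node), 0.0, []
    (left, v1), (right, v2) = node
    x1, extra1, c1 = independent_contrasts(left)
    x2, extra2, c2 = independent_contrasts(right)
    # Branches leading to internal nodes are lengthened to reflect the
    # uncertainty in the value estimated at that node.
    v1 += extra1
    v2 += extra2
    # Under Brownian motion the variance of (x1 - x2) is proportional to
    # v1 + v2, so dividing by its square root standardizes the contrast.
    contrast = (x1 - x2) / sqrt(v1 + v2)
    # Node value: average of the two descendants, weighted by 1/branch length.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    extra = v1 * v2 / (v1 + v2)
    return value, extra, c1 + c2 + [contrast]

# Hypothetical three-taxon tree ((A:1, B:1):1, C:2) with trait values 4, 6, 10.
tree = (((4.0, 1.0), (6.0, 1.0)), 1.0), (10.0, 2.0)
root_value, _, pics = independent_contrasts(tree)
print(root_value, pics)
```

The standardization step is the only place the model shows up: it treats expected variance as proportional to branch length, which is exactly the Brownian assumption. If the data depart from that, the contrasts are not properly standardized.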
The R code for doing the NMF is here:
https://sites.google.com/site/daviddreisigmeyer/home/files
The R-only version is called nmfDIV.R. It assumes that no initial starting point has a zero row or column. If you’re having problems with it, let me know and I’ll update it. If you’re using larger data sets, the FORTRAN 95/2003 version (nmfDIVf95.R and nmfDIV.f95) will be significantly faster. It compiles with gfortran 4.2 and 4.4 on Snow Leopard. If you want to do a Generalized Linear Discriminant Analysis, you can call gsvd.R, which calls LAPACK’s gsvd routine (dggsvd). For that you’ll need the new version of LAPACK installed, and you’ll need to change the path in dyn.load.
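To illustrate why a starting point with a zero row or column can be a problem, here is a minimal pure-Python sketch of NMF with the widely used multiplicative updates for the generalized Kullback-Leibler divergence. This is an assumption on my part about the kind of algorithm involved; it is not a port of nmfDIV.R, and the function names and the small matrices are hypothetical.

```python
from math import log

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def divergence(v, wh):
    """Generalized KL divergence D(V || WH); assumes strictly positive entries."""
    return sum(x * log(x / y) - x + y
               for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def has_zero_line(mat):
    """True if the matrix has an all-zero row or an all-zero column."""
    return (any(all(x == 0 for x in row) for row in mat)
            or any(all(x == 0 for x in col) for col in zip(*mat)))

def nmf_step(v, w, h):
    """One multiplicative update of H, then of W, for the divergence objective."""
    if has_zero_line(w) or has_zero_line(h):
        # A zero row or column makes a denominator below vanish.
        raise ValueError("starting point has a zero row or column")
    n, m, r = len(v), len(v[0]), len(h)
    wh = matmul(w, h)
    h = [[h[a][j] * sum(w[i][a] * v[i][j] / wh[i][j] for i in range(n))
          / sum(w[i][a] for i in range(n))
          for j in range(m)] for a in range(r)]
    wh = matmul(w, h)
    w = [[w[i][a] * sum(h[a][j] * v[i][j] / wh[i][j] for j in range(m))
          / sum(h[a][j] for j in range(m))
          for a in range(r)] for i in range(n)]
    return w, h

# Hypothetical data and strictly positive starting point (rank 2).
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 5.0, 9.0]]
W = [[1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]
H = [[1.0, 1.0, 1.0], [0.5, 1.0, 2.0]]
before = divergence(V, matmul(W, H))
for _ in range(50):
    W, H = nmf_step(V, W, H)
after = divergence(V, matmul(W, H))
print(before, after)
```

In this sketch every update divides by a row or column sum of W or H, or by an entry of WH, so a zero row or column in the starting point leads straight to a division by zero.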
One of the advantages of online journals is that we’re not limited to static grayscale images. In my mind, the real power of this medium is the advancement of data visualization through animation, 3D rendering, etc. I recently came across this chart, which is a visualization of the accrual of vertebrate phylogenetic data through time, from Thomson & Shaffer (2010). I immediately thought of the ODP, because this is a perfect way to show how our data change through geologic time for different clades! The best part is that this visualization tool is free and built into Google Docs, so anyone can do it. It’s based on software called Trendalyzer, originally developed by a team led by Hans Rosling. Here’s a talk given by Hans at the TED conference that gives an excellent demonstration of the power of this type of visualization: http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html
Thomson, R.C., and H.B. Shaffer. 2010. Rapid progress on the vertebrate tree of life. BMC Biology 8: 19, 1-27.
Thanks, 220mya, for letting me know about that visualization tool. Very interesting!
One of the more enjoyable tasks I’ve gotten to do at my day job of late is to evaluate a newly acquired application that produces powerful visualizations very much akin to the one Randy pointed out. All this time I’ve been thinking how cool it would be if we could make use of it for producing ODP visuals, but there would of course be proprietary issues. It’s a revelation to know that you can do this sort of thing for free through Google Docs, and that bubble chart would indeed be perfect for some of our data displays. (Also, for those who have yet to check it out, the talk by Hans Rosling is very much worth a look!) I add my thanks to Jay’s, as this is a very handy tool to know about.