Analytical Software

As the past two posts have attested, we are on the cusp of doing some actual, real-life analysis. If at all possible, I want to run the analyses using freely available, open source software. Fortunately, all of the major software packages for phylogenetically informed quantitative analysis are free! Here are the ones I’m looking at:

  • For constructing trees and some simple analyses: Mesquite. (All of the tree plots you’ve seen so far were made in this program.)
  • For general statistical analysis: R, along with the smatr, ape and ggplot2 packages. I will make a serious effort to post all scripts that are used to generate graphics and analyses; I would encourage all of you to do the same.
  • For additional analysis of quantitative data in a phylogenetic context: COMPARE.

10 Responses to Analytical Software

  1. 220mya says:

    For reconstructing ancestral states for quantitative characters, COMPARE should probably be your main go-to software package. It reconstructs states in a likelihood framework, which has a number of advantages over squared-change parsimony (the method implemented in Mesquite for the last two plots). The biggest is that you can assign a range and mean of values to each taxon; it also allows calculation of 95% confidence intervals for reconstructed values. The program is fairly easy to learn. And, because it runs on Java, it’s the ultimate cross-platform software package!

  2. Andy Farke says:

    Good point; likelihood is certainly preferable. Mesquite will do likelihood, too, but the range and mean values are something that Mesquite won’t do.

  3. 220mya says:

    I should be specific: COMPARE uses PGLS (phylogenetic generalized least squares regression) in a likelihood framework.

    Also, if we want to test which models of evolution best fit the evolution of the character(s) of interest, I suggest using BayesTraits. For example, you can test for directional evolution vs. a Brownian model of evolution, or whether most of the change across the tree occurred early or late along branches.

  4. Andy Farke says:

    You bring up an important point there, Randy. Even if we aren’t interested in testing mode of evolution, we at least need to pay some attention to it to ensure that our data properly fit the assumptions of the different methods (e.g., the Brownian model of evolution assumed for phylogenetically independent contrasts, or PICs).
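
To make the Brownian assumption behind PICs concrete, here is a minimal sketch of the independent contrasts calculation in plain Python. The nested-tuple tree format, the function name `contrasts`, and the three-taxon tree with its trait values are all made up for illustration; they are not ODP data. For real analyses in R, the ape package listed in the post has a `pic` function.

```python
# Sketch of phylogenetically independent contrasts (PICs).
# Hypothetical tree format (an assumption for this example):
#   tip      = (name, branch_length)
#   internal = (left_child, right_child, branch_length)

def contrasts(node, traits, out):
    """Return (trait estimate, adjusted branch length) for `node`,
    appending one standardized contrast per internal node to `out`."""
    if isinstance(node[0], str):          # tip: look up the observed value
        name, length = node
        return traits[name], length
    left, right, length = node
    x1, v1 = contrasts(left, traits, out)
    x2, v2 = contrasts(right, traits, out)
    # Under Brownian motion the variance of (x1 - x2) grows with v1 + v2,
    # so dividing by sqrt(v1 + v2) puts all contrasts on a common scale.
    out.append((x1 - x2) / (v1 + v2) ** 0.5)
    # Estimate at this node: average weighted by inverse branch length.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the branch below to reflect uncertainty in that estimate.
    return value, length + v1 * v2 / (v1 + v2)

# Made-up example: ((A:1, B:1):1, C:2) with a root branch of length 0.
tree = ((("A", 1.0), ("B", 1.0), 1.0), ("C", 2.0), 0.0)
traits = {"A": 1.0, "B": 3.0, "C": 6.0}

contrast_list = []
root_value, _ = contrasts(tree, traits, contrast_list)
print(contrast_list, root_value)
```

If the trait really did evolve by Brownian motion, the standardized contrasts should look like independent draws from a single zero-mean normal distribution. A clear relationship between the absolute contrasts and their branch lengths is one warning sign that the assumption is not holding.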

  5. Andy Farke says:

    Another good point – choice of models is important (and a weakness of PICs). And to be utterly pedantic, these methods can never remove the effects of phylogeny, only account for the effects of phylogeny to one degree or another. 😉

  6. David Dreisigmeyer says:

    The R code for doing the NMF is here:

    The R-only version is called nmfDIV.R. It assumes that no initial starting point has a zero row or column. If you’re having problems with it, let me know and I’ll update it. If you’re using larger data sets, the FORTRAN 95/2003 version (nmfDIVf95.R and nmfDIV.f95) will be significantly faster. It compiles with gfortran 4.2 and 4.4 on Snow Leopard. If you want to do a Generalized Linear Discriminant Analysis, you can call gsvd.R, which calls LAPACK’s GSVD routine (dggsvd). For that, you’ll need the new version of LAPACK installed, and you’ll need to change the path in dyn.load.

  7. 220mya says:

    One of the advantages of online journals is that we’re not limited to static grayscale images. In my mind, the real power of this medium is the advancement of data visualization through animation, 3D rendering, etc. I recently came across this chart, which is a visualization of the accrual of vertebrate phylogenetic data through time, from Thomson & Shaffer (2010). I immediately thought of the ODP, because this is a perfect way to show how our data change through geologic time for different clades! The best part is that this visualization tool is free and built into Google Docs, so anyone can do it. It’s based on software called Trendalyzer, originally developed by a team led by Hans Rosling. Here’s a talk given by Hans at the TED conference that gives an excellent demonstration of the power of this type of visualization:

    Thomson, R.C., and H.B. Shaffer. 2010. Rapid progress on the vertebrate tree of life. BMC Biology 8: 19, 1-27.

  8. Thanks, 220mya, for letting me know about that visualization tool. Very interesting!

  9. Rob Taylor says:

    One of the more enjoyable tasks I’ve gotten to do at my day job of late is to evaluate a newly acquired application that produces powerful visualizations very much akin to the one Randy pointed out. All this time I’ve been thinking how cool it would be if we could use it to produce ODP visuals, but there would of course be proprietary issues. It’s a revelation to know that you can do this sort of thing for free through Google Docs, and that bubble chart would indeed be perfect for some of our data displays. (Also, for those who have yet to check it out, the talk by Hans Rosling is very much worth a look-see!) I add my thanks to Jay’s, as this is a very handy tool to know about.
