I’ve begun the process of paring down the database into the form for final analysis (but remember, of course, that the full, unaltered data live on). So far, I have deleted entries that fit into the following categories:
- Extraneous entries for “combined” specimen data sets. In other words, only the combined entry will be used for analysis. One specimen, one data point.
- Specimens comprising isolated elements (e.g., isolated humeri or femora). Because we’re analyzing skeletal proportions primarily, isolated elements aren’t terribly useful.
- Individual measurements marked with a “+”. These represent measurements “as preserved” rather than estimates of original length.
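For anyone who wants to try this kind of paring-down on their own copy of the data, here is a minimal sketch of the three rules above. The field names (`specimen`, `entry_type`, `humerus`, `femur`) and the toy rows are invented for illustration and are not the actual ODP spreadsheet layout. I am also assuming that an “isolated element” means a specimen with fewer than two recorded measurements.

```python
# Toy records: field names and values are hypothetical, not the ODP schema.
records = [
    {"specimen": "SPEC 1", "entry_type": "combined", "humerus": "250", "femur": "400"},
    {"specimen": "SPEC 1", "entry_type": "original", "humerus": "250", "femur": None},
    {"specimen": "SPEC 2", "entry_type": "single", "humerus": "310", "femur": None},
    {"specimen": "SPEC 3", "entry_type": "single", "humerus": "120+", "femur": "200"},
]

MEASUREMENTS = ("humerus", "femur")


def clean_measurement(value):
    """Return a float, or None for missing or 'as preserved' (+) values."""
    if value is None or value.strip().endswith("+"):
        return None
    return float(value)


def pare_down(rows):
    """Apply the three deletion rules and return the rows kept for analysis."""
    # Specimens with a combined entry: only the combined entry is kept.
    combined = {r["specimen"] for r in rows if r["entry_type"] == "combined"}
    kept = []
    for r in rows:
        if r["specimen"] in combined and r["entry_type"] != "combined":
            continue
        # Assumed rule: a specimen with fewer than two recorded elements is isolated.
        if sum(r[m] is not None for m in MEASUREMENTS) < 2:
            continue
        # Blank out individual "+" measurements but keep the rest of the row.
        kept.append({"specimen": r["specimen"],
                     **{m: clean_measurement(r[m]) for m in MEASUREMENTS}})
    return kept


analysis_set = pare_down(records)
for row in analysis_set:
    print(row)
```

Note that the function builds a new list rather than editing `records`, in keeping with the idea that the full, unaltered data live on.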
Very soon we will want to make a final decision on what to do with species represented by multiple specimens. One strategy (which I think was used by Carrano 2006, if I remember correctly) is to use only the largest specimen. Here, the benefits are that juvenile specimens are pretty automatically excluded. The downside is that the largest specimens may not be the most complete.
A second strategy is to average all of the entries together. Of course, we would have to be careful with this. For instance, when we’re using ratios, we’ll want to calculate the ratio for each specimen first, and then average the ratios. We don’t want to calculate, for instance, a humerus:femur ratio from the averaged measurements. Here’s why.
Let’s pretend we have specimens A and B. Specimen A has a humerus and femur length of 100 and 50, respectively. This gives a humerus:femur ratio of 2.0. Specimen B has a humerus and femur length of 50 and 100, respectively, for a humerus:femur ratio of 0.50. If we were to average the humeri and femora first, we would get average lengths of 75 each, which then results in a ratio of 1! It should be apparent (I hope) that this “ratio of averages” doesn’t accurately reflect what’s going on: neither specimen has proportions anywhere near 1. Furthermore, it’s quite different from the “average of ratios,” which weighs in at 1.25.
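If you’d like to check the arithmetic yourself, here it is as a few lines of Python, using the two pretend specimens from the paragraph above (the variable names are just mine):

```python
from statistics import mean

# The two pretend specimens from the text.
specimens = {
    "A": {"humerus": 100, "femur": 50},
    "B": {"humerus": 50, "femur": 100},
}

# Average of ratios: compute each specimen's ratio first, then average.
ratios = [s["humerus"] / s["femur"] for s in specimens.values()]
average_of_ratios = mean(ratios)

# Ratio of averages: average each bone first, then divide.
mean_humerus = mean(s["humerus"] for s in specimens.values())
mean_femur = mean(s["femur"] for s in specimens.values())
ratio_of_averages = mean_humerus / mean_femur

print(average_of_ratios)  # 1.25
print(ratio_of_averages)  # 1.0
```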
The advantage of averaging values for all specimens in a species is that we can better incorporate individual variation, and also better deal with incomplete specimens. We need to be cautious of averaging in cases of extreme size variation for a single species (hence, part of why it’s desirable to use ratios). Here, it may be worthwhile still to discard known juveniles.
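To make the trade-off between the two strategies concrete, here is a rough sketch that applies both to the same toy data. Everything in it is an assumption for illustration: the species names and measurements are made up, and I am using femur length as the stand-in for “largest specimen.”

```python
from collections import defaultdict
from statistics import mean

# Made-up rows; None marks a bone that was not preserved or measured.
rows = [
    {"species": "species_x", "humerus": 300.0, "femur": 500.0},
    {"species": "species_x", "humerus": 150.0, "femur": 300.0},
    {"species": "species_x", "humerus": None, "femur": 520.0},
    {"species": "species_y", "humerus": 200.0, "femur": 250.0},
]


def humerus_femur_ratio(row):
    """Return humerus:femur, or None if either bone is missing."""
    if row["humerus"] is None or row["femur"] is None:
        return None
    return row["humerus"] / row["femur"]


def largest_specimen_ratio(specimens):
    """Strategy 1: use only the largest specimen (assumed: longest femur)."""
    with_femur = [r for r in specimens if r["femur"] is not None]
    if not with_femur:
        return None
    return humerus_femur_ratio(max(with_femur, key=lambda r: r["femur"]))


def mean_specimen_ratio(specimens):
    """Strategy 2: average the per-specimen ratios, skipping incomplete ones."""
    values = [humerus_femur_ratio(r) for r in specimens]
    values = [v for v in values if v is not None]
    return mean(values) if values else None


by_species = defaultdict(list)
for row in rows:
    by_species[row["species"]].append(row)

summary = {
    name: {"largest": largest_specimen_ratio(group),
           "mean": mean_specimen_ratio(group)}
    for name, group in by_species.items()
}
print(summary)
```

In this toy example the largest `species_x` specimen has no humerus, so the “largest specimen” strategy comes up empty for that species, while the averaging strategy still gets a value from the two complete specimens. That is exactly the downside mentioned above. A real version would also need a rule for discarding known juveniles before averaging.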
Want to see the work in progress? Check it out here.
Thoughts? Let’s hear from you in the comments section!
A huge thank you to all of the volunteers who helped to organize multiple entries for individual specimens into a single combined entry. This was an unglamorous task, but an important one for subsequent analysis. We got 243 combined specimen entries in under two weeks’ time. In no particular order (other than alphabetically by first name), Doug Henning, Jr., Henrique Niza, Jay Fitzsimmons, John Dziak, Rob Taylor, and William Banks Miller did a fantastic job heading up the effort. As a result, we now have 1,898 individual verified lines of data (including combined entries). That’s pretty darned amazing!
Also, kudos to all of the individuals who have been participating in the discussion on the blog over the past weeks. This sort of environment is exactly what we (Matt, Mike, and I) wanted to foster when we started the project. The sheer breadth and depth of expertise among the contributors is pretty impressive! And for those who don’t feel like “experts”, it’s certainly OK to chime in. Everyone’s comments are welcomed. Anyone can do science.
Anyone can do science – this firm belief is part of why we started the Open Dinosaur Project. In fact, as Matt noted some time ago, there is a whole world of “citizen science” opportunities out there! If you’re addicted to the idea of citizen science, and want to learn about other projects in this vein, head on over to scienceforcitizens.net. They’ve got a whole directory of opportunities in all scientific fields in which you can participate!
Even better, there is a project page for the ODP, and a nifty little blog post by John Ohab with a video message from Matt and me (Mike’s over in England, and Matt and I practically live next door, so you’re stuck with only 2/3 of the project leads). John mentioned that scienceforcitizens.net (which is still in the beta stage, but looking quite nice) encourages all of us to create an account and even write member blog posts about our experiences as citizen scientists. If you have a moment, go check it out!
As we finish up combining the data, it’s time to start thinking about the specific analyses that we’re going to do. What are the specific questions we’re asking? What are the techniques that we need to address the questions? Some excellent discussions between ODPers have been happening in one of the recent posts, and I was hoping to continue that here. In particular, I wanted to refocus the discussion on the project’s essential questions, and consider the types of analyses that we can use to answer each question. I’m just thinking out loud here (this is open notebook science, after all), and invite suggestions and discussion in the comments section. In particular, I’m referencing the “big questions” outlined in one of our first posts.
Why did ornithischians evolve quadrupedality multiple times?
I think this one is going to have to simply rely upon our interpretation of the data. After all, we can perhaps answer “how,” but the “why” can’t really be answered in this setting. So, it’s something to consider in the “discussion” section of the paper. But, see the next question. . .
Was the evolution of quadrupedality consistently associated with an increase in body size?
Here, we’re basically looking at evolutionary trends. In other words, can we detect a trend in body size within various ornithischian lineages? The more I think about this, the less I’m convinced we can directly answer the question (if you disagree, and have a solution, pipe up in the comments, please). One problem is the difficulty in knowing whether or not certain taxa were truly quadrupedal. So, where do you make the cut-off for quadrupedal vs. bipedal vs. both? In many cases we just don’t know. There’s a danger in circular reasoning, too (the limb bones look like it’s quadrupedal, so we call it quadrupedal, and then use it as an example of a quadrupedal taxon for analysis of limb bones).
But, I think we can detect trends across Ornithischia as a whole, and within specific lineages. For instance, is there a trend for increasing body size across ornithischians? Is there a trend for increasing body size within Ornithopoda? Ceratopsia? Thyreophora? In fact, Matt Carrano found a consistent and statistically significant increase in body size within ornithischians (and indeed, within most dinosaurs) when considering femoral measurements (go here to download a free PDF of Carrano, 2006). So, that makes this question a little less interesting (and indeed, less publishable, because it’s already been done). Do you think we should move it to the back burner? Or should we spin it in another way? Thoughts are welcome.
Did different groups of quadrupedal ornithischians arrive at this body form in similar ways, or did they have different strategies?
Here (as far as I know) is a genuinely novel question, and I think it’s the core of the ODP’s current phase. What we’re really saying (I think) is this: We know that thyreophorans, hadrosaurs, and ceratopsids independently evolved quadrupedal locomotion. Did each group have similar limb proportions, or were they different? I think this is where we’ll want to look at principal components analysis, at least as a starting point for data visualization. And, we’ll have to do that within a phylogenetic context. A recent paper by Liam Revell (2009) addressed how to do this (thanks to ODPer Randy Irmis for bringing up this paper; you can download it for free here – it’s well worth a read).
A second way to look at this question is to look for trends in certain structures – for instance, do the metacarpals tend to get elongated in each group (relative to the rest of the arm) as different clades became quadrupedal? Here, we might use a simple non-parametric correlation of the ratio with patristic distances (see the Carrano paper, again, and references therein, for a brief introduction to this method), to investigate that question within different lineages. Basically, patristic distance estimates the distance of a particular species from the base of the tree (by the number of branching points leading up to it). A taxon that split off early in a group’s evolution would have a low patristic distance, and vice versa for one that split off late in a group’s evolution. So, we might look at the correlation of metacarpal:arm length ratio to patristic distance for thyreophorans, hadrosaurs, and ceratopsians.
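Here is a minimal sketch of what that correlation could look like, just to make the idea concrete. I am assuming a Spearman rank correlation as the “simple non-parametric correlation” (the post doesn’t commit to a particular one), counting patristic distance as the number of nodes between a taxon and the base of the tree, and using a completely made-up tree and made-up ratios. It gives only the correlation coefficient, with no significance test.

```python
from math import sqrt
from statistics import mean

# Hypothetical tree, stored as child -> parent. "root" is the base of the clade.
parent = {
    "node1": "root", "taxon_a": "root",
    "node2": "node1", "taxon_b": "node1",
    "taxon_c": "node2", "taxon_d": "node2",
}

# Hypothetical metacarpal:arm length ratios for each taxon.
ratio = {"taxon_a": 0.20, "taxon_b": 0.25, "taxon_c": 0.30, "taxon_d": 0.28}


def patristic_distance(taxon):
    """Count the branching points between a taxon and the base of the tree."""
    count = 0
    node = taxon
    while node in parent:
        node = parent[node]
        count += 1
    return count


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


taxa = sorted(ratio)
distances = [patristic_distance(t) for t in taxa]
rho = spearman(distances, [ratio[t] for t in taxa])
print(distances, round(rho, 3))
```

For the real analysis we would want an actual phylogeny and a proper statistics package (`scipy.stats.spearmanr`, for instance, also reports a p-value), but the logic would be the same: one patristic distance and one ratio per taxon, run separately for each lineage.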
I think I’ll end here for now! Please add thoughts, suggestions, corrections, and anything else you think relevant in the comments. Next time, I’ll move on to the final issue, quantifying morphological disparity in ornithischian evolution.
Carrano, M. T. 2006. Body-size evolution in the Dinosauria. In M. T. Carrano, R. W. Blob, T. J. Gaudin & J. R. Wible (eds.), Amniote Paleobiology: Perspectives on the Evolution of Mammals, Birds, and Reptiles. University of Chicago Press, Chicago: 225-268. Freely available here.
Revell, L. J. 2009. Size-correction and principal components for interspecific comparative studies. Evolution 63: 3258-3268. Freely available here.
We’re on the home stretch for combining specimen data. . .I just updated the spreadsheet (accessible, as always, here); feel free to edit as appropriate to combine all of the final entries. Note that I have temporarily removed the already combined entries, as well as the singletons.
The first combined entry has been left as an example. As before, please color the original data orange, and the combined line that you insert yellow.
Those who have contributed to the ODP over the last few months know that a single specimen might have measurements featured in 2, 3, 4, or more separate scientific papers. In order to keep data entry and verification as transparent as possible, we’ve included the presentation from each scientific paper as a separate entry. Now, though, it’s time to combine these separate entries into composite entries that can be analyzed as a single unit (see this post for how you can help).
But, we do face some real challenges in cobbling this information together. One major problem concerns different specimen numbers or museum abbreviations for the same specimen. For those who aren’t familiar with the museum world, every specimen in a museum gets a unique number. This helps us to keep track of the data with each specimen (not just measurements, but locality information, storage location, etc.). Rather than saying “that big T. rex skull on display in that big New York museum,” we just say “AMNH 5027”. This means that it’s specimen number 5027 at the American Museum of Natural History; there’s only one specimen with that number. Believe it or not, some people memorize such minutiae (maybe you’re one of them). I know the specimen numbers for most of the well-known ceratopsian skulls (just mention the phrase “YPM 1822”, and Triceratops prorsus springs to mind), but still have a tough time remembering my wife’s birthday. Believe me, I catch grief for that one.
At any rate. . .in some cases, it’s pretty easy to figure out multiple presentations of the same specimen. AMNH FR5240 (American Museum of Natural History Fossil Reptile #5240) is pretty certainly the same as AMNH 5240. There are just a few extra letters (to distinguish 5240 in the fossil reptile collection from 5240 in the modern fish collection, for instance).
Sometimes things get complicated. For instance, museums change names. The old “Geological Survey of Canada” specimens eventually became “National Museum of Canada” specimens, which then morphed into “Canadian Museum of Nature” specimens when the institution changed its name. So, the Chasmosaurus skeleton that started out as GSC 2245 became NMC 2245 became CMN 2245. “CMN” seems to be the abbreviation of choice nowadays, and luckily the specimen numbers stayed the same. Sometimes historic abbreviations are carried on through sheer inertia. For instance, “USNM” stands for “United States National Museum.” Yet, it hasn’t been called that in decades – today we know it as the “National Museum of Natural History” (or just “The Smithsonian” to most of the general public). But, for various reasons (including overlap in abbreviations with all of the other countries’ national museums), “USNM” still stands. When different publications use different abbreviations, we still have to sort out what’s going on.
Sometimes things get really complicated. Did you know that the Protoceratops skeleton listed as AMNH 6471 by Brown and Schlaikjer’s 1940 paper is now known as CM 9185? This happened when the specimen was sent from the American Museum of Natural History to the Carnegie Museum in Pittsburgh. The only reason I know of this is because Matt Carrano had noted this in one of his data entries, and also through a chance reading of a 1981 publication on dinosaurs of the Carnegie by Jack McIntosh.
And sometimes things get just flat-out twisted. Back in the day, the Royal Ontario Museum completely renumbered their fossil collection. What was once known as the Corythosaurus ROM 5505 is now ROM 845. The Lambeosaurus ROM 6474 is now called ROM 1218. Thankfully, some papers indicate the old and the new catalog numbers. But not always. For certain specimens with measurements in old papers (e.g., ROM 5167 and ROM 5971, specimens of Edmontosaurus regalis and Prosaurolophus maximus, respectively), the current catalog number just isn’t clear. So, we’ll either hope that someone out there reading this knows the current specimen number, or we’ll have to contact a curator at the museum to find out. (Feel free to chime in in the comments if you know the answer.)
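For the cases we do know about, the bookkeeping could be captured in a small lookup, so that every alias of a specimen collapses to one label before entries are combined. The sketch below is only an illustration. The function and table names are mine, and the tables contain nothing beyond the examples mentioned in this post.

```python
import re

# Institutional renamings mentioned above: GSC -> NMC -> CMN.
ABBREVIATION_ALIASES = {"GSC": "CMN", "NMC": "CMN"}

# Collection prefixes that can be dropped (e.g., AMNH FR5240 = AMNH 5240).
COLLECTION_PREFIXES = ("FR",)

# Transfers and renumberings mentioned above: (old) -> (current).
KNOWN_RENUMBERINGS = {
    ("AMNH", "6471"): ("CM", "9185"),
    ("ROM", "5505"): ("ROM", "845"),
    ("ROM", "6474"): ("ROM", "1218"),
}

LABEL_PATTERN = re.compile(r"\s*([A-Za-z]+)\s*([A-Za-z]*)\s*(\d+)\s*")


def canonical_specimen(label):
    """Map a specimen label to its current form, if we know how."""
    match = LABEL_PATTERN.fullmatch(label)
    if match is None:
        return label.strip()  # unrecognized format: leave for manual review
    institution, prefix, number = match.groups()
    if prefix and prefix.upper() not in COLLECTION_PREFIXES:
        return label.strip()  # unknown collection prefix: leave untouched
    institution = institution.upper()
    institution = ABBREVIATION_ALIASES.get(institution, institution)
    institution, number = KNOWN_RENUMBERINGS.get(
        (institution, number), (institution, number)
    )
    return f"{institution} {number}"


for label in ["AMNH FR5240", "GSC 2245", "NMC 2245", "ROM 5505", "AMNH 6471"]:
    print(label, "->", canonical_specimen(label))
```

One big caveat: a lookup like this applies the renumbering blindly. It should only be run on labels taken from papers that use the old numbers, because an old number in one system could coincide with a different specimen’s current number. Unresolved cases like ROM 5167 and ROM 5971 simply pass through unchanged until someone tracks down the answer.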
These sorts of things are hugely important for the utility of our dataset, and we’re depending on each other to get these details ironed out. That’s the real strength of an open project like the ODP – anyone can contribute!
Have you been featured in the news, on a blog, or elsewhere? Let us know!