Time to Get to Work
Thank you to everyone for an excellent discussion going on over at the previous post. It’s really helping to clarify a number of issues – and I appreciate all of the expertise being tossed in. This is what open science is all about. Of course, the discussion continues – keep the comments rolling in!
As mentioned, we want a way to combine duplicate (and non-duplicate) measurements from all of the different sources for each specimen into a single entry. For instance, the Ankylosaurus magniventris specimen AMNH 5214 has four separate entries. One entry presents humerus, femur, fibula, and metatarsal lengths, another presents only femur lengths, and so on. Some measurements also come with several different values: the femur length is given variously as 560, 536, and 542 mm (whether referring to left, right, or an unspecified side). So, we want to condense those four entries into one for the sake of further analysis (keeping the original data safe and sound, in case anyone wants to go back to them).
There is no perfect strategy, but based on our previous discussions it’s looking like the best approach is to “average and combine.” As another example, let’s consider how this would work for the Psittacosaurus mongoliensis specimen AMNH 6538.
There are two entries for this specimen, and we’ll only take a look at subsets of these entries. Two tibia lengths are presented: one at 129 mm and the other at 125 mm. So, our combined entry would use the average of these, 127 mm. Only one of the two entries presents the fibula length (given as 121.4e). In this case, we’ll assume that the estimated measurement is accurate, and enter 121.4 as the combined value (in my general experience, most of these estimated values seem to be pretty darned close, and reflect a little bit missing at the end of the bone or a similar condition; of course, it’s up to everyone to keep their eyes on exceptions to this and flag them accordingly).
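For anyone who would rather check the arithmetic in a script than in the spreadsheet, here is a minimal Python sketch of "average and combine" for the AMNH 6538 subset above. The function names, the dictionary layout, and the choice of which entry carries the fibula value are assumptions made for illustration; they are not how the actual spreadsheet is organized.

```python
from statistics import mean

def parse_measurement(raw):
    """Return (value_mm, is_estimated) for a cell such as '129', '121.4e', or 'e121.4'."""
    text = str(raw).strip().lower()
    # An "e" before or after the number marks an estimated measurement.
    is_estimated = text.startswith("e") or text.endswith("e")
    value = float(text.strip("e"))
    return value, is_estimated

def combine(values):
    """Average all non-empty measurements for one element; None if nothing was recorded."""
    parsed = [parse_measurement(v)[0] for v in values if str(v).strip()]
    return mean(parsed) if parsed else None

# Subset of the two AMNH 6538 entries (hypothetical layout; which entry
# holds the fibula length is an assumption).
entries = [
    {"tibia": "129", "fibula": "121.4e"},
    {"tibia": "125", "fibula": ""},
]

combined = {
    element: combine(entry.get(element, "") for entry in entries)
    for element in ("tibia", "fibula")
}
print(combined)  # {'tibia': 127.0, 'fibula': 121.4}
```

The estimated flag is parsed but not used to exclude anything, which matches the decision above to treat estimated values as valid unless someone flags an exception.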
I’ve begun to modify the spreadsheet, so that all specimens which can have combined entries have a line for this (thanks to John Dziak for noting this). As the ceratopsians and ankylosaurs are mostly together in terms of taxonomic updating (unless anyone else spots additional problems – please flag them if you do!), they’re the first targets for combination.
Here’s a proposed set of guidelines; if any other situations crop up, please post a comment and we can amend as appropriate. This is the sort of thing that will probably go into a Materials & Methods section in the paper.
Guidelines for Combining Multiple Entries for a Single Specimen
- If values for various sides are included, please average them all into a single measurement. For instance, if a left and right humerus (in the L L and R L columns) are noted, the average would go into the “L” column. If two measurements for left humeri are included (in the L L column), the average again should go into the “L” column. And so on.
- If a value is indicated as estimated (with an “e” before or after the number), it is appropriate to treat the measurement as valid (unless information indicates that the restoration is too extensive to trust the measurement).
- If, in a set of measurements, one or more values seem to be “off” (e.g., a case where femur length is given as 342, 339, and 402 mm), flag this entry. Here, we will probably go with the more likely values (342 and 339) and dump the 402 as an outlier.
- Each combined entry is indicated by yellow in the first few columns, and the word “combined” in the Reference column.
- The metatarsal and metacarpal columns are in “text” format (to avoid funky autoformatting of the L/R measurements to dates). So, you will have to adjust techniques accordingly.
- Once an entry is finished, the person who combined it puts their name in column CM (“Entry 1”) and marks the entire row as yellow.
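The outlier guideline above is a judgment call, but a rough first pass can be sketched in Python. This version flags any value that differs from the median by more than 10%; both the function name and the 10% tolerance are assumptions chosen for illustration, not a rule we have agreed on. Anything it flags still needs a human look.

```python
from statistics import mean, median

def flag_outliers(values, tolerance=0.10):
    """Split values into (kept, flagged) by relative distance from the median.

    The 10% tolerance is an illustrative assumption, not a project rule.
    """
    mid = median(values)
    kept, flagged = [], []
    for v in values:
        (flagged if abs(v - mid) > tolerance * mid else kept).append(v)
    return kept, flagged

# The femur example from the guidelines.
femur = [342, 339, 402]
kept, flagged = flag_outliers(femur)
print(kept, flagged)  # [342, 339] [402]
print(mean(kept))     # 340.5
```

With the same tolerance, the three AMNH 5214 femur lengths (560, 536, and 542 mm) are all kept. Note that a median-based check cannot say which of only two values is the odd one out, so pairs that disagree badly should simply be flagged for discussion.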
How to Contribute
So, we’re looking for some volunteers to help combine entries. In the true spirit of crowdsourcing, the fully editable document is available here. Please make edits directly on the document (rather than downloading and resending it to me). Right now, the data down to row 531 are prepped and ready to combine, and I’ve taken care of the first few entries. As we resolve and clean up other areas of the database, those will pop up as available. As always, it’s important to check your work frequently and alert someone if you notice an error or inconsistency.
Thank you, and good luck!