Paring Down the Data

A while back, I brought up the issue of how to handle specimens with multiple individuals, as well as juvenile individuals, and a lively discussion ensued. The general conclusions were:

  1. Any ratios must be calculated for individual specimens, not from Frankensteinian averages of elements from different specimens. It is OK to average ratios across specimens, just not the raw measurements.
  2. Extremely young juveniles should be excluded, as they may differ in body proportions from adults of the same or closely related species.
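To make point 1 concrete, here is a minimal Python sketch contrasting the two approaches. The specimen labels and lengths are hypothetical toy values, not entries from our data set, and the function names are illustrative only.

```python
from statistics import mean

# Hypothetical specimens (toy numbers, NOT from the data set).
# Each dict holds only the elements preserved in ONE specimen, lengths in mm.
specimens = {
    "A": {"femur": 1000.0, "tibia": 800.0},
    "B": {"femur": 500.0, "tibia": 450.0},
    "C": {"femur": 900.0},  # no tibia preserved
    "D": {"tibia": 300.0},  # no femur preserved
}

def ratio_per_specimen(specs, num, den):
    """Return {specimen: num/den} for specimens preserving BOTH elements."""
    return {
        sid: elems[num] / elems[den]
        for sid, elems in specs.items()
        if num in elems and den in elems
    }

def mean_of_ratios(specs, num, den):
    """Recommended: compute the ratio within each specimen, then average."""
    return mean(ratio_per_specimen(specs, num, den).values())

def frankenstein_ratio(specs, num, den):
    """NOT recommended: average each element across different specimens,
    then divide, mixing bones that never belonged to a single animal."""
    nums = [e[num] for e in specs.values() if num in e]
    dens = [e[den] for e in specs.values() if den in e]
    return mean(nums) / mean(dens)

print(mean_of_ratios(specimens, "tibia", "femur"))      # about 0.85
print(frankenstein_ratio(specimens, "tibia", "femur"))  # about 0.65
```

With these toy numbers the two answers differ noticeably: specimens C and D contribute nothing to the per-specimen ratio, but they pull the element averages in the Frankensteinian version in different directions.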

With this in mind, I’ve started to go through the data set to flag obviously small juveniles. I propose that we use body size, rather than sexual maturity or LAGs (lines of arrested growth), as the indicator. Recent studies (e.g., this one on pachycephalosaurs by Horner and Goodwin) have shown that some dinosaurs of near-adult size are probably not fully mature. I suggest we assume that the limb proportions of these individuals are close enough to those of adults (even if their skull morphology isn’t). This will allow us to keep a few interesting taxa in. For instance, Fruitadens is not fully adult, but probably pretty close to it (according to histology; see the original paper). It’d be a shame to ignore the world’s smallest known ornithischian!

This is pretty straightforward for taxa known from only single specimens of obviously young juveniles (e.g., Avaceratops, Nipponosaurus, etc.). These taxa can be pretty safely removed.

What about animals like Stegosaurus armatus, where we have many specimens of various sizes, from tiny to gigantic? Here, I suggest cutting any specimen that is smaller than 2/3 the size of the largest specimen in one or more measurements (modifying a suggestion from William Miller). The 2/3 cut-off is completely arbitrary, but I feel that it removes the smallest individuals while recognizing that, because growth is indeterminate, adults can span a range of sizes.

As an example, the longest Stegosaurus femur, from YPM 1853, is 1,334 mm long. Two-thirds of this is about 889 mm, so any specimen with a femur shorter than that would be cut. This includes YPM 1856 (883 mm), DNM 2438 (308 mm), and many more. Of course, we can use our judgment for specimens close to the line. Because some specimens lack a femur, we would look at other elements (e.g., the humerus) to narrow the sample further.
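The cut-off rule and the femur example can be sketched in Python as follows. Only the three femur lengths quoted above are real values; the data layout and function names are hypothetical. A specimen is flagged if any of its preserved elements falls below two-thirds of the longest example of that element in the sample, and elements a specimen lacks are simply skipped.

```python
# Femur lengths (mm) quoted in the post for Stegosaurus armatus.
# Other elements (e.g. "humerus") could be added per specimen;
# missing elements are just left out of that specimen's dict.
specimens = {
    "YPM 1853": {"femur": 1334},
    "YPM 1856": {"femur": 883},
    "DNM 2438": {"femur": 308},
}

CUTOFF_FRACTION = 2 / 3  # the admittedly arbitrary cut-off proposed above

def thresholds(specs, fraction=CUTOFF_FRACTION):
    """Per-element threshold: `fraction` of the longest value of that element."""
    maxima = {}
    for elems in specs.values():
        for name, length in elems.items():
            maxima[name] = max(maxima.get(name, 0), length)
    return {name: fraction * longest for name, longest in maxima.items()}

def flag_small(specs, fraction=CUTOFF_FRACTION):
    """Return {specimen: [elements below threshold]} for every specimen
    that falls under the cut-off in one or more measurements."""
    limits = thresholds(specs, fraction)
    flagged = {}
    for sid, elems in specs.items():
        short = [name for name, length in elems.items() if length < limits[name]]
        if short:
            flagged[sid] = short
    return flagged

print(round(thresholds(specimens)["femur"]))  # 889
print(flag_small(specimens))  # {'YPM 1856': ['femur'], 'DNM 2438': ['femur']}
```

The sketch only flags; borderline cases such as YPM 1856 (883 mm against a threshold of about 889 mm) still call for the judgment mentioned above.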

This elimination method will give us a trimmed data set, with the most egregious juveniles out of the picture. I suggest that we then use the average of the ratios from the trimmed sample for further analysis. Where we need actual bone lengths, I suggest using the largest specimen whose data are complete enough for the analysis.
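Once the sample is trimmed, the two follow-up steps described above (averaging per-specimen ratios, and picking the largest sufficiently complete specimen when raw lengths are needed) might look like this in Python. The specimens and values are hypothetical, and using a single element (here the femur) as the size proxy for “largest” is an assumption, not something the post specifies.

```python
from statistics import mean

# Hypothetical trimmed sample (toy values in mm, NOT real data).
trimmed = {
    "spec1": {"femur": 1300.0, "tibia": 1000.0, "humerus": 800.0},
    "spec2": {"femur": 1350.0, "tibia": 1080.0},  # no humerus preserved
    "spec3": {"femur": 1250.0, "tibia": 975.0, "humerus": 750.0},
}

def mean_ratio(sample, num, den):
    """Average of per-specimen ratios over specimens preserving both elements."""
    ratios = [e[num] / e[den] for e in sample.values() if num in e and den in e]
    if not ratios:
        raise ValueError(f"no specimen preserves both {num} and {den}")
    return mean(ratios)

def largest_complete(sample, required, size_element):
    """Among specimens preserving every element in `required` (plus the
    size element itself), return the one with the longest `size_element`.
    The single-element size proxy is a hypothetical choice."""
    needed = set(required) | {size_element}
    complete = {sid: e for sid, e in sample.items() if needed <= e.keys()}
    if not complete:
        return None
    return max(complete, key=lambda sid: complete[sid][size_element])

print(mean_ratio(trimmed, "tibia", "femur"))  # about 0.783
print(largest_complete(trimmed, ["femur", "tibia", "humerus"], "femur"))  # spec1
print(largest_complete(trimmed, ["femur", "tibia"], "femur"))  # spec2
```

Note how the “largest” specimen depends on which elements the analysis needs: spec2 has the longest femur but drops out as soon as a humerus is required.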

For Everyone to Check
Following these criteria, I have posted a new Google spreadsheet here. It has two worksheets. In the first, titled “To Analyze” (you can find the tabs along the bottom of the screen), specimens that should be considered for removal are marked in a greenish color. Do you agree? Disagree? See another specimen that should be removed? Let me know in the comments.

The second worksheet, titled “Deletion Candidates,” lists specimens and taxa that I think should be excluded from the final analysis. The reasoning is given in the final column. Examples include “can’t match with actual specimen” (for data that appear to be derivative, were given without citation, and couldn’t be tied to a particular specimen), “non-diagnostic taxon of uncertain affinity” (should be obvious), etc. Once again, please check it over. Do you agree? Disagree? Say something in the comments here.


15 Responses to Paring Down the Data

  1. Chris Noto says:

    Would it help in future entries if there were a column for participants to specifically add the ontogenetic status, if it is stated in the reference for the material?

  2. Mike Taylor says:

    That all sounds good to me, Andy.

    Although the word “egregious” seems awfully harsh on the poor juveniles 🙂

  3. Rob Taylor says:

    On the Deletion Candidates worksheet, the specimen of Centrosaurus apertus identified only as ‘ROM’ under the specimen column is almost assuredly ROM 767 (already represented on the To Analyze worksheet). Although it is not identified as such in the original work (Parks 1921), the locality data provided on p. 54 also appear in the spreadsheet I obtained from Kevin Seymour at the ROM, in association with catalog no. 767. In addition, the two entries are clearly very similar.

    I would think this candidate could safely be ousted, or otherwise combined with the present ROM 767 entry if preferred.

  4. Rob Taylor says:

    Also concerning Parks 1921, another of the unnumbered Centrosaurus apertus deletion candidates (length of scapula: 700 mm) is easily traced to AMNH 5351. The data appear to have been pulled directly from Brown 1917, which is cited in the work.

    This specimen is already represented in the ‘To Analyze’ worksheet, and it appears the Parks-related entry would add nothing to our knowledge, so it should be an easy oust.

  5. Rob Taylor says:

    Additionally, the unnumbered Centrosaurus apertus specimen with an MC I length of 97 mm is AMNH 5427. Same story as above; the data in Parks 1921 were obtained from Brown 1917. In this instance, the Brown ’17 entry is also on the Deletion Candidates worksheet (along with a combined entry), shown as Centrosaurinae indeterminate.

  6. William Miller says:

    The 2/3 thing is good so long as we don’t get an unusually big maximum size. If you happened by chance to find fossils from a 7’6″ human, for example, this rule would exclude all adult humans under 5 feet tall, which is a non-negligible proportion of the species. If you got fossils from an 8′ human, it would be worse. So in species we have a lot of specimens of, we might want to take a look and make sure one is not overly huge and skewing this.

  7. Rob Taylor says:

    With the exception of possibly combining entries for ROM 767 (as noted in my initial post on this thread above), I’m in agreement on all proposed deletions, including those specimens highlighted as being below our size thresholds on the ‘To Analyze’ worksheet. In the case of the latter in particular, I was pleasantly surprised to find that the impact of removing these specimens appears pretty minimal. Virtually all of the species in question still look to be well represented, or at least better represented by a surviving entry.

  8. William Miller says:

    Not sure if I should ask this here or at tasks, but what can we be doing to help out the ODP at this phase?

  9. David Dreisigmeyer says:

    I’m posting this here in case anyone wants the data. Gregory Paul has made dinosaur weight data available:

    You can get csv and sql (MySQL 5.1) versions here:

    Let me know if you see any errors in the csv and sql files. (If there’s an error in one, there’s the same error in the other.)

  10. William Miller says:

    Wow. Mamenchisaurus sinocanadorum at 75,000 kg?!

  11. David Dreisigmeyer says:

    William Miller :
    Wow. Mamenchisaurus sinocanadorum at 75,000 kg?!

    There was a large discussion about this on the Dinosaur Mailing List.

  12. Heinrich Mallison says:

    As I already wrote in an email to Andy, the Kentrosaurus data (not marked ‘to analyze’ or as a deletion candidate) should be mostly dumped. I’ll go over it and send in a spreadsheet with data from the lectotype (= least incomplete individual).

  13. Mike Taylor says:

    On the mass of Mamenchisaurus sinocanadorum, see also my comment at SV-POW! and Zach Armstrong’s immediate response:

  14. William Miller says:

    What’s next for the ODP? Is there anything for us to work on now?

  15. Pingback: The Data Set (as it sits now) « The Open Dinosaur Project
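William Miller’s concern in comment 6, that one unusually large specimen can drag the 2/3 threshold upward, can be checked with a few lines of Python. This is a minimal sketch: the 15% tolerance used to flag a suspicious maximum is an arbitrary assumption, not a value anyone proposed in the thread, and the heights (in inches) simply restate his human example.

```python
CUTOFF_FRACTION = 2 / 3  # the arbitrary cut-off proposed in the post

def cutoff(lengths, fraction=CUTOFF_FRACTION):
    """Size threshold set by the single largest value in the sample."""
    return fraction * max(lengths)

def max_looks_outlying(lengths, tolerance=1.15):
    """Flag the sample when its largest value exceeds the second largest
    by more than `tolerance` (1.15 = 15%, a hypothetical choice).
    Requires at least two values."""
    if len(lengths) < 2:
        raise ValueError("need at least two values to compare")
    top, second = sorted(lengths, reverse=True)[:2]
    return top > tolerance * second

# Human example from comment 6, heights in inches.
print(cutoff([90]) / 12)  # tallest 7'6" (90 in): cut-off of about 5 ft
print(cutoff([96]) / 12)  # tallest 8' (96 in): cut-off of about 5'4"
print(max_looks_outlying([90, 74, 72, 70]))  # True, since 90 > 1.15 * 74
```

A flag from a check like this would not remove anything by itself; it would just prompt a closer look at whether the largest specimen is setting the threshold unreasonably high.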
