Notes on Data Processing
I’ve begun the process of paring down the database into the form for final analysis (but remember, of course, that the full, unaltered data live on). So far, I have deleted entries that fit into the following categories:
- Extraneous entries for “combined” specimen data sets. In other words, only the combined entry will be used for analysis. One specimen, one data point.
- Specimens comprising isolated elements (e.g., isolated humeri or femora). Because we’re analyzing skeletal proportions primarily, isolated elements aren’t terribly useful.
- Individual measurements marked with a “+”. These represent measurements “as preserved” rather than estimates of original length.
Very soon we will want to make a final decision on what to do with species represented by multiple specimens. One strategy (which I think was used by Carrano 2006, if I remember correctly) is to use only the largest specimen. The benefit here is that juvenile specimens are more or less automatically excluded. The downside is that the largest specimen may not be the most complete.
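As a rough sketch of the “largest specimen only” strategy, here is one way it might look in Python. Note the assumptions: the record layout, the field names, and the choice of femur length as the stand-in for “largest” are all hypothetical, made up for illustration rather than taken from the actual database.

```python
# Hypothetical records; field names and values are invented for illustration.
specimen_records = [
    {"species": "sp1", "specimen": "A", "femur": 50.0},
    {"species": "sp1", "specimen": "B", "femur": 62.0},
    {"species": "sp2", "specimen": "C", "femur": 100.0},
    {"species": "sp2", "specimen": "D", "femur": None},  # femur not preserved
]

def largest_per_species(records, size_key="femur"):
    """Keep one record per species: the one with the largest size_key value.

    Assumes size_key (femur length by default) is an acceptable proxy for
    overall body size. Records missing that measurement are skipped.
    """
    best = {}
    for rec in records:
        size = rec.get(size_key)
        if size is None:
            continue
        current = best.get(rec["species"])
        if current is None or size > current[size_key]:
            best[rec["species"]] = rec
    return best

largest = largest_per_species(specimen_records)
for species, rec in largest.items():
    print(species, rec["specimen"], rec["femur"])
```

The sketch also shows the downside mentioned above: a specimen that lacks the size measurement can never be selected, no matter how complete it is otherwise.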
A second strategy is to average all of the entries together. Of course, we would have to be careful with this. For instance, when we’re using ratios, we’ll want to calculate the ratio for each specimen first, and then average the ratios. We don’t want to calculate, for instance, a humerus:femur ratio from the averaged measurements. Here’s why.
Let’s pretend we have specimens A and B. Specimen A has humerus and femur lengths of 100 and 50, respectively. This gives a humerus:femur ratio of 2.0. Specimen B has humerus and femur lengths of 50 and 100, respectively, for a humerus:femur ratio of 0.50. If we were to average the humeri and femora first, we would get average lengths of 75 each, which then results in a ratio of 1! It should be apparent (I hope) that this “ratio of averages” doesn’t accurately reflect what’s going on: neither specimen has proportions anywhere near 1. Furthermore, it’s quite different from the “average of ratios,” which weighs in at 1.25.
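The arithmetic for specimens A and B is easy to check with a few lines of Python. This is only a minimal sketch of the two calculations; the function names are my own and aren’t part of any existing workflow.

```python
from statistics import mean

# Specimens A and B from the example: (humerus length, femur length)
specimens = {"A": (100.0, 50.0), "B": (50.0, 100.0)}

def ratio_of_averages(measurements):
    """Average each element across specimens first, then take the ratio."""
    humeri = [h for h, f in measurements]
    femora = [f for h, f in measurements]
    return mean(humeri) / mean(femora)

def average_of_ratios(measurements):
    """Take the humerus:femur ratio per specimen first, then average."""
    return mean(h / f for h, f in measurements)

values = list(specimens.values())
print(ratio_of_averages(values))  # 1.0
print(average_of_ratios(values))  # 1.25
```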
The advantage of averaging values for all specimens in a species is that we can better incorporate individual variation, and also better deal with incomplete specimens. We need to be cautious about averaging in cases of extreme size variation within a single species (which is part of why it’s desirable to use ratios). Here, it may still be worthwhile to discard known juveniles.
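Putting the averaging strategy together, a per-species “average of ratios” might be sketched like this. Again, the record layout, the field names, and the `juvenile` flag are assumptions for illustration only. Specimens missing either measurement are simply left out of that particular ratio, which is how incomplete specimens can still contribute to other ratios.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; field names and values are invented for illustration.
measurement_records = [
    {"species": "sp1", "humerus": 100.0, "femur": 50.0, "juvenile": False},
    {"species": "sp1", "humerus": 90.0, "femur": 60.0, "juvenile": False},
    {"species": "sp1", "humerus": 20.0, "femur": 16.0, "juvenile": True},
    {"species": "sp2", "humerus": 40.0, "femur": None, "juvenile": False},
    {"species": "sp2", "humerus": 50.0, "femur": 100.0, "juvenile": False},
]

def species_mean_ratio(records, numerator, denominator, drop_juveniles=True):
    """Compute the ratio per specimen, then average the ratios per species."""
    ratios = defaultdict(list)
    for rec in records:
        if drop_juveniles and rec["juvenile"]:
            continue  # discard known juveniles
        num, den = rec[numerator], rec[denominator]
        if num is None or den is None:
            continue  # specimen can't contribute to this ratio
        ratios[rec["species"]].append(num / den)
    return {species: mean(vals) for species, vals in ratios.items()}

mean_ratios = species_mean_ratio(measurement_records, "humerus", "femur")
print(mean_ratios)  # {'sp1': 1.75, 'sp2': 0.5}
```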
Want to see the work in progress? Check it out here.
Thoughts? Let’s hear from you in the comments section!