Kudos to the ODP Contributors!

February 10, 2010 Andy Farke

There’s a really fantastic article (with a great photo) over at Science Buzz, featuring ODPers Becky Huset and Neva Key on their quest for Camptosaurus limb bone measurements. Go check it out!

Have you been featured in the news, on a blog, or elsewhere? Let us know!

Categories: Publicity

Found an Error? Have a Change to Suggest?

February 9, 2010 Andy Farke

To streamline things during this phase of the project (and to keep important notes from getting lost in other comment threads or email inboxes), I’ve created an “Errata” page. As it says there, this is an excellent place to post taxonomic suggestions, museum abbreviation updates, potential typos in the data, duplicate entries, etc. You can reach it from the sidebar, under Resources, via the link “Tasks: Found an Error?”

Thanks to contributor David Dreisigmeyer for the suggestion!

Categories: Miscellaneous, To-Do List

Time to Get to Work

February 9, 2010 Andy Farke
The Ankylosaur of Destiny needs your help!

Thank you to everyone for an excellent discussion going on over at the previous post. It’s really helping to clarify a number of issues – and I appreciate all of the expertise being tossed in. This is what open science is all about. Of course, the discussion continues – keep the comments rolling in!

As mentioned, we want a way to combine duplicate (and non-duplicate) measurements from all of the different sources for each specimen into a single entry. For instance, the Ankylosaurus magniventris specimen AMNH 5214 has four separate entries. One entry presents humerus, femur, fibula, and metatarsal lengths, another presents only femur lengths, and so on. Some measurements also have several different values: the femur length, for example, is given variously as 560, 536, and 542 mm (whether referring to left, right, or an unspecified side). So, we want to condense those four entries into one for the sake of further analysis (keeping the original data safe and sound, in case anyone wants to go back to them).

There is no perfect strategy, but based on our previous discussions it’s looking like the best approach is to “average and combine.” As another example, let’s consider how this would work for the Psittacosaurus mongoliensis specimen AMNH 6538.

There are two entries for this specimen, and we’ll only take a look at subsets of these entries. Two tibia lengths are presented: one at 129 mm and the other at 125 mm. So, our combined entry would use the average of these, 127 mm. Only one of the two entries presents the fibula length (given as 121.4e). In this case, we’ll assume that the estimated measurement is accurate, and enter 121.4 as the combined value (in my general experience, most of these estimated values seem to be pretty darned close, and reflect a little bit missing at the end of the bone or a similar condition; of course, it’s up to everyone to keep their eyes on exceptions to this and flag them accordingly).

Spreadsheet screen capture
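For anyone who would rather script this than do it by hand, here is a minimal Python sketch of the “average and combine” step for a single bone. It is only an illustration, not part of the project’s spreadsheet workflow: it assumes each reported value is stored as text, with an optional “e” before or after the number marking an estimate, and the function names `parse_measurement` and `combine` are made up for this example. Following the rule above, an estimated value is simply treated as valid.

```python
import re
from statistics import mean


def parse_measurement(raw: str) -> float:
    """Convert a cell such as '121.4e' or 'e121.4' to a number in mm.

    Estimated values are treated as valid, so a leading or trailing
    'e' is simply stripped off.
    """
    text = raw.strip().lower()
    match = re.fullmatch(r"e?\s*(\d+(?:\.\d+)?)\s*e?", text)
    if match is None:
        raise ValueError(f"unrecognized measurement: {raw!r}")
    return float(match.group(1))


def combine(values: list[str]) -> float:
    """Average every reported value for one element of one specimen."""
    return mean(parse_measurement(v) for v in values)


if __name__ == "__main__":
    print(combine(["129", "125"]))         # AMNH 6538 tibia
    print(combine(["121.4e"]))             # AMNH 6538 fibula (estimated)
    print(combine(["560", "536", "542"]))  # AMNH 5214 femur
```

With the numbers from the two examples above, this gives 127.0 for the AMNH 6538 tibia, 121.4 for its fibula, and 546.0 for the AMNH 5214 femur.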

I’ve begun modifying the spreadsheet so that every specimen with multiple entries has an extra row for its combined entry (thanks to John Dziak for pointing this out). Because the ceratopsians and ankylosaurs are mostly up to date taxonomically (unless anyone spots additional problems – please flag them if you do!), they’re the first targets for combination.

Here’s a proposed set of guidelines; if any other situations crop up, please post a comment and we can amend as appropriate. This is the sort of thing that will probably go into a Materials & Methods section in the paper.

Guidelines for Combining Multiple Entries for a Single Specimen

  • If values for various sides are included, please average them all into a single measurement. For instance, if a left and right humerus (in the L L and R L columns) are noted, the average would go into the “L” column. If two measurements for left humeri are included (in the L L column), the average again should go into the “L” column. And so on. . .
  • If a value is indicated as estimated (with an “e” before or after the number), it is appropriate to treat the measurement as valid (unless information indicates that the restoration is too extensive to trust the measurement).
  • If, in a set of measurements, one or more values seem to be “off” (e.g., a case where femur length is given as 342, 339, and 402 mm), flag this entry. Here, we will probably go with the more likely values (342 and 339) and discard 402 as an outlier.
  • Each combined entry is indicated by yellow in the first few columns, and the word “combined” in the Reference column.
  • The metatarsal and metacarpal columns are in “text” format (to keep the spreadsheet from auto-formatting the L/R measurements as dates). Spreadsheet formulas may not treat these cells as numbers, so you will have to adjust your technique accordingly.
  • Once an entry is finished, the person who combined it puts their name in column CM (“Entry 1”) and highlights the entire row in yellow.
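The outlier guideline leaves the final call to human judgment. Still, a rough automated check could help spot entries worth a second look. Below is a minimal Python sketch, assuming a value is suspicious when it differs from the median of all reported values by more than some tolerance; the 10% default and the name `combine_with_flags` are arbitrary, illustrative choices, not agreed-upon project rules.

```python
from statistics import mean, median
from typing import Optional


def combine_with_flags(
    values: list[float], tolerance: float = 0.10
) -> tuple[Optional[float], list[float]]:
    """Average the plausible values and return the suspicious ones separately.

    A value counts as a likely outlier if it lies farther from the median
    than `tolerance` times the median. The 10% default is illustrative only.
    """
    if not values:
        raise ValueError("no measurements to combine")
    mid = median(values)
    limit = tolerance * mid
    kept = [v for v in values if abs(v - mid) <= limit]
    outliers = [v for v in values if abs(v - mid) > limit]
    if not kept:
        # No value agrees with the median: leave the decision to a person.
        return None, outliers
    return mean(kept), outliers


if __name__ == "__main__":
    print(combine_with_flags([342, 339, 402]))  # femur example from the guidelines
    print(combine_with_flags([560, 536, 542]))  # AMNH 5214 femur, nothing flagged
```

For the femur example in the guidelines, this keeps 342 and 339 (mean 340.5) and flags 402, while the ordinary spread in the AMNH 5214 femur values is left alone. With only two values that disagree (say, a hypothetical pair of 100 and 130 mm), the sketch cannot tell which is wrong, so it returns `None` and flags both for a person to check.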

How to Contribute

So, we’re looking for some volunteers to help combine entries. In the true spirit of crowdsourcing, the fully editable document is available here. Please make edits directly on the document (rather than downloading and resending it to me). Right now, the data down to row 531 are prepped and ready to combine, and I’ve taken care of the first few entries. As we resolve and clean up other areas of the database, those will pop up as available. As always, it’s important to check your work frequently and alert someone if you notice an error or inconsistency.

Thank you, and good luck!

What’s Next?

February 8, 2010 Andy Farke

Thanks to the hard work of a number of individuals, our big old measurement spreadsheet is nearly complete. Almost all of the relevant entries have been verified (aside from a few stragglers), and we’re ready to get serious about data analysis. It’s not too late to contribute to the verification effort. As a reminder, though, I’d like to close off submission of new entries (except for previously unpublished, original measurements) unless there is a very, very good reason. Don’t worry – you’ll get another chance to contribute later this year when we start Phase II!

Today I began the task of sorting out synonymies and specimen numbers in the database. As always, the latest version is available here (there is also a bare-bones CSV snapshot, current as of 7 February 2010, available here). Before we can begin working any sort of statistical magic, we need to get the data in order. This includes:

  • Making sure that all genus/species names are up to date. In general, we’ll use the latest taxonomic authority. The second edition of The Dinosauria (2004) is a good start, and any more recent papers are also helpful for sorting things out. In some cases, we’ll simply need to consult an expert. If you think a genus or species name should be updated, please post it in the comments.
  • Making sure that all museum abbreviations are up to date. There is some variation from paper to paper in how museum abbreviations are listed, so we’ll want to get all of those clarified. For instance, all of the instances of NMC (National Museum of Canada) and GSC (Geological Survey of Canada) should get changed over to CMN (Canadian Museum of Nature).
  • Combining duplicate entries for a single specimen into one. How do you think we should do this one? I’m thinking of doing an average of all measurements, but maintaining some leeway to discard a measurement that doesn’t seem right. For instance, if two sources cite femur length as 520 and 523 mm, and a third cites femur length as 783 mm, I think we can safely toss out the latter. Thoughts or opinions? This is important, and is something that we’ll have to write up for the materials and methods portion of the paper.
  • Combining duplicate entries for a single species into one. Again, how should we deal with this? We don’t really want to include multiple data points for a single species when doing our analyses (or do we?) because, among other things, it artificially inflates the degrees of freedom (bad from a statistical standpoint). There is a case for taking species means in some analyses, but again we need to be careful about how we average things. For instance, we probably want to toss out juveniles (in most cases). Does this mean only using the very largest specimen for a species? Or only the specimen with the most complete data for a given analysis? Thoughts or opinions?
  • Types of analyses. We should start thinking about the kinds of regressions, PCAs, etc. that we want to run. I expect that some bivariate plots similar to those we posted earlier (e.g., humerus vs. femur length) might make their way in.
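The museum-abbreviation cleanup is essentially a lookup-table replacement. Here is a minimal Python sketch, assuming specimen numbers are written as an institutional prefix, a space, and a catalog number (e.g., “AMNH 5214”). Only the NMC/GSC to CMN change comes from this post; the table name `MUSEUM_RENAMES`, the function `normalize_specimen`, and the catalog number “1234” used below are hypothetical.

```python
# Hypothetical lookup table: only the NMC/GSC -> CMN mapping comes from the
# post; any further entries would need to be agreed on by contributors.
MUSEUM_RENAMES = {
    "NMC": "CMN",  # National Museum of Canada -> Canadian Museum of Nature
    "GSC": "CMN",  # Geological Survey of Canada -> Canadian Museum of Nature
}


def normalize_specimen(specimen: str) -> str:
    """Swap an outdated institutional prefix for the current one.

    Assumes the format '<prefix> <catalog number>'; prefixes that are not
    in the table (e.g. 'AMNH') are left unchanged.
    """
    prefix, sep, number = specimen.strip().partition(" ")
    return MUSEUM_RENAMES.get(prefix, prefix) + sep + number


if __name__ == "__main__":
    # '1234' is a made-up catalog number used only for illustration.
    print(normalize_specimen("NMC 1234"))   # CMN 1234
    print(normalize_specimen("AMNH 5214"))  # AMNH 5214
```

Keeping the renames in a single table means every change is visible in one place and can be reviewed in the comments before it touches the working copy of the data.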

Note: As discussed in the comments, no data will truly be “tossed out” – we’re maintaining the primary archive of data as is. Any deletions or combinations will be done on a second copy of the data.

At this stage, it’s quite possible that we might catch some little errors that have crept into the data here and there. As always, please let someone know if this is the case! A comment on the blog is certainly appropriate.

So, please offer any input or advice that you might have. This might include species synonymies, museum abbreviation adjustments, opinions on data combination, etc. Every opinion counts!

The ODP on the ABC

February 3, 2010 Andy Farke

The Australian Broadcasting Corporation’s radio show Future Tense is airing an episode later today on Open Science. It just so happens that I was interviewed about the Open Dinosaur Project for the program! Sadly, I can’t really tell you what I’m going to talk about, because I just don’t remember. I was coming off a nasty flu at the time of the interview, so there are no guarantees that I’m terribly coherent. I’m pretty sure I owe an apology to Matt and Mike, for neglecting to mention them at all! Not my best interview. . .but then again, there’s no such thing as bad publicity.

An MP3 of the program will be posted later; I’ll provide the link then. Update: Here’s the link to the program page, and here’s the MP3, and here’s the audio stream. Enjoy!

We’ve only got 70 more measurements left to verify, and there are now 1,757 measurements on the verified list. Amazing!

Wrapping Up Data Collection

February 2, 2010 Andy Farke

Our original schedule specified the close of data collection on February 1. Well, today is the day! I don’t think we’ll hit that target precisely, but we are very close. Only 212 entries remain to be verified (including a large batch of single-measurement specimens that was added recently), and around 30 are in the double-check queue. Over the next few days, let’s abide by the following:

  • Aim to finish up the verification list by Friday, February 5. Important: Please post a comment on the “Tasks for Contributors” page as you complete references, just so we don’t double up on work.
  • Unless the previously un-entered reference is a stunningly important one that includes multiple elements from an otherwise unrepresented species or specimen, let’s hold off on submission of new entries from the literature. At this point, most of what we’re getting are repeats of previously included specimens or measurements of isolated elements. These are still useful to some degree, but we have to call it quits somewhere! Otherwise, we’ll be verifying data until this time next century.
  • Submissions of original specimen measurements are still welcome.

In other news, we have 1,596 verified entries! Very, very awesome.

Categories: Progress Reports

The ODP in the Spotlight #scio10

January 21, 2010 Andy Farke

As I catch up after ScienceOnline2010, I wanted to share a few things that I learned there.

  1. People love what we’re doing with this project. The response I received was nearly uniformly positive, and a number of people provided leads for future funding possibilities.
  2. We’re not the only folks doing open notebook science. But, it’s still a pretty small niche in the broader profession. Will it become the dominant model? Or just a toy for a few crazy individuals? Only time will tell.
  3. Our project is unusual among many citizen science projects in the depth to which participants are encouraged to contribute beyond data collection. We don’t want just data monkeys – we want folks who think about the process, contribute ideas, and (hopefully) help us craft the best research paper possible. Of course, we don’t think any less of you if you just want to submit data – but don’t feel limited to data entry alone if you desire more participation!
  4. Our project is also unusual among citizen science projects in the stated publication goals. There seems to be a sense out there (I don’t know how accurate it truly is) that many of these sorts of efforts end in a nice pile of data, but no real published results. That’s all the more incentive to bring our paper through to its logical conclusion!

I have no word yet on the YouTube video of my presentation. Did anyone catch it live?

The ODP Around the Blogosphere
We’ve got a few new links to mention. These include:

If you know of any others, please feel free to post the link in the comments section.

Categories: Miscellaneous, Publicity

ScienceOnline2010 Update

January 16, 2010 Andy Farke

My presentation on the ODP came and went, and seemed to be very well received. I’ve had a fantastic time talking with other folks leading and participating in citizen science efforts, and there is plenty to follow up on. Look for more news in the days to come.

And, once I get info on where the YouTube video is posted, I’ll add a link here.

Thank you to all of our participants – your efforts knocked their socks off!

Catching the ODP at ScienceOnline2010

January 15, 2010 Andy Farke

In just a few short hours, I’ll be leaving for ScienceOnline2010. Rather fitting for the topic, this is an “unconference” – meaning everyone (not just physical attendees) is welcome to participate through chats, Twitter, streamed conference sessions, blogs, etc. I’m really excited for my talk (the “demo” session that I’m in is the closest thing to formal presentations that the meeting has – most of the rest of the sessions are less structured), as well as for the chance to network and hang out with so many people who share a common interest in science communication. A number of other “citizen science” people will be there, so one goal is to compare notes on all of our respective projects.

This afternoon I ran through the presentation with ODP co-leader Matt Wedel, who also happens to live just down the street from me (Mike Taylor, of course, couldn’t make it from England just for a half hour practice and critique session). Thanks to Matt’s suggestions (and Mike’s emailed suggestions about some of the slides), the whole thing is much more polished now. I’ll be posting the slides here shortly after my session on Saturday.

Although the great majority of you won’t be there in person, there are ways to participate from home. I’m in Session E (full program here), from 2 – 3:05 pm Eastern Standard Time on Saturday, January 16. This session will be livestreamed through The RTP Stream, and a chat function at that site will allow you to ask questions or add comments in realtime. For those of you who are active in Second Life, the session will also be livestreamed onto the RTP Island. After the conclusion of the conference, most of the sessions will also be archived on the scienceinthetriangle YouTube channel – search for the hashtag #scio10. For more details on these and other ways to participate, check out Bora’s post here.

I’m hoping to get a post or two in during the conference, so stay tuned for more!

Categories: Publicity

Previewing the ODP Presentation at ScienceOnline2010

January 14, 2010 Andy Farke

I spent much of today working on my presentation about the Open Dinosaur Project for ScienceOnline2010. The hope is to post the full thing in some form after the meeting; in the meantime, here’s a working outline for the talk:

  • Brief introduction to the ODP and what I’ll be discussing
  • Paleontology as a historically (and sometimes necessarily) secretive field, due to issues of fossil poaching, worries about calling “firsties” on an idea or discovery, stipulations from funding agencies, etc.
  • The rise of open access literature (with its body of untapped data), as well as an interested and savvy community of non-professionals, makes now a great time to attempt something new
  • The team of Farke, Taylor, and Wedel started the Open Dinosaur Project in order to set a precedent for open notebook science within paleo, involve all sorts of people in the science, do some great paleo research on ornithischians, and assemble a database for use by others
  • Brief background on what an ornithischian is, and why we care about their limb bones
  • Brief outline of how we got the word out about the project
  • Brief introduction to you, the participants, and all of the work we’ve accomplished in a short time
  • Potential problems in data mining the literature, and how we’ve worked around them
  • An overview and demo of the data collection and verification process
  • Next steps: analysis, paper writing, and publication
  • What’s worked well, and what we hope to improve
  • Conclusion and acknowledgments

Right now, we have about 33 slides and the whole thing takes about 13 or 14 minutes to run through. I’m hoping to smooth things out over the next few days to bring it in at around 12 minutes. The time slot is 15 minutes total, so ideally I want a 12-minute talk with 3 minutes for questions.

Bora asked if I’ll be bringing a dinosaur bone along with me. . .sadly, it probably won’t happen.

Categories: Publicity