How is finding a consensus among citizen science transcriptions like aligning gene sequences AND textual analysis of medieval codices? Part 2

Great observation from the Notes from Nature team: ‘Citizen science approaches place us right between existing standards-oriented thinking in biodiversity informatics and edition-oriented thinking in the humanities.’ http://www.notesfromnature.org/

So You Think You Can Digitize

(cross-posted at SciStarter)

In our last post, we went through the mechanics of finding consensus among a set of transcriptions created independently by citizen scientists. This involved a mash-up of bioinformatics tools for sequence alignment (repurposed for use with text strings) and natural language processing tools to find tokens and perform some word synonymizing. In the end, the informatics blender did indeed churn out a consensus, but this attempt at automation led us to realize that there’s more than one kind of consensus. In this post we want to explore that issue a bit more.
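For readers who want to see the alignment-then-vote idea in miniature, here is a rough Python sketch. It is not the pipeline from our last post. It swaps in a plain Needleman-Wunsch global alignment over whitespace-split tokens in place of the bioinformatics and NLP tools described there. Every transcription is aligned against the first one (a simple "star" alignment rather than a true multiple alignment), and the sketch then takes a majority vote at each position. The function names, scoring values, and example label strings are all hypothetical.

```python
from collections import Counter

GAP = None  # marks a gap (missing token) in a pairwise alignment


def align_tokens(ref, other, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two token lists.

    Returns (ref_token, other_token) pairs; either side may be GAP.
    The scoring values are illustrative defaults, not tuned ones.
    """
    n, m = len(ref), len(other)

    def sub(a, b):
        return match if a == b else mismatch

    # score[i][j] = best score for aligning ref[:i] with other[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(ref[i - 1], other[j - 1]),
                score[i - 1][j] + gap,  # token in ref, gap in other
                score[i][j - 1] + gap,  # gap in ref, token in other
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(ref[i - 1], other[j - 1]):
            pairs.append((ref[i - 1], other[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((ref[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, other[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def consensus(transcriptions):
    """Star-alignment consensus: align every transcription to the first,
    then take a majority vote at each position of that reference."""
    token_lists = [t.split() for t in transcriptions]  # naive whitespace tokenizer
    ref = token_lists[0]
    votes = [Counter([tok]) for tok in ref]  # the reference votes for itself
    for other in token_lists[1:]:
        k = 0  # current position in ref
        for r, o in align_tokens(ref, other):
            if r is GAP:
                continue  # extra token not in ref; ignored in this simple sketch
            votes[k][o] += 1  # o may be GAP, i.e. a vote to drop this token
            k += 1
    # Ties go to the token counted first (the reference's own token).
    winners = [c.most_common(1)[0][0] for c in votes]
    return " ".join(w for w in winners if w is not GAP)


if __name__ == "__main__":
    # Hypothetical transcriptions of one label, invented for illustration.
    print(consensus([
        "2 mi SE of Rock Biv",
        "2 mi Southeast of Rock River",
        "2 mi SE of Rock River",
    ]))  # → 2 mi SE of Rock River
```

The result keeps the abbreviation "SE" but adopts the corrected "River". A simple vote like this can quietly blend two transcription styles into a reading that no single volunteer intended.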

So, let’s return to our example text:

Some volunteers spelled out abbreviations (changing “SE” to “Southeast”) or corrected errors on the original label (changing “Biv” to “River”); but others did their best to transcribe each label verbatim – typos and all.
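To make the two styles concrete, here is a small hypothetical Python sketch. It runs the same per-position majority vote twice: once on the raw tokens, and once after mapping tokens through a normalization table. The table `NORMALIZATION`, the helper names, and the pre-aligned example rows are invented for illustration. A real project would need a curated list of abbreviations and known label errors.

```python
from collections import Counter

# Hypothetical normalization table: one abbreviation expansion and one
# correction of a label error, mirroring the two kinds of change described above.
NORMALIZATION = {"SE": "Southeast", "Biv": "River"}


def normalize(tokens, table=NORMALIZATION):
    """Map each token through the table, leaving unknown tokens unchanged."""
    return [table.get(tok, tok) for tok in tokens]


def vote(rows):
    """Per-column majority vote over token lists that are already aligned
    (equal length, same column = same position on the label)."""
    return " ".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))


# Hypothetical, pre-aligned transcriptions of one label.
rows = [
    ["2", "mi", "SE", "of", "Rock", "Biv"],           # verbatim
    ["2", "mi", "Southeast", "of", "Rock", "River"],  # expanded and corrected
    ["2", "mi", "SE", "of", "Rock", "Biv"],           # verbatim
]

verbatim = vote(rows)
normalized = vote([normalize(r) for r in rows])
print(verbatim)    # → 2 mi SE of Rock Biv
print(normalized)  # → 2 mi Southeast of Rock River
```

Both outputs are legitimate consensus readings of the same three transcriptions. The verbatim vote reproduces the label as written, typo included, while the normalized vote yields an expanded, corrected record. Which one counts as "the" consensus depends on what the transcriptions are for.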

These differences in transcription style led us…
