Great observation from the Notes from Nature team: ‘Citizen science approaches place us right between existing standards-oriented thinking in biodiversity informatics and edition-oriented thinking in the humanities.’ http://www.notesfromnature.org/
(cross-posted at SciStarter)
In our last post, we went through the mechanics of how to find consensus from a set of independently created transcriptions by citizen scientists. This involved a mash-up of bioinformatics tools for sequence alignment (repurposed for use with text strings) and natural language processing tools to find tokens and perform some word synonymizing. In the end, the informatics blender did indeed churn out a consensus, but this attempt at automation led us to realize that there's more than one kind of consensus. In this post we want to explore that issue a bit more.
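For readers who want to see the general shape of such a pipeline, here is a minimal sketch. It is not the code from our last post. It assumes whitespace tokenization, a hypothetical `synonyms` table for word synonymizing, Needleman-Wunsch global alignment applied to word tokens instead of nucleotides, and a simple "star" alignment in which every transcription is aligned to the first one and each reference column is decided by majority vote. The sample label strings are invented for illustration.

```python
from collections import Counter

GAP = None  # marks a gap in an alignment


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two token lists; return (token_a, token_b) pairs, either of which may be GAP."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def consensus(transcriptions, synonyms=None):
    """Align every transcription to the first one and take a majority vote per reference column.

    Tokens that the reference lacks (insertions) are ignored, a simplification of this sketch.
    Ties go to the reference token, because it is counted first in every column.
    """
    synonyms = synonyms or {}
    token_lists = [[synonyms.get(t, t) for t in text.split()] for text in transcriptions]
    reference = token_lists[0]
    votes = [Counter() for _ in reference]
    for tokens in token_lists:
        k = 0  # current column in the reference
        for ref_tok, other_tok in needleman_wunsch(reference, tokens):
            if ref_tok is GAP:
                continue  # insertion relative to the reference: dropped here
            votes[k][other_tok] += 1  # other_tok may be GAP (a deletion)
            k += 1
    winners = [column.most_common(1)[0][0] for column in votes]
    return " ".join(tok for tok in winners if tok is not GAP)


# Invented example: one volunteer expanded "SE" and corrected "Biv" to "River"
texts = ["3 mi SE of Biv crossing", "3 mi Southeast of River crossing", "3 mi SE of Biv crossing"]
print(consensus(texts, synonyms={"SE": "Southeast"}))  # 3 mi Southeast of Biv crossing
```

Note that the synonym table merges "SE" with "Southeast", but nothing merges "Biv" with "River", so the majority keeps the label's original typo. Which answer counts as "the" consensus depends on what the transcription is for.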
So, let's return to our example text:
Some volunteers spelled out abbreviations (changing “SE” to “Southeast”) or corrected errors on the original label (changing “Biv” to “River”); but others did their best to transcribe each label verbatim – typos and all.
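To make the two transcription styles concrete, here is a minimal sketch using invented label strings and a hypothetical normalization table, `NORMALIZE`, that expands "SE" and corrects "Biv". A per-position majority vote over the same three transcriptions gives a different answer depending on whether tokens are normalized first. The sketch assumes every transcription splits into the same number of tokens, so it skips the alignment step entirely.

```python
from collections import Counter

# Hypothetical table: expand an abbreviation and fix a known error on the label
NORMALIZE = {"SE": "Southeast", "Biv": "River"}


def positional_consensus(transcriptions, table=None):
    """Majority vote per token position; assumes all transcriptions have equal token counts."""
    table = table or {}
    rows = [[table.get(tok, tok) for tok in text.split()] for text in transcriptions]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("transcriptions differ in length; align them first")
    # zip(*rows) yields one column of candidate tokens per position
    return " ".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))


texts = ["3 mi SE of Biv crossing", "3 mi Southeast of River crossing", "3 mi SE of Biv crossing"]
print(positional_consensus(texts))             # 3 mi SE of Biv crossing
print(positional_consensus(texts, NORMALIZE))  # 3 mi Southeast of River crossing
```

The first result is a verbatim consensus that reproduces the label as written; the second is a normalized consensus that reflects what the label means.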
These differences in transcription style led us…