Archive for the ‘Posts from researchers’ Category
This is a guest post from Andrea Thomer and Rob Guralnick from Notes From Nature. See more about the authors below. Find more citizen science projects about words.
Notes From Nature (official site) is a citizen science transcription project that launched in April 2013 and is part of the Zooniverse stable of projects. The goal behind Notes from Nature is to make a dent in a major endeavor – digitizing the information contained in biocollections scattered over the nation and world. More information about the project can be found here. Conservative estimates place the number of biology and paleontology museum specimens at over a billion in the United States alone. The task of digitizing these records cannot be completed without help, and Notes from Nature is one place where we can ask for that help. Because we know some specimen labels are hard to read, we don’t simply ask for one transcription per record. Instead, each record is seen by at least four pairs of eyes. This has its own challenges; how do we take those 4+ transcriptions and get a reconciled or ‘canonical’ version?
In a previous post on our blog “So You Think You Can Digitize,” we went through the mechanics of how to find consensus from a set of independently created transcriptions by citizen scientists. This involved a mash-up of bioinformatics tools for sequence alignment (repurposed for use with text strings) and natural language processing tools to find tokens and perform some word synonymizing. In the end, the informatics blender did indeed churn out a consensus, but this attempt at automation led us to realize that there’s more than one kind of consensus. In this post we want to explore that issue a bit more.
So, let’s return to our example text:
Some volunteers spelled out abbreviations (changing “SE” to “Southeast”) or corrected errors on the original label (changing “Biv” to “River”); but others did their best to transcribe each label verbatim – typos and all.
These differences in transcription style led us to ask — when we build “consensus,” what kind do we want? Do we want a verbatim transcription of each label (thus preserving a more accurate, historical record)? Or do we want to take advantage of our volunteers’ clever human brains, and preserve the far more legible, more georeferenceable strings that they (and the text clean-up algorithms described in our last post) were able to produce? Which string is more ‘canonical’?
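To make the two kinds of consensus concrete, here is a minimal Python sketch. It is not the pipeline from our earlier post: it skips the sequence-alignment step entirely and assumes the transcriptions already line up token for token. The label text, the `NORMALIZE` table, and the function names are all hypothetical, built around the “SE” and “Biv” examples above.

```python
from collections import Counter

# Hypothetical expansions and corrections, based on the "SE" and "Biv"
# examples in the text; the project's real synonym lists are not shown here.
NORMALIZE = {"SE": "Southeast", "Biv": "River"}


def majority(tokens):
    """Return the most common token in one aligned column."""
    return Counter(tokens).most_common(1)[0][0]


def consensus(transcriptions, interpret=False):
    """Token-by-token majority vote across transcriptions.

    With interpret=False the vote is taken on the strings as typed
    (a verbatim-style consensus). With interpret=True each token is
    normalized first (an interpreted-style consensus).
    """
    rows = [t.split() for t in transcriptions]
    if len({len(row) for row in rows}) != 1:
        # A real workflow would align the strings first; this sketch does not.
        raise ValueError("sketch assumes transcriptions with equal token counts")
    if interpret:
        rows = [[NORMALIZE.get(tok, tok) for tok in row] for row in rows]
    return " ".join(majority(column) for column in zip(*rows))


# Invented transcriptions of an invented label, for illustration only.
transcriptions = [
    "2 mi SE of Green Biv",
    "2 mi Southeast of Green River",
    "2 mi SE of Green Biv",
    "2 mi SE of Green River",
    "2 mi SE of Green Biv",
]

print(consensus(transcriptions))                  # 2 mi SE of Green Biv
print(consensus(transcriptions, interpret=True))  # 2 mi Southeast of Green River
```

The same five transcriptions yield two different “consensus” strings depending on whether normalization happens before the vote, which is exactly the choice discussed here.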
Others have asked these questions before us. In fact, after doing a bit of research (read: googling and reading Wikipedia), we realized we were essentially reinventing the wheel that is textual criticism, “the branch of literary criticism that is concerned with the identification and removal of transcription errors in the texts of manuscripts” (thanks, Wikipedia!). Remember, before there were printing presses there were scribes: individuals tasked with transcribing sometimes messy, sometimes error-ridden texts by hand, sometimes introducing new errors in the process. Scholars studying these older, hand-duplicated texts often must resolve discrepancies across different copies of a manuscript (or “witnesses”) in order to create either:
- a “critical edition” of the text, one which “most closely approximates the original”, or
- a “copy-text” edition, in which “the critic examines the base text and makes corrections (called emendations) in places where the base text appears wrong” (thanks again, Wikipedia).
Granted, the distinction between a “critical edition” and a “copy-text edition” may be a little unwieldy when applied to something like a specimen label as opposed to a manuscript. And while existing biodiversity data standards developers have recognized the issue — Darwin Core, for example, has “verbatim” and “interpreted” fields (e.g. dwc:verbatimLatitude) — those existing terms don’t necessarily capture the complexity of multiple interpretations, done multiple times, by multiple people and algorithms and then a further interpretation to compute some final “copy text”. Citizen science approaches place us right between existing standards-oriented thinking in biodiversity informatics and edition-oriented thinking in the humanities. This middle spot is a challenging but fascinating one – and another confirmation of the clear, and increasing, interdisciplinarity of fields like biodiversity informatics and the digital humanities.
In prior posts, we’ve talked about finding links between the sciences and humanities — what better example of cross-discipline-pollination than this? Before, we mentioned we’re not the first to meditate on the meaning of “consensus” — we’re also not the first to repurpose tools originally designed for phylogenetic analysis for use with general text; linguists and others in the field of phylomemetics (h/t to Nic Weber for the linked paper) have been doing the same for years. While the sciences and humanities may still have very different research questions and epistemologies, our informatics tools have much in common. Being aware of, if not making use of, one another’s conceptual frameworks may be a first step to sharing informatics tools, and building towards new, interesting collaborations.
Finally, back to our question about what we mean by “consensus”: we can now see that our volunteers and algorithms are currently better suited to creating “copy-text” editions, or interpreted versions of the specimen labels, which makes sense given the many levels of human and machine interpretation that each label goes through. Changes to the NfN transcription workflow would need to be made if museums want a “critical edition,” or verbatim version of each label, as well. Whether this is necessary is up for debate, however: would the preserved image on which transcriptions were based be enough for museum curators’ and collection managers’ purposes? Could that be our most “canonical” representation of the label, to which we link later interpretations? More (interdisciplinary) work and discussion is clearly necessary, but we hope this first attempt to link a few disparate fields and methods will help open the door for future exchange of ideas and methods.
Image: Notes From Nature
References and links of potential interest:
If you’re interested in learning more about DH tools relevant to this kind of work, check out Juxta, an open source software package designed to support collation and comparison of different “witnesses” (or texts).
Howe, C. J., & Windram, H. F. (2011). Phylomemetics–evolutionary analysis beyond the gene. PLoS Biology, 9(5), e1001069. doi:10.1371/journal.pbio.1001069
Andrea Thomer is a Ph.D. student in Library and Information Science at the University of Illinois at Urbana-Champaign, and is supported by the Center for Informatics Research in Science and Scholarship. Her research interests include text mining; scholarly communication; data curation; biodiversity, phylogenetic and natural history museum informatics; and mining and making available undiscovered public knowledge. She is particularly interested in information extraction from natural history field notes and texts, and improving methods of digitizing and publishing data about the world’s 3–4 billion museum specimen records so they can be used to better model evolutionary and ecological processes.
This is a guest post by Dr. Tom Keeble, who was born and raised in Melbourne, Australia, and completed a science degree with honours at The University of Melbourne. He then completed a Ph.D, studying Developmental Neurobiology, at The Walter and Eliza Hall Institute in Melbourne, and the Queensland Brain Institute. He did a postdoc in Singapore and has now moved into Science Communication. Because he couldn’t see himself staying in the active research scene but hated the thought of leaving science entirely, becoming the Neuroscience Communicator at The Florey Institute of Neuroscience and Mental Health has been the perfect fit.
Most people reading this blog will be familiar with the idea of crowdfunding – so I won’t explain the concept in much more detail other than to state its definition as “asking heaps of people to chip in to do something epic.”
Pozible is the Australian equivalent of Kickstarter.com, and is the third largest crowdfunding platform in the world. It works on the “all-or-none” model of funding projects, so if you don’t reach your target, you don’t receive any of the funds (you can’t buy ¾ of a PCR machine…). Fifty-five percent of Pozible projects are successful, and they have raised over $11 million since 2010. What’s more, they have an entire section of their site dedicated to crowdfunding research projects.
The Florey Institute of Neuroscience and Mental Health performs some pretty epic neuroscience – we’re 4th in the world in terms of cumulative publication citations since 2002, and we study the brain from conception right through to the end of life. Major disease focuses include stroke, epilepsy, autism, Multiple Sclerosis, Parkinson’s, Huntington’s and Alzheimer’s Diseases.
In the Australian context, the pool of funds available for Medical Research via our National Health and Medical Research Council has remained static at $800 million, while funding success rates have fallen to an all-time low of 17%, with even more dismal early career researcher success rates.
Against this background, crowdfunding can provide the resources to generate pilot data that forms the basis of a larger grant application, particularly for a high-risk high-reward project where proof of principle is crucial, and for younger researchers still establishing that vital “track record.”
Pledgers at every level get to be more hands-on with the research, becoming part of the daily life of the labs that are raising the money, through online engagement and in the case of higher pledgers, visits to the Institute. The campaign is also a valuable tool in educating scientists about their role in public engagement – increasingly being seen as non-negotiable when receiving public dollars.
Traditional engagement tools at The Florey Institute include direct mail, e-newsletters, on-site public lectures and school outreach programs. These are very successful, but in the next decade this model is going to need updating – and online engagement through crowdfunding is great training for scientists; engage or perish!
Now, to the projects themselves!
The Florey Institute has 7 projects up on Pozible, the most successful ones being run by those with extensive online and offline networks to draw upon. Our standout performers have been, in no particular order:
- A project run by Dr David Hawkes gives pledgers the chance either to suggest names for the 4 viral vectors he’s creating, with the most popular names getting the honour, or to skip the popularity contest and ‘buy’ a name for a vector themselves – which will then literally go viral as it spreads to his collaborators around the globe.
- Dr. Wah Chin Boon has leveraged her extensive international connections to great success for her project examining DNA changes in response to environmental chemicals possibly leading to Autism.
- Everyone’s looking for ways to reduce the pharmacopeia associated with modern day life. Animal studies have shown that light levels play an important role in increasing or decreasing the number of brain cells that produce dopamine, an important “feel-good” neurochemical. Dr Tim Aumann is looking to see whether this holds true for humans as well, by examining brains from people who lived (well, died) during periods of long days and short nights, or vice versa, opening the door to drug-free brain treatments.
- And finally, in what might be a world-first, Faith Lamont is looking to crowdfund her Ph.D stipend! Due to citizenship restrictions, Faith as a New Zealander is ineligible for funding from the Australian Government, so she’s looking for funding from the people! Her project aims to use humanized mouse assays – iPads for mice – to better assess learning and memory in mice in the context of Schizophrenia and Autism. Faith’s even made a little game where you can test your cognitive skills against those of a mouse.
So head over to Pozible and check out the projects. One of the added beauties of crowdfunding is that the project doesn’t even have to be in your own backyard: the benefits of science are global – and epic.
From searching for invertebrates to measuring wind speed, everyone can gain new knowledge and skills and play their part in protecting the natural environment. This is the philosophy of Open Air Laboratories (OPAL), a project based in England that encourages the public to explore their surroundings, record their findings, and submit their results to the OPAL national database making their contribution available to scientists and others involved in environmental science and policy.
OPAL has created six surveys that the public can use to collect data and all are important areas of research:
Each one of these surveys has been designed so that anyone can use them – no specialist knowledge is needed to take part and equipment is either provided or is easy to make or find. The instructions are simple to follow and each survey contains a ‘workbook’ for recording results. Once people have completed their survey, they upload their results onto the OPAL website or send them by post.
This past winter, we invited you to participate in SnowTweets and simply “measure your snow to help the planet.”
SnowTweets is a citizen science project run by cryosphere researchers Richard Kelly (pictured far left) and Raymond Cabrera at the University of Waterloo (Canada), who sent us the following report to share with you! They’d love to hear back from you, so please feel free to post your reactions in the comments field, below.
Crowdsourcing for Weather and Climate Science: The Snowtweets Project
By Richard Kelly and Raymond Cabrera Interdisciplinary Centre on Climate Change, and Department of Geography and Environmental Management University of Waterloo
Where in the world is the water?
Roughly 71% of the Earth’s surface is covered by water, most of which is contained in oceans, ice sheets, glaciers, lakes and rivers. Much of it is stored as seasonal snow; as much as 50% of the northern hemisphere’s land surface is covered during the year. (By springtime, the winter snow accumulation melts and finds its way into rivers that fill reservoirs or replenish the lakes and oceans.)
An accurate system for monitoring seasonal snow accumulation is important for several reasons, not the least of which is to help policy makers make sound decisions concerning the protection of our planet.
At the University of Waterloo, we cryosphere researchers have been studying the accuracy of global and regional snow cover extent and snow accumulation estimates, which are based on sophisticated models and on observations from sensors such as NASA’s MODIS (Moderate Resolution Imaging Spectroradiometer), a key instrument aboard the Terra (EOS AM) and Aqua (EOS PM) satellites. We also know that certain things can interfere with their estimation accuracy:
• Clouds, for example, can obscure snow from being observed by visible and infrared instruments aboard satellites.
• When publishing daily snow depth model estimates, both the Canadian Meteorological Centre (CMC) and the U.S. National Oceanic and Atmospheric Administration (NOAA) combine environmental variables known to control snow depth variation.
• And, while these models are available for global or national regions on a daily basis, the data are averaged over large areas, typically from 1 km up to 25 km wide.
To help us gather as much on-the-ground data as possible, we turned to citizen scientists and asked them (you) to monitor snow cover and snow depth by simply telling us how much snow had accumulated around them, reported as depth.
Citizen scientists from around the world participated in this endeavor, which we dubbed Snowtweets because we accepted data both via Twitter and via a standard webform on ScienceForCitizens.net.
Snowtweets provided us with pinpoint measurements at specific times. We put this data right to use by comparing it to daily global and regional snow cover model estimates.
How well did the citizen scientists’ data stack up against modern snow-mapping models? From the comparisons made to date, citizen scientists’ measurements, on average, match the snow cover model estimates! The differences were generally related to differences in the spatial resolution of the models compared with the pinpoint measurements of the Snowtweets data. These ground measurements, in part, verify the snow cover models, at least in regions where the snowtweets were reported. Snowtweets also complement these (daily or weekly) models by providing a sparse, yet near-real-time source of data and we are looking towards ways to incorporate them carefully once they are available.
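For readers curious about what such a comparison can look like in practice, here is a minimal Python sketch. It is not our actual analysis. It assumes each report carries a position in projected kilometre coordinates and a depth in centimetres, and that the model is a simple grid of cell-average depths. The 25 km cell size, the function names, and all of the numbers are invented for illustration.

```python
from statistics import mean

CELL_KM = 25.0  # assumed grid spacing, the coarse end of the 1 km to 25 km range


def cell_of(x_km, y_km, cell_km=CELL_KM):
    """Index of the grid cell that contains a point (projected km coordinates)."""
    return (int(x_km // cell_km), int(y_km // cell_km))


def compare(reports, model_grid, cell_km=CELL_KM):
    """Compare pinpoint reports with the model value of the cell they fall in.

    Returns (mean difference, mean absolute difference), both in cm,
    where each difference is reported depth minus modelled depth.
    """
    diffs = []
    for x_km, y_km, depth_cm in reports:
        cell = cell_of(x_km, y_km, cell_km)
        if cell in model_grid:  # skip reports outside the modelled area
            diffs.append(depth_cm - model_grid[cell])
    if not diffs:
        raise ValueError("no reports fall inside modelled cells")
    return mean(diffs), mean(abs(d) for d in diffs)


# Invented numbers for illustration only; these are not Snowtweets data.
model_grid = {(0, 0): 20.0, (1, 0): 35.0}  # cell index -> modelled depth (cm)
reports = [
    (5.0, 5.0, 25.0),    # x km, y km, reported depth (cm)
    (20.0, 10.0, 15.0),
    (30.0, 12.0, 35.0),
]

bias, mean_abs_diff = compare(reports, model_grid)
print(bias, mean_abs_diff)
```

In this toy case the two reports inside the same cell differ from the cell average in opposite directions, so the mean difference is zero even though the individual points do not match. That is the kind of resolution effect described above.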
We know we’ll need to see more frequent measurements from citizen scientists when we run this again, this winter. This will help us look for patterns and, again, compare the information against the models. With enough Snowtweets over a few seasons, we may be able to create a new snow depth product, blending and mapping satellite observations with ground measurements in real time! If you’d like to learn more about Snowtweets or our findings, please post your questions or comments in the comments field and we’ll respond as best we can. To sign up to participate in Snowtweets, go to scienceforcitizens.net/changing-planet to register or to snowtweets.org for instructions on how to contribute via Twitter.
Thank you to all who participated. We look forward to continuing this research with you, this winter!