
cc0 vs. the world

Today I had some discussion that stirred up my desire to say, in a loud voice: "But look here, if it is credit you want, then the best way to get it is if your data are cc0, it's even better than cc-by!" So I tweeted:
TL;DR There are lots of hints, but apparently no direct studies that address this.

Two important clarifications about my original question. First, I was only interested in cc-licensed data. Second, I was not asking whether cc0 data is reused more than other cc- data, just whether cc0 data gets *cited* (yes, citations = bad metric, so use a generic "pointed to" perhaps) more than other cc- data, particularly cc-by.

The basic premise is that the best way to (ultimately) bring focus to your work is to make it completely free, and that this will bring more attention, in the long run, than requiring attribution.

"Unethical" people will use open data however they want, regardless of the license, so they are a wash and we can eliminate them from the conversation. If this premise is accepted (it is just a premise), then any mechanism that causes someone (i.e. an "ethical" person) to pause before using a dataset will result in that dataset being less widely cited. Cc-by, however innocuous, is a mechanism that will cause some to pause. I'm not claiming this scenario as my idea; it is straightforward enough that many have thought of it. What's curious is that apparently no one has tested it explicitly.

Many thanks to all who responded with insights (see the conversation by clicking on the tweet). Here is a list of tweeted links for future reference:


  1. > "Unethical" people will use open data regardless of the license, however they want, so they are wash, and it follows that we can eliminate them from the conversation

    Hmmm... I appreciate that this discussion is, I presume, intended for the context of academia, but outside that domain I really don't agree with your statement there.

    E.g. providers of open government data are usually delighted when the data gets used. It really doesn't always matter if the use is 'cited' or not - it's meant to be used & use is good, regardless. That's why the data is openly provided - to be used, not to be cited.

    I think there's also an interesting discussion to be had over the subtle difference between rigorous, detailed acknowledgement of the provenance of data & 'citation' -- they're not the same thing in my mind. Scientifically, a super detailed methodology of the provenance & handling/transformation of the data used is probably more *useful* for reproducibility than a simple 'citation'. I find citations rather constraining in what can be expressed in a reference list, I guess. That's not to say you can't have both, but just recognising that citation is a pretty crude & imperfect mechanism for credit-giving.

  2. No disagreement here. I was indeed trying to keep things narrowly focused, mostly for pragmatic reasons. There certainly won't be one parameter (citations) that allows us to point at all the downstream consequences of open data.

    I chose citations as a proxy for a broader concept of linkages, or pointers. In my thinking, citations, or any linking mechanisms (provenance chains, for example), are a sort of "gravitational" mechanism that draws attention (and other data) towards some information (open data). Data with a lot of "gravity", regardless of whether they are good or bad, should draw in other resources (e.g. users of the software that produced that data, grant funding, public scrutiny, critical reviewers, repeat experiments, etc.). This seems like a reasonable hypothesis (and again, I'm not claiming to have come up with it), but actually testing it requires some proxies to start with, like the old-school concept of citations.

