
cc0 vs. the world

Today I had some discussion that stirred up my desire to say, in a loud voice: "But look here, if it is credit you want, then the best way to get it is if your data are cc0, it's even better than cc-by!" So I tweeted:
TL;DR There are lots of hints, but apparently no direct studies that address this.

Two important bits of clarification based on my original thoughts. I was only interested in cc licensed data. I was not asking whether cc0 data is reused more than other cc- data, just whether cc0 data gets *cited* (yes, citations = bad metric, so use a generic "pointed to" perhaps) more than other cc- data, particularly cc-by.

The basic premise is that the best way to (ultimately) bring focus to your work is to make it completely free, and that this will bring more attention, in the long run, than requiring attribution.

"Unethical" people will use open data regardless of the license, however they want, so they are a wash, and it follows that we can eliminate them from the conversation. If this premise is accepted (it is just a premise), then any possible mechanism that causes someone (i.e. an "ethical" person) to pause before they use a dataset will result in that dataset being less widely cited. Cc-by, however innocuous, is a mechanism that will cause some to pause. I'm not claiming this scenario as my idea; it is straightforward enough that many have thought of it. What's curious is that perhaps no one has tested it explicitly.

Many thanks to all who responded with insights (see the conversation by clicking on the tweet). Here is a list of tweeted links for future reference:


  1. > "Unethical" people will use open data regardless of the license, however they want, so they are wash, and it follows that we can eliminate them from the conversation

    Hmmm... I appreciate this discussion is intended, I presume, for the context of academia, but outside that domain I really don't agree with your statement there.

    e.g. for open government data they're usually delighted that data gets used. It really doesn't always matter if the use is 'cited' or not - it's meant to be used & use is good, regardless. That's why the data is openly provided - to be used, not to be cited.

    I think there's also an interesting discussion to be had over the subtle difference between rigorous and detailed acknowledgement of the provenance of data & 'citation' -- they're not the same thing in my mind. Scientifically, a super detailed methodology of the provenance & handling/transformation of the data used is probably more *useful* in terms of reproducibility than a simple 'citation' (in terms of what I'm thinking of). I find citations rather constraining in terms of what can be expressed in the reference list I guess. That's not to say you can't have both, but just recognising that citation is a pretty crude & imperfect mechanism for credit-giving.

  2. No disagreement here. I was indeed trying to keep things narrowly focused, mostly for pragmatic reasons. There certainly won't be one parameter (citations) that allows us to point at all the downstream consequences of open data.

    I chose citations as a proxy for a broader concept of linkages, or pointers. In my thinking, citations, or any linking mechanisms (for example, provenance chains), are a sort of "gravitational" mechanism that draws attention (and other data) towards some information (open data). Data with a lot of "gravity", regardless of whether they are good or bad, should draw in other resources (e.g. users of the software that produced that data, grant funding, public scrutiny, critical reviewers, repeat experiments, etc.). This seems like a reasonable hypothesis (and again, I'm not claiming to have come up with it), but actually testing it requires some proxies to start with, like the old-school concept of citations.

