CC 4.0 and Open Data
After a lengthy two-year development process, the Creative Commons recently announced the publication of their Version 4.0 licences. The new licences include a raft of changes based on lessons learned through application of the Creative Commons licences around the world.
Several of the changes relate to data publication, making the release an important milestone for those publishing or using open data. In this blog post I summarise a few of those changes and highlight how they are relevant to the publication and sharing of open data.
Traditionally, Creative Commons (CC) licences have used copyright law to allow rights holders to waive specific rights over their content. This works well for text, images, videos etc. But it has been problematic when sharing data and databases. In some jurisdictions the creator of a database may have gained additional rights (sometimes referred to as sui generis rights) over some data, e.g. as part of the process of its compilation. This is true in the European Union, which has legislation relating to database rights.
As these database rights are different from copyright, they were either not covered by the Creative Commons licences or were inconsistently addressed. This led to the creation of new data licences intended to address the shortfall.
This issue has been fixed in the Version 4.0 licences, which now explicitly address database rights, making them more suitable for data publishing. For open data, the CC-BY 4.0 and CC-BY-SA 4.0 licences are now a viable option for publishing both data and content. This greatly simplifies the licensing process, allowing publishers to pick from a single family of licences for all of their open works.
The Creative Commons Data guidance provides a good overview of how the new licences work with respect to both copyright and database rights. The ODI publishers' and re-users' guides to data licensing also provide good introductions to this topic. These introductory guides have now been supplemented with additional technical guidance on how to publish machine-readable rights statements and how to use that information in applications.
Correctly attributing data sources is an important part of building trust around open data and applications. However, the attribution requirements for publishers are often unclear, or can be awkward to implement in some scenarios, e.g. in mobile applications. The ODI Rights Statement vocabulary is intended to improve this situation by ensuring that the relevant information is available in a machine-readable form. There is still plenty of scope, though, to explore approaches to attributing and citing datasets.
With the new release, the Creative Commons have revised their approach to attribution, adopting a "common sense" approach that recognises the need for flexibility around how attribution is performed. Importantly, it allows re-users to link to their attribution statements, rather than including them on every page or screen.
For example, developers re-using CC licensed data can now create a general "acknowledgements" or "colophon" page in their application to refer to their sources. This is particularly useful if you are re-using a large number of datasets, and seems like a good practice to encourage more widely across the community.
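As a rough illustration of the "acknowledgements page" idea, the Python sketch below builds a single HTML page from a list of re-used datasets. Everything in it is an assumption made for the example: the `Source` class, its field names and the example dataset are hypothetical, and the choice of title, publisher, source link and licence reflects one reasonable reading of what an attribution might contain rather than a statement of what the licences require.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Source:
    """One re-used dataset. All field names here are illustrative."""
    title: str
    publisher: str
    url: str
    licence_name: str
    licence_url: str


def attribution_line(source: Source) -> str:
    """Render a single attribution as an HTML list item."""
    return (
        f'<li><a href="{escape(source.url)}">{escape(source.title)}</a>, '
        f'{escape(source.publisher)}, licensed under '
        f'<a href="{escape(source.licence_url)}">'
        f'{escape(source.licence_name)}</a></li>'
    )


def acknowledgements_page(sources: list[Source]) -> str:
    """Build one page listing every source, sorted by title."""
    ordered = sorted(sources, key=lambda s: s.title)
    items = "\n".join(attribution_line(s) for s in ordered)
    return f"<h1>Acknowledgements</h1>\n<ul>\n{items}\n</ul>"


if __name__ == "__main__":
    # Hypothetical dataset, used only to show the output shape.
    sources = [
        Source(
            title="Example Bus Stops",
            publisher="Example City Council",
            url="https://example.org/data/bus-stops",
            licence_name="CC-BY 4.0",
            licence_url="https://creativecommons.org/licenses/by/4.0/",
        ),
    ]
    print(acknowledgements_page(sources))
```

Each screen in an application could then link to this one page instead of repeating every attribution, which is the flexibility the revised approach is intended to allow.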
There are numerous other changes to the CC licences which generally improve their use for supporting the global publication of open data:
- simplified language makes it easier for non-experts to understand the terms of each licence
- a single set of licences suitable for all jurisdictions allows them to be re-used, unchanged, around the world
- translations into multiple languages facilitate re-use by many different people
- accidental licence infringements can now be fixed, avoiding a complete loss of rights over some data
The UK OGL v2.0 has been defined to be compatible with CC-BY 4.0. Now that the licence has been formally published, re-users in the UK and beyond can safely adapt and re-publish UK Government material under compatible licences from the Creative Commons family.
While CC-BY 4.0 and CC-BY-SA 4.0 are not yet marked as conformant with the Open Definition, the conformance review will shortly be getting underway and is unlikely to identify any major issues.
All things considered, the CC-BY 4.0 and CC-BY-SA 4.0 licences, together with the CC0 public domain dedication, form a viable default option that should be given serious consideration by any open data and open government data initiative.