Open data and democracy: answer by Henri Verdier, the "Mr. Data" of government

Henri Verdier is currently the director of Etalab, the French government’s Open Data task force, which is in charge of opening public data and developing the Open Data platform. http://www.henriverdier.com/ Twitter: @henriverdier

Article originally published in French on Le Nouvel Observateur / Rue 89, and translated by André Confiado of Five by Five, www.fivebyfive.paris

The text was prompted by the interview with British sociologist Evelyne Ruppert, which appeared on Rue89. I called out to Henri Verdier on Twitter to ask him what he thought about it. This is his response, initially published on his blog, which we are reproducing with his kind authorization. Open Data is a complex debate, but it is at the heart of the changes in our society. It deserves the interest of citizens beyond the inner circle of the initiated. - Pierre Haski.


An article entitled "Is open data a political illusion" appeared at the beginning of July in the journal MyScienceWork, and was then reprinted in La gazette des Communes, and then by Rue89.

This interview with Evelyne Ruppert, a British sociologist and notably the writer of the blog Big data and society, draws on her work on transparency in Britain, which she appears to know well, but is adapted to the French approach, which she appears to know a little less well.

Evelyne Ruppert formulates an analysis which can be summarized as:

  • Absolute transparency is an illusion, since governments always choose what they want to communicate, and never share the most important information;

  • Transparency does not build confidence, but rather mistrust, since it can never be complete;

  • Transparency initiatives confine citizens to the data that we are willing to transmit to them;

  • Open Data promises a more direct relationship with power, but in fact creates a new technocracy, made up of those who can understand data;

  • Thus, close attention has to be paid to documenting the data itself (who created it, when, why, etc.) in order to allow citizens to criticize the data that is given to them.

Double mistrust

A number of friends have asked me what I think of this paper. It’s awkward: I more or less agree with everything it says, yet I do not really feel concerned by it.

Fundamentally, I think Evelyne Ruppert reasons from an implicit idea that I would describe as a "model of double mistrust."

Her implicit reading of the Open Data movement is the following: as a response to the increasing mistrust of citizens, governments decided to release certain information allowing citizens to control them better, hoping to restore this confidence.

I do not know if this reasoning exists elsewhere. One feels that this is related to the British context where open data is hard to separate from the Big Society project.

However, what I know is that this is not the context of the French government, and that it is not the spirit in which Etalab works.

In France, the opening and sharing of public data is not seen as an end in itself, but rather as a lever that can serve three objectives:

  • A more complete democracy;

  • Innovation and growth;

  • And a more efficient public action.

Transparency, "accountability," participation…

Let’s start with the democratic dimension. In France, opening data does not rely on a value granted to "transparency."

Incidentally, I am personally not a fan of this concept. I find in it an apolitical apathy, traces of the theory of the invisible hand, and ignorance of what drives human activity.

I prefer the concept of responsibility, or accountability, which recognizes the full dignity of the subject exercising his or her responsibilities.

The opening of public data in France is founded on the Declaration of the Rights of Man and of the Citizen, and its Article 15:

"Society has the right to demand an account from every public agent of his/her administration."

The Cada law, which is a legislative translation of this article 15, deals with "administrative documents" (memos, opinions, notes, letters, files, databases, etc.).

We feel that it is not a question of placing the administration in a glass case, of spying on all exchanges, of dragging everything into the spotlight in all circumstances. It is a question of making public acts of responsibility: "documents" that an author worked on and for which s/he is responsible, notably to his/her superiors.

The law provides that this responsibility be borne before citizens. It does not ban hesitation, the secrecy of deliberation, or the preparation of decisions.

Incidentally, it also provides several exceptions to this principle of publication: protection of the right to privacy, national security, secrets protected by law.

We do not ask the State to describe reality

Consequently, the law also settles the question of which data the State has to produce and share, which seems to worry Evelyne Ruppert. We do not ask the State to describe reality. We ask it simply to share the data it uses within the framework of its public service missions, in the form in which it uses them.

There is no need to open the epistemological question of the meaning of the data, of the point of view it conceals, or of possible observation biases.

Open Data is not the public statistics office, and neither is it a great story through which the State tells us what to think… It is the sharing of the instruments with which the State works, and on which it bases its decisions. It's the search for a second life and a new use for the knowledge that the State creates through its daily activity.

Of course, there are other activities that the State keeps secret for more or less good reasons. There are democratic battles to reduce the perimeter of secrecy. But there are also many other professions that are taking more and more steps to share, and that are learning to find new efficiencies thanks to the relationships opened up through sharing.

Evidently, opening data is nothing without dialogue and consultation around the data: the aim is to create the conditions for an active, informed, and responsible citizenry.

Exchanging with the users of this data, responding to their questions, and compiling criticisms and suggestions is one of the essential changes that this approach produces.

It is one of the dimensions of the data.gouv.fr portal: a world first of its kind, it allows its users (administrations, citizens, researchers) to enter into a dialogue with the producers of the data, to share points of view that differ from theirs, to improve the data, to cross-reference it with other data, and even to deposit data that is not the State's.

With more than 1,300 reuses six months after the platform opened, it is clear that a living community has formed, and that it has taken hold of this resource.

We believe in debate

We believe strongly in this debate, which is developing on our platform, on Twitter, at the many "hackathons" that we organize, as well as within organizations. We believe in it even more than in metadata.

For I would like to highlight the paradox of Evelyne Ruppert's position: after observing that transparency is impossible, and that confidence cannot be restored, she proposes a "transparency on transparency," a "metatransparency."

Without active, confident, and caring communities, you can increase transparency to the second or third power, and nothing will happen. As always on the internet, what counts are the human communities that organize themselves thanks to these resources.

France is highly conscious of the fact that sharing data should allow for the construction of genuine gestures of democratic exchange, to kindle the informed contribution of the citizen to public decisions. Opening data should lead to an opening up of public decision-making itself.

This is the meaning behind France's entry into the Open Government Partnership, announced in May by the President. By joining this community of public innovators, which elected us to its steering committee on August 4, France has chosen to build a partnership with people who know that open data takes on its full meaning in open government, and who work concretely in this direction.

A democracy of the actors

There are, however, other dimensions to open data, which is not limited to the control of representative democracy. There is a dimension of empowerment which is curiously absent from this interview. To really understand this, you have to understand how much this movement is linked to the current digital revolution, and how much of an impact it can have on the economy of innovation.

We have entered a world where every ten years the power of computers is multiplied by thirty, and the cost is divided by the same number. Very recently, digital technology has unleashed within society an unimaginable power to act.

Added to the existence of the internet, which allows citizens to synchronize, to organize, and to cooperate, this revolution has brought us to a new world where the power of the many has become an essential political parameter.

Admittedly, not everyone is equal before technology, as rightly emphasized by Evelyne Ruppert. Admittedly, the risk of a digital divide is chronic. Admittedly, we shall see the appearance of new technocrats.

But the same applies to all democratic revolutions. [The events of] 1789 or 1830 did not distribute power equally to all French people. But they did bring more people into power.

This hierarchy of skills is not antidemocratic. The multitude is not the masses. It is a living, moving body, with its own dynamics and organisations.

There is a multitude of skills. Social organization becomes multipolar. New elites appear, basing their legitimacy on immeasurable dimensions. Bloggers can hold their own against journalists. Collectives develop Linux, Wikipedia, or OpenStreetMap. Guardianship on the social web will allow for a new prioritization of information.

To think of society without taking this reality into account is to confine oneself to a pre-digital vision. It's to give up on using this dynamic as a lever.

The "contributions of the multitude"

If data.gouv.fr was opened to contributions from the multitude, it's not to make it "interactive." It is because the site strives to become a platform embodying a community of producers and users of data.

It is because the State no longer has a monopoly on the capacity to create information of general interest. It is because OpenStreetMap, OpenMétéoForecast, OpenFoodFacts, WikiDB, but also Celtipharm, Que Choisir ?, the Red Cross, and tomorrow perhaps even unions, associations, and think tanks, have something to say, something to share.

Why split the community of producers from the reusers, when they work as a network?

This is also the meaning of the new alliances that we seek with civil society when we support the winners of Dataconnexions on Kiss Kiss Bank Bank, when we help out the Bano project, when we develop research models such as OpenFisca.

Admittedly, the State has a big role. Data.gouv.fr clearly separates official data created by the State, which holds itself liable for its sincerity, from data created by other actors, whose identity the State merely authenticates if asked. But this difference does not stop cooperation.

The revolution of usages

Above all, what counts with Open Data is what we can do with it. It is this dimension that underlies the two other ambitions of Open Data: fuel for innovation and motivation for the efficiency of public action. This data is not only there to "watch the State." Far from it. It is material for the exercise of an active citizenry. It is material for creation.

Admittedly, there are unfounded secrets, and more outlets like Mediapart (a news and opinion website) are needed to bring these to light. This is, however, not the primary mission of Open Data. The mission of Open Data stems from the fact that there is a great deal of unused knowledge in our systems.

This is knowledge that comes to life and gains value when put into circulation. There aren't just little secrets in the State. There are also thousands of useful and fascinating pieces of information resulting from the work of thousands of public servants, and drawing on their professional knowledge.

The geolocated and time-stamped map of road accidents used during the recent hackathon organized by OKFN and the Ministry of the Interior, and that enabled Rue89 to do a fantastic job, is worth its weight in kilobits. So are the files of street names that have fed so many developments, the statistics on pollution, base maps, weather data…

It is the story of "Moneyball," a cult movie for "datascientists." There was enough in the statistics of Major League Baseball (in the USA) to reinvent the sport. There is enough in data from the State to reinvent large areas of public action. The important thing is to make this data actionable, so that new actors can take it and ask new questions.

It is a dimension that is frequently lost on non-coders: Open Data is not limited to revealing data. The aim is for the data to be accessible, manipulable, and actionable. This is why it opens up a universe of unforeseen reuses.

We do not ask for raw data in the name of some sort of naturalism of data. We ask for raw data in order to have the most manipulable data possible, to allow the greatest possible number of interpretations, and to facilitate as many manipulations as possible.

This is what Tim Berners-Lee meant by "raw data." This is not a sociological premise: this is the claim of an engineer. Deep down, we do not understand Open Data as long as we do not understand the difference between information concealed in a 1,459-page report from the Court of Auditors, and information placed in a complete series in an easy-to-use spreadsheet.

The revolution of data

We have in fact entered a new world where: whole segments of reality are codified and thus become analyzable; an increasing number of individuals and entities know how to draw new uses from them; big data opens up epistemological perspectives whose limits we have yet to see.

By sharing data under a licence that prevents it from being used for individual profit, one introduces a political dimension to this data revolution. One rebalances power, prevents monopolies, creates common goods. It is one of the forms of essential political action at this period of digital revolution.

The question remains about the quality of the data, its pertinence, and its capacity to describe reality. You have to admit that the human sciences have a long history with data.

From phrenology to contemporary marketing, from Gobineau to Alexis Carrel, from IQ to EQ, they have often translated their prejudices into pseudo-knowledge.

They are well placed to know the risk of using contestable ontologies, of selecting correlations presented as causalities, of hiding politics behind an appearance of objectivity (see Stat-Activism, how to fight against the numbers), or of working on data that reflects only the prejudices of its authors.

We mentioned it above. This is not exactly the problem of Open Data, which demands only the simple sharing of the tools that the State uses for its missions.

But if we really want to talk about it, let's remember that not all data is of the same nature:

  • The exact number of the prison population

  • The geolocation of hospitals

  • Train schedules

  • The exact number of professors or tax officers

  • The price of fuel

  • The zip codes of villages in France

  • Weather forecasts

  • Road alignments

Certainly these data are intellectual constructions, but that does not mean that they pose a big epistemological question.

We need not be told every time about the difference between noumena and phenomena and the myth of the cave. There is factual information that citizens demand, and that we can share without undue caution.

Other data need more interpretation and deserve more analysis. For example, since the work of the Stiglitz commission, France is familiar with the problems posed by GDP. Why is the value creation of government employees only addressed through their payroll?

It is true that a lot of data does not exactly say what it appears to say. In such cases, it is true that more transparency about the choices made in constructing the data, and more precision about the implicit hypotheses, are in order.

But conversely, it must be highlighted that each of these datasets answers at least one question. Take the statistics on justice, for example: does the social distribution of sentences reflect:

  • The distribution of delinquency?

  • The punishment strategies adopted by judges?

  • The capacity to call on a good lawyer?

  • Or does it talk about the distribution of the forms of delinquency and the severity of the law depending on these forms?

I do not know. But a precise analysis of how this data was constructed certainly allows one to understand it. One just has to ask the right questions of this data.

By any standards, the world of Open Data appears to me to be more capable of asking these right questions than other social worlds, and notably the media.