‘Free your locked-up data’: FAO statistician advises companies on being more efficient

Steve Katz of the FAO’s Statistics Division calls on institutions in the agriculture industry and beyond to make better use of company data by ensuring it is accessible, of high quality and integrated across different processes

By unlocking and integrating their data, companies can avoid duplication, inefficiency and inconsistency in how it is accessed and managed. CC BY 2.0, uploaded by [Brenda Clarke](https://www.flickr.com/photos/brenda-starr/4498078166/in/photolist-7RtPVQ-6vJiUQ-6tS17G-9Vypi4-6YKqEc-9VBaGL-8gCrWa-9VB9gG-iJdgCX-7151rB-2jzpPJ-iuSuaX-rV5HX-fUS7TF-aoW8KP-7vXkgX-8kED5C-gdVNmx-2hdD4U-4uZVTx-4HMHFB-cNCqW3-jNzM5-bpo5vA-9Z4bEt-nTwNU-6Z6HtE-5sK2Yv-4sHbkD-TjBT-LgJpn-718ZJ1-kkf4m-o5CEj-spBHyW-pqw5J4-9EptFx-r4HWnL-5sEueF-6T9vJd-FA5EV-x42nN7-4iAakH-8SFe45-9VykZK-6pS5yA-asqvck-7yeBkh-6ZH4sh-5ooWsh "Brenda Clarke").

With all this talk lately about big data, data revolutions, crowdsourcing, data warehousing, knowledge repositories and the like, one tends to lose sight of the fundamentals of data and its use.

The fact is that many institutions continue to have serious difficulties managing their core corporate data and making it accessible in user-friendly formats for both human and computer-system consumption. At the same time, for the most part, data is still generated in a highly decentralised manner and primarily to support business processes and transactions, with little consideration for other possible uses.

For example, most organisations have an automated system to manage the workflow of staff business travel (ie to administer transactions and expenses). This same data could potentially serve other important business objectives: assessing who has travelled where and for what reason, seeing who has interacted frequently with a specific country, or identifying areas of the world receiving more or less coverage than others. Yet often it is never reused in this way, because data is typically not well integrated across different business processes and organisational units, which in turn leads to duplication, inefficiency and inconsistency in data access and management.

Early decisions on standards can limit corporate data’s potential

The potential use of corporate data for purposes beyond its original context usually emerges only as an afterthought, when data management standards and technological choices have long since been made. Moreover, such decisions are often put in the hands of IT experts who may not have sufficient knowledge of, or familiarity with, the business processes that generate the data in the first place. The result is 'data warehouses' and 'management information systems' offering sophisticated 'business intelligence tools' and 'data-mining techniques'. Unfortunately, these are not demand-driven, and they present data (and processed derivatives) out of its original context, making it hard to understand and difficult to use.

A lack of metadata can cause all sorts of problems

Metadata – the data about the data, which describes the context and use of a specific dataset – is generally limited and difficult to access in company systems. The result can be catastrophic: organisations find it difficult to assess the full set of data they have at their disposal, or do not know where to get the data they need, even when it is theoretically available. They are forced into expensive, ad-hoc and manual data exploration to obtain information that should be readily available at their fingertips (administrative travel data, for example).
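To make 'data about the data' concrete, here is a minimal sketch, in Python, of the kind of metadata record that could accompany the staff-travel dataset described above. The field names and values are purely illustrative assumptions, not an FAO or other official standard:

```python
import json

# A hypothetical metadata record ("data about the data") for the staff-travel
# dataset discussed above. Field names are illustrative only.
travel_dataset_metadata = {
    "title": "Staff business travel, 2014",
    "description": "Trips recorded by the corporate travel workflow system",
    "owner_unit": "Administrative Services",      # which unit produces the data
    "contact": "travel-data@example.org",         # where users can ask questions
    "coverage_period": {"start": "2014-01-01", "end": "2014-12-31"},
    "update_frequency": "monthly",
    "fields": [
        {"name": "staff_id", "type": "string",
         "description": "Anonymised staff identifier"},
        {"name": "destination_country", "type": "string",
         "description": "ISO 3166-1 alpha-3 country code"},
        {"name": "purpose", "type": "string",
         "description": "Reason for travel"},
        {"name": "cost_usd", "type": "number",
         "description": "Total trip cost in US dollars"},
    ],
}

# Publishing a record like this alongside the dataset lets both people and
# systems discover what the data contains and how it can be reused.
print(json.dumps(travel_dataset_metadata, indent=2))
```

Even a simple record of this kind answers the basic questions – what the data covers, who owns it and what each field means – that organisations otherwise pay for in ad-hoc exploration.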

Data hoarding confuses users

There are also cultural issues that affect the availability and accessibility of data. Motivated by the famous adage that 'information is power', some data owners intentionally hide and lock their data, even when the underlying technologies would permit it to be easily shared. The exact opposite occurs in the case of data hoarding, where too much overlapping data is collected and kept indefinitely, making it difficult for users to know what version of the data should be used for what purpose, even when all the data is readily accessible.

How to free your locked-up data

For those starting new data projects, or wanting to do something about all this locked-up data waiting to be freed, here are a few tips and guiding principles that you may find helpful:

  • Engage with your users and assess their current and emerging data needs over time.

  • Monitor actual data usage by your customers and use that feedback to further develop your data services. This should include not only the quantitative measures that most institutions already take, such as web traffic analysis, but also qualitative assessments, such as user satisfaction surveys and case studies of actual use.

  • Develop and implement a corporate data model and a quality assurance framework. For example, at the Food and Agriculture Organisation (FAO) we have adopted a Quality Assurance Framework for all the statistical data outputs we produce.

  • Develop and promote an open data policy within your organisation. This will help to raise awareness of the importance of opening data and how it can benefit business by extending your audience and smoothing demand curves, among many other benefits. Developing an open data policy will also encourage the adoption of good practices when implementing open data, ensuring initiatives are properly planned and budgeted for, reviewed, evaluated and therefore sustainable.

  • Keep your data separate from applications that use it, to facilitate reuse.

  • Nominate a Chief Data Officer, with responsibility for keeping a bird’s-eye view of your organisation’s data and for developing an organisational model and catalogue of what information is available, where and in what format.

  • Include metadata in data collection processes and make sure it is available for users to consult.

  • Advocate the use of international and non-proprietary data standards in your organisation. For example, in the statistical area, FAO is promoting the use of SDMX for data exchange and offers a list of recommended standards and good practices related to data collection and dissemination.

  • Share information about current data projects – both internal and external to the organisation – and encourage collaboration and joint ventures. For example, FAO is part of the Global Open Data for Agriculture and Nutrition (GODAN) initiative, which advocates for and improves global coordination of the many efforts to make agriculture- and nutrition-relevant data available, accessible and usable worldwide.

  • Support a number of machine-readable data-exchange formats, preferably adopting international standards (eg SDMX), making it easier to exchange your data with partners; a minimal sketch of this idea follows this list.
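As a small illustration of the points above on metadata and machine-readable formats, the sketch below publishes a toy dataset as a plain CSV file with a JSON metadata record alongside it. The file names, fields and values are hypothetical assumptions for illustration only; a full SDMX exchange would use the standard's own message formats and registries rather than this ad-hoc layout:

```python
import csv
import json
from pathlib import Path

# Hypothetical example records: trips aggregated by destination country.
records = [
    {"destination_country": "KEN", "trips": 42, "year": 2014},
    {"destination_country": "BRA", "trips": 17, "year": 2014},
]

# Metadata describing the file we are about to publish (illustrative fields).
metadata = {
    "title": "Staff trips by destination country, 2014",
    "format": "CSV",
    "fields": ["destination_country", "trips", "year"],
    "licence": "to be set by your organisation's open data policy",
}

out_dir = Path("published")
out_dir.mkdir(exist_ok=True)

# Write the data itself in a plain, non-proprietary, machine-readable format.
with open(out_dir / "trips_2014.csv", "w", newline="") as data_file:
    writer = csv.DictWriter(data_file, fieldnames=metadata["fields"])
    writer.writeheader()
    writer.writerows(records)

# Write the metadata next to the data, so the dataset remains self-describing
# when it is exchanged with partners.
with open(out_dir / "trips_2014.json", "w") as meta_file:
    json.dump(metadata, meta_file, indent=2)
```

The point is not the particular formats chosen, but that the data and its description travel together in forms that both humans and machines can read without proprietary tools.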

To those of you embarking on ‘big data’ or similar adventures, please don’t lose sight of your organisation’s locked-up data in the process: a priceless treasure chest sitting at your fingertips.

Stephen Katz is Senior Coordinator of Statistics Governance at the Office of the Chief Statistician at the Food and Agriculture Organisation of the United Nations. Follow @SteveK1958 and @FAOstatistics on Twitter.

If you have ideas or experience in open data that you'd like to share, pitch us a blog or tweet us at @ODIHQ.