Guest post: Open data is for small business too

Big businesses have enriched their customer information by using external data sources for many years. But small and medium-sized enterprises have long seen licence fees and their lack of technical skills as barriers to adoption.

By augmenting a customer database with external data, a company can gain a greater insight into its customer demographics and how they relate to factors such as purchase behaviour and loyalty. This knowledge can then help the company use its marketing budgets in a more cost-effective way, by targeting direct campaigns more accurately and making better decisions about the advertising it purchases.

With free open data becoming more widely available, and the development of extraction portals that are easy to use, these barriers are beginning to be broken down and small business owners are being given greater opportunities to grow their businesses.

A friend of mine runs a popular delicatessen shop in a suburb of Chester. In the seven years since opening her doors, the business has attracted a strong loyal customer base from local residents and nearby businesses to the point of saturation in the immediate surrounding area. Therefore, to grow the business, the owner needed to look further afield for new opportunities.

She decided to offer a home delivery service south of the River Dee within a three-mile radius of the shop, and looked for a cost-effective way of promoting it through a leafleting campaign. Analysis of the delivery zone, using ONS census data, showed that there were approximately 10,000 households in the target delivery zone. Budgetary and logistical constraints meant that a maximum of 2,000 leaflets could be printed and distributed.

So, she asked for my help.

The business had acquired a customer database by asking customers in store to sign up to a newsletter and special offers, which formed a good starting point. The customer database was held in Microsoft Access, which made it easy to overlay external data by matching on address fields.

The first task was to clean the data as far as possible. Using a VBA script, I standardised postcodes to upper case with a single space between the two parts, and then de-duplicated the database to household level to avoid over-counting in the analysis.

I then sought suitable data sources to enrich the data. The first stage was to add the ONS Postcode Directory (ONSPD), downloaded from the ONS Geography Portal, to the database to perform census lookups. As the campaign was confined to the local area, only the CH postcode area was needed, which kept the size to a minimum. Using the grid reference coordinates in ONSPD, I extracted the postcodes of households within three miles of the shop, restricted to the CH4 postcode district to cover only properties south of the river.
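The clean-up and the distance filter are simple enough to sketch in a few lines. The example below is an illustration in Python, not the VBA script used for the project; the function names are made up for this post, and it assumes the grid references are eastings and northings in metres.

```python
import math
import re

# Outward part (e.g. "CH4"), then inward part (digit plus two letters, e.g. "8AA").
_POSTCODE_RE = re.compile(r"^([A-Z]{1,2}[0-9][A-Z0-9]?)([0-9][A-Z]{2})$")

METRES_PER_MILE = 1609.344


def normalise_postcode(raw):
    """Return the postcode in upper case with one space, or None if malformed."""
    compact = re.sub(r"\s+", "", raw).upper()
    match = _POSTCODE_RE.match(compact)
    if match is None:
        return None
    return f"{match.group(1)} {match.group(2)}"


def within_radius(easting, northing, shop_easting, shop_northing, miles=3.0):
    """True if a grid reference (in metres) lies within `miles` of the shop."""
    distance_m = math.hypot(easting - shop_easting, northing - shop_northing)
    return distance_m <= miles * METRES_PER_MILE
```

Straight-line distance on the grid is a reasonable approximation over a few miles. Entries that fail the pattern come back as None so they can be reviewed by hand rather than silently dropped.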

The aim was to build a demographic profile of her customer base and compare it with the demographics of the area as a whole. I sourced a number of datasets to do this, including census data on social grade, tenure, level of education, household composition and deprivation levels; DWP data on benefit claimants; HMRC data on tax credits and child benefit; and council tax data aggregated to output area.

The values in each dataset were converted from raw counts to proportions, dividing each count by the total so that it could be read as a probability. These were then matched 1:1 to the postcodes of customer households in the delivery zone, and the mean proportion for each characteristic was calculated to produce a profile of the customer base.
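As a rough illustration of that calculation, here is a minimal Python sketch. The function names and the dictionary layout are assumptions made for the example; the original work was done in Microsoft Access.

```python
def to_proportions(counts):
    """Convert raw category counts for one area into proportions of the total."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: count / total for category, count in counts.items()}


def mean_profile(rows):
    """Average each category's proportion across the matched customer households.

    `rows` holds one dict of proportions per customer household, all with
    the same categories.
    """
    if not rows:
        return {}
    categories = rows[0].keys()
    return {c: sum(row[c] for row in rows) / len(rows) for c in categories}
```

For example, an area with 60 owned and 40 rented households gives proportions of 0.6 and 0.4, and a customer household matched to that area contributes those two values to the mean.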

Census data was extracted at output area level (typically around 100 households per area), using ONSPD to look up the output areas for the postcodes identified within the target delivery zone. A profile of the area was then produced by weighting the proportions by the household count for each postcode in the zone, taken from the ONS Postcode Estimates dataset.
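The weighting step amounts to a household-weighted average. The sketch below shows one way to write it in Python; the function name and input layout are assumptions for illustration only.

```python
def weighted_profile(postcodes):
    """Household-weighted mean of proportions over the postcodes in a zone.

    `postcodes` is a list of (household_count, proportions) pairs, where
    `proportions` maps each category to the proportion for the output area
    that the postcode falls in.
    """
    total_households = sum(households for households, _ in postcodes)
    if total_households == 0:
        return {}
    weighted_sums = {}
    for households, proportions in postcodes:
        for category, proportion in proportions.items():
            weighted_sums[category] = (
                weighted_sums.get(category, 0.0) + households * proportion
            )
    return {c: total / total_households for c, total in weighted_sums.items()}
```

Weighting by household count stops a postcode with a handful of addresses from counting as much as one with several dozen.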

Other datasets were extracted at the smallest possible geographic unit and matched to postcodes also using ONSPD.

The two profiles were then compared using an index (the ratio of the proportion in the customer profile to the proportion in the area as a whole) and a z-score (the difference between the two proportions expressed in standard errors) to identify strongly differentiating characteristics.
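Both measures are quick to compute. The Python sketch below is illustrative: the index follows the definition given here, but the z-score formula is an assumption (a one-sample test that treats the area proportion as the baseline), since there are several ways to calculate it.

```python
import math


def profile_index(customer_p, area_p):
    """Ratio of the customer proportion to the area proportion (1.0 = no difference)."""
    return customer_p / area_p


def z_score(customer_p, area_p, n_customers):
    """Difference between the two proportions in units of the standard error.

    Assumption: the area proportion is treated as a known baseline and
    `n_customers` is the number of customer households in the profile.
    """
    standard_error = math.sqrt(area_p * (1.0 - area_p) / n_customers)
    return (customer_p - area_p) / standard_error
```

An index well above 1 with a large z-score marks a characteristic that is over-represented among customers and unlikely to be a chance result. A high index on a small customer count can still have a low z-score.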


This identified the core customer base as home-owning professionals and managers with young families. Using a cross-tabulation of the variables with the most significant differences in profile, I ranked postcodes in descending order of propensity and used the cumulative household count to cut the list off at the required 2,000 households.
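The ranking and cut-off can be sketched as follows. This is a minimal Python illustration with a hypothetical function name and tuple layout; it takes the propensity scores as given and does not show how they were derived.

```python
def select_postcodes(postcodes, max_households=2000):
    """Pick the highest-propensity postcodes without exceeding the household cap.

    `postcodes` is a list of (postcode, propensity, household_count) tuples.
    Returns the selected postcodes and the number of households they cover.
    """
    ranked = sorted(postcodes, key=lambda row: row[1], reverse=True)
    selected = []
    cumulative = 0
    for postcode, _propensity, households in ranked:
        # Stop at the first postcode that would take us over the leaflet budget.
        if cumulative + households > max_households:
            break
        cumulative += households
        selected.append(postcode)
    return selected, cumulative
```

Stopping at the first postcode that would break the cap keeps the selection strictly in propensity order. Skipping ahead to smaller postcodes would use more of the budget, but with lower-ranked households.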


The resultant dataset was plotted in the open source mapping software QGIS overlaid on Ordnance Survey Open Data Street View maps to identify the roads to target.


The campaign was a success, resulting in many new enquiries and orders. Feedback from new customers showed that the home delivery service was welcomed as they led busy work and family lives and didn’t have time to go shopping.

From this experience I developed an Open Data for Marketing workshop, in conjunction with the University of Chester’s Riverside Innovation Centre, as part of a series of ERDF-funded training courses for small businesses.

The aim of the popular workshop was to show small businesses how they can use open data with readily available Microsoft Office software to improve marketing decision making through greater market insights.

The business owners who attended came from a variety of sectors and brought some diverse problems to the table, all of which could be solved using open data. A home security company wished to monitor burglary trends and target high-crime areas. An investment broker wanted to make better use of his expensive call centre resources by pre-qualifying leads. The principal of a local legal practice wanted to target newly formed companies in the Chester area for business services.

I showed each one how their solution could be achieved, and helped them to do so. For the legal practice I wrote a script, run nightly as a scheduled task, that extracts newly formed companies in the CH postcode area using the Companies House URI service. It stores the highest company number retrieved so far for companies registered in England and Wales, then increments this number to retrieve new company details until it encounters two consecutive ‘404 not found’ errors. The script filters the results to companies registered in the CH postcode area using the postcode field, writes the resulting dataset to a CSV file, and emails it to the practice, which then uses the mail merge facility in Microsoft Word to send letters to the new companies.
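The stopping rule is the only subtle part of that script, so here is a minimal Python sketch of it. This is not the script itself: `fetch` is a hypothetical callable that the caller supplies to perform the lookup and return None for a ‘404 not found’, company numbers are treated as plain integers for simplicity, and the "postcode" key is an assumed field name.

```python
def fetch_new_companies(last_number, fetch, max_misses=2):
    """Walk company numbers upwards until `max_misses` consecutive misses.

    `fetch(number)` is a hypothetical callable: it returns a dict of
    company details, or None when the lookup gives a "404 not found".
    Returns the companies found and the new highest number to store.
    """
    found = []
    highest = last_number
    misses = 0
    number = last_number + 1
    while misses < max_misses:
        company = fetch(number)
        if company is None:
            misses += 1
        else:
            misses = 0  # a hit resets the run of consecutive misses
            highest = number
            found.append(company)
        number += 1
    return found, highest


def in_postcode_area(company, area="CH"):
    """True if the company's postcode starts with `area` followed by a digit."""
    postcode = company.get("postcode", "").strip().upper()
    return postcode.startswith(area) and postcode[len(area):len(area) + 1].isdigit()
```

Waiting for two consecutive misses, rather than stopping at the first, allows for the occasional gap in the number sequence. The digit check keeps a one-letter area such as M from matching ME or MK postcodes.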

The full economic benefit of open data will come through greater adoption by business to improve efficiency. To achieve this, businesses need to become more aware of the potential value open data can bring.

If SMEs are going to adopt open data, it’s essential to develop more web-based tools that are easy to use and don’t require specialist technical skills to operate.

John Murray is a freelance data scientist with over 30 years’ experience of building customer information systems for predictive analytics.