Open data in action: Digging for treasure with data

How a big data company can save infrastructure groups millions by using open data to predict where archaeological artefacts might be found during construction

In the City of London, archaeological work can account for up to 3% of total construction costs. Image licensed under CC BY-NC 2.0 – Flickr: Roberto Trm

Democrata uses big data analytics to reduce risk in major engineering projects. How? By automating environmental impact assessments with a supercomputer, using open data.

I catch up with Founder and CEO Geoff Roberts to hear his story and find out how it all works.

Hi Geoff, how are you doing?

I’m fine thanks, things are busy which is always a good sign.

What’s Democrata’s overall vision?

Democrata’s vision is to democratise data and provide insight and opportunities for both corporates and SMEs.

The company focuses on big data, small data, open data and proprietary data, but whatever the scale it’s the insight we generate and the software we design that gives our customers a commercial advantage. We take a business problem and then find datasets that, when linked and analysed using models we create, can help us find the solution to that problem.

What first got you excited about open data?

Over the last few years I've been looking at the open datasets out there and could see so much value. We went along to an event on open data held as part of Big Data Week at Harwell Campus last year, along with lots of other organisations, including the Science & Technology Facilities Council (STFC). We heard speakers from government and academia talk about the data they had under management and how they were trying to make it more open.

One of the speakers was Catherine Hardman, Deputy Director of the UK Archaeology Data Service, whose throwaway line at the end of her talk about the data they curated was:

"We haven’t worked out how to make money out of it yet, so let us know if you have any ideas!"

There was also a competition that day being run by STFC’s Hartree Centre – we were challenged to take some of the data mentioned and put a project proposal together that solved a problem.

How did you start using open data to help spot archaeological artefacts?

According to City of London research, the average cost of archaeological work in its jurisdiction is between 1% and 3% of total construction costs.

We put a project proposal together that used open data from the Archaeology Data Service, the British Geological Survey, Ordnance Survey, and English Heritage to de-risk major infrastructure projects by predicting where archaeological artefacts might be found.

This idea won us the Open Data Innovation competition, giving us the time and resources we needed to develop the concept using Europe’s largest supercomputer dedicated to industrial R&D, developed by ODI Member the STFC Hartree Centre, based at STFC's Daresbury Laboratory in Cheshire.

How does this non-commercial supercomputer work? What kinds of data and software do you use with it?

The data comes in various formats, including CSV and PDF files, as well as geographic information system (GIS) formats and text. As with most projects, there is a lot of cleaning and standardising to do. But without the open data we would not be able to carry out the project.

A real bonus was that the prize included the use of IBM’s BigInsights (IBM's own version of Hadoop) and IBM Content Analytics. The data scientists used R for the modelling, so that helped keep other licensing costs low.
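To give a flavour of the cleaning and standardising mentioned in this answer, here is a minimal sketch in Python (the team used R for modelling; Python is used here purely for illustration). It assumes a hypothetical CSV export of find records in which different sources label the same fields differently and some coordinates are missing. The column aliases, field names and sample rows are all invented for the example and are not taken from any of the datasets named above.

```python
import csv
import io

# Hypothetical aliases: different sources may label the same field differently.
COLUMN_ALIASES = {
    "easting": "easting", "east": "easting", "x": "easting",
    "northing": "northing", "north": "northing", "y": "northing",
    "period": "period", "era": "period",
}


def standardise_rows(csv_text):
    """Return a list of dicts with unified column names and numeric coordinates.

    Rows whose coordinates are missing or non-numeric are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for raw in reader:
        row = {}
        for name, value in raw.items():
            # Normalise the header, then map it onto a unified field name.
            key = COLUMN_ALIASES.get((name or "").strip().lower())
            if key is not None:
                row[key] = (value or "").strip()
        try:
            row["easting"] = float(row["easting"])
            row["northing"] = float(row["northing"])
        except (KeyError, ValueError):
            continue  # drop records that cannot be placed on the map
        row["period"] = row.get("period", "").upper() or "UNKNOWN"
        cleaned.append(row)
    return cleaned


# Invented sample data, for illustration only.
SAMPLE = """X, Y ,Era
532100,181200,roman
not recorded,181350,medieval
532480, 181020 ,
"""

records = standardise_rows(SAMPLE)
```

In this sketch the second sample row is dropped because its easting is not numeric, and the third is kept with its period marked as unknown. A real pipeline would likely log or repair such records rather than silently discard them.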

That sounds really interesting! Who has shown interest in the project so far? What’s next?

By using the supercomputer we found that we were able to predict where potential archaeological sites in the UK would be. The algorithm still needs more work but it's an exciting development which has the potential to save construction companies millions.

We’ve had a lot of interest from various sectors – especially major engineering companies and others involved in large-scale infrastructure projects. We’re now looking to raise funds for the next stage of development.

We also want to take up consulting roles to help businesses use predictive analytics to make better decisions. There is a huge amount of value in open data. The problem isn’t 'what to do', but which ideas to pick and run with. There is so much that could be done and so much to exploit.


You can read more on this story in features in the New Scientist, Computer Weekly, E&T and STFC.

Geoff Roberts is Founder and CEO of Democrata. Follow @DemocrataUK on Twitter

Find out more about the Hartree Centre and follow @HartreeCentre on Twitter

Have you got an open data story you'd like to share? Send it over to [email protected] and I'll get back to you.