I’m Dan Klein and I’m the Chief Data Officer at Valtech, a digital agency focused on business transformation.
We’re delighted to have been jointly shortlisted with DWP Digital in the Outstanding Analytics Infrastructure category at the 2018 Big Data Excellence Awards.
The project we’ve been shortlisted for is Universal Credit DataWorks, a common data infrastructure platform that’s used by all of DWP’s directorates.
When it’s fully rolled out, Universal Credit will serve around 10 million people in near real-time, so the data available is substantial. A common data platform allows DWP Digital to create valuable insights from the data gathered.
The insights generated from the new platform enable 10,000 Jobcentre colleagues to better support their claimants, and help around 1,000 DWP analysts support policymaking and respond to parliamentary questions. They also help the Fraud and Error team find instances of potential misuse.
Building a team
From the very beginning, DWP and Valtech set out to build a joint team to develop and assist in the implementation of the platform, which enables Business Intelligence reporting and data analytics services.
We blended the skills of DWP Digital colleagues with those of people who build user needs-focused data technologies. We succeeded in creating a fun and collaborative team that took on some significant challenges and pushed boundaries in leading-edge technologies. We brought everyone from Jobcentre specialists to young ‘fast streamers’ together with user experience designers and data technologists to create a dynamic ‘test and learn’ delivery environment.
The challenge for us was to provide an analytics platform that could cope with many issues and offer timely, accurate, and transparent data to its users daily. The existing Universal Credit team used a schema-less database to remove the need for expensive database schema modification – effectively pushing that responsibility downstream.
Traditionally, modelling transactional data for analytics requires first exploring, analysing, and understanding it from both the business context and the end-user perspective. However, there was no schema release process, and the existing team was already overwhelmed with work and unable to absorb the additional load that introducing one would have created. The traditional approach was unrealistic given the large volume and frequency of changes, and the lack of knowledge of what these changes were and where they occurred.
Finding a solution
Valtech and DWP Digital agreed a low-impact solution: automatically generating a representation of the internal Java class graph (120 micro-services, written in over 500 classes, manipulating 15,000 data attributes), which gives the team the latest state of their data schema. We then developed a two-step approach to the data pipeline that keeps the data in sync and up to date. This new approach rebuilds the schemas on the fly as the underlying source data changes over time.
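To illustrate the idea of deriving a schema from a class graph, here is a minimal sketch. It is not the DataWorks implementation: it uses Python dataclasses as a stand-in for the Java classes, and the `Address` and `Claimant` classes, the `derive_schema` and `schema_diff` helpers, and all attribute names are hypothetical.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Address:  # hypothetical class, for illustration only
    postcode: str
    city: str


@dataclass
class Claimant:  # hypothetical class, for illustration only
    claimant_id: str
    date_of_birth: str
    address: Address


def derive_schema(cls, prefix=""):
    """Walk a class graph and return {dotted attribute path: type name}."""
    schema = {}
    for field in fields(cls):
        path = prefix + field.name
        if is_dataclass(field.type):
            # Nested class: recurse and flatten its attributes under this path.
            schema.update(derive_schema(field.type, path + "."))
        else:
            schema[path] = field.type.__name__
    return schema


def schema_diff(old, new):
    """Report which attribute paths were added, removed, or changed type."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(path for path in shared if old[path] != new[path]),
    }
```

Calling `derive_schema(Claimant)` after each change to the classes and comparing the result with the previous run through `schema_diff` is one way a downstream pipeline could notice schema changes without a manual release process.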
In addition, the team developed a multi-layered platform, designed with data sensitivity in mind, which offers different levels of data freshness/completeness and cleanliness/quality. To support the Role Based Access Control security model, views with the sensitive attributes stripped out are automatically produced for every relevant layer. As a result, only people with the relevant permissions can access sensitive data.
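The automatic generation of redacted views can be sketched as follows. This is an illustrative assumption rather than the platform's actual code: the layer names, table name, column names, the `_redacted` suffix, and the `build_view_sql` helper are all hypothetical.

```python
def build_view_sql(layer, table, columns, sensitive):
    """Build a CREATE VIEW statement that omits the sensitive columns."""
    visible = [column for column in columns if column not in sensitive]
    if not visible:
        raise ValueError(f"No non-sensitive columns left in {layer}.{table}")
    column_list = ", ".join(visible)
    return (
        f"CREATE VIEW {layer}.{table}_redacted AS "
        f"SELECT {column_list} FROM {layer}.{table}"
    )


# Hypothetical layers, table, and columns, for illustration only.
LAYERS = ["raw", "curated"]
COLUMNS = ["claim_id", "national_insurance_number", "claim_status"]
SENSITIVE = {"national_insurance_number"}

statements = [build_view_sql(layer, "claims", COLUMNS, SENSITIVE) for layer in LAYERS]
```

Granting most roles access only to the `_redacted` views, and a smaller set of roles access to the underlying tables, would then enforce the permission split at the database level.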
Realising the benefits
The Universal Credit DataWorks platform offers many benefits to the department:
- It provides faster time to market, with data available more quickly for analytics
- The trade-off between data freshness/completeness and quality/cleanliness is now a user’s choice and not a technical one
- Most important of all – the ‘man-in-the-middle’ bottleneck has been removed. Conversations happen directly between end data users and data producers. This leads to better understanding of the meaning of the data by the development teams
- Data transformations are explicit and provided to the end user in the form of queryable lineage tables
- Increased productivity as less time is spent trying to get data
- Ability to identify and act on instances of fraud as they happen
The 2018 winners will be announced at the ceremony on Wednesday 16 May. Until then, we are all keeping our fingers crossed! You can find this year’s full shortlist and follow our progress on the Big Data Excellence Awards 2018 website.