
‘Testing on speed’: digital twins and synthetic data

Developments in technology over the last forty years have been vast. We moved from server-based technologies in the 90s to open source in the 00s, followed by the rise of the cloud era and large-capacity computing. Moving into the 10s, the Big Data revolution drove further innovations such as global connectivity and IoT sensors. All of these developments mean an increased demand to do more with data; but doing more with data means spending more money, as most big organisations are starting to find out.

The days of everything being on-prem in a data centre with fixed costs are gone. With the push to the likes of Google, Amazon and Microsoft Azure, many of these costs are now operational costs, where organisations pay by the month or by usage. As a result, a lot of the pain that cloud technology started to solve is coming back around in the shape of new problems. The other issue these big organisations face is the great amount of legacy technology (old transactional, non-transactional and analytics databases, for example) that has built up through the constant chasing of new technologies, leaving projects that never get finished.

In the digital age, businesses rely on a vast number of technologies, tools and processes in order to operate. There are probably three or four thousand tools you can choose from, with the deciding factors usually being who gives you the best rate or the best pitch, regardless of whether it’s the right fit for the business. Organisations are finding it increasingly hard to know which technology they should be using and how to continually optimise their tech stack. This is down to several factors:

• an inability to predict cost and performance across different scenarios

• unrealistic expectations about how easy the tech will be to implement, with every vendor promoting their product as the ‘silver bullet’

• the disruption caused by testing and optimising tech in a live setting

So, what’s the solution? Well, one way to overcome these challenges is by using digital twins, synthetic data and simulations.

What is a digital twin?

A digital twin is a virtual representation of an organisation which serves as a digital counterpart, used for simulating scenarios, testing technologies and integrations, optimising processes, and monitoring and maintaining systems – all in a safe, cost-efficient and fast environment. ‘Testing on speed’, as we like to call it. Data from the real world passes through that virtual model, allowing you to apply machine learning or deep learning, for example, with the idea being that you can push the results back into the real world to achieve great insights.
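To make that concrete, here is a minimal sketch in Python of a digital twin as a virtual model that mirrors a real system from its telemetry and can be forked to test ‘what-if’ scenarios without touching production. Every name and number in it is an illustrative assumption, not a description of a real platform.

```python
# Minimal sketch of a digital twin: a virtual model that mirrors a real
# system from its telemetry events, and can be forked to run "what-if"
# scenarios safely. All names and numbers are illustrative assumptions.

from dataclasses import dataclass
import copy

@dataclass
class PaymentsTwin:
    """Virtual counterpart of a (hypothetical) payments platform."""
    nodes: int = 10                      # processing nodes in the model
    capacity_per_node: float = 5_000.0   # assumed transactions/sec per node
    observed_tps: float = 0.0            # state mirrored from real telemetry

    def ingest(self, event: dict) -> None:
        """Update the twin's state from a real-world telemetry event."""
        self.observed_tps = event["tps"]

    def utilisation(self) -> float:
        return self.observed_tps / (self.nodes * self.capacity_per_node)

    def what_if(self, extra_nodes: int, projected_tps: float) -> float:
        """Fork the twin and test a scaling scenario in the model only."""
        scenario = copy.deepcopy(self)
        scenario.nodes += extra_nodes
        scenario.observed_tps = projected_tps
        return scenario.utilisation()

# Mirror the real system, then test a scaling decision safely.
twin = PaymentsTwin()
twin.ingest({"tps": 42_000})  # telemetry event from the real platform
print(f"current utilisation: {twin.utilisation():.0%}")
print(f"with 10 more nodes at 120k tps: {twin.what_if(10, 120_000):.0%}")
```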

As an example, we were tasked by one of our partners with creating a real-time platform capable of ingesting one million transactions per second. As you can imagine, that’s a huge undertaking, involving hundreds and hundreds of processing nodes globally. Thankfully, we were able to use digital twins to build and test that platform in a safe environment before committing to a global rollout that would cost millions of dollars.

What is synthetic data?

Using real data, especially if your project involves multiple companies, presents numerous problems in terms of compliance and security. While synthetic data has been around for a long time, it is becoming more and more prevalent in these test environments, and at a much greater scale, thanks again to the rise in technology and the lowering of costs. It allows us to take the ‘shape’ of the data that already exists in the real world, find the correlations within it, use algorithms to generate volumes of data that emulate what happens in the real world, and then do some pretty interesting testing.
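As a rough illustration of what taking the ‘shape’ of the data can mean in practice, the sketch below fits the per-column means and correlations of some numeric source data and then samples new synthetic records from that fitted shape. The dataset and column names are stand-ins invented for the example; a real project would also need to handle categorical fields, heavy tails and privacy checks.

```python
# Rough sketch of correlation-preserving synthetic data, assuming a purely
# numeric source table. All columns and values are invented for illustration.

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

# Stand-in for the real source data (an assumption for this example).
real = pd.DataFrame({
    "monthly_spend": rng.gamma(shape=2.0, scale=30.0, size=1_000),
    "data_usage_gb": rng.gamma(shape=3.0, scale=5.0, size=1_000),
    "support_calls": rng.poisson(lam=1.5, size=1_000).astype(float),
})

# Learn the "shape": per-column means and the covariance between columns.
mean = real.mean().to_numpy()
cov = real.cov().to_numpy()

# Generate as many synthetic records as we like from that fitted shape.
synthetic = pd.DataFrame(
    rng.multivariate_normal(mean, cov, size=10_000),
    columns=real.columns,
).clip(lower=0)  # crude fix-up: these quantities can't be negative

print(real.corr().round(2))       # correlations in the source data
print(synthetic.corr().round(2))  # should be broadly similar
```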

Another opportunity provided by synthetic data, which has never really been available before, is to model your own problem but in a greater universe. For example, in a telco world, businesses always want to get more customers, but everybody in the UK already has a mobile phone and broadband. So how can you do competitive marketing? By modelling out where your competitors are doing their campaigns, and understanding how you can be more effective, you can actually beat them in the marketplace. In the end it will get to a point where everybody has this technology and therefore nobody gets an advantage, but for the early adopters it’s a huge win.

Enhancing digital twins with synthetic data

When you combine the digital twin and the synthetic data, all of a sudden you can start modelling the future. Synthetic data expands the use case of a digital twin to emulate scenarios with different external factors, providing a more realistic setting in which to test things like market models, and to forecast costs and risks in an uncertain world. Using this method, you can determine whether a particular change is worth doing or not, and this can apply on a business level, a technology level, or even a process level.
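As a sketch of what that forecasting could look like, the example below pushes thousands of synthetic demand scenarios through a toy cost model of two candidate technology stacks inside the twin and compares the resulting cost distributions. The cost model, the two stacks and every number in it are assumptions made up for illustration.

```python
# Monte Carlo sketch: run synthetic demand scenarios through a toy cost
# model of two hypothetical tech stacks inside the digital twin.

import numpy as np

rng = np.random.default_rng(seed=7)

# 5,000 synthetic scenarios of peak transactions per second next year.
peak_tps = rng.lognormal(mean=np.log(50_000), sigma=0.4, size=5_000)

def yearly_cost(tps, fixed, per_node, tps_per_node):
    """Toy cost model: fixed platform cost plus the nodes needed for the peak."""
    nodes = np.ceil(tps / tps_per_node)
    return fixed + nodes * per_node

# Two hypothetical options with different cost profiles.
cost_a = yearly_cost(peak_tps, fixed=200_000, per_node=15_000, tps_per_node=4_000)
cost_b = yearly_cost(peak_tps, fixed=500_000, per_node=8_000, tps_per_node=5_000)

for name, cost in [("stack A", cost_a), ("stack B", cost_b)]:
    print(f"{name}: median ${np.median(cost):,.0f}, "
          f"95th percentile ${np.percentile(cost, 95):,.0f}")
```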

One point to note – the data analysis is everything. All the tools and technologies in the world will not help if you start with poor quality data. So when you're generating synthetic data, you have to understand the source data that you’re trying to mimic.

Limitations and challenges

As you’d probably imagine, these technologies cost a huge amount to set up, run and scale, given that they rely on the cloud. While we’re seeing everyone rush to the cloud at the moment, we predict a return to some form of on-prem, especially for big workloads, where the cost is much easier to manage. Another way to reduce costs is to shut down the legacy technology we mentioned at the start, saving businesses a small fortune in operational spend.

As well as the cost implications, there are still question marks over the accuracy of digital twins and synthetic data. There is a huge amount of bias in data, and simply creating simulated data from real data doesn’t necessarily guarantee a great result either. While there is analysis you can do on simulated data to understand what bias you’ve got in your current platforms, nothing comes for free and it all requires a huge amount of hard work – there are never silver bullets.
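One simple form of that analysis is to compare each column of the synthetic data against the real source it was modelled on, so obvious distortions show up before the twin is used for decisions. The sketch below assumes the real and synthetic DataFrames from the earlier example, and a two-sample Kolmogorov–Smirnov test is just one possible check among many.

```python
# Sketch of a fidelity/bias check: compare each numeric column of the
# synthetic data against the real source data it was modelled on.
# Assumes `real` and `synthetic` DataFrames like the earlier sketch.

import pandas as pd
from scipy.stats import ks_2samp

def compare_columns(real: pd.DataFrame, synthetic: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for col in real.columns:
        stat, p_value = ks_2samp(real[col], synthetic[col])
        rows.append({
            "column": col,
            "real_mean": real[col].mean(),
            "synthetic_mean": synthetic[col].mean(),
            "ks_statistic": stat,   # large values flag distribution drift
            "p_value": p_value,
        })
    return pd.DataFrame(rows).round(3)

print(compare_columns(real, synthetic))
```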

With around 16,000 technology vendors on the market, each needs a way to showcase their technology in a realistic setting and prove that they can deliver the outcomes they promise. By building a digital twin with synthetic data, you can compare vendors in a neutral environment that you control, without using real data, helping you maximise your return on investment and find the technology that works best for your needs.