Marnix Zoutenbier, Vincent Bos, © Demcon

In an era where data increasingly drives innovation, Demcon Data Driven Solutions (DDS) plays a crucial role. Using advanced algorithms and synthetic data, Demcon is helping a range of industries operate more efficiently and effectively. Marnix Zoutenbier and Vincent Bos explain the added value of synthetic data and describe Demcon DDS’s innovative approach. “If you feel like you can do more with data, it’s time for a cup of coffee with us,” says Zoutenbier, referring to Demcon’s mission to help companies use their data more effectively.

Much of the modern economy runs on data, which is often called “the new gold” for good reason. The more data, the better the understanding of underlying processes and the better an organization can make the right decisions for the future. “The absolute amount is not even the most important thing here,” says Zoutenbier, “but rather how well the data you use to develop a model overlaps with, and represents, the full scope in which that model will be used. Then all the variations you encounter in practice are also in your training data. As a result, you do need a lot of data points, but quantity matters less than variation.”
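Zoutenbier’s point that variation matters more than volume can be made concrete with a toy coverage measure. Assuming the scope of use can be described as a grid of discrete conditions (the lighting and ripeness categories below are hypothetical), the sketch counts how much of that grid the training data actually touches. It is an illustration, not a method described by Demcon.

```python
from itertools import product


def scope_coverage(training_samples, scope):
    """Return the fraction of condition combinations in `scope` seen at least once in training."""
    all_cells = set(product(*scope.values()))
    seen = {tuple(sample[key] for key in scope) for sample in training_samples}
    return len(all_cells & seen) / len(all_cells)


# Hypothetical operating scope: any combination of these conditions can occur in use.
scope = {
    "lighting": ["sunny", "overcast", "artificial"],
    "ripeness": ["unripe", "ripe", "rotten"],
}

# Lots of data, little variation: 1,000 samples recorded under a single condition.
large_narrow = [{"lighting": "sunny", "ripeness": "ripe"}] * 1000
# Little data, full variation: one sample per combination (9 in total).
small_varied = [{"lighting": light, "ripeness": ripe} for light, ripe in product(*scope.values())]

print(scope_coverage(large_narrow, scope))  # 1 of 9 cells covered
print(scope_coverage(small_varied, scope))  # all 9 cells covered -> 1.0
```

The thousand-sample dataset covers only one ninth of the scope, while nine well-chosen samples cover all of it.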

Strawberry grower and pulmonologist

That principle applies as much to data-driven giants like Google and Amazon as it does to a strawberry grower or a pulmonologist in the operating room. So the challenge for all those data users is always to gather more, and more varied, data. But at some point you hit a limit: you can install as many sensors, cameras, or lidar devices as you like, but they can only detect what actually occurs. Hence the importance of synthetic data: artificially generated data that mimics the statistical characteristics and patterns of real-world data.

Marnix Zoutenbier, © Demcon

Demcon Data Driven Solutions, an independent company within the Demcon holding company, specializes in exactly that. Demcon DDS has two main branches: algorithm development and synthetic data. To explain the power of synthetic data, Zoutenbier, an experienced statistician and data scientist, first describes what the algorithm work focuses on. Demcon distinguishes three application areas within algorithmics: vision, time series, and process optimization. Projects range from automatic inspection of the water supply network to improving medical diagnoses and halving the lead times of a manufacturing process. “Together with my colleagues, we cover the whole spectrum, from classical methods to reinforcement learning and deep learning. After all, every question needs its own approach.”

Four major benefits of synthetic data

1. The ability to generate specific data at a scale and variation that would not otherwise be feasible.
2. The ability to generate data before a system is operational. For example, vision systems for a production line can be trained before the line is running.
3. Rich labeling: every generated feature can be annotated at will and with complete consistency.
4. Complete control over datasets. If the real data is biased or incomplete, for example because certain edge cases are under-represented, synthetic data can fill those gaps.
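Benefits 3 and 4 go together: when records are generated from known parameters, the label comes for free, and you decide how many records of each class to generate. The sketch below tops up under-represented, and even entirely missing, classes until all classes are the same size. The class names, the `synthetic_top_up` helper, and the scene parameters are hypothetical; in practice each parameter set would be passed to a renderer or simulator.

```python
import random
from collections import Counter


def synthetic_top_up(real_labels, classes, rng):
    """Generate labeled synthetic records so every class in `classes` matches the largest one.

    The label is simply the parameter each record was generated from, so it is exact by construction.
    """
    counts = Counter(real_labels)  # classes absent from the real data count as 0
    target = max(counts[c] for c in classes)
    synthetic = []
    for cls in classes:
        for _ in range(target - counts[cls]):
            synthetic.append({
                "label": cls,
                # Hypothetical scene parameters that a renderer would consume.
                "lighting_lux": rng.uniform(100, 100_000),
                "camera_angle_deg": rng.uniform(-30, 30),
            })
    return synthetic


rng = random.Random(0)
# Hypothetical real dataset: rotten berries are rare and bruised ones were never recorded.
real_labels = ["ripe"] * 900 + ["unripe"] * 95 + ["rotten"] * 5
classes = ["ripe", "unripe", "rotten", "bruised"]

synthetic = synthetic_top_up(real_labels, classes, rng)
combined = Counter(real_labels) + Counter(s["label"] for s in synthetic)
print(combined)  # every class now has 900 samples
```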

See how Demcon works here: https://vimeo.com/810084583/7847b90c9e

Synthetic data

Those algorithms are essential to any data project, but especially when synthetic data is involved. Vincent Bos, a mathematician with a background in scientific visualization, leads the team focusing on synthetic data. This data is used to train AI models without depending on real data, which offers several advantages. 

Stages of a strawberry, created with synthetic data, © Demcon DDS

“We create synthetic data by building realistic 3D objects in an environment and simulating the resulting sensor signal, for example with a light source and a camera,” explains Bos. Demcon DDS uses advanced 3D animations and simulations to generate, for example, images of strawberries at different stages of ripeness or rot. These synthetic strawberries are then used to train AI models that are robust and consistent. “The big advantage we have is that we know exactly what is what. The 3D models are self-generated, so we know what each pixel means,” Bos adds. The generated images are the same kind of output that “normal” cameras, or other sensors such as radar, lidar, and CT scanners, would provide, so users don’t have to change their workflow.
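Bos’s point that “we know what each pixel means” can be shown with a deliberately tiny renderer: a sphere standing in for a strawberry, lit by a single directional light and viewed by an orthographic camera. The label mask is produced in the same loop as the image, so no manual annotation is needed. This is a toy sketch; the `render_berry` function, its ripeness-to-color mapping, and the light direction are assumptions, and real pipelines use full 3D models, materials, and sensor simulation.

```python
import math


def render_berry(size, ripeness, light=(0.3, -0.3, 0.9)):
    """Render a sphere (a stand-in strawberry) seen by an orthographic camera.

    Returns (image, mask): image is a size x size grid of (r, g, b) floats in [0, 1];
    mask is a size x size grid of labels, 0 = background, 1 = berry.
    ripeness runs from 0.0 (green) to 1.0 (red); this color mapping is a hypothetical placeholder.
    """
    norm = math.sqrt(sum(c * c for c in light))
    lx, ly, lz = (c / norm for c in light)  # unit vector pointing towards the light
    base = (ripeness, 1.0 - ripeness, 0.1)
    background = (0.9, 0.9, 0.9)
    radius = 0.4 * size  # sphere radius in pixels
    image, mask = [], []
    for row in range(size):
        img_row, mask_row = [], []
        for col in range(size):
            # Pixel center relative to the image center, in units of the sphere radius.
            x = (col + 0.5 - size / 2) / radius
            y = (row + 0.5 - size / 2) / radius
            d2 = x * x + y * y
            if d2 <= 1.0:
                z = math.sqrt(1.0 - d2)  # (x, y, z) is the unit surface normal here
                diffuse = max(0.0, x * lx + y * ly + z * lz)  # Lambertian shading
                shade = 0.2 + 0.8 * diffuse  # plus a small ambient term
                img_row.append(tuple(shade * c for c in base))
                mask_row.append(1)  # label known exactly: this pixel shows the berry
            else:
                img_row.append(background)
                mask_row.append(0)
        image.append(img_row)
        mask.append(mask_row)
    return image, mask


image, mask = render_berry(size=32, ripeness=0.8)
berry_pixels = sum(map(sum, mask))
print(berry_pixels, "of", 32 * 32, "pixels are labeled 'berry'")
```

Rendering the same scene at many ripeness values and light directions yields a labeled dataset whose masks are pixel-exact by construction.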

Medical applications

One of the most appealing applications of synthetic data at Demcon DDS is in the medical sector. For example, Demcon DDS develops realistic anatomical models of lungs with various defects, such as fluid accumulation, pneumothorax, and tumors. These synthetic lung models are used to train AI systems that can distinguish malignant from benign nodules, even in cases that are rare in practice. “We can create lung masses – lung nodules – that have never been seen before,” Bos says proudly. This allows physicians and AI systems to prepare for rare cases that do not appear in existing databases.
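One way to create nodules “that have never been seen before” is to describe their shape with a few parameters and sweep those parameters, including combinations that are rare in real scans. The sketch below generates 2D nodule outlines with adjustable lobulation and spiculation. It is a hypothetical, heavily simplified illustration: the `nodule_contour` function and its parameters are assumptions, and the anatomical lung models Demcon DDS builds are 3D and far more realistic.

```python
import math
import random


def nodule_contour(rng, mean_radius_mm, lobulation, spiculation, n_points=180, n_spikes=8):
    """Return (x, y) points in mm outlining a hypothetical 2D lung nodule.

    lobulation:  0..0.9, amplitude of smooth low-frequency bumps (a lobed outline)
    spiculation: length of sharp spikes, as a fraction of the mean radius
    """
    phases = [rng.uniform(0, 2 * math.pi) for _ in range(3)]
    spike_at = {rng.randrange(n_points) for _ in range(n_spikes)}
    points = []
    for i in range(n_points):
        theta = 2 * math.pi * i / n_points
        # The average of three harmonics lies in [-1, 1], so r stays positive while lobulation < 1.
        bumps = sum(math.sin(k * theta + p) for k, p in zip((2, 3, 5), phases)) / 3
        r = mean_radius_mm * (1 + lobulation * bumps)
        if i in spike_at:
            r += mean_radius_mm * spiculation
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


rng = random.Random(7)
# Sweep all combinations, including nodules that are both very lobed and very spiculated.
variants = [
    nodule_contour(rng, mean_radius_mm=r, lobulation=lob, spiculation=spic)
    for r in (3, 8, 15)
    for lob in (0.0, 0.2, 0.4)
    for spic in (0.0, 0.5)
]
print(len(variants), "nodule outlines")  # 18 nodule outlines
```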

Vincent Bos, © Demcon

Another big advantage of synthetic data is that it can help avoid unwanted bias. Bos: “In practice, you see that systems often work well on white people but not on Asian or African people, purely because the data they were initially trained on is limited. With synthetic data, we can make these systems more robust and inclusive.” DDS is now working on projects in which it generates data for different skin types to improve the training of AI models.

Defence

Demcon DDS is also working with the Dutch Ministry of Defence (Defensie) on projects such as detecting military vehicles in different environments using only synthetic data. Precisely because of its almost unlimited diversity, this approach has produced better results than traditional algorithms based on real data. According to Bos, this shows how powerful and versatile synthetic data can be. “For Defence, for example, we create images of a tank parked somewhere concealed, with a passenger car next to it. That way, a model learns that a car is not a military object. You often can’t capture this variety of situations with real data.”
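The military-vehicle example resembles what is often called domain randomization: scene parameters such as environment, time of day, concealment, and the presence of look-alike civilian objects are sampled at random, so a model sees far more variety than real recordings could provide. The sketch below generates scene descriptions only (rendering is left out); the environment list, occlusion ranges, and the `random_scene` helper are hypothetical, not Demcon DDS’s implementation.

```python
import random
from dataclasses import dataclass, field

# Hypothetical scene catalogue; a real pipeline would map these to 3D assets.
ENVIRONMENTS = ["forest", "urban", "desert", "snow"]


@dataclass
class SceneObject:
    kind: str
    military: bool    # the training label, known exactly because we placed the object
    x: float          # normalized position in the scene, 0..1
    y: float
    occlusion: float  # fraction hidden by foliage, buildings, camouflage, ...


@dataclass
class Scene:
    environment: str
    time_of_day_h: float
    objects: list = field(default_factory=list)


def random_scene(rng: random.Random, p_car: float = 0.5) -> Scene:
    """Sample one randomized scene description."""
    scene = Scene(environment=rng.choice(ENVIRONMENTS), time_of_day_h=rng.uniform(0, 24))
    tank = SceneObject("tank", True, rng.uniform(0, 1), rng.uniform(0, 1), rng.uniform(0.0, 0.8))
    scene.objects.append(tank)
    # Sometimes park a civilian car next to the tank: a hard negative the model must not flag.
    if rng.random() < p_car:
        car_x = min(1.0, tank.x + rng.uniform(0.05, 0.15))
        scene.objects.append(SceneObject("car", False, car_x, tank.y, rng.uniform(0.0, 0.3)))
    return scene


rng = random.Random(1)
scenes = [random_scene(rng) for _ in range(1000)]
with_car = sum(len(s.objects) == 2 for s in scenes)
print(f"{with_car} of {len(scenes)} scenes contain a civilian car as a negative example")
```

Because every object’s label comes from the generator, the car is always annotated as non-military, which is exactly the lesson the model is meant to learn.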

A real-life example illustrates the power of synthetic data well, Zoutenbier adds: “A while back, someone reported on LinkedIn that he had been fined 385 euros for talking on a phone when he wasn’t holding a phone at all; he was just scratching his ear. He was eventually able to prove this from the photo, but it should never have come to that. With synthetic data, you can teach an algorithm to better recognize these situations, and a thousand others, and avoid such mistakes.”

Strategy before data

There are few companies today for which data is not important. Yet the question “I have a pile of data here, what can I actually do with it?” should never be the starting point of the discussion, Zoutenbier emphasizes. “For every company, it always starts with its own strategy in the physical world. If, based on that strategy, you think data could be used better, that is where we like to come into the picture. Together we can then look at how the company can use its data effectively and purposefully, and even if the data is limited, we can synthesize more.”

Collaboration

This story is the result of a collaboration between Demcon and our editorial team. Innovation Origins is an independent journalism platform that carefully chooses its partners and only cooperates with companies and institutions that share our mission: spreading the story of innovation. This way we can offer our readers valuable stories that are created according to journalistic guidelines.