Two teams of five two-legged machines chasing a single ball and trying to kick it into the opponents’ goal: that is robot football. Even detecting the ball is a challenge for these players. Tim Laue from the University of Bremen explains the important role deep learning plays and what a training camp for robotic footballers looks like.

What role does deep learning play in robot football?

Deep learning is a technique that is good at identifying and classifying objects. Some mobile phones use it: if you search for “bicycle” in your picture gallery, the program will find photos that show bicycles without you ever having tagged them. In robot football, it is essential to identify your teammates, the playing field and above all the ball immediately and unambiguously from the recorded video images. So we use deep learning for that.

Don’t the ball and the other players have a tracking device that makes recognition easier?

No. The ball used to be orange, and it was usually the only orange spot on the field. You simply had to look for this color value while processing the images, which was comparatively easy. Today the ball has a black and white pattern and is easily confused with shadows on the ground or with parts of other robots. You soon reach the limit with a classic image processing approach, such as scanning for dark spots on a large white spot.
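To illustrate why the orange ball was comparatively easy, here is a minimal sketch of a color-value search. It is not the team's actual code: the image is a plain nested list of RGB tuples, and the hue, saturation and brightness thresholds as well as the function names are assumptions chosen for this example.

```python
import colorsys

def is_orange(rgb, hue_range=(0.04, 0.12), min_sat=0.5, min_val=0.5):
    """Return True if an (R, G, B) pixel with 0-255 channels looks orange.

    The thresholds are illustrative assumptions, not tuned values.
    """
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, sat, val = colorsys.rgb_to_hsv(r, g, b)
    return hue_range[0] <= hue <= hue_range[1] and sat >= min_sat and val >= min_val

def find_orange_centroid(image):
    """Return the (x, y) centre of all orange pixels, or None if there are none."""
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if is_orange(pixel):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Toy 8x6 "field": green everywhere, with a 2x2 orange ball.
GREEN = (0, 150, 0)
ORANGE = (255, 128, 0)
field = [[GREEN] * 8 for _ in range(6)]
for y in (2, 3):
    for x in (4, 5):
        field[y][x] = ORANGE

centroid = find_orange_centroid(field)
print(centroid)
```

A black and white ball offers no single color value to search for, which is why this kind of test no longer works.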

So, you are now teaching your robots the concept of the ‘ball’?

We are designing a so-called neural network. We show this network a large number of images that were previously categorized by humans. Along with each image, the network receives the information whether it depicts a ball or not. This way the network learns, image by image, what constitutes a ‘ball’. Normally you would define a list of properties that the software has to scan for.

These days, we turn the learning process around and let the software work out from the images themselves which properties make up a ball. Two factors mainly influence the result. On the one hand, there is the range of variation in the example material and the number of times we feed these images to the network. On the other, we determine the depth of the network.
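The idea of learning from labeled examples through many repeat runs can be sketched in a few lines. The example below is a deliberately tiny stand-in: a single artificial neuron (logistic regression) trained by gradient descent, not a deep network and not the team's software. The four-pixel "patches", their brightness values, the learning rate and the number of repetitions are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, patch):
    """Return the estimated probability that a patch shows a ball."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, patch)))

def train(patches, labels, epochs=3000, learning_rate=0.5):
    """Fit one neuron by repeatedly showing it all labeled patches."""
    n = len(patches[0])
    m = len(patches)
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):  # one "repeat run" over the whole image set
        grad_w, grad_b = [0.0] * n, 0.0
        for patch, label in zip(patches, labels):
            error = predict(weights, bias, patch) - label
            for i, x in enumerate(patch):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / m for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / m
    return weights, bias

# Hypothetical 4-pixel brightness patches (0 = black, 1 = white).
patches = [
    [1.0, 0.0, 0.0, 1.0],  # ball: strong black/white pattern
    [0.9, 0.1, 0.2, 1.0],  # ball
    [1.0, 0.2, 0.1, 0.9],  # ball
    [0.5, 0.5, 0.5, 0.5],  # plain field
    [0.4, 0.5, 0.6, 0.5],  # plain field
    [0.1, 0.1, 0.2, 0.1],  # shadow
    [1.0, 1.0, 0.9, 1.0],  # white line
]
labels = [1, 1, 1, 0, 0, 0, 0]  # categorized by a human: ball or not

weights, bias = train(patches, labels)
predictions = [1 if predict(weights, bias, p) > 0.5 else 0 for p in patches]
print(predictions)
```

Nobody tells the program which pixels matter; the weights are worked out from the labeled examples alone. A deep network stacks many layers of such units, which is what the depth mentioned above refers to.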

How long does such a training last?

In order to get reasonably satisfactory results, we need more than 20,000 images, of which only a fraction actually contain a ball, as well as hundreds or even thousands of repeat runs over those images before the network picks up the characteristics of a ball.

Nowadays, all teams in the RoboCup use this method because it produces pretty solid results, even when lighting conditions change and colors are different. However, you can still see robots running into the penalty area.

Why does it take so long to learn?

Computers are not as good at recognition as humans are. In fact, I can show a child a painting of a giraffe and it is highly probable that the next day when it visits the zoo, the child will recognize the unfamiliar creature with the long neck as a giraffe. A lot of processes happen when a child recognizes something.
The neural network has none of these. That means you have to show it the ball in all its possible variations: far away, close, half hidden by a teammate, in shadow, brightly lit and so on. The more variations we are able to offer the system, the better, always in the hope that the images cover as many as possible of the places on the playing field where a ball can be. The network can then abstract and recognize ball appearances that lie somewhere between the examples shown.

Is the speed at which the image is processed during the game important as well?

The robot then has to calculate: Is the ball rolling? How fast is it rolling? Which direction is it heading? Where am I right now? What do I do now? Only after all that does it determine its course of action. If I really want to evaluate every single image, I only have 16 milliseconds to do all of those calculations for each one. And we do want to process every single image. There may be a crucial piece of information hidden in any of them, so missing even one is not a good idea.
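As a rough sketch of the per-image questions above, the snippet below estimates the ball's speed and heading from two consecutive ball positions and checks the computation against the 16 millisecond budget. The positions in metres, the rolling threshold and the function names are assumptions made for this example; a real robot would also have to fit ball detection and self-localization into the same budget.

```python
import math
import time

FRAME_BUDGET_S = 0.016      # 16 ms per image, as stated in the interview
ROLLING_THRESHOLD = 0.05    # m/s; hypothetical cut-off for "the ball is rolling"

def ball_velocity(prev_pos, curr_pos, dt):
    """Return (speed in m/s, heading in degrees) from two (x, y) positions."""
    dx = curr_pos[0] - prev_pos[0]
    dy = curr_pos[1] - prev_pos[1]
    speed = math.hypot(dx, dy) / dt
    heading = math.degrees(math.atan2(dy, dx))
    return speed, heading

def process_frame(prev_pos, curr_pos, dt=FRAME_BUDGET_S):
    """Answer 'is it rolling, how fast, where to?' and report if we met the budget."""
    start = time.perf_counter()
    speed, heading = ball_velocity(prev_pos, curr_pos, dt)
    rolling = speed > ROLLING_THRESHOLD
    elapsed = time.perf_counter() - start
    return {
        "rolling": rolling,
        "speed": speed,
        "heading": heading,
        "within_budget": elapsed <= FRAME_BUDGET_S,
    }

# The ball moved 1.6 cm along the x axis between two consecutive images.
result = process_frame((0.0, 0.0), (0.016, 0.0))
print(result)
```

Because the velocity estimate needs two consecutive positions, a skipped image doubles the time step and blurs the estimate, which is one concrete reason not to miss a frame.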

A next step could be to link the optical information with other properties.

That would be a whole different kettle of fish. A computer is good at calculating things. But first you have to convert everything into quantifiable information. You are actually able to do that quite well with images these days. At the moment there is no way around this when it comes to robot football.

How does the robot decide where to look? Do NAOs not have a 360-degree camera?

That is the eternal question. Should all robots always have the ball in sight or should the tasks be distributed across the team? There are a few software programs that try to answer this question. In fact, the video recordings frequently show robots missing something important.

Mr. Laue, do you play football yourself?

No, I watch football, but I have no talent whatsoever myself. At present these two areas are still too far removed from each other. With real football knowledge, you’d be more likely to get in the way than be able to help.

Yet the long-term goal for the robot community is to one day really get robots to compete against a human team?

Somebody set a goal for 2050. I can’t say whether this is realistic.

Nevertheless, one thing is clear: compared to other challenges, football is still a very simple discipline. We have an entire Collaborative Research Center at the University of Bremen that is working on how to let robots act sensibly in a domestic environment. This is highly complex because this environment is far more unstructured than football. As a human being, I can also cook in an unfamiliar kitchen. When a robot enters these kinds of environments, it gets really complicated.

@Photos: Tim Laue, University of Bremen