
At the RSS conference in Freiburg, robotics researchers presented new ideas for making robots' independent learning more efficient. The key is leaving out the right elements.

From the point of view of AI and robotics research, providing robots with precise motion sequences that they can repeat reliably and without any problems is comparable to a digital Stone Age. What matters today is equipping robots with algorithms that allow them to find their way around the unknown as independently, flexibly and quickly as possible. The company motto of DeepMind, an AI company in the Alphabet group, sums up the underlying promise nicely: "Solve intelligence. Use it to make the world a better place." Anyone who manages to reproduce intelligence digitally doesn't have to worry about the rest.


However, the actual problems researchers are currently dealing with seem rather mundane: a robot arm is supposed to learn a kind of flicking or wiping motion in order to throw various objects into boxes and sort them in the process, a computer is supposed to steer a vehicle along a road without accidents, and so on.

There are at least two basic problems: the physical properties of the objects involved are usually not fully known, and the solution sequences predefined by trainers are often only partial solutions, and rarely perfect ones at that. Robots should therefore learn, by trial and error, to find their own solutions in inadequately defined situations and to assemble the promising elements into new, better solutions. Reinforcement learning, i.e. interactive experimentation in which the most successful attempts are positively reinforced, is usually the method of choice. The decisive factor is the choice of reward system: it must prevent endless trial and error without ruling out successful attempts from the start.
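The trial-and-error loop with positive reinforcement can be sketched in a few lines. The toy task, the object and bin names, and the reward values below are invented for illustration; this is a minimal tabular example, not any specific research system:

```python
import random

# Toy task: each "object" type (state) must go into the matching bin (action).
# The reward is +1 for the correct bin and 0 otherwise -- a sparse signal
# that the agent discovers purely by trial and error.
OBJECTS = ["ball", "cube", "pen"]
BINS = [0, 1, 2]
CORRECT = {"ball": 0, "cube": 1, "pen": 2}  # hidden from the agent

def train(episodes=2000, epsilon=0.2, alpha=0.5, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in OBJECTS for a in BINS}  # learned value table
    for _ in range(episodes):
        s = rng.choice(OBJECTS)
        # epsilon-greedy: mostly exploit the best-known bin, sometimes explore
        if rng.random() < epsilon:
            a = rng.choice(BINS)
        else:
            a = max(BINS, key=lambda b: q[(s, b)])
        r = 1.0 if CORRECT[s] == a else 0.0      # positive reinforcement
        q[(s, a)] += alpha * (r - q[(s, a)])     # one-step value update
    return q

q = train()
policy = {s: max(BINS, key=lambda b: q[(s, b)]) for s in OBJECTS}
print(policy)
```

The `epsilon` parameter is exactly the trade-off described above: too little exploration rules out successful attempts, too much means endless trial and error.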


A robot from a team of researchers from Princeton, Google, Columbia and MIT has now succeeded in deriving control parameters for gripping and throwing different objects from visual observations, using trial and error. Many challenges are hidden in this process: the robot must recognize objects and their positions in a messy pile, then grab, accelerate, and release them at the right moment in the right place. The objects' mass and weight distribution are just as unknown as their flight characteristics.

The possibilities of such a solution are tempting: Robots that perform similar tasks in industry could become significantly faster and multiply their maximum reach by throwing.

The newly developed TossingBot uses its own observations to correct the predictions of a simple physical throwing model and independently optimizes grip position, throwing speed and release point. Some 15,000 test attempts were necessary before it could throw an average of 500 objects per hour into the correct boxes; after that, it tossed the objects without any further errors.
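The core idea, an analytic ballistic model supplying a first guess that learned corrections refine, can be illustrated with a one-dimensional sketch. The drag factor, learning rule and learning rate below are simplified assumptions for illustration, not the authors' actual network:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def ideal_distance(v):
    """Range predicted by the simple physical model (45-degree throw, no drag)."""
    return v * v / G

def real_distance(v, drag=0.85):
    """Stand-in for the real world: unknown drag shortens every throw."""
    return drag * v * v / G

def throw_with_residual(target, trials=50, lr=0.2):
    # Analytic first guess: invert the ideal model for the target distance.
    v0 = math.sqrt(G * target)
    dv = 0.0  # learned residual correction on the release speed
    for _ in range(trials):
        v = v0 + dv
        error = real_distance(v) - target  # observed over-/undershoot
        dv -= lr * error                   # nudge the residual to cancel it
    return v0 + dv, real_distance(v0 + dv)

v, landed = throw_with_residual(target=2.0)
print(round(v, 3), round(landed, 3))
```

The physical model alone undershoots because of the unmodeled drag; after a few dozen observed throws, the residual has raised the release speed enough to land on target.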


Robots don't have to start from scratch: trainers can provide the device with various solutions. So-called imitation learning faces some typical issues: the number of training runs required should be small, the robot should develop solutions that are better than the initial input, and the system should not be overly sensitive to very bad input.

A research group at King Abdullah University in Thuwal, Saudi Arabia, proposes a method called OIL, for Observational Imitation Learning. OIL delivers significantly more robust results than the mere imitation of more or less good training runs, and it reaches those results faster than learning systems based primarily on rewards. The latter may require no human training at all, but they often fail to reach their full potential because the reward system leaves too much leeway.

OIL therefore evaluates the input and adopts only the most successful sequences, for example when driving a car over a test track. This way, input from a large number of trainers can be processed, which also makes it possible to explore different strategies. At the same time, OIL does not waste time on meaningless options, because the trainers, unlike autonomous learning systems, exclude them from the outset.
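The selection step, keeping only the best-scoring trainer input for each situation, can be sketched as follows. The situations, actions and scores are invented placeholders; real systems score whole sequences against a performance measure rather than single labeled pairs:

```python
def filter_demonstrations(demos):
    """demos: list of (situation, action, score) triples from many trainers.
    Keep only the action from the best-scoring demonstration per situation,
    discarding weaker trainer input instead of imitating it."""
    best = {}
    for situation, action, score in demos:
        if situation not in best or score > best[situation][1]:
            best[situation] = (action, score)
    return {situation: action for situation, (action, _) in best.items()}

demos = [
    ("sharp_curve", "brake_early", 0.9),
    ("sharp_curve", "brake_late", 0.4),   # worse trainer input, discarded
    ("straight", "accelerate", 0.8),
]
policy = filter_demonstrations(demos)
print(policy)
```

This also shows why such a system tolerates bad trainers: a poor demonstration lowers nothing, it simply never becomes the imitated behavior for its situation.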

The team used the algorithm to control a drone and a simulated car. Compared with other algorithms and with human pilots, OIL was extremely successful. The expert human driver, for example, steered the car along the route slightly faster but made more than twice as many mistakes as OIL. What gives pause, though: OIL made mistakes too.