
In human history, our travel habits have changed enormously. I think that we are on the cusp of a new breakthrough. COVID-19, our awareness of climate change and a range of technological developments will take telepresence to a whole new level. In this post I will lay out my argument and follow with a call to action for others to help me build a telepresence robot.

Phase one: follow the food – from nomads to farmers

The travel habits of humans have changed continuously, depending on the technology available to us. For most of the past two million years or so (while we developed from Homo erectus to Homo sapiens) we lived as hunter-gatherers. We had to: we went where the food was, and that changed with the seasons.

Then we discovered we could coax nature into growing certain plants, and it became possible to settle down. We became farmers. This is often seen as the start of human civilisation. Apart from letting us live in one place, farming also enabled us to specialise and develop technology further: ploughing, writing, forging metal, etc. We lost freedom, but we starved less often and gained comfort and knowledge. We also travelled a lot less. Most people lived their whole lives without going further than a day's walk from the place where they farmed. And it makes sense when you think about it: what's the use of going further if you have to come back to work the patch of land that keeps you alive?

Phase two: follow the money – from farmers to city dwellers

With industrialisation we became better and better at letting machines do the work, using fossil fuels. As a consequence, the number of people working in agriculture plummeted in the most industrialised countries. Even my country, the Netherlands, which with a small population of 17 million people is one of the world's largest food exporters, employs only 2.5% of its workforce in agriculture. Working directly on food was no longer a thing. I have never eaten anything I've grown myself, and I've never wondered whether I would have enough food tomorrow or next year. So we started to work on other things, mainly in cities. The share of people living in cities rose from only 7% in 1800 to over 50% in 2007, and urbanisation is still picking up speed.

So in the past we lived close to our land, and we were spread out because a farmer needs a sizable portion of land for his or her work. City work is different. You can put thousands of people in a tower with a small footprint, and the densest and largest cities are actually the most productive. You might expect dense cities to mean less travel, but the sheer number of people in even the densest megacity means that the distance to work is increasing, not decreasing. And our industrialised society, running on fossil fuels, lets us travel distances that were unimaginable before. It no longer takes a week to travel a hundred kilometres: it takes an hour. And travelling around the world no longer takes years but days.

It sometimes seems to me we have become intoxicated with this newfound ability to travel. The wealthiest part of the world thinks it's completely normal to drive 100 km to work every day in a 2000 kg steel box and to go shopping on the other side of the world. Some expect this to continue endlessly, but I think they are engaging in an unimaginative extrapolation of the present into the future. I think travel is going to change drastically for three reasons: COVID-19, climate change and untapped technological potential.

COVID-19 has shown us that quickly jumping all over the world has its downsides when you want to limit the spread of disease. And just as the flu has been with us for over 2,000 years, COVID-19 is just one of a long list of diseases we can expect to deal with in the coming decades, especially since we travel so much and so fast, and since we have developed a pretty unhealthy relationship with animals. (It's one of the many reasons I don't eat meat anymore.)

Climate change has made us aware that we are strip-mining the natural world that keeps us alive. And travel is responsible for an increasing share of greenhouse gas emissions. I believe it is eminently possible to build cars and airplanes that emit only an insignificant amount of greenhouse gas and use recycled resources with limited impact on nature. BUT, realistically, it will take us many decades to get there, so we will have to take a hard look at our travel habits if we want to preserve nature for our children and avoid the worst consequences of climate change.

I think we have to think more outside the box, and that we have a number of technologies at our disposal that can help. Which brings me to my predictions…

Phase three: follow your emotion – from city dwellers to digital nomads

Humans tend to underestimate change or assume it is linear. In reality, we have revamped our society since the industrial revolution, and the most important changes are accelerating exponentially. Since building a business or implementing a policy takes time, we will increasingly have to aim at where the future will be in order to have any chance of hitting our target. And this is certainly true for travel.

I think that in the future we will move our physical bodies around less. Instead we will stay at places that make us feel good and we will gather people around us that make the experience of being there physically even more pleasurable.

While our bodies move less, we will move our 'spirit' around much more freely. I think telepresence robots and virtual reality (VR) will make it possible to visit people and places all over the world within seconds, and to work together much more efficiently, and even more effectively, than we do today. I don't deny that it currently makes sense, and is good for business, to gather people in offices and factories and have them travel to other work locations. I just think that this will no longer be the case in the future.

I even think this will go beyond work and extend to experiencing other cultures and places. And why stop there? Maybe you want to feel what it's like to be a cat or a small bird. How about visiting the Amazon rainforest to howl with the howler monkeys or prowl with the caimans? Maybe you want to visit the Serengeti and tread with the elephants or chill with the lions. Swimming with a dolphin, orca or whale, perhaps? Observing the earth like an eagle? Skimming the waves like an albatross? Discovering the depths of the ocean? With telepresence robots, experiencing all this in the first person while your body stays safely at home will become possible, and compelling.

To put this in technological terms: I think we are seeing, and will continue to see, a paradigm shift from cyborg to avatar. People getting inside robots in order to operate them (whether the robot is a car, crane or plane) might become increasingly rare, because it is too expensive, inconvenient, dangerous and limiting.

At the same time, I think the paradigm is moving from robot versus human to robot plus human. In Hollywood movies it is all about robots turning on us, but in reality it looks like robots will not replace us but enhance us. The combination of human and robot will be superior to fully autonomous robots in most tasks for the foreseeable future. That's why drones are replacing fighter planes, crane operators are staying on the ground, and surgery is increasingly done by robots plus surgeons (with the TU/e at the forefront). Even chess benefits from this approach: you probably knew that computers now outclass humans, but maybe you didn't know that a human plus a computer still outclasses the computer alone. I think the same will be true for robots in the foreseeable future. And that's why telepresence is so compelling.

What do I mean by telepresence?

Let me give some near future examples:

  • Visit your elderly mother.
    You want to visit your mother but you can't due to COVID-19. Talking on the phone is better than not talking at all, but you would rather see her too. She never got the hang of video conferencing, and besides, you don't want to make her leave her favourite chair to sit at the computer. So you inhabit your avatar and visit her that way. James S.A. Corey wrote a poignant short story about this.
  • An important business meeting without flight and hotel.
    By inhabiting a local avatar you can move around freely and interact wherever, whenever and with whomever you like. You decide to arrive a bit early so you can find a quiet spot and talk to some people before and after the larger group meeting. During the meeting you can look around so you don't miss anything, and others can see who you are talking to. And maybe it's not a meeting on the other side of the world: you just want to work from home today without missing that one meeting. Or maybe it's COVID-19 again.
  • Look how your kids, pets or spouse are doing.
    You open an app on your smartphone and start moving yourself through the house. You can check that there are no burglars, that your baby is sleeping safely, that your cat or dog has eaten and still has water, or that your teenager has stopped playing video games. Or you find your spouse watching TV in the living room and chat with him or her for a while. Maybe you just want to enjoy the view or feel like you are at home for a while.
  • Record professional quality video.
    Moving a camera around smoothly is not as easy as it sounds, which is why professionals often use expensive dollies that are a pain to set up. A telepresence robot could produce dolly-quality shots at much lower cost and with much less hassle. You could even program complex or repetitive movements and record visually more interesting videos from home without the help of a cameraman, so this could become a go-to tool for YouTube vloggers.
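
To make the camera example a bit more concrete: programming a repeatable dolly move could be as simple as interpolating between a few keyframes with an easing curve, so the robot starts and stops without a jolt. The sketch below is a minimal illustration, not Tero code; the `Keyframe` format (time in seconds, x and y in metres) and the function names are hypothetical, and a real robot would hand these setpoints to its own motion controller.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Keyframe:
    """Hypothetical keyframe: where the robot should be at time t."""
    t: float  # seconds since the start of the shot
    x: float  # metres
    y: float  # metres


def smoothstep(u: float) -> float:
    """Cubic ease-in/ease-out on [0, 1] with zero slope at both ends."""
    u = min(max(u, 0.0), 1.0)
    return u * u * (3.0 - 2.0 * u)


def position_at(keyframes: List[Keyframe], t: float) -> Tuple[float, float]:
    """(x, y) setpoint at time t, easing between consecutive keyframes."""
    if t <= keyframes[0].t:
        return keyframes[0].x, keyframes[0].y
    for a, b in zip(keyframes, keyframes[1:]):
        if a.t <= t <= b.t:
            s = smoothstep((t - a.t) / (b.t - a.t))
            return a.x + s * (b.x - a.x), a.y + s * (b.y - a.y)
    return keyframes[-1].x, keyframes[-1].y  # hold the last pose after the shot


def sample_path(keyframes: List[Keyframe], rate_hz: float) -> List[Tuple[float, float]]:
    """Setpoints at a fixed control rate; replay the list to repeat the move."""
    start, end = keyframes[0].t, keyframes[-1].t
    steps = int(round((end - start) * rate_hz))
    return [position_at(keyframes, start + i / rate_hz) for i in range(steps + 1)]
```

For example, `sample_path([Keyframe(0.0, 0.0, 0.0), Keyframe(4.0, 2.0, 0.0), Keyframe(6.0, 2.0, 1.0)], rate_hz=50)` produces 301 setpoints for a six-second move: two metres forward, then one metre sideways. The cubic `smoothstep` has zero velocity at every keyframe, which is what makes the shot look like a dolly rather than a jerky remote-controlled car.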

Let’s actually build a telepresence robot together!

Now I want to change gear. I've told you my vision for the future, and I would love to discuss it further on Twitter.

But in the rest of this post I would like to find out whether some of you are interested in actually building such a robot with me. I have just submitted a proposal, and if all goes well I will have 100k to spend over the coming years and can offer a place to work at the Eindhoven University of Technology and a host of academics to help out. So if you are, e.g., an entrepreneur or student interested in building such a robot: read on!

You might think that with all the billions invested in robots and all the advantages of telepresence, it would already be a mature technology. You would be wrong. The typical telepresence robot you can buy today is basically a tablet raised to chest height on a platform that you drive around by telling it to move forwards, backwards, left and right. There is a lot of room for improvement, and that's why we are launching project Tero (telepresence robot). We think it could become an open source platform for telepresence robot development. We have organised the goals around the human capabilities we would like to emulate, and give a first indication of how we could implement them at limited cost:

  • Moving
    For Tero, we don't need expensive and complicated acrobatics, and we don't need a robot with limbs. We just need something that can move over a flat surface, preferably without bumping into things. For humans, avoiding objects while we move is a largely automatic process. To mimic that, we want Tero to be able to calculate a path and avoid obstacles while being given only general directions. To further emulate human movement, we want Tero to be able to change direction instantly, so no backing up and turning as we do with a car. This can be achieved with three omnidirectional wheels, like the TU/e soccer robots (or a $130 DOIT kit for the Arduino). The head should be able to move up ('stand') and down ('sit'). This is pretty easy to implement using a standard 1000 mm stroke linear actuator (about $50 to $100, 4 kg).
    One advantage Tero has over humans is that it can comfortably hold a very stable position for long periods of time. So if you used a couple of Teros as remote-controlled cameras, you could record high-quality footage of NEON and EE events that would usually require large and expensive camera dollies.
  • Looking around
    Humans move their heads continuously. This is how we establish where we are and how we see details (see next point). It's also how we make eye contact, and it gives important non-verbal cues to others (see below). That's why Tero should have a pan-tilt unit. Ideally the remote user wears a VR headset and the movement of the headset is replicated by Tero.
  • Seeing
    If you combined the spatial resolution of the human eye (0.016 degrees with healthy 6/6 vision) and our field of view (170 degrees horizontal and 120 degrees vertical), you would end up with a high-quality 724 megapixel camera. If you add our colour depth (24 bit) and maximum refresh rate (120 Hz), you would need 2 terabits per second of bandwidth (2000x Gigabit Ethernet or 200x HDMI 1.4) to the optical unit of the telepresence robot.
    Fortunately, nature shows us a more frugal approach. Each optic nerve has only around a million fibres, which for two eyes translates to just a 2 megapixel sensor. That's because our detailed vision comes from a tiny spot (the fovea) covering only about 1% of the retina, and when we want to see details we move our eyeballs. We probably also limit updates to the part of the image that changes.
    One way to mimic that could be to use two cameras per eye: one wide-angle camera and one telephoto lens. Each would need only a limited resolution of, e.g., 2 megapixels. That's 4 megapixels you need to send instead of 724. Furthermore, you only need to update the pixels that change. The result would be perceived as very high-quality vision but would need only very limited bandwidth. The wide-angle camera could track the head movements and the telephoto camera would track the eye movements of the remote user. On top of human-level eyesight, we could add low-light sensitivity and the ability to zoom in.
    Unfortunately, this setup doesn't exist yet. There are experimental virtual reality headsets with added cameras for eye tracking, but they are not commercially available. Neither is a camera setup that tracks eye movements. So although this functionality wouldn't break the bank in terms of components, it would need to be developed. A first-phase implementation could be a setup using a high-resolution VR headset (like the Pimax 5K) and two head-tracking 2560×1440 (or 4K) cameras.
  • Hearing
    Although looking around and seeing well give us a strong sense of 'being there', hearing dominates how we experience a scene emotionally. Human hearing, combined with visual input, also allows us to single out specific voices (the so-called 'cocktail party effect').
    Current telepresence robots and webcams use only a low-quality microphone and amplifier and lack any form of voice isolation. A better solution would be a microphone array which, coupled with software, can separate sounds remarkably well (and there are open source implementations). Machine learning algorithms that don't have to work in real time already achieve very high-quality voice isolation. Multiple compact, affordable microphones with smart software seem the way forward and could give Tero affordable hearing that's far superior to that of humans, with low bandwidth requirements. But a first-phase implementation could be to give Tero a simple head-tracking directional microphone and state-of-the-art voice isolation that doesn't use video (e.g. NVIDIA RTX Voice).
  • Talking
    By replacing the toy speakers of telepresence robots with a good-quality wideband loudspeaker, it would be relatively easy to achieve good voice quality. The next phase could be a speaker with a directionality similar to a human's, so you could hear whether the head is turned.
  • Nonverbal communication
    Other users will see a head-sized screen (e.g. a tablet) showing the face of the remote user. To limit complexity, we will not give Tero and Deli limbs or actuators that reconstruct the movement of limbs. We will also not use an artificial head that mimics the remote user's facial expressions: apart from making the robot more complex (by an order of magnitude), it would also make it 'creepy'. If we deploy eye-tracking telephoto cameras, we will hide them so as not to give the user two sets of eyes. An open question is how we can represent facial expressions when the remote user is wearing a virtual reality headset. The optimal solution would be to animate a VR avatar that reads the expression using custom cameras. Another open question is how eye contact can be maintained when the camera is positioned beside or above the screen. Different approaches exist, but face warping seems to be the most popular one recently, and it can even be used to make you a little bit more attractive.
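
The 'change direction instantly' requirement in the Moving bullet comes down to the inverse kinematics of an omni-wheel base: any combination of forward, sideways and rotational speed maps directly to three wheel speeds, with no need to turn first. Here is a minimal sketch assuming three wheels spaced 120 degrees apart; the mounting angles, base radius and wheel radius are illustrative values, not a Tero design.

```python
import math
from typing import List, Sequence

# Hypothetical geometry, for illustration only (not a Tero spec).
WHEEL_ANGLES_DEG = (90.0, 210.0, 330.0)  # wheel positions around the body centre
BASE_RADIUS_M = 0.15   # distance from body centre to each wheel
WHEEL_RADIUS_M = 0.03  # radius of each omni wheel


def wheel_speeds(vx: float, vy: float, omega: float,
                 angles_deg: Sequence[float] = WHEEL_ANGLES_DEG,
                 base_radius: float = BASE_RADIUS_M,
                 wheel_radius: float = WHEEL_RADIUS_M) -> List[float]:
    """Inverse kinematics for an omni-wheel base.

    vx, vy: desired body velocity in m/s (body frame); omega: rotation in rad/s.
    Each wheel drives tangentially to the circle it sits on, so its rim speed is
    the body velocity projected onto that tangent plus the rotational term.
    Returns wheel angular speeds in rad/s.
    """
    speeds = []
    for a in angles_deg:
        theta = math.radians(a)
        rim_speed = -math.sin(theta) * vx + math.cos(theta) * vy + base_radius * omega
        speeds.append(rim_speed / wheel_radius)
    return speeds
```

Asking for pure rotation (`wheel_speeds(0.0, 0.0, 1.0)`) gives three equal speeds of about 5 rad/s, while pure sideways motion (`wheel_speeds(0.3, 0.0, 0.0)`) gives roughly -10, 5 and 5 rad/s. Those speeds sum to zero, so the base slides sideways without rotating.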
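
For 'calculate a path and avoid bumping into objects', one classic starting point is A* search on an occupancy grid. The sketch below assumes Tero already has a 2D map in which `#` marks an obstacle (building that map, e.g. from a depth camera or lidar, is a separate problem and not shown). It allows diagonal moves because an omni-wheel base can drive in any direction. This is only an illustration; a real robot would more likely build on an existing navigation stack.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, col)


def astar(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """A* search on an occupancy grid where '#' marks an obstacle.

    Moves are 8-connected. Returns the cells from start to goal, or None.
    """
    rows, cols = len(grid), len(grid[0])

    def h(c: Cell) -> float:  # straight-line distance never overestimates
        return math.hypot(c[0] - goal[0], c[1] - goal[1])

    open_heap = [(h(start), 0.0, start)]  # (f = g + h, g, cell)
    came_from: Dict[Cell, Cell] = {}
    best_g: Dict[Cell, float] = {start: 0.0}
    while open_heap:
        _, g, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        if g > best_g.get(cur, math.inf):
            continue  # stale heap entry, a cheaper route was found already
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == dc == 0:
                    continue
                nr, nc = cur[0] + dr, cur[1] + dc
                if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == '#':
                    continue
                # Don't cut corners past obstacles when moving diagonally.
                if dr and dc and (grid[cur[0]][nc] == '#' or grid[nr][cur[1]] == '#'):
                    continue
                ng = g + math.hypot(dr, dc)
                if ng < best_g.get((nr, nc), math.inf):
                    best_g[(nr, nc)] = ng
                    came_from[(nr, nc)] = cur
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```

On the map `["..#...", "..#.#.", "....#."]`, `astar(room, (0, 0), (0, 5))` weaves down under the first wall and back up over the second, while the user only supplied the goal cell.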
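
Replicating the headset movement, as suggested in the Looking around bullet, needs a small mapping step: the user's head can turn faster and further than a pan-tilt unit, so the target angles are clamped to the unit's range and the speed is limited. A minimal sketch, assuming the headset software already provides yaw and pitch in degrees (VR runtimes typically report a quaternion, and converting it is not shown); the limits in the hypothetical `PanTiltLimits` class are made-up values.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PanTiltLimits:
    # Hypothetical mechanical limits for illustration, not a real unit's spec.
    pan_min: float = -170.0   # degrees
    pan_max: float = 170.0
    tilt_min: float = -45.0
    tilt_max: float = 60.0
    max_rate: float = 300.0   # degrees per second the servos may slew


def clamp(v: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, v))


def step_towards(current: float, target: float, max_step: float) -> float:
    """Move current towards target by at most max_step (slew-rate limiting)."""
    return current + clamp(target - current, -max_step, max_step)


def pan_tilt_command(head_yaw: float, head_pitch: float,
                     cur_pan: float, cur_tilt: float, dt: float,
                     lim: PanTiltLimits = PanTiltLimits()) -> Tuple[float, float]:
    """Map the remote user's head yaw/pitch (degrees) to the next pan/tilt setpoint.

    The target is clamped to the unit's range, then rate-limited so a sudden
    head turn is followed quickly but without jerking the servos.
    """
    target_pan = clamp(head_yaw, lim.pan_min, lim.pan_max)
    target_tilt = clamp(head_pitch, lim.tilt_min, lim.tilt_max)
    max_step = lim.max_rate * dt
    return (step_towards(cur_pan, target_pan, max_step),
            step_towards(cur_tilt, target_tilt, max_step))
```

Called every control tick, `pan_tilt_command(90.0, 80.0, 0.0, 0.0, dt=0.25)` returns `(75.0, 60.0)`: the pan is rate-limited on its way to 90 degrees and the tilt stops at its 60 degree maximum.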
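
The bandwidth numbers in the Seeing bullet are easy to check. The snippet below takes the 724 megapixel, 24-bit and 120 Hz figures from the text as given, and uses 10.2 Gbit/s for HDMI 1.4 (its maximum TMDS bandwidth) and 1 Gbit/s for Gigabit Ethernet.

```python
# Figures taken from the Seeing bullet.
FULL_EYE_PIXELS = 724e6   # "human-equivalent" camera, pixels
BITS_PER_PIXEL = 24       # colour depth
REFRESH_HZ = 120          # maximum refresh rate
FOVEATED_PIXELS = 4e6     # wide-angle + telephoto camera, 2 MP each

GIGABIT_ETHERNET = 1e9    # bit/s
HDMI_1_4 = 10.2e9         # bit/s, HDMI 1.4 maximum TMDS bandwidth


def raw_bitrate(pixels: float, bits_per_pixel: int = BITS_PER_PIXEL,
                refresh_hz: int = REFRESH_HZ) -> float:
    """Uncompressed video bitrate in bit/s: every pixel sent every frame."""
    return pixels * bits_per_pixel * refresh_hz


full = raw_bitrate(FULL_EYE_PIXELS)
fov = raw_bitrate(FOVEATED_PIXELS)
print(f"full: {full / 1e12:.2f} Tbit/s = {full / GIGABIT_ETHERNET:.0f}x GbE "
      f"= {full / HDMI_1_4:.0f}x HDMI 1.4")
print(f"two-camera setup: {fov / 1e9:.2f} Gbit/s, {full / fov:.0f}x less")
```

This prints about 2.09 Tbit/s (roughly 2085x Gigabit Ethernet and 204x HDMI 1.4) and about 11.5 Gbit/s for the two-camera setup, 181 times less. Even that is a lot if every pixel is sent every frame, which is why sending only the changed pixels (and ordinary video compression) matters so much.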
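
"Only update the pixels that change" can be illustrated with a toy delta encoder: compare each new frame with the previous one and send just the pixels that changed by more than a threshold. This is a simplified sketch with hypothetical names, using greyscale frames as nested lists; real video codecs such as H.264 already do a far more sophisticated version of this inter-frame prediction.

```python
from typing import List, Tuple

Frame = List[List[int]]              # greyscale pixels, row-major
Update = List[Tuple[int, int, int]]  # (row, col, new value)


def encode_delta(prev: Frame, curr: Frame, threshold: int = 0) -> Update:
    """List only the pixels whose value changed by more than threshold."""
    return [(r, c, v)
            for r, (row_prev, row_curr) in enumerate(zip(prev, curr))
            for c, (p, v) in enumerate(zip(row_prev, row_curr))
            if abs(v - p) > threshold]


def apply_delta(frame: Frame, update: Update) -> Frame:
    """Rebuild the receiver's frame by patching in the changed pixels."""
    out = [row[:] for row in frame]  # copy, so the previous frame stays intact
    for r, c, v in update:
        out[r][c] = v
    return out
```

A threshold above zero suppresses sensor noise but makes the scheme lossy, so in practice you would also resend a full frame every now and then to stop small errors from accumulating.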
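
The microphone array in the Hearing bullet can be understood through the simplest beamformer, delay-and-sum. Sound from the direction you steer towards reaches each microphone at a slightly different time, so delaying each channel by the right amount and averaging makes that sound add up, while sound from other directions partly cancels. This sketch assumes a uniform linear array and rounds the delays to whole samples; the array geometry and sample rate in the usage are illustrative, and the open source implementations mentioned above generally use more advanced methods.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees Celsius


def steering_delays(n_mics: int, spacing_m: float, angle_deg: float,
                    fs: int) -> List[int]:
    """Per-microphone delays in whole samples for a uniform linear array.

    angle_deg is measured from broadside (0 = straight ahead); positive angles
    point towards the last microphone, which then hears the sound first.
    """
    dt = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    raw = [i * dt * fs for i in range(n_mics)]
    base = min(raw)  # shift so that all delays are non-negative
    return [round(x - base) for x in raw]


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Delay each channel by its steering delay, then average the channels."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for t in range(d, n):
            out[t] += ch[t - d]
    return [x / len(channels) for x in out]
```

For four microphones 5 cm apart sampled at 16 kHz, `steering_delays(4, 0.05, 90, 16000)` gives `[0, 2, 5, 7]`: a sound arriving from the side of the last microphone reaches it about 7 samples before the first one, and delaying the channels accordingly lines the four copies up.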
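
The eye-contact problem in the Nonverbal communication bullet is largely geometry: when you look at the face on the screen, the camera sees you looking away by the angle between that face and the camera lens. A back-of-the-envelope sketch, with illustrative distances:

```python
import math


def gaze_offset_deg(camera_offset_m: float, viewing_distance_m: float) -> float:
    """Angle (degrees) between looking at the on-screen face and looking into
    the camera, for a camera mounted camera_offset_m beside or above that face."""
    return math.degrees(math.atan2(camera_offset_m, viewing_distance_m))


# A camera 10 cm above the face image, seen from a few distances.
for distance in (0.5, 1.0, 2.0):
    print(f"{distance:.1f} m: {gaze_offset_deg(0.10, distance):.1f} degrees")
```

With a 10 cm offset the mismatch is roughly 11, 6 and 3 degrees at half a metre, one metre and two metres. It is largest up close, which is exactly where a tablet-sized face is usually viewed; face warping tackles it by correcting the image rather than moving the camera.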

All in all, a first-phase implementation of Tero would already turn a lot of heads and offer remarkable improvements over currently available telepresence robots. It would need very limited investment in components and would not require fundamental research. Higher levels of functionality could be obtained at very limited cost too. What we need is manpower and determination. What do you think? Let me know on Twitter!