Quest for Meaning: A Sense of Purpose Enables Better Human-Robot Collaboration

Robots need to understand the meaning of human activities if they are to work alongside people safely and effectively.


The year is 1954. Two Americans, inventor George Devol and entrepreneur Joseph F. Engelberger, are discussing their favorite science fiction writers at a cocktail party. Devol has recently filed his latest idea at the Patent Office: the first Universal Automation, or Unimate, an early effort to replace factory workers with robotic machinery. His creative genius has already given birth to some of the first technological marvels of the modern world: the Phantom Doorman, an automatic door with photocells and vacuum tubes, and the Speedy Weeny, a machine that uses microwave energy to cook hot dogs on demand.

Written by UX Designer & Anthropologist Yisela Alvarez Trentini for Wevolver.

Engelberger found the Unimate industrial transfer machine so compelling that seven years later, as soon as the patent was approved, he formed a company to develop the ideas of Mr. Devol. The name of that company was Unimation Inc. Their “programmed article transfer”, later rebaptized “manipulator” and finally “robot” (one can only wonder whether Devol and Engelberger had discussed Isaac Asimov’s first use of the term “robotics” over those cocktails), eventually entered full-scale production as a unit for materials handling and welding.

Unimate reshaped and sped up production lines at manufacturing plants around the world. General Motors, Chrysler, Ford and Fiat quickly recognized the benefits and placed large orders, helping turn the machine into one of the “Top 50 inventions of the past 50 years”. This was but the seed that spawned a new industry: commercial robotics. In time, Unimation Inc. and its sole rival, Cincinnati Milacron Inc. of Ohio, would see the rise of several Japanese and European competitors that incorporated innovations such as electric microprocessors, servo gun technology and arc welding.

However, the world would have to wait a little longer to see the first truly collaborative robot (or cobot), which was installed at Linatex in 2008. Unlike its predecessors, Universal Robots’ UR5 is lightweight, inexpensive, easy to set up, and can be re-programmed by untrained operators.

Cobots are designed to share a workspace with humans — which is why the UR5 doesn’t need to be placed behind a fence and can safely function alongside employees. This is in stark contrast to traditional industrial robots, usually locked in cages because their rapid movements and heavy bulk can make them unsafe for humans. Once installed, the heavy robotic arms commonly seen in factories are rarely moved; the majority are actually bolted to the floor. This newer generation is, instead, lighter, plug & play and aided by sensor and vision technology to help make workspaces easier to share.

“You don’t need to type or calculate anything to get the robot to work. You just need to show it the movements”. Universal Robots.

Despite the fact that cobots were introduced eleven years ago, less than 5 percent of all industrial robots sold in 2018 were collaborative (according to the Financial Times). Cobots might not be new, but they still have the potential to revolutionize production, particularly for smaller companies, which account for an impressive 70 percent of global manufacturing. It’s estimated that the cobot market will grow from just over $100m in 2018 to $3bn by 2020, in part because these flexible robots are considerably cheaper than their traditional industrial counterparts (averaging a price of $24,000 each, according to Barclays Capital).

It is expected that the adoption of cobots will significantly increase within industrial environments over the coming years. But what is the case for their use in other areas, such as healthcare?

At the Copenhagen University Hospital in Gentofte, the cobot UR5 is already being used to optimize the handling and sorting of blood samples, helping the laboratory uphold its target of delivering more than 90% of results within one hour. Universal Robots’ mechanical arm has also been incorporated into the Modus V™, developed by Synaptive Medical in Canada to assist in neurosurgery by providing unprecedented views of patient anatomy and enabling less invasive, more precise procedures. And that’s not all. Cobots are also beginning to be utilized in other medical areas, for example telepresence, rehabilitation, medical transportation and sanitation.

Some of these scenarios come with a new set of challenges, as they are filled with dynamic and unpredictable interactions that are significantly different from those within an industrial environment.

For example, a robot nurse might be asked to help a patient by fetching a specific medicine from a cabinet. The robot would need to understand what the patient is requesting in whatever way they express it, locate the cabinet, navigate through the room while avoiding obstacles, open the cabinet, grasp the right medicine, and hand it to the patient. In order to handle an object, a robot needs to select the correct grasp for a given shape, understand how different materials respond to force, and coordinate feedback and joint steps when handing the object over to a person. During these kinds of interactions, humans naturally monitor the pace and workload of their partners and adapt their handovers accordingly. Robots need to achieve similar adaptivity.

Manipulating objects involves a series of actions that are deeply entwined with perception, control, and coordination.

Traditionally, the focus of robotics has been on the success of reaching and grasping things through two metrics: speed and stability. These are of particular interest in logistics and have largely benefitted from recent advances in computer vision and machine learning. However, they might not be enough to determine the success of a robotic action when it comes to interacting with humans in more flexible, unpredictable environments such as the hospital or the home.

In order for cobots to navigate these new areas effectively, they need to understand how people interact with the world so they can anticipate what’s expected of them.

How ‘Purpose’ Influences Human Actions and Tool Usage

As humans, we have an intuitive physics model in our heads. We’re able to imagine an object and how it will behave if we interact with it by pushing, pulling, squeezing, etc. This allows us to accomplish incredible manipulation tasks that are far beyond the reach of current robots.

François Osiurak, a researcher from the University of Angers, argues that “we do not perceive the world without any intention”. Humans of all cultures spontaneously and almost systematically use tools to modify the world, to innovate, and to move beyond earlier technical equipment.

This ability is rooted in our particular talent for acquiring a technique and applying it to a variety of goals, and it’s deeply linked to our evolution. About seven million years ago, bipedalism separated the first hominids from the rest of the apes, which remained four-legged. Our hands became free to manipulate objects, so we had to create representations of tools and their associated actions. Our brain became very good at processing complex spatial information, which in turn led to abstract and transcendental thought.

Unlike non-human animals, who can’t understand the underlying mechanisms involved in an action, we can extrapolate the relations we learn to other situations. A single object can offer multiple possibilities at different times, depending on the purpose we are pursuing.

Over a century ago, William James wrote that when we are writing, we see a piece of paper as a surface for inscription; but if we needed to light a fire and no other materials were available, that same paper would be combustible material. The item can be either of those two things, or a thin thing, or a hydrocarbonaceous thing, or an American thing, and so on ad infinitum.

“There is no property ABSOLUTELY essential to any one thing. The same property which figures as the essence of a thing on one occasion becomes a very inessential feature upon another.” James, 1890/2007b, p. 333.

We see objects as affordances, dispositions that require the presence of a person to actualize them. The same technique, for example a cutting action, can be used to achieve several distinct goals: to feed, to hunt, to defend oneself. At the same time, the same goal can be achieved using different objects.

Affordances enable us to have flexibility in the way we interact with the world. They allow the establishment of a goal and different means for reaching it.

In contrast, most robotic systems are designed to fulfill a single task. They usually know with which objects they will need to interact and do so in a pre-planned way. But in an open, dynamic world, we can’t explicitly spell out every detail of all possible tasks a robot might face.

The integration of robotic systems into these new environments will require the development of better interaction, perception, and coordination between humans and robots. One way to do this is by showing cobots how, and more importantly why, we manipulate objects.

Teaching Robots How to Manipulate Objects More Effectively

A good way of transferring some of our manipulation abilities to robots is by letting them observe us while we utilize tools in a variety of scenarios.

“Complex observations can extract rules valid for new situations even when observing multiple humans performing the same tasks with different styles.” Gordon Cheng.

In 2017, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) developed C-LEARN, a system that allows noncoders to teach robots a range of tasks by providing some information about how objects are typically manipulated and then showing the robot a single demo of the task.

“By combining the intuitiveness of learning from demonstration with the precision of motion-planning algorithms, this approach can help robots do new types of tasks that they haven’t been able to learn before, like multistep assembly using both of their arms.” Claudia Pérez-D’Arpino, MIT.

MIT’s system connects the demonstration of a task with what robots already know about the world. First, a knowledge base of information on how to reach and grasp various objects with different constraints is loaded onto the robot. Then, an operator executes a task using a sequence of relevant moments known as “keyframes”. By matching these keyframes to the different situations in the knowledge base, the robot can automatically suggest motion plans with an accuracy of 87.5%.
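
C-LEARN’s internals aren’t reproduced here, but the matching step can be illustrated with a toy Python sketch: a small knowledge base of grasp templates with geometric constraints, and a lookup that pairs a demonstrated keyframe with the first template whose constraints it satisfies. Every class name, field and value below is hypothetical and only meant to convey the principle.

from dataclasses import dataclass
from typing import Optional

# Hypothetical grasp template: the kind of information a knowledge base like
# C-LEARN's conceptually stores (an object category plus the constraints a valid grasp must satisfy).
@dataclass
class GraspTemplate:
    object_type: str        # e.g. "cylinder" or "handle"
    approach_axis: str      # axis the gripper should approach along
    max_tilt_deg: float     # tolerated deviation from that axis

# Toy knowledge base, loaded onto the robot before any demonstration.
KNOWLEDGE_BASE = [
    GraspTemplate("cylinder", approach_axis="z", max_tilt_deg=10.0),
    GraspTemplate("handle", approach_axis="x", max_tilt_deg=25.0),
]

@dataclass
class Keyframe:
    object_type: str        # object the demonstrator is interacting with
    approach_axis: str      # approach axis observed in the single demo
    tilt_deg: float         # observed deviation from that axis

def match_keyframe(kf: Keyframe) -> Optional[GraspTemplate]:
    """Return the first template whose constraints the demonstrated keyframe satisfies."""
    for template in KNOWLEDGE_BASE:
        if (template.object_type == kf.object_type
                and template.approach_axis == kf.approach_axis
                and kf.tilt_deg <= template.max_tilt_deg):
            return template
    return None  # nothing matched: the planner would ask the operator for another demo

# A single demonstrated keyframe is enough to select a motion-plan template.
print(match_keyframe(Keyframe("handle", "x", 12.0)))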

Because these types of systems try to infer the principles behind a motion rather than simply mimic what is being demonstrated, they are far more flexible and can prove valuable in time-sensitive and dangerous scenarios. The system was actually tested using Optimus, a two-armed robot designed for bomb disposal. The machine was taught to perform tasks such as opening doors, transporting items and extracting objects from containers, and the results were so effective that the skills it learned could then be seamlessly transferred to Atlas, CSAIL’s 6-foot-tall, 400-pound humanoid robot.

Although C-LEARN can’t yet handle certain advanced tasks such as avoiding collisions or planning for different step sequences for a given task, it’s just a matter of time until more insights from human learning can provide robots with an even wider range of capabilities.

MIT CSAIL’s team is not the only one trying to solve this problem.

Karinne Ramirez-Amaro’s team at the Technical University of Munich has been researching how to use semantic representations to obtain a higher-level understanding of a demonstrator’s behavior. This would allow the robot to anticipate the next probable motion according to human expectations.

Ramirez-Amaro’s framework is organized in three modules: 1) Observation of human motions and object properties to accurately extract the most relevant aspects of a task; 2) Inference of a goal by interpreting human behaviors hierarchically and generating semantic rules; and 3) Imitation/reproduction of the best motion to achieve the same goal.
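
Before looking at each module in turn, it may help to picture the framework as a simple pipeline: observation yields motion and object features, inference turns them into an activity label, and reproduction maps that label to a motion primitive. The Python skeleton below is only an illustration of that flow; the function bodies are placeholders, not the authors’ code.

def observe(frame):
    # Module 1 (placeholder): extract hand motion and object properties from one video frame.
    return {"hand_moving": True, "object_in_hand": None, "object_acted_on": "pancake mix"}

def infer(features):
    # Module 2 (placeholder): semantic rules (see the Results section below) map features to an activity.
    if features["hand_moving"] and features["object_in_hand"] is None:
        return "Reach"
    return "Idle"

def reproduce(activity):
    # Module 3 (placeholder): choose a motion primitive that achieves the same goal on the robot.
    return {"Reach": "reach_primitive", "Idle": "hold_position"}[activity]

# One pass through the pipeline for a single observed frame.
print(reproduce(infer(observe("frame_0"))))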

The team tested their system with a set of real-world scenarios and the iCub, a humanoid robot developed at IIT as part of the EU project RobotCub.

The scenarios tested were making pancakes, making a sandwich, and setting the table.

1. Observing and Extracting Relevant Aspects of a Task

The first thing the iCub was trained to do was to segment a series of videos depicting simple human motions. This was done on the fly, at both normal and fast speeds, using a color-based tracking algorithm and the OpenCV library. The hand position, its motion and velocity, and certain object properties were determined with an impressive 95% accuracy.
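
The paper’s exact pipeline isn’t reproduced here, but its basic ingredient, color-based tracking of the hand with OpenCV, can be sketched roughly as follows. The HSV threshold values and the speed estimate are placeholders that would need tuning for a real marker, camera and frame rate.

import cv2
import numpy as np

# Placeholder HSV range for a colored hand marker; real values depend on the marker and lighting.
LOWER_HSV = np.array([100, 120, 70])
UPPER_HSV = np.array([130, 255, 255])

def track_hand(frame, prev_center, fps):
    """Return the marker center and a rough estimate of the hand's speed (pixels per second)."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_HSV, UPPER_HSV)      # keep only marker-colored pixels
    moments = cv2.moments(mask)
    if moments["m00"] == 0:                            # marker not visible in this frame
        return None, 0.0
    center = (moments["m10"] / moments["m00"], moments["m01"] / moments["m00"])
    speed = 0.0
    if prev_center is not None:
        dx, dy = center[0] - prev_center[0], center[1] - prev_center[1]
        speed = (dx ** 2 + dy ** 2) ** 0.5 * fps
    return center, speed

# Usage: feed consecutive frames from cv2.VideoCapture and label each one as
# "move" or "not_move" by thresholding the returned speed.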

2. Inferring the Demonstrator’s Goal

The next module used the recognized hand motions and object properties to find meaningful relationships between them and infer the activity, for example reaching, taking, cutting, pouring or releasing. This was done using a series of low-level abstractions (generalized motions like move, not_move, or tool_use) and high-level ones (more complex human behaviors such as idle, reach, take, cut, pour, put something somewhere, release, etc.).

3. Reproducing the Best Motion to Achieve the Goal

Finally, an execution plan was selected using online motion primitives so the robot could reproduce the activity inferred in step two. The iCub proved to be an excellent testing platform because it has 53 degrees of freedom.

Results

The experiment demonstrated that the robot could recognize and imitate tasks with high accuracy in real time across a variety of scenarios, even unfamiliar ones involving different conditions and users. The extracted semantic representations made it possible to construct machine- and human-understandable descriptions of human activities, such as:

if Hand(Move) & ObjectInHand(None) & ObjectActedOn(Something) → Activity(Reach)
if Hand(Not_Move) & ObjectInHand(Something) → Activity(Take)
if Hand(Move) & ObjectInHand(Something) → Activity(PutSomethingSomewhere)
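
Rendered as code, rules of this kind amount to a small decision procedure over the observed hand state and object relations. The authors grow such rules automatically as a decision tree, but a hand-written Python sketch of the three rules above conveys the idea; the argument names and the fallback label are invented for illustration.

from typing import Optional

def infer_activity(hand_moving: bool, object_in_hand: Optional[str], object_acted_on: Optional[str]) -> str:
    # Hypothetical rendering of the three semantic rules listed above.
    if hand_moving and object_in_hand is None and object_acted_on is not None:
        return "Reach"
    if not hand_moving and object_in_hand is not None:
        return "Take"
    if hand_moving and object_in_hand is not None:
        return "PutSomethingSomewhere"
    return "Idle"  # fallback for hand/object states the three rules do not cover

print(infer_activity(True, None, "knife"))    # -> Reach
print(infer_activity(True, "knife", None))    # -> PutSomethingSomewhere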

The results showed that the system can handle the challenges of new (granular) behaviors and can re-use its models. Since the segmentation was done with a basic visual process, the study could be scaled up by including more sophisticated object recognition software. The next step, the researchers argue, would be to learn behaviors on demand by incorporating a knowledge-based ontology and a reasoning engine to compute new relationships between objects and activities and to control the growth of the decision tree.

Representations for Robot Knowledge: KnowRob

We can’t guarantee that knowledge stored manually in a system will be valid under every possible scenario — one of the main challenges of flexible collaborative robotics. In order to allow more natural interactions with humans, we need to grow the robot’s knowledge base in a meaningful manner. A project that takes us a step closer to this goal is KnowRob.

KnowRob is an open-source system that is currently being used on several robots performing complex object manipulation tasks. Rather than working with original high-resolution continuous data, KnowRob offers an on-demand abstraction or symbolic view. Its queries combine inferences made at all levels of abstraction with a shallow knowledge base that works as an integration layer for inference algorithms.

This general-purpose ontology, the sort of ‘glue’ at the representational level, can be extended with micro-theories that add domain-specific knowledge or special-purpose inference methods. To facilitate this, all modules share the same language, the Web Ontology Language (OWL).

KnowRob can represent objects and spatial information, events and actions, and, especially useful for our discussion of goals and purpose, the effects of actions on objects. For example, a robot can search for an Action that turns a Device from DeviceStateOff to DeviceStateOn, and it will obtain the action class TurningOnPoweredDevice.

Class: TurningOnPoweredDevice
    SubClassOf:
        ControllingAPhysicalDevice
        objectOfStateChange some PhysicalDevice
        fromState value DeviceStateOff
        toState value DeviceStateOn
        […]

To complement this, there are also procedural projection rules for computing how a concrete world state would change after performing an action.
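
KnowRob itself is queried through Prolog predicates over its OWL ontology; the Python sketch below merely mimics the two ideas just described, an effect-based lookup of an action class and the projection of that action’s effect onto a concrete world state. The dictionaries and function names are stand-ins, not KnowRob’s actual API.

# Toy "ontology": action classes described by the state change they bring about.
ACTION_CLASSES = {
    "TurningOnPoweredDevice":  {"from_state": "DeviceStateOff", "to_state": "DeviceStateOn"},
    "TurningOffPoweredDevice": {"from_state": "DeviceStateOn",  "to_state": "DeviceStateOff"},
}

def find_action(from_state: str, to_state: str):
    """Effect-based lookup: which action class turns a device from one state into another?"""
    for name, effect in ACTION_CLASSES.items():
        if effect["from_state"] == from_state and effect["to_state"] == to_state:
            return name
    return None

def project(world_state: dict, device: str, action: str) -> dict:
    """Projection rule: compute the world state that would hold after performing the action."""
    new_state = dict(world_state)
    new_state[device] = ACTION_CLASSES[action]["to_state"]
    return new_state

action = find_action("DeviceStateOff", "DeviceStateOn")    # -> "TurningOnPoweredDevice"
print(project({"kitchen_lamp": "DeviceStateOff"}, "kitchen_lamp", action))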

Knowledge systems like KnowRob can help service robots cope with instructions that are often shallow and symbolic, filling in the gaps to generate the detailed, grounded, real-valued information needed for execution. Because KnowRob is open source, extensions developed by third parties include knowledge about industrial assembly tasks, search and rescue in alpine environments, and underwater robotics, among others.

Conclusions

As robot factory workers, co-workers and home assistant robots scale towards open environments and more complex tasks, their knowledge processing systems need to be up to the challenges that emerge from a dynamic, human-centric world.

The advantages of flexible, intelligent cobots are not limited to precision, repeatability or teleoperation. Right at this moment in Pasadena, California, Flippy the hamburger-flipping robot is serving 300 orders a day from a kitchen so unbearably hot that most employees used to quit within weeks. Now, they are free to focus on attending to customers.

In hospitals, robots are preparing slides and loading centrifuges, transporting supplies and operating on people. Furthermore, they are beginning to take care of tasks such as moving patients (like the Bear robot), providing standardized approaches to symptom management, and even combating loneliness and inactivity (like Paro, Pepper, and Dinsow). These activities can benefit immensely from understanding context and interpreting words and emotions.

In order to cooperate with robots in an infinitely variable number of settings, we need a combination of different integrated knowledge areas, sources and inference mechanisms.

The second generation of the aforementioned KnowRob will include narrative-enabled episodic memory: the ability to acquire experiential knowledge by re-using its components in virtual environments with physics simulation and almost photorealistic rendering. The system would then be able to draw conclusions about which action parametrizations are likely to succeed in the real world.

This brings up an interesting closing question: if robots come to see objects as affordances, the very thing that lets us humans continuously invent new ways of doing things, then it’s not a stretch to imagine they could come up with completely new and better ways of doing them. We’ll have to wait and see what those are.

This article was written by UX Designer & Anthropologist Yisela Alvarez Trentini and first published on Wevolver.com.
If you want to read more on Human-Robot Interaction, read Yisela's article about the rise of social robotics.
