How do Social Robots Interact With Humans?

5 Ways Social Robots Can Be Designed and Programmed to Interact With You

- Written by Philip Graves for GWS Robotics, 29th July 2019
- Edited and selectively amended by David Graves, 2nd August 2019

Modern social robots interact with humans by means of a variety of sensors, whose analogue readings are converted into digital data that the robots' programming can meaningfully interpret.

We can broadly divide the tools of interactivity with which they are equipped into artificial senses and outwardly perceptible responses. The link between the two is artificial intelligence: the sum of the protocols by which a robot is designed to process and respond to the data it receives and the memories it stores.

Having summarised these, we shall go on to look at five strategic ways in which social robots can be designed and programmed to interact with humans.

Part 1: Artificial Senses

Just as any animal needs senses to detect what is present in and happening in its physical environment, so too do robots need artificial senses for the same purpose.

21st-century social robots like Pepper have been equipped with artificial sight, hearing and touch, but generally have no artificial sense of taste or smell.

Why are social robots not made able to smell or taste?

Although the sophisticated drug-detection machinery that customs officials use to intercept illicit trafficking can be equipped with an ‘artificial nose’, there is no clear economic justification for fitting a social robot with the expensive technology required to detect and recognise chemical vapours in the air.

Taste is another animal sense that depends on the detection of chemicals, generally in the presence of saliva, a water-based fluid that breaks down food and releases its chemical constituents. Because water conducts electricity, it would be potentially unsafe to introduce any kind of fluid into an electronic machine with moving parts, such as a robot, in order to imitate this process.

1a. Artificial Vision

Robots are equipped with internal digital cameras through which they receive digital images of their visual environments. These images give their programming a rich source of data from which to identify what the environments consist of, the first step towards responding appropriately in a way that facilitates communication with nearby people.

1b. Artificial Hearing

Modern social robots are fitted with integral microphones, allowing them to receive analogue audio signals. These are then converted into digital audio by on-board Analogue-to-Digital Converters (ADCs) and fed into their programs. To make sense of that digital data, the robots need to be programmed to interpret the sounds they are hearing with reference to features such as their frequencies and their ADSR (attack, decay, sustain, release) envelopes. Ideally they should be programmed in a sophisticated enough way to recognise words from the digital audio patterns of human speech, and to make sense of background noises without being distracted by them when people are speaking.
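The analogue-to-digital step described above can be illustrated with a minimal sketch. It is not the code of any particular robot: the function names, the 16-bit depth and the sample rate used in the example are all assumptions chosen for illustration.

```python
import math


def quantise(sample, bits=16):
    """Map an analogue amplitude in [-1.0, 1.0] to a signed integer code.

    Hypothetical helper: real ADCs do this in hardware.
    """
    max_code = 2 ** (bits - 1) - 1           # 32767 for 16 bits
    clipped = max(-1.0, min(1.0, sample))    # out-of-range input is clipped
    return round(clipped * max_code)


def sample_tone(freq_hz, rate_hz, count, bits=16):
    """Sample a pure sine tone 'count' times at 'rate_hz' samples per second."""
    return [
        quantise(math.sin(2 * math.pi * freq_hz * i / rate_hz), bits)
        for i in range(count)
    ]


# Four samples of a 440 Hz tone at an assumed rate of 16,000 samples per second.
digital_audio = sample_tone(440.0, 16000, 4)
```

Everything the robot's software later does with sound, from estimating frequencies to recognising words, works on lists of integers like this one and not on the sound wave itself.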

1c. Artificial Touch

Advanced robots can be fitted with an outer ‘skin’ of material that is made sensitive to pressure and/or electrical conductivity, thereby imitating the key ways by which humans perceive touch and also forces acting upon them.

Touch sensitivity can be useful in social robots’ interactions with humans for several reasons. It can allow them to detect when a human is placing a hand on them, opening the way to a host of programmed social responses. It can also allow them to detect the weight of an object if they are expected to carry it, and to respond defensively or self-protectively if subjected to heavy force such as a blow.

Where robots like Pepper are fitted with an internal tablet, they additionally use touch-sensitive screen technology as a direct interface with programs with which they are equipped.

Many robots like Pepper deploy various other sensors to inform their operating systems of the behaviour of their moving parts and joints - notably inertial sensors such as gyroscopes and accelerometers. Information from these sensors is mostly used programmatically to avoid or detect malfunction, or to avoid falling over.
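As a rough illustration of how accelerometer readings might be used to avoid falling over, the sketch below estimates how far the robot is leaning from the direction of gravity. The function names and the 15-degree limit are assumptions made for this example, and it supposes a robot at rest whose z axis points straight up when it stands upright.

```python
import math


def tilt_degrees(ax, ay, az):
    """Angle between the measured gravity vector and the robot's vertical axis.

    ax, ay, az are accelerometer readings in any consistent unit.
    """
    horizontal = math.hypot(ax, ay)          # size of the sideways component
    return math.degrees(math.atan2(horizontal, az))


def is_at_risk_of_falling(ax, ay, az, limit_deg=15.0):
    """True when the lean exceeds the (hypothetical) safe limit."""
    return tilt_degrees(ax, ay, az) > limit_deg


upright = is_at_risk_of_falling(0.0, 0.0, 9.81)    # standing straight
leaning = is_at_risk_of_falling(9.81, 0.0, 9.81)   # leaning at 45 degrees
```

A real robot would combine such readings with gyroscope data and filter them over time, since a single accelerometer sample is noisy and is disturbed by the robot's own movements.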

Part 2: Perceptible Responses

It is the first job of the social robot programmer to devise sophisticated routines for the interpretation of the raw digital data from the robot’s visual, sonic and tactile sense mechanisms. The second job is then to devise further routines to determine how the robot should behave based on what its program now understands to be happening around it.
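The two jobs can be pictured as two separate functions, one turning raw readings into an understanding of the scene and one turning that understanding into behaviour. The sketch below is only a schematic: every field name, threshold and behaviour label is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RawSensors:
    face_pixels: int        # hypothetical count of camera pixels classed as a face
    sound_level_db: float   # hypothetical loudness at the microphones
    skin_pressure: float    # hypothetical touch reading from 0.0 to 1.0


@dataclass
class Percept:
    person_visible: bool
    someone_speaking: bool
    being_touched: bool


def interpret(raw):
    """First job: turn raw digital data into a description of the scene."""
    return Percept(
        person_visible=raw.face_pixels > 500,
        someone_speaking=raw.sound_level_db > 50.0,
        being_touched=raw.skin_pressure > 0.2,
    )


def decide(percept):
    """Second job: choose a behaviour based on what is understood to be happening."""
    if percept.being_touched:
        return "turn_towards_touch"
    if percept.person_visible and percept.someone_speaking:
        return "listen"
    if percept.person_visible:
        return "greet"
    return "idle"


behaviour = decide(interpret(RawSensors(1000, 60.0, 0.0)))
```

Keeping the two stages apart means that the rules of behaviour can be changed without touching the code that interprets the sensors, and vice versa.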

This can be approached in a number of ways, but to confer on a social robot an effective semblance of intelligence requires programming it to behave in ways that seem to its human companions to be appropriate responses to their behaviour and revealed wants and intentions.

Depending on the design of the robot, it is likely to have at its disposal a variety of forms of physical movement, as well as the ability to generate artificial speech and other sounds through its built-in loudspeakers. Both these classes of functionality can be fully exploited to make the robot behave in a lifelike interactive fashion.

2a. Mechanical and Electrical Movement

Social robots can be programmed to draw on their electrical power source to move the internal joints of their bodies and to move themselves across the surface on which they are standing. Some robots can also be made to apply force to third-party objects to achieve specific purposes such as opening a door or throwing an object, and others developed by research laboratories have been made to run or jump using leg-like appendages.

Robots can be made to rotate on the spot or wander around a room or hall. They can be made to turn and tilt their heads, and move their arms, wrists and fingers, whether for lifting and carrying objects, reaching out to touch a human, or simply gesticulating. They can even be made to dance. These abilities are at the disposal of programmers of modern social robots; but they need to be programmed to move in ways that are appropriate to the situation in which they are engaged.

Robots fitted with internal lights and screens can also be programmed to switch them on or off or change their colour in order to convey a sense of emotion. Many social robots are equipped with electronic image-based ‘eyes’ whose appearance can be made to change depending on what they perceive to be happening around them and the ‘emotional’ effect that has upon them. All these changing appearances can be classed collectively as electrical movement, since no mechanical motion is involved.
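One simple way to picture this kind of ‘electrical movement’ is as a gradual fade of the robot's lights from one colour to another. In the sketch below, the emotion names, the colours assigned to them and the function names are all invented for illustration and do not describe any particular robot.

```python
# Hypothetical mapping from an 'emotional' state to a red-green-blue colour.
EMOTION_COLOURS = {
    "neutral": (255, 255, 255),
    "happy": (0, 255, 0),
    "alarmed": (255, 0, 0),
}


def blend(start, end, t):
    """Mix two colours; t=0.0 gives 'start' and t=1.0 gives 'end'."""
    t = max(0.0, min(1.0, t))
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def fade_frames(from_emotion, to_emotion, steps):
    """Colours to display, in order, for a fade in 'steps' stages (steps >= 1)."""
    start = EMOTION_COLOURS[from_emotion]
    end = EMOTION_COLOURS[to_emotion]
    return [blend(start, end, i / steps) for i in range(steps + 1)]


frames = fade_frames("neutral", "alarmed", 10)
```

Fading between colours, as opposed to switching abruptly, is a design choice that tends to make the change of ‘mood’ look less mechanical.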

2b. Speech and sound

Almost all modern social robots are equipped with internal loudspeakers and virtual speech synthesis software so that they can be made to say anything they are programmed to say, comprehensibly to human beings around them. The notable exceptions would be social robots designed to behave more like dogs and other animals, with different kinds of vocalisations.

Most social robots can also be made to produce a variety of audible tones and noises that do not resemble speech but may be designed to indicate their ‘moods’ or to attract human attention.

Some social robots can also be used to play music and pre-recorded audio tracks.

Part 3: Strategy for successful robot-human interaction

Having covered the technical basics of how robots can be equipped with the tools allowing them to interact with humans, we should also consider what kinds of interaction with robots are subjectively appreciated the most by humans. Here are five areas of their design and programmable behaviour that can make the most difference to user perceptions.

3a. Design and visual styling of robot body and head

It’s commonly observed that humans are happiest to interact with social robots that have human-like qualities of behaviour and personality but do not physically resemble humans to the degree that they could be mistaken for them.

We have been inundated with apocalyptic science fiction dramas exploring the theme of robots integrating themselves into society in human disguise and then taking control. These themes in popular fiction and film play into fears of robots that are indistinguishable from humans.

However, Hanson Robotics is a notable example of an active company that has flown in the face of this conventional wisdom and set out to produce robots that look as similar to humans as possible, at least in the designs of their heads and faces, and has even modelled several of them after real individuals. These robots have mostly been used in show applications, such as stage appearances where they are used to answer questions. People may be more comfortable watching them from the safe distance of an auditorium as part of an entertaining stage show than they would be interacting with them closely in an enclosed private setting.

SoftBank Robotics is an example of a company that has followed conventional thinking in making its humanoid-style robots appear distinct from human forms. Its robot Pepper resembles neither a male nor a female form, but has some aspects of both.

Other robots may be deliberately designed not to be of humanoid form at all. Some may resemble other creatures such as dogs, while others resemble shapes such as eggs and are seemingly designed to appeal to their audience with cute or childlike features.

The choice of physical form should take into consideration the desired mechanical functionality of the robot as well as the subjective dimension of its aesthetic appeal. For a robot to be socially popular, it probably needs to be aesthetically pleasing, and not purely functional like an industrial robot arm. But equally, to be called a robot at all, it would be expected by most people to be capable of movement.

3b. Manner of movement

The movement of a social robot will always be a mechanical response to an electrical current; but mechanical robotic technology is nowadays sophisticated enough for movements that appear relatively natural or even graceful to be possible.

A social robot that can vary the speed with which it moves in a fluid and responsive manner can be much more interesting for humans to interact with than one that operates at a fixed and predictable speed in all it does.
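To show the difference between fixed-speed and fluid movement, the sketch below uses a common easing curve (often called ‘smoothstep’) so that a joint starts slowly, speeds up, and slows down again before stopping. The function names are assumptions for this example, and real motion controllers are considerably more elaborate.

```python
def smoothstep(t):
    """Easing curve: rises from 0 to 1 with a gentle start and finish."""
    t = max(0.0, min(1.0, t))
    return t * t * (3 - 2 * t)


def joint_angle(start_deg, end_deg, t):
    """Angle of a joint at fraction t (0.0 to 1.0) of the way through a move."""
    return start_deg + (end_deg - start_deg) * smoothstep(t)


# Angles at five evenly spaced moments while raising an arm from 0 to 90 degrees.
angles = [joint_angle(0.0, 90.0, i / 4) for i in range(5)]
```

With a fixed speed the angle would change by the same amount between each moment; with the easing curve the steps near the start and end of the move are smaller than those in the middle.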

Ideally, a robot’s movements should not be so unpredictable as to make the people around it nervous, but should be varied enough to appear to show some kind of social awareness and inner consciousness, even though this is essentially an illusion.

3c. Sound of voice

There are also different schools of thought regarding how a social robot should sound. Should it sound like a robot, or should its speech sound as natural as that of a real human?

Makers of non-robotic interactive devices such as Amazon’s Echo have often seemingly compensated for their devices’ lack of humanoid or animal-like form and mechanical functionality by giving them a highly realistic human voice. This is also a possibility for robots, but are people ready to hear robots sounding exactly like humans in their homes?

SoftBank has given Pepper a very obviously robotic, child-like voice, for instance, so when you hear it speak, there is no risk of mistaking its voice for that of a real live human. At the same time, Pepper’s range of vocal pitch and expression is fairly broad compared with the traditional monotone robotic voices ascribed to such robotic characters as the Daleks in the British television series ‘Doctor Who’ in the 20th century, or the robot in the celebrated computer game Exile for the BBC Micro (1988) that chases the player around and beyond the main cavern while firing bullets and repeatedly growling: “’Pare[1] to die!”

Perhaps it is indeed the necessity to move away from precisely these kinds of stereotypes of aggressive armed robots that makes it a more palatable move not to give today’s social robots monotonous voices.

3d. Interactivity with Visual Environment

Social robots built to have the appearance of eyes should be programmed to show engagement in a way that attracts the attention of those around them without making them too uncomfortable.

Sophisticated social robots can be programmed to recognise movement and to distinguish faces from inanimate objects, to read facial expressions, and to follow individuals around. They probably also ought to be programmed to vary their gaze so that they do not stare constantly at one individual for long periods, a behaviour that would be considered impolite and discomforting in most circumstances of human company.
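The idea of varying the gaze can be sketched as a small rule that decides whom the robot should look at next. The function name, the use of simple name labels for people and the three-second limit are all assumptions made for this example.

```python
def next_gaze_target(current, seconds_on_current, people, max_stare_s=3.0):
    """Choose whom to look at next, or None to glance away.

    'people' is the list of individuals currently recognised, in a fixed order.
    """
    if not people:
        return None                      # nobody to look at
    if current not in people:
        return people[0]                 # start with the first person seen
    if seconds_on_current < max_stare_s:
        return current                   # not yet staring: keep looking
    if len(people) == 1:
        return None                      # only one person: glance away briefly
    index = people.index(current)
    return people[(index + 1) % len(people)]   # move on to the next person


target = next_gaze_target("Alice", 3.5, ["Alice", "Bob"])
```

A fuller version might weight the choice towards whoever is speaking, but even this simple rotation prevents the constant stare described above.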

They can also be programmed to respond to sudden and shocking movements by assuming defensive postures or frightened facial expressions as represented by their coloured lights.

When a human draws very close to a robot in a non-aggressive fashion, it may be programmed to adapt its behaviour by focusing closer attention on that individual, and possibly even by moving its arms into a position of readiness to gently embrace or to have its hand held – provided that the design of the robot is robust enough to withstand this and that safeguards have been built in against pinched or trapped human fingers.

It is also within the scope of robotic programming to recognise and mirror certain human behaviours such as dancing and the adoption of certain postures or gestures.

3e. Interactivity with Sonic Environment

One of the primary ways in which social robots function is by responding to cues that may be giving them permission to start a conversation – most especially, an individual greeting them. This can be managed by a combination of programming that recognises language and programming that infers from the orientation of the human speaker’s head and eyes that the robot is most likely to be the one being talked to at that time.
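The combination of the two cues can be sketched as follows. The example assumes that speech recognition has already produced a text transcript and that vision software has already estimated the angle of the speaker's head relative to the robot; the word list, the 20-degree limit and the function names are all hypothetical.

```python
import string

# Hypothetical list of single-word greetings the robot listens for.
GREETING_WORDS = {"hello", "hi", "hey", "greetings"}


def heard_greeting(transcript):
    """True if any whole word of the transcript is a known greeting."""
    cleaned = transcript.lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    return any(word in GREETING_WORDS for word in cleaned.split())


def is_facing_robot(head_yaw_deg, max_yaw_deg=20.0):
    """True if the speaker's head is turned roughly towards the robot."""
    return abs(head_yaw_deg) <= max_yaw_deg


def may_start_conversation(transcript, head_yaw_deg):
    """Both cues must agree before the robot treats itself as addressed."""
    return heard_greeting(transcript) and is_facing_robot(head_yaw_deg)


addressed = may_start_conversation("Hello, robot!", 5.0)
```

Requiring both cues helps the robot to avoid interrupting when two people nearby greet one another without looking at it.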

Social robots can also be programmed to recognise vocal expression and not just the content of language, as a means of trying to read the mood of their interlocutors; and they can be made to respond adaptively to such cues by varying their behaviour either to mirror or to respond in a fashion complementary to the manifest mood of the humans with them – whether this be cheerful and jolly, nervous and animated, sombre and morose, or calm and serious.

Sophisticated programming would combine the comprehension of language with non-verbal clues to mood in determining the most appropriate way to respond.
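As a very reduced sketch of this idea, the example below chooses what to say from the recognised content of the speech and how to say it from the perceived mood. The intent labels, the replies and the decision to mirror some moods while complementing others are all assumptions made for illustration.

```python
# Hypothetical choice of manner for each perceived mood: cheerful and calm
# moods are mirrored, while nervous and sombre moods are complemented.
STYLE_FOR_MOOD = {
    "cheerful": "cheerful",
    "calm": "calm",
    "nervous": "calm",
    "sombre": "gentle",
}

# Hypothetical replies for each recognised intent.
REPLY_FOR_INTENT = {
    "greeting": "Hello! How can I help?",
    "farewell": "Goodbye for now.",
}


def choose_response(intent, mood):
    """Return (manner of delivery, words to speak) for the robot's reply."""
    style = STYLE_FOR_MOOD.get(mood, "neutral")
    words = REPLY_FOR_INTENT.get(intent, "Could you say that again, please?")
    return style, words


response = choose_response("greeting", "nervous")
```

The same words can then be delivered with a different pitch, speed or accompanying gesture depending on the manner chosen.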

[1] i.e. ‘prepare’, reduced to monosyllabic form, presumably as a statement of the single-minded stupidity of the device and not as a result of a simple failure to program the BBC’s 8-bit sound chip with the first syllable