Affective Computing

Affective computing is the study and development of systems and devices that can recognize, interpret, process, and simulate human affects.

It is an interdisciplinary field spanning computer science, psychology, and cognitive science, which originated at MIT with Rosalind Picard's 1995 paper on affective computing. A motivation for the research is the ability to simulate empathy: the machine should interpret the emotional state of humans and adapt its behavior to them, giving an appropriate response to those emotions.

Detecting emotional information begins with passive sensors which capture data about the user’s physical state or behavior. For example, a video camera might capture facial expressions, body posture and gestures, while a microphone might capture speech. Other sensors detect emotional cues by directly measuring physiological data, such as skin temperature and galvanic resistance (e.g. Picard’s Affectiva). Recognizing emotional information requires the extraction of meaningful patterns from the gathered data. This is done using machine learning techniques that process different modalities, such as speech recognition, natural language processing, or facial expression detection.
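As a rough illustration of that last step, the sketch below trains a generic classifier on physiological measurements; the table layout (skin temperature, skin conductance, heart rate, with each row labeled with an emotion) is a hypothetical stand-in for any sensor pipeline and does not correspond to a particular published system.

```python
# Minimal sketch: classifying emotion labels from physiological features.
# The CSV layout ("skin_temp", "skin_conductance", "heart_rate", "emotion")
# is hypothetical and stands in for any sensor pipeline.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

data = pd.read_csv("physio_recordings.csv")          # hypothetical file
X = data[["skin_temp", "skin_conductance", "heart_rate"]]
y = data["emotion"]                                   # e.g. "joy", "anger", ...

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```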

Another area within affective computing is the design of computational devices that either exhibit innate emotional capabilities or are capable of convincingly simulating emotions. A more practical approach, given current technological capabilities, is the simulation of emotions in conversational agents in order to enrich and facilitate interaction between human and machine.

While human emotions are often associated with surges in hormones and other neuropeptides (small protein-like molecules used by neurons to communicate with each other), emotions in machines might be associated with abstract states associated with progress (or lack of progress) in autonomous learning systems. In this view, affective emotional states correspond to time-derivatives (perturbations) in the learning curve of an arbitrary learning system. Marvin Minsky, one of the pioneering computer scientists in artificial intelligence, relates emotions to the broader issues of machine intelligence, stating in The Emotion Machine that emotion is "not especially different from the processes that we call 'thinking.'"

One can take advantage of the fact that changes in the autonomic (involuntary) nervous system indirectly alter speech, and use this information to produce systems capable of recognizing affect from extracted speech features. For example, speech produced in a state of fear, anger, or joy tends to be faster, louder, and more precisely enunciated, with a higher and wider pitch range, while emotions such as tiredness, boredom, or sadness lead to slower, lower-pitched, and more slurred speech.
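Those acoustic correlates (pitch level and range, loudness, speaking rate) can be estimated from a recording with an off-the-shelf audio library. The sketch below uses librosa and is only a rough illustration, not a validated affective feature set; the audio file name is hypothetical.

```python
# Sketch: extracting simple prosodic correlates of affect from one recording.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=None)        # hypothetical file

# Fundamental frequency (pitch) track; unvoiced frames come back as NaN.
f0, voiced_flag, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr)
f0 = f0[~np.isnan(f0)]

features = {
    "mean_pitch_hz": float(np.mean(f0)) if f0.size else 0.0,
    "pitch_range_hz": float(np.ptp(f0)) if f0.size else 0.0,  # wider in fear/anger/joy
    "mean_energy": float(np.mean(librosa.feature.rms(y=y))),  # proxy for loudness
    "duration_s": librosa.get_duration(y=y, sr=sr),           # proxy for speaking rate
}
print(features)
```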

Emotional speech processing recognizes the user's emotional state by analyzing speech patterns. Vocal parameters and prosodic features (the rhythm, stress, and intonation of speech), such as pitch variables and speech rate, are analyzed through pattern recognition. Speech analysis is a useful method of identifying affective state, with an average success rate of about 63% reported in research. This result is fairly satisfying when compared with humans' success rate at identifying emotions, but somewhat lower than other forms of emotion recognition (such as those which employ physiological signals or facial processing). Furthermore, many speech characteristics are independent of semantics or culture, which makes this a very promising technique.
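A minimal sketch of the pattern-recognition step over such prosodic feature vectors, using a standard support vector machine; the feature matrix and labels here are random placeholders standing in for real per-utterance features and annotations.

```python
# Sketch: pattern recognition over prosodic feature vectors.
# X is assumed to be an (n_utterances, n_features) array of prosodic
# descriptors (pitch statistics, energy, speech rate); y holds emotion labels.
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                  # placeholder feature matrix
y = rng.choice(["anger", "joy", "sadness", "boredom"], size=200)

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(model, X, y, cv=5)
print("cross-validated accuracy:", scores.mean())
```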

The vast majority of present systems are data-dependent. This creates one of the biggest challenges in detecting emotions from speech: choosing an appropriate database with which to train the classifier. Most of the currently available data was obtained from actors and is thus a representation of archetypal emotions. These so-called acted databases are usually based on the Basic Emotions theory (by psychologist Paul Ekman), which assumes the existence of six basic emotions (anger, fear, disgust, surprise, joy, sadness), with all others simply being mixes of the former. However, for real-life applications, naturalistic data is preferred.
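As a small illustration of the acted-database setup, the sketch below simply encodes the six basic emotion labels as integer targets for a classifier; the per-utterance annotations shown are hypothetical.

```python
# Sketch: the label space of an "acted" corpus under the Basic Emotions theory.
from sklearn.preprocessing import LabelEncoder

BASIC_EMOTIONS = ["anger", "fear", "disgust", "surprise", "joy", "sadness"]

encoder = LabelEncoder().fit(BASIC_EMOTIONS)
# Hypothetical per-utterance annotations from actors:
labels = ["joy", "anger", "sadness", "joy", "fear"]
print(encoder.transform(labels))        # integer targets for a classifier
```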

A naturalistic database can be produced by observation and analysis of subjects in their natural context. Ultimately, such a database should allow the system to recognize emotions based on their context as well as work out the goals and outcomes of the interaction. The nature of this type of data allows for authentic real-life implementation, because it describes states that occur naturally during human-computer interaction. Despite the numerous advantages naturalistic data has over acted data, it is difficult to obtain and usually has low emotional intensity. Moreover, data obtained in a natural context has lower signal quality, due to background noise and the distance of the subjects from the microphone.

Through cross-cultural research on the Fore tribesmen of Papua New Guinea at the end of the 1960s, Paul Ekman proposed the idea that facial expressions of emotion are not culturally determined but universal. Thus, he suggested that they are biological in origin and can therefore be safely and correctly categorized. He officially put forth the six basic emotions in 1972.

However, in the 1990s Ekman expanded his list to include a range of positive and negative emotions, not all of which are encoded in facial muscles. The newly included emotions are: amusement, contempt, contentment, embarrassment, excitement, guilt, pride in achievement, relief, satisfaction, sensory pleasure, and shame. A system has also been conceived to formally categorize the physical expression of emotions by defining expressions in terms of muscle actions. The central concept of this Facial Action Coding System, or FACS, created by Paul Ekman and Wallace V. Friesen in 1978, is the Action Unit (AU): basically, a contraction or relaxation of one or more muscles. As simple as this concept may seem, it is enough to form the basis of a complex, interpretation-free emotion identification system.
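As an illustration of how Action Units compose into expressions, the sketch below hard-codes a few AU combinations commonly cited in FACS-based work (for example, cheek raiser plus lip corner puller for happiness); the mapping is simplified for illustration and is not Ekman and Friesen's full coding manual.

```python
# Sketch: mapping detected FACS Action Units to candidate emotion labels.
# The AU combinations below are commonly cited, simplified examples;
# a real FACS-based system uses far richer coding rules.
COMMON_AU_PATTERNS = {
    frozenset({6, 12}): "happiness",       # cheek raiser + lip corner puller
    frozenset({1, 4, 15}): "sadness",      # inner brow raiser + brow lowerer + lip corner depressor
    frozenset({1, 2, 5, 26}): "surprise",  # brow raisers + upper lid raiser + jaw drop
}

def label_from_aus(active_aus):
    """Return the first emotion whose AU pattern is contained in the detected set."""
    active = set(active_aus)
    for pattern, emotion in COMMON_AU_PATTERNS.items():
        if pattern <= active:
            return emotion
    return "unknown"

print(label_from_aus([6, 12, 25]))   # -> "happiness"
```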

As with every computational practice, some obstacles in affect detection by facial processing need to be overcome in order to fully unlock the potential of the overall algorithm or method employed. The accuracy of modelling and tracking has been an issue, especially in the early stages of affective computing. As hardware evolves, new discoveries are made, and new practices are introduced, this lack of accuracy fades, leaving behind noise issues.

However, methods for noise removal exist, including, most recently, the Bacterial Foraging Optimization Algorithm. It is generally known that the accuracy of facial recognition (as distinct from affective state recognition) has not yet reached a level high enough to permit widespread, efficient use (there have been many attempts, especially by law enforcement, that failed to identify criminals successfully). Without improvements in the accuracy of the hardware and software used to scan faces, progress is considerably slowed.

Gestures are also used as a means of detecting a particular emotional state of the user, especially when used in conjunction with speech and face recognition. Depending on the specific action, gestures can be simple reflexive responses, like shrugging your shoulders when you don't know the answer to a question, or complex and meaningful, as when communicating in sign language. Without making use of any object or the surrounding environment, we can wave our hands, clap, or beckon; when using objects, we can point at them, move them, touch them, or handle them.

To be used efficiently for human-computer interaction, a computer should be able to recognize these gestures, analyze the context, and respond in a meaningful way. Many methods have been proposed for detecting body gestures. One approach makes use of 3D information about key body parts in order to obtain several important parameters, such as palm position or joint angles. Appearance-based systems, on the other hand, use images or videos directly for interpretation. Hand gestures have been a common focus of body gesture detection.
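For the 3D route, a typical derived parameter is a joint angle computed from tracked keypoints; below is a minimal sketch with hypothetical shoulder, elbow, and wrist coordinates standing in for the output of a depth camera or pose tracker.

```python
# Sketch: computing an elbow joint angle from three tracked 3D keypoints.
import numpy as np

def joint_angle(a, b, c):
    """Angle (degrees) at point b formed by segments b->a and b->c."""
    a, b, c = np.asarray(a, float), np.asarray(b, float), np.asarray(c, float)
    v1, v2 = a - b, c - b
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

# Hypothetical coordinates (e.g. from a depth camera), in meters:
shoulder, elbow, wrist = (0.0, 1.4, 0.0), (0.25, 1.15, 0.05), (0.30, 0.90, 0.30)
print("elbow angle:", joint_angle(shoulder, elbow, wrist))
```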

Aesthetics, in the world of art and photography, refers to the principles of the nature and appreciation of beauty. Judging beauty and other aesthetic qualities is a highly subjective task. Computer scientists at Penn State have treated the challenge of automatically inferring the aesthetic quality of pictures from their visual content as a machine learning problem, using a peer-rated online photo-sharing website as a data source. They extract visual features based on the intuition that such features can discriminate between aesthetically pleasing and displeasing images; a toy sketch of this feature-extraction idea appears after this paragraph. In e-learning applications, affective computing can be used to adjust the presentation style of a computerized tutor when a learner is bored, interested, frustrated, or pleased.
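Returning to the aesthetics example, here is a crude sketch of the feature-extraction idea using only average brightness and saturation per image, far simpler than the feature set used in the actual research; the image file name is hypothetical.

```python
# Sketch: two simple global visual features sometimes correlated with
# aesthetic ratings (brightness and average saturation). The feature
# choice here is illustrative, not the Penn State feature set.
import numpy as np
from PIL import Image

def simple_aesthetic_features(path):
    hsv = np.asarray(Image.open(path).convert("HSV"), dtype=float) / 255.0
    return {
        "mean_brightness": float(hsv[..., 2].mean()),   # V channel
        "mean_saturation": float(hsv[..., 1].mean()),   # S channel
    }

print(simple_aesthetic_features("photo.jpg"))   # hypothetical image file
```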

Psychological health services, e.g. counseling, benefit from affective computing applications when determining a client's emotional state. Robotic systems capable of processing affective information exhibit greater flexibility when working in uncertain or complex environments. Companion devices, such as digital pets, use affective computing abilities to enhance realism and provide a higher degree of autonomy.

Affective technology is applied as the core principle behind Cognitive Sensors, such as those being researched and created at The Affective Computing Company in Pittsburgh, PA. The combination of direct human interaction with repeated probing of the user's emotional state allows the Cognitive Sensors to record data over time and turn it into a chart of emotional state over time. Other potential applications center on social monitoring. For example, a car could monitor the emotions of all occupants and engage additional safety measures, such as alerting other vehicles if it detects the driver to be angry.

Affective computing has potential applications in human-computer interaction, such as affective mirrors that allow the user to see how he or she performs, emotion-monitoring agents that send a warning before one sends an angry email, or music players that select tracks based on mood. One idea, put forth by the Romanian researcher Dr. Nicu Sebe in an interview, is to analyze a person's face while they are using a certain product (he mentioned ice cream as an example); companies could then use such analysis to infer whether their product will or will not be well received by the respective market. Affective computing is also being applied to the development of communicative technologies for use by people with autism.
