Emotional State Analysis Model of Humanoid Robot in Human-Computer Interaction Process
Peng, Boxin
2022-05-06
Hindawi Journal of Robotics, Volume 2022, Article ID 8951671, 6 pages, https://doi.org/10.1155/2022/8951671
Research Article
Boxin Peng
School of Computer Science and Technology, Southeast University, Dhaka, Bangladesh
Correspondence should be addressed to Boxin Peng; 213193474@seu.edu.cn
Received 22 December 2021; Revised 27 March 2022; Accepted 30 March 2022; Published 6 May 2022
Academic Editor: Shan Zhong
Copyright © 2022 Boxin Peng. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The traditional humanoid robot dialogue system is generally built from templates: it responds well within the predefined dialogue domain but cannot generate good responses to content outside that domain. Its rules rely on manual design, and it lacks emotion detection for the interactive object. To address these shortcomings, this study designs an emotion analysis model based on a deep neural network to detect the emotions of interactive objects and builds an open-domain dialogue system for a humanoid robot. For affective state analysis language processing, language coding, feature analysis, and Word2vec are studied. An emotional state analysis model is then constructed to train the emotional state of a humanoid robot, and the training results are summarized.
1. Introduction
With the progress of science and technology, robots have gradually entered every aspect of people's lives. From industrial use to military applications, home service, education, and laboratories, robots are playing a significant role [1]. According to the three laws of robotics [2], the ultimate goal of robot development is to make robots imitate intelligent human behavior so that they can help humans complete tasks and achieve goals [3]. When humans and robots cooperate on a task, humans inevitably need to communicate well with the robot in order to complete the task more efficiently [4, 5]. Traditional human-computer interaction relies mainly on the keyboard, mouse, and other manual input devices to pass information to the computer, while the computer feeds information back to the human through the display and other peripherals. This kind of interaction is inconvenient and requires many peripherals, and in everyday life not everyone can be assumed to know how to use a computer [6]. In contrast, human-robot interaction takes place through familiar natural channels such as speech, vision, touch, hearing, and proximity [7]. This kind of interaction is familiar to most people and is more concise and efficient, so it helps people and robots interact more effectively to complete tasks [8]. The emotion analysis model of a humanoid robot analyzes and identifies the emotional information of the interactive object during interaction and is an important part of the dialogue system [9]. During interaction, the language of the interactive object carries rich emotional information, and its text content is a high-level expression of human thinking.
2. Literature Review
The main implementation methods of traditional text sentiment analysis fall into two categories: sentiment-dictionary methods and machine learning algorithms. The first category performs text emotion analysis on the basis of an emotion dictionary. Well-known emotion dictionaries include HowNet, the Chinese polarity sentiment word list NTUSD from National Taiwan University, and the English lexical database WordNet from Princeton University [10]. The emotion analysis process based on the emotion dictionary is shown in Figure 1.
Figure 1: The emotion analysis process based on the emotion dictionary. (Flowchart stages: sentence, word segmentation, emotional word matching against the emotional dictionary, tagging word weights, searching for degree words and turning words before emotion words, marking and counting emotional words, and calculating the emotional score.)
The second category treats sentiment analysis as a text classification task; commonly used methods include Naive Bayes, SVM, and CRF. Li [11] compared Naive Bayes, the maximum entropy model, and the support vector machine (SVM) on the emotion classification of film reviews and found that SVM achieved the best classification performance. Huang [12] (2021) used a multistrategy method with a hierarchical SVM structure to classify the emotional polarity of Chinese microblogs; the experiments showed that the SVM-based multistrategy method performed best and that introducing theme-related features improved accuracy to some extent. Lu [13] (2018) experimented with SVM, Bayes, and other classification algorithms, together with information gain and other feature selection algorithms, for Chinese microblog sentiment analysis, using TF-IDF as the feature weight. The results show that the best classification performance is obtained with TF-IDF as the feature weight, SVM as the classifier, and information gain as the feature selection algorithm.
With the development of deep learning, deep learning models have also been applied to text classification. Law [14] (2013) proposed DISA, a reinforcement learning framework based on CNN and LSTM that uses Chinese audio information and pinyin as emotion analysis features, and achieved good results. Cela [15] (2013) applied Dempster-Shafer (D-S) evidence theory to integrate emotional information from visual, auditory, and other sources and analyzed the transition rules of the emotional state under the simultaneous action of two factors. The resulting emotion model was applied to an emotional robot system so that the robot could generate emotions in response to external stimuli and display the corresponding expressions; the experimental results showed that the affective model is effective. Tidoni [16] (2014) combined the ideas of recurrent and convolutional neural networks to overcome the limitation of CNNs in capturing long-distance context and proposed RCNN for text classification. BaTula [17] (2017) proposed a game-based cognitive and emotional interaction model for robots built on the PAD (pleasure-arousal-dominance) emotion space, targeting the lack of emotion and the low participation of users in existing open-domain human-computer interaction systems. The experimental results show that, compared with other cognitive interaction models, the proposed model reduces the robot's dependence on external emotional stimuli and effectively guides participants to take part in human-computer interaction.
Information can be transmitted in various forms. Because of technological limitations, robots cannot obtain complete information about the interactive objects, so intention prediction becomes important and essential [18]. Human interaction usually requires continuous prediction of intention; for example, in a conversation, people constantly try to predict where the conversation is heading or how others will react [19]. Therefore, to make human-robot interaction more like human-human interaction, intention prediction is essential in human-computer interaction. Depending on the type of human-computer interaction, intention prediction takes different forms and handles different situations [20]. In cooperative human-computer interaction, the need for intention prediction arises mainly from incomplete interactive information: humans and robots must cooperate to complete tasks, and when information is incomplete, predicting human intentions is necessary to complete tasks better and more efficiently.
Because of polysemy and irony in Chinese, dictionary-based methods cannot achieve high accuracy and are not suitable for cross-domain research. With the geometric growth of information content, data-driven machine learning models for emotion analysis of irregular documents have good application prospects.
3. Emotion Recognition Process and Data Acquisition and Preprocessing
3.1. Emotion Recognition Process. During interaction, to obtain text information, the speech of the interactive object is recorded through a microphone and converted into audio, and the speech recognition module then obtains the text through speech recognition. The text is preprocessed and fed into the emotion analysis model, which outputs the emotional state of the interactive object. For the emotion analysis model itself, this paper follows the machine learning approach and builds a data-driven model [21]. The selected algorithm is trained offline on the data sets and the trained model is saved; the saved model is then loaded and used for prediction. The text emotion analysis process based on machine learning is shown in Figure 2.
3.2. Data Acquisition and Preprocessing. The data work consists of data acquisition and data preprocessing, described as follows.
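The offline-train, save, load, and predict workflow of Section 3.1 (Figure 2) can be illustrated with a small, self-contained sketch. It is not the paper's implementation: it assumes a multinomial Naive Bayes classifier (the "Polynomial Bayes" model listed in Section 3.2.2) with add-one smoothing, a regex tokenizer standing in for Jieba, a JSON string standing in for the saved model file, and a four-sentence toy corpus. The names preprocess, MultinomialNB, to_json, and from_json are hypothetical.

```python
import json
import math
import re
from collections import Counter

# Hypothetical preprocessing: collapse runs of repeated punctuation
# (e.g. "!!!" -> "!"), then split into word and punctuation tokens.
# The regex tokenizer is only a stand-in for a Chinese segmenter such as Jieba.
REPEATED_PUNCT = re.compile(r"([!?.,])\1+")
TOKEN = re.compile(r"\w+|[^\w\s]")


def preprocess(text):
    text = REPEATED_PUNCT.sub(r"\1", text.lower())
    return TOKEN.findall(text)


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.word_counts = {}              # label -> Counter of token counts
        self.doc_counts = Counter(labels)  # label -> number of documents
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def to_json(self):
        # Serialize the trained parameters so the model can be stored offline.
        return json.dumps({
            "word_counts": {k: dict(v) for k, v in self.word_counts.items()},
            "doc_counts": dict(self.doc_counts),
            "vocab": sorted(self.vocab),
        })

    @classmethod
    def from_json(cls, payload):
        # Rebuild a model from the stored parameters.
        state = json.loads(payload)
        model = cls()
        model.word_counts = {k: Counter(v) for k, v in state["word_counts"].items()}
        model.doc_counts = Counter(state["doc_counts"])
        model.vocab = set(state["vocab"])
        return model

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Offline training on a toy labeled corpus (hypothetical data).
train_texts = ["good book good content!!!", "great happy day",
               "bad boring book", "sad awful day"]
train_labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNB().fit([preprocess(t) for t in train_texts], train_labels)

# Save the trained model, then load it again for prediction.
saved = model.to_json()
loaded = MultinomialNB.from_json(saved)

print(loaded.predict(preprocess("A good, happy book!!")))  # positive
print(loaded.predict(preprocess("Boring, sad book")))      # negative
```

Separating fitting from prediction mirrors the design described in Section 3.1: the costly training runs once offline, and the dialogue system only has to reload the stored parameters before scoring each recognized utterance.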
Figure 2: Text emotion analysis process based on machine learning. (Flowchart: training data → data preprocessing → model training; test data → data preprocessing → model → sentiment analysis.)
3.2.1. Data Acquisition. To build the emotion analysis model of the humanoid robot, we used the "Microblog Cross-Language Emotion Recognition Dataset" published by the International Conference on Natural Language Processing and Chinese Computing (NLPCC) in 2019 and 2020. The corpus is divided into positive and negative categories, containing 12,153 positive items and 12,178 negative items. The corpus content comes from microblogs, and the sentences are colloquial, which makes it suitable for training the emotion analysis model.
3.2.2. Data Preprocessing. To balance the positive and negative categories, 25 items were removed from the negative corpus by downsampling, so that each class contains 12,153 items. Because most of the corpus is taken from Weibo, it contains many emoticons and repeated punctuation marks, whereas in practical application, sentences produced by speech recognition do not contain multiple repeated punctuation marks [22]. For these reasons, redundant punctuation marks and emoticons were deleted during preprocessing and were not treated as features. Word segmentation was performed with the Jieba word segmentation toolkit. A comparison of punctuation and facial expressions in textual processing is shown in Table 1.
Table 1: Comparison table of punctuation and facial expressions in textual processing. Columns: punctuation and facial expression; textual processing. Textual-processing entries: Doubt; Amazing; End, helpless; Plaint; Awkward; Happy; Unhappy.
In the subsequent construction of emotion analysis models, we experimented with traditional machine learning models such as Bernoulli Naive Bayes, multinomial Naive Bayes, and support vector machine (SVM), and with neural network models such as Bi-LSTM, Bi-LSTM with an attention mechanism, and Text-CNN.
4. Affective State Analysis Language Processing
4.1. Language Coding. In natural language processing (NLP), text is generally segmented first, with characters, words, and phrases serving as the minimal units of Chinese. One-hot encoding is the simplest representation of such features: each distinct feature has its own state bit. For example, after word segmentation, the sentence "This is a good book with good content!" yields tokens such as
["This", "book", "Content", "Good", "!"]
These five distinct features can be represented by one-hot encoding in order of word occurrence. The one-hot codes of some of the features are:
"This": [1, 0, 0, 0, 0, 0, 0, 0, 0]
"book": [0, 1, 0, 0, 0, 0, 0, 0, 0]
"Content": [0, 0, 1, 0, 0, 0, 0, 0, 0]
"Good": [0, 0, 0, 1, 0, 0, 0, 0, 0]
"!": [0, 0, 0, 0, 0, 0, 0, 0, 1]
In one-hot encoding, each feature occupies a single dimension, and the number of dimensions equals the number of distinct features. To some extent, one-hot encoding expands the features, but when the dictionary is large, this representation takes up a lot of space and the computational dimension becomes large.
The bag-of-words model is a vector space model that records, at the position given by each word's index, the number of times that word occurs, so that the whole sentence is represented by a single vector. For example:
"This is a good book with good content!": [1, 2, 2, 1, 1, 1, 1, 1, 1]
4.2. Feature Analysis. In the text vector space model, commonly used feature selection methods include the chi-square test, information gain, mutual information, and TF-IDF. TF-IDF combines word distribution information across documents in the bag-of-words model and highlights keywords by computing the absolute term frequency (TF) and the inverse document frequency (IDF).
The absolute term frequency (TF) is the number of times a feature item occurs in training text D. Important words in a text are often emphasized multiple times, so absolute term frequency readily highlights them. The inverse document frequency (IDF) is calculated as
$$\mathrm{IDF} = \log \frac{N}{n} \qquad (1)$$
where N is the total number of documents in the training set and n is the number of documents in the training set in which the feature item appears. IDF highlights words that appear less frequently but have strong classification ability. In practice, IDF is smoothed to avoid losing rare words in the corpus, and the TF-IDF weight of feature item i in document j is
$$\mathrm{TF\text{-}IDF} = \mathrm{TF} \times \mathrm{IDF} = tf_{ij} \times \log \frac{N+1}{n} \qquad (2)$$
min_{w,b,ζ} (1/2) wᵀw + C Σ_{i=1} ζ_i, s.t. y_i (wᵀϕ