Emotional State Analysis Model of Humanoid Robot in Human-Computer Interaction Process

Hindawi Journal of Robotics, Volume 2022, Article ID 8951671, 6 pages. https://doi.org/10.1155/2022/8951671

Research Article

Boxin Peng
School of Computer Science and Technology, Southeast University, Dhaka, Bangladesh
Correspondence should be addressed to Boxin Peng; 213193474@seu.edu.cn

Received 22 December 2021; Revised 27 March 2022; Accepted 30 March 2022; Published 6 May 2022
Academic Editor: Shan Zhong

Copyright © 2022 Boxin Peng. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract. The traditional humanoid robot dialogue system is generally built from templates: it can respond well within the predefined dialogue domain but cannot generate good responses to content outside that domain. The rules of such a dialogue system rely on manual design and lack emotion detection of the interactive object. In view of these shortcomings of traditional methods, this study designs an emotion analysis model based on a deep neural network to detect the emotion of interactive objects and builds an open-domain dialogue system for a humanoid robot. For affective state analysis language processing, language coding, feature analysis, and Word2vec are investigated. Then, an emotional state analysis model is constructed and trained, and the training results are summarized.

1. Introduction

With the progress of science and technology, robots have gradually entered every aspect of people's lives. From industrial use to military applications, home service, education, and laboratories, robots are playing a significant role [1]. According to the three laws of robotics [2], the ultimate goal of robot development is to make robots imitate human intelligent behavior, to help humans better complete tasks, and to achieve goals [3]. When humans and robots cooperate to complete tasks, humans inevitably need to communicate well with the robot in order to work more efficiently [4, 5]. Traditional human-computer interaction relies mainly on the keyboard, mouse, and other manual input devices to convey information to the computer, while the computer feeds information back through the display and other peripherals. This form of interaction is inconvenient and requires many peripherals, and in ordinary life it cannot be assumed that everyone can use a computer [6]. Different from traditional human-computer interaction, human-machine interaction is carried out through familiar natural channels, such as speech, vision, touch, hearing, and proximity [7]. This kind of interaction is familiar to most people and is more concise and efficient, so it can help people and robots cooperate more effectively to complete tasks [8]. The emotion analysis model of a humanoid robot analyzes and identifies the emotional information of the interactive object in the process of interaction and is an important part of the dialogue system [9]. During interaction, the language of the interactive object carries rich emotional information, and its text content is a high-level expression of human thinking.

2. Literature Review

The main implementation methods of traditional text sentiment analysis are generally divided into two kinds: the sentiment dictionary and the machine learning algorithm. Text emotion analysis is usually based on an emotion dictionary. At present, relatively well-known emotion dictionaries include HowNet, the Chinese polarity lexicon NTUSD from Taiwan University, and the English emotion dictionary WordNet from Princeton University [10].
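The dictionary-based scoring idea can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: the lexicons below are made-up stand-ins rather than entries from HowNet or NTUSD, and a real system would operate on segmented Chinese text rather than whitespace-split English tokens.

```python
# Toy sketch of dictionary-based sentiment scoring.
# The lexicons are illustrative stand-ins, not real dictionary entries.
SENTIMENT = {"good": 1.0, "great": 2.0, "bad": -1.0, "boring": -1.5}
DEGREE = {"very": 1.5, "slightly": 0.5}   # intensifiers scale the next sentiment hit
NEGATION = {"not", "never"}               # negators flip the next sentiment hit

def dictionary_score(tokens):
    """Sum the polarity of sentiment words, adjusted by degree and
    negation words found in a small window before each one."""
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in SENTIMENT:
            continue
        weight = SENTIMENT[tok]
        # scan up to two tokens before the sentiment word
        for prev in tokens[max(0, i - 2):i]:
            if prev in DEGREE:
                weight *= DEGREE[prev]
            elif prev in NEGATION:
                weight *= -1.0
        score += weight
    return score

print(dictionary_score("this book is very good".split()))   # 1.5
print(dictionary_score("the plot is not good".split()))     # -1.0
```

A positive score is read as positive polarity and a negative score as negative polarity; the pipeline in Figure 1 adds word segmentation and dictionary-driven weight tagging before this scoring step.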
The emotion analysis process based on the emotion dictionary is shown in Figure 1.

[Figure 1: The emotion analysis process based on the emotion dictionary. Pipeline: sentence matching → word segmentation → marking and counting emotional words (searching for degree words and negation words before each emotion word) → tagging word weights against the emotional dictionary → calculating the emotional score.]

When machine learning is used for sentiment analysis, the task is treated as text classification. Commonly used methods include Naive Bayes, SVM, and CRF. Li [11] compared Naive Bayes, the maximum entropy model, and the support vector machine in the emotion classification of film reviews and found that SVM achieved the best classification effect. Huang [12] (2021) used a multistrategy method with a hierarchical SVM structure to classify the emotional polarity of Chinese microblogs; the experiments show that the SVM-based multistrategy method performs best, and introducing topic-related features improves accuracy to some extent. Lu [13] (2018) experimented with SVM, Bayes, and other classification algorithms together with information gain and other feature selection algorithms in Chinese microblog sentiment analysis, taking TF-IDF as the feature weight. The experimental results show that using TF-IDF as the feature weight, SVM as the classification algorithm, and information gain as the feature selection algorithm achieves the best classification effect.

With the development of deep learning, deep models have also been applied to text classification. Law [14] (2013) proposed the reinforcement learning framework DISA based on CNN and LSTM, taking Chinese audio information and pinyin as emotion analysis features, and achieved good results. Cela [15] (2013) applied Dempster-Shafer evidence theory to fuse emotional information from vision, sound, and other modalities and analyzed the emotional state transfer rule caused by the simultaneous action of the two factors; the emotion model was then applied to an emotional robot system, so that the robot generates emotions according to external stimuli and makes the corresponding expression. The experimental results show that the affective model is effective. Tidoni [16] (2014) combined the idea of a recurrent neural network with that of a convolutional neural network to overcome the limitations of CNNs in capturing long-distance context and proposed RCNN for text classification. BaTula [17] (2017) proposed a game-based cognitive and emotional interaction model for robots based on the PAD (pleasure-arousal-dominance) emotion space, addressing the lack of emotion and low participation of members in existing open-domain human-computer interaction systems. Experimental results show that, compared with other cognitive interaction models, the proposed model reduces the robot's dependence on external emotional stimuli and effectively guides members to participate in human-computer interaction.

Information is transmitted in many forms. Due to technological limitations, a robot cannot completely obtain the information of its interactive objects, so intention prediction becomes important and essential [18]. Human interaction usually requires continuous prediction of intention; for example, in a conversation, people constantly try to predict the direction of the future conversation or the reactions of others [19]. Therefore, in order to make human-robot interaction more like human-human interaction, intention prediction in human-computer interaction is essential. Depending on the type of human-computer interaction, intention prediction also takes different processing forms [20]. In cooperative human-computer interaction, the need for intention prediction arises mainly from incomplete interactive information: humans and robots must cooperate to complete tasks, and given the incompleteness of information, predicting human intentions is necessary to complete tasks better and more efficiently.

Because of polysemy and irony in Chinese, the method based on an emotion dictionary cannot achieve high accuracy and is not suitable for cross-domain research. With the geometric increase in information content, building a data-driven machine learning model for emotion analysis of irregular documents has good application prospects.

3. Emotion Recognition Process and Data Acquisition Preprocessing

3.1. Emotion Recognition Process. In the process of interaction, the voice of the interactive object is recorded through the microphone and converted into audio, and the speech recognition module then obtains the text through speech recognition. The text is preprocessed and fed into the emotion analysis model, which outputs the emotional state of the interactive object. For the construction of the emotion analysis model, this paper adopts the machine learning idea of building a data-driven model [21]. A selected algorithm is trained offline on data sets, the trained model is saved, and the saved model is then loaded and used for prediction. The text emotion analysis process based on machine learning is shown in Figure 2.
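The offline-train / save / load-for-prediction cycle described in Section 3.1 can be sketched as follows. This is a minimal workflow illustration only: the "model" is a stand-in word-count heuristic invented for this sketch, not the paper's SVM, and the file path is arbitrary.

```python
import os
import pickle
import tempfile

# Minimal sketch of the offline-training / save / load cycle.
# The "model" is a stand-in token-count heuristic, not the paper's SVM.
def train(corpus):
    """'Train' by counting how often each token co-occurs with each label."""
    counts = {}
    for tokens, label in corpus:
        for tok in tokens:
            pos, neg = counts.get(tok, (0, 0))
            counts[tok] = (pos + (label == 1), neg + (label == 0))
    return counts

def predict(model, tokens):
    pos = sum(model.get(t, (0, 0))[0] for t in tokens)
    neg = sum(model.get(t, (0, 0))[1] for t in tokens)
    return 1 if pos >= neg else 0

corpus = [(["good", "book"], 1), (["boring", "plot"], 0)]
model = train(corpus)

# Offline training done: persist the model, then reload it for prediction,
# mirroring the save/load step of the pipeline in Figure 2.
path = os.path.join(tempfile.gettempdir(), "emotion_model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(predict(loaded, ["good", "plot"]))   # 1
```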
[Figure 2: Text emotion analysis process based on machine learning. Training data and test data each pass through data preprocessing; the training data is used for model training, and the trained model performs sentiment analysis on the test data.]

3.2. Data Acquisition and Preprocessing. The data work includes data acquisition and data preprocessing. The specific contents are as follows.

3.2.1. Data Acquisition. In building the emotion analysis model of the humanoid robot, we used the "Microblog Cross-Language Emotion Recognition Dataset" published by the International Conference on Natural Language Processing and Chinese Computing (NLPCC) in 2019 and 2020. The corpus is divided into positive and negative categories: 12,153 positive items and 12,178 negative items. The corpus content comes from microblogs, and the sentences are colloquial, which makes it suitable for training the emotion analysis model.

3.2.2. Data Preprocessing. In data preprocessing, to ensure the balance of positive and negative categories in the corpus, 25 items were deleted from the negative-label corpus through downsampling, unifying the positive and negative samples at 12,153 items each. Since most of the corpus is taken from Weibo, it contains many emoticons and repeated punctuation marks. Moreover, in practical application, a sentence produced by speech recognition will not contain multiple repeated punctuation marks [22]. Based on these points, redundant punctuation marks and emoticons were deleted in preprocessing and were not regarded as features. A comparison table of punctuation and facial expressions in textual processing is shown in Table 1. For word segmentation, the Jieba word segmentation toolkit was used in this study.

[Table 1: Comparison table of punctuation and facial expressions in textual processing. The original punctuation/emoticon glyphs were lost in extraction; the textual-processing labels are: doubt, amazement, end/helpless, plaint, awkward, happy, and unhappy.]

In the subsequent construction of emotion analysis models, we experimented with traditional machine learning models such as Bernoulli naive Bayes, multinomial naive Bayes, and the support vector machine (SVM), and with neural network models such as Bi-LSTM, Bi-LSTM combined with an attention mechanism, and Text-CNN.

4. Affective State Analysis Language Processing

4.1. Language Code. In natural language processing (NLP), sentences are generally segmented, with characters, words, and phrases as the minimum units of Chinese. One-hot coding is the simplest representation of such features: each distinct feature has its own state bit. For example, after word segmentation of the sentence "This is a good book with good content!", the distinct features can be indexed in order of occurrence, and the one-hot codes of some features can be expressed as follows:

"This": [1, 0, 0, 0, 0, 0, 0, 0, 0]
"book": [0, 1, 0, 0, 0, 0, 0, 0, 0]
"Content": [0, 0, 1, 0, 0, 0, 0, 0, 0]
"Good": [0, 0, 0, 1, 0, 0, 0, 0, 0]
"!": [0, 0, 0, 0, 0, 0, 0, 0, 1]

In one-hot coding, each feature occupies a single dimension, and the dimensionality equals the number of distinct features. To a certain extent, one-hot coding expands the features, but when the dictionary is large, this representation takes up a great deal of space and the computation dimension is large.

The bag-of-words model is a vector space model in which the count of each word is placed at the position of its index, representing the whole sentence. For the example above:

"This is a good book with good content!": [1, 2, 2, 1, 1, 1, 1, 1, 1].

4.2. Characteristics Analysis. In the text vector space model, commonly used feature selection methods include the chi-square test, information gain, the mutual information method, and TF-IDF. TF-IDF combines word distribution information across documents in the bag-of-words model, highlighting keywords by calculating the absolute term frequency (TF) and the inverse document frequency (IDF).

Absolute term frequency (TF) is the frequency of occurrence of the feature item in training text D. Important words in a text are often emphasized multiple times, and absolute term frequency easily highlights these words. The inverse document frequency IDF is calculated as in the following formula:

IDF = log(N / n). (1)

N represents the total number of documents in the training set, and n represents the number of training documents in which the feature item appears. IDF highlights words that appear less frequently but have strong classification ability. In the actual calculation, IDF is smoothed to avoid dropping words that are rare in the corpus:

TF-IDF = TF × IDF = tf_ij × log((N + 1) / (n + 1)). (2)
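The three representations of Sections 4.1 and 4.2 can be illustrated in a few lines of Python. This is a toy sketch on made-up, pre-tokenized documents rather than the paper's microblog corpus; the TF-IDF uses the smoothed form of formula (2).

```python
import math

# Toy documents, already tokenized; invented for illustration only.
docs = [["this", "book", "good", "good"],
        ["this", "film", "bad"]]

# Vocabulary indexed in a fixed (sorted) order.
vocab = sorted({tok for doc in docs for tok in doc})

def one_hot(token):
    """One-hot coding: a single state bit per distinct feature."""
    return [1 if v == token else 0 for v in vocab]

def bag_of_words(doc):
    """Bag-of-words: the count of each word at its vocabulary index."""
    return [doc.count(v) for v in vocab]

def tf_idf(doc, token):
    """Smoothed TF-IDF per formula (2): tf * log((N + 1) / (n + 1))."""
    tf = doc.count(token)
    n = sum(1 for d in docs if token in d)   # documents containing the token
    N = len(docs)                            # total number of documents
    return tf * math.log((N + 1) / (n + 1))

print(one_hot("book"))                       # [0, 1, 0, 0, 0]
print(bag_of_words(docs[0]))                 # [0, 1, 0, 2, 1]
print(round(tf_idf(docs[0], "good"), 3))     # 0.811
```

Note how "good" gets a higher TF-IDF weight than "this": it occurs twice in the first document but appears in only one of the two documents, which is exactly the key-information effect described above.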
4.3. Word2vec. Word2vec was proposed by Google in 2013. It represents words through dense features, also known as distributed representation. There are two models for Word2vec training, namely, the Continuous Bag-of-Words (CBOW) model and the Skip-Gram model.

The improved training methods of Word2vec are divided into two types, one based on Hierarchical Softmax and the other on Negative Sampling, both of which simplify computation and accelerate training. In Hierarchical Softmax, the output of the projection layer under the CBOW model is the mean of the input word vectors, and the output of the projection layer under the Skip-Gram model is the same as the input. To avoid calculating the probability of all words, the Hierarchical Softmax approach uses a Huffman tree instead of the Softmax mapping from the projection layer to the output layer, while Negative Sampling approximates the objective by updating only a small sample of negative words. Word2vec is widely used in various natural language models, and word vectors are also a pretraining method that can bring a neural network to a better training starting point and make the network easier to optimize [23].

Compared with one-hot coding, dense features are easy to compute and do not suffer from dimension explosion, giving strong generalization ability. Dense feature representation can also provide similarity information between features.
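Training Word2vec requires a corpus and a training loop, but the property claimed above — that dense vectors carry similarity information one-hot vectors cannot — can be shown with cosine similarity on small made-up vectors (the numbers below are invented for illustration, not trained Word2vec output).

```python
import math

# Made-up 3-dimensional "word vectors"; real Word2vec vectors are trained
# and typically have 100-300 dimensions.
vectors = {
    "good":  [0.9, 0.1, 0.3],
    "great": [0.8, 0.2, 0.4],
    "bad":   [-0.7, 0.5, 0.1],
}

def cosine(u, v):
    """Cosine similarity: dot product over the product of vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Dense vectors: near-synonyms score higher than opposites.
print(cosine(vectors["good"], vectors["great"]) >
      cosine(vectors["good"], vectors["bad"]))      # True

# One-hot vectors: every distinct pair has cosine 0, so no similarity signal.
print(cosine([1, 0, 0], [0, 1, 0]))                 # 0.0
```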
This distributed representation of words is widely used in natural language processing tasks such as Chinese word segmentation, sentiment analysis, and reading comprehension.

5. Construction of the Affective State Analysis Model

5.1. Introduction to the Model. The support vector machine (SVM) is a binary classification algorithm that can also be used for text classification. Its basic idea is to find the hyperplane with the largest margin in the feature space. Its advantages are that it is effective in high-dimensional spaces and still performs well when the number of dimensions is larger than the number of samples, and different kernel functions can be specified in its design. However, when the number of features is much larger than the number of samples, SVM performance is poor.

For the training data set T = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)} with i = 1, 2, ..., N, y_i ∈ {−1, +1} stands for the negative and positive labels, x_i is a sample (sentence), and N is the number of samples. The optimization problem solved by the support vector machine is shown in the following formula:

min_{w,b,ζ} (1/2) wᵀw + C ∑_{i=1}^{N} ζ_i,
s.t. y_i(wᵀφ(x_i) + b) ≥ 1 − ζ_i, ζ_i ≥ 0, i = 1, 2, ..., N. (3)

In formula (3), w is the normal vector of the separating hyperplane, ζ_i is a slack variable, and φ(x_i) is the mapping function. The dual form of the problem can be expressed as

min_α (1/2) αᵀQα − eᵀα,
s.t. yᵀα = 0, 0 ≤ α_i ≤ C, i = 1, 2, ..., N. (4)

In formula (4), e stands for the all-ones vector, C is the upper bound on the Lagrange multipliers, and Q is a positive semidefinite matrix of shape (N, N):

Q_ij = y_i y_j K(x_i, x_j). (5)

In formulas (3)–(5), K(x_i, x_j) = φ(x_i)ᵀφ(x_j) is the kernel. The decision function of the support vector machine is expressed as sign(∑_i y_i α_i K(x_i, x) + b).

5.2. Training Process. In order to make the support vector machine output category probabilities, Platt scaling was used in this paper. This is a parameterized method that fits the output values with a logistic regression model; that is, the Sigmoid function maps the values into [0, 1], so that the output values of the original model are mapped to probability values, as shown in the following formula:

P(y = 1 | x) ≈ P_{A,B}(f) ≡ 1 / (1 + exp(Af + B)). (6)

In formula (6), f(x) is the decision function of the support vector machine, which outputs a value for any input x, and A and B are trainable parameters.

The objective of training is the cross-entropy loss, as shown in the following formulas:

min_{z=(A,B)} F(z) = −∑_i (t_i log(p_i) + (1 − t_i) log(1 − p_i)), (7)

t_i = (N₊ + 1)/(N₊ + 2) if y_i = +1, and t_i = 1/(N₋ + 2) if y_i = −1, i = 1, 2, ..., l, (8)

where N₊ and N₋ are the numbers of positive and negative training samples. By Platt scaling, the support vector machine can output category probability values. The basic idea is that the closer a point is to the separating surface, the less confident its predicted label, and the farther a point is from the separating surface, the more confident its predicted label.

5.3. Training Results. Support vector machines generally adopt a linear kernel in text classification. With a linear kernel and bag-of-words input, the SVM model achieves an F1 value of 0.763, an accuracy of 76.81%, and an AUC value of 0.821. With a linear kernel and TF-IDF input, the SVM model achieves an F1 value of 0.795, an accuracy of 78.94%, and an AUC value of 0.863. Compared with the bag-of-words model, the SVM with TF-IDF feature input achieves better results on the current data set.
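The Platt-scaling step of Section 5.2 can be sketched from scratch: fit A and B in formula (6) by gradient descent on the cross-entropy of formula (7), using the soft targets of formula (8). The decision values and labels below are made-up stand-ins for SVM outputs, not results from the paper's model.

```python
import math

# Made-up SVM decision values f(x) and labels for six samples.
fs = [2.1, 1.3, 0.4, -0.3, -1.1, -2.0]
ys = [+1, +1, +1, -1, -1, -1]

n_pos = sum(1 for y in ys if y == +1)
n_neg = len(ys) - n_pos
# Soft targets from formula (8): avoid probabilities of exactly 0 or 1.
ts = [(n_pos + 1) / (n_pos + 2) if y == +1 else 1 / (n_neg + 2) for y in ys]

def prob(f, A, B):
    """Formula (6): map a decision value to a probability."""
    return 1.0 / (1.0 + math.exp(A * f + B))

A, B, lr = 0.0, 0.0, 0.1
for _ in range(2000):                  # plain gradient descent on formula (7)
    dA = dB = 0.0
    for f, t in zip(fs, ts):
        p = prob(f, A, B)
        # d(cross-entropy)/dA = (p - t) * (-f), d/dB = (p - t) * (-1),
        # since dp/dA = -f * p * (1 - p) and dp/dB = -p * (1 - p).
        dA += (p - t) * (-f)
        dB += (p - t) * (-1.0)
    A -= lr * dA
    B -= lr * dB

# Large positive margins map to high probability, large negative to low.
print(prob(2.1, A, B) > 0.5 > prob(-2.0, A, B))   # True
```

Production code would normally get this for free from a library's probability-calibrated SVM rather than hand-fitting A and B; the sketch only makes the mechanics of formulas (6)–(8) concrete.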
6. Conclusion

The main steps of building an emotion analysis model include the emotion recognition process, data acquisition and preprocessing, affective state analysis language processing, emotional state analysis model construction, and model integration [24]. The effects of TF-IDF features and the bag-of-words model on the training results are analyzed and explored in the experiments. The results show that, among single models, the TF-IDF feature combined with the support vector machine achieves the optimal result. The stacking strategy and the soft voting strategy were compared in model integration, and the best performance was obtained by stacking with a support vector machine learner; this is the main innovation of this paper. The introduction of the methods used in each component is rather superficial, which is the main shortcoming of this paper. This paper studies the design principle of a sentiment analysis model based on the support vector machine. Based on the experimental data, the influence of the attention mechanism on the neural network model is explored, and the performance of the traditional machine learning model is compared with different inputs. In the single-model experiment, the support vector machine combined with TF-IDF achieved the best classification effect, with an F1 value of 0.795, an accuracy of 78.94%, and an AUC value of 0.863.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author declares no conflicts of interest.

References

[1] L. Gabriella, G. Márta, K. Veronika et al., "Emotion attribution to a non-humanoid robot in different social situations," PLoS One, vol. 9, no. 12, Article ID e114207, 2014.
[2] A. Rozanska and M. Podpora, "Multimodal sentiment analysis applied to interaction between patients and a humanoid robot Pepper," IFAC-PapersOnLine, vol. 52, no. 27, pp. 411–414, 2019.
[3] M. Viríková and S. Peter, "Teach your robot how you want it to express emotions," Advances in Intelligent Systems and Computing, vol. 316, pp. 81–92, 2015.
[4] X. Ke, Y. Shang, and K. Lu, "Based on hyper works humanoid robot facial expression simulation," Manufacturing Automation, vol. 137, no. 1, pp. 118–121, 2015.
[5] F. Azni Jafar, N. Abdullah, N. Blar, M. N. Muhammad, and A. M. Kassim, "Analysis of human emotion state in collaboration with robot," Applied Mechanics and Materials, vol. 465-466, pp. 682–687, 2013.
[6] Z. Shao, R. Chandramouli, K. P. Subbalakshmi, and C. T. Boyadjiev, "An analytical system for user emotion extraction, mental state modeling, and rating," Expert Systems with Applications, vol. 124, no. 7, pp. 82–96, 2019.
[7] J. Hernandez-Vicen, S. Martinez, J. Garcia-Haro, and C. Balaguer, "Correction of visual perception based on neuro-fuzzy learning for the humanoid robot TEO," Sensors, vol. 18, no. 4, pp. 972-973, 2018.
[8] A. Zaraki, D. Mazzei, M. Giuliani, and D. De Rossi, "Designing and evaluating a social gaze-control system for a humanoid robot," IEEE Transactions on Human-Machine Systems, vol. 44, no. 2, pp. 157–168, 2014.
[9] J. Wainer, B. Robins, F. Amirabdollahian, and K. Dautenhahn, "Using the humanoid robot KASPAR to autonomously play triadic games and facilitate collaborative play among children with autism," IEEE Transactions on Autonomous Mental Development, vol. 6, no. 3, pp. 183–199, 2014.
[10] L. Tang, Z. Li, X. Yuan, W. Li, and A. Liu, "Analysis of operation behavior of inspection robot in human-machine interaction," Modern Manufacturing Engineering, vol. 3, no. 3, pp. 7-8, 2021.
[11] Z. Li and H. Wang, "Design and implementation of mobile robot remote human-computer interaction software platform," Computer Measurement & Control, vol. 25, no. 4, pp. 5-6, 2017.
[12] H. Huang, N. Liu, M. Hu, Y. Tao, and L. Kou, "Robot cognitive and affective interaction model based on game," Journal of Electronics and Information Technology, vol. 43, no. 6, pp. 8-9.
[13] Lufei, Y. Jiang, and G. Tian, "Autonomous cognition and personalized selection of robot service based on emotion-spatiotemporal information," Robot, vol. 40, no. 4, pp. 9-10.
[14] J. Law, P. Shaw, and M. Lee, "A biologically constrained architecture for developmental learning of eye-head gaze control on a humanoid robot," Autonomous Robots, vol. 35, no. 1, pp. 77–92, 2013.
[15] A. Cela, J. Yebes, R. Arroyo, L. R. Bergasa, and E. López, "Complete low-cost implementation of a teleoperated control system for a humanoid robot," Sensors, vol. 13, no. 2, pp. 1385–1401, 2013.
[16] E. Tidoni, P. Gergondet, A. Kheddar, and S. M. Aglioti, "Audio-visual feedback improves the BCI performance in the navigational control of a humanoid robot," Frontiers in Neurorobotics, vol. 8, 2014.
[17] A. M. Batula, Y. E. Kim, and H. Ayaz, "Virtual and actual humanoid robot control with four-class motor-imagery-based optical brain-computer interface," BioMed Research International, vol. 2017, Article ID 1463512, 13 pages, 2017.
[18] T. Sato, Y. Nishida, J. Ichikawa, and Y. Hatamura, "Active understanding of human intention by a robot through monitoring of human behavior," in Proceedings of the IEEE/RSJ/GI International Conference on Intelligent Robots and Systems (IROS '94), pp. 405–414, IEEE, Munich, Germany, September 1994.
[19] H. Clint, Modelling Intention Recognition for Intelligent Agent Systems, DTIC, Mexico City, Mexico, 2004.
[20] K. A. Tahboub, "Intelligent human-machine interaction based on dynamic Bayesian networks probabilistic intention recognition," Journal of Intelligent and Robotic Systems, vol. 45, no. 1, pp. 31–52, 2006.
[21] T. Koolen, S. Bertrand, G. Thomas et al., "Design of a momentum-based control framework and application to the humanoid robot Atlas," International Journal of Humanoid Robotics, vol. 13, no. 1, 2016.
[22] W. Wu and H. Li, "Artificial emotion modeling and human-computer interaction experiment in PAD emotion space," Journal of Harbin Institute of Technology, vol. 51, no. 1, pp. 9-10, 2019.
[23] M. S. Erden, "Emotional postures for the humanoid-robot Nao," International Journal of Social Robotics, vol. 5, no. 4, pp. 441–456, 2013.
[24] B. Browatzki, V. Tikhanoff, G. Metta, and H. H. Bülthoff, "Active in-hand object recognition on a humanoid robot," IEEE Transactions on Robotics, vol. 30, no. 5, pp. 1260–1269.


Publisher
Hindawi Publishing Corporation
ISSN
1687-9600
eISSN
1687-9619
DOI
10.1155/2022/8951671

Abstract

Hindawi Journal of Robotics Volume 2022, Article ID 8951671, 6 pages https://doi.org/10.1155/2022/8951671 Research Article Emotional State Analysis Model of Humanoid Robot in Human-Computer Interaction Process Boxin Peng School of Computer Science and Technology, Southeast University, Dhaka, Bangladesh Correspondence should be addressed to Boxin Peng; 213193474@seu.edu.cn Received 22 December 2021; Revised 27 March 2022; Accepted 30 March 2022; Published 6 May 2022 Academic Editor: Shan Zhong Copyright © 2022 Boxin Peng. ­is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. ­e traditional humanoid robot dialogue system is generally based on template construction, which can make a good response in the set dialogue domain but cannot generate a good response to the content outside the domain. ­e rules of the dialogue system rely on manual design and lack of emotion detection of the interactive objects. In view of the shortcomings of traditional methods, this study designed an emotion analysis model based on deep neural network to detect the emotion of interactive objects and built an open-domain dialogue system of humanoid robot. In aŠective state analysis language processing, language coding, feature analysis, and Word2vec research are carried out. ­en, an emotional state analysis model is constructed to train the emotional state of a humanoid robot, and the training results are summarized. interactions to complete tasks [7]. ­is kind of interaction is 1. Introduction familiar to most people and is more concise and e•cient. So it can help people and robots interact more eŠectively to With the progress of science and technology, robots have complete tasks [8]. ­e emotion analysis model of humanoid gradually entered every aspect of people’s lives. 
From industrial use to military applications, home service, education, and laboratories, robots are playing a significant role [1]. According to the three laws of robotics [2], the ultimate goal of robot development is to make robots imitate intelligent human behavior, help humans complete tasks, and achieve human goals [3]. When humans and robots cooperate on a task, humans must be able to communicate with the robot in order to work efficiently [4, 5]. Traditional human-computer interaction relies mainly on the keyboard, mouse, and other manual input devices to pass information to the computer, and on the display and other peripherals to feed information back to the human. This kind of interaction is inconvenient, requires many peripherals, and assumes that everyone in ordinary life can operate a computer [6]. Different from traditional human-computer interaction, natural human-machine interaction is carried out through well-known natural channels such as speech, vision, touch, hearing, proximity, and other human interactions.

In the process of interaction, the robot can analyze and identify the emotional information of the interactive object, which is an important part of the dialogue system [9]. The language of the interactive object contains rich emotional information, and its text content is a high-level expression of human thinking.

2. Literature Review

The main implementations of traditional text sentiment analysis fall into two categories: sentiment dictionaries and machine learning algorithms. Text emotion analysis is usually based on an emotion dictionary. Well-known emotion dictionaries include HowNet, the Chinese polarity lexicon NTUSD from National Taiwan University, and the English lexicon WordNet from Princeton University [10]. The emotion analysis process based on the emotion dictionary is shown in Figure 1.

Figure 1: The emotion analysis process based on the emotion dictionary (the sentence is segmented; emotion words are matched against the dictionary and counted; degree words and turning words before each emotion word are searched; the emotional score is then calculated from the tagged word weights).

Machine learning treats sentiment analysis as a text classification task. Commonly used methods include naive Bayes, SVM, and CRF. Li [11] compared naive Bayes, the maximum entropy model, and the support vector machine on the emotion classification of film reviews and found that SVM achieved the best classification effect. Huang [12] (2021) used a multistrategy method with a hierarchical SVM structure to classify the emotional polarity of Chinese microblogs; the experiments show that the SVM-based multistrategy method performs best and that introducing topic-related features improves accuracy to some extent. Lu [13] (2018) experimented with SVM, Bayes, and other classification algorithms, and with information gain and other feature selection algorithms, on Chinese microblog sentiment analysis, taking TF-IDF as the feature weight. The results show that the best classification effect is achieved with TF-IDF as the feature weight, SVM as the classifier, and information gain as the feature selection algorithm.

With the development of deep learning, deep models have also been applied to text classification. Law [14] (2013) proposed the reinforcement learning framework DISA based on CNN and LSTM, taking Chinese audio information and pinyin as emotion analysis features, and achieved good results. Cela [15] (2013) applied D-S evidence theory to fuse emotional information from vision, sound, and other modalities and analyzed the transfer rule of the emotional state caused by the joint action of the two factors; the emotion model was then applied to an emotional robot system so that the robot generates emotions according to external stimuli and makes the corresponding expression. The experimental results show that the affective model is effective. Tidoni [16] (2014) combined the ideas of recurrent and convolutional neural networks to overcome the limitation of CNN in capturing long-distance context and proposed RCNN for text classification. BaTula [17] (2017) proposed a game-based cognitive and emotional interaction model for robots based on the PAD (pleasure-arousal-dominance) emotion space, addressing the lack of emotion and the low participation of members in existing open-domain human-computer interaction systems. Experiments show that, compared with other cognitive interaction models, the proposed model reduces the robot's dependence on external emotional stimuli and effectively guides members to participate in the interaction.

Information is transmitted in various forms. Owing to technological limitations, a robot cannot completely obtain the information of its interactive objects, so intention prediction becomes important and essential [18]. Human interaction usually requires continuous prediction of intention: in a conversation, people constantly try to predict the direction of the conversation or the reactions of others [19]. Therefore, to make human-robot interaction more like human-human interaction, intention prediction in human-computer interaction is essential, and according to the classification of human-computer interaction it takes different forms [20]. In cooperative human-computer interaction, intention prediction is mainly motivated by incomplete interactive information: humans and robots must cooperate to complete a task, and predicting human intentions compensates for incomplete information so that the task is completed better and more efficiently.

Because of the linguistic phenomena of polysemy and irony in Chinese, dictionary-based methods cannot reach high accuracy and do not transfer well across domains. With the rapid growth of information content, building a data-driven machine learning model for emotion analysis of irregular documents has good application prospects.

3. Emotion Recognition Process and Data Acquisition and Preprocessing

3.1. Emotion Recognition Process. During interaction, the voice of the interactive object is recorded through the microphone and converted into audio, and the speech recognition module then obtains the text. The text is preprocessed and fed into the emotion analysis model, which outputs the emotional state of the interactive object. For the construction of the emotion analysis model, this paper adopts the idea of machine learning and builds a data-driven model [21]. The selected algorithm is trained offline on the data sets, the trained model is saved, and the saved model is then loaded for prediction. The text emotion analysis process based on machine learning is shown in Figure 2.
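The interaction loop of Section 3.1 can be sketched in a few lines. All names below are illustrative placeholders standing in for a real speech recognizer and the trained classifier; the paper does not publish its implementation.

```python
def recognize_speech(audio: bytes) -> str:
    """Placeholder for the speech recognition module: any ASR engine
    that turns recorded audio into text would fit here."""
    return "这本书内容很好！"  # canned output for illustration

def preprocess(text: str) -> list[str]:
    """Strip trailing punctuation, then segment. A real system would use
    the Jieba segmenter of Section 3.2.2; a character split stands in."""
    return list(text.rstrip("！!。."))

def analyze_emotion(tokens: list[str]) -> str:
    """Placeholder for the trained emotion model loaded from disk."""
    return "positive" if "好" in tokens else "negative"

def emotional_state(audio: bytes) -> str:
    # microphone audio -> text -> tokens -> emotion label
    return analyze_emotion(preprocess(recognize_speech(audio)))
```

A deployed pipeline differs only in that each stage is backed by a trained component rather than a stub.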
Figure 2: Text emotion analysis process based on machine learning (training data and test data are both preprocessed; the model is trained on the training data and then performs sentiment analysis on the test data).

3.2. Data Acquisition and Preprocessing. The data work includes data acquisition and data preprocessing. The specific contents are as follows.

3.2.1. Data Acquisition. To build the emotion analysis model of the humanoid robot, we used the "Microblog Cross-Language Emotion Recognition Dataset" published by the International Conference on Natural Language Processing and Chinese Computing (NLPCC) in 2019 and 2020. The corpus is divided into positive and negative categories, with 12,153 positive items and 12,178 negative items. The content comes from microblogs and the sentences are colloquial, which makes the corpus suitable for training the emotion analysis model.

3.2.2. Data Preprocessing. To balance the positive and negative categories, 25 items were deleted from the negative corpus by downsampling, unifying both classes at 12,153 items. Since most of the corpus is taken from Weibo, it contains many emoticons and repeated punctuation marks; moreover, in practical use, a sentence produced by speech recognition will not contain repeated punctuation marks [22]. For these reasons, redundant punctuation marks and emoticons were deleted during preprocessing and were not treated as features. A comparison table of punctuation and facial expressions in textual processing is shown in Table 1. For word segmentation, the Jieba toolkit was used in this study.

Table 1: Comparison table of punctuation and facial expressions in textual processing (punctuation marks and emoticons are mapped to labels such as doubt, amazing, helpless, plaint, awkward, happy, and unhappy).

In the subsequent construction of emotion analysis models, we experimented with traditional machine learning models such as Bernoulli naive Bayes, multinomial naive Bayes, and the support vector machine (SVM), and with neural network models such as Bi-LSTM, Bi-LSTM combined with an attention mechanism, and Text-CNN.

4. Affective State Analysis Language Processing

4.1. Language Coding. In natural language processing (NLP), a sentence is generally segmented, with characters, words, and phrases as the minimum units of Chinese. One-hot coding is the simplest representation of such features: each distinct feature has its own state bit. For example, after word segmentation, the sentence "This is a good book with good content!" contains the independent features "This", "book", "Content", "Good", and "!", among others. Ordered by first occurrence, the one-hot codes of these features are:

"This": [1, 0, 0, 0, 0, 0, 0, 0, 0]
"book": [0, 1, 0, 0, 0, 0, 0, 0, 0]
"Content": [0, 0, 1, 0, 0, 0, 0, 0, 0]
"Good": [0, 0, 0, 1, 0, 0, 0, 0, 0]
"!": [0, 0, 0, 0, 0, 0, 0, 0, 1]

In one-hot coding, each feature occupies a single dimension, and the dimensionality equals the number of distinct features. To a certain extent one-hot coding expands the features, but when the dictionary is large this representation takes up a great deal of space and the computation dimension is high.

The bag-of-words model is a vector space model in which the count of each word is recorded at the position of the word's index, so that the whole sentence is represented by one vector:

"This is a good book with good content!": [1, 2, 2, 1, 1, 1, 1, 1, 1].

4.2. Characteristics Analysis. In the text vector space model, the commonly used feature selection methods include the chi-square test, information gain, the mutual information method, and TF-IDF. TF-IDF combines word distribution information across documents with the bag-of-words model and highlights keywords by calculating the absolute term frequency (TF) and the inverse document frequency (IDF).
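The encodings of Section 4.1 can be reproduced in a few lines. The English token list is only an illustrative stand-in for the segmented Chinese sentence, so its dimensionality differs from the nine-dimensional vectors printed above.

```python
def build_vocab(tokens):
    """Index each distinct token in order of first occurrence."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

def one_hot(token, vocab):
    """Single state bit per feature, as in Section 4.1."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec

def bag_of_words(tokens, vocab):
    """Count of each vocabulary word at its index position."""
    vec = [0] * len(vocab)
    for tok in tokens:
        vec[vocab[tok]] += 1
    return vec

tokens = ["This", "is", "a", "good", "book", "with", "good", "content", "!"]
vocab = build_vocab(tokens)
print(one_hot("This", vocab))      # [1, 0, 0, 0, 0, 0, 0, 0]
print(bag_of_words(tokens, vocab)) # [1, 1, 1, 2, 1, 1, 1, 1]; "good" occurs twice
```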
The absolute term frequency (TF) is the number of occurrences of a feature item in a training text D. Important words in a text are often repeated, so the absolute term frequency easily highlights them. The inverse document frequency is calculated as

IDF = log(N / n),  (1)

where N is the total number of documents in the training set and n is the number of training documents in which the feature item appears. IDF highlights words that appear in few documents but have strong discriminative ability. In the actual calculation, IDF is smoothed to avoid discarding words that are rare in the corpus:

TF-IDF = TF × IDF = tf_ij × log((N + 1) / (n + 1)).  (2)
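Formulas (1) and (2) can be checked with a short stand-alone sketch; the corpus below is a toy example, not the NLPCC data.

```python
import math

def tf_idf(term, doc, corpus):
    """tf_ij * log((N + 1) / (n + 1)) as in formula (2), where
    N is the number of documents and n the number containing the term."""
    tf = doc.count(term)  # absolute term frequency in this document
    n_docs = len(corpus)
    n_with_term = sum(1 for d in corpus if term in d)
    return tf * math.log((n_docs + 1) / (n_with_term + 1))

corpus = [["good", "book"], ["bad", "film"], ["good", "content", "good"]]
# "good" appears in 2 of the 3 documents; its tf in the last one is 2,
# so its weight there is 2 * log(4 / 3).
w = tf_idf("good", corpus[2], corpus)
```

The +1 smoothing in both numerator and denominator keeps rare words from being dropped, as noted above.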
4.3. Word2vec. Word2vec was proposed by Google in 2013. It represents words by dense features, a scheme also known as distributed representation. Word2vec has two training models, the Continuous Bag-of-Words (CBOW) model and the Skip-Gram model. Its improvements are divided into two types, one based on Hierarchical Softmax and the other on Negative Sampling, both of which simplify computation and accelerate training. In Hierarchical Softmax, the output of the projection layer under the CBOW model is the mean of the summed input word vectors, while under the Skip-Gram model it is identical to the input; to avoid computing the probability of every word, the Softmax mapping from the projection layer to the output layer is replaced by a Huffman tree. Negative Sampling instead updates only a small sample of negative words for each training pair. Word2vec is widely used in various natural language models, and word vectors are also a pretraining method that can bring a neural network to a better starting point and make it easier to optimize [23].

Compared with one-hot coding, dense features are easy to compute, avoid the dimension explosion problem, and generalize well; dense representations also provide similarity information between features. This distributed word-vector representation is widely used in natural language processing tasks such as Chinese word segmentation, sentiment analysis, and reading comprehension.

5. Construction of the Affective State Analysis Model

5.1. Introduction to the Model. The support vector machine (SVM) is a binary classification algorithm that can also be used for text classification. Its basic idea is to find the hyperplane with the largest margin in the feature space. It is effective in high-dimensional spaces, still works well when the number of dimensions exceeds the number of samples, and allows different kernel functions to be specified; however, when the number of features greatly exceeds the number of samples, its performance degrades.

The training set is T = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}, i = 1, 2, ..., N, where y_i ∈ {−1, +1} stands for the negative and positive labels, x_i is a sample (a sentence), and N is the number of samples. The optimization problem solved by the support vector machine is

min_{w,b,ζ} (1/2) w^T w + C Σ_{i=1}^{N} ζ_i
s.t. y_i (w^T φ(x_i) + b) ≥ 1 − ζ_i, ζ_i ≥ 0, i = 1, 2, ..., N,  (3)

where w is the normal vector of the separating hyperplane, ζ_i is a slack variable, and φ(x) is the mapping function. The dual form of the problem is

min_α (1/2) α^T Q α − e^T α
s.t. y^T α = 0, 0 ≤ α_i ≤ C, i = 1, 2, ..., N,  (4)

where e is the all-ones vector, C is the upper bound of the Lagrange multipliers, and Q is an N × N positive semidefinite matrix with entries

Q_ij = y_i y_j K(x_i, x_j).  (5)

In formulas (3)-(5), K(x_i, x_j) = φ(x_i)^T φ(x_j) is the kernel function. The decision function of the support vector machine is sign(Σ_i y_i α_i K(x_i, x) + b).

5.2. Training Process. To make the support vector machine output class probabilities, Platt scaling is used in this paper. It is a parametric method that fits the decision values with a logistic regression model: a Sigmoid function maps the values into [0, 1], so that the outputs of the original model become probability values,

P(y = 1 | x) ≈ P_{A,B}(f) ≡ 1 / (1 + exp(A f + B)).  (6)

In formula (6), f(x) is the decision function of the support vector machine, which outputs a value for any input x, and A and B are trainable parameters. The training objective is the cross-entropy loss

min_{z=(A,B)} F(z) = − Σ_i (t_i log(P_i) + (1 − t_i) log(1 − P_i)),  (7)

with smoothed targets

t_i = (N_+ + 1) / (N_+ + 2) if y_i = +1, t_i = 1 / (N_− + 2) if y_i = −1, i = 1, 2, ..., l,  (8)

where N_+ and N_− are the numbers of positive and negative samples. With Platt scaling, the support vector machine can output the probability of each class; the intuition is that the closer a point lies to the separating surface, the less certain its class membership, and the farther it lies from the surface, the more certain its membership.
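Formulas (6)-(8) can be fitted in plain Python. This is only a sketch on toy decision values: Platt's original procedure uses Newton iterations, and plain gradient descent is substituted here for brevity.

```python
import math

def fit_platt(decision_values, labels, lr=0.1, epochs=5000):
    """Fit A, B of formula (6), P(y=1|f) = 1/(1 + exp(A*f + B)), by
    minimizing the cross entropy of formula (7) on the smoothed
    targets of formula (8) with gradient descent."""
    n_pos = sum(1 for y in labels if y == +1)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)  # target for positive samples
    t_neg = 1.0 / (n_neg + 2.0)            # target for negative samples
    targets = [t_pos if y == +1 else t_neg for y in labels]
    A = B = 0.0
    for _ in range(epochs):
        grad_A = grad_B = 0.0
        for f, t in zip(decision_values, targets):
            p = 1.0 / (1.0 + math.exp(A * f + B))
            grad_A += (t - p) * f  # dF/dA for this sample
            grad_B += (t - p)      # dF/dB for this sample
        A -= lr * grad_A / len(labels)
        B -= lr * grad_B / len(labels)
    return A, B

# Toy SVM decision values: negatives below zero, positives above.
f_vals = [-2.0, -1.2, -0.5, 0.6, 1.3, 2.1]
y_vals = [-1, -1, -1, +1, +1, +1]
A, B = fit_platt(f_vals, y_vals)
prob = lambda f: 1.0 / (1.0 + math.exp(A * f + B))
```

After fitting, A is negative, so points far on the positive side of the surface receive probabilities near the positive target and points near the surface stay close to 0.5, matching the intuition stated above.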
5.3. Training Results. Support vector machines generally adopt a linear kernel in text classification. With a linear kernel and bag-of-words input, the SVM reaches an F1 value of 0.763, an accuracy of 76.81%, and an AUC value of 0.821. With a linear kernel and TF-IDF input, it reaches an F1 value of 0.795, an accuracy of 78.94%, and an AUC value of 0.863. Compared with the bag-of-words model, the SVM with TF-IDF features achieves better results on the current data set.

6. Conclusion

The main steps of building an emotion analysis model include the emotion recognition process, data acquisition and preprocessing, affective state analysis language processing, construction of the affective state analysis model, and integration [24]. The effects of the TF-IDF features and the bag-of-words model on the training results are analyzed in the experiments. The results show that, among single models, the TF-IDF features combined with the support vector machine achieve the optimal result. In model integration, the stacking strategy and the soft voting strategy were compared, and the best performance was obtained by stacking with a support vector machine as the learner; this is the main innovation of this paper. The introduction of the methods used in each component is somewhat superficial, which is the main shortcoming of this paper. This paper studies the design principle of a sentiment analysis model based on the support vector machine; based on the experimental data, it explores the influence of the attention mechanism on the neural network model and compares the performance of traditional machine learning models with different inputs. In the single-model experiment, the support vector machine combined with TF-IDF achieved the best classification effect, with an F1 value of 0.795, an accuracy of 78.94%, and an AUC value of 0.863.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author declares no conflicts of interest.

References

[1] L. Gabriella, G. Márta, K. Veronika et al., “Emotion attribution to a non-humanoid robot in different social situations,” PLoS One, vol. 9, no. 12, Article ID e114207, 2014.
[2] A. Rozanska and M. Podpora, “Multimodal sentiment analysis applied to interaction between patients and a humanoid robot Pepper,” IFAC-PapersOnLine, vol. 52, no. 27, pp. 411–414, 2019.
[3] M. Viríkova and S. Peter, “Teach your robot how you want it to express emotions,” Advances in Intelligent Systems and Computing, vol. 316, pp. 81–92, 2015.
[4] X. Ke, Y. Shang, and K. Lu, “Based on HyperWorks humanoid robot facial expression simulation,” Manufacturing Automation, vol. 137, no. 1, pp. 118–121, 2015.
[5] F. Azni Jafar, N. Abdullah, N. Blar, M. N. Muhammad, and A. M. Kassim, “Analysis of human emotion state in collaboration with robot,” Applied Mechanics and Materials, vols. 465-466, pp. 682–687, 2013.
[6] Z. Shao, R. Chandramouli, K. P. Subbalakshmi, and C. T. Boyadjiev, “An analytical system for user emotion extraction, mental state modeling, and rating,” Expert Systems with Applications, vol. 124, no. 7, pp. 82–96, 2019.
[7] J. Hernandez-Vicen, S. Martinez, J. Garcia-Haro, and C. Balaguer, “Correction of visual perception based on neuro-fuzzy learning for the humanoid robot TEO,” Sensors, vol. 18, no. 4, pp. 972-973, 2018.
[8] A. Zaraki, D. Mazzei, M. Giuliani, and D. De Rossi, “Designing and evaluating a social gaze-control system for a humanoid robot,” IEEE Transactions on Human-Machine Systems, vol. 44, no. 2, pp. 157–168, 2014.
[9] J. Wainer, B. Robins, F. Amirabdollahian, and K. Dautenhahn, “Using the humanoid robot KASPAR to autonomously play triadic games and facilitate collaborative play among children with autism,” IEEE Transactions on Autonomous Mental Development, vol. 6, no. 3, pp. 183–199, 2014.
[10] L. Tang, Z. Li, X. Yuan, W. Li, and A. Liu, “Analysis of operation behavior of inspection robot in human-machine interaction,” Modern Manufacturing Engineering, vol. 3, no. 3, pp. 7-8, 2021.
[11] Z. Li and H. Wang, “Design and implementation of mobile robot remote human-computer interaction software platform,” Computer Measurement & Control, vol. 25, no. 4, pp. 5-6, 2017.
[12] H. Huang, N. Liu, M. Hu, Y. Tao, and L. Kou, “Robot cognitive and affective interaction model based on game,” Journal of Electronics and Information Technology, vol. 43, no. 6, pp. 8-9, 2021.
[13] Lufei, Y. Jiang, and G. Tian, “Autonomous cognition and personalized selection of robot service based on emotion-spatiotemporal information,” Robot, vol. 40, no. 4, pp. 9-10, 2018.
[14] J. Law, P. Shaw, and M. Lee, “A biologically constrained architecture for developmental learning of eye-head gaze control on a humanoid robot,” Autonomous Robots, vol. 35, no. 1, pp. 77–92, 2013.
[15] A. Cela, J. Yebes, R. Arroyo, L. R. Bergasa, and E. López, “Complete low-cost implementation of a teleoperated control system for a humanoid robot,” Sensors, vol. 13, no. 2, pp. 1385–1401, 2013.
[16] E. Tidoni, P. Gergondet, A. Kheddar, and S. M. Aglioti, “Audio-visual feedback improves the BCI performance in the navigational control of a humanoid robot,” Frontiers in Neurorobotics, vol. 8, 2014.
[17] A. M. BaTula, Y. E. Kim, and H. Ayaz, “Virtual and actual humanoid robot control with four-class motor-imagery-based optical brain-computer interface,” BioMed Research International, vol. 2017, Article ID 1463512, 13 pages, 2017.
[18] T. Sato, Y. Nishida, J. Ichikawa, and Y. Hatamura, “Active understanding of human intention by a robot through monitoring of human behavior,” in Proceedings of the IEEE/RSJ/GI International Conference on Intelligent Robots and Systems (IROS ’94), pp. 405–414, IEEE, Munich, Germany, September 1994.
[19] H. Clint, Modelling Intention Recognition for Intelligent Agent Systems, DTIC, Mexico City, Mexico, 2004.
[20] K. A. Tahboub, “Intelligent human-machine interaction based on dynamic Bayesian networks probabilistic intention recognition,” Journal of Intelligent and Robotic Systems, vol. 45, no. 1, pp. 31–52, 2006.
[21] T. Koolen, S. Bertrand, G. Thomas et al., “Design of a momentum-based control framework and application to the humanoid robot Atlas,” International Journal of Humanoid Robotics, vol. 13, no. 1, 2016.
[22] W. Wu and H. Li, “Artificial emotion modeling and human-computer interaction experiment in PAD emotion space,” Journal of Harbin Institute of Technology, vol. 51, no. 1, pp. 9-10, 2019.
[23] M. S. Erden, “Emotional postures for the humanoid-robot Nao,” International Journal of Social Robotics, vol. 5, no. 4, pp. 441–456, 2013.
[24] B. Browatzki, V. Tikhanoff, G. Metta, and H. H. C. Bulthoff, “Active in-hand object recognition on a humanoid robot,” IEEE Transactions on Robotics, vol. 30, no. 5, pp. 1260–1269.
