Voice Recognition and Inverse Kinematics Control for a Redundant Manipulator Based on a Multilayer Artificial Intelligence Network

Hindawi
Journal of Robotics
Volume 2021, Article ID 5805232, 10 pages
https://doi.org/10.1155/2021/5805232

Research Article

Mai Ngoc Anh and Duong Xuan Bien
Le Quy Don Technical University, 236 Hoang Quoc Viet, Hanoi, Vietnam
Correspondence should be addressed to Mai Ngoc Anh; maingocanh@lqdtu.edu.vn
Received 7 May 2021; Accepted 20 June 2021; Published 29 June 2021
Academic Editor: L. Fortuna
Copyright © 2021 Mai Ngoc Anh and Duong Xuan Bien. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This study presents the construction of a Vietnamese voice recognition module and the inverse kinematics control of a redundant manipulator by using artificial intelligence algorithms. The first deep learning model is built to recognize voice information and convert it into input signals of the inverse kinematics problem of a 6-degrees-of-freedom robotic manipulator. The inverse kinematics problem is solved by constructing and training a second deep learning model, which is built using the data determined from the mathematical model of the system's geometrical structure, the limits of the joint variables, and the workspace. The deep learning models are built in the Python language. The efficient operation of the built deep learning networks demonstrates the reliability of the artificial intelligence algorithms and the applicability of the Vietnamese voice recognition module for various tasks.

1. Introduction

In recent years, control system designs have followed the trend of intelligent control systems while still ensuring a fast and flexible real-time response to constantly changing control requirements and allowing for high-precision human interaction.

In conventional intelligent control systems, research on voice-based control is attracting many scientists thanks to its user-friendly interaction. With the voice-based control systems of industrial robotics, users can have the robots perform a variety of tasks through simple commands that carry control information related to the motion direction and the characteristics of the object.

In essence, the voice commands are used as the input of the control system to solve the inverse kinematics (IK) problem and are then converted into various operations of the manipulator. Due to the diverse nature of voice commands, the manipulator tasks change constantly, requiring the control system to process them quickly in order to respond. The IK solving algorithms such as analytic methods [1] or numerical methods such as AGV [2], CLIK [3], and Jacobian transpose [4] are hardly suitable, especially for redundant manipulator systems.

The results of recent research on artificial intelligence (AI) show that neural networks (NN), deep learning, and reinforcement learning algorithms are extremely useful and effective for dealing with complex nonlinear problems, with savings in computation time and system resources [5]. The most important point when applying these algorithms is to have a good understanding of the structure of the network that is built and of its functioning. The quality and the performance of the network are used as criteria to evaluate the effectiveness of the algorithms. In terms of programming languages, AI-related networks can be built in different languages such as Python, C++, and Java [6]. However, the Python language has recently become more suitable for building deep learning (DL) network structures thanks to efficient support libraries such as TensorFlow, PyTorch, NumPy, Keras, and Sklearn. More importantly, these libraries support optimization problems in data science, machine learning, and control [7]. Based on the outstanding advantages of AI techniques, many intelligent control systems have been built to solve IK problems for redundant manipulation systems. Furthermore, these AI techniques are well suited for control systems that require constantly changing motion driven by voice commands that may not be preprogrammed.

Many solutions that apply voice control systems (VCS) based on AI algorithms to industrial machines are mentioned in [8]. To determine the direction of the emitted sound source, Hwang et al. [9] designed an intelligent ear for the robot arm. To control fabrication machines and industrial robotic arms, Rogowski [10] designed a VCS solution with good noise resistance. For service tasks, several manipulators designed to interact with humans in a friendly way through gesture recognition and voice feedback are introduced in [11–13]. The manipulator in [14], which serves household chores, is controlled by a VCS to increase usability and entertainment. An enhanced version of DL algorithm-based speech recognition is proposed in [15]. The medical robot arm in [16] is designed with a VCS that allows nurses and patients to easily interact with the robot. The manipulator in [17] uses a VCS with visible light communication. An autonomous manipulator controlled by voice through the Google Assistant application tool on the basis of IoT technology is shown in [18]. A voice-controlled application that uses IoT technology in combination with an adaptive NN is proposed in [19] to improve the efficiency of solving IK problems for 6-degrees-of-freedom (DOF) robots. Differently, a Bayesian-BP NN is built to create an efficient control system for the root mean square (RMS) with fast and precise learning [20]. The simulation results show that the error of the method is extremely small. The IK problem is solved using an NN for a 2DOF manipulator in [21], for a 3DOF robot in [22], and for a 4DOF robot with a hybrid IK control system combining an NN and a genetic algorithm in [23]. An NN with output feedback is proposed in [24] to solve the IK problem of a 6DOF manipulator. This is a technique with very high control efficiency. A new NN-based algorithm for real-time control of a 5DOF manipulator is proposed in [25].

This study presents the setup of two deep learning networks, DL1 and DL2, which process voice signals and use them as the input for solving the IK control problem of a 6DOF redundant manipulator. The control information in the voice tag includes the direction of movement and the object whose attributes are given in the speech. The robot then conducts image recognition to determine the object whose attributes match the voice recognition results from the sentence. The image recognition is performed through the computer's built-in vision module and is not analyzed in depth in this study. The center coordinates of the object represent the position that the end-effector point of the manipulator needs to move to. Training data for model DL2 are taken from the results of the forward kinematics problem based on kinematics modeling according to Denavit–Hartenberg (DH) theory. The DL network models are built using the Python language. Successfully solving these two problems has a wide range of potential applications in responding to the constantly changing trajectory of the manipulator without preprogramming.

2. Materials and Methods

2.1. The Diagram of Voice-Based Controller. The manipulator receives voice commands from the operator using the voice recognition module. Then, the control system automatically analyzes, calculates, and gives the control signals for the motors at the joints of the manipulator (Figure 1).

Figure 1: Diagram of voice control for the 6DOF manipulator arm (voice, microphone, laptop, robot control board, 6DOF robot arm).

Specifically, the voice recognition module converts the human voice containing control information into text in the program. The manipulator control information contained in the voice includes the direction of the movement of the manipulator (turn to the left or right), the action the manipulator needs to perform (grabbing or dropping), the identity of the object (wheels, tray, boxes, etc.), and the distinguishing features of the object (color, shape, size, etc.).

The input voice and the output control signal must be defined to solve the manipulator control target. In essence, the voice recognition module is a natural language processing problem, and a DL model is built so that the network learns how to convert information from voice to text. The steps to perform the VCS are depicted in Figure 2.

Figure 2: The steps to perform the VCS (preprocessing the voice, DL1 model voice recognition, machine learning data extraction, DL2 model and control signals).

2.1.1. Preprocessing the Input Voice. This problem is solved through the following substeps: noise filtering, word separation, converting the sound oscillation into sound energy in the frequency domain, and converting this energy into input data for the DL1 model.

The noise filtering step can be handled through a number of methods, such as noise reduction based on the hardware design of the receiver microphone, on electronic elements of the recording circuit, or on adjustments in the program. Voices include the main expected sounds that need to be recorded and noises (unwanted sounds or sounds without control information). These acoustic noises can come from outside environments, such as traffic and industrial noise. They often negatively affect the accuracy of speech recognition results. To significantly reduce audio noise, a noise reduction transceiver is used in this study.

Each human sentence often consists of many words combined. Each word contains one or several syllables. Thus, the speech recognition program must perform two basic tasks: separating words in sentences and separating syllables in each word.

Interestingly, every Vietnamese word has only one syllable. Therefore, this study only needs to focus on the first task, which is separating words in sentences. To better understand this problem, let us consider the following example.

Consider a Vietnamese voice command to control the manipulator: “Quay bên phải, lấy bánh xe màu vàng” (“turn right, grab the yellow wheel” in English). Note that the Vietnamese sentence has 8 syllables, while the English one has 7 syllables, of which “yellow” has 2 syllables.

Voice is received through the microphone and recorded with the regular Voice Recorder application available on the Microsoft Windows operating system. The audio file can be read and written with the Scipy library in Python. The acoustic oscillation amplitude values are standardized so that the input signal does not contain many suboscillations, which makes the separation process more efficient and makes it easy to set a useful filter threshold.
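The normalization and thresholding idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the function names, the threshold of 0.1, and the 50 ms minimum gap are hypothetical choices for the example, and a plain amplitude threshold is used as a stand-in separation rule. In practice the samples could come from `scipy.io.wavfile.read`, which returns the sampling rate and the sample array.

```python
import numpy as np


def normalize_amplitude(samples):
    """Scale raw audio samples so that the largest magnitude becomes 1."""
    samples = np.asarray(samples, dtype=np.float64)
    peak = np.max(np.abs(samples))
    return samples if peak == 0 else samples / peak


def split_words(normalized, rate, threshold=0.1, min_gap=0.05):
    """Return (start, end) sample indices of regions louder than `threshold`.

    Loud samples separated by less than `min_gap` seconds of quiet are
    treated as one word (assumed rule, chosen only for illustration).
    """
    loud = np.flatnonzero(np.abs(normalized) > threshold)
    if loud.size == 0:
        return []
    max_gap = int(min_gap * rate)
    segments = []
    start = prev = loud[0]
    for i in loud[1:]:
        if i - prev > max_gap:          # long silence: close the current word
            segments.append((int(start), int(prev) + 1))
            start = i
        prev = i
    segments.append((int(start), int(prev) + 1))
    return segments
```

With a real recording, each returned segment would be cut out of the normalized signal and passed on to the later frequency-domain step; the threshold and gap would need tuning for the microphone and the speaker.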
After normalization, the word decomposition is performed in the DL1 model, whose network node parameters can be adjusted through a learning process on the samples to improve accuracy.

The acoustic oscillation amplitudes after normalization are shown in Figure 3. It can be seen that the normalized amplitudes differ clearly between speaking and not speaking. This difference is used as a key feature to separate words in sentences.

Figure 3: The normalized sound amplitude (fluctuation amplitude versus time in seconds, with the words “Quay”, “bên”, “phải”, “lấy”, “bánh”, “xe”, “màu”, and “vàng” and an interference signal marked).

However, it should be noted that areas with exceptionally large sound amplitude relative to other areas while speaking are considered acoustic noise in speech. In addition, oscillation regions with small and fairly equal amplitudes are also considered noise signals that can be ignored. Therefore, if a user suddenly screams out a word or speaks all words in a sentence at low volume, the system may not understand the voice command.

The change in the amplitude of the sound oscillation is determined using the gradient method [26] in order to separate the words. After the words in the spoken sentence are separated, the sound oscillation is analyzed for the sound energy in the frequency domain through the Fourier transform. This sound energy value is converted into the Tensor Input for the DL model. The sound of the human voice is actually a combination of many signals with different frequencies. The oscillation function can be described through the following Fourier expansion [17]:

$$ f(t) = a_0 + \sum_{n=1}^{\infty} \left[ a_n \cos(n\omega t) + b_n \sin(n\omega t) \right], \tag{1} $$

where a_0 is the original sound amplitude, a_n and b_n are the Fourier constants, n is the frequency ratio coefficient, ω is the angular velocity, and t is a time variable.

From equation (1), the sound energy value in the frequency domain can be specified [17]. Figure 4 shows the sound energy of the two words “Quay” (turn) and “Phải” (right) in the frequency domain.

Figure 4: Sound energy of the word “Quay” (a) and the word “Phải” (b), plotted as energy versus frequency (10^3 Hz).

A fundamental characteristic of sound is the energy value, which is used to build the input data for the DL model. The energy value of the sound is considered at frequencies spaced by 1 Hz over the range from 0 to 2 kHz. The Tensor Input is a vector of sound energy values in increasing order of frequency (Figure 5(a)). The values of the Tensor Input after being created are usually very large. For the DL model to learn better, the data in the Tensor Input need to be normalized by dividing all components by a certain value greater than the maximum energy value obtained. The Tensor Input for the DL model after normalization is shown in Figure 5(b).

Figure 5: Tensor Input (a) before normalization, for example [30, 45, 20, 10, 13, 26, 18, 11, 8, ...], and (b) after normalization, for example [0.3, 0.45, 0.2, 0.1, 0.13, 0.26, 0.18, 0.11, 0.08, ...].

2.1.2. Building the DL1 Model. After the Tensor Inputs are built, the DL1 model is built with many inputs and many outputs (Figure 6), similar to the multilayer AI network in [27].

Figure 6: The multilayer artificial intelligence network [27]. (a) Simple neural network. (b) Deep learning neural network.

The number of inputs depends on the number of parameters in the Tensor Input vector. The output layer of network DL1 includes different nodes, and each of these nodes represents a certain word. Each output word has a probability value in the range [0, 1]. The word with the highest probability value is chosen as the result of the voice-to-text transition.

The hidden layers within the DL1 model determine the probability value of the words producing the correct output. The elements inside the Tensor Input and Tensor Output are scalar quantities, so a nonlinear activation function is used. According to [28], nonlinear functions such as Sigmoid, Tanh, and Relu can be used, and the output layer uses the Softmax activation function to calculate the probability distribution across the classes. The DL model simulates how human biological neurons work, so it needs to be trained to reproduce the outputs for the corresponding inputs and to predict the results for other inputs. To train the DL model, the limiting criteria need to be defined, and the way the model learns to distinguish between right and wrong needs to be outlined. According to [29], the Sparse Categorical Crossentropy (SCC) function is used as follows: after each learning step, the DL model updates its parameters so that the actual output converges gradually to the desired value or, in other words, so that the error function value decreases toward 0. To update the DL model, the ADAM optimization function [30] is used; it combines the momentum method and RMSprop, its learning rate changes with respect to time, and it is intended to find the global minimum rather than a local minimum. Model DL1 is built with the TensorFlow library in Python (Figure 7).

Figure 7: Model DL1 in Python.

Line 47 of the listing in Figure 7 declares the output layer with 17 nodes and the Softmax activation function. This number of outputs represents 17 common words in the voice command framework. The Softmax activation function gives the sample with the highest probability in order to separate words and phrases from each other. A dictionary with the words or phrases and the number of words that appear in the sentence is constructed and encoded as a vector. As such, network DL1 can ensure voice recognition, converting the recognition data into text containing specific control information.

2.1.3. Extracting Control Information Using the Machine Learning Model. Technically, the Vietnamese sentence, after being separated into single words, is classified by the DL1 model to form the set of words needed to build an equivalent complete text, free of noise and other redundant words. This complete text (meaningful Vietnamese words and phrases) is used as input to the machine learning (ML) model. The TF-IDF algorithm is used to extract features of the text. Then, the Naive Bayes algorithm is used to classify the feature words and phrases of the text that belong to the control information layer. The ML model is built in the Python language in combination with the libraries Sklearn and Pyvi. The extracted information fields are encoded numerically and transmitted to the manipulator control circuit via serial communication. The output of the model is manipulator control information such as the motion direction, the robot's action, and the object color.

2.2. Inverse Kinematics Control for the Manipulator Using Deep Learning Network. The real 6DOF manipulator arm is presented in Figure 8, and its kinematics model is described in Figure 9.

Figure 8: Real 6DOF manipulator arm.

Figure 9: Kinematics model.

In the kinematics model, the fixed global coordinate system is (OXYZ)_0. The local coordinate systems (OXYZ)_i (i = 1, ..., 6) are placed on the joints accordingly. The joint variable i is denoted by q_i.

Let us denote q = [q_1 q_2 q_3 q_4 q_5 q_6] as the generalized coordinate vector of the 6 joint variables. The kinematics parameters of the 6DOF manipulator arm are determined according to the DH rule [1], as given in Table 1.

Table 1: DH parameters.

Link    θ_i            d_i          a_i    α_i
1       q_1*           d_0 + d_1    0      π/2
2       q_2*           0            a_2    0
3       q_3* − π/2     d_3          0      −π/2
4       q_4*           d_4          0      −π/2
5       q_5*           0            a_5    0
6       q_6*           0            0      0

Note: * means variable. a_i is a length from the origin O_i to the origin O_{i+1} along the axis OX_{i+1}; α_i is an angle for rotating (OXYZ)_i to (OXYZ)_{i+1} around the axis OZ_{i+1}; d_i is a length from the origin O_{i−1} to the origin O_i along the axis OZ_i; and θ_i is an angle for rotating (OXYZ)_{i−1} to (OXYZ)_i around the axis OZ_i.

The homogeneous transformation matrices H_i (i = 1, ..., 6) on the six links are determined in [1] by the following general equation:

$$ H_i = \begin{bmatrix} R_i^{i-1} & p_i^{i-1} \\ 0 & 1 \end{bmatrix}, \tag{2} $$

where R_i^{i−1} is a rotation matrix from the local coordinate system (OXYZ)_{i−1} to the local coordinate system (OXYZ)_i and p_i^{i−1} is a position vector of joint i in the coordinate system (OXYZ)_{i−1}.

The position and direction of the end-effector relative to the fixed global coordinate system are represented by the homogeneous transformation matrix D_6. This matrix is calculated as follows:

$$ D_6 = H_1 H_2 H_3 H_4 H_5 H_6, \qquad D_6 = \begin{bmatrix} R_E & p_E \\ 0 & 1 \end{bmatrix}, \tag{3} $$

where R_E is a direction matrix (3 × 3) for rotating the global coordinate system (OXYZ)_0 to the local coordinate system of the end-effector (OXYZ)_6 and p_E = [x_E y_E z_E] is a position vector of the end-effector relative to the fixed global coordinate system (OXYZ)_0.

By applying the DH parameters to equations (2) and (3) and performing mathematical transformations (see the details in [1]), the position coordinates of the end-effector point are given as

$$
\begin{aligned}
x_E ={}& \big((cq_1\,cq_2\,sq_3 + cq_1\,sq_2\,cq_3)\,cq_4 - sq_1\,sq_4\big)\,a_5\,cq_5 + (-cq_1\,cq_2\,cq_3 + cq_1\,sq_2\,sq_3)\,a_5\,sq_5 \\
&+ (cq_1\,cq_2\,cq_3 - cq_1\,sq_2\,sq_3)\,d_4 + cq_1\,cq_2\,d_3\,cq_3 - cq_1\,sq_2\,d_3\,sq_3 + cq_1\,a_2\,cq_2, \\
y_E ={}& \big((sq_1\,cq_2\,sq_3 + sq_1\,sq_2\,cq_3)\,cq_4 + cq_1\,sq_4\big)\,a_5\,cq_5 + (-sq_1\,cq_2\,cq_3 + sq_1\,sq_2\,sq_3)\,a_5\,sq_5 \\
&+ (sq_1\,cq_2\,cq_3 - sq_1\,sq_2\,sq_3)\,d_4 + sq_1\,cq_2\,d_3\,cq_3 - sq_1\,sq_2\,d_3\,sq_3 + sq_1\,a_2\,cq_2, \\
z_E ={}& (sq_2\,sq_3 - cq_2\,cq_3)\,cq_4\,a_5\,cq_5 + (-sq_2\,cq_3 - cq_2\,sq_3)\,a_5\,sq_5 \\
&+ (sq_2\,cq_3 + cq_2\,sq_3)\,d_4 + sq_2\,d_3\,cq_3 + cq_2\,d_3\,sq_3 + a_2\,sq_2 + d_1 + d_0,
\end{aligned} \tag{4}
$$

where cq_i stands for cos(q_i) and sq_i stands for sin(q_i).

The data for the DL2 network model are the sets of spatial coordinates of the end-effector point and the corresponding sets of joint variable parameters. They are collected and fed into the DL2 network for training multiple times until the model can give control signals for the manipulator accurately, meeting the motion requirements. After training and after its responsiveness has been assessed as good, the DL2 model is used to predict the manipulator rotation angle values for object positions in the manipulator workspace.

Figure 10 describes the entire process, in which the DL2 model is built with the request signal received after encoding the vector and the feasible position data in the workspace as input. The output of the model is the corresponding joint variable values.

Figure 10: The building process for the DL2 model (learn the dataset, train and evaluate the DL2 model, update the DL2 model when the result is good or repeat when it is not, then predict for new datasets).

3. Experimental Results

The geometry parameters of the manipulator are as follows:

$$ d_0 = 57\ \text{mm},\quad d_1 = 36\ \text{mm},\quad a_2 = 120\ \text{mm},\quad d_3 = 90\ \text{mm},\quad d_4 = 30\ \text{mm},\quad a_5 = 38\ \text{mm}. \tag{5} $$

The joint variable limits are as follows:

$$
\begin{aligned}
-2.97 &\le q_1 \le 2.97\ \text{(rad)}, \\
-0.52 &\le q_2 \le 2.7\ \text{(rad)}, \\
-0.785 &\le q_3 \le 2.97\ \text{(rad)}, \\
-2.7 &\le q_4 \le 2.7\ \text{(rad)}, \\
-0.785 &\le q_5 \le 2.97\ \text{(rad)}, \\
-3.57 &\le q_6 \le 3.57\ \text{(rad)}.
\end{aligned} \tag{6}
$$

The workspace of the manipulator arm is shown in Figure 11.

Figure 11: The workspace of the 6DOF manipulator, shown in two views (a) and (b) with axes x, y, and z in mm.

The experimental system uses MG995 servo drive motors, an Arduino Nano circuit, a Logitech B525 720p camera, a Dell Precision M680 laptop, and a Razer Seiren Mini microphone (Figure 12).

Figure 12: Devices in the experimental system. (a) Dell Precision laptop, (b) Logitech camera, (c) RC Servo MG995, (d) micro, (e) Arduino Mega 2560, (f) 12 V-5 A adapter, (g) XL4015 5A DC, (h) 16 × 2 LCD, and (i) joystick shield.

The parameters of network DL2 controlling the manipulator are shown in Figure 13, with 5 outputs corresponding to the 5 rotation angles of the manipulator joints. The network consists of 9 hidden layers with the Relu activation function. The number of nodes per layer is presented in Figure 13.

Figure 13: Network parameter DL2 for IK control.

The training results and the prediction results of the motor control signals are shown in Figure 14. The model is checked on the test data with the position vector of the end-effector point in the workspace, x = [0 20 0] (mm), as input, and the output of the test data corresponds to the joint variable values. The q value obtained from the model is q = [90 50 105 90 79] (deg). Thus, the accuracy is 98.67% on the test dataset.

Figure 14: Training results and prediction on the test dataset.

The actual experimental system, with the circuit reading and writing the joint variable values and the feedback values on the 16 × 2 LCD, is shown in Figure 15.

Figure 15: The actual experimental system (laptop, camera, 6DOF robot, micro).

The joint variable values that move the manipulator arm to the position of the object (a yellow wheel) are shown in Figure 16.

Figure 16: The joint values received by the voice command (value of joint position in degrees versus time in seconds/100, for joints 1 to 5).

4. Discussion

In actual operation, industrial robots in general and redundant manipulators in particular often do not perform as perfectly as calculated under ideal conditions, owing to the influence of many different factors, called noise, that make the robot control system imperfect. According to [31], although imperfections are unavoidable in real production processes, real devices still operate well in regimes far from ideality.

For example, mechanical imperfections may occur prior to operation due to mechanical manufacturing defects or assembly errors, or during operation due to mechanical system vibrations. Meanwhile, electrical imperfections can be caused by electromagnetic interference from the surrounding environment, instability of the power supply, or high-intensity electric pulses of welding machines. To overcome the imperfections, additional modules related to noise compensation, noise cancellation, or noise suppression will be studied in the next research stages.

This study only considers the kinematics problem under ideal conditions, in which the impact of noise can be ignored. In fact, it is not possible to have a general anti-interference solution for all types of noise. Therefore, in practical applications, the research team will apply anti-interference solutions suitable for each context.

In the case of group coordination between multiple voice-controlled robots in a narrow space, naming or coding for each robot needs to be done through an independent module with name recognition or decoding capabilities. When the operator calls the robot's name or activates the code, the related robot is ready to receive the next voice commands. Thus, when it is necessary to add a new robot to an existing robot network, it is possible to adjust the name recognition or decoding module without any change in the entire control system.

Differently, in a robot network, the audio imperfections may come from voice interference. The audio imperfections can be addressed through connections of different ranges controlled by a central dispatcher, and the voice interference “can be improved by including long-range connections between the robots” [32].

5. Conclusion

In summary, the Python language has been applied to build AI models for the Vietnamese voice recognition module and IK control for the 6DOF redundant manipulator. DL and ML techniques have been applied successfully with over 98% training accuracy. The data used for training models DL1 and DL2 are built independently from the Vietnamese language and from data calculated with the 6DOF manipulator kinematics model. The AI models were tested on real manipulator models and gave feasible results. This study could serve as the foundation for developing applications for various types of manipulators (serial manipulators, parallel manipulators, hybrid manipulators, and mobile manipulators) for industrial production (welding robots, robot 3D printing, and machining robots), medical, service industries, home activities (surgical robots, flexible robots, soft robots, humanoid robots, UAVs, service robots in families, and restaurants).

Data Availability

The datasets generated during the current study are available from the corresponding authors on reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] M. W. Spong, S. Hutchinson, and M. Vidyasagar, Robot Modeling and Control, First edition, 2001.
[2] D. X. Bien, “On the effect of the end-effector point trajectory on the joint jerk of the redundant manipulators,” Journal of Applied and Computational Mechanics, vol. 20, no. 10, pp. 1–8, 2021.
[3] C. A. My, D. X. Bien, H. B. Tung, L. C. Hieu, N. V. Cong, and T. V. Hieu, “Inverse kinematic control algorithm for a welding robot positioner system to trace a 3D complex curve,” in Proceedings of the International Conference on Advanced Technologies for Communications (ATC), pp. 319–323, Hanoi, Vietnam, October 2019.
[4] S. Lian, Y. Han, Y. Wang et al., “Accelerating inverse kinematics for high-DOF robots,” in Proceedings of the 54th Annual Design Automation Conference, Austin, TX, USA, 2017.
[5] https://www.edureka.co/blog/artificial-intelligence-algorithms/.
[6] https://www.geeksforgeeks.org/top-5-best-programming-languages-for-artificial-intelligence-field/.
[7] https://www.cuelogic.com/blog/role-of-python-in-artificial-intelligence.
[8] D. P. Mital and G. W. Leng, “A voice-activated robot with artificial intelligence,” Robotics and Autonomous Systems, vol. 4, no. 4, pp. 339–344, 1989.
[9] S. Hwang, Y. Park, and Y.-s. Park, “Sound direction estimation using an artificial ear for robots,” Robotics and Autonomous Systems, vol. 59, no. 3-4, pp. 208–217, 2011.
[10] A. Rogowski, “Industrially oriented voice control system,” Robotics and Computer-Integrated Manufacturing, vol. 28, no. 3, pp. 303–315, 2012.
[11] V. Alvarez-Santos, R. Iglesias, X. M. Pardo, C. V. Regueiro, and A. Canedo-Rodriguez, “Gesture-based interaction with voice feedback for a tour-guide robot,” Journal of Visual Communication and Image Representation, vol. 25, no. 2, pp. 499–509, 2014.
[12] S. S. Turakne and P. Loni, “Intelligent interactive robot with gesture recognition and voice feedback,” International Journal of Engineering Research & Technology, vol. 5, no. 4, pp. 276–280, 2016.
[13] M. Meghana, Ch. U. Kumari, J. S. Priya et al., “Hand gesture recognition and voice controlled robot,” Materials Today: Proceedings, vol. 33, no. 7, pp. 4121–4123, 2020.
[14] M. F. Rafael and D. S. Manuel, “Design in robotics based in the voice of the customer of household robots,” Robotics and Autonomous Systems, vol. 79, pp. 99–107, 2016.
[15] M. Buyukyilmaz and A. O. Cibikdiken, “Voice gender recognition using deep learning,” Advances in Computer Science Research, vol. 58, pp. 409–411, 2017.
[16] K. Gundogdu, S. Bayrakdar, and I. Yucedag, “Developing and modeling of voice control system for prosthetic robot arm in medical systems,” Journal of King Saud University-Computer and Information Sciences, vol. 30, no. 2, pp. 198–205, 2018.
[17] V. P. Saradi and P. Kailasapathi, “Voice-based motion control of a robotic vehicle through visible light communication,” Computers & Electrical Engineering, vol. 76, pp. 154–167.
[18] S. Sachdev, J. Macwan, C. Patel, and N. Doshi, “Voice-controlled autonomous vehicle using IoT,” Procedia Computer Science, vol. 160, pp. 712–717, 2019.
[19] A. T. Hasan, A. M. S. Hamouda, N. Ismail, and H. M. A. A. Al-Assadi, “An adaptive-learning algorithm to solve the inverse kinematics problem of a 6 D.O.F serial robot manipulator,” Advances in Engineering Software, vol. 37, no. 7, pp. 432–438, 2006.
[20] Y. Zhou, W. Tang, and J. Zhang, “Algorithm for multi-joint redundant robot inverse kinematics based on the bayesian-BP neural network,” in Proceedings of the 2008 International Conference on Intelligent Computation Technology and Automation (ICICTA), pp. 173–178, Changsha, China, 2008.
[21] B. Daya, S. Khawandi, and M. Akoum, “Applying neural network architecture for inverse kinematics problem in robotics,” Journal of Software Engineering and Applications, vol. 3, no. 3, pp. 230–239, 2010.
[22] A.-V. Duka, “Neural network based inverse kinematics solution for trajectory tracking of a robotic arm,” Procedia Technology, vol. 12, pp. 20–27, 2014.
[23] R. Köker and T. Çakar, “A neuro-genetic-simulated annealing approach to the inverse kinematics solution of robots: a simulation based study,” Engineering with Computers, vol. 32, no. 4, pp. 553–565, 2016.
[24] A. R. J. Almusawi, L. C. Dülger, and S. Kapucu, “A new artificial neural network approach in solving inverse kinematics of robotic arm (denso VP6242),” Computational Intelligence and Neuroscience, vol. 2016, Article ID 5720163, 10 pages, 2016.
[25] P. M. Shailendrasingh and L. P. Pratap, “A real-time approach using feed forward neural network for a 5 DOF robot manipulator,” in Proceedings of the IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI-2017), pp. 1240–1245, Chennai, India, September 2017.
[26] A. Garzelli, L. Capobianco, and F. Nencini, “Fusion of multispectral and panchromatic images as an optimization problem,” Image Fusion Algorithms and Applications, pp. 223–250, Academic Press, Cambridge, MA, USA, 2008.
[27] https://www.securityinfowatch.com/video-surveillance/video-analytics/article/21069937/deep-learning-to-the-rescue.
[28] https://www.programmersought.com/article/10025152444/.
[29] https://www.Tensorflow.org/api_docs/python/tf/keras/losses/sparse_categorical_crossentropy.
[30] https://www.programmersought.com/article/33553292079/.
[31] M. Bucolo, A. Buscarino, C. Famoso, L. Fortuna, and M. Frasca, “Control of imperfect dynamical systems,” Nonlinear Dynamics, vol. 98, no. 4, pp. 2989–2999, 2019.
[32] A. Buscarino, L. Fortuna, M. Frasca, and A. Rizzo, “Dynamical network interactions in distributed control of robots,” Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 16, no. 1, Article ID 015116, 2006.

Voice Recognition and Inverse Kinematics Control for a Redundant Manipulator Based on a Multilayer Artificial Intelligence Network

Journal of Robotics , Volume 2021 – Jun 29, 2021


Publisher
Hindawi Publishing Corporation
Copyright
Copyright © 2021 Mai Ngoc Anh and Duong Xuan Bien. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ISSN
1687-9600
eISSN
1687-9619
DOI
10.1155/2021/5805232

Abstract

Hindawi
Journal of Robotics
Volume 2021, Article ID 5805232, 10 pages
https://doi.org/10.1155/2021/5805232

Research Article
Voice Recognition and Inverse Kinematics Control for a Redundant Manipulator Based on a Multilayer Artificial Intelligence Network

Mai Ngoc Anh and Duong Xuan Bien
Le Quy Don Technical University, 236 Hoang Quoc Viet, Hanoi, Vietnam
Correspondence should be addressed to Mai Ngoc Anh; maingocanh@lqdtu.edu.vn
Received 7 May 2021; Accepted 20 June 2021; Published 29 June 2021
Academic Editor: L. Fortuna

This study presents the construction of a Vietnamese voice recognition module and the inverse kinematics control of a redundant manipulator using artificial intelligence algorithms. The first deep learning model is built to recognize voice information and convert it into input signals for the inverse kinematics problem of a 6-degrees-of-freedom robotic manipulator. The inverse kinematics problem is solved by constructing and training a second deep learning model, which uses data determined from the mathematical model of the system's geometrical structure, the limits of the joint variables, and the workspace. The deep learning models are built in the Python language. The efficient operation of the built deep learning networks demonstrates the reliability of the artificial intelligence algorithms and the applicability of the Vietnamese voice recognition module for various tasks.

1. Introduction

In recent years, control systems have increasingly been designed as intelligent control systems that still ensure a fast and flexible real-time response to constantly changing control requirements and allow high-precision human interaction.

Among intelligent control systems, voice-based control is attracting many researchers thanks to its user-friendly interaction. With voice-based control of industrial robots, users can have the robots perform a variety of tasks through simple commands that carry control information about the motion direction and the characteristics of the object.

In essence, the voice commands are the input of the control system: they define the inverse kinematics (IK) problem to be solved and are then converted into operations of the manipulator. Because voice commands are diverse, the manipulator tasks change constantly, so the control system must compute its response quickly. IK solving algorithms such as analytic methods [1] or numerical methods such as AGV [2], CLIK [3], and the Jacobian transpose [4] are hardly suitable, especially for redundant manipulator systems.

Recent research on artificial intelligence (AI) shows that neural networks (NN), deep learning, and reinforcement learning algorithms are useful and effective for complex nonlinear problems, with savings in computation time and system resources [5]. The most important point when applying these algorithms is to understand the structure of the network that is built and how it functions. The quality and performance of the network are used as criteria to evaluate the effectiveness of the algorithms. In terms of programming languages, AI-related networks can be built in different languages such as Python, C++, and Java [6]. However, Python has recently become more suitable for building deep learning (DL) network structures thanks to efficient support libraries such as TensorFlow, PyTorch, NumPy, Keras, and scikit-learn (Sklearn). More importantly, these libraries support optimization problems in data science, machine learning, and control [7]. Based on the advantages of AI techniques, many intelligent control systems have been built to solve IK problems for redundant manipulator systems. Furthermore, these AI techniques are well suited to control systems whose motion changes constantly in response to voice commands that may not be preprogrammed.

Many solutions that apply voice control systems (VCS) based on AI algorithms to industrial machines are mentioned in [8]. To determine the direction of an emitted sound source, Hwang et al. [9] designed an intelligent ear for the robot arm. To control fabrication machines and industrial robotic arms, Rogowski [10] designed a VCS solution with good noise resistance. For service tasks, several manipulators designed for human-friendly interaction through gesture recognition and voice feedback are introduced in [11–13]. The manipulator in [14], which serves household chores, is controlled by a VCS to increase usability and entertainment. An enhanced version of DL algorithm-based speech recognition is proposed in [15]. The medical robot arm in [16] is designed with a VCS that allows nurses and patients to interact easily with the robot. The manipulator in [17] uses a VCS with visible light communication. An autonomous manipulator controlled by voice through the Google Assistant application tool on the basis of IoT technology is shown in [18]. A voice-controlled application that uses IoT technology in combination with an adaptive NN is proposed in [19] to improve the efficiency of solving IK problems for 6-degrees-of-freedom (DOF) robots. In contrast, a Bayesian-BP NN is built in [20] to create an efficient control system for the root mean square (RMS) with fast and precise learning. The simulation results show that the error of the method is extremely small. The IK problem is solved using an NN for a 2DOF manipulator in [21], for a 3DOF robot in [22], and for a 4DOF robot with a hybrid IK control system combining an NN and a genetic algorithm in [23]. An NN with output feedback is proposed in [24] to solve the IK problem of a 6DOF manipulator. This technique has very high control efficiency. A new real-time control algorithm for a 5DOF manipulator based on an NN is proposed in [25].

This study sets up two deep learning networks, DL1 and DL2, that process voice signals and use the result as the input of a 6DOF redundant manipulator to solve the IK control problem. The control information in the voice command includes the direction of movement and the object whose attributes are given in the speech. The robot then performs image recognition to find the object whose attributes match the voice recognition result. The image recognition is performed by the computer's built-in vision module and is not analyzed in depth in this study. The center coordinates of the object represent the position to which the end-effector point of the manipulator needs to move. Training data for model DL2 are taken from the results of the forward kinematics problem, based on kinematics modeling according to Denavit–Hartenberg (DH) theory. The DL network models are built using the Python language. Successfully solving these two problems has a wide range of potential applications that respond to a constantly changing manipulator trajectory without preprogramming.

2. Materials and Methods

2.1. The Diagram of Voice-Based Controller. The manipulator receives voice commands from the operator through the voice recognition module. The control system then automatically analyzes the command, performs the calculations, and gives the control signals for the motors at the joints of the manipulator (Figure 1).

Figure 1: Diagram of voice control for the 6DOF manipulator arm.

Specifically, the voice recognition module converts a human voice containing control information into text in the program. The control information contained in the voice includes the direction of movement of the manipulator (turn to the left or right), the action the manipulator needs to perform (grabbing or dropping), the object (wheels, tray, boxes, etc.), and distinguishing features of the object (color, shape, size, etc.).

The input voice and the output control signal must be defined to solve the manipulator control target. In essence, the voice recognition module is a natural language processing problem, and a DL model is built so that the network learns how to convert information from voice to text. The steps to perform the VCS are depicted in Figure 2.

Figure 2: The steps to perform the VCS.

2.1.1. Preprocessing the Input Voice. This problem is solved through the following substeps: noise filtering, word separation, converting the sound oscillation into sound energy in the frequency domain, and converting this energy into input data for the DL1 model.

The noise filtering step can be handled in several ways: by the hardware design of the receiver microphone, by electronic elements of the recording circuit, or by adjustment in the program. A recorded voice contains the expected sounds to be captured and noise (unwanted sounds, or sounds without control information). Acoustic noise can come from the outside environment, such as traffic and industrial noise, and it often reduces the accuracy of speech recognition. To significantly reduce audio noise, a noise reduction transceiver is used in this study.

A spoken sentence usually consists of many words, and each word contains one or several syllables. Thus, a speech recognition program must in general perform two basic tasks: separating words in sentences and separating syllables in each word.

Interestingly, every Vietnamese word has only one syllable. Therefore, this study only needs to focus on the first task, separating words in sentences. Consider the following example.

Consider a Vietnamese voice command to control the manipulator: "Quay bên phải, lấy bánh xe màu vàng" ("turn right, grab the yellow wheel" in English). The Vietnamese sentence has 8 syllables, while the English one has 7 syllables, of which "yellow" has 2.

The voice is received through the microphone and recorded with the standard Voice Recorder application available on the Microsoft Windows operating system. The audio file can be read and written with the SciPy library in Python. The acoustic oscillation amplitude values are normalized so that the input signal does not contain many suboscillations, which makes the separation process more efficient and makes it easy to set a useful filter threshold.
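The reading and normalization step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the paper does not state which normalization it uses, so peak normalization to [-1, 1] is an assumption, as are the function names, the offset removal, and the 0.1 threshold. `scipy.io.wavfile.read` is the SciPy call for loading a WAV file; a synthetic signal stands in for a real recording here.

```python
import numpy as np


def load_wav(path):
    """Read a WAV file and return (sample_rate, samples)."""
    # SciPy is imported here so the helpers below also work without it.
    from scipy.io import wavfile
    return wavfile.read(path)


def normalize_amplitude(samples):
    """Scale a recording into [-1, 1] by its largest absolute sample."""
    x = np.asarray(samples, dtype=np.float64)
    if x.ndim > 1:
        x = x.mean(axis=1)      # stereo to mono (assumption: average channels)
    x = x - x.mean()            # remove a constant offset (assumption)
    peak = np.max(np.abs(x))
    return x if peak == 0 else x / peak


def speech_mask(normalized, threshold=0.1):
    """True where the normalized magnitude exceeds the filter threshold."""
    return np.abs(normalized) > threshold


# Synthetic stand-in for a recording: silence, a 200 Hz tone, silence.
rate = 8000
t = np.arange(rate) / rate
tone = 3000 * np.sin(2 * np.pi * 200 * t)
recording = np.concatenate([np.zeros(rate), tone, np.zeros(rate)])

normalized = normalize_amplitude(recording)
mask = speech_mask(normalized)
print(normalized.min(), normalized.max(), int(mask.sum()))
```

Because every recording is scaled to the same range, one threshold can be reused across recordings made at different volumes, which is the practical benefit the text attributes to normalization.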
After normalization, word decomposition is performed in the DL1 model, whose network node parameters can be adjusted through a learning process on samples to improve accuracy.

The acoustic oscillation amplitudes after normalization are shown in Figure 3. The normalized amplitudes differ clearly between speaking and not speaking. This difference is used as the key feature to separate words in sentences.

Figure 3: The normalized sound amplitude (fluctuation amplitude versus time in seconds, with the words "Quay", "bên", "phải", "lấy", "bánh", "xe", "màu", and "vàng" and an interference signal marked).

However, areas whose sound fluctuation amplitude is exceptionally large relative to other areas while speaking are treated as acoustic noise in the speech. In addition, oscillation regions with small and fairly equal amplitudes are also treated as noise signals that can be ignored. Therefore, if a user suddenly screams out a word or speaks all words in a sentence at low volume, the system may not understand the voice command.

The change in the amplitude of the sound oscillation is determined with the gradient method [26] to separate the words. After the words in the spoken sentence are separated, the sound oscillation is analyzed for its sound energy in the frequency domain through the Fourier transform. This sound energy value is then converted into the Tensor Input for the DL model. The sound of the human voice is actually a combination of many signals with different frequencies. The oscillation function can be described through the following Fourier expansion [17]:

$$f(t) = a_0 + \sum_{n=1}^{\infty} \left[ a_n \cos(n\omega t) + b_n \sin(n\omega t) \right], \quad (1)$$

where $a_0$ is the original sound amplitude, $a_n$ and $b_n$ are the Fourier constants, $n$ is the frequency ratio coefficient, $\omega$ is the angular frequency, and $t$ is the time variable.

From equation (1), the sound energy value in the frequency domain can be specified [17]. Figure 4 shows the sound energy of the two words "Quay" (turn) and "Phải" (right) in the frequency domain.

Figure 4: Sound energy of the word "Quay" (a) and the word "Phải" (b), plotted as energy versus frequency from 0 to 2 × 10^3 Hz.

A fundamental characteristic of sound is its energy value, which is used to build the input data for the DL model. The energy value of the sound is taken at frequencies spaced 1 Hz apart over the range 0 to 2 kHz. The Tensor Input is a vector of sound energy values in increasing order of frequency (Figure 5(a)). The values of the Tensor Input are usually very large when first created. For the DL model to learn better, the Tensor Input is normalized by dividing all components by a value greater than the maximum energy obtained. The Tensor Input for the DL model after normalization is shown in Figure 5(b).

Figure 5: Tensor Input before and after normalization. (a) Before: 30, 45, 20, 10, 13, 26, 18, 11, 8, ... (b) After: 0.3, 0.45, 0.2, 0.1, 0.13, 0.26, 0.18, 0.11, 0.08, ...

2.1.2. Building the DL1 Model. After the Tensor Inputs are built, the DL1 model is built with many inputs and many outputs (Figure 6), similar to the multilayer AI network in [27].

Figure 6: The multilayer artificial intelligence network [27]. (a) Simple neural network. (b) Deep learning neural network.

The number of inputs depends on the number of parameters in the Tensor Input vector. The output layer of network DL1 consists of several nodes, each of which represents a certain word. Each output word has a probability value in the range [0, 1]. The word with the highest probability value is chosen as the result of the voice-to-text conversion.

The hidden layers of the DL1 model determine the probability that each word is the correct output. The elements of the Tensor Input and Tensor Output are scalar quantities, so nonlinear activation functions are used. According to [28], nonlinear functions such as Sigmoid, Tanh, and ReLU can be used, and the output layer uses the Softmax activation function to calculate the probability distribution across the classes. The DL model simulates how biological neurons work, so it needs to be trained to reproduce the outputs for the corresponding inputs and to predict the results for other inputs. To train the DL model, a criterion must be defined that lets it distinguish between right and wrong. According to [29], the Sparse Categorical Crossentropy (SCC) function is used as follows: after each learning step, the DL model updates its parameters so that the actual output converges gradually to the desired value or, in other words, so that the error function value decreases toward 0. To update the DL model, the Adam optimization function is used [30]. Adam combines the momentum method and RMSprop, adapts its learning rate over time, and is less likely to stall in a poor local minimum. Model DL1 is built with the TensorFlow library in Python (Figure 7).

Line 47 of the listing in Figure 7 declares the output layer with 17 nodes and the Softmax activation function. These 17 outputs represent the 17 common words in the voice command framework. The Softmax activation function assigns a probability to each of them, and the one with the highest probability is selected, which separates the words and phrases from each other. A dictionary of the words or phrases and the number of words that appear in the sentence is constructed and encoded as a vector. In this way, network DL1 performs voice recognition and converts the recognition data into text containing specific control information.

2.1.3. Extracting Control Information Using the Machine Learning Model. After the Vietnamese sentence is separated into single words, the words are classified by the DL1 model to form the set of words needed to assemble an equivalent complete text, free of noise and redundant words. This complete text (meaningful Vietnamese words and phrases) is used as the input of the machine learning (ML) model.

The TF-IDF algorithm is used to extract features of the text. Then, the Naive Bayes algorithm is used to classify the feature words and phrases of the text into control information classes. The ML model is built in Python in combination with the libraries scikit-learn (Sklearn) and Pyvi. The extracted information fields are encoded numerically and transmitted to the manipulator control circuit via serial communication. The output of the model is manipulator control information such as the motion direction, the robot's action, and the object color.

2.2. Inverse Kinematics Control for the Manipulator Using Deep Learning Network. The real 6DOF manipulator arm is presented in Figure 8 and its kinematics model is described in Figure 9.

In the kinematics model, the fixed global coordinate system is $(OXYZ)_0$. The local coordinate systems $(OXYZ)_i$ ($i = 1, \ldots, 6$) are placed on the joints accordingly. The variable of joint $i$ is denoted by $q_i$.

Let $q = [q_1 \; q_2 \; q_3 \; q_4 \; q_5 \; q_6]^T$ denote the generalized coordinate vector of the 6 joint variables. The kinematics parameters of the 6DOF manipulator arm are determined according to the DH rule [1], as given in Table 1. The homogeneous transformation matrices $H_i$ ($i = 1, \ldots, 6$) of the six links are determined in [1] by the following general equation:

$$H_i = \begin{bmatrix} {}^{i-1}R_i & {}^{i-1}p_i \\ 0 & 1 \end{bmatrix}. \quad (2)$$
Figure 7: Model DL1 in Python.

In equation (2), ${}^{i-1}R_i$ is the rotation matrix from the local coordinate system $(OXYZ)_{i-1}$ to the local coordinate system $(OXYZ)_i$, and ${}^{i-1}p_i$ is the position vector of joint $i$ in the coordinate system $(OXYZ)_{i-1}$.

The position and direction of the end-effector relative to the fixed global coordinate system are represented by the homogeneous transformation matrix $D_6$. This matrix is calculated as follows:

$$D_6 = H_1 H_2 H_3 H_4 H_5 H_6, \qquad D_6 = \begin{bmatrix} R_E & p_E \\ 0 & 1 \end{bmatrix}, \quad (3)$$

where $R_E$ is the 3 × 3 direction matrix that rotates the global coordinate system $(OXYZ)_0$ to the local coordinate system of the end-effector $(OXYZ)_6$, and $p_E = [x_E \; y_E \; z_E]^T$ is the position vector of the end-effector relative to the fixed global coordinate system $(OXYZ)_0$.

Figure 8: Real 6DOF manipulator arm.

Figure 9: Kinematics model.

Table 1: DH parameters.

Parameters   θ_i            d_i          a_i    α_i
Link 1       q_1*           d_0 + d_1    0      π/2
Link 2       q_2*           0            a_2    0
Link 3       q_3* − π/2     d_3          0      −π/2
Link 4       q_4*           d_4          0      −π/2
Link 5       q_5*           0            a_5    0
Link 6       q_6*           0            0      0

Note: * means variable. a_i is the length from the origin O_i to the origin O_{i+1} along the axis OX_{i+1}; α_i is the angle for rotating (OXYZ)_i to (OXYZ)_{i+1} around the axis OZ_{i+1}; d_i is the length from the origin O_{i−1} to the origin O_i along the axis OZ_i; and θ_i is the angle for rotating (OXYZ)_{i−1} to (OXYZ)_i around the axis OZ_i.

By applying the DH parameters to equations (2) and (3) and performing the mathematical transformations (see the details in [1]), the position coordinates of the end-effector point are given as

$$\begin{aligned}
x_E ={}& \big((cq_1\,cq_2\,sq_3 + cq_1\,sq_2\,cq_3)\,cq_4 - sq_1\,sq_4\big)\,a_5\,cq_5 + (-cq_1\,cq_2\,cq_3 + cq_1\,sq_2\,sq_3)\,a_5\,sq_5 \\
&+ (cq_1\,cq_2\,cq_3 - cq_1\,sq_2\,sq_3)\,d_4 + cq_1\,cq_2\,d_3\,cq_3 - cq_1\,sq_2\,d_3\,sq_3 + cq_1\,a_2\,cq_2, \\
y_E ={}& \big((sq_1\,cq_2\,sq_3 + sq_1\,sq_2\,cq_3)\,cq_4 + cq_1\,sq_4\big)\,a_5\,cq_5 + (-sq_1\,cq_2\,cq_3 + sq_1\,sq_2\,sq_3)\,a_5\,sq_5 \\
&+ (sq_1\,cq_2\,cq_3 - sq_1\,sq_2\,sq_3)\,d_4 + sq_1\,cq_2\,d_3\,cq_3 - sq_1\,sq_2\,d_3\,sq_3 + sq_1\,a_2\,cq_2, \\
z_E ={}& (sq_2\,sq_3 - cq_2\,cq_3)\,cq_4\,a_5\,cq_5 + (-sq_2\,cq_3 - cq_2\,sq_3)\,a_5\,sq_5 \\
&+ (sq_2\,cq_3 + cq_2\,sq_3)\,d_4 + sq_2\,d_3\,cq_3 + cq_2\,d_3\,sq_3 + a_2\,sq_2 + d_1 + d_0,
\end{aligned} \quad (4)$$

where $cq_i$ stands for $\cos(q_i)$ and $sq_i$ stands for $\sin(q_i)$.

The data for the DL2 model are sets of spatial coordinates of the end-effector point and the corresponding sets of joint variable values. These data are collected and fed into the DL2 network for training multiple times until the model can give control signals for the manipulator accurately enough to meet the motion requirements. After it has been trained and its responsiveness has been assessed as good, the DL2 model is used to predict the manipulator rotation angle values for object positions in the manipulator workspace.

Figure 10 describes the entire process. The DL2 model is built with the request signal received after encoding the vector and the feasible position data in the workspace as its input. The output of the model is the corresponding joint variable values.

Figure 10: The building process for the DL2 model.

3. Experimental Results

The geometry parameters of the manipulator are as follows:

$$d_0 = 57\ \text{mm}, \quad d_1 = 36\ \text{mm}, \quad a_2 = 120\ \text{mm}, \quad d_3 = 90\ \text{mm}, \quad d_4 = 30\ \text{mm}, \quad a_5 = 38\ \text{mm}. \quad (5)$$

The joint variable limits are as follows:

$$\begin{aligned}
-2.97 &\le q_1 \le 2.97\ \text{(rad)}, \\
-0.52 &\le q_2 \le 2.7\ \text{(rad)}, \\
-0.785 &\le q_3 \le 2.97\ \text{(rad)}, \\
-2.7 &\le q_4 \le 2.7\ \text{(rad)}, \\
-0.785 &\le q_5 \le 2.97\ \text{(rad)}, \\
-3.57 &\le q_6 \le 3.57\ \text{(rad)}.
\end{aligned} \quad (6)$$

The workspace of the manipulator arm is shown in Figure 11.

Figure 11: The workspace of the 6DOF manipulator, panels (a) and (b), with axes x, y, and z in mm.

The experimental hardware consists of MG995 servo drive motors, an Arduino Nano circuit, a Logitech B525-720p camera, a Dell Precision M680 laptop, and a Razer Seiren Mini microphone (Figure 12).

Figure 12: Devices in the experimental system. (a) Dell Precision laptop, (b) Logitech camera, (c) RC Servo MG995, (d) microphone, (e) Arduino Mega 2560, (f) 12 V-5 A adapter, (g) XL4015 5A DC, (h) 16 × 2 LCD, and (i) joystick shield.

The parameters of network DL2, which controls the manipulator, are shown in Figure 13, with 5 outputs corresponding to 5 rotation angles of the manipulator joints. The network consists of 9 hidden layers with the ReLU activation function. The number of nodes per layer is presented in Figure 13. The training results and the prediction results for the motor control signals are shown in Figure 14. The model is checked on the test data with the position vector of the end-effector point in the workspace, x = [0 20 0] (mm), as the input; the output of the test data is the corresponding joint variable value. The q value obtained from the model is q = [90 50 105 90 79] (deg). The accuracy on the test dataset is 98.67%.

Figure 13: Network parameter DL2 for IK control.

Figure 14: Training results and prediction on the test dataset.

The actual experimental system, with the circuit that reads and writes the joint variable values and shows the feedback values on the 16 × 2 LCD, is shown in Figure 15.

Figure 15: The actual experimental system.

The joint variable values that move the manipulator arm to the position of the object (a yellow wheel) are shown in Figure 16.

Figure 16: The joint values received by the voice command (value of joint position in degrees versus time in seconds/100, for joints 1 to 5).

4. Discussion

In actual operation, industrial robots in general and redundant manipulators in particular often do not perform as perfectly as calculated under ideal conditions, because many different factors, collectively called noise, make the robot control system imperfect. According to [31], although imperfections are unavoidable in real production processes, real devices still operate well in regimes far from ideality.

For example, mechanical imperfections may arise before operation from manufacturing defects and assembly errors, or during operation from vibrations of the mechanical system. Electrical imperfections can be caused by electromagnetic interference from the surrounding environment, instability of the power supply, or high-intensity electric pulses from welding machines. To overcome these imperfections, additional modules for noise compensation, noise cancellation, or noise suppression will be studied in the next research stages.

This study only considers the kinematics problem under ideal conditions, in which the impact of noise can be ignored. In fact, no general anti-interference solution exists for all types of noise. Therefore, in practical applications, the research team will apply anti-interference solutions suited to each context.

For group coordination between multiple voice-controlled robots in a narrow space, the naming or coding of each robot needs to be handled by an independent module with name recognition or decoding capabilities. When the operator calls a robot's name or activates its code, that robot is ready to receive the next voice commands. Thus, when a new robot has to be added to an existing robot network, the name recognition or decoding module can be adjusted without any change to the rest of the control system.

In a robot network, audio imperfections may also come from voice interference. These imperfections can be addressed through connections of different ranges controlled by a central dispatcher, and the voice interference "can be improved by including long-range connections between the robots" [32].

5. Conclusion

In summary, the Python language has been applied to build AI models for the Vietnamese voice recognition module and IK control of the 6DOF redundant manipulator. DL and ML techniques have been applied successfully with over 98% training accuracy.
The data used to train models DL1 and DL2 were built independently, from the Vietnamese language and from data calculated with the kinematics model of the 6DOF manipulator, respectively. The AI models were tested on real manipulator models and gave feasible results. This study could serve as a foundation for developing applications for various types of manipulators (serial, parallel, hybrid, and mobile manipulators) in industrial production (welding robots, 3D printing robots, and machining robots) and in medical, service, and home applications (surgical robots, flexible robots, soft robots, humanoid robots, UAVs, and service robots in families and restaurants).

Data Availability

The datasets generated during the current study are available from the corresponding authors on reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] M. W. Spong, S. Hutchinson, and M. Vidyasagar, Robot Modeling and Control, First edition, 2001.
[2] D. X. Bien, “On the effect of the end-effector point trajectory on the joint jerk of the redundant manipulators,” Journal of Applied and Computational Mechanics, vol. 20, no. 10, pp. 1–8, 2021.
[3] C. A. My, D. X. Bien, H. B. Tung, L. C. Hieu, N. V. Cong, and T. V. Hieu, “Inverse kinematic control algorithm for a welding robot positioner system to trace a 3D complex curve,” in Proceedings of the International Conference on Advanced Technologies for Communications (ATC), pp. 319–323, Hanoi, Vietnam, October 2019.
[4] S. Lian, Y. Han, Y. Wang et al., “Accelerating inverse kinematics for high-DOF robots,” in Proceedings of the 54th Annual Design Automation Conference, Austin, TX, USA, 2017.
[5] https://www.edureka.co/blog/artificial-intelligence-algorithms/.
[6] https://www.geeksforgeeks.org/top-5-best-programming-languages-for-artificial-intelligence-field/.
[7] https://www.cuelogic.com/blog/role-of-python-in-artificial-intelligence.
[8] D. P. Mital and G. W. Leng, “A voice-activated robot with artificial intelligence,” Robotics and Autonomous Systems, vol. 4, no. 4, pp. 339–344, 1989.
[9] S. Hwang, Y. Park, and Y.-s. Park, “Sound direction estimation using an artificial ear for robots,” Robotics and Autonomous Systems, vol. 59, no. 3-4, pp. 208–217, 2011.
[10] A. Rogowski, “Industrially oriented voice control system,” Robotics and Computer-Integrated Manufacturing, vol. 28, no. 3, pp. 303–315, 2012.
[11] V. Alvarez-Santos, R. Iglesias, X. M. Pardo, C. V. Regueiro, and A. Canedo-Rodriguez, “Gesture-based interaction with voice feedback for a tour-guide robot,” Journal of Visual Communication and Image Representation, vol. 25, no. 2, pp. 499–509, 2014.
[12] S. S. Turakne and P. Loni, “Intelligent interactive robot with gesture recognition and voice feedback,” International Journal of Engineering Research & Technology, vol. 5, no. 4, pp. 276–280, 2016.
[13] M. Meghana, Ch. U. Kumari, J. S. Priya et al., “Hand gesture recognition and voice controlled robot,” Materials Today: Proceedings, vol. 33, no. 7, pp. 4121–4123, 2020.
[14] M. F. Rafael and D. S. Manuel, “Design in robotics based in the voice of the customer of household robots,” Robotics and Autonomous Systems, vol. 79, pp. 99–107, 2016.
[15] M. Buyukyilmaz and A. O. Cibikdiken, “Voice gender recognition using deep learning,” Advances in Computer Science Research, vol. 58, pp. 409–411, 2017.
[16] K. Gundogdu, S. Bayrakdar, and I. Yucedag, “Developing and modeling of voice control system for prosthetic robot arm in medical systems,” Journal of King Saud University-Computer and Information Sciences, vol. 30, no. 2, pp. 198–205, 2018.
[17] V. P. Saradi and P. Kailasapathi, “Voice-based motion control of a robotic vehicle through visible light communication,” Computers & Electrical Engineering, vol. 76, pp. 154–167.
[18] S. Sachdev, J. Macwan, C. Patel, and N. Doshi, “Voice-controlled autonomous vehicle using IoT,” Procedia Computer Science, vol. 160, pp. 712–717, 2019.
[19] A. T. Hasan, A. M. S. Hamouda, N. Ismail, and H. M. A. A. Al-Assadi, “An adaptive-learning algorithm to solve the inverse kinematics problem of a 6 D.O.F serial robot manipulator,” Advances in Engineering Software, vol. 37, no. 7, pp. 432–438, 2006.
[20] Y. Zhou, W. Tang, and J. Zhang, “Algorithm for multi-joint redundant robot inverse kinematics based on the bayesian-BP neural network,” in Proceedings of the 2008 International Conference on Intelligent Computation Technology and Automation (ICICTA), pp. 173–178, Changsha, China, 2008.
[21] B. Daya, S. Khawandi, and M. Akoum, “Applying neural network architecture for inverse kinematics problem in robotics,” Journal of Software Engineering and Applications, vol. 3, no. 3, pp. 230–239, 2010.
[22] A.-V. Duka, “Neural network based inverse kinematics solution for trajectory tracking of a robotic arm,” Procedia Technology, vol. 12, pp. 20–27, 2014.
[23] R. Köker and T. Çakar, “A neuro-genetic-simulated annealing approach to the inverse kinematics solution of robots: a simulation based study,” Engineering with Computers, vol. 32, no. 4, pp. 553–565, 2016.
[24] A. R. J. Almusawi, L. C. Dülger, and S. Kapucu, “A new artificial neural network approach in solving inverse kinematics of robotic arm (denso VP6242),” Computational Intelligence and Neuroscience, vol. 2016, Article ID 5720163, 10 pages, 2016.
[25] P. M. Shailendrasingh and L. P. Pratap, “A real-time approach using feed forward neural network for a 5 DOF robot manipulator,” in Proceedings of the IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI-2017), pp. 1240–1245, Chennai, India, September 2017.
[26] A. Garzelli, L. Capobianco, and F. Nencini, “Fusion of multispectral and panchromatic images as an optimization problem,” Image Fusion Algorithms and Applications, pp. 223–250, Academic Press, Cambridge, MA, USA, 2008.
[27] https://www.securityinfowatch.com/video-surveillance/video-analytics/article/21069937/deep-learning-to-the-rescue.
[28] https://www.programmersought.com/article/10025152444/.
[29] https://www.Tensorflow.org/api_docs/python/tf/keras/losses/sparse_categorical_crossentropy.
[30] https://www.programmersought.com/article/33553292079/.
[31] M. Bucolo, A. Buscarino, C. Famoso, L. Fortuna, and M. Frasca, “Control of imperfect dynamical systems,” Nonlinear Dynamics, vol. 98, no. 4, pp. 2989–2999, 2019.
[32] A. Buscarino, L. Fortuna, M. Frasca, and A. Rizzo, “Dynamical network interactions in distributed control of robots,” Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 16, no. 1, Article ID 015116, 2006.

Journal

Journal of Robotics, Hindawi Publishing Corporation

Published: Jun 29, 2021
