Simulation Design of a Live Working Manipulator for Patrol Inspection in Power Grid
Xie, Tao; Li, Zhengyu; Zhang, Yongyun; Yuan, Bin; Liu, Xiao
2022-09-14
Hindawi Journal of Robotics, Volume 2022, Article ID 7318090, 8 pages. https://doi.org/10.1155/2022/7318090

Research Article

Simulation Design of a Live Working Manipulator for Patrol Inspection in Power Grid

Tao Xie (1), Zhengyu Li (2), Yongyun Zhang (2), Bin Yuan (2), and Xiao Liu (2)

(1) State Grid Shanxi Electric Power Company, Taiyuan, Shanxi 030021, China
(2) State Grid UHV Transformation Co. of SEPC, Taiyuan, Shanxi 030032, China

Correspondence should be addressed to Tao Xie; 220192216007@ncepu.edu.cn

Received 15 August 2022; Revised 24 August 2022; Accepted 26 August 2022; Published 14 September 2022

Academic Editor: Shahid Hussain

Copyright © 2022 Tao Xie et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract. The distribution line network is the electric power infrastructure that directly faces users. It covers a wide area, has a complex topology, and its operational safety directly determines the stability and reliability of the power supply system, making it the key link in ensuring safe power supply and reliable residential electricity consumption. To realize autonomous obstacle avoidance and autonomous navigation of a live working manipulator for inspection and maintenance of power grid equipment, an intelligent control method for a mobile manipulator that combines SumTree-weighted sampling with the deep deterministic policy gradient (DDPG) algorithm is proposed. First, the traditional DDPG algorithm is improved by optimizing the Q-learning action value function to obtain a better control strategy, and a weighted sampling technique assigns a priority to each sample in the replay buffer, which improves the learning speed and accelerates convergence. The corresponding environmental state space is designed, and simulation experiments are conducted to verify the proposed manipulator control method. Simulation results demonstrate that the proposed method outperforms the traditional DDPG and DQN methods in obstacle avoidance and navigation tasks, with faster convergence, better path planning ability, and lower offset cost, and it can provide theoretical and technical references for realizing fully autonomous power grid inspection operations.

1. Introduction

Mechanization, automation, and intelligentization of live working operations are the development trend and a vital technological direction for realizing intelligent inspection and maintenance of the power grid and safe live working operation [1]. At present, research related to the automation of substation equipment and power grid operation has been carried out worldwide, pilot applications have been deployed, and certain results have been achieved that can replace or assist the manual completion of some typical inspection and maintenance projects [2]. This kind of technology has been innovatively developed and gradually applied in recent years; it enriches the technical means of live working power grid operation, improves the safety and automation level of the operation, and has good development prospects [3].

Since the 1980s, the United States, Canada, Spain, France, Japan, and other developed countries have successively carried out research on live working robots, as have organizations such as the Japan Kyushu Electric Power Company and the State Grid Corporation of China. According to their degree of automation, these robots can be divided into three categories: manual remote control, remote teleoperation, and autonomous operation [4].
Manipulator control is based on an automatic control algorithm that searches the locations of objects in the workspace, accurately identifies obstacles in the travel path, and plans the most reasonable motion trajectory [5]. Deep learning is an advanced stage in the development of artificial intelligence and one of its most important and popular research areas. However, the robot passes through many different states during obstacle avoidance and navigation, and two core problems must be solved: large differences in strategy distribution and sparse positive feedback rewards. Traditional reinforcement learning models cannot explore the space completely, so the strategy easily falls into local minima, and they cannot derive a full-space obstacle avoidance and navigation control strategy for the manipulator even with long training times and many training rounds [6].

In this article, by deeply analyzing the live working process and action sequence of the mobile manipulator, the traditional deep deterministic policy gradient (DDPG) algorithm is optimized and improved to increase the dimensionality, scalability, and generalization capability of the algorithm, making it more applicable to trajectory planning and obstacle avoidance control of the mobile manipulator and improving the control accuracy of path planning for the robot. Specifically, based on the DDPG algorithm, this article adopts a weighted sampling method built on a SumTree data structure instead of uniform sampling, so that successful samples have a higher chance of being learned by the agent, and the Q-learning action value function is optimized and applied to autonomous power grid inspection and maintenance tasks. Simulation results prove that the proposed method enables the manipulator to complete these tasks more accurately and quickly.

The rest of this article is organized as follows: Section 2 introduces the related research. The basic principles of the proposed method are explained in Section 3. Section 4 describes the proposed intelligent manipulator control method. Section 5 gives the simulation results and discussion. Finally, Section 6 concludes the whole article.

2. Related Works

Many effective path planning methods have been proposed for mobile robots worldwide, but most of them are not applicable to robotic arms, because manipulators are complex nonlinear systems with challenging factors such as high degrees of freedom and coupling between connecting rods, which make planning more difficult [7].

Researchers have explored deep reinforcement learning for grasping, localization, and obstacle avoidance of industrial manipulators with some success. Mnih et al. [8] proposed the Deep Q-Network (DQN) algorithm, which combines neural networks with Q-learning; the model was trained on ATARI 2600 games, and its final performance was much higher than that of humans. Lillicrap et al. [9] proposed the DDPG algorithm and applied it to high-dimensional continuous action spaces. Rusu et al. [10] transferred training results from a simulation environment to a physical robotic arm and achieved results similar to the simulation after only simple additional training. Liz et al. [11] proposed an improved DDPG algorithm in which a success experience pool and a collision experience pool are added to optimize the traditional experience pool; compared with the traditional DDPG algorithm, the improved DDPG algorithm needs fewer training episodes but achieves better results. Luck et al. [12] combined the DDPG algorithm with a model-based trajectory optimization approach in which the learned deep dynamics model was used to compute the policy gradient and the value function was adopted as a criterion, thus improving the efficiency of training.

In the study of practical operations of robotic arms, [13] evaluated, based on deep reinforcement learning techniques, the optimal grasping point and grasping stability of robotic arms considering the working performance under different tasks, in order to meet the requirement of grasping objects of different shapes in various tasks and to achieve collision-free autonomous mobile robot navigation. Sangiovanni et al. [14] applied reinforcement learning methods to the obstacle avoidance and navigation tasks of robot manipulators and achieved obstacle-free trajectories during manipulator movement; however, the model was only trained and tested for a single target point, and full operational space decision-making was not achieved. Wei and Shan [15] proposed to treat the three arms of the manipulator as three different agents, with the three arms restricted to remain in the same plane and the control strategies set separately, realizing full working space decision-making of the robot manipulator. In summary, the above methods do not change the way samples are drawn but still collect samples uniformly from the replay buffer for training, so the agent cannot learn from successful samples efficiently, which leads to long training times [16].
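To make the contrast with uniform replay concrete, the following sketch shows how a SumTree-backed replay buffer of the kind adopted in this article can be organized. It is a minimal illustration under assumptions of our own (class and method names, fixed capacity, segment-based sampling), not the authors' implementation.

```python
import random

class SumTree:
    """Binary tree whose leaves hold sample priorities and whose internal
    nodes hold the sum of their children, so drawing a priority-weighted
    sample costs O(log n). Illustrative sketch only."""

    def __init__(self, capacity):
        self.capacity = capacity                  # max number of transitions
        self.tree = [0.0] * (2 * capacity - 1)    # internal nodes + leaves
        self.data = [None] * capacity             # stored transitions
        self.write = 0                            # next leaf slot to overwrite

    def update(self, idx, priority):
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:                           # propagate the change to the root
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def add(self, priority, transition):
        idx = self.write + self.capacity - 1      # leaf index in the tree array
        self.data[self.write] = transition
        self.update(idx, priority)
        self.write = (self.write + 1) % self.capacity

    def get(self, s):
        """Descend from the root: go left if s falls in the left subtree,
        otherwise subtract the left sum and go right."""
        idx = 0
        while 2 * idx + 1 < len(self.tree):       # stop at a leaf
            left, right = 2 * idx + 1, 2 * idx + 2
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = right
        return self.tree[idx], self.data[idx - self.capacity + 1]

    def total(self):
        return self.tree[0]

def sample_batch(tree, batch_size):
    """Priority-weighted sampling: the total priority mass is split into equal
    segments and one point is drawn uniformly from each, so high-priority
    (e.g., successful) transitions are replayed more often than under
    uniform sampling."""
    segment = tree.total() / batch_size
    return [tree.get(random.uniform(i * segment, (i + 1) * segment))
            for i in range(batch_size)]
```

In a prioritized DDPG buffer of this kind, the priority of each stored transition would typically be derived from its temporal-difference error plus a small constant, so informative or successful samples are revisited more often.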
3. Fundamental Principles

3.1. Live Working Manipulator and Kinematic Analysis. Considering that the end-effector for live working operation weighs more than 5 kg, this article takes the UR10 robot arm as the research object and carries out its abstract modeling and kinematic analysis, mainly studying the motion state and control strategy of the three-segment arm of the UR10 in the Cartesian coordinate system [17].

3.1.1. Cartesian Coordinate System. In the Cartesian coordinate system, (x, y, z) coordinates can represent any position in three-dimensional space. By constructing multiple Cartesian coordinate systems with the same basic orientation and origins {o_1, o_2, o_3, o_4}, the first and last positions of each endpoint of the manipulator in three-dimensional space can be calculated by accumulating the increments of each segment, as shown in Figure 1.

Figure 1: Diagrams of the manipulator and its simplified model in the Cartesian coordinate system. (a) UR10 manipulator. (b) Simplified model.

3.1.2. Motion Degree of Freedom Analysis. The kinematic analysis of the manipulator used in this article is mainly a forward kinematics problem [18]: given the arm lengths l_1, l_2, and l_3 of the three-segment arm and the rotation angles θ_1, θ_2, θ_3, θ_4, θ_5, and θ_6, solve for the endpoint coordinates of the manipulator in three-dimensional space. Using the z-coordinate axis as the rotation axis and the endpoint of the first segment of the robot arm as the origin, a manipulator model is constructed, and the six rotation angles are defined as six degrees of freedom. θ_1 and θ_2 control the first segment, where θ_1 represents the rotation of l_1 around the z-axis and θ_2 represents the angle between l_1 and the x-y plane; l_1, l_2, and the z-axis are kept in the same plane. θ_3 controls l_2 and represents its angle with l_1; for convenience of calculation, it is expressed as the angle between the second segment and the x-y plane of the o_2 coordinate system. θ_4 to θ_6 control l_3: θ_4 and θ_5 act on l_3 within the o_3 coordinate system in the same way that θ_1 and θ_2 act on l_1, allowing l_3 to move at any angle, and θ_6 rotates l_3 around its own arm axis without affecting the endpoint coordinates. The coordinates of each node of the manipulator can be calculated as

Δl_1 = (l_1 cos θ_2 cos θ_1, l_1 cos θ_2 sin θ_1, l_1 sin θ_2),
Δl_2 = (l_2 cos θ_3 cos θ_1, l_2 cos θ_3 sin θ_1, l_2 sin θ_3),
Δl_3 = (l_3 cos θ_5 cos θ_4, l_3 cos θ_5 sin θ_4, l_3 sin θ_5),
o_2 = o_1 + Δl_1,
o_3 = o_2 + Δl_2,
o_4 = o_3 + Δl_3.    (1)
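As a concrete illustration of equation (1), the short sketch below accumulates the three segment increments to obtain the joint origins o_2, o_3, o_4. The function name and the example link lengths and joint angles are assumptions made for the illustration, not values from the paper.

```python
import numpy as np

def forward_kinematics(o1, lengths, thetas):
    """Accumulate the segment increments of equation (1).
    o1: origin of the first segment, shape (3,).
    lengths: (l1, l2, l3); thetas: (th1..th6), in radians.
    Returns o2, o3, o4; th6 only spins the last link about its own axis
    and does not change the endpoint, so it is unused here."""
    l1, l2, l3 = lengths
    th1, th2, th3, th4, th5, th6 = thetas

    dl1 = np.array([l1 * np.cos(th2) * np.cos(th1),
                    l1 * np.cos(th2) * np.sin(th1),
                    l1 * np.sin(th2)])
    dl2 = np.array([l2 * np.cos(th3) * np.cos(th1),
                    l2 * np.cos(th3) * np.sin(th1),
                    l2 * np.sin(th3)])
    dl3 = np.array([l3 * np.cos(th5) * np.cos(th4),
                    l3 * np.cos(th5) * np.sin(th4),
                    l3 * np.sin(th5)])

    o2 = o1 + dl1
    o3 = o2 + dl2
    o4 = o3 + dl3          # end-effector endpoint
    return o2, o3, o4

# Example with made-up link lengths (metres) and joint angles (degrees).
o2, o3, o4 = forward_kinematics(np.zeros(3), (0.6, 0.6, 0.3),
                                np.deg2rad([30, 60, 20, 45, 10, 0]))
print(o4)
```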
3.2. Deep Deterministic Policy Gradient. In a standard reinforcement learning setting, the agent interacts with the environment with the ultimate goal of maximizing the environmental gain. This interactive process is formally described as a Markov decision process (MDP) [19], given by the quadruple (S, A, R, P), where S is the state space, A is the action space, R: S × A → R is the reward function, and P: S × A × S → [0, 1] is the transition probability. In this environment, the agent learns a strategy π: S → A to maximize the gain obtained from the environment:

R_0 = Σ_{i=0}^{T} r(s_i, a_i),    (2)

where T is the number of steps advanced at the end of the interaction and r(s_i, a_i) denotes the gain obtained by executing a_i in the state s_i. Typically, the long-term gain is discounted by the parameter γ:

R = Σ_{i=0}^{T} γ^i r(s_i, a_i),    (3)

where γ ∈ (0, 1). The long-term benefit of performing action a in a state s is generally represented by the action value function:

Q(s_t, a_t) = E[R_t | s_t = s, a_t = a] = E[Σ_{i=t}^{T} γ^{i−t} r(s_i, a_i)].    (4)

The optimal action value function is usually found using the Bellman equation [20]:

Q*(s_t, a_t) = E[r(s_t, a_t) + γ max_{a_{t+1}} Q*(s_{t+1}, a_{t+1})].    (5)

However, this approach is only suitable for situations where both the action space and the state space are discrete. To apply reinforcement learning to problems with continuous action and state spaces, DDPG designs two deep neural networks: the action value network Q(s_t, a_t | θ^Q) and the action network μ(s_t | θ^μ), where θ^Q and θ^μ are the network parameters. The action network μ(s_t | θ^μ) is a mapping from the state space to the action space, which directly generates the desired action from the current state. The action value network Q(s_t, a_t | θ^Q) approximates the action value function and provides gradients for training the action network.
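A minimal actor-critic pair of this kind might look as follows. PyTorch, the layer sizes, and the use of tanh to bound the actions are assumptions made for this sketch, not details taken from the paper.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Action network mu(s | theta_mu): maps a state to a continuous action."""
    def __init__(self, state_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # bound actions to [-1, 1]
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Action value network Q(s, a | theta_Q): scores a state-action pair."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))
```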
The action value network is trained by minimizing the loss function

L(θ^Q) = (r(s_t, a_t) + γ Q′(s_{t+1}, a_{t+1} | θ^{Q′}) − Q(s_t, a_t | θ^Q))²,    (6)

where Q′ is the target value network whose weights are synchronized from the Q-network. Updating the action network parameters requires the policy gradient algorithm, in which the gradient update direction can be expressed as

∇_{θ^μ} J ≈ ∇_a Q(s, a | θ^Q)|_{s=s_t, a=μ(s_t)} ∇_{θ^μ} μ(s | θ^μ)|_{s=s_t}.    (7)
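A single DDPG update step corresponding to equations (6) and (7) could be sketched as below. The optimizer choice, discount factor, and soft-update rate are assumptions, and the Actor and Critic classes are the illustrative ones defined earlier, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

GAMMA, TAU = 0.99, 0.005  # assumed discount factor and soft-update rate

def ddpg_update(batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt):
    """One gradient step on the critic loss (6) and the actor objective (7)."""
    state, action, reward, next_state = batch  # tensors drawn from the replay buffer

    # Critic target: r + gamma * Q'(s_{t+1}, mu'(s_{t+1}) | theta_Q')
    with torch.no_grad():
        target_q = reward + GAMMA * target_critic(next_state,
                                                  target_actor(next_state))
    critic_loss = F.mse_loss(critic(state, action), target_q)  # equation (6)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor update: ascend the policy gradient of equation (7),
    # i.e., maximize Q(s, mu(s)) by minimizing its negative.
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft update of the target networks toward the online networks.
    for target, online in ((target_critic, critic), (target_actor, actor)):
        for tp, p in zip(target.parameters(), online.parameters()):
            tp.data.mul_(1.0 - TAU).add_(TAU * p.data)

    return critic_loss.item(), actor_loss.item()
```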