Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Autonomous Robotic Manipulation: Real-Time, Deep-Learning Approach for Grasping of Unknown Objects

Autonomous Robotic Manipulation: Real-Time, Deep-Learning Approach for Grasping of Unknown Objects Hindawi Journal of Robotics Volume 2022, Article ID 2585656, 14 pages https://doi.org/10.1155/2022/2585656 Research Article Autonomous Robotic Manipulation: Real-Time, Deep-Learning Approach for Grasping of Unknown Objects 1 2 1 Malak H. Sayour , Sharbel E. Kozhaya , and Samer S. Saab Department of Electrical Mechatronics and Computer Engineering, Lebanese American University, Beirut, Lebanon Henry Samueli School of Engineering, University of California, Irvine, USA Correspondence should be addressed to Samer S. Saab; ssaab@lau.edu.lb Received 24 January 2022; Revised 15 April 2022; Accepted 17 May 2022; Published 30 June 2022 Academic Editor: Weitian Wang Copyright © 2022 Malak H. Sayour et al. �is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Recent advancement in vision-based robotics and deep-learning techniques has enabled the use of intelligent systems in a wider range of applications requiring object manipulation. Finding a robust solution for object grasping and autonomous manipulation became the focus of many engineers and is still one of the most demanding problems in modern robotics. �is paper presents a full grasping pipeline proposing a real-time data-driven deep-learning approach for robotic grasping of unknown objects using MATLAB and convolutional neural networks. �e proposed approach employs RGB-D image data acquired from an eye-in-hand camera centering the object of interest in the Šeld of view using visual servoing. Our approach aims at reducing propagation errors and eliminating the need for complex hand tracking algorithm, image segmentation, or 3D reconstruction. �e proposed approach is able to e‘ciently generate reliable multi-view object grasps regardless of the geometric complexity and physical properties of the object in question. �e proposed system architecture enables simple and e’ective path generation and a real-time tracking control. In addition, our system is modular, reliable, and accurate in both end e’ector path generation and control. We experimentally justify the e‘cacy and e’ectiveness of our overall system on the Barrett Whole Arm Manipulator. designing hand-engineered features for the detection of 1. Introduction grasps. �e performance is then evaluated according to Recent advancements in robotics and autonomous systems physical models [6] such as the ability to resist external have resulted in an increase in autonomous capabilities and wrenches [7] or the ability to constrain the object’s motion the use of more intelligent machines in a wider range of [6, 8]. Such hand-engineered function could reach a very applications [1–4]. Gripping various geometric objects is a complex state which implies lots of generalization and key competency for robots to achieve more general-purpose simpliŠcation of the kinetic and physical problem. �e work functionality [5]. �e perceptual abilities of such general- in [9] describes an automated grasping system for 2.5D purpose robots might be used to visually recognize robotic objects. �e approach relies on capturing the object surface end e’ector conŠgurations to safely grab and lift an object curvature with a focus on concave and convex edges to without slipping [5]. Multiple techniques have been pro- achieve 2D immobilization using numerical computation posed to generate e’ective grasps for everyday robotic ap- and hand-engineered equations. �e optimal grasp is then plications. Nevertheless, robotic grasping still provides many generated and extended to 2.5D by performing tactile challenges for researchers and is still among the most de- probing. �e work in [10] proposes a grasping algorithm for manding problems in modern robotics. 3 Šnger grippers. �eir grasp synthesis approach relies on �e various approaches of robotic grasping can be di- object curvature by mean of a curvature function. �e vided into two categories, analytical and data-driven [5]. generated grasps are then Šltered to achieve maximum Most of the e’ort of traditional analytical approaches lies in stability using multiple kinematic criteria such as the force- 2 Journal of Robotics appealing for heavy industrial applications [37]. Real-time closure criterion and finger adaptation criterion. Analytical methods may not be the ideal approach for the exploration control is a critical and essential feature for operation of autonomous industrial manipulators [38]. Multiple of previously unknown scenes since they restrict the system to situations predicted by the programmer [5]. In addition, techniques have emerged to enable real-time control of such methods tend to heavily rely on the geometric features autonomous systems [39, 40]. To enable flexible and easy of a specific object, which might be corrupted with partial control of industrial manipulators, suppliers like universal and noisy information from image and depth sensors [11]. robot or Barrett often include external controllers or On the other hand, empirical methods provide an increased internal PCs. )ose external controllers however increase cognitive and adaptive capacity to the robots, while elimi- the cost of robot manipulators and make it even harder for companies to adopt autonomous solution for their desired nating the need to manually model a robotic solution [12]. Some techniques associate good grasps points with an offline applications. Implementing easy flexible and modular real-time controllers could help engineers build real-time database [11] reducing the challenge to object recognition and pose estimation during online execution. However, this solutions and decrease the cost autonomous systems. Industrial companies such as Ocado technologies, BMW approach is unsuitable for the exploration of unknown scenes and is unable to generalize to new objects. group, and Amazon are looking for finding simple and Deep leaning techniques were proven efficient in effective solutions that can lead to accurate grasp generation solving multiple automation problems [13–15] in a variety of unknown objects with low computation efforts [41, 42]. of fields including medicine [16–18], security [19, 20], )e latter is considered as the primary motivation behind management [21–24], and Internet of things [25]. With our work. We propose a simple, effective, and complete the proliferation of vision-based deep-learning techniques grasp pipeline for practical applications, a pipeline that is easy to implement by novice engineers, easy to handle by [26–29], significant advancements have been made in the generation of potential grasps. Multiple deep-learning average computers, and performs well in noisy industrial environment. We present a full real-time grasping pipeline approaches have been studied over the years; see, e.g., [5] and the references therein. Some of the methods rely on with real-time MATLAB Simulink controller and data- driven deep-learning approach for robotic grasping of un- deploying static cameras for object detection and pose estimation [30], and the other class makes use of eye-in- known objects using generative grasps Deep Convolutional hand cameras mounted on the robot manipulator [31]. Neural Networks (CNN). Although static cameras can achieve efficient task per- We make the following contributions: formance, they require complex hand tracking algo- (i) Computer Vision: the proposed computer vision rithms, 3D reconstruction, and image segmentation for approach integrates multiple simple and effective accurate estimation of object pose and orientation [30]. As techniques for grasp generation. In addition, relying for the control and decision-making problem, multiple on eye-in-hand camera and visual servoing, we are architectures have been proposed to achieve autonomous able to reduce the error propagation while elimi- manipulation [30–32]. )e work in [30] proposes the use nating the need of complex hand tracking algo- of a logical structuring of tasks driven by a behavior tree rithms. Detecting and centralizing the object in architecture. )is allows a reduction in execution time question helped in capturing the object features and while integrating advanced algorithms for autonomous led to more accurate grasps. manipulation. However, behavior trees are found to be (ii) Real-Time control: this paper proposes a real-time computationally heavy due to the involved ticking pro- controller using MATLAB Simulink real time for cedure [33]. For complex movement tasks, the reader is autonomous manipulation. )e proposed real-time referred to the work in [34]. )is study proposes a co- controller eliminates the need of internal PC or the variance matrix adaptation-evolution strategies to ensure need of external controllers. Our proposed ap- the safety and robustness of the robot’s movements in a proach involves a simple block diagram, highly noisy environment. )e approach takes advantage of the flexible and more intuitive for the user. dynamic movement primitives method to represent the robot’s reference trajectories in joint space that are used (iii) Flexible grasp generation: since CNN limits the for trajectory tracking. A vision system is employed to get grasp generation to the normal of the plane of view, the coordinates of the target object. )e effectiveness of we propose a flexible multi-view grasp generation the entire system is tested on a 4-DOF Barrett WAM while eliminating such limitations. )e generated robotic arm. )e Robot Operating System (ROS) on the grasp is therefore not restricted to a single plane but other hand is also widely used for both the control rather multiple angles of view capturing more problem and interprocess communication [12]. ROS is an features. open-source robotic middleware platform for large-scale )e structure of the paper is as follows. Section 2 robotic system development. )e work in [35] provides a presents the proposed system as well as an overview of the structured communication layer on top of the host op- methods used for autonomous grasp generation and ma- erating system for heterogeneous cluster computing [36]. nipulation. Section 3 includes our experimental results. While ROS is proven efficient in robotics applications, Finally, the paper comes to a conclusion and motivates real-time capabilities remain a serious issue [37]. ROS further challenges. does not support real-time applications and may not be Journal of Robotics 3 graspable object is needed. Coded in Python, we implement 2. Methodology a task planner that uses PyTorch and OpenCV to detect and In this section, we describe each part of our system archi- locate the object; we apply visual servoing and generate firm tecture illustrated in Figure 1. In addition, we present our grasps based on RGB-D images of the object in question. proposed grasp planning system that deals with object de- More details about the planner are provided in the following tection, object centralization, and grasp generation algo- section. Once the desired end effector is generated, the rithm; path planning that generates the desired trajectories output is sent to the real-time controller for execution. of the robot joint positions; and the associated tracking controller. 2.2. Grasp Planning. Generating a gripper configuration that maximizes a success metric consists of the main objective of 2.1. System Architecture grasp planning. Taking into consideration numerous object shapes and all possible orientation variations in the work- 2.1.1. Hardware. For autonomous grasping, the Barrett space, the planner aims at providing the correct grasp Whole Arm Manipulator (WAM) 7-DOF robotics arm is representation that leads to successful object gripping. employed. )e WAM arm has direct-drive capability sup- Robust grasp planning considers the presence of pertur- ported by Transparent Dynamics between the motors and bation in the object properties such as shape and poses joints providing a robust control of contact forces. Mounted caused by noisy and partial sensor readings as well as control on the Barrett manipulator is the BH8-series Barrett Hand, a imprecision. To ensure successful and accurate grasp highly flexible eight-axis gripper with three fingers. In our planning, three key tasks are defined: (1) object detection, (2) work, the third finger is removed to simulate a parallel plate object centralization, and (3) grasp generation. gripper effect. For RGB-D image acquisition, the Microsoft Azure Kinect RGB-D sensor is mounted on the wrist to achieve dynamic sensing of the environment and objects. 2.2.1. Object Detection. Autonomous manipulation requires )e Azure Kinect integrates a Microsoft-designed 1- localization of the desired graspable objects in 3D space. )is Megapixel Time-of-Flight (ToF) depth camera using the information is essential for motion planning since it sets the image sensor presented at ISSCC 2018. )e depth sensor goal location for the manipulator end effector. For per- supports the following five modes: NFOV unbinned, ceiving objects in the vicinity of a robot, 2D vision data such NFOV 2 × 2 binned, WFOV 2 × 2 binned, WFOV unbinned, as edge information are used to localize the object. A 3 × 3 and passive IR. kernel horizontal and vertical edge enhancement is imple- mented. )e resultant image tends to be noisy which ne- cessitates the use of noise reduction where in our case we 2.1.2. Real-Time Controller and Communication. We set the employ erosion kernels. A 3 × 3 kernel is subsequently ap- Barrett WAM control loop frequency at 500 Hz. )at is, at plied to reduce noise and distortion in the image followed by each 2 ms interval, a signal is sent to the joint encoders a 3 × 3 dilation kernel to restore the previous edge scale. (puck) for the arm to function properly; otherwise, this Following this step, bounding boxes are generated around might lead to a heartbeat fault where the manipulator would the available objects each with a coupled ID. After specifying stop operating. It is important to note that along the ex- the desired Object ID, the generated Bounding Box is locked periment the internal PC of the Barrett WAM was not used. reducing any effect of the external environment noise and/or In addition, for the closed-loop control, current joint angle perturbation. Figure 2 illustrates the steps involved in our acquisition is necessary as well as the desired end effector object detection approach. pose. )is calls for a real-time controller that will ensure stable communication in a precise time window. For that purpose, MATLAB Simulink real-time is selected for robot 2.2.2. Object Centralization. )e main cause behind inac- manipulation. To ensure reliable communication between curate grasp generation is in the camera perspective. For the WAM and the MATLAB, a UDP communication is robust grasp generation, localizing the object is not sufficient established. In particular, a UDP-receive and UDP-send and uniform object locations in the field of view of the block is used. Each joint has a unique ID which helps in camera could be beneficial. Furthermore, most depth identifying the origin of the message sent as well as the cameras possess an optimal working range. However, when destination of the generated control signal. A customized dealing with unknown objects with different heights and UDP-to-CAN bus communication converter is imple- shapes, using a stationary camera would not be the best mented allowing for commands send by UDP from the solution. In our system, we install the camera at the end MATLAB controller to be converted to CAN and sent to the effector making the camera dynamic. As the end effector or WAM joints. camera moves towards an object from its initial position, a better shot can be captured depending on the object shape 2.1.3. Task Planner and Computer Vision. Once our real- and camera position. )is approach assists in reducing the errors resulting from multilevel processing and camera time controller is implemented, our manipulator becomes capable of reaching the desired end effector target. At this perspective. Visual servoing, also known as vision-based robot control, is implemented to center the object in the field stage, a task planner that generates the desired end poses based on the surrounding environment and the desired of view of the camera. Visual servoing [43] is a technique 4 Journal of Robotics Customized UDP to CAN bus communication converter Real Time Controller Task Planner (i) Path generation (i) Object detection (ii) Inverse Kinematics Kinect RGB-D (ii) Visual Servoing (iii) Gravity Compensation sensor (iii) RGB-D image processing (iv) PID Controller (iv) Grasp generation using CNN Barrett WAM (v) Motor and encoders transformations and Hand Figure 1: System architecture. Bilateral Filter Edge 3×3 Erosion 3×3 Dilation Dispaly Enhancement Figure 2: Object detection steps. postprocessing techniques and present our CNN for grasp that uses feedback information extracted from a vision sensor to control the motion of a robot [44]. )is technique generation. (see Table 1) minimizes the camera distortion effects and ensures that the object is fully contained in the field of view. )ree visual (1) Preprocessing. Irrespective of the available depth-sensing servoing techniques are discussed in the literature; see, e.g., techniques such as Stereo Reconstruction, Infrared, Time of [45], in particular, position-based visual servo (PBVS) and Flight, and LiDAR, the depth information is always bounded image-based visual servo (IBVS) and hybrid visual servo. In by a minimum and maximum range, beyond which the PBVS, vision data is used to reconstruct the 3D pose of the depth information is clipped. Adding to this, light reflection, manipulator and a kinematic error is generated and mapped refraction, diffusion, and shadows all contribute to lowering to the robot actuator commands. In IBVS, the position error the efficiency of the depth sensor and producing invalid is generated directly from the image plan features. Fur- spots in the depth image. To mitigate the effect of invalid data in our depth image that produces invalid grasps, thermore, there exist two architectures for visual servo in- tegration in the control loop. )e first architecture is known prepossessing of the depth map is crucial before further data manipulation. To do so, we propose a technique that detects as dynamic look and move, which relies on external input provided by the vision system to the joint closed-loop invalid pixels in the depth image and reconstructs the invalid control of the manipulator. )e second architecture is data. )e proposed preprocessing method is detailed in known as direct visual servo, which uses the vision system Algorithm 1 and illustrated in Figure 3. directly in the control loop. In our work, we use an image- based visual servo in a dynamic look and go configuration. (2) Generative Grasp Convolutional Neural Network. As )e bounding box location is used as our visual feedback and mentioned in the introduction section, the aim of our work the arm is kept moving until the calibrated camera center-line is to implement a full, simple, and effective grasp pipeline to is aligned with the object location, and until the distance from enable grasping of unknown objects that can be easily adopted in the industry. )us, one of the main factors to be the camera to the object’s uppermost plane is equal to an “optimal” depth value for grasp generation. Consequently, the considered is the speed and size of the network. Many acquired depth image is used for grasp generation. networks have been implemented in the field to enable accurate and precise grasp generation such as the network presented in [48]. For our pipeline, we use a fully con- 2.2.3. Deep-Learning-Based Grasp Generation. In order to volutional neural network for grasp generation that is generate efficient grasps for diverse geometric shapes, the proposed in [31], which is a generative grasp convolutional employed deep-learning approach is presented. To ensure neural network. )e main advantage behind this network firm grasping with an effective approach in capturing the relates to its small number of parameters. It consists of six geometric key-points of the objects, we use preprocessing convolutional layers (see Figure 4) with a total of 62,420 techniques targeting edge enhancement, and postprocessing parameters compared to 8,806,986 parameters in the net- techniques for feature matching. In what follows, we de- work proposed by [31]. )e relatively small number of scribe the steps employed in our preprocessing and parameters yields less computation efforts that can be easily Journal of Robotics 5 Table 1: What our work has to offer. A comparison between our approach and the related work in the field. Feature [46] [11] [30] [31] [47] Our approach Unknown objects No Yes No Yes Yes Yes Object feature matching NA NA NA No Yes Yes Eye-in-hand camera and object centering No No No Yes No Yes Grasp generation complexity DL + analytical DL 3-stage DL Analytical DL Training data availability No No NA Yes NA Yes Real robot experiments No Yes Yes Yes Yes Yes Multi-view grasp generation No No No No No Yes Robot control No No Yes No No Yes Real-time control without arm controller No No No No No Yes Depth image with (i) 2D gradient (iv) Flood filling Depth inpainting invalid (ii) Thresholding (iv) Dilation and smoothing spots (Black) (iii) Creating mask (OpenCV method) Figure 3: Depth image preprocessing steps. (1) Capture image (with likelihood of invalid depth data) (2) Generate 2D gradient of the depth map (3) )reshold the gradient to produce a binary mask (4) Flood-fill the mask. At this stage a mask that indicates all the invalid spots in the depth image is generated (5) Dilate generated mask with a 3 × 3 kernel (6) Feed the generated mask to OpenCV inpainting function that attempts to deduce and reconstructs the invalid data from the pixels’ value in its neighborhood ALGORITHM 1: Preprocessing algorithm. 300×300 inpainted depth image pi/2 0 θ -pi/2 300×300 pixels Figure 4: Description of the neural network architecture that is implemented in our study. )e generative grasp CNN takes as an input an inpainted depth image, and generates a grasp pose for every pixel, comprising the grasp quality, Q , grasp width, W , and grasp angle, φ . Θ Θ Θ conv 9×9 32 Filters, Stride 3 conv 5×5 16 Filters, Stride 2 conv 3×3 8 Filters, Stride 2 conv Transpose 3×3 8 Filters, Stride 2 conv Transpose 5×5 16 Filters, Stride 2 conv Transpose 9×9 32 Filters, Stride 3 6 Journal of Robotics handled by our employed network compared to other more eliminate all false-positives and would only keep the valid evolved CNN used for the same purpose. grasps for consideration. On the other hand, in most cases, the width of the line )ere are two main advantages of the proposed GG- CNN over other state-of-the-art grasp synthesis CNNs. estimated by the neural network does not match the exact First, rather than sampling grasp candidates, we directly width of the object’s feature. A discrepancy between the real create grasp poses on a pixel-by-pixel basis, similar to feature width and the manipulator opening width can lead to improvements in object detection, where fully con- undesirable results and failure of the grasping attempt. To volutional networks are frequently utilized to provide mitigate the effect of this mismatch, we propose a post- pixel-wise semantic segmentation rather than sliding processing method presented in Algorithm 2 and illustrated windows or bounding boxes [49]. Second, unlike other in Figure 5. CNNs used for grasp synthesis, our GG-CNN has orders Adding to the above, the neural network can detect of magnitude fewer parameters, allowing for fast closed- various potential grasps for a unique object. To eliminate loop grasping. Our grasp detection pipeline can be ex- further chances of errors and confusion, ten trials are specified. Similar grasps are aggregated based on the ecuted in only 19 ms on a GPU-equipped desktop computer. output of the trials. After querying the trial, the ma- )e neural network proposed approximates the complex nipulator considers the most recurring grasp as the final �����→ function M: I ⟶ G, by training the network. M(I) can be potential grasp that is ready for application. Once the learned by using a set of input images IT and corresponding ideal grasp is generated, there is no need to keep the output grasps GT. )e network equation can be expressed by object in the center of view. )e next challenge lies in applying the grasp by accurately controlling the robotic M (I) � Q ; φ ; W 􏼁. (1) manipulator. Θ Θ Θ Θ )e grasp map G estimates the parameters of a set of grasps, for each Cartesian point in the 3D space corre- 2.3. Path Planning sponding to each pixel in the captured image. It constitutes a 2.3.1. Inverse Kinematics. )e motion planning component set of 3 images denoted as, Q, φ, and W. )e representations are as follows: of the system is responsible for computing trajectories for the arm that will place the gripper fingers at or close to M (I) denotes the neural network with Θ being the weights of the network and I representing the 300 × 300 the generated grasp configuration. To complete this task, a mapping between the world and joint coordinates is pixel RGB-D input image. Q describes the grasp quality. Its scalar value ranges within [0,1] with 0 indicating the required. We approach this problem by using inverse ki- lowest quality while one the highest grasp quality. φ nematics. Inverse kinematics (IK) is a technique that maps denotes the grasp angle where the angles belong to the end effector position to the corresponding joint angles. [(−π/2), (π/2)]. W describes the grasp width of a grasp to Given a desired end effector pose, the IK solver generates the be executed. joint angles needed to move the arm to the desired pose )e network is trained over the Cornell Grasping Dataset avoiding collisions with itself. To compute the correct joint positions that leads to a certain six-dimensional Cartesian which contains 885 RGB-D images of everyday objects, with a variable number of human-labeled positive and negative pose, detailed knowledge of the geometry and configuration of the robot is essential. )e SolidWorks Computer Aided grasps. )e network is trained for 32 epochs over 8 hours on an NVIDIA Quadro P100 GPU. Design (CAD) model provided by Barrett offers detailed information about the joint configuration of the robot. To (3). Postprocessing. )e grasp representation generated by transfer this information into our controller, a Unified the GG-CNN output consists of a line position coupled with Robot Description (URDF) model is generated. )e URDF a specific width and angle. )e line position specifies where model takes advantage of the geometric data provided by to look for potential grasp in the depth image, while the SolidWorks and generates an XML file that contains detailed information associated with each link such as the joints angle provides us with the information about the rotation of the manipulator end effector with respect to its normal. After limits and link length. )e latter parameters are listed in Table 2. experimenting with the neural network performance, a considerable amount of false-positive potential grasps has To apply inverse kinematics (IK), the Inverse Kinematics block provided by Simulink is used. )e IK solver provided been observed. )is can be brought back to the noise in- troduced by the background of the object (i.e., uneven by MATLAB allows for accurate joint angle computation without the need for in-depth knowledge of the robot surface of the table that translates into a small variation in the depth image and triggers the GG-CNN to detect an geometry and joint configuration. It also eliminates the invalid grasp) or sensor readings. For this purpose, we need of setting Denavit–Hartenberg (DH) frames trans- implement a grasp validation condition that checks the formation necessary for generation of the robot inverse amount of depth variation in the potential grasp in order to kinematics model. )e IK block requires a stored con- figuration file and three main inputs. )e configuration increase the chances of successful grasps. In particular, if depth is below a certain threshold, the grasp generated is file allows for specific equation generation associated with the described robot geometry. )e generated URDF model disregarded and not accounted for. )is approach would Journal of Robotics 7 Unmatched width Matched width 1.0 0.5 0.0 –0.5 –1.0 0 20 40 60 80 100 120 Pixels Figure 5: Depth image postprocessing: matching the generated grasp width to the object feature. )e blue line represents the grasp generated by the GG-CNN. )e purple line represents the processed grasp generated matching the grasp line to the object width. (1) Plot the depth versus image pixels from the starting point of the line to its end. th (2) Apply a low pass filter to the data. In this paper, a 5 -order Butter-worth filter is selected chosen due to its flat profile across the bandwidth of interest (3) Take the gradient of the low pass filtered depth data. (4) Find all the local minima and maxima above a certain threshold. )is is essential due to the fact that some small irregularities in object’s surface can induce a local minima or maxima. (5) Select the first minimum and last maximum and compute the grasp width from the distance between the generated points. ALGORITHM 2: Postprocessing algorithm. Table 2: Barrett WAM joint limits and link lengths. Joint Lower limit (rad) Upper limit (rad) Link Length (mm) J1 −2.6 2.6 Base link 186 J2 −2.0 2.0 L1 550 J3 −2.8 2.8 L2 300 J4 −0.9 3.1 L3 60 J5 −4.76 1.24 J6 −1.6 1.6 J7 −3.0 3.0 Depth 8 Journal of Robotics configurations can lead to the same end effector pose. is used as a configuration file. )e proposed method makes the approach highly flexible. Changing the entire ma- Inconsistency of joint configurations or non-opti- mal path generation can be controversial. )e work nipulator geometry or even using another manipulator can be simply done by replacing the configuration file in [50] proposes an incremental inverse kinemat- stored in the IK block by the corresponding arm model, ics-based vision servo approach for robotic ma- eliminating the need for link parameter adjustment or axis nipulators to capture non-cooperative objects reconfiguration. autonomously. )e proposed approach generates )e first input specifies the current joint angle config- the joint angles iteratively at each time stamp based uration of the robot. )e second input allows us to control on the object location detected by a stationary the error tolerance of the end effector position. )at is, if the camera. )e approach assumes a fixed-camera- end effector is within the specified margin, the pose is frame to base-frame transformation and uses the actual robot pose as a starting point to generate considered as attained. )is input is presented with a list of six variables: 􏼈δx, δy, δz􏼉 represent the error tolerance of the appropriate joint angles. Inspired by this technique, we use the current pose of the robot manipulator as Cartesian translation components of the end effector, and 􏼈δΨ, δθ, δϕ􏼉 represent the error tolerance of the Euler ro- an initial guess in the provided IK block. )erefore, tation components of the end effector. )e Euler angle configurations closer to the actual robot state are system is a method to describe the coordinate transfor- favored over others. However, since the base joint mations. It consists of three independent variables, ψ θ, ϕ of the Barrett is limited to −2.6 and 2.6 radiant as defined as successive planar rotation angles around the x, y, shown in Table 2, choosing the closest joint config- and z axes, respectively. )e latter values describe the end uration as a starting point can result in longer path leading to non-optimal path generation. To resolve effector pose. High tolerance in the translational x-axis means that a relatively high error in the x translation this issue, we propose the use of the two inverse ki- nematics modules. One uses the current joints as component of the gripper is acceptable. Similarly, a relatively high tolerance in the rotational z-axis means that a high initial guess and the other has no initial guess. )e two outputs are fed into an error function. Based on the error in the z orientation component is acceptable. )is helps the IK solver in defining the most critical position output of this function, the configuration with smaller components and plan the path accordingly. error is used. )e third and final input deals with the initial guess. )e proposed approach is simple and comes with low Inverse kinematics algorithms do not provide one unique computational efforts. )e efficacy of our proposed so- joint configuration. Multiple joint angles, as well as multiple lution is justified experimentally. It also generates the paths, can lead to the same end-effector pose in the 3D space. joint angles within five milliseconds. )is approach Some IK solvers fail to converge sometimes with the eliminates the need for generating adequate initial guess presence of strict time limits and others require heavy or the use of optimization techniques. It is also important computation to plan one path. )e initial guess helps the IK to note that the generated arm motion is not limited to a solver in finding the best path by providing an ultimate planar motion and can follow any reachable path in the 3D starting point for the algorithm. )is approach reduces space. computation, makes the planner faster, and leads to a more accurate path. )e current joint angles are fed as an initial 2.3.2. Coordinate System. For robotic manipulation, we guess to inform the solver that joint angles closer to the define two coordinate systems depicted in Figure 6, namely, actual position are desired. )is approach helps with the A and B. We set A, the frame corresponding to the base link elimination of the singularities in the joint configuration that of the Barrett WAM, to be the base frame. We define B to be rotates the base by 360 degrees, for example, to reach a pose the camera frame placed at arm end effector. To obtain the while zero degrees can do the job. camera frame, the forward kinematics plugin provided by To make sure the joint configuration generated from MATLAB is used. )e camera fixation part is designed in inverse kinematics is reliable, we implement the following SolidWorks and added to the urdf model of the manipulator two-step approach: as a virtual link. )e latter is then used to generate the camera (1) In order to establish our operating space domain, joints frame using forward kinematics and end effector pose. Fol- limits are employed. In particular, before generating lowing this coordinate system, all end effector goal poses are the urdf file, safe joint limits are added to the CAD defined with respect to the manipulator base. Similarly, all design on SolidWorks and the unified robot descrip- detected poses and vision output are transformed back from tion model is generated accordingly. )e generated the camera frame to the base frame. )e joint coordinates are URDF is then stored inside the IK block. By following defined with respect to the manipulator home pose, which this approach, all unattainable singularities are auto- implies a fully extended arm in the positive z direction. matically dropped due to violation of the joint limit )erefore, a pose in the joint space is based on seven joint constrains where the space of singularities becomes angles describing the difference angle between the joint goal confined to the operating space of the robotic arm. and home pose. Mapping from a 6D pose defined in the global (2) It is widely known that Inverse kinematics does not base frame to seven joint values defined in the joint space is generate unique solutions and that multiple joint the main task of the inverse kinematics block. Journal of Robotics 9 Task space x  y  z X, Y, Z coordinates grasp width grasp Angle End effector Axis‐Angle representation Figure 6: Coordinate system and grasp representation. )e x-axis is represented in red, y-axis in green, and the z-axis in blue. Joint 1 Joint 2 Joint 3 Joint 4 Joint 5 Joint 6 Joint 7 Torques Figure 7: )e first seven subplots illustrate the tracking performance of the employed controller corresponding to the seven joints. Each of the seven plots includes the reference trajectory of the joint position versus its actual position. )e eighth subplot shows the relatively smooth torques that are associated with the seven joints. Figure 8: Modeling of finger closing path. 2.3.3. Safe Motion. Once the joint positions are calculated by exceeds TL2, a torque fault is raised and the arm stops. To the IK block and before feeding it to the position controller, avoid any velocity or torque fault, a velocity limiter block is certain conditions have to be applied to the velocity to secure added. )e specified block can slow down the velocity while safe control. In Barrett safety system, VL2 is defined as the keeping the control operation safe. )is step fortifies a safe safety system’s motor velocity fault limit while TL2 repre- operation but can introduce jerky movements. To ensure a sents the safety system’s motor torque fault level. Velocities smoothed path, a spline function is used. )is function higher than VL2 cause a velocity fault and stop the entire smooths out the generated path while securing zero initial arm motion in a E-stop effect. Similarly, if any joint torque velocity and acceleration. 10 Journal of Robotics 2.4. Controller. )e nonlinear dynamics model describing order filter at the PID inputs to reduce the measurement the continuous-time dynamics of a rigid n-link serial robot noise. manipulator, with all actuated revolute joints described in Before sending the torques to the arm, appropriate joint coordinates, can be represented by the Euler-Lagrange transformations are applied to transform the joint torques to model as follows motor torques and then to motor current. For that, we use the following: u � M(q)q € + C(q, q _)q _ + b(q _) + g(q), (2) M � VJ , (3) τ τ i i n n where q ∈ R represents joint angles, q _ ∈ R represents where i is the joint number and V is the joint to motor n×n joint velocities, M(q) ∈ R is a symmetric positive torque transformation matrix provided by Barrett and n×n definite inertia matrix, C(q, q _) ∈ R is the Coriolis and defined as centripetal matrix, b(q _) ∈ R represents the joint friction, n n g(q) ∈ R is the gravity vector, and u ∈ R stands for the ⎧ ⎪ ⎫ ⎪ ⎪ ⎪ − 0 0 0 0 0 0 ⎪ ⎪ ⎪ ⎪ torque input. ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ Once the desired path is generated, the required torques ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ needed to move the end effector to the desired pose must be ⎪ ⎪ ⎪ ⎪ ⎪ 1 n ⎪ ⎪ 3 ⎪ automatically generated. Many different discrete-time ⎪ ⎪ 0 − 0 0 0 0 ⎪ ⎪ ⎪ 2N 2N ⎪ multivariable controllers have been proposed that can be 2 2 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ applied to our problem such as for the deterministic setting ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ [51, 52] and the references therein. However, the inevitable ⎪ ⎪ 1 n ⎪ ⎪ ⎪ ⎪ noise associated with the measurement of the joint angles 0 − − 0 0 0 0 ⎪ ⎪ ⎪ ⎪ ⎪ 2N 2N ⎪ 2 2 ⎪ ⎪ yields poor performance and possible vibrations in the arm. ⎪ ⎪ ⎪ ⎪ )ere are also multivariable iterative-type controllers with ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ the capability of suppressing measurement noise such as the ⎨ ⎬ 0 0 0 − 0 0 0 , (4) ones proposed in [29, 53, 54]; however, accurate tracking ⎪ ⎪ ⎪ N ⎪ 4 ⎪ ⎪ ⎪ may require several iterations, and the learning would be ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ based on a specific trajectory. A common class of closed- ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ loop controllers employs slide-mode approach; see, e.g., ⎪ ⎪ 1 n ⎪ 6 0 0 0 0 − 0 ⎪ ⎪ [55, 56] and the references therein. )ere are also several ⎪ ⎪ ⎪ 2N 2N ⎪ ⎪ 5 5 ⎪ ⎪ ⎪ optimal stochastic simultaneous multivariable controllers ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ such a PD-type [57], and ones with gravity compensation ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ 1 n [58], PID-type [59–61], and even PDD [62]. Although the ⎪ ⎪ ⎪ ⎪ 0 0 0 0 0 ⎪ ⎪ ⎪ ⎪ works in [55–57, 62] include experimental justification on ⎪ 2N 2N ⎪ 5 5 ⎪ ⎪ ⎪ ⎪ the Barrett WAM, they do not deal with controlling its ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ gripper. All of the aforementioned strategies and many ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ others come with a number of unique features; however, ⎪ 0 0 0 0 0 0 − ⎭ they require deep comprehension of control theory and/or some knowledge of the statistical models. In this paper, we and N and n are constants provided by Barrett. i i use simple single-input single-output PID controller that Following this step, we multiply the output by the can be designed by any novice engineer. In particular, we motor-toque-to-motor-current ratio provided by Barrett tackle the torque control problem by introducing closed- and finally send the signal to the arm pucks for motion. loop PID control with gravity compensation. Gravity )e tracking performance of the seven joints and corre- compensation is a well-known technique in robot design, sponding torques are shown in Figure 7. It is important to which compensates for the force generated by the gravity. note that although the Barrett WAM comes with a force- )is technique requires the knowledge of the gravity model. torque sensor at the wrist, we do not use the sensor in our We use the corresponding model parameters provided by controller. Barrett. To account for gravity forces, we set B(q) to be the inverse of the inertia matrix, which is a symmetric and − 1 nonsingular. )at is, B(q) � M (q). )us, the gravity compensation term A(q) is the product of the inverse inertia 2.4.1. Gripper Control. For the control of the Barrett WAM matrix and the gravity force vector, gripper, no closed-loop control is needed to be implemented − 1 A(q) � −M (q)g(q) � −B(q)g(q). )is term is added to and no parameters are needed to be set or tuned. )e gripper the control vector. )e gravity compensation is imple- simply accepts finger commands in a specific CAN format mented in a MATLAB Simulink block and added to the provided by Barrett that allows to move the gripper fingers to controller. For closed-loop control, seven independent PID the desired pose as well as open and close commands for controllers are implemented and manually tuned for each grasping and disposing objects. Since the closing profile of joint. To tune the PID, a sine wave is used as a fictitious the gripper is curved rather than being a straight line as reference trajectory of the joints, and the controller pa- shown in Figure 8, modeling of the gripper closing path rameters are adjusted accordingly. We also deploy a first- needs to be done in order to link the opening width to the Journal of Robotics 11 Table 3: Grasp performance. Item Box Tape Piller Tube Eraser Open box Number of trials 20 25 15 10 20 5 Success rate (%) 90 92 93 80 95 100 Figure 9: Grasp generation results: comparison between grasp generated by the GG-CNN with and without RGB-D image preprocessing for shiny and black objects. )e upper 2 rows show the generated grasp output without preprocessing of the RGB-D data. Grasps generated are invalid and the depth info is corrupted. )e lower 2 rows show the generated grasp output after preprocessing. )e depth data was reconstructed leading to valid grasps. boxes, tape, tubes, and other objects. )e proposed approach finger height and avoid any collision with the table. Fol- lowing this step, we use the postprocessed width generated is shown to be efficient with high-success rates as docu- in Subsection B as a pre-grasping finger opening. )en, only mented in Table 3 even when the objects are shiny and when the gripper reaches the ideal grasping pose, is the associated with a defective depth map. )e used technique closing signal sent. Since the Barrett gripper is equipped with helps in the depth map reconstruction leading to accurate a force-torque sensor, determining the sufficient force to grasp generations. Figure 9 shows the neural network output hold the object in question is not needed. )e gripper fingers grasp before and after applying the pre- and postprocessing stop automatically whenever enough force is applied to techniques. stabilize the object in question. As for the multiple view grasp generation, Figure 9 il- lustrates the results of the aforementioned techniques in filtering disturbances from the external environment. )e 3. Experimental Results grasp generation is successful even on transparent or light reflecting objects. )is technique enables capturing the full To assess the performance of our proposed system, a series of object feature in a simple and effective way without the need experiments are conducted. Multiple unknown objects with for 3D reconstruction or point cloud generation which could different geometric representations are used to test the be challenging in noisy industrial environments. system performance. In our evaluation, we focus on 2 cri- teria: the grasp generation and the control of the manipu- lator. In the first series of experiments, we evaluate the 3.2. Overall Performance. Coupled with a reliable control of performance of our approach for grasp generation. In the the Barrett manipulator, successful grasp generation led to second series, the grasp generation is integrated into the successful grasps of multiple objects regardless of their complete grasping pipeline to assess the torque control and geometric shapes or orientation. Visual servoing perfor- path planning. mance is proven efficient in limiting grasp application error resulting from camera perspective. )e implemented algo- 3.1. Grasp Generation. )e efficiency of the approach fol- rithm also centers the object of interest in the field of view lowed for grasp generation is assessed using various pre- within the desired time frame. Our controller yields smooth, viously unknown objects with complex geometric relatively fast path execution driving the end effector to the representations. )e objects vary between multiple me- desired pose with millimeters accuracy, and matching of the chanical tools, transparent and nontransparent bottles, end effector contact point with the generated grasp, which With pre-processing Without pre-processing Depth RGB image Depth RGB image image with grasp image with grasp 12 Journal of Robotics gripper types. In addition, since convolutional neural made grasping of complex shapes possible. Table 3 shows the success rate of our proposed automated grasping approach. network limits the grasp to the normal of the plane of view, having an eye-in-hand camera allowed for multi- A video describing the overall project performance can be displayed by visiting https://www.youtube.com/watch? view grasp generation. )is technique enables the gen- v�gL5oefp_QKoVideo. eration of grasps in multiple plans covering multiple features and leading to a better grasp generation for complex shapes. 4. Conclusion and Future Work )e multi-view grasp generation was implemented We presentenced a data-driven deep-learning approach to from the vision and grasp generation perspective; how- robotic grasping of unknown objects with eye-in-hand ever, our solution has not employed kinematics. Tech- camera and real-time controller. Our paper has tackled the niques using kinematics would allow approaching the object from various angles, which could significantly autonomous grasping problem from start to finish covering WAM real-time control, data manipulation, grasp genera- increase the grasping accuracy and the object immobility. Furthermore, dynamic collision avoidance implementa- tion, and execution. Our proposed approach is suitable for the exploration of unknown scenes and generation of ef- tion could be of great benefit. Implementing such tech- fective grasps for various geometric shapes and unknown niques might result in a more robust system suitable for objects without the need for complex hand tracking algo- human-robot application by increasing the safety level rithm, image segmentation, or 3D reconstruction. )anks to and flexibility of the system. the unique well-defined decision-making tree and the in- tegration of different techniques, we were able to eliminate Data Availability the error propagation resulting from multilevel processing, camera perspective. and non-ideal depth range. Starting )e justification of this work is experimental and no specific with visual servoing, we centered the object in question in data are available. However, we have a link to a video in the the field of view reducing perceptual error and eliminating manuscript that illustrates the overall performance that can error propagation. Following this step, we implemented be displayed by visiting https://www.youtube.com/watch? high level prepossessing computer vision-based tech- v=gL5oefp_QKo. niques coupled with edge enhancements and depth data reconstruction method to account for shiny surfaces and Conflicts of Interest light variation making the object localization, grasp generation, and execution more robust. For the grasp No potential conflicts of interest are reported by the authors. generation problem, a generative grasp convolutional neural network was implemented providing efficient grasp References configuration for parallel grippers. We topped this by a postprocessing layer that matched the generated width [1] M. Lopes and J. Santos-Victor, “Visual learning by imitation with the object feature gradients. To the best of our with motor representations,” IEEE Transactions on Systems, knowledge, the latter is a functional attribute that has not Man, and Cybernetics. Part B, Cybernetics: a Publication of the been considered in the literature. IEEE Systems, Man, and Cybernetics Society, vol. 35, no. 3, From the robotics and control point of view, we tackled pp. 438–449, 2005. [2] C. Follini, V. Magnago, K. Freitag et al., “Bim-integrated the control of the arm in a simple and effective approach. In collaborative robotics for application in building construction particular, we proposed a simple real-time control imple- and maintenance,” Robotics, vol. 10, no. 1, p. 2, 2021. mentation of the Barrett WAM manipulator using MATLAB [3] B. Abouzeid, G. El Hasbani, J. Abouzeid, S. Halleit, and Simulink real-time module. Unlike the majority of the work K. Jallad, “Cost comparison of gynecologic procedures be- implemented on MATLAB for robotic manipulators, tween the us and a developing country: an observational MATLAB was not only used for simulation or as a tool that study,” Journal of Robotic Surgery, vol. 16, pp. 1–5, 2021. communicates with the arm controller such as the work in [4] E. Fakhoury, P. Culmer, and B. Henson, “)e impact of visual [63, 64]. In fact, we used MATLAB Simulink for the full arm cues on haptic compliance discrimination using a pseudo- control whereas the arm internal PC was left undeployed. By haptic robotic system,” in Proceedings of the 2017 IEEE In- applying our approach to a class of robotic manipulators, ternational Conference on Robotics and Biomimetics (ROBIO), one can eliminate the need of the controller box that takes pp. 1719–1725, IEEE, Macau, Macao, December 2017. [5] S. Caldera, A. Rassau, and D. Chai, “Review of deep learning additional space and increases the cost of the arm. )is could methods in robotic grasp detection,” Multimodal Technologies be substantially beneficial whenever a company desires to and Interaction, vol. 2, no. 3, p. 57, 2018. adapt a robotic solution in limited spaces with lower cost. [6] G. Kootstra, M. Popovic, ´ J. A. Jørgensen et al., “Enabling Furthermore, the proposed approach is not only cost grasping of unknown objects through a synergistic use of edge effective and leads to reduction in space; it is also simple and surface information,” Ae International Journal of Ro- and modular. In our work, the deployed gripper is botics Research, vol. 31, no. 10, pp. 1190–1213, 2012. equipped with a force toque sensor that ensured firm [7] F. T. Pokorny and D. Kragic, “Classical grasp quality eval- grasps. For future work, a force optimization layer as uation: new algorithms and theory,” in Proceedings of the 2013 proposed in [65] can be added to our system to make the IEEE/RSJ International Conference on Intelligent Robots and approach more modular and functional with different Systems, pp. 3493–3500, IEEE, Tokyo, Japan, November 2013. Journal of Robotics 13 [8] A. Rodriguez, M. T. Mason, and S. Ferry, “From caging to scheduling,” Journal of Management in Engineering, vol. 37, grasping,” Ae International Journal of Robotics Research, no. 1, Article ID 04020104, 2021. [23] S. AbdulRahman, H. Tout, A. Mourad, and C. Talhi, vol. 31, no. 7, pp. 886–900, 2012. [9] P. Bender and G. M. Bone, “Automated grasp planning and “FedMCCS: multicriteria client selection model for optimal execution for real-world objects using computer vision and iot federated learning,” IEEE Internet of Aings Journal, vol. 8, tactile probing,” International Journal of Robotics and Au- no. 6, pp. 4723–4735, 2020. [24] B. Tay and A. Mourad, “Intelligent performance-aware ad- tomation, vol. 19, no. 1, pp. 15–27, 2004. [10] A. Morales, P. J. Sanz, A. P. Del Pobil, and A. H. Fagg, “Vision- aptation of control policies for optimizing banking teller based three-finger grasp synthesis constrained by hand ge- process using machine learning,” IEEE Access, vol. 8, ometry,” Robotics and Autonomous Systems, vol. 54, no. 6, pp. 153403–153412, 2020. pp. 496–512, 2006. [25] H. Tout, N. Kara, C. Talhi, and A. Mourad, “Proactive ma- [11] P. Schmidt, N. Vahrenkamp, M. Wachter, ¨ and T. Asfour, chine learning-based solution for advanced manageability of “Grasping of unknown objects using deep convolutional multi-persona mobile computing,” Computers & Electrical neural networks based on depth images,” in Proceedings of the Engineering, vol. 80, Article ID 106497, 2019. 2018 IEEE International Conference on Robotics and Auto- [26] B. M. Haddad, S. F. Dodge, L. J. Karam, N. S. Patel, and mation (ICRA), pp. 6831–6838, IEEE, Brisbane, QLD, Aus- M. W. Braun, “Locally adaptive statistical background tralia, May 2018. modeling with deep learning-based false positive rejection for [12] G. Konidaris, S. Kuindersma, R. Grupen, and A. Barto, “Robot defect detection in semiconductor units,” IEEE Transactions learning from demonstration by constructing skill trees,” Ae on Semiconductor Manufacturing, vol. 33, no. 3, pp. 357–372, International Journal of Robotics Research, vol. 31, no. 3, 2020. pp. 360–375, 2012. [27] A. Almeshal and M. Alenezi, “A vision-based neural network [13] H. Abdellatef, M. Khalil-Hani, N. Shaikh-Husin, and controller for the autonomous landing of a quadrotor on S. O. Ayat, “Accurate and compact convolutional neural moving targets,” Robotics, vol. 7, no. 4, p. 71, 2018. network based on stochastic computing,” Neurocomputing, [28] A. Helwan, M. K. S. Ma’aitah, H. Hamdan, D. U. Ozsahin, and vol. 471, pp. 31–47, 2022. O. Tuncyurek, “Radiologists versus deep convolutional neural [14] A. Nader and D. Azar, “Searching for activation functions networks: a comparative study for diagnosing COVID-19,” using a self-adaptive evolutionary algorithm,” in Proceedings Computational and Mathematical Methods in Medicine, of the 2020 Genetic and Evolutionary Computation Conference vol. 2021, Article ID 5527271, 9 pages, 2021. Companion, pp. 145-146, Cancun, ´ Mexico, July 2020. [29] D. Shen, N. Huo, and S. S. Saab, “A probabilistically quantized [15] M. Orabi, J. Khalife, A. A. Abdallah, Z. M. Kassas, and learning control framework for networked linear systems,” S. S. Saab, “A machine learning approach for gps code phase IEEE Transactions on Neural Networks and Learning Systems, estimation in multipath environments,” in Proceedings of the vol. PP, 2021. 2020 IEEE/ION Position, Location and Navigation Symposium [30] J. A. Bagnell, F. Cavalcanti, L. Cui et al., “An integrated system (PLANS), pp. 1224–1229, IEEE, Portland, OR, USA, April for autonomous robotics manipulation,” in Proceedings of the 2020. 2012 IEEE/RSJ International Conference on Intelligent Robots [16] P. Saade, R. El Jammal, S. El Hayek, J. Abi Zeid, O. Falou, and and Systems, pp. 2955–2962, IEEE, Vilamoura-Algarve, D. Azar, “Computer-aided detection of white blood cells using Portugal, October 2012. geometric features and color,” in Proceedings of the 2018 9th [31] D. Morrison, P. Corke, and J. Leitner, “Closing the loop for Cairo International Biomedical Engineering Conference robotic grasping: a real-time, generative grasp synthesis ap- (CIBEC), pp. 142–145, IEEE, Cairo, Egypt, December 2018. proach,” 2018, https://arxiv.org/abs/1804.05172. [17] D. Azar, R. Moussa, and G. Jreij, “A comparative study of nine [32] R. K. Megalingam, R. V. Rohith Raj, T. Akhil, A. Masetti, machine learning techniques used for the prediction of dis- G. N. Chowdary, and V. S. Naick, “Integration of vision based eases,” International Journal of Artificial Intelligence, vol. 16, robot manipulation using ROS for assistive applications,” in no. 2, pp. 25–40, 2018. Proceedings of the 2020 Second International Conference on [18] S. Zhang, Y. Wang, J. Jiang, J. Dong, W. Yi, and W. Hou, Inventive Research in Computing Applications (ICIRCA), “CNN-based medical ultrasound image quality assessment,” pp. 163–169, Coimbatore, India, July 2020. Complexity, vol. 2021, Article ID 9938367, 9 pages, 2021. [33] M. Colledanchise and P. Ogren, Behavior Trees in Robotics [19] T. Borkar, F. Heide, and L. Karam, “Defending against uni- and AI: An Introduction, CRC Press, Boca Raton, FL, USA, versal attacks through selective feature regeneration,” in Proceedings of the 2020 IEEE/CVF Conference on Computer [34] Y. Hu, X. Wu, P. Geng, and Z. Li, “Evolution strategies Vision and Pattern Recognition (CVPR), pp. 709–719, June learning with variable impedance control for grasping under uncertainty,” IEEE Transactions on Industrial Electronics, [20] O. A. Wahab, J. Bentahar, H. Otrok, and A. Mourad, “Re- vol. 66, no. 10, pp. 7788–7799, 2018. source-aware detection and defense system against multi-type [35] A. Koubaa, ˆ Robot Operating System (ROS), Springer, Berlin, attacks in the cloud: repeated Bayesian Stackelberg game,” Germany, 2017. IEEE Transactions on Dependable and Secure Computing, [36] M. Quigley, K. Conley, B. Gerkey et al., “ROS: an open-source vol. 18, no. 2, pp. 605–622, 2019. robot operating system,” in Proceedings of the ICRA Workshop [21] O. A. Wahab, R. Cohen, J. Bentahar, H. Otrok, A. Mourad, on Open Source Software, Kobe, Japan, January 2009. and G. Rjoub, “An endorsement-based trust bootstrapping [37] H. Wei, Z. Shao, Z. Huang et al., “RT-ROS: a real-time ROS approach for newcomer cloud services,” Information Sciences, architecture on multi-core processors,” Future Generation vol. 527, pp. 159–175, 2020. Computer Systems, vol. 56, pp. 171–178, 2016. [22] M. Awada, F. J. Srour, and I. M. Srour, “Data-driven machine [38] P. Zhang, Advanced Industrial Control Technology, William learning approach to integrate field submittals in project Andrew, Norwich, NY, USA, 2010. 14 Journal of Robotics [39] C. Acosta Lua, ´ C. C. Vaca Garc´ıa, S. Di Gennaro, B. Castillo- Electronics & Mobile Communication Conference (UEMCON), Toledo, and M. E. Sanchez ´ Morales, “Real-time hovering pp. 716–720, IEEE, New York, NY, USA, November 2018. [56] R. H. Jaafar and S. S. Saab, “Approximate differentiator with control of unmanned aerial vehicles,” Mathematical Problems varying bandwidth for control tracking applications,” IEEE in Engineering, vol. 2020, Article ID 2314356, 8 pages, 2020. Control Systems Letters, vol. 5, no. 5, pp. 1585–1590, 2020. [40] Y. Zeng, J. Sheng, and M. Li, “Adaptive real-time energy [57] S. S. Saab Jr., M. Hauser, A. Ray, and S. S. Saab, “Multivariable management strategy for plug-in hybrid electric vehicle based nonadaptive controller design,” IEEE Transactions on In- on simplified-ECMS and a novel driving pattern recognition dustrial Electronics, vol. 68, no. 7, pp. 6181–6191, 2020. method,” Mathematical Problems in Engineering, vol. 2018, [58] S. S. Saab and P. Ghanem, “A multivariable stochastic tracking Article ID 5816861, 12 pages, 2018. controller for robot manipulators without joint velocities,” [41] C. Eppner, S. Hofer, ¨ R. Jonschkowski et al., “Lessons from the IEEE Transactions on Automatic Control, vol. 63, no. 8, amazon picking challenge: four aspects of building robotic pp. 2481–2495, 2017. systems,” in Robotics: Science and Systems, 2016. [59] S. S. Saab, “An optimal stochastic multivariable PID con- [42] N. Correll, K. E. Bekris, D. Berenson et al., “Analysis and troller: a direct output tracking approach,” International observations from the first amazon picking challenge,” IEEE Journal of Control, vol. 92, no. 3, pp. 623–641, 2019. Transactions on Automation Science and Engineering, vol. 15, [60] S. S. Saab, “Development of multivariable PID controller gains no. 1, pp. 172–188, 2016. in presence of measurement noise,” International Journal of [43] F. Chaumette, S. Hutchinson, and P. Corke, “Visual servoing,” Control, vol. 90, no. 12, pp. 2692–2710, 2017. in Springer Handbook of RoboticsSpringer, Berlin, Germany, [61] S. S. Saab, “A stochastic PID controller for a class of MIMO systems,” International Journal of Control, vol. 90, no. 3, [44] S. Hutchinson, G. D. Hager, and P. I. Corke, “A tutorial on pp. 447–462, 2017. visual servo control,” IEEE Transactions on Robotics and [62] S. S. Saab and R. H. Jaafar, “A proportional-derivative-double Automation, vol. 12, no. 5, pp. 651–670, 1996. derivative controller for robot manipulators,” International [45] G. Palmieri, M. Palpacelli, M. Battistelli, and M. Callegari, “A Journal of Control, vol. 94, no. 5, pp. 1273–1285, 2021. comparison between position-based and image-based dy- [63] R. R. Ardeshiri, M. H. Khooban, A. Noshadi, N. Vafamand, namic visual servoings in the control of a translating parallel and M. Rakhshan, “Robotic manipulator control based on an manipulator,” Journal of Robotics, vol. 2012, Article ID optimal fractional-order fuzzy PID approach: sil real-time 103954, 11 pages, 2012. simulation,” Soft Computing, vol. 24, no. 5, pp. 3849–3860, [46] J. Varley, J. Weisz, J. Weiss, and P. Allen, “Generating multi- fingered robotic grasps via deep learning,” in Proceedings of [64] J. Till, C. E. Bryson, S. Chung, A. Orekhov, and D. C. Rucker, the 2015 IEEE/RSJ International Conference on Intelligent “Efficient computation of multiple coupled cosserat rod Robots and Systems (IROS), pp. 4415–4420, IEEE, Hamburg, models for real-time simulation and control of parallel Germany, September 2015. continuum manipulators,” in Proceedings of the 2015 IEEE [47] B. Sauvet, F. Levesque, S. Park, P. Cardou, and C. Gosselin, International Conference on Robotics and Automation (ICRA), “Model-based grasping of unknown objects from a random pp. 5067–5074, IEEE, Seattle, WA, USA, May 2015. pile,” Robotics, vol. 8, no. 3, p. 79, 2019. [65] Y. Hu, Z. Li, G. Li, P. Yuan, C. Yang, and R. Song, “Devel- [48] S. Qiu, D. Lodder, and F. Du, “HGG-CNN: the generation of opment of sensory-motor fusion-based manipulation and the optimal robotic grasp pose based on vision,” Intelligent grasping control for a robotic hand-eye system,” IEEE Automation and Soft Computing, vol. 26, no. 6, pp. 1517–1529, Transactions on Systems, Man, and Cybernetics: Systems, 2020. vol. 47, no. 7, pp. 1169–1180, 2016. [49] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440, Boston, MA, USA, April 2015. [50] G. Dong and Z. H. Zhu, “Incremental inverse kinematics based vision servo for autonomous robotic capture of non- cooperative space debris,” Advances in Space Research, vol. 57, no. 7, pp. 1508–1514, 2016. [51] S. S. Saab and R. Toukhtarian, “A MIMO sampling-rate- dependent controller,” IEEE Transactions on Industrial Electronics, vol. 62, no. 6, pp. 3662–3671, 2014. [52] A. Loria, “Observers are unnecessary for output-feedback control of Lagrangian systems,” IEEE Transactions on Auto- matic Control, vol. 61, no. 4, pp. 905–920, 2015. [53] D. Shen and S. S. Saab, “Noisy output based direct learning tracking control with Markov Nonuniform trial lengths using adaptive gains,” IEEE Transactions on Automatic Control, [54] S. S. Saab, D. Shen, M. Orabi, D. Kors, and R. Jaafar, “Iterative learning control: practical implementation and automation,” IEEE Transactions on Industrial Electronics, vol. 69, 2021. [55] R. H. Jaafar and S. S. Saab, “Sliding mode control with dirty derivatives filter for rigid robot manipulators,” in Proceedings of the 2018 9th IEEE Annual Ubiquitous Computing, http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Journal of Robotics Hindawi Publishing Corporation

Autonomous Robotic Manipulation: Real-Time, Deep-Learning Approach for Grasping of Unknown Objects

Loading next page...
 
/lp/hindawi-publishing-corporation/autonomous-robotic-manipulation-real-time-deep-learning-approach-for-VVSQKfkRxO

References (65)

Publisher
Hindawi Publishing Corporation
Copyright
Copyright © 2022 Malak H. Sayour et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ISSN
1687-9600
eISSN
1687-9619
DOI
10.1155/2022/2585656
Publisher site
See Article on Publisher Site

Abstract

Hindawi Journal of Robotics Volume 2022, Article ID 2585656, 14 pages https://doi.org/10.1155/2022/2585656 Research Article Autonomous Robotic Manipulation: Real-Time, Deep-Learning Approach for Grasping of Unknown Objects 1 2 1 Malak H. Sayour , Sharbel E. Kozhaya , and Samer S. Saab Department of Electrical Mechatronics and Computer Engineering, Lebanese American University, Beirut, Lebanon Henry Samueli School of Engineering, University of California, Irvine, USA Correspondence should be addressed to Samer S. Saab; ssaab@lau.edu.lb Received 24 January 2022; Revised 15 April 2022; Accepted 17 May 2022; Published 30 June 2022 Academic Editor: Weitian Wang Copyright © 2022 Malak H. Sayour et al. �is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Recent advancement in vision-based robotics and deep-learning techniques has enabled the use of intelligent systems in a wider range of applications requiring object manipulation. Finding a robust solution for object grasping and autonomous manipulation became the focus of many engineers and is still one of the most demanding problems in modern robotics. �is paper presents a full grasping pipeline proposing a real-time data-driven deep-learning approach for robotic grasping of unknown objects using MATLAB and convolutional neural networks. �e proposed approach employs RGB-D image data acquired from an eye-in-hand camera centering the object of interest in the Šeld of view using visual servoing. Our approach aims at reducing propagation errors and eliminating the need for complex hand tracking algorithm, image segmentation, or 3D reconstruction. �e proposed approach is able to e‘ciently generate reliable multi-view object grasps regardless of the geometric complexity and physical properties of the object in question. �e proposed system architecture enables simple and e’ective path generation and a real-time tracking control. In addition, our system is modular, reliable, and accurate in both end e’ector path generation and control. We experimentally justify the e‘cacy and e’ectiveness of our overall system on the Barrett Whole Arm Manipulator. designing hand-engineered features for the detection of 1. Introduction grasps. �e performance is then evaluated according to Recent advancements in robotics and autonomous systems physical models [6] such as the ability to resist external have resulted in an increase in autonomous capabilities and wrenches [7] or the ability to constrain the object’s motion the use of more intelligent machines in a wider range of [6, 8]. Such hand-engineered function could reach a very applications [1–4]. Gripping various geometric objects is a complex state which implies lots of generalization and key competency for robots to achieve more general-purpose simpliŠcation of the kinetic and physical problem. �e work functionality [5]. �e perceptual abilities of such general- in [9] describes an automated grasping system for 2.5D purpose robots might be used to visually recognize robotic objects. �e approach relies on capturing the object surface end e’ector conŠgurations to safely grab and lift an object curvature with a focus on concave and convex edges to without slipping [5]. Multiple techniques have been pro- achieve 2D immobilization using numerical computation posed to generate e’ective grasps for everyday robotic ap- and hand-engineered equations. �e optimal grasp is then plications. Nevertheless, robotic grasping still provides many generated and extended to 2.5D by performing tactile challenges for researchers and is still among the most de- probing. �e work in [10] proposes a grasping algorithm for manding problems in modern robotics. 3 Šnger grippers. �eir grasp synthesis approach relies on �e various approaches of robotic grasping can be di- object curvature by mean of a curvature function. �e vided into two categories, analytical and data-driven [5]. generated grasps are then Šltered to achieve maximum Most of the e’ort of traditional analytical approaches lies in stability using multiple kinematic criteria such as the force- 2 Journal of Robotics appealing for heavy industrial applications [37]. Real-time closure criterion and finger adaptation criterion. Analytical methods may not be the ideal approach for the exploration control is a critical and essential feature for operation of autonomous industrial manipulators [38]. Multiple of previously unknown scenes since they restrict the system to situations predicted by the programmer [5]. In addition, techniques have emerged to enable real-time control of such methods tend to heavily rely on the geometric features autonomous systems [39, 40]. To enable flexible and easy of a specific object, which might be corrupted with partial control of industrial manipulators, suppliers like universal and noisy information from image and depth sensors [11]. robot or Barrett often include external controllers or On the other hand, empirical methods provide an increased internal PCs. )ose external controllers however increase cognitive and adaptive capacity to the robots, while elimi- the cost of robot manipulators and make it even harder for companies to adopt autonomous solution for their desired nating the need to manually model a robotic solution [12]. Some techniques associate good grasps points with an offline applications. Implementing easy flexible and modular real-time controllers could help engineers build real-time database [11] reducing the challenge to object recognition and pose estimation during online execution. However, this solutions and decrease the cost autonomous systems. Industrial companies such as Ocado technologies, BMW approach is unsuitable for the exploration of unknown scenes and is unable to generalize to new objects. group, and Amazon are looking for finding simple and Deep leaning techniques were proven efficient in effective solutions that can lead to accurate grasp generation solving multiple automation problems [13–15] in a variety of unknown objects with low computation efforts [41, 42]. of fields including medicine [16–18], security [19, 20], )e latter is considered as the primary motivation behind management [21–24], and Internet of things [25]. With our work. We propose a simple, effective, and complete the proliferation of vision-based deep-learning techniques grasp pipeline for practical applications, a pipeline that is easy to implement by novice engineers, easy to handle by [26–29], significant advancements have been made in the generation of potential grasps. Multiple deep-learning average computers, and performs well in noisy industrial environment. We present a full real-time grasping pipeline approaches have been studied over the years; see, e.g., [5] and the references therein. Some of the methods rely on with real-time MATLAB Simulink controller and data- driven deep-learning approach for robotic grasping of un- deploying static cameras for object detection and pose estimation [30], and the other class makes use of eye-in- known objects using generative grasps Deep Convolutional hand cameras mounted on the robot manipulator [31]. Neural Networks (CNN). Although static cameras can achieve efficient task per- We make the following contributions: formance, they require complex hand tracking algo- (i) Computer Vision: the proposed computer vision rithms, 3D reconstruction, and image segmentation for approach integrates multiple simple and effective accurate estimation of object pose and orientation [30]. As techniques for grasp generation. In addition, relying for the control and decision-making problem, multiple on eye-in-hand camera and visual servoing, we are architectures have been proposed to achieve autonomous able to reduce the error propagation while elimi- manipulation [30–32]. )e work in [30] proposes the use nating the need of complex hand tracking algo- of a logical structuring of tasks driven by a behavior tree rithms. Detecting and centralizing the object in architecture. )is allows a reduction in execution time question helped in capturing the object features and while integrating advanced algorithms for autonomous led to more accurate grasps. manipulation. However, behavior trees are found to be (ii) Real-Time control: this paper proposes a real-time computationally heavy due to the involved ticking pro- controller using MATLAB Simulink real time for cedure [33]. For complex movement tasks, the reader is autonomous manipulation. )e proposed real-time referred to the work in [34]. )is study proposes a co- controller eliminates the need of internal PC or the variance matrix adaptation-evolution strategies to ensure need of external controllers. Our proposed ap- the safety and robustness of the robot’s movements in a proach involves a simple block diagram, highly noisy environment. )e approach takes advantage of the flexible and more intuitive for the user. dynamic movement primitives method to represent the robot’s reference trajectories in joint space that are used (iii) Flexible grasp generation: since CNN limits the for trajectory tracking. A vision system is employed to get grasp generation to the normal of the plane of view, the coordinates of the target object. )e effectiveness of we propose a flexible multi-view grasp generation the entire system is tested on a 4-DOF Barrett WAM while eliminating such limitations. )e generated robotic arm. )e Robot Operating System (ROS) on the grasp is therefore not restricted to a single plane but other hand is also widely used for both the control rather multiple angles of view capturing more problem and interprocess communication [12]. ROS is an features. open-source robotic middleware platform for large-scale )e structure of the paper is as follows. Section 2 robotic system development. )e work in [35] provides a presents the proposed system as well as an overview of the structured communication layer on top of the host op- methods used for autonomous grasp generation and ma- erating system for heterogeneous cluster computing [36]. nipulation. Section 3 includes our experimental results. While ROS is proven efficient in robotics applications, Finally, the paper comes to a conclusion and motivates real-time capabilities remain a serious issue [37]. ROS further challenges. does not support real-time applications and may not be Journal of Robotics 3 graspable object is needed. Coded in Python, we implement 2. Methodology a task planner that uses PyTorch and OpenCV to detect and In this section, we describe each part of our system archi- locate the object; we apply visual servoing and generate firm tecture illustrated in Figure 1. In addition, we present our grasps based on RGB-D images of the object in question. proposed grasp planning system that deals with object de- More details about the planner are provided in the following tection, object centralization, and grasp generation algo- section. Once the desired end effector is generated, the rithm; path planning that generates the desired trajectories output is sent to the real-time controller for execution. of the robot joint positions; and the associated tracking controller. 2.2. Grasp Planning. Generating a gripper configuration that maximizes a success metric consists of the main objective of 2.1. System Architecture grasp planning. Taking into consideration numerous object shapes and all possible orientation variations in the work- 2.1.1. Hardware. For autonomous grasping, the Barrett space, the planner aims at providing the correct grasp Whole Arm Manipulator (WAM) 7-DOF robotics arm is representation that leads to successful object gripping. employed. )e WAM arm has direct-drive capability sup- Robust grasp planning considers the presence of pertur- ported by Transparent Dynamics between the motors and bation in the object properties such as shape and poses joints providing a robust control of contact forces. Mounted caused by noisy and partial sensor readings as well as control on the Barrett manipulator is the BH8-series Barrett Hand, a imprecision. To ensure successful and accurate grasp highly flexible eight-axis gripper with three fingers. In our planning, three key tasks are defined: (1) object detection, (2) work, the third finger is removed to simulate a parallel plate object centralization, and (3) grasp generation. gripper effect. For RGB-D image acquisition, the Microsoft Azure Kinect RGB-D sensor is mounted on the wrist to achieve dynamic sensing of the environment and objects. 2.2.1. Object Detection. Autonomous manipulation requires )e Azure Kinect integrates a Microsoft-designed 1- localization of the desired graspable objects in 3D space. )is Megapixel Time-of-Flight (ToF) depth camera using the information is essential for motion planning since it sets the image sensor presented at ISSCC 2018. )e depth sensor goal location for the manipulator end effector. For per- supports the following five modes: NFOV unbinned, ceiving objects in the vicinity of a robot, 2D vision data such NFOV 2 × 2 binned, WFOV 2 × 2 binned, WFOV unbinned, as edge information are used to localize the object. A 3 × 3 and passive IR. kernel horizontal and vertical edge enhancement is imple- mented. )e resultant image tends to be noisy which ne- cessitates the use of noise reduction where in our case we 2.1.2. Real-Time Controller and Communication. We set the employ erosion kernels. A 3 × 3 kernel is subsequently ap- Barrett WAM control loop frequency at 500 Hz. )at is, at plied to reduce noise and distortion in the image followed by each 2 ms interval, a signal is sent to the joint encoders a 3 × 3 dilation kernel to restore the previous edge scale. (puck) for the arm to function properly; otherwise, this Following this step, bounding boxes are generated around might lead to a heartbeat fault where the manipulator would the available objects each with a coupled ID. After specifying stop operating. It is important to note that along the ex- the desired Object ID, the generated Bounding Box is locked periment the internal PC of the Barrett WAM was not used. reducing any effect of the external environment noise and/or In addition, for the closed-loop control, current joint angle perturbation. Figure 2 illustrates the steps involved in our acquisition is necessary as well as the desired end effector object detection approach. pose. )is calls for a real-time controller that will ensure stable communication in a precise time window. For that purpose, MATLAB Simulink real-time is selected for robot 2.2.2. Object Centralization. )e main cause behind inac- manipulation. To ensure reliable communication between curate grasp generation is in the camera perspective. For the WAM and the MATLAB, a UDP communication is robust grasp generation, localizing the object is not sufficient established. In particular, a UDP-receive and UDP-send and uniform object locations in the field of view of the block is used. Each joint has a unique ID which helps in camera could be beneficial. Furthermore, most depth identifying the origin of the message sent as well as the cameras possess an optimal working range. However, when destination of the generated control signal. A customized dealing with unknown objects with different heights and UDP-to-CAN bus communication converter is imple- shapes, using a stationary camera would not be the best mented allowing for commands send by UDP from the solution. In our system, we install the camera at the end MATLAB controller to be converted to CAN and sent to the effector making the camera dynamic. As the end effector or WAM joints. camera moves towards an object from its initial position, a better shot can be captured depending on the object shape 2.1.3. Task Planner and Computer Vision. Once our real- and camera position. )is approach assists in reducing the errors resulting from multilevel processing and camera time controller is implemented, our manipulator becomes capable of reaching the desired end effector target. At this perspective. Visual servoing, also known as vision-based robot control, is implemented to center the object in the field stage, a task planner that generates the desired end poses based on the surrounding environment and the desired of view of the camera. Visual servoing [43] is a technique 4 Journal of Robotics Customized UDP to CAN bus communication converter Real Time Controller Task Planner (i) Path generation (i) Object detection (ii) Inverse Kinematics Kinect RGB-D (ii) Visual Servoing (iii) Gravity Compensation sensor (iii) RGB-D image processing (iv) PID Controller (iv) Grasp generation using CNN Barrett WAM (v) Motor and encoders transformations and Hand Figure 1: System architecture. Bilateral Filter Edge 3×3 Erosion 3×3 Dilation Dispaly Enhancement Figure 2: Object detection steps. postprocessing techniques and present our CNN for grasp that uses feedback information extracted from a vision sensor to control the motion of a robot [44]. )is technique generation. (see Table 1) minimizes the camera distortion effects and ensures that the object is fully contained in the field of view. )ree visual (1) Preprocessing. Irrespective of the available depth-sensing servoing techniques are discussed in the literature; see, e.g., techniques such as Stereo Reconstruction, Infrared, Time of [45], in particular, position-based visual servo (PBVS) and Flight, and LiDAR, the depth information is always bounded image-based visual servo (IBVS) and hybrid visual servo. In by a minimum and maximum range, beyond which the PBVS, vision data is used to reconstruct the 3D pose of the depth information is clipped. Adding to this, light reflection, manipulator and a kinematic error is generated and mapped refraction, diffusion, and shadows all contribute to lowering to the robot actuator commands. In IBVS, the position error the efficiency of the depth sensor and producing invalid is generated directly from the image plan features. Fur- spots in the depth image. To mitigate the effect of invalid data in our depth image that produces invalid grasps, thermore, there exist two architectures for visual servo in- tegration in the control loop. )e first architecture is known prepossessing of the depth map is crucial before further data manipulation. To do so, we propose a technique that detects as dynamic look and move, which relies on external input provided by the vision system to the joint closed-loop invalid pixels in the depth image and reconstructs the invalid control of the manipulator. )e second architecture is data. )e proposed preprocessing method is detailed in known as direct visual servo, which uses the vision system Algorithm 1 and illustrated in Figure 3. directly in the control loop. In our work, we use an image- based visual servo in a dynamic look and go configuration. (2) Generative Grasp Convolutional Neural Network. As )e bounding box location is used as our visual feedback and mentioned in the introduction section, the aim of our work the arm is kept moving until the calibrated camera center-line is to implement a full, simple, and effective grasp pipeline to is aligned with the object location, and until the distance from enable grasping of unknown objects that can be easily adopted in the industry. )us, one of the main factors to be the camera to the object’s uppermost plane is equal to an “optimal” depth value for grasp generation. Consequently, the considered is the speed and size of the network. Many acquired depth image is used for grasp generation. networks have been implemented in the field to enable accurate and precise grasp generation such as the network presented in [48]. For our pipeline, we use a fully con- 2.2.3. Deep-Learning-Based Grasp Generation. In order to volutional neural network for grasp generation that is generate efficient grasps for diverse geometric shapes, the proposed in [31], which is a generative grasp convolutional employed deep-learning approach is presented. To ensure neural network. )e main advantage behind this network firm grasping with an effective approach in capturing the relates to its small number of parameters. It consists of six geometric key-points of the objects, we use preprocessing convolutional layers (see Figure 4) with a total of 62,420 techniques targeting edge enhancement, and postprocessing parameters compared to 8,806,986 parameters in the net- techniques for feature matching. In what follows, we de- work proposed by [31]. )e relatively small number of scribe the steps employed in our preprocessing and parameters yields less computation efforts that can be easily Journal of Robotics 5 Table 1: What our work has to offer. A comparison between our approach and the related work in the field. Feature [46] [11] [30] [31] [47] Our approach Unknown objects No Yes No Yes Yes Yes Object feature matching NA NA NA No Yes Yes Eye-in-hand camera and object centering No No No Yes No Yes Grasp generation complexity DL + analytical DL 3-stage DL Analytical DL Training data availability No No NA Yes NA Yes Real robot experiments No Yes Yes Yes Yes Yes Multi-view grasp generation No No No No No Yes Robot control No No Yes No No Yes Real-time control without arm controller No No No No No Yes Depth image with (i) 2D gradient (iv) Flood filling Depth inpainting invalid (ii) Thresholding (iv) Dilation and smoothing spots (Black) (iii) Creating mask (OpenCV method) Figure 3: Depth image preprocessing steps. (1) Capture image (with likelihood of invalid depth data) (2) Generate 2D gradient of the depth map (3) )reshold the gradient to produce a binary mask (4) Flood-fill the mask. At this stage a mask that indicates all the invalid spots in the depth image is generated (5) Dilate generated mask with a 3 × 3 kernel (6) Feed the generated mask to OpenCV inpainting function that attempts to deduce and reconstructs the invalid data from the pixels’ value in its neighborhood ALGORITHM 1: Preprocessing algorithm. 300×300 inpainted depth image pi/2 0 θ -pi/2 300×300 pixels Figure 4: Description of the neural network architecture that is implemented in our study. )e generative grasp CNN takes as an input an inpainted depth image, and generates a grasp pose for every pixel, comprising the grasp quality, Q , grasp width, W , and grasp angle, φ . Θ Θ Θ conv 9×9 32 Filters, Stride 3 conv 5×5 16 Filters, Stride 2 conv 3×3 8 Filters, Stride 2 conv Transpose 3×3 8 Filters, Stride 2 conv Transpose 5×5 16 Filters, Stride 2 conv Transpose 9×9 32 Filters, Stride 3 6 Journal of Robotics handled by our employed network compared to other more eliminate all false-positives and would only keep the valid evolved CNN used for the same purpose. grasps for consideration. On the other hand, in most cases, the width of the line )ere are two main advantages of the proposed GG- CNN over other state-of-the-art grasp synthesis CNNs. estimated by the neural network does not match the exact First, rather than sampling grasp candidates, we directly width of the object’s feature. A discrepancy between the real create grasp poses on a pixel-by-pixel basis, similar to feature width and the manipulator opening width can lead to improvements in object detection, where fully con- undesirable results and failure of the grasping attempt. To volutional networks are frequently utilized to provide mitigate the effect of this mismatch, we propose a post- pixel-wise semantic segmentation rather than sliding processing method presented in Algorithm 2 and illustrated windows or bounding boxes [49]. Second, unlike other in Figure 5. CNNs used for grasp synthesis, our GG-CNN has orders Adding to the above, the neural network can detect of magnitude fewer parameters, allowing for fast closed- various potential grasps for a unique object. To eliminate loop grasping. Our grasp detection pipeline can be ex- further chances of errors and confusion, ten trials are specified. Similar grasps are aggregated based on the ecuted in only 19 ms on a GPU-equipped desktop computer. output of the trials. After querying the trial, the ma- )e neural network proposed approximates the complex nipulator considers the most recurring grasp as the final �����→ function M: I ⟶ G, by training the network. M(I) can be potential grasp that is ready for application. Once the learned by using a set of input images IT and corresponding ideal grasp is generated, there is no need to keep the output grasps GT. )e network equation can be expressed by object in the center of view. )e next challenge lies in applying the grasp by accurately controlling the robotic M (I) � Q ; φ ; W 􏼁. (1) manipulator. Θ Θ Θ Θ )e grasp map G estimates the parameters of a set of grasps, for each Cartesian point in the 3D space corre- 2.3. Path Planning sponding to each pixel in the captured image. It constitutes a 2.3.1. Inverse Kinematics. )e motion planning component set of 3 images denoted as, Q, φ, and W. )e representations are as follows: of the system is responsible for computing trajectories for the arm that will place the gripper fingers at or close to M (I) denotes the neural network with Θ being the weights of the network and I representing the 300 × 300 the generated grasp configuration. To complete this task, a mapping between the world and joint coordinates is pixel RGB-D input image. Q describes the grasp quality. Its scalar value ranges within [0,1] with 0 indicating the required. We approach this problem by using inverse ki- lowest quality while one the highest grasp quality. φ nematics. Inverse kinematics (IK) is a technique that maps denotes the grasp angle where the angles belong to the end effector position to the corresponding joint angles. [(−π/2), (π/2)]. W describes the grasp width of a grasp to Given a desired end effector pose, the IK solver generates the be executed. joint angles needed to move the arm to the desired pose )e network is trained over the Cornell Grasping Dataset avoiding collisions with itself. To compute the correct joint positions that leads to a certain six-dimensional Cartesian which contains 885 RGB-D images of everyday objects, with a variable number of human-labeled positive and negative pose, detailed knowledge of the geometry and configuration of the robot is essential. )e SolidWorks Computer Aided grasps. )e network is trained for 32 epochs over 8 hours on an NVIDIA Quadro P100 GPU. Design (CAD) model provided by Barrett offers detailed information about the joint configuration of the robot. To (3). Postprocessing. )e grasp representation generated by transfer this information into our controller, a Unified the GG-CNN output consists of a line position coupled with Robot Description (URDF) model is generated. )e URDF a specific width and angle. )e line position specifies where model takes advantage of the geometric data provided by to look for potential grasp in the depth image, while the SolidWorks and generates an XML file that contains detailed information associated with each link such as the joints angle provides us with the information about the rotation of the manipulator end effector with respect to its normal. After limits and link length. )e latter parameters are listed in Table 2. experimenting with the neural network performance, a considerable amount of false-positive potential grasps has To apply inverse kinematics (IK), the Inverse Kinematics block provided by Simulink is used. )e IK solver provided been observed. )is can be brought back to the noise in- troduced by the background of the object (i.e., uneven by MATLAB allows for accurate joint angle computation without the need for in-depth knowledge of the robot surface of the table that translates into a small variation in the depth image and triggers the GG-CNN to detect an geometry and joint configuration. It also eliminates the invalid grasp) or sensor readings. For this purpose, we need of setting Denavit–Hartenberg (DH) frames trans- implement a grasp validation condition that checks the formation necessary for generation of the robot inverse amount of depth variation in the potential grasp in order to kinematics model. )e IK block requires a stored con- figuration file and three main inputs. )e configuration increase the chances of successful grasps. In particular, if depth is below a certain threshold, the grasp generated is file allows for specific equation generation associated with the described robot geometry. )e generated URDF model disregarded and not accounted for. )is approach would Journal of Robotics 7 Unmatched width Matched width 1.0 0.5 0.0 –0.5 –1.0 0 20 40 60 80 100 120 Pixels Figure 5: Depth image postprocessing: matching the generated grasp width to the object feature. )e blue line represents the grasp generated by the GG-CNN. )e purple line represents the processed grasp generated matching the grasp line to the object width. (1) Plot the depth versus image pixels from the starting point of the line to its end. th (2) Apply a low pass filter to the data. In this paper, a 5 -order Butter-worth filter is selected chosen due to its flat profile across the bandwidth of interest (3) Take the gradient of the low pass filtered depth data. (4) Find all the local minima and maxima above a certain threshold. )is is essential due to the fact that some small irregularities in object’s surface can induce a local minima or maxima. (5) Select the first minimum and last maximum and compute the grasp width from the distance between the generated points. ALGORITHM 2: Postprocessing algorithm. Table 2: Barrett WAM joint limits and link lengths. Joint Lower limit (rad) Upper limit (rad) Link Length (mm) J1 −2.6 2.6 Base link 186 J2 −2.0 2.0 L1 550 J3 −2.8 2.8 L2 300 J4 −0.9 3.1 L3 60 J5 −4.76 1.24 J6 −1.6 1.6 J7 −3.0 3.0 Depth 8 Journal of Robotics configurations can lead to the same end effector pose. is used as a configuration file. )e proposed method makes the approach highly flexible. Changing the entire ma- Inconsistency of joint configurations or non-opti- mal path generation can be controversial. )e work nipulator geometry or even using another manipulator can be simply done by replacing the configuration file in [50] proposes an incremental inverse kinemat- stored in the IK block by the corresponding arm model, ics-based vision servo approach for robotic ma- eliminating the need for link parameter adjustment or axis nipulators to capture non-cooperative objects reconfiguration. autonomously. )e proposed approach generates )e first input specifies the current joint angle config- the joint angles iteratively at each time stamp based uration of the robot. )e second input allows us to control on the object location detected by a stationary the error tolerance of the end effector position. )at is, if the camera. )e approach assumes a fixed-camera- end effector is within the specified margin, the pose is frame to base-frame transformation and uses the actual robot pose as a starting point to generate considered as attained. )is input is presented with a list of six variables: 􏼈δx, δy, δz􏼉 represent the error tolerance of the appropriate joint angles. Inspired by this technique, we use the current pose of the robot manipulator as Cartesian translation components of the end effector, and 􏼈δΨ, δθ, δϕ􏼉 represent the error tolerance of the Euler ro- an initial guess in the provided IK block. )erefore, tation components of the end effector. )e Euler angle configurations closer to the actual robot state are system is a method to describe the coordinate transfor- favored over others. However, since the base joint mations. It consists of three independent variables, ψ θ, ϕ of the Barrett is limited to −2.6 and 2.6 radiant as defined as successive planar rotation angles around the x, y, shown in Table 2, choosing the closest joint config- and z axes, respectively. )e latter values describe the end uration as a starting point can result in longer path leading to non-optimal path generation. To resolve effector pose. High tolerance in the translational x-axis means that a relatively high error in the x translation this issue, we propose the use of the two inverse ki- nematics modules. One uses the current joints as component of the gripper is acceptable. Similarly, a relatively high tolerance in the rotational z-axis means that a high initial guess and the other has no initial guess. )e two outputs are fed into an error function. Based on the error in the z orientation component is acceptable. )is helps the IK solver in defining the most critical position output of this function, the configuration with smaller components and plan the path accordingly. error is used. )e third and final input deals with the initial guess. )e proposed approach is simple and comes with low Inverse kinematics algorithms do not provide one unique computational efforts. )e efficacy of our proposed so- joint configuration. Multiple joint angles, as well as multiple lution is justified experimentally. It also generates the paths, can lead to the same end-effector pose in the 3D space. joint angles within five milliseconds. )is approach Some IK solvers fail to converge sometimes with the eliminates the need for generating adequate initial guess presence of strict time limits and others require heavy or the use of optimization techniques. It is also important computation to plan one path. )e initial guess helps the IK to note that the generated arm motion is not limited to a solver in finding the best path by providing an ultimate planar motion and can follow any reachable path in the 3D starting point for the algorithm. )is approach reduces space. computation, makes the planner faster, and leads to a more accurate path. )e current joint angles are fed as an initial 2.3.2. Coordinate System. For robotic manipulation, we guess to inform the solver that joint angles closer to the define two coordinate systems depicted in Figure 6, namely, actual position are desired. )is approach helps with the A and B. We set A, the frame corresponding to the base link elimination of the singularities in the joint configuration that of the Barrett WAM, to be the base frame. We define B to be rotates the base by 360 degrees, for example, to reach a pose the camera frame placed at arm end effector. To obtain the while zero degrees can do the job. camera frame, the forward kinematics plugin provided by To make sure the joint configuration generated from MATLAB is used. )e camera fixation part is designed in inverse kinematics is reliable, we implement the following SolidWorks and added to the urdf model of the manipulator two-step approach: as a virtual link. )e latter is then used to generate the camera (1) In order to establish our operating space domain, joints frame using forward kinematics and end effector pose. Fol- limits are employed. In particular, before generating lowing this coordinate system, all end effector goal poses are the urdf file, safe joint limits are added to the CAD defined with respect to the manipulator base. Similarly, all design on SolidWorks and the unified robot descrip- detected poses and vision output are transformed back from tion model is generated accordingly. )e generated the camera frame to the base frame. )e joint coordinates are URDF is then stored inside the IK block. By following defined with respect to the manipulator home pose, which this approach, all unattainable singularities are auto- implies a fully extended arm in the positive z direction. matically dropped due to violation of the joint limit )erefore, a pose in the joint space is based on seven joint constrains where the space of singularities becomes angles describing the difference angle between the joint goal confined to the operating space of the robotic arm. and home pose. Mapping from a 6D pose defined in the global (2) It is widely known that Inverse kinematics does not base frame to seven joint values defined in the joint space is generate unique solutions and that multiple joint the main task of the inverse kinematics block. Journal of Robotics 9 Task space x  y  z X, Y, Z coordinates grasp width grasp Angle End effector Axis‐Angle representation Figure 6: Coordinate system and grasp representation. )e x-axis is represented in red, y-axis in green, and the z-axis in blue. Joint 1 Joint 2 Joint 3 Joint 4 Joint 5 Joint 6 Joint 7 Torques Figure 7: )e first seven subplots illustrate the tracking performance of the employed controller corresponding to the seven joints. Each of the seven plots includes the reference trajectory of the joint position versus its actual position. )e eighth subplot shows the relatively smooth torques that are associated with the seven joints. Figure 8: Modeling of finger closing path. 2.3.3. Safe Motion. Once the joint positions are calculated by exceeds TL2, a torque fault is raised and the arm stops. To the IK block and before feeding it to the position controller, avoid any velocity or torque fault, a velocity limiter block is certain conditions have to be applied to the velocity to secure added. )e specified block can slow down the velocity while safe control. In Barrett safety system, VL2 is defined as the keeping the control operation safe. )is step fortifies a safe safety system’s motor velocity fault limit while TL2 repre- operation but can introduce jerky movements. To ensure a sents the safety system’s motor torque fault level. Velocities smoothed path, a spline function is used. )is function higher than VL2 cause a velocity fault and stop the entire smooths out the generated path while securing zero initial arm motion in a E-stop effect. Similarly, if any joint torque velocity and acceleration. 10 Journal of Robotics 2.4. Controller. )e nonlinear dynamics model describing order filter at the PID inputs to reduce the measurement the continuous-time dynamics of a rigid n-link serial robot noise. manipulator, with all actuated revolute joints described in Before sending the torques to the arm, appropriate joint coordinates, can be represented by the Euler-Lagrange transformations are applied to transform the joint torques to model as follows motor torques and then to motor current. For that, we use the following: u � M(q)q € + C(q, q _)q _ + b(q _) + g(q), (2) M � VJ , (3) τ τ i i n n where q ∈ R represents joint angles, q _ ∈ R represents where i is the joint number and V is the joint to motor n×n joint velocities, M(q) ∈ R is a symmetric positive torque transformation matrix provided by Barrett and n×n definite inertia matrix, C(q, q _) ∈ R is the Coriolis and defined as centripetal matrix, b(q _) ∈ R represents the joint friction, n n g(q) ∈ R is the gravity vector, and u ∈ R stands for the ⎧ ⎪ ⎫ ⎪ ⎪ ⎪ − 0 0 0 0 0 0 ⎪ ⎪ ⎪ ⎪ torque input. ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ Once the desired path is generated, the required torques ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ needed to move the end effector to the desired pose must be ⎪ ⎪ ⎪ ⎪ ⎪ 1 n ⎪ ⎪ 3 ⎪ automatically generated. Many different discrete-time ⎪ ⎪ 0 − 0 0 0 0 ⎪ ⎪ ⎪ 2N 2N ⎪ multivariable controllers have been proposed that can be 2 2 ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ applied to our problem such as for the deterministic setting ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ [51, 52] and the references therein. However, the inevitable ⎪ ⎪ 1 n ⎪ ⎪ ⎪ ⎪ noise associated with the measurement of the joint angles 0 − − 0 0 0 0 ⎪ ⎪ ⎪ ⎪ ⎪ 2N 2N ⎪ 2 2 ⎪ ⎪ yields poor performance and possible vibrations in the arm. ⎪ ⎪ ⎪ ⎪ )ere are also multivariable iterative-type controllers with ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ the capability of suppressing measurement noise such as the ⎨ ⎬ 0 0 0 − 0 0 0 , (4) ones proposed in [29, 53, 54]; however, accurate tracking ⎪ ⎪ ⎪ N ⎪ 4 ⎪ ⎪ ⎪ may require several iterations, and the learning would be ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ based on a specific trajectory. A common class of closed- ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ loop controllers employs slide-mode approach; see, e.g., ⎪ ⎪ 1 n ⎪ 6 0 0 0 0 − 0 ⎪ ⎪ [55, 56] and the references therein. )ere are also several ⎪ ⎪ ⎪ 2N 2N ⎪ ⎪ 5 5 ⎪ ⎪ ⎪ optimal stochastic simultaneous multivariable controllers ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ such a PD-type [57], and ones with gravity compensation ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ 1 n [58], PID-type [59–61], and even PDD [62]. Although the ⎪ ⎪ ⎪ ⎪ 0 0 0 0 0 ⎪ ⎪ ⎪ ⎪ works in [55–57, 62] include experimental justification on ⎪ 2N 2N ⎪ 5 5 ⎪ ⎪ ⎪ ⎪ the Barrett WAM, they do not deal with controlling its ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ gripper. All of the aforementioned strategies and many ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ others come with a number of unique features; however, ⎪ 0 0 0 0 0 0 − ⎭ they require deep comprehension of control theory and/or some knowledge of the statistical models. In this paper, we and N and n are constants provided by Barrett. i i use simple single-input single-output PID controller that Following this step, we multiply the output by the can be designed by any novice engineer. In particular, we motor-toque-to-motor-current ratio provided by Barrett tackle the torque control problem by introducing closed- and finally send the signal to the arm pucks for motion. loop PID control with gravity compensation. Gravity )e tracking performance of the seven joints and corre- compensation is a well-known technique in robot design, sponding torques are shown in Figure 7. It is important to which compensates for the force generated by the gravity. note that although the Barrett WAM comes with a force- )is technique requires the knowledge of the gravity model. torque sensor at the wrist, we do not use the sensor in our We use the corresponding model parameters provided by controller. Barrett. To account for gravity forces, we set B(q) to be the inverse of the inertia matrix, which is a symmetric and − 1 nonsingular. )at is, B(q) � M (q). )us, the gravity compensation term A(q) is the product of the inverse inertia 2.4.1. Gripper Control. For the control of the Barrett WAM matrix and the gravity force vector, gripper, no closed-loop control is needed to be implemented − 1 A(q) � −M (q)g(q) � −B(q)g(q). )is term is added to and no parameters are needed to be set or tuned. )e gripper the control vector. )e gravity compensation is imple- simply accepts finger commands in a specific CAN format mented in a MATLAB Simulink block and added to the provided by Barrett that allows to move the gripper fingers to controller. For closed-loop control, seven independent PID the desired pose as well as open and close commands for controllers are implemented and manually tuned for each grasping and disposing objects. Since the closing profile of joint. To tune the PID, a sine wave is used as a fictitious the gripper is curved rather than being a straight line as reference trajectory of the joints, and the controller pa- shown in Figure 8, modeling of the gripper closing path rameters are adjusted accordingly. We also deploy a first- needs to be done in order to link the opening width to the Journal of Robotics 11 Table 3: Grasp performance. Item Box Tape Piller Tube Eraser Open box Number of trials 20 25 15 10 20 5 Success rate (%) 90 92 93 80 95 100 Figure 9: Grasp generation results: comparison between grasp generated by the GG-CNN with and without RGB-D image preprocessing for shiny and black objects. )e upper 2 rows show the generated grasp output without preprocessing of the RGB-D data. Grasps generated are invalid and the depth info is corrupted. )e lower 2 rows show the generated grasp output after preprocessing. )e depth data was reconstructed leading to valid grasps. boxes, tape, tubes, and other objects. )e proposed approach finger height and avoid any collision with the table. Fol- lowing this step, we use the postprocessed width generated is shown to be efficient with high-success rates as docu- in Subsection B as a pre-grasping finger opening. )en, only mented in Table 3 even when the objects are shiny and when the gripper reaches the ideal grasping pose, is the associated with a defective depth map. )e used technique closing signal sent. Since the Barrett gripper is equipped with helps in the depth map reconstruction leading to accurate a force-torque sensor, determining the sufficient force to grasp generations. Figure 9 shows the neural network output hold the object in question is not needed. )e gripper fingers grasp before and after applying the pre- and postprocessing stop automatically whenever enough force is applied to techniques. stabilize the object in question. As for the multiple view grasp generation, Figure 9 il- lustrates the results of the aforementioned techniques in filtering disturbances from the external environment. )e 3. Experimental Results grasp generation is successful even on transparent or light reflecting objects. )is technique enables capturing the full To assess the performance of our proposed system, a series of object feature in a simple and effective way without the need experiments are conducted. Multiple unknown objects with for 3D reconstruction or point cloud generation which could different geometric representations are used to test the be challenging in noisy industrial environments. system performance. In our evaluation, we focus on 2 cri- teria: the grasp generation and the control of the manipu- lator. In the first series of experiments, we evaluate the 3.2. Overall Performance. Coupled with a reliable control of performance of our approach for grasp generation. In the the Barrett manipulator, successful grasp generation led to second series, the grasp generation is integrated into the successful grasps of multiple objects regardless of their complete grasping pipeline to assess the torque control and geometric shapes or orientation. Visual servoing perfor- path planning. mance is proven efficient in limiting grasp application error resulting from camera perspective. )e implemented algo- 3.1. Grasp Generation. )e efficiency of the approach fol- rithm also centers the object of interest in the field of view lowed for grasp generation is assessed using various pre- within the desired time frame. Our controller yields smooth, viously unknown objects with complex geometric relatively fast path execution driving the end effector to the representations. )e objects vary between multiple me- desired pose with millimeters accuracy, and matching of the chanical tools, transparent and nontransparent bottles, end effector contact point with the generated grasp, which With pre-processing Without pre-processing Depth RGB image Depth RGB image image with grasp image with grasp 12 Journal of Robotics gripper types. In addition, since convolutional neural made grasping of complex shapes possible. Table 3 shows the success rate of our proposed automated grasping approach. network limits the grasp to the normal of the plane of view, having an eye-in-hand camera allowed for multi- A video describing the overall project performance can be displayed by visiting https://www.youtube.com/watch? view grasp generation. )is technique enables the gen- v�gL5oefp_QKoVideo. eration of grasps in multiple plans covering multiple features and leading to a better grasp generation for complex shapes. 4. Conclusion and Future Work )e multi-view grasp generation was implemented We presentenced a data-driven deep-learning approach to from the vision and grasp generation perspective; how- robotic grasping of unknown objects with eye-in-hand ever, our solution has not employed kinematics. Tech- camera and real-time controller. Our paper has tackled the niques using kinematics would allow approaching the object from various angles, which could significantly autonomous grasping problem from start to finish covering WAM real-time control, data manipulation, grasp genera- increase the grasping accuracy and the object immobility. Furthermore, dynamic collision avoidance implementa- tion, and execution. Our proposed approach is suitable for the exploration of unknown scenes and generation of ef- tion could be of great benefit. Implementing such tech- fective grasps for various geometric shapes and unknown niques might result in a more robust system suitable for objects without the need for complex hand tracking algo- human-robot application by increasing the safety level rithm, image segmentation, or 3D reconstruction. )anks to and flexibility of the system. the unique well-defined decision-making tree and the in- tegration of different techniques, we were able to eliminate Data Availability the error propagation resulting from multilevel processing, camera perspective. and non-ideal depth range. Starting )e justification of this work is experimental and no specific with visual servoing, we centered the object in question in data are available. However, we have a link to a video in the the field of view reducing perceptual error and eliminating manuscript that illustrates the overall performance that can error propagation. Following this step, we implemented be displayed by visiting https://www.youtube.com/watch? high level prepossessing computer vision-based tech- v=gL5oefp_QKo. niques coupled with edge enhancements and depth data reconstruction method to account for shiny surfaces and Conflicts of Interest light variation making the object localization, grasp generation, and execution more robust. For the grasp No potential conflicts of interest are reported by the authors. generation problem, a generative grasp convolutional neural network was implemented providing efficient grasp References configuration for parallel grippers. We topped this by a postprocessing layer that matched the generated width [1] M. Lopes and J. Santos-Victor, “Visual learning by imitation with the object feature gradients. To the best of our with motor representations,” IEEE Transactions on Systems, knowledge, the latter is a functional attribute that has not Man, and Cybernetics. Part B, Cybernetics: a Publication of the been considered in the literature. IEEE Systems, Man, and Cybernetics Society, vol. 35, no. 3, From the robotics and control point of view, we tackled pp. 438–449, 2005. [2] C. Follini, V. Magnago, K. Freitag et al., “Bim-integrated the control of the arm in a simple and effective approach. In collaborative robotics for application in building construction particular, we proposed a simple real-time control imple- and maintenance,” Robotics, vol. 10, no. 1, p. 2, 2021. mentation of the Barrett WAM manipulator using MATLAB [3] B. Abouzeid, G. El Hasbani, J. Abouzeid, S. Halleit, and Simulink real-time module. Unlike the majority of the work K. Jallad, “Cost comparison of gynecologic procedures be- implemented on MATLAB for robotic manipulators, tween the us and a developing country: an observational MATLAB was not only used for simulation or as a tool that study,” Journal of Robotic Surgery, vol. 16, pp. 1–5, 2021. communicates with the arm controller such as the work in [4] E. Fakhoury, P. Culmer, and B. Henson, “)e impact of visual [63, 64]. In fact, we used MATLAB Simulink for the full arm cues on haptic compliance discrimination using a pseudo- control whereas the arm internal PC was left undeployed. By haptic robotic system,” in Proceedings of the 2017 IEEE In- applying our approach to a class of robotic manipulators, ternational Conference on Robotics and Biomimetics (ROBIO), one can eliminate the need of the controller box that takes pp. 1719–1725, IEEE, Macau, Macao, December 2017. [5] S. Caldera, A. Rassau, and D. Chai, “Review of deep learning additional space and increases the cost of the arm. )is could methods in robotic grasp detection,” Multimodal Technologies be substantially beneficial whenever a company desires to and Interaction, vol. 2, no. 3, p. 57, 2018. adapt a robotic solution in limited spaces with lower cost. [6] G. Kootstra, M. Popovic, ´ J. A. Jørgensen et al., “Enabling Furthermore, the proposed approach is not only cost grasping of unknown objects through a synergistic use of edge effective and leads to reduction in space; it is also simple and surface information,” Ae International Journal of Ro- and modular. In our work, the deployed gripper is botics Research, vol. 31, no. 10, pp. 1190–1213, 2012. equipped with a force toque sensor that ensured firm [7] F. T. Pokorny and D. Kragic, “Classical grasp quality eval- grasps. For future work, a force optimization layer as uation: new algorithms and theory,” in Proceedings of the 2013 proposed in [65] can be added to our system to make the IEEE/RSJ International Conference on Intelligent Robots and approach more modular and functional with different Systems, pp. 3493–3500, IEEE, Tokyo, Japan, November 2013. Journal of Robotics 13 [8] A. Rodriguez, M. T. Mason, and S. Ferry, “From caging to scheduling,” Journal of Management in Engineering, vol. 37, grasping,” Ae International Journal of Robotics Research, no. 1, Article ID 04020104, 2021. [23] S. AbdulRahman, H. Tout, A. Mourad, and C. Talhi, vol. 31, no. 7, pp. 886–900, 2012. [9] P. Bender and G. M. Bone, “Automated grasp planning and “FedMCCS: multicriteria client selection model for optimal execution for real-world objects using computer vision and iot federated learning,” IEEE Internet of Aings Journal, vol. 8, tactile probing,” International Journal of Robotics and Au- no. 6, pp. 4723–4735, 2020. [24] B. Tay and A. Mourad, “Intelligent performance-aware ad- tomation, vol. 19, no. 1, pp. 15–27, 2004. [10] A. Morales, P. J. Sanz, A. P. Del Pobil, and A. H. Fagg, “Vision- aptation of control policies for optimizing banking teller based three-finger grasp synthesis constrained by hand ge- process using machine learning,” IEEE Access, vol. 8, ometry,” Robotics and Autonomous Systems, vol. 54, no. 6, pp. 153403–153412, 2020. pp. 496–512, 2006. [25] H. Tout, N. Kara, C. Talhi, and A. Mourad, “Proactive ma- [11] P. Schmidt, N. Vahrenkamp, M. Wachter, ¨ and T. Asfour, chine learning-based solution for advanced manageability of “Grasping of unknown objects using deep convolutional multi-persona mobile computing,” Computers & Electrical neural networks based on depth images,” in Proceedings of the Engineering, vol. 80, Article ID 106497, 2019. 2018 IEEE International Conference on Robotics and Auto- [26] B. M. Haddad, S. F. Dodge, L. J. Karam, N. S. Patel, and mation (ICRA), pp. 6831–6838, IEEE, Brisbane, QLD, Aus- M. W. Braun, “Locally adaptive statistical background tralia, May 2018. modeling with deep learning-based false positive rejection for [12] G. Konidaris, S. Kuindersma, R. Grupen, and A. Barto, “Robot defect detection in semiconductor units,” IEEE Transactions learning from demonstration by constructing skill trees,” Ae on Semiconductor Manufacturing, vol. 33, no. 3, pp. 357–372, International Journal of Robotics Research, vol. 31, no. 3, 2020. pp. 360–375, 2012. [27] A. Almeshal and M. Alenezi, “A vision-based neural network [13] H. Abdellatef, M. Khalil-Hani, N. Shaikh-Husin, and controller for the autonomous landing of a quadrotor on S. O. Ayat, “Accurate and compact convolutional neural moving targets,” Robotics, vol. 7, no. 4, p. 71, 2018. network based on stochastic computing,” Neurocomputing, [28] A. Helwan, M. K. S. Ma’aitah, H. Hamdan, D. U. Ozsahin, and vol. 471, pp. 31–47, 2022. O. Tuncyurek, “Radiologists versus deep convolutional neural [14] A. Nader and D. Azar, “Searching for activation functions networks: a comparative study for diagnosing COVID-19,” using a self-adaptive evolutionary algorithm,” in Proceedings Computational and Mathematical Methods in Medicine, of the 2020 Genetic and Evolutionary Computation Conference vol. 2021, Article ID 5527271, 9 pages, 2021. Companion, pp. 145-146, Cancun, ´ Mexico, July 2020. [29] D. Shen, N. Huo, and S. S. Saab, “A probabilistically quantized [15] M. Orabi, J. Khalife, A. A. Abdallah, Z. M. Kassas, and learning control framework for networked linear systems,” S. S. Saab, “A machine learning approach for gps code phase IEEE Transactions on Neural Networks and Learning Systems, estimation in multipath environments,” in Proceedings of the vol. PP, 2021. 2020 IEEE/ION Position, Location and Navigation Symposium [30] J. A. Bagnell, F. Cavalcanti, L. Cui et al., “An integrated system (PLANS), pp. 1224–1229, IEEE, Portland, OR, USA, April for autonomous robotics manipulation,” in Proceedings of the 2020. 2012 IEEE/RSJ International Conference on Intelligent Robots [16] P. Saade, R. El Jammal, S. El Hayek, J. Abi Zeid, O. Falou, and and Systems, pp. 2955–2962, IEEE, Vilamoura-Algarve, D. Azar, “Computer-aided detection of white blood cells using Portugal, October 2012. geometric features and color,” in Proceedings of the 2018 9th [31] D. Morrison, P. Corke, and J. Leitner, “Closing the loop for Cairo International Biomedical Engineering Conference robotic grasping: a real-time, generative grasp synthesis ap- (CIBEC), pp. 142–145, IEEE, Cairo, Egypt, December 2018. proach,” 2018, https://arxiv.org/abs/1804.05172. [17] D. Azar, R. Moussa, and G. Jreij, “A comparative study of nine [32] R. K. Megalingam, R. V. Rohith Raj, T. Akhil, A. Masetti, machine learning techniques used for the prediction of dis- G. N. Chowdary, and V. S. Naick, “Integration of vision based eases,” International Journal of Artificial Intelligence, vol. 16, robot manipulation using ROS for assistive applications,” in no. 2, pp. 25–40, 2018. Proceedings of the 2020 Second International Conference on [18] S. Zhang, Y. Wang, J. Jiang, J. Dong, W. Yi, and W. Hou, Inventive Research in Computing Applications (ICIRCA), “CNN-based medical ultrasound image quality assessment,” pp. 163–169, Coimbatore, India, July 2020. Complexity, vol. 2021, Article ID 9938367, 9 pages, 2021. [33] M. Colledanchise and P. Ogren, Behavior Trees in Robotics [19] T. Borkar, F. Heide, and L. Karam, “Defending against uni- and AI: An Introduction, CRC Press, Boca Raton, FL, USA, versal attacks through selective feature regeneration,” in Proceedings of the 2020 IEEE/CVF Conference on Computer [34] Y. Hu, X. Wu, P. Geng, and Z. Li, “Evolution strategies Vision and Pattern Recognition (CVPR), pp. 709–719, June learning with variable impedance control for grasping under uncertainty,” IEEE Transactions on Industrial Electronics, [20] O. A. Wahab, J. Bentahar, H. Otrok, and A. Mourad, “Re- vol. 66, no. 10, pp. 7788–7799, 2018. source-aware detection and defense system against multi-type [35] A. Koubaa, ˆ Robot Operating System (ROS), Springer, Berlin, attacks in the cloud: repeated Bayesian Stackelberg game,” Germany, 2017. IEEE Transactions on Dependable and Secure Computing, [36] M. Quigley, K. Conley, B. Gerkey et al., “ROS: an open-source vol. 18, no. 2, pp. 605–622, 2019. robot operating system,” in Proceedings of the ICRA Workshop [21] O. A. Wahab, R. Cohen, J. Bentahar, H. Otrok, A. Mourad, on Open Source Software, Kobe, Japan, January 2009. and G. Rjoub, “An endorsement-based trust bootstrapping [37] H. Wei, Z. Shao, Z. Huang et al., “RT-ROS: a real-time ROS approach for newcomer cloud services,” Information Sciences, architecture on multi-core processors,” Future Generation vol. 527, pp. 159–175, 2020. Computer Systems, vol. 56, pp. 171–178, 2016. [22] M. Awada, F. J. Srour, and I. M. Srour, “Data-driven machine [38] P. Zhang, Advanced Industrial Control Technology, William learning approach to integrate field submittals in project Andrew, Norwich, NY, USA, 2010. 14 Journal of Robotics [39] C. Acosta Lua, ´ C. C. Vaca Garc´ıa, S. Di Gennaro, B. Castillo- Electronics & Mobile Communication Conference (UEMCON), Toledo, and M. E. Sanchez ´ Morales, “Real-time hovering pp. 716–720, IEEE, New York, NY, USA, November 2018. [56] R. H. Jaafar and S. S. Saab, “Approximate differentiator with control of unmanned aerial vehicles,” Mathematical Problems varying bandwidth for control tracking applications,” IEEE in Engineering, vol. 2020, Article ID 2314356, 8 pages, 2020. Control Systems Letters, vol. 5, no. 5, pp. 1585–1590, 2020. [40] Y. Zeng, J. Sheng, and M. Li, “Adaptive real-time energy [57] S. S. Saab Jr., M. Hauser, A. Ray, and S. S. Saab, “Multivariable management strategy for plug-in hybrid electric vehicle based nonadaptive controller design,” IEEE Transactions on In- on simplified-ECMS and a novel driving pattern recognition dustrial Electronics, vol. 68, no. 7, pp. 6181–6191, 2020. method,” Mathematical Problems in Engineering, vol. 2018, [58] S. S. Saab and P. Ghanem, “A multivariable stochastic tracking Article ID 5816861, 12 pages, 2018. controller for robot manipulators without joint velocities,” [41] C. Eppner, S. Hofer, ¨ R. Jonschkowski et al., “Lessons from the IEEE Transactions on Automatic Control, vol. 63, no. 8, amazon picking challenge: four aspects of building robotic pp. 2481–2495, 2017. systems,” in Robotics: Science and Systems, 2016. [59] S. S. Saab, “An optimal stochastic multivariable PID con- [42] N. Correll, K. E. Bekris, D. Berenson et al., “Analysis and troller: a direct output tracking approach,” International observations from the first amazon picking challenge,” IEEE Journal of Control, vol. 92, no. 3, pp. 623–641, 2019. Transactions on Automation Science and Engineering, vol. 15, [60] S. S. Saab, “Development of multivariable PID controller gains no. 1, pp. 172–188, 2016. in presence of measurement noise,” International Journal of [43] F. Chaumette, S. Hutchinson, and P. Corke, “Visual servoing,” Control, vol. 90, no. 12, pp. 2692–2710, 2017. in Springer Handbook of RoboticsSpringer, Berlin, Germany, [61] S. S. Saab, “A stochastic PID controller for a class of MIMO systems,” International Journal of Control, vol. 90, no. 3, [44] S. Hutchinson, G. D. Hager, and P. I. Corke, “A tutorial on pp. 447–462, 2017. visual servo control,” IEEE Transactions on Robotics and [62] S. S. Saab and R. H. Jaafar, “A proportional-derivative-double Automation, vol. 12, no. 5, pp. 651–670, 1996. derivative controller for robot manipulators,” International [45] G. Palmieri, M. Palpacelli, M. Battistelli, and M. Callegari, “A Journal of Control, vol. 94, no. 5, pp. 1273–1285, 2021. comparison between position-based and image-based dy- [63] R. R. Ardeshiri, M. H. Khooban, A. Noshadi, N. Vafamand, namic visual servoings in the control of a translating parallel and M. Rakhshan, “Robotic manipulator control based on an manipulator,” Journal of Robotics, vol. 2012, Article ID optimal fractional-order fuzzy PID approach: sil real-time 103954, 11 pages, 2012. simulation,” Soft Computing, vol. 24, no. 5, pp. 3849–3860, [46] J. Varley, J. Weisz, J. Weiss, and P. Allen, “Generating multi- fingered robotic grasps via deep learning,” in Proceedings of [64] J. Till, C. E. Bryson, S. Chung, A. Orekhov, and D. C. Rucker, the 2015 IEEE/RSJ International Conference on Intelligent “Efficient computation of multiple coupled cosserat rod Robots and Systems (IROS), pp. 4415–4420, IEEE, Hamburg, models for real-time simulation and control of parallel Germany, September 2015. continuum manipulators,” in Proceedings of the 2015 IEEE [47] B. Sauvet, F. Levesque, S. Park, P. Cardou, and C. Gosselin, International Conference on Robotics and Automation (ICRA), “Model-based grasping of unknown objects from a random pp. 5067–5074, IEEE, Seattle, WA, USA, May 2015. pile,” Robotics, vol. 8, no. 3, p. 79, 2019. [65] Y. Hu, Z. Li, G. Li, P. Yuan, C. Yang, and R. Song, “Devel- [48] S. Qiu, D. Lodder, and F. Du, “HGG-CNN: the generation of opment of sensory-motor fusion-based manipulation and the optimal robotic grasp pose based on vision,” Intelligent grasping control for a robotic hand-eye system,” IEEE Automation and Soft Computing, vol. 26, no. 6, pp. 1517–1529, Transactions on Systems, Man, and Cybernetics: Systems, 2020. vol. 47, no. 7, pp. 1169–1180, 2016. [49] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440, Boston, MA, USA, April 2015. [50] G. Dong and Z. H. Zhu, “Incremental inverse kinematics based vision servo for autonomous robotic capture of non- cooperative space debris,” Advances in Space Research, vol. 57, no. 7, pp. 1508–1514, 2016. [51] S. S. Saab and R. Toukhtarian, “A MIMO sampling-rate- dependent controller,” IEEE Transactions on Industrial Electronics, vol. 62, no. 6, pp. 3662–3671, 2014. [52] A. Loria, “Observers are unnecessary for output-feedback control of Lagrangian systems,” IEEE Transactions on Auto- matic Control, vol. 61, no. 4, pp. 905–920, 2015. [53] D. Shen and S. S. Saab, “Noisy output based direct learning tracking control with Markov Nonuniform trial lengths using adaptive gains,” IEEE Transactions on Automatic Control, [54] S. S. Saab, D. Shen, M. Orabi, D. Kors, and R. Jaafar, “Iterative learning control: practical implementation and automation,” IEEE Transactions on Industrial Electronics, vol. 69, 2021. [55] R. H. Jaafar and S. S. Saab, “Sliding mode control with dirty derivatives filter for rigid robot manipulators,” in Proceedings of the 2018 9th IEEE Annual Ubiquitous Computing,

Journal

Journal of RoboticsHindawi Publishing Corporation

Published: Jun 30, 2022

There are no references for this article.