Alon Oring, Z. Yakhini, Y. Hel-Or (2020)
Autoencoder Image Interpolation by Shaping the Latent Space
Kihyuk Sohn, Honglak Lee, Xinchen Yan (2015)
Learning Structured Output Representation using Deep Conditional Generative Models
G. Palli, F. Ficuciello, U. Scarcia, C. Melchiorri, B. Siciliano (2014)
Experimental evaluation of synergy-based in-hand manipulation. IFAC Proceedings Volumes, 47
O. Jenkins (2006)
2D Subspaces for Sparse Control of High-DOF Robots. 2006 International Conference of the IEEE Engineering in Medicine and Biology Society
D. Dimou, J. Santos-Victor, Plinio Moreno (2021)
Learning Conditional Postural Synergies for Dexterous Hands: A Generative Approach Based on Variational Auto-Encoders and Conditioned on Object Size and Category. 2021 IEEE International Conference on Robotics and Automation (ICRA)
M. Kramer (1991)
Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal, 37
A. Asperti, Matteo Trentin (2020)
Balancing Reconstruction Error and Kullback-Leibler Divergence in Variational Autoencoders. IEEE Access, 8
Diederik Kingma, M. Welling (2013)
Auto-Encoding Variational Bayes. CoRR, abs/1312.6114
Diederik Kingma, Jimmy Ba (2014)
Adam: A Method for Stochastic Optimization. CoRR, abs/1412.6980
J. Starke, Christian Eichmann, Simon Ottenhaus, T. Asfour (2020)
Human-Inspired Representation of Object-Specific Grasps for Anthropomorphic Hands. Int. J. Humanoid Robotics, 17
Thomas Feix, R. Pawlik (2009)
A comprehensive grasp taxonomy
M. Ciocarlie, Corey Goldfeder, P. Allen (2007)
Dimensionality reduction for hand-independent dexterous robotic grasping. 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems
A. Bernardino, M. Henriques, N. Hendrich, Jianwei Zhang (2013)
Precision grasp synergies for dexterous robotic hands. 2013 IEEE International Conference on Robotics and Biomimetics (ROBIO)
Javier Romero, Thomas Feix, C. Ek, H. Kjellström, D. Kragic (2013)
Extracting Postural Synergies for Robotic Grasping. IEEE Transactions on Robotics, 29
J. Starke, Christian Eichmann, Simon Ottenhaus, T. Asfour (2018)
Synergy-Based, Data-Driven Generation of Object-Specific Grasps for Anthropomorphic Hands. 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids)
Neil Lawrence (2003)
Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data
M. Santello, M. Bianchi, M. Gabiccini, E. Ricciardi, G. Salvietti, D. Prattichizzo, M. Ernst, A. Moscatelli, H. Jörntell, A. Kappers, K. Kyriakopoulos, A. Albu-Schäffer, Claudio Castellini, A. Bicchi (2016)
Hand synergies: Integration of robotics and neuroscience for understanding the control of biological and artificial hands. Physics of Life Reviews, 17
Nutan Chen, Maximilian Karl, Patrick Smagt (2016)
Dynamic movement primitives in latent space of time-dependent variational autoencoders. 2016 IEEE-RAS 16th International Conference on Humanoid Robots (Humanoids)
G. Salvietti (2018)
Replicating Human Hand Synergies Onto Robotic Hands: A Review on Software and Hardware Strategies. Frontiers in Neurorobotics, 12
Kai Xu, Huan Liu, Yuheng Du, Xiangyang Zhu (2016)
A Comparative Study for Postural Synergy Synthesis Using Linear and Nonlinear Methods. Int. J. Humanoid Robotics, 13
M. Santello, M. Flanders, J. Soechting (1998)
Postural Hand Synergies for Tool Use. The Journal of Neuroscience, 18
Tao Chen, Jie Xu, Pulkit Agrawal (2021)
A System for General In-Hand Object Re-Orientation
S. Katyara, F. Ficuciello, D. Caldwell, B. Siciliano, Fei Chen (2020)
Leveraging Kernelized Synergies on Shared Subspace for Precision Grasp and Dexterous Manipulation. ArXiv, abs/2008.11574
M. Ciocarlie, Corey Goldfeder, P. Allen (2007)
Dexterous Grasping via Eigengrasps: A Low-dimensional Approach to a High-complexity Problem
Neil Lawrence, J. Candela (2006)
Local distance preservation in the GP-LVM through back constraints. Proceedings of the 23rd International Conference on Machine Learning
We develop a conditional generative model to represent dexterous grasp postures of a robotic hand and use it to generate in-hand regrasp trajectories. Our model learns to encode the robotic grasp postures into a low-dimensional space, called Synergy Space, while taking into account additional information about the object such as its size and its shape category. We then generate regrasp trajectories through linear interpolation in this low-dimensional space. The result is that the hand conﬁguration moves from one grasp type to another while keeping the object stable in the hand. We show that our model achieves higher success rate on in-hand regrasping compared to previous methods used for synergy extraction, by taking advantage of the grasp size conditional variable. Keywords Robotics · Dexterous robotic grasping · In-hand manipulation · Regrasping 1 Introduction al., 2007a), as control algorithms can operate directly on the low dimensional space, thus reducing the number of control The control of dexterous artiﬁcial hands with a high num- parameters. The problem is formulated as ﬁnding a mapping ber of degrees of freedom has been a long-standing research from the high-dimensional conﬁguration space (e.g. joint problem in robotics. Researchers have tried mimicking the angle values) of an artiﬁcial hand to a lower-dimensional way humans control their hands in order to facilitate the con- embedding, commonly referred to as Synergy Space (Salvi- trol of robotic hands (Santello et al., 2016). Studies from the etti, 2018). 
ﬁeld of neuroscience have shown that the human brain lever- The ﬁrst studies (Ciocarlie et al., 2007a, b) to apply this ages a synergistic framework that relies on coupled neural principle, closely following the nominal neuroscience study signals, muscles and kinematic constraints to perform dex- (Santello et al., 1998) that established this concept, used the terous manipulation skills like grasping (Santello et al., 1998, classical dimensionality reduction linear method, Principal 2016). This type of organization allows a small number of Component Analysis (PCA), to extract a low-dimensional parameters (synergies) to control a large number of degrees representation from a set of recorded grasp postures, and used of freedom (hand joints, muscles etc.). this representation to search for grasp postures. This model Roboticists have taken advantage of this concept in neuro- though cannot be conditioned on additional information such science by modelling the control of dexterous hands in lower as the object’s properties. dimensional spaces. This way they can effectively reduce Following works (Romero et al., 2013; Xu et al., 2016), the computational burden of dexterous control (Ciocarlie et have improved the reconstruction error of the grasp pos- tures, when compared with PCA, by using a non-linear latent variable model based on Gaussian processes (GPLVM). B Dimitrios Dimou Auto-Encoder (AE) models, which is a non-linear, deter- mijuomij@gmail.com ministic dimensionality reduction method, based on neural José Santos-Victor networks, have also been shown to improve the performance jasv@isr.tecnico.ulisboa.pt in terms of reconstruction and are able to encode additional Plinio Moreno information such as the object size (Starke et al., 2018, 2020). plinio@isr.tecnico.ulisboa.pt However, these models produce irregular latent spaces that Institute for Systems and Robotics, Instituto Superior can be hard to generate trajectories in. 
Tecnico, Universidade de Lisboa, Lisboa, Portugal 123 Autonomous Robots Fig. 1 Example regrasp trajectory generated by our method and executed using the iCub robot. The grasp posture changes from a lateral pinch to a tip pinch In our previous work (Dimou et al., 2021), we used a 2 Related work Conditional Variational Auto-Encoder to learn a conditional low-dimensional latent space from a set of grasp postures Synergy extraction. In neuroscience, the study (Santello et executed on a robotic hand. This latent space was condi- al., 1998) established the concept of synergies as the method tioned on the size of the object and its shape category. The used by the human brain to ease the control of the human smoothness of the latent space was introduced as a metric to hand. By analyzing grasp postures of human subjects who evaluate its effectiveness. We showed that this model learns a where asked to grasp imaginary objects, they showed that smoother latent space, and we used it to generate successful the ﬁrst two principal components, that were extracted using trajectories for in-hand regrasping tasks in a simulated envi- the PCA method, were responsible for 80% of the variation ronment, using the Shadow Hand that resembles the human in the data, suggesting that using only two components they adult in size and degrees of freedom. could represent the acquired data to a high degree. This work evaluates further the methods proposed in In robotics, the data collected in Santello et al. (1998) Dimou et al. (2021), applying the same learning architecture were used to transfer the grasp postures of the human sub- to the real world iCub hand, which resembles a child human jects to 4 robotics hands by Ciocarlie et al. (2007a). Then size and has less degrees of freedom than the Shadow Hand the PCA method was used to ﬁnd a low-dimensional basis (i.e. less dexterity). We use data collected for the iCub Robot to express the recorded grasp postures. 
The recovered basis hand to learn a grasp posture generation model and use it to components were called eigengrasps, and were combined to generate trajectories for in-hand regrasping. We found that in generate new grasp postures. They showed that control algo- the real world setting the generated trajectories became unsta- rithms that search in that space for stable grasps were more ble due to unaccounted object properties such as the material efﬁcient. But PCA is a linear model that cannot model com- and mass. To overcome this we reduce the conditional size plex high dimensional data and also cannot be conditioned variable of the model during testing and we improve the suc- on additional variables. cess rate in the regrasping tasks. An example of a regrasp Precision grasps were analysed in Bernardino et al. (2013), trajectory executed with our model can be seen in Fig. 1.In where the PCA method was used to extract the principal com- our experiments we also present the results from Dimou et ponents from dataset of grasps, achieved by a human operator al. (2021) for the Shadow Robot Hand and we show that our controlling two dexterous robotic hands. In our experiments results for the iCub Robot hand exhibit the same patterns. we use the dataset collected in this work for the iCub robot In summary our contributions are: to learn the conditional latent space for precision grasps. Since the purpose of synergies is to ﬁnd a low-dimensional space to represent the grasp postures, most classical dimen- • We apply the conditional model for the generation of sionality reduction methods can be applied to the problem. grasp postures presented in Dimou et al. (2021), to a In Jenkins (2006); Tsoli Odest & Jenkins (2007) several of real-world in-hand regrasping task using the iCub robot. them were compared for encoding the control of a robotic • We propose a method for generating regrasp trajectories hand in 2D subspaces. 
The methods were applied on recorded in the latent space of the model. hand movements and evaluated based on the inconsistency • We show the practical impact of the object size variable, and continuity of the representations, while a method for in the regrasping performance of our model. By adapt- denoising graphs consisted of embedded grasp postures was ing the object size, our model avoids slippage in grasp proposed. In this work, we use the smoothness of the latent execution. space to evaluate its effectiveness which is similar to the con- • We perform real world experiments with the iCub robot tinuity presented in Tsoli Odest & Jenkins (2007). demonstrating that this method greatly improves the per- An Auto-Encoder (AE) model (Kramer, 1991) was used in formance of previous approaches, showing the versatility Starke et al. (2018, 2020) to extract postural synergies from of our model that was applied to two dexterous hands with human data. The relative object size compared to the palm, different dexterity and size. 123 Autonomous Robots was also used as an additional input variable in the decoder for ture x ∈ R , usually represented by a vector containing the the grasp generation. The model was trained with a modiﬁed joint angle values of the hand, we want to ﬁnd a mapping loss in order to disentangle the grasp types in the latent space. e(x) that encodes the grasp into a low-dimensional point It achieved better reconstruction results when compared to z ∈ R , where d << d . We also need a mapping d(z) z x the PCA, and was able to generate grasp postures for objects that decodes the low-dimensional point back into the origi- of different sizes. However, Auto-Encoder models tend to nal space. These mappings are parameterized by vectors θ, φ. learn non smooth latent spaces that can result in unstable Given a dataset of observations X, we want to compute the trajectories in manipulation tasks (Dimou et al., 2021). 
parameter vectors (θ, φ) that minimize an optimization crite- The methods presented until now in this section were rion. The low-dimensional embeddings z are the synergistic deterministic. In Romero et al. (2013); Xu et al. (2016), components and the space is called a latent space. the Gaussian Process Latent Variable Model (GPLVM) PCA parameterizes this mapping as a linear function d ×d x z (Lawrence, 2003), which is a non linear probabilistic model, which can be written as e(x) = xW, where W ∈ R was used to learn a grasp manifold from a dataset of recorded are the parameters to be found. Auto-encoders models, use grasp postures. They showed that this model has lower recon- neural networks to represent the encoding and decoding map- struction error when compared with the PCA. pings. So the input has a non-linear relationship with the latent embedding. In both cases the optimization criterion In this work we use a Conditional Variational Auto- Encoder (CVAE) (Sohn et al., 2015), which is conditional used to ﬁnd the optimal parameters is the mean squared error probabilistic model based on the VAE framework (Kingma between the input and the reconstruction of the model. In the &Welling, 2013). This model learns to represent and gen- case of PCA, an analytic solution can be found, while for erate grasp postures given additional input signals such as Auto-encoders gradient based optimization is usually per- the object size and type (Dimou et al., 2021). In addition, it formed. has been shown to learn smoother latent spaces that previ- Probabilistic approaches assume that the data are gener- ous approaches which is instrumental in planning for in-hand ated by unobserved latent variables following a probability regrasping. distribution p(x, z; θ), where x are the observed data, i.e. Regrasping. Postural synergies have also been used to the grasp postures, z are the latent variables, i.e. the syn- facilitate in-hand manipulation. In Palli et al. 
(2014), they ergistic components, and θ are the parameters of the model. used the PCA method to compute a Synergy Space from The GPLVM approach uses Gaussian Processes to model the grasp postures achieved by human subjects. Then they cre- probability distribution p.InXuetal. (2016); Romero et al. ated regrasp trajectories by linearly interpolating between (2013), a variant of the GPLVM with back constraints (BCG- the boundary conﬁgurations in the Synergy Space. They also PLVM) (Lawrence et al., 2006) is used for synergy extraction. computed an additional Synergy Space from demonstrations This model enforces a constraint on the latent variables that of manipulation tasks and showed that combining these two ensures that points that are close in the original space remain spaces improves the quality of the manipulations. In Kat- close in the latent space. yara et al. (2021), they used the PCA to compute a Synergy In this work, we use a Conditional Variational Auto- Space from a set of demonstrations of in-hand manipulations. Encoder to model the representation learning process. The Then they parameterized the demonstrations in the Syn- CVAE consists of an encoder and a decoder network. The ergy Space using Kernelized movement primitives. Using encoder takes as input a data point x and a corresponding this parameterization they were able to achieve four in-hand conditional variable c and produces a latent point z.The manipulation tasks. decoder takes as input a latent point z and a conditional vari- In this work, we generate the trajectories in the Synergy able c and generate a new data point x. The encoder models Space by linearly interpolating between the initial and target the probability distribution q(z | x, c), while the decoder grasp postures, similarly to Palli et al. (2014), butwedonot the probability distribution p(x | z, c). During training we use any demonstration data. 
Instead to improve the success maximize the evidence lower bound (ELBO): rate of the generated trajectories we adjust the conditional variables of the model in order to perform ﬁrmer grips. We L (x) = E log p (x | c, z) θ ,φ q (z|x,c) show that our method outperforms previous approaches in (1) − D q (z | x, c) p(z) KL φ in-hand regrasping tasks in a real world setting. The ﬁrst term corresponds to the mean squared error between 3 Background the reconstruction and the input, while the second minimizes the Kullback–Leibler divergence between the true posterior In the context of robotics, postural synergies are modeled distribution p(z | x) and a variational distribution q(z | x), with dimensionality reduction techniques. Given a hand pos- which works as a regularization criterion for the latent space. 123 Autonomous Robots of recorded grasps originally presented in (Bernardino et al., 2013). The dataset consists of 536 grasps performed using the iCub robot by a human operator teleoperating the robot with a data glove. Each grasp, following the grasp taxonomy (Feix et al., 2009), belongs in one of the eight following precision- grasp categories: tripod, palmar pinch, lateral, writing tripod, parallel extension, adduction grip, tip pinch, lateral tripod. Fig. 2 Schematic representation of CVAE model The objects that were grasped were three balls of different radius, three cylinders of different radius and height, and a box with three different sizes in each side. So for each grasp The loss is minimized using standard gradient descent meth- posture there is a corresponding label denoting the size of ods like (Kingma &Ba, 2014). the side of the object that was grasped, i.e. if it was the large, the medium or the small side, and the shape category of the object. i.e. if it is a ball, a cylinder or a box. 
The size label 4 Methods is represented with a scalar value in (0.0, 1.0), where 0 cor- responded to a grasp on the small side of the object, 0.5 to Grasp posture representation. Given a set of recorded grasp a grasp on the medium side of the object, and 1.0 to a grasp postures, the goal of this work is to learn a low-dimensional on the large side of the object. The shape category label is Synergy Space that can be used to generate regrasping trajec- represented using one-hot encoding. tories. By regrasping we mean changing the grasp type from The model is trained by feeding the grasp postures x and an initial type of grasp to a target one, while holding the the corresponding labels c , which denote the size and the object in the hand. In previous works, the goal was to learn shape category of the object, into the encoder, which mod- a model that accurately reproduces the recorded grasps. In els a sampler of the probability distribution q(z | x, c).So Dimou et al. (2021) we showed that this criterion is not sufﬁ- given a grasp posture x from the dataset and a label c ,we i i cient to evaluate the effectiveness of the learned latent space sample a latent point z . The decoder models a sampler of in regrasping tasks. Instead, we used the smoothness of the the distribution p(x | z, c), so given a latent point z and latent space as an alternative evaluation metric. Smoothness a label c it generates a grasp posture x ˆ that has the prop- i i is deﬁned using the distance between grasps decoded from erties of the given label. The parameters of the model are latent points on a grid, similar to Chen et al. (2016). The then optimized in order to minimize the mean squared error distance is computed as the average difference of their joint between the input x and the output x ˆ , and the KL diver- i i angles. 
More precisely, given two latent points z and z and 1 2 gence between the prior p(z), which is a standard normal a decoding mapping d(z) to the grasp posture space we deﬁne distribution, and the posterior q(z | x, c). The minimization smoothness as: of the mean squared error forces the model to learn to recon- struct the input grasp postures, while the minimization of the S(z , z ) =d(z ) − d(z ) 1 2 1 2 2 KL divergence forces the latent space to follow the standard normal distribution. This gives us the average change in the joint angles of the In-hand regrasping. In the in-hand regrasping task we robotic hand if we move from one point on the grid to another. want to execute a regrasp trajectory that changes the grasp The use of smoothness as a metric for evaluating the learned type executed on the object without changing the side of the latent space for manipulation tasks, agrees with our intuition grasp that the object is grasped from and without breaking that if we are planning ﬁnger movements we want to avoid contact with the object. To generate regrasp trajectories we sudden changes in ﬁnger joints that might make our grasp encode the initial and the target grasp postures into the latent of the object unstable. Instead, we want to perform smooth space, so we obtain two latent points z and z . initial target transitions between hand states that keep the object stable We then linearly interpolate between these two points in and balanced. the Euclidean space and sample N points, where N equals Following the work in Dimou et al. (2021)wetrainaCon- the number of steps in the trajectory. Finally we decode the ditional Variational Auto-Encoder to generate grasp postures new sampled points to obtain a trajectory in the conﬁgura- given as additional signals the size of the side of the object tion space. The complete trajectory generation procedure is to be grasped and the type of the object. A schematic repre- outlined in Algorithm 1. 
A schematic representation of the sentation of our model can be seen in Fig. 2. We use a dataset 123 Autonomous Robots trajectory generation process can be seen in Fig. 3. In essence, were more stable. We show in the experimental results that instead of employing a complex planning algorithm to ﬁnd a this process was crucial to increasing the performance of the path of states that can perform the required regrasp we rely model. on the structure of the latent space of the model. If the latent space is smooth we can generate successful trajectories by a simple procedure such as linear interpolation. 5 Experimental results Dataset description. The dataset used to train our models Algorithm 1 Trajectory generation in latent space was originally presented in (Bernardino et al., 2013). It was Initial x and target x grasp postures initial target z = encoder (x ) acquired by teleoperating two robotic hands: the Shadow initial initial z = encoder (x ) target target Dexterous Hand, and the iCub robot hand. In this work we N : number of steps in trajectory use the dataset recorded for the iCub robot. Initialize trajectory T = [] For the teleoperation of the robot, a mapping was used for i=0toN do i i z = z ∗ (1 − ) + z ∗ that transformed the joint angles of the human operator’s new initial target N N x = decoder (z ) new new hand to the joint angles of the iCub hand. The mapping, was T.append(x ) new ﬁxed for each user and it was generated by the acquisition end for software. So each recorded grasp is represented by the angle values in degrees of the 9 joints of the iCub robot hand. Although this method has worked in simulation (Dimou et This way our model is trained directly on the robot angles al., 2021), we found that in real world experiments the method and we do not need the human to robot mapping after the was not outperforming the other approaches, due to issues data collection process. Twelve objects were grasped, from during contact. 
More speciﬁcally, small objects with smooth ﬁve distinct categories: ball (three sizes), box (three sizes), surface and high mass were slipping. In order to overcome cylinder (three sizes), pen and cube. The objects were grasped these problems, during testing we adjusted the conditional from different sides adding up to a total of 20 different object size variable used by the decoder to generate the regrasp conﬁgurations. trajectory. The initial model was trained with size values in The models were trained on a subset of this dataset, (0.0, 1.0), while when testing the size values were reduced more speciﬁcally on the grasps performed on the balls, the by 0.5. This way the model produced ﬁrmer grips which cylinders and the box with three different size sides. We rep- Fig. 3 Schematic representation of generating trajectory in latent space. denote different grasp types. During the decoding phase, N points along In the encoding phase both the initial and the target grasp are encoded the line connecting the initial and target grasps are decoded and a regrasp into the latent space. The different colors of the points in the latent space trajectory is produced 123 Autonomous Robots Table 1 Size value label for each object conﬁguration Object conﬁguration Size label Medium green cylinder, top 0.0 Medium green cylinder, side Small wooden cylinder, side Small white ball Box, small side Box, medium side Medium yellow ball 0.5 Small wooden cylinder, top Big red cylinder, side Box, large side Big blue ball 1.0 Big red cylinder, top Fig. 4 Plot of the mean squared error of each model for latent dimen- sions from one to the number of degrees of freedom of the iCub hand Table 2 Smoothness results for iCub Robot: The mean and standard deviation calculated from the latent space gradients of each model for resented the shape category of the object, meaning if it was three grid resolutions. 
Lower values suggest a smoother latent space a ball, a box or a cylinder using one-hot encoding. For the (μ, σ ) size we used a continuous scalar variable in (0.0, 1.0).In N=5 N=15 N=25 the original dataset grasps were labeled as: large, medium, PCA (26.0, 5.0) (7.4, 1.4) (4.3, 0.8) and small, according to the size of the object. In our dataset, AE (33.9, 8.9) (10.8, 2.9) (6.4, 1.7) we labeled the grasps for each object conﬁguration with the values 0.0, 0.5, 1.0. The object conﬁgurations were selected VAE (29.1, 10.2) (8.7, 3.2) (5.1, 1.9) such that the size of the grasps were similar. In Table 1, you CVAE (21.9, 6.7) (6.3, 2.0) (3.7, 1.2) can see each object conﬁguration and the corresponding size BCGPLVM (Linear) (28.2, 8.8) (12.1, 6.0) (7.3, 3.7) label. In our dataset we mapped these labels to the (0.0, 1.0) BCGPLVM (MLP) (27.1, 8.3) (8.9, 3.5) (5.3, 2.1) range. Large size grasps were represented by the value 1.0, BCGPLVM (RBF) (46.2, 17.5) (26.6, 14.5) (17.1, 10.1) medium size grasps by the value 0.5, and small size grasps The lowest mean values are highlighted in bold by the value 0.0. But the value was given as a continuous variable and can take any real value, in contrast the shape variable that is discrete and it was one-hot encoded. We con- models because in their loss function there is the additional catenated the shape and the size into a vector and used it as term of the KL divergence between the prior and the posterior. the conditional variable c in the CVAE model. Following the nominal neuroscience study (Santello et al., Models. We trained seven models on the iCub dataset. A 1998), that showed that 2 synergies are enough to repre- PCA model, a standard AE architecture model, a standard sent most human grasps, and previous robotics works that VAE, the CVAE model described in Sect. 
4 and three BCG- also used 2 synergistic components, the dimensionality of PLVM models with the following back projection mappings: the latent space for the following experiments was chosen (1) a linear mapping, (2) a multi layer perceptron (MLP) and to be 2, to be directly comparable with the other works. We (3) a radial basis function (RBF). The choice of mapping has then calculated the smoothness of each model’s latent space an effect on the models reconstruction error and the smooth- as proposed in Dimou et al. (2021) for three grid resolutions ness of its latent space. N = 5, 15, 25, seen in Table 2. The CVAE model exhibits the Latent space analysis. Following (Dimou et al., 2021), lowest average change in joint angle values between neigh- we performed the same analysis of the latent space for each boring grasps, indicating that movements between near states model. More speciﬁcally, for each model we computed its in the latent space will results in smooth transitions in the con- reconstruction error on the dataset, seen in the left Fig. 4.The ﬁguration space. On the other hand, the BCGPLVM model BCGPLVM model with the RBF kernel as the back constraint with the RBF kernel that had the lowest reconstruction error has the lowest reconstruction error among all models. As we exhibits high variation in the latent space which results in increase the latent dimensions to the number of degrees of sudden changes in the conﬁguration space. freedom of the hand, the reconstruction error in most models Finally, Fig. 5 shows the latent space traversals for the goes to zero. That is not the case for the VAE and the CVAE PCA, the VAE and the BCGPLVM with a linear kernel. We 123 Autonomous Robots Fig. 5 Latent space traversals in the spaces learned by the PCA, the VAE, and the BCGPLVM with a linear kernel model used the iCub simulator to decode the grasps from a square grid of size 5 in each model’s latent space. 
We can notice that the latent spaces of the PCA and the VAE exhibit more variability but also smoother transitions between neighboring grasps. On the other hand, in the latent space of the BCGPLVM we see a lot of grasps that are not valid, and sudden changes between neighboring grasps. The smoothness of the latent space is a result of the KL divergence regularization in the loss function of the VAE and CVAE models, as mentioned in Asperti & Trentin (2020) and Oring et al. (2021). This term pushes the distribution of the encoded latent points to match the prior, a standard normal distribution. It brings the latent points towards the origin of the axes, making the latent space more dense and evenly distributed, compared for example with the AE model that uses the same loss function without the regularization.

Real robot experiments. In Dimou et al. (2021), regrasp experiments were conducted in simulation and using only one of the objects of the dataset, the box. In this work, we performed real-world experiments with the iCub robot, using all the objects that were used to execute the grasps from the training dataset, seen in Fig. 6.

Fig. 6 Objects used to execute regrasp trajectories

For each object configuration, i.e. each side an object can be grasped from (e.g. a sphere has one object configuration, a cylinder two: top and side, a box with three different sized sides has three), 5 pairs of grasps, where each grasp had an associated grasp type, were chosen randomly from the dataset, and for each pair a regrasp trajectory was generated by each model using Algorithm 1, totaling 60 regrasp trajectories. The number of steps in each trajectory was set to 10. Each trajectory was performed three times to account for variability in initial conditions. In Fig. 7, we see a chord diagram representing the connections of grasp types in the regrasp trajectories executed. The robot was restricted to perform the regrasp trajectory on the same side of the object. This was done using the object size conditional variable, which forced the model to produce tight grasps and the fingertips to always be in contact with the object. Since there was no breaking and creating of contacts, the object was always grasped from the initial side.

During preliminary experiments we noticed that the regrasp trajectories generated by the CVAE model, when executed on the real robot, were not outperforming the other models as suggested in Dimou et al. (2021). The reason was that although the transitions between states were smooth, the unaccounted properties of the objects, such as their material and mass, were causing some grasp postures to be unstable, mainly due to slippage. This phenomenon was most apparent in objects with small size and smooth surface. In order to overcome this, we took advantage of the conditional variables that the CVAE model encodes, i.e. the object size. When generating trajectories using the CVAE model we adjusted the size label to be lower than the corresponding original label. More specifically, when decoding the latent points of each trajectory we reduced each point’s size label by 0.5. This way the model produced firmer grips that were more stable during execution. This adjustment cannot be applied to the other models, as they cannot be conditioned on additional variables.

The results of the execution of the trajectories can be seen in Fig. 8 in a box plot format. On the vertical axis is the percentage of successful regrasp trajectories generated by each model. Each box represents the interquartile range between the three runs of each trajectory, while the line in the middle of the box is the median. We see that most models follow the same pattern with the results in Dimou et al.
(2021), but the CVAE model using the original labels for the size of the object does not surpass the performance of the other models. On the other hand, the CVAE model with the adjusted size labels generates twice as many successful regrasp trajectories. In Fig. 9, we can see some of the regrasp trajectories generated by the CVAE model with the adjusted size labels.

Fig. 7 Chord diagram representing connections of grasp types in executed regrasp trajectories. The arcs represent each grasp type in the dataset. The size of each arc corresponds to the number of occurrences of each grasp type in the trajectories. A connection (chord) between two arcs means that a regrasp from one grasp type to another was executed. The chords are colored according to the initial grasp type. Regrasps between grasps of the same type are represented as half-ellipses on arcs. These regrasps occur because the initial grasps in the dataset were recorded by multiple human operators, and since the robotic hand has many DoFs some operators performed the same grasp type differently

Fig. 8 Percentage of successful regrasp trajectories generated from each model

Fig. 9 Example regrasp trajectories executed on the iCub robot using four different objects. Each trajectory has ten steps. The pictures are taken every two steps. The first trajectory moves from a tripod grasp type to a lateral, the second from a tip pinch to a lateral, and the third from a tip pinch to a lateral

Finally, we investigated the size interpolation and extrapolation capabilities of the proposed CVAE model. More specifically, as the size label during training is a scalar value in (0.0, 1.0), we wanted to see if the model is able to produce grasps for other values in that range, as well as values outside that range. To test this, we generated 100 different grasps, randomly choosing a latent point and an object type, and using 20 evenly spaced grasp size values in the range (−1.0, 2.0). We executed each grasp on the simulated model of the iCub hand and computed the Euclidean distance between the fingertips of the thumb and the index for each grasp. In Fig. 11, on the x axis we plot the value of the conditional size variable given to the network and on the y axis the distance between the fingertips of the generated grasp. The graph demonstrates that as the grasp size variable increases, the distance between the fingertips also increases in an almost linear fashion. That indicates that the network learns to encode the relation between the grasp size variable and the fingertip distance. In Fig. 12, we show the average fingertip distance for each size value and the standard deviation. The variance present in the distances is a result of the differences between grasp types: for example, in the tripod grasp the object is stabilised in the opposition created between the tips of the thumb and the index, while in the parallel extension grasps the object is stabilised in the opposition between the tip of the thumb and the distal link of the index. In addition, we tested this in a real-world experiment, where we chose a tip pinch grasp executed on a ball from the dataset, encoded it into the latent space, and then decoded the produced latent point by varying the size label from −2.0 to 1.0. In Fig. 10, we can see that the model is able to generate grasps for very small objects without having seen this object size during training.

Fig. 10 Examples of size extrapolations. All grasps generated from the same latent point with conditional size values = {−2.0, −1.0, 0.0, 0.5, 1.0}

Fig. 11 Grasp size as a function of the conditional size variable

Fig. 12 Average grasp size as a function of the conditional size variable

6 Conclusions

In summary, we presented a conditional model based on the VAE framework for grasp generation and used it to generate trajectories directly in its latent space for in-hand regrasping. We also show that reducing the size labels during testing can avoid slippage during execution of the generated trajectories. We presented experiments that validate this approach, as we were able to double the success rate of regrasp trajectories in a real-world setting. Finally, we investigated the capabilities of the model to extrapolate on the size of the grasps that it generates.

Another line of research has explored the use of reinforcement learning in in-hand manipulation tasks. In Chen et al. (2021) they train a policy using deep reinforcement learning to reorient objects to arbitrary orientations. They find that their system can perform the reorientation task and deal with novel objects without any visual information about the object’s shape. In our work the problem is framed differently, as we want to regrasp the object using a specific grasp type but not explicitly reorient it or create new contacts with it, so it is not possible to directly compare both works because the objectives of each are different. In order for our model to be able to perform a similar behavior, we would need to acquire more data that have intermediate steps from the finger trajectories while reorienting the object and the object states at each step.

In future work we plan to explore smarter ways to generate the trajectories in latent space, for example by taking advantage of the smoothness of the neighborhood of the latent space to avoid regions that result in large changes in the hand configuration, and to test the model on arbitrary objects with more complex shapes. In addition, we would like to explore generating trajectories that regrasp the object from different sides. This could be achieved by adding a continuous conditional variable that represents the side of the object that is being grasped from. This way we could smoothly interpolate from one side of the object to the other. It would also require a lot more training data points for the intermediate steps of the transition from one side to the other, with the finger gaits in each step. Finally, another future research direction would be to add force feedback to the conditional model, which could be used to automatically adjust the grasp size and generate hand configurations based on it.

Acknowledgements Work partially supported by the Robotics, Brain and Cognition Lab, part of the FCT - Portuguese Roadmap of Research Infrastructures [01/SAICT/2016 SAICT Proj. 22084], the LARSyS - FCT Project [UIDB/50009/2020], the H2020 FET-Open project Reconstructing the Past: Artificial Intelligence and Robotics Meet Cultural Heritage (RePAIR) under EU grant agreement 964854, the Lisbon Ellis Unit (LUMLIS), and the FCT PhD grant [PD/BD/09714/2020].

Funding Open access funding provided by FCT|FCCN (b-on).

Declarations

Compliance with Ethical Standards We have no conflicts of interest to disclose.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Asperti, A., & Trentin, M. (2020). Balancing reconstruction error and Kullback-Leibler divergence in variational autoencoders. IEEE Access, 8, 199440–199448.
Bernardino, A., Henriques, M., Hendrich, N., et al. (2013). Precision grasp synergies for dexterous robotic hands. In 2013 IEEE international conference on robotics and biomimetics (ROBIO), pp. 62–67, https://doi.org/10.1109/ROBIO.2013.6739436.
Chen, N., Karl, M., Smagt, PVD. (2016). Dynamic movement primitives in latent space of time-dependent variational autoencoders. In 2016 IEEE-RAS 16th international conference on humanoid robots (Humanoids), pp. 629–636.
Chen, T., Xu, J., Agrawal, P. (2021). A system for general in-hand object re-orientation. In: CoRL.
Ciocarlie, M., Goldfeder, C., Allen, P. (2007a). Dimensionality reduction for hand-independent dexterous robotic grasping. In 2007 IEEE/RSJ international conference on intelligent robots and systems, pp. 3270–3275, https://doi.org/10.1109/IROS.2007.4399227.
Ciocarlie, M.T., Goldfeder, C., Allen, P.K. (2007b). Dexterous grasping via eigengrasps: A low-dimensional approach to a high-complexity problem
Dimou, D., Santos-Victor, J., Moreno, P. (2021). Learning conditional postural synergies for dexterous hands: A generative approach based on variational auto-encoders and conditioned on object size and category. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 4710–4716, https://doi.org/10.1109/ICRA48506.2021.9560818.
Feix, T., Schmiedmayer, H.-B., Romero, J., et al. (2009). A comprehensive grasp taxonomy. In Robotics, science and systems conference: Workshop on understanding the human hand for advancing robotic manipulation.
Jenkins, OC. (2006). 2D subspaces for sparse control of high-DOF robots. In 2006 international conference of the IEEE engineering in medicine and biology society, pp. 2722–2725, https://doi.org/10.1109/IEMBS.2006.259857.
Katyara, S., Ficuciello, F., Caldwell, DG., et al. (2021). Leveraging kernelized synergies on shared subspace for precision grasp and dexterous manipulation. arXiv:2008.11574
Kingma, DP., Ba, J. (2014). Adam: A method for stochastic optimization. CoRR abs/1412.6980.
Kingma, DP., Welling, M. (2013). Auto-encoding variational Bayes. CoRR abs/1312.6114.
Kramer, M. (1991). Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal, 37, 233–243.
Lawrence, N. (2003). Gaussian process latent variable models for visualisation of high dimensional data. In: NIPS.
Lawrence, N., Candela, JQ. (2006). Local distance preservation in the GP-LVM through back constraints. In: ICML ’06.
Oring, A., Yakhini, Z., Hel-Or, Y. (2021). Autoencoder image interpolation by shaping the latent space. In: ICML.
Palli, G., Ficuciello, F., Scarcia, U., et al. (2014). Experimental evaluation of synergy-based in-hand manipulation. IFAC Proceedings Volumes, 47, 299–304.
Romero, J., Feix, T., Ek, C. H., et al. (2013). Extracting postural synergies for robotic grasping. IEEE Transactions on Robotics, 29(6), 1342–1352. https://doi.org/10.1109/TRO.2013.2272249
Salvietti, G. (2018). Replicating human hand synergies onto robotic hands: A review on software and hardware strategies. Frontiers in Neurorobotics, 12, 27. https://doi.org/10.3389/fnbot.2018.00027
Santello, M., Flanders, M., & Soechting, J. F. (1998). Postural hand synergies for tool use. Journal of Neuroscience, 18(23), 10105–10115. https://doi.org/10.1523/JNEUROSCI.18-23-10105.1998
Santello, M., Bianchi, M., Gabiccini, M., et al. (2016). Hand synergies: Integration of robotics and neuroscience for understanding the control of biological and artificial hands. Physics of Life Reviews, 17, 1–23. https://doi.org/10.1016/j.plrev.2016.02.001
Sohn, K., Lee, H., & Yan, X. (2015). Learning structured output representation using deep conditional generative models. In C. Cortes, N. D. Lawrence, & D. D. Lee (Eds.), Advances in neural information processing systems (pp. 3483–3491). Curran Associates Inc.
Starke, J., Eichmann, C., Ottenhaus, S., et al. (2018). Synergy-based, data-driven generation of object-specific grasps for anthropomorphic hands. In 2018 IEEE-RAS 18th international conference on humanoid robots (Humanoids), pp. 327–333
Starke, J., Eichmann, C., Ottenhaus, S., et al. (2020). Human-inspired representation of object-specific grasps for anthropomorphic hands. International Journal of Humanoid Robotics, 17, 2050008.
Tsoli, A., Jenkins, O. (2007). 2D subspaces for user-driven robot grasping. Robotics, Science and Systems Conference: Workshop on Robot Manipulation.
Xu, K., Liu, H., Du, Y., et al. (2016). A comparative study for postural synergy synthesis using linear and nonlinear methods. International Journal of Humanoid Robotics, 13(03), 1650009. https://doi.org/10.1142/S0219843616500092

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Dimitrios Dimou is a Ph.D. candidate in Electrical and Computer Engineering at Instituto Superior Técnico, Universidade de Lisboa. He received his M.Sc in Electrical and Computer Engineering from the University of Patras, Greece in 2018. His research interests are in the areas of robotic grasping and manipulation using machine learning methods.

José Santos-Victor is a full Professor of Electrical and Computer Engineering, Instituto Superior Técnico, Universidade de Lisboa. He is the President of the Institute for Systems and Robotics|Lisboa and the Coordinator of LARSyS (Laboratory of Robotics and Engineering Systems) that includes ISR|LISBOA and three other research units (M-ITI, IN+ and MARETEC). He founded the Computer and Robot Vision Lab - VisLab at ISR|Lisboa. He graduated 21 Ph.D. students. José’s research interests are in the areas of Computer and Robot Vision, particularly in the relationship between visual perception and the control of action, biologically inspired vision and robotics, cognitive vision and visual controlled (land, air and underwater) mobile robots. Recent research has a focus on biologically inspired models of human(oid) cognition, through the creation of artificial models of human(oid) cognition. He explored the neuroscientific findings of the Mirror Neurons to propose models that use motor information for visual action recognition; and the concept of Gibsonian Affordances drawn from psychology for building advanced cognitive skills in humanoid-robots. He was the scientific responsible for the participation of IST/ISR in 10+ European Projects in the areas of Computer Vision and Robotics (e.g. MIRROR, CONTACT, ROBOSOM, ROBOTCUB, FIRST-MM, POETICON+), and the current EU-FET-Pathfinder project “REPAIR- AI-and-Robotics Meet Cultural-Heritage”.

Plinio Moreno received a B.Sc. in Mechanical Engineering, B.Sc. in Computer Science and M.Sc. in Computer Science from the Universidad de los Andes (Bogotá, Colombia) in 1998, 2000 and 2002 respectively. He completed the Ph.D. degree in Electrical and Computers Engineering at the Instituto Superior Técnico (IST, Lisbon, Portugal) in 2008. During his Ph.D. studies, P. Moreno was the holder of a Portuguese Science Foundation (Fundação para a Ciência e Tecnologia) grant (SFRH/BD/10753/2002). P. Moreno published a total of 61 papers in top journals and conferences in the areas of Computer Vision and Pattern Recognition (Pattern Recognition Letters), Artificial Intelligence (Neurocomputing), and Robotics (Autonomous Robots, Robotics and Autonomous Systems, IROS and ICRA). Of the 61 articles, 15 were published in international peer review journals and the remaining 46 at international peer review conferences. Since 2016, P. Moreno co-authored 10 articles in journals and 17 articles at conferences. P. Moreno’s global citation indexes h-index and i10-index are currently 16 and 26 according to Google Scholar, and since 2017 h-index is 13 and i10-index is 20. According to Scopus his h-index is 10.
Autonomous Robots – Springer Journals
Published: Apr 1, 2023
Keywords: Robotics; Dexterous robotic grasping; In-hand manipulation; Regrasping