Martin Riedmiller, B. Janusz (1995)Using Neural Reinforcement Controllers in Robotics
M. McCloskey, N. Cohen (1989)Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem
Psychology of Learning and Motivation, 24
Kai Krueger (2011)Sequential learning in the form of shaping as a source of cognitive flexibility
James McClelland, B. McNaughton, R. O’Reilly (1995)Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory.
Psychological review, 102 3
M. Haruno, D. Wolpert, M. Kawato (2001)MOSAIC Model for Sensorimotor Learning and Control
Neural Computation, 13
A. Barto, Steven Bradtke, Satinder Singh (1995)Learning to Act Using Real-Time Dynamic Programming
Artif. Intell., 72
Satinder Singh (2004)Transfer of learning by composing solutions of elemental sequential tasks
Machine Learning, 8
M. Rand, O. Hikosaka, S. Miyachi, Xiaofeng Lu, Kae Nakamura, K. Kitaguchi, Y. Shimo (2000)Characteristics of sequential movements during early learning period in monkeys
Experimental Brain Research, 131
Matthew Taylor, P. Stone (2009)Transfer Learning for Reinforcement Learning Domains: A Survey
J. Mach. Learn. Res., 10
T. Hesselroth, K. Sarkar, Patrick Smagt, K. Schulten (1994)Neural Network Control of a Pneumatic Robot Arm
IEEE Trans. Syst. Man Cybern. Syst., 24
P. Abbeel, A. Ng (2004)Apprenticeship learning via inverse reinforcement learning
Proceedings of the twenty-first international conference on Machine learning
M. Nekouie (2011)Adaptive Control of a Robotic Arm Using Neural Networks Based Approach
Yu Zhao, C. Cheah (2004)Position and force control of robot manipulators using neural networks
IEEE Conference on Robotics, Automation and Mechatronics, 2004., 1
A. Bouganis, M. Shanahan (2010)Training a spiking neural network to control a 4-DoF robotic arm based on spike timing-dependent plasticity
Proceedings of the IEEE World Congress on Computational Intelligence (WCCI '10)
R. Shadmehr, H. Holcomb (1997)Neural correlates of motor memory consolidation.
Science, 277 5327
B. Ans, S. Rousset (2002)Preventing catastrophic interference in multiple-sequence learning using coupled reverberating Elman networks
Proceedings of the 24th Annual Conference of the Cognitive Science Society
I. Jenkins, D. Brooks, Pd Nixon, R. Frackowiak, R. Passingham (1994)Motor sequence learning: a study with positron emission tomography
O. Hikosaka, Kae Nakamura, K. Sakai, H. Nakahara (2002)Central mechanisms of motor skill learning
Current Opinion in Neurobiology, 12
Yoshua Bengio, J. Louradour, Ronan Collobert, J. Weston (2009)Curriculum learning
C. Balkenius, J. Morén (1998)Computational models of classical conditioning: a comparative study
J. Johnson, R. Challoo (1996)A multi-neural network intelligent path planner for a robot arm
Proceedings of the Artificial Neural Networks in Engineering (ANNIE '96)
B. Ans (2004)Sequential learning in distributed neural networks without catastrophic forgetting: a single and realistic self-refreshing memory can do it,
Neural Information Processing, 4
R. Rescorla (2003)More rapid associative change with retraining than with initial training.
Journal of experimental psychology. Animal behavior processes, 29 4
T. Brashers-Krug, R. Shadmehr, E. Bizzi (1996)Consolidation in human motor memory
Ashish Gupta, D. Noelle (2005)The Role of Neurocomputational Principles in Skill Savings
Alexandros Bouganis, M. Shanahan (2010)Training a spiking neural network to control a 4-DoF robotic arm based on Spike Timing-Dependent Plasticity
The 2010 International Joint Conference on Neural Networks (IJCNN)
R. O’Reilly, Y. Munakata (2009)Computational Explorations in Cognitive Neuroscience
R. Rescorla (2002)Savings tests: separating differences in rate of learning from differences in initial levels.
Journal of experimental psychology. Animal behavior processes, 28 4
G. Reynolds (1968)A Primer of Operant Conditioning
(1996)“Robot Manipulator Simulation,”
L. Saksida, Scott Raymond, D. Touretzky (1997)Shaping robot behavior using principles from instrumental conditioning
Robotics Auton. Syst., 22
M. Er, Yang Gao (2003)Robust adaptive control of robot manipulators using generalized fuzzy neural networks
IEEE Trans. Ind. Electron., 50
R. Rescorla (2002)Comparison of the rates of associative change during acquisition and extinction.
Journal of experimental psychology. Animal behavior processes, 28 4
R. French (1999)Catastrophic forgetting in connectionist networks
Trends in Cognitive Sciences, 3
T. Brashers-Krug, R. Shadmehr, E. Todorov (1994)Catastrophic Interference in Human Motor Learning
Hannes Saal, Jo-Anne Ting, S. Vijayakumar (2010)Active Sequential Learning with Tactile Feedback
M. Botvinick, D. Plaut (2004)Doing without schema hierarchies: a recurrent connectionist approach to normal and impaired routine sequential action.
Psychological review, 111 2
R. Rescorla (2001)Retraining of extinguished Pavlovian stimuli.
Journal of experimental psychology. Animal behavior processes, 27 2
R. Bapi, K. Doya, Alexander Harner (2000)Evidence for effector independent and dependent representations and their differential time course of acquisition during motor sequence learning
Experimental Brain Research, 132
M. Colombetti, M. Dorigo (1992)Robot shaping: developing situated agents through learning
R. French (1994)Dynamically constraining connectionist networks to produce distributed, orthogonal representations to reduce catastrophic interference
Proceedings of the 16th Annual Cognitive Science Society Conference
R. French (2003)Catastrophic interference in connectionist networks
C. Miall (2002)Modular motor learning
Trends in Cognitive Sciences, 6
A. Robins (1995)Catastrophic Forgetting, Rehearsal and Pseudorehearsal
Connect. Sci., 7
Hindawi Publishing Corporation
Journal of Robotics
Volume 2011, Article ID 617613, 12 pages
doi:10.1155/2011/617613

Research Article

Ashish Gupta (1), Lovekesh Vig (2), and David C. Noelle (3)

(1) Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37235-1826, USA
(2) School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
(3) School of Engineering, University of California, Merced, Merced, CA 95343, USA

Correspondence should be addressed to Lovekesh Vig, firstname.lastname@example.org

Received 31 May 2011; Revised 2 September 2011; Accepted 20 September 2011

Academic Editor: Ivo Bukovsky

Copyright © 2011 Ashish Gupta et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Traditional artificial neural network models of learning suffer from catastrophic interference. They are commonly trained to perform only one specific task, and, when trained on a new task, they forget the original task completely. It has been shown that the foundational neurocomputational principles embodied by the Leabra cognitive modeling framework, specifically fast lateral inhibition and a local synaptic plasticity model that incorporates both correlational and error-based components, are sufficient to largely overcome this limitation during the sequential learning of multiple motor skills. Evidence has also been provided that Leabra is able to generalize the subsequences of motor skills, when doing so is appropriate. In this paper, we provide a detailed analysis of the extent of generalization possible with Leabra during sequential learning of multiple tasks. For comparison, we measure the generalization exhibited by the backpropagation of error learning algorithm. Furthermore, we demonstrate the applicability of sequential learning to a pair of movement tasks using a simulated robotic arm.

1. Introduction

Humans acquire many different skills, behaviors, and memories throughout their lifetime. Lack of use of a particular skill, behavior, or memory results in its slow degradation. It is a common observation that reacquisition of any knowledge is typically rapid as compared to its initial acquisition. This retention of knowledge, often in a latent form, is known as savings.

Why do we observe a degradation in performance when a particular piece of knowledge fails to be used? The initial acquisition of task knowledge is driven by synaptic plasticity, shaped by experience. This plasticity continues even when that task knowledge is not regularly used. Thus, experience with other activities could shape the neural circuits in a manner that interferes with the initial knowledge that was acquired.

What could be the neural basis of savings? Savings could emerge due to neural specialization. Some of the neurons employed by an initial task might not be reused when learning a subsequent task. To the degree that the sets of neurons associated with different tasks are disjoint, learning one task will not affect the synapses associated with another. Note, however, that when neurons are shared between tasks savings are still possible. Savings could arise from subthreshold residual synaptic weights associated with the initial task: weights that have been driven down by interfering experiences to below their threshold for neural firing, but not all the way down to their initial values. Finally, tasks may share components or “subtasks.” To the degree that such components have isolated neural representations, learning a new task may actually reinforce portions of a previously learned task.

Traditional artificial neural network models of human learning, including those based on the powerful backpropagation of error learning algorithm, fail to display adequately robust savings when tasks are learned sequentially. A set of synaptic connection weights serves as the neural network’s memory, and any task is learned by modifying these weights. This is the strength of neural networks, since they can learn almost any possible input-output mapping. However, this is also the source of problems, as the networks have an inherent tendency to abruptly and completely forget previously learned knowledge when presented with new training inputs. This phenomenon is known as catastrophic interference. This prevents artificial neural networks from exhibiting savings and, therefore, leads to questions about their biological plausibility.

In our previous work, we proposed that the biological constraints imposed by the structure of cortical circuitry may embody the properties that are necessary to promote savings, as observed during human skill acquisition. Specifically, we examined the neurocomputational principles forming the Leabra cognitive modeling framework, and we found that these biologically motivated principles give rise to savings without the need for any auxiliary mechanisms. Our findings suggest that the Leabra implementation of fast acting lateral inhibition acts in concert with its synaptic plasticity mechanism in order to produce adequately sparse representations to support skill savings.

Sparse representations involve patterns of neural firing in which only a small fraction of neurons in a “pool” or “layer” are strongly active at any one time. The use of sparse representations results in different sets of neurons being used for the different tasks that the network has been trained to perform. There is good reason to believe that, when learning multiple tasks, humans are able to generalize the common structure, if any, that exists between them. In our previous work, we provided preliminary evidence that Leabra networks, even with a sparse internal representation enforced by a biologically plausible lateral inhibition mechanism, are able to generalize the common subtasks shared by multiple tasks, when doing so leads to appropriate responses. In this paper, we provide a more detailed analysis of the generalization seen in Leabra networks. We show that the generalization seen in a Leabra network when learning two tasks sequentially is comparable to the generalization seen when the two tasks are learned in an interleaved manner. In comparison, a backpropagation-based artificial neural network shows neither generalization nor savings.

Most neural-network-based controllers require substantial training on a particular task and need to be retrained if the network subsequently learns a different task [4, 5]. A network model that exhibits savings offers potential benefits for such applications. The retraining time required for a previously learnt task is considerably reduced if the network exhibits savings. Such a network would be capable of learning multiple tasks sequentially without the need to interleave these tasks.

A wide variety of sequential key pressing tasks have been used to investigate human motor skill learning [6–8], and a number of interesting findings have resulted. There is a period of rapid improvement in performance during the early stages of training. During this stage, learning is effector independent (e.g., switching hands does not substantially degrade performance). Further, interfering with the frontal systems involved in the controlled pathway during this period seriously disrupts performance. Interfering with the automatic pathway, however, does not affect performance during this early period. In this paper, we demonstrate the applicability of our Leabra network model to learning of movement tasks in a simulated robotic manipulator. The network is able to exhibit savings whilst learning arm movement tasks sequentially, thereby substantially reducing the retraining time required.

This paper is organized as follows. In the next section, we provide some background and an overview of related work. Section 3 provides a description of a Leabra-based task learning model and some simulations of the model. Section 4 describes the results of these simulation experiments. We end the paper with a discussion of the results, as well as some conclusions.

2. Background

2.1. Leabra. The Leabra framework offers a collection of integrated cognitive modeling formalisms that are grounded in known properties of cortical circuits while being sufficiently abstract to support the simulation of behaviors arising from large neural systems. It includes dendritic integration using a point-neuron approximation, a firing rate model of neural coding, bidirectional excitation between cortical regions, fast feedforward and feedback inhibition, and a mechanism for synaptic plasticity (refer to the appendices for a more detailed description). Leabra models have successfully illuminated cognitive function in a wide variety of domains, including perception, object recognition, attention, semantic memory, episodic memory, working memory, skill learning, reinforcement learning, implicit learning, cognitive control, and various aspects of language learning and use. Of particular relevance to skill savings are Leabra's lateral inhibition formalism and its synaptic learning rule.

In the neocortex, two general patterns of connectivity have been observed involving inhibitory neurons and their interactions with excitatory neurons, namely, feedforward and feedback inhibition. Feedforward inhibition occurs when the inhibitory interneurons in a cortical region are driven directly by the inputs to that region, producing rapid inhibition of the excitatory neurons in that area. Feedback inhibition occurs when the same neurons that excite nearby inhibitory interneurons are, in turn, inhibited by the cells they excite, producing a kind of negative feedback loop.

The effects of inhibitory interneurons tend to be strong and fast in the cortex. This allows inhibition to act in a regulatory role, mediating the positive feedback of bidirectional excitatory connections between brain regions. Simulation studies have shown that a combination of fast feedforward and feedback inhibition can produce a kind of “set-point dynamics,” where the mean firing rate of cells in a given region remains relatively constant in the face of moderate changes to the mean strength of inputs. As inputs become stronger, they drive inhibitory interneurons as well as excitatory pyramidal cells, producing a dynamic balance between excitation and inhibition. Leabra implements this dynamic using a k-Winners-Take-All (kWTA) inhibition function that quickly modulates the amount of pooled inhibition presented to a layer of simulated cortical neural units, based on the layer's level of input activity. This results in a roughly constant number of units surpassing their firing threshold. The amount of lateral inhibition within a layer can be parameterized in a number of ways, with the most common being the percentage of the units in the layer that are expected, on average, to surpass threshold. A layer of neural units with a small value of this k parameter (e.g., 10–25%) will produce sparse representations, with only a small fraction of the units being active at once.

With regard to learning, Leabra modifies the strength of synaptic connections in two primary ways. An error-correction learning algorithm changes synaptic weights so as to improve network task performance. Unlike the backpropagation of error algorithm, Leabra's error-correction scheme does not require the biologically implausible communication of error information backward across synapses. In addition to this error-correction mechanism, Leabra also incorporates a Hebbian correlational learning rule. This means that synaptic weights will continue to change even when task performance is essentially perfect. This form of correlational learning allows Leabra to capture certain effects of overlearning.

2.2. Catastrophic Interference. Many past studies have shown that artificial neural networks suffer from a kind of catastrophic interference that is uncharacteristic of human performance. The seminal example of catastrophic interference is the experiment performed by McCloskey and Cohen, in which the authors tried to employ a standard backpropagation network to perform the AB-AC list-learning task. In this task, the network begins by learning a set of paired associates (A-B) consisting of a nonword and a real word (e.g., “pruth-heavy”). Once this learning was completed, they trained the network to associate a new real word with each of the original nonwords (A-C). The authors found that as soon as the training on the AC list started, the network completely forgot the AB list.

Since the original observation of catastrophic interference in artificial neural networks, a number of computational mechanisms have been proposed to overcome it. Most of these involve segregating the neural units associated with different skills in order to avoid the damage caused by “reuse” of synaptic weights. For example, forcing layers of neural units to form sparse representations reduces the probability that a given unit will be active while performing a given skill, thereby reducing the probability of interference when learning multiple skills in sequence. Leabra offers a biologically justified mechanism for producing sparse representations. With a low k parameter, Leabra's kWTA lateral inhibition implementation limits the overlap between the neural representations used for different tasks. This has been shown to improve performance on the AB-AC list learning task.

One extreme form of segregation between neurons devoted to different tasks involves isolating them into discrete modules. Modular artificial neural network architectures have been proposed in which differences between tasks are explicitly detected during learning, and a “fresh” module of neural units is engaged to learn the task, protecting previously trained modules from interference [10, 11]. Importantly, overlearning of a task can strengthen its consolidation in a module, increasing resistance to interference, as is observed in humans [12, 13]. While such modular models can exhibit robust savings (and appropriately limited forms of interference), the biological plausibility of a reserve of untrained neural modules awaiting assignment when a new task is to be learned is questionable. Even if we assume the existence of unused modules, questions still remain about their granularity: do we need a different module even for different variants of the same task? It also poses a question concerning the total number of available modules. Some modular networks do show limited forms of generalization through combining the outputs from multiple modules, but they still need to use a fresh module for most cases.

Modular approaches of this kind should be distinguished from the hypothesis that the hippocampus and the neocortex form distinct learning systems. This hypothesis asserts that catastrophic interference is alleviated through the use of a fast hippocampal learning system that uses sparse representations. While neocortical systems are assumed to use less sparse representations, making them more vulnerable to interference, problems are avoided through a hippocampally mediated process of consolidation, where neocortical networks receive interleaved “virtual” practice in multiple skills. Through the mediation of the hippocampus, multiple skills continue to be essentially learned “together,” rather than sequentially, one after the other.

One successful computational learning strategy that is similar in nature to hippocampally mediated consolidation involves the use of “pseudopatterns.” Trained artificial neural network models are used to generate virtual training experiences (pseudopatterns) which are similar to previously learned patterns. These pseudopatterns are mixed with the new patterns, corresponding to new learning experiences, and the network is given interleaved training on all of these patterns. It has been observed that the use of pseudopatterns alleviates the problem of catastrophic interference substantially [17, 18]. However, the biological mechanisms giving rise to the generation and use of these patterns have yet to be fully explicated. It is also difficult to imagine the generation of pseudopatterns for the maintenance of all kinds of knowledge. This is particularly true if the hippocampus is believed to be the sole site for the maintenance of mixed training pattern sets. Not all kinds of knowledge and skill acquisition depend upon the hippocampus. For example, there is evidence that humans can continue to learn new motor skills even after complete removal of the hippocampus.

There are many other factors that contribute to savings [2, 20]. Contextual cues help in disambiguating between different tasks and could also lead to the use of different sets of neurons for their representation. Overlearning a particular task results in a sharpening of its representation, which is more resistant to perturbation. Finally, it is possible that synaptic changes during the learning of an interfering task may drive certain neurons associated with a previously learned task below their firing threshold, but just below that threshold, allowing them to recover quickly once practice of the previous task is resumed.

We have shown, in previous work, that the sparse representations enforced by Leabra's lateral inhibition mechanism, in conjunction with its synaptic learning rule, cause Leabra simulations of cortical circuits to escape the most dire pitfalls of catastrophic interference when those circuits are required to sequentially learn multiple temporally extended motor trajectories.

2.3. Generalization. The use of sparse representation results in the use of different sets of neurons for different tasks. This implies that the network learns the tasks in a highly specialized manner. Hence, the network may not be able to generalize the common substructure existing between different tasks. There is good reason to believe that humans are able to make such generalizations, even if the tasks are learned sequentially. Also, artificial neural network models of human learning show generalization when multiple tasks are learned in an interleaved manner. Such generalization also emerges from networks that generate pseudopatterns of previously learned tasks [17, 18].

Is it possible for a network with a sparse representation scheme to still exhibit generalization when tasks are learned sequentially? We know that the use of a small value for the k parameter for internal hidden layers creates the possibility of using different units for different tasks. A contextual cue layer increases this probability by providing inputs to distinct hidden layer neurons for the different tasks. Contextual cues are biologically justified, and they improve savings significantly. However, if the cue signal is too strong, it enforces a separate representation even for the common subtask, thus hindering generalization. We found that the strength of the cues can be set to an optimum value, such that the network continues to show significant savings by enforcing orthogonal representation, while still allowing the use of the same neurons for the common subtask, thus enabling generalization as well.

2.4. Neural Network Controllers. The neural-network-based control technique has been widely used to solve problems commonly encountered in control engineering. The most useful property of neural networks in control is their ability to approximate arbitrary linear or nonlinear mappings through learning. It is because of this property that many neural-network-based controllers have been developed to compensate for the effects of nonlinearities and system uncertainties in control systems, so that system performance, such as stability and robustness, can be improved.

There has been considerable research towards the development of intelligent or self-learning neural-based controller architectures in robotics. Riedmiller and Janusz proposed a neural self-learning controller architecture based on the method of asynchronous dynamic programming. The capabilities of the controller are shown on two typical sequential decision problems. Johnson et al. were the first to utilize a backpropagation network to train a robotic arm. The backpropagation network was able to develop the inverse kinematics relationship for the arm after being trained on the arm data. Hesselroth et al. employ a neural map algorithm to control a five-joint pneumatic robot arm and gripper through feedback from two video cameras. The pneumatically driven robot arm (SoftArm) employed in this investigation shares essential mechanical characteristics with skeletal muscle systems. To control the position of the arm, 200 neurons formed a network representing the three-dimensional workspace embedded in a four-dimensional system of coordinates from the two cameras and learned a three-dimensional set of pressures corresponding to the end-effector positions.

Bouganis and Shanahan present a spiking neural network architecture that autonomously learns to control a 4-degrees-of-freedom robotic arm after an initial period of motor babbling. Its aim is to provide the joint commands that will move the end-effector in a desired spatial direction, given the joint configuration of the arm. Vaezi and Nekouie utilize a new method based on neural networks and time series prediction to control a complex nonlinear multivariable robotic arm motion system in a 3D environment without engaging the complicated and voluminous dynamic equations of robotic arms in the controller design stage.

2.5. Sequential Learning. The implications and advantages of sequential training of partially unconnected tasks on a given network architecture remain mostly unexplored or unclear, or have even been viewed negatively, for example, in the form of catastrophic interference [20, 29]. In robotics and machine learning, though, a number of attempts have been made at analysing enhanced learning through a sequence of training environments [30–35]. Saal et al. consider the problem of tactile discrimination, with the goal of estimating an underlying state parameter in a sequential setting. They present a framework that uses active learning to help with the sequential gathering of data samples, using information theoretic criteria to find optimal actions at each time step. Kruger explores the effects of sequential training in the form of shaping in the cognitive domain. He considers abstract, yet neurally inspired, learning models and proposes extensions and requirements to ensure that shaping is beneficial for sequential learning, using a long-term memory model. While this model can learn sequentially and mitigate catastrophic interference, it is reliant on an explicit memory model, whereas our model avoids catastrophic interference by simulating biological processes in real neurons, such as lateral inhibition. However, to the best of our knowledge, no system has yet been designed to explicitly utilize the phenomenon of savings in sequential environments by utilizing generalization of neural models in a cognitive framework.

2.6. RoboSim Robotic Arm Simulator. The RoboSim simulated robotic manipulator was used to demonstrate the applicability of our approach to learning of movement sequences. RoboSim is a simulation system for a 6-degrees-of-freedom robot manipulator. For our experiments, we used only three joints and kept the other three fixed, resulting in a manipulator with 3 degrees of freedom. The manipulator may be moved either in joint coordinates or in Cartesian coordinates. RoboSim allows joint weights to be altered, in order to favor movement of individual joints versus others. The simulation system includes both forward and inverse manipulator kinematics. The system also allows the user to teach the manipulator a sequence of consecutive movements.

3. Method

3.1. Simulated Tasks with the Leabra Network. Four different tasks were used in our simulations. Each task contained 18 different input-output pairs, with the input and output vectors generated randomly. While the output vectors for each task differed, the input vectors were kept identical for all tasks. This resulted in different input-output mappings being learnt by the network for different tasks.

After training on the Base Task, the network was trained on one of the interfering tasks. Table 1 briefly describes the similarity between the Base Task and the interfering tasks. After training on the interfering task, retained performance on the Base Task was assessed. Each of the input and output patterns was encoded in the Leabra network over a pool of 36 neural units.

Table 1: Similarities between various tasks as compared to Base Task.

Task                     Similarity
Nothing Same Task        Nothing in common with Base Task
5 Patterns Same Task     5 of the 18 patterns same as in Base Task
10 Patterns Same Task    10 of the 18 patterns same as in Base Task

3.2. The Network. Figure 1 shows the Leabra network used in our simulations. At each step, the network was provided with a 36-unit input vector, encoding one of the 18 patterns comprising a task. Complete interconnections from this input layer to a 100-unit hidden layer produced an internal representation for the current pattern, with the sparsity of this representation controlled by lateral inhibition within the hidden layer (i.e., by its k parameter). Complete bidirectional excitatory connections map this internal representation to an output layer that is intended to encode the output pattern. During training, the output layer was also provided with a target signal, indicating the correct output. The context layer contained two units, each corresponding to one of the two learned tasks, indicating which of the two tasks was to be produced by the network. This layer was connected to the hidden layer with an 80% chance of interconnection between a given context layer unit and any given hidden layer unit. This random interconnection structure was expected to increase the chances of orthogonal representations for different tasks.

Most of the network parameters used in the simulations were Leabra default values. Connection weights between units were randomly initialized with a mean of 0.5 and a variance of 0.25. The minimum weight value for any interconnection was 0 and the maximum was 1. The activation of any node could range between 0 and 1. Leabra uses a mixture of the GeneRec learning algorithm and a form of Hebbian learning. GeneRec is an error-driven learning algorithm. Hebbian learning was strengthened in our simulations, as compared to the Leabra default, contributing to 1% of synaptic weight changes rather than the default 0.1%. The learning rate for training was set to 0.01. Error tolerance was set to 0.25. This means that any output unit could be within ±0.25 of its desired activity without prompting error-correction learning. In order to facilitate a sparse internal representation, a value of k = 10 was used for the hidden layer. For comparison, a backpropagation of error network (BP), matching the Leabra network in structure and size, was also examined.

There are two common measures of savings: exact recognition and relearning. The exact recognition measure assesses the percentage of the original task that the network performs correctly after it has learned a second task. The relearning measure examines how long it takes the network to relearn the original task. The two measures are usually correlated. We used an exact recognition measure to assess savings. In particular, we measured the sum-squared error (SSE) of the network output on the first task after training on the second task. In order to contrast this SSE value with “complete forgetting” of the first task, the SSE was also recorded prior to the first task training, and we report the ratio of SSE after interference training to SSE of the untrained network. A value of one or more for this ratio indicates complete forgetting of the initial task, while lower values indicate savings.

To measure the similarity of the hidden layer representation for the common subtask between the two tasks, two different measures were used. First, we computed the root-mean-square (RMS) value of the difference between the hidden layer activities for the common subtask using the following formula:

$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(A_i - A'_i\right)^2}. \tag{1}$$

Here, $A_i$ is the activity of the $i$th hidden layer unit for the common subtask after the network has been trained on the Base Task. Similarly, $A'_i$ is the activity of the $i$th hidden layer unit for the common subtask after the network has been trained on the interfering task. $N$ is the total number of units in the layer. The minimum RMS value is 0. This would occur if the hidden layer activity is exactly the same for the two tasks. Since the activity of the units is restricted to the range between 0 and 1, the maximum RMS value is 1. This would occur if the hidden layer activity is completely orthogonal and every unit is firing at its maximum possible activation value for one of the tasks and is not firing at all for the other task.

Second, we measured the percentage of total active units (at least 0.05 activation) that were common for the common subtask at the two relevant points in training. This percentage is a good measure of similarity in representation for the

Table 2: Mean and standard error of SSE ratios for Base Task, when different interfering tasks were used. The first row gives the ratio for a BP network, the second row for a Leabra network with contextual cues of strength 1.0, the third row for a Leabra network with contextual cues of strength 0.35, and the fourth row for a Leabra network trained in an interleaved manner for the two tasks. A smaller ratio signifies more savings.

Network           5 Patterns Same      10 Patterns Same
BP                1.269 (± 0.0242)     0.841 (± 0.0425)
Context = 1.0     0.196 (± 0.0606)     0.144 (± 0.0457)
Context = 0.35    0.147 (± 0.0350)     0.108 (± 0.0168)
Interleaved       0.000 (± 0.0000)     0.000 (± 0.0000)

Next, we fixed k = 10 for the hidden layer and performed similar experiments with all of the other interfering tasks. Table 2 shows the SSE ratio for the various cases. It was found that the SSE ratio was close to 1 for the BP network. This suggests that the BP network consistently experienced catastrophic interference. On the other extreme was the

[Figure 1 layer labels: Input pattern, Input context, Sparse hidden layer, Output pattern.]
Figure 1: The Leabra network.
Leabra network that was given interleaved training for the two tasks. In this case, the network learned both of the tasks completely. The Leabra network with contextual cues (of Leabra network, since roughly only 10 units are active at activation magnitude 1.0 and 0.35) that was given sequential any time (since k = 10 for the hidden layer). However, this training on the two tasks displayed signiﬁcant savings, measure is not appropriate for the BP network, since BP uses represented by a very small SSE ratio. almost all the hidden units to represent a task. To test if the networks were able to come up with We repeated each experiment ﬁve times in order to a generalized representation for the common subtask, we deal with stochastic variations (arising from such factors as compared the hidden layer activity for the common subtask random initial weights and the random order of pattern between the learning of the two tasks. Table 3 shows the presentation) in our simulations. We report the average of percentage of active units that were common across the these repetitions. two tasks. As explained before, this percentage has been computed only for the Leabra networks. We found that the Leabra network with contextual cues 4. Results of strength 0.35 shows generalization comparable to the 4.1. Generalization. A randomly initialized network was Leabra network that is trained on the two tasks in an trained on the Base Task. Then, the network was trained on interleaved manner. On the other hand, the Leabra network an interfering task. Using the “Nothing Same Task” as the with contextual cues of strength 1.0 shows very little gener- second task; the variation of the SSE ratio as a function of alization. For comparison, we also measured the percentage the hidden layer kWTA parameter is shown in Figure 2. 
of common active units when the “Nothing Same Task” was We found a regular decrease in the error ratio with used as the interfering task and the strength of contextual decrease in k. The reason for this decrease is the decrease cues was 1.0. We found that this percentage was zero, in overlap between the Base Task and Nothing Same Task signifying completely orthogonal internal representations. hidden layer activation patterns as representations became To compare the results of generalization seen in the sparser. We measured the percentage of total active units Leabra networks with that seen in the BP network, we common for the two tasks. This percentage was large for computed the root-mean-square (RMS) diﬀerence in the dense representation and decreased considerably for sparse hidden layer activity for the common subtask between the representations. Thus, increasing lateral inhibition produced two tasks. Table 4 shows the RMS values for the diﬀerent more distinct internal representations between the tasks cases. Once again, it was observed that the RMS diﬀerence and resulted in improved savings. The network showed is comparable for the Leabra network with contextual cues complete forgetting (SSE ratio greater than 1) of the base of strength 0.35 and the Leabra network that was given task for k greater than 40. This suggests that the savings interleaved training. Also, the RMS value for the Leabra exhibited by the Leabra network are due to the kWTA network with contextual cues of strength 1.0 is comparable mechanism and not due to its recurrent architecture. It was to that of a BP network. For comparison, we also measured found that the BP network exhibited no savings at all. This the RMS value when the “Nothing Same Task” was used was as expected since there is no explicit mechanism to as the interfering task. 
The RMS value for the BP network facilitate nonoverlapping hidden layer representation in the was 0.3279 ± 0.0098, and for the Leabra network it was BP network. 0.3517 ± 0.0027. Journal of Robotics 7 Savings with sparse representation The network starts with small initial synaptic weights. Hence, a large change in weights is required for success during the ﬁrst acquisition training session. During the ﬁrst 1.5 extinction training session, the weights to the acquisition 1 neurons start decreasing and the weights to the extinction neurons start increasing. As soon as the extinction neurons 0.5 win the inhibitory competition, the acquisition neurons tend to fall below their ﬁring threshold. At this stage, the weights to the acquisition neurons stop decreasing, BP 10 20 30 40 50 as these neurons are no longer contributing to erroneous k (kWTA) outputs. Hence, a signiﬁcant amount of acquisition-related Figure 2: Savings as a function of sparsity. An SSE ratio of one or association strength is retained through the extinction more indicates no savings, while lower values indicate retention of process. During the reacquisition training, the weights to Base Task knowledge. Nothing Same Task was used as the interfering the acquisition neurons increase once again and the weights task. The k parameter roughly equals the percentage of active units to the extinction neurons decrease. Once again, the weights in the hidden layer. Error bars display standard errors of the mean. stop changing as soon as the extinction neurons lose the inhibitory competition. Hence, most of extinction-related plasticity is retained through the acquisition process. In Table 3: Mean and standard error of percentage of common units this manner, subsequent acquisition and extinction trainings for the common subtask between Base Task and the interfering task. require a very small change in weights (Figure 4). 
Eﬀectively, The ﬁrst row gives the ratio for a Leabra network with contextual acquisition and extinction associations are maintained side cues of strength 1.0, the second row for a Leabra network with contextual cues of strength 0.35, and the third row for a Leabra by side in the network, allowing for the rapid switching network trained in an interleaved manner for the two tasks. between them based on recent training feedback. Network 5 Patterns Same 10 Patterns Same 4.2.1. Savings in Robotic Arm Movements. We have used our Context = 1.0 12.748 (±1.695) % 12.724 (±1.975) % model to simulate the learning of arm movement sequences. Context = 0.35 71.246 (±2.036) % 66.702 (±2.750) % Our model controls a simulated 3-joint planar arm which Interleaved 68.882 (±1.785) % 67.340 (±2.300) % moves over a 3-dimensional space, as shown in Figure 5. The state of the arm at any point in time is represented by the vector (q1, q2, q3), where q1, q2, and q3 are the Table 4: Mean and standard error of RMS diﬀerence in the hidden three joint angles. The joint angles range between −180 layer activation for the common subtask between Base Task and and 180 . Two unrelated trajectories are taught to the the interfering task. The ﬁrst row gives the RMS value for a BP manipulator with each trajectory represented as a sequence network, the second row for a Leabra network with contextual cues of strength 1.0, the third row for a Leabra network with contextual of arm states over the successive time steps. During training, cues of strength 0.35, and the fourth row for a Leabra network the arm is essentially guided along the desired trajectory, with trained in an interleaved manner for the two tasks. diﬀerences between the motor output of the arm controller and the conﬁguration of the arm, as speciﬁed by the guide, Network 5 Patterns Same 10 Patterns Same acting as a measure of error to drive synaptic weight change. 
BP 0.324 (±0.0080) 0.381 (±0.0067) Figure 6 shows the Leabra network used for our simu- Context = 1.0 0.299 (±0.0044) 0.329 (±0.0060) lations. The Sensory Input layer provides the current state Context = 0.35 0.159 (±0.0060) 0.147 (±0.0052) of the arm as input to the network and the Motor Output Interleaved 0.156 (±0.0053) 0.156 (±0.0083) layer is to produce the desired arm state for the next time step. Each joint angle is encoded over a pool of 36 neural units. Each of the 36 units has a preferred angle, ◦ ◦ ◦ 4.2. Savings. To uncover the degree to which our model ranging from −180 to 180 ,in10 increments. To encode exhibits savings, we conducted simulations to record how a given joint angle, the closest unit with regard to preference, the network changes during training and extinction. Animals as well as its two neighbors, is set to its maximal ﬁring are faster to reacquire an extinguished behavior, as compared rates. Similarly, patterns of activity over each row of 36 to initial acquisition, and they are faster to extinguish units in the Motor Output are decoded by identifying the a reacquired behavior, as compared to initial extinction preferred angle of the unit in the middle of the three adjacent [39–44]. A randomly initialized network was trained to units that are all active. Other patterns of activity in the respond upon the presentation of a pattern. Once this Motor Output layer are considered to be ill-formed. With training reached criterion, the network was trained to not each joint angle encoded over 36 units in this way, the respond upon the presentation of the pattern. This process complete arm conﬁguration is encoded over 108 units. The was repeated 5 times. Figure 3 shows the number of trials context layer was used to encode contextual information required for successive acquisition and extinction trainings. related to the tasks. Note that the required time quickly decreases. 
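The joint-angle encoding and decoding scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the 36 preferred angles are −180°, −170°, ..., 170° with wraparound (so that −180° and 180° share a unit), that a unit counts as active at 0.5 or more, and that a well-formed pattern has exactly three active units. The function names are hypothetical.

```python
N_UNITS = 36     # units per joint pool (from the text)
STEP_DEG = 10    # spacing of preferred angles in degrees (from the text)
# Assumption: preferred angles are -180, -170, ..., 170, and the pool wraps
# around, so -180 and 180 degrees map to the same unit.
PREFERRED = [-180 + STEP_DEG * i for i in range(N_UNITS)]


def encode_angle(angle_deg):
    """Activate the unit closest to angle_deg and its two neighbors."""
    center = round((angle_deg + 180) / STEP_DEG) % N_UNITS
    pattern = [0.0] * N_UNITS
    for offset in (-1, 0, 1):
        pattern[(center + offset) % N_UNITS] = 1.0
    return pattern


def decode_angle(pattern, threshold=0.5):
    """Return the preferred angle of the middle of three adjacent active units.

    Returns None for ill-formed patterns. Assumption: exactly three units
    must be active (activity >= threshold) and they must be adjacent.
    """
    active = {i for i, a in enumerate(pattern) if a >= threshold}
    if len(active) != 3:
        return None
    for center in active:
        if {(center - 1) % N_UNITS, center, (center + 1) % N_UNITS} == active:
            return PREFERRED[center]
    return None


def encode_arm(q1, q2, q3):
    """Concatenate the three joint pools into one 108-unit pattern."""
    return encode_angle(q1) + encode_angle(q2) + encode_angle(q3)


def decode_arm(pattern):
    """Decode a 108-unit pattern into three joint angles (None if ill-formed)."""
    return tuple(
        decode_angle(pattern[j * N_UNITS:(j + 1) * N_UNITS]) for j in range(3)
    )
```

Under these assumptions, an arm state such as (30, −90, 170) survives an encode and decode round trip, while a pool with no active units decodes to None and is treated as ill-formed.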
The tasks consist of 20 consecutive arm positions that the manipulator was trained to achieve in order. After the manipulator had learnt the movements associated with the first task, the network was subsequently trained on the second task. The tasks were generated to ensure that there were no common patterns between the tasks. The manipulator was then tested and retrained on the first task. The manipulator was able to accurately remember the training for 15 out of 20 movements for the first task, with an SSE ratio of 0.234. More importantly, the training time for relearning the first task was just 2 epochs, as opposed to 15 epochs for the untrained network.

To further investigate the applications of savings, the manipulator network was sequentially trained on five manipulator tasks (task 1 to task 5). The SSE ratio for all the tasks is shown in Table 5. As expected, the SSE ratio is the lowest for the most recently trained tasks, indicating greater savings for these tasks. More importantly, the retraining time for all tasks was significantly less than the original training time. Figure 7 shows the variation in savings for successively trained tasks with different values of hidden layer kWTA. As the kWTA parameter is increased, the network is able to retain less information about previously trained tasks.

For the acquisition and extinction simulations of Section 4.2 (Figure 3), the model predicts that the required number of trials will asymptote to a small value after just a few acquisition-extinction iterations.

Figure 3: The number of training trials required to reach criterion (y-axis) decreases as the number of prior acquisition and extinction training sessions (x-axis) increases. Error bars report standard errors of the mean. (x-axis: Number of trainings; y-axis: Number of trials; series: Acquisition, Extinction.)

Figure 4: This graph plots the change in the summed connection weights in the acquisition pathway and in the extinction pathway (y-axis) during the successive acquisition and extinction trainings (x-axis). The change in weights decreases in both the pathways as the number of prior acquisition and extinction training sessions increases. There seems to be a slow upward trend in the weights in both the pathways, which appears to be a quirk of the simulator. (Series: Acquisition weights, Extinction weights.)

Table 5: SSE ratio and retraining epochs.

Task      SSE ratio           Training epochs    Retraining epochs
Task 1    0.522 (±0.0033)     10.8 (±0.24)       4.2 (±0.20)
Task 2    0.467 (±0.0023)     12.2 (±0.24)       4.2 (±0.20)
Task 3    0.413 (±0.0097)     15.4 (±0.24)       2.4 (±0.48)
Task 4    0.328 (±0.0049)     13.8 (±0.24)       2.4 (±0.48)
Task 5    -                   13.6 (±0.24)       -

Figure 5: The RoboSim Robotic Arm Simulator. The simulator allows for both forward and inverse kinematics, learning of a sequence of movements, varying degrees of freedom and link lengths, and speed control.

Figure 6: The Leabra network for robot manipulator simulations. (Layers shown: input data, input context, sparse hidden layer, output layer.)

Figure 7: Savings in successive tasks. (x-axis: Number of tasks; y-axis: Savings in epochs; series: K = 10, K = 20, K = 30, K = 40.)

5. Conclusion and Discussion

We have shown that the neurocomputational principles embodied by the Leabra modeling framework are sufficient to produce generalization, whilst exhibiting significant savings during the sequential learning of multiple tasks. No auxiliary computational mechanisms are needed. Generalization is found to be sensitive to the strength of the contextual cues input. Certain neurons develop strong connection weights for the common subtask during the base task training. Due to strong weights, these neurons receive stronger input for the common subtask even during the interfering task training. Hence, they have a greater chance of firing. Contextual cues, however, provide strong inputs to certain other neurons in the hidden layer that compete with the shared neurons. If the cue signal is too strong, these other neurons win the competition, and the network fails to generalize the common subtask. We have shown that the cue strength can be set to an optimum value such that cues are strong enough to enforce savings through the generation of orthogonal representations, while still allowing the emergence of similar representations for the common subtasks between different tasks. The generalization observed during sequential learning of the tasks is comparable to the generalization observed when the two tasks are learned in an interleaved manner.

While our previous work demonstrated how to overcome catastrophic interference via generalization on a Leabra network, the current paper demonstrates the application of savings to a simulated robotic arm. We have shown that the application of biologically plausible savings during the sequential learning of robotic manipulator tasks has the additional benefit of retention of previously learnt tasks. We demonstrated that with a minimal amount of retraining the manipulator is able to perform the original task accurately. This highlights the advantages of designing a network which is capable of learning multiple tasks in sequence. It saves the user the trouble of having to interleave the tasks or generate artificial pseudopatterns. While human motor skill learning is considerably more nuanced, this framework offers insight into the rapid reacquisition after extinction exhibited by subjects during conditioning experiments. Many more applications of such networks may exist.

Even though biological plausibility has been the focus of our work, research in this direction is also significant for many engineering domains that use artificial neural networks. In many real-world scenarios, the full range of training data is not available up front. In such cases, traditional neural networks need to be updated as new information arrives. The techniques described in this paper can be used to improve the efficiency of such systems by enabling the networks to learn tasks in a sequential manner.

Appendices

A. Dendritic Integration

A fundamental function of neurons is the transformation of incoming synaptic information into specific patterns of action potential output. An important component of this transformation is synaptic integration, the combination of voltage deflections produced by a myriad of synaptic inputs into a singular change in membrane potential. Leabra simulates this integration at the dendrite of the neuron via a weighted summation of all the input activations followed by a functional transformation (normally sigmoidal) of the sum.

B. Point Neuron Approximation

Leabra uses a point neuron activation function that models the electrophysiological properties of real neurons, while simplifying their geometry to a single point. This function is nearly as simple computationally as the standard sigmoidal activation function, but the more biologically based implementation makes it considerably easier to model inhibitory competition, as described below. Further, using this function enables cognitive models to be more easily related to more physiologically detailed simulations, thereby facilitating bridge-building between biology and cognition.

C. Lateral Inhibition

The processes involved in lateral inhibition are particularly relevant to the model presented in this paper. Lateral inhibition allows for competition between neurons involved in the encoding of stimuli. Along with the mechanisms of synaptic learning, this competition separates the neurons that associate the stimulus with responding, or acquisition neurons, from those which associate the stimulus with nonresponding, called extinction neurons. The class of inhibitory functions that Leabra adopts are known as k-winners-take-all (kWTA) functions. A kWTA function ensures that no more than k units out of a total of n in a layer are active at any given point in time. This is attractive from a biological perspective because it captures the set-point property of inhibitory interneurons, where the activity level is maintained through negative feedback at a roughly constant level (i.e., k).

D. kWTA Function Implementation

The k active units in a kWTA function are the ones receiving the most excitatory input ($g_e$). Each unit in the layer computes a layer-wide level of inhibitory conductance ($g_i$) while updating its membrane potential such that the top k units will have above-threshold equilibrium membrane potentials with that value of $g_i$, while the rest will remain below firing threshold. The function computes the amount of inhibitory current $g_i^{\theta}$ that would put a unit just at threshold given its present level of excitatory input, where $\theta$ is the threshold membrane potential value. Computing inhibitory conductance at the threshold ($g_i^{\theta}$) yields

$$ g_i^{\theta} = \frac{g_e^{*}\,\bar{g}_e\,(E_e - \theta) + g_l\,\bar{g}_l\,(E_l - \theta)}{\theta - E_i}, \tag{D.1} $$

where $g_e^{*}$ represents the excitatory input minus the contribution from the bias weight and $g_l \bar{g}_l$, $g_e \bar{g}_e$ are the total conductances from the potassium and sodium channels, respectively. $E_l$ and $E_e$ are the equilibrium potentials for the potassium and sodium channels, respectively. $g_i$ is computed as an intermediate value between the $g_i^{\theta}$ values for the kth and (k + 1)th units as sorted by level of excitatory conductance ($g_e$). This ensures that the (k + 1)th unit remains below threshold, while the kth unit is above it. Expressed as a formula, this is given by

$$ g_i = g_i^{\theta}(k+1) + q\left(g_i^{\theta}(k) - g_i^{\theta}(k+1)\right), \tag{D.2} $$

where $0 < q < 1$ determines where the inhibition lies between the kth and (k + 1)th units.

E. Leabra Learning Algorithms

Leabra provides for a balance between Hebbian and error-driven learning. Hebbian learning is performed using a conditional principal component analysis (CPCA) algorithm. Error-driven learning is performed using GeneRec, which is a generalization of the recirculation algorithm and approximates Almeida-Pineda recurrent backpropagation.

F. Hebbian Learning

The objective of the CPCA learning rule is to modify the weights for a given input unit ($x_i$) to represent the conditional probability that the input unit ($x_i$) is active when the corresponding receiving unit ($y_j$) is also active. This is expressed as

$$ w_{ij} = P(x_i = 1 \mid y_j = 1) = P(x_i \mid y_j). \tag{F.1} $$

In (F.1) the weights reflect the frequency with which a given input is active across the subset of input patterns represented by the receiving unit. If an input is frequently active across such input patterns, then the resulting weight from it will be relatively large. On the other hand, if the input is rarely active across such input patterns, then the resulting weight will be small. The following weight update rule achieves the CPCA conditional probability objective represented by (F.1):

$$ \Delta w_{ij} = \epsilon\left[y_j x_i - y_j w_{ij}\right] = \epsilon\, y_j \left(x_i - w_{ij}\right), \tag{F.2} $$

where $\epsilon$ is the learning rate parameter. The weights are adjusted to match the value of the sending unit activation $x_i$, weighted in proportion to the activation of the receiving unit ($y_j$). Thus inactivity of the receiving unit implies that no weight modification will occur. Conversely, if the receiving unit is very active (near 1), the update rule modifies the weight to match the input unit’s activation. The weight will eventually come to approximate the expected value of the sending unit when the receiver is active (consistent with (F.1)).

G. Error-Driven Learning

GeneRec implements error backpropagation using locally available activation variables, thereby making such a learning rule biologically plausible. The algorithm incorporates the notion of plus and minus activation phases. In the minus phase, the outputs of the network represent the expectation or response of the network, as a function of the standard activation settling process in response to a given input pattern. Then, in the plus phase, the environment is responsible for providing the outcome or target output activations.

The learning rule for all units in the network is given by

$$ \Delta w_{ij} = \epsilon\left(y_j^{+} - y_j^{-}\right) x_i^{-} \tag{G.1} $$

for a receiving unit with activation $y_j$ and sending unit with activation $x_i$. The rule for adjusting the bias weights is just the same as for the regular weights, but with the sending unit activation set to 1:

$$ \Delta \beta_{ij} = \epsilon\left(y_j^{+} - y_j^{-}\right). \tag{G.2} $$

The difference between the two phases of activation is an indication of the unit's contribution to the overall error signal. Bidirectional connectivity allows output error to be communicated to a hidden unit in terms of the difference in its activation states during the plus and minus phases, $(y_j^{+} - y_j^{-})$.

Acknowledgments

The authors would like to thank three anonymous reviewers for their valuable feedback, which helped to significantly improve the quality of this paper.

References

[1] M. McCloskey and N. J. Cohen, “Catastrophic interference in connectionist networks: the sequential learning problem,” in Psychology of Learning and Motivation, G. H. Bower, Ed., vol. 24, pp. 109–164, Academic Press, New York, NY, USA, 1989.
[2] A. Gupta and D. C. Noelle, “The role of neurocomputational principles in skill savings,” in Proceedings of the 27th Annual Conference of the Cognitive Science Society, pp. 863–868, 2005.
[3] R. C. O’Reilly and Y. Munakata, Computational Explorations in Cognitive Neuroscience, MIT Press, 2000.
[4] Y. Zhao and C. C. Cheah, “Position and force control of robot manipulators using neural networks,” in Proceedings of the IEEE Conference on Robotics, Automation and Mechatronics, pp. 300–305, December 2004.
[5] M. J. Er and Y. Gao, “Robust adaptive control of robot manipulators using generalized fuzzy neural networks,” IEEE Transactions on Industrial Electronics, vol. 50, no. 3, pp. 620–628, 2003.
[6] R. S. Bapi, K. Doya, and A. M. Harner, “Evidence for effector independent and dependent representations and their differential time course of acquisition during motor sequence learning,” Experimental Brain Research, vol. 132, no. 2, pp. 149–162, 2000.
[7] O. Hikosaka, K. Nakamura, K. Sakai, and H. Nakahara, “Central mechanisms of motor skill learning,” Current Opinion in Neurobiology, vol. 12, no. 2, pp. 217–222, 2002.
[8] M. K. Rand, O. Hikosaka, S. Miyachi et al., “Characteristics of sequential movements during early learning period in monkeys,” Experimental Brain Research, vol. 131, no. 3, pp. 293–304, 2000.
[9] R. M. French, Catastrophic interference in connectionist networks, Macmillan Encyclopedia of the Cognitive Sciences.
[10] T. Brashers-Krug, R. Shadmehr, and E. Todorov, “Catastrophic interference in human motor learning,” Advances in Neural Information Processing Systems, vol. 7, pp. 19–26, 1995.
[11] M. McCloskey, “Catastrophic interference in connectionist networks: the sequential learning problem,” Psychology of Learning and Motivation, vol. 24, pp. 164–169, 1989.
[12] T. Brashers-Krug, R. Shadmehr, and E. Bizzi, “Consolidation in human motor memory,” Nature, vol. 382, no. 6588, pp. 252–255, 1996.
[13] R. Shadmehr and H. H. Holcomb, “Neural correlates of motor memory consolidation,” Science, vol. 277, no. 5327, pp. 821–825, 1997.
[14] C. Miall, “Modular motor learning,” Trends in Cognitive Sciences, vol. 6, no. 1, pp. 1–3, 2002.
[15] J. L. McClelland, B. L. McNaughton, and R. C. O’Reilly, “Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory,” Psychological Review, vol. 102, no. 3, pp. 419–457, 1995.
[16] A. Robins, “Catastrophic forgetting, rehearsal and pseudorehearsal,” Connection Science, vol. 7, pp. 123–146, 1995.
[17] B. Ans, “Sequential learning in distributed neural networks without catastrophic forgetting: a single and realistic self-refreshing memory can do it,” Neural Information Processing, vol. 4, no. 2, pp. 27–32, 2004.
[18] B. Ans, S. Rousset, R. M. French, and S. Musca, “Preventing catastrophic interference in multiple-sequence learning using coupled reverberating Elman networks,” in Proceedings of the 24th Annual Conference of the Cognitive Science Society.
[19] I. H. Jenkins, D. J. Brooks, P. D. Nixon, R. S. J. Frackowiak, and R. E. Passingham, “Motor sequence learning: a study with positron emission tomography,” Journal of Neuroscience, vol. 14, no. 6, pp. 3775–3790, 1994.
[20] R. M. French, “Catastrophic forgetting in connectionist networks,” Trends in Cognitive Sciences, vol. 3, no. 4, pp. 128–135.
[21] M. Botvinick and D. C. Plaut, “Doing without schema hierarchies: a recurrent connectionist approach to normal and impaired routine sequential action,” Psychological Review, vol. 111, no. 2, pp. 395–429, 2004.
[22] R. M. French, “Dynamically constraining connectionist networks to produce distributed, orthogonal representations to reduce catastrophic interference,” in Proceedings of the 16th Annual Cognitive Society Conference, 1994.
[23] M. Riedmiller and B. Janusz, “Using neural reinforcement controllers in robotics,” in Proceedings of the 8th Australian Conference on Artificial Intelligence, 1995.
[24] A. G. Barto, S. J. Bradtke, and S. P. Singh, “Learning to act using real-time dynamic programming,” Artificial Intelligence, vol. 72, no. 1-2, pp. 81–138, 1995.
[25] J. Johnson, R. Challoo, R. A. McLauchlan, and S. I. Omar, “A multi-neural network intelligent path planner for a robot arm,” in Proceedings of the Artificial Neural Networks in Engineering (ANNIE ’96), 1996.
[26] T. Hesselroth, K. Sarkar, P. P. Van der Smagt, and K. Schulten, “Neural network control of a pneumatic robot arm,” IEEE Transactions on Systems, Man and Cybernetics, vol. 24, no. 1, pp. 28–38, 1994.
[27] A. Bouganis and M. Shanahan, “Training a spiking neural network to control a 4-dof robotic arm based on spike timing-dependent plasticity,” in Proceedings of the IEEE World Congress on Computational Intelligences (WCCI ’10), 2010.
[28] M. Vaezi and M. A. Nekouie, “Adaptive control of a robotic arm using neural networks based approach,” International Journal of Robotics and Automation, vol. 1, no. 5, pp. 87–99.
[29] M. Haruno, D. M. Wolpert, and M. Kawato, “MOSAIC model for sensorimotor learning and control,” Neural Computation, vol. 13, no. 10, pp. 2201–2220, 2001.
[30] S. P. Singh, “Transfer of learning by composing solutions of elemental sequential tasks,” Machine Learning, vol. 8, no. 3-4, pp. 323–339, 1992.
[31] L. M. Saksida, S. M. Raymond, and D. S. Touretzky, “Shaping robot behavior using principles from instrumental conditioning,” Robotics and Autonomous Systems, vol. 22, no. 3-4, pp. 231–249, 1997.
[32] M. Dorigo and M. Colombetti, “Robot shaping: developing situated agents through learning,” Tech. Rep. TR-92-040, International Computer Science Institute, 1993.
[33] P. Abbeel and A. Y. Ng, “Apprenticeship learning via inverse reinforcement learning,” in Proceedings of the 21st International Conference on Machine Learning (ICML ’04), pp. 1–8, July 2004.
[34] Y. Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum learning,” in Proceedings of the 26th International Conference on Machine Learning (ICML ’09), pp. 41–48, June.
[35] M. E. Taylor and P. Stone, “Transfer learning for reinforcement learning domains: a survey,” Journal of Machine Learning Research, vol. 10, pp. 1633–1685, 2009.
[36] H. Saal, J. Ting, and S. Vijayakumar, “Active sequential learning with tactile feedback,” Journal of Machine Learning Research, vol. 9, pp. 677–684, 2010.
[37] K. A. Kruger, Sequential learning in the form of shaping as a source of cognitive flexibility, Ph.D. thesis, Gatsby Computational Neuroscience Unit, University of London, 2011.
[38] R. Pollak, J. Schuetznerz, and T. Braunl, “Robot Manipulator Simulation,” 1996, http://robotics.ee.uwa.edu.au/robosim/.
[39] C. Balkenius and J. Moren, “Computational models of classical conditioning: a comparative study,” Tech. Rep. LUCS 62, 1998.
[40] R. A. Rescorla, “Retraining of extinguished Pavlovian stimuli,” Journal of Experimental Psychology: Animal Behavior Processes, vol. 27, no. 2, pp. 115–124, 2001.
[41] R. A. Rescorla, “Savings tests: separating differences in rate of learning from differences in initial levels,” Journal of Experimental Psychology: Animal Behavior Processes, vol. 28, no. 4, pp. 369–377, 2002.
[42] R. A. Rescorla, “Comparison of the rates of associative change during acquisition and extinction,” Journal of Experimental Psychology: Animal Behavior Processes, vol. 28, no. 4, pp. 406–415, 2002.
[43] R. A. Rescorla, “More rapid associative change with retraining than with initial training,” Journal of Experimental Psychology: Animal Behavior Processes, vol. 29, no. 4, pp. 251–260, 2003.
[44] G. S. Reynolds, A Primer of Operant Conditioning, Scott, Foresman and Company, 1975.
Journal of Robotics – Hindawi Publishing Corporation
Published: Dec 28, 2011