Access the full text.
Sign up today, get DeepDyve free for 14 days.
Anna Llenas, Mohamed Talamali, Xu Xu, James Marshall, A. Reina (2018)
Quality-Sensitive Foraging by a Robot Swarm Through Virtual Pheromone Trails
O. Michel (2004)
Cyberbotics Ltd. Webots™: Professional Mobile Robot SimulationInternational Journal of Advanced Robotic Systems, 1
G. Beni, Jing Wang (1993)
Swarm Intelligence in Cellular Robotic Systems
N. Lemmens, S. Jong, K. Tuyls, A. Nowé (2007)
Bee System with inhibition Pheromones
Ralf Mayet, Jonathan Roberz, T. Schmickl, K. Crailsheim (2010)
Antbots: A Feasible Visual Emulation of Pheromone Trails for Swarm Robots
Brian Hrolenok, S. Luke, Keith Sullivan, C. Vo (2010)
Collaborative foraging using beacons
Ryusuke Fujisawa, Hikaru Imamura, T. Hashimoto, F. Matsuno (2008)
Communication using pheromone field for multiple robots2008 IEEE/RSJ International Conference on Intelligent Robots and Systems
Agents have some form of collision avoidance mechanism, acting independently of the design dynamics
Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, P. Abbeel (2017)
Overcoming Exploration in Reinforcement Learning with Demonstrations2018 IEEE International Conference on Robotics and Automation (ICRA)
R. Johansson, A. Saffiotti (2009)
Navigating by stigmergy: A realization on an RFID floor for minimalistic robots2009 IEEE International Conference on Robotics and Automation
T Balch (2000)
209Autonomous robots, 8
Valerio Sperati, V. Trianni, S. Nolfi (2011)
Self-organised path formation in a swarm of robotsSwarm Intelligence, 5
M. Dorigo, M. Birattari (2012)
Swarm Intelligence, 7461
M Dorigo, M Birattari (2007)
ScholarpediaSwarm intelligence, 2
V. Ziparo, A. Kleiner, B. Nebel, D. Nardi (2007)
RFID-Based Exploration for Large Robot TeamsProceedings 2007 IEEE International Conference on Robotics and Automation
D. Payton, Mike Daily, R. Estkowski, M. Howard, Craig Lee (2001)
Pheromone RoboticsAutonomous Robots, 11
Ryusuke Fujisawa, S. Dobata, K. Sugawara, F. Matsuno (2014)
Designing pheromone communication in swarm robotics: Group foraging behavior mediated by chemical substanceSwarm Intelligence, 8
R. Russell (1997)
Heat trails as short-lived navigational markers for mobile robotsProceedings of International Conference on Robotics and Automation, 4
Alexandre Campo, Álvaro Gutiérrez, Shervin Nouyan, Carlo Pinciroli, V. Longchamp, S. Garnier, M. Dorigo (2010)
Artificial pheromone for path selection by a foraging swarm of robotsBiological Cybernetics, 103
K. Sugawara, T. Kazama, Toshinori Watanabe (2004)
Foraging behavior of interacting robots with virtual pheromone2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566), 3
Katherine Russell, Michael Schader, Kevin Andrea, S. Luke (2015)
Swarm Robot Foraging with Wireless Sensor Motes
F. Ducatelle, G. Caro, Carlo Pinciroli, F. Mondada, L. Gambardella (2011)
Communication assisted navigation in robotic swarms: Self-organization and cooperation2011 IEEE/RSJ International Conference on Intelligent Robots and Systems
S. Alers, K. Tuyls, B. Ranjbar-Sahraei, D. Claes, Gerhard Weiss (2014)
Insect-Inspired Robot Coordination: Foraging and Coverage
Jonas Svennebring, Sven Koenig (2004)
Building Terrain-Covering Ant Robots: A Feasibility StudyAutonomous Robots, 16
S. Ishii, W. Yoshida, J. Yoshimoto (2002)
Control of exploitation-exploration meta-parameter in reinforcement learningNeural networks : the official journal of the International Neural Network Society, 15 4-6
T. Balch (2000)
Hierarchic Social Entropy: An Information Theoretic Measure of Robot Group DiversityAutonomous Robots, 8
Shervin Nouyan, M. Dorigo
Path Formation in a Robot Swarm Path Formation in a Robot Swarm
(1993)
Some experiments with foraging robots
S. Garnier, F. Tâche, M. Combe, A. Grimal, G. Theraulaz (2007)
Alice in Pheromone Land: An Experimental Setup for the Study of Ant-like Robots2007 IEEE Swarm Intelligence Symposium
Shervin Nouyan, Alexandre Campo, M. Dorigo (2008)
Path formation in a robot swarmSwarm Intell., 2
J. Langford (2010)
Efficient Exploration in Reinforcement Learning
Agents have the ability to send and receive basic signals (up to 6 scalar values), within a maximum range δ
N. Lemmens, K. Tuyls (2009)
Stigmergic landmark foraging
F. Ducatelle, G. Caro, Carlo Pinciroli, L. Gambardella (2011)
Self-organized cooperation between robotic swarmsSwarm Intelligence, 5
S Alers (2014)
761Artificial life, 14
A. Reina, A. Cope, E. Nikolaidis, James Marshall, Chelsea Sabo (2017)
ARK: Augmented Reality for KilobotsIEEE Robotics and Automation Letters, 2
J. Kuffner, S. LaValle (2000)
RRT-connect: An efficient approach to single-query path planningProceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No.00CH37065), 2
Daniel Ornia, M. Mazo (2020)
Convergence of ant colony multi-agent swarmsProceedings of the 23rd International Conference on Hybrid Systems: Computation and Control
F. Ducatelle, A. Förster, G. Caro, L. Gambardella (2009)
Supporting Navigation in Multi-Robot Systems through Delay Tolerant Network CommunicationIFAC Proceedings Volumes, 42
Mohamed Talamali, Thomas Bose, Matthew Haire, Xu Xu, James Marshall, A. Reina (2019)
Sophisticated collective foraging with minimalist agents: a swarm robotics testSwarm Intelligence, 14
We present a biologically inspired design for swarm foraging based on ant’s pheromone deployment, where the swarm is assumed to have very restricted capabilities. The robots do not require global or relative position measurements and the swarm is fully decentralized and needs no infrastructure in place. Additionally, the system only requires one-hop communication over the robot network, we do not make any assumptions about the connectivity of the communication graph and the transmission of information and computation is scalable versus the number of agents. This is done by letting the agents in the swarm act as foragers or as guiding agents (beacons). We present experimental results computed for a swarm of Elisa-3 robots on a simulator, and show how the swarm self-organizes to solve a foraging problem over an unknown environment, converging to trajectories around the shortest path, and test the approach on a real swarm of Elisa-3 robots. At last, we discuss the limitations of such a system and propose how the foraging efﬁciency can be increased. Keywords Swarm robotics · Collective learning · Navigation · Path planning 1 Introduction Dorigo et al. (2007); Blum and Merkle (2008); Kennedy (2006)). In the past thirty years the use of multi-agent techniques to These biological methods have enabled plenty of theoret- solve robotic tasks has exploded. The advancement of pro- ical developments, but the applicability of them is still sparse cessing power, sensor accuracy and battery sizes has enabled partly due to problem complexities hard to satisfy with min- the realisation of coordinated multi-robot problems, where imalistic robots. We focus in this work on the well known a number of agents organize to solve a speciﬁc task. When foraging problem, deﬁned as the dual problem of explo- designing increasingly larger systems, a huge part of meth- ration/exploitation, where a number of agents start at a given ods draw inspiration from biological systems (ants, bees…) point in space and must ﬁnd a target in an unknown (pos- where large amounts of agents interact (directly or indirectly) sibly time varying) environment, while converging to cycle to produce emerging behaviour that results in an optimized trajectories that enable them to exploit the found resources solution to a physical problem. Robotic swarms are at the core as efﬁciently as possible. Foraging is still a paradigmatic of these solutions, being decentralised multi-robot systems problem when designing very large robotic systems since with biologically inspired minimalist dynamics that where it combines exploration and on-line path planning. Addi- complex behaviours emerge (see e.g. Beni and Wang (1993); tionally the duality exploration vs. exploitation is nowadays extremely relevant in Reinforcement Learning and other AI related ﬁelds (Thrun, 1992; Ishii et al., 2002; Nair et al., 2018). This problem has been addressed with robotic ant- B Daniel Jarne Ornia inspired swarms that use indirect communication through d.jarneornia@tudelft.nl some “pheromone” (either virtual or chemical) (Fujisawa Steven Adams et al., 2008; Russell, 1997; Johansson and Safﬁotti, 2009). s.j.l.adams@tudelft.nl Some early work was done in Drogoul and Ferber (1993) Manuel Mazo Jr showing how robots can use pheromone based information m.mazo@tudelft.nl to explore and collect. After that, authors in e.g., Sugawara Delft Center for Systems and Control, Delft University of et al. (2004); Fujisawa et al. (2014); Campo et al. (2010); Technology, 2628CD Delft, The Netherlands 123 Autonomous Robots Alers et al. (2014); Font Llenas et al. (2018); Mayet et al. Note that (Nouyan et al., 2008; Ducatelle et al., 2011b; (2010); Garnier et al. (2007); Ziparo et al. (2007); Ducatelle Payton et al., 2001) assume agents in the swarm communi- et al. (2009); Svennebring and Koenig (2004) have explored cate directly with other agents, while (Hrolenok et al., 2010; several practical methods to implement pheromone-based Russell et al., 2015; Ducatelle et al., 2011a) de-couple this robotic navigation. However, complexities explode when and propose an environment-based interactions where agents designing very large multi-robot swarms, be it in terms of only write and read data into locally reachable beacons. How- computation, data transmission or dynamic coupling. Addi- ever, all assume knowledge of relative positions between tionally, these systems often include restrictive assumptions agents. in terms of e.g., sensor range, memory storage in the agents, computational capacity or reliability of communications. 1.2 Main proposal 1.1 Related work We aim in this work to provide a scalable and robust solu- tion to the problem of decentralised foraging robots, in terms There have been multiple proposals for de-centralised multi of communication disruptions, robot failures and changing robot foraging systems. Authors in Payton et al. (2001)pro- environment. We propose a design a minimalistic swarm sys- pose a network (multi-hop) algorithm that propagates counter tem capable of solving the foraging problem by using a form signals through a grid of robots, such that the robots accumu- of pheromone-inspired communication with the following late around the shortest path between two targets in space, restrictions: propagating messages using line-of-sight communication. The authors in Ducatelle et al. (2011b) present an naviga- • All agents do not have knowledge of relative (or global) tional algorithm for a homogeneous swarm to travel between positions with respect to other robots, and only need an two targets in space that requires the agents to ﬂood the orientation measure (a compass). network with their believe of the relative locations of each • The system relies on one-hop communication only with unique identiﬁable agent, and assumes a pre-positioned agent limited range, and does not require direction or distance at the target locations. of signals nor line-of-sight connectivity. In Nouyan et al. (2008) a path planning algorithm is • The system is fully distributed and needs no infrastructure presented that explores the environment using chains of in place. sweeping robots, or grow a grid of robots storing a vector ﬁeld pointing towards a given target relying on relative positioning measures and one-hop line-of-sight communication via LED The contribution is therefore split in two main branches: signalling. In Hrolenok et al. (2010); Russell et al. (2015), authors equip agents with deployable beacon devices to 1. Swarm doesn’t require position measurements, multi-hop guide navigation and store pheromone values. Authors treat or line of sight communication. pheromones as utility estimates for environmental states, and 2. Swarm is robust versus changing environments, com- agents rely on line-of-sight one-hop communication and a munication disruptions and agent failures, demonstrated relative position measure to the beacons to generate forag- through extensive simulations. ing trajectories. In Ducatelle et al. (2011a) the collaboration between two different swarms is investigated, using forag- ing as a paradigmatic problem, having a pre-deployed swarm The goal of our proposal is to obtain a swarm system that able to overlook the full domain and act as guiding agents for is deployable in real life, without knowledge of the spe- the second swarm to solve the foraging problem. In this case ciﬁc environment and that adapts to changing environments the use of vectors stored in the guiding swarm is explored, and unforeseen events. This goal dictates contributions 1 with such vectors being updated using the relative positions and 2 above. In this way, contribution 1 is an improvement of the foraging swarm. In Reina et al. (2017); Talamali et al. with respect to, among others, Hrolenok et al. (2010); Rus- (2020) the authors use a virtual-reality approach to imple- sell et al. (2015); Ducatelle et al. (2011a, b); Nouyan et al. ment the pheromone ﬁeld, allowing the robots to have access (2008); Sperati et al. (2011); Hoff et al. (2013). Although to this virtual pheromone from a central controller, enabling there are (few) examples of foraging swarm work that con- effective foraging. siders changing environments, robustness studies in the form In Lemmens et al. (2007); Lemmens and Tuyls (2009), of contribution 2 do not appear in detail in existing literature authors propose bee-inspired path integration algorithms on (with the exception of Nouyan et al. (2008) where a swarm a computational set-up, where agents use landmarks to store path formation problem is considered) for a fully distributed pheromone-based information when a change in direction is and autonomous foraging swarm, and we believe it is of great required. importance for a swarm intended to operate in the real world. 123 Autonomous Robots 1.3 Preliminaries 3. Agents have some form of collision avoidance mecha- nism, acting independently of the design dynamics. We use calligraphic letters for sets (A), regular letters for 4. Agents have sensing ability to detect S, T . scalars (a ∈ R) and bold letters for vectors (a ∈ R ). We 5. Agents have some measure of angular orientation (e.g., a consider discrete time dynamics k ∈ N, and we deﬁne an compass). inter-sampling time τ ∈ R such that we keep a “total” time 6. Agents are able to remain static. measure t = τ k. With vectors we use v as the euclidean norm, and v:= . We use the diagonal operator D(·) = v Observe that we do not assume the ability to measure direc- diag(·). tionality in the signals, nor any form of self-localisation capacity. Additionally, the agents do not have access to any form of global information about D, do not have unique 2 Problem description identiﬁers and do not require line-of-sight interactions. The swarm does not require either any form of infrastructure Take a swarm of N agents A ={1, 2,..., N } navigating in in place. At last, the swarm relies on one-hop communica- a bounded domain D ⊂ R , where D is compact and con- tion only, and we do not require the communication to be nected (possibly non-convex). We deﬁne x (k) ∈ D as the a synchronous. That is, communication happens on an agent- position of agent a at (discrete) time k, and velocity v (k) = to-agent basis, and agents do not cascade information through v (cos(α (k)) sin(α (k))) with α (k) ∈[−π, π ) as its 0 a a a the communication network. The challenge to be solved in orientation. We deﬁne the dynamics of the system to be in this work is then the following. discrete time, such that the positions of the agents evolve as problem 1 Design a swarm of N agents that solves a forag- ing problem over an unknown domain D, while satisfying the x (k + 1) = x (k) + v (k)τ. (1) a a a set of Assumptions 1, and does so with guarantees. Consider the case where the swarm is trying to solve a for- aging problem. 3 Proposal: self guided swarm Deﬁnition 1 (Foraging problem). A foraging problem on an unknown domain D is the joint problem of ﬁnding a target We now present our design for a self-guided swarm that region T ⊂ D when starting on a different region S ⊂ D, S ∩ solves the foraging problem presented in Sect. 2. Our design T =∅, and eventually following (sub) optimal trajectories. is based on the idea of allowing agents in the swarm to behave as “beacons” (agents in charge of guiding others) The main goal when solving a foraging problem is to com- or “foragers” (agents in charge of travelling from S to T ). plete trajectories between S and T as fast as possible (through Beacon agents store weight values and guiding velocity vec- the shortest path), back and forth. Then, the quality of the tors, which they broadcast to the foragers in the swarm to solutions can be quantiﬁed by the amount of trips between generate foraging trajectories. We ﬁrst describe the different the S and T performed by all agents in some time interval. modes the agents can operate in, the switching rules between To design such a swarm, we make the following assumptions modes, and then the dynamics in every mode. on the swarm agents’ capabilities. Assumption 1 (Swarm constraints ) 3.1 States and transitions Let us then split the swarm into three groups of agents. We 1. Agents have a small memory, enough to store scalars and will use the state variable s (k) ∈{B, F , F } with a ∈ A to a 1 2 vectors in R , and enough computational power to per- indicate: form sums and products of vectors. 2. Agents have the ability to send and receive basic signals (up to 6 scalar values), within a maximum range δ,every 1. s (k) = B ⇒ a is a beacon. τ seconds. 2. s (k) = F ⇒ a is a forager looking for T . a 1 3. s (k) = F ⇒ a is a forager looking for S. a 2 A trajectory is optimal if there exists no other trajectory connecting S and T that has a shorter length. We say (informally) it is sub-optimal We use s = F , F as the different foraging states. Then, we 1 2 if it is relatively close (in travel time) to the optimal trajectory length can group the agents in time-dependent sub-sets: the beacons between S and T . B(k) :{a ∈ A : s (k) = B} and the foragers A (k) := {a ∈ Many off-the-shelf small robotic systems satisfy these assumptions, e.g. Elisa-3, e-Puck, Sphero Bolt…. A : s (k) = s}. 123 Autonomous Robots At t = 0 all agents are initialised at S. One agent is cho- where λ ∈[0, 1] is a diffusion rate, and for some r ∈ R , sen to be the initial beacon, and all others are initialised as foragers looking for T : r if s (k) = F ∧ x (k) ∈ S, ⎨ f 1 f γ (k) = r if s (k) = F ∧ x (k) ∈ T , (6) f 2 f x (0) ∈ S ∀a ∈ A, a 0else. |B(0)|= 1, (2) The reward function in (5) works as follows: Foragers listen |A (0)|= N − 1. for weight signals from neighbouring beacons and broadcast back the maximum discounted weight, plus a constant reward This initial beacon can be chosen at random, or based on if they are in the regions S or T depending on their state. The some order of deployment of the swarm. Let us now deﬁne discounted term λ serves the role of a hop-count in other liter- the regions of inﬂuence of every agent as D (k) := {x ∈ D : ature examples. However, in our case the robots do not need to x − x (k) ≤ δ}, for some maximum instrument range a 2 keep count of messages, nor cascade messages through the δ ∈ R . As time evolves, the agents switch between states network. This propagation of information through the net- following the logic rules ∀a ∈ A: work is done in a fully sparse and memory-less way (since robots only listen for messages, compute new weights and B if b ∈ B(k) : x (k) ∈ D (k), ⎪ b a broadcast messages to robots around them), without the need F if s (k) = F ∧ x (k) ∈ S, 1 a 2 a of keeping connectivity in the beacon network. Observe that s (k + 1) = . (3) F if s (k) = F ∧ x (k) ∈ T , 2 a 1 a (5) depends on s, indicating that foragers listen and reinforce s (k) else, only the weights corresponding to their internal state value. The beacons update their weight values using a (possibly different) discount factor ρ ∈[0, 1] as The switching rule in (3) is interpreted in the following way. If a forager moves to a point in the domain where there is no other beacons around, it becomes a beacon. If a forager s (k) f ∈F (k) s s b is looking for the set T (mode 1) and ﬁnds it, it switches ω (k + 1) = (1 − ρ)ω (k) + ρ . (7) b b s |F (k)| to ﬁnding the starting set S (mode 2), and the other way around. For now we do not consider transitions from beacon The iteration in (7) is only applied if there are indeed to forager. neighbouring foragers around a beacon, so |F (k)|≥ 1. s s Otherwise, ω (k + 1) = ω (k). b b 3.2 Dynamics The update rule of u (k) is similarly: We assume that beacons remain static while in beacon state: s v (k) f ∈F (k) s s u (k + 1) = (1 − ρ)u (k) − ρ . (8) b b s |F (k)| v (k) = 0, x (k) = x (k ) ∀k ≥ k , (4) b b b b b b s s s and again, |F (k)|= 0 ⇒ u (k + 1) = u (k).Atthe same where k is the time step when agent b switched to beacon b b b b time that beacons update their stored weight values based on state. Beacon agents store weight values ω (k) ∈ R and s 2 the foragers around them, they update as well the guiding guiding velocity vectors u (k) ∈ R , initialised at zero for velocity vectors by adding the corresponding velocities of all agents in the swarm. At every time-step, beacon agents s s the foragers around them (with an opposite sign). The logic broadcast their stored values ω (k), u (k) with a signal of b b behind this has to do with the reward function in (5). For- radius δ. Let us deﬁne the set of neighbouring beacons to agers looking for T update weights and guiding velocities a forager in state s, f ∈ A (k) as B (k) := {b ∈ B(k) : associated with state F , but to indicate that they are in fact x (k) ∈ D (k)}, and the set of neighbouring foragers to b f s s moving out of S, we want to update the guiding velocities a beacon b ∈ B(k) as F (k) := { f ∈ A (k) : x (k) ∈ based on the opposite direction that they are following. D (k)}. Until now we have deﬁned the dynamics of the beacon At every time-step each forager receives a set of signals agents: position and velocities in (4) and update rules for from neighbouring beacons, and computes a reward function s s ω (k), u (k) in (7) and (8). We have yet to deﬁne the dynam- (k) ∈ R : + b b ics of the foragers. At every step, the foragers listen for guiding velocity and weight signals from beacons around s s s (k) = γ (k) + λ max ω (k), (5) f f b them. With this information, they compute the guiding vec- b∈B (k) 123 Autonomous Robots Algorithm 2 Behaviour of foragers tor: while s (k) = 0 do Listen for signals during τ seconds; s s s v ˆ (k) := v ω (k)u (k) , (9) f b b Broadcast v (k), (k); b∈B (k) Compute v (k + 1) from (10)and (11); f f Move according to v (k + 1); Asynchronous action where s is the opposite state to s, s = F ⇐⇒ s = F .At 1 2 if Obstacle then every time-step foragers choose stochastically, for a design Move to avoid obstacle; exploration rate ε ∈ (0, 1), if they follow the guiding vector end if Check transitions in (3); v ˆ (k) or they introduce some random noise to their move- end while ment. Let α be a random variable taking values (−π, π ], following some probability density function p(α ), and let α ˜ (k) := α (k) + α . Then, ∀ f ∈ A (k): a a u Assumption 2 cos α ˜ (k) ( ) Pr v (k + 1) = v = ε, f 0 1. Any D ⊆ D is a compact disc of radius δ. sin (α ˜ (k)) (10) 2. τ< , and can be chosen small enough in comparison 2v Pr{v (k + 1) = v ˆ (k)}= 1 − ε. 0 to the diameter of the domain D. 3. The regions S and T are compact discs of radius, at least, Additionally, we add a ﬁxed condition for an agent to turn 2τv . around when switching between foraging states. That is, 4.1 Domain exploration v (k + 1) =−v (k) if s (k + 1) = s (k). (11) f f f f Remark 1 It holds by construction that ∃b : x (k) ∈ With (10) and (11) the dynamics of the foragers are deﬁned a D (k) ∀k, ∀a ∈ A (k). too. We have experimentally veriﬁed that, over a diverse vari- b ety of set-ups, the weights ω (k) converge on average to a From the transition rule in (3) it follows that, whenever ﬁxed point forming a gradient outwards from S and T , there- x (k)/ ∈ D (k) for any beacon b ∈ B(k), it becomes a a b fore guiding the swarm to and from the goal regions. beacon, therefore covering a new sub-region of the space. To obtain the ﬁrst results regarding the exploration of the 3.3 Implementation domain, we can take inspirations from results in RRT explo- ration (Kuffner & LaValle, 2000), where the agents perform The dynamics presented result in the following algorithms a similar ﬁxed length step exploration over a continuous running in the robots. The robots switch between Algorithms domain. Let p (x, k) be the agent probability density of point 1 and 2, depending on if they are beacons or foragers. x at time k. We obtain the following results. Proposition 1 Let D ⊂ R be convex. Let some a ∈ A have Algorithm 1 Behaviour of beacons x (k ) = x . Then, for any convex region D ⊂ D with while s (k) = 0 do a 0 0 n s s Broadcast ω (k), u (k); non-zero Lebesgue measure, there exists κ ∈ R and time b b n + Listen for signals during τ seconds; k ∈ N such that s s Compute ω (k + 1), u (k + 1); b b Move according to v (k + 1); Pr[x (k + k ) ∈ D | x (k )]≥ ε κ . (12) a n 0 n a 0 n if Obstacle then Move to avoid obstacle; end if Proof (Sketch of proof) At every time-step k, with probabil- Check transitions in (3); ity ε, it holds α (k + 1) = α (k) + α , and observe that a a u end while α (k + 1) ∈ (−π, π ]. Let agent a be at point x at time a 0 k , and take the event where for every time-step after k the 0 0 agent always chooses the random velocity. For k = k + 1, the set of points X (1) ⊆ D satisfying p (k + 1, x)> 0 a a 0 4 Results and guarantees is X (k + 1) ={x ∈ D :x − x = v τ }. One can a 0 0 2 0 verify that X (k + 1) forms a circle in R around x .Now a 0 0 To show that the swarm ﬁnds the target region T and even- for k = k + 2, the set X (k + 2) is 0 a 0 tually converges to close to optimal trajectories between S X (k + 2) ={x ∈ D :x − x = τv , and T , let us ﬁrst state the assumptions for the following a 0 1 2 0 (13) theoretical results. x ∈ X (k + 1)}. 1 a 0 123 Autonomous Robots In this case, the set in (13) forms a ball of radius 2v τ around robots across the space, as well as their trajectories. In this x .At k = k + 2, it holds from (13) that p (x, k + 2)> work we have experimentally veriﬁed that, over a diverse 0 0 a 0 0 ∀ x ∈ X (2). Then, for any subset D ⊆ X (2), it holds that variety of set-ups, the weights ω (k) converge on average a 2 a Pr[x (2) ∈ D | x (k )]≥ ε κ , where κ ∈ R is a func- to a ﬁxed point forming a gradient outwards from S and T , a 2 a 0 2 2 + tion of the set D and the probability density p (x, k + 2). therefore guiding the swarm to and from the goal regions. 2 a 0 The sets X (k) are balls centred at x and radius k τ , and a 0 p (x, k)> 0 ∀ x ∈ X (k). Let at last D be any convex sub- a a n set of D with non-zero Lebesgue measure, and k = min{k : 5 Experiments D ⊂ X (k)}. Then, Pr[x (k +k ) ∈ D | x (k )]≥ ε κ n a a 0 n n a 0 n for some κ > 0. For the experiments we present an extensive statistical analysis of simulations running on Webots (Michel, 2004), adaptability results simulated in a particle swarm simulator Now we can draw a similar conclusion for the case where D and an application running on real Elisa-3 robot (GCTronic). is non-convex. We used the Webots simulator for the implementation of the Lemma 1 Let D be a non-convex path-connected domain. work given its capabilities to simulate realistic robots in phys- Let some a ∈ A have x (k ) = x . Then, for any convex a 0 0 ical environments, and since it includes Elisa-3 robot models region D ⊂ D of non-zero Lebesgue measure, there exists which satisfy the restrictions in Assumption 1 (using odome- τ> 0 and κ > 0 such that we can ﬁnd a ﬁnite horizon try for direction of motion measures). The swarm agents are k ∈ N: able to listen almost-continuously, store any incoming sig- nals in a buffer, and empty the buffer every τ seconds. The Pr[x (k + k ) ∈ D | x (k )]≥ ε κ . (14) a n 0 n a 0 n maximum range δ is virtually limited to restrict the com- munication range and simulate larger environments where Proof (Sketchofproof)If D is connected, then for any not all robots are capable of reaching each-other. Addition- two points x , x ∈ D, we can construct a sequence of 0 n ally, the robots use a built-in collision avoidance mechanism, balls {X , X ,..., X } of radius R ≥ v τ centred at 0 1 k 0 triggered when the infra-red sensors detect an obstacle closer x , x ,..., x such that the intersections X ∩ X =∅ 0 1 k i i +1 than 2 cm. See Appendix 4 for a detailed description of the and are open sets, and x ∈ X . Then, we can pick τ to be i i −1 collision avoidance mechanism. small enough such that every ball X ⊂ D does not intersect The ﬁnal parameters used for both the simulation and the with the boundary of D, and we can apply now Proposi- real swarm are presented in Table 1. For details on how these tion 1 recursively at every ball. If x − x < 2v τ , i i −1 2 0 parameters were selected and more information on how they then from Proposition 1 we know p(x , k + 2)> 0 i i −1 affect the system, see Appendix 1. To fully evaluate the per- since, for a given x and k , any point x has a non- i −1 i −1 i formance of the method we now include results regarding zero probability density in at most 2 steps. Then, it holds convergence and efﬁcacy metrics. We are interested in mea- that p(x , k + 2)> 0for some k ∈[k , 2k ], and n n−1 n−1 n n suring two main properties in such swarm: for a target region D :Pr[x (k + k ) ∈ D | x (k )]≥ n a n 0 n a 0 k k n n ε p (x , k + k )dx ≥ ε κ for some κ > 0. a n n 0 n n 1. Foraging performance over ﬁnite horizon t ∈[t , T ]: hit It follows from Lemma 1 that for a ﬁnite domain D every Navigation delay d(t , T ) (Ducatelle et al., 2011b). hit forager agent visits every region inﬁnitely often. For a given initial combination of foraging agents, we have now guar- 1 T − t hit antees that the entire domain is be explored and covered by d(t , T ) = , (15) hit |A| #trips beacons as k →∞. i ∈A 4.2 Foraging where t is deﬁned as the ﬁrst time step an agent reaches hit the target region. We leave for future work the formal guarantees regarding 2. Accumulation of agents around foraging trajectories: the expected weight ﬁeld values ω (k) and guiding veloci- Hyerarchic social entropy S(A). ties u (k). Based on existing literature (Payton et al., 2001; Jarne Ornia & Mazo, 2020), one could model the network of beacons as a discrete graph with a stochastic vertex weight For conciseness in the results we will address in this sec- reinforcement process based on the movement of the robots tion the performance in terms of navigation delay. For further s s to proof convergence of both ω (k) and u (k). Such results details on the entropy metric and results see Appendix 2.The b b would also allow us to study the limiting distribution of the experiment section is split in three parts: 123 Autonomous Robots Table 1 Simulation parameters Parameters m 2 ρλ r ετ(s)δ(m)v ( )σ 0.01 0.810.05 1 0.40.25 0.01 1. Foraging results We perform a statistical analysis of the foraging performance in terms of the metrics above for different swarm sizes. 2. Swarm adaptability We present the results obtained for the cases with mobile obstacles, communication errors and measurement errors. 3. Real swarm We demonstrate an example of a real robotic swarm solving a foraging problem. 5.1 Foraging results We simulated a Webots swarm in an environment of size 2.5m × 3 m, with and without obstacles, where the nest and food are 2 m apart, to evaluate the inﬂuence of the number of agents N in a foraging benchmark scenario. All simulations are run for horizon T = 400 s, since that was observed to be enough for the swarm to stabilize the trajectories. A ﬁrst result on the swarm conﬁguration under these parameters is shown in Fig. 1. Fig. 1 Webots swarm state with obstacle The robots are released in batches to avoid that the starting regions is overcrowded and the movements of the agents in Table 2 Navigation delay for [t , T ] without obstacle hit the densely populated region are solely determined by their Swarm size Foragers Random collision avoidance mechanism, which is called the over- crowding effect.Fig. 1a and b are simpliﬁed visualizations of Mean Std Mean Std the Webots simulator, The size of green dots represents rela- 49 17.45 5.46 294.84 12.53 tive amount of weights stored at the beacons, and the plotted 81 16.45 0.92 322.90 41.38 vectors are the guiding velocities u (k) at each beacon. One 157 103.3 63.44 1152.03 709.4 can see how the swarm is able to navigate to the target set, Bold values indicate the mean of the results and does so around the shortest path. Note however how the random motion still affects some members of the swarm, and how the agents’ accumulation seems to be restricted by how Table 3 Navigation delay for [t , T ] with symmetric obstacle hit well the beacons represent the desired velocity ﬁeld. This can Swarm size Foragers Random be clearly seen in Fig. 1. Most swarm members are indeed Mean Std Mean Std moving back and forth from S to T , but do so spreading 49 30.18 7.42 375.74 27.85 between the rows of beacons closer to the minimum length 81 23.55 1.53 433.93 66.30 path. In Tables 2 and 3 we present the navigation delay for three 157 97.54 22.25 1250.24 706.13 different sizes of the swarm, N = 49, N = 81 and N = 157. Bold values indicate the mean of the results These numbers were picked as follows. First, N = 49 is a7 × 7 group of robots. With the selected δ = 0.4m, we can expect it will take between 30 and 40 beacons to cover was observed in the ﬁrst experiments to perform very efﬁ- π0.4 2 the entire area (assuming every robot covers ≈ m of ciently for the selected environment. At last, N = 157 is area). This leaves a group of 10–20 robots for foraging tasks, the maximum robots Webots could run simultaneously (in which can be considered the minimum to obtain signiﬁcant our equipment), and is enough to produce the overcrowding results. Second, N = 81 is a 9 × 9 group of robots, which effect. 123 Autonomous Robots Each value includes results from 12 independent test runs. The navigation delay is presented for 2 different scenarios: 1. Random Agents moving at random. 2. Foragers Only forager agents N − B(T ). Computing the navigation delay for the Foragers case is 1 T −t equivalent to using d(t , T ) = , such hit i ∈A #trips |N −B(T )| that the number of trips is averaged among the number of foragers. We also include an absolute lower bound, corre- sponding to the absolute minimum possible travel time in the considered scenario. We can now extract several conclu- sions with respect to the swarm sizes. For a size of N = 49 or smaller too many agents are needed as beacons, hence the performance (specially when considering the full swarm) is signiﬁcantly worse than for bigger swarms. When increasing the size of the swarm to N = 81 the performance increases and the variance in the results is reduced, but this tendency reverses for too large swarms due to the overcrowding effect, as it can be seen in the results for N = 157. Another point worth noting is that the lower bound of navigation delay where robots know and follow the optimal trajectory to the target is ≈ 8s without obstacles and ≈ 9s with obstacle. These conclusions with respect to the swarm size are subject to mainly two factors: The size of the environment and the radius of inﬂuence δ. For larger environments and lower δ we can expect a scaling of the problem, such that we need Fig. 2 Swarm with movable beacons more robots to solve the foraging task and it will also take more robots to reach the overcrowding point. are employed as the system evolves, and increases the gran- ularity (or the deﬁnition) of the paths that are being used 5.2 Swarm adaptability more often, possibly enabling more optimal conﬁgurations. We implemented such method allowing beacons to turn to The ultimate goal of the system proposed in this work is foragers when their weights are lower than a speciﬁc thresh- to obtain swarms as efﬁciently as possible, without requiring old for too long, and adding a P controller to the velocity of complex single systems, with high ﬂexibility and robustness. the beacons (set to 0 in previous experiments): In this line of thought, we want to analyse the adaptability of the swarm with respect to three points: 0if D (k) ∩ S ∩ T =∅, v (k) = (16) v u (k) else 1. Removing beacons and allowing them to move. b,0 s b 2. Moving obstacles (non-stationary environments). 3. Agent failure and measurement errors. with v ∈ R ,v << v the beacons speed. This con- b,0 + b,0 0 troller allows the beacons to slowly move towards the mid For all the scenarios, we simulate a baseline swarm with direction between their two guiding velocity vectors u (k). N = 101 agents and the parameters as in Table 2, and every The logic between this controllers is straight-forward: If the numerical result includes a sample of 50 simulations. most optimal trajectory is a straight line, the most optimal conﬁguration for the beacons would be to sit along the line, 5.2.1 Movable beacons with the guiding vectors on 180 . From Figs. 2 and 3 one can see how this controller in- To increase overall efﬁciency of the system we implement creases the efﬁciency of the system. Here, the black lines controllers in the beacons to allow them to turn back to for- indicate the paths traveled by 10 randomly selected foraging agers when their weights have not been updated for a long agents for the last 5 time steps. While the swarm is exploring, time and to move to more transited areas of the environ- it uses ≈ 35 beacons before starting to re-conﬁgure. After ment. This ensures only the necessary amount of beacons around 100 s, beacons disappear from large unvisited areas 123 Autonomous Robots Table 4 Navigation delay measured for all agents for [t , T ] for mov- hit able beacons Mean Std Static beacons 12.51 5.12 Movable beacons 10.86 3.47 Bold values indicate the mean of the results well that the absolute optimal travel time between nest and food source would in this case be ≈ 9s, and the re-conﬁgured swarm achieves 10.86 5.2.2 Non-stationary environments The updating of information stored in the beacon grid forced by the evaporation factors, and the dynamical beacon grid generated by allowing beacons to move, should in theory allow our method to adapt to changing environments. Here, we present the experimental results regarding the adaptability of the system with respect to non-stationary environments. We considered a similar environment as in Ducatelle et al. (2011a) with two obstacles forcing a curved path between the targets. From Fig. 3 one can see that the system is able to create a sub-optimal trajectory with an efﬁcient beacon infrastructure. After 201 seconds the obstacles are removed. From Fig. 3 one can conclude that 600 seconds after the envi- ronment changes, the system is able to create a new efﬁcient beacon structure around the new optimal trajectory. 5.2.3 Agent failures and measurement errors We now present the experiment results regarding the adapt- ability of the system with respect to agent failures and noisy measurements. First, to account for possible disturbances in sensor measurements, we consider the disturbed weight and velocity values for a random variable μ ∼ N (1,σ ), s s ω ˜ (k,σ ) :=|μ |ω (k), b b Fig. 3 Foraging in a dynamical environment (17) v ˜ (k,σ ) :=D(μ ,μ )v (k). f k k f of the domain and the rest get closer to the optimal trajectory, That is, every measurement of weights transmitted by the allowing more accurate trajectories by the agents. When the beacons and every velocity measurement computed by the system converges, the swarm is using ≈ 18 beacons, half of agents are perturbed proportionally with a random noise sig- those in the static beacon scenario. nal with variance σ . In this case we simulate the swarm We compare in Table 4 the navigation delay for the same without obstacles. scenario with and without moving beacons. The distributed The results are presented in Table 5. One can see how as approach for adapting the beacon positions results in a signif- we increase the variance of the noise, the navigation delay icant reduction of navigation delay, indicating a more optimal worsens progressively. However, even for relatively “wide” trajectory set but specially a direct improvement in the total normal distributions with σ = 1, the swarm shows rela- amount of trips for a similar size of the swarm, given that a tive success at solving the foraging problem (recall that the number of beacons return to foraging functions. Reecall as absolutely optimal navigation delay is ≈ 8s). 123 Autonomous Robots Table 5 Navigation delay for shortcoming, a global tracking system is employed which σ Mean Std [t , T ] for swarm with hit equips the robots with a virtual compass. Since our tracking perturbed communication 0.01 13.37 2.62 system is only able to measure the orientations of the robots 0.1 13.63 3.04 when all robots are not moving, the robots run synchronously. 0.5 14.30 1.02 In the test set-up, a swarm of 35 robots is deployed around 1.0 20.24 0.76 a starting region and needs to look for a target in an arena of 2.5 327.80 67.29 size 2.4 m × 1.15 m without obstacles (see Electronic Suple- mentary Material 1). Figure 4a and b shows snapshots of this Bold values indicate the mean of experiment. The red and blue pillars are the centers of the the results nest region and target region, respectively. Green coloured robots are beacons, red coloured are searching for the nest Table 6 Navigation delay for n (%) Mean Std of f [t , T ] for swarm with hit region, blue coloured are searching for the target region. The temporal agent failure 1 15.59 7.44 black arrows indicate the heading direction, the blue and red 5 17.98 10.58 arrows the guiding velocities u(k) at each beacon. 10 20.01 12.94 The resulting behaviour aligns with the predicted design 20 21.28 11.31 dynamics. The swarm creates a covering beacon network. 50 32.43 6.36 The ﬁrst robots reaching the target region attracts other robots to this region and after 30 steps all non-beacon robots travel Bold values indicate the mean of back and forth between the target regions, clustered around the results the shortest path. We point out that during all tests there were robot failures. This did not affected the swarm’s behaviour Now we set to analyse a scenario where agents fail noticeably, showing its robustness. We leave for future work randomly in the swarm (both foragers and beacons). We the realisation of extensive tests with more powerful robots to implement this by considering the following failure scheme. conﬁrm all results in other scenarios of swarm deployment. Every T = 20s, a random fraction n of the agents of f of f “fails” during a total of T seconds. When an agent fails, of f it is unable to move, send or receive messages. After T of f 6 Discussion seconds, the agents return to normal operation, and a new subset of agents is drawn at random and experiences fail- We have presented a foraging swarm design where robots ure. We repeat this for the entire 400s of the simulation, and are able to guide each-other to successfully forage with- measure the navigation delay of the swarm. out the need of position measurements, infrastructure or The results for the random agent failure scenario are global knowledge, and using minimal amounts of knowl- presented in Table 6. The algorithm proposed presents signif- edge about each-other or the environment. The system has icant robustness versus random agent failures. Observe that, been implemented on a swarm of Elisa-3 robots, and an even in the scenario where 50% of agents fail every 20s, extensive experimental study has been performed using the the performance of the swarm does not collapse, and in fact Webots simulator. We have shown how a middle sized swarm yields a navigation delay of only 2× the baseline scenario. (N ≈ 100) is able to ﬁnd an unknown target and exploit This would mean that, on average, the agents that are online trajectories close to the minimum length path. The system solve the foraging problem almost with the same delay as does require agents to know their orientation, and we have the baseline scenario: the navigation delay is a measure of seen how it can be affected by overcrowding effects when seconds per trip and agent, therefore having half the agents agents need to avoid colliding with each-other. Additionally, failing through the entire simulation would already yield a we have observed how the optimality of the trajectories is navigation delay at least twice as the baseline. affected by the resulting distribution of the beacons, which gives room for future work regarding the possibility of having 5.3 Robotic swarm beacons re-conﬁgure to more optimal positions. In Sect. 5.1 we show how the swarm is able to solve the foraging problem At last we implemented the work on real Elisa3-robots, in for an environment with and without obstacles, for different order to qualitatively conﬁrm results of the work. Since the numbers of agents. Then, in Sect. 5.2 we demonstrate how the Elisa-3 robot is also used for the Webots experiments, the proposed system is able to adapt to dynamic environments, robot characteristics and behaviour are the same for both the and the robustness it presents versus measurement noise and real swarm and the simulated swarm. Note that the Elisa3- system faults. We expect in the future to add ultra-wide band robots matches the restrictions of Assumption 1, except the communication modules to the Elisa-3 robots with magne- capacity to measure its angular orientation. To overcome this tometers, that would allow us to run the system on a much 123 Autonomous Robots 6.1 Advantages and shortcomings We aim now to summarize the advantages this method presents with respect to existing literature, as well as short- comings that emerge from the design. First, we would like to emphasize the fact that the proposed system does not require line-of-sight communication, as done in e.g., Sperati et al. (2011); Hoff et al. (2013); Ducatelle et al. (2011a); Hrolenok et al. (2010); Lemmens and Tuyls (2009); Russell et al. (2015); Payton et al. (2001). This would enable the swarm to be deployed in much more diverse environments (clut- tered, uneven terrain…), and we believe is behind the strong robustness results obtained in the experiments. Additionally, it does not require relative position measurements (Hrolenok et al., 2010; Payton et al., 2001; Ducatelle et al., 2011a;Lem- mens and Tuyls, 2009; Hoff et al., 2013; Ducatelle et al., 2011b), which also adds signiﬁcant ﬂexibility when consid- ering speciﬁc environments or robot types, and it does this while keeping foraging metrics in the line of the existing lit- erature (in the best case, ≈ 1.5 ×−2× the absolute optimal travel time). Additionally, we demonstrated experimentally how this simplicity of the swarm requirements results in a swarm that is signiﬁcantly robust towards robot faults or dynamic environments. At last, the proposed structure for the foragers to become beacons (and beacons to go back to foragers) should lead to much lower amount of beacons required (in comparison to e.g.(Hrolenok et al., 2010;Hoff et al., 2013)), since we do not require the beacon network to be in range of each other, and we require an amount of mes- sages per step of order O(N ) that are asynchronous between robots. However, we do still retain the assumption of a global reference frame, that may not be able to be implemented with a compass in inside spaces. Additionally, the nature of the algorithm causes the robotic swarm’s performance to decrease signiﬁcantly when the environment becomes too cluttered with robots. At last, we still need to demonstrate the capacity of a real swarm of robots to operate in a large environment, since the Elisa3 robots proved difﬁcult to be applicable to large swarms and arenas. Testing the algorithm in more complex robots (e.g., e-Puck robots) would probably result in much more relevant experiments. Supplementary Information The online version contains supplemen- tary material available at https://doi.org/10.1007/s10514-023-10102- y. Fig. 4 Robotic swarm with Elisa3 robots Acknowledgements The authors D. Jarne Ornia and M. Mazo Jr. want to acknowledge their work was partially funded by ERC Starting Grant SENTIENT 755953. larger swarm and on more complex environments, and to apply it to other navigation-based problems like surveillance, Declarations target interception or ﬂocking. We leave for future work as well the formal analysis of the resulting beacon graphs, and Conﬂict of interest No other conﬂict of interest apply. the evolution of the variables ω(k) and u(k). 123 Autonomous Robots A.2 Diffusion rate The diffusion rate inﬂuences how fast the information spreads through the network of beacons. It can be analogous to a dis- count rate in a reinforcement learning problem; larger values generate smoother gradients of the weight and velocity ﬁelds, and lower values generate sharper gradients. Intuitively, only a small subset of beacons receive a reward of r (they need to be close to S, T ). Then, the foragers propagate this informa- tion through the network as they explore, and the information L(T ,b) is diffused at a rate of λ , where L(T , b) the minimum number of beacons that a forager needs to encounter from Fig. 5 Results for combinations of ρ values the food source S to beacon b. We are then interested on not losing too much information for large environments, there- fore we can pick a λ close to 1 to ensure this, but strictly Open Access This article is licensed under a Creative Commons smaller than 1. In fact, in Ornia et al. (2022), authors show Attribution 4.0 International License, which permits use, sharing, adap- how for an ideal ant inspired swarm with number of agents tation, distribution and reproduction in any medium or format, as N →∞, the inﬂuence of diffusion parameters vanishes as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indi- long as 0 >λ> 1, and they show that for λ> 1 the weight cate if changes were made. The images or other third party material values explode and grow unbounded as k →∞. in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material A.3 Reward r is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copy- The reward parameter mainly controls the scale of the weight right holder. To view a copy of this licence, visit http://creativecomm values. Since the agents only reward those beacons in range ons.org/licenses/by/4.0/. of the target region with r, and the rest with 0, it serves as a scaling factor for the weight ﬁeld. Very small values would create weight gradients through the network that are A Parameter selection not strong enough to properly guide agents, where too large values could attract agents too fast towards wrong areas of The swarm design presented has a set of parameters that the environment. A baseline value of r = 1 was tested affect the behaviour and performance of the swarm. We set and showed good results for the considered scenario. When out now to address the choice of these parameters used for having larger swarms and larger environments, it would be the simulations in this work. The ﬁnal parameters used for recommended to use larger r values to make sure the gradi- all swarms are presented in Table 1. ents are constructed on larger networks. A.1 Evaporation rate A.4 Exploration rate " The evaporation rate ρ can be interpreted as the speed at The exploration rate controls the probability of choosing a which the system forgets about “past” information. It intro- random direction of movement for the foragers at every step duces a trade-off between how important “new” actions are k. Larger probabilities yield stronger exploration and worse versus how important old knowledge is. Therefore, we would foraging, but too low probabilities yield a risk of trapping want higher values of ρ when there is unknown information agents in local minima or not solving the foraging problem being discovered, and lower values to prevent de-stabilization correctly. A batch of simulations were performed to decide of the swarm due to single agent errors. We ﬁxed the rest of on the values of , and values between ∈[0.03, 0.1] showed the parameters to the values of Table 1 for the considered the best navigation delay for swarms between N = 47 and environment with obstacle and run 50 particle simulation N = 101. runs for different combinations of evaporation rates, comput- ing the resulting navigation delay. We allowed for different A.5 Sampling period evaporation rates in the weights (ρ ) and the velocities (ρ ), w v and the results are presented in Fig. 5. Therefore, a single The sampling period affects both the communication rate value of ρ = 0.01 was chosen as relevant for the current and the rate of re-computation of velocities for the foraging experiments. robots. Intuitively, shorter periods are necessary for faster 123 Autonomous Robots of robots is |A | |A | c c i i H (A, h) =− log . (18) 0.8 2 N N c ∈C(t ,h) 0.6 Hierarchic social entropy is then deﬁned as integrating H (A, h) over all values of h, S(A) = H (A, h)dh. To allow for zero entropy, we take the lower limit of the 0.4 integral to be the diameter of the robots (δ = 4cm). The entropy results of the experiments in Sect. 5.1 are presented 0.2 �× N=49 in Fig. 6. In this case, we only compute the entropy of the �× N=81 forager agents, and we normalise against a practical “max- �× N = 157 imum” entropy computed by randomizing agents over the entire space. At t = 0 all agents start at the nest, hence 0 100 200 300 400 the low entropy values, and from there the entropy increases Time [s] while the swarm explores and tries to ﬁnd the target. After the Fig. 6 Social entropy with and without (×) obstacles exploration phase, the entropy begins to settle to lower values as the robots accumulate over fewer trajectories. Entropy is higher for the symmetrical obstacle due to ﬁrst, the split of robots and smaller environments, and longer periods are suit- agents among the two possible paths, and second, the mini- able for slower robots or very large environments. A value of mum length path being longer. τ =1 s was chosen to be representative of most system’s capa- bilities (1 Hz communication frequencies) and reasonable for the velocities of the agents (they would communicate 4 times C Extended concept per meter of trajectory). C.1 Beacon-forager switching A.6 Instrument range ı To introduce switching from the beacon state to a forager state, we ﬁrst deﬁne the last foraging state of a beacon agent In practice, the instrument range is determined by the robotic as s (k). Then, the logic rules for state switching in (3)are platform used (However, this can be “virtually” limited if updated as: the robots are able to measure signal power). Larger values would require less beacons to cover the space, but the detail s (k + 1) = of the coverage would be much more coarse. We limited the “B if b ∈ B(k) : x (k) ∈ D (k) ∧ range of signals to δ = 0.4m to consider a representative case b a a ∈ / B(k − 2), where the robots can use other short range communication if ⎪ “F if s (k) = F ∧ x (k) ∈ S or necessary (e.g., Bluetooth), and to allow for a more general ⎪ a 2 a a ∈ B(k) ∧ ω (k)<η ∧ ⎨ ω scenario where we need several robots to cover the entire s a (19) f ∈ A : x (k) ∈ D (k) ∧ s (k) = F , environment. f a 1 ⎪ “F if s (k) = F ∧ x (k) ∈ T or a 1 a ⎪ 2 ⎪ s ⎪ a ∈ B(k) ∧ ω (k)<η ∧ ⎪ s a ⎪ − f ∈ A : x (k) ∈ D (k) ∧ s (k) = F f a 2 B Entropy results s (k) else Accumulation of agents around trajectories (or path cre- with η ∈ R . Compared to the old switching rules, two w + ation) is an example of emergent self-organized behaviour. To conditions for the foraging states are added, these work as measure this, entropy has been used to quantize forms of clus- follows: A beacon agent can only switch to a foraging state tering in robots, and to investigate if stable self-organization if there are no foraging agents within its region of inﬂuence arises. We use the hierarchic social entropy as deﬁned by and its summed stored weight values are smaller than some Balch (2000) using single linkage clustering. Two robots are threshold value η . in the same cluster if the relative position between them is After a beacon agent switches to a foraging state, it will smaller than h. Then, with C(t , h) being the set of clusters always perform a random step, as by construction there is at time t with minimum distance h, c ∈ C(t , h) and A no other beacon within the agents range. Whether or not the i c being the subset of agents in cluster c , entropy for a group agent will end up in the region of inﬂuence of another beacon S(A) Autonomous Robots after the ﬁrst step, is related to the number of former neigh- bouring beacons. If the former beacon was fully surrounded by beacons, the agent will always end up in the region of inﬂuence of a beacon after one step. Else, there is a chance of ending up in a region not covered by a beacon after the ﬁrst step. To prevent an agent from switching directly back to the beacon state in such case, we restrict the switching from forager state to beacon state such that a forager agent needs to be for at least 2 time steps in a foraging state before it can switch back to the beacon state. Last, we force an agent to ‘forget’ the weight and guiding velocity values after switching from the beacon state to a for- s s ager state, i.e. ω and u are set to zero after a state transition. a a Since the information stored at a beacon is location depended, preserving the former beacon information will only disturb the system if a former beacon agent transforms to a beacon again at another location. D Experiment details We use the Webots (Michel, 2004) simulator to perform the extensive statistical analysis presented in Sect. 5.1.The Webots simulator can simulate realistic robots in physical environments, including the Elisa-3 robot (GCTronic) (see Fig. 7) we used for the real implementation of the work. Fig. 7 Elisa-3 robot The Elisa-3 robots are equipped with 8 IR proximity sen- sors (see Fig. 7) that are used for collision avoidance. Let us denote the event of detection of an obstacle by sensor # as References Trigger_Prox_#. The collision avoidance algorithm used for both the Webots simulations and the robotic implementation Alers, S., Tuyls, K., Ranjbar-Sahraei, B., et al. (2014). Insect-inspired of the work is then given by Algorithm 3. The algorithm robot coordination: foraging and coverage. Artiﬁcial life, 14, 761– ensures that the maximum distance traveled per time step is Balch, T. (2000). Hierarchic social entropy: An information theoretic never greater than τv , i.e., x − x ≤ τv . Consequently, 0 1 2 0 measure of robot group diversity. Autonomous robots, 8(3), 209– the formal guarantees presented in Sect. 4 still hold for a swarm that performs collision avoidance. Beni, G., Wang, J. (1993). Swarm intelligence in cellular robotic sys- tems. In Robots and biological systems: Towards a new bionics? Springer, pp. 703–712 Blum, C., Merkle, D. (2008). Swarm intelligence. In Blum, C., Merkle, Algorithm 3 Collision avoidance Elisa-3 robot D. (Eds.) Swarm Intelligence in Optimization, pp 43–85 if Trigger_Prox_1 then Campo, A., Gutiérrez, Á., Nouyan, S., et al. (2010). Artiﬁcial Turn 10 left pheromone for path selection by a foraging swarm of robots. Bio- else if Trigger_Prox_7 then logical cybernetics, 103(5), 339–352. Turn 10 right Dorigo, M., Birattari, M., et al. (2007). Scholarpedia. Swarm intelli- else if Trigger_Prox_0 then gence, 2(9), 1462. Turn 180 Drogoul, A., Ferber. J. (1993). Some experiments with foraging robots. else if Trigger_Prox_5 then In From Animals to Animats 2: Proceedings of the Second Interna- Turn 10 left tional Conference on Simulation of Adaptive Behavior, MIT Press, else if Trigger_Prox_3 then p. 451 Turn 10 right Ducatelle, F., Di Caro, GA., Pinciroli, C., et al. (2011b). Communica- else if Trigger_Prox_4 then tion assisted navigation in robotic swarms: Self-organization and Turn 180 cooperation. In 2011 IEEE/RSJ International Conference on Intel- end if ligent Robots and Systems, IEEE, pp. 4981–4988 Ducatelle, F., Di Caro, G. A., Pinciroli, C., et al. (2011). Self-organized cooperation between robotic swarms. Swarm Intelligence, 5(2), 73–96. 123 Autonomous Robots Ducatelle, F., Förster, A., Di Caro, G. A., et al. (2009). Supporting Reina, A., Cope, A. J., Nikolaidis, E., et al. (2017). ARK: Augmented navigation in multi-robot systems through delay tolerant network reality for Kilobots. IEEE Robotics and Automation Letters, 2(3), communication. IFAC Proceedings Volumes, 42(22), 25–30. 1755–1761. Font Llenas, A., Talamali, MS., Xu, X., et al. (2018). Quality-sensitive Russell, RA. (1997). Heat trails as short-lived navigational markers foraging by a robot swarm through virtual pheromone trails. In for mobile robots. In Proceedings of International Conference on Swarm Intelligence. Springer International Publishing, pp. 135– Robotics and Automation, IEEE, pp. 3534–3539 149 Russell, K., Schader, M., Andrea, K., et al. (2015). Swarm robot for- Fujisawa, R., Imamura, H., Hashimoto, T., et al. (2008). Communica- aging with wireless sensor motes. In Proceedings of the 2015 tion using pheromone ﬁeld for multiple robots. In 2008 IEEE/RSJ International Conference on Autonomous Agents and Multiagent International Conference on Intelligent Robots and Systems, IEEE, Systems, Citeseer, pp. 287–295 pp. 1391–1396 Sperati, V., Trianni, V., & Nolﬁ, S. (2011). Self-organised path formation Fujisawa, R., Dobata, S., Sugawara, K., et al. (2014). Designing in a swarm of robots. Swarm Intelligence, 5(2), 97–119. pheromone communication in swarm robotics: Group foraging Sugawara, K., Kazama, T., Watanabe, T. (2004). Foraging behavior behavior mediated by chemical substance. Swarm Intelligence, of interacting robots with virtual pheromone. In 2004 IEEE/RSJ 8(3), 227–246. https://doi.org/10.1007/s11721-014-0097-z International Conference on Intelligent Robots and Systems Garnier, S., Tache, F., Combe, M., et al. (2007). Alice in pheromone (IROS)(IEEE Cat. No. 04CH37566), IEEE, pp. 3074–3079 land: An experimental setup for the study of ant-like robots. In Svennebring, J., & Koenig, S. (2004). Building terrain-covering ant 2007 IEEE Swarm Intelligence Symposium, pp. 37–44, https:// robots: A feasibility study. Autonomous Robots, 16(3), 313–332. doi.org/10.1109/SIS.2007.368024 Talamali, M. S., Bose, T., Haire, M., et al. (2020). Sophisticated col- Hoff, N., Wood, R., Nagpal, R. (2013). Distributed colony-level lective foraging with minimalist agents: A swarm robotics test. algorithm switching for robot swarm foraging. In Distributed Swarm Intelligence, 14(1), 25–56. autonomous robotic systems. Springer, pp. 417–430 Thrun, S. B. (1992). Efﬁcient exploration in reinforcement learning. Hrolenok, B., Luke, S., Sullivan, K., et al. (2010). Collaborative foraging Technical Report, USA using beacons. In AAMAS, pp. 1197–1204 Ziparo, VA., Kleiner, A., Nebel, B., et al. (2007). RFID-based explo- Ishii, S., Yoshida, W., & Yoshimoto, J. (2002). Control of exploitation- ration for large robot teams. In Proceedings 2007 IEEE Interna- exploration meta-parameter in reinforcement learning. Neural tional Conference on Robotics and Automation, pp. 4606–4613, networks, 15(4–6), 665–687. https://doi.org/10.1109/ROBOT.2007.364189 Jarne Ornia, D., Mazo, M. (2020). Convergence of ant colony multi- agent swarms. In Proceedings of the 23rd International Conference Publisher’s Note Springer Nature remains neutral with regard to juris- on Hybrid Systems: Computation and Control. Association for dictional claims in published maps and institutional afﬁliations. Computing Machinery, New York, NY, USA, HSCC ’20, https:// doi.org/10.1145/3365365.3382199 Johansson, R., Safﬁotti, A. (2009). Navigating by stigmergy: A real- Steven Adams is a Ph.D. Candi- ization on an RFID ﬂoor for minimalistic robots. In 2009 IEEE date at the Delft Center for Sys- International Conference on Robotics and Automation, IEEE, pp. tems and Control, Delft Univer- 245–252 sity of Technology (The Nether- Kennedy, J. (2006). Swarm intelligence. In Handbook of nature-inspired lands). He received a BSc in and innovative computing. Springer, pp. 187–219 Mechanical Engineering at Delft Kuffner, J. J., LaValle, S. M. (2000). Rrt-connect: An efﬁcient approach University of Technology in 2016, to single-query path planning. In Proceedings 2000 ICRA. Millen- a MSc in Econometrics and Oper- nium Conference. IEEE International Conference on Robotics and ations Research from the Vrije Automation. Symposia Proceedings (Cat. No. 00CH37065), IEEE, Universiteit Amsterdam in 2020 pp. 995–1001 and a MSc in Systems and Con- Lemmens, N., de Jong, S., Tuyls, K., et al. (2007). Bee system with inhi- trol from Delft University of Tech- bition pheromones. In European conference on complex systems, nology in 2021. His main research Citeseer interests are multi-agent learning, Lemmens, N., Tuyls, K. (2009). Stigmergic landmark foraging. In multi-robot systems and bayesian Proceedings of the 8th International Conference on Autonomous deep learning models. Agents and Multiagent Systems-Vol. 1, pp. 497–504 Mayet, R., Roberz, J., Schmickl, T., et al. (2010). Antbots: A feasi- Daniel Jarne Ornia is a Ph.D. Can- ble visual emulation of pheromone trails for swarm robots. In M. didate at the Delft Center for Sys- Dorigo, M. Birattari, G. A. Di Caro, et al. (Eds.), Swarm Intelli- tems and Control, Delft Univer- gence (pp. 84–94). Springer. sity of Technology (The Nether- TM Michel, O. (2004). Cyberbotics ltd. webots : professional mobile lands). He received a BSc in robot simulation. International Journal of Advanced Robotic Sys- Aerospace Engineering at Univer- tems, 1(1), 5 sitat Politecnica de Catalunya in Nair, A., McGrew, B., Andrychowicz, M., et al. (2018). Overcom- Barcelona in 2015, and his MSc ing exploration in reinforcement learning with demonstrations. In in Aerospace Engineering from 2018 IEEE International Conference on Robotics and Automation the Royal Institute of Technol- (ICRA), IEEE, pp. 6292–6299 ogy (KTH) in Stockholm in 2017. Nouyan, S., Campo, A., & Dorigo, M. (2008). Path formation in a robot Prior to joining TU Delft as a swarm. Swarm Intelligence, 2(1), 1–23. PhD student he worked as a data Ornia, DJ., Zuﬁria, PJ., Mazo Jr., M. (2022). Mean ﬁeld behavior of scientist on AI industry applica- collaborative multiagent foragers. IEEE Transactions on Robotics tions. His main research interests Payton, D., Daily, M., Estowski, R., et al. (2001). Pheromone robotics. are learning multi-agent systems, and the study of efﬁcient communi- Autonomous Robots, 11(3), 319–324. cation and information sharing in collaborative systems. 123 Autonomous Robots vation centre INCAS3 (The Netherlands). His main research interest Manuel Mazo is an associate pro- is the formal study of problems emerging in modern control sys- fessor at the Delft Center for Sys- tem implementations, and in particular the study of networked con- tems and Control, Delft Univer- trol systems and the application of formal veriﬁcation and synthesis sity of Technology (The Nether- techniques to control. He has been the recipient of a University of lands). He received the Ph.D. and Newcastle Research Fellowship (2005), the Spanish Ministry of Edu- M.Sc. degrees in Electrical Engi- cation/UCLA Fellowship (2005–2009), the Henry Samueli Scholar- neering from the University of ship from the UCLA School of Engineering and Applied Sciences California, Los Angeles, in 2010 (2007/2008) and an ERCStarting Grant (2017). and 2007 respectively. He also holds a Telecommunications Engi- neering “Ingeniero” degree from the Polytechnic University of Madrid (Spain), and a “Civilingenjor” degree in Electrical Engineering from the Royal Institute of Technology (Swe- den), both awarded in 2003. Between 2010 and 2012 he held a joint post-doctoral position at the University of Groningen and the inno-
Autonomous Robots – Springer Journals
Published: Oct 1, 2023
Keywords: Swarm robotics; Collective learning; Navigation; Path planning
You can share this free article with as many people as you like with the url below! We hope you enjoy this feature!
Read and print from thousands of top scholarly journals.
Already have an account? Log in
Bookmark this article. You can see your Bookmarks on your DeepDyve Library.
To save an article, log in first, or sign up for a DeepDyve account if you don’t already have one.
Copy and paste the desired citation format or use the link below to download a file formatted for EndNote
Access the full text.
Sign up today, get DeepDyve free for 14 days.
All DeepDyve websites use cookies to improve your online experience. They were placed on your computer when you launched this website. You can change your cookie settings through your browser.