Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

RLSS: real-time, decentralized, cooperative, networkless multi-robot trajectory planning using linear spatial separations

RLSS: real-time, decentralized, cooperative, networkless multi-robot trajectory planning using... Trajectory planning for multiple robots in shared environments is a challenging problem especially when there is limited communication available or no central entity. In this article, we present Real-time planning using Linear Spatial Separations, or RLSS: a real-time decentralized trajectory planning algorithm for cooperative multi-robot teams in static environments. The algorithm requires relatively few robot capabilities, namely sensing the positions of robots and obstacles without higher- order derivatives and the ability of distinguishing robots from obstacles. There is no communication requirement and the robots’ dynamic limits are taken into account. RLSS generates and solves convex quadratic optimization problems that are kinematically feasible and guarantees collision avoidance if the resulting problems are feasible. We demonstrate the algorithm’s performance in real-time in simulations and on physical robots. We compare RLSS to two state-of-the-art planners and show empirically that RLSS does avoid deadlocks and collisions in forest-like and maze-like environments, significantly improving prior work, which result in collisions and deadlocks in such environments. Keywords Trajectory planning for multiple mobile robots · Decentralized robot systems · Collision avoidance · Multi-robot systems 1 Introduction is especially challenging when there is no central entity that plans all robots’ trajectories a priori or re-plans mid- Effective navigation of multiple robots in cluttered envi- execution if there is a fault. In some cases, such a central ronments is key to emerging industries such as ware- entity is undesirable because of the communication link that house automation (Wurman et al., 2008), autonomous driv- must be maintained between each robot and the central entity. ing (Furda & Vlacic, 2011), and automated intersection If the map is not known a priori, building and relaying the management (Dresner & Stone, 2008). One of the core observed map of the environment to the central entity through challenges of navigation systems in such domains is trajec- the communication channel adds further challenges. In some tory planning. Planning safe trajectories for multiple robots cases, it is impractical to have such a central entity because it cannot react to robot trajectory tracking errors and map Baskın Senba ¸ slar ¸ updates fast enough due to communication and computation baskin.senbaslar@usc.edu delays. In practice, robots must operate safely even if there is Wolfgang Hönig limited communication available. This necessitates the abil- hoenig@tu-berlin.de ity for robots to plan in a decentralized fashion, where each Nora Ayanian robot plans a safe trajectory for itself while operating in envi- nora_ayanian@brown.edu ronments with other robots and obstacles. Decentralized algorithms delegate the computation of Department of Computer Science, University of Southern California, 941 Bloom Walk, Los Angeles, CA 90089, USA trajectories: each robot plans for itself and reacts to the environment by itself. In this paper, we introduce RLSS, Department of Electrical Engineering and Computer Science, Technical University of Berlin, Marchstr. 23, 10587 Berlin, a real-time decentralized trajectory planning algorithm for Germany multiple robots in a shared environment that requires no Department of Computer Science and School of Engineering, communication between robots and requires relatively few Brown University, 115 Waterman Street, Providence, RI sensing capabilities: each robot senses the relative positions 02912, USA 123 Autonomous Robots ear spatial separations, between robot shapes, obstacles, and sweep geometries (the subsets of space swept by robots while traversing straight line segments) to enforce safety during trajectory optimization. We demonstrate, through simulation and experiments on physical robots, that RLSS works in dense environments in real-time (Fig. 1). We compare our approach to two state-of- the-art receding horizon decentralized multi-robot trajectory planning algorithms in 3D. In the first, introduced by Zhou et al. (2017), robots plan for actions using a model predic- Fig. 1 RLSS runs in real-time in dense environments. Each robot plans tive control-style optimization formulation while enforcing a trajectory by itself using only the position information of other robots and obstacles that each robot stays inside its buffered Voronoi cell at every iteration. We refer to this method as BVC. The original BVC formulation works only for discrete single-integrator dynam- of other robots and obstacles along with their geometric ics and environments without obstacles. We extend the BVC shapes in the environment, and is able to distinguish robots formulation to discrete time-invariant linear systems with from obstacles. RLSS requires relatively few robot capabili- position outputs and environments with obstacles, and call ties than most state-of-the-art decentralized planners used for the extended version eBVC (short for extended BVC). RLSS multi-robot navigation, which typically require communica- and eBVC have similar properties: they require no communi- tion (Luis et al., 2020; Tordesillas & How, 2021; Wang et al., cation between robots, and require position sensing of other 2021), higher order derivative estimates (Park & Kim, 2021; objects in the environment. However, unlike RLSS, eBVC Wang et al., 2021), or predicted trajectories of objects (Park does not require that robots are able to distinguish robots &Kim, 2021). However, the ability to distinguish robots from obstacles, as it treats each obstacle as a robot. In the from obstacles is not required by some state-of-the-art algo- second, introduced by Park and Kim (2021), robots plan rithms (Zhou et al., 2017), which require modelling obstacles for trajectories by utilizing relative safe navigation corridors, as robots. RLSS is cooperative in the sense that we assume which they execute for a short duration, and replan. We refer each robot stays within its current cell of a tessellation of the to this method as RSFC. RSFC requires no communication space until the next planning iteration. We assume obstacles between robots, and utilizes positions as well as velocities are static, which is required to guarantee collision avoidance. of the objects in the environment, thus requires more sens- RLSS explicitly accounts for the dynamic limits of the ing capabilities than RLSS. We demonstrate empirically that robots and enforces safety with hard constraints, reporting RLSS results in no deadlocks or collisions in our experiments when they cannot be enforced, thus guaranteeing collision in forest-like and maze-like environments, while eBVC is avoidance when it succeeds. The planning algorithm can be prone to deadlocks and RSFC results in collisions in such guided with desired trajectories, thus it can be used in con- environments. However, RLSS results in slightly longer nav- junction with centralized planners. If no centralized planner igation durations compared to both eBVC and RSFC. or plan is available, RLSS can be used on its own, without The contribution of this work can be summarized as fol- any central guidance, by setting the desired trajectories to lows: line segments directly connecting robots’ start positions to their goal positions. There are 4 stages in RLSS. • A carefully designed, numerically stable, and effective real-time decentralized planning algorithm for multiple 1. Select a goal position to plan toward on the desired tra- robots in shared environments with static obstacles with jectory; relatively few requirements: no communication between 2. Plan a discrete path toward the selected goal position; robots, position-only sensing of robots and obstacles, and 3. Formulate and solve a kinematically feasible convex opti- ability to distinguish robots from obstacles. mization problem to compute a safe smooth trajectory • An extension (eBVC) of a baseline planner (BVC) to guided by the discrete plan; environments with obstacles and a richer set of dynamics 4. Check if the trajectory obeys the dynamic limits of the than only single-integrators. robot, and temporally rescale the trajectory if not. • The first comparison of more than two state-of-the-art communication-free decentralized multi-robot trajectory planning algorithms, namely, RSFC, eBVC, and RLSS, RLSS works in the receding horizon fashion: it plans a long in complicated forest-like and maze-like environments, trajectory, executes it for a short duration, and replans at some of which containing more than 2000 obstacles. the next iteration. It utilizes separating hyperplanes, i.e., lin- 123 Autonomous Robots 2 Related work 2.1 Centralized algorithms The pipeline of RLSS contains three stages (discrete plan- We further categorize centralized algorithms into those that ning, safe navigation corridor construction, and trajectory do not consider robot dynamics during planning and those optimization) that are employed by several existing single- that do. robot trajectory planning algorithms. Richter et al. (2013) Centralized without dynamics: If planning can be present a single-robot trajectory planning method for aggres- abstracted to agents moving along edges in a graph syn- sive quadrotor flight which utilizes RRT* (Karaman & chronously, we refer to the multi-agent path finding (MAPF) Frazzoli, 2010) to find a kinematically feasible discrete path problem. Some variants of the MAPF problem are NP-Hard and formulates an unconstrained quadratic program over to solve optimally (Yu & LaValle, 2013). For NP-Hard vari- polynomial splines guided by the discrete path. Collisions are ants, there are many optimal (Sharon et al., 2015; Lam et al., checked after optimization, and additional decision points are 2019), bounded suboptimal (Barer et al., 2014), and subop- added and optimization is re-run if there is collision. Chen et timal (Solovey et al., 2013; Ma et al., 2019) algorithms that al. (2016) present a method that utilizes OcTree representa- perform well in some environments. Trajectories generated tion (Hornung et al., 2013) of the environment during discrete by planning algorithms that do not model robot dynamics search. They find a discrete path using unoccupied grid cells may not be followed perfectly by real robots, resulting in of the OcTree. Then, they maximally inflate unoccupied grid divergence from the plan. cells to create a safe navigation corridor that they use as con- Centralized with dynamics: These algorithms deliver straints in the subsequent polynomial spline optimization. smooth control or output trajectories that are executable by Liu et al. (2017) uses Jump Point Search (JPS) (Harabor & real robots. Trajectories are usually generated by formulating Grastien, 2011) as the discrete planner, and construct safe the problem under an optimization framework, and commu- navigation corridors that are used as constraints in the opti- nicated to the robots for them to execute. Tang and Kumar mization stage. Our planning system uses these three stages (2016) propose an approach that combines a discrete motion (discrete planning, safe navigation corridor construction, and planning algorithm with trajectory optimization that works corridor-constrained optimization) and extends it to multi- only in obstacle-free environments. Another approach com- robot scenarios in a decentralized way. We handle robot-robot bines a MAPF solver with trajectory optimization to plan safety by cooperatively computing a linear partitioning of the trajectories for hundreds of quadrotors in environments with environment, and enforcing that each robot stays within its obstacles, while ensuring that the resulting trajectories are own cell of the partition. executable by the robots (Hönig et al., 2018). Park et al. We categorize multi-robot trajectory planning algorithms (2020) combine a MAPF solver with trajectory optimization first on where computation is done, since the location of and provide executability and feasibility guarantees. Desai computation changes the properties of the algorithms. There and Michael (2020) propose an approach utilizing position- are two main strategies to solve the multi-robot trajectory invariant geometric trees to find kinodynamically feasible planning problem: centralized and decentralized. Centralized trajectories. algorithms compute trajectories for all robots on a central Centralized algorithms can provide theoretical complete- entity with global information; these trajectories are com- ness and global optimality guarantees under the assumptions municated back to the robots for them to execute. In the of perfect prior knowledge and execution of trajectories. decentralized strategy, each robot runs an algorithm on- However, centralized replanning is required when the envi- board to compute its own trajectory; these algorithms may ronment is discovered during operation or the robots deviate utilize direct communication between robots. Centralized from the planned trajectories. This necessitates continuous algorithms often provide strong theoretical guarantees on communication between the central computer and the indi- completeness or global optimality because (i) they have the vidual robots. In some cases, solving the initial problem from complete knowledge about the environment beforehand, (ii) scratch may be required. If the communication between the they generally run on powerful centralized computers, and robots and the central computer cannot be maintained, or (iii) there are generally less restrictive time limits on their exe- solving the initial problem from scratch cannot be done in cution. Decentralized algorithms generally forgo such strong real-time, the robots may collide with each other or obstacles. theoretical guarantees in favor of fast computation because (i) they are often used when complete information of the 2.2 Decentralized algorithms environment is not known beforehand, and (ii) they have to work in real-time on-board. When the requirements of the centralized strategies cannot be satisfied, decentralization of the trajectory planning among robots is required. In decentralized planning, robots assess the state of the environment and plan for themselves using 123 Autonomous Robots their on-board capabilities. We categorize decentralized algo- Voronoi cell. It requires no inter-robot communication and rithms into reactive and long horizon algorithms. depends on sensing of other robots’ positions only. Another Decentralized reactive: In reactive algorithms, each robot MPC-based approach approximates robots’ controller behav- computes the next action to execute based on the state of the iors under given desired states as a linear system (Luis et environment without considering the actions that might fol- al., 2020). A smooth Bézier curve for each robot is com- low it. One such approach is Optimal Reciprocal Collision puted by solving an optimization problem, in which samples Avoidance (ORCA) (Alonso-Mora et al., 2013). In ORCA, of desired states are drawn from the curve and fed into the each robot chooses a velocity vector that is as close as possi- model of the system. The approach requires communication ble to a desired velocity vector such that collisions between of future states for collision avoidance. Another MPC-based each pair of robots are provably avoided. It requires robots approach generates plans using motion primitives to com- to (i) be cooperatively executing the same algorithm, (ii) pute time-optimal trajectories, then trains a neural network obey single integrator dynamics, and (iii) either sense other to approximate the behavior of the planner, which is used dur- robots’ velocities or receive them through communication. ing operation for fast planning (Wang et al., 2021). It requires While ORCA handles dynamic changes in the environment sensing or communicating the full states of robots. Tordesil- and is completely distributed, it fails to avoid deadlocks in las and How (2021) present a method for dynamic obstacle environments with obstacles (Senba ¸ slar ¸ et al., 2019). Safety avoidance, handling asynchronous planning between agents barrier certificates guarantee collision avoidance by com- in a principled way. However, the approach requires instant puting a safe control action that is as close as possible to communication of planned trajectories between robots to a desired control action (Wang et al., 2017). Similarly to ensure safety. Park and Kim (2021) plan a piecewise Bézier ORCA, the robots using safety barrier certificates may get curve by formulating an optimization problem. It utilizes rel- stuck in deadlocks. Most learning-based approaches fall into ative safe flight corridors (RSFCs) for collision avoidance the body of reactive strategies as well. Here, neural networks and requires no communication between robots, but requires are trained to compute the next action to execute, given a position and velocity sensing. In Peterson et al. (2021)’s robot’s immediate neighborhood and goal. For instance, approach, robot tasks are defined as time window tempo- PRIMAL (Sartoretti et al., 2019) is a computationally effi- ral logic specifications. Robots plan trajectories to achieve cient suboptimal online MAPF solver, but it only works on their tasks while avoiding each other. The method provably grids, and does not consider robot dynamics. GLAS (Riviere avoids deadlocks, but requires communication of planned et al., 2020), which considers robot dynamics, can guarantee trajectories between robots. safety for some robot dynamics by combining the network RLSS falls into this body of algorithms. It requires sens- output with a safety term. It performs better than ORCA in ing only the positions of other robots and obstacles, and terms of deadlocks, nevertheless results in deadlocks in dense does not require any communication between robots. The environments. Another approach employs graph neural net- work of Zhou et al. (2017) (BVC) is the only other approach works (GNNs) that allow communication between robots to with these properties. Different from their approach, RLSS avoid collisions (Li et al., 2020), but results in deadlocks uses discrete planning to avoid local minima, plans piecewise in dense environments similar to GLAS. Batra et al. (2021) Bézier trajectories instead of MPC-style input planning, and propose a method that outputs direct motor commands for a does not constrain the full trajectory to be inside a constrain- quadrotor using the positions and velocities of objects in the ing polytope. These result in better deadlock and collision its immediate neighborhood, but results in collisions between avoidance as we show in the evaluation section. robots. In terms of inter-robot collision avoidance, RLSS belongs Decentralized long horizon: In long horizon algorithms, to a family of algorithms that ensure multi-robot collision robots generate a sequence of actions or long trajectories avoidance without inter-robot communication by using the instead of computing a single action to execute. These fact that each robot in the team runs the same algorithm. In approaches employ receding horizon planning: they plan these algorithms, robots share the responsibility of collision long trajectories, execute them for a short duration, and avoidance by computing feasible action sets that would not replan. Zhou et al. (2017) present a model predictive control result in collisions with others when others compute their (MPC) based approach in which robots plan for a sequence of own feasible action sets using the same algorithm. Examples actions while enforcing each robot to stay within its buffered include BVC (Zhou et al., 2017), which partitions position space between robots and makes sure that each robot stays within its own cell in the partition, ORCA (Alonso-Mora et Note that some learning-based approaches train neural networks to mimic the behavior of a global long horizon planner. Because of this, al., 2013), which computes feasible velocity sets for robots even if they output a single action, they abstractly consider a longer such that there can be no collisions between robots as long horizon evolution of the system. We place these approaches to the reac- as each robot chooses a velocity command from their corre- tive algorithms category because they output a single action at each iteration. 123 Autonomous Robots sponding sets, and SBC (Wang et al., 2017), which does the ensures kinematic feasibility of the optimization problem. same with accelerations. Hence, there is no longer a different soft optimization formu- BVC and RLSS also belong to a family of decentral- lation. In the workshop version, we compare RLSS against ized multi-robot trajectory planning algorithms that utilize Luis et al. (2020)’s work, which is based on distributed mutually computable separating hyperplanes for inter-robot MPC, which requires communication of planned trajectories safety as defined by Senba ¸ slar ¸ and Sukhatme (2022). These between robots. Herein, we compare our approach to two algorithms do not require any inter-robot communication and planners that do not require communication between robots utilize only position/geometry sensing between robots. The as this provides a more comparable baseline. inter-robot safety is enforced by making sure that each pair of robots can compute the same separating hyperplane and enforce themselves to be in the different sides of this hyper- 3 Problem statement plane at every planning iteration. Separating hyperplanes that can be mutually computed by robots using only posi- We first define the multi-robot trajectory planning problem, tion/geometry sensing and no communication are defined as then, we indicate the specific case of the problem we solve. mutually computable. BVC uses Voronoi hyperplanes and Consider a team of N robots for which trajectories must RLSS uses support vector machine hyperplanes, both of be computed at time t = 0. The team can be heterogeneous, which are mutually computable. i.e., each robot might be of a different type or shape. Let R RLSS is an extension to our previous conference paper be the collision shape function of robot i such that R (p) is (¸Senbaslar ¸ et al., 2019). We extend it conceptually by the collision shape of robot i located at position p ∈ R , d ∈ {2, 3}, i.e., the subset of space occupied by robot i when • supporting robots with any convex geometry instead of placed at position p. We define R (p) ={p + x | x ∈ only spherical robots; and R } where R ⊂ R is the the space occupied by robot i i ,0 i ,0 • decreasing the failure rate of the algorithm from 3% in when placed at the origin. Note that we do not model robot complex environments to 0.01% by providing impor- orientation. If a robot can rotate, the collision shape should tant modifications to the algorithm. These modifications contain the union of spaces occupied by the robot for each include (i) changing the discrete planning to best-effort possible orientation at a given position; which is minimally A search, thus allowing robots to plan towards their a hypersphere if all orientations are possible. goals even when goals are not currently reachable; (ii) We assume the robots are differentially flat (Murray increasing the numerical stability of the algorithm by run- et al., 1995), i.e., the robots’ states and inputs can be ning discrete planning at each iteration instead of using expressed in terms of output trajectories and a finite num- the trajectory of the previous iteration when it is collision ber of derivatives thereof. Differential flatness is common free to define the homotopy class, and adding a preferred for many kinds of mobile robots, including differential drive distance cost term to the optimization problem in order to robots (Campion et al., 1996), car-like robots (Murray & create a safety zone between objects when possible; and Sastry, 1993), omnidirectional robots (Jiang & Song, 2013), (iii) ensuring the kinematic feasibility of the optimization and quadrotors (Mellinger & Kumar, 2011). When robots are problem generated at the trajectory optimization stage. differentially-flat, their dynamics can be accounted for by (i) imposing C continuity on the trajectories for the required c th We extend our previous work empirically by (i.e. continuity up to c degree of derivative), and (ii) impos- th ing constraints on the maximum k derivative magnitude • applying our algorithm to a heterogeneous team of dif- of trajectories for any required k. For example, quadrotor ferential drive robots in 2D; states and inputs can be expressed in terms of the output • applying our algorithm to quadrotors in 3D; and trajectory, and its first (velocity), second (acceleration) and • comparing the performance of our algorithm to stronger third (jerk) derivatives (Mellinger & Kumar, 2011). Hence, to baselines. account for quadrotor dynamics, continuity up to jerk should be enforced by setting c = 3, and velocity, acceleration, and A preliminary version of RLSS previously appeared in a jerk of the output trajectory should be limited appropriately workshop (Senba ¸ slar ¸ et al., 2021). In the workshop version, by setting upper bounds for k = 1, 2, and 3. the RLSS optimization stage fails about 3% of the time, and Let O(t ) ={Q ⊂ R } be the set of obstacles in the switches to a soft optimization formulation when that hap- environment at time t. The union Q(t ) =∪ Q denotes Q∈O(t ) pens. In the current version, the optimization rarely (close to the occupied space at time t.Let W ⊆ R be the workspace 1 in every 10000 iterations in our experiments) fails thanks that robot i should stay inside of. to the newly added preferred distance cost term, changed Let d (t ) :[0, T ]→ R be the desired trajectory of i i goal selection stage, and changed discrete planning stage that duration T that robot i should follow as closely as possible. 123 Autonomous Robots Define d (t ) = d (T ) ∀t ≥ T . We do not require that this ing problem can be solved optimally with classical search i i i i trajectory is collision-free or even dynamically feasible to methods in polynomial time. This curse of dimensionality track by the robot. If no such desired trajectory is known, it affects continuous motion planning as well, where even geo- can be initialized, for example, with a straight line to a goal metric variants are known to be PSPACE-Hard (Hopcroft et location. al., 1984). The intent is that each robot i tracks a Euclidean trajec- A centralized planner can be used to solve the generic tory f (t ) :[0, T]→ R such that f (t ) is collision-free, multi-robot trajectory planning problem in one-shot provided i i executable according to the robot’s dynamics, is as close as that: the current and future obstacle positions are known a possible to the desired trajectory d (t ), and ends at d (T ). priori, computed trajectories can be sent to robots over a com- i i i Here, T is the navigation duration of the team. We generi- munication link, and robots can track these trajectories well cally define the multi-robot trajectory planning problem as enough that they do not violate the spatio-temporal safety the problem of finding trajectories f ,..., f and navigation of the computed trajectories. In the present work, we aim to 1 N duration T that optimizes the following formulation: approximately solve this problem in the case where obsta- cles are static, but not necessarily known a priori by a central N entity, and there is no communication channel between robots min f (t )−d (t ) dt s.t . (1) i i 2 or between robots and a central entity. Each robot plans its f ,...,f ,T 1 N i =1 own trajectory in real-time. They plan at a high frequency to f (t ) ∈ C ∀i ∈{1,..., N } (2) compensate for trajectory tracking errors. f (T ) = d (T ) ∀i ∈{1,..., N } (3) i i i c 0 d p d f (0) = ∀i ∀c ∈{0,..., c } (4) c c dt dt 4 Preliminaries R (f (t )) ∩ Q(t ) =∅ ∀i ∀t ∈[0, T ] (5) i i R (f (t )) ∩ R (f (t )) =∅ ∀ j =i ∀t ∈[0, T ] (6) i i j j We now introduce essential mathematical concepts used R (f (t )) ∈ W ∀i ∀t ∈[0, T ] (7) i i i herein. d f (t ) max ≤ γ ∀i ∀k ∈{1,..., K } (8) t ∈[0,T ] dt 2 4.1 Parametric curves and splines where c is the order of derivative up to which the trajectory d Trajectories are curves f :[0, T]→ R that are parametrized th 0 of the i robot must be continuous, p is the initial position i by time, with duration T . Mathematically, we choose to use of robot i (derivatives of which are the initial higher order splines, i.e. piecewise polynomials, where each piece is a state components, e.g., velocity, acceleration, etc.), γ is the i Bézier curve defined by a set of control points and a duration. th th maximum k derivative magnitude that the i robot can d A Bézier curve f :[0, T]→ R of degree h is defined by execute, and K is the maximum derivative degree that robot d h + 1 control points P ,..., P ∈ R as follows: 0 h i has a derivative magnitude limit on. The cost (1) of the optimization problem is a metric i (h−i ) h t t for the total deviation from the desired trajectories; it is f (t ) = P 1 − . i T T the sum of position distances between the planned and the i =0 desired trajectories. (2) enforces that the trajectory of each robot is continuous up to the required degree of derivatives. Any Bézier curve f satisfies f (0) = P and f (T ) = P . Other 0 h (3) enforces that planned trajectories end at the endpoints points guide the curve from P to P . Since any Bézier curve 0 h of desired trajectories. (4) enforces that the planned tra- f is a polynomial of degree h, it is smooth, meaning f ∈ C . jectories have the same initial position and higher order We choose Bézier curves as pieces because of their convex derivatives as the initial states of the robots. (5) and (6) hull property: the curves themselves lie inside the convex hull enforce robot-obstacle and robot-robot collision avoidance, of their control points, i.e., f (t ) ∈ Convex Hull{P ,..., P } 0 h respectively. (7) enforces that each robot stays within its ∀t ∈[0, T ] (Farouki, 2012). Using the convex hull property, defined workspace. Lastly, (8) enforces that the dynamic lim- we can constrain a curve to be inside a convex region by its of the robot are obeyed. constraining its control points to be inside the same convex Note that only constraint (6) stems from multiple robots. region. Tordesillas and How (2020) discuss the conservative- However, this seemingly simple constraint couples robots’ ness of the convex hulls of control points of Bézier curves and trajectories both spatially and temporally, making the prob- show that the convex hulls are considerably less conserva- lem much harder. As discussed in Sect. 2.1, solving the tive than those of B-Splines (Piegl & Tiller, 1995), which are multi-agent path finding problem optimally is NP-Hard even another type of curve with the convex hull property. Yet, they for the discrete case while the discrete single-agent path find- also show that convex hulls of the control points of the Bézier 123 Autonomous Robots curves can be considerably more conservative compared to cles and robots as well as their shapes in the environment. the smallest possible convex sets containing the curves. The approach we propose does not require sensing of higher order state components (e.g., velocity, acceleration, etc.) or planned/estimated trajectories of objects, as the former is 4.2 Linear spatial separations: half-spaces, convex generally a noisy signal which cannot be expected to be polytopes, and support vector machines sensed perfectly and the latter would require either commu- nication or a potentially noisy trajectory estimation. A hyperplane H in R can be defined by a normal vector d d RLSS treats robots and obstacles differently. It enforces H ∈ R and an offset H as H ={x ∈ R | H x + H = n a a d d ˜ that each robot stays within a spatial cell that is disjoint from 0}. A half-space H in R is a subset of R that is bounded the cells of other robots until the next planning iteration to by a hyperplane such that H ={x ∈ R | H x + H ≤ 0}. ensure robot-robot collision avoidance. To compute the spa- A convex polytope is an intersection of a finite number of tial cell for each robot, RLSS uses positions and shapes of half-spaces. nearby robots, but not obstacles, in the environment. There- Our approach relies heavily on computing safe convex fore, robots must be able to distinguish other robots from polytopes and constraining spline pieces to be inside these obstacles. However, we do not require individual identifica- polytopes. Specifically, we compute hard-margin support tion of robots. vector machine (SVM) (Cortes & Vapnik, 1995) hyperplanes We assume that the team is cooperative in that each robot between spaces swept by robots along line segments and the runs the same algorithm using the same replanning period. obstacles/robots in the environment, and use these hyper- Lastly, we assume that planning is synchronized between planes to create safe convex polytopes for robots to navigate robots, meaning that each robot plans at the same time instant. in. The synchronization assumption is needed for ensuring robot-robot collision avoidance when planning succeeds. 5 Assumptions Here, we list our additional assumptions about the problem 6 Approach formulation defined in Sect. 3 and the capabilities of robots. We assume that obstacles in the environment are static, i.e., Under the given assumptions, we solve the generic multi- O(t ) = O ∀t ∈[0, ∞], and convex. Many existing, effi- robot trajectory planning problem approximately using cient, and widely-used mapping tools, including occupancy decentralized receding horizon planning. Each robot plans grids (Homm et al., 2010) and octomaps (Hornung et al., a trajectory, executes it for a short period of time, and repeats 2013), internally store obstacles as convex shapes; such maps this cycle until it reaches its goal. We call each plan-execute can be updated in real-time using visual or RGBD sensors, cycle an iteration. and use unions of convex axis-aligned boxes to approximate We refer to the planning robot as the ego robot, and hence- the obstacles in the environment. forth drop indices for the ego robot from our notation as the Similarly, we assume that the shapes of the robots are same algorithm is executed by each robot. Workspace W is convex. the convex polytope in which the ego robot must remain. We assume that the workspace W ⊆ R is a convex R is the collision shape function of the ego robot such that polytope. It can be set to a bounding box that defines a room R(p) ⊂ R is the space occupied by the ego robot when it is that a ground robot must not leave, a half-space that contains placed at position p. The ego robot is given a desired trajec- vectors with positive z coordinates so that a quadrotor does tory d(t ) :[0, T]→ R that it should follow. The dynamic not hit the ground or simply be set to R . A non-convex limits of the robot are modeled using required derivative workspace W can be modeled by a convex workspace W degree c up to which the trajectory must be continuous, i i ˜ ˜ such that W ⊆ W and a static set of convex obstacles O and maximum derivative magnitudes γ for required degrees i i that block portions of the convex workspace so that W = k ∈{1,..., K }. At every iteration, the ego robot computes a piecewise W \ ∪ Q . i ˜ Q∈O trajectory f (t ), which is dynamically feasible and safe up To provide a guarantee of collision avoidance, we assume to the safety duration s, that it executes for the replanning that robots can perfectly sense the relative positions of obsta- 2 3 For the purposes of our algorithm, concave obstacles can be described In physical deployments, asynchronous planning can cause collisions or approximated by a union of finite number of convex shapes provided between robots when they are in close proximity to each other. To handle that the union contains the original obstacle. Using our algorithm with collisions stemming from asynchronous planning, robot shapes can be approximations of concave obstacles results in trajectories that avoid artificially inflated according to the maximum possible planning lag the approximations. between robots (an empirical value) at the expense of conservativeness. 123 Autonomous Robots period δt ≤ s, and fully re-plans, i.e., runs the full planning pipeline, for the next iteration. The only parameter that is shared by all robots is the replanning period δt. Without loss of generality, we assume that navigation begins at t = 0, and at the start of planning iteration u, the current timestamp is T = uδt. RLSS fits into the planning part of the classical robotics pipeline using perception, planning, and control. The inputs from perception for the ego robot are: • S: Shapes of other robots. S ∈ S where j ∈{1,..., i − 1, i + 1,..., N } is the collision shape of robot j sensed by the ego robot such that S ⊆ R . • O: The set of obstacles in the environment, where each obstacle Q ∈ O is a subset of R . • p: Current position of the ego robot, from which deriva- tives up to required degree of continuity can be com- puted. We define O =∪ Q as the space occupied by the Q∈O obstacles, and S =∪ S as the space occupied by the S ∈S robot shapes. Robots sense the set of obstacles and the set of robot shapes and use those sets in practice. We use spaces occupied by obstacles and robots for brevity in notation. There are 4 main stages of RLSS: (1) goal selection, (2) discrete planning, (3) trajectory optimization, and (4) tempo- ral rescaling. The planning pipeline is summarized in Fig. 2. At each planning iteration, the ego robot executes the four Fig. 2 RLSS planning pipeline. Based on the sensed robots S, sensed stages to plan the next trajectory. In the goal selection stage, obstacles O, and current position p, the ego robot computes the trajec- a goal position and the corresponding timestamp of the goal tory f (t ) that is dynamically feasible and safe up to time s position on the desired trajectory d(t ) is selected. In the discrete planning stage, a discrete path from robot’s cur- rent position toward the selected goal position is computed the task at hand; these are not parameters that can be tuned and durations are assigned to each segment. In the trajec- freely and they do not change during navigation. tory optimization stage, discrete segments are smoothed to a piecewise trajectory. In the temporal rescaling stage, the 6.1 Goal selection dynamic limits of the robot are checked and duration rescal- ing is applied if necessary. At the goal selection stage (Algorithm 1), we find a goal Next, we describe each stage in detail. Each stage has position g on the desired trajectory d(t ) and a timestamp access to the workspace W, the collision shape function R, by which it should be reached. These are required in the the desired trajectory d(t ) and its duration T , the maximum subsequent discrete planning stage, which is a goal-oriented derivative magnitudes γ , and the derivative degree c up to search algorithm. which the trajectory must be continuous. We call these task Goal selection has two parameters: the desired planning inputs. They describe the robot shape, robot dynamics, and horizon τ and safety distance D. It uses the robot collision shape function R, desired trajectory d(t ) and its duration T , Practically, if the ego robot cannot sense a particular robot j because and workspace W from the task inputs. The inputs of goal it is not within the sensing range of the ego robot, robot j can be omitted by the ego robot. As long as the sensing range of the ego robot is more selection are the shapes of other robots S, obstacles in the than the maximum distance that can be travelled by the ego robot and environment O, current position p, and the current timestamp robot j in duration δt, omitting robot j does not affect the behavior of T . the algorithm. At the goal selection stage, the algorithm finds the times- In reality, all required derivatives of the position would be estimated tamp T that is closest to T + τ (i.e., the timestamp that using a state estimation method. We use this definition of p for notational convenience. is one planning horizon away from the current timestamp) 123 Autonomous Robots Algorithm 1 GOAL-SELECTION Input : S : Set of robot shapes Input : O : Set of obstacles Input : p : Current position Input : T : Current timestamp TaskInput : R : Collision shape function TaskInput : d(t ), T : Desired trajectory and its duration TaskInput : W : Workspace Parameter : τ : Desired planning horizon Parameter : D : Safety distance Return : Goal and timestamp by which it should be reached 1 T ← Solve (9) with linear search 2 if (9) is infeasible then 3 return (p, T ) 4 else 5 g ← d(T ) 6 return (g, T ) 7 end Fig. 3 Goal Selection. a Blue and red squares are robots, while obsta- cles are black boxes. The desired trajectories d (t ) and d (t ) of red blue when the robot, if placed on the desired trajectory at T ,isat each robot are given as dotted lines. Safety distance D is set to 0 for least safety distance D away from all objects in the environ- clarity. b The desired trajectory of the red robot is not collision-free at timestamp T + τ . It selects its goal timestamp T (and hence its goal ment. We use the safety distance D as a heuristic to choose position) by solving (9), which is the closest timestamp to T + τ when goal positions that have free volume around them in order the robot, when placed on the desired trajectory, would be collision free. not to command robots into tight goal positions. Note that c Since the desired trajectory of blue robot is collision free at timestamp ˜ ˜ goal selection only chooses a single point on the desired tra- T + τ , it selects its goal timestamp T = T + τ after solving (9). d Selected goal positions are shown jectory that satisfies the safety distance; the actual trajectory the robot follows will be planned by the rest of the algorithm. Formally, the problem we solve in the goal selection stage is given as follows: 6.2 Discrete planning ˜ Discrete planning (Algorithm 2) performs two main tasks: (i) T = arg min|t − (T + τ)| s.t . it finds a collision-free discrete path from the current position (9) t ∈[0, T ] p towards the goal position g, and (ii) it assigns durations to each segment of the discrete path. The discrete path found at ˆ ˆ min-dist(R(d(t )), O ∪ S ∪ ∂W) ≥ D the discrete planning stage represents the homotopy class of the final trajectory. Trajectories in the same homotopy class where ∂W is the boundary of workspace W, and min-dist can be smoothly deformed into one another without inter- returns the minimum distance between two sets. secting obstacles (Bhattacharya et al., 2010). The subsequent We solve (9) using linear search on timestamps starting trajectory optimization stage computes a smooth trajectory from T + τ with small increments and decrements. within the homotopy class. The trajectory optimization stage Figure 3, demonstrates the goal selection procedure for a utilizes the discrete path to (i) generate obstacle avoidance particular instance. constraints and (ii) guide the computed trajectory by adding If there is no safe point on the desired trajectory, i.e. if the distance cost terms between the discrete path and the com- robot is closer than D to objects when it is placed on any point puted trajectory. It uses the durations assigned by the discrete on the desired trajectory, we return the current position and planning stage as the piece durations of the piecewise trajec- timestamp. This allows us to plan a safe stopping trajectory. tory it computes. Note that while the selected goal position has free volume Finding a discrete path from the start position p to the around it, it may not be reachable by the ego robot. For exam- goal position g is done with best-effort A search (Line 2, ple, the goal position may be encircled by obstacles or other Algorithm 2), which we define as follows. If there are valid robots. Therefore, we use a best-effort search method during paths from p to g, we find the least-cost path among such discrete planning (as described in Sect. 6.2) that plans a path paths. If no such path exists, we choose the least-cost path towards the goal position. to the position that has the lowest heuristic value (i.e., the The goal and the timestamp are used in the subsequent position that is heuristically closest to g). This modification discrete planning stage as suggestions. of A search is done due to the fact that the goal position may 123 Autonomous Robots Algorithm 2 DISCRETE-PLANNING action is the distance travelled divided by the step size (i.e. Input : g : Goal position cells travelled), which is equal to the size of the direc- Input : T : The timestamp that goal position should be (or tion vector. REACHGOAL has the cost of one rotation plus should have been) reached at π −g cells travelled from π to goal position g:1 + .One Input : S : Set of robot shapes σ rotation cost is there because it is almost surely required to Input : O : Set of obstacles Input : p : Current position do one rotation before going to goal from a cell. ROTATE Input : T : Current timestamp actions in all directions are always valid whenever the cur- TaskInput : R : Collision shape function rent state is valid. FORWARD and REACHGOAL actions are TaskInput : W : Workspace valid whenever the robot shape R swept along the movement TaskInput : γ : Maximum velocity the robot can execute Parameter : σ : Step size of search grid is contained in free space F. Parameter : s : Duration up to which computed trajectory must be For any state (π , ), we use the Euclidean distance from safe. s ≥ δt must hold. position π to the goal position g divided by step size σ (i.e. Return : Discrete path and duration assignments to segments ˆ ˆ cells travelled when π is connected to g with a straight line) 1 F ← W\(O ∪ S) 2 actions ← BEST-EFFORT-A (p, g, R, F,σ ) as the admissible heuristic. 3 {e ,..., e }← EXTRACT-SEGMENTS(actions) 1 L 4 Prepend e to {e ,..., e } so that e = e . Lemma 1 In the action sequence of the resulting plan 1 1 L 0 1 5 total Length ← e − e i i −1 i =1 2 total Length 1. Each ROTATE action must be followed by at least one 6 fDuration ← max T − T , FORWARD action, 7 T ← s 8 for i = 2 → L do 2. The first action cannot be a FORWARD action, e −e i i −1 9 T ← fDuration 3. And no action can appear after REACHGOAL action. total Length 10 end 11 return {e ,..., e }, {T ,..., T } 0 L 1 L Proof Sketch 1. After each ROTATE action, a FORWARD action must be executed in a least-cost plan because (i) a ROTATE action cannot be the last action since goal states accept any direction and removing any ROTATE action from not always be reachable, since the goal selection stage does the end would result in a valid lower cost plan, (ii) the not enforce reachability. REACHGOAL action cannot appear after ROTATE action The ego robot plans its path in a search grid where the because REACHGOAL internally assumes robot rotation grid has hypercubic cells (i.e. square in 2D, cube in 3D) with and removing the ROTATE action would result in a valid edge size σ , which we call the step size of the search. The grid lower cost plan, and iii) there cannot be consecutive ROTATE shifts in the environment with the robot in the sense that the actions in a least-cost path as each rotation has the cost of 1 robot’s current position always coincides with a grid center. and removing consecutive rotations and replacing them with ˆ ˆ Let F = W\(O ∪ S) be the free space within the workspace a single rotation would result in a valid lower cost plan. (Line 1, Algorithm 2). We do not map free space F to the 2. The first action cannot be a FORWARD action since grid. Instead, we check if the robot shape swept along any initial direction is set to 0 and FORWARD action is available segment on the grid is contained in F or not, to decide if a only when  = 0. movement is valid. This allows us to (i) model obstacles and 3. No action after a REACHGOAL action can appear robot shapes independently from the grid’s step size σ , and in a least cost plan because REACHGOAL connects to the (ii) shift the grid with no additional computation since we do goal position, which is a goal state regardless of the direc- not store occupancy information within the grid. tion. Removing any action after REACHGOAL action would The states during the search have two components: posi- result in a valid lower cost plan. tion π and direction . Robots are allowed to move perpen- dicular or diagonal to the grid. This translates to 8 directions By Lemma 1, the action sequence can be described by the in 2-dimension, 26 directions in 3-dimension. Goal states following regular expression in POSIX-Extended Regular are states that have position g and any direction. We model Expression Syntax: directions using vectors  of d components where each com- + ∗ {0,1} ponent is in {−1, 0, 1}. When the robot moves 1 step along ((ROTATE)(FORWARD) ) (REACHGOAL) direction , its position changes by σ . The initial state of the search is the ego robot’s current position p and direction We collapse consecutive FORWARD actions in the result- 0. ing plan and extract discrete segments (Line 3, Algo- There are 3 actions in the search formulation: ROTATE, rithm 2). Each (ROTATE)(FORWARD) sequence and FORWARD, and REACHGOAL, summarized in Table 1. REACHGOAL becomes a separate discrete segment. Let ROTATE action has a cost of 1. The cost of the FORWARD {e ,..., e } be the endpoints of discrete segments. We 1 L 123 Autonomous Robots Table 1 Discrete search actions and their costs Action Description Cost ROTATE Change current direction to a new direction 1 FORWARD Move forward from the current position π along current direction  by the step size σ . It is only available when  = 0 π −g REACHGOAL Connect the current position π to the goal position g where step size of the grid is σ 1 + zero length, to the safety duration s (Line 7, Algorithm 2); the reason for this will be explained in Sect. 6.3. The outputs of discrete planning are segments described by endpoints {e ,..., e } with assigned durations {T ,..., T } 0 L 1 L that are used in the trajectory optimization stage. 6.3 Trajectory optimization In the trajectory optimization stage (Algorithm 3), we for- mulate a convex quadratic optimization problem (QP) to compute a piecewise trajectory f (t ) by smoothing discrete segments. The computed trajectory is collision-free and con- tinuous up to the desired degree of derivative. However, it may be dynamically infeasible (i.e., derivative magnitudes Fig. 4 Discrete Planning. a Goal positions of two robots computed at may exceed the maximum allowed derivative magnitudes the goal selection stage are given. b Red robot plans a discrete path from its current position to its goal position on a search grid that is γ ); this is resolved during the subsequent temporal rescaling aligned on its current position. c Blue robot plans a discrete path from stage. its current position to its goals position on a search grid that is aligned on The decision variables of the optimization problem are its current position. Computed discrete paths are prepended with robot the control points of an L-piece spline where each piece is a start positions to have 0-length first segments. d The resulting discrete paths are given Bèzier curve. The duration of each piece is assigned during discrete planning. Let T = T denote the total duration i =1 of the planned trajectory. The degree of the Bézier curves is d th tuned with the parameter h.Let P ∈ R be the j control i , j prepend the first endpoint to the endpoint sequence in order to th point of the i piece where i ∈{1,..., L}, j ∈{0,..., h}. have a 0-length first segment for reasons that we will explain Let P ={P | i ∈{1,..., L}, j ∈{0,..., h}} be the set i , j in Sect. 6.3 (Line 4, Algorithm 2). The resulting L segments of all control points. are described by L +1 endpoints {e ,..., e } where e = e . 0 L 0 1 Example discrete paths for two robots are shown in Fig. 4. Next, we assign durations to each segment. The total duration of the segments is computed using the ego robot’s maximum velocity γ , the timestamp T that the goal posi- tion should be reached by, and the current timestamp T.We 6.3.1 Constraints use T − T as the desired duration of the plan. However, ˜ ˜ if T − T is negative (i.e. T < T , meaning that the goal There are 4 types of constraints on the trajectory; all are linear position should been reached in the past), or T − T is small in the decision variables P. such that the robot cannot traverse the discrete segments even 1) Workspace constraints: We shift each bounding hyper- with its maximum velocity γ , we adjust the desired dura- plane of workspace W (there are a finite number of such tion to a feasible one, specifically the total length of segments hyperplanes as W is a convex polytope) to account for the divided by the maximum velocity (Line 6, Algorithm 2). We robot’s collision shape R, such that when the robot’s posi- distribute the feasible duration to the segments except for the tion is on the safe side of the shifted hyperplane, the entire first one, proportional to their lengths (Loop at line 8, Algo- robot is on the safe side of the original hyperplane (Line 2, rithm 2). We set the duration of the first segment, which has Algorithm 3). 123 Autonomous Robots Algorithm 3 TRAJECTORY-OPTIMIZATION (ii) it allows for a richer set of collision shapes than basic Input : {e ,..., e }: Endpoints of discrete segments Voronoi cells, which is valid only for hyperspherical objects, 0 L Input : {T ,..., T }: Durations of discrete segments 1 L and (iii) SVM cells are always convex unlike generalized Input : S : Set of robot shapes Voronoi cells. Input : O : Set of obstacles We buffer the ego robot’s SVM cell to account for its colli- Input : p : Current position TaskInput : R : Collision shape function sion shape R, and constrain the first piece of the trajectory to TaskInput : W : Workspace stay inside the buffered SVM cell (BSVM) (loop at Line 5, TaskInput : c : Degree of derivative up to which resulting trajectory Algorithm 3). Only the first piece of the trajectory is con- must be continuous strained to remain in the buffered SVM cell, since the entire Parameter : h : Degree of Bézier curves Parameter : o ˜ : Obstacle check distance planning pipeline is run after δt, which is smaller than the Parameter : r˜ : Robot check distance duration s of the first piece. At that time, planning begins at Parameter : p ˜ : Preferred distance to objects a new location, generating a new first piece that must remain Parameter : α : Preferred distance cost weight in the new buffered SVM cell. Parameter : θ : Endpoint cost weights Parameter : λ : Integrated derivative cost weights Buffering is achieved by changing the offset of the hyper- Return : Potentially dynamically infeasible trajectory plane. R(x) is the shape of the robot when placed at x, defined 1 QP is a quadratic program with variables P , where R is the shape of the robot when as R(x) ={x}⊕R 0 0 2 ϒ ← BUFFER-WORKSPACE(W, R) placed the origin and ⊕ is the Minkowski sum operator. Given 3 addWorkspaceConstraints(QP, ϒ ) 4 ϒ ←∅ ∀i ∈{1,..., L} a hyperplane with normal H and offset H , we find H  that n a a 5 for ∀S ∈ S min-dist(S , R(p)) ≤˜r do j j ensures R(x) is on the negative side of the hyperplane with 6 ϒ ← ϒ ∪{BUFFERED-SVM(S , R(p), R)} 1 1 j normal H and offset H whenever x is on the negative side n a 7 end of the hyperplane with normal H and offset H , and vice n a 8 for i = 1 → L do 9 ζ ← region swept by R from e to e versa, by setting H  = H + max H · y. The follow- i i −1 i a a y∈R n 10 for ∀Q ∈ O min-dist(Q,ζ ) ≤˜ o do ing shows that whenever R(x) is on the negative side of the 11 ϒ ← ϒ ∪{BUFFERED-SVM(Q,ζ , R)} i i i hyperplane (H , H ), x is on the negative side of the hyper- n a 12 end plane (H , H ). The converse can be shown by following n a 13 end the steps backwards. 14 addCollAvoidanceConstraints(QP, ϒ ,...,ϒ ) 1 L 15 addContinuityConstraints(QP, c, p, T ,..., T ) 1 L 16 addEnergyCostTerm(QP, λ, T ,..., T ) 1 L ∀y ∈ R(x) H · y + H ≤ 0 n a 17 addDeviationCostTerm(QP, θ, e ,..., e ) 1 L ⇒ max H · y + H ≤ 0 n a 18 ϒ ← SHIFT-HYPERPLANES(ϒ , p ˜) 1 1 y∈R(x) 19 addPreferredDistanceCostTerm(QP, ϒ ,α) ⇒ max H · y + H ≤ 0 20 f (t ) ← SOLVE-QP(QP) n a y∈{x}⊕R 21 return f (t ) ⇒ max H · (y + x) + H ≤ 0 n a y∈R ⇒ H · x + H + max H · y ≤ 0 n a n Let ϒ be the set of shifted hyperplanes of W. We con- y∈R strain each control point of the trajectory to be on the valid sides of the shifted hyperplanes (Line 3, Algorithm 3). Since Since the duration of the first piece was set to the safety Bézier curves are contained in the convex hulls of their con- duration s ≥ δt in discrete planning, the robot stays within trol points, constraining control points to be in a convex set its buffered SVM cell for at least δt. Moreover, since plan- automatically constrains the Bézier curves to stay within the ning is synchronized across all robots, the pairwise SVM same convex set. hyperplanes they compute will match, thus the buffered SVM 2) Robot-robot collision avoidance constraints: For robot- cells of robots are disjoint. This ensures robot-robot collision robot collision avoidance, recall that each robot replans with avoidance until the next planning iteration. the same period δt and planning is synchronized between Computed SVMs and BSVMs are shown in Fig. 5bfor a robots. two robot case. At each iteration, the ego robot computes its SVM cell To ensure that the number of constraints of the optimiza- within the SVM tessellation of robots using hard-margin tion problem does not grow indefinitely, SVM hyperplanes SVMs. SVM tessellation is similar to Voronoi tesselation, are only computed against those robots that are at most r˜ the only difference is that pairwise SVM hyperplanes are away from the ego robot. This does not result in unsafe computed between collision shapes instead of Voronoi hyper- trajectories so long as r˜ is more than the total maximum planes. We choose SVM tessellation because (i) hard-margin distance that can be traversed by two robots while follow- SVM is convex, hence pairs of robots can compute the same ing the first pieces of their trajectories, for which an upper 1 1 exact hyperplane under the assumption of perfect sensing, bound is max (γ s + γ s ), where s and s are i , j ∈{1,...,N } i j i j i j 123 Autonomous Robots Fig. 5 Trajectory Optimization. a Discrete segments of two robots com- robot’s color. SVM hyperplanes between the swept region and the obsta- puted at the discrete planning stage. b Robot-robot collision avoidance cles are given as light-colored lines. SVM hyperplanes are buffered to constraints are computed using BSVMs. The green hyperplane is the account for the robot’s collision shape and shown as dark-colored lines SVM hyperplane between two robots. Each robot shifts the SVM hyper- (BSVMs). The shift operations are shown as arrows. Obstacles and con- plane to account for its geometry, and constrains the first piece of the straints generated from them are colored using the same colors. For each trajectory with the resulting BSVM. c–e Active set of robot-obstacle piece, the feasible region that is induced by the robot-obstacle collision collision avoidance constraints for three different pieces (one belong- avoidance constraints is colored in gray. f Trajectories computed by the ing to the blue robot, two belonging to the red robot). In each figure, trajectory optimization stage are given the region swept by the robot while traversing the segment is shown in the durations of the first pieces of the trajectories of robots i avoidance, pairs of robots must compute matching hyper- and j respectively. planes using the same algorithm. 3) Robot-obstacle collision avoidance constraints: Buffered Elements of the active set of SVM and BSVM hyperplanes SVM hyperplanes are used for robot-obstacle collision avoid- for robot-obstacle collision avoidance are shown in an exam- ance as well. Let ζ ⊂ R be the region swept by the ego robot ple scenario in Fig. 5c–e for three different pieces. th while traversing the i segment from e to e . We compute Similar to robot-robot collision avoidance, to ensure that i −1 i the SVM hyperplane between ζ and each object in O,buffer the number of constraints of the optimization problem does it as explained before to account for robot’s collision shape, not grow indefinitely, we only compute SVM hyperplanes th and constrain the i piece by the resulting buffered SVM between ζ and obstacles that are not more than o ˜ away from (Loop at line 8, Algorithm 3). This ensures that trajectory ζ . pieces do not cause collisions with obstacles. Let ϒ be the set of buffered SVM hyperplanes that th The use of SVMs for collision avoidance against obsta- constrain the i piece ∀i ∈{1,..., L}. ϒ contains both cles is a choice of convenience, as we already use them for robot-robot and robot-obstacle collision avoidance hyper- robot-robot collision avoidance. For robot-obstacle collision planes while ϒ ∀ j ∈{2,..., L} contain only robot-obstacle avoidance, one can use any separating hyperplane between collision avoidance hyperplanes. This is because the first ζ and objects in O, while in the case of robot-robot collision piece is the only piece that we constrain with robot-robot 123 Autonomous Robots collision avoidance hyperplanes because it is the only piece feasibility of the kinematic constraints. The feasibility of the that will be executed until the next planning iteration. dynamic constraints cannot be ensured for arbitrary degrees 4) Continuity constraints: We add two types of continu- of continuity. ity constraints: (i) continuity constraints between planning Remark 2 Kinematic constraints of the optimization problem iterations, and (ii) continuity constraints between trajectory generated from a discrete path output from the discrete plan- pieces (Line 15, Algorithm 3). ning stage are feasible for the same path when the degree of To enforce continuity between planning iterations, we add Bézier curves h ≥ 1. constraints that enforce Reasoning. Any Bézier curve with degree h ≥ 1 can repre- j j d f (0) d p sent a discrete segment by setting half of the points to the = ∀ j ∈{0,..., c} j j dt dt start of the segment and other half to the end of the segment. Hence, we will only show that a discrete path output from where c is the task input denoting the continuity degree up discrete planning stage by itself satisfies the kinematic con- to which the resulting trajectory must be continuous and p is straints generated. Remember that discrete path output from the current position. the discrete planning stage has the property p = e = e . 0 1 To enforce continuity between trajectory pieces, we add Each robot-robot SVM problem is feasible (see Remark 1). constraints that enforce Let H be the normal and H be the offset of any of the n a j j robot-robot SVMs. It is shifted by setting the offset to H = d f (T ) d f (0) a i i i +1 j j H + max H · y. The point p satisfies H · p + H  ≤ 0 a y∈R n n a dt dt 0 because R(p) is on the negative side of the SVM. The robot- ∀i ∈{1,..., L − 1}∀ j ∈{0,..., c} robot BSVM hyperplanes are used to constrain the first piece th of the trajectory, and setting the first piece as a 0-length seg- where f (t ) is the i piece of the trajectory. ment with p = e = e satisfies the robot-robot collision 0 1 Remark 1 discusses that the SVM problems generated dur- avoidance constraints. ing trajectory optimization are feasible, i.e., the trajectory Each robot-obstacle SVM problem is feasible (see Remark 1). optimization stage will always succeed constructing the QP. Let H be the normal and H be the offset of any of the robot- n a Remark 1 All SVM problems generated for robot-robot and obstacle SVMs that is between ζ and an obstacle. It is shifted robot-obstacle collision avoidance from the discrete path out- by setting the offset to H  = H +max H ·y. All points a a y∈R n putted by the discrete planning stage are feasible. on the line segment p (t ) = e + t (e − e ), t ∈[0, 1] i i −1 i i −1 from e to e satisfy the BSVM constraint because R(p (t )) i −1 i i Reasoning. The discrete planning stage outputs a discrete is on the negative side of the SVM hyperplane ∀t ∈[0, 1]. path such that a robot with collision shape R following the Since each SVM hyperplane between ζ and obstacles is only path does not collide with any obstacles in O or any other th used to constrain the i piece, constraints generated by it are robots in S. It also ensures that p = e since the search starts feasible for the segment from e to e . i −1 i from the robot’s current position p. Hence, R(p) does not The feasibility of the workspace constraints are trivial intersect with any S ∈ S. Since each robot is assumed to since the robot moving along discrete segments is contained be convex, there exists at least one hyperplane that separates in the workspace, and we shift bounding hyperplanes of the R(p) from S for each j by the separating hyperplane the- workspace in the same way as SVM hyperplanes. Hence, the orem. SVM is an optimal separating hyperplane according discrete path satisfies the workspace constraints. to a cost function. Therefore, SVM problems between R(p) Initial point position continuity of the robot is satisfied by and S ∈ S for robot-robot collision avoidance are feasible. the given discrete segments since p = e . Position continuity Since the robot moving along the discrete segments is col- between segments are trivially satisfied by the given discrete lision free, ζ is collision free for all i. Also, since it is a sweep segments, since the discrete path is position continuous by of a convex shape along a line segment, ζ is convex as shown its definition. in Lemma 2 in Appendix A. Similar to the previous argument, since ζ is collision-free and convex and all obstacles in the 6.3.2 Cost function environment are convex, each SVM problem between ζ and Q ∈ O for robot-obstacle collision avoidance is feasible by The cost function of the optimization problem has 3 terms: the separating hyperplane theorem. (i) energy usage, (ii) deviation from the discrete path, and Workspace, robot-robot collision avoidance, robot-obstacle (iii) preferred distance to objects. collision avoidance, and position continuity constraints are We use the sum of integrated squared derivative magni- kinematic constraints. Higher-order continuity constraints tudes as a metric for energy usage (Line 16, Algortihm 3), are dynamic constraints. Remark 2 discusses the ensured similar to the works of Richter et al. (2013) and Hönig et al. 123 Autonomous Robots (2018). Parameters λ ={λ } are the weights for integrated Notice that we formulate continuity and the safety of th squared derivative magnitudes, where λ is the weight for j the trajectory using hard constraints. This ensures that the degree of derivative. The energy term J (P) is given as resulting trajectory is kinematically safe and continuous up energy to degree c if the optimization succeeds for all robots and T j planning is synchronized. d f (t ) J (P) = λ dt . energy j dt λ ∈λ 6.4 Temporal rescaling We use squared Euclidean distances between trajectory At the temporal rescaling stage, we check whether the piece endpoints and discrete segment endpoints as a metric dynamic limits of the robot are violated. If the resulting tra- for deviation from the discrete path (Line 17, Algorithm 3). jectory is valid, i.e. derivative magnitudes of the trajectory Remember that each Bézier curve piece i ends at its last are less than or equal to the maximum derivative magnitudes control point P . Parameters θ ={θ }∈ R are the weights i ,h i k γ ∀k ∈{1,..., K }, the trajectory is returned as the output for each segment endpoint. The deviation term J (P) is dev of RLSS. If not, temporal rescaling is applied to the trajec- given as tory similar to Hönig et al. (2018) and Park et al. (2020)by increasing the durations of the pieces so that the trajectory is J (P) = θ P − e . dev i i ,h i valid. We scale the durations of pieces by multiplying them i ∈{1,...,L} with a constant parameter greater than 1 until the dynamic limits are obeyed. The last term of the cost function models the preferred We choose to enforce dynamic feasibility outside of the distance to objects. We use this term to discourage robots trajectory optimization problem as a post processing step from getting closer to other objects in the environment; this because (i) the output of the trajectory optimization stage is increases the numerical stability of the algorithm by driving often dynamically feasible, hence rescaling is rarely needed, robots away from tight states. and (ii) adding piece durations as variables to the opti- We shift each hyperplane in ϒ (i.e., buffered SVM hyper- mization problem would make it a non quadratic program, planes constraining the first piece) by the preferred distance potentially decreasing performance. to objects, p ˜, to create the preferred hyperplanes ϒ (Line 18, Algorithm 3). Since each robot replans with the period δt,we add a cost term that drives the robot closer to the preferred 7 Evaluation hyperplanes at the replanning period (Line 19, Algorithm 3). We take the sum of squared distances between f (δt ) and Here, using synchronized simulations, we first evaluate our hyperplanes in ϒ : algorithm’s performance when different parameters are used in Sect. 7.1. Second, we conduct an ablation study to show the J (P) = α H f (δt ) + H pre f a effects of two important steps, namely the prepend operation of the discrete planning stage and the preferred distance cost H∈ϒ term of the trajectory optimization stage, to the performance where H is the normal and H is the offset of hyperplane n a of the algorithm in Sect. 7.2. Third, we compare our algorithm H, and α is the weight of J term. Notice that this term is pre f to two state-of-the-art baseline planners in Sect. 7.3. Finally, defined over the control points of first piece since δt ≤ s = we show our algorithm’s applicability to real robots by using T , supporting the utilization of ϒ only. 1 1 it on quadrotors and differential drive robots in Sect. 7.4. The overall trajectory optimization problem is: We conduct our simulations on a laptop computer with Intel i7-8565U @ 1.80GHz running Ubuntu 20.04. We min J (P) + J (P) + J (P) s.t. energy dev pre f implement our algorithm for a single core because of imple- mentation simplicity and fairness to the baseline planners, H P + H ≤ 0 ∀i ∈{1,..., L} i , j a which are not parallelized. The memory usage of each sim- ∀ j ∈{0,..., h} ulation is 30MB on average for our algorithm. In synchronized simulations, we compute trajectories for ∀H ∈ ϒ ∪ ϒ i W each robot using the same snapshot of the environment, move j j d f (0) d p = ∀ j ∈{0,..., c} robots perfectly according to computed trajectories for the re- j j dt dt planning period, and replan. The effects of planning iterations j j d f (T ) d f (0) i i i +1 taking longer than the re-planning period are not modeled in = ∀i ∈{1,..., L − 1} j j dt dt the synchronized simulations. We show that all algorithms ∀ j ∈{0,..., c}. (RLSS and the baselines) can work in 1Hz − 10Hz on 123 Autonomous Robots the hardware we use, and assert that more powerful com- volume swept by the ego robot while traversing a given puters can be used to shorten planning times. In addition, segment, and check collisions between the robot travers- parallelization of the A* search on GPUs is possible. For ing the segment and only the obstacles returned from the example, Zhou and Zeng (2015) show the possibility of 6–7 query. Also, to generate robot-obstacle collision avoidance x speedup in A* search for pathfinding problems on GPUs. constraints, we execute axis aligned box queries around the Moreover, parallelization of quadratic program solving is segments to retrieve nearby obstacles. We generate BSVM possible through (i) running multiple competing solvers in constraints only against nearby obstacles that are no more multiple cores and returning the answer from the first one that than o ˜ away from the robot traversing the segment. Other pop- solves the problem, or (ii) parallelizing individual solvers. ular approaches for environment representations include (i) For instance, IBM ILOG CPLEX Optimizer and Gurobi Euclidean signed distance fields (ESDFs) (Oleynikova et al., Optimizer support running multiple competing solvers con- 2017), which support fast distance queries to nearest obsta- currently. IBM ILOG CPLEX Optimizer supports a parallel cles, (ii) 3D circular buffers (Usenko et al., 2017), which aim barrier optimizer, which parallelizes a barrier algorithm for to limit memory usage of maps and supports fast occupancy solving QPs. Gondzio and Grothey (2009) propose a par- checks, and (iii) Gaussian mixture models (O’Meadhra et allel interior point method solver that exploits nested block al., 2019), which continuously represent occupancy instead structure of problems. Performance improvements of these of discretizing the environment as the former approaches do. approaches are problem dependent, and we do not investigate None of these representations are as suitable as OcTrees for how much the performance of our trajectory optimization RLSS since they do not allow fast querying of obstacles in stage could be improved with such methods. the vicinity of segments. We use the IBM ILOG CPLEX In all experiments, 32 robots are placed in a circle forma- Optimizer to solve our optimization problems. In all experi- tion of radius 20m in 3D, and the task is to swap all robots to ments, robots that are closer than 0.25m to their goal positions the antipodal points on the circle. The workspace W is set to are considered as goal reaching robots. If a robot that has not reached its goal does not change its position more than 1cm an axis aligned bounding box from −25m −25m 0m to in the last 1s of the simulation, it is considered as a dead- 25m 25m 5m . Robot collision shapes are modeled as axis locked robot. At any point of the simulation, if each robot is aligned cubes with 0.2m edge lengths. The desired planning either at the goal or deadlocked, we conclude the simulation. horizon τ is set to 5s. The safety distance D of goal selection is set to 0.2m. Robots have velocity limit γ = 3.67 , and 2 m 7.1 Effects of selected parameters accelaration limit γ = 4.88 , which are chosen arbitrarily. The safety duration s is set to 0.11s and re-planning period We evaluate the performance of our algorithm when 4 impor- δt is set to 0.1s. We set integrated derivative cost weights tant parameters are changed: step size σ of the search grid, λ = 2.0 for velocity and λ = 2.8 for acceleration. We 1 2 degree h of Bézier curves, obstacle check distance o ˜, and set endpoint cost weights to θ = 0, θ = 150, θ = 240, 1 2 3 robot check distance r˜. Step size σ of the search grid is the θ = 300 ∀i ≥ 4. Setting θ = 0 allows the optimiza- i 1 parameter that affects discrete planning performance most tion to stretch the first 0-length segment freely, and setting because it determines the amount of movement at each step other θs incrementally increases the importance of tail seg- during the A search. The degree h of Bézier curves is ments. We set the preferred distance to objects to p ˜ = 0.6m important in trajectory optimization because it determines and the preferred distance cost weight α = 0.3. We use the the number of decision variables. The obstacle check dis- OcTree data structure from the octomap library (Hornung tance o ˜ and robot check distance r˜ determine the number of et al., 2013) to represent the environment. Each leaf-level collision avoidance constraints in the optimization problem. occupied axis aligned box of the OcTree is used as a sep- In all of the parameter evaluations, we use a random 3D arate obstacle in all algorithms. OcTree allows fast axis forest environment with OcTree resolution 0.5m in which aligned box queries which return obstacles that intersects 10% of the environment is occupied. There are 2332 leaf- with a given axis aligned box, an operation we use exten- level boxes in OcTree, translating to 2332 obstacles in total. sively in our implementation. For example, we use axis We set the desired trajectory of each robot to the straight line aligned box queries in discrete planning as broadphase col- segment that connects the robot’s start and goal positions. lision checkers to find the obstacles that are close to the We set the duration of the segment to the length of the line segment divided by the maximum velocity of the robot. We https://www.ibm.com/docs/en/icos/22.1.0?topic=optimizers- enforce continuity up to velocity, hence set c = 1. concurrent-optimizer-in-parallel. In our experiments, our algorithm does not result in any https://www.gurobi.com/documentation/9.5/refman/ collisions or deadlocks. concurrent_environments.html. https://www.ibm.com/docs/en/icos/22.1.0?topic=optimizers-using- parallel-barrier-optimizer. https://www.ibm.com/analytics/cplex-optimizer. 123 Autonomous Robots 1 m We report average computation time per iteration (Fig. 6) γ = 3.67 , the maximum amount of distance that can and average navigation duration of robots (Fig. 7). Average be traversed by a robot until the next planning iteration is navigation duration is the summed total navigation time for 0.367m. The obstacle check distance must be more than this all robots divided by the number of robots. value for safety. We set step size σ = 0.77m, robot check distance r˜ = 2.0m, and degree of Bézier curves h = 12, 7.1.1 Step size  of discrete search which are determined by premiliminary experiments and the results of the experiments in Sects. 7.1.1, 7.1.2, and 7.1.4. We evaluate our algorithm’s performance when step size σ The results are summarized in Figs. 6c and 7c. is changed. We set σ to values between 0.25m to 1.5m with The obstacle check distance is the most important param- 0.25m increments. We set obstacle check distance o ˜ = 1.0m, eter that determines the speed of trajectory optimization, and robot check distance r˜ = 2.0m, and degree of Bézier curves hence the planning pipeline. As o ˜ increases, the number of h = 12 in all cases, which are determined by premil- SVM computations and the number of constraints in the iminary experiments and the results of the experiments in optimization problem increases, which results in increased Sects. 7.1.2, 7.1.3, and 7.1.4. The results are summarized in computation time. Average navigation durations of the robots Figs. 6a and 7a. are close to 22.5m in all cases, suggesting the robustness As the step size gets smaller, discrete search takes more of the algorithm to this parameter. We explain the reason time; but the algorithm can still work in about 2Hz even when of this robustness as follows. All obstacles are considered σ = 0.25m. The average navigation duration of robots are during discrete search and o ˜ determines the obstacles that close to 22.5s in each case, suggesting the robustness of the are considered during trajectory optimization. Therefore, the algorithm to the changes in this parameter. At σ ≥ 0.75, path suggested by the discrete search is already very good, the time discrete planning takes is less than 6% of the time and obstacle avoidance behavior of trajectory optimization trajectory optimization takes. is only important when the discrete path is close to obsta- We also run the algorithm with σ = 3m,σ = 6m, and cles. In those cases, all obstacle check distances capture the σ = 12m, which decreases the flexibility of the discrete obstacles in the vicinity of the path. Therefore, the quality of search considerably. In all of those cases, discrete search the planned trajectories does not increase as o ˜ increases. results in fluctuations, and some robots get stuck in livelocks, in which they move between same set of positions without reaching to their goal positions. 7.1.4 Robot check distance r˜ 7.1.2 Degree h of Bézier curves Last, we evaluate our algorithm’s performance when the robot check distance r˜ is changed. We set r˜ to values between Next, we evaluate our algorithm’s performance when the 1m and 3.5m with 0.5m increments. Robot check distance degree h of Bézier curves is changed. We set h to values must be at least twice the amount of distance that can be in {5,..., 12}. We set step size σ = 0.77m, obstacle check traversed by the robot in one planning iteration, i.e. 0.734m, distance o ˜ = 1.0m, and robot check distance r˜ = 2.0m, because two robots may be travelling towards each other which are determined by premiliminary experiments and the with their maximum speed in the worst case. We set step size results of the experiments in Sects. 7.1.1, 7.1.3, and 7.1.4. σ = 0.77m, obstacle check distance o ˜ = 1.0m, and degree The results are summarized in Figs. 6b and 7b. of Bézier curves h = 12, which are determined by premil- Even if the degree of the Bézier curves determine the iminary experiments and the results of the experiments in number of decision variables of the trajectory optimization, Sects. 7.1.1, 7.1.2, and 7.1.3. the computation time increase of the trajectory optimization The speed of the algorithm is not affected by the robot stage is not more than 10% between degree 5 Bézier curves check distance considerably, because there are 32 robots in and degree 12 Bézier curves. Also, average navigation dura- the environment, and from the perspective of the ego robot, tion of robots are close to 22.5m in each case, suggesting the there are at most 31 constraints generated from other robots. robustness of the algorithm to the changes in this parameter Since there are more than 2000 obstacles in the environment, as well. effects of the obstacle check distance are more drastic than robot check distance. Similar to other cases, average naviga- 7.1.3 Obstacle check distance o ˜ tion duration of the robots is not affected by the choice of r˜ because constraints generated by the robots far away do Next, we evaluate our algorithm’s performance when the not actually constrain the trajectory of the ego robot since obstacle check distance o ˜ is changed. We set o ˜ to val- the ego robot cannot move faster than its maximum speed ues between 0.5m and 3m with 0.5m increments. Since and hence the first piece of the trajectory is never affected by the replanning period δt = 0.1s, and maximum velocity those constraints. 123 Autonomous Robots Fig. 6 Average computation time per stage are given when different parameters are used. Goal selection and temporal rescaling steps are given as“other”since they take a small amount time compared to other two stages. All experiments are done in a 3D random forest environment with 10% occupancy Fig. 7 Average navigation duration of robots from their start positions to their goal positions are given when the selected parameters are changed. In all cases, average navigation duration is not affected by the changes in the selected parameters in the chosen ranges Overall, RLSS does not result in collisions or deadlocks line segments divided by the maximum speed of the robots. when parameters are not set to extreme values. In addi- During simulation, robots continue using their existing plans tion, changes in parameters, outside of extreme ranges, do when planning fails. Also, colliding robots continue navigat- not result in significant changes in the average navigation ing and are not removed from the experiment. durations. These suggest that RLSS does not need extensive We generate ten 3D-maze like environments and list the parameter tuning. average and standard deviation values for our metrics in Table 2. We report the failure rate of all algorithms in the form of the ratio of number of failures to the number of plan- 7.2 Ablation study ning iterations (Fail Rate column in Table 2), the number of robots that are involved in at least one collision during navi- We investigate the effects of (i) the prepend operation of gation (# Coll. column in Table 2) and the average navigation the discrete planning stage (Line 4, Algorithm 2) and (ii) the duration of all robots (Avg. Nav. Dur. column in Table 2). preferred distance cost term J of the trajectory optimiza- pre f The failure rate of RLSS is 0.01% on average. RLSS w/o tion stage to the performance of the algorithm. The prepend pref. dist. fails 0.06% of the time. RLSS w/o prepend fails operation enables the kinematic feasibility of the generated 12.73% of the time. This is drastically more than RLSS optimization problem as shown in Remark 2. The preferred w/o pref. dist because our prepend operation ensures the distance cost term increases the numerical stability of the kinematic feasibility of the optimization problem while our algorithm by encouraging robots to create a gap between preferred distance cost term tackles numerical instabilities themselves and other objects. only. RLSS with neither results in a failure rate of 10.62%. We consider four versions our algorithm: RLSS, RLSS Interestingly, RLSS with the preferred distance cost term but without the prepend operation (RLSS w/o prepend), RLSS without the prepend operation (RLSS w/o prepend) results without the preferred distance cost term (RLSS w/o pref. in a higher failure rate than RLSS with neither. We do not dist.) and RLSS with neither. We set step size σ = 0.77m, investigate the root cause of this since failure rates in both obstacle check distance o ˜ = 1.0m, robot check distance cases are a lot higher than of RLSS. r˜ = 2.0m, degree of Bézier curves h = 12, and c = 1 The effects of failures are seen in the next two metrics. (continuity up to velocity). Robots navigate in 3D maze like RLSS results in no collisions. The number of colliding robots environments. The desired trajectories are set to straight line increase to 0.67 in RLSS w/o pref. dist, 6.78 in RLSS w/o segments connecting robot start positions to goal positions prepend and 8.67 in RLSS with neither on average. The and the durations of the segments are set to the length of the 123 Autonomous Robots Table 2 The results of the Fail Rate # Coll Avg. Nav. Dur [s] ablation study RLSS 0.89 (1.45) / 7904 (297) 0 (0) 24.68 (0.93) RLSS w/o pref. dist 4.78 (9.81) / 8194 (489) 0.67 (1) 25.58 (1.53) RLSS w/o prepend 1556 (120) / 12,220 (538) 6.78 (3.73) 38.17 (1.68) RLSS with neither 1569 (101) / 14,778 (412) 8.67 (2.87) 46.16 (1.29) We compare four different versions of RLSS. We ablate (i) the prepend operation of the discrete planning stage and (ii) the preferred distance cost term of the trajectory optimization stage. The details of the metrics are given in Sect. 7.2. The reported values are means and standard deviations (given in parentheses) of 10 experiments in random maze-like environments. Both prepend operation and preferred distance cost term are important for the effectiveness of the algorithm average navigation duration of robots is lowest in RLSS. It time invariant systems with position output. We define the increases by 3.65% in RLSS w/o pref. dis, 54.66% in RLSS systems with three matrices A, B, C. Since the output of the w/o prepend and 87.03% in RLSS with neither compared to system is the position of the robot, it cannot depend on the RLSS. current input, hence D = 0. This allows us to formulate the These results show that the prepend operation is more problem for a richer set of dynamics and have constraints on important than the preferred distance cost term for the success higher order derivatives than robot velocity. of the algorithm. Nevertheless, RLSS needs both to be safe BVC as published also does not consider obstacles in the and effective. environment. We extend their formulation to static obstacles by modeling obstacles as robots. Since obstacles are static, they stay within their buffered Voronoi cells at all times. 7.3 Comparisons with baseline planners Since we model obstacles as robots, extended BVC does not require the ability of distinguishing robots from obstacles. We compare the performance of our planner to two baseline Our extended BVC formulation, which we call eBVC, is planners that do not require communication in 3D experi- as follows: ments. We set step size σ = 0.77m, obstacle check distance o ˜ = 1.0m, robot check distance r˜ = 2.0m, and degree of M M −1 Bézier curves h = 12 in RLSS in all cases. min λ p − d(T + M t ) + θ u s.t . i i i i u ,...,u 0 M −1 i =1 i =0 x = Ax + Bu ∀i ∈{0,..., M − 1} i +1 i i 7.3.1 Extended buffered Voronoi cell (eBVC) planner p = Cx ∀i ∈{0,..., M } i i The first baseling planner is a MPC-style planner based on p ∈ V ∀i ∈{0,..., M } buffered Voronoi cells introduced by Zhou et al. (2017), p ∈ W ∀i ∈{0,..., M } which we call BVC. In the BVC approach, each robot com- u  u  u ∀i ∈{0,..., M − 1} min i max putes its Voronoi cell within the Voronoi tesselation of the x  x  x ∀i ∈{0,..., M } environment. This is done by using the position informa- min i max tion of other robots. Robots buffer their Voronoi cells to account for their collision shapes and plan their trajectories where T is the current timestamp, d(t ) is the desired trajec- within their corresponding buffered Voronoi cells. Similar to tory for the robot, t is the discretization timestep of the RLSS, BVC does not require any communication between system, M is the number of steps to plan for, u is the input robots, requires perfect sensing of robot positions in the envi- to apply from timestep i to i + 1, x is the state at timestep i, ronment, and the resulting trajectories are safe only when p is the position at timestep i, V is the buffered Voronoi cell planning is synchronized between robots. They also require of the robot, u and u are the limits for the inputs, x min max min that each robot stays within its buffered Voronoi cell until and x are the limits for the states (which can be used to max the next planning iterations. Unlike RLSS, BVC is based on bound velocity in a double integrator system, or velocity and buffered Voronoi cells and it cannot work with arbitrary con- acceleration in a triple integrator system for example), and vex objects. Instead, robots are modeled as hyperspheres in W is the workspace of the robot. M t is the planning hori- BVC. zon. A robot plans toward the position d(T + M t ), which The original formulation presented in the BVC article only is the position of the robot after the planning horizon if it allows position state for the robots, which cannot model a could follow the desired trajectory perfectly. x is the current rich set of dynamics, including double, triple, or higher order state of the robot. The first term of the cost function penalizes integrators. We extend their formulation to all discrete linear deviation from the goal position for the final and each inter- 123 Autonomous Robots mediate position with different weights λ . The second term of the cost function is the input cost that penalizes input mag- nitudes with different weights θ . We apply the first input of the solution for duration t, and replan at the next timestep. We use our own implementation of eBVC as explained above during the comparisons. 7.3.2 Relative safe flight corridor (RSFC) planner The second planner we compare against is presented by Park and Kim (2021), in which piecewise Bézier curves are com- puted, executed for a short duration, and replanning is done at the next iteration similar to our work. It utilizes the fact that the difference of two Bézier curves is another Bézier curve by constraining these relative Bézier curves to be inside safe regions (relative safe flight corridors, or RSFCs) defined Fig. 8 Desired trajectories and the used forest map of experiment 4 is given in (a). Same forest map is used in each forest experiment. according to robot collision shapes. We call this algorithm Desired trajectories and the used maze map of experiment 7 is given RSFC for short. RSFC does not require any communication in (b). Same maze map is used in each maze experiment. c shows the between robots. It utilizes both positions and velocities of executed trajectories of robots running RLSS from top in experiment 4. other objects in the environment, hence requires more sens- d shows the executed trajectories of robots running RLSS from top in experiment 7 ing information than our algorithm. Velocities are used to predict the trajectories of other robots as piecewise Bézier curves. While it can handle dynamic obstacles as well, we use it in static environments in our comparisons, since RLSS velocity continuity is required, the system of eBVC is a does not handle dynamic obstacles explicitly. double integrator with position output. When acceleration We use the authors’ implementation of RSFC during our continuity is required, the system of eBVC is a triple inte- comparisons. grator with position output. We compare RSFC to RLSS and eBVC only with accelera- 7.3.3 Experiments and results tion continuity requirement because the authors’ implemen- tation hard codes the acceleration continuity requirement, We compare RLSS against eBVC and RSFC in 10 differ- while in theory it can work with any degree. Also, we com- ent experiments differing in required degree of continuity, pare RSFC only in the case of an empty prior map in order desired trajectories, and map of the environment. In all exper- not to change RSFC’s source code, while in theory it can be iments, 32 robots are placed in a circle formation with radius guided with arbitrary trajectories. 20m in 3D. The task is to swap the positions of robots to the We plan for 5s long trajectories in every 0.1s with all antipodal points on the circle. The results of the experiments algorithms. In eBVC, we plan for M = 50 steps with dis- are summarized in Table 3. cretization timestep t = 0.1s. We set state and input upper There are 3 maps we use: empty, forest (Fig. 8a), and maze and lower bounds in eBVC in order to obey the dynamic lim- (Fig. 8b), listed in the map column of Table 3. In the empty its of the robots. We set distance to goal weights λ = 120, map, there are no obstacles in the environment. The forest λ = 20 ∀i ≥ 2 in eBVC, putting more importance on the map is a random forest with 10% occupancy; it has a radius position of the robot after 1 timestep. We set θ = 1 ∀i in of 15m and each tree is a cylinder with radius 0.5m.The eBVC. maze map is a maze-like environment with choke regions. Both eBVC and RSFC require spherical obstacles. We use We compute the desired trajectories of the robots by run- the smallest spheres containing each leaf-level box of the ning a single-agent shortest path using the discrete planning OcTree structure as obstacles in eBVC and RSFC. Similar to stage of RLSS. We run single-agent shortest path on a prior RLSS, we use robot and obstacle check distance to limit the map, which is set to either a full map of the environment or number of obstacles considered at each iteration. We set both to an empty map; this is listed in the prior map column of obstacle and robot check distance o ˜ =˜r = 2.0m in eBVC, Table 3. When the prior map is empty, single-agent short- and set o ˜ =˜r = 5.0m in RSFC, since smaller values in RSFC est paths are straight line segments connecting robot start result in a high number of collisions and higher values for positions to robot goal positions. the parameters do not improve the success of the algorithms. We set two different continuity requirements: velocity and Robots are modeled as spheres in eBVC and RSFC as well. acceleration, listed in the continuity column of Table 3. When We set robot shapes to spheres with radius 0.173m, which 123 Autonomous Robots Table 3 The results of the comparisons of RLSS, eBVC, and RSFC are summarized Experiment Map Prior Map Continuity Algorithm # Deadlocks # Coll. Robots Avg. Collision Dur. [s] Avg. Nav. Dur. [s] Avg. Comp. Time [ms] 1 Empty Empty Velocity RLSS 0 0 NA 22.37 107 eBVC 0 0 NA 18.50 106 2 Acceleration RLSS 0 0 NA 22.12 110 eBVC 0 0 NA 20.65 161 RSFC 0 0 NA 19.83 47 3 Forest Forest Velocity RLSS 00 NA 23.11 182 eBVC 10 7 0.53 18.94 387 4 Acceleration RLSS 00 NA 23.06 147 eBVC 8 13 0.83 21.24 763 5 Forest Empty Velocity RLSS 00 NA 22.62 192 eBVC 8 24 1.35 18.54 186 6 Acceleration RLSS 00 NA 22.72 244 eBVC 11 15 0.71 21.28 970 RSFC 0 6 0.42 21.82 201 7 Maze Maze Velocity RLSS 00 NA 25.09 160 eBVC 21 9 0.67 20.12 145 8 Acceleration RLSS 00 NA 25.32 170 eBVC 17 13 1.49 23.80 264 9 Maze Empty Velocity RLSS 00 NA 27.98 410 eBVC 32 0 NA NA 176 10 Acceleration RLSS 00 NA 32.04 386 eBVC 30 2 3.26 21.75 407 RSFC 0 22 0.34 28.23 80 Each experiment differ in the map used during navigation, prior map used during desired trajectory computation, and the required degree of continuity. RLSS results in longer navigation durations than eBVC and RSFC on average; but eBVC suffers from deadlocks in environments with obstacles and RSFC results in collisions in environments with obstacles Autonomous Robots are the smallest spheres containing the actual robot shapes. lot better than eBVC. RSFC results in no deadlocks while We count collisions only when contained leaf-level OcTree eBVC results in 11 deadlocks. 15 out of remaining 21 robots boxes and contained robot shapes intersect. This gives both get involved in at least one collision when eBVC is used with eBVC and RSFC buffer zones for collisions. an average collision duration of 0.71s. 6 out of 32 robots We report the number of deadlocking robots (# Dead- running RSFC collide at least once with average collision locks column in Table 3), number of robots that are involved duration of 0.42s. RLSS does not result in any deadlocks or in at least one collision (# Coll. Robots column in Table 3), collisions. collision duration of robots that are involved in collisions When the environment is a complicated maze, the per- averaged over robots (Avg. Collision Dur. column in Table 3), formance of both eBVC and RSFC degrades more. With full average navigation duration of non-deadlocking robots from prior map and velocity continuity (experiment 7), 21 out of 32 their start positions to goal positions (Avg. Nav. Dur. column robots deadlock, 9 out of remaining 11 are involved in colli- in Table 3), and computation time per iteration averaged over sions, leaving only 2 reaching to their goal volumes without each planning iteration of each robot (Avg. Comp. Time col- an incident when eBVC is used. With full prior map and umn in Table 3). We continue the navigation of colliding acceleration continuity (experiment 8), 17 out of 32 robots robots and do not remove them from the experiment. deadlock, 13 out of remaining 15 are involved in collisions, Executed trajectories of robots running RLSS are shown again leaving only 2 that reach their goal volumes without in Fig. 8c for experiment 4, and in Fig. 8d for experiment 7 incident when eBVC is used. RLSS does not result in any as examples. deadlocks or collisions in these scenarios. When the prior RLSS does not result in any deadlocks or collisions in map is empty, i.e. the desired trajectories are straight line all cases. eBVC has a significant number of deadlocks and segments, the performance of eBVC degrades even more. All RSFC results in collisions in experiments with obstacles. robots deadlock in the case with velocity continuity (experi- When there are no obstacles in the environment, e.g., ment 9), while 30 out of 32 robots deadlock and the rest get in experiments 1 and 2, no algorithm results in deadlocks involved in collisions in the case with acceleration continuity or collisions. When velocity continuity is required, e.g., in (experiment 10). No robot running RSFC deadlocks but 22 experiment 1, the average navigation duration of robots run- out of 32 get involved in collisions at least once with 0.34s ning eBVC is 18% lower than those that run RLSS. Both average collision duration in experiment 10 with accelera- eBVC and RLSS run close to 9Hz on average. When acceler- tion continuity. RLSS does not result in any deadlocks or ation continuity is required, e.g., in experiment 2, the average collisions in those cases. navigation duration of robots running RSFC is 10% lower Overall, robots running RLSS have higher navigation than those that run RLSS; and the average navigation dura- durations than those that use eBVC or RSFC in all exper- tion for eBVC is 7% lower than that for RLSS. RLSS runs at iments. While the higher navigation duration is not an about 9Hz on average, eBVC runs close to 6Hz on average, important metric when other algorithms cause deadlocks or and RSFC runs close to 21Hz on average. When there are collisions, it is an important metric when they do not (exper- no obstacles in the environment, the discrete search of RLSS iments 1 and 2 with no obstacles in particular). eBVC does results in unnecessary avoidance movements, which is the not have an integrated discrete planner, which is the rea- main reason for average navigation duration differences. son for its good performance in terms of average navigation When there are obstacles in the environment, performance durations. When robots are close to each other, RLSS uses of both eBVC and RSFC degrades in terms of the number of the free space less effectively since discrete planning has a deadlocks and collisions. step size 0.77m. eBVC does not rely on discrete planning In experiment 3, even if the full prior map of the envi- and hence avoids other robots by executing small direction ronment is given during initial discrete search with only changes. RSFC utilizes velocities of other robots on top of velocity continuity, 10 out of 32 robots deadlock, and 7 out positions, which allows it to estimate the intents of robots of the remaining 22 get involved in at least one collision more effectively, resulting in a better usage of the free space, when eBVC is used. When acceleration continuity is required and hence results in better navigation durations on average. (experiment 4), 8 robots deadlock and 13 other robots get Since RLSS does not utilize communication and uses only involved in collisions, resulting in only 11 robot reaching positions of other robots, it cannot deduce the intents of other their goal volumes without collisions in eBVC. RLSS both robots. This results in fluctuations of plans between planning works faster than eBVC and results in no deadlocks or colli- iterations, which increases the average navigation durations sions in those cases. of robots. Fluctuations of plans increase when the environ- When the prior map is not known in the forest environment ment is dense as seen in the supplemental video. If the (experiments 5 and 6), eBVC results in a lot of deadlocks environment becomes overly constraining, e.g. tens of robots and collisions. All robots in experiment 5 either deadlock or collide when eBVC is used. In experiment 6, RSFC does a https://youtu.be/Jrdvf2qyzrg. 123 Autonomous Robots Turtlebot3s, and Turtlebot2s are equipped with ODROID XU4 and ODROID C1+ single board computers running ROS Kinetic on the Ubuntu 16.04 operating system. In all cases the algorithm is run on a centralized base station com- puter using separate processes for each robot. Therefore, unlike simulations, planning is not synchronized between robots on real robot implementations. RLSS does not result in any deadlocks in collisions in these asynchronized deploy- ments as well. Commands are sent to 2D robots over a WiFi network, and to Crazyflie 2.0 s directly over their custom radio. We conduct external disturbance experiments with 2 iRobot Create2s, 3 Turtlebot3s, and 2 Turtlebot2s (Fig. 9a). A human changes the positions of some robots by moving them arbitrarily during execution several times. In all cases, robots replan in real-time and avoid each other successfully. We demonstrate the algorithm in 3D using 6 Crazyflie 2.0 s. We conduct an experiment without obstacles in which Crazyflies swap positions with straight lines as desired tra- jectories (Fig. 9b). In another experiment, we show that Fig. 9 Physical robot experiments using Turtlebot2s, Turtlebot3s, Crazyflies can navigate through an environment with obsta- iRobot Create2s, and Crazyflie 2.0 s. RLSS works in real-time under cles (9c, d). In each case, we externally disturb the Crazyflies external disturbances and show that they can replan in real-time. The recordings for our physical robot experiments are trying to pass through a narrow tube, these fluctuations may included in the supplemental video. turn into livelocks. However, when obstacles are introduced in the envi- ronment, the performance of RLSS is better than other 8 Conclusion algorithms. eBVC suffers greatly from deadlocks. RSFC does not result in deadlocks but results in collisions, even In this article, we present RLSS, a real-time decentralized while using more information than RLSS (velocity and posi- long horizon trajectory planning algorithm for the navigation tion instead of position only). of multiple robots in shared environments with obstacles that A note about the statistical significance of the results. In provides guarantees on collision avoidance if the resulting each experiment, we run each algorithm on single randomly problems are feasible. The generated optimization problem to generated forest-like or maze-like environment. To show that compute a smooth trajectory is convex and kinematically fea- the results are consistent for environment types, we generate sible. It does not require any communication between robots, 10 forest-like environments with the same parameters for requires only position sensing of obstacles and robots in the experiment 3 and run RLSS and eBVC. RLSS does not result environment, and robots to be able to distinguish other robots in deadlocks or collisions in any of the cases. The average from obstacles. With its comparatively minimal sensing navigation duration of robots averaged over 10 environments requirements and no reliance on communication, it presents a is 19.15s with standard deviation 0.24s for eBVC and 23.59s new baseline for algorithms that require communication and with standard deviation 0.25s for RLSS. The ratio between sensing/prediction of higher order state components of other average navigation durations when eBVC or RLSS is used robots. The algorithm considers the dynamic limits of the is consistent with the reported values given in Table 3 for robots explicitly and enforces safety using hard constraints. experiment 3. We show in synchronized simulation that RLSS performs better than two state-of-the-art planning algorithms (eBVC 7.4 Physical robots and RSFC), one of which requires velocity sensing on top of position sensing, in environments with obstacles in terms of We implement and run RLSS on physical robots using iRobot Create2s, Turtlebot2s and Turtlebot 3 s in 2D; and Since we define robots’ goals as single points, i.e. sets of measure Crazyflie 2.0 s in 3D. We use a VICON motion tracking sys- zero, in physical experiments, robots keep missing their goals slightly. tem for localization. Robots do not sense, but receive the This results in a spinning behavior in 2D since robots continuously fix position of others using the VICON system. iRobot Create2s, their positions by replanning. 123 Autonomous Robots number of deadlocks and number of colliding robots. In our Appendix A: Sweep of a convex shape along experiments, RLSS does not result in any deadlocks or colli- a line segment is convex sions, while eBVC suffers from deadlocks and RSFC results in collisions (while RLSS provides theoretical guarantees on Lemma 2 Let R(x) ={x}⊕ R be the Minkowski sum of d d collision avoidance when it succeeds, it does not provide {x}∈ R and R ⊆ R , and let R be a convex set. Let 0 0 theoretical guarantees on deadlock avoidance). When there p(t ) = a + t (b − a) where t ∈[0, 1] be the line segment d d are no obstacles in the environment, RSFC and eBVC out- from a ∈ R to b ∈ R . Then, the swept volume of R along perform RLSS in terms of average navigation duration by p(t ) is convex. 7–20%. Proof The swept volume ζ of R from a to b can be defined In future work, we would like to extend our planner to 1 1 as ζ =∪ R(p(t )) =∪ {p(t )}⊕ R . Choose two points asynchronously planning robots where each robot starts and 0 t =0 t =0 q , q ∈ ζ . ∃t ∈[0, 1]∃q ˜ ∈ R q = p(t ) + q˜ and ∃t ∈ 1 2 1 1 0 1 1 1 2 finishes planning at different unknown timestamps by utiliz- [0, 1]∃q ˜ ∈ R q = p(t ) + q˜ .Let q = (1 − t )q + t q 2 0 2 2 2 1 2 ing communication. Also, we would like to incorporate the be a point on the line segment connecting q and q where 1 2 noise in the sensing systems within our algorithm in order to t ∈[0, 1]. solve the cases with imperfect sensing in a principled way. Supplementary Information The online version contains supplemen- q = (1 − t )(p(t ) + q ˜ ) + t (p(t ) + q ˜ ) 1 1 2 2 tary material available at https://doi.org/10.1007/s10514-023-10104- = (1 − t )(a + t (b − a) + q ˜ ) 1 1 w. +t (a + t (b − a) + q ˜ ) 2 2 Acknowledgements This work was supported by National Science = (1 − t )q ˜ + t q ˜ 1 2 Foundation awards IIS-1724399, IIS-1724392, and CPS-1837779. B. Senba ¸ slar ¸ was supported by a University of Southern California +a + (t (1 − t ) + t t )(b − a) 1 2 Annenberg Fellowship. Wolfgang Hönig was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Founda- (t (1 − t ) + t t ) ∈[0, 1] because it is a convex combination tion) - 448549715. 1 2 of t , t ∈[0, 1]. Therefore a + (t (1 − t ) + t t )(b − a) ∈ 1 2 1 2 Funding Open access funding provided by SCELC, Statewide Califor- p(t ).Also, (1 − t )q ˜ + t q ˜ ∈ R , because it is a convex 1 2 0 nia Electronic Library Consortium combination of q ˜ , q ˜ ∈ R and R is convex. Therefore, 1 2 0 0 q ∈ ζ since ∃t ∈[0, 1] q ∈{p(t )}⊕ R . Hence, ζ is Declarations convex. Conflict of interest Baskın Senba ¸ slar ¸ , the first author, has moved to Prof. Gaurav S. Sukhatme’s group in January 2022 after the initial submis- sion of this paper. Prof. Sukhatme is the editor-in-chief of Autonomous References Robots journal and the advisor of Baskın Senba ¸ slar ¸ at the time of writing. Prof. Sukhatme was not involved in ideation, writing and exper- Alonso-Mora, J., Breitenmoser, A., Rufli, M., Beardsley, P., Siegwart, iments of this paper in any way. This paper is submitted to Robot R. (2013). Optimal reciprocal collision avoidance for multiple non- Swarms in the Real World: from Design to Deployment special issue of holonomic robots. In Distributed autonomous robotic systems: The Autonomous Robots, which has Dr. Siddharth Mayya as the lead guest 10th international symposium (pp. 203–216). https://doi.org/10. editor. 1007/978-3-642-32723-0. Barer, M., Sharon, G., Stern, R., & Felner, A. (2014). Suboptimal Open Access This article is licensed under a Creative Commons variants of the conflict based search algorithm for the multi- Attribution 4.0 International License, which permits use, sharing, adap- agent pathfinding problem. Frontiers in Artificial Intelligence tation, distribution and reproduction in any medium or format, as and Applications, 263, 961–962. https://doi.org/10.3233/978-1- long as you give appropriate credit to the original author(s) and the 61499-419-0-961 source, provide a link to the Creative Commons licence, and indi- Batra, S., Huang, Z., Petrenko, A., Kumar, T., Molchanov, A., cate if changes were made. The images or other third party material Sukhatme, G. S. (2021). Decentralized control of quadrotor in this article are included in the article’s Creative Commons licence, swarms with end-to-end deep reinforcement learning. In 5th Con- unless indicated otherwise in a credit line to the material. If material ference on robot learning. CoRL 2021. is not included in the article’s Creative Commons licence and your Bhattacharya, S., Kumar, V., Likhachev, M. (2010). Search-based path intended use is not permitted by statutory regulation or exceeds the planning with homotopy class constraints. In Proceedings of the permitted use, you will need to obtain permission directly from the copy- twenty-fourth AAAI conference on artificial intelligence, pp. 1230– right holder. To view a copy of this licence, visit http://creativecomm ons.org/licenses/by/4.0/. Campion, G., Bastin, G., & Dandrea-Novel, B. (1996). Structural properties and classification of kinematic and dynamic models of wheeled mobile robots. IEEE Transactions on Robotics and Automation, 12(1), 47–62. https://doi.org/10.1109/70.481750 Chen, J., Liu, T., Shen, S. (2016). Online generation of collision-free tra- jectories for quadrotor flight in unknown cluttered environments. In 2016 IEEE international conference on robotics and automa- 123 Autonomous Robots tion (ICRA) (pp. 1476–1483). https://doi.org/10.1109/ICRA.2016. motion planning. IEEE Robotics and Automation Letters, pp. 1–1. 7487283. https://doi.org/10.1109/LRA.2020.2964159. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Ma, H., Harabor, D., Stuckey, P.J., Li, J., Koenig, S. (2019). Searching Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018 with consistent prioritization for multi-agent path finding. In Pro- Desai, A., & Michael, N. (2020). Online planning for quadrotor ceedings of the AAAI conference on artificial intelligence, 33(01), teams in 3-d workspaces via reachability analysis on invariant 7643–7650. https://doi.org/10.1609/AAAI.V33I01.33017643. geometric trees. In IEEE international conference on robotics Mellinger, D., & Kumar, V. (2011). Minimum snap trajectory generation and automation (ICRA) (pp. 8769–8775). https://doi.org/10.1109/ and control for quadrotors. In IEEE international conference on ICRA40945.2020.9197195. robotics and automation (ICRA) (pp. 2520–2525). https://doi.org/ Dresner, K., & Stone, P. (2008). A multiagent approach to autonomous 10.1109/ICRA.2011.5980409. intersection management. Journal of Artificial Intelligence Murray, R. M., Rathinam, M., Sluis, W. (1995). Differential flatness of Research, 31, 591–656. https://doi.org/10.1613/jair.2502. mechanical control systems: A catalog of prototype systems. In Farouki, R. T. (2012). The Bernstein polynomial basis: A centennial ASME international mechanical engineering congress and expo- retrospective. Computer Aided Geometric Design, 29(6), 379–419. sition. https://doi.org/10.1016/j.cagd.2012.03.001 Murray, R. M., & Sastry, S. S. (1993). Nonholonomic motion planning: Furda, A., & Vlacic, L. (2011). Enabling safe autonomous driving steering using sinusoids. IEEE Transactions on Automatic Control, in real-world city traffic using multiple criteria decision making. 38(5), 700–716. https://doi.org/10.1109/9.277235. IEEE Intelligent Transportation Systems Magazine, 3(1), 4–17. Oleynikova, H., Taylor, Z., Fehr, M., Siegwart, R., & Nieto, J. (2017). https://doi.org/10.1109/MITS.2011.940472 Voxblox: Incremental 3D euclidean signed distance fields for on- Gondzio, J., & Grothey, A. (2009). Exploiting structure in parallel board mav planning. In IEEE/RSJ international conference on implementation of interior point methods for optimization. Com- intelligent robots and systems (IROS). https://doi.org/10.1109/ putational Management Science, 6(2), 135–160. https://doi.org/ IROS.2017.8202315. 10.1007/s10287-008-0090-3 O’Meadhra, C., Tabib, W., & Michael, N. (2019). Variable resolu- Harabor, D., & Grastien, A. (2011). Online graph pruning for pathfind- tion occupancy mapping using gaussian mixture models. IEEE ing on grid maps. In Proceedings of the AAAI conference on Robotics and Automation Letters, 4(2), 2015–2022. https://doi. artificial intelligence, vol. 25, pp. 1114–1119. org/10.1109/LRA.2018.2889348 Hönig, W., Preiss, J. A., Kumar, T. K. S., Sukhatme, G. S., & Ayanian, N. Park, J., & Kim, H. J. (2021). Online trajectory planning for multi- (2018). Trajectory planning for quadrotor swarms. IEEE Transac- ple quadrotors in dynamic environments using relative safe flight tions on Robotics, 34(4), 856–869. https://doi.org/10.1109/TRO. corridor. IEEE Robotics and Automation Letters, 6(2), 659–666. 2018.2853613 https://doi.org/10.1109/LRA.2020.3047786 Homm, F., Kaempchen, N., Ota, J., Burschka, D. (2010). Efficient occu- Park, J., Kim, J., Jang, I., Kim, H. J. (2020). Efficient multi-agent pancy grid computation on the gpu with lidar and radar for road trajectory planning with feasibility guarantee using relative Bern- boundary detection. In IEEE intelligent vehicles symposium (IV) stein polynomial. In IEEE international conference on robotics (pp. 1006–1013). https://doi.org/10.1109/IVS.2010.5548091. and automation (ICRA) (pp. 434–440). https://doi.org/10.1109/ Hopcroft, J., Schwartz, J., & Sharir, M. (1984). On the complexity ICRA40945.2020.9197162. of motion planning for multiple independent objects; pspace- Peterson, R., Buyukkocak, A. T., Aksaray, D., & Yazicioglu, ˇ Y. (2021). hardness of the“warehouseman’s problem”. The International Distributed safe planning for satisfying minimal temporal relax- Journal of Robotics Research, 3(4), 76–88. https://doi.org/10. ations of twtl specifications. Robotics and Autonomous Systems, 1177/027836498400300405 142, 103801. https://doi.org/10.1016/j.robot.2021.103801. Hornung, A., Wurm, K. M., Bennewitz, M., Stachniss, C., & Bur- Piegl, L., & Tiller, W. (1995). The NURBS book. Springer. https://doi. gard, W. (2013). Octomap: An efficient probabilistic 3d mapping org/10.1007/978-3-642-97385-7. framework based on octrees. Autonomous Robots, 34(3), 189–206. Richter, C., Bry, A., Roy, N. (2013). Polynomial trajectory planning https://doi.org/10.1007/s10514-012-9321-0 for aggressive quadrotor flight in dense indoor environments. In Jiang, S., & Song, K. (2013). Differential flatness-based motion control International symposium of robotic research (ISRR) (vol. 114, pp. of a steerand-drive omnidirectional mobile robot. In IEEE inter- 649–666). https://doi.org/10.1007/978-3-319-28872-7_37. national conference on mechatronics and automation (ICMA) (pp. Riviere, B., Hönig, W., Yue, Y., Chung, S.-J. (2020). Glas: Global- 1167–1172). https://doi.org/10.1109/ICMA.2013.6618079. to-local safe autonomy synthesis for multi-robot motion planning Karaman, S., & Frazzoli, E. (2010). Incremental sampling-based algo- with end-to-end learning. IEEE Robotics and Automation Letters, rithms for optimal motion planning. Robotics Science and Systems pp. 1–1. https://doi.org/10.1109/LRA.2020.2994035. VI, 104(2). https://doi.org/10.15607/RSS.2010.VI.034. Sartoretti, G., Kerr, J., Shi, Y., Wagner, G., Kumar, T. K., Koenig, Lam, E., Le Bodic, P., Harabor, D.D., Stuckey, P.J. (2019). Branch-and- S., & Choset, H. (2019). Primal: Pathfinding via reinforcement cut-and-price for multi-agent pathfinding. In Proceedings of the and imitation multi-agent learning. IEEE Robotics and Automa- twenty-eighth international joint conference on artificial intelli- tion Letters, 4, 2378–2385. https://doi.org/10.1109/LRA.2019. gence, IJCAI-19 (pp. 1289–1296). https://doi.org/10.24963/ijcai. 2903261 2019/179. Senba ¸ slar ¸ , B., Hönig, W., Ayanian, N. (2019). Robust trajectory execu- Li, Q., Gama, F., Ribeiro, A., Prorok, A. (2020). Graph neural networks tion for multi-robot teams using distributed real-time replanning. for decentralized multi-robot path planning. IEEE/RSJ interna- In Distributed Autonomous Robotic Systems (DARS) (pp. 167– tional conference on intelligent robots and systems (IROS) (pp. 181). https://doi.org/10.1007/978-3-030-05816-6_12. 11785–11792). https://doi.org/10.1109/iros45743.2020.9341668. Senba ¸ slar ¸ , B., Hönig, W., Ayanian, N. (2021). RLSS: Real-time Liu, S., Watterson, M., Mohta, K., Sun, K., Bhattacharya, S., Taylor, C. multi-robot trajectory replanning using linear spatial separations. J., & Kumar, V. (2017). Planning dynamically feasible trajectories Retrieved from arXiv:2103.07588. for quadrotors using safe flight corridors in 3-D complex environ- Senba ¸ slar ¸ , B., & Sukhatme, G. (2022). Asynchronous real-time decen- ments. IEEE Robotics and Automation Letters, 2(3), 1688–1695. tralized multirobot trajectory planning. In IEEE/RSJ international https://doi.org/10.1109/LRA.2017.2663526 conference on intelligent robots and systems (IROS 2022). Luis, C., Vukosavljev, M., & Schoellig, A. (2020). Online trajectory generation with distributed model predictive control for multirobot 123 Autonomous Robots Sharon, G., Stern, R., Felner, A., & Sturtevant, N. R. (2015). Conflict- tems Lab (RESL). His research based search for optimal multi-agent pathfinding. Artificial Intel- interests include multi-robot navigation systems, multi-robot trajec- ligence, 219, 40–66. https://doi.org/10.1016/j.artint.2014.11.006 tory planning, and multi-robot aware control. Solovey, K., Salzman, O., & Halperin, D. (2013). Finding a needle in an exponential haystack: Discrete RRT for exploration of implicit Wolfgang Hönig is an indepen- roadmaps in multi-robot motion planning. In The International dent junior research group leader Journal of Robotics Research, 107. https://doi.org/10.1007/978- at TU Berlin, Germany heading 3-319-16595-0_34. the Intelligent Multi-Robot Coor- Tang, S., & Kumar, V. (2016). Safe and complete trajectory generation dination Lab. Previously, he was for robot teams with higher-order dynamics. In IEEE/RSJ inter- a postdoctoral scholar at the Cal- national conference on intelligent robots and systems (IROS) (pp. ifornia Institute of Technology, 1894–1901). https://doi.org/10.1109/IROS.2016.7759300. USA. He received the diploma in Tordesillas, J., & How, J. P. (2020). MINVO basis: Finding sim- Computer Science from TU Dres- plexes with minimum volume enclosing polynomial curves. arXiv den, Germany in 2012, and the preprint arXiv:2010.10726. M.S. and Ph.D. degrees from the Tordesillas, J., & How, J. P. (2021). MADER: Trajectory planner in University of Southern California multi-agent and dynamic environments. IEEE Transactions on (USC), USA in 2016 and 2019, Robotics. https://doi.org/10.1109/TRO.2021.3080235 respectively. His research focuses Usenko, V., Von Stumberg, L., Pangercic, A., Cremers, D. (2017). Real- on enabling large teams of physi- time trajectory replanning for mavs using uniform b-splines and a cal robots to collaboratively solve real-world tasks, using tools from 3d circular buffer. In IEEE/RSJ international conference on intel- informed search, optimization, and machine learning. Dr. Hönig has ligent robots and systems (IROS) (pp. 215–222). https://doi.org/ been the recipient of several awards, including Best Paper in Robotics 10.1109/IROS.2017.8202160. Track for a paper at ICAPS 2016 and the 2019 Best Dissertation Wang, L., Ames, A.D., Egerstedt, M. (2017). Safety barrier certifi- Award in Computer Science at USC. cates for collisions-free multirobot systems. IEEE Transactions on Robotics, 33(3), 661–674. https://doi.org/10.1109/TRO.2017. 2659727. Nora Ayanian received the B.S. Wang, X., Xi, L., Chen, Y., Lai, S., Lin, F., Chen, B.M. (2021). Decen- degree in Mechanical Engineer- tralized mpc-based trajectory generation for multiple quadrotors ing and Mechanics in 2005 from in cluttered environments. Guidance, Navigation and Control, Drexel University, Philadelphia, 01(02), 2150007. https://doi.org/10.1142/S2737480721500072. PA, and the M.S. and Ph.D. Wurman, P., D’Andrea, R., & Mountz, M. (2008). Coordinating hun- degrees in Mechanical Engineer- dreds of cooperative, autonomous vehicles in warehouses. AI ing and Applied Mechanics from Magazine, 29, 9–20. the University of Pennsylvania, in Yu, J., & LaValle, S. M. (2013). Structure and intractability of opti- 2008 and 2011, respectively. She mal multi-robot path planning on graphs. In Proceedings of the was a Postdoctoral Associate at twenty-seventh AAAI conference on artificial intelligence (pp. the Computer Science and Artifi- 1443–1449). https://doi.org/10.1609/aaai.v27i1.8541. cial Intelligence Laboratory at the Zhou, D., Wang, Z., Bandyopadhyay, S., & Schwager, M. (2017). Fast, Massachusetts Institute of Tech- on-line collision avoidance for dynamic vehicles using buffered nology from 2011–2013. In 2013, Voronoi cells. IEEE Robotics and Automation Letters, 2(2), 1047– she joined the Department of Com- 1054. https://doi.org/10.1109/LRA.2017.2656241 puter Science, University of Southern California, as an Assistant Pro- Zhou, Y., & Zeng, J. (2015). Massively parallel A* search on a GPU. In fessor, and in 2020, became Associate Professor. Since January 2022, Proceedings of the AAAI conference on artificial intelligence (vol. she is an Associate Professor with the Department of Computer Sci- 29). https://doi.org/10.1609/aaai.v29i1.9367. ence and the School of Engineering at Brown University. Her research interests include multi-robot planning, coordination, and control. Dr. Ayanian is a founder and advisory chair of the the IEEE Technical Publisher’s Note Springer Nature remains neutral with regard to juris- Committee on Multi-Robot Systems. She is the recipient of several dictional claims in published maps and institutional affiliations. awards, including a being named to the MIT TR35, Okawa Founda- tion Research Award, and the USC Hanna Reisler Mentorship award. Baskın Senba ¸ slar ¸ received his B.S. degree in Computer Engineering in 2017 from Middle East Tech- nical University, Ankara, Turkey. He received his M.S. degree in Computer Science in 2019 from University of Southern Califor- nia, Los Angeles, USA as a Ful- bright Grantee. In 2019, he started his PhD. studies in University of Southern California as a part of Automatic Coordination of Teams (ACT) Lab. In 2022, he moved to the Robotics and Embedded Sys- http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Autonomous Robots Springer Journals

RLSS: real-time, decentralized, cooperative, networkless multi-robot trajectory planning using linear spatial separations

Loading next page...
 
/lp/springer-journals/rlss-real-time-decentralized-cooperative-networkless-multi-robot-33WpWnb3RK

References (17)

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2023
ISSN
0929-5593
eISSN
1573-7527
DOI
10.1007/s10514-023-10104-w
Publisher site
See Article on Publisher Site

Abstract

Trajectory planning for multiple robots in shared environments is a challenging problem especially when there is limited communication available or no central entity. In this article, we present Real-time planning using Linear Spatial Separations, or RLSS: a real-time decentralized trajectory planning algorithm for cooperative multi-robot teams in static environments. The algorithm requires relatively few robot capabilities, namely sensing the positions of robots and obstacles without higher- order derivatives and the ability of distinguishing robots from obstacles. There is no communication requirement and the robots’ dynamic limits are taken into account. RLSS generates and solves convex quadratic optimization problems that are kinematically feasible and guarantees collision avoidance if the resulting problems are feasible. We demonstrate the algorithm’s performance in real-time in simulations and on physical robots. We compare RLSS to two state-of-the-art planners and show empirically that RLSS does avoid deadlocks and collisions in forest-like and maze-like environments, significantly improving prior work, which result in collisions and deadlocks in such environments. Keywords Trajectory planning for multiple mobile robots · Decentralized robot systems · Collision avoidance · Multi-robot systems 1 Introduction is especially challenging when there is no central entity that plans all robots’ trajectories a priori or re-plans mid- Effective navigation of multiple robots in cluttered envi- execution if there is a fault. In some cases, such a central ronments is key to emerging industries such as ware- entity is undesirable because of the communication link that house automation (Wurman et al., 2008), autonomous driv- must be maintained between each robot and the central entity. ing (Furda & Vlacic, 2011), and automated intersection If the map is not known a priori, building and relaying the management (Dresner & Stone, 2008). One of the core observed map of the environment to the central entity through challenges of navigation systems in such domains is trajec- the communication channel adds further challenges. In some tory planning. Planning safe trajectories for multiple robots cases, it is impractical to have such a central entity because it cannot react to robot trajectory tracking errors and map Baskın Senba ¸ slar ¸ updates fast enough due to communication and computation baskin.senbaslar@usc.edu delays. In practice, robots must operate safely even if there is Wolfgang Hönig limited communication available. This necessitates the abil- hoenig@tu-berlin.de ity for robots to plan in a decentralized fashion, where each Nora Ayanian robot plans a safe trajectory for itself while operating in envi- nora_ayanian@brown.edu ronments with other robots and obstacles. Decentralized algorithms delegate the computation of Department of Computer Science, University of Southern California, 941 Bloom Walk, Los Angeles, CA 90089, USA trajectories: each robot plans for itself and reacts to the environment by itself. In this paper, we introduce RLSS, Department of Electrical Engineering and Computer Science, Technical University of Berlin, Marchstr. 23, 10587 Berlin, a real-time decentralized trajectory planning algorithm for Germany multiple robots in a shared environment that requires no Department of Computer Science and School of Engineering, communication between robots and requires relatively few Brown University, 115 Waterman Street, Providence, RI sensing capabilities: each robot senses the relative positions 02912, USA 123 Autonomous Robots ear spatial separations, between robot shapes, obstacles, and sweep geometries (the subsets of space swept by robots while traversing straight line segments) to enforce safety during trajectory optimization. We demonstrate, through simulation and experiments on physical robots, that RLSS works in dense environments in real-time (Fig. 1). We compare our approach to two state-of- the-art receding horizon decentralized multi-robot trajectory planning algorithms in 3D. In the first, introduced by Zhou et al. (2017), robots plan for actions using a model predic- Fig. 1 RLSS runs in real-time in dense environments. Each robot plans tive control-style optimization formulation while enforcing a trajectory by itself using only the position information of other robots and obstacles that each robot stays inside its buffered Voronoi cell at every iteration. We refer to this method as BVC. The original BVC formulation works only for discrete single-integrator dynam- of other robots and obstacles along with their geometric ics and environments without obstacles. We extend the BVC shapes in the environment, and is able to distinguish robots formulation to discrete time-invariant linear systems with from obstacles. RLSS requires relatively few robot capabili- position outputs and environments with obstacles, and call ties than most state-of-the-art decentralized planners used for the extended version eBVC (short for extended BVC). RLSS multi-robot navigation, which typically require communica- and eBVC have similar properties: they require no communi- tion (Luis et al., 2020; Tordesillas & How, 2021; Wang et al., cation between robots, and require position sensing of other 2021), higher order derivative estimates (Park & Kim, 2021; objects in the environment. However, unlike RLSS, eBVC Wang et al., 2021), or predicted trajectories of objects (Park does not require that robots are able to distinguish robots &Kim, 2021). However, the ability to distinguish robots from obstacles, as it treats each obstacle as a robot. In the from obstacles is not required by some state-of-the-art algo- second, introduced by Park and Kim (2021), robots plan rithms (Zhou et al., 2017), which require modelling obstacles for trajectories by utilizing relative safe navigation corridors, as robots. RLSS is cooperative in the sense that we assume which they execute for a short duration, and replan. We refer each robot stays within its current cell of a tessellation of the to this method as RSFC. RSFC requires no communication space until the next planning iteration. We assume obstacles between robots, and utilizes positions as well as velocities are static, which is required to guarantee collision avoidance. of the objects in the environment, thus requires more sens- RLSS explicitly accounts for the dynamic limits of the ing capabilities than RLSS. We demonstrate empirically that robots and enforces safety with hard constraints, reporting RLSS results in no deadlocks or collisions in our experiments when they cannot be enforced, thus guaranteeing collision in forest-like and maze-like environments, while eBVC is avoidance when it succeeds. The planning algorithm can be prone to deadlocks and RSFC results in collisions in such guided with desired trajectories, thus it can be used in con- environments. However, RLSS results in slightly longer nav- junction with centralized planners. If no centralized planner igation durations compared to both eBVC and RSFC. or plan is available, RLSS can be used on its own, without The contribution of this work can be summarized as fol- any central guidance, by setting the desired trajectories to lows: line segments directly connecting robots’ start positions to their goal positions. There are 4 stages in RLSS. • A carefully designed, numerically stable, and effective real-time decentralized planning algorithm for multiple 1. Select a goal position to plan toward on the desired tra- robots in shared environments with static obstacles with jectory; relatively few requirements: no communication between 2. Plan a discrete path toward the selected goal position; robots, position-only sensing of robots and obstacles, and 3. Formulate and solve a kinematically feasible convex opti- ability to distinguish robots from obstacles. mization problem to compute a safe smooth trajectory • An extension (eBVC) of a baseline planner (BVC) to guided by the discrete plan; environments with obstacles and a richer set of dynamics 4. Check if the trajectory obeys the dynamic limits of the than only single-integrators. robot, and temporally rescale the trajectory if not. • The first comparison of more than two state-of-the-art communication-free decentralized multi-robot trajectory planning algorithms, namely, RSFC, eBVC, and RLSS, RLSS works in the receding horizon fashion: it plans a long in complicated forest-like and maze-like environments, trajectory, executes it for a short duration, and replans at some of which containing more than 2000 obstacles. the next iteration. It utilizes separating hyperplanes, i.e., lin- 123 Autonomous Robots 2 Related work 2.1 Centralized algorithms The pipeline of RLSS contains three stages (discrete plan- We further categorize centralized algorithms into those that ning, safe navigation corridor construction, and trajectory do not consider robot dynamics during planning and those optimization) that are employed by several existing single- that do. robot trajectory planning algorithms. Richter et al. (2013) Centralized without dynamics: If planning can be present a single-robot trajectory planning method for aggres- abstracted to agents moving along edges in a graph syn- sive quadrotor flight which utilizes RRT* (Karaman & chronously, we refer to the multi-agent path finding (MAPF) Frazzoli, 2010) to find a kinematically feasible discrete path problem. Some variants of the MAPF problem are NP-Hard and formulates an unconstrained quadratic program over to solve optimally (Yu & LaValle, 2013). For NP-Hard vari- polynomial splines guided by the discrete path. Collisions are ants, there are many optimal (Sharon et al., 2015; Lam et al., checked after optimization, and additional decision points are 2019), bounded suboptimal (Barer et al., 2014), and subop- added and optimization is re-run if there is collision. Chen et timal (Solovey et al., 2013; Ma et al., 2019) algorithms that al. (2016) present a method that utilizes OcTree representa- perform well in some environments. Trajectories generated tion (Hornung et al., 2013) of the environment during discrete by planning algorithms that do not model robot dynamics search. They find a discrete path using unoccupied grid cells may not be followed perfectly by real robots, resulting in of the OcTree. Then, they maximally inflate unoccupied grid divergence from the plan. cells to create a safe navigation corridor that they use as con- Centralized with dynamics: These algorithms deliver straints in the subsequent polynomial spline optimization. smooth control or output trajectories that are executable by Liu et al. (2017) uses Jump Point Search (JPS) (Harabor & real robots. Trajectories are usually generated by formulating Grastien, 2011) as the discrete planner, and construct safe the problem under an optimization framework, and commu- navigation corridors that are used as constraints in the opti- nicated to the robots for them to execute. Tang and Kumar mization stage. Our planning system uses these three stages (2016) propose an approach that combines a discrete motion (discrete planning, safe navigation corridor construction, and planning algorithm with trajectory optimization that works corridor-constrained optimization) and extends it to multi- only in obstacle-free environments. Another approach com- robot scenarios in a decentralized way. We handle robot-robot bines a MAPF solver with trajectory optimization to plan safety by cooperatively computing a linear partitioning of the trajectories for hundreds of quadrotors in environments with environment, and enforcing that each robot stays within its obstacles, while ensuring that the resulting trajectories are own cell of the partition. executable by the robots (Hönig et al., 2018). Park et al. We categorize multi-robot trajectory planning algorithms (2020) combine a MAPF solver with trajectory optimization first on where computation is done, since the location of and provide executability and feasibility guarantees. Desai computation changes the properties of the algorithms. There and Michael (2020) propose an approach utilizing position- are two main strategies to solve the multi-robot trajectory invariant geometric trees to find kinodynamically feasible planning problem: centralized and decentralized. Centralized trajectories. algorithms compute trajectories for all robots on a central Centralized algorithms can provide theoretical complete- entity with global information; these trajectories are com- ness and global optimality guarantees under the assumptions municated back to the robots for them to execute. In the of perfect prior knowledge and execution of trajectories. decentralized strategy, each robot runs an algorithm on- However, centralized replanning is required when the envi- board to compute its own trajectory; these algorithms may ronment is discovered during operation or the robots deviate utilize direct communication between robots. Centralized from the planned trajectories. This necessitates continuous algorithms often provide strong theoretical guarantees on communication between the central computer and the indi- completeness or global optimality because (i) they have the vidual robots. In some cases, solving the initial problem from complete knowledge about the environment beforehand, (ii) scratch may be required. If the communication between the they generally run on powerful centralized computers, and robots and the central computer cannot be maintained, or (iii) there are generally less restrictive time limits on their exe- solving the initial problem from scratch cannot be done in cution. Decentralized algorithms generally forgo such strong real-time, the robots may collide with each other or obstacles. theoretical guarantees in favor of fast computation because (i) they are often used when complete information of the 2.2 Decentralized algorithms environment is not known beforehand, and (ii) they have to work in real-time on-board. When the requirements of the centralized strategies cannot be satisfied, decentralization of the trajectory planning among robots is required. In decentralized planning, robots assess the state of the environment and plan for themselves using 123 Autonomous Robots their on-board capabilities. We categorize decentralized algo- Voronoi cell. It requires no inter-robot communication and rithms into reactive and long horizon algorithms. depends on sensing of other robots’ positions only. Another Decentralized reactive: In reactive algorithms, each robot MPC-based approach approximates robots’ controller behav- computes the next action to execute based on the state of the iors under given desired states as a linear system (Luis et environment without considering the actions that might fol- al., 2020). A smooth Bézier curve for each robot is com- low it. One such approach is Optimal Reciprocal Collision puted by solving an optimization problem, in which samples Avoidance (ORCA) (Alonso-Mora et al., 2013). In ORCA, of desired states are drawn from the curve and fed into the each robot chooses a velocity vector that is as close as possi- model of the system. The approach requires communication ble to a desired velocity vector such that collisions between of future states for collision avoidance. Another MPC-based each pair of robots are provably avoided. It requires robots approach generates plans using motion primitives to com- to (i) be cooperatively executing the same algorithm, (ii) pute time-optimal trajectories, then trains a neural network obey single integrator dynamics, and (iii) either sense other to approximate the behavior of the planner, which is used dur- robots’ velocities or receive them through communication. ing operation for fast planning (Wang et al., 2021). It requires While ORCA handles dynamic changes in the environment sensing or communicating the full states of robots. Tordesil- and is completely distributed, it fails to avoid deadlocks in las and How (2021) present a method for dynamic obstacle environments with obstacles (Senba ¸ slar ¸ et al., 2019). Safety avoidance, handling asynchronous planning between agents barrier certificates guarantee collision avoidance by com- in a principled way. However, the approach requires instant puting a safe control action that is as close as possible to communication of planned trajectories between robots to a desired control action (Wang et al., 2017). Similarly to ensure safety. Park and Kim (2021) plan a piecewise Bézier ORCA, the robots using safety barrier certificates may get curve by formulating an optimization problem. It utilizes rel- stuck in deadlocks. Most learning-based approaches fall into ative safe flight corridors (RSFCs) for collision avoidance the body of reactive strategies as well. Here, neural networks and requires no communication between robots, but requires are trained to compute the next action to execute, given a position and velocity sensing. In Peterson et al. (2021)’s robot’s immediate neighborhood and goal. For instance, approach, robot tasks are defined as time window tempo- PRIMAL (Sartoretti et al., 2019) is a computationally effi- ral logic specifications. Robots plan trajectories to achieve cient suboptimal online MAPF solver, but it only works on their tasks while avoiding each other. The method provably grids, and does not consider robot dynamics. GLAS (Riviere avoids deadlocks, but requires communication of planned et al., 2020), which considers robot dynamics, can guarantee trajectories between robots. safety for some robot dynamics by combining the network RLSS falls into this body of algorithms. It requires sens- output with a safety term. It performs better than ORCA in ing only the positions of other robots and obstacles, and terms of deadlocks, nevertheless results in deadlocks in dense does not require any communication between robots. The environments. Another approach employs graph neural net- work of Zhou et al. (2017) (BVC) is the only other approach works (GNNs) that allow communication between robots to with these properties. Different from their approach, RLSS avoid collisions (Li et al., 2020), but results in deadlocks uses discrete planning to avoid local minima, plans piecewise in dense environments similar to GLAS. Batra et al. (2021) Bézier trajectories instead of MPC-style input planning, and propose a method that outputs direct motor commands for a does not constrain the full trajectory to be inside a constrain- quadrotor using the positions and velocities of objects in the ing polytope. These result in better deadlock and collision its immediate neighborhood, but results in collisions between avoidance as we show in the evaluation section. robots. In terms of inter-robot collision avoidance, RLSS belongs Decentralized long horizon: In long horizon algorithms, to a family of algorithms that ensure multi-robot collision robots generate a sequence of actions or long trajectories avoidance without inter-robot communication by using the instead of computing a single action to execute. These fact that each robot in the team runs the same algorithm. In approaches employ receding horizon planning: they plan these algorithms, robots share the responsibility of collision long trajectories, execute them for a short duration, and avoidance by computing feasible action sets that would not replan. Zhou et al. (2017) present a model predictive control result in collisions with others when others compute their (MPC) based approach in which robots plan for a sequence of own feasible action sets using the same algorithm. Examples actions while enforcing each robot to stay within its buffered include BVC (Zhou et al., 2017), which partitions position space between robots and makes sure that each robot stays within its own cell in the partition, ORCA (Alonso-Mora et Note that some learning-based approaches train neural networks to mimic the behavior of a global long horizon planner. Because of this, al., 2013), which computes feasible velocity sets for robots even if they output a single action, they abstractly consider a longer such that there can be no collisions between robots as long horizon evolution of the system. We place these approaches to the reac- as each robot chooses a velocity command from their corre- tive algorithms category because they output a single action at each iteration. 123 Autonomous Robots sponding sets, and SBC (Wang et al., 2017), which does the ensures kinematic feasibility of the optimization problem. same with accelerations. Hence, there is no longer a different soft optimization formu- BVC and RLSS also belong to a family of decentral- lation. In the workshop version, we compare RLSS against ized multi-robot trajectory planning algorithms that utilize Luis et al. (2020)’s work, which is based on distributed mutually computable separating hyperplanes for inter-robot MPC, which requires communication of planned trajectories safety as defined by Senba ¸ slar ¸ and Sukhatme (2022). These between robots. Herein, we compare our approach to two algorithms do not require any inter-robot communication and planners that do not require communication between robots utilize only position/geometry sensing between robots. The as this provides a more comparable baseline. inter-robot safety is enforced by making sure that each pair of robots can compute the same separating hyperplane and enforce themselves to be in the different sides of this hyper- 3 Problem statement plane at every planning iteration. Separating hyperplanes that can be mutually computed by robots using only posi- We first define the multi-robot trajectory planning problem, tion/geometry sensing and no communication are defined as then, we indicate the specific case of the problem we solve. mutually computable. BVC uses Voronoi hyperplanes and Consider a team of N robots for which trajectories must RLSS uses support vector machine hyperplanes, both of be computed at time t = 0. The team can be heterogeneous, which are mutually computable. i.e., each robot might be of a different type or shape. Let R RLSS is an extension to our previous conference paper be the collision shape function of robot i such that R (p) is (¸Senbaslar ¸ et al., 2019). We extend it conceptually by the collision shape of robot i located at position p ∈ R , d ∈ {2, 3}, i.e., the subset of space occupied by robot i when • supporting robots with any convex geometry instead of placed at position p. We define R (p) ={p + x | x ∈ only spherical robots; and R } where R ⊂ R is the the space occupied by robot i i ,0 i ,0 • decreasing the failure rate of the algorithm from 3% in when placed at the origin. Note that we do not model robot complex environments to 0.01% by providing impor- orientation. If a robot can rotate, the collision shape should tant modifications to the algorithm. These modifications contain the union of spaces occupied by the robot for each include (i) changing the discrete planning to best-effort possible orientation at a given position; which is minimally A search, thus allowing robots to plan towards their a hypersphere if all orientations are possible. goals even when goals are not currently reachable; (ii) We assume the robots are differentially flat (Murray increasing the numerical stability of the algorithm by run- et al., 1995), i.e., the robots’ states and inputs can be ning discrete planning at each iteration instead of using expressed in terms of output trajectories and a finite num- the trajectory of the previous iteration when it is collision ber of derivatives thereof. Differential flatness is common free to define the homotopy class, and adding a preferred for many kinds of mobile robots, including differential drive distance cost term to the optimization problem in order to robots (Campion et al., 1996), car-like robots (Murray & create a safety zone between objects when possible; and Sastry, 1993), omnidirectional robots (Jiang & Song, 2013), (iii) ensuring the kinematic feasibility of the optimization and quadrotors (Mellinger & Kumar, 2011). When robots are problem generated at the trajectory optimization stage. differentially-flat, their dynamics can be accounted for by (i) imposing C continuity on the trajectories for the required c th We extend our previous work empirically by (i.e. continuity up to c degree of derivative), and (ii) impos- th ing constraints on the maximum k derivative magnitude • applying our algorithm to a heterogeneous team of dif- of trajectories for any required k. For example, quadrotor ferential drive robots in 2D; states and inputs can be expressed in terms of the output • applying our algorithm to quadrotors in 3D; and trajectory, and its first (velocity), second (acceleration) and • comparing the performance of our algorithm to stronger third (jerk) derivatives (Mellinger & Kumar, 2011). Hence, to baselines. account for quadrotor dynamics, continuity up to jerk should be enforced by setting c = 3, and velocity, acceleration, and A preliminary version of RLSS previously appeared in a jerk of the output trajectory should be limited appropriately workshop (Senba ¸ slar ¸ et al., 2021). In the workshop version, by setting upper bounds for k = 1, 2, and 3. the RLSS optimization stage fails about 3% of the time, and Let O(t ) ={Q ⊂ R } be the set of obstacles in the switches to a soft optimization formulation when that hap- environment at time t. The union Q(t ) =∪ Q denotes Q∈O(t ) pens. In the current version, the optimization rarely (close to the occupied space at time t.Let W ⊆ R be the workspace 1 in every 10000 iterations in our experiments) fails thanks that robot i should stay inside of. to the newly added preferred distance cost term, changed Let d (t ) :[0, T ]→ R be the desired trajectory of i i goal selection stage, and changed discrete planning stage that duration T that robot i should follow as closely as possible. 123 Autonomous Robots Define d (t ) = d (T ) ∀t ≥ T . We do not require that this ing problem can be solved optimally with classical search i i i i trajectory is collision-free or even dynamically feasible to methods in polynomial time. This curse of dimensionality track by the robot. If no such desired trajectory is known, it affects continuous motion planning as well, where even geo- can be initialized, for example, with a straight line to a goal metric variants are known to be PSPACE-Hard (Hopcroft et location. al., 1984). The intent is that each robot i tracks a Euclidean trajec- A centralized planner can be used to solve the generic tory f (t ) :[0, T]→ R such that f (t ) is collision-free, multi-robot trajectory planning problem in one-shot provided i i executable according to the robot’s dynamics, is as close as that: the current and future obstacle positions are known a possible to the desired trajectory d (t ), and ends at d (T ). priori, computed trajectories can be sent to robots over a com- i i i Here, T is the navigation duration of the team. We generi- munication link, and robots can track these trajectories well cally define the multi-robot trajectory planning problem as enough that they do not violate the spatio-temporal safety the problem of finding trajectories f ,..., f and navigation of the computed trajectories. In the present work, we aim to 1 N duration T that optimizes the following formulation: approximately solve this problem in the case where obsta- cles are static, but not necessarily known a priori by a central N entity, and there is no communication channel between robots min f (t )−d (t ) dt s.t . (1) i i 2 or between robots and a central entity. Each robot plans its f ,...,f ,T 1 N i =1 own trajectory in real-time. They plan at a high frequency to f (t ) ∈ C ∀i ∈{1,..., N } (2) compensate for trajectory tracking errors. f (T ) = d (T ) ∀i ∈{1,..., N } (3) i i i c 0 d p d f (0) = ∀i ∀c ∈{0,..., c } (4) c c dt dt 4 Preliminaries R (f (t )) ∩ Q(t ) =∅ ∀i ∀t ∈[0, T ] (5) i i R (f (t )) ∩ R (f (t )) =∅ ∀ j =i ∀t ∈[0, T ] (6) i i j j We now introduce essential mathematical concepts used R (f (t )) ∈ W ∀i ∀t ∈[0, T ] (7) i i i herein. d f (t ) max ≤ γ ∀i ∀k ∈{1,..., K } (8) t ∈[0,T ] dt 2 4.1 Parametric curves and splines where c is the order of derivative up to which the trajectory d Trajectories are curves f :[0, T]→ R that are parametrized th 0 of the i robot must be continuous, p is the initial position i by time, with duration T . Mathematically, we choose to use of robot i (derivatives of which are the initial higher order splines, i.e. piecewise polynomials, where each piece is a state components, e.g., velocity, acceleration, etc.), γ is the i Bézier curve defined by a set of control points and a duration. th th maximum k derivative magnitude that the i robot can d A Bézier curve f :[0, T]→ R of degree h is defined by execute, and K is the maximum derivative degree that robot d h + 1 control points P ,..., P ∈ R as follows: 0 h i has a derivative magnitude limit on. The cost (1) of the optimization problem is a metric i (h−i ) h t t for the total deviation from the desired trajectories; it is f (t ) = P 1 − . i T T the sum of position distances between the planned and the i =0 desired trajectories. (2) enforces that the trajectory of each robot is continuous up to the required degree of derivatives. Any Bézier curve f satisfies f (0) = P and f (T ) = P . Other 0 h (3) enforces that planned trajectories end at the endpoints points guide the curve from P to P . Since any Bézier curve 0 h of desired trajectories. (4) enforces that the planned tra- f is a polynomial of degree h, it is smooth, meaning f ∈ C . jectories have the same initial position and higher order We choose Bézier curves as pieces because of their convex derivatives as the initial states of the robots. (5) and (6) hull property: the curves themselves lie inside the convex hull enforce robot-obstacle and robot-robot collision avoidance, of their control points, i.e., f (t ) ∈ Convex Hull{P ,..., P } 0 h respectively. (7) enforces that each robot stays within its ∀t ∈[0, T ] (Farouki, 2012). Using the convex hull property, defined workspace. Lastly, (8) enforces that the dynamic lim- we can constrain a curve to be inside a convex region by its of the robot are obeyed. constraining its control points to be inside the same convex Note that only constraint (6) stems from multiple robots. region. Tordesillas and How (2020) discuss the conservative- However, this seemingly simple constraint couples robots’ ness of the convex hulls of control points of Bézier curves and trajectories both spatially and temporally, making the prob- show that the convex hulls are considerably less conserva- lem much harder. As discussed in Sect. 2.1, solving the tive than those of B-Splines (Piegl & Tiller, 1995), which are multi-agent path finding problem optimally is NP-Hard even another type of curve with the convex hull property. Yet, they for the discrete case while the discrete single-agent path find- also show that convex hulls of the control points of the Bézier 123 Autonomous Robots curves can be considerably more conservative compared to cles and robots as well as their shapes in the environment. the smallest possible convex sets containing the curves. The approach we propose does not require sensing of higher order state components (e.g., velocity, acceleration, etc.) or planned/estimated trajectories of objects, as the former is 4.2 Linear spatial separations: half-spaces, convex generally a noisy signal which cannot be expected to be polytopes, and support vector machines sensed perfectly and the latter would require either commu- nication or a potentially noisy trajectory estimation. A hyperplane H in R can be defined by a normal vector d d RLSS treats robots and obstacles differently. It enforces H ∈ R and an offset H as H ={x ∈ R | H x + H = n a a d d ˜ that each robot stays within a spatial cell that is disjoint from 0}. A half-space H in R is a subset of R that is bounded the cells of other robots until the next planning iteration to by a hyperplane such that H ={x ∈ R | H x + H ≤ 0}. ensure robot-robot collision avoidance. To compute the spa- A convex polytope is an intersection of a finite number of tial cell for each robot, RLSS uses positions and shapes of half-spaces. nearby robots, but not obstacles, in the environment. There- Our approach relies heavily on computing safe convex fore, robots must be able to distinguish other robots from polytopes and constraining spline pieces to be inside these obstacles. However, we do not require individual identifica- polytopes. Specifically, we compute hard-margin support tion of robots. vector machine (SVM) (Cortes & Vapnik, 1995) hyperplanes We assume that the team is cooperative in that each robot between spaces swept by robots along line segments and the runs the same algorithm using the same replanning period. obstacles/robots in the environment, and use these hyper- Lastly, we assume that planning is synchronized between planes to create safe convex polytopes for robots to navigate robots, meaning that each robot plans at the same time instant. in. The synchronization assumption is needed for ensuring robot-robot collision avoidance when planning succeeds. 5 Assumptions Here, we list our additional assumptions about the problem 6 Approach formulation defined in Sect. 3 and the capabilities of robots. We assume that obstacles in the environment are static, i.e., Under the given assumptions, we solve the generic multi- O(t ) = O ∀t ∈[0, ∞], and convex. Many existing, effi- robot trajectory planning problem approximately using cient, and widely-used mapping tools, including occupancy decentralized receding horizon planning. Each robot plans grids (Homm et al., 2010) and octomaps (Hornung et al., a trajectory, executes it for a short period of time, and repeats 2013), internally store obstacles as convex shapes; such maps this cycle until it reaches its goal. We call each plan-execute can be updated in real-time using visual or RGBD sensors, cycle an iteration. and use unions of convex axis-aligned boxes to approximate We refer to the planning robot as the ego robot, and hence- the obstacles in the environment. forth drop indices for the ego robot from our notation as the Similarly, we assume that the shapes of the robots are same algorithm is executed by each robot. Workspace W is convex. the convex polytope in which the ego robot must remain. We assume that the workspace W ⊆ R is a convex R is the collision shape function of the ego robot such that polytope. It can be set to a bounding box that defines a room R(p) ⊂ R is the space occupied by the ego robot when it is that a ground robot must not leave, a half-space that contains placed at position p. The ego robot is given a desired trajec- vectors with positive z coordinates so that a quadrotor does tory d(t ) :[0, T]→ R that it should follow. The dynamic not hit the ground or simply be set to R . A non-convex limits of the robot are modeled using required derivative workspace W can be modeled by a convex workspace W degree c up to which the trajectory must be continuous, i i ˜ ˜ such that W ⊆ W and a static set of convex obstacles O and maximum derivative magnitudes γ for required degrees i i that block portions of the convex workspace so that W = k ∈{1,..., K }. At every iteration, the ego robot computes a piecewise W \ ∪ Q . i ˜ Q∈O trajectory f (t ), which is dynamically feasible and safe up To provide a guarantee of collision avoidance, we assume to the safety duration s, that it executes for the replanning that robots can perfectly sense the relative positions of obsta- 2 3 For the purposes of our algorithm, concave obstacles can be described In physical deployments, asynchronous planning can cause collisions or approximated by a union of finite number of convex shapes provided between robots when they are in close proximity to each other. To handle that the union contains the original obstacle. Using our algorithm with collisions stemming from asynchronous planning, robot shapes can be approximations of concave obstacles results in trajectories that avoid artificially inflated according to the maximum possible planning lag the approximations. between robots (an empirical value) at the expense of conservativeness. 123 Autonomous Robots period δt ≤ s, and fully re-plans, i.e., runs the full planning pipeline, for the next iteration. The only parameter that is shared by all robots is the replanning period δt. Without loss of generality, we assume that navigation begins at t = 0, and at the start of planning iteration u, the current timestamp is T = uδt. RLSS fits into the planning part of the classical robotics pipeline using perception, planning, and control. The inputs from perception for the ego robot are: • S: Shapes of other robots. S ∈ S where j ∈{1,..., i − 1, i + 1,..., N } is the collision shape of robot j sensed by the ego robot such that S ⊆ R . • O: The set of obstacles in the environment, where each obstacle Q ∈ O is a subset of R . • p: Current position of the ego robot, from which deriva- tives up to required degree of continuity can be com- puted. We define O =∪ Q as the space occupied by the Q∈O obstacles, and S =∪ S as the space occupied by the S ∈S robot shapes. Robots sense the set of obstacles and the set of robot shapes and use those sets in practice. We use spaces occupied by obstacles and robots for brevity in notation. There are 4 main stages of RLSS: (1) goal selection, (2) discrete planning, (3) trajectory optimization, and (4) tempo- ral rescaling. The planning pipeline is summarized in Fig. 2. At each planning iteration, the ego robot executes the four Fig. 2 RLSS planning pipeline. Based on the sensed robots S, sensed stages to plan the next trajectory. In the goal selection stage, obstacles O, and current position p, the ego robot computes the trajec- a goal position and the corresponding timestamp of the goal tory f (t ) that is dynamically feasible and safe up to time s position on the desired trajectory d(t ) is selected. In the discrete planning stage, a discrete path from robot’s cur- rent position toward the selected goal position is computed the task at hand; these are not parameters that can be tuned and durations are assigned to each segment. In the trajec- freely and they do not change during navigation. tory optimization stage, discrete segments are smoothed to a piecewise trajectory. In the temporal rescaling stage, the 6.1 Goal selection dynamic limits of the robot are checked and duration rescal- ing is applied if necessary. At the goal selection stage (Algorithm 1), we find a goal Next, we describe each stage in detail. Each stage has position g on the desired trajectory d(t ) and a timestamp access to the workspace W, the collision shape function R, by which it should be reached. These are required in the the desired trajectory d(t ) and its duration T , the maximum subsequent discrete planning stage, which is a goal-oriented derivative magnitudes γ , and the derivative degree c up to search algorithm. which the trajectory must be continuous. We call these task Goal selection has two parameters: the desired planning inputs. They describe the robot shape, robot dynamics, and horizon τ and safety distance D. It uses the robot collision shape function R, desired trajectory d(t ) and its duration T , Practically, if the ego robot cannot sense a particular robot j because and workspace W from the task inputs. The inputs of goal it is not within the sensing range of the ego robot, robot j can be omitted by the ego robot. As long as the sensing range of the ego robot is more selection are the shapes of other robots S, obstacles in the than the maximum distance that can be travelled by the ego robot and environment O, current position p, and the current timestamp robot j in duration δt, omitting robot j does not affect the behavior of T . the algorithm. At the goal selection stage, the algorithm finds the times- In reality, all required derivatives of the position would be estimated tamp T that is closest to T + τ (i.e., the timestamp that using a state estimation method. We use this definition of p for notational convenience. is one planning horizon away from the current timestamp) 123 Autonomous Robots Algorithm 1 GOAL-SELECTION Input : S : Set of robot shapes Input : O : Set of obstacles Input : p : Current position Input : T : Current timestamp TaskInput : R : Collision shape function TaskInput : d(t ), T : Desired trajectory and its duration TaskInput : W : Workspace Parameter : τ : Desired planning horizon Parameter : D : Safety distance Return : Goal and timestamp by which it should be reached 1 T ← Solve (9) with linear search 2 if (9) is infeasible then 3 return (p, T ) 4 else 5 g ← d(T ) 6 return (g, T ) 7 end Fig. 3 Goal Selection. a Blue and red squares are robots, while obsta- cles are black boxes. The desired trajectories d (t ) and d (t ) of red blue when the robot, if placed on the desired trajectory at T ,isat each robot are given as dotted lines. Safety distance D is set to 0 for least safety distance D away from all objects in the environ- clarity. b The desired trajectory of the red robot is not collision-free at timestamp T + τ . It selects its goal timestamp T (and hence its goal ment. We use the safety distance D as a heuristic to choose position) by solving (9), which is the closest timestamp to T + τ when goal positions that have free volume around them in order the robot, when placed on the desired trajectory, would be collision free. not to command robots into tight goal positions. Note that c Since the desired trajectory of blue robot is collision free at timestamp ˜ ˜ goal selection only chooses a single point on the desired tra- T + τ , it selects its goal timestamp T = T + τ after solving (9). d Selected goal positions are shown jectory that satisfies the safety distance; the actual trajectory the robot follows will be planned by the rest of the algorithm. Formally, the problem we solve in the goal selection stage is given as follows: 6.2 Discrete planning ˜ Discrete planning (Algorithm 2) performs two main tasks: (i) T = arg min|t − (T + τ)| s.t . it finds a collision-free discrete path from the current position (9) t ∈[0, T ] p towards the goal position g, and (ii) it assigns durations to each segment of the discrete path. The discrete path found at ˆ ˆ min-dist(R(d(t )), O ∪ S ∪ ∂W) ≥ D the discrete planning stage represents the homotopy class of the final trajectory. Trajectories in the same homotopy class where ∂W is the boundary of workspace W, and min-dist can be smoothly deformed into one another without inter- returns the minimum distance between two sets. secting obstacles (Bhattacharya et al., 2010). The subsequent We solve (9) using linear search on timestamps starting trajectory optimization stage computes a smooth trajectory from T + τ with small increments and decrements. within the homotopy class. The trajectory optimization stage Figure 3, demonstrates the goal selection procedure for a utilizes the discrete path to (i) generate obstacle avoidance particular instance. constraints and (ii) guide the computed trajectory by adding If there is no safe point on the desired trajectory, i.e. if the distance cost terms between the discrete path and the com- robot is closer than D to objects when it is placed on any point puted trajectory. It uses the durations assigned by the discrete on the desired trajectory, we return the current position and planning stage as the piece durations of the piecewise trajec- timestamp. This allows us to plan a safe stopping trajectory. tory it computes. Note that while the selected goal position has free volume Finding a discrete path from the start position p to the around it, it may not be reachable by the ego robot. For exam- goal position g is done with best-effort A search (Line 2, ple, the goal position may be encircled by obstacles or other Algorithm 2), which we define as follows. If there are valid robots. Therefore, we use a best-effort search method during paths from p to g, we find the least-cost path among such discrete planning (as described in Sect. 6.2) that plans a path paths. If no such path exists, we choose the least-cost path towards the goal position. to the position that has the lowest heuristic value (i.e., the The goal and the timestamp are used in the subsequent position that is heuristically closest to g). This modification discrete planning stage as suggestions. of A search is done due to the fact that the goal position may 123 Autonomous Robots Algorithm 2 DISCRETE-PLANNING action is the distance travelled divided by the step size (i.e. Input : g : Goal position cells travelled), which is equal to the size of the direc- Input : T : The timestamp that goal position should be (or tion vector. REACHGOAL has the cost of one rotation plus should have been) reached at π −g cells travelled from π to goal position g:1 + .One Input : S : Set of robot shapes σ rotation cost is there because it is almost surely required to Input : O : Set of obstacles Input : p : Current position do one rotation before going to goal from a cell. ROTATE Input : T : Current timestamp actions in all directions are always valid whenever the cur- TaskInput : R : Collision shape function rent state is valid. FORWARD and REACHGOAL actions are TaskInput : W : Workspace valid whenever the robot shape R swept along the movement TaskInput : γ : Maximum velocity the robot can execute Parameter : σ : Step size of search grid is contained in free space F. Parameter : s : Duration up to which computed trajectory must be For any state (π , ), we use the Euclidean distance from safe. s ≥ δt must hold. position π to the goal position g divided by step size σ (i.e. Return : Discrete path and duration assignments to segments ˆ ˆ cells travelled when π is connected to g with a straight line) 1 F ← W\(O ∪ S) 2 actions ← BEST-EFFORT-A (p, g, R, F,σ ) as the admissible heuristic. 3 {e ,..., e }← EXTRACT-SEGMENTS(actions) 1 L 4 Prepend e to {e ,..., e } so that e = e . Lemma 1 In the action sequence of the resulting plan 1 1 L 0 1 5 total Length ← e − e i i −1 i =1 2 total Length 1. Each ROTATE action must be followed by at least one 6 fDuration ← max T − T , FORWARD action, 7 T ← s 8 for i = 2 → L do 2. The first action cannot be a FORWARD action, e −e i i −1 9 T ← fDuration 3. And no action can appear after REACHGOAL action. total Length 10 end 11 return {e ,..., e }, {T ,..., T } 0 L 1 L Proof Sketch 1. After each ROTATE action, a FORWARD action must be executed in a least-cost plan because (i) a ROTATE action cannot be the last action since goal states accept any direction and removing any ROTATE action from not always be reachable, since the goal selection stage does the end would result in a valid lower cost plan, (ii) the not enforce reachability. REACHGOAL action cannot appear after ROTATE action The ego robot plans its path in a search grid where the because REACHGOAL internally assumes robot rotation grid has hypercubic cells (i.e. square in 2D, cube in 3D) with and removing the ROTATE action would result in a valid edge size σ , which we call the step size of the search. The grid lower cost plan, and iii) there cannot be consecutive ROTATE shifts in the environment with the robot in the sense that the actions in a least-cost path as each rotation has the cost of 1 robot’s current position always coincides with a grid center. and removing consecutive rotations and replacing them with ˆ ˆ Let F = W\(O ∪ S) be the free space within the workspace a single rotation would result in a valid lower cost plan. (Line 1, Algorithm 2). We do not map free space F to the 2. The first action cannot be a FORWARD action since grid. Instead, we check if the robot shape swept along any initial direction is set to 0 and FORWARD action is available segment on the grid is contained in F or not, to decide if a only when  = 0. movement is valid. This allows us to (i) model obstacles and 3. No action after a REACHGOAL action can appear robot shapes independently from the grid’s step size σ , and in a least cost plan because REACHGOAL connects to the (ii) shift the grid with no additional computation since we do goal position, which is a goal state regardless of the direc- not store occupancy information within the grid. tion. Removing any action after REACHGOAL action would The states during the search have two components: posi- result in a valid lower cost plan. tion π and direction . Robots are allowed to move perpen- dicular or diagonal to the grid. This translates to 8 directions By Lemma 1, the action sequence can be described by the in 2-dimension, 26 directions in 3-dimension. Goal states following regular expression in POSIX-Extended Regular are states that have position g and any direction. We model Expression Syntax: directions using vectors  of d components where each com- + ∗ {0,1} ponent is in {−1, 0, 1}. When the robot moves 1 step along ((ROTATE)(FORWARD) ) (REACHGOAL) direction , its position changes by σ . The initial state of the search is the ego robot’s current position p and direction We collapse consecutive FORWARD actions in the result- 0. ing plan and extract discrete segments (Line 3, Algo- There are 3 actions in the search formulation: ROTATE, rithm 2). Each (ROTATE)(FORWARD) sequence and FORWARD, and REACHGOAL, summarized in Table 1. REACHGOAL becomes a separate discrete segment. Let ROTATE action has a cost of 1. The cost of the FORWARD {e ,..., e } be the endpoints of discrete segments. We 1 L 123 Autonomous Robots Table 1 Discrete search actions and their costs Action Description Cost ROTATE Change current direction to a new direction 1 FORWARD Move forward from the current position π along current direction  by the step size σ . It is only available when  = 0 π −g REACHGOAL Connect the current position π to the goal position g where step size of the grid is σ 1 + zero length, to the safety duration s (Line 7, Algorithm 2); the reason for this will be explained in Sect. 6.3. The outputs of discrete planning are segments described by endpoints {e ,..., e } with assigned durations {T ,..., T } 0 L 1 L that are used in the trajectory optimization stage. 6.3 Trajectory optimization In the trajectory optimization stage (Algorithm 3), we for- mulate a convex quadratic optimization problem (QP) to compute a piecewise trajectory f (t ) by smoothing discrete segments. The computed trajectory is collision-free and con- tinuous up to the desired degree of derivative. However, it may be dynamically infeasible (i.e., derivative magnitudes Fig. 4 Discrete Planning. a Goal positions of two robots computed at may exceed the maximum allowed derivative magnitudes the goal selection stage are given. b Red robot plans a discrete path from its current position to its goal position on a search grid that is γ ); this is resolved during the subsequent temporal rescaling aligned on its current position. c Blue robot plans a discrete path from stage. its current position to its goals position on a search grid that is aligned on The decision variables of the optimization problem are its current position. Computed discrete paths are prepended with robot the control points of an L-piece spline where each piece is a start positions to have 0-length first segments. d The resulting discrete paths are given Bèzier curve. The duration of each piece is assigned during discrete planning. Let T = T denote the total duration i =1 of the planned trajectory. The degree of the Bézier curves is d th tuned with the parameter h.Let P ∈ R be the j control i , j prepend the first endpoint to the endpoint sequence in order to th point of the i piece where i ∈{1,..., L}, j ∈{0,..., h}. have a 0-length first segment for reasons that we will explain Let P ={P | i ∈{1,..., L}, j ∈{0,..., h}} be the set i , j in Sect. 6.3 (Line 4, Algorithm 2). The resulting L segments of all control points. are described by L +1 endpoints {e ,..., e } where e = e . 0 L 0 1 Example discrete paths for two robots are shown in Fig. 4. Next, we assign durations to each segment. The total duration of the segments is computed using the ego robot’s maximum velocity γ , the timestamp T that the goal posi- tion should be reached by, and the current timestamp T.We 6.3.1 Constraints use T − T as the desired duration of the plan. However, ˜ ˜ if T − T is negative (i.e. T < T , meaning that the goal There are 4 types of constraints on the trajectory; all are linear position should been reached in the past), or T − T is small in the decision variables P. such that the robot cannot traverse the discrete segments even 1) Workspace constraints: We shift each bounding hyper- with its maximum velocity γ , we adjust the desired dura- plane of workspace W (there are a finite number of such tion to a feasible one, specifically the total length of segments hyperplanes as W is a convex polytope) to account for the divided by the maximum velocity (Line 6, Algorithm 2). We robot’s collision shape R, such that when the robot’s posi- distribute the feasible duration to the segments except for the tion is on the safe side of the shifted hyperplane, the entire first one, proportional to their lengths (Loop at line 8, Algo- robot is on the safe side of the original hyperplane (Line 2, rithm 2). We set the duration of the first segment, which has Algorithm 3). 123 Autonomous Robots Algorithm 3 TRAJECTORY-OPTIMIZATION (ii) it allows for a richer set of collision shapes than basic Input : {e ,..., e }: Endpoints of discrete segments Voronoi cells, which is valid only for hyperspherical objects, 0 L Input : {T ,..., T }: Durations of discrete segments 1 L and (iii) SVM cells are always convex unlike generalized Input : S : Set of robot shapes Voronoi cells. Input : O : Set of obstacles We buffer the ego robot’s SVM cell to account for its colli- Input : p : Current position TaskInput : R : Collision shape function sion shape R, and constrain the first piece of the trajectory to TaskInput : W : Workspace stay inside the buffered SVM cell (BSVM) (loop at Line 5, TaskInput : c : Degree of derivative up to which resulting trajectory Algorithm 3). Only the first piece of the trajectory is con- must be continuous strained to remain in the buffered SVM cell, since the entire Parameter : h : Degree of Bézier curves Parameter : o ˜ : Obstacle check distance planning pipeline is run after δt, which is smaller than the Parameter : r˜ : Robot check distance duration s of the first piece. At that time, planning begins at Parameter : p ˜ : Preferred distance to objects a new location, generating a new first piece that must remain Parameter : α : Preferred distance cost weight in the new buffered SVM cell. Parameter : θ : Endpoint cost weights Parameter : λ : Integrated derivative cost weights Buffering is achieved by changing the offset of the hyper- Return : Potentially dynamically infeasible trajectory plane. R(x) is the shape of the robot when placed at x, defined 1 QP is a quadratic program with variables P , where R is the shape of the robot when as R(x) ={x}⊕R 0 0 2 ϒ ← BUFFER-WORKSPACE(W, R) placed the origin and ⊕ is the Minkowski sum operator. Given 3 addWorkspaceConstraints(QP, ϒ ) 4 ϒ ←∅ ∀i ∈{1,..., L} a hyperplane with normal H and offset H , we find H  that n a a 5 for ∀S ∈ S min-dist(S , R(p)) ≤˜r do j j ensures R(x) is on the negative side of the hyperplane with 6 ϒ ← ϒ ∪{BUFFERED-SVM(S , R(p), R)} 1 1 j normal H and offset H whenever x is on the negative side n a 7 end of the hyperplane with normal H and offset H , and vice n a 8 for i = 1 → L do 9 ζ ← region swept by R from e to e versa, by setting H  = H + max H · y. The follow- i i −1 i a a y∈R n 10 for ∀Q ∈ O min-dist(Q,ζ ) ≤˜ o do ing shows that whenever R(x) is on the negative side of the 11 ϒ ← ϒ ∪{BUFFERED-SVM(Q,ζ , R)} i i i hyperplane (H , H ), x is on the negative side of the hyper- n a 12 end plane (H , H ). The converse can be shown by following n a 13 end the steps backwards. 14 addCollAvoidanceConstraints(QP, ϒ ,...,ϒ ) 1 L 15 addContinuityConstraints(QP, c, p, T ,..., T ) 1 L 16 addEnergyCostTerm(QP, λ, T ,..., T ) 1 L ∀y ∈ R(x) H · y + H ≤ 0 n a 17 addDeviationCostTerm(QP, θ, e ,..., e ) 1 L ⇒ max H · y + H ≤ 0 n a 18 ϒ ← SHIFT-HYPERPLANES(ϒ , p ˜) 1 1 y∈R(x) 19 addPreferredDistanceCostTerm(QP, ϒ ,α) ⇒ max H · y + H ≤ 0 20 f (t ) ← SOLVE-QP(QP) n a y∈{x}⊕R 21 return f (t ) ⇒ max H · (y + x) + H ≤ 0 n a y∈R ⇒ H · x + H + max H · y ≤ 0 n a n Let ϒ be the set of shifted hyperplanes of W. We con- y∈R strain each control point of the trajectory to be on the valid sides of the shifted hyperplanes (Line 3, Algorithm 3). Since Since the duration of the first piece was set to the safety Bézier curves are contained in the convex hulls of their con- duration s ≥ δt in discrete planning, the robot stays within trol points, constraining control points to be in a convex set its buffered SVM cell for at least δt. Moreover, since plan- automatically constrains the Bézier curves to stay within the ning is synchronized across all robots, the pairwise SVM same convex set. hyperplanes they compute will match, thus the buffered SVM 2) Robot-robot collision avoidance constraints: For robot- cells of robots are disjoint. This ensures robot-robot collision robot collision avoidance, recall that each robot replans with avoidance until the next planning iteration. the same period δt and planning is synchronized between Computed SVMs and BSVMs are shown in Fig. 5bfor a robots. two robot case. At each iteration, the ego robot computes its SVM cell To ensure that the number of constraints of the optimiza- within the SVM tessellation of robots using hard-margin tion problem does not grow indefinitely, SVM hyperplanes SVMs. SVM tessellation is similar to Voronoi tesselation, are only computed against those robots that are at most r˜ the only difference is that pairwise SVM hyperplanes are away from the ego robot. This does not result in unsafe computed between collision shapes instead of Voronoi hyper- trajectories so long as r˜ is more than the total maximum planes. We choose SVM tessellation because (i) hard-margin distance that can be traversed by two robots while follow- SVM is convex, hence pairs of robots can compute the same ing the first pieces of their trajectories, for which an upper 1 1 exact hyperplane under the assumption of perfect sensing, bound is max (γ s + γ s ), where s and s are i , j ∈{1,...,N } i j i j i j 123 Autonomous Robots Fig. 5 Trajectory Optimization. a Discrete segments of two robots com- robot’s color. SVM hyperplanes between the swept region and the obsta- puted at the discrete planning stage. b Robot-robot collision avoidance cles are given as light-colored lines. SVM hyperplanes are buffered to constraints are computed using BSVMs. The green hyperplane is the account for the robot’s collision shape and shown as dark-colored lines SVM hyperplane between two robots. Each robot shifts the SVM hyper- (BSVMs). The shift operations are shown as arrows. Obstacles and con- plane to account for its geometry, and constrains the first piece of the straints generated from them are colored using the same colors. For each trajectory with the resulting BSVM. c–e Active set of robot-obstacle piece, the feasible region that is induced by the robot-obstacle collision collision avoidance constraints for three different pieces (one belong- avoidance constraints is colored in gray. f Trajectories computed by the ing to the blue robot, two belonging to the red robot). In each figure, trajectory optimization stage are given the region swept by the robot while traversing the segment is shown in the durations of the first pieces of the trajectories of robots i avoidance, pairs of robots must compute matching hyper- and j respectively. planes using the same algorithm. 3) Robot-obstacle collision avoidance constraints: Buffered Elements of the active set of SVM and BSVM hyperplanes SVM hyperplanes are used for robot-obstacle collision avoid- for robot-obstacle collision avoidance are shown in an exam- ance as well. Let ζ ⊂ R be the region swept by the ego robot ple scenario in Fig. 5c–e for three different pieces. th while traversing the i segment from e to e . We compute Similar to robot-robot collision avoidance, to ensure that i −1 i the SVM hyperplane between ζ and each object in O,buffer the number of constraints of the optimization problem does it as explained before to account for robot’s collision shape, not grow indefinitely, we only compute SVM hyperplanes th and constrain the i piece by the resulting buffered SVM between ζ and obstacles that are not more than o ˜ away from (Loop at line 8, Algorithm 3). This ensures that trajectory ζ . pieces do not cause collisions with obstacles. Let ϒ be the set of buffered SVM hyperplanes that th The use of SVMs for collision avoidance against obsta- constrain the i piece ∀i ∈{1,..., L}. ϒ contains both cles is a choice of convenience, as we already use them for robot-robot and robot-obstacle collision avoidance hyper- robot-robot collision avoidance. For robot-obstacle collision planes while ϒ ∀ j ∈{2,..., L} contain only robot-obstacle avoidance, one can use any separating hyperplane between collision avoidance hyperplanes. This is because the first ζ and objects in O, while in the case of robot-robot collision piece is the only piece that we constrain with robot-robot 123 Autonomous Robots collision avoidance hyperplanes because it is the only piece feasibility of the kinematic constraints. The feasibility of the that will be executed until the next planning iteration. dynamic constraints cannot be ensured for arbitrary degrees 4) Continuity constraints: We add two types of continu- of continuity. ity constraints: (i) continuity constraints between planning Remark 2 Kinematic constraints of the optimization problem iterations, and (ii) continuity constraints between trajectory generated from a discrete path output from the discrete plan- pieces (Line 15, Algorithm 3). ning stage are feasible for the same path when the degree of To enforce continuity between planning iterations, we add Bézier curves h ≥ 1. constraints that enforce Reasoning. Any Bézier curve with degree h ≥ 1 can repre- j j d f (0) d p sent a discrete segment by setting half of the points to the = ∀ j ∈{0,..., c} j j dt dt start of the segment and other half to the end of the segment. Hence, we will only show that a discrete path output from where c is the task input denoting the continuity degree up discrete planning stage by itself satisfies the kinematic con- to which the resulting trajectory must be continuous and p is straints generated. Remember that discrete path output from the current position. the discrete planning stage has the property p = e = e . 0 1 To enforce continuity between trajectory pieces, we add Each robot-robot SVM problem is feasible (see Remark 1). constraints that enforce Let H be the normal and H be the offset of any of the n a j j robot-robot SVMs. It is shifted by setting the offset to H = d f (T ) d f (0) a i i i +1 j j H + max H · y. The point p satisfies H · p + H  ≤ 0 a y∈R n n a dt dt 0 because R(p) is on the negative side of the SVM. The robot- ∀i ∈{1,..., L − 1}∀ j ∈{0,..., c} robot BSVM hyperplanes are used to constrain the first piece th of the trajectory, and setting the first piece as a 0-length seg- where f (t ) is the i piece of the trajectory. ment with p = e = e satisfies the robot-robot collision 0 1 Remark 1 discusses that the SVM problems generated dur- avoidance constraints. ing trajectory optimization are feasible, i.e., the trajectory Each robot-obstacle SVM problem is feasible (see Remark 1). optimization stage will always succeed constructing the QP. Let H be the normal and H be the offset of any of the robot- n a Remark 1 All SVM problems generated for robot-robot and obstacle SVMs that is between ζ and an obstacle. It is shifted robot-obstacle collision avoidance from the discrete path out- by setting the offset to H  = H +max H ·y. All points a a y∈R n putted by the discrete planning stage are feasible. on the line segment p (t ) = e + t (e − e ), t ∈[0, 1] i i −1 i i −1 from e to e satisfy the BSVM constraint because R(p (t )) i −1 i i Reasoning. The discrete planning stage outputs a discrete is on the negative side of the SVM hyperplane ∀t ∈[0, 1]. path such that a robot with collision shape R following the Since each SVM hyperplane between ζ and obstacles is only path does not collide with any obstacles in O or any other th used to constrain the i piece, constraints generated by it are robots in S. It also ensures that p = e since the search starts feasible for the segment from e to e . i −1 i from the robot’s current position p. Hence, R(p) does not The feasibility of the workspace constraints are trivial intersect with any S ∈ S. Since each robot is assumed to since the robot moving along discrete segments is contained be convex, there exists at least one hyperplane that separates in the workspace, and we shift bounding hyperplanes of the R(p) from S for each j by the separating hyperplane the- workspace in the same way as SVM hyperplanes. Hence, the orem. SVM is an optimal separating hyperplane according discrete path satisfies the workspace constraints. to a cost function. Therefore, SVM problems between R(p) Initial point position continuity of the robot is satisfied by and S ∈ S for robot-robot collision avoidance are feasible. the given discrete segments since p = e . Position continuity Since the robot moving along the discrete segments is col- between segments are trivially satisfied by the given discrete lision free, ζ is collision free for all i. Also, since it is a sweep segments, since the discrete path is position continuous by of a convex shape along a line segment, ζ is convex as shown its definition. in Lemma 2 in Appendix A. Similar to the previous argument, since ζ is collision-free and convex and all obstacles in the 6.3.2 Cost function environment are convex, each SVM problem between ζ and Q ∈ O for robot-obstacle collision avoidance is feasible by The cost function of the optimization problem has 3 terms: the separating hyperplane theorem. (i) energy usage, (ii) deviation from the discrete path, and Workspace, robot-robot collision avoidance, robot-obstacle (iii) preferred distance to objects. collision avoidance, and position continuity constraints are We use the sum of integrated squared derivative magni- kinematic constraints. Higher-order continuity constraints tudes as a metric for energy usage (Line 16, Algortihm 3), are dynamic constraints. Remark 2 discusses the ensured similar to the works of Richter et al. (2013) and Hönig et al. 123 Autonomous Robots (2018). Parameters λ ={λ } are the weights for integrated Notice that we formulate continuity and the safety of th squared derivative magnitudes, where λ is the weight for j the trajectory using hard constraints. This ensures that the degree of derivative. The energy term J (P) is given as resulting trajectory is kinematically safe and continuous up energy to degree c if the optimization succeeds for all robots and T j planning is synchronized. d f (t ) J (P) = λ dt . energy j dt λ ∈λ 6.4 Temporal rescaling We use squared Euclidean distances between trajectory At the temporal rescaling stage, we check whether the piece endpoints and discrete segment endpoints as a metric dynamic limits of the robot are violated. If the resulting tra- for deviation from the discrete path (Line 17, Algorithm 3). jectory is valid, i.e. derivative magnitudes of the trajectory Remember that each Bézier curve piece i ends at its last are less than or equal to the maximum derivative magnitudes control point P . Parameters θ ={θ }∈ R are the weights i ,h i k γ ∀k ∈{1,..., K }, the trajectory is returned as the output for each segment endpoint. The deviation term J (P) is dev of RLSS. If not, temporal rescaling is applied to the trajec- given as tory similar to Hönig et al. (2018) and Park et al. (2020)by increasing the durations of the pieces so that the trajectory is J (P) = θ P − e . dev i i ,h i valid. We scale the durations of pieces by multiplying them i ∈{1,...,L} with a constant parameter greater than 1 until the dynamic limits are obeyed. The last term of the cost function models the preferred We choose to enforce dynamic feasibility outside of the distance to objects. We use this term to discourage robots trajectory optimization problem as a post processing step from getting closer to other objects in the environment; this because (i) the output of the trajectory optimization stage is increases the numerical stability of the algorithm by driving often dynamically feasible, hence rescaling is rarely needed, robots away from tight states. and (ii) adding piece durations as variables to the opti- We shift each hyperplane in ϒ (i.e., buffered SVM hyper- mization problem would make it a non quadratic program, planes constraining the first piece) by the preferred distance potentially decreasing performance. to objects, p ˜, to create the preferred hyperplanes ϒ (Line 18, Algorithm 3). Since each robot replans with the period δt,we add a cost term that drives the robot closer to the preferred 7 Evaluation hyperplanes at the replanning period (Line 19, Algorithm 3). We take the sum of squared distances between f (δt ) and Here, using synchronized simulations, we first evaluate our hyperplanes in ϒ : algorithm’s performance when different parameters are used in Sect. 7.1. Second, we conduct an ablation study to show the J (P) = α H f (δt ) + H pre f a effects of two important steps, namely the prepend operation of the discrete planning stage and the preferred distance cost H∈ϒ term of the trajectory optimization stage, to the performance where H is the normal and H is the offset of hyperplane n a of the algorithm in Sect. 7.2. Third, we compare our algorithm H, and α is the weight of J term. Notice that this term is pre f to two state-of-the-art baseline planners in Sect. 7.3. Finally, defined over the control points of first piece since δt ≤ s = we show our algorithm’s applicability to real robots by using T , supporting the utilization of ϒ only. 1 1 it on quadrotors and differential drive robots in Sect. 7.4. The overall trajectory optimization problem is: We conduct our simulations on a laptop computer with Intel i7-8565U @ 1.80GHz running Ubuntu 20.04. We min J (P) + J (P) + J (P) s.t. energy dev pre f implement our algorithm for a single core because of imple- mentation simplicity and fairness to the baseline planners, H P + H ≤ 0 ∀i ∈{1,..., L} i , j a which are not parallelized. The memory usage of each sim- ∀ j ∈{0,..., h} ulation is 30MB on average for our algorithm. In synchronized simulations, we compute trajectories for ∀H ∈ ϒ ∪ ϒ i W each robot using the same snapshot of the environment, move j j d f (0) d p = ∀ j ∈{0,..., c} robots perfectly according to computed trajectories for the re- j j dt dt planning period, and replan. The effects of planning iterations j j d f (T ) d f (0) i i i +1 taking longer than the re-planning period are not modeled in = ∀i ∈{1,..., L − 1} j j dt dt the synchronized simulations. We show that all algorithms ∀ j ∈{0,..., c}. (RLSS and the baselines) can work in 1Hz − 10Hz on 123 Autonomous Robots the hardware we use, and assert that more powerful com- volume swept by the ego robot while traversing a given puters can be used to shorten planning times. In addition, segment, and check collisions between the robot travers- parallelization of the A* search on GPUs is possible. For ing the segment and only the obstacles returned from the example, Zhou and Zeng (2015) show the possibility of 6–7 query. Also, to generate robot-obstacle collision avoidance x speedup in A* search for pathfinding problems on GPUs. constraints, we execute axis aligned box queries around the Moreover, parallelization of quadratic program solving is segments to retrieve nearby obstacles. We generate BSVM possible through (i) running multiple competing solvers in constraints only against nearby obstacles that are no more multiple cores and returning the answer from the first one that than o ˜ away from the robot traversing the segment. Other pop- solves the problem, or (ii) parallelizing individual solvers. ular approaches for environment representations include (i) For instance, IBM ILOG CPLEX Optimizer and Gurobi Euclidean signed distance fields (ESDFs) (Oleynikova et al., Optimizer support running multiple competing solvers con- 2017), which support fast distance queries to nearest obsta- currently. IBM ILOG CPLEX Optimizer supports a parallel cles, (ii) 3D circular buffers (Usenko et al., 2017), which aim barrier optimizer, which parallelizes a barrier algorithm for to limit memory usage of maps and supports fast occupancy solving QPs. Gondzio and Grothey (2009) propose a par- checks, and (iii) Gaussian mixture models (O’Meadhra et allel interior point method solver that exploits nested block al., 2019), which continuously represent occupancy instead structure of problems. Performance improvements of these of discretizing the environment as the former approaches do. approaches are problem dependent, and we do not investigate None of these representations are as suitable as OcTrees for how much the performance of our trajectory optimization RLSS since they do not allow fast querying of obstacles in stage could be improved with such methods. the vicinity of segments. We use the IBM ILOG CPLEX In all experiments, 32 robots are placed in a circle forma- Optimizer to solve our optimization problems. In all experi- tion of radius 20m in 3D, and the task is to swap all robots to ments, robots that are closer than 0.25m to their goal positions the antipodal points on the circle. The workspace W is set to are considered as goal reaching robots. If a robot that has not reached its goal does not change its position more than 1cm an axis aligned bounding box from −25m −25m 0m to in the last 1s of the simulation, it is considered as a dead- 25m 25m 5m . Robot collision shapes are modeled as axis locked robot. At any point of the simulation, if each robot is aligned cubes with 0.2m edge lengths. The desired planning either at the goal or deadlocked, we conclude the simulation. horizon τ is set to 5s. The safety distance D of goal selection is set to 0.2m. Robots have velocity limit γ = 3.67 , and 2 m 7.1 Effects of selected parameters accelaration limit γ = 4.88 , which are chosen arbitrarily. The safety duration s is set to 0.11s and re-planning period We evaluate the performance of our algorithm when 4 impor- δt is set to 0.1s. We set integrated derivative cost weights tant parameters are changed: step size σ of the search grid, λ = 2.0 for velocity and λ = 2.8 for acceleration. We 1 2 degree h of Bézier curves, obstacle check distance o ˜, and set endpoint cost weights to θ = 0, θ = 150, θ = 240, 1 2 3 robot check distance r˜. Step size σ of the search grid is the θ = 300 ∀i ≥ 4. Setting θ = 0 allows the optimiza- i 1 parameter that affects discrete planning performance most tion to stretch the first 0-length segment freely, and setting because it determines the amount of movement at each step other θs incrementally increases the importance of tail seg- during the A search. The degree h of Bézier curves is ments. We set the preferred distance to objects to p ˜ = 0.6m important in trajectory optimization because it determines and the preferred distance cost weight α = 0.3. We use the the number of decision variables. The obstacle check dis- OcTree data structure from the octomap library (Hornung tance o ˜ and robot check distance r˜ determine the number of et al., 2013) to represent the environment. Each leaf-level collision avoidance constraints in the optimization problem. occupied axis aligned box of the OcTree is used as a sep- In all of the parameter evaluations, we use a random 3D arate obstacle in all algorithms. OcTree allows fast axis forest environment with OcTree resolution 0.5m in which aligned box queries which return obstacles that intersects 10% of the environment is occupied. There are 2332 leaf- with a given axis aligned box, an operation we use exten- level boxes in OcTree, translating to 2332 obstacles in total. sively in our implementation. For example, we use axis We set the desired trajectory of each robot to the straight line aligned box queries in discrete planning as broadphase col- segment that connects the robot’s start and goal positions. lision checkers to find the obstacles that are close to the We set the duration of the segment to the length of the line segment divided by the maximum velocity of the robot. We https://www.ibm.com/docs/en/icos/22.1.0?topic=optimizers- enforce continuity up to velocity, hence set c = 1. concurrent-optimizer-in-parallel. In our experiments, our algorithm does not result in any https://www.gurobi.com/documentation/9.5/refman/ collisions or deadlocks. concurrent_environments.html. https://www.ibm.com/docs/en/icos/22.1.0?topic=optimizers-using- parallel-barrier-optimizer. https://www.ibm.com/analytics/cplex-optimizer. 123 Autonomous Robots 1 m We report average computation time per iteration (Fig. 6) γ = 3.67 , the maximum amount of distance that can and average navigation duration of robots (Fig. 7). Average be traversed by a robot until the next planning iteration is navigation duration is the summed total navigation time for 0.367m. The obstacle check distance must be more than this all robots divided by the number of robots. value for safety. We set step size σ = 0.77m, robot check distance r˜ = 2.0m, and degree of Bézier curves h = 12, 7.1.1 Step size  of discrete search which are determined by premiliminary experiments and the results of the experiments in Sects. 7.1.1, 7.1.2, and 7.1.4. We evaluate our algorithm’s performance when step size σ The results are summarized in Figs. 6c and 7c. is changed. We set σ to values between 0.25m to 1.5m with The obstacle check distance is the most important param- 0.25m increments. We set obstacle check distance o ˜ = 1.0m, eter that determines the speed of trajectory optimization, and robot check distance r˜ = 2.0m, and degree of Bézier curves hence the planning pipeline. As o ˜ increases, the number of h = 12 in all cases, which are determined by premil- SVM computations and the number of constraints in the iminary experiments and the results of the experiments in optimization problem increases, which results in increased Sects. 7.1.2, 7.1.3, and 7.1.4. The results are summarized in computation time. Average navigation durations of the robots Figs. 6a and 7a. are close to 22.5m in all cases, suggesting the robustness As the step size gets smaller, discrete search takes more of the algorithm to this parameter. We explain the reason time; but the algorithm can still work in about 2Hz even when of this robustness as follows. All obstacles are considered σ = 0.25m. The average navigation duration of robots are during discrete search and o ˜ determines the obstacles that close to 22.5s in each case, suggesting the robustness of the are considered during trajectory optimization. Therefore, the algorithm to the changes in this parameter. At σ ≥ 0.75, path suggested by the discrete search is already very good, the time discrete planning takes is less than 6% of the time and obstacle avoidance behavior of trajectory optimization trajectory optimization takes. is only important when the discrete path is close to obsta- We also run the algorithm with σ = 3m,σ = 6m, and cles. In those cases, all obstacle check distances capture the σ = 12m, which decreases the flexibility of the discrete obstacles in the vicinity of the path. Therefore, the quality of search considerably. In all of those cases, discrete search the planned trajectories does not increase as o ˜ increases. results in fluctuations, and some robots get stuck in livelocks, in which they move between same set of positions without reaching to their goal positions. 7.1.4 Robot check distance r˜ 7.1.2 Degree h of Bézier curves Last, we evaluate our algorithm’s performance when the robot check distance r˜ is changed. We set r˜ to values between Next, we evaluate our algorithm’s performance when the 1m and 3.5m with 0.5m increments. Robot check distance degree h of Bézier curves is changed. We set h to values must be at least twice the amount of distance that can be in {5,..., 12}. We set step size σ = 0.77m, obstacle check traversed by the robot in one planning iteration, i.e. 0.734m, distance o ˜ = 1.0m, and robot check distance r˜ = 2.0m, because two robots may be travelling towards each other which are determined by premiliminary experiments and the with their maximum speed in the worst case. We set step size results of the experiments in Sects. 7.1.1, 7.1.3, and 7.1.4. σ = 0.77m, obstacle check distance o ˜ = 1.0m, and degree The results are summarized in Figs. 6b and 7b. of Bézier curves h = 12, which are determined by premil- Even if the degree of the Bézier curves determine the iminary experiments and the results of the experiments in number of decision variables of the trajectory optimization, Sects. 7.1.1, 7.1.2, and 7.1.3. the computation time increase of the trajectory optimization The speed of the algorithm is not affected by the robot stage is not more than 10% between degree 5 Bézier curves check distance considerably, because there are 32 robots in and degree 12 Bézier curves. Also, average navigation dura- the environment, and from the perspective of the ego robot, tion of robots are close to 22.5m in each case, suggesting the there are at most 31 constraints generated from other robots. robustness of the algorithm to the changes in this parameter Since there are more than 2000 obstacles in the environment, as well. effects of the obstacle check distance are more drastic than robot check distance. Similar to other cases, average naviga- 7.1.3 Obstacle check distance o ˜ tion duration of the robots is not affected by the choice of r˜ because constraints generated by the robots far away do Next, we evaluate our algorithm’s performance when the not actually constrain the trajectory of the ego robot since obstacle check distance o ˜ is changed. We set o ˜ to val- the ego robot cannot move faster than its maximum speed ues between 0.5m and 3m with 0.5m increments. Since and hence the first piece of the trajectory is never affected by the replanning period δt = 0.1s, and maximum velocity those constraints. 123 Autonomous Robots Fig. 6 Average computation time per stage are given when different parameters are used. Goal selection and temporal rescaling steps are given as“other”since they take a small amount time compared to other two stages. All experiments are done in a 3D random forest environment with 10% occupancy Fig. 7 Average navigation duration of robots from their start positions to their goal positions are given when the selected parameters are changed. In all cases, average navigation duration is not affected by the changes in the selected parameters in the chosen ranges Overall, RLSS does not result in collisions or deadlocks line segments divided by the maximum speed of the robots. when parameters are not set to extreme values. In addi- During simulation, robots continue using their existing plans tion, changes in parameters, outside of extreme ranges, do when planning fails. Also, colliding robots continue navigat- not result in significant changes in the average navigation ing and are not removed from the experiment. durations. These suggest that RLSS does not need extensive We generate ten 3D-maze like environments and list the parameter tuning. average and standard deviation values for our metrics in Table 2. We report the failure rate of all algorithms in the form of the ratio of number of failures to the number of plan- 7.2 Ablation study ning iterations (Fail Rate column in Table 2), the number of robots that are involved in at least one collision during navi- We investigate the effects of (i) the prepend operation of gation (# Coll. column in Table 2) and the average navigation the discrete planning stage (Line 4, Algorithm 2) and (ii) the duration of all robots (Avg. Nav. Dur. column in Table 2). preferred distance cost term J of the trajectory optimiza- pre f The failure rate of RLSS is 0.01% on average. RLSS w/o tion stage to the performance of the algorithm. The prepend pref. dist. fails 0.06% of the time. RLSS w/o prepend fails operation enables the kinematic feasibility of the generated 12.73% of the time. This is drastically more than RLSS optimization problem as shown in Remark 2. The preferred w/o pref. dist because our prepend operation ensures the distance cost term increases the numerical stability of the kinematic feasibility of the optimization problem while our algorithm by encouraging robots to create a gap between preferred distance cost term tackles numerical instabilities themselves and other objects. only. RLSS with neither results in a failure rate of 10.62%. We consider four versions our algorithm: RLSS, RLSS Interestingly, RLSS with the preferred distance cost term but without the prepend operation (RLSS w/o prepend), RLSS without the prepend operation (RLSS w/o prepend) results without the preferred distance cost term (RLSS w/o pref. in a higher failure rate than RLSS with neither. We do not dist.) and RLSS with neither. We set step size σ = 0.77m, investigate the root cause of this since failure rates in both obstacle check distance o ˜ = 1.0m, robot check distance cases are a lot higher than of RLSS. r˜ = 2.0m, degree of Bézier curves h = 12, and c = 1 The effects of failures are seen in the next two metrics. (continuity up to velocity). Robots navigate in 3D maze like RLSS results in no collisions. The number of colliding robots environments. The desired trajectories are set to straight line increase to 0.67 in RLSS w/o pref. dist, 6.78 in RLSS w/o segments connecting robot start positions to goal positions prepend and 8.67 in RLSS with neither on average. The and the durations of the segments are set to the length of the 123 Autonomous Robots Table 2 The results of the Fail Rate # Coll Avg. Nav. Dur [s] ablation study RLSS 0.89 (1.45) / 7904 (297) 0 (0) 24.68 (0.93) RLSS w/o pref. dist 4.78 (9.81) / 8194 (489) 0.67 (1) 25.58 (1.53) RLSS w/o prepend 1556 (120) / 12,220 (538) 6.78 (3.73) 38.17 (1.68) RLSS with neither 1569 (101) / 14,778 (412) 8.67 (2.87) 46.16 (1.29) We compare four different versions of RLSS. We ablate (i) the prepend operation of the discrete planning stage and (ii) the preferred distance cost term of the trajectory optimization stage. The details of the metrics are given in Sect. 7.2. The reported values are means and standard deviations (given in parentheses) of 10 experiments in random maze-like environments. Both prepend operation and preferred distance cost term are important for the effectiveness of the algorithm average navigation duration of robots is lowest in RLSS. It time invariant systems with position output. We define the increases by 3.65% in RLSS w/o pref. dis, 54.66% in RLSS systems with three matrices A, B, C. Since the output of the w/o prepend and 87.03% in RLSS with neither compared to system is the position of the robot, it cannot depend on the RLSS. current input, hence D = 0. This allows us to formulate the These results show that the prepend operation is more problem for a richer set of dynamics and have constraints on important than the preferred distance cost term for the success higher order derivatives than robot velocity. of the algorithm. Nevertheless, RLSS needs both to be safe BVC as published also does not consider obstacles in the and effective. environment. We extend their formulation to static obstacles by modeling obstacles as robots. Since obstacles are static, they stay within their buffered Voronoi cells at all times. 7.3 Comparisons with baseline planners Since we model obstacles as robots, extended BVC does not require the ability of distinguishing robots from obstacles. We compare the performance of our planner to two baseline Our extended BVC formulation, which we call eBVC, is planners that do not require communication in 3D experi- as follows: ments. We set step size σ = 0.77m, obstacle check distance o ˜ = 1.0m, robot check distance r˜ = 2.0m, and degree of M M −1 Bézier curves h = 12 in RLSS in all cases. min λ p − d(T + M t ) + θ u s.t . i i i i u ,...,u 0 M −1 i =1 i =0 x = Ax + Bu ∀i ∈{0,..., M − 1} i +1 i i 7.3.1 Extended buffered Voronoi cell (eBVC) planner p = Cx ∀i ∈{0,..., M } i i The first baseling planner is a MPC-style planner based on p ∈ V ∀i ∈{0,..., M } buffered Voronoi cells introduced by Zhou et al. (2017), p ∈ W ∀i ∈{0,..., M } which we call BVC. In the BVC approach, each robot com- u  u  u ∀i ∈{0,..., M − 1} min i max putes its Voronoi cell within the Voronoi tesselation of the x  x  x ∀i ∈{0,..., M } environment. This is done by using the position informa- min i max tion of other robots. Robots buffer their Voronoi cells to account for their collision shapes and plan their trajectories where T is the current timestamp, d(t ) is the desired trajec- within their corresponding buffered Voronoi cells. Similar to tory for the robot, t is the discretization timestep of the RLSS, BVC does not require any communication between system, M is the number of steps to plan for, u is the input robots, requires perfect sensing of robot positions in the envi- to apply from timestep i to i + 1, x is the state at timestep i, ronment, and the resulting trajectories are safe only when p is the position at timestep i, V is the buffered Voronoi cell planning is synchronized between robots. They also require of the robot, u and u are the limits for the inputs, x min max min that each robot stays within its buffered Voronoi cell until and x are the limits for the states (which can be used to max the next planning iterations. Unlike RLSS, BVC is based on bound velocity in a double integrator system, or velocity and buffered Voronoi cells and it cannot work with arbitrary con- acceleration in a triple integrator system for example), and vex objects. Instead, robots are modeled as hyperspheres in W is the workspace of the robot. M t is the planning hori- BVC. zon. A robot plans toward the position d(T + M t ), which The original formulation presented in the BVC article only is the position of the robot after the planning horizon if it allows position state for the robots, which cannot model a could follow the desired trajectory perfectly. x is the current rich set of dynamics, including double, triple, or higher order state of the robot. The first term of the cost function penalizes integrators. We extend their formulation to all discrete linear deviation from the goal position for the final and each inter- 123 Autonomous Robots mediate position with different weights λ . The second term of the cost function is the input cost that penalizes input mag- nitudes with different weights θ . We apply the first input of the solution for duration t, and replan at the next timestep. We use our own implementation of eBVC as explained above during the comparisons. 7.3.2 Relative safe flight corridor (RSFC) planner The second planner we compare against is presented by Park and Kim (2021), in which piecewise Bézier curves are com- puted, executed for a short duration, and replanning is done at the next iteration similar to our work. It utilizes the fact that the difference of two Bézier curves is another Bézier curve by constraining these relative Bézier curves to be inside safe regions (relative safe flight corridors, or RSFCs) defined Fig. 8 Desired trajectories and the used forest map of experiment 4 is given in (a). Same forest map is used in each forest experiment. according to robot collision shapes. We call this algorithm Desired trajectories and the used maze map of experiment 7 is given RSFC for short. RSFC does not require any communication in (b). Same maze map is used in each maze experiment. c shows the between robots. It utilizes both positions and velocities of executed trajectories of robots running RLSS from top in experiment 4. other objects in the environment, hence requires more sens- d shows the executed trajectories of robots running RLSS from top in experiment 7 ing information than our algorithm. Velocities are used to predict the trajectories of other robots as piecewise Bézier curves. While it can handle dynamic obstacles as well, we use it in static environments in our comparisons, since RLSS velocity continuity is required, the system of eBVC is a does not handle dynamic obstacles explicitly. double integrator with position output. When acceleration We use the authors’ implementation of RSFC during our continuity is required, the system of eBVC is a triple inte- comparisons. grator with position output. We compare RSFC to RLSS and eBVC only with accelera- 7.3.3 Experiments and results tion continuity requirement because the authors’ implemen- tation hard codes the acceleration continuity requirement, We compare RLSS against eBVC and RSFC in 10 differ- while in theory it can work with any degree. Also, we com- ent experiments differing in required degree of continuity, pare RSFC only in the case of an empty prior map in order desired trajectories, and map of the environment. In all exper- not to change RSFC’s source code, while in theory it can be iments, 32 robots are placed in a circle formation with radius guided with arbitrary trajectories. 20m in 3D. The task is to swap the positions of robots to the We plan for 5s long trajectories in every 0.1s with all antipodal points on the circle. The results of the experiments algorithms. In eBVC, we plan for M = 50 steps with dis- are summarized in Table 3. cretization timestep t = 0.1s. We set state and input upper There are 3 maps we use: empty, forest (Fig. 8a), and maze and lower bounds in eBVC in order to obey the dynamic lim- (Fig. 8b), listed in the map column of Table 3. In the empty its of the robots. We set distance to goal weights λ = 120, map, there are no obstacles in the environment. The forest λ = 20 ∀i ≥ 2 in eBVC, putting more importance on the map is a random forest with 10% occupancy; it has a radius position of the robot after 1 timestep. We set θ = 1 ∀i in of 15m and each tree is a cylinder with radius 0.5m.The eBVC. maze map is a maze-like environment with choke regions. Both eBVC and RSFC require spherical obstacles. We use We compute the desired trajectories of the robots by run- the smallest spheres containing each leaf-level box of the ning a single-agent shortest path using the discrete planning OcTree structure as obstacles in eBVC and RSFC. Similar to stage of RLSS. We run single-agent shortest path on a prior RLSS, we use robot and obstacle check distance to limit the map, which is set to either a full map of the environment or number of obstacles considered at each iteration. We set both to an empty map; this is listed in the prior map column of obstacle and robot check distance o ˜ =˜r = 2.0m in eBVC, Table 3. When the prior map is empty, single-agent short- and set o ˜ =˜r = 5.0m in RSFC, since smaller values in RSFC est paths are straight line segments connecting robot start result in a high number of collisions and higher values for positions to robot goal positions. the parameters do not improve the success of the algorithms. We set two different continuity requirements: velocity and Robots are modeled as spheres in eBVC and RSFC as well. acceleration, listed in the continuity column of Table 3. When We set robot shapes to spheres with radius 0.173m, which 123 Autonomous Robots Table 3 The results of the comparisons of RLSS, eBVC, and RSFC are summarized Experiment Map Prior Map Continuity Algorithm # Deadlocks # Coll. Robots Avg. Collision Dur. [s] Avg. Nav. Dur. [s] Avg. Comp. Time [ms] 1 Empty Empty Velocity RLSS 0 0 NA 22.37 107 eBVC 0 0 NA 18.50 106 2 Acceleration RLSS 0 0 NA 22.12 110 eBVC 0 0 NA 20.65 161 RSFC 0 0 NA 19.83 47 3 Forest Forest Velocity RLSS 00 NA 23.11 182 eBVC 10 7 0.53 18.94 387 4 Acceleration RLSS 00 NA 23.06 147 eBVC 8 13 0.83 21.24 763 5 Forest Empty Velocity RLSS 00 NA 22.62 192 eBVC 8 24 1.35 18.54 186 6 Acceleration RLSS 00 NA 22.72 244 eBVC 11 15 0.71 21.28 970 RSFC 0 6 0.42 21.82 201 7 Maze Maze Velocity RLSS 00 NA 25.09 160 eBVC 21 9 0.67 20.12 145 8 Acceleration RLSS 00 NA 25.32 170 eBVC 17 13 1.49 23.80 264 9 Maze Empty Velocity RLSS 00 NA 27.98 410 eBVC 32 0 NA NA 176 10 Acceleration RLSS 00 NA 32.04 386 eBVC 30 2 3.26 21.75 407 RSFC 0 22 0.34 28.23 80 Each experiment differ in the map used during navigation, prior map used during desired trajectory computation, and the required degree of continuity. RLSS results in longer navigation durations than eBVC and RSFC on average; but eBVC suffers from deadlocks in environments with obstacles and RSFC results in collisions in environments with obstacles Autonomous Robots are the smallest spheres containing the actual robot shapes. lot better than eBVC. RSFC results in no deadlocks while We count collisions only when contained leaf-level OcTree eBVC results in 11 deadlocks. 15 out of remaining 21 robots boxes and contained robot shapes intersect. This gives both get involved in at least one collision when eBVC is used with eBVC and RSFC buffer zones for collisions. an average collision duration of 0.71s. 6 out of 32 robots We report the number of deadlocking robots (# Dead- running RSFC collide at least once with average collision locks column in Table 3), number of robots that are involved duration of 0.42s. RLSS does not result in any deadlocks or in at least one collision (# Coll. Robots column in Table 3), collisions. collision duration of robots that are involved in collisions When the environment is a complicated maze, the per- averaged over robots (Avg. Collision Dur. column in Table 3), formance of both eBVC and RSFC degrades more. With full average navigation duration of non-deadlocking robots from prior map and velocity continuity (experiment 7), 21 out of 32 their start positions to goal positions (Avg. Nav. Dur. column robots deadlock, 9 out of remaining 11 are involved in colli- in Table 3), and computation time per iteration averaged over sions, leaving only 2 reaching to their goal volumes without each planning iteration of each robot (Avg. Comp. Time col- an incident when eBVC is used. With full prior map and umn in Table 3). We continue the navigation of colliding acceleration continuity (experiment 8), 17 out of 32 robots robots and do not remove them from the experiment. deadlock, 13 out of remaining 15 are involved in collisions, Executed trajectories of robots running RLSS are shown again leaving only 2 that reach their goal volumes without in Fig. 8c for experiment 4, and in Fig. 8d for experiment 7 incident when eBVC is used. RLSS does not result in any as examples. deadlocks or collisions in these scenarios. When the prior RLSS does not result in any deadlocks or collisions in map is empty, i.e. the desired trajectories are straight line all cases. eBVC has a significant number of deadlocks and segments, the performance of eBVC degrades even more. All RSFC results in collisions in experiments with obstacles. robots deadlock in the case with velocity continuity (experi- When there are no obstacles in the environment, e.g., ment 9), while 30 out of 32 robots deadlock and the rest get in experiments 1 and 2, no algorithm results in deadlocks involved in collisions in the case with acceleration continuity or collisions. When velocity continuity is required, e.g., in (experiment 10). No robot running RSFC deadlocks but 22 experiment 1, the average navigation duration of robots run- out of 32 get involved in collisions at least once with 0.34s ning eBVC is 18% lower than those that run RLSS. Both average collision duration in experiment 10 with accelera- eBVC and RLSS run close to 9Hz on average. When acceler- tion continuity. RLSS does not result in any deadlocks or ation continuity is required, e.g., in experiment 2, the average collisions in those cases. navigation duration of robots running RSFC is 10% lower Overall, robots running RLSS have higher navigation than those that run RLSS; and the average navigation dura- durations than those that use eBVC or RSFC in all exper- tion for eBVC is 7% lower than that for RLSS. RLSS runs at iments. While the higher navigation duration is not an about 9Hz on average, eBVC runs close to 6Hz on average, important metric when other algorithms cause deadlocks or and RSFC runs close to 21Hz on average. When there are collisions, it is an important metric when they do not (exper- no obstacles in the environment, the discrete search of RLSS iments 1 and 2 with no obstacles in particular). eBVC does results in unnecessary avoidance movements, which is the not have an integrated discrete planner, which is the rea- main reason for average navigation duration differences. son for its good performance in terms of average navigation When there are obstacles in the environment, performance durations. When robots are close to each other, RLSS uses of both eBVC and RSFC degrades in terms of the number of the free space less effectively since discrete planning has a deadlocks and collisions. step size 0.77m. eBVC does not rely on discrete planning In experiment 3, even if the full prior map of the envi- and hence avoids other robots by executing small direction ronment is given during initial discrete search with only changes. RSFC utilizes velocities of other robots on top of velocity continuity, 10 out of 32 robots deadlock, and 7 out positions, which allows it to estimate the intents of robots of the remaining 22 get involved in at least one collision more effectively, resulting in a better usage of the free space, when eBVC is used. When acceleration continuity is required and hence results in better navigation durations on average. (experiment 4), 8 robots deadlock and 13 other robots get Since RLSS does not utilize communication and uses only involved in collisions, resulting in only 11 robot reaching positions of other robots, it cannot deduce the intents of other their goal volumes without collisions in eBVC. RLSS both robots. This results in fluctuations of plans between planning works faster than eBVC and results in no deadlocks or colli- iterations, which increases the average navigation durations sions in those cases. of robots. Fluctuations of plans increase when the environ- When the prior map is not known in the forest environment ment is dense as seen in the supplemental video. If the (experiments 5 and 6), eBVC results in a lot of deadlocks environment becomes overly constraining, e.g. tens of robots and collisions. All robots in experiment 5 either deadlock or collide when eBVC is used. In experiment 6, RSFC does a https://youtu.be/Jrdvf2qyzrg. 123 Autonomous Robots Turtlebot3s, and Turtlebot2s are equipped with ODROID XU4 and ODROID C1+ single board computers running ROS Kinetic on the Ubuntu 16.04 operating system. In all cases the algorithm is run on a centralized base station com- puter using separate processes for each robot. Therefore, unlike simulations, planning is not synchronized between robots on real robot implementations. RLSS does not result in any deadlocks in collisions in these asynchronized deploy- ments as well. Commands are sent to 2D robots over a WiFi network, and to Crazyflie 2.0 s directly over their custom radio. We conduct external disturbance experiments with 2 iRobot Create2s, 3 Turtlebot3s, and 2 Turtlebot2s (Fig. 9a). A human changes the positions of some robots by moving them arbitrarily during execution several times. In all cases, robots replan in real-time and avoid each other successfully. We demonstrate the algorithm in 3D using 6 Crazyflie 2.0 s. We conduct an experiment without obstacles in which Crazyflies swap positions with straight lines as desired tra- jectories (Fig. 9b). In another experiment, we show that Fig. 9 Physical robot experiments using Turtlebot2s, Turtlebot3s, Crazyflies can navigate through an environment with obsta- iRobot Create2s, and Crazyflie 2.0 s. RLSS works in real-time under cles (9c, d). In each case, we externally disturb the Crazyflies external disturbances and show that they can replan in real-time. The recordings for our physical robot experiments are trying to pass through a narrow tube, these fluctuations may included in the supplemental video. turn into livelocks. However, when obstacles are introduced in the envi- ronment, the performance of RLSS is better than other 8 Conclusion algorithms. eBVC suffers greatly from deadlocks. RSFC does not result in deadlocks but results in collisions, even In this article, we present RLSS, a real-time decentralized while using more information than RLSS (velocity and posi- long horizon trajectory planning algorithm for the navigation tion instead of position only). of multiple robots in shared environments with obstacles that A note about the statistical significance of the results. In provides guarantees on collision avoidance if the resulting each experiment, we run each algorithm on single randomly problems are feasible. The generated optimization problem to generated forest-like or maze-like environment. To show that compute a smooth trajectory is convex and kinematically fea- the results are consistent for environment types, we generate sible. It does not require any communication between robots, 10 forest-like environments with the same parameters for requires only position sensing of obstacles and robots in the experiment 3 and run RLSS and eBVC. RLSS does not result environment, and robots to be able to distinguish other robots in deadlocks or collisions in any of the cases. The average from obstacles. With its comparatively minimal sensing navigation duration of robots averaged over 10 environments requirements and no reliance on communication, it presents a is 19.15s with standard deviation 0.24s for eBVC and 23.59s new baseline for algorithms that require communication and with standard deviation 0.25s for RLSS. The ratio between sensing/prediction of higher order state components of other average navigation durations when eBVC or RLSS is used robots. The algorithm considers the dynamic limits of the is consistent with the reported values given in Table 3 for robots explicitly and enforces safety using hard constraints. experiment 3. We show in synchronized simulation that RLSS performs better than two state-of-the-art planning algorithms (eBVC 7.4 Physical robots and RSFC), one of which requires velocity sensing on top of position sensing, in environments with obstacles in terms of We implement and run RLSS on physical robots using iRobot Create2s, Turtlebot2s and Turtlebot 3 s in 2D; and Since we define robots’ goals as single points, i.e. sets of measure Crazyflie 2.0 s in 3D. We use a VICON motion tracking sys- zero, in physical experiments, robots keep missing their goals slightly. tem for localization. Robots do not sense, but receive the This results in a spinning behavior in 2D since robots continuously fix position of others using the VICON system. iRobot Create2s, their positions by replanning. 123 Autonomous Robots number of deadlocks and number of colliding robots. In our Appendix A: Sweep of a convex shape along experiments, RLSS does not result in any deadlocks or colli- a line segment is convex sions, while eBVC suffers from deadlocks and RSFC results in collisions (while RLSS provides theoretical guarantees on Lemma 2 Let R(x) ={x}⊕ R be the Minkowski sum of d d collision avoidance when it succeeds, it does not provide {x}∈ R and R ⊆ R , and let R be a convex set. Let 0 0 theoretical guarantees on deadlock avoidance). When there p(t ) = a + t (b − a) where t ∈[0, 1] be the line segment d d are no obstacles in the environment, RSFC and eBVC out- from a ∈ R to b ∈ R . Then, the swept volume of R along perform RLSS in terms of average navigation duration by p(t ) is convex. 7–20%. Proof The swept volume ζ of R from a to b can be defined In future work, we would like to extend our planner to 1 1 as ζ =∪ R(p(t )) =∪ {p(t )}⊕ R . Choose two points asynchronously planning robots where each robot starts and 0 t =0 t =0 q , q ∈ ζ . ∃t ∈[0, 1]∃q ˜ ∈ R q = p(t ) + q˜ and ∃t ∈ 1 2 1 1 0 1 1 1 2 finishes planning at different unknown timestamps by utiliz- [0, 1]∃q ˜ ∈ R q = p(t ) + q˜ .Let q = (1 − t )q + t q 2 0 2 2 2 1 2 ing communication. Also, we would like to incorporate the be a point on the line segment connecting q and q where 1 2 noise in the sensing systems within our algorithm in order to t ∈[0, 1]. solve the cases with imperfect sensing in a principled way. Supplementary Information The online version contains supplemen- q = (1 − t )(p(t ) + q ˜ ) + t (p(t ) + q ˜ ) 1 1 2 2 tary material available at https://doi.org/10.1007/s10514-023-10104- = (1 − t )(a + t (b − a) + q ˜ ) 1 1 w. +t (a + t (b − a) + q ˜ ) 2 2 Acknowledgements This work was supported by National Science = (1 − t )q ˜ + t q ˜ 1 2 Foundation awards IIS-1724399, IIS-1724392, and CPS-1837779. B. Senba ¸ slar ¸ was supported by a University of Southern California +a + (t (1 − t ) + t t )(b − a) 1 2 Annenberg Fellowship. Wolfgang Hönig was partially funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Founda- (t (1 − t ) + t t ) ∈[0, 1] because it is a convex combination tion) - 448549715. 1 2 of t , t ∈[0, 1]. Therefore a + (t (1 − t ) + t t )(b − a) ∈ 1 2 1 2 Funding Open access funding provided by SCELC, Statewide Califor- p(t ).Also, (1 − t )q ˜ + t q ˜ ∈ R , because it is a convex 1 2 0 nia Electronic Library Consortium combination of q ˜ , q ˜ ∈ R and R is convex. Therefore, 1 2 0 0 q ∈ ζ since ∃t ∈[0, 1] q ∈{p(t )}⊕ R . Hence, ζ is Declarations convex. Conflict of interest Baskın Senba ¸ slar ¸ , the first author, has moved to Prof. Gaurav S. Sukhatme’s group in January 2022 after the initial submis- sion of this paper. Prof. Sukhatme is the editor-in-chief of Autonomous References Robots journal and the advisor of Baskın Senba ¸ slar ¸ at the time of writing. Prof. Sukhatme was not involved in ideation, writing and exper- Alonso-Mora, J., Breitenmoser, A., Rufli, M., Beardsley, P., Siegwart, iments of this paper in any way. This paper is submitted to Robot R. (2013). Optimal reciprocal collision avoidance for multiple non- Swarms in the Real World: from Design to Deployment special issue of holonomic robots. In Distributed autonomous robotic systems: The Autonomous Robots, which has Dr. Siddharth Mayya as the lead guest 10th international symposium (pp. 203–216). https://doi.org/10. editor. 1007/978-3-642-32723-0. Barer, M., Sharon, G., Stern, R., & Felner, A. (2014). Suboptimal Open Access This article is licensed under a Creative Commons variants of the conflict based search algorithm for the multi- Attribution 4.0 International License, which permits use, sharing, adap- agent pathfinding problem. Frontiers in Artificial Intelligence tation, distribution and reproduction in any medium or format, as and Applications, 263, 961–962. https://doi.org/10.3233/978-1- long as you give appropriate credit to the original author(s) and the 61499-419-0-961 source, provide a link to the Creative Commons licence, and indi- Batra, S., Huang, Z., Petrenko, A., Kumar, T., Molchanov, A., cate if changes were made. The images or other third party material Sukhatme, G. S. (2021). Decentralized control of quadrotor in this article are included in the article’s Creative Commons licence, swarms with end-to-end deep reinforcement learning. In 5th Con- unless indicated otherwise in a credit line to the material. If material ference on robot learning. CoRL 2021. is not included in the article’s Creative Commons licence and your Bhattacharya, S., Kumar, V., Likhachev, M. (2010). Search-based path intended use is not permitted by statutory regulation or exceeds the planning with homotopy class constraints. In Proceedings of the permitted use, you will need to obtain permission directly from the copy- twenty-fourth AAAI conference on artificial intelligence, pp. 1230– right holder. To view a copy of this licence, visit http://creativecomm ons.org/licenses/by/4.0/. Campion, G., Bastin, G., & Dandrea-Novel, B. (1996). Structural properties and classification of kinematic and dynamic models of wheeled mobile robots. IEEE Transactions on Robotics and Automation, 12(1), 47–62. https://doi.org/10.1109/70.481750 Chen, J., Liu, T., Shen, S. (2016). Online generation of collision-free tra- jectories for quadrotor flight in unknown cluttered environments. In 2016 IEEE international conference on robotics and automa- 123 Autonomous Robots tion (ICRA) (pp. 1476–1483). https://doi.org/10.1109/ICRA.2016. motion planning. IEEE Robotics and Automation Letters, pp. 1–1. 7487283. https://doi.org/10.1109/LRA.2020.2964159. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Ma, H., Harabor, D., Stuckey, P.J., Li, J., Koenig, S. (2019). Searching Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018 with consistent prioritization for multi-agent path finding. In Pro- Desai, A., & Michael, N. (2020). Online planning for quadrotor ceedings of the AAAI conference on artificial intelligence, 33(01), teams in 3-d workspaces via reachability analysis on invariant 7643–7650. https://doi.org/10.1609/AAAI.V33I01.33017643. geometric trees. In IEEE international conference on robotics Mellinger, D., & Kumar, V. (2011). Minimum snap trajectory generation and automation (ICRA) (pp. 8769–8775). https://doi.org/10.1109/ and control for quadrotors. In IEEE international conference on ICRA40945.2020.9197195. robotics and automation (ICRA) (pp. 2520–2525). https://doi.org/ Dresner, K., & Stone, P. (2008). A multiagent approach to autonomous 10.1109/ICRA.2011.5980409. intersection management. Journal of Artificial Intelligence Murray, R. M., Rathinam, M., Sluis, W. (1995). Differential flatness of Research, 31, 591–656. https://doi.org/10.1613/jair.2502. mechanical control systems: A catalog of prototype systems. In Farouki, R. T. (2012). The Bernstein polynomial basis: A centennial ASME international mechanical engineering congress and expo- retrospective. Computer Aided Geometric Design, 29(6), 379–419. sition. https://doi.org/10.1016/j.cagd.2012.03.001 Murray, R. M., & Sastry, S. S. (1993). Nonholonomic motion planning: Furda, A., & Vlacic, L. (2011). Enabling safe autonomous driving steering using sinusoids. IEEE Transactions on Automatic Control, in real-world city traffic using multiple criteria decision making. 38(5), 700–716. https://doi.org/10.1109/9.277235. IEEE Intelligent Transportation Systems Magazine, 3(1), 4–17. Oleynikova, H., Taylor, Z., Fehr, M., Siegwart, R., & Nieto, J. (2017). https://doi.org/10.1109/MITS.2011.940472 Voxblox: Incremental 3D euclidean signed distance fields for on- Gondzio, J., & Grothey, A. (2009). Exploiting structure in parallel board mav planning. In IEEE/RSJ international conference on implementation of interior point methods for optimization. Com- intelligent robots and systems (IROS). https://doi.org/10.1109/ putational Management Science, 6(2), 135–160. https://doi.org/ IROS.2017.8202315. 10.1007/s10287-008-0090-3 O’Meadhra, C., Tabib, W., & Michael, N. (2019). Variable resolu- Harabor, D., & Grastien, A. (2011). Online graph pruning for pathfind- tion occupancy mapping using gaussian mixture models. IEEE ing on grid maps. In Proceedings of the AAAI conference on Robotics and Automation Letters, 4(2), 2015–2022. https://doi. artificial intelligence, vol. 25, pp. 1114–1119. org/10.1109/LRA.2018.2889348 Hönig, W., Preiss, J. A., Kumar, T. K. S., Sukhatme, G. S., & Ayanian, N. Park, J., & Kim, H. J. (2021). Online trajectory planning for multi- (2018). Trajectory planning for quadrotor swarms. IEEE Transac- ple quadrotors in dynamic environments using relative safe flight tions on Robotics, 34(4), 856–869. https://doi.org/10.1109/TRO. corridor. IEEE Robotics and Automation Letters, 6(2), 659–666. 2018.2853613 https://doi.org/10.1109/LRA.2020.3047786 Homm, F., Kaempchen, N., Ota, J., Burschka, D. (2010). Efficient occu- Park, J., Kim, J., Jang, I., Kim, H. J. (2020). Efficient multi-agent pancy grid computation on the gpu with lidar and radar for road trajectory planning with feasibility guarantee using relative Bern- boundary detection. In IEEE intelligent vehicles symposium (IV) stein polynomial. In IEEE international conference on robotics (pp. 1006–1013). https://doi.org/10.1109/IVS.2010.5548091. and automation (ICRA) (pp. 434–440). https://doi.org/10.1109/ Hopcroft, J., Schwartz, J., & Sharir, M. (1984). On the complexity ICRA40945.2020.9197162. of motion planning for multiple independent objects; pspace- Peterson, R., Buyukkocak, A. T., Aksaray, D., & Yazicioglu, ˇ Y. (2021). hardness of the“warehouseman’s problem”. The International Distributed safe planning for satisfying minimal temporal relax- Journal of Robotics Research, 3(4), 76–88. https://doi.org/10. ations of twtl specifications. Robotics and Autonomous Systems, 1177/027836498400300405 142, 103801. https://doi.org/10.1016/j.robot.2021.103801. Hornung, A., Wurm, K. M., Bennewitz, M., Stachniss, C., & Bur- Piegl, L., & Tiller, W. (1995). The NURBS book. Springer. https://doi. gard, W. (2013). Octomap: An efficient probabilistic 3d mapping org/10.1007/978-3-642-97385-7. framework based on octrees. Autonomous Robots, 34(3), 189–206. Richter, C., Bry, A., Roy, N. (2013). Polynomial trajectory planning https://doi.org/10.1007/s10514-012-9321-0 for aggressive quadrotor flight in dense indoor environments. In Jiang, S., & Song, K. (2013). Differential flatness-based motion control International symposium of robotic research (ISRR) (vol. 114, pp. of a steerand-drive omnidirectional mobile robot. In IEEE inter- 649–666). https://doi.org/10.1007/978-3-319-28872-7_37. national conference on mechatronics and automation (ICMA) (pp. Riviere, B., Hönig, W., Yue, Y., Chung, S.-J. (2020). Glas: Global- 1167–1172). https://doi.org/10.1109/ICMA.2013.6618079. to-local safe autonomy synthesis for multi-robot motion planning Karaman, S., & Frazzoli, E. (2010). Incremental sampling-based algo- with end-to-end learning. IEEE Robotics and Automation Letters, rithms for optimal motion planning. Robotics Science and Systems pp. 1–1. https://doi.org/10.1109/LRA.2020.2994035. VI, 104(2). https://doi.org/10.15607/RSS.2010.VI.034. Sartoretti, G., Kerr, J., Shi, Y., Wagner, G., Kumar, T. K., Koenig, Lam, E., Le Bodic, P., Harabor, D.D., Stuckey, P.J. (2019). Branch-and- S., & Choset, H. (2019). Primal: Pathfinding via reinforcement cut-and-price for multi-agent pathfinding. In Proceedings of the and imitation multi-agent learning. IEEE Robotics and Automa- twenty-eighth international joint conference on artificial intelli- tion Letters, 4, 2378–2385. https://doi.org/10.1109/LRA.2019. gence, IJCAI-19 (pp. 1289–1296). https://doi.org/10.24963/ijcai. 2903261 2019/179. Senba ¸ slar ¸ , B., Hönig, W., Ayanian, N. (2019). Robust trajectory execu- Li, Q., Gama, F., Ribeiro, A., Prorok, A. (2020). Graph neural networks tion for multi-robot teams using distributed real-time replanning. for decentralized multi-robot path planning. IEEE/RSJ interna- In Distributed Autonomous Robotic Systems (DARS) (pp. 167– tional conference on intelligent robots and systems (IROS) (pp. 181). https://doi.org/10.1007/978-3-030-05816-6_12. 11785–11792). https://doi.org/10.1109/iros45743.2020.9341668. Senba ¸ slar ¸ , B., Hönig, W., Ayanian, N. (2021). RLSS: Real-time Liu, S., Watterson, M., Mohta, K., Sun, K., Bhattacharya, S., Taylor, C. multi-robot trajectory replanning using linear spatial separations. J., & Kumar, V. (2017). Planning dynamically feasible trajectories Retrieved from arXiv:2103.07588. for quadrotors using safe flight corridors in 3-D complex environ- Senba ¸ slar ¸ , B., & Sukhatme, G. (2022). Asynchronous real-time decen- ments. IEEE Robotics and Automation Letters, 2(3), 1688–1695. tralized multirobot trajectory planning. In IEEE/RSJ international https://doi.org/10.1109/LRA.2017.2663526 conference on intelligent robots and systems (IROS 2022). Luis, C., Vukosavljev, M., & Schoellig, A. (2020). Online trajectory generation with distributed model predictive control for multirobot 123 Autonomous Robots Sharon, G., Stern, R., Felner, A., & Sturtevant, N. R. (2015). Conflict- tems Lab (RESL). His research based search for optimal multi-agent pathfinding. Artificial Intel- interests include multi-robot navigation systems, multi-robot trajec- ligence, 219, 40–66. https://doi.org/10.1016/j.artint.2014.11.006 tory planning, and multi-robot aware control. Solovey, K., Salzman, O., & Halperin, D. (2013). Finding a needle in an exponential haystack: Discrete RRT for exploration of implicit Wolfgang Hönig is an indepen- roadmaps in multi-robot motion planning. In The International dent junior research group leader Journal of Robotics Research, 107. https://doi.org/10.1007/978- at TU Berlin, Germany heading 3-319-16595-0_34. the Intelligent Multi-Robot Coor- Tang, S., & Kumar, V. (2016). Safe and complete trajectory generation dination Lab. Previously, he was for robot teams with higher-order dynamics. In IEEE/RSJ inter- a postdoctoral scholar at the Cal- national conference on intelligent robots and systems (IROS) (pp. ifornia Institute of Technology, 1894–1901). https://doi.org/10.1109/IROS.2016.7759300. USA. He received the diploma in Tordesillas, J., & How, J. P. (2020). MINVO basis: Finding sim- Computer Science from TU Dres- plexes with minimum volume enclosing polynomial curves. arXiv den, Germany in 2012, and the preprint arXiv:2010.10726. M.S. and Ph.D. degrees from the Tordesillas, J., & How, J. P. (2021). MADER: Trajectory planner in University of Southern California multi-agent and dynamic environments. IEEE Transactions on (USC), USA in 2016 and 2019, Robotics. https://doi.org/10.1109/TRO.2021.3080235 respectively. His research focuses Usenko, V., Von Stumberg, L., Pangercic, A., Cremers, D. (2017). Real- on enabling large teams of physi- time trajectory replanning for mavs using uniform b-splines and a cal robots to collaboratively solve real-world tasks, using tools from 3d circular buffer. In IEEE/RSJ international conference on intel- informed search, optimization, and machine learning. Dr. Hönig has ligent robots and systems (IROS) (pp. 215–222). https://doi.org/ been the recipient of several awards, including Best Paper in Robotics 10.1109/IROS.2017.8202160. Track for a paper at ICAPS 2016 and the 2019 Best Dissertation Wang, L., Ames, A.D., Egerstedt, M. (2017). Safety barrier certifi- Award in Computer Science at USC. cates for collisions-free multirobot systems. IEEE Transactions on Robotics, 33(3), 661–674. https://doi.org/10.1109/TRO.2017. 2659727. Nora Ayanian received the B.S. Wang, X., Xi, L., Chen, Y., Lai, S., Lin, F., Chen, B.M. (2021). Decen- degree in Mechanical Engineer- tralized mpc-based trajectory generation for multiple quadrotors ing and Mechanics in 2005 from in cluttered environments. Guidance, Navigation and Control, Drexel University, Philadelphia, 01(02), 2150007. https://doi.org/10.1142/S2737480721500072. PA, and the M.S. and Ph.D. Wurman, P., D’Andrea, R., & Mountz, M. (2008). Coordinating hun- degrees in Mechanical Engineer- dreds of cooperative, autonomous vehicles in warehouses. AI ing and Applied Mechanics from Magazine, 29, 9–20. the University of Pennsylvania, in Yu, J., & LaValle, S. M. (2013). Structure and intractability of opti- 2008 and 2011, respectively. She mal multi-robot path planning on graphs. In Proceedings of the was a Postdoctoral Associate at twenty-seventh AAAI conference on artificial intelligence (pp. the Computer Science and Artifi- 1443–1449). https://doi.org/10.1609/aaai.v27i1.8541. cial Intelligence Laboratory at the Zhou, D., Wang, Z., Bandyopadhyay, S., & Schwager, M. (2017). Fast, Massachusetts Institute of Tech- on-line collision avoidance for dynamic vehicles using buffered nology from 2011–2013. In 2013, Voronoi cells. IEEE Robotics and Automation Letters, 2(2), 1047– she joined the Department of Com- 1054. https://doi.org/10.1109/LRA.2017.2656241 puter Science, University of Southern California, as an Assistant Pro- Zhou, Y., & Zeng, J. (2015). Massively parallel A* search on a GPU. In fessor, and in 2020, became Associate Professor. Since January 2022, Proceedings of the AAAI conference on artificial intelligence (vol. she is an Associate Professor with the Department of Computer Sci- 29). https://doi.org/10.1609/aaai.v29i1.9367. ence and the School of Engineering at Brown University. Her research interests include multi-robot planning, coordination, and control. Dr. Ayanian is a founder and advisory chair of the the IEEE Technical Publisher’s Note Springer Nature remains neutral with regard to juris- Committee on Multi-Robot Systems. She is the recipient of several dictional claims in published maps and institutional affiliations. awards, including a being named to the MIT TR35, Okawa Founda- tion Research Award, and the USC Hanna Reisler Mentorship award. Baskın Senba ¸ slar ¸ received his B.S. degree in Computer Engineering in 2017 from Middle East Tech- nical University, Ankara, Turkey. He received his M.S. degree in Computer Science in 2019 from University of Southern Califor- nia, Los Angeles, USA as a Ful- bright Grantee. In 2019, he started his PhD. studies in University of Southern California as a part of Automatic Coordination of Teams (ACT) Lab. In 2022, he moved to the Robotics and Embedded Sys-

Journal

Autonomous RobotsSpringer Journals

Published: Oct 1, 2023

Keywords: Trajectory planning for multiple mobile robots; Decentralized robot systems; Collision avoidance; Multi-robot systems

There are no references for this article.