
A portable three-dimensional LIDAR-based system for long-term and wide-area people behavior measurement

International Journal of Advanced Robotic Systems

Abstract

It is important to measure and analyze people behavior to design systems which interact with people. This article describes a portable people behavior measurement system using a three-dimensional Light Detection and Ranging (LIDAR) sensor. In this system, an observer carries the system, equipped with a 3-D LIDAR, and follows the persons to be measured while keeping them in the sensor view. The system estimates the sensor pose in a three-dimensional environmental map and tracks the target persons. It enables long-term and wide-area people behavior measurements which are hard for existing people tracking systems. As a field test, we recorded the behavior of professional caregivers attending elderly persons with dementia in a hospital. A preliminary analysis of the recorded behavior reveals how the caregivers choose their attending position while checking the surrounding people and environment. Based on the analysis results, empirical rules to design the behavior of attendant robots are proposed.

Keywords: 3-D LIDAR, people detection and tracking, behavior analysis

Date received: 3 August 2018; accepted: 22 February 2019
Topic: Service Robotics
Topic Editor: Antonio Fernandez-Caballero
Associate Editor: Tiziana D'Orazio

1 The Department of Computer Science and Information Engineering, Toyohashi University of Technology, Toyohashi, Japan
2 The Department of Information Engineering, the University of Padova, Padua, Italy

Corresponding author: Kenji Koide, The Department of Computer Science and Information Engineering, Toyohashi University of Technology, Toyohashi, Aichi 441-8580, Japan. Email: koide@aisl.cs.tut.ac.jp

Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/), which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Introduction

It is important to measure and analyze people behavior when designing systems which interact with people. To achieve systems with natural and rich interactions, we have to understand how people behave with respect to the surrounding people and environment.

In particular, for service robots, by analyzing the behavior of a person who is helping another, we could model that behavior and create a robot which behaves in a human-like way. This would allow robots to interact naturally with humans and make them more acceptable in daily service situations.

Several models which describe the social interaction between persons, such as social distance and the social force model, have been proposed, and a number of works have applied those models to service robots.3-5 However, since those models are based on a simple analysis of the distance between persons, they cannot describe the influence of the surrounding environment and of other persons. Such limitations may yield unnatural robot behavior in complex situations. To realize a robot with natural and acceptable behavior, it is necessary to measure person behavior in diverse situations and construct a sophisticated interaction behavior model.

There are several data sets which provide people behavior in indoor6 and outdoor7,8 environments. However, to the best of our knowledge, no data set provides people behavior involving the interaction between followed and following persons, even though such situations are very common in daily services. Most existing robots simply keep the distance to the target person constant, and this naive following strategy can make people feel uncomfortable. We believe that it is necessary to measure and analyze people's attendant behavior to design the behavior of attendant robots, and this triggered us to develop a system which enables long-term and wide-area people behavior measurement and to create a data set of real professional attendant behavior.

Figure 1 illustrates the proposed system for people behavior measurement. The system is based on a three-dimensional (3-D) LIDAR; a human observer carries the system and follows the persons to be observed while keeping them in the sensor view. The system simultaneously estimates the sensor pose in a 3-D environmental map and tracks the target persons. It can thus be applied to long-term and wide-area people behavior measurement tasks.

Figure 1. The proposed system to measure people behavior using a 3-D LIDAR. The observer carries the backpack with a 3-D LIDAR and follows the persons to be measured. 3-D: three-dimensional.

The contributions of this article are threefold. First, we propose a portable measurement system which enables long-term and wide-area people behavior measurements. We validated that the tracking accuracy of the proposed system is comparable to a static sensor-based people tracking system. Second, we provide a preliminary analysis of a field test of the proposed system in a hospital, where we recorded the behavior of professional caregivers attending elderly persons with dementia. The results show that the proposed system can be applied to the measurement of real people behavior. In addition, based on the analysis results, we propose empirical rules to design the behavior of attendant robots. Third, we provide the software of the system and the recorded people behavior as open source and a public data set (http://github.com/koide3/hdl_graph_slam and http://www.aisl.cs.tut.ac.jp/database_fukushimura.html). They should be useful for measuring and analyzing people behavior in situations which are hard for existing people tracking systems.

The rest of the article is organized as follows. The following section explains related work. The third section gives an overview of the proposed system. The fourth and fifth sections describe the offline Simultaneous Localization and Mapping (SLAM) method using a 3-D LIDAR and the online people behavior measurement method, which includes sensor localization and people tracking, respectively. The sixth section explains a field test in a hospital and provides a preliminary analysis of its results. The last section concludes the article and discusses future work.

Related work

Systems to measure people behavior can be categorized into two groups: (1) systems using static sensors fixed in the environment and (2) systems using wearable sensors attached to the target persons.

People tracking using static sensors, such as cameras and laser range finders, has been widely studied. In particular, people tracking using cameras for surveillance is a major research topic in the computer vision community, and many works have proposed people detection and tracking methods using RGB cameras. Recent inexpensive consumer RGB-D cameras allow us to reliably detect and track people,11 and a camera network system for people tracking using RGB-D cameras has been proposed.12 Although such works provide reliable people tracking, the capability of recovering the track of a person who once left the camera view is necessary. This problem (i.e. person reidentification) has been one of the main research topics of vision-based people tracking systems. Many reidentification methods based on people appearance13-16 and soft biometric features17,18 have been proposed; they enable reliable people reidentification over time and across cameras.

Laser range finders have also been used for people tracking systems.19,20 Such systems can localize people very accurately, and the measurement area of each sensor is larger than that of a camera. While the reliability and detection accuracy of those static sensor-based systems are very good, they can measure people behavior only in the area covered by the sensor views. To cover a large environment, they require the placement of many static sensors, thereby increasing the time and cost of installing and calibrating all the sensors.

Another way to measure the behavior of specific persons for a long time over a wide area is to attach a wearable sensor to each target person and measure their behavior with that sensor. Several kinds of sensors, such as inertial navigation systems (INS) and the global positioning system (GPS), have been used for this purpose. Recent small wearable GPS sensors allow us to track a person in outdoor environments, and they have been applied to several applications of people behavior measurement and analysis.21,22 As an application, GPS-based wearable devices for helping elderly or visually impaired people have been proposed.23,24 The combination of GPS and INS improves tracking accuracy under low-level GPS radio power. However, GPS signals are not available indoors and in places close to buildings.

Recently, Wi-Fi signal-based localization has been widely studied.26-28 Some methods are based on triangulation of the Wi-Fi signal strength and show decimeter or centimeter accuracy in ideal situations.26,27 However, they require multiple antennas to be placed in the environment to accurately estimate the device position, and thus they are hard to apply to a large environment.
Other methods are based on Wi-Fi fingerprint matching. While they do not rely on external antennas and can be applied to large environments where a Wi-Fi signal is available, their estimation accuracy is very limited.

Behavior measurement systems for indoor environments based on pedestrian dead reckoning have also been proposed.29,30 Those methods estimate the target person's position by integrating the acceleration and angular velocity obtained by an INS attached to the person. In order to prevent estimation drift, Li et al. combined pedestrian dead reckoning with map-based localization. Those methods can keep track of a person's position as long as the person holds the sensor, and since they utilize smartphones, which have become very common and inexpensive in recent years, they are cost-effective and easy to adopt. However, since an INS is an internal sensor which cannot sense the surrounding environment, it is hard to accurately measure the person's position with respect to the environment and to other persons. Thus, those methods cannot be applied to measuring the interaction between persons or behavior affected by the environment.

System overview

Figure 2 shows an overview of the proposed system. The observer carries a backpack equipped with a 3-D LIDAR (Velodyne HDL-32e) and a PC and follows the persons to be measured. The 3-D LIDAR provides 360° range data at 10 Hz, and from the range data the system estimates its pose while tracking the target persons. The process of the proposed system consists of two phases: (1) offline environmental mapping and (2) online sensor localization and people detection/tracking.

Figure 2. System overview.

In the offline mapping phase, we create a 3-D environmental map which covers the entire measurement area. For the mapping, we employ a graph optimization-based SLAM approach (i.e. Graph SLAM). In order to compensate the accumulated rotational error of the scan matching, we introduce ground plane and GPS position constraints for indoor and outdoor environments, respectively.

In the behavior measurement phase, the system estimates its pose on the map created offline by combining a scan matching algorithm with angular velocity-based pose prediction using an unscented Kalman filter (UKF). Simultaneously, the system detects and tracks the target persons.

Offline environmental mapping

Graph SLAM

Graph SLAM is one of the most successful approaches to the SLAM problem. In this approach, the SLAM problem is solved by constructing and optimizing a graph whose nodes represent the parameters to be optimized, such as sensor poses and landmark positions, and whose edges represent constraints between them, such as relative poses between sensor poses and landmarks.
The graph is optimized so that the errors between the parameters and the constraints are minimized.31,33 In the following, let x_k be the k-th node, and let z_k and Ω_k be the mean and the information matrix of the constraint relating to x_k. The objective function is defined as

    F(x) = Σ_k e(x_k, z_k)^T Ω_k e(x_k, z_k),    (1)

where e(x_k, z_k) is an error function between the parameters x_k and the constraint z_k. Typically, equation (1) is linearized and minimized by using the Gauss-Newton or Levenberg-Marquardt algorithm.

However, if the parameters span non-Euclidean spaces (like pose parameters), those algorithms may lead to suboptimal or invalid solutions. One way to deal with this problem is to perform the error minimization on a manifold which is a minimal representation of the parameters and acts locally as a Euclidean space. In order to enable this, an operator ⊞ is introduced, which applies a local variation Δx on the manifold.
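To make the objective in equation (1) concrete, the following is a minimal one-dimensional pose-graph sketch: odometry and loop-closure edges become quadratic error terms, and a Gauss-Newton step solves the resulting normal equations. The edge values and weights here are made up for illustration; the actual system optimizes 6-DoF poses with g2o.

```python
import numpy as np

# Toy 1-D pose graph: 4 poses, odometry edges (i -> i+1) and one loop
# closure (0 -> 3). Each edge stores a measured relative displacement z
# and an information value omega. Error function: e = (x_j - x_i) - z.
edges = [
    (0, 1, 1.0, 1.0),   # odometry: roughly 1 m steps
    (1, 2, 1.1, 1.0),
    (2, 3, 0.9, 1.0),
    (0, 3, 2.8, 10.0),  # loop closure, weighted higher
]

def optimize(x, edges, iters=10):
    for _ in range(iters):
        H = np.zeros((len(x), len(x)))
        b = np.zeros(len(x))
        for i, j, z, omega in edges:
            e = (x[j] - x[i]) - z                # residual of this edge
            J = np.zeros(len(x)); J[i] = -1.0; J[j] = 1.0
            H += omega * np.outer(J, J)          # Gauss-Newton normal equations
            b += omega * J * e
        H[0, 0] += 1e6                           # gauge: pin the first pose
        x = x - np.linalg.solve(H, b)
    return x

x = optimize(np.array([0.0, 1.0, 2.1, 3.0]), edges)
print(np.round(x, 3))  # approximately [0. 0.935 1.971 2.806]
```

Because the toy problem is linear, a single Gauss-Newton step already reaches the minimum; the loop-closure edge pulls the drifted odometry estimate back toward a consistent trajectory.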
Typically, in the 3-D SLAM problem, a node x_k holds the parameters of the sensor pose at time k (a translation vector t_k and a quaternion q_k). A manifold of the quaternion q_k = [q_w, q_x, q_y, q_z]^T can be represented as [q_x, q_y, q_z]^T, and the operator ⊞ is described as

    q_k ⊞ Δq_k = [√(1 − (q′_x² + q′_y² + q′_z²)), q′_x, q′_y, q′_z]^T,    (2)

where q′ = q_k · Δq_k.

In the proposed system, we first estimate the sensor trajectory by iteratively applying normal distributions transform (NDT) scan matching between consecutive frames. For 3-D LIDARs, NDT shows better performance than other scan matching algorithms, such as iterative closest points, in terms of both reliability and processing speed. Let p_t be the sensor pose at time t, consisting of a translation vector t_t and a quaternion q_t, and let r_{t,t+1} be the relative sensor pose between t and t + 1 estimated by the scan matching. We add them to the pose graph as nodes [p_0, ..., p_N] and edges [r_{0,1}, ..., r_{N−1,N}]. Then, we find loops in the trajectory and add them to the graph as edges (i.e. loop closures) to correct the accumulated error of the scan matching, using Algorithm 1.

Algorithm 1. Loop detection.

The loop detection algorithm is similar to the work of Nelson. First, we detect loop candidates based on the translational distance and the length of the trajectory between nodes (lines 2-11). Then, to validate the loop candidates, a scan matching algorithm (in our case, NDT) is applied between the nodes of each candidate. If the fitness score is lower than a threshold (e.g. 0.2), we add the loop to the graph as an edge between the nodes (lines 12-17). Every time a loop is found, the pose graph is updated such that equation (1) is minimized. We utilize g2o,33 a general framework for hypergraph optimization, for the pose graph optimization.

Figure 3. Comparison of the sensor trajectories estimated by the existing methods and the proposed method. (a) BLAM. (b) LeGO-LOAM. (c) Ours without plane constraints. (d) Ours with plane constraints.

As a generated map gets larger, it tends to be bent due to the accumulated rotational error of the scan matching (see Figure 3). In order to compensate this error, we introduce ground plane and GPS position constraints for indoor and outdoor environments, respectively.
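The candidate search of Algorithm 1 can be sketched as follows: a pair of nodes is a loop candidate when the accumulated travel between them is long but their positions are spatially close again. The thresholds and the toy trajectory are made up for illustration, and the NDT-based validation step (fitness score below 0.2) is omitted.

```python
import numpy as np

def find_loop_candidates(poses, min_travel=30.0, max_dist=5.0):
    # poses: list of (x, y, z) node positions along the trajectory.
    # Accumulated travel distance up to each node:
    travel = np.concatenate([[0.0], np.cumsum(
        [np.linalg.norm(np.subtract(poses[k + 1], poses[k]))
         for k in range(len(poses) - 1)])])
    candidates = []
    for i in range(len(poses)):
        for j in range(i + 1, len(poses)):
            if travel[j] - travel[i] < min_travel:
                continue  # too close along the trajectory, not a real loop
            if np.linalg.norm(np.subtract(poses[j], poses[i])) < max_dist:
                candidates.append((i, j))
    return candidates

# A square path revisiting its start: node 0 and the last node qualify.
path = [(0, 0, 0), (10, 0, 0), (20, 0, 0), (20, 10, 0),
        (10, 10, 0), (0, 10, 0), (0, 1, 0)]
print(find_loop_candidates(path))  # [(0, 6)]
```

In the full algorithm each surviving candidate would then be checked by NDT scan matching before a loop-closure edge is added to the graph.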
Figure 4 shows an illustration of the graph structure of the proposed system.

Figure 4. The proposed pose graph structure.

Ground plane constraint

To reliably generate the map of a large indoor environment, we assume that the environment has a single flat floor and introduce a ground plane constraint which optimizes the pose graph such that the ground plane detected in each observation becomes the same plane. This assumption is valid in many indoor public environments, such as schools and hospitals.

We assume that the approximate height of the sensor is known (e.g. 2 m) and extract the points within a certain height range which should contain the ground plane points (e.g. (−1.0, +1.0) m from the expected ground level). Then, we apply RANSAC to the extracted point cloud to detect the ground plane. If the normal of the detected plane is almost vertical (the angle between the normal and the unit vertical vector is lower than 10°), we consider that the ground plane is correctly detected and add a ground plane constraint edge to the graph. Figure 5 shows an example of the detected ground planes. We detect the ground plane every 10 s and connect the corresponding sensor pose node p_t with the fixed ground plane node, whose coefficients are π_0 = [n_x, n_y, n_z, d]^T = [0, 0, 1, 0]^T.

To calculate the error between a sensor pose p_t and the ground plane π_0, we first transform the ground plane into the local coordinate frame of the sensor pose:

    [n′_x, n′_y, n′_z]^T = R_t^T [n_x, n_y, n_z]^T,    (3)
    d′ = d − t_t^T [n′_x, n′_y, n′_z]^T,    (4)

where π′ = [n′_x, n′_y, n′_z, d′]^T is the ground plane in the local coordinate frame and [R_t | t_t] is the sensor pose at time t. Following Ma et al.'s work, we employ the minimum parameterization τ(π) = (φ, θ, d), where φ, θ, and d are the azimuth angle, the elevation angle, and the length of the intercept, respectively:

    τ(π) = (arctan(n_y / n_x), arctan(n_z / |n|), d).    (5)

The error between a pose node and the ground plane node is then defined as

    e_{t,0} = τ(π′) − τ(π_t),    (6)

where π_t is the ground plane detected at time t.

Figure 5. Ground plane detection. Points within a certain height range are extracted by height thresholding (green points), and then RANSAC is applied to them to detect the ground plane (red points). The horizontality of the ground plane is validated by checking the plane normal.
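The detection procedure illustrated in Figure 5 (height gating, a RANSAC plane fit, and the verticality check) can be sketched as follows. The synthetic point cloud, iteration count, and tolerances are made up for illustration; the real system runs on actual LIDAR scans.

```python
import numpy as np

rng = np.random.default_rng(0)

def detect_ground_plane(points, sensor_height=2.0, band=1.0,
                        iters=100, tol=0.05, max_tilt_deg=10.0):
    # 1) Height gating: keep points near the expected ground level.
    ground_z = -sensor_height
    cand = points[np.abs(points[:, 2] - ground_z) < band]
    # 2) RANSAC: sample 3 points, fit a plane n.x + d = 0, count inliers.
    best, best_inliers = None, 0
    for _ in range(iters):
        p0, p1, p2 = cand[rng.choice(len(cand), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-6:
            continue  # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        d = -n.dot(p0)
        inliers = np.sum(np.abs(cand @ n + d) < tol)
        if inliers > best_inliers:
            best, best_inliers = (n, d), inliers
    n, d = best
    # 3) Verticality check: reject planes whose normal deviates from
    #    the vertical axis by more than max_tilt_deg.
    if abs(n[2]) < np.cos(np.radians(max_tilt_deg)):
        return None
    return n, d

# Toy scene: a floor at z = -2 plus a vertical wall; sensor at the origin.
floor = np.column_stack([rng.uniform(-5, 5, 500), rng.uniform(-5, 5, 500),
                         np.full(500, -2.0)])
wall = np.column_stack([np.full(300, 4.0), rng.uniform(-5, 5, 300),
                        rng.uniform(-2, 1, 300)])
res = detect_ground_plane(np.vstack([floor, wall]))
print(res)  # a near-vertical normal and an offset of about 2 m
```

The wall points that survive the height gate cannot outvote the floor plane in the inlier count, and a plane accidentally fit through the wall would be rejected by the normal check.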
GPS constraint

In outdoor environments where the ground is not flat, we use a GPS-based position constraint instead of the ground plane constraint. For ease of optimization, we first transform each GPS datum into the universal transverse mercator (UTM) coordinate system, in which it has easting, northing, and altitude values in a Cartesian frame. Then, each GPS datum is associated with the pose node which has the closest time stamp, as a unary edge carrying the prior position information. The error between the translation vector t_t of a pose node p_t and a GPS position T_t is simply given by

    e_t = t_t − T_t.    (7)

SLAM framework evaluation

In order to validate the proposed SLAM system, we recorded a 3-D point cloud sequence in an indoor environment. Figure 6 shows the experimental environment and the trajectory of the sequence. The duration of the sequence is about 45 min (2700 s), and the length of the trajectory is about 2400 m (estimated by the proposed method).

For comparison, we generated 3-D environmental maps using the proposed method with and without plane constraints. We also applied existing publicly available SLAM frameworks, BLAM37 and LeGO-LOAM,40 to this data set. Figure 3 shows the trajectories estimated by the different SLAM algorithms.
BLAM and LeGO-LOAM were aborted in the middle of the sequence when they failed to estimate the trajectory and did not recover. BLAM failed to find the loops due to the accumulated rotation error of the scan matching and generated a warped and inaccurate trajectory. Since LeGO-LOAM maintains the local consistency of the ground plane between consecutive frames, its estimated trajectory is flatter than the one estimated by BLAM. However, it still suffers from the accumulated rotational error due to the lack of a global ground constraint. Eventually, it failed to estimate the trajectory when the observer made a u-turn at the end of a narrow corridor.

Both with and without the plane constraint, the proposed method could construct the pose graph properly thanks to the reliability of NDT, and it generated consistent maps. However, without the plane constraint, the resultant map is warped due to the accumulated rotational error, which is hard to correct by loops on a plane. With the ground plane constraint, the accumulated rotational error is corrected, and the resultant map is completely flat. Figure 7 shows the generated environmental map; the color indicates the height of each point. The floor has a consistent height thanks to the plane constraint. The result shows that the proposed plane constraint is effective to compensate the accumulated rotational error in a large indoor environment.

Figure 6. The experimental environment. The duration of the sequence is about 45 min, and the length of the trajectory is about 2400 m.

Figure 7. The created environmental map. The color indicates the height of each point. The height of the floor is consistent thanks to the plane constraint.

Table 1 shows the processing time of the proposed method and BLAM. The processing time of LeGO-LOAM is not available here, since it provides only real-time processing. While BLAM took about 15,327 s to generate the map, the proposed method took about 5392 s thanks to the computational efficiency of NDT.

Table 1. Processing time of BLAM and our SLAM system.

Method                    Time (s)
Ours   Scan matching      1542
       Floor detection     231
       Loop closing       3619
       Total              5392
BLAM   Total              15,327

We also validated the proposed method in an outdoor environment. Figure 8(a) shows the environment and the trajectory of the sequence. The duration of the sequence is about 42 min (2500 s). Figure 8(b) shows the map generated by the proposed method with the GPS constraint. Although there were large undulations, the system correctly found loops and constructed a proper pose graph thanks to the GPS constraint. Note that, without the GPS constraint, the system could not find the loops due to the scan matching error and failed to create the environmental map.

Online people behavior measurement

In order to measure people behavior, the system simultaneously estimates the sensor pose on the 3-D environmental map and tracks the people around the observer. Figure 9 shows an overview of the online sensor localization and people tracking system. By integrating the angular velocity and range data provided by the LIDAR, the system estimates the sensor pose. Then, it detects and tracks people to obtain their positions with respect to the environmental map. Note that the initial pose of the sensor is given by hand to avoid the global localization problem.
Figure 8. The SLAM system validation in an outdoor environment. (a) The outdoor environment. The duration of the sequence is about 42 min, and the length of the trajectory is about 3000 m. (b) The 3-D map of the outdoor environment generated by the proposed method with GPS constraints. The color indicates the height of each point. 3-D: three-dimensional; GPS: global positioning system.

Sensor localization

We can estimate the sensor ego motion by iteratively applying a scan matching algorithm, as in the SLAM part. However, in contrast to the SLAM scenario, the observer has to follow the target persons during the measurement and sometimes has to move quickly to keep them in the sensor view. In such cases, the sensor motion between frames gets very large, and the scan matching may wrongly estimate the ego motion due to the large displacement. To deal with this problem, we integrate the NDT scan matching with the angular velocity data provided by the 3-D LIDAR using a UKF.

We define the sensor state to be estimated as

    x_t = [p_t, q_t, v_t, b_t]^T,    (8)

where p_t is the position, q_t is the rotation quaternion, v_t is the velocity, and b_t is the bias of the angular velocity of the sensor at time t. Assuming a constant translational velocity for the sensor motion model and a constant bias for the angular velocity sensor, the system equation for predicting the state is defined as

    x_t = [p_{t−1} + Δt · v_{t−1}, q_{t−1} · Δq_t, v_{t−1}, b_{t−1}]^T,    (9)

where Δt is the duration between t − 1 and t, and Δq_t is the rotation during Δt caused by the bias-compensated angular velocity a_t = ā_t − b_{t−1}:

    Δq_t = [1, (Δt/2) a_t^x, (Δt/2) a_t^y, (Δt/2) a_t^z].    (10)

With equation (9), the system predicts the sensor pose using the UKF and then applies NDT to match the observed point cloud with the global map, taking the predicted p_t and q_t as the initial guess of the sensor pose. Then, the system corrects the sensor state with the sensor pose estimated by the scan matching, z_t = [p′_t, q′_t]^T. The observation equation is defined as

    z_t = [p_t, q_t]^T.    (11)

We normalize the quaternion in the state vector after each of the prediction and correction steps to prevent its norm from changing due to the unscented transform and accumulated calculation error. It is worth mentioning that we also implemented a pose prediction which takes acceleration into account; however, the estimation result got worse due to the strong noise on the acceleration observations.
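A minimal sketch of the prediction step in equations (9) and (10): a constant-velocity position update, small-angle quaternion integration of the bias-compensated angular velocity, and the re-normalization mentioned above. In the actual system this model is embedded in the UKF, with NDT providing the correction; the velocity and gyro values below are made up.

```python
import numpy as np

def quat_mult(a, b):
    # Hamilton product, [w, x, y, z] convention.
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw*bw - ax*bx - ay*by - az*bz,
        aw*bx + ax*bw + ay*bz - az*by,
        aw*by - ax*bz + ay*bw + az*bx,
        aw*bz + ax*by - ay*bx + az*bw,
    ])

def predict(p, q, v, bias, omega, dt):
    # Constant-velocity position update and small-angle quaternion
    # integration of the bias-compensated angular velocity.
    a = omega - bias
    dq = np.array([1.0, 0.5*dt*a[0], 0.5*dt*a[1], 0.5*dt*a[2]])
    q_new = quat_mult(q, dq)
    q_new /= np.linalg.norm(q_new)   # re-normalize, as in the paper
    return p + dt*v, q_new

p, q = np.zeros(3), np.array([1.0, 0.0, 0.0, 0.0])
v = np.array([1.5, 0.0, 0.0])        # walking-speed translation, made up
omega = np.array([0.0, 0.0, 0.3])    # rad/s yaw rate from the gyro, made up
p, q = predict(p, q, v, np.zeros(3), omega, dt=0.1)
print(p, q)
```

The predicted pose would then seed the NDT matching, whose result feeds the UKF correction step.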
Figure 9. The online sensor pose estimation and people detection and tracking system.

People detection and tracking

We first remove the background points from an observed point cloud to extract the foreground points. To this end, we create an occupancy grid map with a certain voxel size (e.g. 0.5 m) from the environmental map. The input point cloud is transformed into the map coordinate frame according to the sensor pose estimated by the UKF, and each point falling into a voxel which contains environmental map points is removed as background. Euclidean clustering is then applied to the foreground points to detect human candidate clusters. However, in case persons are close together, their clusters may be wrongly merged and detected as a single cluster. To deal with this problem, we employ Haselich's split-merge clustering algorithm.41 The algorithm first divides a cluster into subclusters by using dp-means until each subcluster is smaller than a threshold (e.g. 0.45 m), so that no subcluster contains points of different persons. Then, if there is no gap between those subclusters, they are considered to belong to a single person and are remerged into one cluster.

Figure 10. Haselich's clustering algorithm. The green bounding box indicates the Euclidean clustering result. Two persons are wrongly detected as a single cluster. The cluster is divided into small subclusters (red bounding boxes) and then remerged if there is no gap between those subclusters. The blue bounding boxes are the final detection result. (a) Top view. (b) Bird's eye view.
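The occupancy-grid background subtraction described above can be sketched as follows: scan points falling into voxels occupied by the environmental map are discarded, and whatever survives is candidate foreground. The toy scene is made up, and the real system additionally clusters the surviving points.

```python
import numpy as np

VOXEL = 0.5  # grid resolution in meters (the paper suggests about 0.5 m)

def voxel_keys(points, voxel=VOXEL):
    # Integer voxel index for every point.
    return set(map(tuple, np.floor(points / voxel).astype(int)))

def remove_background(scan_in_map, env_map_points, voxel=VOXEL):
    # Any scan point that falls into a voxel occupied by the
    # environmental map is treated as background and dropped.
    occupied = voxel_keys(env_map_points, voxel)
    keep = [p for p in scan_in_map
            if tuple(np.floor(p / voxel).astype(int)) not in occupied]
    return np.array(keep)

# Toy data: a "wall" along x = 0.1 in the map, plus one foreground point.
wall = np.array([[0.1, y, 1.0] for y in np.arange(0.0, 5.0, 0.25)])
scan = np.vstack([wall[:3], [[3.2, 1.0, 0.9]]])  # 3 wall hits + 1 person
fg = remove_background(scan, wall)
print(fg)  # only the point near (3.2, 1.0, 0.9) survives
```

Note that this assumes the scan has already been transformed into the map frame with the pose estimated by the UKF, as the text describes.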
(b) Bird’s eye the tracked persons are visible from the sensor and are view. correctly detected. points of different persons. Then, if there is no gap between Sensor localization evaluation those subclusters, the clusters are considered to belong to a single person and remerged into one cluster. Figure 10 To show how the pose prediction improves the sensor loca- shows an example of the detection results. The person lization, we conducted a sensor localization experiment. clusters are correctly separated even when they are very Figure 11 shows the experimental environment. An obser- close together thanks to the split and the remerge process. ver carries the system and moves along the corridor, and the The detected clusters may contain nonhuman clusters system estimates its pose from the range and angular velo- (i.e. false positives). To eliminate nonhuman clusters city data. We conducted the experiment twice. In the first among detected clusters, we judge whether a cluster is a trial, the observer walked (about 1.5 m/s) to avoid the sen- human or not by using a human classifier trained with slice sor being moved quickly. In the second trial, the observer 43 44 features by Kidono et al. and Schapire and Singer. ran (about 3.0 m/s) and the sensor got shaken very strongly. Assuming that persons walk on the ground plane, we track Figure 12 shows the results of the first trial. Figure 12(a) persons on the XY plane without the height. We employ the shows the estimated trajectories with and without the pose combination of Kalman filter with the constant velocity prediction. Since the observer moved slowly during the model and global nearest neighbor data association to first sequence, both the results show the same correct tra- track persons. The tracking scheme works well as long as jectory. To assess the effect of the sensor pose prediction, Koide et al. 9 Figure 13. The results of the second trial of the sensor locali- Figure 12. 
Figure 12. The results of the first trial of the sensor localization experiment. The observer walked during the trial (about 1.5 m/s). Both the trajectories with and without the angular velocity-based pose prediction are correctly estimated. With the prediction, the initial guess for NDT gets significantly closer to the correct pose. (a) Estimated trajectories. (b) Difference between the predicted and the corrected positions. (c) Difference between the predicted and the corrected rotations. (d) Processing time. NDT: normal distributions transform.

Figure 13. The results of the second trial of the sensor localization experiment. The observer ran during the trial (about 3.0 m/s). Without the pose prediction, the system could not correctly estimate the pose due to the very quick motion. (a) Estimated trajectories. (b) Difference between the predicted and the corrected positions. (c) Difference between the predicted and the corrected rotations. (d) Processing time.

Table 2. The summary of the sensor localization experiment.

            With prediction                    Without prediction
Seq.        Error (m)  Error (°)  Time (ms)    Error (m)  Error (°)  Time (ms)
1st (walk)  0.0588     1.0913     38.88        0.1367     2.1625     40.06
2nd (run)   0.1851     4.2845     45.14        0.3330     6.6798     56.11

Figure 12(b) and (c) shows the difference between the predicted sensor pose (the initial guess) and the one estimated by NDT. In the case without the pose prediction, the previous matching result is used as the initial guess. With the prediction, the translational and rotational pose prediction errors significantly decrease, thanks to the constant velocity model and the consideration of angular velocity, respectively.
angular velocity-based pose prediction makes the pose esti- The results of the second trial are shown in Figure 13. mation robust to quick motions and fast to converge. The system failed to estimate the sensor pose without the pose prediction (see Figure 13(a)) since the observer moved very quickly, and the sensor displacement between frames People detection evaluation got larger. The NDT matching took a longer time (about 56 ms per frame) without the pose prediction since the large To analyze the effect of the split-merge clustering and the displacement between frames makes NDT need more human classifier, we recorded a 3-D range data sequence, 10 International Journal of Advanced Robotic Systems Table 3. The people detection evaluation result. Split-merge 41 43 clustering Human classifier Precision Recall F-measure Without Without 1.000 0.834 0.909 Without With 1.000 0.809 0.894 With Without 0.902 0.995 0.946 With With 0.961 0.961 0.961 in which two persons are close together and walking side by side. It is a hard situation for the usual Euclidean clus- tering since the persons’ clusters may be merged into a single cluster. The number of frames is 102, and we applied the human detection method with and without the split- Figure 14. The experimental environment and the configuration merge clustering and the human classifier to this sequence. of RGB-D cameras for OpenPTrack. Nine Kinect v2s are placed in Table 3 shows the evaluation result. Without both the the corridor. While OpenPTrack can measure only the limited area covered by cameras (about 2  20 m area), the proposed techniques, the recall value is low (0.834), since clusters of system can cover the whole of the floor. the persons are sometimes detected as a single cluster due to the Euclidean clustering. With the split-merge cluster- ing, the wrongly merged clusters are split into subclusters, Table 4. The difference of the observer and the subject positions and the recall value gets higher (0.995). 
With both the split-merge clustering and the human classifier, over-split subclusters are eliminated by the classifier, and the highest F-measure value is achieved (0.961). This result shows that, in situations where persons are close together, the split-merge clustering effectively increases the recall of human detection, and by combining it with the human classifier, we can obtain reliable human detection results.

Comparison with a static sensor-based people tracking system

In order to reveal the pros and cons of the proposed system, we compared it with a publicly available static sensor-based people tracking framework, OpenPTrack. The framework is designed for people tracking using static RGB-D cameras, and it is scalable to a large camera network. Moreover, it uses cost-effective hardware and is easy to set up. It has been operated by people including nonexperts in computer vision, such as artists and psychologists.

Figure 14 shows the experimental environment and the configuration of the RGB-D camera network. The map is created by the proposed SLAM method. We placed nine Kinect v2s so that they cover an area of about 2 × 20 m. We calibrated the camera network according to the procedure provided by OpenPTrack and then estimated the transformation between the environmental map and the camera network by performing ICP registration between the point clouds of the Kinects and the environmental map.

While a subject walked in the corridor, an observer carrying the proposed system followed him. The trajectories of both persons were measured by the proposed system and OpenPTrack. Table 4 shows the summary of the differences between the people positions measured by the proposed system and OpenPTrack. The differences sometimes became larger (about 0.2–0.3 m) due to detection errors of OpenPTrack at the border of the camera view. However, the difference is lower than 0.1 m on average, and the result shows that the measurement accuracies of the proposed system and the static sensor-based people tracking system are comparable.

Table 4. The difference of the observer and the subject positions measured by the proposed system and OpenPTrack.

           Difference (m)
           Min     Max     Mean    Standard deviation
Observer   0.0008  0.2126  0.0768  0.0448
Subject    0.0035  0.2837  0.0990  0.0445

In summary, the tracking accuracy of the proposed portable system is comparable to that of the static sensor-based system, and the measurement area of the proposed system can be extended easily. For instance, the system can measure people behavior over the whole area of the map shown in Figure 7 (200 × 50 m). We would need hundreds of cameras to cover the whole area of the map if we used a static sensor-based system in that environment. On the other hand, static sensor-based systems can measure the behavior of all people in the covered area simultaneously, while the proposed system covers only the surrounding area. Thus, the proposed system is suitable for measuring the behavior of specific people over a large area, while static sensor-based systems are suitable for behavior measurement of all the people in a relatively small environment.

Field test in a hospital

Measuring behavior of caregivers attending elderly persons

To show that the proposed system can be applied to real people behavior measurements, we conducted a field test in Sawarabikai Fukushimura hospital. The hospital is specialized for elderly care, and hundreds of elderly patients are hospitalized and receiving care and rehabilitation there. Under permission granted by the hospital, we recorded professional caregivers' behavior while they attend elderly persons with dementia. Figure 15 shows a snapshot of the field test. The caregiver attends the elderly person to prevent accidents (such as stumbling, colliding, and falling) and sometimes guides him/her to their room.

Figure 15. A snapshot of the field test. The behavior of the caregiver attending an elderly person is recorded by using the proposed system. (a) Image. (b) Range data.

The number of sequences is 33, and the total duration is about 52 min. We also recorded an attendant behavior sequence in the outdoor environment shown in Figure 8. The duration of the outdoor sequence is about 22 min. Note that, for privacy reasons, we captured images only during the sequence shown in Figure 15, with special permission from the hospital, the subject, and his family. In the other sequences, we recorded only range data. It is a merit of the proposed system that it can measure people behavior without privacy problems.

Figure 16 shows the indoor environmental maps created through the field test. The elderly persons take a rest at the dining hall on the first floor and then return to their hospital room on the second floor with a caregiver using the elevator. After they ride the elevator, we switch the map from that of the first floor to that of the second floor.

Figure 16. The environments of the field test. (a) Hallway (1F). (b) Ward (2F).

During the measurement, there were other patients and objects, such as wheelchairs and medicine racks, and the observer sometimes had to move quickly to keep the subjects in the sensor view. However, the proposed system could correctly localize itself through all the sequences thanks to the wide measurement area of the 3-D LIDAR and the integration of the scan matching and the angular velocity-based pose prediction.

Regarding people tracking, the system failed to keep track of the subjects when a patient came between the observer and the subjects to be observed, and new IDs were assigned to the subjects after they reappeared. In such cases, the system notifies that it lost the track of the subjects, and we reassigned the correct IDs to them by hand. Since we saw such cases only a few times, the system could keep track of the subjects for the most part of the sequences, and we could reassign all the IDs with minimal effort.

Preliminary analysis of the attendant behavior

To show the possibility of behavior analysis with the proposed system, we provide a preliminary analysis of the measured behavior sequences.

Figure 17(a) shows the distribution of the distance between a caregiver and an elderly person in the indoor environment. The distribution is unimodal, and the peak is at about 0.6 m. In proxemics, this distance is categorized as "personal distance (0.45–1.2 m)," and people allow only familiar people to be within this distance, while they keep more distance (i.e. "social distance (1.2–3.6 m)") when meeting or interacting with unfamiliar people. This implies that people maintain a closer relationship while attending another person compared to usual people interactions, such as meetings.

Figure 17(b) shows the distribution of the caregivers' position with respect to the elderly persons. The caregivers usually locate themselves at the side of the elderly persons, and in order to lead the elderly persons, they slightly precede them. The distribution is a bit anisotropic: when a caregiver is following an elderly person, the distance between them tends to be larger, since the caregiver watches the elderly person and the surrounding environment at the same time. From this preliminary analysis, we find that the caregivers decide their attending position so as to keep the elderly person in view and look ahead in the environment.

Figure 17. An analysis of the people attending behavior during the field test in an indoor environment. (a) The distribution of the distance between the elderly person and the caregiver. (b) The distribution of the relative position of the caregiver with respect to the elderly person.
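Distributions such as those in Figure 17 can be computed from the tracked trajectories by expressing the caregiver's position in a frame attached to the elderly person. The helper below is an illustrative sketch; the function name and the frame convention (x forward along the walking direction, y to the left) are our own, not taken from the paper:

```python
import numpy as np

def relative_position(elderly_xy, elderly_heading, caregiver_xy):
    """Express the caregiver's position in a frame attached to the
    elderly person: x forward along the walking direction, y to the left."""
    dx = np.asarray(caregiver_xy) - np.asarray(elderly_xy)
    c, s = np.cos(elderly_heading), np.sin(elderly_heading)
    # Rotate the world-frame offset into the person-centric frame.
    forward = c * dx[0] + s * dx[1]
    left = -s * dx[0] + c * dx[1]
    return forward, left

# Caregiver slightly ahead and 0.6 m to the elderly person's right.
fwd, left = relative_position((0.0, 0.0), 0.0, (0.2, -0.6))
dist = np.hypot(fwd, left)
```

Accumulating `(fwd, left)` and `dist` over every tracked frame yields histograms analogous to Figure 17(a) and (b).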
Figure 18(a) shows the trajectories of the caregivers and the elderly persons at a corner, and it also suggests the importance of visibility for deciding the attending position. The number of trajectories is 17. The caregivers tend to walk on the outer side of the corner (15 of 17). We can consider that, by walking on the outer side, the caregivers keep the outlook of the corridor to prevent accidents, such as stumbling and colliding. The caregivers walk on the inner side in a few cases (2 of 17); however, in those cases they preceded the elderly persons in order to check the safeness before the elderly persons entered the corner. These results suggest that the caregivers always check for other surrounding people and objects, such as wheelchairs, to prevent accidents.

Figure 18. The trajectories of the caregivers (in orange) and the elderly persons (in green) at a corner. The light blue lines indicate that the connected points are measured at the same time. In most of the cases, the caregivers walked on the outer side of the corner (15 of 17). In a few cases, the caregivers walked on the inner side; in such cases, they preceded the elderly persons to ensure the outlook of the corridor (2 of 17). (a) All the trajectories of the caregivers and the elderly persons. (b) An example of the cases where the caregiver walks on the outer side of the corner. (c) The case where the caregiver walks on the inner side of the corner.

Figure 19(a) shows the recorded trajectories in the outdoor environment. In this sequence, the elderly person was fine to walk, and the caregiver let him walk relatively freely while navigating him back to the hospital. Figure 19(b) shows the caregiver's walking speed and the elevation of her position in the global map. When the caregiver (and the elderly person) was going up a slope, they slowed down to 1.0–1.2 m/s, while they walked at 1.2–1.4 m/s on down slopes.

Figure 19. The recorded attendant behavior in the outdoor environment. (a) People trajectory. (b) The caregiver's walking speed (green) and altitude (blue).

Slopes influence not only their walking speed but also their positional relationship. We extracted their behavior on up slopes and on down slopes, respectively, and calculated the distributions of the caregiver's relative position with respect to the elderly person (see Figure 20). We can see that, on down slopes, the elderly person led the caregiver, while they walked side by side on up slopes, due to the change of the walking speed. Although the caregiver's "X-axis" position varies depending on the walking speed, he/she almost always stays 0.6 m to the side of the elderly person. This is also observed in indoor environments (see Figure 17). These results suggest that, during attendance, professional caregivers adjust their position depending on the elderly person's status and the surrounding environment, while keeping their side distance to the elderly person constant.

Figure 20. The distribution of the relative position of the caregiver with respect to the elderly person in an outdoor environment. (a) Up slopes. (b) Down slopes.

This can be applied to the design of person following robots. Most existing person following robots just keep the distance to the target constant; however, this might be unnatural behavior for people. We can instead make the robot keep the side distance to the target constant, which may contribute to the naturalness of the following behavior of the robot. Those analysis results are difficult to obtain using existing measurement systems based on static sensors or wearable devices, such as INS and GPS, since obtaining them requires accurately measuring people behavior with respect to other people and the surrounding environment. The results show that we can capture and analyze such people behavior with the proposed system.

Person following behavior rules

Based on the analysis of the real caregivers' behavior, we propose empirical rules to design the behavior of attendant robots. They would be helpful to design a robot which attends a person while keeping him/her away from dangerous situations.

1. The robot attends the person while keeping a side-by-side positioning as long as possible. In particular, it should keep a position 0.6 m to the side of the person.
2. Depending on the walking speed, the relative position may deviate along the front-back direction. However, even in such a case, the robot should keep a constant distance to the side of the person.
3. At a corner, the robot should go on the outer side of the corner so that it can check the safeness of the corridor while avoiding disturbing the person.
4. In case the robot cannot go on the outer side due to positioning and obstacles, it should go on the inner side before the person enters the corner and check whether it is safe. This would slightly disturb the person from walking; however, safety has a higher priority than comfort.
5. To attend a person who is fine to walk, the robot has to be able to run at about 1.4 m/s.

Note that the values in the rules, such as the distance to the person to be attended, should be adjusted depending on the robot configuration (e.g. size and shape). However, we believe that the rules would be a good initial guide to designing a comfortable attendant robot which is socially acceptable.
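As an illustration of how rules 1 to 4 could be encoded in an attendant robot's controller, the sketch below maps the person's pose and the corner context to a target robot position. Except for the 0.6 m side offset taken from the analysis above, all names and parameter values are hypothetical:

```python
import math

def attend_target_position(person_xy, person_heading, at_corner=False,
                           outer_side="left", outer_blocked=False,
                           side_offset=0.6, lead_at_corner=0.8):
    """Choose a target position for the robot relative to the attended
    person, following the empirical rules (parameters are illustrative).

    The offset is built in the person's frame (x forward along the
    walking direction, y positive to the left) and converted to world
    coordinates using the person's heading.
    """
    if not at_corner:
        # Rules 1-2: stay side by side, keeping a fixed lateral offset.
        forward, lateral = 0.0, side_offset
    elif not outer_blocked:
        # Rule 3: take the outer side of the corner.
        forward = 0.0
        lateral = side_offset if outer_side == "left" else -side_offset
    else:
        # Rule 4: outer side unavailable; take the inner side and
        # precede the person to check that the corner is safe.
        forward = lead_at_corner
        lateral = -side_offset if outer_side == "left" else side_offset
    c, s = math.cos(person_heading), math.sin(person_heading)
    return (person_xy[0] + c * forward - s * lateral,
            person_xy[1] + s * forward + c * lateral)

side = attend_target_position((0.0, 0.0), 0.0)
inner = attend_target_position((0.0, 0.0), 0.0,
                               at_corner=True, outer_blocked=True)
```

A real controller would feed such a target into a local planner and, per rule 5, bound the tracking speed at about 1.4 m/s.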
Conclusions and discussion

This article has described a portable people behavior measurement system using a 3-D LIDAR. The proposed system enables long-term and wide-area behavior measurement. The system first creates a 3-D map of the environment using the Graph SLAM approach in advance of measurements. Then, it estimates its pose and detects and tracks people simultaneously. The tracking accuracy of the system is comparable to a static sensor-based people tracking system. As a field test, we demonstrated the effectiveness of the proposed system in measuring the behavior of professional caregivers attending elderly persons. Based on the analysis of the measured behavior, empirical rules to design the behavior of attendant robots are proposed. The measurement system and the professional caregivers' behavior data set have been made public so that they can be used for the measurement and analysis of people attendant behavior.

The current system requires a human observer who carries the backpack with the 3-D LIDAR; thus, manual effort to observe people is necessary. The human observer could be replaced with a mobile robot so that a large attendant behavior data set is automatically created for improving the robot attendant behavior.

Acknowledgement

The authors would like to thank O. Kohashi, S. Yamamoto, and T. Gomyo for allowing us to conduct the field test in Sawarabikai Fukushimura hospital and their excellent cooperation during the test.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is in part supported by JSPS Kakenhi No. 25280093 and the Leading Graduate School Program R03 of MEXT.

ORCID iD

Kenji Koide https://orcid.org/0000-0001-5361-1428

References

1. Hall E. The hidden dimension: man's use of space in public and private. London, UK: Doubleday Anchor Books, Bodley Head, 1969. ISBN 9780370013084.
2. Helbing D and Molnar P. Social force model for pedestrian dynamics. Phys Rev E 1995; 51(5): 4282. DOI: 10.1103/PhysRevE.51.4282.
3. Ferrer G, Garrell A and Sanfeliu A. Robot companion: a social-force based approach with human awareness-navigation in crowded environments. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 6 January 2013, pp. 1688–1694. IEEE. DOI: 10.1109/IROS.2013.
4. Ferrer G and Sanfeliu A. Proactive kinodynamic planning using the extended social force model and human motion prediction in urban environments. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, USA, 14 September 2014, pp. 1730–1735. DOI: 10.1109/IROS.2014.6942788.
5. Oishi S, Kohari Y and Miura J. Toward a robotic attendant adaptively behaving according to human state. In: IEEE International Symposium on Robot and Human Interactive Communication, New York, USA, 26 August 2016, pp. 1038–1043. IEEE. DOI: 10.1109/ROMAN.2016.7745236.
6. Brscic D, Kanda T, Ikeda T, et al. Person position and body direction tracking in large public spaces using 3D range sensors. IEEE Trans Human Mach Syst 2013; 43(6): 522–534.
7. Baltieri D, Vezzani R and Cucchiara R. 3DPeS: 3D people dataset for surveillance and forensics. In: ACM Workshop on Multimedia Access to 3D Human Objects, Scottsdale, Arizona, USA, pp. 59–64.
8. Benfold B and Reid I. Stable multi-target tracking in real-time surveillance video. In: IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, USA, 20 June 2011, pp. 3457–3464.
9. Zhang S, Benenson R, Omran M, et al. How far are we from solving pedestrian detection? In: IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 27 June 2016, pp. 1259–1267. IEEE. DOI: 10.1109/CVPR.2016.141.
10. Fuentes LM and Velastin SA. People tracking in surveillance applications. Image Vision Comput 2006; 24(11): 1165–1171 (Performance Evaluation of Tracking and Surveillance). DOI: 10.1016/j.imavis.2005.06.006.
11. Luber M, Spinello L and Arras KO. People tracking in RGB-D data with on-line boosted target models. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, San Francisco, USA, 25 September 2011, pp. 3844–3849. IEEE. DOI: 10.1109/IROS.2011.6095075.
12. Munaro M, Basso F and Menegatti E. OpenPTrack: open source multi-camera calibration and people tracking for RGB-D camera networks. Robot Auton Syst 2016; 75: 525–538. DOI: 10.1016/j.robot.2015.10.004.
13. Bedagkar-Gala A and Shah SK. A survey of approaches and trends in person re-identification. Image Vision Comput 2014; 32(4): 270–286. DOI: 10.1016/j.imavis.2014.02.001.
14. Satake J, Chiba M and Miura J. A SIFT-based person identification using a distance-dependent appearance model for a person following robot. In: IEEE International Conference on Robotics and Biomimetics, Guangzhou, China, 11 December 2012, pp. 962–967. IEEE. DOI: 10.1109/ROBIO.2012.6491093.
15. Koide K and Miura J. Identification of a specific person using color, height, and gait features for a person following robot. Robot Auton Syst 2016; 84: 76–87. DOI: 10.1016/j.robot.2016.07.004.
16. Ristani E and Tomasi C. Features for multi-target multi-camera tracking and re-identification. In: IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 18 June 2018.
17. Munaro M, Fossati A, Basso A, et al. One-shot person re-identification with a consumer depth camera. In: Person Re-Identification. Springer, 2014, pp. 161–181. DOI: 10.1007/978-1-4471-6296-4_8.
18. Semwal VB, Raj M and Nandi G. Biometric gait identification based on a multilayer perceptron. Robot Auton Syst 2014; 65: 65–75. DOI: 10.1016/j.robot.2014.11.010.
19. Song X, Cui J, Zhao H, et al. Laser-based tracking of multiple interacting pedestrians via on-line learning. Neurocomputing 2013; 115: 92–105. DOI: 10.1016/j.neucom.2013.02.001.
20. Nakamura K, Zhao H, Shibasaki R, et al. Human sensing in crowd using laser scanners. London, UK: INTECH Open Access Publisher, 2012. DOI: 10.5772/33276.
21. Sabapathy T, Mustapha MA, Jusoh M, et al. Location tracking system using wearable on-body GPS antenna. In: Engineering Technology International Conference, Ho Chi Minh City, Vietnam, 5 August 2016, vol. 97. EDP. DOI: 10.1051/matecconf/20179701099.
22. Doherty ST, Lemieux CJ and Canally C. Tracking human activity and well-being in natural environments using wearable sensors and experience sampling. Soc Sci Med 2014; 106: 83–92. DOI: 10.1016/j.socscimed.2014.01.048.
23. Escriba C, Roux J, Hajjine B, et al. Smart wearable active patch for elderly health prevention. In: 5th Annual Conference on Computational Science & Computational Intelligence, Las Vegas, United States, 13 December 2018.
24. Ramadhan A. Wearable smart system for visually impaired people. Sensors 2018; 18(3): 843. DOI: 10.3390/s18030843.
25. Zhu X, Li Q and Chen G. APT: accurate outdoor pedestrian tracking with smartphones. In: Proceedings IEEE INFOCOM, Turin, Italy, 14 April 2013, pp. 2508–2516. IEEE. DOI: 10.1109/INFCOM.2013.6567057.
26. Kotaru M and Katti S. Position tracking for virtual reality using commodity Wi-Fi. In: IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 21 July.
27. Soltanaghaei E, Kalyanaraman A and Whitehouse K. Multipath triangulation: decimeter-level Wi-Fi localization and orientation with a single unaided receiver. 2018. DOI: 10.1145/3210240.3210347.
28. Edwards A, Silva B, dos Santos R, et al. Wi-Fi based indoor positioning using pattern recognition. In: IEEE 27th International Symposium on Industrial Electronics, Cairns, Australia, 13 June 2018. IEEE. DOI: 10.1109/isie.2018.8433869.
29. Li F, Zhao C, Ding G, et al. A reliable and accurate indoor localization method using phone inertial sensors. In: ACM Conference on Ubiquitous Computing, Pittsburgh, USA, 5 September 2012, pp. 421–430. ACM. DOI: 10.1145/2370216.2370280.
30. Kang W and Han Y. SmartPDR: smartphone-based pedestrian dead reckoning for indoor localization. IEEE Sens J 2015; 15(5): 2906–2916. DOI: 10.1109/JSEN.2014.2382568.
31. Grisetti G, Kummerle R, Stachniss C, et al. A tutorial on graph-based SLAM. IEEE Intell Transp Syst Mag 2010; 2(4): 31–43. DOI: 10.1109/MITS.2010.939925.
32. Wan E and Merwe RVD. The unscented Kalman filter for nonlinear estimation. In: Adaptive Systems for Signal Processing, Communications, and Control Symposium, Lake Louise, Canada, 4 October 2000. IEEE. DOI: 10.1109/asspcc.2000.882463.
33. Kümmerle R, Grisetti G, Strasdat H, et al. g2o: a general framework for graph optimization. In: IEEE International Conference on Robotics and Automation, Shanghai, China, 9 May 2011, pp. 3607–3613. IEEE. DOI: 10.1109/ICRA.2011.5979949.
34. Magnusson M, Lilienthal A and Duckett T. Scan registration for autonomous mining vehicles using 3D-NDT. J Field Robot 2007; 24(10): 803–827.
35. Besl PJ and McKay ND. A method for registration of 3-D shapes. IEEE Trans Pattern Anal Mach Intell 1992; 14(2): 239–256. DOI: 10.1109/34.121791.
36. Magnusson M, Nuchter A, Lorken C, et al. Evaluation of 3D registration reliability and speed - a comparison of ICP and NDT. In: IEEE International Conference on Robotics and Automation, Kobe, Japan, 12 May 2009, pp. 3907–3912. IEEE. DOI: 10.1109/ROBOT.2009.5152538.
37. Nelson E. BLAM - Berkeley localization and mapping, 2016. https://github.com/erik-nelson/blam (accessed 3 April 2019).
38. Fischler MA and Bolles RC. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 1981; 24(6): 381–395. DOI: 10.1145/358669.358692.
39. Ma L, Kerl C, Stückler J, et al. CPA-SLAM: consistent plane-model alignment for direct RGB-D SLAM. In: IEEE International Conference on Robotics and Automation, Stockholm, Sweden, 16 May 2016, pp. 1285–1291. IEEE. DOI: 10.1109/ICRA.2016.7487260.
40. Shan T and Englot B. LeGO-LOAM: lightweight and ground-optimized LIDAR odometry and mapping on variable terrain. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, Madrid, Spain, 1 October 2018, pp. 4758–4765. IEEE. DOI: 10.1109/IROS.2018.8594299.
41. Haselich M, Jobgen B, Wojke N, et al. Confidence-based pedestrian tracking in unstructured environments using 3D laser distance measurements. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, USA, 14 September 2014, pp. 4118–4123. IEEE. DOI: 10.1109/iros.2014.6943142.
42. Kulis B and Jordan MI. Revisiting k-means: new algorithms via Bayesian nonparametrics. CoRR 2011; abs/1111.0352.
43. Kidono K, Miyasaka T, Watanabe A, et al. Pedestrian recognition using high-definition LIDAR. In: IEEE Intelligent Vehicles Symposium (IV), Baden-Baden, Germany, 5 June 2011, pp. 405–410. IEEE. DOI: 10.1109/ivs.2011.5940433.
44. Schapire RE and Singer Y. Improved boosting algorithms using confidence-rated predictions. In: Annual Conference on Computational Learning Theory, vol. 37, Madison, USA, 24 July 1998, pp. 297–336. ACM. DOI: 10.1145/279943.279960.
45. Radosavljevic Z. A study of a target tracking method using global nearest neighbor algorithm. Vojnotehnicki glasnik 2006; (2): 160–167. DOI: 10.5937/vojtehg0602160r.

Publisher: SAGE
Copyright © 2022 by SAGE Publications Ltd, unless otherwise noted. Manuscript content on this site is licensed under Creative Commons Licenses.
ISSN: 1729-8814
eISSN: 1729-8814
DOI: 10.1177/1729881419841532

In particular for service robots, by analyzing the behavior of a person who is helping another, we could model their behavior and create a robot with human-like behavior. This would allow robots to have natural interaction with humans and make them more acceptable in daily service situations.

Several models which describe the social interaction between persons, such as social distance and the social force model, have been proposed, and a number of works have applied those models to service robots.3–5 However, since those models are based on a simple analysis of the distance between persons, they cannot describe the influence of the surrounding environment and of the other persons. Such limitations may yield unnatural behavior of the robots in complex situations. To realize a robot with natural and acceptable behavior, it is necessary to measure person behavior in diverse situations and construct a sophisticated interaction behavior model.

There are several data sets which provide people behavior in indoor6 and outdoor7,8 environments. However, to the best of our knowledge, no data set provides people behavior involving interaction between followed and following persons, even though such a situation is very common in daily services. Most existing robots just keep the distance to the target person constant, and this naive following strategy could make people feel uncomfortable. We believe that it is necessary to measure and analyze people attendant behavior to design the behavior of attendant robots, and this motivated us to develop a system which enables long-term and wide-area people behavior measurement and to create a data set which consists of real professional humans' attendant behavior data.

Figure 1 illustrates the proposed system for people behavior measurement. The system is based on a three-dimensional (3-D) LIDAR, and a human observer carries the system and follows the persons to be observed while keeping them in the sensor view. The system simultaneously estimates the sensor pose in a 3-D environmental map and tracks the target persons. The proposed system can be applied to long-term and wide-area people behavior measurement tasks.

Figure 1. The proposed system to measure people behavior using a 3-D LIDAR. The observer carries the backpack with a 3-D LIDAR and follows the persons to be measured. 3-D: three-dimensional.

The contributions of this article are threefold. First, we propose a portable measurement system which enables long-term and wide-area people behavior measurements. We validated that the tracking accuracy of the proposed system is comparable to a static sensor-based people tracking system. Second, we provide a preliminary analysis of a field test of the proposed system in a hospital. We recorded the behavior of professional caregivers attending elderly persons with dementia. The results show that the proposed system can be applied to the measurement of real people behavior. In addition, based on the analysis results, we propose empirical rules to design the behavior of attendant robots. Third, we provide the software of the system and the recorded people behavior as an open-source public data set (http://github.com/koide3/hdl_graph_slam and http://www.aisl.cs.tut.ac.jp/database_fukushimura.html). They would be useful to measure and analyze people behavior in situations which are hard for existing people tracking systems.

The rest of the article is organized as follows. The following section explains related work. The third section describes an overview of the proposed system. The fourth and fifth sections describe the offline Simultaneous Localization and Mapping (SLAM) method using a 3-D LIDAR and the online people behavior measurement method, which includes sensor localization and people tracking, respectively. The sixth section explains a field test in a hospital and provides a preliminary analysis of the field test. The last section concludes the article and discusses future work.

1 Department of Computer Science and Information Engineering, Toyohashi University of Technology, Toyohashi, Japan
2 Department of Information Engineering, University of Padova, Padua, Italy

Corresponding author:
Kenji Koide, Department of Computer Science and Information Engineering, Toyohashi University of Technology, Toyohashi, Aichi 441-8580, Japan.
Email: koide@aisl.cs.tut.ac.jp

Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Related work

Systems to measure people behavior can be categorized into two groups: (1) systems using static sensors which are fixed in the environment and (2) systems using wearable sensors attached to the target persons.

People tracking using static sensors, such as cameras and laser range finders, has been widely studied. In particular, people tracking using cameras for surveillance is a major research topic in the computer vision community. A lot of works have proposed people detection and tracking methods using RGB cameras. Recent inexpensive consumer RGB-D cameras allow us to reliably detect and track people,11 and a camera network system for people tracking using RGB-D cameras has been proposed.12 Although such works provide reliable people tracking, a capability of recovering the track of a person who once left the camera view is necessary. This problem (i.e. person reidentification) has been one of the main research topics of vision-based people tracking systems. A lot of reidentification methods based on people appearance13–16 and soft biometric features17,18 have been proposed. They enable reliable people reidentification over time and over cameras.

Laser range finders have also been used for people tracking systems.19,20 Such systems can localize people very accurately, and the measurement area of each sensor is larger than that of cameras. While the reliability and the detection accuracy of those static sensor-based systems are very good, they can measure people behavior only in an area limited by the sensor view. In order to cover a large environment, they require the placement of a lot of static sensors, thereby increasing the time and cost of installing and calibrating all the sensors.

Another way to measure the behavior of specific persons for a long time over a wide area is to attach a wearable sensor to each target person and measure their behavior with the sensor. Several kinds of sensors, such as inertial navigation systems (INS) and the global positioning system (GPS), have been used for this purpose. Recent small wearable GPS sensors allow us to track a person in outdoor environments, and they have been applied to several applications of people behavior measurement and analysis.21,22 As an application, GPS-based wearable devices for helping elderly or visually impaired people have been proposed.23,24 The combination of GPS and INS improves tracking accuracy under low-level GPS radio power. However, GPS signals are not available in places close to buildings and in indoor environments.

Recently, Wi-Fi signal-based localization has been widely studied.26–28 Some of these methods are based on triangulation of Wi-Fi signal strength and show decimeter or centimeter accuracy in ideal situations.26,27 However, they require multiple antennas to be placed in the environment to accurately estimate the device position, and thus it is hard to apply them to a large environment. Other methods are based on Wi-Fi fingerprint matching. While they do not rely on external antennas and can be applied to large environments where a Wi-Fi signal is available, their estimation accuracy is very limited.

Behavior measurement systems for indoor environments based on pedestrian dead reckoning have also been proposed.29,30 Those methods estimate the target person position by integrating the acceleration and angular velocity obtained by an INS attached to the person. In order to prevent estimation drift, Li et al. combined pedestrian dead reckoning and map-based localization. Those methods can keep track of the position of the person as long as the person holds the sensor. Since they utilize smartphones, which are very common and inexpensive in recent years, those methods are cost-effective and easy to adopt. However, since an INS is an internal sensor and cannot sense the surrounding environment, it is hard to accurately measure the person's position with respect to the environment and other persons' positions. Thus, they cannot be applied to the measurement of the interaction between persons or of a person's behavior affected by the environment.

Figure 2. System overview.

In the behavior measurement phase, the system estimates its pose on the map created offline by combining a scan matching algorithm with an angular velocity-based pose prediction using an unscented Kalman filter (UKF). Simultaneously, the system detects and tracks the target persons.

Offline environmental mapping

Graph SLAM

Graph SLAM is one of the most successful approaches to the SLAM problem. In this approach, the SLAM problem is solved by constructing and optimizing a graph whose nodes represent parameters to be optimized, such as sensor poses and landmark positions, and whose edges represent constraints, such as relative poses between sensor poses and landmarks.
The graph is optimized so that the errors between the para- 31,33 System overview meters and the constraints are minimized. Following, let x be the node k. Let z and O be the mean and the k k k Figure 2 shows an overview of the proposed system. In information matrix of the constraints relating to x . The this system, the observer carries the backpack equipped objective function is defined as with a 3-D LIDAR (velodyne HDL-32e) and a PC and follows the persons to be measured. The 3-D LIDAR FðxÞ¼ e ðx ; z Þ O e ðx ; z Þ; ð1Þ k k k k k k k provides 360 range data at 10 Hz, and from the range data, the system estimates its pose while tracking the where e ðx ; z Þ is an error function between the para- k k k target persons. The process of the proposed system con- meters x and the constraints z . Typically, equation (1) k k sists of two phases: (1) offline environmental mapping is linearized and minimized by using Gauss–Newton or and (2) online sensor localization and people detection/ Levenberg–Marquardt algorithms. tracking. However, if the parameters span over non-Euclidean In the offline mapping phase, we create a 3-D environ- spaces (like pose parameters), those algorithms may lead mental map which covers the entire measurement area. For to suboptimal or invalid solutions. One way to deal with the mapping, we employ a graph optimization-based this problem is to perform the error optimization on a mani- SLAM approach (i.e. Graph SLAM ). In order to compen- fold which is a minimal representation of the parameters sate accumulated rotational errors of the scan matching, we and acts as a Euclidean space locally. In order to enable it, introduce ground plane and GPS position constraints for an operator ? is introduced, which transforms a local indoor and outdoor environments, respectively. variation Dx on the manifold. 4 International Journal of Advanced Robotic Systems Algorithm 1. 
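As a toy illustration of objective (1), the error accumulation over a pose graph can be sketched as below. This is a translation-only sketch with illustrative names; the actual system optimizes full 6-DoF poses with g2o.

```python
import numpy as np

def graph_slam_objective(nodes, edges):
    """Evaluate F(x) = sum_k e_k^T Omega_k e_k for a toy 2-D pose graph.

    nodes: dict id -> np.array([x, y]) (translation-only poses, for illustration)
    edges: list of (i, j, z_ij, omega), where z_ij is the measured relative
           translation from node i to node j and omega its 2x2 information matrix.
    """
    total = 0.0
    for i, j, z_ij, omega in edges:
        e = (nodes[j] - nodes[i]) - z_ij  # error between parameters and constraint
        total += e @ omega @ e
    return total

# Three poses in a line; the second constraint disagrees slightly with the states.
nodes = {0: np.array([0.0, 0.0]), 1: np.array([1.0, 0.0]), 2: np.array([2.1, 0.0])}
edges = [(0, 1, np.array([1.0, 0.0]), np.eye(2)),
         (1, 2, np.array([1.0, 0.0]), np.eye(2))]
print(graph_slam_objective(nodes, edges))  # ≈ 0.01
```

An optimizer such as Gauss-Newton would adjust the node positions to drive this residual toward its minimum.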
Algorithm 1. Loop detection.

Typically, in the 3-D SLAM problem, node x_k has as parameters the sensor pose at time k (a translation vector t_k and a quaternion q_k). A manifold of the quaternion q_k = [q_w, q_x, q_y, q_z]^T can be represented as [q_x, q_y, q_z]^T, and the operator ⊞ is described as

    q_k ⊞ Δq_k = [ √(1 − ‖[q'_x, q'_y, q'_z]‖²), q'_x, q'_y, q'_z ]^T    (2)

where q' = q_k · Δq_k.

In the proposed system, we first estimate the sensor trajectory by iteratively applying normal distributions transform (NDT) scan matching between consecutive frames. For 3-D LIDARs, NDT shows better performance than other scan matching algorithms, such as iterative closest points, in terms of both reliability and processing speed. Let p_t be the sensor pose at time t, consisting of a translation vector t_t and a quaternion q_t, and let r_{t,t+1} be the relative sensor pose between t and t+1 estimated by the scan matching. We add them to the pose graph as nodes [p_0, ..., p_N] and edges [r_{0,1}, ..., r_{N−1,N}]. Then, we find loops in the trajectory with Algorithm 1 and add them to the graph as edges (i.e. loop closures) to correct the accumulated error of the scan matching.

The loop detection algorithm is similar to the work of Nelson. First, we detect loop candidates based on the translational distance and the length of the trajectory between nodes (lines 2-11). Then, to validate the loop candidates, a scan matching algorithm (in our case, NDT) is applied between the nodes of each candidate. If the fitness score is lower than a threshold (e.g. 0.2), we add the loop to the graph as an edge between the nodes (lines 12-17). Every time a loop is found, the pose graph is updated such that equation (1) is minimized. We utilize g2o, a general framework for hypergraph optimization, for the pose graph optimization.

As a generated map gets larger, it tends to be bent due to the accumulated rotational error of the scan matching (see Figure 3). In order to compensate for the error, we introduce ground plane and GPS position constraints for indoor and outdoor environments, respectively. Figure 4 shows an illustration of the graph structure of the proposed system.

Figure 3. Comparison of the sensor trajectories estimated by existing methods and the proposed method. (a) BLAM. (b) LeGO-LOAM. (c) Ours without plane constraints. (d) Ours with plane constraints.

Figure 4. The proposed pose graph structure.

Ground plane constraint

To reliably generate the map of a large indoor environment, we assume that the environment has a single flat floor and introduce the ground plane constraint, which optimizes the pose graph such that the ground plane detected in each observation becomes the same plane. This assumption is valid in many indoor public environments, such as schools and hospitals.

We assume that the approximate height of the sensor is known (e.g. 2 m) and extract points within a certain height range which should contain the ground plane points (e.g. (−1.0, +1.0) m from the ground). Then, we apply RANSAC to the extracted point cloud and detect the ground plane. If the normal of the detected plane is almost vertical (the angle between the normal and the unit vertical vector is lower than 10°), we consider that the ground plane is correctly detected and add a ground plane constraint edge to the graph. Figure 5 shows an example of the detected ground planes. Green points are the points extracted by the height thresholding, and red points belong to the ground plane detected by RANSAC. We detect the ground plane every 10 s and connect the corresponding sensor pose node p_t with the fixed ground plane node whose plane coefficients are π_0 = [n_x, n_y, n_z, d]^T = [0, 0, 1, 0]^T.

Figure 5. Ground plane detection. Points within a certain height range are extracted by height thresholding (green points), and then RANSAC is applied to them to detect the ground plane (red points). The horizontality of the ground plane is validated by checking the plane normal.

To calculate the error between the sensor pose p_t and the ground plane π_0, we first transform the ground plane into the local coordinate frame of the sensor pose p_t:

    [n'_x, n'_y, n'_z]^T = R_t^T [n_x, n_y, n_z]^T    (3)
    d' = d − t_t^T [n'_x, n'_y, n'_z]^T    (4)

where π'_0 = [n'_x, n'_y, n'_z, d']^T is the ground plane in the local coordinate frame, and [R_t | t_t] is the sensor pose at time t. Following Ma et al.'s work, we employ the minimal parameterization τ(π) = (φ, ψ, d), where φ, ψ, and d are the azimuth angle, the elevation angle, and the length of the intercept, respectively:

    τ(π) = [ arctan(n_y / n_x), arctan(n_z / ‖n‖), d ]    (5)

The error between a pose node and the ground plane node is defined as

    e_{t,0} = τ(π'_0) − τ(π_t)    (6)

where π_t is the ground plane detected at time t.

GPS constraint

In outdoor environments where the ground is not flat, we use the GPS-based position constraint instead of the ground plane constraint. For ease of optimization, we first transform GPS data into the universal transverse mercator (UTM) coordinate system, in which each GPS datum has easting, northing, and altitude values in a Cartesian coordinate frame. Then, each GPS datum is associated with the pose node which has the closest time stamp, as a unary edge carrying prior position information. The error between the translation vector t_t of a pose node p_t and a GPS position T_t is simply given by

    e_t = t_t − T_t    (7)

SLAM framework evaluation

In order to validate the proposed SLAM system, we recorded a 3-D point cloud sequence in an indoor environment. Figure 6 shows the experimental environment and the trajectory of the sequence. The duration of the sequence is about 45 min (2700 s), and the length of the trajectory is about 2400 m (estimated by the proposed method).

For comparison, we generated 3-D environmental maps using the proposed method with and without plane constraints. We also applied existing publicly available SLAM frameworks, BLAM and LeGO-LOAM, to this data set. Figure 3 shows the trajectories estimated by the different SLAM algorithms.
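The height-thresholding and RANSAC plane detection step described above can be sketched as follows. This is a minimal NumPy illustration with assumed parameter values (the function name and thresholds are illustrative); the actual system operates on real LIDAR scans.

```python
import numpy as np

def detect_ground_plane(points, iters=100, dist_thresh=0.05, max_tilt_deg=10.0, seed=0):
    """RANSAC plane fit on height-filtered points. Returns (n, d) with
    n^T p + d = 0, or None if the best plane is not nearly horizontal."""
    rng = np.random.default_rng(seed)
    best_inliers, best_plane = 0, None
    for _ in range(iters):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p2 - p1, p3 - p1)
        if np.linalg.norm(n) < 1e-9:
            continue  # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        d = -n @ p1
        inliers = np.sum(np.abs(points @ n + d) < dist_thresh)
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, (n, d)
    if best_plane is None:
        return None
    n, d = best_plane
    if n[2] < 0:  # orient the normal upward
        n, d = -n, -d
    tilt = np.degrees(np.arccos(np.clip(n[2], -1.0, 1.0)))
    return (n, d) if tilt < max_tilt_deg else None  # reject non-horizontal planes

# Synthetic test: a noisy floor at z = 0 plus some wall points.
rng = np.random.default_rng(1)
floor = np.column_stack([rng.uniform(-5, 5, (500, 2)), rng.normal(0, 0.01, 500)])
wall = np.column_stack([np.full(100, 5.0), rng.uniform(-5, 5, 100), rng.uniform(0, 2, 100)])
plane = detect_ground_plane(np.vstack([floor, wall]))
```

The verticality check mirrors the 10° normal test in the text; a plane fit to the wall points would be rejected by it.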
BLAM and LeGO-LOAM were aborted in the middle of the sequence when they failed to estimate the trajectory and did not recover. BLAM failed to find the loops due to the accumulated rotation error of the scan matching and generated a warped and inaccurate trajectory. Since LeGO-LOAM maintains the local consistency of the ground plane between consecutive frames, its estimated trajectory is flatter than the one estimated by BLAM. However, it still suffers from the accumulated rotational error due to the lack of a global ground constraint. Eventually, it failed to estimate the trajectory when the observer made a u-turn at the end of a narrow corridor.

Both with and without the plane constraint, the proposed method could construct pose graphs properly thanks to the reliability of NDT, and it generated consistent maps. However, without the plane constraint, the resultant map is warped due to the accumulated rotational error, which is hard to correct with loops on a plane. With the ground plane constraint, the accumulated rotational error is corrected, and the resultant map is completely flat. Figure 7 shows the generated environmental map. The color indicates the height of each point. The floor has a consistent height thanks to the plane constraint. The result shows that the proposed plane constraint is effective in compensating the accumulated rotational error in a large indoor environment.

Figure 6. The experimental environment. The duration of the sequence is about 45 min, and the length of the trajectory is about 2400 m.

Figure 7. The created environmental map. The color indicates the height of each point. The height of the floor is consistent thanks to the plane constraint.

Table 1 shows the processing time of the proposed method and BLAM. The processing time of LeGO-LOAM is not available here, since it provides only real-time processing. While BLAM took about 15,327 s to generate the map, the proposed method took about 5392 s thanks to the computational efficiency of NDT.

Table 1. Processing time of BLAM and our SLAM system.

Method                | Time (s)
Ours: scan matching   | 1542
Ours: floor detection | 231
Ours: loop closing    | 3619
Ours: total           | 5392
BLAM: total           | 15,327

We also validated the proposed method in an outdoor environment. Figure 8(a) shows the environment and the trajectory of the sequence. The duration of the sequence is about 42 min (2500 s). Figure 8(b) shows the map generated by the proposed method with the GPS constraint. Although there were large undulations, the system correctly found loops and constructed a proper pose graph thanks to the GPS constraint. Note that, without the GPS constraint, the system could not find the loops due to the scan matching error and failed to create the environmental map.

Figure 8. The SLAM system validation in an outdoor environment. (a) The outdoor environment. The duration of the sequence is about 42 min, and the length of the trajectory is about 3000 m. (b) The 3-D map of the outdoor environment generated by the proposed method with GPS constraints. The color indicates the height of each point. 3-D: three-dimensional; GPS: global positioning system.

Online people behavior measurement

In order to measure people behavior, the system simultaneously estimates the sensor pose on the 3-D environmental map and tracks people around the observer. Figure 9 shows an overview of the online sensor localization and people tracking system. By integrating angular velocity and range data provided by the LIDAR, the system estimates the sensor pose. Then, it detects and tracks people to obtain their positions with respect to the environmental map. Note that the initial pose of the sensor is given by hand to avoid the global localization problem.

Sensor localization

We can estimate the sensor ego-motion by iteratively applying a scan matching algorithm as in the SLAM part. However, in contrast to the SLAM scenario, the observer has to follow the target persons during the measurement and sometimes has to move quickly to keep them in the sensor view. In such cases, the sensor motion between frames becomes very large, and the scan matching may wrongly estimate the sensor ego-motion due to the large displacement. In order to deal with this problem, we integrate the NDT scan matching with the angular velocity data provided by the 3-D LIDAR using a UKF.

We define the sensor state to be estimated as

    x_t = [p_t, q_t, v_t, b_t]^T    (8)

where p_t is the position, q_t is the rotation quaternion, v_t is the velocity, and b_t is the bias of the angular velocity of the sensor at time t. Assuming a constant translational velocity for the sensor motion model and a constant bias for the angular velocity sensor, the system equation for predicting the state is defined as

    x_t = [p_{t−1} + Δt · v_{t−1}, q_{t−1} · Δq_t, v_{t−1}, b_{t−1}]^T    (9)

where Δt is the duration between t−1 and t, and Δq_t is the rotation during Δt caused by the bias-compensated angular velocity a_t = ā_t − b_{t−1} (with ā_t the measured angular velocity):

    Δq_t = [ 1, (Δt/2) a_t,x, (Δt/2) a_t,y, (Δt/2) a_t,z ]    (10)

With equation (9), the system predicts the sensor pose by using the UKF and then applies NDT to match the observed point cloud with the global map, with the predicted p_t and q_t as the initial guess of the sensor pose. Then, the system corrects the sensor state with the sensor pose estimated by the scan matching, z_t = [p'_t, q'_t]^T. The observation equation is defined as

    z_t = [p_t, q_t]^T    (11)

We normalize the quaternion in the state vector after each of the prediction and correction steps to prevent its norm from changing due to the unscented transform and accumulated calculation error. It is worth mentioning that we also implemented pose prediction which takes acceleration into account; however, the estimation result got worse due to the strong noise on the acceleration observations.

People detection and tracking

We first remove the background points from an observed point cloud to extract the foreground points.
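The constant-velocity prediction step of equations (9) and (10), including the quaternion renormalization mentioned above, can be sketched as follows (illustrative function names, not the paper's implementation):

```python
import numpy as np

def quat_mult(a, b):
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def predict_state(p, q, v, b, omega_meas, dt):
    """Constant-velocity / constant-bias prediction in the style of
    equations (9)-(10). p, v: 3-vectors; q: quaternion [w,x,y,z];
    b: gyro bias; omega_meas: measured angular velocity."""
    a = omega_meas - b                                   # bias-compensated rate
    dq = np.array([1.0, 0.5*dt*a[0], 0.5*dt*a[1], 0.5*dt*a[2]])  # small-angle rotation
    q_new = quat_mult(q, dq)
    q_new /= np.linalg.norm(q_new)                       # keep unit norm after prediction
    return p + dt * v, q_new, v, b

# With zero angular velocity and zero bias, only the position advances.
p, q, v, b = predict_state(np.zeros(3), np.array([1.0, 0, 0, 0]),
                           np.array([1.0, 0, 0]), np.zeros(3), np.zeros(3), 0.1)
```

In the full system this prediction feeds the UKF, and the NDT scan matching result then serves as the correction measurement.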
Then, we create an occupancy grid map with a certain voxel size (e.g. 0.5 m) from the environmental map. The input point cloud is transformed into the map coordinate frame according to the sensor pose estimated by the UKF, and each point falling in a voxel that contains environmental map points is removed as background. Euclidean clustering is then applied to the foreground points to detect human candidate clusters. However, when persons are close together, their clusters may be wrongly merged and detected as a single cluster. To deal with this problem, we employ Haselich's split-merge clustering algorithm. The algorithm first divides a cluster into subclusters by using dp-means until each subcluster gets smaller than a threshold (e.g. 0.45 m), so that no subcluster contains points of different persons.

Figure 9. The online sensor pose estimation and people detection and tracking system.

Figure 10. Haselich's clustering algorithm. The green bounding box indicates the Euclidean clustering result. Two persons are wrongly detected as a single cluster. The cluster is divided into small subclusters (red bounding boxes) and then remerged if there is no gap between those subclusters. The blue bounding boxes are the final detection result. (a) Top view. (b) Bird's eye view.

Figure 11. The experimental environment of the sensor localization experiment.
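The dp-means split step above can be sketched as below. This is a minimal illustration in which `lam` plays the role of the 0.45 m threshold; the paper's split-merge pipeline on real 3-D clusters is not reproduced here.

```python
import numpy as np

def dp_means(points, lam, iters=10):
    """Minimal dp-means: a new cluster is spawned whenever a point lies
    farther than lam from every existing center, which bounds cluster size."""
    centers = [points[0]]
    assign = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        for i, p in enumerate(points):
            d = [np.linalg.norm(p - c) for c in centers]
            if min(d) > lam:
                centers.append(p)            # spawn a new cluster at this point
                assign[i] = len(centers) - 1
            else:
                assign[i] = int(np.argmin(d))
        # update centers; keep the old center if a cluster momentarily empties
        centers = [points[assign == k].mean(axis=0) if np.any(assign == k) else centers[k]
                   for k in range(len(centers))]
    return np.array(centers), assign

# Two tight 2-D blobs about 1.5 m apart are split into two clusters at lam = 0.45.
rng = np.random.default_rng(0)
a = rng.normal([0.0, 0.0], 0.05, (30, 2))
b = rng.normal([1.5, 0.0], 0.05, (30, 2))
centers, labels = dp_means(np.vstack([a, b]), lam=0.45)
```

After such a split, the remerge test described next joins subclusters that have no gap between them.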
Then, if there is no gap between those subclusters, they are considered to belong to a single person and are remerged into one cluster. Figure 10 shows an example of the detection results. The person clusters are correctly separated even when the persons are very close together, thanks to the split and remerge process.

The detected clusters may contain nonhuman clusters (i.e. false positives). To eliminate them, we judge whether a cluster is a human or not by using a human classifier trained with slice features, following Kidono et al. and Schapire and Singer. Assuming that persons walk on the ground plane, we track persons on the XY plane without the height. We employ the combination of a Kalman filter with the constant velocity model and global nearest neighbor data association to track persons. The tracking scheme works well as long as the tracked persons are visible from the sensor and are correctly detected.

Sensor localization evaluation

To show how the pose prediction improves the sensor localization, we conducted a sensor localization experiment. Figure 11 shows the experimental environment. An observer carries the system and moves along the corridor, and the system estimates its pose from the range and angular velocity data. We conducted the experiment twice. In the first trial, the observer walked (about 1.5 m/s) to avoid the sensor being moved quickly. In the second trial, the observer ran (about 3.0 m/s), and the sensor was shaken very strongly.

Figure 12 shows the results of the first trial. Figure 12(a) shows the estimated trajectories with and without the pose prediction. Since the observer moved slowly during the first sequence, both results show the same correct trajectory. To assess the effect of the sensor pose prediction, we assume that the trajectories estimated by NDT are mostly correct and compare the predicted sensor poses with the poses estimated by NDT, since measuring the ground truth of the sensor trajectory is difficult.
Figure 12. The results of the first trial of the sensor localization experiment. The observer walked during the trial (about 1.5 m/s). Both the trajectories with and without the angular velocity-based pose prediction are correctly estimated. With the prediction, the initial guess for NDT gets significantly closer to the correct pose. (a) Estimated trajectories. (b) Difference between the predicted and the corrected positions. (c) Difference between the predicted and the corrected rotations. (d) Processing time. NDT: normal distributions transform.

Figure 12(b) and (c) shows the difference between the predicted sensor pose (the initial guess) and the one estimated by NDT. In the case without the pose prediction, the previous matching result is used as the initial guess. With the prediction, the translational and rotational pose prediction errors significantly decrease thanks to the constant velocity model and the consideration of angular velocity, respectively.

The results of the second trial are shown in Figure 13. The system failed to estimate the sensor pose without the pose prediction (see Figure 13(a)), since the observer moved very quickly and the sensor displacement between frames got larger. The NDT matching also took longer (about 56 ms per frame) without the pose prediction, since the large displacement between frames makes NDT need more iterations to converge to a local solution. With the prediction, the matching took about 45 ms per frame thanks to the good initial guess (see Table 2). The results show that the angular velocity-based pose prediction makes the pose estimation robust to quick motions and fast to converge.

Figure 13. The results of the second trial of the sensor localization experiment. The observer ran during the trial (about 3.0 m/s). Without the pose prediction, the system could not correctly estimate the pose due to the very quick motion. (a) Estimated trajectories. (b) Difference between the predicted and the corrected positions. (c) Difference between the predicted and the corrected rotations. (d) Processing time.

Table 2. The summary of the sensor localization experiment.

Seq.       | With prediction: Error (m) / Error (°) / Time (ms) | Without prediction: Error (m) / Error (°) / Time (ms)
1st (walk) | 0.0588 / 1.0913 / 38.88                            | 0.1367 / 2.1625 / 40.06
2nd (run)  | 0.1851 / 4.2845 / 45.14                            | 0.3330 / 6.6798 / 56.11

People detection evaluation

To analyze the effect of the split-merge clustering and the human classifier, we recorded a 3-D range data sequence in which two persons are close together and walking side by side. It is a hard situation for the usual Euclidean clustering, since the persons' clusters may be merged into a single cluster. The number of frames is 102, and we applied the human detection method with and without the split-merge clustering and the human classifier to this sequence.

Table 3 shows the evaluation result. Without both techniques, the recall value is low (0.834), since the clusters of the persons are sometimes detected as a single cluster by the Euclidean clustering. With the split-merge clustering, the wrongly merged clusters are split into subclusters, and the recall value gets higher (0.995). With both the split-merge clustering and the human classifier, over-split subclusters are eliminated by the classifier, and the highest F-measure value is achieved (0.961). This result shows that, in situations where persons are close together, the split-merge clustering effectively increases the recall of human detection, and by combining it with the human classifier, we can obtain reliable human detection results.

Table 3. The people detection evaluation result.

Split-merge clustering | Human classifier | Precision | Recall | F-measure
Without                | Without          | 1.000     | 0.834  | 0.909
Without                | With             | 1.000     | 0.809  | 0.894
With                   | Without          | 0.902     | 0.995  | 0.946
With                   | With             | 0.961     | 0.961  | 0.961

Comparison with a static sensor-based people tracking system

In order to reveal the pros and cons of the proposed system, we compared it with a publicly available static sensor-based people tracking framework, OpenPTrack. The framework is designed for people tracking using static RGB-D cameras, and it is scalable to a large camera network. Moreover, it uses cost-effective hardware and is easy to set up. It has been operated by people including nonexperts in computer vision, such as artists and psychologists. Figure 14 shows the experimental environment and the configuration of the RGB-D camera network.

Figure 14. The experimental environment and the configuration of RGB-D cameras for OpenPTrack. Nine Kinect v2s are placed in the corridor. While OpenPTrack can measure only the limited area covered by cameras (about a 2 × 20 m area), the proposed system can cover the whole floor.
The map is static sensor-based systems can measure behavior of all peo- created by the proposed SLAM method. We placed nine ple in the covered area simultaneously, while the proposed Kinect v2s so that they cover about 2  20 m area. We system covers only the surrounding area. Thus, we can say calibrated the camera network according to the procedure that the proposed system is suitable to measure the behavior provided by OpenPTrack and then estimated the transfor- of specific people over a large area, while static sensor-based mation between the environmental map and the camera systems are suitable for behavior measurement of all the network by performing ICP registration between point people in a relatively small environment. clouds of the Kinects and the environmental map. While a subject walked in the corridor, an observer car- Field test in a hospital rying the proposed system followed him. The trajectories of both the persons were measured by the proposed system Measuring behavior of caregivers attending and OpenPTrack. Table 4 shows the summary of the dif- elderly persons ferences between the people positions measured by the proposed system and OpenPTrack. The differences some- To show that the proposed system can be applied to real times became larger (about 0.2–0.3 m) due to detection people behavior measurements, we conducted a field test in Koide et al. 11 Figure 15. A snapshot of the field test. The behavior of the care giver attending an elderly is recorded by using the proposed system. (a) Image. (b) Range data. Sawarabikai Fukushimura hospital. The hospital is specia- lized for elderly care, and hundreds of elderly patients are hospitalized and receiving care and rehabilitation in the hospital. Under permission granted by the hospital, we recorded professional caregivers’ behavior while they attend elderly persons with dementia. Figure 15 shows a snapshot of the field test. 
The caregiver attends the elderly to prevent accidents (such as stumbling, colliding, and fall- ing) and sometimes guides him/her to their room. Figure 16. The environments of the field test. (a) Hallway (1F). The number of sequences is 33, and the total duration is (b) Ward (2F). about 52 min. We also recorded an attendant behavior sequence in an outdoor environment shown in Figure 8. Preliminary analysis of the attendant behavior The duration of the outdoor sequence is about 22 min. Note To show the possibility of the behavior analysis with the that, for privacy reasons, we captured images during only proposed system, we provide preliminary analysis of the the sequence shown in Figure 15 with the special permis- measured behavior sequences. sion from the hospital, the subject, and his family. In the Figure 17(a) shows the distribution of the distance other sequences, we recorded only range data. It is a merit between a caregiver and an elderly person in the indoor of the proposed system that it can measure people behavior environment. The distribution is unimodal, and the peak is without privacy problems. at about 0.6 m. In proxemics, this distance is categorized Figure 16 shows the created indoor environmental maps as “Personal distance (0.45–1.2 m),” and people allow through the field test. The elderly persons take rest at the only familiar people to be within this distance while they dining hall on the first floor and then return to their hospital keep more distance (i.e. “Social distance (1.2–3.6 m)”) room on the second floor with a caregiver using the eleva- when meeting or interacting with unfamiliar people. It tor. After they ride the elevator, we switch the map from the implies that people maintain a closer relationship while one of the first floor to the second floor. attending another person comparing to usual people inter- During the measurement, there were other patients and action, such as meeting. 
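The distance and relative-position statistics of Figure 17 can be obtained by expressing the caregiver's position in a frame attached to the elderly person, with the x-axis along the walking direction. A sketch (the sign convention and names here are assumptions, not taken from the paper):

```python
import numpy as np

def relative_position(leader_xy, leader_heading, follower_xy):
    """Express the follower's position in a frame attached to the leader.
    The x-axis points along the leader's walking direction; positive y is
    to the leader's left (an assumed convention)."""
    c, s = np.cos(leader_heading), np.sin(leader_heading)
    rot = np.array([[c, s], [-s, c]])  # world -> leader frame
    return rot @ (np.asarray(follower_xy) - np.asarray(leader_xy))

# Leader at the origin walking along +y; caregiver 0.6 m to the leader's right.
rel = relative_position([0.0, 0.0], np.pi / 2, [0.6, 0.0])
dist = np.linalg.norm(rel)  # Euclidean distance, as in Figure 17(a)
```

Aggregating such per-frame vectors over all sequences yields the distance histogram of Figure 17(a) and the 2-D position distribution of Figure 17(b).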
Figure 17(b) shows the distribu- objects, such as wheelchairs and medicine racks, and the tion of the caregivers’ position with respect to the elderly observer sometimes had to move quickly to keep the sub- persons. The caregivers usually locate at the side of the jects in sensor view. However, the proposed system could elderly persons. In order to lead the elderly persons, they correctly localize itself through all the sequences thanks to slightly precede the patients. The distribution is a bit ani- the wide measurement area of the 3-D LIDAR and the sotropic: when a caregiver is following an elderly person, integration of the scan matching and the angular velocity- the distance between them tends to be larger since the based pose prediction. caregivers see the elderly person and the surrounding Regarding people tracking, the system failed to keep environment at the same time. From this preliminary anal- track of the subjects when a patient came between the ysis, we can find that the caregivers decide their attending observer and the subjects to be observed, and new IDs were position in order to keep the elderly person in the view and assigned to the subjects after they reappeared. In such look ahead in the environment. cases, the system notifies that it lost the track of subjects, Figure 18(a) shows the trajectories of the caregivers and and we reassigned correct IDs to them by hand. Since we the elderly persons at a corner, and it also suggests the saw those cases only a few times, the system could keep track of the subjects for the most part of the sequences, and importance of visibility for deciding the attending position. we could reassign all the IDs with the minimum effort. The number of the trajectories is 17. The caregivers tend to 12 International Journal of Advanced Robotic Systems Figure 17. An analysis of the people attending behavior during the field test in an indoor environment. 
(a) The distribution of the distance between the elderly person and the caregiver. (b) The distribution of the relative position of the caregiver with respect to the elderly person. Figure 18. The trajectories of the caregivers (in orange) and the elderly persons (in green) at a corner. The light blue lines indicate that the connected points are measured at the same time. In most of the cases, the caregivers walked on the outer side of the corner (15 of 17). In a few cases, the caregivers walked on the inner side. In such cases, they preceded the elderly persons to ensure outlook of the corridor (2 of 17). (a) All the trajectories of the caregivers and the elderly person. (b) An example of the cases where the caregiver walks on the outer side of the corner. (c) The case where the caregiver walks on the inner side of the corner. walk on the outer side of the corner (15 of 17). We can to 1.0–1.2 m/s, while they walked at 1.2–1.4 m/s in down consider that, by walking at the outer side, the caregivers slopes. Slopes influence not only their walking speed but also their position relationship. We extracted their behavior keep the outlook of the corridor to prevent accidents, such as stumbling and colliding. The caregivers walk on the in up slopes and down slopes, respectively, and calculated inner side in a few cases (2 of 17). However, they preceded the distributions of the caregiver’s relative position with the elderly persons in order to check the safeness before the respect to the elderly (see Figure 20). We can see that, in elderly persons enter the corner. These results suggest that down slopes, the elderly led the caregiver while they the caregivers always check the existence of other sur- walked side by side in up slopes due to the change of the walking speed. Although the caregiver’s “X-axis” position rounding people and objects, such as wheelchairs, to pre- vent accidents. 
varies depending on the walking speed, he/she almost Figure 19(a) shows the recorded trajectories in the out- always stays at 0.6 m side from the elderly. This is also door environment. In this sequence, the elderly was fine to observed in indoor environments (see Figure 17). These walk, and the caregiver did let him walk relatively freely results suggest that, during attendance, professional care- while navigating him to return back to the hospital. Figure givers adjust their position depending on the elderly per- 19(b) shows the caregiver’s walking speed and the eleva- sons’ status and the surrounding environment, while tion of her position in the global map. When the caregiver keeping their side distance to the elderly persons constant. (and the elderly) was going up a slope, they got slow down This can be applied to designing of person following Koide et al. 13 robots. Most of existing person following robots just keep the side distance to the target constant, and it may contrib- the distance to the target constant. However, it might be ute the naturalness of the following behavior of the robot. Those analysis results are difficult to obtain using exist- unnatural behavior for people. We can make the robot keep ing measurement systems which use static sensors or wear- able devices, such as INS and GPS, since it requires accurately measure people behavior with respect to other people and the surrounding environment. The results show that we can capture and analyze such people behavior with the proposed system. Person following behavior rules Based on the analysis of the real caregivers’ behavior, we propose empirical rules to design the behavior of attendant robots. It would be helpful to design a robot which attends a person while keeping him/her away from dangerous situations. 1. The robot attends the person while keeping the side- by-side positioning as long as it’s possible. In par- ticular, it should keep in the position 0.6 m aside from the person. 2. 
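The relative-position statistics above (Figures 17(b) and 20) amount to expressing the caregiver's position in a person-centric frame aligned with the elderly person's walking direction, and the distance analysis uses Hall's proxemic zones. A minimal sketch of both, assuming synchronized 2-D track points; the function and variable names are our own illustration, not the authors' implementation:

```python
import math

def proxemic_zone(distance_m):
    """Hall's proxemic zones, with the thresholds quoted in the analysis above."""
    if distance_m < 0.45:
        return "intimate"
    if distance_m < 1.2:
        return "personal"
    if distance_m < 3.6:
        return "social"
    return "public"

def heading(prev_xy, cur_xy):
    """Walking direction estimated from two consecutive track points."""
    return math.atan2(cur_xy[1] - prev_xy[1], cur_xy[0] - prev_xy[0])

def relative_position(person_xy, person_heading, other_xy):
    """Offset of `other` in the person-centric frame: (forward, left) in meters."""
    dx = other_xy[0] - person_xy[0]
    dy = other_xy[1] - person_xy[1]
    c, s = math.cos(person_heading), math.sin(person_heading)
    # rotate the world-frame offset into the walking frame
    return (c * dx + s * dy, -s * dx + c * dy)

# Elderly person walking along +y; caregiver 0.6 m to the right, 0.1 m behind.
h = heading((0.0, 0.0), (0.0, 1.0))
forward, left = relative_position((0.0, 1.0), h, (0.6, 0.9))
zone = proxemic_zone(math.hypot(forward, left))  # "personal", matching the 0.6 m peak
```

Applying the transform at every synchronized timestamp and histogramming the (forward, left) offsets yields distributions of the kind shown in Figure 17(b).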
Figure 19. The recorded attendant behavior in the outdoor environment. (a) People trajectory. (b) The caregiver's walking speed (green) and altitude (blue).

Figure 20. The distribution of the relative position of the caregiver with respect to the elderly person in the outdoor environment. (a) Up slopes. (b) Down slopes.

Person following behavior rules

Based on the analysis of the real caregivers' behavior, we propose empirical rules for designing the behavior of attendant robots. They should be helpful in designing a robot which attends a person while keeping him/her away from dangerous situations.

1. The robot attends the person while keeping a side-by-side position as long as possible. In particular, it should keep a position 0.6 m to the side of the person.
2. Depending on the walking speed, the relative position may deviate along the front-back direction. Even in such a case, however, the robot should keep a constant lateral distance from the person.
3. At a corner, the robot should go on the outer side of the corner so that it can check the safety of the corridor while avoiding disturbing the person.
4. In case the robot cannot go on the outer side due to its position or obstacles, it should go on the inner side, preceding the person into the corner to check whether it is safe. This may slightly disturb the person's walking; however, safety has a higher priority than comfort.
5. To attend a person who is fine to walk, the robot has to be able to move at about 1.4 m/s.

Note that the values in the rules, such as the distance to the person to be attended, should be adjusted depending on the robot configuration (e.g. size and shape). However, we believe that the rules are a good initial guide for designing a comfortable attendant robot which is socially acceptable.
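As an illustration only, the rules above can be collected into a small decision sketch. The numeric constants (0.6 m, 1.4 m/s) come from the rules themselves; every name, interface, and the speed-dependent lead policy below are hypothetical, not from the paper:

```python
SIDE_OFFSET_M = 0.6     # rules 1-2: constant lateral distance to the person
ATTEND_SPEED_MPS = 1.4  # rule 5: top speed needed for a freely walking person

def choose_corner_side(person_turn):
    """Rules 3-4: prefer the outer side of a corner, i.e. the side opposite
    the person's turn ('left' or 'right'). A real planner would fall back to
    the inner side, preceding the person, when the outer side is blocked."""
    return "right" if person_turn == "left" else "left"

def target_offset(person_speed_mps, max_lead_m=0.5):
    """Rules 1-2: hold the lateral offset constant and let the front-back
    offset shrink as the person speeds up (leading more when guiding a
    slow walker) -- a hypothetical policy for illustration."""
    lead = max(0.0, max_lead_m * (1.0 - person_speed_mps / ATTEND_SPEED_MPS))
    return (lead, SIDE_OFFSET_M)  # (forward, lateral) offsets in meters
```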
Conclusions and discussion

This article has described a portable people behavior measurement system using a 3-D LIDAR. The proposed system enables long-term and wide-area behavior measurement. The system first creates a 3-D map of the environment using the graph SLAM approach in advance of the measurements. Then, it estimates its pose and detects and tracks people simultaneously. The tracking accuracy of the system is comparable to that of a static sensor-based people tracking system. As a field test, we demonstrated the effectiveness of the proposed system in measuring the behavior of professional caregivers attending elderly persons. Based on the analysis of the measured behavior, empirical rules for designing the behavior of attendant robots are proposed. The measurement system and the professional caregivers' behavior data set have been made public so that they can be used for the measurement and analysis of people attendant behavior.

The current system requires a human observer who carries the backpack with the 3-D LIDAR, so manual effort to observe people is necessary.
The human observer could be replaced with a mobile robot so that a large attendant behavior data set is created automatically for improving the robot attendant behavior.

Acknowledgement

The authors would like to thank O. Kohashi, S. Yamamoto, and T. Gomyo for allowing us to conduct the field test in Sawarabikai Fukushimura hospital and for their excellent cooperation during the test.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is in part supported by JSPS Kakenhi No. 25280093 and the Leading Graduate School Program R03 of MEXT.

ORCID iD

Kenji Koide https://orcid.org/0000-0001-5361-1428

International Journal of Advanced Robotic Systems, SAGE. Published: Apr 21, 2019.