A Deep Learning Framework for Assessing Physical Rehabilitation Exercises

Yalin Liao, Aleksandar Vakanski, Member, IEEE, and Min Xian, Member, IEEE

Abstract: Computer-aided assessment of physical rehabilitation entails evaluation of patient performance in completing prescribed rehabilitation exercises, based on processing movement data captured with a sensory system. Despite the essential role of rehabilitation assessment toward improved patient outcomes and reduced healthcare costs, existing approaches lack versatility, robustness, and practical relevance. In this paper, we propose a deep learning-based framework for automated assessment of the quality of physical rehabilitation exercises. The main components of the framework are metrics for quantifying movement performance, scoring functions for mapping the performance metrics into numerical scores of movement quality, and deep neural network models for generating quality scores of input movements via supervised learning. The proposed performance metric is defined based on the log-likelihood of a Gaussian mixture model, and encodes a low-dimensional data representation obtained with a deep autoencoder network. The proposed deep spatio-temporal neural network arranges data into temporal pyramids, and exploits the spatial characteristics of human movements by using sub-networks to process joint displacements of individual body parts. The presented framework is validated using a dataset of ten rehabilitation exercises. The significance of this work is that it is the first that implements deep neural networks for assessment of rehabilitation performance.

Index Terms: movement modeling, deep learning, performance metrics, physical rehabilitation

Manuscript submitted June 14, 2019. This work was supported by the Center for Modeling Complex Interactions (CMCI) at the University of Idaho through NIH Award #P20GM104420. Yalin Liao, Aleksandar Vakanski, and Min Xian are with the Department of Computer Science, University of Idaho, 1776 Science Center Drive, Idaho Falls, ID, 83402, USA (e-mail: liao4728@vandals.uidaho.edu; vakanski@uidaho.edu; mxian@uidaho.edu).

I. INTRODUCTION

Participation in physical therapy and rehabilitation programs is often compulsory and critical in postoperative recovery or for treatment of a wide array of musculoskeletal conditions. However, it is infeasible and economically unjustified to offer patient access to a clinician for every single rehabilitation session [1]. Accordingly, current healthcare systems around the world are organized such that an initial portion of rehabilitation programs is performed in an inpatient facility under direct supervision by a clinician, followed by a second portion performed in an outpatient setting, where patients perform a set of prescribed exercises in their own residence. Reports in the literature indicate that more than 90% of all rehabilitation sessions are performed in a home-based setting [2]. Under these circumstances, patients are tasked to record their daily progress and periodically visit the clinic for functional assessment. Still, numerous medical sources report low levels of patient adherence to the recommended exercise regimens in home-based rehabilitation, leading to prolonged treatment times and increased healthcare cost [3], [4]. Although many different factors have been identified that contribute to the low compliance rates, the major impact factor is the absence of continuous feedback and oversight of patient exercises by a healthcare professional [5]. Despite the development of a variety of tools and devices in support of physical rehabilitation, such as robotic assistive systems [6], virtual reality and gaming interfaces [7], and Kinect-based assistants [2], there is still a lack of versatile and robust systems for automatic monitoring and assessment of patient performance.

The article proposes a novel framework for assessment of home-based rehabilitation that encompasses formulation of metrics for quantifying movement performance, scoring functions for mapping the performance metrics into numerical scores of movement quality, and deep learning-based end-to-end models for encoding the relationship between movement data and quality scores. The employed performance metric is based on probabilistic modeling of the skeletal joints data with a Gaussian mixture model, and consequently, it employs the log-likelihood of the model for performance evaluation [8]. Next, the article investigates the effectiveness of deep autoencoder neural networks for dimensionality reduction of captured data. Further, we propose a scoring function for scaling the values of the performance metric into movement quality scores in the [0, 1] range. The resulting scores are employed as the ground truth for training the proposed deep neural networks (NNs) for rehabilitation applications.

The paper introduces a deep NN model designed to handle spatial and temporal variability in human movements. The proposed network structure was motivated by prior work on temporal pyramids [9] and hierarchical recurrent networks for motion classification [10]. Specifically, the proposed model aims to exploit spatial characteristics of human movements by hierarchical processing of the joint displacements of different body parts via a series of sub-networks that gradually merge the extracted feature vectors. Temporal pyramids are introduced using movement sequences at different time scales in order to learn data representations at multiple levels of abstraction. The network contains both convolutional layers for learning spatial dependencies and recurrent layers for encoding temporal correlations in movement data. The framework is validated on the University of Idaho – Physical Rehabilitation Movement Dataset (UI-PRMD) [11]. To the best of our knowledge, this is the first framework that employs deep NNs for assessment of rehabilitation exercises.

The main contributions of the paper are: (1) A novel framework for computer-aided assessment of rehabilitation exercises; (2) A deep spatio-temporal NN model for outputting movement quality scores; and (3) A performance metric that employs probabilistic modeling and autoencoder NNs for dimensionality reduction of rehabilitation data.

The article is organized as follows. The next section provides an overview of related work. Section III first introduces the mathematical notation and afterward describes the components of the proposed framework for rehabilitation assessment, including dimensionality reduction, performance metric, scoring function, and deep learning model. The validation of the proposed framework on a dataset of rehabilitation exercises is presented in Section IV. The last two sections discuss the results and summarize the paper.

II. RELATED WORK

A. Human Movement Modeling

Conventional approaches for mathematical modeling and representation of human movements are broadly classified into two categories: top-down approaches that introduce latent states for describing the temporal dynamics of the movements, and bottom-up approaches that employ local features for representing the movements. Commonly used methods in the first category include Kalman filters [12], hidden Markov models [13], and Gaussian mixture models [14]. The main shortcomings of these methods originate from employing linear models for the transitions among the latent states (as in Kalman filters), or from adopting simple internal structures of the latent states (typical for hidden Markov models). The approaches based on extracting local features employ predefined criteria for identifying key points [15] and/or required body postures [16], [17], or a collection of statistics of the movements (e.g., mean, standard deviation, mode, median) [18]. Such local features are typically motion-specific, which limits the ability to efficiently handle arbitrary spatio-temporal variations within movement data.

Recent developments in artificial NNs stirred significant interest in their application for modeling and analysis of human motions. Numerous works employed NNs for motion classification and applied the trained models for activity recognition, gait identification, gesture recognition, action localization, and related applications. NN-based motion classifiers utilizing different computational units have been proposed, including convolutional units [19], long short-term memory (LSTM) recurrent units [20], gated recurrent units, and combinations [21] or modifications of these computational units [22]. Also, NNs with different layer structures have been implemented, such as encoder-decoder networks, spatio-temporal graphs [23], and attention mechanism models [24]. Besides the task of classification, a body of work in the literature focused on modeling and representation of human movements for prediction of future motion patterns [25], synthesis of movement sequences [26], and density estimation [8]. Conversely, little research has been conducted on the application of NNs for evaluation of movement quality, which can otherwise find use in various applications (physical rehabilitation being one of them).

B. Movement Assessment

Quantifying the level of correctness in completing prescribed exercises is important for the development of tools and devices in support of home-based rehabilitation. The movement assessment in existing studies is typically accomplished by comparing a patient's performance of an exercise to the desired performance by healthy participants.

Several studies in the literature on exercise evaluation employed machine learning methods to classify the individual repetitions into correct or incorrect classes of movements. Methods used for this purpose include the Adaboost classifier, k-nearest neighbors, Bayesian classifier, and an ensemble of multi-layer perceptron NNs [27]–[29]. The outputs in these approaches are discrete class values of 0 or 1 (i.e., incorrect or correct repetition). However, these methods do not provide the capacity to detect varying levels of movement quality or identify incremental changes in patient performance over the duration of the rehabilitation program.

The majority of related studies employ distance functions for deriving movement quality scores. Concretely, Houmanfar et al. [18] used a variant of the Mahalanobis distance to quantify the level of correctness of rehabilitation movements, based on a calculated distance between patient-performed repetitions and a set of repetitions performed by a group of healthy individuals. Similarly, a body of work utilized the dynamic time warping (DTW) algorithm [30] for calculating the distance between a patient's performance and healthy subjects' performance [31]–[33]. The advantage of the distance functions is that they are not exercise-specific, and thus can be applied for assessment of new types of exercises. The distance functions also have shortcomings, because they do not attempt to derive a model of the rehabilitation data, and the distances are calculated at the level of individual time-steps in the raw sensory measurements.

A line of research utilized probabilistic approaches for modeling and evaluation of rehabilitation movements. Studies based on hidden Markov models [34], [35] and mixtures of Gaussian distributions [8] typically perform a quality assessment based on the likelihood that the individual sequences are drawn from a trained model. Although the utilization of probabilistic models is advantageous in handling the variability due to the stochastic character of human movements, models with abilities for a hierarchical data representation can produce more reliable outcomes for movement quality assessment, and better generalize to new exercises.

III. PROPOSED METHOD

A block diagram of the envisioned framework for assessing rehabilitation exercises is depicted in Fig. 1. The skeletal joint coordinates acquired by the sensory system are processed via dimensionality reduction, performance quantification, and scoring mapping to obtain movement quality scores that are subsequently used for training an NN model. The trained NN model is afterward used to automatically generate movement quality scores for input movement data acquired by the sensory system.

Fig. 1. Overview of the proposed framework for assessment of rehabilitation exercises.

A. Notation

In outpatient physical rehabilitation, a daily rehabilitation session requires completing a series of exercises, where the patient is instructed to complete a certain number of repetitions of each exercise during each session. The data acquired by the sensory system for one particular exercise performed by $S$ healthy subjects is denoted by $\mathbb{X}$, and hereafter they are referred to as reference movements. The symbol $R_s$ is used for the number of repetitions of the exercise by the $s$-th subject. The combined data for all $R_s$ repetitions of the exercise by the $s$-th subject is denoted $\Delta_s$. Similarly, $R$ is used for the total number of all repetitions by the $S$ subjects, i.e., $R = \sum_{s=1}^{S} R_s$. Using the notation $\mathbf{X}_{s,r}$ for the collected data of the $r$-th repetition by the $s$-th subject, we have $\mathbb{X} = \{\Delta_s\}$ for $s \in \underline{S}$, where $\Delta_s = \{\mathbf{X}_{s,r}\}$ for $r \in \underline{R_s}$. For convenience, throughout the text the underline symbol denotes a set of indices, e.g., $\underline{S} = \{1, 2, \ldots, S\}$ for any positive integer $S$. The data for each repetition $\mathbf{X}_{s,r}$ is a temporal sequence of $T$ measurements, therefore $\mathbf{X}_{s,r} = (\mathbf{x}_{s,r}^{(1)}, \mathbf{x}_{s,r}^{(2)}, \ldots, \mathbf{x}_{s,r}^{(T)})$, where the superscripts are used for indexing the temporal order of the joint displacement vectors within the repetition. Furthermore, the individual measurement $\mathbf{x}_{s,r}^{(t)}$ for $t \in \underline{T}$ is a $D$-dimensional vector, consisting of the values for all joint displacements in the human body, i.e., $\mathbf{x}_{s,r}^{(t)} = [x_{s,r}^{(t,1)}, x_{s,r}^{(t,2)}, \ldots, x_{s,r}^{(t,D)}]$.

The collected data for the patient group are referred to in the article as patient movements, and are denoted with the symbol $\mathbb{Y}$. By analogy to the introduced notation for the reference movements, $\mathbb{Y} = \{\mathbf{Y}_{s,r}\}$, where $\mathbf{Y}_{s,r}$ is the data of the $r$-th repetition by the $s$-th subject. Analogously, the repetition $\mathbf{Y}_{s,r} = (\mathbf{y}_{s,r}^{(1)}, \mathbf{y}_{s,r}^{(2)}, \ldots, \mathbf{y}_{s,r}^{(T)})$ is comprised of a sequence of multidimensional vectors $\mathbf{y}_{s,r}^{(t)} = [y_{s,r}^{(t,1)}, y_{s,r}^{(t,2)}, \ldots, y_{s,r}^{(t,D)}]$.

B. Dimensionality Reduction

The sensory systems for motion capturing typically track between 15 and 40 skeletal joints, depending on the sensor type. The measurement data consists of 3-dimensional spatial positions and/or orientations for each joint, and therefore the dimensionality of the data ranges between 45 and 120. Dimensionality reduction of recorded data is an essential step in processing human movements to suppress unimportant, redundant, or highly correlated dimensions. The aim is to project the data $\mathbb{X} = \{\mathbf{X}_{s,r} = (\mathbf{x}_{s,r}^{(t)}) : \mathbf{x}_{s,r}^{(t)} \in \mathbb{R}^{D}\}$ into a lower-dimensional representation $\tilde{\mathbb{X}} = \{\tilde{\mathbf{X}}_{s,r} = (\tilde{\mathbf{x}}_{s,r}^{(t)}) : \tilde{\mathbf{x}}_{s,r}^{(t)} \in \mathbb{R}^{M}\}$, for $t \in \underline{T}$, $s \in \underline{S}$, $r \in \underline{R_s}$, where $M < D$.

A common approach for dimensionality reduction of human movement data is maximum variance [36], which simply retains the first $M$ dimensions with the largest variance and discards the remaining dimensions. Principal component analysis (PCA) and its variants [37] are also widely used for reducing the dimensionality of movement data, where a matrix containing the leading $M$ eigenvectors corresponding to the largest eigenvalues of the covariance matrix $\mathbf{V}$ is used for projecting the data into a lower-dimensional space. Although PCA is one of the most common approaches for dimensionality reduction in general, it employs a linear mapping of high-dimensional data into a lower-dimensional representation. Likewise, the shortcomings of maximum variance originate from its simplicity.

In the proposed framework, we introduce autoencoder NNs [38] for dimensionality reduction. An autoencoder NN is a nonlinear technique for dimensionality reduction that can extract richer data representations than linear techniques (such as PCA). Furthermore, deep autoencoder NNs, created by stacking multiple consecutive layers of hidden neurons, can additionally increase the representational capacity of the network.

Autoencoders are used for unsupervised learning of an alternative representation of input data, through a process of data compression and reconstruction. The data processing involves an encoding step of compressing input data through one or multiple hidden layers, followed by a decoding step of reconstructing the output from the encoded representation through one or multiple hidden layers. If $\mathcal{A}$ denotes a class of mapping functions from $\mathbb{R}^{M}$ to $\mathbb{R}^{D}$, and $\mathcal{B}$ is a class of mapping functions from $\mathbb{R}^{D}$ to $\mathbb{R}^{M}$, then for any function $A \in \mathcal{A}$ and $B \in \mathcal{B}$, the encoder portion projects an input $\mathbf{x}_{s,r}^{(t)} \in \mathbb{R}^{D}$ into a lower-dimensional representation $\tilde{\mathbf{x}}_{s,r}^{(t)} = B(\mathbf{x}_{s,r}^{(t)}) \in \mathbb{R}^{M}$ (referred to as a code), and the decoder portion converts the code into an output $A(B(\mathbf{x}_{s,r}^{(t)})) \in \mathbb{R}^{D}$. Autoencoders are trained to find functions $A \in \mathcal{A}$ and $B \in \mathcal{B}$ which minimize the mean squared deviation between the input data and output data, i.e.,

$$\underset{A,B}{\operatorname{argmin}} \left\| A\big(B(\mathbf{x}_{s,r}^{(t)})\big) - \mathbf{x}_{s,r}^{(t)} \right\|. \tag{1}$$

A graphical representation of the adopted architecture for the autoencoder network is presented in Fig. 2. The encoder portion consists of three intermediate layers of LSTM recurrent units with 30, 10, and 4 computational units, and the corresponding decoder portion has three intermediate layers of LSTM units with 10, 30, and 117 computational units, respectively. The input time-series data are 117-dimensional vectors of joint coordinates. The code representation of the proposed network is a temporal sequence of 4-dimensional vectors.

Fig. 2. The proposed autoencoder architecture projects input movement data into a code representation, and re-projects the code into the movement data.

C. Performance Metric

The metrics for quantifying the patient performance are classified into model-less and model-based groups of metrics [39]. The model-less metrics employ distance functions, such as Euclidean, Mahalanobis distance, and dynamic time warping (DTW) [30] deviation between data sequences. The model-based metrics apply probabilistic approaches for modeling the movement data, and employ the log-likelihood for performance evaluation [8].

We adopt a metric based on Gaussian mixture model (GMM) log-likelihood. The choice stems from the demonstrated capacity of statistical methods to encode the inherent random variability in human movements; this results in improved ability by the model-based metrics to handle spatio-temporal variations in rehabilitation data. The log-likelihood of movement data for a given model is a natural choice for evaluation of data instances in probabilistic models.

GMM is a parametric probabilistic model for representing data with a mixture of Gaussian probability density functions [40]. GMM is frequently used for modeling human movements. For a dataset consisting of multidimensional vectors $\mathbf{x}_{s,r}^{(t)}$, a GMM with $C$ Gaussian components has the form

$$\mathcal{P}(\mathbf{x}_{s,r}^{(t)} \mid \lambda) = \sum_{c=1}^{C} \pi_c \, \mathcal{N}(\mathbf{x}_{s,r}^{(t)} \mid \mu_c, \Sigma_c), \tag{2}$$

where $\lambda = \{\pi_c, \mu_c, \Sigma_c\}$ are the mixing coefficient, mean, and covariance of the $c$-th Gaussian component, respectively. The most popular method for estimating the model parameters $\lambda$ in GMM is the expectation maximization (EM) algorithm [41]; other approaches include maximum-a-posteriori estimation [42] and mixture density networks [40]. Subsequently, for a GMM model with parameters $\lambda$, the negative log-likelihood is used as a performance metric, and for the repetition $\mathbf{Y}_{s,r}$ it is calculated as

$$\mathcal{P}(\mathbf{Y}_{s,r} \mid \lambda) = -\sum_{t=1}^{T} \log\left\{ \sum_{c=1}^{C} \pi_c \, \mathcal{N}(\mathbf{y}_{s,r}^{(t)} \mid \mu_c, \Sigma_c) \right\}. \tag{3}$$

D. Scoring Function

In the presented framework, a scoring function maps the values of the performance metrics into a movement quality score in the range between 0 and 1. The resulting movement quality scores play a dual role in the framework. First, in a real-world exercise assessment setting, the quality scores allow for intuitive understanding of the calculated values of the used performance metric. For instance, a movement quality score of 88% presented to a patient is easy to understand, and it can also enable the patient to self-monitor his/her progress toward functional recovery based on received scores over a period of time. Second, the movement quality scores are used here for supervised training of the deep NN models.

For a sequence of performance metric values of the reference movements $\mathbf{x} = (x_1, x_2, \ldots, x_L)$ and a sequence $\mathbf{y} = (y_1, y_2, \ldots, y_L)$ related to the patient movements, we propose the following scoring function:

$$\bar{x}_k = \left(1 + e^{\frac{x_k}{\mu + 3\delta} - \alpha_1}\right)^{-1}; \tag{4}$$

$$\bar{y}_k = \left(1 + e^{\frac{x_k}{\mu + 3\delta} - \alpha_1 + \frac{y_k - x_k}{\alpha_2(\mu + 3\delta)}}\right)^{-1}, \tag{5}$$

where $k \in \underline{L}$, $\mu = \frac{1}{L}\sum_{k=1}^{L} |x_k|$, $\delta = \sqrt{\frac{1}{L}\sum_{k=1}^{L} (|x_k| - \mu)^2}$, and $\alpha_1$, $\alpha_2$ are data-specific parameters. The proposed scoring function is monotonically decreasing, and is designed to preserve the distribution of the values of the performance metric. The values for the reference movements $x_k$ are scaled by $\mu + 3\delta$ in (4) to ensure that the resulting scores $\bar{x}_k$ have values close to 1 for inputs $x_k$ in the range $(\mu - 3\delta, \mu + 3\delta)$. Similarly, for the patient movements $y_k$ the scoring function in (5) is designed to preserve their distribution in mapping the performance metric values into movement quality scores.

E. Deep Learning Architecture for Rehabilitation Assessment

We propose a novel deep learning model for spatio-temporal modeling of skeletal data, for application in rehabilitation assessment. A graphical representation of the NN architecture is provided in Fig. 3. The NN model is designed to exploit the spatial characteristics of human movements by dedicating sub-networks for processing joint displacements of individual body parts. In addition, the input data is arranged into temporal pyramids for processing multiple scaled versions of the movement repetitions. The initial hierarchical layers in the model employ strided one-dimensional convolutional filters for learning spatial dependencies in human movements, and are followed by a series of LSTM recurrent layers for modeling temporal correlations in learned representations.
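As a rough illustration of the temporal pyramid idea, the sketch below builds multi-scale copies of one repetition by keeping every 2nd, 4th, and 8th frame, so the sub-sampled lengths are about one half, one quarter, and one eighth of the full sequence. Uniform striding, the function name `temporal_pyramid`, and the toy data are assumptions made for illustration; the text does not specify how the sub-sampling is performed.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def temporal_pyramid(
    repetition: Sequence[Frame],
    strides: Tuple[int, ...] = (1, 2, 4, 8),
) -> List[List[Frame]]:
    """Return one sub-sequence per stride, keeping every stride-th frame.

    Assumption: sub-sampling is plain uniform striding. A stride of 1 is
    the full-scale sequence; strides 2, 4, 8 give roughly 1/2, 1/4, 1/8
    of the original temporal length.
    """
    if len(repetition) == 0:
        raise ValueError("repetition must contain at least one frame")
    if any(s < 1 for s in strides):
        raise ValueError("strides must be positive integers")
    return [list(repetition[::s]) for s in strides]


if __name__ == "__main__":
    # Toy repetition: 16 frames with 3 values each (real frames are 117-D).
    toy = [[float(t), 0.0, 0.0] for t in range(16)]
    levels = temporal_pyramid(toy)
    print([len(level) for level in levels])  # [16, 8, 4, 2]
```

Each pyramid level would then be fed to its own sub-network, with the resulting feature vectors concatenated downstream.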
Fig. 3. (a) The proposed spatio-temporal model architecture. (b) Temporal pyramid sub-network. (c) Multi-branch convolutional block.

The architecture of the NN draws inspiration from the hierarchical model proposed by Du et al. [10] that employs five recurrent sub-networks taking as inputs joint displacements of the left arm, right arm, left leg, right leg, and torso, respectively. The outputs from the five sub-networks are merged into a single representation. Such hierarchical organization of the network layers allows low-level spatial information from joint movements to be exploited for obtaining a high-level representation of the body parts' movements in accomplishing required actions. Unlike the model proposed by Du et al. [10], which consists of bidirectional layers with LSTM recurrent units, our proposed network uses convolutional units in the hierarchical layers and recurrent units in the succeeding layers. The presented ablation study and performance comparison in Section IV corroborate the advantage of the introduced modifications in our spatio-temporal model.

Similarly, the introduction of temporal pyramids for processing rehabilitation data was motivated by the concept of image pyramids in computer vision. Temporal pyramids have been used for processing video data by dynamically subsampling input videos at varying frame rates [43], temporal pooling of multi-scale data representations from extracted feature maps [44], or by applying sliding windows with changeable scales to the sequences of images [45]. In these works, the use of multi-scale video pyramids has been conducive to improving the detection and localization of human actions in videos.

In the proposed network, the temporal pyramids are composed of full-scale input sequences, and three sub-sampled versions with a temporal length equal to one half, one quarter, and one eighth of the sequence (see Fig. 3(b)). The resulting feature vectors are then concatenated and passed to the next layers. Such data processing enables recognizing movement patterns at different levels of abstraction, and led to improved performance of the deep model for movement assessment.

Inputs to the network are 117-dimensional sequences of the full-body joint angles corresponding to single repetitions of an exercise. The convolutional blocks consist of two convolutional layers followed by dropout layers with a rate of 0.25. For these layers we adopted the multi-branch design approach shown in Fig. 3(c), popularized in the inception convolutional network architectures [46]. Each layer contains three branches of 1D convolutional filters with different lengths, whose outputs are concatenated and passed to the next layer. The use of multiple branches allows the model to select the most suitable filter length based on the input data. The recurrent portion of the model consists of four layers with 80, 40, 40, and 80 LSTM units, respectively. The last layer has linear activations, and outputs a numerical movement quality score for an input repetition. Mean squared error was used as a cost function for training the model parameters, with the Adam optimizer. A batch size of 5 repetitions was applied, with early stopping regularization.

One can note that the proposed model is not particularly deep, as it comprises a relatively low number of hidden layers; however, considering that the used dataset is also of relatively small size, larger and deeper networks would overfit and produce suboptimal results.

IV. EXPERIMENTAL RESULTS

A. Dataset

For validation of the presented framework, we created the UI-PRMD dataset [11]. The dataset consists of skeletal data collected from 10 healthy subjects. Each subject completed 10 repetitions of 10 rehabilitation exercises, listed in Table I. The data were acquired with a Vicon optical tracking system, and consist of 117-dimensional sequences of angular joint displacements. The subjects performed the exercises both in a correct manner, hereafter referred to as correct movements, and in an incorrect manner, i.e., simulating performance by patients with musculoskeletal constraints, hereafter referred to as incorrect movements. The research study related to the data collection was approved by the Institutional Review Board at the University of Idaho under the identification code IRB 16-124. A written informed consent for participation in a research study was approved by the board, and was obtained from all participants in the study. A detailed description of the UI-PRMD dataset is provided in [11].

TABLE I
EXERCISES IN THE UI-PRMD DATASET

| Order | Exercise |
|---|---|
| E1 | Deep squat |
| E2 | Hurdle step |
| E3 | Inline lunge |
| E4 | Side lunge |
| E5 | Sit to stand |
| E6 | Standing active straight leg raise |
| E7 | Standing shoulder abduction |
| E8 | Standing shoulder extension |
| E9 | Standing shoulder internal–external rotation |
| E10 | Standing shoulder scaption |

B. Performance Quantification

In this section, the adopted performance metric based on the log-likelihood of GMM is evaluated on the UI-PRMD dataset. For comparison, three common performance metrics for assessment of rehabilitation exercises based on Euclidean, Mahalanobis, and DTW distance are also evaluated.

Data scaling: To compare the performance metrics on the same basis, their values are first linearly scaled to the same range. In this study the range [1, 20] was selected based on an empirical understanding of the data. For the obtained values of the performance metrics related to repetitions of the correct movements, denoted $\mathbf{x} = (x_1, x_2, \ldots, x_L)$, and for the metrics of the incorrect movements $\mathbf{y} = (y_1, y_2, \ldots, y_L)$, the following scaling functions were used

$$x_i' = \frac{19(x_i - m)}{M - m} + 1; \quad y_i' = \frac{19(y_i - m)}{M - m} + 1 \quad \text{for } i \in \underline{L}, \tag{6}$$

where $x_i', y_i' \in [1, 20]$ denote the scaled values of the performance metrics, $M = \max_{i,j \in \underline{L}}\{x_i, y_j\}$, and $m = \min_{i,j \in \underline{L}}\{x_i, y_j\}$.

The scaled values of the Euclidean distance for exercises E1 and E2 are shown in Fig. 4. Green circle markers are used for the repetitions of the correct movements, whereas the red squares symbolize the repetitions of the incorrect movements. Note that inconsistent data (associated with measurement errors or subjects performing the exercise with their left arm/leg in a set of mostly right arm/leg exercises) were manually removed from the original dataset, resulting in fewer than 100 repetitions per subject. E.g., there are 90 correct and incorrect movements for E1 in Fig. 4(a), and 55 correct and incorrect movements for E2 in Fig. 4(b).

Separation degree: For comparison of the scaled values of the performance metrics we propose the concept of separation degree. Specifically, for any positive real numbers $x$, $y$, their separation degree is defined as $S_D(x, y) = \frac{x - y}{x + y} \in [-1, 1]$. The separation degree between two positive sequences $\mathbf{x} = (x_1, x_2, \ldots, x_m)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ is defined by

$$S_D(\mathbf{x}, \mathbf{y}) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} S_D(x_i, y_j). \tag{7}$$

Values of the separation degree close to 1 or −1 indicate good separation between the two sequences. Conversely, for values of the separation degree close to 0, the sequences do not separate well and they are almost mixed together.

When applied to the values of the distance metrics, a larger separation degree indicates a greater ability of the used metric to differentiate between correct and incorrect repetitions of an exercise. For instance, in Fig. 4(b) one can observe a clearer differentiation between the correct and incorrect movements, in comparison to Fig. 4(a). This results in a larger value of the separation degree for the repetitions of exercise E2: the calculated values are 0.384 for E1 and 0.497 for E2.

Fig. 4. Scaled values of the Euclidean distance for the between-subject case for: (a) first exercise E1 ($S_D$ = 0.384); (b) second exercise E2 ($S_D$ = 0.497).

The values of the separation degree for the four studied performance metrics are presented in Table II. Each cell in the table corresponds to the average separation degree $S_D$ over the 10 exercises in the dataset. The shown values are the means, with the standard deviations in parentheses. For the comparison, scaled values of the metrics according to (6) are used. Values for both between-subject and within-subject cases are presented. Table II also compares the values of the metrics for the cases of raw 117-dimensional data, and low-dimensional data obtained with the methods of maximum variance, PCA, and autoencoder NNs. The largest values for the separation degree are indicated in each row with a bold font.

In conclusion, the GMM log-likelihood metric applied on low-dimensional data obtained with the autoencoder NN resulted in the largest separation between the correct and incorrect movements for both between- and within-subject cases. The within-subject case provides improved separation because the repetitions performed by the same subject are characterized by a lower level of variability. The value of the GMM log-likelihood is not provided for the 117-dimensional data because GMM is commonly applied on low-dimensional data. Furthermore, the performance of the Euclidean and DTW distances in Table II is comparable, and better than the Mahalanobis distance. Also, one can notice that the autoencoder NN lost less information in compressing the high-dimensional data sequences in comparison to maximum variance and PCA, because the separation degree values for all metrics using autoencoders are very close to the corresponding metric values of the 117-dimensional data without dimensionality reduction. In implementing GMM on the dataset, the number of Gaussian components $C$ was set to 6.

TABLE II
SEPARATION DEGREE FOR THE PERFORMANCE METRICS: MEAN (ST. DEV.)

| Metric | Euclidean distance | Mahalanobis distance | DTW distance | GMM log-likelihood |
|---|---|---|---|---|
| Between-subject | | | | |
| D=117 | 0.445 (0.087) | 0.195 (0.152) | **0.487 (0.063)** | -- |
| D=3 (MV) | 0.309 (0.101) | 0.063 (0.130) | 0.310 (0.100) | **0.344 (0.049)** |
| D=3 (PCA) | 0.296 (0.103) | 0.108 (0.169) | 0.265 (0.093) | **0.360 (0.060)** |
| D=4 (AE) | 0.423 (0.092) | 0.229 (0.102) | 0.427 (0.094) | **0.515 (0.106)** |
| Within-subject | | | | |
| D=117 | 0.568 (0.058) | 0.441 (0.118) | **0.570 (0.059)** | -- |
| D=3 (MV) | **0.472 (0.048)** | 0.325 (0.118) | 0.455 (0.053) | 0.471 (0.098) |
| D=3 (PCA) | 0.508 (0.032) | 0.322 (0.169) | 0.501 (0.031) | **0.518 (0.057)** |
| D=4 (AE) | 0.582 (0.057) | 0.474 (0.133) | 0.574 (0.060) | **0.603 (0.073)** |

D: data dimensions; MV: maximum variance; PCA: principal component analysis; AE: autoencoder neural networks

C. Neural Networks Performance

For training the deep neural networks, the movement quality scores based on the GMM log-likelihood calculated with autoencoder-reduced data are employed. Only the between-subject case is considered, since for the within-subject case the number of repetitions per subject is too small to train NNs.

Scoring function: The scoring function presented in (4)-(5) is used to calculate the movement quality scores. The values of the parameters are empirically selected as $\alpha_1 = 3.2$ and $\alpha_2 = 10$. For example, Fig. 5 depicts the values of the log-likelihood and the corresponding performance scores for exercise E1 (i.e., deep squat). The scores for the correct movements shown in Fig. 5(b) have values close to 1, whereas most of the scores for the incorrect movements are in the range between 0.7 and 0.9.

Fig. 5. (a) GMM log-likelihood values for exercise E1; (b) Corresponding quality scores.

NN evaluation: The model was implemented on a Dell [...] manner, where the output is a predicted value of the movement quality for an input repetition. For each of the 10 exercises in the UI-PRMD dataset, a separate NN is trained and used for quality assessment. Each network model is run five times, and we report the average absolute deviation between the ground truth quality scores and the network prediction.

To evaluate the respective contributions of the individual components in the design of our spatio-temporal model we conducted an ablation study. The results for the 10 exercises in the dataset are displayed in Table III. Lower values of the absolute deviation indicate lower errors by the NN model in predicting the quality score for input data. The upper row in the table presents the aggregated mean deviation for all exercises E1 to E10. The results of the ablation study support our intuitive assumptions that the introduced components in the proposed model related to the multi-branch layers, temporal pyramids, hierarchical structure, and combination of convolutional and recurrent units all contribute to improved assessment of rehabilitation exercises.

TABLE III
ABLATION STUDY: AVERAGE ABSOLUTE DEVIATION PER EXERCISE

| Exercise | Our Approach | Without Branching Layers | Without Temporal Pyramids | Without Hierarch. Layers | Without Recurrent Layers |
|---|---|---|---|---|---|
| E1-E10 | 0.02527 | 0.02537 | 0.02594 | 0.02953 | 0.04729 |
| E1 | 0.01077 | 0.01213 | 0.01162 | 0.01222 | 0.03631 |
| E2 | 0.02824 | 0.02415 | 0.02785 | 0.03522 | 0.04322 |
| E3 | 0.03980 | 0.04232 | 0.04286 | 0.05350 | 0.07876 |
| E4 | 0.01185 | 0.01495 | 0.01226 | 0.01048 | 0.03654 |
| E5 | 0.01870 | 0.01758 | 0.01569 | 0.01719 | 0.03716 |
| E6 | 0.01779 | 0.02110 | 0.01930 | 0.01858 | 0.04104 |
| E7 | 0.03819 | 0.03907 | 0.04241 | 0.04016 | 0.05699 |
| E8 | 0.02305 | 0.02369 | 0.02418 | 0.02658 | 0.04589 |
| E9 | 0.02271 | 0.02284 | 0.02296 | 0.02738 | 0.04130 |
| E10 | 0.04162 | 0.03584 | 0.04027 | 0.05395 | 0.05565 |

We further compared the performance of the proposed NN to state-of-the-art deep learning models for movement classification. We are not aware of any other deep NN models for movement assessment. On the other hand, there is a large body of research on using deep learning models for classification/recognition/detection of human movements (in a general context, rather than for biomedical purposes).
Therefore, we adapted several recent NN classifiers that have Precision 5810 workstation with Intel Xeon CPU, 32 GB RAM, achieved top performance, and we re-purposed the models for 2 TB hard disk, and an NVIDIA Titan Xp GPU card. Inputs to regressing movement quality scores. The selected models are: the NNs are pairs of repetition data containing raw Co-occurrence [47], PA-LSTM [48], Two-stream CNN [49], 117-dimensional angular joint measurements and quality scores. The networks are trained in a supervised regression 8 Hierarchical LSTM [10], as well as two basic Deep CNN and closely follow the values of the input quality scores for all data Deep LSTM architectures. instances. We also validate the proposed approach using For these networks, we replaced the last softmax layer with a leave-one-out cross-validation (i.e., testing on one subject a fully-connected layer with linear activations. Furthermore, we model trained on all other subjects). The performance was omitted all batch normalization layers (if any were present) in comparable to the presented results using random test data, the original models, as they significantly degraded the capacity with the predicted quality scores closely following the ground for movement assessment. Other than that, we closely followed truth values. the proposed implementation as described by the authors in the respective papers. Hierarchical LSTM is the network proposed by Du et al. [10] that served as a motivation for our proposed deep learning model. We selected the architectures and hyperparameters of the basic Deep CNN and Deep LSTM models through an extensive grid-search; the resulting CNN (a) (b) network has three convolutional layers (60, 30, and 10 units) Fig. 6. (a) Predictions on the training set for exercise E1; (b) Predictions on the followed by two fully-connected layers (200 and 100 units), validation set for exercise E1. 
whereas the Deep LSTM network contains one LSTM layer (20 The proposed model was next evaluated on the KIMORE units), one fully-connected layer (30 units), and another LSTM dataset [50], which contains data for five rehabilitation layer (10 units). The values of the average absolute deviations exercises performed by 44 healthy subjects and 34 patients, and are presented in Table IV. With regards to the ability for collected with a Kinect v2 sensor. We implemented our movement quality assessment of all 10 exercises in the dataset, proposed deep learning model on the deep squat exercise. We our proposed model outperformed the other deep learning employed full-body joint orientations data for 33 healthy classification models, although some of the models provided subjects and 18 patients, and extracted 4 repetitions for each better performance on several of the exercises in the dataset subject, resulting in 204 repetitions in total. The KIMORE (shown with a bold font in the table). The computational times dataset provides clinical scores for each subject’s performance for training the models averaged over all exercises are shown in in the [0, 50] range. To train the model, we scaled the values in the last row in Table IV. The proposed spatio-temporal model is the [0, 1] range, and randomly selected 142 repetitions for computationally less expensive than almost all compared training, and 62 for validation. The results are displayed in models. The prediction of movement quality scores for input Figure 7, where the predicted movement quality scores by the repetitions by the trained model is very fast, and it took about deep learning model closely follow the ground truth scores 10 milliseconds per repetition on average. provided by the clinicians. 
The obtained mean absolute TABLE IV deviation was 0.03786, which is greater than the deviation for PERFORMANCE COMPARISON: AVERAGE ABSOLUTE DEVIATION PER EXERCISE the deep squat exercise in the UI-PRMD dataset, probably due Co-occu PA-LS Two-strea Hierar. Our Deep Deep to the lower accuracy of Kinect v2 compared to Vicon, and also Exercise rrence TM m CNN LSTM approach CNN LSTM [47] [48] [49] [10] in the KIMORE dataset the same clinical score is assigned for E1-E10 0.02527 0.02703 0.02615 0.04534 0.11044 0.08819 0.04059 all repetitions performed by the same subject. E1 0.01077 0.01052 0.01357 0.01839 0.28798 0.03010 0.01670 E2 0.02824 0.02905 0.02953 0.04413 0.22349 0.07742 0.04934 V. DISCUSSION E3 0.03980 0.05577 0.04141 0.08094 0.20493 0.13766 0.09382 E4 0.01185 0.01347 0.01640 0.02347 0.36033 0.03580 0.01609 The article introduces a novel framework for the assessment E5 0.01870 0.01687 0.01300 0.03156 0.12332 0.06367 0.02536 of rehabilitation exercises via deep NNs. The framework E6 0.01779 0.01886 0.02349 0.03426 0.21119 0.04676 0.02166 includes performance metrics, scoring functions, and NN E7 0.03819 0.02733 0.03346 0.04954 0.05016 0.19280 0.04090 models. Common metrics for quantifying the level of E8 0.02305 0.02464 0.02905 0.05070 0.04337 0.07260 0.04590 E9 0.02271 0.02720 0.02495 0.04313 0.14411 0.06508 0.04419 consistency in captured rehabilitation movements are E10 0.04162 0.04657 0.03667 0.07727 0.11044 0.16009 0.05198 compared. The metrics include Euclidean, Mahalanobis, DTW Training time (in seconds) distance, and GMM log-likelihood. The concept of separation 177 325 52 598 4,668 295 410 degree is proposed for metric comparison. GMM log-likelihood outperformed the model-less metrics on the The results of the proposed deep NN for assessment of UI-PRMD dataset. Such results confirm our hypothesis that exercise E1 are depicted in Fig. 6. 
The set of 90 correct and 90 efficient movement assessment is strongly predicated on the incorrect repetitions was randomly split using a ratio of 0.7/0.3 provision of models of human movements. Probabilistic into a training set of 124 and a validation set of 56 repetitions. approaches, such as the used GMM approach, have improved The ground truth scores and predicted scores for the training ability to handle the inherent variability and measurement and validation sets are shown in Figs. 6(a) and (b), respectively. uncertainty in human movement data, in comparison to the In the two sub-figures the first half of the scores are for the model-less approaches. correct sequences and have values close to one, and the second We compared the performance of PCA and maximum half of the scores pertain to the incorrect sequences and have variance approaches for dimensionality reduction of human lower quality scores. Conclusively, the network predictions movements to autoencoder NNs. Expectedly, the provision of 9 nonlinear functions for neuron activations in autoencoders of algorithms for dimensionality reduction, performance provides richer representational capacity of the data into a metrics, scoring functions, and deep learning models. The lower dimensional space, in comparison to the linear technique framework is evaluated on a dataset of 10 rehabilitation of PCA and the simple concept of maximum variance. exercises. The experimental results indicate that the quality scores generated by the proposed framework closely follow the ground truth quality scores for the movements. This work demonstrates the potential of deep learning models for assessment of rehabilitation exercises. Such models can consistently outperform the approaches that employ distance functions for movement assessment where the data (a) (b) processing is performed on low-level measurements of joint Fig. 7. 
(a) Predictions on the training set for deep squat exercise; (b) Predictions coordinates at the individual time-steps, and the probabilistic on the validation set for deep squat exercise. approaches where the data modeling is typically performed at a We propose a deep learning architecture for hierarchical single level of abstraction. The advantages of deep NNs for this spatio-temporal modeling of rehabilitation exercises at multiple task originate from the capacity for hierarchical modeling of levels of abstraction. NNs are trained for each exercise via human movements at multiple spatial and temporal levels of supervised regression, where for inputs comprising exercise abstraction. This type of models provide improved abilities to repetitions the inferred outputs are quality scores. The network “understand” the levels of hierarchy and the complex structure combines hierarchical merging of extracted feature spatiotemporal correlations in human movement data. vectors from different body parts, pyramidal processing of the movement sequences subsamples at multiple temporal scales, REFERENCES and multi-branch blocks for learning the structure of the used [1] S. R. Machlin, J. Chevan, W. W. Yu, and M. W. Zodet, “Determinants of computational units. Although recurrent units are most utilization and expenditures for episodes of ambulatory physical therapy commonly used for processing sequential time-series data as among adults,” Phys Ther, vol. 91, no. 7, pp. 1018–1029, Jul. 2011. [2] R. Komatireddy, A. Chokshi, J. Basnett, M. Casale, D. Goble, and T. the considered rehabilitation movements, our proposed model Shubert, “Quality and Quantity of Rehabilitation Exercises Delivered By employs convolutional filters in the initial layers and LSTM A 3-D Motion Controlled Camera: A Pilot Study,” Int J Phys Med recurrent units in the later layers of the network. The reasons Rehabil, vol. 2, no. 4, Aug. 2014. for such design stem from the following: (1) the employed [3] S. F. 
Bassett and H. Prapavessis, “Home-based physical therapy intervention with adherence-enhancing strategies versus clinic-based dataset is fairly small, consisting of less than 200 repetitions per management for patients with ankle sprains,” Phys Ther, vol. 87, no. 9, exercises, hence recurrent NNs can overfit the data due to the pp. 1132–1143, Sep. 2007. larger number of used parameters, and (2) a growing body of [4] K. Jack, S. M. McLean, J. K. Moffett, and E. Gardiner, “Barriers to work report of improved performance by CNNs on time-series treatment adherence in physiotherapy outpatient clinics: A systematic review,” Man Ther, vol. 15, no. 3–2, pp. 220–228, Jun. 2010. and movement data [51]. The proposed deep learning model [5] K. K. Miller, R. E. Porter, E. DeBaun-Sprague, M. Van Puymbroeck, and outperformed recent state-of-the-art deep NNs designed for A. A. Schmid, “Exercise after Stroke: Patient Adherence and Beliefs after movement classification. Discharge from Rehabilitation,” Top Stroke Rehabil, vol. 24, no. 2, pp. Our presented research has several limitations. The 142–148, 2017. [6] P. Maciejasz, J. Eschweiler, K. Gerlach-Hahn, A. Jansen-Troy, and S. validation is primarily performed on rehabilitation exercises Leonhardt, “A survey on robotic devices for upper limb rehabilitation,” performed by healthy subjects, where the measurements are Journal of NeuroEngineering and Rehabilitation, vol. 11, no. 1, p. 3, Jan. acquired with an expensive optical motion capturing system. Additionally, the largest segment of the validation is based on [7] L. V. Gauthier et al., “Video Game Rehabilitation for Outpatient Stroke (VIGoROUS): protocol for a multi-center comparative effectiveness trial movement data without a ground truth assessment of the of in-home gamified constraint-induced movement therapy for movement quality by clinicians. 
The evaluation of the deep rehabilitation of chronic upper extremity hemiparesis,” BMC Neurology, squat exercise in the KIMORE dataset provides a partial vol. 17, no. 1, p. 109, Jun. 2017. validation on patient data collected with a low-cost sensor. [8] A. Vakanski, J. M. Ferguson, and S. Lee, “Mathematical Modeling and Evaluation of Human Motions in Physical Therapy Using Mixture In future work, we will attempt to address the above-listed Density Neural Networks,” J Physiother Phys Rehabil, vol. 1, no. 4, Dec. shortcomings of this study, i.e., we will focus on a thorough validation of the framework on rehabilitation exercises [9] J. Choi, W. J. Jeon, and S.-C. Lee, “Spatio-temporal Pyramid Matching performed by patients and labeled by a group of clinicians who for Sports Videos,” in Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, New York, NY, USA, will assign quality scores. We will validate the proposed 2008, pp. 291–297. approach by acquiring muscle activity measurements. Also, we [10] Y. Du, W. Wang, and L. Wang, “Hierarchical recurrent neural network have plans to implement the framework for assessment of for skeleton based action recognition,” in 2015 IEEE Conference on patient performance in home-based rehabilitation using a Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1110– Kinect sensor. [11] A. Vakanski, H. Jun, D. Paul, and R. Baker, “A Data Set of Human Body Movements for Physical Rehabilitation Exercises,” Data, vol. 3, no. 1, p. VI. CONCLUSION 2, Jan. 2018. [12] X. Yun and E. R. Bachmann, “Design, Implementation, and The article proposes a deep learning-based framework for Experimental Results of a Quaternion-Based Kalman Filter for Human assessment of rehabilitation exercises. The framework consists 10 Body Motion Tracking,” IEEE Transactions on Robotics, vol. 22, no. 6, [31] C.-J. Su, C.-Y. Chiang, and J.-Y. Huang, “Kinect-enabled home-based pp. 1216–1227, Dec. 2006. 
rehabilitation system using Dynamic Time Warping and fuzzy logic,” [13] G. Panahandeh, N. Mohammadiha, A. Leijon, and P. Handel, Applied Soft Computing, vol. 22, pp. 652–666, Sep. 2014. “Continuous Hidden Markov Model for Pedestrian Activity [32] Z. Zhang, Q. Fang, and X. Gu, “Objective Assessment of Upper-Limb Classification and Gait Analysis.,” IEEE Trans. Instrumentation and Mobility for Poststroke Rehabilitation,” IEEE Transactions on Measurement, vol. 62, no. 5, pp. 1073–1083, 2013. Biomedical Engineering, vol. 63, no. 4, pp. 859–868, Apr. 2016. [14] Y. Huang, K. B. Englehart, B. Hudgins, and A. D. C. Chan, “A Gaussian [33] D. Antón, A. Goñi, and A. Illarramendi, “Exercise Recognition for mixture model based classification scheme for myoelectric control of Kinect-based Telerehabilitation,” Methods Inf Med, vol. 54, no. 02, pp. powered upper limb prostheses,” IEEE Transactions on Biomedical 145–155, 2015. Engineering, vol. 52, no. 11, pp. 1801–1811, Nov. 2005. [34] M. Capecci et al., “A Hidden Semi-Markov Model based approach for [15] A. Vakanski, I. Mantegh, A. Irish, and F. Janabi-Sharifi, “Trajectory rehabilitation exercise assessment,” Journal of Biomedical Informatics, Learning for Robot Programming by Demonstration Using Hidden vol. 78, pp. 1–11, Feb. 2018. Markov Model and Dynamic Time Warping,” IEEE Transactions on [35] J. F. Lin, M. Karg, and D. Kulić, “Movement Primitive Segmentation for Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 42, no. 4, pp. Human Motion Modeling: A Framework for Analysis,” IEEE 1039–1052, Aug. 2012. Transactions on Human-Machine Systems, vol. 46, no. 3, pp. 325–339, [16] D. Biswas, Z. Ye, E. B. Mazomenos, M. Jöbges, and K. Maharatna, Jun. 2016. “CORDIC Framework for Quaternion-based Joint Angle Computation to [36] J. F. Lin and D. 
Kulić, “Online Segmentation of Human Motion for Classify Arm Movements.,” in IEEE International Symposium on Automated Rehabilitation Exercise Analysis,” IEEE Transactions on Circuits and Systems, ISCAS 2018, 27-30 May 2018, Florence, Italy, Neural Systems and Rehabilitation Engineering, vol. 22, no. 1, pp. 168– 2018, pp. 1–5. 180, Jan. 2014. [17] E. B. Mazomenos et al., “Detecting Elementary Arm Movements by [37] F. Bashir, W. Qu, A. Khokhar, and D. Schonfeld, “HMM-based motion Tracking Upper Limb Joint Angles With MARG Sensors.,” IEEE J. recognition system using segmented PCA,” in IEEE International Biomedical and Health Informatics, vol. 20, no. 4, pp. 1088–1099, 2016. Conference on Image Processing 2005, 2005, vol. 3, pp. III–1288. [18] R. Houmanfar, M. Karg, and D. Kulic, “Movement Analysis of [38] H. Bourlard and Y. Kamp, “Auto-association by multilayer perceptrons Rehabilitation Exercises: Distance Metrics for Measuring Patient and singular value decomposition,” Biol. Cybern., vol. 59, no. 4, pp. 291– Progress,” IEEE Systems Journal, vol. 10, no. 3, pp. 1014–1025, Sep. 294, Sep. 1988. 2016. [39] A. Vakanski, J. M Ferguson, and S. Lee, “Metrics for Performance [19] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, Evaluation of Patient Exercises during Physical Therapy,” International “Sequential Deep Learning for Human Action Recognition,” in Human Journal of Physical Medicine & Rehabilitation, vol. 05, no. 03, 2017. Behavior Understanding, 2011, pp. 29–39. [40] C. M. Bishop, Pattern Recognition and Machine Learning. New York: [20] G. Lefebvre, S. Berlemont, F. Mamalet, and C. Garcia, “BLSTM-RNN Springer, 2011. Based 3D Gesture Classification,” in Artificial Neural Networks and [41] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood Machine Learning – ICANN 2013, 2013, pp. 381–388. from incomplete data via the EM algorithm,” Journal of the Royal [21] F. J. Ordóñez and D. 
Roggen, “Deep Convolutional and LSTM Recurrent Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977. Neural Networks for Multimodal Wearable Activity Recognition,” [42] G. J. McLachlan and K. E. Basford, “Mixture models : inference and Sensors, vol. 16, no. 1, p. 115, Jan. 2016. applications to clustering,” Thesis, New York, N.Y. : M. Dekker, 1988. [22] A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang, “NTU RGB+D: A Large [43] D. Zhang, X. Dai, and Y.-F. Wang, “Dynamic Temporal Pyramid Scale Dataset for 3D Human Activity Analysis,” arXiv:1604.02808 [cs], Network: A Closer Look at Multi-scale Modeling for Activity Apr. 2016. Detection,” in Computer Vision – ACCV 2018, 2019, pp. 712–728. [23] A. Jain, A. R. Zamir, S. Savarese, and A. Saxena, “Structural-RNN: Deep [44] Y. S. Zhao, Y. Xiong, L. Wang, Z. Wu, D. Lin, and X. Tang, “Temporal Learning on Spatio-Temporal Graphs,” in 2016 IEEE Conference on Action Detection with Structured Segment Networks,” 2017 IEEE Computer Vision and Pattern Recognition (CVPR), 2016, pp. 5308– International Conference on Computer Vision (ICCV), pp. 2933–2942, 5317. 2017. [24] S. Song, C. Lan, J. Xing, W. Zeng, and J. Liu, “An End-to-End [45] Z. Shou, D. Wang, and S.-F. Chang, “Temporal Action Localization in Spatio-Temporal Attention Model for Human Action Recognition from Untrimmed Videos via Multi-stage CNNs,” in 2016 IEEE Conference on Skeleton Data,” in Association for the Advancement of Artificial Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, Intelligence (AAAI), 2017, pp. 4263–4270. USA, 2016, pp. 1049–1058. [25] J. Bütepage, M. J. Black, D. Kragic, and H. Kjellström, “Deep [46] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4, Representation Learning for Human Motion Prediction and Inception-ResNet and the Impact of Residual Connections on Learning,” Classification,” in 2017 IEEE Conference on Computer Vision and arXiv:1602.07261 [cs], Feb. 2016. Pattern Recognition (CVPR), 2017, pp. 1591–1599. 
[47] C. Li, Q. Zhong, D. Xie, and S. Pu, “Co-occurrence Feature Learning [26] K. Fragkiadaki, S. Levine, P. Felsen, and J. Malik, “Recurrent Network from Skeleton Data for Action Recognition and Detection with Models for Human Dynamics,” in Proceedings of the 2015 IEEE Hierarchical Aggregation,” in Proceedings of the 27th International Joint International Conference on Computer Vision (ICCV), Washington, DC, Conference on Artificial Intelligence, Stockholm, Sweden, 2018, pp. USA, 2015, pp. 4346–4354. 786–792. [27] Z. Zhang, Q. Fang, L. Wang, and P. Barrett, “Template matching based [48] A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang, “NTU RGB+D: A Large motion classification for unsupervised post-stroke rehabilitation,” in Scale Dataset for 3D Human Activity Analysis,” 2016 IEEE Conference International Symposium on Bioelectronics and Bioinformations 2011, on Computer Vision and Pattern Recognition (CVPR), pp. 1010–1019, 2011, pp. 199–202. 2016. [28] I. Ar and Y. S. Akgul, “A computerized recognition system for the [49] K. Simonyan and A. Zisserman, “Two-Stream Convolutional Networks home-based physiotherapy exercises using an RGBD camera,” IEEE for Action Recognition in Videos,” in Advances in Neural Information Transactions on Neural Systems and Rehabilitation Engineering, vol. 22, Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. no. 6, pp. 1160–1171, 2014. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. [29] T. Hussain, H. F. Maqbool, N. Iqbal, M. Khan, Salman, and A. A. 568–576. Dehghani-Sanij, “Computational model for the recognition of lower limb [50] M. Capecci et al., “The KIMORE dataset: KInematic assessment of movement using wearable gyroscope sensor.,” IJSNet, vol. 30, no. 1, pp. MOvement and clinical scores for remote monitoring of physical 35–45, 2019. REhabilitation,” IEEE Transactions on Neural Systems and [30] H. Sakoe, “Dynamic programming algorithm optimization for spoken Rehabilitation Engineering, 2019. 
word recognition,” IEEE Transactions on Acoustics, Speech, and Signal [51] T. M. Le, N. Inoue, and K. Shinoda, “A Fine-to-Coarse Convolutional Processing, vol. 26, pp. 43–49, 1978. Neural Network for 3D Human Action Recognition,” arXiv:1805.11790 [cs], May 2018. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Statistics arXiv (Cornell University)
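As an illustration of two computations described in Section IV, the following is a minimal Python sketch of the linear scaling in (6) and of the average absolute deviation used to report the NN results. It is not the authors' implementation. The function names are hypothetical, and the sketch assumes that $m$ and $M$ in (6) denote the minimum and maximum over the pooled metric values of the correct and incorrect repetitions, which is the choice that maps the values onto the [1, 20] range.

```python
def scale_to_range(correct, incorrect, low=1.0, high=20.0):
    """Jointly rescale two sequences of metric values to [low, high].

    Assumption (not stated in the text): m and M are the minimum and
    maximum over the pooled values of both sequences.
    """
    pooled = list(correct) + list(incorrect)
    m, M = min(pooled), max(pooled)
    if M == m:
        raise ValueError("all metric values are identical; scaling is undefined")
    span = high - low  # equals 19 for the [1, 20] range in (6)

    def scale(v):
        return span * (v - m) / (M - m) + low

    return [scale(v) for v in correct], [scale(v) for v in incorrect]


def average_absolute_deviation(truth, predicted):
    """Mean of |truth_i - predicted_i| over all repetitions."""
    if len(truth) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(abs(t - p) for t, p in zip(truth, predicted)) / len(truth)


if __name__ == "__main__":
    # Hypothetical metric values, for illustration only.
    xs, ys = scale_to_range([2.0, 4.0], [6.0, 10.0])
    print(xs, ys)  # [1.0, 5.75] [10.5, 20.0]
```

Scaling both sequences with the same $m$ and $M$ keeps the correct and incorrect repetitions on a common axis, which is what allows the metrics to be compared on the same basis.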

Statistics, arXiv (Cornell University), Volume 2019 (1901) – Jan 29, 2019

ISSN
1534-4320
eISSN
ARCH-3347
DOI
10.1109/TNSRE.2020.2966249

Abstract

A Deep Learning Framework for Assessing Physical Rehabilitation Exercises Yalin Liao, Aleksandar Vakanski, Member, IEEE, and Min Xian, Member, IEEE Abstract—Computer-aided assessment of physical rehabilitation of all rehabilitation sessions are performed in a home-based entails evaluation of patient performance in completing prescribed setting [2]. Under these circumstances, patients are tasked to rehabilitation exercises, based on processing movement data record their daily progress and periodically visit the clinic for captured with a sensory system. Despite the essential role of functional assessment. Still, numerous medical sources report rehabilitation assessment toward improved patient outcomes and low levels of patient adherence to the recommended exercise reduced healthcare costs, existing approaches lack versatility, regimens in home-based rehabilitation, leading to prolonged robustness, and practical relevance. In this paper, we propose a treatment times and increased healthcare cost [3], [4]. Although deep learning-based framework for automated assessment of the many different factors have been identified that contribute to quality of physical rehabilitation exercises. The main components of the low compliance rates, the major impact factor is the absence the framework are metrics for quantifying movement performance, scoring functions for mapping the performance metrics into of continuous feedback and oversight of patient exercises by a numerical scores of movement quality, and deep neural network healthcare professional [5]. Despite the development of a models for generating quality scores of input movements via variety of tools and devices in support of physical supervised learning. 
The proposed performance metric is defined rehabilitation, such as robotic assistive systems [6], virtual based on the log-likelihood of a Gaussian mixture model, and reality and gaming interfaces [7], and Kinect-based assistants encodes low-dimensional data representation obtained with a deep [2], there is still a lack of versatile and robust systems for autoencoder network. The proposed deep spatio-temporal neural automatic monitoring and assessment of patient performance. network arranges data into temporal pyramids, and exploits the The article proposes a novel framework for assessment of spatial characteristics of human movements by using home-based rehabilitation that encompasses formulation of sub-networks to process joint displacements of individual body parts. The presented framework is validated using a dataset of ten metrics for quantifying movement performance, scoring rehabilitation exercises. The significance of this work is that it is the functions for mapping the performance metrics into numerical first that implements deep neural networks for assessment of scores of movement quality, and deep learning-based rehabilitation performance. end-to-end models for encoding the relationship between movement data and quality scores. The employed performance Index Terms—movement modeling, deep learning, metric is based on probabilistic modeling of the skeletal joints performance metrics, physical rehabilitation data with a Gaussian mixture model, and consequently, it employs the log-likelihood of the model for performance evaluation [8]. Next, the article investigates the effectiveness of I. INTRODUCTION deep autoencoder neural networks for dimensionality reduction ARTICIPATION in physical therapy and rehabilitation of captured data. 
Further, we propose a scoring function for programs is often compulsory and critical in postoperative scaling the values of the performance metric into movement recovery or for treatment of a wide array of musculoskeletal quality scores in the [0, 1] range. The resulting scores are conditions. However, it is infeasible and economically employed as the ground truth for training the proposed deep unjustified to offer patient access to a clinician for every single neural networks (NNs) for rehabilitation applications. rehabilitation session [1]. Accordingly, current healthcare The paper introduces a deep NN model designed to handle systems around the world are organized such that an initial spatial and temporal variability in human movements. portion of rehabilitation programs is performed in an inpatient Motivation for the proposed network structure was prior work facility under direct supervision by a clinician, followed by a on temporal pyramids [9] and hierarchical recurrent networks second portion performed in an outpatient setting, where for motion classification [10]. Specifically, the proposed model patients perform a set of prescribed exercises in their own aims to exploit spatial characteristics of human movements by residence. Reports in the literature indicate that more than 90% hierarchical processing of the joint displacements of different body parts via a series of sub-networks that gradually merge the Manuscript submitted June 14, 2019. This work was supported by the extracted feature vectors. Temporal pyramids are introduced Center for Modeling Complex Interactions (CMCI) at the University of Idaho using movement sequences at different time scales in order to through NIH Award #P20GM104420. learn data representations at multiple levels of abstraction. 
The network contains both convolutional layers for learning spatial dependencies and recurrent layers for encoding temporal correlations in movement data. The framework is validated on the University of Idaho – Physical Rehabilitation Movement Dataset (UI-PRMD) [11]. To the best of our knowledge, this is the first framework that employs deep NNs for assessment of rehabilitation exercises.

The main contributions of the paper are: (1) A novel framework for computer-aided assessment of rehabilitation exercises; (2) A deep spatio-temporal NN model for outputting movement quality scores; and (3) A performance metric that employs probabilistic modeling and autoencoder NNs for dimensionality reduction of rehabilitation data.

The article is organized as follows. The next section provides an overview of related work. Section III first introduces the mathematical notation and afterward describes the components of the proposed framework for rehabilitation assessment, including dimensionality reduction, performance metric, scoring function, and deep learning model. The validation of the proposed framework on a dataset of rehabilitation exercises is presented in Section IV. The last two sections discuss the results and summarize the paper.

Yalin Liao, Aleksandar Vakanski, and Min Xian are with the Department of Computer Science, University of Idaho, 1776 Science Center Drive, Idaho Falls, ID, 83402, USA (e-mail: liao4728@vandals.uidaho.edu; vakanski@uidaho.edu; mxian@uidaho.edu).

II. RELATED WORK

A. Human Movement Modeling

Conventional approaches for mathematical modeling and representation of human movements are broadly classified into two categories: top-down approaches that introduce latent states for describing the temporal dynamics of the movements, and bottom-up approaches that employ local features for representing the movements. Commonly used methods in the first category include Kalman filters [12], hidden Markov models [13], and Gaussian mixture models [14]. The main shortcomings of these methods originate from employing linear models for the transitions among the latent states (as in Kalman filters), or from adopting simple internal structures of the latent states (typical for hidden Markov models). The approaches based on extracting local features employ predefined criteria for identifying key points [15] and/or required body postures [16], [17], or a collection of statistics of the movements (e.g., mean, standard deviation, mode, median) [18]. Such local features are typically motion-specific, which limits the ability to efficiently handle arbitrary spatio-temporal variations within movement data.

Recent developments in artificial NNs stirred significant interest in their application for modeling and analysis of human motions. Numerous works employed NNs for motion classification and applied the trained models for activity recognition, gait identification, gesture recognition, action localization, and related applications. NN-based motion classifiers utilizing different computational units have been proposed, including convolutional units [19], long short-term memory (LSTM) recurrent units [20], gated recurrent units, and combinations [21] or modifications of these computational units [22]. Also, NNs with different layer structures have been implemented, such as encoder-decoder networks, spatio-temporal graphs [23], and attention mechanism models [24]. Besides the task of classification, a body of work in the literature focused on modeling and representation of human movements for prediction of future motion patterns [25], synthesis of movement sequences [26], and density estimation [8]. Conversely, little research has been conducted on the application of NNs for evaluation of movement quality, which can otherwise find use in various applications (physical rehabilitation being one of them).

B. Movement Assessment

Quantifying the level of correctness in completing prescribed exercises is important for the development of tools and devices in support of home-based rehabilitation. The movement assessment in existing studies is typically accomplished by comparing a patient's performance of an exercise to the desired performance by healthy participants.

Several studies in the literature on exercise evaluation employed machine learning methods to classify the individual repetitions into correct or incorrect classes of movements. Methods used for this purpose include AdaBoost classifier, k-nearest neighbors, Bayesian classifier, and an ensemble of multi-layer perceptron NNs [27]–[29]. The outputs in these approaches are discrete class values of 0 or 1 (i.e., incorrect or correct repetition). However, these methods do not provide the capacity to detect varying levels of movement quality or identify incremental changes in patient performance over the duration of the rehabilitation program.

The majority of related studies employ distance functions for deriving movement quality scores. Concretely, Houmanfar et al. [18] used a variant of the Mahalanobis distance to quantify the level of correctness of rehabilitation movements, based on a calculated distance between patient-performed repetitions and a set of repetitions performed by a group of healthy individuals. Similarly, a body of work utilized the dynamic time warping (DTW) algorithm [30] for calculating the distance between a patient's performance and healthy subjects' performance [31]–[33]. The advantage of the distance functions is that they are not exercise-specific, and thus can be applied for assessment of new types of exercises. The distance functions also have shortcomings, because they do not attempt to derive a model of the rehabilitation data, and the distances are calculated at the level of individual time-steps in the raw sensory measurements.

A line of research utilized probabilistic approaches for modeling and evaluation of rehabilitation movements. Studies based on hidden Markov models [34], [35] and mixtures of Gaussian distributions [8] typically perform a quality assessment based on the likelihood that the individual sequences are being drawn from a trained model. Although the utilization of probabilistic models is advantageous in handling the variability due to the stochastic character of human movements, models with abilities for a hierarchical data representation can produce more reliable outcomes for movement quality assessment, and better generalize to new exercises.

III. PROPOSED METHOD

A block-diagram of the envisioned framework for assessing rehabilitation exercises is depicted in Fig. 1. The skeletal joint coordinates acquired by the sensory system are processed via dimensionality reduction, performance quantification, and scoring mapping to obtain movement quality scores that are subsequently used for training an NN model. The trained NN model is afterward used to automatically generate movement quality scores for input movement data acquired by the sensory system.

Fig. 1. Overview of the proposed framework for assessment of rehabilitation exercises.

A. Notation

In outpatient physical rehabilitation, a daily rehabilitation session requires completing a series of exercises, where the patient is instructed to complete a certain number of repetitions of each exercise during each session. The data acquired by the sensory system for one particular exercise performed by $S$ healthy subjects is denoted by $\mathbb{X}$, and hereafter they are referred to as reference movements. The symbol $R_s$ is used for the number of repetitions of the exercise by the $s$-th subject. The combined data for all $R_s$ repetitions of the exercise by the $s$-th subject is denoted $\Delta_s$. Similarly, $R$ is used for the total number of all repetitions by the $S$ subjects, i.e., $R = \sum_{s=1}^{S} R_s$. Using the notation $\mathbf{X}_{s,r}$ for the collected data of the $r$-th repetition by the $s$-th subject, we have $\mathbb{X} = \{\Delta_s\}$ for $s \in \underline{S}$, where $\Delta_s = \{\mathbf{X}_{s,r}\}$ for $r \in \underline{R_s}$. For convenience, throughout the text the underscore symbol denotes a set of indices, e.g., $\underline{S} = \{1, 2, \ldots, S\}$ for any positive integer $S$. The data for each repetition $\mathbf{X}_{s,r}$ is a temporal sequence of $T$ measurements, therefore $\mathbf{X}_{s,r} = (\mathbf{x}_{s,r}^{(1)}, \mathbf{x}_{s,r}^{(2)}, \ldots, \mathbf{x}_{s,r}^{(T)})$, where the superscripts are used for indexing the temporal order of the joint displacement vectors within the repetition. Furthermore, the individual measurement $\mathbf{x}_{s,r}^{(t)}$ for $t \in \underline{T}$ is a $D$-dimensional vector, consisting of the values for all joint displacements in the human body, i.e., $\mathbf{x}_{s,r}^{(t)} = [x_{s,r}^{(t,1)}, x_{s,r}^{(t,2)}, \ldots, x_{s,r}^{(t,D)}]$.

The collected data for the patient group are referred to in the article as patient movements, and are denoted with the symbol $\mathbb{Y}$. By analogy to the introduced notation for the reference movements, $\mathbb{Y} = \{\mathbf{Y}_{s,r}\}$, where $\mathbf{Y}_{s,r}$ is the data of the $r$-th repetition by the $s$-th subject. Analogously, the repetition $\mathbf{Y}_{s,r} = (\mathbf{y}_{s,r}^{(1)}, \mathbf{y}_{s,r}^{(2)}, \ldots, \mathbf{y}_{s,r}^{(T)})$ is comprised of a sequence of multidimensional vectors $\mathbf{y}_{s,r}^{(t)} = [y_{s,r}^{(t,1)}, y_{s,r}^{(t,2)}, \ldots, y_{s,r}^{(t,D)}]$.

B. Dimensionality Reduction

The sensory systems for motion capturing typically track between 15 and 40 skeletal joints, depending on the sensor type. The measurement data consists of 3-dimensional spatial positions and/or orientations for each joint, and therefore the dimensionality of the data ranges between 45 and 120. Dimensionality reduction of recorded data is an essential step in processing human movements to suppress unimportant, redundant, or highly correlated dimensions. The aim is to project the data $\mathbb{X} = \{\mathbf{X}_{s,r} = (\mathbf{x}_{s,r}^{(t)}) : \mathbf{x}_{s,r}^{(t)} \in \mathbb{R}^{D}\}$ into a lower-dimensional representation $\tilde{\mathbb{X}} = \{\tilde{\mathbf{X}}_{s,r} = (\tilde{\mathbf{x}}_{s,r}^{(t)}) : \tilde{\mathbf{x}}_{s,r}^{(t)} \in \mathbb{R}^{M}\}$, for $t \in \underline{T}$, $s \in \underline{S}$, $r \in \underline{R_s}$, where $M < D$.

A common approach for dimensionality reduction of human movement data is maximum variance [36], which simply retains the first $M$ dimensions with the largest variance and discards the remaining dimensions. Principal component analysis (PCA) and its variants [37] are also widely used for reducing the dimensionality of movement data, where a matrix containing the leading $M$ eigenvectors corresponding to the largest eigenvalues of the covariance matrix $\mathbf{V}$ is used for projecting the data into a lower-dimensional space. Although PCA is one of the most common approaches for dimensionality reduction in general, it employs linear mapping of high-dimensional data into a lower-dimensional representation. Likewise, the shortcomings of maximum variance originate from its simplicity.

In the proposed framework, we introduce autoencoder NNs [38] for dimensionality reduction. An autoencoder NN is a nonlinear technique for dimensionality reduction that can extract richer data representations than the linear techniques (such as PCA). Furthermore, deep autoencoder NNs, created by stacking multiple consecutive layers of hidden neurons, can additionally increase the representational capacity of the network.

Autoencoders are used for unsupervised learning of an alternative representation of input data, through a process of data compression and reconstruction. The data processing involves an encoding step of compressing input data through one or multiple hidden layers, followed by a decoding step of reconstructing the output from the encoded representation through one or multiple hidden layers. If $\mathcal{A}$ denotes a class of mapping functions from $\mathbb{R}^{M}$ to $\mathbb{R}^{D}$, and $\mathcal{B}$ is a class of mapping functions from $\mathbb{R}^{D}$ to $\mathbb{R}^{M}$, then for any function $A \in \mathcal{A}$ and $B \in \mathcal{B}$, the encoder portion projects an input $\mathbf{x}_{s,r}^{(t)} \in \mathbb{R}^{D}$ into a lower-dimensional representation $\tilde{\mathbf{x}}_{s,r}^{(t)} = B(\mathbf{x}_{s,r}^{(t)}) \in \mathbb{R}^{M}$ (referred to as a code), and the decoder portion converts the code into an output $A(B(\mathbf{x}_{s,r}^{(t)})) \in \mathbb{R}^{D}$. Autoencoders are trained to find functions $A \in \mathcal{A}$ and $B \in \mathcal{B}$ which minimize the mean squared deviation between the input data and output data, i.e.,

$$\arg\min_{A,B} \left\| A\left(B(\mathbf{x}_{s,r}^{(t)})\right) - \mathbf{x}_{s,r}^{(t)} \right\|. \tag{1}$$

A graphical representation of the adopted architecture for the autoencoder network is presented in Fig. 2. The encoder portion consists of three intermediate layers of LSTM recurrent units with 30, 10, and 4 computational units, and the corresponding decoder portion has three intermediate layers of LSTM units with 10, 30, and 117 computational units, respectively. The input time-series data are 117-dimensional vectors of joint coordinates. The code representation of the proposed network is a temporal sequence of 4-dimensional vectors.

Fig. 2. The proposed autoencoder architecture projects input movement data into a code representation, and re-projects the code into the movement data.

C. Performance Metric

The metrics for quantifying the patient performance are classified into model-less and model-based groups of metrics [39]. The model-less metrics employ distance functions, such as Euclidean, Mahalanobis distance, and dynamic time warping (DTW) [30] deviation between data sequences. The model-based metrics apply probabilistic approaches for modeling the movement data, and employ the log-likelihood for performance evaluation [8].

We adopt a metric based on Gaussian mixture model (GMM) log-likelihood. The choice stems from the demonstrated capacity of statistical methods to encode the inherent random variability in human movements; this results in improved ability by the model-based metrics to handle spatio-temporal variations in rehabilitation data. Log-likelihood of a movement data for a given model is a natural choice for evaluation of data instances in probabilistic models.

GMM is a parametric probabilistic model for representing data with a mixture of Gaussian probability density functions [40]. GMM is frequently used for modeling human movements. For a dataset consisting of multidimensional vectors $\mathbf{x}_{s,r}^{(t)}$, a GMM with $C$ Gaussian components has the form

$$\mathcal{P}(\mathbf{x}_{s,r}^{(t)} \mid \lambda) = \sum_{c=1}^{C} \pi_c \, \mathcal{N}(\mathbf{x}_{s,r}^{(t)} \mid \mu_c, \Sigma_c), \tag{2}$$

where $\lambda = \{\pi_c, \mu_c, \Sigma_c\}$ are the mixing coefficient, mean, and covariance of the $c$-th Gaussian component, respectively. The most popular method for estimating the model parameters $\lambda$ in GMM is the expectation maximization (EM) algorithm [41]; other approaches include maximum-a-posteriori estimation [42] and mixture density networks [40]. Subsequently, for a GMM model with parameters $\lambda$, the negative log-likelihood is used as a performance metric, and for the repetition $\mathbf{Y}_{s,r}$ is calculated as

$$\mathcal{P}(\mathbf{Y}_{s,r} \mid \lambda) = -\sum_{t=1}^{T} \log\left\{ \sum_{c=1}^{C} \pi_c \, \mathcal{N}(\mathbf{y}_{s,r}^{(t)} \mid \mu_c, \Sigma_c) \right\}. \tag{3}$$

D. Scoring Function

In the presented framework, a scoring function maps the values of the performance metrics into a movement quality score in the range between 0 and 1.

The resulting movement quality scores play a dual role in the framework. First, in a real-world exercise assessment setting, the quality scores allow for intuitive understanding of the calculated values of the used performance metric. For instance, a movement quality score of 88% presented to a patient is easy to understand, and it can also enable the patient to self-monitor his/her progress toward functional recovery based on received scores over a period of time. Second, the movement quality scores are used here for supervised training of the deep NN models.

For a sequence of performance metric values of the reference movements $\mathbf{x} = (x_1, x_2, \ldots, x_L)$ and a sequence $\mathbf{y} = (y_1, y_2, \ldots, y_L)$ related to the patient movements, we propose the following scoring function:

$$\bar{x}_k = \left(1 + e^{\frac{x_k}{\mu + 3\delta} - \alpha_1}\right)^{-1}; \tag{4}$$

$$\bar{y}_k = \left(1 + e^{\frac{x_k}{\mu + 3\delta} + \frac{y_k - x_k}{\alpha_2 (\mu + 3\delta)} - \alpha_1}\right)^{-1}, \tag{5}$$

where $k \in \underline{L}$, $\mu = \frac{1}{L}\sum_{k=1}^{L} |x_k|$, $\delta = \sqrt{\frac{1}{L}\sum_{k=1}^{L} (|x_k| - \mu)^2}$, and $\alpha_1$, $\alpha_2$ are data-specific parameters. The proposed scoring function is monotonically decreasing, and is designed to preserve the distribution of the values of the performance metric. The values for the reference movements $x_k$ are scaled by $\mu + 3\delta$ in (4) to ensure that the resulting scores $\bar{x}_k$ have values close to 1 for inputs $x_k$ in the range $(\mu - 3\delta, \mu + 3\delta)$. Similarly, for the patient movements $y_k$ the scoring function in (5) is designed to preserve their distribution in mapping the performance metric values into movement quality scores.

E. Deep Learning Architecture for Rehabilitation Assessment

We propose a novel deep learning model for spatio-temporal modeling of skeletal data, for application in rehabilitation assessment. A graphical representation of the NN architecture is provided in Fig. 3. The NN model is designed to exploit the spatial characteristics of human movements by dedicating sub-networks for processing joint displacements of individual body parts. In addition, the input data is arranged into temporal pyramids for processing multiple scaled versions of the movement repetitions. The initial hierarchical layers in the model employ strided one-dimensional convolutional filters for learning spatial dependencies in human movements, and are followed by a series of LSTM recurrent layers for modeling temporal correlations in learned representations.
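Returning to the performance metric and scoring function of Sections III-C and III-D, the following is a minimal NumPy sketch of how the negative log-likelihood in (3) and the scores in (4)-(5) could be computed. It is an illustration under stated assumptions, not the authors' implementation: the GMM parameters are assumed to be already estimated (e.g., with EM) and passed in as plain arrays, the function names `gmm_negative_log_likelihood` and `quality_scores` are hypothetical, and the default values of `alpha1` and `alpha2` are the ones the paper reports for the UI-PRMD data.

```python
import numpy as np


def gmm_negative_log_likelihood(Y, weights, means, covariances):
    """Negative log-likelihood of one repetition under a GMM, as in (3).

    Y: array of shape (T, M), one (low-dimensional) repetition.
    weights, means, covariances: per-component mixing coefficients,
    mean vectors of shape (M,), and covariance matrices of shape (M, M).
    The parameters are assumed to have been fitted beforehand.
    """
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    n_dims = Y.shape[1]
    log_terms = []
    for pi_c, mu_c, sigma_c in zip(weights, means, covariances):
        mu_c = np.asarray(mu_c, dtype=float)
        sigma_c = np.asarray(sigma_c, dtype=float)
        diff = Y - mu_c
        _, logdet = np.linalg.slogdet(sigma_c)
        # Squared Mahalanobis distance of every time step to component c.
        maha = np.einsum("ti,ij,tj->t", diff, np.linalg.inv(sigma_c), diff)
        log_pdf = -0.5 * (n_dims * np.log(2.0 * np.pi) + logdet + maha)
        log_terms.append(np.log(pi_c) + log_pdf)
    log_terms = np.stack(log_terms)  # shape (C, T)
    # Log-sum-exp over components for numerical stability.
    peak = log_terms.max(axis=0)
    log_mixture = peak + np.log(np.exp(log_terms - peak).sum(axis=0))
    return -log_mixture.sum()


def quality_scores(x, y, alpha1=3.2, alpha2=10.0):
    """Map metric values to quality scores in (0, 1), following (4)-(5).

    x: metric values of the reference movements, shape (L,).
    y: metric values of the patient movements, shape (L,).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = np.mean(np.abs(x))
    delta = np.sqrt(np.mean((np.abs(x) - mu) ** 2))
    scale = mu + 3.0 * delta
    x_bar = 1.0 / (1.0 + np.exp(x / scale - alpha1))
    y_bar = 1.0 / (
        1.0 + np.exp(x / scale + (y - x) / (alpha2 * scale) - alpha1)
    )
    return x_bar, y_bar
```

Because both expressions are logistic functions of the metric value, larger negative log-likelihoods always map to lower scores, and a patient value equal to its paired reference value receives the same score as the reference.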
Fig. 3. (a) The proposed spatio-temporal model architecture. (b) Temporal pyramid sub-network. (c) Multi-branch convolutional block.

The architecture of the NN draws inspiration from the hierarchical model proposed by Du et al. [10] that employs five recurrent sub-networks taking as inputs joint displacements of the left arm, right arm, left leg, right leg, and torso, respectively. The outputs from the five sub-networks are merged into a single representation. Such hierarchical organization of the network layers allows low-level spatial information from joint movements to be exploited for obtaining a high-level representation of the body parts' movements in accomplishing required actions. Differently from the model proposed by Du et al. [10] that consists of bidirectional layers with LSTM recurrent units, our proposed network uses convolutional units in the hierarchical layers and recurrent units in the succeeding layers. The presented ablation study and performance comparison in Section IV corroborate the advantage of the introduced modifications in our spatio-temporal model.

In the proposed network, the temporal pyramids are composed of full-scaled input sequences, and three sub-sampled versions with a temporal length equal to one half, one quarter, and one eighth of the sequence (see Fig. 3(b)). The resulting feature vectors are then concatenated and passed to the next layers. Such data processing enables recognizing movement patterns at different levels of abstraction, and led to improved performance of the deep model for movement assessment.

Inputs to the network are 117-dimensional sequences of the full-body joint angles corresponding to single repetitions of an exercise. The convolutional blocks consist of two convolutional layers followed by dropout layers with a rate of 0.25. For these layers we adopted the multi-branch design approach shown in Fig. 3(c), popularized in the inception convolutional network architectures [46].
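The four levels of the temporal pyramid described above can be illustrated with a short NumPy sketch. It assumes that sub-sampling keeps every 2nd, 4th, and 8th time step, which is only one possible reading, since the text does not specify the sub-sampling method; the function name `temporal_pyramid` is hypothetical.

```python
import numpy as np


def temporal_pyramid(repetition, strides=(1, 2, 4, 8)):
    """Return the full-scale sequence and its sub-sampled versions.

    repetition: array of shape (T, D), one exercise repetition with T time
    steps and D joint-angle dimensions (D = 117 in the paper).
    strides: sub-sampling steps; (1, 2, 4, 8) gives sequences of roughly
    T, T/2, T/4, and T/8 time steps. Plain strided slicing is an
    assumption made for this sketch.
    """
    sequence = np.asarray(repetition, dtype=float)
    if sequence.ndim != 2:
        raise ValueError("expected an array of shape (T, D)")
    return [sequence[::step] for step in strides]


# Example with a synthetic repetition of 240 frames and 117 dimensions.
rng = np.random.default_rng(0)
levels = temporal_pyramid(rng.normal(size=(240, 117)))
for level in levels:
    print(level.shape)
```

For a 240-frame repetition this yields levels with 240, 120, 60, and 30 time steps. In the model, each level would then be processed by the sub-networks before the resulting feature vectors are concatenated.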
Similarly, the introduction of temporal pyramids for processing rehabilitation data was motivated by the concept of image pyramids in computer vision. Temporal pyramids have been used for processing video data by dynamically subsampling input videos at varying frame rate [43], temporal pooling of multi-scale data representations from extracted feature map [44], or by applying sliding windows with changeable scales to the sequences of images [45]. In these works, the use of multi-scale video pyramids has been conducive to improve the detection and localization of human actions in videos.

Each convolutional layer contains three branches of 1D convolutional filters with different lengths, whose outputs are concatenated and passed to the next layer. The use of multiple branches allows the model to select the most suitable filter length based on the input data. The recurrent portion of the model consists of four layers with 80, 40, 40, and 80 LSTM units, respectively. The last layer has linear activations, and outputs a numerical movement quality score for an input repetition. Mean-squared-error was used as a cost function for training the model parameters, with the Adam optimizer. A batch size of 5 repetitions was applied, with early stopping regularization.

One can note that the proposed model is not particularly deep, as it comprises a relatively low number of hidden layers; however, considering that the used dataset is also of relatively small size, larger and deeper networks would overfit and produce suboptimal results.

IV. EXPERIMENTAL RESULTS

A. Dataset

For validation of the presented framework, we created the UI-PRMD dataset [11]. The dataset consists of skeletal data collected from 10 healthy subjects. Each subject completed 10 repetitions of 10 rehabilitation exercises, listed in Table I. The data were acquired with a Vicon optical tracking system, and consist of 117-dimensional sequences of angular joint displacements. The subjects performed the exercises both in a correct manner, hereafter referred to as correct movements, and in an incorrect manner, i.e., simulating performance by patients with musculoskeletal constraints, hereafter referred to as incorrect movements. The research study related to the data collection was approved by the Institutional Review Boards at the University of Idaho under the identification code IRB 16-124. A written informed consent for participation in a research study was approved by the board, and was obtained from all participants in the study. A detailed description of the UI-PRMD dataset is provided in [11].

TABLE I
EXERCISES IN THE UI-PRMD DATASET
Order | Exercise
E1 | Deep squat
E2 | Hurdle step
E3 | Inline lunge
E4 | Side lunge
E5 | Sit to stand
E6 | Standing active straight leg raise
E7 | Standing shoulder abduction
E8 | Standing shoulder extension
E9 | Standing shoulder internal–external rotation
E10 | Standing shoulder scaption

B. Performance Quantification

In this section, the adopted performance metric based on the log-likelihood of GMM is evaluated on the UI-PRMD dataset. For comparison, three common performance metrics for assessment of rehabilitation exercises based on Euclidean, Mahalanobis, and DTW distance are also evaluated.

Data scaling: To compare the performance metrics on the same basis, their values are first linearly scaled to the same range. In this study the range [1, 20] was selected based on an empirical understanding of the data. For the obtained values of the performance metrics related to repetitions of the correct movements denoted $\mathbf{x} = (x_1, x_2, \ldots, x_L)$, and for the metrics of the incorrect movements $\mathbf{y} = (y_1, y_2, \ldots, y_L)$, the following scaling functions were used

$$x'_i = \frac{19(x_i - m)}{M - m} + 1; \quad y'_i = \frac{19(y_i - m)}{M - m} + 1 \quad \text{for } i \in \underline{L}, \tag{6}$$

where $x'_i, y'_i \in [1, 20]$ denote the scaled values of the performance metrics, $M = \max_{i,j \in \underline{L}} \{x_i, y_j\}$, and $m = \min_{i,j \in \underline{L}} \{x_i, y_j\}$.

The scaled values of the Euclidean distance for exercises E1 and E2 are shown in Fig. 4. Green circle markers are used for the repetitions of the correct movements, whereas the red squares symbolize the repetitions of the incorrect movements. Note that inconsistent data (associated with measurement errors or subjects performing the exercise with their left arm/leg in a set of mostly right arm/leg exercises) were manually removed from the original dataset, resulting in less than 100 repetitions per subject. E.g., there are 90 correct and incorrect movements for E1 in Fig. 4(a), and 55 correct and incorrect movements for E2 in Fig. 4(b).

Separation degree: For comparison of the scaled values of the performance metrics we propose the concept of separation degree. Specifically, for any positive real numbers $x$, $y$, their separation degree is defined as $S_D(x, y) = \frac{x - y}{x + y} \in [-1, 1]$. The separation degree between two positive sequences $\mathbf{x} = (x_1, x_2, \ldots, x_m)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_n)$ is defined by

$$S_D(\mathbf{x}, \mathbf{y}) = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} S_D(x_i, y_j). \tag{7}$$

Values of the separation degree close to 1 or −1 indicate good separation between the two sequences. Conversely, for values of the separation degree close to 0, the sequences do not separate well and they are almost mixed together.

When applied to the values of the distance metrics, a larger separation degree indicates greater ability of the used metric to differentiate between correct and incorrect repetitions of an exercise. For instance, in Fig. 4(b) one can observe a clearer differentiation between the correct and incorrect movements, in comparison to Fig. 4(a). This results in a larger value of the separation degree for the repetitions of exercise E2; the values were calculated at 0.384 for E1, and 0.497 for E2, respectively.

Fig. 4. Scaled values of the Euclidean distance for the between-subject case for: (a) First exercise E1 ($S_D$ = 0.384); (b) second exercise E2 ($S_D$ = 0.497).

The values for the separation degrees for the four studied performance metrics are presented in Table II. Each cell in the table corresponds to the average separation degree value $S_D$ for the 10 exercises in the dataset. The shown values are the means and in parentheses are the standard deviations. For the comparison, scaled values of the metrics according to (6) are used. Values for both between-subject and within-subject cases are presented. Table II also compares the values of the metrics for the cases of raw 117-dimensional data, and low-dimensional data obtained with the methods of maximum variance, PCA, and autoencoder NNs. The largest values for the separation degree are indicated in each row with a bold font.

Conclusively, the GMM log-likelihood metric applied on low-dimensional data obtained with the autoencoder NN resulted in the largest separation between the correct and incorrect movements for both between- and within-subject cases. The within-subject case provides improved separation because the repetitions performed by the same subject are characterized by a lower level of variability. The value of the GMM log-likelihood is not provided for the 117-dimensional data because GMM is commonly applied on low-dimensional data. Furthermore, the performance of the Euclidean and DTW distances in Table II is comparable, and better than the Mahalanobis distance. Also, one can notice that the autoencoder NN lost less information in compressing the high-dimensional data sequences in comparison to maximum variance and PCA, because the separation degree values for all metrics using autoencoders are very close to the corresponding metric values of the 117-dimensional data without dimensionality reduction. In implementing GMM on the dataset, the number of Gaussian components $C$ was set to 6.

TABLE II
SEPARATION DEGREE FOR THE PERFORMANCE METRICS: MEAN (ST. DEV.)
Metric | Euclidean distance | Mahalanobis distance | DTW distance | Log-likelihood GMM
Between-subject
D=117 | 0.445 (0.087) | 0.195 (0.152) | 0.487 (0.063) | --
D=3 (MV) | 0.309 (0.101) | 0.063 (0.130) | 0.310 (0.100) | 0.344 (0.049)
D=3 (PCA) | 0.296 (0.103) | 0.108 (0.169) | 0.265 (0.093) | 0.360 (0.060)
D=4 (AE) | 0.423 (0.092) | 0.229 (0.102) | 0.427 (0.094) | 0.515 (0.106)
Within-subject
D=117 | 0.568 (0.058) | 0.441 (0.118) | 0.570 (0.059) | --
D=3 (MV) | 0.472 (0.048) | 0.325 (0.118) | 0.455 (0.053) | 0.471 (0.098)
D=3 (PCA) | 0.508 (0.032) | 0.322 (0.169) | 0.501 (0.031) | 0.518 (0.057)
D=4 (AE) | 0.582 (0.057) | 0.474 (0.133) | 0.574 (0.060) | 0.603 (0.073)
D: data dimensions; MV: maximum variance; PCA: principal component analysis; AE: autoencoder neural networks

C. Neural Networks Performance

For training the deep neural networks, the movement quality scores based on the GMM log-likelihood calculated with autoencoder-reduced data are employed. Only the between-subject case is considered, since for the within-subject case the number of repetitions per subject is too small to train NNs.

Scoring function: The scoring function presented in (4)-(5) is used to calculate the movement quality scores. The values of the parameters are empirically selected as $\alpha_1 = 3.2$ and $\alpha_2 = 10$. For example, Fig. 5 depicts the values of the log-likelihood and the corresponding performance scores for exercise E1 (i.e., deep squat). The scores for the correct movements shown in Fig. 5(b) have values close to 1, whereas most of the scores for the incorrect movements are in the range between 0.7 and 0.9.

NN evaluation: The model was implemented on a Dell Precision 5810 workstation with Intel Xeon CPU, 32 GB RAM, 2 TB hard disk, and an NVIDIA Titan Xp GPU card. Inputs to the NNs are pairs of repetition data containing raw 117-dimensional angular joint measurements and quality scores. The networks are trained in a supervised regression manner, where the output is a predicted value of the movement quality for an input repetition. For each of the 10 exercises in the UI-PRMD dataset, a separate NN is trained and used for quality assessment. Each network model is run five times, and we report the average absolute deviation between the ground truth quality scores and the network prediction.

Fig. 5. (a) GMM log-likelihood values for exercise E1; (b) Corresponding quality scores.

To evaluate the respective contributions of the individual components in the design of our spatio-temporal model we conducted an ablation study. The results for the 10 exercises in the dataset are displayed in Table III. Lower values of the absolute deviation indicate lower errors by the NN model in predicting the quality score for input data. The upper row in the table presents the aggregated mean deviation for all exercises E1 to E10. The results of the ablation study support our intuitive assumptions that the introduced components in the proposed model related to the multi-branch layers, temporal pyramids, hierarchical structure, and combination of convolutional and recurrent units all contribute to improved assessment of rehabilitation exercises.

TABLE III
ABLATION STUDY: AVERAGE ABSOLUTE DEVIATION PER EXERCISE
Exercise | Our Approach | Without Branching Layers | Without Temporal Pyramids | Without Hierarch. Layers | Without Recurrent Layers
E1-E10 | 0.02527 | 0.02537 | 0.02594 | 0.02953 | 0.04729
E1 | 0.01077 | 0.01213 | 0.01162 | 0.01222 | 0.03631
E2 | 0.02824 | 0.02415 | 0.02785 | 0.03522 | 0.04322
E3 | 0.03980 | 0.04232 | 0.04286 | 0.05350 | 0.07876
E4 | 0.01185 | 0.01495 | 0.01226 | 0.01048 | 0.03654
E5 | 0.01870 | 0.01758 | 0.01569 | 0.01719 | 0.03716
E6 | 0.01779 | 0.02110 | 0.01930 | 0.01858 | 0.04104
E7 | 0.03819 | 0.03907 | 0.04241 | 0.04016 | 0.05699
E8 | 0.02305 | 0.02369 | 0.02418 | 0.02658 | 0.04589
E9 | 0.02271 | 0.02284 | 0.02296 | 0.02738 | 0.04130
E10 | 0.04162 | 0.03584 | 0.04027 | 0.05395 | 0.05565

We further compared the performance of the proposed NN to state-of-the-art deep learning models for movement classification. We are not aware of any other deep NN models for movement assessment. On the other hand, there is a large body of research on using deep learning models for classification/recognition/detection of human movements (in a general context, rather than for biomedical purposes). Therefore, we adapted several recent NN classifiers that have achieved top performance, and we re-purposed the models for regressing movement quality scores. The selected models are: Co-occurrence [47], PA-LSTM [48], Two-stream CNN [49], Hierarchical LSTM [10], as well as two basic Deep CNN and Deep LSTM architectures.

For these networks, we replaced the last softmax layer with a fully-connected layer with linear activations. Furthermore, we omitted all batch normalization layers (if any were present) in the original models, as they significantly degraded the capacity for movement assessment. Other than that, we closely followed the proposed implementation as described by the authors in the respective papers. Hierarchical LSTM is the network proposed by Du et al. [10] that served as a motivation for our proposed deep learning model. We selected the architectures and hyperparameters of the basic Deep CNN and Deep LSTM models through an extensive grid-search; the resulting CNN network has three convolutional layers (60, 30, and 10 units) followed by two fully-connected layers (200 and 100 units), whereas the Deep LSTM network contains one LSTM layer (20 units), one fully-connected layer (30 units), and another LSTM layer (10 units). The values of the average absolute deviations are presented in Table IV. With regards to the ability for movement quality assessment of all 10 exercises in the dataset, our proposed model outperformed the other deep learning classification models, although some of the models provided better performance on several of the exercises in the dataset (shown with a bold font in the table). The computational times for training the models averaged over all exercises are shown in the last row in Table IV. The proposed spatio-temporal model is computationally less expensive than almost all compared models. The prediction of movement quality scores for input repetitions by the trained model is very fast, and it took about 10 milliseconds per repetition on average.

TABLE IV
PERFORMANCE COMPARISON: AVERAGE ABSOLUTE DEVIATION PER EXERCISE
Exercise | Our approach | Deep CNN | Deep LSTM | Co-occurrence [47] | PA-LSTM [48] | Two-stream CNN [49] | Hierar. LSTM [10]
E1-E10 | 0.02527 | 0.02703 | 0.02615 | 0.04534 | 0.11044 | 0.08819 | 0.04059
E1 | 0.01077 | 0.01052 | 0.01357 | 0.01839 | 0.28798 | 0.03010 | 0.01670
E2 | 0.02824 | 0.02905 | 0.02953 | 0.04413 | 0.22349 | 0.07742 | 0.04934
E3 | 0.03980 | 0.05577 | 0.04141 | 0.08094 | 0.20493 | 0.13766 | 0.09382
E4 | 0.01185 | 0.01347 | 0.01640 | 0.02347 | 0.36033 | 0.03580 | 0.01609
E5 | 0.01870 | 0.01687 | 0.01300 | 0.03156 | 0.12332 | 0.06367 | 0.02536
E6 | 0.01779 | 0.01886 | 0.02349 | 0.03426 | 0.21119 | 0.04676 | 0.02166
E7 | 0.03819 | 0.02733 | 0.03346 | 0.04954 | 0.05016 | 0.19280 | 0.04090
E8 | 0.02305 | 0.02464 | 0.02905 | 0.05070 | 0.04337 | 0.07260 | 0.04590
E9 | 0.02271 | 0.02720 | 0.02495 | 0.04313 | 0.14411 | 0.06508 | 0.04419
E10 | 0.04162 | 0.04657 | 0.03667 | 0.07727 | 0.11044 | 0.16009 | 0.05198
Training time (in seconds) | 177 | 325 | 52 | 598 | 4,668 | 295 | 410

The results of the proposed deep NN for assessment of exercise E1 are depicted in Fig. 6. The set of 90 correct and 90 incorrect repetitions was randomly split using a ratio of 0.7/0.3 into a training set of 124 and a validation set of 56 repetitions. The ground truth scores and predicted scores for the training and validation sets are shown in Figs. 6(a) and (b), respectively. In the two sub-figures the first half of the scores are for the correct sequences and have values close to one, and the second half of the scores pertain to the incorrect sequences and have lower quality scores. Conclusively, the network predictions closely follow the values of the input quality scores for all data instances. We also validate the proposed approach using leave-one-out cross-validation (i.e., testing on one subject a model trained on all other subjects). The performance was comparable to the presented results using random test data, with the predicted quality scores closely following the ground truth values.

Fig. 6. (a) Predictions on the training set for exercise E1; (b) Predictions on the validation set for exercise E1.

The proposed model was next evaluated on the KIMORE dataset [50], which contains data for five rehabilitation exercises performed by 44 healthy subjects and 34 patients, and collected with a Kinect v2 sensor. We implemented our proposed deep learning model on the deep squat exercise. We employed full-body joint orientations data for 33 healthy subjects and 18 patients, and extracted 4 repetitions for each subject, resulting in 204 repetitions in total. The KIMORE dataset provides clinical scores for each subject's performance in the [0, 50] range. To train the model, we scaled the values in the [0, 1] range, and randomly selected 142 repetitions for training, and 62 for validation. The results are displayed in Fig. 7, where the predicted movement quality scores by the deep learning model closely follow the ground truth scores provided by the clinicians. The obtained mean absolute deviation was 0.03786, which is greater than the deviation for the deep squat exercise in the UI-PRMD dataset, probably due to the lower accuracy of Kinect v2 compared to Vicon, and also because in the KIMORE dataset the same clinical score is assigned to all repetitions performed by the same subject.

V. DISCUSSION

The article introduces a novel framework for the assessment of rehabilitation exercises via deep NNs. The framework includes performance metrics, scoring functions, and NN models. Common metrics for quantifying the level of consistency in captured rehabilitation movements are compared. The metrics include Euclidean, Mahalanobis, DTW distance, and GMM log-likelihood. The concept of separation degree is proposed for metric comparison. GMM log-likelihood outperformed the model-less metrics on the UI-PRMD dataset. Such results confirm our hypothesis that efficient movement assessment is strongly predicated on the provision of models of human movements. Probabilistic approaches, such as the used GMM approach, have improved ability to handle the inherent variability and measurement uncertainty in human movement data, in comparison to the model-less approaches.

We compared the performance of PCA and maximum variance approaches for dimensionality reduction of human movements to autoencoder NNs. Expectedly, the provision of nonlinear functions for neuron activations in autoencoders provides richer representational capacity of the data into a lower dimensional space, in comparison to the linear technique of PCA and the simple concept of maximum variance.

of algorithms for dimensionality reduction, performance metrics, scoring functions, and deep learning models. The framework is evaluated on a dataset of 10 rehabilitation exercises. The experimental results indicate that the quality scores generated by the proposed framework closely follow the ground truth quality scores for the movements. This work demonstrates the potential of deep learning models for assessment of rehabilitation exercises. Such models can consistently outperform the approaches that employ distance functions for movement assessment where the data processing is performed on low-level measurements of joint

Fig. 7.
(a) Predictions on the training set for deep squat exercise; (b) Predictions coordinates at the individual time-steps, and the probabilistic on the validation set for deep squat exercise. approaches where the data modeling is typically performed at a We propose a deep learning architecture for hierarchical single level of abstraction. The advantages of deep NNs for this spatio-temporal modeling of rehabilitation exercises at multiple task originate from the capacity for hierarchical modeling of levels of abstraction. NNs are trained for each exercise via human movements at multiple spatial and temporal levels of supervised regression, where for inputs comprising exercise abstraction. This type of models provide improved abilities to repetitions the inferred outputs are quality scores. The network “understand” the levels of hierarchy and the complex structure combines hierarchical merging of extracted feature spatiotemporal correlations in human movement data. vectors from different body parts, pyramidal processing of the movement sequences subsamples at multiple temporal scales, REFERENCES and multi-branch blocks for learning the structure of the used [1] S. R. Machlin, J. Chevan, W. W. Yu, and M. W. Zodet, “Determinants of computational units. Although recurrent units are most utilization and expenditures for episodes of ambulatory physical therapy commonly used for processing sequential time-series data as among adults,” Phys Ther, vol. 91, no. 7, pp. 1018–1029, Jul. 2011. [2] R. Komatireddy, A. Chokshi, J. Basnett, M. Casale, D. Goble, and T. the considered rehabilitation movements, our proposed model Shubert, “Quality and Quantity of Rehabilitation Exercises Delivered By employs convolutional filters in the initial layers and LSTM A 3-D Motion Controlled Camera: A Pilot Study,” Int J Phys Med recurrent units in the later layers of the network. The reasons Rehabil, vol. 2, no. 4, Aug. 2014. for such design stem from the following: (1) the employed [3] S. F. 
Bassett and H. Prapavessis, “Home-based physical therapy intervention with adherence-enhancing strategies versus clinic-based dataset is fairly small, consisting of less than 200 repetitions per management for patients with ankle sprains,” Phys Ther, vol. 87, no. 9, exercises, hence recurrent NNs can overfit the data due to the pp. 1132–1143, Sep. 2007. larger number of used parameters, and (2) a growing body of [4] K. Jack, S. M. McLean, J. K. Moffett, and E. Gardiner, “Barriers to work report of improved performance by CNNs on time-series treatment adherence in physiotherapy outpatient clinics: A systematic review,” Man Ther, vol. 15, no. 3–2, pp. 220–228, Jun. 2010. and movement data [51]. The proposed deep learning model [5] K. K. Miller, R. E. Porter, E. DeBaun-Sprague, M. Van Puymbroeck, and outperformed recent state-of-the-art deep NNs designed for A. A. Schmid, “Exercise after Stroke: Patient Adherence and Beliefs after movement classification. Discharge from Rehabilitation,” Top Stroke Rehabil, vol. 24, no. 2, pp. Our presented research has several limitations. The 142–148, 2017. [6] P. Maciejasz, J. Eschweiler, K. Gerlach-Hahn, A. Jansen-Troy, and S. validation is primarily performed on rehabilitation exercises Leonhardt, “A survey on robotic devices for upper limb rehabilitation,” performed by healthy subjects, where the measurements are Journal of NeuroEngineering and Rehabilitation, vol. 11, no. 1, p. 3, Jan. acquired with an expensive optical motion capturing system. Additionally, the largest segment of the validation is based on [7] L. V. Gauthier et al., “Video Game Rehabilitation for Outpatient Stroke (VIGoROUS): protocol for a multi-center comparative effectiveness trial movement data without a ground truth assessment of the of in-home gamified constraint-induced movement therapy for movement quality by clinicians. 
The evaluation of the deep rehabilitation of chronic upper extremity hemiparesis,” BMC Neurology, squat exercise in the KIMORE dataset provides a partial vol. 17, no. 1, p. 109, Jun. 2017. validation on patient data collected with a low-cost sensor. [8] A. Vakanski, J. M. Ferguson, and S. Lee, “Mathematical Modeling and Evaluation of Human Motions in Physical Therapy Using Mixture In future work, we will attempt to address the above-listed Density Neural Networks,” J Physiother Phys Rehabil, vol. 1, no. 4, Dec. shortcomings of this study, i.e., we will focus on a thorough validation of the framework on rehabilitation exercises [9] J. Choi, W. J. Jeon, and S.-C. Lee, “Spatio-temporal Pyramid Matching performed by patients and labeled by a group of clinicians who for Sports Videos,” in Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, New York, NY, USA, will assign quality scores. We will validate the proposed 2008, pp. 291–297. approach by acquiring muscle activity measurements. Also, we [10] Y. Du, W. Wang, and L. Wang, “Hierarchical recurrent neural network have plans to implement the framework for assessment of for skeleton based action recognition,” in 2015 IEEE Conference on patient performance in home-based rehabilitation using a Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1110– Kinect sensor. [11] A. Vakanski, H. Jun, D. Paul, and R. Baker, “A Data Set of Human Body Movements for Physical Rehabilitation Exercises,” Data, vol. 3, no. 1, p. VI. CONCLUSION 2, Jan. 2018. [12] X. Yun and E. R. Bachmann, “Design, Implementation, and The article proposes a deep learning-based framework for Experimental Results of a Quaternion-Based Kalman Filter for Human assessment of rehabilitation exercises. The framework consists 10 Body Motion Tracking,” IEEE Transactions on Robotics, vol. 22, no. 6, [31] C.-J. Su, C.-Y. Chiang, and J.-Y. Huang, “Kinect-enabled home-based pp. 1216–1227, Dec. 2006. 
rehabilitation system using Dynamic Time Warping and fuzzy logic,” [13] G. Panahandeh, N. Mohammadiha, A. Leijon, and P. Handel, Applied Soft Computing, vol. 22, pp. 652–666, Sep. 2014. “Continuous Hidden Markov Model for Pedestrian Activity [32] Z. Zhang, Q. Fang, and X. Gu, “Objective Assessment of Upper-Limb Classification and Gait Analysis.,” IEEE Trans. Instrumentation and Mobility for Poststroke Rehabilitation,” IEEE Transactions on Measurement, vol. 62, no. 5, pp. 1073–1083, 2013. Biomedical Engineering, vol. 63, no. 4, pp. 859–868, Apr. 2016. [14] Y. Huang, K. B. Englehart, B. Hudgins, and A. D. C. Chan, “A Gaussian [33] D. Antón, A. Goñi, and A. Illarramendi, “Exercise Recognition for mixture model based classification scheme for myoelectric control of Kinect-based Telerehabilitation,” Methods Inf Med, vol. 54, no. 02, pp. powered upper limb prostheses,” IEEE Transactions on Biomedical 145–155, 2015. Engineering, vol. 52, no. 11, pp. 1801–1811, Nov. 2005. [34] M. Capecci et al., “A Hidden Semi-Markov Model based approach for [15] A. Vakanski, I. Mantegh, A. Irish, and F. Janabi-Sharifi, “Trajectory rehabilitation exercise assessment,” Journal of Biomedical Informatics, Learning for Robot Programming by Demonstration Using Hidden vol. 78, pp. 1–11, Feb. 2018. Markov Model and Dynamic Time Warping,” IEEE Transactions on [35] J. F. Lin, M. Karg, and D. Kulić, “Movement Primitive Segmentation for Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 42, no. 4, pp. Human Motion Modeling: A Framework for Analysis,” IEEE 1039–1052, Aug. 2012. Transactions on Human-Machine Systems, vol. 46, no. 3, pp. 325–339, [16] D. Biswas, Z. Ye, E. B. Mazomenos, M. Jöbges, and K. Maharatna, Jun. 2016. “CORDIC Framework for Quaternion-based Joint Angle Computation to [36] J. F. Lin and D. 
Kulić, “Online Segmentation of Human Motion for Classify Arm Movements.,” in IEEE International Symposium on Automated Rehabilitation Exercise Analysis,” IEEE Transactions on Circuits and Systems, ISCAS 2018, 27-30 May 2018, Florence, Italy, Neural Systems and Rehabilitation Engineering, vol. 22, no. 1, pp. 168– 2018, pp. 1–5. 180, Jan. 2014. [17] E. B. Mazomenos et al., “Detecting Elementary Arm Movements by [37] F. Bashir, W. Qu, A. Khokhar, and D. Schonfeld, “HMM-based motion Tracking Upper Limb Joint Angles With MARG Sensors.,” IEEE J. recognition system using segmented PCA,” in IEEE International Biomedical and Health Informatics, vol. 20, no. 4, pp. 1088–1099, 2016. Conference on Image Processing 2005, 2005, vol. 3, pp. III–1288. [18] R. Houmanfar, M. Karg, and D. Kulic, “Movement Analysis of [38] H. Bourlard and Y. Kamp, “Auto-association by multilayer perceptrons Rehabilitation Exercises: Distance Metrics for Measuring Patient and singular value decomposition,” Biol. Cybern., vol. 59, no. 4, pp. 291– Progress,” IEEE Systems Journal, vol. 10, no. 3, pp. 1014–1025, Sep. 294, Sep. 1988. 2016. [39] A. Vakanski, J. M Ferguson, and S. Lee, “Metrics for Performance [19] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, Evaluation of Patient Exercises during Physical Therapy,” International “Sequential Deep Learning for Human Action Recognition,” in Human Journal of Physical Medicine & Rehabilitation, vol. 05, no. 03, 2017. Behavior Understanding, 2011, pp. 29–39. [40] C. M. Bishop, Pattern Recognition and Machine Learning. New York: [20] G. Lefebvre, S. Berlemont, F. Mamalet, and C. Garcia, “BLSTM-RNN Springer, 2011. Based 3D Gesture Classification,” in Artificial Neural Networks and [41] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood Machine Learning – ICANN 2013, 2013, pp. 381–388. from incomplete data via the EM algorithm,” Journal of the Royal [21] F. J. Ordóñez and D. 
Roggen, “Deep Convolutional and LSTM Recurrent Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977. Neural Networks for Multimodal Wearable Activity Recognition,” [42] G. J. McLachlan and K. E. Basford, “Mixture models : inference and Sensors, vol. 16, no. 1, p. 115, Jan. 2016. applications to clustering,” Thesis, New York, N.Y. : M. Dekker, 1988. [22] A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang, “NTU RGB+D: A Large [43] D. Zhang, X. Dai, and Y.-F. Wang, “Dynamic Temporal Pyramid Scale Dataset for 3D Human Activity Analysis,” arXiv:1604.02808 [cs], Network: A Closer Look at Multi-scale Modeling for Activity Apr. 2016. Detection,” in Computer Vision – ACCV 2018, 2019, pp. 712–728. [23] A. Jain, A. R. Zamir, S. Savarese, and A. Saxena, “Structural-RNN: Deep [44] Y. S. Zhao, Y. Xiong, L. Wang, Z. Wu, D. Lin, and X. Tang, “Temporal Learning on Spatio-Temporal Graphs,” in 2016 IEEE Conference on Action Detection with Structured Segment Networks,” 2017 IEEE Computer Vision and Pattern Recognition (CVPR), 2016, pp. 5308– International Conference on Computer Vision (ICCV), pp. 2933–2942, 5317. 2017. [24] S. Song, C. Lan, J. Xing, W. Zeng, and J. Liu, “An End-to-End [45] Z. Shou, D. Wang, and S.-F. Chang, “Temporal Action Localization in Spatio-Temporal Attention Model for Human Action Recognition from Untrimmed Videos via Multi-stage CNNs,” in 2016 IEEE Conference on Skeleton Data,” in Association for the Advancement of Artificial Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, Intelligence (AAAI), 2017, pp. 4263–4270. USA, 2016, pp. 1049–1058. [25] J. Bütepage, M. J. Black, D. Kragic, and H. Kjellström, “Deep [46] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4, Representation Learning for Human Motion Prediction and Inception-ResNet and the Impact of Residual Connections on Learning,” Classification,” in 2017 IEEE Conference on Computer Vision and arXiv:1602.07261 [cs], Feb. 2016. Pattern Recognition (CVPR), 2017, pp. 1591–1599. 
[47] C. Li, Q. Zhong, D. Xie, and S. Pu, “Co-occurrence Feature Learning [26] K. Fragkiadaki, S. Levine, P. Felsen, and J. Malik, “Recurrent Network from Skeleton Data for Action Recognition and Detection with Models for Human Dynamics,” in Proceedings of the 2015 IEEE Hierarchical Aggregation,” in Proceedings of the 27th International Joint International Conference on Computer Vision (ICCV), Washington, DC, Conference on Artificial Intelligence, Stockholm, Sweden, 2018, pp. USA, 2015, pp. 4346–4354. 786–792. [27] Z. Zhang, Q. Fang, L. Wang, and P. Barrett, “Template matching based [48] A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang, “NTU RGB+D: A Large motion classification for unsupervised post-stroke rehabilitation,” in Scale Dataset for 3D Human Activity Analysis,” 2016 IEEE Conference International Symposium on Bioelectronics and Bioinformations 2011, on Computer Vision and Pattern Recognition (CVPR), pp. 1010–1019, 2011, pp. 199–202. 2016. [28] I. Ar and Y. S. Akgul, “A computerized recognition system for the [49] K. Simonyan and A. Zisserman, “Two-Stream Convolutional Networks home-based physiotherapy exercises using an RGBD camera,” IEEE for Action Recognition in Videos,” in Advances in Neural Information Transactions on Neural Systems and Rehabilitation Engineering, vol. 22, Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. no. 6, pp. 1160–1171, 2014. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. [29] T. Hussain, H. F. Maqbool, N. Iqbal, M. Khan, Salman, and A. A. 568–576. Dehghani-Sanij, “Computational model for the recognition of lower limb [50] M. Capecci et al., “The KIMORE dataset: KInematic assessment of movement using wearable gyroscope sensor.,” IJSNet, vol. 30, no. 1, pp. MOvement and clinical scores for remote monitoring of physical 35–45, 2019. REhabilitation,” IEEE Transactions on Neural Systems and [30] H. Sakoe, “Dynamic programming algorithm optimization for spoken Rehabilitation Engineering, 2019. 
word recognition,” IEEE Transactions on Acoustics, Speech, and Signal [51] T. M. Le, N. Inoue, and K. Shinoda, “A Fine-to-Coarse Convolutional Processing, vol. 26, pp. 43–49, 1978. Neural Network for 3D Human Action Recognition,” arXiv:1805.11790 [cs], May 2018.
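
The evaluation steps described for the KIMORE deep squat experiment (clinical scores rescaled from [0, 50] to [0, 1], a random 142/62 split of the 204 repetitions, and the mean absolute deviation between predicted and ground truth scores) can be illustrated with a minimal Python sketch. The function names, the random seed, and the toy prediction values are assumptions made for illustration and are not taken from the authors' code. The only figures used from the text are the split sizes and the per-exercise deviations in the first numeric column of Table IV, whose mean reproduces the reported E1-E10 entry.

```python
import random


def scale_scores(scores, max_score=50.0):
    """Map clinical scores from the [0, max_score] range to [0, 1]."""
    return [s / max_score for s in scores]


def random_split(n_items, n_train, seed=0):
    """Randomly split range(n_items) into training and validation indices."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    return sorted(indices[:n_train]), sorted(indices[n_train:])


def mean_absolute_deviation(y_true, y_pred):
    """Average of |ground truth - prediction| over all repetitions."""
    if len(y_true) != len(y_pred):
        raise ValueError("score lists must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Split sizes stated in the text: 204 repetitions, 142 for training.
train_idx, val_idx = random_split(204, 142)
print(len(train_idx), len(val_idx))  # 142 62

# Toy values (assumed) showing the metric on scaled scores.
truth = scale_scores([50, 25, 40])
predicted = [0.9, 0.6, 0.8]
print(round(mean_absolute_deviation(truth, predicted), 4))  # 0.0667

# Per-exercise deviations E1..E10, first numeric column of Table IV.
per_exercise = [0.01077, 0.02824, 0.03980, 0.01185, 0.01870,
                0.01779, 0.03819, 0.02305, 0.02271, 0.04162]
overall = sum(per_exercise) / len(per_exercise)
print(round(overall, 5))  # 0.02527, matching the E1-E10 row
```

Because the deviation is computed on scores in [0, 1], a value such as 0.03786 corresponds to an average error of about 1.9 points on the original 50-point clinical scale.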

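
The leave-one-out validation mentioned for the exercise E1 results, in which a model trained on all other subjects is tested on the held-out subject, amounts to grouping repetitions by subject. The sketch below shows one possible way to generate such folds in Python. The helper name and the toy subject labels are hypothetical, and model training and scoring are deliberately omitted.

```python
from collections import defaultdict


def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx), one fold per distinct subject."""
    by_subject = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        by_subject[subject].append(index)
    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        # Training indices are all repetitions of every other subject.
        train_idx = sorted(
            i
            for subject, indices in by_subject.items()
            if subject != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Toy example (assumed labels): 3 subjects with 2 repetitions each.
subject_ids = ["s01", "s01", "s02", "s02", "s03", "s03"]
folds = list(leave_one_subject_out(subject_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
# s01 [2, 3, 4, 5] [0, 1]
# s02 [0, 1, 4, 5] [2, 3]
# s03 [0, 1, 2, 3] [4, 5]
```

Splitting by subject rather than by repetition keeps all repetitions of the test subject out of training, which matters when, as in KIMORE, every repetition of a subject shares the same clinical score.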
Journal

Statistics, arXiv (Cornell University)

Published: Jan 29, 2019
