Article

Application of Machine Learning Algorithms to Predict Body Condition Score from Liveweight Records of Mature Romney Ewes

Jimmy Semakula 1,2,*, Rene A Corner-Thomas 1, Stephen T Morris 1, Hugh T Blair 1 and Paul R Kenyon 1

1 School of Agriculture and Environment, Massey University, Private Bag 11222, Palmerston North 4410, New Zealand; R.Corner@massey.ac.nz (R.A.C.-T.); S.T.Morris@massey.ac.nz (S.T.M.); H.Blair@massey.ac.nz (H.T.B.); P.R.Kenyon@massey.ac.nz (P.R.K.)
2 National Agricultural Research Organization, Entebbe P.O. Box 295, Uganda
* Correspondence: J.Semakula@massey.ac.nz

Abstract: Body condition score (BCS) in sheep (Ovis aries) is a widely used subjective measure of the degree of soft tissue coverage. Body condition score and liveweight are statistically related in ewes; therefore, it was hypothesized that BCS could be accurately predicted from liveweight using machine learning models. Individual ewe liveweight and body condition score data at each stage of the annual cycle (pre-breeding, pregnancy diagnosis, pre-lambing and weaning) at 43 to 54 months of age were used. Nine machine learning (ML) algorithms (ordinal logistic regression, multinomial regression, linear discriminant analysis, classification and regression tree, random forest, k-nearest neighbors, support vector machine, neural networks and gradient boosting decision trees) were applied to predict BCS from a ewe's current and previous liveweight record. A three-class BCS (1.0–2.0, 2.5–3.5, >3.5) scale was used due to high class imbalance in the five-scale BCS data. The results showed that using ML to predict ewe BCS at 43 to 54 months of age from current and previous liveweight could be achieved with high accuracy (>85%) across all stages of the annual cycle. The gradient boosting decision tree algorithm (XGB) was the most efficient for BCS prediction regardless of season. All models had balanced specificity and sensitivity. The findings suggest that there is potential for predicting ewe BCS from liveweight using classification machine learning algorithms.

Keywords: accuracy; predictor; models; classification

Citation: Semakula, J.; Corner-Thomas, R.A.; Morris, S.T.; Blair, H.T.; Kenyon, P.R. Application of Machine Learning Algorithms to Predict Body Condition Score from Liveweight Records of Mature Romney Ewes. Agriculture 2021, 11, 162. https://doi.org/10.3390/agriculture11020162

Academic Editors: Michele Mattetti and Luigi Alberti
Received: 19 January 2021
Accepted: 13 February 2021
Published: 17 February 2021

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Body condition score (BCS) in sheep (Ovis aries) is a widely used subjective measure of the degree of soft tissue coverage (predominantly fat and muscle) of the lumbar vertebrae region [1,2]. Body condition score is based on a 1–5 scale using half units or quarter units and is conducted by palpation of the lumbar vertebrae immediately caudal to the last rib above the kidneys [2]. Unlike liveweight (LW), BCS is not affected by fluctuations in gut-fill, fleece weight and frame size, which confound liveweight as a measure of animal size to give an indication of body condition [3]. BCS can be easily learned and is cost-effective and requires no specialist equipment [2]. The optimal BCS range for ewe performance is 2.5 to 3.5 [2]; outside this range performance is either adversely affected or it is inefficient in terms of performance per kilogram of feed eaten [4]. Farmers can use targeted feeding based on this optimal range to optimize overall performance.

Despite the advantages of using BCS over liveweight (LW) for flock management, many farmers in extensive farming systems do not regularly do so. For instance, only 7% and 40% of the farmers indicated that they conducted hands-on BCS in Australia and New Zealand, respectively [5,6]. Farmers often rely on visual inspection (a method which is inaccurate) or they only use a liveweight measure [7], which is influenced by factors including gut fill variation, frame size, physiological stage and fleece weight [2]. The low uptake of BCS among farmers may in some part be due to challenges such as assessor subjectivity and extra labor requirements [2]. Attempts to increase the uptake of BCS among farmers, including use of promotional training workshops and regular training, have not yielded the desired outcome, likely because they do not directly alleviate the labor burden related to hands-on BCS [2]. Therefore, accurate and reliable alternative methods to estimate body condition score with less hands-on measurement would be advantageous and would likely improve the uptake of BCS technology, especially for large flocks.

Ewe BCS and LW are correlated [2,8,9]. This relationship varies by age, stage of the annual cycle and breed of animal [8,9]. Semakula et al. [9] reported that in Romney ewes, both LW and BCS plateaued after they reached 43–54 months of age, thereby establishing a stable base BCS–LW relationship. This means that, as a ewe ages, future liveweights, based on BCS–LW prediction equations, could potentially be used to predict a BCS with a degree of accuracy and reduce the need for hands-on BCS measurement.
Modern automated weighing systems with individual electronic identification offer an opportunity to collect lifetime data relatively easily and quickly. With such large datasets, it has become possible to process and extract valuable information. Semakula et al. [10] applied multivariate regression models to predict ewe BCS from lifetime liveweight data as a ewe aged from eight to sixty-seven months. At best, these multivariate models explained 49% and 21% of the variability in BCS using the five-scale (nine points) and three-scale (three points), respectively. Further, BCS was skewed with little variability due to the limited nature of the BCS scale used (1–5, in increments of 0.5). Using only discrete values such as BCS can lead to the heaping or grouping of all possible values (i.e., non-continuous) at isolated points, affecting the resolution and ultimately the accuracy of any prediction model.

Approaches that circumvent the challenges of treating discrete data as continuous are required for BCS prediction. Classification-based models are recommended for discrete and categorical data analysis [11–14]. Among these classification approaches, machine learning (ML) classification models have been used with greater success than traditional statistical methods in sheep production for early estimation of the growth and quality of wool in adult Australian Merino sheep [15] and of sheep carcass traits [16] from early-life data. Machine learning utilizes algorithms whose logic can be learned directly from unique patterns in the data or implicitly through pre-programmed classical statistical methods [17]. The successful use of ML algorithms in various fields of science warrants their application in animal production problem solving [18,19]. Ideally, it should be possible to install this computer-acquired intelligence into modern weighing systems to automatically explore patterns in lifetime liveweights and predict BCS.
The aim of this study was to investigate the use of machine learning algorithms to predict ewe BCS from current and previous liveweight data. In the present study, ewe BCS was predicted for ewes in their fourth year of life (43–54 months) at four stages of the annual cycle using previous liveweight measurements.

2. Materials and Methods

2.1. Farms and Animals Used and Data Collection

The current study was a follow-up of two previous studies [9,10]. In the first, Semakula et al. [9] only determined the nature of the relationship between LW and BCS (linear) and the factors affecting that relationship (ewe age, stage of annual cycle and pregnancy rank). In the subsequent study, Semakula et al. [10] demonstrated the potential of predicting ewe BCS as a continuous variable from liveweight and previous BCS records. The resulting linear models had high prediction error (>10%), and a large part of the variability in BCS (from 39 to 89%) remained unexplained. The current study attempts to predict BCS from LW records in a more precise way, using machine learning algorithms. The details of how the animals were managed and how data were collected were reported in Semakula et al. [9].

2.2. Statistical Analyses

Data were analyzed using R program version 3.4.4 [20] with caret package extensions [21]. Data were initially explored to assess completeness and were summarized by BCS to determine class distribution. Missing values (n = 26) were imputed using the bagImpute method from the caret package. This method constructs a "bagging" model for a given variable based on regression trees, using all other variables as predictors while maintaining the original data distribution structure [21]. Liveweight data were normalized and centered during analysis using the preProcess function from the caret package.
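The centering and scaling step can be illustrated with a minimal sketch. It is written in Python rather than the R/caret workflow used in the study, the liveweights are hypothetical, and the function names `fit_standardizer` and `standardize` are illustrative only. The sketch assumes that the scaling parameters are estimated on the training records and then reused for the test records.

```python
from statistics import mean, stdev


def fit_standardizer(values):
    """Estimate the mean and sample standard deviation from training data."""
    return mean(values), stdev(values)


def standardize(values, mu, sd):
    """Center and scale values using previously estimated parameters."""
    return [(v - mu) / sd for v in values]


# Hypothetical liveweights in kg (illustrative values, not study data).
train_lw = [62.0, 66.0, 70.0, 74.0, 78.0]
test_lw = [64.0, 76.0]

# Parameters come from the training records only, so that no information
# from the test records leaks into the preprocessing step.
mu, sd = fit_standardizer(train_lw)
train_z = standardize(train_lw, mu, sd)
test_z = standardize(test_lw, mu, sd)
```

After this step the training liveweights have mean 0 and standard deviation 1, and the test liveweights are expressed on the same scale.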
The distribution of BCS at all stages of the annual cycle showed that on a full BCS scale (1–5), there were high class imbalances (more than 1:50 for any two classes). The average ratios of the class frequencies (minimum:maximum) were 1:216, 1:1336, 1:498 and 1:97 for pre-breeding, pregnancy diagnosis, pre-lambing and weaning, respectively (Figure 1A). The extreme imbalance arose because there were very few extreme BCS cases, with the majority of individual BCS measurements ranging from 2.5 to 3.5. Triguero et al. [22] categorized class imbalances above 50:1 for any two outcomes as high class imbalance.

Body condition score data are both discrete and ordered in nature, which makes multiclass classification regression approaches more suitable for their analysis. However, when the underlying assumptions are grossly violated or when classes are extremely imbalanced [23], classification statistical methods become less accurate [24]. Strategies to overcome the challenge of class imbalance may include resampling techniques such as oversampling, undersampling and synthetic minority oversampling [25]. Such methods of circumventing class imbalances hold in cases below 50:1 imbalance. In the case of high class imbalance, the samples generated become less representative of the true sample distribution, leading to underfitting or overfitting of the model.

[Figure 1: bar charts of the proportion of records (y-axis: Proportion, 0.00–0.75) against body condition score (x-axis). Panel (A): BCS 1.0 to 4.0 in steps of 0.5. Panel (B): BCS classes 1.0–2.0, 2.5–3.5 and >3.5.]

Figure 1. Distribution of ewe body condition scores by stage of the annual cycle from 18,354 individual records of 5761 ewes during their fourth year (43–54 months) of age. Bar colors (grey, yellow, blue and green) indicate BCS proportions at pre-breeding, pregnancy diagnosis, pre-lambing and weaning, respectively. In (A), a BCS scale of 1–4 points was used, and in (B), a 1–3 scale (BCS 1.0–2.0: 1, 2.5–3.5: 2 and >3.5: 3).
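As a minimal sketch of the ideas above, the following Python snippet maps half-unit BCS values to the three-class scale given in the Figure 1 caption and computes the ratio between the most and least frequent classes. The scores are hypothetical, the function names are illustrative, and the snippet assumes half-unit scoring (values between 2.0 and 2.5 are not covered by the published class definitions).

```python
from collections import Counter


def to_three_class(bcs):
    """Map a half-unit BCS to the three-class scale (1, 2 or 3)."""
    if bcs <= 2.0:
        return 1  # BCS 1.0-2.0
    if bcs <= 3.5:
        return 2  # BCS 2.5-3.5
    return 3      # BCS > 3.5


def imbalance_ratio(labels):
    """Return the largest class count divided by the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical scores (illustrative values, not study data).
scores = [1.5] * 1 + [2.0] * 4 + [2.5] * 30 + [3.0] * 60 + [3.5] * 20 + [4.0] * 5

full_scale_ratio = imbalance_ratio(scores)
three_class_ratio = imbalance_ratio([to_three_class(s) for s in scores])
```

With these made-up counts, collapsing to three classes lowers the worst-case ratio from 60:1 to 22:1, which illustrates why a coarser scale can bring the data below the 50:1 threshold before any resampling is applied.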
To improve the balance of the BCS class distribution, a new but narrower three-class BCS scale was devised (BCS 1.0–2.0: 1, 2.5–3.5: 2 and >3.5: 3) (Figure 1B). The selection of the new scale was guided by the literature, where a BCS of 2.5 to 3.5 is considered to be the range for optimal performance [2]. Below this BCS range, there is reduced performance; above this range, energy is used inefficiently. In addition, the resulting classes were resampled through minority class oversampling to create "synthetic" data, a method popularly known as SMOTE [25], using the SmoteClassif function in the UBL package [26]. Resampling improves the class-level distribution (balances the number of observations per class) of a categorical variable so that the assumptions of classification models can hold.

2.2.1. Variable Selection and Model Building

The best variable combinations for prediction of BCS (1, 2 or 3) at each stage of the annual cycle using liveweight were selected through the regularization and variable selection technique utilizing the elastic net method in the glmnet extension [27] in the caret package [21]. The elastic net method combines the power of two penalized-regularization methods (ridge and lasso regression) to search for significant predictors and to handle collinearity [28].

All models were fitted and validated using four steps as described by Semakula et al. [9]. The steps included: (i) data partitioning, (ii) resampling, (iii) model training and (iv) validation. Data were partitioned with stratification into training and testing datasets in a ratio of 3:1, with replacement. Resampling was done using the bootstrapping and aggregation [29] procedures in the caret package [21]. During resampling, 10 equal-sized subsamples, repeated three times, were selected from the dataset.
Prediction models were trained on nine subsample sets, which were used to compute the parameters, and the 10th was used to evaluate the model as well as compute the error. The procedure was run 30 times (10 folds repeated three times), and the average parameter values and their probabilities were computed as described by Semakula et al. [9].

The algorithms used for this work were selected from a range of probabilistic and nonprobabilistic methods in order to cover the most commonly used machine learning algorithms [17,30]. A summary of the concepts, advantages and disadvantages of each algorithm is given in Table A1 in Appendix A. Further, the criteria for selecting these methods included (i) successful application in other animal science studies [16,19,20] and (ii) ability to handle multiclass categorization [24]. Three traditional statistical models (white box or low-level machine learning models: ordinal logistic regression and multinomial regression [31,32], and linear discriminant analysis (LDA) [33]), two low-level black box models (random forest (RF) [34] and classification and regression trees (CART) [35]) and four high-level black box models (support vector machines (SVM) [36], k-nearest neighbors (K-NN) [37,38], neural networks (ANN) and gradient boosting decision trees (XGB) [39]) were compared.

Machine learning models can be categorized in two main ways: (i) whether the data provide labels that classify variables (supervised) or not (unsupervised) [40]; and (ii) whether a clear description of the analysis detailing how covariates and the target variable are related can be given (classical statistical methods or white boxes), only a partial description or blueprint can be given (low-level or semi-black boxes), or no description can be given (high-level black boxes) [17]. All algorithms were implemented in R using several caret package extensions (nnet, multinom, polr, lda, rpart, svmLinear, xgbLinear, rf and knn; http://topepo.github.io/caret/index.html).
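The data partitioning and resampling steps described in Section 2.2.1 can be sketched as follows. This is an illustrative Python sketch, not the caret procedure used in the study: it assumes a 3:1 stratified split drawn without replacement and plain (non-stratified) 10-fold cross-validation repeated three times, and the names `stratified_split` and `repeated_kfold`, the seeds and the class counts are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.75, seed=1):
    """Split record indices 3:1 while keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def repeated_kfold(indices, k=10, repeats=3, seed=1):
    """Yield (fit, holdout) index lists: k folds, reshuffled for each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for j in range(k):
            holdout = folds[j]
            # Fit on the other k - 1 folds, evaluate on the held-out fold.
            fit = [i for m, fold in enumerate(folds) if m != j for i in fold]
            yield fit, holdout


# Hypothetical three-class BCS labels (illustrative counts, not study data).
labels = [1] * 40 + [2] * 120 + [3] * 40
train_idx, test_idx = stratified_split(labels)
splits = list(repeated_kfold(train_idx))
```

Each of the 30 splits fits on nine folds and evaluates on the tenth; averaging the 30 error estimates gives the resampled performance, while `test_idx` is kept aside for the final validation.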
A chart summarizing the model building and evaluation procedures is given in the appendices (Figure A1).

2.2.2. Model Performance Evaluation

Using a three-class BCS scale (1.0–2.0, 2.5–3.5, >3.5), model fit and ranking between models were assessed using overall accuracy, balanced accuracy, precision, F-measure, sensitivity and specificity. The metrics were computed from the number of true positive (TP), true negative (TN), false positive (FP) and false negative (FN) predictions as described by Tharwat [24]. In addition, Cohen's kappa statistic [41], a common measure of agreement between classifications of qualitative observations, was calculated as described by McHugh [42] and Botchkarev [43]. To evaluate the power of the algorithms to correctly classify ewe BCS, measures of the balance (authenticity and prediction power) between sensitivity and specificity were computed. These indicators of model power and authenticity (positive likelihood ratio, negative likelihood ratio and Youden's index) combine sensitivity and specificity to emphasize how well a model can predict the outcome [44]. A detailed description of the metrics (accuracy and authenticity) used in model assessment is given in Table 1.

Table 1. Model performance evaluation metrics.

| Metric | Definition | Formula |
|---|---|---|
| Balanced accuracy | The proportion of correctly classified subjects for each class. Useful especially when there is class imbalance. | $\text{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP}$ |
| Precision | The proportion of subjects classified into a given class that truly belonged to that class. | $\text{Precision} = \frac{TP}{TP + FP}$ |
| F-measure | The harmonic mean of precision and sensitivity; best when there is some balance between precision and sensitivity. | $F\text{-measure} = \frac{2 \times \text{sensitivity} \times \text{precision}}{\text{sensitivity} + \text{precision}}$ |
| Sensitivity | The proportion of correctly classified subjects for a given class to those who truly belong to that class. | $\text{Sensitivity} = \frac{TP}{TP + FN}$ |
| Specificity | The proportion of subjects correctly classified as not belonging to a given class to those that truly do not belong to that class. | $\text{Specificity} = \frac{TN}{TN + FP}$ |
| Positive likelihood ratio (PLR) | The ratio between the true positive and the false positive rates for "positive" events that are detected by a model. | $\text{PLR} = \frac{\text{Sensitivity}}{1 - \text{Specificity}}$ |
| Negative likelihood ratio (NLR) | The ratio between the false negative and true negative rates; mirrors the probability for "negative" events to be detected by a model. | $\text{NLR} = \frac{1 - \text{Sensitivity}}{\text{Specificity}}$ |
| Youden's index (YI) | The sum of sensitivity and specificity minus one. | $\text{YI} = \text{Sensitivity} + \text{Specificity} - 1$ |
| Cohen's kappa (κ) | Measures the degree of agreement between two raters or ratings (inter-rater reliability). | $\kappa = \frac{p_o - p_e}{1 - p_e}$ |

Where: TP = true positive, TN = true negative, FP = false positive, FN = false negative, κ = Cohen's kappa statistic, $p_o$ = actual observed agreement, and $p_e$ = chance agreement.

The analysis generated a dataset of 108 records (4 time points, 3 BCS classes and 9 models) for two groups of model performance evaluation metrics: firstly, the indicators of accuracy (balanced accuracy, precision and F-measure), and secondly, the measures of model authenticity (sensitivity and specificity). To obtain a holistic picture of the overall model performance, the two groups of performance metrics were examined. Initially, each group of variables was explored using principal component analysis (PCA) to determine the appropriate number of components or dimensions, where the eigenvalues associated with each component were compared with those generated through a probabilistic process based on Monte Carlo PCA for parallel analysis simulation [45,46]. Monte Carlo PCA simulated eigenvalues allow comparisons based on the same sample size and number of variables. If the eigenvalue of a component from real data is greater than the simulated one, then that component is important.
Otherwise, if the eigenvalue is equal to or less than the simulated one, the component is considered not important. Consequently, one component was considered important from each group of variables (indicators of accuracy: explained variance = 87%; indicators of sensitivity–specificity: explained variance = 61%), having explained most of the variability in the group data.

Principal component analysis is limited to continuous data. In order to decipher the patterns in the relationship between the categorical variable (BCS) and each model regarding their overall performance, a correspondence analysis was required. Therefore, the FAMD function in the FactoMineR package [47] was used to analyze both groups of variables. The FAMD extension combines PCA and multiple correspondence analysis (MCA) to conduct factor analysis. Each group of variables then resulted in a single dimension (latent variable). A scatterplot of accuracy and sensitivity–specificity latent variables was constructed for each of the four stages of the annual sheep weighing cycle. Models were ranked on a scale of 1 to 9 (where 1 is the best and 9 is the poorest) at each stage of the annual cycle, to obtain the overall performance rank.

3. Results

3.1. Overall Performance of Machine Learning Models

This section presents results for the accuracy in a broad sense, sensitivity and specificity of nine models in predicting ewe BCS based on the testing dataset (Tables 2 and 3). Additionally, Table A2 in the appendix shows the comparisons between model accuracy across stages of the annual sheep weighing cycle in New Zealand.

Results showed that there were significant (p < 0.05) differences in model prediction performance based on the Bonferroni p-value adjustment method for pairwise comparisons (Table A2, Appendix A).
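The class-level metrics defined in Table 1 and reported in Tables 2 and 3 can be illustrated with a short Python sketch that derives them one-vs-rest from a three-class confusion matrix. The matrix is hypothetical and the function names are illustrative. Two quantities are shown for accuracy: the Table 1 formula, and the mean of sensitivity and specificity, which is an assumption about how the balanced accuracy was obtained (it is the convention used by caret's `confusionMatrix`).

```python
def one_vs_rest_counts(cm, c):
    """Return (TP, FN, FP, TN) for class c; cm[i][j] = true i, predicted j."""
    total = sum(sum(row) for row in cm)
    tp = cm[c][c]
    fn = sum(cm[c]) - tp
    fp = sum(row[c] for row in cm) - tp
    tn = total - tp - fn - fp
    return tp, fn, fp, tn


def class_metrics(cm, c):
    """Compute the Table 1 metrics for class c (assumes no zero denominators)."""
    tp, fn, fp, tn = one_vs_rest_counts(cm, c)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    prec = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),  # formula as in Table 1
        "balanced_accuracy": (sens + spec) / 2,       # assumed convention
        "precision": prec,
        "f_measure": 2 * sens * prec / (sens + prec),
        "sensitivity": sens,
        "specificity": spec,
        "plr": sens / (1 - spec),
        "nlr": (1 - sens) / spec,
        "youden": sens + spec - 1,
    }


def cohen_kappa(cm):
    """Cohen's kappa from observed (p_o) and chance (p_e) agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(n)) / total
    p_e = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(n)) / total**2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical confusion matrix for BCS classes 1, 2 and 3 (not study data).
cm = [
    [40, 10, 0],
    [5, 80, 15],
    [0, 10, 40],
]
m = class_metrics(cm, 0)
kappa = cohen_kappa(cm)
```

For the first class of this made-up matrix, sensitivity is 0.80 and specificity is about 0.97, so the Youden's index is about 0.77; the overall kappa is 0.68.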
The gradient boosting decision tree algorithm (XGB) had the highest (p < 0.05) accuracy (average = 90.8%) and kappa statistic (κ = 82.1%) at pre-breeding, pregnancy diagnosis, pre-lambing and weaning, making it the most accurate algorithm for ewe BCS prediction on the one to three (1.0–2.0; 2.5–3.5; >3.5) scale (Table 2). The RF algorithm (Figure A2, Appendix A) had a slightly lower but still good accuracy, making it the best alternative to XGB. The multinom, LDA, ordinal and CART algorithms had moderate to fair accuracies. At pre-lambing, XGB and RF were comparable and had the highest accuracies. The random forest and k-nearest neighbors (K-NN) models, in decreasing order, were also considered good prediction models, having scored above 80% accuracy and 70% kappa statistics at all times of the year. The CART algorithm consistently gave the lowest (p < 0.05) accuracy except at pre-lambing, where its accuracy was comparable (p = 0.047; Table A2) to that of ordinal logistic regression. The lowest average accuracy was 66.6%, seen for the CART model at weaning (Table 2). Overall, all algorithms had greater accuracy than a random guess (i.e., accuracy = 33.3%) in classifying BCS.

In terms of overall authenticity, models were biased towards being more specific than sensitive (Table 3). The ranking of model authenticity followed a trend like that of accuracy. The gradient boosting decision tree algorithm (XGB) had the highest sensitivity (average = 87.7%) as well as specificity (average = 93.9%) across all stages of the annual sheep weighing cycle, making it the most authentic and powerful algorithm for categorizing ewes into the correct BCS classes on the three-point scale (1.0–2.0; 2.5–3.5; >3.5) (Table 3). The XGB model was closely followed by RF (average sensitivity = 85.5%, average specificity = 92.8%), while CART (average sensitivity = 58.7%, average specificity = 79.5%) was the poorest.

Table 2.
Accuracy and kappa statistics of nine predictive models for ewe BCS at 43–54 months of age at different stages of the annual cycle. Values in parentheses denote the minimum and maximum accuracy, in ascending order.

| Model | Pre-breeding accuracy | Pre-breeding kappa (κ) | Pregnancy diagnosis accuracy | Pregnancy diagnosis kappa (κ) | Pre-lambing accuracy | Pre-lambing kappa (κ) | Weaning accuracy | Weaning kappa (κ) |
|---|---|---|---|---|---|---|---|---|
| XGB | 89.5 (85.6–97.5)^3,1 | 79.6 | 91.2 (88.5–93.4)^3,1 | 82.3 | 90.6 (88.8–91.4)^2,1 | 82.9 | 91.7 (90.1–93.2)^1,3 | 83.4 |
| RF | 89.0 (84.7–96.6)^2,1 | 78.0 | 90.0 (87.5–92.9)^3,1 | 78.0 | 89.2 (86.6–91.6)^2,3 | 78.5 | 88.6 (88.2–89.6)^1,3 | 77.1 |
| K-NN | 87.0 (81.2–95.7)^2,1 | 75.5 | 86.8 (84.7–89.8)^3,1 | 75.5 | 86.2 (83.0–89.7)^2,3 | 66.0 | 86.4 (84.6–88.8)^2,3 | 77.7 |
| SVM | 86.7 (78.8–96.6)^2,1 | 75.9 | 88.5 (84.8–93.1)^2,1 | 73.7 | 73.8 (72.0–74.7)^2,1 | 71.7 | 88.8 (85.3–91.2)^2,3 | 72.7 |
| ANN | 85.2 (79.0–94.2)^2,1 | 72.2 | 82.0 (80.5–85.1)^2,1 | 65.6 | 78.9 (75.5–82.4)^1,3 | 69.5 | 84.0 (82.0–86.9)^1,3 | 68.0 |
| Multinom | 82.7 (76.4–91.7)^2,1 | 66.8 | 77.6 (73.8–80.0)^3,1 | 56.1 | 73.5 (71.8–75.1)^1,3 | 48.8 | 75.9 (74.4–78.1)^3,2 | 51.8 |
| LDA | 81.2 (73.8–91.1)^2,1 | 63.6 | 77.1 (72.2–79.6)^3,1 | 54.6 | 73.8 (71.5–75.5)^1,3 | 49.5 | 75.9 (74.4–78.7)^1,2 | 51.7 |
| Ordinal | 79.6 (70.7–88.4)^2,1 | 48.4 | 72.7 (67.6–75.8)^2,1 | 47.7 | 68.4 (58.7–74.8)^2,3 | 37.0 | 72.4 (67.8–76.2)^2,1 | 44.9 |
| CART | 72.6 (58.6–85.1)^2,1 | 47.3 | 69.8 (64.0–73.3)^3,1 | 40.5 | 67.5 (62.8–71.1)^1,2 | 41.8 | 66.6 (61.4–70.1)^2,1 | 33.2 |

Model: XGB: gradient boosting decision trees, RF: random forest, K-NN: k-nearest neighbors, SVM: support vector machines, ANN: neural networks, Multinom: multinomial regression, LDA: linear discriminant analysis, Ordinal: ordinal logistic regression, CART: classification and regression tree. The superscripts (^1: 1.0–2.0, ^2: 2.5–3.5 and ^3: >3.5) indicate the BCS class from which the value was observed. The first superscript indicates the class from which the minimum estimate was observed, while the second indicates the class from which the maximum estimate was achieved. All models were significant (p < 0.05) and better than a random guess (i.e., accuracy = 33.3%). All ewe BCS predictions were based on current and previous liveweight.

Table 3. Indicators of authenticity (sensitivity and specificity) of nine predictive models for ewe BCS at 43–54 months of age at different stages of the annual cycle. Values in parentheses denote the minimum and maximum sensitivity or specificity, in ascending order.

| Model | Pre-breeding sensitivity | Pre-breeding specificity | Pregnancy diagnosis sensitivity | Pregnancy diagnosis specificity | Pre-lambing sensitivity | Pre-lambing specificity | Weaning sensitivity | Weaning specificity |
|---|---|---|---|---|---|---|---|---|
| XGB | 86.0 (79.7–96.3)^3,1 | 93.1 (89.1–98.9)^2,1 | 88.2 (83.7–90.4)^3,1 | 94.2 (93.1–96.3)^2,1 | 87.5 (85.9–88.8)^1,3 | 93.8 (89.7–97.5)^2,1 | 89.0 (84.8–92.3)^1,2 | 94.5 (91.6–96.5)^2,3 |
| RF | 85.3 (80.0–95.3)^2,1 | 92.8 (89.3–97.9)^2,1 | 86.7 (80.9–90.3)^3,1 | 93.4 (90.5–95.5)^2,1 | 85.6 (82.6–88.6)^1,3 | 92.8 (87.5–96.4)^2,1 | 84.8 (82.5–87.6)^1,2 | 92.4 (88.9–93.4)^2,3 |
| SVM | 82.6 (74.8–93.8)^2,1 | 91.4 (87.5–97.5)^2,3 | 82.3 (75.3–84.2)^3,2 | 91.2 (84.2–95.4)^2,1 | 81.5 (73.5–86.1)^1,3 | 90.8 (81.1–98.1)^2,1 | 81.9 (77.6–85.6)^3,2 | 90.9 (83.5–95.1)^2,3 |
| K-NN | 82.2 (66.8–96.2)^2,1 | 91.2 (85.9–97.0)^3,1 | 84.7 (75.5–91.8)^2,1 | 92.3 (88.4–94.5)^3,1 | 65.0 (63.0–67.3)^1,2 | 82.5 (76.8–86.4)^2,1 | 85.1 (78.6–88.9)^2,3 | 92.6 (91.9–93.6)^2,3 |
| ANN | 80.2 (71.3–91.7)^2,1 | 90.2 (86.7–96.7)^2,1 | 76.0 (73.2–78.0)^3,1 | 88.0 (84.3–92.2)^2,1 | 71.8 (56.5–80.2)^1,3 | 85.9 (78.8–94.4)^2,1 | 78.7 (70.5–84.1)^1,2 | 89.3 (82.4–93.5)^2,1 |
| Multinom | 76.8 (68.5–89.0)^2,1 | 88.5 (84.4–94.5)^2,1 | 70.0 (62.7–71.4)^3,2 | 85.1 (81.8–88.7)^2,1 | 64.7 (58.6–68.7)^1,3 | 82.4 (80.6–84.9)^2,1 | 67.9 (63.3–76.2)^3,1 | 83.9 (80.1–86.2)^2,1 |
| LDA | 74.9 (64.7–87.7)^2,1 | 87.6 (82.8–94.4)^2,1 | 69.4 (57.1–82.7)^3,2 | 84.8 (76.6–90.7)^2,1 | 65.0 (56.3–69.4)^1,3 | 82.5 (79.2–86.8)^2,1 | 67.8 (61.5–79.8)^3,2 | 83.9 (77.6–87.4)^2,3 |
| Ordinal | 72.7 (61.6–82.4)^2,1 | 86.5 (79.7–94.5)^2,1 | 63.6 (60.7–67.9)^2,3 | 81.7 (73.1–90.9)^2,1 | 57.9 (41.4–69.3)^2,3 | 79.0 (76.1–80.8)^2,1 | 63.2 (58.3–68.5)^3,1 | 81.6 (72.8–88.2)^2,3 |
| CART | 63.3 (37.0–82.5)^2,1 | 81.9 (77.6–87.8)^3,1 | 59.7 (41.1–77.3)^3,2 | 80.0 (67.1–86.0)^2,3 | 56.7 (37.9–72.3)^1,2 | 78.3 (71.2–87.7)^2,1 | 55.4 (39.2–62.9)^2,1 | 77.7 (72.4–83.6)^3,2 |

Model: XGB: gradient boosting decision trees, RF: random forest, K-NN: k-nearest neighbors, SVM: support vector machines, ANN: neural networks, Multinom: multinomial regression, LDA: linear discriminant analysis, Ordinal: ordinal logistic regression, CART: classification and regression tree. The superscripts (^1: 1.0–2.0, ^2: 2.5–3.5 and ^3: >3.5) indicate the BCS class from which the value was observed. In their sequence, the first superscript indicates the class from which the minimum estimate was observed, while the second indicates the class from which the maximum estimate was achieved. All ewe BCS predictions were based on current and previous liveweight.

In the following section we present results for the construct or latent variables which are representative of the three specific measures of model accuracy (class-level or balanced accuracy, precision and F-measure) together with two indicators of predictive power/authenticity (sensitivity, specificity) across the four stages of the annual sheep weighing cycle (Figures 2–5). A summary of the indicators of accuracy and authenticity is provided in Tables 2 and 3. Additionally, Table A3 provides two extra measures of accuracy (precision and F-measure) used in the construction of the accuracy latent variable. The results show the patterns in the relationship between the latent variables and BCS class prediction for each model. The CART model had the lowest accuracy and power measures across all stages of the annual sheep weighing cycle and was selected as the reference for comparisons.

3.1.1. Pre-Breeding

At pre-breeding, the models had a clear-cut hierarchy in performance, with XGB being the best and CART the poorest (Figure 2). The XGB was the best algorithm, being 17 percentage points more accurate than CART, which was the least accurate in predicting ewe BCS (Table 2).
The best balance between accuracy and authenticity (points along or touching the diagonal line) was observed in the moderately performing models, including ANN, multinom, LDA and ordinal (Figure 2). The best performing models (XGB, RF, SVM and K-NN) were biased towards accuracy, while the poorest (CART) was biased towards authenticity. In terms of BCS, the best accuracy was achieved in the 1.0–2.0 class and the lowest in the 2.5–3.5 class for all models except XGB, which was least accurate in the >3.5 class. The best accuracy (97.5%) was achieved using XGB in the 1.0–2.0 BCS class, and the lowest (58.6%) was observed using the CART algorithm in the 2.5–3.5 class (Table 2, parentheses).

All models were most sensitive to the 1.0–2.0 class and least sensitive to the 2.5–3.5 class except XGB, which was least sensitive to the >3.5 class. The XGB was the best algorithm, being 23 percentage points more sensitive than CART, which was the least sensitive in predicting ewe BCS (Table 3). The highest BCS classification sensitivity was observed using the XGB and K-NN models (96.3%) in the 1.0–2.0 BCS class, while CART (37.0%) had the lowest in the 2.5–3.5 class (Table 3, parentheses). All models had their highest specificity in the 1.0–2.0 BCS class except SVM, which had its highest specificity in the >3.5 class; both K-NN and CART had their lowest in the >3.5 class. The XGB was the best algorithm, with 12 percentage points more specificity than CART, which had the least specificity in predicting ewe BCS (Table 3). The highest specificity (98.9%) was observed in the 1.0–2.0 class for XGB and the lowest (77.6%) in the >3.5 class for the CART model (Table 3, parentheses).

[Figure 2: scatterplot of models (XGB, RF, SVM, KNN, ANN, Multinom, LDA, Ordinal, CART) and BCS classes (1.0–2.0, 2.5–3.5, >3.5). x-axis: Sensitivity–Specificity latent variable (23.48%); y-axis: Accuracy latent variable (30.35%).]

Figure 2. A plot of the accuracy and sensitivity–specificity latent variables from their first dimension/component obtained through a factor analysis of mixed variables (a combination of principal component and multiple correspondence analyses) procedure on measures of performance for the prediction of ewe BCS during pre-breeding. Dots (red sphere: model, blue square: BCS class). The dotted diagonal line indicates a balance between accuracy and sensitivity–specificity. If a dot is above the line, then the model or BCS class was more accurate than sensitive–specific, while the reverse indicates that the model was more sensitive than accurate. The further and more positive a model is along the diagonal line, the greater and better its prediction power. The variance explained by each extracted first dimension for each latent variable (accuracy, sensitivity–specificity) is given in parentheses along the axes.

3.1.2. Pregnancy Diagnosis

At pregnancy diagnosis, the models had a clear-cut hierarchy in performance, with XGB being the best and CART the poorest (Figure 3). The multinom and LDA models were closely juxtaposed, indicating that they had comparable performance. The XGB was the best algorithm, being 21 percentage points more accurate than CART, which was the least accurate in predicting ewe BCS (Table 2). The best balance between accuracy and authenticity was observed in the ANN model. The XGB, RF, SVM and K-NN models were biased towards accuracy, while the multinom, LDA, ordinal and CART models were biased towards authenticity (Figure 3). In terms of BCS, the best accuracy was achieved in the 1.0–2.0 class and the lowest in the >3.5 class for all models except SVM, ANN and ordinal, which were least accurate in the 2.5–3.5 class. The highest accuracy (93.4%) was achieved using XGB in the 1.0–2.0 BCS class, and the lowest (64.0%) was observed using the CART algorithm in the >3.5 class (Table 2, parentheses).

There was no clear pattern in class-level model sensitivity at pregnancy diagnosis. The XGB was the best algorithm, being 29 percentage points more sensitive than CART, which was the least sensitive in predicting ewe BCS (Table 3). The highest BCS classification sensitivity was observed using the K-NN model (91.8%) in the 1.0–2.0 BCS class, while CART (41.1%) had the lowest in the >3.5 class (Table 3, parentheses). All models had their highest specificity in the 1.0–2.0 BCS class except CART, which had its highest in the >3.5 class. The XGB was the best algorithm, with 14 percentage points more specificity than CART, which had the least specificity in predicting ewe BCS (Table 3). The highest specificity (96.3%) was observed in the 1.0–2.0 class for XGB and the lowest (67.1%) in the 2.5–3.5 class for the CART model.

[Figure 3: scatterplot of models (XGB, RF, KNN, SVM, ANN, Multinom, LDA, Ordinal, CART) and BCS classes (1.0–2.0, 2.5–3.5, >3.5). x-axis: Sensitivity–Specificity latent variable (19.15%); y-axis: Accuracy latent variable (29.46%).]

Figure 3. A plot of the accuracy and sensitivity–specificity latent variables from their first dimension/component obtained through a factor analysis of mixed variables (a combination of principal component and multiple correspondence analyses) procedure on measures of performance for the prediction of ewe BCS during pregnancy diagnosis. Dots (red sphere: model, blue square: BCS class). The dotted diagonal line indicates a balance between accuracy and sensitivity–specificity. If a dot is above the line, then the model or BCS class was more accurate than sensitive–specific, while the reverse indicates that the model was more sensitive than accurate. The further and more positive a model is along the diagonal line, the greater and better its prediction power. The variance explained by each extracted first dimension for each latent variable (accuracy, sensitivity–specificity) is given in parentheses along the axes.

3.1.3. Pre-Lambing

At pre-lambing, the models had a clear-cut hierarchy in performance, with XGB being the best and CART the poorest (Figure 4). It is worth noting that the K-NN model, which had been among the best four models at pre-breeding and pregnancy diagnosis, dropped to a moderate model. The K-NN, multinom and LDA models had overlapping overall performance. The XGB was the best algorithm, being 23 percentage points more accurate than CART, which was the least accurate in predicting ewe BCS (Table 2). All models were biased, with XGB, RF, SVM and ANN inclined towards accuracy, while K-NN, multinom, LDA, ordinal and CART were inclined towards authenticity (Figure 4). The best overall accuracy was achieved in the >3.5 BCS class and the lowest in the 2.5–3.5 class (Table 2, parentheses). Regarding BCS class-level model accuracy, there was no clear pattern. The majority of the models (RF, K-NN, ANN, multinom, LDA and ordinal) were most accurate in the >3.5 BCS class. The lowest accuracy for the majority of the models (XGB, RF, K-NN, SVM and ordinal) was observed in the 2.5–3.5 class. The highest accuracy (92%) was achieved using the RF model in the >3.5 BCS class, and the lowest (63%) was observed using the CART algorithm in the 1.0–2.0 class (Table 2, parentheses).

All models were most sensitive to the >3.5 class and least sensitive to the 1.0–2.0 class, except K-NN and CART, which had their highest sensitivity in the 2.5–3.5 class, and ordinal, which had its lowest sensitivity in the 2.5–3.5 class. The XGB was the best algorithm, being 31 percentage points more sensitive than CART, which was the least sensitive in predicting ewe BCS (Table 3). The highest BCS classification sensitivity was observed using the XGB model (88.8%) in the >3.5 BCS class, while CART (37.9%) had the lowest in the 1.0–2.0 class (Table 3, parentheses). All models had their highest specificity in the 1.0–2.0 BCS class. The XGB was the best algorithm, with 16 percentage points more specificity than CART, which had the least specificity in predicting ewe BCS (Table 3).
The highest specificity (97.5%) was observed in the 1.0–2.0 class for XGB and the lowest (71.2%) in the 2.5–3.5 class for CART model (Table 3, paren‐ thesis). XGB RF SVM >3.5 ANN 1.0‐2.0 ‐3 ‐2 ‐10 1 2 3 2.5‐3.5 KNN Multinom ‐1 LDA ‐2 Ordinal CART ‐3 Sensetivity‐Specificity latent variable (19.65%) Figure 4. A plot of the accuracy and sensitivity–specificity latent variables from their first dimension/component obtained through a factor analysis of mixed variables (a combination of principle component and multiple correspondence anal‐ yses) procedure on measures of performance for the prediction of ewe BCS at pre‐lambing. Dots (red sphere: model, blue square: BCS class). Dotted diagonal line indicates a balance between accuracy and sensitivity–specificity. If dot is above, then model or BCS class was more accurate than sensitive–specific while the reverse indicates that the model was more sensitive than accurate. The further and more positive a model is along the diagonal line, the greater and better is its prediction power. The variance explained by each extracted first dimension for each latent variable (accuracy, sensitivity– specificity) is given in parenthesis along the axes. 3.1.4. Weaning At weaning, the models had a clear‐cut hierarchy in performance, with XGB being the best and CART the poorest (Figure 5). The RF and K‐NN models had overlapping overall performance. The XGB was the best algorithm with 33% more accuracy than CART, which was the least accurate in predicting ewe BCS (Table 2). The majority of the models were biased towards accuracy, except for multinom, LDA, ordinal and CART, which were inclined towards authenticity (Figure 5). The best overall accuracy was achieved in the >3.5 BCS class and the lowest in the 2.5–3.5 class. Regarding the BCS level model accuracy, there was no clear pattern. However, the majority of the models (XGB, RF, SVM, K‐NN and ANN) were most accurate in the >3.5 BCS class. 
The least model accuracy was equally observed in the 1.0–2.0 and 2.5–3.5 BCS classes, across models. The highest accuracy (93.2%) was achieved using the RF model in the >3.5 BCS class, and the lowest (61.4%) was observed using the CART algorithm in either the 2.5–3.5 class (Table 2, parenthesis). There was no clear pattern in class‐level model sensitivity at weaning. The XGB was the best algorithm with 34% more sensitivity than CART, which was the least sensitive in predicting ewe BCS (Table 2). The highest BCS classification sensitivity was observed us‐ ing XGB models (92.3%) in the 2.5–3.5 BCS class while CHART (39.2%) had the lowest in the 2.5–3.5 class (Table 3, parenthesis). All models had the highest specificity observed in the >3.5 BCS class and the least in the 2.5–3.5 class, except for the CART, whose specificity Accuracy latent variable (29.25%) Agriculture 2021, 11, 162 12 of 22 arrangement was the opposite, and for ANN and multinom, which had their highest spec‐ ificity in the 1.0–2.0 class. The XGB was the best algorithm with 17% more specificity than CART, which had the least specificity in predicting ewe BCS (Table 3). The highest speci‐ ficity (96.5%) was observed in the 1.0–2.0 class for XGB and the lowest (72.4%) in the 2.5– 3.5 class for CART model (Table 3, parenthesis). 3.5 2.5 XGB KNN RF 1.5 SVM ANN 0.5 >3.5 1.0‐2.0 ‐3.5 ‐2.5 ‐1.5 ‐0.5 0.5 1.5 2.5 3.5 ‐0.5 2.5‐3.5 LDA Multinom ‐1.5 Ordinal ‐2.5 CART ‐3.5 Sensitivity–Specificity latent variable (20.68%) Figure 5. A plot of the accuracy and sensitivity–specificity latent variables from their first dimension/component obtained through a factor analysis of mixed variables (a combination of principle component and multiple correspondence anal‐ yses) procedure on measures of performance for the prediction of ewe BCS at weaning. Dots (red sphere: model, blue square: BCS class). 
A plot of the accuracy and sensitivity–specificity latent variables from the first dimension/component obtained through a factor analysis of mixed variables (a combination of Principle Component Analysis and Multiple Cor‐ respondence Analysis) procedure on measures of performance for the prediction of ewe BCS at weaning. Dots (red sphere: model, blue square: BCS class). Dotted diagonal line indicates a balance between accuracy and sensitivity–specificity. If dot is above, then model or BCS class was more accurate than sensitive–specific while the reverse indicates that the model was more sensitive than accurate. The further and more positive a model is along the diagonal line, the greater and better is its prediction power. The variance explained by each extracted first dimension for each latent variable (accuracy, sensi‐ tivity–specificity) is given in parenthesis along the axes. 3.1.5. The Balance between Sensitivity and Specificity The data showed that the overall specificity 86% (67–98%) was higher than sensitivity 74% (37–96%) values across all algorithms (Table 3). An assessment of the indicators of the balance between sensitivity and specificity was undertaken and the indices are sum‐ marized in Table 4. The positive likelihood ratio (PLR) for all models were greater than 1.0 while the negative likelihood ratio (NLR) was less than 1.0 across stages of the annual cycle. The XGB model had the highest PLR and lowest NLR, while CART had the lowest PLR and highest NLR across stage of the annal cycle. Similarly, Youden’s index, YI, was consistently highest for XGB model and lowest for the CART model. Accuracy latent variable (29.75%) Agriculture 2021, 11, 162 13 of 22 Table 4. Measures of the balance between sensitivity and specificity of the BCS prediction models by stage of the annual cycle. 
| Model | Pre‐Breeding PLR | Pre‐Breeding NLR | Pre‐Breeding YI | Pregnancy Diagnosis PLR | Pregnancy Diagnosis NLR | Pregnancy Diagnosis YI | Pre‐Lambing PLR | Pre‐Lambing NLR | Pre‐Lambing YI | Weaning PLR | Weaning NLR | Weaning YI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| XGB | 33.41 | 0.15 | 0.79 | 16.48 | 0.13 | 0.82 | 19.39 | 0.13 | 0.81 | 18.32 | 0.12 | 0.83 |
| RF | 20.49 | 0.16 | 0.78 | 14.45 | 0.14 | 0.80 | 15.33 | 0.16 | 0.78 | 12.25 | 0.16 | 0.77 |
| SVM | 16.88 | 0.19 | 0.74 | 12.13 | 0.19 | 0.74 | 18.48 | 0.20 | 0.72 | 11.79 | 0.20 | 0.73 |
| K‐NN | 15.21 | 0.20 | 0.73 | 12.30 | 0.17 | 0.77 | 3.90 | 0.42 | 0.48 | 11.64 | 0.16 | 0.78 |
| ANN | 13.04 | 0.22 | 0.70 | 6.94 | 0.27 | 0.64 | 6.32 | 0.32 | 0.58 | 8.66 | 0.24 | 0.68 |
| Multinom | 8.65 | 0.27 | 0.65 | 4.87 | 0.35 | 0.55 | 3.69 | 0.43 | 0.47 | 4.28 | 0.38 | 0.52 |
| LDA | 8.16 | 0.29 | 0.62 | 5.12 | 0.36 | 0.54 | 3.78 | 0.42 | 0.48 | 4.37 | 0.38 | 0.52 |
| Ordinal | 7.66 | 0.32 | 0.59 | 4.20 | 0.45 | 0.45 | 2.83 | 0.54 | 0.37 | 3.83 | 0.45 | 0.45 |
| CART | 3.92 | 0.46 | 0.45 | 3.27 | 0.49 | 0.40 | 2.70 | 0.54 | 0.35 | 2.49 | 0.57 | 0.33 |

Models: (XGB: gradient boosting decision trees model, RF: random forest, K‐NN: k‐nearest neighbors, SVM: support vector machine, ANN: neural networks, Multinom: multinomial regression, LDA: linear discriminant analysis, Ordinal: ordinal logistic regression, CART: classification and regression tree). Measures of the balance between sensitivity and specificity (PLR: positive likelihood ratio, NLR: negative likelihood ratio and YI: Youden’s index). For a good model, PLR is greater than 1.0 (the larger the better), NLR is less than 1.0 (the smaller the better) and YI, which ranges from 0 to 1.0, approaches 1.0 (indicating higher authenticity and prediction power).

3.1.6. Overall Model Ranking

Overall, black box models were better than low‐level white box models (Table 5). The XGB was consistently the best performing model, while CART was the poorest. Model ranking changed across stages of the annual cycle except for XGB, LDA, ordinal and CART.

Table 5. Model ranking by stage of annual cycle and overall.
| Model | Pre‐Breeding | Pregnancy Diagnosis | Pre‐Lambing | Weaning | Overall |
|---|---|---|---|---|---|
| XGB | 1 | 1 | 1 | 1 | 1 (1.0) |
| RF | 3 | 2 | 2 | 2 | 2 (2.3) |
| SVM | 4 | 3 | 4 | 3 | 3 (3.5) |
| K‐NN | 2 | 6 | 3 | 4 | 4 (3.8) |
| ANN | 5 | 4 | 5 | 5 | 5 (4.8) |
| Multinom | 6 | 5 | 6 | 6 | 6 (5.8) |
| LDA | 7 | 7 | 7 | 7 | 7 (7.0) |
| Ordinal | 8 | 8 | 8 | 8 | 8 (8.0) |
| CART | 9 | 9 | 9 | 9 | 9 (9.0) |

Overall (overall rank with means in parentheses). The lower the rank, the greater the BCS prediction performance.

4. Discussion

The present study utilized machine learning classification algorithms to explore the possibility of predicting BCS from current and previous liveweight in mature ewes (at approximately 43–54 months of age). Body condition score was treated as a categorical variable with three levels (1.0–2.0, 2.5–3.5, >3.5). Nine of the most recognized machine learning models (XGB, ANN, RF, K‐NN, SVM, ordinal, multinom, LDA and CART models) were applied to preprocessed datasets.

We applied a strategy to reduce the accuracy and authenticity measures into two dimensions in order to generate latent variables or constructs that were plotted to give a visual summary of model performance. This technique gave a visual display (a holistic picture) of overall model performance, which made it easier to decipher the patterns in the relationship between the accuracy and authenticity of models in BCS prediction. Previous studies have suggested the use of several metrics to give an indication about a model’s accuracy and authenticity [24,43,48,49]. These have, however, been piecemeal with no unifying interface. By bringing together both accuracy and authenticity measures in a single display, we appear to have cracked that enigma. This innovation could serve as a platform for interrogating even better ways of model performance evaluation.

4.1. Overall Accuracy

The findings suggest that ewe BCS prediction from current and previous liveweight can be achieved using machine learning classification algorithms within the limited BCS range used in the present study.
The results indicated that XGB was the most efficient and robust model (overall accuracy = 87.6%; sensitivity = 87.7%; specificity = 93.9%). Other good alternatives to XGB for predicting ewe BCS were three algorithms (K‐NN, RF and SVM) with accuracies > 80% and kappas > 70%, while the remaining four (CART, ordinal, LDA and multinomial) were weak algorithms (accuracies < 70%, kappas < 60%). All models performed better than a random guess, with the most efficient models giving prediction errors as low as 11% and 38%. According to Galdi and Tagliaferri [50], a perfect classifier has a rate of 100%, while a random guess would give a 33.3% error for three‐level classifiers [50,51]. The weakest algorithms outperformed a random guess by only 8, 11, 15 and 20%, respectively, using the current study data. Whereas accuracy measures can be interpreted arbitrarily, Cohen’s kappa statistic has been classified [42,52] into six different categories: no agreement (values ≤ 0), none to slight (0.01–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80) and almost perfect agreement (0.81–1.00). Further, Fleiss et al. [53] suggested that kappa values greater than 0.75 may be taken to represent excellent agreement beyond chance, values below 0.40 poor agreement and values between 0.40 and 0.75 fair to good agreement. The findings in this study suggest that using the top performing algorithms (XGB and RF), ewe BCS can be predicted with high accuracy across four phases of the annual cycle.

4.2. Class‐Level Accuracy

Results also showed that at the accuracy‐related class level, metrics including accuracy, precision and F‐measure were highest for XGB, making it the most efficient and robust model for ewe BCS prediction. Further, there appeared to be variability in all metrics across stages of the annual sheep weighing cycle and BCS class.
This variation in accuracy across the stages of the annual cycle suggests that, with the exception of XGB, different models may be required to predict BCS at different stages of the annual cycle. Similarly, different models may be required if there is a need for greater accuracy in one BCS class than in others. This is especially important when great accuracy is required for management decisions with far‐reaching consequences, such as when limited resources must be allocated only to target classes. Further, results indicated that the higher‐level (black box) machine learning models such as XGB and RF were better at separating BCS into distinct classes than the lower‐level (white box) models such as multinomial or ordinal logistic regression.

In the current study, the best balance between accuracy and authenticity (sensitivity–specificity) was achieved during pre‐breeding compared to other stages of the annual cycle. This observation could have been due to the relative ease of condition scoring ewes pre‐breeding compared with other stages of the annual cycle [2,54]. Prior to breeding, most farmers enhance ewe feeding in a process known as flushing [55,56], which likely resulted in uniform tissue (fat and muscle) distribution around the body. In addition, the weight measurements recorded pre‐breeding are not confounded by the conceptus mass, which is the case at pregnancy diagnosis and pre‐lambing. The conceptus mass influences the ewe liveweight from pregnancy through the pre‐lambing stage [54,57], which coincides with the two time‐point weight measurements during those stages of the annual cycle. Further, during lactation a ewe has its greatest nutrient requirements for energy and protein [58], and at weaning a ewe is drained by the lactation process, leading to variability in fat deposition around the body; consequently, the ewes are lighter.
Using the same ewe population, we have previously reported a decreasing trend in ewe BCS as a ewe aged, plateauing after 43–54 months [9]. This was attributed to a likelihood that farmers were underfeeding their aging ewes at certain stages or periods of the annual cycle. The lactation period could be one such period, resulting in failure to meet ewe dietary energy and protein requirements and consequently leading to thinner animals. The management conditions at pregnancy diagnosis, pre‐lambing and weaning, therefore, could lead to differences in fat deposition around the body, resulting in variability in BCS.

4.3. Class‐Level Model Authenticity

Among the indicators of model authenticity, the models had apparently greater specificity than sensitivity, which could point to unbalanced distinguishing power to make predictions. An examination of three indicators of balance between sensitivity and specificity or model authenticity/power (PLR, NLR and YI) indicated that all models had values within acceptable authenticity and power (PLR > 1.0, NLR < 1.0 and YI > 0) across the four stages of the annual cycle, indicating that all models had balanced sensitivity and specificity. Results also showed that XGB had the highest PLR and YI and the lowest NLR. Combined with the results from the measures of accuracy, these results rank XGB as the most robust model for BCS prediction. Sensitivity is defined as the proportion of individuals or items that belong to a given BCS class and are correctly identified, while specificity is the proportion that do not belong to a given class and are excluded by the test. There exists an inverse relationship between sensitivity and specificity of a test or prediction model [59,60].
If a model has high sensitivity, it is capable of detecting “real” BCS classes, but it also faces losses from consuming more resources due to mandatory confirmatory tests (to rule out the false positives) or when the limited resources have to be given to only the right candidates. However, if a model has high specificity, the system benefits from a significant reduction in the consumption of resources and time, but it has a decreased capacity to detect “real” BCS classes, which can lead to failure to detect many events of importance [44]. The higher specificity would not be advantageous, as failure to detect ewes inside or outside the BCS range (2.5–3.5) for optimum productivity would affect management decisions negatively. Therefore, a good model needs to achieve a balance between sensitivity and specificity [55].

This study suggests that ewe BCS prediction from current and previous liveweight can usefully be achieved using machine learning classification algorithms within a limited BCS range used in the present study. This study used unadjusted liveweight (i.e., confounded by factors such as fleece length variations and fetal mass from pregnancy to lambing) records alone to achieve accuracies up to 89% in order to assign BCS to one out of three classes. It is likely that if adjusted liveweights were used together with other key variables that affect BCS, optimum accuracy would be achieved from these BCS prediction algorithms. Semakula et al. [10] suggested that the accuracy of BCS prediction could be improved if all key variables affecting the relationship between liveweight and BCS were accounted for. If this was the case, the efficiency of the machine learning models tested could also be enhanced.
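The balance indices discussed in this section (PLR, NLR and YI) follow directly from sensitivity and specificity. The following is a minimal sketch in Python, not the R workflow used in the study: the function names and the small confusion matrix are hypothetical and serve only as illustration, and the standard definitions PLR = sensitivity / (1 − specificity), NLR = (1 − sensitivity) / specificity and YI = sensitivity + specificity − 1 are assumed. The only figures taken from the text are the overall XGB sensitivity (87.7%) and specificity (93.9%).

```python
from typing import NamedTuple, Sequence, Tuple


class BalanceIndices(NamedTuple):
    plr: float     # positive likelihood ratio
    nlr: float     # negative likelihood ratio
    youden: float  # Youden's index


def balance_indices(sensitivity: float, specificity: float) -> BalanceIndices:
    """Derive PLR, NLR and YI from sensitivity and specificity (as proportions)."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    plr = sensitivity / (1.0 - specificity)
    nlr = (1.0 - sensitivity) / specificity
    youden = sensitivity + specificity - 1.0
    return BalanceIndices(plr, nlr, youden)


def one_vs_rest(confusion: Sequence[Sequence[int]], k: int) -> Tuple[float, float]:
    """Sensitivity and specificity of class k (rows: true class, columns: predicted)."""
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                   # class k predicted as another class
    fp = sum(row[k] for row in confusion) - tp    # other classes predicted as class k
    tn = sum(map(sum, confusion)) - tp - fn - fp  # everything else
    return tp / (tp + fn), tn / (tn + fp)


# Overall XGB sensitivity and specificity as reported in the Discussion.
xgb = balance_indices(0.877, 0.939)

# Hypothetical three-class confusion matrix (classes 1.0-2.0, 2.5-3.5, >3.5);
# these counts are illustrative and are not data from the study.
toy = [[50, 5, 0],
       [4, 40, 6],
       [1, 5, 44]]
sens_low, spec_low = one_vs_rest(toy, 0)

print(xgb)
print(sens_low, spec_low)
```

With the overall XGB figures, this gives a PLR of roughly 14.4, an NLR of roughly 0.13 and a YI of roughly 0.82. These need not coincide with the stage‐specific entries in Table 4, which may have been aggregated differently across classes and stages.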
Although not directly comparable, having used different scale ranges and different measures of model performance, the best ML model (XGB) in the current study had great efficiency (it was based on liveweight predictors alone and achieved accuracies greater than 90%) and was stable (accuracy: 86–93%) across stages of the annual cycle. In their previous study based on linear regression models, Semakula et al. [10] achieved only weak to moderate goodness of fit (R² = 50%) using more resources (both LW and BCS records combined). Further, the model goodness of fit and accuracy varied greatly (R²: 28–64%) across stages of the annual cycle, making the linear regression models less stable. Taken together, this suggests that machine learning models would offer better BCS predictions than the linear regression models.

5. Conclusions

The results of the present study showed that ewe BCS (grouped) can be predicted with great accuracy on a narrow BCS (1.0–2.0, 2.5–3.5, >3.5) scale from a ewe’s current and previous liveweight using machine learning algorithms. The gradient boosting decision trees algorithm was the most efficient for ewe BCS prediction. The results of this study, therefore, support the hypothesis that BCS can be accurately predicted from a ewe’s current and previous liveweights. The algorithms, having been trained on a large representative dataset, should be able to give accurate ewe BCS predictions. These algorithms (acquired intelligence) could be incorporated into weighing systems to easily and quickly give farmers ewe BCS without the need for the hands‐on burden. Future studies should investigate how to improve the accuracy of BCS prediction and the possibility of individual BCS prediction on a full range (1–5).

Acknowledgments: We wish to thank Anne Ridler, Kate Griffiths, Catriona Jenkinson and Dean Burnham for their technical assistance.
Author Contributions: conceptualization, J.S., R.A.C.‐T., S.T.M., H.T.B. and P.R.K.; data collection, R.A.C.‐T., S.T.M., H.T.B. and P.R.K.; data curation, R.A.C.‐T.; software, formal analysis, results interpretation, validation, preparation of the manuscript: J.S.; supervision, writing (review and editing), R.A.C.‐T., S.T.M., H.T.B. and P.R.K. All authors have read and agreed to the published version of the manuscript.

Funding: The study was supported by Massey University and the International Sheep Research Centre.

Institutional Review Board Statement: Data was collected as part of normal routine farm management and thus, no ethical approval was required.

Data Availability Statement: Data is available on request to the author.

Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Ethics Statement: The data used in the current study was collected as part of routine management practices and did not require ethical approval.

Appendix A

Table A1. Key model performance characteristics of common machine learning algorithms (selecting the most appropriate algorithms).
| Model ¹ | Concept ² | Parameter and Processes Required ³ | Sample Size and Data Dimensionality | Assumptions and Data Requirements | Covariate Pools ⁴ | Computational Time | Interpretability ⁵ | Prone to Overfitting | References |
|---|---|---|---|---|---|---|---|---|---|
| Ordinal | Probabilistic regression | No hyperparameters | Affected by small sample sizes | Proportional odds, linearity | No | Fast | White box | Yes | [32,56,58] |
| Multinom | Probabilistic regression | No hyperparameters | Yes | Proportional odds, linearity | No | Fast | White box | Yes | [32,58,61] |
| LDA | Dimension reduction + separability between classes | No hyperparameters | Affected by small sample sizes; good for high dimension data | Normality, linearity & continuous independent variables | No | Fast | White box | Yes | [62–64] |
| CART | Decision trees and regression | Hyperparameters | Performs well with large datasets | Numerical or categorical outcome | Can remove redundant covariates | Fast | Low‐level black box | No | [65,66] |
| RF | Decision trees, regression and bagging | Up to three hyperparameters | Performs well on small & high dimensionality data | Numerical or categorical outcome | Can remove redundant covariates | Decreases with sample size | Low‐level black box | No | [17,67] |
| XGB | Regression trees + gradient boosting | Hyperparameter | Requires large datasets | Numerical or categorical outcome | Can remove redundant covariates | Very fast | High‐level black box | Yes, if large number of trees | [68,69] |
| K‐NN | Regression curve + hyperparameter (k) | One hyperparameter | Not good for large & high dimensionality data | No assumptions but requires scaled data | No | Decreases with sample size | Fairly interpretable | Yes | [17,70] |
| SVM | Maximal margins + kernel functions | Two hyperparameters | Not good for high dimension data | No assumptions | No | Decreases with sample size | High‐level black box | Yes | [71,72] |
| ANN | Nodes (artificial neurons) | Up to seven hyperparameters | Sensitive to sample size and data dimensionality | Numerical or categorical outcome | No | Computationally very expensive and time consuming | High‐level black box | Yes | [73] |

¹ Model (Ordinal: ordinal logistic regression, Multinom: multinomial regression, LDA: linear discriminant analysis, CART: classification and regression tree, RF: random forest, XGB: gradient boosting decision trees model, K‐NN: k‐nearest neighbors, SVM: support vector machines, ANN: neural networks). ² Concept: how the algorithm works. ³ Parameter and processes: tuning parameters for the algorithm. ⁴ Covariate pools: intrinsic ability to remove redundant variables or to select important variables. ⁵ Interpretability: white box: clear model structure with parameters; black box: model structure and the relationship between variables are unknown. NB: The criteria used to summarize the key model performance characteristics were a modified version of the 5‐point criteria by Khaledian and Miller [17].

Table A2. A pairwise comparison (Bonferroni p‐value adjustment) of overall performance accuracy of nine predictive models for BCS, at different stages of the annual cycle (PB: pre‐breeding, PD: pregnancy diagnosis, PL: pre‐lambing, W: weaning) in 43–54‐month‐old ewes. p‐value > 0.05 indicates no significant difference between models. All ewe BCS predictions were based on liveweight record 2.
| Model A | Model B | PB | PD | PL | W |
|---|---|---|---|---|---|
| XGB | K‐NN | 0.011 | 0.000 | 0.000 | 0.000 |
| XGB | RF | 1.000 | 0.000 | 0.245 | 0.007 |
| XGB | SVM | 0.010 | 0.000 | 0.000 | 0.000 |
| XGB | ANN | 0.000 | 0.000 | 0.001 | 0.000 |
| XGB | Multinom | 0.000 | 0.000 | 0.000 | 0.000 |
| XGB | LDA | 0.000 | 0.000 | 0.000 | 0.000 |
| XGB | Ordinal | 0.000 | 0.000 | 0.000 | 0.000 |
| XGB | CART | 0.000 | 0.000 | 0.000 | 0.000 |
| K‐NN | RF | 0.003 | 0.281 | 0.000 | 0.041 |
| K‐NN | SVM | 1.000 | 1.000 | 0.000 | 1.000 |
| K‐NN | ANN | 0.231 | 0.000 | 1.000 | 0.000 |
| K‐NN | Multinom | 0.000 | 0.000 | 0.779 | 0.000 |
| K‐NN | LDA | 0.000 | 0.000 | 1.000 | 0.000 |
| K‐NN | Ordinal | 0.000 | 0.000 | 0.000 | 0.000 |
| K‐NN | CART | 0.000 | 0.000 | 0.004 | 0.000 |
| RF | SVM | 0.203 | 0.014 | 0.008 | 0.002 |
| RF | ANN | 0.002 | 0.000 | 0.002 | 0.000 |
| RF | Multinom | 0.000 | 0.000 | 0.000 | 0.000 |
| RF | LDA | 0.000 | 0.000 | 0.000 | 0.000 |
| RF | Ordinal | 0.000 | 0.000 | 0.000 | 0.000 |
| RF | CART | 0.000 | 0.000 | 0.000 | 0.000 |
| SVM | ANN | 0.563 | 0.000 | 0.021 | 0.000 |
| SVM | Multinom | 0.000 | 0.000 | 0.000 | 0.000 |
| SVM | LDA | 0.000 | 0.000 | 0.000 | 0.000 |
| SVM | Ordinal | 0.000 | 0.000 | 0.000 | 0.000 |
| SVM | CART | 0.000 | 0.000 | 0.000 | 0.000 |
| ANN | Multinom | 0.002 | 0.000 | 1.000 | 0.000 |
| ANN | LDA | 0.000 | 0.000 | 1.000 | 0.000 |
| ANN | Ordinal | 0.002 | 0.000 | 0.000 | 0.000 |
| ANN | CART | 0.000 | 0.000 | 0.903 | 0.000 |
| Multinom | LDA | 0.019 | 1.000 | 1.000 | 1.000 |
| Multinom | Ordinal | 0.004 | 0.000 | 0.000 | 0.000 |
| Multinom | CART | 0.000 | 0.000 | 0.023 | 0.000 |
| LDA | Ordinal | 0.019 | 0.000 | 1.000 | 0.006 |
| LDA | CART | 0.000 | 0.000 | 0.032 | 0.000 |
| Ordinal | CART | 0.000 | 0.002 | 0.047 | 0.008 |

Model: (XGB: gradient boosting decision tree model, RF: random forest, K‐NN: k‐nearest neighbors, SVM: support vector machines, ANN: neural networks, Multinom: multinomial regression, LDA: linear discriminant analysis, Ordinal: ordinal logistic regression, CART: classification and regression tree).

Table A3. Accuracy measures (precision, F‐measure) of nine predictive models for ewe BCS at 43–54 months of age at different stages of the annual sheep weighing cycle (PB: pre‐breeding, PD: pregnancy diagnosis, PL: pre‐lambing and W: weaning). Values in parentheses indicate the minimum and maximum.
| Model | PB Precision % | PB F‐Measure % | PD Precision % | PD F‐Measure % | PL Precision % | PL F‐Measure % | W Precision % | W F‐Measure % |
|---|---|---|---|---|---|---|---|---|
| XGB | 86.1 (78.2–97.7) | 86.0 (80.1–96.9) | 87.9 (80.8–94.5) | 87.6 (84.1–90.0) | 87.9 (80.8–94.5) | 87.6 (84.1–90.0) | 89.1 (84.2–92.8) | 89.0 (87.5–91.3) |
| RF | 85.3 (78.1–95.9) | 85.3 (79.0–95.6) | 86.9 (83.2–91.1) | 86.7 (83.6–90.7) | 86.1 (77.0–91.7) | 85.7 (81.0–89.0) | 84.9 (79.3–88.8) | 84.7 (83.2–86.4) |
| SVM | 82.7 (74.1–95.1) | 82.7 (74.5–94.4) | 83.4 (74.6–90.3) | 82.6 (80.0–87.2) | 83.5 (68.7–95.0) | 81.8 (76.0–86.4) | 82.8 (71.6–89.4) | 82.0 (78.0–85.7) |
| K‐NN | 82.3 (75.0–94.4) | 82.0 (71.8–95.3) | 84.7 (77.5–89.5) | 84.5 (80.9–90.6) | 64.5 (58.1–68.6) | 64.1 (61.8–65.5) | 84.9 (79.3–88.8) | 85.1 (80.5–88.1) |
| ANN | 80.3 (71.9–93.4) | 80.3 (71.6–92.6) | 76.3 (72.1–83.7) | 76.1 (73.2–80.7) | 73.5 (64.5–83.3) | 71.5 (67.4–76.2) | 79.5 (70.0–85.0) | 78.7 (76.4–82.6) |
| Multinom | 76.8 (67.7–89.3) | 76.8 (68.1–89.1) | 70.2 (65.6–76.4) | 70.0 (64.1–73.8) | 64.8 (62.8–65.9) | 64.6 (62.1–67.1) | 68.1 (65.0–70.7) | 67.7 (65.7–70.2) |
| LDA | 75.0 (64.3–89.0) | 74.9 (64.5–88.3) | 70.5 (65.1–79.0) | 69.3 (61.8–73.3) | 65.3 (61.9–67.9) | 64.9 (61.5–67.7) | 68.3 (63.4–70.8) | 67.6 (65.8–70.7) |
| Ordinal | 73.2 (59.2–88.5) | 72.9 (60.4–85.3) | 64.9 (55.0–77.4) | 63.8 (58.4–68.1) | 57.3 (45.8–64.2) | 57.5 (43.5–66.7) | 64.2 (52.9–70.9) | 63.4 (57.4–68.7) |
| CART | 62.1 (47.3–77.7) | 62.3 (41.5–80.0) | 61.1 (55.5–68.9) | 59.2 (48.5–64.6) | 57.3 (55.1–60.5) | 55.7 (46.6–62.5) | 55.4 (53.4–59.0) | 54.8 (45.3–60.9) |

Model: (XGB: gradient boosting decision tree model, RF: random forest, K‐NN: k‐nearest neighbors, SVM: support vector machines, ANN: neural networks, Multinom: multinomial regression, LDA: linear discriminant analysis, Ordinal: ordinal logistic regression, CART: classification and regression tree).

Figure A1. Machine learning flow chart for ewe BCS prediction using their current and previous liveweights.

[Figure A2: four panels, (a) to (d). Horizontal axis: Dimension 1; vertical axis: Dimension 2.]

Figure A2. Random forest‐based multidimensional scaling (MDS) plots for BCS prediction in 43–54‐month‐old ewes at different stages of the annual cycle (a: pre‐breeding, b: pregnancy diagnosis, c: pre‐lambing, d: weaning). Red, blue and green circles represent single data points from BCS of 1.0–2.0, 2.5–3.5 and >3.5, respectively.

References

1. Jefferies, B. Body condition scoring and its use in management. Tasmanian Jour. Agr. 1961, 32, 19–21.
2. Kenyon, P.R.; Maloney, S.K.; Blache, D. Review of sheep body condition score in relation to production characteristics. N. Z. J. Agric. Res. 2014, 57, 38–64. https://doi.org/10.1080/00288233.2013.857698
3. Coates, D.B.; Penning, P. Measuring animal performance. In: Jones LtaR, editor. Field and laboratory methods for grassland and animal production research. CABI Publishing: Wallingford, UK, 2000; pp. 353–402.
4. Morel, P.C.H.; Schreurs, N.M.; Corner‐Thomas, R.A.; Greer, A.W.; Jenkinson, C.M.C.; Ridler, A.L.; Kenyon, P.R. Live weight and body composition associated with an increase in body condition score of mature ewes and the relationship to dietary energy requirements. Small Ruminant Res. 2016, 143, 8–14. https://doi.org/10.1016/j.smallrumres.2016.08.014
5. Jones, A.; van Burgel, A.J.; Behrendt, R.; Curnow, M.; Gordon, D.J.; Oldham, C.M.; Rose, I.J.; Thompson, A.N. Evaluation of the impact of Lifetimewool on sheep producers. Anim. Prod. Sci. 2011, 51, 857–865. https://doi.org/10.1071/EA08303
6. Corner‐Thomas, R.A.; Kenyon, P.R.; Morris, S.T.; Ridler, A.L.; Hickson, R.E.; Greer, A.W.; Logan, C.M.; Blair, H.T. Brief communication: The use of farm‐management tools by New Zealand sheep farmers: Changes with time. Proc. NZ Soc. Anim. Prod. 2016, 76, 78–80.
7. Besier, R.B.; Hopkins, D. Farmers’ estimations of sheep weights to calculate drench dose. Jour. Dept. Agr. West. Aust., Series 4 1989, 30, 120–121.
8.
McHugh, N.; McGovern, F.M.; Creighton, P.; Pabiou, T.; McDermott, K.; Wall, E.; Berry, D.P. Mean difference in live‐weight per incremental difference in body condition score estimated in multiple sheep breeds and crossbreds. Animal 2019, 13, 1–5. https://doi.org/10.1017/S1751731118002148 9. Semakula, J.; Corner‐Thomas, R.A.; Morris, S.T.; Blair, H.T.; Kenyon, P.R. The Effect of Age, Stage of the Annual Production Cycle and Pregnancy‐Rank on the Relationship between Liveweight and Body Condition Score in Extensively Managed Rom‐ ney Ewes. Animals 2020a, 10, 784. https://doi.org/10.3390/ani10050784 10. Semakula, J.; Corner‐Thomas, R.A.; Morris, S.T.; Blair, H.T.; Kenyon, P.R. Predicting Ewe Body Condition Score Using Lifetime Liveweight and Liveweight Change, and Previous Body Condition Score Record. Animals 2020b, 10, 1182. https://doi.org/10.3390/ani10071182 11. Bishop, P.A.; Herron, R.L. Use and misuse of the Likert item responses and other ordinal measures. Int. J. Exerc. Sci. 2015, 8, 297. 12. Blaikie, N. Analyzing quantitative data: From description to explanation; Sage: New York, NY, USA; 2003. https://dx.doi.org/10.4135/9781849208604 13. Sullivan, G.M.; Artino, A.R. Analyzing and interpreting data from Likert‐type scales. J. Grad. Med. Educ. 2013, 5, 541–542. 14. Wicker, J.E. Applications of modern statistical methods to analysis of data in physical science. Ph.D. Thesis, University of Ten‐ nessee: Knoxville, TN, USA, May 2006. 15. Shahinfar, S.; Kahn, L. Machine learning approaches for early prediction of adult wool growth and quality in Australian Merino sheep. Comput. Electron. Agric. 2018, 148, 72–81. http://dx.doi.org/10.1016/j.compag.2018.03.001 16. Shahinfar, S.; Kelman, K.; Kahn, L. Prediction of sheep carcass traits from early‐life records using machine learning. Comput. Electron. Agric. 2019, 156, 159–177. https://doi.org/10.1016/j.compag.2018.11.021 17. Khaledian, Y.; Miller, B.A. 
Selecting appropriate machine learning methods for digital soil mapping. Appl. Math. Model. 2020, https://doi.org/10.1016/j.apm.2019.12.016 81, 401–418. 18. Morota, G.; Ventura, R.V.; Silva, F.F.; Koyama, M.; Fernando, S.C. Big data analytics and precision animal agriculture sympo‐ sium: Machine learning and data mining advance predictive big data analysis in precision animal agriculture. Big data analysis in Animal Science 2018, 96, 1540–1550. https://doi.org/10.1093/jas/sky014 19. Bakoev, S.; Getmantseva, L.; Kolosova, M.; Kostyunina, O.; Chartier, D.R.; Tatarinova, T.V. PigLeg: Prediction of swine pheno‐ type using machine learning. PeerJ 2020, 8, e8764‐e. 20. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing: Vienna, Aus‐ tria. 2016. R version 3.4.4 (2018‐03‐15) ed2016. Available online: https://cran.r‐project.org (accessed on 15 March 2018). 21. Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 2008, 28, 1–26. http://dx.doi.org/10.18637/jss.v028.i05 22. Triguero, I.; del Río, S.; López, V.; Bacardit, J.; Benítez, J.M.; Herrera, F. ROSEFW‐RF: The winner algorithm for the ECBDL’14 big data competition: An extremely imbalanced big data bioinformatics problem. Knowl‐Based. Syst. 2015, 87, 69–89. https://doi.org/10.1016/j.knosys.2015.05.027 23. Leevy, J.L.; Khoshgoftaar, T.M.; Bauder, R.A.; Seliya, N. A survey on addressing high‐class imbalance in big data. J. Big Data 2018, 5, 42. https://doi.org/10.1186/s40537‐018‐0151‐6 24. Tharwat, A. Classification assessment methods. Appl. Comput. Inform. 2020. https://doi.org/10.1016/j.aci.2018.08.003 25. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over‐sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. https://doi.org/10.1613/jair.953 26. Branco, P.; Ribeiro, R.P.; Torgo, L. UBL: An R package for utility‐based learning. arXiv preprint arXiv:160408079 2016. Accessed on 21 July 2020. 27. 
Friedman, J.; Hastie, T.; Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Soft. 2010, 33, 1–22. PMID: 20808728; PMCID: PMC2929880. 28. Archer, K.J.; Williams, A.A. L 1 penalized continuation ratio models for ordinal response prediction using high‐dimensional datasets. Stat. Med. 2012, 31, 1464–1474. https://doi: 10.1002/sim.4484. Epub 2012 Feb 23. PMID: 22359384; PMCID: PMC3718008. Agriculture 2021, 11, 162 21 of 22 29. Tropsha, A.; Gramatica, P.; Gombar, V.K. The Importance of Being Earnest: Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models. QSAR & Comb. Sci. 2003, 22, 69–77. https://doi.org/10.1002/qsar.200390007 30. Valletta, J.J.; Torney, C.; Kings, M.; Thornton, A.; Madden, J. Applications of machine learning in animal behaviour studies. Anim. Behav. 2017, 124, 203–220. http://dx.doi.org/10.1016/j.anbehav.2016.12.005 31. Torgo, L. Data mining with R: Learning with case studies. Chapman and Hall/CRC: Boca Raton, FL, USA, pp. 426. 32. Agresti, A.; Kateri, M. Categorical Data Analysis. In International Encyclopedia of Statistical Science; Lovric, M. Springer Berlin Heidelberg: Berlin, Heidelberg, 2011. pp. 206–208. 33. Zhao, H.; Wang, Z.; Nie, F. A new formulation of linear discriminant analysis for robust dimensionality reduction. Trans. Knowl. Data Eng. 2018, 31, 629–640. https://doi.org/10.1109/TKDE.2018.2842023 34. Rennie, J.D.; Shih, L.; Teevan, J.; Karger, D.R. Tackling the poor assumptions of naive bayes text classifiers. Proceedings of the 20th international conference on machine learning (ICML‐03); 2003. https://dl.acm.org/doi/10.5555/3041838.3041916 35. Zhu, F.; Tang, M.; Xie, L.; Zhu, H. A Classification Algorithm of CART Decision Tree based on MapReduce Attribute Weights. Int. J.Performability Eng. 2018, 14. https://doi.org/10.23940/ijpe.18.01.p3.1725 36. Zeng, Z. Q.; Yu, H. B.; Xu, H. R.; Xie, Y. Q.; Gao, J. 
Fast training support vector machines using parallel sequential minimal optimization. 2008 3rd international conference on intelligent system and knowledge engineering; 2008: IEEE. https://doi.org/10.1109/ISKE.2008.4731075 37. Breiman, L. Arcing classifier (with discussion and a rejoinder by the author). The ann. Stat. 1998, 26, 801–849. https://doi.org/10.1214/aos/1024691079 38. Sun, S.; Huang, R., editors. An adaptive k‐nearest neighbor algorithm. 2010 seventh international conference on fuzzy systems and knowledge discovery; 2010: IEEE. https://doi.org/10.1109/FSKD.2010.5569740 39. Ebrahimi, M.; Mohammadi‐Dehcheshmeh, M.; Ebrahimie, E.; Petrovski, K.R. Comprehensive analysis of machine learning models for prediction of sub‐clinical mastitis: Deep learning and gradient‐boosted trees outperform other models. Comput. Biol. Med. 2019, 114, 103456. https://doi.org/10.1016/j.compbiomed.2019.103456 40. Fisher, D.H. Knowledge acquisition via incremental conceptual clustering. Machine learning 1987, 2, 139–172. https://doi.org/10.1023/A:1022852608280 41. Cohen, J. A Coefficient of Agreement for Nominal Scales. Educ. Psychol. Meas. 1960, 20, 37–46. https://doi.org/10.1177%2F001316446002000104 42. McHugh, M.L. Interrater reliability: The kappa statistic. Biochemia medica 2012, 22, 276–282. 43. Botchkarev, A. Performance Metrics (Error Measures) in Machine Learning Regression, Forecasting and Prognostics: Properties and Typology. IJIKM 2019, 14, 45–79. https://doi.org/10.28945/4184 44. Lan, Y.; Zhou, D.; Zhang, H.; Lai, S. Development of Early Warning Models. In Early Warning for Infectious Disease Outbreak; Yang, W. Academic Press: Cambridge, MA, USA; 2017. pp. 35–74. 45. Glorfeld, L.W. An improvement on Horn’s parallel analysis methodology for selecting the correct number of factors to retain. Educ. Psychol. Meas. 1995, 55, 377–393. https://doi.org/10.1177%2F0013164495055003002 46. Horn, J.L. A rationale and test for the number of factors in factor analysis. 
Psychometrika 1965, 30, 179–185. https://doi.org/10.1007/bf02289447 47. Lê, S.; Josse, J.; Husson, F. FactoMineR: An R package for multivariate analysis. J. Stat. Softw 2008, 25, 1–18. http://dx.doi.org/10.18637/jss.v025.i01 48. Dinga, R.; Penninx, B.W.; Veltman, D.J.; Schmaal, L.; Marquand, A.F. Beyond accuracy: Measures for assessing machine learning models, pitfalls and guidelines. bioRxiv 2019, 743138. https://doi.org/10.1101/743138 49. Hossin, M.; Sulaiman, M. A review on evaluation metrics for data classification evaluations. IJDKP 2015, 5, 1. https://doi.org/ 10.5121/ijdkp.2015.5201 50. Galdi, P.; Tagliaferri, R. Data mining: Accuracy and error measures for classification and prediction. Encyclopedia of Bioinformat‐ ics and Computational Biology 2018, 431–416. https://doi.org/10.1016/B978‐0‐12‐809633‐8.20474‐3 51. Dietterich T.G; Ensemble Methods in Machine Learning. In: Multiple Classifier Systems. MCS 2000. Lecture Notes in Computer Science, vol 1857. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3‐540‐45014‐9_. 52. Landis, J.R.; Koch, G.G. The measurement of observer agreement for categorical data. Biometrics 1977, 159–174. https://doi.org/10.2307/2529310 53. Fleiss, J.L. The measurement of interrater agreement. In: Statistical methods for rates and proportions. 2nd ed. New York, NY: John Wiley & Sons, 1981; 212–236. 54. Kenyon, P.R.; Pain, S.J.; Hutton, P.G.; Jenkinson, C.M.C.; Morris, S.T.; Peterson, S.W.; Blair, H.T. Effects of twin‐bearing ewe nutritional treatments on ewe and lamb performance to weaning. Anim. Prod. Sci. 2011, 51, 406–415. https://doi.org/10.1071/AN10184 55. Obuchowski, N.A.; Bullen, J.A. Receiver operating characteristic (ROC) curves: Review of methods with applications in diag‐ nostic medicine. Phys. Med. Biol. 2018, 63, 07TR1. https://doi.org/10.1088/1361‐6560/aab4b1 56. Agresti, A. Modelling ordered categorical data: Recent advances and future challenges. Stat. Med. 1999, 18, 2191–2207. 
https://doi.org/10.1002/(sici)1097‐0258(19990915/30)18:17/18%3C2191::aid‐sim249%3E3.0.co;2‐m 57. Kenyon, P.R.; Morris, S.T.; Burnham, D.L.; West, D.M. Effect of nutrition during pregnancy on hogget pregnancy outcome and birthweight and liveweight of lambs. N Z J. Agric. Res. 2008, 51, 77–83. https://doi.org/10.1080/00288230809510437 58. Liao, T.F. Interpreting probability models: Logit, probit, and other generalized linear models: Sage: New York, NY, USA, 1994. Agriculture 2021, 11, 162 22 of 22 59. Naeger, D.M.; Kohi, M.P.; Webb, E.M.; Phelps, A.; Ordovas, K.G.; Newman, T.B. Correctly using sensitivity, specificity, and predictive values in clinical practice: How to avoid three common pitfalls. Am. J. Roentgenol 2013, 200, W566–W70. https://doi.org/10.2214/ajr.12.9888 60. Parikh, R.; Mathai, A.; Parikh, S.; Sekhar, G.C.; Thomas, R. Understanding and using sensitivity, specificity and predictive val‐ ues. Indian J. Ophthalmol 2008, 56, 45. https://doi.org/10.4103/0301‐4738.37595 61. Böhning, D. Multinomial logistic regression algorithm. Annals of the Institute of Statistical Mathematics 1992, 44, 197–200. https://doi.org/10.1007/BF00048682 62. Chen, L. F.; Liao, H. Y.M.; Ko, M. T.; Lin, J. C.; Yu, G.‐J. A new LDA‐based face recognition system which can solve the small sample size problem. Pattern recognition 2000, 33, 1713–1726. https://doi.org/10.1016/S0031‐3203(99)00139‐9 63. Yu, H.; Yang, J. A direct LDA algorithm for high‐dimensional data—with application to face recognition. Pattern recognition 2001, 34, 2067–2070. https://doi.org/10.1016/S0031‐3203(00)00162‐X 64. Zheng, W.; Zhao, L.; Zou, C. An efficient algorithm to solve the small sample size problem for LDA. Pattern Recognition 2004, 37, 1077–1079. http://dx.doi.org/10.1016%2Fj.patcog.2003.02.001 65. Quinlan, J.R. Simplifying decision trees. Int. J. Man. Mach. Stud. 1987, 27, 221–234. 66. Quinlan, J.R. Induction of decision trees. Machine learning 1986, 1, 81–106. 67. Ho, T.K. Random decision forests. 
Proceedings of 3rd international conference on document analysis and recognition: Montreal, Canada, 14–16 August 1995; 1995: IEEE. https://doi.org/10.1109/ICDAR.1995.598994 68. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In: Krishnapuram, B. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining; San Francisco, CA, USA.2016. pp. 785–794. https://doi.org/10.1145/2939672.2939785 69. Zhang, L.; Zhan, C. Machine learning in rock facies classification: An application of XGBoost. International Geophysical Con‐ ference; 17–20 April 2017; Qingdao, China: Society of Exploration Geophysicists and Chinese Petroleum Society. https://doi.org/10.1190/IGC2017‐351 70. Imandoust, S.B.; Bolandraftar, M. Application of k‐nearest neighbor (knn) approach for predicting economic events: Theoretical background. IJERA 2013, 3, 605–610. 71. Gunn, S.R. Support vector machines for classification and regression. ISIS technical report 1998, 14, 5–16. Available at https://svms.org/tutorials 72. Durgesh, K.S.; Lekha, B. Data classification using support vector machine. J. Theor. Appl. Inf. Technol. 2010, 12, 1–7. Available on http://www.jatit.org. 73. Tu, J.V. Advantages and disadvantages of using artificial neural networks versus logistic regression for predicting medical outcomes. J. Clin. Epidemiol. 1996, 49, 1225–1231. https://doi.org/10.1016/S0895‐4356(96)00002‐9.
Agriculture – Multidisciplinary Digital Publishing Institute
Published: Feb 17, 2021