Access the full text.
Sign up today, get DeepDyve free for 14 days.
Shuming Shi, Guangwen Yang, Dingxing Wang, Weimin Zheng (2002)
Potential-Based Hierarchical Clustering
Dongkuan Xu, Ying-jie Tian (2015)
A Comprehensive Survey of Clustering AlgorithmsAnnals of Data Science, 2
R. Thrall (1978)
Mathematics of Operations Research.
R. Fisher (1936)
THE USE OF MULTIPLE MEASUREMENTS IN TAXONOMIC PROBLEMSAnnals of Human Genetics, 7
Junlin Li, Hongguang Fu (2011)
Molecular dynamics-like data clustering approachPattern Recognit., 44
(2011)
UCI machine learning repository: Iris data set
Singiresu Rao, R. Desai (1980)
Optimization Theory and ApplicationsIEEE Transactions on Systems, Man, and Cybernetics, 10
Armen Aghajanyan (2015)
Gravitational ClusteringArXiv, abs/1509.01659
P. Berkhin (2006)
A Survey of Clustering Data Mining Techniques
U. Brandes, G. Robins, Ann McCranie, S. Wasserman (2013)
What is network science?Network Science, 1
A. Kip, J. Kelley (1970)
Fundamentals of Electricity and MagnetismPhysics Today, 23
A. Wachs, I. Schochetman, Robert Smith (2011)
Average Optimality in Nonhomogeneous Infinite Horizon Markov Decision ProcessesMath. Oper. Res., 36
A. Jamalizadeh, N. Balakrishnan (2008)
On a Generalization of Bivariate Cauchy DistributionCommunications in Statistics - Theory and Methods, 37
J. Casti (2002)
The waves of life: The Elliott wave principle and the patterns of everyday eventsComplex., 7
Y. Lu, Y. Wan (2012)
Clustering by Sorting Potential Values (CSPV): A novel potential-based clustering methodPattern Recognit., 45
C Henning (2015)
What are true clusters?Pattern Recognition Letters, 64
C. Hennig (2015)
What are the true clusters?Pattern Recognit. Lett., 64
Shuming Shi, Yang Guangwen, W. Ding-xing, Zheng Weimin (2002)
Potential-based hierarchical clusteringObject recognition supported by user interaction for service robots, 4
A. Ramsey (1941)
The Theory of Newtonian Attraction. (Scientific Books: An Introduction to the Theory of Newtonian Attraction)Science
W. Feller (1959)
An Introduction to Probability Theory and Its Applications
W. Wright (1973)
A formalization of cluster analysisPattern Recognit., 5
AS Ramsey (1959)
An introduction to the theory of Newtonian attraction
M. Dubson (2014)
The Theoretical Minimum: What You Need to Know to Start Doing Physics.American Journal of Physics, 82
L. Susskind, G. Hrabovsky (2013)
The theoretical minimum : what you need to know to start doing physics
Li Junlin, Fu Hongguang (2011)
Molecular dynamics-like data clustering approachPattern Recognition, 44
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations
J. Ecker (1988)
Introduction to Operations ResearchThe Mathematical Gazette, 73
The concept of proximity curve and a new algorithm are proposed for obtaining clusters in a finite set of data points in the finite dimensional Euclidean space. Each point is endowed with a potential constructed by means of a multi-dimensional Cauchy density, contribut- ing to an overall anisotropic potential function. Guided by the steepest descent algorithm, the data points are successively visited and removed one by one, and at each stage the overall potential is updated and the magnitude of its local gradient is calculated. The result is a finite sequence of tuples, the proximity curve, whose pattern is analysed to give rise to a deterministic clustering. The finite set of all such proximity curves in conjunction with a simulation study of their distribution results in a probabilistic clustering represented by a distribution on the set of dendrograms. A two-dimensional synthetic data set is used to illus- trate the proposed potential-based clustering idea. It is shown that the results achieved are plausible since both the ‘geographic distribution’ of data points as well as the ‘topographic features’ imposed by the potential function are well reflected in the suggested clustering. Experiments using the Iris data set are conducted for validation purposes on classification and clustering benchmark data. The results are consistent with the proposed theoretical framework and data properties, and open new approaches and applications to consider data processing from different perspectives and interpret data attributes contribution to patterns. Keywords Clustering · Physical model · Anisotropic potential · Cauchy class of distributions · Steepest descent · Probabilistic dendrogram · Proximity curve · Iris data set Attila Csenki a.csenki@bradford.ac.uk Daniel Neagu d.neagu@bradford.ac.uk Denis Torgunov d.torgunov@bradford.ac.uk Natasha Micic n.micic@bradford.ac.uk Artificial Intelligence Research Group, University of Bradford, Bradford BD7 1DP, UK Journal of Classiﬁcation 1 Introduction The history of work on clustering algorithms based on, and motivated by the gravitational (and electrostatic) models stretches back to the 1970s. They are too numerous to be exhaus- tively reviewed here; we review below a few relevant highlights, thereby placing our work into context and pointing out its distinguishing features. For a general review of clustering techniques, see Berkhin (2006) and for a recent survey see Xu and Tian (2015). One of the earliest papers of cognate interest is Wright (1977b), where the attractive force (between the particles) under Newtonian gravity moves the data points to form successively larger and fewer clumps, eventually resulting in all the mass being lumped together in one single cluster. Starting with a set of data points where each point has a potential, in Shi et al. (2002), an agglomerative hierarchical clustering algorithm is presented based on a comparison of dis- tances between already defined clusters. The distance used in Shi et al. (2002) is calculated as some weighted value of differences of potentials. In Lu and Wan (2012), each of the data points is endowed with a potential (of a Newto- nian type), and the points are sorted in a list in ascending order of their potential. Cluster centres are nominated as the list is being worked through by using a preselected bandwidth parameter; being able to measure the distance between points is an essential feature of the algorithm. An interesting recent paper with a philosophical flavour is Henning (2015). The authors reason that there is no such thing as the ‘best’ clustering algorithm since the notion of a cluster will be dependent on the context and on the intention of the investigator. On the other end of the spectrum, there is the early paper Wright (1977a), seeking to make clustering into an objective endeavour by assuming a priori a ‘clustering function’ which then is used to assess the quality of the clustering achieved. The view taken here is that ‘reality’ and ‘truth’ are elusive concepts and can be at best of partial concern to the modeller; important will be that, initially, a collection of (more or less plausible) models is available, and out of these the one is chosen which most successfully predicts future events. We want our model to be understood and received in this spirit: the model put forward is rooted in the physics of own data attributes, it is rich in structure and appears plausible in certain circumstances. The algorithm suggested here is an hierarchical clustering algorithm for a finite set of data points in the Euclidean space, each of which is assumed to be generating a potential field constructed by means of the density of a certain family of probability distributions. The individual point potentials can be anisotropic, thus allowing the surface representing the total potential to be thought of (in two dimensions for the sake of simplicity) as an undulating landscape with hills and valleys extending in any direction. This construction will allow clusters of data points on a ‘real’ surface to be modelled, a feature which may turn out to be useful in future work in epidemic modelling using networks (see Barabasi ´ 2016, Chapter 10). The novelty of our model also lies in the fact that the result is not a single dendrogram but a randomized dendrogram, defined as a convex combination of deterministic dendrograms, thus allowing the results to be formulated in a probabilistic framework. The important novel feature introduced here is that of the proximity curve whose pattern will be the key for determining the suggested clustering. A welcomed by-product of our algorithm is that the black hole problem (as a local minima challenge discussed for example in Li and Fu 2011) does not appear here: all massive data points will be ‘collected’ prior to visiting genuine clusters comprising smaller data points. Journal of Classiﬁcation In the next section, we place the paper’s topic into context and introduce some notation. Then, in the main Section 3, the new proximity curve concept and proposed algorithm are described in detail and illustrated by using a synthetic data set. In Section 4, experiments using the Iris data set (Fisher 1936, 2011) are conducted for validation purposes on classi- fication and clustering benchmark data. In Section 5, the results and various features of the suggested technique and its merit are highlighted. In Section 6, our findings are summarised and some future research ideas are indicated. Appendices A – C contain auxiliary theoretical material, whereas all nontextual material namely contour maps, proximity curve, potential curve, dendrogram, pseudocodes, tables and large matrices are collected in Appendix E. 2 Context and Assumptions We are given N data points x ,..., x in the Euclidean space, assumed here for the sake of 1 N simplicity to be the 2-dimensional plane R . This is not a technical limitation as such; it is intended merely as a vehicle for conveying the underlying principles. The ith point induces the potentials v , generating the total potential as follows: V(y) = v (y), y ∈ R . i=1 Depending on the context and emphases, the data points may be referred to also as nodes, point masses, point charges or simply points. The collection (set) of all data points may be referred to as a data cloud. The structure of the point potentials v is as follows: v (y) =−M f (y θ ) , i i i where M > 0 is the ‘mass’ of point i and f (. θ ) stands for a Cauchy probability den- sity functions (pdf) with parameter θ. The parameter θ has five components, reflecting the geometric transformation steps carried out to convey the seed distribution (a standard Cauchy) to the required Cauchy distribution; the construction involved is described in detail in Appendices A – C. Our running example is a synthetic data set comprising N = 19 points in the plane shown in Fig. 2 with individual and joint equipotentials. The corresponding parameter values are shown in Table 2. We will be interested in an algorithm for finding data clusters based on the points’ location and the overall potential field generated by them. The physical models of Gravity and Electrostatics serve us as an intuitive background (e.g. Ramsey 1959;Kip 1962). The field’s gradient vector ∇V(y) is the force exerted onto an infinitesimal unit ‘test mass’ or ‘test charge’ located at y. The gradient vector ∇V(y) will be used to reflect qualitatively on where the point y in relationship with the others lies: is ∇V(y),the magnitude of the gradient, close to zero, so is y near the bottom of a valley (i.e. near a local minimum of V ), whereas ‘noticeably positive’ values of ∇V(y) are associated with the point y being on a slope on the surface describing V . Respective examples in Fig. 2 are the points #10 and #6. (See Table 2 for the points’ numbering.) The point y may also be located near a ridge if ∇V(y)≈ 0, but this possibility we choose to ignore as in our model is unlikely to occur. Journal of Classiﬁcation Our aim is to discover data clusters by considering properties derived from the gradient field. Finite lists of tuples of real numbers will be constructed (we shall call them proximity curves) whose pattern will suggest ways of clustering the data. 3 The Algorithm: Cluster Discovery by Proximity Curves 3.1 Proximity Curves Initially, all data points are said to be active. Starting from some initial position, steepest descent is applied to the total potential generated by all the active points, guiding us to one of the data points. The point thus found is then deactivated, the total potential is redefined (i.e. updated) to be the one generated by the now active points and the process is restarted at the position of the point just deactivated. This process is carried out until the last point has been found upon which all points become inactive and the total potential is zero. The core of this algorithm is PROXIMITYCURVE; it is shown as Pseudocode E-2. As can be gleaned from Pseudocode E-2, in the course of the computations, a list of pairs is being built up, each entry comprising the label of the node being deactivated and the logarithm of the vector norm of the gradient of the updated potential at the node’s location. (The algo- rithm CLOSESTACTIVE, shown as Pseudocode E-3, is an auxiliary.) Figure 3 illustrates the progress of the algorithm with a snapshot of the equipotentials just after having deactivated seven data points. A proximity curve, illustrated by Fig. 4, is a plot of the logarithm of the vector norm of the gradient of the successively updated potential (as described above) versus the points’ position in the list (the logarithm applied to the magnitude of the gradient is merely there to express and amplify an existing pattern). In total, there will be at the most N proximity curves as in practice some nodes cannot be reached to be the first node to be visited and since the first node visited completely determines the rest of the proximity curve. The full information about all possible proximity curves can be conveyed by two square matrices of size N: – The rows of matrix S = s record all possible sequences of nodes, thus ij i,j =1,...,N obtainable. – The rows of matrix = λ are the logarithm of the norm of the gradient of ij i,j =1,...,N the corresponding successively updated potential at the respective node locations. This is illustrated in Fig. 4 by the proximity curve whose first node is #2. (It was obtained by starting PROXIMITYCURVE in (− 2, − 6). The curve shown in Fig. 4 is obtained by combining the second rows of each of the matrices S and (shown in Appendix E.4)into the list of pairs L =[(s ,λ ) , (s ,λ ) , (s ,λ ) ,..., s ,λ , (s ,λ )] 21 21 22 22 23 23 2(N −1) 2(N −1) 2N 2N =[(2, −3.39), (1, −4.07), (16, −2.12), ...,(7, −2.46), (9, −∞)].(1) 3.2 Pattern Extraction The curve shown in Fig. 4 has roughly a self-similar structure (in a statistical sense) akin to the Elliott Wave Pattern described in Casti (2002) in the context of financial data. The prin- ciple formulated below will guide us when using proximity curves for identifying clusters of data points. Journal of Classiﬁcation Guiding Principle. Large values of the norm of the gradient of the updated potential function are indicative of the beginning of a new cluster. Decreasing values of the norm of the gradient indicate a lack of ‘attractive mass’ of what is left in the current cluster. Cluster boundaries are therefore marked by local minima of the proximity curve. The pattern of a proximity curve is explored by a recursive algorithm. –The recursive step is in locating the smallest local minimum. In the example shown in Fig. 4, this occurs at node #13 as is seen from the appropriate section of the list representation of the proximity curve (1) as follows: L =[(2, −3.39) ,..., (15, −2.68) , (13, −7.72) , (19, −7.14) ,..., (9, −∞)] The proximity curve L is now recognised as the concatenation of the two lists: L =[(2, −3.39) ,..., (15, −2.68) , (13, −7.72)] left and L =[(19, −7.14) ,..., (9, −∞)], right each of which is a proximity curve in its own right. Obviously, if local minima determine cluster boundaries, then L and L give rise to the clustering as follows: left right {{2,..., 15, 13}, {19,..., 7, 9}}, (2) defining level 2 in the hierarchical clustering scheme. The next level of clustering will be arrived at by subjecting each of the lists L and L to thesamerecursivestep. (A left right list not containing a local minimum will not be expanded further but will be replaced by a list containing the list itself as the only entry.) –The base case is arrived at if none of the input lists can be written as the concatenation of two non-empty lists as described above. 3.3 Deterministic Dendrograms The algorithm in Section 3.2 can be used for obtaining a nested list representation of the set of nodes indicating the clustering thus found; for the present example, this is as follows: [[[[[2, 1]], [[16, 17, 18], [3, 4, 5]]], [[[14, 12, 15, 13]]]], [[[[19, 10, 11, 8]]], [[[6, 7, 9]]]]]. The dendrogram shown in Fig. 5 is arrived at by parsing this nested list structure. The upper hyphenated line in Fig. 5 indicates the clustering shown in Eq. 2 (at depth 2); in full, {{2, 1, 16, 17, 18, 3, 4, 5, 14, 12, 15, 13}, {19, 10, 11, 8, 6, 7, 9}}. The lower hyphenated line indicates the clustering found at depth 3 (see Fig. 5), {{{2, 1, 16, 17, 18, 3, 4, 5}, {14, 12, 15, 13}}, {{19, 10, 11, 8}, {6, 7, 9}}}. 3.4 Probabilistic Dendrograms Every proximity curve gives rise uniquely to a fully developed dendrogram as described in Sections 3.2 – 3.3. Not all conceivable proximity curves can be obtained by the present algorithm, however, as is illustrated by the running example. We are going to describe here how those reachable can be combined to a probabilistic (or randomized) dendrogram. Such a dendrogram allows the user to reason probabilistically and motivate a choice of starting Journal of Classiﬁcation point. Questions like ”What is the probability that two given points belong to the same cluster?” can then be meaningfully addressed. Choose a window of interest W ⊂ R . To any given position u ∈ W, a possible den- drogram is assigned in the following manner. W is chosen so that it defines a window that the entire data set fits into, and that will provide a selection of possible points which are “sensible”, i.e. could have been observed given the context of the initial data. To arrive at z = STEEPESTDESCENT (u,V ), use Pseudocode E-1 from Appendix E.3. The point z will be the position of a local minimum of the potential V . Find the data point x ∈{x ,..., x } which is closest to z. Circumstances will be in practice such that z is ι(u) 1 N unique (as far as the numerical procedure allows), resulting in a unique choice of the closest point x . The process thus described defines a mapping as follows: ι(u) ι : W →{1,...,N } u → ι(u) (start) Running Pseudocode E-2 (see Appendix E.3) with the initial vector z = u will result in the proximity curve whose first node is labelled ι(u). The proximity curve thus found is unique in that no two different proximity curves have the same first node. The dendrogram based on this proximity curve will be denoted by D . ι(u) Let u ,..., u be a random sample of size from the uniform distribution on W.We define the dendrogram probability vector d by the following: d = (d ,...,d ) , 1 N whose kth component d = I (3) k {ι(u )=k} j =1 is the relative frequency of the node k having been chosen as the first data point for the corresponding proximity curve, where 1if x ∈ B I (x) = (4) 0if x/ ∈ B is the indicator function of a given set B. The probability of two given nodes m = m being in the same cluster can now be 1 2 evaluated by the following: P (nodes m and m are in the same cluster) = 1 2 d P (nodes m and m are in the same cluster|D applies).(5) k 1 2 k k=1 The conditional probabilities in Eq. 5 take values in {0, 1}; they are obtained by inspecting dendrogram D . There are in total N(N − 1)/2 probabilities defined in Eq. 5. Probabilities for i pairwise different nodes m ,...,m being in the same cluster can be 1 i evaluated analogously; there are such values. Based on a sample of size = 1, 000, the dendrogram probabilities were calculated by Eq. 3;theyare showninTable 3. Notice that the data points 6, 7, 9, 13, and 15 (and the corresponding dendrograms) are unreachable, which is a consequence of none of these points being located close enough to a valley (a local minimum) of V ,or, more precisely,for all of these data points there is another data point closer to the candidate minimum location; see Fig. 2, the right-hand box. Journal of Classiﬁcation Consider m = 16, m = 18 as an example. It can be seen by inspection that out of the 1 2 14 possible dendrograms (not shown here), each of the following 9 will assign these two nodes to the same cluster: {D , D , D , D , D , D , D , D , D }. 2 3 4 8 10 11 12 16 19 It is therefore P (nodes #16 and #18 are in the same cluster) = d + d + d + d + d + d + d + d + d = 0.698 2 3 4 8 10 11 12 16 19 In the above calculations and example, fully developed dendrograms were considered only (requiring us to descend to depth 5). In general, the probability of any given set of nodes belonging to the same cluster can be calculated for any required level in a similar manner. 3.5 Implementation A suite of programmes has been written to implement the algorithms. The functions in Appendix E.3 were implemented in R since this has excellent programming and plotting facilities for data analytics. The pictorial and computational output of the R programs are the contour plots and the proximity curve, the latter internally represented as a list-of-tuples which then serves as an input to a recursive Haskell programme returning the dendrogram as a nested list. Haskell was also used to produce LT X code for drawing the dendrogram (as a tree) as exemplified in Fig. 5. The associated code and data are available from http://www.comp.brad.ac.uk/ dneagu/proximity 4 Experimental Work In order to evaluate the accuracy achieved by the clustering method proposed, we use the Iris data set (Fisher 2011), which is commonly used in the literature. The Cauchy parameters are selected from the data set as detailed below. The Iris data set has the following fields: – Sepal width (S.W.) – Sepal length (S.L.) – Petal width (P.W.) – Petal length (P.L.) In order to test how the algorithm behaves under different assignments of those fields to Cauchy distribution parameters, we varied which Iris data set field gets assigned to which parameter (μ , μ , c , c ). 1 2 1 2 For each of the inputs, the mass is assumed to be 1, and the angle parameters are computed as follows: μ + μ + c + c i1 i2 i1 i2 α = 2π max(μ ) + max(μ ) + max(c ) + max(c ) 1 2 1 2 It should be noted that this only one pragmatic assignment of α , but in principle other functions of the other parameters can be used. In addition, since we know which points should be clustered together (i.e. those belong- ing to the same Iris species), we can evaluate the accuracy for a given clustering (a given depth of the dendrogram). The accuracy is measured by determining what proportion of the Journal of Classiﬁcation data points belong to the correct cluster. The class label assigned to the cluster is chosen based on what class the majority of the points in the cluster belong to. The full results, for all possible parameter assignments, are given in Appendix E.5. We consider here the 4 parameter assignments shown in Table 1, along with the related accuracies, for the sake of simplicity (the experimental work has covered the entire range of possible permutations; please see Appendix E.5 for the entire list of results). Accuracy at depth 1 is not shown, as that represents the root of the tree, i.e. the point before any clustering has been attempted. This information is visualised in Fig. 7. As evidenced by Table 1, the accuracy achieved is highly dependent on the parameter assignments chosen. For example, the best overall result, as well as the fastest, is obtained for θ . That, after the first split, gives the accuracy of 0.593, which grows to 0.607 after one more iteration, and reaches 0.860 at dendogram depth 5. These results validate our theoretical assumption that data attributes translated into Cauchy parameters (which give rise to a particular proximity curve) influence and determine cluster definitions. The proposed algorithm allows further tun- ing in subsequent iterations, creating the opportunity to further improve classification results. 5 Discussion The distinguishing feature of our algorithm is that the data points are kept stationary while the space is explored with a moving unit test charge (as in electrostatics) or test mass (as in gravity) to discover clusters. In doing so, the test charge will be attracted to groups of charges (or masses), depending on the force of attraction as measured by the gradient of the potential. As a data point is ‘discovered’, it is deactivated (i.e. removed) and the gradient of the updated potential field is used to guide us to discover the next data point. By successively removing data points, the current cluster gets depleted, resulting in a steady weakening of attraction to what remains of the cluster. Eventually, the current cluster gets exhausted, indicated by the small magnitude of the gradient of the potential. This then is the sign for starting a new cluster: the magnitude of the gradient of the total potential of the remaining active points begins to increase. The pattern of the proximity curve thus constructed holds the clue to the definition of the clustering. The process described above suggests a nested structure of clusters inherent in hierarchi- cal clustering. Individual proximity curves are usually suitable for finding the clustering at the local level because removal of points may eventually noticeably interfere with the global interplay of individual point potentials. The effect of ‘point removal’ inherent in the technique is intended to be counteracted by taking into account all possible (reachable) proximity curves in a probabilistic sense. Table 1 Parameter assignments i θ Accuracy per depth and accuracy measures μ μ c c 2345 1 2 1 2 1 S.L. P.L. S.W. P.W. 0.387 0.693 0.733 0.793 2 S.L. P.W. S.W. P.L. 0.360 0.380 0.600 0.640 3 S.W. P.L. S.L. P.W. 0.593 0.607 0.713 0.860 4 S.W. P.W. S.L. P.L. 0.473 0.587 0.640 0.733 Journal of Classiﬁcation The dendrogram in Fig. 5 in conjunction with the set’s joint equipotentials in Fig. 2 shows that the clustering suggested is intuitively plausible. The fact that point #19 is not grouped together with the points {16, 17, 18} is attributable to the fact that #19, even though it is physically close to the group, is separated from it by a ridge. Figure 3 shows a snapshot of the node deactivation process. It is seen that the algorithm duly observes both the mutual closeness of data points as well as the topography of the potential surface. Finally, it is instructive to see that the north-eastern group of data points {14, 12, 15, 13} is separated by the dendrogram from the south-western group {6, 7, 9} even though the locations of all these points are roughly on the same potential level, as shown in Fig. 6. This illustrates once more the point that the algorithm takes into account mutual geographic proximity of points as well as surface topography. 6 Conclusion and Outlook The new concept of the proximity curve has been introduced and used for finding clusterings in finite sets of data points in the Euclidean space endowed with individual potentials derived from a multivariate Cauchy distribution. The finite collection of all proximity curves gives rise to a probabilistic dendrogram. A synthetic example comprising 19 data points was taken to illustrate the technique. The evaluation of the algorithm on a combination of Iris data set case studies reported in Section 4 proves that this new approach to consider data features’ contri- butions to the global field potential as a way to discover clusters is promising and valuable. Further research is planned to explore the technique for data sets in higher dimension, for large data sets, for more real-life data sets, and the associated computational complexity. More comprehensive evaluations on other cases with real data is envisaged as future work allowing an insight into the applicability of the proposed methodology. Another research direction could address how the technique generalises to the continuous case where the data set is so large and densely packed that it is reasonable to model it as a continuum. The authors see interesting research avenues on parameter tuning and extension to multi- dimensional spaces that can be treated as multi-objective optimisation challenges. Acknowledgements The paper has greatly benefitted from the detailed comments of the referees. It has been pointed out to the authors that the present paper may have interesting connections to the fields Big Data Analytics and Physics/Cosmology. All this will have to be explored in the future. The authors acknowledge and thank the contributors to the UCI Machine Learning (Lichman 2013) repository for making available the Iris benchmark data set. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Appendix A: Classical NEWTONIAN Gravitational Potential The book (Susskind and Hrabovsky 2013) is a good source for those who have heard it all before and want only to refresh. Ramsey’s book (1959) is a beautiful classic on gravity. Journal of Classiﬁcation A.1 Single Point Mass A point mass of size M placed at the co-ordinate origin creates a gravitational potential which at a distance r from the origin is as follows: V(r) =−κ , (6) where κ is the gravitational constant. The force of attraction on a unit point mass is the negative of the derivative of V(r) in Eq. 6 w.r.t. r, d M F(r) =− V(r) =−κ.(7) dr r ThenegativesigninEq. 7 indicates that the force points towards the origin (the location of the point mass M ). If Cartesian co-ordinates are used then Eq. 6 nowreadsasfollows: V (x, y, z) =−κ (8) 2 2 2 x + y + z We may visualise the potential in Eq. 8 as a funnel with rotational symmetry around the origin which descends to minus infinity. The origin is a singularity of the potential. (It is not defined there.) If the point mass is at (x ,y ,z ), then its potential at the location (x, y, z) is as follows: 0 0 0 V (x, y, z | x ,y ,z ) =−κ (9) 0 0 0 2 2 2 (x − x ) + (y − y ) + (z − z ) 0 0 0 A particular component of the attractive force is the partial derivative of −V in Eq. 9 in the respective direction; for example, the x component of the attractive force at location (x, y, z) is as follows: ∂ (x − x ) − V (x, y, z | x ,y ,z ) =−κM (10) 0 0 0 0 ∂x 2 2 2 (x − x ) + (y − y ) + (z − z ) 0 0 0 Notice that if x = y = z = 0, then Eq. 10 specialises to the Cartesian form of Eq. 7, 0 0 0 M 1 F(x,y,z) =−κ xyz (11) 2 2 2 2 2 2 x + y + z x + y + z Also notice that the unit vector on the right-hand side of Eq. 11, xyz 2 2 2 x + y + z points in the direction of the point (x,y,z), and the magnitude of the attractive force in Eq. 11 is as follows: F(x, y, z)= κ 2 2 2 x + y + z Journal of Classiﬁcation A.2 Several Point Masses Let us assume that we are given N points at respective locations (x ,y ,z ) and masses i i i M , i = 1,...,N. Denoting by v individual points’ potential, the total potential is by superposition V (x, y, z) = v(x, y, z | x ,y ,z ) i i i i=1 and the force acting upon a point of unit infinitesimal mass is the negative of the gradient of the potential, ∂ ∂ ∂ F(x,y,z) =− V (x, y, z), V (x, y, z), V (x, y, z) ∂x ∂y ∂y ∂ ∂ =− v(x, y, z | x ,y ,z ), v(x, y, z | x ,y ,z ), i i i i i i ∂x ∂y i=1 v(x, y, z | x ,y ,z ) (12) i i i ∂y The operation ‘gradient’ applied to a real function of several variables is sometimes denoted also by the operator ∇ (the nabla operator). Using this notation, Eq. 12 becomes as follows: F(x,y,z) =−∇V (x, y, z) Appendix B: Potentials Generated by Cauchy Densities B.1 General Considerations A data cloud is viewed here as a collection of point masses. The same data point recorded n times will carry the mass n. The potential functions used here were inspired by the Newtonian potential. The Newtonian gravity model is, however, not suitable since it has singularities and does not allow anisotropic potential fields to be modelled. To avoid singularities, each point mass will be assumed to be generating a potential field which is the weighted negative density of a (sufficiently ‘smooth’) 2-D distribution. This approach will also allow directionality (anisotropy) of the potential field to be modelled. It will be assumed that a class of probability densities on R is given by the following: {f (y ,y θ ) : θ ∈ }, (13) Y 1 2 and that the ith of the N point masses generates the point potential v (y ,y ) =−M f (y ,y | θ ) , (14) i 1 2 i Y 1 2 i where M in Eq. 14 is the mass of the ith point. The class of densities in Eq. 13 arises from a seed distribution by subjecting it to an affine linear transformation as described in Appendix C. In addition to the mass, the ith point is associated with five more parameters, –Two location parameters μ ,μ ∈ R i1 i2 –Two stretch parameters c ,c > 0. It is 0 <c ≤ 1 for ‘squeezing’ and 1 ≤ c< ∞ i1 i2 for ‘stretching’ –An angle of rotation paremeter α ∈[0, 2π) They will be called the geometric parameters. Journal of Classiﬁcation The total potential, the sum of the point potentials, is the negative of a linear combination of 2-D densities of the class (13). We mention in passing that the negative of the normalised total potential is therefore the probability density of a mixture distribution. The components of the parameter θ in Eq. 13 will be the vector of geometric parameters θ = (μ ,μ ,c ,c ,α) . 1 2 1 2 The point potential of the point mass i is then v (y ,y ) =−M f (y ,y | μ ,μ ,c ,c ,α ) (15) i 1 2 i Y 1 2 i1 i2 i1 i2 i The total potential generated by the N point masses is by superposition as follows: N N V(y ,y ) = v (y ,y ) =− M f (y ,y | μ ,μ ,c ,c ,α ) (16) 1 2 i 1 2 i Y 1 2 i1 i2 i1 i2 i i=1 i=1 The force acting upon an infinitesimal unit mass in the plane is the negative of the gradient of V in that point. B.2 Cauchy Type Potentials The context is as described above where now the underlying class of distributions is Cauchy. More precisely, the class of densities in Eq. 13 is the set of all 2-D Cauchy densities obtain- able from the seed density in Eq. 29 by applying an affine linear transformation. The Cauchy density in Eqs. 31–32 has the form as follows: −3/2 2 2 f y ,y = K 1+K (y −μ ) +K (y −μ )(y −μ )+K (y −μ ) , (17) ( ) Y 1 2 0 1 1 1 12 1 1 2 2 2 2 2 with constants K defined in terms of the three of the five natural parameters as follows: K = , (18) 2πc c 1 2 2 2 cos α sin α K = + , (19) 2 2 c c 1 2 2 2 sin α cos α K = + , (20) 2 2 c c 1 2 1 1 K = sin 2α − . (21) 2 2 c c 1 2 By partially differentiating (17), we get the following: ∂f 3 −5/2 =− K (...) (2K (y − μ ) + K (y − μ )) 0 1 1 1 12 2 2 ∂y 2 5/3 3 1 −3/2 =− K K (...) (2K (y − μ ) + K (y − μ )) 0 0 1 1 1 12 2 2 2 K −2/3 5/3 =− K (f ) (2K (y − μ ) + K (y − μ )) (22) Y 1 1 1 12 2 2 Similarly, ∂f Y −2/3 5/3 =− K (f ) (2K (y − μ ) + K (y − μ )) (23) Y 2 2 2 12 1 1 ∂y 2 Journal of Classiﬁcation From Eqs. 22–23, the gradient of f is seen to be as follows: ∂f (y ,y ) Y 1 2 2K K y μ −2/3 1 12 1 1 ∂y 5/3 ∇f (y ,y ) = =− K (f (y ,y )) − Y 1 2 Y 1 2 ∂f (y ,y ) Y 1 2 K 2K y μ 2 12 2 2 2 ∂y (24) From Eqs. 14 and Eq. 24, the gradient of v , the potential of the ith single point mass is by Eq. 15 as follows: ∂v (y ,y ) i 1 2 2K K y μ −2/3 5/3 ∂y i,1 i,12 1 1 ∇v (y ,y ) = = M K f (y ,y ) − , i 1 2 i i,Y 1 2 ∂v (y ,y ) i,0 i 1 2 K 2K y μ i,12 i,2 2 2 ∂y (25) where f stands for the density f from Eq. 17 with the parameters being i,Y Y μ ,μ ,c ,c and α . Furthermore, the quantities K in Eq. 25 are defined by Eqs. 18–21; i1 i2 i1 i2 i i they are to be evaluated at the same set of parameter values μ ,μ ,c ,c and α . i1 i2 i1 i2 i The gradient of the total potential is the sum of the values in Eq. 25, 2K K y μ −2/3 5/3 i,1 i,12 1 1 ∇V(y ,y ) = M K f (y ,y ) − 1 2 i i,Y 1 2 i,0 K 2K y μ 2 i,12 i,2 2 2 i=1 Appendix C: Cauchy Distributions in 2-D The type of point potentiused in this paper for data points derives from densities from the well-known Cauchy class of probability distributions. Facts are collected here for reference in the main body of the paper. C.1 Seed Distribution The standard one dimensional Cauchy distribution is usually presented as the prime example of a distribution on R whose mean does not exist. Its probability density function is as follows: f (x) = ,x ∈ R. (26) π 1 + x The appeal of a function like the one in Eq. 26 lies in the fact that it does not tend ‘too fast’ to zero as the argument is taken to infinity; such distributions are sometimes referred to as ‘fat tailed’. 2-D versions of the Cauchy distribution are much less well-known. The 2-D analogue of the Cauchy density (26) is as follows: f (x ,x | ρ) = (27) X 1 2 3/2 −1 2π det (ρ) 1 + (x ,x ) (ρ) 1 2 where ρ ∈ (−1, +1) is a parameter and 1 ρ (ρ) = . (28) ρ 1 This feature is quite unlike in the Gaussian case where densities tend exponentially to zero as the argument tends to infinity. Because of this feature, potentials based on Cauchy-like functions are able to maintain the connection between clumps of data points where Gauss-based potentials would fail to do so. Journal of Classiﬁcation The density in Eqs. 27–28 is referred to as a standard bivariate Cauchy density, Jamalizadeh and Balakrishnan (2008). It will be transformed by a sequence of geometric operations. Assume that the 2-D random vector X is standard bivariate Cauchy distributed with density (27)–(28) with ρ = 0, f (x ,x ) = ,x ,x ∈ R. (29) X 1 2 1 2 3/2 2 2 2π 1 + x + x 1 2 Cauchy distributions, univariate and bivariate, are discussed extensively in Feller (1971) where (29)istermed the standard bivariate Cauchy density. It will be taken to be the seed for the geometric transformations carried out next in Appendix C.2. C.2 Transformed Distribution Assume now that Y comes about by subjecting X to the composition of the following three geometric operations: 1. Stretch by c ,c > 0 along the respective axes 1 2 2. Rotate by the angle α ∈[0, 2π) 3. Shift by the vector (μ ,μ ) 1 2 An affine linear transformation T maps X to Y = T (X),where cos α − sin α c 0 X μ 1 1 1 Y = + . (30) sin α cos α 0 c X μ 2 2 2 By solving (30)for X, we get the following: / 0 cos α sin α Y − μ c 1 1 −1 1 X = T (Y) = , 0 / − sin α cos α Y − μ 2 2 which then component-wise is written as following: cos α (Y − μ ) + sin α (Y − μ ) 1 1 2 2 X = , − sin α (Y − μ ) + cos α (Y − μ ) 1 1 2 2 X = . According to the transformation rule of probability densities, it is as following: 1 1 1 −1 f (y) = f T (y) = , (31) Y X −1 c c D det T T (y) 1 2 where the denominator D in Eq. 31 is as follows: 2 2 cos α sin α 1 1 D = 2π 1 + + (y − μ ) + sin 2α − (y − μ )(y − μ ) 1 1 1 1 2 2 2 2 2 2 c c c c 1 2 1 2 3/2 sin α cos α + + (y − μ ) (32) 2 2 2 2 c c 1 2 Figure 1 illustrates the succession of transformations for as follows: 1 3 T T ◦ μ = (−1.5, 2) , c = (1, ),α = π (= 67.5 ) . (33) 2 8 0.2 0.15 0.25 0.05 0.14 0.05 0.12 0.08 0.1 0.15 0.06 0.04 0.1 0.02 0.15 0.05 0.1 Journal of Classiﬁcation Corresponding formulae can be developed also for the 2-D Gaussian class which, how- ever, in contrast to the Cauchy class, turns out not to be suitable for our present purposes as the effect of distribution dies off fairly quickly for arguments further away from the distribution’s centre. Appendix D: Optimisation: Steepest Descent The steepest descent algorithm is an iterative algorithm for unconstrained minimisation of a differentiable real function of several real variables (Ecker and Kupferschmid 1988; Marlow 1978;Rao 1984). The equal steplength version will be used in the algorithm for determining the proximity curve. Removing a point reached changes the landscape, assuring a new minimum can be reached (since the current position is no longer a minimum after a point is deactivated). The algorithm is shown in Appendix E.3 as Pseudocode E-1. Appendix E: Non-textual Material E.1 Contourmaps Successive Transformations on the 2−D Cauchy pdf 0.1 −3 −2 −1 0 1 2 3 Fig. 1 Contours of Four Cauchy Densities: Seed 0.2 0.2 0.25 0.25 −3 −2 −1 01234 −0.45 −0.4 −0.55 −0.45 −0.3 −0.25 −0.25 −0.25 −0.2 −0.12 −0.12 −0.1 −0.15 −0.12 −0.15 −0.4 −0.2 −0.08 −0.1 −0.1 −0.05 −0.05 −0.05 −0.25 −0.1 −0.25 −0.08 −0.05 −0.05 −0.1 −0.45 −0.15 −0.08 −0.1 −0.05 −0.05 −0.06 −0.04 −0.05 −0.05 −0.45 −0.04 −0.06 −0.1 −0.05 −0.55 −0.1 −0.25 −0.2 −0.04 −0.1 −0.1 −0.02 −0.6 −0.15 −0.2 −0.1 −0.3 −0.1 −0.02 −0.05 −0.2 −0.1 −0.15 −0.05 −0.2 −0.1 −0.05 −0.35 −0.1 −0.3 −0.1 −0.15 −0.4 −0.1 −0.25 −0.2 −0.1 Journal of Classiﬁcation Points and INDIVIDUAL Equipotentials Points and JOINT Equipotentials ab 12 12 14 14 13 4 5 13 4 5 −0.1 −0.1 3 3 19 18 19 18 17 17 10 7 1 10 11 7 1 11 8 6 8 6 2 2 −2 0246 −2 0246 y y 1 1 19 active (solid circle), 0 deactivated (hollow circle) 19 active (solid circle), 0 deactivated (hollow circle) Fig. 2 Synthetic data set with individual (a) and joint equipotentials (b) Points and JOINT Equipotentials 13 4 5 19 18 1 10 11 7 8 6 −2 0246 12 active (solid circle), 7 deactivated (hollow circle) Fig. 3 Nineteen points with equipotentials, {2, 1, 16, 17, 18, 3, 4} deactivated (hollow circles) −0.3 −0.05 −0.05 −0.05 −0.05 −0.05 −0.1 −0.02 −0.3 −0.35 −0.06 −0.15 −0.15 −0.1 −0.15 −0.25 −0.5 −0.2 −0.5 −0.35 −0.05 −0.2 −0.25 −0.65 −0.2 −0.2 −0.2 −0.2 −0.25 −0.3 −0.25 −0.25 −0.25 −0.2 −0.2 −0.2 −0.02 −0.02 −0.2 −0.15 −0.04 −0.25 −0.3 −0.25 −0.25 −0.3 −0.06 −0.4 −0.1 −0.1 −0.25 −0.25 −0.04 −0.07 −0.25 −0.14 −0.22 −0.35 −0.2 −0.25 −0.01 −0.2 −0.15 −0.25 −0.35 −0.6 −0.6 −0.16 −0.12 −0.35 −0.06 −0.3 −0.2 −0.09 −0.05 −0.15 −0.01 −0.5 −0.18 −0.08 −0.03 −0.1 −0.2 −0.2 −0.6 −0.15 −0.15 −0.25 −0.15 −0.15 −0.1 −0.25 −0.25 −0.55 −0.15 −0.15 −0.35 −0.6 −0.15 −0.5 y2 −6 −4 −2 0 2 4 6 −6 −4 −2 0 2 4 6 −6 −4 −2 0 2 4 6 Journal of Classiﬁcation E.2 Nineteen Point Set: Curves and Dendrogram PROXIMITY CURVE FOR THE 19 NODES Start in ( −2 , −6 ) 2 1 16 17 18 3 4 5 14 12 15 13 19 10 11 8 6 7 9 Nodes Fig. 4 Nineteen points: proximity curve when starting in (− 2, − 6) Fig. 5 Fully developed dendrogram for clustering based on Fig. 4 (depth = 5) logarithm of grad(total potential) −7 −6 −5 −4 −3 −2 −1 Journal of Classiﬁcation Fig. 6 Nineteen points: total JOINT potential (cf Fig. 2) E.3 Pseudocodes Pseudocode E-1 Steepest Descent (equal steplength) Journal of Classiﬁcation Pseudocode E-2 Proximity Curve Pseudocode E- 3 Closest Active Node Journal of Classiﬁcation E.4 Nineteen Point Set: Tables and Matrices Table 2 Nineteen data points in the plane with parameters Point no. y y c c α Size Region 1 2 1 2 10.0 −2.0 1.0 1.0 0.0 1.0 South-west −π 20.0 −3.8 1.0 0.5 / 1.0 3 2.5 2.5 1.0 0.5 / 1.0 North-east 4 2.0 4.0 1.0 0.5 / 1.0 5 4.0 4.0 1.0 0.5 0.0 1.0 65.0 − 3.0 1.0 0.5 / 1.0 South-east 75.0 − 2.0 1.0 0.5 / 1.0 84.5 − 3.0 1.0 0.5 / 1.0 93.8 − 3.1 1.0 0.5 / 1.0 10 3.0 − 2.0 1.0 0.5 / 1.0 11 4.0 − 2.0 1.0 0.5 / 1.0 12 − 1.0 5.0 1.0 0.5 / 1.0 North-west 13 − 1.0 4.0 1.0 0.5 / 1.0 14 − 0.5 4.5 1.0 0.5 / 1.0 15 − 0.5 5.5 1.0 0.5 / 1.0 16 − 1.5 − 0.5 1.0 1.0 0.0 1.0 Elsewhere 17 − 0.5 0.5 3.0 0.2 / 1.0 18 0.5 1.5 1.0 1.0 0.0 1.0 19 − 0.5 1.5 3.0 0.2 / 0.5 Table 3 Dendrogram probability vector d k 12345678910 d 0.058 0.126 0.066 0.066 0.088 0.000 0.000 0.225 0.000 0.056 k 11 12 13 14 15 16 17 18 19 d 0.048 0.038 0.000 0.069 0.000 0.045 0.010 0.077 0.028 k Journal of Classiﬁcation E.4.1 Possible Sequences of Nodes Visited ⎛ ⎞ 12 10 11 8679 18 19 17 16 13 12 15 14453 ⎜ ⎟ 2 1 16 17 18 3 4 514121513191011 8 6 7 9 ⎜ ⎟ ⎜ ⎟ 3 4 5 18 17 16 1 2 8 9 11 7 6 10 19 14 12 15 13 ⎜ ⎟ ⎜ ⎟ 4 5 3 18 17 16 1 2 8 9 11 7 6 10 19 14 12 15 13 ⎜ ⎟ ⎜ ⎟ 5 4 18 19 17 16 1 2 8 9 11 7 6 10 3 14 12 15 13 ⎜ ⎟ ⎜ ⎟ −−−−−−−−−−−−−−−−−−− ⎜ ⎟ ⎜ ⎟ −−−−−−−−−−−−−−−−−−− ⎜ ⎟ ⎜ ⎟ 8 9 11 7 6 10 1 2 16 17 18 3 4 5 14 12 15 13 19 ⎜ ⎟ ⎜ ⎟ −−−−−−−−−−−−−−−−−−− ⎜ ⎟ ⎜ ⎟ S = 10 11 8 6 7 9 1 2161718 3 4 51412151319 ⎜ ⎟ ⎜ ⎟ 11 8 6 7 910 1 2161718 3 4 51412151319 ⎜ ⎟ ⎜ ⎟ 12 14 13 15 4 5 3 18 17 16 1 2 8 9 11 7 6 10 19 ⎜ ⎟ ⎜ ⎟ −−−−−−−−−−−−−−−−−−− ⎜ ⎟ ⎜ ⎟ 14 12 15 13 18 19 17 161289 11 76 10345 ⎜ ⎟ ⎜ ⎟ −−−−−−−−−−−−−−−−−−− ⎜ ⎟ ⎜ ⎟ 16 17 18 3 4 5 14 12 15 13 19 1 2 8 9 11 7 6 10 ⎜ ⎟ ⎜ ⎟ 17 19 18 3 4 5 15 14 13 12 16 1 2 8 9 11 7 6 10 ⎜ ⎟ ⎝ ⎠ 18 19 17 16 1 2 8 9 11 7 6 10 3 4 5 15 14 13 12 19 18 17 16 1 2 8 9 11 7 6 10 3 4 5 15 14 13 12 E.4.2 Gradient Logarithms ⎛ ⎞ − 3.84 − 4.56 − 1.42 − 0.85 − 1.95 − 1.27 − 2.45 − 6.45 − 2.46 − 3.58 − 2.95 − 5.56 − 0.85 − 1.05 − 1.76 −4.67 − 2.78 − 3.2 -Inf ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ − 3.39 − 4.07 − 2.12 − 2.64 − 3.5 − 3.27 − 3.89 − 6.33 − 0.88 − 1.18 − 2.68 − 7.72 − 7.14 − 1.41 − 0.84 −1.95 − 1.27 − 2.46 -Inf ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ − 2.71 − 3.13 − 3.98 − 2.14 − 2.69 − 4.84 − 3.66 − 4.42 − 1.91 − 1.74 − 3.46 − 1.91 − 4.97 − 8.41 − 4.33 − 0.88 − 1.18 − 2.68 -Inf ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ − 2.64 − 3.12 − 3.48 − 2.14 − 2.69 − 4.84 − 3.66 − 4.42 − 1.91 − 1.74 − 3.46 − 1.91 − 4.97 − 8.41 − 4.33 − 0.88 − 1.18 − 2.68 -Inf ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ − 3.02 − 3.23 − 2.31 − 3.44 − 2.76 − 3.79 − 3.7 − 4.45 − 1.91 − 1.74 − 3.47 − 1.91 − 4.95 − 6.98 − 5.83 − 0.88 − 1.18 − 2.68 -Inf ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ −−−−−−−−−−−−−−−−−− − ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ −−−−−−−−−−−−−−−−−− − ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ − 1.92 − 1.75 − 3.59 − 1.92 − 4.75 − 4.75 − 3.76 − 5.42 − 2.12 − 2.65 − 3.5 − 3.22 −3.92 − 5.78 − 0.88 − 1.18 − 2.68 − 8.43 -Inf ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ −−−−−−−−−−−−−−−−−− − ⎟ ⎜ ⎟ ⎜ ⎟ = ⎜ − 1.44 − 0.85 − 1.96 − 1.27 − 2.44 − 5.56 − 3.76 − 5.42 − 2.12 − 2.65 − 3.5 − 3.22 − 3.92 − 5.78 − 0.88 − 1.18 − 2.68 − 8.43 -Inf ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ − 1.35 − 1.96 − 1.24 − 2.39 − 3.19 − 4.75 − 3.76 − 5.42 − 2.12 − 2.65 − 3.5 − 3.22 − 3.92 − 5.78 − 0.88 − 1.18 − 2.68 − 8.43 -Inf ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ − 0.86 − 1.18 − 2.93 − 4.91 − 2.46 − 3.13 − 3.58 − 2.13 − 2.67 − 4.53 − 3.66 − 4.4 − 1.91 − 1.74 − 3.45 − 1.91 − 4.98 − 11.8 -Inf ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ −−−−−−−−−−−−−−−−−− − ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ − 0.92 − 1.19 − 2.58 − 4.17 − 2.42 − 3.1 − 2.8 −3.7 − 3.71 − 4.44 − 1.91 − 1.73 − 3.48 − 1.91 − 4.94 − 6.9 − 3.33 − 3.37 -Inf ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ −−−−−−−−−−−−−−−−−− − ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ − 2.16 − 2.76 − 3.61 − 3.28 − 3.88 − 6.35 − 0.88 − 1.18 − 2.67 − 7.05 − 5.57 − 3.66 − 4.39 − 1.91 − 1.74 − 3.45 −1.91 − 4.98 -Inf ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ − 5.07 − 2.44 − 3.96 − 3.31 − 3.48 − 6.67 − 0.86 − 1.05 − 1.8 − 7.1 − 3.71 − 3.66 − 4.39 − 1.91 − 1.74 −3.45 − 1.91 − 4.98 -Inf ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ − 2.42 − 3.45 − 2.82 − 3.82 − 3.72 − 4.46 − 1.91 − 1.73 − 3.48 − 1.91 − 4.93 − 6.71 − 3.25 − 3.45 − 6.15 − 0.86 − 1.05 − 1.79 -Inf ⎝ ⎠ − 2.15 − 2.43 − 2.82 − 3.82 − 3.72 − 4.46 − 1.91 − 1.73 − 3.48 − 1.91 − 4.93 − 6.71 − 3.25 − 3.45 − 6.15 − 0.86 − 1.05 − 1.79 -Inf Journal of Classiﬁcation E.5 Iris Accuracy Measures E.5.1 Combined Accuracy Plot Combined Accuracy Plot 0.9 0.8 Parameter assigment 0.7 theta1 theta2 0.6 theta3 0.5 theta4 0.4 1 2 3 4 5 6 7 8 9 1011121314 Dendrogram Depth Figure 7 Combined accuracy of θ , θ , θ , θ 1 2 3 4 E.5.2 Further Accuracy Plots Below, the accuracy plots for other assignments of Cauchy parameters are presented. Accuracy Journal of Classiﬁcation Journal of Classiﬁcation References Barabasi, ´ A.L. (2016). Network science, Cambridge University Press, Cambridge. Berkhin, P. (2006). A survey of clustering data mining techniques, (pp. 25–71). Berlin: Springer. Casti, J.L. (2002). The waves of life: the Elliott wave principle and the patterns of everyday events. Complexity, 7(6), 12–17. https://doi.org/10.1002/cplx.10051. Ecker, J.G., & Kupferschmid, M. (1988). Introduction to operations research. New York: Wiley. Feller, W. (1971). An introduction to probability theory and its applications, 2nd edn. Vol. 2. New York: Wiley. Fisher, R. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188. Fisher, R.A. (2011). UCI machine learning repository: Iris data set. http://archive.ics.uci.edu/ml/datasets/Iris. Henning, C. (2015). What are true clusters? Pattern Recognition Letters, 64, 53–62. https://doi.org/10.1016/j. patrec.2015.04.009. Jamalizadeh, A., & Balakrishnan, N. (2008). On a generalization of bivariate cauchy distribution. Communi- cations in Statistics – Theory and Methods, 37, 469–474. https://doi.org/10.1080/03610920701469160. Kip, A.F. (1962). Fundamentals of electricity and magnetism. New York: McGraw-Hill. Journal of Classiﬁcation Li, J., & Fu, H. (2011). Molecular dynamics-like data clustering approach. Pattern Recognition, 44, 1721– 1737. https://doi.org/10.1016/j.patcog.2011.01.00. Lichman, M. (2013). UCI machine learning repository. http://archive.ics.uci.edu/ml. Lu, Y., & Wan, Y. (2012). Clustering by sorting potential values (CSPV): a novel potential-based clustering method. Pattern Recognition, 45, 3512–3522. https://doi.org/10.1016/j.patcog.2012.02.035. Marlow, W.H. (1978). Mathematics for operations research. New York: Dover. Ramsey, A.S. (1959). An introduction to the theory of Newtonian attraction. Cambridge: Cambridge University Press. Rao, S.S. (1984). Optimization theory and applications, 2nd edn. New Delhi: Wiley Eastern. Shi, S., Yang, G., Zheng, D.W. (2002). Potential-based hierarchical clustering. In Proceedings of the 16th international conference on pattern recognition, 2002, (Vol. 4 pp. 272–275). Quebec: IEEE, https://doi.org/10.1109/ICPR.2002.1047449. Susskind, L., & Hrabovsky, G. (2013). The theoretical minimum: what you need to know to start doing physics. New York: Perseus Group Books. Wright, W.E. (1977a). A formalization of cluster analysis. Pattern Recognition, 5, 273–282. https://doi.org/ 10.1016/0031-3203(73)90048-4. Wright, W.E. (1977b). Gravitational clustering. Pattern Recognition, 9, 151–166. https://doi.org/10.1016/ 0031-3203(77)90013-9. Xu, D., & Tian, Y. (2015). A comprehensive survey of clustering algorithms. Annals of Data Science, 2(2), 165–193. https://doi.org/10.1007/s40745-015-0040-1. Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Journal of Classification – Springer Journals
Published: Dec 18, 2019
You can share this free article with as many people as you like with the url below! We hope you enjoy this feature!
Read and print from thousands of top scholarly journals.
Already have an account? Log in
Bookmark this article. You can see your Bookmarks on your DeepDyve Library.
To save an article, log in first, or sign up for a DeepDyve account if you don’t already have one.
Copy and paste the desired citation format or use the link below to download a file formatted for EndNote
Access the full text.
Sign up today, get DeepDyve free for 14 days.
All DeepDyve websites use cookies to improve your online experience. They were placed on your computer when you launched this website. You can change your cookie settings through your browser.