Access the full text.
Sign up today, get DeepDyve free for 14 days.
D. Levy, R. Yoshida, L. Pachter (2006)
Beyond pairwise distances: neighbor-joining with phylogenetic diversity estimates.Molecular biology and evolution, 23 3
K. Huber, G. Scholz (2016)
Beyond Representing Orthology Relations by TreesAlgorithmica, 80
Marc Hellmuth, Nicolas Wieseke, Marcus Lechner, Hans-Peter Lenhof, M. Middendorf, P. Stadler (2015)
Phylogenomics with paralogsProceedings of the National Academy of Sciences, 112
R. Agarwala, V. Bafna, Martín Farach-Colton, M. Paterson, M. Thorup (1996)
On the approximability of numerical taxonomy (fitting distances by tree metrics)SIAM J. Comput., 28
Sven Herrmann, K. Huber, V. Moulton, A. Spillner (2012)
Recognizing Treelike k-DissimilaritiesJournal of Classification, 29
(P2) For all pairwise distinct x, y, z ∈ X , δ ( x, y, z ) contains at most two distinct elements
Chikio Hayashi (1972)
Two dimensional quantification based on the measure of dissimilarity among three elementsAnnals of the Institute of Statistical Mathematics, 24
S. Joly, G. Calvé (1995)
Three-way distancesJournal of Classification, 12
A AHO (1981)
1073SIAM Journal of Computing, 28
V CHEPOI, B FICHET (2007)
Selected Contributions in Data Analysis and Classification
M WARRENS (2010)
11n-Way MetricsJournal of Classification, 27
V. Chepoi, B. Fichet (2007)
A Note on Three-Way Dissimilarities and Their Relationship with Two-Way Dissimilarities
V. Gurvich (2009)
Decomposing complete edge-chromatic graphs and hypergraphs. RevisitedDiscret. Appl. Math., 157
(2001)
Perfect Graphs and Graph Entropy, An Updated Survey
M LAFOND, N EL-MABROUK (2015)
Orthology Relation and Gene Tree Correction: Complexity Results WABI 2015Algorithms in Bioinformatics, 9289
S. Grünewald, Yangjing Long, Yaokun Wu (2017)
Reconstructing Unrooted Phylogenetic Trees from Symbolic Ternary MetricsBulletin of Mathematical Biology, 80
M. Deza, I. Rosenberg (2000)
n-SemimetricsEur. J. Comb., 21
Manuel Lafond, N. El-Mabrouk (2015)
Orthology Relation and Gene Tree Correction: Complexity Results
V. Gurvich (2007)
Decomposing complete edge-chromatic graphs and hypergraphs
Sebastian Böcker, A. Dress (1998)
Recovering Symbolically Dated, Rooted Trees from Symbolic UltrametricsAdvances in Mathematics, 138
Marc Hellmuth, M. Hernández-Rosales, K. Huber, V. Moulton, P. Stadler, Nicolas Wieseke (2012)
Orthology relations, symbolic ultrametrics, and cographsJournal of Mathematical Biology, 66
H. Bandelt, P. Forster, Brian Sykes, M. Richards (1995)
Mitochondrial portraits of human populations using median networks.Genetics, 141 2
V GURVICH (1984)
Some Properties and Applications of Complete Edge-Chromatic Graphs and HypergraphsSoviet Mathematics Doklady, 30
A. Aho, Y. Sagiv, T. Szymanski, J. Ullman (1981)
Inferring a Tree from Lowest Common Ancestors with an Application to the Optimization of Relational ExpressionsSIAM J. Comput., 10
Matthijs Warrens (2010)
n-Way MetricsJournal of Classification, 27
W. Heiser, Mohammed Bennani (1997)
Triadic Distance Models: Axiomatization and Least Squares RepresentationJournal of mathematical psychology, 41 2
(2003)
Phylogenetics, Oxford Lecture Series in Mathematics and its Applications
Journal of Classiﬁcation 36: 513-540 (2019) DOI: 10.1007/s00357-018-9274-x Katharina T. Huber University of East Anglia, UK Vincent Moulton University of East Anglia, UK Guillaume E. Scholz University of East Anglia, UK Abstract: Three-way dissimilarities are a generalization of (two-way) dis- similarities which can be used to indicate the lack of homogeneity or resem- blance between any three objects. Such maps have applications in cluster analysis and have been used in areas such as psychology and phylogenetics, where three-way data tables can arise. Special examples of such dissimi- larities are three-way tree-metrics and ultrametrics, which arise from leaf- labelled trees with edges labelled by positive real numbers. Here we consider three-way maps which arise from leaf-labelled trees where instead the inte- rior vertices are labelled by an arbitrary set of values. For unrooted trees, we call such maps three-way symbolic tree-maps; for rooted trees, we call them three-way symbolic ultrametrics since they can be considered as a gen- eralization of the (two-way) symbolic ultrametrics of Böcker and Dress. We show that, as with two- and three-way tree-metrics and ultrametrics, three- way symbolic tree-maps and ultrametrics can be characterized via certain k-point conditions. In the unrooted case, our characterization is mathemat- ically equivalent to one presented by Gurvich for a certain class of edge- labelled hypergraphs. We also show that it can be decided whether or not an arbitrary three-way symbolic map is a tree-map or a symbolic ultrametric using a triplet-based approach that relies on the so-called BUILD algorithm The authors thank The Biomathematics Research Centre, University of Canterbury, New Zealand, where some of this work was undertaken. KTH and VM also thank the London Mathematical Scociety for supporting their visit to that centre. In addition, the authors would like to thank Vladimir Gurvich and Stefan Grünewald for making them aware of their work, and also the anonymous reviewers for their helpful feedback. Corresponding Author’s Address: K. Huber, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK, email: K.Huber@uea.ac.uk. Published online: 31 October 2018 514 K.T. Huber, V. Moulton, and G.E. Scholz for deciding when a set of 3-leaved trees or triplets can be displayed by a single tree. We envisage that our results will be useful in developing new approaches and algorithms for understanding 3-way data, especially within the area of phylogenetics. Keywords: Three-way dissimilarity; Three-way symbolic map; Symbolic ultrametric; Ultrametric; Tree-metric; Phylogenetic tree. 1. Introduction Three-way dissimilarities are a generalization of (two-way) dis- similarities, which can be used to indicate the lack of homogeneity or resemblance between any three objects in a given set (Joly and Le Calvé, 1995). They have applications in areas such as psychology (Heiser and Bennani, 1997) and phylogenetics (Levy, Yoshida, and Pachter, 2006), where they have been used to cluster data presented in the form of three-way data tables. Various special classes of three-way dissimilari- ties have been introduced (see e.g. Chepoi and Fichet, 2007; Hayashi, 1972; Heiser and Bennani, 1997; Joly and Le Calvé, 1995). These in- clude three-way dissimilarities that arise from leaf-labelled trees, where the edges are weighted by positive real numbers. These so-called three- way tree-metrics and three-way ultrametrics, which arise from unrooted and rooted trees, respectively, generalize their much studied two-way counterparts (cf. Chepoi and Fichet, 2007, for an overview). Intriguingly, in Böcker and Dress (1998), Böcker and Dress showed that the concept of ultrametricity for dissimilarities can be naturally ex- tended to include two-way symmetric maps whose range is an arbitrary set of symbols. In particular, they introduced the concept of a sym- bolic ultrametric (a two-way map arising from a rooted, vertex-labelled tree via the least common ancestor map), and characterized them in terms of a 3- and a 4-point condition (see Section 2 for full details), a result which had in fact been discovered independently in another guise by V. Gurvich (1984) (see Section 2 for details). These conditions generalize the well-known 3-point condition for ultrametricity (cf. e.g. Semple and Steel, 2003, Chapter 7.2). Symbolic ultrametrics have been found to have interesting connections with cograph theory (Hellmuth et al., 2013), game theory (Gurvich, 1984; Gurvich, 2009), as well as applications within phylogenetics (Hellmuth et al., 2015; Lafond and El-Mabrouk, 2015). Therefore, it is of interest to understand how the theory of symbolic ultrametrics can be extended to three-way maps, as these may lead to useful new applications in these areas (e.g. see the last section for a potential application in phylogenetics). Symbolic Tree-Maps and Ultrametrics 515 25 B BA A 1 B 4 3 12345 (a)(b) Figure 1. Two trees which give rise to (a) a three-way symbolic tree-map and (b) a three-way symbolic ultrametric. In this paper, we shall address this question. Let X be a set (of taxa)ofsizeatleast 3 andlet M be a set (of symbols)of size atleast2. A (three-way) symbolic map is a map δ : → M . For example, consider the unrooted tree in Figure 1(a) with leaf-set X = {1, 2, 3, 4, 5} whose interior vertices are labelled by elements from the set M = {A, B}. This tree gives rise naturally to a three-way symbolic map from X to the set M; toeach tripleofleaves weassign theelement of M which labels the vertex lying on all shortest paths between any two of these three leaves (e.g., the triple {1, 3, 5} is assigned the symbol A). We call symbolic maps that arise in this way three-way symbolic tree-maps. In Section 3, we show that a three-way symbolic tree-map uniquely determines its underlying labelled tree (Proposition 3.2), and also give a 4- and 5-point characterization for such maps (see Theorem 3.3). This result is mathematically equivalent to Gurvich (1984, Theorem 5), but for completeness we provide its proof. Our characterization for three-way symbolic tree-maps is analogous to the well-known 4-point condition for tree-metrics (cf. Semple and Steel, 2003, Chapter 7.1), and also generalizes the conditions presented in Herrmann, Huber, and Moulton (2012, Theorem 7) for determining when a three-way dissimi- larity arises from a tree. To prove Theorem 3.3 we introduce a symbolic variant of the Farris transform (Semple and Steel, 2003, p. 149), which allows us to apply the main result from Böcker and Dress, 1998). We conclude the section with a description of how our result is related to the ones presented in Gurvich (1984). In Section 4, we turn our attention to obtaining three-way sym- bolic maps from rooted trees. Consider the rooted tree in Figure 1(b). A symbolic ultrametric can be associated to this tree by deﬁning the 516 K.T. Huber, V. Moulton, and G.E. Scholz value for each pair of leaves to be the symbol labelling the least com- mon ancestor vertex of these two leaves. Therefore, a natural way to deﬁne a three-way symbolic ultrametric could be to take the value of each triple of leaves to be the set consisting of the symbols labelling the least common ancestor of all pairs of leaves in the triple (for example, we would assign the set {A, B} to the triple 1, 2, 5). However, this does not suﬃce to capture the tree (see Section 4). Even so, as we shall see, if we consider the values of the triples to be multisets instead of sets (for example, we would assign the multiset {A, A, B} to the triple 1, 2, 5 in Figure 1(b)), then we can in fact recover the underlying labelled tree in case |X|≥ 5 (Theorem 4.4). We call maps obtained in this way three-way symbolic ultrametrics. In Section 5, we give 3-, 4- and 5-point conditions which ensure that a three-way symbolic map that maps into the set of size 3 multisets of a set of symbols is a symbolic ultrametric. This is somewhat surprising since for three-way dissimilarities, a 6-point condition is required to ensure that they can be represented by a rooted tree in an analogous way (cf. Herrmann et al., 2012, Theorem 7). We conclude the paper by considering an alternative approach for deciding whether or not a three-way symbolic map is a tree-map or symbolic ultrametric. This approach is based on the BUILD algorithm (Aho et al., 1981), which can be used to decide when a set of triplets (i. e. resolved rooted leaf-labelled trees each with three leaves) is displayed by some supertree or not. Applying this algorithm to three-way symbolic maps has the advantage that only sets of size three (as opposed to sets of size up to ﬁve) need to be considered so as to determine if a three- way symbolic map is a tree-map or a symbolic ultrametric. This could potentially lead to practical algorithms for performing this task. In Section 7, we present some future directions. 2. Preliminaries For a set {x ,... ,x }, k ≥ 1, in the powerset P (X)of X and a 1 k map δ : P (X ) → M , we will write δ(x ,...,x ) instead of δ({x ,...,x }). 1 k 1 k A symbolic ultrametric (Böcker and Dress, 1998) is a 2-way sym- bolic map D : → M satisfying: (U1) For all three distinct elements x, y, z ∈ X,at least two of the three values D(x, y), D(y, z)and D(x, z) are the same. (U2) There exists no four pairwise distinct elements x, y, z, u ∈ X such that D(x, y)= D(y, z)= D(z, u) = D(z, x)= D(x, u)= D(u, y). Symbolic Tree-Maps and Ultrametrics 517 Suppose that T is a tree. Then we denote by L(T)the setof leaves of T and by V (T):= V (T ) − L(T)the set of internal vertices of T.If T is rooted then we denote by ρ the root of T.Moreover, for any two distinct leaves x and y in T , we deﬁne the least common ancestor lca (x, y)of x and y in T to be the last vertex in T that lies on both of the paths which start at ρ and end in x and in y.If lca (x, y)is T T adjacent with both x and y then we call the set {x, y} a cherry of T . We also say that vertex v in T lies below a vertex w = v in T if w lies on the path from the root of T to v. A (rooted/unrooted) phylogenetic tree T on X is a (rooted/un- rooted) tree with leaf-set X that does not contain vertices of degree two in case T is unrooted and no vertex with indegree and outdegree one in case T is rooted. Note that we will only use the terms rooted or unrooted in case it is not clear from the context which type of tree we are considering. Two phylogenetic trees T and T on X are isomorphic if there exists a bijection V (T ) → V (T ) that induces a graph isomor- phism between T and T that is the identity on X (i.e. the map which takes every element in X to itself). In case T is a rooted phylogenetic tree on X,and Y is a subset of X with size at least two, we let T denote the phylogenetic tree spanned by Y (obtained by suppressing vertices with indegree and outdegree one), and say that T is induced by Y . A labelled (rooted/unrooted) tree T on X is a pair (T, t), where T is a (rooted/unrooted) phylogenetic tree on X,and t is a labelling map on M , that is, a map from the internal vertices of T to a set M of symbols. If t(u) = t(v) for every u = v contained in the same edge of T,we say that T is discriminating. A labelled rooted tree T =(T, t) on X is a representation of a (two-way) symbolic map D : → M (or T represents D) if for all distinct x, y ∈ X,we have D(x, y)= t(lca (x, y)). Theorem 2.1 [Böcker and Dress, 1998] Let D : → M be a symbolic map. There exists a discriminating labelled rooted tree T that represents D if and only if D is a symbolic ultrametric. If this holds, then such a tree is necessarily unique. Interestingly, Theorem 2.1 appeared in a diﬀerent guise in Gur- vich (1984) (see also Gurvich, 2000, for more details) in the context of game theory. Within this context, the leaves of the tree T are seen as end of game situations, the label set M corresponds to a set of play- ers, and a directed path from the root of T to a leaf is a sequence of plays. As we will come back to this correspondence in Section 3, we 518 K.T. Huber, V. Moulton, and G.E. Scholz x yz Δ: Π: y zx u (i) (ii) Figure 2. (i) An edge-colored graph Δ on X = {x, y, z} adapted from Gurvich (1984, Figure 1). (ii) An edge-colored graph Π on X = {x, y, z, u} adapted from the same ﬁgure. Colors are represented in terms of diﬀerent edge styles (plain, dashed and dotted). now review some relevant terminology and results presented in Gurvich (1984). Suppose that H is an edge-labelled graph on X , that is a graph with vertex set V (H)= X and edge set E(H)= , equipped with amap D : E(H ) → M for a given, nonempty set M.Then D is a (two-way) symbolic map on X , and any symbolic map D : → M can trivially be seen as an edge-labelled graph (H, D)on X . An edge-labelled graph (H, D)on X is said to be linked if for all m ∈ M , the graph H obtained from H by removing all edges e ∈ E(H)forwhich D(e)= m holds is connected. For example, both graphs Δ and Π depicted in Figure 2 are linked. If (H, D)does not contain any linked subgraph, it is said to be separated. The following result from Gurvich (1984) links the property for an edge labelled graph to be separated with the representability of the symbolic map it induces. For this, it relies on the equivalence between the following three statements for a symbolic map D : → M (see Gurvich, 1984, Theorem 2 for the equivalence between (ii) and (iii), and Gurvich, 1984, Theorem 4) for the equivalence between (i) and (ii), where a discriminating labelled rooted tree is called a positional structure,or PS for short): (i) There exists a (unique) discriminating labelled rooted tree T that represents D. (ii) The edge-labelled graph (H, D) is separated. (iii) The edge-labelled graph (H, D) does not contain any subgraph isomorphic to Δ or Π (depicted in Figure 2). As mentioned above, this result, and in particular the equivalence between conditions (i) and (iii), provides a direct equivalent to Theo- Symbolic Tree-Maps and Ultrametrics 519 rem 2.1. Indeed, as is easy to see a symbolic map D : X → M satisﬁes (U1) (resp. (U2)) if and only if the edge-labelled graph (H, D)does not contain a subgraph isomorphic to Δ (resp. Π), implying that condition (iii) and the property of being a symbolic ultrametric are equivalent. 3. Three-Way Symbolic Tree-Maps We begin by considering three-way symbolic maps that arise from labelled unrooted trees. Such a tree T =(T, t) clearly gives rise to a three-way symbolic map δ : → M by putting, for all x, y, z ∈ X , δ (x, y, z)= t(med (x, y, z)), where med (x, y, z) denotes the median T T T vertex of x, y, z in T (that is, the unique vertex lying on the paths from x to y,from x to z and from y to z, respectively). If for a three-way symbolic map δ : → M , there exists a labelled unrooted tree T such that δ = δ ,we say that δ is a three-way symbolic tree-map (on X),and that T is a representation of δ (or T represents δ). We now characterize such maps. To do this, we deﬁne a symbolic Farris transform, the deﬁnition of which is adapted from the well-known Farris transform (Gurvich, 2009, p. 149) as follows. Suppose T =(T, t) is a labelled unrooted tree on X where |X|≥ 4. Put δ = δ . Pick a leaf r ∈ X , and deﬁne a rooted phylogenetic tree T on X −{r} as follows: direct all edges of T away from r,and remove r and its outgoing edge. This induces a bijection ψ from the set of internal vertices of T to the set of internal vertices of T . Hence the map t : V (T ) → M which takes any internal vertex v of T to M r r r −1 given by t (v)= t(ψ (v)) is well-deﬁned, and the pair T =(T ,t )is r r r r a labelled rooted tree. Now, suppose that δ is the three-way symbolic tree-map that is represented by T ,and that D is the symbolic ultrametric on X that is represented by T . Lemma 3.1 For all x, y ∈ X −{r} with |X|≥ 4, we have D (x, y)= δ(x, y, r). Proof. It suﬃces to note that via the symbolic Farris transform, the median vertex of x, y and r in T becomes the least common ancestor of x and y in T . Denoting the latter by v,we then have D (x, y)= r r −1 t (v)= t(ψ (v)) = t(med (x, y, r)) = δ(x, y, r). r T Motivated by this observation, for a three-way symbolic map δ : → M and some r ∈ X where |X|≥ 4, we deﬁne the map X −{r} δ : → M ; δ (x, y)= δ(x, y, r), r r 2 520 K.T. Huber, V. Moulton, and G.E. Scholz for all x, y ∈ X distinct (which can be considered as a symbolic analogue of the Farris transform as deﬁned in Gurvich (2009, p. 149). Using Lemma 3.1, we can now prove a uniqueness result. Proposition 3.2 Let δ : → M be a three-way symbolic tree-map where |X|≥ 4. There exists a unique discriminating labelled unrooted tree T that represents δ. X −{r} Proof. Let r ∈ X and consider the map δ : → M.By Lemma 3.1, δ is a symbolic ultrametric, and thus, admits a unique discriminating representation T . Moreover, this representation is ob- tained from a representation of δ, using the symbolic Farris transform. This operation is clearly invertible, and preserves the property of being discriminating. Thus, the labelled unrooted tree T obtained from T by inverting the symbolic Farris transform is necessarily the only dis- criminating representation of δ. We now characterize three-way symbolic tree maps. As we shall explain below, an equivalent characterization appears in Gurvich (1984) in the guise of Theorem 5 of that paper. For the sake of completeness, we present a proof within our framework. Subsequent to this, we explain how the approach in Gurvich (1984) relates to ours. Theorem 3.3 Suppose that |X|≥ 4 and that δ : → M is a three- way symbolic map. Then δ is a three-way symbolic tree-map if and only if δ satisﬁes the following two conditions: (M1) For all {x, y, z, u}∈ ,either δ(x, y, z)= δ(x, y, u)= δ(x, z, u)= δ(y, z, u) or two of these four are equal and so are the remaining two. (M2) There does not exist {x, y, z, u, v}∈ such that δ(v, x, y)= δ(v, y, z)= δ(v, z, u) = δ(v, z, x)= δ(v, x, u)= δ(v, u, y). In order to prove Theorem 3.3, we start with a useful lemma. Lemma 3.4 Suppose that |X|≥ 4 and that δ : → M is a three-way symbolic map satisfying (M1) and (M2). Then for all r ∈ X,the map δ is a symbolic ultrametric. Proof. Let r ∈ X . We need to show that δ satisﬁes properties (U1) and (U2). Symbolic Tree-Maps and Ultrametrics 521 To see that δ satisﬁes (U1), consider three elements x, y, z ∈ X −{r}.Since δ satisﬁes (M1) the set {δ(r, x, y),δ(r, x, z),δ(r, y, z)} contains at most two distinct elements. As this set is precisely the set {δ (x, y),δ (x, z),δ (y, z)},(U1)follows. r r r To see that (U2) holds, assume for contradiction that there exist four pairwise distinct elements x, y, z, u ∈ X −{r} such that δ (x, y)= δ (y, z)= δ (z, u) = δ (z, x)= δ (x, u)= δ (u, y). This implies r r r r r δ(r, x, y)= δ(r, y, z)= δ(r, z, u) = δ(r, z, x)= δ(r, x, u)= δ(r, u, y), which is impossible in view of (M2). Note that the converse of the Lemma 3.4 is not true in general. Consider for example the sets X = {1,... ,n}, n ≥ 4, M = {A, B},and the map δ : → M deﬁned for x, y, z ∈ X by putting δ(x, y, z)= A if 1 ∈{x, y, z} and δ(x, y, z)= B otherwise. Clearly, δ does not satisfy (M1), as we have δ(1, 2, 3) = δ(1, 2, 4) = δ(1, 3, 4) = δ(2, 3, 4). However, we have δ (x, y)= A for all x, y ∈ X −{1}, which is clearly a symbolic ultrametric. In fact, for any 2 ≤ k ≤ n we have δ (x, y)= A if 1 ∈{x, y} and δ (x, y)= B otherwise and, so, δ is also a symbolic k k ultrametric on X −{k}. Armed with Lemma 3.4, we can now prove Theorem 3.3. Proof. Assume ﬁrst that δ is a three-way symbolic tree-map, and denote by T =(T, t) its representation. To see that δ satisﬁes (M1), consider four pairwise distinct elements x, y, z, u ∈ X . Two cases may occur. If med (x, y, z)= med (x, y, u)= med (x, z, u)= med (y, z, u), it T T T T follows immediately that δ(x, y, z)= δ(x, y, u)= δ(x, z, u)= δ(y, z, u). Otherwise, there exists two pairs, say {x, y} and {z, u}, such that the path between x and y and the path between z and u are disjoint. In this case, we have med (x, y, z)= med (x, y, u) = med (x, z, u)= T T T med (y, z, u). If t(med (x, y, z)) = t(med (x, z, u)), it follows that T T T δ(x, y, z)= δ(x, y, u)= δ(x, z, u)= δ(y, z, u). Otherwise, we have δ(x, y, z)= δ(x, y, u) = δ(x, z, u)= δ(y, z, u). Thus, δ satisﬁes (M1). If |X | = 4 then it is straight forward to check that the theorem holds. So assume that |X|≥ 5. To see that δ satisﬁes (M2), assume for contradiction that there exist pairwise distinct x, y, z, u, v ∈ X such that δ(v, x, y)= δ(v, y, z)= δ(v, z, u) = δ(v, z, x)= δ(v, x, u)= δ(v, u, y). We can apply the symbolic Farris transform to T and v, thus obtaining a labelled rooted tree T . By Lemma 3.1, T is a representation of v v δ , implying that δ is a symbolic ultrametric. But, by deﬁnition, δ v v v satisﬁes δ (x, y)= δ (y, z)= δ (z, u) = δ (z, x)= δ (x, u)= δ (u, y), v v v v v v which contradicts (U2). 522 K.T. Huber, V. Moulton, and G.E. Scholz Conversely, assume that δ satisﬁes Properties (M1) and (M2), and let r ∈ X . By Lemma 3.4, the map δ is a symbolic ultrametric. Thus, there exists a labelled rooted tree T =(T ,t )on X −{r} representing r r r δ . Consider the labelled unrooted tree T =(T, t)on X deﬁned as follows. First, add a new vertex r to T and the edge {ρ ,r}.Then r T consider all edges in the resulting tree to be undirected. Let t : V (T ) → M denote the map given by t(v)= t (v), for all v ∈ V (T ). We claim that for all {x, y, z}∈ ,we have δ(x, y, z)= t(med (x, y, z)), that is, T is a representation of δ. To prove this, it suﬃces to consider two cases. Suppose {x, y, z}∈ . Case (a): {x, y, z}⊆ X −{r}. Without loss of generality, δ (x, z)= δ (y, z)= t (u)and δ (x, y)= t (v), where u and v are vertices of T , r r r r r and v is below or equal to u in T .In this case, t(med (x, y, z)) equals r T t (v). By (M1) and since δ(x, z, r)= δ(y, z, r), we have δ(x, y, z)= δ(x, y, r)= t (v)= t(med (x, y, z)). Thus, T is a representation of δ r T in this case. Case (b): r ∈{x, y, z},say r = z. If wedenoteby v the least com- mon ancestor of x and y in T ,then t(med (x, y, z)) = t (v). Hence r T r δ(x, y, r)= δ (x, y)= t (v)= t(med (x, y, r)). Thus, T is a represen- r r T tation of δ in this case, too. We next elaborate on the relationship between Theorem 3.3 and Theorem 5 in Gurvich (1984). As mentioned in Section 2, a symbolic two-way map D : → M can be seen as an edge-labelled graph (H, D). Similarily, a symbolic three-way map δ : → M can be seen as an edge-labelled 3-hypergraph (H,δ), where by 3-hypergraph, we mean that the edges of H are sets of three vertices (instead of two for graphs). Within this context, the vertex set of a 3-hypergraph H associated to a 3-way map δ is X , as in the case of two-way maps, and theedgesetof H is . Gurvich (1984) uses as a starting point for his characterization the equivalence, presented in Section 2, between symbolic 2-way maps that can be represented by a rooted tree and edge-labelled graphs that do not contain any subgraph isomorphic to the graphs Δ or Π (see Figure 2). The idea underlylng Gurvich (1984, Theorem 5) is the following. From an edge-labelled 3-hypergraph (H,δ)on X , we can pick an element r ∈ X and consider the edge-labelled graph (H, δ )on X −{r}.It is then possible to highlight three edge-labelled 3-hypergraph δ ,δ ,δ with four 2 3 4 vertices, that get transformed into edge-labelled graphs isomorphic to Δ via that operation, and one edge-labelled 3-hypergraph π with ﬁve Symbolic Tree-Maps and Ultrametrics 523 vertices, that gets transformed into an edge-labelled graph isomorphic to Π. The equivalence between the representability of δ by a labelled unrooted tree and the representability of δ by a labelled rooted tree for all r ∈ X then leads to the conclusion that an edge-labelled 3- hypergraph (H,δ) is representable if and only if it does not contain a sub(hyper)graph isomorphic to any of δ ,δ ,δ and π. 2 3 4 As it turns out, (H,δ) contains a sub(hyper)graph isomorphic to one of δ ,δ ,δ (resp. to π) if and only if δ does not satisfy (M1) (resp. 2 3 4 (M2)). This implies that Theorem 3.3 and Theorem 5 in Gurvich (1984) are equivalent. Finally, note that a similar result also appears in Grunewald, Long, and Wu (2017). However, the arguments used by Grunewald et al. (2017) do not rely on the projection of a three-way map to a two- way map and of an unrooted tree to a rooted tree, as is the case both here and in Gurvich (1984). 4. Three-Way Symbolic Ultrametrics In the last section, we considered the problem of deciding when a three-way symbolic map arises from a labelled unrooted tree. In this section, we start to consider this problem for their rooted counterparts. In particular, after deﬁning the concept of a three-way symbolic ultra- metric, we shall show that to determine whether or not a three-way symbolic map is a symbolic ultrametric, it suﬃces to consider its re- striction to sets of size ﬁve. We begin by considering how to deﬁne a three-way symbolic ul- trametric. If we consider 3 distinct leaves x, y, z of a rooted phylogenetic tree T on X , then we can clearly identify two internal vertices of the tree given by the set {lca (x, y), lca (x, z), lca (y, z)} (in contrast to T T T unrooted phylogenetic trees where we can identify only one, namely the median of the 3 leaves). A natural approach to obtain a three-way sym- bolic map δ from a labelled rooted tree T =(T, t) might therefore be to take δ(x, y, z)to be the set {t(lca (x, y)),t(lca (x, z)),t(lca (y, z))}, T T T for x, y, z ∈ X distinct. However, as can be seen in Figure 3, such a map does not necessarily uniquely capture T . For this reason, we shall consider instead maps to multisets. To formalize this, let M = M denote the set of multisets {a, b, c} with a, b, c ∈ M . As it will be useful later on, we shall also sometimes denote an element in M as a sum. So, for example, for the element {a, a, b}∈ M with a, b ∈ M , we sometimes also write 2a + b. Now, given a labelled rooted tree T =(T, t)on X , we deﬁne the three-way symbolic map 524 K.T. Huber, V. Moulton, and G.E. Scholz B A B B 12 3 4 5 124 3 5 Figure 3. Two labelled rooted trees on X = {1, 2, 3, 4, 5} with labelling maps t and t on M = {A, B}, respectively, for which the sets {t(lca (x, y)),t(lca (x, z)), T T t(lca (y, z))} and {t (lca (x, y)),t (lca (x, z)),t (lca (y, z))} coincide, for any T T T T three elements x, y, z ∈ X distinct. δ : →M by putting δ (x, y, z)= {t(lca (x, y)),t(lca (x, z)),t(lca (y, z))}. T T T T for all distinct x, y, z ∈ X . If for a three-way symbolic map δ : →M there exists a labelled rooted tree T =(T, t)on X such that δ = δ ,then we call δ a three-way symbolic ultrametric (on X).Thus, intuitively, δ is a three-way symbolic ultrametric if it can be represented by labelling a rooted tree on X in such a way that, for every 3-subset {x, y, z} of X , δ(x, y, z) is the multiset consisting of the labels of the least common ancestors for all pairs of elements in {x, y, z}. In addition, we say that T is a representation for δ (or that T represents δ). We say that T is discriminating if t(u) = t(v), for every u = v contained in the same edge in T . Note that we can think of δ as a symbolic analogue of a three-way perimeter map which arises from a weighted tree T by taking, for any three leaves of T , the length of subtree spanned by the those leaves (see e.g. Chepoi and Fichet, 2007). Also, note that by (U1), δ must satisfy the following property: Lemma 4.1 Let δ : →M be a three-way symbolic ultrametric. Then, for any three distinct elements x, y, z ∈ X, the number of distinct elements in the multiset δ(x, y, z) is at most two. We now turn our attention to showing that we can determine whether or not a three-way symbolic map δ : →M is a symbolic ultrametric by restricting δ to subsets of X with size ﬁve. To do this, we ﬁrst need to introduce some additional notation. For a subset Y of X of size four or more, let δ| denote the restriction of δ to ,that Y X is, the map obtained by restricting the map δ to the subset of . 3 3 Note that if δ is a three-way symbolic ultrametric, then δ| is a three- way symbolic ultrametric for all subsets Y ⊆ X with |Y |≥ 4. Indeed, Symbolic Tree-Maps and Ultrametrics 525 T : AAAA T : T : T : 1 2 3 4 B BBB C 1 2 3 4 1 2 3 4 12 34 12 34 T : AAA T : T : 5 6 7 B B BA C 12 3 41 2 341 2 3 4 Figure 4. All possible discriminating labelled rooted trees T ,1 ≤ i ≤ 7, on {1, 2, 3, 4}, up to a relabelling of the leaves. Table 1. For 1 ≤ i ≤ 7and M = {A, B, C}, the values of the map δ represented by the labelled rooted trees T in Figure 4. The trees T are given in terms of their i i index i in the top row. i 1 2 3 4 5 6 7 δ (1, 2, 3) 3A 2A+B 2A+B 2A+B 3B A+2B 2B+C δ (1, 2, 4) 3A 2A+B 2A+B 2A+B 2A+B 3A 2A+C δ (1, 3, 4) 3A 3A 2A+B 2A+C 2A+B 2A+B 2A+B δ (2, 3, 4) 3A 3A 2A+B 2A+C 2A+B 2A+B 2A+B if T is a representation of δ, then the subtree T of T induced by Y is a representation of δ| . Furthermore, we obtain a discriminating representation of δ| by collapsing all edges of T both of whose end Y Y vertices have the same label. We now consider symbolic ultrametrics on a set of size four. For M = {A, B, C }, in Figure4, wepicture all possiblediscriminating labelled rooted trees T ,1 ≤ i ≤ 7, on {1, 2, 3, 4} and in Table 1, we list for all 1 ≤ i ≤ 7 the values of the map δ : →M that is represented ˆ ˆ by T . As we can see from this table, all of the maps δ except for δ i i 3 capture T (in the sense that T is the unique labelled rooted tree on i i {1, 2, 3, 4} that represents δ ). Now, for Y ⊆ X of size four, we say that δ| is of type δ , i ∈{1,..., 7} if there exists a bijection between Y and Y i {1, 2, 3, 4} that induces a bijection between the image of δ| and the ˆ ˆ image of δ such that δ| and δ coincide up to these bijections. Since i Y i Table 1 is exhaustive, we have: 526 K.T. Huber, V. Moulton, and G.E. Scholz Proposition 4.2 Suppose that |X|≥ 4, that δ : →M is a three- way symbolic map, and that Y ⊆ X is a subset of size four. Then δ| : →M is a three-way symbolic ultrametric on Y if and only if there exists some i ∈{1,... , 7} such that δ| is of type δ . Moreover, Y i if i =3, the representation of δ| is unique. We now turn our attention to symbolic ultrametrics on sets of size ﬁve. In the last result, we have seen that a three-way symbolic ultrametric on a set of size 4 may have more than one representation by a labelled tree. However, as we shall now show this can not happen for sets of size ﬁve. Lemma 4.3 If Y is a set of size ﬁve and δ : →M is a three-way symbolic ultrametric on Y ,then δ has a unique discriminating repre- sentation. Proof. Suppose that δ is a three-way symbolic ultrametric on Y ,and that T is a discriminating representation of δ.Let D = D : → M be the symbolic ultrametric represented by T . By Theorem 2.1, it suﬃces to show that if δ is also represented by a labelled tree T ,then D = D . T T Since δ is a three-way symbolic ultrametric on Y , there exists a subset Y of Y with |Y | =4such that δ| is not of type δ .Thus, 0 0 Y 3 by Proposition 4.2, D | = D | . Hence, D (x ,x)= D (x ,x) T Y T Y T 0 T 0 0 0 for all x ∈ Y where x is the unique element contained in Y − Y , 0 0 0 since the value of D (x ,x)is given by δ and D | as follows. Let T 0 T Y Y = {x, y, z, u} and consider the multisets δ(x, y, x ) − D | (x, y), 0 0 T Y δ(x, z, x ) − D | (x, z)and δ(x, u, x ) − D | (x, u)wherefor a mul- 0 T Y 0 T Y 0 0 tiset A with k ≥ 1 copies of some element a,we denote by A − a the multiset obtained by removing one copy of a. If there exists a unique element c ∈ M that belongs to all three of these sets, we have D (x ,x)= c. If two distinct elements of M share this property, this T 0 implies D (x ,y)= D (x ,z)= D (x ,u) = D (x ,x). We then T 0 T 0 T 0 T 0 have D (x ,y)= m(δ(y, z, x )), and D (x ,x) is the single element T 0 0 T 0 of δ(x, y, x ) −{D | (x, y),D (x ,y)}. 0 T Y T 0 Note that, as the example in Table 2 shows, it is not true in general that a three-way symbolic map δ that restricts to a three-way symbolic ultrametric on all subsets Y of X of size four is a three-way symbolic ultrametric on X . However, as mentioned above, using the previous lemma we now show that considering sets of size ﬁve is enough to ensure that this is the case. Symbolic Tree-Maps and Ultrametrics 527 Table 2. For M = {A, B} and X = {1, 2, 3, 4, 5}, a three-way symbolic map δ : →M which is not a three-way symbolic ultrametric on X but whose restriction to any subset Y ⊂ X of size four is a three-way symbolic ultrametric on Y . δ(1, 2, 3) 2A+B δ(1, 4, 5) 2A+B δ(1, 2, 4) 2A+B δ(2, 3, 4) 3A δ(1, 2, 5) 3B δ(2, 3, 5) 2A+B δ(1, 3, 4) 3A δ(2, 4, 5) 2A+B δ(1, 3, 5) 2A+B δ(3, 4, 5) A+2B Theorem 4.4 Suppose that |X|≥ 5 and that δ : →M is a three- way symbolic map. Then, δ is a three-way symbolic ultrametric if and only if δ| is a three-way symbolic ultrametric for all Y ⊆ X of size ﬁve. Proof. The fact that a three-way symbolic ultrametric on X restricts to such an ultrametric on all subsets of X of size ﬁve is clear. Conversely, assume that δ| is a three-way symbolic ultrametric for all Y ⊆ X of size ﬁve. For such a set Y , wedenoteby T =(T ,t ) Y Y Y the unique (by Lemma 4.3) discriminating labelled tree that represents δ| ,and by D the symbolic ultrametric that is represented by T . Y Y Y Clearly, if there exists a map D : → M such that, for all subsets Y ⊆ X of size ﬁve, the restriction of D to coincides with D , then D satisﬁes δ(x, y, z)= {D(x, y),D(x, z),D(y, z)}, for all x, y, z ∈ X pairwise distinct. Moreover, since D is a symbolic ultrametric on any subset Y ⊆ X of size ﬁve, and given that the property of being a symbolic ultrametric is based on a 4-point condition, we have that such amap D, if it exists, is also a symbolic ultrametric. Thus, if D exists, then δ is a three-way symbolic ultrametric. To show that D exists, assume for contradiction that there exist x and y in X and two distinct subsets Y and Y of X of size ﬁve, both 1 2 containing x and y, such that D (x, y) = D (x, y). We may assume Y Y 1 2 without loss of generality that I = Y ∩ Y has size four. Moreover, we 1 2 claim that x, y, Y and Y can be chosen in such a way that δ| is not 1 2 I of type δ , as deﬁned in Table 1. To prove this claim, consider the case where δ| is of type δ I 3 (otherwise, the claim trivially holds). Assume Y = {x, y, z, t, u } and 1 1 Y = {x, y, z, t, u }, which implies I = {x, y, z, t}. Both the subtree 2 2 of T induced by I and the subtree of T induced by I are of the Y Y 1 2 form T in Figure 4, and their underlying phylogenetic trees are not isomorphic. We can assume that one has cherries {x, y} and {t, z} and the other has cherries {x, z} and {t, y}. Then, we have not only that 528 K.T. Huber, V. Moulton, and G.E. Scholz D (x, y) = D (x, y), but also that D (x, z) = D (x, z), D (z, t) = Y Y Y Y Y 1 2 1 2 1 D (z, t), and D (y, t) = D (y, t). Moreover, it is easy to check that Y Y Y 2 1 2 there exists a subset Y ⊂ I of size three such that neither δ| ∗ Y ∪{u } nor δ| is of type δ . Y ∪{u } Since Y is a subset of I of size three and, in view of the four inequalities listed above, there exists two elements x ,y ∈ Y such that D (x ,y ) = D (x ,y ). If we denote by Y the set Y ∪{u }∪{u }, Y Y 1 2 1 2 we have that both Y ∩ Y and Y ∩ Y have size four, and that at least 1 2 one of D (x ,y ) = D (x ,y )or D (x ,y ) = D (x ,y ) holds. If the Y Y Y Y 1 2 ﬁrst inequality holds, the claim is then satisﬁed for x ,y ,Y and Y . Otherwise, it is satisﬁed for x ,y ,Y and Y , which completes the proof of the claim. Now, in light of the claim, the representation T of δ| is unique, I I and so is the symbolic ultrametric D that is represented by T .More- over, D is precisely the restriction of D to I , and the restriction of D to I . In particular, we have D(x, y)= D (x, y)and D(x, y)= Y Y 2 1 D (x, y), which contradicts D (x, y) = D (x, y). Y Y Y 2 1 2 5. A Five-Point Characterization of Three-Way Symbolic Ultrametrics We now focus on using the results in the previous two sections to derive conditions for characterizing three-way symbolic ultrametrics that are analogous to conditions (U1) and (U2) for symbolic ultramet- rics. In the following, we shall consider expressions of the form α m,where α is a real number, which arise when we take lin- m m m∈M ear combinations of multisets in M. We shall say that such an expres- sion α m is valid for M if the coeﬃcient for each element in M is m∈M contained in N. For example, for M = {a, b},if S =2a + b, S =2b + a 1 2 and S =3a are multisets in M,then we have (S + S )= a + b,which 3 1 2 1 5 1 is valid for M , but S − S = a − b and (S + S )= a + b which are 3 1 1 3 2 2 2 not valid for M . Now, suppose that δ : →M is a three-way symbolic map where |X|≥ 5. Let Y = {x, y, z, u, v} be a subset of X.Let ν (δ) denote the vector (δ(x, y, z),δ(x, y, u),... ,δ(z, u, v)). In addition, suppose that D : → M is a map such that δ(a, b, c)= {D (a, b),D (a, c),D (b, c)} Y Y Y Symbolic Tree-Maps and Ultrametrics 529 for all a, b, c ∈ Y ,and let μ (δ) denote the vector (D (x, y),D (x, z),...,D (u, v)). Y Y Y By deﬁnition of D , it is straight-forward to check that Aμ (δ)= Y Y ν (δ), where ⎛ ⎞ 1 1001 0000 0 ⎜ ⎟ 1 0100 1000 0 ⎜ ⎟ ⎜ ⎟ 1 0010 0100 0 ⎜ ⎟ ⎜ ⎟ 0 1100 0010 0 ⎜ ⎟ ⎜ ⎟ ⎜ 0 1010 0001 0⎟ A = ⎜ ⎟ . ⎜ ⎟ 0 0110 0000 1 ⎜ ⎟ ⎜ ⎟ 0 0001 1010 0 ⎜ ⎟ ⎜ ⎟ 0 0001 0101 0 ⎜ ⎟ ⎝ ⎠ 0 0000 1100 1 0 0000 0011 1 Note that in Hermann et al. (2012) it was shown that the matrix A is invertible with inverse ⎛ ⎞ 22 2 −1 −1 −1 −1 −1 −12 ⎜ ⎟ 2 −1 −12 2 −1 −1 −12 −1 ⎜ ⎟ ⎜ ⎟ −12 −12 −12 −12 −1 −1 ⎜ ⎟ ⎜ ⎟ −1 −12 −12 2 2 −1 −1 −1 ⎜ ⎟ ⎜ ⎟ 1 2 −1 −1 −1 −1 222 −1 −1 ⎜ ⎟ −1 A = ⎜ ⎟ . ⎜ −12 −1 −12 −12 −12 −1⎟ ⎜ ⎟ ⎜ ⎟ −1 −12 2 −1 −1 −12 2 −1 ⎜ ⎟ ⎜ ⎟ −1 −12 2 −1 −12 −1 −12 ⎜ ⎟ ⎝ ⎠ −12 −1 −12 −1 −12 −12 2 −1 −1 −1 −12 −1 −12 2 −1 −1 Consider the product μ (δ)= A ν (δ). Then, as the rows of A Y Y are indexed by pairs of distinct elements in Y , it is straight-forward −1 to check by considering the {p, q}th row of A (for p = q ∈ Y )and putting {e, f, g} = Y −{p, q} and S (δ)= (2(δ(p, q, e)+ δ(p, q, f)+ δ(p, q, g)+ δ(e, f, g)) p,q − (δ(p, a, b)+ δ(q, a, b))), a,b∈Y −{p,q} Y Y that S = {D (p, q)}. Deﬁning S as above for Y ⊆ X with |Y | =5 p,q p,q and p = q ∈ Y we also have: 530 K.T. Huber, V. Moulton, and G.E. Scholz Proposition 5.1 Suppose that |X|≥ 5, that δ : →M is a three- way symbolic map, and that Y ⊆ X has size ﬁve. There exists a map Y Y Y Y D : → M such that δ(a, b, c)= {D (a, b),D (a, c),D (b, c)} for all a, b, c ∈ Y if and only if for all p, q ∈ Y distinct, S (δ) is valid for p,q M,in which case S (δ) is a singleton multiset. p,q Proof. Suppose ﬁrst that the map D exists. Without loss of generality we may assume that D = D . In view of the discussion preceding the proposition, it follows that S (δ) is valid for M for all p, q ∈ Y p,q distinct, as S (δ)= {D (p, q)}. p,q To see the converse, assume that S (δ)is valid for M for all p,q p = q ∈ Y .Fix p and q. We claim that S (δ) is a singleton multiset. p,q To see this, put A =2(δ(p, q, e)+ δ(p, q, f)+ δ(p, q, g)+ δ(e, f, g)) and B = (δ(p, a, b)+ δ(q, a, b)). Then since S (δ)is valid for a,b∈Y −{p,q} p,q M , every element in B must also be an element in A. Hence, S (δ) p,q must contain |A − B| = 1 element as |A| =24 and |B| = 18. This proves the claim. Y Y Now, it is straight forward to see that if S (δ)= {s },for p,q p,q Y Y p = q ∈ Y , then the map D : → M deﬁned by putting D (p, q)= s (δ), for all p = q ∈ Y , satisﬁes the stated property. p,q We now present conditions for characterizing when a three-way symbolic map is a three-way symbolic ultrametric. For Σ ∈M,we deﬁne the elements m(Σ) and n(Σ) of M as follows: • If Σ contains a single element A ∈ M repeated three times, we put m(Σ) = n(Σ) = A. • If Σ contains two distinct elements, we deﬁne m(Σ) as the element of Σ appearing twice and n(Σ) as the element appearing only once. • If Σ contains three distinct elements, we put m(Σ) = n(Σ) = ∅. Note that if Σ contains two or fewer distinct elements, then Σ = {m(Σ),m(Σ),n(Σ)}. Theorem 5.2 Suppose that |X|≥ 5 and that δ : →M is a three- way symbolic map. Then δ is a three-way symbolic ultrametric if and only if the following hold: (P1) For all subsets Y ⊆ X of size ﬁve and all x, y ∈ Y distinct, S (δ) x,y is valid for M . (P2) For all pairwise distinct x, y, z ∈ X, δ(x, y, z) contains at most two distinct elements. Symbolic Tree-Maps and Ultrametrics 531 (P3) For all pairwise distinct x, y, z, u ∈ X with δ(x, y, z)= δ(y, z, u) = δ(x, y, u)= δ(x, z, u) holding, we have m(δ(x, y, z)) = m(δ(x, y, u)). Proof. Assume ﬁrst that δ is a three-way symbolic ultrametric. By Theorem 4.4 and Proposition 5.1 it follows that Properties (P1) and (P2) must hold. To see that Property (P3) holds too let {x, y, z, u}∈ be such that δ(x, y, z)= δ(y, z, u) = δ(x, y, u)= δ(x, z, u). Since δ| is a three-way symbolic ultrametric, Proposition 4.2 combined {x,y,z,u} ˆ ˆ with Table 1 implies that δ| is either of type δ and δ .Clearly, {x,y,z,u} 3 5 ˆ ˆ m(δ (x, y, z)) = m(δ (x, y, u)) holds for i =3, 5 and, so, Property (P3) i i follows. Conversely, assume that δ satisﬁes Properties (P1) – (P3). Con- sider a subset Y ⊆ X of size ﬁve. By Proposition 5.1, there exists a map Y Y Y Y D : → M such that δ(x, y, z)= {D (x, y),D (x, z),D (y, z)} for all x, y, z ∈ Y . We claim that D is a symbolic ultrametric. For this it suﬃces to show that D satisﬁes Property (U2) as Property (U1) is a direct consequence of Property (P1). To see that D satisﬁes Property (U2), assume for contradiction that there exist pairwise distinct x, y, z, u ∈ Y such that D (x, y)= Y Y Y Y Y D (y, z)= D (z, u) = D (z, x)= D (x, u)= D (u, y). Put A = Y Y D (x, y)and B = D (z, x). Then δ(x, y, z)= δ(y, z, u)= 2A + B = A +2B = δ(x, y, u)= δ(x, z, u). Since, m(δ(x, y, z)) = A = B = m(δ(x, y, u)) also holds this is impossible in view of Property (P3). Thus, D also satisﬁes Property (U2) and, so, is a symbolic ultrametric, as claimed. Since D is a symbolic ultrametric, there exists a labelled rooted Y Y tree T that represents D . Combined with the deﬁnition of D it follows that T also represents δ| .Thus, δ| is a three-way symbolic Y Y ultrametric and, so, δ| is a three-way symbolic ultrametric for all sub- sets Y ⊆ X with |Y | = 5. By Theorem 4.4, it follows that δ is a three-way symbolic ultrametric. Note that Properties (P1) – (P3) are independent of each other. Indeed, that Property (P2) is independent of Properties (P1) and (P3) and that Property (P3) is independent of Properties (P1) and (P2) is a direct consequence of the fact that Properties (U1) and (U2) are independent of each other. To see that Property (P1) is independent of Properties (P2) and (P3), consider the three-way symbolic map δ : →M deﬁned, {A,B} for all x, y, z ∈ X , by putting δ(x, y, z)= 2A + B.The map δ always satisﬁes (P2) and (P3), but if |X|≥ 5, δ does not satisfy (P1). 532 K.T. Huber, V. Moulton, and G.E. Scholz 6. Reconstructing Three-Way Symbolic Ultrametric Representations Using Triplets In this section, we are interested in determining when a three- way symbolic map δ on X is a tree-map or a symbolic ultrametric. Clearly, using the conditions given in Theorem 5.2 this can be done by examining every subset of X with size ﬁve. However, we now show how to do this using a triplet-based approach, which essentially reduces the problem to considering subsets of X of size three. Recall that a triplet is a binary phylogenetic tree on three leaves. By xy|z we denote the triplet with leaf-set {x, y, z} which has x, y adjacent to the same vertex in the tree. For T a phylogenetic tree on X and x, y, z ∈ X,we say that T displays the triplet xy|z if lca (x, z)= lca (y, z) =lca (x, y). To keep notation at bay, we some- T T T times also say that a labelled tree T =(T, t)on X displays a triplet r if r is displayed by T . In Semple and Steel (2003, Section 7.6), a triplet-based approach is described for deciding whether or not a two-way symbolic map δ is a symbolic ultrametric or not and, if it is, for building a labelled tree which represents δ. This approach is based on the BUILD algorithm, that was presented under that name in Aho et al. (1981). Using the results in Section 3, the BUILD algorithm also allows us to check if a three-way symbolic map is a tree-map using triplets as follows. Suppose δ : → M is a three-way symbolic map. Pick any r ∈ X . Then, using the BUILD-based approach, we can check whether or not the map δ deﬁned in Section 3. is a symbolic ultrametric by taking the set of triplets xy|z with x, y, z ∈ X distinct, for which δ (x, y) = δ (x, z)= δ (y, z)holds r r r as input to BUILD. If this is not the case, then by Lemma 3.1, δ is not a three-way symbolic tree-map. Otherwise, if (T, t) is the representation of δ returned by BUILD, then we can simply check whether or not this leads to a representation of δ by attaching the leaf r to the root of T . If this is possible then δ is a three-way symbolic tree-map, otherwise it is not. We now turn our attention to three-way symbolic ultrametrics. We begin by presenting a key link between triplets and such maps whose proof is straight forward. Denote the underlying set of a multiset A by A. Proposition 6.1 Let T =(T, t) be a discriminating labelled tree. For x, y, z ∈ X distinct: (T1) If xy|z is a triplet displayed by T,then t(lca (x, z)) = t(lca (y, z)) = T T m(δ (x, y, z)) and t(lca (x, y)) = n(δ (x, y, z)) T T T (T2) If T does not display any triplet on {x, y, z},then |δ (x, y, z)| =1. T Symbolic Tree-Maps and Ultrametrics 533 Corollary 6.2 Up to isomorphism, any labelled tree T can be uniquely reconstructed from δ and the set of triplets displayed by T . Proof. Put T =(T, t)and δ = δ .Let t : V (T ) → M and let R denote the set of triplets displayed by T . Deﬁne a map D : → M as follows. Suppose x, y ∈ X distinct. If there exists some z ∈ X −{x, y} such that no triplet on {x, y, z} is contained in R, then deﬁne D (x, y) to be the element in δ (x, y, z). If there exists some z ∈ X −{x, y} such that xy|z ∈R then put D (x, y)= n(δ (x, y, z)) and if xz|y ∈R δ T then put D (x, y)= m(δ (x, y, z)). In view of Proposition 6.1, the map δ T D is clearly well-deﬁned. The corollary now follows in view of Theorem 2.1 as D is equal to the symbolic ultrametric D that is represented by T (as D (x, y)= T δ t(lca(x, y)) = D (x, y) clearly holds for all x, y ∈ X distinct). In light of Corollary 6.2, it is of interest to understand when, for a labelled tree T , the set of triplets displayed by T can be obtained from δ . The tree T in Figure 4, suggests that this is not always possible. T 3 In fact, as we shall show, it suﬃces to exclude a special type of labelled tree which we deﬁne next. A ﬁxed-cherry tree on X (with cherry {x ,x }), |X|≥ 4, is a 1 2 labelled tree T =(T, t)on X such that the root ρ of T has two children v and w with t(v)= t(w) = t(ρ ), v is the parent of two elements x and T 1 x of X,and w the parent of all elements in X −{x ,x }. For example, 2 1 2 the tree T in Figure 4 is a ﬁxed-cherry tree on X = {1, 2, 3, 4} with cherry {1, 2}.Note that if T =(T, t) is a ﬁxed-cherry tree with cherry {x ,x } and x, y, z ∈ X distinct, then δ (x, y, z)= {t(w),t(w),t(w)} if 1 2 T neither x nor x belong to {x, y, z} and δ(x, y, z)= {t(ρ ),t(ρ ),t(w)} 1 2 T T else. We call a three-way symbolic map δ : →M that satisﬁes these conditions for some x = x ∈ X a ﬁxed cherry map (with cherry 1 2 {x ,x }). The following observation is straight-forward to check. 1 2 Lemma 6.3 Suppose that |X|≥ 5 and that δ is a three-way symbolic map on X.Then δ can be represented by a ﬁxed-cherry tree on X with cherry {x ,x } if and only if δ is a ﬁxed-cherry map with cherry 1 2 {x ,x }. 1 2 Note that a triplet xy|z with x, y, z ∈ X is displayed by a ﬁxed- cherry tree on X with cherry {x ,x } if and only if either {x, y} = 1 2 {x ,x } or z ∈{x ,x },and x, y ∈ X −{x ,x } hold. In particular, 1 2 1 2 1 2 if |X | > 4and δ is a ﬁxed-cherry map, then the cherry can be easily identiﬁed from δ, and therefore also all of the triplets displayed by T . 534 K.T. Huber, V. Moulton, and G.E. Scholz We now consider how to obtain the triplets displayed by a labelled tree T in case T is not a ﬁxed-cherry tree. We start with a useful lemma. Suppose T =(T, t) is a labelled tree and Y ⊆ X , |Y |≥ 4, is such that δ | has a unique discriminating representation. Then we denote that T Y representation by T =(T ,t ). Y Y Y Lemma 6.4 Let T be a discriminating labelled tree on X and assume that Y ⊆ X is such that δ | has a unique discriminating representa- T Y tion. If t is a triplet displayed by T ,then t is displayed by T . Proof. Put δ = δ and T =(T, t). It suﬃces to note that T is obtained T Y from T by ﬁrst taking the subtree T of T induced by Y ,and then collapsing edges of T both of whose end vertices have the same label under the restriction t of t to V (T ). Clearly, T is a discriminating representation of δ| . By assumption, it follows that T is the unique Y Y discriminating representation of δ| . It is well-known (Semple and Steel, 2003, Theorem 6.4.1) that the set R of triplets displayed by T is contained in the set of triplets displayed by T . Since the process of collapsing edges of T removes triplets from R, but does not add any, it follows that a triplet displayed by T is also displayed by T . We now present the main result of this section. Theorem 6.5 Suppose that |X|≥ 4 and that T is a labelled tree on X that is not a ﬁxed-cherry tree. Then, for all x, y, z ∈ X distinct, T displays the triplet xy|z if and only if one of the following two properties holds: (P1) There exists some u ∈ X such that δ (x, u, z)= δ (y, u, z) = T T δ (x, y, u) and if |δ (x, y, u)| =1 then δ (x, y, u) = δ (x, y, z). T T T T (P2) There exists some u ∈ X such that |{δ (x, u, z),δ (y, u, z), T T δ (x, y, u)}| =3 and m(δ (x, u, z)) = m(δ (y, u, z)) = m(δ (x, y, u)). T T T T Proof. Put T =(T, t)and δ = δ . Assume ﬁrst that x, y, z ∈ X distinct are such that T displays the triplet xy|z.Put v =lca(x, z)and w =lca(x, y). We proceed using a case-analysis on the structure of T . Since T is not a ﬁxed-cherry tree we need to consider the following (not necessarily disjoint) cases: (a): w is not a child of v,(b): v is not the root of T or has outdegree three or more, (c): w has a child that is neither x nor y, and (d): there exists a vertex v on the path from v to z with t(v ) = t(w). Case (a): Consider the parent v of w, and an element u in X that is below v but not below w (see Figure 5(a)). Since T is a discrim- 0 Symbolic Tree-Maps and Ultrametrics 535 v v v v 0 v w v v w w 0 w 0 x y uz x y zu xu y z x y zu (a)(b)(c)(d) Figure 5. Cases (a)-(d) for the case-analysis carried out in the proof of Theorem 6.5. See text for details. inating representation for δ ,we have t(v ) = t(w). Hence, δ(x, u, z)= T 0 δ(y, u, z)= {t(v ),t(v),t(v)} and δ(x, y, u)= {t(w),t(v ),t(v )}.Con- 0 0 0 sequently, δ(x, u, z)= δ(y, u, z) = δ(x, y, u). Note that if t(v)= t(w), then δ(x, y, z)= {t(w),t(v),t(v)} and, so, |δ(x, y, z)| =1. But then δ(x, y, u) = δ(x, y, z)as |δ(x, y, u)| = 2. Hence, the second condi- tion in Property (P1) holds, too. So assume that t(w) = t(v). Then |δ(x, y, u)| = 2 and so the second condition in Property (P1) does not apply. Case (b): Consider an element of u ∈ X such that v := lca(u, z)= lca(u, x) (see Figure 5(b)). If w is not a child of v then Property (P1) follows by Case (a). So assume that w is a child of v.Then t(v) = t(w)as T is a discriminating representation for δ .Since δ(x, u, z)= δ(y, u, z)= {t(v),t(v ),t(v )} and δ(x, y, u)= {t(w),t(v ),t(v )} we 0 0 0 0 have δ(x, u, z)= δ(y, u, z) = δ(x, y, u). Since the choice of v implies that |δ(x, y, u)| = 1 the second condition in Property (P1) does not apply. Hence, Property (P1) is also satisﬁed in this case. Case (c): Then there is some u ∈ X below w that is neither x nor y. We may assume without loss of generality that w =lca(y, u). Put v =lca(x, u) (see Figure 5(c)). Note that v = w may hold. 0 0 Clearly, δ(x, u, z)= {t(v ),t(v),t(v)}, δ(y, u, z)= {t(w),t(v),t(v)} and δ(x, y, u)= {t(v ),t(w),t(w)}.If v = w then δ(y, u, z) = δ(x, u, z) = 0 0 δ(x, u, y). Hence, |{δ(y, u, z),δ(x, u, z),δ(x, u, y)}| =3. Since m(δ(x, u, z)) = t(v)= m(δ(y, u, z)) and m(δ(x, y, u)= t(w), Property (P2) follows. So assume that v = w.Then δ(y, u, z)= δ(x, u, z)= {t(w),t(v),t(v)} and δ(y, u, x)= {t(w),t(w),t(w)}.In view of Prop- erty (P1) holding if Case (a) applies, we may assume without loss of gen- erality that w is a child of v.Since T is a discriminating representation of δ we have t(v) = t(w). Hence, δ(y, u, z)= δ(x, u, z) = δ(x, y, u). Since |δ(x, y, z)| =1 and |δ(x, y, z)| = 1, Property (P1) follows in this case, too. 536 K.T. Huber, V. Moulton, and G.E. Scholz Case (d): Let u ∈ X such that v =lca(z, u) (see Figure 5(d)). Then δ(x, u, z)= δ(y, u, z)= {t(v ),t(v),t(v)} and δ(x, y, u)= {t(w), t(v),t(v)}.If t(w)= t(v )we have δ(x, u, z)= δ(y, u, z)= δ(x, y, u). In view of Property (P1) holding if Case (a) applies, we may assume with- out loss of generality that w is a child of v. Hence, t(w) = t(v) because T is a discriminating representation for δ.But then |δ(x, y, u)| = 1 and, so, the second condition in Property (P1) does not apply. Conversely, let x, y, z ∈ X distinct. Assume ﬁrst that there exists some u ∈ X −{x, y, z} such that Property (P1) is satisﬁed for the name- sakes of u, x, y,and z. Consider the restriction δ of δ to {x, y, u, z}. Let T =(T ,t ) denote a discriminating representation of δ .To see that xy|z is displayed by T we claim ﬁrst that T is the unique dis- criminating representation of δ . Tosee theclaim, weshowthat xy|z is displayed by T . Assume for contradiction that the triplet xy|z is not displayed by T . In view of the ﬁrst condition in Property (P1), the outdegree of the root ρ cannot be four. Hence, one of the triplets x|yz and y|xz must be displayed by T and T is either resolved or un- resolved. Assume ﬁrst that T is resolved. Then a straight forward case analysis concerned with adding u to the triplet x|yz implies that that triplet cannot be displayed by T . Swapping the roles of x and y in that argument also implies that the triplet y|xz cannot be displayed by T either. Thus, T must be unresolved and, so, either ρ has outdegree three or one of the children of ρ has outdegree three. If T displays the triplet x|yz and the outdegree of ρ is three then |δ(y, u, z)| = 1. Hence, δ(x, y, u)= δ(x, y, z) in view of the sec- ond condition in Property (P1) which is impossible. Thus, one of the children of ρ has outdegree three. But this is impossible in view of the ﬁrst condition in Property (P1). Similar arguments imply that the triplet displayed by T cannot be y|xz either which is impossible. Thus, T must display the triplet xy|z. Consequently, either ρ is the parent of u and z or x, y,and u have the same parent. In either case it follows that δ cannot be of type δ .Thus, T is the unique discriminating representation of δ , as claimed. By Lemma 6.4, it follows that xy|z must be displayed by T . Assume next that there exists some u ∈ X −{x, y, z} such that Property (P2) is satisﬁed for the namesakes of u, x, y,and z.Consider of δ to {x, y, u, z}.Then δ must have a repre- again the restriction δ sentation T =(T ,t ). In view of the ﬁrst condition of Property (P2), T must be discriminating. A straight forward case analysis implies that δ cannot be of type δ .Thus, T is the unique discriminating representation of δ . Symbolic Tree-Maps and Ultrametrics 537 In view of Table 1, there must exist at least two subsets Y and Y of {x, y, z, u} of size three satisfying δ(Y)= δ(Y ). Since |{δ(x, u, z),δ(y, u, z),δ(x, y, u)}| = 3, it follows that {x, y, z} must be one of these subsets. If δ(x, y, z)= δ(x, y, u), then D (x, u)= m(δ(x, u, z)) and D (y, u)= m(δ(y, u, z)) must hold where D is the symbolic T T ultrametric represented by T . Indeed, since δ(x, y, u)= δ(x, y, z), one of the following two cases must hold: (α) D (x, z)= D (x, u)and T T D (y, z)= D (y, u)and (β) D (x, z)= D (y, u)and D (y, z)= T T T T T D (x, u). However Case (β)implies δ(x, z, u)= δ(y, z, u), which is im- possible in view of the assumption that {δ(x, y, u),δ(y, z, u),δ(x, z, u)} has size three. Thus, Case (α) must hold. But then D (x, u)= m(δ(x, u, z)) and D (y, u)= m(δ(y, u, z)), as required. Since, by assumption, we also have m(δ(x, u, z)) = m(δ(y, u, z)) we obtain D (x, u)= D (y, u). Since D (x, u)and D (y, u) are both T T T T elements in the multiset δ(x, y, u), we obtain m(δ(x, u, z)) = D (x, u)= m(δ(x, y, u)), which is impossible in view of (P2). Thus, we either have δ(x, y, z)= δ(x, u, z)or δ(x, y, z)= δ(y, u, z). Note that the roles of x and y are interchangeable here, so we may assume without loss of generality that δ(x, y, z)= δ(y, u, z). Using similar arguments as before, we have D (x, z)= m(δ(x, u, z)), D (x, y)= m(δ(x, u, y)), and D (y, z)= m(δ(y, u, z)) in this case. T T By Property (P2), it follows that D (x, z)= D (y, z) = D (x, y). T T T Thus, xy|z is displayed by T and, by Lemma 6.4, xy|z is also displayed by T . We now explain how, as a direct consequence of Lemma 6.3 and Theorem 6.5, it is possible to decide whether or not a three-way sym- bolic map δ on a set X with |X|≥ 5 is a three-way symbolic ultrametric by considering triplets and, if so, construct the labelled tree T which represents δ. First, check if δ is a ﬁxed-cherry map. If this is the case, then δ is a three-way symbolic ultrametric and T can be easily constructed. If not, then compute the set Tr(δ) of triplets of X satisfying Properties (P1) or (P2), and use it as input to the BUILD algorithm. If there is no tree that display all the triplets in Tr(δ), then δ is not a three-way symbolic ultrametric. Otherwise, using the tree T that is constructed from the BUILD algorithm and the map δ, it is straight-forward to decide if there is a labelling map t for T such that (T, t)represents δ. If this is the case, then δ is a three-way symbolic ultrametric which has the computed labelled tree (T, t) as a representation, otherwise it is not. Note that BUILD may return a tree T from Tr(δ)even ifthe map δ is not a three-way symbolic ultrametric. For example, let M = {A, B} 538 K.T. Huber, V. Moulton, and G.E. Scholz and consider the map δ : →M where X = {1, 2, 3, 4, 5},and δ(x, y, z)= 3A if {x, y, z} = {3, 4, 5},and δ(x, y, z)= 2A+B otherwise. Although the map is clearly not representable by a labelled tree, we have Tr(δ)= {34|1, 34|2, 35|1, 35|2, 45|1, 45|2}, and it is easy to check that there exists a phylogenetic tree on X whose set of displayed triplets is Tr(δ). 7. Conclusion We conclude by presenting some possible future directions: • In Chepoi and Fichet (2007), some relationships are derived be- tween three-way and two-way dissimilarities in general. It would be interesting to see which of these relationships might be extend- able to symbolic three-way maps. • We deﬁne three-way tree-maps in terms of medians in leaf-labelled trees. Can any of our results be extended to median networks (Bandelt et al., 1995)? Also, can our results concerning three- way symbolic ultrametrics be extended to rooted phylogenetic networks? (cf. e.g. Huber and Scholz, 2018) • We can clearly consider generalizations of three-way symbolic maps to k-way symbolic maps, k ≥ 2 (see e.g Chepoi and Fichet, 2007; Deza and Rosenberg, 2000; Warrens, 2010), and therefore generalize the notion of a three-way symbolic ultrametric in the natural way. If δ is a k-way symbolic map and its restriction to every k + 2 subset is a k-way symbolic ultrametric, then is δ a symbolic ultrametric? • In Hellmuth et al. (2013), an application of symbolic ultrametrics to constructing genome-based phylogenies is presented. It would be interesting to see if this application could be extended to three- way maps. Note that results presented in Levy et al. (2006) might be relevant in this context. Also, it would be interesting to develop associated algorithms such as those in Lafond and El-Mabrouk (2015), for three-way symbolic maps. References AHO, A., SAGIV, Y., SZYMANSKI, T., and ULLMAN, J. (1981), “Inferring a Tree from Lowest Common Ancestors with an Application to the Optimiza- tion of Relational Expressions", SIAM Journal of Computing, 28, 1073–1085. BANDELT, H.-J., FORSTER, P., SYKES, B., and RICHARDS, M. (1995), “Mi- tochondrial Portraits of Human Populations Using Median Networks", Ge- netics, 141, 743–753. Symbolic Tree-Maps and Ultrametrics 539 BÖCKER, S., and DRESS, A. (1998), “Recovering Symbolically Dated, Rooted Trees from Symbolic Ultrametrics", Advances in Mathematics, 138, 105–125. CHEPOI, V., and FICHET, B. (2007), “A Note on Three-Way Dissimilarities and Their Relationship with Two-Way Dissimilarities", in Selected Contributions in Data Analysis and Classiﬁcation, eds. P. Briot et al., Berlin: Springer, pp. 465–475. DEZA, M.-M., and ROSENBERG, I. (2000), “n-Semimetrics", European Journal of Combinatorics, 21, 797–806. GRÜNEWALD, S., LONG, Y., and WU, Y. (2017) “Reconstructing Unrooted Phylogenetic Trees from Symbolic Ternary Metrics", arXiv:1702.00190 GURVICH, V. (1984), “Some Properties and Applications of Complete Edge- Chromatic Graphs and Hypergraphs", Soviet Mathematics Doklady, 30(3), 803–807. GURVICH, V. (2009), “Decomposing Complete Edge-Chromatic Graphs and Hy- pergraphs", Discrete Applied Mathematics, 157, 3069–3085. HAYASHI, C. (1972), “Two Dimensional Quantiﬁcation Based on the Measure of Dissimilarity Among Three Elements", Annals of the Institute of Statistical Mathematics, 24, 251–257. HEISER, W., and BENNANI, M. (1997), “Triadic Distance Models: Axiomatiza- tion and Least Squares Representation", Journal of Mathematical Psychol- ogy, 41, 189–206. HELLMUTH, M., HERNANDEZ-ROSALES, M., HUBER, K.T., MOULTON, V., STADLER, P., and WIESEKE, N. (2013), “Orthology Relations, Sym- bolic Ultrametrics and Cographs", Journal of Mathematical Biology, 66, 399– HELLMUTH, M., WIESEKE, N., LECHNER, M., LENHOF, H.-P., MIDDEN- DORF, M., and STADLER, P.F. (2015), “Phylogenomics with Paralogs", PNAS, 112(7), 2058–2063. HERRMANN, S., HUBER, K.T., MOULTON, V., and SPILLNER, A. (2012), “Recognizing Treelike k-Dissimilarities", Journal of Classiﬁcation, 29, 321– HUBER, K.T., and SCHOLZ, G.E. (2018) “Beyond Representing Orthology Re- lations by Trees", Algorithmica, 80(1), 73–103. JOLY, S., and LE CALVÉ, G. (1995), “Three-Way Distances", Journal of Clas- siﬁcation, 12, 191–205. LAFOND, M., and EL-MABROUK, N. (2015), “Orthology Relation and Gene Tree Correction: Complexity Results WABI 2015, Algorithms in Bioinfor- matics, 9289, 966–979. LEVY, D., YOSHIDA, R., and PACHTER, L. (2006), “Beyond Pairwise Dis- tances: Neighbor-Joining with Phylogenetic Diversity Estimates", Molecular Biology and Evolution, 23, 491–498. SEMPLE, C., and STEEL, M. (2003), Phylogenetics, Oxford Lecture Series in Mathematics and its Applications, Oxford: Oxford University Press. SIMONYI, G. (2001), Perfect Graphs and Graph Entropy, An Updated Survey, Wiley-Interscience Series in Discrete Mathematics and Optimization, Chich- ester: Wiley. WARRENS, M. (2010), 11n-Way Metrics", Journal of Classiﬁcation, 27, 173–190. 540 K.T. Huber, V. Moulton, and G.E. Scholz Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/ licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Journal of Classification – Springer Journals
Published: Oct 31, 2018
You can share this free article with as many people as you like with the url below! We hope you enjoy this feature!
Read and print from thousands of top scholarly journals.
Already have an account? Log in
Bookmark this article. You can see your Bookmarks on your DeepDyve Library.
To save an article, log in first, or sign up for a DeepDyve account if you don’t already have one.
Copy and paste the desired citation format or use the link below to download a file formatted for EndNote
Access the full text.
Sign up today, get DeepDyve free for 14 days.
All DeepDyve websites use cookies to improve your online experience. They were placed on your computer when you launched this website. You can change your cookie settings through your browser.