Access the full text.
Sign up today, get DeepDyve free for 14 days.
Thu-Hien To, M. Habib (2009)
Level-k Phylogenetic Networks Are Constructable from a Dense Triplet Set in Polynomial Time
J. Oldman, Taoyang Wu, Leo Iersel, V. Moulton (2016)
TriLoNet: Piecing Together Small Networks to Reconstruct Reticulate Evolutionary Histories.Molecular biology and evolution, 33 8
L. Iersel, J. Keijsper, S. Kelk, L. Stougie, F. Hagen, T. Boekhout (2007)
Constructing Level-2 Phylogenetic Networks from TripletsIEEE/ACM Transactions on Computational Biology and Bioinformatics, 6
K. Huber, L. Iersel, V. Moulton, Céline Scornavacca, Taoyang Wu (2014)
Reconstructing Phylogenetic Level-1 Networks from Nondense Binet and Trinet SetsAlgorithmica, 77
K. Huber, Leo Iersel, V. Moulton, Taoyang Wu (2014)
How Much Information is Needed to Infer Reticulate Evolutionary Histories?Systematic Biology, 64
J. Fischer, D. Huson (2010)
New common ancestor problems in trees and directed acyclic graphsInf. Process. Lett., 110
Leo Iersel, V. Moulton, Eveline Swart, Taoyang Wu (2017)
Binets: Fundamental Building Blocks for Phylogenetic NetworksBulletin of Mathematical Biology, 79
K. Huber, V. Moulton (2011)
Encoding and Constructing 1-Nested Phylogenetic Networks with TrinetsAlgorithmica, 66
K. Huber, L. Iersel, S. Kelk, Radosław Suchecki (2009)
A Practical Algorithm for Reconstructing Level-1 Phylogenetic NetworksIEEE/ACM Transactions on Computational Biology and Bioinformatics, 8
which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source
J. Jansson, W. Sung (2005)
Algorithms for combining rooted triplets into a galled phylogenetic network
D. Gusfield (2014)
ReCombinatorics: The Algorithmics of Ancestral Recombination Graphs and Explicit Phylogenetic Networks
L. Jetten, Leo Iersel (2016)
Nonbinary Tree-Based Phylogenetic NetworksIEEE/ACM Transactions on Computational Biology and Bioinformatics, 15
D. Huson, Regula Rupp, Céline Scornavacca (2011)
Phylogenetic Networks: Concepts, Algorithms and Applications
L. Nakhleh (2010)
Evolutionary Phylogenetic Networks: Models and Issues
L. Iersel, V. Moulton (2012)
Trinets encode tree-child and level-2 phylogenetic networksJournal of Mathematical Biology, 68
J. Jansson, W. Sung (2004)
Inferring a Level-1 Phylogenetic Network from a Dense Set of Rooted TripletsTheor. Comput. Sci., 363
A. Dress, V. Moulton, M. Steel, Taoyang Wu (2009)
Species, clusters and the 'Tree of life': a graph-theoretic perspective.Journal of theoretical biology, 265 4
L. Iersel, S. Kelk (2008)
Constructing the Simplest Possible Phylogenetic Network from TripletsAlgorithmica, 60
Gabriel Cardona, M. Llabrés, F. Rosselló, G. Valiente (2009)
Comparison of Galled TreesIEEE/ACM Transactions on Computational Biology and Bioinformatics, 8
Journal of Classification (2019) 36:200–231 DOI: 10.1007/s00357--018-9279-5 Hierarchies from Lowest Stable Ancestors in Nonbinary Phylogenetic Networks Katharina T. Huber University of East Anglia, UK Vincent Moulton University of East Anglia, UK Taoyang Wu University of East Anglia, UK Abstract: The reconstruction of the evolutionary history of a set of species is an important problem in classiﬁcation and phylogenetics. Phylogenetic networks are a generalization of evolutionary trees that are used to represent histories for species that have undergone reticulate evolution, an important evolutionary force for many organisms (e.g. plants or viruses). In this paper, we present a novel approach to understanding the structure of networks that are not necessarily binary. More speciﬁ- cally, we deﬁne the concept of a closed set and show that the collection of closed sets of a network forms a hierarchy, and that this hierarchy can be deduced from either the subtrees or subnetworks on all 3-subsets. This allows us to also show that closed sets generalize the concept of the SN-sets of a binary network, sets which have proven very useful in elucidating the structure of binary networks. We also characterize the minimal closed sets (under set inclusion) for a special class of networks (2-terminal networks). Taken together, we anticipate that our results should be useful for the development of new phylogenetic network reconstruction algorithms. Keywords: Phylogenetic network; Hierarchy; Lower Stable Ancestor; Nonbinary network. We would like to thank the two anonymous referees for their helpful and constructive comments on a previous version of this paper. Corresponding Author’s Address: T. Wu, School of Computing Sciences, University of East Anglia, UK, email: taoyang.wu@uea.ac.uk. Published online: 16 November 2018 201 Hierarchies from Phylogenetic Networks 1. Introduction Phylogenetic networks are a generalization of evolutionary trees which are used by biologists to represent the evolution of a collection of species X with a reticulate evolutionary history. Essentially, a phylogenetic network N is a rooted, directed acyclic graph (or DAG), with a single root and leaf set labeled by the species in X (see Figure 1) for an example of such a network with leaf-set X = {1, 2,... , 8}). Internal vertices in N represent ancestors of the species in X, with the root representing the highest common ancestor of all species in X. Those internal vertices which are the child of a single vertex represent a speciation event, and those which are the child of more than one vertex a reticulate event. The latter type of event might, for exam- ple, be the hybridization of plant species to form a new hybrid species, or the recombining of viruses to form a new virus. In recent years, there has been a great deal of work on trying to de- velop new methods to construct phylogenetic networks from biological data (e.g. from molecular sequences). Some recent reviews concerning phylo- genetic networks and approaches to construct them include Gusﬁeld (2014) and Huson, Rupp, and Scornavacca (2010). One approach that has proven helpful in practice is to build up phylogenetic networks from smaller trees or networks. Two speciﬁc examples relying on this approach involve building phylogenetic networks from triplets or from trinets, which are 3-leaved phy- logenetic trees and networks, respectively (see next section for formal deﬁ- nitions). They are presented in, for example, Huber et al. (2017), Jansson, Nguyen and Sung (2006), Jansson and Sung (2006), To and Habib (2009), van Iersel and Kelk (2011), and van Iersel and Moulton (2014), and ex- amples of their application to biological data may be found in Huber et al. (2011), Oldman et al. (2016), and van Iersel et al. (2009). These approaches aim to build binary phylogenetic networks from binary triplets or trinets, that is, networks/triplets/trinets in which the root has two children, the sum of the in-degree and out-degree for every internal vertex is equal to three and each leaf has in-degree one. To do this, they exploit an interesting in- terplay between certain hierarchies on the leaf-set of the network and the triplets/trinets displayed by a network. However, in practice, the assumption that the networks are binary can be restrictive since, for example, it does not allow for representing uncertainty in the order of divergence or reticu- late events, and it may be necessary to allow for triplets/trinets that are not binary (Jetten and van Iersel, in press; Nakhleh, 2011). In this paper, we consider the problem of extending some of the the- ory underlying phylogenetic network construction from triplets and trinets to the nonbinary setting. As we shall see, this leads to some new results concerning phylogenetic networks which provide novel insights into their K.T. Huber, V. Moulton, and T. Wu Figure 1. An example of a phylogenetic network N on X = {1,. .., 8}. The arcs in N (and in all subsequent ﬁgures) are all directed away from the root ρ. structure. We expect that these results should prove useful for developing new approaches to constructing phylogenetic networks (see the last section for more details). Note that although nonbinary networks have not been commonly considered in the literature, some work has appeared on con- structing (Huber and Moulton, 2012) and comparing (Cardona et al. 2011) certain nonbinary networks. In addition, some structural results have ap- peared concerning nonbinary tree-based networks (Jetten and van Iersel, in press). We now summarize the contents of the rest of the paper. In the next section, we present some preliminaries concerning digraphs and phyloge- netic networks, including a brief introduction to triplets and trinets. Follow- ing on from that section, we then introduce the key new concept of a closed set. Recall that, given a non-empty subset Y of the leaf-set X,the lowest stable ancestor of Y , LSA(Y ), in a phylogenetic network N is the lowest vertex in N that is a common ancestor of every element in Y and that is contained in every dipath that connects the root of N to some element of Y (Fischer and Huson, 2010). We say that a subset Y ⊆ X is closed (in N) if |Y | =1,or if |Y|≥ 2 and the set of leaves below LSA(Y ) is equal to Y . For example, {5, 6} and {7, 8} are both closed sets for the phylogenetic network depicted in Figure 1. After proving some structural results concerning lowest stable ances- tors in Section 3, we give a characterization of the closed sets in a phyloge- netic network in terms of certain vertices in the network (Theorem 3.6). Us- ing this characterization, in Section 4 we prove that the collection of closed sets in X for a phylogenetic network with leaf-set X is a hierarchy (The- orem 4.1), i.e. the collection may be represented as some rooted tree with leaf-set X. We also show that the hierarchy of closed sets is directly re- lated to two further hierarchies on X that can be naturally associated to a 203 Hierarchies from Phylogenetic Networks network: namely, hierarchies that arise from the cut arcs and cut vertices of N (vertices and arcs whose removal disconnects N). As we shall see, the closed set, cut arc and cut vertex hierarchies associated to a network are not the same in general, although they are identical for binary networks. In Section 5, we consider the relationship between the closed sets of a network and the collections of triplets and trinets that it displays. For a binary phylogenetic network N, it is known that the hierarchy corresponding to the cut arcs of N may be retrieved from the collection of triplets displayed by N. One way to show this is to consider so-called SN-sets for arbitrary collections of triplets. The concept of SN-sets was introduced by Jansson and Sung (2006) as part of developing an algorithm to infer binary level-1 networks from triplet systems (see Section 2 for the deﬁnition of a level- 1 network). Intuitively, each SN-set is a subset which forms the leaf-set of a subnetwork of the network which is produced by their algorithm, and hence the name SN-set (“SubNetwork-set”). These sets turned out to be very useful in elucidating the structure of binary networks in general (see e.g. To and Habib, 2009). Here, we show that the closed sets in a phylogenetic network can be obtained by extending the notion of SN-sets to the nonbinary setting. More speciﬁcally, we prove that the collection of closed sets of a phylogenetic network is precisely the collection of (generalized) SN-sets (Corollary 5.7). In addition, we show that the cut arc sets of a phylogenetic network can be obtained from its collection of trinets (Theorem 5.2), which has been proven to hold in the binary setting (van Iersel and Moulton, 2014, Theorem 1). In Section 6, we consider a certain digraph that can be associated to the collection of trinets in a network, which we call the closure digraph. A simpler version of this digraph was considered in Oldman et al. (2016) for certain binary networks. The closure digraph is of interest since it can be used to help identify certain closed sets in a network. More speciﬁcally, us- ing a key result concerning sink sets in the closure digraph (Corollary 6.4), in Section 7 we show that for a special class of phylogenetic networks (2- terminal networks) the sink sets in the closure digraph associated to a phy- logenetic network N are precisely the minimal closed sets of N (under set inclusion). We conclude in Section 8, with a discussion of some open prob- lems and possible future directions. 2. Preliminaries Throughout this paper, X is a ﬁnite set with |X|≥ 3, unless stated otherwise. A subset Y ⊆ X is called a singleton if |Y | =1,and non- singleton if |Y |≥ 2. K.T. Huber, V. Moulton, and T. Wu Digraphs. A directed graph,or digraph for short, N =(V, E), is an ordered pair consisting of a set V = V (N) of vertices and a set E = E(N) of arcs, that is, ordered pairs (u, v) of distinct vertices u, v ∈ V (so in particular, there are no loops in N). Suppose N is a digraph and u, v ∈ V (N).If (u, v) is an arc of N then we say u is a parent of v and v a child of u.The in-degree of u is the number of its parents, and the out-degree of u is the number of its children. A root of N is a vertex with in-degree 0. A leaf of N is a vertex without any children. The set of leaves of N is denoted by L(N). Any vertex in N that is neither a root nor a leaf is referred to as an interior vertex of N. Suppose N is a digraph. Then we call a sequence P : u ,u ,... , u ,u 0 1 0 k u , k ≥ 1, of pairwise distinct vertices of N such that (u ,u ) is an arc in k i−1 i N, 1 ≤ i ≤ k,a directed path (or dipath for short) from u to u . Moreover, 0 k we refer to the vertices u and u as the ends of P , and all other vertices 0 k u0,uk of P as the interior vertices of P . A pair of dipaths in N is called u ,u u ,u 0 k 0 k openly disjoint if they do not share a vertex other than possibly their ends. A directed cycle in N is a dipath P in which the requirement that the ends of P are distinct is replaced by requiring that they coincide. If N does not contain a directed cycle then N is called acyclic. Such a graph is sometimes referred to as a DAG. A DAG N is called rooted if it contains a unique vertex ρ(N) that is the root. Suppose N is an acyclic digraph and there exists a dipath from u to v for some u, v ∈ V (N) distinct. Then we write it as v ≺ u,and say that v is below u and u is an ancestor of v. Note that if the digraph N in question is clear from the context we simply write v ≺ u rather than v ≺ u. Furthermore, we write v u if u = v or v ≺ u holds. Given a subset U ⊆ V (N),a vertex w ∈ U is called lowest if no vertex in U is below w.A common ancestor of a subset Y ⊆ V (N) is a vertex w ∈ V (N) that is an ancestor of each vertex in Y.Furthermore, w is called a lowest common ancestor of Y if it is lowest among all common ancestors of Y .If u is an interior vertex of N, then we refer to the set of leaves of N below u as the cluster C(u)= C (u) induced by u.In case u is a leaf of N,then we put C(u)= {u}. Suppose N is a digraph. For v ∈ V (N) we denote by N − v the di- graph obtained from N by removing v and all arcs incident with v.We call N connected if its underlying undirected graph (i.e., the graph obtained from N by discarding the directions of its arcs) is connected and disconnected otherwise. Note that a rooted acyclic digraph is necessarily connected. A vertex v of N is called a cut vertex of N if N − v is disconnected. Similarly, a cut arc of N is an arc whose removal disconnects N. A directed graph is called biconnected if it contains no cut vertices. A biconnected component N, also known as a block of N, is a maximal biconnected subgraph. A bi- of 205 Hierarchies from Phylogenetic Networks connected component is called trivial if it contains precisely one arc (which is necessarily a cut arc), and non-trivial otherwise. Finally, we call a vertex v ∈ V (N) a terminal vertex of N if there exists a biconnected component H of N such that v is a lowest vertex in V (H). Note that v could belong to several biconnected components, but at most one of them contains v as a terminal vertex. To illustrate this concept, consider the digraph N depicted in Figure 1. Then the parent vertex of 3 and 4 is not a terminal vertex of N whereas the parent vertex w of 5 and 6 is. Note that w is also contained in the biconnected components of N containing 5 and 6, respectively. Phylogenetic networks. A phylogenetic network N (on X) is a rooted DAG with leaf set X and which does not contain any degenerate vertices (i.e., ver- tices in N that have in-degree and out-degree one). We also denote the leaf set of N by L(N). Note that a phylogenetic network N whose underlying graph is a tree is also called a phylogenetic tree (cf. Semple and Steel (2003) for more details concerning phylogenetic trees). To simplify our arguments, we shall assume throughout this paper that all leaves of a phylogenetic net- work have in-degree one. That is, each leaf v has a unique parent, denoted by p(v). Suppose N is a phylogenetic network. We refer to a vertex of N with in-degree at least two and out-degree one as a reticulation vertex of N. For k ≥ 0 and integer, we call a binary phylogenetic network N level-k if each biconnected component of N contains at most k reticulation vertices. Note that a binary phylogenetic network N is a phylogenetic tree if and only if the level of N is zero. Hence, the level of such a phylogenetic network can be regarded as a measure of its deviation from being a phylogenetic tree. Suppose that N is a phylogenetic network on X and ∅ = Y ⊆ X. Extending the notion of a stable ancestor of a subset of X (see Fischer and Huson, 2010 and the Introduction) to subsets Y ⊆ V (N) −{ρ },we say that a vertex v ∈ V (N) is a stable ancestor of Y (in N) if v is a common ancestor of Y and is contained in every dipath that connects the root of N to some vertex w ∈ Y . Note that if u and v are two stable ancestor of Y then either u v or v u must hold. We refer to the unique stable an- cestor w ∈ V (N) that is lowest among all stable ancestors of Y as the low- est stable ancestor of Y (in N), denoted by LSA (Y ),or simply LSA(Y ). Note that if Y = {y ,... ,y } for some t ≥ 1, then we sometimes write 1 t LSA (y ,... ,y ) rather than LSA (Y ). The following two easily proven N 1 t N facts will be useful later on. Observation 1: Suppose that N is a phylogenetic network on X and ∅ = Y ⊆ Y ⊆ V (N) −{ρ }.Then LSA (Y ) ≺ LSA (Y ). N N N N Observation 2: Suppose that N is a phylogenetic network on X and that Y ⊆ V (N) −{ρ } contains at least two elements. Then there exists a pair of distinct elements y and y in Y such that LSA (y ,y )= LSA (Y ). 1 2 N 1 2 N K.T. Huber, V. Moulton, and T. Wu 1 23 1 23 1 23 (i) (ii) (iii) Figure 2. An example of triplets and trinets. (i) A triplet t =1|2|3; (ii) A triplet t =12|3; 1 2 (iii) A trinet T on {1, 2, 3}.Here t are t are two triplets displayed by the network N in 1 2 Figure 1, and T is a trinet induced by N. We deﬁne the subnet N| of N on some non-empty Y ⊆ X as the subgraph obtained from N by deleting all vertices that are not on any path from LSA(Y ) to any element in Y and subsequently suppressing all degen- erate vertices and all parallel arcs. If the latter results in degenerate vertices then we repeat this whole process until we obtain a digraph containing nei- ther parallel arcs nor degenerate vertices. Note that N| = N if and only if LSA(X)= ρ(N). Triplets and Trinets. A phylogenetic tree T on a set Y = {a, b, c} of size 3 is called a triplet. Note that T comes in two possible types. Either (1) T is binary, which implies that T contains two leaves a and b such that LSA(a, b) = ρ(T ), in which case we also write ab|c for T,or(2) T is non- binary in which case LSA(Z)= ρ(T ),for all Z ⊆ Y with |Z| =2,and we also write a|b|c for T . An example of these two types of triplets is depicted in Figure 2. Suppose N is a phylogenetic network on X. Then we say that a triplet a|b|c is displayed by N if there exists an interior vertex r ∈ V (N), such that there exist three pairwise openly disjoint dipaths P , P ,and r,a r,b P . Similarly, the triplet ab|c is displayed by N if there exist two distinct r,c interior vertices r and r in N, such that there exist four pairwise openly disjoint paths P ,P ,P ,P with r not contained in P . We denote r,r r ,a r,c r,c r ,b the collection of triplets displayed by N by R(N). Let R be a triplet system (on X), that is, a non-empty set of triplets such that X = L(t). Then, if Y is a subset of X, we denote by R| t∈R the subsystem of R consisting of all triplets t ∈R with L(t) ⊆ Y . A triplet system R on X is called dense if for each 3-subset Y ⊆ X there exists at least one triplet t ∈R for which L(t)= Y . 207 Hierarchies from Phylogenetic Networks A phylogenetic network with three leaves is called a trinet (see Figure 2 for an example). A trinet system T on X is a non-empty set of trinets such that L(T)= X and there exist no distinct trinets T, T ∈T with T ∈T L(T)= L(T ). A trinet system T on X is called dense (on X) if for each subset Y ⊆ X with |Y | =3, there exists precisely one trinet T ∈T such that L(T)= Y . Note that the use of the word ‘dense’ for trinets is slightly different from that for triplets because a phylogenetic network induces pre- cisely one trinet on a subset Y with three leaves but can display more than one triplet on Y ; see Figure 2 for an example. For N a phylogenetic network on X, we denote by T (N)= {N| : Y ⊆ X and |Y | =3}, the trinet system on X induced by N. Note that for any phylogenetic net- work N of X the trinet system T (N) induced by N is always dense on X. 3. Closed Sets In this section, we shall give a characterization of the closed sets of a phylogenetic network N on X in terms of terminal vertices. Recall from the Introduction that a subset A ⊆ X with |A|≥ 2 is closed (in N)if C(LSA (A)) = A holds. Note that the set X itself is necessarily closed, and that we use the convention that all singleton subsets of X are also closed. We begin by proving a useful lemma concerning stable ancestors. To prove this lemma will use the directed point version of Menger’s Theorem which we now state for the reader’s convenience (for more details see, e.g. Lovasz ´ and Plummer, 1986, Theorem 2.4.1). Theorem 3.1 [Menger’s Theorem] Suppose that D is a digraph with distin- guished vertices s and t and that (s, t) is not an arc in D. Then the maximum number of pairwise openly disjoint dipaths from s to t is equal to the mini- mum size of a vertex set U ⊆ V (D) −{s, t} so that each dipath from s to t contains at least one vertex in U. Lemma 3.2 Suppose that N is a phylogenetic network on X, that w ∈ V (N) is an interior vertex of N on X, and that a and b are two distinct elements in X. Putting r = LSA (a, b), the following assertions hold. (i) If there exist dipaths P and P from w to a and b, respectively, such w,a w,b that the pair P and P is openly disjoint then w r. w,a w,b (ii) There exist dipaths P and P from r to a and b, respectively, such r,a r,b that the pair P and P is openly disjoint. r,a r,b Proof. (i): Let P and P denote dipaths from w to a and b, respectively, w,a w,b such that the pair P and P is openly disjoint. Let P denote a dipath w,a w,b ρ,w K.T. Huber, V. Moulton, and T. Wu from ρ = ρ(N) to w. Then the dipath obtained by concatenating P and ρ,w P is a dipath from ρ to a that contains both w and r in its vertex set. w,a Hence, w r as otherwise the deﬁnition of r implies that r is also a vertex on the dipath P from ρ to b obtained by concatenating P and P . ρ,b ρ,w w,b Thus, r ∈ (V (P ) ∩ V (P )) −{w}, which is impossible because P w,a w,b w,a and P are openly disjoint. w,b (ii): Consider the digraph N obtained from N by adding a new vertex t and two additional arcs (a, t) and (b, t).Then (r, t) is not an arc in N and the minimum number of vertices in V (N ) −{r, t} that need to be deleted from N so that there exists no dipaths from r to t is two. By Menger’s The- orem, there exist two dipaths from r to t such that the pair formed by them is openly disjoint. Since the parents of t in N are a and b, the construction of N implies that there exist dipaths P and P in N from r to a and b, r,a r,b respectively, such that the pair P and P is openly disjoint. r,a r,b Next, we present a characterization of the terminal vertices of a phy- logenetic network. For N a phylogenetic network on X and v, w ∈ V (N) distinct such that v is a cut vertex of N,wedenoteby Z (w) the connected component of N − v that contains w. Proposition 3.3 Suppose that N is a phylogenetic network on X and that v ∈ V (N) is an interior vertex of N.Then v is a terminal vertex of N if and v is a cut vertex of N and there exists no connected component C of only if N − v and two vertices u ,u ∈ V (C) such that u ≺ v ≺ u . 1 2 1 2 v is a terminal vertex in N.Let H denote the bicon- Proof. Assume ﬁrst that nected component of N in which v is a lowest vertex and let v ,... ,v ∈ 1 t V (N), t ≥ 1, denote the children of v. We ﬁrst show that v is a cut vertex of N.Since v is a lowest vertex in H, it follows that for 1 ≤ i ≤ t,we have v ∈ V (H) and every path from ρ = ρ(N) to v must contain v.In other i i words, there exists no path from ρ to v in N − v. Hence v is a cut vertex of N. We next show that there exists no connected component C in N − v for which there exist two vertices u ,u ∈ V (C) such that u ≺ v ≺ u . 1 2 1 2 Let u and u be two vertices in N with u ≺ v ≺ u .It sufﬁces to show 1 2 1 2 that u is contained in C = Z (ρ) and that u is not contained in C .That 2 v 1 u is contained in C is an immediate consequence of the fact that every parent of v is contained in C . To see that u is not contained in C note that since u ≺ v there must exist some 1 ≤ j ≤ t such that u v .If u 1 1 j 1 were contained in C then there would exist a path in C from a parent of v to v . By concatenating this path with the dipath from v to u in N, whose j j 1 existence is implied by u v , it follows that there exist two distinct paths 1 j from v to v. Hence v must also be contained in H which is impossible. j j 209 Hierarchies from Phylogenetic Networks To see the converse, suppose that v is a cut vertex of N and that there exists no connected component C of N − v and vertices u ,u ∈ V (C) 1 2 with u ≺ v ≺ u . Assume for contradiction that v is not a terminal vertex 1 2 of N.Let u denote a parent of v in N and let H denote the biconnected component of N that contains the arc (u ,v). Clearly, v ≺ u .Since, 2 2 by assumption, v cannot be a lowest vertex of H,there must exist a child u of v that is also contained in H.Since u ≺ v clearly holds and H is 1 1 biconnected it follows that there must also exist a path from u to u that 2 1 does not contain v.But then u ∈ V (Z (u )) −{v} which is impossible. 1 v 2 Corollary 3.4 Suppose that N is a phylogenetic network on X and that v ∈ V (N) is a terminal vertex of N. Then a vertex u in V (N) −{v} is contained in Z (ρ(N)) if and only if u is not below v. Proof. Put ρ = ρ(N). Suppose ﬁrst that u ∈ V (N) −{v} such that u is a vertex in Z (ρ).Then since v is a terminal vertex in N and so must be an interior vertex of N, Proposition 3.3 implies that u cannot be below v. Conversely, suppose u ∈ V (N) −{v} is a vertex that is not below v. Then there must exist a dipath from ρ to u that does not contain v. Hence, u ∈ V (Z (ρ)). Before stating our characterization of closed sets, we state one more lemma that gives a relationship between a closed set A of a phylogenetic network N and the lowest stable ancestor of A in N. Note that the lemma is trivial if in its statement the word “path” is replace by “dipath”. To prove the lemma, we require some further terminology. Suppose that N is a phyloge- netic network and P : v ,... ,v is an (undirected) path in N. Then we call 1 k v , 1 <i <k, alternating if either (i) both (v ,v ) and (v ,v ) avertex i i i+1 i i−1 are arcs in N or (ii) both (v ,v ) and (v ,v ) are arcs in N. Clearly, P i+1 i i−1 i is the underlying undirected path of a dipath in N if and only if P does not contain any alternating vertex. Moreover, if A a closed subset of N and P is a path or a dipath in N, then we call P LSA(A)-avoiding if P does not contain LSA(A) in its vertex set. Lemma 3.5 Suppose that N is a phylogenetic network on X and that A ⊆ X is a closed set in N.Then LSA(A) is a vertex in every (undirected) path that connects ρ(N) to any element in A. Proof. Note ﬁrst that without loss of generality, we may assume that |A|≥ 2 as otherwise the lemma clearly holds. Suppose for contradiction that there is some x ∈ A such that there exists a path from ρ = ρ(N) to x that does not contain w := LSA(A).Let K.T. Huber, V. Moulton, and T. Wu ∗ ∗ P = P : v := ρ,... ,v := x, k ≥ 1, denote a w-avoiding path from ρ to 0 k x such that the number of alternating vertices in P is minimum over all w- avoiding paths between ρ and x. Without loss of generality, we may assume that x is chosen in a way so that the number m of alternating vertices in P is minimum over all possible w-avoiding paths between ρ and any element in A. We show ﬁrst that m ≥ 2.Since P is w-avoiding and, by the deﬁni- tion of the lowest stable ancestor of a set, every dipath from ρ to x contains ∗ ∗ w, it follows that P is not a dipath in N. Hence, P contains at least one alternating vertex. Since neither one of the end vertices of P can be alter- nating, k ≥ 2 must hold. Hence, (v ,v ) and (v ,v ) are two distinct arcs 0 1 k−1 k in N and, so, m must be even. Consequently, m ≥ 2. We next show that the m-th alternating vertex of P is below w.Let 0 <a <b< k be such that when starting with v the vertices v and v 0 a b are the (m − 1)-th and m-th alternating vertices in P , respectively. Then v and v are alternating and v is not alternating for all a +1 ≤ i ≤ k a i and i = b.Since (v ,v ) is an arc in N, it follows that v ,v ,... ,v k−1 k b b+1 k is a dipath from v to v = x. Note that, by the choice of a and b,no b k vertex on the path P : v ,v ,... ,v from v to v can be alternating. b b−1 a b a Thus, P is a dipath from v to v .Since N is a rooted DAG, there must b a exist a dipath P : u := ρ,... ,u := v from ρ to v . Consequently, 1 t b b u ,... ,u ,v ,... ,v is a dipath from ρ to x. By the deﬁnition of a stable 1 t b+1 k 1 ≤ i<t such that w = u . Thus, ancestor it follows that there exists some v ≺ w. Let y ∈ X denote a leaf of N below v and let w := v ,w ,... ,w := a 1 a 2 j y denote an w-avoiding dipath from v to y which must exist since y ≺ v ≺ a a w. Hence, y ∈C(w)= A,as A is closed. Let P denote the path obtained from v ,... ,v ,w ,... ,w by ﬁrst ignoring directions and then removing 1 a 2 j all cycles (in case there exist any). Then P is a w-avoiding path from ρ to y that contains at most m − 1 alternating vertices, which contradicts the choice of P . We now state our characterization of the closed sets of a phylogenetic network. Theorem 3.6 Suppose that N is a phylogenetic network on X and that A ⊆ X is a subset with 2 ≤|A| < |X|. Then the following statements are equivalent: (i) A is closed in N. (ii) LSA(A) is a terminal vertex of N and A is closed in N. (iii) there exists a terminal vertex v in N with A = C(v). 211 Hierarchies from Phylogenetic Networks Proof. (i)⇒(ii): Suppose A is closed in N. It clearly sufﬁces to show that LSA(A) is a terminal vertex of N. In view of Proposition 3.3, we need to show that (a) LSA(A) is a cut vertex of N, and (b) there exist no connected component C of N − LSA(A) and no two vertices u and u in C such that u ≺ LSA(A) ≺ u . To see (a), let x ∈ A. Then, by Lemma 3.5, every path between ρ := ρ(N) and x in N contains LSA(A). Note that the assumption on |A| implies that x = LSA(A) = ρ. Hence there is no path in N − LSA(A) joining ρ and x. Thus, LSA(A) is a cut vertex of N. To see (b), put N := N − LSA(A). Note that each vertex u in N with LSA(A) ≺ u is contained in C := Z (ρ) because there exists a dipath LSA(A) from ρ to u that does not contain LSA(A). We claim that no vertex v in C is below LSA(A). Indeed, if v ≺ LSA(A) held for some vertex v ∈ V (C), then there would exist some element x ∈C(v) ⊆C(LSA(A)) = A which is contained in C. Hence, there would exist a path between ρ and x in N that does not contain LSA(A), which is a contradiction to Lemma 3.5 since A is closed. (ii)⇒(iii): This is trivial. (iii)⇒(i): Assume that v is a terminal vertex of N such that A = C(v). Then if there exists a dipath from the root ρ to an element x ∈ A that does not contain v,then ρ and x belong to the same connected component in N − v. But this is impossible in view of Proposition 3.3 and the assumption on v. Hence, v must be a stable ancestor of A. Thus, LSA(A) v and, so, A is closed in view of A ⊆C(LSA(A)) ⊆C(v)= A. 4. Hierarchies from Networks A collection H of subsets of X is called a hierarchy (on X) if A∩B ∈ {∅,A,B} holds for all A, B ∈H,and H contains X and all singletons of X, but not the empty set. In this section, we shall show that the set H (N) Cl of all closed sets in a network N forms a hierarchy. We shall also show that this hierarchy is closely related to some other hierarchies on X that can be associated to N. Various ways have been described for associating a hierarchy to a phylogenetic network N on X, two of which we now recall (see, e.g. Dress, Moulton, Steel and Wu (2010) for several examples). The ﬁrst way concerns the cut arcs of the network. More speciﬁcally, deﬁne a subset A ⊆ X to be a cut-arc set (in N) if either A = X or there exists a cut arc a =(u, v) in N with u, v ∈ V (N) such that A = C(v). Clearly, the set H (N) of all CA cut-arc sets in N is a hierarchy on X, an observation which also follows the result we prove below. K.T. Huber, V. Moulton, and T. Wu A second way to associate a hierarchy on X to N is via its cut vertices. Call a subset A ⊆ X a cut-vertex set (of N) if either A = X or there exists acut vertex v in N such that A is the leaf set of a connected component of N − v distinct from Z (ρ(N)).Let H (N) denote the set of cut-vertex v CV sets of N. It is again straight-forward to check that H (N) is a hierarchy CV on X and that H (N) ⊆H (N). CA CV Interestingly, even though Theorem 3.6 suggests a close relationship between H (N) and H (N), this relationship is not in terms of set inclu- Cl CV sion since, in general, H (N) is neither a subset nor a superset of H (N). CV Cl However, we now introduce a superset H (N) of H (N) which we shall CV CV show below to be a hierarchy that contains both H (N) and H (N). Cl CV More speciﬁcally, we deﬁne a subset A ⊆ X to be contained in H (N) if either A ∈H (N), or there exists a cut vertex v of N such that CV CV A = X−V (Z (ρ(N))). Since each cut arc of N is incident with a cut vertex of N, it is clear that H (N) ⊆H (N) ⊆H (N) all hold. To illus- CA CV CV trate these concepts, consider the network N on X = {1, 2,... , 8} depicted in Figure 1. Let H be the collection of singletons of X and the set X.Then H (N)= H∪ {{7, 8}}, H (N)= H (N) ∪{{1, 2}}, H (N)= CA CV CA Cl H∪ {{5, 6}, {7, 8}} and H (N)= H (N) ∪{{1, 2, 3, 4}, {5, 6}}. CV CV Theorem 4.1 Suppose that N is a phylogenetic network on X.Then H (N), H (N) and H (N) are all hierarchies and CA Cl CV H (N) ⊆H (N) ⊆H (N) CA Cl CV holds. In addition, if N is binary, then H (N)= H (N)= H (N). CA Cl CV Proof. Clearly H (N) is a hierarchy on X and, by the above, H (N) Cl CV is also a hierarchy on X. We break the remainder of the proof into a series of claims. We ﬁrst claim that H (N) is a hierarchy. Let A ,A ⊆ X 1 2 CV denote two distinct elements in H (N). We need to show that A ∩ CV A ∈{∅,A ,A }. Without loss of generality, we may assume that 1 < 2 1 2 |A |, |A | < |X|.For i =1, 2,let v be the cut vertex of N associated to 1 2 i A as described in the deﬁnition of H (N) and put Z = Z (ρ) where i i v CV i ρ = ρ(N). We consider three cases which reﬂect the three possible relation- ships between v and v : 1 2 Case (1) v = v :Then Z = Z .Since A = A and H (N) 1 2 1 2 1 2 CV is a hierarchy, it sufﬁces to consider the cases that either there exists some i ∈{1, 2}, i =1 say, such that A ∈H (N) and A ∈H (N) or 1 CV 2 CV A ,A ∈H (N). In the ﬁrst case, we have A = L(C ) for some con- 1 2 CV 1 1 nected component C of N − v distinct from Z .Since A ∈H (N) 1 1 1 2 CV it follows that A = L(C ) ⊆ X − L(Z )= A and, so, A ∩ A = A . 1 1 1 2 1 2 1 In the second case, we have A = X − L(Z ) for all i =1, 2.But then i 1 A = A which is impossible. 1 2 213 Hierarchies from Phylogenetic Networks Case (2) One of v and v is below the other: Assume without loss of 1 2 generality that v is below v .Then L(Z ) ⊆ L(Z ).There are two cases 2 1 1 2 to consider: namely A ∈H (N) or A ∈H (N). Suppose ﬁrst that 1 CV 1 CV A ∈H (N).Then A = X − L(Z ).If A ∈H (N) then there 1 CV 1 1 2 CV exists some connected component C of N − v such that A = L(C).Since 2 2 L(Z ) ⊆ L(Z ), we obtain A = L(C) ⊆ X − L(Z ) ⊆ X − L(Z )= A . 1 2 2 2 1 1 If A ∈H (N) then A = X − L(Z ). Again since L(Z ) ⊆ L(Z ) it 2 CV 2 2 1 2 follows that A ⊆ A . In either case we obtain A ∩ A = A . 2 1 1 2 2 Now, assume A ∈H (N). Then swapping the roles of A and 1 CV 1 A in the previous argument implies A ⊆ A in case A ∈H (N). 2 1 2 2 CV If A ∈H (N) then since H (N) is a hierarchy on X it follows that 2 CV CV A ∩ A ∈{∅,A ,A }. 1 2 1 2 Case (3) Neither v is below v nor v is below v :If A ∈H (N) 1 2 2 1 1 CV then A ∩ A ∈{∅,A ,A } follows in case A ∈H (N) because 1 2 1 2 2 CV H (N) is a hierarchy. If A ∈H (N) then A = X − L(Z ).By CV 2 CV 2 2 assumption on v and v , it follows that A ⊆ X − L(Z )= A . Thus, 1 2 1 2 2 A ∩ A = A . So assume A ∈H (N). Then swapping the roles 1 2 1 1 CV of A and A in the previous argument implies A ∩ A = A in case 1 2 1 2 2 A ∈H (N). So assume A ∈H (N).Then A = X − L(Z ).Since 2 CV 2 CV 2 2 the assumption on v and v implies L(Z ) ∪ L(Z )= X it follows that 1 2 1 2 A ∩ A =(X −L(Z ))∩ (X − L(Z )) = ∅. Thus, H (N) is a hierarchy 1 2 1 2 CV on X, as required. We next show that the two set-inclusions stated in the theorem hold. We start with establishing that H (N) ⊆H (N). Suppose A ∈H (N). CA Cl CA Without loss of generality, we may assume that A is neither a singleton nor X itself as otherwise the claim clearly follows. Hence, there exist vertices u, v ∈ V (N) such that (u, v) is a cut-arc and C(v)= A.But then v is necessarily a terminal vertex of N. By Theorem 3.6, it follows that A is a closed set in N,asclaimed. It remains to show that H (N) ⊆H (N). Suppose A in H (N). Cl Cl CV Without loss of generality, we may assume that 1 < |A| < |X| as otherwise the claim clearly follows again. Since A is closed in N, Theorem 3.6 implies that there exists a terminal vertex v in N such that A = C(v). Note that, by Proposition 3.3, v is necessarily a cut vertex of N.Let x ∈ X. Then, by Corollary 3.4, x is contained in A = C(v) if and only if x ∈ V (Z (ρ)). Thus, A ∈H (N),as claimed. CV We conclude the proof of the theorem by showing that the three set inclusions relating H (N), H (N) and H (N) become equalities in Cl CA CV case N is binary. To see this, it sufﬁces to show that H (N) ⊆H (N). CA CV Suppose N is binary and A ∈H (N). We need to show that A is a cut-arc CV set. Without loss of generality, we may assume that 1 < |A| < |X|.Let v be acut vertex in N as described in the deﬁnition of the elements in H (N). CV K.T. Huber, V. Moulton, and T. Wu We consider two possible cases, where we put Z := Z (ρ). v v Case (a) A ∈H (N): Then there exists some connected component CV C of N − v distinct from Z such that A = L(C ).Since N is binary, v 1 v 1 hasatleastone butatmosttwo children. If v has one child then let u denote that child. Since v is a cut vertex of N,the arc (v, u) is necessarily a cut arc of N.Since A = C(u) clearly holds, it follows that A is a cut-arc set. Suppose v has two children denoted u and u . Note that v = ρ if u 1 2 1 and u are both contained in C . Denoting by v the unique parent of v in 2 1 this case, it follows that (v ,v) is a cut arc of N.Since A = C(v) clearly holds, A is a cut-arc set. So assume that precisely one of u and u ,say 1 2 u , is contained in C .Then (v, u ) is a cut arc of N.Since A = C(u ) it 1 1 1 1 follows that A is a cut-arc set. Case (b) A ∈H (N):Then A = X − L(Z ).Since N is binary, CV v N − v either has two or three connected components. If N − v has two connected components then the same arguments as in Case (a) imply that A is a cut-arc set. So assume that N − v has three connected components. Then v has a unique parent v and two children, denoted respectively by u and u . Note that all of (v ,v), (u, u ) and (u, u ) must be cut arcs of N.It 2 1 2 follows that A = L(Z (u ))∪L(Z (u )) as A = X−L(Z ). Consequently, v 1 v 2 v A = C(v) and, thus, A must be a cut-arc set. Before concluding this section, we note that closed sets are related to another type of hierarchy that can be related to a phylogenetic network. More speciﬁcally, recall that a cluster C ⊆ X is called tight in a phyloge- netic network N on X if there exists a subset V ⊆ V (N) such that (i) for all v ∈ V ,we have C(v)= C, and (ii) V separates C from X − C,that C C is, each (undirected) path from C to X − C contains some vertex in V . In Dress et al. (2010) it is shown that the tight clusters of a network form a hierarchy. Note that a cut-vertex set of N is not necessarily a tight cluster of N. As a direct corollary of Theorem 3.6 we however obtain: Corollary 4.2 Suppose that N is a phylogenetic network on X and that A ⊆ X is a closed set of N.Then A is a tight cluster of N. 5. Closed Sets from Triplets and Trinets In this section, we shall see that the closed sets of a phylogenetic network N can be inferred from the triplet or trinet systems induced by N. We start by extending the notion of a closed set to trinet systems on X. Suppose that T is a trinet system on X and A ⊆ X is a non-empty subset. We say that A is a closed in T if for each trinet T ∈T either A ∩ L(T)= ∅ 215 Hierarchies from Phylogenetic Networks or A ∩ L(T ) is a closed set in T . We now show that these concepts agree in case T is the trinet system displayed by a phylogenetic network. Theorem 5.1 Suppose that N is a phylogenetic network on X and that ∅ = A ⊆ X.Then A is a closed set in N if and only if A is closed in T (N). Proof. Without loss of generality, we may assume that 1 < |A| < |X| as otherwise the theorem clearly holds. Assume ﬁrst that A is closed in N. Suppose T ∈T (N) is a trinet such that A := A ∩ L(T ) = ∅. Note that if |A | =1,then A is closed in T by deﬁnition. Moreover, if |A | =3 then A = L(T ) and so A is closed in T . So assume |A | =2.Let x, y ∈ X and z ∈ L(T ) − A where A := {x, y}.Then LSA (x, y) LSA (A) ≺ LSA (x, y, z),where the N N N part follows from Observation 1 and the ≺ part holds because LSA (A) and LSA (x, y, z) are two common stable ancestors of x and y and hence we have either LSA (A) ≺ LSA (x, y, z) or LSA (x, y, z) LSA (A). N N N N However, the latter case implies that z LSA (A), a contradiction to the fact that A is closed and z ∈ A. Therefore, C (LSA (x, y)) = {x, y} and, N N so, A is closed in T . Conversely, suppose that A is not closed in N. We need to show that there exists a trinet T ∈T (N) such that A ∩ L(T ) is neither empty nor a closed set in T . By Observation 2, ﬁx x, y ∈ A such that LSA (x, y)= LSA (A).Since A is not closed in N, there must exist some z ∈ X − A such that z ∈C (LSA (A)).Let T ∈T (N) be such that L(T)= {x, y, z}. N N Clearly, A ∩ L(T)= {x, y} = ∅.Moreover, C (LSA (x, y)) = {x, y, z} = T T {x, y} = A ∩ L(T ). Thus, A ∩ L(T ) is not closed in T . Using this result, we now show that the cut-arc sets of a phylogenetic network N can be reconstructed from its trinet system T (N). This general- izes van Iersel and Moulton (2014, Theorem 1) which considers the binary case. Theorem 5.2 Suppose that N is a phylogenetic network on X and that A ⊆ X is a subset such that 2 < |A| < |X|.Then A is a cut-arc set of N if and only if, for all x, y ∈ A and z ∈ A,the set {x, y} is a cut-arc set of the trinet induced by N on {x, y, z}. Proof. Suppose that A is a cut-arc set of N. Then there exists a cut arc (u, v) in N with C(v)= A.Let x, y ∈ A and z ∈ A and consider the trinet T ∈R(N) on {x, y, z}.Then (u, v) induces a cut arc (u ,v ) in T whose deletion results in two connected components one of which contains {x, y} in its vertex set and the other z. Thus, {x, y} is a cut-arc set of N| . {x,y,z} K.T. Huber, V. Moulton, and T. Wu Conversely, suppose that, for all x, y ∈ A and z ∈ A,the set {x, y} is a cut-arc set for the trinet on {x, y, z} contained in T (N). Then, for all trinets T ∈T (N), Theorem 4.1 implies that A ∩ L(T ) is closed in T . By Theorem 5.1 it follows that A is closed in N. Hence, by Theorem 3.6, w := LSA(A) is a terminal vertex in N and A = C(w).Let v ∈ V (N) denote the stable ancestor of A such that A C(v) while no stable ancestor of A strictly below v has this property. Note that w ≺ v. We claim that every dipath of N from v to w contains a cut arc of N. Assume for contradiction that there exists a dipath in N from v to w that contains no cut arc of N.Let u beavertex in N so that A C(u) holds and that A C(u ) does not hold for all u ∈ V (N) below u.Then w ≺ u v must hold. Choose some z ∈C(u) − A,and let x, y ∈ A such that LSA(x, y)= w. By Lemma 3.2 there exist dipaths from w to x and y, respectively, such that the pair formed by them is openly disjoint. Let T ∈T (N) denote the trinet on {x, y, z}. We now show that {x, y} is not a cut-arc set in T , a contradiction which concludes the proof of the claim. To establish this fact we consider separately the cases u = v and u ≺ v. Suppose u = v and ﬁx a dipath P from v to z. Then by the choice of u and v if follows that except for v, none of the vertices in P is an ancestor of x or y. In addition, v = u implies that all dipaths from v to w in N are contained in T , and hence v is also contained in T . Therefore, since there exists no cut arc in N between v and w, there exists no cut arc in T between v and w,and so {x, y} is not a cut-arc set in T . Now suppose u ≺ v.Fix a dipath P from v to u and a dipath P v,u u,w from u to w.Let w be the stable ancestor of A contained in P closest to u,w u. Without loss of generality, we may assume that u = w as this case can be established in a similar manner. Note that, by the deﬁnition of v,wehave w = u as u is not a stable ancestor of A. Hence, there exists a dipath P v,w from v to w that does not contain u. Starting at v,let v ∈ V (N) be the last vertex that is simultaneously contained in P and P .Then v , u, w,and v,w v,u w must all be vertices of T and each of the dipaths P , P , P ,and v ,w v ,u u,w P induce four dipaths in T so that none of them contains a cut arc of T . w ,w By the choice of x and y, it follows that {x, y} is not a cut-arc set in T.This concludes the proof of the claim. To show that A is a cut-arc set of N and thus establish the theorem, consider a cut arc (u ,u ) in N whose removal disconnects N into two 1 2 connected components such that the vertex set of one contains w and the vertex set of the other v. Note that such a cut arc must exist by the previous claim. Then u is necessarily a stable ancestor of A.Since u ≺ v clearly 2 2 holds, the choice of v implies that A = C(u ).Hence A is a cut-arc set in N. Hierarchies from Phylogenetic Networks We now turn our attention to triplet systems. We begin by deﬁning the notion of SN-sets for triplet systems which may contain nonbinary triplets. A subset A ⊆ X is called an SN-set for a triplet system R if for a, b ∈ A distinct and c ∈ X − A,we have R| ⊆{ab|c},that is, R| is {a,b,c} {a,b,c} either {ab|c} or ∅. Note that, by deﬁnition, all singletons of X and X itself are SN-sets. We will use the convention that the empty set is not an SN-set for any triplet system. For the triplet system R(N) displayed by the network N in Figure 1, the subsets {5, 6} and {7, 8} are SN-sets while {5, 6, 7, 8} is not. The following result is a straightforward generalization of the binary case stated in Jansson and Sung (2006, Lemma 8). Note that the assumption that the triplet system is dense (that is, it contains at least one triplet for each 3-subset) is necessary even for triplet systems that contain only binary triplets. Lemma 5.3 Suppose that R is a dense triplet system on X. Then the set of SN-sets for R is a hierarchy on X. Proof. Assume for contradiction that A and B are two SN-sets for R such that A ∩ B ∈{∅,A,B}. Then there exists a ∈ A, b ∈ B,and c ∈ A ∩ B such that a ∈ B and b ∈ A.Since R is dense on X,there exists some t ∈R such that L(t)= {a, b, c}.Since A and B are SN-sets it follows that R| ⊆{ac|b}∩ {bc|a} = ∅ which is impossible as R is dense. {a,b,c} We next characterize the SN-sets of a triplet system R in terms of subsets of X that are closed with respect to a certain closure operation S which we now introduce. Suppose that R is a triplet system on X and A ⊆ X. We put S (A)= S (A ∪{c}) if there exists a, b ∈ A and R R c ∈ X − A such that a|bc ∈R or a|b|c ∈R holds, and S (A)= A otherwise. Note that, by deﬁnition, S ({x})= {x} and S (X)= X. R R Lemma 5.4 Suppose that R is a triplet system on X and that ∅ = A ⊆ X. Then A is an SN-set for R if and only if S (A)= A. Proof. Since the lemma clearly holds for |A| =1 and A = X,we may assume for the remainder of the proof that 1 < |A| < |X|. Suppose ﬁrst that A is an SN-set for R. Then for all c ∈ X − A and a, b ∈ A,we have R| ⊆{ab|c}. Thus, the only triplet on {a, b, c} {a,b,c} contained in R is ab|c. Hence, S (A)= A. Conversely, suppose S (A)= A and assume for contradiction that A is not an SN-set for R. Then there must exist some c ∈ X − A such that R| ⊆{ab|c}. Swapping the roles of a and b if necessary, we may {a,b,c} K.T. Huber, V. Moulton, and T. Wu assume that a|bc ∈R or a|b|c ∈R (or both). In either case, S (A) S (A ∪{c})= S (A) follows which is impossible. R R Next we show that the SN-sets associated to a dense triplet system R can be constructed by applying the S closure operations to pairs of elements of X. This generalizes Jansson and Sung (2006, Lemma 7). Note that the density assumption on R is necessary for Lemma 5.5 to hold. Lemma 5.5 Suppose that R is a dense triplet system on X.If A ⊆ X is an SN-set for R,then A = S ({x, y}) or A = S ({x}),for some x, y ∈ A. R R Proof. Without loss of generality we may assume that |A|≥ 2. Choose elements a, b ∈ A such that |S ({x, y})|≤ |S ({a, b})|,for all x, y ∈ A. R R We claim that A = S ({a, b}). Note ﬁrst that S ({a, b}) ⊆ S (A)= A R R R clearly holds as A is an SN-set. Assume for contradiction that S ({a, b}) = A. Then there exists some c ∈ A − S ({a, b}). The deﬁnition of the S R R closure operation combined with the fact that R is dense implies R = {a,b,c} {ab|c}. Hence, b ∈ S ({a, c}). Thus, S ({a, b}) S ({a, b, c})= R R R S ({a, c}), which is impossible. We now relate closed sets for trinet systems with SN-sets for triplet systems. For T a trinet system on X, we put R(T ):= R(N). N∈T Theorem 5.6 Suppose that T is a trinet system on X and A ⊆ X.Then A is closed in T if and only if A is an SN-set for R(T ). Proof. Since the theorem holds for |A| =1 and |A| = |X|, we may assume for the remainder of the proof that 1 < |A| < |X|. Suppose ﬁrst that A is closed in T . Assume for contradiction that A is not an SN-set of R(T ). Then there exist elements a, b ∈ A and c ∈ X − A such that R| ⊆{ab|c}. Therefore, there exists a trinet T ∈T {a,b,c} on {a, b, c} such that R(T ) ⊆{ab|c}. Swapping the roles of a and b if necessary, we may assume without loss of generality that ac|b ∈R(T ) or that a|b|c ∈R(T ) holds. In either case, there exists a vertex r ∈ V (T ) such that c ≺ r and the pair formed by the dipath from r to a and the dipath from r to b is openly disjoint. By Lemma 3.2(i), it follows that c ≺ r LSA (a, b). Hence, C(LSA (a, b)) = {a, b, c}, which contradicts the assumption that A is closed in T . Conversely, suppose that A is an SN-set of R(T ). Assume for con- tradiction that A is not closed in T . Then there exists a, b ∈ A, c ∈ X − A, and a trinet T ∈T on {a, b, c} such that C(LSA (a, b)) = {a, b, c}.Let r = LSA (a, b). Then there exists a dipath P in T from r to c. In addi- T r,c tion, by Lemma 3.2(ii), there exist dipaths P and P in T from r to a r,a r,b 219 Hierarchies from Phylogenetic Networks and b, respectively, such that the pair formed by them is openly disjoint. We consider two possible cases. Case (1): P shares no interior vertex with P ,for all z ∈{a, b}. r,c r,z Then a|b|c ∈R(T ). Hence R(T ) ⊆{ab|c} and so A cannot be an SN-set for R(T ), which is impossible. Case (2): There exists some z ∈{a, b} such that P shares one or r,c more interior vertices with P .Let w ∈ V (T ) denote the lowest vertex in r,z P such that the subpath P of P from w to c (i.e., the set of vertices r,c w,c r,c v ∈ V (P ) with c v w) does not share an interior vertex with P r,c r,a and with P . Swapping the roles of a and b if necessarily, we may assume r,b without loss of generality that w is a vertex on P .Let P denote the r,a r,w subpath of P joining w and r. Considering the vertices r and w and the r,a dipaths P , P , P ,and P implies ac|b ∈R(T ). Hence, R(T ) ⊆ r,w w,a w,c r,b {ab|c} which, as observed above, is impossible. Using Theorems 5.1 and 5.6 we immediately obtain: Corollary 5.7 Suppose that N is a phylogenetic network on X and that ∅ = A ⊆ X.Then A is a closed set in N if and only if A is an SN-set for R(N). 6. The Closure Digraph In Oldman et al. (2016) a certain digraph is associated to trinet sys- tems consisting of level-1 trinets. Using properties of this graph, a method is developed for constructing binary level-1 networks, an important family of binary networks in which no two distinct cycles share a common ver- tex, from biological datasets. In this section, we shall deﬁne and study a generalization of this digraph. First we introduce some further notation. Suppose T is a dense trinet system on X.For x, y ∈ X distinct, let κ (y) denote the number of ele- ments z ∈ X −{x, y} for which there exists a trinet T ∈T on {x, y, z} such that y ≺ LSA (x, z). Note that, in general, κ (y) = κ (x) might hold T x y and that κ (y) ≤|X|− 2 (see Figure 3 for an example). Now, the closure digraph of T , denoted by D(T ), isdeﬁned asthe digraph whose vertex set is X, and any two elements x, y ∈ X are joined by an arc (x, y) if κ (y)= |X|− 2. An example of a closure digraph for the trinet system of a phylogenetic network is presented in Figure 4. Informally speaking, an arc (x, y) in the closure digraph indicates that every non-singleton set that is closed in T and contains x must also contain y. More formally: K.T. Huber, V. Moulton, and T. Wu 1 1 1 1 2 3 4 1 1 7 8 5 5 5 5 6 5 5 Figure 3. The six trinets that are displayed by the phylogenetic network on {1, 2, ·· · , 8} depicted in Figure 1 and contain leaves 1 and 5. This implies κ (5) = 6, while κ (1) = 5. 1 5 3 4 1 2 5 6 Figure 4. The closure digraph for the trinet system induced by the phylogenetic network depicted in Figure 1. Undirected edges represent bidirected arcs. Figure 4. The arc (1, 5) follows from the example presented in Figure 3. Lemma 6.1 Suppose that T is a dense trinet set on X and that x, y ∈ X distinct. If (x, y) is an arc in D(T ), then each non-singleton set that is closed in T and contains x must also contain y. Proof. Suppose that (x, y) is an arc in D(T ) and that A is closed in T with x ∈ A and |A|≥ 2. Choose some element a ∈ A −{x}. Without loss of generality, we may assume a = y as otherwise the lemma clearly holds. Let T ∈T denote the unique trinet on {x, y, a}.Since (x, y) ∈ D(T ),we have κ (y)= |X|− 2 and, so, y ≺ LSA (x, a). Combined with x T the assumption that A is closed in T , we obtain {x, y, a} = C(LSA (x, a)) ⊆C(LSA (A∩L(T ))) = A∩L(T ) ⊆{x, y, a}. T T Hence, y ∈ A must hold. Note that even if T is induced by a binary level-1 network, the con- verse of Lemma 6.1 need not always hold. For example, suppose N is the network on X = {1, 2, 3, 4} depicted in Figure 5. Then (2, 1) is not an arc in the closure digraph D(T (N)). However, each non-singleton set A that is closed in T (N) must contain 1 if 2 ∈ A. Using Lemma 6.1, we now show that closed sets for dense trinet sys- tems are so-called sink subsets in the closure digraph. Recall that a non- empty subset A of the vertex set of a digraph G is called a sink subset in G 221 Hierarchies from Phylogenetic Networks Figure 5. An example illustrating that the converse of Lemma 6.1 does not hold in general— see text for details. if there exists no arc in G from A to V − A, that is, for each arc (x, y) in G with x ∈ A,wehave y ∈ A as well. Proposition 6.2 Suppose that T is a dense trinet set on X and that A ⊆ X is a subset with |A| > 1.If A is closed in T ,then A is a sink subset in D(T ). Proof. Assume for contradiction that there exists some A ⊆ X with |A|≥ 2 such that A is closed in T but A is not a sink subset of D(T ). Then there exists an arc (x, y) in D(T ) with x ∈ A and y ∈ X − A. Hence, by Lemma 6.1, A cannot be closed in T ; a contradiction. Note that the converse of Proposition 6.2 is not true in general. For instance, consider the network N pictured in Figure 1 and its closure di- graph D(T (N)) depicted in Figure 4. Then {1, 2, 3, 4, 5, 6} is a sink set in D(T (N)), but it is not closed in T (N). Even so, in the next section we will see that for certain class of networks the converse of Proposition 6.2 does in fact hold. We now consider properties of the closure digraph of the trinet system induced by a phylogenetic network. Theorem 6.3 Suppose that N is a phylogenetic network on X and that x, y ∈ X distinct. Then (x, y) is an arc of D(T (N)) if either (i) y ≺ p(x), or (ii) C (p(x)) = {x} and y ≺ LSA(p(x)) hold. N N Proof. Put T = T (N). To see that κ (y)= |X|− 2 holds, suppose z ∈ X −{x, y}. We claim that y ≺ LSA(x, z) where T ∈T (N) is the trinet with leaf set Y := {x, y, z}. Assume ﬁrst that Property (i) holds. Then we have y ≺ p(x) N N LSA(x, z). Hence y ≺ p(x) LSA(x, z), from which the claim follows. T T Next, assume that Property (ii) holds. Let H denote the digraph ob- tained from N by removing all vertices that are not on any dipath from LSA(Y ) to some element in Y.Then T is obtained from H by recursively deleting parallel arcs and suppressing degenerate vertices. K.T. Huber, V. Moulton, and T. Wu Figure 6. An illustration of Case 2-1 in the proof of Theorem 6.3. Dotted and dashed edges denote dipaths. Let v := p(x). Together with the assumption that |C(v)| =1 = |X|, it follows that v = ρ := ρ(N) and that v must be a reticulation vertex of N.Let v ,v ,··· ,v ∈ V (N) denote the t ≥ 2 parents of v. Also, let 1 2 t u := LSA (v). Then, by assumption, y ≺ u. We now consider two possible cases: Case (1) u is a parent of v: Without loss of generality, we may assume u = v .Let P be the dipath in N consisting of u, v and x.Since y ≺ u and 1 x |C(v)| =1, there exists a dipath P from u to y in N such that the pair P y x and P is openly disjoint in N.Since P and P also form a pair of openly y x y disjoint dipaths in H, it follows that u ∈ V (T ).Let P and P be the dipaths x y in T induced by P and P , respectively. Since P contains either no interior x y vertex or has v as its only interior vertex, we have u LSA (x, z) because v is not an ancestor of z. This implies the claim as y ≺ u LSA (x, z) holds in T . Case (2) u is not a parent of v: We consider two subcases: Case (2-1) There exists no common ancestor of x and y below u (see Figure 6). This implies that there exists a dipath P in N from u to y in which the only ancestor of x is u. Now an argument similar to that used in Case (1) shows that u ∈ V (T ). Note that we may assume that z ≺ u holds as otherwise u is not an ancestor of z in T , and hence y ≺ u LSA (x, z) holds. In addition, we may further assume that there exists a common ancestor of x and z below u, as otherwise we have u LSA (x, z). Let w be a lowest common ancestor of x and z such that w is below u (see Figure 6 for an illustration). Let P be a dipath from u to v in N that contains w,and let P denote the subpath of P from u to w.Since (u, v) is not an arc in N, Theorem 3.1 combined with u = LSA (v) implies that there exists a dipath P in N from u to v such that the pair P and P is 2 2 1 openly disjoint. Let P be the dipath from u to x obtained by concatenating P and the arc (v, x).Since w is a lowest common ancestor of x and z,there also exists a dipath P from w to z in which no interior vertex is an ancestor 223 Hierarchies from Phylogenetic Networks of x.Let P be the dipath from u to z obtained from concatenating P and P . Then the dipath pair P and P must be openly disjoint. This implies x z that w and v are both contained in T,and that y ≺ u LSA (x, z) holds. T T Case (2-2): There exists a common ancestor of x and y below u:Let w be a lowest common ancestor of x and y below u. Then an argument similar to that in Case (2-1) shows that u, w and v are all vertices in T.This implies u LSA (x, z) and, thus, the claim in this case too. T T Note that the converse of Theorem 6.3 need not hold in general. For example, consider the arc (1, 5) in the closure digraph depicted in Figure 4. Then neither one of the two conditions in the theorem holds. We now prove a useful corollary concerning sink subsets in the clo- sure digraph associated to the trinet system of a phylogenetic network. We start with some additional notation. We say that a sink subset A in a digraph G is minimal if |A| > 1 and every subset A A with |A | > 1 is not a sink subset in G. Suppose that N is a phylogenetic network on X and that a, b are two vertices in N such that neither one of them is a leaf. We say that a and b are redundant if b ≺ a and, for each vertex u a,we either have u b or b ≺ u. Note that if a and b are redundant then C (a)= C (b). N N Corollary 6.4 Suppose that N is a phylogenetic network on X. Then every sink subset of D(T (N)) has size at least two (or, equivalently, for every x ∈ X, there exists an element y ∈ X such that (x, y) is an arc in D(T (N))). Proof. Put T = T (N). Note ﬁrst that we may assume that N does not contain a redundant pair of vertices as otherwise we may replace N by the phylogenetic network N obtained from N via the following process. Sup- pose a, b ∈ V (N) form a redundant pair of vertices of N. First, delete all vertices u ∈ V (N) for which u ≺ a and b ≺ u holds (including their inci- dent arcs). Next, add the arc (a, b) to the resulting graph. Finally, suppress all degenerate vertices of that graph. Clearly, a set is closed in N if and only if it is closed in N . Furthermore, the closure digraphs for T and T (N ), respectively, coincide as a pair of elements of X forms an arc in D(T ) if and only if it forms an arc in D(T (N )). Suppose x ∈ X. We show that there exists an element y ∈ X such that (x, y) is an arc in D(T ). Clearly, if |C(p(x))|≥ 2 then, for any y ∈ C(p(x))−{x}, wehavethat (x, y) is an arc of D(T ) in view of Theorem 6.3. So assume |C(p(x))| =1. Note that p(x) is not the root of N as |X|≥ 3. Put u = LSA(p(x)). Also note that if |C(u)| =1 held then u and p(x) would form a redundant pair which is impossible in view of our assumption on N. Hence, |C(u)|≥ 2. Choose some y ∈C(u) −{x}. Then, by Theorem 6.3, (x, y) must be an arc in D(T ). K.T. Huber, V. Moulton, and T. Wu 7. 2-Terminal Networks Suppose that N is a phylogenetic network on X (T a trinet system) and that A ⊆ X is a closed set in N (in T ) ofsize atleasttwo. Then A is minimal closed in N (in T ) if each non-singleton subset A A is not closed in N (in T ). In this section, we shall show that for 2-terminal networks, that is, networks N for which each biconnected component of N contains at most 2 terminal vertices, the minimal closed sets in N are precisely the minimal sink subsets in the closure digraph D(T (N)). We begin with a key structural result concerning 2-terminal networks. Note that a similar result is proven in van Iersel et al. (2017, Theorem 3.1) for binary networks, but the binary condition plays an essential part in the proof which necessitates the development of a new approach. Suppose that N is a phylogenetic network and that H is a biconnected component of N. We denote by r(H) the highest vertex in H, that is, the necessarily unique vertex in H such that v ≺ r(H) holds for all vertices v in H distinct from r(H). Lemma 7.1 Suppose that H is a biconnected component in a 2-terminal network N. Then there exists a terminal vertex u (of N)in H such that LSA(u)= r(H). Proof. Note that the lemma clearly holds if H contains only one terminal vertex. Indeed, if u is that vertex then LSA(u) r := r(H) holds by deﬁnition of r(H). Hence, if LSA(u) = r,then LSA(u) is a cut vertex of H, a contradiction. So, for the remainder of the proof, assume that H contains precisely two terminal vertices, denoted u and u , respectively. For i =1, 2, note 1 2 that u = LSA(u ) is a vertex of H. Swapping the roles of u and u if i 1 2 ∗ ∗ ∗ ∗ necessary, we may assume that u is not below u , that is, either u and u 1 2 1 2 ∗ ∗ are not comparable via “≺”or u ≺ u . 2 1 ∗ ∗ To see that u = r, assume for contradiction that u ≺ r.Then 1 1 ∗ ∗ since H is biconnected and u = r there must exist some k ≥ 3 and a u - 1 1 avoiding path P : v := r, v ,... ,v := u in H from r to u .Since u 1 2 1 1 is a stable ancestor of u , it follows that P contains at least one alternating vertex. Moreover, noting that the arcs (v ,v ) and (v ,v ) are distinct as 1 2 k−1 k k ≥ 3, the number m of alternating vertices in P is at least two. Without loss of generality, we may further assume that P is chosen so that every u -avoiding path in H from r to u contains at least m alternating vertices. Let 1 ≤ i<j ≤ k be such that v and v are the (m − 1)-th and m-th i j alternating vertices of P , respectively. Then the dipath P : v ,v ,... ,v 1 j j+1 k from v to u is a subdipath of P , and hence u -avoiding (see Figure 7 for j 1 an illustration). Since the dipath P : v ,v ,... ,v is also a subdipath of 2 j j−1 i P we have v ≺ v . i j 225 Hierarchies from Phylogenetic Networks u P Figure 7. An illustration of the various paths considered in the proof of Lemma 7.1. The concatenation of the dipaths P and P forms the dipath P , and the concatenation of the 2 3 path P and the dipath P forms the path Q. Finally, the concatenation of P , P and P 5 4 1 forms the dipath P . Let P denote a dipath from r to v (which exists by the deﬁnition of r). Note that the dipath obtained by concatenating P and P is a dipath ∗ ∗ in H from r to u , and hence contains u because u isastableancestor 1 1 ∗ ∗ of u .Since P is u -avoiding, it follows that u is a vertex of P . Hence 1 1 1 1 v ≺ v ≺ u . We now prove three claims which will allow us to establish i j that u is also a stable ancestor of u . This will complete the proof since if ∗ ∗ u is a stable ancestor of both u and u ,then u must be a cut vertex of 1 2 1 1 H, which is impossible since H is a biconnected component of N and u is contained in H. Now, we ﬁrst claim that u v does not hold. Suppose this is not the 1 i case, i.e., u v . Then there exists a dipath K from v to u . Hence, the 1 i i 1 path R obtained by concatenating the subpath Q : v ,... ,v of P with K 1 i is a path from r to u . Note that since P is u -avoiding, so is Q. Hence, R is also u -avoiding. Since, by construction, R has fewer alternating vertices than P this is impossible. Thus, the claim must hold. ∗ ∗ Second, we claim that u u . To see this, note that since u and u 1 2 2 1 are the only two terminal vertices in H, and, by the previous claim, u v 1 i does not hold, we have u v as every non-terminal vertex of N must have 2 i a terminal vertex of N below it. Without loss of generality, we may assume that, in fact, u ≺ v because the case u = v can be established in a similar 2 i 2 i manner. Then there exists a dipath P from v to u . Hence, the dipath P i 2 obtained by concatenating P , P ,and P is a dipath in H from r to u .By 2 2 the deﬁnition of a stable ancestor, u must be a vertex of P . Since, as was ∗ ∗ ∗ u is a vertex of P and, by assumption, u ≺ u does not observed above, 1 1 2 ∗ ∗ hold, we obtain u u , as required for the second claim to hold. 2 1 Finally, we claim that u is also a stable ancestor of u . To see this, ∗ ∗ ∗ note ﬁrst that u ≺ u must hold. Indeed, if u ≺ u did not hold, then u 1 1 2 2 2 must be a cut vertex of H since u is the only other terminal vertex contained 2 K.T. Huber, V. Moulton, and T. Wu in H. But this is impossible as H is a biconnected component of N.Now, assume for contradiction that u is not a stable ancestor of u .Then every ∗ ∗ ∗ u -avoiding dipath P from r to u (if it exists) must contain u since u is 1 2 2 ∗ ∗ astableancestor of u .But u ≺ u u by the previous claims. Hence, 2 1 2 1 ∗ ∗ the subpath of P from r to u can be extended to a u -avoiding dipath from 2 1 r to u . This is impossible as u is a stable ancestor of u . 1 1 We now show that for 2-terminal networks N, minimal sink subsets in D(T (N)) are closed sets in N. Proposition 7.2 Suppose that N is a 2-terminal network on X and that A ⊆ X is a subset with |A|≥ 2.If A is a minimal sink subset in D(T (N)), then A is closed in N. Proof. Put T = T (N) and assume that A ⊆ X is a subset with |A|≥ 2 that is also a minimal sink subset in D(T ). Using arguments similar to the ones used at the beginning of the proof of Corollary 6.4, we may assume that N does not contain a redundant pair of vertices. The remainder of the proof of the proposition is based on two claims which we establish ﬁrst. Suppose U is the set of terminal vertices u in N for which, in addition, C(u) ∩ A = ∅ holds. Note that U = ∅ as every element of X is a terminal vertex of N. Claim 1: For each vertex u ∈ U,either |C(u)| =1 or A ⊆C(u) must hold (but not both). To prove the claim, assume that there exists a vertex u ∈ U with |C(u)| > 1. We need to show that A ⊆C(u).Since u is a terminal vertex of N, Theorem 3.6 implies that C(u) is closed in N. By Theorem 5.1 and Proposition 6.2, it follows that C(u) is a sink subset in D(T ).Since A is also a sink subset of D(T ), the intersection B = C(u)∩A is necessarily a sink subset of D(T ). By Corollary 6.4, |B|≥ 2.Since B ⊆ A, the minimality of A implies B = A. Thus, A ⊆C(u),which completes the proof of Claim 1. Claim 2: If H is a biconnected component of N that contains a vertex u ∈ U with |C(u)| =1,then C(r(H)) ⊆ A. To prove this claim, let u denote the unique leaf in C(u). Note that since N does not contain a redundant pair, u must be the parent of u . Assume ﬁrst that u is the only terminal vertex of N contained in H. Then r(H)= LSA(u). Assume for contradiction that there exists some y ∈C(r(H)) − A.Then y ≺ r(H)= LSA(u)= LSA(p(u )).Hence, by Theorem 6.3, (u ,y) must be an arc in D(T ).Since u ∈ A as |C(u)| =1, x x and A is a sink subset of D(T ), it follows by Lemma 6.1 that y ∈ A,which is impossible. Now, suppose that H contains two terminal vertices u and u of 1 2 N, with u = u .Put u = LSA(u ), noting that we may assume that 1 1 1 227 Hierarchies from Phylogenetic Networks u ≺ r(H) holds since otherwise arguments similar to the ones in the proof ∗ ∗ of Claim 1 maybe applied. Moreover, u ≺ u as otherwise u is a cut vertex 1 1 of H, a contradiction. But then Theorem 6.3 implies for all y ∈C(u ) that (u ,y) is an arc in D(T ).Since u ∈ A and A is a sink subset of D(T ), x x it follows by Lemma 6.1 that C(u ) ⊆ A.Since A ⊆C(u ) cannot hold as 2 2 u is a terminal vertex distinct from u , Claim 1 implies that |C(u )| =1. 2 1 2 Thus, there exists some y ∈ A such that C(u )= {y}. Furthermore, since u ≺ r(H) and H is a biconnected component of a 2-terminal network we have u = r(H) by Lemma 7.1. Together with u = p(y), Theorem 6.3 implies that (y, z) is an arc in D(T ) for all z in C(r(H)) −{y}. Combined with the assumption that A is a sink subset of D(T ) and y ∈ A,it follows that C(r(H)) ⊆ A. This completes the proof of Claim 2. Using these claims we now prove that A is closed in N. Suppose x ∈ A. Then, by Theorem 6.3, (x, y) is an arc in D(T ),for all y ∈C(p(x)) − {x}. Hence C(p(x)) ⊆ A. Note that if p(x) is the root ρ(N) of N then C(p(x)) = C(ρ(N)) = X. Thus, A = X and, so, A is closed in N by deﬁnition. Thus, assume for the remainder of the proof that ρ(N) = p(x). Let p (x) be a parent of p(x) in N and let C denote the biconnected component of N containing the arc (p (x),p(x)). We consider two possible cases: Case (1) C is a trivial biconnected component of N:Then (p (x),p(x)) is the unique arc of C. Since that arc is clearly a cut arc of N, it follows that C(p(x)) is a cut-arc set for N. Hence, by Theorem 4.1, C(p(x)) is closed in N. Thus, by Theorem 5.1, C(p(x)) is closed in T .Since (p (x),p(x)) is a cut arc of C and N does not contain degenerate vertices, p(x) has at least two children. Hence, |C(p(x))| > 1. By Proposition 6.2, it follows that C(p(x)) must be a sink subset in D(T ).Since C(p(x)) ⊆ A, the minimality of A implies A = C(p(x)). Thus, A is closed in N. Case (2) C is not a trivial biconnected component of N:Let U be the set of terminal vertices u in C for which, in addition, C(u) ∩ A = ∅ holds. Note that U is not empty as it contains either p(x) or a descendant of p(x). We consider two sub-cases: Case (2-1) There exists a vertex u ∈ U with |C(u)| > 1: Then, by Claim 1, A ⊆C(u). Hence, x ≺ u, and, therefore, p(x) u.Since u is a terminal vertex of N in C and (p (x),p(x)) is an arc of C, we obtain p(x)= u. In view of Theorem 6.3, it follows that for all y ∈C(u), (x, y) is an arc in D(T ). Hence, by Lemma 6.1 C(u) ⊆ A. By the minimality of A, we obtain A = C(u). Thus A is a closed in N. Case (2-2) |C(u)| =1,for all u ∈ U : We shall construct a sequence of vertices r ,r ,... of N which will eventually terminate at a vertex r , 0 1 k k ≥ 0,so that C(r )= A and r is either ρ(N) or a terminal vertex of k k N.Put r = r(C).Then |C(r )| > 1 because C is non-trivial and N 0 0 K.T. Huber, V. Moulton, and T. Wu does not contain any redundant pair of vertices. By Claim 2, C(r ) ⊆ A. Hence, if r = ρ(N),then X = C(ρ(N)) = C(r ) ⊆ A ⊆ X,which 0 0 implies C(r )= X = A. Hence, A is closed in N in this case. So suppose r = ρ(N).If r is a terminal vertex of N, then Theorem 3.6 implies that 0 0 C(r ) is closed in N. By Theorem 5.1 and Proposition 6.2, C(r ) is a sink 0 0 subset in D(T ), and by minimality of A, C(r )= A. So, assume r = ρ(N) and that r is not a terminal vertex of N.Then 0 0 there exists some biconnected component C of N that contains r so that 1 0 r ≺ r := r(C ) holds. Furthermore, let u ∈ V (C ) denote a terminal 0 1 1 1 1 1 vertex of N for which u ≺ r holds. Then C(u ) ⊆C(r ) ⊆ A and so 0 0 1 1 u ∈ U. Note that since (p (x),p(x)) is an arc in C,we have x ∈C(u ). 1 1 Hence, A ⊆C(u ).By Claim 1, |C(u )| =1, and so by Claim 2, C(r ) ⊆ A. With r playing the role of r in the argument used in the last paragraph, if 1 0 r = ρ(N) or r is a terminal vertex of N,then C(r )= A. Therefore, r 1 1 1 1 must be contained in a biconnected component C of N which contains a 2 2 terminal vertex u ∈ C with u ≺ r ≺ r := r(C ) and C(r ) ⊆ A. 2 1 2 2 2 Since N is ﬁnite, this process of constructing vertices r , i ≥ 0 must terminate at some stage k ≥ 0 resulting in a vertex r such that C(r )= A k k and r is either ρ(N) or a terminal vertex of N. We now characterize sets that are minimal closed in 2-terminal net- works. Theorem 7.3 Suppose that N is a 2-terminal network on X and A ⊆ X with |A|≥ 2. Then the following assertions are equivalent. (i) A is minimal closed in N. (ii) A is minimal closed in T (N). (iii) A is a minimal sink subset in the closure digraph D(T (N)). Proof. (i) ⇐⇒ (ii): This is a direct consequence of Theorem 5.1. (ii) ⇒ (iii): Suppose that A is a minimal closed set in T := T (N). Then, by Proposition 6.2, A is a sink subset in D(T ). Assume for contra- diction that A is not a minimal sink subset in D(T ). Then there exists a minimal sink subset B ⊆ X in D(T ) with B A. By Proposition 7.2, B must be closed in N. Hence, by Theorem 5.1, B is also a closed in T . Thus, B = A by the minimality of A which is impossible. (iii) ⇒ (ii):Put T := T (N) and suppose that A is a minimal sink subset in D(T ). Then, by Proposition 7.2, A is closed in N. Assume for contradiction that A is not minimal closed in N. Then there exists some B A that is minimal closed in N. By the equivalence of Assertions (i) and (ii) in Theorem 7.3, B must be a minimal closed set in T . Hence, |B|≥ 2 by the deﬁnition of a minimal closed set of N. By Proposition 6.2, 229 Hierarchies from Phylogenetic Networks B is a sink subset in D(T ). Thus, A = B by the minimality of A which is impossible. To illustrate the last theorem, consider the network N on X = {1, 2, ... , 8} in Figure 1. Then A := {7, 8} is minimal closed in N,and A is also a minimal sink subset in the closure digraph D(T (N)) (see Figure 4). On the other hand, A := {1, 2, 3, 4, 5, 6} is a sink subset in the closure digraph D(T (N)) but it is not a minimal sink subset because {5, 6} is a sink subset. Hence Theorem 7.3 implies that A is not minimal closed in N. Indeed, A is not even a closed set in N. Since a level-2 (and hence also a level-1) network is necessarily a 2- terminal network, Theorem 7.3 can be viewed as a signiﬁcant generalization of a result presented in Oldman, Wu, van Iersel and Moulton (2016, The- orem 1 in the Appendix), which characterizes minimal sink subsets in the closure digraph induced by level-1 networks using minimal cut-arc sets. 8. Conclusions and Future Directions In this paper we have introduced the concept of a closed set in a phy- logenetic network. We have seen that these sets provide a natural way to extend the notion of SN-sets for binary networks to general networks, and that the closed sets of a network are closely related to the triplets and trinets that it displays. In Theorem 7.3, we showed that we can characterize the closed sets of a 2-terminal network in terms of minimal sink subsets of the closure digraph associated to the triplets displayed by the network. It would be interesting to know whether or not this result also holds for networks in general, although this appears to be quite difﬁcult to decide. In addition, it could also be of interest to better understand properties of 2-terminal networks (or more generally, k-terminal networks, k ≥ 1, which can be deﬁned in the obvious way). For example, are 2-terminal networks deﬁned by their trinets? Note that level-2 networks enjoy this property (van Iersel and Moulton 2014). In general, a phylogenetic network is not determined by its trinets (even if it is binary) (Huber, van Iersel, Moulton and Wu 2015). However, by Theorem 5.2 it follows that the cut arc hierarchy H (N) can be con- CA structed from the trinets of a phylogenetic network N. It would be interest- ing to know whether or not the cut vertex hierarchy H (N) or the related CV hierarchy H (N) can also be reconstructed from trinets. More generally, CV it could be useful to understand which other features of networks are deter- minedbytheir trinets. In this paper we have concentrated on theoretical properties of closed sets. However, there are associated algorithmic questions that are also of K.T. Huber, V. Moulton, and T. Wu interest. For example, note that combined with an algorithm similar to the one presented in Jansson and Sung (2004, Figure 4), Lemma 5.5 can be used to compute, for any dense triplet system R on a set X, the associated family of SN-sets for R in O(|X| ) time. However, it would be interesting to know whether there may be a more efﬁcient algorithm for computing closed sets along the lines of the one presented in Jansson et al. (2006) for computing SN-sets. This might also use results presented in Fischer and Huson (2010) for computing lowest stable ancestors. Solutions to these sorts of problems should eventually lead to new algorithms for computing phylogenetic networks. One possible approach to develop such an algorithm could be to use Theorem 7.3 as a basis for computing level-2 networks (or more generally 2-terminal networks). This might follow the approach that was used in Oldman et al. (2016) to construct level-1 networks in a bottom up fashion from level-1 trinets. In particular, ﬁrst a dense set of level-2 trinets would be computed from biological data and then, using the closure digraph of this set, a minimal sink subset would be found. For this subset a simple level-2 network could then be derived, and the subset replaced by a single element in such a way that this whole process could be repeated. However, various problems would need to be overcome to make this approach work. For example, new methods need to be developed to associate level-2 trinets to biological data, and robust ways need to be found for combining level-2 trinets into level-2 networks. References CARDONA, G., LLABRES, M., ROSSELLO, F., and VALIENTE, G. (2011), “Comparison of Galled Trees,” IEEE/ACM Transactions on Computational Biology and Bioinfor- matics, 8, 410–427. DRESS, A., MOULTON, V., STEEL, M., and WU, T. (2010), “Species, Clusters and the ’Tree of life’: A Graph-Theoretic Perspective,” Journal of Theoretical Biology, 265, 535–542. FISCHER, J., and HUSON, D. (2010), “New Common Ancestor Problems in Trees and Directed Acyclic Graphs,” Information Processing Letters, 110, 331–335. GUSFIELD, D. (2014), ReCombinatorics: The Algorithmics of Ancestral Recombination Graphs and Explicit Phylogenetic Networks, MIT Press. HUBER, K.T., and MOULTON, V. (2012), “Encoding and Constructing 1-Nested Phyloge- netic Networks with Trinets,” Algorithmica, 616, 714–738. HUBER, K.T., VAN IERSEL, L., MOULTON, V., SCORNAVACCA, C., and WU, T. (2017), “Reconstructing Phylogenetic Level-1 Networks from Nondense Binet and Trinet Sets,” Algorithmica, 77, 173–200. HUBER, K.T., VAN IERSEL, L., MOULTON, V., and WU, T. (2015), “How Much In- formation is Needed to Infer Reticulate Evolutionary Histories,” Systematic Biology, 64, 102–111. 231 Hierarchies from Phylogenetic Networks HUBER, K.T., VAN IERSEL, L., KELK, S., and SUCHECKI, R. (2011), “A Practical Algo- rithm for Reconstructing Level-1 Phylogenetic Networks,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8, 635–649. HUSON, D.H., RUPP, R., and SCORNAVACCA, C. (2010), Phylogenetic Networks: Con- cepts, Algorithms and Applications, Cambridge University Press. JANSSON, J., NGUYEN, N., and SUNG, W.-K. (2006), “Algorithms for Combining Rooted Triplets into a Galled Phylogenetic Network,” SIAM Journal of Computing, 35, 1098– JANSSON, J., and SUNG, W.-K. (2006), “Inferring a Level-1 Phylogenetic Network from a Dense Set of Rooted Triplets,” Theoretical Computer Science, 363, 60–68. JETTEN, L., and VAN IERSEL, L. (2016), “Nonbinary Tree-Based Phylogenetic Networks,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, in press. LOVASZ, L., and PLUMMER, M.D. (1986), Matching Theory (Vol. 121, North-Holland Mathematics Studies), Elsevier Science Ltd. NAKHLEH, L. (2011), “Evolutionary Phylogenetic Networks: Models and Issues,” in Problem Solving Handbook in Computational Biology and Bioinformatics, Springer, pp. 125–158. OLDMAN, J., WU, T., VAN IERSEL, L., and MOULTON, V. (2016), “Trilonet: Piecing To- gether Small Networks to Reconstruct Reticulate Evolutionary Histories,” Molecular Biology and Evolution, 33, 2151–2162. SEMPLE, C., and STEEL, M. (2003), Phylogenetics, Oxford University Press. TO, T.-H., and HABIB, M. (2009), “Level-k Phylogenetic Networks are Constructable from a Dense Triplet Set in Polynomial Time”, in Annual Symposium on Combinatorial Pattern Matching, Springer, pp. 275–288. VAN IERSEL, L., KEIJSPER, J., KELK, S., STOUGIE, L., HAGEN, F., and BOEKHOUT, T. (2009), “Constructing Level-2 Phylogenetic Networks from Triplets,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6, 667–681. VAN IERSEL, L., and KELK, S. (2011), “Constructing the Simplest Possible Phylogenetic Network from Triplets,” Algorithmica, 60, 207–235. VAN IERSEL, L., and MOULTON, V. (2014), “Trinets Encode Tree-Child and Level-2 Phy- logenetic Networks,” Journal of Mathematical Biology, 68, 1707–1729. VAN IERSEL, L., MOULTON, V., DE SWART, E., and WU, T. (2017), “Binets: Fundamen- tal Building Blocks for Phylogenetic Networks,” Bulletin of Mathematical Biology, 79, 1135–1154. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/ licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Journal of Classification – Springer Journals
Published: Nov 16, 2018
You can share this free article with as many people as you like with the url below! We hope you enjoy this feature!
Read and print from thousands of top scholarly journals.
Already have an account? Log in
Bookmark this article. You can see your Bookmarks on your DeepDyve Library.
To save an article, log in first, or sign up for a DeepDyve account if you don’t already have one.
Copy and paste the desired citation format or use the link below to download a file formatted for EndNote
Access the full text.
Sign up today, get DeepDyve free for 14 days.
All DeepDyve websites use cookies to improve your online experience. They were placed on your computer when you launched this website. You can change your cookie settings through your browser.