First-Order Theory of Rewriting for Linear Variable-Separated Rewrite Systems: Automation, Formalization, Certification

The first-order theory of rewriting is decidable for linear variable-separated rewrite systems. We present a new decision procedure which is the basis of FORT, a decision and synthesis tool for properties expressible in the theory. The decision procedure is based on tree automata techniques and verified in Isabelle. Several extensions make the theory more expressive and FORT more versatile. We present a certificate language that enables the output of FORT to be certified by the certifier FORTify generated from the formalization, and we provide extensive experiments.

Keywords Term rewriting · First-order theory · Tree automata · Formalization

1 Introduction

Many properties of rewrite systems can be expressed as logical formulas in the first-order theory of rewriting. This theory is decidable for the class of linear variable-separated rewrite systems, which includes all ground rewrite systems. The decision procedure is based on tree automata techniques and goes back to Dauchet and Tison [10]. It is implemented in FORT [46, 48], which takes as input one or more rewrite systems $\mathcal{R}_0, \mathcal{R}_1, \dots$ and a formula $\varphi$, and determines whether the rewrite systems satisfy the property expressed by $\varphi$, reporting yes or no accordingly. FORT may not reach a conclusion due to limited resources.

For properties related to confluence and termination, designated competitions (CoCo [41], termCOMP [23]) of software tools take place regularly. Occasionally, yes/no conflicts appear. Since the participating tools typically couple a plethora of techniques with sophisticated search strategies, human inspection of the output of tools to determine the correct answer is often not feasible. Hence certified categories were created in which tools must output a formal certificate. This certificate is verified by CeTA [53], an automatically generated Haskell program using the code generation feature of Isabelle.
This requires not only that the underlying techniques are formalized in Isabelle, but also that the formalization is executable so that code generation applies. During the time-consuming formalization process, mistakes in papers are sometimes brought to light. An additional outcome is that formalization efforts may give rise to simpler and more efficient constructions and algorithms.

Aart Middeldorp (aart.middeldorp@uibk.ac.at), Department of Computer Science, University of Innsbruck, Innsbruck, Austria

Since 2017 we have been concerned with the question of how to ensure the correctness of the answers produced by FORT. The certifier CeTA supports a great many techniques for establishing concrete properties like termination and confluence, but the formalizations in the underlying Isabelle Formalization of Rewriting (IsaFoR) are orthogonal to the ones required for supporting the decision procedure underlying FORT. We present a certificate language which is rich enough to express the various automata operations in decision procedures for the first-order theory of rewriting as well as numerous predicate symbols that may appear in formulas in this theory. FORTify, the verified Haskell program obtained from the Isabelle formalization, validates certificates in this language.

The decision procedure implemented in FORT and formalized in Isabelle is based on three different tree automata models. We use standard bottom-up tree automata to represent various sets of ground terms. For (most) binary relations on ground terms, we use anchored ground tree transducers. These are a simplification of the ground tree transducers used in the literature [8–10, 12, 18] with better closure properties, reducing the number of constructions needed to represent the first-order theory of rewriting. Some of these closure properties are proved (and formalized) using the simple but equivalent class of pair automata.
The third model consists of standard tree automata operating on a different signature in order to represent $n$-ary relations on ground terms, for arbitrary $n$ (including $n = 2$).

In the next section we present the basic definitions. Section 3 introduces the first-order theory of rewriting. In Sect. 4 we introduce in a systematic way several context closure operations on binary relations that are used to represent the binary predicates in the first-order theory of rewriting. Detailed proofs of the various results concerning the three tree automata models that are required for the decision procedure are presented in Sect. 5. Many of the results and tree automata constructions in this section are well-known, but are included for completeness and because the implementation in FORT and the subsequent formalization are directly based on them. Tree automata operate on ground terms. In Sect. 6 we present the formalized signature extension results that allow us to reduce certain properties on arbitrary terms to properties on ground terms. In Sect. 7 the decision and synthesis modes of FORT are described, and a new undecidability proof related to the latter is presented. We also discuss the representation of formulas in certificates and the certificate language, and we explain how certificates are validated by FORTify, the verified Haskell program obtained from the Isabelle formalization. Experimental results are presented in Sect. 8, before we conclude in Sect. 9. In an appendix the input syntax and the interface of the tools are presented.

The formalization is based on Isabelle/HOL. Our contribution is split into three parts, which are published as separate entries in the Archive of Formal Proofs. The first part [35] contains general results about bottom-up tree automata, ported from IsaFoR, extended with constructions and results about anchored ground tree transducers, pair automata, and regular relation automata.
The second part [33] formalizes primitive constructions needed to decide the first-order theory of rewriting. Moreover, it connects the logical semantic entailment of first-order formulas to regular tree languages. This connection gives rise to a natural description of the decision procedure. The specification allows tool authors to generate certificates (which can be viewed as a formal proof claim using appropriate automata constructions for the corresponding logical connectives and predicates). We rely on the code generation facility of Isabelle/HOL to obtain the certifier FORTify that is able to verify the integrity of such certificates. The third part [32] is independent, and covers the results in Sect. 6. (IsaFoR is available at http://cl-informatik.uibk.ac.at/isafor/ and the Archive of Formal Proofs at https://www.isa-afp.org.)

The formalization can be accessed via the following links:

• https://www.isa-afp.org/entries/Regular_Tree_Relations.html
• https://www.isa-afp.org/entries/FO_Theory_Rewriting.html
• https://www.isa-afp.org/entries/Rewrite_Properties_Reduction.html

Most definitions, theorems, and lemmata in this paper directly correspond to the formalization. These are marked by a dedicated symbol, which links to an HTML rendering of our formalization, for those who like to dive right into the actual Isabelle code. In the running text (traditional) proof details are given.

This article combines and extends earlier papers that appeared in conference and informal workshop proceedings. These cover system descriptions of earlier versions of FORT [46, 48], formalization and certification aspects [22, 34, 36, 42], as well as results for dealing with properties on non-ground terms [37, 38, 47]. Many new examples to illustrate the various constructions were added and the presentation is self-contained. The efficiency improvements described in Sect. 7 are new. The same is true for the undecidability result in Sect. 7.5.
Also, several of the experiments that we present in Sect. 8 have not been described before.

2 Preliminaries

In this preliminary section we recall basic definitions and notations of term rewriting [3] and tree automata [8].

2.1 Term Rewriting

We assume a finite signature $\mathcal{F}$ containing at least one constant symbol and a disjoint set of variables $\mathcal{V}$. The set of terms built up from $\mathcal{F}$ and $\mathcal{V}$ is denoted by $\mathcal{T}(\mathcal{F}, \mathcal{V})$, while $\mathcal{T}(\mathcal{F})$ denotes the (non-empty) set of ground terms. The set of variables occurring in a term $t$ is denoted by $\mathcal{V}\mathsf{ar}(t)$. A term is linear if it does not contain multiple occurrences of the same variable. Positions are strings of positive integers which are used to address subterms. The set of positions in a term $t$ is denoted by $\mathcal{P}\mathsf{os}(t)$ and the root position by $\varepsilon$. The function symbol at position $p \in \mathcal{P}\mathsf{os}(t)$ is denoted by $t(p)$, and $t[u]_p$ denotes the result of replacing the subterm $t|_p$ of $t$ at position $p$ by the term $u$. The height $\mathsf{height}(t)$ of a term $t$ is the length of a longest position in $\mathcal{P}\mathsf{os}(t)$. A substitution is a mapping $\sigma$ from variables to terms and $t\sigma$ denotes the result of applying $\sigma$ to a term $t$. A context $C$ is a term that contains exactly one hole, denoted by the special constant $\square \notin \mathcal{F}$. We write $C[t]$ for the result of replacing the hole in $C$ by the term $t$.

A term rewrite system (TRS) $\mathcal{R}$ is a set of rules $\ell \to r$ between terms $\ell, r \in \mathcal{T}(\mathcal{F}, \mathcal{V})$. A TRS $\mathcal{R}$ is linear if its rewrite rules consist of linear terms. We call $\mathcal{R}$ variable-separated if $\mathcal{V}\mathsf{ar}(\ell) \cap \mathcal{V}\mathsf{ar}(r) = \varnothing$ for every $\ell \to r \in \mathcal{R}$. In this paper we are concerned with finite, linear, variable-separated TRSs $\mathcal{R}$ and we (mostly) consider rewriting on ground terms: $t \to_{\mathcal{R}} u$ for ground terms $t, u$ if there exist a context $C$, a rewrite rule $\ell \to r \in \mathcal{R}$, and a substitution $\sigma$ such that $t = C[\ell\sigma]$ and $u = C[r\sigma]$. We write $\to_{\mathcal{R}}^*$ for the reflexive and transitive closure of $\to_{\mathcal{R}}$. Further relations on terms will be introduced in the next section. We drop the subscript $\mathcal{R}$ when it can be inferred from the context.
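The definition of a ground rewrite step can be made concrete with a small sketch. The term representation (nested tuples, variables as plain strings) and the function names `match`, `successors`, and `is_normal_form` are illustrative assumptions, not part of FORT or the formalization. The rules are those of the paper's leading example (Example 1). The sketch only handles rules whose right-hand sides are ground, a simplifying assumption: in a variable-separated TRS a variable on the right-hand side may be instantiated by any ground term.

```python
# Terms are nested tuples (symbol, arg_1, ..., arg_n); variables are plain strings.
# Hypothetical helper names; only rules with ground right-hand sides are supported.

def match(pattern, term, subst):
    """Match a linear pattern against a ground term.

    Returns the extended substitution, or None if the match fails."""
    if isinstance(pattern, str):
        # A variable; by linearity it cannot have been bound before.
        return {**subst, pattern: term}
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p_arg, t_arg in zip(pattern[1:], term[1:]):
        subst = match(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst


def successors(term, rules):
    """All ground terms u with term -> u in one step."""
    result = []
    for lhs, rhs in rules:
        if match(lhs, term, {}) is not None:
            result.append(rhs)  # root step; rhs is ground, nothing to instantiate
    for i in range(1, len(term)):  # steps inside the i-th argument
        for new_arg in successors(term[i], rules):
            result.append(term[:i] + (new_arg,) + term[i + 1:])
    return result


def is_normal_form(term, rules):
    """A ground term is a normal form if it has no successor."""
    return not successors(term, rules)


A, B = ("a",), ("b",)
# a -> b,  f(a) -> b,  g(a, x) -> f(a)
RULES = [
    (A, B),
    (("f", A), B),
    (("g", A, "x"), ("f", A)),
]

start = ("f", ("g", A, B))                 # f(g(a, b))
step1 = ("f", ("f", A))                    # f(f(a))
step2 = ("f", B)                           # f(b)
print(step1 in successors(start, RULES))   # True
print(step2 in successors(step1, RULES))   # True
print(is_normal_form(step2, RULES))        # True
```

The enumeration is exponential in general and serves only to illustrate the definition; the decision procedure in the paper works with tree automata instead.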
A ground normal form is a ground term $t$ such that $t \to_{\mathcal{R}} u$ for no term $u$. We write $\mathsf{NF}(\mathcal{R})$ for the set of ground normal forms of $\mathcal{R}$.

Example 1 We use the TRS $\mathcal{R}$ consisting of the rewrite rules

$$\mathsf{a} \to \mathsf{b} \qquad \mathsf{f}(\mathsf{a}) \to \mathsf{b} \qquad \mathsf{g}(\mathsf{a}, x) \to \mathsf{f}(\mathsf{a})$$

over the signature $\mathcal{F} = \{\mathsf{a}, \mathsf{b}, \mathsf{f}, \mathsf{g}\}$ as leading example in this paper. We have $\mathsf{f}(\mathsf{g}(\mathsf{a}, \mathsf{b})) \to_{\mathcal{R}} \mathsf{f}(\mathsf{f}(\mathsf{a})) \to_{\mathcal{R}} \mathsf{f}(\mathsf{b})$ with ground normal form $\mathsf{f}(\mathsf{b})$.

2.2 Tree Automata

A (finite bottom-up) tree automaton $\mathcal{A} = (\mathcal{F}, Q, Q_f, \Delta)$ consists of a finite signature $\mathcal{F}$, a finite set $Q$ of states, disjoint from $\mathcal{F}$, a subset $Q_f \subseteq Q$ of final states, and a set of transition rules $\Delta$. Every transition rule has one of the following two shapes:

• $f(p_1, \dots, p_n) \to q$ with $f \in \mathcal{F}$ and $p_1, \dots, p_n, q \in Q$, or
• $p \to q$ with $p, q \in Q$.

Transition rules of the second shape are called $\varepsilon$-transitions. Transition rules can be viewed as rewrite rules between ground terms in $\mathcal{T}(\mathcal{F} \cup Q, \mathcal{V})$. The induced rewrite relation is denoted by $\to_\Delta$ or $\to_{\mathcal{A}}$. A ground term $t \in \mathcal{T}(\mathcal{F})$ is accepted by $\mathcal{A}$ if $t \to_{\mathcal{A}}^* q$ for some $q \in Q_f$. The set of all accepted terms is denoted by $L(\mathcal{A})$ and a set $L$ of ground terms is regular if $L = L(\mathcal{A})$ for some tree automaton $\mathcal{A}$. A tree automaton $\mathcal{A}$ is deterministic if there are no $\varepsilon$-transitions and no two transition rules with the same left-hand side. We say that $\mathcal{A}$ is completely defined if it contains a transition rule with left-hand side $f(p_1, \dots, p_n)$ for every $n$-ary function symbol $f$ and every combination $p_1, \dots, p_n$ of states. All regular sets are accepted by a completely defined, deterministic tree automaton. The class of regular sets is effectively closed under Boolean operations. Moreover, membership and emptiness are decidable.

For relations on ground terms two different types of automata are used. The first one is restricted to binary relations. A ground tree transducer (GTT for short) is a pair $\mathcal{G} = (\mathcal{A}, \mathcal{B})$ of tree automata over the same signature $\mathcal{F}$. Let $s$ and $t$ be ground terms in $\mathcal{T}(\mathcal{F})$.
We say that the pair $(s, t)$ is accepted by $\mathcal{G}$ if $s \to_{\mathcal{A}}^* u \mathrel{{}_{\mathcal{B}}^{*}{\leftarrow}} t$ for some term $u \in \mathcal{T}(\mathcal{F} \cup Q)$. Here $Q$ is the combined set of states of $\mathcal{A}$ and $\mathcal{B}$. The set of all such pairs is denoted by $L(\mathcal{G})$. Observe that $L(\mathcal{G})$ is a binary relation on $\mathcal{T}(\mathcal{F})$. A binary relation $\bowtie$ on ground terms is a GTT relation if there exists a GTT $\mathcal{G}$ such that ${\bowtie} = L(\mathcal{G})$.

In FORT we deal with anchored GTTs, which are GTTs with a different acceptance condition: A pair $(s, t)$ of ground terms is accepted by an anchored GTT $\mathcal{G}$ if $s \to_{\mathcal{A}}^* q \mathrel{{}_{\mathcal{B}}^{*}{\leftarrow}} t$ for some (common) state $q$. The set of all such pairs is denoted by $L_a(\mathcal{G})$. It can be shown that the resulting language class coincides with the binary class $\mathit{Rec}_\times$, which is defined in [8, Sect. 3.2.1] as the class of finite unions of Cartesian products of regular sets. The more operational view above benefits the developments described in subsequent sections. We obviously have $L_a(\mathcal{G}) \subseteq L(\mathcal{G})$. Anchored GTT relations have the advantage that they can represent the root-step relation $\to_\varepsilon$, which is not possible with GTT relations as the latter are always reflexive. Moreover, they have better closure properties than GTT relations. When we speak of "anchored GTTs", we always have $L_a(\mathcal{G})$ in mind.

The second method for representing relations on ground terms uses standard tree automata operating on an encoding of the relation as a set of ground terms over a special signature. For a signature $\mathcal{F}$ and $n \geqslant 0$ we let $\mathcal{F}^{(n)} = (\mathcal{F} \cup \{\bot\})^n$. Here, $\bot \notin \mathcal{F}$ is a fresh constant. The arity of a symbol $f_1 \cdots f_n \in \mathcal{F}^{(n)}$ is the maximum of the arities of $f_1, \dots, f_n$, and $0$ if $n = 0$. Given $n$ terms $t_1, \dots, t_n \in \mathcal{T}(\mathcal{F})$, the term $\langle t_1, \dots, t_n \rangle$ is the unique term $u \in \mathcal{T}(\mathcal{F}^{(n)})$ such that $\mathcal{P}\mathsf{os}(u) = \mathcal{P}\mathsf{os}(t_1) \cup \cdots \cup \mathcal{P}\mathsf{os}(t_n)$ and $u(p) = f_1 \cdots f_n$ where $f_i = t_i(p)$ if $p \in \mathcal{P}\mathsf{os}(t_i)$ and $f_i = \bot$ otherwise, for all positions $p \in \mathcal{P}\mathsf{os}(u)$. If $n = 0$ then $\mathcal{P}\mathsf{os}(u) = \{\varepsilon\}$ and $u(\varepsilon)$ is the empty sequence.
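The encoding $\langle t_1, \dots, t_n \rangle$ can be computed position by position. The following is a minimal sketch under stated assumptions: terms are nested tuples `(symbol, arg_1, ..., arg_n)`, a symbol of $\mathcal{F}^{(n)}$ is a Python tuple of $n$ strings, and the names `symbol_at`, `encode`, and `show` are hypothetical helpers introduced only for illustration.

```python
BOT = "⊥"  # the fresh padding constant


def symbol_at(term, pos):
    """Symbol of `term` at position `pos`, or None if `pos` is not a position of `term`."""
    for i in pos:
        if i >= len(term):   # term has fewer than i arguments
            return None
        term = term[i]
    return term[0]


def encode(terms):
    """Build <t_1, ..., t_n> over the signature of n-tuples of symbols."""
    def build(pos):
        # Label: the symbols of all t_i at this position, padded with BOT.
        label = tuple(symbol_at(t, pos) or BOT for t in terms)
        children = []
        i = 1
        # The node has as many children as the widest t_i at this position.
        while any(symbol_at(t, pos + (i,)) is not None for t in terms):
            children.append(build(pos + (i,)))
            i += 1
        return (label,) + tuple(children)
    return build(())


def show(term):
    """Render an encoded term in the compact notation used in the text."""
    head = "".join(term[0])
    if len(term) == 1:
        return head
    return head + "(" + ", ".join(show(arg) for arg in term[1:]) + ")"


a, b = ("a",), ("b",)
print(show(encode([("g", a, ("f", b)), ("f", a)])))
print(show(encode([a, ("f", ("f", b)), ("g", b, a)])))
```

For $n = 0$ the function returns a single node labeled with the empty tuple, matching the convention that $u(\varepsilon)$ is the empty sequence.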
Example 2 For $\mathcal{F} = \{\mathsf{a}, \mathsf{b}, \mathsf{f}, \mathsf{g}\}$ in Example 1 we have

$$\langle \mathsf{g}(\mathsf{a}, \mathsf{f}(\mathsf{b})), \mathsf{f}(\mathsf{a}) \rangle = \mathsf{gf}(\mathsf{aa}, \mathsf{f}{\bot}(\mathsf{b}{\bot})) \in \mathcal{T}(\mathcal{F}^{(2)})$$
$$\langle \mathsf{a}, \mathsf{f}(\mathsf{f}(\mathsf{b})), \mathsf{g}(\mathsf{b}, \mathsf{a}) \rangle = \mathsf{afg}({\bot}\mathsf{fb}({\bot}\mathsf{b}{\bot}), {\bot}{\bot}\mathsf{a}) \in \mathcal{T}(\mathcal{F}^{(3)})$$

An $n$-ary relation $R$ on $\mathcal{T}(\mathcal{F})$ is regular if its encoding $\langle R \rangle = \{ \langle t_1, \dots, t_n \rangle \mid (t_1, \dots, t_n) \in R \}$ is regular. The class of all $n$-ary regular relations is denoted by RR$_n$. Every (anchored) GTT relation belongs to RR$_2$. The well-known construction (presented later in the proof of Theorem 10) is used to decide membership for anchored GTT relations.

3 First-Order Theory of Rewriting

We consider first-order logic over a language $\mathcal{L}$ without function symbols. The language contains the following binary predicate symbols:

$$\to \qquad \to^* \qquad =$$

Further predicate symbols will be added to $\mathcal{L}$ later in this paper. As models we consider finite linear variable-separated TRSs $(\mathcal{F}, \mathcal{R})$ such that the set of ground terms $\mathcal{T}(\mathcal{F})$ is non-empty, which is equivalent to the requirement that the signature $\mathcal{F}$ contains at least one constant symbol. The set of ground terms serves as domain for the variables in formulas over $\mathcal{L}$. The interpretation of the predicate symbol $\to$ in $(\mathcal{F}, \mathcal{R})$ is the one-step rewrite relation $\to_{\mathcal{R}}$ over $\mathcal{T}(\mathcal{F})$, $\to^*$ denotes its transitive-reflexive closure, and $=$ is interpreted as equality on ground terms.

Variable-separated TRSs appear naturally when approximating TRSs that satisfy the usual variable restriction ($\mathcal{V}\mathsf{ar}(r) \subseteq \mathcal{V}\mathsf{ar}(\ell)$ for every rewrite rule $\ell \to r$), to achieve regularity of the set of reachable terms starting from a regular set of ground terms. The support for linear variable-separated TRSs opens up the possibility of using FORT to compute dependency graphs based on the non-variable approximation for termination analysis [40], check infeasibility of conditional critical pairs for confluence analysis of conditional TRSs [51], and compute needed redexes based on the strong and non-variable approximations for the analysis of optimal normalizing strategies [18].
The following example gives an idea of the decision procedure for the first-order theory of rewriting. It shows how (closure) operations on tree automata and GTTs are used to obtain tree automata, each of which represents the tuples of ground terms satisfying a subformula of the formula of interest. These operations are presented in Sect. 5 together with correctness proofs that have been formalized.

Example 3 Consider the formula

$$\varphi = \forall s\, \exists t\, (s \to^* t \land \neg \exists u\, (t \to u))$$

which expresses the normalization property of TRSs. To determine whether a given linear variable-separated TRS $\mathcal{R}$ over a signature $\mathcal{F}$ satisfies $\varphi$, we construct automata for the subformulas of the formula in a bottom-up fashion. We start with an RR$_1$ automaton $\mathcal{A}_1$ that accepts the ground normal forms in $\mathcal{T}(\mathcal{F})$, using an algorithm first described in [6] and covered in Sect. 5.4:

RR$_1$ $\mathcal{A}_1$: $L(\mathcal{A}_1) = \{ t \mid t \in \mathsf{NF}(\mathcal{R}) \}$ (Theorem 15)

Here $t \in \mathsf{NF}(\mathcal{R})$ stands for $\neg \exists u\, (t \to u)$. Next we construct an anchored GTT $\mathcal{G}_1$ accepting the root-step relation of $\mathcal{R}$:

GTT $\mathcal{G}_1$: $L_a(\mathcal{G}_1) = \{ (s, t) \mid s \to_\varepsilon t \}$ (Theorem 4)

Using a modified transitive closure operation, we obtain an anchored GTT $\mathcal{G}_2$:

GTT $\mathcal{G}_2$: $L_a(\mathcal{G}_2) = \{ (s, t) \mid s \to^* \cdot \to_\varepsilon \cdot \to^* t \}$ (Theorem 8)

Since anchored GTT relations are also RR$_2$ relations we can construct an equivalent RR$_2$ automaton $\mathcal{A}_2$:

RR$_2$ $\mathcal{A}_2$: $L(\mathcal{A}_2) = \{ \langle s, t \rangle \mid s \to^* \cdot \to_\varepsilon \cdot \to^* t \}$ (Theorem 10)

Using a special context closure operation, we obtain an RR$_2$ automaton $\mathcal{A}_3$ accepting the encoding of $\to^*$:

RR$_2$ $\mathcal{A}_3$: $L(\mathcal{A}_3) = \{ \langle s, t \rangle \mid s \to^* t \}$ (Theorem 11)

Before the conjunction in $s \to^* t \land t \in \mathsf{NF}(\mathcal{R})$ can be constructed, the arities of the RR$_2$ automaton $\mathcal{A}_3$ and the RR$_1$ automaton $\mathcal{A}_1$ have to match. With this goal $\mathcal{A}_1$ is cylindrified ($\mathsf{C}_1$) to construct the RR$_2$ automaton $\mathcal{A}_4$.
Here care has to be taken that not only the arities match, but also that terms, taking the place of variables shared by both formulas, are at the same position $i$ in the encoding $\langle t_1, \dots, t_i, \dots, t_n \rangle$ of both automata:

RR$_2$ $\mathcal{A}_4$: $L(\mathcal{A}_4) = \{ \langle s, t \rangle \mid t \in \mathsf{NF}(\mathcal{R}) \}$ (Theorem 14)

After this, the intersection with $\mathcal{A}_3$ results in the RR$_2$ automaton $\mathcal{A}_5$ that models the conjunction:

RR$_2$ $\mathcal{A}_5$: $L(\mathcal{A}_5) = \{ \langle s, t \rangle \mid s \to^* t \land t \in \mathsf{NF}(\mathcal{R}) \}$ (Theorem 12)

Applying the second projection ($\Pi_2$, which removes the second component) produces the RR$_1$ automaton $\mathcal{A}_6$:

RR$_1$ $\mathcal{A}_6$: $L(\mathcal{A}_6) = \{ s \mid \exists t\, (s \to^* t \land t \in \mathsf{NF}(\mathcal{R})) \}$ (Theorem 14)

At this point $\varphi$ holds if and only if $L(\mathcal{A}_6) = \mathcal{T}(\mathcal{F})$. In FORT the $\forall$ quantifier is transformed into the equivalent $\neg\exists\neg$. Hence complementation is used to obtain an RR$_1$ automaton $\mathcal{A}_7$

RR$_1$ $\mathcal{A}_7$: $L(\mathcal{A}_7) = \{ s \mid \neg \exists t\, (s \to^* t \land t \in \mathsf{NF}(\mathcal{R})) \}$ (Theorem 13)

and the existential quantifier is implemented using projection. This gives an RR$_0$ automaton $\mathcal{A}_8$ which either accepts the empty relation $\varnothing$ or the singleton set $\{ () \}$ consisting of the nullary tuple $()$. The outermost negation gives rise to another complementation step. The final RR$_0$ automaton $\mathcal{A}_9$ is tested for emptiness: $L(\mathcal{A}_9) = \varnothing$ if and only if the TRS $\mathcal{R}$ does not satisfy $\varphi$.

Fig. 1 Automata operations for the predicates in the first-order theory of rewriting

In order to express termination in the first-order theory of rewriting, we extend $\mathcal{L}$ with the binary predicate symbol $\to^+$ (which denotes the transitive closure of $\to$) and the unary predicate defined below (which goes back to a technical report by Dauchet and Tison [11]).

Definition 1 Let $\bowtie$ be an arbitrary binary relation on $\mathcal{T}(\mathcal{F})$. We write $\mathsf{INF}_{\bowtie}$ for the set $\{ t \in \mathcal{T}(\mathcal{F}) \mid t \bowtie u \text{ for infinitely many terms } u \in \mathcal{T}(\mathcal{F}) \}$.

If we instantiate $\mathsf{INF}_{\bowtie}$ by taking ${\bowtie} = {\to^*}$, we obtain the predicate $\mathsf{INF}_{\to^*}$ that is satisfied by ground terms that have infinitely many reducts.
By forbidding cycles, we obtain the formula $\neg \exists t\, (\mathsf{INF}_{\to^*}(t) \lor t \to^+ t)$ that expresses termination of finite variable-separated TRSs.

The grammar in Fig. 1 lists the formalized (closure) operations for the predicates in the first-order theory of rewriting. Here $A$ are anchored GTT relations, $R$ are RR$_2$ relations, and $T$ are regular sets of ground terms. Some of the operations will be introduced in subsequent sections. The TRS $\mathcal{R}$ enters the picture in three places. First of all, $\to_\varepsilon$ is the root-step relation of $\mathcal{R}$. Secondly, $\mathsf{NF}$ denotes the set of ground normal forms of $\mathcal{R}$. Finally, $\mathcal{T}(\mathcal{F})$ denotes the set of ground terms, which depends on the signature $\mathcal{F}$ of $\mathcal{R}$. Every atomic subformula (predicate) will be represented as an RR$_1$ or RR$_2$ relation. The logical structure of formulas in the first-order theory of rewriting is taken care of by additional closure operations on RR relations.

4 Context Operations

In the next section we describe formalized automata constructions to decide the first-order theory of rewriting. To save considerable formalization efforts, we introduce a few primitives that operate on binary relations that are accepted by various kinds of tree automata. These primitives are sufficient to generate all binary rewrite relations supported by FORT. For defining the semantics of the primitives, we introduce some context operations on binary relations in this section.

Definition 2 Let $\mathcal{F}$ be a signature. A multi-hole context is an element of $\mathcal{T}(\mathcal{F} \uplus \{\square\})$ where $\square$ is a fresh constant symbol, called hole. If $C$ is a multi-hole context with $n \geqslant 0$ holes and $t_1, \dots, t_n$ are terms in $\mathcal{T}(\mathcal{F})$ then $C[t_1, \dots, t_n]$ denotes the term in $\mathcal{T}(\mathcal{F})$ obtained from $C$ by replacing the holes from left to right with $t_1, \dots, t_n$. We write $\mathcal{C}$ for the set of all multi-hole contexts.
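Given a binary relation $\bowtie$ on ground terms in $\mathcal{T}(\mathcal{F})$ and a set of multi-hole contexts $\mathcal{D} \subseteq \mathcal{C}$, we write $\mathcal{D}(\bowtie)$ for the relation

$$\{ (C[t_1, \dots, t_n], C[u_1, \dots, u_n]) \mid C \in \mathcal{D} \text{ has } n \text{ holes and } t_i \bowtie u_i \text{ for all } 1 \leqslant i \leqslant n \}.$$

We consider two ways to restrict multi-hole contexts: restricting the number of holes and restricting the position of the holes.

• We denote the set of multi-hole contexts with exactly one hole by $\mathcal{C}^1$. The set of multi-hole contexts with at least one hole is denoted by $\mathcal{C}^>$. Moreover $\mathcal{C}^{\geqslant}$ simply denotes $\mathcal{C}$.
• We denote the set of multi-hole contexts with the property that every hole occurs below the root position by $\mathcal{C}_>$. This includes the set $\mathcal{T}(\mathcal{F})$ of ground terms (which are multi-hole contexts without holes). Similarly, $\mathcal{C}_\varepsilon$ denotes the set of multi-hole contexts with the property that every hole occurs at the root position. So $\mathcal{C}_\varepsilon = \{\square\} \cup \mathcal{T}(\mathcal{F})$. Moreover, $\mathcal{C}_{\geqslant}$ simply denotes $\mathcal{C}$.

By combining both types of restrictions, we obtain nine ways for defining new binary relations.

Definition 3 Let $\bowtie$ be a binary relation on $\mathcal{T}(\mathcal{F})$. Given a number constraint $n \in \{ {\geqslant}, 1, {>} \}$ and a position constraint $p \in \{ {\geqslant}, \varepsilon, {>} \}$, we define the binary relation $\bowtie_p^n$ on $\mathcal{T}(\mathcal{F})$ as $(\mathcal{C}^n \cap \mathcal{C}_p)(\bowtie)$.

Note that ${\bowtie_\varepsilon^1} = {\bowtie_\varepsilon^>} = {\bowtie}$ and ${\bowtie_\varepsilon^{\geqslant}} = {\bowtie^=}$, for any $\bowtie$. Here ${\bowtie^=} = {\bowtie} \cup \{=\}$ denotes the reflexive closure of $\bowtie$.

Example 4 Recall the TRS $\mathcal{R}$ from our leading example and consider the multi-hole contexts

$$C_1 = \square \qquad C_2 = \mathsf{f}(\square) \qquad C_3 = \mathsf{g}(\square, \mathsf{a}) \qquad C_4 = \mathsf{g}(\square, \square) \qquad C_5 = \mathsf{f}(\mathsf{a})$$

We have $C_1, C_2, C_3 \in \mathcal{C}^1$, $C_1, C_2, C_3, C_4 \in \mathcal{C}^>$, $C_1, C_5 \in \mathcal{C}_\varepsilon$, and $C_2, C_3, C_4, C_5 \in \mathcal{C}_>$. Moreover, $(C_2[\mathsf{a}], C_2[\mathsf{b}]) \in (\to_{\mathcal{R}})_>^1$ and $(C_4[\mathsf{a}, \mathsf{a}], C_4[\mathsf{b}, \mathsf{b}]) \notin (\to_{\mathcal{R}})_>^1$.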
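Because $\mathcal{C}^{\geqslant} = \mathcal{C}_{\geqslant} = \mathcal{C}$, the relation $\bowtie_{\geqslant}^{\geqslant}$ is the multi-hole context closure of $\bowtie$. Using the root-step relation $\to_\varepsilon$ induced by a linear, variable-separated TRS $\mathcal{R}$ as $\bowtie$, we obtain eight different relations for $(\to_\varepsilon)_p^n$:

$$\begin{array}{lll}
(\to_\varepsilon)_{\geqslant}^{\geqslant} = {\xrightarrow{\parallel}} & (\to_\varepsilon)_{\geqslant}^{1} = {\to} & (\to_\varepsilon)_{\geqslant}^{>} = {\dot{\xrightarrow{\parallel}}} \\
(\to_\varepsilon)_{\varepsilon}^{\geqslant} = {\to_\varepsilon^=} & (\to_\varepsilon)_{\varepsilon}^{1} = {\to_\varepsilon} & (\to_\varepsilon)_{\varepsilon}^{>} = {\to_\varepsilon} \\
(\to_\varepsilon)_{>}^{\geqslant} = {\xrightarrow{\parallel}_{>\varepsilon}} & (\to_\varepsilon)_{>}^{1} = {\to_{>\varepsilon}} & (\to_\varepsilon)_{>}^{>} = {\dot{\xrightarrow{\parallel}}_{>\varepsilon}}
\end{array}$$

Here $\xrightarrow{\parallel}$ denotes a parallel step (which is the multi-hole context closure of $\to_\varepsilon$), $\dot{\xrightarrow{\parallel}}$ a non-empty parallel step, $\xrightarrow{\parallel}_{>\varepsilon}$ a parallel step where only redexes below the root are contracted, and $\dot{\xrightarrow{\parallel}}_{>\varepsilon}$ a non-empty parallel step where only redexes below the root are contracted.

Example 5 Consider the term pairs $\pi_1 = (\mathsf{g}(\mathsf{a}, \mathsf{a}), \mathsf{g}(\mathsf{b}, \mathsf{b}))$, $\pi_2 = (\mathsf{g}(\mathsf{a}, \mathsf{a}), \mathsf{f}(\mathsf{a}))$, and $\pi_3 = (\mathsf{g}(\mathsf{a}, \mathsf{a}), \mathsf{g}(\mathsf{a}, \mathsf{a}))$. We have $\pi_1, \pi_2, \pi_3 \in {\xrightarrow{\parallel}}$, $\pi_1, \pi_2 \in {\dot{\xrightarrow{\parallel}}}$, $\pi_1 \in {\dot{\xrightarrow{\parallel}}_{>\varepsilon}}$, and $\pi_3 \in {\xrightarrow{\parallel}_{>\varepsilon}} \setminus {\dot{\xrightarrow{\parallel}}_{>\varepsilon}}$.

5 Formalized Tree Automata Constructions

In this section we present constructions on tree automata and (anchored) GTTs that are required for the decision procedure. Most of the results are known [8]. We give explicit proofs, providing detailed constructions that form the basis of the implementation of the decision procedure in FORT as well as the formalization in Isabelle.

Let $\mathcal{A} = (\mathcal{F}, Q, Q_f, \Delta)$ be a tree automaton. A state $q \in Q$ is reachable if $t \to_{\mathcal{A}}^* q$ for some term $t \in \mathcal{T}(\mathcal{F})$. We say that $q$ is productive if $C[q] \to_{\mathcal{A}}^* q_f$ for some ground context $C$ and final state $q_f \in Q_f$. The automaton $\mathcal{A}$ is trim if all states are both reachable and productive. Any tree automaton can be transformed into an equivalent trim automaton. This result has been formalized in IsaFoR by Felgenhauer and Thiemann [21]. The construction preserves determinism. The following results are well-known.

Lemma 1 ($T ::= \mathcal{T}(\mathcal{F})$) The set of ground terms over a finite signature $\mathcal{F}$ is regular.

Theorem 1 ($T ::= T \cup T \mid T \cap T \mid T^c$) The class of regular sets is effectively closed under union, intersection, and complement.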
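The notions of reachable and productive states introduced at the start of this section can be computed by two simple fixpoint iterations. The sketch below is only an illustration under stated assumptions and is not the construction formalized in IsaFoR: transition rules are stored in a dictionary mapping `(symbol, argument_states)` to a set of target states, $\varepsilon$-transitions are pairs of states, and the small automaton at the end is a hypothetical example that does not occur in the paper.

```python
def reachable_states(delta, eps):
    """States q such that some ground term t satisfies t ->* q."""
    reach = set()
    changed = True
    while changed:
        changed = False
        for (_, args), targets in delta.items():
            if all(p in reach for p in args) and not targets <= reach:
                reach |= targets
                changed = True
        for p, q in eps:
            if p in reach and q not in reach:
                reach.add(q)
                changed = True
    return reach


def productive_states(delta, eps, final):
    """States q such that C[q] ->* q_f for a ground context C and a final state q_f."""
    reach = reachable_states(delta, eps)
    prod = set(final)
    changed = True
    while changed:
        changed = False
        for (_, args), targets in delta.items():
            if not targets & prod:
                continue
            for i, p in enumerate(args):
                # The context must be ground, so all sibling states must be reachable.
                siblings = args[:i] + args[i + 1:]
                if p not in prod and all(s in reach for s in siblings):
                    prod.add(p)
                    changed = True
        for p, q in eps:
            if q in prod and p not in prod:
                prod.add(p)
                changed = True
    return prod


def trim(delta, eps, final):
    """Restrict the automaton to states that are both reachable and productive."""
    keep = reachable_states(delta, eps) & productive_states(delta, eps, final)
    new_delta = {}
    for (f, args), targets in delta.items():
        kept = targets & keep
        if kept and all(p in keep for p in args):
            new_delta[(f, args)] = kept
    new_eps = {(p, q) for p, q in eps if p in keep and q in keep}
    return keep, new_delta, new_eps, set(final) & keep


# Hypothetical automaton: a -> 0, a -> 3, f(0) -> 1, f(2) -> 1, final state 1.
# State 2 is unreachable and state 3 is not productive.
DELTA = {("a", ()): {0, 3}, ("f", (0,)): {1}, ("f", (2,)): {1}}
EPS = set()
FINAL = {1}
print(trim(DELTA, EPS, FINAL))
```

Both iterations terminate because each round either adds a state to a finite set or stops.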
Before we turn to the infinity predicate ($T ::= \mathsf{INF}_R$), we present an important closure operation on regular relations. Other closure operations will be presented in Sect. 5.3.

Definition 4 Let $R$ be an $n$-ary relation over $\mathcal{T}(\mathcal{F})$. If $n \geqslant 1$ and $1 \leqslant i \leqslant n$ then the $i$-th projection of $R$ is the relation $\Pi_i(R) = \{ (t_1, \dots, t_{i-1}, t_{i+1}, \dots, t_n) \mid (t_1, \dots, t_n) \in R \}$.

Note that $\Pi_1$ removes the first component of an RR$_n$ relation. So for a binary regular relation $R$, $\Pi_1(R)$ coincides with $\pi_2(R)$ in the grammar in Fig. 1.

Theorem 2 ($T ::= \pi_1(R) \mid \pi_2(R)$) The class of regular relations is effectively closed under projection.

Proof (construction) Let $\mathcal{A} = (\mathcal{F}^{(n)}, Q, Q_f, \Delta)$ be a tree automaton that accepts $\langle R \rangle$. Assume $n \geqslant 1$ and let $1 \leqslant i \leqslant n$. We construct a tree automaton that accepts $\langle \Pi_i(R) \rangle$. We assume that all states of $\mathcal{A}$ are reachable and define $\mathcal{A}_i = (\mathcal{F}^{(n-1)}, Q, Q_f, \Delta_i)$ where $\Delta_i$ is obtained from $\Delta$ by replacing every transition rule of the form

$$f_1 \cdots f_{i-1} f_i f_{i+1} \cdots f_n (p_1, \dots, p_m) \to q$$

with

$$f_1 \cdots f_{i-1} f_{i+1} \cdots f_n (p_1, \dots, p_k) \to q$$

provided $n = 1$ or $f_1 \cdots f_{i-1} f_{i+1} \cdots f_n \neq \bot^{n-1}$ for $n > 1$. Here $k \leqslant m$ is the arity of $f_1 \cdots f_{i-1} f_{i+1} \cdots f_n$. Epsilon transitions in $\Delta$ are not affected. Note that for $n = 1$ this results in an automaton over the signature containing only a single constant $()$ (the nullary tuple). The proof that $L(\mathcal{A}_i) = \langle \Pi_i(R) \rangle$ is given at the end of Sect. 5.3.

Example 6 Consider the tree automaton $\mathcal{A} = (\mathcal{F}^{(2)}, \{0, \dots, 6\}, \{6\}, \Delta)$ with $\mathcal{F} = \{\mathsf{a}, \mathsf{b}, \mathsf{f}, \mathsf{g}\}$ and $\Delta$ consisting of the transition rules

$$\begin{array}{llll}
\mathsf{aa} \to 0 & \mathsf{bb} \to 0 & \mathsf{gg}(0) \to 0 & \mathsf{ff}(0, 0) \to 0 \\
\mathsf{ab} \to 1 & \mathsf{bb} \to 1 & \mathsf{gb}(2) \to 1 & \mathsf{fb}(2, 2) \to 1 \\
\mathsf{a}{\bot} \to 2 & \mathsf{b}{\bot} \to 2 & \mathsf{g}{\bot}(2) \to 2 & \mathsf{f}{\bot}(2, 2) \to 2
\end{array}$$
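$$\begin{array}{llll}
\mathsf{a}{\bot} \to 3 & {\bot}\mathsf{b} \to 5 & \mathsf{fg}(1, 3) \to 6 & \mathsf{gf}(4, 5) \to 6 \\
\mathsf{aa} \to 4 & \mathsf{gg}(6) \to 6 & \mathsf{ff}(6, 0) \to 6 & \mathsf{ff}(0, 6) \to 6
\end{array}$$

This automaton accepts the encoding of $\to_{\mathcal{R}}$ on $\mathcal{T}(\mathcal{F})$ induced by the TRS $\mathcal{R}$ consisting of the rewrite rules

$$\mathsf{f}(x, \mathsf{a}) \to \mathsf{g}(\mathsf{b}) \qquad \mathsf{g}(\mathsf{a}) \to \mathsf{f}(\mathsf{a}, \mathsf{b})$$

For the first projection we obtain the automaton $\Pi_1(\mathcal{A})$ consisting of the transition rules

$$\begin{array}{llll}
\mathsf{a} \to 0 & \mathsf{b} \to 0 & \mathsf{g}(0) \to 0 & \mathsf{f}(0, 0) \to 0 \\
\mathsf{b} \to 1 & \mathsf{b} \to 5 & \mathsf{g}(1) \to 6 & \mathsf{f}(4, 5) \to 6 \\
\mathsf{a} \to 4 & \mathsf{g}(6) \to 6 & \mathsf{f}(6, 0) \to 6 & \mathsf{f}(0, 6) \to 6
\end{array}$$

Note that the third row of transitions in $\Delta$ disappeared completely. The rule $\mathsf{fg}(1, 3) \to 6$ is transformed into $\mathsf{g}(1) \to 6$, so state 3 is dropped. The second projection results in the automaton $\Pi_2(\mathcal{A})$ that accepts the reducible ground terms of $\mathcal{R}$:

$$\begin{array}{llll}
\mathsf{a} \to 0 & \mathsf{b} \to 0 & \mathsf{g}(0) \to 0 & \mathsf{f}(0, 0) \to 0 \\
\mathsf{a} \to 1 & \mathsf{b} \to 1 & \mathsf{g}(2) \to 1 & \mathsf{f}(2, 2) \to 1 \\
\mathsf{a} \to 2 & \mathsf{b} \to 2 & \mathsf{g}(2) \to 2 & \mathsf{f}(2, 2) \to 2 \\
\mathsf{a} \to 3 & & \mathsf{f}(1, 3) \to 6 & \mathsf{g}(4) \to 6 \\
\mathsf{a} \to 4 & \mathsf{g}(6) \to 6 & \mathsf{f}(6, 0) \to 6 & \mathsf{f}(0, 6) \to 6
\end{array}$$

We now present a formalized proof of a version of the pumping lemma that we need for the infinity predicate $\mathsf{INF}$ (in the proof of Theorem 3 below).

Lemma 2 Let $\mathcal{A} = (\mathcal{F}, Q, Q_f, \Delta)$ be a tree automaton and $t \to_{\mathcal{A}}^* q$ with $t \in \mathcal{T}(\mathcal{F})$ and $q \in Q$. If $\mathsf{height}(t) > |Q|$ then there exist contexts $C_1$ and $C_2 \neq \square$, a term $u$, and a state $p$ such that $t = C_1[C_2[u]]$, $u \to_{\mathcal{A}}^* p$, $C_2[p] \to_{\mathcal{A}}^* p$, and $C_1[p] \to_{\mathcal{A}}^* q$.

Proof From the assumptions $t \to_{\mathcal{A}}^* q$ and $\mathsf{height}(t) > |Q|$ we obtain a sequence

$$(t_1, \dots, t_{n+1}, q_1, \dots, q_{n+1}, D_1, \dots, D_n)$$

consisting of ground terms, states, and non-empty contexts with $n > |Q|$ such that

• $t_i \to_{\mathcal{A}}^* q_i$ for all $i \leqslant n + 1$,
• $D_i[t_i] = t_{i+1}$ and $D_i[q_i] \to_{\mathcal{A}}^* q_{i+1}$ for all $i \leqslant n$, and
• $q_{n+1} = q$ and $t_{n+1} = t$

by a straightforward induction proof on $t$. Because $n > |Q|$ there exist indices $1 \leqslant i < j \leqslant n$ such that $q_i = q_j$. We construct the contexts $C_1 = D_n[\dots[D_j]\dots]$ and $C_2 = D_{j-1}[\dots[D_i]\dots]$. Note that $C_2 \neq \square$ as $i < j$. We obtain $C_2[q_i] \to_{\mathcal{A}}^* q_j$ and $C_1[q_j] \to_{\mathcal{A}}^* q_{n+1}$ by induction on the difference $j - i$. By letting $p = q_i = q_j$ and $u = t_i$ we obtain the desired result.

5.1 Infinity Predicate

Below we show that $\mathsf{INF}_R$ is regular for every RR$_2$ relation $R$.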
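The following definition originates from [11] and plays an important role in the proof.

Definition 5 Given a tree automaton $\mathcal{A} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta)$, the set $Q_\infty \subseteq Q$ consists of all states $q \in Q$ such that $\langle \bot, t \rangle \to_{\mathcal{A}}^* q$ for infinitely many terms $t \in \mathcal{T}(\mathcal{F})$.

Example 7 Consider the binary relation

$$R = \{ (\mathsf{f}(\mathsf{a}, \mathsf{g}^n(\mathsf{b})), \mathsf{g}^m(\mathsf{f}(\mathsf{a}, \mathsf{b}))) \mid n = 2 \text{ and } m \geqslant 1 \text{ or } n \geqslant 3 \text{ and } m = 1 \}$$

over $\mathcal{T}(\mathcal{F})$ with $\mathcal{F} = \{\mathsf{a}, \mathsf{b}, \mathsf{f}, \mathsf{g}\}$. Its encoding $\langle R \rangle$ is accepted by the automaton $\mathcal{A} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta)$ with $Q = \{0, \dots, 11\}$, $Q_f = \{0\}$, and $\Delta$ consisting of the following transition rules:

$$\begin{array}{llll}
\mathsf{fg}(1, 2) \to 0 & {\bot}\mathsf{f}(3, 4) \to 5 & \mathsf{g}{\bot}(6) \to 2 & \mathsf{b}{\bot} \to 7 \\
\mathsf{fg}(8, 9) \to 0 & {\bot}\mathsf{g}(5) \to 5 & \mathsf{g}{\bot}(7) \to 6 & \mathsf{b}{\bot} \to 11 \\
\mathsf{af}(3, 4) \to 1 & {\bot}\mathsf{a} \to 3 & \mathsf{g}{\bot}(10) \to 9 & \mathsf{ag}(5) \to 1 \\
\mathsf{af}(3, 4) \to 8 & {\bot}\mathsf{b} \to 4 & \mathsf{g}{\bot}(11) \to 10 & \mathsf{g}{\bot}(11) \to 11
\end{array}$$

For instance,

$$\langle \mathsf{f}(\mathsf{a}, \mathsf{g}(\mathsf{g}(\mathsf{b}))), \mathsf{g}(\mathsf{f}(\mathsf{a}, \mathsf{b})) \rangle = \mathsf{fg}(\mathsf{af}({\bot}\mathsf{a}, {\bot}\mathsf{b}), \mathsf{g}{\bot}(\mathsf{g}{\bot}(\mathsf{b}{\bot}))) \to^* \mathsf{fg}(\mathsf{af}(3, 4), \mathsf{g}{\bot}(\mathsf{g}{\bot}(7))) \to^* \mathsf{fg}(1, \mathsf{g}{\bot}(6)) \to \mathsf{fg}(1, 2) \to 0$$

but $\langle \mathsf{f}(\mathsf{a}, \mathsf{g}(\mathsf{b})), \mathsf{f}(\mathsf{a}, \mathsf{b}) \rangle = \mathsf{ff}(\mathsf{aa}, \mathsf{gb}(\mathsf{b}{\bot}))$ is not accepted. We have $Q_\infty = \{5\}$. State 5 is reached by $\langle \bot, \mathsf{g}^n(\mathsf{f}(\mathsf{a}, \mathsf{b})) \rangle$ for all $n \geqslant 0$.

Definition 6 Given $\mathcal{A} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta)$, we define the tree automaton

$$\mathcal{A}_\infty = (\mathcal{F}^{(2)}, Q \cup \bar{Q}, \bar{Q}_f, \Delta \cup \bar{\Delta})$$

Here $\bar{Q}$ is a copy of $Q$ where every state is dashed: $\bar{q} \in \bar{Q}$ if and only if $q \in Q$. For every transition rule $fg(q_1, \dots, q_n) \to q \in \Delta$ we have the following transition rules in $\bar{\Delta}$:

$$fg(q_1, \dots, q_n) \to \bar{q} \quad \text{if } q \in Q_\infty \text{ and } f = \bot \qquad (1)$$
$$fg(q_1, \dots, q_{i-1}, \bar{q}_i, q_{i+1}, \dots, q_n) \to \bar{q} \quad \text{for all } 1 \leqslant i \leqslant n \qquad (2)$$

Moreover, for every $\varepsilon$-transition $p \to q \in \Delta$ we add

$$\bar{p} \to \bar{q} \qquad (3)$$

to $\bar{\Delta}$. We write $\Delta_\infty$ for $\Delta \cup \bar{\Delta}$.

Dashed states are created by rules of shape (1) and propagated by rules of shapes (2) and (3). The above construction differs from the one in [11]; instead of (1) the latter contains $fg(q_1, \dots, q_n) \to \bar{q}$ if $q_i \in Q_\infty$ for some $i > \mathsf{arity}(f)$. In an implementation, rather than adding all dashed states and all transition rules of shape (2), the necessary rules would be computed by propagating the dashes created by (1) in order to avoid the appearance of unreachable dashed states.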
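When $\mathcal{A}_\infty$ is used in isolation, a single bit suffices to record that a dashed state occurred during a computation.

Example 8 For the tree automaton $\mathcal{A}$ from Example 7 we obtain $\mathcal{A}_\infty$ by adding the following transition rules (the missing rules of shape (2) involve unreachable states):

$${\bot}\mathsf{f}(3, 4) \to \bar{5} \qquad {\bot}\mathsf{g}(5) \to \bar{5} \qquad {\bot}\mathsf{g}(\bar{5}) \to \bar{5} \qquad \mathsf{ag}(\bar{5}) \to \bar{1} \qquad \mathsf{fg}(\bar{1}, 2) \to \bar{0}$$

The unique final state of $\mathcal{A}_\infty$ is $\bar{0}$. Since $\bar{0}$ can only be reached via $\bar{1}$, and $\bar{1}$ only via the rule $\mathsf{ag}(\bar{5}) \to \bar{1}$, we have $\langle \mathsf{f}(\mathsf{a}, \mathsf{g}(\mathsf{g}(\mathsf{b}))), \mathsf{g}(\mathsf{g}(\mathsf{f}(\mathsf{a}, \mathsf{b}))) \rangle \in L(\mathcal{A}_\infty)$, but there is no term $u$ such that $\langle \mathsf{f}(\mathsf{a}, \mathsf{g}(\mathsf{b})), u \rangle \in L(\mathcal{A}_\infty)$.

The following preliminary lemma is used in the proof of the theorem below and provides a characterization of the ground terms that reduce to a dashed state.

Lemma 3 Let $t$ be a term in $\mathcal{T}(\mathcal{F}^{(2)})$. If $t \to_{\mathcal{A}_\infty}^* \bar{p}$ then there exist a state $q \in Q_\infty$, a context $C$, and a term $s$ such that $t = C[s]$, $\mathsf{root}(s) = {\bot}f$ with $f \in \mathcal{F}$, $s \to_{\mathcal{A}_\infty}^* \bar{q}$, and $C[\bar{q}] \to_{\mathcal{A}_\infty}^* \bar{p}$.

Proof Write $t = gf(t_1, \dots, t_n)$. We distinguish two cases, depending on when the dash is introduced in $t \to_{\mathcal{A}_\infty}^* \bar{p}$. In the first case the dash is created by a root step:

$$t \to^* gf(q_1, \dots, q_n) \to \bar{q} \to^* \bar{p}$$

We have $g = \bot$ and $q \in Q_\infty$ by (1). Hence we can take $s = t$ and $C = \square$. Note that $\mathsf{root}(s) = gf = {\bot}f$. In the second case the dash is created during the evaluation of an argument $t_i$ of $t$, and hence the given sequence $t \to_{\mathcal{A}_\infty}^* \bar{p}$ can be rearranged as

$$t \to_{\mathcal{A}_\infty}^* gf(t_1, \dots, \bar{r}, \dots, t_n) \to_{\mathcal{A}_\infty}^* \bar{p}$$

The induction hypothesis yields a state $q \in Q_\infty$, a context $C'$, and a term $s$ such that $t_i = C'[s]$, $\mathsf{root}(s) = {\bot}f$ with $f \in \mathcal{F}$, $s \to_{\mathcal{A}_\infty}^* \bar{q}$, and $C'[\bar{q}] \to_{\mathcal{A}_\infty}^* \bar{r}$. In this case we simply take $C = t[C']_i = gf(t_1, \dots, C', \dots, t_n)$. We have $t = t[t_i]_i = t[C'[s]]_i = C[s]$ and $C[\bar{q}] = gf(t_1, \dots, C'[\bar{q}], \dots, t_n) \to_{\mathcal{A}_\infty}^* gf(t_1, \dots, \bar{r}, \dots, t_n) \to_{\mathcal{A}_\infty}^* \bar{p}$.

The following result goes back to a technical report by Dauchet and Tison [11].

Theorem 3 ($T ::= \mathsf{INF}_R$) The set $\mathsf{INF}_R$ is regular for every RR$_2$ relation $R$.

Proof Let $\mathcal{A} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta)$ be a tree automaton that accepts $\langle R \rangle$. We show that $\mathsf{INF}_R = \Pi_2(L(\mathcal{A}_\infty))$.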
The regularity of INF then follows from Theorem 2. R 2 ∞ R First suppose t ∈ INF .So t , u ∈ L(A) for infinitely many terms u ∈ T (F ).Since the signature F is finite, there are only finitely many ground terms of any given height. Moreover, height( t , u ) = max (height(t ), height(u)). Hence there must exist a term u ∈ T (F ) with t , u ∈ L(A) such that height(t )+|Q|+ 1 < height(u). This is only possible if there are positions p and q such that p ∈ / Pos(t ), pq ∈ Pos(u),and |Q| < |q|. From Pos( t , u ) = Pos(t )∪ Pos(u) we obtain t , u | = ⊥, u| .Since t , u ∈ L(A) p p there exist states r ∈ Q and q ∈ Q such that f f ∗ ∗ t , u = t , u [ ⊥, u| ] → t , u [r] → q p p p f A A where we assume without loss of generality that the final step in the subsequence ⊥, u| → r uses a non-ε-transition rule. From |Q| < |q| and pq ∈ Pos(u) we infer |Q| < height( ⊥, u| ). Hence we can use the pumping lemma (Lemma 2) to conclude the existence of infinitely many terms v ∈ T (F ) such that ⊥,v → r. Hence r ∈ Q by Definition 5. Since the final step in ⊥, u| → r uses a non-ε-transition rule, we obtain ⊥, u| → r¯ from the construction of A with a final application of a rule of shape (1). p ∞ ∗ ∗ ∗ We obtain t , u [r¯] → q¯ from t , u [r] → q . Hence t , u → q¯ and since p f p f f A A A ∞ ∞ q¯ ∈ Q , t , u ∈ L(A ) and thus t ∈  (L(A )). f f ∞ 2 ∞ Next suppose t ∈  (L(A )).So t , u ∈ L(A ) for some ground terms u. There exists 2 ∞ ∞ a final state q¯ ∈ Q with t , u → q¯ . Using Lemma 3, we obtain a context C,aterm s f f with root(s)=⊥ f for some f ∈ F,and astate q ∈ Q such that C[s]= t , u , s → q¯, and C[¯ q]→ q¯ .Let p be the position of the hole in C.From C[s]= t , u and root(s) = ⊥ f ,weinfer p ∈ Pos(u)\ Pos(t ).Since q ∈ Q the set { v ∈ T (F )| ⊥,v → q} is 123 First-Order Theory of Rewriting… Page 13 of 76 14 Fig. 2 Inference rules for computing Q infinite. Hence the set S={u[v] ∈ T (F )| ⊥,v → q} is infinite, too. Let u[w] ∈ S. 
p p ∗ ∗ ∗ So ⊥,w → q.Weobtain C[q]→ q from C[¯ q]→ q¯ by erasing all dashes. We f f A A A have C[w]= t , u[w] as p ∈ Pos(u)\Pos(t ). It follows that t , u[w] ∈ L(A) and thus p p there are infinitely many terms u such that t , u ∈ L(A).Since R = L(A) we conclude t ∈ INF as desired. Due to the definition of Q , the automaton A defined in Definition 6 is not executable. ∞ ∞ We present an equivalent but executable definition, which we name Q : Q ={q | p  p and p  q for some state p ∈ Q} Here the relation  is defined using the inference rules in Fig. 2. Intuitively, the first rule initializes the relation. Finding a cycle p  p ensures the existence of infinitely many terms ⊥, s that reduce to p. The other two rules are used to collapse cycles (and other non-empty sequences of ε-transitions) into single steps. Before proving that the two definitions are equivalent, we illustrate the definition of Q by revisiting Example 7. Example 9 We obtain 3  5and 4  5 by applying the first inference rule to the transition rule⊥f(3, 4) → 5. Similarly,⊥g(5) → 5gives rise to 5  5. Since A has no ε-transitions, no further inferences can be made. It follows that Q ={5}. We call a term in T ({⊥}× F ) right-only. A term in T (({⊥}× F )∪{ }) with exactly one occurrence of the hole  is a right-only context. Definition 7 We denote the composition of→ and→ by  . ¬ε ε ∗ ∗ The proof of the next lemma is straightforward. Note that the relations → and  do not coincide on mixed terms, involving function symbols and states. ∗ ∗ Lemma 4 Let C be a ground context. We have C[p]→ q if and only if p → p and C[p ]  q for some state p . ∗ ∗ Proof First we show t  q if t → q, for all ground terms t and states q.Weuse ∗ ∗ induction on t = f (t ,..., t ). The given derivation t → q may be written as t → 1 n ∗ ∗ f (q ,..., q ) → q → q. We obtain t  q for 1  i  n from the induction 1 n i i ¬ε hypothesis. Clearly, f (q ,..., q )  q and hence t  q as desired. 1 n Next we prove the statement of the lemma. 
The if direction is trivial. For the only-if direction we use induction on the ground context C.Let C[p]→ q.If C =  then we take p = q. Suppose C = f (t ,..., C ,..., t ). We may write the derivation C[p]→ q as 1 n ∗  ∗ t → f (q ,..., q ) → q → q. The induction hypothesis yields a state p such that 1 n ¬ε ∗    ∗ ∗ p → p and C [p ]  q and we obtain t  q for j = i from the first part of the i j j proof. We have f (q ,..., q )  q and hence C[p]= f (t ,..., C [p ],..., t )  q. 1 n 1 n Lemma 5 Q ⊆ Q 123 14 Page 14 of 76 A. Middeldorp et al. Proof We start by proving the following claim: if C[p]  q and C is a non-empty right-only context then p  q (4) We use induction on the structure of C.If C =  there is nothing to show. Suppose C = ⊥ f (t ,..., C ,..., t ) where C is the i-th subterm of C. The sequence C[p]  q can be 1 n ∗  ∗ rearranged as C[p]=⊥ f (t ,..., C [p],..., t )  ⊥ f (q ,..., q ) → q → q.We 1 n 1 n obtain q  q and subsequently q  q by using the inference rules in Fig. 2.If C = i i then p = q and if C =  then the induction hypothesis yields p  q and thus p  q by i i transitivity. This concludes the proof of (4). Assume q ∈ Q , so there exist infinitely many terms t such that ⊥, t → q.Since the signature is finite, there exist terms of arbitrary height. Thus there exists an arbitrary but fixed term t such that the height of t is greater than the number of states of Q. Write t = f (t ,..., t ). Since the height of t is greater than the number of the states in Q,there 1 n exist a subterm s of t,astate p, and contexts C and C =  such that 1 2 1. ⊥, t = C [C [ ⊥, s ]], 1 2 2. ⊥, s → p, 3. C [p]→ p,and 4. C [p]→ q. ∗   ∗ From Lemma 4 we obtain a state q such that p → q and C [q ]  p. Hence q  p by (4). We obtain q  q from q  p in connection with the inference rule for ε-transitions. We perform a case analysis of the context C . • If C =  then p → q and thus q  q follows from q  p in connection with the inference rule for ε-transitions. Hence q ∈ Q . 
∗   ∗ • If C =  then Lemma 4 yields a state q such that p → q and C [q ]  q. 1 1 Hence q  q by (4). We also have C [q ]  q and thus q  q by (4). We obtain q  q from the transitivity rule. Hence also in this case we obtain q ∈ Q . For the following lemma, we need the fact that A can be assumed to be trim, so every state is productive and reachable. We may do so because Theorem 3 talks about regular relations, and any automaton that accepts the same language as A will witness the fact that the given relation R is regular. Lemma 6 Q ⊆ Q , provided that A is trim. Proof In connection with the fact that A accepts R ⊆ T (F )× T (F ), trimness of A entails ∗ ∗ ∗ that any run t → q is embedded into an accepting run C[t]→ C[q]→ q ∈ Q .So f f C[t]= u,v for some (u,v) ∈ R, and hence t must be a well-formed term. Moreover, if root(t )=⊥ f for some f ∈ F then t = ⊥, u for some term u ∈ T (F ).Wenow show the converse of claim (4) in the proof of Lemma 5 for the relation→ : if p  q then C[p]→ q for some ground right-only context C =  (5) We prove the claim by induction on the derivation of p  q. First suppose p  q is derived from the transition rule ⊥ f (p ,..., p ,..., p ) → q in  with p = p. Because 1 i n i all states are reachable by well-formed terms, there exist terms t ,..., t ∈ T (F ) such that 1 n ⊥, t → p for all 1  i  n.Let C =⊥ f ( ⊥, t ,..., ,..., ⊥, t ) where the hole i 1 1 n is the i-th argument. We have C [p]→ ⊥ f (p ,..., p ,..., p ) → q. Next suppose 1 1 i n p  q is derived from p  q and q → q. The induction hypothesis yields a ground ∗  ∗ right-only context C =  such that C[p]→ q . Hence also C[p]→ q. Finally, suppose 123 First-Order Theory of Rewriting… Page 15 of 76 14 p  q is derived from p  r and r  q. The induction hypothesis yields non-empty ground ∗ ∗ ∗ right-only contexts C and C such that C [p]→ r and C [r]→ q. Hence C[p]→ q 1 2 1 2 for the context C = C [C ]. This concludes the proof of (5). 2 1 Now let q ∈ Q . 
So there exists a state p such that p  p and p  q.Using (5), we obtain non-empty ground right-only contexts C and C such that C [p]→ p and 1 2 1 ∗ (2) C [p]→ q. Since all states are reachable, there exists a ground term t ∈ T (F ) such that ∗ ∗ t → p. Hence C [t]→ q and, by the observation made at the beginning of the proof, C [t] is a well-formed term. Since C is right-only, it follows that t = ⊥, u for some term 2 2 u ∈ T (F ). Now consider the infinitely many terms t = C [C [t]] for n  0. We have n 2 t → q and t is right-only by construction. Hence q ∈ Q . n n ∞ Corollary 1 If A is trim then Q = Q . 5.2 Anchored GTT Relations Next we turn our attention to formalized constructions on (anchored) GTTs. Many of the results and automata constructions in this subsection are known. In the formalization we also employ an equivalent but more flexible definition of anchored GTT. Definition 8 A pair automaton is a triple P = (Q, A, B) where A, B are tree automata and ∗ ∗ Q ⊆ Q × Q .Wedefine L(P)={ (s, t ) | s → p and t → q with (p, q) ∈ Q}. A B A B Lemma 7 Anchored GTTs and pair automata are equivalent. Proof If G = (A, B) is a GTT then L (G) = L(P) for the pair automaton P = (Q, A, B) with Q ={ (p, p) | p ∈ Q ∩ Q }. Conversely, given a pair automaton P = (Q, A, B), A B we first rename the states of B to obtain an equivalent tree automaton B such that A and B do not share states. We add an ε-transition p → q to A for every (p, q) ∈ Q, resulting in the tree automaton A .Here q is the (renamed) state in B that corresponds to state q in B. The GTT G = (A , B ) satisfies L (G) = L(P). The above lemma will be used in the sequel without mention. Lemma 8 (A ::= T × T ) If T and U are regular sets of ground terms then T × Uis an anchored GTT relation. Proof Let A = (F , Q , Q , ) and B = (F , Q , Q , ) be tree automata that A fA A B fB B accept T and U.The set T × U is accepted by the pair automaton P = (Q, A, B) with Q = Q × Q . 
fA fB There are several ways to associate a GTT G = (A, B) with a linear variable-separated TRS R. The one in [9] uses for each rewrite rule  → r of R a unique interface state i, common to A and B, and transition rules and states specific to A (B) that accept all ground instances of  (r) in state i. No states are shared between different rewrite rules. The resulting GTT accepts−→ and→ when viewed as an anchored GTT. The second way to associate a GTT with a linear variable-separated TRS R originates from Dauchet et al. [12]. The resulting GTT accepts a relation in between−→ and→ . The construction that we formalized can be seen as a pair automaton version of the construction in [9]. Theorem 4 [A ::= → ] The relation → is an anchored GTT relation for every linear ε ε variable-separated TRS R. 123 14 Page 16 of 76 A. Middeldorp et al. Proof Let R be a linear variable-separated TRS over a signature F. We denote the set of left-hand (right-hand) sides of the rules in R by lhs(R) (rhs(R)). Given a set of terms T,we write s  T if s is a subterm of some term in T . Given a term s we write sˆ for the ground term obtained from s by replacing each variable with a designated (fresh) constant∗.Let Q be the set of states t for each t  lhs(R)∪ rhs(R).The set  consists of the transitions lhs f ( t ,..., t ) →  f (t ,..., t ) 1 n 1 n for every f (t ,..., t )  lhs(R) and, if some term in lhs(R) contains a variable, 1 n f ( ∗ ,..., ∗ ) → ∗ for every f ∈ F.The set  is defined similarly, using rhs(R) instead of lhs(R) for rhs generating the rules. We now define P = (Q, , ) with Q ={ (  , ˆr ) |  → r ∈ lhs rhs R}. It is easy to prove that L (P )=→ . a R ε The other binary relations associated with a TRS R (like−→ and↔ ) will be obtained from the root-step relation → by automata constructions that operate on anchored GTT relations and RR relations. 
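The construction in the proof of Theorem 4 can be sketched in Python as follows. The sketch rests on several assumptions that are not in the text: terms are nested tuples with variables as plain strings, states are identified with the hatted terms themselves, the names `hat`, `side_rules` and `root_step_pair_automaton` are hypothetical, and the three rewrite rules in `TRS` are a guess at the leading TRS, read off from the state pairs listed in Example 10.

```python
STAR = "*"  # designated fresh constant that replaces variables


def is_var(t):
    """Variables are plain strings; function applications are tuples."""
    return isinstance(t, str)


def hat(t):
    """Ground term obtained from t by replacing each variable with STAR."""
    if is_var(t):
        return (STAR,)
    return (t[0],) + tuple(hat(a) for a in t[1:])


def subterms(t):
    """Yield all subterms of t, including t itself."""
    yield t
    if not is_var(t):
        for a in t[1:]:
            yield from subterms(a)


def side_rules(terms, signature):
    """Transitions for one side (all left-hand or all right-hand sides).

    Each rule is (symbol, argument states, target state).
    """
    rules = set()
    has_var = False
    for t in terms:
        for s in subterms(t):
            if is_var(s):
                has_var = True
            else:
                rules.add((s[0], tuple(hat(a) for a in s[1:]), hat(s)))
    if has_var:
        # Rules that let the state for STAR accept every ground term.
        for f, arity in signature.items():
            rules.add((f, ((STAR,),) * arity, (STAR,)))
    return rules


def root_step_pair_automaton(trs, signature):
    """Return (Q, lhs transitions, rhs transitions) for the root-step relation."""
    q = {(hat(l), hat(r)) for l, r in trs}
    delta_lhs = side_rules([l for l, _ in trs], signature)
    delta_rhs = side_rules([r for _, r in trs], signature)
    return q, delta_lhs, delta_rhs


# Assumed TRS: a -> b, f(a) -> b, g(a, x) -> f(a).
SIGNATURE = {"a": 0, "b": 0, "f": 1, "g": 2}
TRS = [
    (("a",), ("b",)),
    (("f", ("a",)), ("b",)),
    (("g", ("a",), "x"), ("f", ("a",))),
]

if __name__ == "__main__":
    q, delta_lhs, delta_rhs = root_step_pair_automaton(TRS, SIGNATURE)
    print(len(q), len(delta_lhs), len(delta_rhs))
```

Linearity and variable-separation are what make the single state for the fresh constant sufficient: no variable has to be matched consistently across positions or across the two sides of a rule.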
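Example 10 The pair automaton $P_{\mathcal{R}} = (Q, \mathcal{A}, \mathcal{B})$ constructed in the above proof consists of the transition rules
$$\begin{array}{ll}
\mathcal{A}\colon & \mathsf{a} \to \langle * \rangle \quad \mathsf{b} \to \langle * \rangle \quad \mathsf{f}(\langle * \rangle) \to \langle * \rangle \quad \mathsf{g}(\langle * \rangle, \langle * \rangle) \to \langle * \rangle \\
 & \mathsf{a} \to \langle \mathsf{a} \rangle \quad \mathsf{f}(\langle \mathsf{a} \rangle) \to \langle \mathsf{f}(\mathsf{a}) \rangle \quad \mathsf{g}(\langle \mathsf{a} \rangle, \langle * \rangle) \to \langle \mathsf{g}(\mathsf{a}, *) \rangle \\
\mathcal{B}\colon & \mathsf{a} \to \langle \mathsf{a} \rangle \quad \mathsf{b} \to \langle \mathsf{b} \rangle \quad \mathsf{f}(\langle \mathsf{a} \rangle) \to \langle \mathsf{f}(\mathsf{a}) \rangle \\
Q\colon & (\langle \mathsf{a} \rangle, \langle \mathsf{b} \rangle) \quad (\langle \mathsf{f}(\mathsf{a}) \rangle, \langle \mathsf{b} \rangle) \quad (\langle \mathsf{g}(\mathsf{a}, *) \rangle, \langle \mathsf{f}(\mathsf{a}) \rangle)
\end{array}$$
and accepts the root-step relation $\to_\varepsilon$ of our leading TRS $\mathcal{R}$. The state pairs in $Q$ are presented as ε-transitions and perform the transfer from left-hand sides to right-hand sides of $\mathcal{R}$. For instance, $\mathsf{g}(\mathsf{a}, \mathsf{f}(\mathsf{f}(\mathsf{b}))) \to_\varepsilon \mathsf{f}(\mathsf{a})$ is witnessed by
$$\mathsf{g}(\mathsf{a}, \mathsf{f}(\mathsf{f}(\mathsf{b}))) \to_{\mathcal{A}}^{*} \langle \mathsf{g}(\mathsf{a}, *) \rangle \to \langle \mathsf{f}(\mathsf{a}) \rangle \;{}_{\mathcal{B}}^{*}{\leftarrow}\; \mathsf{f}(\mathsf{a})$$
To shorten the notation in subsequent examples, we number the states as follows:
$$0 = \langle * \rangle \qquad 1 = \langle \mathsf{a} \rangle \qquad 2 = \langle \mathsf{f}(\mathsf{a}) \rangle \qquad 3 = \langle \mathsf{g}(\mathsf{a}, *) \rangle \qquad 4 = \langle \mathsf{b} \rangle$$
Hence the transition rules are presented as follows:
$$\begin{array}{ll}
\mathcal{A}\colon & \mathsf{a} \to 0 \quad \mathsf{b} \to 0 \quad \mathsf{f}(0) \to 0 \quad \mathsf{g}(0, 0) \to 0 \quad \mathsf{a} \to 1 \quad \mathsf{f}(1) \to 2 \quad \mathsf{g}(1, 0) \to 3 \\
\mathcal{B}\colon & \mathsf{a} \to 1 \quad \mathsf{b} \to 4 \quad \mathsf{f}(1) \to 2 \\
Q\colon & (1, 4) \quad (2, 4) \quad (3, 2)
\end{array}$$
To turn $P_{\mathcal{R}}$ into an equivalent anchored GTT $G_{\mathcal{R}} = (\mathcal{A}', \mathcal{B}')$ we rename states 1 and 2 in $\mathcal{B}$ into 5 and 6 and add the pairs in $Q$ as ε-transitions to $\mathcal{A}$, after applying the renaming to their targets:
$$\begin{array}{ll}
\mathcal{A}'\colon & \mathsf{a} \to 0 \quad \mathsf{b} \to 0 \quad \mathsf{f}(0) \to 0 \quad \mathsf{g}(0, 0) \to 0 \quad \mathsf{a} \to 1 \quad \mathsf{f}(1) \to 2 \quad \mathsf{g}(1, 0) \to 3 \\
 & 1 \to 4 \quad 2 \to 4 \quad 3 \to 6 \\
\mathcal{B}'\colon & \mathsf{a} \to 5 \quad \mathsf{b} \to 4 \quad \mathsf{f}(5) \to 6
\end{array}$$
Next we turn to composition and transitive closure.

Fig. 3 $\Delta_\varepsilon(\mathcal{A}, \mathcal{B})$

Definition 9 Given tree automata $\mathcal{A}$ and $\mathcal{B}$, $\Delta_\varepsilon(\mathcal{A}, \mathcal{B})$ is the set of ε-transitions $\rightsquigarrow$ defined by the inference rules in Fig. 3.

The inference rule [c] appeared first in [17]. Since there are only finitely many ε-transitions between states in $Q$, $\Delta_\varepsilon(\mathcal{A}, \mathcal{B})$ can be effectively computed. The next result provides a useful equivalent characterization (which is presented as a definition in [8, 12]).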
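The effective computation claimed in Definition 9 amounts to a fixpoint iteration, which can be sketched in Python. The reading of the three inference rules used here is an assumption reconstructed from the way they are applied in the surrounding proofs: [c] relates the targets of two rules with the same root symbol whose arguments are already related, while [a] and [b] follow an ε-transition of the first or second automaton. The function name `delta_eps` and the tuple representation are hypothetical; the data is the anchored GTT obtained at the end of Example 10.

```python
def delta_eps(rules_a, eps_a, rules_b, eps_b):
    """Saturate the set of state pairs (p, q) under rules [c], [a] and [b].

    rules_x: list of (symbol, argument states, target state)
    eps_x:   list of epsilon-transitions (source, target)
    """
    pairs = set()
    while True:
        new = set()
        # [c]: same root symbol, arguments pairwise related.
        for f, ps, p in rules_a:
            for g, qs, q in rules_b:
                if f == g and len(ps) == len(qs) and all(
                    (pi, qi) in pairs for pi, qi in zip(ps, qs)
                ):
                    new.add((p, q))
        for p, q in pairs:
            # [a]: follow an epsilon-transition of the first automaton.
            new.update((p2, q) for p1, p2 in eps_a if p1 == p)
            # [b]: follow an epsilon-transition of the second automaton.
            new.update((p, q2) for q1, q2 in eps_b if q1 == q)
        if new <= pairs:
            return pairs
        pairs |= new


# Anchored GTT from the end of Example 10 (states renamed as described there).
RULES_A = [
    ("a", (), 0), ("b", (), 0), ("f", (0,), 0), ("g", (0, 0), 0),
    ("a", (), 1), ("f", (1,), 2), ("g", (1, 0), 3),
]
EPS_A = [(1, 4), (2, 4), (3, 6)]
RULES_B = [("a", (), 5), ("b", (), 4), ("f", (5,), 6)]
EPS_B = []

if __name__ == "__main__":
    print(sorted(delta_eps(RULES_A, EPS_A, RULES_B, EPS_B)))
```

Termination follows because the set of pairs only grows and is bounded by the product of the two finite state sets.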
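Example 11 For the (anchored) GTT $G$ of Example 10, which will be referred to as $G = (\mathcal{A}, \mathcal{B})$ in the following, the set $\Delta_\varepsilon(\mathcal{A}, \mathcal{B})$ consists of the following seven ε-transitions:
$$\begin{array}{lll}
0 \rightsquigarrow 5 & [c] & (0 \;{}_{\mathcal{A}}{\leftarrow}\; \mathsf{a} \to_{\mathcal{B}} 5) \\
1 \rightsquigarrow 5 & [c] & (1 \;{}_{\mathcal{A}}{\leftarrow}\; \mathsf{a} \to_{\mathcal{B}} 5) \\
0 \rightsquigarrow 4 & [c] & (0 \;{}_{\mathcal{A}}{\leftarrow}\; \mathsf{b} \to_{\mathcal{B}} 4) \\
0 \rightsquigarrow 6 & [c] & (0 \;{}_{\mathcal{A}}{\leftarrow}\; \mathsf{f}(0) \rightsquigarrow \mathsf{f}(5) \to_{\mathcal{B}} 6) \\
2 \rightsquigarrow 6 & [c] & (2 \;{}_{\mathcal{A}}{\leftarrow}\; \mathsf{f}(1) \rightsquigarrow \mathsf{f}(5) \to_{\mathcal{B}} 6) \\
4 \rightsquigarrow 5 & [a] & (4 \;{}_{\mathcal{A}}{\leftarrow}\; 1 \rightsquigarrow 5) \\
4 \rightsquigarrow 6 & [a] & (4 \;{}_{\mathcal{A}}{\leftarrow}\; 2 \rightsquigarrow 6)
\end{array}$$
Since $\mathcal{B}$ does not contain ε-transitions, the inference rule [b] is not used here.

Lemma 9 If $\mathcal{A}$ and $\mathcal{B}$ are tree automata over a signature $\mathcal{F}$ then
$$\Delta_\varepsilon(\mathcal{A}, \mathcal{B}) = \{ p \rightsquigarrow q \mid p \;{}_{\mathcal{A}}^{*}{\leftarrow}\; t \to_{\mathcal{B}}^{*} q \text{ for some ground term } t \in \mathcal{T}(\mathcal{F}) \}$$

Proof First suppose there exists a ground term $t \in \mathcal{T}(\mathcal{F})$ with $p \;{}_{\mathcal{A}}^{*}{\leftarrow}\; t \to_{\mathcal{B}}^{*} q$ for states $p$ of $\mathcal{A}$ and $q$ of $\mathcal{B}$. We show $p \rightsquigarrow q$ by induction on $t = f(t_1, \dots, t_n)$. The sequence $t \to_{\mathcal{A}}^{*} p$ can be written as $t \to_{\mathcal{A}}^{*} f(p_1, \dots, p_n) \to_{\mathcal{A}} p' \to_{\mathcal{A}}^{*} p$ with states $p_1, \dots, p_n, p'$ of $\mathcal{A}$. Similarly, $t \to_{\mathcal{B}}^{*} f(q_1, \dots, q_n) \to_{\mathcal{B}} q' \to_{\mathcal{B}}^{*} q$ with states $q_1, \dots, q_n, q'$ of $\mathcal{B}$. We have $p_i \;{}_{\mathcal{A}}^{*}{\leftarrow}\; t_i \to_{\mathcal{B}}^{*} q_i$ and thus $p_i \rightsquigarrow q_i$ by the induction hypothesis, for $1 \leq i \leq n$. Hence we obtain $p' \rightsquigarrow q'$ by [c]. Repeated applications of the inference rules [a] and [b] in connection with $p' \to_{\mathcal{A}}^{*} p$ and $q' \to_{\mathcal{B}}^{*} q$ yield $p \rightsquigarrow q$. Hence $p \rightsquigarrow q \in \Delta_\varepsilon(\mathcal{A}, \mathcal{B})$ as desired.

Next suppose $p \rightsquigarrow q \in \Delta_\varepsilon(\mathcal{A}, \mathcal{B})$. We show the existence of a ground term $t \in \mathcal{T}(\mathcal{F})$ such that $p \;{}_{\mathcal{A}}^{*}{\leftarrow}\; t \to_{\mathcal{B}}^{*} q$ by induction on the derivation of $p \rightsquigarrow q$. In the base case [c] is used with $p \;{}_{\mathcal{A}}{\leftarrow}\; a$ and $a \to_{\mathcal{B}} q$ for some constant $a$ and hence we can take $t = a$. For the induction step we consider three cases, depending on which inference rule is used to derive $p \rightsquigarrow q$. First suppose [c] is used. So there exist transition rules $f(p_1, \dots, p_n) \to p$ in $\mathcal{A}$ and $f(q_1, \dots, q_n) \to q$ in $\mathcal{B}$ such that $p_i \rightsquigarrow q_i$ for $1 \leq i \leq n$. The induction hypothesis yields ground terms $t_1, \dots, t_n$ such that $p_i \;{}_{\mathcal{A}}^{*}{\leftarrow}\; t_i \to_{\mathcal{B}}^{*} q_i$ for $1 \leq i \leq n$. Hence $p \;{}_{\mathcal{A}}^{*}{\leftarrow}\; t \to_{\mathcal{B}}^{*} q$ for $t = f(t_1, \dots, t_n)$. Next suppose [a] is applied to derive $p \rightsquigarrow q$. So there exists a state $p'$ such that $p \;{}_{\mathcal{A}}{\leftarrow}\; p' \rightsquigarrow q$.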
The induction hypothesis yields a ground term $t \in \mathcal{T}(\mathcal{F})$ such that $p' \;{}_{\mathcal{A}}^{*}{\leftarrow}\; t \to_{\mathcal{B}}^{*} q$ and hence also $p \;{}_{\mathcal{A}}^{*}{\leftarrow}\; t \to_{\mathcal{B}}^{*} q$. The reasoning for [b] is the same.

Theorem 5 (A ::= A ∘ A) Anchored GTT relations are effectively closed under composition.

Fig. 4 $\Delta_+(P)$ for $P = (Q, \mathcal{A}, \mathcal{B})$

Proof Let $P_1 = (Q_1, \mathcal{A}_1, \mathcal{B}_1)$ and $P_2 = (Q_2, \mathcal{A}_2, \mathcal{B}_2)$ be pair automata (operating on terms over the same signature). We construct the pair automaton $P = (Q, \mathcal{A}_1, \mathcal{B}_2)$ with
$$Q = Q_1 \circ \Delta_\varepsilon(\mathcal{B}_1, \mathcal{A}_2) \circ Q_2$$
We claim that $L(P) = L(P_1) \circ L(P_2)$. First let $(s, t) \in L(P)$. We have $s \to_{\mathcal{A}_1}^{*} p$ and $t \to_{\mathcal{B}_2}^{*} q$ for some $(p, q) \in Q$. The definition of $Q$ yields states $p'$ and $q'$ such that $(p, p') \in Q_1$, $(p', q') \in \Delta_\varepsilon(\mathcal{B}_1, \mathcal{A}_2)$, and $(q', q) \in Q_2$. According to Lemma 9 there exists a ground term $u$ such that $u \to_{\mathcal{B}_1}^{*} p'$ and $u \to_{\mathcal{A}_2}^{*} q'$. Hence $(s, u) \in L(P_1)$ and $(u, t) \in L(P_2)$ and thus $(s, t) \in L(P_1) \circ L(P_2)$.

For the converse, let $(s, t) \in L(P_1) \circ L(P_2)$. So there exists a ground term $u$ such that $(s, u) \in L(P_1)$ and $(u, t) \in L(P_2)$. Hence there are pairs $(p_1, q_1) \in Q_1$ and $(p_2, q_2) \in Q_2$ such that $s \to_{\mathcal{A}_1}^{*} p_1$, $u \to_{\mathcal{B}_1}^{*} q_1$, $u \to_{\mathcal{A}_2}^{*} p_2$, and $t \to_{\mathcal{B}_2}^{*} q_2$. Lemma 9 yields $(q_1, p_2) \in \Delta_\varepsilon(\mathcal{B}_1, \mathcal{A}_2)$. Hence $(p_1, q_2) \in Q$ and thus $(s, t) \in L(P)$.

Example 12 We compose the pair automaton $P = (Q, \mathcal{A}, \mathcal{B})$ of Example 10 with itself. We have $\Delta_\varepsilon(\mathcal{B}, \mathcal{A}) = \Delta_\varepsilon(\mathcal{A}, \mathcal{B})^{-} = \{ (1, 0), (1, 1), (4, 0), (2, 2), (2, 0) \}$. Hence we obtain the pair automaton $P' = (Q', \mathcal{A}, \mathcal{B})$ with $Q' = Q \circ \Delta_\varepsilon(\mathcal{B}, \mathcal{A}) \circ Q = \{ (3, 4) \}$. We have $L(\mathcal{A}, 3) = \{ \mathsf{g}(\mathsf{a}, t) \mid t \in \mathcal{T}(\mathcal{F}) \}$ and $L(\mathcal{B}, 4) = \{ \mathsf{b} \}$. Hence, we obtain $L(P') = L(\mathcal{A}, 3) \times L(\mathcal{B}, 4) = {\to_\varepsilon} \circ {\to_\varepsilon}$ as expected.

Theorem 6 (A ::= A⁺) Anchored GTT relations are effectively closed under transitive closure.

Proof Let $P = (Q, \mathcal{A}, \mathcal{B})$ be a pair automaton. We construct the pair automaton $P_+ = (\Delta_+(P), \mathcal{A}, \mathcal{B})$ where $\Delta_+(P)$ is the binary relation on states defined by the inference rules in Fig. 4. We claim that $L(P_+) = L(P)^{+}$.
From the first inference rule we immediately obtain $L(P) \subseteq L(P_+)$. The second inference rule, together with the definition of $Q$ in the proof of Theorem 5, yields $L(P_+) \circ L(P_+) \subseteq L(P_+)$. Hence $L(P)^{+} \subseteq L(P_+)$.

For the converse, let $(s, t) \in L(P_+)$. So there exists a pair $(p, q) \in \Delta_+(P)$ such that $s \to_{\mathcal{A}}^{*} p$ and $t \to_{\mathcal{B}}^{*} q$. We prove $(s, t) \in L(P)^{+}$ by induction on the derivation of $(p, q) \in \Delta_+(P)$. If $(p, q) \in Q$ then $(s, t) \in L(P)$. Suppose $(p, p') \in \Delta_+(P)$, $(p', q') \in \Delta_\varepsilon(\mathcal{B}, \mathcal{A})$, and $(q', q) \in \Delta_+(P)$. According to Lemma 9 there exists a ground term $u$ such that $u \to_{\mathcal{B}}^{*} p'$ and $u \to_{\mathcal{A}}^{*} q'$. The induction hypothesis yields $(s, u) \in L(P)^{+}$ and $(u, t) \in L(P)^{+}$. Hence also $(s, t) \in L(P)^{+}$.

Example 13 Consider the pair automaton $P = (Q, \mathcal{A}, \mathcal{B})$ of Example 10. As observed in Example 12, $\Delta_\varepsilon(\mathcal{B}, \mathcal{A}) = \{ (1, 0), (1, 1), (4, 0), (2, 2), (2, 0) \}$. Hence we obtain the pair automaton $P_+ = (\Delta_+(P), \mathcal{A}, \mathcal{B})$ with $\Delta_+(P) = \{ (1, 4), (2, 4), (3, 2), (3, 4) \}$. The pair $(3, 4)$ is obtained from the second inference rule with $p = 3$, $q = q' = 2$ and $r = 4$. We have $\mathsf{g}(\mathsf{a}, \mathsf{b}) \to_\varepsilon \mathsf{f}(\mathsf{a}) \to_\varepsilon \mathsf{b}$ and the pair $(\mathsf{g}(\mathsf{a}, \mathsf{b}), \mathsf{b})$ is accepted by $P_+$ as $\mathsf{g}(\mathsf{a}, \mathsf{b}) \to_{\mathcal{A}}^{*} 3$ and $\mathsf{b} \to_{\mathcal{B}} 4$ with $(3, 4) \in \Delta_+(P)$. Furthermore, $\mathsf{g}(\mathsf{a}, \mathsf{b}) \to_\varepsilon \mathsf{f}(\mathsf{a}) \to_{\mathcal{R}} \mathsf{f}(\mathsf{b})$ but $\mathsf{g}(\mathsf{a}, \mathsf{b}) \to_\varepsilon^{+} \mathsf{f}(\mathsf{b})$ does not hold, and one readily checks that the pair $(\mathsf{g}(\mathsf{a}, \mathsf{b}), \mathsf{f}(\mathsf{b}))$ is not accepted by $P_+$.

Two further closure operations on anchored GTT relations are inverse and union. Recall that GTT relations are not closed under union.

Lemma 10 (A ::= A⁻ | A ∪ A) Anchored GTT relations are effectively closed under inverse and union.

Proof Given a pair automaton $P = (Q, \mathcal{A}, \mathcal{B})$, we have $L(P)^{-} = L(P^{-})$ for the pair automaton $P^{-} = (Q^{-}, \mathcal{B}, \mathcal{A})$. Here $Q^{-} = \{ (q, p) \mid (p, q) \in Q \}$. Given pair automata $P_1 = (Q_1, \mathcal{A}_1, \mathcal{B}_1)$ and $P_2 = (Q_2, \mathcal{A}_2, \mathcal{B}_2)$ without common states, $L(P_1) \cup L(P_2) = L(P)$ for the pair automaton $P = (Q_1 \cup Q_2, \mathcal{A}_1 \cup \mathcal{A}_2, \mathcal{B}_1 \cup \mathcal{B}_2)$.

Next we present a modified composition operation ◦ that preserves anchored GTT relations.
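Before turning to that operation, the state relations built in the proofs of Theorems 5 and 6 can be sketched in Python. This is an illustration under assumptions, not the formalized construction: the names `compose`, `composed_pairs` and `transitive_pairs` are hypothetical, the reading of the second inference rule of Fig. 4 is reconstructed from the proof of Theorem 6 and Example 13, and the set `DELTA_BA` is simply copied from Example 12 rather than computed.

```python
def compose(r, s):
    """Relational composition: (a, c) whenever (a, b) in r and (b, c) in s."""
    return {(a, c) for a, b in r for b2, c in s if b == b2}


def composed_pairs(q1, delta, q2):
    """State relation of the composed pair automaton (proof of Theorem 5)."""
    return compose(compose(q1, delta), q2)


def transitive_pairs(q, delta):
    """Least relation that contains q and is closed under the step
    'from (p, q), (q, q') in delta and (q', r) infer (p, r)'."""
    closure = set(q)
    while True:
        new = composed_pairs(closure, delta, closure) - closure
        if not new:
            return closure
        closure |= new


# State pairs of the pair automaton of Example 10 and the set of
# epsilon-transitions between B and A stated in Example 12.
Q = {(1, 4), (2, 4), (3, 2)}
DELTA_BA = {(1, 0), (1, 1), (4, 0), (2, 2), (2, 0)}

if __name__ == "__main__":
    print(composed_pairs(Q, DELTA_BA, Q))         # {(3, 4)}
    print(sorted(transitive_pairs(Q, DELTA_BA)))  # [(1, 4), (2, 4), (3, 2), (3, 4)]
```

Both constructions leave the two tree automata untouched and only change the relation on states, which is why the pair automaton view is convenient here.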
Definition 10 Given two binary relations  and  on the same set of ground terms, their 1 2 modified composition  ◦ is defined as the relation 1 2 ◦ = ◦ ( ) ∪ ( ) ◦ 1 2 1 2 1 2 We have ( ◦ ) = ( ) ◦ ( ) . The proof that anchored GTT relations are 1 2 1 2 closed under ◦ requires a preliminary result on the interplay between GTTs and anchored GTTs. Lemma 11 The composition of an anchored GTT relation and a GTT relation is an anchored GTT relation. Proof Let P = (Q, A , B ) be a pair automaton and G = (A , B ) a GTT. Without loss of 1 1 2 2 generality we assume that P and G do not share states. Define the pair automaton P = (Q, A , B ∪  (A , B )∪ B ) 1 1 ε 2 1 2 ∗ ∗ We claim that L(P ) = L(P) ◦ L(G). First let (s, t ) ∈ L(P ).So s → p and t → q A B with (p, q) ∈ Q and B abbreviating B ∪  (A , B )∪ B . Because P and G do not share 1 ε 2 1 2 states, the sequence t → q can be rearranged as follows: ∗ ∗ ∗ t = C[t ,..., t ]→ C[q ,..., q ]→ C[r ,..., r ]→ q 1 n 1 n 1 n B  (A ,B ) B 2 ε 2 1 1 Here C is a multi-hole context with n  0 holes. Using Lemma 9 we obtain ground terms ∗ ∗ u ,..., u such that u → q and u → r for all 1  i  n. Define the term u = 1 n i i i A B 2 1 ∗ ∗ C[u ,..., u ].Wehave u → C[r ,..., r ]→ q and thus (s, u) ∈ L(P). Furthermore, 1 n 1 n B B 1 1 u → C[q ,..., q ] and thus also (u, t ) ∈ L(G). Hence (s, t ) ∈ L(P) ◦ L(G). 1 n For the converse direction, let (s, t ) ∈ L(P) and (t , u) ∈ L(G).So s → p and t → q with (p, q) ∈ Q. Moreover, there exists a multi-hole context C with n  0 holes, terms t ,..., t , u ,..., u , and states r ,..., r such that t = C[t ,..., t ], u = 1 n 1 n 1 n 1 n ∗ ∗ ∗ C[u ,..., u ],and t → r and u → r for all 1  i  n. The sequence t → q 1 n i i i i A B B 2 2 1 ∗ ∗ can be written as t = C[t ,..., t ]→ C[q ,..., q ]→ q for some states q ,..., q . 1 n 1 n 1 n B B 1 1 By Lemma 9, r → q is a transition rule in  (A , B ). Hence u = C[u ,..., u ]→ i i ε 2 1 1 n ∗ ∗ C[r ,..., r ]→ C[q ,..., q ]→ q and thus (s, u) ∈ L(P ) as desired. 
1 n 1 n (A ,B ) B ε 2 1 1 Example 14 We consider the pair automaton P and the GTT G of Example 10.The R R construction in the above proof requires that P and G do not share states, so we R R rename the states of G (by adding a prime). We obtain the pair automaton P = ({ (1, 4), (2, 4), (3, 2)}, A , B ) with A : a → 0 b → 0 f(0) → 0 g(0, 0) → 0 a → 1 f(1) → 2 g(1, 0) → 3 123 14 Page 20 of 76 A. Middeldorp et al. Fig. 5  (A, B) B : a → 1 b → 4 f(1) →20 →11 → 1 a → 5 b → 4 f(5 ) → 6 0 →40 → 2 2 →24 →14 → 2 We can also trim the resulting pair automata by trimming the underlying automata A and B . We declare a state q of A to be productive if C[q]→ r for some context C and state r ∈{ p | (p, p ) ∈ Q}. For the automaton B we use the second components{ p | (p, p ) ∈ Q}. In our case A is already trim, but B simplifies to a → 1 b → 4 f(1) → 2 b → 4 4 →14 → 2 We have L(P )={ (f(a), b), (a, b)}∪{g(a, t ) | t ∈ T (F )}×{b, f(a), f(b)}, which indeed coincides with the relation→ ·−→ induced by our leading TRS R. Theorem 7 (A ::= A ◦ A) Anchored GTT relations are effectively closed under modified composition. Proof The construction L(P)× L(G) → L(P ) in the proof of Lemma 11 and its symmetric counterpart L(G)× L(P) → L(P ) in connection with Lemma 10 ensure that  ◦ is 1 2 an anchored GTT relation. In Theorem 6 we have seen that anchored GTT relations are closed under transitive closure. GTT relations are also closed under transitive closure, which is the reason they were developed in the first place, but the construction is different from the one for anchored GTT relations and the correctness proof is considerably more involved. We present this construction as a modified transitive closure operation that preserves anchored GTT relations. Definition 11 The modified transitive closure  of a binary relation  on ground terms is defined as the relation + + + = ( ) ◦ ◦ ( ) + + We have ( ) = ( ) . 
The proof that anchored GTT relations are effectively closed under+ employs the set  (A, B) consisting of ε-transitions p  q that are computed by the inference rules in Fig. 5. Definition 12 Given a GTT G = (A, B),wewrite A for A∪  (B, A) and B for B ∪ + + + (A, B).The GTT G is defined as (A , B ). + + + + According to the following lemma, the multi-hole context closure of an anchored GTT relation is a GTT relation using the same GTT. Lemma 12 For every GTT G,L(G) = L (G) . 123 First-Order Theory of Rewriting… Page 21 of 76 14 Proof Let G = (A, B).If (s, t ) ∈ L(G) then there exist a context C with n  0 holes, terms s ,..., s , t ,..., t , and states q ,..., q with s = C[s ,..., s ], t = C[t ,..., t ],and 1 n 1 n 1 n 1 n 1 n ∗ ∗ s → q t for all 1  i  n.Wehave (s , t ) ∈ L (G) for all 1  i  n by definition i i i i i a A B of anchored GTTs. Moreover, C ∈ C ∩C . Hence (s, t ) ∈ L (G) . The converse is equally easy. − + Lemma 13 Let G = (A, B) be a GTT. If (p, q) ∈  (A, B) then (s, t ) ∈ L(G ) for some ground terms s ∈ L(A, p) and t ∈ L(B, q). Proof We use induction on the relation  defined by the inference rules in Fig. 5. In the base case [c] is used with p a and a → q for some constant a and hence we can take A B s = t = a. For the induction step we consider four cases, depending on which inference rule is used to derive p  q. First suppose [c] is used. So there exist transition rules f (p ,..., p ) → p in A and f (q ,..., q ) → q in B such that p  q for 1  i  n.The 1 n 1 n i i − + induction hypothesis yields ground terms s ,..., s , t ,..., t such that (s , t ) ∈ L(G ) , 1 n 1 n i i s ∈ L(A, p ),and t ∈ L(B, q ) for 1  i  n.Let s = f (s ,..., s ) and t = f (t ,..., t ). i i i i 1 n 1 n − + We have s ∈ L(A, p) and t ∈ L(B, q). Moreover, (s, t ) ∈ L(G ) because the transitive closure of a parallel relation is parallel. Next suppose[a] is applied to derive p  q.Sothere exists a state p such that p p  q. 
The induction hypothesis yields ground terms s and − + t such that (s, t ) ∈ L(G ) , s ∈ L(A, p ),and t ∈ L(B, q). Hence also s ∈ L(A, p).The reasoning for[b] is similar. The final case is the transitivity rule[t].So p  r and r  q for − + some state r. The induction hypothesis yields terms s, t, u, v such that (s, u), (v, t ) ∈ L(G ) , s ∈ L(A, p), u ∈ L(B, r ), v ∈ L(A, r ),and t ∈ L(B, q).From u ∈ L(B, r ) and v ∈ L(A, r ) − − + we infer (u,v) ∈ L(G ). Together with (s, u), (v, t ) ∈ L(G ) , we obtain the desired − + (s, t ) ∈ L(G ) . ∗ ∗ Lemma 14 Let G = (A, B) be a GTT. Let G = (A , B ).Ifs → qthen t → qfor + + + A A some ground term t with (s, t ) ∈ L(G) . Proof We proceed by induction on the length of the reduction s → p. If the last step is an epsilon transition q → p then the induction hypothesis yields a ground term u with (s, u) ∈ L(G) and u ∈ L(A, q).If q → p is a transition from A then u ∈ L(A, p), and we conclude by letting t = u; otherwise, q → p must come from  (B, A),and using Lemma 13 we obtain ground terms v and w with v ∈ L(B, q), w ∈ L(A, p),and + + (v, w) ∈ L(G) . This implies (u,v) ∈ L(G) and thus (s,w) ∈ L(G) by transitivity. Letting t = w gives the desired result. If the last step is not an ε-transition, then it must be a transition f (p ,..., p ) → p from A, and we have s = f (s ,..., s ) for suitable s ,..., s .We 1 n 1 n 1 n apply the induction hypothesis to each argument position, resulting in t ,..., t with (s , t ) ∈ 1 n i i L(G) and t ∈ L(A, p ) for 1  i  n.Let t = f (t ,..., t ).Wehave t ∈ L(A, p).Since i i 1 n + ∗ L(G) is transitive and closed under contexts, we obtain (s, t ) ∈ L(G) .Since L(G) is reflexive, we actually have (s, t ) ∈ L(G) as desired. Lemma 15 Let G = (A, B) be a GTT. If G = (A , B ) then  (A , B ) + + + ε + + =  (A, B). Proof We first show  (A , B ) ⊆  (A, B) via induction on the relation  defined by ε + + + the inference rules in Fig. 3. 
We proceed by case analysis, so assume (p, q) ∈  (A , B ) ε + + is derived from a congruence step[c]. Hence we obtain (p, q) ∈  (A, B) by a congruence step [c] of Fig. 5, the fact that the constructions only add ε-transitions, and the induction hypothesis. Next assume that we derived (q, r ) ∈  (A , B ) by an ε-step[a].So p → q ε + + A → 14 Page 22 of 76 A. Middeldorp et al. and p  r.Wehave A = A ∪  (B, A). The result trivially follows for p → q. + + A So let (p, q) ∈  (B, A). Hence (q, p) ∈  (A, B). The induction hypothesis yields + + (p, r ) ∈  (A, B) and therefore (q, r ) ∈  (A, B) using the transitivity rule [t].The + + ε-step[b] case is obtained in the same way. For the reverse inclusion we use induction on the relation  defined by the inference rules in Fig. 5 and argue in a similar fashion. Hence  (A , B ) =  (A, B) as desired. ε + + + Theorem 8 (A ::= A ) Anchored GTT relations are effectively closed under modified transitive closure. Proof Let G = (A, B) beaGTT.Weshow L (G ) = L (G) . First let (s, t ) ∈ L (G ). a + a a + ∗ ∗ So there exists a state q such that s → q and t → q. Lemma 14 yields a ground A B + + ∗ + − term u such that u → q and (s, u) ∈ L(G) . Applied to G = (B, A), Lemma 14 yields ∗ − + a ground term v such that v → q and (t,v) ∈ L(G ) . Hence (u,v) ∈ L (G) and + + + (v, t ) ∈ L(G) . Consequently, (s, t ) ∈ L(G) ◦ L (G) ◦ L(G) and, using Lemma 12, + + + L(G) ◦ L (G) ◦ L(G) = L (G) . a a For the other direction we apply the modified composition operation ◦ of Definition 10 with  = = L (G ) and obtain 1 2 a + L (G ) ◦ L(G ) ∪ L(G ) ◦ L (G ) ⊆ L (G ) ◦ L (G ) = L (G ) a + + + a + a + a + a + with the help of Lemma 15. Note that we do not get equality, as one direction in the proof of Lemma 11 requires disjoint state sets. 
Since L (G) ⊆ L (G ) we also have a a + L (G) ◦ L(G ) ∪ L(G ) ◦ L (G) ⊆ L (G ) a + + a a + At this point we can use the following well-known result in Kleene algebra ∗ ∗ A ⊆ X ∧ B ◦ X ⊆ X ∧ X ◦ C ⊆ X ⇒ B ◦ A ◦ C ⊆ X ∗ + with A = L (G), B = C = L(G),and X = L (G ).Since L(G) = L(G) , we are done. a a + Example 15 For the GTT G = (A, B) of Example 11 we have  (A, B) =  (A, B). + ε Hence G = (A , B ) adds the pairs of  (B, A) ={(5, 0), (5, 1), (4, 0), (6, 0), + + + + (6, 2), (5, 4), (6, 4)} as ε-transitions to A and those of  (A, B) =  (B, A) to B.We + + have (g(a, b), f(b)) ∈ L (G ) as g(a, b) → 6and f(b) → f(4) → f(5) → 6. a + B B B + + + The term pair (f(a), f(b)) does not belong to L (G ). a + The penultimate operation on anchored GTT relations that we consider is complement. This requires the determinization of pair automata. Lemma 16 For every pair automaton P = (Q, A, B) there exist deterministic tree automata d d A and B and a binary relation Q such that L(P) = L((Q , A , B )). Proof We use the subset construction to determinize A and B into equivalent deterministic tree automata A and B . As the binary state relation we take Q ={ (X , Y ) | (p, q) ∈ Q for some p ∈ X ⊆ Q and q ∈ Y ⊆ Q }.Wehave L(P) = L((Q , A , B )) by the correctness A B of the subset construction. Theorem 9 (A ::= A ) Anchored GTT relations are effectively closed under complement. Proof Let G be an anchored GTT. According to Lemma 16 we may assume that L(G) is accepted by a deterministic pair automaton P = (Q, A, B). Without loss of generality we c c may further assume that A and B are completely defined. It follows that L(P) = (Q , A, B) where Q = (Q × Q )\Q. A B 123 First-Order Theory of Rewriting… Page 23 of 76 14 It is worth noting that GTT relations are not closed under complement [8,Exercise3.4]. Example 16 For the pair automaton P = (Q, A, B) of Example 10 we have Q = { (1, 4), (2, 4), (3, 2)}. 
Determinizing $\mathcal{A}$ yields the tree automaton $\mathcal{A}^d$ with the following transition rules:
$$\mathsf{a} \to A \qquad \mathsf{b} \to B \qquad \mathsf{f}(X) \to \begin{cases} C & \text{if } X = A \\ B & \text{otherwise} \end{cases} \qquad \mathsf{g}(X, Y) \to \begin{cases} D & \text{if } X = A \\ B & \text{otherwise} \end{cases}$$
for all $X, Y \in \{ A, B, C, D \}$. Here $A = \{0, 1\}$, $B = \{0\}$, $C = \{0, 2\}$, and $D = \{0, 3\}$. Next we determinize $\mathcal{B}$ to obtain the tree automaton $\mathcal{B}^d$ consisting of the following transition rules:
$$\mathsf{a} \to E \qquad \mathsf{b} \to F \qquad \mathsf{f}(X) \to \begin{cases} G & \text{if } X = E \\ H & \text{otherwise} \end{cases} \qquad \mathsf{g}(X, Y) \to H$$
for all $X, Y \in \{ E, F, G, H \}$. Here $E = \{1\}$, $F = \{4\}$, $G = \{2\}$, and $H = \varnothing$. The transition rules for $\mathsf{g}$ are added to make $\mathcal{B}^d$ completely defined. Now the complement $L(G)^c$ of $L(G)$ is accepted by the pair automaton $(Q^c, \mathcal{A}^d, \mathcal{B}^d)$ with
$$Q^c = (\{ A, B, C, D \} \times \{ E, F, G, H \}) \setminus \{ (A, F), (C, F), (D, G) \}$$

The final closure property of anchored GTT relations that we mention is intersection.

Lemma 17 (A ::= A ∩ A) Anchored GTT relations are effectively closed under intersection.

Proof This follows from Theorem 9 and Lemma 10. The formalized proof uses a more efficient product construction, to avoid the subset construction of the complement.

5.3 Regular Relations

We continue with operations on regular relations. Again, most of the results and constructions are known. We provide detailed proofs that form the basis of the formalization. The following result takes care of transforming anchored GTT relations into binary regular (i.e., RR$_2$) relations.

Theorem 10 (R ::= A) Every anchored GTT relation is an RR$_2$ relation.

Proof Let $G = (\mathcal{A}, \mathcal{B})$ be a GTT. We construct an RR$_2$ automaton that accepts $L_a(G)$. We use a product construction with states $pq$ where $p$ is a state of $\mathcal{A}$ or $\bot$, and $q$ is a state of $\mathcal{B}$ or $\bot$; the state $\bot\bot$ is not used. The transitions are
$$\begin{array}{l}
fg(p_1 q_1, \dots, p_k q_k) \to pq \\
f\bot(p_1 \bot, \dots, p_n \bot) \to p\bot \\
\bot g(\bot q_1, \dots, \bot q_m) \to \bot q
\end{array}$$
for all $f(p_1, \dots, p_n) \to p \in \mathcal{A}$ and $g(q_1, \dots, q_m) \to q \in \mathcal{B}$, where $k = \max(n, m)$ and $p_i = \bot$ if $n < i \leq k$ and $q_j = \bot$ if $m < j \leq k$, and
$$\begin{array}{ll}
pq \to p'q & \text{for all } p \to p' \in \mathcal{A} \text{ and } q \in Q_{\mathcal{B}} \cup \{ \bot \} \\
pq \to pq' & \text{for all } q \to q' \in \mathcal{B} \text{ and } p \in Q_{\mathcal{A}} \cup \{ \bot \}
\end{array}$$
These transitions accept $\langle s, t \rangle$ in state $pq$ if and only if $s \in L(\mathcal{A}, p)$ and $t \in L(\mathcal{B}, q)$. As final states we pick $pp$ with $p \in Q_\mathcal{A} \cap Q_\mathcal{B}$. A straightforward induction proof reveals that the resulting tree automaton accepts $L_a(\mathcal{G})$.

We illustrate the construction on our leading example.

Example 17 For the anchored GTT $\mathcal{G}$ of Example 11 we obtain the $RR_2$ automaton $\mathcal{A} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta)$ with $Q = (\{0, 1, 2, 3, 4, 6, \bot\} \times \{4, 5, 6, \bot\}) \setminus \{\bot\bot\}$, $Q_f = \{44, 66\}$, and $\Delta$ consisting of the following transition rules:
\[ \begin{array}{lll}
aa \to 05 & ab \to 04 & af(\bot 5) \to 06 \\
aa \to 15 & ab \to 14 & af(\bot 5) \to 16 \\
ba \to 05 & bb \to 04 & bf(\bot 5) \to 06 \\
fa(0\bot) \to 05 & fb(0\bot) \to 04 & ff(05) \to 06 \\
fa(1\bot) \to 25 & fb(1\bot) \to 24 & ff(15) \to 26 \\
ga(0\bot, 0\bot) \to 05 & gb(0\bot, 0\bot) \to 04 & gf(05, 0\bot) \to 06 \\
ga(1\bot, 0\bot) \to 35 & gb(1\bot, 0\bot) \to 34 & gf(15, 0\bot) \to 36 \\
a\bot \to 0\bot & b\bot \to 0\bot & \bot a \to \bot 5 \\
a\bot \to 1\bot & & \bot b \to \bot 4 \\
f\bot(0\bot) \to 0\bot & f\bot(1\bot) \to 2\bot & \bot f(\bot 5) \to \bot 6 \\
g\bot(0\bot, 0\bot) \to 0\bot & g\bot(1\bot, 0\bot) \to 3\bot & \\
14 \to 44 & 24 \to 44 & 34 \to 64 \\
15 \to 45 & 25 \to 45 & 35 \to 65 \\
16 \to 46 & 26 \to 46 & 36 \to 66 \\
1\bot \to 4\bot & 2\bot \to 4\bot & 3\bot \to 6\bot
\end{array} \]
We have
\[ \langle g(a, f(b)), f(a) \rangle = gf(aa, f\bot(b\bot)) \to^* gf(15, f\bot(0\bot)) \to gf(15, 0\bot) \to^* 66 \]

The various context closure operations are taken care of in the following general result.

Theorem 11 ($R ::= R^n_p$) If $R$ is an $RR_2$ relation then $R^n_p$ is an $RR_2$ relation, for all $n \in \{\geqslant, 1, >\}$ and $p \in \{\geqslant, \varepsilon, >\}$.

Proof Let $\mathcal{A} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta)$ be the $RR_2$ automaton that accepts $R$. We add two new states $*$ and $\checkmark$. In the former the encoding of the identity relation on ground terms will be accepted. The latter will serve as the unique final state (unless specified otherwise). This is achieved by extending $\Delta$ with the transitions $ff(*, \dots, *) \to *$ for every $f \in \mathcal{F}$ and $q \to \checkmark$ for every $q \in Q_f$. The resulting automaton $\mathcal{A}' = (\mathcal{F}^{(2)}, Q \cup \{\checkmark, *\}, \{\checkmark\}, \Delta')$ is equivalent to $\mathcal{A}$ and the starting point for the various context closure operations.
• For n = 1and p =  we extend  with all rules of the form ff (∗,...,∗, ,∗,...,∗) → • For p = > we need a new final state  to ensure that the surrounding context is non-empty: ff (∗,...,∗, ,∗,...,∗) →  ff (∗,...,∗,  ,∗,...,∗) → 123 First-Order Theory of Rewriting… Page 25 of 76 14 This is sufficient for n = 1. For n = > we add the single ε-transition  →∗ and for n =  we additionally add a new final state ∗ together with transition rules ensuring that the accepted relation is reflexive: ff (∗ ,...,∗ )→∗ • For n = p =  we make∗ the new (and only) final state and add the ε-transition →∗. • For p = ε and n∈{1,>} we have R = R and thus we can just take the RR automaton n = A.For n =  we have R = R and declare∗ as an additional final state. • In the remaining case we have p =  and n = >.Weextend  with all rules of the form ff (∗,...,∗, ,∗,...,∗) → and the single ε-transition →∗. The proof details can be found in the formalization. Example 18 The following transition rules are added to the RR automaton of Example 17 to model the relation L (G) =−→ : a >ε aa→∗ 44 →  ff() →  ff( ) → bb→∗ 66 →  gg(,∗) →  gg( ,∗) → ff(∗)→∗ →∗ gg(∗, ) →  gg(∗,  ) → gg(∗,∗)→∗ The encoding of the term pair (g(f(a), f(a)), g(b, f(b))) is accepted: gg(fb(a⊥), ff(ab)) → ∗ ∗ ∗ gg(fb(1⊥), ff(14)) → gg(24, ff(44)) → gg(44, ff()) → gg(,  ) → gg(∗,  ) → We present one more operation that turns a regular set into an RR relation. Here = 2 T consists of all pairs (t , t ) with t ∈ T . Lemma 18 (R ::== ) If T ⊆ T (F ) is regular then= is an RR relation. T T 2 Proof Let A = (F , Q, Q ,) be a tree automaton that accepts T.Weturn A into the (2) automaton B = (F , Q, Q , ),where  is obtained from  by modifying every tran- sition rule f (p ,..., p ) → q of  into ff (p ,..., p ) → q.The ε-transitions of  are 1 n 1 n kept. It is a trivial exercise to show that L(B) == == . L(A) T The following result is an immediate consequence of the corresponding closure properties on regular sets (Theorem 1). 
Theorem 12 ($R ::= R \cup R \mid R \cap R$) The class of $n$-ary regular relations is effectively closed under union and intersection for any $n \geqslant 0$.

The final closure operations on regular relations are required for the logical structure of formulas in the first-order theory of rewriting.

Theorem 13 ($R ::= R^c$) The class of regular relations is effectively closed under complement.

Given a regular relation $R$, its complement is denoted by $R^c$. Note that $\langle R^c \rangle \neq \langle R \rangle^c$. The former is the topic of Theorem 13 and is used to model logical negation.

Proof Let $R \subseteq \mathcal{T}(\mathcal{F})^n$ be a regular relation. We have $\langle R^c \rangle = \langle R \rangle^c \setminus W^c$ where
\[ W = \{ t \in \mathcal{T}(\mathcal{F}^{(n)}) \mid t = \langle t_1, \dots, t_n \rangle \text{ for some } t_1, \dots, t_n \in \mathcal{T}(\mathcal{F}) \} \]
is the set of encodings of $n$-tuples of ground terms. It is not difficult to show that $W$ is regular. The set $\langle R \rangle$ is regular by assumption. Hence the regularity of $\langle R^c \rangle$ is a consequence of Theorem 1.

Definition 13 Let $R$ be an $n$-ary relation over $\mathcal{T}(\mathcal{F})$. If $1 \leqslant i \leqslant n + 1$ then the $i$-th cylindrification of $R$ is the relation
\[ C_i(R) = \{ (t_1, \dots, t_{i-1}, u, t_i, \dots, t_n) \mid (t_1, \dots, t_n) \in R \text{ and } u \in \mathcal{T}(\mathcal{F}) \} \]
Moreover, if $\sigma$ is a permutation on $\{1, \dots, n\}$ then
\[ \sigma(R) = \{ (t_{\sigma(1)}, \dots, t_{\sigma(n)}) \mid (t_1, \dots, t_n) \in R \} \]

Theorem 14 The class of regular relations is effectively closed under cylindrification and permutation.

In [8, Proposition 3.2.12] the closure under cylindrification is obtained via an inverse homomorphic image, resulting in a shorter proof. The proof of the latter operates on completely defined deterministic tree automata. The (formalized) proof below operates on arbitrary tree automata.

Proof Let $\mathcal{A} = (\mathcal{F}^{(n)}, Q, Q_f, \Delta)$ be a tree automaton that accepts $\langle R \rangle$. We construct tree automata that accept $\langle C_i(R) \rangle$ and $\langle \sigma(R) \rangle$. We first consider permutation. Let $\sigma$ be a permutation on $\{1, \dots, n\}$ and define $\mathcal{A}_\sigma = (\mathcal{F}^{(n)}, Q, Q_f, \Delta_\sigma)$ where $\Delta_\sigma$ is obtained from $\Delta$ by replacing every transition rule of the form $f_1 \cdots f_n(p_1, \dots, p_m) \to q$ with $f_{\sigma(1)} \cdots f_{\sigma(n)}(p_1, \dots, p_m) \to q$.
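The rule rewriting that defines $\Delta_\sigma$ can be pictured with a small Python sketch. This is only an illustration under assumed conventions: a rule is a triple of a symbol tuple, a tuple of argument states and a target state, ε-transitions are kept in a separate list, and the permutation is given 0-based as a list `sigma` where `sigma[i]` is the component that moves to slot `i`. The name `permute_rules` is hypothetical.

```python
def permute_rules(rules, eps, sigma):
    """Permute the components of every combined symbol f_1...f_n.

    rules: list of (symbols, arg_states, target) with symbols an n-tuple
    eps:   list of (source_state, target_state) epsilon transitions
    sigma: 0-based permutation; slot i of the new symbol is symbols[sigma[i]]
    States (arguments and targets) are left untouched.
    """
    new_rules = [
        (tuple(symbols[j] for j in sigma), arg_states, target)
        for symbols, arg_states, target in rules
    ]
    # Epsilon transitions mention no function symbols, so they are copied.
    return new_rules, list(eps)


# Swapping the two components of a binary relation (sigma = (1 2)):
rules = [(("a", "b"), (), "04"), (("f", "⊥"), ("1⊥",), "2⊥")]
eps = [("14", "44")]
swapped, swapped_eps = permute_rules(rules, eps, [1, 0])
print(swapped)      # [(('b', 'a'), (), '04'), (('⊥', 'f'), ('1⊥',), '2⊥')]
print(swapped_eps)  # [('14', '44')]
```

Only the function symbols change; the state space and the final states are shared with the original automaton.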
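Epsilon transitions in $\Delta$ are not affected. To conclude $L(\mathcal{A}_\sigma) = \langle \sigma(R) \rangle$, we first define the effect of $\sigma$ on terms in $\mathcal{T}(\mathcal{F}^{(n)})$:
\[ \sigma(t) = f_{\sigma(1)} \cdots f_{\sigma(n)}(\sigma(t_1), \dots, \sigma(t_m)) \]
for $t = f_1 \cdots f_n(t_1, \dots, t_m)$. The following preliminary fact
\[ \langle t_{\sigma(1)}, \dots, t_{\sigma(n)} \rangle = \sigma(\langle t_1, \dots, t_n \rangle) \qquad (*_\sigma) \]
is proved as follows. We have
\[ \mathrm{Pos}(\langle t_{\sigma(1)}, \dots, t_{\sigma(n)} \rangle) = \mathrm{Pos}(t_1) \cup \cdots \cup \mathrm{Pos}(t_n) = \mathrm{Pos}(\langle t_1, \dots, t_n \rangle) = \mathrm{Pos}(\sigma(\langle t_1, \dots, t_n \rangle)) \]
and, for every position $p \in \mathrm{Pos}(\langle t_{\sigma(1)}, \dots, t_{\sigma(n)} \rangle)$,
\[ \langle t_{\sigma(1)}, \dots, t_{\sigma(n)} \rangle(p) = f_{\sigma(1)} \cdots f_{\sigma(n)} = \sigma(\langle t_1, \dots, t_n \rangle)(p) \]
where $f_i = t_i(p)$ if $p \in \mathrm{Pos}(t_i)$ and $f_i = \bot$ otherwise. We now prove
\[ \langle t_1, \dots, t_n \rangle \to^*_{\mathcal{A}} q \iff \langle t_{\sigma(1)}, \dots, t_{\sigma(n)} \rangle \to^*_{\sigma(\mathcal{A})} q \qquad (6) \]
for all terms $t_1, \dots, t_n \in \mathcal{T}(\mathcal{F} \cup \{\bot\})$ and states $q \in Q$. Suppose
\[ \langle t_1, \dots, t_n \rangle = f_1 \cdots f_n(u_1, \dots, u_m) \to^*_{\mathcal{A}} q \]
So there exists a transition rule $f_1 \cdots f_n(q_1, \dots, q_m) \to p \in \Delta$ with $p \to^*_{\mathcal{A}} q$ and $u_i \to^*_{\mathcal{A}} q_i$ for all $1 \leqslant i \leqslant m$. We have $f_{\sigma(1)} \cdots f_{\sigma(n)}(q_1, \dots, q_m) \to p \in \Delta_\sigma$ and $p \to^*_{\sigma(\mathcal{A})} q$. Using $(*_\sigma)$ the induction hypothesis yields $\sigma(u_i) \to^*_{\sigma(\mathcal{A})} q_i$ for $1 \leqslant i \leqslant m$ and thus
\[ \langle t_{\sigma(1)}, \dots, t_{\sigma(n)} \rangle = f_{\sigma(1)} \cdots f_{\sigma(n)}(\sigma(u_1), \dots, \sigma(u_m)) \to^*_{\sigma(\mathcal{A})} q \]
The converse is proved in a similar fashion. By specializing (6) to terms $t_1, \dots, t_n \in \mathcal{T}(\mathcal{F})$ and states $q \in Q_f$ we obtain $L(\sigma(\mathcal{A})) = \{ \sigma(\langle t_1, \dots, t_n \rangle) \mid \langle t_1, \dots, t_n \rangle \in L(\mathcal{A}) \} = \langle \sigma(R) \rangle$.

Next we consider cylindrification. Let $i \in \{1, \dots, n + 1\}$. We define the tree automaton $\mathcal{A}_{C_i} = (\mathcal{F}^{(n+1)}, (Q \cup \{\bot\}) \times \{!, \bot\}, Q_f \times \{!\}, \Delta_{C_i})$ where $\bot$ is a fresh state and $\Delta_{C_i}$ is obtained from $\Delta$ by replacing every transition rule of the form
\[ f_1 \cdots f_{i-1} f_i \cdots f_n(p_1, \dots, p_m) \to q \]
with the transitions
\[ f_1 \cdots f_{i-1}\, g\, f_i \cdots f_n(p_1 q_1, \dots, p_m q_m, \dots, p_k q_k) \to q! \]
\[ f_1 \cdots f_{i-1} \bot f_i \cdots f_n(p_1 \bot, \dots, p_m \bot) \to q\bot \]
for all $l$-ary $g \in \mathcal{F}$. Here $k = \max(m, l)$ is the arity of $f_1 \cdots f_{i-1}\, g\, f_i \cdots f_n$. Moreover, $p_j = \bot$ for all $m < j \leqslant k$, and
\[ q_j = \begin{cases} ! & \text{if } j \leqslant l \\ \bot & \text{if } j > l \end{cases} \]
for all $1 \leqslant j \leqslant k$.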
Additionally,  contains the transition rule ⊥···⊥g⊥···⊥(⊥! ,...,⊥! )→⊥! for every g ∈ F.Here g is the i-th element in⊥···⊥g⊥···⊥. Finally, for every ε-transition p → q in  we add p!→ q! and p⊥→ q⊥ to  . The purpose of the second component ⊥/! in states of A is to mark whether states are reached by terms where (! )the i-th position in the encoded tuple is a term in T (F ),or(⊥)itis⊥.Inorder to show L(A )= C (R) , C i we simplify the notation by considering i = 1, which entails no loss of generality as regular relations are closed under permutation. Again, first we define the effect of C on terms in (1) (n) T (F )× T (F ): C (s, t ) = ff ··· f (C (s , u ), ..., C (s , u )) 1 1 n 1 1 1 1 k k for s = f (s ,..., s ) and t = f ··· f (u ,..., u ).Here k = max(l, m) is the arity of 1 l 1 n 1 m n (1) ff ··· f , s =⊥ for l < j and u =⊥ for m < j. By induction on s ∈ T (F ) and 1 n j j (n) t ∈ T (F ) we show the preliminary statements Pos(C (⊥, t )) = Pos(t ) and C (⊥, t )(p) =⊥t (p) for all p ∈ Pos(t ) (7) 1 1 n n n Pos(C (s,⊥ )) = Pos(s) and C (s,⊥ )(p) = s(p)⊥ for all p ∈ Pos(s) (8) 1 1 Let t = f ··· f (u ,..., u ).Wehave C (⊥, t )=⊥ f ··· f (C (⊥, u ), ..., C (⊥, u )) 1 n 1 m 1 1 n 1 1 1 k and obtain Pos(C (⊥, u )) = Pos(u ) and C (⊥, u )(q) =⊥u (q) for all ip ∈ Pos(t ) 1 i i 1 i i from the induction hypothesis. Note that ip ∈ Pos(t ) if and only if p ∈ Pos(u ).For p = ε we have C (⊥, t )(p) =⊥ f ··· f =⊥t (p). This establishes (7). The proof of 1 1 n (8) is similar and omitted. These statements are used to prove Pos(C (s, t )) = Pos(s) ∪ Pos(t ) and C (s, t )(p) = s(p)t (p) for all p ∈ Pos(s)∪ Pos(t ), by induction on |s|+|t|. 
Let s = f (s ,..., s ) and t = f ··· f (u ,..., u ).Let k = max(l, m) be the arity of 1 l 1 n 1 m ff ··· f .Wehave Pos(C (s , u )) = Pos(s )∪ Pos(u ) and C (s , u )(p) = s (p)u (p) 1 n 1 i i i i 1 i i i i for all p ∈ Pos(s ) ∪ Pos(u ) for all 1  i  k.For i  min(l, m) this follows from i i the induction hypothesis and for i > min(l, m) this follows from (7)or(8). Moreover, 123 14 Page 28 of 76 A. Middeldorp et al. C (s, t )(ε) = ff ··· f = s(ε)t (ε) so the second statement also holds for p = ε.From 1 1 n these statements we immediately obtain C (s, t ) = s, t ,..., t (∗ ) 1 1 n C (1) (n) for all terms s ∈ T (F ) and t = t ,..., t ∈ T (F ). The following two properties are 1 n easily proved by induction: n ∗ C (s,⊥ ) → ⊥! (9) C (A) for all terms s ∈ T (F ) and ∗ ∗ t → q ⇐⇒ C (⊥, t ) → q⊥ (10) A C (A) (n) for all terms t ∈ T (F ). For the first one we use induction on s = f (s ,..., s ).We 1 l n n n n n ∗ have C (s,⊥ ) = f⊥ (C (s ,⊥ ), ..., C (s ,⊥ )) and obtain C (s ,⊥ ) → ⊥! 1 1 1 1  1 i C (A) for 1  l  n from the induction hypothesis. By construction f⊥ (⊥! ,...,⊥! ) → n ∗ ⊥! ∈  . Hence C (s,⊥ ) → ⊥! . The second property is proved by induction C 1 C (A) on t = f ··· f (u ,..., u ).Wehave C (⊥, t )=⊥ f ··· f (C (⊥, u ), ..., C (⊥, u )). 1 n 1 m 1 1 n 1 1 1 m First assume t → q. So there exists a transition rule f ··· f (q ,..., q ) → p ∈ 1 n 1 m ∗ ∗ with p → q and u → q for all 1  i  m. The induction hypothesis yields i i A A C (⊥, u ) → q ⊥ for 1  i  m. By construction ⊥ f ··· f (q ⊥,..., q ⊥) → 1 i i 1 n 1 m C (A) ∗ ∗ p⊥∈  and p⊥→ q⊥. Combining all this yields C (⊥, t ) → q⊥.For the C 1 C (A) C (A) 1 1 converse, assume C (⊥, t ) → q⊥. So there exists a rule⊥ f ··· f (q ⊥,..., q ⊥) → 1 1 n 1 m C (A) ∗ ∗ p⊥∈  with p⊥→ q⊥ and C (⊥, u ) → q ⊥ for all 1  i  m.The C 1 i i C A C (A) 1 1 induction hypothesis yields u → q for 1  i  m. 
Furthermore, the transition rule i i ⊥ f ··· f (q ⊥,..., q ⊥) → p⊥ originates from f ··· f (q ,..., q ) → p ∈  and we 1 n 1 m 1 n 1 m ∗ ∗ ∗ obtain p⊥→ q⊥ from p → q. Hence t → q as desired. This completes the proofs C (A) A A of (9)and (10). Next we prove ∗ ∗ t → q ⇐⇒ C (s, t ) → q! (11) A C (A) (n) for all s ∈ T (F ), t ∈ T (F ) and q ∈ Q. For the only-if direction we use induc- tion on t = f ··· f (u ,..., u ).Let s = f (s ,..., s ).From t → q we obtain 1 n 1 m 1 l ∗ ∗ f ··· f (p ,..., p ) → p ∈  with p → q and u → p for all 1  i  m.We 1 n 1 m i i A A have ff ··· f (p q ,..., p q ,..., p q ) → p!∈ 1 n 1 1 m m k k C by construction. Here k = max(l, m) is the arity of ff ··· f , p =⊥ for all m < i  k, 1 n i q =! if 1  i  l and q =⊥ if l < i  k.Wehave p!→ q! and C (s, t ) = i i 1 C (A) ff ··· f (C (s , u ), ..., C (s , u )) with s =⊥ for l < i  k and u =⊥ for m < 1 n 1 1 1 1 k k i i i  k. The induction hypothesis yields C (s , u ) → p ! for all 1  i  min(l, m). 1 i i i C (A) Note that!= q .For min(l, m)< i  k we distinguish two cases. n ∗ • If min(l, m) = m then m < i and thus u =⊥ . We obtain C (s , u ) → ⊥! from i 1 i i C (A) (9). Note that p =⊥ and q =! . i i • If min(l, m) = l then l < i and thus s =⊥. We obtain C (s , u ) → p ⊥ from i 1 i i i C (A) (10). Note that q =⊥. So in all cases we have C (s , u ) → p q . Hence 1 i i i i C (A) ∗ ∗ C (s, t ) → ff ··· f (p q ,..., p q ,..., p q ) → p!→ q! 1 1 n 1 1 m m k k C (A) C (A) C (A) 1 1 1 123 First-Order Theory of Rewriting… Page 29 of 76 14 as desired. The if-direction of (11) is proved in a similar fashion. From C (s, t ) = ff ··· f (C (s , u ), ..., C (s , u )) → q! 1 1 n 1 1 1 1 k k C (A) we obtain a rule ff ··· f (p q ,..., p q ,..., p q ) → p!∈  with p!→ 1 n 1 1 m m k k C C (A) q! and C (s , u ) → p q for 1  i  k.Wehave f ··· f (p ,..., p ) → p ∈ 1 i i i i 1 n 1 m C (A) ∗ ∗ and p → q due to the construction of  . 
The induction hypothesis yields u → p C i i A 1 A for 1  i  m and thus t = f ··· f (u ,..., u ) → q. Specializing (11) to terms 1 n 1 m t = t ,..., t with t ,..., t ∈ T (F ) and q ∈ Q yields L(C (A))={ s, t ,..., t | 1 n 1 n f 1 1 n t ,..., t ∈ L(A) and s ∈ T (F )}= C (R) . 1 n 1 Note that for every RR relation R, its inverse R is the same as σ(R) for the permutation σ = (12). Corollary 2 (R ::= R ) The class of binary regular relations is effectively closed under inverse. (2) Example 19 Consider the RR automaton A = (F , Q, Q ,) of Example 17. We compute 2 f C ({ (s, t , u) | s → u and t ∈ T (F )}. To this end, we transform A by the construction in the 2 ε (3) above proof. This results in an automaton B = (F , Q , Q , ) with Q = (Q∪{⊥})× {! ,⊥}, Q ={44 , 66 },and  consisting of 183 transitions. Every non-ε-transition in ! ! gives rise to five transitions in  . For instance, the transitions aaa → 05 afa(⊥ ) → 05 aga(⊥ ,⊥ ) → 05 ! ! ! ! ! ! aba → 05 a⊥a → 05 ! ⊥ originate from aa → 05 and the transitions ⊥af(⊥5 )→⊥6 ⊥ff(⊥5 )→⊥6 ⊥gf(⊥5 ,⊥ )→⊥6 ! ! ! ! ! ! ! ⊥bf(⊥5 )→⊥6 ⊥⊥f(⊥5 )→⊥6 ! ! ⊥ ⊥ originate from ⊥f(⊥5) →⊥6. Moreover, every ε-transition in  is duplicated in  .For instance, 25 → 45 gives rise to 25 → 45 and 25 → 45 . Finally,  contains the ! ! ⊥ ⊥ transitions ⊥a⊥→⊥ ⊥b⊥→⊥ ⊥f⊥(⊥ )→⊥ ⊥g⊥(⊥ ,⊥ )→⊥ ! ! ! ! ! ! ! So in total there are 31× 5+ 12× 2+ 4 = 183 transitions in  . In Theorem 14 and its proof we have finally introduced all concepts needed to complete the proof that RR relations are closed under projection (Theorem 2). It remains to be shown that L(A )=  (R) . Proof of Theorem 2 (cont’d) To simplify the notation, we consider  (which entails no loss of generality as regular relations are closed under permutation). Again, first we define the (n) effect of  on terms in T (F ): (t ) = f ··· f ( (u ), ..., (u )) 1 2 n 1 1 1 k for t = f ··· f (u ,..., u ).Here k  m is the arity of f ··· f .Weshow 1 n 1 m 2 n (C (s, t )) = t (12) 1 1 123 14 Page 30 of 76 A. 
Middeldorp et al. (1) (n) for all terms s ∈ T (F ) and t ∈ T (F ) by induction on|s|+|t|.Solet s = f (s ,..., s ) 1 l and t = f ··· f (u ,..., u ).Wehave 1 n 1 m (C (s, t )) =  ( ff ··· f (C (s , u ), ..., C (s , u ))) 1 1 1 1 n 1 1 1 1 k k = f ··· f ( (C (s , u )), . . . ,  (C (s , u ))) 1 n 1 1 1 i 1 1 m m = f ··· f (u ,..., u ) = t 1 n 1 m Here k = max(l, m) is the arity of ff ··· f , s =⊥ for l < j, u =⊥ for m < j,and the 1 n j j induction hypothesis is applied to  (C (s , u )) for 1  i  m. Now we can easily show 1 1 i i ( t ,..., t ) = t ,..., t (∗ ) 1 1 n 2 n (n) for all terms t ,..., t ∈ T (F ).From(∗ ) in the proof of Theorem 14 we obtain 1 n C t , t ,..., t = C (t , t ,..., t ) 1 2 n 1 1 2 n and thus  ( t ,..., t ) =  (C (t , t ,..., t )) = t ,..., t using (12). We now 1 1 n 1 1 1 2 n 2 n prove the following two statements: ∗ ∗ t → q ⇒  (t ) → q (13) A  (A) (n) for all terms t ∈ T (F ) and states q ∈ Q,and ∗ ∗ (n) u → q ⇒ t → q for some term t ∈ T (F ) with  (t ) = u (14) (A) A (n) for all terms u ∈ T (F ). We prove the first statement by induction on t. Suppose t = f ··· f (u ,..., u ) → q 1 n 1 m So there exist a transition rule f ··· f (q ,..., q ) → p ∈  with p → q such 1 n 1 m that u → q for all 1  i  m. To simplify the reasoning, we assume that the con- i i n−1 dition f ··· f =⊥ in the definition of  is temporarily lifted. This entails that 2 n f ··· f (q ,..., q ) → p is a transition rule in  .Here k  m is the arity of f ··· f . 2 n 1 k  2 n ∗ ∗ We have p → q. The induction hypothesis yields (u ) → q for 1  i  m. i i (A)  (A) 1 1 Hence ∗ ∗ (t ) = f ··· f ( (u ), ..., (u )) → f ··· f (q ,..., q ) → q 1 2 n 1 1 1 k 2 n 1 k (A)  (A) 1 1 as desired. For the second statement, suppose u = f ··· f (u ,..., u ) → q and 2 n 1 k (A) so there exists a transition rule f ··· f (q ,..., q ) → p ∈  with p → q and 2 n 1 k 1  (A) u → q for all 1  i  k. 
By construction of $\Pi_1(\mathcal{A})$, there exist a function symbol $f_1 \in \mathcal{F} \cup \{\bot\}$ and states $q_{k+1}, \dots, q_m$ such that $f_1 f_2 \cdots f_n(q_1, \dots, q_m) \to p \in \Delta$. Here $m \geqslant k$ is the arity of $f_1 \cdots f_n$. From the induction hypothesis we obtain terms $v_1, \dots, v_k \in \mathcal{T}(\mathcal{F}^{(n)})$ such that $v_i \to^*_{\mathcal{A}} q_i$ and $\Pi_1(v_i) = u_i$ for $1 \leqslant i \leqslant k$. Because all states of $\mathcal{A}$ are reachable, there exist terms $v_{k+1}, \dots, v_m \in \mathcal{T}(\mathcal{F}^{(n)})$ such that $v_j \to^*_{\mathcal{A}} q_j$ for $k + 1 \leqslant j \leqslant m$. Now let $t = f_1 \cdots f_n(v_1, \dots, v_m)$. We clearly have $t \to^*_{\mathcal{A}} f_1 \cdots f_n(q_1, \dots, q_m) \to^*_{\mathcal{A}} p$. Moreover, $\Pi_1(t) = f_2 \cdots f_n(\Pi_1(v_1), \dots, \Pi_1(v_k)) = f_2 \cdots f_n(u_1, \dots, u_k) = u$. This concludes the proof of the two statements. Specializing statement (13) to $t = \langle t_1, \dots, t_n \rangle$ where $t_1, \dots, t_n \in \mathcal{T}(\mathcal{F})$ and states $q \in Q_f$ yields $\Pi_1(L(\mathcal{A})) \subseteq L(\Pi_1(\mathcal{A}))$. From statement (14) we conclude $L(\Pi_1(\mathcal{A})) \subseteq \Pi_1(L(\mathcal{A}))$ and hence
\[ L(\Pi_1(\mathcal{A})) = \{ \Pi_1(\langle t_1, \dots, t_n \rangle) \mid \langle t_1, \dots, t_n \rangle \in L(\mathcal{A}) \} = \langle \Pi_1(R) \rangle \]
It remains to show that the automaton $\Pi_1(\mathcal{A})$ does not use any rule $\bot^{n-1} \to p$ to accept terms when $n > 1$. Since $L(\Pi_1(\mathcal{A})) = \langle \Pi_1(R) \rangle$ and $\langle \Pi_1(R) \rangle \subseteq \mathcal{T}(\mathcal{F}^{(n-1)})$, no term in $\langle \Pi_1(R) \rangle$ contains the function symbol $\bot^{n-1}$.

5.4 Normal Form Predicate

At this point we have formalized proofs for the constructs in the grammar in Fig. 1, with the exception of the normal form predicate ($T ::= \mathsf{NF}$). This predicate can be defined in the first-order theory of rewriting as
\[ \mathsf{NF}(t) \iff \neg \exists u\, (t \to u) \]
which gives rise to the following procedure:
1. Using Theorems 4, 10 and 11 an $RR_2$ automaton is constructed that accepts the encoding of the rewrite relation $\to$.
2. Using Theorem 2 the $RR_2$ automaton of step 1 is projected into a tree automaton that accepts the set of reducible ground terms, corresponding to the subformula $\exists u\, (t \to u)$.
3. Complementation (Theorem 13) is applied to the automaton of step 2 to obtain a tree automaton that accepts the set of ground normal forms.
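To make the predicate itself concrete, here is a minimal Python sketch that decides $\mathsf{NF}(t)$ for a single ground term by searching for a redex. It is not the automata-based procedure described above, only a direct reading of $\neg \exists u\, (t \to u)$. The term representation (nested tuples, variables as plain strings), the function names, and the example rule are assumptions made for this illustration.

```python
def matches(pattern, term, binding):
    """Check whether the ground term is an instance of pattern.

    Function applications are tuples ("f", arg1, ..., argn);
    variables in patterns are plain strings.
    """
    if isinstance(pattern, str):  # variable
        if pattern in binding:
            return binding[pattern] == term
        binding[pattern] = term
        return True
    if pattern[0] != term[0] or len(pattern) != len(term):
        return False
    return all(matches(p, t, binding) for p, t in zip(pattern[1:], term[1:]))


def is_redex(term, lhss):
    """A term is a redex if it is an instance of some left-hand side."""
    return any(matches(lhs, term, {}) for lhs in lhss)


def is_normal_form(term, lhss):
    """NF(t) holds iff no subterm of t is a redex."""
    if is_redex(term, lhss):
        return False
    return all(is_normal_form(arg, lhss) for arg in term[1:])


# Hypothetical left-hand side h(f(a, x)); right-hand sides are irrelevant here.
lhss = [("h", ("f", ("a",), "x"))]
print(is_normal_form(("h", ("f", ("b",), ("a",))), lhss))         # True
print(is_normal_form(("g", ("h", ("f", ("a",), ("b",)))), lhss))  # False
```

Only the left-hand sides matter for reducibility, which is why the sketch takes a list of patterns rather than full rules.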
Since projection may transform a deterministic tree automaton into a non-deterministic one, this procedure is inefficient. In this section we provide a direct construction of a tree automaton that accepts the set of ground normal forms of a left-linear TRS, which goes back to Comon [6], and present a formalized correctness proof. Throughout this section $\mathcal{R}$ is assumed to be left-linear. We start with defining some preliminary concepts.

Definition 14 Given a signature $\mathcal{F}$, we write $\mathcal{F}_\bot$ for the extension of $\mathcal{F}$ with a fresh constant symbol $\bot$. Given $t \in \mathcal{T}(\mathcal{F}, \mathcal{V})$, $t^\bot$ denotes the result of replacing all variables in $t$ by $\bot$:
\[ x^\bot = \bot \qquad f(t_1, \dots, t_n)^\bot = f(t_1^\bot, \dots, t_n^\bot) \]
We define the partial order $\leqslant$ on $\mathcal{T}(\mathcal{F}_\bot)$ as the least congruence that satisfies $\bot \leqslant t$ for all terms $t \in \mathcal{T}(\mathcal{F}_\bot)$:
\[ \bot \leqslant t \qquad \frac{t_1 \leqslant u_1 \ \cdots\ t_n \leqslant u_n}{f(t_1, \dots, t_n) \leqslant f(u_1, \dots, u_n)} \]
The partial map $\uparrow\colon \mathcal{T}(\mathcal{F}_\bot) \times \mathcal{T}(\mathcal{F}_\bot) \to \mathcal{T}(\mathcal{F}_\bot)$ is defined as follows:
\[ \bot \uparrow t = t \uparrow \bot = t \qquad f(t_1, \dots, t_n) \uparrow f(u_1, \dots, u_n) = f(t_1 \uparrow u_1, \dots, t_n \uparrow u_n) \]
It is not difficult to show that $t \uparrow u$ is the least upper bound of comparable terms $t$ and $u$.

Definition 15 Let $\mathcal{R}$ be a TRS over a signature $\mathcal{F}$. We write $T_\bot$ for the set $\{ t^\bot \mid t \lhd \ell \text{ for some } \ell \to r \in \mathcal{R} \} \cup \{\bot\}$. The set $T_\uparrow$ is obtained by closing $T_\bot$ under $\uparrow$.

Example 20 Consider the TRS $\mathcal{R}$ consisting of the following rules:
\[ h(f(g(a), x, y)) \to g(a) \qquad g(f(x, h(x), y)) \to x \qquad h(f(x, y, h(a))) \to h(x) \]
We start by collecting the proper subterms of the left-hand sides:
\[ T_\bot = \{ \bot, a, g(a), h(\bot), h(a), f(g(a), \bot, \bot), f(\bot, h(\bot), \bot), f(\bot, \bot, h(a)) \} \]
Closing $T_\bot$ under $\uparrow$ adds the following terms:
\[ \begin{array}{l}
f(g(a), \bot, \bot) \uparrow f(\bot, h(\bot), \bot) = f(g(a), h(\bot), \bot) \\
f(\bot, \bot, h(a)) \uparrow f(\bot, h(\bot), \bot) = f(\bot, h(\bot), h(a)) \\
f(g(a), h(\bot), \bot) \uparrow f(\bot, h(\bot), h(a)) = f(g(a), h(\bot), h(a))
\end{array} \]

Lemma 19 The set $T_\uparrow$ is finite.

Proof If $t \uparrow u$ is defined then $\mathrm{Pos}(t \uparrow u) = \mathrm{Pos}(t) \cup \mathrm{Pos}(u)$. It follows that the positions of terms in $T_\uparrow \setminus T_\bot$ are positions of terms in $T_\bot$. Since $T_\bot$ is finite, there are only finitely many such positions.
Hence the finiteness of $T_\uparrow$ follows from the finiteness of $\mathcal{F}$.

Although the above proof is simple enough, we formalized the proof below which is based on a concrete algorithm to compute $T_\uparrow$. Actually, the algorithm presented below is based on a general saturation procedure, which is of independent interest.

Definition 16 Let $f\colon U \times U \to U$ be a (possibly partial) function and let $S$ be a finite subset of $U$. The closure $C_f(S)$ is the least extension of $S$ with the property that $f(a, b) \in C_f(S)$ whenever $a, b \in C_f(S)$ and $f(a, b)$ is defined.

The following lemma provides a sufficient condition for closures to exist. The proof gives a concrete algorithm to compute the closure.

Lemma 20 If $f$ is a total, associative, commutative, and idempotent function then $C_f(S)$ exists and is finite.

Proof If $S = \varnothing$ then $C_f(S) = \varnothing$ and the claim trivially holds. Suppose $S \neq \varnothing$ and let $a$ be an arbitrary element in $S$. We show
\[ C_f(S) = C_f(S \setminus \{a\}) \cup \{a\} \cup \{ f(a, c) \mid c \in C_f(S \setminus \{a\}) \} \]
Since $S$ is finite, this gives rise to the following iterative algorithm to compute $C_f(S)$:

  I := ∅;
  for all x ∈ S do
    I := I ∪ {x} ∪ { f(x, y) | y ∈ I }
  return I

In each iteration only finitely many elements are added. Hence $C_f(S)$ is finite. It remains to show the above equation. The inclusion $\supseteq$ is immediate from the definition of $C_f(S)$. For the inclusion $\subseteq$, let $b$ be an arbitrary element of $C_f(S)$. If $b \in S$ then $b \in C_f(S \setminus \{a\}) \cup \{a\}$. If $b \notin S$ then $b = f(a_1, f(a_2, \dots f(a_{n-1}, a_n) \dots))$ for some sequence of elements $a_1, \dots, a_n \in S$. If $a$ is an element of this sequence then, using the properties of $f$, we may assume $a$ appears exactly once in the sequence. Hence $b = f(a, c)$ for some element $c \in C_f(S \setminus \{a\})$. If $a$ is not an element of $a_1, \dots, a_n$ then $b \in C_f(S \setminus \{a\})$. This completes the proof.

Since our function $\uparrow$ is partial, we need to lift it to a total function that preserves associativity and commutativity. In our abstract setting this entails finding a binary predicate $P$ on $U$ such that $f(a, b)$ is defined if $P(a, b)$ holds. In addition, the following properties need to be fulfilled:
• $P$ is reflexive and symmetric,
• if $P(a, f(b, c))$ and $P(b, c)$ hold then $P(a, b)$ and $P(f(a, b), c)$ hold as well,
for all $a, b, c \in U$. For the details we refer to the formalization.

Definition 17 The tree automaton $\mathcal{A}_{\mathsf{NF}(\mathcal{R})} = (\mathcal{F}, Q, Q_f, \Delta)$ is defined as follows: $Q = Q_f = T_\uparrow$ and $\Delta$ consists of all transition rules $f(p_1, \dots, p_n) \to q$ such that $f(p_1, \dots, p_n)$ is no redex of $\mathcal{R}$ and $q$ is the maximal element of $Q$ satisfying $q \leqslant f(p_1, \dots, p_n)$.

Since states are terms from $T_\uparrow \subseteq \mathcal{T}(\mathcal{F}_\bot)$ here, Definition 14 applies.

Example 21 For the TRS $\mathcal{R}$ of Example 20, the tree automaton $\mathcal{A}_{\mathsf{NF}(\mathcal{R})}$ consists of the following transition rules:
\[ a \to 1 \qquad g(p) \to \begin{cases} 2 & \text{if } p = 1 \\ 0 & \text{if } p \notin \{1, 6, 9, 10\} \end{cases} \qquad h(p) \to \begin{cases} 4 & \text{if } p = 1 \\ 3 & \text{if } p \notin \{1, 8, 10\} \end{cases} \]
\[ f(p, q, r) \to \begin{cases}
5 & \text{if } p = 2,\ q \notin \{3, 4\} \\
6 & \text{if } p \neq 2,\ q \in \{3, 4\},\ r \neq 4 \\
7 & \text{if } q \notin \{3, 4\},\ r = 4 \\
8 & \text{if } p = 2,\ q \in \{3, 4\},\ r \neq 4 \\
9 & \text{if } p \neq 2,\ q \in \{3, 4\},\ r = 4 \\
10 & \text{if } p = 2,\ q \in \{3, 4\},\ r = 4 \\
0 & \text{otherwise}
\end{cases} \]
Here we use the following abbreviations:
\[ \begin{array}{llll}
0 = \bot & 3 = h(\bot) & 6 = f(\bot, h(\bot), \bot) & 8 = f(g(a), h(\bot), \bot) \\
1 = a & 4 = h(a) & 7 = f(\bot, \bot, h(a)) & 9 = f(\bot, h(\bot), h(a)) \\
2 = g(a) & 5 = f(g(a), \bot, \bot) & & 10 = f(g(a), h(\bot), h(a))
\end{array} \]

As can be seen from the above example, the tree automaton $\mathcal{A}_{\mathsf{NF}(\mathcal{R})}$ is not completely defined. Unlike the construction in [6], we do not have an additional state that is reached by all reducible ground terms.

Before proving that $\mathcal{A}_{\mathsf{NF}(\mathcal{R})}$ accepts the ground normal forms of $\mathcal{R}$, we first show that $\mathcal{A}_{\mathsf{NF}(\mathcal{R})}$ is well-defined, which amounts to showing that for every $f(p_1, \dots, p_n)$ with $f \in \mathcal{F}$ and $p_1, \dots, p_n \in T_\uparrow$ the set of states $q$ such that $q \leqslant f(p_1, \dots, p_n)$ has a maximum element with respect to the partial order $\leqslant$.

Lemma 21 For every term $t \in \mathcal{T}(\mathcal{F}_\bot)$ the set $\{ s \in T_\uparrow \mid s \leqslant t \}$ has a unique maximal element.

Proof Let $S = \{ s \in T_\uparrow \mid s \leqslant t \}$.
Because $\bot \leqslant t$ and $\bot \in T_\uparrow$, $S \neq \varnothing$. If $s_1, s_2 \in S$ then $s_1 \leqslant t$ and $s_2 \leqslant t$ and thus $s_1 \uparrow s_2$ is defined and satisfies $s_1 \uparrow s_2 \leqslant t$. Since $T_\uparrow$ is closed under $\uparrow$, $s_1 \uparrow s_2 \in T_\uparrow$ and thus $s_1 \uparrow s_2 \in S$. Consequently, $S$ has a unique maximal element.

The next lemma is a trivial consequence of the fact that $\mathcal{A}_{\mathsf{NF}(\mathcal{R})}$ has no ε-transitions.

Lemma 22 The tree automaton $\mathcal{A}_{\mathsf{NF}(\mathcal{R})}$ is deterministic.

Lemma 23 If $t \in \mathcal{T}(\mathcal{F})$ with $t \to^* q$ and $s^\bot \leqslant t$ for a proper subterm $s$ of some left-hand side of $\mathcal{R}$ then $s^\bot \leqslant q$.

Proof We use induction on $t$. Let $t = f(t_1, \dots, t_n)$. We have $t \to^* f(q_1, \dots, q_n) \to q$. We proceed by case analysis on $s$. If $s$ is a variable then $s^\bot = \bot$ and, as $\bot$ is minimal in $\leqslant$, we obtain $s^\bot \leqslant q$. Otherwise we must have $\mathrm{root}(s) = f$ from the assumption $s^\bot \leqslant t$. So we may write $s = f(s_1, \dots, s_n)$. The induction hypothesis yields $s_i^\bot \leqslant q_i$ for all $1 \leqslant i \leqslant n$. Hence $s^\bot = f(s_1^\bot, \dots, s_n^\bot) \leqslant f(q_1, \dots, q_n)$. Additionally we have $s^\bot \in Q$ by Definition 17 as $s$ is a proper subterm of a left-hand side of $\mathcal{R}$. Since $f(q_1, \dots, q_n) \to q$ is a transition rule, we obtain $f(s_1^\bot, \dots, s_n^\bot) \leqslant q$ from the maximality of $q$.

Table 1 Summary of (formalized) closure properties

  Operation            GTTs   Anchored GTTs   RR_2             Operation          Regular relations
  Union                ×      ✓               ✓                Union              ✓
  Intersection         ×      ✓               ✓                Intersection       ✓
  Complement           ×      ✓               ✓                Complement         ✓
  Composition          ✓      ✓*              ✓ (underlined)   Projection         ✓
  Inverse              ✓      ✓               ✓                Cylindrification   ✓
  Transitive closure   ✓      ✓*              ×                Permutation        ✓
  Context closure             ×               ✓

Using the previous result we can prove that no redex of $\mathcal{R}$ reaches a state in $\mathcal{A}_{\mathsf{NF}(\mathcal{R})}$.

Lemma 24 If $t \in \mathcal{T}(\mathcal{F})$ is a redex then $t \to^* q$ for no state $q \in T_\uparrow$.

Proof We have $\ell^\bot \leqslant t$ for some left-hand side $\ell$ of $\mathcal{R}$. For a proof by contradiction, assume $t \to^* q$. Write $t = f(t_1, \dots, t_n)$. We have $t \to^* f(q_1, \dots, q_n) \to q$ and obtain $\ell^\bot \leqslant f(q_1, \dots, q_n)$ by a case analysis on $\ell$ and Lemma 23. Therefore the transition rule $f(q_1, \dots, q_n) \to q$ cannot exist by Definition 17.

Lemma 25 If $t \to^* q$ and $t \in \mathcal{T}(\mathcal{F})$ then $q \leqslant t$.
Proof We use induction on $t$. Let $t = f(t_1, \dots, t_n)$. We have $t \to^* f(q_1, \dots, q_n) \to q$. The induction hypothesis yields $q_i \leqslant t_i$ for all $1 \leqslant i \leqslant n$ and thus also $f(q_1, \dots, q_n) \leqslant f(t_1, \dots, t_n)$. We have $q \leqslant f(q_1, \dots, q_n)$ by Definition 17 and thus $q \leqslant t$ by the transitivity of $\leqslant$.

Lemma 26 If $t \in \mathsf{NF}(\mathcal{R})$ then $t \to^* q$ for some state $q \in T_\uparrow$.

Proof We use induction on $t$. Let $t = f(t_1, \dots, t_n)$. Since $t_1, \dots, t_n \in \mathsf{NF}(\mathcal{R})$ we obtain $f(t_1, \dots, t_n) \to^* f(q_1, \dots, q_n)$ from the induction hypothesis. Suppose $f(q_1, \dots, q_n)$ is a redex, so $\ell^\bot \leqslant f(q_1, \dots, q_n)$ for some left-hand side $\ell$ of $\mathcal{R}$. From Lemma 25 we obtain $q_i \leqslant t_i$ for all $1 \leqslant i \leqslant n$ and thus $f(q_1, \dots, q_n) \leqslant f(t_1, \dots, t_n)$. Hence $\ell^\bot \leqslant f(t_1, \dots, t_n)$. This however contradicts the assumption that $t$ is a normal form. (Here we need left-linearity of $\mathcal{R}$.) Therefore $f(q_1, \dots, q_n)$ is no redex and thus, using Lemma 21, there exists a transition $f(q_1, \dots, q_n) \to q$ in $\Delta$ and thus $t \to^* q$.

Theorem 15 ($T ::= \mathsf{NF}$) If $\mathcal{R}$ is a left-linear TRS then $L(\mathcal{A}_{\mathsf{NF}(\mathcal{R})}) = \mathsf{NF}(\mathcal{R})$.

Proof Let $t \in \mathcal{T}(\mathcal{F})$. If $t \in \mathsf{NF}(\mathcal{R})$ then $t \to^* q$ for some state $q \in T_\uparrow$ by Lemma 26. Since all states in $T_\uparrow$ are final, $t \in L(\mathcal{A}_{\mathsf{NF}(\mathcal{R})})$. Next assume $t \notin \mathsf{NF}(\mathcal{R})$. Hence $t = C[s]$ for some redex $s$. According to Lemma 24 $s$ does not reach a state in $\mathcal{A}_{\mathsf{NF}(\mathcal{R})}$. Hence also $t$ cannot reach a state and thus $t \notin L(\mathcal{A}_{\mathsf{NF}(\mathcal{R})})$.

5.5 Decision Procedure

In Table 1 we summarize the effective closure properties that were presented in detail in this section and formalized in Isabelle. The asterisks indicate that for anchored GTTs we have two closure properties each. The underlined result (the closure of $RR_2$ relations under composition) is not used in the decision procedure but does hold: If $R_1$ and $R_2$ are $RR_2$ relations then $R_1 \circ R_2 = \Pi_2(C_3(R_1) \cap C_1(R_2))$. Concerning the empty entry in the table, it can be shown that GTT relations are closed under the context operation $(\cdot)^n_p$ if and only if $n \in \{\geqslant, 1, >\}$ and $p \in \{\geqslant, \varepsilon\}$.

Table 2 Binary predicates as $RR_2$ relations
\[ \begin{array}{ll}
\to \;=\; (\to_\varepsilon)^1_\geqslant & \leftarrow \;=\; ((\to_\varepsilon)^1_\geqslant)^- \\
\to^+ \;=\; ((\to_\varepsilon)^+)^>_\geqslant & \to^* \;=\; ((\to_\varepsilon)^+)^\geqslant_\geqslant \\
\to_{>\varepsilon} \;=\; (\to_\varepsilon)^1_> & \to^*_{>\varepsilon} \;=\; ((\to_\varepsilon)^+)^\geqslant_> \\
\mathrel{\to_\parallel} \;=\; (\to_\varepsilon)^\geqslant_\geqslant & \leftrightarrow \;=\; ((\to_\varepsilon)^- \cup \to_\varepsilon)^1_\geqslant \\
\leftrightarrow^* \;=\; (((\to_\varepsilon)^- \cup \to_\varepsilon)^+)^\geqslant_\geqslant & \downarrow \;=\; ((\to_\varepsilon)^+ \circ ((\to_\varepsilon)^-)^+)^\geqslant_\geqslant \\
\to^! \;=\; ((\to_\varepsilon)^+)^\geqslant_\geqslant \cap (\mathcal{T}(\mathcal{F}) \times \mathsf{NF}) &
\end{array} \]

The second and third columns in the left part of Table 1 correspond to the $A$ and $R$ parts of the grammar in Fig. 1. The logical structure of formulas in the first-order theory of rewriting is taken care of by the closure operations on regular relations listed in the second half of Table 1. In Table 2 we show how some of the common binary predicates in term rewriting are represented as $RR_2$ relations using the corresponding operations. These are added to the language $\mathcal{L}$ of the first-order theory of rewriting without compromising the decidability result that is presented below.

Theorem 16 The first-order theory of rewriting is decidable for finite linear variable-separated TRSs.

Proof Let $\varphi(x_1, \dots, x_n)$ be a first-order formula over the language $\mathcal{L}$ with free variables $x_1, \dots, x_n$. Let $\mathcal{R}$ be a finite linear variable-separated TRS over a signature $\mathcal{F}$. We construct an $RR_n$ automaton that accepts the encoding of the relation $[\![\varphi]\!]_\mathcal{R} = \{ (t_1, \dots, t_n) \mid \mathcal{R} \vDash \varphi(t_1, \dots, t_n) \}$. For closed formulas, checking $\mathcal{R} \vDash \varphi$ then boils down to checking non-emptiness of $[\![\varphi]\!]$, which is decidable. We prove the (correctness of the) construction by structural induction on $\varphi$. In the base case $\varphi$ is an atomic formula and we distinguish the following cases.
1. If $\varphi = (x \to y)$ then we use Theorem 4 to obtain an anchored GTT for $\to_\varepsilon$, which is transformed into an $RR_2$ automaton for $\to_\varepsilon$ by Theorem 10. An application of Theorem 11 with $n = 1$ and $p = {\geqslant}$ yields an $RR_2$ automaton for $(\to_\varepsilon)^1_\geqslant = [\![\varphi]\!]$.
2.
If $\varphi = (x \to^* y)$ then we repeat the constructions in the previous case, with an additional application of modified transitive closure (Theorem 8) before Theorem 11 (with $n = p = {\geqslant}$) is applied.
3. If $\varphi = (x = y)$ then $[\![\varphi]\!]$ is regular by Lemma 18.
Here we assume that $x \neq y$. If $x$ and $y$ are the same variable then $[\![\varphi]\!]$ is a set of ground terms and the above constructions need to be modified as follows. If $\varphi = (x = x)$ then $[\![\varphi]\!] = \{ t \mid t \in \mathcal{T}(\mathcal{F}) \} = \mathcal{T}(\mathcal{F})$ is accepted by the tree automaton $(\mathcal{F}, \{q\}, \{q\}, \Delta)$ with $\Delta$ consisting of all rules $f(q, \dots, q) \to q$ for $f \in \mathcal{F}$. Consider $\varphi = (x \to x)$. We have $\{ \langle t, t \rangle \mid t \to_\mathcal{R} t \} = \{ \langle t, u \rangle \mid t \to_\mathcal{R} u \text{ and } t = u \}$. The latter is regular (cases 1 and 3 above together with Theorem 12) and hence the regularity of $[\![\varphi]\!] = \{ t \mid t \to_\mathcal{R} t \}$ follows by an application of Theorem 2. In the remaining case ($\varphi = (x \to^* x)$) we reason as in the previous case (using cases 2 and 3 above).
Next we consider the propositional connectives.
4. Suppose $\varphi = \neg\psi$. The induction hypothesis yields an $RR_n$ automaton that accepts $[\![\psi]\!]$. Since the class of $n$-ary regular relations is effectively closed under complement (Theorem 13), we obtain an $RR_n$ automaton that accepts $[\![\varphi]\!]$.
5. Suppose $\varphi = \psi_1 \land \psi_2$. Since $\psi_1$ and $\psi_2$ may have fewer free variables than $\varphi$, we cannot use Theorem 12 without further ado. Let $y_1, \dots, y_k$ be the free variables in $\psi_1$ and $z_1, \dots, z_m$ be the free variables in $\psi_2$. We have $\{x_1, \dots, x_n\} = \{y_1, \dots, y_k\} \cup \{z_1, \dots, z_m\}$. Because regular relations are closed under permutation (Theorem 14), we may assume that the variables in $y_1, \dots, y_k$ and $z_1, \dots, z_m$ are listed in the same order as in $x_1, \dots, x_n$. The induction hypothesis yields an $RR_k$ automaton $\mathcal{A}_1$ for $[\![\psi_1]\!]$ and an $RR_m$ automaton $\mathcal{A}_2$ for $[\![\psi_2]\!]$. Using $2n - (k + m)$ applications of cylindrification (Theorem 14), these automata are turned into $RR_n$ automata.
Since $n$-ary regular relations are closed under intersection (Theorem 12), we obtain an $\mathrm{RR}_n$ automaton for $[\![\varphi]\!]$.
6. The other binary connectives are handled exactly like conjunction.

The final cases involve the two quantifiers.
7. Suppose $\varphi = \exists x\, \psi$. If $x$ does not occur free in $\psi$ then $[\![\varphi]\!] = [\![\psi]\!]$ and hence the result follows immediately from the induction hypothesis. So we assume that $x$ occurs free in $\psi$ and $n \geqslant 0$. The induction hypothesis yields an $\mathrm{RR}_{n+1}$ automaton that accepts $[\![\psi]\!]$. Since the class of regular relations is effectively closed under projection (Theorem 2), we obtain an $\mathrm{RR}_n$ automaton that accepts $[\![\varphi]\!]$.
8. The case $\varphi = \forall x\, \psi$ reduces to the preceding case by the well-known equivalence $\forall x\, \psi \equiv \neg \exists x\, \neg\psi$.

6 Properties on Non-ground Terms

Since tree automata operate on ground terms, the decision procedure presented in the preceding section is restricted to properties on ground terms. The following example shows that ground-confluence, i.e., confluence restricted to ground terms, is not the same as confluence.

Example 22 The left-linear right-ground TRS $\mathcal{R}$ consisting of the rules
$a \to b$, $f(a, x) \to b$, $f(b, b) \to b$
over the signature $\mathcal{F} = \{a, b, f\}$ is ground-confluent because every ground term in $\mathcal{T}(\mathcal{F})$ rewrites to $b$. Confluence does not hold; the term $f(a, x)$ rewrites to the different normal forms $b$ and $f(b, x)$.

In this section we present results that allow the use of FORT on (certain) properties over arbitrary terms. The main idea is to extend the given signature $\mathcal{F}$ with constants to replace variables in terms. The required number of additional constants depends on the property under consideration. We consider the following confluence-related properties:
CR: $\forall s\, \forall t\, \forall u\, (s \to^* t \wedge s \to^* u \Rightarrow t \downarrow u)$ (confluence)
SCR: $\forall s\, \forall t\, \forall u\, (s \to t \wedge s \to u \Rightarrow \exists v\, (t \to^{=} v \wedge u \to^* v))$ (strong confluence)
WCR: $\forall s\, \forall t\, \forall u\, (s \to t \wedge s \to u \Rightarrow t \downarrow u)$ (local confluence)
NFP: $\forall s\, \forall t\, \forall u\, (s \to^* t \wedge s \to^{!} u \Rightarrow t \to^{!} u)$ (normal form property)
UNR: $\forall s\, \forall t\, \forall u\, (s \to^{!} t \wedge s \to^{!} u \Rightarrow t = u)$ (unique normal forms with respect to reduction)
UNC: $\forall t\, \forall u\, (t \leftrightarrow^* u \wedge \mathsf{NF}(t) \wedge \mathsf{NF}(u) \Rightarrow t = u)$ (unique normal forms with respect to conversion)
Here $t \downarrow u$ denotes joinability: $\exists v\, (t \to^* v \wedge u \to^* v)$. Let $\mathcal{P}_1$ be the collection of these properties. We also consider the following properties involving two TRSs $\mathcal{R}$ and $\mathcal{S}$:
COM: $\forall s\, \forall t\, \forall u\, (s \to^*_{\mathcal{R}} t \wedge s \to^*_{\mathcal{S}} u \Rightarrow \exists v\, (t \to^*_{\mathcal{S}} v \wedge u \to^*_{\mathcal{R}} v))$ (commutation)
CE: $\forall s\, \forall t\, (s \leftrightarrow^*_{\mathcal{R}} t \iff s \leftrightarrow^*_{\mathcal{S}} t)$ (conversion equivalence)
NE: $\forall s\, \forall t\, (s \to^{!}_{\mathcal{R}} t \iff s \to^{!}_{\mathcal{S}} t)$ (normalization equivalence)
Let $\mathcal{P}_2 = \{\text{COM}, \text{CE}, \text{NE}\}$. For a property $P \in \mathcal{P}_1 \cup \mathcal{P}_2$, $\mathrm{G}P$ denotes the property $P$ restricted to ground terms. The diagram in Fig. 6 summarizes the relationships between properties $P$ and $\mathrm{G}P$ for $P \in \mathcal{P}_1$. The properties CE, NE $\in \mathcal{P}_2$ are unrelated.

Fig. 6 Confluence-related properties on ground and non-ground terms

According to the following result, all considered properties are closed under signature extension.

Lemma 27 Let $\mathcal{R}$ and $\mathcal{S}$ be linear variable-separated TRSs over a common signature $\mathcal{F}$.
1. If $P \in \mathcal{P}_1$ and $(\mathcal{F}, \mathcal{R}) \vDash P$ then $(\mathcal{F} \uplus \{c\}, \mathcal{R}) \vDash P$.
2. If $P \in \mathcal{P}_2$ and $(\mathcal{F}, \mathcal{R}, \mathcal{S}) \vDash P$ then $(\mathcal{F} \uplus \{c\}, \mathcal{R}, \mathcal{S}) \vDash P$.

Proof Let $\mathcal{U}$ be a linear variable-separated TRS not containing the constant $c$. For any $x \in \mathcal{V}$, the mapping $\varphi_c^x \colon \mathcal{T}(\mathcal{F} \uplus \{c\}, \mathcal{V}) \to \mathcal{T}(\mathcal{F}, \mathcal{V})$ replaces all occurrences of $c$ in terms by the variable $x$:
$$\varphi_c^x(t) = \begin{cases} x & \text{if } t = c \\ t & \text{if } t \in \mathcal{V} \\ f(\varphi_c^x(t_1), \dots, \varphi_c^x(t_n)) & \text{if } t = f(t_1, \dots, t_n) \end{cases}$$
A straightforward induction proof reveals that $\varphi_c^x(s) \to^*_{\mathcal{U}} \varphi_c^x(t)$ whenever $s \to^*_{\mathcal{U}} t$. By choosing $x \notin \mathrm{Var}(s) \cup \mathrm{Var}(t)$, the reverse direction holds as well. Moreover, since linear variable-separated TRSs are closed under rule inversion, the equivalence also holds for ${\leftrightarrow^*_{\mathcal{U}}} = {\to^*_{\mathcal{U} \cup \mathcal{U}^-}}$. The lemma is an easy consequence of these facts. We illustrate this for COM. Given $s \to^*_{\mathcal{R}} t$ and $s \to^*_{\mathcal{S}} u$, with $s, t, u \in \mathcal{T}(\mathcal{F} \uplus \{c\}, \mathcal{V})$, we obtain $\varphi_c^x(s) \to^*_{\mathcal{R}} \varphi_c^x(t)$ and $\varphi_c^x(s) \to^*_{\mathcal{S}} \varphi_c^x(u)$. Commutation of $(\mathcal{F}, \mathcal{R}, \mathcal{S})$ yields a term $v \in \mathcal{T}(\mathcal{F}, \mathcal{V})$ such that $\varphi_c^x(t) \to^*_{\mathcal{S}} v$ and $\varphi_c^x(u) \to^*_{\mathcal{R}} v$. By taking $x \notin \mathrm{Var}(t) \cup \mathrm{Var}(u)$, we obtain $t \to^*_{\mathcal{S}} v'$ and $u \to^*_{\mathcal{R}} v'$ for $v' = v\{x \mapsto c\}$ by closure of rewriting under substitutions.

So adding constants preserves the properties of interest. For removing constants more effort is required. For the properties in $\mathcal{P}_1 \cup \mathcal{P}_2$, root steps will play a major role. Root steps are important since they permit the use of different substitutions for the left- and right-hand side of the employed rewrite rule, due to variable separation. We therefore start with a preliminary result (Lemma 28) which provides abstract conditions that permit the restriction to rewrite sequences containing root steps. We write $\to^{*\varepsilon*}_{\mathcal{R}}$ for the relation $\to^*_{\mathcal{R}} \cdot \to^{\varepsilon}_{\mathcal{R}} \cdot \to^*_{\mathcal{R}}$. The proof of Lemma 28 is obtained by a straightforward induction on the term structure and the multi-hole context closure of the rewrite relation, and is omitted.

Definition 18 A binary predicate $P$ on terms over a given signature $\mathcal{F}$ is closed under multi-hole contexts if $P(C[s_1, \dots, s_n], C[t_1, \dots, t_n])$ holds whenever $C$ is a multi-hole context over $\mathcal{F}$ with $n \geqslant 0$ holes and $P(s_i, t_i)$ holds for all $1 \leqslant i \leqslant n$.

Lemma 28 Let $\mathcal{A}$ and $\mathcal{B}$ be TRSs over the same signature $\mathcal{F}$ and let $P$ be a binary predicate that is closed under multi-hole contexts over $\mathcal{F}$.
1. If $s \to^{*\varepsilon*}_{\mathcal{A}} t \Rightarrow P(s, t)$ for all terms $s$ and $t$ then $s \to^*_{\mathcal{A}} t \Rightarrow P(s, t)$ for all terms $s$ and $t$.
2. If $s \to^{*\varepsilon*}_{\mathcal{A}} \cdot \to^*_{\mathcal{B}} t \vee s \to^*_{\mathcal{A}} \cdot \to^{*\varepsilon*}_{\mathcal{B}} t \Rightarrow P(s, t)$ for all terms $s$ and $t$ then $s \to^*_{\mathcal{A}} \cdot \to^*_{\mathcal{B}} t \Rightarrow P(s, t)$ for all terms $s$ and $t$.

For example, in the results below (Lemmata 34 and 35) for NFP we make use of this lemma by instantiating part 2 with $P_1(s, t)$: $\mathsf{NF}(t) \Rightarrow s \to^*_{\mathcal{R}} t$, $\mathcal{R}^-$ for $\mathcal{A}$, and $\mathcal{R}$ for $\mathcal{B}$. 
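This results in the statement that if
$$s \to^{*\varepsilon*}_{\mathcal{R}^-} \cdot \to^*_{\mathcal{R}} t \;\vee\; s \to^*_{\mathcal{R}^-} \cdot \to^{*\varepsilon*}_{\mathcal{R}} t \;\Rightarrow\; \mathsf{NF}(t) \Rightarrow s \to^*_{\mathcal{R}} t$$
then
$$s \to^*_{\mathcal{R}^-} \cdot \to^*_{\mathcal{R}} t \;\Rightarrow\; \mathsf{NF}(t) \Rightarrow s \to^*_{\mathcal{R}} t$$
Using the identity ${\to_{\mathcal{R}^-}} = {\leftarrow_{\mathcal{R}}}$ and the definition of NFP, it follows that NFP is a consequence of the statement
$$s \mathrel{{}^{*\varepsilon*}_{\ \ \mathcal{R}}{\leftarrow}} \cdot \to^*_{\mathcal{R}} t \;\vee\; s \mathrel{{}^{*}_{\mathcal{R}}{\leftarrow}} \cdot \to^{*\varepsilon*}_{\mathcal{R}} t \;\Rightarrow\; \mathsf{NF}(t) \Rightarrow s \to^*_{\mathcal{R}} t$$
for all $s, t \in \mathcal{T}(\mathcal{F})$. Hence we only need to consider rewrite sequences involving root steps, which together with variable separation significantly simplifies the proof. For the other properties of interest, Lemma 28 is instantiated as follows.
• For UNC we use part 1 with $P_2(s, t)$: $\mathsf{NF}(s) \wedge \mathsf{NF}(t) \Rightarrow s = t$ and $\mathcal{R} \cup \mathcal{R}^-$ for $\mathcal{A}$.
• For UNR we use part 2 with the same predicate $P_2$ and $\mathcal{R}^-$ for $\mathcal{A}$ and $\mathcal{R}$ for $\mathcal{B}$.
• For COM we use part 2 with $P_3(s, t)$: $s \to^*_{\mathcal{S}} \cdot \to^*_{\mathcal{R}^-} t$ and $\mathcal{R}^-$ for $\mathcal{A}$ and $\mathcal{S}$ for $\mathcal{B}$.
• For CR we use part 2 with the same predicate $P_3$ and replace $\mathcal{S}$ by $\mathcal{R}$.
• For NE we use part 1 twice, with $P_4(s, t)$: $\mathsf{NF}_{\mathcal{R}}(t) \Rightarrow s \to^*_{\mathcal{S}} t$ and $\mathcal{R}$ for $\mathcal{A}$, and with $P_5(s, t)$: $\mathsf{NF}_{\mathcal{S}}(t) \Rightarrow s \to^*_{\mathcal{R}} t$ and $\mathcal{S}$ for $\mathcal{A}$.
• For CE we use part 1 twice, with $P_6(s, t)$: $s \to^*_{\mathcal{S} \cup \mathcal{S}^-} t$ and $\mathcal{R} \cup \mathcal{R}^-$ for $\mathcal{A}$, and with $P_7(s, t)$: $s \to^*_{\mathcal{R} \cup \mathcal{R}^-} t$ and $\mathcal{S} \cup \mathcal{S}^-$ for $\mathcal{A}$.
In addition, we make use of the identities ${\to^{*\varepsilon*}_{\mathcal{R} \cup \mathcal{R}^-}} = {\leftrightarrow^{*\varepsilon*}_{\mathcal{R}}}$ and ${\to^*_{\mathcal{R} \cup \mathcal{R}^-}} = {\leftrightarrow^*_{\mathcal{R}}}$ for UNC and CE.

Lemma 29 The properties $P_1, \dots, P_7$ are closed under multi-hole contexts.

Strong confluence (SCR) and local confluence (WCR) cannot be reduced to root steps with Lemma 28, because they involve single steps in their definition, which are not closed under multi-hole contexts. However, by investigating the positions involved in $s \to t$ and $s \to u$ we easily deduce a reduction to root steps for both properties.

Lemma 30 A TRS is locally confluent if and only if
$$s \to^{\varepsilon} t \wedge s \to u \;\Rightarrow\; t \downarrow u$$
for all terms $s$, $t$ and $u$. A TRS is strongly confluent if and only if
$$s \to^{\varepsilon} t \wedge s \to u \;\vee\; s \to t \wedge s \to^{\varepsilon} u \;\Rightarrow\; t \to^{=} \cdot \mathrel{{}^{*}{\leftarrow}} u$$
for all terms $s$, $t$ and $u$.

The next lemma is a key result. 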
It allows the removal of introduced fresh constants while preserving the reachability relation. Note that variable-separation is not required.

Lemma 31 Let $\mathcal{R}$ be a linear TRS over a signature $\mathcal{F}$ that contains a constant $c$ which does not appear in $\mathcal{R}$. If $s \to^*_{\mathcal{R}} t$ with $c \in \mathrm{Fun}(s) \setminus \mathrm{Fun}(t)$ then $s[u]_p \to^*_{\mathcal{R}} t$ using the same rewrite rules at the same positions, for all terms $u$ and positions $p \in \mathrm{Pos}(s)$ such that $s|_p = c$.

The restriction to linear TRSs can also be lifted, at the expense of a more complicated replacement function and proof. Since the decision procedure implemented in FORT-h relies on linearity and variable-separation, we present a simple proof for linear TRSs. Due to calculations involving positions, the formalization in Isabelle/HOL was anything but simple.

Proof We use induction on the length of $s \to^*_{\mathcal{R}} t$. If this length is zero then there is nothing to show as $\mathrm{Fun}(s) \setminus \mathrm{Fun}(t) = \varnothing$. Suppose $s \to_{\mathcal{R}} v \to^*_{\mathcal{R}} t$ and write $s = C[\ell\sigma] \to_{\mathcal{R}} C[r\sigma] = v$. Let $p'$ be the position of the hole in $C$ and let $p \in \mathrm{Pos}(s)$ with $s|_p = c$. We distinguish two cases. If $p \parallel p'$ then $s[u]_p = (C[u]_p)[\ell\sigma]_{p'} \to_{\mathcal{R}} v'$ with $v' = (C[u]_p)[r\sigma]_{p'}$. Since $v|_p = C|_p = c$ we can apply the induction hypothesis to $v \to^*_{\mathcal{R}} t$. This yields $v' \to^*_{\mathcal{R}} t$ and hence $s[u]_p \to^*_{\mathcal{R}} t$ as desired. In the remaining case, $p' \leqslant p$. From $s|_p = c$ and the fact that $c$ does not appear in $\mathcal{R}$ we infer that there exists a variable $y \in \mathrm{Var}(\ell)$ such that $c \in \mathrm{Fun}(\sigma(y))$. Let $q$ be the (unique) position of $y$ in $\ell$ and consider the substitution
$$\tau(x) = \begin{cases} \sigma(y)[u]_{q'} & \text{if } x = y \\ \sigma(x) & \text{otherwise} \end{cases}$$
Here $q' = p \setminus (p'q)$ is the position of $c$ in $\sigma(y)$. If $y \notin \mathrm{Var}(r)$ then $v = C[r\sigma] = C[r\tau]$ and thus $s[u]_p = C[\ell\tau] \to_{\mathcal{R}} C[r\tau] = v \to^*_{\mathcal{R}} t$. If $y \in \mathrm{Var}(r)$ then there exists a unique position $q'' \in \mathrm{Pos}(r)$ such that $r|_{q''} = y$. So $v|_{p'q''q'} = c$ and we obtain $s[u]_p = C[\ell\tau] \to_{\mathcal{R}} C[r\tau] = v[u]_{p'q''q'} \to^*_{\mathcal{R}} t$ from the induction hypothesis.

In the proofs below Lemma 31 (also for $\mathcal{R}^-$) is used as follows. 
Let $\sigma$ denote the substitution mapping all variables to $c$. If $s\sigma \to^*_{\mathcal{R}} t$ then $s \to^*_{\mathcal{R}} t$ by repeated applications of Lemma 31 (if the conditions are satisfied).

We now prove that two fresh constants are sufficient to reduce commutation (COM), confluence (CR), local confluence (WCR), unique normal forms (UNC and UNR), and the normal form property (NFP) to the corresponding ground properties.

Lemma 32 Linear variable-separated TRSs $\mathcal{R}$ and $\mathcal{S}$ over a common signature $\mathcal{F}$ commute if and only if $\mathcal{R}$ and $\mathcal{S}$ ground-commute over $\mathcal{F} \uplus \{c, d\}$.

Proof The only-if direction follows from Lemma 27. For the if direction suppose $\mathcal{R}$ and $\mathcal{S}$ ground-commute on terms in $\mathcal{T}(\mathcal{F} \uplus \{c, d\})$. In order to conclude that $\mathcal{R}$ and $\mathcal{S}$ commute on terms in $\mathcal{T}(\mathcal{F}, \mathcal{V})$, according to Lemma 28, it suffices to show the inclusions
$${\to^{*\varepsilon*}_{\mathcal{R}^-} \cdot \to^*_{\mathcal{S}}} \subseteq {\to^*_{\mathcal{S}} \cdot \to^*_{\mathcal{R}^-}} \qquad {\to^*_{\mathcal{R}^-} \cdot \to^{*\varepsilon*}_{\mathcal{S}}} \subseteq {\to^*_{\mathcal{S}} \cdot \to^*_{\mathcal{R}^-}}$$
on terms in $\mathcal{T}(\mathcal{F}, \mathcal{V})$. Suppose $s \to^{*\varepsilon*}_{\mathcal{R}^-} \cdot \to^*_{\mathcal{S}} t$. Let the substitution $\sigma_c$ map all variables to $c$ and let $\sigma_d$ map all variables to $d$. Since rewriting is closed under substitutions and the variable-separated rule used in the root step $\to^{\varepsilon}_{\mathcal{R}^-}$ allows changing the substitution, we obtain $s\sigma_c \to^{*\varepsilon*}_{\mathcal{R}^-} \cdot \to^*_{\mathcal{S}} t\sigma_d$. From ground commutation we obtain $s\sigma_c \to^*_{\mathcal{S}} \cdot \to^*_{\mathcal{R}^-} t\sigma_d$. Note that $s$ and $t$ are terms in $\mathcal{T}(\mathcal{F}, \mathcal{V})$ and hence do not contain the constants $c$ and $d$. Therefore, $d \notin \mathrm{Fun}(s\sigma_c)$ and $c \notin \mathrm{Fun}(t\sigma_d)$. As a consequence, repeated applications of Lemma 31 transform $s\sigma_c \to^*_{\mathcal{S}} \cdot \to^*_{\mathcal{R}^-} t\sigma_d$ into a sequence $s \to^*_{\mathcal{S}} \cdot \to^*_{\mathcal{R}^-} t$ in which $c$ and $d$ do not appear, proving the first inclusion. Note that in our setting TRSs are closed under rule reversal. Hence we can apply Lemma 31 in both directions, which allows us to remove the constant $d$ from the term $t$. The second inclusion ${\to^*_{\mathcal{R}^-} \cdot \to^{*\varepsilon*}_{\mathcal{S}}} \subseteq {\to^*_{\mathcal{S}} \cdot \to^*_{\mathcal{R}^-}}$ is obtained in the same way.

If the TRSs $\mathcal{R}$ and $\mathcal{S}$ are left-linear right-ground (as opposed to linear variable-separated) then the term $t$ in the above proof is ground due to the root step involved. 
Hence $t\sigma_d = t$, which allows us to simplify the proof and strengthen the statement to use only one additional constant.

Lemma 33 Left-linear right-ground TRSs $\mathcal{R}$ and $\mathcal{S}$ over a common signature $\mathcal{F}$ commute if and only if $\mathcal{R}$ and $\mathcal{S}$ ground-commute over $\mathcal{F} \uplus \{c\}$.

The proof for confluence follows directly from commutation. The proofs for the other properties in $\mathcal{P}_1$ are obtained in a similar manner. We present the proof details for strong confluence since it requires a bit more effort.

Lemma 34 Let $\mathcal{R}$ be a linear variable-separated TRS over a signature $\mathcal{F}$. If $P \in \mathcal{P}_1$ then
$$(\mathcal{F}, \mathcal{R}) \vDash P \iff (\mathcal{F} \uplus \{c, d\}, \mathcal{R}) \vDash \mathrm{G}P$$

Proof We present the if direction for $P = \text{SCR}$. First we use Lemma 30 to reduce the problem to local peaks involving a root step. Following the reasoning in the proof of Lemma 32, we obtain a witness $v$ such that $t\sigma_d \to^{=}_{\mathcal{R}} v \mathrel{{}^{*}_{\mathcal{R}}{\leftarrow}} u\sigma_c$. If $t\sigma_d = v$ then $u\sigma_c \to^*_{\mathcal{R}} t\sigma_d$ and we obtain $u \to^*_{\mathcal{R}} t$ with the help of Lemma 31. So assume $u\sigma_c \to^*_{\mathcal{R}} \cdot \to_{\mathcal{R}^-} t\sigma_d$. Using Lemma 31 and induction on the number of variables in $u$ we deduce $u \to^*_{\mathcal{R}} \cdot \to_{\mathcal{R}^-} t\sigma_d$. The same argument applied to $t$ produces $u \to^*_{\mathcal{R}} w \to_{\mathcal{R}^-} t$. Note that $w$ may contain occurrences of the constants $c$ and $d$ since $\mathcal{R}$ is a variable-separated TRS. We use the map defined in the proof of Lemma 27 to eliminate these: $u = \varphi_c^x(\varphi_d^x(u)) \to^*_{\mathcal{R}} \varphi_c^x(\varphi_d^x(w)) \to_{\mathcal{R}^-} \varphi_c^x(\varphi_d^x(t)) = t$.

Lemma 35 Let $\mathcal{R}$ be a left-linear right-ground TRS over a signature $\mathcal{F}$. If $P \in \mathcal{P}_1 \setminus \{\text{UNC}\}$ then
$$(\mathcal{F}, \mathcal{R}) \vDash P \iff (\mathcal{F} \uplus \{c\}, \mathcal{R}) \vDash \mathrm{G}P$$
Moreover,
$$(\mathcal{F}, \mathcal{R}) \vDash \text{UNC} \iff (\mathcal{F} \uplus \{c, d\}, \mathcal{R}) \vDash \text{GUNC}$$

The simplification in the proof of Lemma 32 for left-linear right-ground systems is not applicable for UNC as conversion can introduce variables. The following example shows that adding a single fresh constant is indeed insufficient for UNC. 
Example 23 The left-linear right-ground TRS $\mathcal{R}$ consisting of the rules
$a \to b$, $f(x, a) \to f(b, b)$, $f(b, x) \to f(b, b)$, $f(f(x, y), z) \to f(b, b)$
does not satisfy UNC since $f(x, b) \leftarrow f(x, a) \to f(b, b) \leftarrow f(y, a) \to f(y, b)$ is a conversion between distinct normal forms. Adding a single fresh constant $c$ is not enough to violate GUNC as the last two rewrite rules ensure that $f(c, b)$ is the only ground instance of $f(x, b)$ that is a normal form. Adding another fresh constant $d$, GUNC is lost: $f(c, b) \leftarrow f(c, a) \to f(b, b) \leftarrow f(d, a) \to f(d, b)$.

The following example shows that at least two fresh constants are required to reduce confluence to ground-confluence for linear variable-separated TRSs.

Example 24 Consider the linear variable-separated TRS $\mathcal{R}$ consisting of the single rule $a \to x$ over the signature $\mathcal{F} = \{a\}$. Since $x \mathrel{{}_{\mathcal{R}}{\leftarrow}} a \to_{\mathcal{R}} y$ with distinct variables $x$ and $y$, $\mathcal{R}$ is not confluent. Ground-confluence holds trivially as $a \to a$ is the only rewrite step between ground terms. Adding a single fresh constant $b$ does not destroy ground-confluence ($a \to a$ and $a \to b$ are the only steps). By adding a second fresh constant $c$, ground-confluence is lost: $b \mathrel{{}_{\mathcal{R}}{\leftarrow}} a \to_{\mathcal{R}} c$.

We now turn our attention to the equivalence properties (CE and NE) in $\mathcal{P}_2$. For conversion equivalence a single fresh constant suffices to reduce it to ground conversion equivalence.

Lemma 36 Linear variable-separated TRSs $\mathcal{R}$ and $\mathcal{S}$ over a common signature $\mathcal{F}$ such that $\mathcal{T}(\mathcal{F}) \neq \varnothing$ are conversion equivalent if and only if $\mathcal{R}$ and $\mathcal{S}$ are ground conversion equivalent over $\mathcal{F} \uplus \{c\}$.

Proof For the if direction we assume that $\mathcal{R}$ and $\mathcal{S}$ are ground conversion equivalent over $\mathcal{F} \uplus \{c\}$. Due to Lemma 28 and symmetry, it suffices to show the inclusion ${\leftrightarrow^{*\varepsilon*}_{\mathcal{R}}} \subseteq {\leftrightarrow^*_{\mathcal{S}}}$ on terms in $\mathcal{T}(\mathcal{F}, \mathcal{V})$. Suppose $s \leftrightarrow^{*\varepsilon*}_{\mathcal{R}} t$. Let $d \in \mathcal{F}$ be a constant, whose existence is guaranteed by the assumption $\mathcal{T}(\mathcal{F}) \neq \varnothing$, and consider the substitutions $\sigma_c$ and $\sigma_d$ in the proof of Lemma 32. Closure under substitutions and variable separation yields $s\sigma_c \leftrightarrow^{*\varepsilon*}_{\mathcal{R}} t\sigma_c$ and $s\sigma_c \leftrightarrow^{*\varepsilon*}_{\mathcal{R}} t\sigma_d$. Ground conversion equivalence gives $s\sigma_c \leftrightarrow^*_{\mathcal{S}} t\sigma_c$ and $s\sigma_c \leftrightarrow^*_{\mathcal{S}} t\sigma_d$, and thus also $t\sigma_c \leftrightarrow^*_{\mathcal{S}} t\sigma_d$. Using Lemma 31 yields $s \leftrightarrow^*_{\mathcal{S}} t\sigma_d$ and $t \leftrightarrow^*_{\mathcal{S}} t\sigma_d$. Hence $s \leftrightarrow^*_{\mathcal{S}} t$ as desired. The only-if direction easily follows from Lemma 27.

Two fresh constants are required to reduce normalization equivalence to its ground version.

Lemma 37 Linear variable-separated TRSs $\mathcal{R}$ and $\mathcal{S}$ over a common signature $\mathcal{F}$ are normalization equivalent if and only if $\mathcal{R}$ and $\mathcal{S}$ are ground normalization equivalent over $\mathcal{F} \uplus \{c, d\}$.

Proof For the if direction we assume that $\mathcal{R}$ and $\mathcal{S}$ are ground normalization equivalent over $\mathcal{F} \uplus \{c, d\}$. Note that this implies that $\mathsf{NF}_{\mathcal{R}}(t) \iff \mathsf{NF}_{\mathcal{S}}(t)$ for all terms $t$. We apply Lemma 28 and symmetry, reducing the problem to $s \to^{*\varepsilon*}_{\mathcal{R}} t \Rightarrow \mathsf{NF}_{\mathcal{R}}(t) \Rightarrow s \to^*_{\mathcal{S}} t$. Let $\sigma_c$ and $\sigma_d$ be substitutions replacing all variables by $c$ and $d$ respectively. Closure under substitution and variable separation yields $s\sigma_c \to^{*\varepsilon*}_{\mathcal{R}} t\sigma_d$, and $\mathsf{NF}_{\mathcal{R}}(t\sigma_d)$ since $d$ does not appear in $\mathcal{R}$. Ground normalization equivalence gives $s\sigma_c \to^*_{\mathcal{S}} t\sigma_d$. Applying Lemma 31 we obtain the desired $s \to^*_{\mathcal{S}} t$. The only-if direction follows from Lemma 27.

Contrary to Lemma 36 one fresh constant is not sufficient as seen by the following counterexample.

Example 25 Consider the two linear variable-separated TRSs
$\mathcal{R}$: $a \to b$, $f(f(x, y), z) \to f(b, b)$, $f(b, x) \to f(b, b)$, $f(x, a) \to f(z, b)$
$\mathcal{S}$: $a \to b$, $f(f(x, y), z) \to f(b, b)$, $f(b, x) \to f(b, b)$, $f(b, a) \to f(z, b)$, $f(f(x, y), a) \to f(z, b)$
They are not normalization equivalent since $f(x, a) \to_{\mathcal{R}} f(z, b)$ and $f(x, a) \not\to^*_{\mathcal{S}} f(z, b)$. The TRSs are however ground normalization equivalent over the signature $\mathcal{F} \uplus \{c\}$. First observe that the only ground normal forms reachable via a rewrite sequence involving a root step are $b$ and $f(c, b)$. The normal form $b$ is reached (using a root step) only from $a$, in both $\mathcal{R}$ and $\mathcal{S}$. 
The normal form $f(c, b)$ can be reached from all ground terms of the shape $f(t, a)$. For $\mathcal{R}$ this is obvious and for $\mathcal{S}$ this can be seen by a case analysis on the root symbol of $t$. Adding a second constant $d$ allows one to mimic the original counterexample since $f(c, a) \to_{\mathcal{R}} f(d, b)$ and $f(c, a) \not\to^*_{\mathcal{S}} f(d, b)$.

For left-linear right-ground TRSs, a single fresh constant is enough to reduce normalization equivalence to ground normalization equivalence.

Lemma 38 Left-linear right-ground TRSs $\mathcal{R}$ and $\mathcal{S}$ over a common signature $\mathcal{F}$ are normalization equivalent if and only if $\mathcal{R}$ and $\mathcal{S}$ are ground normalization equivalent over $\mathcal{F} \uplus \{c\}$.

Proof We mention the differences with the proof of Lemma 37. For the equivalence of $\mathsf{NF}_{\mathcal{R}}(t)$ and $\mathsf{NF}_{\mathcal{S}}(t)$ for arbitrary terms $t \in \mathcal{T}(\mathcal{F}, \mathcal{V})$, a single constant suffices. If $s \to^{*\varepsilon*}_{\mathcal{R}} t$ then $t$ is ground. Hence $s\sigma_c \to^*_{\mathcal{R}} t$ and thus $s\sigma_c \to^*_{\mathcal{S}} t$ by ground normalization equivalence. Lemma 31 gives $s \to^*_{\mathcal{S}} t$.

Each additional constant can increase the execution time of FORT-h significantly, as seen later in Example 36. Hence results that reduce the required number are of obvious interest. In the remainder of this section we present results for ground TRSs and for TRSs over monadic signatures, which are signatures that consist of constants and unary function symbols.

Lemma 39 Let $\mathcal{R}$ and $\mathcal{S}$ be right-ground TRSs over a signature $\mathcal{F}$. If $\mathcal{R}$ and $\mathcal{S}$ are ground or $\mathcal{F}$ is monadic then
$(\mathcal{F}, \mathcal{R}) \vDash P \iff (\mathcal{F}, \mathcal{R}) \vDash \mathrm{G}P$ for all $P \in \mathcal{P}_1$
$(\mathcal{F}, \mathcal{R}, \mathcal{S}) \vDash \text{COM} \iff (\mathcal{F}, \mathcal{R}, \mathcal{S}) \vDash \text{GCOM}$

Proof First assume that $\mathcal{R}$ is ground. In this case only ground subterms can be rewritten. Given a term $t \in \mathcal{T}(\mathcal{F}, \mathcal{V})$, we write $t = C[\![t_1, \dots, t_n]\!]$ if $t = C[t_1, \dots, t_n]$ and $t_1, \dots, t_n$ are the maximal ground subterms of $t$. So all variables appearing in $t$ occur in $C$. The following property is obvious:
(a) if $t = C[\![t_1, \dots, t_n]\!] \to^*_{\mathcal{R}} u$ then $u = C[\![u_1, \dots, u_n]\!]$ and $t_i \to^*_{\mathcal{R}} u_i$ for all $1 \leqslant i \leqslant n$.
Suppose $(\mathcal{F}, \mathcal{R}) \vDash \text{GCR}$ and consider $s \to^*_{\mathcal{R}} t$ and $s \to^*_{\mathcal{R}} u$ with $s \in \mathcal{T}(\mathcal{F}, \mathcal{V})$. Writing $s = C[\![s_1, \dots, s_n]\!]$, we obtain $t = C[\![t_1, \dots, t_n]\!]$ and $u = C[\![u_1, \dots, u_n]\!]$ with $s_i \to^*_{\mathcal{R}} t_i$ and $s_i \to^*_{\mathcal{R}} u_i$ for all $1 \leqslant i \leqslant n$. GCR yields $t_i \downarrow u_i$ for all $1 \leqslant i \leqslant n$. Hence $t \downarrow u$ as desired. The proofs for the other properties in $\mathcal{P}_1$ are equally easy. For UNC we note that $\leftrightarrow^*_{\mathcal{R}}$ coincides with $\to^*_{\mathcal{R} \cup \mathcal{R}^-}$ for the ground TRS $\mathcal{R} \cup \mathcal{R}^-$.

Next suppose that $\mathcal{F}$ is monadic. Let $(\mathcal{F}, \mathcal{R}) \vDash \mathrm{G}P$ and let $\sigma$ be the substitution that maps all variables to some arbitrary but fixed ground term. In this case the following property holds:
(b) if $t \in \mathcal{T}(\mathcal{F}, \mathcal{V})$ and $t \to_{\mathcal{R}} u$ then $u \in \mathcal{T}(\mathcal{F})$ and $t\sigma \to_{\mathcal{R}} u$.
We consider $P = \text{NFP}$ and $P = \text{UNC}$ and leave the proof for the other properties to the reader. Let $s \to^*_{\mathcal{R}} t$ and $s \to^{!}_{\mathcal{R}} u$. We obtain $s\sigma \to^*_{\mathcal{R}} t$ and $s\sigma \to^{!}_{\mathcal{R}} u$ from property (b). (Note that $s \neq u$.) Hence $t \to^{!}_{\mathcal{R}} u$ follows from GNFP. Let $t \leftrightarrow^*_{\mathcal{R}} u$ with normal forms $t$ and $u$. If $t$ and $u$ are ground terms then we obtain $t = u$ from GUNC (after applying the substitution $\sigma$ to all intermediate terms in the conversion between $t$ and $u$). Otherwise, the conversion between $t$ and $u$ must be empty due to property (b) and the fact that $t$ and $u$ are normal forms. Hence also in this case $t = u$.

In contrast to COM, the properties NE and CE require additional constants for TRSs over monadic signatures.

Example 26 The linear variable-separated TRSs
$\mathcal{R}$: $f(x) \to a$
$\mathcal{S}$: $f(a) \to a$, $f(f(a)) \to a$
are neither normalization equivalent nor conversion equivalent as can be seen from $f(x) \to_{\mathcal{R}} a$ and $f(x) \not\leftrightarrow^*_{\mathcal{S}} a$. Since every ground term rewrites in $\mathcal{R}$ and in $\mathcal{S}$ to the unique ground normal form $a$, the TRSs are ground normalization equivalent as well as ground conversion equivalent.

Nevertheless, we can reduce the number of constants to one if the signature is monadic. A key observation is that in non-empty rewrite sequences in a linear variable-separated TRS over a monadic signature fresh constants can be replaced by arbitrary terms. 
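Example 26 can be replayed mechanically on small terms. The following is only a sketch under simplifying assumptions that are not part of the paper: monadic terms are written as strings (so `"ffa"` stands for f(f(a))), the characters `x`, `y`, `z` are treated as variables, and only right-ground rules are supported. The helper names `step` and `normal_forms` are hypothetical.

```python
VARIABLES = set("xyz")  # assumption: these characters denote variables


def step(term, rules):
    """Return all one-step successors of a monadic term given as a string."""
    successors = set()
    for i in range(len(term)):
        sub = term[i:]  # the subterm at depth i
        for lhs, rhs in rules:
            if lhs[-1] in VARIABLES:
                # lhs ends in a variable: it matches any subterm with the same prefix
                matches = len(sub) >= len(lhs) and sub.startswith(lhs[:-1])
            else:
                # ground lhs: the subterm must be identical
                matches = sub == lhs
            if matches:
                successors.add(term[:i] + rhs)  # rhs is assumed to be ground
    return successors


def normal_forms(term, rules):
    """Collect all normal forms reachable from term (assumes finitely many reducts)."""
    seen, todo, result = {term}, [term], set()
    while todo:
        t = todo.pop()
        succ = step(t, rules)
        if not succ:
            result.add(t)
        for u in succ:
            if u not in seen:
                seen.add(u)
                todo.append(u)
    return result


R = [("fx", "a")]                 # f(x) -> a
S = [("fa", "a"), ("ffa", "a")]   # f(a) -> a, f(f(a)) -> a

if __name__ == "__main__":
    for n in range(4):
        ground = "f" * n + "a"
        print(ground, normal_forms(ground, R), normal_forms(ground, S))
    print("fx", normal_forms("fx", R), normal_forms("fx", S))
```

On the ground terms tried here both systems agree, whereas the open term f(x) is reducible in R but already a normal form in S, which is the discrepancy the example describes.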
Lemma 40 Let $\mathcal{R}$ be a linear variable-separated TRS over a monadic signature $\mathcal{F}$ that contains a constant $c$ which does not appear in $\mathcal{R}$. If $s \to^+_{\mathcal{R}} t$ and $p \in \mathrm{Pos}(s)$ such that $s|_p = c$ then $s[u]_p \to^+_{\mathcal{R}} t$ using the same rewrite rules at the same positions, for all terms $u$.

The proof follows the same structure as Lemma 31 and the details are left for the reader. As linear variable-separated TRSs are closed under inverse we can immediately deduce that rewrite sequences of the shape $s\sigma_c \to^+_{\mathcal{R}} t\sigma_c$ imply $s \to^+_{\mathcal{R}} t$ for monadic systems. With this we are ready to prove our claim.

Lemma 41 Linear variable-separated TRSs $\mathcal{R}$ and $\mathcal{S}$ over a common monadic signature $\mathcal{F}$ are normalization equivalent if and only if $\mathcal{R}$ and $\mathcal{S}$ are ground normalization equivalent over $\mathcal{F} \uplus \{c\}$.

Proof We again mention the differences with the proof of Lemma 37. For the equivalence of $\mathsf{NF}_{\mathcal{R}}(t)$ and $\mathsf{NF}_{\mathcal{S}}(t)$ for arbitrary terms $t \in \mathcal{T}(\mathcal{F}, \mathcal{V})$, a single constant suffices. Consider a rewrite sequence $s \to^{*\varepsilon*}_{\mathcal{R}} t$ with $\mathsf{NF}_{\mathcal{R}}(t)$. Ground normalization equivalence and substitution closure yield $s\sigma_c \to^*_{\mathcal{S}} t\sigma_c$. Furthermore, since the sequence $s \to^{*\varepsilon*}_{\mathcal{R}} t$ is non-empty by definition we know that $\neg\mathsf{NF}_{\mathcal{R}}(s\sigma_c)$, which in turn yields $\neg\mathsf{NF}_{\mathcal{S}}(s\sigma_c)$. Together with $\mathsf{NF}_{\mathcal{S}}(t\sigma_c)$ this means $s\sigma_c \neq t\sigma_c$, and we obtain $s\sigma_c \to^+_{\mathcal{S}} t\sigma_c$. Applying Lemma 40 twice allows us to replace $c$ in $s\sigma_c$ and $t\sigma_c$ by the corresponding variables, leading to $s \to^+_{\mathcal{S}} t$.

Table 3 Additional constants required to reduce a property to the corresponding ground property

Property   Ground TRSs   Left-linear right-ground TRSs   Linear variable-separated TRSs
CR         0             1 (0)                           2 (2)
SCR        0             1 (0)                           2 (2)
WCR        0             1 (0)                           2 (2)
COM        0             1 (0)                           2 (2)
UNR        0             1 (0)                           2 (2)
UNC        0             2 (0)                           2 (2)
NFP        0             1 (0)                           2 (2)
CE         0             1 (1)                           1 (1)
NE         0             1 (1)                           2 (1)

The following example shows that we cannot reduce the number of constants (in Lemmata 32 and 34) for linear variable-separated TRSs over a monadic signature and properties $P \in \mathcal{P}_1 \cup \{\text{COM}\}$. 
Example 27 The monadic linear variable-separated TRS $\mathcal{R}$ consisting of the rules
$g(a) \to g(x)$, $g(g(x)) \to g(y)$
does not satisfy WCR and UNR, and hence also not CR, SCR, NFP and UNC, because $g(x) \leftarrow g(a) \to g(y)$ with different normal forms $g(x)$ and $g(y)$. Adding a single fresh constant $c$ is insufficient to violate GSCR and thus also GCR, GWCR, GNFP, GUNC and GUNR, because every term in $\mathcal{T}(\{g, a, c\})$ can reach precisely one of the three ground normal forms $a$, $c$ or $g(c)$ and they can all do so in at most one step. Adding an additional constant $d$ does suffice: $g(c) \leftarrow g(a) \to g(d)$ with different ground normal forms $g(c)$ and $g(d)$. The same behaviour is observed for COM by noting that a TRS is (ground) confluent if and only if it (ground) commutes with itself.

The results in this section are summarized in Table 3, which shows the number of additional constants needed to reduce a property to the corresponding property on ground terms. In parentheses are the numbers for monadic TRSs. For termination (SN) and normalization (WN) there is no need to add fresh constants, since these properties hold if and only if they hold for all ground terms.

For other properties that can be expressed in the first-order theory of rewriting, one or two fresh constants may be insufficient. Consider for instance the formula $\varphi$:
$$\neg \exists s\, \exists t\, \exists u\, \forall v\, (v \leftrightarrow^* s \vee v \leftrightarrow^* t \vee v \leftrightarrow^* u)$$
which is satisfied on arbitrary terms (with respect to any left-linear right-ground TRS $(\mathcal{F}, \mathcal{R})$). For the TRS consisting of the rule $f(x) \to a$ and two additional constants $c$ and $d$, $\varphi$ does not hold for ground terms because every ground term is convertible to $a$, $c$ or $d$. It is tempting to believe that adding a fresh unary symbol $g$ in addition to a fresh constant $c$, in order to create infinitely many ground normal forms which can replace variables that appear in open terms, is sufficient for any property $P$. 
The formula ∀s ∀t (s → t ⇒ s → t) and the TRS consisting of the rule a → b show that this is incorrect.

7 Automation and Certification

7.1 Decision Mode

FORT-h is a new decision tool for the first-order theory of rewriting. It is a reimplementation of the decision mode of the previous FORT tool [48], referred to as FORT-j in the remainder of the paper. The decision procedure implemented in FORT-j is based on the original procedure described in [10, 11], in which the basic relations are one-step and parallel rewriting. Anchored GTTs, which form the backbone of the formalized decision procedure described in this paper and implemented in FORT-h, were developed later. The new tool is implemented in Haskell whereas FORT-j is written in Java. FORT-h supports all features of FORT-j while extending the domain of supported TRSs from left-linear right-ground TRSs to linear variable-separated ones. While FORT-j could technically take such TRSs as input, it is unsound when checking non-ground properties on them.

Example 28 To check confluence of the linear variable-separated TRS
g(g(x)) → g(y), a → g(a)
FORT-h can be called with the formula CR. It correctly answers NO: the system is not confluent. However, FORT-j incorrectly identifies this TRS as confluent due to the lack of support for variables appearing in right-hand sides of rules.

FORT-h took part in the 2020, 2021 and 2022 editions of the Confluence Competition (CoCo), competing in five categories: COM, GCR, NFP, UNC and UNR. In 2021 and 2022 it also competed together with FORTify in the categories COM, TRS, GCR, UNC, UNR and NFP (only in 2022), producing certified answers. Even though it does not support many problems tested in the competition, due to the restriction to linear variable-separated TRSs, it was able to win the category for most YES results in UNR in all three years.

The tool expects as input a formula and one or more TRSs, as seen in Fig. 7. 
It then outputs the answer YES or NO depending on whether the formula is satisfied or not by the given TRSs. The command-line interface of FORT-h is described in Appendix B. The implemented procedure closely follows the procedure described in Sect. 5.5. When called, it first parses the formula (format described below) and converts it into an internal representation using de Bruijn indices as described in Sect. 7.2. Additionally, universal quantifiers and implications are eliminated, and negations are pushed as far as possible to the atomic subformulas. The tool then traverses the formula in a bottom-up fashion, constructing the corresponding anchored GTTs and $\mathrm{RR}_n$ automata. During this traversal we also keep track of the steps taken, to construct the certificate if necessary. To improve performance the automata are cached and reused for equal subformulas. The tree automaton representing the whole formula is then checked for emptiness. If the accepted language is empty, FORT-h reports NO, otherwise it outputs YES.

To avoid having to write formulas using de Bruijn indices when using FORT-h, we use a more convenient syntax for interacting with the tool. The input format (later called FORT syntax) is described in Appendix A.

(CoCo website: http://project-coco.uibk.ac.at/)

Fig. 7 FORT-h and FORTify

7.1.1 Witness Generation

The usual output of FORT-h consists of a YES or NO answer, and possibly a certificate containing size information of the automata. To help the user in understanding why a property holds or does not hold we support witness generation. This is possible in two cases: firstly for satisfiable existentially quantified formulas, where FORT-h can produce an $n$-tuple of ground terms as evidence of existence, and secondly for unsatisfiable universally quantified formulas, where the tuple presents a counterexample. 
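The search for a smallest accepted term, which this subsection describes as a variant of Dijkstra's shortest path algorithm over automaton states, can be sketched as follows. This is not FORT-h's Haskell code but a minimal Python illustration under simplifying assumptions: the automaton works on plain terms rather than encoded $n$-tuples (so there is no padding symbol ⊥), terms are nested pairs `(symbol, children)`, and the names `smallest_witness`, `term_size` and `show` are hypothetical.

```python
import heapq
from itertools import count


def term_size(term):
    """Number of function symbols in a term (symbol, children)."""
    symbol, children = term
    return 1 + sum(term_size(child) for child in children)


def show(term):
    """Render a term as a string, e.g. g(g(a))."""
    symbol, children = term
    return symbol if not children else symbol + "(" + ", ".join(map(show, children)) + ")"


def smallest_witness(rules, eps, finals):
    """Return a smallest term reaching a final state, or None if there is none.

    rules  : list of (symbol, tuple of argument states, target state)
    eps    : list of (q, p) epsilon transitions q -> p
    finals : set of final states
    """
    witness = {}     # visited states with their minimal witness
    tie = count()    # tie-breaker so the heap never compares terms
    queue = []
    # start at the states reachable in one step from a constant
    for symbol, args, target in rules:
        if not args:
            heapq.heappush(queue, (1, next(tie), target, (symbol, ())))
    while queue:
        size, _, q, w = heapq.heappop(queue)
        if q in finals:
            return w
        if q in witness:
            continue             # q was already visited with a smaller witness
        witness[q] = w
        for a, b in eps:
            if a == q:           # epsilon transition: same witness
                heapq.heappush(queue, (size, next(tie), b, w))
        for symbol, args, target in rules:
            if q in args and all(a in witness for a in args):
                new = (symbol, tuple(witness[a] for a in args))
                heapq.heappush(queue, (term_size(new), next(tie), target, new))
    return None                  # no final state reachable: empty language


if __name__ == "__main__":
    rules = [("a", (), "q0"), ("g", ("q0",), "q1"),
             ("f", ("q1", "q1"), "qf"), ("g", ("q1",), "q2")]
    w = smallest_witness(rules, [("q2", "qf")], {"qf"})
    print(show(w))
```

In this toy automaton the final state is reachable both via f(g(a), g(a)) and, through the ε-transition, via g(g(a)); the priority queue makes the search return the smaller of the two.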
For instance, if a given or synthesized TRS is not ground-confluent, $\neg \forall s\, \forall t\, \forall u\, (s \to^* t \wedge s \to^* u \Rightarrow \exists v\, (t \to^* v \wedge u \to^* v))$, it is interesting to provide witnessing terms for the variables $s$, $t$, and $u$. Given the TRS consisting of the rules
$a \to f(a, b)$, $f(a, b) \to f(b, a)$
FORT-h produces the following terms as witnesses: $s = f(a, b)$, $t = f(b, a)$, and $u = f(f(a, b), b)$. To find these ground terms FORT-h first eliminates universal quantifiers using $\forall = \neg\exists\neg$, pushes negations inwards and removes double negations in the formula, resulting in $\exists s\, \exists t\, \exists u\, (s \to^* t \wedge s \to^* u \wedge \neg \exists v\, (t \to^* v \wedge u \to^* v))$. In the next step FORT-h strips outermost negations, none in this case, followed by outermost existential quantifiers, resulting in the so-called formula body: $(s \to^* t \wedge s \to^* u \wedge \neg \exists v\, (t \to^* v \wedge u \to^* v))$. Since the original formula is satisfiable, the $\mathrm{RR}_n$ automaton corresponding to the formula body must accept at least one $n$-tuple of ground terms.

The algorithm depicted in Fig. 8 generates (encoded) witnesses that are accepted by the given $\mathrm{RR}_n$ automaton. To find minimal witnesses we use a version of Dijkstra's shortest path algorithm. We keep track of visited states in $Q$, a mapping $W$ from states to terms where $W(q)$ is a minimal witness which reaches the state $q$, and a priority queue $P$. The search is started at the states reachable in a single step from some constant. We also map from these states to the respective constants as witnesses in $W$.

Fig. 8 Witness generation

In each iteration we select the state $q$ with the smallest witness $w$ from $P$. The size of a witness is determined by the function $\mathrm{size}(\langle w_1, \dots, w_n \rangle) = \mathrm{size}(w_1) + \dots + \mathrm{size}(w_n)$, where $\mathrm{size}(w_i)$ is the total number of function symbols in $\mathcal{F}$ occurring in $w_i$, so $\bot$ is not counted. If $q$ is a final state we have found an accepted term and return the witness $w$. 
Otherwise we check that we have not visited $q$ previously, set $W(q) = w$, and enumerate all transition rules containing $q$ on the left-hand side where all states on the left-hand side have been visited, and hence have a witness. If the transition rule is an epsilon transition $q \to p$, then the state $p$ has the same witness as $q$ so we add $(p, w, \mathrm{size}(w))$ to $P$. For a transition rule $f_1 \cdots f_k(q_1, \dots, q_m) \to p$ we construct a witness $w' = f_1 \cdots f_k(W(q_1), \dots, W(q_m))$ and add $(p, w', \mathrm{size}(w'))$ to the queue. The search continues until a final state is reached or all reachable states have been visited. In the latter case the algorithm fails, since the automaton does not accept any terms.

7.1.2 Collapsing ε-transitions

Keeping the size of automata small is crucial for the performance of FORT-h. One way to reduce the number of states and transitions is based on the observation that when two states $q$ and $p$ are strongly connected by ε-transitions, which means $q \to^*_{\varepsilon} p$ and $p \to^*_{\varepsilon} q$, then they are equivalent. In other words, for all ground terms $t$ we have $t \to^* q$ if and only if $t \to^* p$, and for all ground contexts $C$ and states $r$ we have $C[q] \to^* r$ if and only if $C[p] \to^* r$. We can therefore replace all occurrences of a state in the transition rules by an equivalent one without changing the accepted language. This reduces the number of states, and may remove duplicate transition rules.

In FORT-h we can further take advantage of the fact that some of the most common constructions already produce sets of ε-transitions which are transitively closed. Instead of constructing the strongly connected components, checking if two states $q$ and $p$ are strongly connected then boils down to checking if $q \to_{\varepsilon} p$ and $p \to_{\varepsilon} q$. For example, this is the case after computing the transitive closure of anchored GTT relations as in Theorems 6 and 8. We therefore immediately collapse and eliminate the ε-transitions in the underlying tree automata after these constructions.
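The collapsing step described in this subsection can be sketched as follows. The code is a hedged illustration, not the FORT-h implementation: it handles the general case by computing reachability over the ε-transitions (rather than exploiting transitively closed sets), picks the smallest state of each strongly connected component as representative, and uses the hypothetical name `collapse_epsilon`; states are assumed to be comparable values such as integers.

```python
def collapse_epsilon(rules, eps):
    """Merge states that are strongly connected by epsilon transitions.

    rules : iterable of (symbol, tuple of argument states, target state)
    eps   : iterable of (q, p) epsilon transitions q -> p
    Returns the rewritten rules, the remaining epsilon transitions and
    the map from merged states to their representatives.
    """
    eps = set(eps)
    states = {q for edge in eps for q in edge}
    # reach[s] = states reachable from s via zero or more epsilon transitions
    reach = {q: {q} for q in states}
    changed = True
    while changed:
        changed = False
        for q, p in eps:
            for s in states:
                if q in reach[s] and not reach[p] <= reach[s]:
                    reach[s] |= reach[p]
                    changed = True
    # q and p are equivalent when each reaches the other
    rep = {q: min(p for p in reach[q] if q in reach[p]) for q in states}

    def r(q):
        return rep.get(q, q)

    new_rules = sorted({(f, tuple(r(a) for a in args), r(p)) for f, args, p in rules})
    # drop epsilon transitions that became trivial loops after merging
    new_eps = sorted({(r(q), r(p)) for q, p in eps if r(q) != r(p)})
    return new_rules, new_eps, rep


if __name__ == "__main__":
    rules = [("a", (), 2), ("b", (), 3), ("c", (), 4)]
    eps = [(0, 3), (1, 2), (2, 3), (3, 2), (1, 4)]
    print(collapse_epsilon(rules, eps))
```

In this toy input the states 2 and 3 form the only non-trivial component, so every occurrence of 3 is replaced by 2 and the loop 2 → 2 is dropped, mirroring the kind of simplification the text describes.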
Example 29 The anchored GTT G = (A, B) with

    A:  a → 0   b → 1   0 → 3   1 → 2   1 → 4
    B:  a → 2   b → 3   c → 4

accepts the rewrite relation of the ARS {a → b, b → a, b → c}. When constructing $G^+ = (A \cup \Delta^+(B, A),\; B \cup \Delta^+(A, B))$, we need to compute the ε-transitions in $\Delta^+(A, B)$. The result is shown in Fig. 9(a). We can see that the graph contains one non-trivial strongly connected component, consisting of the states {2, 3}. Instead of adding all 10 ε-transitions we can therefore simplify G and Δ beforehand by replacing all occurrences of state 3 by state 2. This reduces the number of transitions in $\Delta^+(A, B)$ to 4, as shown in Fig. 9(b), which, when added to G, results in the GTT (A′, B′) with

    A′:  a → 0   b → 1   0 → 2   1 → 2   1 → 4   2 → 0   2 → 4   2 → 1
    B′:  a → 2   b → 2   c → 4   0 → 2   4 → 2   1 → 2

Note that we also dropped the redundant transition 2 → 2 from $\Delta^+(A, B)$.

Fig. 9 Collapsing ε-transitions in $\Delta^+(A, B)$

7.2 Certification

Whereas witness generation can only provide some evidence to assist the user in understanding why certain formulas hold or not, in certification we are interested in machine-readable proofs that are verified by an independent and trustworthy certifier. The first step in the certification process is to translate formulas in the first-order theory of rewriting into a format suitable for further processing. We adopt de Bruijn indices [13] to avoid alpha renaming.

Example 30 Consider the formula

    $\forall s\,\forall t\,\forall u\,(s \to_0^* t \land s \to_1^* u \Rightarrow \exists v\,(t \to_1^* v \land u \to_0^* v))$

It expresses the commutation of two TRSs, indicated by the indices 0 and 1. Using de Bruijn indices for the term variables s, t, u, v produces

    $\forall\forall\forall\,(2 \to_0^* 1 \land 2 \to_1^* 0) \Rightarrow \exists\,(2 \to_1^* 0 \land 1 \to_0^* 0)$

We refer to Example 32 for further explanation.

The formal syntax of formulas in certificates is given below. Here rr_2 denotes the supported binary regular relations, which are formally defined after Example 31.
Likewise, rr_1 stands for regular sets (which are identified with unary regular relations).

    formula ::= (rr1 rr_1 term) | (rr2 rr_2 term term)
              | (and formula*) | (or formula*) | (not formula)
              | (forall formula) | (exists formula) | (true) | (false)
              | (restrict formula (trs+))
    term    ::= nat
    trs     ::= nat | nat-
    nat     ::= 0 | 1 | 2 | ···

De Bruijn indices are used for term variables and nat- denotes a TRS with index nat in which the left- and right-hand sides of the rules have been swapped. The class of linear variable-separated TRSs is closed under this operation. We use it to represent the conversion relation $\leftrightarrow^*$ of a TRS R as the reachability relation $\to^*$ induced by the TRS $R \cup R^-$.

Example 31 The commutation property in Example 30 is rendered as follows:

    (forall (forall (forall (or (not (and (rr2 (step* (0)) 2 1)
                                          (rr2 (step* (1)) 2 0)))
                                (exists (and (rr2 (step* (1)) 2 0)
                                             (rr2 (step* (0)) 1 0)))))))

Here (step* (0)) denotes the RR_2 relation $\to^*$ induced by the first TRS (which is indexed by 0) and (rr2 (step* (1)) 2 0) represents the subformula [1] t ->* v of the FORT formula in Example 30.

We continue with the certificate syntax of RR_1 and RR_2 relations:

    rr_1 ::= (terms) | (nf (trs+)) | (inf rr_2) | (proj (1 | 2) rr_2)
           | (union rr_1 rr_1) | (inter rr_1 rr_1) | (diff rr_1 rr_1)
    rr_2 ::= (gtt gtt pos num) | (product rr_1 rr_1) | (id rr_1)
           | (union rr_2 rr_2) | (inter rr_2 rr_2) | (diff rr_2 rr_2)
           | (comp rr_2 rr_2) | (inverse rr_2)
    pos  ::= >= | e | >
    num  ::= >= | 1 | >
    gtt  ::= (root-step (trs+)) | (gsteps (trs+)) | (inverse gtt)
           | (union gtt gtt) | (acomp gtt gtt) | (gcomp gtt gtt)
           | (inter gtt gtt) | (acomplement gtt) | (atc gtt) | (gtc gtt)

Here (terms) refers to T(F), (nf (trs+)) to the normal forms (NF) induced by the union of the underlying TRSs, and (inf rr_2) to the infinity predicate ($\mathrm{INF}_R$), which is satisfied by all terms having infinitely many successors with respect to the relation R.
Furthermore, (proj (1 | 2) rr_2) denotes projection (π) to the first (second) argument, (gtt gtt pos num) the transformation of a GTT relation into an RR_2 relation with corresponding context closure (Theorems 10 and 11), (id rr_1) the identity relation on the underlying set, and (gtc gtt) ((atc gtt)) the (anchored) transitive closure of the underlying (anchored) GTT relation. The (gsteps (trs+)) construct serves as an abbreviation for (gtc (root-step (trs+))). The constructs defined above closely correspond to the formalized closure operations for the predicates in the first-order theory of rewriting, summarized in the grammar in Fig. 1.

For convenience of tool authors, we add a few other constructs to rr_2. The certifier expands these to a sequence of basic constructs given above.

    rr_2 ::= ··· | (step (trs+)) | (step= (trs+)) | (step+ (trs+))
           | (step* (trs+)) | (step! (trs+)) | (equality)
           | (parallel-step (trs+)) | (root-step (trs+))
           | (root-step= (trs+)) | (root-step+ (trs+))
           | (root-step* (trs+)) | (non-root-step (trs+))
           | (non-root-step= (trs+)) | (non-root-step+ (trs+))
           | (non-root-step* (trs+)) | (meet (trs+))
           | (join (trs+)) | (reflcl (rr_2))

A certificate for a first-order formula ϕ explains how the corresponding RR automaton is constructed. We adopt a line-oriented natural deduction style. The automata are implicit. This is a deliberate design decision to keep certificates small. More importantly, it avoids having to check equivalence of finite tree automata, which is EXPTIME-complete [8, Sect. 1.7].
    certificate ::= (item inference formula info*) certificate
                  | (empty item) | (nonempty item)
    item        ::= nat
    info        ::= (size nat nat nat)
    inference   ::= (rr1 rr_1 term) | (rr2 rr_2 term term) | (and item*)
                  | (or item*) | (not item) | (exists item) | (nnf item)

Currently the info field only serves as an interface between the tool (which provides the certificate) and the certifier to compare the sizes of the constructed automata. In the future we plan to extend this field with concrete automata. This would make it possible to test language equivalence of a tree automaton computed by a tool that supports our certificate language and the one reconstructed by FORTify, thereby providing tool authors with a mechanism to trace buggy constructions in case a certificate is rejected.

We revisit Example 3 to illustrate the construction of certificates.

Example 32 The formula $\varphi = \forall s\,\exists t\,(s \to^* t \land \mathrm{NF}(t))$ expressing normalization is rendered as $\varphi' = \forall\exists\,(1 \to_0^* 0 \land 0 \in \mathrm{NF}[0])$ in de Bruijn notation. Here 1 refers to the variable s, the second and third occurrences of 0 refer to t, and the last occurrence of 0 refers to the first (and only) TRS, which has index 0. We construct the certificate bottom-up, to mimic the decision procedure. The first line is for NF[0]:

    (0 (rr1 (nf (0)) 0) (rr1 (nf (0)) 0))

The components can be read as follows:
• item = 0 denotes the first step in our proof,
• inference = (rr1 (nf (0)) 0) constructs the automaton that accepts the normal forms and keeps track of the variable 0,
• formula = (rr1 (nf (0)) 0) denotes the subformula 0 ∈ NF[0]; it is satisfiable if and only if the automaton constructed using the description in inference is not empty.
The apparent redundancy will disappear when we continue.
We proceed by expressing the relation $\to_0^*$ and subsequently make sure that the second component of $\to_0^*$ is in normal form:

    (1 (rr2 (step* (0)) 1 0) (rr2 (step* (0)) 1 0))
    (2 (and (1 0)) (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0))))

Line 1 is similar to line 0. The inference step (and 1 0) in line 2 constructs an RR automaton that accepts the intersection of the relations modeled in lines 1 and 0. This automaton corresponds to $A_5$ in Example 3. The cylindrification step from $A_1$ to $A_4$ in Example 3 is left implicit. We continue with the projection of variable 0 and afterwards complement the resulting automaton. This is done by an exists followed by a not inference step:

    (3 (exists 2) (exists (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0)))))
    (4 (not 3) (not (exists (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0))))))

The inference steps until this point describe the construction of A in Example 3. We complete the certificate by introducing the remaining operators:

    (5 (exists 4) (exists (not (exists (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0)))))))
    (6 (not 5) (not (exists (not (exists (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0))))))))
    (7 (nnf 6) (forall (exists (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0))))))
    (nonempty 7)

The nnf inference step does not modify the tree automaton computed in step 6 (which corresponds to A in Example 3) but checks the equivalence of the formula in line 6 with the one of line 7, which corresponds to the input formula ϕ′. The equivalence check incorporates ∀ elimination, negation normal form, and associativity, commutativity and idempotency of ∧ and ∨. In the future we might add support for additional equivalences in first-order logic. The final step (nonempty 7) checks that L(A) ≠ ∅. So this certificate claims that the input TRS is normalizing. For TRSs that do not satisfy ϕ, the final line in the certificate would be (empty 7).
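The translation from named variables to the de Bruijn notation used in Examples 30 to 32 can be sketched as follows. This is only an illustration under stated assumptions: the tuple encoding of formulas and the function name to_certificate_formula are hypothetical and not part of FORT or FORTify, and only the connectives needed for Example 30 are covered. An implication is rendered as (or (not ...) ...), as in Example 31.

```python
def to_certificate_formula(formula, env=()):
    """Render a formula with named variables in certificate syntax.

    env lists the bound variable names, innermost binder first, so the
    de Bruijn index of a variable is simply its position in env.
    """
    tag = formula[0]
    if tag in ("forall", "exists"):
        _, var, body = formula
        return f"({tag} {to_certificate_formula(body, (var,) + env)})"
    if tag == "not":
        return f"(not {to_certificate_formula(formula[1], env)})"
    if tag in ("and", "or"):
        parts = " ".join(to_certificate_formula(g, env) for g in formula[1:])
        return f"({tag} {parts})"
    if tag == "implies":
        # A => B is expressed as (or (not A) B).
        return to_certificate_formula(("or", ("not", formula[1]), formula[2]), env)
    if tag == "rr2":
        _, relation, trs_index, x, y = formula
        return f"(rr2 ({relation} ({trs_index})) {env.index(x)} {env.index(y)})"
    raise ValueError(f"unsupported construct: {tag}")


# The commutation property of Example 30, with named variables.
commutation = (
    "forall", "s", ("forall", "t", ("forall", "u", (
        "implies",
        ("and", ("rr2", "step*", 0, "s", "t"), ("rr2", "step*", 1, "s", "u")),
        ("exists", "v",
         ("and", ("rr2", "step*", 1, "t", "v"), ("rr2", "step*", 0, "u", "v"))),
    ))))
rendered = to_certificate_formula(commutation)
```

Keeping the environment ordered from the innermost binder outwards means that entering a quantifier shifts all existing indices by one without any explicit renumbering.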
In the previous example we intentionally skipped over some details to convey the underlying intuition. First of all, the rr_2 construct (step* (0)) is derived and internally unfolded via (anchored) GTTs into

    (gtt (gtc (root-step 0)) >= >)

Starting from an anchored GTT that accepts the root step relation induced by the first (and only) TRS in the list, the GTT transitive closure operation is applied, followed by a multi-hole context closure operation with at least one hole that may appear in any position. The result is an RR_2 automaton that accepts the relation $\to^*$. We also mentioned that cylindrification is implicit. The same holds for the projection operation that is used in the exists inference steps. A projection takes place in the first component if the variable 0 is present in the list of variables, otherwise the inference step preserves the automaton. This approach is sound as variables indicate the relevant components of the RR automaton. Thanks to the de Bruijn representation, the innermost quantifier refers to variable 0, the first component in the given RR automaton. However, we must keep track of all variables occurring in the surrounding formula and update that list accordingly.

7.3 FORTify

The example in the preceding subsection makes clear that a certificate can be viewed as a recipe for the certifier to perform certain operations on automata and formulas to confirm the final (non-)emptiness claim. In particular, checking a certificate is expensive because the decision procedure for the first-order theory is replayed using code-generated operations from a verified version of the decision procedure. In this subsection we describe the steps we performed to turn the Isabelle formalization of the decision procedure into our certifier FORTify. The formalization is split into two parts.
The second part is about the certification process, but we start our description with the first part [35], which serves as a general tree automata library. This part includes bottom-up tree automata with ε-transitions, (anchored) ground tree transducers, encoding of regular relations, and their respective closure properties. Additionally it contains a framework to simplify code generation of inductively defined sets as in Fig. 3. Such inductive sets, if they are finite, can be computed by a saturation procedure. We provide an abstraction for that, which essentially does Horn inference without negative atoms. The point of the abstraction is that it separates a common iterative or recursive part of saturation procedures (which gives rise to non-trivial correctness proofs) from the enumeration of inferences without premises ($H_0$, see below), and inferences induced by a single new conclusion ($H_1$, also below), which usually are set comprehensions that can be computed in a very straightforward way.

Definition 19 A positive Horn inference system is given by a set of atoms A (with elements α, β, …) and a set H of inference rules of the shape $\alpha_1 \land \cdots \land \alpha_n \to \beta$. We write ⊤ → β if the list of premises is empty. Each positive Horn inference system defines a predicate H on atoms inductively by the rule

    α_1 ∧ ··· ∧ α_n → β ∈ H      H(α_i) for 1 ≤ i ≤ n
    ---------------------------------------------------
                          H(β)

Example 33 Consider the inference rules from Fig. 3. To obtain a positive Horn inference system for given automata A and B, let the atoms be the pairs in Q × Q, where Q is the set of states occurring in A or B. The set H consists of the following inference rules:
• (p, r) → (q, r) if $p \to_A q$ and r ∈ Q,
• (p, q) → (p, r) if $q \to_B r$ and p ∈ Q, and
• $(p_1, q_1) \land \cdots \land (p_n, q_n) \to (p, q)$ if $f(p_1, \dots, p_n) \to_A p$ and $f(q_1, \dots, q_n) \to_B q$.
These Horn clauses correspond directly to Fig. 3 with p ⇝ q replaced by (p, q). It is easy to see that the resulting H satisfies (p, q) ∈ H if and only if p ⇝ q.
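For a finite set of ground rules, the predicate of Definition 19 can be computed by iterating the rules until nothing new is derived. The following sketch shows this naive fixed-point computation. It is not the formalized algorithm from the Isabelle development; the function name, the encoding of a rule as a pair (premises, conclusion), and the reachability example are assumptions chosen for illustration.

```python
def saturate(rules):
    """Least set of atoms closed under the given positive Horn rules.

    rules: iterable of (premises, conclusion), where premises is a tuple
    of atoms; an empty tuple encodes a rule without premises.
    """
    derived = set()
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and all(a in derived for a in premises):
                derived.add(conclusion)
                changed = True
    return derived


# Hypothetical example: atoms are pairs (x, y) meaning "y is reachable
# from x"; two facts plus all ground instances of the transitivity rule.
nodes = [1, 2, 3]
rules = [((), (1, 2)), ((), (2, 3))]
rules += [(((x, y), (y, z)), (x, z)) for x in nodes for y in nodes for z in nodes]
reachable = saturate(rules)
```

The loop rescans every rule in each round, which is simple but wasteful. Separating the premise-free rules from the rules triggered by one newly derived atom avoids most of that rescanning.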
We have formalized an abstract marking algorithm for positive Horn inference systems. In order to use this algorithm, the user has to provide implementations for two building blocks, $H_0$ and $H_1$, which are given by

    H_0 = { β | ⊤ → β ∈ H }
    H_1(α, B) = { β | α_1 ∧ ··· ∧ α_n → β ∈ H and α ∈ {α_1, ..., α_n} ⊆ B ∪ {α} }

In essence, $H_0$ computes inferences without premises, whereas $H_1(\alpha, B)$ provides all possible conclusions involving a particular premise α together with other premises fulfilled by B. These two ingredients are sufficient to implement a simple marking algorithm:

    saturate_rec(α, I):
      if α ∈ I then return I
      else
        J := {α} ∪ I;
        for all β ∈ H_1(α, I) do
          J := saturate_rec(β, J);
        return J

    saturate:
      I := ∅;
      for all α ∈ H_0 do
        I := saturate_rec(α, I)
      return I

Most of the work is performed by saturate_rec, whose purpose is to add a newly inferred atom α to an accumulator I of previously inferred atoms, taking into account all further inferences that can be made using α and elements of I. It relies on $H_1$ for computing the set of atoms that can be inferred using α at least once and elements of I for other premises. The main method saturate iterates over the elements of $H_0$ and adds them to the accumulator I using the saturate_rec helper, starting with I = ∅. We formalized soundness of saturate, and of refinements to lists and finite sets.

Example 34 Continuing from Example 33, we note that the computation of $H_0$ and $H_1$ can often be done efficiently without ever computing the full set H. For the inference rules from Fig.
3, we obtain the following descriptions:

    H_0 = { (p, q) | f →_A p and f →_B q }
    H_1((p, q), B) = { (r, q) | p →_A r } ∪ { (p, r) | q →_B r } ∪ H′

where H′ consists of all pairs (p′, q′) such that

    f(p_1, ..., p_n) →_A p′        f(q_1, ..., q_n) →_B q′

with $(p_i, q_i) \in B \cup \{(p, q)\}$ for all 1 ≤ i ≤ n, and $(p, q) = (p_i, q_i)$ for some 1 ≤ i ≤ n. This last component is slightly complicated (but not much more complicated than the definition of H itself). On the other hand, the first two components of $H_1$ make no reference to Q, which is a welcome simplification.

Isabelle/HOL has a predicate compiler [5] that produces executable code for certain inductive sets, but it is quite restricted; basically, it works by searching all possible derivation trees to arrive at a conclusion. This easily leads to non-termination when there are infinitely many such trees, which often happens. For example, using the rules in Fig. 3, if we want to check whether 1 ⇝ 2 and there is an ε-transition 1 → 1, then the first inference rule is a possible candidate for the last inference step, leading us to check 1 ⇝ 2 recursively, ad infinitum.

In our formalization, GTT compositions and GTT transitive closure are implemented on top of positive Horn inference. The other building blocks are derived directly from the definitions, using automatic and some manual refinement to obtain concrete implementations. This concludes the first part.

In the remainder of this section details of the second part are discussed [33]. We use the FOL-Fitting library [4], which is part of the Archive of Formal Proofs, to connect the first-order theory of rewriting and first-order logic. The translation is more or less straightforward. We interpret RR_1 constructions as predicates and RR_2 constructions as relations in first-order logic and prove both interpretations to be semantically equivalent:

    lemma eval_formula F Rs α f = eval α undefined (for_eval_rel F Rs) (form_of_formula f)
With this equivalence we are able to define the semantics of formulas:

    definition formula_satisfiable where
      formula_satisfiable F Rs f ⟷ (∃α. range α ⊆ T F ∧ eval_formula F Rs α f)

    definition formula_unsatisfiable where
      formula_unsatisfiable F Rs fm ⟷ (formula_satisfiable F Rs fm = False)

    definition correct_certificate where
      correct_certificate F Rs claim infs n ≡
        (claim = Empty ⟷ (formula_unsatisfiable (fset F) (map fset Rs) (fst (snd (snd (infs ! n))))) ∧
         claim = Nonempty ⟷ formula_satisfiable (fset F) (map fset Rs) (fst (snd (snd (infs ! n)))))

Last but not least we define the important function check_certificate, which takes as input a signature, a list of TRSs, a Boolean, a formula, and a certificate. This function first verifies that the given formula and the claim correspond to the ones referenced in the certificate and afterwards checks the integrity of the certificate. The following lemmata, which are formally proved in Isabelle, state the correctness of the check_certificate function:

    lemma check_certificate F Rs A fm (Certificate infs claim n) = Some B ⇒
      fm = fst (snd (snd (infs ! n))) ∧ A = (claim = Nonempty)

    lemma check_certificate F Rs A fm (Certificate infs claim n) = Some B ⇒
      (B = True ⟶ correct_certificate F Rs claim infs n) ∧
      (B = False ⟶ correct_certificate F Rs (case claim of Empty ⇒ Nonempty | Nonempty ⇒ Empty) infs n)

The first lemma ensures that our check function verifies that the provided parameters fm (formula) and A (answer satisfiable/unsatisfiable) match the formula and the claim stated in the certificate. The second lemma is the key result. It states that the check function returns Some True if and only if the certificate is correct. The only-if case is hidden in the last two lines. More precisely, if the claim of the certificate is wrong then negating the claim (the first-order theory of rewriting is complete) leads to a correct certificate.
Therefore, if our check function returns Some False then the certificate is correct after negating the claim. Our check function returns None if the global assumptions are not fulfilled (for example, when the input TRS is not linear variable-separated or the signature is empty). We plan to extend the check_certificate function in the near future such that it reports these kinds of errors.

A central part of the formalization is to obtain a trustworthy decision procedure to verify certificates. Hence we use the code generation facility of Isabelle/HOL to produce an executable version of our check_certificate function. Isabelle's code generation facility is able to derive executable code for our constructions with the exception of inductively defined sets. We use the abstract Horn inference system framework of Definition 19 to obtain executable code for the following constructions defined as inductive sets:
• reachable and productive states of a tree automaton,
• states of tree automata obtained by the subset construction,
• ε-transitions for the composition and transitive closure constructions of (anchored) GTTs,
• an inductive set needed for the tree automaton for the infinity predicate.

Table 4 Formalization statistics

    Topics                            Lines    Facts   Defs
    Utility files                      1892      187     19
    Terms, context, and rewriting      3969      454     97
    Horn inference system               462       39     17
    Tree automata                      2891      319     66
    Regular relations                  4016      285     65
    Primitives and context closure     4043      318     43
    FORT decision procedure            2023      107     60
    Signature extension                2874      182     15
    Implementation files               3058      190     81
    Total                            25,228     2081    463

At this point we can use Isabelle's code generation to obtain an executable check function. The resulting code-generated certifier is called FORTify. The overall design of FORTify is shown in the bottom half of Fig. 7. It can be viewed as two separate modules A and B.
Module B is the verified Haskell code base that is generated by Isabelle's code generation facility, containing the check_certificate function and the data type declarations for formulas and certificates. To use this functionality, we wrote a parser which translates strings representing formulas (signatures, TRSs, certificates) to semantically equivalent formulas (signatures, TRSs, certificates) represented in the data types obtained from the generated code. This was done in Haskell and refers to module A in Fig. 7. Module A accepts formulas in FORT syntax. Hence it also applies the conversion to the de Bruijn representation. After the translation in module A, the check_certificate function in module B is executed and its output is reported. Importantly, the code in module A is not verified in Isabelle. Correctness of FORTify must therefore assume correctness of module A as well as the correctness of the Glasgow Haskell Compiler, which we use to generate a standalone executable from the generated code. Table 4 lists some statistics of the underlying formalization.

7.4 Synthesis Mode

FORT can be used to synthesize TRSs that satisfy properties given by the user (which is different from finding witnessing terms in formulas as described in Sect. 7.1). This is useful for finding counterexamples and non-trivial TRSs for exam exercises as well as competitions. The synthesis procedure for a given signature F boils down to generating candidate TRSs and then checking the given property as shown in Fig. 10. The latter is done using a call to the decision procedure decide(F, ϕ, C), which checks if the formula ϕ holds for the system C over the domain T(F).
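The generate-and-test idea behind synthesis can be sketched as follows. This is a toy illustration and not the procedure of Fig. 10 or the FORT-s implementation: the function names, the bounds min_rules and max_rules, the restriction to rules between constants, and the stand-in property checked by toy_decide are all assumptions. In FORT the role of toy_decide is played by the decision procedure.

```python
from itertools import combinations


def synthesize(candidate_rules, decide, min_rules, max_rules):
    """Return the first rule set, smallest first, that decide accepts."""
    for size in range(min_rules, max_rules + 1):
        for system in combinations(candidate_rules, size):
            if decide(system):
                return list(system)
    return None  # no system within the bounds satisfies the property


def reaches(system, start, goal):
    """Reachability in an abstract rewrite system given as (lhs, rhs) pairs."""
    seen, todo = {start}, [start]
    while todo:
        current = todo.pop()
        for lhs, rhs in system:
            if lhs == current and rhs not in seen:
                seen.add(rhs)
                todo.append(rhs)
    return goal in seen


def toy_decide(system):
    # Stand-in property: a rewrites to c in zero or more steps,
    # and c is a normal form.
    return reaches(system, "a", "c") and all(lhs != "c" for lhs, _ in system)


constants = ["a", "b", "c"]
candidates = [(l, r) for l in constants for r in constants if l != r]
result = synthesize(candidates, toy_decide, 2, 3)
```

Enumerating by increasing size returns a smallest solution within the bounds, at the price of calling the (expensive) decision step once per candidate, which is why avoiding equivalent candidates matters in practice.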
To limit and control the search space we introduce the parameters r, R, D and v:
• r and R specify the lower and upper bound on the number of rewrite rules,
• D specifies the upper bound on the height of the left- and right-hand sides of the rules,
• v specifies the number of different variables that may appear in the rewrite rules.

Fig. 10 Simplified synthesis procedure (for a fixed signature)

By default the procedure searches for left-linear right-ground TRSs, but it can also synthesize linear variable-separated systems. This affects the generation of candidate TRSs S in Fig. 10. To extend the functionality and improve performance, the implementation in the synthesis tool (FORT-s) differs from the procedure in Fig. 10. Since the greatest cost when running the procedure comes from executing the decision procedure, care is taken to not generate and check equivalent systems more than once. To this end, we keep track of fresh terms from previous iterations and only generate rules containing at least one new term, and the fresh terms in T must contain at least one new term in an argument position. Similar improvements are used when generating the rewrite systems. The second major performance improvement is the possibility of checking systems in parallel.

It is of interest to synthesize TRSs that depend on one or more other TRSs. This can be done by passing additional TRSs to FORT-s in addition to a formula which references multiple systems. The additional systems are then also passed to the decision procedure. For example, if we want to transform our leading TRS R (see Example 1) into an equivalent complete TRS (on ground terms), we pass both R and the formula

    $(\mathrm{GWCR}_0 \land \mathrm{SN}_0) \land \forall s\,\forall t\,(s \leftrightarrow_0^* t \iff s \leftrightarrow_1^* t)$

to FORT-s. Here the index 1 refers to R and the index 0 to the system to be synthesized.
This returns the TRS consisting of the rules

    a → b        f(b) → g(a, a)        g(b, b) → a

Using formulas referencing multiple TRSs, FORT-s can also be used to synthesize multiple systems. For convenience FORT-s supports multiple ways to specify the signature used during synthesis. The full user interface of FORT-s is given in Appendix C.

7.5 Undecidability of Synthesis

Since the first-order theory is decidable for linear variable-separated TRSs, a natural question arises. Is synthesis also decidable for these systems? In other words, can we determine if there exists a linear variable-separated TRS satisfying a given property? Unfortunately this is not the case.

Theorem 17 The following problem is undecidable:
    instance: a closed formula ϕ in the first-order theory of rewriting
    question: does some linear variable-separated TRS R satisfy ϕ?

Proof We show the undecidability by a reduction from the Post correspondence problem (PCP). Let P be a finite set of pairs of non-empty strings over the alphabet {0, 1}. We define a formula ϕ in the first-order theory of rewriting that is satisfiable if and only if P has a solution. To this end, consider the following predicates:

    node(x)     := x → x
    next(x, y)  := node(x) ∧ node(y) ∧ x → y ∧ x ≠ y
    step        := ∀x node(x) ∧ x ≠ e ⇒ ∃y next(x, y)
    unique      := ∀x ∀y ∀z next(x, y) ∧ next(x, z) ⇒ y = z
    linear      := step ∧ unique
    value(x, 0) := x → a ∧ ¬(x → b)
    value(x, 1) := x → b ∧ ¬(x → a)
    finite      := ¬∃x INF_→(x)

Positions in a solution string are represented by nodes, which are linearly ordered. Nodes are characterized by self-loops. The special nodes s and e mark the starting and final positions in a solution of P. The predicate finite ensures that solution strings are finite. We have two additional elements, a and b, that correspond to the symbols 0 and 1.
    border(x, y) := node(x) ∧ node(y) ∧ ∃z (¬node(z) ∧ x → z ∧ z → y)

The border predicate marks the two positions in a solution string corresponding to the decomposition into first and second components. The latter is checked by the solution predicate:

    $\mathrm{match}(x_0, x_1 \cdots x_k, v_1 \cdots v_k) := \bigwedge_{i=1}^{k} \mathrm{next}(x_{i-1}, x_i) \land \mathrm{value}(x_i, v_i)$

    $\mathrm{pair}(x, y, v, w) := \exists x_1 \dots \exists x_{|v|}\,\exists y_1 \dots \exists y_{|w|}\; \mathrm{border}(x_{|v|}, y_{|w|}) \land \mathrm{match}(x, x_1 \cdots x_{|v|}, v) \land \mathrm{match}(y, y_1 \cdots y_{|w|}, w)$

    $\mathrm{solution} := \forall x\,\forall y\; \mathrm{border}(x, y) \Rightarrow (x = y \land x = e) \lor \bigvee_{(v,w) \in P} \mathrm{pair}(x, y, v, w)$

The formula ϕ is now defined as

    $\exists s\,\exists e\,\exists a\,\exists b\;\; s \neq e \land \mathrm{border}(s, s) \land \mathrm{linear} \land \mathrm{finite} \land \mathrm{solution}$

Note that the witnessing TRSs constructed in the above proof are actually abstract rewrite systems (ARSs) that consist of rewrite rules between constants. The construction is illustrated in Fig. 11 for the PCP instance P = {(1, 011), (10, 11), (001, 00)} with solution 001|10|001|1 = 00|11|00|011. The separation bars correspond to the elements $b_1$, $b_2$ and $b_3$. Node $n_9$ witnesses e. Elements 0 and 1 witness a and b.

Fig. 11 The construction for PCP instance P

The synthesis problem is obviously decidable for ARSs over a fixed signature, but remains undecidable for TRSs over a fixed signature, since we can still generate an arbitrary number of ground terms using non-constant function symbols. Take for example the signature {E, s, 0}, where E and s are unary function symbols and 0 is a constant. We can then represent an arbitrary number n of objects (nodes, borders and values in the encoding) using ground terms of the shape $E(s^n(0))$. The rules of the ARS correspond to rules between such ground terms of the generated TRS. (The inclusion of the function symbol E removes any possibility of unwanted overlap between rules of the TRS.)

8 Experiments

In this section we describe the experiments we performed with FORT-h, FORT-s, and FORTify. We include version 1.0 of FORT-h, which was first published as part of an artifact (https://fortissimo.uibk.ac.at/tacas2021/) in conjunction with [42]. The current version of FORT-h is 2.0. Full details of the experiments are available from the website accompanying this paper (https://fortissimo.uibk.ac.at/jar). Precompiled binaries of FORT-h 2.0, FORT-s, and FORTify are available from the same site. All experiments were run on a computer equipped with an Intel Core i7-5930K processor with 6 cores, and with 32 GB of memory. To remove any ambiguity in the calls made to the tools we use FORT syntax (see Appendix A) to specify formulas in this section. This also aids in replicating the experiments.

8.1 FORT-h and FORTify

For the experiments reported in this section we used a timeout of 60 s for the decision tools and 600 s for FORTify.

8.1.1 Comparing Different Representations of Properties

The problems for these experiments are taken from the Confluence Problems database (COPS, https://cops.uibk.ac.at/), and consist of 122 left-linear right-ground TRSs. The formulas were taken from the experiments reported in [46].

Experiment 1 The first three formulas

    "forall s, t, u (s ->* t & s ->* u => t join u)"    (15)
    "forall s, t, u (s ->* t & s -> u => t join u)"     (16)
    "forall t, u (t <->* u => t join u)"                (17)

denote different but equivalent formulations of ground-confluence (GCR).

Table 5 FORT-h (with FORTify) and FORT-j run on GCR formulas

                        YES   ∅-time    ✔    NO   ∅-time    ✔    ∞   total time   (✔ time)
    (15)  FORT-h 2.0     37   0.89 s   37    84   0.69 s   81    1    151.12 s    (0.8 h)
          FORT-h 1.0     36   0.26 s   10    84   0.56 s   16    2    176.23 s    (17.6 h)
          FORT-j         37   0.31 s    –    82   0.52 s    –    3    234.08 s
    (16)  FORT-h 2.0     38   1.50 s   37    84   0.06 s   81    0     62.13 s    (0.9 h)
          FORT-h 1.0     37   1.48 s   10    84   0.09 s   16    1    122.55 s    (17.8 h)
          FORT-j         37   0.32 s    –    82   0.50 s    –    3    233.20 s
    (17)  FORT-h 2.0     37   0.91 s   37    83   0.04 s   81    2    156.64 s    (1.0 h)
          FORT-h 1.0     36   0.45 s    6    83   0.08 s    9    3    202.64 s    (18.2 h)
          FORT-j         37   0.32 s    –    82   0.55 s    –    3    236.69 s
The results are shown in Table 5, where the YES (NO) column shows the number of systems determined to be (non-)ground-confluent together with the average time (∅-time) the tool took. The ∞ column is the number of timeouts. To compare overall performance the total time column contains the sum of all run times, including timeouts but excluding the time taken by FORTify. The ✔ columns show the numbers of certifiable results as well as the overall time taken by FORTify.

These results show that, even though they have the same meaning, the choice of formula has an impact on performance. Most notably this can be seen when comparing the number of problems solved by FORT-h 2.0. The formula (16) (semi-confluence) was fastest with no timeouts, followed by (15) with one timeout and (17) with two. It is apparent that formulas containing conversion ($\leftrightarrow^*$) are especially slow, which we will also see in later experiments. Further note that FORT-h 2.0 can solve an additional problem compared to the 1.0 version, for each formula.

Interestingly, FORT-h (2.0) is generally faster and can solve more problems than FORT-j even though the latter implements parallelism. This performance advantage is more prominent for non-confluent systems, where FORT-h can solve more problems. For problems with the answer YES, FORT-j can solve close to the same number of problems, while taking less time per problem in general. The table also shows that FORTify can certify most of the results, which is a large improvement over the previous version. Here the difference between the three formulas is not as visible, but it is also faster on (16) and (15), and slowest on (17). The times for FORTify must also be seen in the context that it ran on more problems on the first two formulas, since FORT-h could produce more certificates. No wrong results by the decision tools were identified.
Experiment 2 The second set of formulas represents the normal form property, restricted to ground terms (GNFP):

    "forall t, u (t <->* u & NF(u) => t ->* u)"        (18)
    "forall s, t, u (s -> t & s ->! u => t ->* u)"     (19)
    "forall t (WN(t) => CR(t))"                        (20)

The results for these are shown in Table 6.

Table 6 FORT-h (with FORTify) and FORT-j run on GNFP formulas

                        YES   ∅-time    ✔    NO   ∅-time    ✔    ∞   total time   (✔ time)
    (18)  FORT-h 2.0     59   0.30 s   57    63   0.04 s   63    0     20.37 s    (0.5 h)
          FORT-h 1.0     59   0.70 s   31    63   0.07 s   20    0     45.62 s    (14.6 h)
          FORT-j         59   0.23 s    –    63   0.39 s    –    0     38.16 s
    (19)  FORT-h 2.0     59   0.02 s   59    63   0.01 s   63    0      1.76 s    (0.1 h)
          FORT-h 1.0     59   0.03 s   46    63   0.01 s   50    0      2.55 s    (6.3 h)
          FORT-j         59   0.22 s    –    63   0.30 s    –    0     31.83 s
    (20)  FORT-h 2.0     59   0.03 s   56    62   0.11 s   62    1     68.83 s    (0.8 h)
          FORT-h 1.0     59   0.05 s   42    62   0.12 s   45    1     70.51 s    (8.6 h)
          FORT-j         59   0.31 s    –    62   0.64 s    –    1    117.86 s

Table 7 FORT-h 2.0 run on "forall s, t (s <->* t)" with differing encodings of conversion

            YES   ∅-time    NO   ∅-time    ∞   total time
    (21)     91   0.10 s    31   0.42 s    0     22.00 s
    (22)     91   0.10 s    31   0.48 s    0     24.22 s
    (23)     91   0.07 s    31   0.41 s    0     19.31 s

The same pattern is observed, where even though all three can (dis)prove satisfaction for the same formulas, FORT-h 2.0 is faster than FORT-j overall, and has improved over FORT-h 1.0.

Since the representations containing conversion ($\leftrightarrow^*$) in the previous experiments are outperformed by the other representations, it is often a good idea to avoid it. Obviously this is not always possible. Take the properties UNC, CE or consistency for example. It is therefore important to choose the correct representation in the primitive automata constructions, to ensure good performance when conversion cannot be avoided.

Experiment 3 We tested the following three representations of conversion for a TRS R:

    $((\to_R^\varepsilon)^- \cup \to_R^\varepsilon)^+$        (21)
    $((\to_R^\varepsilon)^- \circ \to_R^\varepsilon)^+$       (22)
    $(\to_{R \cup R^-}^\varepsilon)^+$                        (23)

The representation (21) is the one listed in Table 2.
Using composition (◦) instead of union as in (22) works because ε − ε ε − − ε (→ ) ◦→ = ((→ ) ◦−→ ) ∪ ((−→ ) ◦→ ) R R R R R R ε ε The third representation (23) uses the identity → =↔ and is the default used R∪ R by FORT-h. The results of running FORT-h 2.0 on the COPS dataset, using the formula "forall s, t (s <->* t)" for consistency with the three different representations of conversion, can be seen in Table 7. We can see that (23) is the fastest, with an overall runtime of 19.31 s. It is about 12% faster than (21) and about 20% faster than (22). Also important is that (23) produces smaller automata, which leads to better performance when conversion is embedded within larger formulas. Consider for example COPS #741:

if(true, a, x) → a      if(true, g(a), x) → g(a)      g(a) → g(g(a))
if(true, b, x) → b      if(true, g(b), x) → g(b)      g(b) → a
if(false, x, a) → a     if(false, x, g(a)) → g(a)     f(a, b) → b
if(false, x, b) → b     if(false, x, g(b)) → g(b)     f(g(g(a)), x) → b

The RR automata representing (21) and (22) both contain 233 states, 7927 transitions and 9 ε-transitions before trimming, and 132 states and 4937 transitions after. In comparison, the automaton for (23) contains 152 states, 3975 transitions and 9 ε-transitions before, and 75 states with 2313 transitions after trimming. Overall, (23) therefore has less than half the number of transitions in this example, which can have a significant effect in any later closure operations.

Table 8 FORT-h 2.0 (with FORTify) run on normalization with different encodings of NF
Encoding               YES  ∅-time   ✔    NO  ∅-time   ✔    ∞   Total (✔) time
"NF(t)"                41   0.02 s   41   81  0.00 s   81   0   0.85 s (20.50 s)
"∼exists u (t -> u)"   41   0.02 s   41   81  0.00 s   81   0   1.05 s (23.71 s)

The final experiment in this subsection involves the normal form predicate NF(t), which is implemented in FORT-h according to the description in Sect. 5.4, instead of using the equivalent formula ¬∃ u (t → u).
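To make the meaning of the normal form predicate concrete, the following Python sketch checks NF(t) directly on terms: a term is in normal form exactly when no subterm is an instance of a left-hand side. This is only an illustration of the predicate's semantics under assumed conventions (terms as nested tuples, variables as plain strings, helper names `match`, `subterms` and `is_normal_form` chosen here); it is not the tree automaton construction of Sect. 5.4 used by FORT-h. The left-hand sides are those of COPS #503 as listed in Experiment 4.

```python
# Terms are nested tuples ("f", arg1, ..., argn); constants are 1-tuples ("a",).
# Variables are plain strings and may only occur in left-hand sides.
# All names below are illustrative and not part of FORT-h.

def match(pattern, term, subst):
    """Extend subst so that pattern instantiated by subst equals term, or return None."""
    if isinstance(pattern, str):  # pattern is a variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        extended = dict(subst)
        extended[pattern] = term
        return extended
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for sub_pattern, sub_term in zip(pattern[1:], term[1:]):
        subst = match(sub_pattern, sub_term, subst)
        if subst is None:
            return None
    return subst

def subterms(term):
    """Yield the term itself and all of its proper subterms."""
    yield term
    for argument in term[1:]:
        yield from subterms(argument)

def is_normal_form(term, left_hand_sides):
    """NF(t): no subterm of t is an instance of a left-hand side."""
    return not any(
        match(lhs, sub, {}) is not None
        for sub in subterms(term)
        for lhs in left_hand_sides
    )

a, b, c = ("a",), ("b",), ("c",)
# Distinct left-hand sides of COPS #503: f(a, a, b, b), a and b.
cops_503_lhss = [("f", a, a, b, b), a, b]

print(is_normal_form(c, cops_503_lhss))                  # c contains no redex
print(is_normal_form(("f", c, c, c, c), cops_503_lhss))  # neither does f(c, c, c, c)
print(is_normal_form(("f", a, c, c, c), cops_503_lhss))  # the subterm a is a redex
```

For this system the ground normal forms are the terms built from c and f only, which agrees with the trimmed automaton of one state and two transitions reported in Experiment 4.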
Experiment 4 Consider the formula "forall s (exists t (NF(t) & s ->* t))" for normalization and COPS #503:

f(a, a, b, b) → f(c, c, c, c)     a → b     a → c     b → a     b → c

When using the formula ¬∃ u (t → u) for NF(t), FORT-h first constructs the RR automaton A₁ for t → u, with 4 states and 15 transitions. It then projects to construct the automaton A₂ for ∃ u (t → u) with 4 states and 13 transitions, and finally it has to determinize A₂ and construct the complement for the negated formula ¬∃ u (t → u), resulting in an automaton with 4 states and 259 transitions before and 1 state with two transitions after trimming. If instead the direct normal form predicate is used, FORT-h immediately produces the latter automaton, without having to construct the intermediate automata or having to trim. The impact on runtime can be seen in Table 8. It is rather small for FORT-h, but for FORTify the direct construction is about 13% faster. When looking at the sizes of the automata, the average untrimmed automaton obtained this way, for our dataset of left-linear right-ground COPS problems, contains 75.8 transitions while the average automaton for the normal form predicate contains 13.3 transitions.

8.1.2 Properties Involving Multiple TRSs

We also ran experiments to test performance on properties involving two TRSs. As a dataset we constructed problems of all ordered pairs of COPS problems, resulting in 7503 pairs.

Experiment 5 The first property tested was ground-commutation (GCOM). The results, presented in Table 9, show that FORT-h is ahead of FORT-j here as well. It can (dis)prove more problems, timing out on only two as compared to 49 problems. Additionally it does so in significantly less time. With FORTify we can see a large improvement over the old version. It is able to certify close to 98% of the results found by FORT-h 2.0.
Table 9 FORT-h (with FORTify) and FORT-j run on GCOM
Tool         YES   ∅-time   ✔      NO    ∅-time   ✔      ∞    Total (✔) time
FORT-h 2.0   1381  0.10 s   1368   6120  0.02 s   5965   2    374.63 s (51.5 h)
FORT-h 1.0   1381  0.16 s   878    6120  0.03 s   3666   2    517.32 s (681.5 h)
FORT-j       1354  1.46 s   –      6100  0.94 s   –      49   10670.89 s

In the 2019 edition of the Confluence Competition [41] three tools contested the commutation (COM) category: ACP [2], CoLL [49], and FORT-j. On input problem COPS #1118 the tools gave conflicting answers (see https://cops.uibk.ac.at/results/?y=2019&c=COM).

Example 35 COPS #1118 is about the commutation of the TRSs COPS #669

a → c     f(a) → b     b → b     b → h(b, h(c, a))

and COPS #695

h(a, a) → c     b → h(b, a)     b → a     f(c) → c     c → a

To determine the correct answer we use FORT-h 2.0 to produce a certificate for ground-commutation by calling

> fort-h -c cert -i "GCom([0],[1])" 1118.trs
YES

This produces the following certificate:

(0 (rr2 (comp (inverse (step* (1))) (step* (0))) 0 1)
   (rr2 (comp (inverse (step* (1))) (step* (0))) 0 1)
   (size 13 53 0))
(1 (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1)
   (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1)
   (size 11 47 0))
(2 (not 1)
   (not (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1)))
(3 (and (0 2))
   (and ((rr2 (comp (inverse (step* (1))) (step* (0))) 0 1)
         (not (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1)))))
(4 (exists 3)
   (exists (and ((rr2 (comp (inverse (step* (1))) (step* (0))) 0 1)
                 (not (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1))))))
(5 (exists 4)
   (exists (exists (and ((rr2 (comp (inverse (step* (1))) (step* (0))) 0 1)
                         (not (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1)))))))
(6 (not 5)
   (not (exists (exists (and ((rr2 (comp (inverse (step* (1))) (step* (0))) 0 1)
                              (not (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1))))))))
(7 (nnf 6)
   (forall (forall (or ((not (rr2 (comp (inverse (step* (1))) (step* (0))) 0 1))
                        (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1))))))
(nonempty 7)

When passing this certificate to FORTify, after 0.2 s the output Certified is produced, so we can be assured that the TRSs do commute. Note that the inference steps 0 and 1 contain the optional size information. Here (size k m n) means the underlying RR automaton constructed by FORT-h 2.0 contains k states, m transitions, and n ε-transitions.

Experiment 6 For the second experiment using multiple TRSs we tested FORT-h 2.0 and FORTify on conversion equivalence and normalization equivalence, once for all terms and once for only ground terms. FORT-h 1.0 and FORT-j have not implemented the necessary signature extension results to cover these properties, and are therefore not run. The results can be seen in Table 10.

Table 10 FORT-h 2.0 (with FORTify) run on (G)CE and (G)NE
Property  YES  ∅-time   ✔     NO    ∅-time   ✔      ∞     Total (✔) time
GCE       157  0.70 s   150   7162  0.94 s   6736   184   5.0 h (125.6 h)
CE        151  0.74 s   144   7168  0.93 s   6739   184   5.0 h (127.1 h)
GNE       181  0.02 s   181   7320  0.04 s   7308   2     448.75 s (5.4 h)
NE        177  0.02 s   177   7324  0.04 s   7312   2     446.54 s (5.6 h)

Comparing the properties to the corresponding ground properties, we can see that FORT-h 2.0 succeeds in finding results on the same number of problems. However, six results moved from YES to NO in the case of (G)CE and four in the case of (G)NE. These correspond to TRSs where the additional constants are needed to disprove the property. While the run times of FORT-h 2.0 stayed almost the same when comparing the ground and non-ground properties, we can see that FORTify does take longer to certify results on the non-ground properties. This is to be expected, since the additional constants lead to larger automata. Simply by having a larger signature, some of the atomic constructions produce more transition rules. While this is usually only a small difference, it can have a significant effect when embedded within a bigger formula.

8.1.3 Optimizations

To show this effect, and the improvement caused by Lemma 39, consider the following example.
Example 36 Consider COPS #214

a → b     a → f(a)     b → f(f(b))     f⁶⁴(b) → b

where f⁶⁴ represents 64 nested applications of f. To check UNC, FORT-h 2.0 extends the signature as needed and uses the formula for GUNC, internally represented as

¬∃(∃((NF(0) × NF(1)) ∧ 0 ≠ 1 ∧ 0 ↔* 1))

In this case no constants have to be added, since the TRS is ground. The intermediate automaton A₁ for the subformula NF(0) × NF(1) contains no transitions, since the TRS has no normal forms for the given signature. For the automaton A₂ of 0 ≠ 1 we have 13 transitions and A₃ for 0 ↔* 1 has 150,569 transitions. As we have seen in earlier experiments, the automaton for conversion is clearly the largest, and would also take the largest amount of time to construct. However, since A₁ is empty, the intersection with A₂ and then further with A₃ will also be empty. And due to the lazy evaluation strategy of Haskell the third automaton will never be constructed. Therefore FORT-h 2.0 can almost instantly (0.01 s) determine that the automaton for the formula within the negation is empty, and conclude that UNC holds. However, if we were to ignore the optimization introduced by Lemma 39 and add two constants, the automaton A₁ is no longer empty, since we added two normal forms to the domain. This changes the numbers as follows: The automaton A₁ would contain 15 transitions and 3 states, A₂ has 31 transitions and 3 states, and A₃ has 150,571 transitions and 4356 states. Since none of the automata are empty we must construct the intersection A₁ ∩ A₂ containing 34 transitions and 6 states. After trimming this drops to 20 transitions and 4 states.

Table 11 FORT-h 2.0 run on UNC with and without Lemma 39
Formula                        YES  ∅-time   NO  ∅-time   ∞   Total time
"UNC" (with Lemma 39)          72   0.29 s   49  0.20 s   1   90.92 s
"{+2} GUNC" (two constants)    72   0.54 s   49  0.20 s   1   108.52 s

Fig. 12 Graph presentation of COPS #116
The intersection (A₁ ∩ A₂) ∩ A₃ then results in an automaton with 132,652 transitions and 8584 states. Only after trimming do we see that this automaton is empty and can conclude that UNC holds. Overall this takes FORT-h 2.0 7.15 s, which is orders of magnitude slower than with the optimization. While such large speedups are not the norm, the overall runtime on the COPS dataset for UNC drops by about 16%, as seen in Table 11.

Example 37 To see that the optimization of collapsing strongly connected states, introduced in Sect. 7.1, can have a significant effect, consider COPS #116. It is an ARS consisting of 26 rules, presented as a graph in Fig. 12. To check if it is consistent we can use the formula "∼forall s, t (s <->* t)", which is internally represented as ∃(∃(¬(0 ↔* 1))). For this FORT-h constructs the automaton A for 0 ↔* 1, consisting of 8 states, 418 transitions and 3 ε-transitions. After eliminating the ε-transitions and trimming, we are left with 1 state and 361 transitions. The complement automaton A^c, which represents ¬(0 ↔* 1), has the same size, which drops to zero after trimming, showing that the system is not consistent. Overall FORT-h takes 0.34 s. If we however remove the optimization and do not collapse strongly connected components, we get significantly larger automata.
The automaton A grows to 8427 states, 2827 transitions and 851,916 ε-transitions. At this point the procedure usually eliminates the ε-transitions and trims the automaton, but FORT-h does not manage to do so within the 60 s timeout. The overall improvement on testing consistency can be seen in Table 12.

Table 12 FORT-h 2.0 run on "forall s, t (s <->* t)" with and without collapsing SCCs
Variant           YES  ∅-time   NO  ∅-time   ∞   Total time
Collapsing SCCs   91   0.07 s   31  0.41 s   0   19.31 s
Unoptimized       91   0.14 s   28  1.21 s   3   223.82 s

8.1.4 Comparison with Other Tools

As a last experiment we compare FORT-h to a number of state-of-the-art tools. For the properties GCR, NFP, UNC, UNR and COM we chose the following tools that competed in the corresponding categories in the confluence competition in 2021: ACP [2] in UNC and COM, AGCP [1] in GCR, CSI [44] in NFP, UNC and UNR, and CoLL [49] in COM. All these tools implement various sufficient conditions for the corresponding property and are not limited to linear variable-separated or left-linear right-ground TRSs. For the sake of comparing them to FORT-h we run them only on the left-linear right-ground TRSs in COPS, and on the pairs of these problems for COM. The results can be seen in Table 13.

Table 13 FORT-h 2.0 compared to other tools
Property  Tool         YES   ∅-time   NO    ∅-time   ∞/MAYBE   Total time
GCR       FORT-h 2.0   37    0.06 s   84    0.04 s   1         65.82 s
GCR       AGCP         24    0.02 s   79    0.07 s   19        276.42 s
NFP       FORT-h 2.0   55    0.02 s   67    0.01 s   0         1.76 s
NFP       CSI          55    0.79 s   61    1.02 s   6         186.94 s
UNC       FORT-h 2.0   72    0.31 s   49    0.21 s   1         92.75 s
UNC       ACP          70    0.08 s   47    0.86 s   5         345.91 s
UNC       CSI          71    0.83 s   46    1.12 s   5         187.37 s
UNR       FORT-h 2.0   96    0.02 s   26    0.01 s   0         2.21 s
UNR       CSI          86    0.81 s   26    0.76 s   10        209.12 s
COM       FORT-h 2.0   1365  0.10 s   6135  0.04 s   3         578.3 s
COM       CoLL         1349  0.21 s   4015  0.13 s   2139      19.5 h
COM       ACP          1238  0.01 s   3519  0.04 s   2746      5.0 h

We can see that FORT-h 2.0 significantly outperforms all the other tools on this class of systems.
For all properties it can find results for more problems and can often do so with less time per problem. This difference is especially pronounced in the COM category, where FORT-h 2.0 can (dis)prove all but three of the 7503 problems, while ACP and CoLL time out or return Maybe on more than 2000 of these. Given this performance discrepancy it is of interest to other tools to use FORT-h 2.0 on problems of this class. Here it could be used as a black box on problems (or subproblems) as long as they are linear variable-separated TRSs, and can be expressed in the first-order theory of rewriting. An example of such a tool is CONFident [27] (http://zenon.dsic.upv.es/confident), which uses FORT, among other tools, as part of its procedure.

Another interesting point can be seen when comparing the first line in Table 13, where 37 YES results are reported, with the fourth line in Table 5, where 38 YES results are reported. Both formulas check ground-confluence, but the built-in GCR property is represented slightly differently. Instead of the joinability predicate (t ↓ u), which is constructed via operations on anchored GTTs, it uses the equivalent formula ∃ v (t →* v ∧ u →* v). In this case the explicit formula is slower on COPS #215, leading to the additional timeout, but is faster on other problems, causing the total time to be similar. As in previous experiments, this shows that the representation of a property can have a large and non-obvious effect on performance.

8.2 FORT-s

In this subsection we report on the synthesis experiments that we performed. All experiments were executed with the options -j 4 and +RTS -A64M, unless stated otherwise. First we consider Fig. 6.

Experiment 7 The following TRSs were produced by FORT-s on the given formulas when restricting the signature (using the command-line option -S "a 0 b 0 f 2") to a binary function symbol f and two constants a and b:

"GWCR & ∼WCR & ∼GCR"      a → b     f(a, x) → a           a → f(a, a)            9 s
"GCR & ∼CR & ∼GSCR"       a → b     f(a, x) → f(a, a)     f(b, b) → a            10 s
"GNFP & ∼NFP & ∼GCR"      a → b     f(a, x) → f(a, a)     f(b, b) → f(a, a)      4 s
"GUNC & ∼UNC & ∼GNFP"     a → a     f(a, x) → a           f(x, b) → b            11 s

We do not know whether there exist TRSs over the restricted signature that satisfy "GUNR & ∼UNR & ∼GUNC". Human expertise was used to produce a witness over a larger signature, which was subsequently simplified using the decision mode of FORT:

b → a     c → c     d → c     f(x, a) → A     f(x, A) → A
b → c     d → e     f(x, e) → A     f(c, x) → A

FORT-h produces the following terms as witnesses for the fact that UNR is not satisfied: t = A and u = f(e, $). Indeed both A and f(e, $) are normal forms reachable from f(d, $). Moreover, we obtain witnesses t = a and u = e showing that GUNC does not hold. (The rule c → c is needed to satisfy GUNR.)

In the next experiment we use the infinity predicate to distinguish well-known subclasses of linear variable-separated TRSs.

Experiment 8 The formula

∃ t INF_{ε←}(t)     "exists t (INF(e<-,t))"

distinguishes ground TRSs from left-linear right-ground (but not ground) ones. Without any options FORT-s produces the TRS {g(x) → g(a)} in a fraction of a second. The formula

∃ t INF_{=}(t)     "exists t (INF(=,t))"

is true for TRSs that are not ARSs. FORT-s produces the empty TRS over the signature consisting of the constant a and an additional constant and unary function symbol. The second constant is not necessary, but is added by the signature step.
Finally, to distinguish linear variable-separated TRSs from left-linear right-ground TRSs, assuming the signature contains at least one non-constant function symbol, the formula

∃ t INF_{→ε}(t)     "exists t (INF(->e,t))"

can be used in connection with the -l option. This generates the TRS {a → x} over the signature consisting of the constant a and an additional constant and unary function symbol. Without the latter, the generated linear variable-separated TRS induces only a finite rewrite relation. Adding "& CR & WN" to the last formula produces the TRS {a → b, f(b) → x}.

Experiment 9 Finding a locally confluent but not confluent TRS R is easy. FORT-s produces the ground TRS

a → b     f(a) → a     a → f(a)

when given the formula "WCR & ∼CR" in less than 1 s. The well-known abstract counterexample by Kleene

a ← b ⇄ c → d

is found by restricting the search to ARSs. The easiest way to do this is with the option -A 0, which sets the maximal arity of function symbols to 0. Moreover, the maximum number of rewrite rules has to be set to at least four (-R 4). If we impose the additional condition that R is terminating (cf. [56]), the TRS

a → b     a → g(a)     b → g(g(b))

is generated with

"WCR & ∼CR & ∼exists t (INF(*<-,t) | t +<- t)"

without any additional command-line options in less than 7 s.

The next experiment shows how FORT-s can be used to complete TRSs into complete (canonical) ones.

Experiment 10 FORT-s produces the TRS {a → c, f(x) → a} when presented the formula

"[0](WCR & SN) & forall s, t ([0] s <->* t <=> [1] s <->* t)"

with input.trs as additional parameter. Here input.trs consists of the three rules

c → a     f(b) → c     f(c) → a

The result is complete (as demanded by "[0](WCR & SN)"), but not equivalent! The reason is that "forall s, t ([0] s <->* t <=> [1] s <->* t)" ensures ground conversion equivalence, and we have seen in Sect. 6 that an extra constant is needed to reduce conversion equivalence to ground conversion equivalence.
The same behaviour can also be seen for our leading example, where the same formula is used. When presented the formula

"[0](WCR & SN) & CE([0],[1])"

the equivalent complete TRS consisting of the rules

a → c     f(b) → f(a)     f(c) → a

is synthesized. Note that the latter TRS is not canonical since not all right-hand sides are in normal form. It is well-known that every system of ground equations admits a presentation as canonical TRS. Snyder [50] proved that a ground TRS is canonical if and only if it is reduced. The latter property is easily expressible:

"[0](forall s, t (s ->e t => NF(t) & ∼exists u (s ->be u) & forall u (s ->e u => t = u)))"

Together with "CE([0],[1])", any ground TRS is transformed into an equivalent canonical one, without explicitly requiring confluence and termination. For our example TRS, we obtain

a → c     f(b) → c     f(c) → c

The final experiment is based on [57, Example 5.1] and shows how FORT-s can be used to synthesize multiple TRSs.

Experiment 11 If we want to generate two terminating ARSs such that their union is non-terminating, the formula "[0]SN & [1]SN & ∼SN" can be used in connection with the options -A 0 and -n 2. The latter tells FORT-s to synthesize two TRSs. The additional requirement that the composition of both relations is a subset of the transitive closure of one of them is expressed as

"forall s, t, u ([0] s -> t & [1] t -> u => [0] s ->+ u | [1] s ->+ u)"

In a fraction of a second FORT-s synthesizes the following two ARSs satisfying the conjunction of these requirements:

A₀:  a → b     b → c
A₁:  b → c     c → a

Using completely different techniques, similar ARSs are generated by Carpa, the tool described in Zantema [57].

9 Conclusion

In this paper we presented a formalized decision procedure for the first-order theory of rewriting for the class of linear variable-separated TRSs. The decision procedure ultimately goes back to Dauchet and Tison [10] and is the basis of the tool FORT-h.
Different from [8, 10], we extensively use anchored GTT relations. These have better closure properties than GTT relations and allow us to efficiently express numerous binary relations on ground terms, easing formalization efforts. We presented signature extension results that allow us to reduce certain properties on arbitrary terms to the corresponding properties on ground terms. These allow FORT-h to participate in categories other than GCR in the Confluence Competition. We presented a certificate language in which certificates for the yes/no output of the decision procedure can be expressed. These certificates are validated by FORTify, the verified Haskell program obtained from the executable Isabelle formalization. FORT-h supports properties like commutation that involve multiple TRSs. Witness generation is useful to gain insight into why a particular property holds. The synthesis mode is used to find small TRSs that satisfy a given property. FORT-s supports several options to control the (infinite) search space. We showed that the synthesis problem is undecidable, already for ARSs, by a reduction from PCP. Comprehensive experimental results were presented, including a comparison with the tools ACP [2], AGCP [1], CoLL [49], CSI [44] that compete with FORT-h in CoCo. Full details are available from the web site https://fortissimo.uibk.ac.at/ which additionally provides a convenient interface to FORT-h, FORT-s and FORTify, as well as precompiled binaries for the three tools.

Linear variable-separated TRSs are a proper extension of left-linear right-ground TRSs. Dropping either restriction, one quickly faces an undecidable first-order theory, even when one-step rewriting (→) is the only predicate. This was first shown by Treinen [54]. Related undecidability results are presented in [39, 55].
In particular, Marcinkowski [39] showed that the first-order theory of one-step rewriting is undecidable for right-ground TRSs.

Many concrete properties expressible in the first-order theory of rewriting are known to be decidable for much larger classes of rewrite systems. For instance, termination is known to be decidable for right-linear right-shallow TRSs, a result by Godoy et al. [25], extending the earlier decision result for right-ground systems of Dershowitz [14]. Termination is also decidable for almost-orthogonal growing TRSs [43]. Confluence is decidable for right-linear shallow TRSs [24] and for right-ground TRSs [30]. For ground TRSs, which are in the scope of FORT-h, termination is known to be decidable in polynomial time [45]. The same holds for confluence [7]. Felgenhauer [19] showed that confluence can be decided in cubic time. Similar complexity results for the related properties NFP, UNC and UNR are given in [20]. The worst-case complexity of the formalized decision procedure implemented in FORT-h is at least double exponential (cf. [26]).

Concerning synthesis, we are not aware of any other tree-automata based tool for synthesizing TRSs nor of any tool that allows properties to be specified by an arbitrary first-order formula in the theory of rewriting. Jiresch [29] developed a synthesis tool to attack the well-known open problems [15, 16] concerning the sufficiency of certain restricted joinability conditions on critical pairs of left-linear TRSs. Zantema [56] developed the tool Carpa+ for synthesizing TRSs that satisfy properties which can be encoded as SMT problems. The TRSs that can be synthesized form a small extension of the class of ARSs: A single unary function symbol f is permitted and rules must have the shape a → b, a → f(b), or f(a) → b, where a and b are constants. The properties are restricted to those that can be encoded into the conjunctive fragment of SMT-LRA (linear real arithmetic).
The predecessor tool Carpa [57] synthesized combinations of ARSs with help of a SAT solver. It was used to show the necessity of certain conditions in abstract confluence results [52, Sect. 5] and inspired us to support multiple TRSs in FORT.

Concerning future work, improving the efficiency of FORT-h by supporting parallelism might result in a speed-up, especially for larger formulas. The minimization of tree automata (also non-deterministic ones) is an obvious target for further investigation. Preprocessing techniques that go beyond the mere transformation to negation normal form will be helpful to obtain equivalent formulas that reduce the size of the ensuing tree automata in the decision procedure. In [28] similar ideas are applied to WSkS, in connection with MONA [31]. An interesting question is whether FORT-h can be extended to deal with properties involving innermost and other restrictions of rewriting. Formalization efforts that aim to transfer code in module A to the verified code in module B in Fig. 7 are also of interest. The conversion of FORT syntax to de Bruijn notation is a natural candidate here.

Acknowledgements This research was supported by FWF (Austrian Science Fund) project P30301. Several persons helped to make this project successful. We are grateful to Bertram Felgenhauer for numerous contributions. Franziska Rapp implemented the first versions of FORT in OCaml and Java. She and T. V. H. Prathamesh contributed to the early stage of the formalization of the decision procedure. Jamie Hochrainer reimplemented the synthesis mode, resulting in FORT-s. Johannes Koch designed the web interface. We thank René Thiemann for advice concerning turning the formalization into executable code. The first author acknowledges the support of the Future Value Creation Research Center of Nagoya University, where part of the research was performed.
The detailed comments of the anonymous reviewers improved the presentation.

Author Contributions All authors contributed to the research reported in the manuscript. Alexander Lochmann performed the formalizations in Isabelle/HOL that led to FORTify. Fabian Mitterwallner was the main developer of the artifacts (FORT-h, FORT-s and FORTify). The first draft of the manuscript was written by Aart Middeldorp and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Funding Open access funding provided by Austrian Science Fund (FWF). This work was supported by FWF (Austrian Science Fund) project P30301. The first author acknowledges the support of the Future Value Creation Research Center of Nagoya University.

Data Availability The experiments summarized in the manuscript are available from https://fortissimo.uibk.ac.at/jar. The same holds for binaries and sources of the artifacts.

Declarations

Conflict of interest The authors declare that they have no conflict of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Appendix A: Input Format

The input format of FORT-h can be roughly split into two parts: the logical structure of the property, and the involved atomic predicates and relations. The logical structure is defined by the following grammar, where angle brackets are used for non-terminal symbols:

⟨formula⟩    ::= ⟨formula⟩ ⟨operator⟩ ⟨formula⟩ | ∼ ⟨formula⟩
               | ⟨quantifier⟩ ⟨vars⟩ ( ⟨formula⟩ ) | ⟨var⟩ ⟨relation⟩ ⟨var⟩ | ⟨property⟩
               | {+ ⟨nat⟩ } ⟨formula⟩ | [ ⟨trss⟩ ] ⟨formula⟩ | ( ⟨formula⟩ )
⟨operator⟩   ::= <=> | => | | | &
⟨quantifier⟩ ::= forall | exists
⟨trss⟩       ::= ⟨nat⟩ | ⟨nat⟩ , ⟨trss⟩
⟨vars⟩       ::= ⟨var⟩ | ⟨var⟩ , ⟨vars⟩

Here ⟨nat⟩ is a natural number, ⟨var⟩ is an alphanumerical string representing a variable name and ⟨trss⟩ is a comma-separated list of indices referencing TRSs. The logical operators are all right-associative. Regarding precedence, the unary operations bind strongest, with the binary operators respecting the order & > | > => > <=>. Most represented operations have the meaning expected from a first-order formula, the exceptions being the operation {+ ⟨nat⟩ } ⟨formula⟩, which allows the user to specify the number of constants to be added to the signature when evaluating the subformula, and [ ⟨trss⟩ ] ⟨formula⟩, which restricts and permutes the indices of TRSs for the underlying subformula. The atomic binary relations supported by FORT-h are defined as:

⟨relation⟩ ::= ->e | ->e* | ->e= | ->e+ | e<- | *e<- | =e<- | +e<-
             | ->be | ->be* | ->be= | ->be+ | be<- | *be<- | =be<- | +be<-
             | -> | ->* | ->= | ->+ | <- | *<- | =<- | +<-
             | ->! | -||-> | !<- | <-||- | <-> | <->*
             | = | join | meet

Here ->e stands for a root step, ->be for a step below the root, -> for a normal rewrite step, ->! for a reduction to normal form, -||-> for a parallel step, join for joinability ↓ and meet for meetability ↑. The suffix * stands for the reflexive-transitive, + for the transitive, and = for the reflexive closure.
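The associativity and precedence rules stated above can be illustrated with a small precedence-climbing parser. The sketch below is an assumption-laden simplification and not FORT-h's actual parser: it only handles the propositional skeleton of the grammar (named properties, negation, the four binary operators and parentheses), it writes negation as the ASCII character `~`, and the function names `tokenize`, `parse`, `parse_binary` and `parse_unary` are hypothetical.

```python
import re

# Binding strength of the binary operators: & > | > => > <=> (higher binds tighter).
PRECEDENCE = {"<=>": 1, "=>": 2, "|": 3, "&": 4}
TOKEN = re.compile(r"\s*(<=>|=>|\||&|~|\(|\)|[A-Za-z][A-Za-z0-9]*)")

def tokenize(text):
    """Split a formula string into operator, parenthesis and name tokens."""
    text = text.rstrip()
    tokens, position = [], 0
    while position < len(text):
        found = TOKEN.match(text, position)
        if found is None:
            raise ValueError(f"unexpected input at position {position}")
        tokens.append(found.group(1))
        position = found.end()
    return tokens

def parse_unary(tokens):
    """Parse a negation, a parenthesized formula, or a property name."""
    if not tokens:
        raise ValueError("unexpected end of formula")
    head, rest = tokens[0], tokens[1:]
    if head == "~":  # unary operators bind strongest
        operand, rest = parse_unary(rest)
        return ("~", operand), rest
    if head == "(":
        inner, rest = parse_binary(rest, 1)
        if not rest or rest[0] != ")":
            raise ValueError("missing closing parenthesis")
        return inner, rest[1:]
    if head in PRECEDENCE or head == ")":
        raise ValueError(f"unexpected token {head!r}")
    return head, rest

def parse_binary(tokens, minimum):
    """Precedence climbing; recursing with the same level makes operators right-associative."""
    left, tokens = parse_unary(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= minimum:
        operator = tokens[0]
        right, tokens = parse_binary(tokens[1:], PRECEDENCE[operator])
        left = (operator, left, right)
    return left, tokens

def parse(text):
    """Return the syntax tree of a formula as nested tuples."""
    tree, rest = parse_binary(tokenize(text), 1)
    if rest:
        raise ValueError(f"trailing tokens: {rest}")
    return tree

print(parse("GUNR & ~UNR & ~GUNC"))
print(parse("WN => CR | SN & NFP"))
```

In this sketch `A & B & C` is grouped as `A & (B & C)`, and `WN => CR | SN & NFP` as `WN => (CR | (SN & NFP))`, matching the stated order & > | > => > <=>.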
Example 38 Consider calling FORT-h with three input TRSs on the formula:

"{+2} forall s, t ([2,0] ([0] s ->! t <=> [1] s ->! t))"

The {+2} instructs FORT-h to add two constants to the signature when constructing the automata. Normally "[0] s ->! t" means that term s normalizes to term t in the first input TRS (the one with index 0). Here, however, the context has changed due to the restrict modifier [2,0], which permutes and restricts the three TRSs in the subformula ([0] s ->! t <=> [1] s ->! t) such that [0] refers to the TRS with index 2 and [1] refers to the TRS with index 0. So FORT-h checks normalization equivalence of the third and first input TRS, while ignoring the second one. The two constants are added according to Table 3, since one of the involved TRSs may be linear variable-separated.

It is also possible to use some predefined properties by name. Here we differentiate between properties of terms and properties of whole TRSs.

⟨property⟩ ::= ⟨prop_of_term⟩ | ⟨prop_of_system⟩

The properties on whole TRSs have the same names as defined in Sect. 6.

⟨prop_of_system⟩ ::= CR | WCR | SCR | NFP | UNC | UNR | WN | SN
                   | GCR | GWCR | GSCR | GNFP | GUNC | GUNR
                   | ⟨binary_prop⟩ ([ ⟨trss⟩ ],[ ⟨trss⟩ ])
⟨binary_prop⟩    ::= COM | GCOM | CE | GCE | NE | GNE

The term properties take a variable as an additional argument.

⟨prop_of_term⟩ ::= ⟨prop⟩ ( ⟨var⟩ ) | ⟨finiteness⟩ ( ⟨binrel⟩ , ⟨var⟩ )
⟨prop⟩         ::= CR | WCR | WN | NFP | SN | NF | SCR | UNR
⟨finiteness⟩   ::= INF | FIN
⟨binrel⟩       ::= ⟨binrel⟩ ⟨operator⟩ ⟨binrel⟩ | ∼ ⟨binrel⟩ | ⟨relation⟩

Note that the INF and FIN properties also take a binary relation as an argument. This is usually one of the predefined rewrite relations, but may also be a more complex relation constructed by combining the rewrite relations using logical operators. The property names (with the exception of NF and INF) are all just a shorthand for larger formulas. In general these correspond to the definitions of the property in Sect. 6. However, there are some exceptions.
Take for example ground-confluence (GCR). This unfolds to the formula

forall s, t, u (s -> u & s ->* t => exists v (u ->* v & t ->* v))

The s -> u on the left of the implication differs from the original definition of GCR. However, this property (known as semi-confluence [3]) can be shown to be equivalent to GCR by a simple induction proof, and generally leads to smaller automata in the decision procedure. The runtime comparison between different representations of ground-confluence and other properties is shown in Sect. 8.

Appendix B: User Interface of FORT-h

The command-line interface of FORT-h is

fort-h [OPTIONS] FORMULA TRS.trs ..

where TRS.trs .. is one or more files containing TRSs in the COPS format used in CoCo. It also supports many-sorted TRSs in the MSTRS format in the GCR category. The additional options are

-c FILE   write certificate to FILE
-i        enable the additional info in the inference steps of the certificate
-v        enables verbose output (e.g., the internal representation)
-w        enables witness generation

Witness generation enables the tool to produce witnesses/counterexamples and will be described in detail later in this section. For now, consider Example 28 and the call

> fort-h -w "CR" input.trs
NO
formula body / witness:
(0 (<- o ->*) 1 & ~0 (->* o *<-) 1)
0 = g(_00())
1 = g(_01())

So in addition to the answer NO, it also outputs a counterexample for the given formula consisting of the two terms g(_00()) and g(_01()). Here _00 and _01 are additional constants required to reduce confluence to ground-confluence, and represent variables. The terms should therefore be read as g(x) and g(y).

Appendix C: User Interface of FORT-s

The command-line interface of FORT-s is given below:

fort-s [OPTIONS] FORMULA [TRS.trs ..]

where [TRS.trs ..]
are zero or more files containing TRSs, and the options are -j NUM jobs to run in parallel (default: 1) -l search for linear variable-separated TRSs -n NUM number of systems to be synthesized (default: 1) -S STRING specifies signature (default: uses signature step) -a STRING specifies arities (default: uses signature step) -s NUM signature step (default: 2) -A NUM maximal generated arity (default: 3) -D NUM upper bound on height (default: 3) -r NUM lower bound on number of rules per system (default: 0) -R NUM upper bound on number of rules per system (default: 3) -v NUM upper bound on number of variables (default: 1) The signature used during synthesis can be specified in multiple ways, the two simplest being with the command line flags -S and -a. With the option -S the signature is specified by a string listing the symbols in F together with their arities, like in the call fort-s -S "a 0f2g1" "GCR & CR" Since we often do not care about the presentation of function symbols it is also permitted to just list arities with the option -a: fort-s -a "0 1" "WN & SN" FORT-s then generates unique symbol names for the user. If no signature is given, FORT-s generates successive signatures in a systematic manner with the help of a signature step and a bound on the maximal arity. If the signature step number is set to 1 and the arity is bounded by 3, signatures with the following arities are created: {0},{0, 1},{0, 1, 2},{0, 1, 2, 3},{0, 0, 1, 2, 3},{0, 0, 1, 1, 2, 3},... If the signature step is set to 2 (its default value), we obtain {0},{0, 0},{0, 0, 1},{0, 0, 1, 1},{0, 0, 1, 1, 2},... , {0, 0, 1, 1, 2, 2, 3, 3},{0, 0, 0, 1, 1, 2, 2, 3, 3},... The signature step is passed to FORT-s with the option -s and the bound on the arities by -A. Note that when additional systems are passed to FORT-s, it will use the union of the signatures of those systems. 
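The two arity sequences listed above are consistent with a simple round-robin rule: visit the arities 0, 1, ..., up to the bound in order, add as many symbols of the current arity as the signature step prescribes (one at a time), and start over at arity 0 once the bound has been reached. The following Python sketch reproduces the listed sequences under that assumption. The rule is inferred from the listed sequences only, the names `signatures` and `first` are hypothetical, and nothing here is taken from the FORT-s implementation.

```python
from itertools import islice
from typing import Iterator, List


def signatures(step: int, max_arity: int) -> Iterator[List[int]]:
    """Yield successive arity multisets as sorted lists.

    Assumed rule (inferred, not FORT-s's code): cycle through the
    arities 0..max_arity, adding `step` symbols of each arity one at
    a time, and yield the signature after every single addition.
    """
    counts = [0] * (max_arity + 1)  # counts[a] = number of symbols of arity a
    while True:
        for arity in range(max_arity + 1):
            for _ in range(step):
                counts[arity] += 1
                # Expand the counts into a sorted list of arities.
                yield [a for a, c in enumerate(counts) for _ in range(c)]


def first(n: int, step: int, max_arity: int) -> List[List[int]]:
    """Return the first n signatures for the given step and arity bound."""
    return list(islice(signatures(step, max_arity), n))


if __name__ == "__main__":
    for sig in first(6, step=1, max_arity=3):
        print(sig)
```

How the enumeration continues beyond the signatures shown in the text is part of the assumption; the sketch only guarantees agreement with the listed prefixes.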
When synthesizing n TRSs, in the given formula the indices 0 through n − 1 refer to the systems to be generated, and the indices greater than n − 1 refer to systems passed as additional inputs to FORT-s.

References

1. Aoto, T., Toyama, Y.: Ground confluence prover based on rewriting induction. In: Kesner, D., Pientka, B. (eds.) Proc. 1st International Conference on Formal Structures for Computation and Deduction. Leibniz International Proceedings in Informatics, vol. 52, pp. 33:1–33:12 (2016). https://doi.org/10.4230/LIPIcs.FSCD.2016.33
2. Aoto, T., Yoshida, J., Toyama, Y.: Proving confluence of term rewriting systems automatically. In: Treinen, R. (ed.) Proc. 20th International Conference on Rewriting Techniques and Applications. Lecture Notes in Computer Science, vol. 5595, pp. 93–102 (2009). https://doi.org/10.1007/978-3-642-02348-4_7
3. Baader, F., Nipkow, T.: Term Rewriting and All That. Cambridge University Press, Cambridge (1998). https://doi.org/10.1017/CBO9781139172752
4. Berghofer, S.: First-order logic according to Fitting. Archive of Formal Proofs (2007). https://isa-afp.org/entries/FOL-Fitting.html
5. Berghofer, S., Bulwahn, L., Haftmann, F.: Turning inductive into equational specifications. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) Proc. 22nd International Conference on Theorem Proving in Higher Order Logics. Lecture Notes in Computer Science, vol. 5674, pp. 131–146 (2009). https://doi.org/10.1007/978-3-642-03359-9_11
6. Comon, H.: Sequentiality, monadic second-order logic and tree automata. Inf. Comput. 157(1–2), 25–51 (2000). https://doi.org/10.1006/inco.1999.2838
7. Comon, H., Godoy, G., Nieuwenhuis, R.: The confluence of ground term rewrite systems is decidable in polynomial time. In: Proc. 42nd IEEE Symposium on Foundations of Computer Science, pp. 298–307 (2001). https://doi.org/10.1109/SFCS.2001.959904
8. Comon, H., Dauchet, M., Gilleron, R., Löding, C., Jacquemard, F., Lugiez, D., Tison, S., Tommasi, M.: Tree Automata Techniques and Applications (2008). http://tata.gforge.inria.fr/
9. Dauchet, M., Tison, S.: Decidability of confluence for ground term rewriting systems. In: Budach, L. (ed.) Proc. 5th International Conference on Fundamentals of Computation Theory. Lecture Notes in Computer Science, vol. 199, pp. 80–84 (1985). https://doi.org/10.1007/BFb0028794
10. Dauchet, M., Tison, S.: The theory of ground rewrite systems is decidable. In: Proc. 5th IEEE Symposium on Logic in Computer Science, pp. 242–248 (1990a). https://doi.org/10.1109/LICS.1990.113750
11. Dauchet, M., Tison, S.: The theory of ground rewrite systems is decidable (extended version). Technical Report I.T. 197, LIFL (1990b)
12. Dauchet, M., Heuillard, T., Lescanne, P., Tison, S.: Decidability of the confluence of finite ground term rewriting systems and of other related term rewriting systems. Inf. Comput. 88(2), 187–201 (1990). https://doi.org/10.1016/0890-5401(90)90015-A
13. de Bruijn, N.G.: Lambda calculus notation with nameless dummies: A tool for automatic formula manipulation, with application to the Church-Rosser theorem. Indagationes Mathematicae 34(5), 381–392 (1972). https://doi.org/10.1016/1385-7258(72)90034-0
14. Dershowitz, N.: Termination of linear rewriting systems (preliminary version). In: Even, S., Kariv, O. (eds.) Proc. 8th International Colloquium on Automata, Languages and Programming, vol. 115, pp. 448–458 (1981). https://doi.org/10.1007/3-540-10843-2_36
15. Dershowitz, N.: Open. Closed. Open. In: Giesl, J. (ed.) Proc. 16th International Conference on Rewriting Techniques and Applications. Lecture Notes in Computer Science, vol. 3467, pp. 276–393 (2005). https://doi.org/10.1007/978-3-540-32033-3_28
16. Dershowitz, N., Jouannaud, J.-P., Klop, J.W.: Open problems in rewriting. In: Book, R.V. (ed.) Proc. 4th International Conference on Rewriting Techniques and Applications. Lecture Notes in Computer Science, vol. 488, pp. 445–456 (1991). https://doi.org/10.1007/3-540-53904-2_120
17. Deruyver, A., Gilleron, R.: The reachability problem for ground TRS and some extensions. In: Proc. 14th Colloquium on Trees in Algebra and Programming. Lecture Notes in Computer Science, vol. 351, pp. 227–243 (1989). https://doi.org/10.1007/3-540-50939-9_135
18. Durand, I., Middeldorp, A.: Decidable call-by-need computations in term rewriting. Inf. Comput. 196(2), 95–126 (2005). https://doi.org/10.1016/j.ic.2004.10.003
19. Felgenhauer, B.: Deciding confluence of ground term rewrite systems in cubic time. In: Tiwari, A. (ed.) Proc. 23rd International Conference on Rewriting Techniques and Applications. Leibniz International Proceedings in Informatics, vol. 15, pp. 165–175 (2012). https://doi.org/10.4230/LIPIcs.RTA.2012.165
20. Felgenhauer, B.: Deciding confluence and normal form properties of ground term rewrite systems efficiently. Log. Methods Comput. Sci. (2018). https://doi.org/10.23638/LMCS-14(4:7)2018
21. Felgenhauer, B., Thiemann, R.: Reachability, confluence, and termination analysis with state-compatible automata. Inf. Comput. 253(3), 467–483 (2017). https://doi.org/10.1016/j.ic.2016.06.011
22. Felgenhauer, B., Middeldorp, A., Prathamesh, T.V.H., Rapp, F.: A verified ground confluence tool for linear variable-separated rewrite systems in Isabelle/HOL. In: Mahboubi, A., Myreen, M.O. (eds.) Proc. 8th ACM SIGPLAN International Conference on Certified Programs and Proofs, pp. 132–143 (2019). https://doi.org/10.1145/3293880.3294098
23. Giesl, J., Rubio, A., Sternagel, C., Waldmann, J., Yamada, A.: The termination and complexity competition. In: Beyer, D., Huisman, M., Kordon, F., Steffen, B. (eds.) Proc. 25th International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science, vol. 11429, pp. 156–166 (2019). https://doi.org/10.1007/978-3-030-17502-3_10
24. Godoy, G., Tiwari, A.: Confluence of shallow right-linear rewrite systems. In: Ong, L. (ed.) Proc. 14th International Conference on Computer Science Logic. Lecture Notes in Computer Science, vol. 3634, pp. 541–556 (2005). https://doi.org/10.1007/11538363_37
25. Godoy, G., Huntingford, E., Tiwari, A.: Termination of rewriting with right-flat rules. In: Baader, F. (ed.) Proc. 18th International Conference on Rewriting Techniques and Applications. Lecture Notes in Computer Science, vol. 4533, pp. 200–213 (2007). https://doi.org/10.1007/978-3-540-73449-9_16
26. Göller, S., Lohrey, M.: The first-order theory of ground tree rewrite graphs. Log. Methods Comput. Sci. (2014). https://doi.org/10.2168/LMCS-10(1:7)2014
27. Gutiérrez, R., Lucas, S., Vítores, M.: Confluence of conditional rewriting in logic form. In: Bojanczyk, M., Chekuri, C. (eds.) Proc. 41st IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science. Leibniz International Proceedings in Informatics, vol. 213, pp. 44:1–44:18 (2021). https://doi.org/10.4230/LIPIcs.FSTTCS.2021.44
28. Havlena, V., Holík, L., Lengal, O., Vales, O., Vojnar, T.: Antiprenexing for WSkS: A little goes a long way. In: Albert, E., Kovacs, L. (eds.) Proc. 23rd International Conference on Logic for Programming, Artificial Intelligence, and Reasoning. EPiC Series in Computing, vol. 73, pp. 298–316 (2020). https://doi.org/10.29007/6bfc
29. Jiresch, E.: A term rewriting laboratory with systematic and random generation and heuristic test facilities. Master's thesis, Vienna University of Technology (2008)
30. Kaiser, L.: Confluence of right ground term rewriting systems is decidable. In: Sassone, V. (ed.) Proc. 8th International Conference on Foundations of Software Science and Computation Structures. Lecture Notes in Computer Science, vol. 3441, pp. 470–489 (2005). https://doi.org/10.1007/978-3-540-31982-5_30
31. Klarlund, N., Møller, A., Schwartzbach, M.I.: MONA implementation secrets. Int. J. Found. Comput. Sci. 13(4), 571–586 (2002). https://doi.org/10.1142/S012905410200128X
32. Lochmann, A.: Reducing Rewrite Properties to Properties on Ground Terms. Archive of Formal Proofs (2022). https://isa-afp.org/entries/Rewrite_Properties_Reduction.html
33. Lochmann, A., Felgenhauer, B.: First-order theory of rewriting. Archive of Formal Proofs (2022). https://isa-afp.org/entries/FO_Theory_Rewriting.html
34. Lochmann, A., Middeldorp, A.: Formalized proofs of the infinity and normal form predicates in the first-order theory of rewriting. In: Biere, A., Parker, D. (eds.) Proc. 26th International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science, vol. 12079, pp. 178–194 (2020). https://doi.org/10.1007/978-3-030-45237-7_11
35. Lochmann, A., Felgenhauer, B., Sternagel, C., Thiemann, R., Sternagel, T.: Regular tree relations. Archive of Formal Proofs (2021a). https://www.isa-afp.org/entries/Regular_Tree_Relations.html
36. Lochmann, A., Middeldorp, A., Mitterwallner, F., Felgenhauer, B.: A verified decision procedure for the first-order theory of rewriting for linear variable-separated rewrite systems in Isabelle/HOL. In: Hriţcu, C., Popescu, A. (eds.) Proc. 10th ACM SIGPLAN International Conference on Certified Programs and Proofs, pp. 250–263 (2021b). https://doi.org/10.1145/3437992.
37. Lochmann, A., Mitterwallner, F., Middeldorp, A.: Formalized signature extension results for confluence, commutation and unique normal forms. In: Mimram, S., Rocha, C. (eds.) Proc. 10th International Workshop on Confluence, pp. 25–30 (2021)
38. Lochmann, A., Mitterwallner, F., Middeldorp, A.: Formalized signature extension results for equivalence. In: Winkler, S., Rocha, C. (eds.) Proc. 11th International Workshop on Confluence, pp. 42–47 (2022)
39. Marcinkowski, J.: Undecidability of the first order theory of one-step right ground rewriting. In: Comon, H. (ed.) Proc. 8th International Conference on Rewriting Techniques and Applications. Lecture Notes in Computer Science, vol. 1232, pp. 241–253 (1997). https://doi.org/10.1007/3-540-62950-5_75
40. Middeldorp, A.: Approximating dependency graphs using tree automata techniques. In: Goré, R., Leitsch, A., Nipkow, T. (eds.) Proc. 1st International Joint Conference on Automated Reasoning. LNAI, vol. 2083, pp. 593–610 (2001). https://doi.org/10.1007/3-540-45744-5_49
41. Middeldorp, A., Nagele, J., Shintani, K.: Confluence competition 2019. In: Beyer, D., Huisman, M., Kordon, F., Steffen, B. (eds.) Proc. 25th International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science, vol. 11429, pp. 25–40 (2019). https://doi.org/10.1007/978-3-030-17502-3_2
42. Mitterwallner, F., Lochmann, A., Middeldorp, A., Felgenhauer, B.: Certifying proofs in the first-order theory of rewriting. In: Groote, J.F., Larsen, K.G. (eds.) Proc. 27th International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science, vol. 12652, pp. 127–144 (2021). https://doi.org/10.1007/978-3-030-72013-1_7
43. Nagaya, T., Toyama, Y.: Decidability for left-linear growing term rewriting systems. Inf. Comput. 178(2), 499–514 (2002). https://doi.org/10.1006/inco.2002.3157
44. Nagele, J., Felgenhauer, B., Middeldorp, A.: CSI: New evidence – a progress report. In: de Moura, L. (ed.) Proc. 26th International Conference on Automated Deduction. LNAI, vol. 10395, pp. 385–397 (2017). https://doi.org/10.1007/978-3-319-63046-5_24
45. Plaisted, D.A.: Polynomial time termination and constraint satisfaction tests. In: Kirchner, C. (ed.) Proc. 5th International Conference on Rewriting Techniques and Applications. Lecture Notes in Computer Science, vol. 690, pp. 405–420 (1993). https://doi.org/10.1007/978-3-662-21551-7_30
46. Rapp, F., Middeldorp, A.: Automating the first-order theory of left-linear right-ground term rewrite systems. In: Kesner, D., Pientka, B. (eds.) Proc. 1st International Conference on Formal Structures for Computation and Deduction. Leibniz International Proceedings in Informatics, vol. 52, pp. 36:1–36:12 (2016). https://doi.org/10.4230/LIPIcs.FSCD.2016.36
47. Rapp, F., Middeldorp, A.: Confluence properties on open terms in the first-order theory of rewriting. In: Accattoli, B., Tiwari, A. (eds.) Proc. 5th International Workshop on Confluence, pp. 26–30 (2016)
48. Rapp, F., Middeldorp, A.: FORT 2.0. In: Galmiche, D., Schulz, S., Sebastiani, R. (eds.) Proc. 9th International Joint Conference on Automated Reasoning. LNAI, vol. 10900, pp. 81–88 (2018). https://doi.org/10.1007/978-3-319-94205-6_6
49. Shintani, K., Hirokawa, N.: CoLL: A confluence tool for left-linear term rewrite systems. In: Felty, A.P., Middeldorp, A. (eds.) Proc. 25th International Conference on Automated Deduction. Lecture Notes in Computer Science, vol. 9195, pp. 127–136 (2015). https://doi.org/10.1007/978-3-319-21401-6_8
50. Snyder, W.: A fast algorithm for generating reduced ground rewriting systems from a set of ground equations. J. Symbol. Comput. 15(4), 415–450 (1993). https://doi.org/10.1006/jsco.1993.1029
51. Sternagel, C., Sternagel, T.: Certifying confluence of almost orthogonal CTRSs via exact tree automata completion. In: Kesner, D., Pientka, B. (eds.) Proc. 1st International Conference on Formal Structures for Computation and Deduction. Leibniz International Proceedings in Informatics, vol. 52, pp. 29:1–29:16 (2016). https://doi.org/10.4230/LIPIcs.FSCD.2016.29
52. Stump, A., Zantema, H., Kimmell, G., Omar, R.E.H.: A rewriting view of simple typing. Log. Methods Comput. Sci. (2012). https://doi.org/10.2168/LMCS-9(1:4)2013
53. Thiemann, R., Sternagel, C.: Certification of termination proofs using CeTA. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) Proc. 22nd International Conference on Theorem Proving in Higher Order Logics. Lecture Notes in Computer Science, vol. 5674, pp. 452–468 (2009). https://doi.org/10.1007/978-3-642-03359-9_31
54. Treinen, R.: The first-order theory of linear one-step rewriting is undecidable. Theor. Comput. Sci. 208(1–2), 179–190 (1998). https://doi.org/10.1016/S0304-3975(98)00083-8
55. Vorobyov, S.: The undecidability of the first-order theories of one step rewriting in linear canonical systems. Inf. Comput. 175(2), 182–213 (2002). https://doi.org/10.1006/inco.2002.3151
56. Zantema, H.: Automatically finding non-confluent examples in term rewriting. In: Hirokawa, N., van Oostrom, V. (eds.) Proc. 2nd International Workshop on Confluence, pp. 11–15 (2013). http://cl-informatik.uibk.ac.at/iwc/iwc2013.pdf
57. Zantema, H.: Finding small counterexamples for abstract rewriting properties. Math. Struct. Comput. Sci. 28, 1485–1505 (2018). https://doi.org/10.1017/S0960129518000221

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Journal of Automated Reasoning, Springer Journals



Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2023
ISSN
0168-7433
eISSN
1573-0670
DOI
10.1007/s10817-023-09661-7

Abstract

The first-order theory of rewriting is decidable for linear variable-separated rewrite systems. We present a new decision procedure which is the basis of FORT, a decision and synthesis tool for properties expressible in the theory. The decision procedure is based on tree automata techniques and verified in Isabelle. Several extensions make the theory more expressive and FORT more versatile. We present a certificate language that enables the output of FORT to be certified by the certifier FORTify generated from the formalization, and we provide extensive experiments. Keywords Term rewriting· First-order theory· Tree automata· Formalization 1 Introduction Many properties of rewrite systems can be expressed as logical formulas in the first-order theory of rewriting. This theory is decidable for the class of linear variable-separated rewrite systems, which includes all ground rewrite systems. The decision procedure is based on tree automata techniques and goes back to Dauchet and Tison [10]. It is implemented in FORT [46, 48], which takes as input one or more rewrite systems R , R , ... and a formula ϕ,and 0 1 determines whether the rewrite systems satisfy the property expressed by ϕ, in which case it reports yes or no. FORT may not reach a conclusion due to limited resources. For properties related to confluence and termination, designated competitions (CoCo [41], termCOMP [23]) of software tools take place regularly. Occasionally, yes/no conflicts appear. Since the participating tools typically couple a plethora of techniques with sophisticated search strategies, human inspection of the output of tools to determine the correct answer is often not feasible. Hence certified categories were created in which tools must output a formal certificate. This certificate is verified by CeTA [53], an automatically generated Haskell program using the code generation feature of Isabelle. 
This requires not only that the underlying techniques are formalized in Isabelle, but the formalization must be executable for code generation to apply. During the time-consuming formalization process, mistakes in B Aart Middeldorp aart.middeldorp@uibk.ac.at Department of Computer Science, University of Innsbruck, Innsbruck, Austria 0123456789().: V,-vol 123 14 Page 2 of 76 A. Middeldorp et al. papers are sometimes brought to light. An additional outcome is that formalization efforts may give rise to simpler and more efficient constructions and algorithms. Since 2017 we are concerned with the question of how to ensure the correctness of the answers produced by FORT. The certifier CeTA supports a great many techniques for estab- lishing concrete properties like termination and confluence, but the formalizations in the underlying Isabelle Formalization of Rewriting (IsaFoR) are orthogonal to the ones required for supporting the decision procedure underlying FORT. We present a certificate language which is rich enough to express the various automata operations in decision procedures for the first-order theory of rewriting as well as numerous predicate symbols that may appear in formulas in this theory. FORTify, the verified Haskell program obtained from the Isabelle formalization, validates certificates in this language. The decision procedure implemented in FORT and formalized in Isabelle is based on three different tree automata models. We use standard bottom-up tree automata to represent various sets of ground terms. For (most) binary relations on ground terms, we use anchored ground tree transducers. These are a simplification of the ground tree transducers used in the literature [8–10, 12, 18] with better closure properties, reducing the number of constructions needed to represent the first-order theory of rewriting. Some of these closure properties are proved (and formalized) using the simple but equivalent class of pair automata. 
The third model are standard tree automata operating on a different signature in order to represent n-ary relations on ground terms, for arbitrary n (including n = 2). In the next section we present the basic definitions. Section 3 introduces the first-order theory of rewriting. In Sect. 4 we introduce in a systematic way several context closure operations on binary relations that are used to represent the binary predicates in the first-order theory of rewriting. Detailed proofs of the various results concerning the three tree automata models that are required for the decision procedure are presented in Sect. 5. Many of the results and tree automata constructions in this section are well-known, but are included for completeness and because the implementation in FORT and the subsequent formalization are directly based on them. Tree automata operate on ground terms. In Sect. 6 we present the formalized signature extension results that allow to reduce certain properties on arbitrary terms to properties on ground terms. In Sect. 7 the decision and synthesis modes of FORT are described, and a new undecidability proof related to the latter is presented. We also discuss the representation of formulas in certificates and the certificate language, and we explain how certificates are validated by FORTify,the verified Haskell program obtained from the Isabelle formalization. Experimental results are presented in Sect. 8, before we conclude in Sect. 9. In an appendix the input syntax and the interface of the tools is presented. The formalization is based on Isabelle/HOL. Our contribution is split into three parts, which are published as separate entries in the Archive of Formal Proofs. The first part [35] contains general results about bottom-up tree automata, ported from IsaFoR, extended with constructions and results about anchored ground tree transducers, pair automata, and regular relation automata. 
The second part [33] formalizes primitive constructions needed to decide the first-order theory of rewriting. Moreover, it connects the logical semantic entailment of first-order formulas to regular tree languages. This connection gives rise to a natural descrip- tion of the decision procedure. The specification allows tool authors to generate certificates (which can be viewed as a formal proof claim using appropriate automata construction for the corresponding logical connectives and predicates). We rely on the code generation facility http://cl-informatik.uibk.ac.at/isafor/ https://www.isa-afp.org 123 First-Order Theory of Rewriting… Page 3 of 76 14 of Isabelle/HOL to obtain the certifier FORTify that is able to verify the integrity of such certificates. The third part [32] is independent, and covers the results in Sect. 6. The formalization can be accessed via the following links: • https://www.isa-afp.org/entries/Regular_Tree_Relations.html • https://www.isa-afp.org/entries/FO_Theory_Rewriting.html • https://www.isa-afp.org/entries/Rewrite_Properties_Reduction.html Most definitions, theorems, and lemmata in this paper directly correspond to the formal- ization. These are indicated by the  symbol, which links to an HTML rendering of our formalization, for those who like to dive right into the actual Isabelle code. In the running text (traditional) proof details are given. This article combines and extends earlier papers that appeared in conference and informal workshop proceedings. These cover system descriptions of earlier versions of FORT [46, 48], formalization and certification aspects [22, 34, 36, 42], as well as results for dealing with properties on non-ground terms [37, 38, 47]. Many new examples to illustrate the various constructions were added and the presentation is self-contained. The efficiency improvements described in Sect. 7 are new. The same is true for the undecidability result in Sect. 7.5. 
Also several of the experiments that we present in Sect. 8 have not been described before. 2 Preliminaries In this preliminary section we recall basic definitions and notations of term rewriting [3]and tree automata [8]. 2.1 Term Rewriting We assume a finite signature F containing at least one constant symbol and a disjoint set of variables V. The set of terms built up from F and V is denoted by T (F , V), while T (F ) denotes the (non-empty) set of ground terms. The set of variables occurring in a term t is denoted by Var(t ). A term is linear if it does not contain multiple occurrences of the same variable. Positions are strings of positive integers which are used to address subterms. The set of positions in a term t is denoted by Pos(t ) and the root position by ε. The function symbol at position p ∈ Pos(t ) is denoted by t (p) and t[u] denotes the result of replacing the subterm t| of t at position p by the term u. The height height(t ) of a term t is the length of a longest position in Pos(t ). A substitution is a mapping σ from variables to terms and t σ denotes the result of applying σ to a term t. A context C is a term that contains exactly one hole, denoted by the special constant  ∈ / F. We write C[t] for the result of replacing the hole in C by the term t. A term rewrite system (TRS) R is a set of rules  → r between terms , r ∈ T (F , V).ATRS R is linear if its rewrite rules consist of linear terms. We call R variable-separated if Var()∩ Var(r ) = ∅ for every  → r ∈ R. In this paper we are concerned with finite, linear, variable-separated TRSs R and we (mostly) consider rewriting on ground terms: t → u for ground terms t, u if there exist a context C, a rewrite rule  → r ∈ R, and a substitution σ such that t = C[σ] and u = C[r σ]. We write → for the reflexive and transitive closure of→ . Further relations on terms will be introduced in the next section. We drop the subscript R when it can be inferred from the context. 
A ground normal form is a ground term t such that t → u for no term u. We write NF(R) for the set of ground normal forms of R. 123 14 Page 4 of 76 A. Middeldorp et al. Example 1 We use the TRS R consisting of the rewrite rules a →bf(a) →bg(a, x ) → f(a) over the signature F ={a, b, f, g} as leading example in this paper. We have f(g(a, b)) → f(f(a)) → f(b) R R with ground normal form f(b). 2.2 Tree Automata A (finite bottom-up) tree automaton A = (F , Q, Q ,) consists of a finite signature F,a finite set Q of states, disjoint from F,asubset Q ⊆ Q of final states, and a set of transition rules . Every transition rule has one of the following two shapes: • f (p ,..., p ) → q with f ∈ F and p ,..., p , q ∈ Q,or 1 n 1 n • p → q with p, q ∈ Q. Transition rules of the second shape are called ε-transitions. Transition rules can be viewed as rewrite rules between ground terms in T (F∪ Q, V). The induced rewrite relation is denoted by → or → . A ground term t ∈ T (F ) is accepted by A if t → q for some q ∈ Q . A f The set of all accepted terms is denoted by L(A) and a set L of ground terms is regular if L = L(A) for some tree automaton A. A tree automaton A is deterministic if there are no ε-transitions and no two transition rules with the same left-hand side. We say that A is completely defined if it contains a transition rule with left-hand side f (p ,..., p ) for every 1 n n-ary function symbol f and every combination p ,..., p of states. All regular sets are 1 n accepted by a completely defined, deterministic tree automaton. The class of regular sets is effectively closed under Boolean operations. Moreover, membership and emptiness are decidable. For relations on ground terms two different types of automata are used. The first one is restricted to binary relations. A ground tree transducer (GTT for short) is a pair G = (A, B) of tree automata over the same signature F.Let s and t be ground terms in T (F ). 
We say that ∗ ∗ the pair (s, t ) is accepted by G if s → u t for some term u ∈ T (F∪ Q).Here Q is the A B combined set of states of A and B. The set of all such pairs is denoted by L(G). Observe that L(G) is a binary relation on T (F ). A binary relation  on ground terms is a GTT relation if there exists a GTT G such that  = L(G).In FORT we deal with anchored GTTs, which are GTTs with a different acceptance condition: A pair (s, t ) of ground terms is accepted by ∗ ∗ an anchored GTT G if s → q t for some (common) state q. The set of all such pairs A B is denoted by L (G). It can be shown that the resulting language class coincides with binary Rec which is defined in [8, Sect. 3.2.1] as the class of finite unions of Cartesian products of regular sets. The more operational view above benefits the developments described in subsequent sections. We obviously have L (G) ⊆ L(G). Anchored GTT relations have the advantage that they can represent the root-step relation→ , which is not possible with GTT relations as the latter are always reflexive. Moreover, they have better closure properties than GTT relations. When we speak of “anchored GTTs”, we always have L (G) in mind. The second method for representing relations on ground terms uses standard tree automata operating on an encoding of the relation as a set of ground terms over a special signature. For (n) n a signature F and n  0we let F = (F ∪{⊥}) . Here, ⊥ ∈ / F is a fresh constant. The (n) arity of a symbol f ... f ∈ F is the maximum of the arities of f ,..., f and 0 if n = 0. 1 n 1 n (n) Given n terms t ,..., t ∈ T (F ),the term t ,..., t is the unique term u ∈ T (F ) such 1 n 1 n → First-Order Theory of Rewriting… Page 5 of 76 14 that Pos(u) = Pos(t )∪···∪Pos(t ) and u(p) = f ··· f where f = t (p) if p ∈ Pos(t ) 1 n 1 n i i i and⊥ otherwise, for all positions p ∈ Pos(u).If n = 0then Pos(u)={ ε} and u(ε) is the empty sequence. 
Example 2 For F ={a, b, f, g} in Example 1 we have (2) g(a, f(b)), f(a) = gf(aa, f⊥(b⊥)) ∈ T (F ) (3) a, f(f(b)), g(b, a) = afg(⊥fb(⊥b⊥),⊥⊥a) ∈ T (F ) An n-ary relation R on T (F ) is regular if its encoding { t ,..., t | (t ,..., t ) ∈ 1 n 1 n R} is regular. The class of all n-ary regular relations is denoted by RR . Every (anchored) GTT relation belongs to RR . The well-known construction (presented later in the proof of Theorem 10) is used to decide membership for anchored GTT relations. 3 First-Order Theory of Rewriting We consider first-order logic over a language L without function symbols. The language contains the following binary predicate symbols: →→ = Further predicate symbols will be added to L later in this paper. As models we consider finite linear variable-separated TRSs (F , R) such that the set of ground terms T (F ) is non-empty, which is equivalent to the requirement that the signature F contains at least one constant symbol. The set of ground terms serves as domain for the variables in formulas over L.The interpretation of the predicate symbol→ in (F , R) is the one-step rewrite relation→ over T (F ),→ denotes its transitive-reflexive closure, and= is interpreted as equality on ground terms. Variable-separated TRSs appear naturally when approximating TRSs that satisfy the usual variable restriction (Var(r ) ⊆ Var() for every rewrite rule  → r), to achieve regularity of the set of reachable terms starting from a regular set of ground terms. The support for linear variable-separated TRSs opens up the possibility of using FORT to compute depen- dency graphs based on the non-variable approximation for termination analysis [40], check infeasibility of conditional critical pairs for confluence analysis of conditional TRSs [51], and compute needed redexes based on the strong and non-variable approximations for the analysis of optimal normalizing strategies [18]. 
The following example gives an idea of the decision procedure for the first-order theory of rewriting. It shows how (closure) operations on tree automata and GTTs are used to obtain tree automata, each of which represents the tuples of ground terms satisfying a subformula of the formula of interest. These operations are presented in Sect. 5 together with correctness proofs that have been formalized.

Example 3 Consider the formula $\varphi = \forall s\, \exists t\, (s \to^{*} t \land \lnot \exists u\, (t \to u))$, which expresses the normalization property of TRSs. To determine whether a given linear variable-separated TRS $\mathcal{R}$ over a signature $\mathcal{F}$ satisfies $\varphi$, we construct automata for the subformulas of the formula in a bottom-up fashion. We start with an $\mathsf{RR}_1$ automaton $\mathcal{A}_1$ that accepts the ground normal forms in $\mathcal{T}(\mathcal{F})$, using an algorithm first described in [6] and covered in Sect. 5.4:

$$\mathsf{RR}_1\ \mathcal{A}_1\colon \quad L(\mathcal{A}_1) = \{ t \mid t \in \mathsf{NF}(\mathcal{R}) \} \qquad \text{(Theorem 15)}$$

Here $t \in \mathsf{NF}(\mathcal{R})$ stands for $\lnot \exists u\, (t \to u)$. Next we construct an anchored GTT $\mathcal{G}_1$ accepting the root-step relation of $\mathcal{R}$:

$$\text{GTT}\ \mathcal{G}_1\colon \quad L_a(\mathcal{G}_1) = \{ (s,t) \mid s \to_\varepsilon t \} \qquad \text{(Theorem 4)}$$

Using a modified transitive closure operation, we obtain an anchored GTT $\mathcal{G}_2$:

$$\text{GTT}\ \mathcal{G}_2\colon \quad L_a(\mathcal{G}_2) = \{ (s,t) \mid s \to^{*} \cdot \to_\varepsilon \cdot \to^{*} t \} \qquad \text{(Theorem 8)}$$

Since anchored GTT relations are also $\mathsf{RR}_2$ relations, we can construct an equivalent $\mathsf{RR}_2$ automaton $\mathcal{A}_2$:

$$\mathsf{RR}_2\ \mathcal{A}_2\colon \quad L(\mathcal{A}_2) = \{ \langle s,t \rangle \mid s \to^{*} \cdot \to_\varepsilon \cdot \to^{*} t \} \qquad \text{(Theorem 10)}$$

Using a special context closure operation, we obtain an $\mathsf{RR}_2$ automaton $\mathcal{A}_3$ accepting the encoding of $\to^{*}$:

$$\mathsf{RR}_2\ \mathcal{A}_3\colon \quad L(\mathcal{A}_3) = \{ \langle s,t \rangle \mid s \to^{*} t \} \qquad \text{(Theorem 11)}$$

Before the conjunction in $s \to^{*} t \land t \in \mathsf{NF}(\mathcal{R})$ can be constructed, the arities of the $\mathsf{RR}_2$ automaton $\mathcal{A}_3$ and the $\mathsf{RR}_1$ automaton $\mathcal{A}_1$ have to match. With this goal $\mathcal{A}_1$ is cylindrified ($\mathcal{C}_1$) to construct the $\mathsf{RR}_2$ automaton $\mathcal{A}_4$. Here care has to be taken that not only the arities match, but also that terms taking the place of variables shared by both formulas are at the same position $i$ in the encoding $\langle t_1, \dots, t_i, \dots, t_n \rangle$ of both automata:

$$\mathsf{RR}_2\ \mathcal{A}_4\colon \quad L(\mathcal{A}_4) = \{ \langle s,t \rangle \mid t \in \mathsf{NF}(\mathcal{R}) \} \qquad \text{(Theorem 14)}$$

After this, the intersection with $\mathcal{A}_3$ results in the $\mathsf{RR}_2$ automaton $\mathcal{A}_5$ that models the conjunction:

$$\mathsf{RR}_2\ \mathcal{A}_5\colon \quad L(\mathcal{A}_5) = \{ \langle s,t \rangle \mid s \to^{*} t \land t \in \mathsf{NF}(\mathcal{R}) \} \qquad \text{(Theorem 12)}$$

Applying the second projection ($\Pi_2$, which removes the second component) produces the $\mathsf{RR}_1$ automaton $\mathcal{A}_6$:

$$\mathsf{RR}_1\ \mathcal{A}_6\colon \quad L(\mathcal{A}_6) = \{ s \mid \exists t\, (s \to^{*} t \land t \in \mathsf{NF}(\mathcal{R})) \} \qquad \text{(Theorem 14)}$$

At this point $\varphi$ holds if and only if $L(\mathcal{A}_6) = \mathcal{T}(\mathcal{F})$. In FORT the $\forall$ quantifier is transformed into the equivalent $\lnot \exists \lnot$. Hence complementation is used to obtain an $\mathsf{RR}_1$ automaton $\mathcal{A}_7$

$$\mathsf{RR}_1\ \mathcal{A}_7\colon \quad L(\mathcal{A}_7) = \{ s \mid \lnot \exists t\, (s \to^{*} t \land t \in \mathsf{NF}(\mathcal{R})) \} \qquad \text{(Theorem 13)}$$

and the existential quantifier is implemented using projection. This gives an $\mathsf{RR}_0$ automaton $\mathcal{A}_8$ which accepts either the empty relation $\varnothing$ or the singleton set $\{()\}$ consisting of the nullary tuple $()$. The outermost negation gives rise to another complementation step. The final $\mathsf{RR}_0$ automaton $\mathcal{A}_9$ is tested for emptiness: $L(\mathcal{A}_9) = \varnothing$ if and only if the TRS $\mathcal{R}$ does not satisfy $\varphi$.

[Fig. 1: Automata operations for the predicates in the first-order theory of rewriting]

In order to express termination in the first-order theory of rewriting, we extend $\mathcal{L}$ with the binary predicate symbol $\to^{+}$ (which denotes the transitive closure of $\to$) and the unary predicate defined below (which goes back to a technical report by Dauchet and Tison [11]).

Definition 1 Let $\bowtie$ be an arbitrary binary relation on $\mathcal{T}(\mathcal{F})$. We write $\mathsf{INF}_{\bowtie}$ for the set $\{ t \in \mathcal{T}(\mathcal{F}) \mid t \bowtie u \text{ for infinitely many terms } u \in \mathcal{T}(\mathcal{F}) \}$.

If we instantiate $\mathsf{INF}_{\bowtie}$ by taking ${\bowtie} = {\to^{*}}$, we obtain the predicate $\mathsf{INF}_{\to^{*}}$ that is satisfied by ground terms that have infinitely many reducts.
By forbidding cycles, we obtain the formula $\lnot \exists t\, (\mathsf{INF}_{\to^{*}}(t) \lor t \to^{+} t)$ that expresses termination of finite variable-separated TRSs.

The grammar in Fig. 1 lists the formalized (closure) operations for the predicates in the first-order theory of rewriting. Here $A$ are anchored GTT relations, $R$ are $\mathsf{RR}_2$ relations, and $T$ are regular sets of ground terms. Some of the operations will be introduced in subsequent sections. The TRS $\mathcal{R}$ enters the picture in three places. First of all, $\to_\varepsilon$ is the root-step relation of $\mathcal{R}$. Secondly, $\mathsf{NF}$ denotes the set of ground normal forms of $\mathcal{R}$. Finally, $\mathcal{T}(\mathcal{F})$ denotes the set of ground terms, which depends on the signature $\mathcal{F}$ of $\mathcal{R}$. Every atomic subformula (predicate) will be represented as an $\mathsf{RR}_1$ or $\mathsf{RR}_2$ relation. The logical structure of formulas in the first-order theory of rewriting is taken care of by additional closure operations on $\mathsf{RR}$ relations.

4 Context Operations

In the next section we describe formalized automata constructions to decide the first-order theory of rewriting. To save considerable formalization efforts, we introduce a few primitives that operate on binary relations that are accepted by various kinds of tree automata. These primitives are sufficient to generate all binary rewrite relations supported by FORT. For defining the semantics of the primitives, we introduce some context operations on binary relations in this section.

Definition 2 Let $\mathcal{F}$ be a signature. A multi-hole context is an element of $\mathcal{T}(\mathcal{F} \uplus \{\square\})$ where $\square$ is a fresh constant symbol, called hole. If $C$ is a multi-hole context with $n \geqslant 0$ holes and $t_1, \dots, t_n$ are terms in $\mathcal{T}(\mathcal{F})$ then $C[t_1, \dots, t_n]$ denotes the term in $\mathcal{T}(\mathcal{F})$ obtained from $C$ by replacing the holes from left to right with $t_1, \dots, t_n$. We write $\mathcal{C}$ for the set of all multi-hole contexts.
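Given a binary relation $\bowtie$ on ground terms in $\mathcal{T}(\mathcal{F})$ and a set of multi-hole contexts $\mathcal{D} \subseteq \mathcal{C}$, we write $\mathcal{D}(\bowtie)$ for the relation

$$\{ (C[t_1, \dots, t_n], C[u_1, \dots, u_n]) \mid C \in \mathcal{D} \text{ has } n \text{ holes and } t_i \bowtie u_i \text{ for all } 1 \leqslant i \leqslant n \}.$$

We consider two ways to restrict multi-hole contexts: restricting the number of holes and restricting the position of the holes.

• We denote the set of multi-hole contexts with exactly one hole by $\mathcal{C}^{1}$. The set of multi-hole contexts with at least one hole is denoted by $\mathcal{C}^{>}$. Moreover $\mathcal{C}^{\geqslant}$ simply denotes $\mathcal{C}$.
• We denote the set of multi-hole contexts with the property that every hole occurs below the root position by $\mathcal{C}_{>}$. This includes the set $\mathcal{T}(\mathcal{F})$ of ground terms (which are multi-hole contexts without holes). Similarly, $\mathcal{C}_{\varepsilon}$ denotes the set of multi-hole contexts with the property that every hole occurs at the root position. So $\mathcal{C}_{\varepsilon} = \{\square\} \cup \mathcal{T}(\mathcal{F})$. Moreover, $\mathcal{C}_{\geqslant}$ simply denotes $\mathcal{C}$.

By combining both types of restrictions, we obtain nine ways for defining new binary relations.

Definition 3 Let $\bowtie$ be a binary relation on $\mathcal{T}(\mathcal{F})$. Given a number constraint $n \in \{\geqslant, 1, >\}$ and a position constraint $p \in \{\geqslant, \varepsilon, >\}$, we define the binary relation $\bowtie^{n}_{p}$ on $\mathcal{T}(\mathcal{F})$ as $(\mathcal{C}^{n} \cap \mathcal{C}_{p})(\bowtie)$.

Note that ${\bowtie^{\geqslant}_{\varepsilon}} = {\bowtie^{=}}$ and ${\bowtie^{1}_{\varepsilon}} = {\bowtie^{>}_{\varepsilon}} = {\bowtie}$, for any $\bowtie$. Here ${\bowtie^{=}} = {\bowtie} \cup \{=\}$ denotes the reflexive closure of $\bowtie$.

Example 4 Recall the TRS $\mathcal{R}$ from our leading example and consider the multi-hole contexts

$$C_1 = \square \qquad C_2 = \mathsf{f}(\square) \qquad C_3 = \mathsf{g}(\square, \mathsf{a}) \qquad C_4 = \mathsf{g}(\square, \square) \qquad C_5 = \mathsf{f}(\mathsf{a})$$

We have $C_1, C_2, C_3 \in \mathcal{C}^{1}$, $C_1, C_2, C_3, C_4 \in \mathcal{C}^{>}$, $C_1, C_5 \in \mathcal{C}_{\varepsilon}$, and $C_2, C_3, C_4, C_5 \in \mathcal{C}_{>}$. Moreover, $(C_2[\mathsf{a}], C_2[\mathsf{b}]) \in (\to_\varepsilon)^{1}_{>}$ and $(C_4[\mathsf{a}, \mathsf{a}], C_4[\mathsf{b}, \mathsf{b}]) \notin (\to_\varepsilon)^{1}_{>}$.

Because $\mathcal{C}^{\geqslant} = \mathcal{C}_{\geqslant} = \mathcal{C}$, the relation $\bowtie^{\geqslant}_{\geqslant}$ is the multi-hole context closure of $\bowtie$. Using the root-step relation $\to_\varepsilon$ induced by a linear variable-separated TRS $\mathcal{R}$ as $\bowtie$, we obtain eight different relations for $(\to_\varepsilon)^{n}_{p}$:

$$\begin{array}{lll}
(\to_\varepsilon)^{\geqslant}_{\geqslant} = {\xrightarrow{\parallel}} & (\to_\varepsilon)^{1}_{\geqslant} = {\to} & (\to_\varepsilon)^{>}_{\geqslant} = {\dot{\xrightarrow{\parallel}}} \\
(\to_\varepsilon)^{\geqslant}_{\varepsilon} = {\to^{=}_{\varepsilon}} & (\to_\varepsilon)^{1}_{\varepsilon} = {\to_\varepsilon} & (\to_\varepsilon)^{>}_{\varepsilon} = {\to_\varepsilon} \\
(\to_\varepsilon)^{\geqslant}_{>} = {\xrightarrow{\parallel}}_{>\varepsilon} & (\to_\varepsilon)^{1}_{>} = {\to_{>\varepsilon}} & (\to_\varepsilon)^{>}_{>} = {\dot{\xrightarrow{\parallel}}}_{>\varepsilon}
\end{array}$$

Here $\xrightarrow{\parallel}$ denotes a parallel step (which is the multi-hole context closure of $\to_\varepsilon$), $\dot{\xrightarrow{\parallel}}$ a non-empty parallel step, $\xrightarrow{\parallel}_{>\varepsilon}$ a parallel step where only redexes below the root are contracted, and $\dot{\xrightarrow{\parallel}}_{>\varepsilon}$ a non-empty parallel step where only redexes below the root are contracted.

Example 5 Consider the term pairs $\pi_1 = (\mathsf{g}(\mathsf{a}, \mathsf{a}), \mathsf{g}(\mathsf{b}, \mathsf{b}))$, $\pi_2 = (\mathsf{g}(\mathsf{a}, \mathsf{a}), \mathsf{f}(\mathsf{a}))$, and $\pi_3 = (\mathsf{g}(\mathsf{a}, \mathsf{a}), \mathsf{g}(\mathsf{a}, \mathsf{a}))$. We have $\pi_1, \pi_2, \pi_3 \in {\xrightarrow{\parallel}}$, $\pi_1, \pi_2 \in {\dot{\xrightarrow{\parallel}}}$, $\pi_1 \in {\dot{\xrightarrow{\parallel}}}_{>\varepsilon}$, and $\pi_3 \in {\xrightarrow{\parallel}}_{>\varepsilon} \setminus {\dot{\xrightarrow{\parallel}}}_{>\varepsilon}$.

5 Formalized Tree Automata Constructions

In this section we present constructions on tree automata and (anchored) GTTs that are required for the decision procedure. Most of the results are known [8]. We give explicit proofs, providing detailed constructions that form the basis of the implementation of the decision procedure in FORT as well as the formalization in Isabelle.

Let $\mathcal{A} = (\mathcal{F}, Q, Q_f, \Delta)$ be a tree automaton. A state $q \in Q$ is reachable if $t \to^{*}_{\mathcal{A}} q$ for some term $t \in \mathcal{T}(\mathcal{F})$. We say that $q$ is productive if $C[q] \to^{*}_{\mathcal{A}} q_f$ for some ground context $C$ and final state $q_f \in Q_f$. The automaton $\mathcal{A}$ is trim if all states are both reachable and productive. Any tree automaton can be transformed into an equivalent trim automaton. This result has been formalized in IsaFoR by Felgenhauer and Thiemann [21]. The construction preserves determinism. The following results are well-known.

Lemma 1 ($T ::= \mathcal{T}(\mathcal{F})$) The set of ground terms over a finite signature $\mathcal{F}$ is regular.

Theorem 1 ($T ::= T \cup T \mid T \cap T \mid T^{c}$) The class of regular sets is effectively closed under union, intersection, and complement.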
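Before we turn to the infinity predicate ($T ::= \mathsf{INF}_R$), we present an important closure operation on regular relations. Other closure operations will be presented in Sect. 5.3.

Definition 4 Let $R$ be an $n$-ary relation over $\mathcal{T}(\mathcal{F})$. If $n \geqslant 1$ and $1 \leqslant i \leqslant n$ then the $i$-th projection of $R$ is the relation $\Pi_i(R) = \{ (t_1, \dots, t_{i-1}, t_{i+1}, \dots, t_n) \mid (t_1, \dots, t_n) \in R \}$.

Note that $\Pi_1$ removes the first component of an $\mathsf{RR}_n$ relation. So for a binary regular relation $R$, $\Pi_1(R)$ coincides with $\pi_2(R)$ in the grammar in Fig. 1.

Theorem 2 ($T ::= \pi_1(R) \mid \pi_2(R)$) The class of regular relations is effectively closed under projection.

Proof (construction) Let $\mathcal{A} = (\mathcal{F}^{(n)}, Q, Q_f, \Delta)$ be a tree automaton that accepts $\langle R \rangle$. Assume $n \geqslant 1$ and let $1 \leqslant i \leqslant n$. We construct a tree automaton that accepts $\langle \Pi_i(R) \rangle$. We assume that all states of $\mathcal{A}$ are reachable and define $\mathcal{A}_i = (\mathcal{F}^{(n-1)}, Q, Q_f, \Delta_i)$ where $\Delta_i$ is obtained from $\Delta$ by replacing every transition rule of the form

$$f_1 \cdots f_{i-1}\, f_i\, f_{i+1} \cdots f_n (p_1, \dots, p_m) \to q$$

with

$$f_1 \cdots f_{i-1}\, f_{i+1} \cdots f_n (p_1, \dots, p_k) \to q$$

provided $n = 1$ or $f_1 \cdots f_{i-1} f_{i+1} \cdots f_n \neq \bot^{n-1}$ for $n > 1$. Here $k \leqslant m$ is the arity of $f_1 \cdots f_{i-1} f_{i+1} \cdots f_n$. Epsilon transitions in $\Delta$ are not affected. Note that for $n = 1$ this results in an automaton over the signature containing only a single constant $()$ (the nullary tuple). The proof that $L(\mathcal{A}_i) = \langle \Pi_i(R) \rangle$ is given at the end of Sect. 5.3.

Example 6 Consider the tree automaton $\mathcal{A} = (\mathcal{F}^{(2)}, \{0, \dots, 6\}, \{6\}, \Delta)$ with $\mathcal{F} = \{\mathsf{a}, \mathsf{b}, \mathsf{f}, \mathsf{g}\}$ and $\Delta$ consisting of the transition rules

$$\begin{array}{llll}
\mathsf{aa} \to 0 & \mathsf{bb} \to 0 & \mathsf{gg}(0) \to 0 & \mathsf{ff}(0, 0) \to 0 \\
\mathsf{ab} \to 1 & \mathsf{bb} \to 1 & \mathsf{gb}(2) \to 1 & \mathsf{fb}(2, 2) \to 1 \\
\mathsf{a}\bot \to 2 & \mathsf{b}\bot \to 2 & \mathsf{g}\bot(2) \to 2 & \mathsf{f}\bot(2, 2) \to 2 \\
\mathsf{a}\bot \to 3 & \bot\mathsf{b} \to 5 & \mathsf{fg}(1, 3) \to 6 & \mathsf{gf}(4, 5) \to 6 \\
\mathsf{aa} \to 4 & \mathsf{gg}(6) \to 6 & \mathsf{ff}(6, 0) \to 6 & \mathsf{ff}(0, 6) \to 6
\end{array}$$

This automaton accepts the encoding of $\to_{\mathcal{R}}$ on $\mathcal{T}(\mathcal{F})$ induced by the TRS $\mathcal{R}$ consisting of the rewrite rules

$$\mathsf{f}(x, \mathsf{a}) \to \mathsf{g}(\mathsf{b}) \qquad \mathsf{g}(\mathsf{a}) \to \mathsf{f}(\mathsf{a}, \mathsf{b})$$

For the first projection we obtain the automaton $\Pi_1(\mathcal{A})$ consisting of the transition rules

$$\begin{array}{llll}
\mathsf{a} \to 0 & \mathsf{b} \to 0 & \mathsf{g}(0) \to 0 & \mathsf{f}(0, 0) \to 0 \\
\mathsf{b} \to 1 & \mathsf{b} \to 5 & \mathsf{g}(1) \to 6 & \mathsf{f}(4, 5) \to 6 \\
\mathsf{a} \to 4 & \mathsf{g}(6) \to 6 & \mathsf{f}(6, 0) \to 6 & \mathsf{f}(0, 6) \to 6
\end{array}$$

Note that the third row of transitions in $\Delta$ disappeared completely. The rule $\mathsf{fg}(1, 3) \to 6$ is transformed into $\mathsf{g}(1) \to 6$, so state 3 is dropped. The second projection results in the automaton $\Pi_2(\mathcal{A})$ that accepts the reducible ground terms of $\mathcal{R}$:

$$\begin{array}{llll}
\mathsf{a} \to 0 & \mathsf{b} \to 0 & \mathsf{g}(0) \to 0 & \mathsf{f}(0, 0) \to 0 \\
\mathsf{a} \to 1 & \mathsf{b} \to 1 & \mathsf{g}(2) \to 1 & \mathsf{f}(2, 2) \to 1 \\
\mathsf{a} \to 2 & \mathsf{b} \to 2 & \mathsf{g}(2) \to 2 & \mathsf{f}(2, 2) \to 2 \\
\mathsf{a} \to 3 & & \mathsf{f}(1, 3) \to 6 & \mathsf{g}(4) \to 6 \\
\mathsf{a} \to 4 & \mathsf{g}(6) \to 6 & \mathsf{f}(6, 0) \to 6 & \mathsf{f}(0, 6) \to 6
\end{array}$$

We now present a formalized proof of a version of the pumping lemma that we need for the infinity predicate $\mathsf{INF}$ (in the proof of Theorem 3 below).

Lemma 2 Let $\mathcal{A} = (\mathcal{F}, Q, Q_f, \Delta)$ be a tree automaton and $t \to^{*} q$ with $t \in \mathcal{T}(\mathcal{F})$ and $q \in Q$. If $\mathsf{height}(t) > |Q|$ then there exist contexts $C_1$ and $C_2 \neq \square$, a term $u$, and a state $p$ such that $t = C_1[C_2[u]]$, $u \to^{*} p$, $C_2[p] \to^{*} p$, and $C_1[p] \to^{*} q$.

Proof From the assumptions $t \to^{*} q$ and $\mathsf{height}(t) > |Q|$ we obtain a sequence

$$(t_1, \dots, t_{n+1}, q_1, \dots, q_{n+1}, D_1, \dots, D_n)$$

consisting of ground terms, states, and non-empty contexts with $n > |Q|$ such that

• $t_i \to^{*} q_i$ for all $i \leqslant n + 1$,
• $D_i[t_i] = t_{i+1}$ and $D_i[q_i] \to^{*} q_{i+1}$ for all $i \leqslant n$, and
• $q_{n+1} = q$ and $t_{n+1} = t$

by a straightforward induction proof on $t$. Because $n > |Q|$ there exist indices $1 \leqslant i < j \leqslant n$ such that $q_i = q_j$. We construct the contexts $C_1 = D_n[\dots[D_j]\dots]$ and $C_2 = D_{j-1}[\dots[D_i]\dots]$. Note that $C_2 \neq \square$ as $i < j$. We obtain $C_2[q_i] \to^{*} q_j$ and $C_1[q_j] \to^{*} q_{n+1}$ by induction on the difference $j - i$. By letting $p = q_i = q_j$ and $u = t_i$ we obtain the desired result.

5.1 Infinity Predicate

Below we show that $\mathsf{INF}_R$ is regular for every $\mathsf{RR}_2$ relation $R$. The following definition originates from [11] and plays an important role in the proof.

Definition 5 Given a tree automaton $\mathcal{A} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta)$, the set $Q_\infty \subseteq Q$ consists of all states $q \in Q$ such that $\langle \bot, t \rangle \to^{*} q$ for infinitely many terms $t \in \mathcal{T}(\mathcal{F})$.

Example 7 Consider the binary relation

$$R = \{ (\mathsf{f}(\mathsf{a}, \mathsf{g}^{n}(\mathsf{b})), \mathsf{g}^{m}(\mathsf{f}(\mathsf{a}, \mathsf{b}))) \mid n = 2 \text{ and } m \geqslant 1 \text{ or } n \geqslant 3 \text{ and } m = 1 \}$$

over $\mathcal{T}(\mathcal{F})$ with $\mathcal{F} = \{\mathsf{a}, \mathsf{b}, \mathsf{f}, \mathsf{g}\}$. Its encoding $\langle R \rangle$ is accepted by the automaton $\mathcal{A} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta)$ with $Q = \{0, \dots, 11\}$, $Q_f = \{0\}$, and $\Delta$ consisting of the following transition rules:

$$\begin{array}{llll}
\mathsf{fg}(1, 2) \to 0 & \bot\mathsf{f}(3, 4) \to 5 & \mathsf{g}\bot(6) \to 2 & \mathsf{b}\bot \to 7 \\
\mathsf{fg}(8, 9) \to 0 & \bot\mathsf{g}(5) \to 5 & \mathsf{g}\bot(7) \to 6 & \mathsf{b}\bot \to 11 \\
\mathsf{af}(3, 4) \to 1 & \bot\mathsf{a} \to 3 & \mathsf{g}\bot(10) \to 9 & \mathsf{ag}(5) \to 1 \\
\mathsf{af}(3, 4) \to 8 & \bot\mathsf{b} \to 4 & \mathsf{g}\bot(11) \to 10 & \mathsf{g}\bot(11) \to 11
\end{array}$$

For instance,

$$\langle \mathsf{f}(\mathsf{a}, \mathsf{g}(\mathsf{g}(\mathsf{b}))), \mathsf{g}(\mathsf{f}(\mathsf{a}, \mathsf{b})) \rangle = \mathsf{fg}(\mathsf{af}(\bot\mathsf{a}, \bot\mathsf{b}), \mathsf{g}\bot(\mathsf{g}\bot(\mathsf{b}\bot))) \to^{*} \mathsf{fg}(\mathsf{af}(3, 4), \mathsf{g}\bot(\mathsf{g}\bot(7))) \to^{*} \mathsf{fg}(1, \mathsf{g}\bot(6)) \to \mathsf{fg}(1, 2) \to 0$$

but $\langle \mathsf{f}(\mathsf{a}, \mathsf{g}(\mathsf{b})), \mathsf{f}(\mathsf{a}, \mathsf{b}) \rangle = \mathsf{ff}(\mathsf{aa}, \mathsf{gb}(\mathsf{b}\bot))$ is not accepted. We have $Q_\infty = \{5\}$. State 5 is reached by $\langle \bot, \mathsf{g}^{n}(\mathsf{f}(\mathsf{a}, \mathsf{b})) \rangle$ for all $n \geqslant 0$.

Definition 6 Given $\mathcal{A} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta)$, we define the tree automaton

$$\mathcal{A}_\infty = (\mathcal{F}^{(2)}, Q \cup \bar{Q}, \bar{Q}_f, \Delta \cup \bar{\Delta})$$

Here $\bar{Q}$ is a copy of $Q$ where every state is dashed: $\bar{q} \in \bar{Q}$ if and only if $q \in Q$. For every transition rule $fg(q_1, \dots, q_n) \to q \in \Delta$ we have the following transition rules in $\bar{\Delta}$:

$$fg(q_1, \dots, q_n) \to \bar{q} \quad \text{if } q \in Q_\infty \text{ and } f = \bot \tag{1}$$

$$fg(q_1, \dots, q_{i-1}, \bar{q}_i, q_{i+1}, \dots, q_n) \to \bar{q} \quad \text{for all } 1 \leqslant i \leqslant n \tag{2}$$

Moreover, for every $\varepsilon$-transition $p \to q \in \Delta$ we add

$$\bar{p} \to \bar{q} \tag{3}$$

to $\bar{\Delta}$. We write $\Delta_\infty$ for $\Delta \cup \bar{\Delta}$.

Dashed states are created by rules of shape (1) and propagated by rules of shapes (2) and (3). The above construction differs from the one in [11]; instead of (1) the latter contains $fg(q_1, \dots, q_n) \to \bar{q}$ if $q_i \in Q_\infty$ for some $i > \mathsf{arity}(f)$. In an implementation, rather than adding all dashed states and all transition rules of shape (2), the necessary rules would be computed by propagating the dashes created by (1) in order to avoid the appearance of unreachable dashed states.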
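When $\mathcal{A}_\infty$ is used in isolation, a single bit suffices to record that a dashed state occurred during a computation.

Example 8 For the tree automaton $\mathcal{A}$ from Example 7 we obtain $\mathcal{A}_\infty$ by adding the following transition rules (the missing rules of shape (2) involve unreachable states):

$$\bot\mathsf{f}(3, 4) \to \bar{5} \qquad \bot\mathsf{g}(5) \to \bar{5} \qquad \bot\mathsf{g}(\bar{5}) \to \bar{5} \qquad \mathsf{ag}(\bar{5}) \to \bar{1} \qquad \mathsf{fg}(\bar{1}, 2) \to \bar{0}$$

The unique final state of $\mathcal{A}_\infty$ is $\bar{0}$. We have $\langle \mathsf{f}(\mathsf{a}, \mathsf{g}(\mathsf{g}(\mathsf{b}))), \mathsf{g}(\mathsf{f}(\mathsf{a}, \mathsf{b})) \rangle \in L(\mathcal{A})$ but there is no term $u$ such that $\langle \mathsf{f}(\mathsf{a}, \mathsf{g}(\mathsf{b})), u \rangle \in L(\mathcal{A}_\infty)$.

The following preliminary lemma is used in the proof of the theorem below and provides a characterization of the ground terms that reduce to a dashed state.

Lemma 3 Let $t$ be a term in $\mathcal{T}(\mathcal{F}^{(2)})$. If $t \to^{*}_{\mathcal{A}_\infty} \bar{p}$ then there exist a state $q \in Q_\infty$, a context $C$, and a term $s$ such that $t = C[s]$, $\mathsf{root}(s) = \bot f$ with $f \in \mathcal{F}$, $s \to^{*}_{\mathcal{A}_\infty} \bar{q}$, and $C[\bar{q}] \to^{*}_{\mathcal{A}_\infty} \bar{p}$.

Proof Write $t = gf(t_1, \dots, t_n)$. We distinguish two cases, depending on when the dash is introduced in $t \to^{*}_{\mathcal{A}_\infty} \bar{p}$. In the first case the dash is created by a root step:

$$t \to^{*}_{\mathcal{A}_\infty} gf(q_1, \dots, q_n) \to_{\mathcal{A}_\infty} \bar{q} \to^{*}_{\mathcal{A}_\infty} \bar{p}$$

We have $g = \bot$ and $q \in Q_\infty$ by (1). Hence we can take $s = t$ and $C = \square$. Note that $\mathsf{root}(s) = gf = \bot f$. In the second case the dash is created during the evaluation of an argument $t_i$ of $t$, and hence the given sequence $t \to^{*}_{\mathcal{A}_\infty} \bar{p}$ can be rearranged as

$$t \to^{*}_{\mathcal{A}_\infty} gf(t_1, \dots, \bar{r}, \dots, t_n) \to^{*}_{\mathcal{A}_\infty} \bar{p}$$

The induction hypothesis yields a state $q \in Q_\infty$, a context $C'$, and a term $s$ such that $t_i = C'[s]$, $\mathsf{root}(s) = \bot f$ with $f \in \mathcal{F}$, $s \to^{*}_{\mathcal{A}_\infty} \bar{q}$, and $C'[\bar{q}] \to^{*}_{\mathcal{A}_\infty} \bar{r}$. In this case we simply take $C = t[C']_i = gf(t_1, \dots, C', \dots, t_n)$. We have $t = t[t_i]_i = t[C'[s]]_i = C[s]$ and $C[\bar{q}] = gf(t_1, \dots, C'[\bar{q}], \dots, t_n) \to^{*}_{\mathcal{A}_\infty} gf(t_1, \dots, \bar{r}, \dots, t_n) \to^{*}_{\mathcal{A}_\infty} \bar{p}$.

The following result goes back to a technical report by Dauchet and Tison [11].

Theorem 3 ($T ::= \mathsf{INF}_R$) The set $\mathsf{INF}_R$ is regular for every $\mathsf{RR}_2$ relation $R$.

Proof Let $\mathcal{A} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta)$ be a tree automaton that accepts $\langle R \rangle$. We show that $\mathsf{INF}_R = \Pi_2(L(\mathcal{A}_\infty))$. The regularity of $\mathsf{INF}_R$ then follows from Theorem 2.

First suppose $t \in \mathsf{INF}_R$. So $\langle t, u \rangle \in L(\mathcal{A})$ for infinitely many terms $u \in \mathcal{T}(\mathcal{F})$. Since the signature $\mathcal{F}$ is finite, there are only finitely many ground terms of any given height. Moreover, $\mathsf{height}(\langle t, u \rangle) = \max(\mathsf{height}(t), \mathsf{height}(u))$. Hence there must exist a term $u \in \mathcal{T}(\mathcal{F})$ with $\langle t, u \rangle \in L(\mathcal{A})$ such that $\mathsf{height}(t) + |Q| + 1 < \mathsf{height}(u)$. This is only possible if there are positions $p$ and $q$ such that $p \notin \mathcal{P}\mathsf{os}(t)$, $pq \in \mathcal{P}\mathsf{os}(u)$, and $|Q| < |q|$. From $\mathcal{P}\mathsf{os}(\langle t, u \rangle) = \mathcal{P}\mathsf{os}(t) \cup \mathcal{P}\mathsf{os}(u)$ we obtain $\langle t, u \rangle|_p = \langle \bot, u|_p \rangle$. Since $\langle t, u \rangle \in L(\mathcal{A})$ there exist states $r \in Q$ and $q_f \in Q_f$ such that

$$\langle t, u \rangle = \langle t, u \rangle[\langle \bot, u|_p \rangle]_p \to^{*}_{\mathcal{A}} \langle t, u \rangle[r]_p \to^{*}_{\mathcal{A}} q_f$$

where we assume without loss of generality that the final step in the subsequence $\langle \bot, u|_p \rangle \to^{*}_{\mathcal{A}} r$ uses a non-$\varepsilon$-transition rule. From $|Q| < |q|$ and $pq \in \mathcal{P}\mathsf{os}(u)$ we infer $|Q| < \mathsf{height}(\langle \bot, u|_p \rangle)$. Hence we can use the pumping lemma (Lemma 2) to conclude the existence of infinitely many terms $v \in \mathcal{T}(\mathcal{F})$ such that $\langle \bot, v \rangle \to^{*}_{\mathcal{A}} r$. Hence $r \in Q_\infty$ by Definition 5. Since the final step in $\langle \bot, u|_p \rangle \to^{*}_{\mathcal{A}} r$ uses a non-$\varepsilon$-transition rule, we obtain $\langle \bot, u|_p \rangle \to^{*}_{\mathcal{A}_\infty} \bar{r}$ from the construction of $\mathcal{A}_\infty$ with a final application of a rule of shape (1). We obtain $\langle t, u \rangle[\bar{r}]_p \to^{*}_{\mathcal{A}_\infty} \bar{q}_f$ from $\langle t, u \rangle[r]_p \to^{*}_{\mathcal{A}} q_f$. Hence $\langle t, u \rangle \to^{*}_{\mathcal{A}_\infty} \bar{q}_f$ and since $\bar{q}_f \in \bar{Q}_f$, $\langle t, u \rangle \in L(\mathcal{A}_\infty)$ and thus $t \in \Pi_2(L(\mathcal{A}_\infty))$.

Next suppose $t \in \Pi_2(L(\mathcal{A}_\infty))$. So $\langle t, u \rangle \in L(\mathcal{A}_\infty)$ for some ground term $u$. There exists a final state $\bar{q}_f \in \bar{Q}_f$ with $\langle t, u \rangle \to^{*}_{\mathcal{A}_\infty} \bar{q}_f$. Using Lemma 3, we obtain a context $C$, a term $s$ with $\mathsf{root}(s) = \bot f$ for some $f \in \mathcal{F}$, and a state $q \in Q_\infty$ such that $C[s] = \langle t, u \rangle$, $s \to^{*}_{\mathcal{A}_\infty} \bar{q}$, and $C[\bar{q}] \to^{*}_{\mathcal{A}_\infty} \bar{q}_f$. Let $p$ be the position of the hole in $C$. From $C[s] = \langle t, u \rangle$ and $\mathsf{root}(s) = \bot f$, we infer $p \in \mathcal{P}\mathsf{os}(u) \setminus \mathcal{P}\mathsf{os}(t)$. Since $q \in Q_\infty$ the set $\{ v \in \mathcal{T}(\mathcal{F}) \mid \langle \bot, v \rangle \to^{*}_{\mathcal{A}} q \}$ is infinite. Hence the set $S = \{ u[v]_p \in \mathcal{T}(\mathcal{F}) \mid \langle \bot, v \rangle \to^{*}_{\mathcal{A}} q \}$ is infinite, too. Let $u[w]_p \in S$. So $\langle \bot, w \rangle \to^{*}_{\mathcal{A}} q$. We obtain $C[q] \to^{*}_{\mathcal{A}} q_f$ from $C[\bar{q}] \to^{*}_{\mathcal{A}_\infty} \bar{q}_f$ by erasing all dashes. We have $C[\langle \bot, w \rangle] = \langle t, u[w]_p \rangle$ as $p \in \mathcal{P}\mathsf{os}(u) \setminus \mathcal{P}\mathsf{os}(t)$. It follows that $\langle t, u[w]_p \rangle \in L(\mathcal{A})$ and thus there are infinitely many terms $u'$ such that $\langle t, u' \rangle \in L(\mathcal{A})$. Since $\langle R \rangle = L(\mathcal{A})$ we conclude $t \in \mathsf{INF}_R$ as desired.

[Fig. 2: Inference rules for computing $Q^{e}_\infty$]

Due to the definition of $Q_\infty$, the automaton $\mathcal{A}_\infty$ defined in Definition 6 is not executable. We present an equivalent but executable definition, which we name $Q^{e}_\infty$:

$$Q^{e}_\infty = \{ q \mid p \rhd p \text{ and } p \rhd q \text{ for some state } p \in Q \}$$

Here the relation $\rhd$ is defined using the inference rules in Fig. 2. Intuitively, the first rule initializes the relation. Finding a cycle $p \rhd p$ ensures the existence of infinitely many terms $\langle \bot, s \rangle$ that reduce to $p$. The other two rules are used to collapse cycles (and other non-empty sequences of $\varepsilon$-transitions) into single steps. Before proving that the two definitions are equivalent, we illustrate the definition of $Q^{e}_\infty$ by revisiting Example 7.

Example 9 We obtain $3 \rhd 5$ and $4 \rhd 5$ by applying the first inference rule to the transition rule $\bot\mathsf{f}(3, 4) \to 5$. Similarly, $\bot\mathsf{g}(5) \to 5$ gives rise to $5 \rhd 5$. Since $\mathcal{A}$ has no $\varepsilon$-transitions, no further inferences can be made. It follows that $Q^{e}_\infty = \{5\}$.

We call a term in $\mathcal{T}(\{\bot\} \times \mathcal{F})$ right-only. A term in $\mathcal{T}((\{\bot\} \times \mathcal{F}) \cup \{\square\})$ with exactly one occurrence of the hole $\square$ is a right-only context.

Definition 7 We denote the composition of $\to_{\lnot\varepsilon}$ and $\to^{*}_{\varepsilon}$ by $\Rightarrow$.

The proof of the next lemma is straightforward. Note that the relations $\to^{*}$ and $\Rightarrow^{*}$ do not coincide on mixed terms, involving function symbols and states.

Lemma 4 Let $C$ be a ground context. We have $C[p] \to^{*} q$ if and only if $p \to^{*}_{\varepsilon} p'$ and $C[p'] \Rightarrow^{*} q$ for some state $p'$.

Proof First we show $t \Rightarrow^{*} q$ if $t \to^{*} q$, for all ground terms $t$ and states $q$. We use induction on $t = f(t_1, \dots, t_n)$. The given derivation $t \to^{*} q$ may be written as $t \to^{*} f(q_1, \dots, q_n) \to_{\lnot\varepsilon} q' \to^{*}_{\varepsilon} q$. We obtain $t_i \Rightarrow^{*} q_i$ for $1 \leqslant i \leqslant n$ from the induction hypothesis. Clearly, $f(q_1, \dots, q_n) \Rightarrow q$ and hence $t \Rightarrow^{*} q$ as desired.

Next we prove the statement of the lemma.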
The if direction is trivial. For the only-if direction we use induction on the ground context $C$. Let $C[p] \to^{*} q$. If $C = \square$ then we take $p' = q$. Suppose $C = f(t_1, \dots, C', \dots, t_n)$. We may write the derivation $C[p] \to^{*} q$ as $C[p] \to^{*} f(q_1, \dots, q_n) \to_{\lnot\varepsilon} q' \to^{*}_{\varepsilon} q$. The induction hypothesis yields a state $p'$ such that $p \to^{*}_{\varepsilon} p'$ and $C'[p'] \Rightarrow^{*} q_i$, and we obtain $t_j \Rightarrow^{*} q_j$ for $j \neq i$ from the first part of the proof. We have $f(q_1, \dots, q_n) \Rightarrow q$ and hence $C[p'] = f(t_1, \dots, C'[p'], \dots, t_n) \Rightarrow^{*} q$.

Lemma 5 $Q_\infty \subseteq Q^{e}_\infty$

Proof We start by proving the following claim:

$$\text{if } C[p] \Rightarrow^{*} q \text{ and } C \text{ is a non-empty right-only context then } p \rhd q \tag{4}$$

We use induction on the structure of $C$. If $C = \square$ there is nothing to show. Suppose $C = \bot f(t_1, \dots, C', \dots, t_n)$ where $C'$ is the $i$-th subterm of $C$. The sequence $C[p] \Rightarrow^{*} q$ can be rearranged as $C[p] = \bot f(t_1, \dots, C'[p], \dots, t_n) \Rightarrow^{*} \bot f(q_1, \dots, q_n) \to_{\lnot\varepsilon} q' \to^{*}_{\varepsilon} q$. We obtain $q_i \rhd q'$ and subsequently $q_i \rhd q$ by using the inference rules in Fig. 2. If $C' = \square$ then $p = q_i$ and if $C' \neq \square$ then the induction hypothesis yields $p \rhd q_i$ and thus $p \rhd q$ by transitivity. This concludes the proof of (4).

Assume $q \in Q_\infty$, so there exist infinitely many terms $t$ such that $\langle \bot, t \rangle \to^{*} q$. Since the signature is finite, there exist terms of arbitrary height. Thus there exists an arbitrary but fixed term $t$ such that the height of $t$ is greater than the number of states of $Q$. Write $t = f(t_1, \dots, t_n)$. Since the height of $t$ is greater than the number of states in $Q$, there exist a subterm $s$ of $t$, a state $p$, and contexts $C_1$ and $C_2 \neq \square$ such that

1. $\langle \bot, t \rangle = C_1[C_2[\langle \bot, s \rangle]]$,
2. $\langle \bot, s \rangle \to^{*} p$,
3. $C_2[p] \to^{*} p$, and
4. $C_1[p] \to^{*} q$.

From Lemma 4 we obtain a state $q'$ such that $p \to^{*}_{\varepsilon} q'$ and $C_2[q'] \Rightarrow^{*} p$. Hence $q' \rhd p$ by (4). We obtain $q' \rhd q'$ from $q' \rhd p$ in connection with the inference rule for $\varepsilon$-transitions. We perform a case analysis of the context $C_1$.

• If $C_1 = \square$ then $p \to^{*}_{\varepsilon} q$ and thus $q' \rhd q$ follows from $q' \rhd p$ in connection with the inference rule for $\varepsilon$-transitions. Hence $q \in Q^{e}_\infty$.
• If $C_1 \neq \square$ then Lemma 4 yields a state $q''$ such that $p \to^{*}_{\varepsilon} q''$ and $C_1[q''] \Rightarrow^{*} q$. Hence $q'' \rhd q$ by (4). We also have $C_2[q'] \Rightarrow^{*} q''$ and thus $q' \rhd q''$ by (4). We obtain $q' \rhd q$ from the transitivity rule. Hence also in this case we obtain $q \in Q^{e}_\infty$.

For the following lemma, we need the fact that $\mathcal{A}$ can be assumed to be trim, so every state is productive and reachable. We may do so because Theorem 3 talks about regular relations, and any automaton that accepts the same language as $\mathcal{A}$ will witness the fact that the given relation $R$ is regular.

Lemma 6 $Q^{e}_\infty \subseteq Q_\infty$, provided that $\mathcal{A}$ is trim.

Proof In connection with the fact that $\mathcal{A}$ accepts $\langle R \rangle$ with $R \subseteq \mathcal{T}(\mathcal{F}) \times \mathcal{T}(\mathcal{F})$, trimness of $\mathcal{A}$ entails that any run $t \to^{*} q$ is embedded into an accepting run $C[t] \to^{*} C[q] \to^{*} q_f \in Q_f$. So $C[t] = \langle u, v \rangle$ for some $(u, v) \in R$, and hence $t$ must be a well-formed term. Moreover, if $\mathsf{root}(t) = \bot f$ for some $f \in \mathcal{F}$ then $t = \langle \bot, u \rangle$ for some term $u \in \mathcal{T}(\mathcal{F})$. We now show the converse of claim (4) in the proof of Lemma 5 for the relation $\to^{*}$:

$$\text{if } p \rhd q \text{ then } C[p] \to^{*} q \text{ for some ground right-only context } C \neq \square \tag{5}$$

We prove the claim by induction on the derivation of $p \rhd q$. First suppose $p \rhd q$ is derived from the transition rule $\bot f(p_1, \dots, p_i, \dots, p_n) \to q$ in $\Delta$ with $p_i = p$. Because all states are reachable by well-formed terms, there exist terms $t_1, \dots, t_n \in \mathcal{T}(\mathcal{F})$ such that $\langle \bot, t_i \rangle \to^{*} p_i$ for all $1 \leqslant i \leqslant n$. Let $C = \bot f(\langle \bot, t_1 \rangle, \dots, \square, \dots, \langle \bot, t_n \rangle)$ where the hole is the $i$-th argument. We have $C[p] \to^{*} \bot f(p_1, \dots, p_i, \dots, p_n) \to q$. Next suppose $p \rhd q$ is derived from $p \rhd q'$ and $q' \to_{\varepsilon} q$. The induction hypothesis yields a ground right-only context $C \neq \square$ such that $C[p] \to^{*} q'$. Hence also $C[p] \to^{*} q$. Finally, suppose $p \rhd q$ is derived from $p \rhd r$ and $r \rhd q$. The induction hypothesis yields non-empty ground right-only contexts $C_1$ and $C_2$ such that $C_1[p] \to^{*} r$ and $C_2[r] \to^{*} q$. Hence $C[p] \to^{*} q$ for the context $C = C_2[C_1]$. This concludes the proof of (5).

Now let $q \in Q^{e}_\infty$.
So there exists a state $p$ such that $p \rhd p$ and $p \rhd q$. Using (5), we obtain non-empty ground right-only contexts $C_1$ and $C_2$ such that $C_1[p] \to^{*} p$ and $C_2[p] \to^{*} q$. Since all states are reachable, there exists a ground term $t \in \mathcal{T}(\mathcal{F}^{(2)})$ such that $t \to^{*} p$. Hence $C_2[t] \to^{*} q$ and, by the observation made at the beginning of the proof, $C_2[t]$ is a well-formed term. Since $C_2$ is right-only, it follows that $t = \langle \bot, u \rangle$ for some term $u \in \mathcal{T}(\mathcal{F})$. Now consider the infinitely many terms $t_n = C_2[C_1^{n}[t]]$ for $n \geqslant 0$. We have $t_n \to^{*} q$ and $t_n$ is right-only by construction. Hence $q \in Q_\infty$.

Corollary 1 If $\mathcal{A}$ is trim then $Q_\infty = Q^{e}_\infty$.

5.2 Anchored GTT Relations

Next we turn our attention to formalized constructions on (anchored) GTTs. Many of the results and automata constructions in this subsection are known. In the formalization we also employ an equivalent but more flexible definition of anchored GTT.

Definition 8 A pair automaton is a triple $\mathcal{P} = (Q, \mathcal{A}, \mathcal{B})$ where $\mathcal{A}$, $\mathcal{B}$ are tree automata and $Q \subseteq Q_{\mathcal{A}} \times Q_{\mathcal{B}}$. We define $L(\mathcal{P}) = \{ (s, t) \mid s \to^{*}_{\mathcal{A}} p \text{ and } t \to^{*}_{\mathcal{B}} q \text{ with } (p, q) \in Q \}$.

Lemma 7 Anchored GTTs and pair automata are equivalent.

Proof If $\mathcal{G} = (\mathcal{A}, \mathcal{B})$ is a GTT then $L_a(\mathcal{G}) = L(\mathcal{P})$ for the pair automaton $\mathcal{P} = (Q, \mathcal{A}, \mathcal{B})$ with $Q = \{ (p, p) \mid p \in Q_{\mathcal{A}} \cap Q_{\mathcal{B}} \}$. Conversely, given a pair automaton $\mathcal{P} = (Q, \mathcal{A}, \mathcal{B})$, we first rename the states of $\mathcal{B}$ to obtain an equivalent tree automaton $\mathcal{B}'$ such that $\mathcal{A}$ and $\mathcal{B}'$ do not share states. We add an $\varepsilon$-transition $p \to q'$ to $\mathcal{A}$ for every $(p, q) \in Q$, resulting in the tree automaton $\mathcal{A}'$. Here $q'$ is the (renamed) state in $\mathcal{B}'$ that corresponds to state $q$ in $\mathcal{B}$. The GTT $\mathcal{G} = (\mathcal{A}', \mathcal{B}')$ satisfies $L_a(\mathcal{G}) = L(\mathcal{P})$.

The above lemma will be used in the sequel without mention.

Lemma 8 ($A ::= T \times T$) If $T$ and $U$ are regular sets of ground terms then $T \times U$ is an anchored GTT relation.

Proof Let $\mathcal{A} = (\mathcal{F}, Q_{\mathcal{A}}, Q_{f\mathcal{A}}, \Delta_{\mathcal{A}})$ and $\mathcal{B} = (\mathcal{F}, Q_{\mathcal{B}}, Q_{f\mathcal{B}}, \Delta_{\mathcal{B}})$ be tree automata that accept $T$ and $U$. The set $T \times U$ is accepted by the pair automaton $\mathcal{P} = (Q, \mathcal{A}, \mathcal{B})$ with $Q = Q_{f\mathcal{A}} \times Q_{f\mathcal{B}}$.

There are several ways to associate a GTT $\mathcal{G} = (\mathcal{A}, \mathcal{B})$ with a linear variable-separated TRS $\mathcal{R}$. The one in [9] uses for each rewrite rule $\ell \to r$ of $\mathcal{R}$ a unique interface state $i$, common to $\mathcal{A}$ and $\mathcal{B}$, and transition rules and states specific to $\mathcal{A}$ ($\mathcal{B}$) that accept all ground instances of $\ell$ ($r$) in state $i$. No states are shared between different rewrite rules. The resulting GTT accepts $\xrightarrow{\parallel}$, and $\to_\varepsilon$ when viewed as an anchored GTT. The second way to associate a GTT with a linear variable-separated TRS $\mathcal{R}$ originates from Dauchet et al. [12]. The resulting GTT accepts a relation in between $\xrightarrow{\parallel}$ and $\to^{*}$. The construction that we formalized can be seen as a pair automaton version of the construction in [9].

Theorem 4 ($A ::= {\to_\varepsilon}$) The relation $\to_\varepsilon$ is an anchored GTT relation for every linear variable-separated TRS $\mathcal{R}$.

Proof Let $\mathcal{R}$ be a linear variable-separated TRS over a signature $\mathcal{F}$. We denote the set of left-hand (right-hand) sides of the rules in $\mathcal{R}$ by $\mathsf{lhs}(\mathcal{R})$ ($\mathsf{rhs}(\mathcal{R})$). Given a set of terms $T$, we write $s \unlhd T$ if $s$ is a subterm of some term in $T$. Given a term $s$ we write $\hat{s}$ for the ground term obtained from $s$ by replacing each variable with a designated (fresh) constant $\ast$. The states are $[\hat{t}\,]$ for each $t \unlhd \mathsf{lhs}(\mathcal{R}) \cup \mathsf{rhs}(\mathcal{R})$. The set $\Delta_{\mathsf{lhs}}$ consists of the transitions

$$f([\hat{t}_1], \dots, [\hat{t}_n]) \to [\widehat{f(t_1, \dots, t_n)}]$$

for every $f(t_1, \dots, t_n) \unlhd \mathsf{lhs}(\mathcal{R})$ and, if some term in $\mathsf{lhs}(\mathcal{R})$ contains a variable,

$$f([\ast], \dots, [\ast]) \to [\ast]$$

for every $f \in \mathcal{F}$. The set $\Delta_{\mathsf{rhs}}$ is defined similarly, using $\mathsf{rhs}(\mathcal{R})$ instead of $\mathsf{lhs}(\mathcal{R})$ for generating the rules. We now define $\mathcal{P}_{\mathcal{R}} = (Q, \Delta_{\mathsf{lhs}}, \Delta_{\mathsf{rhs}})$ with $Q = \{ ([\hat{\ell}\,], [\hat{r}]) \mid \ell \to r \in \mathcal{R} \}$. It is easy to prove that $L(\mathcal{P}_{\mathcal{R}}) = {\to_\varepsilon}$.

The other binary relations associated with a TRS $\mathcal{R}$ (like $\xrightarrow{\parallel}$ and $\leftrightarrow^{*}$) will be obtained from the root-step relation $\to_\varepsilon$ by automata constructions that operate on anchored GTT relations and $\mathsf{RR}$ relations.
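Example 10 The pair automaton $\mathcal{P}_{\mathcal{R}} = (Q, \mathcal{A}, \mathcal{B})$ constructed in the above proof consists of the transition rules

$$\begin{array}{ll}
\mathcal{A}\colon & \mathsf{a} \to [\ast] \quad \mathsf{b} \to [\ast] \quad \mathsf{f}([\ast]) \to [\ast] \quad \mathsf{g}([\ast], [\ast]) \to [\ast] \quad \mathsf{a} \to [\mathsf{a}] \quad \mathsf{f}([\mathsf{a}]) \to [\mathsf{f}(\mathsf{a})] \quad \mathsf{g}([\mathsf{a}], [\ast]) \to [\mathsf{g}(\mathsf{a}, \ast)] \\
\mathcal{B}\colon & \mathsf{a} \to [\mathsf{a}] \quad \mathsf{b} \to [\mathsf{b}] \quad \mathsf{f}([\mathsf{a}]) \to [\mathsf{f}(\mathsf{a})] \\
Q\colon & ([\mathsf{a}], [\mathsf{b}]) \quad ([\mathsf{f}(\mathsf{a})], [\mathsf{b}]) \quad ([\mathsf{g}(\mathsf{a}, \ast)], [\mathsf{f}(\mathsf{a})])
\end{array}$$

and accepts the root-step relation $\to_\varepsilon$ of our leading TRS $\mathcal{R}$. The state pairs in $Q$ are presented as $\varepsilon$-transitions and perform the transfer from left-hand sides to right-hand sides of $\mathcal{R}$. For instance, $\mathsf{g}(\mathsf{a}, \mathsf{f}(\mathsf{f}(\mathsf{b}))) \to_\varepsilon \mathsf{f}(\mathsf{a})$ is witnessed by $\mathsf{g}(\mathsf{a}, \mathsf{f}(\mathsf{f}(\mathsf{b}))) \to^{*}_{\mathcal{A}} [\mathsf{g}(\mathsf{a}, \ast)] \to [\mathsf{f}(\mathsf{a})] \mathrel{{}^{*}_{\mathcal{B}}{\leftarrow}} \mathsf{f}(\mathsf{a})$. To shorten the notation in subsequent examples, we number the states as follows:

$$0 = [\ast] \qquad 1 = [\mathsf{a}] \qquad 2 = [\mathsf{f}(\mathsf{a})] \qquad 3 = [\mathsf{g}(\mathsf{a}, \ast)] \qquad 4 = [\mathsf{b}]$$

Hence the transition rules are presented as follows:

$$\begin{array}{ll}
\mathcal{A}\colon & \mathsf{a} \to 0 \quad \mathsf{b} \to 0 \quad \mathsf{f}(0) \to 0 \quad \mathsf{g}(0, 0) \to 0 \quad \mathsf{a} \to 1 \quad \mathsf{f}(1) \to 2 \quad \mathsf{g}(1, 0) \to 3 \\
\mathcal{B}\colon & \mathsf{a} \to 1 \quad \mathsf{b} \to 4 \quad \mathsf{f}(1) \to 2 \\
Q\colon & (1, 4) \quad (2, 4) \quad (3, 2)
\end{array}$$

To turn $\mathcal{P}_{\mathcal{R}}$ into an equivalent anchored GTT $\mathcal{G}_{\mathcal{R}} = (\mathcal{A}', \mathcal{B}')$ we rename states 1 and 2 in $\mathcal{B}$ into 5 and 6 and add the pairs in $Q$ as $\varepsilon$-transitions to $\mathcal{A}$, after applying the renaming to their targets:

$$\begin{array}{ll}
\mathcal{A}'\colon & \mathsf{a} \to 0 \quad \mathsf{b} \to 0 \quad \mathsf{f}(0) \to 0 \quad \mathsf{g}(0, 0) \to 0 \quad \mathsf{a} \to 1 \quad \mathsf{f}(1) \to 2 \quad \mathsf{g}(1, 0) \to 3 \quad 1 \to 4 \quad 2 \to 4 \quad 3 \to 6 \\
\mathcal{B}'\colon & \mathsf{a} \to 5 \quad \mathsf{b} \to 4 \quad \mathsf{f}(5) \to 6
\end{array}$$

Next we turn to composition and transitive closure.

[Fig. 3: $\Delta_\varepsilon(\mathcal{A}, \mathcal{B})$]

Definition 9 Given tree automata $\mathcal{A}$ and $\mathcal{B}$, $\Delta_\varepsilon(\mathcal{A}, \mathcal{B})$ is the set of $\varepsilon$-transitions $\rightsquigarrow$ defined by the inference rules in Fig. 3.

The inference rule [c] appeared first in [17]. Since there are only finitely many $\varepsilon$-transitions between states, $\Delta_\varepsilon(\mathcal{A}, \mathcal{B})$ can be effectively computed. The next result provides a useful equivalent characterization (which is presented as a definition in [8, 12]).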
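A fixpoint computation of $\Delta_\varepsilon(\mathcal{A}, \mathcal{B})$ can be sketched as follows. Since Fig. 3 is not reproduced here, the three rules are assumed in the form in which the surrounding proofs use them: [c] combines matching transition rules of $\mathcal{A}$ and $\mathcal{B}$ whose argument states are already related, while [a] and [b] propagate along $\varepsilon$-transitions of $\mathcal{A}$ and $\mathcal{B}$. The function name `delta_eps` and the rule encoding are hypothetical and are not FORT's implementation.

```python
def delta_eps(rules_a, eps_a, rules_b, eps_b):
    """Least set of state pairs (p, q) closed under the assumed rules [a], [b], [c].

    rules_*: set of (symbol, tuple_of_argument_states, target_state)
    eps_*:   set of (source_state, target_state) epsilon-transitions
    """
    pairs = set()
    while True:
        new = set()
        # [c]: f(p1..pn) -> p in A, f(q1..qn) -> q in B, all (pi, qi) related.
        for f, ps, p in rules_a:
            for g, qs, q in rules_b:
                if f == g and len(ps) == len(qs) and all(
                        (pi, qi) in pairs for pi, qi in zip(ps, qs)):
                    new.add((p, q))
        for p, q in pairs:
            # [a]: follow an epsilon-transition of A on the left component.
            new.update((tgt, q) for src, tgt in eps_a if src == p)
            # [b]: follow an epsilon-transition of B on the right component.
            new.update((p, tgt) for src, tgt in eps_b if src == q)
        if new <= pairs:
            return pairs
        pairs |= new

# The renamed automata A' and B' of Example 10.
rules_a = {("a", (), 0), ("b", (), 0), ("f", (0,), 0), ("g", (0, 0), 0),
           ("a", (), 1), ("f", (1,), 2), ("g", (1, 0), 3)}
eps_a = {(1, 4), (2, 4), (3, 6)}
rules_b = {("a", (), 5), ("b", (), 4), ("f", (5,), 6)}
eps_b = set()

result = delta_eps(rules_a, eps_a, rules_b, eps_b)
print(sorted(result))
# [(0, 4), (0, 5), (0, 6), (1, 5), (2, 6), (4, 5), (4, 6)]
```

The loop terminates because the set of candidate pairs is finite and only grows; an implementation concerned with efficiency would use a worklist instead of recomputing all rule combinations in every round.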
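Example 11 For the (anchored) GTT $\mathcal{G}_{\mathcal{R}}$ of Example 10, which will be referred to as $\mathcal{G} = (\mathcal{A}, \mathcal{B})$ in the following, the set $\Delta_\varepsilon(\mathcal{A}, \mathcal{B})$ consists of the following seven $\varepsilon$-transitions:

$$\begin{array}{lll@{\qquad}lll}
0 \rightsquigarrow 5 & [c] & (0 \mathrel{{}_{\mathcal{A}}{\leftarrow}} \mathsf{a} \to_{\mathcal{B}} 5) & 0 \rightsquigarrow 6 & [c] & (0 \mathrel{{}_{\mathcal{A}}{\leftarrow}} \mathsf{f}(0) \rightsquigarrow \mathsf{f}(5) \to_{\mathcal{B}} 6) \\
1 \rightsquigarrow 5 & [c] & (1 \mathrel{{}_{\mathcal{A}}{\leftarrow}} \mathsf{a} \to_{\mathcal{B}} 5) & 2 \rightsquigarrow 6 & [c] & (2 \mathrel{{}_{\mathcal{A}}{\leftarrow}} \mathsf{f}(1) \rightsquigarrow \mathsf{f}(5) \to_{\mathcal{B}} 6) \\
0 \rightsquigarrow 4 & [c] & (0 \mathrel{{}_{\mathcal{A}}{\leftarrow}} \mathsf{b} \to_{\mathcal{B}} 4) & 4 \rightsquigarrow 5 & [a] & (4 \mathrel{{}_{\mathcal{A}}{\leftarrow}} 1 \rightsquigarrow 5) \\
 & & & 4 \rightsquigarrow 6 & [a] & (4 \mathrel{{}_{\mathcal{A}}{\leftarrow}} 2 \rightsquigarrow 6)
\end{array}$$

Since $\mathcal{B}$ does not contain $\varepsilon$-transitions, the inference rule [b] is not used here.

Lemma 9 If $\mathcal{A}$ and $\mathcal{B}$ are tree automata over a signature $\mathcal{F}$ then

$$\Delta_\varepsilon(\mathcal{A}, \mathcal{B}) = \{ p \rightsquigarrow q \mid p \mathrel{{}^{*}_{\mathcal{A}}{\leftarrow}} t \to^{*}_{\mathcal{B}} q \text{ for some ground term } t \in \mathcal{T}(\mathcal{F}) \}$$

Proof First suppose there exists a ground term $t \in \mathcal{T}(\mathcal{F})$ with $p \mathrel{{}^{*}_{\mathcal{A}}{\leftarrow}} t \to^{*}_{\mathcal{B}} q$ for states $p$ of $\mathcal{A}$ and $q$ of $\mathcal{B}$. We show $p \rightsquigarrow q$ by induction on $t = f(t_1, \dots, t_n)$. The sequence $t \to^{*}_{\mathcal{A}} p$ can be written as $t \to^{*}_{\mathcal{A}} f(p_1, \dots, p_n) \to_{\mathcal{A}} p' \to^{*}_{\mathcal{A}} p$ with states $p_1, \dots, p_n, p'$ of $\mathcal{A}$. Similarly, $t \to^{*}_{\mathcal{B}} f(q_1, \dots, q_n) \to_{\mathcal{B}} q' \to^{*}_{\mathcal{B}} q$ with states $q_1, \dots, q_n, q'$ of $\mathcal{B}$. We have $p_i \mathrel{{}^{*}_{\mathcal{A}}{\leftarrow}} t_i \to^{*}_{\mathcal{B}} q_i$ and thus $p_i \rightsquigarrow q_i$ by the induction hypothesis, for $1 \leqslant i \leqslant n$. Hence we obtain $p' \rightsquigarrow q'$ by [c]. Repeated applications of the inference rules [a] and [b] in connection with $p' \to^{*}_{\mathcal{A}} p$ and $q' \to^{*}_{\mathcal{B}} q$ yield $p \rightsquigarrow q$. Hence $p \rightsquigarrow q \in \Delta_\varepsilon(\mathcal{A}, \mathcal{B})$ as desired.

Next suppose $p \rightsquigarrow q \in \Delta_\varepsilon(\mathcal{A}, \mathcal{B})$. We show the existence of a ground term $t \in \mathcal{T}(\mathcal{F})$ such that $p \mathrel{{}^{*}_{\mathcal{A}}{\leftarrow}} t \to^{*}_{\mathcal{B}} q$ by induction on the derivation of $p \rightsquigarrow q$. In the base case [c] is used with $p \mathrel{{}_{\mathcal{A}}{\leftarrow}} a$ and $a \to_{\mathcal{B}} q$ for some constant $a$ and hence we can take $t = a$. For the induction step we consider three cases, depending on which inference rule is used to derive $p \rightsquigarrow q$. First suppose [c] is used. So there exist transition rules $f(p_1, \dots, p_n) \to p$ in $\mathcal{A}$ and $f(q_1, \dots, q_n) \to q$ in $\mathcal{B}$ such that $p_i \rightsquigarrow q_i$ for $1 \leqslant i \leqslant n$. The induction hypothesis yields ground terms $t_1, \dots, t_n$ such that $p_i \mathrel{{}^{*}_{\mathcal{A}}{\leftarrow}} t_i \to^{*}_{\mathcal{B}} q_i$ for $1 \leqslant i \leqslant n$. Hence $p \mathrel{{}^{*}_{\mathcal{A}}{\leftarrow}} t \to^{*}_{\mathcal{B}} q$ for $t = f(t_1, \dots, t_n)$. Next suppose [a] is applied to derive $p \rightsquigarrow q$. So there exists a state $p'$ such that $p \mathrel{{}_{\mathcal{A}}{\leftarrow}} p' \rightsquigarrow q$.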
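The induction hypothesis yields a ground term $t \in \mathcal{T}(\mathcal{F})$ such that $p' \mathrel{{}^{*}_{\mathcal{A}}{\leftarrow}} t \to^{*}_{\mathcal{B}} q$ and hence also $p \mathrel{{}^{*}_{\mathcal{A}}{\leftarrow}} t \to^{*}_{\mathcal{B}} q$. The reasoning for [b] is the same.

Theorem 5 ($A ::= A \circ A$) Anchored GTT relations are effectively closed under composition.

Proof Let $\mathcal{P}_1 = (Q_1, \mathcal{A}_1, \mathcal{B}_1)$ and $\mathcal{P}_2 = (Q_2, \mathcal{A}_2, \mathcal{B}_2)$ be pair automata (operating on terms over the same signature). We construct the pair automaton $\mathcal{P} = (Q, \mathcal{A}_1, \mathcal{B}_2)$ with

$$Q = Q_1 \circ \Delta_\varepsilon(\mathcal{B}_1, \mathcal{A}_2) \circ Q_2$$

We claim that $L(\mathcal{P}) = L(\mathcal{P}_1) \circ L(\mathcal{P}_2)$. First let $(s, t) \in L(\mathcal{P})$. We have $s \to^{*}_{\mathcal{A}_1} p$ and $t \to^{*}_{\mathcal{B}_2} q$ for some $(p, q) \in Q$. The definition of $Q$ yields states $p'$ and $q'$ such that $(p, p') \in Q_1$, $(p', q') \in \Delta_\varepsilon(\mathcal{B}_1, \mathcal{A}_2)$, and $(q', q) \in Q_2$. According to Lemma 9 there exists a ground term $u$ such that $u \to^{*}_{\mathcal{B}_1} p'$ and $u \to^{*}_{\mathcal{A}_2} q'$. Hence $(s, u) \in L(\mathcal{P}_1)$ and $(u, t) \in L(\mathcal{P}_2)$ and thus $(s, t) \in L(\mathcal{P}_1) \circ L(\mathcal{P}_2)$.

For the converse, let $(s, t) \in L(\mathcal{P}_1) \circ L(\mathcal{P}_2)$. So there exists a ground term $u$ such that $(s, u) \in L(\mathcal{P}_1)$ and $(u, t) \in L(\mathcal{P}_2)$. Hence there are pairs $(p_1, q_1) \in Q_1$ and $(p_2, q_2) \in Q_2$ such that $s \to^{*}_{\mathcal{A}_1} p_1$, $u \to^{*}_{\mathcal{B}_1} q_1$, $u \to^{*}_{\mathcal{A}_2} p_2$, and $t \to^{*}_{\mathcal{B}_2} q_2$. Lemma 9 yields $(q_1, p_2) \in \Delta_\varepsilon(\mathcal{B}_1, \mathcal{A}_2)$. Hence $(p_1, q_2) \in Q$ and thus $(s, t) \in L(\mathcal{P})$.

Example 12 We compose the pair automaton $\mathcal{P}_{\mathcal{R}} = (Q, \mathcal{A}, \mathcal{B})$ of Example 10 with itself. We have $\Delta_\varepsilon(\mathcal{B}, \mathcal{A}) = \Delta_\varepsilon(\mathcal{A}, \mathcal{B})^{-} = \{ (1, 0), (1, 1), (4, 0), (2, 2), (2, 0) \}$. Hence we obtain the pair automaton $\mathcal{P}' = (Q', \mathcal{A}, \mathcal{B})$ with $Q' = Q \circ \Delta_\varepsilon(\mathcal{B}, \mathcal{A}) \circ Q = \{ (3, 4) \}$. We have $L(\mathcal{A}, 3) = \{ \mathsf{g}(\mathsf{a}, t) \mid t \in \mathcal{T}(\mathcal{F}) \}$ and $L(\mathcal{B}, 4) = \{ \mathsf{b} \}$. Hence, we obtain $L(\mathcal{P}') = L(\mathcal{A}, 3) \times L(\mathcal{B}, 4) = {\to_\varepsilon \cdot \to_\varepsilon}$ as expected.

[Fig. 4: $\Delta_+(\mathcal{P})$ for $\mathcal{P} = (Q, \mathcal{A}, \mathcal{B})$]

Theorem 6 ($A ::= A^{+}$) Anchored GTT relations are effectively closed under transitive closure.

Proof Let $\mathcal{P} = (Q, \mathcal{A}, \mathcal{B})$ be a pair automaton. We construct the pair automaton $\mathcal{P}_+ = (\Delta_+(\mathcal{P}), \mathcal{A}, \mathcal{B})$ where $\Delta_+(\mathcal{P})$ is the binary relation on states defined by the inference rules in Fig. 4. We claim that $L(\mathcal{P}_+) = L(\mathcal{P})^{+}$.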
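From the first inference rule we immediately obtain $L(P) \subseteq L(P_+)$. The second inference rule, together with the definition of $Q$ in the proof of Theorem 5, yields $L(P_+) \circ L(P_+) \subseteq L(P_+)$. Hence $L(P)^+ \subseteq L(P_+)$.

For the converse, let $(s, t) \in L(P_+)$. So there exists a pair $(p, q) \in \Delta_+(P)$ such that $s \to^*_{\mathcal{A}} p$ and $t \to^*_{\mathcal{B}} q$. We prove $(s, t) \in L(P)^+$ by induction on the derivation of $(p, q) \in \Delta_+(P)$. If $(p, q) \in Q$ then $(s, t) \in L(P)$. Suppose $(p, p') \in \Delta_+(P)$, $(p', q') \in \Delta_\varepsilon(\mathcal{B}, \mathcal{A})$, and $(q', q) \in \Delta_+(P)$. According to Lemma 9 there exists a ground term $u$ such that $u \to^*_{\mathcal{B}} p'$ and $u \to^*_{\mathcal{A}} q'$. The induction hypothesis yields $(s, u) \in L(P)^+$ and $(u, t) \in L(P)^+$. Hence also $(s, t) \in L(P)^+$.

Example 13 Consider the pair automaton $P = (Q, \mathcal{A}, \mathcal{B})$ of Example 10. As observed in Example 12, $\Delta_\varepsilon(\mathcal{B}, \mathcal{A}) = \{(1, 0), (1, 1), (4, 0), (2, 2), (2, 0)\}$. Hence we obtain the pair automaton $P_+ = (\Delta_+(P), \mathcal{A}, \mathcal{B})$ with $\Delta_+(P) = \{(1, 4), (2, 4), (3, 2), (3, 4)\}$. The pair $(3, 4)$ is obtained from the second inference rule with $p = 3$, $q = q' = 2$ and $r = 4$. We have $\mathsf{g}(\mathsf{a}, \mathsf{b}) \to^{\varepsilon}_{\mathcal{R}} \mathsf{f}(\mathsf{a}) \to^{\varepsilon}_{\mathcal{R}} \mathsf{b}$ and the pair $(\mathsf{g}(\mathsf{a}, \mathsf{b}), \mathsf{b})$ is accepted by $P_+$ as $\mathsf{g}(\mathsf{a}, \mathsf{b}) \to^*_{\mathcal{A}} 3$ and $\mathsf{b} \to_{\mathcal{B}} 4$ with $(3, 4) \in \Delta_+(P)$. Furthermore, $\mathsf{g}(\mathsf{a}, \mathsf{b}) \to^{\varepsilon}_{\mathcal{R}} \mathsf{f}(\mathsf{a}) \to_{\mathcal{R}} \mathsf{f}(\mathsf{b})$ but $\mathsf{g}(\mathsf{a}, \mathsf{b}) \mathrel{(\to^{\varepsilon}_{\mathcal{R}})^+} \mathsf{f}(\mathsf{b})$ does not hold, and one readily checks that the pair $(\mathsf{g}(\mathsf{a}, \mathsf{b}), \mathsf{f}(\mathsf{b}))$ is not accepted by $P_+$.

Two further closure operations on anchored GTT relations are inverse and union. Recall that GTT relations are not closed under union.

Lemma 10 ($A ::= A^- \mid A \cup A$) Anchored GTT relations are effectively closed under inverse and union.

Proof Given a pair automaton $P = (Q, \mathcal{A}, \mathcal{B})$, we have $L(P)^- = L(P^-)$ for the pair automaton $P^- = (Q^-, \mathcal{B}, \mathcal{A})$. Here $Q^- = \{(q, p) \mid (p, q) \in Q\}$. Given pair automata $P_1 = (Q_1, \mathcal{A}_1, \mathcal{B}_1)$ and $P_2 = (Q_2, \mathcal{A}_2, \mathcal{B}_2)$ without common states, $L(P_1) \cup L(P_2) = L(P)$ for the pair automaton $P = (Q_1 \cup Q_2, \mathcal{A}_1 \cup \mathcal{A}_2, \mathcal{B}_1 \cup \mathcal{B}_2)$.

Next we present a modified composition operation $\mathbin{\tilde{\circ}}$ that preserves anchored GTT relations.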
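The state-level constructions used so far (the saturation behind Lemma 9, the composition of Theorem 5, and the transitive closure of Theorem 6) can be replayed on the running example. The following Python sketch is only an illustration, not the formalized or FORT implementation. The rule lists are an assumption: they are read off Examples 11, 14 and 17, since Example 10 itself is not reproduced in this part of the text, and all function names are hypothetical.

```python
# Sketch only: rules are triples (symbol, argument states, target state),
# epsilon-transitions are pairs (source state, target state).

def delta_eps(rules_a, eps_a, rules_b, eps_b):
    """Least relation closed under [c], [a], [b]: pairs (p, q) of states of
    A and B that are reached by a common ground term (cf. Lemma 9)."""
    rel = set()
    while True:
        new = set()
        # [c]: same root symbol and pairwise related argument states
        for f, ps, p in rules_a:
            for g, qs, q in rules_b:
                if f == g and len(ps) == len(qs) and all(pq in rel for pq in zip(ps, qs)):
                    new.add((p, q))
        # [a]: follow an epsilon-transition p -> p2 of A in the first component
        new |= {(p2, q) for (p, p2) in eps_a for (x, q) in rel if x == p}
        # [b]: follow an epsilon-transition q -> q2 of B in the second component
        new |= {(p, q2) for (q, q2) in eps_b for (p, y) in rel if y == q}
        if new <= rel:
            return rel
        rel |= new

def compose(r1, r2):
    """Relational composition of two sets of pairs."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def plus_closure(q, d):
    """Delta_+(P): least relation containing Q and closed under rel . d . rel."""
    rel = set(q)
    while True:
        new = compose(compose(rel, d), rel)
        if new <= rel:
            return rel
        rel |= new

# Pair automaton P (assumed from Examples 14 and 16).
A_RULES = [("a", (), 0), ("b", (), 0), ("f", (0,), 0), ("g", (0, 0), 0),
           ("a", (), 1), ("f", (1,), 2), ("g", (1, 0), 3)]
B_RULES = [("a", (), 1), ("b", (), 4), ("f", (1,), 2)]
Q = {(1, 4), (2, 4), (3, 2)}

d_ba = delta_eps(B_RULES, set(), A_RULES, set())      # Delta_eps(B, A)
q_comp = compose(compose(Q, d_ba), Q)                 # state relation of P composed with P
q_plus = plus_closure(Q, d_ba)                        # Delta_+(P)

# GTT G (assumed from Examples 11 and 17): A with three epsilon-transitions.
GA_EPS = {(1, 4), (2, 4), (3, 6)}
GB_RULES = [("a", (), 5), ("b", (), 4), ("f", (5,), 6)]
d_g = delta_eps(A_RULES, GA_EPS, GB_RULES, set())     # Delta_eps(A, B) for G
```

Under these assumptions `d_ba`, `q_comp` and `q_plus` can be compared with the sets stated in Examples 12 and 13, and `d_g` with the seven ε-transitions of Example 11. Each loop is a naive fixed-point iteration; a worklist would avoid recomputing known pairs.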
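Definition 10 Given two binary relations $\bowtie_1$ and $\bowtie_2$ on the same set of ground terms, their modified composition $\bowtie_1 \mathbin{\tilde{\circ}} \bowtie_2$ is defined as the relation

\[
{\bowtie_1} \mathbin{\tilde{\circ}} {\bowtie_2} \;=\; {\bowtie_1} \circ ({\bowtie_2})^{\geq}_{\geq} \;\cup\; ({\bowtie_1})^{\geq}_{\geq} \circ {\bowtie_2}
\]

where $(\cdot)^{\geq}_{\geq}$ denotes the multi-hole context closure. We have $({\bowtie_1} \mathbin{\tilde{\circ}} {\bowtie_2})^{\geq}_{\geq} = ({\bowtie_1})^{\geq}_{\geq} \circ ({\bowtie_2})^{\geq}_{\geq}$. The proof that anchored GTT relations are closed under $\mathbin{\tilde{\circ}}$ requires a preliminary result on the interplay between GTTs and anchored GTTs.

Lemma 11 The composition of an anchored GTT relation and a GTT relation is an anchored GTT relation.

Proof Let $P = (Q, \mathcal{A}_1, \mathcal{B}_1)$ be a pair automaton and $G = (\mathcal{A}_2, \mathcal{B}_2)$ a GTT. Without loss of generality we assume that $P$ and $G$ do not share states. Define the pair automaton

\[
P' = (Q, \mathcal{A}_1, \mathcal{B}_1 \cup \Delta_\varepsilon(\mathcal{A}_2, \mathcal{B}_1) \cup \mathcal{B}_2)
\]

We claim that $L(P') = L(P) \circ L(G)$. First let $(s, t) \in L(P')$. So $s \to^*_{\mathcal{A}_1} p$ and $t \to^*_{\mathcal{B}} q$ with $(p, q) \in Q$ and $\mathcal{B}$ abbreviating $\mathcal{B}_1 \cup \Delta_\varepsilon(\mathcal{A}_2, \mathcal{B}_1) \cup \mathcal{B}_2$. Because $P$ and $G$ do not share states, the sequence $t \to^*_{\mathcal{B}} q$ can be rearranged as follows:

\[
t = C[t_1, \dots, t_n] \to^*_{\mathcal{B}_2} C[q_1, \dots, q_n] \to^*_{\Delta_\varepsilon(\mathcal{A}_2, \mathcal{B}_1)} C[r_1, \dots, r_n] \to^*_{\mathcal{B}_1} q
\]

Here $C$ is a multi-hole context with $n \geq 0$ holes. Using Lemma 9 we obtain ground terms $u_1, \dots, u_n$ such that $u_i \to^*_{\mathcal{A}_2} q_i$ and $u_i \to^*_{\mathcal{B}_1} r_i$ for all $1 \leq i \leq n$. Define the term $u = C[u_1, \dots, u_n]$. We have $u \to^*_{\mathcal{B}_1} C[r_1, \dots, r_n] \to^*_{\mathcal{B}_1} q$ and thus $(s, u) \in L(P)$. Furthermore, $u \to^*_{\mathcal{A}_2} C[q_1, \dots, q_n]$ and thus also $(u, t) \in L(G)$. Hence $(s, t) \in L(P) \circ L(G)$.

For the converse direction, let $(s, t) \in L(P)$ and $(t, u) \in L(G)$. So $s \to^*_{\mathcal{A}_1} p$ and $t \to^*_{\mathcal{B}_1} q$ with $(p, q) \in Q$. Moreover, there exist a multi-hole context $C$ with $n \geq 0$ holes, terms $t_1, \dots, t_n, u_1, \dots, u_n$, and states $r_1, \dots, r_n$ such that $t = C[t_1, \dots, t_n]$, $u = C[u_1, \dots, u_n]$, and $t_i \to^*_{\mathcal{A}_2} r_i$ and $u_i \to^*_{\mathcal{B}_2} r_i$ for all $1 \leq i \leq n$. The sequence $t \to^*_{\mathcal{B}_1} q$ can be written as $t = C[t_1, \dots, t_n] \to^*_{\mathcal{B}_1} C[q_1, \dots, q_n] \to^*_{\mathcal{B}_1} q$ for some states $q_1, \dots, q_n$. By Lemma 9, $r_i \to q_i$ is a transition rule in $\Delta_\varepsilon(\mathcal{A}_2, \mathcal{B}_1)$. Hence

\[
u = C[u_1, \dots, u_n] \to^*_{\mathcal{B}_2} C[r_1, \dots, r_n] \to^*_{\Delta_\varepsilon(\mathcal{A}_2, \mathcal{B}_1)} C[q_1, \dots, q_n] \to^*_{\mathcal{B}_1} q
\]

and thus $(s, u) \in L(P')$ as desired.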
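Example 14 We consider the pair automaton $P$ and the GTT $G$ of Example 10. The construction in the above proof requires that $P$ and $G$ do not share states, so we rename the states of $G$ (by adding a prime). We obtain the pair automaton $P' = (\{(1, 4), (2, 4), (3, 2)\}, \mathcal{A}, \mathcal{B}')$ with

\[
\begin{array}{ll}
\mathcal{A}\colon & \mathsf{a} \to 0 \quad \mathsf{b} \to 0 \quad \mathsf{f}(0) \to 0 \quad \mathsf{g}(0, 0) \to 0 \quad \mathsf{a} \to 1 \quad \mathsf{f}(1) \to 2 \quad \mathsf{g}(1, 0) \to 3 \\[1ex]
\mathcal{B}'\colon & \mathsf{a} \to 1 \quad \mathsf{b} \to 4 \quad \mathsf{f}(1) \to 2 \quad 0' \to 1 \quad 1' \to 1 \\
 & \mathsf{a} \to 5' \quad \mathsf{b} \to 4' \quad \mathsf{f}(5') \to 6' \quad 0' \to 4 \quad 0' \to 2 \quad 2' \to 2 \quad 4' \to 1 \quad 4' \to 2
\end{array}
\]

Fig. 5 $\Delta_+(\mathcal{A}, \mathcal{B})$

We can also trim the resulting pair automata by trimming the underlying automata $\mathcal{A}$ and $\mathcal{B}'$. We declare a state $q$ of $\mathcal{A}$ to be productive if $C[q] \to^*_{\mathcal{A}} r$ for some context $C$ and state $r \in \{p \mid (p, p') \in Q\}$. For the automaton $\mathcal{B}'$ we use the second components $\{p' \mid (p, p') \in Q\}$. In our case $\mathcal{A}$ is already trim, but $\mathcal{B}'$ simplifies to

\[
\mathsf{a} \to 1 \qquad \mathsf{b} \to 4 \qquad \mathsf{f}(1) \to 2 \qquad \mathsf{b} \to 4' \qquad 4' \to 1 \qquad 4' \to 2
\]

We have $L(P') = \{(\mathsf{f}(\mathsf{a}), \mathsf{b}), (\mathsf{a}, \mathsf{b})\} \cup \{\mathsf{g}(\mathsf{a}, t) \mid t \in \mathcal{T}(\mathcal{F})\} \times \{\mathsf{b}, \mathsf{f}(\mathsf{a}), \mathsf{f}(\mathsf{b})\}$, which indeed coincides with the relation ${\to^{\varepsilon}_{\mathcal{R}}} \cdot {\to^{\parallel}_{\mathcal{R}}}$ (a root step followed by a parallel step) induced by our leading TRS $\mathcal{R}$.

Theorem 7 ($A ::= A \mathbin{\tilde{\circ}} A$) Anchored GTT relations are effectively closed under modified composition.

Proof The construction $L(P) \times L(G) \mapsto L(P')$ in the proof of Lemma 11 and its symmetric counterpart $L(G) \times L(P) \mapsto L(P')$ in connection with Lemma 10 ensure that ${\bowtie_1} \mathbin{\tilde{\circ}} {\bowtie_2}$ is an anchored GTT relation.

In Theorem 6 we have seen that anchored GTT relations are closed under transitive closure. GTT relations are also closed under transitive closure, which is the reason they were developed in the first place, but the construction is different from the one for anchored GTT relations and the correctness proof is considerably more involved. We present this construction as a modified transitive closure operation that preserves anchored GTT relations.

Definition 11 The modified transitive closure $\bowtie^{\oplus}$ of a binary relation $\bowtie$ on ground terms is defined as the relation

\[
{\bowtie^{\oplus}} \;=\; ({\bowtie}^{\geq}_{\geq})^{+} \circ {\bowtie} \circ ({\bowtie}^{\geq}_{\geq})^{+}
\]

where $(\cdot)^{\geq}_{\geq}$ denotes the multi-hole context closure. We have $({\bowtie^{\oplus}})^{\geq}_{\geq} = ({\bowtie}^{\geq}_{\geq})^{+}$.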
The proof that anchored GTT relations are effectively closed under modified transitive closure employs the set $\Delta_+(\mathcal{A}, \mathcal{B})$ consisting of ε-transitions $p \leadsto q$ that are computed by the inference rules in Fig. 5.

Definition 12 Given a GTT $G = (\mathcal{A}, \mathcal{B})$, we write $\mathcal{A}_+$ for $\mathcal{A} \cup \Delta_+(\mathcal{B}, \mathcal{A})$ and $\mathcal{B}_+$ for $\mathcal{B} \cup \Delta_+(\mathcal{A}, \mathcal{B})$. The GTT $G_+$ is defined as $(\mathcal{A}_+, \mathcal{B}_+)$.

According to the following lemma, the multi-hole context closure of an anchored GTT relation is a GTT relation using the same GTT.

Lemma 12 For every GTT $G$, $L(G) = L_a(G)^{\geq}_{\geq}$.

Proof Let $G = (\mathcal{A}, \mathcal{B})$. If $(s, t) \in L(G)$ then there exist a context $C$ with $n \geq 0$ holes, terms $s_1, \dots, s_n, t_1, \dots, t_n$, and states $q_1, \dots, q_n$ with $s = C[s_1, \dots, s_n]$, $t = C[t_1, \dots, t_n]$, and $s_i \to^*_{\mathcal{A}} q_i \leftarrow^*_{\mathcal{B}} t_i$ for all $1 \leq i \leq n$. We have $(s_i, t_i) \in L_a(G)$ for all $1 \leq i \leq n$ by definition of anchored GTTs. Moreover, $C \in \mathcal{C}^{\geq} \cap \mathcal{C}_{\geq}$. Hence $(s, t) \in L_a(G)^{\geq}_{\geq}$. The converse is equally easy.

Lemma 13 Let $G = (\mathcal{A}, \mathcal{B})$ be a GTT. If $(p, q) \in \Delta_+(\mathcal{A}, \mathcal{B})$ then $(s, t) \in L(G^-)^+$ for some ground terms $s \in L(\mathcal{A}, p)$ and $t \in L(\mathcal{B}, q)$.

Proof We use induction on the relation $\leadsto$ defined by the inference rules in Fig. 5. In the base case [c] is used with $p \leftarrow_{\mathcal{A}} a$ and $a \to_{\mathcal{B}} q$ for some constant $a$ and hence we can take $s = t = a$. For the induction step we consider four cases, depending on which inference rule is used to derive $p \leadsto q$. First suppose [c] is used. So there exist transition rules $f(p_1, \dots, p_n) \to p$ in $\mathcal{A}$ and $f(q_1, \dots, q_n) \to q$ in $\mathcal{B}$ such that $p_i \leadsto q_i$ for $1 \leq i \leq n$. The induction hypothesis yields ground terms $s_1, \dots, s_n, t_1, \dots, t_n$ such that $(s_i, t_i) \in L(G^-)^+$, $s_i \in L(\mathcal{A}, p_i)$, and $t_i \in L(\mathcal{B}, q_i)$ for $1 \leq i \leq n$. Let $s = f(s_1, \dots, s_n)$ and $t = f(t_1, \dots, t_n)$. We have $s \in L(\mathcal{A}, p)$ and $t \in L(\mathcal{B}, q)$. Moreover, $(s, t) \in L(G^-)^+$ because the transitive closure of a parallel relation is parallel. Next suppose [a] is applied to derive $p \leadsto q$. So there exists a state $p'$ such that $p \leftarrow_{\mathcal{A}} p' \leadsto q$.
The induction hypothesis yields ground terms $s$ and $t$ such that $(s, t) \in L(G^-)^+$, $s \in L(\mathcal{A}, p')$, and $t \in L(\mathcal{B}, q)$. Hence also $s \in L(\mathcal{A}, p)$. The reasoning for [b] is similar. The final case is the transitivity rule [t]. So $p \leadsto r$ and $r \leadsto q$ for some state $r$. The induction hypothesis yields terms $s, t, u, v$ such that $(s, u), (v, t) \in L(G^-)^+$, $s \in L(\mathcal{A}, p)$, $u \in L(\mathcal{B}, r)$, $v \in L(\mathcal{A}, r)$, and $t \in L(\mathcal{B}, q)$. From $u \in L(\mathcal{B}, r)$ and $v \in L(\mathcal{A}, r)$ we infer $(u, v) \in L(G^-)$. Together with $(s, u), (v, t) \in L(G^-)^+$, we obtain the desired $(s, t) \in L(G^-)^+$.

Lemma 14 Let $G = (\mathcal{A}, \mathcal{B})$ be a GTT. Let $G_+ = (\mathcal{A}_+, \mathcal{B}_+)$. If $s \to^*_{\mathcal{A}_+} q$ then $t \to^*_{\mathcal{A}} q$ for some ground term $t$ with $(s, t) \in L(G)^+$.

Proof We proceed by induction on the length of the reduction $s \to^*_{\mathcal{A}_+} p$. If the last step is an ε-transition $q \to p$ then the induction hypothesis yields a ground term $u$ with $(s, u) \in L(G)^+$ and $u \in L(\mathcal{A}, q)$. If $q \to p$ is a transition from $\mathcal{A}$ then $u \in L(\mathcal{A}, p)$, and we conclude by letting $t = u$; otherwise, $q \to p$ must come from $\Delta_+(\mathcal{B}, \mathcal{A})$, and using Lemma 13 we obtain ground terms $v$ and $w$ with $v \in L(\mathcal{B}, q)$, $w \in L(\mathcal{A}, p)$, and $(v, w) \in L(G)^+$. This implies $(u, v) \in L(G)$ and thus $(s, w) \in L(G)^+$ by transitivity. Letting $t = w$ gives the desired result. If the last step is not an ε-transition, then it must be a transition $f(p_1, \dots, p_n) \to p$ from $\mathcal{A}$, and we have $s = f(s_1, \dots, s_n)$ for suitable $s_1, \dots, s_n$. We apply the induction hypothesis to each argument position, resulting in $t_1, \dots, t_n$ with $(s_i, t_i) \in L(G)^+$ and $t_i \in L(\mathcal{A}, p_i)$ for $1 \leq i \leq n$. Let $t = f(t_1, \dots, t_n)$. We have $t \in L(\mathcal{A}, p)$. Since $L(G)^+$ is transitive and closed under contexts, we obtain $(s, t) \in L(G)^*$. Since $L(G)$ is reflexive, we actually have $(s, t) \in L(G)^+$ as desired.

Lemma 15 Let $G = (\mathcal{A}, \mathcal{B})$ be a GTT. If $G_+ = (\mathcal{A}_+, \mathcal{B}_+)$ then $\Delta_\varepsilon(\mathcal{A}_+, \mathcal{B}_+) = \Delta_+(\mathcal{A}, \mathcal{B})$.

Proof We first show $\Delta_\varepsilon(\mathcal{A}_+, \mathcal{B}_+) \subseteq \Delta_+(\mathcal{A}, \mathcal{B})$ via induction on the relation $\leadsto$ defined by the inference rules in Fig. 3.
We proceed by case analysis, so assume $(p, q) \in \Delta_\varepsilon(\mathcal{A}_+, \mathcal{B}_+)$ is derived from a congruence step [c]. Hence we obtain $(p, q) \in \Delta_+(\mathcal{A}, \mathcal{B})$ by a congruence step [c] of Fig. 5, the fact that the constructions only add ε-transitions, and the induction hypothesis. Next assume that we derived $(q, r) \in \Delta_\varepsilon(\mathcal{A}_+, \mathcal{B}_+)$ by an ε-step [a]. So $p \to_{\mathcal{A}_+} q$ and $(p, r) \in \Delta_\varepsilon(\mathcal{A}_+, \mathcal{B}_+)$. We have $\mathcal{A}_+ = \mathcal{A} \cup \Delta_+(\mathcal{B}, \mathcal{A})$. The result trivially follows for $p \to_{\mathcal{A}} q$. So let $(p, q) \in \Delta_+(\mathcal{B}, \mathcal{A})$. Hence $(q, p) \in \Delta_+(\mathcal{A}, \mathcal{B})$. The induction hypothesis yields $(p, r) \in \Delta_+(\mathcal{A}, \mathcal{B})$ and therefore $(q, r) \in \Delta_+(\mathcal{A}, \mathcal{B})$ using the transitivity rule [t]. The ε-step [b] case is obtained in the same way. For the reverse inclusion we use induction on the relation $\leadsto$ defined by the inference rules in Fig. 5 and argue in a similar fashion. Hence $\Delta_\varepsilon(\mathcal{A}_+, \mathcal{B}_+) = \Delta_+(\mathcal{A}, \mathcal{B})$ as desired.

Theorem 8 ($A ::= A^{\oplus}$) Anchored GTT relations are effectively closed under modified transitive closure.

Proof Let $G = (\mathcal{A}, \mathcal{B})$ be a GTT. We show $L_a(G_+) = L_a(G)^{\oplus}$, where $(\cdot)^{\oplus}$ is the modified transitive closure of Definition 11. First let $(s, t) \in L_a(G_+)$. So there exists a state $q$ such that $s \to^*_{\mathcal{A}_+} q$ and $t \to^*_{\mathcal{B}_+} q$. Lemma 14 yields a ground term $u$ such that $u \to^*_{\mathcal{A}} q$ and $(s, u) \in L(G)^+$. Applied to $G^- = (\mathcal{B}, \mathcal{A})$, Lemma 14 yields a ground term $v$ such that $v \to^*_{\mathcal{B}} q$ and $(t, v) \in L(G^-)^+$. Hence $(u, v) \in L_a(G)$ and $(v, t) \in L(G)^+$. Consequently, $(s, t) \in L(G)^+ \circ L_a(G) \circ L(G)^+$ and, using Lemma 12, $L(G)^+ \circ L_a(G) \circ L(G)^+ = L_a(G)^{\oplus}$.

For the other direction we apply the modified composition operation $\mathbin{\tilde{\circ}}$ of Definition 10 with ${\bowtie_1} = {\bowtie_2} = L_a(G_+)$ and obtain

\[
L_a(G_+) \circ L(G_+) \,\cup\, L(G_+) \circ L_a(G_+) \;\subseteq\; L_a(G_+) \mathbin{\tilde{\circ}} L_a(G_+) \;=\; L_a(G_+)
\]

with the help of Lemma 15. Note that we do not get equality, as one direction in the proof of Lemma 11 requires disjoint state sets.
Since $L_a(G) \subseteq L_a(G_+)$ we also have

\[
L_a(G) \circ L(G_+) \,\cup\, L(G_+) \circ L_a(G) \;\subseteq\; L_a(G_+)
\]

At this point we can use the following well-known result in Kleene algebra

\[
A \subseteq X \;\wedge\; B \circ X \subseteq X \;\wedge\; X \circ C \subseteq X \;\implies\; B^* \circ A \circ C^* \subseteq X
\]

with $A = L_a(G)$, $B = C = L(G)$, and $X = L_a(G_+)$. Since $L(G)^* = L(G)^+$, we are done.

Example 15 For the GTT $G = (\mathcal{A}, \mathcal{B})$ of Example 11 we have $\Delta_+(\mathcal{A}, \mathcal{B}) = \Delta_\varepsilon(\mathcal{A}, \mathcal{B})$. Hence $G_+ = (\mathcal{A}_+, \mathcal{B}_+)$ adds the pairs of $\Delta_+(\mathcal{B}, \mathcal{A}) = \{(5, 0), (5, 1), (4, 0), (6, 0), (6, 2), (5, 4), (6, 4)\}$ as ε-transitions to $\mathcal{A}$ and those of $\Delta_+(\mathcal{A}, \mathcal{B}) = \Delta_+(\mathcal{B}, \mathcal{A})^-$ to $\mathcal{B}$. We have $(\mathsf{g}(\mathsf{a}, \mathsf{b}), \mathsf{f}(\mathsf{b})) \in L_a(G_+)$ as $\mathsf{g}(\mathsf{a}, \mathsf{b}) \to^*_{\mathcal{A}_+} 6$ and $\mathsf{f}(\mathsf{b}) \to_{\mathcal{B}_+} \mathsf{f}(4) \to_{\mathcal{B}_+} \mathsf{f}(5) \to_{\mathcal{B}_+} 6$. The term pair $(\mathsf{f}(\mathsf{a}), \mathsf{f}(\mathsf{b}))$ does not belong to $L_a(G_+)$.

The penultimate operation on anchored GTT relations that we consider is complement. This requires the determinization of pair automata.

Lemma 16 For every pair automaton $P = (Q, \mathcal{A}, \mathcal{B})$ there exist deterministic tree automata $\mathcal{A}^d$ and $\mathcal{B}^d$ and a binary relation $Q^d$ such that $L(P) = L((Q^d, \mathcal{A}^d, \mathcal{B}^d))$.

Proof We use the subset construction to determinize $\mathcal{A}$ and $\mathcal{B}$ into equivalent deterministic tree automata $\mathcal{A}^d$ and $\mathcal{B}^d$. As the binary state relation we take $Q^d = \{(X, Y) \mid (p, q) \in Q \text{ for some } p \in X \subseteq Q_{\mathcal{A}} \text{ and } q \in Y \subseteq Q_{\mathcal{B}}\}$. We have $L(P) = L((Q^d, \mathcal{A}^d, \mathcal{B}^d))$ by the correctness of the subset construction.

Theorem 9 ($A ::= A^c$) Anchored GTT relations are effectively closed under complement.

Proof Let $G$ be an anchored GTT. According to Lemma 16 we may assume that $L(G)$ is accepted by a deterministic pair automaton $P = (Q, \mathcal{A}, \mathcal{B})$. Without loss of generality we may further assume that $\mathcal{A}$ and $\mathcal{B}$ are completely defined. It follows that $L(P)^c = L((Q^c, \mathcal{A}, \mathcal{B}))$ where $Q^c = (Q_{\mathcal{A}} \times Q_{\mathcal{B}}) \setminus Q$.

It is worth noting that GTT relations are not closed under complement [8, Exercise 3.4].

Example 16 For the pair automaton $P = (Q, \mathcal{A}, \mathcal{B})$ of Example 10 we have $Q = \{(1, 4), (2, 4), (3, 2)\}$.
Determinizing $\mathcal{A}$ yields the tree automaton $\mathcal{A}^d$ with the following transition rules:

\[
\mathsf{a} \to A \qquad \mathsf{b} \to B \qquad
\mathsf{f}(X) \to \begin{cases} C & \text{if } X = A \\ B & \text{otherwise} \end{cases} \qquad
\mathsf{g}(X, Y) \to \begin{cases} D & \text{if } X = A \\ B & \text{otherwise} \end{cases}
\]

for all $X, Y \in \{A, B, C, D\}$. Here $A = \{0, 1\}$, $B = \{0\}$, $C = \{0, 2\}$, and $D = \{0, 3\}$. Next we determinize $\mathcal{B}$ to obtain the tree automaton $\mathcal{B}^d$ consisting of the following transition rules:

\[
\mathsf{a} \to E \qquad \mathsf{b} \to F \qquad
\mathsf{f}(X) \to \begin{cases} G & \text{if } X = E \\ H & \text{otherwise} \end{cases} \qquad
\mathsf{g}(X, Y) \to H
\]

for all $X, Y \in \{E, F, G, H\}$. Here $E = \{1\}$, $F = \{4\}$, $G = \{2\}$, and $H = \varnothing$. The transition rules for $\mathsf{g}$ are added to make $\mathcal{B}^d$ completely defined. Now the complement $L(G)^c$ of $L(G)$ is accepted by the pair automaton $(Q^c, \mathcal{A}^d, \mathcal{B}^d)$ with

\[
Q^c = (\{A, B, C, D\} \times \{E, F, G, H\}) \setminus \{(A, F), (C, F), (D, G)\}
\]

The final closure property of anchored GTT relations that we mention is intersection.

Lemma 17 ($A ::= A \cap A$) Anchored GTT relations are effectively closed under intersection.

Proof This follows from Theorem 9 and Lemma 10. The formalized proof uses a more efficient product construction, to avoid the subset construction of the complement.

5.3 Regular Relations

We continue with operations on regular relations. Again, most of the results and constructions are known. We provide detailed proofs that form the basis of the formalization. The following result takes care of transforming anchored GTT relations into binary regular (i.e., $\mathrm{RR}_2$) relations.

Theorem 10 ($R ::= A$) Every anchored GTT relation is an $\mathrm{RR}_2$ relation.

Proof Let $G = (\mathcal{A}, \mathcal{B})$ be a GTT. We construct an $\mathrm{RR}_2$ automaton that accepts $L_a(G)$. We use a product construction with states $pq$ where $p$ is a state of $\mathcal{A}$ or $\bot$, and $q$ is a state of $\mathcal{B}$ or $\bot$; the state $\bot\bot$ is not used. The transitions are

\[
\begin{array}{l}
fg(p_1q_1, \dots, p_kq_k) \to pq \\
f\bot(p_1\bot, \dots, p_n\bot) \to p\bot \\
\bot g(\bot q_1, \dots, \bot q_m) \to \bot q
\end{array}
\]

for all $f(p_1, \dots, p_n) \to p \in \mathcal{A}$ and $g(q_1, \dots, q_m) \to q \in \mathcal{B}$, where $k = \max(n, m)$ and $p_i = \bot$ if $n < i \leq k$ and $q_j = \bot$ if $m < j \leq k$, and

\[
\begin{array}{ll}
pq \to p'q & \text{for all } p \to p' \in \mathcal{A} \text{ and } q \in Q_{\mathcal{B}} \cup \{\bot\} \\
pq \to pq' & \text{for all } q \to q' \in \mathcal{B} \text{ and } p \in Q_{\mathcal{A}} \cup \{\bot\}
\end{array}
\]
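These transitions accept $\langle s, t \rangle$ in state $pq$ if and only if $s \in L(\mathcal{A}, p)$ and $t \in L(\mathcal{B}, q)$. As final states we pick $pp$ with $p \in Q_{\mathcal{A}} \cap Q_{\mathcal{B}}$. A straightforward induction proof reveals that the resulting tree automaton accepts $L_a(G)$.

We illustrate the construction on our leading example.

Example 17 For the anchored GTT $G$ of Example 11 we obtain the $\mathrm{RR}_2$ automaton $\mathcal{A} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta)$ with $Q = (\{0, 1, 2, 3, 4, 6, \bot\} \times \{4, 5, 6, \bot\}) \setminus \{\bot\bot\}$, $Q_f = \{44, 66\}$, and $\Delta$ consisting of the following transition rules:

\[
\begin{array}{lll}
\mathsf{aa} \to 05 & \mathsf{ab} \to 04 & \mathsf{af}(\bot 5) \to 06 \\
\mathsf{aa} \to 15 & \mathsf{ab} \to 14 & \mathsf{af}(\bot 5) \to 16 \\
\mathsf{ba} \to 05 & \mathsf{bb} \to 04 & \mathsf{bf}(\bot 5) \to 06 \\
\mathsf{fa}(0\bot) \to 05 & \mathsf{fb}(0\bot) \to 04 & \mathsf{ff}(05) \to 06 \\
\mathsf{fa}(1\bot) \to 25 & \mathsf{fb}(1\bot) \to 24 & \mathsf{ff}(15) \to 26 \\
\mathsf{ga}(0\bot, 0\bot) \to 05 & \mathsf{gb}(0\bot, 0\bot) \to 04 & \mathsf{gf}(05, 0\bot) \to 06 \\
\mathsf{ga}(1\bot, 0\bot) \to 35 & \mathsf{gb}(1\bot, 0\bot) \to 34 & \mathsf{gf}(15, 0\bot) \to 36 \\[1ex]
\mathsf{a}\bot \to 0\bot & \mathsf{b}\bot \to 0\bot & \bot\mathsf{a} \to \bot 5 \\
\mathsf{a}\bot \to 1\bot & & \bot\mathsf{b} \to \bot 4 \\
\mathsf{f}\bot(0\bot) \to 0\bot & \mathsf{f}\bot(1\bot) \to 2\bot & \bot\mathsf{f}(\bot 5) \to \bot 6 \\
\mathsf{g}\bot(0\bot, 0\bot) \to 0\bot & \mathsf{g}\bot(1\bot, 0\bot) \to 3\bot & \\[1ex]
14 \to 44 & 24 \to 44 & 34 \to 64 \\
15 \to 45 & 25 \to 45 & 35 \to 65 \\
16 \to 46 & 26 \to 46 & 36 \to 66 \\
1\bot \to 4\bot & 2\bot \to 4\bot & 3\bot \to 6\bot
\end{array}
\]

We have

\[
\langle \mathsf{g}(\mathsf{a}, \mathsf{f}(\mathsf{b})), \mathsf{f}(\mathsf{a}) \rangle = \mathsf{gf}(\mathsf{aa}, \mathsf{f}\bot(\mathsf{b}\bot)) \to^* \mathsf{gf}(15, \mathsf{f}\bot(0\bot)) \to \mathsf{gf}(15, 0\bot) \to^* 66
\]

The various context closure operations are taken care of in the following general result.

Theorem 11 ($R ::= R^n_p$) If $R$ is an $\mathrm{RR}_2$ relation then $R^n_p$ is an $\mathrm{RR}_2$ relation, for all $n \in \{\geq, 1, >\}$ and $p \in \{\geq, \varepsilon, >\}$.

Proof Let $\mathcal{A} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta)$ be the $\mathrm{RR}_2$ automaton that accepts $R$. We add two new states $*$ and $\checkmark$. In the former the encoding of the identity relation on ground terms will be accepted. The latter will serve as the unique final state (unless specified otherwise). This is achieved by extending $\Delta$ with the transitions $ff(*, \dots, *) \to *$ for every $f \in \mathcal{F}$ and $q \to \checkmark$ for every $q \in Q_f$. The resulting automaton $\mathcal{A}' = (\mathcal{F}^{(2)}, Q \cup \{\checkmark, *\}, \{\checkmark\}, \Delta')$ is equivalent to $\mathcal{A}$ and the starting point for the various context closure operations.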
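• For $n = 1$ and $p = {\geq}$ we extend $\Delta'$ with all rules of the form $ff(*, \dots, *, \checkmark, *, \dots, *) \to \checkmark$.

• For $p = {>}$ we need a new final state $\checkmark'$ to ensure that the surrounding context is non-empty:

\[
ff(*, \dots, *, \checkmark, *, \dots, *) \to \checkmark' \qquad ff(*, \dots, *, \checkmark', *, \dots, *) \to \checkmark'
\]

This is sufficient for $n = 1$. For $n = {>}$ we add the single ε-transition $\checkmark \to *$ and for $n = {\geq}$ we additionally add a new final state $*'$ together with transition rules ensuring that the accepted relation is reflexive: $ff(*', \dots, *') \to *'$.

• For $n = p = {\geq}$ we make $*$ the new (and only) final state and add the ε-transition $\checkmark \to *$.

• For $p = \varepsilon$ and $n \in \{1, >\}$ we have $R^n_\varepsilon = R$ and thus we can just take the $\mathrm{RR}_2$ automaton $\mathcal{A}$. For $n = {\geq}$ we have $R^{\geq}_\varepsilon = R^{=}$ and declare $*$ as an additional final state.

• In the remaining case we have $p = {\geq}$ and $n = {>}$. We extend $\Delta'$ with all rules of the form $ff(*, \dots, *, \checkmark, *, \dots, *) \to \checkmark$ and the single ε-transition $\checkmark \to *$.

The proof details can be found in the formalization.

Example 18 The following transition rules are added to the $\mathrm{RR}_2$ automaton of Example 17 to model the relation $L_a(G)^{>}_{>}$, which corresponds to parallel rewriting strictly below the root:

\[
\begin{array}{llll}
\mathsf{aa} \to * & 44 \to \checkmark & \mathsf{ff}(\checkmark) \to \checkmark' & \mathsf{ff}(\checkmark') \to \checkmark' \\
\mathsf{bb} \to * & 66 \to \checkmark & \mathsf{gg}(\checkmark, *) \to \checkmark' & \mathsf{gg}(\checkmark', *) \to \checkmark' \\
\mathsf{ff}(*) \to * & \checkmark \to * & \mathsf{gg}(*, \checkmark) \to \checkmark' & \mathsf{gg}(*, \checkmark') \to \checkmark' \\
\mathsf{gg}(*, *) \to * & & &
\end{array}
\]

The encoding of the term pair $(\mathsf{g}(\mathsf{f}(\mathsf{a}), \mathsf{f}(\mathsf{a})), \mathsf{g}(\mathsf{b}, \mathsf{f}(\mathsf{b})))$ is accepted:

\[
\mathsf{gg}(\mathsf{fb}(\mathsf{a}\bot), \mathsf{ff}(\mathsf{ab})) \to^* \mathsf{gg}(\mathsf{fb}(1\bot), \mathsf{ff}(14)) \to^* \mathsf{gg}(24, \mathsf{ff}(44)) \to^* \mathsf{gg}(44, \mathsf{ff}(\checkmark)) \to^* \mathsf{gg}(\checkmark, \checkmark') \to \mathsf{gg}(*, \checkmark') \to \checkmark'
\]

We present one more operation that turns a regular set into an $\mathrm{RR}_2$ relation. Here $=_T$ consists of all pairs $(t, t)$ with $t \in T$.

Lemma 18 ($R ::= {=_T}$) If $T \subseteq \mathcal{T}(\mathcal{F})$ is regular then $=_T$ is an $\mathrm{RR}_2$ relation.

Proof Let $\mathcal{A} = (\mathcal{F}, Q, Q_f, \Delta)$ be a tree automaton that accepts $T$. We turn $\mathcal{A}$ into the automaton $\mathcal{B} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta')$, where $\Delta'$ is obtained from $\Delta$ by modifying every transition rule $f(p_1, \dots, p_n) \to q$ of $\Delta$ into $ff(p_1, \dots, p_n) \to q$. The ε-transitions of $\Delta$ are kept. It is a trivial exercise to show that $L(\mathcal{B}) = \langle {=_{L(\mathcal{A})}} \rangle = \langle {=_T} \rangle$.

The following result is an immediate consequence of the corresponding closure properties on regular sets (Theorem 1).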
Theorem 12 ($R ::= R \cup R \mid R \cap R$) The class of $n$-ary regular relations is effectively closed under union and intersection for any $n \geq 0$.

The final closure operations on regular relations are required for the logical structure of formulas in the first-order theory of rewriting.

Theorem 13 ($R ::= R^c$) The class of regular relations is effectively closed under complement.

Given a regular relation $R$, its complement is denoted by $R^c$. Note that $\langle R^c \rangle \neq \langle R \rangle^c$. The former is the topic of Theorem 13 and is used to model logical negation.

Proof Let $R \subseteq \mathcal{T}(\mathcal{F})^n$ be a regular relation. We have $\langle R^c \rangle = \langle R \rangle^c \setminus W^c$ where

\[
W = \{\, t \in \mathcal{T}(\mathcal{F}^{(n)}) \mid t = \langle t_1, \dots, t_n \rangle \text{ for some } t_1, \dots, t_n \in \mathcal{T}(\mathcal{F}) \,\}
\]

is the set of encodings of $n$-tuples of ground terms. It is not difficult to show that $W$ is regular. The set $\langle R \rangle$ is regular by assumption. Hence the regularity of $\langle R^c \rangle$ is a consequence of Theorem 1.

Definition 13 Let $R$ be an $n$-ary relation over $\mathcal{T}(\mathcal{F})$. If $1 \leq i \leq n + 1$ then the $i$-th cylindrification of $R$ is the relation

\[
C_i(R) = \{\, (t_1, \dots, t_{i-1}, u, t_i, \dots, t_n) \mid (t_1, \dots, t_n) \in R \text{ and } u \in \mathcal{T}(\mathcal{F}) \,\}
\]

Moreover, if $\sigma$ is a permutation on $\{1, \dots, n\}$ then

\[
\sigma(R) = \{\, (t_{\sigma(1)}, \dots, t_{\sigma(n)}) \mid (t_1, \dots, t_n) \in R \,\}
\]

Theorem 14 The class of regular relations is effectively closed under cylindrification and permutation.

In [8, Proposition 3.2.12] the closure under cylindrification is obtained via an inverse homomorphic image, resulting in a shorter proof. The proof of the latter operates on completely defined deterministic tree automata. The (formalized) proof below operates on arbitrary tree automata.

Proof Let $\mathcal{A} = (\mathcal{F}^{(n)}, Q, Q_f, \Delta)$ be a tree automaton that accepts $\langle R \rangle$. We construct tree automata that accept $\langle C_i(R) \rangle$ and $\langle \sigma(R) \rangle$. We first consider permutation. Let $\sigma$ be a permutation on $\{1, \dots, n\}$ and define $\mathcal{A}_\sigma = (\mathcal{F}^{(n)}, Q, Q_f, \Delta_\sigma)$ where $\Delta_\sigma$ is obtained from $\Delta$ by replacing every transition rule of the form $f_1 \cdots f_n(p_1, \dots, p_m) \to q$ with $f_{\sigma(1)} \cdots f_{\sigma(n)}(p_1, \dots, p_m) \to q$.
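Epsilon transitions in $\Delta$ are not affected. To conclude $L(\mathcal{A}_\sigma) = \langle \sigma(R) \rangle$, we first define the effect of $\sigma$ on terms in $\mathcal{T}(\mathcal{F}^{(n)})$:

\[
\sigma(t) = f_{\sigma(1)} \cdots f_{\sigma(n)}(\sigma(t_1), \dots, \sigma(t_m))
\]

for $t = f_1 \cdots f_n(t_1, \dots, t_m)$. The following preliminary fact

\[
\langle t_{\sigma(1)}, \dots, t_{\sigma(n)} \rangle = \sigma(\langle t_1, \dots, t_n \rangle) \qquad (*_\sigma)
\]

is proved as follows. We have

\[
\mathrm{Pos}(\langle t_{\sigma(1)}, \dots, t_{\sigma(n)} \rangle) = \mathrm{Pos}(t_1) \cup \dots \cup \mathrm{Pos}(t_n) = \mathrm{Pos}(\langle t_1, \dots, t_n \rangle) = \mathrm{Pos}(\sigma(\langle t_1, \dots, t_n \rangle))
\]

and, for every position $p \in \mathrm{Pos}(\langle t_{\sigma(1)}, \dots, t_{\sigma(n)} \rangle)$,

\[
\langle t_{\sigma(1)}, \dots, t_{\sigma(n)} \rangle(p) = f_{\sigma(1)} \cdots f_{\sigma(n)} = \sigma(\langle t_1, \dots, t_n \rangle)(p)
\]

where $f_i = t_i(p)$ if $p \in \mathrm{Pos}(t_i)$ and $f_i = \bot$ otherwise. We now prove

\[
\langle t_1, \dots, t_n \rangle \to^*_{\mathcal{A}} q \iff \langle t_{\sigma(1)}, \dots, t_{\sigma(n)} \rangle \to^*_{\mathcal{A}_\sigma} q \qquad (6)
\]

for all terms $t_1, \dots, t_n \in \mathcal{T}(\mathcal{F} \cup \{\bot\})$ and states $q \in Q$. Suppose

\[
\langle t_1, \dots, t_n \rangle = f_1 \cdots f_n(u_1, \dots, u_m) \to^*_{\mathcal{A}} q
\]

So there exists a transition rule $f_1 \cdots f_n(q_1, \dots, q_m) \to p \in \Delta$ with $p \to^*_{\mathcal{A}} q$ and $u_i \to^*_{\mathcal{A}} q_i$ for all $1 \leq i \leq m$. We have $f_{\sigma(1)} \cdots f_{\sigma(n)}(q_1, \dots, q_m) \to p \in \Delta_\sigma$ and $p \to^*_{\mathcal{A}_\sigma} q$. Using $(*_\sigma)$, the induction hypothesis yields $\sigma(u_i) \to^*_{\mathcal{A}_\sigma} q_i$ for $1 \leq i \leq m$ and thus

\[
\langle t_{\sigma(1)}, \dots, t_{\sigma(n)} \rangle = f_{\sigma(1)} \cdots f_{\sigma(n)}(\sigma(u_1), \dots, \sigma(u_m)) \to^*_{\mathcal{A}_\sigma} q
\]

The converse is proved in a similar fashion. By specializing (6) to terms $t_1, \dots, t_n \in \mathcal{T}(\mathcal{F})$ and states $q \in Q_f$ we obtain $L(\mathcal{A}_\sigma) = \{\sigma(\langle t_1, \dots, t_n \rangle) \mid \langle t_1, \dots, t_n \rangle \in L(\mathcal{A})\} = \langle \sigma(R) \rangle$.

Next we consider cylindrification. Let $i \in \{1, \dots, n + 1\}$. We define the tree automaton $\mathcal{A}_{C_i} = (\mathcal{F}^{(n+1)}, (Q \cup \{\bot\}) \times \{!, \bot\}, Q_f \times \{!\}, \Delta_{C_i})$ where $\bot$ is a fresh state, a state $(p, x)$ is written $p_x$, and $\Delta_{C_i}$ is obtained from $\Delta$ by replacing every transition rule of the form

\[
f_1 \cdots f_{i-1} f_i \cdots f_n(p_1, \dots, p_m) \to q
\]

with the transitions

\[
\begin{array}{l}
f_1 \cdots f_{i-1}\, g\, f_i \cdots f_n((p_1)_{q_1}, \dots, (p_m)_{q_m}, \dots, (p_k)_{q_k}) \to q_{!} \\
f_1 \cdots f_{i-1} \bot f_i \cdots f_n((p_1)_{\bot}, \dots, (p_m)_{\bot}) \to q_{\bot}
\end{array}
\]

for all $l$-ary $g \in \mathcal{F}$. Here $k = \max(m, l)$ is the arity of $f_1 \cdots f_{i-1}\, g\, f_i \cdots f_n$. Moreover, $p_j = \bot$ for all $m < j \leq k$, and

\[
q_j = \begin{cases} ! & \text{if } j \leq l \\ \bot & \text{if } j > l \end{cases}
\]

for all $1 \leq j \leq k$.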
Additionally, $\Delta_{C_i}$ contains the transition rule

\[
\bot \cdots \bot g \bot \cdots \bot(\bot_{!}, \dots, \bot_{!}) \to \bot_{!}
\]

for every $g \in \mathcal{F}$. Here $g$ is the $i$-th element in $\bot \cdots \bot g \bot \cdots \bot$. Finally, for every ε-transition $p \to q$ in $\Delta$ we add $p_{!} \to q_{!}$ and $p_{\bot} \to q_{\bot}$ to $\Delta_{C_i}$. The purpose of the second component $\bot / !$ in states of $\mathcal{A}_{C_i}$ is to mark whether states are reached by terms where ($!$) the $i$-th position in the encoded tuple is a term in $\mathcal{T}(\mathcal{F})$, or ($\bot$) it is $\bot$. In order to show $L(\mathcal{A}_{C_i}) = \langle C_i(R) \rangle$, we simplify the notation by considering $i = 1$, which entails no loss of generality as regular relations are closed under permutation. Again, first we define the effect of $C_1$ on terms in $\mathcal{T}(\mathcal{F}^{(1)}) \times \mathcal{T}(\mathcal{F}^{(n)})$:

\[
C_1(s, t) = f f_1 \cdots f_n(C_1(s_1, u_1), \dots, C_1(s_k, u_k))
\]

for $s = f(s_1, \dots, s_l)$ and $t = f_1 \cdots f_n(u_1, \dots, u_m)$. Here $k = \max(l, m)$ is the arity of $f f_1 \cdots f_n$, $s_j = \bot$ for $l < j$ and $u_j = \bot^n$ for $m < j$. By induction on $s \in \mathcal{T}(\mathcal{F}^{(1)})$ and $t \in \mathcal{T}(\mathcal{F}^{(n)})$ we show the preliminary statements

\[
\begin{array}{ll}
\mathrm{Pos}(C_1(\bot, t)) = \mathrm{Pos}(t) \text{ and } C_1(\bot, t)(p) = \bot\, t(p) \text{ for all } p \in \mathrm{Pos}(t) & (7) \\
\mathrm{Pos}(C_1(s, \bot^n)) = \mathrm{Pos}(s) \text{ and } C_1(s, \bot^n)(p) = s(p)\, \bot^n \text{ for all } p \in \mathrm{Pos}(s) & (8)
\end{array}
\]

Let $t = f_1 \cdots f_n(u_1, \dots, u_m)$. We have $C_1(\bot, t) = \bot f_1 \cdots f_n(C_1(\bot, u_1), \dots, C_1(\bot, u_m))$ and obtain $\mathrm{Pos}(C_1(\bot, u_i)) = \mathrm{Pos}(u_i)$ and $C_1(\bot, u_i)(p) = \bot\, u_i(p)$ for all $ip \in \mathrm{Pos}(t)$ from the induction hypothesis. Note that $ip \in \mathrm{Pos}(t)$ if and only if $p \in \mathrm{Pos}(u_i)$. For $p = \varepsilon$ we have $C_1(\bot, t)(p) = \bot f_1 \cdots f_n = \bot\, t(p)$. This establishes (7). The proof of (8) is similar and omitted. These statements are used to prove $\mathrm{Pos}(C_1(s, t)) = \mathrm{Pos}(s) \cup \mathrm{Pos}(t)$ and $C_1(s, t)(p) = s(p)\, t(p)$ for all $p \in \mathrm{Pos}(s) \cup \mathrm{Pos}(t)$, by induction on $|s| + |t|$.
Let $s = f(s_1, \dots, s_l)$ and $t = f_1 \cdots f_n(u_1, \dots, u_m)$. Let $k = \max(l, m)$ be the arity of $f f_1 \cdots f_n$. We have $\mathrm{Pos}(C_1(s_i, u_i)) = \mathrm{Pos}(s_i) \cup \mathrm{Pos}(u_i)$ and $C_1(s_i, u_i)(p) = s_i(p)\, u_i(p)$ for all $p \in \mathrm{Pos}(s_i) \cup \mathrm{Pos}(u_i)$ and all $1 \leq i \leq k$. For $i \leq \min(l, m)$ this follows from the induction hypothesis and for $i > \min(l, m)$ this follows from (7) or (8). Moreover, $C_1(s, t)(\varepsilon) = f f_1 \cdots f_n = s(\varepsilon)\, t(\varepsilon)$ so the second statement also holds for $p = \varepsilon$. From these statements we immediately obtain

\[
C_1(s, t) = \langle s, t_1, \dots, t_n \rangle \qquad (*_C)
\]

for all terms $s \in \mathcal{T}(\mathcal{F}^{(1)})$ and $t = \langle t_1, \dots, t_n \rangle \in \mathcal{T}(\mathcal{F}^{(n)})$. The following two properties are easily proved by induction:

\[
C_1(s, \bot^n) \to^*_{\mathcal{A}_{C_1}} \bot_{!} \qquad (9)
\]

for all terms $s \in \mathcal{T}(\mathcal{F})$ and

\[
t \to^*_{\mathcal{A}} q \iff C_1(\bot, t) \to^*_{\mathcal{A}_{C_1}} q_{\bot} \qquad (10)
\]

for all terms $t \in \mathcal{T}(\mathcal{F}^{(n)})$. For the first one we use induction on $s = f(s_1, \dots, s_l)$. We have $C_1(s, \bot^n) = f \bot^n(C_1(s_1, \bot^n), \dots, C_1(s_l, \bot^n))$ and obtain $C_1(s_i, \bot^n) \to^*_{\mathcal{A}_{C_1}} \bot_{!}$ for $1 \leq i \leq l$ from the induction hypothesis. By construction $f \bot^n(\bot_{!}, \dots, \bot_{!}) \to \bot_{!} \in \Delta_{C_1}$. Hence $C_1(s, \bot^n) \to^*_{\mathcal{A}_{C_1}} \bot_{!}$. The second property is proved by induction on $t = f_1 \cdots f_n(u_1, \dots, u_m)$. We have $C_1(\bot, t) = \bot f_1 \cdots f_n(C_1(\bot, u_1), \dots, C_1(\bot, u_m))$. First assume $t \to^*_{\mathcal{A}} q$. So there exists a transition rule $f_1 \cdots f_n(q_1, \dots, q_m) \to p \in \Delta$ with $p \to^*_{\mathcal{A}} q$ and $u_i \to^*_{\mathcal{A}} q_i$ for all $1 \leq i \leq m$. The induction hypothesis yields $C_1(\bot, u_i) \to^*_{\mathcal{A}_{C_1}} (q_i)_{\bot}$ for $1 \leq i \leq m$. By construction $\bot f_1 \cdots f_n((q_1)_{\bot}, \dots, (q_m)_{\bot}) \to p_{\bot} \in \Delta_{C_1}$ and $p_{\bot} \to^*_{\mathcal{A}_{C_1}} q_{\bot}$. Combining all this yields $C_1(\bot, t) \to^*_{\mathcal{A}_{C_1}} q_{\bot}$. For the converse, assume $C_1(\bot, t) \to^*_{\mathcal{A}_{C_1}} q_{\bot}$. So there exists a rule $\bot f_1 \cdots f_n((q_1)_{\bot}, \dots, (q_m)_{\bot}) \to p_{\bot} \in \Delta_{C_1}$ with $p_{\bot} \to^*_{\mathcal{A}_{C_1}} q_{\bot}$ and $C_1(\bot, u_i) \to^*_{\mathcal{A}_{C_1}} (q_i)_{\bot}$ for all $1 \leq i \leq m$. The induction hypothesis yields $u_i \to^*_{\mathcal{A}} q_i$ for $1 \leq i \leq m$.
Furthermore, the transition rule $\bot f_1 \cdots f_n((q_1)_{\bot}, \dots, (q_m)_{\bot}) \to p_{\bot}$ originates from $f_1 \cdots f_n(q_1, \dots, q_m) \to p \in \Delta$, and the ε-steps $p_{\bot} \to^*_{\mathcal{A}_{C_1}} q_{\bot}$ correspond to $p \to^*_{\mathcal{A}} q$. Hence $t \to^*_{\mathcal{A}} q$ as desired. This completes the proofs of (9) and (10). Next we prove

\[
t \to^*_{\mathcal{A}} q \iff C_1(s, t) \to^*_{\mathcal{A}_{C_1}} q_{!} \qquad (11)
\]

for all $s \in \mathcal{T}(\mathcal{F})$, $t \in \mathcal{T}(\mathcal{F}^{(n)})$ and $q \in Q$. For the only-if direction we use induction on $t = f_1 \cdots f_n(u_1, \dots, u_m)$. Let $s = f(s_1, \dots, s_l)$. From $t \to^*_{\mathcal{A}} q$ we obtain $f_1 \cdots f_n(p_1, \dots, p_m) \to p \in \Delta$ with $p \to^*_{\mathcal{A}} q$ and $u_i \to^*_{\mathcal{A}} p_i$ for all $1 \leq i \leq m$. We have

\[
f f_1 \cdots f_n((p_1)_{q_1}, \dots, (p_m)_{q_m}, \dots, (p_k)_{q_k}) \to p_{!} \in \Delta_{C_1}
\]

by construction. Here $k = \max(l, m)$ is the arity of $f f_1 \cdots f_n$, $p_i = \bot$ for all $m < i \leq k$, $q_i = {!}$ if $1 \leq i \leq l$ and $q_i = \bot$ if $l < i \leq k$. We have $p_{!} \to^*_{\mathcal{A}_{C_1}} q_{!}$ and $C_1(s, t) = f f_1 \cdots f_n(C_1(s_1, u_1), \dots, C_1(s_k, u_k))$ with $s_i = \bot$ for $l < i \leq k$ and $u_i = \bot^n$ for $m < i \leq k$. The induction hypothesis yields $C_1(s_i, u_i) \to^*_{\mathcal{A}_{C_1}} (p_i)_{!}$ for all $1 \leq i \leq \min(l, m)$. Note that ${!} = q_i$. For $\min(l, m) < i \leq k$ we distinguish two cases.

• If $\min(l, m) = m$ then $m < i$ and thus $u_i = \bot^n$. We obtain $C_1(s_i, u_i) \to^*_{\mathcal{A}_{C_1}} \bot_{!}$ from (9). Note that $p_i = \bot$ and $q_i = {!}$.

• If $\min(l, m) = l$ then $l < i$ and thus $s_i = \bot$. We obtain $C_1(s_i, u_i) \to^*_{\mathcal{A}_{C_1}} (p_i)_{\bot}$ from (10). Note that $q_i = \bot$.

So in all cases we have $C_1(s_i, u_i) \to^*_{\mathcal{A}_{C_1}} (p_i)_{q_i}$. Hence

\[
C_1(s, t) \to^*_{\mathcal{A}_{C_1}} f f_1 \cdots f_n((p_1)_{q_1}, \dots, (p_m)_{q_m}, \dots, (p_k)_{q_k}) \to_{\mathcal{A}_{C_1}} p_{!} \to^*_{\mathcal{A}_{C_1}} q_{!}
\]

as desired. The if-direction of (11) is proved in a similar fashion. From

\[
C_1(s, t) = f f_1 \cdots f_n(C_1(s_1, u_1), \dots, C_1(s_k, u_k)) \to^*_{\mathcal{A}_{C_1}} q_{!}
\]

we obtain a rule $f f_1 \cdots f_n((p_1)_{q_1}, \dots, (p_m)_{q_m}, \dots, (p_k)_{q_k}) \to p_{!} \in \Delta_{C_1}$ with $p_{!} \to^*_{\mathcal{A}_{C_1}} q_{!}$ and $C_1(s_i, u_i) \to^*_{\mathcal{A}_{C_1}} (p_i)_{q_i}$ for $1 \leq i \leq k$. We have $f_1 \cdots f_n(p_1, \dots, p_m) \to p \in \Delta$ and $p \to^*_{\mathcal{A}} q$ due to the construction of $\Delta_{C_1}$.
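The induction hypothesis yields $u_i \to^*_{\mathcal{A}} p_i$ for $1 \leq i \leq m$ and thus $t = f_1 \cdots f_n(u_1, \dots, u_m) \to^*_{\mathcal{A}} q$. Specializing (11) to terms $t = \langle t_1, \dots, t_n \rangle$ with $t_1, \dots, t_n \in \mathcal{T}(\mathcal{F})$ and $q \in Q_f$ yields $L(\mathcal{A}_{C_1}) = \{\langle s, t_1, \dots, t_n \rangle \mid \langle t_1, \dots, t_n \rangle \in L(\mathcal{A}) \text{ and } s \in \mathcal{T}(\mathcal{F})\} = \langle C_1(R) \rangle$.

Note that for every $\mathrm{RR}_2$ relation $R$, its inverse $R^-$ is the same as $\sigma(R)$ for the permutation $\sigma = (1\;2)$.

Corollary 2 ($R ::= R^-$) The class of binary regular relations is effectively closed under inverse.

Example 19 Consider the $\mathrm{RR}_2$ automaton $\mathcal{A} = (\mathcal{F}^{(2)}, Q, Q_f, \Delta)$ of Example 17. We compute $C_2(\to^{\varepsilon}) = \{(s, t, u) \mid s \to^{\varepsilon} u \text{ and } t \in \mathcal{T}(\mathcal{F})\}$. To this end, we transform $\mathcal{A}$ by the construction in the above proof. This results in an automaton $\mathcal{B} = (\mathcal{F}^{(3)}, Q', Q'_f, \Delta')$ with $Q' = (Q \cup \{\bot\}) \times \{!, \bot\}$, $Q'_f = \{44_{!}, 66_{!}\}$, and $\Delta'$ consisting of 183 transitions. Every non-ε-transition in $\Delta$ gives rise to five transitions in $\Delta'$. For instance, the transitions

\[
\begin{array}{lll}
\mathsf{aaa} \to 05_{!} & \mathsf{afa}(\bot_{!}) \to 05_{!} & \mathsf{aga}(\bot_{!}, \bot_{!}) \to 05_{!} \\
\mathsf{aba} \to 05_{!} & \mathsf{a}\bot\mathsf{a} \to 05_{\bot} &
\end{array}
\]

originate from $\mathsf{aa} \to 05$ and the transitions

\[
\begin{array}{lll}
\bot\mathsf{af}(\bot 5_{!}) \to \bot 6_{!} & \bot\mathsf{ff}(\bot 5_{!}) \to \bot 6_{!} & \bot\mathsf{gf}(\bot 5_{!}, \bot_{!}) \to \bot 6_{!} \\
\bot\mathsf{bf}(\bot 5_{!}) \to \bot 6_{!} & \bot\bot\mathsf{f}(\bot 5_{\bot}) \to \bot 6_{\bot} &
\end{array}
\]

originate from $\bot\mathsf{f}(\bot 5) \to \bot 6$. Moreover, every ε-transition in $\Delta$ is duplicated in $\Delta'$. For instance, $25 \to 45$ gives rise to $25_{!} \to 45_{!}$ and $25_{\bot} \to 45_{\bot}$. Finally, $\Delta'$ contains the transitions

\[
\bot\mathsf{a}\bot \to \bot_{!} \qquad \bot\mathsf{b}\bot \to \bot_{!} \qquad \bot\mathsf{f}\bot(\bot_{!}) \to \bot_{!} \qquad \bot\mathsf{g}\bot(\bot_{!}, \bot_{!}) \to \bot_{!}
\]

So in total there are $31 \times 5 + 12 \times 2 + 4 = 183$ transitions in $\Delta'$.

In Theorem 14 and its proof we have finally introduced all concepts needed to complete the proof that RR relations are closed under projection (Theorem 2). It remains to be shown that $L(\Pi_i(\mathcal{A})) = \langle \Pi_i(R) \rangle$.

Proof of Theorem 2 (cont'd) To simplify the notation, we consider $\Pi_1$ (which entails no loss of generality as regular relations are closed under permutation). Again, first we define the effect of $\Pi_1$ on terms in $\mathcal{T}(\mathcal{F}^{(n)})$:

\[
\Pi_1(t) = f_2 \cdots f_n(\Pi_1(u_1), \dots, \Pi_1(u_k))
\]

for $t = f_1 \cdots f_n(u_1, \dots, u_m)$. Here $k \leq m$ is the arity of $f_2 \cdots f_n$. We show

\[
\Pi_1(C_1(s, t)) = t \qquad (12)
\]

for all terms $s \in \mathcal{T}(\mathcal{F}^{(1)})$ and $t \in \mathcal{T}(\mathcal{F}^{(n)})$ by induction on $|s| + |t|$. So let $s = f(s_1, \dots, s_l)$ and $t = f_1 \cdots f_n(u_1, \dots, u_m)$. We have

\[
\begin{array}{rcl}
\Pi_1(C_1(s, t)) & = & \Pi_1(f f_1 \cdots f_n(C_1(s_1, u_1), \dots, C_1(s_k, u_k))) \\
 & = & f_1 \cdots f_n(\Pi_1(C_1(s_1, u_1)), \dots, \Pi_1(C_1(s_m, u_m))) \\
 & = & f_1 \cdots f_n(u_1, \dots, u_m) = t
\end{array}
\]

Here $k = \max(l, m)$ is the arity of $f f_1 \cdots f_n$, $s_j = \bot$ for $l < j$, $u_j = \bot^n$ for $m < j$, and the induction hypothesis is applied to $\Pi_1(C_1(s_i, u_i))$ for $1 \leq i \leq m$. Now we can easily show

\[
\Pi_1(\langle t_1, \dots, t_n \rangle) = \langle t_2, \dots, t_n \rangle \qquad (*_\Pi)
\]

for all terms $t_1, \dots, t_n \in \mathcal{T}(\mathcal{F})$. From $(*_C)$ in the proof of Theorem 14 we obtain

\[
\langle t_1, t_2, \dots, t_n \rangle = C_1(t_1, \langle t_2, \dots, t_n \rangle)
\]

and thus $\Pi_1(\langle t_1, \dots, t_n \rangle) = \Pi_1(C_1(t_1, \langle t_2, \dots, t_n \rangle)) = \langle t_2, \dots, t_n \rangle$ using (12). We now prove the following two statements:

\[
t \to^*_{\mathcal{A}} q \implies \Pi_1(t) \to^*_{\Pi_1(\mathcal{A})} q \qquad (13)
\]

for all terms $t \in \mathcal{T}(\mathcal{F}^{(n)})$ and states $q \in Q$, and

\[
u \to^*_{\Pi_1(\mathcal{A})} q \implies t \to^*_{\mathcal{A}} q \text{ for some term } t \in \mathcal{T}(\mathcal{F}^{(n)}) \text{ with } \Pi_1(t) = u \qquad (14)
\]

for all terms $u \in \mathcal{T}(\mathcal{F}^{(n-1)})$. We prove the first statement by induction on $t$. Suppose

\[
t = f_1 \cdots f_n(u_1, \dots, u_m) \to^*_{\mathcal{A}} q
\]

So there exists a transition rule $f_1 \cdots f_n(q_1, \dots, q_m) \to p \in \Delta$ with $p \to^*_{\mathcal{A}} q$ such that $u_i \to^*_{\mathcal{A}} q_i$ for all $1 \leq i \leq m$. To simplify the reasoning, we assume that the condition $f_2 \cdots f_n \neq \bot^{n-1}$ in the definition of the transition set $\Delta_{\Pi_1}$ of $\Pi_1(\mathcal{A})$ is temporarily lifted. This entails that $f_2 \cdots f_n(q_1, \dots, q_k) \to p$ is a transition rule in $\Delta_{\Pi_1}$. Here $k \leq m$ is the arity of $f_2 \cdots f_n$. We have $p \to^*_{\Pi_1(\mathcal{A})} q$. The induction hypothesis yields $\Pi_1(u_i) \to^*_{\Pi_1(\mathcal{A})} q_i$ for $1 \leq i \leq m$. Hence

\[
\Pi_1(t) = f_2 \cdots f_n(\Pi_1(u_1), \dots, \Pi_1(u_k)) \to^*_{\Pi_1(\mathcal{A})} f_2 \cdots f_n(q_1, \dots, q_k) \to^*_{\Pi_1(\mathcal{A})} q
\]

as desired. For the second statement, suppose $u = f_2 \cdots f_n(u_1, \dots, u_k) \to^*_{\Pi_1(\mathcal{A})} q$ and so there exists a transition rule $f_2 \cdots f_n(q_1, \dots, q_k) \to p \in \Delta_{\Pi_1}$ with $p \to^*_{\Pi_1(\mathcal{A})} q$ and $u_i \to^*_{\Pi_1(\mathcal{A})} q_i$ for all $1 \leq i \leq k$.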
By construction of $\Pi_1(\mathcal{A})$, there exist a function symbol $f_1 \in \mathcal{F} \cup \{\bot\}$ and states $q_{k+1}, \dots, q_m$ such that $f_1 f_2 \cdots f_n(q_1, \dots, q_m) \to p \in \Delta$. Here $m \geqslant k$ is the arity of $f_1 \cdots f_n$. From the induction hypothesis we obtain terms $v_1, \dots, v_k \in \mathcal{T}(\mathcal{F}^{(n)})$ such that $v_i \to^*_{\mathcal{A}} q_i$ and $\Pi_1(v_i) = u_i$ for $1 \leqslant i \leqslant k$. Because all states of $\mathcal{A}$ are reachable, there exist terms $v_{k+1}, \dots, v_m \in \mathcal{T}(\mathcal{F}^{(n)})$ such that $v_j \to^*_{\mathcal{A}} q_j$ for $k+1 \leqslant j \leqslant m$. Now let $t = f_1 \cdots f_n(v_1, \dots, v_m)$. We clearly have $t \to^*_{\mathcal{A}} f_1 \cdots f_n(q_1, \dots, q_m) \to^*_{\mathcal{A}} p$. Moreover,

$$\Pi_1(t) = f_2 \cdots f_n(\Pi_1(v_1), \dots, \Pi_1(v_k)) = f_2 \cdots f_n(u_1, \dots, u_k) = u$$

This concludes the proof of the two statements. Specializing statement (13) to $t = \langle t_1, \dots, t_n \rangle$ where $t_1, \dots, t_n \in \mathcal{T}(\mathcal{F})$ and states $q \in Q_f$ yields $\Pi_1(L(\mathcal{A})) \subseteq L(\Pi_1(\mathcal{A}))$. From statement (14) we conclude $L(\Pi_1(\mathcal{A})) \subseteq \Pi_1(L(\mathcal{A}))$ and hence

$$L(\Pi_1(\mathcal{A})) = \{\Pi_1(\langle t_1, \dots, t_n \rangle) \mid \langle t_1, \dots, t_n \rangle \in L(\mathcal{A})\} = \Pi_1(R)$$

It remains to show that the automaton $\Pi_1(\mathcal{A})$ does not use any rule $\bot^{n-1} \to p$ to accept terms when $n > 1$. Since $L(\Pi_1(\mathcal{A})) = \Pi_1(R)$ and $\Pi_1(R) \subseteq \mathcal{T}(\mathcal{F})^{n-1}$, no term in $\Pi_1(R)$ contains the function symbol $\bot^{n-1}$.

5.4 Normal Form Predicate

At this point we have formalized proofs for the constructs in the grammar in Fig. 1, with the exception of the normal form predicate ($T ::= \mathsf{NF}$). This predicate can be defined in the first-order theory of rewriting as

$$\mathsf{NF}(t) \iff \neg\,\exists u\,(t \to u)$$

which gives rise to the following procedure:

1. Using Theorems 4, 10 and 11, an RR automaton is constructed that accepts the encoding of the rewrite relation $\to$.
2. Using Theorem 2, the RR automaton of step 1 is projected into a tree automaton that accepts the set of reducible ground terms, corresponding to the subformula $\exists u\,(t \to u)$.
3. Complementation (Theorem 13) is applied to the automaton of step 2 to obtain a tree automaton that accepts the set of ground normal forms.
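The three steps can be mimicked on explicit finite sets, which may help to see what each automaton operation computes. The following Python sketch is only an illustration under stated assumptions: it uses a ground TRS, a bounded universe of ground terms instead of tree automata, and hypothetical helper names (`ground_terms`, `successors`); complementation is taken relative to that finite universe.

```python
from itertools import product

def ground_terms(signature, rounds):
    """Ground terms built in `rounds` rounds; a term is a tuple (symbol, child_1, ..., child_n)."""
    terms = set()
    for _ in range(rounds):
        terms = {(f, *args)
                 for f, arity in signature.items()
                 for args in product(terms, repeat=arity)}
    return terms

def successors(t, rules):
    """All u with t -> u in one step, for a ground TRS given as (lhs, rhs) pairs."""
    result = {rhs for lhs, rhs in rules if lhs == t}      # root steps
    for i in range(1, len(t)):                            # steps below the root
        for u in successors(t[i], rules):
            result.add(t[:i] + (u,) + t[i + 1:])
    return result

signature = {"a": 0, "b": 0, "f": 1}                      # hypothetical example signature
rules = [(("a",), ("b",))]                                # the single ground rule a -> b
universe = ground_terms(signature, 3)

rewrite = {(t, u) for t in universe for u in successors(t, rules)}   # step 1: the relation ->
reducible = {t for t, _ in rewrite}                                  # step 2: projection
normal_forms = universe - reducible                                  # step 3: complement
```

In this small instance `normal_forms` consists of the terms b, f(b) and f(f(b)).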
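Since projection may transform a deterministic tree automaton into a non-deterministic one, and complementation then requires determinization, this three-step procedure is inefficient. In this section we provide a direct construction of a tree automaton that accepts the set of ground normal forms of a left-linear TRS, which goes back to Comon [6], and present a formalized correctness proof. Throughout this section $\mathcal{R}$ is assumed to be left-linear. We start by defining some preliminary concepts.

Definition 14 Given a signature $\mathcal{F}$, we write $\mathcal{F}_\bot$ for the extension of $\mathcal{F}$ with a fresh constant symbol $\bot$. Given $t \in \mathcal{T}(\mathcal{F}, \mathcal{V})$, $t^\bot$ denotes the result of replacing all variables in $t$ by $\bot$:

$$x^\bot = \bot \qquad f(t_1, \dots, t_n)^\bot = f(t_1^\bot, \dots, t_n^\bot)$$

We define the partial order $\leqslant$ on $\mathcal{T}(\mathcal{F}_\bot)$ as the least congruence that satisfies $\bot \leqslant t$ for all terms $t \in \mathcal{T}(\mathcal{F}_\bot)$:

$$\frac{}{\bot \leqslant t} \qquad \frac{t_1 \leqslant u_1 \;\cdots\; t_n \leqslant u_n}{f(t_1, \dots, t_n) \leqslant f(u_1, \dots, u_n)}$$

The partial map $\uparrow\colon \mathcal{T}(\mathcal{F}_\bot) \times \mathcal{T}(\mathcal{F}_\bot) \to \mathcal{T}(\mathcal{F}_\bot)$ is defined as follows:

$$\bot \uparrow t = t \uparrow \bot = t \qquad f(t_1, \dots, t_n) \uparrow f(u_1, \dots, u_n) = f(t_1 \uparrow u_1, \dots, t_n \uparrow u_n)$$

It is not difficult to show that $t \uparrow u$ is the least upper bound of comparable terms $t$ and $u$.

Definition 15 Let $\mathcal{R}$ be a TRS over a signature $\mathcal{F}$. We write $T_\bot$ for the set $\{t^\bot \mid t \lhd \ell \text{ for some } \ell \to r \in \mathcal{R}\} \cup \{\bot\}$. The set $T_\uparrow$ is obtained by closing $T_\bot$ under $\uparrow$.

Example 20 Consider the TRS $\mathcal{R}$ consisting of the following rules:

$$\mathsf{h}(\mathsf{f}(\mathsf{g}(\mathsf{a}), x, y)) \to \mathsf{g}(\mathsf{a}) \qquad \mathsf{g}(\mathsf{f}(x, \mathsf{h}(x), y)) \to x \qquad \mathsf{h}(\mathsf{f}(x, y, \mathsf{h}(\mathsf{a}))) \to \mathsf{h}(x)$$

We start by collecting the proper subterms of the left-hand sides:

$$T_\bot = \{\bot,\ \mathsf{a},\ \mathsf{g}(\mathsf{a}),\ \mathsf{h}(\bot),\ \mathsf{h}(\mathsf{a}),\ \mathsf{f}(\mathsf{g}(\mathsf{a}), \bot, \bot),\ \mathsf{f}(\bot, \mathsf{h}(\bot), \bot),\ \mathsf{f}(\bot, \bot, \mathsf{h}(\mathsf{a}))\}$$

Closing $T_\bot$ under $\uparrow$ adds the following terms:

$$\mathsf{f}(\mathsf{g}(\mathsf{a}), \bot, \bot) \uparrow \mathsf{f}(\bot, \mathsf{h}(\bot), \bot) = \mathsf{f}(\mathsf{g}(\mathsf{a}), \mathsf{h}(\bot), \bot)$$
$$\mathsf{f}(\bot, \bot, \mathsf{h}(\mathsf{a})) \uparrow \mathsf{f}(\bot, \mathsf{h}(\bot), \bot) = \mathsf{f}(\bot, \mathsf{h}(\bot), \mathsf{h}(\mathsf{a}))$$
$$\mathsf{f}(\mathsf{g}(\mathsf{a}), \mathsf{h}(\bot), \bot) \uparrow \mathsf{f}(\bot, \mathsf{h}(\bot), \mathsf{h}(\mathsf{a})) = \mathsf{f}(\mathsf{g}(\mathsf{a}), \mathsf{h}(\bot), \mathsf{h}(\mathsf{a}))$$

Lemma 19 The set $T_\uparrow$ is finite.

Proof If $t \uparrow u$ is defined then $\mathcal{P}\mathsf{os}(t \uparrow u) = \mathcal{P}\mathsf{os}(t) \cup \mathcal{P}\mathsf{os}(u)$. It follows that the positions of terms in $T_\uparrow \setminus T_\bot$ are positions of terms in $T_\bot$. Since $T_\bot$ is finite, there are only finitely many such positions.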
Hence the finiteness of $T_\uparrow$ follows from the finiteness of $\mathcal{F}$.

Although the above proof is simple enough, we formalized the proof below, which is based on a concrete algorithm to compute $T_\uparrow$. Actually, the algorithm presented below is based on a general saturation procedure, which is of independent interest.

Definition 16 Let $f\colon U \times U \to U$ be a (possibly partial) function and let $S$ be a finite subset of $U$. The closure $C_f(S)$ is the least extension of $S$ with the property that $f(a, b) \in C_f(S)$ whenever $a, b \in C_f(S)$ and $f(a, b)$ is defined.

The following lemma provides a sufficient condition for closures to exist. The proof gives a concrete algorithm to compute the closure.

Lemma 20 If $f$ is a total, associative, commutative, and idempotent function then $C_f(S)$ exists and is finite.

Proof If $S = \varnothing$ then $C_f(S) = \varnothing$ and the claim trivially holds. Suppose $S \neq \varnothing$ and let $a$ be an arbitrary element in $S$. We show

$$C_f(S) = C_f(S \setminus \{a\}) \cup \{a\} \cup \{f(a, c) \mid c \in C_f(S \setminus \{a\})\}$$

Since $S$ is finite, this gives rise to the following iterative algorithm to compute $C_f(S)$:

    I := ∅;
    for all x ∈ S do
        I := I ∪ {x} ∪ {f(x, y) | y ∈ I}
    return I

In each iteration only finitely many elements are added. Hence $C_f(S)$ is finite. It remains to show the above equation. The inclusion $\supseteq$ is immediate from the definition of $C_f(S)$. For the inclusion $\subseteq$, let $b$ be an arbitrary element of $C_f(S)$. If $b \in S$ then $b \in C_f(S \setminus \{a\}) \cup \{a\}$. If $b \notin S$ then $b = f(a_1, f(a_2, \dots f(a_{n-1}, a_n) \dots))$ for some sequence of elements $a_1, \dots, a_n \in S$. If $a$ is an element of this sequence then, using the properties of $f$, we may assume that $a$ appears exactly once in the sequence. Hence $b = f(a, c)$ for some element $c \in C_f(S \setminus \{a\})$. If $a$ is not an element of $a_1, \dots, a_n$ then $b \in C_f(S \setminus \{a\})$. This completes the proof.

Since our function $\uparrow$ is partial, we need to lift it to a total function that preserves associativity and commutativity.
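The iterative algorithm from the proof of Lemma 20 can be sketched in Python as follows. The term encoding (nested tuples, with `("⊥",)` for the constant ⊥), the helper names `join` and `closure`, and the decision to simply skip undefined results (returned as `None`) are assumptions made for this illustration and are not taken from the formalization.

```python
BOT = ("⊥",)  # the fresh constant ⊥; a term is a tuple (symbol, child_1, ..., child_n)

def join(t, u):
    """Return t ↑ u as in Definition 14, or None where the partial map is undefined."""
    if t == BOT:
        return u
    if u == BOT:
        return t
    if t[0] != u[0] or len(t) != len(u):
        return None
    children = []
    for a, b in zip(t[1:], u[1:]):
        c = join(a, b)
        if c is None:
            return None
        children.append(c)
    return (t[0], *children)

def closure(S, f):
    """Iterative algorithm from the proof of Lemma 20; undefined results (None) are skipped."""
    I = set()
    for x in S:
        new = {x}
        for y in I:
            z = f(x, y)
            if z is not None:
                new.add(z)
        I |= new
    return I

a, b = ("a",), ("b",)
S = [BOT, ("f", a, BOT), ("f", BOT, b), ("g", a)]   # hypothetical input set
result = closure(S, join)
```

Here `result` contains the four elements of `S` together with f(a, b), the join of f(a, ⊥) and f(⊥, b). The formalization does not skip undefined results but lifts ↑ to a total function.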
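In our abstract setting this entails finding a binary predicate $P$ on $U$ such that $f(a, b)$ is defined if $P(a, b)$ holds. In addition, the following properties need to be fulfilled:

• $P$ is reflexive and symmetric,
• if $P(a, f(b, c))$ and $P(b, c)$ hold then $P(a, b)$ and $P(f(a, b), c)$ hold as well,

for all $a, b, c \in U$. For the details we refer to the formalization.

Definition 17 The tree automaton $\mathcal{A}_{\mathsf{NF}(\mathcal{R})} = (\mathcal{F}, Q, Q_f, \Delta)$ is defined as follows: $Q = Q_f = T_\uparrow$ and $\Delta$ consists of all transition rules $f(p_1, \dots, p_n) \to q$ such that $f(p_1, \dots, p_n)$ is no redex of $\mathcal{R}$ and $q$ is the maximal element of $Q$ satisfying $q \leqslant f(p_1, \dots, p_n)$.

Since states are terms from $T_\uparrow \subseteq \mathcal{T}(\mathcal{F}_\bot)$ here, Definition 14 applies.

Example 21 For the TRS $\mathcal{R}$ of Example 20, the tree automaton $\mathcal{A}_{\mathsf{NF}(\mathcal{R})}$ consists of the following transition rules:

$$\mathsf{a} \to 1 \qquad \mathsf{g}(p) \to \begin{cases} 2 & \text{if } p = 1 \\ 0 & \text{if } p \notin \{1, 6, 9, 10\} \end{cases} \qquad \mathsf{h}(p) \to \begin{cases} 4 & \text{if } p = 1 \\ 3 & \text{if } p \notin \{1, 8, 10\} \end{cases}$$

$$\mathsf{f}(p, q, r) \to \begin{cases} 5 & \text{if } p = 2,\ q \notin \{3, 4\} \\ 6 & \text{if } p \neq 2,\ q \in \{3, 4\},\ r \neq 4 \\ 7 & \text{if } q \notin \{3, 4\},\ r = 4 \\ 8 & \text{if } p = 2,\ q \in \{3, 4\},\ r \neq 4 \\ 9 & \text{if } p \neq 2,\ q \in \{3, 4\},\ r = 4 \\ 10 & \text{if } p = 2,\ q \in \{3, 4\},\ r = 4 \\ 0 & \text{otherwise} \end{cases}$$

Here we use the following abbreviations:

0 = $\bot$, 1 = $\mathsf{a}$, 2 = $\mathsf{g}(\mathsf{a})$, 3 = $\mathsf{h}(\bot)$, 4 = $\mathsf{h}(\mathsf{a})$, 5 = $\mathsf{f}(\mathsf{g}(\mathsf{a}), \bot, \bot)$, 6 = $\mathsf{f}(\bot, \mathsf{h}(\bot), \bot)$, 7 = $\mathsf{f}(\bot, \bot, \mathsf{h}(\mathsf{a}))$, 8 = $\mathsf{f}(\mathsf{g}(\mathsf{a}), \mathsf{h}(\bot), \bot)$, 9 = $\mathsf{f}(\bot, \mathsf{h}(\bot), \mathsf{h}(\mathsf{a}))$, 10 = $\mathsf{f}(\mathsf{g}(\mathsf{a}), \mathsf{h}(\bot), \mathsf{h}(\mathsf{a}))$

As can be seen from the above example, the tree automaton $\mathcal{A}_{\mathsf{NF}(\mathcal{R})}$ is not completely defined. Unlike the construction in [6], we do not have an additional state that is reached by all reducible ground terms.

Before proving that $\mathcal{A}_{\mathsf{NF}(\mathcal{R})}$ accepts the ground normal forms of $\mathcal{R}$, we first show that $\mathcal{A}_{\mathsf{NF}(\mathcal{R})}$ is well-defined, which amounts to showing that for every $f(p_1, \dots, p_n)$ with $f \in \mathcal{F}$ and $p_1, \dots, p_n \in T_\uparrow$ the set of states $q$ such that $q \leqslant f(p_1, \dots, p_n)$ has a maximum element with respect to the partial order $\leqslant$.

Lemma 21 For every term $t \in \mathcal{T}(\mathcal{F}_\bot)$ the set $\{s \in T_\uparrow \mid s \leqslant t\}$ has a unique maximal element.

Proof Let $S = \{s \in T_\uparrow \mid s \leqslant t\}$.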
Because ⊥  t and⊥∈ T , S = ∅.If s , s ∈ S then ↑ ↑ 1 2 s  t and s  t and thus s ↑ s is defined and satisfies s ↑ s  t.Since T is closed 1 2 1 2 1 2 ↑ under↑, s ↑ s ∈ T and thus s ↑ s ∈ S. Consequently, S has a unique maximal element. 1 2 ↑ 1 2 The next lemma is a trivial consequence of the fact that A has no ε-transitions. NF(R) Lemma 22 The tree automaton A is deterministic. NF(R) ∗ ⊥ ⊥ Lemma 23 If t ∈ T (F ) with t → q and s  t for a proper subterm s of some left-hand side of R then s  q. Proof We use induction on t.Let t = f (t ,..., t ).Wehave t → f (q ,..., q ) → q. 1 n 1 n We proceed by case analysis on s.If s is a variable then s =⊥ and, as ⊥ is minimal in , we obtain s  q. Otherwise we must have root(s) = f from the assumption ⊥ ⊥ ⊥ s  t . So we may write s = f (s ,..., s ). The induction hypothesis yields s  q for 1 n i ⊥ ⊥ ⊥ ⊥ all 1  i  n. Hence s = f (s ,..., s )  f (q ,..., q ). Additionally we have s ∈ Q 1 n by Definition 17 as s is a proper subterm of a left-hand side of R.Since f (q ,..., q ) → q 1 n is a transition rule, we obtain f (s ,..., s )  q from the maximality of q. 1 n 123 14 Page 34 of 76 A. Middeldorp et al. Table 1 Summary of (formalized) closure properties Operation GTTs Anchored GTTs RR Operation Regular relations Union ×  Union Intersection ×  Intersection Complement ×  Complement Composition   Projection Inverse   Cylindrification Transitive closure  × Permutation Context closure × Using the previous result we can prove that no redex of R reaches a state in A . NF(R) Lemma 24 If t ∈ T (F ) is a redex then t → q for no state q ∈ T . Proof We have   t for some left-hand side  of R. For a proof by contradiction, assume ∗ ∗ t → q. Write t = f (t ,..., t ).Wehave t → f (q ,..., q ) → q and obtain 1 n 1 n f (q ,..., q ) by a case analysis on  and Lemma 23. Therefore the transition rule 1 n f (q ,..., q ) → q cannot exist by Definition 17. 1 n Lemma 25 If t → q and t ∈ T (F ) then q  t. 
∗ ∗ Proof We use induction on t.Let t = f (t ,..., t ).Wehave t → f (q ,..., q ) → q. 1 n 1 n The induction hypothesis yields q  t for all 1  i  n and thus also f (q ,..., q ) i i 1 n f (t ,..., t ).Wehave q  f (q ,..., q ) by Definition 17 and thus q  t by the transitivity 1 n 1 n of . Lemma 26 If t ∈ NF(R) then t → q for some state q ∈ T . Proof We use induction on t.Let t = f (t ,..., t ).Since t ,..., t ∈ NF(R) we obtain 1 n 1 n f (t ,..., t ) → f (q ,..., q ) from the induction hypothesis. Suppose f (q ,..., q ) is 1 n 1 n 1 n aredex,so   f (q ,..., q ) for some left-hand side  of R. From Lemma 25 we obtain 1 n q  t for all 1  i  n and thus f (q ,..., q )  f (t ,..., t ). Hence   f (t ,..., t ). i i 1 n 1 n 1 n This however contradicts the assumption that t is a normal form. (Here we need left-linearity of R.) Therefore f (q ,..., q ) is no redex and thus, using Lemma 21, there exists a transition 1 n f (q ,..., q ) → q in  and thus t → q. 1 n Theorem 15 (T ::= NF) If R is a left-linear TRS then L(A ) = NF(R). NF(R) Proof Let t ∈ T (F ).If t ∈ NF(R) then t → q for some state q ∈ T by Lemma 26.Since all states in T are final, t ∈ L(A ). Next assume t ∈ / NF(R). Hence t = C[s] for some ↑ NF(R) redex s. According to Lemma 24 s does not reach a state in A . Hence also t cannot NF(R) reach a state and thus t ∈ / L(A ). NF(R) 5.5 Decision Procedure In Table 1 we summarize the effective closure properties that were presented in detail in this section and formalized in Isabelle. The asterisks indicate that for anchored GTTs we have two closure properties each. The underlined result (the closure of RR relations under 123 First-Order Theory of Rewriting… Page 35 of 76 14 Table 2 Binary predicates as RR relations 1 1 − → = (→ ) ← = ((→ ) ) ε ε → = (→ ) ε ε + +  > → = ((→ ) ) 1 ∗ + → = (→ ) → = ((→ ) ) >ε ε ε > > >ε ∗ + −→  = (→ ) → = ((→ ) ) ε ε + + 1 ∗ − + → = ((→ ) ) ↔ = (((→ ) ∪→ ) ) ε ε ε ε ε − 1 + − + ↔ = ((→ ) ∪→ ) ↓ = ((→ ) ◦ (→ ) ) ε ε ε ! 
+ → = ((→ ) ) ∩ (T (F )× NF) composition) is not used in the decision procedure but does hold: If R and R are RR 1 2 2 relations then R ◦ R =  (C (R )∩ C (R )). Concerning the empty entry in the table, 1 2 2 3 1 1 1 it can be shown that GTT relations are closed under the context operation (·) if and only if n ∈{ , 1,>} and p ∈{ ,ε}. The second and third columns in the left part of Table 1 correspond to the A and R parts of the grammar in Fig. 1. The logical structure of formulas in the first-order theory of rewriting is taken care of by the closure operations on regular relations listed in the second half of Table 1. In Table 2 we show how some of the common binary predicates in term rewriting are represented as RR relations using the corresponding operations. These are added to the language L of the first-order theory of rewriting without compromising the decidability result that is presented below. Theorem 16 The first-order theory of rewriting is decidable for finite linear variable- separated TRSs. Proof Let ϕ(x ,..., x ) be a first-order formula over the language L with free variables 1 n x ,..., x .Let R be a finite linear variable-separated TRS over a signature F. We construct 1 n an RR automaton that accepts the encoding of the relation [[ϕ]] = { (t ,..., t ) | R n 1 n ϕ(t ,..., t )}. For closed formulas, checking R ϕ then boils down to checking non- 1 n emptiness of [[ϕ]] , which is decidable. We prove the (correctness of the) construction by structural induction on ϕ. In the base case ϕ is an atomic formula and we distinguish the following cases. 1. If ϕ = (x → y) then we use Theorem 4 to obtain an anchored GTT for → ,which is transformed into an RR automaton for → by Theorem 10. An application of 2 ε Theorem 11 with n = 1and p =  yields an RR automaton for (→ ) =[[ϕ]]. 2 ε 2. 
If $\varphi = (x \to^* y)$ then we repeat the constructions in the previous case, with an additional application of modified transitive closure (Theorem 8) before Theorem 11 (with $n = p = {\geqslant}$) is applied.
3. If $\varphi = (x = y)$ then $[[\varphi]]$ is regular by Lemma 18.

Here we assume that $x \neq y$. If $x$ and $y$ are the same variable then $[[\varphi]]$ is a set of ground terms and the above constructions need to be modified as follows. If $\varphi = (x = x)$ then $[[\varphi]] = \{t \mid t \in \mathcal{T}(\mathcal{F})\} = \mathcal{T}(\mathcal{F})$ is accepted by the tree automaton $(\mathcal{F}, \{q\}, \{q\}, \Delta)$ with $\Delta$ consisting of all rules $f(q, \dots, q) \to q$ for $f \in \mathcal{F}$. Consider $\varphi = (x \to x)$. We have $\{\langle t, t \rangle \mid t \to_{\mathcal{R}} t\} = \{\langle t, u \rangle \mid t \to_{\mathcal{R}} u \text{ and } t = u\}$. The latter is regular (cases 1 and 3 above together with Theorem 12) and hence the regularity of $[[\varphi]] = \{t \mid t \to_{\mathcal{R}} t\}$ follows by an application of Theorem 2. In the remaining case ($\varphi = (x \to^* x)$) we reason as in the previous case (using cases 2 and 3 above).

Next we consider the propositional connectives.

4. Suppose $\varphi = \neg\psi$. The induction hypothesis yields an RR automaton that accepts $[[\psi]]$. Since the class of $n$-ary regular relations is effectively closed under complement (Theorem 13), we obtain an RR automaton that accepts $[[\varphi]]$.
5. Suppose $\varphi = \psi_1 \land \psi_2$. Since $\psi_1$ and $\psi_2$ may have fewer free variables than $\varphi$, we cannot use Theorem 12 without further ado. Let $y_1, \dots, y_k$ be the free variables in $\psi_1$ and $z_1, \dots, z_m$ be the free variables in $\psi_2$. We have $\{x_1, \dots, x_n\} = \{y_1, \dots, y_k\} \cup \{z_1, \dots, z_m\}$. Because regular relations are closed under permutation (Theorem 14), we may assume that the variables in $y_1, \dots, y_k$ and $z_1, \dots, z_m$ are listed in the same order as in $x_1, \dots, x_n$. The induction hypothesis yields an RR$_k$ automaton $\mathcal{A}_1$ for $[[\psi_1]]$ and an RR$_m$ automaton $\mathcal{A}_2$ for $[[\psi_2]]$. Using $2n - (k + m)$ applications of cylindrification (Theorem 14), these automata are turned into RR$_n$ automata.
Since $n$-ary regular relations are closed under intersection (Theorem 12), we obtain an RR$_n$ automaton for $[[\varphi]]$.
6. The other binary connectives are handled exactly like conjunction.

The final cases involve the two quantifiers.

7. Suppose $\varphi = \exists x\,\psi$. If $x$ does not occur free in $\psi$ then $[[\varphi]] = [[\psi]]$ and hence the result follows immediately from the induction hypothesis. So we assume that $x$ occurs free in $\psi$ and $n \geqslant 0$. The induction hypothesis yields an RR$_{n+1}$ automaton that accepts $[[\psi]]$. Since the class of regular relations is effectively closed under projection (Theorem 2), we obtain an RR$_n$ automaton that accepts $[[\varphi]]$.
8. The case $\varphi = \forall x\,\psi$ reduces to the preceding case by the well-known equivalence $\forall x\,\psi \equiv \neg\exists x\,\neg\psi$.

6 Properties on Non-ground Terms

Since tree automata operate on ground terms, the decision procedure presented in the preceding section is restricted to properties on ground terms. The following example shows that ground-confluence, i.e., confluence restricted to ground terms, is not the same as confluence.

Example 22 The left-linear right-ground TRS $\mathcal{R}$ consisting of the rules

$$\mathsf{a} \to \mathsf{b} \qquad \mathsf{f}(\mathsf{a}, x) \to \mathsf{b} \qquad \mathsf{f}(\mathsf{b}, \mathsf{b}) \to \mathsf{b}$$

over the signature $\mathcal{F} = \{\mathsf{a}, \mathsf{b}, \mathsf{f}\}$ is ground-confluent because every ground term in $\mathcal{T}(\mathcal{F})$ rewrites to $\mathsf{b}$. Confluence does not hold; the term $\mathsf{f}(\mathsf{a}, x)$ rewrites to the different normal forms $\mathsf{b}$ and $\mathsf{f}(\mathsf{b}, x)$.

In this section we present results that allow the use of FORT on (certain) properties over arbitrary terms. The main idea is to extend the given signature $\mathcal{F}$ with constants to replace variables in terms. The required number of additional constants depends on the property under consideration.

Fig. 6 Confluence-related properties on ground and non-ground terms

We consider the following confluence-related properties:

CR: $\forall s\,\forall t\,\forall u\,(s \to^* t \land s \to^* u \Rightarrow t \downarrow u)$ (confluence)
SCR: $\forall s\,\forall t\,\forall u\,(s \to t \land s \to u \Rightarrow \exists v\,(t \to^= v \land u \to^* v))$ (strong confluence)
WCR: $\forall s\,\forall t\,\forall u\,(s \to t \land s \to u \Rightarrow t \downarrow u)$ (local confluence)
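NFP: $\forall s\,\forall t\,\forall u\,(s \to^* t \land s \to^! u \Rightarrow t \to^! u)$ (normal form property)
UNR: $\forall s\,\forall t\,\forall u\,(s \to^! t \land s \to^! u \Rightarrow t = u)$ (unique normal forms with respect to reduction)
UNC: $\forall t\,\forall u\,(t \leftrightarrow^* u \land \mathsf{NF}(t) \land \mathsf{NF}(u) \Rightarrow t = u)$ (unique normal forms with respect to conversion)

Here $t \downarrow u$ denotes joinability: $\exists v\,(t \to^* v \land u \to^* v)$. Let $\mathcal{P}_1$ be the collection of these properties. We also consider the following properties involving two TRSs $\mathcal{R}$ and $\mathcal{S}$:

COM: $\forall s\,\forall t\,\forall u\,(s \to^*_{\mathcal{R}} t \land s \to^*_{\mathcal{S}} u \Rightarrow \exists v\,(t \to^*_{\mathcal{S}} v \land u \to^*_{\mathcal{R}} v))$ (commutation)
CE: $\forall s\,\forall t\,(s \leftrightarrow^*_{\mathcal{R}} t \iff s \leftrightarrow^*_{\mathcal{S}} t)$ (conversion equivalence)
NE: $\forall s\,\forall t\,(s \to^!_{\mathcal{R}} t \iff s \to^!_{\mathcal{S}} t)$ (normalization equivalence)

Let $\mathcal{P}_2 = \{\mathsf{COM}, \mathsf{CE}, \mathsf{NE}\}$. For a property $P \in \mathcal{P}_1 \cup \mathcal{P}_2$, GP denotes the property $P$ restricted to ground terms. The diagram in Fig. 6 summarizes the relationships between properties $P$ and GP for $P \in \mathcal{P}_1$. The properties $\mathsf{CE}, \mathsf{NE} \in \mathcal{P}_2$ are unrelated.

According to the following result, all considered properties are closed under signature extension.

Lemma 27 Let $\mathcal{R}$ and $\mathcal{S}$ be linear variable-separated TRSs over a common signature $\mathcal{F}$.

1. If $P \in \mathcal{P}_1$ and $(\mathcal{F}, \mathcal{R}) \vDash P$ then $(\mathcal{F} \uplus \{c\}, \mathcal{R}) \vDash P$.
2. If $P \in \mathcal{P}_2$ and $(\mathcal{F}, \mathcal{R}, \mathcal{S}) \vDash P$ then $(\mathcal{F} \uplus \{c\}, \mathcal{R}, \mathcal{S}) \vDash P$.

Proof Let $\mathcal{U}$ be a linear variable-separated TRS not containing the constant $c$. For any $x \in \mathcal{V}$, the mapping $\phi_c^x\colon \mathcal{T}(\mathcal{F} \uplus \{c\}, \mathcal{V}) \to \mathcal{T}(\mathcal{F}, \mathcal{V})$ replaces all occurrences of $c$ in terms by the variable $x$:

$$\phi_c^x(t) = \begin{cases} x & \text{if } t = c \\ t & \text{if } t \in \mathcal{V} \\ f(\phi_c^x(t_1), \dots, \phi_c^x(t_n)) & \text{if } t = f(t_1, \dots, t_n) \end{cases}$$

A straightforward induction proof reveals that $\phi_c^x(s) \to^*_{\mathcal{U}} \phi_c^x(t)$ whenever $s \to^*_{\mathcal{U}} t$. By choosing $x \notin \mathcal{V}\mathsf{ar}(s) \cup \mathcal{V}\mathsf{ar}(t)$, the reverse direction holds as well. Moreover, since linear variable-separated TRSs are closed under rule inversion, the equivalence also holds for $\leftrightarrow^*_{\mathcal{U}} = \to^*_{\mathcal{U} \cup \mathcal{U}^-}$. The lemma is an easy consequence of these facts. We illustrate this for COM. Given $s \to^*_{\mathcal{R}} t$ and $s \to^*_{\mathcal{S}} u$, with $s, t, u \in \mathcal{T}(\mathcal{F} \uplus \{c\}, \mathcal{V})$, we obtain $\phi_c^x(s) \to^*_{\mathcal{R}} \phi_c^x(t)$ and $\phi_c^x(s) \to^*_{\mathcal{S}} \phi_c^x(u)$.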
Commutation of $(\mathcal{F}, \mathcal{R}, \mathcal{S})$ yields a term $v \in \mathcal{T}(\mathcal{F}, \mathcal{V})$ such that $\phi_c^x(t) \to^*_{\mathcal{S}} v$ and $\phi_c^x(u) \to^*_{\mathcal{R}} v$. By taking $x \notin \mathcal{V}\mathsf{ar}(t) \cup \mathcal{V}\mathsf{ar}(u)$, we obtain $t \to^*_{\mathcal{S}} v'$ and $u \to^*_{\mathcal{R}} v'$ for $v' = v\{x \mapsto c\}$ by closure of rewriting under substitutions.

So adding constants preserves the properties of interest. For removing constants more effort is required. For the properties in $\mathcal{P}_1 \cup \mathcal{P}_2$, root steps will play a major role. Root steps are important since they permit the use of different substitutions for the left- and right-hand side of the employed rewrite rule, due to variable separation. We therefore start with a preliminary result (Lemma 28) which provides abstract conditions that permit the restriction to rewrite sequences containing root steps. We write $\to^{*\varepsilon*}_{\mathcal{R}}$ for the relation $\to^*_{\mathcal{R}} \cdot \to^{\varepsilon}_{\mathcal{R}} \cdot \to^*_{\mathcal{R}}$. The proof of Lemma 28 is obtained by a straightforward induction on the term structure and the multi-hole context closure of the rewrite relation, and is omitted.

Definition 18 A binary predicate $P$ on terms over a given signature $\mathcal{F}$ is closed under multi-hole contexts if $P(C[s_1, \dots, s_n], C[t_1, \dots, t_n])$ holds whenever $C$ is a multi-hole context over $\mathcal{F}$ with $n \geqslant 0$ holes and $P(s_i, t_i)$ holds for all $1 \leqslant i \leqslant n$.

Lemma 28 Let $\mathcal{A}$ and $\mathcal{B}$ be TRSs over the same signature $\mathcal{F}$ and let $P$ be a binary predicate that is closed under multi-hole contexts over $\mathcal{F}$.

1. If $s \to^{*\varepsilon*}_{\mathcal{A}} t \Rightarrow P(s, t)$ for all terms $s$ and $t$ then $s \to^*_{\mathcal{A}} t \Rightarrow P(s, t)$ for all terms $s$ and $t$.
2. If $s \to^{*\varepsilon*}_{\mathcal{A}} \cdot \to^*_{\mathcal{B}} t \;\lor\; s \to^*_{\mathcal{A}} \cdot \to^{*\varepsilon*}_{\mathcal{B}} t \Rightarrow P(s, t)$ for all terms $s$ and $t$ then $s \to^*_{\mathcal{A}} \cdot \to^*_{\mathcal{B}} t \Rightarrow P(s, t)$ for all terms $s$ and $t$.

For example, in the results below (Lemmata 34 and 35) for NFP we make use of this lemma by instantiating part 2 with $P_1(s, t)$: $\mathsf{NF}(t) \Rightarrow s \to^*_{\mathcal{R}} t$, $\mathcal{R}^-$ for $\mathcal{A}$, and $\mathcal{R}$ for $\mathcal{B}$.
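This results in the statement that if

$$s \to^{*\varepsilon*}_{\mathcal{R}^-} \cdot \to^*_{\mathcal{R}} t \;\lor\; s \to^*_{\mathcal{R}^-} \cdot \to^{*\varepsilon*}_{\mathcal{R}} t \;\Rightarrow\; \mathsf{NF}(t) \Rightarrow s \to^*_{\mathcal{R}} t$$

then

$$s \to^*_{\mathcal{R}^-} \cdot \to^*_{\mathcal{R}} t \;\Rightarrow\; \mathsf{NF}(t) \Rightarrow s \to^*_{\mathcal{R}} t$$

Using the identity $\to_{\mathcal{R}^-} = {}_{\mathcal{R}}{\leftarrow}$ and the definition of NFP, it follows that NFP is a consequence of the statement

$$s \;{}^{*\varepsilon*}_{\mathcal{R}}{\leftarrow} \cdot \to^*_{\mathcal{R}} t \;\lor\; s \;{}^{*}_{\mathcal{R}}{\leftarrow} \cdot \to^{*\varepsilon*}_{\mathcal{R}} t \;\Rightarrow\; \mathsf{NF}(t) \Rightarrow s \to^*_{\mathcal{R}} t$$

for all $s, t \in \mathcal{T}(\mathcal{F})$. Hence we only need to consider rewrite sequences involving root steps, which together with variable separation significantly simplifies the proof. For the other properties of interest, Lemma 28 is instantiated as follows.

• For UNC we use part 1 with $P_2(s, t)$: $\mathsf{NF}(s) \land \mathsf{NF}(t) \Rightarrow s = t$ and $\mathcal{R} \cup \mathcal{R}^-$ for $\mathcal{A}$.
• For UNR we use part 2 with the same predicate $P_2$ and $\mathcal{R}^-$ for $\mathcal{A}$ and $\mathcal{R}$ for $\mathcal{B}$.
• For COM we use part 2 with $P_3(s, t)$: $s \to^*_{\mathcal{S}} \cdot \to^*_{\mathcal{R}^-} t$ and $\mathcal{R}^-$ for $\mathcal{A}$ and $\mathcal{S}$ for $\mathcal{B}$.
• For CR we use part 2 with the same predicate $P_3$ and replace $\mathcal{S}$ by $\mathcal{R}$.
• For NE we use part 1 twice, with $P_4(s, t)$: $\mathsf{NF}_{\mathcal{R}}(t) \Rightarrow s \to^*_{\mathcal{S}} t$ and $\mathcal{R}$ for $\mathcal{A}$, and with $P_5(s, t)$: $\mathsf{NF}_{\mathcal{S}}(t) \Rightarrow s \to^*_{\mathcal{R}} t$ and $\mathcal{S}$ for $\mathcal{A}$.
• For CE we use part 1 twice, with $P_6(s, t)$: $s \to^*_{\mathcal{S} \cup \mathcal{S}^-} t$ and $\mathcal{R} \cup \mathcal{R}^-$ for $\mathcal{A}$, and with $P_7(s, t)$: $s \to^*_{\mathcal{R} \cup \mathcal{R}^-} t$ and $\mathcal{S} \cup \mathcal{S}^-$ for $\mathcal{A}$.

In addition, we make use of the identities $\to^{*\varepsilon*}_{\mathcal{R} \cup \mathcal{R}^-} = \leftrightarrow^{*\varepsilon*}_{\mathcal{R}}$ and $\to^*_{\mathcal{R} \cup \mathcal{R}^-} = \leftrightarrow^*_{\mathcal{R}}$ for UNC and CE.

Lemma 29 The properties $P_1, \dots, P_7$ are closed under multi-hole contexts.

Strong confluence (SCR) and local confluence (WCR) cannot be reduced to root steps with Lemma 28, because they involve single steps in their definition, which are not closed under multi-hole contexts. However, by investigating the positions involved in $s \to t$ and $s \to u$ we easily deduce a reduction to root steps for both properties.

Lemma 30 A TRS is locally confluent if and only if

$$s \to^{\varepsilon} t \land s \to u \;\Rightarrow\; t \downarrow u$$

for all terms $s$, $t$ and $u$. A TRS is strongly confluent if and only if

$$s \to^{\varepsilon} t \land s \to u \;\lor\; s \to t \land s \to^{\varepsilon} u \;\Rightarrow\; t \to^= \cdot \;{}^*{\leftarrow}\; u$$

for all terms $s$, $t$ and $u$.

The next lemma is a key result.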
It allows the removal of introduced fresh constants while preserving the reachability relation. Note that variable-separation is not required.

Lemma 31 Let $\mathcal{R}$ be a linear TRS over a signature $\mathcal{F}$ that contains a constant $c$ which does not appear in $\mathcal{R}$. If $s \to^*_{\mathcal{R}} t$ with $c \in \mathcal{F}\mathsf{un}(s) \setminus \mathcal{F}\mathsf{un}(t)$ then $s[u]_p \to^*_{\mathcal{R}} t$ using the same rewrite rules at the same positions, for all terms $u$ and positions $p \in \mathcal{P}\mathsf{os}(s)$ such that $s|_p = c$.

The restriction to linear TRSs can also be lifted, at the expense of a more complicated replacement function and proof. Since the decision procedure implemented in FORT-h relies on linearity and variable-separation, we present a simple proof for linear TRSs. Due to calculations involving positions, the formalization in Isabelle/HOL was anything but simple.

Proof We use induction on the length of $s \to^*_{\mathcal{R}} t$. If this length is zero then there is nothing to show as $\mathcal{F}\mathsf{un}(s) \setminus \mathcal{F}\mathsf{un}(t) = \varnothing$. Suppose $s \to_{\mathcal{R}} v \to^*_{\mathcal{R}} t$ and write $s = C[\ell\sigma] \to_{\mathcal{R}} C[r\sigma] = v$. Let $p'$ be the position of the hole in $C$ and let $p \in \mathcal{P}\mathsf{os}(s)$ with $s|_p = c$. We distinguish two cases.

If $p' \not\leqslant p$ then $s[u]_p = (C[u]_p)[\ell\sigma]_{p'} \to_{\mathcal{R}} v'$ with $v' = (C[u]_p)[r\sigma]_{p'}$. Since $v|_p = C|_p = c$ we can apply the induction hypothesis to $v \to^*_{\mathcal{R}} t$. This yields $v' \to^*_{\mathcal{R}} t$ and hence $s[u]_p \to^*_{\mathcal{R}} t$ as desired.

In the remaining case, $p' \leqslant p$. From $s|_p = c$ and the fact that $c$ does not appear in $\mathcal{R}$ we infer that there exists a variable $y \in \mathcal{V}\mathsf{ar}(\ell)$ such that $c \in \mathcal{F}\mathsf{un}(\sigma(y))$. Let $q$ be the (unique) position of $y$ in $\ell$ and consider the substitution

$$\tau(x) = \begin{cases} \sigma(y)[u]_{q'} & \text{if } x = y \\ \sigma(x) & \text{otherwise} \end{cases}$$

Here $q' = p \backslash (p'q)$ is the position of $c$ in $\sigma(y)$. If $y \notin \mathcal{V}\mathsf{ar}(r)$ then $v = C[r\sigma] = C[r\tau]$ and thus $s[u]_p = C[\ell\tau] \to_{\mathcal{R}} C[r\tau] = v \to^*_{\mathcal{R}} t$. If $y \in \mathcal{V}\mathsf{ar}(r)$ then there exists a unique position $q'' \in \mathcal{P}\mathsf{os}(r)$ such that $r|_{q''} = y$. So $v|_{p'q''q'} = c$ and we obtain $s[u]_p = C[\ell\tau] \to_{\mathcal{R}} C[r\tau] = v[u]_{p'q''q'} \to^*_{\mathcal{R}} t$ from the induction hypothesis.

In the proofs below Lemma 31 (also for $\mathcal{R}^-$) is used as follows.
Let $\sigma_c$ denote the substitution mapping all variables to $c$. If $s\sigma_c \to^*_{\mathcal{R}} t$ then $s \to^*_{\mathcal{R}} t$ by repeated applications of Lemma 31 (if the conditions are satisfied).

We now prove that two fresh constants are sufficient to reduce commutation (COM), confluence (CR), local confluence (WCR), unique normal forms (UNC and UNR), and the normal form property (NFP) to the corresponding ground properties.

Lemma 32 Linear variable-separated TRSs $\mathcal{R}$ and $\mathcal{S}$ over a common signature $\mathcal{F}$ commute if and only if $\mathcal{R}$ and $\mathcal{S}$ ground-commute over $\mathcal{F} \uplus \{c, d\}$.

Proof The only-if direction follows from Lemma 27. For the if direction suppose $\mathcal{R}$ and $\mathcal{S}$ ground-commute on terms in $\mathcal{T}(\mathcal{F} \uplus \{c, d\})$. In order to conclude that $\mathcal{R}$ and $\mathcal{S}$ commute on terms in $\mathcal{T}(\mathcal{F}, \mathcal{V})$, according to Lemma 28, it suffices to show the inclusions

$$\to^{*\varepsilon*}_{\mathcal{R}^-} \cdot \to^*_{\mathcal{S}} \;\subseteq\; \to^*_{\mathcal{S}} \cdot \to^*_{\mathcal{R}^-} \qquad \to^*_{\mathcal{R}^-} \cdot \to^{*\varepsilon*}_{\mathcal{S}} \;\subseteq\; \to^*_{\mathcal{S}} \cdot \to^*_{\mathcal{R}^-}$$

on terms in $\mathcal{T}(\mathcal{F}, \mathcal{V})$. Suppose $s \to^{*\varepsilon*}_{\mathcal{R}^-} \cdot \to^*_{\mathcal{S}} t$. Let the substitution $\sigma_c$ map all variables to $c$ and let $\sigma_d$ map all variables to $d$. Since rewriting is closed under substitutions and the variable-separated rule used in the root step $\to^{\varepsilon}_{\mathcal{R}^-}$ allows changing the substitution, we obtain $s\sigma_c \to^{*\varepsilon*}_{\mathcal{R}^-} \cdot \to^*_{\mathcal{S}} t\sigma_d$. From ground commutation we obtain $s\sigma_c \to^*_{\mathcal{S}} \cdot \to^*_{\mathcal{R}^-} t\sigma_d$. Note that $s$ and $t$ are terms in $\mathcal{T}(\mathcal{F}, \mathcal{V})$ and hence do not contain the constants $c$ and $d$. Therefore, $d \notin \mathcal{F}\mathsf{un}(s\sigma_c)$ and $c \notin \mathcal{F}\mathsf{un}(t\sigma_d)$. As a consequence, repeated applications of Lemma 31 transform $s\sigma_c \to^*_{\mathcal{S}} \cdot \to^*_{\mathcal{R}^-} t\sigma_d$ into a sequence $s \to^*_{\mathcal{S}} \cdot \to^*_{\mathcal{R}^-} t$ in which $c$ and $d$ do not appear, proving the first inclusion. Note that in our setting TRSs are closed under rule reversal. Hence we can apply Lemma 31 in both directions, which allows us to remove the constant $d$ from the term $t$. The second inclusion $\to^*_{\mathcal{R}^-} \cdot \to^{*\varepsilon*}_{\mathcal{S}} \subseteq \to^*_{\mathcal{S}} \cdot \to^*_{\mathcal{R}^-}$ is obtained in the same way.

If the TRSs $\mathcal{R}$ and $\mathcal{S}$ are left-linear right-ground (as opposed to linear variable-separated) then the term $t$ in the above proof is ground due to the root step involved.
Hence $t\sigma_d = t$, which allows us to simplify the proof and strengthen the statement to use only one additional constant.

Lemma 33 Left-linear right-ground TRSs $\mathcal{R}$ and $\mathcal{S}$ over a common signature $\mathcal{F}$ commute if and only if $\mathcal{R}$ and $\mathcal{S}$ ground-commute over $\mathcal{F} \uplus \{c\}$.

The proof for confluence follows directly from commutation. The proofs for the other properties in $\mathcal{P}_1$ are obtained in a similar manner. We present the proof details for strong confluence since it requires a bit more effort.

Lemma 34 Let $\mathcal{R}$ be a linear variable-separated TRS over a signature $\mathcal{F}$. If $P \in \mathcal{P}_1$ then

$$(\mathcal{F}, \mathcal{R}) \vDash P \iff (\mathcal{F} \uplus \{c, d\}, \mathcal{R}) \vDash \mathsf{G}P$$

Proof We present the if direction for $P = \mathsf{SCR}$. First we use Lemma 30 to reduce the problem to local peaks involving a root step. Following the reasoning in the proof of Lemma 32, we obtain a witness $v$ such that $t\sigma_d \to^=_{\mathcal{R}} v \;{}^*_{\mathcal{R}}{\leftarrow}\; u\sigma_c$. If $t\sigma_d = v$ then $u\sigma_c \to^*_{\mathcal{R}} t\sigma_d$ and we obtain $u \to^*_{\mathcal{R}} t$ with the help of Lemma 31. So assume $u\sigma_c \to^*_{\mathcal{R}} \cdot \to_{\mathcal{R}^-} t\sigma_d$. Using Lemma 31 and induction on the number of variables in $u$ we deduce $u \to^*_{\mathcal{R}} \cdot \to_{\mathcal{R}^-} t\sigma_d$. The same argument applied to $t$ produces $u \to^*_{\mathcal{R}} w \to_{\mathcal{R}^-} t$. Note that $w$ may contain occurrences of the constants $c$ and $d$ since $\mathcal{R}$ is a variable-separated TRS. We use the map defined in the proof of Lemma 27 to eliminate these: $u = \phi_c^x(\phi_d^x(u)) \to^*_{\mathcal{R}} \phi_c^x(\phi_d^x(w)) \to_{\mathcal{R}^-} \phi_c^x(\phi_d^x(t)) = t$.

Lemma 35 Let $\mathcal{R}$ be a left-linear right-ground TRS over a signature $\mathcal{F}$. If $P \in \mathcal{P}_1 \setminus \{\mathsf{UNC}\}$ then

$$(\mathcal{F}, \mathcal{R}) \vDash P \iff (\mathcal{F} \uplus \{c\}, \mathcal{R}) \vDash \mathsf{G}P$$

Moreover,

$$(\mathcal{F}, \mathcal{R}) \vDash \mathsf{UNC} \iff (\mathcal{F} \uplus \{c, d\}, \mathcal{R}) \vDash \mathsf{GUNC}$$

The simplification in the proof of Lemma 32 for left-linear right-ground systems is not applicable for UNC as conversion can introduce variables. The following example shows that adding a single fresh constant is indeed insufficient for UNC.
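Example 23 The left-linear right-ground TRS $\mathcal{R}$ consisting of the rules

$$\mathsf{a} \to \mathsf{b} \qquad \mathsf{f}(x, \mathsf{a}) \to \mathsf{f}(\mathsf{b}, \mathsf{b}) \qquad \mathsf{f}(\mathsf{b}, x) \to \mathsf{f}(\mathsf{b}, \mathsf{b}) \qquad \mathsf{f}(\mathsf{f}(x, y), z) \to \mathsf{f}(\mathsf{b}, \mathsf{b})$$

does not satisfy UNC since

$$\mathsf{f}(x, \mathsf{b}) \leftarrow \mathsf{f}(x, \mathsf{a}) \to \mathsf{f}(\mathsf{b}, \mathsf{b}) \leftarrow \mathsf{f}(y, \mathsf{a}) \to \mathsf{f}(y, \mathsf{b})$$

is a conversion between distinct normal forms. Adding a single fresh constant $\mathsf{c}$ is not enough to violate GUNC as the last two rewrite rules ensure that $\mathsf{f}(\mathsf{c}, \mathsf{b})$ is the only ground instance of $\mathsf{f}(x, \mathsf{b})$ that is a normal form. Adding another fresh constant $\mathsf{d}$, GUNC is lost:

$$\mathsf{f}(\mathsf{c}, \mathsf{b}) \leftarrow \mathsf{f}(\mathsf{c}, \mathsf{a}) \to \mathsf{f}(\mathsf{b}, \mathsf{b}) \leftarrow \mathsf{f}(\mathsf{d}, \mathsf{a}) \to \mathsf{f}(\mathsf{d}, \mathsf{b})$$

The following example shows that at least two fresh constants are required to reduce confluence to ground-confluence for linear variable-separated TRSs.

Example 24 Consider the linear variable-separated TRS $\mathcal{R}$ consisting of the single rule $\mathsf{a} \to x$ over the signature $\mathcal{F} = \{\mathsf{a}\}$. Since $x \;{}_{\mathcal{R}}{\leftarrow}\; \mathsf{a} \to_{\mathcal{R}} y$ with distinct variables $x$ and $y$, $\mathcal{R}$ is not confluent. Ground-confluence holds trivially as $\mathsf{a} \to \mathsf{a}$ is the only rewrite step between ground terms. Adding a single fresh constant $\mathsf{b}$ does not destroy ground-confluence ($\mathsf{a} \to \mathsf{a}$ and $\mathsf{a} \to \mathsf{b}$ are the only steps). By adding a second fresh constant $\mathsf{c}$, ground-confluence is lost: $\mathsf{b} \;{}_{\mathcal{R}}{\leftarrow}\; \mathsf{a} \to_{\mathcal{R}} \mathsf{c}$.

We now turn our attention to the equivalence properties (CE and NE) in $\mathcal{P}_2$. For conversion equivalence a single fresh constant suffices to reduce it to ground conversion equivalence.

Lemma 36 Linear variable-separated TRSs $\mathcal{R}$ and $\mathcal{S}$ over a common signature $\mathcal{F}$ such that $\mathcal{T}(\mathcal{F}) \neq \varnothing$ are conversion equivalent if and only if $\mathcal{R}$ and $\mathcal{S}$ are ground conversion equivalent over $\mathcal{F} \uplus \{c\}$.

Proof For the if direction we assume that $\mathcal{R}$ and $\mathcal{S}$ are ground conversion equivalent over $\mathcal{F} \uplus \{c\}$. Due to Lemma 28 and symmetry, it suffices to show the inclusion $\leftrightarrow^{*\varepsilon*}_{\mathcal{R}} \subseteq \leftrightarrow^*_{\mathcal{S}}$ on terms in $\mathcal{T}(\mathcal{F}, \mathcal{V})$. Suppose $s \leftrightarrow^{*\varepsilon*}_{\mathcal{R}} t$. Let $d \in \mathcal{F}$ be a constant, whose existence is guaranteed by the assumption $\mathcal{T}(\mathcal{F}) \neq \varnothing$, and consider the substitutions $\sigma_c$ and $\sigma_d$ in the proof of Lemma 32. Closure under substitutions and variable separation yields $s\sigma_c \leftrightarrow^{*\varepsilon*}_{\mathcal{R}} t\sigma_c$ and $s\sigma_c \leftrightarrow^{*\varepsilon*}_{\mathcal{R}} t\sigma_d$.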
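Ground conversion equivalence gives $s\sigma_c \leftrightarrow^*_{\mathcal{S}} t\sigma_c$ and $s\sigma_c \leftrightarrow^*_{\mathcal{S}} t\sigma_d$, and thus also $t\sigma_c \leftrightarrow^*_{\mathcal{S}} t\sigma_d$. Using Lemma 31 yields $s \leftrightarrow^*_{\mathcal{S}} t\sigma_d$ and $t \leftrightarrow^*_{\mathcal{S}} t\sigma_d$. Hence $s \leftrightarrow^*_{\mathcal{S}} t$ as desired. The only-if direction easily follows from Lemma 27.

Two fresh constants are required to reduce normalization equivalence to its ground version.

Lemma 37 Linear variable-separated TRSs $\mathcal{R}$ and $\mathcal{S}$ over a common signature $\mathcal{F}$ are normalization equivalent if and only if $\mathcal{R}$ and $\mathcal{S}$ are ground normalization equivalent over $\mathcal{F} \uplus \{c, d\}$.

Proof For the if direction we assume that $\mathcal{R}$ and $\mathcal{S}$ are ground normalization equivalent over $\mathcal{F} \uplus \{c, d\}$. Note that this implies that $\mathsf{NF}_{\mathcal{R}}(t) \iff \mathsf{NF}_{\mathcal{S}}(t)$ for all terms $t$. We apply Lemma 28 and symmetry, reducing the problem to $s \to^{*\varepsilon*}_{\mathcal{R}} t \Rightarrow \mathsf{NF}_{\mathcal{R}}(t) \Rightarrow s \to^*_{\mathcal{S}} t$. Let $\sigma_c$ and $\sigma_d$ be substitutions replacing all variables by $c$ and $d$ respectively. Closure under substitution and variable separation yields $s\sigma_c \to^{*\varepsilon*}_{\mathcal{R}} t\sigma_d$, and $\mathsf{NF}_{\mathcal{R}}(t\sigma_d)$ since $d$ does not appear in $\mathcal{R}$. Ground normalization equivalence gives $s\sigma_c \to^*_{\mathcal{S}} t\sigma_d$. Applying Lemma 31 we obtain the desired $s \to^*_{\mathcal{S}} t$. The only-if direction follows from Lemma 27.

Contrary to Lemma 36, one fresh constant is not sufficient, as seen by the following counterexample.

Example 25 Consider the two linear variable-separated TRSs

$\mathcal{R}$: $\mathsf{a} \to \mathsf{b}$, $\mathsf{f}(\mathsf{f}(x, y), z) \to \mathsf{f}(\mathsf{b}, \mathsf{b})$, $\mathsf{f}(\mathsf{b}, x) \to \mathsf{f}(\mathsf{b}, \mathsf{b})$, $\mathsf{f}(x, \mathsf{a}) \to \mathsf{f}(z, \mathsf{b})$

$\mathcal{S}$: $\mathsf{a} \to \mathsf{b}$, $\mathsf{f}(\mathsf{f}(x, y), z) \to \mathsf{f}(\mathsf{b}, \mathsf{b})$, $\mathsf{f}(\mathsf{b}, x) \to \mathsf{f}(\mathsf{b}, \mathsf{b})$, $\mathsf{f}(\mathsf{b}, \mathsf{a}) \to \mathsf{f}(z, \mathsf{b})$, $\mathsf{f}(\mathsf{f}(x, y), \mathsf{a}) \to \mathsf{f}(z, \mathsf{b})$

They are not normalization equivalent since $\mathsf{f}(x, \mathsf{a}) \to_{\mathcal{R}} \mathsf{f}(z, \mathsf{b})$ and $\mathsf{f}(x, \mathsf{a}) \not\to^*_{\mathcal{S}} \mathsf{f}(z, \mathsf{b})$. The TRSs are however ground normalization equivalent over the signature $\mathcal{F} \uplus \{\mathsf{c}\}$. First observe that the only ground normal forms reachable via a rewrite sequence involving a root step are $\mathsf{b}$ and $\mathsf{f}(\mathsf{c}, \mathsf{b})$. The normal form $\mathsf{b}$ is reached (using a root step) only from $\mathsf{a}$, in both $\mathcal{R}$ and $\mathcal{S}$.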
The normal form f(c, b) can be reached from all ground terms of the shape f(t, a). For R this is obvious and for S this can be seen by a case analysis on the root symbol of t. Adding a second constant d allows one to mimic the original counterexample since $f(c, a) \to_R f(d, b)$ and $f(c, a) \not\to^{*}_S f(d, b)$.

For left-linear right-ground TRSs, a single fresh constant is enough to reduce normalization equivalence to ground normalization equivalence.

Lemma 38 Left-linear right-ground TRSs R and S over a common signature F are normalization equivalent if and only if R and S are ground normalization equivalent over $F \uplus \{c\}$.

Proof We mention the differences with the proof of Lemma 37. For the equivalence of $NF_R(t)$ and $NF_S(t)$ for arbitrary terms $t \in T(F, V)$, a single constant suffices. If $s \to^{*\varepsilon*}_R t$ then t is ground. Hence $s\sigma_c \to^{*}_R t$ and thus $s\sigma_c \to^{*}_S t$ by ground normalization equivalence. Lemma 31 gives $s \to^{*}_S t$.

Each additional constant can increase the execution time of FORT-h significantly, as seen later in Example 36. Hence results that reduce the required number are of obvious interest. In the remainder of this section we present results for ground TRSs and for TRSs over monadic signatures, which are signatures that consist of constants and unary function symbols.

Lemma 39 Let R and S be right-ground TRSs over a signature F. If R and S are ground or F is monadic then

    $(F, R) \vDash P \iff (F, R) \vDash GP$ for all $P \in \mathcal{P}$
    $(F, R, S) \vDash COM \iff (F, R, S) \vDash GCOM$

Proof First assume that R is ground. In this case only ground subterms can be rewritten. Given a term $t \in T(F, V)$, we write $t = C[[t_1, \ldots, t_n]]$ if $t = C[t_1, \ldots, t_n]$ and $t_1, \ldots, t_n$ are the maximal ground subterms of t. So all variables appearing in t occur in C. The following property is obvious:

(a) if $t = C[[t_1, \ldots, t_n]] \to^{*}_R u$ then $u = C[[u_1, \ldots, u_n]]$ and $t_i \to^{*}_R u_i$ for all $1 \le i \le n$.

Suppose $(F, R) \vDash GCR$ and consider $s \to^{*}_R t$ and $s \to^{*}_R u$ with $s \in T(F, V)$. Writing $s = C[[s_1, \ldots, s_n]]$, we obtain $t = C[[t_1, \ldots, t_n]]$ and $u = C[[u_1, \ldots, u_n]]$ with $s_i \to^{*}_R t_i$ and $s_i \to^{*}_R u_i$ for all $1 \le i \le n$. GCR yields $t_i \downarrow_R u_i$ for all $1 \le i \le n$. Hence $t \downarrow_R u$ as desired. The proofs for the other properties in $\mathcal{P}$ are equally easy. For UNC we note that $\leftrightarrow^{*}_R$ coincides with $\to^{*}_{R \cup R^-}$ for the ground TRS $R \cup R^-$.

Next suppose that F is monadic. Let $(F, R) \vDash GP$ and let σ be the substitution that maps all variables to some arbitrary but fixed ground term. In this case the following property holds:

(b) if $t \in T(F, V)$ and $t \to_R u$ then $u \in T(F)$ and $t\sigma \to_R u$.

We consider P = NFP and P = UNC and leave the proof for the other properties to the reader. Let $s \to_R t$ and $s \to^{!}_R u$. We obtain $s\sigma \to_R t$ and $s\sigma \to^{!}_R u$ from property (b). (Note that $s \neq u$.) Hence $t \to^{*}_R u$ follows from GNFP. Let $t \leftrightarrow^{*}_R u$ with normal forms t and u. If t and u are ground terms then we obtain t = u from GUNC (after applying the substitution σ to all intermediate terms in the conversion between t and u). Otherwise, the conversion between t and u must be empty due to property (b) and the fact that t and u are normal forms. Hence also in this case t = u.

In contrast to COM, the properties NE and CE require additional constants for TRSs over monadic signatures.

Example 26 The linear variable-separated TRSs

    R:  f(x) → a
    S:  f(a) → a    f(f(a)) → a

are neither normalization equivalent nor conversion equivalent as can be seen from $f(x) \to_R a$ and $f(x) \leftrightarrow_R a$. Since every ground term rewrites in R and in S to the unique ground normal form a, the TRSs are ground normalization equivalent as well as ground conversion equivalent.

Nevertheless, we can reduce the number of constants to one if the signature is monadic. A key observation is that in non-empty rewrite sequences in a linear variable-separated TRS over a monadic signature fresh constants can be replaced by arbitrary terms.
Lemma 40 Let R be a linear variable-separated TRS over a monadic signature F that contains a constant c which does not appear in R. If $s \to^{+}_R t$ and $p \in Pos(s)$ such that $s|_p = c$ then $s[u]_p \to^{+}_R t$ using the same rewrite rules at the same positions, for all terms u.

The proof follows the same structure as Lemma 31 and the details are left for the reader. As linear variable-separated TRSs are closed under inverse we can immediately deduce that rewrite sequences of the shape $s\sigma_c \to^{+}_R t\sigma_c$ imply $s \to^{+}_R t$ for monadic systems. With this we are ready to prove our claim.

Lemma 41 Linear variable-separated TRSs R and S over a common monadic signature F are normalization equivalent if and only if R and S are ground normalization equivalent over $F \uplus \{c\}$.

Proof We again mention the differences with the proof of Lemma 37. For the equivalence of $NF_R(t)$ and $NF_S(t)$ for arbitrary terms $t \in T(F, V)$, a single constant suffices. Consider a rewrite sequence $s \to^{*\varepsilon*}_R t$ with $NF_R(t)$. Ground normalization equivalence and substitution closure yield $s\sigma_c \to^{*}_S t\sigma_c$. Furthermore, since the sequence $s \to^{*\varepsilon*}_R t$ is non-empty by definition we know that $\neg NF_R(s\sigma_c)$, which in turn yields $\neg NF_S(s\sigma_c)$. Together with $NF_S(t\sigma_c)$ this means $s\sigma_c \neq t\sigma_c$, and we obtain $s\sigma_c \to^{+}_S t\sigma_c$. Applying Lemma 40 twice allows us to replace c in $s\sigma_c$ and $t\sigma_c$ by the corresponding variables, leading to $s \to^{+}_S t$.

Table 3 Additional constants required to reduce a property to the corresponding ground property

    Property   Ground TRSs   Left-linear right-ground TRSs   Linear variable-separated TRSs
    CR         0             1 (0)                           2 (2)
    SCR        0             1 (0)                           2 (2)
    WCR        0             1 (0)                           2 (2)
    COM        0             1 (0)                           2 (2)
    UNR        0             1 (0)                           2 (2)
    UNC        0             2 (0)                           2 (2)
    NFP        0             1 (0)                           2 (2)
    CE         0             1 (1)                           1 (1)
    NE         0             1 (1)                           2 (1)

The following example shows that we cannot reduce the number of constants (in Lemmata 32 and 34) for linear variable-separated TRSs over a monadic signature and properties $P \in \mathcal{P} \cup \{COM\}$.
Example 27 The monadic linear variable-separated TRS R consisting of the rules

    g(a) → g(x)    g(g(x)) → g(y)

does not satisfy WCR and UNR, and hence also not CR, SCR, NFP and UNC, because g(x) ← g(a) → g(y) with different normal forms g(x) and g(y). Adding a single fresh constant c is insufficient to violate GSCR and thus also GCR, GWCR, GNFP, GUNC and GUNR, because every term in T({g, a, c}) can reach precisely one of the three ground normal forms a, c or g(c) and they can all do so in at most one step. Adding an additional constant d does suffice: g(c) ← g(a) → g(d) with different ground normal forms g(c) and g(d). The same behaviour is observed for COM by noting that a TRS is (ground) confluent if and only if it (ground) commutes with itself.

The results in this section are summarized in Table 3, which shows the number of additional constants needed to reduce a property to the corresponding property on ground terms. In parentheses are the numbers for monadic TRSs. For termination (SN) and normalization (WN) there is no need to add fresh constants, since these properties hold if and only if they hold for all ground terms.

For other properties that can be expressed in the first-order theory of rewriting, one or two fresh constants may be insufficient. Consider for instance the formula ϕ:

    $\neg \exists s\, \exists t\, \exists u\, \forall v\, (v \leftrightarrow^{*} s \lor v \leftrightarrow^{*} t \lor v \leftrightarrow^{*} u)$

which is satisfied on arbitrary terms (with respect to any left-linear right-ground TRS (F, R)). For the TRS consisting of the rule f(x) → a and two additional constants c and d, ϕ does not hold for ground terms because every ground term is convertible to a, c or d. It is tempting to believe that adding a fresh unary symbol g in addition to a fresh constant c, in order to create infinitely many ground normal forms which can replace variables that appear in open terms, is sufficient for any property P.
The formula $\forall s\, \forall t\, (s \to t \implies s \to^{\varepsilon} t)$, in which the second arrow denotes a root step, and the TRS consisting of the rule a → b show that this is incorrect.

7 Automation and Certification

7.1 Decision Mode

FORT-h is a new decision tool for the first-order theory of rewriting. It is a reimplementation of the decision mode of the previous FORT tool [48], referred to as FORT-j in the remainder of the paper. The decision procedure implemented in FORT-j is based on the original procedure described in [10, 11], in which the basic relations are one-step and parallel rewriting. Anchored GTTs, which form the backbone of the formalized decision procedure described in this paper and implemented in FORT-h, were developed later. The new tool is implemented in Haskell whereas FORT-j is written in Java. FORT-h supports all features of FORT-j while extending the domain of supported TRSs from left-linear right-ground TRSs to linear variable-separated ones. While FORT-j could technically take such TRSs as input, it is unsound when checking non-ground properties on them.

Example 28 To check confluence of the linear variable-separated TRS

    g(g(x)) → g(y)    a → g(a)

FORT-h can be called with the formula CR. It correctly answers NO: the system is not confluent. However, FORT-j incorrectly identifies this system as confluent due to the lack of support for variables appearing in right-hand sides of rules.

FORT-h took part in the 2020, 2021 and 2022 editions of the Confluence Competition (CoCo), competing in five categories: COM, GCR, NFP, UNC and UNR. In 2021 and 2022 it also competed together with FORTify in the categories COM, TRS, GCR, UNC, UNR and NFP (only in 2022) producing certified answers. Even though it does not support many problems tested in the competition, due to the restriction to linear variable-separated TRSs, it was able to win the category for most YES results in UNR in all three years.

The tool expects as input a formula and one or more TRSs, as seen in Fig. 7.
It then outputs the answer YES or NO depending on whether the formula is satisfied or not by the given TRSs. The command-line interface of FORT-h is described in Appendix B.

The implemented procedure closely follows the procedure described in Sect. 5.5. When called it first parses the formula (format described below) and converts it into an internal representation using de Bruijn indices as described in Sect. 7.2. Additionally, universal quantifiers and implications are eliminated, and negations are pushed as far as possible to the atomic subformulas. The tool then traverses the formula in a bottom-up fashion, constructing the corresponding anchored GTTs and RR automata. During this traversal we also keep track of the steps taken, to construct the certificate if necessary. To improve performance the automata are cached and reused for equal subformulas. The tree automaton representing the whole formula is then checked for emptiness. If the accepted language is empty, FORT-h reports NO, otherwise it outputs YES.

To avoid having to write formulas using de Bruijn indices when using FORT-h, we use a more convenient syntax for interacting with the tool. The input format (later called FORT syntax) is described in Appendix A.

Footnote: http://project-coco.uibk.ac.at/ (website of the Confluence Competition)

Fig. 7 FORT-h and FORTify

7.1.1 Witness Generation

The usual output of FORT-h consists of a YES or NO answer, and possibly a certificate containing size information of the automata. To help the user in understanding why a property holds or does not hold we support witness generation. This is possible in two cases. Firstly for satisfiable existentially quantified formulas, where FORT-h can produce an n-tuple of ground terms as evidence of existence. Secondly for unsatisfiable universally quantified formulas, where the tuple presents a counterexample.
For instance, if a given or synthesized TRS is not ground-confluent,

    $\neg \forall s\, \forall t\, \forall u\, (s \to^{*} t \land s \to^{*} u \implies \exists v\, (t \to^{*} v \land u \to^{*} v))$,

it is interesting to provide witnessing terms for the variables s, t, and u. Given the TRS consisting of the rules

    a → f(a, b)    f(a, b) → f(b, a)

FORT-h produces the following terms as witnesses: s = f(a, b), t = f(b, a), and u = f(f(a, b), b). To find these ground terms FORT-h first eliminates universal quantifiers using ∀ = ¬∃¬, pushes negations inwards and removes double negations in the formula, resulting in

    $\exists s\, \exists t\, \exists u\, (s \to^{*} t \land s \to^{*} u \land \neg \exists v\, (t \to^{*} v \land u \to^{*} v))$.

In the next step FORT-h strips outermost negations, none in this case, followed by outermost existential quantifiers, resulting in the so-called formula body:

    $(s \to^{*} t \land s \to^{*} u \land \neg \exists v\, (t \to^{*} v \land u \to^{*} v))$.

Since the original formula is satisfiable, the RR automaton corresponding to the formula body must accept at least one n-tuple of ground terms. The algorithm depicted in Fig. 8 generates (encoded) witnesses that are accepted by the given RR automaton.

Fig. 8 Witness generation

To find minimal witnesses we use a version of Dijkstra's shortest path algorithm. We keep track of visited states in Q, a mapping W from states to terms where W(q) is a minimal witness which reaches the state q, and a priority queue P. The search is started at the states reachable in a single step from some constant. We also map from these states to the respective constants as witnesses in W. In each iteration we select the state q with the smallest witness w from P. The size of a witness is determined by the function $size(\langle w_1, \ldots, w_n \rangle) = size(w_1) + \cdots + size(w_n)$, where $size(w_i)$ is the total number of function symbols in F occurring in $w_i$, so ⊥ is not counted. If q is a final state we have found an accepted term and return the witness w.
Otherwise we check that we have not visited q previously, set W(q) = w, and enumerate all transition rules containing q on the left-hand side where all states on the left-hand side have been visited, and hence have a witness. If the transition rule is an epsilon transition q → p, then the state p has the same witness as q so we add (p, w, size(w)) to P. For a transition rule $f_1 \cdots f_k(q_1, \ldots, q_m) \to p$ we construct a witness $w' = f_1 \cdots f_k(W(q_1), \ldots, W(q_m))$ and add $(p, w', size(w'))$ to the queue. The search continues until a final state is reached or all reachable states have been visited. In the latter case the algorithm fails, since the automaton does not accept any terms.

7.1.2 Collapsing ε-transitions

Keeping the size of automata small is crucial for the performance of FORT-h. One way to reduce the number of states and transitions is based on the observation that when two states q and p are strongly connected by ε-transitions, which means $q \to^{*}_{\varepsilon} p$ and $p \to^{*}_{\varepsilon} q$, then they are equivalent. In other words, for all ground terms t we have $t \to^{*} q$ if and only if $t \to^{*} p$, and for all ground contexts C and states r we have $C[q] \to^{*} r$ if and only if $C[p] \to^{*} r$. We can therefore replace all occurrences of a state in the transition rules by an equivalent one without changing the accepted language. This reduces the number of states, and may remove duplicate transition rules.

In FORT-h we can further take advantage of the fact that some of the most common constructions already produce sets of ε-transitions which are transitively closed. Instead of constructing the strongly connected components, checking if two states q and p are strongly connected then boils down to checking if $q \to_{\varepsilon} p$ and $p \to_{\varepsilon} q$. For example, this is the case after computing the transitive closure of anchored GTT relations as in Theorems 6 and 8. We therefore immediately collapse and eliminate the ε-transitions in the underlying tree automata after these constructions.
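The collapsing step can be illustrated with a minimal Python sketch. It assumes, as in the situation described above, that the given set of ε-transitions is already transitively closed, so that two states are merged exactly when ε-transitions exist in both directions. The data representation (triples for ordinary rules, pairs for ε-transitions), the function name collapse_epsilon and the small example automaton are illustrative assumptions and are not taken from FORT-h.

```python
def collapse_epsilon(rules, eps):
    """Merge states that are mutually connected by epsilon-transitions.

    rules: set of (symbol, tuple_of_argument_states, target_state)
    eps:   set of (source_state, target_state); ASSUMED transitively closed
    Returns (new_rules, new_eps, representative_map).
    """
    states = {q for _, args, tgt in rules for q in (*args, tgt)}
    states |= {q for pair in eps for q in pair}

    rep = {}
    for q in states:
        # All states mutually connected with q (q itself included).
        # Transitive closure of eps makes this an equivalence class,
        # so every member of the class picks the same minimum.
        cls = [p for p in states
               if p == q or ((q, p) in eps and (p, q) in eps)]
        rep[q] = min(cls)

    # Rename states in ordinary rules; duplicates vanish because we use sets.
    new_rules = {(f, tuple(rep[a] for a in args), rep[t])
                 for f, args, t in rules}
    # Rename states in epsilon-transitions and drop redundant loops q -> q.
    new_eps = {(rep[p], rep[q]) for p, q in eps if rep[p] != rep[q]}
    return new_rules, new_eps, rep


# Hypothetical automaton: states 2 and 3 are strongly connected.
rules = {("a", (), 2), ("b", (), 3), ("f", (3,), 0)}
eps = {(2, 3), (3, 2), (2, 2), (3, 3), (0, 2), (0, 3)}
new_rules, new_eps, rep = collapse_epsilon(rules, eps)
```

In this made-up example state 3 is replaced by state 2 everywhere, and the six ε-transitions shrink to the single transition from 0 to 2.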
Fig. 9 Collapsing ε-transitions in $\Delta_+(A, B)$

Example 29 The anchored GTT G = (A, B) with

    A:  a → 0    b → 1    0 → 3    1 → 2    1 → 4
    B:  a → 2    b → 3    c → 4

accepts the rewrite relation of the ARS {a → b, b → a, b → c}. When constructing $G_+ = (A \cup \Delta_+(B, A),\ B \cup \Delta_+(A, B))$, we need to compute the ε-transitions in $\Delta_+(A, B)$. The result is shown in Fig. 9(a). We can see that the graph contains one non-trivial strongly connected component, consisting of the states {2, 3}. Instead of adding all 10 ε-transitions we can therefore simplify G and $\Delta_+(A, B)$ beforehand by replacing all occurrences of state 3 by state 2. This reduces the number of transitions in $\Delta_+(A, B)$ to 4, as shown in Fig. 9(b), which, when added to G, results in the GTT whose first and second components are

    a → 0    b → 1    0 → 2    1 → 2    1 → 4    2 → 0    2 → 4    2 → 1
    a → 2    b → 2    c → 4    0 → 2    4 → 2    1 → 2

Note that we also dropped the redundant transition 2 → 2 from $\Delta_+(A, B)$.

7.2 Certification

Whereas witness generation can only provide some evidence to assist the user in understanding why certain formulas hold or not, in certification we are interested in machine-readable proofs that are verified by an independent and trustworthy certifier. The first step in the certification process is to translate formulas in the first-order theory of rewriting into a format suitable for further processing. We adopt de Bruijn indices [13] to avoid alpha renaming.

Example 30 Consider the formula

    $\forall s\, \forall t\, \forall u\, (s \to^{*}_0 t \land s \to^{*}_1 u \implies \exists v\, (t \to^{*}_1 v \land u \to^{*}_0 v))$

It expresses the commutation of two TRSs, indicated by the indices 0 and 1. Using de Bruijn indices for the term variables s, t, u, v produces

    $\forall \forall \forall\, (2 \to^{*}_0 1 \land 2 \to^{*}_1 0) \implies \exists\, (2 \to^{*}_1 0 \land 1 \to^{*}_0 0)$

We refer to Example 32 for further explanation.

The formal syntax of formulas in certificates is given below. Here $rr_2$ denotes the supported binary regular relations, which are formally defined after Example 31.
Likewise, $rr_1$ stands for regular sets (which are identified with unary regular relations).

    formula ::= (rr1 rr_1 term) | (rr2 rr_2 term term)
              | (and formula*) | (or formula*) | (not formula)
              | (forall formula) | (exists formula) | (true) | (false)
              | (restrict formula (trs+))
    term    ::= nat
    trs     ::= nat | nat -
    nat     ::= 0 | 1 | 2 | ···

De Bruijn indices are used for term variables and nat - denotes a TRS with index nat in which the left- and right-hand sides of the rules have been swapped. The class of linear variable-separated TRSs is closed under this operation. We use it to represent the conversion relation $\leftrightarrow^{*}_R$ of a TRS R as the reachability relation $\to^{*}_{R \cup R^-}$ induced by the TRS $R \cup R^-$.

Example 31 The commutation property in Example 30 is rendered as follows:

    (forall (forall (forall (or (not (and (rr2 (step* (0)) 2 1) (rr2 (step* (1)) 2 0))) (exists (and (rr2 (step* (1)) 2 0) (rr2 (step* (0)) 1 0)))))))

Here (step* (0)) denotes the $RR_2$ relation $\to^{*}_0$ induced by the first TRS (which is indexed by 0) and (rr2 (step* (1)) 2 0) represents the subformula [1] t ->* v of the FORT formula in Example 30.

We continue with the certificate syntax of $RR_1$ and $RR_2$ relations:

    rr_1 ::= (terms) | (nf (trs+)) | (inf rr_2) | (proj (1 | 2) rr_2)
           | (union rr_1 rr_1) | (inter rr_1 rr_1) | (diff rr_1 rr_1)
    rr_2 ::= (gtt gtt pos num) | (product rr_1 rr_1) | (id rr_1)
           | (union rr_2 rr_2) | (inter rr_2 rr_2) | (diff rr_2 rr_2)
           | (comp rr_2 rr_2) | (inverse rr_2)
    pos  ::= >= | e | >
    num  ::= >= | 1 | >
    gtt  ::= (root-step (trs+)) | (gsteps (trs+)) | (inverse gtt)
           | (union gtt gtt) | (acomp gtt gtt) | (gcomp gtt gtt)
           | (inter gtt gtt) | (acomplement gtt) | (atc gtt) | (gtc gtt)

Here (terms) refers to T(F), (nf (trs+)) to the normal forms (NF) induced by the union of the underlying TRSs, and (inf rr_2) to the infinity predicate ($INF_R$) which is satisfied by all terms having infinitely many successors with respect to the relation R.
Furthermore, (proj (1 | 2) rr_2) denotes projection (π) to the first (second) argument, (gtt gtt pos num) the transformation of a GTT relation into an RR relation with corresponding context closure (Theorems 10 and 11), (id rr_1) the identity relation on the underlying set, and (gtc gtt) ((atc gtt)) the (anchored) transitive closure of the underlying (anchored) GTT relation. The (gsteps (trs+)) construct serves as an abbreviation for (gtc (root-step (trs+))). The constructs defined above closely correspond to the formalized closure operations for the predicates in the first-order theory of rewriting, summarized in the grammar in Fig. 1.

For convenience of tool authors, we add a few other constructs to $rr_2$. The certifier expands these to a sequence of basic constructs given above.

    rr_2 ::= ··· | (step (trs+)) | (step= (trs+)) | (step+ (trs+))
           | (step* (trs+)) | (step! (trs+)) | (equality)
           | (parallel-step (trs+)) | (root-step (trs+))
           | (root-step= (trs+)) | (root-step+ (trs+))
           | (root-step* (trs+)) | (non-root-step (trs+))
           | (non-root-step= (trs+)) | (non-root-step+ (trs+))
           | (non-root-step* (trs+)) | (meet (trs+))
           | (join (trs+)) | (reflcl (rr_2))

A certificate for a first-order formula ϕ explains how the corresponding RR automaton is constructed. We adopt a line-oriented natural deduction style. The automata are implicit. This is a deliberate design decision to keep certificates small. More importantly, it avoids having to check equivalence of finite tree automata, which is EXPTIME-complete [8, Sect. 1.7].
    certificate ::= (item inference formula info*) certificate
                  | (empty item) | (nonempty item)
    item        ::= nat
    info        ::= (size nat nat nat)
    inference   ::= (rr1 rr_1 term) | (rr2 rr_2 term term) | (and item*)
                  | (or item*) | (not item) | (exists item) | (nnf item)

Currently the info field only serves as an interface between the tool (which provides the certificate) and the certifier to compare the sizes of the constructed automata. In the future we plan to extend this field with concrete automata. This would make it possible to test language equivalence of a tree automaton computed by a tool that supports our certificate language and the one reconstructed by FORTify, thereby providing tool authors with a mechanism to trace buggy constructions in case a certificate is rejected.

We revisit Example 3 to illustrate the construction of certificates.

Example 32 The formula $\phi = \forall s\, \exists t\, (s \to^{*} t \land NF(t))$ expressing normalization is rendered as $\forall \exists\, (1 \to^{*}_0 0 \land 0 \in NF[0])$ in de Bruijn notation. Here 1 refers to the variable s, the second and third occurrences of 0 refer to t, and the last occurrence of 0 refers to the first (and only) TRS, which has index 0. We construct the certificate bottom-up, to mimic the decision procedure. The first line is for NF[0]:

    (0 (rr1 (nf (0)) 0) (rr1 (nf (0)) 0))

The components can be read as follows:

• item = 0 denotes the first step in our proof,
• inference = rr1 (nf (0)) 0 constructs the automaton that accepts the normal forms and keeps track of the variable 0,
• formula = rr1 (nf (0)) 0 denotes the subformula 0 ∈ NF[0]; it is satisfiable if and only if the automaton constructed using the description in inference is not empty.

The apparent redundancy will disappear when we continue.
We proceed by expressing the relation $\to^{*}_0$ and subsequently make sure that the second component of $\to^{*}_0$ is in normal form:

    (1 (rr2 (step* (0)) 1 0) (rr2 (step* (0)) 1 0))
    (2 (and (1 0)) (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0))))

Line 1 is similar to line 0. The inference step (and 1 0) in line 2 constructs an RR automaton that accepts the intersection of the relations modeled in lines 1 and 0. This automaton corresponds to $A_5$ in Example 3. The cylindrification step from $A_1$ to $A_4$ in Example 3 is left implicit. We continue with the projection of variable 0 and afterwards complement the resulting automaton. This is done by an exists followed by a not inference step:

    (3 (exists 2) (exists (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0)))))
    (4 (not 3) (not (exists (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0))))))

The inference steps until this point describe the construction of the corresponding automaton in Example 3. We complete the certificate by introducing the remaining operators:

    (5 (exists 4) (exists (not (exists (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0)))))))
    (6 (not 5) (not (exists (not (exists (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0))))))))
    (7 (nnf 6) (forall (exists (and ((rr2 (step* (0)) 1 0) (rr1 (nf (0)) 0))))))
    (nonempty 7)

The nnf inference step does not modify the tree automaton computed in step 6 (which corresponds to the final automaton in Example 3) but checks the equivalence of the formula in line 6 with the one of line 7, which corresponds to the input formula ϕ. The equivalence check incorporates ∀ elimination, negation normal form, and associativity, commutativity and idempotency of ∧ and ∨. In the future we might add support for additional equivalences in first-order logic. The final step (nonempty 7) checks that the language accepted by this automaton is non-empty. So this certificate claims that the input TRS is normalizing. For TRSs that do not satisfy ϕ, the final line in the certificate would be (empty 7).
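The translation from named variables to de Bruijn indices used in Examples 30 to 32 can be sketched in a few lines of Python. This is only an illustration: the nested-tuple input representation and the function name to_de_bruijn are assumptions made here, the output follows the formula grammar and the style of Example 31, and implications have to be eliminated by hand because the certificate grammar has no implication.

```python
def to_de_bruijn(formula, env=()):
    """Render a named-variable formula as an S-expression with de Bruijn indices.

    formula: nested tuples (hypothetical representation), e.g.
             ("forall", "s", body), ("exists", "t", body),
             ("and", f1, f2, ...), ("or", f1, f2, ...), ("not", f),
             ("rr2", relation_text, var1, var2), ("rr1", set_text, var)
    env:     tuple of bound variable names, innermost binder first
    """
    tag = formula[0]
    if tag in ("forall", "exists"):
        _, var, body = formula
        # The new binder becomes index 0; all outer indices shift by one.
        return f"({tag} {to_de_bruijn(body, (var,) + env)})"
    if tag in ("and", "or"):
        parts = [to_de_bruijn(f, env) for f in formula[1:]]
        return "(" + " ".join([tag] + parts) + ")"
    if tag == "not":
        return f"(not {to_de_bruijn(formula[1], env)})"
    if tag in ("rr1", "rr2"):
        rel, *variables = formula[1:]
        # tuple.index returns the innermost binding of the variable.
        indices = " ".join(str(env.index(v)) for v in variables)
        return f"({tag} {rel} {indices})"
    raise ValueError(f"unknown construct: {tag}")


# Commutation (Example 30), with the implication written as (or (not ...) ...).
commutation = (
    "forall", "s", ("forall", "t", ("forall", "u", (
        "or",
        ("not", ("and", ("rr2", "(step* (0))", "s", "t"),
                        ("rr2", "(step* (1))", "s", "u"))),
        ("exists", "v", ("and", ("rr2", "(step* (1))", "t", "v"),
                                ("rr2", "(step* (0))", "u", "v"))),
    ))))

# Normalization (Example 32).
normalization = (
    "forall", "s", ("exists", "t", (
        "and", ("rr2", "(step* (0))", "s", "t"), ("rr1", "(nf (0))", "t"))))

commutation_sexp = to_de_bruijn(commutation)
normalization_sexp = to_de_bruijn(normalization)
```

For the commutation formula the sketch reproduces the rendering shown in Example 31.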
In the previous example we intentionally skipped over some details to convey the underlying intuition. First of all, the $rr_2$ construct (step* (0)) is derived and internally unfolded via (anchored) GTTs into

    (gtt (gtc (root-step 0)) >= >)

Starting from an anchored GTT that accepts the root step relation induced by the first (and only) TRS in the list, the GTT transitive closure operation is applied, followed by a multi-hole context closure operation with at least one hole that may appear in any position. This yields an $RR_2$ automaton that accepts the relation $\to^{*}_0$. We also mentioned that cylindrification is implicit. The same holds for the projection operation that is used in the exists inference steps. A projection takes place in the first component if the variable 0 is present in the list of variables, otherwise the inference step preserves the automaton. This approach is sound as variables indicate the relevant components of the RR automaton. Thanks to the de Bruijn representation, the innermost quantifier refers to variable 0, the first component in the given RR automaton. However we must keep track of all variables occurring in the surrounding formula and update that list accordingly.

7.3 FORTify

The example in the preceding subsection makes clear that a certificate can be viewed as a recipe for the certifier to perform certain operations on automata and formulas to confirm the final (non-)emptiness claim. In particular, checking a certificate is expensive because the decision procedure for the first-order theory is replayed using code-generated operations from a verified version of the decision procedure. In this subsection we describe the steps we performed to turn the Isabelle formalization of the decision procedure into our certifier FORTify.

The formalization is split into two parts.
The second part is about the certification process, but we start our description with the first part [35] which serves as a general tree automata library. This part includes bottom-up tree automata with ε-transitions, (anchored) ground tree transducers, encoding of regular relations, and their respective closure properties. Additionally it contains a framework to simplify code generation of inductively defined sets as in Fig. 3. Such inductive sets, if they are finite, can be computed by a saturation procedure. We provide an abstraction for that, which essentially does Horn inference without negative atoms. The point of the abstraction is that it separates a common iterative or recursive part of saturation procedures (which gives rise to non-trivial correctness proofs) from the enumeration of inferences without premises ($H_0$, see below), and inferences induced by a single new conclusion ($H_1$, also below), which usually are set comprehensions that can be computed in a very straightforward way.

Definition 19 A positive Horn inference system is given by a set of atoms A (with elements α, β, …) and a set H of inference rules of the shape $\alpha_1 \land \cdots \land \alpha_n \to \beta$. We write $\top \to \beta$ if the list of premises is empty. Each positive Horn inference system defines a predicate H on atoms inductively by the rule

    $\dfrac{\alpha_1 \land \cdots \land \alpha_n \to \beta \in H \qquad H(\alpha_i) \text{ for } 1 \le i \le n}{H(\beta)}$

Example 33 Consider the inference rules from Fig. 3. To obtain a positive Horn inference system for given automata A and B, let the set of atoms be Q × Q where Q is the set of states occurring in A or B. The set H consists of the following inference rules:

• $(p, r) \to (q, r)$ if $p \to_A q$ and $r \in Q$,
• $(p, q) \to (p, r)$ if $q \to_B r$ and $p \in Q$, and
• $(p_1, q_1) \land \ldots \land (p_n, q_n) \to (p, q)$ if $f(p_1, \ldots, p_n) \to_A p$ and $f(q_1, \ldots, q_n) \to_B q$.

These Horn clauses correspond directly to Fig. 3, with each judgment relating two states p and q replaced by the pair (p, q). It is easy to see that the resulting H contains (p, q) if and only if p and q are related by the rules of Fig. 3.
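To make Definition 19 and Example 33 concrete, the following Python sketch computes the set of derivable atoms by naive forward chaining and instantiates it with the three rule shapes of Example 33. It is a plain fixpoint iteration under assumptions made here (rules as pairs of a premise tuple and a conclusion, automata as lists of transitions); the function names and the two tiny automata are hypothetical, and the sketch makes no claim to mirror the formalized algorithm.

```python
def horn_closure(rules):
    """Least set of atoms closed under positive Horn rules.

    rules: iterable of (premises, conclusion); premises is a tuple of atoms,
           the empty tuple encodes a rule without premises.
    """
    rules = list(rules)
    derived = set()
    changed = True
    while changed:                      # iterate until nothing new is derived
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and all(a in derived for a in premises):
                derived.add(conclusion)
                changed = True
    return derived


def pair_rules(trans_a, eps_a, trans_b, eps_b):
    """Horn rules of the three shapes listed in Example 33.

    trans_x: list of (symbol, tuple_of_argument_states, target_state)
    eps_x:   list of (source_state, target_state) epsilon-transitions
    """
    states = {q for _, args, t in trans_a + trans_b for q in (*args, t)}
    states |= {q for pair in eps_a + eps_b for q in pair}
    rules = []
    for p, q in eps_a:                  # (p, r) -> (q, r) for every state r
        rules += [(((p, r),), (q, r)) for r in states]
    for q, r in eps_b:                  # (p, q) -> (p, r) for every state p
        rules += [(((p, q),), (p, r)) for p in states]
    for f, ps, p in trans_a:            # same symbol and arity in A and B
        for g, qs, q in trans_b:
            if f == g and len(ps) == len(qs):
                rules.append((tuple(zip(ps, qs)), (p, q)))
    return rules


# Hypothetical automata: A has an epsilon-transition 1 -> 0, B has none.
trans_a = [("a", (), 0), ("f", (0,), 1)]
eps_a = [(1, 0)]
trans_b = [("a", (), 2), ("f", (2,), 3)]
eps_b = []
pairs = horn_closure(pair_rules(trans_a, eps_a, trans_b, eps_b))
```

In this example the derived pairs are (0, 2), (1, 3) and (0, 3): the ground term a reaches state 0 in A and state 2 in B, while f(a) reaches states 1 and 0 in A and state 3 in B.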
We have formalized an abstract marking algorithm for positive Horn inference systems. In order to use this algorithm, the user has to provide implementations for two building blocks, $H_0$ and $H_1$, which are given by

    $H_0 = \{ \beta \mid \top \to \beta \in H \}$
    $H_1(\alpha, B) = \{ \beta \mid \alpha_1 \land \cdots \land \alpha_n \to \beta \in H \text{ and } \alpha \in \{\alpha_1, \ldots, \alpha_n\} \subseteq B \cup \{\alpha\} \}$

In essence, $H_0$ computes inferences without premises, whereas $H_1(\alpha, B)$ provides all possible conclusions involving a particular premise α together with other premises fulfilled by B. These two ingredients are sufficient to implement a simple marking algorithm:

    saturate_rec(α, I):
      if α ∈ I then return I
      else
        J := {α} ∪ I;
        for all β ∈ H_1(α, I) do
          J := saturate_rec(β, J);
        return J

    saturate:
      I := ∅;
      for all α ∈ H_0 do
        I := saturate_rec(α, I)
      return I

Most of the work is performed by saturate_rec, whose purpose is to add a newly inferred atom α to an accumulator I of previously inferred atoms, taking into account all further inferences that can be made using α and elements of I. It relies on $H_1$ for computing the set of atoms that can be inferred using α at least once and elements of I for other premises. The main method saturate iterates over the elements of $H_0$ and adds them to the accumulator I using the saturate_rec helper, starting with I = ∅. We formalized soundness of saturate, and of refinements to lists and finite sets.

Example 34 Continuing from Example 33, we note that the computation of $H_0$ and $H_1$ can often be done efficiently without ever computing the full set H. For the inference rules from Fig. 3, we obtain the following descriptions:

    $H_0 = \{ (p, q) \mid f \to_A p \text{ and } f \to_B q \}$
    $H_1((p, q), B) = \{ (r, q) \mid p \to_A r \} \cup \{ (p, r) \mid q \to_B r \} \cup H'$

where $H'$ consists of all pairs $(p', q')$ such that

    $f(p_1, \ldots, p_n) \to_A p' \qquad f(q_1, \ldots, q_n) \to_B q'$

with $(p_i, q_i) \in B \cup \{(p, q)\}$ for all $1 \le i \le n$, and $(p, q) = (p_i, q_i)$ for some $1 \le i \le n$. This last component is slightly complicated (but not much more complicated than the definition of H itself). On the other hand, the first two components of $H_1$ make no reference to Q, which is a welcome simplification.

Isabelle/HOL has a predicate compiler [5] that produces executable code for certain inductive sets, but it is quite restricted; basically, it works by searching all possible derivation trees to arrive at a conclusion. This easily leads to non-termination when there are infinitely many such trees, which often happens. For example, using the rules in Fig. 3, if we want to check whether the states 1 and 2 are related and there is an ε-transition 1 → 1, then the first inference rule is a possible candidate for the last inference step, leading us to check the same pair of states recursively, ad infinitum.

In our formalization, GTT compositions and GTT transitive closure are implemented on top of positive Horn inference. The other building blocks are derived directly from the definitions, using automatic and some manual refinement to obtain concrete implementations.

This concludes the first part. In the remainder of this section details of the second part are discussed [33]. We use the FOL-Fitting library [4], which is part of the Archive of Formal Proofs, to connect the first-order theory of rewriting and first-order logic. The translation is more or less straightforward. We interpret $RR_1$ constructions as predicates and $RR_2$ constructions as relations in first-order logic and prove both interpretations to be semantically equivalent:

    lemma eval_formula F Rs α f = eval α undefined (for_eval_rel F Rs) (form_of_formula f)
With this equivalence we are able to define the semantics of formulas:

    definition formula_satisfiable where
      formula_satisfiable F Rs f ⟷ (∃ α. range α ⊆ T F ∧ eval_formula F Rs α f)

    definition formula_unsatisfiable where
      formula_unsatisfiable F Rs fm ⟷ (formula_satisfiable F Rs fm = False)

    definition correct_certificate where
      correct_certificate F Rs claim infs n ≡
        (claim = Empty ⟷ (formula_unsatisfiable (fset F) (map fset Rs) (fst (snd (snd (infs ! n))))) ∧
         claim = Nonempty ⟷ formula_satisfiable (fset F) (map fset Rs) (fst (snd (snd (infs ! n)))))

Last but not least we define the important function check_certificate which takes as input a signature, a list of TRSs, a Boolean, a formula, and a certificate. This function first verifies that the given formula and the claim correspond to the ones referenced in the certificate and afterwards checks the integrity of the certificate. The following lemmata, which are formally proved in Isabelle, state the correctness of the check_certificate function:

    lemma check_certificate F Rs A fm (Certificate infs claim n) = Some B ⇒
      fm = fst (snd (snd (infs ! n))) ∧ A = (claim = Nonempty)

    lemma check_certificate F Rs A fm (Certificate infs claim n) = Some B ⇒
      (B = True −→ correct_certificate F Rs claim infs n) ∧
      (B = False −→ correct_certificate F Rs (case claim of Empty ⇒ Nonempty | Nonempty ⇒ Empty) infs n)

The first lemma ensures that our check function verifies that the provided parameters fm (formula) and A (answer satisfiable/unsatisfiable) match the formula and the claim stated in the certificate. The second lemma is the key result. It states that the check function returns Some True if and only if the certificate is correct. The only-if case is hidden in the last two lines. More precisely, if the claim of the certificate is wrong then negating the claim (the first-order theory of rewriting is complete) leads to a correct certificate.
Therefore, if our check function returns Some False then the certificate is correct after negating the claim. Our check function returns None if the global assumptions (the input TRS is linear variable-separated, the signature is not empty, etc.) are not fulfilled. We plan to extend the check_certificate function in the near future such that it reports these kinds of errors.

A central part of the formalization is to obtain a trustworthy decision procedure to verify certificates. Hence we use the code generation facility of Isabelle/HOL to produce an executable version of our check_certificate function. Isabelle’s code generation facility is able to derive executable code for our constructions with the exception of inductively defined sets. We use the abstract Horn inference system framework of Definition 19 to obtain executable code for the following constructions defined as inductive sets:

• reachable and productive states of a tree automaton,
• states of tree automata obtained by the subset construction,
• ε-transitions for the composition and transitive closure constructions of (anchored) GTTs,
• an inductive set needed for the tree automaton for the infinity predicate.

Table 4 Formalization statistics

Topics                           Lines    Facts   Defs
Utility files                     1892      187     19
Terms, context, and rewriting     3969      454     97
Horn inference system              462       39     17
Tree automata                     2891      319     66
Regular relations                 4016      285     65
Primitives and context closure    4043      318     43
FORT decision procedure           2023      107     60
Signature extension               2874      182     15
Implementation files              3058      190     81
Total                           25,228     2081    463

At this point we can use Isabelle’s code generation to obtain an executable check function. The resulting code-generated certifier is called FORTify. The overall design of FORTify is shown in the bottom half of Fig. 7. It can be viewed as two separate modules A and B.
Module B is the verified Haskell code base that is generated by Isabelle’s code generation facility, containing the check_certificate function and the data type declarations for formulas and certificates. To use this functionality, we wrote a parser which translates strings representing formulas (signatures, TRSs, certificates) to semantically equivalent formulas (signatures, TRSs, certificates) represented in the data types obtained from the generated code. This was done in Haskell and corresponds to module A in Fig. 7. Module A accepts formulas in FORT syntax. Hence it also applies the conversion to the de Bruijn representation. After the translation in module A, the check_certificate function in module B is executed and its output is reported. Importantly, the code in module A is not verified in Isabelle. Correctness of FORTify must therefore assume correctness of module A as well as the correctness of the Glasgow Haskell Compiler, which we use to generate a standalone executable from the generated code. Table 4 lists some statistics of the underlying formalization.

7.4 Synthesis Mode

FORT can be used to synthesize TRSs that satisfy properties given by the user (which is different from finding witnessing terms in formulas as described in Sect. 7.1). This is useful for finding counterexamples and non-trivial TRSs for exam exercises as well as competitions. The synthesis procedure for a given signature F boils down to generating candidate TRSs and then checking the given property as shown in Fig. 10. The latter is done using a call to the decision procedure decide(F, ϕ, C), which checks if the formula ϕ holds for the system C over the domain T(F).
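The generate-and-check loop described above can be illustrated with a minimal Python sketch. It is not the FORT-s implementation: it assumes candidates are ARSs (rules between constants only), and the function `wcr_not_cr` is a hypothetical stand-in for the call decide(F, ϕ, C) with the fixed property "locally confluent but not confluent". All names (`star`, `candidates`, `synthesize`, `min_rules`, `max_rules`) are chosen for illustration.

```python
from itertools import combinations, product

def star(elems, rules):
    """Map each element to the set of elements reachable in zero or more steps."""
    reach = {e: {e} for e in elems}
    changed = True
    while changed:
        changed = False
        for l, r in rules:
            for e in elems:
                if l in reach[e] and not reach[r] <= reach[e]:
                    reach[e] |= reach[r]
                    changed = True
    return reach

def wcr_not_cr(elems, rules):
    """Hypothetical stand-in for decide(F, phi, C): WCR and not CR on a finite ARS."""
    reach = star(elems, rules)
    joinable = lambda t, u: bool(reach[t] & reach[u])
    wcr = all(joinable(t, u) for s, t in rules for s2, u in rules if s == s2)
    cr = all(joinable(t, u) for s in elems for t in reach[s] for u in reach[s])
    return wcr and not cr

def candidates(elems, min_rules, max_rules):
    """Enumerate sets of rules between constants, smallest systems first."""
    all_rules = sorted(product(elems, repeat=2))
    for k in range(min_rules, max_rules + 1):
        for rules in combinations(all_rules, k):
            yield frozenset(rules)

def synthesize(elems, decide, min_rules, max_rules):
    """Return the first candidate accepted by decide, or None if none is found."""
    for rules in candidates(elems, min_rules, max_rules):
        if decide(elems, rules):
            return rules
    return None

# Kleene's classical counterexample: a <- b <-> c -> d
kleene = frozenset({("b", "a"), ("b", "c"), ("c", "b"), ("c", "d")})
found = synthesize("abcd", wcr_not_cr, 0, 4)
```

The bounds `min_rules` and `max_rules` keep the enumeration finite; the dominant cost in the sketch, as in the real procedure, is the repeated call to the decision function.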
To limit and control the search space we introduce the parameters r, R, D and v:

• r and R specify the lower and upper bound on the number of rewrite rules,
• D specifies the upper bound on the height of the left- and right-hand sides of the rules,
• v specifies the number of different variables that may appear in the rewrite rules.

Fig. 10 Simplified synthesis procedure (for a fixed signature)

By default the procedure searches for left-linear right-ground TRSs, but it can also synthesize linear variable-separated systems. This affects the generation of candidate TRSs S in Fig. 10. To extend the functionality and improve performance, the implementation in the synthesis tool (FORT-s) differs from the procedure in Fig. 10. Since the greatest cost when running the procedure comes from executing the decision procedure, care is taken not to generate and check equivalent systems more than once. To this end, we keep track of fresh terms from previous iterations and only generate rules containing at least one new term, and the fresh terms in T must contain at least one new term in an argument position. Similar improvements are used when generating the rewrite systems. The second major performance improvement is the possibility of checking systems in parallel.

It is of interest to synthesize TRSs that depend on one or more other TRSs. This can be done by passing additional TRSs to FORT-s in addition to a formula which references multiple systems. The additional systems are then also passed to the decision procedure. For example, if we want to transform our leading TRS R (see Example 1) into an equivalent complete TRS (on ground terms), we pass both R and the formula

$$(\mathsf{GWCR}_0 \land \mathsf{SN}_0) \land \forall s\, \forall t\, (s \leftrightarrow^*_0 t \iff s \leftrightarrow^*_1 t)$$

to FORT-s. Here the index 1 refers to R and the index 0 to the system to be synthesized.
This returns the TRS consisting of the rules

  a → b    f(b) → g(a, a)    g(b, b) → a

Using formulas referencing multiple TRSs, FORT-s can also be used to synthesize multiple systems. For convenience FORT-s supports multiple ways to specify the signature used during synthesis. The full user interface of FORT-s is given in Appendix C.

7.5 Undecidability of Synthesis

Since the first-order theory is decidable for linear variable-separated TRSs a natural question arises. Is synthesis also decidable for these systems? In other words, can we determine if there exists a linear variable-separated TRS satisfying a given property? Unfortunately this is not the case.

Theorem 17 The following problem is undecidable:
  instance: a closed formula ϕ in the first-order theory of rewriting
  question: does some linear variable-separated TRS R satisfy ϕ?

Proof We show the undecidability by a reduction from the Post correspondence problem (PCP). Let P be a finite set of pairs of non-empty strings over the alphabet {0, 1}. We define a formula ϕ in the first-order theory of rewriting that is satisfiable if and only if P has a solution. To this end, consider the following predicates:

$$\begin{aligned}
\mathsf{node}(x) &:= x \to x \\
\mathsf{next}(x, y) &:= \mathsf{node}(x) \land \mathsf{node}(y) \land x \to y \land x \neq y \\
\mathsf{step} &:= \forall x\; \mathsf{node}(x) \land x \neq e \implies \exists y\; \mathsf{next}(x, y) \\
\mathsf{unique} &:= \forall x\, \forall y\, \forall z\; \mathsf{next}(x, y) \land \mathsf{next}(x, z) \implies y = z \\
\mathsf{linear} &:= \mathsf{step} \land \mathsf{unique} \\
\mathsf{value}(x, 0) &:= x \to a \land \neg (x \to b) \\
\mathsf{value}(x, 1) &:= x \to b \land \neg (x \to a) \\
\mathsf{finite} &:= \neg \exists x\; \mathsf{INF}(x)
\end{aligned}$$

Positions in a solution string are represented by nodes, which are linearly ordered. Nodes are characterized by self-loops. The special nodes s and e mark the starting and final positions in a solution of P. The predicate finite ensures that solution strings are finite. We have two additional elements, a and b, that correspond to the symbols 0 and 1.
$$\mathsf{border}(x, y) := \mathsf{node}(x) \land \mathsf{node}(y) \land \exists z\, (\neg \mathsf{node}(z) \land x \to z \land z \to y)$$

The border predicate marks the two positions in a solution string corresponding to the decomposition into first and second components. The latter is checked by the solution predicate:

$$\begin{aligned}
\mathsf{match}(x_0, x_1 \cdots x_k, v_1 \cdots v_k) &:= \bigwedge_{i=1}^{k} \mathsf{next}(x_{i-1}, x_i) \land \mathsf{value}(x_i, v_i) \\
\mathsf{pair}(x, y, v, w) &:= \exists x_1 \ldots \exists x_{|v|}\, \exists y_1 \ldots \exists y_{|w|}\; \mathsf{border}(x_{|v|}, y_{|w|}) \land \mathsf{match}(x, x_1 \cdots x_{|v|}, v) \land \mathsf{match}(y, y_1 \cdots y_{|w|}, w) \\
\mathsf{solution} &:= \forall x\, \forall y\; \mathsf{border}(x, y) \implies (x = y \land x = e) \lor \bigvee_{(v, w) \in P} \mathsf{pair}(x, y, v, w)
\end{aligned}$$

The formula ϕ is now defined as

$$\exists s\, \exists e\, \exists a\, \exists b\;\; s \neq e \land \mathsf{border}(s, s) \land \mathsf{linear} \land \mathsf{finite} \land \mathsf{solution}$$

Note that the witnessing TRSs constructed in the above proof are actually abstract rewrite systems (ARSs) that consist of rewrite rules between constants. The construction is illustrated in Fig. 11, for the PCP instance P = {(1, 011), (10, 11), (001, 00)} with solution 001|10|001|1 = 00|11|00|011. The separation bars correspond to the elements b_1, b_2 and b_3. Node n_9 witnesses e. Elements 0 and 1 witness a and b.

Fig. 11 The construction for PCP instance P

The synthesis problem is obviously decidable for ARSs over a fixed signature, but remains undecidable for TRSs over a fixed signature, since we can still generate an arbitrary number of ground terms using non-constant function symbols. Take for example the signature {E, s, 0}, where E and s are unary function symbols and 0 is a constant. We can then represent an arbitrary number n of objects (nodes, borders and values in the encoding) using ground terms of the shape E(s^n(0)). The rules of the ARS correspond to rules between such ground terms of the generated TRS. (The inclusion of the function symbol E removes any possibility of unwanted overlap between rules of the TRS.)

8 Experiments

In this section we describe the experiments we performed with FORT-h, FORT-s, and FORTify.
We include version 1.0 of FORT-h, which was first published as part of an artifact (https://fortissimo.uibk.ac.at/tacas2021/) in conjunction with [42]. The current version of FORT-h is 2.0. Full details of the experiments are available from the website accompanying this paper (https://fortissimo.uibk.ac.at/jar). Precompiled binaries of FORT-h 2.0, FORT-s, and FORTify are available from the same site. All experiments were run on a computer equipped with an Intel Core i7-5930K processor with 6 cores, and with 32 GB of memory. To remove any ambiguity in the calls made to the tools we use FORT syntax (see Appendix A) to specify formulas in this section. This also aids in replicating the experiments.

8.1 FORT-h and FORTify

For the experiments reported in this section we used a timeout of 60 s for the decision tools and 600 s for FORTify.

8.1.1 Comparing Different Representations of Properties

The problems for these experiments are taken from the Confluence Problems database (COPS, https://cops.uibk.ac.at/), and consist of 122 left-linear right-ground TRSs. The formulas were taken from the experiments reported in [46].

Experiment 1 The first three formulas

  "forall s, t, u (s ->* t & s ->* u => t join u)"   (15)
  "forall s, t, u (s ->* t & s -> u => t join u)"    (16)
  "forall t, u (t <->* u => t join u)"               (17)

denote different but equivalent formulations of ground-confluence (GCR).

Table 5 FORT-h (with FORTify) and FORT-j run on GCR formulas

Formula  Tool         YES  ∅-time   ✔    NO  ∅-time   ✔    ∞   total time   (✔ time)
(15)     FORT-h 2.0    37  0.89 s   37   84  0.69 s   81   1   151.12 s     (0.8 h)
         FORT-h 1.0    36  0.26 s   10   84  0.56 s   16   2   176.23 s     (17.6 h)
         FORT-j        37  0.31 s   –    82  0.52 s   –    3   234.08 s
(16)     FORT-h 2.0    38  1.50 s   37   84  0.06 s   81   0    62.13 s     (0.9 h)
         FORT-h 1.0    37  1.48 s   10   84  0.09 s   16   1   122.55 s     (17.8 h)
         FORT-j        37  0.32 s   –    82  0.50 s   –    3   233.20 s
(17)     FORT-h 2.0    37  0.91 s   37   83  0.04 s   81   2   156.64 s     (1.0 h)
         FORT-h 1.0    36  0.45 s    6   83  0.08 s    9   3   202.64 s     (18.2 h)
         FORT-j        37  0.32 s   –    82  0.55 s   –    3   236.69 s
The results are shown in Table 5, where the YES (NO) column shows the number of systems determined to be (non-)ground-confluent together with the average time (∅-time) the tool took. The ∞ column is the number of timeouts. To compare overall performance the total time column contains the sum of all run times, including timeouts but excluding the time taken by FORTify. The ✔ columns show the numbers of certifiable results as well as the overall time taken by FORTify.

These results show that, even though they have the same meaning, the choice of formula has an impact on performance. Most notably this can be seen when comparing the number of problems solved by FORT-h 2.0. The formula (16) (semi-confluence) was fastest with no timeouts, followed by (15) with one timeout and (17) with two. It is apparent that formulas containing conversion (↔*) are especially slow, which we will also see in later experiments. Further note that FORT-h 2.0 can solve an additional problem compared to the 1.0 version, for each formula.

Interestingly FORT-h (2.0) is generally faster and can solve more problems than FORT-j even though the latter implements parallelism. This performance advantage is more prominent in systems which are non-confluent, where FORT-h can solve more problems, while for problems with the answer YES, FORT-j can solve close to the same number of problems, while taking less time per problem in general. The table also shows that FORTify can certify most of the results, which is a large improvement over the previous version. Here the difference between the three formulas is not as visible, but it is also faster on (16) and (15), and slowest on (17). The times for FORTify must also be seen in the context that it ran on more problems on the first two formulas, since FORT-h could produce more certificates. No wrong results by the decision tools were identified.
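That the formulations (15), (16) and (17) have the same meaning can be checked exhaustively on small abstract rewrite systems. The following Python sketch is only an illustration on finite ARSs and does not use tree automata as FORT-h does; the helper names `star`, `formulations` and `all_systems` are made up for this example.

```python
from itertools import product

def star(elems, rules):
    """Map each element to the set of elements reachable in zero or more steps."""
    reach = {e: {e} for e in elems}
    changed = True
    while changed:
        changed = False
        for l, r in rules:
            for e in elems:
                if l in reach[e] and not reach[r] <= reach[e]:
                    reach[e] |= reach[r]
                    changed = True
    return reach

def formulations(elems, rules):
    """Evaluate the three formulations of confluence on a finite ARS."""
    fwd = star(elems, rules)                                  # ->*
    conv = star(elems, rules | {(r, l) for l, r in rules})    # <->*
    join = lambda t, u: bool(fwd[t] & fwd[u])                 # t join u
    f15 = all(join(t, u) for s in elems for t in fwd[s] for u in fwd[s])
    f16 = all(join(t, u) for s in elems for t in fwd[s]
              for l, u in rules if l == s)
    f17 = all(join(t, u) for t in elems for u in conv[t])
    return f15, f16, f17

def all_systems(elems):
    """Enumerate every ARS over the given elements."""
    pairs = list(product(elems, repeat=2))
    for bits in product([False, True], repeat=len(pairs)):
        yield frozenset(p for p, b in zip(pairs, bits) if b)

# The three formulations agree on all 512 ARSs over three elements.
agree = all(len(set(formulations("abc", rs))) == 1 for rs in all_systems("abc"))
```

Although the three answers always coincide, the work needed to compute them differs: (16) only looks at single steps on one side of the peak, whereas (17) requires the full conversion relation.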
Experiment 2 The second set of formulas represents the normal form property, restricted to ground terms (GNFP):

  "forall t, u (t <->* u & NF(u) => t ->* u)"        (18)
  "forall s, t, u (s -> t & s ->! u => t ->* u)"     (19)
  "forall t (WN(t) => CR(t))"                        (20)

The results for these are shown in Table 6. The same pattern is observed: even though all three tools can (dis)prove satisfaction for the same formulas, FORT-h 2.0 is faster than FORT-j overall, and has improved over FORT-h 1.0.

Table 6 FORT-h (with FORTify) and FORT-j run on GNFP formulas

Formula  Tool         YES  ∅-time   ✔    NO  ∅-time   ✔    ∞   total time   (✔ time)
(18)     FORT-h 2.0    59  0.30 s   57   63  0.04 s   63   0    20.37 s     (0.5 h)
         FORT-h 1.0    59  0.70 s   31   63  0.07 s   20   0    45.62 s     (14.6 h)
         FORT-j        59  0.23 s   –    63  0.39 s   –    0    38.16 s
(19)     FORT-h 2.0    59  0.02 s   59   63  0.01 s   63   0     1.76 s     (0.1 h)
         FORT-h 1.0    59  0.03 s   46   63  0.01 s   50   0     2.55 s     (6.3 h)
         FORT-j        59  0.22 s   –    63  0.30 s   –    0    31.83 s
(20)     FORT-h 2.0    59  0.03 s   56   62  0.11 s   62   1    68.83 s     (0.8 h)
         FORT-h 1.0    59  0.05 s   42   62  0.12 s   45   1    70.51 s     (8.6 h)
         FORT-j        59  0.31 s   –    62  0.64 s   –    1   117.86 s

Since the representations containing conversion (↔*) in the previous experiments are outperformed by the other representations, it is often a good idea to avoid it. Obviously this is not always possible. Take the properties UNC, CE or consistency for example. It is therefore important to choose the correct representation in the primitive automata constructions, to ensure good performance when conversion cannot be avoided.

Experiment 3 We tested the following three representations of conversion for a TRS R:

  ε − ε + ((→ ) ∪→ ) ) (21) R R
  ε − ε) ((→ ) ◦→ ) (22) R R
  ε + ((→ ) ) (23) R∪ R

Table 7 FORT-h 2.0 run on "forall s,t (s <->* t)" with differing encodings of conversion

Encoding   YES  ∅-time   NO  ∅-time   ∞   total time
(21)        91  0.10 s   31  0.42 s   0   22.00 s
(22)        91  0.10 s   31  0.48 s   0   24.22 s
(23)        91  0.07 s   31  0.41 s   0   19.31 s

The representation (21) is the one listed in Table 2.
Using composition (◦) instead of union as in (22) works because

  ε − ε ε − − ε (→ ) ◦→ = ((→ ) ◦−→ ) ∪ ((−→ ) ◦→ ) R R R R R R

The third representation (23) uses the identity $\to^{\epsilon}_{R \cup R^-} = \leftrightarrow^{\epsilon}_{R}$ and is the default used by FORT-h. The results of running FORT-h 2.0 on the COPS dataset, using the formula "forall s, t (s <->* t)" for consistency with the three different representations of conversion, can be seen in Table 7. We can see that (23) is the fastest with an overall runtime of 19.31 s. It is about 12% faster than (21) and about 20% faster than (22). Also important is that (23) produces smaller automata, which leads to better performance when conversion is embedded within larger formulas. Consider for example COPS #741:

  if(true, a, x) → a     if(true, g(a), x) → g(a)     g(a) → g(g(a))
  if(true, b, x) → b     if(true, g(b), x) → g(b)     g(b) → a
  if(false, x, a) → a    if(false, x, g(a)) → g(a)    f(a, b) → b
  if(false, x, b) → b    if(false, x, g(b)) → g(b)    f(g(g(a)), x) → b

The RR automata representing (21) and (22) both contain 233 states, 7927 transitions and 9 ε-transitions before trimming, and 132 states and 4937 transitions after. In comparison the automaton for (23) contains 152 states, 3975 transitions and 9 ε-transitions before, and 75 states with 2313 transitions after trimming. Overall (23) therefore has less than half the number of transitions in this example, which can have a significant effect in any later closure operations.

The final experiment in this subsection involves the normal form predicate NF(t), which is implemented in FORT-h according to the description in Sect. 5.4, instead of using the equivalent formula ¬∃u (t → u).

Table 8 FORT-h 2.0 (with FORTify) run on normalization with different encodings of NF

Encoding               YES  ∅-time   ✔    NO  ∅-time   ✔    ∞   total time   (✔ time)
"NF(t)"                 41  0.02 s   41   81  0.00 s   81   0   0.85 s       (20.50 s)
"∼exists u (t -> u)"    41  0.02 s   41   81  0.00 s   81   0   1.05 s       (23.71 s)
Experiment 4 Consider the formula "forall s (exists t (NF(t) & s ->* t))" for normalization and COPS #503:

  f(a, a, b, b) → f(c, c, c, c)    a → b    a → c    b → a    b → c

When using the formula ¬∃u (t → u) for NF(t), FORT-h first constructs the RR automaton A for t → u, with 4 states and 15 transitions. It then projects to construct the automaton A_2 for ∃u (t → u) with 4 states and 13 transitions, and finally it has to determinize A_2 and construct the complement for the negated formula ¬∃u (t → u), resulting in an automaton with 4 states and 259 transitions before and 1 state with two transitions after trimming. If instead the direct normal form predicate is used, FORT-h immediately produces the latter automaton, without having to construct the intermediate automata or having to trim. The impact on runtime can be seen in Table 8. It is rather small for FORT-h, but for FORTify the direct construction is about 13% faster. When looking at the sizes of the automata, the average untrimmed automaton obtained via the explicit formula, for our dataset of left-linear right-ground COPS problems, contains 75.8 transitions while the average automaton for the normal form predicate contains 13.3 transitions.

8.1.2 Properties Involving Multiple TRSs

We also ran experiments to test performance on properties involving two TRSs. As a dataset we constructed problems of all ordered pairs of COPS problems, resulting in 7503 pairs.

Experiment 5 The first property tested was ground-commutation (GCOM). The results, presented in Table 9, show that FORT-h is ahead of FORT-j here as well. It can (dis)prove more problems, timing out on only two as compared to 49 problems. Additionally it does so in significantly less time. With FORTify we can see a large improvement over the old version. It is able to certify close to 98% of the results found by FORT-h 2.0.
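Commutation of two systems can be phrased as an inclusion between two composed relations: every peak built from a sequence of steps in system 1 and a sequence of steps in system 0 must be closed by a valley. The following Python sketch checks this inclusion on finite ARSs. It is an illustration only, not the automata construction used by FORT-h, and it assumes that composition is read left to right; the function names mirror the operators step*, inverse and comp but are otherwise hypothetical.

```python
def star(elems, rel):
    """Reflexive-transitive closure of rel over elems, as a set of pairs."""
    closure = {(e, e) for e in elems} | set(rel)
    while True:
        extra = {(x, z) for (x, y) in closure
                 for (y2, z) in closure if y == y2} - closure
        if not extra:
            return closure
        closure |= extra

def inverse(rel):
    """Swap the components of every pair."""
    return {(y, x) for (x, y) in rel}

def comp(r, s):
    """Left-to-right composition: x r y and y s z gives (x, z)."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}

def commute(elems, r0, r1):
    """Check that peaks (steps of r1 backwards, then r0) are closed by valleys."""
    s0, s1 = star(elems, r0), star(elems, r1)
    return comp(inverse(s1), s0) <= comp(s0, inverse(s1))

# a -> b in system 0 and b -> c in system 1 commute;
# a -> b in system 0 and a -> c in system 1 do not.
yes = commute("abc", {("a", "b")}, {("b", "c")})
no = commute("abc", {("a", "b")}, {("a", "c")})
```

Taking both systems to be the same relation turns the check into a confluence test, which is why commutation is often viewed as a two-system generalization of confluence.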
Table 9 FORT-h (with FORTify) and FORT-j run on GCOM

Tool         YES   ∅-time   ✔      NO    ∅-time   ✔      ∞    total time    (✔ time)
FORT-h 2.0   1381  0.10 s   1368   6120  0.02 s   5965   2    374.63 s      (51.5 h)
FORT-h 1.0   1381  0.16 s    878   6120  0.03 s   3666   2    517.32 s      (681.5 h)
FORT-j       1354  1.46 s   –      6100  0.94 s   –      49   10670.89 s

In the 2019 edition of the Confluence Competition [41] three tools contested the commutation (COM) category: ACP [2], CoLL [49], and FORT-j. On input problem COPS #1118 the tools gave conflicting answers (https://cops.uibk.ac.at/results/?y=2019&c=COM).

Example 35 COPS #1118 is about the commutation of the TRSs COPS #669

  a → c    f(a) → b    b → b    b → h(b, h(c, a))

and COPS #695

  h(a, a) → c    b → h(b, a)    b → a    f(c) → c    c → a

To determine the correct answer we use FORT-h 2.0 to produce a certificate for ground-commutation by calling

  > fort-h -c cert -i "GCom([0],[1])" 1118.trs
  YES

This produces the following certificate:

  (0 (rr2 (comp (inverse (step* (1))) (step* (0))) 0 1) (rr2 (comp (inverse (step* (1))) (step* (0))) 0 1) (size 13 53 0))
  (1 (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1) (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1) (size 11 47 0))
  (2 (not 1) (not (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1)))
  (3 (and (0 2)) (and ((rr2 (comp (inverse (step* (1))) (step* (0))) 0 1) (not (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1)))))
  (4 (exists 3) (exists (and ((rr2 (comp (inverse (step* (1))) (step* (0))) 0 1) (not (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1))))))
  (5 (exists 4) (exists (exists (and ((rr2 (comp (inverse (step* (1))) (step* (0))) 0 1) (not (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1)))))))
  (6 (not 5) (not (exists (exists (and ((rr2 (comp (inverse (step* (1))) (step* (0))) 0 1) (not (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1))))))))
  (7 (nnf 6) (forall (forall (or ((not (rr2 (comp (inverse (step* (1))) (step* (0))) 0 1)) (rr2 (comp (step* (0)) (inverse (step* (1)))) 0 1))))))
  (nonempty 7)

When passing this certificate to FORTify, after 0.2 s the output Certified is produced, so we can be assured that the TRSs do commute. Note that the inference steps 0 and 1 contain the optional size information. Here (size k m n) means the underlying RR automaton constructed by FORT-h 2.0 contains k states, m transitions, and n ε-transitions.

Experiment 6 For the second experiment using multiple TRSs we tested FORT-h 2.0 and FORTify on conversion equivalence and normalization equivalence, once for all terms and once for only ground terms. FORT-h 1.0 and FORT-j have not implemented the necessary signature extension results to cover these properties, and are therefore not run. The results can be seen in Table 10.

Table 10 FORT-h 2.0 (with FORTify) run on (G)CE and (G)NE

Property   YES  ∅-time   ✔     NO    ∅-time   ✔      ∞     total time   (✔ time)
GCE        157  0.70 s   150   7162  0.94 s   6736   184   5.0 h        (125.6 h)
CE         151  0.74 s   144   7168  0.93 s   6739   184   5.0 h        (127.1 h)
GNE        181  0.02 s   181   7320  0.04 s   7308   2     448.75 s     (5.4 h)
NE         177  0.02 s   177   7324  0.04 s   7312   2     446.54 s     (5.6 h)

Comparing the properties to the corresponding ground properties, we can see that FORT-h 2.0 succeeds in finding results on the same number of problems. However, six results moved from YES to NO in the case of (G)CE and four in the case of (G)NE. These correspond to TRSs where the additional constants are needed to disprove the property. While the run times of FORT-h 2.0 stayed almost the same when comparing the ground and non-ground properties, we can see that FORTify does take longer to certify results on the non-ground properties. This is to be expected, since the additional constants lead to larger automata. Simply by having a larger signature, some of the atomic constructions produce more transition rules. While this is usually only a small difference it can have a significant effect when embedded within a bigger formula.

8.1.3 Optimizations

To show this effect, and the improvement caused by Lemma 39, consider the following example.
Example 36 Consider COPS #214

  a → b    a → f(a)    b → f(f(b))    f^64(b) → b

where f^64 represents 64 nested applications of f. To check UNC, FORT-h 2.0 extends the signature as needed and uses the formula for GUNC internally represented as

$$\neg \exists (\exists ((\mathsf{NF}(0) \times \mathsf{NF}(1)) \land 0 \neq 1 \land 0 \leftrightarrow^* 1))$$

In this case no constants have to be added, since the TRS is ground. The intermediate automaton A_1 for the subformula NF(0) × NF(1) contains no transitions, since the TRS has no normal forms for the given signature. For the automaton A_2 of 0 ≠ 1 we have 13 transitions and A_3 for 0 ↔* 1 has 150,569 transitions. As we have seen in earlier experiments, the automaton for conversion is clearly the largest, and would also take the largest amount of time to construct. However, since A_1 is empty, the intersection with A_2 and then further with A_3 will also be empty. And due to the lazy evaluation strategy of Haskell the third automaton will never be constructed. Therefore FORT-h 2.0 can almost instantly (0.01 s) determine that the automaton for the formula within the negation is empty, and conclude that UNC holds.

However, if we were to ignore the optimization introduced by Lemma 39 and add two constants, the automaton A_1 is no longer empty, since we added two normal forms to the domain. This changes the numbers as follows: the automaton A_1 would contain 15 transitions and 3 states, A_2 has 31 transitions and 3 states, and A_3 has 150,571 transitions and 4356 states. Since none of the automata are empty we must construct the intersection A_1 ∩ A_2 containing 34 transitions and 6 states. After trimming this drops to 20 transitions and 4 states. The intersection (A_1 ∩ A_2) ∩ A_3 then results in an automaton with 132,652 transitions and 8584 states. Only after trimming do we see that this automaton is empty and conclude that UNC holds. Overall this takes FORT-h 2.0 7.15 s, which is orders of magnitude slower than with the optimization. While such large speedups are not the norm, the overall runtime on the COPS dataset for UNC drops by about 16%, as seen in Table 11.

Table 11 FORT-h 2.0 run on UNC with and without Lemma 39

Formula                         YES  ∅-time   NO  ∅-time   ∞   total time
"UNC" (with Lemma 39)            72  0.29 s   49  0.20 s   1    90.92 s
"{+2} GUNC" (two constants)      72  0.54 s   49  0.20 s   1   108.52 s

Example 37 To see that the optimization of collapsing strongly connected states, introduced in Sect. 7.1, can have a significant effect, consider COPS #116. It is an ARS consisting of 26 rules presented as a graph in Fig. 12.

Fig. 12 Graph presentation of COPS #116

To check if it is consistent we can use the formula "∼forall s, t (s <->* t)" which is internally represented as ∃(∃(¬(0 ↔* 1))). For this FORT-h constructs the automaton A for 0 ↔* 1, consisting of 8 states, 418 transitions and 3 ε-transitions. After eliminating the ε-transitions and trimming, we are left with 1 state and 361 transitions. The complement automaton A^c which represents ¬(0 ↔* 1) has the same size, which drops to zero after trimming, showing that the system is not consistent. Overall FORT-h takes 0.34 s.

If we however remove the optimization and do not collapse strongly connected components, we get significantly larger automata.
The automaton A grows to 8427 states, 2827 transitions and 851,916 ε-transitions. At this point the procedure usually eliminates the ε-transitions and trims the automaton, but FORT-h does not manage to do so within the 60 s timeout. The overall improvement on testing consistency can be seen in Table 12.

Table 12 FORT-h 2.0 run on "forall s, t (s <->* t)" with/out collapsing SCCs

                   YES  ∅-time   NO  ∅-time   ∞   total time
Collapsing SCCs     91  0.07 s   31  0.41 s   0    19.31 s
Unoptimized         91  0.14 s   28  1.21 s   3   223.82 s

8.1.4 Comparison with Other Tools

As a last experiment we compare FORT-h to a number of state-of-the-art tools. For the properties GCR, NFP, UNC, UNR and COM we chose the following tools that competed in the corresponding categories in the confluence competition in 2021: ACP [2] in UNC and COM, AGCP [1] in GCR, CSI [44] in NFP, UNC and UNR, and CoLL [49] in COM. All these tools implement various sufficient conditions for the corresponding property and are not limited to linear variable-separated or left-linear right-ground TRSs. For the sake of comparing them to FORT-h we run them only on the left-linear right-ground TRSs in COPS, and on the pairs of these problems for COM. The results can be seen in Table 13.

Table 13 FORT-h 2.0 compared to other tools

Property  Tool         YES   ∅-time   NO    ∅-time   ∞/MAYBE   total time
GCR       FORT-h 2.0     37  0.06 s     84  0.04 s         1    65.82 s
          AGCP           24  0.02 s     79  0.07 s        19   276.42 s
NFP       FORT-h 2.0     55  0.02 s     67  0.01 s         0     1.76 s
          CSI            55  0.79 s     61  1.02 s         6   186.94 s
UNC       FORT-h 2.0     72  0.31 s     49  0.21 s         1    92.75 s
          ACP            70  0.08 s     47  0.86 s         5   345.91 s
          CSI            71  0.83 s     46  1.12 s         5   187.37 s
UNR       FORT-h 2.0     96  0.02 s     26  0.01 s         0     2.21 s
          CSI            86  0.81 s     26  0.76 s        10   209.12 s
COM       FORT-h 2.0   1365  0.10 s   6135  0.04 s         3   578.3 s
          CoLL         1349  0.21 s   4015  0.13 s      2139   19.5 h
          ACP          1238  0.01 s   3519  0.04 s      2746   5.0 h

We can see that FORT-h 2.0 significantly outperforms all the other tools on this class of systems.
For all properties it can find results for more problems and can often do so with less time per problem. This difference is especially pronounced in the COM category, where FORT-h 2.0 can (dis)prove all but three of the 7503 problems, while ACP and CoLL time out or return MAYBE on more than 2000 of these. Given this performance discrepancy it is of interest to other tools to use FORT-h 2.0 on problems of this class. Here it could be used as a black box on problems (or subproblems) as long as they are linear variable-separated TRSs and can be expressed in the first-order theory of rewriting. An example of such a tool is CONFident [27] (http://zenon.dsic.upv.es/confident), which uses FORT, among other tools, as part of its procedure.

Another interesting point can be seen when comparing the first line in Table 13, where 37 YES results are reported, with the fourth line in Table 5, where 38 YES results are reported. Both formulas check ground-confluence, but the built-in GCR property is represented slightly differently. Instead of the joinability predicate (t ↓ u), which is constructed via operations on anchored GTTs, it uses the equivalent formula ∃v (t →* v ∧ u →* v). In this case the explicit formula is slower on COPS #215, leading to the additional timeout, but is faster on other problems, causing the total time to be similar. Like previous experiments this shows that the representation of a property can have a large and non-obvious effect on performance.

8.2 FORT-s

In this subsection we report on the synthesis experiments that we performed. All experiments were executed with the options -j 4 and +RTS -A64M, unless stated otherwise. First we consider Fig. 6.

Experiment 7 The following TRSs were produced by FORT-s on the given formulas when restricting the signature (using the command-line option -S "a 0 b 0 f 2") to a binary function symbol f and two constants a and b:

  "GWCR & ∼WCR & ∼GCR"     a → b    f(a, x) → a          a → f(a, a)          9 s
  "GCR & ∼CR & ∼GSCR"      a → b    f(a, x) → f(a, a)    f(b, b) → a          10 s
  "GNFP & ∼NFP & ∼GCR"     a → b    f(a, x) → f(a, a)    f(b, b) → f(a, a)    4 s
  "GUNC & ∼UNC & ∼GNFP"    a → a    f(a, x) → a          f(x, b) → b          11 s

We do not know whether there exist TRSs over the restricted signature that satisfy "GUNR & ∼UNR & ∼GUNC". Human expertise was used to produce a witness over a larger signature, which was subsequently simplified using the decision mode of FORT:

  b → a    c → c    d → c    f(x, a) → A    f(x, A) → A
  b → c    d → e    f(x, e) → A    f(c, x) → A

FORT-h produces the following terms as witnesses for the fact that UNR is not satisfied: t = A and u = f(e, $). Indeed both A and f(e, $) are normal forms reachable from f(d, $). Moreover, we obtain witnesses t = a and u = e showing that GUNC does not hold. (The rule c → c is needed to satisfy GUNR.)

In the next experiment we use the infinity predicate to distinguish well-known subclasses of linear variable-separated TRSs.

Experiment 8 The formula

  $\exists t\; \mathsf{INF}_{\xleftarrow{\epsilon}}(t)$    "exists t (INF(e<-,t))"

distinguishes ground TRSs from left-linear right-ground (but not ground) ones. Without any options FORT-s produces the TRS {g(x) → g(a)} in a fraction of a second. The formula

  ∃t INF (t)    "exists t (INF( =,t))"

is true for TRSs that are not ARSs. FORT-s produces the empty TRS over the signature consisting of the constant a and an additional constant and unary function symbol. The second constant is not necessary, but is added by the signature step.
Finally, to distinguish linear variable-separated TRSs from left-linear right-ground TRSs, assuming the signature contains at least one non-constant function symbol, the formula

  $\exists t\; \mathsf{INF}_{\xrightarrow{\epsilon}}(t)$    "exists t (INF(->e,t))"

can be used in connection with the -l option. This generates the TRS {a → x} over the signature consisting of the constant a and an additional constant and unary function symbol. Without the latter, the generated linear variable-separated TRS induces only a finite rewrite relation. Adding "& CR & WN" to the last formula produces the TRS {a → b, f(b) → x}.

Experiment 9 Finding a locally confluent but not confluent TRS R is easy. FORT-s produces the ground TRS

  a → b    f(a) → a    a → f(a)

when given the formula "WCR & ∼CR" in less than 1 s. The well-known abstract counterexample by Kleene

  a ← b ⇄ c → d

is found by restricting the search to ARSs. The easiest way to do this is with the option -A 0, which sets the maximal arity of function symbols to 0. Moreover, the maximum number of rewrite rules has to be set to at least four (-R 4). If we impose the additional condition that R is terminating (cf. [56]), the TRS

  a → b    a → g(a)    b → g(g(b))

is generated with

  "WCR & ∼CR & ∼exists t (INF(*<-,t) | t +<- t)"

without any additional command-line options in less than 7 s.

The next experiment shows how FORT-s can be used to complete TRSs into complete (canonical) ones.

Experiment 10 FORT-s produces the TRS {a → c, f(x) → a} when presented the formula

  "[0](WCR & SN) & forall s, t ([0] s <->* t <=> [1] s <->* t)"

with input.trs as additional parameter. Here input.trs consists of the three rules

  c → a    f(b) → c    f(c) → a

The result is complete (as demanded by "[0](WCR & SN)"), but not equivalent! The reason is that "forall s, t ([0] s <->* t <=> [1] s <->* t)" ensures ground conversion equivalence, and we have seen in Sect. 6 that an extra constant is needed to reduce conversion equivalence to ground conversion equivalence.
The same behaviour can also be seen for our leading example, where the same formula is used. When presented the formula

  "[0](WCR & SN) & CE([0],[1])"

the equivalent complete TRS consisting of the rules

  a → c    f(b) → f(a)    f(c) → a

is synthesized. Note that the latter TRS is not canonical since not all right-hand sides are in normal form. It is well-known that every system of ground equations admits a presentation as canonical TRS. Snyder [50] proved that a ground TRS is canonical if and only if it is reduced. The latter property is easily expressible:

  "[0](forall s, t (s ->e t => NF(t) & ~exists u (s ->be u) & forall u (s ->e u => t = u)))"

Together with "CE([0],[1])", any ground TRS is transformed into an equivalent canonical one, without explicitly requiring confluence and termination. For our example TRS, we obtain

  a → c    f(b) → c    f(c) → c

The final experiment is based on [57, Example 5.1] and shows how FORT-s can be used to synthesize multiple TRSs.

Experiment 11 If we want to generate two terminating ARSs such that their union is non-terminating, the formula

  "[0]SN & [1]SN & ~SN"

can be used in connection with the options -A 0 and -n 2. The latter tells FORT-s to synthesize two TRSs. The additional requirement that the composition of both relations is a subset of the transitive closure of one of them is expressed as

  "forall s, t, u ([0] s -> t & [1] t -> u => [0] s ->+ u | [1] s ->+ u)"

In a fraction of a second FORT-s synthesizes the following two ARSs satisfying the conjunction of these requirements:

  A_0: a → b    b → c
  A_1: b → c    c → a

Using completely different techniques, similar ARSs are generated by Carpa, the tool described in Zantema [57].

9 Conclusion

In this paper we presented a formalized decision procedure of the first-order theory of rewriting for the class of linear variable-separated TRSs. The decision procedure ultimately goes back to Dauchet and Tison [10] and is the basis of the tool FORT-h.
Different from [8, 10], we extensively use anchored GTT relations. These have better closure properties than GTT relations and allow us to express numerous binary relations on ground terms efficiently, easing formalization efforts. We presented signature extension results that allow us to reduce certain properties on arbitrary terms to the corresponding properties on ground terms. These allow FORT-h to participate in categories other than GCR in the Confluence Competition. We presented a certificate language in which certificates for the yes/no output of the decision procedure can be expressed. These certificates are validated by FORTify, the verified Haskell program obtained from the executable Isabelle formalization. FORT-h supports properties like commutation that involve multiple TRSs. Witness generation is useful to gain insight into why a particular property holds. The synthesis mode is used to find small TRSs that satisfy a given property. FORT-s supports several options to control the (infinite) search space. We showed that the synthesis problem is undecidable, already for ARSs, by a reduction from PCP. Comprehensive experimental results were presented, including a comparison with the tools ACP [2], AGCP [1], CoLL [49], CSI [44] that compete with FORT-h in CoCo. Full details are available from the web site https://fortissimo.uibk.ac.at/ which additionally provides a convenient interface to FORT-h, FORT-s and FORTify, as well as precompiled binaries for the three tools.

Linear variable-separated TRSs are a proper extension of left-linear right-ground TRSs. Dropping either restriction, one quickly faces an undecidable first-order theory, even when one-step rewriting (→) is the only predicate. This was first shown by Treinen [54]. Related undecidability results are presented in [39, 55].
In particular, Marcinkowski [39] showed that the first-order theory of one-step rewriting is undecidable for right-ground TRSs.

Many concrete properties expressible in the first-order theory of rewriting are known to be decidable for much larger classes of rewrite systems. For instance, termination is known to be decidable for right-linear right-shallow TRSs, a result by Godoy et al. [25], extending the earlier decision result for right-ground systems of Dershowitz [14]. Termination is also decidable for almost-orthogonal growing TRSs [43]. Confluence is decidable for right-linear shallow TRSs [24] and for right-ground TRSs [30]. For ground TRSs, which are in the scope of FORT-h, termination is known to be decidable in polynomial time [45]. The same holds for confluence [7]. Felgenhauer [19] showed that confluence can be decided in cubic time. Similar complexity results for the related properties NFP, UNC and UNR are given in [20]. The worst-case complexity of the formalized decision procedure implemented in FORT-h is at least double exponential (cf. [26]).

Concerning synthesis, we are not aware of any other tree-automata based tool for synthesizing TRSs nor of any tool that allows properties to be specified by an arbitrary first-order formula in the theory of rewriting. Jiresch [29] developed a synthesis tool to attack the well-known open problems [15, 16] concerning the sufficiency of certain restricted joinability conditions on critical pairs of left-linear TRSs. Zantema [56] developed the tool Carpa+ for synthesizing TRSs that satisfy properties which can be encoded as SMT problems. The TRSs that can be synthesized form a small extension of the class of ARSs: a single unary function symbol f is permitted and rules must have the shape a → b, a → f(b), or f(a) → b, where a and b are constants. The properties are restricted to those that can be encoded into the conjunctive fragment of SMT-LRA (linear real arithmetic).
The predecessor tool Carpa [57] synthesized combinations of ARSs with the help of a SAT solver. It was used to show the necessity of certain conditions in abstract confluence results [52, Sect. 5] and inspired us to support multiple TRSs in FORT.

Concerning future work, improving the efficiency of FORT-h by supporting parallelism might result in a speed-up, especially for larger formulas. The minimization of tree automata (also non-deterministic ones) is an obvious target for further investigation. Preprocessing techniques that go beyond the mere transformation to negation normal form will be helpful to obtain equivalent formulas that reduce the size of the ensuing tree automata in the decision procedure. In [28] similar ideas are applied to WSkS, in connection with MONA [31]. An interesting question is whether FORT-h can be extended to deal with properties involving innermost and other restrictions of rewriting. Formalization efforts that aim to transfer code in module A to the verified code in module B in Fig. 7 are also of interest. The conversion of FORT syntax to de Bruijn notation is a natural candidate here.

Acknowledgements This research was supported by FWF (Austrian Science Fund) project P30301. Several persons helped to make this project successful. We are grateful to Bertram Felgenhauer for numerous contributions. Franziska Rapp implemented the first versions of FORT in OCaml and Java. She and T. V. H. Prathamesh contributed to the early stage of the formalization of the decision procedure. Jamie Hochrainer reimplemented the synthesis mode, resulting in FORT-s. Johannes Koch designed the web interface. We thank René Thiemann for advice concerning turning the formalization into executable code. The first author acknowledges the support of the Future Value Creation Research Center of Nagoya University, where part of the research was performed.
The detailed comments of the anonymous reviewers improved the presentation.

Author Contributions All authors contributed to the research reported in the manuscript. Alexander Lochmann performed the formalizations in Isabelle/HOL that led to FORTify. Fabian Mitterwallner was the main developer of the artifacts (FORT-h, FORT-s and FORTify). The first draft of the manuscript was written by Aart Middeldorp and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Funding Open access funding provided by Austrian Science Fund (FWF). This work was supported by FWF (Austrian Science Fund) project P30301. The first author acknowledges the support of the Future Value Creation Research Center of Nagoya University.

Data Availability The experiments summarized in the manuscript are available from https://fortissimo.uibk.ac.at/jar. The same holds for binaries and sources of the artifacts.

Declarations

Conflict of interest The authors declare that they have no conflict of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Appendix A: Input Format

The input format of FORT-h can be roughly split into two parts: the logical structure of the property and the involved atomic predicates and relations. The logical structure is defined by the following grammar, where angle brackets are used for non-terminal symbols:

  ⟨formula⟩    ::= ⟨formula⟩ ⟨operator⟩ ⟨formula⟩ | ~ ⟨formula⟩
                 | ⟨quantifier⟩ ⟨vars⟩ ( ⟨formula⟩ ) | ⟨var⟩ ⟨relation⟩ ⟨var⟩
                 | ⟨property⟩ | {+ ⟨nat⟩ } ⟨formula⟩ | [ ⟨trss⟩ ] ⟨formula⟩
                 | ( ⟨formula⟩ )
  ⟨operator⟩   ::= <=> | => | | | &
  ⟨quantifier⟩ ::= forall | exists
  ⟨trss⟩       ::= ⟨nat⟩ | ⟨nat⟩ , ⟨trss⟩
  ⟨vars⟩       ::= ⟨var⟩ | ⟨var⟩ , ⟨vars⟩

Here ⟨nat⟩ is a natural number, ⟨var⟩ is an alphanumerical string representing a variable name and ⟨trss⟩ is a comma-separated list of indices referencing TRSs. The logical operators are all right-associative. Regarding precedence, the unary operations bind strongest, with the binary operators respecting the order & > | > => > <=>. Most of the operations have the meaning expected from a first-order formula, the exceptions being {+ ⟨nat⟩ } ⟨formula⟩, which allows the user to specify the number of constants to be added to the signature when evaluating the subformula, and [ ⟨trss⟩ ] ⟨formula⟩, which restricts and permutes the indices of TRSs for the underlying subformula. The atomic binary relations supported by FORT-h are defined as:

  ⟨relation⟩ ::= ->e | ->e* | ->e= | ->e+ | e<- | *e<- | =e<- | +e<-
               | ->be | ->be* | ->be= | ->be+ | be<- | *be<- | =be<- | +be<-
               | -> | ->* | ->= | ->+ | <- | *<- | =<- | +<-
               | ->! | -||-> | !<- | <-||- | <-> | <->*
               | = | join | meet

Here ->e stands for a root step, ->be for a step below the root, -> for a normal rewrite step, ->! for a reduction to normal form, and -||-> for a parallel step; join stands for joinability (↓) and meet for meetability (↑). The suffix * stands for the reflexive-transitive closure, + for the transitive closure, and = for the reflexive closure.
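To make the precedence and associativity rules concrete, here is a minimal sketch in Python of a precedence-climbing parser for the propositional fragment of the grammar only. It is not FORT-h's parser: atoms are restricted to plain names such as CR or WCR, quantifiers, relations and the {+n} and [..] modifiers are left out, and all function names are hypothetical. The only facts taken from the text are that unary ~ binds strongest, that the binary operators are right-associative, and that they are ordered & > | > => > <=>.

```python
import re

# Binding strength of the binary operators; a higher number binds tighter.
# The order & > | > => > <=> is taken from the text above.
PRECEDENCE = {"<=>": 1, "=>": 2, "|": 3, "&": 4}

TOKEN = re.compile(r"\s*(<=>|=>|[&|~()]|[A-Za-z][A-Za-z0-9_]*)")


def tokenize(text):
    """Split a formula into operator, parenthesis and name tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_unary(tokens, pos):
    """Parse a negation, a parenthesized formula or an atom."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of formula")
    token = tokens[pos]
    if token == "~":
        operand, pos = parse_unary(tokens, pos + 1)
        return ("~", operand), pos
    if token == "(":
        tree, pos = parse_binary(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("missing closing parenthesis")
        return tree, pos + 1
    if token in PRECEDENCE or token == ")":
        raise ValueError(f"unexpected token {token!r}")
    return token, pos + 1


def parse_binary(tokens, pos, min_prec):
    """Precedence climbing; recursing with the same strength on the
    right operand is what makes every operator right-associative."""
    left, pos = parse_unary(tokens, pos)
    while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
        op = tokens[pos]
        right, pos = parse_binary(tokens, pos + 1, PRECEDENCE[op])
        left = (op, left, right)
    return left, pos


def parse(text):
    """Parse a whole formula into a nested tuple."""
    tokens = tokenize(text)
    tree, pos = parse_binary(tokens, 0, 1)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


def show(tree):
    """Print a parse tree with every binary operator parenthesized."""
    if isinstance(tree, str):
        return tree
    if len(tree) == 2:
        return "~" + show(tree[1])
    op, left, right = tree
    return f"({show(left)} {op} {show(right)})"


print(show(parse("A & B | C => D <=> E")))
print(show(parse("A => B => C")))
```

Under these rules a chain such as A => B => C groups to the right, while a weaker operator following a stronger one, as in A & B | C, takes the stronger subformula as its left operand.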
Example 38 Consider calling FORT-h with three input TRSs on the formula:

  "{+2} forall s, t ([2,0] ([0] s ->! t <=> [1] s ->! t))"

The {+2} instructs FORT-h to add two constants to the signature when constructing the automata. Normally "[0] s ->! t" means that term s normalizes to term t in the first input TRS (the one with index 0). Here, however, the context has changed due to the restrict modifier [2,0], which permutes and restricts the three TRSs in the subformula ([0] s ->! t <=> [1] s ->! t) such that [0] refers to the TRS with index 2 and [1] refers to the TRS with index 0. So FORT-h checks normalization equivalence of the third and first input TRS, while ignoring the second one. The two constants are added according to Table 3, since one of the involved TRSs may be linear variable-separated.

It is also possible to use some predefined properties by name. Here we differentiate between properties of terms and properties of whole TRSs.

  ⟨property⟩ ::= ⟨prop_of_term⟩ | ⟨prop_of_system⟩

The properties on whole TRSs have the same names as defined in Sect. 6.

  ⟨prop_of_system⟩ ::= CR | WCR | SCR | NFP | UNC | UNR | WN | SN
                     | GCR | GWCR | GSCR | GNFP | GUNC | GUNR
                     | ⟨binary_prop⟩ ([ ⟨trss⟩ ],[ ⟨trss⟩ ])
  ⟨binary_prop⟩    ::= COM | GCOM | CE | GCE | NE | GNE

The term properties take a variable as an additional argument.

  ⟨prop_of_term⟩ ::= ⟨prop⟩ ( ⟨var⟩ ) | ⟨finiteness⟩ ( ⟨binrel⟩ , ⟨var⟩ )
  ⟨prop⟩         ::= CR | WCR | WN | NFP | SN | NF | SCR | UNR
  ⟨finiteness⟩   ::= INF | FIN
  ⟨binrel⟩       ::= ⟨binrel⟩ ⟨operator⟩ ⟨binrel⟩ | ~ ⟨binrel⟩ | ⟨relation⟩

Note that the INF and FIN properties also take a binary relation as an argument. This is usually one of the predefined rewrite relations, but may also be a more complex relation constructed by combining the rewrite relations using logical operators. The property names (with the exception of NF and INF) are all just shorthand for larger formulas. In general these correspond to the definitions of the property in Sect. 6. However, there are some exceptions.
Take for example ground-confluence (GCR). This unfolds to the formula

  forall s, t, u (s -> u & s ->* t => exists v (u ->* v & t ->* v))

The s -> u on the left of the implication differs from the original definition of GCR. However, this property (known as semi-confluence [3]) can be shown to be equivalent to GCR by a simple induction proof, and generally leads to smaller automata in the decision procedure. The runtime comparison between different representations of ground-confluence and other properties is shown in Sect. 8.

Appendix B: User Interface of FORT-h

The command-line interface of FORT-h is

  fort-h [OPTIONS] FORMULA TRS.trs ..

where TRS.trs .. is one or more files containing TRSs in the COPS format used in CoCo. It also supports many-sorted TRSs in the MSTRS format in the GCR category. The additional options are

  -c FILE   write certificate to FILE
  -i        enable the additional info in the inference steps of the certificate
  -v        enables verbose output (e.g., the internal representation)
  -w        enables witness generation

Witness generation enables the tool to produce witnesses/counterexamples and will be described in detail later in this section. For now, consider Example 28 and the call

  > fort-h -w "CR" input.trs
  NO
  formula body / witness:
  (0 (<- o ->*) 1 & ~0 (->* o *<-) 1)
  0 = g(_00())
  1 = g(_01())

So in addition to the answer NO, it also outputs a counterexample for the given formula consisting of the two terms g(_00()) and g(_01()). Here _00 and _01 are additional constants required to reduce confluence to ground-confluence, and represent variables. The terms should therefore be read as g(x) and g(y).

Appendix C: User Interface of FORT-s

The command-line interface of FORT-s is given below:

  fort-s [OPTIONS] FORMULA [TRS.trs ..]

where [TRS.trs ..]
are zero or more files containing TRSs, and the options are

  -j NUM      jobs to run in parallel (default: 1)
  -l          search for linear variable-separated TRSs
  -n NUM      number of systems to be synthesized (default: 1)
  -S STRING   specifies signature (default: uses signature step)
  -a STRING   specifies arities (default: uses signature step)
  -s NUM      signature step (default: 2)
  -A NUM      maximal generated arity (default: 3)
  -D NUM      upper bound on height (default: 3)
  -r NUM      lower bound on number of rules per system (default: 0)
  -R NUM      upper bound on number of rules per system (default: 3)
  -v NUM      upper bound on number of variables (default: 1)

The signature used during synthesis can be specified in multiple ways, the two simplest being with the command-line flags -S and -a. With the option -S the signature is specified by a string listing the symbols in F together with their arities, like in the call

  fort-s -S "a 0f2g1" "GCR & CR"

Since we often do not care about the presentation of function symbols, it is also permitted to just list arities with the option -a:

  fort-s -a "0 1" "WN & SN"

FORT-s then generates unique symbol names for the user. If no signature is given, FORT-s generates successive signatures in a systematic manner with the help of a signature step and a bound on the maximal arity. If the signature step number is set to 1 and the arity is bounded by 3, signatures with the following arities are created:

  {0}, {0, 1}, {0, 1, 2}, {0, 1, 2, 3}, {0, 0, 1, 2, 3}, {0, 0, 1, 1, 2, 3}, ...

If the signature step is set to 2 (its default value), we obtain

  {0}, {0, 0}, {0, 0, 1}, {0, 0, 1, 1}, {0, 0, 1, 1, 2}, ..., {0, 0, 1, 1, 2, 2, 3, 3}, {0, 0, 0, 1, 1, 2, 2, 3, 3}, ...

The signature step is passed to FORT-s with the option -s and the bound on the arities by -A. Note that when additional systems are passed to FORT-s, it will use the union of the signatures of those systems.
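The two sequences of signatures listed above are consistent with a simple round-robin scheme: in each round, and for each arity from 0 up to the bound, the signature step many symbols of that arity are added one at a time. The following Python sketch spells out this reading. It is inferred only from the two listed sequences and is an assumption about the enumeration order, not a description of the FORT-s implementation; the function name `signatures` is hypothetical.

```python
from itertools import islice


def signatures(step, max_arity):
    """Yield successive signatures as sorted lists of arities.

    Assumed scheme: in every round, for each arity 0..max_arity,
    add `step` symbols of that arity, emitting a signature after
    each single addition.
    """
    arities = []
    while True:
        for arity in range(max_arity + 1):
            for _ in range(step):
                arities.append(arity)
                yield sorted(arities)


# First signatures for signature step 1 and arity bound 3.
for signature in islice(signatures(1, 3), 6):
    print(signature)

# First signatures for the default signature step 2 and arity bound 3.
for signature in islice(signatures(2, 3), 9):
    print(signature)
```

With signature step 2 a second constant appears before the first unary symbol, which matches the remark in Experiment 8 that an additional constant is introduced by the signature step even when it is not needed.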
When synthesizing n TRSs, in the given formula the indices 0 through n − 1 refer to the systems to be generated, and the indices greater than n − 1 refer to systems passed as additional inputs to FORT-s.

References

1. Aoto, T., Toyama, Y.: Ground confluence prover based on rewriting induction. In: Kesner, D., Pientka, B. (eds.) Proc. 1st International Conference on Formal Structures for Computation and Deduction. Leibniz International Proceedings in Informatics, vol. 52, pp. 33:1–33:12 (2016). https://doi.org/10.4230/LIPIcs.FSCD.2016.33
2. Aoto, T., Yoshida, J., Toyama, Y.: Proving confluence of term rewriting systems automatically. In: Treinen, R. (ed.) Proc. 20th International Conference on Rewriting Techniques and Applications. Lecture Notes in Computer Science, vol. 5595, pp. 93–102 (2009). https://doi.org/10.1007/978-3-642-02348-4_7
3. Baader, F., Nipkow, T.: Term Rewriting and All That. Cambridge University Press, Cambridge (1998). https://doi.org/10.1017/CBO9781139172752
4. Berghofer, S.: First-order logic according to Fitting. Archive of Formal Proofs (2007). https://isa-afp.org/entries/FOL-Fitting.html
5. Berghofer, S., Bulwahn, L., Haftmann, F.: Turning inductive into equational specifications. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) Proc. 22nd International Conference on Theorem Proving in Higher Order Logics. Lecture Notes in Computer Science, vol. 5674, pp. 131–146 (2009). https://doi.org/10.1007/978-3-642-03359-9_11
6. Comon, H.: Sequentiality, monadic second-order logic and tree automata. Inf. Comput. 157(1–2), 25–51 (2000). https://doi.org/10.1006/inco.1999.2838
7. Comon, H., Godoy, G., Nieuwenhuis, R.: The confluence of ground term rewrite systems is decidable in polynomial time. In: Proc. 42nd IEEE Symposium on Foundations of Computer Science, pp. 298–307 (2001). https://doi.org/10.1109/SFCS.2001.959904
8. Comon, H., Dauchet, M., Gilleron, R., Löding, C., Jacquemard, F., Lugiez, D., Tison, S., Tommasi, M.: Tree Automata Techniques and Applications (2008). http://tata.gforge.inria.fr/
9. Dauchet, M., Tison, S.: Decidability of confluence for ground term rewriting systems. In: Budach, L. (ed.) Proc. 5th International Conference on Fundamentals of Computation Theory. Lecture Notes in Computer Science, vol. 199, pp. 80–84 (1985). https://doi.org/10.1007/BFb0028794
10. Dauchet, M., Tison, S.: The theory of ground rewrite systems is decidable. In: Proc. 5th IEEE Symposium on Logic in Computer Science, pp. 242–248 (1990a). https://doi.org/10.1109/LICS.1990.113750
11. Dauchet, M., Tison, S.: The theory of ground rewrite systems is decidable (extended version). Technical Report I.T. 197, LIFL (1990b)
12. Dauchet, M., Heuillard, T., Lescanne, P., Tison, S.: Decidability of the confluence of finite ground term rewriting systems and of other related term rewriting systems. Inf. Comput. 88(2), 187–201 (1990). https://doi.org/10.1016/0890-5401(90)90015-A
13. de Bruijn, N.G.: Lambda calculus notation with nameless dummies: A tool for automatic formula manipulation, with application to the Church-Rosser theorem. Indagationes Mathematicae 34(5), 381–392 (1972). https://doi.org/10.1016/1385-7258(72)90034-0
14. Dershowitz, N.: Termination of linear rewriting systems (preliminary version). In: Even, S., Kariv, O. (eds.) Proc. 8th International Colloquium on Automata, Languages and Programming, vol. 115, pp. 448–458 (1981). https://doi.org/10.1007/3-540-10843-2_36
15. Dershowitz, N.: Open. Closed. Open. In: Giesl, J. (ed.) Proc. 16th International Conference on Rewriting Techniques and Applications. Lecture Notes in Computer Science, vol. 3467, pp. 276–393 (2005). https://doi.org/10.1007/978-3-540-32033-3_28
16. Dershowitz, N., Jouannaud, J.-P., Klop, J.W.: Open problems in rewriting. In: Book, R.V. (ed.) Proc. 4th International Conference on Rewriting Techniques and Applications. Lecture Notes in Computer Science, vol. 488, pp. 445–456 (1991). https://doi.org/10.1007/3-540-53904-2_120
17. Deruyver, A., Gilleron, R.: The reachability problem for ground TRS and some extensions. In: Proc. 14th Colloquium on Trees in Algebra and Programming. Lecture Notes in Computer Science, vol. 351, pp. 227–243 (1989). https://doi.org/10.1007/3-540-50939-9_135
18. Durand, I., Middeldorp, A.: Decidable call-by-need computations in term rewriting. Inf. Comput. 196(2), 95–126 (2005). https://doi.org/10.1016/j.ic.2004.10.003
19. Felgenhauer, B.: Deciding confluence of ground term rewrite systems in cubic time. In: Tiwari, A. (ed.) Proc. 23rd International Conference on Rewriting Techniques and Applications. Leibniz International Proceedings in Informatics, vol. 15, pp. 165–175 (2012). https://doi.org/10.4230/LIPIcs.RTA.2012.165
20. Felgenhauer, B.: Deciding confluence and normal form properties of ground term rewrite systems efficiently. Log. Methods Comput. Sci. (2018). https://doi.org/10.23638/LMCS-14(4:7)2018
21. Felgenhauer, B., Thiemann, R.: Reachability, confluence, and termination analysis with state-compatible automata. Inf. Comput. 253(3), 467–483 (2017). https://doi.org/10.1016/j.ic.2016.06.011
22. Felgenhauer, B., Middeldorp, A., Prathamesh, T.V.H., Rapp, F.: A verified ground confluence tool for linear variable-separated rewrite systems in Isabelle/HOL. In: Mahboubi, A., Myreen, M.O. (eds.) Proc. 8th ACM SIGPLAN International Conference on Certified Programs and Proofs, pp. 132–143 (2019). https://doi.org/10.1145/3293880.3294098
23. Giesl, J., Rubio, A., Sternagel, C., Waldmann, J., Yamada, A.: The termination and complexity competition. In: Beyer, D., Huisman, M., Kordon, F., Steffen, B. (eds.) Proc. 25th International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science, vol. 11429, pp. 156–166 (2019). https://doi.org/10.1007/978-3-030-17502-3_10
24. Godoy, G., Tiwari, A.: Confluence of shallow right-linear rewrite systems. In: Ong, L. (ed.) Proc. 14th International Conference on Computer Science Logic. Lecture Notes in Computer Science, vol. 3634, pp. 541–556 (2005). https://doi.org/10.1007/11538363_37
25. Godoy, G., Huntingford, E., Tiwari, A.: Termination of rewriting with right-flat rules. In: Baader, F. (ed.) Proc. 18th International Conference on Rewriting Techniques and Applications. Lecture Notes in Computer Science, vol. 4533, pp. 200–213 (2007). https://doi.org/10.1007/978-3-540-73449-9_16
26. Göller, S., Lohrey, M.: The first-order theory of ground tree rewrite graphs. Log. Methods Comput. Sci. (2014). https://doi.org/10.2168/LMCS-10(1:7)2014
27. Gutiérrez, R., Lucas, S., Vítores, M.: Confluence of conditional rewriting in logic form. In: Bojanczyk, M., Chekuri, C. (eds.) Proc. 41st IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science. Leibniz International Proceedings in Informatics, vol. 213, pp. 44:1–44:18 (2021). https://doi.org/10.4230/LIPIcs.FSTTCS.2021.44
28. Havlena, V., Holík, L., Lengal, O., Vales, O., Vojnar, T.: Antiprenexing for WSkS: A little goes a long way. In: Albert, E., Kovacs, L. (eds.) Proc. 23rd International Conference on Logic for Programming, Artificial Intelligence, and Reasoning. EPiC Series in Computing, vol. 73, pp. 298–316 (2020). https://doi.org/10.29007/6bfc
29. Jiresch, E.: A term rewriting laboratory with systematic and random generation and heuristic test facilities. Master’s thesis, Vienna University of Technology (2008)
30. Kaiser, L.: Confluence of right ground term rewriting systems is decidable. In: Sassone, V. (ed.) Proc. 8th International Conference on Foundations of Software Science and Computation Structures. Lecture Notes in Computer Science, vol. 3441, pp. 470–489 (2005). https://doi.org/10.1007/978-3-540-31982-5_30
31. Klarlund, N., Møller, A., Schwartzbach, M.I.: MONA implementation secrets. Int. J. Found. Comput. Sci. 13(4), 571–586 (2002). https://doi.org/10.1142/S012905410200128X
32. Lochmann, A.: Reducing Rewrite Properties to Properties on Ground Terms. Archive of Formal Proofs (2022). https://isa-afp.org/entries/Rewrite_Properties_Reduction.html
33. Lochmann, A., Felgenhauer, B.: First-order theory of rewriting. Archive of Formal Proofs (2022). https://isa-afp.org/entries/FO_Theory_Rewriting.html
34. Lochmann, A., Middeldorp, A.: Formalized proofs of the infinity and normal form predicates in the first-order theory of rewriting. In: Biere, A., Parker, D. (eds.) Proc. 26th International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science, vol. 12079, pp. 178–194 (2020). https://doi.org/10.1007/978-3-030-45237-7_11
35. Lochmann, A., Felgenhauer, B., Sternagel, C., Thiemann, R., Sternagel, T.: Regular tree relations. Archive of Formal Proofs (2021a). https://www.isa-afp.org/entries/Regular_Tree_Relations.html
36. Lochmann, A., Middeldorp, A., Mitterwallner, F., Felgenhauer, B.: A verified decision procedure for the first-order theory of rewriting for linear variable-separated rewrite systems in Isabelle/HOL. In: Hriţcu, C., Popescu, A. (eds.) Proc. 10th ACM SIGPLAN International Conference on Certified Programs and Proofs, pp. 250–263 (2021b). https://doi.org/10.1145/3437992.
37. Lochmann, A., Mitterwallner, F., Middeldorp, A.: Formalized signature extension results for confluence, commutation and unique normal forms. In: Mimram, S., Rocha, C. (eds.) Proc. 10th International Workshop on Confluence, pp. 25–30 (2021)
38. Lochmann, A., Mitterwallner, F., Middeldorp, A.: Formalized signature extension results for equivalence. In: Winkler, S., Rocha, C. (eds.) Proc. 11th International Workshop on Confluence, pp. 42–47 (2022)
39. Marcinkowski, J.: Undecidability of the first order theory of one-step right ground rewriting. In: Comon, H. (ed.) Proc. 8th International Conference on Rewriting Techniques and Applications. Lecture Notes in Computer Science, vol. 1232, pp. 241–253 (1997). https://doi.org/10.1007/3-540-62950-5_75
40. Middeldorp, A.: Approximating dependency graphs using tree automata techniques. In: Goré, R., Leitsch, A., Nipkow, T. (eds.) Proc. 1st International Joint Conference on Automated Reasoning. LNAI, vol. 2083, pp. 593–610 (2001). https://doi.org/10.1007/3-540-45744-5_49
41. Middeldorp, A., Nagele, J., Shintani, K.: Confluence competition 2019. In: Beyer, D., Huisman, M., Kordon, F., Steffen, B. (eds.) Proc. 25th International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science, vol. 11429, pp. 25–40 (2019). https://doi.org/10.1007/978-3-030-17502-3_2
42. Mitterwallner, F., Lochmann, A., Middeldorp, A., Felgenhauer, B.: Certifying proofs in the first-order theory of rewriting. In: Groote, J.F., Larsen, K.G. (eds.) Proc. 27th International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Lecture Notes in Computer Science, vol. 12652, pp. 127–144 (2021). https://doi.org/10.1007/978-3-030-72013-1_7
43. Nagaya, T., Toyama, Y.: Decidability for left-linear growing term rewriting systems. Inf. Comput. 178(2), 499–514 (2002). https://doi.org/10.1006/inco.2002.3157
44. Nagele, J., Felgenhauer, B., Middeldorp, A.: CSI: New evidence—a progress report. In: de Moura, L. (ed.) Proc. 26th International Conference on Automated Deduction. LNAI, vol. 10395, pp. 385–397 (2017). https://doi.org/10.1007/978-3-319-63046-5_24
45. Plaisted, D.A.: Polynomial time termination and constraint satisfaction tests. In: Kirchner, C. (ed.) Proc. 5th International Conference on Rewriting Techniques and Applications. Lecture Notes in Computer Science, vol. 690, pp. 405–420 (1993). https://doi.org/10.1007/978-3-662-21551-7_30
46. Rapp, F., Middeldorp, A.: Automating the first-order theory of left-linear right-ground term rewrite systems. In: Kesner, D., Pientka, B. (eds.) Proc. 1st International Conference on Formal Structures for Computation and Deduction. Leibniz International Proceedings in Informatics, vol. 52, pp. 36:1–36:12 (2016). https://doi.org/10.4230/LIPIcs.FSCD.2016.36
47. Rapp, F., Middeldorp, A.: Confluence properties on open terms in the first-order theory of rewriting. In: Accattoli, B., Tiwari, A. (eds.) Proc. 5th International Workshop on Confluence, pp. 26–30 (2016)
48. Rapp, F., Middeldorp, A.: FORT 2.0. In: Galmiche, D., Schulz, S., Sebastiani, R. (eds.) Proc. 9th International Joint Conference on Automated Reasoning. LNAI, vol. 10900, pp. 81–88 (2018). https://doi.org/10.1007/978-3-319-94205-6_6
49. Shintani, K., Hirokawa, N.: CoLL: A confluence tool for left-linear term rewrite systems. In: Felty, A.P., Middeldorp, A. (eds.) Proc. 25th International Conference on Automated Deduction. Lecture Notes in Computer Science, vol. 9195, pp. 127–136 (2015). https://doi.org/10.1007/978-3-319-21401-6_8
50. Snyder, W.: A fast algorithm for generating reduced ground rewriting systems from a set of ground equations. J. Symbol. Comput. 15(4), 415–450 (1993). https://doi.org/10.1006/jsco.1993.1029
51. Sternagel, C., Sternagel, T.: Certifying confluence of almost orthogonal CTRSs via exact tree automata completion. In: Kesner, D., Pientka, B. (eds.) Proc. 1st International Conference on Formal Structures for Computation and Deduction. Leibniz International Proceedings in Informatics, vol. 52, pp. 29:1–29:16 (2016). https://doi.org/10.4230/LIPIcs.FSCD.2016.29
52. Stump, A., Zantema, H., Kimmell, G., Omar, R.E.H.: A rewriting view of simple typing. Log. Methods Comput. Sci. (2012). https://doi.org/10.2168/LMCS-9(1:4)2013
53. Thiemann, R., Sternagel, C.: Certification of termination proofs using CeTA. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) Proc. 22nd International Conference on Theorem Proving in Higher Order Logics. Lecture Notes in Computer Science, vol. 5674, pp. 452–468 (2009). https://doi.org/10.1007/978-3-642-03359-9_31
54. Treinen, R.: The first-order theory of linear one-step rewriting is undecidable. Theor. Comput. Sci. 208(1–2), 179–190 (1998). https://doi.org/10.1016/S0304-3975(98)00083-8
55. Vorobyov, S.: The undecidability of the first-order theories of one step rewriting in linear canonical systems. Inf. Comput. 175(2), 182–213 (2002). https://doi.org/10.1006/inco.2002.3151
56. Zantema, H.: Automatically finding non-confluent examples in term rewriting. In: Hirokawa, N., van Oostrom, V. (eds.) Proc. 2nd International Workshop on Confluence, pp. 11–15 (2013). http://cl-informatik.uibk.ac.at/iwc/iwc2013.pdf
57. Zantema, H.: Finding small counterexamples for abstract rewriting properties. Math. Struct. Comput. Sci. 28, 1485–1505 (2018). https://doi.org/10.1017/S0960129518000221

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Journal: Journal of Automated Reasoning, Springer Journals

Published: Jun 1, 2023

Keywords: Term rewriting; First-order theory; Tree automata; Formalization
