References

L. Bachmair, H. Ganzinger (1994). Rewrite-based equational theorem proving with selection and simplification. J. Log. Comput. 4.
L. Bachmair, H. Ganzinger, D. McAllester, C. Lynch (2001). Resolution theorem proving.
C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanovic, T. King, A. Reynolds, C. Tinelli (2011). CVC4. In: G. Gopalakrishnan, S. Qadeer (eds.), CAV 2011.
M. P. Bonacina, S. Lengrand, N. Shankar (2017). Satisfiability modulo theories and assignments.
M. P. Bonacina, C. Lynch, L. de Moura (2009). On deciding satisfiability by DPLL(Γ+T) and unsound theorem proving. CADE-22.
M. P. Bonacina, D. Plaisted (2014). SGGS theorem proving: An exposition.
T. Bouton, D. Oliveira, D. Déharbe, P. Fontaine (2009). veriT: An open, trustable and efficient SMT-solver.
S. Cruanes (2015). Extending Superposition with Integer Arithmetic, Structural Induction, and Beyond.
D. Déharbe, P. Fontaine, S. Ranise, C. Ringeissen (2006). Decision procedures for the formal analysis of software.
G. Ebner, J. Blanchette, S. Tourret (2021). A unifying splitting framework.
G. Ebner, S. Ullrich, J. Roesch, J. Avigad, L. de Moura (2017). A metaprogramming framework for formal verification. Proc. ACM Program. Lang. 1.
A. Fietzke, C. Weidenbach (2008). Labelled splitting. Ann. Math. Artif. Intell. 55.
Y. Ge, L. de Moura (2009). Complete instantiation for quantified formulas in satisfiability modulo theories.
B. Gleiss, L. Kovács, J. Rath (2020). Subsumption demodulation in first-order theorem proving. In: N. Peltier, V. Sofronie-Stokkermans (eds.), Automated Reasoning (IJCAR 2020), Part I, LNCS 12166.
D. Knuth, P. Bendix (1983). Simple word problems in universal algebras.
W. McCune, L. Wos (1997). Otter: The CADE-13 competition incarnations. J. Autom. Reason. 18.
L. de Moura, N. Bjørner (2008). Z3: An efficient SMT solver.
L. de Moura, D. Jovanovic (2013). A model-constructing satisfiability calculus.
R. Nieuwenhuis, A. Oliveras, C. Tinelli (2006). Solving SAT and SAT modulo theories: From an abstract Davis–Putnam–Logemann–Loveland procedure to DPLL(T). J. ACM 53.
T. Nipkow, G. Klein (2014). Concrete Semantics: With Isabelle/HOL.
G. Reger, N. Bjørner, M. Suda, A. Voronkov (2016). AVATAR modulo theories.
G. Reger, M. Suda, A. Voronkov (2015). Playing with AVATAR.
A. Reynolds, H. Barbosa, P. Fontaine (2018). Revisiting enumerative instantiation.
A. Reynolds, C. Tinelli, A. Goel, S. Krstic (2013). Finite model finding in SMT.
A. Riazanov, A. Voronkov (2001). Splitting without backtracking.
J. Robinson (1965). A machine-oriented logic based on the resolution principle. J. ACM 12.
A. Schlichtkrull, J. C. Blanchette, D. Traytel (2019). A verified prover based on ordered resolution. In: A. Mahboubi, M. O. Myreen (eds.), CPP 2019.
L. Steen, J. Seebach (1970). Counterexamples in Topology.
A. Voronkov (2014). AVATAR: The architecture for first-order theorem provers.
U. Waldmann, S. Tourret, S. Robillard, J. Blanchette (2020). A comprehensive framework for saturation theorem proving. In: N. Peltier, V. Sofronie-Stokkermans (eds.), Automated Reasoning (IJCAR 2020), Part I, LNCS 12166.
C. Weidenbach (2001). Combining superposition, sorts and splitting.
C. Weidenbach, B. Gaede, G. Rock (1996). SPASS & FLOTTER version 0.42.
Abstract

AVATAR is an elegant and effective way to split clauses in a saturation prover using a SAT solver. But is it refutationally complete? And how does it relate to other splitting architectures? To answer these questions, we present a unifying framework that extends a saturation calculus (e.g., superposition) with splitting and that embeds the result in a prover guided by a SAT solver. The framework also allows us to study locking, a subsumption-like mechanism based on the current propositional model. Various architectures are instances of the framework, including AVATAR, labeled splitting, and SMT with quantifiers.

Keywords: Automated theorem proving · Completeness · Splitting · AVATAR

1 Introduction

One of the great strengths of saturation calculi such as resolution [26] and superposition [1] is that they avoid case analyses. Derived clauses hold unconditionally, and the prover can stop as soon as it derives the empty clause, without having to backtrack. The drawback is that these calculi often generate long, unwieldy clauses that slow down the prover. A remedy is to partition the search space by splitting a multiple-literal clause C_1 ∨ ··· ∨ C_n into subclauses C_i that share no variables. Splitting approaches include splitting with backtracking [31, 32], splitting without backtracking [25], labeled splitting [15], and AVATAR [28].

The AVATAR architecture, which is based on a satisfiability (SAT) solver, is of particular interest because it is so successful. Voronkov reported that an AVATAR-enabled Vampire could solve 421 TPTP problems that had never been solved before by any system [28, Sect. 9], a mind-boggling number. Intuitively, AVATAR works well in combination with the superposition calculus because it combines superposition's strong equality reasoning with the SAT solver's strong clausal reasoning. It is also appealing theoretically, because it gracefully generalizes traditional saturation provers and yet degenerates to a SAT solver if the problem is propositional.

To illustrate the approach, we follow the key steps of an AVATAR-enabled resolution prover on the initial clause set containing ¬p(a), ¬q(z, z), and p(x) ∨ q(y, b). The disjunction can be split into p(x) ← {[p(x)]} and q(y, b) ← {[q(y, b)]}, where C ← {[C]} indicates that the clause C is enabled only in models in which the associated propositional variable [C] is true. A SAT solver is then run to choose a model J of [p(x)] ∨ [q(y, b)]. Suppose J makes [p(x)] true and [q(y, b)] false. Then resolving p(x) ← {[p(x)]} with ¬p(a) produces ⊥ ← {[p(x)]}, meaning that [p(x)] must be false. Next, the SAT solver makes [p(x)] false and [q(y, b)] true. Resolving q(y, b) ← {[q(y, b)]} with ¬q(z, z) yields ⊥ ← {[q(y, b)]}, meaning that [q(y, b)] must be false. Since both disjuncts of [p(x)] ∨ [q(y, b)] are false, the SAT solver reports "unsatisfiable," concluding the refutation.
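The propositional skeleton of this refutation can be replayed in a few lines of code. The following sketch is purely illustrative: the clause encoding and the brute-force find_model helper are assumptions of the example, standing in for a real SAT solver.

    from itertools import product

    # Propositional abstraction of the example above:
    # variable "p" stands for [p(x)], "q" for [q(y, b)].
    variables = ["p", "q"]

    # A propositional clause is a set of literals (variable, polarity).
    # [p(x)] ∨ [q(y, b)] comes from the split; the derived A-clauses
    # ⊥ ← {[p(x)]} and ⊥ ← {[q(y, b)]} correspond to the units ¬p and ¬q.
    clauses = [{("p", True), ("q", True)}]

    def find_model(clauses):
        """Brute-force SAT check: return a satisfying assignment or None."""
        for values in product([True, False], repeat=len(variables)):
            j = dict(zip(variables, values))
            if all(any(j[v] == pol for (v, pol) in c) for c in clauses):
                return j
        return None

    # Resolving the enabled clause p(x) with ¬p(a) yields ⊥ ← {[p(x)]}, i.e., ¬p.
    clauses.append({("p", False)})
    print(find_model(clauses))   # e.g., {'p': False, 'q': True}

    # In that model, q(y, b) is enabled; resolving with ¬q(z, z) yields ¬q.
    clauses.append({("q", False)})
    print(find_model(clauses))   # None: propositionally unsatisfiable, so ⊥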
What about refutational completeness? Far from being a purely theoretical concern, establishing completeness—or finding counterexamples—could yield insights into splitting and perhaps lead to an even stronger AVATAR.

Before we can answer this open question, we must mathematize splitting. Our starting point is the saturation framework by Waldmann, Tourret, Robillard, and Blanchette [29], based on the work of Bachmair and Ganzinger [2]. It covers a wide array of techniques, but "the main missing piece of the framework is a generic treatment of clause splitting" [29, p. 332]. We provide that missing piece, in the form of a splitting framework, and use it to show the completeness of an AVATAR-like architecture. The framework is currently a pen-and-paper creature; a formalization using Isabelle/HOL [21] is underway.

Our framework has five layers, linked by refinement. The first layer consists of a base calculus, such as resolution or superposition. It must be presentable as an inference system and a redundancy criterion, as required by the saturation framework, and it must be refutationally complete.

From a base calculus, our framework can be used to derive the second layer, which we call the splitting calculus (Sect. 3). This extends the base calculus with splitting and inherits the base's completeness. It works on A-clauses or A-formulas of the form C ← A, where C is a base clause or formula and A is a set of propositional literals, called assertions (Sect. 2). Using the saturation framework, we can prove the dynamic completeness of an abstract prover, formulated as a transition system, that implements the splitting calculus.

However, this ignores a major component of AVATAR: the SAT solver. AVATAR considers only inferences involving A-formulas whose assertions are true in the current propositional model. The role of the third layer is to reflect this behavior. A model-guided prover operates on states of the form (J, N), where J is a propositional model and N is a set of A-formulas (Sect. 4). This layer is also dynamically complete.

The fourth layer introduces AVATAR's locking mechanism (Sect. 5). With locking, an A-formula D ← B can be temporarily disabled by another A-formula C ← A if C subsumes D, even if A ⊈ B. Here we make a first discovery: AVATAR-style locking compromises completeness and must be curtailed.

Finally, the fifth layer is an AVATAR-based prover (Sect. 6). This refines the locking model-guided prover of the fourth layer with the given clause procedure, which saturates an A-formula set by distinguishing between active and passive A-formulas. Here we make another discovery: Selecting A-formulas fairly is not enough to guarantee completeness. We need a stronger criterion.

There are also implications for other architectures. In a hypothetical tête-à-tête with the designers of labeled splitting, they might gently point out that by pioneering the use of a propositional model, including locking, they almost invented AVATAR themselves. Likewise, developers of satisfiability modulo theories (SMT) solvers might be tempted to claim that Voronkov merely reinvented SMT. To investigate such questions, we apply our framework to splitting without backtracking, labeled splitting, and SMT with quantifiers (Sect. 7). This gives us a solid basis for comparison as well as some new theoretical results.

A shorter version of this article was presented at CADE-28 [14]. This article extends the conference paper with more explanations, examples, counterexamples, and proofs. We strengthened the definition of consequence relation to require compactness, which allowed us to simplify property (D4). The property (D4) from the conference paper is proved as Lemma 5.
The definition of strongly finitary was also changed to include a stronger condition on the introduced assertions, which is needed for the proof of Lemma 72.

2 Preliminaries

Our framework is parameterized by abstract notions of formulas, consequence relations, inferences, and redundancy. We largely follow the conventions of Waldmann et al. [29]. A-formulas generalize Voronkov's A-clauses [28].

2.1 Formulas

A set F of formulas, ranged over by C, D ∈ F, is a set that contains a distinguished element ⊥ denoting falsehood. A consequence relation |= over F is a relation |= ⊆ P(F) × P(F) with the following properties for all sets M, M′, N, N′ ⊆ F and formulas C, D ∈ F:

(D1) {⊥} |= ∅;
(D2) {C} |= {C};
(D3) if M ⊆ M′ and N ⊆ N′, then M |= N implies M′ |= N′;
(D4) if M |= N ∪ {C} and M′ ∪ {C} |= N′, then M ∪ M′ |= N ∪ N′;
(D5) if M |= N, then there exist finite sets M′ ⊆ M and N′ ⊆ N such that M′ |= N′.

The intended interpretation of M |= N is conjunctive on the left but disjunctive on the right: "∧M −→ ∨N." The disjunctive interpretation of N will be useful to define splittability abstractly in Sect. 3.1. Property (D4) is called the cut rule, and (D5) is called compactness.

For their saturation framework, Waldmann et al. instead consider a fully conjunctive version of the consequence relation, with different properties. The incompatibility can easily be repaired: Given a consequence relation |=, we can obtain a consequence relation |=∩ in their sense by defining M |=∩ N if and only if M |= {C} for every C ∈ N. The two versions differ only when the right-hand side is not a singleton. The conjunctive version |=∩ can then be used when interacting with the saturation framework.

The |= notation can be extended to allow negation on either side. Let ∼F be defined as F ⊎ {∼C | C ∈ F} such that ∼∼C = C. Given M, N ⊆ ∼F, we set M |= N if and only if

    {C ∈ F | C ∈ M} ∪ {C ∈ F | ∼C ∈ N} |= {C ∈ F | ∼C ∈ M} ∪ {C ∈ F | C ∈ N}

We write M |=| N for the conjunction of M |= N and N |= M.

Lemma 1. Let C ∈ ∼F. Then {C} ∪ {∼C} |= {⊥}.

Proof. This holds by (D2) and (D3) due to the definition of |= on ∼F.

Following the saturation framework [29, p. 318], we distinguish between the consequence relation |= used for stating refutational completeness and the consequence relation |≈ used for stating soundness. For example, |= could be entailment for first-order logic with equality, whereas |≈ could also draw on linear arithmetic, or interpret Skolem symbols so as to make skolemization sound. Normally |= ⊆ |≈, but this is not required.

Example 1. In clausal first-order logic with equality, as implemented in superposition provers, the formulas in F consist of clauses over a signature Σ. Each clause C is a finite multiset of literals L_1, ..., L_n, written C = L_1 ∨ ··· ∨ L_n. The clause's variables are implicitly quantified universally. Each literal L is either an atom or its negation (¬), and each atom is an unoriented equation s ≈ t. We define the consequence relation |= by letting M |= N if and only if every first-order Σ-interpretation that satisfies all clauses in M also satisfies at least one clause in N.
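For propositional logic, the disjunctive consequence relation and its conjunctive repair can be computed by truth-table enumeration. The sketch below is a minimal illustration under that assumption; the names entails and entails_conj are not from the paper.

    from itertools import product

    # Minimal propositional instance of the abstract consequence relation.
    # A formula is a frozenset of clauses; a clause is a frozenset of literals
    # (variable, polarity). ⊥ is the formula containing the empty clause.
    BOT = frozenset([frozenset()])

    def satisfies(j, formula):
        return all(any(j.get(v, False) == pol for (v, pol) in clause)
                   for clause in formula)

    def entails(m, n):
        """M |= N: every model of all formulas in M satisfies some formula in N."""
        vs = sorted({v for f in m | n for c in f for (v, _) in c})
        for values in product([True, False], repeat=len(vs)):
            j = dict(zip(vs, values))
            if all(satisfies(j, f) for f in m) and not any(satisfies(j, f) for f in n):
                return False
        return True

    def entails_conj(m, n):
        """The conjunctive repair |=∩ used by the saturation framework."""
        return all(entails(m, {f}) for f in n)

    p = frozenset([frozenset([("p", True)])])
    q = frozenset([frozenset([("q", True)])])
    print(entails({BOT}, set()))       # True: property (D1), {⊥} |= ∅
    print(entails({p}, {p, q}))        # True: disjunctive right-hand side
    print(entails_conj({p}, {p, q}))   # False: |=∩ requires entailing each formula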
2.2 Calculi and Derivations

A refutational calculus combines a set of inferences, which are a priori mandatory, and a redundancy criterion, which identifies inferences that a posteriori need not be performed as well as formulas that can be deleted.

Let F be a set of formulas equipped with ⊥. An F-inference ι is a tuple (C_n, ..., C_1, D) ∈ F^{n+1}. The formulas C_n, ..., C_1 are the premises, and D is the conclusion. Define prems(ι) = {C_n, ..., C_1} and concl(ι) = {D}. An inference ι is sound w.r.t. |≈ if prems(ι) |≈ concl(ι). An inference system Inf is a set of F-inferences. Given N ⊆ F, we let Inf(N) denote the set of all inferences in Inf whose premises are included in N, and Inf(N, M) = Inf(N ∪ M) \ Inf(N \ M) for the set of all inferences in Inf such that one or more premises are in M and the remaining premises are in N.

A redundancy criterion for an inference system Inf and a consequence relation |= is a pair Red = (Red_I, Red_F), where Red_I : P(F) → P(Inf) and Red_F : P(F) → P(F) enjoy the following properties for all sets M, N ⊆ F:

(R1) if N |= {⊥}, then N \ Red_F(N) |= {⊥};
(R2) if M ⊆ N, then Red_F(M) ⊆ Red_F(N) and Red_I(M) ⊆ Red_I(N);
(R3) if M ⊆ Red_F(N), then Red_F(N) ⊆ Red_F(N \ M) and Red_I(N) ⊆ Red_I(N \ M);
(R4) if ι ∈ Inf and concl(ι) ∈ N, then ι ∈ Red_I(N).

Inferences in Red_I(N) and formulas in Red_F(N) are said to be redundant w.r.t. N. Red_I indicates which inferences need not be performed, whereas Red_F justifies the deletion of formulas deemed useless. The above properties make the passage from static to dynamic completeness possible: (R1) ensures that deleting a redundant formula preserves a set's inconsistency, so as not to lose refutations; (R2) and (R3) ensure that arbitrary formulas can be added and redundant formulas can be deleted by the prover; and (R4) ensures that adding an inference's conclusion to the formula set makes the inference redundant.

A pair (Inf, Red) forms a calculus. A set N ⊆ F is saturated w.r.t. Inf and Red if Inf(N) ⊆ Red_I(N). The calculus (Inf, Red) is statically (refutationally) complete (w.r.t. |=) if for every set N ⊆ F that is saturated w.r.t. Inf and Red and such that N |= {⊥}, we have ⊥ ∈ N.

Lemma 2. Assume that the calculus (Inf, Red) is statically complete. Then ⊥ ∉ Red_F(N) for every N ⊆ F.

Proof. By (R2), it suffices to show ⊥ ∉ Red_F(F). Clearly, by (D2) and (D3), F |= {⊥}. Thus, by (R1), F \ Red_F(F) |= {⊥}. Moreover, by (R3) and (R4), F \ Red_F(F) is saturated. Hence, since (Inf, Red) is statically complete, ⊥ ∈ F \ Red_F(F). Therefore, ⊥ ∉ Red_F(F).

Remark 3. Given a redundancy criterion (Red_I, Red_F), where ⊥ ∉ Red_F(F), we can make it stricter as follows. Define Red_I′ such that ι ∈ Red_I′(N) if and only if either ι ∈ Red_I(N) or ⊥ ∈ N. Define Red_F′ such that C ∈ Red_F′(N) if and only if either C ∈ Red_F(N) or else both ⊥ ∈ N and C ≠ ⊥. Obviously, Red′ = (Red_I′, Red_F′) is a redundancy criterion. Moreover, if N is saturated w.r.t. Inf and Red, then N is saturated w.r.t. Inf and Red′, and if the calculus (Inf, Red) is statically complete, then (Inf, Red′) is also statically complete. (In the last case, the condition ⊥ ∉ Red_F(F) holds by Lemma 2.)

A sequence (x_i)_i over a set X is a function from N to X that maps each i ∈ N to x_i ∈ X. Let (X_i)_i be a sequence of sets. Its limit inferior is X_∞ = lim inf_{j→∞} X_j = ⋃_i ⋂_{j≥i} X_j, and its limit superior is X^∞ = lim sup_{j→∞} X_j = ⋂_i ⋃_{j≥i} X_j. The elements of X_∞ are called persistent.
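A small sketch can make the set-theoretic limits concrete. It approximates the infinite sequence by a callable and a cutoff, which is exact only for eventually periodic sequences whose preperiod and period fit into half the cutoff (an assumption of the sketch).

    def set_limits(seq, cutoff):
        """Return (lim inf, lim sup) of the sequence of finite sets seq(0), seq(1), ...
        following X_inf = ∪_i ∩_{j≥i} X_j and X^sup = ∩_i ∪_{j≥i} X_j."""
        lo, hi = set(), None
        for i in range(cutoff // 2):
            tail = [set(seq(j)) for j in range(i, cutoff)]
            lo |= set.intersection(*tail)
            u = set.union(*tail)
            hi = u if hi is None else hi & u
        return lo, hi

    # "a" appears in every set, "b" only at even indices, "c" only at index 0.
    seq = lambda i: {"a", "c"} if i == 0 else ({"a", "b"} if i % 2 == 0 else {"a"})
    print(set_limits(seq, 10))
    # ({'a'}, {'a', 'b'}): 'a' is persistent, 'b' occurs infinitely often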
A sequence (N_i)_i of sets of F-formulas is weakly fair w.r.t. Inf and Red_I if Inf(N_∞) ⊆ ⋃_i Red_I(N_i) and strongly fair if lim sup_{i→∞} Inf(N_i) ⊆ ⋃_i Red_I(N_i). Weak fairness requires that all inferences possible from some index i and ever after eventually be performed or become redundant for another reason. Strong fairness requires the same from all inferences that are possible infinitely often, even if not continuously so. Both can be used to ensure that some limit is saturated.

Given a relation ⊳ ⊆ X × X (pronounced "triangle"), a ⊳-derivation is a sequence (x_i)_i of X elements such that x_i ⊳ x_{i+1} for every i. Finite runs can be extended to derivations by repeating the final state infinitely. We must then ensure that ⊳ supports such stuttering. Abusing language, and departing slightly from the saturation framework, we will say that a derivation (x_i)_i terminates if x_i = x_{i+1} = ··· for some index i.

Let ⊳_{Red_F} ⊆ P(F) × P(F) be the relation such that M ⊳_{Red_F} N if and only if M \ N ⊆ Red_F(N). Note that it is reflexive and hence supports stuttering. The relation is also transitive due to (R3). We could additionally require soundness (M |≈ N) or at least consistency preservation (M |≈ {⊥} implies N |≈ {⊥}), but this is unnecessary for proving completeness. The calculus (Inf, Red) is dynamically (refutationally) complete (w.r.t. |=) if for every ⊳_{Red_F}-derivation (N_i)_i that is weakly fair w.r.t. Inf and Red_I and such that N_0 |= {⊥}, we have ⊥ ∈ N_i for some i.

2.3 A-Formulas

We fix throughout a countable set V of propositional variables v_0, v_1, .... For each v ∈ V, let ¬v ∈ ¬V denote its negation, with ¬¬v = v. We assume that a formula fml(v) ∈ F is associated with each propositional variable v ∈ V. Intuitively, v approximates fml(v) at the propositional level. This definition is extended so that fml(¬v) = ∼fml(v). A propositional literal, or assertion, a ∈ A = V ∪ ¬V over V is either a propositional variable v or its negation ¬v. A propositional interpretation J ⊆ A is a set of assertions such that for every variable v ∈ V, exactly one of v ∈ J and ¬v ∈ J holds. We lift fml to sets in an elementwise fashion: fml(J) = {fml(a) | a ∈ J}. In the rest of this article, we will often implicitly lift functions elementwise to sets. The condition on the variables ensures that J is propositionally consistent. J might nevertheless be inconsistent for |=, which takes into account the semantics of the formulas fml(v) associated with the variables v; for example, we might have J = {v_0, v_1}, fml(v_0) = p(x), fml(v_1) = ¬p(a), and J |= {⊥}.

An A-formula over a set F of base formulas and an assertion set A is a pair C = (C, A) ∈ AF = F × P_fin(A), written C ← A, where C is a formula and A is a finite set of assertions {a_1, ..., a_n} understood as an implication a_1 ∧ ··· ∧ a_n −→ C. We identify C ← ∅ with C and define the projections ⌊C ← A⌋ = C and ⌊(C_n ← A_n, ..., C_0 ← A_0)⌋ = (C_n, ..., C_0). Moreover, N_⊥ is the set consisting of all A-formulas of the form ⊥ ← A ∈ N, where A ∈ P_fin(A). Since ⊥ ← {a_1, ..., a_n} can be read as ¬a_1 ∨ ··· ∨ ¬a_n, we call such A-formulas propositional clauses. (In contrast, we call a variable-free base formula such as p ∨ q a ground clause when F is first-order logic.) The set N_⊥ represents the clauses considered by the SAT solver in the original AVATAR [28]. Note the use of calligraphic letters (e.g., C, N) to range over A-formulas and sets of A-formulas.

Model-guided provers only consider A-formulas whose assertions are true in the current interpretation. Thus we say that an A-formula C ← A ∈ AF is enabled in a propositional interpretation J if A ⊆ J. A set of A-formulas is enabled in J if all of its members are enabled in J. Given an A-formula set N ⊆ AF, the enabled projection N_J ⊆ ⌊N⌋ consists of the projections ⌊C⌋ of all A-formulas C enabled in J. Analogously, the enabled projection Inf_J ⊆ ⌊Inf⌋ of a set Inf of AF-inferences consists of the projections ⌊ι⌋ of all inferences ι ∈ Inf whose premises are all enabled in J.
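A minimal data-structure sketch of A-formulas and the enabled projection; the names (AFormula, enabled_projection) are illustrative and not from the paper.

    from dataclasses import dataclass
    from typing import FrozenSet, Tuple

    # An assertion is a pair (variable, polarity); a propositional interpretation
    # is a set of assertions containing exactly one polarity per variable.
    Assertion = Tuple[str, bool]

    @dataclass(frozen=True)
    class AFormula:
        base: str                                   # stand-in for the base formula C
        assertions: FrozenSet[Assertion] = frozenset()

        def enabled_in(self, j: FrozenSet[Assertion]) -> bool:
            return self.assertions <= j             # A ⊆ J

    def enabled_projection(n, j):
        """N_J: projections of all A-formulas in N enabled in J."""
        return {c.base for c in n if c.enabled_in(j)}

    n = {AFormula("¬p(a)"),
         AFormula("p(x)", frozenset({("v0", True)})),
         AFormula("q(y,b)", frozenset({("v1", True)}))}
    j = frozenset({("v0", True), ("v1", False)})
    print(enabled_projection(n, j))   # {'¬p(a)', 'p(x)'}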
A propositional interpretation J is a propositional model of N, written J |=_⊥ N, if ⊥ ∉ (N_⊥)_J (i.e., (N_⊥)_J = ∅). Moreover, we write J |≈_⊥ N if ⊥ ∉ (N_⊥)_J or fml(J) |≈ {⊥}. A set N is propositionally satisfiable if there exists an interpretation J such that J |=_⊥ N. In contrast to consequence relations, propositional modelhood |=_⊥ interprets the set N_⊥ conjunctively: J |=_⊥ N is informally understood as J |= ∧N_⊥.

Given consequence relations |= and |≈, we lift them from P(F) to P(AF): M |= N if and only if M_J |= N_J for every J in which N is enabled, and M |≈ N if and only if fml(J) ∪ M_J |≈ N_J for every J in which N is enabled. The consequence relation |= is used for the completeness of the splitting prover and only captures what inferences such a prover must perform. In contrast, |≈ captures a stronger semantics: For example, thanks to fml(J) among the premises for |≈, the A-formula fml(a) ← {a} is always a |≈-tautology. Also note that, assuming ∅ |≈ ∅ does not hold, |= ⊆ |≈ on sets that contain exclusively propositional clauses. When needed, we use |≈_F to denote |≈ on P(F) and analogously for |≈_AF, as well as |=_F and |=_AF.

Lemma 4. The relations |= and |≈ on P(AF) are consequence relations.

Proof. We consider only |≈; the proof for |= is analogous. For (D1), we need to show fml(J) ∪ {⊥} |≈ ∅ for every J because ∅ is always enabled. This follows from (D1) and (D3). For (D2), we need to show fml(J) ∪ {C ← A}_J |≈ {C ← A}_J, assuming C ← A is enabled in J. Hence it suffices to show fml(J) ∪ {C} |≈ {C}, which follows from (D2) and (D3). For (D3), it suffices to show fml(J) ∪ M′_J |≈ N′_J, assuming that N′ is enabled in J and that fml(J) ∪ M_J |≈ N_J, where M ⊆ M′ and N ⊆ N′. This follows from (D3) and the monotonicity of (·)_J. For (D4), we need to show fml(J) ∪ (M ∪ M′)_J |≈ N_J ∪ N′_J, assuming fml(J) ∪ M_J |≈ N_J ∪ {C} if C ← A is enabled in J, fml(J) ∪ M′_J ∪ {C ← A}_J |≈ N′_J, and N ∪ N′ is enabled in J. This follows directly from (D4) if C ← A is enabled in J, and from (D3) if C ← A is not enabled.

Finally, we show the compactness of |≈_AF (D5), using the compactness of propositional logic. First we consider the case where N is never enabled. Then the set of assertions in N, seen as conjunctions of propositional literals, is unsatisfiable. By compactness, there exists a finite subset of these assertions that is also unsatisfiable, i.e., there is a finite subset N′ of N that is also never enabled. Thus for any finite subset M′ of M, M′ |≈ N′ as wanted.

Otherwise, there is at least one J enabling N. By abuse of notation, we write N_A even if A ⊆ A is not an interpretation. For every interpretation J in which N is enabled, there exist by compactness of |≈_F finite sets J′ ⊆ J, M_J ⊆ M, and N_J ⊆ N such that fml(J′) ∪ M_J |≈ N_J. Define E = {⊥ ← {¬a} | a ∈ A for some C ← A ∈ N}. Note that J |=_⊥ E if and only if N is enabled in J. This observation implies that the sets of propositional clauses E and {⊥ ← J′ | J interpretation where N is enabled} ∪ E are, respectively, propositionally satisfiable and propositionally unsatisfiable. By compactness, there exists a finite unsatisfiable subset {⊥ ← J′_1, ..., ⊥ ← J′_n} ∪ E′ of the latter set. Let M′ = ⋃_i M_{J_i} and N′ = ⋃_i N_{J_i} ∪ N″, where J_i is any of the interpretations enabling N that is at the origin of the existence of the corresponding J′_i, and N″ is a finite subset of N such that all assertions in E′ also occur negated in N″.
Note that both M′ and N′ are finite sets. It now suffices to show M′ |≈ N′. Thus let J be an interpretation in which N′ is enabled. Then J |=_⊥ E′ because all assertions in E′ also appear negated in N″ ⊆ N′. Thus, since {⊥ ← J′_1, ..., ⊥ ← J′_n} ∪ E′ is unsatisfiable, there must exist an index k such that J does not satisfy ⊥ ← J′_k, that is, J′_k ⊆ J. We have fml(J′_k) ∪ M_{J_k} |≈ N_{J_k} by construction, and thus fml(J) ∪ ⋃_i M_{J_i} |≈ ⋃_i N_{J_i} ∪ N″ by (D3).

Given sets M, N ⊆ F, the expression M |= N can refer to either the base consequence relation on P(F) or the lifted consequence relation on P(AF) (since F ⊆ AF). Fortunately, there is no ambiguity. First, let us show a preparatory lemma:

Lemma 5. Let |= be a consequence relation on F, and M, N ⊆ F. If M′ |= N′ for all M′ ⊇ M and N′ ⊇ N such that M′ ∪ N′ = F, then M |= N.

Proof. By contraposition, we assume that M |= N does not hold, and we need to find M′ ⊇ M and N′ ⊇ N such that M′ ∪ N′ = F and M′ |= N′ does not hold. We apply Zorn's lemma to obtain a maximal element (M′, N′) of the set {(M″, N″) | M ⊆ M″, N ⊆ N″, and M″ |= N″ does not hold} with the order (M_1, N_1) ≤ (M_2, N_2) if and only if M_1 ⊆ M_2 and N_1 ⊆ N_2. Compactness of |=, together with (D3), guarantees that every chain in this set has an upper bound; for nonempty chains, this is the pairwise union of all the elements in the chain. It remains to show that M′ ∪ N′ = F. Assume to the contrary that C ∉ M′ ∪ N′ for some C. Due to the maximality of (M′, N′), we necessarily have M′ ∪ {C} |= N′ and M′ |= N′ ∪ {C}. Applying the cut rule for |=, we get M′ |= N′, a contradiction.

Lemma 6. The two versions of |= coincide on F-formulas, and similarly for |≈.

Proof. The first property is obvious. For the second property, the argument is as follows. Let M, N ⊆ F. Then we must show that M |≈_F N if and only if M |≈_AF N. First assume that M |≈_F N. Then clearly fml(J) ∪ M |≈_F N for any J by (D3), and thus M |≈_AF N. Assuming M |≈_AF N, we show M |≈_F N using Lemma 5. It thus suffices to show that M′ |≈_F N′ for every M′ ⊇ M and N′ ⊇ N such that M′ ∪ N′ = F. Set J = {v | fml(v) ∈ M′} ∪ {¬v | fml(v) ∉ M′}. Then fml(J) ∪ M ∪ ∼N′ ⊆ M′ ∪ ∼N′. By the assumption M |≈_AF N we have fml(J) ∪ M |≈_F N and thus M′ |≈_F N′ via (D3).

Aside from resolving ambiguity, Lemma 6 justifies the use of splitting in provers without compromising soundness or completeness: When we prove a completeness theorem that claims that a given prover derives ⊥ from any initial |=_AF-unsatisfiable set M ⊆ AF, Lemma 6 allows us to conclude that it also derives ⊥ when starting from any initial |=_F-unsatisfiable set M ⊆ F.

Given a formula C ∈ F, let asn(C) denote the set of assertions a ∈ A such that {fml(a)} |≈| {C}. Normally, we would make sure that asn(C) is nonempty for every formula C. Given a ∈ asn(C), observe that if a ∈ asn(D), then {C} |≈| {D}, and if ¬a ∈ asn(D), then {C} |≈| {∼D}.

Remark 7. Our propositional interpretations are always total. We could also consider partial interpretations—that is, J ⊆ A such that at most one of v ∈ J and ¬v ∈ J holds for every v ∈ V. But this is not necessary, because partial interpretations can be simulated by total ones: For every variable v in the partial interpretation, we can use two variables v⁺ and v⁻ in the total interpretation and interpret v as true if v⁺ is true and v as false if v⁻ is true. By adding the propositional clause ⊥ ← {v⁻, v⁺}, every total model of the translated A-formulas corresponds to a partial model of the original A-formulas.
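The simulation of partial interpretations in Remark 7 amounts to a simple encoding and decoding step; the helper names below are illustrative assumptions.

    # Illustrative encoding of partial interpretations by total ones (Remark 7):
    # each variable v gets two fresh variables v+ and v-; the propositional
    # clause ⊥ ← {v+, v-} forbids making v both true and false.
    def blocking_clause(v):
        """Assertion set of the ⊥-A-formula that rules out v+ and v- together."""
        return {(v + "+", True), (v + "-", True)}

    def decode_model(total_j, variables):
        """Read back a partial interpretation from a total model."""
        partial = set()
        for v in variables:
            if (v + "+", True) in total_j:
                partial.add((v, True))
            elif (v + "-", True) in total_j:
                partial.add((v, False))
            # neither: v is left unassigned in the partial interpretation
        return partial

    total = {("v0+", True), ("v0-", False), ("v1+", False), ("v1-", False)}
    print(decode_model(total, ["v0", "v1"]))   # {('v0', True)}: v1 stays unassigned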
Example 8. In the original description of AVATAR [28], the connection between first-order clauses and assertions takes the form of a function [·] : F → A. The encoding is such that [¬C] = ¬[C] for every ground unit clause C and [C] = [D] if and only if C is syntactically equal to D up to variable renaming. This can be supported in our framework by letting fml(v) = C for some C such that [C] = v, for every propositional variable v. A different encoding is used to exploit the theories of an SMT solver [4]. With a notion of |≈-entailment that gives a suitable meaning to Skolem symbols, we can go further and have [¬C(sk_{¬C(x)})] = ¬[C(x)]. Even if the superposition prover considers sk_{¬C(x)} an uninterpreted symbol (according to |=), the SAT or SMT solver can safely prune the search space by assuming that C(x) and ¬C(sk_{¬C(x)}) are exhaustive (according to |≈).

3 Splitting Calculi

Let F be a set of base formulas equipped with ⊥, |=, and |≈. The consequence relation |≈ is assumed to be nontrivial:

(D6) ∅ |≈ ∅ does not hold.

Let A be a set of assertions over V, and let AF be the set of A-formulas over F and A. Let (FInf, FRed) be a base calculus for F-formulas, where FRed is a redundancy criterion that additionally satisfies

(R5) FInf(F, FRed_F(N)) ⊆ FRed_I(N) for every N ⊆ F;
(R6) ⊥ ∉ FRed_F(N) for every N ⊆ F;
(R7) C ∈ FRed_F({⊥}) for every C ≠ ⊥.

These requirements can easily be met by a well-designed redundancy criterion. Requirement (R5) is called reducedness by Waldmann et al. [30, Sect. 2.3]. Requirement (R6) must hold of any complete calculus (Lemma 2), and (R7) can be made without loss of generality (Remark 3). Bachmair and Ganzinger's redundancy criterion for superposition [1, Sect. 4.3] meets (R1)–(R7).

From a base calculus, we will define an induced splitting calculus (SInf, SRed). We will show that the splitting calculus is sound w.r.t. |≈ and that it is statically and dynamically complete w.r.t. |=. Furthermore, we will show two stronger results that take into account the switching of propositional models that characterizes most splitting architectures: strong static completeness and strong dynamic completeness.

3.1 The Inference Rules

We start with the mandatory inference rules.

Definition 9. The splitting inference system SInf consists of all instances of the following two rules:

    Base:                               Unsat:
        (C_i ← A_i)_{i=1}^n                 (⊥ ← A_i)_{i=1}^n
        ───────────────────                 ─────────────────
        D ← A_1 ∪ ··· ∪ A_n                 ⊥

For Base, the side condition is (C_n, ..., C_1, D) ∈ FInf. For Unsat, the side condition is that {⊥ ← A_1, ..., ⊥ ← A_n} is propositionally unsatisfiable.

In addition, the following optional inference rules can be used if desired; the completeness proof does not depend on their application. Rules identified by double bars, such as Split, are simplifications; they replace their premises with their conclusions in the current A-formula set. The premises' removal is justified by SRed_F, defined in Sect. 3.2.

    Split:
        C ← A
        ═════════════════════════════════════════════════
        ⊥ ← {¬a_1, ..., ¬a_n} ∪ A    (C_i ← {a_i})_{i=1}^n

In the Split rule, we require that C ≠ ⊥ is splittable into C_1, ..., C_n and that a_i ∈ asn(C_i) for each i. A formula C is splittable into formulas C_1, ..., C_n if n ≥ 2, {C} |≈ {C_1, ..., C_n}, and C ∈ FRed_F({C_i}) for each i.

Split performs an n-way case analysis on C. Each case C_i is approximated by an assertion a_i. The first conclusion expresses that the cases are exhaustive. The n other conclusions assume C_i if its approximation a_i is true.
In a clausal prover, typically C = C_1 ∨ ··· ∨ C_n, where the subclauses C_i have mutually disjoint sets of variables and form a maximal split. For example, the clause p(x) ∨ q(x) is not splittable because of the shared variable x, whereas p(x) ∨ q(y) can be split into {p(x), q(y)}.

    Collect:                                Trim:
        (⊥ ← A_i)_{i=1}^n    C ← A              (⊥ ← A_i)_{i=1}^n    C ← A ∪ B
        ══════════════════════════              ═══════════════════════════════
        (⊥ ← A_i)_{i=1}^n                       (⊥ ← A_i)_{i=1}^n    C ← B

For Collect, we require C ≠ ⊥ and {⊥ ← A_i}_{i=1}^n |≈ {⊥ ← A}. For Trim, we require C ≠ ⊥ and {⊥ ← A_i}_{i=1}^n ∪ {⊥ ← A} |≈ {⊥ ← B}.

Collect removes A-formulas whose assertions cannot be satisfied by any model of the propositional clauses—a form of garbage collection. Similarly, Trim removes assertions that are entailed by existing propositional clauses.

    StrongUnsat:               Approx:                 Tauto:
        (⊥ ← A_i)_{i=1}^n          C ← A
        ─────────────────          ───────────────         ──────
        ⊥                          ⊥ ← {¬a} ∪ A            C ← A

For StrongUnsat, we require {⊥ ← A_i}_{i=1}^n |≈ {⊥}. For Approx, we require a ∈ asn(C). For Tauto, we require |≈ {C ← A}.

StrongUnsat is a variant of Unsat that uses |≈ instead of |=. A splitting prover may choose to apply StrongUnsat if desired, but only Unsat is necessary for completeness. In practice, |≈-entailment can be much more expensive to decide, or even be undecidable. A splitting prover could invoke an SMT solver [4] (|≈) with a time limit, falling back on a SAT solver (|=) if necessary. Approx can be used to make any derived A-formula visible to |≈. It is similar to a one-way split. Tauto, which asserts a |≈-tautology, allows communication in the other direction, from the SMT or SAT solver to the calculus.

Example 10. Suppose the base calculus is first-order resolution [2] and the initial clauses are ¬p(a), ¬q(z, z), and p(x) ∨ q(y, b), as in Sect. 1. Split replaces the last clause by ⊥ ← {¬v_0, ¬v_1}, p(x) ← {v_0}, and q(y, b) ← {v_1}. Two Base inferences then generate ⊥ ← {v_0} and ⊥ ← {v_1}. Finally, Unsat generates ⊥.
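The maximal variable-disjoint split can be computed by grouping literals into connected components of the shares-a-variable relation. The sketch below is a toy illustration; the string-based literal encoding and the vars_of helper are assumptions, not part of the paper.

    # Maximal split of a clause into variable-disjoint subclauses: literals that
    # share a variable (transitively) must stay in the same subclause.
    def maximal_split(clause, vars_of):
        """clause: list of literals; vars_of: literal -> set of variable names."""
        components = []              # each component: (set of variables, list of literals)
        for lit in clause:
            vs = set(vars_of(lit))
            merged_vars, merged_lits = set(vs), [lit]
            rest = []
            for comp_vars, comp_lits in components:
                if comp_vars & vs:                  # shares a variable: merge
                    merged_vars |= comp_vars
                    merged_lits = comp_lits + merged_lits
                else:
                    rest.append((comp_vars, comp_lits))
            components = rest + [(merged_vars, merged_lits)]
        return [lits for _, lits in components]

    vars_of = lambda lit: {c for c in lit if c in "xyz"}   # toy variable extraction
    print(maximal_split(["p(x)", "q(x)"], vars_of))    # [['p(x)', 'q(x)']]: n = 1, not splittable
    print(maximal_split(["p(x)", "q(y,b)"], vars_of))  # [['p(x)'], ['q(y,b)']]: splittable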
|≈ (on P(F)), then the rule Base is sound w.r.t. |≈ (on P(AF)). Proof Cases Unsat, StrongUnsat, Tauto:Trivial. Case Split: For the left conclusion, by definition of |≈, it suffices to show fml(J)∪{C}|≈{⊥} for every J ⊇ A∪{¬a ,..., ¬a }. By the side condition {C}|≈{C ,..., C }, it suffices in 1 n 1 n turn to show fml(J)∪{C }|≈{⊥} for every i. Notice that ∼C ∈ fml(J). The entailment i i amounts to fml(J)\{∼C } ∪{C }|≈{C }, which follows from (D2) and (D3). i i i For the right conclusions, we must show fml(J)∪{C ← A} |≈{C } for every J ⊇{a }. J i i Notice that C ∈ fml(J). The desired result follows from (D2) and (D3). Case Collect: We must show {⊥ ← A } |≈ {C ← A}. This follows from the stronger i=1 side condition {⊥ ← A } |≈ {⊥ ← A}. i=1 Case Trim: Only the right conclusion is nontrivial. Let N ={⊥ ← A } . It suffices to i=1 show N ∪{C ← A} |≈ {C} for every J ⊇ B. Assume J |≈ N ∪{C ← A} . By the side J J J J condition N ∪{⊥← A}|≈{⊥ ← B}, we get N ∪{⊥← A} |≈ {⊥}, meaning that either J J N |≈ {⊥} or J ⊇ A. The first case is trivial. In the other case, J |≈ N ∪{C} and thus J J J |≈ {C}, as required. Case Approx: The proof is as for the left conclusion of Split. Case Base:To show {C ← A } |≈ {D ← A ∪ ··· ∪ A }, by the definition of |≈ on i i 1 n i=1 P(AF), it suffices to show {C ,..., C }|≈{D}. This follows from the soundness of the 1 n inferences in FInf. 123 Unifying Splitting Page 11 of 44 16 3.2 The Redundancy Criterion Next, we lift the base redundancy criterion. Definition 15 The splitting redundancy criterion SRed = (SRed , SRed ) is specified as I F follows. An A-formula C ← A ∈ AF is redundant w.r.t. N , written C ← A ∈ SRed (N ), if either of these conditions is met: (1) C ∈ FRed (N ) for every propositional interpretation J ⊇ A;or F J (2) there exists an A-formula C ← B ∈ N such that B ⊂ A. An inference ι ∈ SInf is redundant w.r.t. N , written ι ∈ SRed (N ), if either of these conditions is met: (3) ι is a Base inference and {ι} ⊆ FRed (N ) for every J;or J J (4) ι is an Unsat inference and ⊥∈ N . Condition (1) lifts FRed to A-formulas. It is used both as such and to justify the Split and Collect rules, as we will see below. Condition (2) is used to justify Trim. We will use SRed to justify global A-formula deletion, but also FRed for local A-formula deletion in F F the locking prover. Note that SRed is not reduced. Inference redundancy partly commutes with the enabled projection: Lemma 16 (SRed (N )) ⊆ FRed (N ) if ⊥ ∈ / N . I J I J Proof Since ⊥ ∈ / N , condition (4) of the definition of SRed cannot apply. The inclusion then follows directly from condition (3) applied to the interpretation J. Lemma 17 ⊥ ∈ / SRed (N ) for every N ⊆ AF. Proof By Lemma 2, condition (1) of the definition of SRed cannot apply. Nor can condi- tion (2). Lemma 18 SRed is a redundancy criterion. Proof We will first show that the restriction ARed of SRed to Base inferences is a redundancy criterion. Then we will consider Unsat inferences. ∩G , We start by showing that ARed is a special case of the redundancy criterion FRed of Waldmann et al. [29,Sect.3]—the intersection of lifted redundancy criteria with tiebreaker orders. Then we can simply invoke Theorem 37 and Lemma 19 from their technical report [30]. To strengthen the redundancy criterion, we define a tiebreaker order such that C ← A D ← B if and only if C = D and A ⊂ B. In this way, C ← B is redundant w.r.t. C ← A if A ⊂ B, even though the base clause is the same. 
The only requirement on is that it must be well founded, which is the case since the assertion sets of A-formulas are finite. We also define a family of grounding functions G indexed by a propositional model J. Here, “grounding” will mean enabled projection. For A-formulas C , we set G (C ) ={C } . For J J inferences ι, we set G (ι) ={ι} . J J We must show that G satisfies the following characteristic properties of grounding func- tion: (G1) G (⊥) ={⊥};(G2)for every C ∈ AF, if ⊥∈ G (C ), then C =⊥; and (G3) for J J every ι ∈ SInf, G (ι) ⊆ FRed (G (concl(ι))). J I J Condition (G1) obviously holds, and (G3) holds by property (R4) of FRed. However, (G2) does not hold, a counterexample being ⊥←{a}. On closer inspection, Waldmann et al. use 123 16 Page 12 of 44 G. Ebner et al. (G2) only to prove static completeness (Theorems 27 and 45 in their technical report) but ∩G , not to establish that FRed is a redundancy criterion, so we can proceed. It is a routine ∩G , ∩G , ∩G exercise to check that ARed coincides with FRed = (FRed , FRed ), which is defined as follows: ∩G 1. ι ∈ FRed (N ) if and only if for every propositional interpretation J, we have G (ι) ⊆ FRed (G (N )); I J ∩G , 2. C ∈ FRed (N ) if and only if for every propositional interpretation J and every D ∈ G (C ), either D ∈ FRed (G (N )) or there exists C ∈ N such that C C and J F J D ∈ G (C ). We also need to check that the consequence relation | used in SRed coincides with the ∩ ∩ consequence relation | , which is defined as M | {C } if and only if for every J and G G D ∈ G ({C }),wehave G (M ) | {D}. After expanding G , this is exactly the definition we J J J used for lifting | to AF. To extend the above result to SRed, we must show the second half of conditions (R2) and (R3) as well as (R4) for Unsat inferences. (R2) Given an Unsat inference ι, we must show that if M ⊆ N and ι ∈ SRed (M ), then ι ∈ SRed (N ). This holds because if ⊥∈ M , then ⊥∈ N . (R3) Given an Unsat inference ι, we must show that if M ⊆ SRed (N ) and ι ∈ SRed (N ), F I then ι ∈ SRed (N \M ). This amounts to proving that if ⊥∈ N , then ⊥∈ N \M , which follows from Lemma 17. (R4) Given an Unsat inference ι, we must show that if ⊥∈ N , then ι ∈ SRed (N ). This follows from the definition of SRed . SRed is highly versatile. It can justify the deletion of A-formulas that are propositionally tautological, such as C ←{v, ¬v}. It lifts the base redundancy criterion gracefully: If D ∈ n n FRed ({C } ), then D ← A ∪ ··· ∪ A ∈ SRed ({C ← A } ). It also allows other F i 1 n F i i i=1 i=1 simplifications, as long as the assertions on A-formulas used to simplify a given C ← A are contained in A. If the base criterion FRed supports subsumption (e.g., following the lines of Waldmann et al. [29]), this also extends to A-formulas: D ← B ∈ SRed ({C ← A}) if D is strictly subsumed by C and B ⊇ A,orif C = D and B ⊃ A. Finally, it is strong enough to justify case splits and the other simplification rules presented in Sect. 3.1. Theorem 19 (Simplification) For every Split, Collect,or Trim inference, the conclusions collectively make the premises redundant according to SRed . Proof Case Split: We must show C ← A ∈ SRed ({⊥ ← {¬a ,..., ¬a }∪ A}∪{C ← F 1 n i {a }} ). By condition (1) of the definition of SRed , it suffices to show C ∈ FRed ({⊥ ← i F F i=1 {¬a ,..., ¬a }} ∪({C ←{a }} ) for every J ⊇ A. If a ∈ J for some i, this follows from 1 n J i i J i i=1 Split’s side condition C ∈ FRed ({C }). Otherwise, this follows from (R7), the requirement F i that C ∈ FRed ({⊥}),since C =⊥. 
Case Collect: We must show C ← A ∈ SRed ({⊥ ← A } ). By condition (1) of the F i i=1 definition of SRed , it suffices to show C ∈ FRed (({⊥ ← A } ) ) for every J ⊇ A. If F F i J i=1 A ⊆ J for some i, this follows from Collect’s side condition that C =⊥ and (R7). Otherwise, from the side condition {⊥ ← A } |≈ {⊥ ← A}, we obtain ∅|≈{⊥}, which i=1 contradicts (D6). Case Trim: We must show C ← A ∪ B ∈ SRed ({⊥ ← A } ∪{C ← B}). This follows F i i=1 directly from condition (2) of the definition of SRed . 123 Unifying Splitting Page 13 of 44 16 Annoyingly, the redundancy criterion SRed does not mesh well with α-equivalence. We would expect the A-formula p(x) ←{a} to be subsumed by p(y) ←∅, where x, y are vari- ables, but this is not covered by condition (2) of SRed because p(x) = p(y). The simplest solution is to take F to be the quotient of some set of raw formulas by α-equivalence. An alternative is to generalize the theory so that the projection operator G generates entire α-equivalence classes (e.g., G ({p(x)}) ={p(x), p(y), p(z), ... }) or groundings (e.g., G ({p(x)}) ={p(a), p(f(a)), . . . }). Waldmann et al. describe the second approach [29, Sect. 4]. 3.3 Standard Saturation We will now prove that the splitting calculus is statically complete and therefore dynamically complete. Unfortunately, derivations produced by most practical splitting architectures vio- late the fairness condition associated with dynamic completeness. Nevertheless, the standard completeness notions are useful stepping stones, so we start with them. Lemma 20 Let N ⊆ AF be an A-formula set, and let J be a propositional interpretation. If N is saturated w.r.t. SInf and SRed , then N is saturated w.r.t. FInf and FRed . I J I Proof Assuming ι ∈ FInf(N ), we must show ι ∈ FRed (N ). The argument follows that of J I J the “folklore” Lemma 26 in the technical report of Waldmann et al. [30]. First note that any inference in FInf is lifted, via Base,in SInf, so that we have ι ∈ (SInf(N )) . This means that there exists a Base inference ι ∈ SInf(N ). By saturation of N , we have ι ∈ SRed (N ). 0 0 I By definition of SRed , {ι } ={ι}⊆ FRed (N ), as required. I 0 I J J Theorem 21 (Static completeness) Assume (FInf, FRed) is statically complete. Then (SInf, SRed) is statically complete. Proof Suppose N ⊆ AF, N | {⊥}, and N is saturated w.r.t. SInf and SRed . We will show ⊥∈ N . First, we show ⊥∈ N for every J. From N | {⊥}, by the definition of | on A- formulas, it follows that N | {⊥}. Moreover, by Lemma 20, N is saturated w.r.t. FInf and J J FRed . By static completeness of (FInf, FRed), we get ⊥∈ N . I J Hence N is propositionally unsatisfiable. By compactness of propositional logic, there exists a finite subset M ⊆ N such that M is propositionally unsatisfiable. By saturation w.r.t. Unsat,weobtain ⊥∈ N , as required. Thanks to the requirements on the redundancy criterion, we obtain dynamic completeness as a corollary: Corollary 22 (Dynamic completeness) Assume (FInf, FRed) is statically complete. Then (SInf, SRed) is dynamically complete. Proof This immediately follows from Theorem 21 by Lemma 6 in the technical report of Waldmann et al. [30]. 3.4 Local Saturation The above completeness result, about -derivations, can be extended to prover designs SRed based on the given clause procedure, such as the Otter, DISCOUNT, and Zipperposition 123 16 Page 14 of 44 G. Ebner et al. loops, as explained by Waldmann et al. [29, Sect. 4]. But it fails to capture a crucial aspect of most splitting architectures. 
Since -derivations have no notion of current split branch SRed or propositional model, they place no restrictions on which inferences may be performed when. To fully capture splitting, we need to start with a weaker notion of saturation. If an A- formula set is consistent, it should suffice to saturate w.r.t. a single propositional model. In other words, if no A-formula ⊥← A such that A ⊆ J is derivable for some model J | N , the prover will never be able to apply the Unsat rule to derive ⊥. It should then be allowed to deliver a verdict of “consistent.” We will call such model-specific saturations local and standard saturations global. Definition 23 Aset N ⊆ AF is locally saturated w.r.t. SInf and SRed if either ⊥∈ N or there exists a propositional model J | N such that N is saturated w.r.t. FInf and FRed . ⊥ I Local saturation works in tandem with strong static completeness: Theorem 24 (Strong static completeness) Assume (FInf, FRed) is statically complete. Given a set N ⊆ AF that is locally saturated w.r.t. SInf and SRed and such that N | {⊥}, we have ⊥∈ N . Proof We show ⊥∈ N by case analysis on the condition by which N is locally saturated. The first case is vacuous. Otherwise, let J | N . Since N | {⊥}, we have N | {⊥}. By ⊥ J the definition of local saturation and static completeness of (FInf, FRed), we get ⊥∈ N , contradicting J | N . Example 25 Consider the following A-clause set expressed using AVATAR conventions: {⊥ ← {¬[p(x)], ¬[q(y)]}, p(x) ←{[p(x)]}, q(y) ←{[q(y)]}, ¬q(a)} It is not globally saturated for resolution, because the conclusion ⊥←{[q(y)]} of resolving the last two A-clauses is missing, but it is locally saturated with J ⊇{[p(x)], ¬[q(y)]} as the witness in Definition 23. We also need a notion of local fairness that works in tandem with local saturation. Definition 26 A sequence (N ) of sets of A-formulas is locally fair w.r.t. SInf and SRed i i I if either ⊥∈ N for some i or there exists a propositional model J | (N ) such that i ∞ ⊥ FInf((N ) ) ⊆ FRed ((N ) ). ∞ J I i J Lemma 27 Let (N ) be a -derivation that is locally fair w.r.t. SInf and SRed . Then i i SRed I the limit inferior N is locally saturated w.r.t. SInf and SRed . ∞ I Proof The proof is by case analysis on the condition by which (N ) is locally fair. If ⊥∈ N , i i i then⊥∈ N by Lemma 17,and N is therefore locally saturated. In the remaining case, we ∞ ∞ have N ⊆ N ∪SRed (N ) by Lemma 4 in the technical report of Waldmann et al. [30], and i ∞ F ∞ therefore FRed ((N ) ) ⊆ FRed ((N ) ∪ FRed ((N ) )) = FRed ((N ) ) I i I ∞ F ∞ I ∞ J J J J i i i because we clearly have (SRed (N )) ⊆ FRed ((N ) ) ∪ (N ) . F ∞ J F ∞ J ∞ J Local fairness works in tandem with strong dynamic completeness. Theorem 28 (Strong dynamic completeness) Assume (FInf, FRed) is statically complete. Given a -derivation (N ) that is locally fair w.r.t. SInf and SRed and such that N | SRed i i I 0 {⊥}, we have ⊥∈ N for some i. 123 Unifying Splitting Page 15 of 44 16 Proof We connect the dynamic and static points of view along the lines of the proof of Lemma 6 in the technical report of Waldmann et al. [30]. First, we show that the limit inferior is inconsistent: N | {⊥}. We have N ⊇ N | {⊥}, and by (R1), it follows ∞ i 0 that ( N )\SRed ( N ) | {⊥}. By their Lemma 2, ( N )\SRed ( N ) ⊆ N . i F i i F i ∞ i i i i Hence N ⊇ ( N )\SRed ( N ) | {⊥}. By Lemma 27, N is locally saturated, so ∞ i F i ∞ i i by Theorem 24, ⊥∈ N . Thus, ⊥∈ N for some i. 
∞ i An alternative proof based on dynamic completeness follows: Proof We show ⊥∈ N for some i by case analysis on the condition by which (N ) is locally i i i fair. The first case is vacuous. Otherwise, we have J | (N ) . Since N | {⊥}, we have ∞ ⊥ 0 (N ) | {⊥}. By the definition of local fairness and Theorem 22,weget ⊥∈ (N ) for 0 J i J some i. By Lemma 2 and the definition of , we obtain ⊥∈ (N ) , contradicting FRed ∞ F J J | (N ) . ∞ ⊥ In Sects. 4 to 6, we will review three transition systems of increasing complexity, culmi- nating with an idealized specification of AVATAR. They will be linked by a chain of stepwise refinements, like pearls on a string. All derivations using these systems will correspond to -derivations, and their fairness criteria will imply local fairness. Consequently, by SRed Theorem 28, they will all be complete. 4 Model-Guided Provers The transition system provides a very abstract notion of splitting prover. AVATAR and SRed other splitting architectures maintain a model of the propositional clauses, which represents the split tree’s current branch. We can capture this abstractly by refining -derivations SRed to incorporate a propositional model. 4.1 The Transition Rules The states are now pairs (J, N ), where J is a propositional interpretation and N ⊆ AF. Initial states have the form (J, N), where N ⊆ F. The model-guided prover MG is defined by the following transition rules: Derive (J, N M ) ⇒ (J, N M ) ifM ⊆ SRed (N M ) MG F Switch (J, N ) ⇒ (J , N ) ifJ | N MG ⊥ StrongUnsat (J, N ) ⇒ (J, N ∪{⊥}) ifN |≈ {⊥} MG ⊥ The Derive rule can add new A-formulas (M ) and delete redundant A-formulas (M ). In practice, Derive will perform only sound or consistency-preserving inferences, but we impose no such restriction. If soundness of a prover is desired, it can be derived easily from the soundness of the individual inferences. Similarly, M and M will usually be enabled in J, but we do not require this. The interpretation J should be a model of N most of the time; when it is not, Switch can be used to switch interpretation or StrongUnsat to finish the refutation. Although the condition J | (N ) might be violated for some i, to make progress we must periodically i i ⊥ check it and apply Switch as needed. Much of the work that is performed while the condi- tion is violated will likely be wasted. To avoid this waste, Vampire invokes the SAT solver whenever it selects a clause as part of the given clause procedure. 123 16 Page 16 of 44 G. Ebner et al. Transitions can be combined to form ⇒ -derivations (pronounced “arrow-MG- MG derivations”). Lemma 29 If (J, N ) ⇒ (J , N ), then N N . MG SRed Proof The only rule that deletes A-formulas, Derive, exclusively takes out A-formulas that are redundant w.r.t. the next state, as mandated by . SRed To develop our intuitions, we will study several examples of ⇒ -derivations. In all the MG examples in this section, the base calculus is first-order resolution, and | is entailment for first-order logic with equality. Example 30 Let us revisit Example 10. Initially, the propositional interpretation is J = {¬v , ¬v }. After the split, we have the A-clauses ¬p(a), ¬q(z, z), p(x) ←{v }, q(y, b) ← 0 1 0 {v }, and ⊥←{¬v , ¬v }. The natural option is to switch interpretation. We take J = 1 0 1 1 {v , ¬v }. We then derive ⊥←{v }. Since J | ⊥ ← {v }, we switch to J ={¬v , v }, 0 1 0 1 0 2 0 1 where we derive ⊥←{v }. Finally, we detect that the propositional clauses are unsatisfiable and generate ⊥. 
This corresponds to the transitions below, where arrows are annotated by transition names and light gray boxes identify enabled A-clauses: (J , { ¬p(a), ¬q(z, z), p(x) ∨ q(y, b) }) ⇒ (J , { ¬p(a), ¬q(z, z), ⊥←{¬v , ¬v } , p(x) ←{v }, q(y, b) ←{v }}) Derive 0 0 1 0 1 ⇒ (J , { ¬p(a), ¬q(z, z), ⊥←{¬v , ¬v }, p(x) ←{v } , q(y, b) ←{v }}) 1 0 1 0 1 Switch ⇒ (J , { ¬p(a), ¬q(z, z), ⊥←{¬v , ¬v }, p(x) ←{v } , q(y, b) ←{v }, Derive 1 0 1 0 1 ⊥←{v }}) ⇒ (J , { ¬p(a), ¬q(z, z), ⊥←{¬v , ¬v }, p(x) ←{v }, q(y, b) ←{v } , Switch 2 0 1 0 1 ⊥←{v }}) ⇒ (J , { ¬p(a), ¬q(z, z), ⊥←{¬v , ¬v }, p(x) ←{v }, q(y, b) ←{v } , Derive 2 0 1 0 1 ⊥←{v }, ⊥←{v }}) 0 1 ⇒ (J , { ¬p(a), ¬q(z, z), ⊥←{¬v , ¬v }, p(x) ←{v }, q(y, b) ←{v } , 2 0 1 0 1 Strong- Unsat ⊥←{v }, ⊥←{v } , ⊥}) 0 1 4.2 Fairness We need a fairness criterion for MG that implies local fairness of the underlying - SRed derivation. The latter requires a witness J but gives us no hint as to where to look for one. This is where basic topology comes into play. Definition 31 A propositional interpretation J is a limit point in a sequence (J ) if there i i exists a subsequence (J ) of (J ) such that J = J = J . i i i i ∞ Intuitively, a limit point is a propositional interpretation that is the limit of a family of interpretations that we revisit infinitely often. We will see that there always exists a limit point. To achieve fairness, we will focus on saturating a limit point. 123 Unifying Splitting Page 17 of 44 16 Fig. 1 A split tree with a single infinite branch Fig. 2 A split tree with two infinite branches Example 32 Let (J ) be the sequence such that J ∩V ={v , v ,..., v } (i.e., v , v ,..., i i 2i 1 3 2i−1 1 3 v are true and the other variables are false) and J = (J \{¬v }) ∪{v }. Although it 2i−1 2i+1 2i 2i 2i is not in the sequence, the interpretation J ∩ V ={v , v ,...} is a limit point. The split tree 1 3 of (J ) is depicted in Fig. 1. The direct path from the root to a node labeled J specifies the i i i assertions that are true in J . The limit point J corresponds to the only infinite branch of the tree. The above example hints at why the proof of MG’s dynamic completeness is nontrivial: Some derivations might involve infinitely many split branches, making it difficult for the prover to focus on any single one and saturate it. Example 33 A sequence may have multiple limit points. Let (J ) be the sequence such that i i J ∩ V =∅, J ∩ V ={v }∪{v | j < i}, J ∩ V ={v , v }∪{v | j < i}, 0 4i+1 0 4 j+3 4i+2 0 4i+2 4 j+3 J ∩ V ={v | j ≤ i}, and J ∩ V ={v | j ≤ i}∪{v }. This sequence 4i+3 4 j+1 4i+4 4 j+1 4i+4 has two limit points: J = lim inf J and J = lim inf J . The split tree is i→∞ 4i+1 i→∞ 4i+3 depicted in Fig. 2. Lemma 34 Let (J ) be a sequence of propositional interpretations. Then J ⊆ J ⊆ J for i i ∞ all of its limit points J. Proof By definition of limit point, there must exist a subsequence (J ) of (J ) such that i i i J = J = J. It is obvious by definition that the limit inferior of a subsequence must be a superset of the limit inferior of the original sequence, and analogously that the limit superior of a subsequence must be a subset of the limit superior of the original sequence. Lemma 35 Every sequence (J ) of propositional interpretations has at least one limit point. i i Proof The set of propositional interpretations is homeomorphic to the set of functions V → {0, 1} equipped with the product topology. Since V is countable, this set of functions is a compact metric space—namely, the Cantor space. 
In a compact metric space, every sequence has a convergent subsequence, and thus a limit point in our notation.

We can nearly as easily supply an elementary proof:

Proof We construct a subsequence (J′_j)_j converging to a limit point J in such a way that J′_j gets the first j variables right—i.e., such that J′_j ⊨ v_k if and only if J ⊨ v_k for every k ≤ j. Moreover, we maintain the invariant that there are infinitely many elements in the sequence (J_i)_i that agree with this finite prefix. Assume that we have already defined J′_0, . . . , J′_j. Among the infinitely many elements J_i that agree with J′_j on v_0, . . . , v_j, there must be infinitely many with J_i ⊨ v_{j+1} or infinitely many with J_i ⊨ ¬v_{j+1}. In the first case, set J′_{j+1} = J_i for one such index i, and analogously in the second case.

Lemma 35 tells us that every sequence has a limit point. No matter how erratically the prover switches branches, it will systematically explore at least one branch in a limit point. It then suffices to perform the base FInf-inferences fairly in that branch:

Definition 36 An ⇒_MG-derivation (J_i, N_i)_i is fair if either (1) ⊥ ∈ N_i for some i or (2) J_i ⊨ (N_i)_⊥ for infinitely many indices i and there exists a limit point J of (J_i)_i such that FInf((N_∞)_J) ⊆ ⋃_i FRed_I((N_i)_J).

Until ⊥ is derived, it is impossible in a fair ⇒_MG-derivation to delay Switch forever (by the first half of (2)) or to starve Derive by performing only Switch transitions (by the second half of (2)). Also note that we make no assumptions about the order in which propositional models are enumerated; the propositional solver is given carte blanche.

We might at first expect that a realistic prover would ensure the inclusion FInf((N_∞)_J) ⊆ ⋃_i FRed_I((N_i)_J) for all limit points J. However, a prover like Vampire, based on the given clause procedure with an age-based heuristic, might saturate only one of the limit points, as we will see in Sect. 6.2.

Fairness of ⇒_MG-derivations is deliberately defined in terms of FRed_I instead of SRed_I. This results in a more suitable notion of fairness, since it allows the prover to ignore formulas and inferences that are locally redundant at the limit point but not redundant w.r.t. (SInf, SRed). For example, the inference (t ≈ s, p(t), p(s)) is locally redundant in J ⊇ {v} if the A-clause p(s) ← {v} has already been derived, but it is not redundant w.r.t. (SInf, SRed).

Lemma 37 Let (J_i, N_i)_i be an ⇒_MG-derivation such that J_i ⊨ (N_i)_⊥ for infinitely many indices i. Then for every D ∈ (N_∞)_⊥, there exists an index i such that J_j ⊨ {D} for every j ≥ i.

Proof Let D = ⊥ ← A ∈ (N_∞)_⊥. Then ⊥ ← A ∈ N_k for some k. For every j ≥ k, we then have ⊥ ← B ∈ N_j for some B ⊆ A, since every ⇒_MG-derivation is a ▷_SRed-derivation and ⊥ ← A can only become redundant due to such a ⊥ ← B. Since {⊥ ← B} ⊨ {⊥ ← A}, we get (N_j)_⊥ ⊨ {⊥ ← A} for every j ≥ k. By the assumption, there exists an index i ≥ k such that J_i ⊨ {⊥ ← A}. For j ≥ i, the interpretation changes only in the Switch transition, which has J_{j+1} ⊨ (N_j)_⊥ as a side condition. Since (N_j)_⊥ ⊨ {⊥ ← A}, we have J_j ⊨ {⊥ ← A} for every j ≥ i.

Lemma 38 Let (J_i, N_i)_i be an ⇒_MG-derivation such that J_i ⊨ (N_i)_⊥ for infinitely many indices i, and let J be a limit point of (J_i)_i. Then J ⊨ (N_∞)_⊥.

Proof Let C ∈ (N_∞)_⊥. By Lemma 37, there exists an index i such that J_j ⊨ {C} for every j ≥ i. Let (J_j)_j be the subsequence associated with the limit point J.
Then there also exists an index i such that J | {C } for every j ≥ i and hence J | {C }. 123 Unifying Splitting Page 19 of 44 16 In the spirit of refinement, we have that fairness of an ⇒ -derivation implies local MG fairness of the underlying -derivation: SRed Theorem 39 (Fairness) Let (J , N ) be a fair ⇒ -derivation. Then (N ) is a - i i i MG i i SRed derivation that is locally fair w.r.t. SInf and SRed . Proof The case where ⊥∈ N for some i is trivial. Otherwise, we have that J | (N ) for i i i ⊥ infinitely many i and there is a limit point J such that FInf((N ) ) ⊆ FRed ((N ) ). We ∞ J I i J take this limit point as the witness for J in Definition 26. It remains to show that J | (N ) . ∞ ⊥ This follows from Lemma 38. Corollary 40 (Dynamic completeness) Assume (FInf, FRed) is statically complete. Given a fair ⇒ -derivation (J , N ) such that N | {⊥}, we have ⊥∈ N for some i. MG i i i 0 i Proof By Theorem 28. A well-behaved propositional solver, as in labeled splitting, enumerates potential models in a systematic way and always gives rise to a single limit point J , which can be taken for J in the definition of fairness (Definition 36). To achieve this kind of fairness, a splitting prover would perform all inferences from persistently enabled A-formulas—that is, A-formulas that eventually become enabled and remain enabled forever. In a prover based on the given clause procedure, this can be implemented in the standard way, using an age-based selection heuristic [27, Sect. 4]. However, such a strategy is not sufficient if the prover exploits local redundancy, as we will see in Sects. 5 and 7.2, even if the propositional solver is well behaved. By contrast, an unconstrained solver, as supported by AVATAR, can produce multiple limit points; in particular, the restart feature of SAT solvers [20] could produce this kind of behavior. Then it is more challenging to ensure fairness, as we will see in Sect. 6. Example 41 Suppose that we leave out ¬q(z, z) from the initial clause set of Example 10. Then we can still derive ⊥←{v }, as in Example 30, but not ⊥←{v }. By static completeness 0 1 of the splitting calculus, we conclude that the A-clause set is consistent. Example 42 Consider the initial clause set consisting of p(x)∨q(a) and ¬q(y)∨q(f(y)). With- out splitting, and without selection [2, Sect. 3], a resolution prover would diverge attempting to generate infinitely many clauses of the form p(x) ∨ q(f (a)). By contrast, in a splitting prover, we might split the first clause, yielding the A-clauses p(x) ←{v }, q(a) ←{v }, and ⊥←{¬v , ¬v }. If we then choose the model {v } and 0 1 0 1 1 commit to it, we will also diverge, although somewhat faster since we do not need to carry around the literal p(x). On the other hand, if we at any point switch to {v }, we notice that {p(x)} is saturated and terminate. This illustrates the benefits of employing an unconstrained SAT solver. Example 43 It is crucial to invoke the SAT solver often enough—in other words, to take Switch and StrongUnsat transitions periodically. Suppose that the inconsistent initial clause set of Example 10 is supplemented by the prolific but unhelpful clauses r(a) and ¬r(x)∨r(f(x)). We can perform the same split as before, but if we ignore the fairness condition that J | (N ) must hold infinitely often, we can stick to the interpretation {¬v , ¬v } i i ⊥ 1 2 and derive useless consequences of the form r(f (a)) forever, thereby failing to generate ⊥. 
Similarly, the SAT solver must be invoked eventually after deriving a propositional clause ⊥ ← A that conflicts with the current interpretation.

Example 44 Consider the consistent set consisting of ¬p(x), p(a) ∨ q(a), and ¬q(y) ∨ p(f(y)) ∨ q(f(y)). Splitting the second clause into p(a) and q(a) and resolving q(a) with the third clause yields p(f(a)) ∨ q(f(a)). This process can be iterated to yield arbitrarily many applications of f. Now suppose that v_{2i} and v_{2i+1} are associated with p(f^i(a)) and q(f^i(a)), respectively. If we split every emerging clause p(f^i(a)) ∨ q(f^i(a)) and the SAT solver always makes v_{2i} true before v_{2i+1}, we end up with the situation of Example 32 and Fig. 1. For the limit point J, all FInf-inferences are performed. Thus, the derivation is fair.

Example 45 We build a clause set from two copies of Example 44, where each clause C from each copy i ∈ {1, 2} is extended to ¬r_i ∨ C. We add the clause r_1 ∨ r_2 and split it as our first move. From there, each branch imitates Example 44. A SAT solver might jump back and forth between them, as in Example 33 and Fig. 2. Even if the A-clauses get disabled and re-enabled infinitely often, we must select them eventually and perform all nonredundant inferences in at least one of the two limit points (J or J′).

5 Locking Provers

With both AVATAR and labeled splitting, an enabled A-clause can be redundant locally, w.r.t. the current interpretation, and yet nonredundant globally. Both architectures provide mechanisms to temporarily lock away such A-clauses and unlock them when coming back to an interpretation where they are no longer locally redundant. In AVATAR, conditionally deleted A-clauses are stored in the locked set; in labeled splitting, they are stored in the split stack. We will refine the model-guided prover into a locking prover that captures these mechanisms.

5.1 The Transition Rules

The states of a locking derivation are triples (J, N, L), where J is a propositional interpretation, N ⊆ AF is a set of A-formulas, and L ⊆ P_fin(A) × AF is a set of pairs of finite assertion sets and A-formulas. Intuitively, (B, C ← A) ∈ L means that C ← A is "locally redundant" in all interpretations J ⊇ B. The function ⌊·⌋ erases the locks: ⌊L⌋ = {C | (B, C) ∈ L for some B}. Initial states have the form (J, N, ∅), where N ⊆ F. The locking prover L is defined by two transition rules:

Lift (J, N, L) ⇒_L (J′, N′ ∪ ⌊U⌋, L \ U) if (J, N) ⇒_MG (J′, N′) and U = {(B, C ← A) ∈ L | B ⊈ J′ and A ⊆ J′}

Lock (J, N ⊎ {C ← A}, L) ⇒_L (J, N, L ∪ {(B, C ← A)}) if B ⊆ J and C ∈ FRed_F(N_{J′}) for every J′ ⊇ A ∪ B

The Lift rule performs an ⇒_MG-transition and unlocks any A-formulas that are no longer locally redundant. The Lock rule can be used to lock A-formulas that are locally redundant.

Lemma 46 Let (J_i, N_i, L_i)_i be an ⇒_L-derivation. Then (J_i, N_i ∪ ⌊L_i⌋)_i is an ⇒_MG-derivation.

Proof Every Lift transition clearly corresponds to an MG transition. Every Lock transition corresponds to a Derive transition with M = M′ = ∅.

Example 47 Let J_0 = {¬v_0} and J_1 = {v_0}.
The following derivation based on first-order 0 0 1 0 resolution illustrates the locking and unlocking of an A-clause: (J , { ¬p(a), p(x) ←{¬v } , p(a) }, ∅) 0 0 ⇒ (J , { ¬p(a), p(x) ←{¬v }}, {({¬v }, p(a))}) 0 0 0 Lock ⇒ (J , { ¬p(a), p(x) ←{¬v } , ⊥←{¬v }}, {({¬v }, p(a))}) 0 0 0 0 Lift ⇒ (J , { ¬p(a), p(x) ←{¬v }, ⊥←{¬v }, p(a) }, ∅) 1 0 0 Lift ⇒ (J , { ¬p(a), p(x) ←{¬v }, ⊥←{¬v }, p(a), ⊥}, ∅) Lift 1 0 0 Gray boxes indicate enabled unlocked clauses. We first put a lock on p(a), because it is “locally subsumed” by p(x) ←{¬v } in J . Once we switch to J , the lock is released, and 0 0 1 we can use p(a) to conclude the refutation. There are three things to note. First, if we had simply thrown away the clause p(a) instead of locking it, we would have lost refutability. Second, it would have been advantageous not to lock p(a) at all and to use it immediately to derive ⊥; however, it is not difficult to come up with examples where locking actually helps, which is why AVATAR includes this mechanism. Third, although the derivation shows only “local subsumption,” it could easily be changed to perform “local simplification”—e.g., demodulation from an equation s ≈ t ← A. 5.2 Counterexamples Locking can cause incompleteness, because an A-formula can be locally redundant at every point in the derivation and yet not be so at any limit point, thereby breaking local saturation. Forexample,ifwehavederived p(x) ←{¬v } for every k, then p(c) is locally redundant in any interpretation J that contains ¬v . If the sequence of interpretations is given by J ={v ,..., v , ¬v , ¬v ,...}, the clause p(c) would always be locally redundant and i 0 i−1 i i+1 never be considered for inferences. Yet p(c) might not be locally redundant at the unique limit point J = V. Example 48 Consider the inconsistent initial clause set {t(a), ¬t(x) ∨ t(f(x)), ¬t(x) ∨¬s(x) ∨¬r(x, y) ∨ q(y), ¬p(c), ¬q(c), r(x, y), ¬r(x, y) ∨¬r(x, z) ∨ q(x) ∨ p(x)} and ordered resolution with selection as the calculus. Assume that the selection function always chooses the maximal negative literals w.r.t. the precedence p ≺ q ≺ r ≺ s ≺ t. Let i i fml(v ) =¬r(f (a), x) ∨ q(x) and fml(w ) =¬s(f (a)), and let v and w be false in the initial i i i i model for all i. Following an age-based selection heuristic and maximal splitting, the second clause the prover derives is ¬s(a) ∨¬r(a, y) ∨ q(y), which it splits into ¬s(a) ←{w } and ¬r(a, y) ∨ q(y) ←{v }. The s predicate’s role is purely to ensure that this clause is split and that the assertion v is introduced. The prover later also derives q(x) ∨ p(x) and q(y) ←{v } 0 0 and switches to a model in which v is true if and only if i = 0. The first of the two clauses is clearly locally redundant, so Lock applies, and ({v }, q(x) ∨ p(x)) is added to L . Next, q(y)←{v } is derived, before q(y)←{v } is selected for inferences. Eventually, that 1 0 latter clause can be used together with ¬q(c) to derive ⊥←{v }. The prover then switches to a new model in which v is true if and only if i = 1. The clause q(x) ∨ p(x) can immediately 123 16 Page 22 of 44 G. Ebner et al. be relocked. This process can be repeated indefinitely. The clause q(x) ∨ p(x),which is necessary for a refutation (together with ¬p(c) and ¬q(c)), is ignored because it is always locally redundant. It is locked each time the prover selects an A-clause for inferences, due to a different A-clause. But it is not locally redundant at the limit point J =¬V. 
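To make the Lock and Lift bookkeeping of Sect. 5.1 concrete, the following Python sketch tracks a locking state under a deliberately simplified encoding: interpretations and assertions are sets of signed propositional variables, base clauses are opaque strings, and the local-redundancy side condition of Lock is left to the caller. All names (Interp, AClause, LockingState) are ours, not the paper's; the final lines replay the locking and unlocking of p(a) from Example 47.

```python
# Minimal sketch of the locking prover's bookkeeping (Lock and the unlocking
# part of Lift), under our own simplified encoding.

from dataclasses import dataclass, field

Lit = tuple[str, bool]          # (variable, polarity); ("v0", False) stands for ¬v0
Interp = frozenset[Lit]         # a propositional interpretation as a set of literals

def holds(lits: frozenset[Lit], j: Interp) -> bool:
    """B ⊆ J (or A ⊆ J): every literal of the set is true in J."""
    return lits <= j

@dataclass(frozen=True)
class AClause:
    clause: str                  # the base clause C, kept opaque here
    assertions: frozenset[Lit]   # the A part of C ← A

@dataclass
class LockingState:
    j: Interp
    n: set[AClause] = field(default_factory=set)   # unlocked A-clauses
    locks: set[tuple[frozenset[Lit], AClause]] = field(default_factory=set)  # (B, C ← A)

    def lock(self, c: AClause, b: frozenset[Lit]) -> None:
        """Lock: the caller guarantees C is locally redundant in every J' ⊇ A ∪ B."""
        assert holds(b, self.j) and c in self.n
        self.n.remove(c)
        self.locks.add((b, c))

    def switch(self, new_j: Interp) -> None:
        """Unlocking part of Lift: release (B, C ← A) once B ⊄ J' and A ⊆ J'."""
        self.j = new_j
        released = {(b, c) for (b, c) in self.locks
                    if not holds(b, new_j) and holds(c.assertions, new_j)}
        self.locks -= released
        self.n |= {c for (_, c) in released}

# Example 47: p(a) is locked under B = {¬v0} in J0 = {¬v0} and released in J1 = {v0}.
j0: Interp = frozenset({("v0", False)})
j1: Interp = frozenset({("v0", True)})
p_a = AClause("p(a)", frozenset())
st = LockingState(j0, {p_a})
st.lock(p_a, frozenset({("v0", False)}))
st.switch(j1)
assert p_a in st.n and not st.locks
```

The switch method implements exactly the unlocking condition of Lift: a lock (B, C ← A) is released once B is no longer a subset of the interpretation and C ← A is enabled again.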
In the derivation in Example 48, locking is not applied exhaustively: The A-clause ¬r(f (a), y) ∨ q(y) ←{v } is not locked, even though q(y) ←{v } has already been derived. i j This situation is unrealistic and would not happen in Vampire. We could hope that is enough for completeness to forbid such anomalous scenarios. However, this is not the case, as we can see from a more complicated example: Example 49 The calculus is ordered resolution with selection using the precedence p ≺ q ≺ q ≺ r ≺ r ≺ s ≺ t ≺ t ≺ u, selecting nothing if the clause is of the form 1 2 1 2 1 2 ¬u(...) ∨ u(...) and otherwise selecting the maximal negative literals. The initial clauses are as follows. First, we have a splittable clause q (x)∨q (y). Then we 1 2 have clauses u(a, y) and ¬u(x, y) ∨ u(f(x), y). We will use the predicate symbol u to delay the selection of a clause in an age-based selection heuristic, by adding a literal ¬u(f (x), y) to the clause. Moreover, we have the clause s(x, y). We can prevent splitting by adding the literal ¬s(x, y) to a clause. Finally, we add the following clauses: ¬u(f(y), x) ∨ r (x) ∨¬q (x) ¬u(f(y), x) ∨¬r (x) ¬t (x) ∨¬s(x, y) ∨¬p(x) ∨ r (y) 1 1 1 1 ¬u(f(y), x) ∨ r (x) ∨¬q (x) ¬u(f(y), x) ∨¬r (x) ¬t (x) ∨¬s(x, y) ∨¬p(x) ∨ r (y) 2 2 2 2 2 t (a) t (f(a)) ¬t (x) ∨ t (f(f(x))) ¬t (x) ∨ t (f(f(x))) 1 2 1 1 2 2 The initial clause set is clearly inconsistent. Yet we will sketch an infinite derivation that corresponds to an age-based selection heuristic and that does not derive ⊥. First, we split q (x) ∨ q (y) into q (x) ←{x } and q (x) ←{x }, where the assertion denotations are as 1 2 1 1 2 2 follows: 2i i fml(w ) =¬s(f (a), x) ∨ r (x) fml(v ) =¬p(f (a)) 2i 1 i 2i+1 fml(w ) =¬s(f (a), x) ∨ r (x) fml(x ) = q (x) 2i+1 2 i The derivation uses the following sequence of interpretations J : – J | v if and only if j < i; i j – J | w if and only if j ∈{i, i + 1}; i j – J | x if and only if i is even; i 1 – J | x if and only if i is odd. i 2 The derivation thus alternates between two families of interpretations, with even and odd indices, giving rise to two limit points. After the clause q (x) ∨q (y) is split, the prover is in the model J and derives the clauses 1 2 0 ¬u(y, x) ∨ r (x) ∨¬q (x), ¬u(y, x) ∨ r (x) ∨¬q (x), ¬s(a, y) ∨ r (y) ∨¬p(a). The last 1 2 1 1 2 clause is split into ¬s(a, y) ∨ r (y) ←{w }, ¬p(a) ←{v }, and ⊥←{¬v , ¬w }. Then 1 0 0 0 0 an analogous split happens with r instead of r . After a few more inferences, we derive 2 1 r (x) ∨¬q (x) and then r (y) ←{w }, which makes r (x) ∨¬q (x) locally redundant (and 1 1 0 1 1 1 analogously for r in place of r ). Once r (y)←{w } is picked as the given clause, the prover 2 1 1 0 derives ⊥←{w } and switches to the next model J . 0 1 In the first family, J , the clause ¬q (x)∨r (x) is always locally redundant due to r (x)← 2i 1 1 {w }, and is locked with the assertion w . Similarly, ¬q (x) ∨ r (x) is locally redundant in 2i 2i 2 the second family, J , with the assertion w . In both cases we can already lock each 2i+1 2i+1 of the clauses while the prover is still in the previous model (J and J , respectively). 2i−1 2i 123 Unifying Splitting Page 23 of 44 16 The clause ¬q (x) ∨ r (x) is thus only ever unlocked in interpretations J . In those 2 2i interpretations, q (x) ←{x } is disabled and hence no inferences can be performed with 2 2 ¬q (x) ∨ r (x). The same holds mutatis mutandis for ¬q (x) ∨ r (x), which is unlocked 2 1 2 1 only when no inferences can be performed with it. 
As a result, the derivation never performs inferences with q (x) ←{x } or q (x) ←{x }. Removing these A-clauses makes the set 1 2 1 2 satisfiable; thus, by soundness, the derivation cannot contain ⊥. Given the right sequence of propositional interpretations returned by the SAT solver, the derivation in Example 49 could potentially happen in a prover such as Vampire. It is difficult to exclude that the SAT solver used by Vampire could produce this sequence, or generally to characterize the sequence of models produced by SAT solvers in such a concrete way. This derivation is also strongly fair—every inference that is possible infinitely often, perhaps intermittently, is eventually made redundant. Thus strong fairness is not a sufficient criterion for completeness either. 5.3 Fairness Our solution to the issues encountered above is as follows. Let (J , N , L ) be an ⇒ - i i i i L derivation. Given a subsequence (J ) of (J ) ,let (N ) be the corresponding subsequence of j i i j j j (N ) . To achieve fairness, we now consider N , the A-formulas persistent in the subsequence i i (N ) . By contrast, with no A-formulas locked away, fairness of ⇒ -derivations could j MG use N . Definition 50 An ⇒ -derivation (J , N , L ) is fair if (A) L =∅ and (B) either (1) L i i i i 0 ⊥∈ N or (2) J | (N ) for infinitely many indices i and there exists a subsequence (J ) i i i ⊥ j i j converging to a limit point J such that FInf((N ) ∪ (lim sup L ) \L ) J J J ∞ j→∞ ⊆ FRed ((N ∪ L ) ),where (N ) and (L ) correspond to (J ) . I i i J j j j j j j Fairness of an ⇒ -derivation implies fairness of the corresponding ⇒ -derivation. L MG The condition on the sets L ensures that inferences from A-formulas that are locked infinitely often, but not infinitely often with the same lock, are redundant at the limit point. In particular, if we know that each A-formula is locked at most finitely often, then lim sup L = L and the inclusion in the definition above simplifies to j→∞ FInf((N ) ) ⊆ FRed ((N ∪ L ) ). J I i i J Theorem 51 (Fairness) Let (J , N , L ) be a fair ⇒ -derivation. Then (J , N ∪ L ) i i i i L i i i i is a fair ⇒ -derivation. MG Proof We already showed that (J , N , L ) is an ⇒ -derivation in Lemma 46. It remains i i i i MG to show fairness. If the ⇒ -derivation is fair by case (1) of Definition 50,weapplycase (1) of Definition 36 to show that the ⇒ -derivation is fair. Otherwise, from case (2) of MG Definition 50,weretrievealimit point J. We will show, for that limit point, case (2) of Definition 36: FInf((lim inf N ∪ L ) ) ⊆ FRed ((N ∪ L ) ) i→∞ i i J I i i J Assume (a) ι ∈ FInf((lim inf N ∪ L ) ). By the definition of fairness of ⇒ - j→∞ j j J L derivations, if all of ι’s premises belong to (N ) ∪ ((lim sup L ) \(L ) ), J J J ∞ j→∞ then ι is redundant. Otherwise, we have that (b) one of ι’s premises, C, is not in that set; that is, C ∈ / (N ) and either C ∈ / (lim sup L ) or C ∈ (L ) . J J J ∞ j→∞ 123 16 Page 24 of 44 G. Ebner et al. By (a) we have that C ← A ∈ lim inf N ∪ L ⊆ lim inf N ∪ L for j→∞ j j j→∞ j j some A ⊆ J.Since C ∈ / (N ) by (b), C ← A cannot be persistent in (N ) and hence J j ∞ j must occur infinitely often in the sequence (L ) . Thus C ∈ (lim sup L ) and j J j j→∞ j therefore C ∈ (L ) by (b). Hence (B, C ← A ) ∈ L for some A ⊆ J and B.If (B, C ← A ) ∈ L for some j, then necessarily B ⊆ J due to the side conditions of the ⇒ -transitions. Since this is true for infinitely many indices j,wealsohave B ⊆ J = J, and thus C ∈ FRed((N ) ) for i J some i by the side condition of the Lock transition. 
Therefore, by reducedness of FRed, the inference ι is redundant.

Corollary 52 (Dynamic completeness) Assume (FInf, FRed) is statically complete. Given a fair ⇒_L-derivation (J_i, N_i, L_i)_i such that N_0 ⊨ {⊥}, we have ⊥ ∈ N_i for some i.

Proof By Theorems 28 and 39.

6 AVATAR-Based Provers

AVATAR was unveiled in 2014 by Voronkov [28], although it was reportedly present in Vampire already in 2012. Since then, he and his colleagues studied many options and extensions [4, 22]. At least two reimplementations exist, in Ebner's super tactic for Lean [13] and in the Drodi prover by Oscar Contreras. Here we attempt to capture AVATAR's essence.

We will define an abstract AVATAR-based prover that extends the locking prover L with a given clause procedure [19, Sect. 2.3]. A-formulas are moved in turn from the passive to the active set, where inferences are performed. The heuristic for choosing the next given A-formula to move is guided by timestamps indicating when the A-formulas were derived, to ensure fairness.

6.1 The Transition Rules

Let TAF = AF × ℕ be the set of timestamped A-formulas. (We will often omit the adjective "timestamped.") Given a subset N ⊆ TAF, we define ⌊N⌋ = {C | (C, t) ∈ N for some t} and overload existing notations to erase timestamps as necessary; accordingly, N_⊥ and N_J stand for ⌊N⌋_⊥ and ⌊N⌋_J. Note that we use a new set of calligraphic letters (e.g., C, N) to range over timestamped A-formulas and timestamped A-formula sets. We say that N is enabled in J if and only if ⌊N⌋ is enabled in J. We also define ⌊ι⌋ = (⌊C_1⌋, . . . , ⌊C_n⌋, ⌊D⌋) for TAF-inferences ι = (C_1, . . . , C_n, D).

Using the saturation framework [29, Sect. 3], we lift a calculus (SInf, SRed) on AF to a calculus (TSInf, TSRed) on TAF with the tiebreaker order < on timestamps. The tiebreaker is used to strengthen redundancy: if the same A-formula appears with two different timestamps, the more recent version is considered redundant. Note that TSRed is in general not reduced. Traditionally, provers use the active or passive status as tiebreaker: An active clause may subsume a passive copy of itself, but not the other way around. Timestamps are more fine-grained.

Lemma 53 Let N ⊆ AF, C ∈ AF, and t, k ∈ ℕ. Then: 1. TSInf(N) = SInf(N); 2. TSRed_I(N) = SRed_I(N); 3. TSRed_F(N) ⊇ SRed_F(N); and 4. (C, t + k) ∈ TSRed_F({(C, t)}) if k > 0.

Proof This follows directly from the definition in Waldmann et al. [29].

A state is a tuple (J, A, P, Q, L) consisting of a propositional interpretation J, a set of enabled nonpropositional active A-formulas A ⊆ TAF, a set of enabled nonpropositional passive A-formulas P ⊆ TAF, a set of A-formulas Q ⊆ TAF that are either disabled in J or propositional clauses, and a set of locked A-formulas L ⊆ P_fin(A) × TAF such that (1) A_⊥ = P_⊥ = ∅; (2) A ∪ P is enabled in J; (3) Q_J ⊆ {⊥}. Whenever we write a tuple (J, A, P, Q, L), we assume that it satisfies all these invariants. For every L ⊆ P_fin(A) × TAF, we define ⌊L⌋ = {(B, ⌊C⌋) | (B, C) ∈ L} ⊆ P_fin(A) × AF.

The input formulas are first put in the passive set P. Once an A-formula is selected for inferences and all inferences with it and the active A-formulas have been made redundant, it is moved to the active set A. Inferences such as Split produce disabled and propositional clauses, which are put into Q.
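The state invariants are easy to miss in prose, so here is a small Python sketch that represents a state (J, A, P, Q) under our own encoding (timestamped A-clauses as (clause, assertions, timestamp) triples, with "⊥" standing for the false clause) and checks invariants (1)–(3) explicitly. The function and type names are ours; L is omitted because the invariants do not constrain it.

```python
# Illustrative encoding of AV states and their invariants; not the paper's notation.

Lit = tuple[str, bool]
TAClause = tuple[str, frozenset[Lit], int]    # (C, A, t), where C == "⊥" encodes ⊥ ← A

def enabled(c: TAClause, j: frozenset[Lit]) -> bool:
    return c[1] <= j                          # A ⊆ J

def is_prop(c: TAClause) -> bool:
    return c[0] == "⊥"                        # a propositional clause ⊥ ← A

def check_invariants(j: frozenset[Lit],
                     a: set[TAClause], p: set[TAClause], q: set[TAClause]) -> None:
    # (1) A and P contain no propositional clauses.
    assert not any(is_prop(c) for c in a | p)
    # (2) Every A-clause in A ∪ P is enabled in J.
    assert all(enabled(c, j) for c in a | p)
    # (3) The enabled part of Q consists of propositional clauses only (Q_J ⊆ {⊥}).
    assert all(is_prop(c) for c in q if enabled(c, j))

# A possible initial state: the input clauses of Example 30 are passive, Q is empty.
j0: frozenset[Lit] = frozenset({("v0", False), ("v1", False)})
passive = {("¬p(a)", frozenset(), 0), ("¬q(z,z)", frozenset(), 1),
           ("p(x) ∨ q(y,b)", frozenset(), 2)}
check_invariants(j0, set(), passive, set())
```

In this encoding, a Switch to a new interpretation must move A-clauses between P and Q precisely so that check_invariants continues to succeed.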
When switching to a new model, the prover moves the newly enabled A-formulas from Q to P and the newly disabled A-formulas from A and P to Q to preserve the state invariant.

The division of nonactive A-formulas into the sets P and Q is done for notational convenience; for example, P is a separate set because fairness will be stated in terms of A and P. In a practical implementation, this division would likely be different. The set Q would typically be distributed over two data structures: The propositional clauses in Q are directly passed to the SAT solver and need not be stored by the prover itself. The remaining A-formulas Q \ Q_⊥ are those that need to be moved back into P when the prover switches to an interpretation that enables them. These might be stored in the same data structure as the set of locked A-formulas L, which also need to be reactivated depending on the interpretation. This is what Vampire does. Alternatively, they could be stored in the same data structure as P, with the prover checking on every access whether an A-formula is enabled in the current interpretation.

The AVATAR-based prover AV is defined as the following transitions:

Infer (J, A, P ⊎ {C}, Q, L) ⇒_AV (J, A ∪ {C}, P′, Q′, L) if TSInf(A, {C}) ⊆ TSRed_I(A ∪ {C} ∪ P′ ∪ Q′), P ⊆ P′, and Q ⊆ Q′

Process (J, A, P, Q, L) ⇒_AV (J, A′, P′, Q′, L) if A ⊇ A′ and (A \ A′) ∪ (P \ P′) ∪ (Q \ Q′) ⊆ TSRed_F(A′ ∪ P′ ∪ Q′)

Switch (J, A, P, Q, L) ⇒_AV (J′, A′, P′ ∪ {C | (B, C) ∈ U}, Q′, L \ U) if J ⊭ Q_⊥, J′ ⊨ Q_⊥, A′ = {C ∈ A | C is enabled in J′}, U = {(B, (C ← A, t)) ∈ L | B ⊈ J′ and A ⊆ J′}, and A ∪ P ∪ Q = A′ ∪ P′ ∪ Q′

StrongUnsat (J, A, P, Q, L) ⇒_AV (J, A, P, Q ∪ {(⊥, t)}, L) if Q_⊥ |≈ {⊥}

LockA (J, A ⊎ {(C ← A, t)}, P, Q, L) ⇒_AV (J, A, P, Q, L ∪ {(B, (C ← A, t))}) if B ⊆ J and C ∈ FRed_F((A ∪ P)_{J′}) for every J′ ⊇ A ∪ B

There is also a LockP rule that is identical to LockA except that it starts in the state (J, A, P ⊎ {(C ← A, t)}, Q, L). An AV-derivation is well timestamped if every A-formula introduced by a rule is assigned a unique timestamp. In practice, a prover would ensure well-timestampedness by assigning timestamps monotonically, but this is not necessary for the fairness and completeness proofs.

Lemma 54 Let (J_i, A_i, P_i, Q_i, L_i)_i be an ⇒_AV-derivation. Then (J_i, A_i ∪ P_i ∪ Q_i, L_i)_i is an ⇒_L-derivation.

Proof The transitions map directly to the corresponding transitions in ⇒_L; both Infer and Process map to a Lift of Derive.

Lemma 55 Let (J_i, A_i, P_i, Q_i, L_i)_i be an ⇒_AV-derivation such that A_0 = ∅. Then TSInf(A_i) ⊆ TSRed_I(A_i ∪ P_i ∪ Q_i ∪ L_i) for every i.

Proof The invariant is preserved by each transition.

Example 56 Let us redo the ⇒_MG-derivation of Example 30 using ⇒_AV. For readability, we emphasize in gray the A-clauses that appear or move between state components and omit all timestamps.
One possible derivation is (J , ∅, {¬p(a), ¬q(z, z), p(x) ∨ q(y, b)}, ∅, ∅) ⇒ (J , { ¬p(a) }, {¬q(z, z), p(x) ∨ q(y, b)}, ∅, ∅) Infer ⇒ (J , {¬p(a), ¬q(z, z) }, {p(x) ∨ q(y, b)}, ∅, ∅) Infer 0 ⇒ (J , {¬p(a), ¬q(z, z)}, ∅, Process 0 {⊥←{¬v , ¬v } , p(x) ←{v } , q(y, b) ←{v }}, ∅) 0 1 0 1 ⇒ (J , {¬p(a), ¬q(z, z)}, { p(x) ←{v }}, 1 0 Switch {⊥ ← {¬v , ¬v }, q(y, b) ←{v }}, ∅) 0 1 1 ⇒ (J , {¬p(a), ¬q(z, z), p(x) ←{v }}, ∅, 1 0 Infer {⊥ ← {¬v , ¬v }, q(y, b) ←{v }, ⊥←{v }}, ∅) 0 1 1 0 ⇒ (J , {¬p(a), ¬q(z, z)}, { q(y, b) ←{v }}, 2 1 Switch {⊥ ← {¬v , ¬v }, ⊥←{v }, p(x) ←{v }}, ∅) 0 1 0 0 ⇒ (J , {¬p(a), ¬q(z, z), q(y, b) ←{v }}, ∅, Infer 2 1 {⊥ ← {¬v , ¬v }, ⊥←{v }, p(x) ←{v }, ⊥←{v }}, ∅) 0 1 0 0 1 ⇒ (J , {¬p(a), ¬q(z, z), q(y, b) ←{v }}, ∅, Strong- 2 1 Unsat {⊥ ← {¬v , ¬v }, ⊥←{v }, p(x) ←{v }, ⊥←{v }, ⊥}, ∅) 0 1 0 0 1 Example 57 Let us redo the ⇒ -derivation of Example 47 using ⇒ , following the same L AV conventions as in the previous example. One possible derivation is (J , ∅, {¬p(a), p(x) ←{¬v }, p(a)}, ∅, ∅) 0 0 ⇒ (J , ∅, {¬p(a), p(x) ←{¬v }}, ∅, {({¬v }, p(a))}) 0 0 0 LockP ⇒ (J , { ¬p(a) }, {p(x) ←{¬v }}, ∅, {({¬v }, p(a))}) 0 0 0 Infer ⇒ (J , {¬p(a), p(x) ←{¬v }}, ∅, {⊥←{¬v }}, {({¬v }, p(a))}) 0 0 0 0 Infer ⇒ (J , {¬p(a)}, { p(a) }, {⊥ ← {¬v }, p(x) ←{¬v }}, ∅) Switch 1 0 0 ⇒ (J , {¬p(a), p(a) }, ∅, {⊥ ← {¬v }, p(x) ←{¬v }, ⊥}, ∅) 1 0 0 Infer 123 Unifying Splitting Page 27 of 44 16 6.2 Counterexamples In contrast with nonsplitting provers, for AV, fairness w.r.t. formulas does not imply fairness w.r.t. inferences. To ensure fairness in a nonsplitting prover, it suffices to select the oldest formula for inferences infinitely often; for example, provers can alternate between choosing the oldest and choosing the heuristically best formula. In splitting provers, such a strategy is incomplete, and we need an even stronger fairness criterion. A problematic scenario involves two premises C, D of a binary inference ι and four transitions repeated forever, with other steps interleaved: Infer makes C active; Switch disables it; Infer makes D active; Switch disables it. Even though C and D are selected in a strongly fair fashion, ι is never performed. Example 58 More concretely, make two copies of the clause set {¬p(x) ∨ p(f(x)) ∨ q(x), p(a), ¬q(x), ¬p(x) ∨ q(f(x))} one with the assertion {x }, the other with {x }, in addition to the propositional clause ⊥← 1 2 {¬x , ¬x }. Suppose the prover starts with the model {x } and processes the clauses in the 1 2 1 order given above. It first chooses the A-clause ¬p(x) ∨p(f(x)) ∨q(x)←{x } for inferences, i i followed by p(a) ←{x }. Let fml(v ) = p(f (a)) and fml(w ) = q(f (a)). By resolution and 1 i i splitting, it derives p(f(a)) ←{v }, q(a) ←{w }, and ⊥←{x , ¬v , ¬w }. It then switches 1 0 1 1 0 to a model in which x is true. There it selects ¬p(x)∨p(f(x))∨q(x)←{x } and p(a)←{x } 2 2 2 for inferences, deriving analogous A-clauses as in the x branch. Let the models alternate between the x and x branches, making as few variables true as 1 2 possible. Because the models alternate between the two branches, ¬p(x) ∨ p(f(x)) ∨ q(x) ← {x } will always be the oldest passive A-clause after switching to a new model. Assume that the prover chooses this clause for inferences based on age. 
If we are allowed to interleave age-based selection with heuristic selection, we can cause the prover to switch model after selecting at most two additional A-clauses for inferences: If an A-clause q(f (a)) ←{w } is enabled, we heuristically select both that A-clause and ¬q(x)←{x }. Otherwise, an A-clause of the form p(f (a)) ←{v } is enabled. Assume that j is maximal among such clauses, and that thus v is false in the model. We heuristically select that clause for inferences, deriving j+1 j+1 j ⊥←{¬v , ¬w , v , x } by splitting p(f (a)) ∨ q(f (a)) ←{v , x }. j+1 j j i j i With this strategy, the prover will never select ¬p(x)∨q(f(x))←{x } for inferences, since there is always an older clause to choose first. Consequently, it will never derive ⊥. In Example 58, the prover did not derive ⊥ because it never performed an inference between p(a) ←{x } and ¬p(x) ∨ q(f(x)) ←{x } (and analogously for x ), even though 1 1 2 both A-clauses are enabled infinitely often. Forbidding this situation does not guarantee completeness either. As Example 49 showed, there exist strongly fair derivations that do not derive ⊥ from an inconsistent initial set. We believe that this counterexample cannot arise with Vampire, because Vampire alter- nates between age-based and weight-based selection using a fixed ratio (the “age–weight ratio” or “pick–given ratio”). In contrast, our example requires a highly unrestricted heuristic selection, where we choose young, large A-clauses such as p(f (a)) ←{v } even though smaller, older ones are enabled. Unrelated to completeness, we might expect that under a reasonable strategy an ⇒ - AV derivation saturates every limit point. This is, however, not the case either, even with strict age-based selection: 123 16 Page 28 of 44 G. Ebner et al. Example 59 Take the following consistent A-clause set: {¬q(x) ←{x}, p(a) ←{x}, ¬p(x) ∨ p(f(x)) ∨ q(f(x)) ←{x}, ¬q(x) ←{¬x}, p(a) ←{¬x}, ¬p(x) ←{¬x}} Assume ordered resolution as the base calculus with the precedence q ≺ p. Thus the prover will not resolve ¬q(x) ←{x} with ¬p(x) ∨ p(f(x)) ∨ q(f(x)) ←{x}. We will sketch a derivation with two limit points, J | x and J | x, where J will not be locally saturated. i i Let fml(v ) = p(f (a)) and fml(w ) = q(f (a)). We define the sequence of models (J ) such i i i i that J | x J | v iff j ≤ i J | w 2i 2i j 2i j J | x J | v J | w iff j > i 2i+1 2i+1 j 2i+1 j The prover now starts in the model J , and processes the formulas in the order we listed them at the beginning of the example. The first new formula it derives is p(f(a)) ∨ q(f(a)) ← {x}, which it splits into p(f(a)) ←{v }, q(f(a)) ←{w }, and ⊥←{x, ¬v , ¬w }. The last 1 1 1 1 propositional clause is not satisfied in J , so the prover switches to a new interpretation. After switching to the next model J = J , the two formulas p(f(a)) ←{v } and 1 2·0+1 1 q(f(a)) ←{w } remain in the active set. The prover then chooses the oldest enabled passive formula, ¬q(x)←{¬x}, for inferences. Thus deriving the propositional clause ⊥←{¬x, w }, which is not satisfied in J . This process is then repeated infinitely often. In the model J , the prover derives the three 2i i+1 i+1 new formulas p(f (a)) ←{v }, q(f (a)) ←{w }, and ⊥←{x, ¬v , ¬w }. The i+1 i+1 i+1 i+1 last propositional clause causes a switch to the next model J , where ⊥←{¬x, w } is 2i+1 i+1 derived. The subsequence (J ) converges a limit point, call it J . The formulas enabled at J 2i+1 i are not saturated: p(a) and ¬p(x) are enabled, but ⊥ is not. 
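Since the fairness conditions that follow are phrased in terms of limit points, lim inf, and lim sup (Definitions 31 and 50 and the criterion of the next subsection), a short computational illustration may help. The Python sketch below evaluates the interpretation sequence of Example 32 on a fixed window of variables; the helper names and the windowing are ours, and the window is chosen large enough that the even-indexed subsequence has already stabilized on it, so the computation is exact there.

```python
# Window-based lim inf / lim sup on the interpretation sequence of Example 32.

def j(n: int) -> frozenset[int]:
    """Example 32: indices of the variables true in J_n."""
    if n % 2 == 0:                                   # J_2i = {v_1, v_3, ..., v_{2i-1}}
        return frozenset(range(1, n, 2))
    return frozenset(range(1, n - 1, 2)) | frozenset({n - 1})   # J_{2i+1} = J_2i ∪ {v_2i}

def liminf(seq, lo: int, hi: int) -> frozenset[int]:
    """Variables true in every element of the (windowed) tail seq(lo), ..., seq(hi-1)."""
    return frozenset.intersection(*(seq(i) for i in range(lo, hi)))

def limsup(seq, lo: int, hi: int) -> frozenset[int]:
    """Variables true in at least one element of the windowed tail."""
    return frozenset.union(*(seq(i) for i in range(lo, hi)))

K = 9                                                # restrict attention to v_0, ..., v_9
window = frozenset(range(K + 1))

def even(i: int) -> frozenset[int]:                  # the subsequence (J_2i), windowed
    return j(2 * i) & window

# Lim inf and lim sup of the even subsequence agree on the window: the odd variables.
assert liminf(even, 5, 50) == limsup(even, 5, 50) == frozenset({1, 3, 5, 7, 9})
```

On the full, infinite sequence the same stabilization argument yields the limit point in which exactly v_1, v_3, v_5, . . . are true, as in Example 32 and Fig. 1.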
6.3 Fairness Definition 60 An ⇒ -derivation (J , A , P , Q , L ) is fair if (A) L =∅, (B) A =∅, AV i i i i i i 0 0 and (C) either (1) ⊥∈ Q or (2) J | (Q ) for infinitely many indices i and there exists i i i ⊥ a subsequence (J ) converging to a limit point J such that (3) lim inf TSInf(A , P ) =∅ j→∞ j j j and (4) (lim sup L ) \L ⊆ FRed ((A ∪ P ∪ Q ∪ L ) ). J J F i i i i J j→∞ i Condition (3) ensures that all inferences involving passive A-formulas are redundant at the limit point. It would not suffice to simply require P =∅ because A-formulas can move back and forth between the sets A, P,and Q,aswesaw in Example 58. Condition (4) is similar to the condition on locks in Definition 50. Theorem 61 (Fairness) Let (J , A , P , Q , L ) be a fair ⇒ -derivation. Then the ⇒ - i i i i i i AV L derivation (J , A ∪ P ∪ Q , L ) is fair. i i i i i i Proof We trivially have L =∅. Furthermore, if⊥∈ Q , we clearly have ⇒ -fairness. 0 i L It remains to show subcase (B)(2) of Definition 50, using the corresponding subsequence as used for ⇒ -fairness. So let ♣ = (lim sup L ) \lim sup L and AV L J J j→∞ j→∞ j j ♣ V = (lim sup L ) \L be the terms from the corresponding fairness condi- A J J j→∞ j tions. First we show ( A ∪P ∪L ) ⊆ (lim inf A ∪P ) ∪ FRed ((A ∪P ∪Q ∪ J j→∞ J F i i i j j j j j j i L ) ), that is, every enabled formula in the subsequence is either persistent or redundant on i J 123 Unifying Splitting Page 29 of 44 16 thebaselevel.Solet (C ← A, t) ∈ P . We prove the statement by induction on (A, t) w.r.t. the lexicographic order. If C ∈♣ V or C ∈ lim sup L , we are done. Otherwise C ∈ / A J j→∞ j ∞ ∞ L (because L ⊆ lim sup L ) and hence C ∈ / (lim sup L ) J J J J j→∞ j j→∞ j (because C ∈♣ / V). So since (C ← A, t) is not locked infinitely often, there exists an index after which it is never locked again, which means that it is either persistent in (A ∪ P ) (and j j we are done) or deleted in Process and thus (C ← A, t) ∈ TSRed (A ∪ P ∪ Q ).By F i i i definition of TSRed , either (a) C ∈ FRed ((A ∪ P ∪ Q ) ), (b) C ← B ∈ A ∪ P F F i i i J i i i i for some B ⊂ A, or (c) (C ← A, s) ∈ A ∪ P for some s < t. In case (a), we are done. In i i cases (b) and (c), we apply the induction hypothesis. Now let R = FRed ((A ∪ P ∪ Q ∪ L ) ) and compute I i i i i J FInf((lim inf A ∪ P ∪ Q ) ∪♣ ) j→∞ J L j j j = FInf((lim inf A ∪ P ) ∪♣ ) (since by fairness ⊥ ∈ / (Q ) and j→∞ J L J j j j hence Q is disabled at lim inf) ⊆ FInf((lim inf A ∪ P ) ∪ FRed ((A ∪ P ) )) (as shown above) j→∞ J F i i J j j i ⊆ FInf((lim inf A ∪ P ) ) ∪ R (by reducedness of FRed) j→∞ J j j = TSInf(lim inf A ∪ P ) ∪ R (by Lemmas 13 and 53) j→∞ j j = lim inf TSInf(A ∪ P ) ∪ R (by property of lim inf) j→∞ j j = lim inf TSInf(A ) ∪ TSInf(A , P ) ∪ R (by definition of TSInf(, )) j→∞ j j j ⊆ TSRed (A ∪ P ∪ Q ∪ L ) ∪ lim inf TSInf(A , P ) ∪ R (by Lemma 55) I j→∞ i i i i j j J ⊆ TSRed (A ∪ P ∪ Q ∪ L ) ∪ R (by ⇒ -fairness) I i i i i AV = FRed ((A ∪ P ∪ Q ∪ L ) ) (by Lemmas 16and 53) I i i i i J Corollary 62 (Dynamic completeness) Assume (FInf, FRed) is statically complete. Given afair ⇒ -derivation (J , A , P , Q , L ) such that P ∪ Q | {⊥}, we have ⊥∈ Q for AV i i i i i i 0 0 i some i. Proof By Theorems 28, 39,and 51. Assuming the restriction on locking we already required for ⇒ -derivations, the ⇒ L AV relation is concrete enough to allow us to show that typical clause selection strategies are fair and avoid the counterexamples from Sects. 5.2 and 6.2. 
Many selection strategies are combinations of basic strategies, such as choosing the smallest formula by weight or the oldest by age. We capture such strategies using selection orders ⊏. Intuitively, C ⊏ D if the prover will select C before D whenever both are present. That is, the prover always chooses one of the ⊏-minimal A-formulas. We use two selection orders: ⊏_TAF, on timestamped A-formulas, must be followed infinitely often; ⊏_F, on base formulas, must be followed otherwise.

Definition 63 Let X be a set. A selection order ⊏ on X is an irreflexive and transitive relation such that {y | not x ⊏ y} is finite for every x ∈ X.

Example 64 Let X ⊆ TAF be such that X only contains finitely many A-formulas with the same timestamp. Define ⊏_age on X so that (C, t) ⊏_age (C′, t′) if and only if t < t′. Then ⊏_age is a selection order corresponding to age-based selection.

Remark 65 Every selection order is a well-founded relation, but not every well-founded relation is a selection order. A well order is a selection order if and only if its order type is less than ω + 1. The ordinal ω + 1 = {0 < 1 < 2 < · · · < ω} is not a selection order since {y | y ≯ ω} is infinite. Even well-founded relations of low rank need not be selection orders: The empty relation ∅ ⊆ ℕ × ℕ is irreflexive, transitive, and well founded (with rank zero) but not a selection order. Selection orders on TAF also generalize the mechanism, outlined by Bachmair and Ganzinger in a footnote [2, Sect. 4.3] and elaborated by Schlichtkrull et al. [27, Sect. 4], of using an ℕ-valued weight function that is strictly monotone in the timestamp.

Example 66 Let F be the set of first-order clauses in a fixed signature. Define the selection order ⊏_nv on F by C ⊏_nv C′ if and only if |C| < |C′|, where |C| denotes the number of nonvariable positions in C. Then ⊏_nv is a selection order because there exists at most a finite number of first-order clauses with at most n nonvariable positions for any n. This selection order corresponds to a simple weight-based selection scheme.

The intersection of two orders ⊏_1 and ⊏_2 corresponds to the nondeterministic alternation between them. The prover may choose either a ⊏_1-minimal or a ⊏_2-minimal A-formula, at its discretion.

Lemma 67 Let ⊏_1 and ⊏_2 be selection orders on X. Then ⊏_1 ∩ ⊏_2 is a selection order as well.

Proof Irreflexivity and transitivity are preserved by intersections, and note that {y | not (x ⊏_1 y and x ⊏_2 y)} = {y | not x ⊏_1 y} ∪ {y | not x ⊏_2 y} is finite as a union of two finite sets.

Lemma 68 Let ⊏ be a selection order on an infinite set X. Then for all elements x and y, there exists an element z such that x ⊏ z and y ⊏ z.

Proof The set X \ {z | x ⊏ z and y ⊏ z} = {z | not x ⊏ z} ∪ {z | not y ⊏ z} is finite, and therefore {z | x ⊏ z and y ⊏ z} is infinite and in particular nonempty.

To ensure completeness of the given clause procedure, we must restrict the inferences that the prover may perform; otherwise, it could derive infinitely many A-formulas with different assertions, causing it to switch between two branches of the split tree without making progress, as in Example 58. Given N ⊆ AF, the set of assertion sets occurring in N is {A | C ← A ∈ N for some C}.

Definition 69 A function F : P(AF) → P(AF) is called strongly finitary if (1) ⌊F(N)⌋ is finite for every N ⊆ AF such that ⌊N⌋ is finite, and (2) there exists a function B : F → P_fin(A) such that, for every N ⊆ AF, every assertion occurring in F(N) already occurs in N or belongs to B(C) for some C ∈ ⌊N⌋. A set of AF-inferences Inf is strongly finitary if the function N ↦ concl(Inf(N)) is strongly finitary.
An inference rule is strongly finitary if the set of inferences it defines is strongly finitary. We can extend a strongly finitary function F to sets of base formulas by taking F (N) =F(N × P (A)) for every N ⊆ F and to sets of timestamped A-formulas F fin by taking F (N) = F(N) × N for every N ⊆ TAF. The functions F and F are TAF F finitary, mapping finite sets to finite sets. Moreover, if F and G are strongly finitary, then so is N !→ F(N ) ∪ G(N ). The function B in Definition 69 gives a bound on the new assertions. For the inference rules Base, Unsat, Collect, Trim,and StrongUnsat, we can set B(N) =∅ since the 123 Unifying Splitting Page 31 of 44 16 conclusions do not contain assertions which were not already in the premises. So if FInf(N) is finite for every finite N ⊆ F,then SInf is strongly finitary. The inference rules Split and Approx require a nonempty B(N), containing the assertions chosen for the split A-formulas. Deterministic splitting rules (where the chosen assertions are fully determined by the base formula), such as AVATAR’s, are thus also strongly finitary because then B(N) is finite. The optional inference rule Tauto is not strongly finitary. For the completeness of the given clause procedure, Lemma 77, we will fix a strongly finitary function I restricting the inferences: The prover may perform an inference only if its conclusion is in I(A ),where A is the active clause set after the Infer transition. This i i restriction will allow us to rule out the case where A is finite and the prover switches models without making progress. Condition (1) in Definition 69 then says that the prover only infers finitely many F-formulas—this will in turn ensure that splitting creates only finitely many new assertions. Condition (2) says that the inferred A-formulas contain only finitely many new assertions. Taken together, only finitely many assertions are added in the case, where A is finite, which means that the prover can only switch models finitely often, a contradiction. Simplification rules used by the prover must be restricted even more to ensure complete- ness, because they can lead to new splits and assertions, and hence switching to new models. For example, simplifying p(x∗0)∨p(x) to p(0)∨p(x) transforms a nonsplittable clause into a splittable one. Even for the standard orders on first-order clauses, there can be infinitely many clauses that are smaller than a given clause. For example, with the lexicographic path order, the set {u | u ≺ u} is typically infinite for a term u. If simplifications were to produce infinitely many new splittable clauses, the prover might split clauses and switch propositional interpretations forever without making progress. Example 70 Even if ≺ is a well-founded order on F,and I is a set of binary inferences such that C ≺ C and D ≺ C for every inference (C , C , D) ∈ I, simplification with I can 2 1 1 2 1 still produce infinitely many base formulas. This is because in an AV prover, the same base formula may be rederived infinitely often (for example due to switching between two families of interpretations). As a slightly abstract example, consider F = N ∪{∞} with 0 ≺ 1 ≺ 2 ≺ ··· ≺ ∞,and let I ={(n, ∞, n + 1) | n ∈ N}. If the prover then rederives ∞ infinitely often, it might simplify ∞ using I in a different way each time, the first time to 1, then to 2, and so on. We hence need to ensure that, in the entire derivation, each formula is simplified in at most a finite number of ways. 
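As a concrete (and simplified) illustration of Definition 69, the Python sketch below computes the variable-disjoint components of a clause and derives from them a deterministic assertion bound B(C), in the spirit of the deterministic splitting rules discussed above. The clause encoding, the component algorithm, and the naming scheme are all ours; the only point is that B(C) is finite and depends on the base clause alone, which is the essence of conditions (1) and (2) for a splitting-style inference.

```python
# Deterministic splitting is strongly finitary: the new assertions are a finite
# function of the split clause.  Toy encoding: a clause is a frozenset of
# literals, each literal a (predicate, args) pair; terms named x, y, z are variables.

from itertools import combinations

Literal = tuple[str, tuple[str, ...]]
VARS = {"x", "y", "z"}

def variables(lit: Literal) -> frozenset[str]:
    return frozenset(a for a in lit[1] if a in VARS)

def components(clause: frozenset[Literal]) -> list[frozenset[Literal]]:
    """Finest partition of the clause into variable-disjoint subclauses."""
    parts = [frozenset({lit}) for lit in clause]
    merged = True
    while merged:
        merged = False
        for p, q in combinations(parts, 2):
            if any(variables(l1) & variables(l2) for l1 in p for l2 in q):
                parts.remove(p)
                parts.remove(q)
                parts.append(p | q)
                merged = True
                break
    return parts

def b(clause: frozenset[Literal]) -> frozenset[str]:
    """B(C): one deterministic assertion name per variable-disjoint component."""
    return frozenset(str(sorted(comp)) for comp in components(clause))

# p(x) ∨ q(y, b) has two components, so splitting it introduces two assertions;
# B(C) is finite and determined by the clause alone.
clause = frozenset({("p", ("x",)), ("q", ("y", "b"))})
assert len(components(clause)) == 2 == len(b(clause))
```

A real prover would name components up to variable renaming so that the same subclause always receives the same assertion, but any deterministic, per-clause scheme keeps the bound B finite.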
Definition 71 Let ≺ be a transitive well-founded relation on F, and let # be its reflexive closure. A function S : AF → P(AF) is a strongly finitary simplification bound for ≺ if N !→ S (C ) is strongly finitary and C #C for every C ∈ S (C ). C ∈N The prover may simplify an A-formula C to C only if C ∈ S (C ). It may also delete C . Strongly finitary simplification bounds are closed under unions, allowing the combination of simplification techniques based on ≺. For superposition, a natural choice for ≺ is the clause order. Analogously to strongly finitary functions, we define the extension of strongly finitary simplification bounds to sets of formulas as S (N) =S (N × P (A)). The key property F fin of strongly finitary simplification bounds is that if we saturate a finite set of A-formulas w.r.t. simplifications, the saturation is also finite. This is crucial to bound the number of A-formulas derived by the prover and thus the number of possible model switches: If the prover only selects a finite set of A-formulas for inferences, then simplification will only derive finitely many A-formulas as well, no matter how often an A-formula is derived again: 123 16 Page 32 of 44 G. Ebner et al. Lemma 72 Let S be a strongly finitary simplification bound. For every C ∈ AF,let ∗ i i ∗ S (C ) = S (C ),where S denotes the ith iterate of S . Then S is also a strongly i=0 finitary simplification bound. Proof Clearly, C #C for every C ∈ S (C ). Let N ⊆ F be finite. Next we show that S (N) is finite as well. Define a sequence of finite sets M ⊆ F by M = N and F i 0 ∗ ∗ M = M ∪ S (M ). Clearly, S (M ) ∪ N ⊆ M = S (N) ⊇ S (N). i+1 i F i F ∞ ∞ F F To show that M is finite, consider the sequence M \M . By construction S (M )\M ⊆ ∞ i+1 i F i i (S (M \M ) ∪ S (M ))\M ⊆ (S (M \M ) ∪ M )\M = S (M \M )\M ,and F i i−1 F i−1 i F i i−1 i i F i i−1 i thus M \M = S (M \M )\M . Because S is a strongly finitary simplification bound, i+1 i F i i−1 i M \ M is always finite, and decreasing in the multiset extension of ≺. It is even strictly i+1 i decreasing as long as M \ M =∅ because M \ M ∩ M \ M =∅ and therefore i+1 i i+1 i i i−1 M \ M = M \ M . From the well-foundedness of ≺, it follows that M \M =∅ for i+1 i i i+1 i+1 i large enough i and that S (N) = M = M = M ∪ (M \M ) is finite. F ∞ i 0 i+1 i i i Thus S (N ) is finite whenever N is finite. It remains to exhibit a function B : F → P (A) such that S (N ) ⊆ B (N ) ∪N . By assumption, we have such a function B fin ∗ i+1 for S . Then set B (C) = B(S (C)) (which is finite for all C), and we have S (N ) = i i i i i S (S (N )) ⊆ B(S (N ))∪S (N ) ⊆ B (N )∪S (N ) and therefore S (N ) ⊆ N ∪ B (N ) by induction and hence S (N ) ⊆N ∪ B (N ). Example 73 Let F be the set of first-order clauses and S (C ←A) ={C ←A |C is a subclause of C and A ⊆ A}. Then S is a strongly finitary simplification bound, because (1) C # C if C is a subclause of C and (2) each clause has only finitely many subclauses. This S covers many simplification techniques, including elimination of duplicate literals (simplifying C ∨L∨L to C ∨ L), deletion of resolved literals (simplifying C ∨u ≈ u to C), and subsumption resolution (simplifying Cσ ∨Dσ ∨Lσ to Cσ ∨Dσ given the side premise C∨¬L). Removing redundant clauses is possible with every S . Example 74 If the Knuth–Bendix order [18] is used as the term order and all weights are positive, then S (C←A) ={C ←A | C ≺ C and A ⊆ A} is a strongly finitary simplification bound. This can be used to cover demodulation. 
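Example 73's bound is also easy to make executable. The following Python sketch, again under our own encoding (literals as opaque strings, so the subclause relation is simply set inclusion), enumerates S(C ← A) and closes it under iteration as in Lemma 72; the names s and s_star are ours. The closure is finite—here it even coincides with a single application of S, since a subclause of a subclause is again a subclause.

```python
# The subclause simplification bound of Example 73 and its iteration (Lemma 72).

from itertools import chain, combinations

def subsets(xs: frozenset):
    xs = list(xs)
    return (frozenset(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1)))

def s(aclause):
    """S(C ← A) = { C' ← A' | C' is a subclause of C and A' ⊆ A }."""
    clause, assertions = aclause
    return {(c, a) for c in subsets(clause) for a in subsets(assertions)}

def s_star(aclause):
    """S*(C ← A): closure under repeated simplification, as in Lemma 72."""
    done, todo = set(), {aclause}
    while todo:
        x = todo.pop()
        if x not in done:
            done.add(x)
            todo |= s(x) - done
    return done

c = (frozenset({"¬q(y)", "p(f(y))", "q(f(y))"}), frozenset({"v0"}))
assert s_star(c) == s(c)                 # one round already yields the whole closure
assert len(s_star(c)) == 2 ** 3 * 2 ** 1  # 8 subclauses times 2 assertion subsets
```

Ordering-based bounds such as Example 74's could be sketched similarly, but checking C′ ≺ C there would additionally require an implementation of the Knuth–Bendix order.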
Example 75 The simplification rules Collect, Trim,and StrongUnsat from Sect. 3.1 are all strongly finitary simplification bounds. In a practical implementation, Split will deterministically split C ∈ F into C ,..., C and use the same assertions a ∈ asn(C ) every 1 n i i time. Under these conditions, Split is also a strongly finitary simplification bound. For other term orders, the S in Example 74 is not strongly finitary, and proving that demodulation is a strongly finitary simplification bound is much more involved. In this case, the necessary strongly finitary simplification bound even depends on the derivation. Example 76 If unit equations are only removed by demodulation, reflexivity deletion, or sub- sumption, the one-step demodulations possible at any point in the derivation are a strongly finitary simplification bound. By Lemma 72, this implies that many-step demodulation is also a strongly finitary simplification bound. Assume that demodulation is performed in a postorder traversal (i.e., subterms first), always rewriting using the oldest available equation. Also assume that if l ≈ r is an existing (ordered) equation and the prover derives l ≈ r ,that l ≈ r is not used for demodulation (but for example instead simplified to r ≈ r ). We will show that for every term t, there exist only finitely many terms t that are sim- plified from t in one step. The term t will typically be different over the course of the 123 Unifying Splitting Page 33 of 44 16 derivation. However, we can assign a decreasing well-founded measure to the rewrite step, ensuring finiteness. Consider a demodulation step transforming C[lσ ] to C[rσ ] using an equation l ≈ r, with rσ ≺ lσ.Let i be the index of lσ in C[lσ ] in a postorder traversal, let |l| be the number of nonvariable positions in l,and let u = 1 if the equation is unorientable (l r)and u = 0 otherwise. Then the tuple (i, |l|, u, rσ) decreases or stays the same in the left-to-right lexicographic order as we move along the derivation. If the prover derives a new ordered equation l ≈ r , it is possible that it applies at an earlier position in C, thus decreasing the i. Otherwise, it applies at the same position as l ≈ r previously, and the prover rewrites using the older l ≈ r first, keeping the tuple unchanged. If the equation l ≈ r is subsumed by l ≈ r and deleted, then |l | < |l|. (Note that if l ≈ r is subsumed by l ≈ r ,then r and r are identical because all variables that occur in r, r also occur in l.)If l ≈ r is simplified to t ≈ r by l ≈ r and l ≈ l ,then |l | < |l|.If l ≈ r is simplified to t ≈ r by l ≈ r and l ≈ r is unorientable, then |l |=|l| and u decreases. If l ≈ r is simplified to l ≈ t by l ≈ r ,then |l| stays the same, u might decrease, and rσ $ tσ . Based on the above definitions, we introduce a fairness criterion that is more concrete and easier to apply than the definition of fairness of ⇒ -derivations. AV Lemma 77 Let I be a strongly finitary function, and let S be a strongly finitary simplification bound. Then a well-timestamped ⇒ -derivation (J , A , P , Q , L ) is fair if all of the AV i i i i i i following conditions hold: 1. is a selection order on P , and is a selection order on F; TAF i F 2. A =∅, L =∅, and P ∪ Q is finite; 0 0 0 0 3. for every Infer transition, either C is -minimal in P or C is -minimal in P; TAF F 4. for every Infer transition, P ∪ Q ⊆ I (A ∪{C}); TAF 5. for every Process transition, P ∪ Q ⊆ S (A ∪ P ∪ Q ∪ L); TAF 6. if J | (Q ) , then eventually Switch or StrongUnsat occurs; i i ⊥ 7. 
if P =∅, then eventually Infer, Switch,or StrongUnsat occurs; 8. there are infinitely many indices i such that either P =∅ or Infer chooses a - i TAF minimal C at i; 9. (lim sup L ) \L ⊆ FRed ((A ∪ P ∪ Q ∪ L ) ) for every subse- J J F i i i i J j→∞ i quence (J ) of (J ) converging to a limit point J. j i i Proof If⊥∈ Q , the derivation is trivially fair. Otherwise, the StrongUnsat transition never occurs, and therefore Switch is eventually applied if the propositional clauses are not satisfied by the interpretation. Hence J | Q for infinitely many i, thus satisfying i i condition (C)(2) of Definition 36. Conditions (A) and (B) are satisfied due to condition (2) of this lemma, and (C)(4) due to (9). It remains to construct a subsequence (J , A , P , Q , L ) j j j j j such that (J ) converges to a limit point and lim inf TSInf(A , P ) =∅, as required j j→∞ j j j for (C)(3). Case 1:The set of -minimal A-formulas selected in an Infer transition for some state j is unbounded in . That is, for every C ∈ F,there is a Infer transition from state j such that the selected A-formula S is -minimal in P ,and C S.These Infer transitions j F j F clearly form an infinite subsequence. By Lemma 35, we can further refine it into a subsequence (J , A , P , Q , L ) ,where (J ) converges to a limit point. Assume towards a contradiction j j j j j j j j that ι ∈ lim inf TSInf(A , P ). By Lemma 68,for every C ∈ prems(ι) there exists an j→∞ j j index j such that C S . Therefore prems(ι) ⊆ A by the -minimality requirement F F j j on the Infer transition, a contradiction. 123 16 Page 34 of 44 G. Ebner et al. Case 2: The set of -minimal A-formulas selected in an Infer transition for some state j TAF is unbounded in . This case is analogous to case 1. TAF Case 3: Neither case 1 nor case 2 apply. Then the set of -minimal formulas selected in an Infer transitions is bounded and hence finite since is a selection order. Similarly, the set of -minimal TAF-formulas selected in an Infer transitions is finite as well. Let T be the TAF set of A-formulas selected in an Infer transition. So T and therefore A are both finite. The set S (I ( A )∪P ∪Q ) is then finite, and therefore A ∪P ∪Q ∪ L is F i 0 0 i i i i i i finite as well. Since both S and I are strongly finitary, only a finite number of new assertions are introduced, and A ∪ P ∪ Q ∪ L is also finite. Thus ( Q ) is also finite, and i i i i i ⊥ i i only a finite number of Switch transitions can occur. Thus there exists an index N such that no Switch transitions occur at states i > N. Now take the whole derivation as subsequence. We have P =∅ because there are infinitely many states j with a Infer transition such that a -minimal A-formula S is TAF j selected. After the initial N steps, every A-formula is selected only once (that is, S = S i j if i = j), because once it has been selected, it can only be removed from the active set if it becomes redundant or locked. (There are no Switch transitions.) In either case, the A-formula is removed from the passive set for the rest of the derivation. Newly derived A- formulas have a different timestamp due to the well-timestampedness requirement. Therefore, once an A-formula is in the active set, it will not come back into the passive set, and we have lim inf TSInf(A , P ) =∅. i→∞ i i Recall the abstract counterexample from Sect. 6.2 in which the A-formulas C and D were selected and disabled in turn. 
Intuitively, selection orders, together with the restrictions on the inferences, ensure that the prover will follow roughly the same steps whenever it is in a model that enables C and D. Since there are only finitely many formulas that it can select for inferences before C or D, the prover will eventually repeat itself and thus make progress. We could refine AV further and use Lemma 77 to show the completeness of an imperative procedure such as Voronkov’s extended Otter loop [28, Fig. 3], thus showing that AVATAR as implemented in Vampire is complete if locking is sufficiently restricted. A slight complication is that in Vampire’s AVATAR, A-clauses C ←{[C]} are generated on a per-need basis when switching model. This is not a serious issue because we can imagine that the A-clauses were there all along in the Q set. Even the concrete criterion offered by Lemma 77 refers, in its condition 9, to limit superiors and limit points. Some architectures will satisfy it by their very design. For AVATAR, an easy way to meet the condition is to bound the number of times each A-formula can be locked. Once that number has been reached, the A-formula can no longer be locked. An alternative, suggested by a reviewer, is to disable all splitting after the prover has run for a specified time. 7 Application to Other Architectures AVATAR may be the most natural application of our framework, but it is not the only one. We will complete the picture below by studying splitting without backtracking, labeled splitting, and SMT with quantifiers. 123 Unifying Splitting Page 35 of 44 16 7.1 Splitting Without Backtracking Before the invention of AVATAR, Riazanov and Voronkov [25] had already experimented with splitting in Vampire in a lighter variant without backtracking. They based their work on ordered resolution O with selection [2], but the same ideas work with superposition. Weiden- bach [31, end of Sect. 4.5] independently outlined the same technique at about the same time. The basic idea of splitting without backtracking is to extend the signature with a countable set P of nullary predicate symbols disjoint from and to augment the base calculus with a binary splitting rule that replaces a clause C ∨D with C ∨p and D∨¬p,where C and D share no variables and p ∈ P. Riazanov and Voronkov require that the precedence ≺ makes all P-literals smaller than the -literals. Binary splitting then qualifies as a simplification rule. They show that their rule and a few variants are consistency-preserving. They do not show refutational completeness, but this is obvious since the rule is a simplification. Riazanov and Voronkov also extend the selection function of the base calculus to support P-literals. They present two such extensions: The blocking function allows for the selection of P-literals in clauses that contain -literals, whereas the parallel function selects only maximal P-literals in pure P-clauses and otherwise imitates the original selection function. Parallel selection cleanly separates the P-and the -literals. Bachmair and Ganzinger proved O statically complete, and this also obviously extends to ordered resolution with this extension, which we denote by O , since it is an instance of the same calculus. The calculus O is closely related to an instance of our framework. Let F be the set of -clauses, with the empty clause as ⊥.Let O = (FInf, FRed),where FInf is the set of ordered resolution inferences on F with some selection function and FRed is the standard redundancy criterion [2, Sect. 
Riazanov and Voronkov also extend the selection function of the base calculus to support P-literals. They present two such extensions: The blocking function allows for the selection of P-literals in clauses that contain Σ-literals, whereas the parallel function selects only maximal P-literals in pure P-clauses and otherwise imitates the original selection function. Parallel selection cleanly separates the P- and the Σ-literals. Bachmair and Ganzinger proved O statically complete, and this also obviously extends to ordered resolution with this extension, which we denote by O_P, since it is an instance of the same calculus.

The calculus O_P is closely related to an instance of our framework. Let F be the set of Σ-clauses, with the empty clause as ⊥. Let O = (FInf, FRed), where FInf is the set of ordered resolution inferences on F with some selection function and FRed is the standard redundancy criterion [2, Sect. 4.2.2], and similarly O_P = (FPInf, FPRed). We use the notion of entailment from Example 1 for the base relations |= and |≈ for both calculi. We take V = P for defining AF. The properties (D1)–(D6) and (R1)–(R7) are verified for |= and FRed, respectively. This gives us a splitting calculus LA = (SInf, SRed), whose name stands for lightweight AVATAR.

Lightweight AVATAR amounts to the splitting architecture Cruanes implemented in Zipperposition and confusingly called "AVATAR" [9, Sect. 2.5]. Binary splitting can be realized in LA as the following simplification rule:

C ∨ D ← A
―――――――――――――――――――――――――――― BinSplit
C ← A ∪ {a}        D ← A ∪ {¬a}

with the side conditions that a ∈ asn(C) and C ∨ D is splittable into C, D. By Theorem 21, LA is complete.

Like splitting without backtracking but unlike the real AVATAR, Cruanes's architecture is not guided by a propositional model. It is essentially an instance of LA, except that it is based on superposition instead of ordered resolution. It performs branch-specific simplifications (a special case of subsumption demodulation [17]), which is supported by our locking mechanism. A SAT solver is used to detect propositional unsatisfiability (corresponding to our Unsat rule) and to eliminate assertions that are implied at the SAT solver's top level (corresponding to our Trim rule).

The calculi O_P and LA are very similar but not identical. O_P has a slightly stronger notion of inference redundancy, because its order ≺ can access not only the Σ-literals but also the P-literals, whereas with LA the P-literals are invisible to the base calculus. To see this, consider the set consisting of the Σ_P-clauses

q        b ∨ p ∨ ¬q        ¬a ∨ p ∨ ¬q        a ∨ p ∨ ¬q

where P = {a, b}. Given the precedence a ≺ b ≺ p ≺ q, an ordered resolution inference is possible between the first two clauses, with b ∨ p as its conclusion. This inference is redundant according to FPRed_I, because the conclusion is entailed by the first, third, and fourth clauses taken together, all of which are ≺-smaller than the main premise b ∨ p ∨ ¬q. However, the corresponding AF-inference is not redundant according to SRed_I, because the assertions are simply truncated by the projection operator ⌊ ⌋ and not compared. Without the assertions, the third and fourth clauses are equal to, but not smaller than, the main premise, and the inference is not redundant. Note that the set is not saturated: Inferences are possible to derive ¬a ∨ p and a ∨ p, which make b ∨ p redundant.

Another dissimilarity is that LA can detect unsatisfiability immediately using a SAT solver, whereas splitting without backtracking generally needs many propositional resolution steps to achieve the same. Correspondingly, on satisfiable problems, LA allows smaller saturated sets. For example, while the A-clause set {⊥ ← {a, ¬b}, ⊥ ← {b}} is saturated, its O_P counterpart is subject to an inference between ¬a ∨ b and ¬b.

As positive results, we will show that O_P and LA share the same notion of entailment and O_P's redundancy criterion is stronger than LA's, yet saturation w.r.t. LA guarantees saturation w.r.t. O_P, up to the natural correspondence between A-clauses and Σ_P-clauses. More precisely, a Σ_P-clause can be written as C ∨ L_1 ∨ ··· ∨ L_n, where C is a Σ-clause and the L_i's are P-literals. Let α be a bijective mapping such that α(C ∨ L_1 ∨ ··· ∨ L_n) = C ← {¬L_1, ..., ¬L_n} is the corresponding A-clause. We overload the ⌊ ⌋ operator to erase the P-literals: ⌊C ∨ L_1 ∨ ··· ∨ L_n⌋ = ⌊C ← {¬L_1, ..., ¬L_n}⌋ = C. Moreover, let G_Σ denote the function that returns all ground instances of a clause, clause set, or inference according to Σ, which is assumed to contain at least one constant.
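The correspondence between Σ_P-clauses and A-clauses is purely syntactic. The sketch below is our own illustration of α and ⌊ ⌋ for ground clauses, represented simply as lists of string literals with P fixed to the two symbols of the example above:

```python
P = {"a", "b"}                      # the nullary splitting symbols of the example

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def is_p_literal(lit):
    return lit.lstrip("~") in P

def alpha(clause):
    """α(C ∨ L1 ∨ ... ∨ Ln) = C ← {¬L1, ..., ¬Ln}: move the P-literals,
    negated, into the assertion set."""
    head = frozenset(l for l in clause if not is_p_literal(l))
    assertions = frozenset(negate(l) for l in clause if is_p_literal(l))
    return head, assertions

def floor(aclause):
    """⌊C ← A⌋ = C: the projection simply truncates the assertions."""
    return aclause[0]

# The Σ_P-clause b ∨ p ∨ ¬q from the example above:
print(alpha(["b", "p", "~q"]))        # (frozenset({'p', '~q'}), frozenset({'~b'}))
print(floor(alpha(["b", "p", "~q"])))
```

Note how the projections of the third and fourth clauses of the example, ¬a ∨ p ∨ ¬q and a ∨ p ∨ ¬q, both collapse to p ∨ ¬q, which is exactly why the AF-inference above is not redundant.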
Lemma 78 Given the Σ_P-clause sets M, N, we have M |= N if and only if α(M) |= α(N).

Proof Since both entailments are defined via grounding, it suffices to consider the case where M and N are ground. For the forward direction, we must show that ⌊(α(M))_J⌋ |= ⌊N⌋ for some J in which α(N) is enabled. Let K be a Σ-model of ⌊(α(M))_J⌋. We will show that at least one clause in ⌊N⌋ is true in K. We start by showing that K ∪ J is a Σ_P-model of M. Let C ∈ M. If α(C) is enabled in J, then ⌊C⌋ ∈ ⌊(α(M))_J⌋. Thus K |= ⌊C⌋ and finally K ∪ J |= {C}. Otherwise, α(C) contains an assertion that is false in J, which means that C contains the complementary P-literal, which is true in J, and we have K ∪ J |= {C}. Either way, K ∪ J |= M and hence, since M |= N, one of the clauses in N is true in K ∪ J. Since α(N) is enabled in J, all P-literals occurring in N are false in K ∪ J. Therefore, that clause must contain a Σ-literal that is true in K, which means that the corresponding clause in ⌊N⌋ must also be true in K.

For the backward direction, we must show that M |= N. Let K ∪ J be a Σ_P-model of M, where K is a Σ-interpretation and J is a P-interpretation. We will show that a clause in N is true in K ∪ J. If α(N) is disabled in J, there exists a P-literal in some clause from N that is true in K ∪ J, which suffices to make the entire clause true. Otherwise, α(N) is enabled in J and then ⌊(α(M))_J⌋ |= ⌊N⌋. Since K ∪ J |= M, we have K |= ⌊(α(M))_J⌋. Hence, one of the clauses in ⌊N⌋ is true in K, and its counterpart in N is also true in K ∪ J.

Lemma 79 Given an inference ι over Σ_P-clauses, if α(ι) is a Base inference, then ι ∈ FPInf.

Proof Let ι = (C_n, ..., C_1, C_0) and assume α(ι) is a Base inference. By the definition of ordered resolution, none of the Σ_P-clauses C_i, for i ∈ {1, ..., n}, can be ⊥. Thus, the selected literals in the premises coincide with those chosen by the parallel selection function on the Σ_P-clauses C_i, and so ι ∈ FPInf.

Lemma 80 (1) Given a Σ_P-clause C, if α(C) ∈ SRed_F(α(N)), then C ∈ FPRed_F(N).
(2) Given an inference ι over Σ_P-clauses, if α(ι) ∈ SRed_I(α(N)), then ι ∈ FPRed_I(N).

Proof For (2), let ι' ∈ G_Σ(ι) and let C_n, ..., C_1 be ι''s premises and C_0 be its conclusion. Let E = {C ← A ∈ α(G_Σ(N)) | C ≺ ⌊C_1⌋} and F = {C ∈ G_Σ(N) | C ≺ C_1}. By the definition of standard redundancy, assuming that {α(C_n), ..., α(C_2)} ∪ E |= {α(C_0)} we need to show that {C_n, ..., C_2} ∪ F |= {C_0}. By Lemma 78, this amounts to showing that {α(C_n), ..., α(C_2)} ∪ α(F) |= {α(C_0)}. By (D3), this follows from our assumption if E ⊆ α(F). This subset inclusion holds because if C ≺ ⌊C_1⌋, then we have C ∨ D ≺ C_1 for every P-clause D, since P-literals are smaller than Σ-literals. For (1), essentially the same line of argumentation applies, with n = 1 and C_1 = C_0 = C.

Lemma 81 (Saturation) Let N be a set of Σ_P-clauses. If N is saturated w.r.t. O_P, then α(N) is saturated w.r.t. LA.

Proof Let ι = (α(C_n), ..., α(C_1), α(C_0)) ∈ SInf be an inference with premises in α(N). We will show that it is redundant w.r.t. α(N).

Case Base: We need to show that {ι}_J ⊆ FRed_I((α(G_Σ(N)))_J) for every propositional interpretation J. The case where ι is disabled in J is trivial. Otherwise, let θ be a substitution such that ιθ ∈ G_Σ(ι). We must show that {⌊C_nθ⌋, ..., ⌊C_2θ⌋} ∪ ⌊E⌋ |= {⌊C_0θ⌋}, where E = {α(C) | C ∈ G_Σ(N) and ⌊C⌋ ≺ ⌊C_1θ⌋}.
Because the premises' assertions are contained in the conclusion's, this is equivalent to showing that {α(C_nθ), ..., α(C_2θ)} ∪ E |= {α(C_0θ)}.

By Lemma 79, there exists an inference (C_n, ..., C_1, C_0) ∈ FPInf. Since N is saturated, the inference is redundant—i.e., {C_nθ, ..., C_2θ} ∪ F |= {C_0θ}, where F = {C | C ∈ G_Σ(N) and C ≺ C_1θ}. If α(F) ⊆ E, we can invoke Lemma 78 to conclude. However, in the general case, we have only that α(F \ F_eq) ⊆ E, where F_eq = {C ∈ F | ⌊C⌋ = ⌊C_1θ⌋}, and thus there might be models of E that are models of F \ F_eq but not of F. Fortunately, we can show that {C_nθ, ..., C_2θ} ∪ (F \ F_eq) |= {C_0θ}. We proceed by removing from F each clause D' ∈ F_eq in turn and by showing that the entailment is preserved by each step. Finally, we invoke Lemma 78. A slight complication is that F_eq may be infinite. However, by compactness, only a finite subset F'_eq ⊆ F_eq is needed to have the desired entailment.

Let D ∈ N be a clause that generalizes the ground, ≺-largest clause in F'_eq. Then there exists an inference (C_n, ..., C_2, D, D_0) ∈ FPInf such that C_0θ ∈ G_Σ(D_0) and the P-literals of D_0 are the union of those of C_n, ..., C_2, D. By renaming the variables in D and D_0, we can ensure that ⌊Dθ⌋ = ⌊C_1θ⌋ and ⌊D_0θ⌋ = ⌊C_0θ⌋. Now, to prove the desired entailment, assume that J is a model of {C_nθ, ..., C_2θ} ∪ (F \ {Dθ}). Since N is saturated, {C_nθ, ..., C_2θ} ∪ {C ∈ G_Σ(N) | C ≺ Dθ} |= {D_0θ}. Since we are proceeding from largest to smallest clause, we have {C ∈ G_Σ(N) | C ≺ Dθ} ⊆ F \ {Dθ}, even if some clauses have been removed from F already. Thus, in both cases, J |= {D_0θ}. If J makes a Σ-literal of D_0θ true, J makes the same literal in C_0θ true. Otherwise, either J makes one of the P-literals of C_nθ, ..., C_2θ true, satisfying C_0θ for the same reason, or it makes one of the P-literals of D true and then J |= {C_nθ, ..., C_2θ} ∪ F, which as noted above implies J |= {C_0θ} by the saturation of N. In both cases, J |= {C_0θ}.

Case Unsat: The inference derives ⊥ from A-clauses α(A_1), ..., α(A_n), where the A_i are P-clauses, such that {α(A_i)}_{i=1}^n is propositionally unsatisfiable—i.e., {A_i}_i |= {⊥} in O_P. Since O_P is complete and N ⊇ {A_i}_i is saturated, we have ⊥ ∈ N and hence ⊥ ∈ α(N). Therefore, ι is redundant also in this case.
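The Unsat case relies on O_P rediscovering propositional unsatisfiability by ordinary resolution, which is exactly what LA delegates to a SAT solver. The following rough Python sketch (a toy example of our own, not part of the development above) saturates a ground propositional clause set under binary resolution and checks whether the empty clause appears:

```python
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def derives_empty_clause(clauses):
    """Close a set of ground propositional clauses under binary resolution
    and report whether ⊥ (the empty clause) is derivable."""
    clauses = {frozenset(c) for c in clauses}
    while True:
        resolvents = {
            (c - {l}) | (d - {negate(l)})
            for c in clauses for d in clauses for l in c
            if negate(l) in d
        }
        if frozenset() in resolvents:
            return True
        if resolvents <= clauses:
            return False
        clauses |= resolvents

# ¬a ∨ b, ¬b, a: unsatisfiable; resolution needs two rounds to reach ⊥,
# whereas LA's Unsat rule obtains the answer from a single SAT call.
print(derives_empty_clause([["~a", "b"], ["~b"], ["a"]]))   # True
```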
7.2 Labeled Splitting

Labeled splitting, as originally described by Fietzke and Weidenbach [15] and implemented in SPASS, is a first-order resolution-based calculus with binary splitting that traverses the split tree in a depth-first way, using an elaborate backtracking mechanism inspired by CDCL [20]. It works on pairs consisting of a stack storing the current state of the split tree and a set N of labeled clauses—clauses annotated with finite sets of natural numbers.

We model labeled splitting as an instance of the locking prover L based on the splitting calculus LS = (SInf, SRed) induced by the resolution calculus R = (FInf, FRed), where |= and |≈ are as in Example 1 and V = {l_i, r_i, s_i | i ∈ ℕ}. A-clauses are essentially the same as labeled clauses.

Splits are identified by unique split levels. Given a split on C ∨ D with level k, the propositional variables l_k ∈ asn(C) and r_k ∈ asn(D) represent the left and right branches, respectively. In practice, the prover would dynamically extend fml to ensure that fml(l_k) = C and fml(r_k) = D. When splitting, if we simply added the propositional clause ⊥ ← {¬l_k, ¬r_k}, we would always need to consider either C ← {l_k} or D ← {r_k}, depending on the interpretation. However, labeled splitting can undo splits when backtracking. Yet fairness would require us to perform inferences with either C or D, which Fietzke and Weidenbach avoid.

We solve this as follows. We introduce the propositional variable s_k ∈ asn(∼⊥) so that we can enable or disable the split as we wish. The StrongUnsat rule then knows that s_k is true and that the cases are exhaustive, but we can still switch to propositional models that disable both C and D. A-clauses are then split using the following binary variant of Split:

C ∨ D ← A
――――――――――――――――――――――――――――――――――――――――――――― SoftSplit
⊥ ← {¬l_k, ¬r_k, s_k}        C ← A ∪ {l_k}        D ← A ∪ {r_k}

where C and D share no variables and k is the next split level. Unlike AVATAR, labeled splitting keeps the premise and might split it again with another level. We rely on locking to ensure that the premise is not split within either branch. To emulate the original, the locking prover based on the LS calculus must repeatedly apply the following five steps in any order until saturation:

1. Apply Base (via Lift and Derive) to perform an inference from the enabled A-clauses. If an enabled ⊥ ← A is derived with A ⊆ ⋃_i {l_i, r_i}, apply Switch or StrongUnsat.
2. Use Derive (via Lift) to delete a redundant enabled A-clause.
3. Use Lock to temporarily remove a locally redundant enabled A-clause.
4. Use Derive (via Lift) to simplify an enabled A-clause. If the original A-clause is made redundant, delete it; otherwise, use Lock to temporarily remove it. If an enabled ⊥ ← A is derived with A ⊆ ⋃_i {l_i, r_i}, apply Switch or StrongUnsat.
5. Apply SoftSplit (via Lift and Derive) with split level k on an A-clause C, as sketched below. Then use Switch to enable the left branch and apply Lock on C with s_k as the lock.
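A minimal sketch of SoftSplit and the accompanying lock follows. It is our own illustration: A-clauses are modeled as pairs of a literal tuple and an assertion set, and the caller is assumed to have already checked that C and D are variable-disjoint.

```python
BOT = "⊥"                               # head of purely propositional A-clauses

def neg(v):
    return v[1:] if v.startswith("~") else "~" + v

def soft_split(c, d, assertions, k, locks):
    """Split the A-clause C ∨ D ← A at level k: emit ⊥ ← {¬l_k, ¬r_k, s_k}
    and the two branch A-clauses, and lock the premise under s_k so that it
    is not split again inside either branch."""
    lk, rk, sk = f"l{k}", f"r{k}", f"s{k}"
    prop_clause = (BOT, frozenset({neg(lk), neg(rk), sk}))
    left = (tuple(c), frozenset(assertions) | {lk})
    right = (tuple(d), frozenset(assertions) | {rk})
    locks.append((sk, (tuple(c) + tuple(d), frozenset(assertions))))
    return prop_clause, left, right

# The first split of Example 82 below: p(x) ∨ q(y) with an empty assertion set.
locks = []
for aclause in soft_split(["p(x)"], ["q(y)"], set(), 0, locks):
    print(aclause)
print(locks)    # [('s0', (('p(x)', 'q(y)'), frozenset()))]
```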
Disabled A-clauses are the ones that occur in branches other than the current one and in disabled splits. As such, they should not be used when exploring the current branch.

Switch is powerful enough to support all of Fietzke and Weidenbach's backtracking rules, but to explore the tree in the same order as they do, we must choose the new model carefully. If a left branch is closed, the model must be updated so as to disable the splits that were not used to close this branch and to enable the right branch. If a right branch is closed, the split must be disabled, and the model must switch to the right branch of the closest enabled split above it with an enabled left branch. If a right branch is closed but there is no split above with an enabled left branch, the entire tree has been visited. Then, a propositional clause ⊥ ← A with A ⊆ {s_i | i ∈ ℕ} is |=-entailed by the A-clause set, and StrongUnsat can finish the refutation by exploiting fml(s_i) = ∼⊥. We illustrate the strategy on an example.

Example 82 Let N be the clause set
{¬q(x), p(x) ∨ q(y), r(x) ∨ s(y), ¬p(x) ∨ q(y)}
It is unsatisfiable due to the first, second, and fourth clauses. Let J_0 = ¬V be the initial model. The first disjunction is split into p(x) ← {l_0}, q(y) ← {r_0}, and ⊥ ← {¬l_0, ¬r_0, s_0} by SoftSplit. Then a Switch transition replaces J_0 with (J_0 \ {¬l_0, ¬s_0}) ∪ {l_0, s_0}. This enables the A-clause p(x) ← {l_0}. Then Lock removes p(x) ∨ q(y) from N for as long as s_0 is enabled. Splitting the other two disjunctions leads to the state (J_2, N_2, L_2), where

J_2 = (J_0 \ ⋃_{i=0}^{2} {¬l_i, ¬s_i}) ∪ ⋃_{i=0}^{2} {l_i, s_i}
N_2 = ⋃_{i=0}^{2} {⊥ ← {¬l_i, ¬r_i, s_i}} ∪ {¬q(x), p(x) ← {l_0}, r(x) ← {l_1}, ¬p(x) ← {l_2}, q(y) ← {r_0}, s(y) ← {r_1}, q(y) ← {r_2}}
L_2 = {({s_0}, p(x) ∨ q(y)), ({s_1}, r(x) ∨ s(y)), ({s_2}, ¬p(x) ∨ q(y))}

where the last three clauses listed in N_2 are disabled and thus currently unusable for inferences.

The first backtracking step happens after a Base inference produces ⊥ ← {l_0, l_2} from p(x) ← {l_0} and ¬p(x) ← {l_2}. The Switch disables s_1, because this split was not useful in closing the branch, and it moves from branch l_2 to r_2. The new model disables ¬p(x) ← {l_2}, enables q(y) ← {r_2}, and unlocks r(x) ∨ s(y).

The second backtracking step happens after ⊥ ← {r_2} is derived from ¬q(x) and q(y) ← {r_2}. Since both branches of the split s_2 have now been closed, the Switch rule is invoked, producing the model (J_0 \ {¬r_0, ¬s_0}) ∪ {r_0, s_0}. This unlocks ¬p(x) ∨ q(y), and now only q(y) ← {r_0} is enabled in addition to the unlocked input clauses.

The r_0 branch is immediately closed by the generation of ⊥ ← {r_0} from q(y) ← {r_0} and ¬q(x). Now, the resulting A-clause set contains the propositional clauses ⊥ ← {l_0, l_2}, ⊥ ← {r_2}, ⊥ ← {r_0} derived by inferences as well as ⊥ ← {s_0, ¬l_0, ¬r_0} and ⊥ ← {s_2, ¬l_2, ¬r_2} produced by SoftSplit. Clearly, it entails ⊥ ← {s_0, s_2}. Since fml(s_0) = fml(s_2) = ∼⊥, at this point StrongUnsat derives ⊥.

By following the strategy presented above, LS closely simulates the original calculus, in the sense that it is possible to add and remove (or at least disable) exactly the same elements to the A-clause set as is done in the original, and in the same order. A subtle, inconsequential difference lies in the backtracking: Labeled splitting can move to a branch where ⊥ is enabled, whereas our Switch rule requires that all propositional clauses are satisfied.

What about fairness? The above strategy helps achieve fairness by ensuring that there exists at most one limit point. It also uses locks in a well-behaved way. This means we can considerably simplify the notion of fairness for ⇒_L-derivations and obtain a criterion that is almost identical to, but slightly more liberal than, Fietzke and Weidenbach's, thereby re-proving the completeness of labeled splitting.

For terminating derivations, their fairness criterion coincides with ours: Both require that the final A-clause set is locally saturated and all propositional clauses are satisfied by the interpretation. For diverging derivations, Fietzke and Weidenbach construct a limit subsequence of the derivation and demand that every persistent inference in it be made redundant, exactly as we do for ⇒_L-derivations. The subsequence consists of all states that lie on the split tree's unique infinite branch. Therefore, this subsequence converges to a limit point of the full derivation. Locks are well behaved, with lim sup_{j→∞} L_j = L_∞, because with the strategy above, once an A-clause is enabled on the rightmost branch, it remains enabled forever. Our definition of fairness allows more subsequences, although this is difficult to exploit without bringing in all the theoretical complexity of AVATAR.

Example 83 Alternating age-based and unrestricted heuristic selection is incomplete for labeled splitting just as it is for AVATAR (Example 58).
To see why, start with the clause set
{p(a, y), ¬p(x, y) ∨ p(s(x), y), q(a), r(x) ∨ q(y) ∨ ¬p(x, y), s(x) ∨ ¬q(x), ¬s(x)}
and always select the negative literal if there is one. The prover begins by deriving p(s(a), y) and r(a) ∨ q(y) using the age-based heuristic. Then it heuristically selects r(a) ∨ q(y) and splits it. In the left branch, where q(y) is enabled, q(a) is locally redundant and locked. Before age-based selection allows the prover to derive ⊥ from the clauses s(x) ∨ ¬q(x), q(y), and ¬s(x), it will also have derived p(s(s(a)), y) and r(s(a)) ∨ q(y). When the prover switches back to the right branch, it can heuristically select the newly derived disjunction and split it. This process can be repeated to give rise to infinitely many splittable clauses of the form r(s^i(a)) ∨ q(y). In this way, no inferences are ever performed in the rightmost branch, only splits. The clause q(a), which is necessary for a refutation, is never selected for inferences; most of the time, it is even locally redundant.

7.3 SMT with Quantifiers

SMT solvers based on DPLL(T) [20] combine a SAT solver with theory solvers, each responsible for reasoning about a specific quantifier-free theory (e.g., equality, linear integer arithmetic). In the classical setup, the theories are decidable, and the overall solver is a decision procedure for the union of the theories. Some SMT solvers, including cvc5 [3], veriT [8], and Z3 [10], also support quantified formulas via instantiation at the expense of decidability. Complete instantiation strategies have been developed for various fragments of first-order logic [16, 23, 24]. In particular, enumerative quantifier instantiation [24] is complete under some conditions. An SMT solver following such a strategy ought to be refutationally complete, but this has never been proved.

Although SMT is quite different from the architectures we have studied so far, we can instantiate our framework to show the completeness of an abstract SMT solver. The model-guided prover MG will provide a suitable starting point, since we will need neither L's locking mechanism nor AV's given clause procedure.

Let F be the set of first-order Σ-formulas with a distinguished falsehood ⊥. We represent the SMT solver's underlying SAT solver by the Unsat rule and complement it with an inference system FInf that clausifies formulas, detects inconsistencies up to theories excluding quantifiers, and instantiates quantifiers. For FRed, we take an arbitrary instance of the standard redundancy criterion [2, Sect. 4.2.2]. It can be used to split disjunctions destructively and to simplify formulas. We define the "theories with quantifiers" calculus TQ = (FInf, FRed). For the consequence relations |= and |≈, we use entailment in the supported theories including quantifiers.

Some theories such as linear integer arithmetic are not compact and thus cannot directly be used for the consequence relation. Instead, we define M |=_LIA N to be true if and only if there exist finite sets M' ⊆ M and N' ⊆ N such that M' −→ N' is valid modulo linear integer arithmetic. For finite sets, this relation coincides with noncompact entailment: If M is finite, then M |=_LIA ⊥ if and only if M is inconsistent modulo linear integer arithmetic. Both completeness and soundness of a concrete prover are statements about the finite set of input formulas, so using a compactified version of the consequence relation is purely an implementation detail and poses no restriction.
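For the finite case, the relation |=_LIA can be realized by a single validity query to any solver for linear integer arithmetic. The sketch below uses Z3's Python bindings purely as an example backend; it is our own illustration, and nothing in the framework prescribes this particular API.

```python
from z3 import Int, Solver, And, Or, Not, Implies, BoolVal, unsat

def entails_lia(premises, conclusions):
    """Finite case of |=_LIA: the sequent  premises −→ conclusions  is valid
    modulo linear integer arithmetic iff its negation is unsatisfiable."""
    s = Solver()
    s.add(Not(Implies(And(*premises), Or(*conclusions))))
    return s.check() == unsat

a = Int("a")
# A finite set entails ⊥ iff it is inconsistent modulo LIA:
print(entails_lia([a == 0, a + 3 < 2], [BoolVal(False)]))   # True
print(entails_lia([a > 1], [a > 0, a < 0]))                 # True
```

An SMT solver would of course maintain such queries incrementally rather than re-solving from scratch, but the logical content is the same.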
The clausification rules work on logical symbols outside quantifiers; they derive C and D from a premise C ∧ D, among others. The theory rules can derive ⊥ from some finite formula set N if N |= {⊥}, ignoring quantifiers; this triggers a model switch. Finally, the instantiation rules derive formulas p(t) from premises ∀x. p(x), where t is some ground term; the instantiation strategy determines which ground terms must be tried and in which order. A lot of complexity hidden in FInf—such as purification and theory-specific data structures and algorithms—is taken as a black box.

As with AVATAR, the initial problem is expressed using Σ-formulas. We use the same approximation function as in AVATAR to represent formulas as assertions (Example 8). Abusing terminology slightly, let us call an A-formula C ← A a subunit if C is not a disjunction. Whenever a (ground) disjunction C ∨ D ← A emerges, we immediately apply Split. This delegates clausal reasoning to the SAT solver. It then suffices to assume that TQ is complete for subunits.

Theorem 84 (Dynamic completeness) Assume TQ is statically complete for subunit sets. Let (J_i, N_i)_i be a fair ⇒_MG-derivation based on TQ. If N_0 |= {⊥} and N_∞ contains only subunits, then ⊥ ∈ N_j for some j.

Proof The proof is analogous to that of Theorem 28. Because we only have conditional static completeness of (FInf, FRed), we need the assumption that N_∞ contains only subunits.

Care must be taken to design a practical fair strategy. Like AVATAR-based provers, SMT solvers will typically not perform all SInf-inferences, not even up to SRed_I. Given a ≈ b ← {v_0}, b ≈ c ← {v_1}, a ≈ d ← {v_2}, c ≈ d ← {v_3}, and a ≉ c ← {v_4}, an SMT solver will find only one of the conflicts ⊥ ← {v_0, v_1, v_4} or ⊥ ← {v_2, v_3, v_4} but not both. This leaves us in a similar predicament as with locking: A theory conflict might be nonredundant at the limit point, even though it is redundant at every point of the derivation. The SMT solver just happened to choose the wrong conflict every time.

Example 85 Consider the initial clause set
{∀x (x ≤ 0 ∨ a > x), a ≈ 0, a + 3 < 2}
Eagerly applying quantifier instantiation, we get the instances i ≤ 0 ← {[i ≤ 0]}, a > i ← {[a > i]}, and ⊥ ← {¬[i ≤ 0], ¬[a > i]} for every i ∈ ℕ. The solver then starts in a model where each [a > i] is true. Here it can derive the conflict ⊥ ← {[a > 0]}. Then it switches to the next model, where [a > 0] is false but [a > 1], [a > 2], etc. are true, and derives the conflict ⊥ ← {[a > 1]}.

Iterating this process, we see that all conflicts are of the form ⊥ ← {[a > i]} for some i. However, at the limit point, where [a > i] is false for every i, none of these conflicts is enabled. The only conflict that exists at the limit point is between a ≈ 0 and a + 3 < 2, and the solver never finds it because it detects a different conflict first.

For decidable theories, a practical fair strategy is to first clausify and detect theory conflicts and to instantiate quantifiers only if no other rules are applicable. A similar case analysis as in the proof of Lemma 77 works to establish fairness for this strategy.

First consider the case where quantifier instantiation is invoked infinitely often. Then there exists an infinite subsequence (J_j, N_j)_j of states such that (1) (J_j)_j converges to a limit point, and (2) no N_j has a theory conflict. To prove the ⇒_MG-derivation fair, we need to show that ι ∈ FInf((N_∞)_J) implies ι ∈ FRed_I((⋃_i N_i)_J) for every ι.
If ι is a theory conflict or clausification inference, then its finitely many premises are in N_j for some j, contradicting the strategy. Otherwise, ι is a quantifier instantiation. Here, it suffices to ensure that A-formulas that are enabled infinitely often at a quantifier instantiation step are also fully instantiated. (Just as with AV provers, it is possible that not all limit points are saturated.)

Otherwise, quantifier instantiation is only invoked finitely often—either because every encountered model had a theory conflict, or because there was nothing to instantiate. Here, it suffices to assume that clausification is a strongly finitary simplification bound (which means that a formula can only be clausified in a finite number of ways). Under this assumption only finitely many base formulas will be derived; this implies that only a finite number of models will be considered. The last model will then be saturated due to the strategy.

There is also the question of model soundness. If the SMT solver starts with the Σ-formula set N_0 and ends in a state (J_i, N_i) with J_i |= (N_i)_⊥, we would like the solver to generate a model of (N_i)_{J_i}, from which a model of N_0 can be derived. This is possible if the solver performs only sound inferences and applies Approx systematically. Then (N_i)_{J_i} is fully exposed to the propositional level, and fml(J_i) is a theory model of (N_i)_{J_i} and therefore of N_0.

Example 86 Consider an SMT solver equipped with the theory of uninterpreted functions and linear arithmetic. Let
{∀x (f(x) ≈ 0 ∨ g(x) ≈ 0), f(1) ≈ 1, g(1) ≈ 1}
be the initial clause set. The SMT solver first considers the propositional model J_0 = ¬V. There is no theory conflict, so the solver uses quantifier instantiation and clausification to derive f(0) ≈ 0 ← {v_0}, g(0) ≈ 0 ← {v_1}, and ⊥ ← {¬v_0, ¬v_1}. Since J_0 does not satisfy ⊥ ← {¬v_0, ¬v_1}, the solver switches to the model J_1 = (J_0 \ {¬v_0}) ∪ {v_0}. There is still no theory conflict, so it instantiates a quantifier again, producing the A-clauses f(1) ≈ 0 ← {v_2}, g(1) ≈ 0 ← {v_3}, and ⊥ ← {¬v_2, ¬v_3}. The solver now switches to J_2 = (J_1 \ {¬v_2}) ∪ {v_2}. It derives a theory conflict ⊥ ← {v_2} and switches to J_3 = (J_2 \ {¬v_3}) ∪ {v_3}. For this model, there is also a conflict, ⊥ ← {v_3}, and the solver terminates by applying Unsat.

Our mathematization of AVATAR and SMT with quantifiers exposes their dissimilarities. With SMT, splitting is mandatory, and there is no subsumption or simplification, locking, or active and passive sets. And of course, theory inferences are n-ary and quantifier instantiation is unary, whereas superposition is binary. Nevertheless, their completeness follows from the same principles.

8 Conclusion

Our framework captures splitting calculi and provers in a general way, independently of the base calculus. Users can conveniently derive a dynamic refutational completeness result for a splitting prover based on a given statically refutationally complete calculus.

As we developed the framework, we faced some tension between constraining the SAT solver's behavior and the saturation prover's. It seemed preferable to constrain the prover, because the prover is typically easier to modify than an off-the-shelf SAT solver. To our surprise, we discovered counterexamples related to locking, formula selection, and simplification, which may affect Vampire's AVATAR implementation, depending on the SAT solver and prover heuristics used.
We proposed some restrictions, but alternatives could be investigated. We found that labeled splitting can be seen as a variant of AVATAR where the SAT solver follows a strict strategy and propositional variables are not reused across branches. A benefit of the strict strategy is that locking preserves completeness. As for the relationship between AVATAR and SMT, there are some glaring differences, including that splitting is necessary to support disjunctions in SMT but fully optional in AVATAR. For future work, we could try to complete the picture by considering other related architectures [5–7, 11, 12].

Acknowledgements Petar Vukmirović greatly helped us design the abstract notions connected to A-formulas. Giles Reger patiently explained AVATAR and revealed some of its secrets. Simon Cruanes did the same regarding lightweight AVATAR. Simon Robillard, Martin Suda, Andrei Voronkov, Uwe Waldmann, and Christoph Weidenbach discussed splitting with us. Haniel Barbosa, Pascal Fontaine, Andrew Reynolds, and Cesare Tinelli explained some fine points of SMT. Natarajan Shankar pointed us to his work on the Shostak procedure. Ahmed Bhayat, Nicolas Peltier, Martin Suda, Mark Summerfield, Dmitriy Traytel, Petar Vukmirović, and the anonymous reviewers suggested textual improvements. We thank them all. This research has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (Grant Agreement No. 713999, Matryoshka). The research has also received funding from the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO) under the Vidi program (Project No. 016.Vidi.189.037, Lean Forward).

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Bachmair, L., Ganzinger, H.: Rewrite-based equational theorem proving with selection and simplification. J. Log. Comput. 4(3), 217–247 (1994)
2. Bachmair, L., Ganzinger, H.: Resolution theorem proving. In: Robinson, A., Voronkov, A. (eds.) Handbook of Automated Reasoning, vol. I, pp. 19–99. Elsevier, Amsterdam (2001)
3. Barrett, C., Conway, C.L., Deters, M., Hadarean, L., Jovanović, D., King, T., Reynolds, A., Tinelli, C.: CVC4. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 171–177. Springer, Berlin (2011)
4. Bjørner, N., Reger, G., Suda, M., Voronkov, A.: AVATAR modulo theories. In: Benzmüller, C., Sutcliffe, G., Rojas, R. (eds.) GCAI 2016. EPiC Series in Computing, vol. 41, pp. 39–52. EasyChair, Manchester (2016)
5. Bonacina, M.P., Plaisted, D.A.: SGGS theorem proving: an exposition. In: Schulz, S., de Moura, L., Konev, B. (eds.) PAAR-2014. EPiC Series in Computing, vol. 31, pp. 25–38. EasyChair, Manchester (2014)
6. Bonacina, M.P., Lynch, C., de Moura, L.: On deciding satisfiability by DPLL(Γ+T) and unsound theorem proving. In: Schmidt, R.A. (ed.) CADE-22. LNCS, vol. 5663, pp. 35–50. Springer, Berlin (2009)
7. Bonacina, M.P., Graham-Lengrand, S., Shankar, N.: Satisfiability modulo theories and assignments. In: de Moura, L. (ed.) CADE-26. LNCS, vol. 10395, pp. 42–59. Springer, Berlin (2017)
8. Bouton, T., de Oliveira, D.C.B., Déharbe, D., Fontaine, P.: veriT: an open, trustable and efficient SMT-solver. In: Schmidt, R.A. (ed.) CADE-22. LNCS, vol. 5663, pp. 151–156. Springer, Berlin (2009)
9. Cruanes, S.: Extending superposition with integer arithmetic, structural induction, and beyond. Ph.D. thesis, École polytechnique (2015)
10. de Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS 2008. LNCS, vol. 4963, pp. 337–340. Springer, Berlin (2008)
11. de Moura, L., Jovanović, D.: A model-constructing satisfiability calculus. In: Giacobazzi, R., Berdine, J., Mastroeni, I. (eds.) VMCAI 2013. LNCS, vol. 7737, pp. 1–12. Springer, Berlin (2013)
12. Déharbe, D., Fontaine, P., Ranise, S., Ringeissen, C.: Decision procedures for the formal analysis of software. In: Barkaoui, K., Cavalcanti, A., Cerone, A. (eds.) ICTAC 2006. LNCS, vol. 4281, pp. 366–370. Springer, Berlin (2006)
13. Ebner, G., Ullrich, S., Roesch, J., Avigad, J., de Moura, L.: A metaprogramming framework for formal verification. Proc. ACM Program. Lang. 1(ICFP), 34:1–34:29 (2017)
14. Ebner, G., Blanchette, J., Tourret, S.: A unifying splitting framework. In: Platzer, A., Sutcliffe, G. (eds.) CADE-28. LNCS, vol. 12699, pp. 344–360. Springer, Berlin (2021)
15. Fietzke, A., Weidenbach, C.: Labelled splitting. Ann. Math. Artif. Intell. 55(1–2), 3–34 (2009)
16. Ge, Y., de Moura, L.: Complete instantiation for quantified formulas in satisfiability modulo theories. In: Bouajjani, A., Maler, O. (eds.) CAV 2009. LNCS, vol. 5643, pp. 306–320. Springer, Berlin (2009)
17. Gleiss, B., Kovács, L., Rath, J.: Subsumption demodulation in first-order theorem proving. In: Peltier, N., Sofronie-Stokkermans, V. (eds.) IJCAR 2020, Part I. LNCS, vol. 12166, pp. 297–315. Springer, Berlin (2020)
18. Knuth, D.E., Bendix, P.B.: Simple word problems in universal algebras. In: Leech, J. (ed.) Computational Problems in Abstract Algebra, pp. 263–297. Pergamon Press, Oxford (1970)
19. McCune, W., Wos, L.: Otter—the CADE-13 competition incarnations. J. Autom. Reason. 18(2), 211–220 (1997)
20. Nieuwenhuis, R., Oliveras, A., Tinelli, C.: Solving SAT and SAT modulo theories: from an abstract Davis–Putnam–Logemann–Loveland procedure to DPLL(T). J. ACM 53(6), 937–977 (2006)
21. Nipkow, T., Klein, G.: Concrete Semantics: With Isabelle/HOL. Springer, Cham (2014)
22. Reger, G., Suda, M., Voronkov, A.: Playing with AVATAR. In: Felty, A.P., Middeldorp, A. (eds.) CADE-25. LNCS, vol. 9195, pp. 399–415. Springer, Berlin (2015)
23. Reynolds, A., Tinelli, C., Goel, A., Krstić, S.: Finite model finding in SMT. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 640–655. Springer, Berlin (2013)
24. Reynolds, A., Barbosa, H., Fontaine, P.: Revisiting enumerative instantiation. In: Beyer, D., Huisman, M. (eds.) TACAS 2018. LNCS, vol. 10806, pp. 112–131. Springer, Berlin (2018)
25. Riazanov, A., Voronkov, A.: Splitting without backtracking. In: Nebel, B. (ed.) IJCAI 2001, pp. 611–617. Morgan Kaufmann, San Francisco (2001)
26. Robinson, J.A.: A machine-oriented logic based on the resolution principle. J. ACM 12(1), 23–41 (1965)
27. Schlichtkrull, A., Blanchette, J.C., Traytel, D.: A verified prover based on ordered resolution. In: Mahboubi, A., Myreen, M.O. (eds.) CPP 2019, pp. 152–165. ACM, New York (2019)
28. Voronkov, A.: AVATAR: the architecture for first-order theorem provers. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 696–710. Springer, Berlin (2014)
29. Waldmann, U., Tourret, S., Robillard, S., Blanchette, J.: A comprehensive framework for saturation theorem proving. In: Peltier, N., Sofronie-Stokkermans, V. (eds.) IJCAR 2020, Part I. LNCS, vol. 12166, pp. 316–334. Springer, Berlin (2020)
30. Waldmann, U., Tourret, S., Robillard, S., Blanchette, J.: A comprehensive framework for saturation theorem proving. Technical report (2020). https://matryoshka-project.github.io/pubs/saturate_report.pdf
31. Weidenbach, C.: Combining superposition, sorts and splitting. In: Robinson, A., Voronkov, A. (eds.) Handbook of Automated Reasoning, vol. II, pp. 1965–2013. Elsevier/MIT, Amsterdam/Cambridge (2001)
32. Weidenbach, C., Gaede, B., Rock, G.: SPASS & FLOTTER version 0.42. In: McRobbie, M.A., Slaney, J.K. (eds.) CADE-13. Lecture Notes in Computer Science, vol. 1104, pp. 141–145. Springer, Berlin (1996)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Journal of Automated Reasoning – Springer Journals
Published: Jun 1, 2023
Keywords: Automated theorem proving; Completeness; Splitting; AVATAR