Foundations of Software Science and Computation Structures

Jean Goubault-Larrecq and Barbara König (Eds.)

Foundations of Software Science and Computation Structures
23rd International Conference, FOSSACS 2020
Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020
Dublin, Ireland, April 25–30, 2020, Proceedings

Lecture Notes in Computer Science 12077
ARCoSS (Advanced Research in Computing and Software Science), a subline of Lecture Notes in Computer Science

Founding Editors: Gerhard Goos (Germany) and Juris Hartmanis (USA)

Editorial Board Members: Elisa Bertino (USA), Wen Gao (China), Bernhard Steffen (Germany), Gerhard Woeginger (Germany), and Moti Yung (USA)

Subline Series Editors: Giorgio Ausiello (University of Rome ‘La Sapienza’, Italy) and Vladimiro Sassone (University of Southampton, UK)

Subline Advisory Board: Susanne Albers (TU Munich, Germany), Benjamin C. Pierce (University of Pennsylvania, USA), Bernhard Steffen (University of Dortmund, Germany), Deng Xiaotie (Peking University, Beijing, China), and Jeannette M. Wing (Microsoft Research, Redmond, WA, USA)

Editors:
Jean Goubault-Larrecq, Université Paris-Saclay, ENS Paris-Saclay, CNRS, Cachan, France
Barbara König, University of Duisburg-Essen, Duisburg, Germany

ISSN 0302-9743 (print), ISSN 1611-3349 (electronic)
ISBN 978-3-030-45230-8 (print), ISBN 978-3-030-45231-5 (eBook)
LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2020. This book is an open access publication.
Open Access. This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

ETAPS Foreword

Welcome to the 23rd ETAPS! This was the first time that ETAPS took place in Ireland, in its beautiful capital Dublin. ETAPS 2020 was the 23rd instance of the European Joint Conferences on Theory and Practice of Software.
ETAPS is an annual federated conference established in 1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each conference has its own Program Committee (PC) and its own Steering Committee (SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations of programming language developments, analysis tools, and formal approaches to software engineering. Organizing these conferences in a coherent, highly synchronized conference program enables researchers to participate in an exciting event, having the possibility to meet many colleagues working in different directions in the field, and to easily attend talks of different conferences. On the weekend before the main conference, numerous satellite workshops took place that attracted many researchers from all over the globe. Also, for the second time, an ETAPS Mentoring Workshop was organized. This workshop is intended to help students early in the program with advice on research, career, and life in the fields of computing that are covered by the ETAPS conference. ETAPS 2020 received 424 submissions in total, 129 of which were accepted, yielding an overall acceptance rate of 30.4%. I thank all the authors for their interest in ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers! ETAPS 2020 featured the unifying invited speakers Scott Smolka (Stony Brook University) and Jane Hillston (University of Edinburgh) and the conference-specific invited speakers (ESOP) Işıl Dillig (University of Texas at Austin) and (FASE) Willem Visser (Stellenbosch University). 
Invited tutorials were provided by Erika Ábrahám (RWTH Aachen University) on the analysis of hybrid systems and Madhusudan Parthasarathy (University of Illinois at Urbana-Champaign) on combining Machine Learning and Formal Methods. On behalf of the ETAPS 2020 attendees, I thank all the speakers for their inspiring and interesting talks!

ETAPS 2020 took place in Dublin, Ireland, and was organized by the University of Limerick and Lero. ETAPS 2020 was further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology). The local organization team consisted of Tiziana Margaria (general chair, UL and Lero), Vasileios Koutavas (Lero@UCD), Anila Mjeda (Lero@UL), Anthony Ventresque (Lero@UCD), and Petros Stratis (Easy Conferences).

The ETAPS Steering Committee (SC) consists of an Executive Board, and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Holger Hermanns (Saarbrücken), Marieke Huisman (chair, Twente), Joost-Pieter Katoen (Aachen and Twente), Jan Kofron (Prague), Gerald Lüttgen (Bamberg), Tarmo Uustalu (Reykjavik and Tallinn), Caterina Urban (Inria, Paris), and Lenore Zuck (Chicago). Other members of the SC are: Armin Biere (Linz), Jordi Cabot (Barcelona), Jean Goubault-Larrecq (Cachan), Jan-Friso Groote (Eindhoven), Esther Guerra (Madrid), Jurriaan Hage (Utrecht), Reiko Heckel (Leicester), Panagiotis Katsaros (Thessaloniki), Stefan Kiefer (Oxford), Barbara König (Duisburg), Fabrice Kordon (Paris), Jan Kretinsky (Munich), Kim G. Larsen (Aalborg), Tiziana Margaria (Limerick), Peter Müller (Zurich), Catuscia Palamidessi (Palaiseau), Dave Parker (Birmingham), Andrew M.
Pitts (Cambridge), Peter Ryan (Luxembourg), Don Sannella (Edinburgh), Bernhard Steffen (Dortmund), Mariëlle Stoelinga (Twente), Gabriele Taentzer (Marburg), Christine Tasson (Paris), Peter Thiemann (Freiburg), Jan Vitek (Prague), Heike Wehrheim (Paderborn), Anton Wijs (Eindhoven), and Nobuko Yoshida (London).

I would like to take this opportunity to thank all speakers, attendees, organizers of the satellite workshops, and Springer for their support. I hope you all enjoyed ETAPS 2020. Finally, a big thanks to Tiziana and her local organization team for all their enormous efforts enabling a fantastic ETAPS in Dublin!

February 2020
Marieke Huisman
ETAPS SC Chair
ETAPS e.V. President

Preface

This volume contains the papers presented at the 23rd International Conference on Foundations of Software Science and Computation Structures (FoSSaCS), which took place in Dublin, Ireland, during April 27–30, 2020. The conference series is dedicated to foundational research with a clear significance for software science. It brings together research on theories and methods to support the analysis, integration, synthesis, transformation, and verification of programs and software systems.

This volume contains 31 contributed papers selected from 98 full paper submissions, and also a paper accompanying an invited talk by Scott Smolka (Stony Brook University, USA). Each submission was reviewed by at least three Program Committee members, with the help of external reviewers, and the final decisions took into account the feedback from a rebuttal phase. The conference submissions were managed using the EasyChair conference system, which was also used to assist with the compilation of these proceedings. We wish to thank all the authors who submitted papers to FoSSaCS 2020, the Program Committee members, the Steering Committee members, and the external reviewers.
In addition, we are grateful to the ETAPS 2020 organizers for providing an excellent environment for FoSSaCS 2020 alongside the other ETAPS conferences and workshops.

February 2020
Jean Goubault-Larrecq
Barbara König

Organization

Program Committee

Parosh Aziz Abdulla, Uppsala University, Sweden
Thorsten Altenkirch, University of Nottingham, UK
Paolo Baldan, Università di Padova, Italy
Nick Benton, Facebook, UK
Frédéric Blanqui, Inria and LSV, France
Michele Boreale, Università di Firenze, Italy
Corina Cirstea, University of Southampton, UK
Pedro R. D’Argenio, Universidad Nacional de Córdoba, CONICET, Argentina
Josée Desharnais, Université Laval, Canada
Jean Goubault-Larrecq, Université Paris-Saclay, ENS Paris-Saclay, CNRS, LSV, Cachan, France
Ichiro Hasuo, National Institute of Informatics, Japan
Delia Kesner, IRIF, Université de Paris, France
Shankara Narayanan Krishna, IIT Bombay, India
Barbara König, Universität Duisburg-Essen, Germany
Sławomir Lasota, University of Warsaw, Poland
Xavier Leroy, Collège de France and Inria, France
Leonid Libkin, University of Edinburgh, UK, and ENS Paris, France
Jean-Yves Marion, LORIA, Université de Lorraine, France
Dominique Méry, LORIA, Université de Lorraine, France
Matteo Mio, LIP, CNRS, ENS Lyon, France
Andrzej Murawski, University of Oxford, UK
Prakash Panangaden, McGill University, Canada
Amr Sabry, Indiana University Bloomington, USA
Lutz Schröder, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
Sebastian Siebertz, Universität Bremen, Germany
Benoît Valiron, LRI, CentraleSupélec, Université Paris-Saclay, France

Steering Committee

Andrew Pitts (Chair), University of Cambridge, UK
Christel Baier, Technische Universität Dresden, Germany
Lars Birkedal, Aarhus University, Denmark
Ugo Dal Lago, Università degli Studi di Bologna, Italy
Javier Esparza, Technische Universität München, Germany
Anca Muscholl, LaBRI, Université de Bordeaux, France
Frank Pfenning, Carnegie Mellon University, USA

Additional Reviewers

Accattoli, Beniamino;
Alvim, Mario S.; André, Étienne; Argyros, George; Arun-Kumar, S.; Ayala-Rincon, Mauricio; Bacci, Giorgio; Bacci, Giovanni; Balabonski, Thibaut; Basile, Davide; Berger, Martin; Bernardi, Giovanni; Bisping, Benjamin; Bodeveix, Jean-Paul; Bollig, Benedikt; Bonchi, Filippo; Bonelli, Eduardo; Boulmé, Sylvain; Bourke, Timothy; Bradfield, Julian; Breuvart, Flavien; Bruni, Roberto; Bruse, Florian; Capriotti, Paolo; Carette, Jacques; Carette, Titouan; Carton, Olivier; Cassano, Valentin; Chadha, Rohit; Charguéraud, Arthur; Cho, Kenta; Choudhury, Vikraman; Ciancia, Vincenzo; Clemente, Lorenzo; Colacito, Almudena; Corradini, Andrea; Czerwiński, Wojciech; de Haan, Ronald; de Visme, Marc; Dell’Erba, Daniele; Deng, Yuxin; Eickmeyer, Kord; Exibard, Leo; Faggian, Claudia; Fijalkow, Nathanaël; Filali-Amine, Mamoun; Francalanza, Adrian; Frutos Escrig, David; Galletta, Letterio; Ganian, Robert; Garrigue, Jacques; Gastin, Paul; Genaim, Samir; Genest, Blaise; Ghica, Dan; Goncharov, Sergey; Gorla, Daniele; Guerrini, Stefano; Hirschowitz, Tom; Hofman, Piotr; Hoshino, Naohiko; Howar, Falk; Inverso, Omar; Iván, Szabolcs; Jaax, Stefan; Jeandel, Emmanuel; Johnson, Michael; Kahrs, Stefan; Kamburjan, Eduard; Katsumata, Shin-Ya; Kerjean, Marie; Kiefer, Stefan; Komorida, Yuichi; Kop, Cynthia; Kremer, Steve; Kuperberg, Denis; Křetínský, Jan; Laarman, Alfons; Laurent, Fribourg; Levy, Paul Blain; Li, Yong; Licata, Daniel R.; Liquori, Luigi; Lluch Lafuente, Alberto; Lopez, Aliaume; Malherbe, Octavio; Manuel, Amaldev; Manzonetto, Giulio; Matache, Christina; Matthes, Ralph; Mayr, Richard; Melliès, Paul-André; Merz, Stephan; Miculan, Marino; Mikulski, Łukasz; Moser, Georg; Moss, Larry; Munch-Maccagnoni, Guillaume; Muskalla, Sebastian; Nantes-Sobrinho, Daniele; Nestra, Härmel; Neumann, Eike; Neves, Renato; Niehren, Joachim; Padovani, Luca; Pagani, Michele; Paquet, Hugo; Patterson, Daniel; Pedersen, Mathias Ruggaard; Peressotti, Marco; Pitts, Andrew; Potapov, Igor; Power, John; Praveen, M.; Puppis, Gabriele; Péchoux, Romain; Pérez, Guillermo; Quatmann, Tim; Rabinovich, Roman; Radanne, Gabriel; Rand, Robert; Ravara, António; Remy, Didier; Reutter, Juan L.; Rossman, Benjamin; Rot, Jurriaan; Rowe, Reuben; Ruemmer, Philipp; Sammartino, Matteo; Sankaran, Abhisekh; Sankur, Ocan; Sattler, Christian; Schmitz, Sylvain; Serre, Olivier; Shirmohammadi, Mahsa; Siles, Vincent; Simon, Bertrand; Simpson, Alex; Singh, Neeraj; Sprunger, David; Srivathsan, B.; Staton, Sam; Stolze, Claude; Straßburger, Lutz; Streicher, Thomas; Tan, Tony; Tawbi, Nadia; Toruńczyk, Szymon; Tzevelekos, Nikos; Urbat, Henning; van Bakel, Steffen; van Breugel, Franck; van de Pol, Jaco; van Doorn, Floris; Van Raamsdonk, Femke; Vaux Auclair, Lionel; Verma, Rakesh M.; Vial, Pierre; Vignudelli, Valeria; Vrgoc, Domagoj; Waga, Masaki; Wang, Meng; Witkowski, Piotr; Zamdzhiev, Vladimir; Zemmari, Akka; Zhang, Zhenya; Zorzi, Margherita

Contents

Neural Flocking: MPC-Based Supervised Learning of Flocking Controllers (p. 1)
  Usama Mehmood, Shouvik Roy, Radu Grosu, Scott A. Smolka, Scott D. Stoller, and Ashish Tiwari
On Well-Founded and Recursive Coalgebras (p. 17)
  Jiří Adámek, Stefan Milius, and Lawrence S. Moss
Timed Negotiations (p. 37)
  S. Akshay, Blaise Genest, Loïc Hélouët, and Sharvik Mital
Cartesian Difference Categories (p. 57)
  Mario Alvarez-Picallo and Jean-Simon Pacaud Lemay
Contextual Equivalence for Signal Flow Graphs (p. 77)
  Filippo Bonchi, Robin Piedeleu, Paweł Sobociński, and Fabio Zanasi
Parameterized Synthesis for Fragments of First-Order Logic Over Data Words (p. 97)
  Béatrice Bérard, Benedikt Bollig, Mathieu Lehaut, and Nathalie Sznajder
Controlling a Random Population (p. 119)
  Thomas Colcombet, Nathanaël Fijalkow, and Pierre Ohlmann
Decomposing Probabilistic Lambda-Calculi (p. 136)
  Ugo Dal Lago, Giulio Guerrieri, and Willem Heijltjes
On the k-synchronizability of Systems (p. 157)
  Cinzia Di Giusto, Laetitia Laversa, and Etienne Lozes
General Supervised Learning as Change Propagation with Delta Lenses (p. 177)
  Zinovy Diskin
Non-idempotent Intersection Types in Logical Form (p. 198)
  Thomas Ehrhard
On Computability of Data Word Functions Defined by Transducers (p. 217)
  Léo Exibard, Emmanuel Filiot, and Pierre-Alain Reynier
Minimal Coverability Tree Construction Made Complete and Efficient (p. 237)
  Alain Finkel, Serge Haddad, and Igor Khmelnitsky
Constructing Infinitary Quotient-Inductive Types (p. 257)
  Marcelo P. Fiore, Andrew M. Pitts, and S. C. Steenkamp
Relative Full Completeness for Bicategorical Cartesian Closed Structure (p. 277)
  Marcelo Fiore and Philip Saville
A Duality Theoretic View on Limits of Finite Structures (p. 299)
  Mai Gehrke, Tomáš Jakl, and Luca Reggio
Correctness of Automatic Differentiation via Diffeologies and Categorical Gluing (p. 319)
  Mathieu Huot, Sam Staton, and Matthijs Vákár
Deep Induction: Induction Rules for (Truly) Nested Types (p. 339)
  Patricia Johann and Andrew Polonsky
Exponential Automatic Amortized Resource Analysis (p. 359)
  David M. Kahn and Jan Hoffmann
Concurrent Kleene Algebra with Observations: From Hypotheses to Completeness (p. 381)
  Tobias Kappé, Paul Brunet, Alexandra Silva, Jana Wagemaker, and Fabio Zanasi
Graded Algebraic Theories (p. 401)
  Satoshi Kura
A Curry-style Semantics of Interaction: From Untyped to Second-Order Lazy λμ-Calculus (p. 422)
  James Laird
An Axiomatic Approach to Reversible Computation (p. 442)
  Ivan Lanese, Iain Phillips, and Irek Ulidowski
An Auxiliary Logic on Trees: on the Tower-Hardness of Logics Featuring Reachability and Submodel Reasoning (p. 462)
  Alessio Mansutti
The Inconsistent Labelling Problem of Stutter-Preserving Partial-Order Reduction (p. 482)
  Thomas Neele, Antti Valmari, and Tim A. C. Willemse
Semantical Analysis of Contextual Types (p. 502)
  Brigitte Pientka and Ulrich Schöpp
Ambiguity, Weakness, and Regularity in Probabilistic Büchi Automata (p. 522)
  Christof Löding and Anton Pirogov
Local Local Reasoning: A BI-Hyperdoctrine for Full Ground Store (p. 542)
  Miriam Polzer and Sergey Goncharov
Quantum Programming with Inductive Datatypes: Causality and Affine Type Theory (p. 562)
  Romain Péchoux, Simon Perdrix, Mathys Rennela, and Vladimir Zamdzhiev
Spinal Atomic Lambda-Calculus (p. 582)
  David Sherratt, Willem Heijltjes, Tom Gundersen, and Michel Parigot
Learning Weighted Automata over Principal Ideal Domains (p. 602)
  Gerco van Heerdt, Clemens Kupke, Jurriaan Rot, and Alexandra Silva
The Polynomial Complexity of Vector Addition Systems with States (p. 622)
  Florian Zuleger
Author Index (p. 643)

Neural Flocking: MPC-Based Supervised Learning of Flocking Controllers

Usama Mehmood¹, Shouvik Roy¹, Radu Grosu², Scott A. Smolka¹, Scott D. Stoller¹, and Ashish Tiwari³

¹ Stony Brook University, Stony Brook, NY, USA
² Technische Universität Wien, Wien, Austria
³ Microsoft Research, San Francisco, CA, USA

Abstract. We show how a symmetric and fully distributed flocking controller can be synthesized using Deep Learning from a centralized flocking controller. Our approach is based on Supervised Learning, with the centralized controller providing the training data, in the form of trajectories of state-action pairs. We use Model Predictive Control (MPC) for the centralized controller, an approach that we have successfully demonstrated on flocking problems. MPC-based flocking controllers are high-performing but also computationally expensive. By learning a symmetric and distributed neural flocking controller from a centralized MPC-based one, we achieve the best of both worlds: the neural controllers have high performance (on par with the MPC controllers) and high efficiency. Our experimental results demonstrate the sophisticated nature of the distributed controllers we learn. In particular, the neural controllers are capable of achieving myriad flocking-oriented control objectives, including flocking formation, collision avoidance, obstacle avoidance, predator avoidance, and target seeking. Moreover, they generalize the behavior seen in the training data to achieve these objectives in a significantly broader range of scenarios. In terms of verification of our neural flocking controller, we use a form of statistical model checking to compute confidence intervals for its convergence rate and time to convergence.

Keywords: Flocking · Model Predictive Control · Distributed Neural Controller · Deep Neural Network · Supervised Learning

1 Introduction

With the introduction of Reynolds' rule-based model [16, 17], it is now possible to understand the flocking problem as one of distributed control.
Specifically, in this model, at each time step, each agent executes a control law given in terms of the weighted sum of three competing forces to determine its next acceleration. Each of these forces has its own rule: separation (keep a safe distance away from your neighbors), cohesion (move toward the centroid of your neighbors), and alignment (steer toward the average heading of your neighbors). Reynolds' controller is distributed; i.e., it is executed separately by each agent, using information about only itself and nearby agents, and without communication. Furthermore, it is symmetric; i.e., every agent runs the same controller (same code).

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 1–16, 2020.

Fig. 1: Neural Flocking Architecture

We subsequently showed that a simpler, more declarative approach to the flocking problem is possible [11]. In this setting, flocking is achieved when the agents combine to minimize a system-wide cost function. We presented centralized and distributed solutions for achieving this form of "declarative flocking" (DF), both of which were formulated in terms of Model-Predictive Control (MPC) [2]. Another advantage of DF over the rule-based approach exemplified by Reynolds' model is that it allows one to consider additional control objectives (e.g., obstacle and predator avoidance) simply by extending the cost function with additional terms for these objectives. Moreover, these additional terms are typically quite straightforward in nature. In contrast, deriving behavioral rules that achieve the new control objectives can be a much more challenging task.

An issue with MPC is that computing the next control action can be computationally expensive, as MPC searches for an action sequence that minimizes the cost function over a given prediction horizon. This renders MPC unsuitable for real-time applications with short control periods, for which flocking is a prime example.
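As an illustration of the rule-based approach described above, the following is a minimal sketch of a Reynolds-style update step. The neighborhood radius and rule weights here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def reynolds_step(pos, vel, dt=0.1, r=5.0, w_sep=1.5, w_coh=1.0, w_ali=1.0):
    """One Euler step of a Reynolds-style rule-based flocking update.

    pos, vel: (n, 2) arrays of agent positions and velocities.
    r is a radius-based neighborhood; the weights w_* are illustrative.
    """
    n = len(pos)
    acc = np.zeros_like(pos)
    for i in range(n):
        diff = pos - pos[i]                    # vectors from agent i to all agents
        dist = np.linalg.norm(diff, axis=1)
        nbr = (dist > 0) & (dist < r)          # neighbors within radius r
        if not nbr.any():
            continue
        # separation: move away from close neighbors
        sep = -np.sum(diff[nbr] / dist[nbr, None] ** 2, axis=0)
        # cohesion: move toward the centroid of neighbors
        coh = pos[nbr].mean(axis=0) - pos[i]
        # alignment: steer toward the average heading of neighbors
        ali = vel[nbr].mean(axis=0) - vel[i]
        acc[i] = w_sep * sep + w_coh * coh + w_ali * ali
    # discrete-time equations of motion (velocity/acceleration bounds omitted)
    return pos + dt * vel, vel + dt * acc
```

Note that the controller is symmetric and distributed in the sense used above: each agent's acceleration depends only on its own state and its neighbors' states.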
Another potential problem with MPC-based approaches to flocking is their performance (in terms of achieving the desired flight formation), which may suffer in a fully distributed setting.

In this paper, we present Neural Flocking (NF), a new approach to the flocking problem that uses Supervised Learning to learn a symmetric and fully distributed flocking controller from a centralized MPC-based controller. By doing so, we achieve the best of both worlds: high performance (on par with the MPC controllers) in terms of meeting flocking flight-formation objectives, and high efficiency, leading to real-time flight controllers. Moreover, our NF controllers can easily be parallelized on hardware accelerators such as GPUs and TPUs.

Figure 1 gives an overview of the NF approach. A high-performing centralized MPC controller provides the labeled training data to the learning agent: a symmetric and distributed neural controller in the form of a deep neural network (DNN). The training data consists of trajectories of state-action pairs, where a state contains the information known to an agent at a time step (e.g., its own position and velocity, and the position and velocity of its neighbors), and the action (the label) is the acceleration assigned to that agent at that time step by the centralized MPC controller.

We formulate and evaluate NF in a number of essential flocking scenarios: basic flocking with inter-agent collision avoidance, as in [11], and more advanced scenarios with additional objectives, including obstacle avoidance, predator avoidance, and target seeking by the flock. We conduct an extensive performance evaluation of NF. Our experimental results demonstrate the sophisticated nature of NF controllers. In particular, they are capable of achieving all of the stated control objectives.
Moreover, they generalize the behavior seen in the training data in order to achieve these objectives in a significantly broader range of scenarios. In terms of verification of our neural controller, we use a form of statistical model checking [5, 10] to compute confidence intervals for its rate of convergence to a flock and for its time to convergence.

2 Background

We consider a set of n dynamic agents A = {1, ..., n} that move according to the following discrete-time equations of motion:

  p_i(k+1) = p_i(k) + dt · v_i(k),  |v_i(k)| < v̄
  v_i(k+1) = v_i(k) + dt · a_i(k),  |a_i(k)| < ā    (1)

where p_i(k) ∈ R², v_i(k) ∈ R², a_i(k) ∈ R² are the position, velocity, and acceleration of agent i ∈ A, respectively, at time step k, and dt ∈ R is the time step. The magnitudes of velocities and accelerations are bounded by v̄ and ā, respectively. Acceleration a_i(k) is the control input for agent i at time step k. The acceleration is updated after every η time steps; i.e., η · dt is the control period. The flock configuration at time step k is thus given by the following vectors (in boldface):

  p(k) = [p_1(k)^T ··· p_n(k)^T]^T    (2)
  v(k) = [v_1(k)^T ··· v_n(k)^T]^T    (3)
  a(k) = [a_1(k)^T ··· a_n(k)^T]^T    (4)

The configuration vectors are referred to without the time indexing as p, v, and a. The neighborhood of agent i at time step k, denoted by N_i(k) ⊆ A, contains its N-nearest neighbors, i.e., the N other agents closest to it. We use this definition (in Section 2.2, to define a distributed-flocking cost function) for simplicity, and expect that a radius-based definition of neighborhood would lead to similar results for our distributed flocking controllers.

2.1 Model-Predictive Control

Model-predictive control (MPC) [2] is a well-known control technique that has recently been applied to the flocking problem [11, 19, 20].
At each control step, an optimization problem is solved to find the optimal sequence of control actions (agent accelerations, in our case) that minimizes a given cost function with respect to a predictive model of the system. The first control action of the optimal control sequence is then applied to the system; the rest is discarded. In the computation of the cost function, the predictive model is evaluated for a finite prediction horizon of T control steps.

MPC-based flocking models can be categorized as centralized or distributed. A centralized model assumes that complete information about the flock is available to a single "global" controller, which uses the states of all agents to compute their next optimal accelerations. The following optimization problem is solved by a centralized MPC controller at each control step k:

  min_{a(k|k), ..., a(k+T−1|k), ||a(k+t|k)|| < ā}  J(k) + λ · Σ_{t=0}^{T−1} ||a(k+t|k)||²    (5)

The first term J(k) is the centralized model-specific cost, evaluated for T control steps (this embodies the predictive aspect of MPC), starting at time step k. It encodes the control objective of minimizing the cost function J(k). The second term, scaled by a weight λ > 0, penalizes large control inputs: a(k+t|k) are the predictions made at time step k for the accelerations at time step k+t.

In distributed MPC, each agent computes its acceleration based only on its own state and its local knowledge, e.g., information about its neighbors:

  min_{a_i(k|k), ..., a_i(k+T−1|k), ||a_i(k+t|k)|| < ā}  J_i(k) + λ · Σ_{t=0}^{T−1} ||a_i(k+t|k)||²    (6)

J_i(k) is the distributed, model-specific cost function for agent i, analogous to J(k). In a distributed setting where an agent's knowledge of its neighbors' behavior is limited, an agent cannot calculate the exact future behavior of its neighbors. Hence, the predictive aspect of J_i(k) must rely on some assumption about that behavior during the prediction horizon.
Our distributed cost functions are based on the assumption that the neighbors have zero accelerations during the prediction horizon. While this simple design is clearly not completely accurate, our experiments show that it still achieves good results.

2.2 Declarative Flocking

Declarative flocking (DF) is a high-level approach to designing flocking algorithms based on defining a suitable cost function for MPC [11]. This is in contrast to the operational approach, where a set of rules is used to capture flocking behavior, as in Reynolds' model. For basic flocking, the DF cost function contains two terms: (1) a cohesion term based on the squared distance between each pair of agents in the flock; and (2) a separation term based on the inverse of the squared distance between each pair of agents. The flock evolves toward a configuration in which these two opposing forces are balanced. The cost function J^C for centralized DF, i.e., centralized MPC (CMPC), is as follows:

  J^C(p) = (2 / (|A| · (|A| − 1))) · Σ_{i∈A} Σ_{j∈A, i<j} ||p_ij||² + ω_s · Σ_{i∈A} Σ_{j∈A, i<j} 1/||p_ij||²    (7)
Additional goals such as obstacle avoidance, predator avoidance, and target seeking are added to the MPC formulation as weighted cost-function terms. Different objectives can be combined by including the corresponding terms in the cost function as a weighted sum. Cost-Function Term for Obstacle Avoidance. We consider multiple rectangular obstacles which are distributed randomly in the field. For a set of m rectangular obstacles O = {O , O , ..., O }, we define the cost function term for obstacle 1 2 m avoidance as: 1 1 J (p, o)= (9) OA |A||O|  (i) iA jO p − o (i) where o is the set of points on the obstacle boundaries and o is the point on th th the obstacle boundary of the j obstacle O that is closest to the i agent. Cost-Function Term for Target Seeking. This term is the average of the squared distance between the agents and the target. Let g denote the position of the fixed target. Then the target-seeking term is as defined as J (p)= p − g (10) TS i |A| i∈A Cost-Function Term for Predator Avoidance. We introduce a single predator, which is more agile than the flocking agents: its maximum speed and acceleration are a factor of f greater than v¯ and a ¯, respectively, with f > 1. Apart from p p being more agile, the predator has the same dynamics as the agents, given by 6 U. Mehmood et al. Eq. (1). The control law for the predator consists of a single term that causes it to move toward the centroid of the flock with maximum acceleration. For a flock of n agents and one predator, the cost-function term for predator avoidance is the average of the inverse of the cube of the distances between the predator and the agents. It is given by: 1 1 J (p,p )= (11) PA pred |A| p − p i pred iA where p is the position of the predator. In contrast to the separation term pred in Eqs. (5)-(6), which we designed to ensure inter-agent collision avoidance, the predator-avoidance term has a cube instead of a square in the denominator. 
This is to reduce the influence of the predator on the flock when the predator is far away from the flock.

NF Cost-Function Terms. The MPC cost functions used in our examination of Neural Flocking are weighted sums of the cost-function terms introduced above. We refer to the first term of our centralized DF cost function J^C(p) (see Eq. (7)) as J_cohes(p) and the second as J_sep(p). We use the following cost functions J_1, J_2, and J_3 for basic flocking with collision avoidance, obstacle avoidance with target seeking, and predator avoidance, respectively:

    J_1(p) = J_cohes(p) + ω_s · J_sep(p)    (12a)
    J_2(p, o) = J_cohes(p) + ω_s · J_sep(p) + ω_o · J_OA(p, o) + ω_t · J_TS(p)    (12b)
    J_3(p, p_pred) = J_cohes(p) + ω_s · J_sep(p) + ω_p · J_PA(p, p_pred)    (12c)

where ω_s is the weight of the separation term, ω_o is the weight of the obstacle-avoidance term, ω_t is the weight of the target-seeking term, and ω_p is the weight of the predator-avoidance term. Note that J_1 is equivalent to J^C (Eq. (7)). The weight ω_s of the separation term is chosen experimentally to ensure that the distance between agents, throughout the simulation, is at least d_min, the minimum inter-agent distance representing collision avoidance. Similar considerations guided the choice of values for ω_o and ω_p. The specific values we used for the weights are: ω_s = 2000, ω_o = 1500, ω_t = 10, and ω_p = 500.

We experimented with an alternative strategy for introducing inter-agent collision avoidance, obstacle avoidance, and predator avoidance into the MPC problem, namely, as constraints of the form d_min − ‖p_ij‖ < 0, d_min^obs − ‖p_i − o^(i)‖ < 0, and d_min^pred − ‖p_i − p_pred‖ < 0, respectively. Using the theory of exact penalty functions [12], we recast the constrained MPC problem as an equivalent unconstrained MPC problem by converting the constraints into a weighted penalty term, which is then added to the MPC cost function.
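The constraint-to-penalty conversion can be sketched as follows; the ℓ1-style max(0, ·) penalty shown here is a standard exact-penalty form [12], with illustrative names and a hypothetical one-dimensional constraint:

```python
def exact_penalty(constraints, mu):
    """Recast constraints c_k(x) < 0 as a weighted penalty term
    mu * sum_k max(0, c_k(x)), to be added to the MPC cost (cf. [12])."""
    def penalty(x):
        return mu * sum(max(0.0, c(x)) for c in constraints)
    return penalty

# Hypothetical 1-D inter-agent constraint d_min - |x_0 - x_1| < 0:
d_min = 1.5
too_close = lambda x: d_min - abs(x[0] - x[1])   # violated when agents are too close
pen = exact_penalty([too_close], mu=100.0)
```

Note that max(0, ·) is non-differentiable at the constraint boundary, which is the source of the non-smoothness discussed next.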
This approach rendered the optimization problem difficult to solve due to the non-smoothness of the penalty term. As a result, constraint violations in the form of collisions were observed during simulation.

4 Neural Flocking

We learn a distributed neural controller (DNC) for the flocking problem using training data in the form of trajectories of state-action pairs produced by a CMPC controller. In addition to basic flocking with inter-agent collision avoidance, the DNC exhibits a number of other flocking-related behaviors, including obstacle avoidance, target seeking, and predator avoidance. We also show how the learned behavior exhibited by the DNC generalizes to a larger number of agents than was used during training, achieving successful collision-free flocking in significantly larger flocks.

We use Supervised Learning to train the DNC. Supervised Learning learns a function that maps an input to an output based on example input-output pairs. In our case, the trajectory data obtained from CMPC contains both the training inputs and the corresponding labels (outputs): the state of an agent in the flock (and that of its nearest neighbors) at a particular time step is the input, and that agent's acceleration at the same time step is the label.

4.1 Training Distributed Flocking Controllers

We use Deep Learning to synthesize a distributed and symmetric neural controller from the training data provided by the CMPC controller. Our objective is to learn basic flocking, obstacle avoidance with target seeking, and predator avoidance; their respective CMPC-based cost functions are given in Sections 2.2 and 3. All of these control objectives implicitly also include inter-agent collision avoidance, by virtue of the separation term in Eq. (7).
For each of these control objectives, DNC training data is obtained from CMPC trajectory data generated for n = 15 agents, starting from initial configurations in which agent positions and velocities are uniformly sampled from [−15, 15]² and [0, 1]², respectively. All training trajectories are 1,000 time steps in duration. We further ensure that the initial configurations are recoverable; i.e., no two agents are so close to each other that they cannot avoid a collision by resorting to maximal accelerations. We learn a single DNC from the state-action pairs of all n agents. This yields a symmetric distributed controller, which we use for each agent in the flock during evaluation.

Basic Flocking. Trajectory data for basic flocking is generated using the cost function given in Eq. (7). We generate 200 trajectories, each of which (as noted above) is 1,000 time steps long. The input to the NN is the position and velocity of each agent along with the positions and velocities of its N nearest neighbors. This yields 200 · 1,000 · 15 = 3M total training samples.

Let us refer to the agent (the DNC) being learned as A_0. Since we use neighborhood size N = 14, the input to the NN is of the form [p_0^x p_0^y v_0^x v_0^y p_1^x p_1^y v_1^x v_1^y ... p_14^x p_14^y v_14^x v_14^y], where p_0^x, p_0^y are the position coordinates and v_0^x, v_0^y the velocity coordinates of agent A_0, and p_{1...14}^x, p_{1...14}^y and v_{1...14}^x, v_{1...14}^y are the position and velocity vectors of its neighbors. Since this input vector has 60 components, the input to the NN consists of 60 features.

Fig. 2: Snapshots of DNC flocking behaviors for 30 agents. (a) Basic flocking; (b) obstacle avoidance; (c) predator avoidance; (d) target seeking.

Obstacle Avoidance with Target Seeking. For obstacle avoidance with target seeking, we use CMPC with the cost function given in Eq. (12b). The target is located beyond the obstacles, forcing the agents to move through the obstacle field.
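The nearest-neighbor input encoding described above can be sketched as follows (our own illustration; the paper's training states come from CMPC trajectories, not random samples):

```python
import numpy as np

def build_input(p, v, i, N=14):
    """Assemble the flat feature vector for agent i: its own position and
    velocity followed by those of its N nearest neighbors (by distance).
    With N = 14 this gives 4 + 14*4 = 60 features."""
    d = np.linalg.norm(p - p[i], axis=1)
    neighbors = np.argsort(d)[1:N + 1]        # skip agent i itself
    feats = [p[i], v[i]]
    for j in neighbors:
        feats.append(p[j])
        feats.append(v[j])
    return np.concatenate(feats)
```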
For the training data, we generate 100 trajectories over 4 different obstacle fields (25 trajectories per obstacle field). The input to the NN consists of the 92 features [p_0^x p_0^y v_0^x v_0^y o_0^x o_0^y ... p_14^x p_14^y v_14^x v_14^y o_14^x o_14^y g^x g^y], where o_0^x, o_0^y is the closest point on any obstacle to agent A_0; o_{1...14}^x, o_{1...14}^y give the closest obstacle points for the 14 neighboring agents; and g^x, g^y is the target location.

Predator Avoidance. The CMPC cost function for predator avoidance is given in Eq. (12c). The position, velocity, and acceleration of the predator are denoted by p_pred, v_pred, and a_pred, respectively. We take f_p = 1.40; hence v̄_pred = 1.40·v̄ and ā_pred = 1.40·ā. The input features to the NN are the positions and velocities of agent A_0 and its N nearest neighbors, and the position and velocity of the predator. The input with 64 features thus has the form [p_0^x p_0^y v_0^x v_0^y ... p_14^x p_14^y v_14^x v_14^y p_pred^x p_pred^y v_pred^x v_pred^y].

5 Experimental Evaluation

This section contains the results of our extensive performance analysis of the distributed neural flocking controller (DNC), taking into account various control objectives: basic flocking with collision avoidance, obstacle avoidance with target seeking, and predator avoidance. As illustrated in Fig. 1, this involves running CMPC to generate the training data for the DNCs, whose performance we then compare to that of the DMPC and CMPC controllers. We also show that the DNC flocking controllers generalize the behavior seen in the training data to achieve successful collision-free flocking in flocks significantly larger than those used during training. Finally, we use Statistical Model Checking to obtain confidence intervals for the DNC's correctness and performance.

5.1 Preliminaries

The CMPC and DMPC control problems defined in Section 2.1 are solved using the MATLAB fmincon optimizer.
In the training phase, the size of the flock is n = 15. For obstacle avoidance with target seeking, we use 5 obstacles with the target located at [60, 50]. The simulation time is 100, dt = 0.1 time units, and η = 3, where (recall) η · dt is the control period. Further, the agent velocity and acceleration bounds are v̄ = 2.0 and ā = 1.5.

We use d_min = 1.5 as the minimum inter-agent distance for collision avoidance, d_min^obs = 1 as the minimum agent-obstacle distance for obstacle avoidance, and d_min^pred = 1.5 as the minimum agent-predator distance for predator avoidance. For initial configurations, recall that agent positions and velocities are uniformly sampled from [−15, 15]² and [0, 1]², respectively, and we ensure that they are recoverable; i.e., no two agents are so close to each other that they cannot avoid a collision when resorting to maximal accelerations. The predator starts at rest from a fixed location at a distance of 40 from the flock center.

For training, we considered 15 agents and 200 trajectories, each trajectory 1,000 time steps in length; with state-action pairs collected from all 15 agents, this yielded a total of 3,000,000 training samples. Our neural controller is a fully connected feed-forward Deep Neural Network (DNN) with 5 hidden layers, 84 neurons per hidden layer, and ReLU activation functions. We chose the DNN hyperparameters and architecture iteratively, refining the NN until we observed satisfactory performance from the DNC. For training the DNNs, we use Keras [3], a high-level neural network API written in Python and capable of running on top of TensorFlow. To generate the NN model, Keras uses the Adam optimizer [8] with the following settings: lr = 10^(−2), β_1 = 0.9, β_2 = 0.999, ε = 10^(−8).
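As a sanity check on the architecture, the reported trainable-parameter counts are reproduced by a plain NumPy sketch of a fully connected ReLU MLP with five hidden layers of 84 neurons, assuming a 2-dimensional acceleration output (the output dimension is our inference, not stated explicitly here; the code is illustrative, not the authors' Keras model):

```python
import numpy as np

def init_mlp(n_in, hidden=(84,) * 5, n_out=2, seed=0):
    """Weight/bias pairs for a fully connected MLP: n_in -> 84 x5 -> n_out."""
    rng = np.random.default_rng(seed)
    sizes = (n_in,) + hidden + (n_out,)
    return [(rng.standard_normal((a, b)) * 0.1, np.zeros(b))
            for a, b in zip(sizes, sizes[1:])]

def forward(params, x):
    """ReLU on hidden layers, linear output (the predicted acceleration)."""
    for W, b in params[:-1]:
        x = np.maximum(0.0, x @ W + b)
    W, b = params[-1]
    return x @ W + b

def n_params(params):
    """Total number of trainable parameters (weights plus biases)."""
    return sum(W.size + b.size for W, b in params)
```

With 60, 92, and 64 input features, n_params gives 33,854, 36,542, and 34,190, matching the counts reported for the three controllers.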
The batch size (number of samples processed before the model is updated) is 2,000, and the number of epochs (complete passes through the training dataset) used for training is 1,000. For measuring training loss, we use the mean-squared-error metric. For basic flocking, DNN input vectors have 60 features and the number of trainable DNN parameters is 33,854. For flocking with obstacle avoidance and target seeking, input vectors have 92 features and the number of trainable parameters is 36,542. Finally, for flocking with predator avoidance, input vectors have 64 features and the resulting number of trainable DNN parameters is 34,190.

To test the trained DNC, we generated 100 simulations (runs) for each of the desired control objectives: basic flocking with collision avoidance, flocking with obstacle avoidance and target seeking, and flocking with predator avoidance. The results presented in Table 1 were obtained using the same number of agents and obstacles and the same predator as in the training phase. We also ran tests showing that DNC controllers can achieve collision-free flocking with obstacle avoidance when the numbers of agents and obstacles are greater than those used during training.

5.2 Results for Basic Flocking

We use flock diameter, inter-agent collision count, and velocity convergence [20] as performance metrics for flocking behavior. At any time step, the flock diameter D(p) = max_{(i,j)∈A} ‖p_ij‖ is the largest distance between any two agents in the flock. We calculate the average converged diameter by averaging the flock diameter in the final time step of the simulation over the 100 runs.

Fig. 3: Performance comparison for basic flocking with collision avoidance, averaged over 100 test runs. (a) Flock diameter; (b) velocity convergence.

An inter-agent collision (IC) occurs when the distance between two agents at any point in time is less than d_min.
The IC rate (ICR) is the average number of ICs per test-trajectory time-step. The velocity convergence VC(v) = (1/n) · Σ_{i∈A} ‖v_i − (Σ_{j=1}^{n} v_j)/n‖² is the average of the squared magnitude of the discrepancy between the velocities of the agents and the flock's average velocity. For all the metrics, lower values are better, indicating a denser and more coherent flock with fewer collisions. A successful flocking controller should also ensure that the values of D(p) and VC(v) eventually stabilize.

Fig. 3 and Table 1 compare the performance of the DNC on the basic-flocking problem for 15 agents to that of the MPC controllers. Although DMPC and CMPC outperform the DNC, the difference is marginal. An important advantage of the DNC over DMPC is that it is much faster. Executing a DNC controller requires a modest number of arithmetic operations, whereas executing an MPC controller requires simulation of a model and controller over the prediction horizon. In our experiments, on average, CMPC takes 1,209 msec of CPU time for the entire flock and DMPC takes 58 msec of CPU time per agent, whereas the DNC takes only 1.6 msec.

Table 1: Performance comparison for BF with 15 agents on 100 test runs

          Avg. Conv. Diameter   ICR   Velocity Convergence
DNC       14.13                 0     0.15
DMPC      13.67                 0     0.11
CMPC      13.84                 0     0.10

Table 2: DNC Performance Generalization for BF

Agents   Avg. Conv. Diameter   Conv. Rate (%)   Avg. Conv. Time   ICR
15       14.13                 100              52.15             0
20       16.45                 97               58.76             0
25       19.81                 94               64.11             0
30       23.24                 92               72.08             0
35       30.57                 86               83.84             0.008
40       38.66                 81               95.32             0.019

5.3 Results for Obstacle and Predator Avoidance

For obstacle and predator avoidance, collision rates are used as performance metrics. An obstacle-agent collision (OC) occurs when the distance between an agent and the closest point on any obstacle is less than d_min^obs. A predator-agent collision (PC) occurs when the distance between an agent and the predator is less than d_min^pred.
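The flock-diameter and velocity-convergence metrics defined above can be computed directly; a NumPy sketch (our own illustration):

```python
import numpy as np

def flock_diameter(p):
    """D(p): largest pairwise distance between agents (positions p: (n, 2))."""
    diffs = p[:, None, :] - p[None, :, :]
    return float(np.max(np.linalg.norm(diffs, axis=-1)))

def velocity_convergence(v):
    """VC(v): mean squared deviation of agent velocities from the flock mean."""
    dev = v - v.mean(axis=0)
    return float(np.mean(np.sum(dev ** 2, axis=1)))
```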
The OC rate (OCR) is the average number of OCs per test-trajectory time-step, and the PC rate (PCR) is defined similarly. Our test results show that the DNC, along with the DMPC and CMPC, is collision-free (i.e., each of ICR, OCR, and PCR is zero) for 15 agents, with the exception of DMPC for predator avoidance, where PCR = 0.013. We also observed that the flock successfully reaches the target location in all 100 test runs.

5.4 DNC Generalization Results

Tables 2-3 present DNC generalization results for basic flocking (BF), obstacle avoidance (OA), and predator avoidance (PA), with the number of agents ranging from 15 (the flock size during training) to 40. In all of these experiments, we use a neighborhood size of N = 14, the same as during training. Each controller was evaluated with 100 test runs.

The performance metrics in Table 2 are the average converged diameter, convergence rate, average convergence time, and ICR. The convergence rate is the fraction of successful flocks over 100 runs. The collection of agents is said to have converged to a flock (with collision avoidance) if the value of the global cost function is less than the convergence threshold. We use a convergence threshold of J_1(p) ≤ 150, which was chosen based on its proximity to the value achieved by CMPC. We use the cost function from Eq. (12a) to calculate the convergence rate because we are measuring convergence for basic flocking. The average convergence time is the time when the global cost function first drops below the convergence threshold and remains below it for the rest of the run, averaged over all 100 runs. Even with a local neighborhood of size 14, the results demonstrate that the DNC can successfully generalize to a larger number of agents for all of our control objectives.
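The convergence-time rule, the first time the global cost drops below the threshold and stays below it for the rest of the run, can be sketched as follows (function name and list-based interface are our own illustration):

```python
def convergence_time(costs, threshold=150.0):
    """Index of the first time step after which the global cost J_1 stays
    below `threshold` for the rest of the run; None if it never converges."""
    t = None
    for i, c in enumerate(costs):
        if c <= threshold:
            if t is None:
                t = i
        else:
            t = None          # rose above the threshold again: reset
    return t
```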
Table 3: DNC Generalization Performance for OA and PA

                 OA                  PA
Agents   ICR     OCR        ICR     PCR
15       0       0          0       0
20       0       0          0       0
25       0       0          0       0
30       0       0          0       0
35       0.011   0.009      0.013   0.010
40       0.021   0.018      0.029   0.023

5.5 Statistical Model Checking Results

We use Monte Carlo (MC) approximation as a form of Statistical Model Checking [5, 10] to compute confidence intervals for the DNC's convergence rate to a flock with collision avoidance and for the (normalized) convergence time. The convergence rate is the fraction of successful flocks over N runs. The collection of agents is said to have converged to a successful flock with collision avoidance if the global cost function satisfies J_1(p) ≤ 150, where J_1(p) is the cost function for basic flocking defined in Eq. (12a).

The main idea of MC is to use N random variables Z_1, ..., Z_N, also called samples, IID distributed according to a random variable Z with mean μ_Z, and to take μ̃_Z = (Z_1 + ... + Z_N)/N as the value approximating the mean μ_Z. Since an exact computation of μ_Z is almost always intractable, an MC approach is used to compute an (ε, δ)-approximation of this quantity.

Additive Approximation [6] is an (ε, δ)-approximation scheme where the mean μ_Z of an RV Z is approximated with absolute error ε and probability 1 − δ:

    Pr[μ_Z − ε ≤ μ̃_Z ≤ μ_Z + ε] ≥ 1 − δ    (13)

where μ̃_Z is an approximation of μ_Z. An important issue is to determine the number of samples N needed to ensure that μ̃_Z is an (ε, δ)-approximation of μ_Z. If Z is a Bernoulli variable whose mean is expected to be large, one can use the Chernoff-Hoeffding instantiation of the Bernstein inequality and take N to be N = 4 ln(2/δ)/ε², as in [6]. This results in the additive approximation algorithm [5], defined in Algorithm 1. We use this algorithm to obtain a joint (ε, δ)-approximation of the mean convergence rate and mean normalized convergence time for the DNC.
Each sample Z_i is based on the result of an execution obtained by simulating the system starting from a random initial state, and we take Z_i = (B, R), where B is a Boolean variable indicating whether the agents converged to a flock during the execution, and R is a real value denoting the normalized convergence time. The normalized convergence time is the time when the global cost function first drops below the convergence threshold and remains below it for the rest of the run, measured as a fraction of the total duration of the run.

Algorithm 1: Additive Approximation Algorithm
    Input: (ε, δ) with 0 < ε < 1 and 0 < δ < 1
    Input: Random variables Z_i, IID
    Output: μ̃_Z, approximation of μ_Z
    N = 4 ln(2/δ)/ε²; S = 0;
    for (i = 1; i ≤ N; i++) do S = S + Z_i;
    μ̃_Z = S/N;
    return μ̃_Z;

The assumptions about Z required for validity of the additive approximation hold, because RV B is a Bernoulli variable, the convergence rate is expected to be large (i.e., closer to 1 than to 0), and the proportionality constraint of the Bernstein inequality is also satisfied for RV R.

In these experiments, the initial configurations are sampled from the same distributions as in Section 5.1, and we set ε = 0.01 and δ = 0.0001, obtaining N = 396,140. We perform the required set of N simulations for 15, 20, 25, 30, 35, and 40 agents. Table 4 presents the results, specifically, the (ε, δ)-approximations μ̃_CR and μ̃_CT of the mean convergence rate and the mean normalized convergence time, respectively. While the results for the convergence rate are (as expected) numerically similar to the results in Table 2, the results in Table 4 are much stronger, because they come with the guarantee that they are (ε, δ)-approximations of the actual mean values.

Table 4: SMC results for DNC convergence rate and normalized convergence time; ε = 0.01, δ = 0.0001

Agents   μ̃_CR   μ̃_CT
15       0.99    0.53
20       0.97    0.58
25       0.94    0.65
30       0.91    0.71
35       0.86    0.84
40       0.80    0.95
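Algorithm 1 is only a few lines of code. The sketch below (our own illustration, with a hypothetical Bernoulli convergence indicator in place of actual simulations) reproduces the sample count N = 396,140 for ε = 0.01 and δ = 0.0001:

```python
import math
import random

def additive_approx(sample, eps, delta):
    """Algorithm 1: additive (eps, delta)-approximation of E[Z], using the
    Chernoff-Hoeffding sample count N = 4 ln(2/delta) / eps^2."""
    N = math.ceil(4 * math.log(2 / delta) / eps ** 2)
    return sum(sample() for _ in range(N)) / N, N

# Hypothetical convergence indicator: Bernoulli with true mean 0.9.
random.seed(0)
mu_hat, N = additive_approx(lambda: 1.0 if random.random() < 0.9 else 0.0,
                            eps=0.01, delta=0.0001)
```

By Eq. (13), the estimate mu_hat lies within ε = 0.01 of the true mean with probability at least 1 − δ.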
6 Related Work

In [18], a flocking controller is synthesized using multi-agent reinforcement learning (MARL) and natural evolution strategies (NES). The target model from which the system learns is Reynolds' flocking model [16]. For training purposes, a set of metrics called entropy is chosen, which provides a measure of the collective behavior displayed by the target model. As the authors of [18] observe, this technique does not quite work: although it consistently leads to agents forming recognizable patterns during simulation, the agents self-organize into a cluster instead of flowing like a flock.

In [9], reinforcement learning and flocking control are combined for the purpose of predator avoidance, where the learning module determines safe spaces in which the flock can navigate to avoid predators. Their approach to predator avoidance, however, is not distributed, as it requires a majority consensus by the flock to determine its action to avoid predators. They also impose an α-lattice structure [13] on the flock. In contrast, our approach is geometry-agnostic and achieves predator avoidance in a distributed manner.

In [7], an uncertainty-aware reinforcement learning algorithm is developed to estimate the probability of a mobile robot colliding with an obstacle in an unknown environment. Their approach is based on bootstrapped neural networks using dropout, allowing it to process raw sensory inputs. Similarly, a learning-based approach to robot navigation and obstacle avoidance is presented in [14]. They train a model that maps sensor inputs and the target position to motion commands generated by the ROS [15] navigation package. Our work, in contrast, considers obstacle avoidance (and other control objectives) in a multi-agent flocking scenario under the simplifying assumption of full state observation.
In [4], an approach based on Bayesian inference is proposed that allows an agent in a heterogeneous multi-agent environment to estimate the navigation model and goal of each of its neighbors. It then uses this information to compute a plan that minimizes inter-agent collisions while allowing the agent to reach its goal. Flocking formation is not considered.

7 Conclusions

With the introduction of Neural Flocking (NF), we have shown how machine learning in the form of Supervised Learning can bring many benefits to the flocking problem. As our experimental evaluation confirms, the symmetric and fully distributed neural controllers we derive in this manner are capable of achieving a multitude of flocking-oriented objectives, including flocking formation, inter-agent collision avoidance, obstacle avoidance, predator avoidance, and target seeking. Moreover, NF controllers exhibit real-time performance and generalize the behavior seen in the training data to achieve these objectives in a significantly broader range of scenarios.

Ongoing work aims to determine whether a DNC can perform as well as the centralized MPC controller for agent models that are significantly more realistic than our current point-based model. For this purpose, we are using transfer learning to train a DNC that can achieve acceptable performance on realistic quadrotor dynamics [1], starting from our current point-model-based DNC. This effort also involves extending our current DNC from 2-dimensional to 3-dimensional spatial coordinates. If successful (and preliminary results are encouraging), this line of research will demonstrate that DNCs are capable of achieving flocking with complex, realistic dynamics.

For future work, we plan to investigate a distance-based notion of agent neighborhood, as opposed to our current nearest-neighbors formulation.
Furthermore, motivated by the quadrotor study of [21], we will seek to combine MPC with reinforcement learning in the framework of guided policy search as an alternative solution technique for the NF problem.

References

1. Bouabdallah, S.: Design and control of quadrotors with application to autonomous flying (2007)
2. Camacho, E.F., Bordons Alba, C.: Model Predictive Control. Springer (2007)
3. Chollet, F., et al.: Keras (2015)
4. Godoy, J., Karamouzas, I., Guy, S.J., Gini, M.: Moving in a crowd: Safe and efficient navigation among heterogeneous agents. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. pp. 294-300. IJCAI'16, AAAI Press (2016)
5. Grosu, R., Peled, D., Ramakrishnan, C.R., Smolka, S.A., Stoller, S.D., Yang, J.: Using statistical model checking for measuring systems. In: 6th International Symposium, ISoLA 2014. Corfu, Greece (Oct 2014)
6. Hérault, T., Lassaigne, R., Magniette, F., Peyronnet, S.: Approximate probabilistic model checking. In: Steffen, B., Levi, G. (eds.) Verification, Model Checking, and Abstract Interpretation. pp. 73-84. Springer Berlin Heidelberg, Berlin, Heidelberg (2004)
7. Kahn, G., Villaflor, A., Pong, V., Abbeel, P., Levine, S.: Uncertainty-aware reinforcement learning for collision avoidance. arXiv preprint arXiv:1702.01182. pp. 1-12 (2017)
8. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015)
9. La, H.M., Lim, R., Sheng, W.: Multirobot cooperative learning for predator avoidance. IEEE Transactions on Control Systems Technology 23(1), 52-63 (2015)
10. Larsen, K.G., Legay, A.: Statistical model checking: Past, present, and future. In: 6th International Symposium, ISoLA 2014. Corfu, Greece (Oct 2014)
11.
Mehmood, U., Paoletti, N., Phan, D., Grosu, R., Lin, S., Stoller, S.D., Tiwari, A., Yang, J., Smolka, S.A.: Declarative vs rule-based control for flocking dynamics. In: Proceedings of SAC 2018, 33rd Annual ACM Symposium on Applied Computing. pp. 816-823 (2018)
12. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York, NY, USA, second edn. (2006)
13. Olfati-Saber, R.: Flocking for multi-agent dynamic systems: Algorithms and theory. IEEE Transactions on Automatic Control 51(3), 401-420 (2006)
14. Pfeiffer, M., Schaeuble, M., Nieto, J.I., Siegwart, R., Cadena, C.: From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots. In: 2017 IEEE International Conference on Robotics and Automation, ICRA 2017, Singapore, Singapore, May 29 - June 3, 2017. pp. 1527-1533 (2017)
15. Quigley, M., Conley, K., Gerkey, B.P., Faust, J., Foote, T., Leibs, J., Wheeler, R., Ng, A.Y.: ROS: an open-source robot operating system. In: ICRA Workshop on Open Source Software (2009)
16. Reynolds, C.W.: Flocks, herds and schools: A distributed behavioral model. SIGGRAPH Comput. Graph. 21(4) (Aug 1987)
17. Reynolds, C.W.: Steering behaviors for autonomous characters. In: Proceedings of Game Developers Conference 1999. pp. 763-782 (1999)
18. Shimada, K., Bentley, P.: Learning how to flock: Deriving individual behaviour from collective behaviour with multi-agent reinforcement learning and natural evolution strategies. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion. pp. 169-170. ACM (2018)
19. Zhan, J., Li, X.: Flocking of multi-agent systems via model predictive control based on position-only measurements. IEEE Transactions on Industrial Informatics 9(1), 377-385 (2013)
20. Zhang, H.T., Cheng, Z., Chen, G., Li, C.: Model predictive flocking control for second-order multi-agent systems with input constraints.
IEEE Transactions on Circuits and Systems I: Regular Papers 62(6), 1599-1606 (2015)
21. Zhang, T., Kahn, G., Levine, S., Abbeel, P.: Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search. In: 2016 IEEE International Conference on Robotics and Automation, ICRA 2016, Stockholm, Sweden, May 16-21, 2016. pp. 528-535 (2016)

Open Access. This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

On Well-Founded and Recursive Coalgebras

Jiří Adámek (Czech Technical University, Prague, Czech Republic), Stefan Milius (Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany), and Lawrence S. Moss (Indiana University, Bloomington, IN, USA)

Abstract. This paper studies fundamental questions concerning category-theoretic models of induction and recursion. We are concerned with the relationship between well-founded and recursive coalgebras for an endofunctor. For monomorphism-preserving endofunctors on complete and well-powered categories, every coalgebra has a well-founded part, and we provide a new, shorter proof that this is the coreflection in the category of all well-founded coalgebras.
We present a new, more general proof of Taylor's General Recursion Theorem that every well-founded coalgebra is recursive, and we study conditions which imply the converse. In addition, we present a new equivalent characterization of well-foundedness: a coalgebra is well-founded iff it admits a coalgebra-to-algebra morphism to the initial algebra.

Keywords: Well-founded · Recursive · Coalgebra · Initial Algebra · General Recursion Theorem

1 Introduction

What is induction? What is recursion? In areas of theoretical computer science, the most common answers are related to initial algebras. Indeed, the dominant trend in abstract data types is initial algebra semantics (see e.g. [19]), and this approach has spread to other semantically inclined areas of the subject. The approach in broad slogans is that, for an endofunctor F describing the type of algebraic operations of interest, the initial algebra μF has the property that for every F-algebra A, there is a unique homomorphism μF → A, and this is recursion.

Perhaps the primary example is recursion on N, the natural numbers. Recall that N is the initial algebra for the set functor FX = X + 1. If A is any set, and a ∈ A and α : A → A are given, then initiality tells us that there is a unique f : N → A such that for all n ∈ N,

    f(0) = a,    f(n + 1) = α(f(n)).    (1.1)

A full version of this paper including full proof details is available on arXiv [5]. Supported by the Grant Agency of the Czech Republic under grant 19-00902S. Supported by Deutsche Forschungsgemeinschaft (DFG) under project MI 717/5-2. Supported by grant #586136 from the Simons Foundation.

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 17-36, 2020.

J. Adámek et al.

Then the first additional problem coming with this approach is that of how to "recognize" initial algebras: given an algebra, how do we really know if it is initial?
The answer, again in slogans, is that initial algebras are the ones with "no junk and no confusion." Although initiality captures some important aspects of recursion, it cannot be a fully satisfactory approach. One big missing piece concerns recursive definitions based on well-founded relations. For example, the whole study of termination of rewriting systems depends on well-orders, the primary example of recursion on a well-founded order. Let (X, R) be a well-founded relation, i.e., one with no infinite sequences ··· x_2 R x_1 R x_0. Let A be any set, and let α : PA → A. (Here and below, P is the power-set functor, taking a set to the set of its subsets.) Then there is a unique f : X → A such that for all x ∈ X,

    f(x) = α({f(y) : y R x}).    (1.2)

The main goal of this paper is the study of concepts that allow one to extend the algebraic spirit behind initiality in (1.1) to the setting of recursion arising from well-foundedness as we find it in (1.2). The corresponding concepts are those of well-founded and recursive coalgebras for an endofunctor, which first appear in work by Osius [22] and Taylor [23, 24], respectively.

In his work on categorical set theory, Osius [22] first studied the notions of well-founded and recursive coalgebras (for the power-set functor on sets and, more generally, the power-object functor on an elementary topos). He defined recursive coalgebras as those coalgebras α : A → PA which have a unique coalgebra-to-algebra homomorphism into every algebra (see Definition 3.2). Taylor [23, 24] took Osius' ideas much further. He introduced well-founded coalgebras for a general endofunctor, capturing the notion of a well-founded relation categorically, and considered recursive coalgebras under the name 'coalgebras obeying the recursion scheme'. He then proved the General Recursion Theorem that all well-founded coalgebras are recursive, for every endofunctor on sets (and on more general categories) preserving inverse images.
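Both recursion schemes are easy to sketch concretely. The code below is illustrative only: fold_nat realizes the unique map out of the initial algebra as in (1.1), and wf_rec realizes recursion on a well-founded relation as in (1.2), with the predecessor relation on the natural numbers as a toy example:

```python
def fold_nat(a, alpha, n):
    """Eq. (1.1): the unique f : N -> A with f(0) = a, f(n+1) = alpha(f(n))."""
    x = a
    for _ in range(n):
        x = alpha(x)
    return x

def wf_rec(pred, alpha, x):
    """Eq. (1.2): the unique f with f(x) = alpha({f(y) : y R x}), where
    pred(x) = {y : y R x}. Termination relies on R being well-founded."""
    return alpha(frozenset(wf_rec(pred, alpha, y) for y in pred(x)))

# Toy instance of (1.2): y R x iff y = x - 1 on N, and
# alpha(S) = 1 + max(S) (with max of the empty set taken as -1),
# so that f is the identity on natural numbers.
height = wf_rec(lambda x: set() if x == 0 else {x - 1},
                lambda s: 1 + (max(s) if s else -1), 3)
```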
Recursive coalgebras were also investigated by Eppendahl [12], who called them algebra-initial coalgebras. Capretta, Uustalu, and Vene [10] further studied recursive coalgebras, and they showed how to construct new ones from given ones by using comonads. They also explained nicely how recursive coalgebras allow for the semantic treatment of (functional) divide-and-conquer programs. More recently, Jeannin et al. [15] proved the General Recursion Theorem for polynomial functors on the category of many-sorted sets; they also provide many interesting examples of recursive coalgebras arising in programming. Our contributions in this paper are as follows. We start by recalling some preliminaries in Section 2 and the definition of (parametrically) recursive coalgebras in Section 3 and of well-founded coalgebras in Section 4 (using a formulation based on Jacobs' next time operator [14], which we extend from Kripke polynomial set functors to arbitrary functors). We show that every coalgebra for a monomorphism-preserving functor on a complete and well-powered category has a well-founded part, and provide a new proof that this is the coreflection in the category of well-founded coalgebras (Proposition 4.19), shortening our previous proof [6]. Next we provide a new proof of Taylor's General Recursion Theorem (Theorem 5.1), generalizing this to endofunctors preserving monomorphisms on a complete and well-powered category having smooth monomorphisms (see Definition 2.8). For the category of sets, this implies that "well-founded ⇒ recursive" holds for all endofunctors, strengthening Taylor's result. We then discuss the converse: is every recursive coalgebra well-founded? Here the assumption that F preserves inverse images cannot be lifted, and one needs additional assumptions. In fact, we present two results: one assumes universally smooth monomorphisms and that the functor has a pre-fixed point (see Theorem 5.5).
Under these assumptions we also give a new equivalent characterization of recursiveness and well-foundedness: a coalgebra is recursive iff it has a coalgebra-to-algebra morphism into the initial algebra (which exists under our assumptions), see Corollary 5.6. This characterization was previously established for finitary functors on sets [3]. The other result on the converse implication is due to Taylor and uses the concept of a subobject classifier (Theorem 5.8). It implies that 'recursive' and 'well-founded' are equivalent concepts for all set functors preserving inverse images. We also prove that a similar result holds for the category of vector spaces over a fixed field (Theorem 5.12). Finally, we show in Section 6 that well-founded coalgebras are closed under coproducts, quotients and, under mild assumptions, under subcoalgebras.

2 Preliminaries

We start by recalling some background material. Except for the definitions of algebra and coalgebra in Subsection 2.1, the subsections below may be read as needed. We assume that readers are familiar with notions of basic category theory; see e.g. [2] for everything which we do not detail. We indicate monomorphisms by writing ↣ and strong epimorphisms by ↠.

2.1 Algebras and Coalgebras. We are concerned throughout this paper with algebras and coalgebras for an endofunctor. This means that we have an underlying category, usually written A; frequently it is the category of sets or of vector spaces over a fixed field, and that a functor F : A → A is given. An F-algebra is a pair (A, α), where α : FA → A. An F-coalgebra is a pair (A, α), where α : A → FA. We usually drop the functor F. Given two algebras (A, α) and (B, β), an algebra homomorphism from the first to the second is h : A → B in A such that h · α = β · Fh. Similarly, a coalgebra homomorphism satisfies β · h = Fh · α. We denote by Coalg F the category of all coalgebras for F.

Example 2.1.
(1) The power set functor P : Set → Set takes a set X to the set PX of all subsets of it; for a morphism f : X → Y, Pf : PX → PY takes a subset S ⊆ X to its direct image f[S]. Coalgebras α : X → PX may be identified with directed graphs on the set X of vertices, and the coalgebra structure α describes the edges: b ∈ α(a) means that there is an edge a → b in the graph.

(2) Let Σ be a signature, i.e. a set of operation symbols, each with a finite arity. The polynomial functor H_Σ associated to Σ assigns to a set X the set

H_Σ X = ∐_{n∈N} Σ_n × X^n,

where Σ_n is the set of operation symbols of arity n. This may be identified with the set of all terms σ(x_1, …, x_n), for σ ∈ Σ_n and x_1, …, x_n ∈ X. Algebras for H_Σ are the usual Σ-algebras.

(3) Deterministic automata over an input alphabet Σ are coalgebras for the functor FX = {0, 1} × X^Σ. Indeed, given a set S of states, a next-state map S × Σ → S may be curried to δ : S → S^Σ. The set of final states yields the acceptance predicate a : S → {0, 1}. So an automaton may be regarded as a coalgebra ⟨a, δ⟩ : S → {0, 1} × S^Σ.

(4) Labelled transition systems are coalgebras for FX = P(Σ × X).

(5) To describe linear weighted automata, i.e. weighted automata over the input alphabet Σ with weights in a field K, as coalgebras, one works with the category Vec_K of vector spaces over K. A linear weighted automaton is then a coalgebra for FX = K × X^Σ.

2.2 Preservation Properties. Recall that an intersection of two subobjects s_i : S_i ↣ A (i = 1, 2) of a given object A is given by their pullback. Analogously, (general) intersections are given by wide pullbacks. Furthermore, the inverse image of a subobject s : S ↣ B under a morphism f : A → B is the subobject t : T ↣ A obtained by a pullback of s along f. All of the 'usual' set functors preserve intersections and inverse images:

Example 2.2. (1) Every polynomial functor preserves intersections and inverse images.
(2) The power-set functor P preserves intersections and inverse images.

(3) Intersection-preserving set functors are closed under taking coproducts, products and composition. Similarly for inverse images.

(4) Consider next the set functor R defined by RX = {(x, y) ∈ X × X : x ≠ y} + {d} for sets X. For a function f : X → Y put Rf(x, y) = (f(x), f(y)) if f(x) ≠ f(y), and d otherwise. R preserves intersections but not inverse images.

Proposition 2.3 [27]. For every set functor F there exists an essentially unique set functor F̄ which coincides with F on nonempty sets and functions and preserves finite intersections (whence monomorphisms).

Remark 2.4. (1) In fact, Trnková gave a construction of F̄: she defined F̄∅ as the set of all natural transformations C_01 → F, where C_01 is the set functor with C_01 ∅ = ∅ and C_01 X = 1 for all nonempty sets X. For the empty map e : ∅ → X with X ≠ ∅, F̄e maps a natural transformation τ : C_01 → F to the element given by τ_X : 1 → FX.

(2) The above functor F̄ is called the Trnková hull of F. It allows us to achieve preservation of intersections for all finitary set functors. Intuitively, a functor on sets is finitary if its behavior is completely determined by its action on finite sets and functions. For a general functor, this intuition is captured by requiring that the functor preserves filtered colimits [8]. For a set functor F this is equivalent to being finitely bounded, which is the following condition: for each element x ∈ FX there exists a finite subset M ⊆ X such that x ∈ Fi[FM], where i : M ↪ X is the inclusion map [7, Rem. 3.14].

Proposition 2.5 [4, p. 66]. The Trnková hull of a finitary set functor preserves all intersections.

2.3 Factorizations.
Recall that an epimorphism e : A → B is called strong if it satisfies the following diagonal fill-in property: given a monomorphism m : C ↣ D and morphisms f : A → C and g : B → D such that m · f = g · e, then there exists a unique d : B → C such that f = d · e and g = m · d. Every complete and well-powered category has factorizations of morphisms: every morphism f may be written as f = m · e, where e is a strong epimorphism and m is a monomorphism [9, Prop. 4.4.3]. We call the subobject m the image of f. It follows from a result in Kurz' thesis [16, Prop. 1.3.6] that factorizations of morphisms lift to coalgebras:

Proposition 2.6 (Coalg F inherits factorizations from A). Suppose that F preserves monomorphisms. Then the category Coalg F has factorizations of homomorphisms f as f = m · e, where e is carried by a strong epimorphism and m by a monomorphism in A. The diagonal fill-in property holds in Coalg F.

Remark 2.7. By a subcoalgebra of a coalgebra (A, α) we mean a subobject in Coalg F represented by a homomorphism m : (B, β) ↣ (A, α), where m is monic in A. Similarly, by a strong quotient of a coalgebra (A, α) we mean one represented by a homomorphism e : (A, α) ↠ (C, γ) with e strongly epic in A.

2.4 Chains. By a transfinite chain in a category A we understand a functor from the ordered class Ord of all ordinals into A. Moreover, for an ordinal λ, a λ-chain in A is a functor from λ to A. A category has colimits of chains if for every ordinal λ it has a colimit of every λ-chain. This includes the initial object 0 (the case λ = 0).

Definition 2.8. (1) A category A has smooth monomorphisms if for every λ-chain C of monomorphisms a colimit exists, its colimit cocone is formed by monomorphisms, and for every cone of C formed by monomorphisms, the factorizing morphism from colim C is monic. In particular, every morphism from 0 is monic.
(2) A has universally smooth monomorphisms if A also has pullbacks, and for every morphism f : X → colim C, the functor A/colim C → A/X forming pullbacks along f preserves the colimit of C. This implies that the initial object 0 is strict, i.e. every morphism f : X → 0 is an isomorphism. Indeed, consider the empty chain (λ = 0).

Example 2.9. (1) Set has universally smooth monomorphisms.

(2) Vec has smooth monomorphisms, but not universally so, because the initial object is not strict.

(3) Categories in which colimits of chains and pullbacks are formed "set-like" have universally smooth monomorphisms. These include the categories of posets, graphs, topological spaces, presheaf categories, and many varieties, such as monoids, groups, and unary algebras.

(4) Every locally finitely presentable category A with a strict initial object (see Remark 2.12(1)) has smooth monomorphisms. This follows from [8, Prop. 1.62]. Moreover, since pullbacks commute with colimits of chains, it is easy to prove that colimits of chains are universal using the strictness of 0.

(5) The category CPO of complete partial orders does not have smooth monomorphisms. Indeed, consider the ω-chain of linearly ordered sets A_n = {0, …, n} + {⊤} (⊤ a top element) with inclusion maps A_n → A_{n+1}. Its colimit is the linearly ordered set N + {⊤′, ⊤} of natural numbers with two added top elements. For the sub-cpo N + {⊤}, the inclusions of A_n are monic and form a cocone. But the unique factorizing morphism from the colimit is not monic.

Notation 2.10. For every object A we denote by Sub(A) the poset of all subobjects of A (represented by monomorphisms s : S ↣ A), where s ≤ s′ if there exists i with s = s′ · i. If A has pullbacks we have, for every morphism f : A → B, the inverse image operator, viz. the monotone map f⁻¹ : Sub(B) → Sub(A) assigning to a subobject s : S ↣ B the subobject of A obtained by forming the inverse image of s under f, i.e. the pullback of s along f.

Lemma 2.11.
If A is complete and well-powered, then f⁻¹ has a left adjoint given by the (direct) image operator f→ : Sub(A) → Sub(B). It maps a subobject t : T ↣ A to the subobject of B given by the image of f · t; in symbols we have f→(t) ≤ s iff t ≤ f⁻¹(s).

Remark 2.12. If A is a complete and well-powered category, then Sub(A) is a complete lattice. Now suppose that A has smooth monomorphisms. (1) In this setting, the unique morphism ⊥_A : 0 → A is a monomorphism and therefore is the bottom element of the poset Sub(A). (2) Furthermore, a join of a chain in Sub(A) is obtained by forming a colimit, in the obvious way. (3) If A has universally smooth monomorphisms, then for every morphism f : A → B, the operator f⁻¹ : Sub(B) → Sub(A) preserves unions of chains.

Remark 2.13. Recall [1] that every endofunctor F yields the initial-algebra chain, viz. a transfinite chain formed by the objects F^i 0 of A, as follows: F^0 0 = 0, the initial object; F^{i+1} 0 = F(F^i 0); and for a limit ordinal i we take the colimit of the chain (F^j 0)_{j<i}. The connecting morphisms w_{j,i} : F^j 0 → F^i 0 are defined by a similar transfinite recursion.

3 Recursive Coalgebras

Assumption 3.1. We work with a standard set theory (e.g. Zermelo-Fraenkel), assuming the Axiom of Choice. In particular, we use transfinite induction on several occasions. (We are not concerned with constructive foundations in this paper.) Throughout this paper we assume that A is a complete and well-powered category and that F : A → A preserves monomorphisms. For A = Set the condition that F preserves monomorphisms may be dropped. In fact, preservation of non-empty monomorphisms is sufficient in general (for a suitable notion of non-empty monomorphism) [21, Lemma 2.5], and this holds for every set functor.

The following definition of recursive coalgebras was first given by Osius [22]. Taylor [24] speaks of coalgebras obeying the recursion scheme. Capretta et al. [10] extended the concept to parametrically recursive coalgebras by dualizing completely iterative algebras [20].

Definition 3.2. A coalgebra α : A → FA is called recursive if for every algebra e : FX → X there exists a unique coalgebra-to-algebra morphism e† : A → X, i.e. a unique morphism such that e† = e · Fe† · α. (A, α) is called parametrically recursive if for every morphism e : FX × A → X there is a unique morphism e† : A → X such that e† = e · (Fe† × id_A) · ⟨α, id_A⟩.

Example 3.3. (1) A graph regarded as a coalgebra for P is recursive iff it has no infinite path. This is an immediate consequence of the General Recursion Theorem (see Corollary 5.6 and Example 4.5(2)).

(2) Let ι : F(μF) → μF be an initial algebra. By Lambek's Lemma, ι is an isomorphism. So we have a coalgebra ι⁻¹ : μF → F(μF). This coalgebra is (parametrically) recursive. By [20, Thm. 2.8], in dual form, this is precisely the same as the terminal parametrically recursive coalgebra (see also [10, Prop. 7]).

(3) The initial coalgebra 0 → F0 is recursive.

(4) If (C, γ) is recursive, so is (FC, Fγ), see [10, Prop. 6].

(5) Colimits of recursive coalgebras in Coalg F are recursive. This is easy to prove, using that colimits of coalgebras are formed on the level of the underlying category.

(6) It follows from items (3)–(5) that in the initial-algebra chain from Remark 2.13 all coalgebras w_{i,i+1} : F^i 0 → F^{i+1} 0, i ∈ Ord, are recursive.

(7) Every parametrically recursive coalgebra is recursive. (To see this, form for a given e : FX → X the morphism ē = e · π, where π : FX × A → FX is the projection.) In Corollaries 5.6 and 5.9 we will see that the converse often holds. Here is an example where the converse fails [3]. Let R : Set → Set be the functor defined in Example 2.2(4). Also, let C = {0, 1}, and define γ : C → RC by γ(0) = γ(1) = (0, 1). Then (C, γ) is a recursive coalgebra.
Indeed, for every algebra α : RA → A, the constant map h : C → A with h(0) = h(1) = α(d) is the unique coalgebra-to-algebra morphism. However, (C, γ) is not parametrically recursive. To see this, consider any morphism e : RX × {0, 1} → X such that RX contains more than one pair (x_0, x_1), x_0 ≠ x_1, with e((x_0, x_1), i) = x_i for i = 0, 1. Then each such pair yields h : C → X with h(i) = x_i making the appropriate square commutative. Thus, (C, γ) is not parametrically recursive.

(8) Capretta et al. [11] showed that recursivity semantically models divide-and-conquer programs, as demonstrated by the example of Quicksort. For every linearly ordered set A (of data elements), Quicksort is usually defined as the recursive function q : A* → A* given by

q(ε) = ε   and   q(aw) = q(w_{≤a}) ⧺ (a q(w_{>a})),

where A* is the set of all lists on A, ε is the empty list, ⧺ is the concatenation of lists and w_{≤a} denotes the list of those elements of w which are less than or equal to a; analogously for w_{>a}.

Now consider the functor FX = 1 + A × X × X on Set, where 1 = {•}, and form the coalgebra s : A* → 1 + A × A* × A* given by

s(ε) = •   and   s(aw) = (a, w_{≤a}, w_{>a})   for a ∈ A and w ∈ A*.

We shall see that this coalgebra is recursive in Example 5.3. Thus, for the F-algebra m : 1 + A × A* × A* → A* given by m(•) = ε and m(a, w, v) = w ⧺ (av), there exists a unique function q on A* such that q = m · Fq · s. Notice that the last equation reflects the idea that Quicksort is a divide-and-conquer algorithm. The coalgebra structure s divides a list into the two parts w_{≤a} and w_{>a}. Then Fq sorts these two smaller lists, and finally, in the combine- (or conquer-) step, the algebra structure m merges the two sorted parts to obtain the desired whole sorted list. Jeannin et al. [15, Sec. 4] provide a number of recursive functions arising in programming that are determined by recursivity of a coalgebra, e.g. the gcd of integers, the Ackermann function, and the Towers of Hanoi.
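The Quicksort example of 3.3(8) can be run directly; here is a minimal Python sketch (our own encoding, not from the paper; the names s, m, q follow the example, with None playing the role of •).

```python
# Quicksort as the unique coalgebra-to-algebra morphism q = m . Fq . s
# for FX = 1 + A x X x X, following Example 3.3(8).

def s(w):
    """Coalgebra s : A* -> 1 + A x A* x A*: the divide step."""
    if not w:
        return None                              # s(eps) = •
    a, rest = w[0], w[1:]
    return (a, [x for x in rest if x <= a],      # w_{<=a}
               [x for x in rest if x > a])       # w_{>a}

def m(t):
    """Algebra m : 1 + A x A* x A* -> A*: the conquer (merge) step."""
    if t is None:
        return []                                # m(•) = eps
    a, w, v = t
    return w + [a] + v                           # m(a, w, v) = w ++ (a v)

def fmap(g, t):
    """Action of F on morphisms: apply g to both sublists."""
    return t if t is None else (t[0], g(t[1]), g(t[2]))

def q(w):
    """q = m . Fq . s; terminates because the coalgebra (A*, s) is well-founded."""
    return m(fmap(q, s(w)))
```

The three functions make the divide-and-conquer shape explicit: all recursion lives in the single equation q = m · Fq · s, whose unique solvability is exactly recursivity of (A*, s).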
4 The Next Time Operator and Well-Founded Coalgebras

As we have mentioned in the Introduction, the main issue of this paper is the relationship between two concepts pertaining to coalgebras: recursiveness and well-foundedness. The concept of well-foundedness is well-known for directed graphs (G, →): it means that there are no infinite directed paths g_0 → g_1 → ···. For a set X with a relation R, well-foundedness means that there are no backwards sequences ··· x_2 R x_1 R x_0, i.e. the converse of the relation is well-founded as a graph. Taylor [24, Def. 6.2.3] gave a more general category-theoretic formulation of well-foundedness. We observe here that his definition can be presented in a compact way, by using an operator that generalizes the way one thinks of the semantics of the 'next time' operator of temporal logics for non-deterministic (or even probabilistic) automata and transition systems. It is also strongly related to the algebraic semantics of modal logic, where one passes from a graph G to a function on PG. Jacobs [14] defined and studied the 'next time' operator on coalgebras for Kripke polynomial set functors. This can be generalized to arbitrary functors as follows. Recall that Sub(A) denotes the complete lattice of subobjects of A.

Definition 4.1 [4, Def. 8.9]. Every coalgebra α : A → FA induces an endofunction on Sub(A), called the next time operator,

○ : Sub(A) → Sub(A),   ○(s) = α⁻¹(Fs)   for s ∈ Sub(A).

In more detail: we define ○s : ○S ↣ A and α(s) : ○S → FS by the pullback (4.1) of Fs along α. In words, ○ assigns to each subobject s : S ↣ A the inverse image of Fs under α. Since Fs is a monomorphism, ○s is a monomorphism and ○(s) is (for every representative s of that subobject of A) uniquely determined.

Example 4.2. (1) Let A be a graph, considered as a coalgebra for P : Set → Set.
If S ⊆ A is a set of vertices, then ○S is the set of vertices all of whose successors belong to S.

(2) For the set functor FX = P(Σ × X) expressing labelled transition systems, the operator ○ for a coalgebra α : A → P(Σ × A) is the semantic counterpart of the next time operator of classical linear temporal logic, see e.g. Manna and Pnueli [18]. In fact, for a subset S ↪ A we have that ○S consists of those states all of whose next states lie in S; in symbols:

○S = { x ∈ A | (s, y) ∈ α(x) implies y ∈ S, for all s ∈ Σ }.

The next time operator allows a compact definition of well-foundedness as characterized by Taylor [24, Exercise VI.17] (see also [6, Corollary 2.19]):

Definition 4.3. A coalgebra is well-founded if id_A is the only fixed point of its next time operator.

Remark 4.4. (1) Let us call a subcoalgebra m : (B, β) ↣ (A, α) cartesian provided that the square (4.2) formed by m, β, α and Fm is a pullback. Then (A, α) is well-founded iff it has no proper cartesian subcoalgebra. That is, if m : (B, β) ↣ (A, α) is a cartesian subcoalgebra, then m is an isomorphism. Indeed, the fixed points of next time are precisely the cartesian subcoalgebras.

(2) A coalgebra is well-founded iff ○ has a unique pre-fixed point (i.e. a unique subobject m with ○m ≤ m). Indeed, since Sub(A) is a complete lattice, the least fixed point of a monotone map is its least pre-fixed point. Taylor's definition [24, Def. 6.3.2] uses that property: he calls a coalgebra well-founded iff ○ has no proper subobject as a pre-fixed point.

Example 4.5. (1) Consider a graph as a coalgebra α : A → PA for the power-set functor (see Example 2.1). A subcoalgebra is a subset m : B ↪ A such that with every vertex v it contains all neighbors of v. The coalgebra structure β : B → PB is then the domain-codomain restriction of α. To say that B is a cartesian subcoalgebra means that whenever a vertex of A has all neighbors in B, it also lies in B. It follows that (A, α) is well-founded iff it has no infinite directed path, see [24, Example 6.3.3].
(2) If μF exists, then as a coalgebra (μF, ι⁻¹) it is well-founded. Indeed, in every pullback (4.2), since ι⁻¹ (as α) is invertible, so is β. The unique algebra homomorphism from μF to the algebra β⁻¹ : FB → B is clearly inverse to m.

(3) If a set functor F fulfils F∅ = ∅, then the only well-founded coalgebra is the empty one. Indeed, this follows from the fact that the empty coalgebra is a fixed point of ○. For example, a deterministic automaton over the input alphabet Σ, as a coalgebra for FX = {0, 1} × X^Σ, is well-founded iff it is empty.

(4) A non-deterministic automaton may be considered as a coalgebra for the set functor FX = {0, 1} × (PX)^Σ. It is well-founded iff the state transition graph is well-founded (i.e. has no infinite path). This follows from Corollary 4.10 below.

(5) A linear weighted automaton, i.e. a coalgebra for FX = K × X^Σ on Vec_K, is well-founded iff every path in its state transition graph eventually leads to 0. This means that every path starting in a given state leads to the state 0 after finitely many steps (where it stays).

Notation 4.6. Given a set functor F, we define for every set X the map τ_X : FX → PX assigning to every element x ∈ FX the intersection of all subsets m : M ↪ X such that x lies in the image of Fm:

τ_X(x) = ⋂ { M | m : M ↪ X satisfies x ∈ Fm[FM] }.   (4.3)

Recall that a functor preserves intersections if it preserves (wide) pullbacks of families of monomorphisms. Gumm [13, Thm. 7.3] observed that for a set functor preserving intersections, the maps τ_X : FX → PX in (4.3) form a "subnatural" transformation from F to the power-set functor P. Subnaturality means that (although these maps do not form a natural transformation in general) for every monomorphism i : X → Y we have a commutative square

Pi · τ_X = τ_Y · Fi.   (4.4)

Remark 4.7. As shown in [13, Thm. 7.4] and [23, Prop. 7.5], a set functor F preserves intersections iff the squares in (4.4) above are pullbacks. Moreover, loc. cit. and [13, Thm. 8.1] prove that τ : F → P is a natural transformation, provided F preserves inverse images and intersections.

Definition 4.8. Let F be a set functor. For every coalgebra α : A → FA its canonical graph is the following coalgebra for P: the composite τ_A · α : A → FA → PA.

Thanks to the subnaturality of τ one obtains the following results.

Proposition 4.9. For every set functor F preserving intersections, the next time operator of a coalgebra (A, α) coincides with that of its canonical graph.

Corollary 4.10 [24, Rem. 6.3.4]. A coalgebra for a set functor preserving intersections is well-founded iff its canonical graph is well-founded.

Example 4.11. (1) For a (deterministic or non-deterministic) automaton, the canonical graph has an edge from s to t iff there is a transition from s to t for some input letter. Thus, we obtain the characterization of well-foundedness as stated in Example 4.5(3) and (4).

(2) Every polynomial functor H_Σ : Set → Set preserves intersections. Thus, a coalgebra (A, α) is well-founded iff there are no infinite paths in its canonical graph. The canonical graph of A has an edge from a to b iff α(a) is of the form σ(c_1, …, c_n) for some σ ∈ Σ_n and b is one of the c_i's.

(3) Thus, for the functor FX = 1 + A × X × X, the coalgebra (A*, s) of Example 3.3(8) is easily seen to be well-founded via its canonical graph. Indeed, this graph has, for every nonempty list aw, one outgoing edge to the list w_{≤a} and one to w_{>a}. Hence, this is a well-founded graph.

Lemma 4.12. The next time operator is monotone: if m ≤ n, then ○m ≤ ○n.

Lemma 4.13. Let α : A → FA be a coalgebra and m : B ↣ A a subobject. (1) There is a coalgebra structure β : B → FB for which m gives a subcoalgebra of (A, α) iff m ≤ ○m. (2) There is a coalgebra structure β : B → FB for which m gives a cartesian subcoalgebra of (A, α) iff m = ○m.

Lemma 4.14.
For every coalgebra homomorphism f : (B, β) → (A, α) we have

○_β · f⁻¹ ≤ f⁻¹ · ○_α,

where ○_α and ○_β denote the next time operators of the coalgebras (A, α) and (B, β), respectively, and ≤ is the pointwise order.

Corollary 4.15. For every coalgebra homomorphism f : (B, β) → (A, α) we have ○_β · f⁻¹ = f⁻¹ · ○_α, provided that either (1) f is a monomorphism in A and F preserves finite intersections, or (2) F preserves inverse images.

Definition 4.16 [4]. The well-founded part of a coalgebra is its largest well-founded subcoalgebra.

The well-founded part of a coalgebra always exists and is the coreflection in the category of well-founded coalgebras [6, Prop. 2.27]. We provide a new, shorter proof of this fact. The well-founded part is obtained by the following:

Construction 4.17 [6, Not. 2.22]. Let α : A → FA be a coalgebra. We know that Sub(A) is a complete lattice and that the next time operator ○ is monotone (see Lemma 4.12). Hence, by the Knaster-Tarski fixed point theorem, ○ has a least fixed point, which we denote by a* : A* ↣ A. By Lemma 4.13(2), we know that there is a coalgebra structure α* : A* → FA* so that a* : (A*, α*) ↣ (A, α) is the smallest cartesian subcoalgebra of (A, α).

Proposition 4.18. For every coalgebra (A, α), the coalgebra (A*, α*) is well-founded.

Proof. Let m : (B, β) ↣ (A*, α*) be a cartesian subcoalgebra. By Lemma 4.13, a* · m : B ↣ A is a fixed point of ○. Since a* is the least fixed point, we have a* ≤ a* · m, i.e. a* = a* · m · x for some x : A* ↣ B. Since a* is monic, we thus have m · x = id. So m is a monomorphism and a split epimorphism, whence an isomorphism.

Proposition 4.19. The full subcategory of Coalg F given by well-founded coalgebras is coreflective. In fact, the well-founded coreflection of a coalgebra (A, α) is its well-founded part a* : (A*, α*) ↣ (A, α).

Proof.
We are to prove that for every coalgebra homomorphism f : (B, β) → (A, α), where (B, β) is well-founded, there exists a coalgebra homomorphism f′ : (B, β) → (A*, α*) such that a* · f′ = f. The uniqueness is easy. For the existence of f′, we first observe that f⁻¹(a*) is a pre-fixed point of ○_β: indeed, using Lemma 4.14 we have ○_β(f⁻¹(a*)) ≤ f⁻¹(○_α(a*)) = f⁻¹(a*). By Remark 4.4(2), we therefore have id_B = b* ≤ f⁻¹(a*) in Sub(B). Using the adjunction of Lemma 2.11, we have f→(id_B) ≤ a* in Sub(A). Now factorize f as B —e↠ C —m↣ A. We have f→(id_B) = m, and we then obtain m = f→(id_B) ≤ a*, i.e. there exists a morphism h : C ↣ A* such that a* · h = m. Thus, f′ = h · e : B → A* is a morphism satisfying a* · f′ = a* · h · e = m · e = f. It follows that f′ is a coalgebra homomorphism from (B, β) to (A*, α*) since f and a* are and F preserves monomorphisms.

Construction 4.20 [6, Not. 2.22]. Let (A, α) be a coalgebra. We obtain a*, the least fixed point of ○, as the join of the following transfinite chain of subobjects a_i : A_i ↣ A, i ∈ Ord. First, put a_0 = ⊥_A, the least subobject of A. Given a_i : A_i ↣ A, put a_{i+1} = ○a_i : A_{i+1} = ○A_i ↣ A. For every limit ordinal j, put a_j = ⋁_{i<j} a_i. Since Sub(A) is a set, there exists an ordinal i such that a* = a_i : A_i ↣ A.

Remark 4.21. Note that, whenever monomorphisms are smooth, we have A_0 = 0 and the above join a_j is obtained as the colimit of the chain of the subobjects a_i : A_i ↣ A, i < j (see Remark 2.12). If F is a finitary functor on a locally finitely presentable category, then the least ordinal i with a* = a_i is at most ω, but in general one needs transfinite iteration to reach a fixed point.

Example 4.22. Let (A, α) be a graph regarded as a coalgebra for P (see Example 2.1). Then A_0 = ∅, A_1 is formed by all leaves, i.e. those nodes with no neighbors, A_2 by all leaves and all nodes such that every neighbor is a leaf, etc.
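For a finite graph, this iteration can be run directly; here is a small Python sketch (our own encoding, not from the paper: a graph coalgebra α : A → PA is a dict of successor sets).

```python
# Construction 4.20 / Example 4.22 for a finite graph: the well-founded
# part is the least fixed point of the next time operator, reached by
# iterating from the empty subobject a_0.

def next_time(alpha, S):
    """○S: the set of nodes all of whose successors lie in S."""
    return {x for x in alpha if alpha[x] <= S}

def well_founded_part(alpha):
    """Iterate A_0 = ∅, A_{i+1} = ○A_i until the least fixed point is hit;
    for a finite graph at most |A| steps are needed (no transfinite stages)."""
    S = set()
    while (T := next_time(alpha, S)) != S:
        S = T
    return S

# Nodes 0 and 1 start no infinite path; node 2 has a self-loop.
graph = {0: {1}, 1: set(), 2: {2}}
```

Running well_founded_part on the sample graph keeps exactly the nodes from which no infinite path starts, matching the description of A* in this example.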
We see that a node x lies in A_{i+1} iff every path starting in x has length at most i. Hence A* = ⋃_i A_i is the set of all nodes from which no infinite paths start. We close with a general fact on well-founded parts of fixed points (i.e. (co)algebras whose structure is invertible). The following result generalizes [15, Cor. 3.4], and it also appeared before for functors preserving finite intersections [4, Theorem 8.16 and Remark 8.18]. Here we lift the latter assumption (see [5, Theorem 7.6] for the new proof):

Theorem 4.23. Let A be a complete and well-powered category with smooth monomorphisms. For F preserving monomorphisms, the well-founded part of every fixed point is an initial algebra. In particular, the only well-founded fixed point is the initial algebra.

Example 4.24. We illustrate that for a set functor F preserving monomorphisms, the well-founded part of the terminal coalgebra is the initial algebra. Consider FX = A × X + 1. The terminal coalgebra is the set A* ∪ A^∞ of finite and infinite sequences from the set A. The initial algebra is A*. It is easy to check that A* is the well-founded part of A* ∪ A^∞.

5 The General Recursion Theorem and its Converse

The main consequence of well-foundedness is parametric recursivity. This is Taylor's General Recursion Theorem [24, Theorem 6.3.13]. Taylor assumed that F preserves inverse images. We present a new proof for which it is sufficient that F preserves monomorphisms, assuming those are smooth.

Theorem 5.1 (General Recursion Theorem). Let A be a complete and well-powered category with smooth monomorphisms. For F : A → A preserving monomorphisms, every well-founded coalgebra is parametrically recursive.

Proof sketch. (1) Let (A, α) be well-founded. We first prove that it is recursive. We use the subobjects a_i : A_i ↣ A of Construction 4.20, the corresponding

Footnote: One might object to this use of transfinite recursion, since Theorem 5.1 itself could be used as a justification for transfinite recursion.
Let us emphasize that we are not presenting Theorem 5.1 as a foundational contribution. We are building on the classical theory of transfinite recursion.

morphisms α(a_i) : A_{i+1} = ○A_i → FA_i (cf. Definition 4.1), and the recursive coalgebras (F^i 0, w_{i,i+1}) of Example 3.3(6). We obtain a natural transformation (h_i) from the chain (A_i) in Construction 4.20 to the initial-algebra chain (F^i 0) (see Remark 2.13) by transfinite recursion. Now for every algebra e : FX → X, we obtain unique coalgebra-to-algebra morphisms f_i : F^i 0 → X, i.e. we have that f_i = e · Ff_i · w_{i,i+1}. Since (A, α) is well-founded, we know that a_i = id_A, whence α = α(a_i), for some i. From this it is not difficult to prove that f_i · h_i is a coalgebra-to-algebra morphism from (A, α) to (X, e). In order to prove uniqueness, we prove by transfinite induction that for any given coalgebra-to-algebra homomorphism e†, one has e† · a_j = f_j · h_j for every ordinal number j. Then for the above ordinal number i with a_i = id_A, we have e† = f_i · h_i, as desired. This shows that (A, α) is recursive.

(2) We prove that (A, α) is parametrically recursive. Consider the coalgebra ⟨α, id_A⟩ : A → FA × A for F(−) × A. This functor preserves monomorphisms since F does and monomorphisms are closed under products. The next time operator ○ on Sub(A) is the same for both coalgebras, since the square (4.1) is a pullback if and only if the corresponding square for ⟨α, id_A⟩ (with ⟨α(m), m⟩ : ○S → FS × A on top and Fm × A on the right) is one. Since id_A is the unique fixed point of ○ w.r.t. F (see Definition 4.3), it is also the unique fixed point of ○ w.r.t. F(−) × A. Thus, (A, ⟨α, id_A⟩) is a well-founded coalgebra for F(−) × A. By the previous argument, this coalgebra is thus recursive for F(−) × A; equivalently, (A, α) is parametrically recursive for F.

Theorem 5.2. For every endofunctor on Set or Vec (vector spaces and linear maps), every well-founded coalgebra is parametrically recursive.

Proof sketch.
For Set, we apply Theorem 5.1 to the Trnková hull F̄ (see Proposition 2.3), noting that F and F̄ have the same (non-empty) coalgebras. Moreover, one can show that every well-founded (or recursive) F-coalgebra is a well-founded (recursive, resp.) F̄-coalgebra. For Vec, observe that monomorphisms split and are therefore preserved by every endofunctor F.

Example 5.3. We saw in Example 4.11(3) that for FX = 1 + A × X × X the coalgebra (A, s) from Example 3.3(8) is well-founded, and therefore it is (parametrically) recursive.

Example 5.4. Well-founded coalgebras need not be recursive when F does not preserve monomorphisms. We take A to be the category of sets with a predicate, i.e. pairs (X, A), where A ⊆ X. Morphisms f : (X, A) → (Y, B) satisfy f[A] ⊆ B. Denote by 1 the terminal object (1, 1). We define an endofunctor F by F(X, ∅) = (X + 1, ∅), and for A ≠ ∅, F(X, A) = 1. For a morphism f : (X, A) → (Y, B), put Ff = f + id if B = ∅; if B ≠ ∅, then Ff is the unique morphism into the terminal object 1 = F(Y, B) (in particular, if A ≠ ∅, then also B ≠ ∅ and Ff is id : 1 → 1).

On Well-Founded and Recursive Coalgebras 31

The terminal coalgebra is id : 1 → 1, and it is easy to see that it is well-founded. But it is not recursive: there are no coalgebra-to-algebra morphisms into an algebra of the form F(X, ∅) → (X, ∅).

We next prove a converse to Theorem 5.1: "recursive ⇒ well-founded". Related results appear in Taylor [23, 24], Adámek et al. [3] and Jeannin et al. [15]. Recall universally smooth monomorphisms from Definition 2.8(2). A pre-fixed point of F is a monic algebra α : FA ↣ A.

Theorem 5.5. Let A be a complete and well-powered category with universally smooth monomorphisms, and suppose that F : A → A preserves inverse images and has a pre-fixed point. Then every recursive coalgebra is well-founded.

Proof. (1) We first observe that an initial algebra exists. This follows from results by Trnková et al. [25], as we now briefly recall. Recall the initial-algebra chain from Remark 2.13. Let β : FB ↣ B be a pre-fixed point.
Then there is a unique cocone β_i : F^i 0 → B satisfying β_{i+1} = β · Fβ_i. Moreover, each β_i is monomorphic. Since B has only a set of subobjects, there is some λ such that for every i > λ, all of the morphisms β_i represent the same subobject of B. Consequently, w_{λ,λ+1} of Remark 2.13 is an isomorphism, due to β_λ = β_{λ+1} · w_{λ,λ+1}. Then μF = F^λ 0 with the structure ι = w_{λ,λ+1}^{−1} : F(μF) → μF is an initial algebra.

(2) Now suppose that (A, α) is a recursive coalgebra. Then there exists a unique coalgebra homomorphism h : (A, α) → (μF, ι^{−1}). Let us abbreviate w_{i,λ} by c_i : F^i 0 ↣ μF, and recall the subobjects a_i : A_i ↣ A from Construction 4.20. We will prove by transfinite induction that a_i is the inverse image of c_i under h; in symbols: a_i = h←(c_i) for all ordinals i. Then it follows that a_λ is an isomorphism, since so is c_λ, whence (A, α) is well-founded.

In the base case i = 0 this is clear since A_0 = W_0 = 0 is a strict initial object.

For the isolated step we compute the pullback of c_{i+1} : W_{i+1} → μF along h, using the diagram whose top row is A_{i+1} --α(a_i)--> FA_i --Fh_i--> FW_i = W_{i+1}, whose vertical arrows are a_{i+1}, Fa_i and Fc_i, and whose bottom row is A --α--> FA --Fh--> F(μF) --ι--> μF. By the induction hypothesis and since F preserves inverse images, the middle square of this diagram is a pullback. Since the structure map ι of the initial algebra is an isomorphism, it follows that the middle square pasted with the right-hand triangle is also a pullback. Finally, the left-hand square is a pullback by the definition of a_{i+1}. Thus, the outside of the diagram is a pullback, as required.

For a limit ordinal j, we know that a_j = ⋁_{i<j} a_i and similarly c_j = ⋁_{i<j} c_i, since W_j = colim_{i<j} W_i and monomorphisms are smooth (see Remark 2.12(2)). Using Remark 2.12(3) and the induction hypothesis we thus obtain h←(c_j) = h←(⋁_{i<j} c_i) = ⋁_{i<j} h←(c_i) = ⋁_{i<j} a_i = a_j.

Corollary 5.6. Let A and F satisfy the assumptions of Theorem 5.5.
Then the following properties of a coalgebra are equivalent:
(1) well-foundedness,
(2) parametric recursiveness,
(3) recursiveness,
(4) existence of a homomorphism into (μF, ι^{−1}),
(5) existence of a homomorphism into a well-founded coalgebra.

Proof sketch. We already know (1) ⇒ (2) ⇒ (3). Since F has an initial algebra (as proved in Theorem 5.5), the implication (3) ⇒ (4) follows from Example 3.3(2). In Theorem 5.5 we also proved (4) ⇒ (1). The implication (4) ⇒ (5) follows from Example 4.5(2). Finally, it follows from [6, Remark 2.40] that (μF, ι^{−1}) is a terminal well-founded coalgebra, whence (5) ⇒ (4).

Example 5.7. (1) The category of many-sorted sets satisfies the assumptions of Theorem 5.5, and polynomial endofunctors on that category preserve inverse images. Thus, we obtain Jeannin et al.'s result [15, Thm. 3.3] that (1)–(4) in Corollary 5.6 are equivalent as a special instance.

(2) The implication (4) ⇒ (3) in Corollary 5.6 does not hold for vector spaces. In fact, for the identity functor on Vec we have μId = (0, id). Hence, every coalgebra has a homomorphism into μId. However, not every coalgebra is recursive, e.g. the coalgebra (K, id) admits many coalgebra-to-algebra morphisms to the algebra (K, id). Similarly, the implication (4) ⇒ (1) does not hold.

We also wish to mention a result due to Taylor [23, Rem. 3.8]. It uses the concept of a subobject classifier, originating in [17] and prominent in topos theory. This is an object Ω with a subobject t : 1 ↣ Ω such that for every subobject b : B ↣ A there is a unique b̂ : A → Ω such that b is the inverse image of t under b̂. By definition, every elementary topos has a subobject classifier, in particular every presheaf category Set^C with C small. Our standing assumption that A is a complete and well-powered category is not needed for the next result: finite limits are sufficient.

Theorem 5.8 (Taylor [23]). Let F be an endofunctor preserving inverse images on a finitely complete category with a subobject classifier.
Then every recursive coalgebra is well-founded.

Corollary 5.9. For every set functor preserving inverse images, the following properties of a coalgebra are equivalent:
well-foundedness ⇐⇒ parametric recursiveness ⇐⇒ recursiveness.

Example 5.10. The hypothesis in Theorems 5.5 and 5.8 that the functor preserves inverse images cannot be lifted. In order to see this, we consider the functor R : Set → Set of Example 2.2(4). It preserves monomorphisms but not inverse images. The coalgebra A = {0, 1} with the structure α constant to (0, 1) is recursive: given an algebra β : RB → B, the unique coalgebra-to-algebra homomorphism h : {0, 1} → B is given by h(0) = h(1) = β(d). But A is not well-founded: ∅ is a cartesian subcoalgebra.

Recall that an initial algebra (μF, ι) is also considered as a coalgebra (μF, ι^{−1}). Taylor [23, Cor. 9.9] showed that, for functors preserving inverse images, the terminal well-founded coalgebra is the initial algebra. Surprisingly, this result is true for all set functors.

Theorem 5.11 [6, Thm. 2.46]. For every set functor, a terminal well-founded coalgebra is precisely an initial algebra.

Theorem 5.12. For every functor on Vec preserving inverse images, the following properties of a coalgebra are equivalent:
well-foundedness ⇐⇒ parametric recursiveness ⇐⇒ recursiveness.

6 Closure Properties of Well-founded Coalgebras

In this section we will see that strong quotients and subcoalgebras (see Remark 2.7) of well-founded coalgebras are well-founded again. We mention the following corollary to Proposition 4.19. For endofunctors on sets preserving inverse images this was stated by Taylor [24, Exercise VI.16]:

Proposition 6.1. The subcategory of Coalg F formed by all well-founded coalgebras is closed under strong quotients and coproducts in Coalg F.

This follows from a general result on coreflective subcategories [2, Thm.
16.8]: the category Coalg F has the factorization system of Proposition 2.6, and its full subcategory of well-founded coalgebras is coreflective with monomorphic coreflections (see Proposition 4.19). Consequently, it is closed under strong quotients and colimits.

We prove next that, for an endofunctor preserving finite intersections, well-founded coalgebras are closed under subcoalgebras, provided that the complete lattice Sub(A) is a frame. This means that for every subobject m : B ↣ A and every family m_i (i ∈ I) of subobjects of A we have m ∧ ⋁_{i∈I} m_i = ⋁_{i∈I} (m ∧ m_i). Equivalently, m← : Sub(A) → Sub(B) (see Notation 2.10) has a right adjoint m_* : Sub(B) → Sub(A). This property holds for Set as well as for the categories of posets, graphs, topological spaces, and presheaf categories Set^C, C small. Moreover, it holds for every Grothendieck topos. The categories of complete partial orders and Vec do not satisfy this requirement.

Proposition 6.2. Suppose that F preserves finite intersections, and let (A, α) be a well-founded coalgebra such that Sub(A) is a frame. Then every subcoalgebra of (A, α) is well-founded.

Proof. Let m : (B, β) ↣ (A, α) be a subcoalgebra. We will show that the only pre-fixed point of ○_β is id_B (cf. Remark 4.4(2)). Suppose s : S ↣ B fulfils ○_β(s) ≤ s. Since F preserves finite intersections, we have m← · ○_α = ○_β · m← by Corollary 4.15(1). The counit of the above adjunction m← ⊣ m_* yields m←(m_*(s)) ≤ s, so that we obtain m←(○_α(m_*(s))) = ○_β(m←(m_*(s))) ≤ ○_β(s) ≤ s. Using again the adjunction m← ⊣ m_*, we have equivalently that ○_α(m_*(s)) ≤ m_*(s); i.e. m_*(s) is a pre-fixed point of ○_α. Since (A, α) is well-founded, Corollary 4.15(1) implies that m_*(s) = id_A. Since m← is also a right adjoint and therefore preserves the top element of Sub(B), we thus obtain id_B = m←(id_A) = m←(m_*(s)) ≤ s.

Remark 6.3.
Given a set functor F preserving inverse images, a much better result was proved by Taylor [24, Corollary 6.3.6]: for every coalgebra homomorphism f : (B, β) → (A, α) with (A, α) well-founded, (B, β) is well-founded as well. In fact, our proof above is essentially Taylor's.

Corollary 6.4. If a set functor preserves finite intersections, then subcoalgebras of well-founded coalgebras are well-founded.

Trnková [26] proved that every set functor preserves all nonempty finite intersections. However, this does not suffice for Corollary 6.4:

Example 6.5. A well-founded coalgebra for a set functor can have non-well-founded subcoalgebras. Let F∅ = 1 and FX = 1 + 1 for all nonempty sets X, and let Ff = inl : 1 → 1 + 1 be the left-hand injection for all maps f : ∅ → X with X nonempty. The coalgebra inr : 1 → F1 is not well-founded because its empty subcoalgebra is cartesian. However, this is a subcoalgebra of id : 1 + 1 → 1 + 1 (via the embedding inr), and the latter is well-founded.

The fact that subcoalgebras of a well-founded coalgebra are well-founded does not necessarily need the assumption that Sub(A) is a frame. Instead, one may assume that the class of monomorphisms is universally smooth:

Theorem 6.6. If A has universally smooth monomorphisms and F preserves finite intersections, every subcoalgebra of a well-founded coalgebra is well-founded.

7 Conclusions

Well-founded coalgebras, introduced by Taylor [24], have a compact definition based on an extension of Jacobs' 'next time' operator. Our main contribution is a new proof of Taylor's General Recursion Theorem that every well-founded coalgebra is recursive, generalizing this result to all endofunctors preserving monomorphisms on a complete and well-powered category with smooth monomorphisms.
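In the concrete case of a set functor, the recursion theorem summarized above says that on a well-founded coalgebra the equation f = e · Ff · α has a unique solution f, i.e. definitions by structural recursion are legitimate. A minimal Python sketch for the functor FX = 1 + A × X × X of Example 5.3; the particular three-state coalgebra and the size-counting algebra are our own toy choices, for illustration only:

```python
# Coalgebra structure alpha for FX = 1 + A x X x X on states {0, 1, 2}:
# a state is either a leaf (None, the element of 1) or a triple
# (label, left_state, right_state) in A x X x X.
alpha = {0: ("a", 1, 2), 1: ("b", 2, 2), 2: None}

# Algebra e : FX -> X on the natural numbers: a leaf counts 1, an inner
# node counts 1 plus both subtree values (the label is ignored).
def e(t):
    if t is None:
        return 1
    _, left, right = t
    return 1 + left + right

# The unique coalgebra-to-algebra morphism f satisfies f = e . Ff . alpha:
# unfold one step with alpha, apply f recursively, then evaluate with e.
def f(x):
    t = alpha[x]
    if t is None:
        return e(None)
    label, l, r = t
    return e((label, f(l), f(r)))

print(f(0))  # 5: the size of the tree unfolded from state 0
```

Well-foundedness of the coalgebra is exactly what makes the recursion terminate: starting from a state with an infinite unfolding, the call f(x) would never return.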
For functors preserving inverse images, we have also seen two variants of the converse implication "recursive ⇒ well-founded" under additional hypotheses: one due to Taylor for categories with a subobject classifier, and a second one provided that the category has universally smooth monomorphisms and the functor has a pre-fixed point. Various counterexamples demonstrate that all our hypotheses are necessary.

References

1. Adámek, J.: Free algebras and automata realizations in the language of categories. Comment. Math. Univ. Carolin. 15, 589–602 (1974)
2. Adámek, J., Herrlich, H., Strecker, G.E.: Abstract and Concrete Categories: The Joy of Cats. Dover Publications, 3rd edn. (2009)
3. Adámek, J., Lücke, D., Milius, S.: Recursive coalgebras of finitary functors. Theor. Inform. Appl. 41(4), 447–462 (2007)
4. Adámek, J., Milius, S., Moss, L.S.: Fixed points of functors. J. Log. Algebr. Methods Program. 95, 41–81 (2018)
5. Adámek, J., Milius, S., Moss, L.S.: On well-founded and recursive coalgebras (2019), full version, available online
6. Adámek, J., Milius, S., Moss, L.S., Sousa, L.: Well-pointed coalgebras. Log. Methods Comput. Sci. 9(2), 1–51 (2014)
7. Adámek, J., Milius, S., Sousa, L., Wißmann, T.: On finitary functors. Theor. Appl. Categ. 34, 1134–1164 (2019), available online
8. Adámek, J., Rosický, J.: Locally Presentable and Accessible Categories. Cambridge University Press (1994)
9. Borceux, F.: Handbook of Categorical Algebra: Volume 1, Basic Category Theory. Encyclopedia of Mathematics and its Applications, Cambridge University Press (1994)
10. Capretta, V., Uustalu, T., Vene, V.: Recursive coalgebras from comonads. Inform. and Comput. 204, 437–468 (2006)
11. Capretta, V., Uustalu, T., Vene, V.: Corecursive algebras: A study of general structured corecursion. In: Oliveira, M., Woodcock, J. (eds.) Formal Methods: Foundations and Applications, Lecture Notes in Computer Science, vol. 5902, pp. 84–100.
Springer Berlin Heidelberg (2009)
12. Eppendahl, A.: Coalgebra-to-algebra morphisms. In: Proc. Category Theory and Computer Science (CTCS). Electron. Notes Theor. Comput. Sci., vol. 29, pp. 42–49 (1999)
13. Gumm, H.: From T-coalgebras to filter structures and transition systems. In: Fiadeiro, J.L., Harman, N., Roggenbach, M., Rutten, J. (eds.) Algebra and Coalgebra in Computer Science, Lecture Notes in Computer Science, vol. 3629, pp. 194–212. Springer Berlin Heidelberg (2005)
14. Jacobs, B.: The temporal logic of coalgebras via Galois algebras. Math. Structures Comput. Sci. 12(6), 875–903 (2002)
15. Jeannin, J.B., Kozen, D., Silva, A.: Well-founded coalgebras, revisited. Math. Structures Comput. Sci. 27, 1111–1131 (2017)
16. Kurz, A.: Logics for Coalgebras and Applications to Computer Science. Ph.D. thesis, Ludwig-Maximilians-Universität München (2000)
17. Lawvere, W.F.: Quantifiers and sheaves. Actes Congrès Intern. Math. 1, 329–334 (1970)
18. Manna, Z., Pnueli, A.: The Temporal Logic of Reactive and Concurrent Systems: Specification. Springer-Verlag (1992)
19. Meseguer, J., Goguen, J.A.: Initiality, induction, and computability. In: Algebraic Methods in Semantics (Fontainebleau, 1982), pp. 459–541. Cambridge Univ. Press, Cambridge (1985)
20. Milius, S.: Completely iterative algebras and completely iterative monads. Inform. and Comput. 196, 1–41 (2005)
21. Milius, S., Pattinson, D., Wißmann, T.: A new foundation for finitary corecursion and iterative algebras. Inform. and Comput. 217 (2020), available online
22. Osius, G.: Categorical set theory: a characterization of the category of sets. J. Pure Appl. Algebra 4, 79–119 (1974)
23. Taylor, P.: Towards a unified treatment of induction I: the general recursion theorem (1995–96), preprint, available online
24. Taylor, P.: Practical Foundations of Mathematics. Cambridge University Press (1999)
25.
Trnková, V., Adámek, J., Koubek, V., Reiterman, J.: Free algebras, input processes and free monads. Comment. Math. Univ. Carolin. 16, 339–351 (1975)
26. Trnková, V.: Some properties of set functors. Comment. Math. Univ. Carolin. 10, 323–352 (1969)
27. Trnková, V.: On a descriptive classification of set functors I. Comment. Math. Univ. Carolin. 12, 143–174 (1971)

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Timed Negotiations

S. Akshay¹, Blaise Genest², Loïc Hélouët³, and Sharvik Mital¹

¹ IIT Bombay, Mumbai, India, {akshayss,sharky}
² Univ Rennes, CNRS, IRISA, Rennes, France
³ Univ Rennes, Inria, Rennes, France

Abstract. Negotiations were introduced in [6] as a model for concurrent systems with multiparty decisions. What is very appealing with negotiations is that it is one of the very few non-trivial concurrent models where several interesting problems, such as soundness, i.e. absence of deadlocks, can be solved in PTIME [3]. In this paper, we introduce the model of timed negotiations and consider the problem of computing the minimum and the maximum execution times of a negotiation. The latter can be solved using the algorithm of [10] computing costs in negotiations, but surprisingly, the minimum execution time cannot.
This paper proposes new algorithms to compute both minimum and maximum execution time, which work in much more general classes of negotiations than [10], which only considered sound and deterministic negotiations. Further, we uncover the precise complexities of these questions, ranging from PTIME to Δ^p_2-complete. In particular, we show that computing the minimum execution time is more complex than computing the maximum execution time in most classes of negotiations we consider.

1 Introduction

Distributed systems are notoriously difficult to analyze, mainly due to the explosion of the number of configurations that have to be considered to answer even simple questions. A challenging task is then to propose models on which analysis can be performed with tractable complexities, preferably within polynomial time. Free-choice Petri nets are a classical model of distributed systems that allow for efficient verification, in particular when the nets are 1-safe [4, 5].

Recently, [6] introduced a new model called negotiations for workflows and business processes. A negotiation describes how processes interact in a distributed system: a subset of processes in a node of the system takes a synchronous decision among several outcomes. The effect of this outcome sends contributing processes to a new set of nodes. The execution of a negotiation ends when processes reach a final configuration. Negotiations can be deterministic (once an outcome is fixed, each process knows its unique successor node) or not.

Supported by DST/CEFIPRA/INRIA Associated team EQuaVE and DST/SERB Matrices grant MTR/2018/000744.

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 37–56, 2020.

38 S. Akshay et al.

Negotiations are an interesting model since several properties can be decided with a reasonable complexity. The question of soundness, i.e., deadlock-freedom:
whether from every reachable configuration one can reach a final configuration, is PSPACE-complete. However, for deterministic negotiations, it can be decided in PTIME [7]. The decision procedure uses reduction rules. Reduction techniques were originally proposed for Petri nets [2, 8, 11, 16]. The main idea is to define transformation rules that produce a model of smaller size w.r.t. the original model, while preserving the property under analysis. In the context of negotiations, [7, 3] proposed a sound and complete set of soundness-preserving reduction rules and algorithms to apply these rules efficiently. The question of soundness for deterministic negotiations was revisited in [9] and shown to be NLOGSPACE-complete using anti-patterns instead of reduction rules. Further, they show that the PTIME result holds even when relaxing determinism [9]. Negotiation games have also been considered, to decide whether one particular process can force termination of a negotiation. While this question is EXPTIME-complete in general, for sound and deterministic negotiations it becomes PTIME [12].

While it is natural to consider cost or time in negotiations (e.g. think of the Brexit negotiation, where time is of the essence, and which we model as a running example in this paper), the original model of negotiations proposed by [6] is only qualitative. Recently, [10] has proposed a framework to associate costs to the executions of negotiations, and adapt a static analysis technique based on reduction rules to compute end-to-end cost functions that are not sensitive to the scheduling of concurrent nodes. For sound and deterministic negotiations, the end-to-end cost can be computed in O(n · (C + n)), where n is the size of the negotiation and C the time needed to compute the cost of an execution.
Requiring soundness or determinism seems perfectly reasonable, but asking for sound and deterministic negotiations is too restrictive: it prevents a process from waiting for decisions of other processes to know how to proceed.

In this paper, we revisit time in negotiations. We attach time intervals to outcomes of nodes. We want to compute maximal and minimal execution times for negotiations that are not necessarily sound and deterministic. Since we are interested in minimal and maximal execution time, cycles in negotiations can be either bypassed or lead to infinite maximal time. Hence, we restrict this study to acyclic negotiations. Notice that time can be modeled as a cost, following [10], and the maximal execution time of a sound and deterministic negotiation can be computed in PTIME using the algorithm from [10]. Surprisingly however, we give an example (Example 3) for which the minimal execution time cannot be computed in PTIME by this algorithm.

The first contribution of the paper shows that reachability (whether at least one run of a negotiation terminates) is NP-complete, already for (untimed) deterministic acyclic negotiations. This implies that computing minimal or maximal execution time for deterministic (but unsound) acyclic negotiations cannot be done in PTIME (unless NP=PTIME). We characterize precisely the complexities of different decision variants (threshold, equality, etc.), with complexities ranging from (co-)NP-complete to Δ^p_2.

We thus turn to negotiations that are sound but not necessarily deterministic. Our second contribution is a new algorithm, not based on reduction rules, to compute the maximal execution time in PTIME for sound negotiations. It is based on computing the maximal execution time of critical paths in the negotiations.
However, we show that the minimal execution time cannot be computed in PTIME for sound negotiations (unless NP=PTIME): deciding whether the minimal execution time is lower than T is NP-complete, even for T given in unary, using a reduction from the bin-packing problem. This shows that the minimal execution time is harder to compute than the maximal execution time.

Our third contribution consists in defining a class in which the minimal execution time can be computed in (pseudo) PTIME. To do so, we define the class of k-layered negotiations, for k fixed, that is, negotiations where nodes can be organized into layers of at most k nodes at the same depth. These negotiations can be executed without remembering more than k nodes at a time. In this case, we show that computing the maximal execution time is PTIME, even if the negotiation is neither deterministic nor sound. The algorithm, not based on reduction rules, uses the k-layer restriction in order to navigate in the negotiation while considering only a polynomial number of configurations. For the minimal execution time, we provide a pseudo-PTIME algorithm, that is PTIME if constants are given in unary. Finally, we show that the size of constants does matter: deciding whether the minimal execution time of a k-layered negotiation is less than T is NP-complete when T is given in binary. We show this by reducing from the knapsack problem, yet again emphasizing that the minimal execution time of a negotiation is harder to compute than its maximal execution time.

This paper is organized as follows. Section 2 introduces the key ingredients of negotiations, determinism and soundness, known results in the untimed setting, and provides our running example modeling the Brexit negotiation. Section 3 introduces time in negotiations, gives a semantics to this new model, and formalizes several decision problems on maximal and minimal durations of runs in timed negotiations. We recall the main results of the paper in Section 4.
Then, Section 5 considers timed execution problems for deterministic negotiations, Section 6 for sound negotiations, and Section 7 for layered negotiations. Proof details for the last three sections are given in an extended version of this paper [1].

2 Negotiations: Definitions and Brexit example

In this section, we recall the definition of negotiations, of some subclasses (acyclic and deterministic), as well as important problems (soundness and reachability).

Definition 1 (Negotiation [6, 10]). A negotiation over a finite set of processes P is a tuple N = (N, n_0, n_f, X), where:

– N is a finite set of nodes. Each node is a pair n = (P_n, R_n), where P_n ⊆ P is a non-empty set of processes participating in node n, and R_n is a finite set of outcomes of node n (also called results), with R_{n_f} = {r_f}. We denote by R the union of all outcomes of nodes in N.
– n_0 is the first node of the negotiation and n_f is the final node. Every process in P participates in both n_0 and n_f.

Fig. 1. A (sound but non-deterministic) negotiation modeling Brexit.

– For all n ∈ N, X_n : P_n × R_n → 2^N is a map defining the transition relation from node n, with X_n(p, r) = ∅ iff n = n_f, r = r_f. We denote by X : N × P × R → 2^N the partial map defined on ⋃_{n∈N}({n} × P_n × R_n), with X(n, p, a) = X_n(p, a) for all p, a.

Intuitively, at a node n = (P_n, R_n) in a negotiation, all processes of P_n have to agree on a common outcome r chosen from R_n. Once this outcome r is chosen, every process p ∈ P_n is ready to move to any node prescribed by X(n, p, r). A new node m can only start when all processes of P_m are ready to move to m.

Example 1. We illustrate negotiations by considering a simplified model of the Brexit negotiation, see Figure 1.
There are three processes, P = {EU, PM, Pa}. At first, EU decides whether to enforce a backstop in any deal (outcome backstop) or not (outcome no-backstop). In the meantime, PM decides to prorogue Pa, and Pa can choose whether or not to appeal to court (outcome court/no-court). If it goes to court, then PM and Pa will take some time in court (c-meet, defend), before PM can meet EU to agree on a deal. Otherwise, Pa goes to recess, and PM can meet EU directly. Once EU and PM have agreed on a deal, PM tries to convince Pa to vote the deal. The final outcome is whether the deal is voted, or whether Brexit is delayed.

Definition 2 (Deterministic negotiations). A process p ∈ P is deterministic iff, for every n ∈ N and every outcome r of n, X(n, p, r) is a singleton. A negotiation is deterministic iff all its processes are deterministic. It is weakly non-deterministic [9] (called weakly deterministic in [3]) iff, for every node n, one of the processes in P_n is deterministic. Last, it is very weakly non-deterministic [9] (called weakly deterministic in [6]) iff, for every n, every p ∈ P_n and every outcome r of n, there exists a deterministic process q such that q ∈ P_{n′} for every n′ ∈ X(n, p, r).

In deterministic negotiations, once an outcome is chosen, each process knows the next node it will be involved in. In (very-)weakly non-deterministic negotiations, the next node might depend upon the outcome chosen in other nodes by other processes. However, once the outcomes have been chosen for all current nodes, there is only one next node possible for each process. Observe that the class of deterministic negotiations is isomorphic to the class of free-choice workflow nets [10]. In Example 1, the Brexit negotiation is non-deterministic, because process PM is non-deterministic. Indeed, consider outcome c-meet: it allows two nodes, according to whether the backstop is enforced or not, which is a decision taken by process EU.
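Definition 2 can be checked mechanically: a process p is deterministic iff every set X(n, p, r) is a singleton. A small Python sketch, where the dictionary encoding of X and the node names are our own illustrative choices, loosely following Example 1:

```python
# X maps (node, process, outcome) to the set of nodes the process may move
# to; a process is deterministic iff all these sets are singletons.
def deterministic_processes(X, procs):
    det = set(procs)
    for (n, p, r), successors in X.items():
        if len(successors) != 1:
            det.discard(p)
    return det

# Fragment loosely following Example 1: after outcome c-meet, process PM
# may end up in one of two nodes, depending on EU's backstop decision.
X = {("prorogue", "PM", "c-meet"): {"deal", "deal w/backstop"},
     ("prorogue", "Pa", "court"): {"to-court"},
     ("start", "EU", "backstop"): {"deal w/backstop"}}
procs = {"EU", "PM", "Pa"}
det = deterministic_processes(X, procs)
print("PM" in det, "EU" in det)  # False True
```

The negotiation encoded here is thus non-deterministic as a whole, exactly because the single process PM fails the singleton condition.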
Semantics: A configuration [3] of a negotiation is a mapping M : P → 2^N. Intuitively, it tells for each process p the set M(p) of nodes p is ready to engage in. The semantics of a negotiation is defined in terms of moves from a configuration to the next one. The initial and final configurations, M_0 and M_f, are given by M_0(p) = {n_0} and M_f(p) = ∅ respectively, for every process p ∈ P. A configuration M enables node n if n ∈ M(p) for every p ∈ P_n. When n is enabled, a decision at node n can occur, and the participants at this node choose an outcome r ∈ R_n. The occurrence of (n, r) produces the configuration M′ given by M′(p) = X(n, p, r) for every p ∈ P_n and M′(p) = M(p) for the remaining processes in P \ P_n. Moving from M to M′ after choosing (n, r) is called a step, denoted M --(n,r)--> M′. A run of N is a sequence (n_1, r_1), (n_2, r_2) ... (n_k, r_k) such that there is a sequence of configurations M_0, M_1, ..., M_k and every (n_i, r_i) is a step between M_{i−1} and M_i. A run starting from the initial configuration and ending in the final configuration is called a final run. By definition, its last step is (n_f, r_f).

An important class of negotiations in the context of timed negotiations is that of acyclic negotiations, where an infinite sequence of steps is impossible:

Definition 3 (Acyclic negotiations). The graph of a negotiation N is the labeled graph G_N = (V, E) where V = N and E = {(n, (p, r), n′) | n′ ∈ X(n, p, r)}, with pairs of the form (p, r) being the labels. A negotiation is acyclic iff its graph is acyclic. We denote by Paths(G_N) the set of paths in the graph of a negotiation. These paths are of the form π = (n_0, (p_0, r_0), n_1) ... (n_{k−1}, (p_{k−1}, r_{k−1}), n_k).

The Brexit negotiation of Fig. 1 is an example of an acyclic negotiation. Despite their apparent simplicity, negotiations may express involved behaviors, as shown with the Brexit example.
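The step semantics just defined is directly executable. The following Python sketch (the encoding and the two-node toy negotiation are ours, not the Brexit example) represents a configuration as a map from processes to sets of nodes and fires the steps of a final run:

```python
# A two-node toy negotiation: both processes p and q synchronize at n0,
# choose outcome "go", move to the final node nf and take its outcome rf.
P = {"p", "q"}
participants = {"n0": {"p", "q"}, "nf": {"p", "q"}}
X = {("n0", "p", "go"): {"nf"}, ("n0", "q", "go"): {"nf"},
     ("nf", "p", "rf"): set(), ("nf", "q", "rf"): set()}

def enabled(M, n):
    # a configuration M enables n if every participant is ready to engage in n
    return all(n in M[p] for p in participants[n])

def step(M, n, r):
    # occurrence of (n, r): every participant p moves to X(n, p, r)
    assert enabled(M, n)
    return {p: X[(n, p, r)] if p in participants[n] else M[p] for p in P}

M0 = {p: {"n0"} for p in P}    # initial configuration: everybody at n0
M1 = step(M0, "n0", "go")
M2 = step(M1, "nf", "rf")      # last step (nf, rf) of a final run
print(M2 == {p: set() for p in P})  # True: the final configuration is reached
```

Enumerating all runs obtainable this way also gives a (worst-case exponential) reachability check, in line with the NP-completeness result below.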
Indeed, two important questions in this setting are whether there is some way to reach a final node in the negotiation from (i) the initial node and (ii) any reachable node in the negotiation.

Definition 4 (Soundness and Reachability).
1. A negotiation is sound iff every run from the initial configuration can be extended to a final run. The problem of soundness is to check if a given negotiation is sound.
2. The problem of reachability asks if a given negotiation has a final run.

Notice that the Brexit negotiation of Fig. 1 is sound (but not deterministic). It seems hard to preserve the important features of this negotiation while being both sound and deterministic. The problem of soundness has received considerable attention. We summarize the results about soundness in the next theorem:

Theorem 1. Determining whether a negotiation is sound is PSPACE-complete. For (very-)weakly non-deterministic negotiations, it is co-NP-complete [9]. For acyclic negotiations, it is in DP and co-NP-hard [6]. Determining whether an acyclic weakly non-deterministic negotiation is sound is in PTIME [3, 9]. Finally, deciding soundness for deterministic negotiations is NLOGSPACE-complete [9].

Checking reachability is NP-complete, even for deterministic acyclic negotiations (surprisingly, we did not find this result stated before in the literature):

Proposition 1. Reachability is NP-complete for acyclic negotiations, even if the negotiation is deterministic.

Proof (sketch). One can guess a run of size ≤ |N| in polynomial time and verify whether it reaches n_f, which gives membership in NP. The hardness part comes from a reduction from 3-CNF-SAT, which can be found in the proof of Theorem 3.

k-Layered Acyclic Negotiations

We introduce a new class of negotiations which has good algorithmic properties, namely k-layered acyclic negotiations, for k fixed.
Roughly speaking, the nodes of a k-layered acyclic negotiation can be arranged in layers, and these layers contain at most k nodes. Before giving a formal definition, we need to define the depth of nodes in N. First, a path in a negotiation is a sequence of nodes n_1 ... n_ℓ such that for all i ∈ {1, ..., ℓ−1}, there exist p_i, r_i with n_{i+1} ∈ X(n_i, p_i, r_i). The length of a path n_1, ..., n_ℓ is ℓ. The depth depth(n) of a node n is the maximal length of a path from n_0 to n (recall that N is acyclic, so this number is always finite).

Definition 5. An acyclic negotiation is layered if for every node n, every path reaching n has length depth(n). An acyclic negotiation is k-layered if it is layered and, for all ℓ ∈ N, there are at most k nodes at depth ℓ.

The Brexit example of Fig. 1 is 6-layered. Notice that a layered negotiation is necessarily k-layered for some k ≤ |N| − 2. Note also that we can always transform an acyclic negotiation N into a layered acyclic negotiation N', by adding dummy nodes: for every node m ∈ X(n, p, r) with depth(m) > depth(n) + 1, we can add several nodes n_1, ..., n_ℓ with ℓ = depth(m) − (depth(n) + 1) and processes P_{n_i} = {p}. We compute a new relation X' such that X'(n, p, r) = {n_1}, X'(n_ℓ, p, r) = {m} and, for every i ∈ 1..ℓ−1, X'(n_i, p, r) = {n_{i+1}}. This transformation is polynomial: the resulting negotiation is of size up to |N| × |X| × |P|. The proof of the following theorem can be found in [1].

Theorem 2. Let k ∈ N. Checking reachability or soundness for a k-layered acyclic negotiation N can be done in PTIME.

Timed Negotiations 43

3 Timed Negotiations

In many negotiations, time is an important feature to take into account. For instance, in the Brexit example, with an initial node starting at the beginning of September 2019, there are 9 weeks to pass a deal until the 31st October deadline. We extend negotiations by introducing timing constraints on outcomes of nodes, inspired by timed Petri nets [14] and by the notion of negotiations with costs [10].
We use time intervals to specify lower and upper bounds on the duration of negotiations. More precisely, we attach time intervals to pairs (n, r) where n is a node and r an outcome. In the rest of the paper, we denote by I the set of intervals whose endpoints are non-negative integers or ∞. For convenience we only use closed intervals in this paper (except at ∞), but the results we show can also be extended to open intervals with some notational overhead. Intuitively, outcome r can be taken at a node n with associated time interval [a, b] only after a time units have elapsed from the time all processes contributing to n are ready to engage in n, and at most b time units later.

Definition 6. A timed negotiation is a pair (N, γ) where N is a negotiation, and γ : N × R → I associates an interval with each pair (n, r) of node and outcome such that r ∈ R_n. For a given node n and outcome r, we denote by γ^−(n, r) (resp. γ^+(n, r)) the lower bound (resp. the upper bound) of γ(n, r).

Example 2. In the Brexit example, we define the following timing constraints γ. We only specify the outcome names, as the timing only depends upon them. Backstop and no-backstop both take between 1 and 2 weeks: γ(backstop) = γ(no-backstop) = [1, 2]. In case of no-court, recess takes 5 weeks, γ(recess) = [5, 5], and PM can meet EU immediately, γ(meet) = [0, 0]. In case of court action, PM needs to spend 2 weeks in court, γ(c-meet) = [2, 2], and depending on the court delay and decision, Pa needs between 3 (court overrules recess) and 5 (court confirms recess) weeks, γ(defend) = [3, 5]. Agreeing on a deal can take anywhere from 2 weeks to 2 years (104 weeks): γ(deal agreed) = [2, 104] (some would say infinite time is even possible!). It needs more time with the backstop, γ(deal w/backstop) = [5, 104]. All other outcomes are assumed to be immediate, i.e., associated with [0, 0].

Semantics: A timed valuation is a map μ : P → R^{≥0} that associates a non-negative real value with every process.
A timed configuration is a pair (M, μ) where M is a configuration and μ a timed valuation. There is a timed step from (M, μ) to (M', μ'), denoted (M, μ) −(n,r)→ (M', μ'), if (i) M −(n,r)→ M', (ii) p ∉ P_n implies μ'(p) = μ(p), and (iii) ∃d ∈ γ(n, r) such that ∀p ∈ P_n we have μ'(p) = max_{p' ∈ P_n} μ(p') + d (d is the duration of node n).

Intuitively, a timed step (M, μ) −(n,r)→ (M', μ') depicts a decision taken at node n, and how long each process of P_n waited in that node before taking decision (n, r). The last process engaged in n must wait for a duration contained in γ(n, r). However, other processes may spend a time greater than γ^+(n, r).

A timed run is a sequence of steps ρ = (M_0, μ_0) → (M_1, μ_1) ... (M_k, μ_k) where M_0 is the initial configuration, μ_0(p) = 0 for every p ∈ P, and each (M_i, μ_i) → (M_{i+1}, μ_{i+1}) is a timed step. It is final if M_k = M_f. Its execution time δ(ρ) is defined as δ(ρ) = max_{p ∈ P} μ_k(p).

Notice that we only attached timing to processes, not to individual steps. With our definition of runs, timing on steps may not be monotonous (i.e., non-decreasing) along the run, while timing on processes is. Viewed through the lens of concurrent systems, the timing is monotonous on the partial orders of the system rather than on the linearization. It is not hard to restrict paths, if necessary, to have a monotonous timing on steps as well. In this paper, we are only interested in execution time, which does not depend on the linearization considered.

Given a timed negotiation N, we can now define the minimum and maximum execution time, which correspond to optimistic or pessimistic views:

Definition 7. Let N be a timed negotiation. Its minimum execution time, denoted mintime(N), is the minimal δ(ρ) over all final timed runs ρ of N. We define the maximal execution time maxtime(N) of N similarly. Given T ∈ N, the main problems we consider in this paper are the following:

– The mintime problem, i.e., do we have mintime(N) ≤ T?
In other words, does there exist a final timed run ρ with δ(ρ) ≤ T?

– The maxtime problem, i.e., do we have maxtime(N) ≤ T? In other words, does δ(ρ) ≤ T hold for every final timed run ρ?

These questions have a practical interest: in the Brexit example, the question "is there a way to have a vote on a deal within 9 weeks?" is indeed a minimum execution time problem. We also address the equality variant of these decision problems, i.e., mintime(N) = T: is there a final run of N that terminates in exactly T time units while no other final run takes less than T time units? Similarly for maxtime(N) = T.

Example 3. We use Fig. 1 to show that it is not easy to compute the minimal execution time, and in particular that one cannot use the algorithm from [10] to compute it. Consider the node n with P_n = {PM, Pa} and R_n = {court, no court}. If the outcome is court, then PM needs 2 weeks before (s)he can talk to EU and Pa needs at least 3 weeks before he can debate. However, if the outcome is no court, then PM need not wait before (s)he can talk to EU, but Pa wastes 5 weeks in recess. This means that one needs to remember different alternatives which could be faster in the end, depending on the future. On the other hand, the algorithm from [10] attaches one minimal time to process Pa, and one minimal time to process PM. No matter the choices (0 or 2 for PM and 3 or 5 for Pa), there will be futures in which the chosen number over- or underapproximates the real minimal execution time (this choice is not explicit in [10]). The authors of [10] acknowledged the issue with their algorithm for mintime.

For maximum execution time, it is not an issue to attach to each node a unique maximal execution time. The reason for the asymmetry between the minimal and maximal execution times of a negotiation is that the execution time of a run is max_{p ∈ P} μ_k(p), for μ_k the last timed valuation, which breaks the symmetry between min and max.
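The definitions above can be exercised by a brute-force search over all final timed runs. For a fixed untimed run, taking every decision at its earliest (resp. latest) date realizes the minimal (resp. maximal) δ(ρ), since δ only grows with each chosen duration. A Python sketch, exponential and for illustration only; the dictionary encoding of the negotiation is ours, not the paper's:

```python
# Brute-force mintime/maxtime for a small acyclic timed negotiation.
# A state is (M, mu): M maps process -> set of ready nodes, mu maps
# process -> current time. Decisions are taken at the lower bound of
# gamma (kind="min") or the upper bound (kind="max").

def extremal_time(procs, outcomes, X, gamma, n0, kind="min"):
    idx = 0 if kind == "min" else 1
    allp = sorted({p for ps in procs.values() for p in ps})
    best = [None]

    def explore(M, mu):
        if all(not M[p] for p in allp):            # final configuration M_f
            t = max(mu.values())                   # delta(rho)
            b = best[0]
            best[0] = t if b is None else (min(b, t) if kind == "min" else max(b, t))
            return
        for n in {m for p in allp for m in M[p]}:
            if all(n in M[p] for p in procs[n]):   # n enabled
                start = max(mu[p] for p in procs[n])
                for r in outcomes[n]:
                    d = gamma[(n, r)][idx]
                    M2, mu2 = dict(M), dict(mu)
                    for p in procs[n]:
                        M2[p] = X[(n, p, r)]
                        mu2[p] = start + d
                    explore(M2, mu2)

    explore({p: {n0} for p in allp}, {p: 0 for p in allp})
    return best[0]

# p1 and p2 split after n0, then rejoin in nf.
procs = {"n0": {"p1", "p2"}, "n1": {"p1"}, "n2": {"p2"}, "nf": {"p1", "p2"}}
outcomes = {"n0": ["go"], "n1": ["a"], "n2": ["b"], "nf": ["rf"]}
X = {("n0", "p1", "go"): {"n1"}, ("n0", "p2", "go"): {"n2"},
     ("n1", "p1", "a"): {"nf"}, ("n2", "p2", "b"): {"nf"},
     ("nf", "p1", "rf"): set(), ("nf", "p2", "rf"): set()}
gamma = {("n0", "go"): (1, 2), ("n1", "a"): (3, 5),
         ("n2", "b"): (0, 1), ("nf", "rf"): (1, 1)}
assert extremal_time(procs, outcomes, X, gamma, "n0", "min") == 5   # 1+3+1
assert extremal_time(procs, outcomes, X, gamma, "n0", "max") == 8   # 2+5+1
```

On this example, the slow branch (n1) dominates the fast one (n2), and the final decision at nf starts only when the last process arrives, as required by the timed step rule.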
4 High level view of the main results

In this section, we give a high-level description of our main results. Formal statements can be found in the sections where they are proved. We gather in Fig. 2 the precise complexities of the minimal and maximal execution time problems for the 3 classes of negotiations that we describe in the following. Since we are interested in minimum and maximum execution time, cycles in negotiations can either be bypassed or lead to an infinite maximal time. Hence, while we define timed negotiations in general, we always restrict to acyclic negotiations (such as Brexit) when stating and proving results.

In [10], a PTIME algorithm is given to compute different costs for negotiations that are both sound and deterministic. One limitation of this result is that it cannot compute the minimum execution time, as explained in Example 3. A second limitation is that the class of sound and deterministic negotiations is quite restrictive: it cannot model situations where the next node a process participates in depends on the outcome from another process, as in the Brexit example. We thus consider classes where one of these restrictions is dropped.

We first consider (Section 5) negotiations that are deterministic, but without the soundness restriction. We show that for this class, no timed problem we consider can be solved in PTIME (unless NP=PTIME). Further, we show that the equality problems (maxtime/mintime(N) = T) are complete for the complexity class DP, i.e., at the second level of the Boolean Hierarchy [15]. We then consider (Section 6) the class of negotiations that are sound, but not necessarily deterministic. We show that the maximum execution time can be computed in PTIME, and propose a new algorithm. However, the minimum execution time cannot be computed in PTIME (unless NP=PTIME). Again, for the mintime equality problem we have a matching DP-completeness result.

          Deterministic              Sound                      k-layered
Max ≤ T   co-NP-complete (Thm. 3)    PTIME (Prop. 3)            PTIME (Thm. 6)
Max = T   DP-complete (Prop. 2)      PTIME (Prop. 3)            pseudo-PTIME (Thm. 8)
Min ≤ T   NP-complete (Thm. 3)       NP-complete (Thm. 5) (†)   NP-complete (Thm. 7) (‡)
Min = T   DP-complete (Prop. 2)      DP-complete (Prop. 4)      pseudo-PTIME (Thm. 8)

Fig. 2. Results for acyclic timed negotiations. DP refers to the complexity class Difference Polynomial time [15], the second level of the Boolean Hierarchy. (†) Hardness holds even for very weakly non-deterministic negotiations, and for T in unary. (‡) Hardness holds even for sound and very weakly non-deterministic negotiations.

Finally, in order to obtain a polytime algorithm to compute the minimum execution time, we consider the class of k-layered negotiations (see Section 7): given k ∈ N, we can show that maxtime(N) can be computed in PTIME for k-layered negotiations. We also show that while the mintime(N) ≤ T problem is weakly NP-complete for k-layered negotiations, we can compute mintime(N) in pseudo-PTIME, i.e., in PTIME if constants are given in unary.

5 Deterministic Negotiations

We start by considering the class of deterministic acyclic negotiations. We show that both maximal and minimal execution times cannot be computed in PTIME (unless NP=PTIME), as the threshold problems are (co-)NP-complete.

Theorem 3. The mintime(N) ≤ T decision problem is NP-complete, and the maxtime(N) ≤ T decision problem is co-NP-complete for acyclic deterministic timed negotiations.

Proof. For mintime(N) ≤ T, containment in NP is easy: we just need to guess a run ρ (of polynomial size, as N is acyclic), consider the associated timed run ρ^− where all decisions are taken at their earliest possible dates, and check whether δ(ρ^−) ≤ T, which can be done in time O(|N| + log T). For the hardness, we give the proof in two steps. First, we start with a proof of Proposition 1, showing that the reachability problem is NP-hard using a reduction from 3-CNF-SAT, i.e., given a formula φ, we build a deterministic negotiation N_φ s.t. φ is satisfiable iff N_φ has a final run.
In a second step, we introduce timings on this negotiation and show that mintime(N_φ) ≤ T iff φ is satisfiable.

Step 1: Reducing 3-CNF-SAT to the reachability problem. Given a Boolean formula φ with variables v_i, 1 ≤ i ≤ n, and clauses c_j, 1 ≤ j ≤ m, for each variable v_i we define the sets of clauses S_{i,t} = {c_j | v_i is present in c_j} and S_{i,f} = {c_j | ¬v_i is present in c_j}. Clauses in S_{i,t} and S_{i,f} are naturally ordered: c_i < c_j iff i < j. We denote these elements S_{i,t}(1) < S_{i,t}(2) < .... Similarly for the set S_{i,f}.

Now, we construct a negotiation N_φ (as depicted in Figure 3) with a process V_i for each variable v_i and a process C_j for each clause c_j:

– The initial node n_0 has a single outcome r taking each process C_j to node Lone_{c_j} and each process V_i to node Lone_{v_i}.
– Lone_{c_j} has three outcomes: if literal v_i ∈ c_j, then t_i is an outcome, taking C_j to Pair_{c_j,v_i}, and if literal ¬v_i ∈ c_j, then f_i is an outcome, taking C_j to Pair_{c_j,¬v_i}.
– The outcomes of Lone_{v_i} are true and false. Outcome true brings V_i to node Tlone_{v_i,1} and outcome false brings V_i to node Flone_{v_i,1}.
– We have a node Tlone_{v_i,j} for each j ≤ |S_{i,t}| and a node Flone_{v_i,j} for each j ≤ |S_{i,f}|, with V_i as only process. Let c_r = S_{i,t}(j). Node Tlone_{v_i,j} has two outcomes: vton, bringing V_i to Tlone_{v_i,j+1} (or to n_f if j = |S_{i,t}|), and vtoc_{i,r}, bringing V_i to Pair_{c_r,v_i}. The two outcomes from Flone_{v_i,j} are similar.

[Fig. 3. A part of N_φ, showing the gadgets for two clauses c_j and c_k that share variable v_i; timing is [0, 0] wherever not mentioned. The figure itself is not reproduced in this extraction.]

– Node Pair_{c_r,v_i} has V_i and C_r as its processes and one outcome ctof, which takes process C_r to the final node n_f and process V_i to Tlone_{v_i,j+1} (with c_r = S_{i,t}(j)), or to n_f if j = |S_{i,t}|. Node Pair_{c_r,¬v_i} is defined in the same way from Flone_{v_i,j}.

With this, we claim that N_φ has a final run iff φ is satisfiable, which completes the first step of the proof. We give a formal proof of this claim in Appendix A of [1]. Observe that the negotiation N_φ constructed is deterministic and acyclic (but it is not sound).

Step 2: Before we introduce timing on N_φ, we introduce a new outcome r' at n_0 which takes all processes to n_f. Now, the timing function γ associated with N_φ is: γ(n_0, r) = [2, 2], γ(n_0, r') = [3, 3], and γ(n, r) = [0, 0] for every node n ≠ n_0 and every r ∈ R_n. Then, mintime(N_φ) ≤ 2 iff φ has a satisfying assignment: if mintime(N_φ) ≤ 2, there is a final run in which decision r is taken at n_0. But the existence of any such final run implies the satisfiability of φ. For the reverse implication, if φ is satisfiable, then the corresponding run for the satisfying assignment takes 2 time units, which means that mintime(N_φ) ≤ 2.

Similarly, we can prove that the maxtime problem is co-NP-complete by changing γ(n_0, r') to [1, 1] and asking if maxtime(N_φ) > 1 for the new N_φ. The answer will be yes iff φ is satisfiable.

We now consider the related problem of checking if mintime(N) = T (or if maxtime(N) = T). These problems are harder than their threshold variants under usual complexity assumptions: they are DP-complete (DP, the Difference Polynomial time class, is the second level of the Boolean Hierarchy, defined as the class of intersections of a problem in NP and one in co-NP [15]).

Proposition 2. The mintime(N) = T and maxtime(N) = T decision problems are DP-complete for acyclic deterministic negotiations.

Proof.
We only give the proof for mintime (the proof for maxtime is given in Appendix A of [1]). It is easy to see that this problem is in DP, as it can be written as the intersection of mintime(N) ≤ T, which is in NP, and ¬(mintime(N) ≤ T − 1), which is in co-NP. To show hardness, we use the negotiation constructed in the above proof as a gadget, and show a reduction from the SAT-UNSAT problem (a standard DP-complete problem). The SAT-UNSAT problem asks, given two Boolean expressions φ and φ', both in CNF form with three literals per clause, whether it is true that φ is satisfiable and φ' is unsatisfiable. SAT-UNSAT is known to be DP-complete [15]. We reduce this problem to mintime(N) = T.

Given φ, φ', we first build the corresponding negotiations N_φ and N_φ' as in the previous proof. Let n_0 and n_f be the initial and final nodes of N_φ, and n_0' and n_f' be the initial and final nodes of N_φ'. (Similarly, for other nodes we write primes to signify that they belong to N_φ'.)

In the negotiation N_φ', we introduce a new node n_all, in which all the processes participate (see Figure 4). The node n_all has a single outcome r_all which sends all the processes to n_f'. Also, for node n_0', apart from the outcome r which sends all processes to different nodes, there is another outcome r' which sends all the processes to n_all. Now we merge the nodes n_f and n_0' and call the merged node n_sep. Also, nodes n_0 and n_f' now have all the processes of N_φ and N_φ' participating in them. This merging gives us a new negotiation N_{φ,φ'} in which the structure above n_sep is the same as N_φ while below it is the same as N_φ'. Node n_sep now has all the processes of N_φ and N_φ' participating in it. The outcomes of n_sep are the same as those of n_0' (r and r'). For both outcomes of n_sep, the processes corresponding to N_φ directly go to n_f' of N_{φ,φ'}. Similarly, n_0 of N_{φ,φ'}, which is the same as n_0 of N_φ, sends the processes corresponding to N_φ' directly to n_sep for all its outcomes.
We now define the timing function γ for N_{φ,φ'} as follows: γ(Lone_v, r) = [1, 1] for all variables v of φ and r ∈ {true, false}, γ(n_all, r_all) = [2, 2], and γ(n, r) = [0, 0] for all other outcomes of nodes. With this construction, one can conclude that mintime(N_{φ,φ'}) = 2 iff φ is satisfiable and φ' is unsatisfiable (see [1] for details). This completes the reduction and hence proves DP-hardness.

[Fig. 4. Structure of N_{φ,φ'}: the structure of N_φ sits above the merged node n_sep, and the structure of N_φ' below it. The figure itself is not reproduced in this extraction.]

Finally, we consider the related problem of computing the min and max time. To consider the decision variant, we rephrase this problem as checking whether an arbitrary bit of the minimum execution time is 1. Perhaps surprisingly, we obtain that this problem goes even beyond DP, the second level of the Boolean Hierarchy, and is in fact hard for Δ_2^p (the second level of the polynomial hierarchy), which contains the entire Boolean Hierarchy. Formally,

Theorem 4. Given an acyclic deterministic timed negotiation and a positive integer k, computing the k-th bit of the maximum/minimum execution time is Δ_2^p-complete.

Finally, we remark that if we were interested in the optimization variant rather than the decision variant of the problem, the above proof could be adapted to show that these variants are OptP-complete (as defined in [13]). But as optimization is not the focus of this paper, we omit the formal details of this proof.

6 Sound Negotiations

Sound negotiations are negotiations in which every run can be extended to a final run, as in Fig. 1. In this section, we show that maxtime(N) can be computed in PTIME for sound negotiations, hence giving PTIME complexities for the maxtime(N) ≤ T? and maxtime(N) = T? questions. However, we
show that mintime(N) ≤ T is NP-complete for sound negotiations, and that mintime(N) = T is DP-complete, even if T is given in unary.

Consider the graph G_N of a negotiation N. Let π = (n_0, (p_0, r_0), n_1) ··· (n_k, (p_k, r_k), n_{k+1}) be a path of G_N. We define the maximal execution time of a path π as the value δ^+(π) = Σ_{i ∈ 0..k} γ^+(n_i, r_i). We say that a path π = (n_0, (p_0, r_0), n_1) ··· (n_ℓ, (p_ℓ, r_ℓ), n_{ℓ+1}) is a path of some run ρ = (M_0, μ_0) −(n_1,r_1)→ ··· (M_k, μ_k) if r_0, ..., r_ℓ is a subword of r_1, ..., r_k.

Lemma 1. Let N be an acyclic and sound timed negotiation. Then maxtime(N) = max_{π ∈ Paths(G_N)} δ^+(π) + γ^+(n_f, r_f).

Proof. Let us first prove that maxtime(N) ≥ max_{π ∈ Paths(G_N)} δ^+(π) + γ^+(n_f, r_f). Consider any path π of G_N, ending in some node n. First, as N is sound, we can compute a run ρ_π such that π is a path of ρ_π, and ρ_π ends in a configuration in which n is enabled. We associate with ρ_π the timed run ρ_π^+ which associates with every node the latest possible execution date. We easily have δ(ρ_π^+) ≥ δ^+(π), and then we obtain max_{π ∈ Paths(G_N)} δ(ρ_π^+) ≥ max_{π ∈ Paths(G_N)} δ^+(π). As maxtime(N) is the maximal duration over all runs, it is hence necessarily greater than max_{π ∈ Paths(G_N)} δ(ρ_π^+) + γ^+(n_f, r_f).

We now prove that maxtime(N) ≤ max_{π ∈ Paths(G_N)} δ^+(π) + γ^+(n_f, r_f). Take any timed run ρ = (M_1, μ_1) −(n_1,r_1)→ ··· (M_k, μ_k) of N with a unique maximal node n_k. We show that there exists a path π of ρ such that δ(ρ) ≤ δ^+(π), by induction on the length k of ρ. The initialization is trivial for k = 1. Let k ∈ N. Because n_k is the unique maximal node of ρ, we have δ^+(ρ) = max_{p ∈ P_{n_k}} μ_{k−1}(p) + γ^+(n_k, r_k). We choose one p maximizing μ_{k−1}(p). Let ℓ < k be the maximal index of a decision involving process p (i.e., p ∈ P_{n_ℓ}).
Now, consider the timed run ρ', a subword of ρ, with n_ℓ as unique maximal node (that is, it is ρ where the nodes n_i with i > ℓ have been removed, and where those nodes n_i with i < ℓ that are not causally before n_ℓ have also been removed, in particular nodes with P_{n_i} ∩ P_{n_ℓ} = ∅). By definition, we have that δ^+(ρ) = δ^+(ρ') + γ^+(n_ℓ, r_ℓ) + γ^+(n_k, r_k). We apply the induction hypothesis on ρ', and obtain a path π' of ρ' ending in n_ℓ such that δ^+(ρ') + γ^+(n_ℓ, r_ℓ) ≤ δ^+(π'). It suffices to consider the path π = π'.(n_ℓ, (p, r_ℓ), n_k) to prove the inductive step δ^+(ρ) ≤ δ^+(π) + γ^+(n_k, r_k). Thus maxtime(N) = max_ρ δ^+(ρ) ≤ max_{π ∈ Paths(G_N)} δ^+(π) + γ^+(n_f, r_f).

Lemma 1 gives a way to evaluate the maximal execution time. This amounts to finding a path of maximal weight in an acyclic graph, which is a standard PTIME problem that can be solved using a standard max-cost computation.

Proposition 3. Computing the maximal execution time for an acyclic sound negotiation N = (N, n_0, n_f, X) can be done in time O(|N| + |X|).

A direct consequence is that the maxtime(N) ≤ T and maxtime(N) = T problems can be solved in polynomial time when N is sound. Notice that if N is deterministic but not sound, then Lemma 1 does not hold: we only have an inequality.

We now turn to mintime(N). We show that it is strictly harder to compute for sound negotiations than maxtime(N).

Theorem 5. mintime(N) ≤ T is NP-complete in the strong sense for sound acyclic negotiations, even if N is very weakly non-deterministic.

Proof (sketch). First, we can decide mintime(N) ≤ T in NP. Indeed, one can guess a final (untimed) run ρ of size ≤ |N|, consider the timed run ρ^− corresponding to ρ where all outcomes are taken at the earliest possible dates, compute δ(ρ^−) in linear time, and check that δ(ρ^−) ≤ T. The hardness part is obtained by reduction from the Bin Packing problem. The reduction is similar to the Knapsack reduction that we present in Thm. 7.
The difference is that we use ℓ bins in parallel, rather than 2 processes (one for the weight and one for the value). The hardness is thus strong, but the negotiation is not k-layered for a bounded k (it is (2ℓ+1)-bounded, with ℓ depending on the input). A detailed proof is given in Appendix B of [1].

We show that mintime(N) = T is harder to decide than mintime(N) ≤ T, with a proof similar to that of Prop. 2.

Proposition 4. The mintime(N) = T? decision problem is DP-complete for sound acyclic negotiations, even if the negotiation is very weakly non-deterministic.

An open question is whether the minimal execution time can be computed in PTIME if the negotiation is both sound and deterministic. The reduction from Bin Packing does not work with deterministic (and sound) negotiations.

7 k-Layered Negotiations

In this section, we consider k-layeredness, a syntactic property that can be efficiently verified (see Section 2).

7.1 Algorithmic properties

Let k be a fixed integer. We first show that the maximum execution time can be computed in PTIME for k-layered negotiations. Let N_i be the set of nodes at layer i. We define for every layer i the set S_i of subsets of nodes X ⊆ N_i which can be jointly enabled and such that for every process p, there is exactly one node n(X, p) in X with p ∈ P_{n(X,p)}. An element X of S_i is a subset of nodes that can be selected by resolving all non-determinism with an appropriate choice of outcomes. Formally, we define S_i inductively. We start with S_0 = {{n_0}}. We then define S_{i+1} from the contents of layer S_i: we have Y ∈ S_{i+1} iff ∪_{n ∈ Y} P_n = P and there exist X ∈ S_i and an outcome r_m ∈ R_m for every m ∈ X, such that n ∈ X(n(X, p), p, r_{n(X,p)}) for each n ∈ Y and p ∈ P_n.

Theorem 6. Let k ∈ N. Computing the maximum execution time for a k-layered acyclic negotiation N can be done in PTIME. More precisely, the worst-case time complexity is O(|P| · |N|^{k+1}).

Proof (Sketch).
The first step is to compute S_i layer by layer, following its inductive definition. The set S_i is of size at most 2^k, as |N_i| ≤ k by definition of k-layeredness. Knowing S_i, it is easy to build S_{i+1} by induction. This takes time in O(|P| · |N|^{k+1}): we need to consider all k-tuples of outcomes for each layer. There can be |N|^k such tuples. We need to do that for all processes (|P|), and for all layers (at most |N|).

We then keep, for each subset X ∈ S_i and each node n ∈ X, the maximal time f_i(X, n) ∈ N associated with n and X. From S_{i+1} and f_i, we inductively compute f_{i+1} in the following way: for all X ∈ S_i with successor Y ∈ S_{i+1} for outcomes (r_p)_{p ∈ P}, we denote f_{i+1}(Y, n, X) = max_{p ∈ P_n} f_i(X, n(X, p)) + γ^+(n(X, p), r_p). If there are several choices of (r_p)_{p ∈ P} leading to the same Y, we take the r_p with the maximal f_i(X, n(X, p)) + γ^+(n(X, p), r_p). We then define f_{i+1}(Y, n) = max_{X ∈ S_i} f_{i+1}(Y, n, X). Again, the initialization is trivial, with f_0({n_0}, n_0) = 0. The maximal execution time of N is f({n_f}, n_f).

We can bound the complexity more precisely by O(d(N) · C(N) · ||R||^{k*}), with:
– d(N) ≤ |N| the depth of n_f, that is, the number of layers of N, and ||R|| the maximum number of outcomes of a node,
– C(N) = max_i |S_i| ≤ 2^k, which we will call the number of contexts of N, and which is often much smaller than 2^k,
– k* = max_{X ∈ ∪_i S_i} |X| ≤ k. We say that N is k*-thread bounded, meaning that there cannot be more than k* nodes in the same context X of any layer. Usually, k* is strictly smaller than k = max_i |N_i|, as N_i = ∪_{X ∈ S_i} X.

Consider again the Brexit example of Figure 1. We have (k+1) = 7, while the depth is d(N) = 6, the negotiation is k* = 3-thread bounded (k* is bounded by the number of processes), ||R|| = 2, and the number of contexts is at most C(N) = 4 (EU chooses to enforce backstop or not, and Pa chooses to go to court or not).
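For comparison with the layer-based algorithm above, the standard max-cost computation used for sound negotiations (Lemma 1 and Prop. 3) is a short longest-path DP over the acyclic negotiation graph. A Python sketch, using our own edge-list encoding of G_N:

```python
# Longest-path computation behind Lemma 1 / Prop. 3: each edge
# (n, (p, r), n') of the negotiation graph weighs gamma_plus(n, r);
# the result is the heaviest path weight from n0 to nf, plus the
# weight gamma_plus(nf, rf) of the final decision.
import graphlib
from collections import defaultdict

def maxtime(nodes, edges, gamma_hi, n0, nf, rf):
    """edges: iterable of (n, r, n') triples; gamma_hi[(n, r)] is the
    upper bound of gamma(n, r). Assumes every node is reachable from n0."""
    preds = defaultdict(list)                      # n' -> [(n, r), ...]
    for n, r, m in edges:
        preds[m].append((n, r))
    topo = {n: {q for q, _ in preds[n]} for n in nodes}
    best = {n: 0 for n in nodes}                   # heaviest path n0 -> n
    for n in graphlib.TopologicalSorter(topo).static_order():
        for q, r in preds[n]:
            best[n] = max(best[n], best[q] + gamma_hi[(q, r)])
    return best[nf] + gamma_hi[(nf, rf)]

# Diamond-shaped graph: n0 -> a -> nf and n0 -> b -> nf.
nodes = ["n0", "a", "b", "nf"]
edges = [("n0", "r", "a"), ("n0", "r", "b"),
         ("a", "ra", "nf"), ("b", "rb", "nf")]
gamma_hi = {("n0", "r"): 2, ("a", "ra"): 3, ("b", "rb"): 5, ("nf", "rf"): 1}
assert maxtime(nodes, edges, gamma_hi, "n0", "nf", "rf") == 8   # 2 + 5 + 1
```

Recall that this equals maxtime(N) only under soundness; for deterministic but unsound negotiations, Lemma 1 yields only the inequality noted after Prop. 3.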
7.2 Minimal Execution Time

As with sound negotiations, computing the minimal time is much harder than computing the maximal time for k-layered negotiations:

Theorem 7. Let k ≥ 6. The Min ≤ T problem is NP-complete for k-layered acyclic negotiations, even if the negotiation is sound and very weakly non-deterministic.

Proof. One can guess in polynomial time a final run of size ≤ |N|. If the execution time of this final run is smaller than T, then we have found a final run witnessing mintime(N) ≤ T. Hence the problem is in NP.

Let us now show that the problem is NP-hard. We proceed by reduction from the Knapsack decision problem. Consider a set of items U = {u_1, ..., u_n} of respective values v_1, ..., v_n and weights w_1, ..., w_n, and a knapsack of maximal capacity W. The knapsack problem asks, given a value V, whether there exists a subset of items U' ⊆ U such that Σ_{u_i ∈ U'} v_i ≥ V and Σ_{u_i ∈ U'} w_i ≤ W.

[Fig. 5. The negotiation encoding Knapsack. The figure itself is not reproduced in this extraction.]

We build a negotiation with 2n processes P = {p_1, ..., p_2n}, as shown in Fig. 5. Intuitively, the p_i with i ≤ n will serve to encode the value of selected items as timing, while the p_i with i > n will serve to encode the weight of selected items as timing.

Concerning the timing constraints for outcomes, we do the following. Outcomes 0, yes and no are associated with [0, 0]. Outcome c_i is associated with [w_i, w_i], the weight of u_i. Last, outcome b_i is associated with a more complex interval, chosen such that Σ b_i ≤ W iff Σ v_i ≥ V. For that, we set [(v_max − v_i)·W / (n·v_max − V), v_max·W / (n·v_max − v_i)] for outcome b_i, where v_max is the largest value of an item, and V is the total value
Also, we set [ , ] for outcome a .We n·v −V n·v −v max max i set T = W , the maximal weight of the knapsack. Now, consider a final run ρ in N . The only choices in ρ are outcomes yes or no from C ,...,C . Let I be the set of indices such that yes is the outcome from 1 n all C in this path. We obtain δ(ρ) = max( a + b , c ). We have i i i i i/ ∈I i∈I i∈I δ(ρ) ≤ T = W iff w ≤ W , that is the sum of the weights is lower than i∈I (v )W (v −v )W max max i W , and + ≤ W . That is, n · v − v ≤ max i i/ ∈I n·v −V i∈I n·v −V i∈I max max n · v − V , i.e. v ≥ V . Hence, there exists a path ρ with δ(ρ) ≤ T = W max i i∈I iff there exists a set of items of weight less than W and of value more than V . It is well known that Knapsack is weakly NP-hard, that is, it is NP-hard only when weights/values are given in binary. This means that Thm. 7 shows that minimum execution time ≤ T is NP-hard only when T is given in binary. We 54 S. Akshay et al. can actually show that for k-layered negotiations, the mintime(N ) ≤ T problem can be decided in PTIME if T is given in unary (i.e. if T is not too large): Theorem 8. Let k ∈ N. Given a k-layered negotiation N and T written in unary, one can decide in PTIME whether the minimum execution time of N is ≤ T . The worst-case time complexity is O(|N | · |P|· (T ·|N|) ). Proof. We will remember for each layer i aset T of functions τ from nodes N i i of layer i to a value in {1,...,T, ⊥}. Basically, we have τ ∈T if there exists a path ρ reaching X = {n ∈ N | τ(n) = ⊥}, and this path reaches node n ∈ X after τ(n) time units. As for S , for all p, we should have a unique node n(τ, p) such that p ∈ n(τ, p) and τ(n(τ, p)) = ⊥. Again, it is easy to initialize T = {τ }, 0 0 with τ (n )=0, and τ (n)= ⊥ for all n = n . 0 0 0 0 Inductively, we build T in the following way: τ ∈T iff there exists a i+1 i+1 i+1 τ ∈T and r ∈ R for all p ∈ P such that for all n with τ (n) = ⊥,we i i p i+1 n(τ ,p) have τ (n) = max τ (n(τ ,p)) + γ(n(τ ,p),r ). 
We have that the minimum execution time for N is min_{τ ∈ T_d} τ(n_f), for d the depth of n_f. There are at most (T + 2)^k functions τ in any T_i (each of at most k nodes is mapped into {0, ..., T, ⊥}), and there are at most |N| layers to consider, giving the complexity.

As with Thm. 6, we can state the complexity more accurately as O(d(N) · C(N) · ||R||^{k*} · T^{k*−1}). The exponent k* − 1 comes from the fact that we only need to remember minimal functions τ ∈ T_i: if τ'(n) ≥ τ(n) for all n, then we do not need to keep τ' in T_i. In particular, for the knapsack encoding in the proof of Thm. 7, we have k* = 3, ||R|| = 2 and C(N) = 4. Notice that if k is part of the input, then the problem is strongly NP-hard, even if T is given in unary, as e.g. encoding bin packing with ℓ bins results in a (2ℓ+1)-layered negotiation.

8 Conclusion

In this paper, we considered timed negotiations. We believe that time is of the essence in negotiations, as exemplified by the Brexit negotiation. It is thus important to be able to compute in a tractable way the minimal and maximal execution times of negotiations. We showed that we can compute in PTIME the maximal execution time for acyclic negotiations that are either sound or k-layered, for k fixed. We showed that we cannot compute in PTIME the maximal execution time for negotiations that are neither sound nor k-layered, even if they are deterministic and acyclic (unless NP=PTIME). We also showed that, surprisingly, computing the minimal execution time is much harder, with strong NP-hardness results for most of the classes of negotiations, contradicting a claim in [10]. We came up with a new reasonable class of negotiations, namely k-layered negotiations, which enjoys a pseudo-PTIME algorithm to compute the minimal execution time. That is, the algorithm is PTIME when the timing constants are given in unary. We showed that this restriction is necessary, as the problem becomes NP-hard for constants given in binary, even when the negotiation is sound and very weakly non-deterministic.
Whether the minimal execution time can be computed in PTIME for deterministic and sound negotiations remains open.

Timed Negotiations 55

References

1. S. Akshay, B. Genest, L. Hélouët, and S. Mital. Timed Negotiations (extended version). Research report, 2020.
2. J. Desel. Reduction and Design of Well-behaved Concurrent Systems. In CONCUR '90, Theories of Concurrency: Unification and Extension, Amsterdam, The Netherlands, August 27-30, 1990, Proceedings, volume 458 of Lecture Notes in Computer Science, pages 166–181. Springer, 1990.
3. J. Desel, J. Esparza, and P. Hoffmann. Negotiation as Concurrency Primitive. Acta Inf., 56(2):93–159, 2019.
4. J. Esparza. Decidability and Complexity of Petri Net Problems - An Introduction. In Lectures on Petri Nets I: Basic Models, Advances in Petri Nets, Dagstuhl, September 1996, volume 1491 of Lecture Notes in Computer Science, pages 374–428. Springer, 1998.
5. J. Esparza and J. Desel. Free Choice Petri Nets. Cambridge University Press, 1995.
6. J. Esparza and J. Desel. On Negotiation as Concurrency Primitive. In CONCUR 2013 - Concurrency Theory - 24th International Conference, CONCUR 2013, Buenos Aires, Argentina, August 27-30, 2013, Proceedings, volume 8052 of Lecture Notes in Computer Science, pages 440–454. Springer, 2013.
7. J. Esparza and J. Desel. On Negotiation as Concurrency Primitive II: Deterministic Cyclic Negotiations. In FOSSACS'14, volume 8412 of Lecture Notes in Computer Science, pages 258–273. Springer, 2014.
8. J. Esparza and P. Hoffmann. Reduction Rules for Colored Workflow Nets. In Fundamental Approaches to Software Engineering - 19th International Conference, FASE 2016, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2016, Eindhoven, The Netherlands, April 2-8, 2016, Proceedings, volume 9633 of Lecture Notes in Computer Science, pages 342–358. Springer, 2016.
9. J. Esparza, D. Kuperberg, A. Muscholl, and I. Walukiewicz.
Soundness in Negotiations. Logical Methods in Computer Science, 14(1), 2018.
10. J. Esparza, A. Muscholl, and I. Walukiewicz. Static Analysis of Deterministic Negotiations. In 32nd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2017, Reykjavik, Iceland, June 20-23, 2017, pages 1–12, 2017.
11. S. Haddad. A Reduction Theory for Coloured Nets. In Advances in Petri Nets 1989, volume 424 of Lecture Notes in Computer Science, pages 209–235. Springer, 1990.
12. P. Hoffmann. Negotiation Games. In Javier Esparza and Enrico Tronci, editors, Proceedings Sixth International Symposium on Games, Automata, Logics and Formal Verification, GandALF 2015, Genoa, Italy, 21-22nd September 2015, volume 193 of EPTCS, pages 31–42, 2015.
13. M. W. Krentel. The Complexity of Optimization Problems. Journal of Computer and System Sciences, 36(3):490–509, 1988.
14. P. M. Merlin. A Study of the Recoverability of Computing Systems. PhD thesis, University of California, Irvine, CA, USA, 1974.
15. C. H. Papadimitriou and M. Yannakakis. The Complexity of Facets (and Some Facets of Complexity). In Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing, STOC '82, pages 255–260, New York, NY, USA, 1982. ACM.
16. R. H. Sloan and U. A. Buy. Reduction Rules for Time Petri Nets. Acta Inf., 33(7):687–706, 1996.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material.
If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Cartesian Difference Categories

Mario Alvarez-Picallo and Jean-Simon Pacaud Lemay

Department of Computer Science, University of Oxford, Oxford, UK

Abstract. Cartesian differential categories are categories equipped with a differential combinator which axiomatizes the directional derivative. Important models of Cartesian differential categories include classical differential calculus of smooth functions and categorical models of the differential λ-calculus. However, Cartesian differential categories cannot account for other interesting notions of differentiation, such as the calculus of finite differences or the Boolean differential calculus. On the other hand, change action models have been shown to capture these examples as well as more “exotic” examples of differentiation. However, change action models are very general and do not share the nice properties of a Cartesian differential category. In this paper, we introduce Cartesian difference categories as a bridge between Cartesian differential categories and change action models. We show that every Cartesian differential category is a Cartesian difference category, and how certain well-behaved change action models are Cartesian difference categories. In particular, Cartesian difference categories model both the differential calculus of smooth functions and the calculus of finite differences. Furthermore, every Cartesian difference category comes equipped with a tangent bundle monad whose Kleisli category is again a Cartesian difference category.

Keywords: Cartesian Difference Categories · Cartesian Differential Categories · Change Actions · Calculus of Finite Differences · Stream Calculus.
1 Introduction

In the early 2000s, Ehrhard and Regnier introduced the differential λ-calculus [10], an extension of the λ-calculus equipped with a differential combinator capable of taking the derivative of arbitrary higher-order functions. This development, based on models of linear logic equipped with a natural notion of “derivative” [11], sparked a wave of research into categorical models of differentiation. One of the most notable developments in the area is the introduction of Cartesian differential categories [4] by Blute, Cockett and Seely, which provide an abstract categorical axiomatization of the directional derivative from differential calculus.

The second author is financially supported by Kellogg College, the Oxford-Google DeepMind Graduate Scholarship, and the Clarendon Fund.

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 57–76, 2020.

The relevance of Cartesian differential categories lies in their ability to model both “classical” differential calculus (with the canonical example being the category of Euclidean spaces and smooth functions between them) and the differential λ-calculus (as every categorical model for it gives rise to a Cartesian differential category [14]). However, while Cartesian differential categories have proven to be an immensely successful formalism, they have, by design, some limitations. Firstly, they cannot account for certain “exotic” notions of derivative, such as the difference operator from the calculus of finite differences [16] or the Boolean differential calculus [19]. This is because the axioms of a Cartesian differential category stipulate that derivatives should be linear in their second argument (in the same way that the directional derivative is), whereas these aforementioned discrete sorts of derivative need not be.
Additionally, every Cartesian differential category is equipped with a tangent bundle monad [7, 15] whose Kleisli category can be intuitively understood as a category of generalized vector fields. This Kleisli category has an obvious differentiation operator which comes close to making it a Cartesian differential category, but again fails the requirement of being linear in its second argument.

More recently, discrete derivatives have been suggested as a semantic framework for understanding incremental computation. This led to the development of change structures [6] and change actions [2]. Change action models have been successfully used to provide a model for incrementalizing Datalog programs [1], but have also been shown to model the calculus of finite differences as well as the Kleisli category of the tangent bundle monad of a Cartesian differential category. Change action models, however, are very general, lacking many of the nice properties of Cartesian differential categories (for example, addition in a change action model is not required to be commutative, even though it is commutative in most change action models). As a consequence of this generality, the tangent bundle endofunctor in a change action model can fail to be a monad.

In this work, we introduce Cartesian difference categories (Section 4.2), whose key ingredients are an infinitesimal extension operator and a difference combinator, whose axioms are a generalization of the differential combinator axioms of a Cartesian differential category. In Section 4.3, we show that every Cartesian differential category is, in fact, a Cartesian difference category whose infinitesimal extension operator is zero, and conversely how every Cartesian difference category admits a full subcategory which is a Cartesian differential category.
In Section 4.4, we show that every Cartesian difference category is a change action model, and conversely how a full subcategory of suitably well-behaved objects of a change action model is a Cartesian difference category. In Section 6, we show that every Cartesian difference category comes equipped with a monad whose Kleisli category is again a Cartesian difference category. Finally, in Section 5 we provide some examples of Cartesian difference categories; notably, the calculus of finite differences and the stream calculus.

2 Cartesian Differential Categories

In this section, we briefly review Cartesian differential categories, so that the reader may compare Cartesian differential categories with the new notion of Cartesian difference categories which we introduce in the next section. For a full detailed introduction to Cartesian differential categories, we refer the reader to the original paper [4].

2.1 Cartesian Left Additive Categories

Here we recall the definition of Cartesian left additive categories [4], where “additive” means skew enriched over commutative monoids; in particular, we do not assume the existence of additive inverses, i.e., “negative elements”. By a Cartesian category we mean a category X with chosen finite products, where we denote the binary product of objects A and B by A × B with projection maps π₀ : A × B → A and π₁ : A × B → B and pairing operation ⟨−, −⟩, and the chosen terminal object as ⊤ with unique terminal maps !_A : A → ⊤.

Definition 1. A left additive category [4] is a category X such that each hom-set X(A, B) is a commutative monoid with addition operation + : X(A, B) × X(A, B) → X(A, B) and zero element (called the zero map) 0 ∈ X(A, B), such that pre-composition preserves the additive structure: (f + g) ∘ h = f ∘ h + g ∘ h and 0 ∘ f = 0.
A map k in a left additive category is additive if post-composition by k preserves the additive structure: k ∘ (f + g) = k ∘ f + k ∘ g and k ∘ 0 = 0. A Cartesian left additive category [4] is a Cartesian category X which is also a left additive category such that all projection maps π₀ : A × B → A and π₁ : A × B → B are additive.

We note that the definition given here of a Cartesian left additive category is slightly different from the one found in [4], but it is indeed equivalent. By [4, Proposition 1.2.2], an equivalent axiomatization of a Cartesian left additive category is that of a Cartesian category where every object comes equipped with a commutative monoid structure such that the projection maps are monoid morphisms. This will be important later in Section 4.2.

2.2 Cartesian Differential Categories

Definition 2. A Cartesian differential category [4] is a Cartesian left additive category equipped with a differential combinator D sending each map f : A → B to a map D[f] : A × A → B, verifying the following coherence conditions:

[CD.1] D[f + g] = D[f] + D[g] and D[0] = 0
[CD.2] D[f] ∘ ⟨x, y + z⟩ = D[f] ∘ ⟨x, y⟩ + D[f] ∘ ⟨x, z⟩ and D[f] ∘ ⟨x, 0⟩ = 0
[CD.3] D[1_A] = π₁ and D[π₀] = π₀ ∘ π₁ and D[π₁] = π₁ ∘ π₁
[CD.4] D[⟨f, g⟩] = ⟨D[f], D[g]⟩ and D[!_A] = !_{A×A}
[CD.5] D[g ∘ f] = D[g] ∘ ⟨f ∘ π₀, D[f]⟩
[CD.6] D[D[f]] ∘ ⟨⟨x, y⟩, ⟨0, z⟩⟩ = D[f] ∘ ⟨x, z⟩
[CD.7] D[D[f]] ∘ ⟨⟨x, y⟩, ⟨z, 0⟩⟩ = D[D[f]] ∘ ⟨⟨x, z⟩, ⟨y, 0⟩⟩

Note that here, following the more recent work on Cartesian differential categories, we have flipped the convention found in [4], so that the linear argument is the second argument rather than the first. We highlight that by [7, Proposition 4.2], the last two axioms [CD.6] and [CD.7] have an equivalent alternative expression.

Lemma 1.
In the presence of the other axioms, [CD.6] and [CD.7] are equivalent to:

[CD.6.a] D[D[f]] ∘ ⟨⟨x, 0⟩, ⟨0, y⟩⟩ = D[f] ∘ ⟨x, y⟩
[CD.7.a] D[D[f]] ∘ ⟨⟨x, y⟩, ⟨z, w⟩⟩ = D[D[f]] ∘ ⟨⟨x, z⟩, ⟨y, w⟩⟩

As a Cartesian difference category is a generalization of a Cartesian differential category, we leave the discussion of the intuition behind these axioms for Section 4.2 below. We also refer to [4, Section 4] for a term calculus which may help better understand the axioms of a Cartesian differential category. The canonical example of a Cartesian differential category is the category of real smooth functions, which we will discuss in Section 5.1. Other interesting examples can be found throughout the literature, such as categorical models of the differential λ-calculus [10, 14], the subcategory of differential objects of a tangent category [7], and the coKleisli category of a differential category [3, 4].

3 Change Action Models

Change actions [1, 2] have recently been proposed as a setting for reasoning about higher-order incremental computation, based on a discrete notion of differentiation. Together with Cartesian differential categories, they provide the core ideas behind Cartesian difference categories. In this section, we quickly review change actions and change action models, in particular to highlight where some of the axioms of a Cartesian difference category come from. For more details on change actions, we invite readers to see the original paper [2].

3.1 Change Actions

Definition 3. A change action A in a Cartesian category X is a quintuple A ≡ (A, ΔA, ⊕_A, +_A, 0_A) consisting of two objects A and ΔA, and three maps:

⊕_A : A × ΔA → A,  +_A : ΔA × ΔA → ΔA,  0_A : ⊤ → ΔA

such that (ΔA, +_A, 0_A) is a monoid and ⊕_A : A × ΔA → A is an action of ΔA on A, that is, the following equalities hold: ⊕_A ∘ ⟨1_A, 0_A ∘ !_A⟩ = 1_A
and ⊕_A ∘ (1_A × +_A) = ⊕_A ∘ (⊕_A × 1_{ΔA}).

For a change action A and a pair of maps f : C → A and g : C → ΔA, we define f ⊕_A g : C → A as f ⊕_A g = ⊕_A ∘ ⟨f, g⟩. Similarly, for maps h : C → ΔA and k : C → ΔA, define h +_A k = +_A ∘ ⟨h, k⟩. Therefore, that ⊕_A is an action of ΔA on A can be rewritten as:

1_A ⊕_A 0_A = 1_A    1_A ⊕_A (1_{ΔA} +_A 1_{ΔA}) = (1_A ⊕_A 1_{ΔA}) ⊕_A 1_{ΔA}

The intuition behind the above definition is that the monoid ΔA is a type of possible “changes” or “updates” that might be applied to A, with the monoid structure on ΔA representing the capability to compose updates. Change actions give rise to a notion of derivative, with a distinctly “discrete” flavour. Given change actions on objects A and B, a map f : A → B can be said to be differentiable when changes to the input (in the sense of elements of ΔA) are mapped to changes to the output (that is, elements of ΔB). In the setting of incremental computation, this is precisely what it means for f to be incrementalizable, with the derivative of f corresponding to an incremental version of f.

Definition 4. Let A ≡ (A, ΔA, ⊕_A, +_A, 0_A) and B ≡ (B, ΔB, ⊕_B, +_B, 0_B) be change actions. For a map f : A → B, a map ∂[f] : A × ΔA → ΔB is a derivative of f whenever the following equalities hold:

[CAD.1] f ∘ (x ⊕_A y) = (f ∘ x) ⊕_B (∂[f] ∘ ⟨x, y⟩)
[CAD.2] ∂[f] ∘ ⟨x, y +_A z⟩ = (∂[f] ∘ ⟨x, y⟩) +_B (∂[f] ∘ ⟨x ⊕_A y, z⟩) and ∂[f] ∘ ⟨x, 0_A ∘ !⟩ = 0_B ∘ !_{A×ΔA}

The intuition for these axioms will be explained in more detail in Section 4.2, when we explain the axioms of a Cartesian difference category. Note that although there is nothing in the above definition guaranteeing that any given map has at most a single derivative, the chain rule does hold. As a corollary, differentiation is compositional and therefore the change actions in X form a category.

Lemma 2. Whenever ∂[f] and ∂[g] are derivatives for composable maps f and g respectively, then ∂[g] ∘ ⟨f ∘ π₀, ∂[f]⟩ is a derivative for g ∘ f.
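To make Definition 4 and Lemma 2 concrete, here is a toy change action (our own illustration, not an example from the paper): the integers with ΔA = A and x ⊕ δ = x + δ. For any function f, the finite difference ∂f(x, δ) = f(x + δ) − f(x) is a derivative, [CAD.1] and [CAD.2] hold by telescoping, and derivatives compose by the chain rule:

```python
# Toy change action on the integers: ΔA = A and x ⊕ δ = x + δ.
def derivative(f):
    """∂f(x, δ) = f(x + δ) - f(x): a derivative of f for this change action."""
    return lambda x, d: f(x + d) - f(x)

f = lambda x: x * x * x - 2 * x
g = lambda x: x * x + 1
df, dg = derivative(f), derivative(g)

for x in range(-3, 4):
    assert df(x, 0) == 0                                  # ∂f(x, 0) = 0
    for y in range(-3, 4):
        # [CAD.1]: f(x ⊕ y) = f(x) ⊕ ∂f(x, y)
        assert f(x + y) == f(x) + df(x, y)
        for z in range(-3, 4):
            # [CAD.2]: ∂f(x, y + z) = ∂f(x, y) + ∂f(x ⊕ y, z)
            assert df(x, y + z) == df(x, y) + df(x + y, z)
        # Lemma 2 (chain rule): ∂(g∘f)(x, y) = ∂g(f(x), ∂f(x, y))
        assert derivative(lambda t: g(f(t)))(x, y) == dg(f(x), df(x, y))
```

Note that no property of f or g is used here: in this model every map is differentiable, which is exactly the “discrete flavour” of change action derivatives.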
3.2 Change Action Models

Definition 5. Given a Cartesian category X, define its change actions category CAct(X) as the category whose objects are change actions in X and whose arrows f : A → B are the pairs (f, ∂[f]), where f : A → B is an arrow in X and ∂[f] : A × ΔA → ΔB is a derivative for f. The identity is (1_A, π₁), while the composition of (f, ∂[f]) and (g, ∂[g]) is (g ∘ f, ∂[g] ∘ ⟨f ∘ π₀, ∂[f]⟩).

There is an obvious product-preserving forgetful functor E : CAct(X) → X sending every change action (A, ΔA, ⊕, +, 0) to its base object A and every map (f, ∂[f]) to the underlying map f. As a setting for studying differentiation, the category CAct(X) is rather lacklustre, since there is no notion of higher derivatives, so we will instead work with change action models. Informally, a change action model consists of a rule which, for every object A of X, associates a change action over it, and for every map a choice of a derivative.

Definition 6. A change action model on a Cartesian category X is a product-preserving functor α : X → CAct(X) that is a section of the forgetful functor E.

For brevity, when A is an object of a change action model, we will write ΔA, ⊕_A, +_A, and 0_A to refer to the components of the corresponding change action α(A). Examples of change action models can be found in [2]. In particular, we highlight that a Cartesian differential category always provides a change action model. We will generalize this result, and show in Section 4.4 that a Cartesian difference category also always provides a change action model.

4 Cartesian Difference Categories

In this section, we introduce Cartesian difference categories, which are generalizations of Cartesian differential categories. Examples of Cartesian difference categories can be found in Section 5.
4.1 Infinitesimal Extensions in Left Additive Categories

We first introduce infinitesimal extensions, an operator that turns a map into an “infinitesimal” version of itself, in the sense that every map coincides with its Taylor approximation on infinitesimal elements.

Definition 7. A Cartesian left additive category X is said to have an infinitesimal extension ε if every hom-set X(A, B) comes equipped with a monoid morphism ε : X(A, B) → X(A, B), that is, ε(f + g) = ε(f) + ε(g) and ε(0) = 0, and such that ε(g ∘ f) = ε(g) ∘ f and ε(π₀) = π₀ ∘ ε(1_{A×B}) and ε(π₁) = π₁ ∘ ε(1_{A×B}).

Note that since ε(g ∘ f) = ε(g) ∘ f, it follows that ε(f) = ε(1_B) ∘ f and that ε(1_A) : A → A is an additive map (Definition 1). In light of this, it turns out that infinitesimal extensions can equivalently be described as a class of additive maps ε_A : A → A such that ε_{A×B} = ε_A × ε_B. The equivalence is given by setting ε(f) = ε_B ∘ f and ε_A = ε(1_A). Furthermore, infinitesimal extensions equip each object with a canonical change action structure:

Lemma 3. Let X be a Cartesian left additive category with infinitesimal extension ε. For every object A, define the maps ⊕_A : A × A → A as ⊕_A = π₀ + ε(π₁), +_A : A × A → A as π₀ + π₁, and 0_A : ⊤ → A as 0_A = 0. Then (A, A, ⊕_A, +_A, 0_A) is a change action in X.

Proof. As mentioned earlier, that (A, +_A, 0_A) is a commutative monoid was shown in [4]. On the other hand, that ⊕_A is a change action follows from the fact that ε preserves the addition. □

Setting A ≡ (A, A, ⊕_A, +_A, 0_A), we note that f ⊕_A g = f + ε(g) and f +_A g = f + g, and so in particular +_A = +. Therefore, from now on we will omit the subscripts and simply write ⊕ and +. For every Cartesian left additive category, there are always at least two possible infinitesimal extensions:

Lemma 4. For any Cartesian left additive category X,

1.
Setting ε(f) = 0 defines an infinitesimal extension on X; in this case, ⊕_A = π₀ and f ⊕ g = f.
2. Setting ε(f) = f defines an infinitesimal extension on X; in this case, ⊕_A = + and f ⊕ g = f + g.

We note that while these examples of infinitesimal extensions may seem trivial, they are both very important, as they will give rise to key examples of Cartesian difference categories.

4.2 Cartesian Difference Categories

Definition 8. A Cartesian difference category is a Cartesian left additive category with an infinitesimal extension ε which is equipped with a difference combinator ∂ sending each map f : A → B to a map ∂[f] : A × A → B, verifying the following coherence conditions:

[C∂.0] f ∘ (x + ε(y)) = f ∘ x + ε(∂[f] ∘ ⟨x, y⟩)
[C∂.1] ∂[f + g] = ∂[f] + ∂[g], ∂[0] = 0, and ∂[ε(f)] = ε(∂[f])
[C∂.2] ∂[f] ∘ ⟨x, y + z⟩ = ∂[f] ∘ ⟨x, y⟩ + ∂[f] ∘ ⟨x + ε(y), z⟩ and ∂[f] ∘ ⟨x, 0⟩ = 0
[C∂.3] ∂[1_A] = π₁ and ∂[π₀] = π₀ ∘ π₁ and ∂[π₁] = π₁ ∘ π₁
[C∂.4] ∂[⟨f, g⟩] = ⟨∂[f], ∂[g]⟩ and ∂[!_A] = !_{A×A}
[C∂.5] ∂[g ∘ f] = ∂[g] ∘ ⟨f ∘ π₀, ∂[f]⟩
[C∂.6] ∂[∂[f]] ∘ ⟨⟨x, y⟩, ⟨0, z⟩⟩ = ∂[f] ∘ ⟨x + ε(y), z⟩
[C∂.7] ∂[∂[f]] ∘ ⟨⟨x, y⟩, ⟨z, 0⟩⟩ = ∂[∂[f]] ∘ ⟨⟨x, z⟩, ⟨y, 0⟩⟩

Before giving some intuition on the axioms [C∂.0] to [C∂.7], we first observe that one could have used change action notation to express [C∂.0], [C∂.2], and [C∂.6], which would then be written as:

[C∂.0] f ∘ (x ⊕ y) = (f ∘ x) ⊕ (∂[f] ∘ ⟨x, y⟩)
[C∂.2] ∂[f] ∘ ⟨x, y + z⟩ = ∂[f] ∘ ⟨x, y⟩ + ∂[f] ∘ ⟨x ⊕ y, z⟩ and ∂[f] ∘ ⟨x, 0⟩ = 0
[C∂.6] ∂[∂[f]] ∘ ⟨⟨x, y⟩, ⟨0, z⟩⟩ = ∂[f] ∘ ⟨x ⊕ y, z⟩

And also, just like for Cartesian differential categories, [C∂.6] and [C∂.7] have alternative equivalent expressions.

Lemma 5. In the presence of the other axioms, [C∂.6] and [C∂.7] are equivalent to:

[C∂.6.a] ∂[∂[f]] ∘ ⟨⟨x, 0⟩, ⟨0, y⟩⟩ = ∂[f] ∘ ⟨x, y⟩
[C∂.7.a] ∂[∂[f]] ∘ ⟨⟨x, y⟩, ⟨z, w⟩⟩ = ∂[∂[f]] ∘ ⟨⟨x, z⟩, ⟨y, w⟩⟩

Proof. The proof is essentially the same as [7, Proposition 4.2]. □
The keen-eyed reader will notice that the axioms of a Cartesian difference category are very similar to the axioms of a Cartesian differential category. Indeed, [C∂.1], [C∂.3], [C∂.4], [C∂.5], and [C∂.7] are the same as their Cartesian differential category counterparts. The axioms which are different are [C∂.2] and [C∂.6], where the infinitesimal extension ε is now included, and there is also the new extra axiom [C∂.0]. On the other hand, interestingly enough, [C∂.6.a] is the same as [CD.6.a]. We also point out that, writing out [C∂.0] and [C∂.2] using change action notation, we see that these axioms are precisely [CAD.1] and [CAD.2] respectively.

To better understand [C∂.0] to [C∂.7], it may be useful to write them out using element-like notation. In element-like notation, [C∂.0] is written as:

f(x + ε(y)) = f(x) + ε(∂[f](x, y))

This condition can be read as a generalization of the Kock-Lawvere axiom that characterizes the derivative in synthetic differential geometry [13]. Broadly speaking, the Kock-Lawvere axiom states that, for any map f : R → R and any x ∈ R and d ∈ D, there exists a unique f′(x) ∈ R verifying f(x + d) = f(x) + d · f′(x), where D is the subset of R consisting of infinitesimal elements. It is by analogy with the Kock-Lawvere axiom that we refer to ε as an “infinitesimal extension”, as it can be thought of as embedding the space A into a subspace ε(A) of infinitesimal elements.

[C∂.1] states that the differential of a sum of maps is the sum of differentials, and similarly for zero maps and the infinitesimal extension of a map. [C∂.2] is the first crucial difference between a Cartesian difference category and a Cartesian differential category. In a Cartesian differential category, the differential of a map is assumed to be additive in its second argument. In a Cartesian difference category, just as for derivatives of change actions, while the differential is still required to preserve zeros in its second argument, it is only additive “up to a small perturbation”, that is:

∂[f](x, y + z) = ∂[f](x, y) + ∂[f](x + ε(y), z)

[C∂.3] tells us what the differentials of the identity and projection maps are, while [C∂.4] says that the differential of a pairing of maps is the pairing of their differentials. [C∂.5] is the chain rule, which expresses the differential of a composition of maps:

∂[g ∘ f](x, y) = ∂[g](f(x), ∂[f](x, y))

[C∂.6] and [C∂.7] tell us how to work with second-order differentials. [C∂.6] is expressed as follows:

∂[∂[f]]((x, y), (0, z)) = ∂[f](x + ε(y), z)

and finally [C∂.7] is expressed as:

∂[∂[f]]((x, y), (z, 0)) = ∂[∂[f]]((x, z), (y, 0))

It is interesting to note that while [C∂.6] is different from [CD.6], its alternative version [C∂.6.a] is the same as [CD.6.a]:

∂[∂[f]]((x, 0), (0, y)) = ∂[f](x, y)

4.3 Another Look at Cartesian Differential Categories

Here we explain how a Cartesian differential category is a Cartesian difference category where the infinitesimal extension is given by zero.

Proposition 1. Every Cartesian differential category X with differential combinator D is a Cartesian difference category where the infinitesimal extension is defined as ε(f) = 0 and the difference combinator is defined to be the differential combinator, ∂ = D.

Proof. As noted before, the first two parts of [C∂.1], the second part of [C∂.2], [C∂.3], [C∂.4], [C∂.5], and [C∂.7] are precisely the same as their Cartesian differential axiom counterparts. On the other hand, since ε(f) = 0, [C∂.0] and the third part of [C∂.1] trivially state that 0 = 0, while the first part of [C∂.2] and [C∂.6] end up being precisely the first part of [CD.2] and [CD.6].
Therefore, the differential combinator satisfies the Cartesian difference axioms, and we conclude that a Cartesian differential category is a Cartesian difference category. □

Conversely, one can always build a Cartesian differential category from a Cartesian difference category by considering the objects for which the infinitesimal extension is the zero map.

Proposition 2. For a Cartesian difference category X with infinitesimal extension ε and difference combinator ∂, the full subcategory X₀ of objects A such that ε(1_A) = 0 is a Cartesian differential category where the differential combinator is defined to be the difference combinator, D = ∂.

Proof. First note that if ε(1_A) = 0 and ε(1_B) = 0, then by definition it also follows that ε(1_{A×B}) = 0, and also that ε(1_⊤) = 0 by uniqueness of maps into the terminal object. Thus X₀ is closed under finite products and is therefore a Cartesian left additive category. Furthermore, we again note that since ε(f) = 0, for maps between such objects the Cartesian difference axioms are precisely the Cartesian differential axioms. Therefore, the difference combinator is a differential combinator for this subcategory, and so X₀ is a Cartesian differential category. □

In any Cartesian difference category X, the terminal object ⊤ always satisfies ε(1_⊤) = 0, and so X₀ is never empty. On the other hand, applying Proposition 2 to a Cartesian differential category results in the entire category. It is also important to note that the above two propositions do not imply that if a difference combinator is a differential combinator then the infinitesimal extension must be zero. In Section 5.3, we provide an example of a Cartesian differential category that comes equipped with a non-zero infinitesimal extension such that the differential combinator is a difference combinator with respect to this non-zero infinitesimal extension.
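For a concrete feel for a difference combinator whose infinitesimal extension is not zero, consider the calculus of finite differences (treated in Section 5) on the integers, with ε = id and ∂[f](x, y) = f(x + y) − f(x). The following check is our own illustration, not code from the paper; it verifies the element-wise forms of [C∂.0], [C∂.2] and [C∂.6] on a sample function:

```python
# Calculus of finite differences on the integers: ε = id and
# ∂[f](x, y) = f(x + y) - f(x).
def diff(f):
    return lambda x, y: f(x + y) - f(x)

def diff2(g):
    # ∂ of a map g : A × A -> B, using the pairwise difference structure
    return lambda x, y, u, v: g(x + u, y + v) - g(x, y)

f = lambda x: x * x
df, d2f = diff(f), diff2(diff(f))

for x in range(-3, 4):
    assert df(x, 0) == 0
    for y in range(-3, 4):
        # [C∂.0] with ε = id: f(x + ε(y)) = f(x) + ε(∂[f](x, y))
        assert f(x + y) == f(x) + df(x, y)
        for z in range(-3, 4):
            # [C∂.2]: ∂[f](x, y + z) = ∂[f](x, y) + ∂[f](x + ε(y), z)
            assert df(x, y + z) == df(x, y) + df(x + y, z)
            # [C∂.6]: ∂[∂[f]]((x, y), (0, z)) = ∂[f](x + ε(y), z)
            assert d2f(x, y, 0, z) == df(x + y, z)
```

Here ∂ is a difference combinator but not a differential combinator: ∂[f](x, y) = 2xy + y² is visibly not additive in y, only additive up to the perturbation required by [C∂.2].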
4.4 Cartesian Difference Categories as Change Action Models

In this section, we show how every Cartesian difference category is a particularly well-behaved change action model, and conversely how every change action model contains a Cartesian difference category.

Proposition 3. Let X be a Cartesian difference category with infinitesimal extension ε and difference combinator ∂. Define the functor α : X → CAct(X) as α(A) = (A, A, ⊕_A, +_A, 0_A) (as defined in Lemma 3) and α(f) = (f, ∂[f]). Then (X, α : X → CAct(X)) is a change action model.

Proof. By Lemma 3, (A, A, ⊕_A, +_A, 0_A) is a change action, and so α is well-defined on objects. For a map f, ∂[f] is a derivative of f in the change action sense, since [C∂.0] and [C∂.2] are precisely [CAD.1] and [CAD.2], and so α is well-defined on maps. That α preserves identities and composition follows from [C∂.3] and [C∂.5] respectively, and so α is a functor. That α preserves finite products follows from [C∂.3] and [C∂.4]. Lastly, it is clear that α is a section of the forgetful functor, and therefore we conclude that (X, α) is a change action model. □

It is clear that not every change action model is a Cartesian difference category. For example, change action models do not require the addition to be commutative. On the other hand, it can be shown that every change action model contains a Cartesian difference category as a full subcategory.

Definition 9. Let (X, α : X → CAct(X)) be a change action model. An object A is flat whenever the following hold:

[F.1] ΔA = A
[F.2] α(⊕_A) = (⊕_A, ⊕_A ∘ π₁)
[F.3] 0_A ⊕ (0_A ⊕ f) = 0_A ⊕ f for any f : U → A
[F.4] ⊕_A is right-injective, that is, if ⊕_A ∘ ⟨f, g⟩ = ⊕_A ∘ ⟨f, h⟩ then g = h

We would like to show that for any change action model (X, α), its full subcategory of flat objects Flat_α is a Cartesian difference category.
Starting with the finite product structure, since α preserves finite products, it is straightforward to see that ⊤ is flat and that if A and B are flat then so is A × B. The sum of maps f : A → B and g : A → B in Flat_α is defined using the change action structure as f + g, while the zero map 0 : A → B is 0 = 0_B ∘ !_A. And so we obtain that:

Lemma 6. Flat_α is a Cartesian left additive category.

Proof. Most of the Cartesian left additive structure is straightforward. However, since the addition is not required to be commutative for arbitrary change actions, we show that the addition is commutative for flat objects. Using that ⊕_B is an action, that by [F.2] we have that ⊕_B ∘ π₁ is a derivative for ⊕_B, and [CAD.1], we obtain that:

0_B ⊕_B (f + g) = (0_B ⊕_B f) ⊕_B g = (0_B ⊕_B g) ⊕_B f = 0_B ⊕_B (g + f)

By [F.4], ⊕_B is right-injective, and we conclude that f + g = g + f. □

We note that for any change action model (X, α), since the terminal object is always flat, Flat_α is never empty. We use the action of the change action structure to define the infinitesimal extension. So for a map f : A → B in Flat_α, define ε(f) : A → B as follows:

ε(f) = ⊕_B ∘ ⟨0_B ∘ !_A, f⟩ = 0_B ⊕_B f

Lemma 7. ε is an infinitesimal extension for Flat_α.

Proof. We show that ε preserves the addition. Following the same idea as in the proof of Lemma 6, we obtain the following:

0_B ⊕_B ε(f + g) = 0_B ⊕_B (0_B ⊕_B (f + g)) = (0_B ⊕_B 0_B) ⊕_B ((0_B ⊕_B f) ⊕_B g) = (0_B ⊕_B (0_B ⊕_B f)) ⊕_B (0_B ⊕_B g) = (0_B ⊕_B ε(f)) ⊕_B ε(g) = 0_B ⊕_B (ε(f) + ε(g))

Then, by right-injectivity [F.4], it follows that ε(f + g) = ε(f) + ε(g). The remaining infinitesimal extension axioms are proven in a similar fashion. □

Lastly, the difference combinator for Flat_α is defined in the obvious way, that is, ∂[f] is defined as the second component of α(f).

Proposition 4. Let (X, α : X → CAct(X)) be a change action model. Then Flat_α is a Cartesian difference category.

Proof (Sketch).
The full calculations will appear in an upcoming extended journal version of this paper, but we give an informal explanation. [C∂.0] and [C∂.2] are straightforward consequences of [CAD.1] and [CAD.2]. [C∂.3] and [C∂.4] follow trivially from the fact that α preserves finite products and from the structure of products in CAct(X), while [C∂.5] follows from composition in CAct(X). [C∂.1], [C∂.6] and [C∂.7] are obtained by mechanical calculation in the spirit of Lemma 6. Note that every axiom except for [C∂.6] can be proven without using [F.3].

4.5 Linear Maps and ε-Linear Maps

An important subclass of maps in a Cartesian differential category is the subclass of linear maps [4, Definition 2.2.1]. One can also define linear maps in a Cartesian difference category by using the same definition.

Definition 10. In a Cartesian difference category, a map f is linear if the following equality holds: ∂[f] = f ∘ π_1. Using element-like notation, a map f is linear if ∂[f](x, y) = f(y).

Linear maps in a Cartesian difference category satisfy many of the same properties found in [4, Lemma 2.2.2].

Lemma 8. In a Cartesian difference category,
1. If f : A → B is linear then ε(f) = f ∘ ε(1_A);
2. If f : A → B is linear, then f is additive (Definition 1);
3. Identity maps, projection maps, and zero maps are linear;
4. The composite, sum, and pairing of linear maps is linear;
5. If f : A → B and k : C → D are linear, then for any map g : B → C, the following equality holds: ∂[k ∘ g ∘ f] = k ∘ ∂[g] ∘ (f × f);
6. If an isomorphism is linear, then its inverse is linear;
7. For any object A, ⊕_A and +_A are linear.

Using element-like notation, the first point of the above lemma says that if f is linear then f(ε(x)) = ε(f(x)). And while all linear maps are additive, the converse is not necessarily true, see [4, Corollary 2.3.4].
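Lemma 8 lends itself to spot-checking in the calculus of finite differences (treated in Section 5.2), where ∂[f](x, y) = f(x + y) − f(x) and the linear maps are exactly the additive ones. A sketch of ours; the helper names are not from the paper:

```python
# Difference combinator of the calculus of finite differences (Section 5.2):
# ∂[f](x, y) = f(x + y) − f(x), here on integer-valued functions.
def d(f):
    return lambda x, y: f(x + y) - f(x)

f = lambda x: 2 * x      # additive, hence linear: ∂[f](x, y) = f(y)
k = lambda x: 5 * x      # additive, hence linear
g = lambda x: x * x - 7  # an arbitrary non-additive map

pts = range(-6, 7)

# Linearity test (Definition 10): ∂[f](x, y) = f(y) for linear f ...
assert all(d(f)(x, y) == f(y) for x in pts for y in pts)
# ... but not for the non-additive g.
assert any(d(g)(x, y) != g(y) for x in pts for y in pts)

# Lemma 8, point 5: ∂[k ∘ g ∘ f] = k ∘ ∂[g] ∘ (f × f) for linear f and k.
lhs = d(lambda x: k(g(f(x))))
rhs = lambda x, y: k(d(g)(f(x), f(y)))
assert all(lhs(x, y) == rhs(x, y) for x in pts for y in pts)
```

Point 5 holds here because additivity of f lets the sum slide through: f(x + y) = f(x) + f(y).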
However, an immediate consequence of the above lemma is that the subcategory of linear maps of a Cartesian difference category has finite biproducts. Another interesting subclass of maps is the subclass of ε-linear maps, which are maps whose infinitesimal extension is linear.

Definition 11. In a Cartesian difference category, a map f is ε-linear if ε(f) is linear.

Lemma 9. In a Cartesian difference category,
1. If f : A → B is ε-linear then f ∘ (x + ε(y)) = f ∘ x + ε(f) ∘ y;
2. Every linear map is ε-linear;
3. The composite, sum, and pairing of ε-linear maps is ε-linear;
4. If an isomorphism is ε-linear, then its inverse is again ε-linear.

Using element-like notation, the first point of the above lemma says that if f is ε-linear then f(x + ε(y)) = f(x) + ε(f(y)). So ε-linear maps are additive on "infinitesimal" elements (i.e. those of the form ε(y)). For a Cartesian differential category, linear maps in the Cartesian difference category sense are precisely linear maps in the Cartesian differential category sense [4, Definition 2.2.1], while every map is ε-linear since ε = 0.

5 Examples of Cartesian Difference Categories

5.1 Smooth Functions

Every Cartesian differential category is a Cartesian difference category where the infinitesimal extension is zero. As a particular example, we consider the category of real smooth functions, which, as mentioned above, can be considered to be the canonical (and motivating) example of a Cartesian differential category.

Let R be the set of real numbers and let SMOOTH be the category whose objects are Euclidean spaces R^n (including the point R^0 = {∗}), and whose maps are smooth functions F : R^n → R^m. SMOOTH is a Cartesian left additive category where the product structure is given by the standard Cartesian product of Euclidean spaces and where the additive structure is defined by point-wise addition, (F + G)(x) = F(x) + G(x) and 0(x) = (0, ..., 0), where x ∈ R^n.
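The differential combinator of SMOOTH, introduced next, is the directional derivative. It can be approximated numerically; this sketch is our own illustration, using a central-difference approximation and example maps of our choosing:

```python
# Numeric sketch of the directional derivative D[F](x, y) = (d/dt) F(x + t·y)
# at t = 0, approximated by a central difference (an illustration of ours,
# not the paper's construction).
def D(F, h=1e-6):
    def dF(x, y):
        plus  = F([xi + h * yi for xi, yi in zip(x, y)])
        minus = F([xi - h * yi for xi, yi in zip(x, y)])
        return [(p - m) / (2 * h) for p, m in zip(plus, minus)]
    return dF

# A smooth map F : R^2 → R^2, F(x) = (x1^2, x1·x2).
F = lambda x: [x[0] ** 2, x[0] * x[1]]

x, y = [1.0, 2.0], [3.0, -1.0]
# Analytically, D[F](x, y) = (2·x1·y1, y1·x2 + x1·y2) = (6, 5) here.
approx = D(F)(x, y)
assert abs(approx[0] - 6.0) < 1e-4 and abs(approx[1] - 5.0) < 1e-4

# The directional derivative respects the point-wise additive structure:
# D[F + G] = D[F] + D[G] (checked approximately).
G = lambda x: [x[1] ** 3, 0.0]
FplusG = lambda x: [a + b for a, b in zip(F(x), G(x))]
s = [a + b for a, b in zip(D(F)(x, y), D(G)(x, y))]
t = D(FplusG)(x, y)
assert all(abs(a - b) < 1e-4 for a, b in zip(s, t))
```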
SMOOTH is a Cartesian differential category where the differential combinator is defined by the directional derivative of smooth functions. Explicitly, for a smooth function F : R^n → R^m, which is in fact a tuple F = (f_1, ..., f_m) of smooth functions f_i : R^n → R, the map D[F] : R^n × R^n → R^m is defined as follows:

D[F](x, y) := ( Σ_{i=1}^{n} (∂f_1/∂u_i)(x)·y_i , ..., Σ_{i=1}^{n} (∂f_m/∂u_i)(x)·y_i )

where x = (x_1, ..., x_n), y = (y_1, ..., y_n) ∈ R^n. Alternatively, D[F] can also be defined in terms of the Jacobian matrix of F. Therefore SMOOTH is a Cartesian difference category with infinitesimal extension ε = 0 and with difference combinator D. Since ε = 0, the induced action is simply x ⊕ y = x. Also, a smooth function is linear in the Cartesian difference category sense precisely if it is R-linear in the classical sense, and every smooth function is ε-linear.

5.2 Calculus of Finite Differences

Here we explain how the difference operator from the calculus of finite differences gives an example of a Cartesian difference category that is not a Cartesian differential category. This example was the main motivating example for developing Cartesian difference categories. The calculus of finite differences is captured by the category of abelian groups and arbitrary set functions between them.

Let Ab be the category whose objects are abelian groups G (where we use additive notation for the group structure) and where a map f : G → H is simply an arbitrary function between them (and therefore does not necessarily preserve the group structure). Ab is a Cartesian left additive category where the product structure is given by the standard Cartesian product of sets and where the additive structure is again given by point-wise addition, (f + g)(x) = f(x) + g(x) and 0(x) = 0.
Ab is a Cartesian difference category where the infinitesimal extension is simply given by the identity, that is, ε(f) = f, and where the difference combinator ∂ is defined as follows for a map f : G → H:

∂[f](x, y) = f(x + y) − f(x)

On the other hand, ∂ is not a differential combinator for Ab since it does not satisfy [CD.6] and part of [CD.2]. Thanks to the addition of the infinitesimal extension, ∂ does satisfy [C∂.2] and [C∂.6], as well as [C∂.0]. However, as noted in [5], it is interesting that this ∂ does satisfy [CD.1], the second part of [CD.2], [CD.3], [CD.4], [CD.5], [CD.7], and [CD.6.a]. It is worth noting that in [5], the goal was to drop the addition and develop a "non-additive" version of Cartesian differential categories.

In Ab, since the infinitesimal extension is given by the identity, the induced action is simply the addition, x ⊕ y = x + y. On the other hand, the linear maps in Ab are precisely the group homomorphisms. Indeed, f is linear if ∂[f](x, y) = f(y). But by [C∂.0] and [C∂.2], we get that:

f(x + y) = f(x) + ∂[f](x, y) = f(x) + f(y)        f(0) = ∂[f](x, 0) = 0

So f is a group homomorphism. Conversely, if f is a group homomorphism:

∂[f](x, y) = f(x + y) − f(x) = f(x) + f(y) − f(x) = f(y)

So f is linear. Since ε(f) = f, the ε-linear maps are precisely the linear maps.

5.3 Module Morphisms

Here we provide a simple example of a Cartesian difference category whose difference combinator is also a differential combinator, but where the infinitesimal extension is neither zero nor the identity. Let R be a commutative semiring and let MOD be the category of R-modules and R-linear maps between them. MOD has finite biproducts and is therefore a Cartesian left additive category where every map is additive. Every r ∈ R induces an infinitesimal extension ε_r defined by scalar multiplication, ε_r(f)(m) = r·f(m).
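A concrete spot-check of ours, with R = ℤ and the module ℤ²: together with ε_r, the text's difference combinator for MOD is ∂[f](m, n) = f(n), and the induced action is m ⊕ n = m + r·n. The derivative condition then reduces to R-linearity of f.

```python
# Sketch for MOD: take R = ℤ and the module ℤ², with an R-linear map f
# given by a 2×2 integer matrix. The infinitesimal extension is
# ε_r(f) = r·f, the difference combinator is ∂[f](m, n) = f(n), and
# the induced action is m ⊕ n = m + r·n.
import itertools

r = 3

def f(v):                       # R-linear: the matrix [[1, 2], [0, 5]]
    return [v[0] + 2 * v[1], 5 * v[1]]

def act(m, n):                  # m ⊕ n = m + r·n
    return [mi + r * ni for mi, ni in zip(m, n)]

def d_f(m, n):                  # ∂[f](m, n) = f(n): independent of m
    return f(n)

# The derivative condition f(m ⊕ n) = f(m) ⊕ ∂[f](m, n) reduces, by
# R-linearity of f, to f(m + r·n) = f(m) + r·f(n).
vecs = [list(v) for v in itertools.product(range(-2, 3), repeat=2)]
assert all(f(act(m, n)) == act(f(m), d_f(m, n)) for m in vecs for n in vecs)
```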
Then MOD is a Cartesian difference category with infinitesimal extension ε_r, for any r ∈ R, and difference combinator ∂ defined as:

∂[f](m, n) = f(n)

R-linearity of f ensures that [C∂.0] holds, while the remaining Cartesian difference axioms hold trivially. In fact, ∂ is also a differential combinator and therefore MOD is also a Cartesian differential category. The induced action is given by m ⊕ n = m + r·n. By definition of ∂, every map in MOD is linear, and by definition of ε_r and R-linearity, every map is also ε_r-linear.

5.4 Stream Calculus

Here we show how one can extend the calculus of finite differences example to stream calculus. The differential calculus of causal functions and interesting applications have recently been studied in [17, 18].

For a set A, let A^ω denote the set of infinite sequences of elements of A, where we write [a_i] for the infinite sequence [a_i] = (a_1, a_2, a_3, ...) and a_{i:j} for the (finite) subsequence (a_i, a_{i+1}, ..., a_j). A function f : A^ω → B^ω is causal whenever the n-th element f([a_i])_n of the output sequence only depends on the first n elements of [a_i], that is, f is causal if and only if whenever a_{0:n} = b_{0:n} then f([a_i])_{0:n} = f([b_i])_{0:n}.

We now consider streams over abelian groups, so let Ab^ω be the category whose objects are all the abelian groups and whose morphisms G → H are causal maps from G^ω to H^ω. Ab^ω is a Cartesian left-additive category, where the product is given by the standard product of abelian groups and where the additive structure is lifted point-wise from the structure of Ab, that is, (f + g)([a_i])_n = f([a_i])_n + g([a_i])_n and 0([a_i])_n = 0.
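The stream operations about to be defined can be sketched on finite prefixes; this is our own illustration (Python lists stand in for streams, so only finitely many components are compared). The truncation operator z and the stream difference combinator follow the formulas z([a])_0 = 0, z([a])_{n+1} = a_{n+1}, ∂[f]([a], [b])_0 = f([a] + [b])_0 − f([a])_0 and ∂[f]([a], [b])_{n+1} = f([a] + z([b]))_{n+1} − f([a])_{n+1}.

```python
# Finite-prefix sketch of the stream constructions for streams over ℤ.
def z(a):                        # truncation: zero out the 0-th component
    return [0] + a[1:]

def add(a, b):                   # point-wise stream addition
    return [x + y for x, y in zip(a, b)]

def d(f):                        # the stream difference combinator ∂
    def df(a, b):
        first = f(add(a, b))[0] - f(a)[0]
        rest = [p - q for p, q in zip(f(add(a, z(b)))[1:], f(a)[1:])]
        return [first] + rest
    return df

# A causal (non-additive) map: running sum of squares.
def f(a):
    out, acc = [], 0
    for x in a:
        acc += x * x
        out.append(acc)
    return out

a, b = [1, 2, 3, 4], [5, -1, 0, 2]
# Derivative condition f(a ⊕ b) = f(a) ⊕ ∂[f](a, b), where the induced
# action is a ⊕ b = a + z(b) and f(a) ⊕ d = f(a) + z(d):
lhs = f(add(a, z(b)))
rhs = add(f(a), z(d(f)(a, b)))
assert lhs == rhs
```

The final assertion is the derivative condition instantiated on prefixes; it holds for any causal f by unfolding the two defining clauses of ∂.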
In order to define the infinitesimal extension, we first need to define the truncation operator z. So let G be an abelian group and [a_i] ∈ G^ω, then define the sequence z([a_i]) as:

z([a_i])_0 = 0        z([a_i])_{n+1} = a_{n+1}

The category Ab^ω is a Cartesian difference category where the infinitesimal extension is given by the truncation operator,

ε(f)([a_i]) = z(f([a_i])),

and where the difference combinator ∂ is defined as follows:

∂[f]([a_i], [b_i])_0 = f([a_i] + [b_i])_0 − f([a_i])_0
∂[f]([a_i], [b_i])_{n+1} = f([a_i] + z([b_i]))_{n+1} − f([a_i])_{n+1}

Note the similarities between the difference combinator on Ab and that on Ab^ω. The induced action is computed out to be:

([a_i] ⊕ [b_i])_0 = a_0        ([a_i] ⊕ [b_i])_{n+1} = a_{n+1} + b_{n+1}

A causal map is linear (in the Cartesian difference category sense) if and only if it is a group homomorphism, while a causal map f is ε-linear if and only if it is a group homomorphism which does not depend on the 0-th term of its input, that is, f([a_i]) = f(z([a_i])).

6 Tangent Bundles in Cartesian Difference Categories

In this section, we show that the difference combinator of a Cartesian difference category induces a monad, called the tangent monad, whose Kleisli category is again a Cartesian difference category. This construction is a generalization of the tangent monad for Cartesian differential categories [7, 15]. However, the Kleisli category of the tangent monad of a Cartesian differential category is not a Cartesian differential category, but rather a Cartesian difference category.

6.1 The Tangent Bundle Monad

Let X be a Cartesian difference category with infinitesimal extension ε and difference combinator ∂. Define the functor T : X → X as follows:

T(A) = A × A        T(f) = ⟨f ∘ π_0, ∂[f]⟩

and define the natural transformations η : 1 ⇒ T and μ : T² ⇒ T as follows:

η_A := ⟨1_A, 0⟩        μ_A := ⟨π_0 ∘ π_0, π_1 ∘ π_0 + π_0 ∘ π_1 + ε(π_1 ∘ π_1)⟩

Proposition 5.
(T, μ, η) is a monad.

Proof. Functoriality of T follows from [C∂.3] and the chain rule [C∂.5]. Naturality of η and μ and the monad identities follow from the remaining difference combinator axioms. The full lengthy brute force calculations will appear in an upcoming extended journal version of this paper.

When X is a Cartesian differential category with the difference structure arising from setting ε = 0, this tangent bundle monad coincides with the standard tangent monad corresponding to its tangent category structure [7, 15].

6.2 The Kleisli Category of T

Recall that the Kleisli category of the monad (T, μ, η) is defined as the category X_T whose objects are the objects of X, and where a map A → B in X_T is a map f : A → T(B) in X, that is, a pair f = ⟨f_0, f_1⟩ where f_j : A → B. The identity map in X_T is the monad unit η_A : A → T(A), while composition of Kleisli maps f : A → T(B) and g : B → T(C) is defined as the composite μ_C ∘ T(g) ∘ f. To distinguish between composition in X and X_T, we denote Kleisli composition as g ∘^T f = μ_C ∘ T(g) ∘ f. If f = ⟨f_0, f_1⟩ and g = ⟨g_0, g_1⟩, then their Kleisli composition can be explicitly computed out to be:

g ∘^T f = ⟨g_0, g_1⟩ ∘^T ⟨f_0, f_1⟩ = ⟨g_0 ∘ f_0, ∂[g_0] ∘ ⟨f_0, f_1⟩ + g_1 ∘ (f_0 + ε(f_1))⟩

Kleisli maps can be understood as "generalized" vector fields. Indeed, T(A) should be thought of as the tangent bundle over A, and therefore a vector field would be a map ⟨1, f⟩ : A → T(A), which is of course also a Kleisli map. For more details on the intuition behind this Kleisli category see [7]. We now wish to explain how the Kleisli category is again a Cartesian difference category.

We begin by exhibiting the Cartesian left additive structure of the Kleisli category. The product of objects in X_T is defined as A × B with projections π_0^T : A × B → T(A) and π_1^T : A × B → T(B) defined respectively as π_0^T = ⟨π_0, 0⟩ and π_1^T = ⟨π_1, 0⟩.
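The explicit formula for Kleisli composition can be checked against its definition μ ∘ T(g) ∘ f in the finite-difference model Ab (Section 5.2), where ε is the identity. This sketch is ours; flattened 4-tuples stand in for elements of T²(C) = (C × C) × (C × C):

```python
# Tangent monad and Kleisli composition in the finite-difference model
# (maps on ℤ, ε = id). A Kleisli map A → T(B) = B × B is a pair (f0, f1).
def d(f):                                   # ∂[f](x, y) = f(x + y) − f(x)
    return lambda x, y: f(x + y) - f(x)

def T(g0, g1):
    # T(g)(x, dx) = (g(x), ∂[g](x, dx)), componentwise on the pair g,
    # returned flattened as a 4-tuple.
    def Tg(x, dx):
        return (g0(x), g1(x), d(g0)(x, dx), d(g1)(x, dx))
    return Tg

def mu(a, b, c, dd):    # μ = ⟨π0∘π0, π1∘π0 + π0∘π1 + ε(π1∘π1)⟩, ε = id
    return (a, b + c + dd)

def kleisli(g, f):                          # g ∘T f = μ ∘ T(g) ∘ f
    (g0, g1), (f0, f1) = g, f
    return lambda x: mu(*T(g0, g1)(f0(x), f1(x)))

# The explicit formula from the text:
#   g ∘T f = ⟨g0 ∘ f0, ∂[g0] ∘ ⟨f0, f1⟩ + g1 ∘ (f0 + ε(f1))⟩
def explicit(g, f):
    (g0, g1), (f0, f1) = g, f
    return lambda x: (g0(f0(x)), d(g0)(f0(x), f1(x)) + g1(f0(x) + f1(x)))

f = (lambda x: x * x, lambda x: 3 * x + 1)
g = (lambda y: y - 2, lambda y: y * y * y)
assert all(kleisli(g, f)(x) == explicit(g, f)(x) for x in range(-10, 11))
```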
The pairing of Kleisli maps f = ⟨f_0, f_1⟩ and g = ⟨g_0, g_1⟩ is defined as ⟨f, g⟩ = ⟨⟨f_0, g_0⟩, ⟨f_1, g_1⟩⟩. The terminal object is again the terminal object of X, and the unique map to the terminal object is !^T = 0. The sum of Kleisli maps f = ⟨f_0, f_1⟩ and g = ⟨g_0, g_1⟩ is defined as f +^T g = ⟨f_0 + g_0, f_1 + g_1⟩, and the zero Kleisli map is simply 0^T = ⟨0, 0⟩. Therefore we conclude that the Kleisli category of the tangent monad is a Cartesian left additive category.

Lemma 10. X_T is a Cartesian left additive category.

The infinitesimal extension ε^T for the Kleisli category is defined as follows for a Kleisli map f = ⟨f_0, f_1⟩:

ε^T(f) = ⟨0, f_0 + ε(f_1)⟩

Lemma 11. ε^T is an infinitesimal extension on X_T.

It is interesting to point out that for an object A the induced action ⊕_A^T can be computed out to be:

⊕_A^T = π_0^T +^T ε^T(π_1^T) = ⟨π_0, 0⟩ + ⟨0, π_1⟩ = ⟨π_0, π_1⟩ = 1_{T(A)}

and we stress that this is the identity of T(A) in the base category X (but not in the Kleisli category).

To define the difference combinator for the Kleisli category, first note that difference combinators by definition do not change the codomain. That is, if f : A → T(B) is a Kleisli arrow, then the type of its derivative qua Kleisli arrow should be A × A → T(B), which coincides with the type of its derivative in X. Therefore, the difference combinator ∂^T for the Kleisli category can be defined to be the difference combinator of the base category, that is, for a Kleisli map f = ⟨f_0, f_1⟩:

∂^T[f] = ∂[f] = ⟨∂[f_0], ∂[f_1]⟩

Proposition 6. For a Cartesian difference category X, the Kleisli category X_T is a Cartesian difference category with infinitesimal extension ε^T and difference combinator ∂^T.

Proof. The full lengthy brute force calculations will appear in an upcoming extended journal version of this paper.
We do note that a crucial identity for this proof is that for any map f in X, the following equality holds:

T(∂[f]) = ∂[T(f)] ∘ ⟨π_0 × π_0, π_1 × π_1⟩

This helps simplify many of the calculations for the difference combinator axioms, since T(∂[f]) appears everywhere due to the definition of Kleisli composition.

As a result, the Kleisli category of a Cartesian difference category is again a Cartesian difference category, whose infinitesimal extension is neither the identity nor the zero map. This allows one to build numerous examples of interesting and exotic Cartesian difference categories, such as the Kleisli categories of Cartesian differential categories (or, iterating this process, the Kleisli category of the Kleisli category). We highlight the importance of this construction in the Cartesian differential case, as it does not in general result in a Cartesian differential category. Indeed, even when ε = 0, the induced extension ε^T is not the zero map.

We conclude this section by taking a look at the linear maps and the ε^T-linear maps in the Kleisli category. A Kleisli map f = ⟨f_0, f_1⟩ is linear in the Kleisli category if ∂^T[f] = f ∘^T π_1^T, which amounts to requiring that:

⟨∂[f_0], ∂[f_1]⟩ = ⟨f_0 ∘ π_1, f_1 ∘ π_1⟩

Therefore a Kleisli map is linear in the Kleisli category if and only if it is the pairing of maps which are linear in the base category. On the other hand, f is ε^T-linear if ε^T(f) = ⟨0, f_0 + ε(f_1)⟩ is linear in the Kleisli category, which in this case amounts to requiring that f_0 + ε(f_1) is linear. Therefore, if f_0 is linear and f_1 is ε-linear, then f is ε^T-linear.

7 Conclusions and Future Work

We have presented Cartesian difference categories, which generalize Cartesian differential categories to account for more discrete definitions of derivatives, while providing additional structure that is absent in change action models.
We have also exhibited important examples and shown that Cartesian difference categories arise quite naturally from considering tangent bundles in any Cartesian differential category. We claim that Cartesian difference categories can facilitate the exploration of differentiation in discrete spaces, by generalizing techniques and ideas from the study of their differential counterparts. For example, Cartesian differential categories can be extended to allow objects whose tangent space is not necessarily isomorphic to the object itself [9]. The same generalization could be applied to Cartesian difference categories, with some caveats: for example, the equation defining a linear map (Definition 10) becomes ill-typed, but the notion of ε-linear map remains meaningful.

Another relevant path to consider is developing the analogue of the "tensor" story for Cartesian difference categories. Indeed, an important source of examples of Cartesian differential categories is the coKleisli categories of tensor differential categories [3, 4]. A similar result likely holds for a hypothetical "tensor difference category", but it is not clear how these should be defined: [C∂.2] implies that derivatives in the difference sense are non-linear, and therefore their interplay with the tensor structure will be quite different.

A further generalization of Cartesian differential categories, categories with tangent structure [7], are defined directly in terms of a tangent bundle functor rather than requiring that every tangent bundle be trivial (that is, in a tangent category it may not be the case that TA = A × A). Some preliminary research on change actions has already shown that, when generalized in this way, change actions are precisely internal categories, but the consequences of this for change action models (and, a fortiori, Cartesian difference categories) are not understood.
More recently, some work has emerged about differential equations using the language of tangent categories [8]. We believe similar techniques can be applied in a straightforward way to Cartesian difference categories, where they might be of use to give an abstract formalization of discrete dynamical systems and difference equations.

An important open question is whether Cartesian difference categories (or a similar notion) admit an internal language. It is well-known that the differential λ-calculus can be interpreted in Cartesian closed differential categories [14]. Given their similarities, we believe there will be a very similar "difference λ-calculus" which could potentially have applications to automatic differentiation (change structures, a notion similar to change actions, have already been proposed as models of forward-mode automatic differentiation [12], although work in the area seems to have stagnated).

Lastly, we should mention that there are adjunctions between the categories of Cartesian difference categories, change action models, and Cartesian differential categories, given by Propositions 1, 2, 3, and 4. These adjunctions will be explored in detail in the upcoming journal version of this paper.

References

1. Alvarez-Picallo, M., Eyers-Taylor, A., Jones, M.P., Ong, C.H.L.: Fixing incremental computation. In: European Symposium on Programming. pp. 525–552. Springer (2019)
2. Alvarez-Picallo, M., Ong, C.H.L.: Change actions: models of generalised differentiation. In: International Conference on Foundations of Software Science and Computation Structures. pp. 45–61. Springer (2019)
3. Blute, R.F., Cockett, J.R.B., Seely, R.A.G.: Differential categories. Mathematical Structures in Computer Science 16(06), 1049–1083 (2006)
4. Blute, R.F., Cockett, J.R.B., Seely, R.A.G.: Cartesian differential categories. Theory and Applications of Categories 22(23), 622–672 (2009)
5.
Bradet-Legris, J., Reid, H.: Differential forms in non-linear Cartesian differential categories (2018), Foundational Methods in Computer Science
6. Cai, Y., Giarrusso, P.G., Rendel, T., Ostermann, K.: A theory of changes for higher-order languages: Incrementalizing λ-calculi by static differentiation. In: ACM SIGPLAN Notices. vol. 49, pp. 145–155. ACM (2014)
7. Cockett, J.R.B., Cruttwell, G.S.H.: Differential structure, tangent structure, and SDG. Applied Categorical Structures 22(2), 331–417 (2014)
8. Cockett, J., Cruttwell, G.: Connections in tangent categories. Theory and Applications of Categories 32(26), 835–888 (2017)
9. Cruttwell, G.S.: Cartesian differential categories revisited. Mathematical Structures in Computer Science 27(1), 70–91 (2017)
10. Ehrhard, T., Regnier, L.: The differential lambda-calculus. Theoretical Computer Science 309(1), 1–41 (2003)
11. Ehrhard, T.: An introduction to differential linear logic: proof-nets, models and antiderivatives. Mathematical Structures in Computer Science 28(7), 995–1060 (2018)
12. Kelly, R., Pearlmutter, B.A., Siskind, J.M.: Evolving the incremental λ-calculus into a model of forward automatic differentiation (AD). arXiv preprint arXiv:1611.03429 (2016)
13. Kock, A.: Synthetic differential geometry, vol. 333. Cambridge University Press (2006)
14. Manzonetto, G.: What is a categorical model of the differential and the resource λ-calculi? Mathematical Structures in Computer Science 22(3), 451–520 (2012)
15. Manzyuk, O.: Tangent bundles in differential lambda-categories. arXiv preprint arXiv:1202.0411 (2012)
16. Richardson, C.H.: An introduction to the calculus of finite differences. Van Nostrand (1954)
17. Sprunger, D., Jacobs, B.: The differential calculus of causal functions. arXiv preprint arXiv:1904.10611 (2019)
18. Sprunger, D., Katsumata, S.-y.: Differentiable causal computations via delayed trace. In: 2019 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS). pp. 1–12.
IEEE (2019)
19. Steinbach, B., Posthoff, C.: Boolean differential calculus. In: Logic Functions and Equations, pp. 75–103. Springer (2009)

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Contextual Equivalence for Signal Flow Graphs

Filippo Bonchi¹, Robin Piedeleu², Paweł Sobociński³, and Fabio Zanasi²

¹ Università di Pisa, Italy
² University College London, UK, {r.piedeleu, f.zanasi}
³ Tallinn University of Technology, Estonia

Abstract. We extend the signal flow calculus—a compositional account of the classical signal flow graph model of computation—to encompass affine behaviour, and furnish it with a novel operational semantics. The increased expressive power allows us to define a canonical notion of contextual equivalence, which we show to coincide with denotational equality. Finally, we characterise the realisable fragment of the calculus: those terms that express the computations of (affine) signal flow graphs.
Keywords: signal flow graphs · affine relations · full abstraction · contextual equivalence · string diagrams

1 Introduction

Compositional accounts of models of computation often lead one to consider relational models, because a decomposition of an input-output system might consist of internal parts where flow and causality are not always easy to assign. These insights led Willems [33] to introduce a new current of control theory, called behavioural control: roughly speaking, behaviours and observations are of prime concern, while notions such as state, inputs or outputs are secondary. Independently, programming language theory converged on similar ideas, with contextual equivalence [25,28] often considered as the equivalence: programs are judged to be different if we can find some context in which one behaves differently from the other, and what is observed about "behaviour" is often something quite canonical and simple, such as termination. Hoare [17] and Milner [23] discovered that these programming language theory innovations also bore fruit in the nondeterministic context of concurrency. Here again, research converged on studying simple and canonical contextual equivalences [24,18].

This paper brings together all of the above threads. The model of computation of interest for us is that of signal flow graphs [32,21], which are feedback systems well known in control theory [21] and widely used in the modelling of linear dynamical systems (in continuous time) and signal processing circuits (in discrete time). The signal flow calculus [10,9] is a syntactic presentation with an underlying compositional denotational semantics in terms of linear relations.

(Footnotes: Supported by EPSRC grant EP/R020604/1. Supported by the ESF funded Estonian IT Academy research measure (project 2014-2020.4.05.19-0001). © The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 77–96, 2020.)
Armed with string diagrams [31] as a syntax, the tools and concepts of programming language theory and concurrency theory can be put to work, and the calculus can be equipped with a structural operational semantics. However, while in previous work [9] a connection was made between operational equivalence (essentially trace equivalence) and denotational equality, the signal flow calculus was not quite expressive enough for contextual equivalence to be a useful notion.

The crucial step turns out to be moving from linear relations to affine relations, i.e. linear subspaces translated by a vector. In recent work [6], we showed that they can be used to study important physical phenomena, such as current and voltage sources in electrical engineering, as well as fundamental synchronisation primitives in concurrency, such as mutual exclusion. Here we show that, in addition to yielding compelling mathematical domains, affinity proves to be the magic ingredient that ties the different components of the story of signal flow graphs together: it provides us with a canonical and simple notion of observation to use for the definition of contextual equivalence, and gives us the expressive power to prove a bona fide full abstraction result that relates contextual equivalence with denotational equality.

To obtain the above result, we extend the signal flow calculus to handle affine behaviour. While the denotational semantics and axiomatic theory appeared in [6], the operational account appears here for the first time and requires some technical innovations: instead of traces, we consider trajectories, which are infinite traces that may start in the past. To record the time, states of our transition system have a runtime environment that keeps track of the global clock. Because the affine signal flow calculus is oblivious to flow directionality, some terms exhibit pathological operational behaviour. We illustrate these phenomena with several examples.
Nevertheless, for the linear sub-calculus, it is known [9] that every term is denotationally equal to an executable realisation: one that is in a form where a consistent flow can be identified, like the classical notion of signal flow graph. We show that the question has a more subtle answer in the affine extension: not all terms are realisable as (affine) signal flow graphs. However, we are able to characterise the class of diagrams for which this is true.

Related work. Several authors studied signal flow graphs by exploiting concepts and techniques of programming language semantics, see e.g. [4,22,29,2]. The most relevant for this paper is [2], which, independently from [10], proposed the same syntax and axiomatisation for the ordinary signal flow calculus and shares with our contribution the same methodology: the use of string diagrams as a mathematical playground for the compositional study of different sorts of systems. The idea is common to diverse, cross-disciplinary research programmes, including Categorical Quantum Mechanics [1,11,12], Categorical Network Theory [3], Monoidal Computer [26,27] and the analysis of (a)synchronous circuits [14,15].

Outline. In Section 2 we recall the affine signal flow calculus. Section 3 introduces the operational semantics for the calculus. Section 4 defines contextual equivalence and proves full abstraction. Section 5 introduces a well-behaved class of circuits, that denote functional input-output systems, laying the groundwork for Section 6, in which the concept of realisability is introduced before a characterisation of which circuit diagrams are realisable. Missing proofs can be found in the extended version of this paper [7].

2 Background: the Affine Signal Flow Calculus

The affine signal flow calculus extends the signal flow calculus [9] with an extra generator that allows one to express affine relations.
In this section, we first recall its syntax and denotational semantics from [6], and then we highlight two key properties, enabled by the affine extension, that are needed for proving full abstraction. The operational semantics is delayed to the next section.

[Fig. 1. Sort inference rules. The diagrammatic generator symbols are lost in this rendering; the rules assign each generator of row (1) a sort among (1, 2), (1, 1) (for k and x), (2, 1), (1, 0) and (0, 1), each generator of row (2) the reflected sort, and the structural circuits the sorts (0, 0), (2, 2) and (1, 1). For composites: from c : (n, z) and d : (z, m) infer c ; d : (n, m), and from c : (n, m) and d : (r, z) infer c ⊕ d : (n + r, m + z).]

2.1 Syntax

[The grammar for circuits c has three rows: rows (1) and (2) list the diagrammatic generators, which cannot be reproduced in text, and row (3) closes terms under c ⊕ c and c ; c.]

The syntax of the calculus, generated by the grammar above, is parametrised over a given field k, with k ranging over k. We refer to the constants in rows (1)-(2) as generators. Terms are constructed from generators, identities and symmetries, using the two binary operations in (3). We will only consider those terms that are sortable, i.e. those that can be associated with a pair (n, m), with n, m ∈ N. Sortable terms are called circuits: intuitively, a circuit with sort (n, m) has n ports on the left and m on the right. The sorting discipline is given in Fig. 1. We delay discussion of computational intuitions to Section 3 but, for the time being, we observe that the generators of row (2) are those of row (1) "reflected about the y-axis".

2.2 String Diagrams

It is convenient to consider circuits as the arrows of a symmetric monoidal category ACirc (for Affine Circuits). Objects of ACirc are natural numbers (thus
meaning that c ; c' is drawn by composing diagrams horizontally and c ⊕ c' by stacking them vertically. More succinctly, ACirc is the free prop on the generators (1)-(2). The free prop on (1)-(2) without the affine constant and its mirror image, hereafter called Circ, is the signal flow calculus from [9].

Example 1. [String diagram and the corresponding circuit expression, built with ; and ⊕ from the generators, omitted in this extraction; this circuit is the running example of the paper.]

2.3 Denotational Semantics and Axiomatisation

The semantics of circuits can be given denotationally by means of affine relations.

Definition 1. Let k be a field. An affine subspace of k^d is a subset V ⊆ k^d that is either empty or for which there exist a vector a ∈ k^d and a linear subspace L of k^d such that V = {a + v | v ∈ L}. A k-affine relation of type n → m is an affine subspace of k^n × k^m, considered as a k-vector space.

Note that every linear subspace is affine, taking a above to be the zero vector. Affine relations can be organised into a prop:

Definition 2. Let k be a field. Let ARel_k be the following prop:
– arrows n → m are k-affine relations;
– composition is relational: given G = {(u, v) | u ∈ k^n, v ∈ k^m} and H = {(v, w) | v ∈ k^m, w ∈ k^l}, their composition is G ; H := {(u, w) | ∃v. (u, v) ∈ G ∧ (v, w) ∈ H};
– the monoidal product is given by G ⊕ H = {((u, u')^T, (v, v')^T) | (u, v) ∈ G, (u', v') ∈ H}.

In order to give semantics to ACirc, we use the prop of affine relations over the field k(x) of fractions of polynomials in x with coefficients from k. Elements q ∈ k(x) are fractions

(k_0 + k_1·x + k_2·x^2 + ··· + k_n·x^n) / (l_0 + l_1·x + l_2·x^2 + ··· + l_m·x^m)

for some n, m ∈ N and k_i, l_i ∈ k. Sum, product, 0 and 1 in k(x) are defined as usual. This quotient is harmless: both the denotational semantics from [6] and the operational semantics we introduce in this paper satisfy those axioms on the nose.

Definition 3. The prop morphism [[·]] : ACirc → ARel_k(x) is inductively defined on circuits as follows.
For the generators in (1):

– copier ↦ {(p, (p, p)^T) | p ∈ k(x)}
– adder ↦ {((p, q)^T, p + q) | p, q ∈ k(x)}
– discard ↦ {(p, •) | p ∈ k(x)}
– zero ↦ {(•, 0)}
– affine constant ↦ {(•, 1)}
– amplifier r ↦ {(p, p · r) | p ∈ k(x)}
– register x ↦ {(p, p · x) | p ∈ k(x)}

where • is the only element of k(x)^0. The semantics of the components in (2) is symmetric, e.g. the mirrored generator of sort (1, 0) is mapped to {(p, •) | p ∈ k(x)}. For (3):

– identity ↦ {(p, p) | p ∈ k(x)}
– twist ↦ {((p, q)^T, (q, p)^T) | p, q ∈ k(x)}
– empty ↦ {(•, •)}
– c_1 ⊕ c_2 ↦ [[c_1]] ⊕ [[c_2]] and c_1 ; c_2 ↦ [[c_1]] ; [[c_2]]

The reader can easily check that the pair of 1-dimensional vectors (1, 1/(1−x)) ∈ k(x)^1 × k(x)^1 belongs to the denotation of the circuit in Example 1.

The denotational semantics enjoys a sound and complete axiomatisation. The axioms involve only basic interactions between the generators (1)-(2). The resulting theory is that of Affine Interacting Hopf Algebras (aIH). The generators in (1) form a Hopf algebra, those in (2) form another Hopf algebra, and the interaction of the two gives rise to two Frobenius algebras. We refer the reader to [6] for the full set of equations and all further details.

Proposition 1. For all c, d in ACirc, [[c]] = [[d]] if and only if c =_aIH d.

2.4 Affine vs Linear Circuits

It is important to highlight the differences between ACirc and Circ. The latter is the purely linear fragment: circuit diagrams of Circ denote exactly the linear relations over k(x) [8], while those of ACirc denote the affine relations over k(x). The additional expressivity afforded by affine circuits is essential for our development. One crucial property is that every polynomial fraction can be expressed as an affine circuit of sort (0, 1).

Lemma 1. For all p ∈ k(x), there is c_p ∈ ACirc[0, 1] with [[c_p]] = {(•, p)}.

Proof. For each p ∈ k(x), let P be the linear subspace generated by the pair of 1-dimensional vectors (1, p). By fullness of the denotational semantics of Circ [8], there exists a circuit c in Circ such that [[c]] = P. Then, precomposing with the affine constant yields {(•, 1)} ; P = {(•, p)}.

The above observation yields the following:

Proposition 2.
Let (u, v) ∈ k(x)^n × k(x)^m. There exist circuits c_u ∈ ACirc[0, n] and c_v ∈ ACirc[m, 0] such that [[c_u]] = {(•, u)} and [[c_v]] = {(v, •)}.

Proof. Let u = (p_1, ..., p_n)^T and v = (q_1, ..., q_m)^T. By Lemma 1, for each p_i there exists a circuit c_{p_i} such that [[c_{p_i}]] = {(•, p_i)}. Let c_u = c_{p_1} ⊕ ... ⊕ c_{p_n}. Then [[c_u]] = {(•, u)}. For c_v, it is enough to see that Lemma 1 also holds with the sorts 0 and 1 switched, then use the argument above.

Proposition 2 asserts that any behaviour (u, v) occurring in the denotation of some circuit c, i.e., such that (u, v) ∈ [[c]], can be expressed by a pair of circuits (c_u, c_v). We will, in due course, think of such a pair as a context, namely an environment with which a circuit can interact. Observe that this is not possible within the linear fragment Circ, since the only singleton linear subspace is 0.

Another difference between linear and affine concerns circuits of sort (0, 0). Indeed k(x)^0 = {•}, and the only linear relation over k(x)^0 × k(x)^0 is the singleton {(•, •)}, which is id_0 in ARel_k(x). But there is another affine relation, namely the empty relation ∅ ⊆ k(x)^0 × k(x)^0. This can be represented, for instance, by the affine constant composed with the mirrored zero, since {(•, 1)} ; {(0, •)} = ∅.

Proposition 3. Let c ∈ ACirc[0, 0]. Then [[c]] is either id_0 or ∅.

3 Operational Semantics for Affine Circuits

Here we give the structural operational semantics of affine circuits, building on previous work [9] that considered only the core linear fragment, Circ. We consider circuits to be programs that have an observable behaviour. Observations are possible interactions at the circuit's interfaces. Since there are two interfaces, a left and a right one, each transition has two labels. In a transition t ⊳ c --v/w--> t' ⊳ c' (in the original typesetting, v appears above the arrow and w below it), c and c' are states, that is, circuits augmented with information about which values k ∈ k are stored in each register x at that instant of the computation.
When transitioning to c', the v above the arrow is a vector of values with which c synchronises on the left, and the w below the arrow accounts for the synchronisation on the right. States are decorated with runtime contexts: t and t' are (possibly negative) integers that, intuitively, indicate the time when the transition happens. Indeed, in Fig. 2, every rule advances time by 1 unit. "Negative time" is important: as we shall see in Example 3, some executions must start in the past.

The rules in the top section of Fig. 2 provide the semantics for the generators in (1): the copier duplicates the signal arriving on the left; the discard accepts any signal on the left and throws it away, producing nothing on the right; the adder takes two signals on the left and emits their sum on the right; the zero emits the constant 0 signal on the right; the amplifier k multiplies the signal on the left by the scalar k ∈ k. All the generators described so far are stateless. State is provided by the register x, a synchronous one-place buffer with a stored value l: when it receives some value k on the left, it emits l on the right and stores k. The behaviour of the affine generator

[Fig. 2. Structural rules for the operational semantics, with t ∈ Z, k, l ranging over k and u, v, w vectors of elements of k of the appropriate size. The only vector of k^0 is written • (as in Definition 3), while a vector (k_1 ... k_n)^T ∈ k^n is written k_1...k_n. The rule boxes could not be recovered from this extraction; they give one transition per generator of rows (1)-(2) and of (3), e.g. t ⊳ x(l) --k/l--> t+1 ⊳ x(k) for the register, together with the rules for ; and ⊕ described in the text below.]
depends on the time: when t = 0, it emits 1; otherwise it emits 0. Observe that the behaviour of all other generators is time-independent.

So far, we described the behaviour of the components in (1) using the intuition that signal flows from left to right: in a transition --v/w-->, the signal v on the left is thought of as a trigger and w as its effect. For the generators in (2), whose behaviour is defined by the rules in the second section of Fig. 2, the behaviour is symmetric; indeed, here it is helpful to think of signals as flowing from right to left. The next section of Fig. 2 specifies the behaviours of the structural connectors of (3): the twist swaps two signals, the empty circuit does nothing, and the identity wire forwards its signal: the signals on the left and on the right ports are equal. Finally, the rule for sequential composition ; forces the two components to agree on the value v at the shared interface, while for parallel composition ⊕, components can proceed independently. Observe that both forms of composition require component transitions to happen at the same time.

Definition 4. Let c ∈ ACirc. The initial state c_0 of c is the one where all the registers store 0. A computation of c starting at time t ≤ 0 is a (possibly infinite) sequence of transitions

t ⊳ c_0 --v_t/w_t--> t+1 ⊳ c_1 --v_{t+1}/w_{t+1}--> t+2 ⊳ c_2 --v_{t+2}/w_{t+2}--> ...   (4)

Since all transitions increment the time by 1, it suffices to record the time at which a computation starts. As a result, to simplify notation, we will omit the runtime context after the first transition and, instead of (4), write

t ⊳ c_0 --v_t/w_t--> c_1 --v_{t+1}/w_{t+1}--> c_2 --v_{t+2}/w_{t+2}--> ...

Example 2. The circuit in Example 1 can perform the following computation, reading 1, 0, 0, ... on the left and emitting 1, 1, 1, ... on the right:

0 ⊳ c_0 --1/1--> --0/1--> --0/1--> ···

In the example above, the flow has a clear left-to-right orientation, albeit with a feedback loop. For arbitrary circuits of ACirc this is not always the case, which sometimes results in unexpected operational behaviour.
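To make the stream intuition concrete, the following Python sketch executes the circuit of Example 2, assuming (consistently with the pair (1, 1/(1−x)) noted after Definition 3) that it is the standard one-register feedback loop; the function name and wiring are illustrative, not from the paper.

```python
def run_example2(inputs):
    """Execute a one-register feedback loop (transfer function
    1/(1-x)): at each tick, the current input is added to the
    fed-back register value, and the sum is both emitted on the
    right and stored back into the register."""
    reg = 0  # registers start at 0, as in Definition 4
    outputs = []
    for u in inputs:
        y = u + reg   # adder: left input plus fed-back value
        reg = y       # register stores the new value for the next tick
        outputs.append(y)
    return outputs

# left labels 1,0,0,... yield right labels 1,1,1,... as in Example 2
print(run_example2([1, 0, 0, 0]))  # [1, 1, 1, 1]
```

With input 1, 0, 0, ... the output is the coefficient stream of 1/(1−x) = 1 + x + x^2 + ···, matching the denotation recorded after Definition 3.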
Example 3. In the circuit composing a register x with the mirrored affine constant, it is not possible to identify a consistent flow: the register goes from left to right, while the constant goes from right to left. Observe that there is no computation starting at t = 0, since in the initial state the register contains 0 while the constant must emit 1. There is, however, a (unique!) computation starting at time t = −1, which loads the register with 1 so that the constant can also emit 1 at time t = 0:

−1 ⊳ x(0) --1/•--> x(1) --0/•--> x(0) --0/•--> x(0) --0/•--> ...

where x(l) denotes the register storing l. Similarly, prefixing a second register yields a circuit with a unique computation starting at time t = −2.

It is worthwhile clarifying the reason why, in the affine calculus, some computations start in the past. As we have already mentioned, in the linear fragment the semantics of all generators is time-independent. It follows easily that time-independence is a property enjoyed by all purely linear circuits. The behaviour of the affine constant, however, enforces a particular action to occur at time 0. Considering this in conjunction with a right-to-left register, the effect is to anticipate that action by one step, to time −1, as shown in Example 3. It is obvious that this construction can be iterated, and it follows that the presence of a single time-dependent generator results in a calculus in which the computation of some terms must start at a finite, but unbounded, time in the past.

Example 4. Another circuit with conflicting flow is the affine constant composed with the mirrored zero. Here there is no possible transition at t = 0, since at that time the constant must emit a 1 and the mirrored zero can only synchronise on a 0. Instead, the circuit composing the zero with the mirrored zero can always perform an infinite computation t ⊳ --•/•--> --•/•--> ..., for any t ≤ 0. Roughly speaking, the computations of these two (0, 0) circuits are operational mirror images of the two possible denotations of Proposition 3. This intuition will be made formal in Section 4.
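The need to start in the past can be checked mechanically. The sketch below simulates the circuit of Example 3 under the reading above (a register feeding the mirrored affine constant), choosing at each step the unique left input that keeps the run alive; all names are illustrative, not from the paper.

```python
def run_example3(start, steps):
    """Simulate the Example 3 circuit: at each time t the register
    must emit 1 if t == 0 and 0 otherwise (the demand of the mirrored
    affine constant); the left input is free and becomes the next
    stored value. Returns the left inputs of a run of `steps`
    transitions starting at `start`, or None if the run gets stuck."""
    reg = 0  # initial state: register stores 0 (Definition 4)
    inputs = []
    for t in range(start, start + steps):
        required = 1 if t == 0 else 0
        if reg != required:
            return None        # no transition possible at time t
        nxt = 1 if t + 1 == 0 else 0  # the only viable next input
        inputs.append(nxt)
        reg = nxt
    return inputs

print(run_example3(0, 4))   # None: no computation starts at t = 0
print(run_example3(-1, 4))  # [1, 0, 0, 0]: must start at t = -1
```

Starting at t = 0 fails immediately, while starting at t = −1 succeeds by loading the register with 1 one step before the constant's demand, exactly as in Example 3.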
For now, it is worth observing that, for all c, the parallel composition of c with the always-productive circuit of Example 4 can perform the same computations as c, while the parallel composition of c with the deadlocking circuit of Example 4 cannot ever make a transition at time 0.

Example 5. Consider the circuit x̄ ; x, a mirrored register followed by a register, which again features conflicting flow. Our equational theory equates it with the identity wire, but the computations involved are subtly different. Indeed, for any sequence a_i ∈ k, it is obvious that the identity admits the computation

0 ⊳ --a_0/a_0--> --a_1/a_1--> --a_2/a_2--> ...   (5)

The circuit x̄ ; x admits a similar computation, but we must begin at time t = −1 in order to first "load" the registers with a_0:

−1 ⊳ x̄(0) x(0) --0/0--> x̄(a_0) x(a_0) --a_0/a_0--> x̄(a_1) x(a_1) --a_1/a_1--> ...   (6)

The circuit x ; x̄, which again is equated with the identity by the equational theory, is more tricky. Although every computation of the identity can be reproduced, x ; x̄ admits additional, problematic computations. Indeed, consider

0 ⊳ x(0) x̄(0) --0/1--> x(0) x̄(1)   (7)

at which point no further transition is possible: the circuit can deadlock.

The following lemma is an easy consequence of the rules of Fig. 2 and follows by structural induction. It states that all circuits can stay idle in the past.

Lemma 2. Let c ∈ ACirc[n, m] with initial state c_0. Then t ⊳ c_0 --0/0--> t+1 ⊳ c_0 if t < 0.

3.1 Trajectories

For the non-affine version of the signal flow calculus, we studied in [9] traces arising from computations. For the affine extension, this is not possible since, as explained above, we must also consider computations that start in the past. In this paper, rather than traces, we adopt a common control-theoretic notion.

Definition 5. An (n, m)-trajectory σ is a Z-indexed sequence σ : Z → k^n × k^m that is finite in the past, i.e., for which there exists j ∈ Z such that σ(i) = (0, 0) for i ≤ j. By the universal property of the product we can identify σ : Z → k^n × k^m with the pairing ⟨σ_l, σ_r⟩ of σ_l : Z → k^n and σ_r : Z → k^m. A (k, m)-trajectory σ and an (m, n)-trajectory τ are compatible if σ_r = τ_l. In this case, we can define
their composite, a (k, n)-trajectory σ ; τ, by σ ; τ := ⟨σ_l, τ_r⟩. Given an (n_1, m_1)-trajectory σ_1 and an (n_2, m_2)-trajectory σ_2, their product, an (n_1+n_2, m_1+m_2)-trajectory σ_1 ⊕ σ_2, is defined by stacking: (σ_1 ⊕ σ_2)(i) := (σ_1(i), σ_2(i))^T. Using these two operations we can organise sets of trajectories into a prop.

Definition 6. The composition of two sets of trajectories is defined as S ; T := {σ ; τ | σ ∈ S, τ ∈ T are compatible}. The product of sets of trajectories is defined as S_1 ⊕ S_2 := {σ_1 ⊕ σ_2 | σ_1 ∈ S_1, σ_2 ∈ S_2}.

Clearly both operations are strictly associative. The unit for ⊕ is the singleton with the unique (0, 0)-trajectory. Also ; has a two-sided identity, given by the sets of "copycat" (n, n)-trajectories. Indeed, we have that:

Proposition 4. Sets of (n, m)-trajectories are the arrows n → m of a prop Traj with composition and monoidal product given as in Definition 6.

Traj serves for us as the domain for operational semantics: given a circuit c and an infinite computation

t ⊳ c_0 --u_t/v_t--> c_1 --u_{t+1}/v_{t+1}--> c_2 --u_{t+2}/v_{t+2}--> ...

its associated trajectory σ is

σ(i) = (u_i, v_i) if i ≥ t, and (0, 0) otherwise.   (8)

Definition 7. For a circuit c, ⟨c⟩ is the set of trajectories given by its infinite computations, following the translation (8) above.

The assignment c ↦ ⟨c⟩ is compositional, that is:

Theorem 1. ⟨·⟩ : ACirc → Traj is a morphism of props.

Example 6. Consider the computations (5) and (6) from Example 5. According to (8), both are translated into the trajectory σ mapping i ≥ 0 to (a_i, a_i) and i < 0 to (0, 0). The reader can easily verify that, more generally, ⟨x̄ ; x⟩ = ⟨id⟩ holds. At this point it is worth remarking that the two circuits would be distinguished by their traces: the trace of computation (5) is different from the trace of (6). Indeed, the full abstraction result in [9] does not hold for all circuits, but only for those of a certain kind.
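As a concrete reading of Definitions 5-6, here is a Python sketch of trajectories finite in the past and of their relational composition, checked on a finite window of time steps (an approximation: actual trajectories are Z-indexed); all names are illustrative, not from the paper.

```python
def trajectory(pairs, start):
    """An (n, m)-trajectory with 1-dimensional boundaries: (left, right)
    value pairs from time `start` onwards, and (0, 0) strictly before
    it (finiteness in the past, Definition 5)."""
    def sigma(i):
        return pairs[i - start] if i >= start else (0, 0)
    return sigma

def compose(sigma, tau, window):
    """Composition sigma ; tau on a finite window: defined only when
    the right boundary of sigma matches the left boundary of tau
    (compatibility); the composite pairs sigma's left with tau's right."""
    assert all(sigma(i)[1] == tau(i)[0] for i in window), "incompatible"
    return {i: (sigma(i)[0], tau(i)[1]) for i in window}

s = trajectory([(1, 1), (0, 1), (0, 1)], start=0)  # left 1,0,0 / right 1,1,1
t = trajectory([(1, 2), (1, 3), (1, 4)], start=0)  # left 1,1,1 / right 2,3,4
print(compose(s, t, range(3)))  # {0: (1, 2), 1: (0, 3), 2: (0, 4)}
```

Replacing t's left boundary with any sequence other than 1, 1, 1 makes the assertion fail, mirroring the side condition "are compatible" in Definition 6.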
The affine extension obliges us to consider computations that start in the past and, in turn, this drives us toward a stronger full abstraction result, shown in the next section.

Before concluding, it is important to emphasise that ⟨x ; x̄⟩ = ⟨id⟩ also holds. Indeed, problematic computations, like (7), are all finite and, by definition, do not give rise to any trajectory. The reader should note that the use of trajectories is not a semantic device to get rid of problematic computations. In fact, trajectories do not appear in the statement of our full abstraction result; they are merely a convenient tool to prove it. Another result (Proposition 9) independently takes care of ruling out problematic computations.

4 Contextual Equivalence and Full Abstraction

This section contains the main contribution of the paper: a traditional full abstraction result asserting that contextual equivalence agrees with denotational equivalence. It is not a coincidence that we prove this result in the affine setting: affinity plays a crucial role, both in the statement and in the proof. In particular, Proposition 3 gives us two possibilities for the denotation of (0, 0) circuits: (i) ∅, which, roughly speaking, means that there is a problem (see e.g. Example 4) and no infinite computation is possible, or (ii) id_0, in which case infinite computations are possible. This provides us with a basic notion of observation, akin to observing termination vs non-termination in the λ-calculus.

Definition 8. For a circuit c ∈ ACirc[0, 0] we write c ↑ if c can perform an infinite computation and c /↑ otherwise.

For instance, the zero composed with the mirrored zero satisfies ↑, while the affine constant composed with the mirrored zero satisfies /↑.

To be able to make observations about arbitrary circuits we need to introduce an appropriate notion of context. Roughly speaking, contexts for us are (0, 0)-circuits with a hole into which we can plug another circuit.
Since ours is a variable-free presentation, "dangling wires" assume the role of free variables [16]: restricting to (0, 0) contexts is therefore analogous to considering ground contexts, i.e. contexts with no free variables, a standard concept of programming language theory. To define contexts formally, we extend the syntax of Section 2.1 with an extra generator "−" of sort (n, m). A (0, 0)-circuit of this extended syntax is a context when "−" occurs exactly once. Given an (n, m)-circuit c and a context C[−], we write C[c] for the circuit obtained by replacing the unique occurrence of "−" by c.

With this setup, given an (n, m)-circuit c, we can insert it into a context C[−] and observe the possible outcome: either C[c] ↑ or C[c] /↑. This naturally leads us to contextual equivalence and the statement of our main result.

Definition 9. Given c, d ∈ ACirc[n, m], we say that they are contextually equivalent, written c ≡ d, if for all contexts C[−], C[c] ↑ iff C[d] ↑.

Example 7. Recall from Example 5 the circuits id and x ; x̄. Take the context C[−] = c_σ ; − ; c_τ, for c_σ ∈ ACirc[0, 1] and c_τ ∈ ACirc[1, 0]. Assume that c_σ and c_τ each have a single infinite computation, and call σ and τ the corresponding trajectories. If σ = τ, both C[id] and C[x ; x̄] are able to perform an infinite computation. If instead σ ≠ τ, neither of them can perform any infinite computation: C[id] stops at time t, for t the first moment such that σ(t) ≠ τ(t), while C[x ; x̄] stops at time t + 1.

Now take as context C[−] = ē ; − ; e, where e is the discard and ē its mirror image. In contrast to c_σ and c_τ, the components ē and e can perform more than one computation: at any time they can nondeterministically emit, respectively accept, any value. Thus every computation of C[id] = ē ; e can always be extended to an infinite one, forcing synchronisation of ē and e at each step. For C[x ; x̄] = ē ; x ; x̄ ; e, the two ends may emit different values at time t, but then the computation will get stuck at t + 1.
However, our definition of ↑ only cares about whether C[x ; x̄] can perform an infinite computation. Indeed it can, as long as the two ends consistently emit the same value at each time step.

If we think of contexts as tests, and say that a circuit c passes test C[−] if C[c] performs an infinite computation, then our notion of contextual equivalence is may-testing equivalence [13]. From this perspective, id and x ; x̄ are not must-equivalent, since the former must pass the test ē ; − ; e while x ; x̄ may not. It is worth remarking here that the distinction between may and must testing will cease to make sense in Section 5, where we identify a certain class of circuits equipped with a proper flow directionality and thus a deterministic, input-output, behaviour.

Theorem 2 (Full abstraction). c ≡ d iff c =_aIH d.

The remainder of this section is devoted to the proof of Theorem 2. We start by clarifying the relationship between fractions of polynomials (the denotational domain) and trajectories (the operational domain).

4.1 From Polynomial Fractions to Trajectories

The missing link between polynomial fractions and trajectories are (formal) Laurent series, a notion we now recall. Formally, a Laurent series is a function σ : Z → k for which there exists j ∈ Z such that σ(i) = 0 for all i < j. We write σ as ..., σ(−1), σ(0), σ(1), ... with position 0 underlined, or as the formal sum Σ_{i=d} σ(i)x^i. Each Laurent series σ has a degree d ∈ Z, the position of its first non-zero element. Laurent series form a field k((x)): sum is pointwise, product is by convolution, and the inverse σ^{-1} of a series σ of degree d is defined by:

σ^{-1}(i) = 0                if i < −d
σ^{-1}(i) = σ(d)^{-1}        if i = −d
σ^{-1}(i) = −σ(d)^{-1} · Σ_{j=1}^{n} σ(d+j) · σ^{-1}(−d+n−j)   if i = −d+n for n > 0   (9)

Note that (formal) power series, which form 'just' a ring k[[x]], are a particular case of Laurent series, namely those σ for which d ≥ 0. What is most interesting for our purposes is how polynomials and fractions of polynomials relate to k((x)) and k[[x]].
First, the ring k[x] of polynomials embeds into k[[x]], and thus into k((x)): a polynomial p_0 + p_1 x + ··· + p_n x^n can be regarded as the power series Σ_{i=0} p_i x^i with p_i = 0 for all i > n. Because Laurent series are closed under division, this immediately also gives an embedding of the field of polynomial fractions k(x) into k((x)). Note that the full expressiveness of k((x)) is required: for instance, the fraction 1/x is represented as the Laurent series ..., 0, 1, 0, 0, ... with the 1 in position −1, which is not a power series, because a non-zero value appears before position 0. In fact, the fractions that are expressible as power series are precisely the rational fractions, i.e. those of the form

(k_0 + k_1 x + k_2 x^2 + ··· + k_n x^n) / (l_0 + l_1 x + l_2 x^2 + ··· + l_m x^m)   with l_0 ≠ 0.

Rational fractions form a ring k⟨x⟩ which, differently from the full field k(x), embeds into k[[x]]. Indeed, whenever l_0 ≠ 0, the inverse of l_0 + l_1 x + l_2 x^2 + ··· + l_m x^m is, by (9), a bona fide power series. In summary, we have a commutative diagram of embeddings: k[x] ↪ k⟨x⟩ ↪ k(x), k⟨x⟩ ↪ k[[x]], and both k(x) and k[[x]] embed into k((x)).

Relations between k((x))-vectors organise themselves into a prop ARel_k((x)) (see Definition 2). There is an evident prop morphism ι : ARel_k(x) → ARel_k((x)): it maps the empty affine relation on k(x) to the one on k((x)), and otherwise applies pointwise the embedding of k(x) into k((x)). For the next step, observe that trajectories are in fact rearrangements of Laurent series: each pair of vectors (u, v) ∈ k((x))^n × k((x))^m, with u = (α^1, ..., α^n)^T and v = (β^1, ..., β^m)^T, yields the trajectory κ(u, v) defined for all i ∈ Z by

κ(u, v)(i) = ((α^1(i), ..., α^n(i))^T, (β^1(i), ..., β^m(i))^T).

Similarly to ι, the assignment κ extends to sets of vectors, and then to a prop morphism from ARel_k((x)) to Traj. Together, κ and ι provide the desired link between operational and denotational semantics.
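The inverse formula (9) that underlies these embeddings can be exercised numerically. The sketch below computes the first coefficients of the inverse of a power series with nonzero constant term, i.e. the special case d = 0 of (9), over the rationals; the function name is illustrative, not from the paper.

```python
from fractions import Fraction

def series_inverse(l, n):
    """First n coefficients of the multiplicative inverse of the power
    series l[0] + l[1]x + l[2]x^2 + ...; requires l[0] != 0. This is
    the degree-0 instance of the Laurent inverse (9): each coefficient
    is forced by the convolution constraint (l * inv)(m) = 0 for m > 0."""
    assert l[0] != 0
    l = [Fraction(c) for c in l] + [Fraction(0)] * n  # pad with zeros
    inv = [1 / l[0]]
    for m in range(1, n):
        inv.append(-sum(l[i] * inv[m - i] for i in range(1, m + 1)) / l[0])
    return inv

# inverse of 1 - x: the all-ones stream 1 + x + x^2 + ...
print([int(c) for c in series_inverse([1, -1], 5)])  # [1, 1, 1, 1, 1]
```

Inverting 1 − x returns the all-ones stream, i.e. 1/(1−x) = 1 + x + x^2 + ···, the rational fraction appearing in the denotation of the running example.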
Theorem 3. ⟨·⟩ = κ ∘ ι ∘ [[·]].

Proof. Since both are symmetric monoidal functors from a free prop, it is enough to check the statement for the generators of ACirc. We show, as an example, the case of the copier. By Definition 3, [[copier]] = {(p, (p, p)^T) | p ∈ k(x)}. This is mapped by ι to {(α, (α, α)^T) | α ∈ k((x))}. Now, to see that κ(ι([[copier]])) = ⟨copier⟩, it is enough to observe that a trajectory σ is in κ(ι([[copier]])) precisely when, for all i, there exists some k_i ∈ k such that σ(i) = (k_i, (k_i, k_i)^T).

4.2 Proof of Full Abstraction

We now have the ingredients to prove Theorem 2. First, we prove an adequacy result for (0, 0) circuits.

Proposition 5. Let c ∈ ACirc[0, 0]. Then [[c]] = id_0 if and only if c ↑.

Proof. By Proposition 3, either [[c]] = id_0 or [[c]] = ∅, which, combined with Theorem 3, means that ⟨c⟩ = κ ∘ ι(id_0) or ⟨c⟩ = κ ∘ ι(∅). By definition of ι, this implies that ⟨c⟩ either contains a trajectory or it does not. In the first case c ↑; in the second c /↑.

Next we obtain a result that relates denotational equality in all contexts to equality in aIH. Note that it is not trivial: since we consider ground contexts, it does not make sense to merely consider "identity" contexts. Instead, it is at this point that we make another crucial use of affinity, taking advantage of the increased expressivity of affine circuits, as showcased by Proposition 2.

Proposition 6. If [[C[c]]] = [[C[d]]] for all contexts C[−], then c =_aIH d.

Proof. Suppose that c ≠_aIH d. Then [[c]] ≠ [[d]]. Since both [[c]] and [[d]] are affine relations over k(x), there exists a pair of vectors (u, v) ∈ k(x)^n × k(x)^m that is in one of [[c]] and [[d]], but not both. Assume w.l.o.g. that (u, v) ∈ [[c]] and (u, v) ∉ [[d]]. By Proposition 2, there exist c_u and c_v such that [[c_u ; c ; c_v]] = [[c_u]] ; [[c]] ; [[c_v]] = {(•, u)} ; [[c]] ; {(v, •)}. Since (u, v) ∈ [[c]], it follows that [[c_u ; c ; c_v]] = {(•, •)}. Instead, since (u, v) ∉ [[d]], we have that [[c_u ; d ; c_v]] = ∅. Therefore, for the context C[−] = c_u ; − ; c_v, we have that [[C[c]]] ≠ [[C[d]]].
The proof of our main result is now straightforward.

Proof of Theorem 2. Let us first suppose that c =_aIH d. Then [[C[c]]] = [[C[d]]] for all contexts C[−], since [[·]] is a morphism of props. By Proposition 5, it follows immediately that C[c] ↑ if and only if C[d] ↑, namely c ≡ d. Conversely, suppose that, for all C[−], C[c] ↑ iff C[d] ↑. Again by Proposition 5, we have that [[C[c]]] = [[C[d]]]. We conclude by invoking Proposition 6.

5 Functional Behaviour and Signal Flow Graphs

There is a sub-prop SF of Circ of classical signal flow graphs (see e.g. [21]). Here signal flows left-to-right, possibly featuring feedback loops, provided that these pass through at least one register. Feedback can be captured algebraically via an operation Tr(·) : Circ[n+1, m+1] → Circ[n, m] taking c : n+1 → m+1 to a circuit n → m [diagram omitted: the extra right port is fed back to the extra left port through a register]. Following [9], let Circ→ be the free sub-prop of Circ of circuits built from (3) and the generators of (1), without the affine constant. Then SF is defined as the closure of Circ→ under Tr(·). For instance, the circuit of Example 2 is in SF.

Signal flow graphs are intimately connected to the executability of circuits. In general, the rules of Fig. 2 do not assume a fixed flow orientation. As a result, some circuits in Circ are not executable as functional input-output systems, as we have demonstrated with the circuits of Examples 3-5. Notice that none of these is a signal flow graph. In fact, the circuits of SF do not exhibit pathological behaviour, as we shall state more precisely in Proposition 9.

At the denotational level, signal flow graphs correspond precisely to rational functional behaviours, that is, matrices whose coefficients lie in the ring k⟨x⟩ of rational fractions (see Section 4.1). We call such matrices rational matrices. One may check that the semantics of a signal flow graph c : (n, m) is always of the form [[c]] = {(v, A · v) | v ∈ k(x)^n}, for some m × n rational matrix A.
Conversely, all relations that are graphs of rational matrices can be expressed as signal flow graphs.

Proposition 7. Given c : (n, m), we have [[c]] = {(p, A · p) | p ∈ k(x)^n} for some rational m × n matrix A iff there exists a signal flow graph f, i.e., a circuit f : (n, m) of SF, such that [[f]] = [[c]].

Proof. This is a folklore result in control theory which can be found in [30]. The details of the translation between rational matrices and circuits of SF can be found in [10, Section 7].

The following gives an alternative characterisation of rational matrices, and therefore, by Proposition 7, of the behaviour of signal flow graphs, that clarifies their role as realisations of circuits.

Proposition 8. An m × n matrix A is rational iff A · r ∈ k⟨x⟩^m for all r ∈ k⟨x⟩^n.

Proposition 8 is another guarantee of good behaviour; it justifies the name of inputs (resp. outputs) for the left (resp. right) ports of signal flow graphs. Recall from Section 4.1 that rational fractions can be mapped to Laurent series of nonnegative degree, i.e., to plain power series. Operationally, these correspond to trajectories that start after t = 0. Proposition 8 guarantees that any trajectory of a signal flow graph whose first nonzero value on the left appears at time t = 0 will not have nonzero values on the right starting before time t = 0. In other words, signal flow graphs can be seen as processing a stream of values from left to right. As a result, their ports can be clearly partitioned into inputs and outputs.

But the circuits of SF are too restrictive for our purposes. For example, the affine constant can also be seen to realise a functional behaviour, transforming inputs on the left into outputs on the right, yet it is not in SF. Its behaviour is no longer linear, but affine. Hence, we need to extend signal flow graphs to include functional affine behaviour. The following definition does just that.

Definition 10. Let ASF be the sub-prop of ACirc obtained from all the generators in (1), closed under Tr(·).
Its circuits are called affine signal flow graphs.

As before, none of the circuits of Examples 3-5 is an affine signal flow graph. In fact, ASF rules out pathological behaviour: all computations can be extended to infinite ones; in other words, they do not get stuck.

Proposition 9. Given an affine signal flow graph f, for every computation

t ⊳ f_0 --u_t/v_t--> f_1 --u_{t+1}/v_{t+1}--> ... f_{n+1}

there exists a trajectory σ ∈ ⟨f⟩ such that σ(i) = (u_i, v_i) for t ≤ i ≤ t + n.

Proof. By induction on the structure of affine signal flow graphs.

If SF circuits correspond precisely to k⟨x⟩-matrices, those of ASF correspond precisely to k⟨x⟩-affine transformations.

Definition 11. A map f : k(x)^n → k(x)^m is an affine map if there exist an m × n matrix A and b ∈ k(x)^m such that f(p) = A · p + b for all p ∈ k(x)^n. We call the pair (A, b) the representation of f.

The notion of rational affine map is a straightforward extension of the linear case, and so is the characterisation in terms of rational input-output behaviour.

Definition 12. An affine map f : p ↦ A · p + b is rational if A and b have coefficients in k⟨x⟩.

Proposition 10. An affine map f : k(x)^n → k(x)^m is rational iff f(r) ∈ k⟨x⟩^m for all r ∈ k⟨x⟩^n.

The following extends the correspondence of Proposition 7, showing that ASF is the rightful affine heir of SF.

Proposition 11. Given c : (n, m), we have [[c]] = {(p, f(p)) | p ∈ k(x)^n} for some rational affine map f iff there exists an affine signal flow graph g, i.e., a circuit g : (n, m) of ASF, such that [[g]] = [[c]].

Proof. Let f be given by p ↦ Ap + b for some rational m × n matrix A and vector b ∈ k⟨x⟩^m. By Proposition 7, we can find a circuit c_A of SF such that [[c_A]] = {(p, A · p) | p ∈ k(x)^n}. Similarly, we can represent b as a signal flow graph c_b of sort (1, m). [Diagram omitted: c combines c_A with the affine constant feeding c_b, merging the two contributions with adders.] Then, the resulting circuit is clearly in ASF and satisfies [[c]] = {(p, Ap + b) | p ∈ k(x)^n} as required.
For the converse direction, it is straightforward to check by structural induction that the denotation of an affine signal flow graph is the graph (in the set-theoretic sense of a set of pairs) of some rational affine map.

6 Realisability

In the previous section we gave a restricted class of morphisms with good behavioural properties. We may wonder how much of ACirc we can capture with this restricted class. The answer is, in a precise sense: most of it.

Surprisingly, the behaviours realisable in Circ, the purely linear fragment, are not more expressive. In fact, from an operational (or denotational, by full abstraction) point of view, Circ is nothing more than a jumbled-up version of SF. Indeed, it turns out that Circ enjoys a realisability theorem: any circuit c of Circ can be associated with one of SF that implements, or realises, the behaviour of c in an executable form. But the corresponding realisation may not flow neatly from left to right like signal flow graphs do: its inputs and outputs may have been moved from one side to the other. Consider for example the circuit on the right [diagram omitted]. It does not belong to SF, but it can be read as a signal flow graph with an input that has been bent and moved to the bottom right. The behaviour it realises can therefore be executed by rewiring this port to obtain a signal flow graph [diagram omitted].

We will not make this notion of rewiring precise here but refer the reader to [9] for the details. The intuition is simply that a rewiring partitions the ports of a circuit into two sets, that we call inputs and outputs, and uses bent wires to move input ports to the left and output ports to the right. The realisability theorem then states that we can always recover a (not necessarily unique) signal flow graph from any circuit by performing these operations.

Theorem 4. [9, Theorem 5] Every circuit in Circ is equivalent to the rewiring of a signal flow graph, called its realisation.
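Executability of a realisation can be illustrated operationally: by Proposition 8, a rational fraction (denominator with nonzero constant term) runs as a causal difference equation from inputs to outputs. The following sketch, with illustrative names not from the paper, runs a rational fraction num/den on an input stream.

```python
from fractions import Fraction

def run_rational(num, den, inputs):
    """Run the rational fraction num/den (coefficient lists, lowest
    degree first) as a causal input-output system, via the difference
    equation den[0]*y_t = sum_i num[i]*u_{t-i} - sum_{i>=1} den[i]*y_{t-i}.
    Requires den[0] != 0, i.e. membership in the ring of rational
    fractions (cf. Proposition 8)."""
    assert den[0] != 0, "denominator has zero constant term: not rational"
    us, ys, out = [], [], []   # input/output histories, most recent first
    for u in inputs:
        us.insert(0, Fraction(u))
        acc = sum(Fraction(num[i]) * us[i] for i in range(min(len(num), len(us))))
        acc -= sum(Fraction(den[i]) * ys[i - 1]
                   for i in range(1, len(den)) if i - 1 < len(ys))
        y = acc / Fraction(den[0])
        out.append(y)
        ys.insert(0, y)
    return out

# 1/(1-x): running sum, as in Example 2
print([int(y) for y in run_rational([1], [1, -1], [1, 0, 0, 0])])  # [1, 1, 1, 1]
```

Running the fraction x (a pure delay) reproduces the register's buffering behaviour, while 1/(1−x) reproduces the feedback loop of Example 2.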
This theorem allows us to extend the notion of inputs and outputs to all circuits of Circ.

Definition 13. A port of a circuit c of Circ is an input (resp. output) port if there exists a realisation for which it is an input (resp. output).

Note that, since realisations are not necessarily unique, the same port can be both an input and an output. Then, the realisability theorem (Theorem 4) says that every port is always an input, an output, or both (but never neither). An output-only port is an output port that is not an input port. Similarly, an input-only port is an input port that is not an output port.

Example 8. The left port of the register x is input-only whereas its right port is output-only. In the identity wire, both ports are input and output ports. The single port of the first generator pictured [diagram omitted] is output-only; that of the second [diagram omitted] is input-only.

While in the purely linear case all behaviours are realisable, the general case of ACirc is a bit more subtle. To make this precise, we can extend our definition of realisability to include affine signal flow graphs.

Definition 14. A circuit of ACirc is realisable if its ports can be rewired so that it is equivalent to a circuit of ASF.

Example 9. The first circuit pictured [diagram omitted] is realisable; the second is not.

Notice that Proposition 11 gives the following equivalent semantic criterion for realisability: realisable behaviours are precisely those that map rationals to rationals.

Theorem 5. A circuit c is realisable iff its ports can be partitioned into two sets, that we call inputs and outputs, such that the corresponding rewiring of c is an affine rational map from inputs to outputs.

We offer another perspective on realisability below: realisable behaviours correspond precisely to those for which the constants are connected to inputs of the underlying Circ-circuit. First, notice that, since the equations (1-dup) and (1-del) hold in aIH [diagrams omitted], we can assume without loss of generality that each circuit contains exactly one occurrence of the one-generator.

Proposition 12.
Every circuit c of ACirc is equivalent to one with precisely one occurrence of the one-generator and no occurrence of its mirror image.

For c: (n, m) a circuit of ACirc, we will call ĉ the circuit of Circ of sort (n + 1, m) that one obtains by first transforming c into an equivalent circuit with a single one-generator as above, then removing this generator and replacing it by an identity wire that extends to the left boundary.

Theorem 6. A circuit c is realisable iff the port created in this way is connected to an input port of ĉ.

7 Conclusion and Future Work

We introduced the operational semantics of the affine extension of the signal flow calculus and proved that contextual equivalence coincides with denotational equality, previously introduced and axiomatised in [6]. We have observed that, at the denotational level, affinity provides two key properties (Propositions 2 and 3) for the proof of full abstraction. However, at the operational level, affinity forces us to consider computations starting in the past (Example 3), as the syntax allows terms lacking a proper flow directionality. This leads to circuits that might deadlock (Example 4) or perform problematic computations (Example 5). We have identified a proper subclass of circuits, called affine signal flow graphs (Definition 10), that possess an inherent flow directionality: in these circuits, the same pathological behaviours do not arise (Proposition 9). This class is not too restrictive as it captures all desirable behaviours: a realisability result (Theorem 5) states that all and only the circuits that do not need computations to start in the past are equivalent to (the rewiring of) an affine signal flow graph.

The reader may be wondering why we do not restrict the syntax to affine signal flow graphs. The reason is that, as in the behavioural approach to control theory [33], the lack of flow direction is what allows the (affine) signal flow calculus to achieve a strong form of compositionality and a complete axiomatisation (see [9] for a deeper discussion).
We expect that similar methods and results can be extended to other models of computation. Our next step is to tackle Petri nets, which, as shown in [5], can be regarded as terms of the signal flow calculus, but over N rather than a field.

References

1. Abramsky, S., Coecke, B.: A categorical semantics of quantum protocols. In: Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science (LICS). pp. 415–425. IEEE (2004)
2. Baez, J., Erbele, J.: Categories in control. Theory and Applications of Categories 30, 836–881 (2015)
3. Baez, J.C.: Network theory (2014), website (retrieved 15/04/2014)
4. Basold, H., Bonsangue, M., Hansen, H., Rutten, J.: (Co)Algebraic characterizations of signal flow graphs. In: van Breugel, F., Kashefi, E., Palamidessi, C., Rutten, J. (eds.) Horizons of the Mind. A Tribute to Prakash Panangaden. Lecture Notes in Computer Science, vol. 8464, pp. 124–145. Springer International Publishing (2014)
5. Bonchi, F., Holland, J., Piedeleu, R., Sobociński, P., Zanasi, F.: Diagrammatic algebra: from linear to concurrent systems. Proceedings of the 46th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL) 3, 1–28 (2019)
6. Bonchi, F., Piedeleu, R., Sobociński, P., Zanasi, F.: Graphical affine algebra. In: Proceedings of the 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS). pp. 1–12 (2019)
7. Bonchi, F., Piedeleu, R., Sobociński, P., Zanasi, F.: Contextual equivalence for signal flow graphs (2020)
8. Bonchi, F., Sobociński, P., Zanasi, F.: A categorical semantics of signal flow graphs. In: Proceedings of the 25th International Conference on Concurrency Theory (CONCUR). pp. 435–450. Springer (2014)
9. Bonchi, F., Sobociński, P., Zanasi, F.: Full abstraction for signal flow graphs. In: Proceedings of the 42nd Annual ACM SIGPLAN Symposium on Principles of Programming Languages (POPL). pp. 515–526 (2015)
10.
Bonchi, F., Sobociński, P., Zanasi, F.: The calculus of signal flow diagrams I: linear relations on streams. Information and Computation 252, 2–29 (2017)
11. Coecke, B., Duncan, R.: Interacting quantum observables. In: Proceedings of the 35th International Colloquium on Automata, Languages and Programming (ICALP), Part II. pp. 298–310 (2008)
12. Coecke, B., Kissinger, A.: Picturing Quantum Processes – A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press (2017)
13. De Nicola, R., Hennessy, M.C.: Testing equivalences for processes. Theoretical Computer Science 34(1–2), 83–133 (1984)
14. Ghica, D.R.: Diagrammatic reasoning for delay-insensitive asynchronous circuits. In: Computation, Logic, Games, and Quantum Foundations. The Many Facets of Samson Abramsky, pp. 52–68. Springer (2013)
15. Ghica, D.R., Jung, A.: Categorical semantics of digital circuits. In: Proceedings of the 16th Conference on Formal Methods in Computer-Aided Design (FMCAD). pp. 41–48 (2016)
16. Ghica, D.R., Lopez, A.: A structural and nominal syntax for diagrams. In: Proceedings of the 14th International Conference on Quantum Physics and Logic (QPL). pp. 71–83 (2017)
17. Hoare, C.A.R.: Communicating Sequential Processes. Prentice Hall (1985)
18. Honda, K., Yoshida, N.: On reduction-based process semantics. Theoretical Computer Science 152(2), 437–486 (1995)
19. Mac Lane, S.: Categorical algebra. Bulletin of the American Mathematical Society 71, 40–106 (1965)
20. Mac Lane, S.: Categories for the Working Mathematician. Springer (1998)
21. Mason, S.J.: Feedback Theory: I. Some Properties of Signal Flow Graphs. MIT Research Laboratory of Electronics (1953)
22. Milius, S.: A sound and complete calculus for finite stream circuits. In: Proceedings of the 25th Annual IEEE Symposium on Logic in Computer Science (LICS). pp. 421–430 (2010)
23. Milner, R.: A Calculus of Communicating Systems. Lecture Notes in Computer Science, vol. 92. Springer (1980)
24.
Milner, R., Sangiorgi, D.: Barbed bisimulation. In: Proceedings of the 19th International Colloquium on Automata, Languages and Programming (ICALP). pp. 685–695 (1992)
25. Morris Jr., J.H.: Lambda-calculus models of programming languages. Ph.D. thesis, Massachusetts Institute of Technology (1969)
26. Pavlovic, D.: Monoidal computer I: Basic computability by string diagrams. Information and Computation 226, 94–116 (2013)
27. Pavlovic, D.: Monoidal computer II: Normal complexity by string diagrams. arXiv:1402.5687 (2014)
28. Plotkin, G.D.: Call-by-name, call-by-value and the λ-calculus. Theoretical Computer Science 1(2), 125–159 (1975)
29. Rutten, J.J.M.M.: A tutorial on coinductive stream calculus and signal flow graphs. Theoretical Computer Science 343(3), 443–481 (2005)
30. Rutten, J.J.M.M.: Rational streams coalgebraically. Logical Methods in Computer Science 4(3) (2008)
31. Selinger, P.: A survey of graphical languages for monoidal categories. Springer Lecture Notes in Physics 813, 289–355 (2011)
32. Shannon, C.E.: The theory and design of linear differential equation machines. Tech. rep., National Defence Research Council (1942)
33. Willems, J.C.: The behavioural approach to open and interconnected systems. IEEE Control Systems Magazine 27, 46–99 (2007)

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material.
If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Parameterized Synthesis for Fragments of First-Order Logic over Data Words

Béatrice Bérard¹, Benedikt Bollig², Mathieu Lehaut¹, and Nathalie Sznajder¹

¹ Sorbonne Université, CNRS, LIP6, F-75005 Paris, France
² CNRS, LSV & ENS Paris-Saclay, Université Paris-Saclay, Cachan, France

Abstract. We study the synthesis problem for systems with a parameterized number of processes. As in the classical case due to Church, the system selects actions depending on the program run so far, with the aim of fulfilling a given specification. The difficulty is that, at the same time, the environment executes actions that the system cannot control. In contrast to the case of fixed, finite alphabets, here we consider the case of parameterized alphabets. An alphabet reflects the number of processes, which is static but unknown. The synthesis problem then asks whether there is a finite number of processes for which the system can satisfy the specification. This variant is already undecidable for very limited logics. Therefore, we consider a first-order logic without the order on word positions. We show that even in this restricted case synthesis is undecidable if both the system and the environment have access to all processes. On the other hand, we prove that the problem is decidable if the environment only has access to a bounded number of processes. In that case, there is even a cutoff, meaning that it is enough to examine a bounded number of process architectures to solve the synthesis problem.

1 Introduction

Synthesis deals with the problem of automatically generating a program that satisfies a given specification.
The problem goes back to Church [9], who formulated it as follows: The environment and the system alternately select an input symbol and an output symbol from a finite alphabet, respectively, and in this way generate an infinite sequence. The question now is whether the system has a winning strategy, which guarantees that the resulting infinite run is contained in a given ω-regular language representing the specification, no matter how the environment behaves. This problem is decidable and very well understood [8,37], and it has been extended in several different ways (e.g., [24, 26, 28, 36, 43]).

In this paper, we consider a variant of the synthesis problem that allows us to model programs with a variable number of processes. As we then deal with an unbounded number of process identifiers, a fixed finite alphabet is not suitable anymore. It is more appropriate to use an infinite alphabet, in which every letter contains a process identifier and a program action. One can distinguish two cases here. In [16], a potentially infinite number of data values are involved in an infinite program run (e.g., by dynamic process generation). In a parameterized system [4, 13], on the other hand, one has an unknown but static number of processes so that, along each run, the number of processes is finite. In this paper, we are interested in the latter, i.e., parameterized case. Parameterized programs are ubiquitous and occur, e.g., in distributed algorithms, ad-hoc networks, telecommunication protocols, cache-coherence protocols, swarm robotics, and biological systems. The synthesis question asks whether the system has a winning strategy for some number of processes (existential version) or no matter how many processes there are (universal version).

Partly supported by ANR FREDDA (ANR-17-CE40-0013).
© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 97–118, 2020.
Over infinite alphabets, there are a variety of different specification languages (e.g., [5, 11, 12, 19, 29, 33, 40]). Unlike in the case of finite alphabets, there is no canonical definition of regular languages. In fact, the synthesis problem has been studied for N-memory automata [7], the Logic of Repeating Values [16], and register automata [15,30,31]. Though there is no agreement on a "regular" automata model, first-order (FO) logic over data words can be considered a canonical logic, and this is the specification language we consider here. In addition to classical FO logic on words over finite alphabets, it provides a predicate x ∼ y to express that two events x and y are triggered by the same process. Its two-variable fragment FO² has a decidable emptiness and universality problem [5] and is, therefore, a promising candidate for the synthesis problem.

Previous generalizations of Church's synthesis problem to infinite alphabets were generally synchronous in the sense that the system and the environment perform their actions in strictly alternating order. This assumption was made, e.g., in the above-mentioned recent papers [7, 15, 16, 30, 31]. If there are several processes, however, it is realistic to relax this condition, which leads us to an asynchronous setting in which the system has no influence on when the environment acts. Like in [21], where the asynchronous case for a fixed number of processes was considered, we only make the reasonable fairness assumption that the system is not blocked forever.

In summary, the synthesis problem over infinite alphabets can be classified as (i) parameterized vs. dynamic, (ii) synchronous vs. asynchronous, and (iii) according to the specification language (register automata, Logic of Repeating Values, FO logic, etc.). As explained above, we consider here the parameterized asynchronous case for specifications written in FO logic. To the best of our knowledge, this combination has not been considered before.
For flexible modeling, we also distinguish between three types of processes: those that can only be controlled by the system; those that can only be controlled by the environment; and finally those that can be triggered by both. A partition into system and environment processes is also made in [3,18], but for a fixed number of processes and in the presence of an arena in terms of a Petri net.

Let us briefly describe our results. We show that the general case of the synthesis problem is undecidable for FO² logic. This follows from an adaptation of an undecidability result from [16,17] for a fragment of the Logic of Repeating Values [11]. We therefore concentrate on an orthogonal logic, namely FO without the order on the word positions. First, we show that this logic can essentially count processes and actions of a given process up to some threshold. Though it has limited expressive power (albeit orthogonal to that of FO²), it leads to intricate behaviors in the presence of an uncontrollable environment. In fact, we show that the synthesis problem is still undecidable. Due to the lack of the order relation, the proof requires a subtle reduction from the reachability problem in 2-counter Minsky machines. However, it turns out that the synthesis problem is decidable if the number of processes that are controllable by the environment is bounded, while the number of system processes remains unbounded. In this case, there is even a cutoff k, an important measure for parameterized systems (cf. [4] for an overview): If the system has a winning strategy for k processes, then it has one for any number of processes greater than k, and the same applies to the environment. The proofs of both main results rely on a reduction of the synthesis problem to turn-based parameterized vector games, in which, as in Petri nets, tokens corresponding to processes are moved around between states.
The paper is structured as follows. In Section 2, we define FO logic (especially FO without word order), and in Section 3, we present the parameterized synthesis problem. In Section 4, we transform a given formula into a normal form and finally into a parameterized vector game. Based on this reduction, we investigate cutoff properties and show our (un)decidability results in Section 5. We conclude in Section 6. Some proof details can be found in the long version of this paper [2].

2 Preliminaries

For a finite or infinite alphabet Σ, let Σ* and Σ^ω denote the sets of finite and, respectively, infinite words over Σ. The empty word is ε. Given w ∈ Σ* ∪ Σ^ω, let |w| denote the length of w and Pos(w) its set of positions: |w| = n and Pos(w) = {1,...,n} if w = σ_1 σ_2 ... σ_n ∈ Σ*, and |w| = ω and Pos(w) = {1, 2, ...} if w ∈ Σ^ω. Let w[i] be the i-th letter of w for all i ∈ Pos(w).

Executions. We consider programs involving a finite (but not fixed) number of processes. Processes are controlled by antagonistic protagonists, System and Environment. Accordingly, each process has a type among T = {s, e, se}, and we let P_s, P_e, and P_se denote the pairwise disjoint finite sets of processes controlled by System, by Environment, and by both System and Environment, respectively. We let P denote the triple (P_s, P_e, P_se). Abusing notation, we sometimes refer to P as the disjoint union P_s ∪ P_e ∪ P_se.

Given any set S, vectors s ∈ S^T are usually referred to as triples s = (s_s, s_e, s_se). Moreover, for s, s' ∈ N^T, we write s ≤ s' if s_θ ≤ s'_θ for all θ ∈ T. Finally, let s + s' = (s_s + s'_s, s_e + s'_e, s_se + s'_se).

Processes can execute actions from a finite alphabet A. Whenever an action is executed, we would like to know whether it was triggered by System or by Environment. Therefore, A is partitioned into A = A_s ⊎ A_e. Let Σ_s = A_s × (P_s ∪ P_se) and Σ_e = A_e × (P_e ∪ P_se). Their union Σ = Σ_s ∪ Σ_e is the set of events. A word w ∈ Σ* ∪ Σ^ω is called a P-execution.
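The componentwise order and sum on triples from N^T can be transcribed directly; a small sketch (the class and function names are hypothetical, introduced only for illustration):

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """A vector s in N^T with T = {s, e, se}."""
    s: int
    e: int
    se: int

def leq(s, t):
    """s <= t iff s_theta <= t_theta for every theta in T."""
    return all(a <= b for a, b in zip(s, t))

def add(s, t):
    """Componentwise sum (s_s + t_s, s_e + t_e, s_se + t_se)."""
    return Triple(*(a + b for a, b in zip(s, t)))

assert leq(Triple(1, 0, 2), Triple(2, 0, 2))
assert not leq(Triple(1, 1, 0), Triple(0, 5, 5))
assert add(Triple(1, 0, 2), Triple(1, 1, 1)) == Triple(2, 1, 3)
```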
Fig. 1. Representation of a P-execution as a mathematical structure (with A_s = {a, b} and A_e = {c, d}) [figure omitted]

Logic. Formulas of our logic are evaluated over P-executions. We fix an infinite supply V = {x, y, z, ...} of variables, which are interpreted as processes from P or positions of the execution. The logic FO_A[∼, <, +1] is given by the grammar

ϕ ::= θ(x) | a(x) | x = y | x ∼ y | x < y | +1(x, y) | ¬ϕ | ϕ ∨ ϕ | ∃x.ϕ

where x, y ∈ V, θ ∈ T, and a ∈ A. Conjunction (∧), universal quantification (∀), implication (⟹), true, and false are obtained as abbreviations as usual. Let ϕ ∈ FO_A[∼, <, +1]. By Free(ϕ) ⊆ V, we denote the set of variables that occur free in ϕ. If Free(ϕ) = ∅, then we call ϕ a sentence. We sometimes write ϕ(x_1,...,x_n) to emphasize the fact that Free(ϕ) ⊆ {x_1,...,x_n}.

To evaluate ϕ over a P-execution w = (a_1, p_1)(a_2, p_2)..., we consider (P, w) as a structure S_(P,w) = (P ⊎ Pos(w), P_s, P_e, P_se, (R_a)_{a∈A}, ∼, <, +1) where P ⊎ Pos(w) is the universe, P_s, P_e, and P_se are interpreted as unary relations, R_a is the unary relation {i ∈ Pos(w) | a_i = a}, < = {(i, j) ∈ Pos(w) × Pos(w) | i < j}, +1 = {(i, i + 1) | 1 ≤ i < |w|}, and ∼ is the smallest equivalence relation over P ⊎ Pos(w) containing

– (p, i) for all p ∈ P and i ∈ Pos(w) such that p = p_i, and
– (i, j) for all (i, j) ∈ Pos(w) × Pos(w) such that p_i = p_j.

An equivalence class of ∼ is often simply referred to as a class. Note that it contains exactly one process.

Example 1. Suppose A_s = {a, b} and A_e = {c, d}. Let the set of processes P be given by P_s = {1, 2, 3}, P_e = {4, 5}, and P_se = {6, 7, 8}. Moreover, let w = (a,1)(b,8)(d,7)(c,4)(a,6)(c,6)(a,7)(d,6)(b,2)(d,7)(a,7) ∈ Σ*. Figure 1 illustrates S_(P,w). The edge relation represents +1; its transitive closure is <.

An interpretation for (P, w) is a partial mapping I : V → P ∪ Pos(w). Suppose ϕ ∈ FO_A[∼, <, +1] such that Free(ϕ) ⊆ dom(I).
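The classes of ∼ in Example 1 can be computed mechanically: each class collects one process together with the positions it executes. A sketch with 1-based positions, as in Pos(w) (function and tag names are hypothetical):

```python
# The execution w of Example 1 and its set of processes.
w = [('a', 1), ('b', 8), ('d', 7), ('c', 4), ('a', 6), ('c', 6),
     ('a', 7), ('d', 6), ('b', 2), ('d', 7), ('a', 7)]
P = {1, 2, 3, 4, 5, 6, 7, 8}

def classes(P, w):
    """Each ~-class contains exactly one process p together with
    the positions i (1-based) such that w[i] is executed by p."""
    cls = {p: {('proc', p)} for p in P}
    for i, (_, p) in enumerate(w, start=1):
        cls[p].add(('pos', i))
    return cls

c = classes(P, w)
# Process 3 is idle: its class contains no position.
assert c[3] == {('proc', 3)}
# Process 7 executes positions 3, 7, 10, 11.
assert {i for (k, i) in c[7] if k == 'pos'} == {3, 7, 10, 11}
```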
The satisfaction relation (P, w), I |= ϕ is then defined as expected, based on the structure S_(P,w) and interpreting free variables according to I. For example, let w = (a_1, p_1)(a_2, p_2)... and i ∈ Pos(w). Then, for I(x) = i, we have (P, w), I |= a(x) iff a_i = a.

We identify some fragments of FO_A[∼, <, +1]. For R ⊆ {∼, <, +1}, let FO_A[R] denote the set of formulas that do not use symbols in {∼, <, +1} \ R. Moreover, FO²_A[R] denotes the fragment of FO_A[R] that uses only two (reusable) variables.

Let ϕ(x_1,...,x_n, y) ∈ FO_A[∼, <, +1] be a formula and m ∈ N. We use ∃^{≥m} y.ϕ(x_1,...,x_n, y) as an abbreviation for

∃y_1 ... ∃y_m. ⋀_{1≤i<j≤m} ¬(y_i = y_j) ∧ ⋀_{1≤i≤m} ϕ(x_1,...,x_n, y_i)

if m > 0, and ∃^{≥0} y.ϕ(x_1,...,x_n, y) = true. Thus, ∃^{≥m} y.ϕ says that there are at least m distinct elements that verify ϕ. We also use ∃^{=m} y.ϕ as an abbreviation for ∃^{≥m} y.ϕ ∧ ¬∃^{≥m+1} y.ϕ. Note that ϕ ∈ FO_A[R] implies that ∃^{≥m} y.ϕ ∈ FO_A[R] and ∃^{=m} y.ϕ ∈ FO_A[R].

Example 2. Let A, P, and w be as in Example 1 and Figure 1.
– ϕ_1 = ∀x. (s(x) ∨ se(x)) ⟹ ∃y.(x ∼ y ∧ (a(y) ∨ b(y))) says that each process that System can control executes at least one system action. We have ϕ_1 ∈ FO²_A[∼] and (P, w) ⊭ ϕ_1, as process 3 is idle.
– ϕ_2 = ∀x. d(x) ⟹ ∃y.(x ∼ y ∧ a(y)) says that, for every d, there is an a on the same process. We have ϕ_2 ∈ FO²_A[∼] and (P, w) |= ϕ_2.
– ϕ_3 = ∀x. d(x) ⟹ ∃y.(x ∼ y ∧ x < y ∧ a(y)) says that every d is eventually followed by an a executed by the same process. We have ϕ_3 ∈ FO²_A[∼, <] and (P, w) ⊭ ϕ_3: The event (d, 6) is not followed by some (a, 6).
– ϕ_4 = ∀x. ∃^{=2} y.(x ∼ y ∧ a(y)) ⟺ ∃^{=2} y.(x ∼ y ∧ d(y)) says that each class contains exactly two occurrences of a iff it contains exactly two occurrences of d. Moreover, ϕ_4 ∈ FO_A[∼] and (P, w) |= ϕ_4. Note that ϕ_4 ∉ FO²_A[∼], as ∃^{=2} y requires the use of three different variable names.

3 Parameterized Synthesis Problem

We define an asynchronous synthesis problem.
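The order-free formulas of Example 2 only count letters within a class, so they can be checked on the execution of Example 1 by per-process letter counts. A sketch of ϕ_2 and ϕ_4 (idle classes, which satisfy both formulas vacuously, are skipped; names are hypothetical):

```python
from collections import Counter

w = [('a', 1), ('b', 8), ('d', 7), ('c', 4), ('a', 6), ('c', 6),
     ('a', 7), ('d', 6), ('b', 2), ('d', 7), ('a', 7)]

def letter_counts(w):
    """Per process, how often each letter occurs in its class."""
    counts = {}
    for a, p in w:
        counts.setdefault(p, Counter())[a] += 1
    return counts

counts = letter_counts(w)
# phi_2: every d has an a on the same process.
phi2 = all(c['a'] >= 1 for c in counts.values() if c['d'] >= 1)
# phi_4: exactly two a's in a class iff exactly two d's.
phi4 = all((c['a'] == 2) == (c['d'] == 2) for c in counts.values())
assert phi2 and phi4
```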
A P-strategy (for System) is a mapping f : Σ* → Σ_s ∪ {ε}. A P-execution w = σ_1 σ_2 ... ∈ Σ* ∪ Σ^ω is f-compatible if, for all i ∈ Pos(w) such that σ_i ∈ Σ_s, we have f(σ_1 ... σ_{i−1}) = σ_i. We call w f-fair if the following hold: (i) If w is finite, then f(w) = ε, and (ii) if w is infinite and f(σ_1 ... σ_{i−1}) ≠ ε for infinitely many i ≥ 1, then σ_j ∈ Σ_s for infinitely many j ≥ 1. Let ϕ ∈ FO_A[∼, <, +1] be a sentence. We say that f is P-winning for ϕ if, for every P-execution w that is f-compatible and f-fair, we have (P, w) |= ϕ.

The existence of a P-strategy that is P-winning for a given formula does not depend on the concrete process identities but only on the cardinality of the sets P_s, P_e, and P_se. This motivates the following definition of winning triples for a formula. Given ϕ, let Win(ϕ) be the set of triples (k_s, k_e, k_se) ∈ N^T for which there is P = (P_s, P_e, P_se) such that |P_θ| = k_θ for all θ ∈ T and there is a P-strategy that is P-winning for ϕ.

Let 0 = {0} and k_e, k_se ∈ N. In this paper, we focus on the intersection of Win(ϕ) with the sets N × 0 × 0 (which corresponds to the usual satisfiability problem); N × {k_e} × {k_se} (there is a constant number of environment and mixed processes); N × N × {k_se} (there is a constant number of mixed processes); 0 × 0 × N (each process is controlled by both System and Environment).

Definition 3 (synthesis problem). For fixed F ∈ {FO, FO²}, set of relation symbols R ⊆ {∼, <, +1}, and N_s, N_e, N_se ⊆ N, the (parameterized) synthesis problem is given as follows:

Synth(F[R], N_s, N_e, N_se)
Input: A = A_s ⊎ A_e and a sentence ϕ ∈ F_A[R]
Question: Win(ϕ) ∩ (N_s × N_e × N_se) ≠ ∅ ?

The satisfiability problem for F[R] is defined as Synth(F[R], N, 0, 0).

Example 4. Suppose A_s = {a, b} and A_e = {c, d}, and consider the formulas ϕ_1–ϕ_4 from Example 2. First, we have Win(ϕ_1) = N^T.
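To make these definitions concrete, here is a sketch of the compatibility check together with one System strategy that answers every pending d with an a on the same process (all names are hypothetical; None plays the role of ε):

```python
A_s, A_e = {'a', 'b'}, {'c', 'd'}

def pending(w):
    """Queue of processes with an unacknowledged d, oldest first:
    a (d, p) enqueues p; an (a, p) matching the head dequeues it."""
    u = []
    for a, p in w:
        if a == 'd':
            u.append(p)
        elif a == 'a' and u and u[0] == p:
            u.pop(0)
    return u

def f(w):
    """Acknowledge the oldest pending d, if any."""
    u = pending(w)
    return ('a', u[0]) if u else None

def f_compatible(w, f):
    """Every system event sigma_i must equal f(sigma_1 ... sigma_{i-1})."""
    return all(f(w[:i]) == s for i, s in enumerate(w) if s[0] in A_s)

w = [('d', 7), ('a', 7), ('d', 6), ('d', 7), ('a', 6)]
assert f_compatible(w, f)                  # every a was proposed by f
assert f(w) == ('a', 7)                    # one d is still pending
assert not f_compatible([('d', 7), ('a', 6)], f)
```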
Given an arbitrary P and any total order over P_s ∪ P_se, a possible P-strategy f that is P-winning for ϕ_1 maps w ∈ Σ* to (a, p) if p is the smallest process from P_s ∪ P_se wrt. this order that does not occur in w, and returns ε for w if all processes from P_s ∪ P_se already occur in w.

For the three formulas ϕ_2, ϕ_3, and ϕ_4, observe that, since d is an environment action, if there is at least one process that is exclusively controlled by Environment, then there is no winning strategy. Hence we must have P_e = ∅. In fact, this condition is sufficient in the three cases, and the strategies described below show that all three sets Win(ϕ_2), Win(ϕ_3), and Win(ϕ_4) are equal to N × 0 × N.
– For ϕ_2, the very same strategy as for ϕ_1 also works in this case, producing an a for every process in P_s ∪ P_se, whether there is a d or not.
– For ϕ_3, a winning strategy f' will apply the previous mechanism iteratively, performing (a, p) for p ∈ P_se = {p_0,...,p_{n−1}} over and over again: f'(w) = (a, p_i) where i is the number of occurrences of letters from Σ_s modulo n. By the fairness assumption, this guarantees satisfaction of ϕ_3. A more "economical" winning strategy f'' may organize pending requests in terms of d in a queue and acknowledge them successively. More precisely, given u ∈ P_se* and σ ∈ Σ, we define another word u ⊲ σ ∈ P_se* by u ⊲ (d, p) = u·p (inserting p in the queue) and (p·u) ⊲ (a, p) = u (deleting it). In all other cases, u ⊲ σ = u. Let w = σ_1 ... σ_n ∈ Σ*, with queue ((ε ⊲ σ_1) ⊲ σ_2 ...) ⊲ σ_n = p_1 ... p_k. We let f''(w) = ε if k = 0, and f''(w) = (a, p_1) if k ≥ 1.
– For ϕ_4, the strategy f'' for ϕ_3 ensures that every d has a corresponding a so that, in the long run, there are as many a's as d's in every class.

Another interesting question is whether System (or Environment) has a winning strategy as soon as the number of processes is big enough. This leads to the notion of a cutoff (cf. [4] for an overview): Let N_s, N_e, N_se ⊆ N and W ⊆ N^T. We call k_0 ∈ N^T a cutoff of W wrt.
(N_s, N_e, N_se) if k_0 ∈ N_s × N_e × N_se and either
– for all k ∈ N_s × N_e × N_se such that k ≥ k_0, we have k ∈ W, or
– for all k ∈ N_s × N_e × N_se such that k ≥ k_0, we have k ∉ W.

Let F ∈ {FO, FO²} and R ⊆ {∼, <, +1}. If, for every alphabet A = A_s ⊎ A_e and every sentence ϕ ∈ F_A[R], the set Win(ϕ) has a computable cutoff wrt. (N_s, N_e, N_se), then we know that Synth(F[R], N_s, N_e, N_se) is decidable, as it can be reduced to a finite number of simple synthesis problems over a finite alphabet. The latter can be solved, e.g., using attractor-based backward search (cf. [42]). This is how we will show decidability of Synth(FO[∼], N, {k_e}, {k_se}) for all k_e, k_se ∈ N.

Table 1. Summary of results. Our contributions are highlighted in bold.

Synthesis      | (N, 0, 0)       | (N, {k_e}, {k_se}) | (N, N, 0) | (0, 0, N)
FO²[∼, <, +1]  | decidable [5]   | ?                  | ?         | undecidable
FO²[∼, <]      | NEXPTIME-c. [5] | ?                  | ?         | ?
FO[∼]          | decidable       | decidable          | ?         | undecidable

We show, however, that there is no cutoff.

Our contributions are summarized in Table 1. Note that known satisfiability results for data logic apply to our logic, as processes can be simulated by treating every θ ∈ T as an ordinary letter. Let us first state undecidability of the general synthesis problem, which motivates the study of other FO fragments.

Theorem 5. The problem Synth(FO²[∼, <, +1], 0, 0, N) is undecidable.

Proof (sketch). We adapt the proof from [16, 17] reducing the halting problem for 2-counter machines. We show that their encoding can be expressed in our logic, even if we restrict it to two variables, and can also be adapted to the asynchronous setting.

4 FO[∼] and Parameterized Vector Games

Due to the undecidability result of Theorem 5, one has to switch to other fragments of first-order logic.
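The attractor-based backward search mentioned above, which solves each of the finitely many finite-alphabet games obtained from a cutoff, can be sketched generically for turn-based reachability games (the function names and the toy game are hypothetical illustrations):

```python
def attractor(states, owner, succ, target):
    """States from which the System player (owner[q] == 's') can force
    a visit to `target` in a finite turn-based game: iterate to a
    fixed point, adding System states with SOME successor already won
    and Environment states whose successors are ALL already won."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for q in states:
            if q in attr:
                continue
            nxt = succ[q]
            if owner[q] == 's':
                win = any(r in attr for r in nxt)
            else:
                win = bool(nxt) and all(r in attr for r in nxt)
            if win:
                attr.add(q)
                changed = True
    return attr

# Tiny game: from q0, System may move to q1 (good) or q2 (bad).
states = {'q0', 'q1', 'q2'}
owner = {'q0': 's', 'q1': 'e', 'q2': 'e'}
succ = {'q0': {'q1', 'q2'}, 'q1': {'q1'}, 'q2': {'q2'}}
assert attractor(states, owner, succ, {'q1'}) == {'q0', 'q1'}
```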
We will henceforth focus on the logic FO[∼] and establish some important properties, such as a normal form, that will allow us to deduce a couple of results, both positive and negative.

4.1 Satisfiability and Normal Form for FO[∼]

We first show that FO[∼] essentially allows one to count letters in a class up to some threshold, and to count such classes up to some other threshold. Let B ∈ N and ℓ ∈ {0,...,B}^A. Intuitively, ℓ(a) imposes a constraint on the number of occurrences of a in a class. We first define an FO_A[∼]-formula ψ_{B,ℓ}(y) verifying that, in the class defined by y, the number of occurrences of each letter a ∈ A, counted up to B, is ℓ(a):

ψ_{B,ℓ}(y) = ⋀_{a∈A | ℓ(a)<B} ∃^{=ℓ(a)} z.(y ∼ z ∧ a(z)) ∧ ⋀_{a∈A | ℓ(a)=B} ∃^{≥ℓ(a)} z.(y ∼ z ∧ a(z))

Theorem 6 (normal form for FO[∼]). Let ϕ ∈ FO_A[∼] be a sentence. There is a computable B ∈ N such that ϕ is effectively equivalent to a disjunction of conjunctions of formulas of the form ∃^{⋈m} y.(θ(y) ∧ ψ_{B,ℓ}(y)) where ⋈ ∈ {≥, =}, m ∈ N, θ ∈ T, and ℓ ∈ {0,...,B}^A.

The normal form can be obtained using known normal-form constructions [23,41] for general FO logic [2], or using Ehrenfeucht-Fraïssé games [39], or using a direct inductive transformation in the spirit of [23].

Example 7. Recall the formula ϕ_4 = ∀x. ∃^{=2} y.(x ∼ y ∧ a(y)) ⟺ ∃^{=2} y.(x ∼ y ∧ d(y)) ∈ FO_A[∼] from Example 2, over A_s = {a, b} and A_e = {c, d}. An equivalent formula in normal form is ϕ'_4 = ⋀_{θ∈T, ℓ∈Z} ∃^{=0} y.(θ(y) ∧ ψ_{3,ℓ}(y)) where Z is the set of vectors ℓ ∈ {0,...,3}^A such that ℓ(a) = 2 ≠ ℓ(d) or ℓ(d) = 2 ≠ ℓ(a). The formula indeed says that there is no class with exactly two occurrences of a but not exactly two occurrences of d, or vice versa, which is equivalent to ϕ_4.

Thanks to the normal form, it is sufficient to test finitely many structures to determine whether a given formula is satisfiable:

Corollary 8. The satisfiability problem for FO[∼] over data words is decidable. Moreover, every satisfiable FO_A[∼] formula has a finite model.
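The formula ψ_{B,ℓ} only inspects letter counts capped at B, so it can be evaluated on a class abstracted to its multiset of letters. A sketch (function names are hypothetical):

```python
from collections import Counter

def psi(B, ell, class_counts):
    """Does the class match ell: exactly ell[a] occurrences of a
    when ell[a] < B, and at least B occurrences when ell[a] == B?"""
    return all(
        class_counts[a] >= B if ell[a] == B else class_counts[a] == ell[a]
        for a in ell)

# B = 3, as in Example 7 (A = {a, b, c, d}).
ell = {'a': 2, 'b': 0, 'c': 0, 'd': 2}
assert psi(3, ell, Counter({'a': 2, 'd': 2}))
assert not psi(3, ell, Counter({'a': 1, 'd': 2}))

ell_sat = {'a': 3, 'b': 0, 'c': 0, 'd': 0}
assert psi(3, ell_sat, Counter({'a': 5}))   # "at least B" branch
```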
Note that the satisfiability problem for FO²[∼] is already NEXPTIME-hard, due to NEXPTIME-hardness for two-variable logic with unary relations only [14, 20, 22]. In fact, it is NEXPTIME-complete due to the upper bound for FO²[∼, <] [5]. It is worth mentioning that two-variable logic with one equivalence relation on arbitrary structures also has the finite-model property [32].

4.2 From Synthesis to Parameterized Vector Games

Exploiting the normal form for FO_A[∼], we now present a reduction of the synthesis problem to a strictly turn-based two-player game. This game is conceptually simpler and easier to reason about. The reduction works in both directions, which will allow us to derive both decidability and undecidability results.

Note that, given a formula ϕ ∈ FO_A[∼] (which we suppose to be in normal form with threshold B), the order of letters in an execution does not matter. Thus, given some P, a reasonable strategy for Environment would be to just "wait and see". More precisely, it does not put Environment into a worse position if, given the current execution w ∈ Σ*, it lets System execute as many actions as it wants in terms of a word u ∈ Σ_s*. Due to the fairness assumption, System would be able to execute all the letters from u anyway. Environment can even require System to play a word u such that (P, wu) |= ϕ. If System is not able to produce such a word, Environment can just sit back and do nothing. Conversely, upon wu satisfying ϕ, Environment has to be able to come up with a word v ∈ Σ_e* such that (P, wuv) ⊭ ϕ. This leads to a turn-based game in which System and Environment play in strictly alternating order and have to provide a satisfying and, respectively, falsifying execution.

In a second step, we can get rid of process identifiers: According to our normal form, all we are interested in is the number of processes that agree on their letters counted up to threshold B.
That is, a finite execution can be abstracted as a configuration C : L → N³ where L = {0,…,B}^A. For ℓ ∈ L and C(ℓ) = (n_s, n_e, n_se), n_θ is the number of processes of type θ whose letter count up to threshold B corresponds to ℓ. We also say that ℓ contains n_θ tokens of type θ. If it is System's turn, it will pick some pairs (ℓ, ℓ') and move some tokens of type θ ∈ {s, se} from ℓ to ℓ', provided ℓ(a) ≤ ℓ'(a) for all a ∈ A_s and ℓ(a) = ℓ'(a) for all a ∈ A_e. This actually corresponds to adding more system letters in the corresponding processes. Environment proceeds analogously. Finally, the formula ϕ naturally translates into an acceptance condition F ⊆ C^L over configurations, where C is the set of local acceptance conditions, which are of the form (▷◁_s n_s, ▷◁_e n_e, ▷◁_se n_se) where ▷◁_s, ▷◁_e, ▷◁_se ∈ {=, ≥} and n_s, n_e, n_se ∈ N.

We end up with a turn-based game in which, similarly to a VASS game [1,6,10,27,38], System and Environment move tokens along vectors from L. Note, however, that our games have a very particular structure, so that undecidability for VASS games does not carry over to our setting. Moreover, existing decidability results do not allow us to infer our cutoff results below. In the following, we formalize parameterized vector games.

Definition 9. A parameterized vector game (or simply game) is a triple G = (A, B, F) where A = A_s ⊎ A_e is the finite alphabet, B ∈ N is a bound, and, letting L = {0,…,B}^A, F ⊆ C^L is a finite set called the acceptance condition.

Locations. Let ℓ_0 be the location such that ℓ_0(a) = 0 for all a ∈ A. For ℓ ∈ L and a ∈ A, we define ℓ + a by (ℓ + a)(b) = ℓ(b) for b ≠ a and (ℓ + a)(a) = min{ℓ(a) + 1, B}. This is extended to all u ∈ A* and a ∈ A by ℓ + ε = ℓ and ℓ + ua = (ℓ + u) + a. By ⟪w⟫, we denote the location ℓ_0 + w.

Configurations. As explained above, a configuration of G is a mapping C : L → N³. Suppose that, for ℓ ∈ L and θ ∈ T, we have C(ℓ) = (n_s, n_e, n_se). Then, we let C(ℓ, θ) refer to n_θ. By Conf, we denote the set of all configurations.
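The capped sum ℓ + w and the notation ⟪w⟫ can be sketched as follows (an illustrative encoding; the concrete alphabet, the value of B, and the dictionary representation of locations are assumptions of this sketch):

```python
# Sketch: locations of a parameterized vector game, i.e. letter-count
# vectors capped at the threshold B.
B = 3
ALPHABET = ("a", "d")  # A = A_s ∪ A_e; concrete letters are illustrative

def loc0():
    """The initial location ℓ0, mapping every letter to 0."""
    return {a: 0 for a in ALPHABET}

def add_letter(loc, a):
    """ℓ + a: increment the count of a, capped at B."""
    new = dict(loc)
    new[a] = min(loc[a] + 1, B)
    return new

def loc_of_word(w):
    """⟪w⟫ = ℓ0 + w, the location reached by playing the word w."""
    loc = loc0()
    for a in w:
        loc = add_letter(loc, a)
    return loc
```

For instance, with B = 3, the word aadddd lands in the location with two a's and (capped) three d's; once a count reaches B, further occurrences of that letter are indistinguishable.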
Transitions. A system transition (respectively environment transition) is a mapping τ : L × L → N × {0} × N (respectively τ : L × L → {0} × N × N) such that, for all (ℓ, ℓ') ∈ L × L with τ(ℓ, ℓ') ≠ (0, 0, 0), there is a word w ∈ A_s^* (respectively w ∈ A_e^*) such that ℓ' = ℓ + w. Let T_s denote the set of system transitions, T_e the set of environment transitions, and T = T_s ∪ T_e the set of all transitions.

For τ ∈ T, let the mappings out_τ, in_τ : L → N³ be defined by out_τ(ℓ) = Σ_{ℓ'∈L} τ(ℓ, ℓ') and in_τ(ℓ) = Σ_{ℓ'∈L} τ(ℓ', ℓ) (recall that sums are component-wise). We say that τ ∈ T is applicable at C ∈ Conf if, for all ℓ ∈ L, we have out_τ(ℓ) ≤ C(ℓ) (component-wise). Abusing notation, we let τ(C) denote the configuration C' defined by C'(ℓ) = C(ℓ) − out_τ(ℓ) + in_τ(ℓ) for all ℓ ∈ L. Moreover, for τ(ℓ, ℓ') = (n_s, n_e, n_se) and θ ∈ {s, e, se}, we let τ(ℓ, ℓ', θ) refer to n_θ.

Plays. Let C ∈ Conf. We write C |= F if there is κ ∈ F such that, for all ℓ ∈ L, we have C(ℓ) |= κ(ℓ) (in the expected manner). A C-play, or simply play, is a finite sequence π = C_0 τ_1 C_1 τ_2 C_2 … τ_n C_n alternating between configurations and transitions (with n ≥ 0) such that C_0 = C and, for all i ∈ {1,…,n}, C_i = τ_i(C_{i−1}) and
– if i is odd, then τ_i ∈ T_s and C_i |= F (System's move),
– if i is even, then τ_i ∈ T_e and C_i ⊭ F (Environment's move).
The set of all C-plays is denoted by Plays_C.

Strategies. A C-strategy for System is a partial mapping f : Plays_C → T_s such that f(C) is defined and, for all π = C_0 τ_1 C_1 … τ_i C_i ∈ Plays_C with τ = f(π) defined, we have that τ is applicable at C_i and τ(C_i) |= F. A play π = C_0 τ_1 C_1 … τ_n C_n is
– f-compatible if, for all odd i ∈ {1,…,n}, τ_i = f(C_0 τ_1 C_1 … τ_{i−1} C_{i−1}),
– f-maximal if it is not a strict prefix of an f-compatible play,
– winning if C_n |= F.
We say that f is winning for System (from C) if all f-compatible f-maximal C-plays are winning. Finally, C is winning if there is a C-strategy that is winning.
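The token-moving mechanics of transitions can be sketched in the same style (again illustrative; locations are hashable tuples, the triple components count s-, e-, and se-tokens, and the function names are ours):

```python
# Sketch: applying a transition τ : L × L → N³ to a configuration
# C : L → N³, following the definitions of out_τ, in_τ, and τ(C).

def out_in(tau):
    """Component-wise sums out_τ(ℓ) = Σ_ℓ' τ(ℓ,ℓ') and in_τ(ℓ) = Σ_ℓ' τ(ℓ',ℓ)."""
    out, inn = {}, {}
    for (src, dst), triple in tau.items():
        o = out.setdefault(src, [0, 0, 0])
        i = inn.setdefault(dst, [0, 0, 0])
        for k, n in enumerate(triple):
            o[k] += n
            i[k] += n
    return out, inn

def applicable(tau, conf):
    """τ is applicable at C iff out_τ(ℓ) ≤ C(ℓ) component-wise for all ℓ."""
    out, _ = out_in(tau)
    return all(all(o <= c for o, c in zip(v, conf.get(l, (0, 0, 0))))
               for l, v in out.items())

def apply_transition(tau, conf):
    """τ(C)(ℓ) = C(ℓ) − out_τ(ℓ) + in_τ(ℓ)."""
    out, inn = out_in(tau)
    locs = set(conf) | set(out) | set(inn)
    return {l: tuple(conf.get(l, (0, 0, 0))[k]
                     - out.get(l, [0, 0, 0])[k]
                     + inn.get(l, [0, 0, 0])[k] for k in range(3))
            for l in locs}
```

The side condition that ℓ' = ℓ + w for a word w over the mover's own sub-alphabet is not checked here; the sketch only shows the bookkeeping on token counts.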
Note that, given an initial configuration C, we deal with an acyclic finite reachability game, so that, if there is a winning C-strategy, then there is a positional one, which only depends on the last configuration. For k ∈ N³, let C_k denote the configuration that maps ℓ_0 to k and all other locations to (0, 0, 0). We set Win(G) = {k ∈ N³ | C_k is winning for System}.

Definition 10 (game problem). For sets N_s, N_e, N_se ⊆ N, the game problem is given as follows:

  Game(N_s, N_e, N_se)
  Input: a parameterized vector game G
  Question: Win(G) ∩ (N_s × N_e × N_se) ≠ ∅ ?

One can show that parameterized vector games are equivalent to the synthesis problem in the following sense:

Lemma 11. For every sentence ϕ ∈ FO[∼], there is a parameterized vector game G = (A, B, F) such that Win(ϕ) = Win(G). Conversely, for every parameterized vector game G = (A, B, F), there is a sentence ϕ ∈ FO[∼] such that Win(G) = Win(ϕ). Both directions are effective.

Example 12. To illustrate parameterized vector games and the reduction from the synthesis problem, consider the formula ϕ'_e = ⋀_{θ∈T, ℓ∈Z} ∃^{=0} y. (θ(y) ∧ ψ_{3,ℓ}(y)) in normal form from Example 7. For simplicity, we assume that A_s = {a} and A_e = {d}. That is, Z is the set of vectors ⟪a^i d^j⟫ ∈ L = {0,…,3}^{a,d} such that i = 2 ≠ j or j = 2 ≠ i. Figure 2 illustrates a couple of configurations C_0,…,C_5 : L → N³.

[Figure 2: A play of a parameterized vector game, alternating between System and Environment transitions τ_1,…,τ_5 through configurations C_0,…,C_5.]

The leftmost location in a configuration is ℓ_0, the rightmost location is ⟪a³d³⟫, the topmost one is ⟪a³⟫, and the one at the bottom is ⟪d³⟫. Self-loops have been omitted, and locations from Z have a gray background and a dashed border. Towards an equivalent game G = (A, 3, F), it remains to determine the acceptance condition F. Recall that ϕ'_e says that every class contains two occurrences of a iff it contains two occurrences of d.
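Since the games at hand are acyclic finite reachability games, the winner can be computed by backward induction, and positional strategies then come for free. A minimal generic sketch (the interface via `sys_moves`, `env_moves`, and `accepting` is our own abstraction, not the paper's):

```python
# Sketch: backward induction over a finite acyclic turn-based game.
# `system_wins` is called at Environment's turn, at a configuration
# assumed to satisfy the acceptance condition F.

def system_wins(conf, sys_moves, env_moves, accepting):
    """True iff System wins from `conf` with Environment to move."""
    for c_env in env_moves(conf):
        if accepting(c_env):
            continue  # Environment must move to a non-accepting configuration
        # System must answer with a move restoring F and remain winning.
        if not any(accepting(c_sys)
                   and system_wins(c_sys, sys_moves, env_moves, accepting)
                   for c_sys in sys_moves(c_env)):
            return False
    return True  # Environment has no falsifying move: the play ends, System wins
```

Termination relies on acyclicity (every play is finite); memoizing on configurations would yield exactly the positional strategy mentioned above.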
This is reflected by the acceptance condition F = {κ} where κ(ℓ) = (=0, =0, =0) for all ℓ ∈ Z and κ(ℓ) = (≥0, ≥0, ≥0) for all ℓ ∈ L \ Z. With this, a configuration is accepting iff no token is on a location from Z (a gray location). We can verify that Win(G) = Win(ϕ'_e) = N × {0} × N. In G, a uniform winning strategy f for System that works for all P with P_e = ∅ proceeds as follows: System first awaits an Environment move and then moves each token upwards as many locations as Environment has moved it downwards. Figure 2 illustrates an f-maximal C_{(6,0,0)}-play that is winning for System. We note that f is a "compressed" version of the winning strategy presented in Example 4, as System makes her moves only when really needed.

5 Results for FO[∼] via Parameterized Vector Games

In this section, we present our results for the synthesis problem for FO[∼], which we obtain by showing corresponding results for parameterized vector games. In particular, we show that (FO[∼], 0, 0, N) and (FO[∼], N, N, 0) do not have a cutoff, whereas (FO[∼], N, {k_e}, {k_se}) has a cutoff for all k_e, k_se ∈ N. Finally, we prove that Synth(FO[∼], 0, 0, N) is, in fact, undecidable.

Lemma 13. There is a game G = (A, B, F) such that Win(G) does not have a cutoff wrt. (0, 0, N).

Proof. We let A_s = {a} and A_e = {b}, as well as B = 2. For k ∈ {0, 1, 2}, define the local acceptance conditions k^= = (=0, =0, =k) and k^≥ = (=0, =0, ≥k).

[Figure 3: Acceptance conditions for a game with no cutoff wrt. (0, 0, N).]

Set ℓ_1 = ⟪a⟫, ℓ_2 = ⟪ab⟫, ℓ_3 = ⟪a²b⟫, and ℓ_4 = ⟪a²b²⟫. For k_0,…,k_4 ∈ {0, 1, 2} and ▷◁_0,…,▷◁_4 ∈ {=, ≥}, let [k_0^{▷◁_0}, k_1^{▷◁_1}, k_2^{▷◁_2}, k_3^{▷◁_3}, k_4^{▷◁_4}] denote the κ ∈ C^L with κ(ℓ_i) = k_i^{▷◁_i} for all i ∈ {0,…,4} and κ(ℓ) = 0^= for all ℓ ∉ {ℓ_0,…,ℓ_4}.
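Checking C |= F for local conditions built from = and ≥ constraints is mechanical; a sketch (the tuple encoding of conditions and the default used for unmentioned locations are choices of this sketch):

```python
# Sketch: evaluating C |= F.  A local condition κ(ℓ) is a triple of
# (op, n) pairs such as (('=', 0), ('=', 0), ('>=', 2)), constraining
# the s-, e-, and se-token counts at one location.

def holds_locally(counts, cond):
    """C(ℓ) |= κ(ℓ): each component satisfies its =/≥ constraint."""
    for c, (op, n) in zip(counts, cond):
        if op == '=' and c != n:
            return False
        if op == '>=' and c < n:
            return False
    return True

def satisfies(conf, acceptance, default=(('>=', 0),) * 3):
    """C |= F iff some κ ∈ F holds at every location of C.
    `acceptance` is a list of dicts from locations to local conditions;
    locations a κ does not mention get the vacuous default (≥0, ≥0, ≥0)."""
    return any(all(holds_locally(counts, kappa.get(loc, default))
                   for loc, counts in conf.items())
               for kappa in acceptance)
```

With the default set to (=0, =0, =0) instead, the same helper matches the bracket notation used in the proof of Lemma 13, where unmentioned locations must be empty.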
Finally,

  F = { [0^≥, 2^=, 0^=, 0^=, 0^≥], [0^≥, 0^=, 0^=, 2^=, 0^≥], [0^=, 0^=, 0^=, 0^=, 2^≥],
        [0^≥, 1^=, 1^=, 0^=, 0^≥], [0^≥, 0^=, 0^=, 1^=, 1^≥] } ∪ K

where K = {κ_ℓ | ℓ ∈ L such that ℓ(b) > ℓ(a)} with κ_ℓ(ℓ') = 1^≥ if ℓ' = ℓ, and κ_ℓ(ℓ') = 0^≥ otherwise. This is illustrated in Figure 3.

There is a winning strategy for System from any initial configuration of size 2n: move two tokens from ℓ_0 to ℓ_1, wait until Environment sends them both to ℓ_2, then move them to ℓ_3, wait until they are moved to ℓ_4, and then repeat with two new tokens from ℓ_0 until all tokens have been removed from ℓ_0 and Environment cannot escape F anymore. However, one can check that there is no winning strategy for initial configurations of odd size.

Lemma 14. There is a game G = (A, B, F) such that Win(G) does not have a cutoff wrt. (N, N, 0).

Proof. We define G such that System wins only if she has at least as many processes as Environment. Let A_s = {a}, A_e = {b}, and B = 2. As there are no shared processes, we can safely ignore locations with letters from both System and Environment. We set F = {κ_1, κ_2, κ_3, κ_4} where

  κ_1(⟪a⟫) = (=1, =0, =0), κ_1(⟪b⟫) = (=0, =0, =0),
  κ_2(⟪a⟫) = (=1, =0, =0), κ_2(⟪b⟫) = (=0, ≥2, =0),
  κ_3(⟪a⟫) = (=0, =0, =0), κ_3(⟪b⟫) = (=0, ≥1, =0),
  κ_4(ℓ_0) = (=0, =0, =0),

and κ_i(ℓ) = (≥0, ≥0, =0) for all other ℓ ∈ L and i ∈ {1, 2, 3, 4}.

We now turn to the case where the number of processes that can be triggered by Environment is bounded. Note that similar restrictions are imposed in other settings to obtain decidability, such as limiting the environment to a finite (Boolean) domain [16] or restricting to one environment process [3,18]. We obtain decidability of the synthesis problem via a cutoff construction:

Theorem 15. Given k_e, k_se ∈ N, every game G = (A, B, F) has a cutoff wrt. (N, {k_e}, {k_se}). More precisely: let K be the largest constant that occurs in F.
Moreover, let Max = (k_e + k_se) · |A_e| · B and N̂ = |L|^{Max+1} · K. Then, (N̂, k_e, k_se) is a cutoff of Win(G) wrt. (N, {k_e}, {k_se}).

Proof. We will show that, for all N ≥ N̂,

  (N, k_e, k_se) ∈ Win(G) ⟺ (N + 1, k_e, k_se) ∈ Win(G).

The main observation is that, when C contains more than K tokens in a given ℓ ∈ L, adding more tokens to ℓ does not change whether C |= F. Given C, C' ∈ Conf, we write C <_e C' if C ≠ C' and there is τ ∈ T_e such that τ(C) = C'. Note that the length d of a chain C_0 <_e C_1 <_e ⋯ <_e C_d is bounded by Max. In other words, Max is the maximal number of transitions that Environment can perform in a play. For all d ∈ {0,…,Max}, let Conf_d be the set of configurations C ∈ Conf such that the longest chain in (Conf, <_e) starting from C has length d.

Claim. Suppose that C ∈ Conf_d and ℓ ∈ L are such that C(ℓ) = (N, n_e, n_se) with N ≥ |L|^{d+1} · K and n_e, n_se ∈ N. Set D = C[ℓ ↦ (N + 1, n_e, n_se)]. Then,

  C is winning for System ⟺ D is winning for System.

To show the claim, we proceed by induction on d ∈ N; the induction step is illustrated in Figure 4. In each implication, we distinguish the cases d = 0 and d ≥ 1. For the latter, we assume that the equivalence holds for all values strictly smaller than d.

For τ ∈ T_s and ℓ, ℓ' ∈ L, we let τ[(ℓ, ℓ', s)++] denote the transition η ∈ T_s given by η(ℓ_1, ℓ_2, e) = τ(ℓ_1, ℓ_2, e) = 0, η(ℓ_1, ℓ_2, se) = τ(ℓ_1, ℓ_2, se), η(ℓ_1, ℓ_2, s) = τ(ℓ_1, ℓ_2, s) + 1 if (ℓ_1, ℓ_2) = (ℓ, ℓ'), and η(ℓ_1, ℓ_2, s) = τ(ℓ_1, ℓ_2, s) if (ℓ_1, ℓ_2) ≠ (ℓ, ℓ'). We define τ[(ℓ, ℓ', s)--] similarly (provided τ(ℓ, ℓ', s) ≥ 1).

⟹: Let f be a winning strategy for System from C ∈ Conf_d. Let τ' = f(C) and C' = τ'(C). Note that C' |= F. Since C(ℓ, s) = N ≥ |L|^{d+1} · K, there is ℓ' ∈ L such that ℓ + w = ℓ' for some w ∈ A_s^* and C'(ℓ', s) = N' ≥ |L|^d · K. We show that D = C[ℓ ↦ (N + 1, n_e, n_se)] is winning for System by exhibiting a corresponding winning strategy g from D that carefully controls the position of the additional token. First, set g(D) = η' where η' = τ'[(ℓ, ℓ', s)++]. Let D' = η'(D).
We obtain D'(ℓ', s) = N' + 1. Note that, since N' ≥ K, the acceptance condition F cannot distinguish between C' and D'. Thus, we have D' |= F.

Case d = 0: As, for all transitions η'' ∈ T_e, we have η''(D') = D' |= F, we have reached a maximal play that is winning for System. We deduce that D is winning for System.

Case d ≥ 1: Take any η'' ∈ T_e and D'' such that D'' = η''(D') ⊭ F. Let τ'' = η'' and C'' = τ''(C'). Note that D'' = C''[(ℓ', s) ↦ N' + 1], C'' = D''[(ℓ', s) ↦ N'], and C'', D'' ∈ Conf_{d'} for some d' < d. As f is a winning strategy for System from C, we have that C'' is winning for System. By the induction hypothesis, D'' is winning for System, say by winning strategy g''. We let g(D η' D' η'' π) = g''(π) for all D''-plays π. For all unspecified plays, let g return any applicable system transition. Altogether, for any choice of η'', we have that g is winning from D''. Thus, g is a winning strategy from D.

[Figure 4: Induction step in the cutoff construction.]

⟸: Suppose that g is a winning strategy for System from D. Thus, for η' = g(D) and D' = η'(D), we have D' |= F. Recall that D(ℓ, s) ≥ (|L|^{d+1} · K) + 1. We distinguish two cases:

1. Suppose there is ℓ' ∈ L such that ℓ' ≠ ℓ, D'(ℓ', s) = N' + 1 for some N' ≥ |L|^d · K, and η'(ℓ, ℓ', s) ≥ 1. Then, we set τ' = η'[(ℓ, ℓ', s)--].
2. Otherwise, we have D'(ℓ, s) ≥ (|L|^d · K) + 1, and we set τ' = η' (as well as ℓ' = ℓ and N' = N).

Let C' = τ'(C). Since D' |= F, one obtains C' |= F.

Case d = 0: For all transitions τ'' ∈ T_e, we have τ''(C') = C' |= F. Thus, we have reached a maximal play that is winning for System. We deduce that C is winning for System.

Case d ≥ 1: Take any τ'' ∈ T_e such that C'' = τ''(C') ⊭ F. Let η'' = τ'' and D'' = η''(D'). We have C'' = D''[(ℓ', s) ↦ N'], D'' = C''[(ℓ', s) ↦ N' + 1], and C'', D'' ∈ Conf_{d'} for some d' < d. As D'' is winning for System, by the induction hypothesis, C'' is winning for System, say by winning strategy f''. We let f(C τ' C' τ'' π) = f''(π) for all C''-plays π.
For all unspecified plays, let f return an arbitrary applicable system transition. Again, for any choice of τ'', f is winning from C''. Thus, f is a winning strategy from C.

This concludes the proof of the claim and, therefore, of Theorem 15.

Corollary 16. Let k_e, k_se ∈ N be the number of environment and the number of mixed processes, respectively. The problems Game(N, {k_e}, {k_se}) and Synth(FO[∼], N, {k_e}, {k_se}) are decidable.

In particular, by Theorem 15, the game problem can be reduced to an exponential number of acyclic finite-state games whose size (and hence the time complexity for determining the winner) is exponential in the cutoff and, therefore, doubly exponential in the size of the alphabet, the bound B, and the fixed number of processes that are controllable by the environment.

Theorem 17. Game(0, 0, N) and Synth(FO[∼], 0, 0, N) are undecidable.

Proof. We provide a reduction from the halting problem for two-counter machines (2CM) to Game(0, 0, N). A 2CM M = (Q, Δ, c_1, c_2, q_0, q_h) has two counters, c_1 and c_2, a finite set of states Q, and a set of transitions Δ ⊆ Q × Op × Q where Op = {c_i++, c_i--, c_i==0 | i ∈ {1, 2}}. Moreover, we have an initial state q_0 ∈ Q and a halting state q_h ∈ Q. A configuration of M is a triple γ = (q, ν_1, ν_2) ∈ Q × N × N giving the current state and the current respective counter values. The initial configuration is γ_0 = (q_0, 0, 0) and the set of halting configurations is F_M = {q_h} × N × N. For t ∈ Δ, configuration (q', ν'_1, ν'_2) is a (t-)successor of (q, ν_1, ν_2), written (q, ν_1, ν_2) →_t (q', ν'_1, ν'_2), if ν'_{3−i} = ν_{3−i} and one of the following holds: (i) t = (q, c_i++, q') and ν'_i = ν_i + 1, or (ii) t = (q, c_i--, q') and ν'_i = ν_i − 1, or (iii) t = (q, c_i==0, q') and ν'_i = ν_i = 0. A run of M is a (finite or infinite) sequence γ_0 →_{t_1} γ_1 →_{t_2} ⋯
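The machine model just defined is easy to make executable. The following sketch (illustrative; encoding operations as strings such as "c1++" is our convention, not the paper's) computes all t-successors of an M-configuration:

```python
# Sketch: the step relation of a two-counter machine M = (Q, Δ, c1, c2, q0, qh).
# A transition is a triple (source_state, op, target_state) with op one of
# "c1++", "c1--", "c1==0" (and likewise for c2); a configuration is (q, ν1, ν2).

def successors(delta, gamma):
    """All (t-)successors of configuration γ = (q, ν1, ν2)."""
    q, nu = gamma[0], [gamma[1], gamma[2]]
    result = []
    for (src, op, dst) in delta:
        if src != q:
            continue
        i = int(op[1]) - 1            # counter index: "c1…" → 0, "c2…" → 1
        new = list(nu)
        if op.endswith("++"):
            new[i] += 1
        elif op.endswith("--"):
            if nu[i] == 0:
                continue              # decrement is blocked on zero
            new[i] -= 1
        else:                         # zero test "ci==0"
            if nu[i] != 0:
                continue
        result.append((dst, new[0], new[1]))
    return result
```

Reachability of F_M amounts to iterating `successors` from (q0, 0, 0), which is exactly the (undecidable, in general) question the reduction embeds into the game.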
The 2CM halting problem asks whether there is a run reaching a configuration in F_M. It is known to be undecidable [34].

We fix a 2CM M = (Q, Δ, c_1, c_2, q_0, q_h). Let A_s = Q ∪ Δ ∪ {a_1, a_2} and A_e = {b}, with a_1, a_2, and b three fresh symbols. We consider the game G = (A, B, F) with A = A_s ⊎ A_e, B = 4, and F defined below. Let L = {0,…,B}^A. Since there are only processes shared by System and Environment, we alleviate notation and consider that a configuration is simply a mapping C : L → N. From now on, to avoid confusion, we refer to configurations of the 2CM M as M-configurations, and to configurations of G as G-configurations.

Intuitively, every valid run of M will be encoded as a play in G, and the acceptance condition will enforce that, if a player in G deviates from a valid play, then she loses immediately. At any point in the play, there will be at most one process on which only a letter from Q has been played, representing the current state of the simulated 2CM run. Similarly, there will be at most one process with only a letter from Δ, representing the transition to be taken next. Finally, the value of counter c_i will be encoded by the number of processes with exactly two occurrences of a_i and two occurrences of b (i.e., C(⟪a_i²b²⟫)). To increment counter c_i, the players will move a new token to ⟪a_i²b²⟫, and to decrement it, they will move, together, a token from ⟪a_i²b²⟫ to ⟪a_i⁴b⁴⟫. Observe that, if c_i has value 0, then C(⟪a_i²b²⟫) = 0 in the corresponding configuration of the game. As expected, it is then impossible to simulate a decrement of c_i. Environment's only role is to acknowledge System's actions by playing its (only) letter when System simulates a valid run. If System tries to cheat, she loses immediately.

Encoding an M-configuration. Let us be more formal. Suppose γ = (q, ν_1, ν_2) is an M-configuration and C a G-configuration. We say that C encodes γ if
– C(⟪q⟫) = 1, C(⟪a_1²b²⟫) = ν_1, C(⟪a_2²b²⟫) = ν_2,
– C(ℓ) ≥ 0 for all ℓ ∈ {ℓ_0} ∪ {⟪q̂²b²⟫, ⟪t²b²⟫, ⟪a_i⁴b⁴⟫ | q̂ ∈ Q, t ∈ Δ, i ∈ {1, 2}},
– C(ℓ) = 0 for all other ℓ ∈ L.

We then write γ = m(C). Let C(γ) be the set of G-configurations C that encode γ. We say that a G-configuration C is valid if C ∈ C(γ) for some γ.

Simulating a transition of M. Let us explain how we go from a G-configuration encoding γ to a G-configuration encoding a successor M-configuration γ'. Observe that System cannot change the encoded M-configuration by herself. If, for instance, she tries to change the current state q, she might move one process from ℓ_0 to ⟪q'⟫, but then the G-configuration is not valid anymore. We need to move the process in ⟪q⟫ into ⟪q²b²⟫, and this requires the cooperation of Environment.

Assume that the game is in configuration C encoding γ = (q, ν_1, ν_2). System will pick a transition t starting in state q, say, t = (q, c_1++, q'). From configuration C, System will go to the configuration C_1 defined by C_1(⟪t⟫) = 1, C_1(⟪a_1⟫) = 1, and C_1(ℓ) = C(ℓ) for all other ℓ ∈ L.

If the transition t is correctly chosen, Environment will go to a configuration C_2 defined by C_2(⟪q⟫) = 0, C_2(⟪qb⟫) = 1, C_2(⟪t⟫) = 0, C_2(⟪tb⟫) = 1, C_2(⟪a_1⟫) = 0, C_2(⟪a_1b⟫) = 1 and, for all other ℓ ∈ L, C_2(ℓ) = C_1(ℓ). That is, Environment moves the processes in locations ⟪t⟫, ⟪q⟫, ⟪a_1⟫ to locations ⟪tb⟫, ⟪qb⟫, ⟪a_1b⟫, respectively.

To finish the transition, System will now move a process to the destination state q' of t, and go to configuration C_3 defined by C_3(⟪q'⟫) = 1, C_3(⟪tb⟫) = 0, C_3(⟪t²b⟫) = 1, C_3(⟪qb⟫) = 0, C_3(⟪q²b⟫) = 1, C_3(⟪a_1b⟫) = 0, C_3(⟪a_1²b⟫) = 1, and C_3(ℓ) = C_2(ℓ) for all other ℓ ∈ L.

Finally, Environment moves to configuration C_4 given by C_4(⟪t²b⟫) = 0, C_4(⟪t²b²⟫) = C_3(⟪t²b²⟫) + 1, C_4(⟪q²b⟫) = 0, C_4(⟪q²b²⟫) = C_3(⟪q²b²⟫) + 1, C_4(⟪a_1²b⟫) = 0, C_4(⟪a_1²b²⟫) = C_3(⟪a_1²b²⟫) + 1, and C_4(ℓ) = C_3(ℓ) for all other ℓ ∈ L.
Observe that C_4 ∈ C((q', ν_1 + 1, ν_2)). The other types of transitions are simulated similarly. To force System to start the simulation in γ_0, and not in an arbitrary M-configuration, the configurations C such that C(⟪q̂²b²⟫) = 0 for all q̂ ∈ Q and C(⟪q⟫) = 1 for some q ≠ q_0 are not valid and will be losing for System.

Acceptance condition. It remains to define F in a way that enforces the above sequence of G-configurations. Let L_⊥ = {ℓ_0} ∪ {⟪a_i²b²⟫, ⟪a_i⁴b⁴⟫ | i ∈ {1, 2}} ∪ {⟪q²b²⟫ | q ∈ Q} ∪ {⟪t²b²⟫ | t ∈ Δ} be the set of locations whose values do not affect the acceptance of a configuration. By [ℓ'_1 ▷◁_1 n_1, …, ℓ'_k ▷◁_k n_k], we denote the κ ∈ C^L such that κ(ℓ'_i) = (▷◁_i n_i) for i ∈ {1,…,k} and κ(ℓ) = (=0) for all ℓ ∈ L \ {ℓ'_1,…,ℓ'_k}. Moreover, for a set of locations L̂ ⊆ L, we let "L̂ ≥ 0" stand for "(ℓ ≥ 0) for all ℓ ∈ L̂".

First, we force Environment to play only in response to System by making System win as soon as there is a process on which Environment has played more letters than System (see Condition (d) in Table 2). If γ is not halting, the configurations in C(γ) will not be winning for System. Hence, System will have to move in order to win (Condition (a)).

Table 2.
Acceptance conditions for the game simulating a 2CM

Requirements for System

(a) For all t = (q, op, q') ∈ Δ:
  F_{(q,t)} = { [⟪q⟫ = 1, ⟪t⟫ = 1, ⟪a_i⟫ = 1, ⟪q̂²b²⟫ ≥ 1, L_⊥ \ {⟪q̂²b²⟫} ≥ 0] | q̂ ∈ Q }  if op = c_i++
  F_{(q,t)} = { [⟪q⟫ = 1, ⟪t⟫ = 1, ⟪a_i³b²⟫ = 1, ⟪q̂²b²⟫ ≥ 1, L_⊥ \ {⟪q̂²b²⟫} ≥ 0] | q̂ ∈ Q }  if op = c_i--
  F_{(q,t)} = { [⟪q⟫ = 1, ⟪t⟫ = 1, ⟪a_i²b²⟫ = 0, ⟪q̂²b²⟫ ≥ 1, L_⊥ \ {⟪q̂²b²⟫, ⟪a_i²b²⟫} ≥ 0] | q̂ ∈ Q }  if op = c_i==0

(b) For all t = (q_0, op, q') ∈ Δ such that op ∈ {c_i++, c_i==0}:
  F_t = [⟪q_0⟫ = 1, ⟪t⟫ = 1, ⟪a_i⟫ = 1, ℓ_0 ≥ 0]  if op = c_i++
  F_t = [⟪q_0⟫ = 1, ⟪t⟫ = 1, ℓ_0 ≥ 0]  if op = c_i==0

(c) For all t = (q, op, q') ∈ Δ:
  F_{(q,t,q')} = [⟪q²b⟫ = 1, ⟪t²b⟫ = 1, ⟪a_i²b⟫ = 1, ⟪q'⟫ = 1, L_⊥ ≥ 0]  if op = c_i++
  F_{(q,t,q')} = [⟪q²b⟫ = 1, ⟪t²b⟫ = 1, ⟪a_i⁴b³⟫ = 1, ⟪q'⟫ = 1, L_⊥ ≥ 0]  if op = c_i--
  F_{(q,t,q')} = [⟪q²b⟫ = 1, ⟪t²b⟫ = 1, ⟪q'⟫ = 1, L_⊥ ≥ 0]  if op = c_i==0

Requirements for Environment

(d) Let L_{s<e} = { ℓ ∈ L | Σ_{α∈A_s} ℓ(α) < ℓ(b) }. For all ℓ ∈ L_{s<e}: F_ℓ = [ℓ ≥ 1, (L \ {ℓ}) ≥ 0].

(e) For all t = (q, op, q') ∈ Δ:
  if op = c_i++:
  F^e_{(q,t)} = { [⟪qb⟫ = 1, ⟪t⟫ = 1, ⟪a_i⟫ = 1, L_⊥ ≥ 0], [⟪q⟫ = 1, ⟪tb⟫ = 1, ⟪a_i⟫ = 1, L_⊥ ≥ 0],
                 [⟪q⟫ = 1, ⟪t⟫ = 1, ⟪a_i b⟫ = 1, L_⊥ ≥ 0], [⟪qb⟫ = 1, ⟪tb⟫ = 1, ⟪a_i⟫ = 1, L_⊥ ≥ 0],
                 [⟪qb⟫ = 1, ⟪t⟫ = 1, ⟪a_i b⟫ = 1, L_⊥ ≥ 0], [⟪q⟫ = 1, ⟪tb⟫ = 1, ⟪a_i b⟫ = 1, L_⊥ ≥ 0] }
  if op = c_i--:
  F^e_{(q,t)} = { [⟪qb⟫ = 1, ⟪t⟫ = 1, ⟪a_i³b²⟫ = 1, L_⊥ ≥ 0], [⟪q⟫ = 1, ⟪tb⟫ = 1, ⟪a_i³b²⟫ = 1, L_⊥ ≥ 0],
                 [⟪q⟫ = 1, ⟪t⟫ = 1, ⟪a_i³b³⟫ = 1, L_⊥ ≥ 0], [⟪qb⟫ = 1, ⟪tb⟫ = 1, ⟪a_i³b²⟫ = 1, L_⊥ ≥ 0],
                 [⟪qb⟫ = 1, ⟪t⟫ = 1, ⟪a_i³b³⟫ = 1, L_⊥ ≥ 0], [⟪q⟫ = 1, ⟪tb⟫ = 1, ⟪a_i³b³⟫ = 1, L_⊥ ≥ 0] }
  if op = c_i==0:
  F^e_{(q,t)} = { [⟪qb⟫ = 1, ⟪t⟫ = 1, L_⊥ ≥ 0], [⟪q⟫ = 1, ⟪tb⟫ = 1, L_⊥ ≥ 0] }

(f) For all t = (q, op, q') ∈ Δ:
  if op = c_i++:
  F^e_{(q,t,q')} = { [⟪q'⟫ = 1, ⟪q²b⟫ = 1, ⟪t²b⟫ ≥ 0, ⟪a_i²b⟫ ≥ 0, L_⊥ ≥ 0],
                    [⟪q'⟫ = 1, ⟪q²b⟫ ≥ 0, ⟪t²b⟫ = 1, ⟪a_i²b⟫ ≥ 0, L_⊥ ≥ 0],
                    [⟪q'⟫ = 1, ⟪q²b⟫ ≥ 0, ⟪t²b⟫ ≥ 0, ⟪a_i²b⟫ = 1, L_⊥ ≥ 0],
                    [⟪q'b⟫ = 1, ⟪q²b⟫ ≥ 0, ⟪t²b⟫ ≥ 0, ⟪a_i²b⟫ ≥ 0, L_⊥ ≥ 0] }
  if op = c_i--:
  F^e_{(q,t,q')} = { [⟪q'⟫ = 1, ⟪q²b⟫ = 1, ⟪t²b⟫ ≥ 0, ⟪a_i⁴b³⟫ ≥ 0, L_⊥ ≥ 0],
                    [⟪q'⟫ = 1, ⟪q²b⟫ ≥ 0, ⟪t²b⟫ = 1, ⟪a_i⁴b³⟫ ≥ 0, L_⊥ ≥ 0],
                    [⟪q'⟫ = 1, ⟪q²b⟫ ≥ 0, ⟪t²b⟫ ≥ 0, ⟪a_i⁴b³⟫ = 1, L_⊥ ≥ 0],
                    [⟪q'b⟫ = 1, ⟪q²b⟫ ≥ 0, ⟪t²b⟫ ≥ 0, ⟪a_i⁴b³⟫ ≥ 0, L_⊥ ≥ 0] }
  if op = c_i==0:
  F^e_{(q,t,q')} = { [⟪q'⟫ = 1, ⟪q²b⟫ = 1, ⟪t²b⟫ ≥ 0, L_⊥ ≥ 0],
                    [⟪q'⟫ = 1, ⟪q²b⟫ ≥ 0, ⟪t²b⟫ = 1, L_⊥ ≥ 0],
                    [⟪q'b⟫ = 1, ⟪q²b⟫ ≥ 0, ⟪t²b⟫ ≥ 0, ⟪a_i⁴b³⟫ ≥ 0, L_⊥ ≥ 0] }

The first transition chosen by System must start from the initial state of M. This is enforced by Condition (b).

Once System has moved, Environment will move other processes to leave accepting configurations. The only possible move for her is to add b on the processes in locations ⟪q⟫, ⟪t⟫, and ⟪a_i⟫ if t is a transition incrementing counter c_i (respectively ⟪a_i³b²⟫ if t is a transition decrementing counter c_i). All other G-configurations accessible by Environment from already defined accepting configurations are winning for System, as established in Condition (e).

System can now encode the successor configuration of M, according to the chosen transition, by moving a process to the destination state of the transition (see Condition (c)). Finally, Environment makes the necessary transitions for the configuration to become a valid G-configuration. If she deviates, System wins (see Condition (f)).

If Environment reaches a configuration in C(γ) for some γ ∈ F_M, System can win by moving the process in ⟪q_h⟫ to ⟪q_h²⟫. From there, all the configurations reachable by Environment are also winning for System:

  F_F = { [⟪q_h²⟫ = 1, L_⊥ ≥ 0], [⟪q_h²b⟫ = 1, L_⊥ ≥ 0], [⟪q_h²b²⟫ = 1, L_⊥ ≥ 0] }.

Finally, the acceptance condition is given by
  F = ⋃_{ℓ ∈ L_{s<e}} F_ℓ  ∪  ⋃_{t=(q_0,op,q')∈Δ} F_t  ∪  ⋃_{t=(q,op,q')∈Δ} (F_{(q,t)} ∪ F_{(q,t,q')} ∪ F^e_{(q,t)} ∪ F^e_{(q,t,q')})  ∪  F_F.

Note that a correct play can end in three different ways: either there is a process in ⟪q_h⟫ and System moves it to ⟪q_h²⟫, or System has no transition to pick, or there are not enough processes in ℓ_0 for System to simulate a new transition. Only the first kind is winning for System. We can show that there is an accepting run in M iff there is some k such that System has a winning C_{(0,0,k)}-strategy for G.

6 Conclusion

There are several questions that we left open and that are interesting in their own right due to their fundamental character. Moreover, in the decidable cases, it will be worthwhile to provide tight bounds on cutoffs and on the algorithmic complexity of the decision problem.

Like in [7,15,16,30,31], our strategies allow the system to have a global view of the whole program run executed so far. However, it is also perfectly natural to consider uniform local strategies where each process only sees its own actions and possibly those that are revealed according to some causal dependencies. This is, e.g., the setting considered in [3,18] for a fixed number of processes and in [25] for parameterized systems over ring architectures. Moreover, we would like to study a parameterized version of the control problem [35] where, in addition to a specification, a program in terms of an arena is already given but has to be controlled in a way such that the specification is satisfied. Finally, our synthesis results crucially rely on the fact that the number of processes in each execution is finite. It would be interesting to consider the case with potentially infinitely many processes.

References

1. P. A. Abdulla, R. Mayr, A. Sangnier, and J. Sproston. Solving parity games on integer vectors. In P. R. D'Argenio and H. C.
Melgratti, editors, CONCUR 2013 - Concurrency Theory - 24th International Conference, CONCUR 2013, Buenos Aires, Argentina, August 27-30, 2013. Proceedings, volume 8052 of Lecture Notes in Computer Science, pages 106–120. Springer, 2013.
2. B. Bérard, B. Bollig, M. Lehaut, and N. Sznajder. Parameterized synthesis for fragments of first-order logic over data words. CoRR, abs/1910.14294, 2019.
3. R. Beutner, B. Finkbeiner, and J. Hecking-Harbusch. Translating asynchronous games for distributed synthesis. In W. Fokkink and R. van Glabbeek, editors, 30th International Conference on Concurrency Theory (CONCUR 2019), volume 140 of Leibniz International Proceedings in Informatics (LIPIcs), pages 26:1–26:16, Dagstuhl, Germany, 2019. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
4. R. Bloem, S. Jacobs, A. Khalimov, I. Konnov, S. Rubin, H. Veith, and J. Widder. Decidability of Parameterized Verification. Morgan & Claypool Publishers, 2015.
5. M. Bojanczyk, C. David, A. Muscholl, T. Schwentick, and L. Segoufin. Two-variable logic on data words. ACM Trans. Comput. Log., 12(4):27, 2011.
6. T. Brázdil, P. Jancar, and A. Kucera. Reachability games on extended vector addition systems with states. In ICALP'10, Part II, volume 6199 of LNCS, pages 478–489. Springer, 2010.
7. B. Brütsch and W. Thomas. Playing games in the Baire space. In Proc. Cassting Workshop on Games for the Synthesis of Complex Systems and 3rd Int. Workshop on Synthesis of Complex Parameters, volume 220 of EPTCS, pages 13–25, 2016.
8. J. R. Büchi and L. H. Landweber. Solving sequential conditions by finite-state strategies. Transactions of the American Mathematical Society, 138:295–311, Apr. 1969.
9. A. Church. Applications of recursive arithmetic to the problem of circuit synthesis. In Summaries of the Summer Institute of Symbolic Logic – Volume 1, pages 3–50. Institute for Defense Analyses, 1957.
10. J. Courtois and S. Schmitz. Alternating vector addition systems with states. In E. Csuhaj-Varjú, M.
Dietzfelbinger, and Z. Ésik, editors, Mathematical Foundations of Computer Science 2014 - 39th International Symposium, MFCS 2014, Budapest, Hungary, August 25-29, 2014. Proceedings, Part I, volume 8634 of Lecture Notes in Computer Science, pages 220–231. Springer, 2014.
11. S. Demri, D. D'Souza, and R. Gascon. Temporal logics of repeating values. J. Log. Comput., 22(5):1059–1096, 2012.
12. S. Demri and R. Lazić. LTL with the freeze quantifier and register automata. ACM Transactions on Computational Logic, 10(3), 2009.
13. J. Esparza. Keeping a crowd safe: On the complexity of parameterized verification. In STACS'14, volume 25 of Leibniz International Proceedings in Informatics, pages 1–10. Leibniz-Zentrum für Informatik, 2014.
14. K. Etessami, M. Y. Vardi, and T. Wilke. First-order logic with two variables and unary temporal logic. Inf. Comput., 179(2):279–295, 2002.
15. L. Exibard, E. Filiot, and P.-A. Reynier. Synthesis of data word transducers. In W. Fokkink and R. van Glabbeek, editors, 30th International Conference on Concurrency Theory (CONCUR 2019), volume 140 of Leibniz International Proceedings in Informatics (LIPIcs), pages 24:1–24:15, Dagstuhl, Germany, 2019. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
16. D. Figueira and M. Praveen. Playing with repetitions in data words using energy games. In A. Dawar and E. Grädel, editors, Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2018, Oxford, UK, July 09-12, 2018, pages 404–413. ACM, 2018.
17. D. Figueira and M. Praveen. Playing with repetitions in data words using energy games. arXiv preprint arXiv:1802.07435, 2018.
18. B. Finkbeiner and E. Olderog. Petri games: Synthesis of distributed systems with causal memory. Inf. Comput., 253:181–203, 2017.
19. H. Frenkel, O. Grumberg, and S. Sheinvald. An automata-theoretic approach to model-checking systems and specifications over infinite data domains. J. Autom.
Reasoning, 63(4):1077–1101, 2019.
20. M. Fürer. The computational complexity of the unconstrained limited domino problem (with implications for logical decision problems). In E. Börger, G. Hasenjaeger, and D. Rödding, editors, Logic and Machines: Decision Problems and Complexity, Proceedings of the Symposium "Rekursive Kombinatorik" held from May 23-28, 1983 at the Institut für Mathematische Logik und Grundlagenforschung der Universität Münster/Westfalen, volume 171 of Lecture Notes in Computer Science, pages 312–319. Springer, 1983.
21. P. Gastin and N. Sznajder. Fair synthesis for asynchronous distributed systems. ACM Transactions on Computational Logic, 14(2:9), 2013.
22. E. Grädel, P. G. Kolaitis, and M. Y. Vardi. On the decision problem for two-variable first-order logic. Bulletin of Symbolic Logic, 3(1):53–69, 1997.
23. W. Hanf. Model-theoretic methods in the study of elementary logic. In J. W. Addison, L. Henkin, and A. Tarski, editors, The Theory of Models. North-Holland, Amsterdam, 1965.
24. F. Horn, W. Thomas, N. Wallmeier, and M. Zimmermann. Optimal strategy synthesis for request-response games. RAIRO - Theor. Inf. and Applic., 49(3):179–203.
25. S. Jacobs and R. Bloem. Parameterized synthesis. Logical Methods in Computer Science, 10(1), 2014.
26. S. Jacobs, L. Tentrup, and M. Zimmermann. Distributed synthesis for parameterized temporal logics. Inf. Comput., 262(Part):311–328, 2018.
27. P. Jancar. On reachability-related games on vector addition systems with states. In RP'15, volume 9328 of LNCS, pages 50–62. Springer, 2015.
28. M. Jenkins, J. Ouaknine, A. Rabinovich, and J. Worrell. The Church synthesis problem with metric. In M. Bezem, editor, Computer Science Logic, 25th International Workshop / 20th Annual Conference of the EACSL, CSL 2011, September 12-15, 2011, Bergen, Norway, Proceedings, volume 12 of LIPIcs, pages 307–321. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2011.
29. M. Kaminski and N. Francez.
Finite-memory automata. Theoretical Computer Science, 134(2):329–363, 1994. 30. A. Khalimov and O. Kupferman. Register-Bounded Synthesis. In W. Fokkink and R. van Glabbeek, editors, 30th International Conference on Concurrency Theory (CONCUR 2019), volume 140 of Leibniz International Proceedings in Informatics (LIPIcs), pages 25:1–25:16, Dagstuhl, Germany, 2019. Schloss Dagstuhl–Leibniz- Zentrum fuer Informatik. 31. A. Khalimov, B. Maderbacher, and R. Bloem. Bounded synthesis of register trans- ducers. In S. K. Lahiri and C. Wang, editors, Automated Technology for Verifica- tion and Analysis - 16th International Symposium, ATVA 2018, Los Angeles, CA, USA, October 7-10, 2018, Proceedings, volume 11138 of Lecture Notes in Computer Science, pages 494–510. Springer, 2018. Parameterized Synthesis for First-Order Logic over Data Words 117 32. E. Kieronski and M. Otto. Small substructures and decidability issues for first- order logic with two variables. J. Symb. Log., 77(3):729–765, 2012. 33. L. Libkin, T. Tan, and D. Vrgoc. Regular expressions for data words. J. Comput. Syst. Sci., 81(7):1278–1297, 2015. 34. M. L. Minsky. Computation: Finite and Infinite Machines. Prentice Hall, Upper Saddle River, NJ, USA, 1967. 35. A. Muscholl. Automated synthesis of distributed controllers. In M. M. Halld´ orsson, K. Iwama, N. Kobayashi, and B. Speckmann, editors, Automata, Languages, and Programming - 42nd International Colloquium, ICALP 2015, Kyoto, Japan, July 6- 10, 2015, Proceedings, Part II, volume 9135 of Lecture Notes in Computer Science, pages 11–27. Springer, 2015. 36. A. Pnueli and R. Rosner. Distributed reactive systems are hard to synthesize. In 31st Annual Symposium on Foundations of Computer Science, St. Louis, Missouri, USA, October 22-24, 1990, Volume II, pages 746–757. IEEE Computer Society, 37. M. O. Rabin. Automata on infinite objects and Church’s problem.Number 13 in Regional Conference Series in Mathematics. American Mathematical Soc., 1972. 38. J. Raskin, M. 
Samuelides, and L. V. Begin. Games for counting abstractions. Electr. Notes Theor. Comput. Sci., 128(6):69–85, 2005. 39. A. Sangnier and O. Stietel. Private communication, 2020. 40. L. Schr¨ oder, D. Kozen, S. Milius, and T. Wißmann. Nominal automata with name binding. In J. Esparza and A. S. Murawski, editors, Foundations of Software Science and Computation Structures - 20th International Conference, FOSSACS 2017, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2017, Uppsala, Sweden, April 22-29, 2017, Proceedings, volume 10203 of Lecture Notes in Computer Science, pages 124–142, 2017. 41. T. Schwentick and K. Barthelmann. Local normal forms for first-order logic with applications to games and automata. In Annual Symposium on Theoretical Aspects of Computer Science, pages 444–454. Springer, 1998. 42. W. Thomas. Church’s problem and a tour through automata theory. In Pillars of Computer Science, Essays Dedicated to Boris (Boaz) Trakhtenbrot on the Occasion of His 85th Birthday, volume 4800 of Lecture Notes in Computer Science, pages 635–655. Springer, 2008. 43. Y. Velner and A. Rabinovich. Church synthesis problem for noisy input. In M. Hof- mann, editor, Foundations of Software Science and Computational Structures - 14th International Conference, FOSSACS 2011, Held as Part of the Joint Euro- pean Conferences on Theory and Practice of Software, ETAPS 2011, Saarbrucken, ¨ Germany, March 26-April 3, 2011. Proceedings, volume 6604 of Lecture Notes in Computer Science, pages 275–289. Springer, 2011. 118 B. B´ erard et al. Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License ( 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. 
Controlling a random population

Thomas Colcombet^1, Nathanaël Fijalkow^{2,3}, and Pierre Ohlmann^1

^1 Université de Paris, IRIF, CNRS, Paris, France
{thomas.colcombet,pierre.ohlmann}
^2 CNRS, LaBRI, Bordeaux, France
^3 The Alan Turing Institute of data science, London, United Kingdom

Abstract. Bertrand et al. introduced a model of parameterised systems, where each agent is represented by a finite state system, and studied the following control problem: for any number of agents, does there exist a controller able to bring all agents to a target state? They showed that the problem is decidable and EXPTIME-complete in the adversarial setting, and posed as an open problem the stochastic setting, where the agent is represented by a Markov decision process. In this paper, we show that the stochastic control problem is decidable. Our solution makes significant use of well quasi orders, of the max-flow min-cut theorem, and of the theory of regular cost functions.

1 Introduction

The control problem for populations of identical agents. The model we study was introduced in [3] (see also the journal version [4]): a population of agents is controlled uniformly, meaning that the controller applies the same action to every agent. The agents are represented by a finite state system, the same for every agent. The key difficulty is that there is an arbitrarily large number of agents: the control problem asks whether for every n ∈ N, there exists a controller able to bring all n agents synchronously to a target state.
The technical contribution of [3,4] is to prove that in the adversarial setting, where an opponent chooses the evolution of the agents, the (adversarial) control problem is EXPTIME-complete.

In this paper, we study the stochastic setting, where each agent evolves independently according to a probabilistic distribution, i.e. the finite state system modelling an agent is a Markov decision process. The control problem becomes whether for every n ∈ N, there exists a controller able to bring all n agents synchronously to a target state with probability one.

The authors are committed to making professional choices acknowledging the climate emergency. We submitted this work to FoSSaCS for its excellence and because its location induces for us a low carbon footprint. This work was supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 670624), and by the DeLTA ANR project (ANR-16-CE40-0007).

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 119–135, 2020.

Our main technical result is that the stochastic control problem is decidable. In the next paragraphs we discuss four motivations for studying this problem: control of biological systems, parameterised verification and control, distributed computing, and automata theory.

Modelling biological systems. The original motivation for studying this model was controlling populations of yeast ([21]). In this application, the concentration of some molecule is monitored through fluorescence level. Controlling the frequency and duration of injections of a sorbitol solution influences the concentration of the target molecule, triggering different chemical reactions which can be modelled by a finite state system. The objective is to control the population to reach a predetermined fluorescence state.
As discussed in the conclusions of [3,4], the stochastic semantics is more satisfactory than the adversarial one for representing the behaviours of the chemical reactions, so our decidability result is a step towards a better understanding of the modelling of biological systems as populations of arbitrarily many agents represented by finite state systems.

From parameterised verification to parameterised control. Parameterised verification was introduced in [12]: it is the verification of a system composed of an arbitrary number of identical components. The control problem we study here, introduced in [3,4], is the first step towards parameterised control: the goal is to control a system composed of many identical components in order to ensure a given property. To the best of our knowledge, the contributions of [3,4] are the first results on parameterised control; by extension, we present the first results on parameterised control in a stochastic setting.

Distributed computing. Our model resembles two models introduced for the study of distributed computing. The first and most widely studied is population protocols, introduced in [2]: the agents are modelled by finite state systems and interact in pairs drawn at random. The mode of interaction is the key difference with the model we study here: in a time step, all of our agents perform simultaneously and independently the same action. This brings us closer to broadcast protocols as studied for instance in [8], in which one action involves an arbitrary number of agents. As explained in [3,4], our model can be seen as a subclass of (stochastic) broadcast protocols, but key differences exist in the semantics, making the two bodies of work technically independent. The focus of the distributed computing community when studying population or broadcast protocols is to construct the most efficient protocols for a given task, such as (prominently) electing a leader.
A growing literature from the verification community focusses on checking the correctness of a given protocol against a given specification; we refer to the recent survey [7] for an overview. We concentrate on the control problem, which can then be seen as a first result in the control of distributed systems in a stochastic setting.

Alternative semantics for probabilistic automata. It is very tempting to consider the limit case of infinitely many agents: the parameterised control question becomes the value 1 problem for probabilistic automata, which was proved undecidable in [13], and even in very restricted cases ([10]). Hence abstracting continuous distributions by a discrete population of arbitrary size can be seen as an approximation technique for probabilistic automata. Using n agents corresponds to using numerical approximation up to 2^(−n) with random rounding; in this sense the control problem considers arbitrarily fine approximations. The plague of undecidability results on probabilistic automata (see e.g. [9]) is nicely contrasted by our positive result, which is one of the few decidability results on probabilistic automata not making structural assumptions on the underlying graph.

Our results. We prove decidability of the stochastic control problem. The first insight is given by the theory of well quasi orders, which motivates the introduction of a new problem called the sequential flow problem. The first step of our solution is to reduce the stochastic control problem to (many instances of) the sequential flow problem. The second insight comes from the theory of regular cost functions, providing us with a set of tools for addressing the key difficulty of the problem, namely the fact that there are arbitrarily many agents.
Our key technical contribution is to show the computability of the sequential flow problem by reducing it to a boundedness question expressed in cost monadic second order logic using the max-flow min-cut theorem.

Related work. The notion of decisive Markov chains was introduced in [1] as a unifying property for studying infinite-state Markov chains with finite-like properties. A typical example of decisive Markov chains is lossy channel systems, where tokens can be lost anytime, inducing monotonicity properties. Our situation is the exact opposite, as we are considering (using the Petri net terminology) safe Petri nets where the number of tokens along a run is constant. So it is not clear whether the underlying arguments in the two cases can be unified using decisiveness.

Organisation of the paper. We define the stochastic control problem in Section 2, and the sequential flow problem in Section 3. We construct a reduction from the former to (many instances of) the latter in Section 4, and show the decidability of the sequential flow problem in Section 5.

2 The stochastic control problem

Definition 1. A Markov decision process (MDP for short) consists of
– a finite set of states Q,
– a finite set of actions A,
– a stochastic transition table ρ : Q × A → D(Q).

The interpretation of the transition table is that from the state p under action a, the probability to transition to q is ρ(p, a)(q). The transition relation Δ is defined by Δ = {(p, a, q) ∈ Q × A × Q : ρ(p, a)(q) > 0}. We also use Δ_a given by {(p, q) ∈ Q × Q : (p, a, q) ∈ Δ}.

We refer to [17] for the usual notions related to MDPs; it turns out that very little probability theory will be needed in this paper, so we restrict ourselves to mentioning only the relevant objects. In an MDP M, a strategy is a function σ : Q → A; note that we consider only pure and positional strategies, as they will be sufficient for our purposes.
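To make the support abstraction concrete, the relation Δ and its slices Δ_a can be extracted mechanically from the table ρ. The nested-dictionary encoding below is a hypothetical choice of mine, not prescribed by the paper:

```python
# Sketch: rho maps a state p to {action a: {state q: probability}}.
# Only the supports (which probabilities are nonzero) matter here.

def transition_relation(rho):
    """Delta = {(p, a, q) : rho(p, a)(q) > 0}."""
    return {(p, a, q)
            for p, actions in rho.items()
            for a, dist in actions.items()
            for q, prob in dist.items() if prob > 0}

def delta_a(rho, a):
    """Delta_a = {(p, q) : (p, a, q) in Delta}."""
    return {(p, q) for (p, b, q) in transition_relation(rho) if b == a}
```

For instance, on an invented three-state MDP with rho = {'s': {'a': {'s': 0.5, 'q': 0.5}}, 'q': {'a': {'q': 1.0}, 'b': {'s': 0.5, 't': 0.5}}, 't': {'a': {'t': 1.0}}}, the slice delta_a(rho, 'b') is {('q', 's'), ('q', 't')}.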
Given a source s ∈ Q and a target t ∈ Q, we say that the strategy σ almost surely reaches t if the probability that a path starting from s and consistent with σ eventually leads to t is 1. As we shall recall in Section 4, whether there exists a strategy ensuring to reach t almost surely from s, called the almost sure reachability problem for MDPs, can be reduced to solving a two player Büchi game, and in particular does not depend upon the exact probabilities. In other words, the only relevant information for each (p, a, q) ∈ Q × A × Q is whether ρ(p, a)(q) > 0 or not. Since the same will be true for the stochastic control problem we study in this paper, in our examples we do not specify the exact probabilities, and an edge from p to q labelled a means that ρ(p, a)(q) > 0.

Let us now fix an MDP M and consider a population of n tokens (we use tokens to represent the agents). Each token evolves in an independent copy of the MDP M. The controller acts through a strategy σ : Q^n → A, meaning that given the state each of the n tokens is in, the controller chooses one action to be performed by all tokens independently. Formally, we are considering the product MDP M^n whose set of states is Q^n, set of actions is A, and transition table is ρ^n(u, a)(v) = Π_{i=1}^{n} ρ(u_i, a)(v_i), where u, v ∈ Q^n and u_i, v_i are the i-th components of u and v.

Let s, t ∈ Q be the source and target states; we write s^n and t^n for the constant n-tuples where all components are s and t. For a fixed value of n, whether there exists a strategy ensuring to reach t^n almost surely from s^n can be reduced to solving a two player Büchi game in the same way as above for a single MDP, replacing M by M^n. The stochastic control problem asks whether this is true for arbitrary values of n:

Problem 1 (Stochastic control problem). The inputs are an MDP M, a source state s ∈ Q and a target state t ∈ Q. The question is whether for all n ∈ N, there exists a strategy ensuring to reach t^n almost surely from s^n.
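For a fixed n, Problem 1 is an ordinary almost sure reachability question on M^n. The sketch below solves it with the standard graph-based fixpoint (repeatedly prune configurations from which the target is not reachable with positive probability while staying inside the candidate region), rather than the Büchi game used in the paper; the support-set encoding of the MDP is my own assumption:

```python
from itertools import product

# The MDP is given by supports: delta maps (state, action) to the set of
# possible successors; exact probabilities are irrelevant (see above).

def product_supports(delta, actions, config):
    """Successor supports of a configuration of M^n under each action."""
    succ = {}
    for a in actions:
        if all((q, a) in delta for q in config):
            succ[a] = set(product(*(delta[(q, a)] for q in config)))
    return succ

def almost_sure_winning(delta, states, actions, n, target):
    """Configurations of M^n from which `target` is reached almost surely."""
    good = set(product(states, repeat=n))
    while True:
        # Usable actions must keep every possible successor inside `good`.
        usable = {u: [s for s in product_supports(delta, actions, u).values()
                      if s <= good]
                  for u in good}
        # Positive-probability backward reachability to the target.
        reach = {u for u in good if u == target}
        changed = True
        while changed:
            changed = False
            for u in good - reach:
                if any(s & reach for s in usable[u]):
                    reach.add(u)
                    changed = True
        if reach == good:
            return good
        good = reach
```

On a three-state MDP shaped like Example 1 below (with self-loops added on otherwise undefined actions, which is my guess at the figure), s^n is winning for n = 1 and n = 2, in line with the claim that the controller wins for every n.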
Our main result is the following.

Theorem 1. The stochastic control problem is decidable.

The fact that the problem is co-recursively enumerable is easy to see: if the answer is “no”, there exists n ∈ N such that there exists no strategy ensuring to reach t^n almost surely from s^n. Enumerating the values of n and solving the almost sure reachability problem for M^n eventually finds this out. However, it is not clear whether one can place an upper bound on such a witness n, which would yield a simple (yet inefficient!) algorithm. As a corollary of our analysis we can indeed derive such an upper bound, but it is non-elementary in the size of the MDP. In the remainder of this section we present a few interesting examples.

Example 1 Let us consider the MDP represented in Figure 1. We show that for this MDP, for any n ∈ N, the controller has an almost sure strategy to reach t^n from s^n. Starting with n tokens on s, we iterate the following strategy:
– Repeatedly play action a until all tokens are in q;
– Play action b.
The first step is eventually successful with probability one, since at each iteration there is a positive probability that the number of tokens in state q increases. In the second step, with non zero probability at least one token goes to t, while the rest go back to s. It follows that each iteration of this strategy increases with non zero probability the number of tokens in t. Hence, all tokens are eventually transferred to t almost surely.

Fig. 1. The controller can almost surely reach t^n from s^n, for any n ∈ N.

Example 2 We now consider the MDP represented in Figure 2. By convention, if from a state some action does not have any outgoing transition (for instance the action u from s), then it goes to the sink state ⊥. We show that there exists a controller ensuring to transfer seven tokens from s to t, but that the same does not hold for eight tokens.
For the first assertion, we present the following strategy:
– Play a. One of the states q_i^1 for i ∈ {u, d} receives at least 4 tokens.
– Play i ∈ {u, d}. At least 4 tokens go to t while at most 3 go to q_1.
– Play a. One of the states q_i^2 for i ∈ {u, d} receives at least 2 tokens.
– Play i ∈ {u, d}. At least 2 tokens go to t while at most 1 token goes to q_2.
– Play a. The token (if any) goes to q_i^3 for some i ∈ {u, d}.
– Play i ∈ {u, d}. The remaining token (if any) goes to t.
Now assume that there are 8 tokens or more on s. The only choices for a strategy are to play u or d on the second, fourth, and sixth move. First, with non zero probability at least 4 tokens are in each of q_i^1 for i ∈ {u, d}. Then, whatever the choice of action i ∈ {u, d}, there are at least 4 tokens in q_1 after the next step. Proceeding likewise, there are at least 2 tokens in q_2 with non zero probability two steps later. Then again two steps later, at least 1 token falls in the sink with non zero probability.

Fig. 2. The controller can synchronise up to 7 tokens on the target state t almost surely, but not more.

Generalising this example shows that if the answer to the stochastic control problem is “no”, the smallest number of tokens n for which there exists no almost sure strategy for reaching t^n from s^n may be exponential in |Q|. This can be further extended to show a doubly exponential in |Q| lower bound, as done in [3,4]; the example produced there holds for both the adversarial and the stochastic setting. Interestingly, for the adversarial setting this doubly exponential lower bound is tight. Our proof for the stochastic setting yields a non-elementary bound, leaving a very large gap.

Example 3 We consider the MDP represented in Figure 3. For any n ∈ N, there exists a strategy almost surely reaching t^n from s^n. However, this strategy has to pass tokens one by one through q. We iterate the following strategy:
– Repeatedly play action a until exactly 1 token is in q.
– Play action b. The token goes to q_i for some i ∈ {l, r}.
– Play action i ∈ {l, r}, which moves the token to t.
Note that the first step may take a very long time (the expected number of a's to be played until this happens is exponential in the number of tokens), but it is eventually successful with probability one. This very slow strategy is necessary: if q contains at least two tokens, then action b should not be played: with non zero probability, at least one token ends up in each of q_l, q_r, so at the next step some token ends up in ⊥. It follows that any strategy almost surely reaching t has to be able to detect the presence of at most 1 token in q. This is a key example for understanding the difficulty of the stochastic control problem.

Fig. 3. The controller can synchronise any number of tokens almost surely on the target state t, but they have to go one by one.

3 The sequential flow problem

We let Q be a finite set of states. We call configuration an element of N^Q and flow an element f ∈ N^{Q×Q}. A flow f induces two configurations pre(f) and post(f) defined by pre(f)(p) = Σ_{q∈Q} f(p, q) and post(f)(q) = Σ_{p∈Q} f(p, q). Given two configurations c, c′ and a flow f, we say that c goes to c′ using f, and write c →_f c′, if c = pre(f) and c′ = post(f).

A flow word is f = f_1 … f_ℓ where each f_i is a flow. We write c ⇝_f c′ if there exists a sequence of configurations c = c_0, c_1, …, c_ℓ = c′ such that c_{i−1} →_{f_i} c_i for all i ∈ {1, …, ℓ}. In this case, we say that c goes to c′ using the flow word f.

We now recall some classical definitions related to well quasi orders ([15,16], see [19] for an exposition of recent results). Let (E, ≤) be a quasi ordered set (i.e. ≤ is reflexive and transitive); it is a well quasi ordered set (WQO) if any infinite sequence contains an increasing pair. We say that S ⊆ E is downward closed if for any x ∈ S, if y ≤ x then y ∈ S. An ideal is a non-empty downward
closed set I ⊆ E such that for all x, y ∈ I, there exists some z ∈ I satisfying both x ≤ z and y ≤ z.

Lemma 1.
– Any infinite sequence of decreasing downward closed sets in a WQO is eventually constant.
– A subset is downward closed if and only if it is a finite union of incomparable ideals. We call it its decomposition into ideals (or simply, its decomposition), which is unique (up to permutation).
– An ideal is included in a downward closed set if and only if it is included in one of the ideals of its decomposition.

We equip the set of configurations N^Q and the set of flows N^{Q×Q} with the quasi order ≤ defined componentwise, yielding, thanks to Dickson's Lemma [6], two WQOs.

Lemma 2. Let X be a finite set. A subset of N^X is an ideal if and only if it is of the form a↓ = {c ∈ N^X | c ≤ a}, for some a ∈ (N ∪ {ω})^X (in which ω is larger than all integers).

We represent downward closed sets of configurations and flows using their decomposition into finitely many ideals of the form a↓ for a ∈ (N ∪ {ω})^Q or a ∈ (N ∪ {ω})^{Q×Q}.

Problem 2 (Sequential flow problem). Let Q be a finite set of states. Given a downward closed set of flows Flows ⊆ N^{Q×Q} and a downward closed set of final configurations F ⊆ N^Q, compute the downward closed set

Pre*(Flows, F) = {c ∈ N^Q | c ⇝_f c′ for some c′ ∈ F and f ∈ Flows*},

i.e. the configurations from which one may reach F using only flows from Flows.

4 Reduction of the stochastic control problem to the sequential flow problem

Let us consider an MDP M and a target t ∈ Q. We first recall a folklore result reducing the almost sure reachability question for MDPs to solving a two player Büchi game (we refer to [14] for the definitions and notations of Büchi games). The Büchi game is played between Eve and Adam as follows. From a state p:
1. Eve chooses an action a and a transition (p, q) ∈ Δ_a;
2. Adam can either choose to agree, and the game continues from q, or interrupt and choose another transition (p, q′) ∈ Δ_a; the game continues from q′.
The Büchi objective is satisfied (meaning Eve wins) if either the target state t is reached or Adam interrupts infinitely many times.

Lemma 3. There exists a strategy ensuring almost surely to reach t from s if and only if Eve has a winning strategy from s in the above Büchi game.

We now explain how this reduction can be extended to the stochastic control problem. Let us consider an MDP M and a target t ∈ Q. We now define an infinite Büchi game G_M. The set of vertices is the set of configurations N^Q. For a flow f, we write supp(f) = {(p, q) ∈ Q × Q : f(p, q) > 0}. The game is played as follows from a configuration c:
1. Eve chooses an action a and a flow f such that pre(f) = c and supp(f) ⊆ Δ_a.
2. Adam can either choose to agree, and the game continues from c′ = post(f), or interrupt and choose a flow f′ such that pre(f′) = c and supp(f′) ⊆ Δ_a, and the game continues from c′ = post(f′).
Note that Eve choosing a flow f is equivalent to choosing for each token a transition (p, q) ∈ Δ_a, inducing the configuration c′, and similarly for Adam should he decide to interrupt. Eve wins if either all tokens are in the target state, or if Adam interrupts infinitely many times.

Note that although the game is infinite, it is actually a disjoint union of finite games. Indeed, along a play the number of tokens is fixed, so each play is included in Q^n for some n ∈ N.

Lemma 4. Let c be a configuration with n tokens in total; the following are equivalent:
– There exists a strategy almost surely reaching t from c,
– Eve has a winning strategy in the Büchi game G_M starting from c.

Lemma 4 follows from applying Lemma 3 on the product MDP M^n.

We also consider the game G_M^(i) for i ∈ N, which is defined just as G_M except for the winning objective: Eve wins in G_M^(i) if either all tokens are in the target state, or if Adam interrupts more than i times. It is clear that if Eve has a winning strategy in G_M^(i) then she has a winning strategy in G_M. Conversely, the
Conversely, the (i) following result states that G is equivalent to G for some i. Lemma 5. There exists i ∈ N such that from any configuration c ∈ N , Eve (i) has a winning strategy in G if and only if Eve has a winning strategy in G . M 128 T. Colcombet et al. (i) (i) Q Proof: Let X ⊆ N be the winning region for Eve in G . We first argue that (i) X = X is the winning region in G . It is clear that X is contained in the winning region: if Eve has a strategy to ensure that either all tokens are in the target state, or that Adam interrupts infinitely many times, then it particular this is true for Adam interrupting more than i times for any i. The converse inclusion holds because G is a disjoint union of finite Buc ¨ hi games. Indeed, in a finite Buc ¨ hi game, since Adam can restrict himself to playing a memoryless winning strategy, if Eve can ensure that he interrupts a certain number of times (larger than the size of the game), then by a simple pumping argument this implies that Adam will interrupt infinitely many times. (i) To conclude, we note that each X is downward closed: indeed, a winning strategy from a configuration c can be used from a configuration c where there (i) are fewer tokens in each state. It follows that (X ) is a decreasing sequence i≥0 of downward closed sets in N , hence it stabilises thanks to Lemma 1, i.e. there (i ) (i) exists i ∈ N such that X = X , which concludes. Note that Lemma 4 and Lemma 5 substantiate the claims made in Section 2: pure positional strategies are enough and the answer to the stochastic control problem does not depend upon the exact probabilities in the MDP. Indeed, the construction of the Buc ¨ hi games do not depend on them, and the answer to the former is equivalent to determining whether Eve has a winning strategy in each of them. We are now fully equipped to show that a solution to the sequential flow problem yields the decidability of the stochastic control problem. 
Let F be the set of configurations for which all tokens are in state t. We let X^(i) ⊆ N^Q denote the winning region for Eve in the game G_M^(i). Note first that X^(0) = Pre*(Flows^0, F) where

Flows^0 = {f ∈ N^{Q×Q} : ∃a ∈ A, supp(f) ⊆ Δ_a}.

Indeed, in the game G_M^(0) Adam cannot interrupt, as this would make him lose immediately. Hence, the winning region for Eve in G_M^(0) is Pre*(Flows^0, F). We generalise this by setting Flows^i for all i > 0 to be the set of flows f ∈ N^{Q×Q} such that for some action a ∈ A:
– supp(f) ⊆ Δ_a, and
– for all f′ with pre(f′) = pre(f) and supp(f′) ⊆ Δ_a, we have post(f′) ∈ X^(i−1).
Equivalently, this is the set of flows for which, when played in the game G_M by Eve, Adam cannot use an interrupt move and force the configuration outside of X^(i−1). We now claim that

X^(i) = Pre*(Flows^i, F) for all i ≥ 0.

We note that this means that for each i, computing X^(i) reduces to solving one instance of the sequential flow problem. This induces an algorithm for solving the stochastic control problem: compute the sequence (X^(i))_{i≥0} until it stabilises, which is ensured by Lemma 5 and yields the winning region of G_M. The answer to the stochastic control problem is then whether the initial configuration where all tokens are in s belongs to the winning region of G_M.

Let us prove the claim by induction on i. Let c be a configuration in Pre*(Flows^i, F). This means that there exists a flow word f = f_1 ⋯ f_ℓ such that f_k ∈ Flows^i for all k, and c ⇝_f c′ ∈ F. Expanding the definition, there exist c_0 = c, …, c_ℓ = c′ such that c_{k−1} →_{f_k} c_k for all k.

Let us now describe a strategy for Eve in G_M^(i) starting from c. As long as Adam agrees, Eve successively chooses the sequence of flows f_1, f_2, … and the corresponding configurations c_1, c_2, … . If Adam never interrupts, then the game reaches the configuration c_ℓ ∈ F, and Eve wins.
Otherwise, as soon as Adam interrupts, by definition of Flows^i we reach a configuration d ∈ X^(i−1). By induction hypothesis, Eve has a strategy which ensures from d to either reach F or that Adam interrupts at least i − 1 times. In the latter case, adding the interrupt move leading to d yields i interrupts, so this is a winning strategy for Eve in G_M^(i), witnessing that c ∈ X^(i).

Conversely, assume that there is a winning strategy σ of Eve in G_M^(i) from a configuration c. Consider a play consistent with σ: it either reaches F or Adam interrupts. Let us denote by f = f_1, f_2, …, f_ℓ the sequence of flows until then. We argue that f_k ∈ Flows^i for k ∈ {1, …, ℓ}. Let f = f_k for some k; by definition of the game, supp(f) ⊆ Δ_a for some action a. Let f′ be such that pre(f′) = pre(f) and supp(f′) ⊆ Δ_a. In the game G_M, after Eve played f_k, Adam has the possibility to interrupt and choose f′. From this configuration onward the strategy σ is winning in G_M^(i−1), implying that f_k ∈ Flows^i. Thus f = f_1 f_2 … f_ℓ is a witness that c ∈ X^(i).

5 Computability of the sequential flow problem

Let Q be a finite set of states, Flows ⊆ N^{Q×Q} a downward closed set of flows and F ⊆ N^Q a downward closed set of configurations; the sequential flow problem is to compute the downward closed set Pre* defined by

Pre*(Flows, F) = {c ∈ N^Q | c ⇝_f c′ for some c′ ∈ F and f ∈ Flows*},

i.e. the configurations from which one may reach F using only flows from Flows. The following classical result of [22] allows us to further reduce our problem.

Lemma 6. The task of computing a downward closed set can be reduced to the task of deciding whether a given ideal is included in a downward closed set.

Thanks to Lemma 6, it is sufficient for solving the sequential flow problem to establish the following result.

Lemma 7. Let I be an ideal of the form a↓ for a ∈ (N ∪ {ω})^Q, and Flows ⊆ N^{Q×Q} be a downward closed set of flows.
It is decidable whether F can be reached from all configurations of I using only flows from Flows.

We call a vector a ∈ (N ∪ {ω})^{Q×Q} a capacity. A capacity word is a finite sequence of capacities. For two capacity words w, w′ of the same length, we write w ≤ w′ to mean that w_i ≤ w′_i for each i. Since flows are particular cases of capacities, we can compare flows with capacities in the same way.

Before proving Lemma 7, let us give an example and some notations. Given a state q, we write q ∈ N^Q for the vector which has value 1 on the q component and 0 elsewhere. More generally, we let αq for α ∈ N ∪ {ω} denote the vector with value α on the q component and 0 elsewhere. We use similar notations for flows. For instance, ωq_1 + q_2 has value ω in the q_1 component, 1 in the q_2 component, and 0 elsewhere.

In the instance of the sequential flow problem represented in Figure 4, we ask the following question: can F be reached from any configuration of I = (ωq_2)↓? The answer is yes: the capacity word w = (ac^{n−1}b)^n is such that nq_2 ⇝_f nq_4 ∈ F for a flow word f ≤ w, the beginning of which is described in Figure 5.

Fig. 4. An instance of the sequential flow problem. We let Flows = a↓ ∪ b↓ ∪ c↓ where a = ω(q_2,q_2) + (q_2,q_3) + ω(q_4,q_4), b = ω(q_1,q_2) + (q_3,q_4) + ω(q_4,q_4), and c = ω(q_1,q_1) + (q_2,q_1) + ω(q_2,q_2) + ω(q_3,q_3) + ω(q_4,q_4). Set also F = (ωq_4)↓.

Fig. 5. A flow word f = f_1 f_2 … f_{n+1} ≤ ac^{n−1}b such that nq_2 goes to (n−1)q_2 + q_4 using f. This construction can be extended to f ≤ w such that nq_2 goes to nq_4 using f.

We write a[ω ← n] for the configuration obtained from a by replacing all ωs by n.

The key idea for solving the sequential flow problem is to rephrase it using regular cost functions (a set of tools for solving boundedness questions).
Indeed, whether F can be reached from all configurations of I = a↓ using only flows from Flows can be equivalently phrased as a boundedness question, as follows: does there exist a bound on the values of n ∈ N such that a[ω ← n] ⇝^f c for some c ∈ F and f ∈ Flows*?

We show that this boundedness question can be formulated as a boundedness question for a formula of cost monadic logic, a formalism that we introduce now. We assume that the reader is familiar with monadic second-order logic (MSO) over finite words, and refer to [20] for the definitions. The syntax of cost monadic logic (cost MSO for short) extends MSO with the construct |X| ≤ N, where X is a second-order variable and N is a bounding variable. The semantics is defined as usual: w, n ⊨ φ for a word w ∈ A*, with n ∈ N specifying the bound N. We assume that there is at most one bounding variable, and that the construct |X| ≤ N appears positively, i.e. under an even number of negations. This ensures that the larger N is, the more true the formula: if w, n ⊨ φ, then w, n′ ⊨ φ for all n′ ≥ n.

The semantics of a formula φ of cost MSO induces a function A* → N ∪ {∞} defined by ⟦φ⟧(w) = inf { n ∈ N | w, n ⊨ φ }. The boundedness problem for cost monadic logic is the following problem: given a cost MSO formula φ over A*, is the function ⟦φ⟧ : A* → N ∪ {∞} bounded, i.e. does ∃n ∈ N, ∀w ∈ A*, w, n ⊨ φ hold? The decidability of the boundedness problem is a central result in the theory of regular cost functions ([5]).

Since in the theory of regular cost functions we are only interested in whether functions are bounded or not, we consider functions "up to boundedness properties". Concretely, this means that a cost function is an equivalence class of functions A* → N ∪ {∞}, with the equivalence being f ≈ g if there exists α : N → N such that f(w) is finite if and only if g(w) is finite, and in this case f(w) ≤ α(g(w)) and g(w) ≤ α(f(w)).
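The inf-semantics of cost MSO can be illustrated on a toy formula of our own (not one from the paper): φ = ∃X. (∀x. x ∈ X) ∧ |X| ≤ N, whose value ⟦φ⟧(w) is the length of w, so φ is unbounded over A*.

```python
# Toy illustration of cost MSO semantics for φ = ∃X. (∀x. x ∈ X) ∧ |X| ≤ N.
def models(w, n):
    # w, n ⊨ φ: the only X satisfying ∀x. x ∈ X is the full set of
    # positions, and then |X| ≤ n must hold.
    return len(w) <= n

def value(w):
    # [[φ]](w) = inf { n ∈ N | w, n ⊨ φ }
    return next(n for n in range(len(w) + 1) if models(w, n))

assert value("abc") == 3
assert value("") == 0
```

Since ⟦φ⟧(w) grows with |w|, no single n satisfies all words: this is exactly the kind of (un)boundedness question that the decision procedure for cost MSO settles.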
This is equivalent to stating that, for all X ⊆ A*, f is bounded over X if and only if g is bounded over X.

Let us now establish Lemma 7.

Proof: Let T = {q ∈ Q | a(q) = ω}. Note that for n sufficiently large, we have a[ω ← n]↓ = I ∩ {0, 1, ..., n}^Q. We let C ⊆ (N ∪ {ω})^(Q×Q) be the decomposition of Flows into ideals, that is, C is the minimal finite set such that Flows = ⋃_{b ∈ C} b↓. We let k denote the largest finite value that appears in the definition of C, that is, k = max{ b(q, q′) : b ∈ C, q, q′ ∈ Q, b(q, q′) ≠ ω }. Let us define the function

Φ : C* → N ∪ {ω},  w ↦ sup{ n ∈ N : ∃f ≤ w, a[ω ← n] ⇝^f F }.

By definition Φ is unbounded if and only if F can be reached from all configurations of I. Since boundedness of cost MSO is decidable, it suffices to construct a formula in cost monadic logic for Φ to obtain the decidability of our problem. Our approach will be to additively decompose the capacity word w into a finitary part w^(fin) (which is handled using a regular language), and several unbounded parts w^(s), one for each s ∈ T. The unbounded parts require a more careful analysis, which notably goes through the use of the max-flow min-cut theorem.

Note that a[ω ← n] decomposes as the sum of its finite part a_fin = a[ω ← 0] and Σ_{s∈T} ns. Since flows are additive, it holds that f ≤ w = w1 ... wℓ is a flow from c to F if and only if the capacity word w may be decomposed into (w^(s))_{s∈T} = (w1^(s) ... wℓ^(s))_{s∈T} and w^(fin) = w1^(fin) ... wℓ^(fin) such that
– all the numbers appearing in the w^(s) capacities are bounded by k,
– for all i ∈ {1, ..., ℓ}, wi = Σ_{s ∈ T ∪ {fin}} wi^(s),
– for all s ∈ T, ns ⇝^f F for some flow word f ≤ w^(s),
– and a_fin ⇝^f F for some flow word f ≤ w^(fin).

In order to encode such capacity words in cost MSO we use monadic variables W^(s)_{q,q′,p} where q, q′ ∈ Q, p ∈ {0, ..., k, ω} and s ∈ T ∪ {fin}. They are meant to satisfy that i ∈ W^(s)_{q,q′,p} if and only if wi^(s)(q, q′) = p.
We use bold W to denote the tuple (W^(s)_{q,q′,p})_{q,q′,p,s}, and W^(s) for (W^(s)_{q,q′,p})_{q,q′,p} when s ∈ T ∪ {fin} is fixed. The MSO formula IsDecomp(W, w) states that a decomposition (w^(s))_{s∈T∪{fin}} is semantically valid and sums to w:

∀i,  ⋀_{q,q′,s} ⋁_{p ∈ {0,...,k,ω}} ( i ∈ W^(s)_{q,q′,p} ∧ ⋀_{p′ ≠ p} i ∉ W^(s)_{q,q′,p′} )
  ∧ ⋀_{q,q′} ( wi(q, q′) = p ⟹ ⋁_{(p_s)_{s∈T∪{fin}} : Σ_s p_s = p} ⋀_{s∈T∪{fin}} i ∈ W^(s)_{q,q′,p_s} ).

For s ∈ T, we now consider the function

Ψ^(s) : ({0, 1, ..., k, ω}^(Q×Q))* → N ∪ {ω},  w^(s) ↦ sup{ n ∈ N | ∃f ≤ w^(s), ns ⇝^f F }.

We also define Ψ^(fin) ⊆ ({0, ..., k, ω}^(Q×Q))* to be the language of capacity words w^(fin) such that there exists a flow word f ≤ w^(fin) with a_fin ⇝^f F. Note that Ψ^(fin) is a regular language, since it is recognized by a finite automaton over configurations with values in {0, 1, ..., k|Q|} that may update the current bounded configuration only with flows smaller than the current letter of w^(fin). We have

Φ(w) = sup{ n | ∃W, IsDecomp(W, w) ∧ ⋀_{s∈T} Ψ^(s)(W^(s)) ≥ n ∧ W^(fin) ∈ Ψ^(fin) }.

Hence, it is sufficient to prove that for each s ∈ T, Ψ^(s) is definable in cost MSO.

Let us fix s and a capacity word w ∈ ({0, ..., k, ω}^(Q×Q))* of length |w| = ℓ. Consider the finite graph G with vertex set Q × {0, 1, ..., ℓ} and, for all i ≥ 1, an edge from (q, i−1) to (q′, i) labelled by wi(q, q′). Then Ψ^(s)(w) is the maximal flow from (s, 0) to (t, ℓ) in G. We recall that a cut in a graph with distinguished source s and target t is a set of edges such that removing them disconnects s and t. The cost of a cut is the sum of the weights of its edges. The max-flow min-cut theorem states that the maximal flow in a graph is exactly the minimal cost of a cut ([11]).
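The layered graph G above can be handled by any standard max-flow routine. Below is a minimal Edmonds–Karp sketch of ours (not the paper's construction), run on a toy three-layer graph with vertices of the form (state, position):

```python
from collections import deque

def max_flow(cap, s, t):
    # Edmonds-Karp: repeatedly push flow along shortest augmenting paths.
    nodes = {u for e in cap for u in e}
    adj = {u: set() for u in nodes}
    for (u, v) in cap:
        adj[u].add(v)
        adj[v].add(u)  # residual (backward) edges
    flow, total = {}, 0
    def res(u, v):  # residual capacity of edge (u, v)
        return cap.get((u, v), 0) - flow.get((u, v), 0) + flow.get((v, u), 0)
    while True:
        prev, q = {s: None}, deque([s])
        while q and t not in prev:  # BFS for a shortest augmenting path
            u = q.popleft()
            for v in adj[u]:
                if v not in prev and res(u, v) > 0:
                    prev[v] = u
                    q.append(v)
        if t not in prev:
            return total
        path, v = [], t
        while prev[v] is not None:
            path.append((prev[v], v))
            v = prev[v]
        bottleneck = min(res(u, v) for (u, v) in path)
        for (u, v) in path:
            flow[u, v] = flow.get((u, v), 0) + bottleneck
        total += bottleneck

# Toy layered graph Q x {0, 1, 2} with capacities on consecutive layers.
cap = {(("s", 0), ("q", 1)): 2, (("s", 0), ("r", 1)): 1,
       (("q", 1), ("t", 2)): 1, (("r", 1), ("t", 2)): 2}
assert max_flow(cap, ("s", 0), ("t", 2)) == 2
```

By max-flow min-cut, the returned value 2 coincides with the cheapest set of edges disconnecting ("s", 0) from ("t", 2), which is what the cost MSO formula below captures.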
Likewise, P = (P_{q,q′})_{q,q′∈Q} represents paths in the graph. Let Ψ̃^(s)(w) be defined by

inf{ n | ∃X, ⋀_{q,q′} ( n ≥ |X_{q,q′}| ∧ ∀i, i ∈ X_{q,q′} ⟹ wi(q, q′) < ω ) ∧ Disc_{s,t}(X, w) },

where Disc_{s,t}(X, w) expresses that X disconnects (s, 0) and (t, ℓ) in G. For instance, Disc_{s,t}(X, w) is defined by

∀P, [ ( ∀i, ⋀_{q,q′} ( i ∈ P_{q,q′} ⟹ wi(q, q′) > 0 ) ) ∧ ⋁_q 0 ∈ P_{s,q} ∧ ⋁_q ℓ ∈ P_{q,t}
  ∧ ( ∀i ≥ 1, ⋀_{q,q′} ( i ∈ P_{q,q′} ⟹ ⋁_{q″} i−1 ∈ P_{q″,q} ) ) ] ⟹ ∃i, ⋁_{q,q′} ( i ∈ X_{q,q′} ∧ i ∈ P_{q,q′} ).

Now Ψ̃^(s)(w) does not exactly define the minimal total weight Φ̃^(s)(w) of a cut, but rather the minimal value over all cuts of the maximum over (q, q′) ∈ Q² of how many edges of the cut are of the form ((q, i−1), (q′, i)). This is good enough for our purposes, since these two values are related by

Ψ̃^(s)(w) ≤ Φ̃^(s)(w) ≤ k|Q|² Ψ̃^(s)(w),

implying that the functions Ψ̃^(s) and Φ̃^(s) define the same cost function. In particular, Φ̃^(s), and hence Ψ^(s), is definable in cost MSO.

6 Conclusions

We showed the decidability of the stochastic control problem. Our approach uses well quasi orders and the sequential flow problem, which is then solved using the theory of regular cost functions. Together with the original result of [3,4] in the adversarial setting, our result contributes to the theoretical foundations of parameterised control.

We return to the first application of this model, the control of biological systems. As we discussed, the stochastic setting is perhaps more satisfactory than the adversarial one, although, as we saw, very complicated behaviours involving single agents emerge in the stochastic setting, which are arguably not pertinent for modelling biological systems. We thus pose two open questions. The first is to settle the complexity status of the stochastic control problem. Very recently, [18] proved the EXPTIME-hardness of the problem, which is interesting because the underlying phenomena involved in this hardness result are specific to the stochastic setting (and do not apply to the adversarial setting).
Our algorithm does not yield even elementary upper bounds, leaving a very large complexity gap. The second question concerns more accurately modelling biological systems: can we refine the stochastic control problem by taking into account the synchronising time of the controller, and restrict it to reasonable bounds?

Acknowledgements

We thank Nathalie Bertrand and Blaise Genest for introducing us to this fascinating problem, and for the preliminary discussions at the Simons Institute for the Theory of Computing in Fall 2015.

References

1. Abdulla, P.A., Henda, N.B., Mayr, R.: Decisive Markov chains. Logical Methods in Computer Science 3(4) (2007)
2. Angluin, D., Aspnes, J., Diamadi, Z., Fischer, M.J., Peralta, R.: Computation in networks of passively mobile finite-state sensors. Distributed Computing 18(4), 235–253 (2006)
3. Bertrand, N., Dewaskar, M., Genest, B., Gimbert, H.: Controlling a population. In: CONCUR. pp. 12:1–12:16 (2017)
4. Bertrand, N., Dewaskar, M., Genest, B., Gimbert, H., Godbole, A.A.: Controlling a population. Logical Methods in Computer Science 15(3) (2019)
5. Colcombet, T.: Regular cost functions, part I: logic and algebra over words. Logical Methods in Computer Science 9(3) (2013)
6. Dickson, L.E.: Finiteness of the odd perfect and primitive abundant numbers with n distinct prime factors. American Journal of Mathematics 35(4), 413–422 (1913)
7. Esparza, J.: Parameterized verification of crowds of anonymous processes. In: Dependable Software Systems Engineering, pp. 59–71. IOS Press (2016)
8. Esparza, J., Finkel, A., Mayr, R.: On the verification of broadcast protocols. In: LICS. pp. 352–359 (1999)
9. Fijalkow, N.: Undecidability results for probabilistic automata. SIGLOG News 4(4), 10–17 (2017)
10. Fijalkow, N., Gimbert, H., Horn, F., Oualhadj, Y.: Two recursively inseparable problems for probabilistic automata. In: MFCS. pp. 267–278 (2014)
11.
Ford, L.R., Fulkerson, D.R.: Maximal flow through a network. Canadian Journal of Mathematics 8, 399–404 (1956)
12. German, S.M., Sistla, A.P.: Reasoning about systems with many processes. Journal of the ACM 39(3), 675–735 (1992)
13. Gimbert, H., Oualhadj, Y.: Probabilistic automata on finite words: Decidable and undecidable problems. In: ICALP. pp. 527–538 (2010)
14. Grädel, E., Thomas, W., Wilke, T. (eds.): Automata, Logics, and Infinite Games. LNCS, vol. 2500. Springer (2002)
15. Higman, G.: Ordering by divisibility in abstract algebras. Proceedings of the London Mathematical Society s3-2(1), 326–336 (1952)
16. Kruskal, J.B.: The theory of well-quasi-ordering: A frequently discovered concept. Journal of Combinatorial Theory, Series A 13(3), 297–305 (1972)
17. Kučera, A.: Turn-based stochastic games. In: Lectures in Game Theory for Computer Scientists. Cambridge University Press (2011)
18. Mascle, C., Shirmohammadi, M., Totzke, P.: Controlling a random population is EXPTIME-hard. CoRR (2019)
19. Schmitz, S.: Algorithmic Complexity of Well-Quasi-Orders. Habilitation à diriger des recherches, École normale supérieure Paris-Saclay (Nov 2017)
20. Thomas, W.: Languages, automata, and logic. In: Handbook of Formal Language Theory, vol. III, pp. 389–455. Springer (1997)
21. Uhlendorf, J., Miermont, A., Delaveau, T., Charvin, G., Fages, F., Bottani, S., Hersen, P., Batt, G.: In silico control of biomolecular processes. Computational Methods in Synthetic Biology 13, 277–285 (2015)
22. Valk, R., Jantzen, M.: The residue of vector sets with applications to decidability problems in Petri nets. Acta Informatica 21, 643–674 (1985)
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Decomposing Probabilistic Lambda-Calculi

Ugo Dal Lago (Dipartimento di Informatica - Scienza e Ingegneria, Università di Bologna, Bologna, Italy), Giulio Guerrieri and Willem Heijltjes (Department of Computer Science, University of Bath, Bath, UK)

Abstract. A notion of probabilistic lambda-calculus usually comes with a prescribed reduction strategy, typically call-by-name or call-by-value, as the calculus is non-confluent and these strategies yield different results. This is a break with one of the main advantages of lambda-calculus: confluence, which means that results are independent of the choice of strategy. We present a probabilistic lambda-calculus where the probabilistic operator is decomposed into two syntactic constructs: a generator, which represents a probabilistic event; and a consumer, which acts on the term depending on a given event. The resulting calculus, the Probabilistic Event Lambda-Calculus, is confluent, and interprets the call-by-name and call-by-value strategies through different interpretations of the probabilistic operator into our generator and consumer constructs.
We present two notions of reduction, one via fine-grained local rewrite steps, and one by generation and consumption of probabilistic events. Simple types for the calculus are essentially standard, and they convey strong normalization. We demonstrate how we can encode call-by-name and call-by-value probabilistic evaluation.

1 Introduction

Probabilistic lambda-calculi [24,22,17,11,18,9,15] extend the standard lambda-calculus with a probabilistic choice operator N ⊕_p M, which chooses N with probability p and M with probability 1 − p (throughout this paper, we let p be 1/2 and will omit it). Duplication of N ⊕ M, as is wont to happen in lambda-calculus, raises a fundamental question about its semantics: do the duplicate occurrences represent the same probabilistic event, or different ones with the same probability?

For example, take the term ⊤ ⊕ ⊥ that represents a coin flip between the boolean values true ⊤ and false ⊥. If we duplicate this term, do the copies represent two distinct coin flips with possibly distinct outcomes, or do they represent a single coin flip that determines the outcome for both copies? Put differently again, when we duplicate ⊤ ⊕ ⊥, do we duplicate the event, or only its outcome?

In probabilistic lambda-calculus, these two interpretations are captured by the evaluation strategies of call-by-name (→cbn), which duplicates events, and call-by-value (→cbv), which evaluates any probabilistic choice before it is duplicated, and thus only duplicates outcomes. Consider the following example, where = tests equality of boolean values:

⊤ ⟵cbv (λx. x = x)(⊤ ⊕ ⊥) ⟶cbn ⊤ ⊕ ⊥

This situation is not ideal, for several related reasons.

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 136–156, 2020.
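The contrast in the example above can be simulated directly; the following toy sketch is ours, not the paper's formalism, with `rng.random() < 0.5` standing in for the coin flip ⊤ ⊕ ⊥:

```python
import random

def cbv(rng):
    # call-by-value: evaluate the choice once, duplicate only the outcome
    v = rng.random() < 0.5
    return v == v            # always True, i.e. ⊤

def cbn(rng):
    # call-by-name: duplicate the unevaluated choice, two independent events
    return (rng.random() < 0.5) == (rng.random() < 0.5)

rng = random.Random(0)
assert all(cbv(rng) for _ in range(1000))        # cbv: deterministically ⊤
assert {cbn(rng) for _ in range(1000)} == {True, False}  # cbn: a coin flip
```

Under call-by-value the test is deterministically true, while under call-by-name it is itself a coin flip, matching the two reducts ⊤ and ⊤ ⊕ ⊥ above.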
Firstly, it demonstrates that probabilistic lambda-calculus is non-confluent, negating one of the central properties of the lambda-calculus, and one of the main reasons why it is the prominent model of computation that it is. Secondly, it means that a probabilistic lambda-calculus must derive its semantics from a prescribed reduction strategy, and its terms only have meaning in the context of that strategy. Thirdly, combining different kinds of probabilities becomes highly involved [15], as it would require specialized reduction strategies. These issues present themselves even in a more general setting, namely that of commutative (algebraic) effects, which in general do not commute with copying.

We address these issues by a decomposition of the probabilistic operator into a generator a and a choice ⊕a, as follows:

N ⊕ M ≜ a. N ⊕a M

Semantically, a represents a probabilistic event that generates a boolean value recorded as a. The choice N ⊕a M is simply a conditional on a, choosing N if a is false and M if a is true. Syntactically, a is a boolean variable with an occurrence in ⊕a, and a acts as a probabilistic quantifier, binding all occurrences of a in its scope. (To capture a non-equal chance, one would attach a probability p to a generator, as a_p, though we will not do so in this paper.)

The resulting probabilistic event lambda-calculus Λ_PE, which we present in this paper, is confluent. Our decomposition allows us to separate duplicating an event, represented by the generator a, from duplicating only its outcome a, through having multiple choice operators ⊕a. In this way our calculus may interpret both original strategies, call-by-name and call-by-value, by different translations of standard probabilistic terms into Λ_PE: call-by-name by the above decomposition (see also Section 2), and call-by-value by a different one (see Section 7). For our initial example, we get the following translations and reductions:

cbn: (λx. x = x)(a. ⊤ ⊕a ⊥) →β (a. ⊤ ⊕a ⊥) = (b. ⊤ ⊕b ⊥) →→ ⊤ ⊕ ⊥   (1)
cbv: a. (λx. x = x)(⊤ ⊕a ⊥) →β a. (⊤ ⊕a ⊥) = (⊤ ⊕a ⊥) →→ ⊤   (2)

We present two reduction relations for our probabilistic constructs, both independent of beta-reduction. Our main focus will be on permutative reduction →p (Sections 2, 3), a small-step local rewrite relation which is computationally inefficient but gives a natural and very fine-grained operational semantics. Projective reduction →π (Section 6) is a more standard reduction, following the intuition that a generates a coin flip to evaluate ⊕a, and is coarser but more efficient. We further prove confluence (Section 4), and we give a system of simple types and prove strong normalization for typed terms by reducibility (Section 5). Omitted proofs can be found in [7], the long version of this paper.

1.1 Related Work

Probabilistic λ-calculi have been a topic of study since the pioneering work by Saheb-Djaromi [24], the first to give the syntax and operational semantics of a λ-calculus with binary probabilistic choice. Giving well-behaved denotational models for probabilistic λ-calculi has proved to be challenging, as witnessed by the many contributions spanning the last thirty years: from Jones and Plotkin's early study of the probabilistic powerdomain [17], to Jung and Tix's remarkable (and mostly negative) observations [18], to the very recent encouraging results by Goubault-Larrecq [16]. A particularly well-behaved model for probabilistic λ-calculus can be obtained by taking a probabilistic variation of Girard's coherent spaces [10], this way getting full abstraction [13].

On the operational side, one could mention a study of the various ways the operational semantics of a calculus with binary probabilistic choice can be specified, namely by small-step or big-step semantics, or by inductively or coinductively defined sets of rules [9].
Termination and complexity analysis of higher-order probabilistic programs seen as λ-terms have been studied by way of type systems in a series of recent results about size [6], intersection [4], and refinement type disciplines [1]. Contextual equivalence on probabilistic λ-calculi has been studied, and compared with equational theories induced by Böhm trees [19], applicative bisimilarity [8], or environmental bisimilarity [25].

In all the aforementioned works, probabilistic λ-calculi have been taken as implicitly endowed with either call-by-name or call-by-value strategies, for the reasons outlined above. There are only a few exceptions, namely some works on Geometry of Interaction [5], Probabilistic Coherent Spaces [14], and Standardization [15], which achieve, in different contexts, a certain degree of independence from the underlying strategy, thus accommodating both call-by-name and call-by-value evaluation. The way this is achieved, however, invariably relies on Linear Logic or related concepts. This is deeply different from what we do here.

Some words of comparison with Faggian and Ronchi Della Rocca's work on confluence and standardization [15] are also in order. The main difference between their approach and the one we pursue here is that the operator ! in their calculus Λ plays both the role of a marker for duplicability and that of a checkpoint for any probabilistic choice "flowing out" of the term (i.e. being fired). In our calculus, we do not control duplication, but we definitely make use of checkpoints. Put another way, Faggian and Ronchi Della Rocca's work is inspired by linear logic, while our approach is inspired by deep inference, even though this is, on purpose, not evident in the design of our calculus.

Probabilistic λ-calculi can also be seen as vehicles for expressing probabilistic models in the sense of Bayesian programming [23,3]. This, however, requires an operator for modeling conditioning, which complicates the metatheory considerably, and which we do not consider here.

Our permutative reduction is a refinement of that for the call-by-name probabilistic λ-calculus [20], and is an implementation of the equational theory of (ordered) binary decision trees via rewriting [27]. Probabilistic decision trees
This, however, requires an operator for modeling conditioning, which complicates the metatheory consid- erably, and that we do not consider here. Our permutative reduction is a refinement of that for the call-by-name prob- abilistic λ-calculus [20], and is an implementation of the equational theory of (ordered) binary decision trees via rewriting [27]. Probabilistic decision trees Decomposing Probabilistic Lambda-Calculi 139 have been proposed with a primitive binary probabilistic operator [22], but not with a decomposition as we explore here. 2 The Probabilistic Event λ-Calculus Λ PE Definition 1. The probabilistic event λ-calculus (Λ ) is given by the follow- PE ing grammar, with from left to right: a variable (denoted by x, y, z, . . . ), an abstraction,an application,a (labeled) choice, and a (probabilistic) generator. ·· M, N = x | λx.N | NM | N M | a .N ·· In a term λx. M the abstraction λx binds the free occurrences of the variable x in its scope M, and in a .N the generator a binds the label a in N. The calculus features a decomposition of the usual probabilistic sum , as follows. Δ a ⊕ ⊕ N M = a .N M (3) The generator a represents a probabilistic event, whose outcome, a binary value {0, 1} represented by the label a, is used by the choice operator ⊕. That is, a flips a coin setting a to 0 (resp. 1), and depending on this N ⊕M reduces to N (resp. M). We will use the unlabeled choice as in (3). This convention also gives the translation from a call-by-name probabilistic λ-calculus into Λ (the PE interpretation of a call-by-value probabilistic λ-calculus is in Section 7). Reduction. Reduction in Λ will consist of standard β-reduction plus an PE β evaluation mechanism for generators and choice operators, which implements probabilistic choice. We will present two such mechanisms: projective reduc- tion and permutative reduction . 
While projective reduction implements the given intuition for the generator and choice operator, we relegate it to Section 6 and make permutative reduction our main evaluation mechanism, for the reason that it is more fine-grained, and thus more general. Permutative reduction is based on the idea that any operator distributes over the labeled choice operator (see the reduction steps in Figure 1), even other choice operators, as below:

(N ⊕a M) ⊕b P ∼ (N ⊕b P) ⊕a (M ⊕b P)

To orient this as a rewrite rule, we need to give priority to one label over another. Fortunately, the relative position of the associated generators a and b provides just that. Then to define →p, we will want every choice to belong to some generator, and make the order of generators explicit.

Definition 2. The set fl(N) of free labels of a term N is defined inductively by:

fl(x) = ∅    fl(λx.M) = fl(M)    fl(MN) = fl(M) ∪ fl(N)
fl(a.M) = fl(M) \ {a}    fl(M ⊕a N) = fl(M) ∪ fl(N) ∪ {a}

A term M is label-closed if fl(M) = ∅.

(λx.N)M →β N[M/x]                                   (β)
N ⊕a N →p N                                          (i)
(N ⊕a M) ⊕a P →p N ⊕a P                              (c1)
N ⊕a (M ⊕a P) →p N ⊕a P                              (c2)
λx.(N ⊕a M) →p (λx.N) ⊕a (λx.M)                      (⊕λ)
(N ⊕a M)P →p (NP) ⊕a (MP)                            (⊕f)
N(M ⊕a P) →p (NM) ⊕a (NP)                            (⊕a)
(N ⊕a M) ⊕b P →p (N ⊕b P) ⊕a (M ⊕b P)  if a < b      (⊕⊕1)
N ⊕b (M ⊕a P) →p (N ⊕b M) ⊕a (N ⊕b P)  if a < b      (⊕⊕2)
b.(N ⊕a M) →p (b.N) ⊕a (b.M)  if a ≠ b               (⊕gen)
a.N →p N  if a ∉ fl(N)                               (gen)
λx. a.N →p a. λx.N                                   (genλ)
(a.N)M →p a.(NM)  if a ∉ fl(M)                       (genf)

Fig. 1. Reduction rules for β-reduction and p-reduction.

From here on, we consider only label-closed terms (we implicitly assume this, unless otherwise stated). All terms are identified up to renaming of their bound variables and labels. Given terms M and N and a variable x, M[N/x] is the capture-avoiding (for both variables and labels) substitution of N for the free occurrences of x in M. We speak of a representative M of a term when M is not considered up to such renaming.
A representative M of a term is well-labeled if for every occurrence of a generator a in M there is no other occurrence of a in its scope.

Definition 3 (Order for labels). Let M be a well-labeled representative of a term. We define an order <_M for the labels occurring in M as follows: a <_M b if and only if b occurs in the scope of a. For a well-labeled and label-closed representative M, <_M is a finite tree order.

Definition 4. Reduction → = →β ∪ →p in Λ_PE consists of β-reduction →β and permutative or p-reduction →p, both defined as the contextual closure of the rules given in Figure 1. We write →→ for the reflexive–transitive closure of →, and write reduction to normal form similarly for →β and →p. We write =p for the symmetric and reflexive–transitive closure of →p.

a. (λx. x = x)(⊤ ⊕a ⊥) →p a. ((λx. x = x)⊤) ⊕a ((λx. x = x)⊥)   (⊕a)
→→β a. (⊤ = ⊤) ⊕a (⊥ = ⊥) = a. ⊤ ⊕a ⊤ →→p ⊤   (i, gen)

Fig. 2. Example reduction of the cbv-translation of the term on p. 137.

Two example reductions are (1)–(2) on p. 137; a third, complete reduction is in Figure 2. The crucial feature of p-reduction is that a choice ⊕a does permute out of the argument position of an application, but a generator a does not, as below. Since the argument of a redex may be duplicated, this is how we characterize the difference between the outcome of a probabilistic event, whose duplicates may be identified, and the event itself, whose duplicates may yield different outcomes.

N (M ⊕a P) →p (NM) ⊕a (NP)        N (a. M) ↛p a. N M

By inspection of the rewrite rules in Figure 1, we can then characterize the normal forms of →p and → as follows.

Proposition 5 (Normal forms). The normal forms P0 of →p, respectively N0 of →, are characterized by the following grammars (with the unlabeled choice ⊕ as in (3)):

P0 ::= P1 | P0 ⊕ P0        N0 ::= N1 | N0 ⊕ N0
P1 ::= x | λx.P1 | P1 P0   N1 ::= N2 | λx.N1
                           N2 ::= x | N2 N0

3 Properties of Permutative Reduction

We will prove strong normalization and confluence of →p.
For strong normalization, the obstacle is the interaction between different choice operators, which may duplicate each other, creating super-exponential growth (this was inferred only from a simple simulation; we would be interested to know a rigorous complexity result). Fortunately, Dershowitz's recursive path orders [12] seem tailor-made for our situation.

Observe that the set Λ_PE endowed with →p is a first-order term rewriting system over a countably infinite set of variables and the signature Σ given by:
• the binary function symbol ⊕a, for any label a;
• the unary function symbol a, for any label a;
• the unary function symbol λx, for any variable x;
• the binary function symbol @, letting @(M, N) stand for MN.

Definition 6. Let M be a well-labeled representative of a label-closed term, and let Σ_M be the set of signature symbols occurring in M. We define ≺_M as the (strict) partial order on Σ_M generated by the following rules:

⊕a ≺_M ⊕b if a <_M b        ⊕a ≺_M b for any labels a, b        b ≺_M @, λx for any label b

Lemma 7. The reduction →p is strongly normalizing.

Proof. For the first-order term rewriting system (Λ_PE, →p) we derive a well-founded recursive path ordering < from ≺_M following [12, p. 289]. Let f and g range over function symbols, let [N1, ..., Nn] denote a multiset and extend < to multisets by the standard multiset ordering, and let N = f(N1, ..., Nn) and M = g(M1, ..., Mm); then

N < M ⟺ [N1, ..., Nn] < [M1, ..., Mm] if f = g;  [N1, ..., Nn] < [M] if f ≺_M g;  [N] ≤ [M1, ..., Mm] if f ≻_M g.

While ≺_M is defined only relative to Σ_M, reduction may only reduce the signature. Inspection of Figure 1 then shows that M →p N implies N < M.

Confluence of permutative reduction. With strong normalization, confluence of →p requires only local confluence. We reduce the number of cases to consider by casting the permutations of ⊕a as instances of a common shape.

Definition 8.
We define a context C[ ] (with exactly one hole [ ]) as follows, and let C[N] represent C[ ] with the hole [ ] replaced by N:

C[ ] ::= [ ] | λx.C[ ] | C[ ]M | N C[ ] | C[ ] ⊕a M | N ⊕a C[ ] | a.C[ ]

Observe that the six reduction rules (⊕λ) through (⊕gen) in Figure 1 are all of the following form, which we refer to collectively as (⊕):

C[N ⊕a M] →p C[N] ⊕a C[M]   (⊕)

Lemma 9 (Confluence of →p). Reduction →p is confluent.

Proof. By Newman's lemma and strong normalization of →p (Lemma 7), confluence follows from local confluence. The proof of local confluence consists of joining all critical pairs given by →p. Details are in the Appendix of [7].

Definition 10. We denote the unique p-normal form of a term N by N_p.

4 Confluence

We aim to prove that → = →β ∪ →p is confluent. We will use the standard technique of parallel β-reduction [26], a simultaneous reduction step on a number of β-redexes, which we define via a labeling of the redexes to be reduced. The central point is to find a notion of reduction that is diamond, i.e. such that every critical pair can be closed in one (or zero) steps. This will be our complete reduction, which consists of parallel β-reduction followed by p-reduction to normal form.

Definition 11. A labeled term P• is a term P with chosen β-redexes annotated as (λx.N)• M. The unique labeled β-step P• →β P_• from P• to the labeled reduct P_• reduces every labeled redex, and is defined inductively as follows:

(λx.N•)• M• →β N_•[M_•/x]     N• M• →β N_• M_•     x →β x
N• ⊕a M• →β N_• ⊕a M_•        λx.N• →β λx.N_•       a.N• →β a.N_•

A parallel β-step P ⇒β P_• is a labeled step P• →β P_• for some labeling P• of P. Note that P_• is an unlabeled term, since all labels are removed in the reduction. For the empty labeling, P• = P_• = P, so parallel reduction is reflexive: P ⇒β P.

Lemma 12. A parallel β-step P ⇒β P_• is a β-reduction P →→β P_•.

Proof. By induction on the labeled term P• generating P ⇒β P_•.

Lemma 13. Parallel β-reduction is diamond.

Proof.
Let P ⇒β P_• and P ⇒β P_◦ be two labeled reduction steps on a term P. We annotate each step with the labels of the other, which are preserved by reduction, to give the span from the doubly labeled term P•◦ = P◦•. Reducing the remaining labels then closes the diagram:

P_◦^• ⇐β P◦• = P•◦ ⇒β P_•^◦        and then        P_◦^• ⇒β P_◦• = P_•◦ ⇐β P_•^◦

This is proved by induction on P, where only two cases are not immediate: those where a redex carries one but not the other label. Consider a redex labeled ◦ but not •: reducing the ◦-labels fires the redex, (λx.N^{•◦})^◦ M^{•◦} ⇒β N^•_◦[M^•_◦/x], after which induction on N shows that N^•_◦[M^•_◦/x] ⇒β N_{◦•}[M_{◦•}/x]; reducing the •-labels instead gives (λx.N^◦_•)^◦ M^◦_•, whose ◦-step then yields N_{•◦}[M_{•◦}/x] = N_{◦•}[M_{◦•}/x]. The other case is symmetric.

4.1 Parallel Reduction and Permutative Reduction

For the commutation of (parallel) β-reduction with p-reduction, we run into the minor issue that a permuting generator or choice operator may block a redex: in both cases below, the term has a redex before the →p-step, but afterwards it is blocked:

(λx. N ⊕a M) P →p ((λx.N) ⊕a (λx.M)) P        (λx. a.N) M →p (a. λx.N) M

We address this by an adaptation of p-reduction on labeled terms, which is a strategy in →p that permutes past a labeled redex in one step.

Definition 14. A labeled p-reduction N• ↪p M• on labeled terms is a p-reduction of one of the forms

(λx. N• ⊕a M•)• P• ↪p ((λx.N•)• P•) ⊕a ((λx.M•)• P•)
(λx. a.N•)• M• ↪p a. ((λx.N•)• M•)

or a single p-step on unlabeled constructors in N•.

Lemma 15. Reduction to normal form in ↪p is equal to reduction to normal form in →p (on labeled terms).

Proof. Observe that ↪p and →p have the same normal forms. In one direction, since ↪p ⊆ →→p, reduction to normal form in ↪p is contained in that in →p. Conversely, let N →→p M with M normal. On this reduction, let P →p Q be the first step that is not an ↪p-step. Then there is an R such that P ↪p R and Q →→p R, and note that then N →→p R. By confluence, R →→p M, and by induction on the sum of the lengths of →p-paths from R (smaller than from N), R reduces to M by ↪p-steps, and hence so does N.
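The parallel step of Definition 11 can be sketched on a bare λ-fragment AST (our own hypothetical encoding; choices and generators are omitted for brevity):

```python
# Terms: ("var", x) | ("lam", x, body) | ("app", f, a) | ("app*", f, a),
# where "app*" marks a labeled redex, so its function part is a "lam".

def subst(t, x, s):
    # capture-avoiding only under the sketch assumption of distinct names
    tag = t[0]
    if tag == "var":
        return s if t[1] == x else t
    if tag == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, s))
    return (tag, subst(t[1], x, s), subst(t[2], x, s))

def parallel_step(t):
    # reduce every labeled redex, innermost results substituted outward
    tag = t[0]
    if tag == "var":
        return t
    if tag == "lam":
        return ("lam", t[1], parallel_step(t[2]))
    f, a = parallel_step(t[1]), parallel_step(t[2])
    if tag == "app*":
        return subst(f[2], f[1], a)  # f is ("lam", x, body)
    return ("app", f, a)

# (λx. (λy. y)* x)* z reduces in one parallel step to z
t = ("app*",
     ("lam", "x", ("app*", ("lam", "y", ("var", "y")), ("var", "x"))),
     ("var", "z"))
assert parallel_step(t) == ("var", "z")
```

Both nested labeled redexes fire in the single call, which is exactly the simultaneity that makes parallel reduction diamond.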
The following lemmata then give the required commutation properties of the relations →_p, ↦_p, and ⇒_β. Figure 3 illustrates these by commuting diagrams.

Lemma 16. If N^• ↦_p M^•, then (N_•)↓p = (M_•)↓p.

Proof. By induction on the rewrite step ↦_p. The two interesting cases are:

Case x ∈ fv(M):
  (λ^•x.M^•)(N^• ⊕_a P^•)  ↦_p  ((λ^•x.M^•)N^•) ⊕_a ((λ^•x.M^•)P^•),
whose ⇒_β-reducts M_•[(N_• ⊕_a P_•)/x] and M_•[N_•/x] ⊕_a M_•[P_•/x] have the same →_p-normal form.

Case x ∉ fv(M): the same ↦_p-step, whose ⇒_β-reducts M_• and M_• ⊕_a M_• are joined by →_p-steps.

How the critical pairs in these diagrams are joined shows that we cannot use the Hindley–Rosen Lemma [2, Prop. 3.3.5] to prove confluence of →_β ∪ →_p.

Lemma 17. N_{•↓p} = N_{↓p•}: taking the reduct and then the p-normal form gives the same result as ↦_p-normalizing and then taking the reduct.

Proof. Using Lemma 15 we decompose N^• ↦_p^* N^•↓p as

  N^• = N_1^• ↦_p N_2^• ↦_p ··· ↦_p N_n^• = N^•↓p,

where (N_i)_•↓p = (N_{i+1})_•↓p by Lemma 16.

4.2 Complete Reduction

To obtain a reduction strategy with the diamond property for →, we combine parallel reduction with permutative reduction to normal form into a notion of complete reduction ⇛. We will show that it is diamond (Lemma 19), and that any step in → maps onto a complete step on p-normal forms (Lemma 20). Confluence of → (Theorem 21) then follows: any two paths map onto complete paths on p-normal forms, which then converge by the diamond property.

Definition 18. A complete reduction step N ⇛ N_{•↓p} is a parallel β-step followed by p-reduction to normal form: N ⇒_β N_• →_p^* N_{•↓p}.

Lemma 19 (Complete reduction is diamond). If P ⇚ N ⇛ M then P ⇛ Q ⇚ M for some Q.

Proof. By a tiling of four squares, where M = N_{•↓p}, P = N_{◦↓p}, and Q = N_{◦•↓p}: the square top left (closing the two parallel steps N ⇒_β N_• and N ⇒_β N_◦) is by Lemma 13, the squares top right and bottom left are by Lemma 17, and the square bottom right is by confluence and strong normalization of p-reduction.

Lemma 20 (p-Normalization maps reduction to complete reduction). If N → M then N↓p ⇛ M↓p.

Proof. For a p-step N →_p M we have N↓p = M↓p, while ⇛ is reflexive.
For a β-step N →_β M, we label the reduced redex in N to get N^• with N ⇒_β N_• = M. Then Lemma 17 gives (N↓p)_{•↓p} = M↓p, and hence N↓p ⇒_β (N↓p)_• →_p^* M↓p, i.e. N↓p ⇛ M↓p.

Fig. 3. Diagrams for the Lemmata Leading up to Confluence: the commuting squares of Lemmas 16, 17, 19, and 20.

Theorem 21. Reduction → is confluent.

Proof. Given two reduction paths N →^* M and N →^* P, by Lemma 20 each maps onto a complete path, N↓p ⇛^* M↓p and N↓p ⇛^* P↓p. The main square is then closed by the diamond property of complete reduction (Lemma 19), giving M↓p ⇛^* Q ⇚^* P↓p for some Q.

5 Strong Normalization for Simply-Typed Terms

In this section, we prove that the relation → enjoys strong normalization on simply-typed terms. Our proof of strong normalization is based on the classic reducibility technique, and inherently has to deal with label-open terms. It thus makes sense to turn the order < from Definition 3 into something more formal, at the same time allowing terms to be label-open; this is done in Figure 4. It is easy to see that, modulo label α-equivalence, for every term M there is at least one θ such that θ ⊢_L M. An easy fact to check is that if θ ⊢_L M and M → N, then θ ⊢_L N. It thus makes sense to parametrize → on a sequence of labels θ, i.e., one can define a family of reduction relations →_θ on pairs of the form (M, θ). The set of strongly normalizable terms, and the number of steps to normal form, become themselves parametric:

• the set SN^θ of those terms M such that θ ⊢_L M and (M, θ) is strongly normalizing modulo →_θ;
• the function sn^θ assigning to any term in SN^θ the maximal number of steps to normal form.

Label Sequences:  θ ::= ε | a·θ
Label Judgments:  ξ ::= θ ⊢_L M
Label Rules:
  θ ⊢_L x
  θ ⊢_L M  implies  θ ⊢_L λx.M
  a·θ ⊢_L M  implies  θ ⊢_L a.M
  θ ⊢_L M and θ ⊢_L N  imply  θ ⊢_L MN
  θ ⊢_L M and θ ⊢_L N and a ∈ θ  imply  θ ⊢_L M ⊕_a N

Fig. 4.
Labeling Terms

Types:         τ ::= α | τ ⇒ ρ
Environments:  Γ ::= x_1 : τ_1, ..., x_n : τ_n
Judgments:     π ::= Γ ⊢ M : τ
Typing Rules:
  Γ, x : τ ⊢ x : τ
  Γ, x : τ ⊢ M : ρ  implies  Γ ⊢ λx.M : τ ⇒ ρ
  Γ ⊢ M : τ  implies  Γ ⊢ a.M : τ
  Γ ⊢ M : τ ⇒ ρ and Γ ⊢ N : τ  imply  Γ ⊢ MN : ρ
  Γ ⊢ M : τ and Γ ⊢ N : τ  imply  Γ ⊢ M ⊕_a N : τ

Fig. 5. Types, Environments, Judgments, and Rules

  L_1 ∈ SN^θ, ..., L_m ∈ SN^θ  imply  xL_1...L_m ∈ SN^θ
  ML_1...L_m ∈ SN^θ, NL_1...L_m ∈ SN^θ, a ∈ θ  imply  (M ⊕_a N)L_1...L_m ∈ SN^θ
  M[L_0/x]L_1...L_m ∈ SN^θ, L_0 ∈ SN^θ  imply  (λx.M)L_0...L_m ∈ SN^θ
  ML_1...L_m ∈ SN^{a·θ}, a ∉ fl(L_i) for all i  imply  (a.M)L_1...L_m ∈ SN^θ

Fig. 6. Closure Rules for the Sets SN^θ

We can now define types, environments, judgments, and typing rules, in Figure 5. Notice that the type structure is precisely that of the usual, vanilla, simply-typed λ-calculus (although terms are of course different), so we can reuse most of the usual proof of strong normalization, for example in the version given by Ralph Loader's notes [21], page 17.

Lemma 22. The closure rules in Figure 6 are all sound.

Since the structure of the type system is that of plain simple types, the definition of reducibility sets is the classic one:

  Red_α = {(Γ, θ, M) | M ∈ SN^θ ∧ Γ ⊢ M : α};
  Red_{τ⇒ρ} = {(Γ, θ, M) | (Γ ⊢ M : τ ⇒ ρ) ∧ (θ ⊢_L M) ∧
               ∀(ΓΔ, θ, N) ∈ Red_τ. (ΓΔ, θ, MN) ∈ Red_ρ}.

Before proving that all terms are reducible, we need some auxiliary results.

Lemma 23.
1. If (Γ, θ, M) ∈ Red_τ, then M ∈ SN^θ.
2. If Γ ⊢ xL_1...L_m : τ and L_1, ..., L_m ∈ SN^θ, then (Γ, θ, xL_1...L_m) ∈ Red_τ.
3. If (Γ, θ, M[L_0/x]L_1...L_m) ∈ Red_τ with Γ ⊢ L_0 : ρ and L_0 ∈ SN^θ, then (Γ, θ, (λx.M)L_0...L_m) ∈ Red_τ.
4. If (Γ, θ, ML_1...L_m) ∈ Red_τ with (Γ, θ, NL_1...L_m) ∈ Red_τ and a ∈ θ, then (Γ, θ, (M ⊕_a N)L_1...L_m) ∈ Red_τ.
5. If (Γ, a·θ, ML_1...L_m) ∈ Red_τ and a ∉ fl(L_i) for all i, then (Γ, θ, (a.M)L_1...L_m) ∈ Red_τ.

Proof.
The proof is by induction on τ. If τ is an atom α, then Point 1 follows by definition, while Points 2 to 5 come from Lemma 22. If τ is ρ ⇒ μ, Points 2 to 5 come directly from the induction hypothesis, while Point 1 can be proved by observing that M is in SN^θ if Mx is itself in SN^θ, where x is a fresh variable. By the induction hypothesis (on Point 2), we can say that (Γ(x : ρ), θ, x) ∈ Red_ρ, and conclude that (Γ(x : ρ), θ, Mx) ∈ Red_μ.

The following is the so-called Main Lemma:

Proposition 24. Suppose y_1 : τ_1, ..., y_n : τ_n ⊢ M : ρ and θ ⊢_L M, with (Γ, θ, N_j) ∈ Red_{τ_j} for all 1 ≤ j ≤ n. Then (Γ, θ, M[N_1/y_1, ..., N_n/y_n]) ∈ Red_ρ.

Proof. This is an induction on the structure of the term M:

• If M is a variable, necessarily one among y_1, ..., y_n, then the result is trivial.
• If M is an application LP, then there exists a type ξ such that y_1 : τ_1, ..., y_n : τ_n ⊢ L : ξ ⇒ ρ and y_1 : τ_1, ..., y_n : τ_n ⊢ P : ξ. Moreover, θ ⊢_L L and θ ⊢_L P, so we can safely apply the induction hypothesis and conclude that (Γ, θ, L[N/y]) ∈ Red_{ξ⇒ρ} and (Γ, θ, P[N/y]) ∈ Red_ξ. By definition, we get (Γ, θ, (LP)[N/y]) ∈ Red_ρ.
• If M is an abstraction λx.L, then ρ is an arrow type ξ ⇒ μ and y_1 : τ_1, ..., y_n : τ_n, x : ξ ⊢ L : μ. Now, consider any (ΓΔ, θ, P) ∈ Red_ξ. Our objective is to prove, with this hypothesis, that (ΓΔ, θ, (λx.L[N/y])P) ∈ Red_μ. By the induction hypothesis, since (ΓΔ, θ, N_i) ∈ Red_{τ_i}, we get that (ΓΔ, θ, L[N/y, P/x]) ∈ Red_μ. The thesis follows from Lemma 23.
• If M is a sum L ⊕_a P, we can make use of Lemma 23 and the induction hypothesis, and conclude.
• If M is a generator a.P, we can make use of Lemma 23 and the induction hypothesis. We should however observe that a·θ ⊢_L P, since θ ⊢_L M.

We now have all the ingredients for our proof of strong normalization:

Theorem 25. If Γ ⊢ M : τ and θ ⊢_L M, then M ∈ SN^θ.

Proof. Suppose that x_1 : ρ_1, ..., x_n : ρ_n ⊢ M : τ.
Since x_1 : ρ_1, ..., x_n : ρ_n ⊢ x_i : ρ_i for all i, and clearly θ ⊢_L x_i for every i, we can apply Proposition 24 and obtain that (Γ, θ, M[x/x]) ∈ Red_τ, from which, via Lemma 23, one gets the thesis.

6 Projective Reduction

Permutative reduction evaluates probabilistic sums purely by rewriting. Here we look at a more standard projective notion of reduction, which conforms more closely to the intuition that a generator a generates a probabilistic event to determine the choice ⊕_a. Using + for an external probabilistic sum, we expect a.N to reduce to N_0 + N_1, where each N_i is obtained from N by projecting every subterm M_0 ⊕_a M_1 to M_i. The question is, in what context should we admit this reduction? We first limit ourselves to reducing in head position.

Definition 26. The a-projections π_0^a(N) and π_1^a(N) are defined as follows:

  π_i^a(x) = x
  π_i^a(λx.N) = λx.π_i^a(N)
  π_i^a(NM) = π_i^a(N) π_i^a(M)
  π_0^a(N ⊕_a M) = π_0^a(N)        π_1^a(N ⊕_a M) = π_1^a(M)
  π_i^a(N ⊕_b M) = π_i^a(N) ⊕_b π_i^a(M)    if a ≠ b
  π_i^a(a.N) = a.N
  π_i^a(b.N) = b.π_i^a(N)    if a ≠ b.

Definition 27. A head context H[ ] is given by the following grammar.

  H[ ] ::= [ ] | λx.H[ ] | H[ ]N

Definition 28. Projective head reduction →_πh is given by

  H[a.N]  →_πh  H[π_0^a(N)] + H[π_1^a(N)].

We can simulate →_πh by permutative reduction if we interpret the external sum + by an outermost ⊕_a (taking special care if the label does not occur).

Proposition 29. Permutative reduction simulates projective head reduction:

  H[a.N]  →_p^*  H[N]                                  if a ∉ fl(N)
  H[a.N]  →_p^*  a.(H[π_0^a(N)] ⊕_a H[π_1^a(N)])       otherwise.

Proof. The case a ∉ fl(N) is immediate by the step discarding the vacuous generator. For the other case, observe that H[a.N] →_p^* a.H[N] by the steps permuting the generator past abstractions and function positions, and, since a does not occur in H[ ], that H[π_i^a(N)] = π_i^a(H[N]). By induction on N, if a is minimal in N (i.e. a ∈ fl(N) and a ≤ b for all b ∈ fl(N)), then N →_p^* π_0^a(N) ⊕_a π_1^a(N). As required, H[a.N] →_p^* a.(H[π_0^a(N)] ⊕_a H[π_1^a(N)]) if a ∈ fl(N).
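The a-projections of Definition 26 can be sketched directly over a tuple-shaped term representation; the representation itself is our own assumption for the sketch.

```python
def proj(i, a, t):
    """π_i^a of Definition 26 over tuple-shaped terms:
    ('var',x) | ('lam',x,N) | ('app',N,M) | ('choice',a,N,M) | ('gen',a,N)."""
    tag = t[0]
    if tag == 'var':
        return t
    if tag == 'lam':
        return ('lam', t[1], proj(i, a, t[2]))
    if tag == 'app':
        return ('app', proj(i, a, t[1]), proj(i, a, t[2]))
    if tag == 'choice':
        b, n, m = t[1], t[2], t[3]
        if b == a:                        # resolve the a-labeled choice
            return proj(i, a, n if i == 0 else m)
        return ('choice', b, proj(i, a, n), proj(i, a, m))
    b, n = t[1], t[2]                     # generator b.N
    return t if b == a else ('gen', b, proj(i, a, n))
```

Note that, as in the definition, projection stops at a generator that rebinds the same label, so inner occurrences of a under a.N are untouched.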
A gap remains between the generators that will not be duplicated, which we should be able to reduce, and the generators that projective head reduction does reduce. In particular, to interpret call-by-value probabilistic reduction in Section 7, we would like to reduce under other generators. However, permutative reduction does not permit exchanging generators, and so only simulates reducing in head position. While (independent) probabilistic events are generally considered interchangeable, it is a question whether the below equivalence is desirable.

  a.b.N ∼ b.a.N    (4)

We elide the issue by externalizing probabilistic events, and reducing with reference to a predetermined binary stream s ∈ {0,1}^ω representing their outcomes. In this way, we will preserve the intuitions of both permutative and projective reduction: we obtain a qualified version of the equivalence (4) (see (5) below), and will be able to reduce any generator on the spine of a term: under (other) generators and choices as well as under abstractions and in function position.

Definition 30. The set of streams is S = {0,1}^ω, ranged over by r, s, t, and i·s denotes a stream with i ∈ {0,1} as first element and s as the remainder.

Definition 31. The stream labeling N^s of a term N with a stream s ∈ S, which annotates generators as a^i with i ∈ {0,1} and variables as x^s with a stream s, is given inductively below. We lift β-reduction to stream-labeled terms by introducing a substitution case for stream-labeled variables: x^s[M/x] = M^s.

  (λx.N)^s = λx.N^s        (a.N)^{i·s} = a^i.N^s
  (NM)^s = N^s M^s         (N ⊕_a M)^s = N^s ⊕_a M^s

Definition 32. Projective reduction →_π on stream-labeled terms is the rewrite relation given by

  a^i.N  →_π  π_i^a(N).

Observe that in N^s, a generator that occurs under n other generators on the spine of N is labeled with the element of s at position n + 1.
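Stream-driven evaluation can be sketched as follows, restricting for simplicity to generators in head position: each generator consumes the next stream bit and resolves its choices by the corresponding projection. The projection function is repeated (from Definition 26) so the sketch is self-contained; the bit-list stand-in for an infinite stream is our own assumption.

```python
def proj(i, a, t):
    """π_i^a of Definition 26 over tuple-shaped terms."""
    tag = t[0]
    if tag == 'var':
        return t
    if tag == 'lam':
        return ('lam', t[1], proj(i, a, t[2]))
    if tag == 'app':
        return ('app', proj(i, a, t[1]), proj(i, a, t[2]))
    if tag == 'choice':
        b, n, m = t[1], t[2], t[3]
        if b == a:
            return proj(i, a, n if i == 0 else m)
        return ('choice', b, proj(i, a, n), proj(i, a, m))
    b, n = t[1], t[2]                     # generator b.N
    return t if b == a else ('gen', b, proj(i, a, n))

def run(t, bits):
    """Definition 32, restricted to head generators: label each head
    generator with the next stream bit i and fire a^i.N →π π_i^a(N)."""
    while t[0] == 'gen':
        i, bits = bits[0], bits[1:]
        t = proj(i, t[1], t[2])
    return t
```

On `a.b.(x ⊕_a (y ⊕_b z))`, the stream fully determines the outcome, making reduction deterministic as discussed below.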
Generators in argument position remain unlabeled until a β-step places them on the spine, in which case they become labeled by the new substitution case. We allow a term to be annotated with a finite prefix of a stream, e.g. N^i with a singleton i, so that only part of the spine is labeled. Subsequent labeling of a partly labeled term is then by (N^r)^s = N^{r·s} (abusing notation). To introduce streams via the external probabilistic sum, and to ignore an unused remaining stream after completing a probabilistic computation, we adopt the following equation.

  N = N^0 + N^1

Proposition 33. Projective reduction generalizes projective head reduction:

  H[a.N] = H[a^0.N] + H[a^1.N]  →_π  H[π_0^a(N)] + H[π_1^a(N)].

Returning to the interchangeability of probabilistic events, we refine (4) by exchanging the corresponding elements of the annotating streams:

  (a.b.N)^{i·j·s} = a^i.b^j.N^s  →_π^*  π_j^b(π_i^a(N^s))
       ∼                                    =              (5)
  (b.a.N)^{j·i·s} = b^j.a^i.N^s  →_π^*  π_i^a(π_j^b(N^s))

Stream labeling externalizes all probabilities, making reduction deterministic. This is expressed by the following proposition, stating that stream labeling commutes with reduction: if a generator remains unlabeled in M and becomes labeled after a reduction step M → N, what label it receives is predetermined. The deep reason is that stream labeling assigns an outcome to each generator in a way that corresponds to a call-by-name strategy for probabilistic reduction.

Proposition 34. If M → N by a step other than a.N → N (a ∉ fl(N)), then M^s → N^s.

Remark 35. The statement is false for the rule a.N → N (a ∉ fl(N)), as it removes a generator but not an element from the stream. Arguably, for this reason the rule should be excluded from the calculus. On the other hand, the rule is necessary to implement idempotence of ⊕, rather than just ⊕_a, as follows.
  N ⊕ N = a.(N ⊕_a N)  →_p  a.N  →_p  N    where a ∉ fl(N)

The proposition below then expresses that projective reduction is an invariant for permutative reduction. If N →_p M by a step (that is not the vacuous-generator rule) on a labeled generator a^i or a corresponding choice ⊕_a, then N and M reduce to a common term, N →_π^* P ⇚_π^* M, by the projective steps evaluating a^i.

Proposition 36. Projective reduction is an invariant for permutative reduction, as follows (with a case symmetric to the choice case, and where D[ ] is a context). The main squares are:

  a^i.C[D[N_0 ⊕_a N_1]] →_p a^i.C[D[N_0] ⊕_a D[N_1]],  both sides →_π π_i^a(C[D[N_i]]);
  λx.a^i.N →_p a^i.λx.N,  both sides →_π λx.π_i^a(N) = π_i^a(λx.N);
  (a^i.N)M →_p a^i.NM,  both sides →_π π_i^a(N)M = π_i^a(NM).

7 Call-by-value Interpretation

We consider the interpretation of a call-by-value probabilistic λ-calculus. For simplicity we allow duplicating (or deleting) β-redexes, and only restrict duplicating probabilities; our values V are then just deterministic terms, i.e. without choices, possibly applications and not necessarily β-normal (so that our →_βv is actually β-reduction on deterministic terms, unlike [9]). We evaluate the internal probabilistic choice ⊕ to an external probabilistic choice +.

  N ::= x | λx.N | MN | M ⊕ N        (λx.N)V →_βv N[V/x]
  V, W ::= x | λx.V | VW             M ⊕ N →_v M + N

The interpretation ⟦N⟧_v of a call-by-value term N into Λ_PE is given as follows. First, we translate N to a label-open term ⟦N⟧_open = θ ⊢_L P by replacing each open choice ⊕ with one with a unique label, where the label context θ collects the labels used. Then ⟦N⟧_v is the label closure ⟦N⟧_v = ⟨θ ⊢_L P⟩, which prefixes P with a generator a for every a in θ.

Definition 37 (Call-by-value interpretation). The open interpretation ⟦N⟧_open of a call-by-value term N is as follows, where all labels are fresh, and inductively ⟦N_i⟧_open = θ_i ⊢_L P_i for i ∈ {1, 2}.
  ⟦x⟧_open = ε ⊢_L x                ⟦N_1 N_2⟧_open = θ_2·θ_1 ⊢_L P_1 P_2
  ⟦λx.N_1⟧_open = θ_1 ⊢_L λx.P_1    ⟦N_1 ⊕ N_2⟧_open = θ_2·θ_1·a ⊢_L P_1 ⊕_a P_2

The label closure ⟨θ ⊢_L P⟩ is given inductively as follows.

  ⟨ε ⊢_L P⟩ = P        ⟨a·θ ⊢_L P⟩ = ⟨θ ⊢_L a.P⟩

The call-by-value interpretation of N is ⟦N⟧_v = ⟨⟦N⟧_open⟩.

Our call-by-value reduction may choose an arbitrary order in which to evaluate the choices in a term N, but the order of generators in the interpretation ⟦N⟧_v is necessarily fixed. To simulate a call-by-value reduction, we therefore cannot choose a fixed context stream a priori; all we can say is that for every reduction there is some stream that allows us to simulate it. Specifically, a reduction step C[N_0 ⊕ N_1] →_v C[N_j], where C[ ] is a call-by-value term context, is simulated by the following projective step.

  ... a^i. b^j. c^k ... D[P_0 ⊕_b P_1]  →_π  ... a^i. c^k ... D[P_j]

Here, ⟦C[N_0 ⊕ N_1]⟧_open = θ ⊢_L D[P_0 ⊕_b P_1] with D[ ] a Λ_PE-context, and θ giving rise to the sequence of generators ... a. b. c ... in the call-by-value translation. To simulate the reduction step, if b occupies the n-th position in θ, then the n-th position in the context stream s must be the element j. Since β-reduction survives the translation and labeling process intact, we may simulate call-by-value probabilistic reduction by projective and β-reduction.

Theorem 38. If N →_{v,βv}^* V then ⟦N⟧_v^s →_{π,β}^* ⟦V⟧_v for some stream s ∈ S.

8 Conclusions and Future Work

We believe our decomposition of probabilistic choice in the λ-calculus to be an elegant and compelling way of restoring confluence, one of the core properties of the λ-calculus. Our probabilistic event λ-calculus captures traditional call-by-name and call-by-value probabilistic reduction, and offers finer control beyond those strategies.
Permutative reduction implements a natural and fine-grained equivalence on probabilistic terms as internal rewriting, while projective reduction provides a complementary and more traditional external perspective.

There are a few immediate areas for future work. Firstly, within probabilistic λ-calculus, it is worth exploring whether our decomposition opens up new avenues in semantics. Secondly, our approach might apply to probabilistic reasoning more widely, outside the λ-calculus. Most importantly, we will explore whether our approach can be extended to other computational effects. Our use of streams interprets probabilistic choice as a read operation from an external source, which means other read operations can be treated similarly. A complementary treatment of write operations would allow us to express a considerable range of effects, including input/output and state.

Acknowledgments

This work was supported by EPSRC Project EP/R029121/1 Typed Lambda-Calculi with Sharing and Unsharing. The first author is partially supported by the ANR project 19CE480014 PPS, the ERC Consolidator Grant 818616 DIAPASoN, and the MIUR PRIN 201784YSZ5 ASPRA. We thank the referees for their diligence and their helpful comments. We are grateful to Chris Barrett and, indirectly, Anupam Das for pointing us to Zantema and Van de Pol's work [27].

References

1. Avanzini, M., Dal Lago, U., Ghyselen, A.: Type-based complexity analysis of probabilistic functional programs. In: 34th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2019. pp. 1–13. IEEE Computer Society (2019)
2. Barendregt, H.P.: The Lambda Calculus – Its Syntax and Semantics. Studies in Logic and the Foundations of Mathematics, vol. 103. North-Holland (1984)
3. Borgström, J., Dal Lago, U., Gordon, A.D., Szymczak, M.: A lambda-calculus foundation for universal probabilistic programming. In: 21st ACM SIGPLAN International Conference on Functional Programming, ICFP 2016. pp. 33–46. ACM (2016)
4. Breuvart, F., Dal Lago, U.: On intersection types and probabilistic lambda calculi. In: Proceedings of the 20th International Symposium on Principles and Practice of Declarative Programming, PPDP 2018. pp. 8:1–8:13. ACM (2018)
5. Dal Lago, U., Faggian, C., Valiron, B., Yoshimizu, A.: The geometry of parallelism: classical, probabilistic, and quantum effects. In: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017. pp. 833–845. ACM (2017)
6. Dal Lago, U., Grellois, C.: Probabilistic termination by monadic affine sized typing. ACM Transactions on Programming Languages and Systems 41(2), 10:1–10:65 (2019)
7. Dal Lago, U., Guerrieri, G., Heijltjes, W.: Decomposing probabilistic lambda-calculi (long version) (2020)
8. Dal Lago, U., Sangiorgi, D., Alberti, M.: On coinductive equivalences for higher-order probabilistic functional programs. In: The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '14. pp. 297–308. ACM (2014)
9. Dal Lago, U., Zorzi, M.: Probabilistic operational semantics for the lambda calculus. RAIRO - Theoretical Informatics and Applications 46(3), 413–450 (2012)
10. Danos, V., Ehrhard, T.: Probabilistic coherence spaces as a model of higher-order probabilistic computation. Information and Computation 209(6), 966–991 (2011)
11. de'Liguoro, U., Piperno, A.: Non deterministic extensions of untyped lambda-calculus. Information and Computation 122(2), 149–177 (1995)
12. Dershowitz, N.: Orderings for term-rewriting systems. Theoretical Computer Science 17, 279–301 (1982)
13. Ehrhard, T., Pagani, M., Tasson, C.: Full abstraction for probabilistic PCF. Journal of the ACM 65(4), 23:1–23:44 (2018)
14. Ehrhard, T., Tasson, C.: Probabilistic call by push value. Logical Methods in Computer Science 15(1) (2019)
15. Faggian, C., Ronchi Della Rocca, S.: Lambda calculus and probabilistic computation.
In: 34th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2019. pp. 1–13. IEEE Computer Society (2019)
16. Goubault-Larrecq, J.: A probabilistic and non-deterministic call-by-push-value language. In: 34th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2019. pp. 1–13. IEEE Computer Society (2019)
17. Jones, C., Plotkin, G.D.: A probabilistic powerdomain of evaluations. In: Proceedings of the Fourth Annual Symposium on Logic in Computer Science (LICS '89). pp. 186–195. IEEE Computer Society (1989)
18. Jung, A., Tix, R.: The troublesome probabilistic powerdomain. Electronic Notes in Theoretical Computer Science 13, 70–91 (1998)
19. Leventis, T.: Probabilistic Böhm trees and probabilistic separation. In: Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2018. pp. 649–658. IEEE Computer Society (2018)
20. Leventis, T.: A deterministic rewrite system for the probabilistic λ-calculus. Mathematical Structures in Computer Science 29(10), 1479–1512 (2019)
21. Loader, R.: Notes on simply typed lambda calculus. Report ECS-LFCS-98-381, Laboratory for Foundations of Computer Science, University of Edinburgh (1998)
22. Manber, U., Tompa, M.: Probabilistic, nondeterministic, and alternating decision trees. In: 14th Annual ACM Symposium on Theory of Computing. pp. 234–244 (1982)
23. Ramsey, N., Pfeffer, A.: Stochastic lambda calculus and monads of probability distributions. In: Conference Record of POPL 2002: The 29th SIGPLAN-SIGACT Symposium on Principles of Programming Languages. pp. 154–165 (2002)
24. Saheb-Djahromi, N.: Probabilistic LCF. In: Mathematical Foundations of Computer Science 1978, Proceedings, 7th Symposium. Lecture Notes in Computer Science, vol. 64, pp. 442–451. Springer (1978)
25. Sangiorgi, D., Vignudelli, V.: Environmental bisimulations for probabilistic higher-order languages.
In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016. pp. 595–607 (2016)
26. Takahashi, M.: Parallel reductions in lambda-calculus. Information and Computation 118(1), 120–127 (1995)
27. Zantema, H., van de Pol, J.: A rewriting approach to binary decision diagrams. The Journal of Logic and Algebraic Programming 49(1-2), 61–86 (2001)

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

On the k-synchronizability of Systems

Cinzia Di Giusto, Laetitia Laversa, and Etienne Lozes
Université Côte d'Azur, CNRS, I3S, Sophia Antipolis, France
{cinzia.di-giusto,laetitia.laversa,etienne.lozes}

Abstract. We study k-synchronizability: a system is k-synchronizable if any of its executions, up to reordering causally independent actions, can be divided into a succession of k-bounded interaction phases. We show two results (both for mailbox and peer-to-peer automata): first, the reachability problem is decidable for k-synchronizable systems; second, the membership problem (whether a given system is k-synchronizable) is decidable as well. Our proofs fix several important issues in previous attempts to prove these two results for mailbox automata.
Keywords: Verification · Communicating Automata · A/Synchronous Communication

1 Introduction

Asynchronous message-passing is ubiquitous in communication-centric systems; these include high-performance computing, distributed memory management, event-driven programming, and web services orchestration. One of the parameters that play an important role in these systems is whether the number of pending sent messages can be bounded in a predictable fashion, or whether the buffering capacity offered by the communication layer should be unlimited. Clearly, when considering implementation, testing, or verification, bounded asynchrony is preferred over unbounded asynchrony. Indeed, for bounded systems, reachability analysis and invariant inference can be solved by regular model-checking [5]. Unfortunately, even if designing a new system in this setting is easier, this is not the case when considering that the buffering capacity is unbounded, or that the bound is not known a priori. Thus, a question that arises naturally is: how can we bound the "behaviour" of a system so that it operates as one with unbounded buffers? In a recent work [4], Bouajjani et al. introduced the notion of k-synchronizable systems of finite state machines communicating through mailboxes and showed that the reachability problem is decidable for such systems. Intuitively, a system is k-synchronizable if any of its executions, up to reordering causally independent actions, can be chopped into a succession of k-bounded interaction phases. Each of these phases starts with at most k send actions that are followed by at most k receptions. Notice that a system may be k-synchronizable even if some of its executions require buffers of unbounded capacity.

As explained in the present paper, this result, although valid, is surprisingly non-trivial, mostly due to complications introduced by the mailbox semantics of
© The Author(s) 2020. J. Goubault-Larrecq and B.
König (Eds.): FOSSACS 2020, LNCS 12077, pp. 157–176, 2020.
158 C. Di Giusto et al.
communications. Some of these complications were missed by Bouajjani et al., and the algorithm for the reachability problem in [4] suffers from false positives. Another problem is the membership problem for the subclass of k-synchronizable systems: for a given k and a given system of communicating finite state machines, is this system k-synchronizable? The main result in [4] is that this problem is decidable. However, again, the proof of this result contains an important flaw at the very first step that breaks all subsequent developments; as a consequence, the algorithm given in [4] produces both false positives and false negatives.

In this work, we present a new proof of the decidability of the reachability problem together with a new proof of the decidability of the membership problem. Quite surprisingly, the reachability problem is the more demanding in terms of causality analysis, whereas the membership problem, although rather intricate, builds on a simpler dependency analysis. We also extend both decidability results to the case of peer-to-peer communication.

Outline. The next section recalls the definition of communicating systems and related notions. In Section 3 we introduce k-synchronizability and give a graphical characterisation of this property. This characterisation corrects Theorem 1 in [4] and highlights the flaw in the proof of the membership problem. Next, in Section 4, we establish the decidability of the reachability problem, which is the core of our contribution and departs considerably from [4]. In Section 5, we show the decidability of the membership problem. Section 6 extends the previous results to the peer-to-peer setting. Finally, Section 7 concludes the paper, discussing other related works.
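The phase shape described in the introduction can be sketched as a simple check on action sequences. This is only an illustration of the "at most k sends followed by at most k receptions" pattern on a sequence as given; the actual definition of k-synchronizability also allows reordering causally independent actions, which the sketch ignores. The action encoding is our own assumption.

```python
def is_k_phased(actions, k):
    """Check that `actions` (tuples starting with 'send' or 'rec') splits
    into phases of at most k sends followed by at most k receptions.
    A sketch: no reordering of causally independent actions is attempted."""
    i, n = 0, len(actions)
    while i < n:
        sends = 0
        while i < n and actions[i][0] == 'send' and sends < k:
            sends += 1
            i += 1
        if sends == 0:
            return False          # a reception with no send phase before it
        recs = 0
        while i < n and actions[i][0] == 'rec' and recs < k:
            recs += 1
            i += 1
    return True
```

For instance, a strictly alternating send/receive execution is 1-phased, while two sends buffered before their receptions need k = 2.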
Proofs and some additional material are available online.

2 Preliminaries

A communicating system is a set of finite state machines that exchange messages: automata have transitions labelled with either send or receive actions. The paper mainly considers mailboxes as the communication architecture: messages await reception in FIFO buffers, one per automaton, storing all messages sent to that automaton regardless of their senders. Section 6, instead, treats peer-to-peer systems; their introduction is therefore delayed to that point.

Let V be a finite set of messages and P a finite set of processes. A send action, denoted send(p, q, v), designates the sending of message v from process p to process q. Similarly, a receive action rec(p, q, v) expresses that process q receives message v from p. We write a to denote a send or receive action. Let S = {send(p, q, v) | p, q ∈ P, v ∈ V} be the set of send actions and R = {rec(p, q, v) | p, q ∈ P, v ∈ V} the set of receive actions. S_p and R_p stand for the sets of sends and receives of process p, respectively. Each process is encoded by an automaton, and by abuse of notation we say that a system is the parallel composition of processes.

Definition 1 (System). A system is a tuple S = ⟨(L_p, δ_p, l_p^0) | p ∈ P⟩ where, for each process p, L_p is a finite set of local control states, δ_p ⊆ L_p × (S_p ∪ R_p) × L_p is the transition relation (also denoted l_p →^a l_p'), and l_p^0 is the initial state.

Definition 2 (Configuration). Let S = ⟨(L_p, δ_p, l_p^0) | p ∈ P⟩. A configuration is a pair (l, Buf) where l = (l_p)_{p∈P} ∈ Π_{p∈P} L_p is a global control state of S (a local control state for each automaton), and Buf = (b_p)_{p∈P} ∈ (V^*)^P is a vector of buffers, each b_p being a word over V. We write l_0 to denote the vector of initial states of all processes p ∈ P, and Buf_0 stands for the vector of empty buffers. The semantics of a system is defined by the two rules below.
  [SEND]
    l_p →^{send(p,q,v)} l_p'    b_q' = b_q · v
    ───────────────────────────────────────────────────────
    (l, Buf) →^{send(p,q,v)} (l[l_p'/l_p], Buf[b_q'/b_q])

  [RECEIVE]
    l_q →^{rec(p,q,v)} l_q'    b_q = v · b_q'
    ───────────────────────────────────────────────────────
    (l, Buf) →^{rec(p,q,v)} (l[l_q'/l_q], Buf[b_q'/b_q])

A send action appends a message to the buffer b_q of the receiver, and a receive action pops the message from the front of this buffer. An execution e = a_1 ··· a_n is a sequence of actions in S ∪ R such that (l_0, Buf_0) →^{a_1} ··· →^{a_n} (l, Buf) for some l and Buf. As usual, ⇒^e stands for →^{a_1} ··· →^{a_n}. We write asEx(S) to denote the set of asynchronous executions of a system S.

In a sequence of actions e = a_1 ··· a_n, a send action a_i = send(p, q, v) is matched by a reception a_j = rec(p', q', v') if i < j, p = p', q = q', v = v', and there is ℓ ≥ 1 such that a_i and a_j are, respectively, the ℓ-th send and the ℓ-th receive of e with these parameters. A send action a_i is unmatched if there is no matching reception in e. A message exchange of a sequence of actions e is a set either of the form v = {a_i, a_j} with a_i matched by a_j, or of the form v = {a_i} with a_i unmatched. For a message v_i, we write v̄_i for the corresponding message exchange. When v is either an unmatched send(p, q, v) or a pair of matched actions {send(p, q, v), rec(p, q, v)}, we write proc_S(v) for p and proc_R(v) for q. Note that proc_R(v) is defined even if v is unmatched. Finally, we write procs(v) for {p} in the case of an unmatched send and {p, q} in the case of a matched send.

An execution imposes a total order on the actions. We are interested in stressing the causal dependencies between messages. We thus make use of message sequence charts (MSCs), which only impose an order between matched pairs of actions and between the actions of a same process. Informally, an MSC is depicted with vertical timelines (one for each process), where time goes from top to bottom, carrying events (points) that represent the send and receive actions of that process (see Fig. 1).
An arc is drawn between two matched events. We will also draw a dashed arc to depict an unmatched send event. An MSC is, thus, a partially ordered set of events, each corresponding to a send or receive action.

Definition 3 (MSC). A message sequence chart is a tuple (Ev, λ, ≺), where
– Ev is a finite set of events,
– λ : Ev → S ∪ R tags each event with an action,
– ≺ = (≺_po ∪ ≺_src)+ is the transitive closure of ≺_po and ≺_src, where:
  • ≺_po is a partial order on Ev such that, for all processes p, ≺_po induces a total order on the set of events of process p, i.e., on λ^{-1}(S_p ∪ R_p);
  • ≺_src is a binary relation that relates each receive event to its preceding send event:
    ∗ for all events r ∈ λ^{-1}(R), there is exactly one event s such that s ≺_src r;
    ∗ for all events s ∈ λ^{-1}(S), there is at most one event r such that s ≺_src r;
    ∗ for any two events s, r such that s ≺_src r, there are p, q, v such that λ(s) = send(p, q, v) and λ(r) = rec(p, q, v).

160 C. Di Giusto et al.

Fig. 1: (a) and (b): two MSCs that violate causal delivery. (c) and (d): an MSC and its conflict graph.

We identify MSCs up to graph isomorphism (i.e., we view an MSC as a labeled graph). For a given well-formed (i.e., each reception is matched) sequence of actions e = a_1 ... a_n, we let msc(e) be the MSC where Ev = [1..n], ≺_po is the set of pairs of indices (i, j) such that i < j and {a_i, a_j} ⊆ S_p ∪ R_p for some p ∈ P (i.e., a_i and a_j are actions of a same process), and ≺_src is the set of pairs of indices (i, j) such that a_i ⊢ a_j. We say that e = a_1 ... a_n is a linearisation of msc(e), and we write asTr(S) to denote {msc(e) | e ∈ asEx(S)}, the set of MSCs of system S.

Mailbox communication imposes a number of constraints on what and when messages can be read. The precise definition is given below; we first discuss some of the possible scenarios.
For instance, if two messages are sent to a same process, they will be received in the same order as they have been sent. As another example, unmatched messages also impose some constraints: if a process p sends an unmatched message to r, it will not be able to send matched messages to r afterwards (Fig. 1a); similarly, if a process p sends an unmatched message to r, any process q that receives subsequent messages from p will not be able to send matched messages to r afterwards (Fig. 1b). When an MSC satisfies the constraints imposed by mailbox communication, we say that it satisfies causal delivery. Notice that, by construction, all executions satisfy causal delivery.

Definition 4 (Causal Delivery). Let (Ev, λ, ≺) be an MSC. We say that it satisfies causal delivery if the MSC has a linearisation e = a_1 ... a_n such that for any two events i ≺ j with a_i = send(p, q, v) and a_j = send(p′, q, v′), either a_j is unmatched, or there are i′, j′ such that a_i ⊢ a_{i′}, a_j ⊢ a_{j′}, and i′ ≺ j′.

Our definition enforces the following intuitive property.

Proposition 1. An MSC msc satisfies causal delivery if and only if there is a system S and an execution e ∈ asEx(S) such that msc = msc(e).

We now recall from [4] the definition of the conflict graph depicting the causal dependencies between message exchanges. Intuitively, we have a dependency whenever two messages have a process in common. For instance, an −SS→ dependency between message exchanges v and v′ expresses the fact that v′ has been sent after v, by the same process.

Definition 5 (Conflict Graph). The conflict graph CG(e) of a sequence of actions e = a_1 ··· a_n is the labeled graph (V, {−XY→}_{X,Y ∈ {S,R}}) where V is the set of message exchanges of e and, for all X, Y ∈ {S, R} and all v, v′ ∈ V, there is an XY dependency edge v −XY→ v′ between v and v′ if there are i < j such that {a_i} = v ∩ X, {a_j} = v′ ∩ Y, and proc_X(v) = proc_Y(v′).
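Definition 5 is effective: the conflict graph can be computed directly from an action sequence. The following sketch is our own encoding (not the paper's); matching pairs the ℓ-th send with the ℓ-th receive of the same (p, q, v) triple, as in the definition of matched actions.

```python
from collections import defaultdict

def conflict_graph(actions):
    """Build the conflict graph of an action sequence.

    actions: list of ('send', p, q, v) / ('rec', p, q, v) tuples.
    Returns (vertices, edges): vertices maps a vertex id (the index of the
    send action) to its exchange data; edges is the set of labelled
    dependency edges (v, 'XY', v') with X, Y in {'S', 'R'}."""
    sends, recs = defaultdict(list), defaultdict(list)
    for i, (kind, p, q, v) in enumerate(actions):
        (sends if kind == 'send' else recs)[(p, q, v)].append(i)
    vertices = {}
    for key, ss in sends.items():
        for l, si in enumerate(ss):
            ri = recs[key][l] if l < len(recs[key]) else None  # None = unmatched
            vertices[si] = {'S': si, 'R': ri, 'procS': key[0], 'procR': key[1]}
    edges = set()
    for v1, d1 in vertices.items():
        for v2, d2 in vertices.items():
            for X in 'SR':
                for Y in 'SR':
                    i, j = d1[X], d2[Y]
                    if i is None or j is None or i >= j:
                        continue
                    # X action of v1 precedes Y action of v2 on the same process
                    if d1['proc' + X] == d2['proc' + Y]:
                        edges.add((v1, X + Y, v2))
    return vertices, edges
```

On a sequence where p sends v1 then v2 to q and q receives both in order, the two exchanges are related by an SS edge (same sender) and an RR edge (same receiver), as expected from the definition.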
Notice that each linearisation e of an MSC has the same conflict graph. We can thus talk about an MSC and the associated conflict graph (as an example, see Figs. 1c and 1d). We write v → v′ if v −XY→ v′ for some X, Y ∈ {S, R}, and v →* v′ if there is a (possibly empty) path from v to v′.

3 k-synchronizable Systems

In this section, we define k-synchronizable systems. The main contribution of this part is a new characterisation of k-synchronizable executions that corrects the one given in [4]. In the rest of the paper, k denotes a given integer k ≥ 1. A k-exchange denotes a sequence of actions starting with at most k sends, followed by at most k receives matching some of the sends. An MSC is k-synchronous if there exists a linearisation that is breakable into a sequence of k-exchanges, such that a message sent during a k-exchange cannot be received during a subsequent one: either it is received during the same k-exchange, or it remains orphan forever.

Definition 6 (k-synchronous). An MSC msc is k-synchronous if:
1. there exists a linearisation e = e_1 · e_2 ··· e_n of msc where, for all i ∈ [1..n], e_i ∈ S^{≤k} · R^{≤k},
2. msc satisfies causal delivery,
3. for all j, j′ such that a_j ⊢ a_{j′} holds in e, a_j ⊢ a_{j′} holds in some e_i.

An execution e is k-synchronizable if msc(e) is k-synchronous. We write sTr_k(S) to denote the set {msc(e) | e ∈ asEx(S) and msc(e) is k-synchronous}.

Example 1 (k-synchronous MSCs and k-synchronizable Executions).

Fig. 2: (a) the MSC of Example 1.1. (b) the MSC of Example 1.2. (c) the MSC of Example 2 and (d) its conflict graph.

1. There is no k such that the MSC in Fig. 2a is k-synchronous. All messages must be grouped in the same k-exchange, but it is not possible to schedule all the sends first, because one of the receptions necessarily happens before one of the sends. Still, this MSC satisfies causal delivery.
2.
Let e_1 = send(r, q, v_3) · send(q, p, v_2) · send(p, q, v_1) · rec(q, p, v_2) · rec(r, q, v_3) be an execution. Its MSC, msc(e_1), depicted in Fig. 2b, satisfies causal delivery. Notice that e_1 cannot be divided into 1-exchanges. However, if we consider the alternative linearisation of msc(e_1), e_2 = send(p, q, v_1) · send(q, p, v_2) · rec(q, p, v_2) · send(r, q, v_3) · rec(r, q, v_3), we have that e_2 is breakable into 1-exchanges in which each matched send is in a 1-exchange with its reception. Therefore, msc(e_1) is 1-synchronous and e_1 is 1-synchronizable. Remark that e_2 is not an execution, and there exists no execution that can be divided into 1-exchanges. A k-synchronous MSC highlights dependencies between messages but does not impose an order for the execution.

Comparison with [4]. In [4], the authors define the set sEx_k(S) as the set of k-synchronous executions of system S in the k-synchronous semantics. Nonetheless, as remarked in Example 1.2, not all executions of a system can be divided into k-exchanges, even if they are k-synchronizable. Thus, in order not to lose any executions, we have decided to reason only on MSCs (called traces in [4]).

Following standard terminology, we say that a set U ⊆ V of vertices is a strongly connected component (SCC) of a given graph (V, →) if between any two vertices v, v′ ∈ U there exist two oriented paths v →* v′ and v′ →* v. The statement below fixes some issues with Theorem 1 in [4].

Theorem 1 (Graph Characterisation of k-synchronous MSCs). Let msc be a causal delivery MSC. msc is k-synchronous iff every SCC in its conflict graph is of size at most k and no RS edge occurs on any cyclic path.

Example 2 (A 5-synchronous MSC). Fig. 2c depicts a 5-synchronous MSC that is not 4-synchronous. Indeed, its conflict graph (Fig. 2d) contains an SCC of size 5 (all vertices are on the same SCC).

Comparison with [4]. Bouajjani et al.
give a characterisation of k-synchronous executions similar to ours, but they use the word cycle instead of SCC, and the subsequent developments of the paper suggest that they intended to say Hamiltonian cycle (i.e., a cyclic path that does not go twice through the same vertex). It is not the case that an MSC is k-synchronous if and only if every Hamiltonian cycle in its conflict graph is of size at most k and no RS edge occurs on any cyclic path. Indeed, consider again Example 2. This graph is not Hamiltonian, and its largest Hamiltonian cycle is of size 4 only. But, as we already discussed in Example 2, the corresponding MSC is not 4-synchronous. As a consequence, the algorithm presented in [4] for deciding whether a system is k-synchronizable is not correct either: the MSC of Fig. 2c would be considered 4-synchronous according to this algorithm, but it is not.

4 Decidability of Reachability for k-synchronizable Systems

We show that the reachability problem is decidable for k-synchronizable systems. While proving this result, we have to face several non-trivial aspects of causal delivery that were missed in [4] and that require a completely new approach.

Definition 7 (k-synchronizable System). A system S is k-synchronizable if all its executions are k-synchronizable, i.e., sTr_k(S) = asTr(S).

In other words, a system S is k-synchronizable if for every execution e of S, msc(e) may be divided into k-exchanges.

Remark 1. In particular, a system may be k-synchronizable even if some of its executions fill the buffers with more than k messages. For instance, the only linearisation of the 1-synchronous MSC of Fig. 2b that is an execution of the system needs buffers of size 2.

For a k-synchronizable system, the reachability problem reduces to reachability through a k-synchronizable execution. To show that k-synchronous reachability is decidable, we establish that the set of k-synchronous MSCs is regular.
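Algorithmically, Theorem 1 is easy to exploit once a single conflict graph is in hand: compute the strongly connected components (here with Kosaraju's algorithm) and check both conditions, noting that an edge lies on a cyclic path exactly when its two endpoints belong to the same SCC. The sketch below is our own code, assuming the graph is given as a set of labelled edges.

```python
def is_k_synchronous(n_vertices, edges, k):
    """Check the conflict-graph conditions of Theorem 1: every SCC has size
    at most k, and no RS edge lies on a cyclic path (i.e., inside an SCC).

    edges: set of (u, 'XY', v) labelled edges over vertices 0..n_vertices-1."""
    succ = [set() for _ in range(n_vertices)]
    pred = [set() for _ in range(n_vertices)]
    for u, _, v in edges:
        succ[u].add(v)
        pred[v].add(u)
    # Kosaraju: order vertices by finish time, then collect SCCs on the transpose
    order, seen = [], [False] * n_vertices
    def dfs(u):
        seen[u] = True
        for w in succ[u]:
            if not seen[w]:
                dfs(w)
        order.append(u)
    for u in range(n_vertices):
        if not seen[u]:
            dfs(u)
    comp = [None] * n_vertices
    for root in reversed(order):
        if comp[root] is None:
            stack, comp[root] = [root], root
            while stack:
                u = stack.pop()
                for w in pred[u]:
                    if comp[w] is None:
                        comp[w] = root
                        stack.append(w)
    sizes = {}
    for c in comp:
        sizes[c] = sizes.get(c, 0) + 1
    if any(s > k for s in sizes.values()):
        return False
    # an RS edge between two vertices of a same SCC is on a cyclic path
    return all(not (lab == 'RS' and comp[u] == comp[v]) for u, lab, v in edges)
```

The difficult part, addressed by the constructions that follow, is deciding these conditions uniformly over the infinitely many executions of a system.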
More precisely, we want to define a finite state automaton that accepts a sequence e_1 · e_2 ··· e_n of k-exchanges if and only if they satisfy causal delivery. We start by giving a graph-theoretic characterisation of causal delivery. For this, we define the extended edges v ⇝XY v′ of a given conflict graph. The relation ⇝XY is defined in Fig. 3, with X, Y ∈ {S, R}. Intuitively, v ⇝XY v′ expresses that event X of v must happen before event Y of v′, due either to their order on the same machine (Rule 1), to the fact that a send happens before its matching receive (Rule 2), to the mailbox semantics (Rules 3 and 4), or to a chain of such dependencies (Rule 5). We observe that in the extended conflict graph, obtained by applying such rules, a cyclic dependency appears whenever causal delivery is not satisfied.

Fig. 3: Deduction rules for extended dependency edges of the conflict graph:
(Rule 1) if v_1 −XY→ v_2 then v_1 ⇝XY v_2;
(Rule 2) if v ∩ R ≠ ∅ then v ⇝SR v;
(Rule 3) if v_1 −RR→ v_2 then v_1 ⇝SS v_2;
(Rule 4) if v_1 ∩ R ≠ ∅, v_2 ∩ R = ∅, and proc_R(v_1) = proc_R(v_2), then v_1 ⇝SS v_2;
(Rule 5) if v_1 ⇝XY v and v ⇝YZ v_2 then v_1 ⇝XZ v_2.

Example 3. Figs. 5a and 5b depict an MSC and its associated conflict graph with some extended edges. This MSC violates causal delivery, and there is a cyclic dependency v_1 ⇝SS v_1.

Theorem 2 (Graph-theoretic Characterisation of Causal Delivery). An MSC satisfies causal delivery iff there is no cyclic causal dependency of the form v ⇝SS v for some vertex v of its extended conflict graph.

Let us now come back to our initial problem: we want to recognise with finite memory the sequences e_1, e_2, ..., e_n of k-exchanges that, composed, give an MSC that satisfies causal delivery. We proceed by reading each k-exchange one by one, in sequence. This entails that, at each step, we have only a partial view of the global conflict graph. Still, we want to determine whether the acyclicity condition of Theorem 2 is satisfied in the global conflict graph.
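The rules of Fig. 3 describe a saturation: starting from the conflict-graph edges, new extended edges are added until a fixed point is reached, and Theorem 2 then amounts to looking for a vertex with an SS self-loop. Below is a naive sketch of this saturation (our own code, not the paper's; each vertex carries its receiver process and whether it is matched):

```python
def extended_edges(edges, proc_r, matched):
    """Saturate conflict-graph edges into extended edges, following the five
    deduction rules of Fig. 3.

    edges: set of (u, 'XY', v) conflict-graph edges; proc_r[v]: receiver
    process of exchange v; matched[v]: True iff v is matched."""
    ext = set(edges)                                    # Rule 1
    verts = set(proc_r)
    for v in verts:
        if matched[v]:
            ext.add((v, 'SR', v))                       # Rule 2: send before receive
    for u, lab, v in list(ext):
        if lab == 'RR':
            ext.add((u, 'SS', v))                       # Rule 3: mailbox FIFO order
    for u in verts:                                     # Rule 4: a matched send must
        for v in verts:                                 # precede an unmatched one
            if u != v and matched[u] and not matched[v] and proc_r[u] == proc_r[v]:
                ext.add((u, 'SS', v))                   # directed to the same mailbox
    changed = True
    while changed:                                      # Rule 5: compose chains
        changed = False
        for (u, xy, v) in list(ext):
            for (v2, yz, w) in list(ext):
                if v2 == v and xy[1] == yz[0]:
                    e = (u, xy[0] + yz[1], w)
                    if e not in ext:
                        ext.add(e)
                        changed = True
    return ext
```

On the scenario of Fig. 1a (an unmatched send to r followed by a matched send to r by the same process), the saturation derives a cyclic SS dependency, i.e., Theorem 2's witness of a causal delivery violation: `any((v, 'SS', v) in ext for v in verts)` holds.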
The crucial observation is that only the edges generated by Rule 4 may "go back in time". This means that we have to remember enough information from the previously examined k-exchanges to determine whether the current k-exchange contains a vertex v′ that shares an edge with some unmatched vertex v seen in a previous k-exchange, and whether this could participate in a cycle. This is achieved by computing two sets of processes C_{S,p} and C_{R,p} that collect the following information: a process q is in C_{S,p} if it performs a send action causally after an unmatched send to p, or if it is the sender of the unmatched send; a process q belongs to C_{R,p} if it receives a message that was sent after some unmatched message directed to p. More precisely, we have:

C_{S,p} = {proc_S(v′) | v ⇝SS v′ & v is unmatched & proc_R(v) = p}
C_{R,p} = {proc_R(v′) | v ⇝SS v′ & v is unmatched & proc_R(v) = p & v′ ∩ R ≠ ∅}

These sets abstract and carry from one k-exchange to another the information necessary to detect violations of causal delivery. We compute them in the local conflict graph of each k-exchange incrementally, i.e., knowing what they were at the end of the previous k-exchange, we compute them at the end of the current one. More precisely, let e = s_1 ··· s_m · r_1 ··· r_{m′} be a k-exchange, CG(e) = (V, E) its conflict graph, and B : P → (2^P × 2^P) the function that associates to each p ∈ P the two sets B(p) = (C_{S,p}, C_{R,p}). Then the conflict graph CG(e, B) is the graph (V′, E′) with V′ = V ∪ {ψ_p | p ∈ P} and E′ ⊇ E as defined below.
For each process p ∈ P, the "summary node" ψ_p accounts for all past unmatched messages sent to p that occurred in some k-exchange before e.

Fig. 4: Definition of the relation =e,k⇒_cd:

e = s_1 ··· s_m · r_1 ··· r_{m′}    s_1 ··· s_m ∈ S*    r_1 ··· r_{m′} ∈ R*    0 ≤ m′ ≤ m ≤ k
(l, Buf_0) =e⇒ (l′, Buf) for some Buf
for all p ∈ P, B(p) = (C_{S,p}, C_{R,p}) and B′(p) = (C′_{S,p}, C′_{R,p}),
Unm_p = {ψ_p} ∪ {v | v is unmatched, proc_R(v) = p}
C′_{X,p} = C_{X,p} ∪ {p′ | p′ ∈ C_{X,q}, v ⇝SS ψ_q, (proc_R(v) = p or v = ψ_p)}
        ∪ {proc_S(v) | v ∈ Unm_p ∩ V, X = S} ∪ {proc_X(v′) | v ⇝SS v′, v ∈ Unm_p, v′ ∩ X ≠ ∅}
for all p ∈ P, p ∉ C′_{R,p}
────────────────────────────
(l, B) =e,k⇒_cd (l′, B′)

E′ is the set E of edges −XY→ among message exchanges of e, as in Definition 5, augmented with the following set of extra edges that takes summary nodes into account:

{ψ_p −SX→ v | proc_X(v) ∈ C_{S,p} & v ∩ X ≠ ∅, for some X ∈ {S, R}}   (1)
∪ {ψ_p −SS→ v | proc_X(v) ∈ C_{R,p} & v ∩ R ≠ ∅, for some X ∈ {S, R}}   (2)
∪ {ψ_p −SS→ v | proc_S(v) ∈ C_{R,p} & v is unmatched}   (3)
∪ {v −SS→ ψ_p | proc_R(v) = p & v ∩ R ≠ ∅} ∪ {ψ_q −SS→ ψ_p | p ∈ C_{R,q}}   (4)

These extra edges summarise the connections to and from previous k-exchanges. Equation (1) considers connections −SS→ and −SR→ that are due to two sent messages or, respectively, a send and a receive on the same process. Equations (2) and (3) consider connections −RR→ and −RS→ that are due to two received messages or, respectively, a receive and a subsequent send on the same process. Notice how the rules in Fig. 3 then imply the existence of a ⇝SS connection; in particular, Equation (3) abstracts the existence of an edge built because of Rule 4. The equations in (4) abstract edges that would connect the current k-exchange to previous ones. As before, those edges in the global conflict graph would correspond to extended edges added because of Rule 4 in Fig. 3. Once we have this enriched local view of the conflict graph, we take its extended version.
Let ⇝XY denote the edges of the extended conflict graph, as defined by the rules in Fig. 3, taking into account the new vertices ψ_p and their edges.

Finally, let S be a system and =e,k⇒_cd be the transition relation given in Fig. 4 among abstract configurations of the form (l, B), where l is a global control state of S and B : P → 2^P × 2^P is the function defined above that associates to each process p a pair of sets of processes B(p) = (C_{S,p}, C_{R,p}). The transition =e,k⇒_cd updates these sets with respect to the current k-exchange e. Causal delivery is verified by checking that, for all p ∈ P, p ∉ C_{R,p}, meaning that there is no cyclic dependency as stated in Theorem 2. The initial state is (l_0, B_0), where B_0 : P → (2^P × 2^P) denotes the function such that B_0(p) = (∅, ∅) for all p ∈ P.

Fig. 5: (a) an MSC, (b) its associated global conflict graph, (c) the conflict graphs of its k-exchanges.

Example 4 (An Invalid Execution). Let e = e_1 · e_2, with e_1 and e_2 the two 2-exchanges of this execution, such that e_1 = send(q, r, v_1) · send(q, s, v_2) · rec(q, s, v_2) and e_2 = send(p, s, v_3) · rec(p, s, v_3) · send(p, r, v_4) · rec(p, r, v_4). Figs. 5a and 5c show the MSC and the corresponding conflict graph of each of the 2-exchanges. Note that two edges of the global graph (in blue) "go across" k-exchanges. These edges do not belong to the local conflict graphs and are mimicked by the incoming and outgoing edges of summary nodes. The values of the sets C_{S,r} and C_{R,r} at the beginning and at the end of each k-exchange are given in Fig. 5c: initially C_{S,r} = ∅ and C_{R,r} = ∅; after e_1, C_{S,r} = {q} and C_{R,r} = {s}; after e_2, C_{S,r} = {p, q} and C_{R,r} = {s, r}. All other sets C_{S,p} and C_{R,p}, for p ≠ r, are empty, since there is only one unmatched message, directed to process r. Notice how, at the end of the second k-exchange, r ∈ C_{R,r}, signalling that message v_4 violates causal delivery.
Comparison with [4]. In [4], the authors define =e,k⇒_cd in a rather different way: they do not explicitly give a graph-theoretic characterisation of causal delivery; instead they compute, for every process p, the set B(p) of processes that either sent an unmatched message to p or received a message from a process in B(p). They then make sure that any message sent to p by a process q ∈ B(p) is unmatched. According to that definition, the MSC of Fig. 5b would satisfy causal delivery and would be 1-synchronous. However, this is not the case (this MSC does not satisfy causal delivery), as we have shown in Example 3. Due to the above errors, we had to propose a considerably different approach. The extended edges of the conflict graph, the graph-theoretic characterisation of causal delivery, as well as the summary nodes, have no equivalent in [4]. The next lemma proves that Fig. 4 properly characterises causal delivery.

Lemma 1. An MSC msc is k-synchronous iff there is a linearisation e = e_1 ··· e_n such that (l_0, B_0) =e_1,k⇒_cd ··· =e_n,k⇒_cd (l, B) for some global state l and some B : P → (2^P × 2^P).

Note that there are only finitely many abstract configurations of the form (l, B), with l a tuple of control states and B : P → (2^P × 2^P). Moreover, since V is finite, the alphabet of possible k-exchanges for a given k is also finite. Therefore, =e,k⇒_cd is a relation on a finite set, and the set sTr_k(S) of k-synchronous MSCs of a system S forms a regular language. It follows that it is decidable whether a given abstract configuration of the form (l, B) is reachable from the initial configuration following a k-synchronizable execution.

Theorem 3. Let S be a k-synchronizable system and l a global control state of S. The problem whether there exist e ∈ asEx(S) and Buf such that (l_0, Buf_0) =e⇒ (l, Buf) is decidable.

Remark 2.
Deadlock-freedom, unspecified receptions, and absence of orphan messages are other properties that become decidable for a k-synchronizable system, thanks to the regularity of the set of k-synchronous MSCs.

5 Decidability of k-synchronizability for Mailbox Systems

We establish the decidability of k-synchronizability; our approach is similar to the one of [4], based on the notion of borderline violation, but we adjust it to the new characterisation of k-synchronizable executions (Theorem 1).

Definition 8 (Borderline Violation). A non-k-synchronizable execution e is a borderline violation if e = e′ · r, r is a reception, and e′ is k-synchronizable.

Note that a system S that is not k-synchronizable always admits at least one borderline violation e′ · r ∈ asEx(S) with r ∈ R: indeed, there is at least one execution e ∈ asEx(S) which contains a unique minimal prefix of the form e′ · r that is not k-synchronizable; moreover, since e′ is k-synchronizable, r cannot be a k-exchange of just one send action, therefore it must be a receive action.

In order to find such a borderline violation, Bouajjani et al. introduced an instrumented system S′ that behaves like S, except that it contains an extra process π, and a non-deterministically chosen message that should have been sent from a process p to a process q may now be sent from p to π, and later forwarded by π to q. In S′, each process p has the possibility, instead of sending a message v to q, to deviate this message to π; if it does so, p continues its execution as if it really had sent it to q. Note also that the message sent to π gets tagged with the original destination process q. Similarly, for each possible reception, a process has the possibility to receive a given message not from the initial sender but from π. The process π has an initial state from which it can receive any message of the system. Each reception makes it go into a different state. From this state,
it is able to send the message back to the original recipient. Once a message is forwarded, π reaches its final state and remains idle. The following example illustrates how the instrumented system works.

Example 5 (A Deviated Message). Let e_1, e_2 be two executions of a system S, with MSCs msc(e_1) and msc(e_2) respectively [figure omitted]. e_1 is not 1-synchronizable; it is borderline in S: if we delete the last reception, it becomes 1-synchronizable. msc(e_2) is the MSC obtained from the instrumented system S′, where the message is first deviated to π and then sent back to q by π. Note that msc(e_2) is 1-synchronous. In this case, the instrumented system S′ in the 1-synchronous semantics "reveals" the existence of a borderline violation of S.

For each execution e · r ∈ asEx(S) that ends with a reception, there exists an execution deviate(e · r) ∈ asEx(S′) where the message exchange associated with the reception r has been deviated to π; formally, if e · r = e_1 · s · e_2 · r with r = rec(p, q, v) and s ⊢ r, then

deviate(e · r) = e_1 · send(p, π, (q, v)) · rec(p, π, (q, v)) · e_2 · send(π, q, v) · rec(π, q, v).

Definition 9 (Feasible Execution, Bad Execution). A k-synchronizable execution e′ of S′ is feasible if there is an execution e · r ∈ asEx(S) such that deviate(e · r) = e′. A feasible execution e′ = deviate(e · r) of S′ is bad if the execution e · r is not k-synchronizable in S.

Example 6 (A Non-feasible Execution). Let e′ be an execution such that msc(e′) is as depicted in the margin [figure omitted]. Clearly, this MSC satisfies causal delivery and could be the execution of some instrumented system S′. However, the sequence e · r such that deviate(e · r) = e′ does not satisfy causal delivery, therefore it cannot be an execution of the original system S. In other words, the execution e′ is not feasible.

Lemma 2. A system S is not k-synchronizable iff there is a k-synchronizable execution e′ of S′ that is feasible and bad.
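The deviate transformation is purely syntactic and can be sketched directly from the formula above (our own encoding, not the paper's; the fresh process π is written 'pi' and the tag (q, v) is a pair):

```python
def deviate(e):
    """Rewrite an execution e·r ending in a reception r = rec(p, q, v): the
    matching send is redirected to the fresh process 'pi', and the message is
    forwarded from 'pi' to q at the very end.

    e: list of ('send'|'rec', sender, receiver, msg) tuples, ending with a 'rec'."""
    kind, p, q, v = e[-1]
    assert kind == 'rec'
    # the matching send: the l-th send(p, q, v) where r is the l-th rec(p, q, v)
    rank = sum(1 for a in e[:-1] if a == ('rec', p, q, v))
    idx = [i for i, a in enumerate(e) if a == ('send', p, q, v)][rank]
    return (e[:idx]
            + [('send', p, 'pi', (q, v)), ('rec', p, 'pi', (q, v))]
            + e[idx + 1:-1]
            + [('send', 'pi', q, v), ('rec', 'pi', q, v)])
```

For a one-message execution send(p, q, v) · rec(p, q, v), the result is exactly the four-action sequence in which the message first visits 'pi' and is then forwarded to q.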
As we have already noted, the set of k-synchronous MSCs of S′ is regular. The decision procedure for k-synchronizability follows from the fact that the set of MSCs that have a feasible bad execution as a linearisation is, as we will see, regular as well, and that it can be recognised by an (effectively computable) non-deterministic finite state automaton. The decidability of k-synchronizability then follows from Lemma 2 and the decidability of the emptiness problem for non-deterministic finite state automata.

Recognition of Feasible Executions. We start with the automaton that recognises feasible executions; for this, we revisit the construction we just used for recognising sequences of k-exchanges that satisfy causal delivery. In the remainder, we assume an execution e′ ∈ asEx(S′) that contains exactly one send of the form send(p, π, (q, v)) and one reception of the form rec(π, q, v), this reception being the last action of e′. Let (V, {−XY→}_{X,Y ∈ {S,R}}) be the conflict graph of e′. There are two uniquely determined vertices υ_start, υ_stop ∈ V, with proc_R(υ_start) = π and proc_S(υ_stop) = π, that correspond, respectively, to the first and last message exchanges of the deviation. The conflict graph of e · r is then obtained by merging these two nodes.

Lemma 3. The execution e′ is not feasible iff there is a vertex v in the conflict graph of e′ such that υ_start ⇝SS v −RR→ υ_stop.

In order to decide whether an execution e′ is feasible, we want to forbid that a send action send(p′, q, v′) that happens causally after υ_start is matched by a receive rec(p′, q, v′) that happens causally before the reception of υ_stop. As a matter of fact, this boils down to dealing with the deviated send action as an unmatched send. So we will consider sets of processes C^π_S and C^π_R, similar to the ones used for =e,k⇒_cd, but with the goal of computing which actions happen causally after the send to π.
We also introduce a summary node ψ_start and extra edges, following the same principles as in the previous section. Formally, let B : P → (2^P × 2^P), C^π_S, C^π_R ⊆ P, and e ∈ S^{≤k} R^{≤k} be fixed, and let CG(e, B) = (V′, E′) be the conflict graph with summary nodes for unmatched sent messages as defined in the previous section. The local conflict graph CG(e, B, C^π_S, C^π_R) is defined as the graph (V″, E″), where V″ = V′ ∪ {ψ_start} and E″ is E′ augmented with

{ψ_start −SX→ v | proc_X(v) ∈ C^π_S & v ∩ X ≠ ∅, for some X ∈ {S, R}}
∪ {ψ_start −SS→ v | proc_X(v) ∈ C^π_R & v ∩ R ≠ ∅, for some X ∈ {S, R}}
∪ {ψ_start −SS→ v | proc_S(v) ∈ C^π_R & v is unmatched} ∪ {ψ_start −SS→ ψ_p | p ∈ C^π_R}

As before, we consider the "closure" ⇝XY of these edges by the rules of Fig. 3. The transition relation =e,k⇒_feas is defined in Fig. 6. It relates abstract configurations of the form (l, B, C, dest_π), with C = (C^π_S, C^π_R) and dest_π ∈ P ∪ {⊥} storing to whom the message deviated to π was supposed to be delivered. Thus, the initial abstract configuration is (l_0, B_0, (∅, ∅), ⊥), where ⊥ means that the process dest_π has not been determined yet. It will be set as soon as the send to process π is encountered.

Lemma 4. Let e′ be an execution of S′. Then e′ is a k-synchronizable feasible execution iff there are e″ = e_1 ··· e_n · send(π, q, v) · rec(π, q, v) with e_1, ..., e_n ∈ S^{≤k} R^{≤k}, B′ : P → (2^P)², C′ ∈ (2^P)², and a tuple of control states l′ such that msc(e″) = msc(e′), π ∈ C′_{R,q} (with B′(q) = (C′_{S,q}, C′_{R,q})), and (l_0, B_0, (∅, ∅), ⊥) =e_1,k⇒_feas ··· =e_n,k⇒_feas (l′, B′, C′, q).
Fig. 6: Definition of the relation =e,k⇒_feas:

(l, B) =e,k⇒_cd (l′, B′)    e = a_1 ··· a_n    (∀v) proc_S(v) ≠ π
(∀v, v′) proc_R(v) = proc_R(v′) = π ⟹ v = v′ ∧ dest_π = ⊥
(∀v) send(p, π, (q, v)) ∈ v ⟹ dest′_π = q    dest_π ≠ ⊥ ⟹ dest′_π = dest_π
C′^π_X = C^π_X ∪ {proc_X(v′) | v ⇝SS v′ & v′ ∩ X ≠ ∅ & (proc_R(v) = π or v = ψ_start)}
       ∪ {proc_S(v) | proc_R(v) = π & X = S}
       ∪ {p | p ∈ C_{X,q} & v ⇝SS ψ_q & (proc_R(v) = π or v = ψ_start)}
dest′_π ∉ C′^π_R
────────────────────────────
(l, B, C^π_S, C^π_R, dest_π) =e,k⇒_feas (l′, B′, C′^π_S, C′^π_R, dest′_π)

Comparison with [4]. In [4], the authors verify that an execution is feasible with a monitor that reviews the actions of the execution and adds processes that are no longer allowed to send a message to the receiver of the deviated message. Unfortunately, we have here a problem similar to the one mentioned in the previous comparison paragraph. According to their monitor, the following execution e′ = deviate(e · r) is feasible, i.e., e′ is runnable in S′ and e · r is runnable in S:

e′ = send(q, π, (r, v_1)) · rec(q, π, (r, v_1)) · send(q, s, v_2) · rec(q, s, v_2) · send(p, s, v_3) · rec(p, s, v_3) · send(p, r, v_4) · rec(p, r, v_4) · send(π, r, v_1) · rec(π, r, v_1)

However, this execution is not feasible, because there is a causal dependency between v_1 and v_3. In [4], this execution would be considered feasible and would therefore belong to the set sTr_k(S′). Yet there is no corresponding execution in asTr(S); the comparison, and therefore the k-synchronizability check, could be distorted and yield a false negative.

Recognition of Bad Executions. Finally, we define a non-deterministic finite state automaton that recognises MSCs of bad executions, i.e., feasible executions e′ = deviate(e · r) such that e · r is not k-synchronizable. We come back to the "non-extended" conflict graph, without edges of the form ⇝XY. Let Post*(v) = {v′ ∈ V | v →* v′} be the set of vertices reachable from v, and let Pre*(v) = {v′ ∈ V | v′ →* v} be the set of vertices co-reachable from v.
For a set of vertices U ⊆ V, let Post*(U) = ∪{Post*(v) | v ∈ U} and Pre*(U) = ∪{Pre*(v) | v ∈ U}.

Lemma 5. The feasible execution e′ is bad iff one of the two holds:
1. υ_start →* −RS→ →* υ_stop (i.e., some path from υ_start to υ_stop contains an RS edge), or
2. the size of the set Post*(υ_start) ∩ Pre*(υ_stop) is greater than or equal to k + 2.

In order to determine whether a given message exchange v of CG(e′) should be counted as reachable (resp. co-reachable), we compute, at the entry and exit of every k-exchange of e′, which processes are "reachable" or "co-reachable".

Example 7 (Reachable and Co-reachable Processes). Consider the MSC on the right, made of five 1-exchanges [figure omitted]. While sending message (s, v_0), which corresponds to υ_start, process r becomes "reachable": any subsequent message exchange that involves r corresponds to a vertex of the conflict graph that is reachable from υ_start. While sending v_2, process s becomes "reachable", because process r will be reachable when it receives message v_2. Similarly, q becomes reachable after receiving v_3, because r was reachable when it sent v_3, and p becomes reachable after receiving v_4, because q was reachable when it sent v_4. Co-reachability works similarly, but reasoning backwards on the timelines. For instance, process s stops being "co-reachable" when it receives v_0, process r stops being co-reachable after it receives v_2, and process p stops being co-reachable by sending v_1. The only message that is sent by a process that is both reachable and co-reachable at the instant of the sending is v_2, therefore it is the only message that will be counted as contributing to the SCC.

More formally, let e be a sequence of actions, CG(e) its conflict graph, and P, Q two sets of processes. Post_e(P) = Post*({v | procs(v) ∩ P ≠ ∅}) and Pre_e(Q) = Pre*({v | procs(v) ∩ Q ≠ ∅}) are introduced to represent the local view, through k-exchanges, of Post*(υ_start) and Pre*(υ_stop).
For instance, for e as in Example 7, we get Post_e({π}) = {(s, v_0), v_2, v_3, v_4, v_0} and Pre_e({π}) = {v_0, v_2, v_1, (s, v_0)}. In each k-exchange e_i, the size of the intersection between Post_{e_i}(P) and Pre_{e_i}(Q) gives the local contribution of the current k-exchange e_i to the calculation of the size of the global SCC. In the transition relation =e,k⇒_bad, this value is stored in the variable cnt. The last ingredient is to recognise whether an RS edge belongs to the SCC. To this aim, we use a function lastisRec : P → {True, False} that stores, for each process, whether its last action in the previous k-exchange was a reception or not. Then, depending on the value of this variable and on whether a node is in the current SCC or not, the value of sawRS is set accordingly.

The transition relation =e,k⇒_bad defined in Fig. 7 deals with abstract configurations of the form (P, Q, cnt, sawRS, lastisRec), where P, Q ⊆ P, sawRS is a boolean value, and cnt is a counter bounded by k + 2. We denote by lastisRec_0 the function such that lastisRec_0(p) = False for all p ∈ P.

Lemma 6. Let e′ be a feasible k-synchronizable execution of S′. Then e′ is a bad execution iff there are e″ = e_1 ··· e_n · send(π, q, v) · rec(π, q, v) with e_1, ..., e_n ∈ S^{≤k} R^{≤k} and msc(e″) = msc(e′), P′, Q′ ⊆ P, sawRS ∈ {True, False}, and cnt ∈ {0, ..., k + 2}, such that

({π}, Q′, 0, False, lastisRec_0) =e_1,k⇒_bad ··· =e_n,k⇒_bad (P′, {π}, cnt, sawRS, lastisRec)

and at least one of the two holds: either sawRS = True, or cnt = k + 2.

172 C. Di Giusto et al.

Fig. 7: Definition of the relation =e,k⇒_bad:

P′ = procs(Post_e(P))    Q = procs(Pre_e(Q′))    SCC_e = Post_e(P) ∩ Pre_e(Q′)
cnt′ = min(k + 2, cnt + n) where n = |SCC_e|
lastisRec′(q) ⇔ (∃v ∈ SCC_e . proc_R(v) = q ∧ v ∩ R ≠ ∅) ∨ (lastisRec(q) ∧ ∄v ∈ V . q ∈ procs(v))
sawRS′ = sawRS ∨ (∃v ∈ SCC_e)(∃p ∈ P \ {π}) proc_S(v) = p ∧ lastisRec(p) ∧ p ∈ P ∩ Q
────────────────────────────
(P, Q, cnt, sawRS, lastisRec) =e,k⇒_bad (P′, Q′, cnt′, sawRS′, lastisRec′)
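On a complete conflict graph, the two conditions of Lemma 5 can be checked with a forward closure from υ_start and a backward closure from υ_stop; the k-exchange-by-k-exchange automaton of Fig. 7 computes this same information with finite memory. The sketch below is our own global (non-incremental) version of the check:

```python
def is_bad(edges, v_start, v_stop, k):
    """Check the two badness conditions of Lemma 5 on a conflict graph.

    edges: set of (u, 'XY', v) labelled edges. Condition 1: some path from
    v_start to v_stop goes through an RS edge. Condition 2: the number of
    vertices both reachable from v_start and co-reachable from v_stop is
    at least k + 2."""
    succ, pred = {}, {}
    for u, _, v in edges:
        succ.setdefault(u, set()).add(v)
        pred.setdefault(v, set()).add(u)

    def closure(start, nxt):
        seen, todo = {start}, [start]
        while todo:
            for w in nxt.get(todo.pop(), ()):
                if w not in seen:
                    seen.add(w)
                    todo.append(w)
        return seen

    post, pre = closure(v_start, succ), closure(v_stop, pred)
    # Condition 1: an RS edge u -> v with v_start ->* u and v ->* v_stop
    cond1 = any(lab == 'RS' and u in post and v in pre for u, lab, v in edges)
    # Condition 2: enough vertices both reachable and co-reachable
    return cond1 or len(post & pre) >= k + 2
```

For instance, on a simple SS chain of four vertices, the execution is bad for k = 1 (the intersection has size 4 ≥ 3) but not for k = 3; an RS edge anywhere on a start-to-stop path makes it bad for every k.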
Comparison with [4]. As for the notion of feasibility, to determine whether an execution is bad, the authors of [4] use a monitor that builds a path between the send to process π and the send from π. In addition to the problems related to the wrong characterisation of k-synchronizability, this monitor can not only detect an RS edge where there should be none, but can also miss such edges when they exist. In general, the problem arises because the path is constructed by considering only one endpoint at a time. We can finally conclude that:

Theorem 4. The k-synchronizability of a system S is decidable for k ≥ 1.

6 k-synchronizability for Peer-to-Peer Systems

In this section, we apply k-synchronizability to peer-to-peer systems. A peer-to-peer system is a composition of communicating automata where each pair of machines exchanges messages via two private FIFO buffers, one per direction of communication. Here we only give an insight into what changes with respect to the mailbox setting.

Causal delivery reflects the order imposed by FIFO buffers. Definition 4 must then be adapted to account for peer-to-peer communication. For instance, two messages that are sent to the same process p by two different processes can be received by p in any order, regardless of any causal dependency between the two sends. Thus, checking causal delivery in peer-to-peer systems is easier than in the mailbox setting, as we do not have to carry information on causal dependencies.

Within a peer-to-peer architecture, MSCs and conflict graphs are defined as in the mailbox setting. Indeed, they represent dependencies over machines, i.e., the order in which the actions can be performed on a given machine, and over the send and the reception of a same message, and they do not depend on the type of communication. The notion of k-exchange also remains unchanged.

Decidability of Reachability for k-synchronizable Peer-to-Peer Systems.
To establish the decidability of reachability for k-synchronizable peer-to-peer systems, we define a transition relation ⇒_cd^{p2p,e,k} for a sequence of actions e describing a k-exchange. As for mailbox systems, if a send action is unmatched in the current k-exchange, it stays orphan forever. Moreover, after a process p has sent an orphan message to a process q, p is forbidden to send any matched message to q. Nonetheless, as a consequence of the simpler definition of causal delivery, we no longer need to work on the conflict graph. Summary nodes and extended edges are not needed, and all the necessary information is in a function B that simply records, for each process p, the set of forbidden senders.

The characterisation of a k-synchronizable execution is the same as for mailbox systems, as the type of communication is not relevant. We can thus conclude, as in the mailbox setting, that reachability is decidable.

Theorem 5. Let S be a k-synchronizable system and l a global control state of S. The problem whether there exist e ∈ asEx(S) and Buf such that (l₀, Buf₀) ⇒ (l, Buf) is decidable.

Decidability of k-synchronizability for Peer-to-Peer Systems. As in mailbox systems, the detection of a borderline execution determines whether a system is k-synchronizable. The transition relation ⇒_feas^{p2p,e,k} allows us to obtain feasible executions. Differently from the mailbox setting, we need to save not only the recipient dest but also the sender of the delayed message (information stored in the variable exp). The transition rule then checks that no message violates causal delivery, i.e., that no message is sent by exp to dest after the deviation.

Finally, the recognition of bad executions works in the same way as for mailbox systems. The characterisation of a bad execution and the definition of ⇒_bad^{p2p,e,k} are, therefore, the same. As for mailbox systems, we can thus conclude that, for a given k, k-synchronizability is decidable.

Theorem 6.
The k-synchronizability of a system S is decidable for k ≥ 1.

7 Concluding Remarks and Related Works

In this paper we have studied k-synchronizability for mailbox and peer-to-peer systems. We have corrected the reachability and decidability proofs given in [4]. The flaws in [4] concern fundamental points, and we had to propose a considerably different approach. The extended edges of the conflict graph, the graph-theoretic characterisation of causal delivery, and the summary nodes have no equivalent in [4]. The transition relations ⇒_feas^{e,k} and ⇒_bad^{e,k}, building on the graph-theoretic characterisations of causal delivery and k-synchronizability, depart considerably from the proposal in [4].

We conclude by commenting on some other related works. The idea of “communication layers” is present in the early works of Elrad and Francez [8] and of Chou and Gafni [7]. More recently, Chaouch-Saad et al. [6] verified some consensus algorithms using the Heard-Of Model, which proceeds by “communication-closed rounds”. The idea that an asynchronous system may have an “equivalent” synchronous counterpart has also been widely studied. Lipton’s reduction [14] reschedules an execution so as to move the receive actions as close as possible to their corresponding sends. Reduction has recently received increasing interest for verification purposes, e.g. by Kragl et al. [12] and Gleissenthall et al. [11].

Existentially bounded communication systems have been studied by Genest et al. [10,15]: a system is existentially k-bounded if any execution can be rescheduled in order to become k-bounded. This approach targets a broader class of systems than k-synchronizability, because it does not require that the execution can be chopped into communication-closed rounds. In the perspective of the current work, an interesting result is the decidability of existential k-boundedness for deadlock-free systems of communicating machines with peer-to-peer channels.
Despite the more general definition, these older results are incomparable with the present ones, which deal with systems communicating through mailboxes rather than peer-to-peer channels.

Basu and Bultan studied a notion they also called synchronizability, but it differs from the notion studied in the present work; synchronizability and k-synchronizability define incomparable classes of communicating systems. The proofs of the decidability of synchronizability [3,2] were shown to have flaws by Finkel and Lozes [9]. A question left open in their paper is whether synchronizability is decidable for mailbox communications, as originally claimed by Basu and Bultan. Akroun and Salaün also defined a property they called stability [1], which shares many similarities with the synchronizability notion of [2].

Context-bounded model checking is yet another approach to the automatic verification of concurrent systems. La Torre et al. studied systems of communicating machines extended with a calling stack, and showed that under some conditions on the interplay between stack actions and communications, context-bounded reachability is decidable [13]. A context switch occurs in an execution each time two consecutive actions are performed by different participants. Thus, while k-synchronizability limits the number of consecutive sendings, bounded context-switch analysis limits the number of times two consecutive actions are performed by two different processes.

As for future work, it would be interesting to explore how context-boundedness and communication-closed rounds could be combined. Moreover, refinements of the definition of k-synchronizability can also be considered. For instance, we conjecture that the current development can be greatly simplified if we forbid linearisations that do not correspond to actual executions.

References

1.
Akroun, L., Salaün, G.: Automated verification of automata communicating via FIFO and bag buffers. Formal Methods in System Design 52(3), 260–276 (2018)
2. Basu, S., Bultan, T.: On deciding synchronizability for asynchronously communicating systems. Theor. Comput. Sci. 656, 60–75 (2016)
3. Basu, S., Bultan, T., Ouederni, M.: Synchronizability for verification of asynchronously communicating systems. In: Kuncak, V., Rybalchenko, A. (eds.) Verification, Model Checking, and Abstract Interpretation - 13th International Conference, VMCAI 2012, Philadelphia, PA, USA, January 22-24, 2012, Proceedings. Lecture Notes in Computer Science, vol. 7148, pp. 56–71. Springer (2012)
4. Bouajjani, A., Enea, C., Ji, K., Qadeer, S.: On the completeness of verifying message passing programs under bounded asynchrony. In: Chockler, H., Weissenbacher, G. (eds.) Computer Aided Verification - 30th International Conference, CAV 2018, Held as Part of the Federated Logic Conference, FLoC 2018, Oxford, UK, July 14-17, 2018, Proceedings, Part II. Lecture Notes in Computer Science, vol. 10982, pp. 372–391. Springer (2018)
5. Bouajjani, A., Habermehl, P., Vojnar, T.: Abstract regular model checking. In: Alur, R., Peled, D.A. (eds.) Computer Aided Verification, 16th International Conference, CAV 2004, Boston, MA, USA, July 13-17, 2004, Proceedings. Lecture Notes in Computer Science, vol. 3114, pp. 372–386. Springer (2004)
6. Chaouch-Saad, M., Charron-Bost, B., Merz, S.: A reduction theorem for the verification of round-based distributed algorithms. In: Bournez, O., Potapov, I. (eds.) Reachability Problems, 3rd International Workshop, RP 2009, Palaiseau, France, September 23-25, 2009, Proceedings. Lecture Notes in Computer Science, vol. 5797, pp. 93–106. Springer (2009)
7. Chou, C., Gafni, E.: Understanding and verifying distributed algorithms using stratified decomposition. In: Dolev, D. (ed.)
Proceedings of the Seventh Annual ACM Symposium on Principles of Distributed Computing, Toronto, Ontario, Canada, August 15-17, 1988, pp. 44–65. ACM (1988)
8. Elrad, T., Francez, N.: Decomposition of distributed programs into communication-closed layers. Sci. Comput. Program. 2(3), 155–173 (1982)
9. Finkel, A., Lozes, E.: Synchronizability of communicating finite state machines is not decidable. In: Chatzigiannakis, I., Indyk, P., Kuhn, F., Muscholl, A. (eds.) 44th International Colloquium on Automata, Languages, and Programming, ICALP 2017, July 10-14, 2017, Warsaw, Poland. LIPIcs, vol. 80, pp. 122:1–122:14. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2017), http://www.dagstuhl.de/dagpub/978-3-95977-041-5
10. Genest, B., Kuske, D., Muscholl, A.: On communicating automata with bounded channels. Fundam. Inform. 80(1-3), 147–167 (2007), articles/fundamenta-informaticae/fi80-1-3-09
11. von Gleissenthall, K., Kici, R.G., Bakst, A., Stefan, D., Jhala, R.: Pretend synchrony: synchronous verification of asynchronous distributed programs. PACMPL 3(POPL), 59:1–59:30 (2019)
12. Kragl, B., Qadeer, S., Henzinger, T.A.: Synchronizing the asynchronous. In: Schewe, S., Zhang, L. (eds.) 29th International Conference on Concurrency Theory, CONCUR 2018, September 4-7, 2018, Beijing, China. LIPIcs, vol. 118, pp. 21:1–21:17. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2018)
13. La Torre, S., Madhusudan, P., Parlato, G.: Context-bounded analysis of concurrent queue systems. In: Ramakrishnan, C.R., Rehof, J. (eds.) Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, TACAS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008, Proceedings. Lecture Notes in Computer Science, vol. 4963, pp. 299–314. Springer (2008)
14. Lipton, R.J.: Reduction: A method of proving properties of parallel programs.
Commun. ACM 18(12), 717–721 (1975)
15. Muscholl, A.: Analysis of communicating automata. In: Dediu, A., Fernau, H., Martín-Vide, C. (eds.) Language and Automata Theory and Applications, 4th International Conference, LATA 2010, Trier, Germany, May 24-28, 2010, Proceedings. Lecture Notes in Computer Science, vol. 6031, pp. 50–57. Springer (2010)

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

General Supervised Learning as Change Propagation with Delta Lenses

Zinovy Diskin
McMaster University, Hamilton, Canada

Abstract. Delta lenses are an established mathematical framework for modelling and designing bidirectional model transformations (Bx). Following recent observations by Fong et al., the paper extends the delta lens framework with a new ingredient: learning over a parameterized space of model transformations seen as functors. We will define a notion of an asymmetric learning delta lens with amendment (ala-lens), and show how ala-lenses can be organized into a symmetric monoidal (sm) category. We also show that sequential and parallel compositions of well-behaved (wb) ala-lenses are again wb, so that wb ala-lenses constitute a full sm-subcategory of ala-lenses.
1 Introduction

The goal of the paper is to develop a formal model of supervised learning in the very general context of bidirectional model transformation, or Bx, i.e., the synchronization of two arbitrarily complex structures (called models) related by a transformation.¹ Rather than learning parameterized functions between Euclidean spaces as is typical for machine learning (ML), we will consider learning mappings between model spaces and formalize them as parameterized functors between categories, f: P × A → B, with P being a parameter space. The basic ML-notion of a training pair (A, B′) ∈ A₀ × B₀ will be considered as an inconsistency between models caused by a change (delta) v: B → B′ of the target model B = f(p, A), p ∈ P, that was first consistent with A w.r.t. the transformation (functor) f(p, _). An inconsistency is repaired by an appropriate change of the source structure, u: A → A′, changing the parameter p to p′, and an amendment of the target structure v@: B′ → B@ so that f(p′, A′) = B@ is a consistent state of the parameterized two-model system.

The setting above without parameterization and learning (i.e., p′ = p always holds), and without amendment (v@ = id_{B′} always holds), is well known in the Bx literature under the name of delta lenses: mathematical structures in which consistency restoration via change propagation is modelled by functorial-like algebraic operations over categories [12,6]. There are several types of delta lenses tailored for modelling different synchronization tasks and scenarios, particularly symmetric and asymmetric ones.

¹ The term Bx refers to a wide area including file synchronization, data exchange in databases, and model synchronization in Model-Driven software Engineering (MDE); see [7] for a survey. In the present paper, Bx will mainly refer to Bx in the MDE context.

© The Author(s) 2020
J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 177–197, 2020.

178 Z. Diskin
In this paper, we only consider asymmetric delta lenses and will often omit explicit mention of these attributes. Despite their extra generality, (delta) lenses have proved useful in the design and implementation of practical model synchronization systems with triple graph grammars (TGG) [5,2]; enriching lenses with amendment is a recent extension of the framework motivated and formalized in [11]. A major advantage of the lens framework for synchronization is its compositionality: a lens satisfying several equational laws specifying basic synchronization requirements is called well-behaved (wb), and basic lens theorems state that sequential and parallel compositions of wb lenses are again wb. In practical applications, this allows the designer of a complex synchronizer to avoid integration testing: if elementary synchronizers are tested and proved to be wb, their composition is automatically wb too.

The present paper makes the following contributions to the delta lens framework for Bx. a) We motivate model synchronization enriched with learning and, moreover, with categorical learning, in which the parameter space is a category, and introduce the notion of a wb asymmetric learning (delta) lens with amendment (a wb ala-lens) (this is the content of Sect. 3). b) We prove compositionality of wb ala-lenses and show how their universe can be organized into a symmetric monoidal (sm) category (Theorems 1-3 in Sect. 4). All proofs (rather straightforward but notationally laborious) can be found in the long version of the paper [9]. One more compositional result is c) a definition of a compositional bidirectional transformation language (Def. 6) that formalizes an important requirement for model synchronization tools, which (surprisingly) is missing from the Bx literature. Background Sect. 2 provides a simple example demonstrating the main concepts of Bx and delta lenses in the MDE context. Section 5 briefly surveys related work, and Sect. 6 concludes.

Notation.
Given a category A, its objects are denoted by capital letters A, A′, etc., to recall that in MDE applications objects are complex structures, which themselves have elements a, a′, …; the collection of all objects of category A is denoted by A₀. An arrow with domain A ∈ A₀ is written as u: A → _ or u ∈ A(A, _); we also write dom(u) = A (and sometimes u.dom = A to shorten formulas). Similarly, the formula u: _ → A′ denotes an arrow with codomain u.cod = A′. Given a functor f: A → B, its object function is denoted by f₀: A₀ → B₀. A subcategory B ⊂ A is called wide if it has the same objects. All categories we consider in the paper are small.

2 Background: Update propagation and delta lenses

Although Bx ideas work well only in domains conforming to the slogan any implementation satisfying the specification is good enough, such as code generation (see [10] for discussion), and have limited applications in databases (only so-called updatable views can be treated in the Bx-way), we will employ a simple database example: it allows us to demonstrate the core ideas without any special domain knowledge required by typical Bx-amenable areas. The presentation will be semi-formal, as our goal is to motivate the delta lens formalism that abstracts the details away rather than to formalize the example as such.

2.1 Why deltas

Bx-lenses first appeared in work on file synchronization, and if we have two sets of strings, say, B = {John, Mary} and B′ = {Jon, Mary}, we can readily see the difference: John ≠ Jon but Mary = Mary. We thus have a structure in-between B and B′ (which may be rather complex if B and B′ are big files), but this structure can be recovered by string matching, and thus updates can be identified with pairs.
The situation changes dramatically if B and B′ are object structures, e.g., B = {o₁, o₂} with Name(o₁) = John, Name(o₂) = Mary, and similarly B′ = {o′₁, o′₂} with Name(o′₁) = Jon, Name(o′₂) = Mary. Now string matching does not say too much: it may happen that o₁ and o′₁ are the same object (think of a typo in the dataset), while o₂ and o′₂ are different (although equally named) objects. Of course, for better matching we could use full names or ID numbers or something similar (called, in the database parlance, primary keys), but absolutely reliable keys are rare, and typos and bugs can compromise them anyway. Thus, for object structures that Bx needs to keep in sync, deltas between models need to be independently specified, e.g., by specifying a sameness relation u ⊂ B × B′ between models. For example, u = {(o₁, o′₁)} says that John@B and Jon@B′ are the same person while Mary@B and Mary@B′ are not. Hence, model spaces in Bx are categories (objects are models and arrows are update/delta specifications) rather than sets (codiscrete categories).

2.2 Consistency restoration via update propagation: An Example

Figure 1 presents a simple example of delta propagation for consistency restoration. Models consist of objects (in the sense of OO programming) with attributes (a.k.a. labelled records); e.g., the source model A consists of three objects identified by their oids (object identifiers) #A, #J, #M (think about employees of some company) with attribute values as shown in the table: attribute Expr. refers to Experience measured by a number of years, and Depart. is the column of department names. The schema of the table, i.e., the triple S_A of attributes (Name, Expr., Depart.) with their domains of values String, Integer, String resp., determines a model space A. A model X ∈ A is given by its set of objects OID^X together with three functions Name^X, Expr.^X, Depart.^X from the same domain OID^X to targets String, Integer, String resp., which are compactly specified by tables as shown for model A. The target model space B is given by a similar schema S_B consisting of two attributes. The B-view get(X) of an A-model X is computed by selecting those oids #O ∈ OID^X for which Depart.^X(#O) is an IT-department, i.e., an element of the set IT =def {ML, DB}. For example, the upper part of the figure shows the IT-view B of model A.
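As a quick illustration, the view computation just described is an ordinary relational selection. The following sketch uses dictionaries for tables; the encoding and names are my own, not the paper's formalism:

```python
IT = {"ML", "DB"}  # the IT departments, as in the text

# Source model A over schema S_A = (Name, Expr., Depart.)
A = {
    "#A": {"Name": "Ann",  "Expr": 10, "Depart": "Sales"},
    "#J": {"Name": "John", "Expr": 10, "Depart": "DB"},
    "#M": {"Name": "Mary", "Expr": 5,  "Depart": "ML"},
}

def get(X):
    """B-view: keep the oids working in an IT department, drop Depart."""
    return {oid: {"Name": r["Name"], "Expr": r["Expr"]}
            for oid, r in X.items() if r["Depart"] in IT}
```

Applying `get` to A keeps #J and #M and projects away the Depart. column, matching the IT-view B of the example.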
from the same domain OID to targets S S Str tr tring ing ing, IIIn nteg nteg teger er er, S S Str tr tring ing ing resp., which are compactly specified by tables as shown for model A. The target model space B is given by a similar schema S consisting of two attributes. The B-view get(X) of an A-model X X X is computed by selecting those oids #O ∈ OID for which Depart. (#O) is an def IT-department, i.e., an element of the set IT = {ML, DB}. For example, the upper part of the figure shows the IT-view B of model A. u update : #A = #A #J = #J #M = #M 180 Z. Diskin We assume that all column names in schemas S ,and S are qualified by A B schema names, e.g., OID@S , OID@S etc, so that schemas are disjoint except A B elementary domains like S S Str tr tring ing ing etc. Also disjoint are OID-values, e.g., #J@A and #J@B are different elements, but constants like John and Mary are elements of String set S Str tring ing shared by both schemas. To shorten long expressions in the diagrams, we will often omit qualifiers and write #J =#J meaning #J@A =#J@B or #J@B =#J@B depending on the context given by the diagram; often we will also write #J and #J for such OIDs. Also, when we write #J =#J inside block arrows denoting updates, we actually mean a pair, e.g., (#J@B, #J@B ). Giventwo modelsoverthesameschema,say, B and B over S ,anupdate B B v: B → B is a relation v ⊂ OID ×OID ; if a schema contains several nodes, an update should provide a relation v for each node N in the schema. Note an essential difference between the two parallel updates v ,v : B → B 1 2 specified in the figure. Update v says that John’s name was changed to Jon (think of fixing a typo), and the experience data for Mary were also corrected (either because of a typo or, e.g., because the department started to use a new ML method for which Mary has a longer experience). 
Update v₂ specifies the same story for John but a new story for Mary: it says that Mary #M left the IT-view and Mary #M′ is a new employee in one of the IT-departments.

Fig. 1: Example of update propagation. (The figure shows the source model A, its IT-view B, the two parallel view updates v₁, v₂: B → B′, and their backward propagations under the different policies discussed below.)

2.3 Update propagation and update policies

The updated view B′ is inconsistent with the source A, and the latter is to be updated accordingly: we say that update v is to be propagated back to A. Propagation of v₁ is easy: we just update the values of the attributes accordingly, as shown in the figure in the block arrow u₁: A → A′₁ (of black colour). Importantly, propagation needs two pieces of data: the view update v₁ and the original state A of the source, as shown in the figure by two data-flow lines into the chevron 1:put; the latter denotes an invocation of the backward propagation operation put (read “put the view update back to the source”). The quadruple 1 = (v₁, A, u₁, A′₁) can be seen as an instance of operation put, hence the notation 1:put (borrowed from the UML).
Propagation of update v₂ is more challenging: Mary can disappear from the IT-view because a) she quit the company, b) she transitioned to a non-IT department, or c) the view definition has changed, e.g., the new view must only show employees with experience of more than 5 years. Choosing between these possibilities is often called choosing an (update) policy. We will consider the case of changing the view in Sect. 3; in the current section we discuss policies a) and b) (ignore for a while the propagation scenario shown in blue in the lower right corner of the figure, which shows policy c)).

For policy a), further referred to as quitting and briefly denoted by qt, the result of update propagation is shown in the figure in green colour: notice the update (block) arrow u₂^qt and its result, model A₂^qt, produced by invoking operation put^qt. Note that while we know that the new employee Mary works in one of the IT departments, we do not know in which one. This is specified with a special value ’?’ (a.k.a. labelled null in the database parlance).

For policy b), further referred to as transition and denoted by tr, the result of update propagation is shown in the figure in orange colour: notice the update arrow u₂^tr and its result, model A₂^tr, produced by put^tr. Mary #M is the old employee, who transitioned to a new non-IT department, for which her expertise is unknown. Mary #M′ is a new employee in one of the IT-departments (we assume that the set of departments is not exhausted by those appearing in a particular state A ∈ A).

There are also updates whose backward propagation is uniquely defined and does not need a policy; update v₁ is such an update. An important property of the update propagations we have considered is that they restore consistency: the view of the updated source equals the updated view that initiated the propagation, get(A′ᵢ) = B′; moreover, this equality extends to update arrows: get(uᵢ) = vᵢ, i = 1, 2.
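Under the stated toy encoding (dictionaries for models, '?' for a labelled null; the function names and signatures are my own illustration, not the paper's operations), the two policies amount to two different put procedures for the same view update:

```python
# Source model: Mary #M disappears from the IT-view, and an equally
# named #M2 with experience 7 appears in it.
A = {
    "#A": {"Name": "Ann",  "Expr": 10, "Depart": "Sales"},
    "#J": {"Name": "Jon",  "Expr": 10, "Depart": "DB"},
    "#M": {"Name": "Mary", "Expr": 5,  "Depart": "ML"},
}

def put_qt(A, gone, new, row):
    """Policy qt: the old employee quit; the new one is in some IT department."""
    A2 = {o: dict(r) for o, r in A.items() if o != gone}
    A2[new] = dict(row, Depart="?")        # '?' = unknown (labelled null)
    return A2

def put_tr(A, gone, new, row):
    """Policy tr: the old employee moved to an unknown non-IT department."""
    A2 = {o: dict(r) for o, r in A.items()}
    A2[gone]["Depart"] = "?"               # now outside IT, department unknown
    A2[new] = dict(row, Depart="?")
    return A2
```

Both procedures restore consistency in the sense that the IT-view of the updated source matches the updated view; they differ only in what happens to the old Mary.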
Such extensions can be derived from view definitions if the latter are determined by so-called monotonic queries (which encompass a wide class of practically useful queries, including the Select-Project-Join class). For views defined by non-monotonic queries, in order to obtain get’s action on source updates u: A → A′, a suitable policy is to be added to the view definition (see [1,14,12] for details and discussion). Moreover, normally get preserves identity updates, get(id_A) = id_{get(A)}, and update composition: for any u: A → A′ and u′: A′ → A″, the equality get(u; u′) = get(u); get(u′) holds.

2.4 Delta lenses

Our discussion of the example can be summarized in the following algebraic terms. We have two categories of models and updates, A and B, and a functor get: A → B incrementally computing B-views of A-models (we will often write A.get for get(A)). We also suppose that for a chosen update policy, we have worked out precise procedures for propagating any view update backwards. This gives us a family of operations put_A: A(A, _) ← B(A.get, _) indexed by A-objects A ∈ A₀, for which we write put_A.v or put_A(v) interchangeably.

Definition 1 (Delta Lenses ([12])) Let A, B be two categories. An (asymmetric delta) lens from A (the source of the lens) to B (the target) is a pair ℓ = (get, put), where get: A → B is a functor and put is a family of operations put_A: A(A, _) ← B(A.get, _) indexed by objects of A, A ∈ A₀. Given A, operation put_A maps any arrow v: A.get → B′ to an arrow u: A → A′ such that A′.get = B′. The last condition is called the (co)discrete Putget law:

(Putget₀)  (put_A.v).cod.get₀ = v.cod for all A ∈ A₀ and v ∈ B(A.get, _),

where get₀ denotes the object function of functor get. We will write a lens as an arrow ℓ: A → B going in the direction of get.
Note that the family put corresponds to a chosen update policy, e.g., in terms of the example above, for the same view functor get we have two families of put-operations, put^qt and put^tr, corresponding to the two update policies we discussed. These two policies determine two lenses ℓ^qt = (get, put^qt) and ℓ^tr = (get, put^tr) sharing the same get.

Definition 2 (Well-behavedness) A (lens) equational law is an equation to hold for all values of two variables: A ∈ A₀ and v: A.get → T. A lens is called well-behaved (wb) if the following two laws hold:

(Stability)  id_A = put_A.id_{A.get} for all A ∈ A₀
(Putget)  (put_A.v).get = v for all A ∈ A₀ and all v ∈ B(A.get, _)

Remark 1. The Stability law says that a wb lens does nothing if nothing happens on the target side (no actions without triggers). Putget requires consistency after the backward propagation is finished. Note the distinction between the Putget₀ condition included in the very definition of a lens, and the full Putget law required for wb lenses. The former is needed to ensure smooth tiling of put-squares (i.e., arrow squares describing the application of put to a view update and its result) both horizontally (for sequential composition) and vertically (not considered in the paper). The full Putget assures true consistency, as considering a state B′ alone does not say much about the real update, and the elements of B′ cannot be properly interpreted. The real story is specified by the delta v: B → B′, and consistency restoration needs the full Putget law as above. A more detailed trailer of lenses can be found in the long version [9]. As shown in [6], the Putget₀ condition is needed if we want to define operations put separately from the functor get: then we still need a function get₀: A₀ → B₀ and the codiscrete Putget law to ensure a reasonable behaviour of put.
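A minimal executable sketch may help fix the intuitions behind the two laws. The toy lens below (states instead of genuine deltas, my own simplification rather than the paper's definitions) is a projection lens: the source is a pair (view part, hidden part), get projects out the view, and put restores it while keeping the hidden part:

```python
class Lens:
    def __init__(self, get, put):
        self.get, self.put = get, put

    def stable(self, A):
        # Stability: propagating the identity view update changes nothing.
        return self.put(A, self.get(A)) == A

    def putget(self, A, B2):
        # Putget: the view of the propagated source is the updated view.
        return self.get(self.put(A, B2)) == B2

# Projection lens: source = (kept, hidden), view = kept.
proj = Lens(get=lambda A: A[0],
            put=lambda A, B2: (B2, A[1]))  # restore view, keep hidden part
```

In the state-based simplification both laws are easy to check pointwise; the delta-based setting of Definitions 1-2 refines this by making the checks about update arrows rather than states.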
3 Asymmetric Learning Lenses with Amendments

We begin with a brief motivating discussion and then proceed with formal definitions.

3.1 Does Bx need categorical learning?

Enriching delta lenses with learning capabilities has a clear practical sense for Bx. Having a lens (get, put): A → B and an inconsistency A.get ≠ B′, the idea of learning extends the notion of the search space and allows us to update the transformation itself, so that the final consistency is achieved for a new transformation get′: A.get′ = B′. For example, in the case shown in Fig. 1, the disappearance of Mary #M in the updated view B′ can be caused by a change of the view definition, which now requires showing only those employees whose experience is more than 5 years; hence Mary #M is to be removed from the view, while Mary #M′ is a new IT-employee whose experience satisfies the new definition. Then the update v₂ can be propagated as shown in the bottom right corner of Fig. 1, where the index par indicates a new update policy allowing for view definition (parameter) change.

To manage the extended search possibilities, we parameterize the space of transformations as a family of mappings get_p: A → B indexed over a parameter space, p ∈ P. For example, we may define the IT-view to be parameterized by the experience of the employees shown in the view (with any experience as a special parameter value). Then we have two interrelated propagation operations that map an update B → B′ to a parameter update p → p′ and a source update A → A′. Thus, the extended search space allows for new update policies that treat updating the parameter as an update propagation possibility. The possibility to update the transformation appears very natural in at least two important Bx scenarios: a) model transformation design and b) model transformation evolution (cf. [21]), which necessitates the enrichment of the delta lens framework with parameterization and learning.
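Under stated assumptions, a codiscrete instance of the experience-threshold parameterization can be sketched as follows (`get_p` and `put_par` are my own toy names and logic, not the paper's definitions): the parameter p is the minimal experience shown in the view, and a par-style policy repairs an inconsistency by learning a new threshold instead of touching the source:

```python
def get_p(p, X):
    """Parameterized view family: X maps oids to experience values."""
    return {oid: e for oid, e in X.items() if e > p}

def put_par(p, A, B2):
    """If the view update only removed low-experience rows, return a new
    threshold p2 with get_p(p2, A) == B2; otherwise keep p unchanged."""
    removed = {e for oid, e in get_p(p, A).items() if oid not in B2}
    kept_min = min(B2.values()) if B2 else None
    if removed and (kept_min is None or max(removed) < kept_min):
        return max(removed)     # learned parameter; source stays untouched
    return p
```

Starting from threshold 0, a view update that drops only Mary (experience 5) is explained by learning the new threshold 5, after which the view of the unchanged source agrees with the updated view.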
Note that all transformations get_p, p ∈ P, are to be elements of the same lens, whose put operations may change the parameter; hence, a formalization of learning as a family of ordinary lenses indexed by p would not do the job.

Categorical vs. codiscrete learning. Suppose that the parameter p is itself a set; e.g., the set of departments forming a view can vary depending on some context. Then an update from p to p′ has a relational structure as discussed above, i.e., e: p → p′ is a relation e ⊂ p × p′ specifying which departments disappeared from the view and which are freshly added. This is a general phenomenon: as soon as parameters are structures (sets of objects, or graphs of objects and attributes), a parameter change becomes a structured delta, and the space of parameters gives rise to a category P. The search/propagation procedure returns an arrow e: p → p′ in this category, which updates the parameter value from p to p′. Hence, a general model of supervised learning should assume P to be a category (and we say that learning is categorical). The case of the parameter space being a set is captured by considering a codiscrete category P, whose only arrows are pairs of its objects; we call such learning codiscrete.

184 Z. Diskin

3.2 Ala-lenses

The notion of a parameterized functor (p-functor) is fundamental for ala-lenses, but it is not a lens notion per se and is thus placed in Appendix Sect. A.1. We will work with its exponential (rather than the equivalent product-based) formulation, but will do uncurrying and currying back if necessary, often using the same symbol for an arrow f and its uncurried version.

Definition 3 (ala-lenses). Let A and B be categories.
An ala-lens from A (the source of the lens) to B (the target) is a pair ℓ = (get, put) whose first component is a p-functor get: A ⇝ B and whose second component is a triple of (families of) operations put = (put^upd_{p,A}, put^req_{p,A}, put^self_{p,A}) indexed by pairs p ∈ P₀, A ∈ A₀; the arities of the operations are specified below after we introduce some notation. The names req (for 'request') and upd (for 'update') are chosen to match the terminology of [17].

Categories A and B are called model spaces, their objects are models, and their arrows are (model) updates or deltas. Objects of P are called parameters and are denoted by small letters p, p′, ... rather than capital ones to avoid confusion with [17], in which capital P is used for the entire parameter set. Arrows of P are called parameter deltas. For a parameter p ∈ P₀, we write get_p for the functor get(p): A → B (read "get B-views of A"), and if A ∈ A₀ is a source model, its get_p-view is denoted by get_p(A) or A.get_p or even A_p (so that _p becomes yet another notation for the functor get_p). Given a parameter delta e: p → p′ and a source model A ∈ A₀, the model delta get(e)(A): get_p(A) → get_{p′}(A) will be denoted by get_e(A) or e_A (rather than A_e, as we would like to keep capital letters for objects only). In the uncurried version, get_e(A) is nothing but get(e, id_A). Since get(e) is a natural transformation, for any delta u: A → A′ we have a commutative square e_A; u_{p′} = u_p; e_{A′} (whose diagonal is get(e, u)). We will denote the diagonal of this square by u.get_e or u_e: A_p → A′_{p′}. Thus, we use the notation

A_p ≝ A.get_p ≝ get_p(A) ≝ get(p)(A),
u_e ≝ u.get_e ≝ get_e(u) ≝ get(e)(u) ≝ e_A; u_{p′} = u_p; e_{A′}: A_p → A′_{p′}.   (1)

Now we describe the operations put.
They all have the same indexing set P₀ × A₀ and the same domain: for any index (p, A) and any model delta v: A_p → B′ in B, the value put^x_{p,A}(v), x ∈ {req, upd, self}, is defined and unique:

put^upd_{p,A}(v): p → p′ is a parameter delta from p,
put^req_{p,A}(v): A → A′ is a model delta from A,   (2)
put^self_{p,A}(v): B′ → A′_{p′} is a model delta from B′, called the amendment and denoted by v^@.

Note that the definition of put^self involves an equational dependency between all three operations: for all A ∈ A₀ and v ∈ B(A.get_p, _), we require

(Putget₀) (put^req_{p,A}(v)).cod.get_{p′} = (v; put^self_{p,A}(v)).cod, where p′ = (put^upd_{p,A}(v)).cod.

We will write an ala-lens as an arrow ℓ = (get, put): A ⇝ B. A lens is called (twice) codiscrete if the categories A, B, P are codiscrete and thus get: A ⇝ B is a parameterized function. If only P is codiscrete, we call ℓ a codiscretely learning delta lens, while if only the model spaces are codiscrete, we call ℓ a categorically learning codiscrete lens.

The diagram in Fig. 2 shows how a lens' operations are interrelated. The upper part shows an arrow e: p → p′ in category P and the two corresponding functors from A to B. The lower part is to be seen as a 3D prism with visible front face A A_p A′_{p′} A′ and visible upper face A A_p A′; the bottom and the two back faces are invisible, and the corresponding arrows are dashed. The prism denotes an algebraic term: given elements are shown with black fill and white font, while derived elements are blue (recalling being mechanically computed) and blank (double-body arrows are considered "blank"). The two pairs of arrows originating from A and A′ are not blank because they denote pairs of nodes (in UML terms, links) rather than mappings/deltas between nodes. The equational definitions of the deltas e = put^upd_{p,A}(v), u = put^req_{p,A}(v) and v^@ = put^self_{p,A}(v) are written in the three callouts near them.

[Fig. 2: Ala-lens operations]
The right back face of the prism is formed by the two vertical derived deltas u_p = u.get_p and u_{p′} = u.get_{p′} and the two matching horizontal derived deltas e_A = get_e(A) and e_{A′} = get_e(A′); together they form a commutative square due to the naturality of get(e), as explained earlier.

Definition 4 (Well-behavedness). An ala-lens is called well-behaved (wb) if the following two laws hold for all p ∈ P₀, A ∈ A₀ and v: A_p → B′:

(Stability) if v = id_{A_p}, then all three propagated updates e, u, v^@ are identities:
put^upd_{p,A}(id_{A_p}) = id_p, put^req_{p,A}(id_{A_p}) = id_A, put^self_{p,A}(id_{A_p}) = id_{A_p};

(Putget) (put^req_{p,A}(v)).get_e = v; v^@, where e = put^upd_{p,A}(v) and v^@ = put^self_{p,A}(v).

Remark 2. Note that Remark 1 about the Putget law is again applicable.

Example 1 (Identity lenses). Any category A gives rise to an ala-lens id_A with the following components. The source and target spaces are equal to A, and the parameter space is 1. Functor get is the identity functor, and all puts are identities. Obviously, this lens is wb.

Example 2 (Iso-lenses). Let ι: A → B be an isomorphism between model spaces. It gives rise to a wb ala-lens ℓ(ι): A → B with P = 1 = {∗} as follows. Given any A in A and v: ι(A) → B′ in B, we define put^{ℓ(ι).req}_{∗,A}(v) = ι⁻¹(v), while the two other put operations map v to identities.

Example 3 (Bx lenses). Examples of wb aa-lenses modelling a Bx can be found in [11]; they can all be considered as ala-lenses with a trivial parameter space 1.

Example 4 (Learners). Learners defined in [17] are codiscretely learning codiscrete lenses with amendment, and as such satisfy (the amended) Putget (Remark 1). Looking in the opposite direction, ala-lenses are a categorification of learners, as detailed in Fig. 8 on p. 194.

4 Compositionality of ala-lenses

This section explores the compositional structure of the universe of ala-lenses; especially interesting is their sequential composition.
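Before turning to composition, the data of Definition 3 and the laws of Definition 4 can be sketched in a codiscrete setting where deltas are just the new values; the class `AlaLens` and the numeric instance below are illustrative assumptions, not part of the formal development.

```python
# A codiscrete sketch of an ala-lens: a parameterized get plus three puts
# returning a parameter update, a source update and an amendment, with
# codiscrete versions of the Stability and (amended) Putget laws.

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class AlaLens:
    get: Callable[[Any, Any], Any]            # get(p, A) = A.get_p
    put_upd: Callable[[Any, Any, Any], Any]   # (p, A, v) -> p'
    put_req: Callable[[Any, Any, Any], Any]   # (p, A, v) -> A'
    put_self: Callable[[Any, Any, Any], Any]  # (p, A, v) -> amended target

    def stable(self, p, A) -> bool:
        # (Stability): the "identity" update (the current view) propagates
        # to identities, i.e., nothing changes
        v = self.get(p, A)
        return (self.put_upd(p, A, v), self.put_req(p, A, v)) == (p, A)

    def putget(self, p, A, v) -> bool:
        # (amended Putget): the view of the propagated source equals the
        # amended target
        p1, A1 = self.put_upd(p, A, v), self.put_req(p, A, v)
        return self.get(p1, A1) == self.put_self(p, A, v)

# Toy wb instance over numbers: get_p(A) = p * A; consistency is restored by
# updating the source, the parameter is kept, and the amendment is trivial.
lens = AlaLens(
    get=lambda p, A: p * A,
    put_upd=lambda p, A, v: p,
    put_req=lambda p, A, v: v / p,
    put_self=lambda p, A, v: v,
)
assert lens.stable(2.0, 3.0)
assert lens.putget(2.0, 3.0, 10.0)
```

A non-trivial `put_upd` (changing p instead of A) would model a different update policy over the same parameterized get.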
We will begin with a small example demonstrating sequential composition of ordinary lenses and showing that the notion of update policy transcends individual lenses. Then we define sequential and parallel composition of ala-lenses (the former is much more involved than for ordinary lenses) and show that wb ala-lenses can be organized into an sm-category. Finally, we formalize the notion of a compositional update policy via the notion of a compositional bidirectional language.

4.1 Compositionality of update policies: An example

Fig. 3 extends the example in Fig. 1 with a new model space C, whose schema consists of the only attribute Name, and a view of the IT-view in which only employees of the ML department are to be shown. Thus, we now have two functors, get1: A → B and get2: B → C, and their composition Get: A → C (referred to as the long get). The top part of Fig. 3 shows how it works for the model A considered above.

Each of the two policies, policy qt (green) and policy tr (orange), in which a person's disappearance from the view is interpreted, resp., as quitting the company and as transitioning to a department not included in the view, is applicable to the new view mappings get2 and Get, thus giving us six lenses, shown in Fig. 4 with solid arrows; amongst them, lenses L^qt and L^tr are obtained by applying the respective policy to the (long) functor Get, and we will refer to them as long lenses. In addition, we can compose lenses of the same colour as shown in Fig. 4 by dashed arrows (we can also compose lenses of different colours, ℓ1^qt with ℓ2^tr and ℓ1^tr with ℓ2^qt, but we do not need them). Now an important question is how long and composed lenses are related: are L^pol and ℓ1^pol; ℓ2^pol, for pol ∈ {qt, tr}, equal (perhaps up to some equivalence) or different?
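The qt half of this question can be probed concretely on a codiscrete toy version of the example; all data and helper functions below (get1, get2, the qt puts) are made-up illustrations of the policy, not the paper's formal lenses.

```python
# Toy check that for policy qt the long lens agrees with the composed one:
# put1(put2(w)) equals Put(w) for the views get1 (IT employees) and get2
# (names of ML employees).  All data and propagation code are illustrative.

def get1(A):          # IT-view: employees of the IT departments (DB, ML)
    return [(n, d) for (n, d) in A if d in ("DB", "ML")]

def get2(B):          # ML-view: names of ML employees
    return [n for (n, d) in B if d == "ML"]

def put2_qt(B, C_new):
    # policy qt, one level up: who left C quit; new names are hired into ML
    kept = [(n, d) for (n, d) in B if d != "ML" or n in C_new]
    hired = [(n, "ML") for n in C_new if n not in [x for x, _ in B]]
    return kept + hired

def put1_qt(A, B_new):
    names = [n for n, _ in B_new]
    kept = [(n, d) for (n, d) in A if d not in ("DB", "ML") or n in names]
    hired = [(n, d) for (n, d) in B_new if n not in [x for x, _ in A]]
    return kept + hired

def Put_qt(A, C_new):
    # the long lens: invert Get = get1; get2 directly with the same policy
    kept = [(n, d) for (n, d) in A if d != "ML" or n in C_new]
    hired = [(n, "ML") for n in C_new if n not in [x for x, _ in A]]
    return kept + hired

A = [("Ann", "Sales"), ("John", "DB"), ("Mary", "ML")]
w = ["Mary2"]   # Mary left ML; a new employee Mary2 was hired
composed = put1_qt(A, put2_qt(get1(A), w))
assert composed == Put_qt(A, w)   # for qt, composed and long lens agree
```

For policy tr the analogous equality fails, as the text explains next: the composed lens retains more information about where Mary could have moved.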
[Fig. 3: Example cont'd: functoriality of update policies. The figure shows the source model A (employees with Name, Experience, Department), the IT-view B, the ML-view C, the view update w on C, and the updated models produced by policies qt and tr; only the caption and the overall layout are recoverable here.]

[Fig. 4: Lens combination schemas for Fig. 3]

Fig. 3 demonstrates how the mechanisms work with a simple example. We begin with an update w of the view C that says that Mary #M left the ML department and a new Mary #M′ was hired for ML. Policy qt interprets Mary's disappearance as quitting the company, and hence this Mary appears neither in the view B′^qt produced by put2^qt nor in the view A′^qt produced from B′^qt by put1^qt, and the updates v^qt and u^qt are written accordingly. Obviously, Mary also does not appear in the view A′^qt produced by the long lens's Put^qt. Thus, put1^qt_A(put2^qt(w)) = Put^qt_A(w), and it is easy to understand that such an equality will hold for any source model A and any update w: C → C′ due to the nature of our two views get1 and get2. Hence, L^qt = ℓ1^qt; ℓ2^qt, where L^qt = (Get, Put^qt) and ℓi^qt = (geti, puti^qt).

The situation with policy tr is more interesting. Model A′^tr_12 produced by the composed lens ℓ1^tr; ℓ2^tr and model A′^tr produced by the long lens L^tr = (Get, Put^tr) are different, as shown in the figure (notice the two different values for Mary's department, framed with red ovals in the models).
Indeed, the composed lens has more information about the old employee Mary: it knows that Mary was in the IT view, and hence can propagate the update more accurately. The comparison update δ^tr_{A,w}: A′^tr → A′^tr_12 adds this missing information, so that the equality u^tr; δ^tr_{A,w} = u^tr_12 holds. This is a general phenomenon: functor composition loses information, and, in general, the functor Get = get1; get2 knows less than the pair (get1, get2). Hence, the operation Put^tr back-propagating updates over Get (we will also say inverting Get) will, in general, result in less certain models than the composition put1 ∘ put2, which inverts the composition get1; get2 (a discussion and examples of this phenomenon in the context of vertical composition of updates can be found in [8]). Hence, comparison updates such as δ^tr_{A,w} should exist for any A and any w: A.Get → C′, and together they should give rise to something like a natural transformation between lenses, δ^tr: L^tr ⇒ ℓ1^tr; ℓ2^tr. To make this notion precise, we need a notion of natural transformation between "functors" put, which we leave for future work. In the present paper, we will consider policies like qt, for which strict equality holds.

4.2 Sequential composition of ala-lenses

Let k: A → B and ℓ: B → C be two ala-lenses with parameterized functors get^k: P → [A, B] and get^ℓ: Q → [B, C], resp. Their composition is the following ala-lens k;ℓ. Its parameter space is the product P × Q, and the get-family is defined as follows. For any pair of parameters (p, q) (we will write pq), get^{k;ℓ}_{pq} = get^k_p; get^ℓ_q: A → C.
Given a pair of parameter deltas, e: p → p′ in P and h: q → q′ in Q, their get^{k;ℓ}-image is the Godement product ∗ of natural transformations: get^{k;ℓ}_{eh} = get^k_e ∗ get^ℓ_h (we will also write get^k ‖ get^ℓ).

[Fig. 5: Sequential composition of ala-lenses]

Now we define k;ℓ's propagation operations put. Let (A, pq) with A ∈ A₀, pq ∈ (P×Q)₀ and A.get^k_p.get^ℓ_q = A_pq ∈ C₀ be a state of the lens k;ℓ, and let w: A_pq → C′ be a target update as shown in Fig. 5. For the first propagation step, we run lens ℓ as shown in Fig. 5 with the blue colour for derived elements: this is just an instantiation of the pattern of Fig. 2 with the source object being B = A.get^k_p and parameter q. The results are the deltas

h = put^{ℓ.upd}_{q,B}(w): q → q′, v = put^{ℓ.req}_{q,B}(w): B → B′, w^@ = put^{ℓ.self}_{q,B}(w): C′ → B′_{q′}.   (3)

Next we run lens k at state (p, A) with the target update v produced by lens ℓ; this is yet another instantiation of the pattern in Fig. 2 (this time with the green colour for derived elements), which produces three deltas

e = put^{k.upd}_{p,A}(v): p → p′, u = put^{k.req}_{p,A}(v): A → A′, v^@ = put^{k.self}_{p,A}(v): B′ → A′_{p′}.   (4)

These data specify the green prism adjoint to the blue prism: the edge v of the latter is the "first half" of the right back-face diagonal A_p → A′_{p′} of the former. In order to make an instance of the pattern in Fig. 2 for the lens k;ℓ, we need to extend the blue-green diagram to a triangle prism by filling in the corresponding "empty space". These filling-in arrows are provided by the functors get^k and get^ℓ and are shown in orange (where we have chosen one of the two equivalent ways of forming the Godement product; note the two curved brown arrows). In this way we obtain yet another instantiation of the pattern in Fig.
2, denoted by k;ℓ:

put^{(k;ℓ).upd}_{pq,A}(w) = (e, h), put^{(k;ℓ).req}_{pq,A}(w) = u, put^{(k;ℓ).self}_{pq,A}(w) = w^@; v^@_{q′},   (5)

where v^@_{q′} denotes v^@.get^ℓ_{q′}. Thus, we have built an ala-lens k;ℓ, which satisfies equation Putget₀ by construction.

Theorem 1 (Sequential composition and lens laws). Given ala-lenses k: A → B and ℓ: B → C, let k;ℓ: A → C be their sequential composition as defined above. Then the lens k;ℓ is wb as soon as the lenses k and ℓ are such. See [9, Appendix A.3] for a proof.

4.3 Parallel composition of ala-lenses

Let ℓ_i: A_i → B_i, i = 1, 2, be two ala-lenses with parameter spaces P_i. The lens ℓ1 ‖ ℓ2: A1 × A2 → B1 × B2 is defined as follows. Its parameter space is (ℓ1 ‖ ℓ2).P = P1 × P2. For any pair p1‖p2 ∈ (P1 × P2)₀, define get^{ℓ1‖ℓ2}_{p1‖p2} = get^{ℓ1}_{p1} × get^{ℓ2}_{p2} (we denote pairs of parameters by p1‖p2 rather than p1 ⊗ p2 to shorten long formulas going beyond the page width). Further, for any pair of models A1‖A2 ∈ (A1 × A2)₀ and deltas v1‖v2: (A1‖A2).get_{p1‖p2} → B1‖B2, we define componentwise

put^{(ℓ1‖ℓ2).upd}_{p1‖p2, A1‖A2}(v1‖v2) = e1‖e2: p1‖p2 → p1′‖p2′

by setting e_i = put^{ℓi.upd}_{pi,Ai}(v_i), i = 1, 2, and similarly for put^{(ℓ1‖ℓ2).req} and put^{(ℓ1‖ℓ2).self}. The following result is obvious.

Theorem 2 (Parallel composition and lens laws). The lens ℓ1 ‖ ℓ2 is wb as soon as the lenses ℓ1 and ℓ2 are such.

4.4 Symmetric monoidal structure over ala-lenses

Our goal is to organize ala-lenses into an sm-category. To make sequential composition of ala-lenses associative, we need to consider them up to some equivalence (indeed, Cartesian product is not strictly associative).
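The componentwise definitions of Sect. 4.3 admit a direct codiscrete sketch; lenses are modelled as triples of plain functions, amendments are omitted for brevity, and all names are illustrative.

```python
# Codiscrete sketch of parallel lens composition: everything acts
# componentwise on pairs of parameters, models and view updates.

def parallel(l1, l2):
    get1, upd1, req1 = l1
    get2, upd2, req2 = l2
    return (
        lambda p, A: (get1(p[0], A[0]), get2(p[1], A[1])),
        lambda p, A, v: (upd1(p[0], A[0], v[0]), upd2(p[1], A[1], v[1])),
        lambda p, A, v: (req1(p[0], A[0], v[0]), req2(p[1], A[1], v[1])),
    )

# Two toy lenses over numbers: get_p(A) = A + p and get_q(A) = A * q.
add = (lambda p, A: A + p, lambda p, A, v: p, lambda p, A, v: v - p)
mul = (lambda q, A: A * q, lambda q, A, v: q, lambda q, A, v: v / q)

get, put_upd, put_req = parallel(add, mul)
p, A, v = (1, 2), (10, 10), (21, 40)
assert get(p, A) == (11, 20)
assert put_req(p, A, v) == (20, 20.0)
assert get(put_upd(p, A, v), put_req(p, A, v)) == v  # componentwise Putget
```

Well-behavedness is inherited componentwise, which is why Theorem 2 is immediate.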
Definition 5 (Ala-lens equivalence). Two parallel ala-lenses ℓ, ℓ̂: A → B are called equivalent if their parameter spaces are isomorphic via a functor ι: P → P̂ such that for any A ∈ A₀, e: p → p′ in P and v: (A.get_p) → T the following holds (for x ∈ {req, self}):

A.get_e = A.ĝet_{ι(e)}, ι(put^upd_{p,A}(v)) = p̂ut^upd_{ι(p),A}(v), and put^x_{p,A}(v) = p̂ut^x_{ι(p),A}(v),

where ĝet and p̂ut denote the components of ℓ̂.

Remark 3. It would be more categorical to require delta isomorphisms (i.e., commutative squares whose horizontal edges are isomorphisms) rather than equalities as above. However, the model spaces appearing in Bx practice are skeletal categories (and even stronger than skeletal in the sense that all isos, including iso loops, are identities), for which isos become equalities, so that the generality would degenerate into equality anyway.

It is easy to see that the operations of sequential and parallel lens composition are compatible with lens equivalence and hence are well-defined for equivalence classes. Below we identify lenses with their equivalence classes by default.

Theorem 3 (Ala-lenses form an sm-category). The operations of sequential and parallel composition of ala-lenses defined above give rise to an sm-category aLaLens, whose objects are model spaces (= categories) and whose arrows are (equivalence classes of) ala-lenses. See [9, p. 17 and Appendix A.4] for a proof.

4.5 Functoriality of learning in the delta lens setting

As the example in Sect. 4.1 shows, the notion of update policy transcends individual lenses. Hence, its proper formalization needs to consider the entire category of ala-lenses and the functoriality of a suitable mapping.
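For orientation, the learners of [17] that the following definition and example refer to can be sketched as one gradient-descent step packaged as put-operations; the linear model get_p(a) = p·a, the squared error and the step size below are illustrative choices, not the general construction.

```python
# Sketch of a gradient-descent learner viewed as a codiscrete ala-lens:
# put^upd updates the parameter, put^req issues a "request" to the previous
# layer.  Model, error and step size are illustrative assumptions.

EPS = 0.05  # step size

def get(p, a):
    return p * a

def put_upd(p, a, b):
    # one gradient step on err(p) = (p*a - b)^2 with respect to p
    return p - EPS * 2 * (p * a - b) * a

def put_req(p, a, b):
    # one gradient step on the input a (the request to the previous layer)
    return a - EPS * 2 * (p * a - b) * p

# Repeated propagation of the same supervision pair drives get_p towards it.
p, a_train, b_train = 0.0, 1.0, 2.0
for _ in range(200):
    p = put_upd(p, a_train, b_train)
assert abs(get(p, a_train) - b_train) < 1e-6
```

Composing such learners sequentially follows exactly the two-step put of Sect. 4.2, which is the content of the functoriality results discussed next.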
Definition 6 (Bx-transformation language). A compositional bidirectional model transformation language L_bx is given by (i) an sm-category pGet(L_bx), whose objects are (L_bx-)model spaces and whose arrows are (L_bx-)transformations, supplied with a forgetful functor into pCat, and (ii) an sm-functor L: pGet(L_bx) → aLaLens such that the lower triangle in the inset diagram commutes. (The forgetful functors in this diagram are named "−X", with X referring to the structure to be forgotten.) An L_bx-language is well-behaved (wb) if the functor L factorizes through the subcategory aLaLens_wb of wb lenses, as shown by the upper triangle of the diagram.

Example. A major compositionality result of Fong et al. [17] states the existence of an sm-functor L_{ε,err} from the category Para of Euclidean spaces and parameterized differentiable functions (pd-functions) into the category Learn of learning algorithms (learners), as shown by the inset commutative diagram. (The functor is itself parameterized by a step size 0 < ε ∈ R and an error function err: R×R → R needed to specify the gradient descent procedure.) However, learners are nothing but codiscrete ala-lenses (see Sect. A.2), and thus the inset diagram is a codiscrete specialization of the diagram in Def. 6 above. That is, the category of Euclidean spaces and pd-functions, and the gradient descent method for back propagation, give rise to a (codiscrete) compositional bx-transformation language (over pSet rather than pCat). Finding a specifically Bx instance of Def. 6 (e.g., checking whether it holds for concrete languages and tools such as eMoflon [23] or groundTram [22]) is laborious and left for future work.

5 Related work

Figure 6 on the right is a simplified version of Fig.
8 on p. 194, convenient for our discussion here: the immediately related work is found in the areas located at points (0,1) (codiscrete learning lenses) and (1,0) (delta lenses) of the plane.

[Fig. 6: a plane whose axes measure parameter-space learning and model-space deltas; learners sit at (0,1), delta lenses at (1,0), and learning delta lenses at (1,1)]

For the point (0,1), the paper [17] by Fong, Spivak and Tuyéras is fundamental: they defined the notion of a codiscrete learning lens (called a learner), proved a fundamental result about the sm-functoriality of the gradient descent approach to ML, and thus laid a foundation for the compositional approach to change propagation with learning. One follow-up of that work is the paper [16] by Fong and Johnson, in which they build an sm-functor Learn → sLens, which maps learners to so-called symmetric lenses. That paper is probably the first one where the terms 'lens' and 'learner' meet, but the initial observation that a learner whose parameter set is a singleton is actually a lens is due to Jules Hedges; see [16].

There are conceptual and technical distinctions between [16] and the present paper. On the conceptual level, by encoding learners as symmetric lenses, they "hide" learning inside the lens framework and make it a technical rather than a conceptual idea. In contrast, we consider parameterization and supervised learning as a fundamental idea and a first-class citizen of the lens framework, which grants the creation of a new species of lenses. Moreover, while an ordinary lens is a way to invert a functor, a learning lens is a way to invert a parameterized functor, so that learning lenses appear as an extension of the parameterization idea from functors to lenses. (This approach can probably be specified formally by treating parameterization as a suitably defined functorial construction.) Besides
technical advantages (working with asymmetric lenses is simpler), our asymmetric model seems more adequate to the problem of learning functions rather than relations. On the technical level, the lens framework we develop in this paper is much more general than that of [16]: we categorified both the parameter space and the model spaces, and we work with lenses with amendment (which allows us to relax the Putget law if needed).

As for the delta lens roots (the point (1,0) in the figure), delta lenses were motivated and formally defined in [12] (the asymmetric case) and [13] (the symmetric one). Categorical foundations for the delta lens theory were developed by Johnson and Rosebrugh in a series of papers (see [20] for references); this line is continued in Clarke's work [6]. The notion of a delta lens with amendments (in both asymmetric and symmetric variants) was defined in [11], and several composition results were proved there. Another extensive body of work within the delta-based area is modelling and implementing model transformations with triple graph grammars (TGG) [4,23]. TGGs provide an implementation framework for delta lenses, as is shown and discussed in [5,19,2], and thus inevitably consider change propagation on a much more concrete level than lenses do. The author is not aware of any work considering functoriality of update policies developed within the TGG framework.

The present paper is probably the first one at the intersection (1,1) of the plane. The preliminary results have recently been reported at ACT'19 in Oxford to a representative lens community, and no references besides [17] and [16] mentioned above were provided.

6 Conclusion

The perspective on Bx presented in the paper is an example of a fruitful interaction between two domains: ML and Bx. In order to be ported to Bx, the compositional approach to ML developed in [17] is to be categorified as shown in Fig. 8 on p. 194.
This opens a whole new program for Bx: checking that currently existing Bx languages and tools are compositional (and well-behaved) in the sense of Def. 6 on p. 190. The wb compositionality is an important practical requirement, as it allows for modular design and testing of bidirectional transformations. Surprisingly, this important requirement has been missing from the agenda of the Bx community; e.g., the recent endeavour of developing an effective benchmark for Bx tools [3] does not discuss it.

In a wider context, the main message of the paper is that the learning idea transcends its applications in ML: it is applicable and usable in many domains in which lenses are applicable, such as model transformations, data migration, and open games [18]. Moreover, categorified learning may perhaps find useful applications in ML itself. In the current ML setting, the object to be learnt is a function f: R^m → R^n that, in the OO class-modelling perspective, is a very simple structure: it can be seen as one object with a (huge) number of attributes or, perhaps, a predefined set of objects which is not allowed to be changed during the search; only attribute values may be changed. In the delta lens view, such changes constitute a rather narrow class of updates and thus unjustifiably narrow the search space. Learning with the possibility to change the dimensions m, n may be an appropriate option in several contexts. On the other hand, while categorification of the model spaces extends the search space, categorification of the parameter space would narrow it, as we are allowed to replace a parameter p by a parameter p′ only if there is a suitable arrow e: p → p′ in category P. This narrowing may, perhaps, improve performance. All in all, the interaction between ML and Bx could be bidirectional!

A Appendices

A.1 Category of parameterized functors pCat

Category pCat has all small categories as objects.
pCat-arrows A → B are parameterized functors (p-functors), i.e., functors f: P → [A, B] with P a small category of parameters and [A, B] the category of functors from A to B and their natural transformations. For an object p and an arrow e: p → p′ in P, we write f_p for the functor f(p): A → B and f_e for the natural transformation f(e): f_p ⇒ f_{p′}. We will write p-functors as labelled arrows f: A ⇝ B. As Cat is Cartesian closed, we have a natural isomorphism between Cat(P, [A, B]) and Cat(P×A, B) and can reformulate the above definition in an equivalent way with functors P×A → B. We prefer the former formulation, as it corresponds to the notation f: A ⇝ B visualizing P as a hidden state of the transformation, which seems adequate to the intuition of parameterization in our context. (If some technicalities are easier to see with the product formulation, we will switch to the product view, doing the currying and uncurrying without special mention.)

Sequential composition of f: A ⇝ B (with parameter space P) and g: B ⇝ C (with parameter space Q) is the p-functor f.g: A ⇝ C with parameter space P×Q, given by (f.g)_{pq} = f_p.g_q for objects, i.e., pairs p ∈ P, q ∈ Q, and by the Godement product of natural transformations for arrows in P×Q. That is, given a pair e: p → p′ in P and h: q → q′ in Q, we define the transformation (f.g)_{eh}: f_p.g_q ⇒ f_{p′}.g_{q′} to be the Godement product f_e ∗ g_h.

Any category A gives rise to a p-functor Id_A: A ⇝ A whose parameter space is a singleton category 1 with the only object ∗; Id_A(∗) = id_A, and Id_A(id_∗): id_A ⇒ id_A is the identity transformation. It is easy to see that the p-functors Id_A are units of sequential composition. To ensure associativity, we need to consider p-functors up to an equivalence of their parameter spaces.
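The codiscrete shadow of this composition (in pSet, with sets of parameters and ordinary functions) can be sketched as follows; the numeric parameter spaces and function choices are illustrative.

```python
# Sketch of sequential composition of parameterized functions: composing
# f: P -> [A -> B] and g: Q -> [B -> C] yields f.g with parameter space
# the product P x Q, (f.g)_{(p,q)} = f_p ; g_q.

def compose(f, g):
    # (f.g)_{(p,q)}(a) = g_q(f_p(a))
    return lambda pq: (lambda a: g(pq[1])(f(pq[0])(a)))

f = lambda p: (lambda a: a + p)   # parameter space P: numbers
g = lambda q: (lambda b: b * q)   # parameter space Q: numbers

fg = compose(f, g)
assert fg((1, 3))(2) == 9         # (2 + 1) * 3
```

Associativity up to the isomorphism (P×Q)×R ≅ P×(Q×R) is what forces the passage to equivalence classes in the text.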
Two parallel p-functors f: A ⇝ B and f̂: A ⇝ B, with parameter spaces P and P̂ resp., are equivalent if there is an isomorphism α: P → P̂ such that the two parallel functors f: P → [A, B] and α; f̂: P → [A, B] are naturally isomorphic; then we write f ≈_α f̂. It is easy to see that if f ≈_α f̂: A ⇝ B and g ≈_β ĝ: B ⇝ C, then f;g ≈_{α×β} f̂;ĝ: A ⇝ C, i.e., sequential composition is stable under equivalence. Below we identify p-functors with their equivalence classes. Using the natural isomorphism (P×Q)×R ≅ P×(Q×R), the strict associativity of functor composition and the strict associativity of the Godement product, we conclude that sequential composition of (equivalence classes of) p-functors is strictly associative. Hence, pCat is a category.

Our next goal is to supply it with a monoidal structure. We borrow the latter from the sm-category (Cat, ×), whose tensor is given by the product. There is an identical-on-objects embedding (Cat, ×) → pCat that maps a functor f: A → B to a p-functor f: A ⇝ B whose parameter space is the singleton category 1. Moreover, as this embedding is a functor, the coherence equations for the associators and unitors that hold in (Cat, ×) hold in pCat as well (this proof idea is borrowed from [17]). In this way, pCat becomes an sm-category. In a similar way, we define the sm-category pSet of small sets and parameterized functions between them, the codiscrete version of pCat. The diagram in Fig. 7 shows how these categories are related.

[Fig. 7: the categories (Cat, ×), (Set, ×), pCat and pSet, and the embeddings relating them]

A.2 Ala-lenses as categorification of ML-learners

Figure 8 shows a discrete two-dimensional plane with each axis having three points: a space is a singleton, a set, or a category, encoded by the coordinates 0, 1, 2 resp. Each of the points x_ij is then the location of a corresponding sm-category of
(asymmetric) learning (delta) lenses.

[Fig. 8: The universe of categories of learning delta lenses: a grid indexed by the kind of model spaces (A, B = 1; A, B ∈ Set; A, B ∈ Cat) on the horizontal axis and the kind of parameter space (P = 1; P ∈ Set; P ∈ Cat) on the vertical axis, with cells containing, resp., the categories of lenses, of learners (Fong et al.), and of ala-lenses]

Category {1} is a terminal category whose only arrow is the identity lens 1 = (id_1, id_1): 1 → 1, propagating from a terminal category 1 to itself. The label ∗ refers to the codiscrete specialization of the construct being labelled: L^∗ means codiscrete learning (i.e., the parameter space P is a set considered as a codiscrete category), and aLens^∗ refers to codiscrete model spaces. The category of learners defined in [17] is located at point (1,1), and the category of learning delta lenses with amendments defined in the present paper is located at (2,2). There are also two semi-categorified species of learning lenses: categorical learners at point (1,2) and codiscretely learning delta lenses at (2,1), which are special cases of ala-lenses.

References

1. Abiteboul, S., McHugh, J., Rys, M., Vassalos, V., Wiener, J.: Incremental Maintenance for Materialized Views over Semistructured Data. In: Gupta, A., Shmueli, O., Widom, J. (eds.) VLDB. Morgan Kaufmann (1998)
2. Anjorin, A.: An introduction to triple graph grammars as an implementation of the delta-lens framework. In: Gibbons, J., Stevens, P. (eds.) Bidirectional Transformations - International Summer School, Oxford, UK, July 25-29, 2016, Tutorial Lectures. Lecture Notes in Computer Science, vol. 9715, pp. 29–72. Springer (2016)
3. Anjorin, A., Diskin, Z., Jouault, F., Ko, H., Leblebici, E., Westfechtel, B.: Benchmarx reloaded: A practical benchmark framework for bidirectional transformations. In: Eramo and Johnson [15], pp. 15–30
4.
Anjorin, A., Leblebici, E., Schürr, A.: 20 years of triple graph grammars: A roadmap for future research. ECEASST 73 (2015)
5. Anjorin, A., Rose, S., Deckwerth, F., Schürr, A.: Efficient model synchronization with view triple graph grammars. In: Modelling Foundations and Applications - 10th European Conference, ECMFA 2014, York, UK, July 21-25, 2014, Proceedings. Lecture Notes in Computer Science, vol. 8569, pp. 1–17. Springer (2014)
6. Clarke, B.: Internal lenses as functors and cofunctors. In: Pre-proceedings of ACT'19, Oxford (2019)
7. Czarnecki, K., Foster, J.N., Hu, Z., Lämmel, R., Schürr, A., Terwilliger, J.F.: Bidirectional transformations: A cross-discipline perspective. In: Theory and Practice of Model Transformations, pp. 260–283. Springer (2009)
8. Diskin, Z.: Compositionality of update propagation: Lax putput. In: Eramo and Johnson [15], pp. 74–89
9. Diskin, Z.: General supervised learning as change propagation with delta lenses. CoRR abs/1911.12904 (2019)
10. Diskin, Z., Gholizadeh, H., Wider, A., Czarnecki, K.: A three-dimensional taxonomy for bidirectional model synchronization. Journal of Systems and Software 111, 298–322 (2016)
11. Diskin, Z., König, H., Lawford, M.: Multiple model synchronization with multiary delta lenses with amendment and K-Putput. Formal Asp. Comput. 31(5), 611–640 (2019)
12. Diskin, Z., Xiong, Y., Czarnecki, K.: From state- to delta-based bidirectional model transformations: the asymmetric case. Journal of Object Technology 10, 6:1–25 (2011)
13. Diskin, Z., Xiong, Y., Czarnecki, K., Ehrig, H., Hermann, F., Orejas, F.: From state- to delta-based bidirectional model transformations: the symmetric case. In: MODELS, pp. 304–318. Springer (2011)
14. El-Sayed, M., Rundensteiner, E.A., Mani, M.: Incremental Maintenance of Materialized XQuery Views. In: Liu, L., Reuter, A., Whang, K.Y., Zhang, J. (eds.) ICDE, p. 129.
IEEE Computer Society (2006)
15. Eramo, R., Johnson, M. (eds.): Proceedings of the 6th International Workshop on Bidirectional Transformations co-located with The European Joint Conferences on Theory and Practice of Software, Bx@ETAPS 2017, Uppsala, Sweden, April 29, 2017. CEUR Workshop Proceedings, vol. 1827 (2017)
16. Fong, B., Johnson, M.: Lenses and learners. In: Cheney, J., Ko, H. (eds.) Proceedings of the 8th International Workshop on Bidirectional Transformations co-located with the Philadelphia Logic Week, Bx@PLW 2019, Philadelphia, PA, USA, June 4, 2019. CEUR Workshop Proceedings, vol. 2355, pp. 16–29 (2019)
17. Fong, B., Spivak, D.I., Tuyéras, R.: Backprop as functor: A compositional perspective on supervised learning. In: The 34th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2019, Vancouver, BC, Canada, June 24-27, 2019. pp. 1–13. IEEE (2019)
18. Hedges, J.: From open learners to open games. CoRR abs/1902.08666 (2019)
19. Hermann, F., Ehrig, H., Orejas, F., Czarnecki, K., Diskin, Z., Xiong, Y., Gottmann, S., Engel, T.: Model synchronization based on triple graph grammars: correctness, completeness and invertibility. Software and System Modeling 14(1), 241–269 (2015)
20. Johnson, M., Rosebrugh, R.D.: Unifying set-based, delta-based and edit-based lenses. In: The 5th International Workshop on Bidirectional Transformations, Bx 2016. pp. 1–13 (2016)
21. Kappel, G., Langer, P., Retschitzegger, W., Schwinger, W., Wimmer, M.: Model transformation by-example: A survey of the first wave. In: Conceptual Modelling and Its Theoretical Foundations - Essays Dedicated to Bernhard Thalheim on the Occasion of His 60th Birthday. pp. 197–215 (2012)
22. Sasano, I., Hu, Z., Hidaka, S., Inaba, K., Kato, H., Nakano, K.: Toward bidirectionalization of ATL with GRoundTram. In: Theory and Practice of Model Transformations - 4th International Conference, ICMT 2011, Zurich, Switzerland, June 27-28, 2011.
Proceedings. Lecture Notes in Computer Science, vol. 6707, pp. 138–151. Springer (2011)
23. Weidmann, N., Anjorin, A., Fritsche, L., Varró, G., Schürr, A., Leblebici, E.: Incremental bidirectional model transformation with eMoflon::IBeX. In: The 8th International Workshop on Bidirectional Transformations co-located with the Philadelphia Logic Week, Bx@PLW 2019, Philadelphia, PA, USA, June 4, 2019. CEUR Workshop Proceedings, vol. 2355, pp. 45–55 (2019)

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Non-idempotent intersection types in logical form

Thomas Ehrhard

Université de Paris, IRIF, CNRS, F-75013 Paris, France

Abstract. Intersection types are an essential tool in the analysis of operational and denotational properties of lambda-terms and functional programs. Among them, non-idempotent intersection types provide precise quantitative information about the evaluation of terms and programs.
However, unlike simple or second-order types, intersection types cannot be considered as a logical system because the application rule (or the intersection rule, depending on the presentation of the system) involves a condition stipulating that the proofs of the premises must have the same structure. Using earlier work introducing an indexed version of Linear Logic, we show that non-idempotent typing can be given a logical form in a system where formulas represent hereditarily indexed families of intersection types.

Keywords: Lambda Calculus · Denotational Semantics · Intersection Types · Linear Logic

Introduction

Intersection types, introduced in the work of Coppo and Dezani [4,5] and developed since then by many authors, are still a very active research topic. As quite clearly explained in [13], the Coppo and Dezani intersection type system DΩ can be understood as a syntactic presentation of the denotational interpretation of λ-terms in Engeler's model, which is a model of the pure λ-calculus in the cartesian closed category of prime-algebraic complete lattices and Scott continuous functions. Intersection types can be considered as formulas of the propositional calculus with implication ⇒ and conjunction ∧ as connectives. However, as pointed out by Hindley [12], the deduction rules for intersection types depart drastically from the standard logical rules of intuitionistic logic (and of any standard logical system) by the fact that, in the ∧-introduction rule, it is assumed that the proofs of the two premises are typings of the same λ-term, which means that, in some sense made precise by the typing system itself, they have the same structure.

(Partially supported by the project ANR-19-CE48-0014 PPS.)

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 198–216, 2020.

Such requirements on the proofs of premises, and not only on the formulas proven in premises,
are absent from standard (intuitionistic or classical) logical systems, where the proofs of premises are completely independent from each other. Many authors have addressed this issue; we refer to [14] for a discussion of several solutions, which mainly focus on the design of à la Church presentations of intersection typing systems, thus enriching λ-terms with additional structures. Among the most recent and convincing contributions to this line of research we should certainly mention [15]. In our "new" approach to this problem (not so new actually, since it dates back to [3]), we change formulas instead of changing terms. It is based on a specific model of Linear Logic (and thus of the λ-calculus): the relational model. It is fair to credit Girard for the introduction of this model since it appears at least implicitly in [11]. It was probably known by many people in the Linear Logic community as a piece of folklore since the early 1990's, and is presented formally in [3]. In this quite simple and canonical denotational model, types are interpreted as sets (without any additional structure) and a closed term of type σ is interpreted as a subset of the interpretation of σ. It is quite easy to define, in this semantic framework, analogues of the usual models of the pure λ-calculus such as Scott's D_∞ or Engeler's model, which in some sense are simpler than the original ones since the sets interpreting types need not be pre-ordered. As explained in the work of De Carvalho [6,7], the intersection type counterpart of this semantics is a typing system where "intersection" is non-idempotent (in sharp contrast with the original systems introduced by Coppo and Dezani), sometimes called system R. Notice that the precise connection between the idempotent and non-idempotent approaches is analyzed in [8], in a quite general Linear Logic setting, by means of an extensional collapse.
In order to explain our approach, we restrict first to simple types, interpreted as follows in the relational model: a basic type α is interpreted as a given set ⟦α⟧, and the type σ ⇒ τ is interpreted as the set M_fin(⟦σ⟧) × ⟦τ⟧ (where M_fin(E) is the set of finite multisets of elements of E). Remember indeed that intersection types can be considered as a syntactic presentation of denotational semantics, so it makes sense to define intersection types relative to simple types (in the spirit of [10]) as we do in Section 3: an intersection type relative to the base type α is an element of ⟦α⟧, and an intersection type relative to σ ⇒ τ is a pair ([a_1, …, a_n], b) where the a_i's are intersection types relative to σ and b is an intersection type relative to τ. With more usual notations (which we prefer not to use, to avoid confusion between these two levels of typing), ([a_1, …, a_n], b) would be written (a_1 ∧ ⋯ ∧ a_n) → b. We use [⋯] for denoting multisets much as one uses {⋯} for denoting sets; the only difference is that multiplicities are taken into account.

Then, given a type σ, the main idea consists in representing an indexed family of elements of ⟦σ⟧ as a formula of a new logical system. If σ = (ϕ ⇒ ψ) then the family can be written ([a_k | k ∈ K and u(k) = j], b_j)_{j∈J}, where J and K are indexing sets, u : K → J is a function such that u⁻¹({j}) is finite for all j ∈ J, (b_j)_{j∈J} is a family of elements of ⟦ψ⟧ (represented by a formula B) and (a_k)_{k∈K} is a family of elements of ⟦ϕ⟧ (represented by a formula A): in that case we introduce the implicative formula (A ⇒ B) to represent the family ([a_k | k ∈ K and u(k) = j], b_j)_{j∈J}. It is clear that a family of simple types has generally infinitely many representations as such formulas; this huge redundancy makes it possible to establish a tight link between inhabitation of intersection types and provability of the formulas representing them (in an indexed version LJ(I) of intuitionistic logic).
Such a correspondence is exhibited in Section 3 in the simply typed setting, and the idea is quite simple: given a type σ, a family (a_j)_{j∈J} of elements of ⟦σ⟧, and a closed λ-term M of type σ, it is equivalent to say that ⊢ M : a_j holds for all j and to say that some (and actually any) formula A representing (a_j)_{j∈J} has an LJ(I) proof whose underlying λ-term is M.

In Section 4 we extend this approach to the untyped λ-calculus, taking as underlying model of the pure λ-calculus our relational version R_∞ of Scott's D_∞. We define an adapted version of LJ(I) and establish a similar correspondence, with some slight modifications due to the specificities of R_∞.

1 Notations and preliminary definitions

If E is a set, a finite multiset of elements of E is a function m : E → N such that the set {a ∈ E | m(a) ≠ 0} (called the domain of m) is finite. The cardinality of such a multiset m is #m = Σ_{a∈E} m(a). We use + for the obvious addition operation on multisets, and if a_1, …, a_n are elements of E, we use [a_1, …, a_n] for the corresponding multiset (taking multiplicities into account); for instance [0, 1, 0, 2, 1] is the multiset m of elements of N such that m(0) = 2, m(1) = 2, m(2) = 1 and m(i) = 0 for i > 2. If (a_i)_{i∈I} is a family of elements of E and if J is a finite subset of I, we use [a_i | i ∈ J] for the multiset of elements of E which maps a ∈ E to the number of elements i ∈ J such that a_i = a (which is finite since J is). We use M_fin(E) for the set of finite multisets of elements of E. We use + to denote set union when we want to stress the fact that the involved sets are disjoint. A function u : J → K is almost injective if #u⁻¹({k}) is finite for each k ∈ K (equivalently, the inverse image of any finite subset of K under u is finite). If s = (a_1, …, a_n) is a sequence of elements of E and i ∈ {1, …, n}, we use (s) \ i for the sequence (a_1, …, a_{i−1}, a_{i+1}, …, a_n). Given sets E and F, we use F^E for the set of functions from E to F.
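The multiset conventions above are easy to make concrete. The following sketch is our own illustration (not code from the paper), using Python's `collections.Counter` as the finite-multiset type:

```python
from collections import Counter

def mset(*elems):
    """[a_1, ..., a_n]: the finite multiset of the given elements."""
    return Counter(elems)

# the example from the text: [0, 1, 0, 2, 1]
m = mset(0, 1, 0, 2, 1)
assert m[0] == 2 and m[1] == 2 and m[2] == 1 and m[3] == 0

# the cardinality #m is the sum of the multiplicities
assert sum(m.values()) == 5

# + is multiset addition (multiplicities add up)
assert mset(0, 1) + mset(0) == mset(0, 0, 1)

# [ a_i | i in J ] for a family (a_i) and a finite index set J
family = {1: 'a', 2: 'b', 3: 'a'}
assert Counter(family[i] for i in {1, 3}) == mset('a', 'a')

# fibre sizes #u^{-1}({k}) of a function u, as in "almost injective"
u = {1: 'x', 2: 'x', 3: 'y'}
assert Counter(u.values()) == Counter({'x': 2, 'y': 1})
```

`Counter` matches the definition m : E → N with finite domain, and its `+` is exactly the multiset addition used throughout the paper.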
The elements of F^E are sometimes considered as functions u (with a functional notation u(e) for application) and sometimes as indexed families a (with index notation a_e for application), especially when E is countable. If i ∈ {1, …, n} and j ∈ {1, …, n − 1}, we define s(j, i) ∈ {1, …, n} as follows: s(j, i) = j if j < i, and s(j, i) = j + 1 if j ≥ i.

(Footnote: any such proof can be stripped of its indexing data, giving rise to a proof of σ in intuitionistic logic.)

2 The relational model of the λ-calculus

Let Rel_! be the category whose objects are sets and Rel_!(X, Y) = P(M_fin(X) × Y), with Id_X = {([a], a) | a ∈ X} and composition of s ∈ Rel_!(X, Y) and t ∈ Rel_!(Y, Z) given by

t ∘ s = {(m_1 + ⋯ + m_k, c) | ∃ b_1, …, b_k ∈ Y such that ([b_1, …, b_k], c) ∈ t and ∀j (m_j, b_j) ∈ s}.

It is easily checked that this composition law is associative and that Id is neutral for composition. This category has all countable products: let (X_j)_{j∈J} be a countable family of sets; their product is X = &_{j∈J} X_j = ⋃_{j∈J} {j} × X_j, with projections (pr_j)_{j∈J} given by pr_j = {([(j, a)], a) | a ∈ X_j} ∈ Rel_!(X, X_j), and if (s_j)_{j∈J} is a family of morphisms s_j ∈ Rel_!(Y, X_j) then their tupling is ⟨s_j⟩_{j∈J} = {(m, (j, b)) | j ∈ J and (m, b) ∈ s_j} ∈ Rel_!(Y, X).

The category Rel_! is cartesian closed, with object of morphisms from X to Y the set (X ⇒ Y) = M_fin(X) × Y, and evaluation morphism Ev ∈ Rel_!((X ⇒ Y) & X, Y) given by Ev = {([(1, ([a_1, …, a_k], b)), (2, a_1), …, (2, a_k)], b) | a_1, …, a_k ∈ X and b ∈ Y}. The transpose (or curryfication) of s ∈ Rel_!(Z & X, Y) is Cur(s) ∈ Rel_!(Z, X ⇒ Y), given by Cur(s) = {([c_1, …, c_n], ([a_1, …, a_k], b)) | ([(1, c_1), …, (1, c_n), (2, a_1), …, (2, a_k)], b) ∈ s}.

Relational D_∞. Let R_∞ be the least set such that (m_0, m_1, …) ∈ R_∞ as soon as m_0, m_1, … are finite multisets of elements of R_∞ which are almost all equal to []. Notice in particular that e = ([], [], …
) ∈ R_∞ and satisfies e = ([], e). By construction we have R_∞ = M_fin(R_∞) × R_∞, that is R_∞ = (R_∞ ⇒ R_∞), and hence R_∞ is a model of the pure λ-calculus in Rel_! which also satisfies the η-rule. See [1] for general facts on this kind of model.

3 The simply typed case

We assume to be given a set of type atoms α, β, … and of variables x, y, …; types and terms are given as usual by σ, τ, … := α | σ ⇒ τ and M, N, … := x | (M) N | λx N.

With any type atom α we associate a set ⟦α⟧. This interpretation is extended to all types by ⟦σ ⇒ τ⟧ = ⟦σ⟧ ⇒ ⟦τ⟧ = M_fin(⟦σ⟧) × ⟦τ⟧. The relational semantics of this λ-calculus can be described as a non-idempotent intersection type system, with judgments of shape x_1 : m_1 : σ_1, …, x_n : m_n : σ_n ⊢ M : a : σ, where the x_i's are pairwise distinct variables, M is a term, a ∈ ⟦σ⟧ and m_i ∈ M_fin(⟦σ_i⟧) for each i. Here are the typing rules:

- axiom: (x_i : m_i : σ_i)_{i=1}^n ⊢ x_i : a : σ_i, provided m_j = [] for j ≠ i and m_i = [a];
- abstraction: from Φ, x : m : σ ⊢ M : b : τ derive Φ ⊢ λx M : (m, b) : σ ⇒ τ;
- application: from Φ ⊢ M : ([a_1, …, a_k], b) : σ ⇒ τ and Φ_l ⊢ N : a_l : σ (for l = 1, …, k) derive Ψ ⊢ (M) N : b : τ, where Φ = (x_i : m_i : σ_i)_{i=1}^n, Φ_l = (x_i : m_i^l : σ_i)_{i=1}^n for l = 1, …, k, and Ψ = (x_i : m_i + Σ_{l=1}^k m_i^l : σ_i)_{i=1}^n.

(Footnotes: we can restrict to countable sets. This results from the fact that Rel_! arises as the Kleisli category of the LL model of sets and relations; see [3] for instance.)

3.1 Why do we need another system?

The trouble with this deduction system is that it cannot be considered as the term-decorated version of an underlying "logical system for intersection types" allowing to prove sequents of shape m_1 : σ_1, …, m_n : σ_n ⊢ a : σ (where the non-idempotent intersection types m_i and a are considered as logical formulas, the ordinary types σ playing the role of "kinds"), because, in the application rule above, it is required that all the proofs of the k right-hand side premises have the same shape, given by the λ-term N.
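The three typing rules above can be checked mechanically on concrete derivations. The sketch below is our own toy encoding (not the paper's): a point of ⟦σ ⇒ τ⟧ is a pair (sorted tuple, point), a context maps variables to multisets, and the σ annotations are left implicit; we verify the axiom and application rules on a small derivation (the abstraction rule would be analogous):

```python
from collections import Counter

def mset(*xs):
    return Counter(xs)

def check_axiom(ctx, x, a):
    """Axiom: ctx |- x : a holds iff ctx(x) = [a] and every other multiset
    in the context is empty."""
    return ctx[x] == mset(a) and all(not m for y, m in ctx.items() if y != x)

def check_app(d_fun, d_args):
    """Application: d_fun is (Phi, ([a_1..a_k], b)) for Phi |- M : ([a_1..a_k], b),
    and d_args is a list of (Phi_l, a_l) for Phi_l |- N : a_l.  Returns the
    conclusion (Psi, b) with Psi = Phi + sum of the Phi_l, or None if the
    argument points do not match the multiset [a_1..a_k]."""
    phi, (args, b) = d_fun
    if Counter(a for _, a in d_args) != Counter(args):
        return None
    psi = {x: Counter(m) for x, m in phi.items()}
    for phi_l, _ in d_args:
        for x, m in phi_l.items():
            psi[x] = psi.get(x, Counter()) + m
    return psi, b

# derive  x : [([a], a)] , y : [a]  |-  (x) y : a
arrow_pt = (('a',), 'a')                 # the point ([a], a) of sigma => sigma
ctx_x = {'x': mset(arrow_pt), 'y': mset()}
ctx_y = {'x': mset(), 'y': mset('a')}
assert check_axiom(ctx_x, 'x', arrow_pt)
assert check_axiom(ctx_y, 'y', 'a')
psi, b = check_app((ctx_x, arrow_pt), [(ctx_y, 'a')])
assert b == 'a' and psi == {'x': mset(arrow_pt), 'y': mset('a')}
```

Note how the conclusion context merges the premise contexts by multiset addition, which is exactly the point made in Section 3.1: the rule relates several derivations of the same argument term N.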
We propose now a "logical system" derived from [3] which, in some sense, solves this issue. The main idea is quite simple and relies on three principles: (1) replace hereditarily multisets with indexed families in intersection types, (2) instead of proving single types, prove indexed families of hereditarily indexed types, and (3) represent syntactically such families (of hereditarily indexed types) as formulas of a new system of indexed logic.

3.2 Minimal LJ(I)

We define now the syntax of indexed formulas. Assume to be given an infinite countable set I of indices. Then we define indexed types A; with each such type we associate an underlying type A̲, a set d(A) and a family ⟨A⟩ ∈ ⟦A̲⟧^{d(A)}. These formulas are given by the following inductive definition:

- if J ⊆ I and f : J → ⟦α⟧ is a function then α[f] is a formula with α[f]̲ = α, d(α[f]) = J and ⟨α[f]⟩ = f;
- and if A and B are formulas and u : d(A) → d(B) is almost injective then A ⇒_u B is a formula with (A ⇒_u B)̲ = A̲ ⇒ B̲, d(A ⇒_u B) = d(B) and, for k ∈ d(B), ⟨A ⇒_u B⟩_k = ([⟨A⟩_j | j ∈ d(A) and u(j) = k], ⟨B⟩_k).

Proposition 1. Let σ be a type, J be a subset of I and f ∈ ⟦σ⟧^J. There is a formula A such that A̲ = σ, d(A) = J and ⟨A⟩ = f (actually, there are infinitely many such A's as soon as σ is not an atom and J ≠ ∅).

Proof. The proof is by induction on σ. If σ is an atom α then we take A = α[f]. Assume that σ = (ρ ⇒ τ), so that f(j) = (m_j, b_j) with m_j ∈ M_fin(⟦ρ⟧) and b_j ∈ ⟦τ⟧. Since each m_j is finite and I is infinite, we can find a family (K_j)_{j∈J} of pairwise disjoint finite subsets of I such that #K_j = #m_j. Let K = ⋃_{j∈J} K_j; there is a function g : K → ⟦ρ⟧ such that m_j = [g(k) | k ∈ K_j] for each j ∈ J (choose first an enumeration g_j : K_j → ⟦ρ⟧ of m_j for each j, and then define g(k) = g_j(k) where j is the unique element of J such that k ∈ K_j). Let u : K → J
Let u : K → J j j be the unique function such that k ∈ K for all k ∈ K; since each K is finite, u(k) j Non-idempotent intersection types in logical form 203 this function u is almost injective. By inductive hypothesis there is a formula A such that A = ρ, d(A)= K and A = g, and there is a formula B such that B = τ , d(B)= J and B =(b ) . Then the formula A ⇒ B is well formed j j∈J u (since u is an almost injective function d(A)= K → d(B)= J ) and satisfies A ⇒ B = σ, d(A ⇒ B)= J and A ⇒ B = f as contended. u u u As a consequence, for any type σ and any element a of σ (so a is a non- idempotent intersection type of kind σ), one can find a formula A such that A = σ, d(A)= {j} (where j is an arbitrary element of I)and A = a.Inother word, any intersection type can be represented as a formula (in infinitely many different ways in general of course, but up to renaming of indices, that is, up to “hereditary α-equivalence”, this representation is unique). For any formula A and J ⊆ I, we define a formula A such that A = A, J J d(A )= d(A) ∩ J and A = A  . The definition is by induction on A. J J J – α[f] = α[f  ] J J −1 – (A ⇒ B) =(A ⇒ B ) where K = u (d(B) ∩ J) and v = u  . u J K v J K Let u : d(A) → J be a bijection (so that u(d(A)) = J ), we define a formula u (A) such that u (A) = A, d(u (A)) = u(d(A)) and u (A) = A −1 .The ∗ ∗ ∗ ∗ j u (j) definition is by induction on A: −1 – u (α[f]) = α[f ◦ u ] – u (A ⇒ B)=(A ⇒ u (B)). ∗ v u◦v ∗ Using these two auxiliary notions, we can give a set of three deduction rules for a minimal natural deduction allowing to prove formulas in this indexed intu- itionistic logic. This logical system allows to derive sequents which are of shape 1 u A ,...,A  B (1) 1 n where for each i =1,...,n, the function u : d(A ) → d(B) is almost injective (it i i n u is not required that d(B)= u (d(A ))). 
Notice that the expressions A_i^{u_i} are not formulas; this construction A^u is part of the syntax of sequents, just as the "," separating these pseudo-formulas. Given a formula A and u : d(A) → J almost injective, it is nevertheless convenient to define A^u ∈ M_fin(⟦A̲⟧)^J by (A^u)_j = [⟨A⟩_k | u(k) = j]. In particular, when u is a bijection, (A^u)_j = [⟨A⟩_{u⁻¹(j)}].

The crucial point here is that such a sequent (1) involves no λ-term. The main difference between the original system LL(I) of [3] and the present system is the way axioms are dealt with. In LL(I) there is no explicit identity axiom, and only "atomic axioms" restricted to the basic constants of LL; indeed, it is well known that in LL all identity axioms can be η-expanded, leading to proofs using only such atomic axioms. In the λ-calculus, and especially in the untyped λ-calculus we want to deal with in the next sections, such η-expansions are hard to handle, so we prefer to use explicit identity axioms. The axiom is

A_1^{u_1}, …, A_n^{u_n} ⊢ u_{i*}(A_i)    (provided d(A_j) = ∅ for j ≠ i, and u_i is a bijection)

so that for j ≠ i, the function u_j is empty. A special case is

A_1^{u_1}, …, A_n^{u_n} ⊢ A_i    (provided d(A_j) = ∅ for j ≠ i, and u_i is the identity function)

which may look more familiar, but the general axiom rule, allowing to "delocalize" the proven formula A_i by an arbitrary bijection u_i, is required, as we shall see.

The ⇒ introduction rule is quite simple: from A_1^{u_1}, …, A_n^{u_n}, A^u ⊢ B derive A_1^{u_1}, …, A_n^{u_n} ⊢ A ⇒_u B.

Last, the ⇒ elimination rule is more complicated (from a Linear Logic point of view, this is due to the fact that it combines 3 LL logical rules: ⊸ elimination, contraction and promotion). We have the deduction: from C_1^{u_1}, …, C_n^{u_n} ⊢ A ⇒_u B and D_1^{v_1}, …, D_n^{v_n} ⊢ A derive E_1^{w_1}, …, E_n^{w_n} ⊢ B, under the following conditions, to be satisfied by the involved formulas and functions: for each i = 1, …, n one has d(C_i) ∩ d(D_i) = ∅, d(E_i) = d(C_i) + d(D_i), C_i = E_i↾d(C_i), D_i = E_i↾d(D_i), w_i↾d(C_i) = u_i, and w_i↾d(D_i) = u ∘ v_i.
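The definition of A^u is a one-liner in the toy encoding we have been assuming (families as dicts, multisets as `Counter`); this is our illustration, not the paper's code:

```python
from collections import Counter

def power(fam_A, u, J):
    """(A^u)_j = [ <A>_k | u(k) = j ]: a J-indexed family of finite multisets."""
    return {j: Counter(fam_A[k] for k in fam_A if u[k] == j) for j in J}

fam_A = {0: 'a', 1: 'a', 2: 'b'}              # <A> on d(A) = {0, 1, 2}
u = {0: 'j', 1: 'j', 2: 'j'}                  # almost injective u : d(A) -> J
assert power(fam_A, u, {'j'}) == {'j': Counter({'a': 2, 'b': 1})}

# when u is a bijection, (A^u)_j is the singleton [ <A>_{u^{-1}(j)} ]
v = {0: 'p', 1: 'q', 2: 'r'}
singletons = power(fam_A, v, {'p', 'q', 'r'})
assert all(sum(m.values()) == 1 for m in singletons.values())
```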
Let π be a deduction tree of the sequent A_1^{u_1}, …, A_n^{u_n} ⊢ B in this system. By dropping all index information we obtain a derivation tree of A̲_1, …, A̲_n ⊢ B̲ and, upon choosing a sequence x⃗ of n pairwise distinct variables, we can associate with this derivation tree a simply typed λ-term π_{x⃗} which satisfies x_1 : A̲_1, …, x_n : A̲_n ⊢ π_{x⃗} : B̲.

3.3 Basic properties of LJ(I)

We prove some basic properties of this logical system. This is also the opportunity to get some acquaintance with it. Notice that in many places we drop the type annotations of variables in λ-terms, first because they are easy to recover, and second because the very same results and proofs are also valid in the untyped setting of Section 4.

Lemma 1 (Weakening). Assume that Φ ⊢ A is provable by a proof π, and let B be a formula such that d(B) = ∅. Then Φ′ ⊢ A is provable by a proof π′, where Φ′ is obtained by inserting B^0 (0 being the empty function ∅ → d(A)) at any place in Φ. Moreover π′_{x⃗′} = π_{x⃗} (where x⃗′ is obtained from x⃗ by inserting a dummy variable at the same place).

The proof is an easy induction on the proof of Φ ⊢ A.

Lemma 2 (Relocation). Let π be a proof of (A_i^{u_i})_{i=1}^n ⊢ A and let u : d(A) → J be a bijection; there is a proof π′ of (A_i^{u∘u_i})_{i=1}^n ⊢ u_*(A) such that π′_{x⃗} = π_{x⃗}.

The proof is a straightforward induction on π.

Lemma 3 (Restriction). Let π be a proof of (A_i^{u_i})_{i=1}^n ⊢ A and let J ⊆ d(A). For i = 1, …, n, let K_i = u_i⁻¹(J) ⊆ d(A_i) and u_i′ = u_i↾K_i : K_i → J. Then the sequent ((A_i↾K_i)^{u_i′})_{i=1}^n ⊢ A↾J has a proof π′ such that π′_{x⃗} = π_{x⃗}.
i K J i i i=1 x x Assume that π ends with a ⇒-introduction rule: u n+1 (A )  B i i=1 i n (A )  A ⇒ B n+1 u n+1 i i=1 → − with A =(A ⇒ B), and we have π = λx ρ .Withthe no- n+1 u n+1 → − n+1 x x,x n+1 tations of the lemma we have A =(A  ⇒ B ). By inductive J n+1 K u J n+1 n+1 n+1 hypothesis there is a proof ρ of (A  )  B such that ρ = ρ i J → − → − K i=1 x,x x,x n+1 n+1 → − and hence we have a proof π of (A  )  A with π = λx ρ = i J n+1 → − K i=1 i x,x n+1 π→ − as contended. Assume last that π ends with a ⇒-elimination rule: μ ρ v w i n i n (B )  B ⇒ A (C )  B i=1 i=1 i i i n (A )  A i i=1 with d(A )= d(B )+ d(C ), B = A  and C = A  , u  = v i i i i i i i i i d(B ) d(C ) d(B ) i i i → − and u  = v ◦ w for i =1,...,n, and of course π = μ ρ .Let i i → − → − d(C ) x x x −1 −1 −1 L = v (J) ⊆ d(B).Let L = v (J) and R = w (L) for i =1,...,n (we i i i i also set v = v  , w = w  and v = v ). By inductive hypothesis, we have i L i R L i i i i i n aproof μ of (B  )  B ⇒  A such that μ = μ and a proof ρ i L v J → − → − L i=1 x x i n  −1 of (C  )  B such that ρ = ρ .Now,setting K = u (K), observe i L → − → − i i i=1 i x x that – d(B ) ∩ K = L = d(B  ) and u  = v since u  = v i i i i L i L i d(B ) i i i i i −1 −1 – d(C ) ∩ K = R = d(C ) ∩ w (L) since u  = v ◦ w and L = v (J), i i i i i i d(C ) i hence d(C ) ∩ K = d(C  ), and also u  = v ◦ w . i i i R i L i i i It follows that d(A  )= L + R , and, setting u = u  ,wehave u  = v i K i i i K L i i i i i i and u  = v ◦ w . Hence we have a proof π of (A  )  A such that R i J i i i K i=1 π → − = μ ρ = μ ρ = π→ − as contended. → − → − → − → − x x x x x x Though substitution lemmas are usually trivial, the LJ(I) substitution lemma requires some care in its statement and proof . j n Lemma 4 (Substitution). Assume that (A )  A withaproof μ and j j=1 n−1 that, for some i ∈{1,...,n}, (B )  A with a proof ρ. 
Then there is a proof π of (C_j^{w_j})_{j=1}^{n−1} ⊢ A such that π_{(x⃗)\i} = μ_{x⃗}[ρ_{(x⃗)\i}/x_i], as soon as, for each j = 1, …, n − 1, d(C_j) = d(A_{s(j,i)}) + d(B_j) (remember that this also requires d(A_{s(j,i)}) ∩ d(B_j) = ∅), with:

- C_j↾d(A_{s(j,i)}) = A_{s(j,i)} and w_j↾d(A_{s(j,i)}) = u_{s(j,i)};
- C_j↾d(B_j) = B_j and w_j↾d(B_j) = u_i ∘ v_j.

(We use notations introduced in Section 1, especially for s(j, i).)

Proof. By induction on the proof μ. Assume that μ is an axiom, so that there is a k ∈ {1, …, n} such that A = u_{k*}(A_k), u_k is a bijection and d(A_j) = ∅ for all j ≠ k. In that case we have μ_{x⃗} = x_k. There are two subcases to consider. Assume first that k = i. By Lemma 2 there is a proof ρ′ of (B_j^{u_i∘v_j})_{j=1}^{n−1} ⊢ u_{i*}(A_i) such that ρ′_{(x⃗)\i} = ρ_{(x⃗)\i}. We have C_j = B_j and w_j = u_i ∘ v_j for j = 1, …, n − 1, so that ρ′ is a proof of (C_j^{w_j})_{j=1}^{n−1} ⊢ A; so we take π = ρ′, and the equation π_{(x⃗)\i} = μ_{x⃗}[ρ_{(x⃗)\i}/x_i] holds since μ_{x⃗} = x_i. Assume next that k ≠ i; then d(A_i) = ∅ and hence d(B_j) = ∅ (and v_j = 0_∅) for j = 1, …, n − 1. Therefore C_j = A_{s(j,i)} and w_j = u_{s(j,i)} for j = 1, …, n − 1. So our target sequent (C_j^{w_j})_{j=1}^{n−1} ⊢ A can also be written (A_{s(j,i)}^{u_{s(j,i)}})_{j=1}^{n−1} ⊢ u_{k*}(A_k), and is provable by a proof π such that π_{(x⃗)\i} = x_k, as contended.

Assume now that μ is a ⇒-intro, that is, A = (A_{n+1} ⇒_{u_{n+1}} A′), and μ ends with premise (A_j^{u_j})_{j=1}^{n+1} ⊢ A′, proved by θ, and conclusion (A_j^{u_j})_{j=1}^n ⊢ A. We set B_n = A_{n+1}↾∅, and of course v_n is the empty function ∅ → d(A_i). Then we have a proof ρ′ of (B_j^{v_j})_{j=1}^{n} ⊢ A_i such that ρ′_{(x⃗)\i,x_{n+1}} = ρ_{(x⃗)\i} by Lemma 1. We set C_n = A_{n+1} and w_n = u_{n+1}. Then, by inductive hypothesis applied to θ, we have a proof π⁰ of (C_j^{w_j})_{j=1}^{n} ⊢ A′ which satisfies π⁰_{(x⃗)\i,x_{n+1}} = θ_{x⃗,x_{n+1}}[ρ_{(x⃗)\i}/x_i], and applying a ⇒-introduction rule we get a proof π of (C_j^{w_j})_{j=1}^{n−1} ⊢ A such that π_{(x⃗)\i} = λx_{n+1} (θ_{x⃗,x_{n+1}}[ρ_{(x⃗)\i}/x_i]) = μ_{x⃗}[ρ_{(x⃗)\i}/x_i], as expected.
Assume last that the proof μ ends with premises (E_j^{s_j})_{j=1}^n ⊢ E ⇒_s A, proved by ϕ, and (F_j^{t_j})_{j=1}^n ⊢ E, proved by ψ, and conclusion (A_j^{u_j})_{j=1}^n ⊢ A, with d(A_j) = d(E_j) + d(F_j), A_j↾d(E_j) = E_j, A_j↾d(F_j) = F_j, u_j↾d(E_j) = s_j and u_j↾d(F_j) = s ∘ t_j, for j = 1, …, n. And we have μ_{x⃗} = (ϕ_{x⃗}) ψ_{x⃗}. The idea is to "share" the substituting proof ρ of (B_j^{v_j})_{j=1}^{n−1} ⊢ A_i among ϕ and ψ, according to what they need, as specified by the formulas E_i and F_i. So we write d(B_j) = L_j + R_j, where L_j = v_j⁻¹(d(E_i)) and R_j = v_j⁻¹(d(F_i)), and by Lemma 3 we have two proofs ρ^L of ((B_j↾L_j)^{v_j^L})_{j=1}^{n−1} ⊢ E_i and ρ^R of ((B_j↾R_j)^{v_j^R})_{j=1}^{n−1} ⊢ F_i, where we set v_j^L = v_j↾L_j and v_j^R = v_j↾R_j, obtained from ρ by restriction. These proofs satisfy ρ^L_{(x⃗)\i} = ρ^R_{(x⃗)\i} = ρ_{(x⃗)\i}.
s → − → − → − i → − → − i ( x )\i x ( x )\i x ( x )\i Next we want to apply the inductive hypothesis to ψ and ρ ,inorder to j n−1 get a proof of the sequent (H )  E where, for j =1,...,n − 1, H = j j=1 C  (again d(F ) ⊆ d(A ) and R ⊆ d(B ) are disjoint by our j d(F )+R s(j,i) s(j,i) j j s(j,i) j assumption that d(C )= d(A )+ d(B ))and r is defined by r  = j s(j,i) j j j d(F ) s(j,i) R R t and r  = t ◦ v . Remember indeed that v : R → d(F ) and t : j R i j i i s(j,i) j j j d(F ) → d(E).Wehave H  = C   = A  = F j j d(F ) d(A ) d(F ) s(j,i) d(F ) s(j,i) s(j,i) s(j,i) s(j,i) s(j,i) H  = C   = B j R j d(B ) R j R j j j j j n−1 and hence by inductive hypothesis there is a proof ψ of (H )  E such that j j=1 ψ = ψ ρ /x = ψ ρ /x . → − → − → − i → − → − i ( x )\i x ( x )\i x ( x )\i To end the proof of the lemma, it will be sufficient to prove that we can apply j j n−1 n−1 a ⇒-elimination rule to the sequents (G )  E ⇒ A and (H )  E j j=1 j j=1 j n−1 in ordertoget aproof π of the sequent (C )  A. Indeed, the proof π j j=1 obtained in that way will satisfy π → − = ϕ ψ = μ ρ /x . → − → − → − → − i ( x )\i ( x )\i ( x )\i x ( x )\i Let j ∈{1,...,n−1}.Wehave C  = G and C  = H simply because j d(G ) j j d(H ) j j j G and H are defined by restricting C .Moreover d(G )= d(E )+ L and j j j j s(j,i) j d(H )= d(F )+ R . Therefore d(G ) ∩ d(H )= ∅ and j s(j,i) j j j d(C )= d(A )+ d(B )= d(E )+ d(F )+ L + R = d(G )+ d(H ) . j s(j,i) j s(j,i) s(j,i) j j j j L L We have w  = w by definition of w as w  .Wehave j d(G ) j d(E )+L j j j s(j,i) j w   = w   = u j d(H ) d(F ) j d(A ) d(F ) s(j,i) d(F ) j s(j,i) s(j,i) s(j,i) s(j,i) = s ◦ t =(s ◦ r ) s(j,i) d(F ) s(j,i) w   = w   =(u ◦ v ) j d(H ) R j d(B ) R i j R j j j j j R R = u  ◦ v = s ◦ t ◦ v = s ◦ r  =(s ◦ r ) i d(F ) i j R j R i j j j j 208 T. Ehrhard and therefore w  = s ◦ r as required. j j d(H ) We shall often use the two following consequences of the Substitution Lemma. j n v Lemma 5. 
Given a proof μ of (A )  A and a proof ρ of B  A (for j j=1 u u j i−1 j ui◦v n some i ∈{1,...,n}), there is a proof π of (A ) ,B , (A )  A such j j=1 j j=i+1 → − that π = μ ρ /x → − i x x u u j d(A) j i n Proof. By weakening we have a proof μ of (A ) ,B , (A )  A j j=1 j j=i+1 → − such that μ = μ (where x is a list of pairwise distinct variables of → − → − x ( x )\i+1 0 0 d(A ) d(A ) i i v i n length n+1), as well as a proof ρ of (A  ) ,B , (A  )  A such j j i j=1 j=i+1 ∅ ∅ u u j i−1 j u ◦v n that ρ = ρ .ByLemma 4,wehaveaproof π of (A ) ,B , (A ) → − j j=1 j j=i+1 x x i+1 → − A which satisfies π = μ ρ /x = μ ρ /x . → − → − i → − i ( x )\i x ( x )\i x x v n Lemma 6. Given a proof μ of A  B and a proof ρ of (A )  A,thereis j j=1 v◦u → − aproof π of (A )  B such that π = μ ρ /x . → − j=1 x x x The proof is similar to the previous one. If A and B are formulas such that A = B, d(A)= d(B) and A = B ,we say that A and B are similar and we write A ∼ B. One fundamental property of our deduction system is that two formulas which represent the same family of intersection types are logically equivalent. Id Theorem 1. If A ∼ B then A  B with a proof π such that π ∼ x. Id Proof. Assume that A = α[f],thenwehave B = A and A  B is an axiom. Assume that A =(C ⇒ D) and B =(E ⇒ F ).Wehave D ∼ F and u v Id hence D  F with a proof ρ such that ρ ∼ x. And there is a bijection w : d(E) → d(C) such that w (E) ∼ C and u ◦ w = v. By inductive hypothesis Id we have a proof μ of w (E)  C such that μ ∼ y, and hence using the axiom ∗ η w  w E  w (E) and Lemma 5 we have a proof μ of E  C such that μ = μ . x x 1 Id u 1 There is a proof π of (C ⇒ D) ,C  D such that π =(x) y (consider x,y 0 0 d(D) d(C) Id Id the two axioms (C ⇒ D) ,C  C ⇒ D and (C ⇒ D) ,C  C u u u ∅ ∅ and use a ⇒-elimination rule). So by Lemma 5 there is a proof π of (C ⇒ Id u◦w Id v 2 D) ,E  D,thatisof (C ⇒ D) ,E  D, such that π =(x) μ . x,y 3 Id v 3 Applying Lemma 6 we get a proof π of (C ⇒ D) ,E  F such that π = x,y ρ (x) μ /z . 
We get the expected proof $\pi$ by a $\Rightarrow$-introduction rule, so that $\pi_x = \lambda y\,\rho_z[(x)\,\mu_y/z]$. By inductive hypothesis $\pi_x \sim_\eta x$.

3.4 Relation between intersection types and LJ(I)

Now we explain the precise connection between non-idempotent intersection types and our logical system LJ(I). This connection consists of two statements:
– the first one means that any proof of LJ(I) can be seen as a typing derivation in non-idempotent intersection types (soundness),
– and the second one means that any non-idempotent intersection typing can be seen as a derivation in LJ(I) (completeness).

Theorem 2 (Soundness). Let $\pi$ be a deduction tree of the sequent $(A_i^{u_i})_{i=1}^n \vdash B$ and $\vec x$ a sequence of $n$ pairwise distinct variables. Then the $\lambda$-term $\pi_{\vec x}$ satisfies $(x_i : (A_i^{u_i})_j : \underline{A_i})_{i=1}^n \vdash \pi_{\vec x} : B_j : \underline B$ in the intersection type system, for each $j \in d(B)$.

Proof. We prove the first part by induction on $\pi$ (in the course of this induction, we recall the precise definition of $\pi_{\vec x}$). If $\pi$ is the proof
$$\frac{q \neq i \Rightarrow d(A_q) = \emptyset \text{ and } u_i \text{ is a bijection}}{(A_q^{u_q})_{q=1}^n \vdash u_{i*}(A_i)}$$
(so that $B = u_{i*}(A_i)$) then $\pi_{\vec x} = x_i$. We have $(A_q^{u_q})_j = [\,]$ if $q \neq i$, $(A_i^{u_i})_j = [A_{i\,u_i^{-1}(j)}]$ and $u_{i*}(A_i)_j = A_{i\,u_i^{-1}(j)}$. It follows that $(x_q : (A_q^{u_q})_j : \underline{A_q})_{q=1}^n \vdash x_i : B_j : \underline B$ is a valid axiom in the intersection type system.

Assume that $\pi$ is the proof
$$\frac{A_1^{u_1},\dots,A_n^{u_n}, A^u \vdash B}{A_1^{u_1},\dots,A_n^{u_n} \vdash A \Rightarrow_u B}$$
where $\pi^0$ is the proof of the premise of the last rule of $\pi$. By inductive hypothesis the $\lambda$-term $\pi^0_{\vec x,x}$ satisfies $(x_i : (A_i^{u_i})_j : \underline{A_i})_{i=1}^n, x : (A^u)_j : \underline A \vdash \pi^0_{\vec x,x} : B_j : \underline B$, from which we deduce $(x_i : (A_i^{u_i})_j : \underline{A_i})_{i=1}^n \vdash \lambda x^{\underline A}\,\pi^0_{\vec x,x} : ((A^u)_j, B_j) : \underline A \Rightarrow \underline B$, which is the required judgment since $\pi_{\vec x} = \lambda x^{\underline A}\,\pi^0_{\vec x,x}$ and $((A^u)_j, B_j) = (A \Rightarrow_u B)_j$ as easily checked.
Assume last that $\pi$ ends with
$$\frac{\pi^1 : C_1^{u_1},\dots,C_n^{u_n} \vdash A \Rightarrow_u B \qquad \pi^2 : D_1^{v_1},\dots,D_n^{v_n} \vdash A}{E_1^{w_1},\dots,E_n^{w_n} \vdash B}$$
with: for each $i = 1,\dots,n$ there are two disjoint sets $L_i$ and $R_i$ such that $d(E_i) = L_i + R_i$, $C_i = E_i{\restriction}_{L_i}$, $D_i = E_i{\restriction}_{R_i}$, $w_i{\restriction}_{L_i} = u_i$, and $w_i{\restriction}_{R_i} = u \circ v_i$.

Let $j \in d(B)$. By inductive hypothesis, the judgment $(x_i : (C_i^{u_i})_j : \underline{C_i})_{i=1}^n \vdash \pi^1_{\vec x} : (A\Rightarrow_u B)_j : \underline A \Rightarrow \underline B$ is derivable in the intersection type system. Let $K_j = u^{-1}(\{j\})$, which is a finite subset of $d(A)$. By inductive hypothesis again, for each $k \in K_j$ we have $(x_i : (D_i^{v_i})_k : \underline{D_i})_{i=1}^n \vdash \pi^2_{\vec x} : A_k : \underline A$. Now observe that $(A\Rightarrow_u B)_j = ([A_k \mid k \in K_j], B_j)$ so that
$$(x_i : (C_i^{u_i})_j + \textstyle\sum_{k\in K_j} (D_i^{v_i})_k : \underline{E_i})_{i=1}^n \vdash (\pi^1_{\vec x})\,\pi^2_{\vec x} : B_j : \underline B$$
is derivable in intersection types (remember that $\underline{C_i} = \underline{D_i} = \underline{E_i}$). Since $\pi_{\vec x} = (\pi^1_{\vec x})\,\pi^2_{\vec x}$, it will be sufficient to prove that
$$(E_i^{w_i})_j = (C_i^{u_i})_j + \sum_{k\in K_j} (D_i^{v_i})_k\,. \quad (2)$$
For this, since $(E_i^{w_i})_j = [E_{i\,l} \mid w_i(l) = j]$, consider an element $l$ of $d(E_i)$ such that $w_i(l) = j$. There are two possibilities: (1) either $l \in L_i$, and in that case we know that $E_{i\,l} = C_{i\,l}$ since $E_i{\restriction}_{L_i} = C_i$, and moreover we have $u_i(l) = w_i(l) = j$; (2) or $l \in R_i$. In that case we have $E_{i\,l} = D_{i\,l}$ since $E_i{\restriction}_{R_i} = D_i$. Moreover $u(v_i(l)) = w_i(l) = j$ and hence $v_i(l) \in K_j$. Therefore
$$[E_{i\,l} \mid l \in L_i \text{ and } w_i(l) = j] = [C_{i\,l} \mid u_i(l) = j] = (C_i^{u_i})_j$$
$$[E_{i\,l} \mid l \in R_i \text{ and } w_i(l) = j] = [D_{i\,l} \mid v_i(l) \in K_j] = \sum_{k\in K_j} (D_i^{v_i})_k$$
and (2) follows.

Theorem 3 (Completeness). Let $J \subseteq I$. Let $M$ be a $\lambda$-term and $x_1,\dots,x_n$ be pairwise distinct variables, such that $(x_i : m_i^j : \sigma_i)_{i=1}^n \vdash M : b_j : \tau$ in the intersection type system for all $j \in J$. Let $A_1,\dots,A_n$ and $B$ be formulas and let $u_1,\dots,u_n$ be almost injective functions such that $u_i : d(A_i) \to J = d(B)$. Assume also that $\underline{A_i} = \sigma_i$ for each $i = 1,\dots,n$ and that $\underline B = \tau$. Last assume that, for all $j \in J$, one has $B_j = b_j$ and $(A_i^{u_i})_j = m_i^j$ for $i = 1,\dots,n$.
Then the judgment $(A_i^{u_i})_{i=1}^n \vdash B$ has a proof $\pi$ such that $\pi_{\vec x} \sim_\eta M$.

Proof. By induction on $M$. Assume first that $M = x_i$ for some $i \in \{1,\dots,n\}$. Then we must have $\tau = \sigma_i$, $m_q^j = [\,]$ for $q \neq i$ and $m_i^j = [b_j]$ for all $j \in J$. Therefore $d(A_q) = \emptyset$ and $u_q$ is the empty function for $q \neq i$, $u_i$ is a bijection $d(A_i) \to J$ and $\forall k \in d(A_i)\ A_{i\,k} = b_{u_i(k)}$, in other words $u_{i*}(A_i) \sim B$. By Theorem 1 we know that the judgment $u_{i*}(A_i)^{\mathrm{Id}} \vdash B$ is provable in LJ(I) with a proof $\rho$ such that $\rho_x \sim_\eta x$. We have a proof $\theta$ of $(A_q^{u_q})_{q=1}^n \vdash u_{i*}(A_i)$ which consists of an axiom, so that $\theta_{\vec x} = x_i$, and hence by Lemma 6 we have a proof $\pi$ of $(A_i^{u_i})_{i=1}^n \vdash B$ such that $\pi_{\vec x} = \rho_x[\theta_{\vec x}/x] \sim_\eta x_i$.

Assume that $M = \lambda x\,N$, that $\tau = (\sigma \Rightarrow \varphi)$ and that we have a family of deductions (for $j \in J$) of $(x_i : m_i^j : \sigma_i)_{i=1}^n \vdash M : (m^j, c_j) : \sigma \Rightarrow \varphi$ with $b_j = (m^j, c_j)$, and the premise of this conclusion in each of these deductions is $(x_i : m_i^j : \sigma_i)_{i=1}^n, x : m^j : \sigma \vdash N : c_j : \varphi$. We must have $B = (C \Rightarrow_u D)$ with $\underline D = \varphi$, $\underline C = \sigma$, $d(D) = J$, $u : d(C) \to d(D)$ almost injective, $D_j = c_j$ and $[C_k \mid k \in d(C) \text{ and } u(k) = j] = m^j$, that is $(C^u)_j = m^j$, for each $j \in J$. By inductive hypothesis we have a proof $\rho$ of $(A_i^{u_i})_{i=1}^n, C^u \vdash D$ such that $\rho_{\vec x,x} \sim_\eta N$, from which we obtain a proof $\pi$ of $(A_i^{u_i})_{i=1}^n \vdash C \Rightarrow_u D$ such that $\pi_{\vec x} = \lambda x\,\rho_{\vec x,x} \sim_\eta M$ as expected.

Assume last that $M = (N)\,P$ and that we have a $J$-indexed family of deductions of $(x_i : m_i^j : \sigma_i)_{i=1}^n \vdash M : b_j : \tau$. Let $A_1,\dots,A_n$, $u_1,\dots,u_n$ and $B$ be LJ(I) formulas and almost injective functions as in the statement of the theorem. Let $j \in J$. There is a finite set $L_j \subseteq I$ and multisets $m_i^{j,0}$, $(m_i^{j,l})_{l \in L_j}$ such that we have deductions of $(x_i : m_i^{j,0} : \sigma_i)_{i=1}^n \vdash N : ([a_l \mid l \in L_j], b_j) : \sigma \Rightarrow \tau$ and, for each $l \in L_j$, of $(x_i : m_i^{j,l} : \sigma_i)_{i=1}^n \vdash P : a_l : \sigma$, with
(3) i i i l∈L We assume the finite sets L to be pairwise disjoint (this is possible because I is infinite) and we use L for their union. Let u : L → J be the function which maps l ∈ L to the unique j such that l ∈ L , this function is almost injective. u(l) Let A be an LL(J) formula such that A = σ, d(A)= L and A = a ; such a formula exists by Proposition 1. Let i ∈{1,...,n}.Foreach j ∈ J we know that j j,0 j,l [ A | r ∈ d(A ) and u (r)= j ]= m = m + m i r i i i i i l∈L j,0 −1 and hence we can split the set d(A ) ∩ u ({j}) into disjoint subsets R and i i j,l (R ) in such a way that l∈L j,0 j,0 j,l j,l [ A | r ∈ R ]= m and ∀l ∈ L [ A | r ∈ R ]= m . i r j i r i i i i j,0 j,0 We set R = R ; observe that this is a disjoint union because R ⊆ j∈J i i u(l),l −1 1 u ({j}). Similarly we define R = R which is a disjoint union for i i l∈L j,l j,l the following reason: if l, l ∈ L satisfy u(l)= u(l )= j then R and R i i have been chosen disjoint and if u(l)= j and u(l )= j with j = j we have j,l j ,l −1 −1  1 R ⊆ u {j} and R ⊆ u ({j }).Let v : R → L be defined by: v (r) is i i i i i i u(l),l j,l the unique l ∈ L such that r ∈ R . Since each R is finite the function v is i i almost injective. Moreover u ◦ v = u  1 . i i 0  0 We use u for the restriction of u to R so that u : R → J.Byinduc- i i i i u n tive hypothesis we have ((A  0) )  A ⇒ B with a proof μ such that i u R i=1 j,0 μ ∼ N . Indeed [ A  0 | r ∈ R and u (r)= j ]= m and A ⇒ B = → − η i R r u j i i i x i v n ([ a | u(l)= j ],b ) for each j ∈ J . For the same reason we have ((A  1) ) j i R l i=1 A with a proof ρ such that ρ ∼ P . Indeed for each l ∈ L = d(A) we have → − Notice that our λ-calculus is in Church style and hence the type σ is uniquely determined by the sub-term N of M . 212 T. Ehrhard j,l j [ A  1 | v (r)= l]= m and A = a where j = u(l). By an application i r i l i l i n → − rule we get a proof π of (A )  B such that π = μ ρ ∼ (N) P = M → − → − η i=1 x x x as contended. 
4 The untyped Scott case

Since intersection types usually apply to the pure λ-calculus, we move now to this setting by choosing in $\mathbf{Rel}_!$ the set $R_\infty$ as model of the pure λ-calculus. The intersection typing system has the elements of $R_\infty$ as types, and the typing rules involve sequents of shape $(x_i : m_i)_{i=1}^n \vdash M : a$ where $m_i \in \mathcal M_{\mathrm{fin}}(R_\infty)$ and $a \in R_\infty$. We use $\Lambda$ for the set of terms of the pure λ-calculus, and $\Lambda_\Omega$ for the pure λ-calculus extended with a constant $\Omega$ subject to the two following $\omega$-reduction rules: $\lambda x\,\Omega \to_\omega \Omega$ and $(\Omega)\,M \to_\omega \Omega$. We use $\sim_{\eta\omega}$ for the least congruence on $\Lambda_\Omega$ which contains $\to_\eta$ and $\to_\omega$, and similarly for $\sim_{\beta\eta\omega}$. We define a family $(\mathcal H(x))_{x\in\mathcal V}$ of subsets of $\Lambda_\Omega$, minimal such that, for any sequences $\vec x = (x_1,\dots,x_n)$ and $\vec y = (y_1,\dots,y_k)$ such that $\vec x, \vec y$ is repetition-free, and for any terms $M_i \in \mathcal H(x_i)$ (for $i = 1,\dots,n$), one has $\lambda\vec x\,\lambda\vec y\,(x)\,M_1\cdots M_n\,O_1\cdots O_l \in \mathcal H(x)$ where $O_j \sim_\omega \Omega$ for $j = 1,\dots,l$. Notice that $x \in \mathcal H(x)$.

The typing rules of $R_\infty$ are
$$\frac{}{x_1 : [\,],\dots,x_i : [a],\dots,x_n : [\,] \vdash x_i : a}\qquad \frac{\Phi, x : m \vdash M : a}{\Phi \vdash \lambda x\,M : (m, a)}$$
$$\frac{\Phi_0 \vdash M : ([a_1,\dots,a_k], b)\quad (\Phi_j \vdash N : a_j)_{j=1}^k}{\Phi_0 + \sum_{j=1}^k \Phi_j \vdash (M)\,N : b}$$
where we use the following convention: when we write $\Phi + \Psi$ it is assumed that $\Phi$ is of shape $(x_i : m_i)_{i=1}^n$ and $\Psi$ is of shape $(x_i : p_i)_{i=1}^n$, and then $\Phi + \Psi$ is $(x_i : m_i + p_i)_{i=1}^n$. This typing system is just a "proof-theoretic" rephrasing of the denotational semantics of the terms of $\Lambda_\Omega$ in $R_\infty$.

Proposition 2. Let $M, M' \in \Lambda_\Omega$ and $\vec x = (x_1,\dots,x_n)$ be a list of pairwise distinct variables containing all the free variables of $M$ and $M'$. Let $m_i \in \mathcal M_{\mathrm{fin}}(R_\infty)$ for $i = 1,\dots,n$ and $b \in R_\infty$. If $M \sim_{\beta\eta\omega} M'$ then $(x_i : m_i)_{i=1}^n \vdash M : b$ iff $(x_i : m_i)_{i=1}^n \vdash M' : b$.

4.1 Formulas

We define the associated formulas as follows, each formula $A$ being given together with $d(A) \subseteq I$ and $(A_j)_{j\in d(A)} \in R_\infty^{\,d(A)}$.
– If $J \subseteq I$ then $\varepsilon_J$ is a formula with $d(\varepsilon_J) = J$ and $(\varepsilon_J)_j = e$ for $j \in J$;
– and if $A$ and $B$ are formulas and $u : d(A) \to d(B)$ is almost injective, then $A \Rightarrow_u B$ is a formula with $d(A\Rightarrow_u B) = d(B)$ and $(A\Rightarrow_u B)_j = ([A_k \mid u(k) = j], B_j) \in R_\infty$.

We can consider that there is a type $o$ of pure λ-terms interpreted as $R_\infty$ in $\mathbf{Rel}_!$, such that $(o \Rightarrow o) = o$, and then for any formula $A$ we have $\underline A = o$. Operations of restriction and relocation of formulas are the same as in Section 3 (setting $\varepsilon_J{\restriction}_K = \varepsilon_{J\cap K}$) and satisfy the same properties, for instance $(A{\restriction}_K)_k = A_k$, and one sets $u_*(\varepsilon_J) = \varepsilon_K$ if $u : J \to K$ is a bijection.

The deduction rules are exactly the same as those of Section 3, plus the axiom $\vdash \varepsilon_\emptyset$. With any deduction $\pi$ of $(A_i^{u_i})_{i=1}^n \vdash B$ and sequence $\vec x = (x_1,\dots,x_n)$ of pairwise distinct variables, we can associate a pure term $\pi_{\vec x} \in \Lambda_\Omega$ defined exactly as in Section 3 (just drop the types associated with variables in abstractions). If $\pi$ consists of an instance of the additional axiom, we set $\pi_{\vec x} = \Omega$.

Lemma 7. Let $A, A_1,\dots,A_n$ be formulas such that $d(A) = d(A_i) = \emptyset$. Then $(A_i^{\emptyset})_{i=1}^n \vdash A$ is provable by a proof $\pi$ which satisfies $\pi_{x_1,\dots,x_n} \sim_\omega \Omega$.

The proof is a straightforward induction on $A$, using the additional axiom, Lemma 1 and the observation that if $d(B \Rightarrow_u C) = \emptyset$ then $u$ is the empty function.

One can easily define a size function $\mathrm{sz} : R_\infty \to \mathbb N$ such that $\mathrm{sz}(e) = 0$ and $\mathrm{sz}(([a_1,\dots,a_k], a)) = \mathrm{sz}(a) + \sum_{i=1}^k (1 + \mathrm{sz}(a_i))$. First we have to prove an adapted version of Proposition 1; here it will be restricted to finite sets.

Proposition 3. Let $J$ be a finite subset of $I$ and $f \in R_\infty^{\,J}$. There is a formula $A$ such that $d(A) = J$ and $A_j = f(j)$ for all $j \in J$.

Proof. Observe that, since $J$ is finite, there is an $N \in \mathbb N$ such that $\forall j \in J\ \forall q \in \mathbb N\ q \geq N \Rightarrow f(j)_q = [\,]$ (remember that $f(j)_q \in \mathcal M_{\mathrm{fin}}(R_\infty)$). Let $N(f)$ be the least such $N$. We set $\mathrm{sz}(f) = \sum_{j\in J} \mathrm{sz}(f(j))$ and the proof is by induction on $(\mathrm{sz}(f), N(f))$ lexicographically.
If $\mathrm{sz}(f) = 0$ this means that $f(j) = e$ for all $j \in J$ and hence we can take $A = \varepsilon_J$. Assume that $\mathrm{sz}(f) > 0$; one can write $f(j) = (m_j, a_j)$ with $m_j \in \mathcal M_{\mathrm{fin}}(R_\infty)$ and $a_j \in R_\infty$ for each $j \in J$. (This is also possible if $\mathrm{sz}(f) = 0$ actually.) Just as in the proof of Proposition 1 we choose a set $K$, a function $g : K \to R_\infty$ and an almost injective function $u : K \to J$ such that $m_j = [g(k) \mid u(k) = j]$. The set $K$ is finite since $J$ is, and we have $\mathrm{sz}(g) < \mathrm{sz}(f)$ because $\mathrm{sz}(f) > 0$. Therefore by inductive hypothesis there is a formula $B$ such that $d(B) = K$ and $\langle B\rangle = g$. Let $f' : J \to R_\infty$ be defined by $f'(j) = a_j$; we have $\mathrm{sz}(f') \leq \mathrm{sz}(f)$ and $N(f') < N(f)$, and hence by inductive hypothesis there is a formula $C$ such that $\langle C\rangle = f'$. We set $A = (B \Rightarrow_u C)$, which satisfies $\langle A\rangle = f$ as required.

Theorem 1 still holds up to some mild adaptation. First notice that $A \sim B$ now simply means that $d(A) = d(B)$ and $\langle A\rangle = \langle B\rangle$.

Theorem 4. If $A$ and $B$ are such that $A \sim B$ then $A^{\mathrm{Id}} \vdash B$ with a proof $\pi$ which satisfies $\pi_x \in \mathcal H(x)$.

Proof. By induction on the sum of the sizes of $A$ and $B$. Assume that $A = \varepsilon_J$, so that $d(B) = J$ and $\forall j \in J\ B_j = e$. There are two cases as to $B$. In the first case $B$ is of shape $\varepsilon_K$, but then we must have $K = J$ and we can take for $\pi$ an axiom, so that $\pi_x = x \in \mathcal H(x)$. Otherwise we have $B = (C \Rightarrow_u D)$ with $d(D) = J$, $\forall j \in J\ D_j = e$ and $d(C) = \emptyset$, so that $u$ is the empty function. We have $A \sim D$ and hence by inductive hypothesis we have a proof $\rho$ of $A^{\mathrm{Id}} \vdash D$ such that $\rho_x \in \mathcal H(x)$. By weakening and $\Rightarrow$-introduction we get a proof $\pi$ of $A^{\mathrm{Id}} \vdash B$ which satisfies $\pi_x = \lambda y\,\rho_x \in \mathcal H(x)$.

Assume that $A = (C \Rightarrow_u D)$. If $B = \varepsilon_J$ then we must have $d(C) = \emptyset$, $u$ the empty function and $D \sim B$, and hence by inductive hypothesis we have a proof $\rho$ of $D^{\mathrm{Id}} \vdash B$ such that $\rho_y \in \mathcal H(y)$. By Lemma 7 there is a proof $\theta$ of $\vdash C$ such that $\theta \sim_\omega \Omega$. Hence there is a proof $\pi$ of $A^{\mathrm{Id}} \vdash B$ such that $\pi_x = \rho_y[(x)\,\theta/y] \in \mathcal H(x)$.
Assume last that $B = (E \Rightarrow_v F)$; then we must have $D \sim F$, and there must be a bijection $w : d(E) \to d(C)$ such that $u \circ w = v$ and $w_*(E) \sim C$. We reason as in the proof of Theorem 1: by inductive hypothesis we have a proof $\rho$ of $D^{\mathrm{Id}} \vdash F$ and a proof $\mu$ of $w_*(E)^{\mathrm{Id}} \vdash C$, from which we build a proof $\pi$ of $A^{\mathrm{Id}} \vdash B$ such that $\pi_x = \lambda y\,\rho_z[(x)\,\mu_y/z] \in \mathcal H(x)$ by inductive hypothesis.

Theorem 5 (Soundness). Let $\pi$ be a deduction tree of $A_1^{u_1},\dots,A_n^{u_n} \vdash B$ and $\vec x$ a sequence of $n$ pairwise distinct variables. Then the λ-term $\pi_{\vec x} \in \Lambda_\Omega$ satisfies $(x_i : (A_i^{u_i})_j)_{i=1}^n \vdash \pi_{\vec x} : B_j$ in the $R_\infty$ intersection type system, for each $j \in d(B)$.

The proof is exactly the same as that of Theorem 2, dropping all simple types.

For all λ-terms $M \in \Lambda$, we define $\mathcal H_\Omega(M)$ as the least subset of $\Lambda_\Omega$ such that:
– if $O \in \Lambda_\Omega$ and $O \sim_\omega \Omega$ then $O \in \mathcal H_\Omega(M)$ for all $M \in \Lambda$;
– if $M = x$ then $\mathcal H(x) \subseteq \mathcal H_\Omega(M)$;
– if $M = \lambda y\,N$ and $N' \in \mathcal H_\Omega(N)$ then $\lambda y\,N' \in \mathcal H_\Omega(M)$;
– if $M = (N)\,P$, $N' \in \mathcal H_\Omega(N)$ and $P' \in \mathcal H_\Omega(P)$ then $(N')\,P' \in \mathcal H_\Omega(M)$.
The elements of $\mathcal H_\Omega(M)$ can probably be seen as approximates of $M$.

Theorem 6 (Completeness). Let $J \subseteq I$ be finite. Let $M \in \Lambda_\Omega$ and $x_1,\dots,x_n$ be pairwise distinct variables, such that $(x_i : m_i^j)_{i=1}^n \vdash M : b_j$ in the $R_\infty$ intersection type system for all $j \in J$. Let $A_1,\dots,A_n$ and $B$ be formulas and let $u_1,\dots,u_n$ be almost injective functions such that $u_i : d(A_i) \to J = d(B)$. Assume also that, for all $j \in J$, one has $B_j = b_j$ and $(A_i^{u_i})_j = m_i^j$ for $i = 1,\dots,n$. Then the judgment $A_1^{u_1},\dots,A_n^{u_n} \vdash B$ has a proof $\pi$ such that $\pi_{\vec x} \in \mathcal H_\Omega(M)$.

The proof is very similar to that of Theorem 3.
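The size function and the recursion of Proposition 3 can be sketched concretely. In the encoding below (ours, not the paper's), an element of $R_\infty$ is either the atom `"e"` or a pair `(m, a)` with `m` a tuple of elements, and a family $f$ is a dictionary from indices to elements; fresh indices of the form `(j, idx)` play the role of the finite set $K$:

```python
def sz(a):
    """sz(e) = 0 and sz(([a1,...,ak], a)) = sz(a) + sum over i of (1 + sz(ai))."""
    if a == "e":
        return 0
    m, body = a
    return sz(body) + sum(1 + sz(ai) for ai in m)

def formula_of(f):
    """Build a formula A with d(A) = J and A_j = f[j], following the recursion
    of the proof of Proposition 3. Encoding of formulas (ours):
    ("eps", J) for epsilon_J, ("imp", B, u, C) for B =>_u C with u a dict K -> J."""
    J = set(f)
    if all(a == "e" for a in f.values()):
        return ("eps", frozenset(J))
    # write each f[j] as (m_j, a_j); e itself is ((), e)
    split = {j: (f[j] if f[j] != "e" else ((), "e")) for j in J}
    g, u = {}, {}
    for j in J:
        m_j, _ = split[j]
        for idx, a in enumerate(m_j):
            g[(j, idx)] = a   # K, g and the almost injective u : K -> J
            u[(j, idx)] = j
    B = formula_of(g) if g else ("eps", frozenset())
    C = formula_of({j: split[j][1] for j in J})
    return ("imp", B, u, C)
```

The termination argument of the proof (lexicographic induction on $(\mathrm{sz}(f), N(f))$) is mirrored here by the fact that the nesting depth of the encoded elements strictly decreases.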
5 Concluding remarks and acknowledgments

The results presented in this paper show that, at least in non-idempotent intersection types, the problem of knowing whether all elements of a given family of intersection types $(a_j)_{j\in J}$ are inhabited by a common λ-term can be reformulated logically: is it true that one (or equivalently, any) of the indexed formulas $A$ such that $d(A) = J$ and $\forall j \in J\ A_j = a_j$ is provable in LJ(I)? Such a strong connection between intersection types and Indexed Linear Logic was already mentioned in the introduction of [2], but we never made it more explicit until now.

To conclude, we propose a typed λ-calculus à la Church to denote proofs of the LJ(I) system of Section 4. The syntax of pre-terms is given by $s, t \dots ::= x[J] \mid \lambda x{:}A^u\,s \mid (s)\,t$ where, in $x[J]$, $x$ is a variable and $J \subseteq I$, and, in $\lambda x{:}A^u\,s$, $u$ is an almost injective function from $d(A)$ to a set $J \subseteq I$. Given a pre-term $s$ and a variable $x$, the domain of $x$ in $s$ is the subset $\mathrm{dom}(x, s)$ of $I$ given by $\mathrm{dom}(x, x[J]) = J$, $\mathrm{dom}(x, y[J]) = \emptyset$ if $y \neq x$, $\mathrm{dom}(x, \lambda y{:}A^u\,s) = \mathrm{dom}(x, s)$ (assuming of course $y \neq x$) and $\mathrm{dom}(x, (s)\,t) = \mathrm{dom}(x, s) \cup \mathrm{dom}(x, t)$. Then a pre-term $s$ is a term if any subterm of $s$ which is of shape $(s_1)\,s_2$ satisfies $\mathrm{dom}(x, s_1) \cap \mathrm{dom}(x, s_2) = \emptyset$ for all variables $x$. A typing judgment is an expression $(x_i : A_i^{u_i})_{i=1}^n \vdash s : B$ where the $x_i$'s are pairwise distinct variables, $s$ is a term and each $u_i$ is an almost injective function $d(A_i) \to d(B)$.
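The domain function and the term condition just defined can be sketched as follows (our illustrative AST encoding, not from the paper: `("var", x, J)` for $x[J]$, `("lam", x, u, s)` for $\lambda x{:}A^u\,s$ with the formula $A$ omitted, `("app", s, t)` for $(s)\,t$):

```python
def dom(x, s):
    """The domain of variable x in pre-term s, as defined above.
    For a binder lambda x ... we return the empty set, relying on the
    assumption (alpha-renaming) that bound and free names never clash."""
    tag = s[0]
    if tag == "var":
        _, y, J = s
        return set(J) if y == x else set()
    if tag == "lam":
        _, y, _, body = s
        return dom(x, body) if y != x else set()
    _, s1, s2 = s
    return dom(x, s1) | dom(x, s2)

def free_vars(s):
    if s[0] == "var":
        return {s[1]}
    if s[0] == "lam":
        return free_vars(s[3]) - {s[1]}
    return free_vars(s[1]) | free_vars(s[2])

def is_term(s):
    """A pre-term is a term when, in every application (s1) s2, the domains
    of each variable in s1 and s2 are disjoint."""
    if s[0] == "var":
        return True
    if s[0] == "lam":
        return is_term(s[3])
    _, s1, s2 = s
    fv = free_vars(s1) | free_vars(s2)
    return (is_term(s1) and is_term(s2)
            and all(not (dom(x, s1) & dom(x, s2)) for x in fv))
```

For instance, $(x[\{1,2\}])\,x[\{3\}]$ is a term with $\mathrm{dom}(x, \cdot) = \{1,2,3\}$, while $(x[\{1\}])\,x[\{1\}]$ is only a pre-term.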
The following typing rules exactly mimic the logical rules of LJ(I):
$$\frac{d(A) = \emptyset}{(x_i : A_i^{\emptyset})_{i=1}^n \vdash \Omega : A}\qquad \frac{q \neq i \Rightarrow d(A_q) = \emptyset \text{ and } u_i \text{ bijection}}{(x_q : A_q^{u_q})_{q=1}^n \vdash x_i[d(A_i)] : u_{i*}(A_i)}$$
$$\frac{(x_i : A_i^{u_i})_{i=1}^n, x : A^u \vdash s : B}{(x_i : A_i^{u_i})_{i=1}^n \vdash \lambda x{:}A^u\,s : A \Rightarrow_u B}\qquad \frac{(x_i : A_i{\restriction}_{\mathrm{dom}(x_i,s)}^{\,v_i})_{i=1}^n \vdash s : A \Rightarrow_u B\quad (x_i : A_i{\restriction}_{\mathrm{dom}(x_i,t)}^{\,w_i})_{i=1}^n \vdash t : A}{(x_i : A_i^{\,v_i + (u\circ w_i)})_{i=1}^n \vdash (s)\,t : B}$$

The properties of this calculus, and more specifically of its β-reduction, and its connections with the resource calculus of [9] will be explored in further work. Another major objective will be to better understand the meaning of LJ(I) formulas, using ideas developed in [3], where a phase semantics is introduced and related to (non-uniform) coherence space semantics. In the present intuitionistic setting, it is tempting to look for Kripke-like interpretations, with the hope of generalizing indexed logic beyond the (perhaps too) specific relational setting we started from.

Last, we would like to thank Luigi Liquori and Claude Stolze for many helpful discussions on intersection types, and the referees for their careful reading and insightful comments and suggestions.

References

1. F. Breuvart, G. Manzonetto, and D. Ruoppolo. Relational graph models at work. Logical Methods in Computer Science, 14(3), 2018.
2. A. Bucciarelli and T. Ehrhard. On phase semantics and denotational semantics in multiplicative-additive linear logic. Annals of Pure and Applied Logic, 102(3):247–282, 2000.
3. A. Bucciarelli and T. Ehrhard. On phase semantics and denotational semantics: the exponentials. Annals of Pure and Applied Logic, 109(3):205–241, 2001.
4. M. Coppo and M. Dezani-Ciancaglini. An extension of the basic functionality theory for the λ-calculus. Notre Dame Journal of Formal Logic, 21(4):685–693, 1980.
5. M. Coppo, M. Dezani-Ciancaglini, and B. Venneri. Functional characters of solvable terms. Mathematical Logic Quarterly, 27(2-6):45–58, 1981.
6. D. de Carvalho.
Execution time of lambda-terms via denotational semantics and intersection types. CoRR, abs/0905.4251, 2009.
7. D. de Carvalho. Execution time of λ-terms via denotational semantics and intersection types. Mathematical Structures in Computer Science, 28(7):1169–1203, 2018.
8. T. Ehrhard. The Scott model of linear logic is the extensional collapse of its relational model. Theoretical Computer Science, 424:20–45, 2012.
9. T. Ehrhard and L. Regnier. Uniformity and the Taylor expansion of ordinary lambda-terms. Theoretical Computer Science, 403(2-3):347–372, 2008.
10. T. S. Freeman and F. Pfenning. Refinement Types for ML. In D. S. Wise, editor, Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation (PLDI), Toronto, Ontario, Canada, June 26-28, 1991, pages 268–277. ACM, 1991.
11. J.-Y. Girard. Normal functors, power series and the λ-calculus. Annals of Pure and Applied Logic, 37:129–177, 1988.
12. J. R. Hindley. Coppo-Dezani types do not correspond to propositional logic. Theoretical Computer Science, 28:235–236, 1984.
13. J.-L. Krivine. Lambda-Calculus, Types and Models. Ellis Horwood Series in Computers and Their Applications. Ellis Horwood, 1993. Translation by René Cori from French 1990 edition (Masson).
14. L. Liquori and S. R. D. Rocca. Intersection-types à la Church. Information and Computation, 205(9):1371–1386, 2007.
15. L. Liquori and C. Stolze. The Delta-calculus: Syntax and Types. In H. Geuvers, editor, 4th International Conference on Formal Structures for Computation and Deduction, FSCD 2019, June 24-30, 2019, Dortmund, Germany, volume 131 of LIPIcs, pages 28:1–28:20.
Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2019.

Open Access. This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

On Computability of Data Word Functions Defined by Transducers

Léo Exibard¹,², Emmanuel Filiot¹, and Pierre-Alain Reynier²
¹ Université Libre de Bruxelles, Brussels, Belgium
² Aix Marseille Univ, Université de Toulon, CNRS, LIS, Marseille, France

Abstract. In this paper, we investigate the problem of synthesizing computable functions of infinite words over an infinite alphabet (data ω-words). The notion of computability is defined through Turing machines with infinite inputs which can produce the corresponding infinite outputs in the limit. We use non-deterministic transducers equipped with registers, an extension of register automata with outputs, to specify functions. Such transducers may not define functions but more generally relations of data ω-words, and we show that it is PSpace-complete to test whether a given transducer defines a function. Then, given a function defined by some register transducer, we show that it is decidable (and again, PSpace-complete) whether such a function is computable.
As for the known finite-alphabet case, we show that computability and continuity coincide for functions defined by register transducers, and show how to decide continuity. We also define a subclass for which those problems are PTime.

Keywords: Data Words · Register Automata · Register Transducers · Functionality · Continuity · Computability.

1 Introduction

Context. Program synthesis aims at deriving, in an automatic way, a program that fulfils a given specification. Such a setting is very appealing when, for instance, the specification describes, in some abstract formalism (an automaton or ideally a logic), important properties that the program must satisfy. The synthesised program is then correct-by-construction with regards to those properties. It is particularly important and desirable for the design of safety-critical systems with hard dependability constraints, which are notoriously hard to design correctly.

Program synthesis is hard to realise for general-purpose programming languages, but important progress has been made recently in the automatic synthesis of reactive systems. In this context, the system continuously receives input signals to which it must react by producing output signals. Such systems are not assumed to terminate and their executions are usually modelled as infinite words over the alphabets of input and output signals. A specification is thus a set of pairs (in, out), where in and out are infinite words, such that out is a legitimate output for in.

A version with full proofs can be found at
Funded by a FRIA fellowship from the F.R.S.-FNRS. Research associate of F.R.S.-FNRS. Supported by the ARC Project Transform Fédération Wallonie-Bruxelles and the FNRS CDR J013116F; MIS F451019F projects. Partly funded by the ANR projects DeLTA (ANR-16-CE40-0007) and Ticktac (ANR-18-CE40-0015).
© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 217–236, 2020.
Most methods for reactive system synthesis only work for synchronous systems over finite sets of input and output signals Σ and Γ. In this synchronous setting, input and output signals alternate, and thus implementations of such a specification are defined by means of synchronous transducers, which are Büchi automata with transitions of the form (q, σ, γ, q′), expressing that in state q, when getting input σ ∈ Σ, output γ ∈ Γ is produced and the machine moves to state q′. We aim at building deterministic implementations, in the sense that the output γ and state q′ uniquely depend on q and σ. The realisability problem of specifications given as synchronous non-deterministic transducers, by implementations defined by synchronous deterministic transducers, is known to be decidable [14,20]. In this paper, we are interested in the asynchronous setting, in which transducers can produce none or several outputs at once every time some input is read, i.e., transitions are of the form (q, σ, w, q′) where w ∈ Γ*. However, such a generalisation makes the realisability problem undecidable [2,9].

Synthesis of Transducers with Registers. In the setting we just described, the set of signals is considered to be finite. This assumption is not realistic in general, as signals may come with unbounded information (e.g. process ids) that we call here data. To address this limitation, recent works have considered the synthesis of reactive systems processing data words [17,6,16,7]. Data words are infinite words over an alphabet Σ × D, where Σ is a finite set and D is a possibly infinite countable set. To handle data words, just as automata have been extended to register automata, transducers have been extended to register transducers. Such transducers are equipped with a finite set of registers in which they can store data and with which they can compare data for equality or inequality.
While the realisability problem of specifications given as synchronous non-deterministic register transducers (NRT_syn) by implementations defined by synchronous deterministic register transducers (DRT_syn) is undecidable, decidability is recovered for specifications defined by universal register transducers and by giving as input the number of registers the implementation must have [7,17].

Computable Implementations. In the previously mentioned works, both for finite and infinite alphabets, implementations are considered to be deterministic transducers. Such an implementation is guaranteed to use only a constant amount of memory (assuming data have size O(1)). While it makes sense with regards to memory-efficiency, some problems turn out to be undecidable, as already mentioned: realisability of NRT_syn specifications by DRT_syn, or, in the finite-alphabet setting, when both the specification and implementation are asynchronous. In this paper, we propose to study computable implementations, in the sense of (partial) functions f of data ω-words computable by some Turing machine M that has an infinite input x ∈ dom(f), and produces longer and longer prefixes of the output f(x) as it reads longer and longer prefixes of the input x. Therefore, such a machine produces the output f(x) in the limit. We denote by TM the class of Turing machines computing functions in this sense. As an example, consider the function f that takes as input any data ω-word u = (σ₁,d₁)(σ₂,d₂)... and outputs (σ₁,d₁) if d₁ occurs at least twice in u, and otherwise outputs u. This function is not computable, as a hypothetical machine could not output anything as long as d₁ is not met a second time. However, the following function g is computable. It is defined only on words (σ₁,d₁)(σ₂,d₂)...
such that σ₁σ₂··· ∈ ((a + b)c*)^ω, and transforms any (σᵢ,dᵢ) into (σᵢ,d₁) if the next symbol in {a, b} is an a, and otherwise keeps (σᵢ,dᵢ) unchanged. To compute it, a TM would need to store d₁, and then wait until the next symbol in {a, b} is met before outputting something. Since the finite input labels are necessarily in ((a + b)c*)^ω, this machine will produce the whole output in the limit. Note that g cannot be defined by any deterministic register transducer, as it needs unbounded memory to be implemented. However, already in the finite-alphabet setting, the problem of deciding if a specification given as some non-deterministic synchronous transducer is realisable by some computable function is open. The particular case of realisability by computable functions of universal domain (the set of all ω-words) is known to be decidable [12]. In the asynchronous setting, the undecidability proof of [2] can be easily adapted to show the undecidability of realisability of specifications given by non-deterministic (asynchronous) transducers by computable functions.

Functional Specifications. As said before, a specification is in general a relation from inputs to outputs. If this relation is a function, we call it functional. Due to the negative results just mentioned about the synthesis of computable functions from non-functional specifications, we instead focus here on the case of functional specifications and address the following general question: given the specification of a function of data ω-words, is this function "implementable", where we define "implementable" as "being computable by some Turing machine"? Moreover, if it is implementable, then we want a procedure to automatically generate an algorithm that computes it. This raises another important question: how to decide whether a specification is functional? We investigate these questions for asynchronous register transducers, here called register transducers.
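The streaming computation of g described above can be sketched as follows (our illustration, not from the paper): the machine buffers pairs until the next symbol labelled a or b arrives, then flushes the buffer, substituting d₁ when that symbol is an a. We treat a pair labelled a or b as its own "next" {a, b}-symbol; the paper's phrasing leaves this choice open.

```python
def g_stream(pairs):
    """Lazily transform a stream of (label, data) pairs whose labels belong
    to ((a+b)c*)^omega: each buffered pair gets data d1 (the first data value
    of the stream) iff the next pair labelled a or b carries an 'a'; pairs
    are emitted only once that symbol is known."""
    d1 = None
    buffer = []  # pairs waiting for the next symbol labelled a or b
    for sigma, d in pairs:
        if d1 is None:
            d1 = d
        buffer.append((sigma, d))
        if sigma in ("a", "b"):
            for s, dd in buffer:
                yield (s, d1) if sigma == "a" else (s, dd)
            buffer = []
```

On a finite prefix of the input, the generator emits exactly the prefix of g's output that is already determined, which is the sense in which g is computable in the limit.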
This asynchrony allows for much more expressive power, but is a source of technical challenge.

Contributions. In this paper, we solve the questions mentioned before for the class of (asynchronous) non-deterministic register transducers (NRT). We also give fundamental results on this class. In particular, we prove that:
1. deciding whether an NRT defines a function is PSpace-complete,
2. deciding whether two functions defined by NRT are equal on the intersection of their domains is PSpace-complete,
3. the class of functions defined by NRT is effectively closed under composition,
4. computability and continuity are equivalent notions for functions defined by NRT, where continuity is defined using the classical Cantor distance,
5. deciding whether a function given as an NRT is computable is PSpace-complete,
6. those problems are in PTime for a subclass of NRT, called test-free NRT.

Finally, we also mention that considering the class of deterministic register transducers (DRT for short) instead of computable functions as a yardstick for the notion of being "implementable" for a function would yield undecidability. Indeed, given a function defined by some NRT, it is in general undecidable to check whether this function is realisable by some DRT, by a simple reduction from the universality problem of non-deterministic register automata [19].

Related Work. The notion of continuity with regards to the Cantor distance is not new, and for rational functions over finite alphabets it was already known to be decidable [21]. Its connection with computability for functions of ω-words over a finite alphabet has recently been investigated in [3] for one-way and two-way transducers. Our results lift some of theirs to the setting of data words. The model of test-free NRT can be seen as a one-way non-deterministic version of a model of two-way transducers considered in [5].

2 Data Words and Register Transducers

For a (possibly infinite) set S, we denote by S* (resp.
S^ω) the set of finite (resp. infinite) words over this alphabet, and we let S^∞ = S* ∪ S^ω. For a word u = u₁...uₙ, we denote |u| = n its length, and, by convention, for u ∈ S^ω, |u| = ∞. The empty word is denoted ε. For 1 ≤ i ≤ j ≤ |u|, we let u[i:j] = uᵢuᵢ₊₁...uⱼ and u[i] = u[i:i] the i-th letter of u. For u, v ∈ S^∞, we say that u is a prefix of v, written u ⊑ v, if there exists w ∈ S^∞ such that v = uw. In this case, we define u⁻¹v = w. For u, v ∈ S^∞, we say that u and v mismatch, written mismatch(u, v), when there exists a position i such that 1 ≤ i ≤ |u|, 1 ≤ i ≤ |v| and u[i] ≠ v[i]. Finally, for u, v ∈ S^∞, we denote by u ∧ v their longest common prefix, i.e. the longest word w ∈ S^∞ such that w ⊑ u and w ⊑ v.

Data Words. In this paper, Σ and Γ are two finite alphabets and D is a countably infinite set of data. We use the letter σ (resp. γ, d) to denote elements of Σ (resp. Γ, D). We also distinguish an arbitrary data value d₀ ∈ D. Given a set R, let τ₀^R be the constant function defined by τ₀^R(r) = d₀ for all r ∈ R. Given a finite alphabet A, a labelled data is a pair x = (a, d) ∈ A × D, where a is the label and d the data. We define the projections lab(x) = a and dt(x) = d. A data word over A and D is an infinite sequence of labelled data, i.e. a word w ∈ (A × D)^ω. We extend the projections lab and dt to data words naturally, i.e. lab(w) ∈ A^ω and dt(w) ∈ D^ω. A data word language is a subset L ⊆ (A × D)^ω. Note that here, data words are infinite; otherwise they are called finite data words.

2.1 Register Transducers

Register transducers are transducers recognising data word relations. They are an extension of finite transducers to data word relations, in the same way register automata [15] are an extension of finite automata to data word languages. Here, we define them over infinite data words with a Büchi acceptance condition, and allow multiple registers to contain the same data, with a syntax close to [18].
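The word operations defined above (prefix, mismatch, longest common prefix) can be sketched for finite words; this is our illustration, not part of the paper:

```python
def is_prefix(u, v):
    """u is a prefix of v (finite words or string prefixes of omega-words)."""
    return v[:len(u)] == u

def mismatch(u, v):
    """True iff some common position carries different letters in u and v."""
    return any(a != b for a, b in zip(u, v))

def lcp(u, v):
    """Longest common prefix u /\\ v of two finite words."""
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return u[:i]
```

Note that two words of different lengths do not mismatch as long as one is a prefix of the other, matching the definition: a mismatch requires a position within both words carrying distinct letters.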
The current data can be compared for equality with the register contents via tests, which are symbolic and defined via Boolean formulas of the following form. Given a set of registers R, a test is a formula φ satisfying the following syntax:

φ ::= ⊤ | ⊥ | r^= | r^≠ | φ ∧ φ | φ ∨ φ | ¬φ

where r ∈ R. Given a valuation τ : R → D, a test φ and a data d, we denote by τ, d ⊨ φ the satisfiability of φ by d in valuation τ, defined as τ, d ⊨ r^= if τ(r) = d and τ, d ⊨ r^≠ if τ(r) ≠ d. The Boolean combinators behave as usual. We denote by Tst_R the set of (symbolic) tests over R.

Definition 1. A non-deterministic register transducer (NRT) is a tuple T = (Q, R, i_0, F, Δ), where Q is a finite set of states, i_0 ∈ Q is the initial state, F ⊆ Q is the set of accepting states, R is a finite set of registers and Δ ⊆ Q × Σ × Tst_R × 2^R × (Γ × R)^* × Q is a finite set of transitions. We write q −(σ, φ | asgn, o)→_T q' for (q, σ, φ, asgn, o, q') ∈ Δ (T is sometimes omitted).

The semantics of a register transducer is given by a labelled transition system: we define L_T = (C, Λ, →), where C = Q × (R → D) is the set of configurations, Λ = (Σ × D) × (Γ × D)^* is the set of labels, and we have, for all (q, τ), (q', τ') ∈ C and for all (l, w) ∈ Λ, that (q, τ) −(l, w)→ (q', τ') whenever there exists a transition q −(σ, φ | asgn, o)→ q' such that, by writing l = (σ', d) and w = (γ_1, d_1) ... (γ_n, d_n):
– (Matching labels) σ = σ'
– (Compatibility) d satisfies the test φ ∈ Tst_R, i.e. τ, d ⊨ φ.
– (Update) τ' is the successor register configuration of τ with regard to d and asgn: τ'(r) = d if r ∈ asgn, and τ'(r) = τ(r) otherwise.
– (Output) By writing o = (γ'_1, r'_1) ... (γ'_m, r'_m), we have that m = n and, for all 1 ≤ i ≤ n, γ_i = γ'_i and d_i = τ'(r'_i).

Then, a run of T is an infinite sequence of configurations and transitions ρ = (q_0, τ_0) −(u_1, v_1)→ (q_1, τ_1) −(u_2, v_2)→ ···. Its input is in(ρ) = u_1 u_2 ..., its output is out(ρ) = v_1 · v_2 .... We also define its sequence of states st(ρ) = q_0 q_1 ...
, and its trace tr(ρ) = u_1 · v_1 · u_2 · v_2 ···. Such a run is initial if (q_0, τ_0) = (i_0, τ_0^R). It is final if it satisfies the Büchi condition, i.e. inf(st(ρ)) ∩ F ≠ ∅, where inf(st(ρ)) = {q ∈ Q | q = q_i for infinitely many i}. Finally, it is accepting if it is both initial and final. We then write (q_0, τ_0) −(u|v)→ to express that there is a final run ρ of T starting from (q_0, τ_0) such that in(ρ) = u and out(ρ) = v. In the whole paper, and unless stated otherwise, we always assume that the output of an accepting run is infinite (v ∈ (Γ × D)^ω), which can be ensured by a Büchi condition.

A partial run is a finite prefix of a run. The notions of input, output and states are extended by taking the corresponding prefixes. We then write (q_0, τ_0) −(u|v)→_T (q_n, τ_n) to express that there is a partial run ρ of T starting from configuration (q_0, τ_0) and ending in configuration (q_n, τ_n) such that in(ρ) = u and out(ρ) = v.

Finally, the relation represented by a transducer T is:

⟦T⟧ = {(u, v) ∈ (Σ × D)^ω × (Γ × D)^ω | there exists an accepting run ρ of T such that in(ρ) = u and out(ρ) = v}

Example 2. As an example, consider the register transducer T_rename depicted in Figure 1. It realises the following transformation: consider a setting in which we deal with logs of communications between a set of clients. Such a log is an infinite sequence of pairs consisting of a tag, chosen in some finite alphabet Σ, and the identifier of the client delivering this tag, chosen in some infinite set of data values. The transformation should modify the log as follows: for a given client that needs to be modified, each of its messages should now be associated with some new identifier. The transformation has to verify that this new identifier is indeed free, i.e. never used in the log.
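The one-step semantics of L_T above can be made concrete with a small sketch. The encoding below is ours, not the paper's: tests are nested tuples, and a transition is a tuple (q, σ, φ, asgn, o, q'):

```python
# Illustrative sketch (our encoding, not the paper's) of symbolic tests and of
# one step of the labelled transition system L_T from Definition 1.

def holds(phi, tau, d):
    """τ, d ⊨ φ for tests φ ::= ⊤ | ⊥ | r= | r≠ | φ∧φ | φ∨φ | ¬φ."""
    kind = phi[0]
    if kind == 'true':  return True
    if kind == 'false': return False
    if kind == 'eq':    return tau[phi[1]] == d   # τ, d ⊨ r=  iff τ(r) = d
    if kind == 'neq':   return tau[phi[1]] != d   # τ, d ⊨ r≠  iff τ(r) ≠ d
    if kind == 'and':   return holds(phi[1], tau, d) and holds(phi[2], tau, d)
    if kind == 'or':    return holds(phi[1], tau, d) or holds(phi[2], tau, d)
    if kind == 'not':   return not holds(phi[1], tau, d)
    raise ValueError(kind)

def step(config, trans, letter):
    """One step of L_T, or None if the transition is not enabled."""
    (q, tau) = config
    (q_src, sigma, phi, asgn, o, q_dst) = trans
    (sigma_in, d) = letter
    if q != q_src or sigma != sigma_in:   # (Matching labels)
        return None
    if not holds(phi, tau, d):            # (Compatibility) τ, d ⊨ φ
        return None
    tau2 = {r: (d if r in asgn else v) for r, v in tau.items()}  # (Update)
    out = tuple((g, tau2[r]) for (g, r) in o)                    # (Output)
    return ((q_dst, tau2), out)

# A transition that accepts a fresh datum, stores it in r and echoes it:
t = (0, 'a', ('neq', 'r'), {'r'}, (('b', 'r'),), 1)
print(step((0, {'r': 0}), t, ('a', 42)))   # ((1, {'r': 42}), (('b', 42),))
```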
Before treating the log, the transformation receives as input the id of the client that needs to be modified (associated with the tag del), and then a sequence of identifiers (associated with the tag ch), ending with #. The transducer is non-deterministic as it has to guess which of these identifiers it can choose to replace the one of the client. In particular, observe that it may associate multiple output words to a same input if two such free identifiers exist.

Fig. 1. A register transducer T_rename. It has three registers r_1, r_2 and r_0 and four states. σ denotes any letter in Σ; r_1 stores the id of del and r_2 the chosen id of ch, while r_0 is used to output the last data value read as input. As we only assign data to single registers, we write r_i for the singleton assignment set {r_i}.

Finite Transducers. Since we reduce the decision of continuity and functionality of NRT to the one of finite transducers, let us introduce them: a finite transducer (NFT for short) is an NRT with 0 registers (i.e. R = ∅). Thus, its transition relation can be represented as Δ ⊆ Q × Σ × Γ^* × Q. A direct extension of the construction of [15, Proposition 1] allows to show that:

Proposition 3. Let T be an NRT with k registers, and let X ⊂ D be a finite subset of data. Then, ⟦T⟧ ∩ ((Σ × X)^ω × (Γ × X)^ω) is recognised by an NFT of exponential size, more precisely with O(|Q| × |X|^|R|) states.

2.2 Technical Properties of Register Automata

Although automata are simpler machines than transducers, we only use them as tools in our proofs, which is why we define them from transducers, and not the other way around. A non-deterministic register automaton, denoted NRA, is a transducer without outputs: its transition relation is Δ ⊆ Q × Σ × Tst_R × 2^R × {ε} × Q (simply represented as Δ ⊆ Q × Σ × Tst_R × 2^R × Q).
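The size bound in Proposition 3 comes from taking the configurations (q, τ), with τ ranging over the finitely many valuations into X, as the states of the finite transducer. A back-of-the-envelope sketch (illustrative numbers, not from the paper):

```python
# Sketch of the state space underlying Proposition 3: over a finite data set X,
# configurations (q, τ) become NFT states, O(|Q| · |X|^|R|) of them.
from itertools import product

Q = ['p', 'q']       # states of the NRT
R = ['r1', 'r2']     # registers
X = [0, 1, 2]        # finite data set
states = [(q, dict(zip(R, vals)))
          for q in Q
          for vals in product(X, repeat=len(R))]
print(len(states))   # |Q| * |X|**|R| = 2 * 3**2 = 18
```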
The semantics are the same, except that we now lift the condition that the output v is infinite, since there is no output: necessarily, the output of an accepting run is ε. For A an NRA, we denote L(A) = {u ∈ (Σ × D)^ω | there exists an accepting run ρ of A over u}.

In this section, we establish technical properties about NRA. Proposition 4, the so-called "indistinguishability property", was shown in the seminal paper by Kaminski and Francez [15, Proposition 1]. Their model differs in that they do not allow distinct registers to contain the same data, and in the corresponding test syntax, but their result easily carries over to our setting. It states that if an NRA accepts a data word, then such a data word can be relabelled with data from any set containing d_0 and with at least k + 1 elements. Indeed, at any point of time, the automaton can only store at most k data in its registers, so its notion of "freshness" is a local one, and forgotten data can thus be reused as fresh ones. Moreover, as the automaton only tests data for equality, their actual value does not matter, except for d_0, which is initially contained in the registers. Such a "small-witness" property is fundamental to NRA, and will be paramount in establishing decidability of functionality (Section 3) and computability (Section 4). We use it jointly with Lemma 5, which states that the interleaving of the traces of runs of an NRT can be recognised by an NRA, and Lemma 6, which expresses that an NRA can check whether interleaved words coincide on some bounded prefix, and/or mismatch before some given position.

Proposition 4 ([15]). Let A be an NRA with k registers. If L(A) ≠ ∅, then, for any X ⊆ D of size |X| ≥ k + 1 such that d_0 ∈ X, L(A) ∩ (Σ × X)^ω ≠ ∅.

The runs of a register transducer T can be flattened to their traces, so as to be recognised by an NRA. Those traces can then be interleaved, in order to be compared. The proofs of the following properties are straightforward.
Let ρ_1 = (q_0, τ_0) −(u_1, u'_1)→ (q_1, τ_1) ... and ρ_2 = (p_0, μ_0) −(v_1, v'_1)→ (p_1, μ_1) ... be two runs of a transducer T. Then, we define their interleaving ρ_1 ⊗ ρ_2 = u_1 · u'_1 · v_1 · v'_1 · u_2 · u'_2 · v_2 · v'_2 ... and L_⊗(T) = {ρ_1 ⊗ ρ_2 | ρ_1 and ρ_2 are accepting runs of T}.

Lemma 5. If T has k registers, then L_⊗(T) is recognised by an NRA with 2k registers.

Lemma 6. Let i, j ∈ N ∪ {∞}. We define M_j^i = {u_1 u'_1 v_1 v'_1 ··· | ∀k ≥ 1, u_k, v_k ∈ (Σ × D), u'_k, v'_k ∈ (Γ × D)^*, ∀1 ≤ k ≤ j, v_k = u_k and |u'_1 · u'_2 ··· ∧ v'_1 · v'_2 ···| ≤ i}. Then, M_j^i is recognisable by an NRA with 2 registers, and with 1 register if i = ∞.

3 Functionality, Equivalence and Composition of NRT

In general, since they are non-deterministic, NRT may not define functions but relations, as illustrated by Example 2. In this section, we first show that deciding whether a given NRT defines a function is PSpace-complete, in which case we call it functional. We show, as a consequence, that testing whether two functional NRT define two functions which coincide on their common domain is PSpace-complete. Finally, we show that functions defined by NRT are closed under composition. This is an appealing property in transducer theory, as it allows to define complex functions by composing simple ones.

Example 7. As explained before, the transducer T_rename described in Example 2 is not functional. To gain functionality, one can reinforce the specification by considering that one gets at the beginning a list of k possible identifiers, and that one has to select the first one which is free, for some fixed k. This transformation is realised by the register transducer T_rename2 depicted in Figure 2 (for k = 2).
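The interleaving operation ⊗ defined above can be sketched in code. This is our own encoding (not the paper's): each run is given as its sequence of labels, i.e. pairs (input letter, output word):

```python
# Sketch (our encoding) of the interleaving ρ1 ⊗ ρ2: list, for each step k,
# the letters u_k, u'_k, v_k, v'_k in turn.
def interleave(rho1, rho2):
    word = []
    for (u, u_out), (v, v_out) in zip(rho1, rho2):
        word += [u, *u_out, v, *v_out]
    return word

# Two runs over the same input, with mismatching output data:
rho1 = [(('a', 1), [('b', 1)]), (('a', 2), [('b', 1)])]
rho2 = [(('a', 1), [('b', 2)]), (('a', 2), [('b', 2)])]
print(interleave(rho1, rho2))
```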
Fig. 2. An NRT T_rename2, with four registers r_1, r_2, r_3 and r_0 (the latter being used, as in Figure 1, to output the last read data). After reading the # symbol, it guesses whether the value of register r_2 appears in the suffix of the input word. If not, it goes to state 5, and replaces occurrences of r_1 by r_2. Otherwise, it moves to state 6, waiting for an occurrence of r_2, and replaces occurrences of r_1 by r_3.

Let us start with the functionality problem in the data-free case. It is already known that checking whether an NFT over ω-words is functional is decidable [13,11]. By relying on the pattern logic of [10], designed for transducers of finite words, it can be shown that it is decidable in NLogSpace.

Proposition 8. Deciding whether an NFT is functional is in NLogSpace.

The following theorem shows that a relation between data words defined by an NRT with k registers is a function iff its restriction to a set of at most 2k + 3 data is a function. As a consequence, functionality is decidable, as it reduces to the functionality problem of transducers over a finite alphabet.

Theorem 9. Let T be an NRT with k registers. Then, for all X ⊆ D of size |X| ≥ 2k + 3 such that d_0 ∈ X, we have that T is functional if and only if ⟦T⟧ ∩ ((Σ × X)^ω × (Γ × X)^ω) is functional.

Proof. The left-to-right direction is trivial. Now, assume T is not functional. Let x ∈ (Σ × D)^ω be such that there exist y, z ∈ (Γ × D)^ω such that y ≠ z and (x, y), (x, z) ∈ ⟦T⟧. Let i = |y ∧ z|. Then, consider the language L = {ρ_1 ⊗ ρ_2 | ρ_1 and ρ_2 are accepting runs of T, in(ρ_1) = in(ρ_2) and |out(ρ_1) ∧ out(ρ_2)| ≤ i}.
Since, by Lemma 5, L_⊗(T) is recognised by an NRA with 2k registers and, by Lemma 6, M_∞^i is recognised by an NRA with 2 registers, we get that L = L_⊗(T) ∩ M_∞^i is recognised by an NRA with 2k + 2 registers. Now, L ≠ ∅ since, by letting ρ_1 and ρ_2 be the runs of T both with input x and with respective outputs y and z, we have that w = ρ_1 ⊗ ρ_2 ∈ L. Let X ⊆ D be such that |X| ≥ 2k + 3 and d_0 ∈ X. By Proposition 4, we get that L ∩ (Σ × X)^ω ≠ ∅. By letting w' = ρ'_1 ⊗ ρ'_2 ∈ L ∩ (Σ × X)^ω, and x' = in(ρ'_1) = in(ρ'_2), y' = out(ρ'_1) and z' = out(ρ'_2), we have that (x', y'), (x', z') ∈ ⟦T⟧ ∩ ((Σ × X)^ω × (Γ × X)^ω) and |y' ∧ z'| ≤ i, so, in particular, y' ≠ z' (since both are infinite words). Thus, ⟦T⟧ ∩ ((Σ × X)^ω × (Γ × X)^ω) is not functional. ∎

As a consequence of Proposition 8 and Theorem 9, we obtain the following result. The lower bound is obtained by encoding non-emptiness of register automata, which is PSpace-complete [4].

Corollary 10. Deciding whether an NRT T is functional is PSpace-complete.

Hence, the following problem on the equivalence of NRT is decidable:

Theorem 11. The problem of deciding, given two functions f, g defined by NRT, whether for all x ∈ dom(f) ∩ dom(g), f(x) = g(x), is PSpace-complete.

Proof. The formula ∀x ∈ dom(f) ∩ dom(g) · f(x) = g(x) is true iff the relation f ∪ g = {(x, y) | y = f(x) ∨ y = g(x)} is a function. The latter can be decided by testing whether the disjoint union of the transducers defining f and g defines a function, which is in PSpace by Corollary 10. To show the hardness, we similarly reduce from the emptiness problem of an NRA A over finite words, just as in the proof of Corollary 10. In particular, the functions f_1 and f_2 defined in this proof (which have the same domain) are equal iff L(A) = ∅. ∎

Note that, under the promise that f and g have the same domain, the latter theorem implies that it is decidable to check whether the two functions are equal.
However, checking dom(f) = dom(g) is undecidable, as the language-equivalence problem for non-deterministic register automata is undecidable, since, in particular, universality is undecidable [19].

Closure under composition is a desirable property for transducers, which holds in the data-free setting [1]. We show that it also holds for functional NRT.

Theorem 12. Let f, g be two functions defined by NRT. Then their composition f ◦ g is (effectively) definable by some NRT.

Proof (Sketch). By f ◦ g we mean f ◦ g : x ↦ f(g(x)). Assume f and g are defined by T_f = (Q_f, R_f, q_0, F_f, Δ_f) and T_g = (Q_g, R_g, p_0, F_g, Δ_g) respectively. Wlog we assume that the input and output finite alphabets of T_f and T_g are all equal to Σ, and that R_f and R_g are disjoint. We construct T such that ⟦T⟧ = f ◦ g. The proof is similar to the data-free case, where the composition is shown via a product construction which simulates both transducers in parallel, executing the second on the output of the first. Assume T_g has some transition p −(σ, φ | {r}, o)→ p' where o ∈ (Σ × R_g)^*. Then T has to be able to execute transitions of T_f while processing o, even though o does not contain any concrete data values (this is the main difference with the data-free setting). However, if T knows the equality types between R_f and R_g, then it is able to trigger the transitions of T_f. For example, assume that o = (a, r_g) and that the content of r_g is equal to the content of r_f, where r_f is a register of T_f. Then, if T_f has some transition of the form q −(a, r_f^= | {r'_f}, o')→ q', T can trigger the transition (p, q) −(σ, φ | {r} ∪ {r'_f := r_g}, o')→ (p', q'), where the operation r'_f := r_g is syntactic sugar on top of NRT that intuitively means "put the content of r_g into r'_f". ∎

Remark 13.
The proof of Theorem 12 does not use the hypothesis that f and g are functions, and actually shows a stronger result, namely that relations defined by NRT are closed under composition.

4 Computability and Continuity

We equip the set of (finite or infinite) data words with the usual distance: for u, v ∈ (Σ × D)^∞, d(u, v) = 0 if u = v and d(u, v) = 2^{−|u ∧ v|} otherwise. A sequence of (finite or infinite) data words (x_n)_{n∈N} converges to some infinite data word x if for all ε > 0, there exists N ≥ 0 such that for all n ≥ N, d(x_n, x) ≤ ε. In order to reason about computability, we assume in the sequel that the infinite set of data values D we are dealing with has an effective representation. For instance, this is the case when D = N.

We now define how a Turing machine can compute a function of data words. We consider deterministic Turing machines with three tapes: a read-only one-way input tape (containing the infinite input data word), a two-way working tape, and a write-only one-way output tape (on which it writes the infinite output data word). Consider some input data word x ∈ (Σ × D)^ω. For any integer k ∈ N, we let M(x, k) denote the output written by M on its output tape after having read the k first cells of the input tape. Observe that, as the output tape is write-only, the sequence of data words (M(x, k))_{k≥0} is non-decreasing.

Definition 14 (Computability). A function f : (Σ × D)^ω → (Γ × D)^ω is computable if there exists a deterministic multi-tape machine M such that for all x ∈ dom(f), the sequence (M(x, k))_{k≥0} converges to f(x).

Definition 15 (Continuity). A function f : (Σ × D)^ω → (Γ × D)^ω is continuous at x ∈ dom(f) if (equivalently):
(a) for all sequences of data words (x_n)_{n∈N} converging towards x, where for all i ∈ N, x_i ∈ dom(f), we have that (f(x_n))_{n∈N} converges to f(x);
(b) ∀i ≥ 0, ∃j ≥ 0, ∀y ∈ dom(f), |x ∧ y| ≥ j ⇒ |f(x) ∧ f(y)| ≥ i.

Then, f is continuous if and only if it is continuous at each x ∈ dom(f).
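The Cantor distance just defined can be sketched on finite truncations (illustrative code of ours, not the paper's):

```python
# Sketch of the Cantor distance of Section 4 on finite truncations:
# d(u, v) = 0 if u = v, and 2^(-|u ∧ v|) otherwise.
def lcp_len(u, v):
    """|u ∧ v|: length of the longest common prefix."""
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return i

def cantor_d(u, v):
    return 0.0 if u == v else 2.0 ** (-lcp_len(u, v))

print(cantor_d((1, 2, 3), (1, 2, 3)))  # 0.0: equal words
print(cantor_d((1, 2, 3), (1, 2, 4)))  # 0.25: they agree on a prefix of length 2
print(cantor_d((5,), (6,)))            # 1.0: they disagree immediately
```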
Finally, a functional NRT T is continuous when ⟦T⟧ is continuous.

Example 16. We give an example of a non-continuous function f. The finite input and output alphabets are unary, and are therefore ignored in the description of f. Such a function associates with every sequence s = d_1 d_2 ··· ∈ D^ω the word f(s) = d_1^ω if d_1 occurs infinitely many times in s, and f(s) = s itself otherwise. The function f is not continuous. Indeed, by taking d ≠ d', the sequence of data words d(d')^n d^ω converges, as n → ∞, to d(d')^ω, while f(d(d')^n d^ω) = d^ω does not converge to f(d(d')^ω) = d(d')^ω. Moreover, f is realisable by some NRT which non-deterministically guesses whether d_1 repeats infinitely many times or not. It needs only one register r in which to store d_1. In the first case, it checks whether the current data is equal to the content of r infinitely often, and in the second case, it checks that this test succeeds finitely many times, using Büchi conditions.

One can show that the register transducer T_rename2 considered in Example 7 also realises a function which is not continuous, as the value stored in register r_2 may appear arbitrarily far in the input word. One could modify the specification to obtain a continuous function as follows. Instead of considering an infinite log, one considers now an infinite sequence of finite logs, separated by $ symbols. The register transducer T_rename3, depicted in Figure 3, defines such a function.

Fig. 3. A register transducer T_rename3. This transducer is non-deterministic, yet it defines a continuous function.

We now prove the equivalence between continuity and computability for functions defined by NRT.
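Before turning to the proof, the discontinuity of Example 16 can be replayed numerically on truncations. This is a sketch of ours; the truncation length N and the concrete data values are arbitrary choices:

```python
# Numeric illustration (sketch) of Example 16: truncations of x_n = d·(d')^n·d^ω
# approach x = d·(d')^ω in the Cantor distance, yet their images f(x_n) = d^ω
# stay at distance 1/2 from f(x) = x.
def lcp_len(u, v):
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return i

def d_cantor(u, v):
    return 0.0 if u == v else 2.0 ** (-lcp_len(u, v))

d, dp, N = 0, 1, 20
x = (d,) + (dp,) * N                         # truncation of d·(d')^ω
for n in (1, 5, 15):
    xn = (d,) + (dp,) * n + (d,) * (N - n)   # truncation of d·(d')^n·d^ω
    fxn = (d,) * N                           # f(x_n) = d^ω (d occurs infinitely often)
    print(n, d_cantor(xn, x), d_cantor(fxn, x))
# inputs get Cantor-close to x as n grows; images stay at distance 0.5
```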
One direction, namely the fact that computability implies continuity, is easy, almost by definition. For the other direction, we rely on the following lemma, which states that it is decidable whether a word v can be safely output, knowing only a prefix u of the input. In particular, given a function f, we let f̂ be the function defined over all finite prefixes u of words in dom(f) by f̂(u) = ⋀(f(uy) | uy ∈ dom(f)), the longest common prefix of all outputs of continuations of u by f. Then, we have the following decidability result:

Lemma 17. The following problem is decidable. Given an NRT T defining a function f, and two finite data words u ∈ (Σ × D)^* and v ∈ (Γ × D)^*, decide whether v ⪯ f̂(u).

Theorem 18. Let f be a function defined by some NRT T. Then f is continuous iff f is computable.

Proof. (⇐) Assuming f = ⟦T⟧ is computable by some Turing machine M, we show that f is continuous. Indeed, consider some x ∈ dom(f), and some i ≥ 0. As the sequence of finite words (M(x, k))_{k∈N} converges to f(x) and these words have non-decreasing lengths, there exists j ≥ 0 such that |M(x, j)| ≥ i. Hence, for any data word y ∈ dom(f) such that |x ∧ y| ≥ j, the behaviour of M on y is the same during the first j steps, as M is deterministic, and thus |f(x) ∧ f(y)| ≥ i, showing that f is continuous at x.

(⇒) Assume that f is continuous. We describe a Turing machine computing f; the corresponding algorithm is formalised as Algorithm 1. When reading a finite prefix x[:j] of its input x ∈ dom(f), it computes the set P of all configurations (q, τ) reached by T on x[:j]. This set is updated along increasing values of j. It also keeps in memory the finite word o that has been output so far.
For any j, if dt(x[:j]) denotes the data that appear in x[:j], the algorithm then decides, for each (σ, d) ∈ Σ × (dt(x[:j]) ∪ {d_0}), whether (σ, d) can safely be output, i.e., whether all accepting runs on words of the form x[:j]y, for y an infinite word, output at least o_j · (σ, d), where o_j is the word output so far. The latter can be decided, given T, o_j and x[:j], by Lemma 17. Note that it suffices to look at data in dt(x[:j]) ∪ {d_0} only since, by definition of NRT, any data that is output is necessarily stored in some register, and therefore appears in x[:j] or is equal to d_0. Let us show that M_f actually computes f.

Algorithm 1: Algorithm describing the machine M_f computing f.
  Data: x ∈ dom(f)
  1 o := ε;
  2 for j = 0 to ∞ do
  3   for (σ, d) ∈ Σ × (dt(x[:j]) ∪ {d_0}) do
  4     if o · (σ, d) ⪯ f̂(x[:j]) then   // such test is decidable by Lemma 17
  5       o := o · (σ, d);
  6       output (σ, d);
  7     end
  8   end
  9 end

Let x ∈ dom(f). We have to show that the sequence (M_f(x, j))_j converges to f(x). Let o_j be the content of variable o of M_f when exiting the inner loop at line 8, once the outer loop (line 2) has been executed j times (hence j input symbols have been read). Note that o_j = M_f(x, j). We have o_1 ⪯ o_2 ⪯ ... and o_j ⪯ f̂(x[:j]) for all j ≥ 0. Hence, o_j ⪯ f(x) for all j ≥ 0. To show that (o_j)_j converges to f(x), it remains to show that (o_j)_j is non-stabilising, i.e. o_{i_1} ≺ o_{i_2} ≺ ... for some infinite subsequence i_1 < i_2 < ....

First, note that f being continuous is equivalent to the sequence (f̂(x[:k]))_k converging to f(x). Therefore, f(x) ∧ f̂(x[:k]) can be arbitrarily long for sufficiently large k. Let j ≥ 0 and (σ, d) = f(x)[|o_j| + 1]. By the latter property and the fact that o_j · (σ, d) ⪯ f(x), necessarily, there exists some k > j such that o_j · (σ, d) ⪯ f̂(x[:k]).
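As an aside, the main loop of Algorithm 1 can be sketched in code. The decision procedure `extends_fhat(T, u, v)` for the test "v ⪯ f̂(u)" of Lemma 17 is passed in as a parameter; the oracle used in the demo below assumes the computed function is the identity (for which f̂(x[:j]) = x[:j]) and is not the paper's actual procedure:

```python
# A sketch of Algorithm 1 (the machine M_f). The oracle `extends_fhat` is a
# stand-in for the Lemma 17 decision procedure, NOT the paper's construction.

def compute(T, x_prefixes, dt, sigmas, d0, extends_fhat):
    """Yield output letters (σ, d) as longer input prefixes become available."""
    o = []                                        # line 1: o := ε
    for u in x_prefixes:                          # line 2: for j = 0 to ∞ (finite demo)
        for sigma in sigmas:                      # line 3: (σ, d) ∈ Σ × (dt(x[:j]) ∪ {d0})
            for d in dt(u) | {d0}:
                if extends_fhat(T, u, o + [(sigma, d)]):   # line 4: o·(σ,d) ⪯ f̂(x[:j])
                    o.append((sigma, d))          # line 5
                    yield (sigma, d)              # line 6: output (σ, d)

# Toy run on the identity function, for which f̂(x[:j]) = x[:j]:
identity_oracle = lambda T, u, v: v == u[:len(v)]
x = [('a', 1), ('a', 2), ('a', 1)]
out = list(compute(None, [x[:j] for j in range(len(x) + 1)],
                   lambda u: {d for (_, d) in u}, ['a'], 0, identity_oracle))
print(out)  # reproduces x letter by letter
```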
Moreover, by definition of NRT, d is necessarily a data that appears in some prefix of x, therefore there exists k' ≥ k such that d appears in x[:k'] and o_j · (σ, d) ⪯ f̂(x[:k]) ⪯ f̂(x[:k']). This entails that o_j · (σ, d) ⪯ o_{k'}. So, we have shown that for all j, there exists k' > j such that o_j ≺ o_{k'}, which concludes the proof. ∎

Now that we have shown that computability is equivalent to continuity for functions defined by NRT, we exhibit a pattern which allows to decide continuity. Such a pattern generalises the one of [3] to the setting of data words, the difficulty lying in showing that our pattern can be restricted to a finite number of data.

Theorem 19. Let T be a functional NRT with k registers. Then, for all X ⊆ D such that |X| ≥ 2k + 3 and d_0 ∈ X, T is not continuous at some x ∈ (Σ × D)^ω if and only if T is not continuous at some z ∈ (Σ × X)^ω.

Proof. The right-to-left direction is trivial. Now, let T be a functional NRT with k registers which is not continuous at some x ∈ (Σ × D)^ω. Let f : dom(T) → (Γ × D)^ω be the function defined by T, i.e., for all u ∈ dom(T), f(u) = v where v ∈ (Γ × D)^ω is the unique data word such that (u, v) ∈ ⟦T⟧. Now, let X ⊆ D be such that |X| ≥ 2k + 3 and d_0 ∈ X.

We need to build two words u and v labelled over X which coincide on a sufficiently long prefix to allow for pumping, hence yielding a converging sequence of input data words whose images do not converge, witnessing non-continuity. To that end, we use a similar proof technique as for Theorem 9: we show that the language of interleaved runs whose inputs coincide on a sufficiently long prefix while their respective outputs mismatch before a given position is recognisable by an NRA, allowing us to use the indistinguishability property. We also ask that one run presents sufficiently many occurrences of a final state q_f, so that we can ensure that there exists a pair of configurations containing q_f which repeats in both runs.
On reading such u and v, the automaton behaves as a finite automaton, since the number of data is finite ([15, Proposition 1]). By analysing the respective runs, we can, using pumping arguments, bound the position at which the mismatch appears, then show the existence of a synchronised loop over u and v after such a position, allowing us to build the sought witness of non-continuity.

Relabel over X. Thus, assume T is not continuous at some point x ∈ (Σ × D)^ω. Let ρ be an accepting run of T over x, and let q_f ∈ inf(st(ρ)) ∩ F be an accepting state repeating infinitely often in ρ. Then, let i ≥ 0 be such that for all j ≥ 0, there exists y ∈ dom(f) such that |x ∧ y| ≥ j but |f(x) ∧ f(y)| ≤ i. Now, define K = |Q| × (2k + 3)^{2k} and let m = (2i + 3) × (K + 1). Finally, pick j such that ρ[1:j] contains at least m occurrences of q_f. Consider the language:

L = {ρ_1 ⊗ ρ_2 | |in(ρ_1) ∧ in(ρ_2)| ≥ j, |out(ρ_1) ∧ out(ρ_2)| ≤ i and there are at least m occurrences of q_f in ρ_1[1:j]}

By Lemma 5, L_⊗(T) is recognised by an NRA with 2k registers. Additionally, by Lemma 6, M_j^i is recognised by an NRA with 2 registers. Thus, L = L_⊗(T) ∩ O_{m,j} ∩ M_j^i, where O_{m,j} checks that there are at least m occurrences of q_f in ρ_1[1:j] (this is easily doable from the automaton recognising L_⊗(T) by adding an m-bounded counter), is recognisable by an NRA with 2k + 2 registers.

Choose y ∈ dom(f) such that |x ∧ y| ≥ j but |f(x) ∧ f(y)| ≤ i. By letting ρ_1 (resp. ρ_2) be an accepting run of T over x (resp. y), we have ρ_1 ⊗ ρ_2 ∈ L, so L ≠ ∅. By Proposition 4, L ∩ ((Σ × X)^ω × (Γ × X)^ω) ≠ ∅. Let w = ρ'_1 ⊗ ρ'_2 ∈ L ∩ ((Σ × X)^ω × (Γ × X)^ω), u = in(ρ'_1) and v = in(ρ'_2). Then, |u ∧ v| ≥ j, |f(u) ∧ f(v)| ≤ i and there are at least m occurrences of q_f in ρ'_1[1:j].

Now, we depict ρ'_1 and ρ'_2 in Figure 4, where we decompose u as u = u_1 ... u_m · s and v as v = u_1 ... u_m · t; their corresponding images being respectively u'_1 ... u'_m · s' and u''_1 ... u''_m · t'.
We also let l = (i + 1)(K + 1) and l' = 2(i + 1)(K + 1). Since the data of u, v and w belong to X, we know that the valuations τ_i, μ_i map R into X.

Fig. 4. Runs of f over u = u_1 ... u_m · s and v = u_1 ... u_m · t, with (i + 1)(K + 1) occurrences of q_f before position l, (i + 1)(K + 1) further occurrences between l and l', and K + 1 occurrences between l' and m.

Repeating configurations. First, let us observe that in a partial run of ρ'_1 containing more than |Q| × |X|^{2k} occurrences of q_f, there is at least one productive transition, i.e. a transition whose output o is such that o ≠ ε. Otherwise, by the pigeonhole principle, there exists a valuation μ : R → X such that (q_f, μ) occurs at least twice in the partial run. Since all transitions are improductive, it would mean that, by writing w' for the corresponding part of the input, we have (q_f, μ) −(w'|ε)→ (q_f, μ). This partial run is part of ρ'_1, so, in particular, (q_f, μ) is accessible; hence, by taking w_0 such that (i_0, τ_0^R) −(w_0|w'_0)→ (q_f, μ), we have that f(w_0 · w'^ω) = w'_0, which is a finite word, contradicting our assumption that all accepting runs produce an infinite output. This implies that, for any n ≥ |Q| × |X|^{2k} (in particular for n = l), |u'_1 ... u'_n| ≥ i + 1.

Locate the mismatch. Again, upon reading u_{l+1} ... u_{l'}, there are (i + 1)(K + 1) occurrences of q_f. There are two cases:

(a) There are at least i + 1 productive transitions in ρ'_2. Then, we obtain that |u''_1 ... u''_{l'}| > i, so mismatch(u'_1 ... u'_{l'}, u''_1 ... u''_{l'}), since we know |f(u) ∧ f(v)| ≤ i and they are respectively prefixes of f(u) and f(v), both of length at least i + 1.
Afterwards, upon reading u_{l'+1} ... u_m, there are K + 1 > |Q| × |X|^{2k} occurrences of q_f, so, by the pigeonhole principle, there is a repeating pair: there exist indices p and p' such that l' ≤ p < p' ≤ m and (q_f, μ_p) = (q_f, μ_{p'}), (q_p, τ_p) = (q_{p'}, τ_{p'}). Thus, let z_P = u_1 ... u_p, z_R = u_{p+1} ... u_{p'} and z_C = u_{p'+1} ... u_m · t (P stands for prefix, R for repeat and C for continuation; we use capital letters to avoid confusion with indices). By denoting z'_P = u'_1 ... u'_p, z'_R = u'_{p+1} ... u'_{p'}, z''_P = u''_1 ... u''_p, z''_R = u''_{p+1} ... u''_{p'} and z''_C = u''_{p'+1} ... u''_m · t' the corresponding images, z = z_P · z_R^ω is a point of discontinuity. Indeed, define (z_n)_{n∈N} by, for all n ∈ N, z_n = z_P · z_R^n · z_C. Then, (z_n)_{n∈N} converges towards z but, since for all n ∈ N, f(z_n) = z''_P · z''_R^n · z''_C, we have that f(z_n) does not converge to f(z) = z'_P · z'_R^ω as n → ∞, since mismatch(z'_P, z''_P).

(b) Otherwise, by the same reasoning as above, there exists a repeating pair with only improductive transitions in between: there exist indices p and p' such that l ≤ p < p' ≤ l', (q_f, μ_p) = (q_f, μ_{p'}), (q_p, τ_p) = (q_{p'}, τ_{p'}), and (q_f, μ_p) −(u_{p+1} ... u_{p'} | ε)→ (q_f, μ_{p'}), (q_p, τ_p) −(u_{p+1} ... u_{p'} | ε)→ (q_{p'}, τ_{p'}). Then, by taking z_P = u_1 ... u_p, z_R = u_{p+1} ... u_{p'} and z_C = u_{p'+1} ... u_m · t, we have, by letting z'_P = u'_1 ... u'_p, z'_R = u'_{p+1} ... u'_{p'}, z''_P = u''_1 ... u''_p, z''_R = ε and z''_C = u''_{p'+1} ... u''_m · t', that z = z_P · z_R^ω is a point of discontinuity. Indeed, define (z_n)_{n∈N} by, for all n ∈ N, z_n = z_P · z_R^n · z_C. Then, (z_n)_{n∈N} indeed converges towards z but, since for all n ∈ N, f(z_n) = z''_P · z''_C, we have that f(z_n) does not converge to f(z) = z'_P · z'_R^ω, since mismatch(z'_P, z''_P · z''_C) (the mismatch necessarily lies in z'_P, since |z'_P| ≥ i + 1). ∎

Corollary 20. Deciding whether an NRT defines a continuous function is PSpace-complete.

Proof.
Let X ⊆ D be a set of size 2k + 3 containing d_0. By Theorem 19, T is not continuous iff it is not continuous at some z ∈ (Σ × X)^ω, iff ⟦T⟧ ∩ ((Σ × X)^ω × (Γ × X)^ω) is not continuous. By Proposition 3, such a relation is recognisable by a finite transducer with O(|Q| × |X|^|R|) states, which can be built on the fly. By [3], the continuity of functions defined by NFT is decidable in NLogSpace, which yields a PSpace procedure.

For the hardness, we reduce again from the emptiness problem of register automata, which is PSpace-complete [4]. Let A be a register automaton over some alphabet Σ × D. We construct a transducer T which defines a continuous function iff L(A) = ∅ iff the domain of T is empty. Let f be a non-continuous function realised by some NRT H (it exists by Example 16). Then, let # ∉ Σ be a fresh symbol, and define g as the function mapping any data word of the form w(#, d)w' to w(#, d)f(w') if w ∈ L(A). The function g is realised by an NRT which simulates A and copies its inputs on the output to implement the identity, until it sees #. If it was in some accepting state of A before seeing #, it branches to some initial state of H and proceeds executing H. If there is some w_0 ∈ L(A), then the subfunction g_{w_0} mapping words of the form w_0(#, d)w' to w_0(#, d)f(w') is not continuous, since f is not. Hence g is not continuous. Conversely, if L(A) = ∅, then dom(g) = ∅, so g is continuous. ∎

In [3], non-continuity is characterised by a specific pattern (Lemma 21, Figure 1), i.e. the existence of some particular sequence of transitions. By applying this characterisation to the finite transducer recognising ⟦T⟧ ∩ ((Σ × X)^ω × (Γ × X)^ω), as constructed in Proposition 3, we can characterise non-continuity by a similar pattern, which will prove useful to decide (non-)continuity of test-free NRT in NLogSpace (cf. Section 5):

Corollary 21 ([3]). Let T be an NRT with k registers.
Then, for all X ⊆ D such that |X| ≥ 2k + 3 and d_0 ∈ X, T is not continuous at some x ∈ (Σ × D)^ω if and only if it has the pattern of Figure 5.

Fig. 5. A pattern characterising non-continuity of functions definable by an NRT: we ask that there exist configurations (q_f, μ) and (q, τ), where q_f is accepting, as well as finite input data words u, v, finite output data words u′, v′, u″, v″, and an infinite input data word w admitting an accepting run from configuration (q, τ) producing output w′, such that mismatch(u′, u″) ∨ (v″ = ε ∧ mismatch(u′, u″w′)).

5 Test-free Register Transducers

In [7], we introduced a restriction which allows one to recover decidability of the bounded synthesis problem for specifications expressed as non-deterministic register automata. Applied to transducers, this restriction also yields polynomial complexities for the functionality and computability problems. An NRT T is test-free when its transition function does not depend on the tests conducted over the input data. Formally, we say that T is test-free if for all transitions q −σ,φ|asgn,o→ q′ we have φ = ⊤. Thus, we can omit the tests altogether and its transition relation can be represented as Δ ⊆ Q × Σ × 2^R × (Γ × R)* × Q.

Example 22. Consider the function f : (Σ × D)^ω → (Γ × D)^ω associating, to x = (σ_1, d_1)(σ_2, d_2)..., the value (σ_1, d_1)(σ_2, d_1)(σ_3, d_1)... if there are infinitely many a in x, and (σ_1, d_2)(σ_2, d_2)(σ_3, d_2)... otherwise. f can be implemented by a test-free NRT with one register: it initially guesses whether there are infinitely many a in x; if it is the case, it stores d_1 in the single register r, otherwise it waits for the next input to get d_2 and stores it in r. Then, it outputs the content of r along with each σ_i. f is not continuous, as even outputting the first data requires reading an infinite prefix when d_1 ≠ d_2.
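The transducer of Example 22 can be illustrated with a small sketch. The encoding of data words as lists of (letter, data) pairs and the helper name f_prefix are ours, not the paper's; the sketch computes a finite prefix of the output under each non-deterministic guess:

```python
def f_prefix(x, infinitely_many_a, n):
    """Sketch of Example 22: x is a list of (letter, data) pairs.
    Under the guess 'infinitely many a', the single register stores
    d1 and every output pair carries d1; under the other guess it
    stores d2. Returns the first n output pairs."""
    d = x[0][1] if infinitely_many_a else x[1][1]
    # Letters are copied unchanged; the register content is appended.
    return [(x[i][0], d) for i in range(n)]

# The two guesses disagree on the data as soon as d1 != d2,
# which is why f is not continuous.
x = [("a", 5), ("b", 7), ("a", 9)]
```

Note that which guess is the correct one depends on the whole infinite input, which is exactly the source of non-continuity.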
On Computability of Data Word Functions Defined by Transducers 233

Note that when a transducer is test-free, the existence of an accepting run over a given input x only depends on its finite labels. Hence, the existence of two outputs y and z which mismatch over data can be characterised by a simple pattern (Figure 6), which allows one to decide functionality in polynomial time:

Theorem 23. Deciding whether a test-free NRT is functional is in PTime.

Proof. Let T be a test-free NRT such that T is not functional. Then, there exist x ∈ (Σ × D)^ω and y, z ∈ (Γ × D)^ω such that (x, y), (x, z) ∈ ⟦T⟧ and y ≠ z. Then, let i be such that y[i] ≠ z[i]. There are two cases. Either lab(y[i]) ≠ lab(z[i]), which means that the finite transducer T′ obtained by ignoring the registers of T is not functional. By Proposition 8, this property can be decided in NLogSpace, so let us focus on the second case: dt(y[i]) ≠ dt(z[i]).

Fig. 6. A situation characterising the existence of a mismatch over data: along two runs over the same input, a register r is assigned at position j and a register r′ at position j′ ≠ j, both are output at input positions l and l′ respectively, at the same output position i, and r is not reassigned in between.

Since acceptance does not depend on data, we can always choose x such that dt(x[j]) ≠ dt(x[j′]). Here, we assume that the labels of x, y and z range over a unary alphabet; in particular y[i] = x[j] iff dt(y[i]) = dt(x[j]). Finally, for readability, we did not write that r should not be reassigned between j and l′. Note that the position of i with regard to j, j′, l and l′ does not matter; nor does the position of l w.r.t. l′. We here give a sketch of the proof: observe that an input x admits two outputs which mismatch over data if and only if it admits two runs which respectively store x[j] and x[j′], with x[j] ≠ x[j′], and output them later at the same output position i; the outputs y and z are then such that dt(y[i]) ≠ dt(z[i]). Since T is test-free, the existence of two runs over the same input x only depends on its finite labels.
Then, the registers containing respectively x[j] and x[j′] should not be reassigned before being output, and should indeed output their content at the same position i (cf. Figure 6). Besides, again because of test-freeness, we can always assume that x is such that x[j] ≠ x[j′]. Overall, such a pattern can be checked by a 2-counter Parikh automaton, whose emptiness is decidable in PTime [8] (under conditions that are satisfied here). □

Now, let us move to the case of continuity. Here again, the fact that test-free NRT conduct no test over the input data allows us to focus on the only two registers that are responsible for the mismatch, the existence of an accepting run being determined only by finite labels.

Theorem 24. Deciding whether a test-free NRT defines a continuous function is in PTime.

Proof. Let T be a test-free NRT. First, it can be shown that T is not continuous if and only if T has the pattern of Figure 7, where r is coaccessible (since acceptance only depends on finite labels, T can be trimmed in polynomial time).

Fig. 7. A pattern characterising non-continuity of functions defined by NRT, where we ask that there exist some states q_f, q and r, where q_f is accepting, as well as finite input data words u, v, z and finite output data words u′, v′, u″, v″, z″ such that mismatch(u′, u″) ∨ (v″ = ε ∧ mismatch(u′, u″z″)). Register assignments are not depicted, as there are no conditions on them. We unrolled the loops to highlight the fact that they do not necessarily loop back to the same configuration.

Now, it remains to show that this simpler pattern can be checked in PTime. We treat each part of the disjunction separately:

(a) there exist u, u′, u″, v, v′, v″ s.t. i_0 −u|u′→ q_f −v|v′→ q_f and i_0 −u|u″→ q −v|v″→ q, where q_f ∈ F and mismatch(u′, u″).
Then, as shown in the proof of Theorem 23, there exists a mismatch between some u′ and u″ produced by the same input u if and only if there exist two runs and two registers r and r′ assigned at two distinct positions, and later on output at the same position. Such a pattern can similarly be checked by a 2-counter Parikh automaton; the only difference is that here, instead of checking that the two end states are coaccessible with a common ω-word, we only need to check that q_f ∈ F and that there is a synchronised loop over q_f and q, which are regular properties that can be checked by the Parikh automaton with only a polynomial increase.

(b) there exist u, u′, u″, v, v′, z, z″ s.t. i_0 −u|u′→ q_f −v|v′→ q_f and i_0 −u|u″→ q −v|ε→ q −z|z″→ r, where q_f ∈ F and mismatch(u′, u″z″). By examining again the proof of Theorem 23, it can be shown that to obtain a mismatch, it suffices that the input is the same for both runs only up to position max(j, j′). More precisely, there is a mismatch between u′ and u″z″ if and only if there exist two registers r and r′ and two positions j, j′ ∈ {1, ..., |u|} such that j ≠ j′, r is stored at position j, r′ is stored at position j′, r and r′ are respectively output at input positions l ∈ {1, ..., |u|} and l′ ∈ {1, ..., |uz|}, and they are not reassigned in the meantime. Again, such a property, along with the fact that q_f ∈ F and the existence of a synchronised loop, can be checked by a 2-counter Parikh automaton of polynomial size.

Overall, deciding whether a test-free NRT is continuous is in PTime. □

We say that T is trim when all its states are both accessible and coaccessible.

References
1. Berstel, J.: Transductions and Context-Free Languages. Teubner Verlag (1979)
2. Carayol, A., Löding, C.: Uniformization in automata theory.
In: Proceedings of the 14th Congress of Logic, Methodology and Philosophy of Science, Nancy, July 19–26, 2011, pp. 153–178. College Publications, London (2014)
3. Dave, V., Filiot, E., Krishna, S.N., Lhote, N.: Deciding the computability of regular functions over infinite words. CoRR abs/1906.04199 (2019)
4. Demri, S., Lazić, R.: LTL with the freeze quantifier and register automata. ACM Trans. Comput. Log. 10(3), 16:1–16:30 (2009)
5. Durand-Gasselin, A., Habermehl, P.: Regular transformations of data words through origin information. In: Foundations of Software Science and Computation Structures – 19th International Conference, FOSSACS 2016, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2016, Eindhoven, The Netherlands, April 2–8, 2016, Proceedings, pp. 285–300 (2016)
6. Ehlers, R., Seshia, S.A., Kress-Gazit, H.: Synthesis with identifiers. In: Proceedings of the 15th International Conference on Verification, Model Checking, and Abstract Interpretation, VMCAI 2014, pp. 415–433 (2014)
7. Exibard, L., Filiot, E., Reynier, P.: Synthesis of data word transducers. In: 30th International Conference on Concurrency Theory, CONCUR 2019, Amsterdam, the Netherlands, August 27–30, 2019, pp. 24:1–24:15 (2019)
8. Figueira, D., Libkin, L.: Path logics for querying graphs: Combining expressiveness and efficiency. In: 30th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2015, Kyoto, Japan, July 6–10, 2015, pp. 329–340 (2015)
9. Filiot, E., Jecker, I., Löding, C., Winter, S.: On equivalence and uniformisation problems for finite transducers. In: 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, Rome, Italy, July 11–15, 2016, pp. 125:1–125:14 (2016)
10. Filiot, E., Mazzocchi, N., Raskin, J.: A pattern logic for automata with outputs.
In: Developments in Language Theory – 22nd International Conference, DLT 2018, Tokyo, Japan, September 10–14, 2018, Proceedings, pp. 304–317 (2018)
11. Gire, F.: Two decidability problems for infinite words. Inf. Process. Lett. 22(3), 135–140 (1986)
12. Holtmann, M., Kaiser, L., Thomas, W.: Degrees of lookahead in regular infinite games. Logical Methods in Computer Science 8(3) (2012)
13. Culik II, K., Pachl, J.K.: Equivalence problems for mappings on infinite strings. Information and Control 49(1), 52–63 (1981)
14. Büchi, J.R., Landweber, L.H.: Solving sequential conditions by finite-state strategies. Transactions of the American Mathematical Society 138, 295–311 (1969)
15. Kaminski, M., Francez, N.: Finite-memory automata. Theor. Comput. Sci. 134(2), 329–363 (1994)
16. Khalimov, A., Kupferman, O.: Register-bounded synthesis. In: 30th International Conference on Concurrency Theory, CONCUR 2019, Amsterdam, the Netherlands, August 27–30, 2019, pp. 25:1–25:16 (2019)
17. Khalimov, A., Maderbacher, B., Bloem, R.: Bounded synthesis of register transducers. In: Automated Technology for Verification and Analysis – 16th International Symposium, ATVA 2018, Los Angeles, October 7–10, 2018, Proceedings (2018)
18. Libkin, L., Tan, T., Vrgoc, D.: Regular expressions for data words. J. Comput. Syst. Sci. 81(7), 1278–1297 (2015)
19. Neven, F., Schwentick, T., Vianu, V.: Finite state machines for strings over infinite alphabets. ACM Trans. Comput. Logic 5(3), 403–435 (2004)
20. Pnueli, A., Rosner, R.: On the synthesis of a reactive module. In: ACM Symposium on Principles of Programming Languages, POPL. ACM (1989)
21. Prieur, C.: How to decide continuity of rational functions on infinite words. Theor. Comput. Sci. 276(1–2), 445–447 (2002)
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Minimal Coverability Tree Construction Made Complete and Efficient

Alain Finkel (1,3), Serge Haddad (1,2), and Igor Khmelnitsky (1,2)
1 LSV, ENS Paris-Saclay, CNRS, Université Paris-Saclay, Cachan, France {finkel,haddad,khmelnitsky}
2 Inria, France
3 Institut Universitaire de France, France

Abstract. Downward closures of Petri net reachability sets can be finitely represented by their set of maximal elements, called the minimal coverability set or Clover. Many properties (coverability, boundedness, ...) can be decided using Clover, in a time proportional to the size of Clover, so it is crucial to design algorithms that compute it efficiently. We present a simple modification of the original but incomplete Minimal Coverability Tree algorithm (MCT), computing Clover, which makes it complete: it memorizes accelerations and fires them as ordinary transitions. Contrary to the other alternative algorithms, for which no bound on the size of the required additional memory is known, we establish that the additional space of our algorithm is at most doubly exponential.
Furthermore, we have implemented a prototype, MinCov, which is already very competitive: on benchmarks it uses less space than all the other tools and its execution time is close to that of the fastest tool.

Keywords: Petri nets · Karp-Miller tree algorithm · Coverability · Minimal coverability set · Clover · Minimal coverability tree

1 Introduction

Coverability and coverability set in Petri nets. Petri nets are iconic as an infinite-state model used for verifying concurrent systems. Coverability, in Petri nets, is the most studied property for several reasons: (1) many properties like mutual exclusion, safety, and control-state reachability reduce to coverability, (2) the coverability problem is EXPSPACE-complete (while reachability is non-elementary), and (3) there exist efficient prototypes and numerous case studies. To solve the coverability problem, there are backward and forward algorithms. But these algorithms do not address relevant problems like the repeated coverability problem, LTL model checking, the boundedness problem and regularity of the traces. However, these problems are EXPSPACE-complete [4, 1] and are also decidable using the Karp-Miller tree algorithm (KMT) [11], which computes a finite tree labeled by a set of ω-markings C ⊆ N_ω^P (where N_ω is the set of naturals enlarged with an upper bound ω and P is the set of places) such that the reachability set and the finite set C have the same downward closure in N_ω^P. Thus a marking m is coverable if there exists some m′ ≥ m with m′ ∈ C. Hence, C can be seen as one among all the possible finite representations of the infinite downward closure of the reachability set.

The work was carried out in the framework of ReLaX, UMI2000 and also supported by ANR-17-CE40-0028 project BRAVAS.

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 237–256, 2020.

238 A. Finkel et al.
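The coverability test against a set C described above is a single componentwise comparison per element of C. Here is a minimal sketch, under our own assumptions that markings are encoded as dicts from places to counts and that float('inf') stands in for ω:

```python
OMEGA = float("inf")  # stands in for ω; OMEGA > n for every finite n

def coverable(m, C):
    """m is coverable iff some ω-marking m' in C satisfies m' >= m
    componentwise (absent places count as 0)."""
    return any(all(mc.get(p, 0) >= v for p, v in m.items()) for mc in C)

# A representation C containing one ω-marking covering unboundedly many tokens in q:
C = [{"p": 2, "q": OMEGA}]
```

This is why solving many coverability instances against a precomputed C is cheap compared to rerunning a full coverability algorithm each time.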
This set C allows one, for instance, to solve multiple instances of coverability in time linear w.r.t. the size of C, avoiding many calls to a costly algorithm. Informally, the KMT algorithm builds a reachability tree but, in order to ensure termination, substitutes ω for some finite components of the marking of a vertex when some marking of an ancestor is smaller. Unfortunately, C may contain comparable markings while only the maximal elements are important. The set of maximal elements of C can be defined independently of the KMT algorithm; it was called the minimal coverability set (MCS) in [6] and abbreviated as the Clover in the more general framework of Well Structured Transition Systems (WSTS) [7].

The minimal coverability tree algorithm. In [5, 6] the author computes the minimal coverability set by modifying the KMT algorithm in such a way that, at each step of the algorithm, the set of ω-markings labelling vertices is an antichain. But this aggressive strategy, implemented by the so-called Minimal Coverability Tree algorithm (MCT), contains a subtle bug, and it may compute a strict under-approximation of Clover, as shown in [8, 10].

Alternative minimal coverability set algorithms. Since the discovery of this bug, three algorithms (with variants) [10, 14, 13] have been designed for computing the minimal coverability set without building the full Karp-Miller tree. In [10] the authors proposed a minimal coverability set algorithm (called CovProc) that is not based on the Karp-Miller tree algorithm but uses a similar though restricted introduction of ω's. In [14], Reynier and Servais proposed a modification of the MCT, called the Monotone-Pruning algorithm (MP), that keeps but "deactivates" vertices labeled with smaller ω-markings where MCT would have deleted them. Recently, in [15], the authors simplified their original proof of correctness.
In [16], Valmari and Hansen proposed another algorithm (denoted below as VH) for constructing the minimal coverability set without deleting vertices. Their algorithm builds a graph rather than, as usual, a tree. In [13], Piipponen and Valmari improved this algorithm by designing appropriate data structures and heuristics for the exploration strategy that may significantly decrease the size of the graph.

Our contributions.
1. We introduce the concept of abstraction as an ω-transition that mimics the effect of an infinite family of firing sequences w.r.t. coverability. As a consequence, adding abstractions to the net does not modify its coverability set. Moreover, the classical Karp-Miller acceleration can be formalized as an abstraction whose incidence on each place is either ω or null. The set of accelerations of a net is upward closed and well-ordered. Hence there exists a finite subset of minimal accelerations, and we show that the size of every minimal acceleration is bounded by a double exponential.
2. Despite the current opinion that "The flaw is intricate and we do not see an easy way to get rid of it. ... Thus, from our point of view, fixing the bug of the MCT algorithm seems to be a difficult task" [10], we have found a simple modification of MCT which makes it correct. It mainly consists in memorizing discovered accelerations and using them as ordinary transitions.
3. Contrary to all existing minimal coverability set algorithms, which use an unknown amount of additional memory that could be non primitive recursive, we show, by applying a recent result of Leroux [12], that the additional memory required for accelerations is at most doubly exponential.
4.
We have developed a prototype in order to also empirically evaluate the efficiency of our algorithm, and the benchmarks (either from the literature or random ones) have confirmed that our algorithm requires significantly less memory than the other algorithms and is close to the fastest tool w.r.t. execution time.

Organization. Section 2 introduces abstractions and accelerations and studies their properties. Section 3 presents our algorithm and establishes its correctness. Section 4 describes our tool and discusses the results of the benchmarks. We conclude and give some perspectives on this work in Section 5. All the missing proofs and an illustration of the behavior of the algorithm can be found in [9].

2 Covering abstractions

2.1 Petri nets: reachability and covering

Here we define Petri nets differently from the usual way, but in an equivalent manner, i.e. based on the backward incidence matrix Pre and the incidence matrix C. The forward incidence matrix is implicitly defined by C + Pre. This choice is motivated by the introduction of abstractions in Section 2.2.

Definition 1. A Petri net (PN) is a tuple N = ⟨P, T, Pre, C⟩ where:
– P is a finite set of places;
– T is a finite set of transitions, with P ∩ T = ∅;
– Pre ∈ N^{P×T} is the backward incidence matrix;
– C ∈ Z^{P×T} is the incidence matrix, which fulfills: for all p ∈ P and t ∈ T, C(p, t) + Pre(p, t) ≥ 0.

A marked Petri net (N, m_0) is a Petri net N equipped with an initial marking m_0 ∈ N^P. The column vector of matrix Pre (resp. C) indexed by t ∈ T is denoted Pre(t) (resp. C(t)). A transition t ∈ T is fireable from a marking m ∈ N^P if m ≥ Pre(t). When t is fireable from m, its firing leads to the marking m′ := m + C(t), denoted by m →_t m′. One extends fireability and firing to a sequence σ ∈ T* by recurrence on its length. The empty sequence ε is always fireable and leaves the marking unchanged. Let σ = tσ′ be a sequence with t ∈ T and σ′ ∈ T*. Then σ
is fireable from m if m →_t m′ and σ′ is fireable from m′. The firing of σ from m leads to the marking m″ reached by σ′ from m′. One also denotes this firing by m →_σ m″.

Definition 2. Let (N, m_0) be a marked net. The reachability set Reach(N, m_0) is defined by:
Reach(N, m_0) = {m | ∃σ ∈ T*  m_0 →_σ m}

In order to introduce the coverability set of a Petri net, let us recall some definitions and results related to ordered sets. Let (X, ≤) be an ordered set. The downward (resp. upward) closure of a subset E ⊆ X is denoted by ↓E (resp. ↑E) and defined by:
↓E = {x ∈ X | ∃y ∈ E  y ≥ x}  (resp. ↑E = {x ∈ X | ∃y ∈ E  y ≤ x})

A subset E ⊆ X is downward (resp. upward) closed if E = ↓E (resp. E = ↑E). An antichain E is a set which fulfills: for all x ≠ y ∈ E, ¬(x ≤ y ∨ y ≤ x). X is said to be FAC (for Finite AntiChains) if all its antichains are finite. A non-empty set E ⊆ X is directed if for all x, y ∈ E there exists z ∈ E such that x ≤ z and y ≤ z. An ideal is a set which is downward closed and directed. There exists an equivalent characterization of FAC sets which provides a finite description of any downward closed set: a set is FAC if and only if every downward closed set admits a finite decomposition into ideals (a proof of this well-known result can be found in [3]). X is well founded if all its (strictly) decreasing sequences are finite. X is well ordered if it is FAC and well founded. There are many equivalent characterizations of well order. For instance, a set X is well ordered if and only if every sequence (x_n)_{n∈N} in X admits a non-decreasing infinite subsequence. This characterization allows one to design algorithms that compute trees whose finiteness is ensured by well order. Let us recall that (N, ≤) and (N^P, ≤) are well ordered sets. We are now ready to introduce the cover (also called the coverability set) of a net and to state some of its properties.

Definition 3. Let (N, m_0) be a marked Petri net.
Cover(N, m_0), its coverability set, is defined by:
Cover(N, m_0) = ↓Reach(N, m_0)

Since the coverability set is downward closed and N^P is FAC, it admits a finite decomposition into ideals. The ideals of N^P can be defined in an elegant way as follows. One first extends the sets of naturals and integers: N_ω = N ∪ {ω} and Z_ω = Z ∪ {ω}. Then one extends the order relation and the addition to Z_ω: for all n ∈ Z, ω > n, and for all n ∈ Z_ω, n + ω = ω + n = ω. N_ω^P is also a well ordered set and its members are called ω-markings. There is a one-to-one mapping between ideals of N^P and ω-markings. Let m ∈ N_ω^P. Define ↓m by:
↓m = {m′ ∈ N^P | m′ ≤ m}

↓m is an ideal of N^P (and every ideal can be defined in such a way). Let Ω be a set of ω-markings; ↓Ω denotes the set ∪_{m∈Ω} ↓m. Due to the above properties, there exists a unique finite set of minimal size Clover(N, m_0) ⊆ N_ω^P such that:
Cover(N, m_0) = ↓Clover(N, m_0)

A more general result can be found in [3] for well structured transition systems.

Example 1. The marked net of Figure 1 is unbounded. Its Clover is the following set: {p_i, p_bk + p_m, p_l + p_m + ω·p_ba, p_l + p_bk + ω·p_ba + ω·p_c}. For instance, the marking p_l + p_bk + α·p_ba + β·p_c is reached, and thus covered, by a sequence of the form t_1 t_5^{α+β} t_6^β.

Fig. 1. An unbounded Petri net

2.2 Abstraction and acceleration

In order to introduce abstractions and accelerations, we generalize the transitions, allowing a place to be marked with ω tokens.

Definition 4. Let P be a set of places. An ω-transition a is defined by:
– Pre(a) ∈ N_ω^P, its backward incidence;
– C(a) ∈ Z_ω^P, its incidence, with Pre(a) + C(a) ≥ 0.

For the sake of homogeneity, one denotes Pre(a)(p) (resp. C(a)(p)) by Pre(p, a) (resp. C(p, a)). An ω-transition a is fireable from an ω-marking m ∈ N_ω^P if m ≥ Pre(a). When a is fireable from m, its firing leads to the ω-marking m′ := m + C(a), denoted as previously by m →_a m′.
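The firing rule for ω-transitions can be sketched directly: Python's float('inf') happens to obey the extended arithmetic n + ω = ω and ω > n for finite n. The dict encoding of ω-markings is an assumption of ours, not the paper's:

```python
OMEGA = float("inf")  # stands in for ω

def fireable(m, pre):
    """An ω-transition with backward incidence pre is fireable from
    the ω-marking m iff m >= pre componentwise."""
    return all(m.get(p, 0) >= v for p, v in pre.items())

def fire(m, pre, c):
    """Firing leads to m + C(a); float('inf') arithmetic matches
    the extension n + ω = ω used in the text."""
    assert fireable(m, pre)
    return {p: m.get(p, 0) + c.get(p, 0) for p in set(m) | set(c)}
```

Note that once a place holds ω, adding any finite (even negative) incidence leaves it at ω, consistently with the convention Pre(p, a) = ω ⇒ C(p, a) = ω discussed next.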
One observes that if Pre(p, a) = ω then, for all values of C(p, a), m′(p) = ω. So, without loss of generality, one assumes that for every ω-transition a, Pre(p, a) = ω implies C(p, a) = ω.

In order to define abstractions, we first define the incidences of a sequence σ of ω-transitions by recurrence on its length. As previously, we denote Pre(p, σ) := Pre(σ)(p) and C(p, σ) := C(σ)(p). The base case corresponds to the definition of an ω-transition. Let σ = tσ′, with t an ω-transition and σ′ a sequence of ω-transitions. Then:
– C(σ) = C(t) + C(σ′);
– for all p ∈ P:
  • if C(p, t) = ω then Pre(p, σ) = Pre(p, t);
  • else Pre(p, σ) = max(Pre(p, t), Pre(p, σ′) − C(p, t)).

One checks by recurrence that σ is fireable from m if and only if m ≥ Pre(σ), and in this case m →_σ m + C(σ). An abstraction of a net is an ω-transition which concisely expresses the behaviour of the net w.r.t. covering (see Proposition 1). One observes that a transition t of a net is by construction an abstraction (with σ_n = t for all n).

Definition 5. Let N = ⟨P, T, Pre, C⟩ be a Petri net and a be an ω-transition. a is an abstraction if for all n ≥ 0, there exists σ_n ∈ T* such that for all p ∈ P with Pre(p, a) ∈ N:
1. Pre(p, σ_n) ≤ Pre(p, a);
2. if C(p, a) ∈ Z then C(p, σ_n) ≥ C(p, a);
3. if C(p, a) = ω then C(p, σ_n) ≥ n.

The following proposition justifies the interest of abstractions.

Proposition 1. Let (N, m_0) be a marked Petri net, a be an abstraction and m be an ω-marking such that ↓m ⊆ Cover(N, m_0) and m →_a m′. Then ↓m′ ⊆ Cover(N, m_0).

Proof. Pick some m′_* ∈ ↓m′. Denote n = max(m′_*(p) | m′(p) = ω) and ℓ = max(Pre(p, σ_n), n − C(p, σ_n) | m(p) = ω). Let us define m_* ∈ ↓m by:
– if m(p) < ω then m_*(p) = m(p);
– else m_*(p) = ℓ.

Let us check that σ_n is fireable from m_*. Let p ∈ P:
– if m(p) < ω then m_*(p) = m(p) ≥ Pre(p, a) ≥ Pre(p, σ_n);
– else m_*(p) = ℓ ≥ Pre(p, σ_n).

Let us show that m_* + C(σ_n) ≥ m′_*.
Let p ∈ P:
– if m(p) < ω and C(p, a) < ω then m_*(p) + C(p, σ_n) ≥ m(p) + C(p, a) = m′(p) ≥ m′_*(p);
– if m(p) < ω and C(p, a) = ω then m_*(p) + C(p, σ_n) ≥ C(p, σ_n) ≥ n ≥ m′_*(p);
– if m(p) = ω then m_*(p) + C(p, σ_n) ≥ n − C(p, σ_n) + C(p, σ_n) = n ≥ m′_*(p). □

An easy way to build new abstractions consists in concatenating them.

Proposition 2. Let N = ⟨P, T, Pre, C⟩ be a Petri net and σ be a sequence of abstractions. Then the ω-transition a defined by Pre(a) = Pre(σ) and C(a) = C(σ) is an abstraction.

We now introduce the underlying concept of the Karp and Miller construction.

Definition 6. Let N = ⟨P, T, Pre, C⟩ be a Petri net. One says that a is an acceleration if a is an abstraction such that C(a) ∈ {0, ω}^P.

The following proposition provides a way to get an acceleration from an arbitrary abstraction.

Proposition 3. Let N = ⟨P, T, Pre, C⟩ be a Petri net and a be an abstraction. Define an ω-transition a′ as follows. For all p ∈ P:
– if C(p, a) < 0 then Pre(p, a′) = C(p, a′) = ω;
– if C(p, a) = 0 then Pre(p, a′) = Pre(p, a) and C(p, a′) = 0;
– if C(p, a) > 0 then Pre(p, a′) = Pre(p, a) and C(p, a′) = ω.

Then a′ is an acceleration.

Let us study the set of accelerations more deeply. First we equip the set of ω-transitions with a "natural" order w.r.t. covering.

Definition 7. Let P be a set of places and a, a′ be two ω-transitions. a ≤ a′ if and only if Pre(a) ≤ Pre(a′) ∧ C(a) ≥ C(a′).

In other words, a ≤ a′ if, given any ω-marking m, whenever a′ is fireable from m then a is also fireable and its firing leads to an ω-marking greater than or equal to the one reached by the firing of a′.

Proposition 4. Let N be a Petri net. Then the set of abstractions of N is upward closed. Similarly, the set of accelerations is upward closed in the set of ω-transitions whose incidence belongs to {0, ω}^P.

Proposition 5. The set of accelerations of a Petri net is well ordered.

Proof.
The set of accelerations is a subset of N_ω^P × {0, ω}^P (where P is the set of places), with the order obtained by iterating cartesian products of the sets (N_ω, ≤) and ({0, ω}, ≥). These sets are well ordered and the cartesian product preserves this property, so we are done. □

Since the set of accelerations is well ordered and upward closed, it is equal to the upward closure of the finite set of minimal accelerations. Let us study the size of a minimal acceleration. Given some Petri net, one denotes d = |P| and e = max_{p,t} max(Pre(p, t), Pre(p, t) + C(p, t)). We are going to use the following result of Jérôme Leroux (published on HAL in June 2019), which provides a bound for the lengths of shortest sequences between two mutually reachable markings m_1 and m_2.

Theorem 1 (Theorem 2, [12]). Let N be a Petri net, m_1, m_2 be markings, and σ_1, σ_2 be sequences of transitions such that m_1 →_{σ_1} m_2 →_{σ_2} m_1. Then there exist σ′_1, σ′_2 such that m_1 →_{σ′_1} m_2 →_{σ′_2} m_1, fulfilling:
|σ′_1 σ′_2| ≤ ||m_1 − m_2||_∞ (3de)^{(d+1)^{2d+4}}

One deduces an upper bound on the size of minimal accelerations. Let v ∈ N_ω^P. One denotes ||v||_∞ = max(v(p) | v(p) ∈ N).

Proposition 6. Let N be a Petri net and a be a minimal acceleration. Then ||Pre(a)||_∞ ≤ e(3de)^{(d+1)^{2d+4}}.

Proof. Let us consider the net N_1 = ⟨P_1, T_1, Pre_1, C_1⟩ obtained from N by deleting the set of places {p | Pre(p, a) = ω} and adding the set of transitions T′ = {t_p | p ∈ P_1} with Pre(t_p) = p and C(t_p) = −p. Observe that d_1 ≤ d and e_1 = e. One denotes P′ = {p | Pre(p, a) < ω = C(p, a)}. One introduces m_1, the marking obtained by restricting Pre(a) to P_1, and m_2 = m_1 + Σ_{p∈P′} p. Let {σ_n}_{n∈N} be a family of sequences associated with a. Let n = ||Pre(a)||_∞ + 1. Then σ_n is fireable in N_1 from m_1 and its firing leads to a marking that covers m_2. By concatenating some occurrences of transitions of T′, one gets a firing sequence m_1 →* m_2 in N_1. Using the same process, one gets a firing sequence m_2 →* m_1.
Let us apply Theorem 1. There exists a sequence σ′_1 with m_1 →_{σ′_1} m_2 and |σ′_1| ≤ (3de)^{(d+1)^{2d+4}}, since ||m_1 − m_2||_∞ = 1. By deleting the transitions of T′ occurring in σ′_1, one gets a sequence σ*_1 ∈ T* such that m_1 →_{σ*_1} m′_2 ≥ m_2 with |σ*_1| ≤ (3de)^{(d+1)^{2d+4}}. The ω-transition a′, defined by Pre(p, a′) = Pre(p, σ*_1) for all p ∈ P_1, Pre(p, a′) = ω for all p ∈ P \ P_1 and C(a′) = C(a), is an acceleration. By definition of m_2, a′ ≤ a. Since a is minimal, a′ = a. Observing that |σ*_1| ≤ (3de)^{(d+1)^{2d+4}}, one gets ||Pre(a)||_∞ = ||Pre(a′)||_∞ ≤ e(3de)^{(d+1)^{2d+4}}. □

Thus, given any acceleration, one can easily obtain a smaller acceleration whose (representation) size is exponential.

Proposition 7. Let N be a Petri net and a be an acceleration. Then the ω-transition trunc(a) defined by:
– C(trunc(a)) = C(a);
– for all p such that Pre(p, a) ≠ ω, Pre(p, trunc(a)) = min(Pre(p, a), e(3de)^{(d+1)^{2d+4}});
– for all p such that Pre(p, a) = ω, Pre(p, trunc(a)) = ω
is an acceleration.

Proof. Let a′ ≤ a be a minimal acceleration. For all p such that Pre(p, a) ≠ ω, Pre(p, a′) ≤ e(3de)^{(d+1)^{2d+4}}. So a′ ≤ trunc(a). Since the set of accelerations is upward closed, one gets that trunc(a) is an acceleration. □

3 A coverability tree algorithm

3.1 Specification and illustration

As discussed in the introduction, to compute the clover of a Petri net, most algorithms build coverability trees (or graphs), which are variants of the Karp and Miller tree, with the aim of reducing the peak memory during the execution. The seminal algorithm [6] is characterized by a main difference with the KMT construction: when finding that the marking associated with the current vertex strictly covers the marking of another vertex, it deletes the subtree issued from this vertex, and when the current vertex belongs to the removed subtree, it substitutes it for the root of the deleted subtree.
This operation drastically reduces the peak memory but, as shown in [8], makes the algorithm incomplete. Like the previous algorithms that ensure completeness with deletions, our algorithm also needs additional memory. However, unlike the other algorithms, it memorizes accelerations instead of ω-markings. This approach has two advantages. First, we are able to exhibit a theoretical upper bound on the additional memory, which is doubly exponential, while the other algorithms have no such bound. Furthermore, accelerations are reused in the construction and thus may even shorten the execution time and peak space w.r.t. the algorithm in [6].

Before we delve into a high-level description of this algorithm, let us present some of the variables, functions, and definitions it uses. Algorithm 1, denoted from now on as MinCov, takes as input a marked net (N, m₀) and constructs a directed labeled tree CT = (V, E, λ, δ) and a set Acc of ω-transitions (which by Lemma 2 are accelerations). Each v ∈ V is labeled by an ω-marking λ(v) ∈ N_ω^P. Since CT is a directed tree, every vertex v ∈ V has a predecessor (except the root r), denoted prd(v), and a set of descendants, denoted Des(v). By convention, prd(r) = r. Each edge e ∈ E is labeled by a firing sequence δ(e) ∈ T·Acc*, consisting of an ordinary transition followed by a sequence of accelerations (which by Lemma 1 fulfills λ(prd(v)) −δ(prd(v),v)→ λ(v)). In addition, again by Lemma 1, m₀ −δ(r,r)→ λ(r). For a path γ = e₁e₂...e_k ∈ E* in the tree, we denote δ(γ) := δ(e₁)δ(e₂)...δ(e_k) ∈ (T ∪ Acc)*. The subset Front ⊆ V is the set of vertices ‘to be processed’.
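The ω-markings and the firing rule used throughout this section can be made concrete with a small sketch. This is our own illustration, not the authors' tool: ω is modelled by `float('inf')`, which already has the right absorption behaviour under addition and comparison, and the dictionary keys are hypothetical place names.

```python
OMEGA = float('inf')  # stands in for ω

def fire(m, pre, c):
    """Fire an (ω-)transition with guard `pre` and displacement `c`
    from the ω-marking `m`; return the new marking, or None when the
    guard is not satisfied.  ω absorbs addition: ω + k = ω."""
    if any(m[p] < pre[p] for p in m):
        return None
    return {p: m[p] + c[p] for p in m}

def covers(m1, m2):
    """m1 ≥ m2 in the pointwise order (n ≤ ω for every finite n)."""
    return all(m1[p] >= m2[p] for p in m1)
```

Since n < ω holds for every finite n, `covers` implements the pointwise order on ω-markings that the Cleaning and Exploring tests of the algorithm rely on.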
MinCov may call the function Delete(v), which removes from V a leaf v of CT, and the function Prune(v), which removes from V all descendants of v ∈ V except v itself, as illustrated in the following figure:

(figure: Delete(u) removes the leaf u; Prune(v) removes the subtrees below v while keeping v)

First MinCov does some initializations, setting the tree CT to a single vertex r with marking λ(r) = m₀ and Front = {r}. Afterwards the main loop builds the tree, where each iteration processes some vertex of Front as follows. MinCov picks a vertex u ∈ Front (line 3). From λ(u), MinCov fires a sequence σ ∈ Acc* reaching some m′ that maximizes the number of ω produced, i.e. |{p ∈ P | λ(u)(p) ≠ ω ∧ m′(p) = ω}|. Thus in σ no acceleration occurs twice, and its length is bounded by |P|. Then MinCov updates λ(u) with m′ (line 5), and the label of the edge incoming to u by concatenating σ. Afterwards it performs one of the following actions, according to the marking λ(u):

– Cleaning (line 7): if there exists u′ ∈ V \ Front with λ(u′) ≥ λ(u), then the vertex u is redundant and MinCov calls Delete(u).
– Accelerating (lines 8–16): if there exists u′, an ancestor of u, with λ(u′) < λ(u), then an acceleration can be computed. The acceleration a is deduced from the firing sequence labeling the path from u′ to u. MinCov inserts a into Acc, calls Prune(u′) and pushes u′ back into Front.
– Exploring (lines 18–25): otherwise MinCov calls Prune(u′) followed by Delete(u′) for all u′ ∈ V with λ(u′) < λ(u), since they are redundant. Afterwards, it removes u from Front and, for every transition t ∈ T fireable from λ(u), it creates a new child of u in CT and inserts it into Front.

For a detailed example of a run of the algorithm see Example 2 in [9].
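The Accelerating action deduces a from the guard Pre(δ(γ)) and the displacement C(δ(γ)) of the path, by the per-place case analysis of lines 11–14 of Algorithm 1, and then truncates it as licensed by Proposition 7. The two steps can be sketched as follows; this is our illustrative Python fragment, not the paper's implementation, and `bound` stands for the constant e(3de)^((d+1)^(2d+4)).

```python
OMEGA = float('inf')  # stands in for ω

def make_acceleration(pre_gamma, c_gamma):
    """Build an ω-transition from the guard and displacement of the
    path from u' to u: places losing tokens get (ω, ω), stable places
    keep (Pre, 0), and gaining places keep (Pre, ω)."""
    pre, c = {}, {}
    for p in c_gamma:
        if c_gamma[p] < 0:
            pre[p], c[p] = OMEGA, OMEGA
        elif c_gamma[p] == 0:
            pre[p], c[p] = pre_gamma[p], 0
        else:
            pre[p], c[p] = pre_gamma[p], OMEGA
    return pre, c

def trunc(pre, c, bound):
    """Proposition 7: capping every finite guard entry at the bound
    preserves being an acceleration; ω entries stay ω."""
    return ({p: v if v == OMEGA else min(v, bound) for p, v in pre.items()}, c)
```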
3.2 Correctness proof

We now establish the correctness of Algorithm 1 by proving the following properties (where, for all W ⊆ V, λ(W) denotes {λ(v) | v ∈ W}):

– its termination;
– the incomparability of the ω-markings associated with the vertices of V: λ(V) is an antichain;
– its consistency: λ(V) ⊆ Cover(N, m₀);
– its completeness: Cover(N, m₀) ⊆ ↓λ(V).

Algorithm 1: Computing the minimal coverability set MinCov(N, m₀)

Input: A marked Petri net (N, m₀)
Data: V set of vertices; E ⊆ V × V; Front ⊆ V; λ : V → N_ω^P; δ : E → T·Acc*; CT = (V, E, λ, δ) a labeled tree; Acc a set of ω-transitions
Output: A labeled tree CT = (V, E, λ, δ)

 1  V ← {r}; E ← ∅; Front ← {r}; λ(r) ← m₀; Acc ← ∅; δ(r, r) ← ε
 2  while Front ≠ ∅ do
 3      Select u ∈ Front
 4      Let σ ∈ Acc* be a maximal fireable sequence of accelerations from λ(u)   // maximal w.r.t. the number of ω's produced
 5      λ(u) ← λ(u) + C(σ)
 6      δ((prd(u), u)) ← δ((prd(u), u)) · σ
 7      if ∃u′ ∈ V \ Front s.t. λ(u′) ≥ λ(u) then Delete(u)   // λ(u) is covered
 8      else if ∃u′ ∈ Anc(u) s.t. λ(u) > λ(u′) then   // an acceleration was found between u and one of u's ancestors
 9          Let γ ∈ E* be the path from u′ to u in CT
10          a ← NewAcceleration()
11          foreach p ∈ P do
12              if C(p, δ(γ)) < 0 then Pre(p, a) ← ω; C(p, a) ← ω
13              if C(p, δ(γ)) = 0 then Pre(p, a) ← Pre(p, δ(γ)); C(p, a) ← 0
14              if C(p, δ(γ)) > 0 then Pre(p, a) ← Pre(p, δ(γ)); C(p, a) ← ω
15          end
16          a ← trunc(a); Acc ← Acc ∪ {a}; Prune(u′); Front ← Front ∪ {u′}
17      else
18          for u′ ∈ V do   // remove vertices labeled by markings covered by λ(u)
19              if λ(u′) < λ(u) then Prune(u′); Delete(u′)
20          end
21          Front ← Front \ {u}
22          foreach t ∈ T ∧ λ(u) ≥ Pre(t) do   // add the children of u
23              u′ ← NewNode(); V ← V ∪ {u′}; Front ← Front ∪ {u′}; E ← E ∪ {(u, u′)}
24              λ(u′) ← λ(u) + C(t); δ((u, u′)) ← t
25          end
26      end
27  end
28  return CT

We get termination by using the well order of N_ω^P and König's lemma.

Proposition 8. MinCov terminates.

Proof. Consider the following variation of the algorithm. Instead of deleting the current vertex when its marking is smaller than or equal to the marking of another vertex, one marks it as ‘cut’ and extracts it from Front. Instead of cutting a subtree when the marking of the current vertex v is greater than the marking of a vertex which is not an ancestor of v, one marks its vertices as ‘cut’ and extracts from Front those that belong to it. Instead of cutting a subtree when the marking of the current vertex v is greater than the marking of a vertex which is an ancestor of v, say v′, one marks the vertices on the path from v′ to v (except v′) as ‘accelerated’, one marks the other vertices of the subtree as ‘cut’, and one inserts v′ again into Front with its marking. All the vertices of the subtree that belong to Front are extracted from it. All the vertices marked as ‘cut’ or ‘accelerated’ are ignored for comparisons and for discovering accelerations.

This alternative algorithm behaves as the original one, except that the size of the tree never decreases, so if the algorithm does not terminate the tree is infinite. Since this tree is finitely branching, by König's lemma it contains an infinite path. On this infinite path, no vertex can be marked as ‘cut’, since it would belong to a finite subtree. Observe that the marking labeling the vertex following an accelerated subpath has at least one more ω than the marking of the first vertex of this subpath. So there is an infinite subpath whose vertices are unmarked. But N_ω^P is well ordered, so there must exist two vertices v and v′, where v′ is a descendant of v with λ(v′) ≥ λ(v), which contradicts the behaviour of the algorithm. □
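To see an acceleration step run end to end, here is a deliberately naive sketch of the classical Karp and Miller construction, the baseline that MinCov improves on. It is not MinCov (no Front, no pruning, no memorized accelerations); it is only meant to make the "strict growth along a branch yields ω" step concrete. All names and the toy net are our own.

```python
OMEGA = float('inf')  # stands in for ω

def karp_miller_clover(places, transitions, m0):
    """Classical Karp-Miller exploration: `transitions` maps a name to
    a (pre, c) pair of vectors over `places`.  Returns the maximal
    ω-markings reached, i.e. the minimal coverability set."""
    seen = []
    def explore(m, ancestors):
        if m in ancestors:        # marking repeats on the branch: stop
            return
        seen.append(m)
        for pre, c in transitions.values():
            if any(m[p] < pre[p] for p in places):
                continue          # transition not fireable
            m2 = {p: m[p] + c[p] for p in places}
            for anc in ancestors + [m]:
                if anc != m2 and all(anc[p] <= m2[p] for p in places):
                    # strict growth w.r.t. an ancestor: accelerate to ω
                    m2 = {p: OMEGA if m2[p] > anc[p] else m2[p] for p in places}
            explore(m2, ancestors + [m])
    explore(m0, [])
    # keep only the maximal elements
    return [m for m in seen
            if not any(n != m and all(n[p] >= m[p] for p in places) for n in seen)]
```

On a one-transition net where a token in p1 repeatedly produces tokens in p2, the construction returns the single maximal ω-marking (1, ω).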
Since we are going to use recurrence on the number of iterations of the main loop of Algorithm 1, we introduce the following notations: CT_n = (V_n, E_n, λ_n, δ_n), Front_n, and Acc_n are the values of the variables CT, Front, and Acc at line 2 after n iterations have been executed.

Proposition 9. For all n ∈ N, λ(V_n \ Front_n) is an antichain. Thus, on termination, λ(V) is an antichain.

Proof. Let us introduce V̄ := V \ Front and V̄_n := V_n \ Front_n. We are going to prove by induction on the number n of iterations of the while loop that λ(V̄_n) is an antichain.

MinCov initializes the variables V and Front at line 1, so V₀ = {r} and Front₀ = {r}; therefore V̄₀ = V₀ \ Front₀ = ∅ is an antichain.

Assume that λ(V̄_n) = λ(V_n \ Front_n) is an antichain. V̄ can be modified by adding or removing vertices from V and by removing vertices from Front while keeping them in V. The actions by which MinCov may modify the sets V and Front are: Delete (lines 7 and 19), Prune (lines 16 and 19), adding vertices to V (line 23), adding vertices to Front (lines 16 and 23), and removing vertices from Front (line 21).

• Neither Delete nor Prune adds new vertices to V̄, so the antichain property is preserved.
• MinCov may add vertices to V only at line 23, where it simultaneously adds them to Front; hence no new vertices enter V̄ and the antichain property is preserved.
• Adding vertices to Front may only remove vertices from V̄, so the antichain property is preserved.
• MinCov can add a vertex to V̄ only by removing it from Front while keeping it in V, which happens only at line 21. There, the only vertex MinCov may remove is the working vertex u. However, if (in this iteration) MinCov reaches line 21, then it did not reach line 7; hence (1) all markings of λ(V̄_n) ⊆ λ(V_n) are either smaller than or incomparable to λ_{n+1}(u). Moreover, MinCov has also executed lines 18–20, where (2) it performs Delete on all vertices u′ ∈ V̄_n ⊆ V_n with λ_n(u′) < λ_n(u).
Let V̄′ ⊆ V̄_n denote the set V̄ at the end of line 20. Due to (1) and (2), the marking λ_{n+1}(u) is incomparable to any marking in λ_{n+1}(V̄′). Since V̄′ ⊆ V̄_n, λ_{n+1}(V̄′) is an antichain. Combining this fact with the incomparability between λ_{n+1}(u) and any marking in λ_{n+1}(V̄′), we conclude that the set λ_{n+1}(V̄_{n+1}) = λ_{n+1}(V̄′) ∪ {λ_{n+1}(u)} is an antichain. □

In order to establish consistency, we prove that the labeling of vertices and edges is compatible with the firing rule, and that Acc is a set of accelerations.

Lemma 1. For all n ∈ N and all u ∈ V_n \ {r}, λ_n(prd(u)) −δ(prd(u),u)→ λ_n(u), and m₀ −δ(r,r)→ λ_n(r).

Proof. Let us prove by induction on the number n of iterations of the main loop that the assertions of the lemma hold for all v ∈ V_n. Initially, V₀ = {r} and λ₀(r) = m₀. Since m₀ −ε→ m₀ = λ₀(r), the base case is established.

Assume that the assertions hold for CT_n. Observe that MinCov may change the labeling function λ and/or add new vertices in exactly two places: at lines 4–6 and at lines 22–25. Therefore, in order to prove the assertion, we show that it still holds after each group of lines.

• After lines 4–6: MinCov computes (1) a maximal fireable sequence σ ∈ Acc* from λ_n(u) (line 4), and updates u's marking to m_u = λ_n(u) + C(σ) (line 5). Since the assertions hold for CT_n, (2) if u ≠ r then λ_n(prd(u)) −δ(prd(u),u)→ λ_n(u), else m₀ −δ(r,r)→ λ_n(r). By concatenation, we get λ_n(prd(u)) −δ(prd(u),u)σ→ m_u if u ≠ r, and otherwise m₀ −δ(r,r)σ→ m_u, which establishes that the assertions hold after line 6.

• After lines 22–25: the vertices for which λ is updated at these lines are the children of u that are added to the tree. For every transition t ∈ T fireable from λ(u), MinCov creates a child v_t of u (lines 22–23). The marking of any child v_t is set to λ_{n+1}(v_t) := λ_{n+1}(u) + C(t) (line 24).
Therefore, since λ_{n+1}(u) −t→ λ_{n+1}(v_t), the assertions hold. □

Lemma 2. At any execution point of MinCov, Acc is a set of accelerations.

Proof. At most one acceleration is added per iteration. Let us prove by induction on the number n of iterations of the main loop that Acc_n is a set of accelerations. Since Acc₀ = ∅, the base case is straightforward.

Assume that Acc_n is a set of accelerations and consider Acc_{n+1}. In an iteration, MinCov may add an ω-transition a to Acc. Due to the inductive hypothesis, δ(γ) is a sequence of abstractions, where γ is defined at line 9. Consider b, the ω-transition defined by Pre(b) = Pre(δ(γ)) and C(b) = C(δ(γ)). Due to Proposition 2, b is an abstraction. Due to Proposition 3, the loop of lines 11–15 transforms b into an acceleration a. Due to Proposition 7, after the truncation at line 16, a is still an acceleration. □

Proposition 10. λ(V) ⊆ Cover(N, m₀).

Proof. Let v ∈ V. Consider the path u₀, ..., u_k of CT from the root r = u₀ to u_k = v. Let σ ∈ (T ∪ Acc)* denote δ(prd(u₀), u₀) ··· δ(prd(u_k), u_k). Due to Lemma 1, m₀ −σ→ λ(v). Due to Lemma 2, σ is a sequence of abstractions. Due to Proposition 2, the ω-transition a defined by Pre(a) = Pre(σ) and C(a) = C(σ) is an abstraction. Due to Proposition 1, λ(v) ∈ Cover(N, m₀). □

The following definitions are related to an arbitrary execution point of MinCov and are introduced to establish its completeness.

Definition 8. Let σ = σ₀t₁σ₁...t_kσ_k with, for all i, t_i ∈ T and σ_i ∈ Acc*. Then the firing sequence m −σ→ m′ is an exploring sequence if:
– there exists v ∈ Front with λ(v) = m;
– for all 0 ≤ i ≤ k, there does not exist v′ ∈ V \ Front with m + C(σ₀t₁σ₁...t_iσ_i) ≤ λ(v′).

Definition 9. Let m be a marking. Then m is quasi-covered if:
– either there exists v ∈ V \ Front with λ(v) ≥ m;
– or there exists an exploring sequence m′ −σ→ m″ with m″ ≥ m.
In order to prove the completeness of the algorithm, we want to show that, at the beginning of every iteration, every m ∈ Cover(N, m₀) is quasi-covered. To establish this assertion, we introduce several lemmas showing that it is preserved by certain actions of the algorithm, under some prerequisites. More precisely, Lemma 3 corresponds to the deletion of the current vertex, Lemma 4 to the discovery of an acceleration, Lemma 5 to the deletion of a subtree whose root marking is smaller than the marking of the current vertex, and Lemma 6 to the creation of the children of the current vertex.

Lemma 3. Let CT, Front and Acc be the values of the corresponding variables at some execution point of MinCov and let u ∈ V be a leaf of CT such that the following items hold:
1. all m ∈ Cover(N, m₀) are quasi-covered;
2. λ(V \ Front) is an antichain;
3. for all a ∈ Acc fireable from λ(u), λ(u) = λ(u) + C(a);
4. there exists v ∈ V \ {u} such that λ(v) ≥ λ(u).
Then all m ∈ Cover(N, m₀) are quasi-covered after performing Delete(u).

Lemma 4. Let CT, Front and Acc be the values of the corresponding variables at some execution point of MinCov and let u ∈ V be such that the following items hold:
1. all m ∈ Cover(N, m₀) are quasi-covered;
2. λ(V \ Front) is an antichain;
3. for all v ∈ V \ {r}, λ(prd(v)) −δ(prd(v),v)→ λ(v).
Then all m ∈ Cover(N, m₀) are quasi-covered after performing Prune(u) and then adding u to Front.

Lemma 5. Let CT, Front and Acc be the values of the corresponding variables at some execution point of MinCov, u ∈ Front and u′ ∈ V such that the following items hold:
1. all m ∈ Cover(N, m₀) are quasi-covered;
2. λ(V \ Front) is an antichain;
3. for all v ∈ V \ {r}, λ(prd(v)) −δ(prd(v),v)→ λ(v);
4. λ(u′) < λ(u) and u is not a descendant of u′.
Then, after performing Prune(u′); Delete(u′):
1. all m ∈ Cover(N, m₀) are quasi-covered;
2. λ(V \ Front) is an antichain;
3. for all v ∈ V \ {r}, λ(prd(v)) −δ(prd(v),v)→ λ(v).

Lemma 6. Let CT, Front and Acc be the values of the corresponding variables at some execution point of MinCov and let u ∈ Front be such that the following items hold:
1. all m ∈ Cover(N, m₀) are quasi-covered;
2. λ(V \ Front) ∪ {λ(u)} is an antichain;
3. for all a ∈ Acc fireable from λ(u), λ(u) = λ(u) + C(a).
Then, after removing u from Front and, for every t ∈ T fireable from λ(u), adding a child v_t of u to Front with marking λ(v_t) = λ(u) + C(t), all m ∈ Cover(N, m₀) are quasi-covered.

Proposition 11. At the beginning of every iteration, all m ∈ Cover(N, m₀) are quasi-covered.

Proof. Let us prove by induction on the number of iterations that all m ∈ Cover(N, m₀) are quasi-covered.

Consider the base case. MinCov initializes V and Front to {r} and λ(r) to m₀. By definition, for all m ∈ Cover(N, m₀) there exists σ = t₁t₂···t_k ∈ T* such that m₀ −σ→ m′ ≥ m. Since V \ Front = ∅, this firing sequence is an exploring sequence.

Assume that all m ∈ Cover(N, m₀) are quasi-covered at the beginning of some iteration, and let us examine what may happen during the iteration. In lines 4–6, MinCov computes a maximal fireable sequence σ ∈ Acc* from λ_n(u) (line 4) and sets u's marking to m_u := λ_n(u) + C(σ) (line 5). Afterwards, there are three possible cases: (1) m_u is covered by some marking associated with a vertex outside Front, (2) an acceleration is found, or (3) MinCov computes the successors of u and removes u from Front.

Line 7. MinCov calls Delete(u), so CT_{n+1} is obtained by deleting u. Moreover, λ(u′) ≥ m_u. Let us check the hypotheses of Lemma 3. Assertion 1 follows from the induction hypothesis, since (1) the only change in the data is the increase of λ(u) by firing some accelerations and (2) u belongs to Front, so it cannot cover intermediate markings of exploring sequences. Assertion 2 follows from Proposition 9, since V \ Front is unchanged.
Assertion 3 follows immediately from lines 4–6. Assertion 4 follows with v = u′. Thus, using this lemma, the induction step is proved in this case.

Lines 8–16. Let us check the hypotheses of Lemma 4. Assertions 1 and 2 are established as in the previous case. Assertion 3 holds due to Lemma 1 and the fact that no edge has been added since the beginning of the iteration. Thus, using this lemma, the induction step is proved in this case.

Lines 18–25. We first show that the hypotheses of Lemma 6 hold before line 21. Let us denote the values of CT and Front after line 20 by CT′_n and Front′_n. Observe that for every iteration of line 19 in the inner loop, the hypotheses of Lemma 5 are satisfied. Therefore, in order to apply Lemma 6, it only remains to check assertions 2 and 3 of this lemma. Assertion 2 holds since (1) λ(V \ Front) is an antichain, (2) due to line 7 there is no w ∈ V \ Front such that λ(w) ≥ λ(u), and (3) by the iterations of line 19 all w ∈ V \ Front such that λ(w) < λ(u) have been deleted. Assertion 3 holds due to line 5 (all useful enabled accelerations have been fired) and line 8 (no acceleration has been added). Lines 21–25 correspond to the operations covered by Lemma 6. Thus, using this lemma, the induction step is proved in this case. □

The completeness of MinCov is an immediate consequence of the previous proposition.

Corollary 1. When MinCov terminates, Cover(N, m₀) ⊆ ↓λ(V).

Proof. By Proposition 11, all m ∈ Cover(N, m₀) are quasi-covered. Since on termination Front is empty, for all m ∈ Cover(N, m₀) there exists v ∈ V such that m ≤ λ(v). □

4 Tool and benchmarks

In order to empirically evaluate our algorithm, we have implemented a prototype tool which computes the clover and solves the coverability problem. This tool is developed in the programming language Python, using the NumPy library. It can be found on GitHub. All benchmarks were performed on a computer equipped with an Intel i5-8250U CPU with 4 cores, 16GB of memory and Ubuntu Linux 18.03.

Minimal coverability set.
We compare MinCov with the tool MP [14], the tool VH [16], and the tool CovProc [10]. We have also implemented the (incomplete) minimal coverability tree algorithm, denoted AF, in order to measure the additional memory needed by the (complete) tools. Both the MP and VH tools were sent to us by courtesy of their authors. The tool MP has one implementation in Python and another in C++. For the comparison we selected the Python one, to avoid biases due to the programming language.

We ran two kinds of benchmarks, both summarized in Table 1: (1) 123 standard benchmarks from the literature, taken from [2], and (2) 100 randomly generated Petri nets, since the benchmarks from the literature do not present all the features that lead to infinite state systems. These random Petri nets have the following properties: (1) 50 < |P|, |T| < 100, (2) the number of places connected to each transition is bounded by 10, and (3) they are not structurally bounded. The execution time of the tools was limited to 900 seconds.

Table 1 contains a summary of all the instances of the benchmarks. The first column (T/O) shows the number of instances on which the tool timed out. The Time column consists of the total time on the instances that did not time out, plus 900 seconds for any instance that led to a timeout. The #Nodes column consists of the peak number of nodes on the instances that did not time out on any of the tools (except CovProc, which does not provide this number). For MinCov we take the peak number of nodes plus accelerations.

Table 1. Benchmarks for clover

123 benchmarks from the literature:
          T/O   Time   #Nodes
  MinCov   16  18127    48218
  VH       15  14873    75225
  MP       24  23904   478681
  CovProc  49  47081      N/A
  AF       19  19223    45660

100 random benchmarks:
          T/O   Time   #Nodes
  MinCov   14  13989    61164
  VH       15  13692   208134
  MP       21  21726   755129
  CovProc  80  74767      N/A
  AF       16  15888    63275

In the benchmarks from the literature, we observed that the instances that timed out with MinCov are included in those of AF and MP. However, there were instances that timed out with VH but not with MinCov, and vice versa. MinCov is the second fastest tool; compared to VH it is 1.2 times slower. A possible explanation is that VH is implemented in C++. As could be expected, w.r.t. memory requirements MinCov has the smallest number of nodes. In the benchmarks from the literature, MinCov has approximately 10 times fewer nodes than MP and 1.6 times fewer than VH. In the random benchmarks these ratios are significantly higher.

Coverability. We compare MinCov to the tool qCover [2] on the set of benchmarks from the literature in Table 2. In [2], qCover is compared to the most competitive tools for coverability and achieves a score of 142 solved instances, while the second best tool achieves a score of 122. We split the results into safe instances (not coverable) and unsafe ones (coverable). In both categories we counted the number of instances on which the tools failed (columns T/O) and the total time (columns Time), as in Table 1. We observed that the tools are complementary, i.e. qCover is faster at proving that an instance is safe, and MinCov is faster at proving that an instance is unsafe.

Table 2. Benchmarks for the coverability problem (60 unsafe and 115 safe)

                   Time unsafe  T/O unsafe  Time safe  T/O safe  T/O   Time
  MinCov                  1754           1      51323        53   54  53077
  qCover                 26467          26      11865        11   37  38332
  MinCov ‖ qCover         1841           2      13493        11   13  15334

Therefore, by splitting the processing time between them we get better results.
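The splitting scheme just mentioned charges each instance twice the time of the faster tool, since each tool gets half of the CPU. A toy computation of ours, with hypothetical per-instance times:

```python
def combined_time(t_mincov, t_qcover):
    """Per-instance cost of running both tools in parallel on one CPU:
    each tool runs at half speed, so the first to finish determines
    twice the minimum of the standalone times."""
    return 2 * min(t_mincov, t_qcover)

# hypothetical instance times, for illustration only
print(combined_time(10.0, 200.0))   # the MinCov-friendly instance
print(combined_time(300.0, 4.0))    # the qCover-friendly instance
```

With such complementary per-instance profiles, the combination beats each tool's own total, which is what the third row of Table 2 reflects.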
The third row of Table 2 represents a parallel execution of the two tools, where the time for each instance is computed as follows:

Time(MinCov ‖ qCover) = 2 min(Time(MinCov), Time(qCover))

Combining both tools is 2.5 times faster than qCover and 3.5 times faster than MinCov, which confirms the above statement. We could get still better results by dynamically deciding which ratio of CPU to share between the tools, depending on some predicted status of the instance.

5 Conclusion

We have proposed a simple and efficient modification of the incomplete minimal coverability tree algorithm for building the clover of a net. Our algorithm is based on the introduction of the concepts of covering abstractions and accelerations. Compared to the alternative algorithms previously designed, we have theoretically bounded the size of the additional space. Furthermore, we have implemented a prototype which is already very competitive.

From a theoretical point of view, we plan to study how abstractions and accelerations could be defined in the more general context of well structured transition systems. From an experimental point of view, we will follow three directions in order to increase the performance of our tool. First, as in [13], we have to select appropriate data structures to minimize the number of comparisons between ω-markings. Then we want to precompute a set of accelerations using linear programming, as the correctness of the algorithm is preserved and the efficiency could be significantly improved. Last, we want to take advantage of parallelism in a more general way than simultaneously running several tools.

References

1. Blockelet, M., Schmitz, S.: Model checking coverability graphs of vector addition systems. In: Proceedings of MFCS 2011. LNCS, vol. 6907, pp. 108–119 (2011)
2. Blondin, M., Finkel, A., Haase, C., Haddad, S.: Approaching the coverability problem continuously.
In: Proceedings of TACAS 2016. LNCS, vol. 9636, pp. 480–496. Springer (2016)
3. Blondin, M., Finkel, A., McKenzie, P.: Well behaved transition systems. Logical Methods in Computer Science 13(3), 1–19 (2017)
4. Demri, S.: On selective unboundedness of VASS. J. Comput. Syst. Sci. 79(5), 689–713 (2013)
5. Finkel, A.: Reduction and covering of infinite reachability trees. Information and Computation 89(2), 144–179 (1990)
6. Finkel, A.: The minimal coverability graph for Petri nets. In: Advances in Petri Nets. LNCS, vol. 674, pp. 210–243 (1993)
7. Finkel, A., Goubault-Larrecq, J.: Forward analysis for WSTS, part II: Complete WSTS. Logical Methods in Computer Science 8(4), 1–35 (2012)
8. Finkel, A., Geeraerts, G., Raskin, J.F., Van Begin, L.: A counter-example to the minimal coverability tree algorithm. Tech. rep., Université Libre de Bruxelles, Belgium (2005)
9. Finkel, A., Haddad, S., Khmelnitsky, I.: Minimal coverability tree construction made complete and efficient (2020)
10. Geeraerts, G., Raskin, J.F., Van Begin, L.: On the efficient computation of the minimal coverability set of Petri nets. International Journal of Fundamental Computer Science 21(2), 135–165 (2010)
11. Karp, R.M., Miller, R.E.: Parallel program schemata. J. Comput. Syst. Sci. 3(2), 147–195 (1969)
12. Leroux, J.: Distance between mutually reachable Petri net configurations (Jun 2019), preprint
13. Piipponen, A., Valmari, A.: Constructing minimal coverability sets. Fundamenta Informaticae 143(3–4), 393–414 (2016)
14. Reynier, P.A., Servais, F.: Minimal coverability set for Petri nets: Karp and Miller algorithm with pruning. Fundamenta Informaticae 122(1–2), 1–30 (2013)
15. Reynier, P.A., Servais, F.: On the computation of the minimal coverability set of Petri nets. In: Proceedings of Reachability Problems 2019. LNCS, vol. 11674, pp. 164–177 (2019)
16. Valmari, A., Hansen, H.: Old and new algorithms for minimal coverability sets. Fundamenta Informaticae 131(1), 1–25 (2014)
Open Access. This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Constructing Infinitary Quotient-Inductive Types

Marcelo P. Fiore, Andrew M. Pitts, and S. C. Steenkamp

Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, UK

Abstract. This paper introduces an expressive class of quotient-inductive types, called QW-types. We show that in dependent type theory with uniqueness of identity proofs, even the infinitary case of QW-types can be encoded using the combination of inductive-inductive definitions involving strictly positive occurrences of Hofmann-style quotient types, and Abel's sized types. The latter, which provide a convenient constructive abstraction of what classically would be accomplished with transfinite ordinals, are used to prove termination of the recursive definitions of the elimination and computation properties of our encoding of QW-types. The development is formalized using the Agda theorem prover.
Keywords: dependent type theory · higher inductive types · inductive-inductive definitions · quotient types · sized types · category theory

1 Introduction

One of the key features of proof assistants based on dependent type theory, such as Agda, Coq and Lean, is their support for inductive definitions of families of types. Homotopy Type Theory [29] introduces a potentially very useful extension of the notion of inductive definition, the higher inductive types (HITs). To define an ordinary inductive type one declares how its elements are constructed. To define a HIT one not only declares element constructors, but also declares equality constructors in identity types (possibly iterated ones), specifying how the constructed elements and identities are to be equated. In this paper we work in a dependent type theory satisfying uniqueness of identity proofs (UIP), so that identity types are trivial in dimensions higher than one. Nevertheless, as Altenkirch and Kaposi [5] point out, HITs are still useful in such a one-dimensional setting. They introduce the term quotient inductive type (QIT) for this truncated form of HIT.

Figure 1 gives two examples of QITs, using Agda-style notation for dependent type theory; in particular, Set denotes a universe of types and ≡ denotes the identity type. The first example specifies the element and equality constructors for the type Bag X of finite multisets of elements from a type X. The second example, adapted from [5], specifies the element and equality constructors for the type ωTree X of trees whose nodes are labelled with elements of X and that have unordered countably infinite branching. Both examples illustrate the nice feature
of QITs that users only have to specify the particular identifications between data needed for their applications. Thus the standard property of equality, that it is an equivalence relation respecting the constructors, is inherited by construction from the usual properties of identity types, without the need to say so in the declaration of the QIT.

Figure 1. Two examples of QITs

Finite multisets:

  data Bag (X : Set) : Set where
    []   : Bag X
    _::_ : X → Bag X → Bag X
    swap : (x y : X)(ys : Bag X) → x :: y :: ys ≡ y :: x :: ys

Unordered countably branching trees (elements of isIso f witness that f is a bijection):

  data ωTree (X : Set) : Set where
    leaf : ωTree X
    node : X → (N → ωTree X) → ωTree X
    perm : (x : X)(f : N → N)(_ : isIso f)(g : N → ωTree X) →
           node x g ≡ node x (g ◦ f)

The second example also illustrates a more technical aspect of QITs: they enable constructive versions of structures that classically use non-constructive choice principles. The first example in Figure 1 only involves element constructors of finite arity ([] is nullary and x :: _ is unary), and consequently Bag X is isomorphic to the type obtained from the ordinary inductive type of finite lists over X by quotienting by the congruence generated by swap. Of course this assumes, as we do in this paper, that the type theory comes with Hofmann-style quotient types [18, Section]. By contrast, the second example in the figure involves an element constructor with countably infinite arity. So if one first forms the ordinary inductive type of ordered countably branching trees (by dropping the equality constructor perm from the declaration) and then quotients by a suitable relation to get the equalities specified by perm, one needs the axiom of countable choice to be able to lift the node element constructor to the quotient; see [5, Section 2.2] for a detailed discussion.
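At the set level (ignoring the type-theoretic content), the finitary quotient performed by Bag X can be mimicked by choosing a canonical representative for each equivalence class. The sketch below is only that intuition, rendered in Python as our own illustration; it assumes an ordering on the elements, which the QIT itself does not.

```python
def bag(*elems):
    """Canonical representative of a finite multiset: a sorted tuple.
    Lists differing only by swaps of adjacent elements (the `swap`
    equality constructor) normalise to the same value."""
    return tuple(sorted(elems))

def cons(x, ys):
    """The _::_ element constructor, lifted to representatives."""
    return tuple(sorted((x,) + ys))
```

For example, `cons(1, cons(2, bag()))` and `cons(2, cons(1, bag()))` denote the same multiset, mirroring how swap forces x :: y :: ys ≡ y :: x :: ys. Nothing of this sort is available for the infinitary ωTree example, which is precisely where countable choice enters the naive quotient construction.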
The construction of the Cauchy reals as a higher inductive-inductive type [29, Section 11.3] provides a similar, but more complicated, example where the use of countable choice is avoided. Such examples have led to the folklore that, as far as constructive type theories go, infinitary QITs are more expressive than the combination of ordinary inductive (or inductive-recursive, or inductive-inductive) types with quotient types. In this paper we use Abel's sized types [2] to show that, for a wide class of QITs, this view is not justified. Thus we make two main contributions.

First, we define a family of QITs called QW-types and give elimination and computation rules for them (Section 2). The usual W-types of Martin-Löf [22] are inductive types giving the algebraic terms over a possibly infinitary signature. One specifies a QW-type by giving a family of equations between such terms. So such QITs give initial algebras for possibly infinitary algebraic theories. As we indicate in Section 3, they can encode a very wide range of examples of possibly infinitary quotient-inductive types, namely those that do not involve constructors taking previously constructed equalities as arguments (so they do not cover the infinitary extension of the very general scheme considered by Dybjer and Moeneclaey [12]). In set theory with the Axiom of Choice (AC), QW-types can be constructed simply as Quotients of the underlying W-type, hence the name.

Secondly, we prove that, contrary to expectation, without AC it is still possible to construct QW-types using quotients, but not simply by quotienting a W-type. Instead, the type to be quotiented and the relation by which to quotient are given simultaneously by definitions that refer to each other. Thus our construction (in Section 4) involves inductive-inductive definitions [15].
The elimination and computation functions which witness that the quotiented type correctly represents the required QW-type are defined recursively. In order to prove that our recursive definitions terminate we combine the use of inductive definitions involving strictly positive occurrences of quotient types with sized types (currently, we do not know whether it is possible to avoid sizing in favour of, say, a suitable well-founded termination ordering). Sized types provide a convenient constructive abstraction of what classically would be accomplished with sequences of transfinite ordinal length.

The type theory in which we work. To present our results we need a version of Martin-Löf Type Theory with (1) uniqueness of identity proofs, (2) quotient types and hence also function extensionality, (3) inductive-inductive datatypes (with strictly positive occurrences of quotient types) and (4) sized types. Lean 3 provides (1) and (2) out of the box, but also the Axiom of Choice, unfortunately. Neither it, nor Coq, provides (3) and (4). Agda provides (1) via unrestricted dependent pattern-matching, (2) via a combination of postulates and the rewriting mechanism of Cockx and Abel [8], (3) via its very liberal mechanism for mutual definitions and (4) thanks to the work of Abel [2]. Therefore we make use of the type theory implemented by Agda to give formal proofs of our results. The Agda code can be found at doi: 10.17863/CAM.48187. In this paper we describe the results informally, using Agda-style notation for dependent type theory. In particular we use Set to denote the universe at the lowest level of a countable hierarchy of (Russell-style) universes. We also use Agda's convention that an implicit argument of an operation can be made explicit by enclosing it in {braces}.

Acknowledgement. We would like to acknowledge the contribution Ian Orton made to the initial development of the work described here.
He and the first author supervised the third author's Master's dissertation Quotient Inductive Types: A Schema, Encoding and Interpretation, in which the notion of QW-type (there called a W-type) was introduced.

260 M. P. Fiore et al.

2 QW-types

We begin by recalling some facts about types of well-founded trees, the W-types of Martin-Löf [22]. We take signatures to be elements of the dependent product

  Sig = Σ A : Set, (A → Set)                                          (1)

So a signature is given by a pair Σ = (A, B) consisting of a type A : Set and a family of types B : A → Set. Each such signature determines a polynomial endofunctor [1, 16] S{Σ} : Set → Set whose value at X : Set is the following dependent product

  S{Σ} X = Σ a : A, (B a → X)                                         (2)

An S-algebra is by definition an element of the dependent product

  Alg{Σ} = Σ X : Set, (S X → X)                                       (3)

S-algebra morphisms (X, s) → (X′, s′) are given by functions h : X → X′ together with an element of the type

  isHom h = (a : A)(b : B a → X) → s′(a, h ◦ b) ≡ h(s(a, b))          (4)

Then the W-type W{Σ} determined by Σ is the underlying type of an initial S-algebra. More generally, Dybjer [11] shows that the initial algebra of any non-nested, strictly positive endofunctor on Set is given by a W-type; and Abbott, Altenkirch, and Ghani [1] extend this to the case with nested uses of W-types as part of their work on containers. (These proofs take place in extensional type theory [22], but work just as well in the intensional type theory with uniqueness of identity proofs and function extensionality that we are using here.) More concretely, given a signature Σ = (A, B), if one thinks of elements a : A as names of operation symbols whose (not necessarily finite) arity is given by the type B a : Set, then the elements of W{Σ} represent the closed algebraic terms (i.e. well-founded trees) over the signature.
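For finite data, the polynomial endofunctor (2) can be enumerated explicitly: an element of S{Σ}X is an operation name a together with a B a-indexed family of arguments. The following Python sketch (illustrative names, not the paper's Agda) represents the family as a tuple.

```python
# A sketch of S{Sigma}X = (a : A) x (B a -> X) for finite A, B a and X.
# `S_elements` is a hypothetical helper, not from the paper.
from itertools import product

def S_elements(A, B, X):
    """Enumerate all pairs (a, args) with a in A and args : B a -> X,
    an arity-B(a) function represented as a tuple of values from X."""
    out = []
    for a in A:
        for args in product(X, repeat=B(a)):
            out.append((a, args))
    return out

# A signature with one nullary and one binary operation, over X = {0, 1}:
elems = S_elements(["zero", "add"], lambda a: 0 if a == "zero" else 2, [0, 1])
assert ("zero", ()) in elems       # the unique nullary instance
assert ("add", (0, 1)) in elems    # one of the 2^2 binary instances
assert len(elems) == 1 + 4
```

An initial algebra for this functor would then be the type of closed terms (well-founded trees) over the signature, i.e. the W-type.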
From this point of view it is natural to consider not only closed terms solely built up from operations, but also open terms additionally built up with variables drawn from some type X. As well as allowing operators of possibly infinite arity, we also allow terms involving possibly infinitely many variables (the second example in Figure 1 involves such terms). Categorically, the type T{Σ} X of such open terms is the free S-algebra on X and is another W-type, for the signature obtained from Σ by adding the elements of X as nullary operations. Nevertheless, it is convenient to give a direct inductive definition:

  data T {Σ : Sig}(X : Set) : Set where
    η : X → T X                                                       (5)
    σ : S (T X) → T X

Given an S-algebra (Y, s) : Alg{Σ} and a function f : X → Y, the unique morphism of S-algebras from the free S-algebra (T X, σ) on X to (Y, s) has underlying function T X → Y mapping each t : T X to the element t >>= f in Y defined by recursion on the structure of t:

  η x >>= f     = f x                                                 (6)
  σ(a, b) >>= f = s(a, λ x → b x >>= f)

As the notation suggests, >>= is the Kleisli lifting operation ("bind") for a monad structure on T; indeed, it is the free monad on the endofunctor S. The notion of "QW-type" that we introduce in this section is obtained from that of W-type by considering not only the algebraic terms over a given signature, but also equations between terms. To code equations we use a type-theoretic rendering of a categorical notion of equational system introduced by Fiore and Hur, referred to as term equational system [14, Section 2] and as monadic equational system [13, Section 5], here instantiated to free monads on signatures.

Definition 1.
A system of equations over a signature Σ : Sig is specified by

– a type E : Set (whose elements e : E name the equations)
– a family of types V : E → Set (V e : Set contains the variables used in the equation named e : E)
– for each e : E, elements l e and r e of type T(V e), the free S-algebra on V e (the terms with variables from V e that are equated by the equation named e).

Thus a system of equations over Σ is an element of the dependent product

  Syseq{Σ} = Σ E : Set, Σ V : (E → Set),
             ((e : E) → T(V e)) × ((e : E) → T(V e))                  (7)

An S{Σ}-algebra S X → X satisfies the system of equations ε = (E, V, l, r) : Syseq{Σ} if there is an element of type

  Sat{ε} X = (e : E)(ρ : V e → X) → (l e >>= ρ) ≡ (r e >>= ρ)         (8)

The category-theoretic view of QW-types is that they are simply S-algebras that are initial among those satisfying a given system of equations:

Definition 2. A QW-type for a signature Σ = (A, B) : Sig and system of equations ε = (E, V, l, r) : Syseq{Σ} is given by a type QW{Σ}{ε} : Set equipped with an S-algebra structure and a proof that it satisfies the equations

  qwintro : S(QW) → QW                                                (9)
  qwequ   : Sat{ε}(QW)                                                (10)

together with functions that witness that it is the initial such algebra:

  qwrec    : (X : Set)(s : S X → X) → Sat X → QW → X                  (11)
  qwrechom : (X : Set)(s : S X → X)(p : Sat X) → isHom (qwrec X s p)  (12)
  qwuniq   : (X : Set)(s : S X → X)(p : Sat X)(f : QW → X) →
             isHom f → qwrec X s p ≡ f                                (13)

Note that the definition of >>= depends on the S-algebra structure s; in Agda we use instance arguments to hide this dependence.

Given the definitions of S{Σ} in (2) and Sat{ε} in (8), properties (9) and (10) suggest that a QW-type is an instance of the notion of quotient-inductive type [5] with element constructor qwintro and equality constructor qwequ. For this to be so, QW{Σ}{ε} needs to have the requisite dependently-typed elimination and computation properties for these element and equality constructors.
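The free S-algebra of (5)-(6) and the satisfaction predicate (8) can both be sketched concretely. In this Python rendering (illustrative only, not the paper's Agda), a term is either an η-variable or a σ-node whose tuple of subterms stands for the family b : B a → T X; the names Var, Op, bind and ev are all assumptions of the sketch.

```python
# Open terms over a signature, Kleisli lifting (bind), and one instance
# of the satisfaction predicate Sat, all for a finitary toy signature.
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:              # eta : X -> T X
    x: object

@dataclass(frozen=True)
class Op:               # sigma : S (T X) -> T X
    a: object
    args: tuple

def bind(t, f):
    """Kleisli lifting (6): replace each variable x of t by f x."""
    if isinstance(t, Var):
        return f(t.x)
    return Op(t.a, tuple(bind(u, f) for u in t.args))

def ev(t, s, rho):
    """Evaluate a term in an S-algebra s under an environment rho."""
    if isinstance(t, Var):
        return rho[t.x]
    return s(t.a, [ev(u, s, rho) for u in t.args])

# Sat (8) at one environment, for a commutativity equation over N:
s = lambda a, xs: xs[0] + xs[1] if a == "add" else 0
l = Op("add", (Var(0), Var(1)))
r = Op("add", (Var(1), Var(0)))
assert ev(l, s, {0: 3, 1: 5}) == ev(r, s, {0: 3, 1: 5}) == 8

# bind substitutes terms for variables:
assert bind(l, lambda v: Op("zero", ()) if v == 0 else Var(v)) == \
       Op("add", (Op("zero", ()), Var(1)))
```

The full predicate Sat quantifies over all equations e and all environments ρ; the sketch checks a single such instance.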
As Proposition 1 below shows, these follow from (11)–(13), because we are working in a type theory with function extensionality (by virtue of assuming quotient types). To state the proposition we need a dependent version of (6). For each

  P : QW → Set
  p : (a : A)(b : B a → QW) → ((x : B a) → P (b x)) → P (qwintro(a, b))    (14)

type X : Set, function f : X → Σ x : QW, P x and term t : T(X), we get an element lift P p f t : P (t >>= (fst ◦ f)) defined by recursion on the structure of t:

  lift P p f (η x)     = snd (f x)                                         (15)
  lift P p f (σ(a, b)) = p a (λ x → b x >>= (fst ◦ f)) (lift P p f ◦ b)

Proposition 1. For a QW-type as in the above definition, given P and p as in (14) and a term of type

  (e : E)(f : V e → Σ x : QW, P x) → lift P p f (l e) ≡≡ lift P p f (r e)  (16)

there are elimination and computation terms:

  qwelim : (x : QW) → P x
  qwcomp : (a : A)(b : B a → QW) → qwelim (qwintro(a, b)) ≡ p a b (qwelim ◦ b)

(Note that (16) uses McBride's heterogeneous equality type [23], which we denote by ≡≡, because lift P p f (l e) and lift P p f (r e) inhabit different types, namely P (l e >>= (fst ◦ f)) and P (r e >>= (fst ◦ f)) respectively.) The proof of the proposition can be found in the accompanying Agda code (doi: 10.17863/CAM.48187). So QW-types are in particular quotient-inductive types (QITs). Conversely, in the next section we show that a wide range of QITs can be encoded as QW-types. Then in Section 4 we prove:

Theorem 1. In constructive dependent type theory with uniqueness of identity proofs (or equivalently the Axiom K of Streicher [27]) and universes with inductive-inductive datatypes [15] permitting strictly positive occurrences of quotient types [18] and sized types [2], for every signature and system of equations (Definition 1) there is a QW-type as in Definition 2.

We only establish the computation property up to propositional rather than definitional equality; so, using the terminology of Shulman [25], these are typal quotient-inductive types.

Remark 1 (Free algebras).
Definition 2 defines QW-types as initial algebras. A corollary of Theorem 1 is that free algebras also exist. In other words, given a signature Σ and a type X : Set, there is an S-algebra (F{Σ}{ε} X, S{Σ}(F{Σ}{ε} X) → F{Σ}{ε} X) satisfying a system of equations ε and equipped with a function X → F{Σ}{ε} X, and which is universal among such S-algebras. Thus QW{Σ}{ε} is isomorphic to F{Σ}{ε} ∅, where ∅ is the empty datatype. To see that such free algebras can be constructed as QW-types, given a signature Σ = (A, B), let Σ_X be the signature (X ⊔ A, B_X), where X ⊔ A is the coproduct datatype (with constructors inl : X → X ⊔ A and inr : A → X ⊔ A) and where B_X : X ⊔ A → Set maps each inl x to ∅ and each inr a to B a. Given a system of equations ε = (E, V, l, r), let ε_X be the system (E, V, l′, r′) where for each e : E, l′ e = l e >>= η and r′ e = r e >>= η (using η : V e → T{Σ_X}(V e) as in (5) and the S{Σ}-algebra structure s on T{Σ_X}(V e) given by s(a, b) = σ(inr a, b)). Then one can show that the QW-type QW{Σ_X}{ε_X} is the free algebra F{Σ}{ε} X, with the function X → F{Σ}{ε} X sending each x : X to qwintro(inl x, _) : QW{Σ_X}{ε_X}, and the S{Σ}-algebra structure on F{Σ}{ε} X being given by the function sending (a, b) : S(QW{Σ_X}{ε_X}) to qwintro(inr a, b).

Remark 2 (Strictly positive equational systems). A very general, categorical notion of equational system was introduced by Fiore and Hur [14, Section 3]. They regard any endofunctor S : Set → Set as a functorial signature. A functorial term L over such a signature, in a context G, is specified by another functorial signature G : Set → Set together with a functor L from S-algebras to G-algebras that commutes with the forgetful functors to Set. Then an equational system is given by a pair of such terms L and R in the same context G. An S-algebra s : S X → X satisfies the equational system if L(X, s) and R(X, s) are equal G-algebras.
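Remark 1's change of signature can be sketched concretely: the elements of X are adjoined to the operations of Σ as extra nullary operations, so that closed terms over the extended signature are exactly open terms over the original one. All names in this Python sketch are illustrative, not the paper's.

```python
# A sketch of Sigma_X = (X + A, B_X) from Remark 1, with coproducts
# represented by tagged pairs and arities by natural numbers.
# `extend_signature` is a hypothetical helper, not from the paper.
def extend_signature(A, B, X):
    """Return (A', B') with A' = X + A, B'(inl x) = 0, B'(inr a) = B a."""
    A2 = [("inl", x) for x in X] + [("inr", a) for a in A]
    def B2(tagged):
        tag, v = tagged
        return 0 if tag == "inl" else B(v)
    return A2, B2

A2, B2 = extend_signature(["add"], lambda a: 2, ["x", "y"])
assert B2(("inl", "x")) == 0      # adjoined constants are nullary
assert B2(("inr", "add")) == 2    # original arities are unchanged
```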
Taking the strictly positive endofunctors Set → Set to be the smallest collection containing the identity and constant endofunctors and closed under forming dependent products and dependent functions over fixed types then, as in [11] (and also in the type theory in which we work), up to isomorphism every such endofunctor is of the form S{Σ} for some signature Σ : Sig. If we restrict attention to equational systems given by terms L, R with both S and G strictly positive, then it turns out that such equational systems are in bijection with the systems of equations from Definition 1, and the two notions of satisfaction for an algebra coincide in that case. (See our Agda development for a proof of this.) So Dybjer's characterisation of W-types as initial algebras for strictly positive endofunctors generalises to the fact that QW-types are initial among the algebras satisfying strictly positive equational systems in the sense of Fiore and Hur.

3 Quotient-inductive types

Higher inductive types (HITs) are originally motivated by their use in homotopy type theory to construct homotopical cell complexes, such as spheres, tori, and so on [29]. Intuitively, a higher inductive type is an inductive type with point constructors also allowing for path constructors, surface constructors, etc., which are represented as elements of (iterated) identity types. For example, the sphere is given by the HIT:

  data S² : Set where
    base : S²                                                         (17)
    surf : refl ≡_{base ≡ base} refl

In the presence of the UIP axiom we will refer to HITs as quotient inductive types (QITs) [5], since all paths beyond the first level are trivial and any HIT is truncated to an h-set. We use the terms element constructor and equality constructor to refer to the point constructors and the only non-trivial level of path constructors. We believe that QW-types can be used to encode a wide range of QITs: see Conjecture 1 below.
As evidence, we give several examples of QITs encoded as QW-types, beginning with the two examples of QITs in Figure 1, giving the corresponding signature (A, B) and system of equations (E, V, l, r) as in Definition 2.

Example 1 (Finite multisets). The element constructors for finite multisets are encoded exactly as with a W-type: the constructors are [] and x :: _ for each x : X. So we take A to be 1 ⊔ X, the coproduct of the unit type 1 (whose single constructor is denoted tt) with X. The arity of [] is zero, and the arity of each x :: _ is one, represented by the empty type ∅ and unit type 1 respectively; so we take B : A → Set to be the function [λ_ → ∅ | λ_ → 1] : 1 ⊔ X → Set mapping inl tt to ∅ and each inr x to 1. The swap equality constructor is parameterised by elements of E = X × X. For each (x, y) : E, swap x y yields an equation involving a single free variable (called ys : Bag X in Figure 1); so we take V : E → Set to be λ_ → 1. Each side of the equation named by swap x y is coded by an element of T{Σ}(V (x, y)) = T{Σ}(1). Recalling the definition of T from (5), the single free variable corresponds to η tt : T{Σ}(1), and then the left-hand side of the equation is σ(inr x, (λ_ → σ(inr y, (λ_ → η tt)))) and the right-hand side is σ(inr y, (λ_ → σ(inr x, (λ_ → η tt)))). So, altogether, the signature and system of equations for the QW-type corresponding to the first example in Figure 1 is:

  A = 1 ⊔ X                 E = X × X
  B = [λ_ → ∅ | λ_ → 1]     V = λ_ → 1
  l = λ(x, y) → σ(inr x, (λ_ → σ(inr y, (λ_ → η tt))))
  r = λ(x, y) → σ(inr y, (λ_ → σ(inr x, (λ_ → η tt))))

(The subscript on ≡ will be treated as an implicit argument and omitted when clear.)

Example 2 (Unordered countably-branching trees). Here the element constructors are leaf of arity zero and, for each x : X, node x of arity N. So we use the signature with A = 1 ⊔ X and B = [λ_ → ∅ | λ_ → N].
The perm equality constructor is parameterised by elements of

  E = X × (Σ f : (N → N), isIso f)

For each element (x, f, i) of that type, perm x f i yields an equation involving an N-indexed family of variables (called g : N → ωTree X in Figure 1); so we take V : E → Set to be λ_ → N. Each side of the equation named by perm x f i is coded by an element of T{Σ}(V (x, f, i)) = T{Σ}(N). The N-indexed family of variables is represented by the function η : N → T{Σ}(N) and its permuted version by η ◦ f. Thus the left- and right-hand sides of the equation named by perm x f i are coded respectively by the elements σ(inr x, η) and σ(inr x, η ◦ f) of T{Σ}(N). So, altogether, the signature and system of equations for the QW-type corresponding to the second example in Figure 1 is:

  A = 1 ⊔ X                 E = X × (Σ f : (N → N), isIso f)
  B = [λ_ → ∅ | λ_ → N]     V = λ_ → N
  l = λ(x, _, _) → σ(inr x, η)
  r = λ(x, f, _) → σ(inr x, η ◦ f)

That unordered countably-branching trees are a QW-type is significant, since no previous work on various subclasses of QITs (or indeed QIITs [19, 10]) supports infinitary QITs [6, 26, 28, 12, 19, 10]. See Example 5 for another, more substantial infinitary QW-type. So this extension represents one of our main contributions. QW-types generalise prior developments; the internal encodings for particular subclasses of 1-HITs given by Sojakova [26] and Swan [28] are direct instances of QW-types, as the next two examples show.

Example 3. W-suspensions [26] are an instance of QW-types. The data for a W-suspension is: A′, C′ : Set, a type family B′ : A′ → Set and functions l′, r′ : C′ → A′. The equivalent QW-type is:

  A = A′    E = C′                              l = λ c → σ(l′ c, η)
  B = B′    V = λ c → (B′(l′ c)) × (B′(r′ c))   r = λ c → σ(r′ c, η)

Example 4. The non-indexed case of W-types with reductions [28] are QW-types. The data of such a type is: Y : Set, X : Y → Set and a reindexing map R : (y : Y) → X y. The reindexing map identifies a term σ(y, α) with the subterm α(R y) used to construct it.
The equivalent QW-type is given by:

  A = Y    E = Y    l = λ y → σ(y, η)
  B = X    V = X    r = λ y → η (R y)

Example 5. Lumsdaine and Shulman [21, Section 9] give an example of a HIT not constructible in type theory from only pushouts and N. Their HIT F can be thought of as a set of notations for countable ordinals. It consists of three point constructors: 0 : F, S : F → F, and sup : (N → F) → F, and five path constructors, which are omitted here for brevity. It is inspired by the infinitary algebraic theory of Blass [7, Section 9] and hence it is not surprising that it can be encoded by a QW-type; the details can be found in our Agda code.

3.1 General QIT schemas

Basold, Geuvers, and van der Weide [6] present a schema (though not a model) for infinitary QITs that do not support conditional path equations. Constructors are defined by arbitrary polynomial endofunctors built up using (non-dependent) products and sums, which means in particular that parameters and arguments can occur in any order. They require constructors to be in uncurried form. Dybjer and Moeneclaey [12, Sections 3.1 and 3.2] present a schema for finitary QITs that supports conditional path equations, where constructors are allowed to take inductive arguments not just of the datatype being declared, but also of its identity type. This schema can be generalised to infinitary QITs with conditional path equations. We believe this extension of their schema to be the most general schema for QITs. The schema requires all parameters to appear before all arguments, whereas the schema for regular inductive types in Agda is more flexible, allowing parameters and arguments in any order. We wish to combine the schema for infinitary QITs of Basold, Geuvers, and van der Weide [6] with the schema for QITs with conditional path equations of Dybjer and Moeneclaey [12] to provide a general schema.
Moreover, we would like to combine the arbitrarily ordered parameters and arguments of the former with the curried constructors of the latter in order to support flexible pattern matching. For consistency with the definition of inductive types in Agda [9, equation (25) and figure 1] we will define strictly positive (i.e. polynomial) endofunctors in terms of strictly positive telescopes. A telescope is given by the grammar:

  Δ ::= ·                            empty telescope                  (18)
     |  (x : A)Δ   (x ∉ dom(Δ))      non-empty telescope

A telescope extension (x : A)Δ binds (free) occurrences of x inside the tail Δ. The type A may contain free variables that are later bound by further telescope extensions on the left. A telescope can also exist in a context which binds any free variables not already bound in the telescope. Such a context is implicit in the following definitions. A function type Δ → C from a telescope Δ to a type C is defined as an iterated dependent function type by:

  · → C        ≝ C                                                    (19)
  (x : A)Δ → C ≝ (x : A) → (Δ → C)

A strictly positive endofunctor on a variable Y is presented by a strictly positive telescope

  Δ = (x₁ : Φ₁(Y))(x₂ : Φ₂(Y)) ⋯ (xₙ : Φₙ(Y))                         (20)

where each type scheme Φ is described by an expression on Y made up of Π-types, Σ-types, and any (previously defined "constant") types A not containing Y, according to the grammar:

  Φ(Y), Ψ(Y) ::= (y : A) → Φ(Y) | Σ p : Φ(Y), Ψ(Y) | A | Y            (21)

For example, Δ ≝ (x : X)(f : N → Y) is the strictly positive telescope for the node constructor in Figure 1. In this instance, reordering x and f is permitted by exchange. Note that the variable Y can never appear in the argument position of a Π-type. Now it is possible to define the form of the endpoints of an equality (within the context of a strictly positive telescope), corresponding to the notion of an abstract syntax tree with free variables.
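The fold (19) that turns a telescope into an iterated function type can be sketched as a right-to-left fold. This Python rendering works at the level of strings and is purely illustrative; the helper name is not from the paper.

```python
# A sketch of (19): (x1:A1)...(xn:An) -> C  becomes
# (x1:A1) -> ... -> (xn:An) -> C. `telescope_to_fun` is a hypothetical name.
def telescope_to_fun(delta, c):
    """Fold a telescope (list of (name, type) pairs) into a curried type."""
    out = c
    for name, ty in reversed(delta):       # innermost binder first
        out = f"({name} : {ty}) → {out}"
    return out

# The telescope of the node constructor from Figure 1:
assert telescope_to_fun([("x", "X"), ("f", "N → Y")], "Y") \
       == "(x : X) → (f : N → Y) → Y"
```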
With this intuition in mind, we can take the definition in Dybjer and Moeneclaey's presentation [12] of endpoints given by point constructor patterns:

  l, r, p ::= c k | y                                                 (22)

where y : Y is in the context of the telescope for the equality constructor, and k is a term built without any rule for Y, but which may use other point constructor patterns p : Y. (That is, any sub-term of type Y must either be a variable y : Y found in the telescope, or a constructor for Y applied to further point constructor patterns and earlier defined constants. It could not, for instance, use the function application rule for Y with some function g : M → Y, not least since such functions cannot be defined before defining Y.) Note that this exactly matches the type T in (5). Basold, Geuvers, and van der Weide's presentation has a slightly more general notion of constructor term [6, Definition 6] (Dybjer and Moeneclaey's presentation [12] has more restricted telescopes). It is defined by rules which operate in the context of a strictly positive (polynomial) telescope and permit use of its bound variables, and the use of the constructors c, but not any other rules for Y. We take the dependent form of their rules for products and functions. Note that these rules do not allow the use of terms of type ≡ in the endpoints. As with inductive types, the element constructors of QITs are specified by strictly positive telescopes. The equality constructors also permit conditions to appear in strictly positive positions, where l and r are constructor terms according to grammar (22):

  Φ(Y), Ψ(Y) ::= (same grammar as in (21)) | l ≡_Y r                  (23)

Definition 3.
A QIT is defined by a list of named element constructors and equality constructors:

  data Y : Set where
    c₁ : Δ₁ → Y
    ⋮
    cₙ : Δₙ → Y
    p₁ : Θ₁ → l₁ ≡_Y r₁
    ⋮
    pₘ : Θₘ → lₘ ≡_Y rₘ

where the Δᵢ are strictly positive telescopes on Y according to (21), and the Θⱼ are strictly positive telescopes on Y and ≡_Y in which conditions may also occur in strictly positive positions according to (23). QITs without equality constructors are inductive types. If none of the equality constructors contain Y in an argument position then it is called non-recursive, otherwise it is called recursive [6]. If none of the equality constructors contain an equality in Y then we call it a non-conditional, or equational, QIT, otherwise it is called a conditional [12], or quasi-equational, QIT. If all of the constant types A in any of the constructors are finite (isomorphic to Fin n for some n : N) then it is called a finitary QIT [12]. Otherwise, it is called a generalised [12], or infinitary, QIT. We are not aware of any existing examples in the literature of HITs which allow the point constructors to be conditional (though it is not difficult to imagine them), nor any schemes for HITs that allow such definitions. However, we do believe this is worth investigating further.

Conjecture 1. Any equational QIT can be encoded as a QW-type.

We believe this can be proved analogously to the approach of Dybjer [11] for inductive types, though the endpoints still need to be considered and we have not yet translated the schema in Definition 3 into Agda.

Remark 3. Assuming Conjecture 1, Basold, Geuvers, and van der Weide's schema [6], being an equational (non-conditional) instance of Definition 3, can be encoded as a QW-type.

4 Construction of QW-types

In Section 2 we defined a QW-type to be initial among algebras over a given (possibly infinitary) signature satisfying a given system of equations (Definition 2).
If one interprets these notions in classical Zermelo-Fraenkel set theory with the Axiom of Choice (ZFC), one regains the usual notion from universal algebra of initial algebras for infinitary equational theories. Since in the set-theoretic interpretation there is an upper bound on the cardinality of arities of operators in a given signature Σ, the ordinal-indexed sequence S^α(∅) of iterations of the functor in (2) starting from the empty set eventually becomes stationary; and so the sequence has a small colimit, namely the set W{Σ} of well-founded trees over Σ. A system of equations ε (Definition 1) over Σ generates a Σ-congruence relation ∼ on W{Σ}. The quotient set W{Σ}/∼ yields the desired initial algebra for (Σ, ε) provided the S-algebra structure on W{Σ} induces one on the quotient set. It does so, because for each operator, using AC one can pick representatives of the (possibly infinitely many) equivalence classes that are the arguments of the operator, apply the interpretation of the operator in W{Σ} and then take the equivalence class of that. So the set-theoretic model of type theory in ZFC models QW-types. Is this use of choice really necessary? Blass [7, Section 9] shows that if one drops AC and just works in ZF, then, provided a certain large cardinal axiom is consistent with ZFC, it is consistent with ZF that there is an infinitary equational theory with no initial algebra. He shows this by first exhibiting a countably presented equational theory whose initial algebra has to be an uncountable regular cardinal; and secondly appealing to the construction of Gitik [17] of a model of ZF with no uncountable regular cardinals (assuming a certain large cardinal axiom). Lumsdaine and Shulman [21] turn the infinitary equational theory of Blass into a higher-inductive type that cannot be proved to exist in ZF (and hence cannot be constructed in type theory just using pushouts and the natural numbers).
We noted in Example 5 that this higher inductive type can be presented as a QW-type. So one cannot hope to construct QW-types using a type theory which is interpretable in just ZF. However, the type theory in which we work, with its universes closed under inductive-inductive definitions, already requires going beyond ZF to be able to give it a naive, classical set-theoretic interpretation (by assuming the existence of enough strongly inaccessible cardinals, for example). So the above considerations about initial algebras for infinitary equational theories in classical set theory do not rule out the construction of QW-types in the type theory in which we work. However, something more than just quotienting a W-type is needed in order to prove Theorem 1. Figure 2 gives a first attempt to do this (which later we will modify using sized types to get around a termination problem). The definition is relative to a given signature Σ : Sig and system of equations ε = (E, V, l, r) : Syseq{Σ}. It makes use of quotient types, which we add to Agda via postulates, as shown in Figure 3. The REWRITE pragma makes elim R B f e (mk R x) definitionally equal to f x and is not merely a computational convenience: this is what allows function extensionality to be proved from these postulated quotient types. The POLARITY pragmas enable the postulated quotients to be used in datatype declarations at positions that Agda deems to be strictly positive; a case in point being the definitions of Q₀ and Q₁ in Figure 2. Agda's test for strict positivity is sound with respect to a set-theoretic semantics of inductively defined datatypes that are built up using strictly positive uses of dependent functions; the semantics of such datatypes uses initial algebras for endofunctors possessing a rank. (The actual implementation is polymorphic in universe levels, but for simplicity here we just give the level-zero version.) Here we
are allowing the inductively defined datatypes to be built up using quotients as well, but this is semantically unproblematic, since quotienting does not increase rank. (Later we need to combine the use of POLARITY with sized types; the semantics of this has been studied for System F [3], but needs to be explored further for Agda.)

  mutual
    data Q₀ : Set where
      sq : T Q → Q₀

    data Q₁ : Q₀ → Q₀ → Set where
      sqeq : (e : E)(ρ : V e → Q) → Q₁ (sq (T' ρ (l e))) (sq (T' ρ (r e)))
      sqη  : (x : Q₀) → Q₁ (sq (η (qu x))) x
      sqσ  : (s : S (T Q)) → Q₁ (sq (σ s)) (sq (ι (S' (qu ◦ sq) s)))

    Q : Set
    Q = Q₀ / Q₁

    qu : Q₀ → Q
    qu = quot.mk Q₁

  QW{Σ}{ε} = Q

Figure 2. First attempt at constructing QW-types

We build up the underlying inductive type Q₀ to be quotiented using a constructor sq that takes well-founded trees T(Q₀/Q₁) of whole equivalence classes with respect to a relation Q₁ that is mutually inductively defined with Q₀, an instance of an inductive-inductive definition [15]. The definition of Q₁ makes use of the actions on functions of the signature endofunctor S and its associated free monad T (Section 2); those actions are defined as follows:

  S' : {X Y : Set} → (X → Y) → S X → S Y                              (24)
  S' f (a, b) = (a, f ◦ b)

  T' : {X Y : Set} → (X → Y) → T X → T Y                              (25)
  T' f t = t >>= (η ◦ f)

The definition of Q₁ also uses the natural transformation ι : {X : Set} → S X → T X defined by ι = σ ◦ S' η. Turning to the proof of Theorem 1 using the definitions in Figure 2, the S-algebra structure (9) is easy to define without using any form of choice, because of the type of Q₀'s constructor sq. Indeed, we can just take qwintro to be qu ◦ sq ◦ ι : S(QW) → QW. (The use of the free monad T{Σ} in the domain of sq, rather than just S{Σ}, seems necessary in order to define Q₁ with the properties needed for (10)–(13).) The first constructor sqeq of the data type Q₁ ensures that the quotient Q₀/Q₁ satisfies the equations in ε, so that we get qwequ as in (10); and the other two constructors, sqη and sqσ, make identifications that
  module quot where
    postulate
      ty   : {A : Set}(R : A → A → Set) → Set
      mk   : {A : Set}(R : A → A → Set) → A → ty R
      eq   : {A : Set}(R : A → A → Set){x y : A} → R x y → mk R x ≡ mk R y
      elim : {A : Set}(R : A → A → Set)(B : ty R → Set)(f : (x : A) → B (mk R x))
             (e : {x y : A} → R x y → f x ≡≡ f y)(z : ty R) → B z
      comp : {A : Set}(R : A → A → Set)(B : ty R → Set)(f : (x : A) → B (mk R x))
             (e : {x y : A} → R x y → f x ≡≡ f y)(x : A) → elim R B f e (mk R x) ≡ f x
    {-# REWRITE comp #-}
    {-# POLARITY ty ++ ++ #-}
    {-# POLARITY mk _ _ * #-}

  _/_ : (A : Set)(R : A → A → Set) → Set
  A / R = quot.ty R

Figure 3. Quotient types

enable the construction of functions qwrec, qwrechom and qwuniq as in (11)–(13). However, there is a problem. Given X : Set, s : S X → X and e : Sat X, for qwrec X s e we have to construct a function r : Q → X. Since Q = Q₀/Q₁ is a quotient, we will have to use the eliminator quot.elim from Figure 3 to define r. The following is an obvious candidate definition:

  mutual                                                              (26)
    r : Q → X
    r = quot.elim Q₁ (λ_ → X) r₀ r₁

    r₀ : Q₀ → X
    r₀ (sq t) = t >>= r

    r₁ : {x y : Q₀} → Q₁ x y → r₀ x ≡ r₀ y
    r₁ = ⋯

(where we have elided the details of the invariance proof r₁). The problem with this mutually recursive definition is that it is not clear to us (and certainly not to Agda) whether it gives totally defined functions: although the value of r₀ at a typical element sq t is explained in terms of the structurally smaller element t, the explanation involves r, whose definition uses the whole function r₀ rather than some application of it at a structurally smaller argument. Agda's termination checker rejects the definition. We get around this problem by using a type-based termination method, namely Agda's implementation of sized types [2].
Intuitively, this provides a type Size of "sizes" which give a constructive abstraction of features of ordinals in ZF when they are used to index sequences of sets that eventually become stationary, such as in various transfinite constructions of free algebras [20, 14]. In Agda, the type Size comes equipped with various relations and functions: given sizes i, j : Size, there is a relation i : Size< j to indicate strictly increasing size (so the type Size< j is treated as a subtype of Size); there is a successor operation ↑ : Size → Size (and also a join operation _⊔_ : Size → Size → Size, but we do not need it here); and a size ∞ : Size to indicate where a sequence becomes stationary. Thus we construct the QW-type QW{Σ}{ε} as Q ∞ for a suitable size-indexed sequence of types Q : Size → Set, shown in Figure 4.

  mutual
    data Q₀ (i : Size) : Set where
      sq : {j : Size< i} → T (Q j) → Q₀ i

    data Q₁ (i : Size) : Q₀ i → Q₀ i → Set where
      sqeq : {j : Size< i}(e : E)(ρ : V e → Q j) →
             Q₁ i (sq (T' ρ (l e))) (sq (T' ρ (r e)))
      sqη  : {j : Size< i}(x : Q₀ j) → Q₁ i (sq (η (qu j x))) (φ i x)
      sqσ  : {j : Size< i}{k : Size< j}(s : S (T (Q k))) →
             Q₁ i (sq (σ s)) (sq (ι (S' (qu j ◦ sq) s)))

    Q : Size → Set
    Q i = (Q₀ i) / (Q₁ i)

    qu : (i : Size) → Q₀ i → Q i
    qu i = quot.mk (Q₁ i)

    φ : (i : Size){j : Size< i} → Q₀ j → Q₀ i
    φ i (sq z) = sq z

  QW{Σ}{ε} = Q ∞

Figure 4. Construction of QW-types using sized types

For each size i : Size, the type Q i is a quotient (Q₀ i)/(Q₁ i), where the constructors of the data types Q₀ i and Q₁ i take arguments of smaller sizes j : Size< i. Consequently, in the following sized version of (26)

  mutual                                                              (27)
    r : {i : Size} → Q i → X
    r {i} = quot.elim (Q₁ i) (λ_ → X) (r₀ {i}) (r₁ {i})

    r₀ : {i : Size} → Q₀ i → X
    r₀ {i} (sq {j} t) = t >>= r {j}

    r₁ : {i : Size}{x y : Q₀ i} → Q₁ i x y → r₀ x ≡ r₀ y
    r₁ = ⋯

the definition of r₀ {i} involves a recursive call via r to the whole function r₀, but at a size j which is smaller than i.
So now Agda accepts that the definition of qwrec X s e as r {∞}, with r as in (27), is terminating. Thus we get a function qwrec for (11). We still have (9), but now with qwintro = qu ∞ ◦ sq {∞} ◦ ι; and as before, the constructor sqeq of Q₁ in Figure 4 ensures that QW = (Q₀ ∞) / (Q₁ ∞) satisfies the equations ε. With these definitions it turns out that each qwrec X s e is an S-algebra morphism up to definitional equality, so that the function qwrechom needed for (12) is straightforward to define. Finally, the function qwuniq needed for (13) can be constructed via a sequence of lemmas making use of the other two constructors of the data type Q₁, namely sqη, which makes use of an auxiliary function for coercing between different size instances of Q₀, and sqσ. We refer the reader to the accompanying Agda code (doi: 10.17863/CAM.48187) for the details of the construction of qwuniq. Altogether, the sized definitions in Figure 4 allow us to complete a proof of Theorem 1.

5 Conclusion

QW-types are a general form of QIT that capture many examples, including simple 1-cell complexes and non-recursive QITs [6], non-structural QITs [26], W-types with reductions [28], and also infinitary QITs (e.g. unordered infinitely branching trees [5] and ordinals [21]). They also capture the notion of initial (and free) algebras for strictly positive equational systems [14], analogously to how W-types capture the notion of initial (and free) algebras for strictly positive endofunctors (see Remark 2). Using Agda to formalise our results, we have shown that it is possible to construct any QW-type, even infinitary ones, in intensional type theory satisfying UIP, using inductive-inductive definitions permitting strictly positive occurrences of quotients, and sized types (see Theorem 1 and Section 4). We conclude by mentioning related work and some possible directions for future work.

Quotients of monads.
In view of Remark 2, Section 4 gives a construction of initial algebras for equational systems [14] on the free monad T{Σ} generated by a signature Σ. By a suitable change of signature (see Remark 1), this extends to a construction of free algebras, rather than just initial ones. We can show that the construction works for an arbitrary strictly positive monad, and not just for free ones. Given such a construction, one gets a quotient monad morphism from the base monad to the quotient monad. This contravariantly induces a forgetful functor from the algebras of the latter to those of the former. Using the adjoint triangle theorem, one should be able to construct a left adjoint. This would then cover examples such as the free group over a monoid, the free ring over a group, etc.

Quotient inductive-inductive types. The notion of QW-type generalises to indexed QW-types, analogously to the generalisation of W-types to Petersson–Synek trees for inductively defined indexed families of types [24, Chapter 16], and we will consider it in subsequent work. More generally, we wonder whether our analysis of QITs using quotients, inductive-inductive definitions and sized types can be extended to cover the notion of quotient inductive-inductive type (QIIT) [4,19]. Dijkstra [10] studies such types in depth and, in Chapter 6 of his thesis, gives a construction for finitary ones in terms of countable colimits, and hence in terms of countable coproducts and quotients. One could hope to pass to the infinitary case by using sized types as we have done, provided an analogue for QIITs can be found of the monadic construction in Section 4 for our class of QITs, the QW-types. Kaposi, Kovács, and Altenkirch [19] give a specification of finitary QIITs using a domain-specific type theory called the theory of signatures, and prove the existence of QIITs matching this specification.
It might be possible to encode their theory of signatures using QW-types (it can already be encoded as a QIIT), or to extend QW-types to make this possible. This would allow infinitary QIITs.

Schemas for QITs. We have shown by example that QW-types can encode a wide range of QITs. However, we have yet to extend this to a proof of Conjecture 1, that every instance of the schema for QITs considered in Section 3 can be so encoded.

Conditional path equations. In Section 3 we mentioned the fact that Dybjer and Moeneclaey [12] give a model for finitary 1-HITs and 2-HITs in which constructors are allowed to take arguments involving the identity type of the datatype being declared. On the face of it, QW-types are not able to encode such conditional QITs. We plan to consider whether it is possible to extend the notion of QW-type to allow the encoding of infinitary QITs with such conditional equations.

Homotopy Type Theory (HoTT). Our development makes use of UIP (and heterogeneous equality), which is well known to be incompatible with the Univalence Axiom [29, Example 3.1.9]. Given the interest in HoTT, it is certainly worth investigating whether a result like Theorem 1 holds in univalent foundations for a suitably coherent version of QW-types. We are currently investigating this using set-truncation.

Pattern matching for QITs and HITs. Our reduction of QITs to induction-induction, strictly positive quotients and sized types is of theoretical interest, but in practice one could wish for more direct support in systems like Agda, Lean and Coq for the very useful notion of quotient inductive types (or, more generally, for higher inductive types). Even having better support for the special case of quotient types would be welcome. It is not hard to envisage the addition of a general schema for declaring QITs; but when it comes to defining functions on them, having to do that with eliminator forms rapidly becomes cumbersome (for example, for functions of several QIT arguments).
Some extension of dependently typed pattern matching to cover equality constructors as well as element constructors is needed, and the third author has begun work on that, based on the approach of Cockx and Abel [9]. In this context it is worth mentioning that the cubical features of recent versions of Agda give access to cubical type theory [30]. This allows for easy declaration of HITs, and hence in particular QITs (and quotients, avoiding the need for POLARITY pragmas), and a certain amount of pattern matching when it comes to defining functions on them: the value of a function on a path constructor can be specified by using generic elements of the interval type in point-level patterns; but currently the user is given little mechanised assistance to solve the definitional equality constraints on the end-points of paths that are generated by this method.

References

1. Abbott, M., Altenkirch, T., Ghani, N.: Containers: Constructing strictly positive types. Theoretical Computer Science vol. 342 (1), 3–27 (2005). doi: 10.1016/j.tcs.2005.06.002.
2. Abel, A.: Type-Based Termination, Inflationary Fixed-Points, and Mixed Inductive-Coinductive Types. Electronic Proceedings in Theoretical Computer Science vol. 77, 1–11 (2012). doi: 10.4204/EPTCS.77.1.
3. Abel, A., Pientka, B.: Well-Founded Recursion with Copatterns and Sized Types. J. Funct. Prog. vol. 26, e2 (2016). doi: 10.1017/S0956796816000022.
4. Altenkirch, T., Capriotti, P., Dijkstra, G., Kraus, N., Nordvall Forsberg, F.: Quotient Inductive-Inductive Types. In: Baier, C., Dal Lago, U. (eds.) Foundations of Software Science and Computation Structures, FoSSaCS 2018, LNCS, vol. 10803, pp. 293–310. Springer, Heidelberg (2018).
5. Altenkirch, T., Kaposi, A.: Type Theory in Type Theory Using Quotient Inductive Types. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages - POPL 2016, pp. 18–29. ACM Press, St.
Petersburg, FL, USA (2016). doi: 10.1145/2837614.2837638.
6. Basold, H., Geuvers, H., van der Weide, N.: Higher Inductive Types in Programming. Journal of Universal Computer Science vol. 23 (1), 27 (2017). doi: 10.3217/jucs-023-01-0063.
7. Blass, A.: Words, Free Algebras, and Coequalizers. Fundamenta Mathematicae vol. 117 (2), 117–160 (1983).
8. Cockx, J., Abel, A.: "Sprinkles of Extensionality for Your Vanilla Type Theory". Abstract for the 22nd International Conference on Types for Proofs and Programs (TYPES 2016), Novi Sad, Serbia.
9. Cockx, J., Abel, A.: Elaborating Dependent (Co)Pattern Matching. Proceedings of the ACM on Programming Languages vol. 2, 1–30 (2018). doi: 10.1145/3236770.
10. Dijkstra, G.: Quotient Inductive-Inductive Definitions. PhD thesis, University of Nottingham (2017).
11. Dybjer, P.: Representing Inductively Defined Sets by Wellorderings in Martin-Löf's Type Theory. Theoretical Computer Science vol. 176 (1–2), 329–335 (1997). doi: 10.1016/S0304-3975(96)00145-4.
12. Dybjer, P., Moeneclaey, H.: Finitary Higher Inductive Types in the Groupoid Model. Electronic Notes in Theoretical Computer Science vol. 336, 119–134 (2018). doi: 10.1016/j.entcs.2018.03.019.
13. Fiore, M.: An Equational Metalogic for Monadic Equational Systems. Theory and Applications of Categories vol. 27 (18), 464–492 (2013). url: https://emis.de/journals/TAC/volumes/27/18/27-18.pdf.
14. Fiore, M., Hur, C.-K.: On the Construction of Free Algebras for Equational Systems. Theoretical Computer Science vol. 410 (18), 1704–1729 (2009). doi: 10.1016/j.tcs.2008.12.052.
15. Forsberg, F.N., Setzer, A.: A Finite Axiomatisation of Inductive-Inductive Definitions. In: Berger, U., Diener, H., Schuster, P., Seisenberger, M. (eds.) Logic, Construction, Computation, Ontos mathematical logic, pp. 259–287. De Gruyter (2012). doi: 10.1515/9783110324921.259.
16. Gambino, N., Kock, J.: Polynomial Functors and Polynomial Monads. Math. Proc. Camb. Phil. Soc. vol.
154 (1), 153–192 (2013). doi: 10.1017/S0305004112000394.
17. Gitik, M.: All Uncountable Cardinals Can Be Singular. Israel J. Math. vol. 35 (1–2), 61–88 (1980).
18. Hofmann, M.: Extensional Concepts in Intensional Type Theory. PhD thesis, University of Edinburgh (1995).
19. Kaposi, A., Kovács, A., Altenkirch, T.: Constructing Quotient Inductive-Inductive Types. Proc. ACM Program. Lang. vol. 3, 1–24 (2019). doi: 10.1145/3290315.
20. Kelly, M.: A Unified Treatment of Transfinite Constructions for Free Algebras, Free Monoids, Colimits, Associated Sheaves, and so on. Bull. Austral. Math. Soc. vol. 22, 1–83 (1980).
21. Lumsdaine, P.L., Shulman, M.: Semantics of Higher Inductive Types. Math. Proc. Camb. Phil. Soc. (2019). doi: 10.1017/S030500411900015X.
22. Martin-Löf, P.: Constructive Mathematics and Computer Programming. In: Cohen, L.J., Łoś, J., Pfeiffer, H., Podewski, K.-P. (eds.) Studies in Logic and the Foundations of Mathematics, pp. 153–175. Elsevier (1982). doi: 10.1016/S0049-237X(09)70189-2.
23. McBride, C.: Dependently Typed Functional Programs and their Proofs. PhD thesis, University of Edinburgh (1999).
24. Nordström, B., Petersson, K., Smith, J.M.: Programming in Martin-Löf's Type Theory. Oxford University Press (1990).
25. Shulman, M.: Brouwer's Fixed-Point Theorem in Real-Cohesive Homotopy Type Theory. Mathematical Structures in Computer Science vol. 28, 856–941 (2018).
26. Sojakova, K.: Higher Inductive Types as Homotopy-Initial Algebras. In: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages - POPL '15, pp. 31–42. ACM Press, Mumbai, India (2015). doi: 10.1145/2676726.2676983.
27. Streicher, T.: Investigations into Intensional Type Theory. Habilitation Thesis, Ludwig Maximilian University (1993).
28. Swan, A.: W-Types with Reductions and the Small Object Argument. (2018). arXiv:1802.07588 [math].
29.
The Univalent Foundations Program: Homotopy Type Theory: Univalent Foundations for Mathematics. Institute for Advanced Study (2013).
30. Vezzosi, A., Mörtberg, A., Abel, A.: Cubical Agda: A Dependently Typed Programming Language with Univalence and Higher Inductive Types. Proc. ACM Program. Lang. vol. 3 (ICFP), 87:1–87:29 (2019). doi: 10.1145/3341691.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Relative full completeness for bicategorical cartesian closed structure

Marcelo Fiore¹ and Philip Saville²

¹ Department of Computer Science and Technology, University of Cambridge, UK
² School of Informatics, University of Edinburgh, UK

Abstract. The glueing construction, defined as a certain comma category, is an important tool for reasoning about type theories, logics, and programming languages. Here we extend the construction to accommodate '2-dimensional theories' of types, terms between types, and rewrites between terms. Taking bicategories as the semantic framework for such systems, we define the glueing bicategory and establish a bicategorical version of the well-known construction of cartesian closed structure on a glueing category.
As an application, we show that free finite-product bicategories are fully complete relative to free cartesian closed bicategories, thereby establishing that the higher-order equational theory of rewriting in the simply-typed lambda calculus is a conservative extension of the algebraic equational theory of rewriting in the fragment with finite products only.

Keywords: glueing, bicategories, cartesian closure, relative full completeness, rewriting, type theory, conservative extension

1 Introduction

Relative full completeness for cartesian closed structure. Every small category C can be viewed as an algebraic theory. This has sorts the objects of C, with unary operators for each morphism of C and equations determined by the equalities in C. Suppose one freely extends C with finite products. Categorically, one obtains the free cartesian category F^×[C] on C. From the well-known construction of F^×[C] (see e.g. [12] and [46, §8]) it is direct that the universal functor C → F^×[C] is fully faithful, a property we will refer to as the relative full completeness (c.f. [2,16]) of C in F^×[C]. Type-theoretically, F^×[C] corresponds to the Simply-Typed Product Calculus (STPC) over the algebraic theory of C, given by taking the fragment of the Simply-Typed Lambda Calculus (STLC) consisting of just the types, rules, and equational theory for products. Relative full completeness corresponds to the STPC being a conservative extension.

Consider now the free cartesian closed category F^{×,→}[C] on C, type-theoretically corresponding to the STLC over the algebraic theory of C. Does the relative full completeness property, and hence conservativity, still hold for either C in F^{×,→}[C] or for F^×[C] in F^{×,→}[C]? Precisely, is either the universal functor C → F^{×,→}[C] or its universal cartesian extension F^×[C] → F^{×,→}[C] full and faithful?

© The Author(s) 2020
J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 277–298, 2020.
The answer is affirmative, but the proof is non-trivial. One must either reason proof-theoretically (e.g. in the style of [63, Chapter 8]) or employ semantic techniques such as glueing [39, Annexe C]. In this paper we consider the question of relative full completeness in the bicategorical setting. This corresponds to the question of conservativity for 2-dimensional theories of types, terms between types, and rewrites between terms (see [32,20]). We focus on the particular case of the STLC with invertible rewrites given by β-reductions and η-expansions, and its STPC fragment. By identifying these two systems with cartesian closed, resp. finite-product, structure 'up to isomorphism' one recovers a conservative extension result for rewrites akin to that for terms.

2-dimensional categories and rewriting. It has been known since the 1980s that one may consider 2-dimensional categories as abstract reduction systems (e.g. [54,51]): if sorts are 0-cells (objects) and terms are 1-cells (morphisms), then rewrites between terms ought to be 2-cells. Indeed, every sesquicategory (of which 2-categories are a special class) generates a rewriting relation ⇝ on its 1-cells, defined by f ⇝ g if and only if there exists a 2-cell f ⇒ g (e.g. [60,58]). Invertible 2-cells may then be thought of as equality witnesses. The rewriting rules of the STLC arise naturally in this framework: Seely [56] observed that β-reduction and η-expansion may be respectively interpreted as the counit and unit of the adjunctions corresponding to lax (directed) products and exponentials in a 2-category (c.f. also [34,27]). This approach was taken up by Hilken [32], who developed a '2-dimensional λ-calculus' with strict products and lax exponentials to study the proof theory of rewriting in the STLC (c.f. also [33]). Our concern here is with equational theories of rewriting, and we follow Seely in viewing weak categorical structure as a semantic model of rewriting modulo an equational theory.
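For binary products, Seely's observation may be spelled out as follows (our transcription in the notation of Section 2; the 2-cells below are the counit ϖ and unit ς of the product adjunction of Definition 4, specialised to n = 2):

```latex
% β-reduction for products: the counit of
% \langle \pi_1 \circ -, \pi_2 \circ - \rangle \dashv \langle -, = \rangle
\varpi^{(i)}_{t_1,t_2} \;:\; \pi_i \circ \langle t_1, t_2 \rangle
    \;\Longrightarrow\; t_i \qquad (i = 1, 2)

% η-expansion for products: the unit
\varsigma_t \;:\; t \;\Longrightarrow\; \langle \pi_1 \circ t,\; \pi_2 \circ t \rangle
```

When these 2-cells are merely required to exist (a lax structure) the relation ⇝ they generate is genuinely directed rewriting; requiring them to be invertible, as in the pseudo structure adopted in this paper, turns them into equality witnesses for an equational theory of rewriting.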
We are not aware of non-syntactic examples of 2-dimensional cartesian closed structure that are lax but not pseudo (i.e. up to isomorphism), and so adopt cartesian closed bicategories as our semantic framework. From the perspective of rewriting, a sesquicategory embodies the rewriting of terms modulo the monoid laws for identities and composition, while a bicategory embodies the rewriting of terms modulo the equational theory on rewrites given by the triangle and pentagon laws of a monoidal category. Cartesian closed bicategories further embody the usual β-reductions and η-expansions of the STLC modulo an equational theory on rewrites; for instance, this identifies the composite rewrite ⟨t₁, t₂⟩ ⇒ ⟨π₁⟨t₁, t₂⟩, π₂⟨t₁, t₂⟩⟩ ⇒ ⟨t₁, t₂⟩ with the identity rewrite. Indeed, in the free cartesian closed bicategory over a signature of base types and constant terms, the quotient of 1-cells by the isomorphism relation provided by 2-cells is in bijection with αβη-equivalence classes of STLC-terms (c.f. [55, Chapter 5]).

Bicategorical relative full completeness. The bicategorical notion of relative full completeness arises by generalising from functors that are fully faithful to pseudofunctors F : B → C that are locally an equivalence, that is, for which every hom-functor F_{X,Y} : B(X, Y) → C(FX, FY) is an equivalence of categories. Interpreted in the context of rewriting, this amounts to the conservativity of rewriting theories. First, the equational theory of rewriting in C is conservative over that in B: the hom-functors do not identify distinct rewrites. Second, the reduction relation in C(FX, FY) is conservative over that in B(X, Y): whenever Ff ⇝ Fg in C then already f ⇝ g in B. Third, the term structure in B gets copied by F in C: modulo the equational theory of rewrites, there are no new terms between types in the image of F.

Contributions. This paper makes two main contributions.
Our first contribution, in Section 3, is to introduce the bicategorical glueing construction and, in Section 4, to initiate the development of its theory. As well as providing an assurance that our notion is the right one, this establishes the basic framework for applications. Importantly, we bicategorify the fundamental folklore result (e.g. [40,12,62]) establishing mild conditions under which a glued bicategory is cartesian closed.

Our second contribution, in Section 5, is to employ bicategorical glueing to show that for a bicategory B with finite-product completion F^×[B] and cartesian-closed completion F^{×,→}[B], the universal pseudofunctor B → F^{×,→}[B] and its universal finite-product-preserving extension F^×[B] → F^{×,→}[B] are both locally an equivalence. Since one may directly observe that the universal pseudofunctor B → F^×[B] is locally an equivalence, we obtain relative full completeness results for bicategorical cartesian closed structure mirroring those of the categorical setting. Establishing this proof-theoretically would require the development of a 2-dimensional proof theory. Given the complexities already present at the categorical level, this seems a serious and interesting undertaking. Here, once the basic bicategorical theory has been established, the proof is relatively compact. This highlights the effectiveness of our approach for the application.

The result may also be expressed type-theoretically. For instance, in terms of the type theories of [20], the type theory Λ^{×,→}_{ps} for cartesian closed bicategories is a conservative extension of the type theory Λ^×_{ps} for finite-product bicategories. It follows that, modulo the equational theory of bicategorical products and exponentials, any rewrite between STPC-terms constructed using the βη-rewrites for both products and exponentials may be equally presented as constructed from just the βη-rewrites for products (see [21,55]).

Further work.
We view the foundational theory presented here as the starting point for future work. For instance, we plan to incorporate further type structure into the development, such as coproducts (c.f. [22,16,4]) and monoidal structure (c.f. [31]). On the other hand, the importance of glueing in the categorical setting suggests that its bicategorical counterpart will find a range of applications. A case in point, which has already been developed, is the proof of a 2-dimensional normalisation property for the type theory Λ^{×,→}_{ps} for cartesian closed bicategories of [20] that entails a corresponding bicategorical coherence theorem [21,55]. There are also a variety of syntactic constructions in programming languages and type theory that naturally come with a 2-dimensional semantics (see e.g. the use of 2-categorical constructions in [23,14,6,61,35]). In such scenarios, bicategorical glueing may prove useful for establishing properties corresponding to the notions of adequacy and/or canonicity, or for proving further conservativity properties.

2 Cartesian closed bicategories

We begin by briefly recapitulating the basic theory of bicategories, including the definition of cartesian closure. A summary of the key definitions is in [41]; for a more extensive introduction see e.g. [5,7].

2.1 Bicategories

Bicategories axiomatise structures in which the associativity and unit laws of composition only hold up to coherent isomorphism, for instance when composition is defined by a universal property. They are rife in mathematics and theoretical computer science, appearing in the semantics of computation [29,11,49], datatype models [1,13], categorical logic [26], and categorical algebra [19,25,18].

Definition 1 ([5]). A bicategory B consists of
1. A class of objects ob(B),
2. For every X, Y ∈ ob(B) a hom-category (B(X, Y), •, id) with objects the 1-cells f : X → Y and morphisms the 2-cells α : f ⇒ f′ : X → Y; composition • of 2-cells is called vertical composition,
3.
For every X, Y, Z ∈ ob(B) an identity functor Id_X : 1 → B(X, X) (for 1 the terminal category) and a horizontal composition functor ◦_{X,Y,Z} : B(Y, Z) × B(X, Y) → B(X, Z),
4. Invertible 2-cells

   a_{h,g,f} : (h ◦ g) ◦ f ⇒ h ◦ (g ◦ f) : W → Z
   l_f : Id_X ◦ f ⇒ f : W → X
   r_g : g ◦ Id_X ⇒ g : X → Y

for every f : W → X, g : X → Y and h : Y → Z, natural in each of their parameters and satisfying a triangle law and a pentagon law analogous to those for monoidal categories.

A bicategory is said to be locally small if every hom-category is small.

Example 1.
1. Every 2-category is a bicategory in which the structural isomorphisms are all the identity.
2. For any category C with pullbacks there exists a bicategory of spans over C [5]. The objects are those of C, the 1-cells from A to B are spans (A ← X → B), and the 2-cells (A ← X → B) → (A ← X′ → B) are morphisms X → X′ making the expected diagram commute. Composition is defined using chosen pullbacks.

A bicategory has three notions of 'opposite', depending on whether one reverses 1-cells, 2-cells, or both (see e.g. [37, §1.6]). We shall only require the following.

Definition 2. The opposite of a bicategory B, denoted B^op, is obtained by setting B^op(X, Y) := B(Y, X) for all X, Y ∈ B.

A morphism of bicategories is called a pseudofunctor (or homomorphism) [5]. It is a mapping on objects, 1-cells and 2-cells that preserves horizontal composition up to isomorphism. Vertical composition is preserved strictly.

Definition 3. A pseudofunctor (F, φ, ψ) : B → C between bicategories B and C consists of
1. A mapping F : ob(B) → ob(C),
2. A functor F_{X,Y} : B(X, Y) → C(FX, FY) for every X, Y ∈ ob(B),
3. An invertible 2-cell ψ_X : Id_{FX} ⇒ F(Id_X) for every X ∈ ob(B),
4. An invertible 2-cell φ_{f,g} : F(f) ◦ F(g) ⇒ F(f ◦ g) for every g : X → Y and f : Y → Z, natural in f and g,
subject to two unit laws and an associativity law.
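The triangle and pentagon laws mentioned in Definition 1 are the standard ones from the monoidal-category case; we transcribe them here for reference (our addition, not spelled out in the text). For f : V → W, g : W → X, h : X → Y and k : Y → Z:

```latex
% Triangle law: the two ways of removing an interior identity agree.
(g \circ l_f) \bullet a_{g,\mathrm{Id}_X,f} \;=\; r_g \circ f
  \;:\; (g \circ \mathrm{Id}_X) \circ f \Longrightarrow g \circ f

% Pentagon law: the two ways of reassociating ((k ∘ h) ∘ g) ∘ f agree.
a_{k,h,g \circ f} \bullet a_{k \circ h,g,f}
  \;=\; (k \circ a_{h,g,f}) \bullet a_{k,h \circ g,f} \bullet (a_{k,h,g} \circ f)
```

Here • is vertical composition, read right-to-left, and whiskerings such as g ◦ l_f denote the action of the horizontal composition functor on 2-cells.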
A pseudofunctor for which φ and ψ are both the identity is called strict. A pseudofunctor is called locally P if every functor F_{X,Y} satisfies the property P.

Example 2. A monoidal category is equivalently a one-object bicategory; a monoidal functor is equivalently a pseudofunctor between one-object bicategories.

Pseudofunctors F, G : B → C are related by pseudonatural transformations. A pseudonatural transformation (k, k̄) : F ⇒ G consists of a family of 1-cells (k_X : FX → GX)_{X∈B} and, for every f : X → Y, an invertible 2-cell k̄_f : k_Y ◦ Ff ⇒ Gf ◦ k_X witnessing naturality. The 2-cells k̄_f are required to be natural in f and to satisfy two coherence axioms. A morphism of pseudonatural transformations is called a modification, and may be thought of as a coherent family of 2-cells.

Notation 1. For bicategories B and C we write Bicat(B, C) for the (possibly large) bicategory of pseudofunctors, pseudonatural transformations, and modifications (see e.g. [41]). If C is a 2-category, then so is Bicat(B, C). We write Cat for the 2-category of small categories, and think of the 2-category Bicat(B^op, Cat) as a bicategorical version of the presheaf category Set^{B^op}. As for presheaf categories, one must take care to avoid size issues. We therefore adopt the convention that when considering Bicat(B^op, Cat) the bicategory B is small or locally small as appropriate.

Example 3. For every bicategory B and X ∈ B there exists the representable pseudofunctor YX : B^op → Cat, defined by YX := B(−, X). The 2-cells φ and ψ are structural isomorphisms.

The notion of equivalence between bicategories is called biequivalence. A biequivalence B ≃ C consists of a pair of pseudofunctors F : B ⇄ C : G together with equivalences FG ≃ Id_C and GF ≃ Id_B in Bicat(C, C) and Bicat(B, B) respectively. Equivalences in an arbitrary bicategory are defined by analogy with equivalences of categories, see e.g. [42, pp. 28].

Remark 1.
The coherence theorem for monoidal categories [44, Chapter VII] generalises to bicategories: any bicategory is biequivalent to a 2-category [45] (see [42] for a readable summary of the argument). We are therefore justified in writing simply = for composites of a, l and r. As a rule of thumb, a category-theoretic proposition lifts to a bicategorical proposition so long as one takes care to weaken isomorphisms to equivalences and sprinkle the prefixes 'pseudo' and 'bi' in appropriate places. For instance, bicategorical adjoints are called biadjoints and bicategorical limits are called bilimits [59]. The latter may be thought of as limits in which every cone is filled by a coherent choice of invertible 2-cell. Bilimits are preserved by representable pseudofunctors and by right biadjoints. The bicategorical Yoneda lemma [59, §1.9] says that for any pseudofunctor P : B^op → Cat, evaluation at the identity determines a pseudonatural family of equivalences Bicat(B^op, Cat)(YX, P) ≃ PX. One may then deduce that the Yoneda pseudofunctor Y : B → Bicat(B^op, Cat) : X ↦ YX is locally an equivalence. Another 'bicategorified' lemma is the following, which we shall employ in Section 5.

Lemma 1.
1. For pseudofunctors F, G : B → C, if F ≃ G and G is locally an equivalence, then so is F.
2. For pseudofunctors F : A → B, G : B → C, H : C → D, if G ◦ F and H ◦ G are local equivalences, then so is F.

2.2 fp-Bicategories

It is convenient to directly consider all finite products, as this reduces the need to deal with the equivalent objects given by re-bracketing binary products. To avoid confusion with the 'cartesian bicategories' of Carboni and Walters [10,8], we call a bicategory with all finite products an fp-bicategory.

Definition 4. An fp-bicategory (B, Π_n(−)) is a bicategory B equipped with the following data for every A₁, …, A_n ∈ B (n ∈ ℕ):
1. A chosen object Π(A₁, …, A_n),
2. Chosen arrows π_k : Π(A₁, …, A_n) → A_k (k = 1, …, n), called projections,
3.
For every X ∈ B an adjoint equivalence

   ⟨π₁ ◦ −, …, π_n ◦ −⟩ : B(X, Π(A₁, …, A_n)) ⇄ ∏_{i=1}^{n} B(X, A_i) : ⟨−, …, =⟩    (1)

specified by choosing a family of universal arrows (see e.g. [44, Theorem IV.2]) with components ϖ^{(i)}_{f₁,…,f_n} : π_i ◦ ⟨f₁, …, f_n⟩ ⇒ f_i for i = 1, …, n.

We call the right adjoint ⟨−, …, =⟩ the n-ary tupling. Explicitly, the universal property of the counit ϖ = (ϖ^{(1)}, …, ϖ^{(n)}) is the following. For any finite family of 2-cells (α_i : π_i ◦ g ⇒ f_i : X → A_i)_{i=1,…,n}, there exists a 2-cell p†(α₁, …, α_n) : g ⇒ ⟨f₁, …, f_n⟩ : X → Π(A₁, …, A_n), unique such that

   ϖ^{(k)}_{f₁,…,f_n} • (π_k ◦ p†(α₁, …, α_n)) = α_k : π_k ◦ g ⇒ f_k

for k = 1, …, n. One thereby obtains a functor ⟨−, …, =⟩ and an adjunction as in (1) with counit ϖ = (ϖ^{(1)}, …, ϖ^{(n)}) and unit ς_g := p†(id_{π₁ ◦ g}, …, id_{π_n ◦ g}) : g ⇒ ⟨π₁ ◦ g, …, π_n ◦ g⟩. This defines a lax n-ary product structure: one merely obtains an adjunction in (1). One turns it into a bicategorical (pseudo) product by further requiring the unit and counit to be invertible. The terminal object 1 arises as Π().

We adopt the same notation as for categorical products, for example by writing ∏_{i=1}^{n} A_i for Π(A₁, …, A_n) and ∏_{i=1}^{n} f_i for ⟨f₁ ◦ π₁, …, f_n ◦ π_n⟩.

Example 4. The bicategory of spans over a lextensive category [9] has finite products; such a bicategory is biequivalent to its opposite, so these are in fact biproducts [38, Theorem 6.2]. Biproduct structure arises using the coproduct structure of the underlying category (c.f. the biproduct structure of the category of relations).

Remark 2 (c.f. Remark 1). fp-Bicategories satisfy the following coherence theorem: every fp-bicategory is biequivalent to a 2-category with 2-categorical products [52, Theorem 4.1]. Thus, we shall sometimes simply write = in diagrams for composites of 2-cells arising from either the bicategorical or product structure.
In pasting diagrams we shall omit such 2-cells completely (c.f. [30, Remark 3.1.16]; for a detailed exposition, see [64, Appendix A]). One may think of bicategorical product structure as an intensional version of the familiar categorical structure, except that the usual equations (e.g. [28]) are now witnessed by natural families of invertible 2-cells. It is useful to introduce explicit names for these 2-cells.

Notation 2. In the following, and throughout, we write A• for a finite sequence A₁, …, A_n.

Lemma 2. For any fp-bicategory (B, Π_n(−)) there exist canonical choices for the following natural families of invertible 2-cells:
1. For every (h_i : Y → A_i)_{i=1,…,n} and g : X → Y, a 2-cell post(h•; g) : ⟨h₁, …, h_n⟩ ◦ g ⇒ ⟨h₁ ◦ g, …, h_n ◦ g⟩,
2. For every (h_i : A_i → B_i)_{i=1,…,n} and (g_i : X → A_i)_{i=1,…,n}, a 2-cell fuse(h•; g•) : (∏_{i=1}^{n} h_i) ◦ ⟨g₁, …, g_n⟩ ⇒ ⟨h₁ ◦ g₁, …, h_n ◦ g_n⟩.

In particular, it follows from Lemma 2(2) that there exists a canonical natural family of invertible 2-cells Φ_{h•,g•} : (∏_{i=1}^{n} h_i) ◦ (∏_{i=1}^{n} g_i) ⇒ ∏_{i=1}^{n} (h_i ◦ g_i) for any (h_i : A_i → B_i)_{i=1,…,n} and (g_j : X_j → A_j)_{j=1,…,n}.

In the categorical setting, a cartesian functor preserves products up to isomorphism. An fp-pseudofunctor preserves bicategorical products up to equivalence.

Definition 5. An fp-pseudofunctor (F, q^×) between fp-bicategories (B, Π_n(−)) and (C, Π_n(−)) is a pseudofunctor F : B → C equipped with specified equivalences

   ⟨Fπ₁, …, Fπ_n⟩ : F(∏_{i=1}^{n} A_i) ⇄ ∏_{i=1}^{n} FA_i : q^×_{A•}

for every A₁, …, A_n ∈ B (n ∈ ℕ). We denote the 2-cells witnessing these equivalences by u^×_{A•} : Id_{∏_i FA_i} ⇒ ⟨Fπ₁, …, Fπ_n⟩ ◦ q^×_{A•} and c^×_{A•} : q^×_{A•} ◦ ⟨Fπ₁, …, Fπ_n⟩ ⇒ Id_{F(Π_i A_i)}.
We call (F, q^×) strict if F is strict and satisfies
F(Π_n(A_1, …, A_n)) = Π_n(FA_1, …, FA_n),  F(π_i) = π_i,  F⟨t_1, …, t_n⟩ = ⟨Ft_1, …, Ft_n⟩,  F(ϖ^{(i)}_{t_1,…,t_n}) = ϖ^{(i)}_{Ft_1,…,Ft_n},  q^×_{A_1,…,A_n} = Id_{Π_n(FA_1,…,FA_n)},
with equivalences given by the 2-cells p†(r_{π_1}, …, r_{π_n}) : Id ⇒ ⟨π_1, …, π_n⟩.
Notation 3. For fp-bicategories B and C we write fp-Bicat(B, C) for the bicategory of fp-pseudofunctors, pseudonatural transformations and modifications.
We define two further families of 2-cells to witness standard properties of cartesian functors. The first witnesses the fact that any fp-pseudofunctor commutes with the ⟨−, …, =⟩ operation. The second witnesses the equality ⟨Fπ_1, …, Fπ_n⟩ ∘ F⟨f_1, …, f_n⟩ = ⟨Ff_1, …, Ff_n⟩ ‘unpacking’ an n-ary tupling from inside F.
Lemma 3. Let (F, q^×) : (B, Π_n(−)) → (C, Π_n(−)) be an fp-pseudofunctor.
1. For any finite family of 1-cells (f_i : A_i → A′_i)_{i=1,…,n} in B, there exists an invertible 2-cell nat_{f•} : q^×_{A′•} ∘ ∏_{i=1}^{n} Ff_i ⇒ F(∏_{i=1}^{n} f_i) ∘ q^×_{A•} such that the pair (q^×, nat) forms a pseudonatural transformation
Π_n(F(−), …, F(=)) ⇒ (F ∘ Π_n)(−, …, =)
2. For any finite family of 1-cells (f_i : X → B_i)_{i=1,…,n} in B, there exists a canonical choice of naturally invertible 2-cell unpack_{f•} : ⟨Fπ_1, …, Fπ_n⟩ ∘ F⟨f_1, …, f_n⟩ ⇒ ⟨Ff_1, …, Ff_n⟩ : FX → ∏_{i=1}^{n} FB_i.
2.3 Cartesian closed bicategories
A cartesian closed bicategory is an fp-bicategory (B, Π_n(−)) equipped with a biadjunction (−) × A ⊣ (A ⇒ −) for every A ∈ B. Examples include the bicategory of generalised species [17], bicategories of concurrent games [49], and bicategories of operads [26]. In the categorical setting, every natural transformation between cartesian functors is monoidal with respect to the cartesian structure, and a similar fact is true bicategorically: every pseudonatural transformation is canonically compatible with the product structure, see [55, § 4.1.1].
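Before the formal definition, it may help to record the 1-categorical equations that the biadjunction (−) × A ⊣ (A ⇒ −) weakens; the following is a sketch in standard notation, and in the bicategorical setting these equalities hold only up to the invertible 2-cells introduced below.

```latex
% 1-categorical sketch: the equations governing currying and evaluation
% that a cartesian closed bicategory witnesses by invertible 2-cells.
\[
  \mathrm{eval}_{A,B} \circ (\lambda f \times A) \;=\; f
  \qquad \text{for } f : X \times A \to B \text{ (beta),}
\]
\[
  \lambda\bigl(\mathrm{eval}_{A,B} \circ (g \times A)\bigr) \;=\; g
  \qquad \text{for } g : X \to (A \Rightarrow B) \text{ (eta).}
\]
```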
Definition 6. A cartesian closed bicategory or cc-bicategory is an fp-bicategory (B, Π_n(−)) equipped with the following data for every A, B ∈ B:
1. A chosen object (A ⇒ B),
2. A specified 1-cell eval_{A,B} : (A ⇒ B) × A → B,
3. For every X ∈ B, an adjoint equivalence
eval_{A,B} ∘ (− × A) : B(X, A ⇒ B) ⇄ B(X × A, B) : λ(−)
specified by a choice of universal arrow ε_f : eval_{A,B} ∘ (λf × A) ⇒ f.
We call the functor λ(−) currying and refer to λf as the currying of f. Explicitly, the counit ε satisfies the following universal property. For every 1-cell g : X → (A ⇒ B) and 2-cell α : eval_{A,B} ∘ (g × A) ⇒ f there exists a unique 2-cell e†(α) : g ⇒ λf such that ε_f • (eval_{A,B} ∘ (e†(α) × A)) = α. This defines a lax exponential structure. One obtains a pseudo (bicategorical) exponential structure by further requiring that ε and the unit η_t := e†(id_{eval_{A,B} ∘ (t × A)}) are invertible.
Example 5. Every ‘presheaf’ 2-category Bicat(B^op, Cat) has all bicategorical limits [52, Proposition 3.6], given pointwise, and is cartesian closed with (P ⇒ Q)X := Bicat(B^op, Cat)(YX × P, Q) [55, Chapter 6].
As for products, we adopt the notational conventions that are standard in the categorical setting, for example by writing (f ⇒ g) : (A ⇒ B) → (A′ ⇒ B′) for the currying of (g ∘ eval_{A,B}) ∘ (Id_{A⇒B} × f).
Just as fp-pseudofunctors preserve products up to equivalence, cartesian closed pseudofunctors preserve products and exponentials up to equivalence.
Definition 7. A cartesian closed pseudofunctor or cc-pseudofunctor between cc-bicategories (B, Π_n(−), ⇒) and (C, Π_n(−), ⇒) is an fp-pseudofunctor (F, q^×) equipped with specified equivalences
m_{A,B} : F(A ⇒ B) ⇄ (FA ⇒ FB) : q^⇒_{A,B}
for every A, B ∈ B, where m_{A,B} : F(A ⇒ B) → (FA ⇒ FB) is the currying of F(eval_{A,B}) ∘ q^×_{A⇒B,A}.
A cc-pseudofunctor (F, q^×, q^⇒) is strict if (F, q^×) is a strict fp-pseudofunctor such that
F(A ⇒ B) = (FA ⇒ FB),  F(eval_{A,B}) = eval_{FA,FB},  F(λt) = λ(Ft),  F(ε_t) = ε_{Ft},  q^⇒_{A,B} = Id_{FA⇒FB},
with equivalences given by the 2-cells e†(eval_{FA,FB} ∘ κ) : Id_{FA⇒FB} ⇒ λ(eval_{FA,FB} ∘ Id_{(FA⇒FB)×FA}), where κ is the canonical isomorphism Id_{FA⇒FB} × FA ≅ Id_{(FA⇒FB)×FA}.
Remark 3. As is well-known in the case of Cat (e.g. [44, IV.2]), every equivalence X ≃ Y in a bicategory gives rise to an adjoint equivalence between X and Y with the same 1-cells (see e.g. [42, pp. 28–29]). Thus, one may assume without loss of generality that all the equivalences in the preceding definition are adjoint equivalences. The same observation applies to the definition of fp-pseudofunctors.
Notation 4. For cc-bicategories B and C we write cc-Bicat(B, C) for the bicategory of cc-pseudofunctors, pseudonatural transformations and modifications (c.f. Notation 3).
3 Bicategorical glueing
The glueing construction has been discovered in various forms, with correspondingly various names: the notions of logical relation [50,57], sconing [24], Freyd covers, and glueing (e.g. [40]) are all closely related (see e.g. [47] for an overview of the connections). Originally presented set-theoretically, the technique was quickly given categorical expression [43,47] and is now a standard component of the armoury for studying type theories (e.g. [40,12]). The glueing gl(F) of categories C and D along a functor F : C → D may be defined as the comma category (id ↓ F). We define bicategorical glueing analogously.
Definition 8.
1. Let F : A → C and G : B → C be pseudofunctors of bicategories. The comma bicategory (F ↓ G) has objects triples (A ∈ A, f : FA → GB, B ∈ B). The 1-cells (A, f, B) → (A′, f′, B′) are triples (p, α, q), where p : A → A′ and q : B → B′ are 1-cells and α is an invertible 2-cell α : f′ ∘ Fp ⇒ Gq ∘ f.
The 2-cells (p, α, q) ⇒ (p′, α′, q′) are pairs of 2-cells (σ : p ⇒ p′, τ : q ⇒ q′) such that the following square commutes:
α′ • (f′ ∘ F(σ)) = (G(τ) ∘ f) • α   (2)
(the square with top edge f′ ∘ F(σ) : f′ ∘ F(p) ⇒ f′ ∘ F(p′), left and right edges α and α′, and bottom edge G(τ) ∘ f : G(q) ∘ f ⇒ G(q′) ∘ f).
Identities and horizontal composition are given by pasting diagrams (not reproduced here): the identity on (A, f, B) has 1-cell components (Id_A, Id_B), and the horizontal composite of (p, α, q) with (r, β, s) has 1-cell components (r ∘ p, s ∘ q), with invertible 2-cell obtained by pasting α and β along F and G. Vertical composition, the identity 2-cell, and the structural isomorphisms are given component-wise.
2. The glueing bicategory gl(J) of bicategories B and C along a pseudofunctor J : B → C is the comma bicategory (id ↓ J).
We call axiom (2) the cylinder condition due to its shape when viewed as a (3-dimensional) pasting diagram. Note that one directly obtains projection pseudofunctors B ←(π_dom)− gl(J) −(π_cod)→ C.
We develop some basic theory of glueing bicategories, which we shall put to use in Section 5. We follow the terminology of [15].
Definition 9. Let J : B → X be a pseudofunctor. The relative hom-pseudofunctor J̃ : X → Bicat(B^op, Cat) is defined by J̃X := X(J(−), X).
Following [15], one might call the glueing bicategory gl(J̃) associated to a relative hom-pseudofunctor the bicategory of B-intensional Kripke relations of arity J, and view it as an intensional, bicategorical, version of the category of Kripke relations. The relative hom-pseudofunctor preserves all bilimits that exist in its domain. For products, this may be described explicitly.
Lemma 4. For any fp-bicategory (X, Π_n(−)) and pseudofunctor J : B → X, the relative hom-pseudofunctor J̃ extends canonically to an fp-pseudofunctor.
Proof. Take q^×_{X•} to be the n-ary tupling ∏_{i=1}^{n} X(J(−), X_i) → X(J(−), ∏_{i=1}^{n} X_i). This forms a pseudonatural transformation with naturality witnessed by post.
For any pseudofunctor J : B → X there exists a pseudonatural transformation (l, l̄) : Y ⇒ J̃ ∘ J : B → Bicat(B^op, Cat) given by the functorial action of J on hom-categories.
One may therefore define the following.
Definition 10. For any pseudofunctor J : B → X, define the extended Yoneda pseudofunctor Ỹ : B → gl(J̃) by setting ỸB := (YB, (l, l̄)_B, JB), Ỹf := (Yf, (φ^J_{−,f})^{−1}, Jf), and Ỹ(τ : f ⇒ f′ : B → B′) := (Yτ, Jτ). The cylinder condition holds by the naturality of φ^J, and the 2-cells φ^Ỹ and ψ^Ỹ are (φ^Y, φ^J) and (ψ^Y, ψ^J), respectively.
The extended Yoneda pseudofunctor satisfies a corresponding ‘extended Yoneda lemma’ (c.f. [15, p. 33]).
Lemma 5. For any pseudofunctor J : B → X and P = (P, (k, k̄), X) ∈ gl(J̃) there exists an equivalence of pseudofunctors gl(J̃)(Ỹ(−), P) ≃ P and an invertible modification filling the triangle over X(J(−), X), with the two legs given by the dom-projection and by (k, k̄). Hence Ỹ is locally an equivalence.
Proof. The arrow marked ≃ is the composite of a projection and the equivalence arising from the Yoneda lemma. Its pseudo-inverse is the composite
P → Bicat(B^op, Cat)(Y(−), P) → gl(J̃)(Ỹ(−), P)   (3)
in which the equivalence arises from the Yoneda lemma and the unlabelled pseudofunctor takes a pseudonatural transformation (j, j̄) : YB ⇒ P to the triple with first component (j, j̄), third component j_B(k_B(Id_B)) : JB → X and second component defined using k̄ and j̄. Chasing the definitions through and evaluating at A, B ∈ B, one sees that when P := ỸB the composite (3) is equivalent to Ỹ_{A,B}. Since (3) is locally an equivalence, Lemma 1(1) completes the proof.
4 Cartesian closed structure on the glueing bicategory
It is well-known that, if C and D are cartesian closed categories, D has pullbacks, and F : C → D is cartesian, then gl(F) is cartesian closed (e.g. [40,12]). In this section we prove a corresponding result for the glueing bicategory. We shall be guided by the categorical proof, for which see e.g. [43, Proposition 2].
4.1 Finite products in gl(J)
Proposition 1. Let (B, Π_n(−)) and (C, Π_n(−)) be fp-bicategories and (J, q^×) : B → C be an fp-pseudofunctor.
Then gl(J) is an fp-bicategory with both projection pseudofunctors π and π strictly preserving products. dom cod For a family of objects (C ,c ,B ) , the n-ary product (C ,c ,B ) i i i i=1,...,n i i i i=1 n n n is defined to be the tuple C , q ◦ c , B . The kth projection i i i i=1 B i=1 i=1 π is (π ,μ ,π ), where μ is defined by commutativity of the following diagram: k k k k k × c ◦ π J(π ) ◦ q ◦ c k k k i B i (k) −1 π ◦ c (Jπ ◦ q ) ◦ c k i k i i B i (k) ◦q )◦Π c = i i (π ◦ Id ) ◦ c (π ◦Jπ ,..., Jπ ) ◦ q ◦ c k ( JB ) i k 1 n i i i B i i • (π ◦u )◦Π c k i i • π ◦ (Jπ ,..., Jπ ◦ q ) ◦ c k 1 n i B i For an n-ary family of 1-cells (g ,α ,f ):(Y, y, X) → (C ,c ,B )(i =1,...,n), i i i i i i the n-ary tupling is (g ,...,g , {α ,...,α }, f ,...,f ), where {α ,...,α } 1 n 1 n 1 n 1 n Relative full completeness for bicategorical cartesian closed structure 289 is the composite {α ,...,α } 1 n q ◦ c ◦g ,...,g  J(f ,...,f ) ◦ y i 1 n 1 n • i ∼ ∼ = = q ◦ ( c ◦g ,...,g )Id ◦ (Jf ,...,f ◦ y) i 1 n J( B ) 1 n B i i × × q ◦fuse (c ◦Jf ,...,f )◦y 1 n B B • • × × q ◦c ◦ g ,...,c ◦ g  q ◦Jπ ,..., Jπ  ◦ (Jf ,...,f ◦ y) 1 1 n n 1 n 1 n B B • • q ◦α ,...,α 1 n = × × q ◦Jf ◦ y,..., Jf ◦ y q ◦ ((Jπ ,..., Jπ ◦ Jf ,...,f ) ◦ y) 1 n 1 n 1 n B B • • × −1 q ◦(unpack ◦y) B f • • × −1 q ◦post q ◦ (Jf ,..., Jf ◦ y) 1 n B B • • Finally, for every family of 1-cells (g ,α ,f ):(Y, y, X) → (C ,c ,B )(i = i i i i i i 1,...,n) we require a glued 2-cell π ◦ (g ,...,g , {α ,...,α }, f ,...,f ) ⇒ 1 n 1 n 1 n (k) (k) (g ,α ,f ) to act as the counit. We take simply ( , ). This pair forms a k k k g 2-cell in gl(J), and the required universal property holds pointwise. Remark 4. If (J, q ): B→ X is an fp-pseudofunctor, then Y : B→ gl(J) canon- ically extends to an fp-pseudofunctor. 
The pseudoinverse to Yπ ,..., Yπ  is 1 n (−,..., =, , q ), where the component of the isomorphism at (f : X → B ) i i i=1,...,n × × −1 ∼ (c ) ◦F f  q ◦unpack B B = • • × × is F f  = ⇒ Id ◦F f  = ========⇒ q ◦Fπ ◦F f = ======⇒ q ◦Ff . • • • • • F (Π B ) i i B B • • 4.2 Exponentials in gl(J) As in the 1-categorical case, the definition of currying in gl(J) employs pullbacks. A pullback of the cospan (X → − X ← − X ) in a bicategory B is a bilimit for the 1 0 2 strict pseudofunctor X :(1 → − 0 ← − 2) →B determined by the cospan. We state the universal property in the form that will be most useful for our applications. f f 1 2 Lemma 6. The pullback of a cospan (X −→ X ←− X ) in a bicategory B 1 0 2 is determined, up to equivalence, by the following data and properties: a span γ γ 1 2 (X ←− P −→ X ) in B and an invertible 2-cell filling the diagram on the left 1 2 below γ γ μ μ 1 2 1 2 X X X X 1 ∼ 2 1 ∼ 2 f f 1 2 f f 1 2 such that 290 M. Fiore and P. Saville 1. for any other diagram as on the right above there exists a fill-in (u, Ξ ,Ξ ), 1 2 namely a 1-cell u : Q → P and invertible 2-cells Ξ : γ ◦ u ⇒ μ (i =1, 2) i i i satisfying ∼ f ◦Ξ 2 2 (f ◦ γ ) ◦uf ◦ (γ ◦ u) f ◦ μ 2 2 2 2 2 2 γ◦u μ (f ◦ γ ) ◦uf ◦ (γ ◦ u) f ◦ μ 1 1 1 1 1 1 = f ◦Ξ 1 1 2. for any 1-cells v, w : Q → P and 2-cells Ψ : γ ◦ v ⇒ γ ◦ w (i =1, 2) i i i satisfying ∼ f ◦Ψ ∼ 2 2 = = (f ◦ γ ) ◦vf ◦ (γ ◦ v) f ◦ (γ ◦ w)(f ◦ γ ) ◦ w 2 2 2 2 2 2 2 2 γ◦v γ◦w (f ◦ γ ) ◦vf ◦ (γ ◦ v) f ◦ (γ ◦ w)(f ◦ γ ) ◦ w 1 1 1 1 1 1 1 1 = f ◦Ψ 1 1 there exists a unique 2-cell Ψ : v ⇒ w such that Ψ = γ ◦ Ψ (i =1, 2). i i F G Example 6. 1. In Cat, the pullback of a cospan (B − → X ←−C) is the full subcategory of the comma category (F ↓ G) consisting of objects of the form (B, f, C) for which f : FB → GC is an isomorphism. Note that this differs from the strict (2-)categorical pullback in Cat, in which every f is required to be an identity (c.f. [65, Example 2.1]). op 2. 
Like any bilimit, pullbacks in the bicategory Bicat(B , Cat) are computed pointwise (see [53, Proposition 3.6]). We now define exponentials in the glueing bicategory. Precisely, we extend Proposition 1 to the following. Theorem 5. Let (B, Π (−), =) and (C, Π (−), =) be cc-bicategories such that n n C has pullbacks. For any fp-pseudofunctor (J, q ): (B, Π (−)) → (C, Π (−)), n n the glueing bicategory gl(J) has a cartesian closed structure with forgetful pseudo- functor π : gl(J) →B strictly preserving products and exponentials. dom The evaluation map. We begin by defining the mapping (−) = (=) and the evaluation 1-cell eval.For C := (C, c, B),C := (C ,c ,B ) ∈ gl(J) we set C = C to be the left-hand vertical leg of the following pullback diagram, in which we write m := λ(J(eval ) ◦ q ). B,B B,B B = B ,B c,c C ⊃ C (C = C ) p  c,c c,c λ(c ◦eval  ) C,C J(B = B)(JB = JB)(C = JB ) B,B λ(eval  ◦((JB =JB )×c)) JB,JB λ(eval ◦ ((JB =JB ) × c)) ◦ m JB,JB B,B (4) Relative full completeness for bicategorical cartesian closed structure 291 Example 7. The pullback (4) generalises the well-known definition of a logical rela- tion of varying arity [36]. Indeed, where J := K is the relative hom-pseudofunctor for an fp-pseudofunctor (K, q ): B→ X between cc-bicategories, A ∈B and X, X ∈X , the functor m (A) takes a 1-cell f : KA → (X = X )in X X,X to the pseudonatural transformation YA ×X (K(−),X) ⇒X (K(−),X ) with components λB . λ(ρ : B → A, u : KB → X) . eval ◦f ◦ K(ρ),u. Intuitively, X,X therefore, the pullback enforces the usual closure condition defining a logical relation at exponential type, while also tracking the isomorphism witnessing that this condition holds (c.f. [36,3,15]). Notation 6. For reasons of space—particularly in pasting diagrams—we will sometimes write  c := eval  ◦ ((JB = JB ) × c):(JB = JB ) × C → JB JB,JB when c : C → JB in C. 
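For orientation, the set-theoretic closure condition that the pullback (4) refines may be recalled as follows; this is a 1-categorical sketch, in which R_A denotes the relation a logical relation assigns to the type A.

```latex
% Classical logical-relations clause at exponential type; the pullback (4)
% enforces this closure condition while tracking the witnessing isomorphism.
\[
  f \in R_{A \Rightarrow B}
  \quad\Longleftrightarrow\quad
  \forall a \in A .\;\; a \in R_{A} \implies \mathrm{eval}(f, a) \in R_{B}.
\]
```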
The evaluation map eval is defined to be (eval ◦(q  × C), E , eval ), C,C c,c C,C B,B C,C where the witnessing 2-cell E is given by the pasting diagram below, in which C,C the unlabelled arrow is q ◦ (p × c). c,c (B =B ,B) eval  ◦(q  ×C) C,C c,c q ×C c,c (C ⊃ C ) × C (C = C ) ×CC p ×C ∼ λ(c ◦eval )×C c,c = C,C m ×C B,B λ(c)×C p ×c c,c J(B = B ) × C (JB = JB ) × C (C = JB ) × C J(B =B )×c ∼ (JB =JB )×c c eval J(B = B ) × JB (JB = JB ) × JB C,JB m ×JB B,B q ε eval (B =B ,B) JB,JB J ((B = B ) × B) JB Jeval B,B Here the bottom = denotes a composite of Φ, structural isomorphisms and −1 −1 Φ , and the top = denotes a composite of ω × C with instances of Φ, Φ , c,c and the structural isomorphisms. The currying operation. Let R := (R, r, Q), C := (C, c, B) and C := (C ,c ,B ) and suppose given a 1-cell (t, α, s): R×C → C . We construct λ(t, α, s) using the universal property (4) of the pullback. To this end, we define invertible composites −1 † −1 U and T as in the following two diagrams and set L := η • e (U ◦ α ◦ T ): α α α α λ(c ◦ eval ) ◦ λt ⇒ (λ( c) ◦ m ) ◦ (J(λs) ◦ r). C,C B,B 292 M. Fiore and P. Saville α × eval ◦ ((λ( c) ◦ m ) ◦ (J(λs) ◦ r)) × C Js ◦ q ◦ (r × c) C,JB B,B Q,B (eval ◦ (λ( c) × C)) ◦ (m  ◦ (J(λs) ◦ r)) × C Jε ◦(q ◦(r×c)) C,JB B,B Q,B ε ◦(m ◦(J(λs)◦r))×C c B,B c ◦ (m ◦ (J(λs) ◦ r)) × C J(eval ◦ (λs × B)) ◦ q ◦ (r × c) B,B B,B Q,B (eval ◦ (m × JB)) ◦ ((J(λs) × JB) ◦ (r × c)) JB,JB B,B ε ◦(J(λs)×JB)◦(r×c) (Jeval◦q ) J(eval ) ◦ q ◦ ((J(λs) × JId ) ◦ (r × c)) B,B  B (B = B ,B) The unlabelled arrow is the canonical composite of nat with φ λs,id eval,λ(s)×B and structural isomorphisms. 
T is then defined using U : α α eval  ◦ λ(c ◦ eval ) ◦ λt ×Cc ◦ t C,JB C,C ∼ c ◦ε (eval ◦ (λ(c ◦ eval ) × C)) ◦ (λ(t) × C) c ◦ (eval ◦ (λ(t) × C)) C,JB C,C C,C ε ◦(λ(t)×C) (c ◦eval) (c ◦ eval ) ◦ (λ(t) × C) C,C Applying the universal property of the pullback (4)toL , one obtains a 1-cell lam(t) and a pair of invertible 2-cells Γ  and Δ  filling the diagram c,c c,c R λ(t) lam(t) Δ c,c Γ c,c C ⊃ C (C = C ) c,c J(λs)◦r p  c,c c,c λ(c ◦eval ) C,C J(B = B)(C = JB ) λ( c)◦m B,B We define λ(t, α, s):= lam(t),Γ ,λs . c,c The counit 2-cell. Finally we come to the counit. For a 1-cell t := (t, α, s): (R, r, Q) × (C, c, B) → (C ,c ,B ) the 1-cell eval ◦ λ(t, α, s) × (C, c, B) unwinds to the pasting diagram below, in which the unlabelled arrow is q ◦ (r × c): Q,B Relative full completeness for bicategorical cartesian closed structure 293 eval  ◦ (q  × C) ◦ (lam(t) × C) C,C c,c eval  ◦(q  ×C) lam(t)×C C,C c,c R × C (C ⊃ C ) ×CC Γ  ×c c,c q ×c r×c c,c J(λs)×JB J(λs)×ψ C,C JQ × JB J(B = B ) × JB ∼ ∼ = = J(λs)×JId × × B q q Q,B (B = B ,B) nat J(Q × B) J (B = B ) × B JB Jeval J(λs×B) B,B J(eval  ◦ (λs × B)) B,B For the counit ε we take the 2-cell with first component e defined by t t (eval  ◦ (q  × C)) ◦ (lam(t) × C) t C,C c,c ∼ t eval ◦ ((q ◦ lam(t)) × C)eval ◦ (λ(t) × C) C,C c,c C,C eval  ◦(Δ  ×C) C,C c,c and second component simply ε : eval  ◦ (λ(s) × B) ⇒ s. This pair forms an s B,B invertible 2-cell in gl(J). One checks this satisfies the required universal property in a manner analogous to the 1-categorical case (see [55] for the full details). This completes the proof of Theorem 5. 5 Relative full completeness We apply the theory developed in the preceding two sections to prove the relative full completeness result. As outlined in the introduction, this corresponds to a proof of conservativity of the theory of rewriting for the higher-order equational theory of rewriting in STLC over the algebraic equational theory of rewriting in STPC. 
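Schematically, the result to be established can be displayed as follows; this is a sketch, in which F^×[B] and F^{×,→}[B] denote the free fp- and cc-bicategories on B introduced in the sequel, and ‘locally an equivalence’ means an equivalence on each hom-category.

```latex
% Schematic statement of relative full completeness: the canonical
% fp-pseudofunctor from the free fp-bicategory to the free cc-bicategory
% is locally an equivalence.
\[
  \iota : \mathcal{F}^{\times}[\mathcal{B}] \to \mathcal{F}^{\times,\to}[\mathcal{B}],
  \qquad
  \iota_{X,Y} : \mathcal{F}^{\times}[\mathcal{B}](X,Y)
    \xrightarrow{\ \simeq\ }
    \mathcal{F}^{\times,\to}[\mathcal{B}](\iota X, \iota Y)
  \ \ \text{for all } X, Y .
\]
```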
We adapt ‘Lafont’s argument’ [39, Annexe C] from the form presented in [16], for which we require bicategorical versions of the free cartesian category F^×[C] and free cartesian closed category F^{×,→}[C] over a category C. In line with the strategy for the STLC (c.f. [12, pp. 173–4]), we deal with the contravariance of the pseudofunctor (− ⇒ =) by restricting to a bicategory of cc-pseudofunctors, pseudonatural equivalences (that is, pseudonatural transformations for which each component is a given equivalence), and invertible modifications. We denote this with the subscript ≃.
Lemma 7. For any bicategory B, fp-bicategory (C, Π_n(−)) and cc-bicategory (D, Π_n(−), ⇒):
1. There exists an fp-bicategory F^×[B] and a pseudofunctor η^× : B → F^×[B] such that composition with η^× induces a biequivalence fp-Bicat(F^×[B], C) → Bicat(B, C)
2. There exists a cc-bicategory F^{×,→}[B] and a pseudofunctor η^= : B → F^{×,→}[B] such that composition with η^= induces a biequivalence cc-Bicat_≃(F^{×,→}[B], D) → Bicat(B, D)
Proof (sketch). A syntactic construction suffices: one defines formal products and exponentials and then quotients by the axioms (see [48, p. 79] or [55]).
Thus, for any bicategory B, fp-bicategory (C, Π_n(−)), and pseudofunctor F : B → C there exists an fp-pseudofunctor F^# : F^×[B] → C and an equivalence F^# ∘ η^× ≃ F. Moreover, for any fp-pseudofunctor G : F^×[B] → C such that G ∘ η^× ≃ F one has G ≃ F^#. A corresponding result holds for cc-bicategories and cc-pseudofunctors.
Theorem 7. For any bicategory B the universal fp-pseudofunctor ι : F^×[B] → F^{×,→}[B] extending η^= is locally an equivalence. Hence η^= : B → F^{×,→}[B] is locally an equivalence.
Proof. Since ι preserves finite products, the bicategory gl(ι̃) is cartesian closed (Theorem 5). The composite K := Ỹ ∘ η^× : B → gl(ι̃) therefore induces a cc-pseudofunctor K^# : F^{×,→}[B] → gl(ι̃).
First observe that (K^# ∘ ι) ∘ η^× ≃ K^# ∘ η^= ≃ K = Ỹ ∘ η^×.
Since Ỹ is canonically an fp-pseudofunctor (Remark 4), it follows that K^# ∘ ι ≃ Ỹ. Since Ỹ is locally an equivalence (Lemma 5), Lemma 1(1) entails that K^# ∘ ι is locally an equivalence. Next, examining the definition of Ỹ one sees that π_dom ∘ Ỹ = ι, and so
(π_dom ∘ K^#) ∘ η^= ≃ (π_dom ∘ Ỹ) ∘ η^× = ι ∘ η^× ≃ η^=
It follows that π_dom ∘ K^# ≃ id_{F^{×,→}[B]}, and hence that π_dom ∘ K^# is also locally an equivalence.
Now consider the composite F^×[B] −(ι)→ F^{×,→}[B] −(K^#)→ gl(ι̃) −(π_dom)→ F^{×,→}[B]. By Lemma 1(2) and the preceding, ι is locally an equivalence. Finally, it is direct from the construction of F^×[B] that η^× is locally an equivalence; thus, so is ι ∘ η^× ≃ η^=.
Acknowledgements. We thank all the anonymous reviewers for their comments: these improved the paper substantially. We are especially grateful to the reviewer who pointed out an oversight in the original formulation of Lemma 1(2), which consequently affected the argument in Theorem 7, and provided the elegant fix therein. The second author was supported by a Royal Society University Research Fellow Enhancement Award.
References
1. Abbott, M.G.: Categories of containers. Ph.D. thesis, University of Leicester (2003)
2. Abramsky, S., Jagadeesan, R.: Games and full completeness for multiplicative linear logic. Journal of Symbolic Logic 59(2), 543–574 (1994).
3. Alimohamed, M.: A characterization of lambda definability in categorical models of implicit polymorphism. Theoretical Computer Science 146(1-2), 5–23 (1995).
4. Balat, V., Di Cosmo, R., Fiore, M.: Extensional normalisation and type-directed partial evaluation for typed lambda calculus with sums. In: Proceedings of the 31st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. pp. 64–76 (2004)
5. Bénabou, J.: Introduction to bicategories. In: Reports of the Midwest Category Seminar. pp. 1–77. Springer Berlin Heidelberg, Berlin, Heidelberg (1967)
6.
Bloom, S.L., Ésik, Z., Labella, A., Manes, E.G.: Iteration 2-theories. Applied Categorical Structures 9(2), 173–216 (2001).
7. Borceux, F.: Bicategories and distributors, Encyclopedia of Mathematics and its Applications, vol. 1, pp. 281–324. Cambridge University Press (1994).
8. Carboni, A., Kelly, G.M., Walters, R.F.C., Wood, R.J.: Cartesian bicategories II. Theory and Applications of Categories 19(6), 93–124 (2008), http://www.tac.mta.ca/tac/volumes/19/6/19-06abs.html
9. Carboni, A., Lack, S., Walters, R.F.C.: Introduction to extensive and distributive categories. Journal of Pure and Applied Algebra 84(2), 145–158 (1993).
10. Carboni, A., Walters, R.F.C.: Cartesian bicategories I. Journal of Pure and Applied Algebra 49(1), 11–32 (1987).
11. Castellan, S., Clairambault, P., Rideau, S., Winskel, G.: Games and strategies as event structures. Logical Methods in Computer Science 13 (2017)
12. Crole, R.L.: Categories for Types. Cambridge University Press (1994).
13. Dagand, P.E., McBride, C.: A categorical treatment of ornaments. In: Proceedings of the 28th Annual ACM/IEEE Symposium on Logic in Computer Science. pp. 530–539. IEEE Computer Society, Washington, DC, USA (2013).
14. Fiore, M.: Axiomatic Domain Theory in Categories of Partial Maps. Distinguished Dissertations in Computer Science, Cambridge University Press (1996)
15. Fiore, M.: Semantic analysis of normalisation by evaluation for typed lambda calculus. In: Proceedings of the 4th ACM SIGPLAN International Conference on Principles and Practice of Declarative Programming. pp. 26–37. ACM, New York, NY, USA (2002).
16. Fiore, M., Di Cosmo, R., Balat, V.: Remarks on isomorphisms in typed lambda calculi with empty and sum types. In: Proceedings of the 28th Annual IEEE Symposium on Logic in Computer Science. pp. 147–156. IEEE Computer Society Press (2002).
17. Fiore, M., Gambino, N., Hyland, M., Winskel, G.: The cartesian closed bicategory of generalised species of structures.
Journal of the London Mathematical Society 77(1), 203–220 (2007).
18. Fiore, M., Gambino, N., Hyland, M., Winskel, G.: Relative pseudomonads, Kleisli bicategories, and substitution monoidal structures. Selecta Mathematica New Series (2017)
19. Fiore, M., Joyal, A.: Theory of para-toposes. Talk at the Category Theory 2015 Conference. Departamento de Matemática, Universidade de Aveiro (Portugal)
20. Fiore, M., Saville, P.: A type theory for cartesian closed bicategories. In: Proceedings of the 34th Annual ACM/IEEE Symposium on Logic in Computer Science (2019).
21. Fiore, M., Saville, P.: Coherence and normalisation-by-evaluation for bicategorical cartesian closed structure. Preprint (2020)
22. Fiore, M., Simpson, A.: Lambda definability with sums via Grothendieck logical relations. In: Girard, J.Y. (ed.) Typed lambda calculi and applications: 4th international conference. pp. 147–161. Springer Berlin Heidelberg, Berlin, Heidelberg (1999)
23. Freyd, P.: Algebraically complete categories. In: Lecture Notes in Mathematics, pp. 95–104. Springer Berlin Heidelberg (1991).
24. Freyd, P.J., Scedrov, A.: Categories, Allegories. Elsevier North Holland (1990)
25. Gambino, N., Joyal, A.: On operads, bimodules and analytic functors. Memoirs of the American Mathematical Society 249(1184), 153–192 (2017)
26. Gambino, N., Kock, J.: Polynomial functors and polynomial monads. Mathematical Proceedings of the Cambridge Philosophical Society 154(1), 153–192 (2013).
27. Ghani, N.: Adjoint rewriting. Ph.D. thesis, University of Edinburgh (1995)
28. Gibbons, J.: Conditionals in distributive categories. Tech. rep., University of Oxford (1997)
29. Cattani, G.L., Fiore, M., Winskel, G.: A theory of recursive domains with applications to concurrency. In: Proceedings of the 13th Annual IEEE Symposium on Logic in Computer Science. pp. 214–225. IEEE Computer Society (1998)
30. Gurski, N.: An Algebraic Theory of Tricategories.
University of Chicago, Department of Mathematics (2006)
31. Hasegawa, M.: Logical predicates for intuitionistic linear type theories. In: Girard, J.Y. (ed.) Typed lambda calculi and applications: 4th international conference. pp. 198–213. Springer Berlin Heidelberg, Berlin, Heidelberg (1999)
32. Hilken, B.: Towards a proof theory of rewriting: the simply typed 2λ-calculus. Theoretical Computer Science 170(1), 407–444 (1996).
33. Hirschowitz, T.: Cartesian closed 2-categories and permutation equivalence in higher-order rewriting. Logical Methods in Computer Science 9, 1–22 (2013)
34. Jay, C.B., Ghani, N.: The virtues of eta-expansion. Journal of Functional Programming 5(2), 135–154 (1995).
35. Johann, P., Polonsky, P.: Higher-kinded data types: Syntax and semantics. In: 34th Annual ACM/IEEE Symposium on Logic in Computer Science. IEEE (2019).
36. Jung, A., Tiuryn, J.: A new characterization of lambda definability. In: Bezem, M., Groote, J.F. (eds.) Typed Lambda Calculi and Applications. pp. 245–257. Springer Berlin Heidelberg, Berlin, Heidelberg (1993)
37. Lack, S.: A 2-Categories Companion, pp. 105–191. Springer New York, New York, NY (2010)
38. Lack, S., Walters, R.F.C., Wood, R.J.: Bicategories of spans as cartesian bicategories. Theory and Applications of Categories 24(1), 1–24 (2010)
39. Lafont, Y.: Logiques, catégories et machines. Ph.D. thesis, Université Paris VII (1987)
40. Lambek, J., Scott, P.J.: Introduction to Higher Order Categorical Logic. Cambridge University Press, New York, NY, USA (1986)
41. Leinster, T.: Basic bicategories (May 1998)
42. Leinster, T.: Higher operads, higher categories. No. 298 in London Mathematical Society Lecture Note Series, Cambridge University Press (2004)
43. Ma, Q.M., Reynolds, J.C.: Types, abstraction, and parametric polymorphism, part 2. In: Brookes, S., Main, M., Melton, A., Mislove, M., Schmidt, D. (eds.)
Mathematical Foundations of Programming Semantics. pp. 1–40. Springer Berlin Heidelberg, Berlin, Heidelberg (1992)
44. Mac Lane, S.: Categories for the Working Mathematician, Graduate Texts in Mathematics, vol. 5. Springer-Verlag New York, second edn. (1998).
45. Mac Lane, S., Paré, R.: Coherence for bicategories and indexed categories. Journal of Pure and Applied Algebra 37, 59–80 (1985).
46. Marmolejo, F., Wood, R.J.: Kan extensions and lax idempotent pseudomonads. Theory and Applications of Categories 26(1), 1–29 (2012)
47. Mitchell, J.C., Scedrov, A.: Notes on sconing and relators. In: Börger, E., Jäger, G., Kleine Büning, H., Martini, S., Richter, M.M. (eds.) Computer Science Logic. pp. 352–378. Springer Berlin Heidelberg, Berlin, Heidelberg (1993)
48. Ouaknine, J.: A two-dimensional extension of Lambek’s categorical proof theory. Master’s thesis, McGill University (1997)
49. Paquet, H.: Probabilistic concurrent game semantics. Ph.D. thesis, University of Cambridge (2020)
50. Plotkin, G.D.: Lambda-definability and logical relations. Tech. rep., University of Edinburgh School of Artificial Intelligence (1973), memorandum SAI-RM-4
51. Power, A.J.: An abstract formulation for rewrite systems. In: Pitt, D.H., Rydeheard, D.E., Dybjer, P., Pitts, A.M., Poigné, A. (eds.) Category Theory and Computer Science. pp. 300–312. Springer Berlin Heidelberg, Berlin, Heidelberg (1989)
52. Power, A.J.: Coherence for bicategories with finite bilimits I. In: Gray, J.W., Scedrov, A. (eds.) Categories in Computer Science and Logic: Proceedings of the AMS-IMS-SIAM Joint Summer Research Conference, vol. 92, pp. 341–349. AMS (1989)
53. Power, A.J.: A general coherence result. Journal of Pure and Applied Algebra 57(2), 165–173 (1989).
54. Rydeheard, D.E., Stell, J.G.: Foundations of equational deduction: A categorical treatment of equational proofs and unification algorithms. In: Pitt, D.H., Poigné, A., Rydeheard, D.E. (eds.) Category Theory and Computer Science. pp.
114–139. Springer Berlin Heidelberg, Berlin, Heidelberg (1987)
55. Saville, P.: Cartesian closed bicategories: type theory and coherence. Ph.D. thesis, University of Cambridge (Submitted)
56. Seely, R.A.G.: Modelling computations: A 2-categorical framework. In: Gries, D. (ed.) Proceedings of the 2nd Annual IEEE Symposium on Logic in Computer Science. pp. 65–71. IEEE Computer Society Press (June 1987)
57. Statman, R.: Logical relations and the typed λ-calculus. Information and Control 65, 85–97 (1985)
58. Stell, J.: Modelling term rewriting systems by sesqui-categories. In: Proc. Catégories, Algèbres, Esquisses et Néo-Esquisses (1994)
59. Street, R.: Fibrations in bicategories. Cahiers de Topologie et Géométrie Différentielle Catégoriques 21(2), 111–160 (1980)
60. Street, R.: Categorical structures. In: Hazewinkel, M. (ed.) Handbook of Algebra, vol. 1, chap. 15, pp. 529–577. Elsevier (1995)
61. Tabareau, N.: Aspect oriented programming: A language for 2-categories. In: Proceedings of the 10th International Workshop on Foundations of Aspect-oriented Languages. pp. 13–17. ACM, New York, NY, USA (2011).
62. Taylor, P.: Practical Foundations of Mathematics, Cambridge Studies in Advanced Mathematics, vol. 59. Cambridge University Press (1999)
63. Troelstra, A.S., Schwichtenberg, H.: Basic proof theory. No. 43 in Cambridge Tracts in Theoretical Computer Science, Cambridge University Press, second edn. (2000)
64. Verity, D.: Enriched categories, internal categories and change of base. Ph.D. thesis, University of Cambridge (1992), TAC reprint available at tac/reprints/articles/20/tr20abs.html
65. Weber, M.: Yoneda structures from 2-toposes. Applied Categorical Structures 15(3), 259–323 (2007).
A duality theoretic view on limits of finite structures

Mai Gehrke¹, Tomáš Jakl¹, and Luca Reggio²

¹ CNRS and Université Côte d'Azur, Nice, France
{mgehrke,tomas.jakl}
² Institute of Computer Science of the Czech Academy of Sciences, Prague, Czech Republic, and Mathematical Institute, University of Bern, Switzerland

Abstract. A systematic theory of structural limits for finite models has been developed by Nešetřil and Ossona de Mendez. It is based on the insight that the collection of finite structures can be embedded, via a map they call the Stone pairing, in a space of measures, where the desired limits can be computed. We show that a closely related but finer grained space of measures arises, via Stone-Priestley duality and the notion of types from model theory, by enriching the expressive power of first-order logic with certain "probabilistic operators". We provide a sound and complete calculus for this extended logic and expose the functorial nature of this construction. The consequences are two-fold. On the one hand, we identify the logical gist of the theory of structural limits.
On the other hand, our construction shows that the duality-theoretic variant of the Stone pairing captures the adding of a layer of quantifiers, thus making a strong link to recent work on semiring quantifiers in logic on words. In the process, we identify the model-theoretic notion of types as the unifying concept behind this link. These results contribute to bridging the strands of logic in computer science which focus on semantics and on more algorithmic and complexity-related areas, respectively.

Keywords: Stone duality · finitely additive measures · structural limits · finite model theory · formal languages · logic on words

1 Introduction

While topology plays an important role, via Stone duality, in many parts of semantics, topological methods in more algorithmic and complexity-oriented areas of theoretical computer science are not so common. One of the few examples, the one we want to consider here, is the study of limits of finite relational structures. We will focus on the structural limits introduced by Nešetřil and Ossona de Mendez [15,17]. These provide a common generalisation of various notions of limits of finite structures studied in probability theory, random graphs, structural graph theory, and finite model theory. The basic construction in this work is the so-called Stone pairing.

This project has been supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No. 670624). Luca Reggio has received individual support under grant GA17-04630S of the Czech Science Foundation and grant No. 184693 of the Swiss National Science Foundation.

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 299–318, 2020.
Given a relational signature σ and a first-order formula ϕ in the signature σ with free variables v_1, ..., v_n, define

⟨ϕ, A⟩ = |{ā ∈ Aⁿ | A ⊨ ϕ(ā)}| / |A|ⁿ   (the probability that a random assignment in A satisfies ϕ).   (1)

Nešetřil and Ossona de Mendez view the map A ↦ ⟨-, A⟩ as an embedding of the finite σ-structures into the space of probability measures over the Stone space dual to the Lindenbaum-Tarski algebra of all first-order formulas in the signature σ. This space is complete and thus provides the desired limit objects for all sequences of finite structures which embed as Cauchy sequences.

Another example of topological methods in an algorithmically oriented area of computer science is the use of profinite monoids in automata theory. In this setting, profinite monoids are the subject of an extensive theory, based on theorems by Eilenberg and Reiterman, and used, among other things, to settle decidability questions [18]. In [4], it was shown that this theory may be understood as an application of Stone duality, thus making a bridge between semantics and more algorithmically oriented work. Bridging this semantics-versus-algorithmics gap in theoretical computer science has since gained quite some momentum, notably with the recent strand of research by Abramsky, Dawar and co-workers [2,3]. In this spirit, a natural question is whether the structural limits of Nešetřil and Ossona de Mendez can also be understood semantically, and in particular whether the topological component may be seen as an application of Stone duality.

More precisely, recent work on understanding quantifiers in the setting of logic on finite words [5] has shown that adding a layer of certain quantifiers (such as classical and modular quantifiers) corresponds dually to measure space constructions. The measures involved are not classical but only finitely additive, and they take values in finite semirings rather than in the unit interval.
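Equation (1) is straightforward to evaluate for a concrete finite structure. The following sketch (the function name and the example graph are ours, not from the paper) computes the Stone pairing of a formula with n free variables against a finite σ-structure, using exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

def stone_pairing(phi, universe, n):
    """<phi, A> of equation (1): the fraction of n-tuples satisfying phi."""
    sat = sum(1 for a in product(universe, repeat=n) if phi(*a))
    return Fraction(sat, len(universe) ** n)

# A: the directed 3-cycle on {0, 1, 2}
edges = {(0, 1), (1, 2), (2, 0)}
A = [0, 1, 2]
phi = lambda x, y: (x, y) in edges      # phi(v1, v2) = E(v1, v2)

print(stone_pairing(phi, A, 2))         # 3 satisfying pairs out of 9, i.e. 1/3
```

A sequence of finite structures is then convergent precisely when such values converge for every formula, which is the notion of structural limit recalled above.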
Nevertheless, this appearance of measures as duals of quantifiers begs the further question of whether the measure spaces in the theory of structural limits may be obtained via Stone duality from a semantic addition of certain quantifiers to classical first-order logic. The purpose of this paper is to address this question. Our main result is that the Stone pairing of Nešetřil and Ossona de Mendez is related by a retraction to a Stone space of measures, which is dual to the Lindenbaum-Tarski algebra of a logic fragment obtained from first-order logic by adding one layer of probabilistic quantifiers, and which arises in exactly the same way as the spaces of semiring-valued measures in logic on words. That is, the Stone pairing, although originating from other considerations, may be seen as arising by duality from a semantic construction.

A foreseeable hurdle is that spaces of classical measures are valued in the unit interval [0, 1], which is not zero-dimensional and hence outside the scope of Stone duality. This is well known to cause problems, e.g. in attempts to combine non-determinism and probability in domain theory [12]. However, in the structural limits of Nešetřil and Ossona de Mendez, at the base, one only needs to talk about finite models equipped with normal distributions, and thus only the finite chains I_n = {0, 1/n, 2/n, ..., 1} are involved. A careful duality-theoretic analysis identifies a codirected diagram (i.e. an inverse limit system) based on these chains compatible with the Stone pairing. The resulting inverse limit, which we denote Γ, is a Priestley space. It comes equipped with an algebra-like structure, which allows us to reformulate many aspects of the theory of structural limits in terms of Γ-valued measures as opposed to [0, 1]-valued measures. The analysis justifying the structure of Γ is based on duality theory for double quasi-operator algebras [7,8].
In the presentation, we have tried to compromise between giving interesting topo-relational insights into why Γ is as it is, and not overburdening the reader with technical details. Some interesting features of Γ, dictated by the nature of the Stone pairing and the ensuing codirected diagram, are that:

• Γ is based on a version of [0, 1] in which the rationals are doubled;
• Γ comes with a section-retraction pair ι: [0, 1] → Γ and γ: Γ → [0, 1];
• the map ι is lower semicontinuous, while the map γ is continuous.

These features are a consequence of the general theory and precisely allow us to witness continuous phenomena relative to [0, 1] in the setting of Γ.

Our contribution. We show that the ambient measure space for the structural limits of Nešetřil and Ossona de Mendez can be obtained via "adding a layer of quantifiers" in a suitable enrichment of first-order logic. The conceptual framework for seeing this is that of types from classical model theory. More precisely, we will see that a variant of the Stone pairing is a map into a space of measures with values in a Priestley space Γ. Further, we show that this map is in fact the embedding of the finite structures into the space of (0-)types of an extension of first-order logic, which we axiomatise. On the other hand, Γ-valued measures and [0, 1]-valued measures are tightly related by a retraction-section pair which allows the transfer of properties. These results identify the logical gist of the theory of structural limits and provide a new interesting connection between logic on words and the theory of structural limits in finite model theory.

Outline of the paper. In Section 2 we briefly recall Stone-Priestley duality, its application in logic via spaces of types, and the particular instance of logic on words (needed only to show the similarity of the constructions). In Section 3 we introduce the Priestley space Γ with its additional operations, and show that it admits [0, 1] as a retract.
The spaces of Γ-valued measures are introduced in Section 4, and the retraction of Γ onto [0, 1] is lifted to the appropriate spaces of measures. In Section 5 we introduce the Γ-valued Stone pairing and make the link with logic on words. Further, we compare convergence in the space of Γ-valued measures with the one considered by Nešetřil and Ossona de Mendez. Finally, in Section 6 we show that constructing the space of Γ-valued measures dually corresponds to enriching the logic with probabilistic operators.

2 Preliminaries

Notation. Throughout this paper, if f: X → Y and g: Y → Z are functions, their composition is denoted g · f. For a subset S ⊆ X, f|_S: S → Y is the obvious restriction. Given any set T, 𝒫(T) denotes its power-set. Further, for a poset P, P^∂ is the poset obtained by turning the order of P upside down.

2.1 Stone-Priestley duality

In this paper, we will need Stone duality for bounded distributive lattices in the order-topological form due to Priestley [19]. It is a powerful and well-established tool in the study of propositional logic and semantics of programming languages, see e.g. [9,1] for major landmarks. We briefly recall how this duality works.

A compact ordered space is a pair (X, ≤) where X is a compact space and ≤ is a partial order on X which is closed in the product topology of X × X. (Note that such a space is automatically Hausdorff.) A compact ordered space is a Priestley space provided it is totally order-disconnected. That is, for all x, y ∈ X such that x ≰ y, there is a clopen (i.e. simultaneously closed and open) C ⊆ X which is an up-set for ≤ and satisfies x ∈ C but y ∉ C.

We recall the construction of the Priestley space of a distributive lattice D. A non-empty proper subset F ⊂ D is a prime filter if it is (i) upward closed (in the natural order of D), (ii) closed under finite meets, and (iii) whenever a ∨ b ∈ F, either a ∈ F or b ∈ F. Denote by X_D the set of all prime filters of D.
By Stone's Prime Filter Theorem, the map

⟦-⟧: D → 𝒫(X_D),   a ↦ ⟦a⟧ = {F ∈ X_D | a ∈ F}

is an embedding. Priestley's insight was that D can be recovered from X_D, if the latter is equipped with the inclusion order and the topology generated by the sets of the form ⟦a⟧ and their complements. This makes X_D into a Priestley space (the dual space of D), and the map ⟦-⟧ is an isomorphism between D and the lattice of clopen up-sets of X_D. Conversely, any Priestley space X is the dual space of the lattice of its clopen up-sets. We call the latter the dual lattice of X. This correspondence extends to morphisms. In fact, Priestley duality states that the category of distributive lattices with homomorphisms is dually equivalent to the category of Priestley spaces and continuous monotone maps. We assume all distributive lattices are bounded, with the bottom and top denoted by 0 and 1, respectively. The bounds need to be preserved by homomorphisms. When restricting to Boolean algebras, we recover the celebrated Stone duality between Boolean algebras and Boolean spaces, i.e. compact Hausdorff spaces in which the clopen subsets form a basis.

2.2 Stone duality and logic: type spaces

The theory of types is an important tool for first-order logic. We briefly recall the concept, as it is closely related to, and provides the link between, two otherwise unrelated occurrences of topological methods in theoretical computer science.

Consider a signature σ and a first-order theory T in this signature. For each n ∈ ℕ, let Fm_n denote the set of first-order formulas whose free variables are among v = {v_1, ..., v_n}, and let Mod_n(T) denote the class of all pairs (A, α) where A is a model of T and α is an interpretation of v in A. Then the satisfaction relation, (A, α) ⊨ ϕ, is a binary relation from Mod_n(T) to Fm_n. It induces the equivalence relations of elementary equivalence ≡ and logical equivalence ≈ on these sets, respectively.
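For a small lattice, the prime filters and the embedding a ↦ ⟦a⟧ can be computed by brute force. The sketch below (our own code; the four-element Boolean lattice 𝒫({a, b}) is our example) checks conditions (i)-(iii) directly:

```python
from itertools import chain, combinations

# The four-element Boolean lattice P({a, b}), ordered by inclusion.
elements = [frozenset(), frozenset('a'), frozenset('b'), frozenset('ab')]
meet, join = (lambda x, y: x & y), (lambda x, y: x | y)

def subsets(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def is_prime_filter(F):
    F = set(F)
    if not F or F == set(elements):                      # non-empty and proper
        return False
    if any(x <= y and y not in F for x in F for y in elements):
        return False                                     # (i) upward closed
    if any(meet(x, y) not in F for x in F for y in F):
        return False                                     # (ii) closed under meets
    return all(x in F or y in F                          # (iii) prime
               for x in elements for y in elements if join(x, y) in F)

prime_filters = [set(F) for F in subsets(elements) if is_prime_filter(F)]
# the embedding a -> hat(a) = {F | a in F}, prime filters indexed by position
hat = {a: frozenset(i for i, F in enumerate(prime_filters) if a in F) for a in elements}
print(len(prime_filters))   # 2: the ultrafilters generated by the atoms {a} and {b}
```

The map `hat` separates the four lattice elements, as Stone's Prime Filter Theorem predicts.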
The quotient FO_n(T) = Fm_n/≈ carries a natural Boolean algebra structure and is known as the n-th Lindenbaum-Tarski algebra of T. Its dual space is Typ_n(T), the space of n-types of T, whose points can be identified with elements of Mod_n(T)/≡. The Boolean algebra FO(T) of all first-order formulas modulo logical equivalence over T is the directed colimit of the FO_n(T) for n ∈ ℕ, while its dual space, Typ(T), is the codirected limit of the Typ_n(T) for n ∈ ℕ and consists of models equipped with interpretations of the full set of variables.

If we want to study finite models, there are two equivalent approaches: e.g. at the level of sentences, we can either consider the theory T_fin of finite T-models, or the closure of the collection of all finite T-models in the space Typ(T). This closure yields a space which should tell us about finite T-structures. Indeed, it is equal to Typ(T_fin), the space of pseudofinite T-structures. For an application of this, see [10]. Below, we will see an application in finite model theory of the case T = ∅ (in this case we write FO(σ) and Typ(σ) instead of FO(∅) and Typ(∅)).

In light of the theory of types as exposed above, the Stone pairing of Nešetřil and Ossona de Mendez (see equation (1)) can be regarded as an embedding of finite structures into the space of probability measures on Typ(σ), which set-theoretically are finitely additive functions FO(σ) → [0, 1].

2.3 Duality and logic on words

As mentioned in the introduction, spaces of measures arise via duality in logic on words [5]. Logic on words, as introduced by Büchi, see e.g. [14] for a recent survey, is a variation and specialisation of finite model theory where only models based on words are considered. I.e., a word w ∈ A* is seen as a relational structure on {1, ..., |w|}, where |w| is the length of w, equipped with a unary relation P_a, for each a ∈ A, singling out the positions in the word where the letter a appears.
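The word-as-structure encoding can be sketched directly (the helper names and the example sentence are ours, purely illustrative):

```python
def word_to_structure(w):
    """A word w as a relational structure on positions {1, ..., |w|} with unary P_a."""
    positions = range(1, len(w) + 1)
    P = {a: {i for i in positions if w[i - 1] == a} for a in set(w)}
    return positions, P

def sentence(w):
    """A sample sentence: 'some position carries an a, and all earlier positions carry b'."""
    positions, P = word_to_structure(w)
    Pa, Pb = P.get('a', set()), P.get('b', set())
    return any(i in Pa and all(j in Pb for j in positions if j < i) for i in positions)

print(sentence("bba"))   # True: the 'a' at position 3 is preceded only by 'b's
print(sentence("cba"))   # False: the 'c' at position 1 violates the condition
```

The set of words on which `sentence` holds is then the language L_ϕ determined by this sentence, in the sense recalled below.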
Each sentence ϕ in a language interpretable over these structures yields a language L_ϕ ⊆ A* consisting of the words satisfying ϕ. Thus, logic fragments are considered modulo the theory of finite words, and the Lindenbaum-Tarski algebras are subalgebras of 𝒫(A*) consisting of the appropriate L_ϕ's, cf. [10] for a treatment of first-order logic on words.

For lack of logical completeness, the duals of the Lindenbaum-Tarski algebras have more points than those given by models. Nevertheless, the dual spaces of types, which act as compactifications and completions of the collections of models, provide a powerful tool for studying logic fragments by topological means. The central notion is that of recognition, in which a Boolean subalgebra B ⊆ 𝒫(A*) is studied by means of the dual map η: β(A*) → X_B. Here β(A*) is the Stone dual of 𝒫(A*), also known in topology as the Čech-Stone compactification of the discrete space A*, and X_B is the Stone dual of B. The set A* embeds in β(A*), and η is uniquely determined by its restriction η_0: A* → X_B. Now, Stone duality implies that L ⊆ A* is in B iff there is a clopen subset V ⊆ X_B so that η_0^{-1}(V) = L. Anytime the latter is true for a map η_0 and a language L as above, one says that η_0 recognises L.

When studying logic fragments via recognition, the following inductive step is central: given a notion of quantifier and a recogniser for a Boolean algebra of formulas with a free variable, construct a recogniser for the Boolean algebra generated by the formulas obtained by applying the quantifier. This problem was solved in [5], using duality theory, in a general setting of semiring quantifiers. The latter are defined as follows: let (S, +, ·, 0_S, 1_S) be a semiring and k ∈ S. Given a formula ψ(v), the formula ∃_{S,k} v.ψ(v) is true of a word w ∈ A* iff k = 1_S + ··· + 1_S (m times), where m is the number of assignments of the variable v in w satisfying ψ(v).
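A semiring quantifier can be evaluated by folding the unit 1_S into a running S-valued sum, once per position satisfying ψ. The sketch below (our own function names, a simplification under the definition just given) instantiates S = ℤ/2ℤ and the two-element lattice:

```python
def semiring_exists(word, psi, add_one, zero, k):
    """Evaluate the quantifier: add 1_S into an S-valued sum once per position satisfying psi."""
    total = zero
    for i in range(len(word)):
        if psi(word, i):
            total = add_one(total)        # total := total + 1_S
    return total == k

psi_a = lambda w, i: w[i] == 'a'          # psi(v) = P_a(v)

# S = Z/2Z with k = 1: "the number of a's is odd" (a modular quantifier)
mod2 = lambda s: (s + 1) % 2
print(semiring_exists("abca", psi_a, mod2, 0, 1))   # False: two a's, even parity
# S = the two-element lattice, where 1 + 1 = 1: the ordinary existential quantifier
lat = lambda s: 1
print(semiring_exists("abca", psi_a, lat, 0, 1))    # True: some position carries an 'a'
```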
If S = ℤ/qℤ we obtain the so-called modular quantifiers, and for S the two-element lattice we recover the existential quantifier ∃. To deal with formulas with a free variable, one considers maps of the form f: β((A × 2)*) → X (the extra bit in A × 2 is used to mark the interpretation of the free variable). In [5] (see also [6]), it was shown that L_{ψ(v)} is recognised by f iff for every k ∈ S the language L_{∃_{S,k} v.ψ(v)} is recognised by the composite

ξ: A* --R--> S(β((A × 2)*)) --S(f)--> S(X),   (2)

where S(X) is the space of finitely additive S-valued measures on X, and R maps w ∈ A* to the measure μ_w: 𝒫((A × 2)*) → S sending K ⊆ (A × 2)* to the sum 1_S + ··· + 1_S (n_{w,K} times). Here, n_{w,K} is the number of interpretations α of the free variable v in w such that the pair (w, α), seen as an element of (A × 2)*, belongs to K. Finally, S(f) sends a measure to its pushforward along f. (Here, being beyond the scope of this paper, we are ignoring the important role of the monoid structure available on the spaces, in the form of profinite monoids or BiMs, cf. [10,5].)

3 The space Γ

Central to our results is a Priestley space Γ closely related to [0, 1], in which our measures will take values. Its construction comes from the insight that the range of the Stone pairing ⟨-, A⟩, for a finite structure A and formulas restricted to a fixed number of free variables, can be confined to a chain I_n = {0, 1/n, 2/n, ..., 1}. Moreover, the floor functions f_{mn,n}: I_{mn} ↠ I_n are monotone surjections. The ensuing system {f_{mn,n}: I_{mn} ↠ I_n | m, n ∈ ℕ} can thus be seen as a codirected diagram of finite discrete posets and monotone maps. Let us define Γ to be the limit of this diagram. Then Γ is naturally equipped with a structure of Priestley space, see e.g. [11, Corollary VI.3.3], and can be represented as based on the set

{r⁻ | r ∈ (0, 1]} ∪ {q° | q ∈ ℚ ∩ [0, 1]}.
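The chains I_n and the floor maps between them can be sketched as follows (our own code); the final assertion checks the compatibility that makes {f_{mn,n}} a codirected diagram:

```python
from fractions import Fraction

def I(n):
    """The chain I_n = {0, 1/n, 2/n, ..., 1}."""
    return [Fraction(i, n) for i in range(n + 1)]

def floor_map(x, n):
    """f_{mn,n}: send x in I_{mn} to the largest element of I_n below or equal to it."""
    return Fraction((x.numerator * n) // x.denominator, n)

# the monotone surjection I_6 ->> I_3
print([str(v) for v in (floor_map(x, 3) for x in I(6))])
# -> ['0', '0', '1/3', '1/3', '2/3', '2/3', '1']

# compatibility: I_12 -> I_6 -> I_3 agrees with I_12 -> I_3
assert all(floor_map(floor_map(x, 6), 3) == floor_map(x, 3) for x in I(12))
```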
The order of Γ is the unique total order which has 0° as bottom element, satisfies r^∗ < s^∗ if and only if r < s for ∗ ∈ {−, °}, and such that q° is a cover of q⁻ for every rational q ∈ (0, 1] (i.e. q⁻ < q°, and there is no element strictly in between). In a sense, the values q⁻ represent approximations of the values of the form q°. Cf. Figure 1. The topology of Γ is generated by the sets of the form ↑p° = {x ∈ Γ | p° ≤ x} and ↓q⁻ = {x ∈ Γ | x ≤ q⁻}, for p, q ∈ ℚ ∩ [0, 1] such that q ≠ 0. The distributive lattice dual to Γ, denoted by L, is given by L = {⊥} ∪ (ℚ ∩ [0, 1])^∂, with ⊥ <_L q and q ≤_L p for every p ≤ q in ℚ ∩ [0, 1].

Fig. 1. The Priestley space Γ and its dual lattice L

3.1 The algebraic structure on Γ

When defining measures we need an algebraic structure available on the space of values. The space Γ fulfils this requirement as it comes equipped with a partial operation −: dom(−) → Γ, where dom(−) = {(x, y) ∈ Γ × Γ | y ≤ x} and

r° − s° = (r − s)°   and   r⁻ − s° = (r − s)⁻,   while
r° − s⁻ = r⁻ − s⁻ = (r − s)°  if r − s ∈ ℚ,   and   r° − s⁻ = r⁻ − s⁻ = (r − s)⁻  otherwise.

In fact, this (partial) operation is dual to the truncated addition on the lattice L. However, explaining this would require us to delve into extended Priestley duality for lattices with operations, which is beyond the scope of this paper. See [9] and also [7,8] for details. It also follows from the general theory that there exists another partial operation definable from −, namely

∼: dom(−) → Γ,   x ∼ y = ⋁ {x − q° | y < q° ≤ x}.

Next, we collect some basic properties of − and ∼, needed in Section 4, which follow from the general theory of [7,8]. First, recall that a map into an ordered topological space is lower (resp. upper) semicontinuous provided the preimage of any open down-set (resp. open up-set) is open.

Lemma 1. If dom(−) is seen as a subspace of Γ × Γ^∂, the following hold:
1. dom(−) is a closed up-set in Γ × Γ^∂;
2.
both −: dom(−) → Γ and ∼: dom(−) → Γ are monotone in the first coordinate, and antitone in the second;
3. −: dom(−) → Γ is lower semicontinuous;
4. ∼: dom(−) → Γ is upper semicontinuous.

3.2 The retraction Γ ↠ [0, 1]

In this section we show that, with respect to appropriate topologies, the unit interval [0, 1] can be obtained as a topological retract of Γ, in a way which is compatible with the operation −. This will be important in Sections 4 and 5, where we need to move between [0, 1]-valued and Γ-valued measures. Let us define the monotone surjection given by collapsing the doubled elements:

γ: Γ → [0, 1],   r⁻, r° ↦ r.   (3)

The map γ has a right adjoint, given by

ι: [0, 1] → Γ,   r ↦ r° if r ∈ ℚ, and r ↦ r⁻ otherwise.   (4)

Indeed, it is readily seen that γ(y) ≤ x iff y ≤ ι(x), for all y ∈ Γ and x ∈ [0, 1]. The composition γ · ι coincides with the identity on [0, 1], i.e. ι is a section of γ. Moreover, this retraction lifts to a topological retract provided we equip Γ and [0, 1] with the topologies consisting of the open down-sets:

Lemma 2. The map γ: Γ → [0, 1] is continuous and the map ι: [0, 1] → Γ is lower semicontinuous.

Proof. To check continuity of γ observe that, for a rational q ∈ (0, 1), the preimages γ^{-1}(q, 1] and γ^{-1}[0, q) coincide, respectively, with the open sets

⋃ {↑p° | p ∈ ℚ ∩ [0, 1] and q < p}   and   ⋃ {↓p⁻ | p ∈ ℚ ∩ (0, 1] and p < q}.

Also, ι is lower semicontinuous, for ι^{-1}(↓q⁻) = [0, q) whenever q ∈ ℚ ∩ (0, 1].

It is easy to see that both γ and ι preserve the minus structure available on Γ and [0, 1] (the unit interval is equipped with the usual minus operation x − y, defined whenever y ≤ x), that is,

• γ(x − y) = γ(x ∼ y) = γ(x) − γ(y) whenever y ≤ x in Γ, and
• ι(x − y) = ι(x) − ι(y) whenever y ≤ x in [0, 1].

Remark. ι: [0, 1] → Γ is not upper semicontinuous because, for every q ∈ ℚ ∩ [0, 1], ι^{-1}(↑q°) = {x ∈ [0, 1] | q° ≤ ι(x)} = {x ∈ [0, 1] | γ(q°) ≤ x} = [q, 1], which is not open in [0, 1].
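To make Section 3 concrete, here is a small executable model of the rational part of Γ (our own encoding, not from the paper): q° is the pair (q, O) and q⁻ is (q, M). Keeping to rationals means every difference r − s is rational, which selects the "r − s ∈ ℚ" case of −. The code checks the two adjunctions stated above on a sample:

```python
from fractions import Fraction
from itertools import product

O, M = 'o', 'm'                      # tags: (q, O) plays q°, (q, M) plays q⁻

def leq(x, y):                       # the total order: compare values, then q⁻ < q°
    (r, t), (s, u) = x, y
    return r < s or (r == s and (t == u or (t == M and u == O)))

def minus(x, y):                     # x − y for y <= x, rational case of Section 3.1
    (r, t), (s, u) = x, y
    assert leq(y, x)
    return (r - s, t if u == O else O)

def plus(x, y):                      # r° + s° = (r + s)°; any ⁻ summand gives a ⁻ sum
    (r, t), (s, u) = x, y
    return (r + s, O if t == u == O else M)

def gamma(x):                        # γ collapses the doubled rationals   (equation (3))
    return x[0]

def iota(r):                         # ι picks q° on rationals             (equation (4))
    return (r, O)

q = Fraction
sample = [(q(0), O), (q(1, 3), M), (q(1, 3), O), (q(1, 2), M),
          (q(1, 2), O), (q(2, 3), M), (q(1), M), (q(1), O)]

for x, y, z in product(sample, repeat=3):
    if leq(y, z):                    # adjunction: x + y <= z  iff  x <= z − y
        assert leq(plus(x, y), z) == leq(x, minus(z, y))

for y in sample:                     # adjunction: γ(y) <= x  iff  y <= ι(x)
    for x in [q(0), q(1, 3), q(1, 2), q(1)]:
        assert (gamma(y) <= x) == leq(y, iota(x))
```

This sketch omits the irrational elements r⁻, so ι never takes its second branch here; it is meant only to illustrate how the doubled rationals interact with −, +, γ, and ι.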
4 Spaces of measures valued in Γ and in [0, 1]

The aim of this section is to replace [0, 1]-valued measures by Γ-valued measures. The reason for doing this is two-fold. First, the space of Γ-valued measures is Priestley (Proposition 4), and thus amenable to a duality-theoretic treatment and a dual logic interpretation (cf. Section 6). Second, it retains more topological information than the space of [0, 1]-valued measures. Indeed, the former retracts onto the latter (Theorem 10).

Let D be a distributive lattice. Recall that, classically, a monotone function m: D → [0, 1] is a (finitely additive, probability) measure provided m(0) = 0, m(1) = 1, and m(a) + m(b) = m(a ∨ b) + m(a ∧ b) for every a, b ∈ D. The latter property is equivalently expressed as

∀a, b ∈ D,   m(a) − m(a ∧ b) = m(a ∨ b) − m(b).   (5)

We write M_I(D) for the set of all measures D → [0, 1], and regard it as an ordered topological space, with the structure induced by the product order and product topology of [0, 1]^D. The notion of (finitely additive, probability) Γ-valued measure is analogous to the classical one, except that the finite additivity property (5) splits into two conditions, involving − and ∼.

Definition 3. Let D be a distributive lattice. A Γ-valued measure (or simply a measure) on D is a function μ: D → Γ such that
1. μ(0) = 0° and μ(1) = 1°,
2. μ is monotone, and
3. for all a, b ∈ D, μ(a) ∼ μ(a ∧ b) ≤ μ(a ∨ b) − μ(b) and μ(a) − μ(a ∧ b) ≥ μ(a ∨ b) ∼ μ(b).

We denote by M_Γ(D) the subspace of Γ^D consisting of the measures μ: D → Γ. Since Γ is a Priestley space, so is Γ^D equipped with the product order and topology. Hence, we regard M_Γ(D) as an ordered topological space, whose topology and order are induced by those of Γ^D. In fact, M_Γ(D) is a Priestley space:

Proposition 4. For any distributive lattice D, M_Γ(D) is a Priestley space.

Proof. It suffices to show that M_Γ(D) is a closed subspace of Γ^D. Let

C_{1,2} = {f ∈ Γ^D | f(0) = 0°} ∩ {f ∈ Γ^D | f(1) = 1°} ∩ ⋂_{a≤b} {f ∈ Γ^D | f(a) ≤ f(b)}.

Note that the evaluation maps ev_a: Γ^D → Γ, f ↦ f(a), are continuous for every a ∈ D. Thus, the first set in the intersection defining C_{1,2} is closed because it is the equaliser of the evaluation map ev_0 and the constant map of value 0°. Similarly for the set {f ∈ Γ^D | f(1) = 1°}. The last one is the intersection of the sets of the form ⟨ev_a, ev_b⟩^{-1}(≤), which are closed because ≤ is closed in Γ × Γ. Whence, C_{1,2} is a closed subset of Γ^D. Moreover,

M_Γ(D) = ⋂_{a,b∈D} {f ∈ C_{1,2} | f(a) ∼ f(a ∧ b) ≤ f(a ∨ b) − f(b)}
Let D ◦ D ◦ D C = {f ∈ Γ | f(0) = 0 }∩{f ∈ Γ | f(1) = 1 }∩ {f ∈ Γ | f(a) ≤ f(b)}. 1,2 a≤b Note that the evaluation maps ev : Γ → Γ, f → f(a), are continuous for every a ∈ D. Thus, the first set in the intersection defining C is closed because it 1,2 is the equaliser of the evaluation map ev and the constant map of value 0 . D ◦ Similarly, for the set {f ∈ Γ | f(1) = 1 }. The last one is the intersection −1 of the sets of the form ev , ev  (≤), which are closed because ≤ is closed in a b Γ × Γ. Whence, C is a closed subset of Γ . Moreover, 1,2 M (D)= {f ∈ C | f(a) ∼ f(a ∧ b) ≤ f(a ∨ b) − f(b)} Γ 1,2 a,b∈D 308 M. Gehrke et al. ∩ {f ∈ C | f(a) − f(a ∧ b) ≥ f(a ∨ b) ∼ f(b)}. 1,2 a,b∈D From semicontinuity of − and ∼ (Lemma 1) and the following well-known fact in order-topology we conclude that M (D) is closed in Γ . Fact. Let X, Y be compact ordered spaces, f : X → Y a lower semicontinuous function and g : X → Y an upper semicontinuous function. If X is a closed subset of X, then so is E = {x ∈ X | g(x) ≤ f(x)}. Next, we prove a property which is very useful when approximating a frag- ment of a logic by smaller fragments (see, e.g., Section 5.1). Let us denote by DLat the category of distributive lattices and homomorphisms, and by Pries the category of Priestley spaces and continuous monotone maps. Proposition 5. The assignment D →M (D) yields a contravariant functor M : DLat → Pries which sends directed colimits to codirected limits. Proof. If h: D → E is a lattice homomorphism and μ: E → Γ is a measure, it is not difficult to see that M (h)(μ)= μ · h: D → Γ is a measure. The mapping M (h): M (E) →M (D) is clearly monotone. For continuity, recall that the Γ Γ Γ topology of M (D) is generated by the sets a<q = {ν : D → Γ | ν(a) <q } and a ≥ q = {ν : D → Γ | ν(a) ≥ q }, with a ∈ D and q ∈ Q ∩ [0, 1]. We have −1 ◦ M (h) (a<q)= {μ: E → Γ | μ(h(a)) <q } = h(a) <q −1 which is open in M (E). Similarly, M (h) (a ≥ q)= h(a) ≥ q, showing Γ Γ that M (h) is continuous. 
Thus, M_Γ is a contravariant functor. The rest of the proof is a routine verification.

Remark 6. We work with the contravariant functor M_Γ: DLat → Pries because M_Γ is concretely defined on the lattice side. However, by Priestley duality, DLat is dually equivalent to Pries, so we can think of M_Γ as a covariant functor Pries → Pries (this is the perspective traditionally adopted in analysis, and also in the works of Nešetřil and Ossona de Mendez). From this viewpoint, Section 6 provides a description of the endofunctor on DLat dual to M_Γ: Pries → Pries.

Recall the maps γ: Γ → [0, 1] and ι: [0, 1] → Γ from equations (3)–(4). In Section 3.2 we showed that this is a retraction-section pair. In Theorem 10 this retraction is lifted to the spaces of measures. We start with an easy observation:

Lemma 7. Let D be a distributive lattice. The following statements hold:
1. for every μ ∈ M_Γ(D), γ · μ ∈ M_I(D);
2. for every m ∈ M_I(D), ι · m ∈ M_Γ(D).

Proof. 1. The only non-trivial condition to verify is finite additivity. In view of the discussion after Lemma 2, the map γ preserves both minus operations on Γ. Hence, for every a, b ∈ D, the inequalities μ(a) ∼ μ(a ∧ b) ≤ μ(a ∨ b) − μ(b) and μ(a) − μ(a ∧ b) ≥ μ(a ∨ b) ∼ μ(b) imply that γ · μ(a) − γ · μ(a ∧ b) = γ · μ(a ∨ b) − γ · μ(b).

2. The first two conditions in Definition 3 are immediate. The third condition follows from the fact that ι(r − s) = ι(r) − ι(s) whenever s ≤ r in [0, 1], and x ∼ y ≤ x − y for every (x, y) ∈ dom(−).

In view of the previous lemma, there are well-defined functions

γ^♯: M_Γ(D) → M_I(D), μ ↦ γ · μ   and   ι^♯: M_I(D) → M_Γ(D), m ↦ ι · m.

Lemma 8. γ^♯: M_Γ(D) → M_I(D) is a continuous and monotone map.

Proof. The topology of M_I(D) is generated by the sets of the form {m ∈ M_I(D) | m(a) ∈ O}, for a ∈ D and O an open subset of [0, 1].
In turn,

(γ^♯)^{-1}({m ∈ M_I(D) | m(a) ∈ O}) = {μ ∈ M_Γ(D) | μ(a) ∈ γ^{-1}(O)}

is open in M_Γ(D) because γ: Γ → [0, 1] is continuous by Lemma 2. This shows that γ^♯: M_Γ(D) → M_I(D) is continuous. Monotonicity is immediate.

Note that γ^♯: M_Γ(D) → M_I(D) is surjective, since it admits ι^♯ as a (set-theoretic) section. It follows that M_I(D) is a compact ordered space:

Corollary 9. For each distributive lattice D, M_I(D) is a compact ordered space.

Proof. The surjection γ^♯: M_Γ(D) → M_I(D) is continuous (Lemma 8). Since M_Γ(D) is compact by Proposition 4, so is M_I(D). The order of M_I(D) is clearly closed in the product topology, thus M_I(D) is a compact ordered space.

Finally, we see that the set-theoretic retraction of M_Γ(D) onto M_I(D) lifts to the topological setting, provided we restrict to the down-set topologies. If (X, ≤) is a partially ordered topological space, write X^↓ for the space with the same underlying set as X and whose topology consists of the open down-sets of X.

Theorem 10. The maps γ^♯: M_Γ(D)^↓ → M_I(D)^↓ and ι^♯: M_I(D)^↓ → M_Γ(D)^↓ are a retraction-section pair of topological spaces.

Proof. It suffices to show that γ^♯ and ι^♯ are continuous. It is not difficult to see, using Lemma 8, that γ^♯: M_Γ(D)^↓ → M_I(D)^↓ is continuous. For the continuity of ι^♯, note that the topology of M_Γ(D)^↓ is generated by the sets of the form {μ ∈ M_Γ(D) | μ(a) ≤ q⁻}, for a ∈ D and q ∈ ℚ ∩ (0, 1]. We have

(ι^♯)^{-1}({μ ∈ M_Γ(D) | μ(a) ≤ q⁻}) = {m ∈ M_I(D) | m(a) ∈ ι^{-1}(↓q⁻)} = {m ∈ M_I(D) | m(a) < q},

which is an open set in M_I(D)^↓. This concludes the proof.

5 The Γ-valued Stone pairing and limits of finite structures

In the work of Nešetřil and Ossona de Mendez, the Stone pairing ⟨-, A⟩ is [0, 1]-valued, i.e. an element of M_I(FO(σ)). In this section, we show that basically the
Section 2.3) provides an embedding of finite σ-structures into the space of Γ-valued measures. It turns out that this embedding is a Γ-valued version of the Stone pairing. Hereafter we make a notational difference, writing -, - for the (classical) [0,1]-valued Stone pairing. The main ingredient of the construction are the Γ-valued finitely supported functions. To start with, we point out that the partial operation − on Γ uniquely determines a partial “plus” operation on Γ. Define +: dom(+) → Γ, where dom(+) = {(x, y) | x ≤ 1 − y}, by the following rules (whenever the expressions make sense): ◦ ◦ ◦ − ◦ − ◦ − − − − − r +s =(r+s) ,r +s =(r+s) ,r +s =(r+s) , and r +s =(r+s) . Then, for every y ∈ Γ, the function (-) + y sending x to x + y is left adjoint to the function (-) − y sending x to x − y. Definition 11. For any set X, F(X) is the set of all functions f : X → Γ s.t. 1. the set supp(f)= {x ∈ X | f(x) =0 } is finite, and 2. f(x )+···+f(x ) is defined and equal to 1 , where supp(f)= {x ,...,x }. 1 n 1 n To improve readability, if the sum y + ··· + y exists in Γ, we denote it 1 m y . Finitely supported functions in the above sense always determine mea- i=1 sures over the power-set algebra (the proof is an easy verification and is omitted): Lemma 12. Let X be any set. There is a well-defined mapping : F(X) → M ( (X)), assigning to every f ∈F(X) the measure f : M → f = {f(x) | x ∈ M ∩ supp(f)}. 5.1 The Γ-valued Stone pairing and logic on words Fix a countably infinite set of variables {v ,v ,... }. Recall that FO (σ) is the 1 2 n Lindenbaum-Tarski algebra of first-order formulas with free variables among {v ,...,v }. The dual space of FO (σ) is the space of n-types Typ (σ). Its 1 n n points are the equivalence classes of pairs (A, α), where A is a σ-structure and α: {v ,...,v }→ A is an interpretation of the variables. 
Write Fin(σ) for the 1 n set of all finite σ-structures and define a map Fin(σ) →F(Typ (σ)) as A → f , n n where f is the function which sends an equivalence class E ∈ Typ (σ)to n n 1 (Add for every interpretation α of the free A n |A| f (E)= |A| variables s.t. (A, α) is in the equivalence class). (A,α)∈E By Lemma 12, we get a measure f : (Typ (σ)) → Γ. Now, for each ϕ ∈ n n FO (σ), let ϕ ⊆ Typ (σ) be the set of (equivalence classes of) σ-structures n n with interpretations satisfying ϕ. By Stone duality we obtain an embedding - :FO (σ) → (Typ (σ)). Restricting f to FO (σ), we get a measure n n n n n A A μ :FO (σ) → Γ,ϕ → f . n n n A duality theoretic view on limits of finite structures 311 Summing up, we have the composite map A A Fin(σ) →M ( (Typ (σ))) →M (FO (σ)),A → f → μ . (6) Γ Γ n n n n Essentially the same construction is featured in logic on words, cf. equation (2): • The set of finite σ-structures Fin(σ) corresponds to the set of finite words A . • The collection Typ (σ) of (equivalence classes of) σ-structures with interpre- ∗ ∗ tations corresponds to (A × 2) or, interchangeably, β(A × 2) (in the case of one free variable). • The fragment FO (σ) of first-order logic corresponds to the Boolean algebra of languages, defined by formulas with a free variable, dual to the Boolean space X appearing in (2). • The first map in the composite (6) sends a finite structure A to the measure f which, evaluated on K ⊆ Typ (σ), counts the (proportion of) interpre- n n tations α: {v ,...,v }→ A suchthat(A, α) ∈ K, similarly to R from (2). 1 n • Finally, the second map in (6) sends a measure in M ( (Typ (σ))) to its pushforward along - :FO (σ) → (Typ (σ)). This is the second map in n n the composition (2). On the other hand, the assignment A → μ defined in (6) is also closely related to the classical Stone pairing. Indeed, for every formula ϕ in FO (σ), A A μ (ϕ)= f (E)= n n |A| E∈ϕ E∈ϕ (A,α)∈E n n |{a ∈ A | A |= ϕ(a)| = =(ϕ, A ) . 
(7) |A| In this sense, μ can be regarded as a Γ-valued Stone pairing, relative to the fragment FO (σ). Next, we show how to extend this to the full first-order logic FO(σ). First, we observe that the construction is invariant under extensions of the set of free variables (the proof is the same as in the classical case). A A Lemma 13. Given m, n ∈ N and A ∈ Fin(σ),if m ≥ n then (μ ) = μ . FO (σ) m n n The Lindenbaum-Tarski algebra of all first-order formulas FO(σ) is the directed colimit of the Boolean subalgebras FO (σ), for n ∈ N. Since the functor M n Γ turns directed colimits into codirected limits (Proposition 5), the Priestley space M (FO(σ)) is the limit of the diagram n,m M (FO (σ)) M (FO (σ)) | m, n ∈ N,m ≥ n Γ n Γ m where, for any μ:FO (σ) → Γ in M (FO (σ)), the measure q (μ)isthe m Γ m n,m restriction of μ to FO (σ). In view of Lemma 13, for every A ∈ Fin(σ), the tuple (μ ) is compatible with the restriction maps. Thus, recalling that limits in n∈N the category of Priestley spaces are computed as in sets, by universality of the limit construction, this tuple yields a measure -,A :FO(σ) → Γ Γ 312 M. Gehrke et al. in the space M (FO(σ)). This we call the Γ-valued Stone pairing associated with A. As in the classical case, it is not difficult to see that the mapping A →-,A gives an embedding -, - : Fin(σ) →M (FO(σ)). The following theorem illustrates the relation between the classical Stone pairing -, - : Fin(σ) → M (FO(σ)), and the Γ-valued one. Theorem 14. The following diagram commutes: M (FO(σ)) -,- Fin(σ) γ -,- M (FO(σ)) Proof. Fix an arbitrary finite structure A ∈ Fin(σ). Let ϕ be a formula in FO(σ) with free variables among {v ,...,v }, for some n ∈ N. By construction, 1 n A ◦ ϕ, A = μ (ϕ). Therefore, by equation (7), ϕ, A =(ϕ, A ) . The state- Γ n Γ I ment then follows at once. Remark. The construction in this section works also for proper fragments, i.e. for sublattices D ⊆ FO(σ). 
This corresponds to composing the embedding Fin(σ) →M (FO(σ)) with the restriction map M (FO(σ)) →M (D) send- Γ Γ Γ ing μ:FO(σ) → Γ to μ : D → Γ. The only difference is that the ensuing map Fin(σ) →M (D) need not be injective, in general. 5.2 Limits in the spaces of measures By Theorem 14 the Γ-valued Stone pairing -, - and the classical Stone pair- ing -, - determine each other. However, the notions of convergence asso- ciated with the spaces M (FO(σ)) and M (FO(σ)) are different: since the Γ I topology of M (FO(σ)) is richer, there are “fewer” convergent sequences. Re- call from Lemma 8 that γ : M (FO(σ)) →M (FO(σ)) is continuous. Also, Γ I γ (-,A )= -,A by Theorem 14. Thus, for any sequence of finite structures Γ I (A ) ,if n n∈N -,A  converges to a measure μ in M (FO(σ)) n Γ then -,A  converges to the measure γ (μ)in M (FO(σ)). n I The converse is not true. For example, consider the signature σ = {<} con- sisting of a single binary relation symbol, and let (A ) be the sequence of n n∈N finite posets displayed in the picture below. A A A A A A ··· 1 2 3 4 5 6 A duality theoretic view on limits of finite structures 313 Let ψ(x) ≈∀y ¬(x<y) ∧∃z ¬(z< x) ∧¬(z = x) be the formula stating that x is maximal but not the maximum in the order given by <. Then, for the sublattice D = {f,ψ, t} of FO(σ), the sequences -,A  and -,A  converge n n Γ I in M (D) and M (D), respectively. However, if we consider the Boolean algebra Γ I B = {f,ψ, ¬ψ, t}, then the -,A  ’s still converge whereas the -,A  ’s do not. n n I Γ Indeed, the following sequence does not converge in Γ: ◦ ◦ ◦ ◦ ◦ ◦ 1 2 3 (¬ψ, A  ) =(1 , ( ) , 1 , ( ) , 1 , ( ) ,...), n n Γ 3 4 5 ◦ − because the odd terms converge to 1 , while the even terms converge to 1 . However, there is a sequence -,B  whose image under γ coincides with the limit of the -,A  ’s (e.g., take the subsequence of even terms of (A ) ). In n n n∈N the next theorem, we will see that this is a general fact. 
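Equation (7) above identifies the classical Stone pairing ⟨ϕ, A⟩ with the fraction of interpretations of the free variables satisfying ϕ in A. As an illustrative aside (not part of the paper's development), this count is easy to compute directly for small finite structures; here formulas are modelled as Python predicates, which is our own encoding.

```python
from itertools import product
from fractions import Fraction

def stone_pairing(phi, A, n):
    """Classical Stone pairing <phi, A>: the fraction of the |A|^n
    interpretations alpha of n free variables with A |= phi(alpha)."""
    U = list(A)
    hits = sum(1 for alpha in product(U, repeat=n) if phi(A, alpha))
    return Fraction(hits, len(U) ** n)

# Example over sigma = {<} on the linear order A = {0, 1, 2}:
# phi(v1) = "v1 is maximal", i.e. for all y, not (v1 < y).
A = {0, 1, 2}
phi = lambda A, alpha: all(not (alpha[0] < y) for y in A)
print(stone_pairing(phi, A, 1))   # 1/3: only the element 2 is maximal
```

Such values always lie in Q ∩ [0, 1], in line with the observation that the Stone pairing of a finite structure takes rational values.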
Identify Fin(σ) with a subset of M_Γ(FO(σ)) (resp. M_I(FO(σ))) through ⟨-, -⟩_Γ (resp. ⟨-, -⟩_I). A central question in the theory of structural limits, cf. [16], is to determine the closure of Fin(σ) in M_I(FO(σ)), which consists precisely of the limits of sequences of finite structures. The following theorem gives an answer to this question in terms of the corresponding question for M_Γ(FO(σ)).

Theorem 15. Let cl(Fin(σ)) denote the closure of Fin(σ) in M_Γ(FO(σ)). Then the set γ♯(cl(Fin(σ))) coincides with the closure of Fin(σ) in M_I(FO(σ)).

Proof. Write U for the image of ⟨-, -⟩_Γ : Fin(σ) → M_Γ(FO(σ)), and V for the image of ⟨-, -⟩_I : Fin(σ) → M_I(FO(σ)). We must prove that γ♯(cl(U)) = cl(V). By Theorem 14, γ♯(U) = V. The map γ♯ : M_Γ(FO(σ)) → M_I(FO(σ)) is continuous (Lemma 8), and the spaces M_Γ(FO(σ)) and M_I(FO(σ)) are compact Hausdorff (Proposition 4 and Corollary 9). Since continuous maps between compact Hausdorff spaces are closed, γ♯(cl(U)) = cl(γ♯(U)) = cl(V).

6 The logic of measures

Let D be a distributive lattice. We know from Proposition 4 that the space M_Γ(D) of Γ-valued measures on D is a Priestley space, whence it has a dual distributive lattice P(D). In this section we show that P(D) can be represented as the Lindenbaum-Tarski algebra for a propositional logic PL_D obtained from D by adding probabilistic quantifiers. Since we adopt a logical perspective, we write f and t for the bottom and top elements of D, respectively.

The set of propositional variables of PL_D consists of the symbols P≥p a, for every a ∈ D and p ∈ Q ∩ [0, 1]. For every measure μ ∈ M_Γ(D), we set

μ ⊨ P≥p a ⇔ μ(a) ≥ p°. (8)

This satisfaction relation extends in the obvious way to the closure under finite conjunctions and finite disjunctions of the set of propositional variables. Define ϕ ⊨ ψ if, for all μ ∈ M_Γ(D), μ ⊨ ϕ implies μ ⊨ ψ. Also, write ⊨ ϕ if μ ⊨ ϕ for every μ ∈ M_Γ(D), and ϕ ⊨ if there is no μ ∈ M_Γ(D) with μ ⊨ ϕ.
Consider the following conditions, for any p, q, r ∈ Q ∩ [0, 1] and a, b ∈ D. (L1) P a |= P a whenever p ≤ q ≥q ≥p (L2) P f |= whenever p> 0, |= P f and |= P t ≥p ≥0 ≥q (L3) P a |= P b whenever a ≤ b ≥q ≥q (L4) P a ∧ P b |= P (a∨b) ∨ P (a∧b) whenever 0 ≤ p+q−r ≤ 1 ≥p ≥q ≥p+q−r ≥r (L5) P (a∨b) ∧ P (a∧b) |= P a ∨ P b whenever 0 ≤ p+q−r ≤ 1 ≥p+q−r ≥r ≥p ≥q It is not hard to see that the interpretation in (8) validates these conditions: Lemma 16. The conditions (L1)–(L5) are satisfied in M (D). Write P(D) for the quotient of the free distributive lattice on the set {P a | p ∈ Q ∩ [0, 1],a ∈ D} ≥p with respect to the congruence generated by the conditions (L1)–(L5). Proposition 17. Let F ⊆ P(D) be a prime filter. The assignment a → {q | P a ∈ F } defines a measure μ : D → Γ. ≥q F Proof. Items (L2) and (L3) take care of the first two conditions defining Γ-valued measures (cf. Definition 3). We prove the first half of the third condition, as the other half is proved in a similar fashion. We must show that, for every a, b ∈ D, μ (a) ∼ μ (a ∧ b) ≤ μ (a ∨ b) − μ (b). (9) F F F F ◦ ◦ ◦ ◦ ◦ It is not hard to show that μ (a) − r = {p − r | r ≤ p ≤ μ (a)}, and F F 1 1 x − (-) transforms non-empty joins into meets (this follows by Scott continuity ◦ ∂ of x − (-) seen as a map [0 ,x] → Γ ). Hence, equation (9) is equivalent to ◦ ◦ ◦ ◦ ◦ ◦ {p − r | μ (a ∧ b) <r ≤ p ≤ μ (a)}≤ {μ (a ∨ b) − q | q ≤ μ (b)}. F F F F To settle this inequality it is enough to show that, provided μ (a ∧ b) <r ≤ ◦ ◦ ◦ ◦ p ≤ μ (a) and q ≤ μ (b), we have (p − r) ≤ μ (a ∨ b) − q . The latter F F F inequality is equivalent to (p + q − r) ≤ μ (a ∨ b). In turn, using (L4) and the fact that F is a prime filter, P a, P b ∈ F and P (a ∧ b) ∈ / F entail ≥p ≥q ≥r P (a ∨ b) ∈ F . Whence, ≥p+q−r ◦ ◦ μ (a ∨ b)= {s | P (a ∨ b) ∈ F}≥ (p + q − r) . 
F ≥s We can now describe the dual lattice of M (D) as the Lindenbaum-Tarski algebra for the logic PL , built from the propositional variables P a by im- D ≥p posing the laws (L1)–(L5). Theorem 18. Let D be a distributive lattice. Then the lattice P(D) is isomor- phic to the distributive lattice dual to the Priestley space M (D). Proof. Let X be the space dual to P(D). By Proposition 17 there is a map P(D) ϑ: X →M (D), F → μ . We claim that ϑ is an isomorphism of Priestley P(D) Γ F space. Clearly, ϑ is monotone. If μ (a) ≤ μ (a) for some a ∈ D,wehave F F 1 2 ◦ − {q | P a ∈ F } = μ (a) ≤ μ (a)= {p | P a/ ∈ F }. (10) ≥q 1 F F ≥p 2 1 2 A duality theoretic view on limits of finite structures 315 Equation (10) implies the existence of p, q satisfying P a ∈ F , P a/ ∈ F and ≥q 1 ≥p 2 q ≥ p. It follows by (L1) that P a ∈ F . We conclude that P a ∈ F \ F , ≥p 1 ≥p 1 2 whence F ⊆ F . This shows that ϑ is an order embedding, whence injective. 1 2 We prove that ϑ is surjective, thus a bijection. Fix a measure μ ∈M (D). It is not hard to see, using Lemma 16, that the filter F ⊆ P(D) generated by {P a | a ∈ D, q ∈ Q ∩ [0, 1],μ(a) ≥ q } ≥q ◦ ◦ ◦ is prime. Further, ϑ(F )(a)= {q | P a ∈ F } = {q | μ(a) ≥ q } = μ(a) μ ≥q μ for every a ∈ D. Hence, ϑ(F )= μ and ϑ is surjective. To settle the theorem it remains to show that ϑ is continuous. Note that for a basic clopen of the form C = {μ ∈M (D) | μ(a) ≥ p } where a ∈ D and −1 ◦ p ∈ Q ∩ [0, 1], the preimage ϑ (C)= {F ⊆ P(D) | μ (a) ≥ p } is equal to ◦ ◦ {F ∈ X | {q | P a ∈ F}≥ p } = {F ∈ X | P a ∈ F }, P(D) ≥q P(D) ≥p which is a clopen of X . Similarly, if C = {μ ∈M (D) | μ(a) ≤ q } for some P(D) −1 a ∈ D and q ∈ Q∩(0, 1], by the claim above ϑ (C)= {F ∈ X | P a/ ∈ F }, ≥q P(D) which is again a clopen of X . P(D) By Theorem 18, for any distributive lattice D, the lattice of clopen up-sets of M (D) is isomorphic to the Lindenbaum-Tarski algebra P(D)ofour positive propositional logic PL . 
Moving from the lattice of clopen up-sets to the Boolean algebra of all clopens logically corresponds to adding negation to the logic. The logic obtained this way can be presented as follows. Introduce a new propositional variable P a, for each a ∈ D and q ∈ Q ∩ [0, 1]. For a measure μ ∈M (D), set <q Γ μ |= P a ⇔ μ(a) <q . <q We also add a new rule, stating that P a is the negation of P a: <q ≥q (L6) P a ∧ P a |= and |= P a ∨ P a <q ≥q <q ≥q Clearly, (L6) is satisfied in M (D). Moreover, the Boolean algebra of all clopens of M (D) is isomorphic to the quotient of the free distributive lattice on {P a | p ∈ Q ∩ [0, 1],a ∈ D}∪{P b | q ∈ Q ∩ [0, 1],b ∈ D} ≥p <q with respect to the congruence generated by the conditions (L1)–(L6). Specialising to FO(σ). Let us briefly discuss what happens when we instantiate D with the full first-order logic FO(σ). For a formula ϕ ∈ FO(σ) with free variables v ,...,v and a q ∈ Q ∩ [0, 1], we have two new sentences P ϕ and P ϕ.For 1 n ≥q <q a finite σ-structure A identified with its Γ-valued Stone pairing -,A , ◦ ◦ A |= P ϕ (resp. A |= P ϕ)iff ϕ, A ≥ q (resp. ϕ, A <q ). ≥q <q Γ Γ That is, P ϕ is true in A if a random assignment of the variables v ,...,v in A ≥q 1 n satisfies ϕ with probability at least q. Similarly for P ϕ. If we regard P and <q ≥q P as probabilistic quantifiers that bind all free variables of a given formula, <q the Stone pairing -, - : Fin →M (FO(σ)) can be seen as the embedding of finite structures into the space of types for the logic PL . FO(σ) 316 M. Gehrke et al. Conclusion Types are points of the dual space of a logic (viewed as a Boolean algebra). In classical first-order logic, 0-types are just the models modulo elementary equiv- alence. But when there are not ‘enough’ models, as in finite model theory, the spaces of types provide completions of the sets of models. 
In [5], it was shown that for logic on words and various quantifiers we have that, given a Boolean algebra of formulas with a free variable, the space of types of the Boolean algebra generated by the formulas obtained by quantification is given by a measure space construction. Here we have shown that a suitable enrichment of first-order logic gives rise to a space of measures M (FO(σ)) closely related to the space M (FO(σ)) used in the theory of structural limits. Indeed, Theorem 14 tells us that the ensuing Stone pairings interdetermine each other. Further, the Stone pairing for M (FO(σ)) is just the embedding of the finite models in the completion/compactification provided by the space of types of the enriched logic. These results identify the logical gist of the theory of structural limits, and provide a new and interesting connection between logic on words and the theory of structural limits in finite model theory. But we also expect that it may prove a useful tool in its own right. Thus, for structural limits, it is an open problem to characterise the closure of the image of the [0, 1]-valued Stone pairing [16]. Rea- soning in the Γ-valued setting, native to logic and where we can use duality, one would expect that this is the subspace M (Th(Fin)) of M (FO(σ)) given by the Γ Γ quotient FO(σ)  Th(Fin) onto the theory of pseudofinite structures. The pur- pose of such a characterisation would be to understand the points of the closure as “generalised models”. Another subject that we would like to investigate is that of zero-one laws. The zero-one law for first-order logic states that the sequence of measures for which the nth measure, on a sentence ψ, yields the proportion of n-element structures satisfying ψ, converges to a {0, 1}-valued measure. Over Γ this will no longer be true as 1 is split into its ‘limiting’ and ‘achieved’ personae. 
Yet, we expect the above sequence to converge also in this setting and, by Theorem 14, it will converge to a {0°, 1⁻, 1°}-valued measure. Understanding this more fine-grained measure may yield useful information about the zero-one law. Further, it would be interesting to investigate whether the limits for schema mappings introduced by Kolaitis et al. [13] may be seen also as a type-theoretic construction. Finally, we would want to explore the connections with other semantically inspired approaches to finite model theory, such as those recently put forward by Abramsky, Dawar et al. [2,3].

References

1. Abramsky, S.: Domain theory in logical form. Ann. Pure Appl. Logic 51, 1–77 (1991)
2. Abramsky, S., Dawar, A., Wang, P.: The pebbling comonad in finite model theory. In: 32nd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS. pp. 1–12 (2017)
3. Abramsky, S., Shah, N.: Relating Structure and Power: Comonadic semantics for computational resources. In: 27th EACSL Annual Conference on Computer Science Logic, CSL. pp. 2:1–2:17 (2018)
4. Gehrke, M., Grigorieff, S., Pin, J.-E.: Duality and equational theory of regular languages. In: Automata, Languages and Programming, Part II, LNCS, vol. 5126, pp. 246–257. Springer, Berlin (2008)
5. Gehrke, M., Petrişan, D., Reggio, L.: Quantifiers on languages and codensity monads. In: 32nd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS. pp. 1–12 (2017)
6. Gehrke, M., Petrişan, D., Reggio, L.: Quantifiers on languages and codensity monads (2019), extended version. Submitted. Preprint available at arXiv: abs/1702.08841
7. Gehrke, M., Priestley, H.A.: Canonical extensions of double quasioperator algebras: an algebraic perspective on duality for certain algebras with binary operations. J. Pure Appl. Algebra 209(1), 269–290 (2007)
8. Gehrke, M., Priestley, H.A.: Duality for double quasioperator algebras via their canonical extensions.
Studia Logica 86(1), 31–68 (2007)
9. Goldblatt, R.: Varieties of complex algebras. Ann. Pure Appl. Logic 44(3), 173–242 (1989)
10. van Gool, S.J., Steinberg, B.: Pro-aperiodic monoids via saturated models. In: 34th Symposium on Theoretical Aspects of Computer Science, STACS. pp. 39:1–39:14 (2017)
11. Johnstone, P.T.: Stone spaces, Cambridge Studies in Advanced Mathematics, vol. 3. Cambridge University Press (1986), reprint of the 1982 edition
12. Jung, A.: Continuous domain theory in logical form. In: Coecke, B., Ong, L., Panangaden, P. (eds.) Computation, Logic, Games, and Quantum Foundations, Lecture Notes in Computer Science, vol. 7860, pp. 166–177. Springer Verlag (2013)
13. Kolaitis, P.G., Pichler, R., Sallinger, E., Savenkov, V.: Limits of schema mappings. Theory of Computing Systems 62(4), 899–940 (2018)
14. Matz, O., Schweikardt, N.: Expressive power of monadic logics on words, trees, pictures, and graphs. In: Logic and Automata: History and Perspectives. pp. 531–552 (2008)
15. Nešetřil, J., Ossona de Mendez, P.: A model theory approach to structural limits. Commentationes Mathematicae Universitatis Carolinae 53(4), 581–603 (2012)
16. Nešetřil, J., Ossona de Mendez, P.: First-order limits, an analytical perspective. European Journal of Combinatorics 52, 368–388 (2016)
17. Nešetřil, J., Ossona de Mendez, P.: A unified approach to structural limits and limits of graphs with bounded tree-depth (2020), to appear in Memoirs of the American Mathematical Society
18. Pin, J.-E.: Profinite methods in automata theory. In: 26th Symposium on Theoretical Aspects of Computer Science, STACS. pp. 31–50 (2009)
19. Priestley, H.A.: Representation of distributive lattices by means of ordered Stone spaces. Bull. London Math. Soc. 2, 186–190 (1970)
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Correctness of Automatic Differentiation via Diffeologies and Categorical Gluing

Mathieu Huot¹*, Sam Staton¹*, and Matthijs Vákár²*
¹ University of Oxford, UK
² Utrecht University, The Netherlands
* Equal contribution

Abstract. We present semantic correctness proofs of Automatic Differentiation (AD). We consider a forward-mode AD method on a higher order language with algebraic data types, and we characterise it as the unique structure preserving macro given a choice of derivatives for basic operations. We describe a rich semantics for differentiable programming, based on diffeological spaces. We show that it interprets our language, and we phrase what it means for the AD method to be correct with respect to this semantics. We show that our characterisation of AD gives rise to an elegant semantic proof of its correctness based on a gluing construction on diffeological spaces. We explain how this is, in essence, a logical relations argument. Finally, we sketch how the analysis extends to other AD methods by considering a continuation-based method.
1 Introduction

Automatic differentiation (AD), loosely speaking, is the process of taking a program describing a function, and building the derivative of that function by applying the chain rule across the program code. As gradients play a central role in many aspects of machine learning, so too do automatic differentiation systems such as TensorFlow [1] or Stan [6].

AD has a well developed mathematical theory in terms of differential geometry. The aim of this paper is to formalize this connection between differential geometry and the syntactic operations of AD. In this way we achieve two things: (1) a compositional, denotational understanding of differentiable programming and AD; (2) an explanation of the correctness of AD.

[Fig. 1. Overview of semantics/correctness of AD: programs, related by automatic differentiation, are interpreted by denotational semantics in differential geometry, where the corresponding relation is mathematical differentiation.]

This intuitive correspondence (summarized in Fig. 1) is in fact rather complicated. In this paper we focus on resolving the following problem: higher order functions play a key role in programming, and yet they have no counterpart in traditional differential geometry. Moreover, we resolve this problem while retaining the compositionality of denotational semantics.

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 319–338, 2020.

Higher order functions and differentiation. A major application of higher order functions is to support disciplined code reuse. Code reuse is particularly acute in machine learning. For example, a multi-layer neural network might be built of millions of near-identical neurons, as follows.

neuron : (realⁿ ∗ (realⁿ ∗ real)) → real
neuron := λ⟨x, ⟨w, b⟩⟩. ς(w · x + b)

layerₙ : ((τ₁ ∗ P) → τ₂) → ((τ₁ ∗ Pⁿ) → τ₂ⁿ)
layerₙ := λf. λ⟨x, ⟨p₁, …, pₙ⟩⟩.
⟨f⟨x, p₁⟩, …, f⟨x, pₙ⟩⟩

comp : (((τ₁ ∗ P) → τ₂) ∗ ((τ₂ ∗ Q) → τ₃)) → ((τ₁ ∗ (P ∗ Q)) → τ₃)
comp := λ⟨f, g⟩. λ⟨x, ⟨p, q⟩⟩. g⟨f⟨x, p⟩, q⟩

(Here ς(x) := 1/(1 + e⁻ˣ) is the sigmoid function, as illustrated.) We can use these functions to build a network as follows (see also Fig. 2):

comp⟨layerₘ(neuronₖ), comp⟨layerₙ(neuronₘ), neuronₙ⟩⟩ : (realᵏ ∗ P) → real (1)

Here P ≅ realᵖ with p = m(k+1) + n(m+1) + n + 1.

[Fig. 2. The network in (1) with k inputs and two hidden layers.]

This program (1) describes a smooth (infinitely differentiable) function. The goal of automatic differentiation is to find its derivative.

If we β-reduce all the λ's, we end up with a very long function expression just built from the sigmoid function and linear algebra. We can then find a program for calculating its derivative by applying the chain rule. However, automatic differentiation can also be expressed without first β-reducing, in a compositional way, by explaining how higher order functions like (layer) and (comp) propagate derivatives. This paper is a semantic analysis of this compositional approach.

The general idea of denotational semantics is to interpret types as spaces and programs as functions between the spaces. In this paper, we propose to use diffeological spaces and smooth functions [32, 16] to this end. These satisfy the following three desiderata:
– R is a space, and the smooth functions R → R are exactly the functions that are infinitely differentiable;
– The set of smooth functions X → Y between spaces again forms a space, so we can interpret function types.
– The disjoint union of a sequence of spaces again forms a space, so we can interpret variant types and inductive types.

We emphasise that the most standard formulation of differential geometry, using manifolds, does not support spaces of functions.
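As a side illustration (ours, not part of the paper), the programs neuron, layer and comp translate directly into a higher order functional style; the concrete weights below are arbitrary placeholder values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# neuron : takes (inputs, (weights, bias)) to a real, as in the definition above
def neuron(x, wb):
    w, b = wb
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# layer f : runs one copy of f per parameter block, all on the same input
def layer(f):
    return lambda x, ps: [f(x, p) for p in ps]

# comp (f, g) : feeds the output of f (on parameters p) into g (on parameters q)
def comp(f, g):
    return lambda x, pq: g(f(x, pq[0]), pq[1])

# A network with k = 2 inputs and one hidden layer of m = 3 neurons,
# in the shape of expression (1); the numbers are placeholders.
net = comp(layer(neuron), neuron)
params = ([([0.1, -0.2], 0.0), ([0.3, 0.4], 0.1), ([-0.5, 0.2], -0.1)],
          ([0.7, -0.3, 0.5], 0.2))
y = net([1.0, 2.0], params)   # a single real in (0, 1)
```

The higher order combinators are exactly what lets the same neuron code be reused across the whole network.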
Diffeological spaces seem to us the simplest notion of space that satisfies these conditions, but there are other candidates [3, 33]. A diffeological space is in particular a set X equipped with a chosen set of curves C_X ⊆ X^R, and a smooth map f : X → Y must be such that if γ ∈ C_X then γ; f ∈ C_Y. This is reminiscent of the method of logical relations.

From smoothness to automatic derivatives at higher types. Our denotational semantics in diffeological spaces guarantees that all definable functions are smooth. But we need more than just to know that a definable function happens to have a mathematical derivative: we need to be able to find that derivative. In this paper we focus on a simple, forward mode automatic differentiation method, which is a macro translation on syntax (called D⃗ in §2). We are able to show that it is correct, using our denotational semantics.

Here there is one subtle point that is central to our development. Although differential geometry provides established derivatives for first order functions (such as neuron above), there is no canonical notion of derivative for higher order functions (such as layer and comp) in the theory of diffeological spaces (e.g. [7]). We propose a new way to resolve this, by interpreting types as triples (X, X′, S) where, intuitively, X is a space of inhabitants of the type, X′ is a space serving as a chosen bundle of tangents over X, and S ⊆ X^R × X′^R is a binary relation between curves, informally relating curves in X with their tangent curves in X′. This new model gives a denotational semantics for automatic differentiation. In §3 we boil this new approach down to a straightforward and elementary logical relations argument for the correctness of automatic differentiation. The approach is explained in detail in §5.

Related work and context. AD has a long history and has many implementations.
AD was perhaps first phrased in a functional setting in [26], and there are now a number of teams working on AD in the functional setting (e.g. [34, 31, 12]), some providing efficient implementations. Although that work does not involve formal semantics, it is inspired by intuitions from differential geometry and category theory. This paper adds to a very recent body of work on verified automatic differen- tiation. Much of this is concurrent with and independent from the work in this article. In the first order setting, there are recent accounts based on denotational semantics in manifolds [13] and based on synthetic differential geometry [9], as well as work making a categorical abstraction [8] and work connecting oper- ational semantics with denotational semantics [2, 28]. Recently there has also been significant progress at higher types. The work of Brunel et al. gives formal correctness proofs for reverse-mode derivatives on computation graphs [5]. The work of Barthe et al. [4] provides a general discussion of some new syntactic logical relations arguments including one very similar to our syntactic proof of Theorem 1. We understand that the authors of [9] are working on higher types. The differential λ-calculus [11] is related to AD, and explicit connections are made in [22, 23]. One difference is that the differential λ-calculus allows addition of terms at all types, and hence vector space models are suitable, but this appears peculiar with the variant and inductive types that we consider here. Finally we emphasise that we have chosen the neural network (1) as our running example mainly for its simplicity. There are many other examples of AD 322 M. Huot et al. outside the neural networks literature: AD is useful whenever derivatives need to be calculated on high dimensional spaces. This includes optimization problems more generally, where the derivative is passed to a gradient descent method (e.g. [30, 18, 29, 19, 10, 21]). 
Other applications of AD are in advanced integration methods, since derivatives play a role in Hamiltonian Monte Carlo [25, 14] and variational inference [20].

Summary of contributions. We have provided a semantic analysis of automatic differentiation. Our syntactic starting point is a well-known forward-mode AD macro on a typed higher order language (e.g. [31, 34]). We recall this in §2 for function types, and in §4 we extend it to inductive types and variants. The main contributions of this paper are as follows.
– We give a denotational semantics for the language in diffeological spaces, showing that every definable expression is smooth (§3).
– We show correctness of the AD macro by a logical relations argument (Th. 1).
– We give a categorical analysis of this correctness argument with two parts: canonicity of the macro in terms of syntactic categories, and a new notion of glued space that abstracts the logical relation (§5).
– We then use this analysis to state and prove a correctness argument at all first order types (Th. 2).
– We show that our method is not specific to one particular AD macro, by also considering a continuation-based AD method (§6).

2 A simple forward-mode AD translation

Rudiments of differentiation and dual numbers. Recall that the derivative of a function f : R → R, if it exists, is a function ∇f : R → R such that ∇f(x₀) = (df(x)/dx)(x₀) is the gradient of f at x₀.

To find ∇f in a compositional way, two generalizations are reasonable:
– We need both f and ∇f when calculating ∇(f; g) of a composition f; g, using the chain rule, so we are really interested in the pair (f, ∇f) : R → R × R;
– In building f we will need to consider functions of multiple arguments, such as + : R² → R, and these functions should propagate derivatives.
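The pair-propagation idea in the bullets above is easy to prototype. The following sketch (ours, not the paper's) implements dual numbers for addition and multiplication and extracts partial derivatives; the names Dual and partial are our own.

```python
class Dual:
    """A pair (x, dx) behaving like x + dx * eps with eps^2 = 0."""
    def __init__(self, x, dx=0.0):
        self.x, self.dx = x, dx
    def __add__(self, other):
        # sum rule: derivatives add
        return Dual(self.x + other.x, self.dx + other.dx)
    def __mul__(self, other):
        # product rule: (x + x'e)(y + y'e) = xy + (x*y' + x'*y)e
        return Dual(self.x * other.x, self.x * other.dx + self.dx * other.x)

def partial(g, i, args):
    """i-th partial derivative of g at args: seed the i-th tangent with 1."""
    duals = [Dual(x, 1.0 if j == i else 0.0) for j, x in enumerate(args)]
    return g(*duals).dx

g = lambda x1, x2: x1 * x2 + x1 * x1   # dg/dx1 = x2 + 2*x1, dg/dx2 = x1
print(partial(g, 0, [3.0, 5.0]))   # 11.0
print(partial(g, 1, [3.0, 5.0]))   # 3.0
```

Seeding the i-th tangent with 1 and the rest with 0 is precisely the trick by which the pair-valued transform recovers each partial derivative.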
Thus we are more generally interested in transforming a function g : Rⁿ → R into a function h : (R × R)ⁿ → R × R in such a way that for any f₁, …, fₙ : R → R,

⟨f₁, ∇f₁, …, fₙ, ∇fₙ⟩; h = ⟨(f₁, …, fₙ); g, ∇((f₁, …, fₙ); g)⟩. (2)

An intuition for h is often given in terms of dual numbers. The transformed function operates on pairs of numbers, (x, x′), and it is common to think of such a pair as x + x′ε for an 'infinitesimal' ε. But while this is a helpful intuition, the formalization of infinitesimals can be intricate, and the development in this paper is focussed on the elementary formulation in (2).

The reader may also notice that h encodes all the partial derivatives of g. For example, if g : R² → R, then with f₁(x) := x and f₂(x) := x₂, by applying (2) to x₁ we obtain h(x₁, 1, x₂, 0) = (g(x₁, x₂), (∂g(x, x₂)/∂x)(x₁)) and similarly h(x₁, 0, x₂, 1) = (g(x₁, x₂), (∂g(x₁, x)/∂x)(x₂)). And conversely, if g is differentiable in each argument, then a unique h satisfying (2) can be found by taking linear combinations of partial derivatives:

h(x₁, x₁′, x₂, x₂′) = (g(x₁, x₂), x₁′ · (∂g(x, x₂)/∂x)(x₁) + x₂′ · (∂g(x₁, x)/∂x)(x₂)).

In summary, the idea of differentiation with dual numbers is to transform a differentiable function g : Rⁿ → R to a function h : R²ⁿ → R² which captures g and all its partial derivatives. We packaged this up in (2) as a sort-of invariant which is useful for building derivatives of compound functions Rⁿ → R in a compositional way. The idea of forward mode automatic differentiation is to perform this transformation at the source code level.

A simple language of smooth functions. We consider a standard higher order typed language with a first order type real of real numbers. The types (τ, σ) and terms (t, s) are as follows.
  | (τ_1 ∗ … ∗ τ_n)   finite product
  | real              real numbers
  | τ → σ             function

t, s, r ::= terms
  | x                                               variable
  | c | t + s | t ∗ s | ς(t)                        operations/constants
  | ⟨t_1, …, t_n⟩ | case t of ⟨x_1, …, x_n⟩ → s     tuples/pattern matching
  | λx.t | t s                                      function abstraction/application

The typing rules are in Figure 3. We have included a minimal set of operations for the sake of illustration, but it is not difficult to add further operations. We add some simple syntactic sugar: t − u ≝ t + (−1) ∗ u. We intend ς to stand for the sigmoid function, ς(x) ≝ 1/(1 + e^{−x}). We further include syntactic sugar let x = t in s for (λx.s) t, and λ⟨x_1, …, x_n⟩.t for λx. case x of ⟨x_1, …, x_n⟩ → t.

Syntactic automatic differentiation: a functorial macro. The aim of forward mode AD is to find the dual numbers representation of a function by syntactic manipulations. For our simple language, we implement this as the following inductively defined macro D→ on both types and terms (see also [34, 31]):

D→(real) ≝ (real ∗ real)
D→(τ → σ) ≝ D→(τ) → D→(σ)
D→((τ_1 ∗ ⋯ ∗ τ_n)) ≝ (D→(τ_1) ∗ ⋯ ∗ D→(τ_n))

The typing rules (Fig. 3) are:

Γ ⊢ c : real  (c ∈ R)
if Γ ⊢ t : real and Γ ⊢ s : real, then Γ ⊢ t + s : real and Γ ⊢ t ∗ s : real
if Γ ⊢ t : real, then Γ ⊢ ς(t) : real
if Γ ⊢ t_i : τ_i for each 1 ≤ i ≤ n, then Γ ⊢ ⟨t_1, …, t_n⟩ : (τ_1 ∗ … ∗ τ_n)
if Γ ⊢ t : (σ_1 ∗ … ∗ σ_n) and Γ, x_1 : σ_1, …, x_n : σ_n ⊢ s : τ, then Γ ⊢ case t of ⟨x_1, …, x_n⟩ → s : τ
Γ ⊢ x : τ  ((x : τ) ∈ Γ)
if Γ, x : τ ⊢ t : σ, then Γ ⊢ λx : τ.t : τ → σ
if Γ ⊢ t : σ → τ and Γ ⊢ s : σ, then Γ ⊢ t s : τ

Fig. 3. Typing rules for the simple language.
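The type-level clauses of D→ can be transcribed directly as a recursive function. The following Python sketch uses a hypothetical encoding of types as nested tuples (our own, purely illustrative):

```python
# Types are encoded as: "real", ("prod", [taus]), ("fun", tau, sigma).
def D(tau):
    if tau == "real":                    # D(real) = (real * real)
        return ("prod", ["real", "real"])
    tag = tau[0]
    if tag == "prod":                    # D acts componentwise on products
        return ("prod", [D(t) for t in tau[1]])
    if tag == "fun":                     # D(tau -> sigma) = D(tau) -> D(sigma)
        return ("fun", D(tau[1]), D(tau[2]))
    raise ValueError(f"unknown type: {tau!r}")

# D(real -> real) = (real * real) -> (real * real)
print(D(("fun", "real", "real")))
```

The term-level clauses below follow the same structurally recursive pattern.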
D→(x) ≝ x
D→(c) ≝ ⟨c, 0⟩
D→(t + s) ≝ case D→(t) of ⟨x, x′⟩ → case D→(s) of ⟨y, y′⟩ → ⟨x + y, x′ + y′⟩
D→(t ∗ s) ≝ case D→(t) of ⟨x, x′⟩ → case D→(s) of ⟨y, y′⟩ → ⟨x ∗ y, x′ ∗ y + x ∗ y′⟩
D→(ς(t)) ≝ case D→(t) of ⟨x, x′⟩ → let y = ς(x) in ⟨y, x′ ∗ y ∗ (1 − y)⟩
D→(λx.t) ≝ λx.D→(t)    D→(t s) ≝ D→(t) D→(s)    D→(⟨t_1, …, t_n⟩) ≝ ⟨D→(t_1), …, D→(t_n)⟩
D→(case t of ⟨x_1, …, x_n⟩ → s) ≝ case D→(t) of ⟨x_1, …, x_n⟩ → D→(s)

We extend D→ to contexts: D→({x_1 : τ_1, …, x_n : τ_n}) ≝ {x_1 : D→(τ_1), …, x_n : D→(τ_n)}. This turns D→ into a well-typed, functorial macro in the following sense.

Lemma 1 (Functorial macro). If Γ ⊢ t : τ then D→(Γ) ⊢ D→(t) : D→(τ). If Γ, x : σ ⊢ t : τ and Γ ⊢ s : σ then D→(t[s/x]) = D→(t)[D→(s)/x].

Example 1 (Inner products). Let us write τⁿ for the n-fold product (τ ∗ … ∗ τ). Then, given Γ ⊢ t, s : realⁿ, we can define their inner product

t · s ≝ case t of ⟨z_1, …, z_n⟩ → case s of ⟨y_1, …, y_n⟩ → z_1 ∗ y_1 + ⋯ + z_n ∗ y_n : real

To illustrate the calculation of D→, let us expand (and β-reduce) D→(t · s) for n = 2:

case D→(t) of ⟨z_1, z_2⟩ → case D→(s) of ⟨y_1, y_2⟩ →
case z_1 of ⟨z_{1,1}, z_{1,2}⟩ → case y_1 of ⟨y_{1,1}, y_{1,2}⟩ →
case z_2 of ⟨z_{2,1}, z_{2,2}⟩ → case y_2 of ⟨y_{2,1}, y_{2,2}⟩ →
⟨z_{1,1} ∗ y_{1,1} + z_{2,1} ∗ y_{2,1}, z_{1,1} ∗ y_{1,2} + z_{1,2} ∗ y_{1,1} + z_{2,1} ∗ y_{2,2} + z_{2,2} ∗ y_{2,1}⟩

Example 2 (Neural networks). In our introduction (1), we provided a program in our language to build a neural network out of expressions neuron, layer, comp; this program makes use of the inner product of Ex. 1. We can similarly calculate D→ of such deep neural nets by mechanically applying the macro.

3 Semantics of differentiation

Consider for a moment the first order fragment of the language in §2, with only one type, real, and no λ's or pairs. This has a simple semantics in the category of cartesian spaces and smooth maps.
Indeed, a term x_1, …, x_n : real ⊢ t : real has a natural reading as a function ⟦t⟧ : Rⁿ → R, by interpreting our operation symbols by the well-known operations on R with the corresponding name. In fact, the functions that are definable in this first order fragment are smooth, which means that they are continuous, differentiable, and their derivatives are continuous, differentiable, and so on. Let us write CartSp for this category of cartesian spaces (Rⁿ for some n) and smooth functions.

The category CartSp has cartesian products, and so we can also interpret product types, tupling and pattern matching, giving us a useful syntax for constructing functions into and out of products of R. For example, the interpretation of (neuron_n) in (1) becomes the composite

Rⁿ × Rⁿ × R → R × R → R → R

given by the inner product paired with the identity (·_n × id_R), followed by addition (+), followed by the sigmoid ς, where ·_n, + and ς are the usual inner product, addition and sigmoid function on R, respectively.

Inside this category, we can straightforwardly study the first order language without λ's, and automatic differentiation. In fact, we can prove the following by plain induction on the syntax:

The interpretation of the (syntactic) forward AD D→(t) of a first-order term t equals the usual (semantic) derivative of the interpretation of t as a smooth function.

However, as is well known, the category CartSp does not support function spaces. To see this, notice that we have polynomial terms x_1, …, x_d : real ⊢ λy. Σ_{n=1}^{d} x_n ∗ yⁿ : real → real for each d, and so if we could interpret (real → real) as a Euclidean space Rᵖ then, by interpreting these polynomial expressions, we would be able to find continuous injections Rᵈ → Rᵖ for every d, which is topologically impossible for any p, for example as a consequence of the Borsuk-Ulam theorem (see [15], Appx. A).
This means that we cannot interpret the functions (layer) and (comp) from (1) in CartSp, as they are higher order functions, even though they are very useful and innocent building blocks for differential programming! Clearly, we could define neural nets such as (1) directly as smooth functions without any higher order subcomponents, though that would quickly become cumbersome for deep networks. A problematic consequence of the lack of a semantics for higher order differential programs is that we have no obvious way of establishing compositional semantic correctness of D→ for the given implementation of (1).

Diffeological spaces. This motivates us to turn to a more general notion of differential geometry for our semantics, based on diffeological spaces [16]. The key idea will be that a higher order function is called smooth if it sends smooth functions to smooth functions, meaning that we can never use it to build first order functions that are not smooth. For example, (comp) in (1) has this property.

Definition 1. A diffeological space (X, P_X) consists of a set X together with, for each n and each open subset U of Rⁿ, a set P_X^U ⊆ [U → X] of functions, called plots, such that

– all constant functions are plots;
– if f : V → U is a smooth function and p ∈ P_X^U, then f; p ∈ P_X^V;
– if (p_i ∈ P_X^{U_i})_{i∈I} is a compatible family of plots (x ∈ U_i ∩ U_j ⇒ p_i(x) = p_j(x)) and (U_i)_{i∈I} covers U, then the gluing p : U → X, p(x) ≝ p_i(x) for x ∈ U_i, is a plot.

We call a function f : X → Y between diffeological spaces smooth if, for all plots p ∈ P_X^U, we have p; f ∈ P_Y^U. We write Diff(X, Y) for the set of smooth maps from X to Y. Smooth functions compose, and so we have a category Diff of diffeological spaces and smooth functions.

A diffeological space is thus a set equipped with structure. Many constructions of sets carry over straightforwardly to diffeological spaces.

Example 3 (Cartesian diffeologies).
Each open subset U of Rⁿ can be given the structure of a diffeological space by taking all the smooth functions V → U as P_U^V. It is easily seen that smooth functions V → U in the traditional sense coincide with smooth functions in the sense of diffeological spaces. Thus diffeological spaces have a profound relationship with ordinary calculus. In categorical terms, this gives a full embedding of CartSp in Diff.

Example 4 (Product diffeologies). Given a family (X_i)_{i∈I} of diffeological spaces, we can equip the product Π_{i∈I} X_i of sets with the product diffeology, in which U-plots are precisely the functions of the form (p_i)_{i∈I} for p_i ∈ P_{X_i}^U. This gives us the categorical product in Diff.

Example 5 (Functional diffeology). We can equip the set Diff(X, Y) of smooth functions between diffeological spaces with the functional diffeology, in which U-plots consist of functions f : U → Diff(X, Y) such that (u, x) ↦ f(u)(x) is an element of Diff(U × X, Y). This specifies the categorical function object in Diff.

Semantics and correctness of AD. We can now give a denotational semantics to our language from §2. We interpret each type τ as a set ⟦τ⟧ equipped with the relevant diffeology, by induction on the structure of types:

⟦real⟧ ≝ R    ⟦(τ_1 ∗ … ∗ τ_n)⟧ ≝ Π_{i=1}^{n} ⟦τ_i⟧    ⟦τ → σ⟧ ≝ Diff(⟦τ⟧, ⟦σ⟧)

A context Γ = (x_1 : τ_1, …, x_n : τ_n) is interpreted as a diffeological space ⟦Γ⟧ ≝ Π_{i=1}^{n} ⟦τ_i⟧. Now well typed terms Γ ⊢ t : τ are interpreted as smooth functions ⟦t⟧ : ⟦Γ⟧ → ⟦τ⟧, giving a meaning for t for every valuation of the context. This is routinely defined by induction on the structure of typing derivations. Constants c : real are interpreted as constant functions, and the first order operations (+, ∗, ς) are interpreted by composing with the corresponding functions, which are smooth. For example, ⟦ς(t)⟧(ρ) ≝ ς(⟦t⟧(ρ)), where ρ ∈ ⟦Γ⟧. Variables are interpreted as ⟦x_i⟧(ρ) ≝ ρ_i.
The remaining constructs are interpreted as follows, and it is straightforward to show that smoothness is preserved.

⟦⟨t_1, …, t_n⟩⟧(ρ) ≝ (⟦t_1⟧(ρ), …, ⟦t_n⟧(ρ))        ⟦λx : τ.t⟧(ρ)(a) ≝ ⟦t⟧(ρ, a)  (a ∈ ⟦τ⟧)
⟦case t of … → s⟧(ρ) ≝ ⟦s⟧(ρ, ⟦t⟧(ρ))              ⟦t s⟧(ρ) ≝ ⟦t⟧(ρ)(⟦s⟧(ρ))

Notice that a term x_1 : real, …, x_n : real ⊢ t : real is interpreted as a smooth function ⟦t⟧ : Rⁿ → R, even if t involves higher order functions (like (1)). Moreover, the macro differentiation D→(t) is interpreted as a function ⟦D→(t)⟧ : (R × R)ⁿ → (R × R). This enables us to state a limited version of our main correctness theorem:

Theorem 1 (Semantic correctness of D→ (limited)). For any term x_1 : real, …, x_n : real ⊢ t : real, the function ⟦D→(t)⟧ is the dual numbers representation (2) of ⟦t⟧. In detail: for any smooth functions f_1, …, f_n : R → R,

(f_1, ∇f_1, …, f_n, ∇f_n); ⟦D→(t)⟧ = ((f_1, …, f_n); ⟦t⟧, ∇((f_1, …, f_n); ⟦t⟧)).

(For instance, if n = 2, then ⟦D→(t)⟧(x_1, 1, x_2, 0) = (⟦t⟧(x_1, x_2), (∂⟦t⟧(x, x_2)/∂x)(x_1)).)

Proof. We prove this by logical relations. Although the following proof is elementary, we found it by using the categorical methods in §5. For each type τ, we define a binary relation S_τ between curves in ⟦τ⟧ and curves in ⟦D→(τ)⟧, i.e. S_τ ⊆ P_{⟦τ⟧}^R × P_{⟦D→(τ)⟧}^R, by induction on τ:

– S_real ≝ {(f, (f, ∇f)) | f : R → R smooth};
– S_{(τ∗σ)} ≝ {((f_1, g_1), (f_2, g_2)) | (f_1, f_2) ∈ S_τ, (g_1, g_2) ∈ S_σ};
– S_{τ→σ} ≝ {(f_1, f_2) | ∀(g_1, g_2) ∈ S_τ. (x ↦ f_1(x)(g_1(x)), x ↦ f_2(x)(g_2(x))) ∈ S_σ}.

Then, we establish the following ‘fundamental lemma’:

If x_1 : τ_1, …, x_n : τ_n ⊢ t : σ and, for all 1 ≤ i ≤ n, y_1, …, y_m : real ⊢ s_i : τ_i is such that ((f_1, …, f_m); ⟦s_i⟧, (f_1, ∇f_1, …, f_m, ∇f_m); ⟦D→(s_i)⟧) ∈ S_{τ_i} for all smooth f_j : R → R, then

((f_1, …, f_m); ⟦t[s_1/x_1, …, s_n/x_n]⟧, (f_1, ∇f_1, …, f_m, ∇f_m); ⟦D→(t[s_1/x_1, …, s_n/x_n])⟧)

is in S_σ for all smooth f_j : R → R.
This is proved routinely by induction on the typing derivation of t. The case for ∗ relies on the precise definition of D→(t ∗ s), and similarly for + and ς. We conclude the theorem from the fundamental lemma by considering the case where τ_i = σ = real, m = n and s_i = y_i. □

4 Extending the language: variant and inductive types

In this section, we show that the definition of forward AD and the semantics generalize if we extend the language of §2 with variants and inductive types. As an example of inductive types, we consider lists. This specific choice is only for expository purposes, and the whole development works at the level of generality of arbitrary algebraic data types generated as initial algebras of (polynomial) type constructors formed by finite products and variants.

Similarly, our choice of operations is for expository purposes. More generally, assume given a family of operations (Op_n)_{n∈N} indexed by their arity n. Further assume that each op ∈ Op_n has type realⁿ → real. We then ask for a certain closure of these operations under differentiation; that is, we define

D→(op(t_1, …, t_n)) ≝ case D→(t_1) of ⟨x_1, x_1′⟩ → … → case D→(t_n) of ⟨x_n, x_n′⟩ →
    ⟨op(x_1, …, x_n), Σ_{i=1}^{n} x_i′ ∗ ∂_i op(x_1, …, x_n)⟩

where ∂_i op(x_1, …, x_n) is some chosen term in the language, involving free variables from x_1, …, x_n, which we think of as implementing the partial derivative of op with respect to its i-th argument. For constructing the semantics, every op must be interpreted by some smooth function, and, to establish correctness, the semantics of ∂_i op(x_1, …, x_n) must be the semantic i-th partial derivative of the semantics of op(x_1, …, x_n).

Language. We additionally consider the following types and terms:

τ, σ, ρ ::= types
  | list(τ)                    list
  | {ℓ_1 τ_1 | … | ℓ_n τ_n}    variant

t, s, r ::= terms
  | ℓ τ. t                                               variant constructor
  | [ ] | t :: s                                         empty list and cons
  | case t of {ℓ_1 x_1 → s_1 | ⋯ | ℓ_n x_n → s_n}        pattern matching: variants
  | fold (x_1, x_2).t over s from r                      list fold

We extend the type system according to:

if Γ ⊢ t : τ_i and (ℓ_i τ_i) ∈ τ, then Γ ⊢ ℓ_i τ. t : τ
Γ ⊢ [ ] : list(τ)
if Γ ⊢ t : τ and Γ ⊢ s : list(τ), then Γ ⊢ t :: s : list(τ)
if Γ ⊢ t : {ℓ_1 τ_1 | … | ℓ_n τ_n} and, for each 1 ≤ i ≤ n, Γ, x_i : τ_i ⊢ s_i : τ, then
    Γ ⊢ case t of {ℓ_1 x_1 → s_1 | ⋯ | ℓ_n x_n → s_n} : τ
if Γ ⊢ s : list(τ), Γ ⊢ r : σ and Γ, x_1 : τ, x_2 : σ ⊢ t : σ, then Γ ⊢ fold (x_1, x_2).t over s from r : σ

We can then extend D→ to our new types and terms by

D→({ℓ_1 τ_1 | … | ℓ_n τ_n}) ≝ {ℓ_1 D→(τ_1) | … | ℓ_n D→(τ_n)}    D→(list(τ)) ≝ list(D→(τ))
D→(ℓ τ. t) ≝ ℓ D→(τ). D→(t)    D→([ ]) ≝ [ ]    D→(t :: s) ≝ D→(t) :: D→(s)
D→(case t of {ℓ_1 x_1 → s_1 | ⋯ | ℓ_n x_n → s_n}) ≝ case D→(t) of {ℓ_1 x_1 → D→(s_1) | ⋯ | ℓ_n x_n → D→(s_n)}
D→(fold (x_1, x_2).t over s from r) ≝ fold (x_1, x_2).D→(t) over D→(s) from D→(r)

To demonstrate the practical use of expressive type systems for differential programming, we consider the following two examples.

Example 6 (Lists of inputs for neural nets). Usually, we run a neural network on a large data set, the size of which might be determined at runtime. To evaluate a neural network on multiple inputs, in practice, one often sums the outcomes. This can be coded in our extended language as follows. Suppose that we have a network f : (realⁿ ∗ P) → real that operates on single input vectors. We can construct one that operates on lists of inputs as follows:

g ≝ λ⟨l, w⟩. fold (x_1, x_2). f⟨x_1, w⟩ + x_2 over l from 0 : (list(realⁿ) ∗ P) → real

Example 7 (Missing data). In practically every application of statistics and machine learning, we face the problem of missing data: for some observations, only partial information is available. In an expressive typed programming language like we consider, we can model missing data conveniently using the data type maybe(τ) = {Nothing () | Just τ}.
In the context of a neural network, one might use it as follows. First, define some helper functions:

fromMaybe ≝ λx.λm. case m of {Nothing _ → x | Just x′ → x′}
fromMaybe_n ≝ λ⟨x_1, …, x_n⟩.λ⟨m_1, …, m_n⟩. ⟨fromMaybe x_1 m_1, …, fromMaybe x_n m_n⟩
    : realⁿ → (maybe(real))ⁿ → realⁿ
map ≝ λf.λl. fold (x_1, x_2). f x_1 :: x_2 over l from [ ] : (τ → σ) → list(τ) → list(σ)

Given a neural network f : (list(realⁿ) ∗ P) → real, we can build a new one that operates on a data set for which some covariates (features) are missing, by passing in default values to replace the missing covariates:

λ⟨l, m, w⟩. f⟨map (fromMaybe_n m) l, w⟩ : (list((maybe(real))ⁿ) ∗ (realⁿ ∗ P)) → real

Then, given a data set l with missing covariates, we can perform automatic differentiation on this network to optimize, simultaneously, the ordinary network parameters w and the default values for missing covariates m.

Semantics. In §3 we gave a denotational semantics for the simple language in diffeological spaces. This extends to the language in this section, as follows. As before, each type τ is interpreted as a diffeological space, which is a set equipped with a family of plots:

– A variant type {ℓ_1 τ_1 | … | ℓ_n τ_n} is interpreted as the disjoint union ⟦{ℓ_1 τ_1 | … | ℓ_n τ_n}⟧ ≝ ⨿_{i=1}^{n} ⟦τ_i⟧; its U-plots are the functions U → ⨿_{i=1}^{n} ⟦τ_i⟧ for which U decomposes as a disjoint union of opens U = ⨿_{j} U_j such that the restriction to each U_j is given by the j-th injection composed with some f_j ∈ P_{⟦τ_j⟧}^{U_j}.
– A list type list(τ) is interpreted as the set of lists, ⟦list(τ)⟧ ≝ ⨿_{i∈N} ⟦τ⟧^i; its U-plots are defined similarly: a plot decomposes U as a countable disjoint union of opens, on each of which it is given by a plot for some ⟦τ⟧^i.

The constructors and destructors for variants and lists are interpreted as in the usual set theoretic semantics. It is routine to show inductively that these interpretations are smooth. Thus every term Γ ⊢ t : τ in the extended language is interpreted as a smooth function ⟦t⟧ : ⟦Γ⟧ → ⟦τ⟧ between diffeological spaces.
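Examples 6 and 7 can be sketched concretely. The Python illustration below is our own encoding (Python lists stand in for list(τ), and None for Nothing); a single-input network is folded over a data set whose missing covariates are replaced by default values:

```python
from functools import reduce

def from_maybe(default, maybe_val):
    # fromMaybe: use the (learnable) default when the covariate is missing
    return default if maybe_val is None else maybe_val

def fill(m, row):
    # fromMaybe_n, applied pointwise to one observation
    return [from_maybe(mi, xi) for mi, xi in zip(m, row)]

def g(l, w, f):
    # Example 6: fold (x1, x2). f<x1, w> + x2  over l  from 0
    return reduce(lambda acc, x: f(x, w) + acc, l, 0.0)

f = lambda x, w: sum(xi * wi for xi, wi in zip(x, w))   # toy linear "network"
data = [[1.0, None], [None, 4.0]]                       # observations with gaps
m = [9.0, 0.5]                                          # defaults for missing covariates
w = [1.0, 2.0]
print(g([fill(m, row) for row in data], w, f))          # 19.0
```

Running forward-mode AD over both w and m in this sketch would optimize network parameters and imputation defaults simultaneously, as described in the text.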
(In this section we focused on a language with lists, but other inductive types are easily interpreted in the category of diffeological spaces in much the same way; the categorically minded reader may regard this as a consequence of Diff being a concrete Grothendieck quasitopos, e.g. [3].)

5 Categorical analysis of forward AD and its correctness

This section has three parts. First, we give a categorical account of the functoriality of AD (Ex. 8). Then we introduce our gluing construction, and relate it to the correctness of AD (dgm. (3)). Finally, we state and prove a correctness theorem for all first order types by considering a category of manifolds (Th. 2).

Syntactic categories. Our language induces a syntactic category as follows.

Definition 2. Let Syn be the category whose objects are types, and where a morphism τ → σ is a term in context x : τ ⊢ t : σ modulo the βη-laws (Fig. 4). Composition is by substitution.

For simplicity, we do not impose arithmetic identities such as x + y = y + x in Syn. As is standard, this category has the following universal property.

Lemma 2 (e.g. [27]). For every bicartesian closed category C with list objects, and every object F(real) ∈ C and morphisms F(c) ∈ C(1, F(real)), F(+), F(∗) ∈ C(F(real) × F(real), F(real)), F(ς) ∈ C(F(real), F(real)), there is a unique functor F : Syn → C respecting this interpretation and preserving the bicartesian closed structure as well as list objects.

Proof (notes). The functor F : Syn → C is a canonical denotational semantics for the language, interpreting types as objects of C and terms as morphisms. For instance, F(τ → σ) ≝ (F(τ) → F(σ)), the function space in the category C, and F(t s) ≝ the composite ⟨F(t), F(s)⟩; eval. When C = Diff, the denotational semantics of the language in diffeological spaces (§3, 4) can be understood as the unique structure preserving functor ⟦−⟧ : Syn → Diff satisfying ⟦real⟧ = R, ⟦ς⟧ = ς, and so on. □
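Lemma 2's "unique structure preserving functor" is, concretely, a compositional interpreter. The following Python sketch (our own hypothetical AST encoding; only the set-theoretic part of the semantics, with no smoothness structure) interprets a fragment of the language by recursion on terms:

```python
import math

# Terms: ("var", x), ("const", c), ("add"/"mul", t, s),
#        ("sig", t), ("lam", x, body), ("app", t, s)
def interp(t, env):
    tag = t[0]
    if tag == "var":   return env[t[1]]
    if tag == "const": return t[1]
    if tag == "add":   return interp(t[1], env) + interp(t[2], env)
    if tag == "mul":   return interp(t[1], env) * interp(t[2], env)
    if tag == "sig":   return 1.0 / (1.0 + math.exp(-interp(t[1], env)))
    if tag == "lam":   return lambda a: interp(t[2], {**env, t[1]: a})
    if tag == "app":   return interp(t[1], env)(interp(t[2], env))
    raise ValueError(f"unknown term: {tag}")

# (λx. x * x + 1) 3
term = ("app",
        ("lam", "x", ("add", ("mul", ("var", "x"), ("var", "x")),
                             ("const", 1.0))),
        ("const", 3.0))
print(interp(term, {}))   # 10.0
```

Because the clauses are structurally recursive and fixed by the interpretation of real and the operations, the interpreter is determined by its action on the generators, mirroring the uniqueness claim of the lemma.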
Example 8 (Canonical definition of forward AD). The forward AD macro D→ (§2, 4) arises as a canonical cartesian closed functor on Syn. Consider the unique cartesian closed functor F : Syn → Syn such that F(real) = real ∗ real, F(c) = D→(c), F(ς) = D→(ς(x)), and

F(+) = z : F(real) ∗ F(real) ⊢ case z of ⟨x, y⟩ → D→(x + y) : F(real)
F(∗) = z : F(real) ∗ F(real) ⊢ case z of ⟨x, y⟩ → D→(x ∗ y) : F(real)

Then for any type τ, F(τ) = D→(τ), and for any term x : τ ⊢ t : σ, F(t) = D→(t) as morphisms F(τ) → F(σ) in the syntactic category.

Categorical gluing and logical relations. Gluing is a method for building new categorical models which has been used for many purposes, including logical relations and realizability [24]. Our logical relations argument in the proof of Th. 1 can be understood in this setting. In this subsection, for the categorically minded, we explain this, and in doing so we quickly recover a correctness result for the more general language in §4 and for arbitrary first order types.

We define a category Gl whose objects are triples (X, X′, S), where X and X′ are diffeological spaces and S ⊆ P_X^U × P_{X′}^U is a relation between their U-plots. A morphism (X, X′, S) → (Y, Y′, T) is a pair of smooth functions

case ⟨t_1, …, t_n⟩ of ⟨x_1, …, x_n⟩ → s = s[t_1/x_1, …, t_n/x_n]
s[t/y] =_{#x_1,…,x_n} case t of ⟨x_1, …, x_n⟩ → s[⟨x_1, …, x_n⟩/y]
(λx.t) s = t[s/x]
t =_{#x} λx.t x
case ℓ_i τ. t of {ℓ_1 x_1 → s_1 | ⋯ | ℓ_n x_n → s_n} = s_i[t/x_i]
s[t/y] =_{#x_1,…,x_n} case t of {ℓ_1 x_1 → s[ℓ_1 τ. x_1/y] | ⋯ | ℓ_n x_n → s[ℓ_n τ. x_n/y]}
fold (x_1, x_2).t over [ ] from r = r
fold (x_1, x_2).t over s_1 :: s_2 from r = t[s_1/x_1, (fold (x_1, x_2).t over s_2 from r)/x_2]
if u = s[[ ]/y] and r[s/x_2] =_{#x_1,x_2} s[x_1 :: y/y], then s[t/y] = fold (x_1, x_2).r over t from u

We write =_{#x_1,…,x_n} to indicate that the variables x_1, …, x_n are not free in the left hand side.

Fig. 4. Standard βη-laws (e.g. [27]) for products, functions, variants and lists.
f : X → Y and f′ : X′ → Y′ such that if (g, g′) ∈ S then (g; f, g′; f′) ∈ T. The idea is that this is a semantic domain where we can simultaneously interpret the language and its automatic derivatives.

Proposition 1. The category Gl is bicartesian closed, has list objects, and the projection functor proj : Gl → Diff × Diff preserves this structure.

Proof (notes). The category Gl is a full subcategory of the comma category id_Set ↓ Diff(U, −) × Diff(U, −). The result thus follows by the general theory of categorical gluing (e.g. [17, Lemma 15]). □

We give a semantics ⦅−⦆ = (⦅−⦆_0, ⦅−⦆_1, S_−) for the language in Gl, interpreting types τ as objects (⦅τ⦆_0, ⦅τ⦆_1, S_τ), and terms as morphisms. We let ⦅real⦆_0 ≝ R and ⦅real⦆_1 ≝ R², with the relation S_real ≝ {(f, (f, ∇f)) | f : R → R smooth}. We interpret the constants c as pairs ⦅c⦆_0 ≝ c and ⦅c⦆_1 ≝ (c, 0), and we interpret +, ∗, ς in the standard way (meaning, like ⟦−⟧) in ⦅−⦆_0, but according to the derivatives in ⦅−⦆_1; for instance, ⦅∗⦆_1 : R² × R² → R² is

⦅∗⦆_1((x, x′), (y, y′)) ≝ (x y, x y′ + x′ y).

At this point one checks that these interpretations are indeed morphisms in Gl. This amounts to checking that these interpretations are dual numbers representations in the sense of (2). The remaining constructions of the language are interpreted using the categorical structure of Gl, following Lem. 2. Notice that the diagram below commutes. One can check this by hand, or note that it follows from the initiality of Syn (Lem. 2): all the functors preserve all the structure.

Syn ----(id, D→(−))----> Syn × Syn
 |                           |
⦅−⦆                     ⟦−⟧ × ⟦−⟧           (3)
 v                           v
 Gl ------proj------->  Diff × Diff

We thus arrive at a restatement of the correctness theorem (Th. 1), which holds even for the extended language with variants and lists: for any x_1, …, x_n : real ⊢ t : real, the interpretations (⟦t⟧, ⟦D→(t)⟧) are in the image of the projection Gl → Diff × Diff, and hence ⟦D→(t)⟧ is a dual numbers encoding of ⟦t⟧.
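This restated correctness statement can be sanity-checked numerically. In the Python sketch below (our own illustration; the macro's translation of t = x_1 ∗ x_2 + ς(x_1) is written out by hand), the macro's derivative agrees with a central finite difference:

```python
import math

def t(x1, x2):
    return x1 * x2 + 1.0 / (1.0 + math.exp(-x1))

def D_t(x1, d1, x2, d2):
    # Hand-expanded macro translation of t = x1 * x2 + sigmoid(x1):
    v_mul, t_mul = x1 * x2, x1 * d2 + d1 * x2        # D(t * s)
    y = 1.0 / (1.0 + math.exp(-x1))                  # D(sig(t)): <y, d1*y*(1-y)>
    return v_mul + y, t_mul + d1 * y * (1.0 - y)     # D(t + s)

x1, x2, eps = 0.7, -1.3, 1e-6
value, deriv = D_t(x1, 1.0, x2, 0.0)                 # seed (1, 0): partial in x1
approx = (t(x1 + eps, x2) - t(x1 - eps, x2)) / (2 * eps)
assert value == t(x1, x2)
assert abs(deriv - approx) < 1e-6
```

Of course, a finite-difference check at a point is no substitute for the logical-relations proof; it merely illustrates what the dual numbers encoding asserts.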
Correctness at all first order types, via manifolds. We now generalize Theorem 1 to hold at all first order types, not just the reals. To do this, we need to define the derivative of a smooth map between the interpretations of first order types. We do this by recalling the well known theory of manifolds and tangent bundles.

For our purposes, a smooth manifold M is a second-countable Hausdorff topological space together with a smooth atlas: an open cover U together with homeomorphisms φ_U : U → R^{n(U)} for U ∈ U (called charts), such that φ_U^{-1}; φ_V is smooth on its domain of definition for all U, V ∈ U. A function f : M → N between manifolds is smooth if φ_U^{-1}; f; ψ_V is smooth for all charts φ_U and ψ_V of M and N, respectively. Let us write Man for this category. Our manifolds are slightly unusual because different charts in an atlas may have different finite dimension n(U). Thus we consider manifolds with dimensions that are potentially unbounded, albeit locally finite. This does not affect the theory of differential geometry as far as we need it here.

Each open subset of Rⁿ can be regarded as a manifold. This lets us regard the category of manifolds Man as a full subcategory of the category of diffeological spaces. We consider a manifold (X, {φ_U}_U) as a diffeological space with the same carrier set X and where the plots P_X^U are the smooth functions in Man(U, X). A function X → Y is smooth in the sense of manifolds if and only if it is smooth in the sense of diffeological spaces [16]. For the categorically minded reader, this means that we have a full embedding of Man into Diff. Moreover, the natural interpretation of the first order fragment of our language in Man coincides with that in Diff. That is, the embedding of Man into Diff preserves finite products and countable coproducts (hence initial algebras of polynomial endofunctors).

Proposition 2. Suppose that a type τ is first order, i.e.
it is just built from reals, products, variants, and lists (or, again, arbitrary inductive types), and not function types. Then the diffeological space ⟦τ⟧ is a manifold.

Proof (notes). This is proved by induction on the structure of types. In fact, one may show that every such ⟦τ⟧ is isomorphic to a manifold of the form ⨿_{i=1}^{n} R^{d_i}, where the bound n is either finite or ∞, but this isomorphism is typically not an identity function. □

The constraint to first order types is necessary because, e.g., the space ⟦real → real⟧ is not a manifold, because of a Borsuk-Ulam argument (see [15], Appx. A).

We recall that the derivative of any morphism f : M → N of manifolds is given as follows. For each point x in a manifold M, define the tangent space T_x M to be the set {γ ∈ Man(R, M) | γ(0) = x}/∼ of equivalence classes [γ] of smooth curves γ in M based at x, where we identify γ_1 ∼ γ_2 iff ∇(γ_1; f)(0) = ∇(γ_2; f)(0) for all smooth f : M → R. The tangent bundle of M is the set T(M) ≝ ⨿_{x∈M} T_x(M). The charts of M equip T(M) with a canonical manifold structure. Then for smooth f : M → N, the derivative T(f) : T(M) → T(N) is defined as T(f)(x, [γ]) ≝ (f(x), [γ; f]). All told, the derivative is a functor T : Man → Man.

As is standard, we can understand the tangent bundle of a composite space in terms of that of its parts.

Lemma 3. There are canonical isomorphisms T(⨿_{i=1}^{∞} M_i) ≅ ⨿_{i=1}^{∞} T(M_i) and T(M_1 × … × M_n) ≅ T(M_1) × … × T(M_n).

We define a canonical isomorphism φ_τ^{D→T} : ⟦D→(τ)⟧ → T(⟦τ⟧) for every first order type τ, by induction on the structure of types. We let φ_real^{D→T} : ⟦D→(real)⟧ → T(⟦real⟧) be given by φ_real^{D→T}(x, x′) ≝ (x, [t ↦ x + x′t]). For the other types, we use Lemma 3. We can now phrase correctness at all first order types.

Theorem 2 (Semantic correctness of D→ (full)).
For any ground type τ, any first order context Γ and any term Γ ⊢ t : τ, the syntactic translation D→ coincides with the tangent bundle functor, modulo these canonical isomorphisms: the square

⟦D→(Γ)⟧ --⟦D→(t)⟧--> ⟦D→(τ)⟧
  | φ_Γ^{D→T} ≅          ≅ φ_τ^{D→T} |
  v                                  v
T(⟦Γ⟧) ------T(⟦t⟧)------> T(⟦τ⟧)

commutes.

Proof (notes). For any curve γ ∈ Man(R, M), let γ̄ ∈ Man(R, T(M)) be the tangent curve, given by γ̄(x) ≝ (γ(x), [t ↦ γ(x + t)]). First, we note that a smooth map h : T(M) → T(N) is of the form T(g) for some g : M → N if for all smooth curves γ : R → M we have γ̄; h = (γ; g)‾ : R → T(N). This generalizes (2). Second, for any first order type τ, S_τ = {(f, f̃) | f̃; φ_τ^{D→T} = f̄}. This is shown by induction on the structure of types. We conclude the theorem from diagram (3), by putting these two observations together. □

6 A continuation-based AD algorithm

We now illustrate the flexibility of our framework by briefly describing an alternative syntactic translation D←. This alternative translation uses aspects of continuation passing style, inspired by recent developments in reverse mode AD [34, 5]. In brief, D←_ρ works by D←_ρ(real) = (real ∗ (real → ρ)). Thus instead of using a pair of a number and its tangent, we use a pair of a number and a continuation. The answer type ρ = realᵏ needs to have the structure of a vector space, and the continuations that we consider will turn out to be linear maps. Because we work in continuation passing style, the chain rule is applied contravariantly. If the reader is familiar with reverse-mode AD algorithms, they may think of the dimension k as the number of memory cells used to store the result. Computing the whole gradient of a term x_1 : real, …, x_k : real ⊢ t : real at once is then achieved by running D←_k(t) on a k-tuple of basis vectors for realᵏ.
We define the continuation-based AD macro D←_k on types and terms as the unique structure preserving functor Syn → Syn with D←_k(real) = (real ∗ (real → realᵏ)) and

D←_k(c) ≝ ⟨c, λz.⟨0, …, 0⟩⟩
D←_k(t + s) ≝ case D←_k(t) of ⟨x, x′⟩ → case D←_k(s) of ⟨y, y′⟩ → ⟨x + y, λz. x′ z + y′ z⟩
D←_k(t ∗ s) ≝ case D←_k(t) of ⟨x, x′⟩ → case D←_k(s) of ⟨y, y′⟩ → ⟨x ∗ y, λz. x′ (y ∗ z) + y′ (x ∗ z)⟩
D←_k(ς(t)) ≝ case D←_k(t) of ⟨x, x′⟩ → let y = ς(x) in ⟨y, λz. x′ (y ∗ (1 − y) ∗ z)⟩.

Here, we use the sugar x : realᵏ, y : realᵏ ⊢ x + y ≝ case x of ⟨x_1, …, x_k⟩ → case y of ⟨y_1, …, y_k⟩ → ⟨x_1 + y_1, …, x_k + y_k⟩. (We could easily expand this definition by making D←_k preserve all other term and type formers, as we did for D→.) Note that the corresponding scheme for an arbitrary n-ary operation op would be (c.f. the scheme for forward AD in §4)

D←_k(op(t_1, …, t_n)) ≝ case D←_k(t_1) of ⟨x_1, x_1′⟩ → … → case D←_k(t_n) of ⟨x_n, x_n′⟩ →
    ⟨op(x_1, …, x_n), λz. Σ_{i=1}^{n} x_i′ (∂_i op(x_1, …, x_n) ∗ z)⟩.

The idea is that D←_k(t) is a higher order function that simultaneously computes t (the forward pass) and defines as a continuation the reverse pass which computes the gradient. In order to actually run the algorithm, we need two auxiliary definitions:

lamR_real^k ≝ λz. case z of ⟨x, x′⟩ → case x′ of ⟨x_1′, …, x_k′⟩ → ⟨x, λy.⟨x_1′ ∗ y, …, x_k′ ∗ y⟩⟩ : D→_k(real) → D←_k(real)
evR_real^k ≝ λz. case z of ⟨x, x′⟩ → ⟨x, x′ 1⟩ : D←_k(real) → D→_k(real).

Here, D→_k is a macro on types (and terms) with exactly the same inductive definition as D→ except for the base case D→_k(real) = (real ∗ realᵏ). By noting that both D→_k and D←_k preserve all type formers, we can extend these definitions to all first order types τ: z : D→_k(τ) ⊢ lamR_τ^k(z) : D←_k(τ) and z : D←_k(τ) ⊢ evR_τ^k(z) : D→_k(τ).
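The translation D←_k(real) = (real ∗ (real → realᵏ)) can be sketched in Python: a value paired with a linear continuation sending an incoming sensitivity to a k-vector. The clauses below mirror the macro's definitions for constants, + and ∗ (function names are ours; ς is omitted for brevity):

```python
def const(c, k):
    # D_k(c) = <c, λz. <0, ..., 0>>
    return (c, lambda z: [0.0] * k)

def add(a, b):
    # D_k(t + s): continuations are summed pointwise
    (x, xk), (y, yk) = a, b
    return (x + y, lambda z: [u + v for u, v in zip(xk(z), yk(z))])

def mul(a, b):
    # D_k(t * s): the chain rule, applied contravariantly
    (x, xk), (y, yk) = a, b
    return (x * y, lambda z: [u + v for u, v in zip(xk(y * z), yk(x * z))])

def var(value, i, k):
    # lamR-style seeding: variable i reports sensitivities on basis vector i
    return (value, lambda z: [z if j == i else 0.0 for j in range(k)])

# Gradient of t(x, y) = x * y + x at (3, 5), in one reverse pass:
x, y = var(3.0, 0, 2), var(5.0, 1, 2)
value, kont = add(mul(x, y), x)
print(value, kont(1.0))   # 18.0 [6.0, 3.0]
```

Invoking the continuation once with sensitivity 1 plays the role of evR below: it runs the reverse pass and returns the whole gradient as a k-vector.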
We can think of lamR_τ^k(z) as encoding k tangent vectors z : D→_k(τ) as a closure, so it is suitable for running D←_k(t) on, and evR_τ^k(z) as actually evaluating the reverse pass defined by z : D←_k(τ) and returning the result as k tangent vectors. The idea is that, given some x : τ ⊢ t : σ between first order types τ, σ, we run our continuation-based AD by running evR_σ^k(D←_k(t)[lamR_τ^k(z)/x]).

The correctness proof closely follows that for forward AD. In particular, one defines a binary logical relation ⦅real⦆^{r,k} = (R, R × (Rᵏ)^R, S_real^{r,k}), where

S_real^{r,k} = {(f, x ↦ (f(x), y ↦ (∂_1 f(x) ∗ y, …, ∂_k f(x) ∗ y))) | f ∈ P_R^{Rᵏ}},

on the plots P_R^{Rᵏ} × P_{R × (Rᵏ)^R}^{Rᵏ}, and verifies that ⟦c⟧ × ⟦D←_k(c)⟧, ⟦x + y⟧ × ⟦D←_k(x + y)⟧, ⟦x ∗ y⟧ × ⟦D←_k(x ∗ y)⟧ and ⟦ς(x)⟧ × ⟦D←_k(ς(x))⟧ respect this logical relation. It follows that this relation extends to a functor ⦅−⦆^{r,k} : Syn → Gl such that ⟦−⟧ × ⟦D←_k(−)⟧ factors over ⦅−⦆^{r,k}, implying the correctness of the continuation-based AD by the following lemma.

Lemma 4. For all first order types τ (i.e. types not involving function types), we have that evR_τ^k(lamR_τ^k(t)) = t.

Proof (notes). This follows by an induction on the structure of τ. The idea is that lamR embeds reals into function spaces as linear maps, which is undone by evR by evaluating the linear maps at 1. □

To phrase correctness in this setting, however, we need a few definitions. Keeping in mind the canonical projection T(M) → M, we define T^k(M) as the k-fold categorical pullback (fibre product) T(M) ×_M … ×_M T(M). To be explicit, T^k(M) consists of k-tuples of tangent vectors at the same base point x. Again, T^k extends to a functor Man → Man by defining T^k(f)(x, (v_1, …, v_k)) ≝ (f(x), (T_x(f)(v_1), …, T_x(f)(v_k))).
As $T^k$ preserves countable coproducts and finite products (like $T$), it follows that the isomorphisms $\phi^{\overrightarrow{\mathcal{D}},T}$ generalize to canonical isomorphisms $\phi^{\overrightarrow{\mathcal{D}},T}_{\tau,k} : \llbracket \overrightarrow{\mathcal{D}}_k(\tau) \rrbracket \to T^k(\llbracket \tau \rrbracket)$ for first order types $\tau$. This leads to the following correctness statement for continuation-based AD.

Theorem 3 (Semantic correctness of $\overleftarrow{\mathcal{D}}_k$). For any ground type $\tau$, any first order context $\Gamma$ and any term $\Gamma \vdash t : \tau$, the syntactic translation $t \mapsto evR^k(\overleftarrow{\mathcal{D}}_k(t)[{}^{lamR^k(z)}/_x])$ coincides with the tangent bundle functor, modulo these canonical isomorphisms, in the sense that the following square commutes:

$$\begin{array}{ccc}
\llbracket \overrightarrow{\mathcal{D}}_k(\Gamma) \rrbracket & \xrightarrow{\ \llbracket lamR^k ;\, \overleftarrow{\mathcal{D}}_k(t) ;\, evR^k \rrbracket\ } & \llbracket \overrightarrow{\mathcal{D}}_k(\tau) \rrbracket \\[2pt]
\phi^{\overrightarrow{\mathcal{D}},T}_{\Gamma,k}\ \downarrow\ \cong & & \cong\ \downarrow\ \phi^{\overrightarrow{\mathcal{D}},T}_{\tau,k} \\[2pt]
T^k(\llbracket \Gamma \rrbracket) & \xrightarrow{\ T^k(\llbracket t \rrbracket)\ } & T^k(\llbracket \tau \rrbracket)
\end{array}$$

For example, when $\tau = \mathbf{real}$ and $\Gamma = x, y : \mathbf{real}$, we can run our continuation-based AD to compute the gradient of a program $x, y : \mathbf{real} \vdash t : \mathbf{real}$ at values $x = V$, $y = W$ by evaluating

$$evR^2_{\mathbf{real}}(\overleftarrow{\mathcal{D}}_2(t)[{}^{(lamR^2\, v)}/_x,\ {}^{(lamR^2\, w)}/_y])[{}^{\langle V, \langle 1, 0\rangle\rangle}/_v,\ {}^{\langle W, \langle 0, 1\rangle\rangle}/_w].$$

Indeed,

$$evR^2_{\mathbf{real}}(\overleftarrow{\mathcal{D}}_2(t)[{}^{(lamR^2\, v)}/_x,\ {}^{(lamR^2\, w)}/_y])[{}^{\langle V, \langle 1, 0\rangle\rangle}/_v,\ {}^{\langle W, \langle 0, 1\rangle\rangle}/_w] = \langle t(V, W),\ \langle \partial_1 t(V, W),\ \partial_2 t(V, W)\rangle\rangle.$$

7 Discussion and future work

Summary. We have shown that diffeological spaces provide a denotational semantics for a higher order language with variants and inductive types (§3, 4). We have used this to show correctness of a simple AD translation (Thm. 1, Thm. 2). But the method is not tied to this specific translation, as we illustrated in Section 6.

The structure of our elementary correctness argument for Theorem 1 is a typical logical relations proof. As explained in Section 5, this can equivalently be understood as a denotational semantics in a new kind of space obtained by categorical gluing. Overall, then, there are two logical relations at play. One is in diffeological spaces, which ensures that all definable functions are smooth.
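The worked example above can be replayed as a small Python sketch (our helper names, not the paper's syntax): `lamR` embeds a value with its tangent 2-tuple as a closure, the multiplication clause of the macro runs the program, and `evR` evaluates the reverse pass at 1 to read the gradient back off, illustrating both the example for t(x, y) = x * y and the Lemma 4 round trip.

```python
# Sketch of the worked example with k = 2 tangent slots.

def lamR(pair):
    # (x, (d1, d2)) -> (x, \y. (d1*y, d2*y)): embed tangents as a linear map
    x, (d1, d2) = pair
    return (x, lambda y: (d1 * y, d2 * y))

def evR(pair):
    # evaluate the reverse pass at 1 to recover the tangents/gradient
    x, xk = pair
    return (x, xk(1.0))

def mul(t, s):
    # the macro's clause for *: <x*y, \z. x'(y*z) + y'(x*z)>
    (x, xk), (y, yk) = t, s
    return (x * y, lambda z: tuple(a + b for a, b in zip(xk(y * z), yk(x * z))))

V, W = 3.0, 2.0
v = lamR((V, (1.0, 0.0)))   # seed x with tangent (1, 0)
w = lamR((W, (0.0, 1.0)))   # seed y with tangent (0, 1)
value, grad = evR(mul(v, w))
```

Here `value` is t(V, W) = 6 and `grad` is the gradient (W, V) = (2, 3), as Theorem 3 predicts; `evR(lamR(p)) == p` for any such pair, which is Lemma 4 at the base type.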
The other is in the correctness proof (equivalently, in the categorical gluing), which explicitly tracks the derivative of each function, and tracks the syntactic AD even at higher types.

Connection to the state of the art in AD implementation. As is common in denotational semantics research, we have here focused on an idealized language and simple translations to illustrate the main aspects of the method. There are a number of points where our approach is simplistic compared to the advanced current practice, as we now explain.

Representation of vectors. In our examples we have treated n-vectors as tuples of length n. This style of programming does not scale to large n. A better solution would be to use array types, following [31]. Our categorical semantics and correctness proofs straightforwardly extend to cover them, in a similar way to our treatment of lists.

Efficient forward-mode AD. For AD to be useful, it must be fast. The syntactic translation $\overrightarrow{\mathcal{D}}$ that we use is the basis of an efficient AD library [31]. However, numerous optimizations are needed, ranging from algebraic manipulations, to partial evaluations, to the use of an optimizing C compiler. A topic for future work would be to validate some of these manipulations using our semantics. The resulting implementation is performant in experiments [31].

Efficient reverse-mode AD. Our sketch of continuation-based AD is primarily intended to emphasise that our denotational approach is not tied to any specific translation $\overrightarrow{\mathcal{D}}$. Nonetheless, it is worth noting that this algorithm shares similarities with advanced reverse-mode implementations: (1) it calculates derivatives in a (contravariant) "reverse pass" in which derivatives of operations are evaluated in the reverse order compared to their order in calculating the function value; (2) it can be used to calculate the full gradient of a function $\mathbb{R}^n \to \mathbb{R}$ in a single reverse pass (while n passes of forward AD would be necessary).
However, it lacks important optimizations, and the continuation scales with the size of the input n where it should scale with the size of the output. This adds an important overhead, as pointed out in [26]. Speed being the main attraction of reverse-mode AD, its implementations tend to rely on mutable state, control operators and/or staging [26, 6, 34, 5], which we have not considered here.

Other language features. The idealized languages that we considered so far do not touch on several useful language constructs. For example: the use of functions that are partial (such as division) or partly smooth (such as ReLU); phenomena such as iteration and recursion; and probabilities. There are suggestions that the denotational approach using diffeological spaces can be adapted to these features using standard categorical methods. We leave this for future work.

Acknowledgements. We have benefited from discussing this work with many people, including B. Pearlmutter, O. Kammar, C. Mak, L. Ong, G. Plotkin, A. Shaikhha, J. Sigal, and others. Our work is supported by the Royal Society and by a Facebook Research Award. In the course of this work, MV has also been employed at Oxford (EPSRC Project EP/M023974/1) and at Columbia in the Stan development team. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 895827.

References

1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: TensorFlow: A system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). pp. 265–283 (2016)
2. Abadi, M., Plotkin, G.D.: A simple differentiable programming language. In: Proc. POPL 2020. ACM (2020)
3. Baez, J., Hoffnung, A.: Convenient categories of smooth spaces.
Transactions of the American Mathematical Society 363(11), 5789–5825 (2011)
4. Barthe, G., Crubillé, R., Lago, U.D., Gavazzo, F.: On the versatility of open logical relations: Continuity, automatic differentiation, and a containment theorem. In: Proc. ESOP 2020. Springer (2020), to appear
5. Brunel, A., Mazza, D., Pagani, M.: Backpropagation in the simply typed lambda-calculus with linear negation. In: Proc. POPL 2020 (2020)
6. Carpenter, B., Hoffman, M.D., Brubaker, M., Lee, D., Li, P., Betancourt, M.: The Stan math library: Reverse-mode automatic differentiation in C++. arXiv preprint arXiv:1509.07164 (2015)
7. Christensen, J.D., Wu, E.: Tangent spaces and tangent bundles for diffeological spaces. arXiv preprint arXiv:1411.5425 (2014)
8. Cockett, J.R.B., Cruttwell, G.S.H., Gallagher, J., Lemay, J.S.P., MacAdam, B., Plotkin, G.D., Pronk, D.: Reverse derivative categories. In: Proc. CSL 2020 (2020)
9. Cruttwell, G., Gallagher, J., MacAdam, B.: Towards formalizing and extending differential programming using tangent categories. In: Proc. ACT 2019 (2019)
10. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(Jul), 2121–2159 (2011)
11. Ehrhard, T., Regnier, L.: The differential lambda-calculus. Theoretical Computer Science 309(1–3), 1–41 (2003)
12. Elliott, C.: The simple essence of automatic differentiation. Proceedings of the ACM on Programming Languages 2(ICFP), 70 (2018)
13. Fong, B., Spivak, D., Tuyéras, R.: Backprop as functor: A compositional perspective on supervised learning. In: 2019 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS). pp. 1–13. IEEE (2019)
14. Hoffman, M.D., Gelman, A.: The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research 15(1), 1593–1623 (2014)
15.
Huot, M., Staton, S., Vákár, M.: Correctness of automatic differentiation via diffeologies and categorical gluing. Full version (2020), arXiv:2001.02209
16. Iglesias-Zemmour, P.: Diffeology. American Mathematical Soc. (2013)
17. Johnstone, P.T., Lack, S., Sobociński, P.: Quasitoposes, quasiadhesive categories and Artin glueing. In: Proc. CALCO 2007 (2007)
18. Kiefer, J., Wolfowitz, J., et al.: Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics 23(3), 462–466 (1952)
19. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
20. Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., Blei, D.M.: Automatic differentiation variational inference. The Journal of Machine Learning Research 18(1), 430–474 (2017)
21. Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Mathematical Programming 45(1–3), 503–528 (1989)
22. Mak, C., Ong, L.: A differential-form pullback programming language for higher-order reverse-mode automatic differentiation (2020), arXiv:2002.08241
23. Manzyuk, O.: A simply typed λ-calculus of forward automatic differentiation. In: Proc. MFPS 2012 (2012)
24. Mitchell, J.C., Scedrov, A.: Notes on sconing and relators. In: International Workshop on Computer Science Logic. pp. 352–378. Springer (1992)
25. Neal, R.M., et al.: MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo 2(11), 2 (2011)
26. Pearlmutter, B.A., Siskind, J.M.: Reverse-mode AD in a functional framework: Lambda the ultimate backpropagator. ACM Transactions on Programming Languages and Systems (TOPLAS) 30(2), 7 (2008)
27. Pitts, A.M.: Categorical logic. Tech. rep., University of Cambridge, Computer Laboratory (1995)
28. Plotkin, G.D.: Some principles of differential programming languages (2018), invited talk, POPL 2018
29. Qian, N.: On the momentum term in gradient descent learning algorithms.
Neural Networks 12(1), 145–151 (1999)
30. Robbins, H., Monro, S.: A stochastic approximation method. The Annals of Mathematical Statistics pp. 400–407 (1951)
31. Shaikhha, A., Fitzgibbon, A., Vytiniotis, D., Peyton Jones, S.: Efficient differentiable programming in a functional array-processing language. Proceedings of the ACM on Programming Languages 3(ICFP), 97 (2019)
32. Souriau, J.M.: Groupes différentiels. In: Differential geometrical methods in mathematical physics, pp. 91–128. Springer (1980)
33. Stacey, A.: Comparative smootheology. Theory Appl. Categ. 25(4), 64–117 (2011)
34. Wang, F., Wu, X., Essertel, G., Decker, J., Rompf, T.: Demystifying differentiable programming: Shift/reset the penultimate backpropagator. Proceedings of the ACM on Programming Languages 3(ICFP) (2019)

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Deep Induction: Induction Rules for (Truly) Nested Types

Patricia Johann and Andrew Polonsky
Appalachian State University, Boone, NC, USA

Abstract. This paper introduces deep induction, and shows that it is the notion of induction most appropriate to nested types and other data types defined over, or mutually recursively with, (other) such types.
Standard induction rules induct over only the top-level structure of data, leaving any data internal to the top-level structure untouched. By contrast, deep induction rules induct over all of the structured data present. We give a grammar generating a robust class of nested types (and thus ADTs), and develop a fundamental theory of deep induction for them using their recently defined semantics as fixed points of accessible functors on locally presentable categories. We then use our theory to derive deep induction rules for some common ADTs and nested types, and show how these rules specialize to give the standard structural induction rules for these types. We also show how deep induction specializes to solve the long-standing problem of deriving principled and practically useful structural induction rules for bushes and other truly nested types. Overall, deep induction opens the way to making induction principles appropriate to richly structured data types available in programming languages and proof assistants. Agda implementations of our development and examples, including two extended case studies, are available.

1 Introduction

This paper is concerned with the problem of inductive reasoning about inductive data types that are defined over, or are defined mutually recursively with, (other) such data types. Examples of such deep data types include, trivially, ordinary algebraic data types (ADTs), such as list and tree types; data types, such as the forest type, whose recursive occurrences appear below other type constructors; simple nested types, such as the type of perfect trees, whose recursive occurrences never appear below their own type constructors; truly nested types, such as the type of bushes (also called bootstrapped heaps by Okasaki [16]), whose recursive occurrences do appear below their own type constructors; and GADTs.
Proof assistants, including Coq and Agda, currently provide insufficient support for performing induction over deep data types. The induction rules, if any, that they generate for such types induct over only their top-level structures, leaving any data internal to the top-level structure untouched. This paper develops a principle that, by contrast, inducts over all of the structured data present. We call this principle deep induction. Deep induction not only provides general support for solving problems that previously had only (usually quite painful and) ad hoc solutions, but also opens the way for incorporating automatic generation of useful induction rules for deep data types into proof assistants. Nested types that are defined over themselves are known as truly nested types.

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 339–358, 2020.

To illustrate the difference between structural induction and deep induction, note that the data inside a structure of type List a = Nil | Cons a (List a) is treated monolithically (i.e., ignored) by the structural induction rule for lists:

∀(a : Set)(P : List a → Set) →
  P Nil →
  (∀(x : a)(xs : List a) → P xs → P (Cons x xs)) →
  ∀(xs : List a) → P xs

By contrast, the deep induction rule for lists traverses not just the outer list structure with a predicate P, but also each data element of that list with a custom predicate Q:

∀(a : Set)(P : List a → Set)(Q : a → Set) →
  P Nil →
  (∀(x : a)(xs : List a) → Q x → P xs → P (Cons x xs)) →
  ∀(xs : List a) → List^ Q xs → P xs

Here, the lifting List^ lifts its argument predicate Q on data of type a to a predicate on data of type List a asserting that Q holds for every element of its argument list. The structural induction rule for lists is, like that for any ADT, recovered by taking the custom predicate in the corresponding deep rule to be λx. True.
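The lifting List^ and the way the deep rule uses it can be sketched in Python (our own encoding of lists as tuples; the paper works in Agda-like syntax): `lift_list` turns a predicate Q on elements into a predicate on lists, and `deep_list_induction` builds evidence for P xs by recursion, consuming the premise List^ Q xs exactly as the rule's second antecedent does.

```python
def lift_list(Q, xs):
    # List^ Q: Q holds of every element of the list
    return all(Q(x) for x in xs)

def deep_list_induction(nil_case, cons_case, Q, xs):
    # Mirrors the deep rule: from evidence for P Nil and a step
    # (Q x -> P xs -> P (Cons x xs)), conclude P xs given List^ Q xs.
    assert lift_list(Q, xs), "premise List^ Q xs must hold"
    p = nil_case                  # evidence for P Nil
    for x in reversed(xs):        # peel Cons cells from the inside out
        p = cons_case(x, p)       # Q x holds by the List^ Q xs premise
    return p

# Example property P xs := "sum of xs is even", proved for lists of evens:
even = lambda n: n % 2 == 0
proof = deep_list_induction(
    nil_case=True,                         # sum of Nil is 0, which is even
    cons_case=lambda x, p: p and even(x),  # even x and even tail sum
    Q=even,
    xs=(2, 4, 6))
```

Taking `Q = lambda x: True` recovers plain structural induction, matching the remark above.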
A particular advantage of deep induction is that it obviates the need to reflect properties as data types. For example, although the set of primes cannot be defined by an ADT, the primeness predicate Prime on the ADT of natural numbers can be lifted to a predicate List^ Prime characterizing lists of primes. Properties can then be proved for lists of primes using the following deep induction rule:

∀(P : List Nat → Set) →
  P Nil →
  (∀(x : Nat)(xs : List Nat) → Prime x → P xs → P (Cons x xs)) →
  ∀(xs : List Nat) → List^ Prime xs → P xs

As we'll see in Sections 3, 4, and 5, the extra flexibility afforded by lifting predicates like Q and Prime on data internal to a structure makes it possible to derive useful induction principles for more complex types, such as truly nested ones.

In each of the above examples, a predicate on the data is lifted to a predicate on the list. This is an example of lifting a predicate on a type in a non-recursive position of an ADT's definition to the entire ADT. However, the predicate to be lifted can also be on the type in a recursive position of a definition (i.e., on the ADT being defined itself), and this ADT can appear below another type constructor in the definition. This is exactly the situation for the ADT Forest a, which appears below the type constructor List in the definition

Forest a = FEmpty | FNode a (List (Forest a))

The induction rule Coq generates for forests is

∀(a : Set)(P : Forest a → Set) →
  P FEmpty →
  (∀(x : a)(ts : List (Forest a)) → P (FNode x ts)) →
  ∀(x : Forest a) → P x

(This is equivalent to the rule as classically stated in Coq/Isabelle/HOL.) However, this is neither the induction rule we intuitively expect, nor is it expressive enough to prove even basic properties of forests that ought to be amenable to inductive proof. The approach of [11,12] does give the expected rule
∀(a : Set)(P : Forest a → Set) →
  P FEmpty →
  (∀(x : a)(ts : List (Forest a)) → (∀(k < length ts) → P (ts !! k)) → P (FNode x ts)) →
  ∀(x : Forest a) → P x

But to derive it, a technique based on list positions is used to propagate the predicate to be proved over the list of forests that is the second argument to the data constructor FNode. Unfortunately, this technique does not obviously extend to other deep data types, including the type of "generalized forests" introduced in Section 4.4 below, which combines smaller generalized forests into larger ones using a type constructor f potentially different from List. Nevertheless, replacing ∀(k < length ts) → P (ts !! k) in the expected rule with List^ P ts, which is equivalent, reveals that it is nothing more than the special case for Q = λx. True of the following deep induction rule for Forest a:

∀(a : Set)(P : Forest a → Set)(Q : a → Set) →
  P FEmpty →
  (∀(x : a)(ts : List (Forest a)) → Q x → List^ P ts → P (FNode x ts)) →
  ∀(x : Forest a) → Forest^ Q x → P x

When types, like Forest a and List (Forest a) above, are defined by mutual recursion, their (deep) induction rules are defined by mutual recursion as well. For example, the induction rules for the ADTs

data Expr  = Lit Nat | Add Expr Expr | If BExpr Expr Expr
data BExpr = BLit Bool | And BExpr BExpr | Not BExpr | Equal Expr Expr

of integer and boolean expressions are defined by mutual recursion as

∀(P : Expr → Set)(Q : BExpr → Set) →
  (∀(n : Nat) → P (Lit n)) →
  (∀(e1 : Expr)(e2 : Expr) → P e1 → P e2 → P (Add e1 e2)) →
  (∀(b : BExpr)(e1 : Expr)(e2 : Expr) → Q b → P e1 → P e2 → P (If b e1 e2)) →
  (∀(b : Bool) →
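The interplay between Forest^ Q and List^ in the deep rule above can be sketched in Python (our own tagged-tuple encoding): the lifting for forests calls the lifting for lists on a lifted forest predicate, mirroring how Forest a is defined via List (Forest a).

```python
# Our encoding (hypothetical): a forest is ('FEmpty',) or ('FNode', x, ts)
# where ts is a tuple of forests.

def lift_list(P, xs):
    # List^ P: P holds of every element
    return all(P(x) for x in xs)

def lift_forest(Q, f):
    # Forest^ Q: Q holds of every a-datum stored anywhere in the forest.
    # The FNode case uses List^ (Forest^ Q), matching the mutual structure.
    if f[0] == 'FEmpty':
        return True
    _, x, ts = f
    return Q(x) and lift_list(lambda t: lift_forest(Q, t), ts)

leaf = ('FNode', 3, ())
f = ('FNode', 5, (leaf, ('FEmpty',)))
```

For instance, `lift_forest(odd, f)` holds because every stored number (5 and 3) is odd, while a predicate failing on some inner datum makes the lifting fail.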
Q (BLit b)) →
  (∀(b1 : BExpr)(b2 : BExpr) → Q b1 → Q b2 → Q (And b1 b2)) →
  (∀(b : BExpr) → Q b → Q (Not b)) →
  (∀(e1 : Expr)(e2 : Expr) → P e1 → P e2 → Q (Equal e1 e2)) →
  (∀(e : Expr) → P e) × (∀(b : BExpr) → Q b)

2 The Key Idea

As the examples of the previous section suggest, the key to deriving deep induction rules from (deep) data type declarations is to parameterize the induction rules not just over a predicate on the top-level data type being defined, but over predicates on the types of primitive data they contain as well. These additional predicates are then lifted to predicates on any internal structures containing these data, and the resulting predicates on these internal structures are lifted to predicates on any internal structures containing structures at the previous level, and so on, until the internal structures at all levels of the data type definition, including the top level, have been so processed. Satisfaction of a predicate by the data at one level of a structure is then conditioned upon satisfaction of the appropriate predicates by all of the data at the preceding level. The above deep induction rules were all obtained using this technique.

For example, the deep induction rule for lists is derived by first noting that structures of type List a contain only data of type a, so that only one additional predicate parameter, which we called Q above, is needed. Then, since the only data structure internal to the type List a is List a itself, Q need only be lifted to lists containing data of type a. This is exactly what List^ Q does. Finally, the deep induction rule for lists is obtained by parameterizing the standard one over not just P but also Q, adding the additional hypothesis Q x to its second antecedent, and adding the additional hypothesis List^ Q xs to its conclusion.
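The mutually recursive rule for Expr and BExpr can be realized as a pair of mutually recursive traversals. Here is a Python sketch (our own tagged-tuple encoding and example property, not from the paper) that establishes a property P of expressions and Q of boolean expressions simultaneously, with one branch per antecedent of the rule:

```python
# Expr:  ('Lit', n) | ('Add', e1, e2) | ('If', b, e1, e2)
# BExpr: ('BLit', b) | ('And', b1, b2) | ('Not', b) | ('Equal', e1, e2)
# Example: P e := "every Lit in e is nonnegative",
#          Q b := "every Lit in b is nonnegative" (via Equal).

def check_expr(e):
    tag = e[0]
    if tag == 'Lit':                              # P (Lit n)
        return e[1] >= 0
    if tag == 'Add':                              # P e1 -> P e2 -> P (Add e1 e2)
        return check_expr(e[1]) and check_expr(e[2])
    if tag == 'If':                               # Q b -> P e1 -> P e2 -> P (If b e1 e2)
        return check_bexpr(e[1]) and check_expr(e[2]) and check_expr(e[3])
    raise ValueError(tag)

def check_bexpr(b):
    tag = b[0]
    if tag == 'BLit':                             # Q (BLit b)
        return True
    if tag == 'And':                              # Q b1 -> Q b2 -> Q (And b1 b2)
        return check_bexpr(b[1]) and check_bexpr(b[2])
    if tag == 'Not':                              # Q b -> Q (Not b)
        return check_bexpr(b[1])
    if tag == 'Equal':                            # P e1 -> P e2 -> Q (Equal e1 e2)
        return check_expr(b[1]) and check_expr(b[2])
    raise ValueError(tag)

e = ('If', ('Equal', ('Lit', 1), ('Lit', 2)),
     ('Add', ('Lit', 3), ('Lit', 4)), ('Lit', 0))
```

The conclusion of the rule is the pair of both traversals terminating with evidence, which in this sketch is the pair of boolean results.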
The deep induction rule for forests is similarly obtained from the Coq-generated rule by first parameterizing over an additional predicate Q on the type a of data stored in the forest, then lifting P to a predicate on lists containing data of type Forest a and Q to forests containing data of type a, and, finally, adding the additional hypotheses Q x and List^ P ts to its second antecedent and the additional hypothesis Forest^ Q x to its conclusion.

Predicate liftings such as List^ and Forest^ may either be supplied as primitives, or be generated automatically from the definitions of the types themselves, as described in Section 4. For container types, lifting a predicate amounts to traversing the container and applying the argument predicate pointwise.

Our technique for deriving deep induction rules for ADTs, as well as its generalization to nested types given in Section 3, is both made precise and rigorously justified in Section 4 using the results of [13]. This paper can thus be seen as a concrete application, in the specific category Fam, of the very general semantics developed in [13]; indeed, our induction rules are computed as the interpretations of the syntax for nested types in Fam. A general methodology is extracted in Section 5. The rest of this paper can be read either as "just" describing how to generate deep induction rules in practice, or as also proving that our technique for doing so is provably correct and general. Our Agda code is at [14].

3 Extending to Nested Types

Appropriately generalizing the basic technique of Section 2 derives deep induction rules, and therefore structural induction rules, for nested types, including truly nested types and other deep nested types. Nested types generalize ADTs by allowing elements at one instance of a data type to depend on data at other instances of the same type so that, in effect, the entire family of instances is constructed simultaneously.
That is, rather than defining standalone families of inductive types, one for each choice of types to which type constructors like List and Tree are applied, the type constructors for nested types define inductive families of types. The structural induction rule for a nested type must therefore account for its changing type parameters by parameterizing over an appropriately polymorphic predicate, and appropriately instantiating that predicate's type argument at each application site. For example, the structural induction rule for the nested type

PTree a = PLeaf a | PNode (PTree (a × a))

of perfect trees is

∀(P : ∀(a : Set) → PTree a → Set) →
  (∀(a : Set)(x : a) → P a (PLeaf x)) →
  (∀(a : Set)(x : PTree (a × a)) → P (a × a) x → P a (PNode x)) →
  ∀(a : Set)(x : PTree a) → P a x

and the structural induction rule for the nested type

data Lam a where
  Var :: a → Lam a
  App :: Lam a → Lam a → Lam a
  Abs :: Lam (Maybe a) → Lam a

of de Bruijn encoded lambda terms [9] with variables of type a is

∀(P : ∀(a : Set) → Lam a → Set) →
  (∀(a : Set)(x : a) → P a (Var x)) →
  (∀(a : Set)(x : Lam a)(y : Lam a) → P a x → P a y → P a (App x y)) →
  (∀(a : Set)(x : Lam (Maybe a)) → P (Maybe a) x → P a (Abs x)) →
  ∀(a : Set)(x : Lam a) → P a x

Deep induction rules for nested types must similarly account for their type constructors' changing type parameters while also parameterizing over the additional predicate on the type of data they contain.
Letting Pair^ Q be the lifting of a predicate Q on a to pairs of type a × a, so that Pair^ Q (x, y) = Q x × Q y, this gives the deep induction rule

∀(P : ∀(a : Set) → (a → Set) → PTree a → Set) →
  (∀(a : Set)(Q : a → Set)(x : a) → Q x → P a Q (PLeaf x)) →
  (∀(a : Set)(Q : a → Set)(x : PTree (a × a)) → P (a × a) (Pair^ Q) x → P a Q (PNode x)) →
  ∀(a : Set)(Q : a → Set)(x : PTree a) → PTree^ Q x → P a Q x

for perfect trees, and the deep induction rule

∀(P : ∀(a : Set) → (a → Set) → Lam a → Set) →
  (∀(a : Set)(Q : a → Set)(x : a) → Q x → P a Q (Var x)) →
  (∀(a : Set)(Q : a → Set)(x : Lam a)(y : Lam a) → P a Q x → P a Q y → P a Q (App x y)) →
  (∀(a : Set)(Q : a → Set)(x : Lam (Maybe a)) → P (Maybe a) (Maybe^ Q) x → P a Q (Abs x)) →
  ∀(a : Set)(Q : a → Set)(x : Lam a) → Lam^ Q x → P a Q x

for lambda terms. As usual, the structural induction rules for these types can be recovered by setting Q = λx. True in their deep induction rules. Moreover, the basic technique described in Section 2 can be recovered from the more general one described in this section by noting that the type arguments to ADT data constructors don't change, and that the internal predicate parameter to P can therefore be lifted to the outermost level of ADT induction rules.

We conclude this section by giving both structural and deep induction rules for the following truly nested type of bushes [8]:

Bush a = BNil | BCons a (Bush (Bush a))

(Note that this type is not even definable in Agda.) Correct and useful structural induction rules for bushes and other truly nested types have long been elusive. One recent effort to derive such rules has been recorded in [10], but the approach taken there is more ad hoc than not, and generates induction rules for data types related to the nested types of interest rather than for the original nested types themselves.
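The way the predicate deepens with the type parameter in the perfect-tree rule can be sketched in Python (our own tagged-tuple encoding): the PNode case of PTree^ recurses with the lifted predicate Pair^ Q, just as the deep rule instantiates P at a × a with Pair^ Q.

```python
# Perfect trees (hypothetical encoding): ('PLeaf', x) | ('PNode', t)
# where t is a perfect tree over pairs.

def lift_pair(Q):
    # Pair^ Q (x, y) = Q x and Q y
    return lambda p: Q(p[0]) and Q(p[1])

def lift_ptree(Q, t):
    # PTree^ Q: in the PNode case the subtree stores pairs,
    # so recurse with the deepened predicate Pair^ Q.
    if t[0] == 'PLeaf':
        return Q(t[1])
    return lift_ptree(lift_pair(Q), t[1])

# A perfect tree of depth 2 over Int: its single leaf stores ((1,2),(3,4)).
t = ('PNode', ('PNode', ('PLeaf', ((1, 2), (3, 4)))))
```

Here `lift_ptree(Q, t)` unfolds to Q 1 and Q 2 and Q 3 and Q 4: the predicate has been lifted twice, once per PNode.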
To treat bushes, for example, Fu and Selinger rewrite the type Bush a as NBush (Succ Zero) a, where NBush = NTimes Bush and

NTimes :: (Set → Set) → Nat → Set → Set
NTimes p Zero s = s
NTimes p (Succ n) s = p (NTimes p n s)

Their induction rule for bushes is then given in terms of these rewritten types as

∀(a : Set)(P : ∀(n : Nat) → NBush n a → Set) →
  (∀(x : a) → P Zero x) →
  (∀(n : Nat) → P (Succ n) BNil) →
  (∀(n : Nat)(x : NBush n a)(xs : NBush (Succ (Succ n)) a) → P n x → P (Succ (Succ n)) xs → P (Succ n) (BCons x xs)) →
  ∀(n : Nat)(xs : NBush n a) → P n xs

This approach appears promising, but is not yet fully mature.
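The true nesting in Bush a = BNil | BCons a (Bush (Bush a)) is what makes predicate lifting for bushes interesting: the tail of a BCons is a bush of bushes, so its elements must be checked with the lifting Bush^ Q itself. A Python sketch (our own encoding, with the lifting as a higher-order function):

```python
# Bushes (hypothetical encoding): ('BNil',) | ('BCons', x, bb)
# where bb is a bush whose elements are themselves bushes.

def lift_bush(Q, b):
    # Bush^ Q: in the BCons case the tail bb : Bush (Bush a), so its
    # elements are checked with Bush^ Q itself -- the true nesting.
    if b[0] == 'BNil':
        return True
    _, x, bb = b
    return Q(x) and lift_bush(lambda inner: lift_bush(Q, inner), bb)

nil = ('BNil',)
# BCons 1 (BCons (BCons 2 BNil) BNil) : Bush Int --
# the tail is a bush containing the bush [2].
b = ('BCons', 1, ('BCons', ('BCons', 2, nil), nil))
```

Here `lift_bush(Q, b)` unfolds to Q 1 and Q 2: the outer call checks the head with Q, and the recursive call checks the inner bush [2] with Bush^ Q. This self-application is exactly what a structural induction rule for bushes must account for.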