# Foundations of Software Science and Computation Structures

Jean Goubault-Larrecq, Barbara König (Eds.)

23rd International Conference, FOSSACS 2020, held as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020, Dublin, Ireland, April 25–30, 2020. Proceedings.

Lecture Notes in Computer Science 12077, Advanced Research in Computing and Software Science (ARCoSS) subline.

Founding Editors: Gerhard Goos (Germany) and Juris Hartmanis (USA). Editorial Board Members: Elisa Bertino (USA), Wen Gao (China), Bernhard Steffen (Germany), Gerhard Woeginger (Germany), and Moti Yung (USA).

Subline Series Editors: Giorgio Ausiello (University of Rome 'La Sapienza', Italy) and Vladimiro Sassone (University of Southampton, UK). Subline Advisory Board: Susanne Albers (TU Munich, Germany), Benjamin C. Pierce (University of Pennsylvania, USA), Bernhard Steffen (University of Dortmund, Germany), Deng Xiaotie (Peking University, Beijing, China), and Jeannette M. Wing (Microsoft Research, Redmond, WA, USA).

More information about this series at http://www.springer.com/series/7407

Editors: Jean Goubault-Larrecq (Université Paris-Saclay, ENS Paris-Saclay, CNRS, Cachan, France) and Barbara König (University of Duisburg-Essen, Duisburg, Germany).

ISSN 0302-9743; ISSN 1611-3349 (electronic)
ISBN 978-3-030-45230-8; ISBN 978-3-030-45231-5 (eBook)
https://doi.org/10.1007/978-3-030-45231-5
LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2020. This book is an open access publication.
**Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

## ETAPS Foreword

Welcome to the 23rd ETAPS! This is the first time that ETAPS has taken place in Ireland, in its beautiful capital Dublin. ETAPS 2020 was the 23rd instance of the European Joint Conferences on Theory and Practice of Software.
ETAPS is an annual federated conference established in 1998, and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each conference has its own Program Committee (PC) and its own Steering Committee (SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations of programming language developments, analysis tools, and formal approaches to software engineering. Organizing these conferences in a coherent, highly synchronized conference program enables researchers to participate in an exciting event, with the possibility to meet many colleagues working in different directions in the field, and to easily attend talks of different conferences. On the weekend before the main conference, numerous satellite workshops took place that attracted many researchers from all over the globe. Also, for the second time, an ETAPS Mentoring Workshop was organized. This workshop is intended to help students early in their careers with advice on research, career, and life in the fields of computing that are covered by the ETAPS conferences.

ETAPS 2020 received 424 submissions in total, 129 of which were accepted, yielding an overall acceptance rate of 30.4%. I thank all the authors for their interest in ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2020 featured the unifying invited speakers Scott Smolka (Stony Brook University) and Jane Hillston (University of Edinburgh) and the conference-specific invited speakers (ESOP) Işıl Dillig (University of Texas at Austin) and (FASE) Willem Visser (Stellenbosch University).
Invited tutorials were provided by Erika Ábrahám (RWTH Aachen University) on the analysis of hybrid systems and Madhusudan Parthasarathy (University of Illinois at Urbana-Champaign) on combining machine learning and formal methods. On behalf of the ETAPS 2020 attendants, I thank all the speakers for their inspiring and interesting talks!

ETAPS 2020 took place in Dublin, Ireland, and was organized by the University of Limerick and Lero. ETAPS 2020 was further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology). The local organization team consisted of Tiziana Margaria (general chair, UL and Lero), Vasileios Koutavas (Lero@UCD), Anila Mjeda (Lero@UL), Anthony Ventresque (Lero@UCD), and Petros Stratis (Easy Conferences).

The ETAPS Steering Committee (SC) consists of an Executive Board and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Holger Hermanns (Saarbrücken), Marieke Huisman (chair, Twente), Joost-Pieter Katoen (Aachen and Twente), Jan Kofron (Prague), Gerald Lüttgen (Bamberg), Tarmo Uustalu (Reykjavik and Tallinn), Caterina Urban (Inria, Paris), and Lenore Zuck (Chicago).

Other members of the SC are: Armin Biere (Linz), Jordi Cabot (Barcelona), Jean Goubault-Larrecq (Cachan), Jan-Friso Groote (Eindhoven), Esther Guerra (Madrid), Jurriaan Hage (Utrecht), Reiko Heckel (Leicester), Panagiotis Katsaros (Thessaloniki), Stefan Kiefer (Oxford), Barbara König (Duisburg), Fabrice Kordon (Paris), Jan Kretinsky (Munich), Kim G. Larsen (Aalborg), Tiziana Margaria (Limerick), Peter Müller (Zurich), Catuscia Palamidessi (Palaiseau), Dave Parker (Birmingham), Andrew M. Pitts (Cambridge), Peter Ryan (Luxembourg), Don Sannella (Edinburgh), Bernhard Steffen (Dortmund), Mariëlle Stoelinga (Twente), Gabriele Taentzer (Marburg), Christine Tasson (Paris), Peter Thiemann (Freiburg), Jan Vitek (Prague), Heike Wehrheim (Paderborn), Anton Wijs (Eindhoven), and Nobuko Yoshida (London).

I would like to take this opportunity to thank all speakers, attendants, organizers of the satellite workshops, and Springer for their support. I hope you all enjoyed ETAPS 2020. Finally, a big thanks to Tiziana and her local organization team for all their enormous efforts enabling a fantastic ETAPS in Dublin!

February 2020
Marieke Huisman
ETAPS SC Chair, ETAPS e.V. President

## Preface

This volume contains the papers presented at the 23rd International Conference on Foundations of Software Science and Computation Structures (FoSSaCS), which took place in Dublin, Ireland, during April 27–30, 2020. The conference series is dedicated to foundational research with a clear significance for software science. It brings together research on theories and methods to support the analysis, integration, synthesis, transformation, and verification of programs and software systems.

This volume contains 31 contributed papers selected from 98 full paper submissions, and also a paper accompanying an invited talk by Scott Smolka (Stony Brook University, USA). Each submission was reviewed by at least three Program Committee members, with the help of external reviewers, and the final decisions took into account the feedback from a rebuttal phase. The conference submissions were managed using the EasyChair conference system, which was also used to assist with the compilation of these proceedings.

We wish to thank all the authors who submitted papers to FoSSaCS 2020, the Program Committee members, the Steering Committee members, and the external reviewers.
In addition, we are grateful to the ETAPS 2020 organization for providing an excellent environment for FoSSaCS 2020 alongside the other ETAPS conferences and workshops.

February 2020
Jean Goubault-Larrecq
Barbara König

## Organization

### Program Committee

- Parosh Aziz Abdulla, Uppsala University, Sweden
- Thorsten Altenkirch, University of Nottingham, UK
- Paolo Baldan, Università di Padova, Italy
- Nick Benton, Facebook, UK
- Frédéric Blanqui, Inria and LSV, France
- Michele Boreale, Università di Firenze, Italy
- Corina Cirstea, University of Southampton, UK
- Pedro R. D'Argenio, Universidad Nacional de Córdoba, CONICET, Argentina
- Josée Desharnais, Université Laval, Canada
- Jean Goubault-Larrecq, Université Paris-Saclay, ENS Paris-Saclay, CNRS, LSV, Cachan, France
- Ichiro Hasuo, National Institute of Informatics, Japan
- Delia Kesner, IRIF, Université de Paris, France
- Shankara Narayanan Krishna, IIT Bombay, India
- Barbara König, Universität Duisburg-Essen, Germany
- Sławomir Lasota, University of Warsaw, Poland
- Xavier Leroy, Collège de France and Inria, France
- Leonid Libkin, University of Edinburgh, UK, and ENS Paris, France
- Jean-Yves Marion, LORIA, Université de Lorraine, France
- Dominique Méry, LORIA, Université de Lorraine, France
- Matteo Mio, LIP, CNRS, ENS Lyon, France
- Andrzej Murawski, University of Oxford, UK
- Prakash Panangaden, McGill University, Canada
- Amr Sabry, Indiana University Bloomington, USA
- Lutz Schröder, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
- Sebastian Siebertz, Universität Bremen, Germany
- Benoît Valiron, LRI, CentraleSupélec, Université Paris-Saclay, France

### Steering Committee

- Andrew Pitts (Chair), University of Cambridge, UK
- Christel Baier, Technische Universität Dresden, Germany
- Lars Birkedal, Aarhus University, Denmark
- Ugo Dal Lago, Università degli Studi di Bologna, Italy
- Javier Esparza, Technische Universität München, Germany
- Anca Muscholl, LaBRI, Université de Bordeaux, France
- Frank Pfenning, Carnegie Mellon University, USA

### Additional Reviewers

Accattoli, Beniamino;
Alvim, Mario S.; André, Étienne; Argyros, George; Arun-Kumar, S.; Ayala-Rincon, Mauricio; Bacci, Giorgio; Bacci, Giovanni; Balabonski, Thibaut; Basile, Davide; Berger, Martin; Bernardi, Giovanni; Bisping, Benjamin; Bodeveix, Jean-Paul; Bollig, Benedikt; Bonchi, Filippo; Bonelli, Eduardo; Boulmé, Sylvain; Bourke, Timothy; Bradfield, Julian; Breuvart, Flavien; Bruni, Roberto; Bruse, Florian; Capriotti, Paolo; Carette, Jacques; Carette, Titouan; Carton, Olivier; Cassano, Valentin; Chadha, Rohit; Charguéraud, Arthur; Cho, Kenta; Choudhury, Vikraman; Ciancia, Vincenzo; Clemente, Lorenzo; Colacito, Almudena; Corradini, Andrea; Czerwiński, Wojciech; de Haan, Ronald; de Visme, Marc; Dell'Erba, Daniele; Deng, Yuxin; Eickmeyer, Kord; Exibard, Leo; Faggian, Claudia; Fijalkow, Nathanaël; Filali-Amine, Mamoun; Francalanza, Adrian; Frutos Escrig, David; Galletta, Letterio; Ganian, Robert; Garrigue, Jacques; Gastin, Paul; Genaim, Samir; Genest, Blaise; Ghica, Dan; Goncharov, Sergey; Gorla, Daniele; Guerrini, Stefano; Hirschowitz, Tom; Hofman, Piotr; Hoshino, Naohiko; Howar, Falk; Inverso, Omar; Iván, Szabolcs; Jaax, Stefan; Jeandel, Emmanuel; Johnson, Michael; Kahrs, Stefan; Kamburjan, Eduard; Katsumata, Shin-Ya; Kerjean, Marie; Kiefer, Stefan; Komorida, Yuichi; Kop, Cynthia; Kremer, Steve; Kuperberg, Denis; Křetínský, Jan; Laarman, Alfons; Laurent, Fribourg; Levy, Paul Blain; Li, Yong; Licata, Daniel R.; Liquori, Luigi; Lluch Lafuente, Alberto; Lopez, Aliaume; Malherbe, Octavio; Manuel, Amaldev; Manzonetto, Giulio; Matache, Christina; Matthes, Ralph; Mayr, Richard; Melliès, Paul-André; Merz, Stephan; Miculan, Marino; Mikulski, Łukasz; Moser, Georg; Moss, Larry; Munch-Maccagnoni, Guillaume; Muskalla, Sebastian; Nantes-Sobrinho, Daniele; Nestra, Härmel; Neumann, Eike; Neves, Renato; Niehren, Joachim; Padovani, Luca; Pagani, Michele; Paquet, Hugo; Patterson, Daniel; Pedersen, Mathias Ruggaard; Peressotti, Marco; Pitts, Andrew; Potapov, Igor; Power, John; Praveen, M.; Puppis, Gabriele; Péchoux, Romain; Pérez, Guillermo; Quatmann, Tim; Rabinovich, Roman; Radanne, Gabriel; Rand, Robert; Ravara, António; Remy, Didier; Reutter, Juan L.; Rossman, Benjamin; Rot, Jurriaan; Rowe, Reuben; Ruemmer, Philipp; Sammartino, Matteo; Sankaran, Abhisekh; Sankur, Ocan; Sattler, Christian; Schmitz, Sylvain; Serre, Olivier; Shirmohammadi, Mahsa; Siles, Vincent; Simon, Bertrand; Simpson, Alex; Singh, Neeraj; Sprunger, David; Srivathsan, B.; Staton, Sam; Stolze, Claude; Straßburger, Lutz; Streicher, Thomas; Tan, Tony; Tawbi, Nadia; Toruńczyk, Szymon; Tzevelekos, Nikos; Urbat, Henning; van Bakel, Steffen; van Breugel, Franck; van de Pol, Jaco; van Doorn, Floris; Van Raamsdonk, Femke; Vaux Auclair, Lionel; Verma, Rakesh M.; Vial, Pierre; Vignudelli, Valeria; Vrgoc, Domagoj; Waga, Masaki; Wang, Meng; Witkowski, Piotr; Zamdzhiev, Vladimir; Zemmari, Akka; Zhang, Zhenya; Zorzi, Margherita.

## Contents

- Neural Flocking: MPC-Based Supervised Learning of Flocking Controllers (Usama Mehmood, Shouvik Roy, Radu Grosu, Scott A. Smolka, Scott D. Stoller, and Ashish Tiwari)
- On Well-Founded and Recursive Coalgebras (Jiří Adámek, Stefan Milius, and Lawrence S. Moss)
- Timed Negotiations (S. Akshay, Blaise Genest, Loïc Hélouët, and Sharvik Mital)
- Cartesian Difference Categories (Mario Alvarez-Picallo and Jean-Simon Pacaud Lemay)
- Contextual Equivalence for Signal Flow Graphs (Filippo Bonchi, Robin Piedeleu, Paweł Sobociński, and Fabio Zanasi)
- Parameterized Synthesis for Fragments of First-Order Logic Over Data Words (Béatrice Bérard, Benedikt Bollig, Mathieu Lehaut, and Nathalie Sznajder)
- Controlling a Random Population (Thomas Colcombet, Nathanaël Fijalkow, and Pierre Ohlmann)
- Decomposing Probabilistic Lambda-Calculi (Ugo Dal Lago, Giulio Guerrieri, and Willem Heijltjes)
- On the k-synchronizability of Systems (Cinzia Di Giusto, Laetitia Laversa, and Etienne Lozes)
- General Supervised Learning as Change Propagation with Delta Lenses (Zinovy Diskin)
- Non-idempotent Intersection Types in Logical Form (Thomas Ehrhard)
- On Computability of Data Word Functions Defined by Transducers (Léo Exibard, Emmanuel Filiot, and Pierre-Alain Reynier)
- Minimal Coverability Tree Construction Made Complete and Efficient (Alain Finkel, Serge Haddad, and Igor Khmelnitsky)
- Constructing Infinitary Quotient-Inductive Types (Marcelo P. Fiore, Andrew M. Pitts, and S. C. Steenkamp)
- Relative Full Completeness for Bicategorical Cartesian Closed Structure (Marcelo Fiore and Philip Saville)
- A Duality Theoretic View on Limits of Finite Structures (Mai Gehrke, Tomáš Jakl, and Luca Reggio)
- Correctness of Automatic Differentiation via Diffeologies and Categorical Gluing (Mathieu Huot, Sam Staton, and Matthijs Vákár)
- Deep Induction: Induction Rules for (Truly) Nested Types (Patricia Johann and Andrew Polonsky)
- Exponential Automatic Amortized Resource Analysis (David M. Kahn and Jan Hoffmann)
- Concurrent Kleene Algebra with Observations: From Hypotheses to Completeness (Tobias Kappé, Paul Brunet, Alexandra Silva, Jana Wagemaker, and Fabio Zanasi)
- Graded Algebraic Theories (Satoshi Kura)
- A Curry-style Semantics of Interaction: From Untyped to Second-Order Lazy λμ-Calculus (James Laird)
- An Axiomatic Approach to Reversible Computation (Ivan Lanese, Iain Phillips, and Irek Ulidowski)
- An Auxiliary Logic on Trees: on the Tower-Hardness of Logics Featuring Reachability and Submodel Reasoning (Alessio Mansutti)
- The Inconsistent Labelling Problem of Stutter-Preserving Partial-Order Reduction (Thomas Neele, Antti Valmari, and Tim A. C. Willemse)
- Semantical Analysis of Contextual Types (Brigitte Pientka and Ulrich Schöpp)
- Ambiguity, Weakness, and Regularity in Probabilistic Büchi Automata (Christof Löding and Anton Pirogov)
- Local Local Reasoning: A BI-Hyperdoctrine for Full Ground Store (Miriam Polzer and Sergey Goncharov)
- Quantum Programming with Inductive Datatypes: Causality and Affine Type Theory (Romain Péchoux, Simon Perdrix, Mathys Rennela, and Vladimir Zamdzhiev)
- Spinal Atomic Lambda-Calculus (David Sherratt, Willem Heijltjes, Tom Gundersen, and Michel Parigot)
- Learning Weighted Automata over Principal Ideal Domains (Gerco van Heerdt, Clemens Kupke, Jurriaan Rot, and Alexandra Silva)
- The Polynomial Complexity of Vector Addition Systems with States (Florian Zuleger)
- Author Index

# Neural Flocking: MPC-based Supervised Learning of Flocking Controllers

Usama Mehmood¹, Shouvik Roy¹, Radu Grosu², Scott A. Smolka¹, Scott D. Stoller¹, and Ashish Tiwari³

¹ Stony Brook University, Stony Brook, NY, USA (umehmood@cs.stonybrook.edu)
² Technische Universität Wien, Wien, Austria
³ Microsoft Research, San Francisco, CA, USA

**Abstract.** We show how a symmetric and fully distributed flocking controller can be synthesized using Deep Learning from a centralized flocking controller. Our approach is based on Supervised Learning, with the centralized controller providing the training data, in the form of trajectories of state-action pairs. We use Model Predictive Control (MPC) for the centralized controller, an approach that we have successfully demonstrated on flocking problems. MPC-based flocking controllers are high-performing but also computationally expensive. By learning a symmetric and distributed neural flocking controller from a centralized MPC-based one, we achieve the best of both worlds: the neural controllers have high performance (on par with the MPC controllers) and high efficiency. Our experimental results demonstrate the sophisticated nature of the distributed controllers we learn. In particular, the neural controllers are capable of achieving myriad flocking-oriented control objectives, including flocking formation, collision avoidance, obstacle avoidance, predator avoidance, and target seeking. Moreover, they generalize the behavior seen in the training data to achieve these objectives in a significantly broader range of scenarios. In terms of verification of our neural flocking controller, we use a form of statistical model checking to compute confidence intervals for its convergence rate and time to convergence.

**Keywords:** Flocking · Model Predictive Control · Distributed Neural Controller · Deep Neural Network · Supervised Learning

## 1 Introduction

With the introduction of Reynolds rule-based model [16, 17], it is now possible to understand the flocking problem as one of distributed control.
Specifically, in this model, at each time-step, each agent executes a control law given in terms of the weighted sum of three competing forces to determine its next acceleration. Each of these forces has its own rule: separation (keep a safe distance away from your neighbors), cohesion (move towards the centroid of your neighbors), and alignment (steer toward the average heading of your neighbors). The Reynolds controller is distributed; i.e., it is executed separately by each agent, using information about only itself and nearby agents, and without communication. Furthermore, it is symmetric; i.e., every agent runs the same controller (same code).

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 1–16, 2020. https://doi.org/10.1007/978-3-030-45231-5_1

Fig. 1: Neural Flocking Architecture

We subsequently showed that a simpler, more declarative approach to the flocking problem is possible [11]. In this setting, flocking is achieved when the agents combine to minimize a system-wide cost function. We presented centralized and distributed solutions for achieving this form of "declarative flocking" (DF), both of which were formulated in terms of Model-Predictive Control (MPC) [2]. Another advantage of DF over the rule-based approach exemplified by the Reynolds model is that it allows one to consider additional control objectives (e.g., obstacle and predator avoidance) simply by extending the cost function with additional terms for these objectives. Moreover, these additional terms are typically quite straightforward in nature. In contrast, deriving behavioral rules that achieve the new control objectives can be a much more challenging task.

An issue with MPC is that computing the next control action can be computationally expensive, as MPC searches for an action sequence that minimizes the cost function over a given prediction horizon.
This renders MPC unsuitable for real-time applications with short control periods, for which flocking is a prime example. Another potential problem with MPC-based approaches to flocking is performance (in terms of achieving the desired flight formation), which may suffer in a fully distributed setting.

In this paper, we present Neural Flocking (NF), a new approach to the flocking problem that uses Supervised Learning to learn a symmetric and fully distributed flocking controller from a centralized MPC-based controller. By doing so, we achieve the best of both worlds: high performance (on par with the MPC controllers) in terms of meeting flocking flight-formation objectives, and high efficiency leading to real-time flight controllers. Moreover, our NF controllers can easily be parallelized on hardware accelerators such as GPUs and TPUs.

Figure 1 gives an overview of the NF approach. A high-performing centralized MPC controller provides the labeled training data to the learning agent: a symmetric and distributed neural controller in the form of a deep neural network (DNN). The training data consists of trajectories of state-action pairs, where a state contains the information known to an agent at a time step (e.g., its own position and velocity, and the position and velocity of its neighbors), and the action (the label) is the acceleration assigned to that agent at that time step by the centralized MPC controller.

We formulate and evaluate NF in a number of essential flocking scenarios: basic flocking with inter-agent collision avoidance, as in [11], and more advanced scenarios with additional objectives, including obstacle avoidance, predator avoidance, and target seeking by the flock. We conduct an extensive performance evaluation of NF. Our experimental results demonstrate the sophisticated nature of NF controllers. In particular, they are capable of achieving all of the stated control objectives.
Moreover, they generalize the behavior seen in the training data in order to achieve these objectives in a significantly broader range of scenarios. In terms of verification of our neural controller, we use a form of statistical model checking [5, 10] to compute confidence intervals for its rate of convergence to a flock and for its time to convergence.

## 2 Background

We consider a set of $n$ dynamic agents $\mathcal{A} = \{1, \ldots, n\}$ that move according to the following discrete-time equations of motion:

$$
\begin{aligned}
p_i(k+1) &= p_i(k) + dt \cdot v_i(k), & |v_i(k)| &< \bar{v} \\
v_i(k+1) &= v_i(k) + dt \cdot a_i(k), & |a_i(k)| &< \bar{a}
\end{aligned}
\tag{1}
$$

where $p_i(k) \in \mathbb{R}^2$, $v_i(k) \in \mathbb{R}^2$, $a_i(k) \in \mathbb{R}^2$ are the position, velocity, and acceleration of agent $i \in \mathcal{A}$, respectively, at time step $k$, and $dt \in \mathbb{R}$ is the time step. The magnitudes of velocities and accelerations are bounded by $\bar{v}$ and $\bar{a}$, respectively. Acceleration $a_i(k)$ is the control input for agent $i$ at time step $k$. The acceleration is updated after every $\eta$ time steps, i.e., $\eta \cdot dt$ is the control period. The flock configuration at time step $k$ is thus given by the following vectors (in boldface):

$$\mathbf{p}(k) = [\,p_1(k)^T \cdots p_n(k)^T\,]^T \tag{2}$$
$$\mathbf{v}(k) = [\,v_1(k)^T \cdots v_n(k)^T\,]^T \tag{3}$$
$$\mathbf{a}(k) = [\,a_1(k)^T \cdots a_n(k)^T\,]^T \tag{4}$$

The configuration vectors are referred to without the time indexing as $\mathbf{p}$, $\mathbf{v}$, and $\mathbf{a}$. The neighborhood of agent $i$ at time step $k$, denoted by $N_i(k) \subseteq \mathcal{A}$, contains its $N$-nearest neighbors, i.e., the $N$ other agents closest to it. We use this definition (in Section 2.2, to define a distributed-flocking cost function) for simplicity, and expect that a radius-based definition of neighborhood would lead to similar results for our distributed flocking controllers.

### 2.1 Model-Predictive Control

Model-Predictive Control (MPC) [2] is a well-known control technique that has recently been applied to the flocking problem [11, 19, 20].
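The discrete-time dynamics of Eq. (1) can be sketched in a few lines. This is our own illustration, not the authors' code; the helper name `clamp_norm` and the parameter names `v_bar`, `a_bar` are assumptions, and the strict bounds are approximated by norm-clamping:

```python
import numpy as np

def clamp_norm(x, bound):
    """Scale the rows of x so that no row's Euclidean norm exceeds bound."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    scale = np.minimum(1.0, bound / np.maximum(norms, 1e-12))
    return x * scale

def step(p, v, a, dt=0.1, v_bar=2.0, a_bar=1.0):
    """One discrete-time update of Eq. (1) for all n agents at once.

    p, v, a: (n, 2) arrays of positions, velocities, accelerations.
    """
    a = clamp_norm(a, a_bar)                 # enforce |a_i(k)| bound
    p_next = p + dt * v                      # p_i(k+1) = p_i(k) + dt * v_i(k)
    v_next = clamp_norm(v + dt * a, v_bar)   # v_i(k+1) = v_i(k) + dt * a_i(k)
    return p_next, v_next
```

In a simulation loop, `step` would be called once per time step, with a new acceleration supplied by the controller every η steps.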
At each control step, an optimization problem is solved to find the optimal sequence of control actions (agent accelerations in our case) that minimizes a given cost function with respect to a predictive model of the system. The first control action of the optimal control sequence is then applied to the system; the rest is discarded. In the computation of the cost function, the predictive model is evaluated for a finite prediction horizon of $T$ control steps.

MPC-based flocking models can be categorized as centralized or distributed. A centralized model assumes that complete information about the flock is available to a single "global" controller, which uses the states of all agents to compute their next optimal accelerations. The following optimization problem is solved by a centralized MPC controller at each control step $k$:

$$
\min_{\substack{\mathbf{a}(k|k),\ldots,\mathbf{a}(k+T-1|k) \\ \|\mathbf{a}\| < \bar{a}}} \; J(k) + \lambda \cdot \sum_{t=0}^{T-1} \|\mathbf{a}(k+t \mid k)\|^2 \tag{5}
$$

The first term $J(k)$ is the centralized model-specific cost, evaluated for $T$ control steps (this embodies the predictive aspect of MPC), starting at time step $k$. It encodes the control objective of minimizing the cost function $J(k)$. The second term, scaled by a weight $\lambda > 0$, penalizes large control inputs: $\mathbf{a}(k+t \mid k)$ are the predictions made at time step $k$ for the accelerations at time step $k+t$.

In distributed MPC, each agent computes its acceleration based only on its own state and its local knowledge, e.g., information about its neighbors:

$$
\min_{\substack{a_i(k|k),\ldots,a_i(k+T-1|k) \\ \|a_i\| < \bar{a}}} \; J_i(k) + \lambda \cdot \sum_{t=0}^{T-1} \|a_i(k+t \mid k)\|^2 \tag{6}
$$

$J_i(k)$ is the distributed, model-specific cost function for agent $i$, analogous to $J(k)$. In a distributed setting where an agent's knowledge of its neighbors' behavior is limited, an agent cannot calculate the exact future behavior of its neighbors. Hence, the predictive aspect of $J_i(k)$ must rely on some assumption about that behavior during the prediction horizon.
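To make the receding-horizon loop concrete, here is a minimal sketch of one centralized MPC step. It is our own illustration, not the authors' optimizer: it uses simple random shooting over candidate acceleration sequences, bounds accelerations componentwise for simplicity, and takes the model-specific cost `cost_J` as a parameter:

```python
import numpy as np

def mpc_step(p, v, cost_J, T=5, dt=0.1, a_bar=1.0, lam=0.1, n_samples=200, rng=None):
    """One centralized MPC control step: sample candidate acceleration
    sequences, roll out the dynamics of Eq. (1) for T steps, keep the
    sequence with the lowest total cost, and return only its first
    action (receding horizon: the rest is discarded)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = p.shape[0]
    best_cost, best_first = np.inf, None
    for _ in range(n_samples):
        # Candidate sequence of accelerations, bounded componentwise by a_bar.
        seq = rng.uniform(-a_bar, a_bar, size=(T, n, 2))
        pp, vv, total = p.copy(), v.copy(), 0.0
        for t in range(T):
            pp = pp + dt * vv          # p(k+1) = p(k) + dt * v(k)
            vv = vv + dt * seq[t]      # v(k+1) = v(k) + dt * a(k)
            total += cost_J(pp) + lam * np.sum(seq[t] ** 2)
        if total < best_cost:
            best_cost, best_first = total, seq[0]
    return best_first  # applied to the system; re-optimize next control step
```

A real implementation would use a gradient-based solver rather than shooting, but the structure (optimize over the horizon, apply only the first action) is the same.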
Our distributed cost functions are based on the assumption that the neighbors have zero accelerations during the prediction horizon. While this simple design is clearly not completely accurate, our experiments show that it still achieves good results.

### 2.2 Declarative Flocking

Declarative flocking (DF) is a high-level approach to designing flocking algorithms based on defining a suitable cost function for MPC [11]. This is in contrast to the operational approach, where a set of rules is used to capture flocking behavior, as in the Reynolds model. For basic flocking, the DF cost function contains two terms: (1) a cohesion term based on the squared distance between each pair of agents in the flock; and (2) a separation term based on the inverse of the squared distance between each pair of agents. The flock evolves toward a configuration in which these two opposing forces are balanced. The cost function $J^C$ for centralized DF, i.e., centralized MPC (CMPC), is as follows:

$$
J^C(\mathbf{p}) = \frac{2}{|\mathcal{A}| \cdot (|\mathcal{A}|-1)} \cdot \sum_{i \in \mathcal{A}} \sum_{j \in \mathcal{A},\, i<j} \left( \|p_{ij}\|^2 + \omega_s \cdot \frac{1}{\|p_{ij}\|^2} \right) \tag{7}
$$

where $\omega_s$ is the weight of the separation term and controls the density of the flock. The cost function is normalized by the number of pairs of agents, $|\mathcal{A}| \cdot (|\mathcal{A}|-1)/2$; as such, the cost does not depend on the size of the flock. The control law for CMPC is given by Eq. (5), with $J(k) = \sum_{t=1}^{T} J^C(\mathbf{p}(k+t \mid k))$.

The basic flocking cost function for distributed DF is similar to that for CMPC, except that the cost function $J^D_i$ for agent $i$ is computed over its set of neighbors $N_i(k)$ at time $k$:

$$
J^D_i(\mathbf{p}(k)) = \frac{1}{|N_i(k)|} \cdot \sum_{j \in N_i(k)} \|p_{ij}\|^2 + \omega_s \cdot \sum_{j \in N_i(k)} \frac{1}{\|p_{ij}\|^2} \tag{8}
$$

The control law for agent $i$ is given by Eq. (6), with $J_i(k) = \sum_{t=1}^{T} J^D_i(\mathbf{p}(k+t \mid k))$.

## 3 Additional Control Objectives

The cost functions for basic flocking given in Eqs. (7) and (8) are designed to ensure that in the steady state, the agents are well-separated.
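The basic centralized cost of Eq. (7) can be transcribed directly into code. This is our own sketch (the parameter `w_sep` stands for the weight ω_s; the O(n²) double loop mirrors the pairwise sum):

```python
import numpy as np

def cost_centralized(p, w_sep=2000.0):
    """Centralized declarative-flocking cost J^C of Eq. (7):
    pairwise squared distances (cohesion) plus weighted inverse
    squared distances (separation), normalized by the number of pairs."""
    n = p.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d2 = np.sum((p[i] - p[j]) ** 2)
            total += d2 + w_sep / d2
    return 2.0 * total / (n * (n - 1))
```

As the text notes, the separation weight trades off against cohesion: increasing `w_sep` pushes the minimizing configuration toward a sparser flock.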
Additional goals such as obstacle avoidance, predator avoidance, and target seeking are added to the MPC formulation as weighted cost-function terms. Different objectives can be combined by including the corresponding terms in the cost function as a weighted sum.

**Cost-Function Term for Obstacle Avoidance.** We consider multiple rectangular obstacles which are distributed randomly in the field. For a set of $m$ rectangular obstacles $O = \{O_1, O_2, \ldots, O_m\}$, we define the cost-function term for obstacle avoidance as:

$$
J_{OA}(\mathbf{p}, \mathbf{o}) = \frac{1}{|\mathcal{A}||O|} \sum_{i \in \mathcal{A}} \sum_{j \in O} \frac{1}{\|p_i - o_j^{(i)}\|} \tag{9}
$$

where $\mathbf{o}$ is the set of points on the obstacle boundaries and $o_j^{(i)}$ is the point on the boundary of the $j$-th obstacle $O_j$ that is closest to the $i$-th agent.

**Cost-Function Term for Target Seeking.** This term is the average of the squared distance between the agents and the target. Let $g$ denote the position of the fixed target. Then the target-seeking term is defined as:

$$
J_{TS}(\mathbf{p}) = \frac{1}{|\mathcal{A}|} \sum_{i \in \mathcal{A}} \|p_i - g\|^2 \tag{10}
$$

**Cost-Function Term for Predator Avoidance.** We introduce a single predator, which is more agile than the flocking agents: its maximum speed and acceleration are a factor of $f_p$ greater than $\bar{v}$ and $\bar{a}$, respectively, with $f_p > 1$. Apart from being more agile, the predator has the same dynamics as the agents, given by Eq. (1). The control law for the predator consists of a single term that causes it to move toward the centroid of the flock with maximum acceleration. For a flock of $n$ agents and one predator, the cost-function term for predator avoidance is the average of the inverse of the cube of the distances between the predator and the agents. It is given by:

$$
J_{PA}(\mathbf{p}, p_{pred}) = \frac{1}{|\mathcal{A}|} \sum_{i \in \mathcal{A}} \frac{1}{\|p_i - p_{pred}\|^3} \tag{11}
$$

where $p_{pred}$ is the position of the predator. In contrast to the separation term in Eqs. (7)-(8), which we designed to ensure inter-agent collision avoidance, the predator-avoidance term has a cube instead of a square in the denominator.
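The target-seeking and predator-avoidance terms of Eqs. (10) and (11) are one-liners in code; a sketch of ours (function names are our own):

```python
import numpy as np

def cost_target_seeking(p, g):
    """Eq. (10): mean squared distance from each agent to the fixed target g."""
    return np.mean(np.sum((p - g) ** 2, axis=1))

def cost_predator_avoidance(p, p_pred):
    """Eq. (11): mean inverse-cubed distance to the predator. The cube
    (rather than the square used for separation) makes a distant
    predator's influence on the flock fall off faster."""
    d = np.linalg.norm(p - p_pred, axis=1)
    return np.mean(1.0 / d ** 3)
```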
This is to reduce the influence of the predator on the flock when the predator is far away from the flock.

NF Cost-Function Terms. The MPC cost functions used in our examination of Neural Flocking are weighted sums of the cost-function terms introduced above. We refer to the first term of our centralized DF cost function J^C(p) (see Eq. (7)) as J_cohes(p) and the second as J_sep(p). We use the following cost functions J_1, J_2, and J_3 for basic flocking with collision avoidance, obstacle avoidance with target seeking, and predator avoidance, respectively.

J_1(p) = J_cohes(p) + ω_s · J_sep(p)    (12a)
J_2(p, o) = J_cohes(p) + ω_s · J_sep(p) + ω_o · J_OA(p, o) + ω_t · J_TS(p)    (12b)
J_3(p, p_pred) = J_cohes(p) + ω_s · J_sep(p) + ω_p · J_PA(p, p_pred)    (12c)

where ω_s is the weight of the separation term, ω_o is the weight of the obstacle-avoidance term, ω_t is the weight of the target-seeking term, and ω_p is the weight of the predator-avoidance term. Note that J_1 is equivalent to J^C (Eq. (7)). The weight ω_s of the separation term is experimentally chosen to ensure that the distance between agents, throughout the simulation, is at least d_min, the minimum inter-agent distance representing collision avoidance. Similar considerations were given to the choice of values for ω_o and ω_p. The specific values we used for the weights are: ω_s = 2000, ω_o = 1500, ω_t = 10, and ω_p = 500.

We experimented with an alternative strategy for introducing inter-agent collision avoidance, obstacle avoidance, and predator avoidance into the MPC problem, namely, as constraints of the form d_min − ‖p_ij‖ < 0, d_min^obs − ‖p_i − o^(i)‖ < 0, and d_min^pred − ‖p_i − p_pred‖ < 0, respectively. Using the theory of exact penalty functions [12], we recast the constrained MPC problem as an equivalent unconstrained MPC problem by converting the constraints into a weighted penalty term, which is then added to the MPC cost function.
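The exact-penalty reformulation described above can be sketched as follows for the inter-agent constraint. The hinge form max(0, ·) is the standard l1 exact penalty [12]; the function name and the penalty weight mu are ours, and the non-smoothness of the hinge at the constraint boundary is precisely what made the resulting problem hard for the optimizer.

```python
import numpy as np

def penalized_cost(base_cost, p, d_min, mu):
    """Unconstrained MPC cost: the base cost plus a weighted penalty for
    every violated pairwise constraint d_min - ||p_ij|| < 0. The hinge
    max(0, .) is zero for satisfied constraints but non-smooth at the
    boundary, which hampers gradient-based solvers."""
    n = len(p)
    penalty = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dist = np.linalg.norm(p[i] - p[j])
            penalty += max(0.0, d_min - dist)   # > 0 only when violated
    return base_cost + mu * penalty
```

The same construction applies to the agent-obstacle and agent-predator constraints, using d_min^obs and d_min^pred in place of d_min.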
This approach rendered the optimization problem difficult to solve due to the non-smoothness of the penalty term. As a result, constraint violations in the form of collisions were observed during simulation.

4 Neural Flocking

We learn a distributed neural controller (DNC) for the flocking problem using training data in the form of trajectories of state-action pairs produced by a CMPC controller. In addition to basic flocking with inter-agent collision avoidance, the DNC exhibits a number of other flocking-related behaviors, including obstacle avoidance, target seeking, and predator avoidance. We also show how the learned behavior exhibited by the DNC generalizes over a larger number of agents than was used during training to achieve successful collision-free flocking in significantly larger flocks.

We use Supervised Learning to train the DNC. Supervised Learning learns a function that maps an input to an output based on example sequences of input-output pairs. In our case, the trajectory data obtained from CMPC contains both the training inputs and the corresponding labels (outputs): the state of an agent in the flock (and that of its nearest neighbors) at a particular time step is the input, and that agent's acceleration at the same time step is the label.

4.1 Training Distributed Flocking Controllers

We use Deep Learning to synthesize a distributed and symmetric neural controller from the training data provided by the CMPC controller. Our objective is to learn basic flocking, obstacle avoidance with target seeking, and predator avoidance. Their respective CMPC-based cost functions are given in Sections 2.2 and 3. All of these control objectives implicitly also include inter-agent collision avoidance by virtue of the separation term in Eq. (7).
For each of these control objectives, DNC training data is obtained from CMPC trajectory data generated for n = 15 agents, starting from initial configurations in which agent positions and velocities are uniformly sampled from [−15, 15]² and [0, 1]², respectively. All training trajectories are 1,000 time steps in duration. We further ensure that the initial configurations are recoverable; i.e., no two agents are so close to each other that they cannot avoid a collision by resorting to maximal accelerations. We learn a single DNC from the state-action pairs of all n agents. This yields a symmetric distributed controller, which we use for each agent in the flock during evaluation.

Basic Flocking. Trajectory data for basic flocking is generated using the cost function given in Eq. (7). We generate 200 trajectories, each of which (as noted above) is 1,000 time steps long. The input to the NN is the position and velocity of each agent along with the positions and velocities of its N nearest neighbors. This yields 200 · 1,000 · 15 = 3M total training samples. Let us refer to the agent (the DNC) being learned as A_0. Since we use neighborhood size N = 14, the input to the NN is of the form [p_0^x p_0^y v_0^x v_0^y p_1^x p_1^y v_1^x v_1^y ... p_14^x p_14^y v_14^x v_14^y], where p_0^x, p_0^y are the position coordinates and v_0^x, v_0^y the velocity coordinates of agent A_0, and p_{1...14}^x, p_{1...14}^y and v_{1...14}^x, v_{1...14}^y are the position and velocity vectors of its neighbors. Since this input vector has 60 components, the input to the NN consists of 60 features.

Fig. 2: Snapshots of DNC flocking behaviors for 30 agents: (a) basic flocking, (b) obstacle avoidance, (c) predator avoidance, (d) target seeking.

Obstacle Avoidance with Target Seeking. For obstacle avoidance with target seeking, we use CMPC with the cost function given in Eq. (12b). The target is located beyond the obstacles, forcing the agents to move through the obstacle field.
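The 60-feature basic-flocking input described above can be assembled as follows. This is a sketch under our own assumptions: the function name is ours, and ordering the neighbors by distance (nearest first) is our choice, since the paper does not specify a neighbor ordering.

```python
import numpy as np

def bf_input_features(p, v, i):
    """Build the 60-feature basic-flocking NN input for agent i: its own
    position/velocity followed by those of its N = 14 nearest neighbors
    (here ordered by distance, an assumption on our part)."""
    d = np.linalg.norm(p - p[i], axis=1)
    order = np.argsort(d)                  # agent i itself comes first (d = 0)
    feats = [np.concatenate([p[j], v[j]]) for j in order[:15]]
    return np.concatenate(feats)           # 15 agents x 4 values = 60 features

rng = np.random.default_rng(0)
p = rng.uniform(-15, 15, size=(15, 2))     # positions sampled as in training
v = rng.uniform(0, 1, size=(15, 2))
x = bf_input_features(p, v, 0)
print(x.shape)
```

One such vector, paired with agent 0's CMPC acceleration at the same time step, is a single training sample.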
For the training data, we generate 100 trajectories over 4 different obstacle fields (25 trajectories per obstacle field). The input to the NN consists of the 92 features [p_0^x p_0^y v_0^x v_0^y o_0^x o_0^y ... p_14^x p_14^y v_14^x v_14^y o_14^x o_14^y g^x g^y], where o_0^x, o_0^y is the closest point on any obstacle to agent A_0; o_{1...14}^x, o_{1...14}^y give the closest points on any obstacle for the 14 neighboring agents; and g^x, g^y is the target location.

Predator Avoidance. The CMPC cost function for predator avoidance is given in Eq. (12c). The position, velocity, and acceleration of the predator are denoted by p_pred, v_pred, a_pred, respectively. We take f_p = 1.40; hence v̄_pred = 1.40 v̄ and ā_pred = 1.40 ā. The input features to the NN are the positions and velocities of agent A_0 and its N nearest neighbors, and the position and velocity of the predator. The input with 64 features thus has the form [p_0^x p_0^y v_0^x v_0^y ... p_14^x p_14^y v_14^x v_14^y p_pred^x p_pred^y v_pred^x v_pred^y].

5 Experimental Evaluation

This section contains the results of our extensive performance analysis of the distributed neural flocking controller (DNC), taking into account various control objectives: basic flocking with collision avoidance, obstacle avoidance with target seeking, and predator avoidance. As illustrated in Fig. 1, this involves running CMPC to generate the training data for the DNCs, whose performance we then compare to that of the DMPC and CMPC controllers. We also show that the DNC flocking controllers generalize the behavior seen in the training data to achieve successful collision-free flocking in flocks significantly larger than those used during training. Finally, we use Statistical Model Checking to obtain confidence intervals for the DNC's correctness/performance.

5.1 Preliminaries

The CMPC and DMPC control problems defined in Section 2.1 are solved using the MATLAB fmincon optimizer.
In the training phase, the size of the flock is n = 15. For obstacle avoidance with target seeking, we use 5 obstacles with the target located at [60, 50]. The simulation time is 100, dt = 0.1 time units, and η = 3, where (recall) η · dt is the control period. Further, the agent velocity and acceleration bounds are v̄ = 2.0 and ā = 1.5.

We use d_min = 1.5 as the minimum inter-agent distance for collision avoidance, d_min^obs = 1 as the minimum agent-obstacle distance for obstacle avoidance, and d_min^pred = 1.5 as the minimum agent-predator distance for predator avoidance. For initial configurations, recall that agent positions and velocities are uniformly sampled from [−15, 15]² and [0, 1]², respectively, and we ensure that they are recoverable; i.e., no two agents are so close to each other that they cannot avoid a collision when resorting to maximal accelerations. The predator starts at rest from a fixed location at a distance of 40 from the flock center.

For training, we considered 15 agents and 200 trajectories, each trajectory 1,000 time steps in length. This yielded a total of 3,000,000 training samples. Our neural controller is a fully connected feed-forward Deep Neural Network (DNN) with 5 hidden layers, 84 neurons per hidden layer, and a ReLU activation function. We use an iterative approach for choosing the DNN hyperparameters and architecture, continuously improving our NN until we observe satisfactory performance by the DNC. For training the DNNs, we use Keras [3], a high-level neural network API written in Python and capable of running on top of TensorFlow. To generate the NN model, Keras uses the Adam optimizer [8] with the following settings: lr = 10^−2, β_1 = 0.9, β_2 = 0.999, ε = 10^−8.
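The architecture and optimizer settings described above can be sketched in Keras as follows. The function name and the exact layer ordering are ours; encouragingly, this architecture reproduces the trainable-parameter counts reported for the three input sizes (33,854 for 60 features, 36,542 for 92, and 34,190 for 64).

```python
from tensorflow import keras

def build_dnc(num_features):
    """Fully connected feed-forward DNC: 5 hidden layers of 84 ReLU units,
    and a 2-unit linear output for the agent's 2-D acceleration."""
    layers = [keras.Input(shape=(num_features,))]
    for _ in range(5):
        layers.append(keras.layers.Dense(84, activation="relu"))
    layers.append(keras.layers.Dense(2))       # predicted (a_x, a_y)
    model = keras.Sequential(layers)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-2, beta_1=0.9,
                                        beta_2=0.999, epsilon=1e-8),
        loss="mse")                            # mean-squared-error training loss
    return model

model = build_dnc(60)                          # basic flocking: 60 input features
# model.fit(X_train, y_train, batch_size=2000, epochs=1000)
```

The commented `fit` call uses the batch size and epoch count stated in the text; `X_train` and `y_train` stand for the CMPC-generated samples and labels.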
The batch size (number of samples processed before the model is updated) is 2,000, and the number of epochs (complete passes through the training dataset) used for training is 1,000. For measuring training loss, we use the mean-squared-error metric. For basic flocking, DNN input vectors have 60 features and the number of trainable DNN parameters is 33,854. For flocking with obstacle avoidance and target seeking, input vectors have 92 features and the number of trainable parameters is 36,542. Finally, for flocking with predator avoidance, input vectors have 64 features and the resulting number of trainable DNN parameters is 34,190.

To test the trained DNC, we generated 100 simulations (runs) for each of the desired control objectives: basic flocking with collision avoidance, flocking with obstacle avoidance and target seeking, and flocking with predator avoidance. The results presented in Table 1 were obtained using the same number of agents and obstacles and the same predator as in the training phase. We also ran tests showing that DNC controllers can achieve collision-free flocking with obstacle avoidance when the numbers of agents and obstacles are greater than those used during training.

5.2 Results for Basic Flocking

We use flock diameter, inter-agent collision count, and velocity convergence [20] as performance metrics for flocking behavior. At any time step, the flock diameter D(p) = max_{(i,j)∈A} ‖p_ij‖ is the largest distance between any two agents in the flock. We calculate the average converged diameter by averaging the flock diameter in the final time step of the simulation over the 100 runs. An inter-agent collision (IC) occurs when the distance between two agents at any point in time is less than d_min.

Fig. 3: Performance comparison for basic flocking with collision avoidance, averaged over 100 test runs: (a) flock diameter, (b) velocity convergence.
The IC rate (ICR) is the average number of ICs per test-trajectory time step. The velocity convergence VC(v) = (1/n) · Σ_{i∈A} ‖v_i − (Σ_{j=1}^{n} v_j)/n‖² is the average of the squared magnitude of the discrepancy between the velocities of the agents and the flock's average velocity. For all of the metrics, lower values are better, indicating a denser and more coherent flock with fewer collisions. A successful flocking controller should also ensure that the values of D(p) and VC(v) eventually stabilize.

Fig. 3 and Table 1 compare the performance of the DNC on the basic-flocking problem for 15 agents to that of the MPC controllers. Although the DMPC and CMPC outperform the DNC, the difference is marginal. An important advantage of the DNC over DMPC is that it is much faster. Executing a DNC controller requires a modest number of arithmetic operations, whereas executing an MPC controller requires simulation of a model and controller over the prediction horizon. In our experiments, on average, the CMPC takes 1209 msec of CPU time for the entire flock and DMPC takes 58 msec of CPU time per agent, whereas the DNC takes only 1.6 msec.

Table 1: Performance comparison for BF with 15 agents on 100 test runs

        Avg. Conv. Diameter   ICR   Velocity Convergence
DNC     14.13                 0     0.15
DMPC    13.67                 0     0.11
CMPC    13.84                 0     0.10

Table 2: DNC Performance Generalization for BF

Agents   Avg. Conv. Diameter   Conv. Rate (%)   Avg. Conv. Time   ICR
15       14.13                 100              52.15             0
20       16.45                 97               58.76             0
25       19.81                 94               64.11             0
30       23.24                 92               72.08             0
35       30.57                 86               83.84             0.008
40       38.66                 81               95.32             0.019

5.3 Results for Obstacle and Predator Avoidance

For obstacle and predator avoidance, collision rates are used as a performance metric. An obstacle-agent collision (OC) occurs when the distance between an agent and the closest point on any obstacle is less than d_min^obs. A predator-agent collision (PC) occurs when the distance between an agent and the predator is less than d_min^pred.
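The flock-diameter and velocity-convergence metrics defined above for basic flocking can be computed directly; the function names and the vectorized NumPy form are ours.

```python
import numpy as np

def flock_diameter(p):
    """D(p): the largest pairwise inter-agent distance in the flock."""
    diffs = p[:, None, :] - p[None, :, :]          # all pairwise differences
    return np.max(np.linalg.norm(diffs, axis=-1))

def velocity_convergence(v):
    """VC(v): mean squared deviation of agent velocities from the
    flock's average velocity."""
    mean_v = v.mean(axis=0)
    return np.mean(np.sum((v - mean_v) ** 2, axis=1))
```

In the experiments, these are evaluated at each time step of a run; the converged diameter is the value at the final step, and a stabilizing VC indicates velocity alignment.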
The OC rate (OCR) is the average number of OCs per test-trajectory time step, and the PC rate (PCR) is defined similarly. Our test results show that the DNC, along with the DMPC and CMPC, is collision-free (i.e., each of ICR, OCR, and PCR is zero) for 15 agents, with the exception of DMPC for predator avoidance, where PCR = 0.013. We also observed that the flock successfully reaches the target location in all 100 test runs.

5.4 DNC Generalization Results

Tables 2-3 present DNC generalization results for basic flocking (BF), obstacle avoidance (OA), and predator avoidance (PA), with the number of agents ranging from 15 (the flock size during training) to 40. In all of these experiments, we use a neighborhood size of N = 14, the same as during training. Each controller was evaluated with 100 test runs.

The performance metrics in Table 2 are the average converged diameter, convergence rate, average convergence time, and ICR. The convergence rate is the fraction of successful flocks over 100 runs. The collection of agents is said to have converged to a flock (with collision avoidance) if the value of the global cost function is less than the convergence threshold. We use a convergence threshold of J_1(p) ≤ 150, which was chosen based on its proximity to the value achieved by CMPC. We use the cost function from Eq. (12a) to calculate the success rate because we are measuring the convergence rate for basic flocking. The average convergence time is the time when the global cost function first drops below the success threshold and remains below it for the rest of the run, averaged over all 100 runs. Even with a local neighborhood of size 14, the results demonstrate that the DNC can successfully generalize to a large number of agents for all of our control objectives.
Table 3: DNC Generalization Performance for OA and PA

                 OA              PA
Agents   ICR     OCR     ICR     PCR
15       0       0       0       0
20       0       0       0       0
25       0       0       0       0
30       0       0       0       0
35       0.011   0.009   0.013   0.010
40       0.021   0.018   0.029   0.023

5.5 Statistical Model Checking Results

We use Monte Carlo (MC) approximation as a form of Statistical Model Checking [5, 10] to compute confidence intervals for the DNC's convergence rate to a flock with collision avoidance and for the (normalized) convergence time. The convergence rate is the fraction of successful flocks over N runs. The collection of agents is said to have converged to a successful flock with collision avoidance if the global cost function satisfies J_1(p) ≤ 150, where J_1(p) is the cost function for basic flocking defined in Eq. (12a).

The main idea of MC is to use N random variables, Z_1, ..., Z_N, also called samples, IID distributed according to a random variable Z with mean μ_Z, and to take the sum μ̃_Z = (Z_1 + ... + Z_N)/N as the value approximating the mean μ_Z. Since an exact computation of μ_Z is almost always intractable, an MC approach is used to compute an (ε, δ)-approximation of this quantity.

Additive Approximation [6] is an (ε, δ)-approximation scheme where the mean μ_Z of an RV Z is approximated with absolute error ε and probability 1 − δ:

Pr[μ_Z − ε ≤ μ̃_Z ≤ μ_Z + ε] ≥ 1 − δ    (13)

where μ̃_Z is an approximation of μ_Z. An important issue is to determine the number of samples N needed to ensure that μ̃_Z is an (ε, δ)-approximation of μ_Z. If Z is a Bernoulli variable whose mean is expected to be large, one can use the Chernoff-Hoeffding instantiation of the Bernstein inequality and take N to be N = 4 ln(2/δ)/ε², as in [6]. This results in the additive approximation algorithm [5], defined in Algorithm 1. We use this algorithm to obtain a joint (ε, δ)-approximation of the mean convergence rate and mean normalized convergence time for the DNC.
Each sample Z_i is based on the result of an execution obtained by simulating the system starting from a random initial state, and we take Z_i = (B, R), where B is a Boolean variable indicating whether the agents converged to a flock during the execution, and R is a real value denoting the normalized convergence time. The normalized convergence time is the time when the global cost function first drops below the convergence threshold and remains below it for the rest of the run, measured as a fraction of the total duration of the run.

Algorithm 1: Additive Approximation Algorithm
  Input: (ε, δ) with 0 < ε < 1 and 0 < δ < 1
  Input: Random variables Z_i, IID
  Output: μ̃_Z, approximation of μ_Z
  N = 4 ln(2/δ)/ε²;
  S = 0;
  for (i = 1; i ≤ N; i++) do S = S + Z_i;
  μ̃_Z = S/N;
  return μ̃_Z;

The assumptions about Z required for the validity of the additive approximation hold, because RV B is a Bernoulli variable, the convergence rate is expected to be large (i.e., closer to 1 than to 0), and the proportionality constraint of the Bernstein inequality is also satisfied for RV R.

In these experiments, the initial configurations are sampled from the same distributions as in Section 5.1, and we set ε = 0.01 and δ = 0.0001, which yields N = 396,140. We perform the required set of N simulations for 15, 20, 25, 30, 35, and 40 agents. Table 4 presents the results, specifically, the (ε, δ)-approximations μ̃_CR and μ̃_CT of the mean convergence rate and the mean normalized convergence time, respectively.

Table 4: SMC results for DNC convergence rate and normalized convergence time; ε = 0.01, δ = 0.0001

Agents   μ̃_CR   μ̃_CT
15       0.99    0.53
20       0.97    0.58
25       0.94    0.65
30       0.91    0.71
35       0.86    0.84
40       0.80    0.95

While the results for the convergence rate are (as expected) numerically similar to the results in Table 2, the results in Table 4 are much stronger, because they come with the guarantee that they are (ε, δ)-approximations of the actual mean values.
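Algorithm 1 together with the Chernoff-Hoeffding sample bound can be sketched as follows (the function names and the use of math.ceil are ours). With the paper's ε = 0.01 and δ = 0.0001, the bound reproduces N = 396,140.

```python
import math
import random

def sample_size(eps, delta):
    """Chernoff-Hoeffding bound: N = 4 ln(2/delta) / eps^2 samples suffice
    for an (eps, delta)-additive approximation of the mean."""
    return math.ceil(4 * math.log(2 / delta) / eps ** 2)

def additive_approximation(sample, eps, delta):
    """Algorithm 1: draw N IID samples and return their empirical mean."""
    n = sample_size(eps, delta)
    return sum(sample() for _ in range(n)) / n

print(sample_size(0.01, 0.0001))   # the paper's setting

# Usage sketch: estimate the success probability of a Bernoulli(0.9) trial,
# standing in for "the run converged to a flock".
random.seed(1)
est = additive_approximation(lambda: random.random() < 0.9, 0.05, 0.01)
```

In the paper's setting, `sample()` would run one full simulation from a random initial configuration and return the pair (B, R); here a Bernoulli draw stands in for B.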
6 Related Work

In [18], a flocking controller is synthesized using multi-agent reinforcement learning (MARL) and natural evolution strategies (NES). The target model from which the system learns is Reynolds' flocking model [16]. For training purposes, a list of metrics called entropy is chosen, which provides a measure of the collective behavior displayed by the target model. As the authors of [18] observe, this technique does not quite work: although it consistently leads to agents forming recognizable patterns during simulation, the agents self-organize into a cluster instead of flowing like a flock.

In [9], reinforcement learning and flocking control are combined for the purpose of predator avoidance, where the learning module determines safe spaces in which the flock can navigate to avoid predators. Their approach to predator avoidance, however, is not distributed, as it requires a majority consensus by the flock to determine its action to avoid predators. They also impose an α-lattice structure [13] on the flock. In contrast, our approach is geometry-agnostic and achieves predator avoidance in a distributed manner.

In [7], an uncertainty-aware reinforcement learning algorithm is developed to estimate the probability of a mobile robot colliding with an obstacle in an unknown environment. Their approach is based on bootstrap neural networks using dropouts, allowing it to process raw sensory inputs. Similarly, a learning-based approach to robot navigation and obstacle avoidance is presented in [14]. They train a model that maps sensor inputs and the target position to motion commands generated by the ROS [15] navigation package. Our work, in contrast, considers obstacle avoidance (and other control objectives) in a multi-agent flocking scenario under the simplifying assumption of full state observation.
In [4], an approach based on Bayesian inference is proposed that allows an agent in a heterogeneous multi-agent environment to estimate the navigation model and goal of each of its neighbors. It then uses this information to compute a plan that minimizes inter-agent collisions while allowing the agent to reach its goal. Flocking formation is not considered.

7 Conclusions

With the introduction of Neural Flocking (NF), we have shown how machine learning in the form of Supervised Learning can bring many benefits to the flocking problem. As our experimental evaluation confirms, the symmetric and fully distributed neural controllers we derive in this manner are capable of achieving a multitude of flocking-oriented objectives, including flocking formation, inter-agent collision avoidance, obstacle avoidance, predator avoidance, and target seeking. Moreover, NF controllers exhibit real-time performance and generalize the behavior seen in the training data to achieve these objectives in a significantly broader range of scenarios.

Ongoing work aims to determine whether a DNC can perform as well as the centralized MPC controller for agent models that are significantly more realistic than our current point-based model. For this purpose, we are using transfer learning to train a DNC that can achieve acceptable performance on realistic quadrotor dynamics [1], starting from our current point-model-based DNC. This effort also involves extending our current DNC from 2-dimensional to 3-dimensional spatial coordinates. If successful, and preliminary results are encouraging, this line of research will demonstrate that DNCs are capable of achieving flocking with complex, realistic dynamics.

For future work, we plan to investigate a distance-based notion of agent neighborhood as opposed to our current nearest-neighbors formulation.
Furthermore, motivated by the quadrotor study of [21], we will seek to combine MPC with reinforcement learning in the framework of guided policy search as an alternative solution technique for the NF problem.

References

1. Bouabdallah, S.: Design and control of quadrotors with application to autonomous flying (2007)
2. Camacho, E.F., Bordons Alba, C.: Model Predictive Control. Springer (2007)
3. Chollet, F., et al.: Keras (2015), https://github.com/keras-team/keras.git
4. Godoy, J., Karamouzas, I., Guy, S.J., Gini, M.: Moving in a crowd: Safe and efficient navigation among heterogeneous agents. In: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. pp. 294-300. IJCAI'16, AAAI Press (2016)
5. Grosu, R., Peled, D., Ramakrishnan, C.R., Smolka, S.A., Stoller, S.D., Yang, J.: Using statistical model checking for measuring systems. In: 6th International Symposium, ISoLA 2014. Corfu, Greece (Oct 2014)
6. Hérault, T., Lassaigne, R., Magniette, F., Peyronnet, S.: Approximate probabilistic model checking. In: Steffen, B., Levi, G. (eds.) Verification, Model Checking, and Abstract Interpretation. pp. 73-84. Springer Berlin Heidelberg, Berlin, Heidelberg (2004)
7. Kahn, G., Villaflor, A., Pong, V., Abbeel, P., Levine, S.: Uncertainty-aware reinforcement learning for collision avoidance. arXiv preprint arXiv:1702.01182. pp. 1-12 (2017)
8. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings (2015)
9. La, H.M., Lim, R., Sheng, W.: Multirobot cooperative learning for predator avoidance. IEEE Transactions on Control Systems Technology 23(1), 52-63 (2015)
10. Larsen, K.G., Legay, A.: Statistical model checking: Past, present, and future. In: 6th International Symposium, ISoLA 2014. Corfu, Greece (Oct 2014)
11.
Mehmood, U., Paoletti, N., Phan, D., Grosu, R., Lin, S., Stoller, S.D., Tiwari, A., Yang, J., Smolka, S.A.: Declarative vs rule-based control for flocking dynamics. In: Proceedings of SAC 2018, 33rd Annual ACM Symposium on Applied Computing. pp. 816-823 (2018)
12. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, New York, NY, USA, second edn. (2006)
13. Olfati-Saber, R.: Flocking for multi-agent dynamic systems: Algorithms and theory. IEEE Transactions on Automatic Control 51(3), 401-420 (2006)
14. Pfeiffer, M., Schaeuble, M., Nieto, J.I., Siegwart, R., Cadena, C.: From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots. In: 2017 IEEE International Conference on Robotics and Automation, ICRA 2017, Singapore, May 29 - June 3, 2017. pp. 1527-1533 (2017)
15. Quigley, M., Conley, K., Gerkey, B.P., Faust, J., Foote, T., Leibs, J., Wheeler, R., Ng, A.Y.: ROS: an open-source robot operating system. In: ICRA Workshop on Open Source Software (2009)
16. Reynolds, C.W.: Flocks, herds and schools: A distributed behavioral model. SIGGRAPH Comput. Graph. 21(4) (Aug 1987)
17. Reynolds, C.W.: Steering behaviors for autonomous characters. In: Proceedings of Game Developers Conference 1999. pp. 763-782 (1999)
18. Shimada, K., Bentley, P.: Learning how to flock: Deriving individual behaviour from collective behaviour with multi-agent reinforcement learning and natural evolution strategies. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion. pp. 169-170. ACM (2018)
19. Zhan, J., Li, X.: Flocking of multi-agent systems via model predictive control based on position-only measurements. IEEE Transactions on Industrial Informatics 9(1), 377-385 (2013)
20. Zhang, H.T., Cheng, Z., Chen, G., Li, C.: Model predictive flocking control for second-order multi-agent systems with input constraints.
IEEE Transactions on Circuits and Systems I: Regular Papers 62(6), 1599-1606 (2015)
21. Zhang, T., Kahn, G., Levine, S., Abbeel, P.: Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search. In: 2016 IEEE International Conference on Robotics and Automation, ICRA 2016, Stockholm, Sweden, May 16-21, 2016. pp. 528-535 (2016)

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

On Well-Founded and Recursive Coalgebras

Jiří Adámek (Czech Technical University, Prague, Czech Republic; j.adamek@tu-braunschweig.de), Stefan Milius (Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany; mail@stefan-milius.eu), and Lawrence S. Moss (Indiana University, Bloomington, IN, USA; lmoss@indiana.edu)

Abstract. This paper studies fundamental questions concerning category-theoretic models of induction and recursion. We are concerned with the relationship between well-founded and recursive coalgebras for an endofunctor. For monomorphism-preserving endofunctors on complete and well-powered categories, every coalgebra has a well-founded part, and we provide a new, shorter proof that this is the coreflection in the category of all well-founded coalgebras.
We present a new, more general proof of Taylor's General Recursion Theorem that every well-founded coalgebra is recursive, and we study conditions which imply the converse. In addition, we present a new equivalent characterization of well-foundedness: a coalgebra is well-founded iff it admits a coalgebra-to-algebra morphism to the initial algebra.

Keywords: Well-founded · Recursive · Coalgebra · Initial Algebra · General Recursion Theorem

1 Introduction

What is induction? What is recursion? In areas of theoretical computer science, the most common answers are related to initial algebras. Indeed, the dominant trend in abstract data types is initial algebra semantics (see e.g. [19]), and this approach has spread to other semantically inclined areas of the subject. The approach in broad slogans is that, for an endofunctor F describing the type of algebraic operations of interest, the initial algebra μF has the property that for every F-algebra A, there is a unique homomorphism μF → A, and this is recursion.

Perhaps the primary example is recursion on N, the natural numbers. Recall that N is the initial algebra for the set functor FX = X + 1. If A is any set, and a ∈ A and α : A → A are given, then initiality tells us that there is a unique f : N → A such that for all n ∈ N,

f(0) = a,    f(n + 1) = α(f(n)).    (1.1)

A full version of this paper including full proof details is available on arXiv [5]. Supported by the Grant Agency of the Czech Republic under grant 19-00902S. Supported by Deutsche Forschungsgemeinschaft (DFG) under project MI 717/5-2. Supported by grant #586136 from the Simons Foundation.

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 17-36, 2020. https://doi.org/10.1007/978-3-030-45231-5_2

Then the first additional problem coming with this approach is that of how to "recognize" initial algebras: Given an algebra, how do we really know if it is initial?
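The unique map out of the initial algebra N of FX = X + 1 in Eq. (1.1) can be sketched directly; the function name and the examples are ours.

```python
def fold_nat(a, alpha, n):
    """The unique homomorphism f from the initial algebra N of FX = X + 1
    into the algebra determined by a in A and alpha : A -> A:
    f(0) = a and f(n + 1) = alpha(f(n))."""
    result = a
    for _ in range(n):
        result = alpha(result)
    return result

# Addition and doubling as instances of initiality on N
print(fold_nat(3, lambda x: x + 1, 4))   # 3 + 4
print(fold_nat(1, lambda x: 2 * x, 5))   # 2^5
```

Every function defined by the scheme (1.1) arises this way, which is exactly the sense in which initiality "is" recursion on N.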
The answer – again in slogans – is that initial algebras are the ones with "no junk and no confusion." Although initiality captures some important aspects of recursion, it cannot be a fully satisfactory approach. One big missing piece concerns recursive definitions based on well-founded relations. For example, the whole study of termination of rewriting systems depends on well-orders, the primary example of recursion on a well-founded order. Let (X, R) be a well-founded relation, i.e., one with no infinite sequences ··· x_2 R x_1 R x_0. Let A be any set, and let α : P A → A. (Here and below, P is the power-set functor, taking a set to the set of its subsets.) Then there is a unique f : X → A such that for all x ∈ X,

f(x) = α({f(y) : y R x}).    (1.2)

The main goal of this paper is the study of concepts that allow one to extend the algebraic spirit behind initiality in (1.1) to the setting of recursion arising from well-foundedness as we find it in (1.2). The corresponding concepts are those of well-founded and recursive coalgebras for an endofunctor, which first appear in work by Osius [22] and Taylor [23, 24], respectively.

In his work on categorical set theory, Osius [22] first studied the notions of well-founded and recursive coalgebras (for the power-set functor on sets and, more generally, the power-object functor on an elementary topos). He defined recursive coalgebras as those coalgebras α : A → P A which have a unique coalgebra-to-algebra homomorphism into every algebra (see Definition 3.2).

Taylor [23, 24] took Osius' ideas much further. He introduced well-founded coalgebras for a general endofunctor, capturing the notion of a well-founded relation categorically, and considered recursive coalgebras under the name 'coalgebras obeying the recursion scheme'. He then proved the General Recursion Theorem that all well-founded coalgebras are recursive, for every endofunctor on sets (and on more general categories) preserving inverse images.
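The recursion scheme (1.2) can likewise be sketched directly. This is a minimal illustration under our own encoding: the relation R is given as a function mapping each x to the set of its R-predecessors, and termination relies on R being well-founded.

```python
def wf_recursion(R, alpha, x):
    """The unique f with f(x) = alpha({f(y) : y R x}) for a well-founded
    relation R, where R(x) yields the R-predecessors of x and
    alpha maps finite sets to values (alpha : P A -> A)."""
    return alpha(frozenset(wf_recursion(R, alpha, y) for y in R(x)))

# Example: the rank of n under the relation "y R x iff y < x" on naturals.
# alpha(S) = 1 + max(S) on nonempty S, and 0 on the empty set.
rank = wf_recursion(lambda x: range(x),
                    lambda s: (1 + max(s)) if s else 0,
                    3)
print(rank)
```

Well-foundedness is what guarantees the recursion bottoms out: every descending chain of predecessor calls reaches an element with no R-predecessors, where alpha is applied to the empty set.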
Recursive coalgebras were also investigated by Eppendahl [12], who called them algebra-initial coalgebras. Capretta, Uustalu, and Vene [10] further studied recursive coalgebras, and they showed how to construct new ones from given ones by using comonads. They also explained nicely how recursive coalgebras allow for the semantic treatment of (functional) divide-and-conquer programs. More recently, Jeannin et al. [15] proved the General Recursion Theorem for polynomial functors on the category of many-sorted sets; they also provide many interesting examples of recursive coalgebras arising in programming. Our contributions in this paper are as follows. We start by recalling some preliminaries in Section 2 and the definition of (parametrically) recursive coalgebras in Section 3 and of well-founded coalgebras in Section 4 (using a formulation based on Jacobs' next time operator [14], which we extend from Kripke polynomial set functors to arbitrary functors). We show that every coalgebra for a monomorphism-preserving functor on a complete and well-powered category has a well-founded part, and provide a new proof that this is the coreflection in the category of well-founded coalgebras (Proposition 4.19), shortening our previous proof [6]. Next we provide a new proof of Taylor's General Recursion Theorem (Theorem 5.1), generalizing this to endofunctors preserving monomorphisms on a complete and well-powered category having smooth monomorphisms (see Definition 2.8). For the category of sets, this implies that "well-founded ⇒ recursive" holds for all endofunctors, strengthening Taylor's result. We then discuss the converse: is every recursive coalgebra well-founded? Here the assumption that F preserves inverse images cannot be lifted, and one needs additional assumptions. In fact, we present two results: one assumes universally smooth monomorphisms and that the functor has a pre-fixed point (see Theorem 5.5).
Under these assumptions we also give a new equivalent characterization of recursiveness and well-foundedness: a coalgebra is recursive iff it has a coalgebra-to-algebra morphism into the initial algebra (which exists under our assumptions), see Corollary 5.6. This characterization was previously established for finitary functors on sets [3]. The second converse result is due to Taylor, using the concept of a subobject classifier (Theorem 5.8). It implies that 'recursive' and 'well-founded' are equivalent concepts for all set functors preserving inverse images. We also prove that a similar result holds for the category of vector spaces over a fixed field (Theorem 5.12). Finally, we show in Section 6 that well-founded coalgebras are closed under coproducts, quotients and, under mild assumptions, under subcoalgebras.

2 Preliminaries

We start by recalling some background material. Except for the definitions of algebra and coalgebra in Subsection 2.1, the subsections below may be read as needed. We assume that readers are familiar with notions of basic category theory; see e.g. [2] for everything which we do not detail. We indicate monomorphisms by writing ↣ and strong epimorphisms by ↠.

2.1 Algebras and Coalgebras. We are concerned throughout this paper with algebras and coalgebras for an endofunctor. This means that we have an underlying category, usually written A; frequently it is the category of sets or of vector spaces over a fixed field, and that a functor F : A → A is given. An F-algebra is a pair (A, α), where α : FA → A. An F-coalgebra is a pair (A, α), where α : A → FA. We usually drop the functor F. Given two algebras (A, α) and (B, β), an algebra homomorphism from the first to the second is h : A → B in A such that h · α = β · Fh. Similarly, a coalgebra homomorphism satisfies β · h = Fh · α. We denote by Coalg F the category of all coalgebras for F.

Example 2.1.
(1) The power-set functor P : Set → Set takes a set X to the set PX of all subsets of it; for a morphism f : X → Y, Pf : PX → PY takes a subset S ⊆ X to its direct image f[S]. Coalgebras α : X → PX may be identified with directed graphs on the set X of vertices, and the coalgebra structure α describes the edges: b ∈ α(a) means that there is an edge a → b in the graph.

(2) Let Σ be a signature, i.e. a set of operation symbols, each with a finite arity. The polynomial functor H_Σ associated to Σ assigns to a set X the set

H_Σ X = ∐_{n ∈ N} Σ_n × X^n,

where Σ_n is the set of operation symbols of arity n. This may be identified with the set of all terms σ(x_1, …, x_n), for σ ∈ Σ_n and x_1, …, x_n ∈ X. Algebras for H_Σ are the usual Σ-algebras.

(3) Deterministic automata over an input alphabet Σ are coalgebras for the functor FX = {0, 1} × X^Σ. Indeed, given a set S of states, a next-state map S × Σ → S may be curried to δ : S → S^Σ. The set of final states yields the acceptance predicate a : S → {0, 1}. So an automaton may be regarded as a coalgebra ⟨a, δ⟩ : S → {0, 1} × S^Σ.

(4) Labelled transition systems are coalgebras for FX = P(Σ × X).

(5) To describe linear weighted automata, i.e. weighted automata over the input alphabet Σ with weights in a field K, as coalgebras, one works with the category Vec of vector spaces over K. A linear weighted automaton is then a coalgebra for FX = K × X^Σ.

2.2 Preservation Properties. Recall that an intersection of two subobjects s_i : S_i ↣ A (i = 1, 2) of a given object A is given by their pullback. Analogously, (general) intersections are given by wide pullbacks. Furthermore, the inverse image of a subobject s : S ↣ B under a morphism f : A → B is the subobject t : T ↣ A obtained by a pullback of s along f. All of the 'usual' set functors preserve intersections and inverse images:

Example 2.2. (1) Every polynomial functor preserves intersections and inverse images.
(2) The power-set functor P preserves intersections and inverse images.

(3) Intersection-preserving set functors are closed under taking coproducts, products and composition. Similarly for inverse images.

(4) Consider next the set functor R defined by RX = {(x, y) ∈ X × X : x ≠ y} + {d} for sets X. For a function f : X → Y put Rf(x, y) = (f(x), f(y)) if f(x) ≠ f(y), and d otherwise. R preserves intersections but not inverse images.

Proposition 2.3 [27]. For every set functor F there exists an essentially unique set functor F̄ which coincides with F on nonempty sets and functions and preserves finite intersections (whence monomorphisms).

Remark 2.4. (1) In fact, Trnková gave a construction of F̄: she defined F̄∅ as the set of all natural transformations C_01 → F, where C_01 is the set functor with C_01 ∅ = ∅ and C_01 X = 1 for all nonempty sets X. For the empty map e : ∅ → X with X ≠ ∅, F̄e maps a natural transformation τ : C_01 → F to the element given by τ_X : 1 → FX.

(2) The above functor F̄ is called the Trnková hull of F. It allows us to achieve preservation of intersections for all finitary set functors. Intuitively, a functor on sets is finitary if its behavior is completely determined by its action on finite sets and functions. For a general functor, this intuition is captured by requiring that the functor preserves filtered colimits [8]. For a set functor F this is equivalent to being finitely bounded, which is the following condition: for each element x ∈ FX there exists a finite subset M ⊆ X such that x ∈ Fi[FM], where i : M ↪ X is the inclusion map [7, Rem. 3.14].

Proposition 2.5 [4, p. 66]. The Trnková hull of a finitary set functor preserves all intersections.

2.3 Factorizations.
Recall that an epimorphism e : A → B is called strong if it satisfies the following diagonal fill-in property: given a monomorphism m : C ↣ D and morphisms f : A → C and g : B → D such that m · f = g · e, then there exists a unique d : B → C such that f = d · e and g = m · d. Every complete and well-powered category has factorizations of morphisms: every morphism f may be written as f = m · e, where e is a strong epimorphism and m is a monomorphism [9, Prop. 4.4.3]. We call the subobject m the image of f. It follows from a result in Kurz' thesis [16, Prop. 1.3.6] that factorizations of morphisms lift to coalgebras:

Proposition 2.6 (Coalg F inherits factorizations from A). Suppose that F preserves monomorphisms. Then the category Coalg F has factorizations of homomorphisms f as f = m · e, where e is carried by a strong epimorphism and m by a monomorphism in A. The diagonal fill-in property holds in Coalg F.

Remark 2.7. By a subcoalgebra of a coalgebra (A, α) we mean a subobject in Coalg F represented by a homomorphism m : (B, β) ↣ (A, α), where m is monic in A. Similarly, by a strong quotient of a coalgebra (A, α) we mean one represented by a homomorphism e : (A, α) ↠ (C, γ) with e strongly epic in A.

2.4 Chains. By a transfinite chain in a category A we understand a functor from the ordered class Ord of all ordinals into A. Moreover, for an ordinal λ, a λ-chain in A is a functor from λ to A. A category has colimits of chains if for every ordinal λ it has a colimit of every λ-chain. This includes the initial object 0 (the case λ = 0).

Definition 2.8. (1) A category A has smooth monomorphisms if for every λ-chain C of monomorphisms a colimit exists, its colimit cocone is formed by monomorphisms, and for every cone of C formed by monomorphisms, the factorizing morphism from colim C is monic. In particular, every morphism from 0 is monic.
(2) A has universally smooth monomorphisms if A also has pullbacks, and for every morphism f : X → colim C, the functor A / colim C → A / X forming pullbacks along f preserves the colimit of C. This implies that the initial object 0 is strict, i.e. every morphism f : X → 0 is an isomorphism. Indeed, consider the empty chain (λ = 0).

Example 2.9. (1) Set has universally smooth monomorphisms.

(2) Vec has smooth monomorphisms, but not universally so, because the initial object is not strict.

(3) Categories in which colimits of chains and pullbacks are formed "set-like" have universally smooth monomorphisms. These include the categories of posets, graphs, topological spaces, presheaf categories, and many varieties, such as monoids, groups, and unary algebras.

(4) Every locally finitely presentable category A with a strict initial object (see Remark 2.12(1)) has smooth monomorphisms. This follows from [8, Prop. 1.62]. Moreover, since pullbacks commute with colimits of chains, it is easy to prove that colimits of chains are universal using the strictness of 0.

(5) The category CPO of complete partial orders does not have smooth monomorphisms. Indeed, consider the ω-chain of linearly ordered sets A_n = {0, …, n} + {⊤} (⊤ a top element) with inclusion maps A_n → A_{n+1}. Its colimit is the linearly ordered set N + {⊤', ⊤} of natural numbers with two added top elements ⊤' < ⊤. For the sub-cpo N + {⊤}, the inclusions of A_n are monic and form a cocone. But the unique factorizing morphism from the colimit is not monic.

Notation 2.10. For every object A we denote by Sub(A) the poset of all subobjects of A (represented by monomorphisms s : S ↣ A), where s ≤ s' if there exists i with s = s' · i. If A has pullbacks we have, for every morphism f : A → B, the inverse image operator, viz. the monotone map f^← : Sub(B) → Sub(A) assigning to a subobject s : S ↣ B the subobject of A obtained by forming the inverse image of s under f, i.e. the pullback of s along f.

Lemma 2.11.
If A is complete and well-powered, then f^← has a left adjoint given by the (direct) image operator f^→ : Sub(A) → Sub(B). It maps a subobject t : T ↣ A to the subobject of B given by the image of f · t; in symbols we have f^→(t) ≤ s iff t ≤ f^←(s).

Remark 2.12. If A is a complete and well-powered category, then Sub(A) is a complete lattice. Now suppose that A has smooth monomorphisms.

(1) In this setting, the unique morphism ⊥_A : 0 → A is a monomorphism and therefore is the bottom element of the poset Sub(A).

(2) Furthermore, a join of a chain in Sub(A) is obtained by forming a colimit, in the obvious way.

(3) If A has universally smooth monomorphisms, then for every morphism f : A → B, the operator f^← : Sub(B) → Sub(A) preserves unions of chains.

Remark 2.13. Recall [1] that every endofunctor F yields the initial-algebra chain, viz. a transfinite chain formed by the objects F^i 0 of A, as follows: F^0 0 = 0, the initial object; F^{i+1} 0 = F(F^i 0); and for a limit ordinal i we take the colimit of the chain (F^j 0)_{j<i}. The connecting morphisms w_{i,j} : F^i 0 → F^j 0 (for i ≤ j) are defined by a similar transfinite recursion.

3 Recursive Coalgebras

Assumption 3.1. We work with a standard set theory (e.g. Zermelo-Fraenkel), assuming the Axiom of Choice. In particular, we use transfinite induction on several occasions. (We are not concerned with constructive foundations in this paper.) Throughout this paper we assume that A is a complete and well-powered category and that F : A → A preserves monomorphisms. For A = Set the condition that F preserves monomorphisms may be dropped. In fact, preservation of non-empty monomorphisms is sufficient in general (for a suitable notion of non-empty monomorphism) [21, Lemma 2.5], and this holds for every set functor.

The following definition of recursive coalgebras was first given by Osius [22]. Taylor [24] speaks of coalgebras obeying the recursion scheme. Capretta et al.
[10] extended the concept to parametrically recursive coalgebras by dualizing completely iterative algebras [20].

Definition 3.2. A coalgebra α : A → FA is called recursive if for every algebra e : FX → X there exists a unique coalgebra-to-algebra morphism e† : A → X, i.e. a unique morphism making the square e† = e · Fe† · α commute. (A, α) is called parametrically recursive if for every morphism e : FX × A → X there is a unique morphism e† : A → X such that e† = e · (Fe† × id_A) · ⟨α, id_A⟩.

Example 3.3. (1) A graph regarded as a coalgebra for P is recursive iff it has no infinite path. This is an immediate consequence of the General Recursion Theorem (see Corollary 5.6 and Example 4.5(2)).

(2) Let ι : F(μF) → μF be an initial algebra. By Lambek's Lemma, ι is an isomorphism. So we have a coalgebra ι^{-1} : μF → F(μF). This coalgebra is (parametrically) recursive. By [20, Thm. 2.8], in dual form, this is precisely the same as the terminal parametrically recursive coalgebra (see also [10, Prop. 7]).

(3) The initial coalgebra 0 → F0 is recursive.

(4) If (C, γ) is recursive, so is (FC, Fγ), see [10, Prop. 6].

(5) Colimits of recursive coalgebras in Coalg F are recursive. This is easy to prove, using that colimits of coalgebras are formed on the level of the underlying category.

(6) It follows from items (3)–(5) that in the initial-algebra chain from Remark 2.13 all coalgebras w_{i,i+1} : F^i 0 → F^{i+1} 0, i ∈ Ord, are recursive.

(7) Every parametrically recursive coalgebra is recursive. (To see this, form for a given e : FX → X the morphism ē = e · π, where π : FX × A → FX is the projection.) In Corollaries 5.6 and 5.9 we will see that the converse often holds. Here is an example where the converse fails [3]. Let R : Set → Set be the functor defined in Example 2.2(4). Also, let C = {0, 1}, and define γ : C → RC by γ(0) = γ(1) = (0, 1). Then (C, γ) is a recursive coalgebra.
Indeed, for every algebra α : RA → A the constant map h : C → A with h(0) = h(1) = α(d) is the unique coalgebra-to-algebra morphism. However, (C, γ) is not parametrically recursive. To see this, consider any morphism e : RX × {0, 1} → X such that RX contains more than one pair (x_0, x_1), x_0 ≠ x_1, with e((x_0, x_1), i) = x_i for i = 0, 1. Then each such pair yields h : C → X with h(i) = x_i making the appropriate square commutative. Thus, (C, γ) is not parametrically recursive.

(8) Capretta et al. [11] showed that recursivity semantically models divide-and-conquer programs, as demonstrated by the example of Quicksort. For every linearly ordered set A (of data elements), Quicksort is usually defined as the recursive function q : A^* → A^* given by

q(ε) = ε  and  q(aw) = q(w_{≤a}) ⋆ (a q(w_{>a})),

where A^* is the set of all lists on A, ε is the empty list, ⋆ is the concatenation of lists, and w_{≤a} denotes the list of those elements of w which are less than or equal to a; analogously for w_{>a}. Now consider the functor FX = 1 + A × X × X on Set, where 1 = {•}, and form the coalgebra s : A^* → 1 + A × A^* × A^* given by

s(ε) = •  and  s(aw) = (a, w_{≤a}, w_{>a})  for a ∈ A and w ∈ A^*.

We shall see that this coalgebra is recursive in Example 5.3. Thus, for the F-algebra m : 1 + A × A^* × A^* → A^* given by m(•) = ε and m(a, w, v) = w ⋆ (av) there exists a unique function q on A^* such that q = m · Fq · s. Notice that the last equation reflects the idea that Quicksort is a divide-and-conquer algorithm. The coalgebra structure s divides a list into two parts w_{≤a} and w_{>a}. Then Fq sorts these two smaller lists, and finally in the combine- (or conquer-) step, the algebra structure m merges the two sorted parts to obtain the desired whole sorted list.

Jeannin et al. [15, Sec. 4] provide a number of recursive functions arising in programming that are determined by recursivity of a coalgebra, e.g. the gcd of integers, the Ackermann function, and the Towers of Hanoi.
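The coalgebraic reading of Quicksort in item (8) can be sketched directly in Python: the coalgebra s splits a list, the algebra m recombines, and q satisfies q = m · Fq · s. This is only an informal rendering of the construction above, with Python lists modelling A^*:

```python
def s(w):
    """Coalgebra A* -> 1 + A x A* x A*: split off the head as pivot."""
    if not w:
        return None                      # the unique element of 1
    a, rest = w[0], w[1:]
    return (a, [x for x in rest if x <= a], [x for x in rest if x > a])

def m(t):
    """Algebra 1 + A x A* x A* -> A*: merge sorted parts around the pivot."""
    if t is None:
        return []
    a, left, right = t
    return left + [a] + right

def q(w):
    """Unique coalgebra-to-algebra morphism q = m . Fq . s (Quicksort)."""
    t = s(w)                             # divide
    if t is None:
        return m(None)
    a, lo, hi = t
    return m((a, q(lo), q(hi)))          # Fq sorts the parts; m conquers
```

The division of labour mirrors the text exactly: s never sorts, m never compares, and recursion happens only through Fq.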
4 The Next Time Operator and Well-Founded Coalgebras

As we have mentioned in the Introduction, the main issue of this paper is the relationship between two concepts pertaining to coalgebras: recursiveness and well-foundedness. The concept of well-foundedness is well-known for directed graphs (G, →): it means that there are no infinite directed paths g_0 → g_1 → ···. For a set X with a relation R, well-foundedness means that there are no backwards sequences ··· x_2 R x_1 R x_0, i.e. the converse of the relation is well-founded as a graph. Taylor [24, Def. 6.2.3] gave a more general category-theoretic formulation of well-foundedness. We observe here that his definition can be presented in a compact way, by using an operator that generalizes the way one thinks of the semantics of the 'next time' operator of temporal logics for non-deterministic (or even probabilistic) automata and transition systems. It is also strongly related to the algebraic semantics of modal logic, where one passes from a graph G to a function on PG. Jacobs [14] defined and studied the 'next time' operator on coalgebras for Kripke polynomial set functors. This can be generalized to arbitrary functors as follows. Recall that Sub(A) denotes the complete lattice of subobjects of A.

Definition 4.1 [4, Def. 8.9]. Every coalgebra α : A → FA induces an endofunction on Sub(A), called the next time operator

◯ : Sub(A) → Sub(A),  ◯(s) = α^←(Fs)  for s ∈ Sub(A).

In more detail: we define ◯s : ◯S ↣ A, together with a morphism α̂(s) : ◯S → FS, by the pullback (4.1) of Fs : FS ↣ FA along α : A → FA. In words, ◯ assigns to each subobject s : S ↣ A the inverse image of Fs under α. Since Fs is a monomorphism, ◯s is a monomorphism, and ◯s is (for every representation s of that subobject of A) uniquely determined.

Example 4.2. (1) Let A be a graph, considered as a coalgebra for P : Set → Set.
If S ⊆ A is a set of vertices, then ◯S is the set of vertices all of whose successors belong to S.

(2) For the set functor FX = P(Σ × X) expressing labelled transition systems, the operator ◯ for a coalgebra α : A → P(Σ × A) is the semantic counterpart of the next time operator of classical linear temporal logic, see e.g. Manna and Pnueli [18]. In fact, for a subset S ↪ A we have that ◯S consists of those states all of whose next states lie in S, in symbols:

◯S = { x ∈ A | (s, y) ∈ α(x) implies y ∈ S, for all s ∈ Σ }.

The next time operator allows a compact definition of well-foundedness as characterized by Taylor [24, Exercise VI.17] (see also [6, Corollary 2.19]):

Definition 4.3. A coalgebra is well-founded if id_A is the only fixed point of its next time operator.

Remark 4.4. (1) Let us call a subcoalgebra m : (B, β) ↣ (A, α) cartesian provided that the square (4.2), formed by m : B ↣ A, β : B → FB, α : A → FA and Fm : FB ↣ FA, is a pullback. Then (A, α) is well-founded iff it has no proper cartesian subcoalgebra. That is, if m : (B, β) ↣ (A, α) is a cartesian subcoalgebra, then m is an isomorphism. Indeed, the fixed points of next time are precisely the cartesian subcoalgebras.

(2) A coalgebra is well-founded iff id_A is the unique pre-fixed point of ◯, i.e. the unique subobject m with ◯m ≤ m. Indeed, since Sub(A) is a complete lattice, the least fixed point of a monotone map is its least pre-fixed point. Taylor's definition [24, Def. 6.3.2] uses that property: he calls a coalgebra well-founded iff ◯ has no proper subobject as a pre-fixed point.

Example 4.5. (1) Consider a graph as a coalgebra α : A → PA for the power-set functor (see Example 2.1). A subcoalgebra is a subset m : B ↪ A such that with every vertex v it contains all neighbors of v. The coalgebra structure β : B → PB is then the domain-codomain restriction of α. To say that B is a cartesian subcoalgebra means that whenever a vertex of A has all its neighbors in B, it also lies in B. It follows that (A, α) is well-founded iff it has no infinite directed path, see [24, Example 6.3.3].
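For finite graphs, the next time operator and the iteration of it from the bottom element (anticipating Construction 4.20) are easy to implement. A minimal Python sketch, with function names ours; a graph coalgebra is a dict mapping each vertex to its set of successors:

```python
def next_time(graph, s):
    """Next time operator of a graph coalgebra alpha : A -> P(A):
    O(S) = set of vertices all of whose successors lie in S."""
    return {x for x in graph if graph[x] <= s}   # <= is subset on sets

def well_founded_part(graph):
    """Least fixed point of O, computed by iteration from the empty set:
    A_0 = empty, A_{i+1} = O(A_i), until stationary (finite graphs only)."""
    part = set()
    while True:
        nxt = next_time(graph, part)
        if nxt == part:
            return part
        part = nxt

# Graph with a well-founded tail a -> b -> e (e a leaf) and a cycle c <-> d.
g = {'a': {'b'}, 'b': {'e'}, 'e': set(), 'c': {'d'}, 'd': {'c'}}
```

In the example, the iteration yields {e}, then {b, e}, then {a, b, e}, where it stops: the cycle between c and d is excluded. The coalgebra is well-founded precisely when this least fixed point is all of the carrier.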
(2) If μF exists, then as a coalgebra it is well-founded. Indeed, in every pullback (4.2), since ι^{-1} (as α) is invertible, so is β. The unique algebra homomorphism from μF to the algebra β^{-1} : FB → B is clearly inverse to m.

(3) If a set functor F fulfils F∅ = ∅, then the only well-founded coalgebra is the empty one. Indeed, this follows from the fact that the empty coalgebra is a fixed point of ◯. For example, a deterministic automaton over the input alphabet Σ, as a coalgebra for FX = {0, 1} × X^Σ, is well-founded iff it is empty.

(4) A non-deterministic automaton may be considered as a coalgebra for the set functor FX = {0, 1} × (PX)^Σ. It is well-founded iff the state transition graph is well-founded (i.e. has no infinite path). This follows from Corollary 4.10 below.

(5) A linear weighted automaton, i.e. a coalgebra for FX = K × X^Σ on Vec, is well-founded iff every path in its state transition graph eventually leads to 0. This means that every path starting in a given state leads to the state 0 after finitely many steps (where it stays).

Notation 4.6. Given a set functor F, we define for every set X the map τ_X : FX → PX assigning to every element x ∈ FX the intersection of all subsets m : M ↪ X such that x lies in the image of Fm:

τ_X(x) = ⋂ { m | m : M ↪ X satisfies x ∈ Fm[FM] }.  (4.3)

Recall that a functor preserves intersections if it preserves (wide) pullbacks of families of monomorphisms. Gumm [13, Thm. 7.3] observed that for a set functor preserving intersections, the maps τ_X : FX → PX in (4.3) form a "subnatural" transformation from F to the power-set functor P. Subnaturality means that (although these maps do not form a natural transformation in general) for every monomorphism i : X → Y we have a commutative square (4.4):

Pi · τ_X = τ_Y · Fi.  (4.4)

Remark 4.7. As shown in [13, Thm. 7.4] and [23, Prop. 7.5], a set functor F preserves intersections iff the squares in (4.4) above are pullbacks. Moreover, loc. cit. and [13, Thm.
8.1] prove that τ : F → P is a natural transformation, provided F preserves inverse images and intersections.

Definition 4.8. Let F be a set functor. For every coalgebra α : A → FA its canonical graph is the following coalgebra for P: the composite τ_A · α : A → FA → PA.

Thanks to the subnaturality of τ one obtains the following results.

Proposition 4.9. For every set functor F preserving intersections, the next time operator of a coalgebra (A, α) coincides with that of its canonical graph.

Corollary 4.10 [24, Rem. 6.3.4]. A coalgebra for a set functor preserving intersections is well-founded iff its canonical graph is well-founded.

Example 4.11. (1) For a (deterministic or non-deterministic) automaton, the canonical graph has an edge from s to t iff there is a transition from s to t for some input letter. Thus, we obtain the characterization of well-foundedness as stated in Example 4.5(3) and (4).

(2) Every polynomial functor H_Σ : Set → Set preserves intersections. Thus, a coalgebra (A, α) is well-founded iff there are no infinite paths in its canonical graph. The canonical graph of A has an edge from a to b iff α(a) is of the form σ(c_1, …, c_n) for some σ ∈ Σ_n and b is one of the c_i.

(3) Thus, for the functor FX = 1 + A × X × X, the coalgebra (A^*, s) of Example 3.3(8) is easily seen to be well-founded via its canonical graph. Indeed, this graph has, for every non-empty list aw, one outgoing edge to the list w_{≤a} and one to w_{>a}. Hence, this is a well-founded graph.

Lemma 4.12. The next time operator is monotone: if m ≤ n, then ◯m ≤ ◯n.

Lemma 4.13. Let α : A → FA be a coalgebra and m : B ↣ A a subobject.

(1) There is a coalgebra structure β : B → FB for which m gives a subcoalgebra of (A, α) iff m ≤ ◯m.

(2) There is a coalgebra structure β : B → FB for which m gives a cartesian subcoalgebra of (A, α) iff m = ◯m.

Lemma 4.14.
For every coalgebra homomorphism f : (B, β) → (A, α) we have

◯_β · f^← ≤ f^← · ◯_α,

where ◯_α and ◯_β denote the next time operators of the coalgebras (A, α) and (B, β), respectively, and ≤ is the pointwise order.

Corollary 4.15. For every coalgebra homomorphism f : (B, β) → (A, α) we have ◯_β · f^← = f^← · ◯_α, provided that either

(1) f is a monomorphism in A and F preserves finite intersections, or

(2) F preserves inverse images.

Definition 4.16 [4]. The well-founded part of a coalgebra is its largest well-founded subcoalgebra.

The well-founded part of a coalgebra always exists and is the coreflection in the category of well-founded coalgebras [6, Prop. 2.27]. We provide a new, shorter proof of this fact. The well-founded part is obtained by the following:

Construction 4.17 [6, Not. 2.22]. Let α : A → FA be a coalgebra. We know that Sub(A) is a complete lattice and that the next time operator ◯ is monotone (see Lemma 4.12). Hence, by the Knaster-Tarski fixed point theorem, ◯ has a least fixed point, which we denote by a^* : A^* ↣ A. By Lemma 4.13(2), we know that there is a coalgebra structure α^* : A^* → FA^* so that a^* : (A^*, α^*) ↣ (A, α) is the smallest cartesian subcoalgebra of (A, α).

Proposition 4.18. For every coalgebra (A, α), the coalgebra (A^*, α^*) is well-founded.

Proof. Let m : (B, β) ↣ (A^*, α^*) be a cartesian subcoalgebra. By Lemma 4.13, a^* · m : B ↣ A is a fixed point of ◯. Since a^* is the least fixed point, we have a^* ≤ a^* · m, i.e. a^* = a^* · m · x for some x : A^* → B. Since a^* is monic, we thus have m · x = id_{A^*}. So m is a monomorphism and a split epimorphism, whence an isomorphism.

Proposition 4.19. The full subcategory of Coalg F given by well-founded coalgebras is coreflective. In fact, the well-founded coreflection of a coalgebra (A, α) is its well-founded part a^* : (A^*, α^*) ↣ (A, α).

Proof.
We are to prove that for every coalgebra homomorphism f : (B, β) → (A, α), where (B, β) is well-founded, there exists a coalgebra homomorphism f' : (B, β) → (A^*, α^*) such that a^* · f' = f. The uniqueness is easy.

For the existence of f', we first observe that f^←(a^*) is a pre-fixed point of ◯_β: indeed, using Lemma 4.14 we have ◯_β(f^←(a^*)) ≤ f^←(◯_α(a^*)) = f^←(a^*). By Remark 4.4(2), we therefore have id_B ≤ f^←(a^*) in Sub(B). Using the adjunction of Lemma 2.11, we have f^→(id_B) ≤ a^* in Sub(A). Now factorize f as B ↠ C ↣ A with strong epi part e and mono part m. We have f^→(id_B) = m, and we then obtain m = f^→(id_B) ≤ a^*, i.e. there exists a morphism h : C → A^* such that a^* · h = m. Thus, f' = h · e : B → A^* is a morphism satisfying a^* · f' = a^* · h · e = m · e = f. It follows that f' is a coalgebra homomorphism from (B, β) to (A^*, α^*) since f and a^* are and F preserves monomorphisms.

Construction 4.20 [6, Not. 2.22]. Let (A, α) be a coalgebra. We obtain a^*, the least fixed point of ◯, as the join of the following transfinite chain of subobjects a_i : A_i ↣ A, i ∈ Ord. First, put a_0 = ⊥_A, the least subobject of A. Given a_i : A_i ↣ A, put a_{i+1} = ◯a_i : A_{i+1} = ◯A_i ↣ A. For every limit ordinal j, put a_j = ⋁_{i<j} a_i. Since Sub(A) is a set, there exists an ordinal i such that a^* = a_i : A_i ↣ A.

Remark 4.21. Note that, whenever monomorphisms are smooth, we have A_0 = 0 and the above join a_j is obtained as the colimit of the chain of subobjects a_i : A_i ↣ A, i < j (see Remark 2.12). If F is a finitary functor on a locally finitely presentable category, then the least ordinal i with a^* = a_i is at most ω, but in general one needs transfinite iteration to reach a fixed point.

Example 4.22. Let (A, α) be a graph regarded as a coalgebra for P (see Example 2.1). Then A_0 = ∅, A_1 is formed by all leaves, i.e. those nodes with no neighbors, A_2 by all leaves and all nodes such that every neighbor is a leaf, etc.
We see that a node x lies in A_{i+1} iff every path starting in x has length at most i. Hence A^* = ⋃_i A_i is the set of all nodes from which no infinite path starts.

We close with a general fact on well-founded parts of fixed points (i.e. (co)algebras whose structure is invertible). The following result generalizes [15, Cor. 3.4], and it also appeared before for functors preserving finite intersections [4, Theorem 8.16 and Remark 8.18]. Here we lift the latter assumption (see [5, Theorem 7.6] for the new proof):

Theorem 4.23. Let A be a complete and well-powered category with smooth monomorphisms. For F preserving monomorphisms, the well-founded part of every fixed point is an initial algebra. In particular, the only well-founded fixed point is the initial algebra.

Example 4.24. We illustrate that for a set functor F preserving monomorphisms, the well-founded part of the terminal coalgebra is the initial algebra. Consider FX = A × X + 1. The terminal coalgebra is the set A^* ∪ A^∞ of finite and infinite sequences from the set A. The initial algebra is A^*. It is easy to check that A^* is the well-founded part of A^* ∪ A^∞.

5 The General Recursion Theorem and its Converse

The main consequence of well-foundedness is parametric recursivity. This is Taylor's General Recursion Theorem [24, Theorem 6.3.13]. Taylor assumed that F preserves inverse images. We present a new proof for which it is sufficient that F preserves monomorphisms, assuming those are smooth.

Theorem 5.1 (General Recursion Theorem). Let A be a complete and well-powered category with smooth monomorphisms. For F : A → A preserving monomorphisms, every well-founded coalgebra is parametrically recursive.

Proof sketch. (1) Let (A, α) be well-founded. We first prove that it is recursive. We use the subobjects a_i : A_i ↣ A of Construction 4.20, the corresponding

One might object to this use of transfinite recursion, since Theorem 5.1 itself could be used as a justification for transfinite recursion.
Let us emphasize that we are not presenting Theorem 5.1 as a foundational contribution. We are building on the classical theory of transfinite recursion.

morphisms α̂(a_i) : A_{i+1} = ◯A_i → FA_i (cf. Definition 4.1), and the recursive coalgebras (F^i 0, w_{i,i+1}) of Example 3.3(6). We obtain a natural transformation h from the chain (A_i) in Construction 4.20 to the initial-algebra chain (F^i 0) (see Remark 2.13) by transfinite recursion. Now for every algebra e : FX → X, we obtain a unique coalgebra-to-algebra morphism f_i : F^i 0 → X, i.e. we have that f_i = e · Ff_i · w_{i,i+1}. Since (A, α) is well-founded, we know that a_i = id_A, and hence α = α̂(a_i), for some i. From this it is not difficult to prove that f_i · h_i is a coalgebra-to-algebra morphism from (A, α) to (X, e).

In order to prove uniqueness, we prove by transfinite induction that for any given coalgebra-to-algebra homomorphism e†, one has e† · a_j = f_j · h_j for every ordinal number j. Then for the above ordinal number i with a_i = id_A, we have e† = f_i · h_i, as desired. This shows that (A, α) is recursive.

(2) We prove that (A, α) is parametrically recursive. Consider the coalgebra ⟨α, id_A⟩ : A → FA × A for F(−) × A. This functor preserves monomorphisms since F does and monomorphisms are closed under products. The next time operator on Sub(A) is the same for both coalgebras, since the square (4.1) is a pullback if and only if the corresponding square for ⟨α, id_A⟩ (with Fm × A in place of Fm) is one. Since id_A is the unique fixed point of ◯ w.r.t. F (see Definition 4.3), it is also the unique fixed point of ◯ w.r.t. F(−) × A. Thus, (A, ⟨α, id_A⟩) is a well-founded coalgebra for F(−) × A. By the previous argument, this coalgebra is thus recursive for F(−) × A; equivalently, (A, α) is parametrically recursive for F.

Theorem 5.2. For every endofunctor on Set or Vec (vector spaces and linear maps), every well-founded coalgebra is parametrically recursive.

Proof sketch.
For Set, we apply Theorem 5.1 to the Trnková hull F̄ (see Proposition 2.3), noting that F and F̄ have the same (non-empty) coalgebras. Moreover, one can show that every well-founded (or recursive) F-coalgebra is a well-founded (resp. recursive) F̄-coalgebra. For Vec, observe that monomorphisms split and are therefore preserved by every endofunctor F.

Example 5.3. We saw in Example 4.11(3) that for FX = 1 + A × X × X the coalgebra (A, s) from Example 3.3(8) is well-founded, and therefore it is (parametrically) recursive.

Example 5.4. Well-founded coalgebras need not be recursive when F does not preserve monomorphisms. We take A to be the category of sets with a predicate, i.e. pairs (X, A), where A ⊆ X. Morphisms f: (X, A) → (Y, B) satisfy f[A] ⊆ B. Denote by 1 the terminal object (1, 1). We define an endofunctor F by F(X, ∅) = (X + 1, ∅), and for A ≠ ∅, F(X, A) = 1. For a morphism f: (X, A) → (Y, B), put Ff = f + id if B = ∅ (note that B = ∅ forces A = ∅); if A = ∅ but B ≠ ∅, Ff is the unique morphism into 1; and if A ≠ ∅, then also B ≠ ∅ and Ff is id: 1 → 1.

The terminal coalgebra is id: 1 → 1, and it is easy to see that it is well-founded. But it is not recursive: there are no coalgebra-to-algebra morphisms into an algebra of the form F(X, ∅) → (X, ∅).

We next prove a converse to Theorem 5.1: "recursive ⟹ well-founded". Related results appear in Taylor [23, 24], Adámek et al. [3] and Jeannin et al. [15]. Recall universally smooth monomorphisms from Definition 2.8(2). A pre-fixed point of F is a monic algebra α: FA ↪ A.

Theorem 5.5. Let A be a complete and well-powered category with universally smooth monomorphisms, and suppose that F: A → A preserves inverse images and has a pre-fixed point. Then every recursive coalgebra is well-founded.

Proof. (1) We first observe that an initial algebra exists. This follows from results by Trnková et al. [25], as we now briefly recall. Recall the initial-algebra chain from Remark 2.13. Let β: FB ↪ B be a pre-fixed point.
Then there is a unique cocone β_i: F^i 0 → B satisfying β_{i+1} = β · Fβ_i. Moreover, each β_i is monomorphic. Since B has only a set of subobjects, there is some λ such that for every i > λ all of the morphisms β_i represent the same subobject of B. Consequently, w_{λ,λ+1} of Remark 2.13 is an isomorphism, due to β_λ = β_{λ+1} · w_{λ,λ+1}. Then μF = F^λ 0 with the structure ι = w_{λ,λ+1}^{−1}: F(μF) → μF is an initial algebra.

(2) Now suppose that (A, α) is a recursive coalgebra. Then there exists a unique coalgebra homomorphism h: (A, α) → (μF, ι^{−1}). Let us abbreviate w_{i,λ} by c_i: F^i 0 ↪ μF, and recall the subobjects a_i: A_i ↪ A from Construction 4.20. We will prove by transfinite induction that a_i is the inverse image of c_i under h; in symbols: a_i = h←(c_i) for all ordinals i. Then it follows that a_λ is an isomorphism, since so is c_λ, whence (A, α) is well-founded.

In the base case i = 0 this is clear since A_0 = W_0 = 0 is a strict initial object. For the isolated step we compute the pullback of c_{i+1}: W_{i+1} → μF along h using the following diagram (where W_{i+1} = FW_i and the right-hand triangle commutes, c_{i+1} = ι · Fc_i):

    A_{i+1} --α(a_i)--> FA_i --Fh_i--> FW_i
      |                   |              |
      a_{i+1}             Fa_i           Fc_i
      v                   v              v
      A -------α------->  FA ---Fh---> F(μF) --ι--> μF

By the induction hypothesis and since F preserves inverse images, the middle square above is a pullback. Since the structure map ι of the initial algebra is an isomorphism, it follows that the middle square pasted with the right-hand triangle is also a pullback. Finally, the left-hand square is a pullback by the definition of a_{i+1}. Thus, the outside of the above diagram is a pullback, as required.

For a limit ordinal j, we know that a_j = ⋁_{i<j} a_i and similarly c_j = ⋁_{i<j} c_i, since W_j = colim_{i<j} W_i and monomorphisms are smooth (see Remark 2.12(2)). Using Remark 2.12(3) and the induction hypothesis we thus obtain h←(c_j) = h←(⋁_{i<j} c_i) = ⋁_{i<j} h←(c_i) = ⋁_{i<j} a_i = a_j.

Corollary 5.6. Let A and F satisfy the assumptions of Theorem 5.5.
Then the following properties of a coalgebra are equivalent:
(1) well-foundedness,
(2) parametric recursiveness,
(3) recursiveness,
(4) existence of a homomorphism into (μF, ι^{−1}),
(5) existence of a homomorphism into a well-founded coalgebra.

Proof sketch. We already know (1) ⇒ (2) ⇒ (3). Since F has an initial algebra (as proved in Theorem 5.5), the implication (3) ⇒ (4) follows from Example 3.3(2). In Theorem 5.5 we also proved (4) ⇒ (1). The implication (4) ⇒ (5) follows from Example 4.5(2). Finally, it follows from [6, Remark 2.40] that (μF, ι^{−1}) is a terminal well-founded coalgebra, whence (5) ⇒ (4).

Example 5.7. (1) The category of many-sorted sets satisfies the assumptions of Theorem 5.5, and polynomial endofunctors on that category preserve inverse images. Thus, we obtain Jeannin et al.'s result [15, Thm. 3.3] that (1)–(4) in Corollary 5.6 are equivalent as a special instance.

(2) The implication (4) ⇒ (3) in Corollary 5.6 does not hold for vector spaces. In fact, for the identity functor on Vec we have μId = (0, id). Hence, every coalgebra has a homomorphism into μId. However, not every coalgebra is recursive; e.g. the coalgebra (K, id) admits many coalgebra-to-algebra morphisms to the algebra (K, id). Similarly, the implication (4) ⇒ (1) does not hold.

We also wish to mention a result due to Taylor [23, Rem. 3.8]. It uses the concept of a subobject classifier, originating in [17] and prominent in topos theory. This is an object Ω with a subobject t: 1 ↪ Ω such that for every subobject b: B ↪ A there is a unique b̂: A → Ω such that b is the inverse image of t under b̂. By definition, every elementary topos has a subobject classifier, in particular every category Set^C with C small. Our standing assumption that A is a complete and well-powered category is not needed for the next result: finite limits are sufficient.

Theorem 5.8 (Taylor [23]). Let F be an endofunctor preserving inverse images on a finitely complete category with a subobject classifier.
Then every recursive coalgebra is well-founded.

Corollary 5.9. For every set functor preserving inverse images, the following properties of a coalgebra are equivalent: well-foundedness ⟺ parametric recursiveness ⟺ recursiveness.

Example 5.10. The hypothesis in Theorems 5.5 and 5.8 that the functor preserves inverse images cannot be lifted. In order to see this, we consider the functor R: Set → Set of Example 2.2(4). It preserves monomorphisms but not inverse images. The coalgebra A = {0, 1} with the structure α constant to (0, 1) is recursive: given an algebra β: RB → B, the unique coalgebra-to-algebra homomorphism h: {0, 1} → B is given by h(0) = h(1) = β(d). But A is not well-founded: ∅ is a cartesian subcoalgebra.

Recall that an initial algebra (μF, ι) is also considered as a coalgebra (μF, ι^{−1}). Taylor [23, Cor. 9.9] showed that, for functors preserving inverse images, the terminal well-founded coalgebra is the initial algebra. Surprisingly, this result is true for all set functors.

Theorem 5.11 [6, Thm. 2.46]. For every set functor, a terminal well-founded coalgebra is precisely an initial algebra.

Theorem 5.12. For every functor on Vec preserving inverse images, the following properties of a coalgebra are equivalent: well-foundedness ⟺ parametric recursiveness ⟺ recursiveness.

6 Closure Properties of Well-founded Coalgebras

In this section we will see that strong quotients and subcoalgebras (see Remark 2.7) of well-founded coalgebras are well-founded again. We mention the following corollary to Proposition 4.19. For endofunctors on sets preserving inverse images this was stated by Taylor [24, Exercise VI.16]:

Proposition 6.1. The subcategory of Coalg F formed by all well-founded coalgebras is closed under strong quotients and coproducts in Coalg F.

This follows from a general result on coreflective subcategories [2, Thm.
16.8]: the category Coalg F has the factorization system of Proposition 2.6, and its full subcategory of well-founded coalgebras is coreflective with monomorphic coreflections (see Proposition 4.19). Consequently, it is closed under strong quotients and colimits.

We prove next that, for an endofunctor preserving finite intersections, well-founded coalgebras are closed under subcoalgebras provided that the complete lattice Sub(A) is a frame. This means that for every subobject m: B ↪ A and every family m_i (i ∈ I) of subobjects of A we have m ∧ ⋁_{i∈I} m_i = ⋁_{i∈I} (m ∧ m_i). Equivalently, m←: Sub(A) → Sub(B) (see Notation 2.10) has a right adjoint m_∗: Sub(B) → Sub(A). This property holds for Set as well as for the categories of posets, graphs, topological spaces, and presheaf categories Set^C, C small. Moreover, it holds for every Grothendieck topos. The categories of complete partial orders and Vec do not satisfy this requirement.

Proposition 6.2. Suppose that F preserves finite intersections, and let (A, α) be a well-founded coalgebra such that Sub(A) is a frame. Then every subcoalgebra of (A, α) is well-founded.

Proof. Let m: (B, β) ↪ (A, α) be a subcoalgebra. We will show that the only pre-fixed point of the next time operator ○_β is id_B (cf. Remark 4.4(2)). Suppose s: S ↪ B fulfils ○_β(s) ≤ s. Since F preserves finite intersections, we have m← · ○_α = ○_β · m← by Corollary 4.15(1). The counit of the above adjunction m← ⊣ m_∗ yields m←(m_∗(s)) ≤ s, so that we obtain m←(○_α(m_∗(s))) = ○_β(m←(m_∗(s))) ≤ ○_β(s) ≤ s. Using again the adjunction m← ⊣ m_∗, we have equivalently that ○_α(m_∗(s)) ≤ m_∗(s); i.e. m_∗(s) is a pre-fixed point of ○_α. Since (A, α) is well-founded, Corollary 4.15(1) implies that m_∗(s) = id_A. Since m← is also a right adjoint and therefore preserves the top element of Sub(A), we thus obtain id_B = m←(id_A) = m←(m_∗(s)) ≤ s.

Remark 6.3.
Given a set functor F preserving inverse images, a much better result was proved by Taylor [24, Corollary 6.3.6]: for every coalgebra homomorphism f: (B, β) → (A, α) with (A, α) well-founded, (B, β) is well-founded as well. In fact, our proof above is essentially Taylor's.

Corollary 6.4. If a set functor preserves finite intersections, then subcoalgebras of well-founded coalgebras are well-founded.

Trnková [26] proved that every set functor preserves all nonempty finite intersections. However, this does not suffice for Corollary 6.4:

Example 6.5. A well-founded coalgebra for a set functor can have non-well-founded subcoalgebras. Let F∅ = 1 and FX = 1 + 1 for all nonempty sets X, and let Ff = inl: 1 → 1 + 1 be the left-hand injection for all maps f: ∅ → X with X nonempty. The coalgebra inr: 1 → F1 is not well-founded because its empty subcoalgebra is cartesian. However, this is a subcoalgebra of id: 1 + 1 → 1 + 1 (via the embedding inr), and the latter is well-founded.

The fact that subcoalgebras of a well-founded coalgebra are well-founded does not necessarily need the assumption that Sub(A) is a frame. Instead, one may assume that the class of monomorphisms is universally smooth:

Theorem 6.6. If A has universally smooth monomorphisms and F preserves finite intersections, every subcoalgebra of a well-founded coalgebra is well-founded.

7 Conclusions

Well-founded coalgebras, introduced by Taylor [24], have a compact definition based on an extension of Jacobs' 'next time' operator. Our main contribution is a new proof of Taylor's General Recursion Theorem that every well-founded coalgebra is recursive, generalizing this result to all endofunctors preserving monomorphisms on a complete and well-powered category with smooth monomorphisms.
For functors preserving inverse images, we have also seen two variants of the converse implication "recursive ⇒ well-founded", under additional hypotheses: one due to Taylor for categories with a subobject classifier, and the second provided that the category has universally smooth monomorphisms and the functor has a pre-fixed point. Various counterexamples demonstrate that all our hypotheses are necessary.

References

1. Adámek, J.: Free algebras and automata realizations in the language of categories. Comment. Math. Univ. Carolin. 15, 589–602 (1974)
2. Adámek, J., Herrlich, H., Strecker, G.E.: Abstract and Concrete Categories: The Joy of Cats. Dover Publications, 3rd edn. (2009)
3. Adámek, J., Lücke, D., Milius, S.: Recursive coalgebras of finitary functors. Theor. Inform. Appl. 41(4), 447–462 (2007)
4. Adámek, J., Milius, S., Moss, L.S.: Fixed points of functors. J. Log. Algebr. Methods Program. 95, 41–81 (2018)
5. Adámek, J., Milius, S., Moss, L.S.: On well-founded and recursive coalgebras (2019), full version, available online at http://arxiv.org/abs/1910.09401
6. Adámek, J., Milius, S., Moss, L.S., Sousa, L.: Well-pointed coalgebras. Log. Methods Comput. Sci. 9(2), 1–51 (2014)
7. Adámek, J., Milius, S., Sousa, L., Wißmann, T.: On finitary functors. Theor. Appl. Categ. 34, 1134–1164 (2019), available online at https://arxiv.org/abs/1902.05788
8. Adámek, J., Rosický, J.: Locally Presentable and Accessible Categories. Cambridge University Press (1994)
9. Borceux, F.: Handbook of Categorical Algebra: Volume 1, Basic Category Theory. Encyclopedia of Mathematics and its Applications, Cambridge University Press (1994)
10. Capretta, V., Uustalu, T., Vene, V.: Recursive coalgebras from comonads. Inform. and Comput. 204, 437–468 (2006)
11. Capretta, V., Uustalu, T., Vene, V.: Corecursive algebras: A study of general structured corecursion. In: Oliveira, M., Woodcock, J. (eds.)
Formal Methods: Foundations and Applications, Lecture Notes in Computer Science, vol. 5902, pp. 84–100. Springer Berlin Heidelberg (2009)
12. Eppendahl, A.: Coalgebra-to-algebra morphisms. In: Proc. Category Theory and Computer Science (CTCS). Electron. Notes Theor. Comput. Sci., vol. 29, pp. 42–49 (1999)
13. Gumm, H.: From T-coalgebras to filter structures and transition systems. In: Fiadeiro, J.L., Harman, N., Roggenbach, M., Rutten, J. (eds.) Algebra and Coalgebra in Computer Science, Lecture Notes in Computer Science, vol. 3629, pp. 194–212. Springer Berlin Heidelberg (2005)
14. Jacobs, B.: The temporal logic of coalgebras via Galois algebras. Math. Structures Comput. Sci. 12(6), 875–903 (2002)
15. Jeannin, J.B., Kozen, D., Silva, A.: Well-founded coalgebras, revisited. Math. Structures Comput. Sci. 27, 1111–1131 (2017)
16. Kurz, A.: Logics for Coalgebras and Applications to Computer Science. Ph.D. thesis, Ludwig-Maximilians-Universität München (2000)
17. Lawvere, W.F.: Quantifiers and sheaves. Actes Congrès Intern. Math. 1, 329–334 (1970)
18. Manna, Z., Pnueli, A.: The Temporal Logic of Reactive and Concurrent Systems: Specification. Springer-Verlag (1992)
19. Meseguer, J., Goguen, J.A.: Initiality, induction, and computability. In: Algebraic Methods in Semantics (Fontainebleau, 1982), pp. 459–541. Cambridge Univ. Press, Cambridge (1985)
20. Milius, S.: Completely iterative algebras and completely iterative monads. Inform. and Comput. 196, 1–41 (2005)
21. Milius, S., Pattinson, D., Wißmann, T.: A new foundation for finitary corecursion and iterative algebras. Inform. and Comput. 217 (2020), available online at https://doi.org/10.1016/j.ic.2019.104456
22. Osius, G.: Categorical set theory: a characterization of the category of sets. J. Pure Appl. Algebra 4, 79–119 (1974)
23. Taylor, P.: Towards a unified treatment of induction I: the general recursion theorem (1995–6), preprint, available at www.paultaylor.eu/ordinals/#towuti
24.
Taylor, P.: Practical Foundations of Mathematics. Cambridge University Press (1999)
25. Trnková, V., Adámek, J., Koubek, V., Reiterman, J.: Free algebras, input processes and free monads. Comment. Math. Univ. Carolin. 16, 339–351 (1975)
26. Trnková, V.: Some properties of set functors. Comment. Math. Univ. Carolin. 10, 323–352 (1969)
27. Trnková, V.: On a descriptive classification of set functors I. Comment. Math. Univ. Carolin. 12, 143–174 (1971)

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Timed Negotiations

S. Akshay¹, Blaise Genest², Loïc Hélouët³, and Sharvik Mital¹

¹ IIT Bombay, Mumbai, India, {akshayss,sharky}@cse.iitb.ac.in
² Univ Rennes, CNRS, IRISA, Rennes, France, blaise.genest@irisa.fr
³ Univ Rennes, Inria, Rennes, France, loic.helouet@inria.fr

Abstract. Negotiations were introduced in [6] as a model for concurrent systems with multiparty decisions. What is very appealing with negotiations is that it is one of the very few non-trivial concurrent models where several interesting problems, such as soundness, i.e. absence of deadlocks, can be solved in PTIME [3].
In this paper, we introduce the model of timed negotiations and consider the problem of computing the minimum and maximum execution times of a negotiation. The latter can be computed using the algorithm of [10] for computing costs in negotiations, but surprisingly the minimum execution time cannot. This paper proposes new algorithms to compute both minimum and maximum execution times, which work for much more general classes of negotiations than [10], which considered only sound and deterministic negotiations. Further, we uncover the precise complexities of these questions, ranging from PTIME to Δ_2^p-complete. In particular, we show that computing the minimum execution time is more complex than computing the maximum execution time in most classes of negotiations we consider.

1 Introduction

Distributed systems are notoriously difficult to analyze, mainly due to the explosion of the number of configurations that have to be considered to answer even simple questions. A challenging task is then to propose models on which analysis can be performed with tractable complexities, preferably within polynomial time. Free-choice Petri nets are a classical model of distributed systems that allow for efficient verification, in particular when the nets are 1-safe [4, 5]. Recently, [6] introduced a new model called negotiations for workflows and business processes. A negotiation describes how processes interact in a distributed system: a subset of processes in a node of the system takes a synchronous decision among several outcomes. The effect of this outcome sends contributing processes to a new set of nodes. The execution of a negotiation ends when processes reach a final configuration. Negotiations can be deterministic (once an outcome is fixed, each process knows its unique successor node) or not. Negotiations are an interesting model since several properties can be decided with a reasonable complexity.
(Footnote: Supported by DST/CEFIPRA/INRIA Associated team EQuaVE and DST/SERB Matrices grant MTR/2018/000744. © The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 37–56, 2020. https://doi.org/10.1007/978-3-030-45231-5_3)

The question of soundness, i.e., deadlock-freedom (whether from every reachable configuration one can reach a final configuration), is PSPACE-complete. However, for deterministic negotiations, it can be decided in PTIME [7]. The decision procedure uses reduction rules. Reduction techniques were originally proposed for Petri nets [2, 8, 11, 16]. The main idea is to define transformation rules that produce a model of smaller size w.r.t. the original model, while preserving the property under analysis. In the context of negotiations, [7, 3] proposed a sound and complete set of soundness-preserving reduction rules and algorithms to apply these rules efficiently. The question of soundness for deterministic negotiations was revisited in [9] and shown to be NLOGSPACE-complete, using anti-patterns instead of reduction rules. Further, [9] shows that the PTIME result holds even when relaxing determinism. Negotiation games have also been considered, to decide whether one particular process can force termination of a negotiation. While this question is EXPTIME-complete in general, for sound and deterministic negotiations it becomes PTIME [12].

While it is natural to consider cost or time in negotiations (think, e.g., of the Brexit negotiation, where time is of the essence, and which we model as a running example in this paper), the original model of negotiations proposed by [6] is only qualitative. Recently, [10] proposed a framework to associate costs with the executions of negotiations, and adapted a static analysis technique based on reduction rules to compute end-to-end cost functions that are not sensitive to the scheduling of concurrent nodes.
For sound and deterministic negotiations, the end-to-end cost can be computed in O(n·(C + n)), where n is the size of the negotiation and C is the time needed to compute the cost of an execution. Requiring soundness or determinism seems perfectly reasonable, but asking for negotiations that are both sound and deterministic is too restrictive: it prevents a process from waiting for decisions of other processes to know how to proceed.

In this paper, we revisit time in negotiations. We attach time intervals to outcomes of nodes. We want to compute maximal and minimal execution times for negotiations that are not necessarily sound and deterministic. Since we are interested in minimal and maximal execution times, cycles in negotiations can either be bypassed or lead to infinite maximal time. Hence, we restrict this study to acyclic negotiations. Notice that time can be modeled as a cost, following [10], and the maximal execution time of a sound and deterministic negotiation can be computed in PTIME using the algorithm from [10]. Surprisingly, however, we give an example (Example 3) for which the minimal execution time cannot be computed in PTIME by this algorithm.

The first contribution of the paper shows that reachability (whether at least one run of a negotiation terminates) is NP-complete, already for (untimed) deterministic acyclic negotiations. This implies that computing the minimal or maximal execution time for deterministic (but unsound) acyclic negotiations cannot be done in PTIME (unless NP = PTIME). We characterize precisely the complexities of different decision variants (threshold, equality, etc.), with complexities ranging from (co-)NP-complete to Δ_2^p.

We thus turn to negotiations that are sound but not necessarily deterministic. Our second contribution is a new algorithm, not based on reduction rules, to compute the maximal execution time in PTIME for sound negotiations.
It is based on computing the maximal execution time of critical paths in the negotiation. However, we show that the minimal execution time cannot be computed in PTIME for sound negotiations (unless NP = PTIME): deciding whether the minimal execution time is lower than T is NP-complete, even for T given in unary, using a reduction from a bin-packing problem. This shows that the minimal execution time is harder to compute than the maximal execution time.

Our third contribution consists in defining a class in which the minimal execution time can be computed in (pseudo) PTIME. To do so, we define the class of k-layered negotiations, for k fixed, that is, negotiations whose nodes can be organized into layers of at most k nodes at the same depth. These negotiations can be executed without remembering more than k nodes at a time. In this case, we show that computing the maximal execution time is in PTIME, even if the negotiation is neither deterministic nor sound. The algorithm, not based on reduction rules, uses the k-layer restriction in order to navigate the negotiation while considering only a polynomial number of configurations. For the minimal execution time, we provide a pseudo-PTIME algorithm, which is PTIME if constants are given in unary. Finally, we show that the size of the constants does matter: deciding whether the minimal execution time of a k-layered negotiation is less than T is NP-complete when T is given in binary. We show this by reducing from a knapsack problem, yet again emphasizing that the minimal execution time of a negotiation is harder to compute than its maximal execution time.

This paper is organized as follows. Section 2 introduces the key ingredients of negotiations, determinism and soundness, recalls known results in the untimed setting, and provides our running example modeling the Brexit negotiation.
Section 3 introduces time in negotiations, gives a semantics to this new model, and formalizes several decision problems on maximal and minimal durations of runs in timed negotiations. We recall the main results of the paper in Section 4. Then, Section 5 considers timed execution problems for deterministic negotiations, Section 6 for sound negotiations, and Section 7 for layered negotiations. Proof details for the last three sections are given in an extended version of this paper [1].

2 Negotiations: Definitions and Brexit example

In this section, we recall the definition of negotiations, of some subclasses (acyclic and deterministic), as well as important problems (soundness and reachability).

Definition 1 (Negotiation [6, 10]). A negotiation over a finite set of processes P is a tuple N = (N, n_0, n_f, X), where:
– N is a finite set of nodes. Each node is a pair n = (P_n, R_n), where P_n ⊆ P is a non-empty set of processes participating in node n, and R_n is a finite set of outcomes of node n (also called results), with R_{n_f} = {r_f}. We denote by R the union of all outcomes of nodes in N.
– n_0 is the first node of the negotiation and n_f is the final node. Every process in P participates in both n_0 and n_f.
– For all n ∈ N, X_n: P_n × R_n → 2^N is a map defining the transition relation from node n, with X_n(p, r) = ∅ iff n = n_f, r = r_f. We denote by X: N × P × R → 2^N the partial map defined on ⋃_{n∈N} ({n} × P_n × R_n), with X(n, p, a) = X_n(p, a) for all p, a.

[Fig. 1. A (sound but non-deterministic) negotiation modeling Brexit, over the processes EU, PM, Pa, with outcomes backstop/no-backstop, court/no-court, c-meet, meet, recess, defend, deal agreed, deal w/backstop, debate, delay, and brexit.]

Intuitively, at a node n = (P_n, R_n) in a negotiation, all processes of P_n have to agree on a common outcome r chosen from R_n. Once this outcome r is chosen, every process p ∈ P_n is ready to move to any node prescribed by X(n, p, r).
A new node m can only start when all processes of P_m are ready to move to m.

Example 1. We illustrate negotiations by considering a simplified model of the Brexit negotiation, see Figure 1. There are 3 processes, P = {EU, PM, Pa}. At first, EU decides whether to enforce a backstop in any deal (outcome backstop) or not (outcome no-backstop). In the meantime, PM decides to prorogue Pa, and Pa can choose whether or not to appeal to court (outcome court/no-court). If it goes to court, then PM and Pa will take some time in court (c-meet, defend), before PM can meet EU to agree on a deal. Otherwise, Pa goes to recess, and PM can meet EU directly. Once EU and PM have agreed on a deal, PM tries to convince Pa to vote for the deal. The final outcome is whether the deal is voted, or whether Brexit is delayed.

Definition 2 (Deterministic negotiations). A process p ∈ P is deterministic iff, for every n ∈ N and every outcome r of n, X(n, p, r) is a singleton. A negotiation is deterministic iff all its processes are deterministic. It is weakly non-deterministic [9] (called weakly deterministic in [3]) iff, for every node n, one of the processes in P_n is deterministic. Last, it is very weakly non-deterministic [9] (called weakly deterministic in [6]) iff, for every n, every p ∈ P_n and every outcome r of n, there exists a deterministic process q such that q ∈ P_{n'} for every n' ∈ X(n, p, r).

In deterministic negotiations, once an outcome is chosen, each process knows the next node it will be involved in. In (very-)weakly non-deterministic negotiations, the next node might depend upon the outcomes chosen in other nodes by other processes. However, once the outcomes have been chosen for all current nodes, there is only one possible next node for each process. Observe that the class of deterministic negotiations is isomorphic to the class of free-choice workflow nets [10]. In Example 1, the Brexit negotiation is non-deterministic, because process PM is non-deterministic.
Indeed, consider outcome c-meet: it allows two nodes, according to whether the backstop is enforced or not, which is a decision taken by process EU.

Semantics: A configuration [3] of a negotiation is a mapping M: P → 2^N. Intuitively, it tells for each process p the set M(p) of nodes p is ready to engage in. The semantics of a negotiation is defined in terms of moves from a configuration to the next one. The initial and final configurations, M_0 and M_f, are given by M_0(p) = {n_0} and M_f(p) = ∅, respectively, for every process p ∈ P. A configuration M enables node n if n ∈ M(p) for every p ∈ P_n. When n is enabled, a decision at node n can occur, and the participants at this node choose an outcome r ∈ R_n. The occurrence of (n, r) produces the configuration M' given by M'(p) = X(n, p, r) for every p ∈ P_n and M'(p) = M(p) for the remaining processes in P \ P_n. Moving from M to M' after choosing (n, r) is called a step, denoted M --(n,r)--> M'. A run of N is a sequence (n_1, r_1), (n_2, r_2) ... (n_k, r_k) such that there is a sequence of configurations M_0, M_1, ..., M_k and every (n_i, r_i) is a step between M_{i−1} and M_i. A run starting from the initial configuration and ending in the final configuration is called a final run. By definition, its last step is (n_f, r_f).

An important class of negotiations in the context of timed negotiations is acyclic negotiations, where an infinite sequence of steps is impossible:

Definition 3 (Acyclic negotiations). The graph of a negotiation N is the labeled graph G_N = (V, E) where V = N and E = {(n, (p, r), n') | n' ∈ X(n, p, r)}, with pairs of the form (p, r) being the labels. A negotiation is acyclic iff its graph is acyclic. We denote by Paths(G_N) the set of paths in the graph of a negotiation. These paths are of the form π = (n_0, (p_0, r_0), n_1) ... (n_{k−1}, (p_{k−1}, r_{k−1}), n_k).

The Brexit negotiation of Fig. 1 is an example of an acyclic negotiation.
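The step semantics above is easy to animate. The following Python sketch is our own illustration, not code from the paper: the tiny two-process negotiation and all names in it are hypothetical, but enabledness and steps are implemented exactly as defined.

```python
def enabled(M, n, parts):
    # configuration M enables node n iff n is in M(p) for every participant p of n
    return all(n in M[p] for p in parts[n])

def step(M, n, r, parts, X):
    # the occurrence of (n, r) sends every participant p to X(n, p, r);
    # the remaining processes keep their sets of nodes
    M2 = dict(M)
    for p in parts[n]:
        M2[p] = X[(n, p, r)]
    return M2

# a minimal two-process negotiation with initial node n0 and final node nf
parts = {'n0': {'p', 'q'}, 'nf': {'p', 'q'}}
X = {('n0', 'p', 'r'): {'nf'}, ('n0', 'q', 'r'): {'nf'},
     ('nf', 'p', 'rf'): set(), ('nf', 'q', 'rf'): set()}
M0 = {'p': {'n0'}, 'q': {'n0'}}            # initial configuration
M1 = step(M0, 'n0', 'r', parts, X)         # both processes become ready for nf
assert enabled(M1, 'nf', parts)
Mf = step(M1, 'nf', 'rf', parts, X)        # final configuration: all sets empty
assert Mf == {'p': set(), 'q': set()}
```

The sequence (n0, r), (nf, rf) is thus a final run of this toy negotiation, with its last step (n_f, r_f) as required by the definition.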
Despite their apparent simplicity, negotiations may express involved behaviors, as shown with the Brexit example. Indeed, two important questions in this setting are whether there is some way to reach a final node in the negotiation from (i) the initial node and (ii) any reachable node in the negotiation.

Definition 4 (Soundness and Reachability).
1. A negotiation is sound iff every run from the initial configuration can be extended to a final run. The problem of soundness is to check if a given negotiation is sound.
2. The problem of reachability asks if a given negotiation has a final run.

Notice that the Brexit negotiation of Fig. 1 is sound (but not deterministic). It seems hard to preserve the important features of this negotiation while being both sound and deterministic. The problem of soundness has received considerable attention. We summarize the results about soundness in the next theorem:

Theorem 1. Determining whether a negotiation is sound is PSPACE-complete. For (very-)weakly non-deterministic negotiations, it is co-NP-complete [9]. For acyclic negotiations, it is in DP and co-NP-hard [6]. Determining whether an acyclic weakly non-deterministic negotiation is sound is in PTIME [3, 9]. Finally, deciding soundness for deterministic negotiations is NLOGSPACE-complete [9].

Checking reachability is NP-complete, even for deterministic acyclic negotiations (surprisingly, we did not find this result stated before in the literature):

Proposition 1. Reachability is NP-complete for acyclic negotiations, even if the negotiation is deterministic.

Proof (sketch). One can guess a run of size ≤ |N| in polynomial time and verify whether it reaches n_f, which gives the inclusion in NP. The hardness part comes from a reduction from 3-CNF-SAT, which can be found in the proof of Theorem 3.

k-Layered Acyclic Negotiations

We introduce a new class of negotiations which has good algorithmic properties, namely k-layered acyclic negotiations, for k fixed.
Roughly speaking, the nodes of a k-layered acyclic negotiation can be arranged in layers, and these layers contain at most k nodes. Before giving a formal definition, we need to define the depth of nodes in N. First, a path in a negotiation is a sequence of nodes n_1 ... n_ℓ such that for all i ∈ {1, ..., ℓ−1}, there exist p_i, r_i with n_{i+1} ∈ X(n_i, p_i, r_i). The length of a path n_1, ..., n_ℓ is ℓ. The depth depth(n) of a node n is the maximal length of a path from n_0 to n (recall that N is acyclic, so this number is always finite).

Definition 5. An acyclic negotiation is layered if, for every node n, every path reaching n has length depth(n). An acyclic negotiation is k-layered if it is layered and, for all ℓ ∈ N, there are at most k nodes at depth ℓ.

The Brexit example of Fig. 1 is 6-layered. Notice that a layered negotiation is necessarily k-layered for some k ≤ |N| − 2. Note also that we can always transform an acyclic negotiation N into a layered acyclic negotiation N' by adding dummy nodes: for every node m ∈ X(n, p, r) with depth(m) > depth(n) + 1, we add nodes n_1, ..., n_ℓ with ℓ = depth(m) − (depth(n) + 1) and processes P_{n_i} = {p}. We compute a new relation X' such that X'(n, p, r) = {n_1}, X'(n_ℓ, p, r) = {m}, and for every i ∈ 1..ℓ−1, X'(n_i, p, r) = {n_{i+1}}. This transformation is polynomial: the resulting negotiation is of size up to |N| × |X| × |P|. The proof of the following theorem can be found in [1].

Theorem 2. Let k ∈ N. Checking reachability or soundness for a k-layered acyclic negotiation N can be done in PTIME.

Timed Negotiations 43

3 Timed Negotiations

In many negotiations, time is an important feature to take into account. For instance, in the Brexit example, with an initial node starting at the beginning of September 2019, there are 9 weeks to pass a deal before the October 31st deadline. We extend negotiations by introducing timing constraints on the outcomes of nodes, inspired by timed Petri nets [14] and by the notion of negotiations with costs [10].
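The depth function and the conditions of Definition 5 can be checked mechanically on the graph of a negotiation. The sketch below is an assumed illustration (a diamond-shaped graph with layers {n0}, {a, b}, {nf}), using our own successor-map encoding rather than the paper's notation.

```python
from collections import deque

# Successors of each node in an (assumed) acyclic negotiation graph:
# n0 branches to a and b, which both rejoin at nf.
SUCC = {"n0": ["a", "b"], "a": ["nf"], "b": ["nf"], "nf": []}

def topo_order(succ):
    # Kahn's algorithm: repeatedly emit nodes with in-degree 0
    indeg = {n: 0 for n in succ}
    for n in succ:
        for m in succ[n]:
            indeg[m] += 1
    queue = deque(n for n in succ if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return order

def layer_profile(succ, root="n0"):
    """For each node, the set of lengths of paths from the root to it
    (the length of a path n_1 ... n_l is l, so the root has length 1)."""
    lengths = {n: set() for n in succ}
    lengths[root].add(1)
    for n in topo_order(succ):
        for m in succ[n]:
            lengths[m] |= {l + 1 for l in lengths[n]}
    return lengths

def is_k_layered(succ, k, root="n0"):
    lengths = layer_profile(succ, root)
    # layered: every path reaching n has the same length depth(n)
    if any(len(ls) != 1 for ls in lengths.values()):
        return False
    # k-layered: at most k nodes share any given depth
    per_depth = {}
    for n, ls in lengths.items():
        d = next(iter(ls))
        per_depth[d] = per_depth.get(d, 0) + 1
    return max(per_depth.values()) <= k

print(is_k_layered(SUCC, 2))   # True: layers are {n0}, {a, b}, {nf}
print(is_k_layered(SUCC, 1))   # False: depth 2 holds two nodes
```

A graph where some node is reachable by paths of two different lengths fails the layeredness test for every k, which is exactly the situation the dummy-node transformation above repairs.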
We use time intervals to specify lower and upper bounds for the duration of negotiations. More precisely, we attach time intervals to pairs (n, r), where n is a node and r an outcome. In the rest of the paper, we denote by I the set of intervals with endpoints that are non-negative integers or ∞. For convenience, we only use closed intervals in this paper (except at ∞), but the results we show can also be extended to open intervals with some notational overhead. Intuitively, outcome r can be taken at a node n with associated time interval [a, b] only after a time units have elapsed from the time all processes contributing to n are ready to engage in n, and at most b time units later.

Definition 6. A timed negotiation is a pair (N, γ) where N is a negotiation and γ : N × R → I associates an interval with each pair (n, r) of node and outcome such that r ∈ R_n. For a given node n and outcome r, we denote by γ^−(n, r) (resp. γ^+(n, r)) the lower bound (resp. the upper bound) of γ(n, r).

Example 2. In the Brexit example, we define the following timed constraints γ. We only specify the outcome names, as the timing only depends upon them. Backstop and no-backstop both take between 1 and 2 weeks: γ(backstop) = γ(no-backstop) = [1, 2]. In case of no-court, recess takes 5 weeks, γ(recess) = [5, 5], and PM can meet EU immediately, γ(meet) = [0, 0]. In case of court action, PM needs to spend 2 weeks in court, γ(c-meet) = [2, 2], and depending on the court delay and decision, Pa needs between 3 (court overrules recess) and 5 (court confirms recess) weeks: γ(defend) = [3, 5]. Agreeing on a deal can take anywhere from 2 weeks to 2 years (104 weeks): γ(deal agreed) = [2, 104] (some would say infinite time is even possible!). It needs more time with the backstop: γ(deal w/backstop) = [5, 104]. All other outcomes are assumed to be immediate, i.e., associated with [0, 0].

Semantics: A timed valuation is a map μ : P → R^{≥0} that associates a non-negative real value with every process.
A timed configuration is a pair (M, μ) where M is a configuration and μ a timed valuation. There is a timed step from (M, μ) to (M', μ'), denoted (M, μ) --(n,r)--> (M', μ'), if (i) M --(n,r)--> M', (ii) p ∉ P_n implies μ'(p) = μ(p), and (iii) ∃d ∈ γ(n, r) such that ∀p ∈ P_n we have μ'(p) = max_{p'∈P_n} μ(p') + d (d is the duration of node n).

Intuitively, a timed step (M, μ) --(n,r)--> (M', μ') depicts a decision taken at node n, and how long each process of P_n waited in that node before taking decision (n, r). The last process engaged in n must wait for a duration contained in γ(n, r). However, the other processes may spend a time greater than γ^+(n, r).

A timed run is a sequence of steps ρ = (M_0, μ_0) → (M_1, μ_1) → ... → (M_k, μ_k) where M_0 is the initial configuration, μ_0(p) = 0 for every p ∈ P, and each (M_i, μ_i) → (M_{i+1}, μ_{i+1}) is a timed step. It is final if M_k = M_f. Its execution time δ(ρ) is defined as δ(ρ) = max_{p∈P} μ_k(p).

Notice that we only attached timing to processes, not to individual steps. With our definition of runs, the timing on steps may not be monotonous (i.e., non-decreasing) along the run, while the timing on processes is. Viewed through the lens of concurrent systems, the timing is monotonous on the partial order of the system rather than on its linearizations. It is not hard to restrict runs, if necessary, to have a monotonous timing on steps as well. In this paper, we are only interested in the execution time, which does not depend on the linearization considered.

Given a timed negotiation N, we can now define the minimum and maximum execution times, which correspond to optimistic and pessimistic views:

Definition 7. Let N be a timed negotiation. Its minimum execution time, denoted mintime(N), is the minimal δ(ρ) over all final timed runs ρ of N. We define the maximal execution time maxtime(N) of N similarly.

Given T ∈ N, the main problems we consider in this paper are the following:
– The mintime problem, i.e., do we have mintime(N) ≤ T?
In other words, does there exist a final timed run ρ with δ(ρ) ≤ T?
– The maxtime problem, i.e., do we have maxtime(N) ≤ T? In other words, does δ(ρ) ≤ T hold for every final timed run ρ?

These questions have a practical interest: in the Brexit example, the question "is there a way to have a vote on a deal within 9 weeks?" is indeed a minimum execution time problem. We also address the equality variant of these decision problems, i.e., mintime(N) = T: is there a final run of N that terminates in exactly T time units while no other final run takes less than T time units? Similarly for maxtime(N) = T.

Example 3. We use Fig. 1 to show that it is not easy to compute the minimal execution time, and in particular that one cannot use the algorithm from [10] to compute it. Consider the node n with P_n = {PM, Pa} and R_n = {court, no court}. If the outcome is court, then PM needs 2 weeks before (s)he can talk to EU, and Pa needs at least 3 weeks before he can debate. However, if the outcome is no court, then PM need not wait before (s)he can talk to EU, but Pa wastes 5 weeks in recess. This means that one needs to remember different alternatives which could be faster in the end, depending on the future. On the other hand, the algorithm from [10] attaches one minimal time to process Pa and one minimal time to process PM. No matter the choices (0 or 2 for PM and 3 or 5 for Pa), there will be futures in which the chosen number over- or underapproximates the real minimal execution time (this choice is not explicit in [10]). The authors of [10] acknowledged the issue with their algorithm for mintime.

For the maximum execution time, it is not an issue to attach a unique maximal execution time to each node. The reason for the asymmetry between the minimal and maximal execution times of a negotiation is that the execution time of a run is max_{p∈P} μ_k(p), for μ_k the last timed valuation, which breaks the symmetry between min and max.
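To make the timed semantics concrete, the sketch below loosely replays the court branch of Example 3 with the timings of Example 2: scheduling every decision at its earliest date gives a witness for mintime, at its latest date for maxtime. The encoding of runs as lists of (participants, interval) pairs is our own illustration, not the paper's notation.

```python
# A run is a sequence of decisions (participants, [lo, hi] interval).
# Each decision synchronises its participants: everyone moves to
# max(valuations of participants) + d, with d taken in [lo, hi].
def execution_time(run, procs, earliest=True):
    mu = {p: 0 for p in procs}          # mu_0(p) = 0 for every process
    for participants, (lo, hi) in run:
        d = lo if earliest else hi      # pick the extreme duration
        t = max(mu[p] for p in participants) + d
        for p in participants:
            mu[p] = t
    return max(mu.values())             # delta(rho) = max_p mu_k(p)

# Court branch of the Brexit example, in weeks:
run = [({"PM", "Pa"}, (0, 0)),          # initial decision, immediate
       ({"PM"}, (2, 2)),                # PM spends 2 weeks in court (c-meet)
       ({"Pa"}, (3, 5)),                # Pa debates after the court delay (defend)
       ({"PM", "Pa"}, (2, 104))]        # deal agreed: 2 weeks to 2 years
print(execution_time(run, {"PM", "Pa"}, earliest=True))    # 5
print(execution_time(run, {"PM", "Pa"}, earliest=False))   # 109
```

Note how the final synchronisation takes the max over both processes before adding the deal's duration, which is exactly the max-based asymmetry discussed above.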
4 High-level view of the main results

In this section, we give a high-level description of our main results. Formal statements can be found in the sections where they are proved. We gather in Fig. 2 the precise complexities of the minimal and the maximal execution time problems for the 3 classes of negotiations that we describe in the following.

Since we are interested in the minimum and maximum execution times, cycles in negotiations can either be bypassed or lead to an infinite maximal time. Hence, while we define timed negotiations in general, we always restrict to acyclic negotiations (such as Brexit) when stating and proving results.

In [10], a PTIME algorithm is given to compute different costs for negotiations that are both sound and deterministic. One limitation of this result is that it cannot compute the minimum execution time, as explained in Example 3. A second limitation is that the class of sound and deterministic negotiations is quite restrictive: it cannot model situations where the next node a process participates in depends on the outcome from another process, as in the Brexit example. We thus consider classes where one of these restrictions is dropped.

We first consider (Section 5) negotiations that are deterministic, but without the soundness restriction. We show that for this class, no timed problem we consider can be solved in PTIME (unless NP = PTIME). Further, we show that the equality problems (maxtime/mintime(N) = T) are complete for the complexity class DP, i.e., the second level of the Boolean Hierarchy [15].

We then consider (Section 6) the class of negotiations that are sound, but not necessarily deterministic. We show that the maximum execution time can be computed in PTIME, and propose a new algorithm. However, the minimum execution time cannot be computed in PTIME (unless NP = PTIME). Again, for the mintime equality problem we have a matching DP-completeness result.

          Deterministic            Sound                   k-layered
Max ≤ T   co-NP-complete (Thm. 3)  PTIME (Prop. 3)         PTIME (Thm. 6)
Max = T   DP-complete (Prop. 2)    PTIME (Prop. 3)         pseudo-PTIME (Thm. 8)
Min ≤ T   NP-complete (Thm. 3)     NP-complete (Thm. 5)    NP-complete (Thm. 7)
Min = T   DP-complete (Prop. 2)    DP-complete (Prop. 4)   pseudo-PTIME (Thm. 8)

Fig. 2. Results for acyclic timed negotiations. DP refers to the complexity class Difference Polynomial time [15], the second level of the Boolean Hierarchy. Hardness holds even for very weakly non-deterministic negotiations and T in unary, respectively even for sound and very weakly non-deterministic negotiations.

Finally, in order to obtain a polytime algorithm computing the minimum execution time, we consider the class of k-layered negotiations (see Section 7): given k ∈ N, we show that maxtime(N) can be computed in PTIME for k-layered negotiations. We also show that while the mintime(N) ≤ T problem is weakly NP-complete for k-layered negotiations, we can compute mintime(N) in pseudo-PTIME, i.e., in PTIME if constants are given in unary.

5 Deterministic Negotiations

We start by considering the class of deterministic acyclic negotiations. We show that both the maximal and minimal execution times cannot be computed in PTIME (unless NP = PTIME), as the threshold problems are (co-)NP-complete.

Theorem 3. The mintime(N) ≤ T decision problem is NP-complete, and the maxtime(N) ≤ T decision problem is co-NP-complete, for acyclic deterministic timed negotiations.

Proof. For mintime(N) ≤ T, containment in NP is easy: we just need to guess a run ρ (of polynomial size, as N is acyclic), consider the associated timed run ρ^− where all decisions are taken at their earliest possible dates, and check whether δ(ρ^−) ≤ T, which can be done in time O(|N| + log T). For the hardness, we give the proof in two steps. First, we start with the proof of Proposition 1 that the reachability problem is NP-hard, using a reduction from 3-CNF-SAT: given a formula φ, we build a deterministic negotiation N_φ such that φ is satisfiable iff N_φ has a final run.
In a second step, we introduce timings on this negotiation and show that mintime(N_φ) ≤ T iff φ is satisfiable.

Step 1: Reducing 3-CNF-SAT to the reachability problem. Given a Boolean formula φ with variables v_i, 1 ≤ i ≤ n, and clauses c_j, 1 ≤ j ≤ m, for each variable v_i we define the sets of clauses S_{i,t} = {c_j | v_i is present in c_j} and S_{i,f} = {c_j | ¬v_i is present in c_j}. Clauses in S_{i,t} and S_{i,f} are naturally ordered: c_i < c_j iff i < j. We denote these elements S_{i,t}(1) < S_{i,t}(2) < ..., and similarly for the set S_{i,f}.

Now, we construct a negotiation N_φ (as depicted in Figure 3) with a process V_i for each variable v_i and a process C_j for each clause c_j:
– The initial node n_0 has a single outcome r taking each process C_j to node Lone_{c_j} and each process V_i to node Lone_{v_i}.
– Lone_{c_j} has three outcomes: if literal v_i ∈ c_j, then t_i is an outcome, taking C_j to Pair_{c_j,v_i}, and if literal ¬v_i ∈ c_j, then f_i is an outcome, taking C_j to Pair_{c_j,¬v_i}.
– The outcomes of Lone_{v_i} are true and false. Outcome true brings V_i to node Tlone_{v_i,1} and outcome false brings V_i to node Flone_{v_i,1}.
– We have a node Tlone_{v_i,j} for each j ≤ |S_{i,t}| and Flone_{v_i,j} for each j ≤ |S_{i,f}|, with V_i as only process. Let c_r = S_{i,t}(j). Node Tlone_{v_i,j} has two outcomes: vton, bringing V_i to Tlone_{v_i,j+1} (or n_f if j = |S_{i,t}|), and vtoc_{i,r}, bringing V_i to Pair_{c_r,v_i}. The two outcomes from Flone_{v_i,j} are similar.

Fig. 3. A part of N_φ encoding two clauses c_j and c_k; timing is [0, 0] wherever not mentioned.

– Node Pair_{c_r,v_i} has V_i and C_r as its processes and one outcome ctof, which takes process C_r to the final node n_f and process V_i to Tlone_{v_i,j+1} (with c_r = S_{i,t}(j)), or to n_f if j = |S_{i,t}|. Node Pair_{c_r,¬v_i} is defined in the same way from Flone_{v_i,j}.

With this we claim that N_φ has a final run iff φ is satisfiable, which completes the first step of the proof. We give a formal proof of this claim in Appendix A of [1]. Observe that the negotiation N_φ constructed is deterministic and acyclic (but it is not sound).

Step 2: Before we introduce timing on N_φ, we introduce a new outcome r' at n_0 which takes all processes to n_f. Now, the timing function γ associated with N_φ is: γ(n_0, r) = [2, 2], γ(n_0, r') = [3, 3], and γ(n, r) = [0, 0] for every node n ≠ n_0 and every r ∈ R_n. Then mintime(N_φ) ≤ 2 iff φ has a satisfying assignment: if mintime(N_φ) ≤ 2, there is a final run with decision r taken at n_0. But the existence of any such final run implies the satisfiability of φ. For the reverse implication, if φ is satisfiable, then the final run corresponding to a satisfying assignment takes 2 time units, which means that mintime(N_φ) ≤ 2.

Similarly, we can prove that the maxtime problem is co-NP-complete by changing γ(n_0, r') to [1, 1] and asking if maxtime(N_φ) > 1 for the new N_φ. The answer will be yes iff φ is satisfiable.

We now consider the related problem of checking if mintime(N) = T (or if maxtime(N) = T). These problems are harder than their threshold variants under usual complexity assumptions: they are DP-complete (DP is the class Difference Polynomial time, i.e., the second level of the Boolean Hierarchy, defined via intersections of a problem in NP and one in co-NP [15]).

Proposition 2. The mintime(N) = T and maxtime(N) = T decision problems are DP-complete for acyclic deterministic negotiations.

Proof. We only give the proof for mintime (the proof for maxtime is given in Appendix A of [1]).
Indeed, it is easy to see that this problem is in DP, as it can be written as the conjunction of mintime(N) ≤ T, which is in NP, and ¬(mintime(N) ≤ T − 1), which is in co-NP. To show hardness, we use the negotiation constructed in the above proof as a gadget and show a reduction from the SAT-UNSAT problem (a standard DP-complete problem). The SAT-UNSAT problem asks, given two Boolean expressions φ and φ', both in CNF form with three literals per clause, whether it is true that φ is satisfiable and φ' is unsatisfiable. SAT-UNSAT is known to be DP-complete [15]. We reduce this problem to mintime(N) = T.

Given φ, φ', we first build the corresponding negotiations N_φ and N_{φ'} as in the previous proof. Let n_0 and n_f be the initial and final nodes of N_φ, and n_0' and n_f' be the initial and final nodes of N_{φ'}. (Similarly, for other nodes we write primes above the node names to signify that they belong to N_{φ'}.)

In the negotiation N_φ, we introduce a new node n_all, in which all the processes participate (see Figure 4). The node n_all has a single outcome r_all which sends all the processes to n_f. Also, for node n_0, apart from the outcome r which sends all processes to different nodes, there is another outcome r_all which sends all the processes to n_all. Now we merge the nodes n_f and n_0' and call the merged node n_sep. Also, nodes n_0 and n_f' now have all the processes of N_φ and N_{φ'} participating in them. This merging gives us a new negotiation N_{φ,φ'} in which the structure above n_sep is the same as N_φ, while below it is the same as N_{φ'}.

Node n_sep now has all the processes of N_φ and N_{φ'} participating in it. The outcomes of n_sep are the same as those of n_0' (r and r_all). For both outcomes of n_sep, the processes corresponding to N_φ go directly to n_f' of N_{φ,φ'}. Similarly, n_0 of N_{φ,φ'} (which is the same as n_0 of N_φ) sends the processes corresponding to N_{φ'} directly to n_sep for all its outcomes. We now define the timing function γ for N_{φ,φ'}
We now deﬁne timing function γ for N sep φ,φ which is as follows: γ(Lone ,r)=[1, 1] for all v ∈ φ and r ∈{true, false}, γ(n ,r )=[2, 2] and γ(n, r)=[0, 0] for all other outcomes of nodes. With this all all construction, one can conclude that mintime(N )=2iﬀ φ is satisﬁable and φ,φ φ is unsatisﬁable (see [1] for details). This completes the reduction and hence proves DP-hardness. Timed Negotiations 49 V1 Vn C1 Cm V V C C 1 1 n m n0 r r r Structure [0, 0] r r r r of N vton vton ctof ctof V V C C V C 1 n 1 m V1 C1 n m rall all n all sep r r r r all Structure r, rall r, rall r, rall r, rall V V C C [0, 0] 1 n 1 m [2, 2] [1, 1] of N all r r all r all all all ctof vton ctof vton V V C C V C 1 n 1 m V C 1 n 1 m Fig. 4. Structure of N φ,φ Finally, we consider a related problem of computing the min and max time. To consider the decision variant, we rephrase this problem as checking whether an arbitrary bit of the minimum execution time is 1. Perhaps surprisingly, we obtain that this problem goes even beyond DP, the second level of the Boolean Hierarchy and is in fact hard for Δ (second level of the polynomial hierarchy), which contains the entire Boolean Hierarchy. Formally, Theorem 4. Given an acyclic deterministic timed negotiation and a positive th integer k,computing the k bit of the maximum/minimum execution time is Δ -complete. Finally, we remark that if we were interested in the optimization variant and not the decision variant of the problem, the above proof can be adapted to show that these variants are OptP-complete (as deﬁned in [13]). But as optimization is not the focus of this paper, we avoid formal details of this proof. 6 Sound Negotiations Sound negotiations are negotiations in which every run can be extended to a ﬁnal run, as in Fig. 1. In this section, we show that maxtime(N ) can be computed in PTIME for sound negotiations, hence giving PTIME complexi- ties for the maxtime(N ) ≤ T ? and maxtime(N)= T ? questions. However, we 50 S. Akshay et al. 
show that mintime(N) ≤ T is NP-complete for sound negotiations, and that mintime(N) = T is DP-complete, even if T is given in unary.

Consider the graph G_N of a negotiation N. Let π = (n_0, (p_0, r_0), n_1) ··· (n_k, (p_k, r_k), n_{k+1}) be a path of G_N. We define the maximal execution time of a path π as the value δ^+(π) = Σ_{i∈0..k} γ^+(n_i, r_i). We say that a path π = (n_0, (p_0, r_0), n_1) ··· (n_ℓ, (p_ℓ, r_ℓ), n_{ℓ+1}) is a path of some run ρ = (M_0, μ_0) --(n_1,r_1)--> ··· (M_k, μ_k) if r_0, ..., r_ℓ is a subword of r_1, ..., r_k.

Lemma 1. Let N be an acyclic and sound timed negotiation. Then maxtime(N) = max_{π∈Paths(G_N)} δ^+(π) + γ^+(n_f, r_f).

Proof. Let us first prove that maxtime(N) ≥ max_{π∈Paths(G_N)} δ^+(π) + γ^+(n_f, r_f). Consider any path π of G_N, ending in some node n. First, as N is sound, we can compute a run ρ_π such that π is a path of ρ_π, and ρ_π ends in a configuration in which n is enabled. We associate with ρ_π the timed run ρ_π^+ which associates with every node the latest possible execution date. We easily have δ(ρ_π^+) ≥ δ^+(π), and then we obtain max_{π∈Paths(G_N)} δ(ρ_π^+) ≥ max_{π∈Paths(G_N)} δ^+(π). As maxtime(N) is the maximal duration over all runs, it is hence necessarily greater than max_{π∈Paths(G_N)} δ(ρ_π^+) + γ^+(n_f, r_f).

We now prove that maxtime(N) ≤ max_{π∈Paths(G_N)} δ^+(π) + γ^+(n_f, r_f). Take any timed run ρ = (M_1, μ_1) → ··· (M_k, μ_k) of N with a unique maximal node n_k. We show that there exists a path π of ρ ending in n_k such that δ^+(ρ) ≤ δ^+(π) + γ^+(n_k, r_k), by induction on the length k of ρ. The initialization is trivial for k = 1. Let k ∈ N. Because n_k is the unique maximal node of ρ, we have δ^+(ρ) = max_{p∈P_{n_k}} μ_{k−1}(p) + γ^+(n_k, r_k). We choose one p maximizing μ_{k−1}(p). Let ℓ < k be the maximal index of a decision involving process p (i.e., p ∈ P_{n_ℓ}).
Now, consider the timed run ρ', a subword of ρ, with n_ℓ as unique maximal node (that is, ρ' is ρ from which the nodes n_i, i > ℓ, have been removed, and also those nodes n_i, i < ℓ, that are not causally before n_ℓ; in particular, those with P_{n_i} ∩ P_{n_ℓ} = ∅). By definition, we have δ^+(ρ) = δ^+(ρ') + γ^+(n_k, r_k). We apply the induction hypothesis on ρ', and obtain a path π' of ρ' ending in n_ℓ such that δ^+(ρ') ≤ δ^+(π') + γ^+(n_ℓ, r_ℓ). It suffices to consider the path π = π'.(n_ℓ, (p, r_ℓ), n_k) to prove the inductive step δ^+(ρ) ≤ δ^+(π) + γ^+(n_k, r_k). Thus maxtime(N) = max_ρ δ^+(ρ) ≤ max_{π∈Paths(G_N)} δ^+(π) + γ^+(n_f, r_f).

Lemma 1 gives a way to evaluate the maximal execution time. This amounts to finding a path of maximal weight in an acyclic graph, which is a standard PTIME problem that can be solved using a standard max-cost calculation.

Proposition 3. Computing the maximal execution time of an acyclic sound negotiation N = (N, n_0, n_f, X) can be done in time O(|N| + |X|).

A direct consequence is that the maxtime(N) ≤ T and maxtime(N) = T problems can be solved in polynomial time when N is sound. Notice that if N is deterministic but not sound, then Lemma 1 does not hold: we only have an inequality.

We now turn to mintime(N). We show that it is strictly harder to compute for sound negotiations than maxtime(N).

Theorem 5. mintime(N) ≤ T is NP-complete in the strong sense for sound acyclic negotiations, even if N is very weakly non-deterministic.

Proof (sketch). First, we can decide mintime(N) ≤ T in NP. Indeed, one can guess a final (untimed) run ρ of size ≤ |N|, consider the timed run ρ^− corresponding to ρ where all outcomes are taken at the earliest possible dates, compute δ(ρ^−) in linear time, and check that δ(ρ^−) ≤ T. The hardness part is obtained by a reduction from the Bin Packing problem. The reduction is similar to the Knapsack reduction that we will present in Thm. 7.
The difference is that we use ℓ bins in parallel, rather than 2 processes, one for the weight and one for the value. The hardness is thus strong, but the negotiation is not k-layered for a bounded k (it is 2ℓ + 1 bounded, with ℓ depending on the input). A detailed proof is given in Appendix B of [1].

We show that mintime(N) = T is harder to decide than mintime(N) ≤ T, with a proof similar to Prop. 2.

Proposition 4. The mintime(N) = T decision problem is DP-complete for sound acyclic negotiations, even if the negotiation is very weakly non-deterministic.

An open question is whether the minimal execution time can be computed in PTIME if the negotiation is both sound and deterministic. The reduction from Bin Packing does not work with deterministic (and sound) negotiations.

7 k-Layered Negotiations

In this section, we consider k-layeredness, a syntactic property that can be efficiently verified (see Section 2).

7.1 Algorithmic properties

Let k be a fixed integer. We first show that the maximum execution time can be computed in PTIME for k-layered negotiations. Let N_i be the set of nodes at layer i. We define for every layer i the set S_i of subsets of nodes X ⊆ N_i which can be jointly enabled and such that for every process p, there is exactly one node n(X, p) in X with p ∈ P_{n(X,p)}. An element X of S_i is a subset of nodes that can be selected by resolving all non-determinism with an appropriate choice of outcomes. Formally, we define S_i inductively. We start with S_0 = {{n_0}}. We then define S_{i+1} from the contents of layer S_i: we have Y ∈ S_{i+1} iff ∪_{n∈Y} P_n = P and there exist X ∈ S_i and an outcome r_m ∈ R_m for every m ∈ X, such that n ∈ X(n(X, p), p, r_{n(X,p)}) for each n ∈ Y and p ∈ P_n.

Theorem 6. Let k ∈ N. Computing the maximum execution time of a k-layered acyclic negotiation N can be done in PTIME. More precisely, the worst-case time complexity is O(|P| · |N|^{k+1}).

Proof (sketch). The first step is to compute S_i layer by layer, following its inductive definition.
The set S_i is of size at most 2^k, as |N_i| ≤ k by definition of k-layeredness. Knowing S_i, it is easy to build S_{i+1} by induction. This takes time in O(|P| · |N|^{k+1}): we need to consider all k-tuples of outcomes for each layer, and there can be |N|^k such tuples. We need to do that for all processes (|P|), and for all layers (at most |N|).

We then keep, for each subset X ∈ S_i and each node n ∈ X, the maximal time f_i(X, n) ∈ N associated with n and X. From S_{i+1} and f_i, we inductively compute f_{i+1} in the following way: for all X ∈ S_i with successor Y ∈ S_{i+1} for outcomes (r_p)_{p∈P}, we denote f_{i+1}(Y, n, X) = max_{p∈P_n} f_i(X, n(X, p)) + γ^+(n(X, p), r_p). If there are several choices of (r_p)_{p∈P} leading to the same Y, we take the r_p with the maximal f_i(X, n(X, p)) + γ^+(n(X, p), r_p). We then define f_{i+1}(Y, n) = max_{X∈S_i} f_{i+1}(Y, n, X). Again, the initialization is trivial, with f_0({n_0}, n_0) = 0. The maximal execution time of N is f({n_f}, n_f).

We can bound the complexity more precisely by O(d(N) · C(N) · ||R||^{k*}), with:
– d(N) ≤ |N| the depth of n_f, that is, the number of layers of N, and ||R|| the maximum number of outcomes of a node,
– C(N) = max_i |S_i| ≤ 2^k, which we call the number of contexts of N, and which is often much smaller than 2^k,
– k* = max_{X∈S_i} |X| ≤ k. We say that N is k*-thread bounded, meaning that there cannot be more than k* nodes in the same context X of any layer. Usually, k* is strictly smaller than k = max_i |N_i|, as N_i = ∪_{X∈S_i} X.

Consider again the Brexit example of Figure 1. We have (k + 1) = 7, while the depth is d(N) = 6; the negotiation is k* = 3-thread bounded (k* is bounded by the number of processes); ||R|| = 2; and the number of contexts is at most C(N) = 4 (EU chooses to enforce the backstop or not, and Pa chooses to go to court or not).

7.2 Minimal Execution Time

As with sound negotiations, computing the minimal time is much harder than computing the maximal time for k-layered negotiations:

Theorem 7.
Let k ≥ 6. The Min ≤ T problem is NP-complete for k-layered acyclic negotiations, even if the negotiation is sound and very weakly non-deterministic.

Proof. One can guess in polynomial time a final run of size ≤ |N|. If the execution time of this final run is smaller than T, then we have found a final run witnessing mintime(N) ≤ T. Hence the problem is in NP.

Let us now show that the problem is NP-hard. We proceed by reduction from the Knapsack decision problem. Consider a set of items U = {u_1, ..., u_n} of respective values v_1, ..., v_n and weights w_1, ..., w_n, and a knapsack of maximal capacity W. The Knapsack problem asks, given a value V, whether there exists a subset of items U' ⊆ U such that Σ_{u_i∈U'} v_i ≥ V and Σ_{u_i∈U'} w_i ≤ W.

Fig. 5. The negotiation encoding Knapsack.

We build a negotiation with 2n processes P = {p_1, ..., p_2n}, as shown in Fig. 5. Intuitively, the p_i with i ≤ n will serve to encode the value of selected items as timing, while the p_i with i > n will serve to encode the weight of selected items as timing. Concerning the timing constraints on outcomes, we do the following: outcomes 0, yes and no are associated with [0, 0]. Outcome c_i is associated with [w_i, w_i], the weight of u_i. Last, outcome b_i is associated with a more complex interval, such that the b_i durations sum below W iff the selected values sum to at least V. For that, we set [(v_max − v_i)W / (n·v_max − V), v_max·W / (n·v_max − v_i)] for outcome b_i, where v_max is the largest value of an item and V is the total value we want to reach at least. Also, we set [v_max·W / (n·v_max − V), v_max·W / (n·v_max − v_i)] for outcome a_i. We set T = W, the maximal weight of the knapsack. Now, consider a final run ρ in N.
The only choices in ρ are the outcomes yes or no from C_1, ..., C_n. Let I be the set of indices such that yes is the outcome from C_i in this path. We obtain δ(ρ) = max(Σ_{i∉I} a_i + Σ_{i∈I} b_i, Σ_{i∈I} c_i). We have δ(ρ) ≤ T = W iff Σ_{i∈I} w_i ≤ W, that is, the sum of the weights is lower than W, and Σ_{i∉I} v_max·W/(n·v_max − V) + Σ_{i∈I} (v_max − v_i)W/(n·v_max − V) ≤ W. That is, n·v_max − Σ_{i∈I} v_i ≤ n·v_max − V, i.e., Σ_{i∈I} v_i ≥ V. Hence, there exists a run ρ with δ(ρ) ≤ T = W iff there exists a set of items of weight less than W and of value more than V.

It is well known that Knapsack is weakly NP-hard, that is, it is NP-hard only when weights/values are given in binary. This means that Thm. 7 shows that the minimum execution time ≤ T problem is NP-hard only when T is given in binary. We can actually show that for k-layered negotiations, the mintime(N) ≤ T problem can be decided in PTIME if T is given in unary (i.e., if T is not too large):

Theorem 8. Let k ∈ N. Given a k-layered negotiation N and T written in unary, one can decide in PTIME whether the minimum execution time of N is ≤ T. The worst-case time complexity is O(|N| · |P| · (T·|N|)^k).

Proof. We remember for each layer i a set T_i of functions τ from the nodes N_i of layer i to a value in {0, 1, ..., T, ⊥}. Basically, we have τ ∈ T_i if there exists a run ρ reaching X = {n ∈ N_i | τ(n) ≠ ⊥}, and this run reaches node n ∈ X after τ(n) time units. As for S_i, for all p, we should have a unique node n(τ, p) such that p ∈ P_{n(τ,p)} and τ(n(τ, p)) ≠ ⊥. Again, it is easy to initialize T_0 = {τ_0}, with τ_0(n_0) = 0 and τ_0(n) = ⊥ for all n ≠ n_0.

Inductively, we build T_{i+1} in the following way: τ_{i+1} ∈ T_{i+1} iff there exist τ_i ∈ T_i and r_p ∈ R_{n(τ_i,p)} for all p ∈ P such that for all n with τ_{i+1}(n) ≠ ⊥, we have τ_{i+1}(n) = max_{p∈P_n} τ_i(n(τ_i, p)) + γ(n(τ_i, p), r_p).

We then have that the minimum execution time of N is min_{τ∈T_d} τ(n_f), for d the depth of n_f.
There are at most O(T^k) functions τ in any T_i, and there are at most |N| layers to consider, giving the complexity.

As with Thm. 6, we can state the complexity more accurately as O(d(N) · C(N) · ||R||^{k*} · T^{k*−1}). The k* − 1 comes from the fact that we only need to remember the minimal functions τ ∈ T_i: if τ'(n) ≥ τ(n) for all n, then we do not need to keep τ' in T_i. In particular, for the Knapsack encoding in the proof of Thm. 7, we have k* = 3, ||R|| = 2 and C(N) = 4. Notice that if k is part of the input, then the problem is strongly NP-hard, even if T is given in unary, as e.g. encoding Bin Packing with ℓ bins results in a 2ℓ + 1-layered negotiation.

8 Conclusion

In this paper, we considered timed negotiations. We believe that time is of the essence in negotiations, as exemplified by the Brexit negotiation. It is thus important to be able to compute in a tractable way the minimal and maximal execution times of negotiations. We showed that we can compute in PTIME the maximal execution time for acyclic negotiations that are either sound or k-layered, for fixed k. We showed that we cannot compute in PTIME the maximal execution time for negotiations that are neither sound nor k-layered, even if they are deterministic and acyclic (unless NP=PTIME). We also showed that, surprisingly, computing the minimal execution time is much harder, with strong NP-hardness results in most of the classes of negotiations, contradicting a claim in [10]. We came up with a new reasonable class of negotiations, namely k-layered negotiations, which enjoys a pseudo-PTIME algorithm to compute the minimal execution time. That is, the algorithm is PTIME when the timing constants are given in unary. We showed that this restriction is necessary, as the problem becomes NP-hard for constants given in binary, even when the negotiation is sound and very weakly non-deterministic. The problem whether the minimal execution time can be computed in PTIME for deterministic and sound negotiations remains open.
References
1. S. Akshay, B. Genest, L. Hélouët, and S. Mital. Timed Negotiations (extended version). Research report, https://hal.inria.fr/hal-02337887, 2020.
2. J. Desel. Reduction and Design of Well-behaved Concurrent Systems. In CONCUR '90, Theories of Concurrency: Unification and Extension, Amsterdam, The Netherlands, August 27-30, 1990, Proceedings, volume 458 of Lecture Notes in Computer Science, pages 166–181. Springer, 1990.
3. J. Desel, J. Esparza, and P. Hoffmann. Negotiation as Concurrency Primitive. Acta Inf., 56(2):93–159, 2019.
4. J. Esparza. Decidability and Complexity of Petri Net Problems - An Introduction. In Lectures on Petri Nets I: Basic Models, Advances in Petri Nets, Dagstuhl, September 1996, volume 1491 of Lecture Notes in Computer Science, pages 374–428. Springer, 1998.
5. J. Esparza and J. Desel. Free Choice Petri Nets. Cambridge University Press, 1995.
6. J. Esparza and J. Desel. On Negotiation as Concurrency Primitive. In CONCUR 2013 - Concurrency Theory - 24th International Conference, CONCUR 2013, Buenos Aires, Argentina, August 27-30, 2013, Proceedings, volume 8052 of Lecture Notes in Computer Science, pages 440–454. Springer, 2013.
7. J. Esparza and J. Desel. On Negotiation as Concurrency Primitive II: Deterministic Cyclic Negotiations. In FOSSACS'14, volume 8412 of Lecture Notes in Computer Science, pages 258–273. Springer, 2014.
8. J. Esparza and P. Hoffmann. Reduction Rules for Colored Workflow Nets. In Fundamental Approaches to Software Engineering - 19th International Conference, FASE 2016, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2016, Eindhoven, The Netherlands, April 2-8, 2016, Proceedings, volume 9633 of Lecture Notes in Computer Science, pages 342–358. Springer, 2016.
9. J. Esparza, D. Kuperberg, A. Muscholl, and I. Walukiewicz. Soundness in Negotiations. Logical Methods in Computer Science, 14(1), 2018.
10. J. Esparza, A. Muscholl, and I. Walukiewicz. Static Analysis of Deterministic Negotiations. In 32nd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2017, Reykjavik, Iceland, June 20-23, 2017, pages 1–12, 2017.
11. S. Haddad. A Reduction Theory for Coloured Nets. In Advances in Petri Nets 1989, volume 424 of Lecture Notes in Computer Science, pages 209–235. Springer, 1990.
12. P. Hoffmann. Negotiation Games. In J. Esparza and E. Tronci, editors, Proceedings Sixth International Symposium on Games, Automata, Logics and Formal Verification, GandALF 2015, Genoa, Italy, 21-22nd September 2015, volume 193 of EPTCS, pages 31–42, 2015.
13. M. W. Krentel. The Complexity of Optimization Problems. Journal of Computer and System Sciences, 36(3):490–509, 1988.
14. P. M. Merlin. A Study of the Recoverability of Computing Systems. PhD thesis, University of California, Irvine, CA, USA, 1974.
15. C. H. Papadimitriou and M. Yannakakis. The Complexity of Facets (and Some Facets of Complexity). In Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing, STOC '82, pages 255–260, New York, NY, USA, 1982. ACM.
16. R. H. Sloan and U. A. Buy. Reduction Rules for Time Petri Nets. Acta Inf., 33(7):687–706, 1996.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material.
If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Cartesian Difference Categories

Mario Alvarez-Picallo¹ and Jean-Simon Pacaud Lemay²
¹ Department of Computer Science, University of Oxford, Oxford, UK. mario.alvarez-picallo@cs.ox.ac.uk
² Department of Computer Science, University of Oxford, Oxford, UK. jean-simon.lemay@kellogg.ox.ac.uk

Abstract. Cartesian differential categories are categories equipped with a differential combinator which axiomatizes the directional derivative. Important models of Cartesian differential categories include classical differential calculus of smooth functions and categorical models of the differential λ-calculus. However, Cartesian differential categories cannot account for other interesting notions of differentiation such as the calculus of finite differences or the Boolean differential calculus. On the other hand, change action models have been shown to capture these examples as well as more "exotic" examples of differentiation. However, change action models are very general and do not share the nice properties of a Cartesian differential category. In this paper, we introduce Cartesian difference categories as a bridge between Cartesian differential categories and change action models. We show that every Cartesian differential category is a Cartesian difference category, and how certain well-behaved change action models are Cartesian difference categories. In particular, Cartesian difference categories model both the differential calculus of smooth functions and the calculus of finite differences. Furthermore, every Cartesian difference category comes equipped with a tangent bundle monad whose Kleisli category is again a Cartesian difference category.
Keywords: Cartesian Difference Categories · Cartesian Differential Categories · Change Actions · Calculus of Finite Differences · Stream Calculus

1 Introduction

In the early 2000s, Ehrhard and Regnier introduced the differential λ-calculus [10], an extension of the λ-calculus equipped with a differential combinator capable of taking the derivative of arbitrary higher-order functions. This development, based on models of linear logic equipped with a natural notion of "derivative" [11], sparked a wave of research into categorical models of differentiation. One of the most notable developments in the area is the introduction of Cartesian differential categories [4] by Blute, Cockett and Seely, which provide an abstract categorical axiomatization of the directional derivative from differential calculus.
(The second author is financially supported by Kellogg College, the Oxford-Google DeepMind Graduate Scholarship, and the Clarendon Fund.)
© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 57–76, 2020. https://doi.org/10.1007/978-3-030-45231-5_4
The relevance of Cartesian differential categories lies in their ability to model both "classical" differential calculus (with the canonical example being the category of Euclidean spaces and smooth functions between them) and the differential λ-calculus (as every categorical model for it gives rise to a Cartesian differential category [14]). However, while Cartesian differential categories have proven to be an immensely successful formalism, they have, by design, some limitations. Firstly, they cannot account for certain "exotic" notions of derivative, such as the difference operator from the calculus of finite differences [16] or the Boolean differential calculus [19].
This is because the axioms of a Cartesian differential category stipulate that derivatives should be linear in their second argument (in the same way that the directional derivative is), whereas these aforementioned discrete sorts of derivative need not be. Additionally, every Cartesian differential category is equipped with a tangent bundle monad [7, 15] whose Kleisli category can be intuitively understood as a category of generalized vector fields. This Kleisli category has an obvious differentiation operator which comes close to making it a Cartesian differential category, but again fails the requirement of being linear in its second argument.
More recently, discrete derivatives have been suggested as a semantic framework for understanding incremental computation. This led to the development of change structures [6] and change actions [2]. Change action models have been successfully used to provide a model for incrementalizing Datalog programs [1], but have also been shown to model the calculus of finite differences as well as the Kleisli category of the tangent bundle monad of a Cartesian differential category. Change action models, however, are very general, lacking many of the nice properties of Cartesian differential categories (for example, addition in a change action model is not required to be commutative, even though it is in most change action models of interest). As a consequence of this generality, the tangent bundle endofunctor in a change action model can fail to be a monad.
In this work, we introduce Cartesian difference categories (Section 4.2); their key ingredients are an infinitesimal extension operator and a difference combinator, whose axioms generalize the differential combinator axioms of a Cartesian differential category.
In Section 4.3, we show that every Cartesian differential category is, in fact, a Cartesian difference category whose infinitesimal extension operator is zero, and conversely how every Cartesian difference category admits a full subcategory which is a Cartesian differential category. In Section 4.4, we show that every Cartesian difference category is a change action model, and conversely how a full subcategory of suitably well-behaved objects of a change action model is a Cartesian difference category. In Section 6, we show that every Cartesian difference category comes equipped with a monad whose Kleisli category is again a Cartesian difference category. Finally, in Section 5 we provide some examples of Cartesian difference categories; notably, the calculus of finite differences and the stream calculus.

2 Cartesian Differential Categories

In this section, we briefly review Cartesian differential categories, so that the reader may compare Cartesian differential categories with the new notion of Cartesian difference categories which we introduce in the next section. For a full detailed introduction to Cartesian differential categories, we refer the reader to the original paper [4].

2.1 Cartesian Left Additive Categories

Here we recall the definition of Cartesian left additive categories [4] – where "additive" means being skew enriched over commutative monoids, which in particular means that we do not assume the existence of additive inverses, i.e., "negative elements". By a Cartesian category we mean a category X with chosen finite products, where we denote the binary product of objects A and B by A × B with projection maps π₀ : A × B → A and π₁ : A × B → B and pairing operation ⟨−, −⟩, and the chosen terminal object by ⊤ with unique terminal maps !_A : A → ⊤.

Definition 1.
A left additive category [4] is a category X such that each hom-set X(A, B) is a commutative monoid with addition operation + : X(A, B) × X(A, B) → X(A, B) and zero element (called the zero map) 0 ∈ X(A, B), such that pre-composition preserves the additive structure: (f + g) ∘ h = f ∘ h + g ∘ h and 0 ∘ f = 0. A map k in a left additive category is additive if post-composition by k preserves the additive structure: k ∘ (f + g) = k ∘ f + k ∘ g and k ∘ 0 = 0. A Cartesian left additive category [4] is a Cartesian category X which is also a left additive category such that all projection maps π₀ : A × B → A and π₁ : A × B → B are additive.

We note that the definition given here of a Cartesian left additive category is slightly different from the one found in [4], but it is indeed equivalent. By [4, Proposition 1.2.2], an equivalent axiomatization of a Cartesian left additive category is that of a Cartesian category in which every object comes equipped with a commutative monoid structure such that the projection maps are monoid morphisms. This will be important later in Section 4.2.

2.2 Cartesian Differential Categories

Definition 2. A Cartesian differential category [4] is a Cartesian left additive category equipped with a differential combinator D sending each map f : A → B to a map D[f] : A × A → B, verifying the following coherence conditions:

[CD.1] D[f + g] = D[f] + D[g] and D[0] = 0
[CD.2] D[f] ∘ ⟨x, y + z⟩ = D[f] ∘ ⟨x, y⟩ + D[f] ∘ ⟨x, z⟩ and D[f] ∘ ⟨x, 0⟩ = 0
[CD.3] D[1_A] = π₁ and D[π₀] = π₀ ∘ π₁ and D[π₁] = π₁ ∘ π₁
[CD.4] D[⟨f, g⟩] = ⟨D[f], D[g]⟩ and D[!_A] = !_{A×A}
[CD.5] D[g ∘ f] = D[g] ∘ ⟨f ∘ π₀, D[f]⟩
[CD.6] D[D[f]] ∘ ⟨⟨x, y⟩, ⟨0, z⟩⟩ = D[f] ∘ ⟨x, z⟩
[CD.7] D[D[f]] ∘ ⟨⟨x, y⟩, ⟨z, 0⟩⟩ = D[D[f]] ∘ ⟨⟨x, z⟩, ⟨y, 0⟩⟩

Note that here, following the more recent work on Cartesian differential categories, we have flipped the convention found in [4], so that the linear argument is the second argument rather than the first.
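For the smooth-functions model (discussed in Section 5.1), the differential combinator is the directional derivative D[f](x, y) = J_f(x) · y, and the chain rule [CD.5] can be spot-checked numerically. A minimal sketch using central finite differences (the sample functions and tolerance are our own choices for illustration):

```python
def D(f, x, y, h=1e-6):
    """Directional derivative of f at x in direction y, via central differences."""
    xp = [xi + h * yi for xi, yi in zip(x, y)]
    xm = [xi - h * yi for xi, yi in zip(x, y)]
    return [(a - b) / (2 * h) for a, b in zip(f(xp), f(xm))]

f = lambda x: [x[0] * x[1], x[0] + x[1]]   # f : R^2 -> R^2
g = lambda u: [u[0] ** 2 + u[1]]           # g : R^2 -> R

x, y = [1.0, 2.0], [0.5, -1.0]
lhs = D(lambda z: g(f(z)), x, y)           # D[g ∘ f](x, y)
rhs = D(g, f(x), D(f, x, y))               # D[g] ∘ ⟨f ∘ π₀, D[f]⟩, i.e. [CD.5]
assert all(abs(a - b) < 1e-4 for a, b in zip(lhs, rhs))
```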
We highlight that, by [7, Proposition 4.2], the last two axioms [CD.6] and [CD.7] have equivalent alternative expressions.

Lemma 1. In the presence of the other axioms, [CD.6] and [CD.7] are equivalent to:
[CD.6.a] D[D[f]] ∘ ⟨⟨x, 0⟩, ⟨0, y⟩⟩ = D[f] ∘ ⟨x, y⟩
[CD.7.a] D[D[f]] ∘ ⟨⟨x, y⟩, ⟨z, w⟩⟩ = D[D[f]] ∘ ⟨⟨x, z⟩, ⟨y, w⟩⟩

As a Cartesian difference category is a generalization of a Cartesian differential category, we leave the discussion of the intuition of these axioms for later in Section 4.2 below. We also refer to [4, Section 4] for a term calculus which may help better understand the axioms of a Cartesian differential category. The canonical example of a Cartesian differential category is the category of real smooth functions, which we will discuss in Section 5.1. Other interesting examples can be found throughout the literature, such as categorical models of the differential λ-calculus [10, 14], the subcategory of differential objects of a tangent category [7], and the coKleisli category of a differential category [3, 4].

3 Change Action Models

Change actions [1, 2] have recently been proposed as a setting for reasoning about higher-order incremental computation, based on a discrete notion of differentiation. Together with Cartesian differential categories, they provide the core ideas behind Cartesian difference categories. In this section, we quickly review change actions and change action models, in particular to highlight where some of the axioms of a Cartesian difference category come from. For more details on change actions, we invite readers to see the original paper [2].

3.1 Change Actions

Definition 3. A change action A in a Cartesian category X is a quintuple A ≡ (A, ΔA, ⊕_A, +_A, 0_A) consisting of two objects A and ΔA, and three maps:
⊕_A : A × ΔA → A    +_A : ΔA × ΔA → ΔA    0_A : ⊤ → ΔA
such that (ΔA, +_A, 0_A) is a monoid and ⊕_A : A × ΔA → A is an action of ΔA on A, that is, the following equalities hold:
⊕_A ∘ ⟨1_A, 0_A ∘ !_A⟩ = 1_A    ⊕_A ∘ (1_A × +_A) = ⊕_A ∘ (⊕_A × 1_ΔA)
For a change action A and a pair of maps f : C → A and g : C → ΔA, we define f ⊕_A g : C → A as f ⊕_A g = ⊕_A ∘ ⟨f, g⟩. Similarly, for maps h : C → ΔA and k : C → ΔA, define h +_A k = +_A ∘ ⟨h, k⟩. Therefore, that ⊕_A is an action of ΔA on A can be rewritten as:
1_A ⊕_A 0_A = 1_A    1_A ⊕_A (1_ΔA +_A 1_ΔA) = (1_A ⊕_A 1_ΔA) ⊕_A 1_ΔA
The intuition behind the above definition is that the monoid ΔA is a type of possible "changes" or "updates" that might be applied to A, with the monoid structure on ΔA representing the capability to compose updates. Change actions give rise to a notion of derivative, with a distinctly "discrete" flavour. Given change actions on objects A and B, a map f : A → B can be said to be differentiable when changes to the input (in the sense of elements of ΔA) are mapped to changes to the output (that is, elements of ΔB). In the setting of incremental computation, this is precisely what it means for f to be incrementalizable, with the derivative of f corresponding to an incremental version of f.

Definition 4. Let A ≡ (A, ΔA, ⊕_A, +_A, 0_A) and B ≡ (B, ΔB, ⊕_B, +_B, 0_B) be change actions. For a map f : A → B, a map ∂[f] : A × ΔA → ΔB is a derivative of f whenever the following equalities hold:
[CAD.1] f ∘ (x ⊕_A y) = (f ∘ x) ⊕_B (∂[f] ∘ ⟨x, y⟩)
[CAD.2] ∂[f] ∘ ⟨x, y +_A z⟩ = (∂[f] ∘ ⟨x, y⟩) +_B (∂[f] ∘ ⟨x ⊕_A y, z⟩) and ∂[f] ∘ ⟨x, 0_A ∘ !⟩ = 0_B ∘ !_{A×ΔA}

The intuition for these axioms will be explained in more detail in Section 4.2, where we explain the axioms of a Cartesian difference category. Note that although there is nothing in the above definition guaranteeing that any given map has at most a single derivative, the chain rule does hold. As a corollary, differentiation is compositional and therefore the change actions in X form a category.

Lemma 2. Whenever ∂[f] and ∂[g] are derivatives for composable maps f and g respectively, then ∂[g] ∘ ⟨f ∘ π₀, ∂[f]⟩ is a derivative for g ∘ f.
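In the calculus of finite differences (one of the motivating examples later in the paper), the integers carry the change action (ℤ, ℤ, +, +, 0), and every f : ℤ → ℤ has the derivative ∂[f](x, δ) = f(x + δ) − f(x). The axioms [CAD.1] and [CAD.2] then reduce to telescoping identities, which we can spot-check mechanically (a sketch; the sample function is an arbitrary choice of ours):

```python
def derivative(f):
    """Discrete derivative for the change action (Z, Z, +): df(x, d) = f(x + d) - f(x)."""
    return lambda x, d: f(x + d) - f(x)

f = lambda x: x ** 3 - 2 * x          # an arbitrary (non-linear) map Z -> Z
df = derivative(f)

for x in range(-3, 4):
    for y in range(-3, 4):
        for z in range(-3, 4):
            assert f(x + y) == f(x) + df(x, y)               # [CAD.1]: f(x ⊕ y) = f(x) ⊕ ∂[f](x, y)
            assert df(x, y + z) == df(x, y) + df(x + y, z)   # [CAD.2]: additive up to perturbation
    assert df(x, 0) == 0                                     # [CAD.2]: the zero change maps to zero
```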
3.2 Change Action Models

Definition 5. Given a Cartesian category X, define its change actions category CAct(X) as the category whose objects are change actions in X and whose arrows f : A → B are the pairs (f, ∂[f]), where f : A → B is an arrow in X and ∂[f] : A × ΔA → ΔB is a derivative for f. The identity is (1_A, π₁), while the composition of (f, ∂[f]) and (g, ∂[g]) is (g ∘ f, ∂[g] ∘ ⟨f ∘ π₀, ∂[f]⟩).

There is an obvious product-preserving forgetful functor E : CAct(X) → X sending every change action (A, ΔA, ⊕, +, 0) to its base object A and every map (f, ∂[f]) to the underlying map f. As a setting for studying differentiation, the category CAct(X) is rather lacklustre, since there is no notion of higher derivatives, so we will instead work with change action models. Informally, a change action model consists of a rule which associates to every object A of X a change action over it, and to every map a choice of a derivative.

Definition 6. A change action model is a Cartesian category X together with a product-preserving functor α : X → CAct(X) that is a section of the forgetful functor E.

For brevity, when A is an object of a change action model, we will write ΔA, ⊕_A, +_A, and 0_A to refer to the components of the corresponding change action α(A). Examples of change action models can be found in [2]. In particular, we highlight that a Cartesian differential category always provides a change action model. We will generalize this result, and show in Section 4.4 that a Cartesian difference category also always provides a change action model.

4 Cartesian Difference Categories

In this section, we introduce Cartesian difference categories, which are generalizations of Cartesian differential categories. Examples of Cartesian difference categories can be found in Section 5.
4.1 Infinitesimal Extensions in Left Additive Categories

We first introduce infinitesimal extensions, an operator that turns a map into an "infinitesimal" version of itself – in the sense that every map coincides with its Taylor approximation on infinitesimal elements.

Definition 7. A Cartesian left additive category X is said to have an infinitesimal extension ε if every hom-set X(A, B) comes equipped with a monoid morphism ε : X(A, B) → X(A, B), that is, ε(f + g) = ε(f) + ε(g) and ε(0) = 0, and such that ε(g ∘ f) = ε(g) ∘ f and ε(π₀) = π₀ ∘ ε(1_{A×B}) and ε(π₁) = π₁ ∘ ε(1_{A×B}).

Note that since ε(g ∘ f) = ε(g) ∘ f, it follows that ε(f) = ε(1_B) ∘ f and that ε(1_A) : A → A is an additive map (Definition 1). In light of this, it turns out that infinitesimal extensions can equivalently be described as a class of additive maps ε_A : A → A such that ε_{A×B} = ε_A × ε_B. The equivalence is given by setting ε(f) = ε_B ∘ f and ε_A = ε(1_A). Furthermore, an infinitesimal extension equips each object with a canonical change action structure:

Lemma 3. Let X be a Cartesian left additive category with infinitesimal extension ε. For every object A, define the maps ⊕_A : A × A → A as ⊕_A = π₀ + ε(π₁), +_A : A × A → A as +_A = π₀ + π₁, and 0_A : ⊤ → A as 0_A = 0. Then (A, A, ⊕_A, +_A, 0_A) is a change action in X.

Proof. As mentioned earlier, that (A, +_A, 0_A) is a commutative monoid was shown in [4]. On the other hand, that ⊕_A gives a change action follows from the fact that ε preserves the addition.

Setting A ≡ (A, A, ⊕_A, +_A, 0_A), we note that f ⊕_A g = f + ε(g) and f +_A g = f + g, and so in particular +_A = +. Therefore, from now on we will omit the subscripts and simply write ⊕ and +. For every Cartesian left additive category, there are always at least two possible infinitesimal extensions:

Lemma 4. For any Cartesian left additive category X,
1. Setting ε(f) = 0 defines an infinitesimal extension on X; in this case, ⊕_A = π₀ and f ⊕ g = f.
2.
Setting ε(f) = f defines an infinitesimal extension on X; in this case, ⊕_A = +_A and f ⊕ g = f + g.

We note that while these examples of infinitesimal extensions may seem trivial, they are both very important as they will give rise to key examples of Cartesian difference categories.

4.2 Cartesian Difference Categories

Definition 8. A Cartesian difference category is a Cartesian left additive category with an infinitesimal extension ε which is equipped with a difference combinator ∂ sending each map f : A → B to a map ∂[f] : A × A → B, verifying the following coherence conditions:

[C∂.0] f ∘ (x + ε(y)) = f ∘ x + ε(∂[f] ∘ ⟨x, y⟩)
[C∂.1] ∂[f + g] = ∂[f] + ∂[g], ∂[0] = 0, and ∂[ε(f)] = ε(∂[f])
[C∂.2] ∂[f] ∘ ⟨x, y + z⟩ = ∂[f] ∘ ⟨x, y⟩ + ∂[f] ∘ ⟨x + ε(y), z⟩ and ∂[f] ∘ ⟨x, 0⟩ = 0
[C∂.3] ∂[1_A] = π₁ and ∂[π₀] = π₀ ∘ π₁ and ∂[π₁] = π₁ ∘ π₁
[C∂.4] ∂[⟨f, g⟩] = ⟨∂[f], ∂[g]⟩ and ∂[!_A] = !_{A×A}
[C∂.5] ∂[g ∘ f] = ∂[g] ∘ ⟨f ∘ π₀, ∂[f]⟩
[C∂.6] ∂[∂[f]] ∘ ⟨⟨x, y⟩, ⟨0, z⟩⟩ = ∂[f] ∘ ⟨x + ε(y), z⟩
[C∂.7] ∂[∂[f]] ∘ ⟨⟨x, y⟩, ⟨z, 0⟩⟩ = ∂[∂[f]] ∘ ⟨⟨x, z⟩, ⟨y, 0⟩⟩

Before giving some intuition on the axioms [C∂.0] to [C∂.7], we first observe that one could have used change action notation to express [C∂.0], [C∂.2], and [C∂.6], which would then be written as:

[C∂.0] f ∘ (x ⊕ y) = (f ∘ x) ⊕ (∂[f] ∘ ⟨x, y⟩)
[C∂.2] ∂[f] ∘ ⟨x, y + z⟩ = ∂[f] ∘ ⟨x, y⟩ + ∂[f] ∘ ⟨x ⊕ y, z⟩ and ∂[f] ∘ ⟨x, 0⟩ = 0
[C∂.6] ∂[∂[f]] ∘ ⟨⟨x, y⟩, ⟨0, z⟩⟩ = ∂[f] ∘ ⟨x ⊕ y, z⟩

And also, just like in Cartesian differential categories, [C∂.6] and [C∂.7] have alternative equivalent expressions.

Lemma 5. In the presence of the other axioms, [C∂.6] and [C∂.7] are equivalent to:
[C∂.6.a] ∂[∂[f]] ∘ ⟨⟨x, 0⟩, ⟨0, y⟩⟩ = ∂[f] ∘ ⟨x, y⟩
[C∂.7.a] ∂[∂[f]] ∘ ⟨⟨x, y⟩, ⟨z, w⟩⟩ = ∂[∂[f]] ∘ ⟨⟨x, z⟩, ⟨y, w⟩⟩

Proof. The proof is essentially the same as [7, Proposition 4.2].

The keen-eyed reader will notice that the axioms of a Cartesian difference category are very similar to the axioms of a Cartesian differential category.
Indeed, [C∂.1], [C∂.3], [C∂.4], [C∂.5], and [C∂.7] are the same as their Cartesian differential category counterparts. The axioms which differ are [C∂.2] and [C∂.6], where the infinitesimal extension ε is now included, and there is also the new extra axiom [C∂.0]. On the other hand, interestingly enough, [C∂.6.a] is the same as [CD.6.a]. We also point out that, writing out [C∂.0] and [C∂.2] using change action notation, we see that these axioms are precisely [CAD.1] and [CAD.2] respectively. To better understand [C∂.0] to [C∂.7], it may be useful to write them out using element-like notation. In element-like notation, [C∂.0] is written as:
f(x + ε(y)) = f(x) + ε(∂[f](x, y))
This condition can be read as a generalization of the Kock-Lawvere axiom that characterizes the derivative in synthetic differential geometry [13]. Broadly speaking, the Kock-Lawvere axiom states that, for any map f : R → R and any x ∈ R and d ∈ D, there exists a unique f′(x) ∈ R verifying f(x + d) = f(x) + d · f′(x), where D is the subset of R consisting of infinitesimal elements. It is by analogy with the Kock-Lawvere axiom that we refer to ε as an "infinitesimal extension", as it can be thought of as embedding the space A into a subspace ε(A) of infinitesimal elements. [C∂.1] states that the differential of a sum of maps is the sum of the differentials, and similarly for zero maps and the infinitesimal extension of a map. [C∂.2] is the first crucial difference between a Cartesian difference category and a Cartesian differential category. In a Cartesian differential category, the differential of a map is assumed to be additive in its second argument.
In a Cartesian difference category, just as for derivatives of change actions, while the differential is still required to preserve zeros in its second argument, it is only additive "up to a small perturbation", that is:
∂[f](x, y + z) = ∂[f](x, y) + ∂[f](x + ε(y), z)
[C∂.3] tells us what the differentials of the identity and projection maps are, while [C∂.4] says that the differential of a pairing of maps is the pairing of their differentials. [C∂.5] is the chain rule, which expresses the differential of a composition of maps:
∂[g ∘ f](x, y) = ∂[g](f(x), ∂[f](x, y))
[C∂.6] and [C∂.7] tell us how to work with second-order differentials. [C∂.6] is expressed as follows:
∂[∂[f]]((x, y), (0, z)) = ∂[f](x + ε(y), z)
and finally [C∂.7] is expressed as:
∂[∂[f]]((x, y), (z, 0)) = ∂[∂[f]]((x, z), (y, 0))
It is interesting to note that while [C∂.6] is different from [CD.6], its alternative version [C∂.6.a] is the same as [CD.6.a]:
∂[∂[f]]((x, 0), (0, y)) = ∂[f](x, y)

4.3 Another Look at Cartesian Differential Categories

Here we explain how a Cartesian differential category is a Cartesian difference category where the infinitesimal extension is given by zero.

Proposition 1. Every Cartesian differential category X with differential combinator D is a Cartesian difference category where the infinitesimal extension is defined as ε(f) = 0 and the difference combinator is defined to be the differential combinator, ∂ = D.

Proof. As noted before, the first two parts of [C∂.1], the second part of [C∂.2], and [C∂.3], [C∂.4], [C∂.5], and [C∂.7] are precisely the same as their Cartesian differential axiom counterparts. On the other hand, since ε(f) = 0, [C∂.0] and the third part of [C∂.1] trivially state that 0 = 0, while the first part of [C∂.2] and [C∂.6] end up being precisely the first part of [CD.2] and [CD.6]. Therefore, the differential combinator satisfies the Cartesian difference axioms and we conclude that a Cartesian differential category is a Cartesian difference category.
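At the other extreme from ε = 0, taking ε to be the identity gives the calculus of finite differences (cf. Section 5), where ∂[f](x, y) = f(x + y) − f(x). There the second-order axioms [C∂.6] and [C∂.7] become telescoping identities that can be verified mechanically. A sketch (the sample function and the encoding of pairs as separate arguments are our own choices):

```python
def diff(f):
    """Discrete difference combinator with ε = id: ∂[f](x, y) = f(x + y) - f(x)."""
    return lambda x, y: f(x + y) - f(x)

f = lambda x: x ** 3 + x            # an arbitrary sample map
df = diff(f)

# ∂[∂[f]] acts on a pair (x, y) and a pair-change (u, v); with ε = id the
# action ⊕ on A × A is component-wise addition, so:
d2f = lambda x, y, u, v: df(x + u, y + v) - df(x, y)

for x in range(-2, 3):
    for y in range(-2, 3):
        for z in range(-2, 3):
            assert f(x + y) == f(x) + df(x, y)        # [C∂.0] with ε = id
            assert d2f(x, y, 0, z) == df(x + y, z)    # [C∂.6]: ∂[∂[f]]((x,y),(0,z)) = ∂[f](x+ε(y),z)
            assert d2f(x, y, z, 0) == d2f(x, z, y, 0) # [C∂.7]: symmetry in y and z
            assert d2f(x, 0, 0, y) == df(x, y)        # [C∂.6.a]
```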
Conversely, one can always build a Cartesian differential category from a Cartesian difference category by considering the objects for which the infinitesimal extension is the zero map.

Proposition 2. For a Cartesian difference category X with infinitesimal extension ε and difference combinator ∂, the full subcategory X₀ of objects A such that ε(1_A) = 0 is a Cartesian differential category where the differential combinator is defined to be the difference combinator, D = ∂.

Proof. First note that if ε(1_A) = 0 and ε(1_B) = 0, then by definition it also follows that ε(1_{A×B}) = 0, and also that ε(1_⊤) = 0 for the terminal object by uniqueness of maps into the terminal object. Thus X₀ is closed under finite products and is therefore a Cartesian left additive category. Furthermore, we again note that since ε(f) = 0, for maps between such objects the Cartesian difference axioms are precisely the Cartesian differential axioms. Therefore, the difference combinator is a differential combinator for this subcategory, and so X₀ is a Cartesian differential category.

In any Cartesian difference category X, the terminal object always satisfies ε(1_⊤) = 0, and therefore X₀ is never empty. On the other hand, applying Proposition 2 to a Cartesian differential category results in the entire category. It is also important to note that the above two propositions do not imply that if a difference combinator is a differential combinator then the infinitesimal extension must be zero. In Section 5.3, we provide such an example: a Cartesian differential category that comes equipped with a non-zero infinitesimal extension such that the differential combinator is a difference combinator with respect to this non-zero infinitesimal extension.
4.4 Cartesian Difference Categories as Change Action Models

In this section, we show how every Cartesian difference category is a particularly well-behaved change action model, and conversely how every change action model contains a Cartesian difference category.

Proposition 3. Let X be a Cartesian difference category with infinitesimal extension ε and difference combinator ∂. Define the functor α : X → CAct(X) as α(A) = (A, A, ⊕_A, +_A, 0_A) (as defined in Lemma 3) and α(f) = (f, ∂[f]). Then (X, α : X → CAct(X)) is a change action model.

Proof. By Lemma 3, (A, A, ⊕_A, +_A, 0_A) is a change action, and so α is well-defined on objects. For a map f, ∂[f] is a derivative of f in the change action sense since [C∂.0] and [C∂.2] are precisely [CAD.1] and [CAD.2], and so α is well-defined on maps. That α preserves identities and composition follows from [C∂.3] and [C∂.5] respectively, and so α is a functor. That α preserves finite products follows from [C∂.3] and [C∂.4]. Lastly, it is clear that α is a section of the forgetful functor, and therefore we conclude that (X, α) is a change action model.

It is clear that not every change action model is a Cartesian difference category. For example, change action models do not require the addition to be commutative. On the other hand, it can be shown that every change action model contains a Cartesian difference category as a full subcategory.

Definition 9. Let (X, α : X → CAct(X)) be a change action model. An object A is flat whenever the following hold:
[F.1] ΔA = A
[F.2] α(⊕_A) = (⊕_A, ⊕_A ∘ π₁)
[F.3] 0_A ⊕ (0_A ⊕ f) = 0_A ⊕ f for any f : U → A
[F.4] ⊕_A is right-injective, that is, if ⊕_A ∘ ⟨f, g⟩ = ⊕_A ∘ ⟨f, h⟩ then g = h

We would like to show that for any change action model (X, α), its full subcategory of flat objects, Flat, is a Cartesian difference category. Starting with the finite product structure, since α preserves finite products, it is straightforward to see that the terminal object is flat and that if A and B are flat then so is A × B.
The sum of maps f : A → B and g : A → B in Flat is defined using the change action structure as f +_B g, while the zero map 0 : A → B is 0 = 0_B ∘ !_A. And so we obtain:

Lemma 6. Flat is a Cartesian left additive category.

Proof. Most of the Cartesian left additive structure is straightforward. However, since the addition is not required to be commutative for arbitrary change actions, we will show that the addition is commutative for flat objects. Using that ⊕_B is an action, that by [F.2] we have that ⊕_B ∘ π₁ is a derivative for ⊕_B, and [CAD.1], we obtain that:
0_B ⊕ (f + g) = (0_B ⊕ f) ⊕ g = (0_B ⊕ g) ⊕ f = 0_B ⊕ (g + f)
By [F.4], ⊕_B is right-injective, and we conclude that f + g = g + f.

As an immediate consequence, we note that for any change action model (X, α), since the terminal object is always flat, Flat is never empty. We use the action of the change action structure to define the infinitesimal extension. So for a map f : A → B in Flat, define ε(f) : A → B as follows:
ε(f) = ⊕_B ∘ ⟨0_B ∘ !_A, f⟩ = 0_B ⊕ f

Lemma 7. ε is an infinitesimal extension for Flat.

Proof. We show that ε preserves the addition. Following the same idea as in the proof of Lemma 6, we obtain the following:
0_B ⊕ ε(f + g) = 0_B ⊕ (0_B ⊕ (f + g))
= (0_B ⊕ 0_B) ⊕ ((0_B ⊕ f) ⊕ g) = (0_B ⊕ (0_B ⊕ f)) ⊕ (0_B ⊕ g)
= (0_B ⊕ ε(f)) ⊕ ε(g) = 0_B ⊕ (ε(f) + ε(g))
Then by [F.3], it follows that ε(f + g) = ε(f) + ε(g). The remaining infinitesimal extension axioms are proven in a similar fashion.

Lastly, the difference combinator for Flat is defined in the obvious way, that is, ∂[f] is defined as the second component of α(f).

Proposition 4. Let (X, α : X → CAct(X)) be a change action model. Then Flat is a Cartesian difference category.

Proof (Sketch). The full calculations will appear in an upcoming extended journal version of this paper, but we give an informal explanation.
[C∂.0] and [C∂.2] are straightforward consequences of [CAD.1] and [CAD.2]. [C∂.3] and [C∂.4] follow trivially from the fact that α preserves finite products and from the structure of products in CAct(X), while [C∂.5] follows from composition in CAct(X). [C∂.1], [C∂.6] and [C∂.7] are obtained by mechanical calculation in the spirit of Lemma 6. Note that every axiom except for [C∂.6] can be proven without using [F.3].

4.5 Linear Maps and ε-Linear Maps

An important subclass of maps in a Cartesian differential category is the subclass of linear maps [4, Definition 2.2.1]. One can also define linear maps in a Cartesian difference category by using the same definition.

Definition 10. In a Cartesian difference category, a map f is linear if the following equality holds: ∂[f] = f ∘ π₁.

Using element-like notation, a map f is linear if ∂[f](x, y) = f(y). Linear maps in a Cartesian difference category satisfy many of the same properties found in [4, Lemma 2.2.2].

Lemma 8. In a Cartesian difference category,
1. If f : A → B is linear then ε(f) = f ∘ ε(1_A);
2. If f : A → B is linear, then f is additive (Definition 1);
3. Identity maps, projection maps, and zero maps are linear;
4. The composite, sum, and pairing of linear maps is linear;
5. If f : A → B and k : C → D are linear, then for any map g : B → C, the following equality holds: ∂[k ∘ g ∘ f] = k ∘ ∂[g] ∘ (f × f);
6. If an isomorphism is linear, then its inverse is linear;
7. For any object A, ⊕_A and +_A are linear.

Using element-like notation, the first point of the above lemma says that if f is linear then f(ε(x)) = ε(f(x)). And while all linear maps are additive, the converse is not necessarily true; see [4, Corollary 2.3.4]. However, an immediate consequence of the above lemma is that the subcategory of linear maps of a Cartesian difference category has finite biproducts.
Another interesting subclass of maps is the subclass of ε-linear maps, which are maps whose infinitesimal extension is linear.

Definition 11. In a Cartesian difference category, a map f is ε-linear if ε(f) is linear.

Lemma 9. In a Cartesian difference category,
1. If f : A → B is ε-linear then f ∘ (x + ε(y)) = f ∘ x + ε(f) ∘ y;
2. Every linear map is ε-linear;
3. The composite, sum, and pairing of ε-linear maps is ε-linear;
4. If an isomorphism is ε-linear, then its inverse is again ε-linear.

Using element-like notation, the first point of the above lemma says that if f is ε-linear then f(x + ε(y)) = f(x) + ε(f(y)). So ε-linear maps are additive on "infinitesimal" elements (i.e. those of the form ε(y)). For a Cartesian differential category, linear maps in the Cartesian difference category sense are precisely the same as in the Cartesian differential category sense [4, Definition 2.2.1], while every map is ε-linear since ε = 0.

5 Examples of Cartesian Difference Categories

5.1 Smooth Functions

Every Cartesian differential category is a Cartesian difference category where the infinitesimal extension is zero. As a particular example, we consider the category of real smooth functions, which, as mentioned above, can be considered the canonical (and motivating) example of a Cartesian differential category.

Let ℝ be the set of real numbers and let SMOOTH be the category whose objects are the Euclidean spaces ℝⁿ (including the point ℝ⁰ = {∗}), and whose maps are smooth functions F : ℝⁿ → ℝᵐ. SMOOTH is a Cartesian left additive category where the product structure is given by the standard Cartesian product of Euclidean spaces and where the additive structure is defined by pointwise addition, (F + G)(x) = F(x) + G(x) and 0(x) = (0, ..., 0), where x ∈ ℝⁿ. SMOOTH is a Cartesian differential category where the differential combinator is defined by the directional derivative of smooth functions.
Explicitly, for a smooth function F : ℝⁿ → ℝᵐ, which is in fact a tuple of smooth functions F = (f₁, ..., f_m) where f_i : ℝⁿ → ℝ, the derivative D[F] : ℝⁿ × ℝⁿ → ℝᵐ is defined as follows:

   D[F](x, y) := ( Σ_{i=1}^{n} (∂f₁/∂u_i)(x) · y_i , ..., Σ_{i=1}^{n} (∂f_m/∂u_i)(x) · y_i )

where x = (x₁, ..., x_n), y = (y₁, ..., y_n) ∈ ℝⁿ. Alternatively, D[F] can also be defined in terms of the Jacobian matrix of F. Therefore SMOOTH is a Cartesian difference category with infinitesimal extension ε = 0 and with difference combinator D. Since ε = 0, the induced action is simply x ⊕ y = x. Also, a smooth function is linear in the Cartesian difference category sense precisely when it is ℝ-linear in the classical sense, and every smooth function is ε-linear.

5.2 Calculus of Finite Differences

Here we explain how the difference operator from the calculus of finite differences gives an example of a Cartesian difference category that is not a Cartesian differential category. This example was the main motivating example for developing Cartesian difference categories.

The calculus of finite differences is captured by the category of abelian groups and arbitrary set functions between them. Let Ab be the category whose objects are abelian groups G (where we use additive notation for the group structure) and where a map f : G → H is simply an arbitrary function between them (and therefore does not necessarily preserve the group structure). Ab is a Cartesian left additive category where the product structure is given by the standard Cartesian product of sets and where the additive structure is again given by pointwise addition, (f + g)(x) = f(x) + g(x) and 0(x) = 0. Ab is a Cartesian difference category where the infinitesimal extension is simply given by the identity, that is, ε(f) = f, and where the difference combinator ∂ is defined as follows for a map f : G → H:

   ∂[f](x, y) = f(x + y) − f(x)

On the other hand, ∂ is not a differential combinator for Ab, since it does not satisfy [CD.6] and part of [CD.2].
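The difference combinator on Ab can be checked concretely. The following Python sketch (ours, not from the paper) takes the abelian group to be the integers under addition and verifies both the linearity criterion ∂[f](x, y) = f(y) and the chain rule, which in element-like notation (with ε = id) reads ∂[g ∘ f](x, y) = ∂[g](f(x), ∂[f](x, y)):

```python
# Difference combinator of Ab, with the abelian group taken to be (Z, +):
#   d[f](x, y) = f(x + y) - f(x)
def d(f):
    return lambda x, y: f(x + y) - f(x)

square = lambda x: x * x
# d[square](x, y) = (x + y)^2 - x^2 = 2xy + y^2, which is not linear in y
assert d(square)(3, 2) == 2 * 3 * 2 + 2 * 2

double = lambda x: 2 * x          # a group homomorphism
# group homomorphisms are exactly the linear maps: d[f](x, y) = f(y)
assert d(double)(3, 2) == double(2)

# chain rule: d[g . f](x, y) = d[g](f(x), d[f](x, y))
cube = lambda x: x ** 3
comp = lambda x: cube(square(x))
for x in range(-4, 5):
    for y in range(-4, 5):
        assert d(comp)(x, y) == d(cube)(square(x), d(square)(x, y))
```

In Ab the chain rule holds exactly (not just to first order), because f(x + y) = f(x) + ∂[f](x, y) is literally the definition of ∂.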
Thanks to the addition of the infinitesimal extension, ∂ does satisfy [C∂.2] and [C∂.6], as well as [C∂.0]. It is also interesting to note, as observed in [5], that this ∂ does satisfy [CD.1], the second part of [CD.2], [CD.3], [CD.4], [CD.5], [CD.7], and [CD.6.a]. It is worth noting that in [5], the goal was to drop the addition and develop a "non-additive" version of Cartesian differential categories.

In Ab, since the infinitesimal extension is given by the identity, the induced action is simply the addition, x ⊕ y = x + y. On the other hand, the linear maps in Ab are precisely the group homomorphisms. Indeed, f is linear if ∂[f](x, y) = f(y). But by [C∂.0] and [C∂.2], we get that:

   f(x + y) = f(x) + ∂[f](x, y) = f(x) + f(y)
   f(0) = ∂[f](x, 0) = 0

So f is a group homomorphism. Conversely, if f is a group homomorphism:

   ∂[f](x, y) = f(x + y) − f(x) = f(x) + f(y) − f(x) = f(y)

So f is linear. Since ε(f) = f, the ε-linear maps are precisely the linear maps.

5.3 Module Morphisms

Here we provide a simple example of a Cartesian difference category whose difference combinator is also a differential combinator, but where the infinitesimal extension is neither zero nor the identity.

Let R be a commutative semiring and let MOD_R be the category of R-modules and R-linear maps between them. MOD_R has finite biproducts and is therefore a Cartesian left additive category where every map is additive. Every r ∈ R induces an infinitesimal extension ε_r defined by scalar multiplication, ε_r(f)(m) = r·f(m). Then MOD_R is a Cartesian difference category with the infinitesimal extension ε_r for any r ∈ R and difference combinator ∂ defined as:

   ∂[f](m, n) = f(n)

R-linearity of f assures that [C∂.0] holds, while the remaining Cartesian difference axioms hold trivially. In fact, ∂ is also a differential combinator, and therefore MOD_R is also a Cartesian differential category. The induced action is given by m ⊕ n = m + r·n.
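The role of R-linearity in [C∂.0], which in element-like notation reads f(m ⊕ n) = f(m) ⊕ ∂[f](m, n), i.e. f(m + r·n) = f(m) + r·∂[f](m, n), can be checked on a tiny instance. In this Python sketch (ours) we take R = ℤ, the module to be ℤ itself, and an arbitrarily chosen scalar r = 3:

```python
# MOD_R sketch: R = Z, module = Z, with an arbitrary choice of scalar r = 3
r = 3
eps = lambda f: (lambda m: r * f(m))    # infinitesimal extension: eps_r(f)(m) = r*f(m)
d = lambda f: (lambda m, n: f(n))       # difference combinator: d[f](m, n) = f(n)
f = lambda m: 5 * m                     # an R-linear map

# [C∂.0] here: f(m + r*n) = f(m) + r*d[f](m, n), which is exactly R-linearity of f
for m in range(-3, 4):
    for n in range(-3, 4):
        assert f(m + r * n) == f(m) + r * d(f)(m, n)
```

For a non-linear f (say f(m) = m²) the assertion fails, which matches the paper's point that [C∂.0] is precisely where R-linearity is used.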
By definition of ∂, every map in MOD_R is linear, and by definition of ε_r and R-linearity, every map is also ε_r-linear.

5.4 Stream Calculus

Here we show how one can extend the calculus of finite differences example to stream calculus. The differential calculus of causal functions and interesting applications thereof have recently been studied in [17, 18].

For a set A, let A^ω denote the set of infinite sequences of elements of A, where we write [aᵢ] for the infinite sequence [aᵢ] = (a₀, a₁, a₂, ...) and a_{i:j} for the (finite) subsequence (aᵢ, a_{i+1}, ..., a_j). A function f : A^ω → B^ω is causal whenever the n-th element f([aᵢ])_n of the output sequence only depends on the first n elements of [aᵢ]; that is, f is causal if and only if whenever a_{0:n} = b_{0:n} then f([aᵢ])_{0:n} = f([bᵢ])_{0:n}.

We now consider streams over abelian groups. Let Ab^ω be the category whose objects are all the abelian groups and whose morphisms G → H are causal maps from G^ω to H^ω. Ab^ω is a Cartesian left additive category, where the product is given by the standard product of abelian groups and where the additive structure is lifted pointwise from the structure of Ab, that is, (f + g)([aᵢ])_n = f([aᵢ])_n + g([aᵢ])_n and 0([aᵢ])_n = 0. In order to define the infinitesimal extension, we first need to define the truncation operator z. Let G be an abelian group and [aᵢ] ∈ G^ω; then define the sequence z([aᵢ]) as:

   z([aᵢ])₀ = 0        z([aᵢ])_{n+1} = a_{n+1}

The category Ab^ω is a Cartesian difference category where the infinitesimal extension is given by the truncation operator,

   ε(f)([aᵢ]) = z(f([aᵢ])),

and where the difference combinator ∂ is defined as follows:

   ∂[f]([aᵢ], [bᵢ])₀ = f([aᵢ] + [bᵢ])₀ − f([aᵢ])₀
   ∂[f]([aᵢ], [bᵢ])_{n+1} = f([aᵢ] + z([bᵢ]))_{n+1} − f([aᵢ])_{n+1}

Note the similarities between the difference combinator on Ab^ω and that on Ab.
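On finite prefixes, the truncation operator and the stream difference combinator can be sketched directly. In this Python illustration (ours; streams are cut to finite lists, which is harmless for causal maps since outputs only depend on input prefixes):

```python
def z(a):
    # truncation: z([a])_0 = 0, z([a])_{n+1} = a_{n+1}
    return [0] + a[1:]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def d(f):
    # d[f]([a], [b])_0     = f([a] + [b])_0        - f([a])_0
    # d[f]([a], [b])_{n+1} = f([a] + z([b]))_{n+1} - f([a])_{n+1}
    def df(a, b):
        fa = f(a)
        head = f(add(a, b))[0] - fa[0]
        tail = [u - v for u, v in zip(f(add(a, z(b))), fa)][1:]
        return [head] + tail
    return df

# a causal map: pointwise squaring
sq = lambda a: [x * x for x in a]
a, b = [1, 2, 3], [10, 20, 30]
# position 0 sees b_0, while positions n+1 only see the tail of b via z(b)
assert d(sq)(a, b) == [(1 + 10) ** 2 - 1, (2 + 20) ** 2 - 4, (3 + 30) ** 2 - 9]
```

For a pointwise map the two clauses happen to agree with the Ab combinator applied index by index; the truncation only matters for causal maps that mix different positions of the input.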
The induced action computes out to:

   ([aᵢ] ⊕ [bᵢ])₀ = a₀        ([aᵢ] ⊕ [bᵢ])_{n+1} = a_{n+1} + b_{n+1}

A causal map is linear (in the Cartesian difference category sense) if and only if it is a group homomorphism, while a causal map f is ε-linear if and only if it is a group homomorphism that does not depend on the 0-th term of its input, that is, f([aᵢ]) = f(z([aᵢ])).

6 Tangent Bundles in Cartesian Difference Categories

In this section, we show that the difference combinator of a Cartesian difference category induces a monad, called the tangent monad, whose Kleisli category is again a Cartesian difference category. This construction is a generalization of the tangent monad for Cartesian differential categories [7, 15]. However, the Kleisli category of the tangent monad of a Cartesian differential category is not a Cartesian differential category, but rather a Cartesian difference category.

6.1 The Tangent Bundle Monad

Let X be a Cartesian difference category with infinitesimal extension ε and difference combinator ∂. Define the functor T : X → X as follows:

   T(A) = A × A        T(f) = ⟨f ∘ π₀, ∂[f]⟩

and define the natural transformations η : 1 ⇒ T and μ : T² ⇒ T as follows:

   η_A := ⟨1_A, 0⟩        μ_A := ⟨π₀ ∘ π₀, π₁ ∘ π₀ + π₀ ∘ π₁ + ε(π₁ ∘ π₁)⟩

Proposition 5. (T, μ, η) is a monad.

Proof. Functoriality of T follows from [C∂.3] and the chain rule [C∂.5]. Naturality of η and μ and the monad identities follow from the remaining difference combinator axioms. The full lengthy brute-force calculations will appear in an upcoming extended journal version of this paper.

When X is a Cartesian differential category with the difference structure arising from setting ε = 0, this tangent bundle monad coincides with the standard tangent monad corresponding to its tangent category structure [7, 15].
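In Ab (where ε = id) the tangent functor can be written down directly; the following Python sketch (ours) checks functoriality of T and naturality of η on sample inputs:

```python
def d(f):
    # difference combinator of Ab: d[f](x, y) = f(x + y) - f(x)
    return lambda x, y: f(x + y) - f(x)

def T(f):
    # T(f)(a, da) = (f(a), d[f](a, da)), acting on the "tangent bundle" A x A
    return lambda a, da: (f(a), d(f)(a, da))

eta = lambda a: (a, 0)            # the unit eta = <1, 0>

f = lambda x: x * x
g = lambda x: x + 7
for a in range(-3, 4):
    for da in range(-3, 4):
        # functoriality: T(g . f) = T(g) . T(f)
        assert T(lambda x: g(f(x)))(a, da) == T(g)(*T(f)(a, da))
        # naturality of eta: T(f)(eta(a)) = eta(f(a))
        assert T(f)(*eta(a)) == eta(f(a))
```

When ε = 0 instead (the differential case), the same T is essentially forward-mode differentiation on dual-number-like pairs (value, tangent).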
6.2 The Kleisli Category of T

Recall that the Kleisli category of the monad (T, μ, η) is defined as the category X_T whose objects are the objects of X, and where a map A → B in X_T is a map f : A → T(B) in X, that is, a pair f = ⟨f₀, f₁⟩ where f₀, f₁ : A → B. The identity map in X_T is the monad unit η_A : A → T(A), while the composition of Kleisli maps f : A → T(B) and g : B → T(C) is defined as the composite μ_C ∘ T(g) ∘ f. To distinguish between composition in X and X_T, we denote Kleisli composition as g ∘^T f = μ_C ∘ T(g) ∘ f. If f = ⟨f₀, f₁⟩ and g = ⟨g₀, g₁⟩, then their Kleisli composition can be explicitly computed out to be:

   g ∘^T f = ⟨g₀, g₁⟩ ∘^T ⟨f₀, f₁⟩ = ⟨g₀ ∘ f₀, ∂[g₀] ∘ ⟨f₀, f₁⟩ + g₁ ∘ (f₀ + ε(f₁))⟩

Kleisli maps can be understood as "generalized" vector fields. Indeed, T(A) should be thought of as the tangent bundle over A, and therefore a vector field would be a map ⟨1, f⟩ : A → T(A), which is of course also a Kleisli map. For more details on the intuition behind this Kleisli category, see [7]. We now wish to explain how the Kleisli category is again a Cartesian difference category.

We begin by exhibiting the Cartesian left additive structure of the Kleisli category. The product of objects in X_T is defined as A × B, with projections π₀^T : A × B → T(A) and π₁^T : A × B → T(B) defined respectively as π₀^T = ⟨π₀, 0⟩ and π₁^T = ⟨π₁, 0⟩. The pairing of Kleisli maps f = ⟨f₀, f₁⟩ and g = ⟨g₀, g₁⟩ is defined as ⟨f, g⟩ = ⟨⟨f₀, g₀⟩, ⟨f₁, g₁⟩⟩. The terminal object is again that of X, and the unique map to the terminal object is !^T = 0. The sum of Kleisli maps f = ⟨f₀, f₁⟩ and g = ⟨g₀, g₁⟩ is defined as f +^T g = f + g = ⟨f₀ + g₀, f₁ + g₁⟩, and the zero Kleisli map is simply 0^T = 0 = ⟨0, 0⟩. Therefore we conclude that the Kleisli category of the tangent monad is a Cartesian left additive category.

Lemma 10. X_T is a Cartesian left additive category.
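Again in Ab (where ε = id), the explicit formula for Kleisli composition can be compared against the defining composite μ ∘ T(g) ∘ f. The sketch below (ours) checks that the two agree on sample maps:

```python
def d(f):
    return lambda x, y: f(x + y) - f(x)

# Kleisli maps A -> T(B) as pairs (f0, f1); in Ab the extension eps is the identity
def kleisli(g, f):
    # the explicit formula: <g0 . f0, d[g0] . <f0, f1> + g1 . (f0 + eps(f1))>
    g0, g1 = g
    f0, f1 = f
    return (lambda x: g0(f0(x)),
            lambda x: d(g0)(f0(x), f1(x)) + g1(f0(x) + f1(x)))

def kleisli_via_monad(g, f):
    # mu . T(g) . f, where mu((a, b), (c, e)) = (a, b + c + eps(e)) and eps = id
    g0, g1 = g
    f0, f1 = f
    def snd(x):
        a, da = f0(x), f1(x)
        return g1(a) + d(g0)(a, da) + d(g1)(a, da)
    return (lambda x: g0(f0(x)), snd)

f = (lambda x: x * x, lambda x: 3 * x)
g = (lambda x: x + 1, lambda x: x * x)
h, k = kleisli(g, f), kleisli_via_monad(g, f)
assert all(h[0](x) == k[0](x) and h[1](x) == k[1](x) for x in range(-5, 6))
```

The agreement rests on g₁(a + da) = g₁(a) + ∂[g₁](a, da), which holds in Ab by definition of ∂.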
The infinitesimal extension ε^T for the Kleisli category is defined as follows for a Kleisli map f = ⟨f₀, f₁⟩:

   ε^T(f) = ⟨0, f₀ + ε(f₁)⟩

Lemma 11. ε^T is an infinitesimal extension on X_T.

It is interesting to point out that for an object A, the induced action ⊕_A^T can be computed out to be:

   ⊕_A^T = π₀^T + ε^T(π₁^T) = ⟨π₀, 0⟩ + ⟨0, π₁⟩ = ⟨π₀, π₁⟩ = 1_{T(A)}

and we stress that this is the identity on T(A) in the base category X (but not in the Kleisli category).

To define the difference combinator for the Kleisli category, first note that difference combinators by definition do not change the codomain. That is, if f : A → T(B) is a Kleisli arrow, then its derivative qua Kleisli arrow should be a map A × A → B in X_T, i.e. a map A × A → T(B) in X, which coincides with the type of its derivative in X. Therefore, the difference combinator ∂^T for the Kleisli category can be defined to be the difference combinator of the base category, that is, for a Kleisli map f = ⟨f₀, f₁⟩:

   ∂^T[f] = ∂[f] = ⟨∂[f₀], ∂[f₁]⟩

Proposition 6. For a Cartesian difference category X, the Kleisli category X_T is a Cartesian difference category with infinitesimal extension ε^T and difference combinator ∂^T.

Proof. The full lengthy brute-force calculations will appear in an upcoming extended journal version of this paper. We do note that a crucial identity for this proof is that for any map f in X, the following equality holds:

   T(∂[f]) = ∂[T(f)] ∘ ⟨π₀ × π₀, π₁ × π₁⟩

This helps simplify many of the calculations for the difference combinator axioms, since T(∂[f]) appears everywhere due to the definition of Kleisli composition. As a result, the Kleisli category of a Cartesian difference category is again a Cartesian difference category, whose infinitesimal extension is neither the identity nor the zero map.
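The crucial identity relating T(∂[f]) and ∂[T(f)] can be tested pointwise in Ab. In the sketch below (ours), both sides are evaluated on sample pairs, with the left side computed from ∂[f] viewed as a map out of G × G and the right side from T(f):

```python
def d(f, plus=lambda u, v: u + v):
    # difference combinator; `plus` is the group addition of the domain
    return lambda x, y: f(plus(x, y)) - f(x)

f = lambda x: x ** 3
padd = lambda u, v: (u[0] + v[0], u[1] + v[1])   # addition on G x G

df = lambda p: d(f)(p[0], p[1])                   # d[f] as a map G x G -> H
Tf = lambda a, da: (f(a), d(f)(a, da))            # T(f)

for (x, y, dx, dy) in [(1, 2, 3, 4), (0, 5, -2, 7), (3, -1, 1, 1)]:
    # left side: T(d[f]) applied to ((x, y), (dx, dy))
    lhs = (df((x, y)), d(df, padd)((x, y), (dx, dy)))
    # right side: d[T(f)] . <pi0 x pi0, pi1 x pi1>, i.e. d[T(f)]((x, dx), (y, dy)),
    # computed componentwise as T(f)(x + y, dx + dy) - T(f)(x, dx)
    at_sum = Tf(x + y, dx + dy)
    at_base = Tf(x, dx)
    rhs = (at_sum[0] - at_base[0], at_sum[1] - at_base[1])
    assert lhs == rhs
```

Both sides reduce to the second-order difference f(x+y+dx+dy) − f(x+y) − f(x+dx) + f(x) in their second component, which is why the identity holds exactly here.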
This allows one to build numerous examples of interesting and exotic Cartesian difference categories, such as the Kleisli category of a Cartesian differential category (or, iterating this process, the Kleisli category of the Kleisli category). We highlight the importance of this construction in the Cartesian differential case, as it does not in general result in a Cartesian differential category: even when ε = 0, the induced extension ε^T is not the zero map.

We conclude this section by taking a look at the linear maps and the ε^T-linear maps in the Kleisli category. A Kleisli map f = ⟨f₀, f₁⟩ is linear in the Kleisli category if ∂^T[f] = f ∘^T π₁^T, which amounts to requiring that:

   ⟨∂[f₀], ∂[f₁]⟩ = ⟨f₀ ∘ π₁, f₁ ∘ π₁⟩

Therefore a Kleisli map is linear in the Kleisli category if and only if it is the pairing of maps which are linear in the base category. On the other hand, f is ε^T-linear if ε^T(f) = ⟨0, f₀ + ε(f₁)⟩ is linear in the Kleisli category, which in this case amounts to requiring that f₀ + ε(f₁) is linear. Therefore, if f₀ is linear and f₁ is ε-linear, then f is ε^T-linear.

7 Conclusions and Future Work

We have presented Cartesian difference categories, which generalize Cartesian differential categories to account for more discrete definitions of derivatives while providing additional structure that is absent in change action models. We have also exhibited important examples and shown that Cartesian difference categories arise quite naturally from considering tangent bundles in any Cartesian differential category. We claim that Cartesian difference categories can facilitate the exploration of differentiation in discrete spaces, by generalizing techniques and ideas from the study of their differential counterparts. For example, Cartesian differential categories can be extended to allow objects whose tangent space is not necessarily isomorphic to the object itself [9].
The same generalization could be applied to Cartesian difference categories, with some caveats: for example, the equation defining a linear map (Definition 10) becomes ill-typed, but the notion of ε-linear map remains meaningful.

Another relevant path to consider is developing the analogue of the "tensor" story for Cartesian difference categories. Indeed, an important source of examples of Cartesian differential categories are the coKleisli categories of tensor differential categories [3, 4]. A similar result likely holds for a hypothetical "tensor difference category", but it is not clear how these should be defined: [C∂.2] implies that derivatives in the difference sense are non-linear, and therefore their interplay with the tensor structure will be much different.

A further generalization of Cartesian differential categories, categories with tangent structure [7], are defined directly in terms of a tangent bundle functor, rather than requiring that every tangent bundle be trivial (that is, in a tangent category it may not be the case that TA = A × A). Some preliminary research on change actions has already shown that, when generalized in this way, change actions are precisely internal categories, but the consequences of this for change action models (and, a fortiori, Cartesian difference categories) are not yet understood. More recently, some work has emerged about differential equations using the language of tangent categories [8]. We believe similar techniques can be applied in a straightforward way to Cartesian difference categories, where they might be of use to give an abstract formalization of discrete dynamical systems and difference equations.

An important open question is whether Cartesian difference categories (or a similar notion) admit an internal language. It is well known that the differential λ-calculus can be interpreted in Cartesian closed differential categories [14].
Given their similarities, we believe there is a very similar "difference λ-calculus" which could potentially have applications to automatic differentiation (change structures, a notion similar to change actions, have already been proposed as models of forward-mode automatic differentiation [12], although work in the area seems to have stagnated).

Lastly, we should mention that there are adjunctions between the categories of Cartesian difference categories, change action models, and Cartesian differential categories, given by Propositions 1, 2, 3, and 4. These adjunctions will be explored in detail in the upcoming journal version of this paper.

References

1. Alvarez-Picallo, M., Eyers-Taylor, A., Jones, M.P., Ong, C.H.L.: Fixing incremental computation. In: European Symposium on Programming. pp. 525–552. Springer (2019)
2. Alvarez-Picallo, M., Ong, C.H.L.: Change actions: models of generalised differentiation. In: International Conference on Foundations of Software Science and Computation Structures. pp. 45–61. Springer (2019)
3. Blute, R.F., Cockett, J.R.B., Seely, R.A.G.: Differential categories. Mathematical Structures in Computer Science 16(06), 1049–1083 (2006)
4. Blute, R.F., Cockett, J.R.B., Seely, R.A.G.: Cartesian differential categories. Theory and Applications of Categories 22(23), 622–672 (2009)
5. Bradet-Legris, J., Reid, H.: Differential forms in non-linear Cartesian differential categories (2018), Foundational Methods in Computer Science
6. Cai, Y., Giarrusso, P.G., Rendel, T., Ostermann, K.: A theory of changes for higher-order languages: Incrementalizing λ-calculi by static differentiation. In: ACM SIGPLAN Notices. vol. 49, pp. 145–155. ACM (2014)
7. Cockett, J.R.B., Cruttwell, G.S.H.: Differential structure, tangent structure, and SDG. Applied Categorical Structures 22(2), 331–417 (2014)
8. Cockett, J., Cruttwell, G.: Connections in tangent categories.
Theory and Applications of Categories 32(26), 835–888 (2017)
9. Cruttwell, G.S.: Cartesian differential categories revisited. Mathematical Structures in Computer Science 27(1), 70–91 (2017)
10. Ehrhard, T., Regnier, L.: The differential lambda-calculus. Theoretical Computer Science 309(1), 1–41 (2003)
11. Ehrhard, T.: An introduction to differential linear logic: proof-nets, models and antiderivatives. Mathematical Structures in Computer Science 28(7), 995–1060 (2018)
12. Kelly, R., Pearlmutter, B.A., Siskind, J.M.: Evolving the incremental λ-calculus into a model of forward automatic differentiation (AD). arXiv preprint arXiv:1611.03429 (2016)
13. Kock, A.: Synthetic differential geometry, vol. 333. Cambridge University Press (2006)
14. Manzonetto, G.: What is a categorical model of the differential and the resource λ-calculi? Mathematical Structures in Computer Science 22(3), 451–520 (2012)
15. Manzyuk, O.: Tangent bundles in differential lambda-categories. arXiv preprint arXiv:1202.0411 (2012)
16. Richardson, C.H.: An introduction to the calculus of finite differences. Van Nostrand (1954)
17. Sprunger, D., Jacobs, B.: The differential calculus of causal functions. arXiv preprint arXiv:1904.10611 (2019)
18. Sprunger, D., Katsumata, S.: Differentiable causal computations via delayed trace. In: 2019 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS). pp. 1–12. IEEE (2019)
19. Steinbach, B., Posthoff, C.: Boolean differential calculus. In: Logic Functions and Equations, pp. 75–103. Springer (2009)

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Contextual Equivalence for Signal Flow Graphs

Filippo Bonchi¹, Robin Piedeleu², Paweł Sobociński³, and Fabio Zanasi²

¹ Università di Pisa, Italy
² University College London, UK, {r.piedeleu, f.zanasi}@ucl.ac.uk
³ Tallinn University of Technology, Estonia

Abstract. We extend the signal flow calculus—a compositional account of the classical signal flow graph model of computation—to encompass affine behaviour, and furnish it with a novel operational semantics. The increased expressive power allows us to define a canonical notion of contextual equivalence, which we show to coincide with denotational equality. Finally, we characterise the realisable fragment of the calculus: those terms that express the computations of (affine) signal flow graphs.

Keywords: signal flow graphs · affine relations · full abstraction · contextual equivalence · string diagrams

1 Introduction

Compositional accounts of models of computation often lead one to consider relational models, because a decomposition of an input-output system might consist of internal parts where flow and causality are not always easy to assign.
These insights led Willems [33] to introduce a new current of control theory, called behavioural control: roughly speaking, behaviours and observations are of prime concern, while notions such as state, inputs or outputs are secondary. Independently, programming language theory converged on similar ideas, with contextual equivalence [25,28] often considered as the equivalence: programs are judged to be different if we can find some context in which one behaves differently from the other, and what is observed about "behaviour" is often something quite canonical and simple, such as termination. Hoare [17] and Milner [23] discovered that these programming language theory innovations also bore fruit in the non-deterministic context of concurrency. Here again, research converged on studying simple and canonical contextual equivalences [24,18].

This paper brings together all of the above threads. The model of computation of interest for us is that of signal flow graphs [32,21], which are feedback systems well known in control theory [21] and widely used in the modelling of linear dynamical systems (in continuous time) and signal processing circuits (in discrete time). The signal flow calculus [10,9] is a syntactic presentation with an underlying compositional denotational semantics in terms of linear relations. Armed with string diagrams [31] as a syntax, the tools and concepts of programming language theory and concurrency theory can be put to work and the calculus can be equipped with a structural operational semantics.

⋆ Supported by EPSRC grant EP/R020604/1.
⋆⋆ Supported by the ESF funded Estonian IT Academy research measure (project 2014-2020.4.05.19-0001).

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 77–96, 2020. https://doi.org/10.1007/978-3-030-45231-5_5
However, while in previous work [9] a connection was made between operational equivalence (essentially trace equivalence) and denotational equality, the signal flow calculus was not quite expressive enough for contextual equivalence to be a useful notion.

The crucial step turns out to be moving from linear relations to affine relations, i.e. linear subspaces translated by a vector. In recent work [6], we showed that they can be used to study important physical phenomena, such as current and voltage sources in electrical engineering, as well as fundamental synchronisation primitives in concurrency, such as mutual exclusion. Here we show that, in addition to yielding compelling mathematical domains, affinity proves to be the magic ingredient that ties the different components of the story of signal flow graphs together: it provides us with a canonical and simple notion of observation to use for the definition of contextual equivalence, and gives us the expressive power to prove a bona fide full abstraction result that relates contextual equivalence with denotational equality.

To obtain the above result, we extend the signal flow calculus to handle affine behaviour. While the denotational semantics and axiomatic theory appeared in [6], the operational account appears here for the first time and requires some technical innovations: instead of traces, we consider trajectories, which are infinite traces that may start in the past. To record the time, states of our transition system have a runtime environment that keeps track of the global clock. Because the affine signal flow calculus is oblivious to flow directionality, some terms exhibit pathological operational behaviour. We illustrate these phenomena with several examples. Nevertheless, for the linear sub-calculus, it is known [9] that every term is denotationally equal to an executable realisation: one that is in a form where a consistent flow can be identified, like the classical notion of signal flow graph.
We show that the question has a more subtle answer in the affine extension: not all terms are realisable as (affine) signal flow graphs. However, we are able to characterise the class of diagrams for which this is true.

Related work. Several authors have studied signal flow graphs by exploiting concepts and techniques of programming language semantics, see e.g. [4,22,29,2]. The most relevant for this paper is [2], which, independently from [10], proposed the same syntax and axiomatisation for the ordinary signal flow calculus, and which shares with our contribution the same methodology: the use of string diagrams as a mathematical playground for the compositional study of different sorts of systems. The idea is common to diverse, cross-disciplinary research programmes, including Categorical Quantum Mechanics [1,11,12], Categorical Network Theory [3], Monoidal Computer [26,27] and the analysis of (a)synchronous circuits [14,15].

Outline. In Section 2 we recall the affine signal flow calculus. Section 3 introduces the operational semantics for the calculus. Section 4 defines contextual equivalence and proves full abstraction. Section 5 introduces a well-behaved class of circuits, which denote functional input-output systems, laying the groundwork for Section 6, in which the concept of realisability is introduced before a characterisation of which circuit diagrams are realisable. Missing proofs can be found in the extended version of this paper [7].

2 Background: the Affine Signal Flow Calculus

The affine signal flow calculus extends the signal flow calculus [9] with an extra generator that allows one to express affine relations. In this section, we first recall its syntax and denotational semantics from [6] and then highlight two key properties, enabled by the affine extension, that are used for proving full abstraction. The operational semantics is delayed to the next section.
Fig. 1. Sort inference rules. [The generators, rendered as string diagrams in the original, are assigned the sorts (1, 2), (1, 1) (for k and x), (2, 1), (1, 0) and (0, 1) in row (1), their mirror-image sorts in row (2), and (0, 0), (2, 2), (1, 1) for the empty diagram, symmetry and identity; composites are sorted by: if c : (n, z) and d : (z, m) then c ; d : (n, m), and if c : (n, m) and d : (r, z) then c ⊕ d : (n + r, m + z).]

2.1 Syntax

[Grammar (1)–(3): the generators of the calculus are string-diagram constants, not reproduced here; row (2) consists of the generators of row (1) reflected about the y-axis, and row (3) closes the syntax under identity, symmetry, the empty circuit, c ⊕ c and c ; c.]

The syntax of the calculus, generated by the grammar above, is parametrised over a given field k, with k ranging over k. We refer to the constants in rows (1)–(2) as generators. Terms are constructed from generators, identities, symmetries, the empty circuit, and the two binary operations in (3). We will only consider those terms that are sortable, i.e. that can be associated with a pair (n, m), with n, m ∈ ℕ. Sortable terms are called circuits: intuitively, a circuit with sort (n, m) has n ports on the left and m on the right. The sorting discipline is given in Fig. 1. We delay discussion of computational intuitions to Section 3 but, for the time being, we observe that the generators of row (2) are those of row (1) "reflected about the y-axis".

2.2 String Diagrams

It is convenient to consider circuits as the arrows of a symmetric monoidal category ACirc (for Affine Circuits). Objects of ACirc are natural numbers (thus ACirc is a prop [19]) and morphisms n → m are the circuits of sort (n, m), quotiented by the laws of symmetric monoidal categories [20,31]. The circuit grammar yields the symmetric monoidal structure of ACirc: sequential composition is given by c ; d, the monoidal product by c ⊕ d, and identities and symmetries are built by pasting together the identity and symmetry generators in the obvious way. We will adopt the usual convention of writing morphisms of ACirc as string diagrams: c ; c′ is drawn by horizontal juxtaposition and c ⊕ c′ by vertical stacking. More succinctly, ACirc is the free prop on the generators (1)–(2). The free prop on (1)–(2) without the two affine generators, hereafter called Circ, is the signal flow calculus from [9].

Example 1.
The diagram represents a circuit of sort (1, 1), built from the generators by means of ; and ⊕ (the string diagram and its term decomposition are omitted).

2.3 Denotational Semantics and Axiomatisation

The semantics of circuits can be given denotationally by means of affine relations.

Definition 1. Let k be a field. An affine subspace of k^d is a subset V ⊆ k^d that is either empty or for which there exist a vector a ∈ k^d and a linear subspace L of k^d such that V = {a + v | v ∈ L}. A k-affine relation of type n → m is an affine subspace of k^n × k^m, considered as a k-vector space.

Note that every linear subspace is affine, taking a above to be the zero vector. Affine relations can be organised into a prop:

Definition 2. Let k be a field. Let ARel_k be the following prop:
– arrows n → m are k-affine relations of type n → m.
– composition is relational: given G ⊆ k^n × k^m and H ⊆ k^m × k^l, their composite is G ; H := {(u, w) | ∃v. (u, v) ∈ G ∧ (v, w) ∈ H}.
– the monoidal product is given by G ⊕ H := {((u, u′), (v, v′)) | (u, v) ∈ G, (u′, v′) ∈ H}, stacking the vectors componentwise.

In order to give semantics to ACirc, we use the prop of affine relations over the field k(x) of fractions of polynomials in x with coefficients from k. Elements q ∈ k(x) are fractions

  q = (k_0 + k_1·x + k_2·x^2 + ··· + k_n·x^n) / (l_0 + l_1·x + l_2·x^2 + ··· + l_m·x^m)

for some n, m ∈ N and k_i, l_i ∈ k. Sum, product, 0 and 1 in k(x) are defined as usual. This quotient is harmless: both the denotational semantics from [6] and the operational semantics we introduce in this paper satisfy those axioms on the nose.

Definition 3. The prop morphism [[·]] : ACirc → ARel_{k(x)} is inductively defined on circuits as follows. For the generators in (1):

– the copier ↦ {(p, (p, p)) | p ∈ k(x)}
– the adder ↦ {((p, q), p + q) | p, q ∈ k(x)}
– the discarder ↦ {(p, •) | p ∈ k(x)}
– the zero generator ↦ {(•, 0)}
– the affine generator ↦ {(•, 1)}
– the amplifier r ↦ {(p, p · r) | p ∈ k(x)}
– the register x ↦ {(p, p · x) | p ∈ k(x)}

where • is the only element of k(x)^0. The semantics of the components in (2) is symmetric: e.g. the mirrored discarder
is mapped to {(•, p) | p ∈ k(x)}, swapping the two components of each pair. For (3):

– the identity ↦ {(p, p) | p ∈ k(x)}
– the twist ↦ {((p, q), (q, p)) | p, q ∈ k(x)}
– the empty circuit ↦ {(•, •)}
– c_1 ⊕ c_2 ↦ [[c_1]] ⊕ [[c_2]]
– c_1 ; c_2 ↦ [[c_1]] ; [[c_2]]

The reader can easily check that the pair of 1-dimensional vectors (1, 1/(1−x)) ∈ k(x)^1 × k(x)^1 belongs to the denotation of the circuit in Example 1.

The denotational semantics enjoys a sound and complete axiomatisation. The axioms involve only basic interactions between the generators (1)–(2). The resulting theory is that of Affine Interacting Hopf Algebras (aIH). The generators in (1) form a Hopf algebra, those in (2) form another Hopf algebra, and the interaction of the two gives rise to two Frobenius algebras. We refer the reader to [6] for the full set of equations and all further details.

Proposition 1. For all c, d in ACirc, [[c]] = [[d]] if and only if c =_aIH d.

2.4 Affine vs Linear Circuits

It is important to highlight the differences between ACirc and Circ. The latter is the purely linear fragment: circuit diagrams of Circ denote exactly the linear relations over k(x) [8], while those of ACirc denote the affine relations over k(x). The additional expressivity afforded by affine circuits is essential for our development. One crucial property is that every polynomial fraction can be expressed as an affine circuit of sort (0, 1).

Lemma 1. For all p ∈ k(x), there is c_p ∈ ACirc[0, 1] with [[c_p]] = {(•, p)}.

Proof. For each p ∈ k(x), let P be the linear subspace generated by the pair of 1-dimensional vectors (1, p). By fullness of the denotational semantics of Circ [8], there exists a circuit c in Circ such that [[c]] = P. Precomposing c with the affine generator then yields a circuit of sort (0, 1) whose denotation is {(•, 1)} ; P = {(•, p)}.

The above observation yields the following:

Proposition 2. Let (u, v) ∈ k(x)^n × k(x)^m. There exist circuits c_u ∈ ACirc[0, n] and c_v ∈ ACirc[m, 0] such that [[c_u]] = {(•, u)} and [[c_v]] = {(v, •)}.

Proof. Let u = (p_1, ..., p_n)^T and v = (q_1, ..., q_m)^T. By Lemma 1, for each p_i there exists a circuit c_{p_i} such that [[c_{p_i}]] = {(•, p_i)}. Let c_u = c_{p_1} ⊕ ···
⊕ c_{p_n}. Then [[c_u]] = {(•, u)}. For c_v, it is enough to see that Lemma 1 also holds with the sorts 0 and 1 switched, and then use the argument above.

Proposition 2 asserts that any behaviour (u, v) occurring in the denotation of some circuit c, i.e., such that (u, v) ∈ [[c]], can be expressed by a pair of circuits (c_u, c_v). We will, in due course, think of such a pair as a context, namely an environment with which a circuit can interact. Observe that this is not possible within the linear fragment Circ, since the only singleton linear subspace is 0.

Another difference between the linear and affine settings concerns circuits of sort (0, 0). Indeed k(x)^0 = {•}, and the only linear relation over k(x)^0 × k(x)^0 is the singleton {(•, •)}, which is id_0 in ARel_{k(x)}. But there is another affine relation, namely the empty relation ∅ ⊆ k(x)^0 × k(x)^0. This can be represented, for instance, by the composite of the affine generator with the mirrored zero generator, since its denotation is {(•, 1)} ; {(0, •)} = ∅.

Proposition 3. Let c ∈ ACirc[0, 0]. Then [[c]] is either id_0 or ∅.

3 Operational Semantics for Affine Circuits

Here we give the structural operational semantics of affine circuits, building on previous work [9] that considered only the core linear fragment, Circ. We consider circuits to be programs that have an observable behaviour. Observations are possible interactions at the circuit's interface. Since there are two interfaces, a left one and a right one, each transition has two labels. In a transition t ⊢ c −v/w→ t′ ⊢ c′ (with v the label written above the arrow and w the one below in the original two-dimensional notation), c and c′ are states, that is, circuits augmented with information about which values k ∈ k are stored in each register at that instant of the computation. When transitioning to c′, the label v is a vector of values with which c synchronises on the left, and w accounts for the synchronisation on the right. States are decorated with runtime contexts: t and t′ are (possibly negative) integers that, intuitively, indicate the time at which the transition happens. Indeed, in Fig. 2, every rule advances time by 1 unit.
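As a toy illustration of these timed transitions, the following Python sketch executes a few of the generators of Fig. 2 under a fixed left-to-right reading. This is our own simplification, not the paper's semantics (the rules are relational and impose no flow direction), and all names below are ours.

```python
# Toy left-to-right simulator for some generators of Fig. 2.

class Register:
    """Synchronous one-place buffer x: emits the stored value, stores the input."""
    def __init__(self, init=0):
        self.value = init          # registers store 0 in the initial state

    def step(self, k):
        out, self.value = self.value, k
        return out

def one(t):
    """The affine generator: emits 1 exactly at time t = 0, and 0 otherwise."""
    return 1 if t == 0 else 0

def feedback_run(inputs):
    """An adder whose output is fed back through a register: y_t = u_t + y_{t-1}.

    On the input stream 1, 0, 0, ... this produces 1, 1, 1, ..., the coefficient
    stream of 1/(1 - x); compare the denotation given for Example 1."""
    reg, outs = Register(), []
    for u in inputs:
        y = u + reg.value          # adder: input plus the fed-back register output
        outs.append(y)
        reg.step(y)                # the register delays y by one time step
    return outs
```

For instance, `feedback_run([1, 0, 0, 0])` yields `[1, 1, 1, 1]`.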
"Negative time" is important: as we shall see in Example 3, some executions must start in the past. The rules in the top section of Fig. 2 provide the semantics for the generators in (1): the copier duplicates the signal arriving on the left; the discarder accepts any signal on the left and discards it, producing nothing on the right; the adder takes two signals on the left and emits their sum on the right; the zero generator emits the constant 0 signal on the right; the amplifier k multiplies the signal on the left by the scalar k ∈ k. All the generators described so far are stateless. State is provided by the register x, a synchronous one-place buffer: with the value l stored, when it receives some value k on the left, it emits l on the right and stores k. The behaviour of the affine generator depends on the time: when t = 0 it emits 1, otherwise it emits 0. Observe that the behaviour of all other generators is time-independent.

Fig. 2. Structural rules for the operational semantics, with t ∈ Z, k, l ranging over k and u, v, w vectors of elements of k of the appropriate size. The only vector of k^0 is written • (as in Definition 3), while a vector (k_1 ... k_n)^T ∈ k^n is written k_1...k_n. Writing x[l] for a register storing l and 1̄ for the affine generator, the rules for the generators in (1) are:

  copier:      t ⊢ · −k/kk→ t+1 ⊢ ·        discarder:  t ⊢ · −k/•→ t+1 ⊢ ·
  adder:       t ⊢ · −kl/k+l→ t+1 ⊢ ·      zero:       t ⊢ · −•/0→ t+1 ⊢ ·
  amplifier r: t ⊢ r −l/rl→ t+1 ⊢ r        register:   t ⊢ x[l] −k/l→ t+1 ⊢ x[k]
  affine:      0 ⊢ 1̄ −•/1→ 1 ⊢ 1̄          and, for t ≠ 0,  t ⊢ 1̄ −•/0→ t+1 ⊢ 1̄

The rules of the second section are the mirror images of the above, with the two labels swapped. The third section handles the structural connectors: the twist swaps its signals (−kl/lk→), the identity propagates its signal (−k/k→), and the empty circuit does nothing (−•/•→). Finally:

  from  t ⊢ c −u/v→ t+1 ⊢ c′  and  t ⊢ d −v/w→ t+1 ⊢ d′  infer  t ⊢ c ; d −u/w→ t+1 ⊢ c′ ; d′
  from  t ⊢ c −u1/v1→ t+1 ⊢ c′  and  t ⊢ d −u2/v2→ t+1 ⊢ d′  infer  t ⊢ c ⊕ d −u1u2/v1v2→ t+1 ⊢ c′ ⊕ d′

So far, we have described the behaviour of the components in (1) using the intuition that signal flows from left to right: in a transition −v/w→, the signal v on the left is thought of as a trigger and w as its effect. For the generators in (2), whose behaviour is defined by the rules in the second section of Fig.
2, the behaviour is symmetric: indeed, here it is helpful to think of signals as flowing from right to left. The next section of Fig. 2 specifies the behaviours of the structural connectors of (3): the twist swaps two signals, the empty circuit is the circuit with no ports, and the identity is a plain wire: the signals on its left and right ports are equal. Finally, the rule for sequential composition ; forces the two components to agree on the value v at the shared interface, while for the parallel composition ⊕ the components proceed independently. Observe that both forms of composition require the component transitions to happen at the same time.

Definition 4. Let c ∈ ACirc. The initial state c_0 of c is the one in which all registers store 0. A computation of c starting at time t ≤ 0 is a (possibly infinite) sequence of transitions

  t ⊢ c_0 −v_t/w_t→ t+1 ⊢ c_1 −v_{t+1}/w_{t+1}→ t+2 ⊢ c_2 −v_{t+2}/w_{t+2}→ ···    (4)

Since all transitions increment the time by 1, it suffices to record the time at which a computation starts. As a result, to simplify notation, we will omit the runtime context after the first transition and, instead of (4), write

  t ⊢ c_0 −v_t/w_t→ c_1 −v_{t+1}/w_{t+1}→ c_2 −v_{t+2}/w_{t+2}→ ···

Example 2. The circuit in Example 1 can perform the following computation (displaying only the register's contents):

  0 ⊢ x[0] −1/1→ x[1] −0/1→ x[1] −0/1→ ···

In the example above, the flow has a clear left-to-right orientation, albeit with a feedback loop. For arbitrary circuits of ACirc this is not always the case, which sometimes results in unexpected operational behaviour.

Example 3. In the composite 1̄ ; x̄ of the affine generator with a mirrored register it is not possible to identify a consistent flow: the affine generator goes from left to right, while the mirrored register goes from right to left. Observe that there is no computation starting at t = 0, since in the initial state the register contains 0 while the affine generator must emit 1. There is, however, a (unique!) computation starting at time t = −1, which loads the register with 1 so that it can emit 1 at time t = 0:

  −1 ⊢ 1̄ ; x̄[0] −•/1→ 1̄ ; x̄[1] −•/0→ 1̄ ; x̄[0] −•/0→ 1̄ ; x̄[0] −•/0→ ···
Similarly, the circuit 1̄ ; x̄ ; x̄, with two mirrored registers, features a unique computation starting at time t = −2:

  −2 ⊢ 1̄ ; x̄[0] ; x̄[0] −•/1→ 1̄ ; x̄[0] ; x̄[1] −•/0→ 1̄ ; x̄[1] ; x̄[0] −•/0→ 1̄ ; x̄[0] ; x̄[0] −•/0→ ···

It is worthwhile clarifying why, in the affine calculus, some computations start in the past. As we have already mentioned, in the linear fragment the semantics of every generator is time-independent. It follows easily that time-independence is a property enjoyed by all purely linear circuits. The behaviour of the affine generator, however, forces a particular action to occur at time 0. Considering it in conjunction with a right-to-left register, as in 1̄ ; x̄, has the effect of anticipating that action by one step, to time −1, as shown in Example 3. It is obvious that this construction can be iterated, and it follows that the presence of a single time-dependent generator results in a calculus in which the computations of some terms must start at a finite, but unbounded, time in the past.

Example 4. Another circuit with conflicting flow is the composite of the affine generator with the mirrored zero generator. Here there is no possible transition at t = 0, since at that time the affine generator must emit a 1 while the mirrored zero can only synchronise on a 0. The composite of the affine generator with its own mirror image, by contrast, can always perform an infinite computation t ⊢ · −•/•→ · −•/•→ ···, for any t ≤ 0. Roughly speaking, the computations of these two (0, 0) circuits are the operational mirror images of the two possible denotations of Proposition 3. This intuition will be made formal in Section 4. For now, it is worth observing that, for all c, ⊕-composing c with the second circuit yields one that can perform the same computations as c, while ⊕-composing it with the first yields one that can never make a transition at time 0.

Example 5. Consider the circuit x̄ ; x (a mirrored register followed by a register), which again features conflicting flow. Our equational theory equates it with the identity wire, but the computations involved are subtly different. Indeed, for any sequence a_0, a_1, a_2, ... of elements of k, it is obvious that the identity wire admits the computation

  0 ⊢ · −a_0/a_0→ · −a_1/a_1→ · −a_2/a_2→ ···
(5)

The circuit x̄ ; x admits a similar computation, but we must begin at time t = −1 in order to first "load" the registers with a_0:

  −1 ⊢ x̄[0] ; x[0] −0/0→ x̄[a_0] ; x[a_0] −a_0/a_0→ x̄[a_1] ; x[a_1] −a_1/a_1→ x̄[a_2] ; x[a_2] −a_2/a_2→ ···    (6)

The circuit x ; x̄, which again is equated with the identity wire by the equational theory, is more tricky. Although every computation of the identity can be reproduced, x ; x̄ admits additional, problematic computations. Indeed, consider

  0 ⊢ x[0] ; x̄[0] −0/1→ x[0] ; x̄[1]    (7)

at which point no further transition is possible: the circuit can deadlock. The following lemma is an easy consequence of the rules of Fig. 2 and follows by structural induction. It states that all circuits can stay idle in the past.

Lemma 2. Let c ∈ ACirc[n, m] with initial state c_0. Then t ⊢ c_0 −0/0→ c_0 if t < 0.

3.1 Trajectories

For the non-affine version of the signal flow calculus, we studied in [9] the traces arising from computations. For the affine extension this is not possible since, as explained above, we must also consider computations that start in the past. In this paper, rather than traces, we adopt a common control-theoretic notion.

Definition 5. An (n, m)-trajectory σ is a Z-indexed sequence σ : Z → k^n × k^m that is finite in the past, i.e., for which there exists j ∈ Z such that σ(i) = (0, 0) for i ≤ j. By the universal property of the product we can identify σ : Z → k^n × k^m with the pairing ⟨σ_l, σ_r⟩ of σ_l : Z → k^n and σ_r : Z → k^m. A (k, m)-trajectory σ and an (m, n)-trajectory τ are compatible if σ_r = τ_l. In this case, we can define their composite, a (k, n)-trajectory σ ; τ, by σ ; τ := ⟨σ_l, τ_r⟩. Given an (n_1, m_1)-trajectory σ_1 and an (n_2, m_2)-trajectory σ_2, their product, an (n_1+n_2, m_1+m_2)-trajectory σ_1 ⊕ σ_2, is defined by stacking: (σ_1 ⊕ σ_2)(i) := (σ_1(i), σ_2(i)). Using these two operations we can organise sets of trajectories into a prop.

Definition 6. The composition of two sets of trajectories is defined as S ; T := {σ ; τ | σ ∈ S, τ ∈ T are compatible}.
The product of two sets of trajectories is defined as S_1 ⊕ S_2 := {σ_1 ⊕ σ_2 | σ_1 ∈ S_1, σ_2 ∈ S_2}.

Clearly both operations are strictly associative. The unit for ⊕ is the singleton containing the unique (0, 0)-trajectory. Also ; has a two-sided identity, given by the sets of "copycat" (n, n)-trajectories. Indeed, we have that:

Proposition 4. Sets of (n, m)-trajectories are the arrows n → m of a prop Traj, with composition and monoidal product given as in Definition 6.

Traj serves for us as the domain for the operational semantics: given a circuit c and an infinite computation

  t ⊢ c_0 −u_t/v_t→ c_1 −u_{t+1}/v_{t+1}→ c_2 −u_{t+2}/v_{t+2}→ ···

its associated trajectory σ is

  σ(i) = (u_i, v_i) if i ≥ t, and σ(i) = (0, 0) otherwise.    (8)

Definition 7. For a circuit c, ⟨c⟩ is the set of trajectories given by its infinite computations, following the translation (8) above.

The assignment c ↦ ⟨c⟩ is compositional, that is:

Theorem 1. ⟨·⟩ : ACirc → Traj is a morphism of props.

Example 6. Consider the computations (5) and (6) from Example 5. According to (8), both are translated into the trajectory σ mapping i ≥ 0 to (a_i, a_i) and i < 0 to (0, 0). The reader can easily verify that, more generally, ⟨x̄ ; x⟩ = ⟨id⟩ holds. At this point it is worth remarking that the two circuits would be distinguished by looking at their traces: the trace of computation (5) is different from the trace of (6). Indeed, the full abstraction result in [9] does not hold for all circuits, but only for those of a certain kind. The affine extension obliges us to consider computations that start in the past and, in turn, this drives us toward a stronger full abstraction result, shown in the next section. Before concluding, it is important to emphasise that ⟨id⟩ = ⟨x ; x̄⟩ also holds. Indeed, the problematic computations, like (7), are all finite and, by definition, do not give rise to any trajectory. The reader should note that the use of trajectories is not a semantic device to get rid of problematic computations.
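The operations of Definitions 5-6 can be sketched directly in code. The finite-window compatibility check below is our own simplification (true compatibility quantifies over all of Z), and all names are ours.

```python
# Trajectories sketched as maps Z -> (left, right), represented by a dict of
# their non-zero positions; everywhere else they yield the zero pair.

def trajectory(support, n, m):
    """Build an (n, m)-trajectory from a dict {time: (left, right)}."""
    zero = ((0,) * n, (0,) * m)
    return lambda i: support.get(i, zero)

def compatible(sigma, tau, window=range(-16, 16)):
    """sigma_r = tau_l, checked on a finite window (a sketch-level shortcut)."""
    return all(sigma(i)[1] == tau(i)[0] for i in window)

def compose(sigma, tau):
    """(sigma ; tau)(i) = (sigma_l(i), tau_r(i)), as in Definition 5."""
    return lambda i: (sigma(i)[0], tau(i)[1])

def product(sigma, tau):
    """(sigma ⊕ tau)(i) stacks the left parts and the right parts."""
    return lambda i: (sigma(i)[0] + tau(i)[0], sigma(i)[1] + tau(i)[1])
```

For example, composing a (1, 1)-trajectory that relates 1 to 2 at time 0 with one relating 2 to 3 yields a trajectory relating 1 to 3 at time 0 and (0, 0) elsewhere.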
In fact, trajectories do not appear in the statement of our full abstraction result; they are merely a convenient tool to prove it. Another result (Proposition 9) independently takes care of ruling out problematic computations.

4 Contextual Equivalence and Full Abstraction

This section contains the main contribution of the paper: a traditional full abstraction result asserting that contextual equivalence agrees with denotational equivalence. It is not a coincidence that we prove this result in the affine setting: affinity plays a crucial role, both in the statement and in the proof. In particular, Proposition 3 gives us two possibilities for the denotation of a (0, 0) circuit: (i) ∅, which, roughly speaking, means that there is a problem (see e.g. Example 4) and no infinite computation is possible, or (ii) id_0, in which case infinite computations are possible. This provides us with a basic notion of observation, akin to observing termination vs non-termination in the λ-calculus.

Definition 8. For a circuit c ∈ ACirc[0, 0] we write c ↑ if c can perform an infinite computation, and c ̸↑ otherwise.

For instance, the composite of the affine generator with its mirror image satisfies ↑, while its composite with the mirrored zero generator satisfies ̸↑.

To be able to make observations about arbitrary circuits we need to introduce an appropriate notion of context. Roughly speaking, contexts for us are (0, 0)-circuits with a hole into which we can plug another circuit. Since ours is a variable-free presentation, "dangling wires" assume the role of free variables [16]: restricting to (0, 0) contexts is therefore analogous to considering ground contexts, i.e. contexts with no free variables, a standard concept of programming language theory. To define contexts formally, we extend the syntax of Section 2.1 with an extra generator "−" of sort (n, m). A (0, 0)-circuit of this extended syntax is a context when "−" occurs exactly once. Given an (n, m)-circuit c and a context C[−], we write C[c] for the circuit obtained by replacing the unique occurrence of "−" by c.
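Contexts and the sorting discipline of Fig. 1 can be sketched together: terms as nested tuples, a hole "−", and plugging as substitution. The generator sorts follow Fig. 1; the Python encoding and all names are our own.

```python
# A sketch of sorted terms with a hole. Terms: ('gen', name), ('hole',),
# ('seq', c, d) for c ; d, and ('par', c, d) for c ⊕ d.

GENERATOR_SORTS = {  # a representative sample of the sorts in Fig. 1
    'copy': (1, 2), 'add': (2, 1), 'discard': (1, 0), 'zero': (0, 1),
    'one': (0, 1), 'scalar': (1, 1), 'register': (1, 1),
    'id': (1, 1), 'twist': (2, 2), 'empty': (0, 0),
}

def sort(term, hole_sort=None):
    """Compute the sort of a term, or raise TypeError if it is not sortable."""
    tag = term[0]
    if tag == 'gen':
        return GENERATOR_SORTS[term[1]]
    if tag == 'hole':
        return hole_sort
    if tag == 'seq':   # c : (n, z) and d : (z, m) give c ; d : (n, m)
        (n, z1), (z2, m) = (sort(t, hole_sort) for t in term[1:])
        if z1 != z2:
            raise TypeError('sequential composition: interface mismatch')
        return (n, m)
    if tag == 'par':   # c : (n, m) and d : (r, z) give c ⊕ d : (n + r, m + z)
        (n, m), (r, z) = (sort(t, hole_sort) for t in term[1:])
        return (n + r, m + z)

def plug(context, c):
    """C[c]: replace the unique occurrence of the hole by c."""
    if context[0] == 'hole':
        return c
    if context[0] in ('seq', 'par'):
        return (context[0],) + tuple(plug(t, c) for t in context[1:])
    return context
```

For example, the ground context C[−] = − ; discard, with a hole of sort (0, 1), has sort (0, 0), and plugging in the affine generator preserves that sort.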
With this setup, given an (n, m)-circuit c, we can insert it into a context C[−] and observe the possible outcome: either C[c] ↑ or C[c] ̸↑. This naturally leads us to contextual equivalence and the statement of our main result.

Definition 9. Given c, d ∈ ACirc[n, m], we say that they are contextually equivalent, written c ≡ d, if for all contexts C[−], C[c] ↑ iff C[d] ↑.

Example 7. Recall from Example 5 the identity wire and the circuit x ; x̄. Take the context C[−] = c_σ ; − ; c_τ for c_σ ∈ ACirc[0, 1] and c_τ ∈ ACirc[1, 0]. Assume that c_σ and c_τ each have a single infinite computation, and call σ and τ the corresponding trajectories. If σ = τ, both C[id] and C[x ; x̄] are able to perform an infinite computation. If instead σ ≠ τ, neither of them can perform any infinite computation: C[id] stops at time t, for t the first moment at which σ(t) ≠ τ(t), while C[x ; x̄] stops at time t + 1.

Now take as context C[−] = ē ; − ; e, where e is the discarder and ē its mirror image. In contrast to c_σ and c_τ, these can perform more than one computation: at any time they can nondeterministically emit any value. Thus every computation of C[id] = ē ; e can always be extended to an infinite one, forcing ē and e to synchronise at each step. For C[x ; x̄] = ē ; x ; x̄ ; e, the two ends may emit different values at time t, but then the computation gets stuck at time t + 1. However, our definition of ↑ only cares about whether C[x ; x̄] can perform an infinite computation. Indeed it can, as long as ē and e consistently emit the same value at each time step.

If we think of contexts as tests, and say that a circuit c passes the test C[−] if C[c] performs an infinite computation, then our notion of contextual equivalence is may-testing equivalence [13]. From this perspective, the identity wire and x ; x̄ are not must-equivalent, since the former must pass the test ē ; − ; e while the latter may not.
It is worth remarking here that the distinction between may and must testing will cease to make sense in Section 5, where we identify a class of circuits equipped with a proper flow directionality and thus a deterministic, input-output behaviour.

Theorem 2 (Full abstraction). c ≡ d iff c =_aIH d.

The remainder of this section is devoted to the proof of Theorem 2. We start by clarifying the relationship between fractions of polynomials (the denotational domain) and trajectories (the operational domain).

4.1 From Polynomial Fractions to Trajectories

The missing link between polynomial fractions and trajectories are (formal) Laurent series, a notion we now recall. Formally, a Laurent series is a function σ : Z → k for which there exists j ∈ Z such that σ(i) = 0 for all i < j. We write σ as ..., σ(−1), σ(0), σ(1), ... with position 0 underlined, or as the formal sum Σ_{i=d}^∞ σ(i)·x^i. Each Laurent series σ has a degree d ∈ Z, the position of its first non-zero element. Laurent series form a field k((x)): the sum is pointwise, the product is by convolution, and the inverse σ^{−1} of a σ of degree d is defined by:

  σ^{−1}(i) = 0                                                     if i < −d
  σ^{−1}(i) = σ(d)^{−1}                                             if i = −d          (9)
  σ^{−1}(i) = −( Σ_{j=1}^{n} σ(d+j) · σ^{−1}(−d+n−j) ) / σ(d)       if i = −d+n, for n > 0

Note that (formal) power series, which form 'just' a ring k[[x]], are a particular case of Laurent series, namely those σ for which d ≥ 0. What is most interesting for our purposes is how polynomials and fractions of polynomials relate to k((x)) and k[[x]]. First, the ring k[x] of polynomials embeds into k[[x]], and thus into k((x)): a polynomial p_0 + p_1·x + ··· + p_n·x^n can also be regarded as the power series Σ_{i=0}^∞ p_i·x^i with p_i = 0 for all i > n. Because Laurent series are closed under division, this immediately also gives an embedding of the field of polynomial fractions k(x) into k((x)). Note that the full expressiveness of k((x)) is required: for instance, the fraction 1/x is represented as the Laurent series ..., 0, 1, 0̲, 0, ...
which is not a power series, because a non-zero value appears before position 0. In fact, the fractions that are expressible as power series are precisely the rational fractions, i.e. those of the form

  (k_0 + k_1·x + k_2·x^2 + ··· + k_n·x^n) / (l_0 + l_1·x + l_2·x^2 + ··· + l_m·x^m)   where l_0 ≠ 0.

Rational fractions form a ring k⟨x⟩ which, differently from the full field k(x), embeds into k[[x]]. Indeed, whenever l_0 ≠ 0, the inverse of l_0 + l_1·x + ··· + l_m·x^m is, by (9), a bona fide power series. In summary, there is a commutative diagram of embeddings (pictured as a square in the original): k[x] ⊆ k⟨x⟩ ⊆ k(x), k[x] ⊆ k[[x]], k⟨x⟩ ⊆ k[[x]], and k(x), k[[x]] ⊆ k((x)).

Relations between vectors over k((x)) organise themselves into a prop ARel_{k((x))} (see Definition 2). There is an evident prop morphism ι : ARel_{k(x)} → ARel_{k((x))}: it maps the empty affine relation over k(x) to the one over k((x)), and otherwise applies pointwise the embedding of k(x) into k((x)). For the next step, observe that trajectories are in fact rearrangements of Laurent series: each pair of vectors (u, v) ∈ k((x))^n × k((x))^m, with u = (α^1, ..., α^n) and v = (β^1, ..., β^m), yields the trajectory κ(u, v) defined for all i ∈ Z by

  κ(u, v)(i) = ((α^1(i), ..., α^n(i)), (β^1(i), ..., β^m(i))).

Similarly to ι, the assignment κ extends to sets of vectors, and indeed to a prop morphism from ARel_{k((x))} to Traj. Together, κ and ι provide the desired link between the operational and the denotational semantics.

Theorem 3. ⟨·⟩ = κ ∘ ι ∘ [[·]].

Proof. Since both are symmetric monoidal functors from a free prop, it is enough to check the statement on the generators of ACirc. We show, as an example, the case of the copier. By Definition 3, [[copier]] = {(p, (p, p)) | p ∈ k(x)}. This is mapped by ι to {(α, (α, α)) | α ∈ k((x))}. Now, to see that κ(ι([[copier]])) = ⟨copier⟩, it is enough to observe that a trajectory σ is in κ(ι([[copier]])) precisely when, for all i, there exists some k_i ∈ k such that σ(i) = (k_i, (k_i, k_i)).
4.2 Proof of Full Abstraction

We now have the ingredients to prove Theorem 2. First, we prove an adequacy result for (0, 0) circuits.

Proposition 5. Let c ∈ ACirc[0, 0]. Then [[c]] = id_0 if and only if c ↑.

Proof. By Proposition 3, either [[c]] = id_0 or [[c]] = ∅, which, combined with Theorem 3, means that ⟨c⟩ = κ(ι(id_0)) or ⟨c⟩ = κ(ι(∅)). By definition of ι and κ, this implies that ⟨c⟩ either contains a trajectory or not. In the first case c ↑; in the second c ̸↑.

Next we obtain a result that relates denotational equality in all contexts to equality in aIH. Note that it is not trivial: since we consider ground contexts, it does not make sense to merely consider "identity" contexts. Instead, it is at this point that we make another crucial use of affinity, taking advantage of the increased expressivity of affine circuits, as showcased by Proposition 2.

Proposition 6. If [[C[c]]] = [[C[d]]] for all contexts C[−], then c =_aIH d.

Proof. Suppose that c ≠_aIH d. Then [[c]] ≠ [[d]]. Since both [[c]] and [[d]] are affine relations over k(x), there exists a pair of vectors (u, v) ∈ k(x)^n × k(x)^m that is in one of [[c]] and [[d]], but not both. Assume wlog that (u, v) ∈ [[c]] and (u, v) ∉ [[d]]. By Proposition 2, there exist c_u and c_v such that [[c_u ; c ; c_v]] = [[c_u]] ; [[c]] ; [[c_v]] = {(•, u)} ; [[c]] ; {(v, •)}. Since (u, v) ∈ [[c]], it follows that [[c_u ; c ; c_v]] = {(•, •)}. Instead, since (u, v) ∉ [[d]], we have that [[c_u ; d ; c_v]] = ∅. Therefore, for the context C[−] = c_u ; − ; c_v, we have [[C[c]]] ≠ [[C[d]]].

The proof of our main result is now straightforward.

Proof of Theorem 2. Suppose first that c =_aIH d. Then [[C[c]]] = [[C[d]]] for all contexts C[−], since [[·]] is a morphism of props. By Proposition 5, it follows immediately that C[c] ↑ if and only if C[d] ↑, namely c ≡ d. Conversely, suppose that, for all C[−], C[c] ↑ iff C[d] ↑. Again by Proposition 5, we have that [[C[c]]] = [[C[d]]]. We conclude by invoking Proposition 6.
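The series computations underpinning Section 4.1 can be made concrete. The sketch below implements the d = 0 case of the inverse formula (9), i.e. the inversion of a power series with non-zero constant term, exactly the rational case; we work over k = Q, and the function names are ours.

```python
from fractions import Fraction as F

def truncmul(a, b, n):
    """First n coefficients of the product (convolution) of power series a, b."""
    return [sum(F(a[j]) * F(b[i - j])
                for j in range(i + 1) if j < len(a) and i - j < len(b))
            for i in range(n)]

def inverse(s, n):
    """First n coefficients of s^{-1}, assuming s[0] != 0 (the d = 0 case of (9))."""
    inv = [F(1) / F(s[0])]
    for m in range(1, n):
        # coefficient m of s * s^{-1} must vanish, which forces:
        acc = sum(F(s[j]) * inv[m - j] for j in range(1, min(m, len(s) - 1) + 1))
        inv.append(-acc / F(s[0]))
    return inv
```

For instance, `inverse([1, -1], 5)` yields the coefficients of 1/(1 − x), namely `[1, 1, 1, 1, 1]`, and multiplying back gives 1 up to the truncation order.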
5 Functional Behaviour and Signal Flow Graphs

There is a sub-prop SF of Circ of classical signal flow graphs (see e.g. [21]). Here signal flows left-to-right, possibly featuring feedback loops, provided that these go through at least one register. Feedback can be captured algebraically via an operation Tr(·) : Circ[n+1, m+1] → Circ[n, m] taking c : n+1 → m+1 to a circuit n → m in which the extra input and output are connected into a feedback loop through a register (diagram omitted). Following [9], let us call Circ→ the free sub-prop of Circ of circuits built from (3) and the generators of (1), without the register x. Then SF is defined as the closure of Circ→ under Tr(·). For instance, the circuit of Example 2 is in SF.

Signal flow graphs are intimately connected to the executability of circuits. In general, the rules of Fig. 2 do not assume a fixed flow orientation. As a result, some circuits in Circ are not executable as functional input-output systems, as we have demonstrated with the circuits of Examples 3-5. Notice that none of these are signal flow graphs. In fact, the circuits of SF exhibit no pathological behaviour, as we shall state more precisely in Proposition 9.

At the denotational level, signal flow graphs correspond precisely to rational functional behaviours, that is, matrices whose coefficients are in the ring k⟨x⟩ of rational fractions (see Section 4.1). We call such matrices rational matrices. One may check that the semantics of a signal flow graph c : (n, m) is always of the form [[c]] = {(v, A·v) | v ∈ k(x)^n}, for some m × n rational matrix A. Conversely, all relations that are graphs of rational matrices can be expressed as signal flow graphs.

Proposition 7. Given c : (n, m), we have [[c]] = {(p, A·p) | p ∈ k(x)^n} for some rational m × n matrix A iff there exists a signal flow graph f, i.e., a circuit f : (n, m) of SF, such that [[f]] = [[c]].

Proof. This is a folklore result in control theory, which can be found in [30]. The details of the translation between rational matrices and circuits of SF can be found in [10, Section 7].
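At the level of streams, the register acts as multiplication by x, i.e. a one-step delay. As an illustration of a "rational matrix" in action, here is a sketch of the 1 × 1 matrix (2x + 1) applied to truncated coefficient streams; the encoding and names are our own.

```python
# Streams as truncated coefficient lists; the register is a one-step delay.

def delay(stream):
    """Multiplication by x: shift the coefficients one step later in time."""
    return [0] + stream[:-1]

def amplify(k, stream):
    """The amplifier: pointwise multiplication by the scalar k."""
    return [k * s for s in stream]

def add(s, t):
    """The adder: pointwise sum of two streams."""
    return [a + b for a, b in zip(s, t)]

def two_x_plus_one(stream):
    """A signal flow graph denoting the 1x1 rational matrix (2x + 1)."""
    return add(amplify(2, delay(stream)), stream)
```

On the input stream 1, 0, 0, 0 (the polynomial 1) it outputs 1, 2, 0, 0, the coefficients of 1 + 2x.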
The following gives an alternative characterisation of rational matrices (and therefore, by Proposition 7, of the behaviour of signal flow graphs) that clarifies their role as realisations of circuits.

Proposition 8. An m × n matrix A is rational iff A·r ∈ k⟨x⟩^m for all r ∈ k⟨x⟩^n.

Proposition 8 is another guarantee of good behaviour: it justifies the name of inputs (resp. outputs) for the left (resp. right) ports of signal flow graphs. Recall from Section 4.1 that rational fractions can be mapped to Laurent series of nonnegative degree, i.e., to plain power series. Operationally, these correspond to trajectories that start after t = 0. Proposition 8 guarantees that any trajectory of a signal flow graph whose first nonzero value on the left appears at time t = 0 will not have nonzero values on the right before time t = 0. In other words, signal flow graphs can be seen as processing a stream of values from left to right. As a result, their ports can be clearly partitioned into inputs and outputs.

But the circuits of SF are too restrictive for our purposes. For example, the affine generator can also be seen to realise a functional behaviour, transforming inputs on the left into outputs on the right, yet it is not in SF. Its behaviour is no longer linear, but affine. Hence, we need to extend signal flow graphs to include functional affine behaviour. The following definition does just that.

Definition 10. Let ASF be the sub-prop of ACirc obtained from all the generators in (1), closed under Tr(·). Its circuits are called affine signal flow graphs.

As before, none of the circuits of Examples 3-5 are affine signal flow graphs. In fact, ASF rules out pathological behaviour: all of its computations can be extended to infinite ones, or in other words, never get stuck.

Proposition 9. Given an affine signal flow graph f, for every computation

  t ⊢ f_0 −u_t/v_t→ f_1 −u_{t+1}/v_{t+1}→ ··· f_n

there exists a trajectory σ ∈ ⟨f⟩ such that σ(i) = (u_i, v_i) for t ≤ i ≤ t + n.

Proof. By induction on the structure of affine signal flow graphs.
If the circuits of SF correspond precisely to k⟨x⟩-matrices, those of ASF correspond precisely to k⟨x⟩-affine transformations.

Definition 11. A map f : k(x)^n → k(x)^m is an affine map if there exist an m × n matrix A and b ∈ k(x)^m such that f(p) = A·p + b for all p ∈ k(x)^n. We call the pair (A, b) the representation of f.

The notion of rational affine map is a straightforward extension of the linear case, and so is the characterisation in terms of rational input-output behaviour.

Definition 12. An affine map f : p ↦ A·p + b is rational if A and b have coefficients in k⟨x⟩.

Proposition 10. An affine map f : k(x)^n → k(x)^m is rational iff f(r) ∈ k⟨x⟩^m for all r ∈ k⟨x⟩^n.

The following extends the correspondence of Proposition 7, showing that ASF is the rightful affine heir of SF.

Proposition 11. Given c : (n, m), we have [[c]] = {(p, f(p)) | p ∈ k(x)^n} for some rational affine map f iff there exists an affine signal flow graph g, i.e., a circuit g : (n, m) of ASF, such that [[g]] = [[c]].

Proof. Let f be given by p ↦ A·p + b for some rational m × n matrix A and vector b ∈ k⟨x⟩^m. By Proposition 7, we can find a circuit c_A of SF such that [[c_A]] = {(p, A·p) | p ∈ k(x)^n}. Similarly, we can represent b as a signal flow graph c_b of sort (1, m). Then the circuit obtained by running c_A in parallel with c_b fed by the affine generator, and summing their outputs with adders (diagram omitted), is clearly in ASF and verifies [[c]] = {(p, A·p + b) | p ∈ k(x)^n}, as required. For the converse direction, it is straightforward to check by structural induction that the denotation of an affine signal flow graph is the graph (in the set-theoretic sense of a set of pairs) of some rational affine map.

6 Realisability

In the previous section we identified a restricted class of morphisms with good behavioural properties. We may wonder how much of ACirc we can capture with this restricted class. The answer is, in a precise sense: most of it. Surprisingly, the behaviours realisable in Circ, the purely linear fragment, are not more expressive.
In fact, from an operational (or denotational, by full abstraction) point of view, Circ is nothing more than a jumbled-up version of SF. Indeed, it turns out that Circ enjoys a realisability theorem: any circuit c of Circ can be associated with one of SF that implements, or realises, the behaviour of c in an executable form. But the corresponding realisation may not flow neatly from left to right as signal flow graphs do: its inputs and outputs may have been moved from one side to the other. Consider, for example, the circuit pictured in the original (omitted): it does not belong to SF, but it can be read as a signal flow graph with an input that has been bent and moved to the bottom right. The behaviour it realises can therefore be executed by rewiring this port, obtaining a circuit equal in aIH to a genuine signal flow graph.

We will not make this notion of rewiring precise here, but refer the reader to [9] for the details. The intuition is simply that a rewiring partitions the ports of a circuit into two sets, which we call inputs and outputs, and uses bent wires to move input ports to the left and output ports to the right. The realisability theorem then states that we can always recover a (not necessarily unique) signal flow graph from any circuit by performing these operations.

Theorem 4 ([9, Theorem 5]). Every circuit in Circ is equivalent to the rewiring of a signal flow graph, called its realisation.

This theorem allows us to extend the notions of input and output to all circuits of Circ.

Definition 13. A port of a circuit c of Circ is an input (resp. output) port if there exists a realisation for which it is an input (resp. output).

Note that, since realisations are not necessarily unique, the same port can be both an input and an output. The realisability theorem (Theorem 4) then says that every port is always an input, an output, or both (but never neither). An output-only port is an output port that is not an input port.
Similarly, an input-only port is an input port that is not an output port.

Example 8. The left port of the register x is input-only, whereas its right port is output-only. In the identity wire, both ports are input and output ports. The single port of … is output-only; that of … is input-only (the generator diagrams are not reproduced here).

While in the purely linear case all behaviours are realisable, the general case of ACirc is a bit more subtle. To make this precise, we can extend our definition of realisability to include affine signal flow graphs.

Definition 14. A circuit of ACirc is realisable if its ports can be rewired so that it is equivalent to a circuit of ASF.

Example 9. … is realisable; … is not (diagrams omitted).

Notice that Proposition 11 gives the following equivalent semantic criterion for realisability: realisable behaviours are precisely those that map rationals to rationals.

Theorem 5. A circuit c is realisable iff its ports can be partitioned into two sets, which we call inputs and outputs, such that the corresponding rewiring of c is an affine rational map from inputs to outputs.

We offer another perspective on realisability below: realisable behaviours correspond precisely to those for which the constants are connected to inputs of the underlying Circ-circuit. First, notice that, by the equations (1-dup) and (1-del) of aIH (not reproduced here), we can assume without loss of generality that each circuit contains exactly one occurrence of the constant.

Proposition 12. Every circuit c of ACirc is equivalent to one with precisely one … and no … (the two generator diagrams are omitted).

For c: (n, m) a circuit of ACirc, we will call ĉ the circuit of Circ of sort (n + 1, m) that one obtains by first transforming c into an equivalent circuit with a single constant as above, then removing this constant and replacing it by an identity wire that extends to the left boundary.

Theorem 6. A circuit c is realisable iff the wire replacing the constant is connected to an input port of ĉ.
7 Conclusion and Future Work

We introduced the operational semantics of the affine extension of the signal flow calculus and proved that contextual equivalence coincides with denotational equality, previously introduced and axiomatised in [6]. We have observed that, at the denotational level, affinity provides two key properties (Propositions 2 and 3) for the proof of full abstraction. However, at the operational level, affinity forces us to consider computations starting in the past (Example 3), as the syntax allows terms lacking a proper flow directionality. This leads to circuits that might deadlock (Example 4) or perform some problematic computations (Example 5). We have identified a proper subclass of circuits, called affine signal flow graphs (Definition 10), that possess an inherent flow directionality: in these circuits, the same pathological behaviours do not arise (Proposition 9). This class is not too restrictive, as it captures all desirable behaviours: a realisability result (Theorem 5) states that all and only the circuits that do not need computations to start in the past are equivalent to (the rewiring of) an affine signal flow graph.

The reader may be wondering why we do not restrict the syntax to affine signal flow graphs. The reason is that, as in the behavioural approach to control theory [33], the lack of flow direction is what allows the (affine) signal flow calculus to achieve a strong form of compositionality and a complete axiomatisation (see [9] for a deeper discussion).

We expect that similar methods and results can be extended to other models of computation. Our next step is to tackle Petri nets, which, as shown in [5], can be regarded as terms of the signal flow calculus, but over N rather than a field.

References

1. Abramsky, S., Coecke, B.: A categorical semantics of quantum protocols. In: Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science (LICS). pp. 415–425.
IEEE (2004)
2. Baez, J., Erbele, J.: Categories in control. Theory and Applications of Categories 30, 836–881 (2015)
3. Baez, J.C.: Network theory (2014), http://math.ucr.edu/home/baez/networks/, website (retrieved 15/04/2014)
4. Basold, H., Bonsangue, M., Hansen, H., Rutten, J.: (Co)Algebraic characterizations of signal flow graphs. In: van Breugel, F., Kashefi, E., Palamidessi, C., Rutten, J. (eds.) Horizons of the Mind. A Tribute to Prakash Panangaden, Lecture Notes in Computer Science, vol. 8464, pp. 124–145. Springer International Publishing (2014)
5. Bonchi, F., Holland, J., Piedeleu, R., Sobociński, P., Zanasi, F.: Diagrammatic algebra: from linear to concurrent systems. Proceedings of the 46th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL) 3, 1–28 (2019)
6. Bonchi, F., Piedeleu, R., Sobociński, P., Zanasi, F.: Graphical affine algebra. In: Proceedings of the 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS). pp. 1–12 (2019)
7. Bonchi, F., Piedeleu, R., Sobociński, P., Zanasi, F.: Contextual equivalence for signal flow graphs (2020), https://arxiv.org/abs/2002.08874
8. Bonchi, F., Sobociński, P., Zanasi, F.: A categorical semantics of signal flow graphs. In: Proceedings of the 25th International Conference on Concurrency Theory (CONCUR). pp. 435–450. Springer (2014)
9. Bonchi, F., Sobociński, P., Zanasi, F.: Full abstraction for signal flow graphs. In: Proceedings of the 42nd Annual ACM SIGPLAN Symposium on Principles of Programming Languages (POPL). pp. 515–526 (2015)
10. Bonchi, F., Sobociński, P., Zanasi, F.: The calculus of signal flow diagrams I: linear relations on streams. Information and Computation 252, 2–29 (2017)
11. Coecke, B., Duncan, R.: Interacting quantum observables. In: Proceedings of the 35th International Colloquium on Automata, Languages and Programming (ICALP), Part II. pp. 298–310 (2008)
12.
Coecke, B., Kissinger, A.: Picturing Quantum Processes – A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press (2017)
13. De Nicola, R., Hennessy, M.C.: Testing equivalences for processes. Theoretical Computer Science 34(1-2), 83–133 (1984)
14. Ghica, D.R.: Diagrammatic reasoning for delay-insensitive asynchronous circuits. In: Computation, Logic, Games, and Quantum Foundations. The Many Facets of Samson Abramsky, pp. 52–68. Springer (2013)
15. Ghica, D.R., Jung, A.: Categorical semantics of digital circuits. In: Proceedings of the 16th Conference on Formal Methods in Computer-Aided Design (FMCAD). pp. 41–48 (2016)
16. Ghica, D.R., Lopez, A.: A structural and nominal syntax for diagrams. In: Proceedings of the 14th International Conference on Quantum Physics and Logic (QPL). pp. 71–83 (2017)
17. Hoare, C.A.R.: Communicating Sequential Processes. Prentice Hall (1985)
18. Honda, K., Yoshida, N.: On reduction-based process semantics. Theoretical Computer Science 152(2), 437–486 (1995)
19. Mac Lane, S.: Categorical algebra. Bulletin of the American Mathematical Society 71, 40–106 (1965)
20. Mac Lane, S.: Categories for the Working Mathematician. Springer (1998)
21. Mason, S.J.: Feedback Theory: I. Some Properties of Signal Flow Graphs. MIT Research Laboratory of Electronics (1953)
22. Milius, S.: A sound and complete calculus for finite stream circuits. In: Proceedings of the 25th Annual IEEE Symposium on Logic in Computer Science (LICS). pp. 421–430 (2010)
23. Milner, R.: A Calculus of Communicating Systems, Lecture Notes in Computer Science, vol. 92. Springer (1980)
24. Milner, R., Sangiorgi, D.: Barbed bisimulation. In: Proceedings of the 19th International Colloquium on Automata, Languages and Programming (ICALP). pp. 685–695 (1992)
25. Morris Jr., J.H.: Lambda-calculus models of programming languages. Ph.D. thesis, Massachusetts Institute of Technology (1969)
26.
Pavlovic, D.: Monoidal computer I: Basic computability by string diagrams. Information and Computation 226, 94–116 (2013)
27. Pavlovic, D.: Monoidal computer II: Normal complexity by string diagrams. arXiv:1402.5687 (2014)
28. Plotkin, G.D.: Call-by-name, call-by-value and the λ-calculus. Theoretical Computer Science 1(2), 125–159 (1975)
29. Rutten, J.J.M.M.: A tutorial on coinductive stream calculus and signal flow graphs. Theoretical Computer Science 343(3), 443–481 (2005)
30. Rutten, J.J.M.M.: Rational streams coalgebraically. Logical Methods in Computer Science 4(3) (2008)
31. Selinger, P.: A survey of graphical languages for monoidal categories. Springer Lecture Notes in Physics 813, 289–355 (2011)
32. Shannon, C.E.: The theory and design of linear differential equation machines. Tech. rep., National Defence Research Council (1942)
33. Willems, J.C.: The behavioural approach to open and interconnected systems. IEEE Control Systems Magazine 27, 46–99 (2007)

Open Access. This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Parameterized Synthesis for Fragments of First-Order Logic over Data Words

Béatrice Bérard¹, Benedikt Bollig², Mathieu Lehaut¹, and Nathalie Sznajder¹

¹ Sorbonne Université, CNRS, LIP6, F-75005 Paris, France
² CNRS, LSV & ENS Paris-Saclay, Université Paris-Saclay, Cachan, France

Abstract. We study the synthesis problem for systems with a parameterized number of processes. As in the classical case due to Church, the system selects actions depending on the program run so far, with the aim of fulfilling a given specification. The difficulty is that, at the same time, the environment executes actions that the system cannot control. In contrast to the case of fixed, finite alphabets, here we consider the case of parameterized alphabets. An alphabet reflects the number of processes, which is static but unknown. The synthesis problem then asks whether there is a finite number of processes for which the system can satisfy the specification. This variant is already undecidable for very limited logics. Therefore, we consider a first-order logic without the order on word positions. We show that even in this restricted case synthesis is undecidable if both the system and the environment have access to all processes. On the other hand, we prove that the problem is decidable if the environment only has access to a bounded number of processes. In that case, there is even a cutoff, meaning that it is enough to examine a bounded number of process architectures to solve the synthesis problem.

1 Introduction

Synthesis deals with the problem of automatically generating a program that satisfies a given specification. The problem goes back to Church [9], who formulated it as follows: The environment and the system alternately select an input symbol and an output symbol from a finite alphabet, respectively, and in this way generate an infinite sequence.
The question now is whether the system has a winning strategy, which guarantees that the resulting infinite run is contained in a given ω-regular language representing the specification, no matter how the environment behaves. This problem is decidable and very well understood [8, 37], and it has been extended in several different ways (e.g., [24, 26, 28, 36, 43]).

In this paper, we consider a variant of the synthesis problem that allows us to model programs with a variable number of processes. As we then deal with an unbounded number of process identifiers, a fixed finite alphabet is not suitable anymore. It is more appropriate to use an infinite alphabet, in which every letter contains a process identifier and a program action. One can distinguish two cases here. In [16], a potentially infinite number of data values are involved in an infinite program run (e.g., by dynamic process generation). In a parameterized system [4, 13], on the other hand, one has an unknown but static number of processes so that, along each run, the number of processes is finite. In this paper, we are interested in the latter, i.e., parameterized case. Parameterized programs are ubiquitous and occur, e.g., in distributed algorithms, ad-hoc networks, telecommunication protocols, cache-coherence protocols, swarm robotics, and biological systems. The synthesis question asks whether the system has a winning strategy for some number of processes (existential version) or no matter how many processes there are (universal version). Over infinite alphabets, there are a variety of different specification languages (e.g., [5, 11, 12, 19, 29, 33, 40]). Unlike in the case of finite alphabets, there is no canonical definition of regular languages.

Partly supported by ANR FREDDA (ANR-17-CE40-0013).

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 97–118, 2020. https://doi.org/10.1007/978-3-030-45231-5_6
In fact, the synthesis problem has been studied for N-memory automata [7], the Logic of Repeating Values [16], and register automata [15, 30, 31]. Though there is no agreement on a "regular" automata model, first-order (FO) logic over data words can be considered as a canonical logic, and this is the specification language we consider here. In addition to classical FO logic on words over finite alphabets, it provides a predicate x ∼ y to express that two events x and y are triggered by the same process. Its two-variable fragment FO² has a decidable emptiness and universality problem [5] and is, therefore, a promising candidate for the synthesis problem.

Previous generalizations of Church's synthesis problem to infinite alphabets were generally synchronous in the sense that the system and the environment perform their actions in strictly alternating order. This assumption was made, e.g., in the above-mentioned recent papers [7, 15, 16, 30, 31]. If there are several processes, however, it is realistic to relax this condition, which leads us to an asynchronous setting in which the system has no influence on when the environment acts. Like in [21], where the asynchronous case for a fixed number of processes was considered, we only make the reasonable fairness assumption that the system is not blocked forever.

In summary, the synthesis problem over infinite alphabets can be classified as (i) parameterized vs. dynamic, (ii) synchronous vs. asynchronous, and (iii) according to the specification language (register automata, Logic of Repeating Values, FO logic, etc.). As explained above, we consider here the parameterized asynchronous case for specifications written in FO logic. To the best of our knowledge, this combination has not been considered before.
For flexible modeling, we also distinguish between three types of processes: those that can only be controlled by the system; those that can only be controlled by the environment; and finally those that can be triggered by both. A partition into system and environment processes is also made in [3, 18], but for a fixed number of processes and in the presence of an arena in terms of a Petri net.

Let us briefly describe our results. We show that the general case of the synthesis problem is undecidable for FO logic. This follows from an adaptation of an undecidability result from [16, 17] for a fragment of the Logic of Repeating Values [11]. We therefore concentrate on an orthogonal logic, namely FO without the order on the word positions. First, we show that this logic can essentially count processes and actions of a given process up to some threshold. Though it has limited expressive power (albeit orthogonal to that of FO²), it leads to intricate behaviors in the presence of an uncontrollable environment. In fact, we show that the synthesis problem is still undecidable. Due to the lack of the order relation, the proof requires a subtle reduction from the reachability problem in 2-counter Minsky machines. However, it turns out that the synthesis problem is decidable if the number of processes that are controllable by the environment is bounded, while the number of system processes remains unbounded. In this case, there is even a cutoff k, an important measure for parameterized systems (cf. [4] for an overview): If the system has a winning strategy for k processes, then it has one for any number of processes greater than k, and the same applies to the environment. The proofs of both main results rely on a reduction of the synthesis problem to turn-based parameterized vector games, in which, similar to Petri nets, tokens corresponding to processes are moved around between states.
The paper is structured as follows. In Section 2, we define FO logic (especially FO without word order), and in Section 3, we present the parameterized synthesis problem. In Section 4, we transform a given formula into a normal form and finally into a parameterized vector game. Based on this reduction, we investigate cutoff properties and show our (un)decidability results in Section 5. We conclude in Section 6. Some proof details can be found in the long version of this paper [2].

2 Preliminaries

For a finite or infinite alphabet Σ, let Σ* and Σ^ω denote the sets of finite and, respectively, infinite words over Σ. The empty word is ε. Given w ∈ Σ* ∪ Σ^ω, let |w| denote the length of w and Pos(w) its set of positions: |w| = n and Pos(w) = {1, ..., n} if w = σ₁σ₂...σₙ ∈ Σ*, and |w| = ω and Pos(w) = {1, 2, ...} if w ∈ Σ^ω. Let w[i] be the i-th letter of w for all i ∈ Pos(w).

Executions. We consider programs involving a finite (but not fixed) number of processes. Processes are controlled by antagonistic protagonists, System and Environment. Accordingly, each process has a type among T = {s, e, se}, and we let P_s, P_e, and P_se denote the pairwise disjoint finite sets of processes controlled by System, by Environment, and by both System and Environment, respectively. We let P denote the triple (P_s, P_e, P_se). Abusing notation, we sometimes refer to P as the disjoint union P_s ∪ P_e ∪ P_se.

Given any set S, vectors s ∈ S^T are usually referred to as triples s = (s_s, s_e, s_se). Moreover, for s, s′ ∈ N^T, we write s ≤ s′ if s_θ ≤ s′_θ for all θ ∈ T. Finally, let s + s′ = (s_s + s′_s, s_e + s′_e, s_se + s′_se).

Processes can execute actions from a finite alphabet A. Whenever an action is executed, we would like to know whether it was triggered by System or by Environment. Therefore, A is partitioned into A = A_s ⊎ A_e. Let Σ_s = A_s × (P_s ∪ P_se) and Σ_e = A_e × (P_e ∪ P_se). Their union Σ = Σ_s ∪ Σ_e is the set of events. A word w ∈ Σ* ∪ Σ^ω is called a P-execution.
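The definitions above can be encoded directly; a minimal sketch (using the process sets that appear in Example 1 below; the encoding is ours):

```python
from itertools import product

# A triple P = (P_s, P_e, P_se) of pairwise disjoint process sets, and
# the partition A = A_s + A_e of the action alphabet.
P = {'s': {1, 2, 3}, 'e': {4, 5}, 'se': {6, 7, 8}}
A_s, A_e = {'a', 'b'}, {'c', 'd'}

# Events: system events pair a system action with a process in P_s u P_se;
# environment events pair an environment action with a process in P_e u P_se.
Sigma_s = set(product(A_s, P['s'] | P['se']))
Sigma_e = set(product(A_e, P['e'] | P['se']))
Sigma = Sigma_s | Sigma_e

# Triples over N^T, with pointwise order and componentwise sum.
def leq(s, t):
    return all(s[th] <= t[th] for th in ('s', 'e', 'se'))

def add(s, t):
    return {th: s[th] + t[th] for th in ('s', 'e', 'se')}

print(len(Sigma_s), len(Sigma_e), len(Sigma))  # 12 10 22
```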
Fig. 1. Representation of a P-execution as a mathematical structure (figure not reproduced here).

Logic. Formulas of our logic are evaluated over P-executions. We fix an infinite supply V = {x, y, z, ...} of variables, which are interpreted as processes from P or positions of the execution. The logic FO_A[∼, <, +1] is given by the grammar

ϕ ::= θ(x) | a(x) | x = y | x ∼ y | x < y | +1(x, y) | ¬ϕ | ϕ ∨ ϕ | ∃x.ϕ

where x, y ∈ V, θ ∈ T, and a ∈ A. Conjunction (∧), universal quantification (∀), implication (⟹), true, and false are obtained as abbreviations as usual. Let ϕ ∈ FO_A[∼, <, +1]. By Free(ϕ) ⊆ V, we denote the set of variables that occur free in ϕ. If Free(ϕ) = ∅, then we call ϕ a sentence. We sometimes write ϕ(x₁, ..., xₙ) to emphasize the fact that Free(ϕ) ⊆ {x₁, ..., xₙ}.

To evaluate ϕ over a P-execution w = (a₁, p₁)(a₂, p₂)..., we consider (P, w) as a structure S_(P,w) = (P ⊎ Pos(w), P_s, P_e, P_se, (R_a)_{a∈A}, ∼, <, +1) where P ⊎ Pos(w) is the universe, P_s, P_e, and P_se are interpreted as unary relations, R_a is the unary relation {i ∈ Pos(w) | a_i = a}, < = {(i, j) ∈ Pos(w) × Pos(w) | i < j}, +1 = {(i, i + 1) | 1 ≤ i < |w|}, and ∼ is the smallest equivalence relation over P ⊎ Pos(w) containing

– (p, i) for all p ∈ P and i ∈ Pos(w) such that p = p_i, and
– (i, j) for all (i, j) ∈ Pos(w) × Pos(w) such that p_i = p_j.

An equivalence class of ∼ is often simply referred to as a class. Note that it contains exactly one process.

Example 1. Suppose A_s = {a, b} and A_e = {c, d}. Let the set of processes P be given by P_s = {1, 2, 3}, P_e = {4, 5}, and P_se = {6, 7, 8}. Moreover, let w = (a,1)(b,8)(d,7)(c,4)(a,6)(c,6)(a,7)(d,6)(b,2)(d,7)(a,7) ∈ Σ*. Figure 1 illustrates S_(P,w). The edge relation represents +1; its transitive closure is <.

An interpretation for (P, w) is a partial mapping I: V → P ∪ Pos(w). Suppose ϕ ∈ FO_A[∼, <, +1] such that Free(ϕ) ⊆ dom(I).
The satisfaction relation (P, w), I |= ϕ is then defined as expected, based on the structure S_(P,w) and interpreting free variables according to I. For example, let w = (a₁, p₁)(a₂, p₂)... and i ∈ Pos(w). Then, for I(x) = i, we have (P, w), I |= a(x) iff a_i = a.

We identify some fragments of FO_A[∼, <, +1]. For R ⊆ {∼, <, +1}, let FO_A[R] denote the set of formulas that do not use symbols in {∼, <, +1} \ R. Moreover, FO²_A[R] denotes the fragment of FO_A[R] that uses only two (reusable) variables.

Let ϕ(x₁, ..., xₙ, y) ∈ FO_A[∼, <, +1] be a formula and m ∈ N. We use ∃^{≥m} y.ϕ(x₁, ..., xₙ, y) as an abbreviation for

∃y₁ ... ∃yₘ. ⋀_{1≤i<j≤m} ¬(y_i = y_j) ∧ ⋀_{1≤i≤m} ϕ(x₁, ..., xₙ, y_i)

if m > 0, and ∃^{≥0} y.ϕ(x₁, ..., xₙ, y) = true. Thus, ∃^{≥m} y.ϕ says that there are at least m distinct elements that verify ϕ. We also use ∃^{=m} y.ϕ as an abbreviation for ∃^{≥m} y.ϕ ∧ ¬∃^{≥m+1} y.ϕ. Note that ϕ ∈ FO_A[R] implies that ∃^{≥m} y.ϕ ∈ FO_A[R] and ∃^{=m} y.ϕ ∈ FO_A[R].

Example 2. Let A, P, and w be like in Example 1 and Figure 1.

– ϕ₁ = ∀x. (s(x) ∨ se(x)) ⟹ ∃y.(x ∼ y ∧ (a(y) ∨ b(y))) says that each process that System can control executes at least one system action. We have ϕ₁ ∈ FO²_A[∼] and (P, w) ⊭ ϕ₁, as process 3 is idle.
– ϕ₂ = ∀x. d(x) ⟹ ∃y.(x ∼ y ∧ a(y)) says that, for every d, there is an a on the same process. We have ϕ₂ ∈ FO²_A[∼] and (P, w) |= ϕ₂.
– ϕ₃ = ∀x. d(x) ⟹ ∃y.(x ∼ y ∧ x < y ∧ a(y)) says that every d is eventually followed by an a executed by the same process. We have ϕ₃ ∈ FO²_A[∼, <] and (P, w) ⊭ ϕ₃: the event (d, 6) is not followed by some (a, 6).
– ϕ₄ = ∀x. ∃^{=2} y.(x ∼ y ∧ a(y)) ⟺ ∃^{=2} y.(x ∼ y ∧ d(y)) says that each class contains exactly two occurrences of a iff it contains exactly two occurrences of d. Moreover, ϕ₄ ∈ FO_A[∼] and (P, w) |= ϕ₄. Note that ϕ₄ ∉ FO²_A[∼], as ∃^{=2} y requires the use of three different variable names.

3 Parameterized Synthesis Problem

We define an asynchronous synthesis problem.
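Since ϕ₂ and ϕ₄ use neither < nor +1, they can be evaluated by counting letters per class. A sketch over the execution w of Example 1 (the encoding of events as pairs is ours):

```python
from collections import Counter

# The execution w of Example 1, as (action, process) events.
w = [('a', 1), ('b', 8), ('d', 7), ('c', 4), ('a', 6), ('c', 6),
     ('a', 7), ('d', 6), ('b', 2), ('d', 7), ('a', 7)]

# Letter counts per class (idle processes such as 3 and 5 never occur in w;
# with zero counts of every letter they satisfy both properties trivially).
per = {}
for act, p in w:
    per.setdefault(p, Counter())[act] += 1

# phi_2: for every d there is an a on the same process.
phi2 = all(c['a'] >= 1 for c in per.values() if c['d'] >= 1)

# phi_4: a class has exactly two a's iff it has exactly two d's.
phi4 = all((c['a'] == 2) == (c['d'] == 2) for c in per.values())

print(phi2, phi4)  # True True
```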
A P-strategy (for System) is a mapping f: Σ* → Σ_s ∪ {ε}. A P-execution w = σ₁σ₂... ∈ Σ* ∪ Σ^ω is f-compatible if, for all i ∈ Pos(w) such that σ_i ∈ Σ_s, we have f(σ₁...σ_{i−1}) = σ_i. We call w f-fair if the following hold: (i) if w is finite, then f(w) = ε, and (ii) if w is infinite and f(σ₁...σ_{i−1}) ≠ ε for infinitely many i ≥ 1, then σ_j ∈ Σ_s for infinitely many j ≥ 1.

Let ϕ ∈ FO_A[∼, <, +1] be a sentence. We say that f is P-winning for ϕ if, for every P-execution w that is f-compatible and f-fair, we have (P, w) |= ϕ.

The existence of a P-strategy that is P-winning for a given formula does not depend on the concrete process identities but only on the cardinality of the sets P_s, P_e, and P_se. This motivates the following definition of winning triples for a formula. Given ϕ, let Win(ϕ) be the set of triples (k_s, k_e, k_se) ∈ N^T for which there is P = (P_s, P_e, P_se) such that |P_θ| = k_θ for all θ ∈ T and there is a P-strategy that is P-winning for ϕ.

Let 0 = {0} and k_e, k_se ∈ N. In this paper, we focus on the intersection of Win(ϕ) with the sets N × 0 × 0 (which corresponds to the usual satisfiability problem); N × {k_e} × {k_se} (there is a constant number of environment and mixed processes); N × N × {k_se} (there is a constant number of mixed processes); 0 × 0 × N (each process is controlled by both System and Environment).

Definition 3 (synthesis problem). For fixed F ∈ {FO, FO²}, set of relation symbols R ⊆ {∼, <, +1}, and N_s, N_e, N_se ⊆ N, the (parameterized) synthesis problem Synth(F[R], N_s, N_e, N_se) is given as follows:

Input: A = A_s ⊎ A_e and a sentence ϕ ∈ F_A[R]
Question: Win(ϕ) ∩ (N_s × N_e × N_se) ≠ ∅ ?

The satisfiability problem for F[R] is defined as Synth(F[R], N, 0, 0).

Example 4. Suppose A_s = {a, b} and A_e = {c, d}, and consider the formulas ϕ₁–ϕ₄ from Example 2. First, we have Win(ϕ₁) = N^T.
Given an arbitrary P and any total order ⪯ over P_s ∪ P_se, a possible P-strategy f that is P-winning for ϕ₁ maps w ∈ Σ* to (a, p) if p is the smallest process from P_s ∪ P_se wrt. ⪯ that does not occur in w, and returns ε for w if all processes from P_s ∪ P_se already occur in w.

For the three formulas ϕ₂, ϕ₃, and ϕ₄, observe that, since d is an environment action, if there is at least one process that is exclusively controlled by Environment, then there is no winning strategy. Hence we must have P_e = ∅. In fact, this condition is sufficient in the three cases, and the strategies described below show that all three sets Win(ϕ₂), Win(ϕ₃), and Win(ϕ₄) are equal to N × 0 × N.

– For ϕ₂, the very same strategy as for ϕ₁ also works in this case, producing an a for every process in P_s ∪ P_se, whether there is a d or not.
– For ϕ₃, a winning strategy f will apply the previous mechanism iteratively, performing (a, p) for p ∈ P_se = {p₀, ..., p_{n−1}} over and over again: f(w) = (a, p_i) where i is the number of occurrences of letters from Σ_s in w modulo n. By the fairness assumption, this guarantees satisfaction of ϕ₃. A more "economical" winning strategy f′ may organize pending requests in terms of d in a queue and acknowledge them successively. More precisely, given u ∈ P* and σ ∈ Σ, we define another word uσ ∈ P* by u(d, p) = u·p (inserting p in the queue) and (p·u)(a, p) = u (deleting it). In all other cases, uσ = u. Let w = σ₁...σₙ ∈ Σ*, with queue ((εσ₁)σ₂ ...)σₙ = p₁...p_k. We let f′(w) = ε if k = 0, and f′(w) = (a, p₁) if k ≥ 1.
– For ϕ₄, the strategy f′ for ϕ₃ ensures that every d has a corresponding a so that, in the long run, there are as many a's as d's in every class.

Another interesting question is whether System (or Environment) has a winning strategy as soon as the number of processes is big enough. This leads to the notion of a cutoff (cf. [4] for an overview): Let N_s, N_e, N_se ⊆ N and W ⊆ N^T. We call k₀ ∈ N^T a cutoff of W wrt.
(N_s, N_e, N_se) if k₀ ∈ N_s × N_e × N_se and either

– for all k ∈ N_s × N_e × N_se such that k ≥ k₀, we have k ∈ W, or
– for all k ∈ N_s × N_e × N_se such that k ≥ k₀, we have k ∉ W.

Let F ∈ {FO, FO²} and R ⊆ {∼, <, +1}. If, for every alphabet A = A_s ⊎ A_e and every sentence ϕ ∈ F_A[R], the set Win(ϕ) has a computable cutoff wrt. (N_s, N_e, N_se), then we know that Synth(F[R], N_s, N_e, N_se) is decidable, as it can be reduced to a finite number of simple synthesis problems over a finite alphabet. The latter can be solved, e.g., using attractor-based backward search (cf. [42]). This is how we will show decidability of Synth(FO[∼], N, {k_e}, {k_se}) for all k_e, k_se ∈ N. We show, however, that in general there is no cutoff.

Our contributions are summarized in Table 1. Note that known satisfiability results for data logic apply to our logic, as processes can be simulated by treating every θ ∈ T as an ordinary letter. Let us first state undecidability of the general synthesis problem, which motivates the study of other FO fragments.

Table 1. Summary of results. Our contributions are highlighted in bold.

Synthesis       | (N, 0, 0)       | (N, {k_e}, {k_se}) | (N, N, 0) | (0, 0, N)
FO²[∼, <, +1]   | decidable [5]   | ?                  | ?         | undecidable
FO²[∼, <]       | NEXPTIME-c. [5] | ?                  | ?         | ?
FO[∼]           | decidable       | decidable          | ?         | undecidable

Theorem 5. The problem Synth(FO²[∼, <, +1], 0, 0, N) is undecidable.

Proof (sketch). We adapt the proof from [16, 17] reducing the halting problem for 2-counter machines. We show that their encoding can be expressed in our logic, even if we restrict it to two variables, and can also be adapted to the asynchronous setting.

4 FO[∼] and Parameterized Vector Games

Due to the undecidability result of Theorem 5, one has to switch to other fragments of first-order logic. We will henceforth focus on the logic FO[∼] and establish some important properties, such as a normal form, that will allow us to deduce a couple of results, both positive and negative.
4.1 Satisfiability and Normal Form for FO[∼]

We first show that FO[∼] logic essentially allows one to count letters in a class up to some threshold, and to count such classes up to some other threshold. Let B ∈ N and ℓ ∈ {0, ..., B}^A. Intuitively, ℓ(a) imposes a constraint on the number of occurrences of a in a class. We first define an FO_A[∼]-formula ψ_{B,ℓ}(y) verifying that, in the class defined by y, the number of occurrences of each letter a ∈ A, counted up to B, is ℓ(a):

ψ_{B,ℓ}(y) = ⋀_{a∈A | ℓ(a)<B} ∃^{=ℓ(a)} z. (y ∼ z ∧ a(z)) ∧ ⋀_{a∈A | ℓ(a)=B} ∃^{≥ℓ(a)} z. (y ∼ z ∧ a(z))
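Checking ψ_{B,ℓ} on a given class amounts to comparing capped letter counts with ℓ; a small sketch (the encoding of events as pairs and all names are ours):

```python
from collections import Counter

def psi(B, l, class_events, alphabet):
    """psi_{B,l}(y): in the class of y, each letter a occurs exactly l(a) times
    if l(a) < B, and at least l(a) (= B) times if l(a) = B."""
    c = Counter(a for a, _ in class_events)
    return all(c[a] == l[a] if l[a] < B else c[a] >= l[a] for a in alphabet)

# The class of process 7 in Example 1 contains two a's and two d's:
cls7 = [('d', 7), ('a', 7), ('d', 7), ('a', 7)]
print(psi(3, {'a': 2, 'b': 0, 'c': 0, 'd': 2}, cls7, 'abcd'))  # True
```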
It is worth mentioning that two-variable logic with one equivalence relation on arbitrary structures also has the ﬁnite-model property [32]. 4.2 From Synthesis to Parameterized Vector Games Exploiting the normal form for FO [∼], we now present a reduction of the syn- thesis problem to a strictly turn-based two-player game. This game is conceptu- ally simpler and easier to reason about. The reduction works in both directions, which will allow us to derive both decidability and undecidability results. Note that, given a formula ϕ ∈ FO [∼] (which we suppose to be in normal form with threshold B), the order of letters in an execution does not matter. Thus, given some P, a reasonable strategy for Environment would be to just “wait and see”. More precisely, it does not put Environment into a worse position if, given the current execution w ∈ Σ , it lets the System execute as many actions as it wants in terms of a word u ∈ Σ . Due to the fairness assumption, System would be able to execute all the letters from u anyway. Environment can even require System to play a word u such that (P,wu) |= ϕ. If System is not able to produce such a word, Environment can just sit back and do nothing. Conversely, upon wu satisfying ϕ, Environment has to be able to come up with a word v ∈ Σ such that (P,wuv) |= ϕ. This leads to a turn-based game in which System and Environment play in strictly alternate order and have to provide a satisfying and, respectively, falsifying execution. Parameterized Synthesis for First-Order Logic over Data Words 105 In a second step, we can get rid of process identiﬁers: According to our normal form, all we are interested in is the number of processes that agree on their letters counted up to threshold B. That is, a ﬁnite execution can be T A abstracted as a conﬁguration C : L → N where L = {0,...,B} .For ∈ L and C()=(n ,n ,n ), n is the number of processes of type θ whose letter count s e se θ up to threshold B corresponds to . 
We also say that ℓ contains n_θ tokens of type θ. If it is System's turn, it picks some pairs (ℓ, ℓ') and moves some tokens of type θ ∈ {s, se} from ℓ to ℓ', provided ℓ(a) ≤ ℓ'(a) for all a ∈ A_s and ℓ(a) = ℓ'(a) for all a ∈ A_e. This corresponds to adding more system letters in the corresponding processes. Environment proceeds analogously. Finally, the formula ϕ naturally translates to an acceptance condition F ⊆ C^L over configurations, where C is the set of local acceptance conditions, which are of the form (⋈_s n_s, ⋈_e n_e, ⋈_se n_se) where ⋈_s, ⋈_e, ⋈_se ∈ {=, ≥} and n_s, n_e, n_se ∈ ℕ.

We end up with a turn-based game in which, similarly to a VASS game [1,6,10,27,38], System and Environment move tokens along vectors from L. Note, however, that our games have a very particular structure, so that undecidability for VASS games does not carry over to our setting. Moreover, existing decidability results do not allow us to infer our cutoff results below. In the following, we formalize parameterized vector games.

Definition 9. A parameterized vector game (or simply game) is a triple G = (A, B, F) where A = A_s ⊎ A_e is the finite alphabet, B ∈ ℕ is a bound, and, letting L = {0,…,B}^A, F ⊆ C^L is a finite set called the acceptance condition.

Locations. Let ℓ_0 be the location such that ℓ_0(a) = 0 for all a ∈ A. For ℓ ∈ L and a ∈ A, we define ℓ + a by (ℓ + a)(b) = ℓ(b) for b ≠ a and (ℓ + a)(a) = min{ℓ(a) + 1, B}. This is extended to all u ∈ A^* and a ∈ A by ℓ + ε = ℓ and ℓ + ua = (ℓ + u) + a. By ⟪w⟫, we denote the location ℓ_0 + w.

Configurations. As explained above, a configuration of G is a mapping C : L → ℕ^T. Suppose that, for ℓ ∈ L and θ ∈ T, we have C(ℓ) = (n_s, n_e, n_se). Then, we let C(ℓ, θ) refer to n_θ. By Conf, we denote the set of all configurations.

Transitions. A system transition (respectively environment transition) is a mapping τ : L × L → (ℕ × {0} × ℕ) (respectively τ : L × L → ({0} × ℕ × ℕ)) such that, for all (ℓ, ℓ') ∈ L × L with τ(ℓ, ℓ') ≠ (0, 0, 0), there is a word w ∈ A_s^* (respectively w ∈ A_e^*) such that ℓ' = ℓ + w.
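The location update ℓ + a can be sketched as follows (representation ours); the count of a saturates at the threshold, i.e. becomes min{ℓ(a) + 1, B}, so that ⟪w⟫ identifies all words that agree on every letter count up to B:

```python
def add_letter(ell, a, B):
    """(ell + a): increment the a-entry, saturating at threshold B."""
    out = dict(ell)
    out[a] = min(ell[a] + 1, B)
    return out

def loc(word, alphabet, B):
    """<<w>>: the location l0 + w, where l0 is the all-zero location."""
    ell = {a: 0 for a in alphabet}
    for a in word:
        ell = add_letter(ell, a, B)
    return ell
```

With B = 2 over {a, b}, the words "aab" and "aaaab" land in the same location, since both contain at least two a's and exactly one b.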
Let T_s denote the set of system transitions, T_e the set of environment transitions, and T = T_s ∪ T_e the set of all transitions. For τ ∈ T, let the mappings out_τ, in_τ : L → ℕ^T be defined by out_τ(ℓ) = Σ_{ℓ' ∈ L} τ(ℓ, ℓ') and in_τ(ℓ) = Σ_{ℓ' ∈ L} τ(ℓ', ℓ) (recall that sums are component-wise). We say that τ ∈ T is applicable at C ∈ Conf if, for all ℓ ∈ L, we have out_τ(ℓ) ≤ C(ℓ) (component-wise). Abusing notation, we let τ(C) denote the configuration C' defined by C'(ℓ) = C(ℓ) − out_τ(ℓ) + in_τ(ℓ) for all ℓ ∈ L. Moreover, for τ(ℓ, ℓ') = (n_s, n_e, n_se) and θ ∈ T, we let τ(ℓ, ℓ', θ) refer to n_θ.

Plays. Let C ∈ Conf. We write C |= F if there is κ ∈ F such that, for all ℓ ∈ L, we have C(ℓ) |= κ(ℓ) (in the expected manner). A C-play, or simply play, is a finite sequence π = C_0 τ_1 C_1 τ_2 C_2 … τ_n C_n alternating between configurations and transitions (with n ≥ 0) such that C_0 = C and, for all i ∈ {1,…,n}, C_i = τ_i(C_{i−1}) and
– if i is odd, then τ_i ∈ T_s and C_i |= F (System's move),
– if i is even, then τ_i ∈ T_e and C_i ⊭ F (Environment's move).
The set of all C-plays is denoted by Plays_C.

Strategies. A C-strategy for System is a partial mapping f : Plays_C → T_s such that f(C) is defined and, for all π = C_0 τ_1 C_1 … τ_i C_i ∈ Plays_C with τ = f(π) defined, we have that τ is applicable at C_i and τ(C_i) |= F. A play π = C_0 τ_1 C_1 … τ_n C_n is
– f-compatible if, for all odd i ∈ {1,…,n}, τ_i = f(C_0 τ_1 C_1 … τ_{i−1} C_{i−1}),
– f-maximal if it is not the strict prefix of an f-compatible play,
– winning if C_n |= F.
We say that f is winning for System (from C) if all f-compatible f-maximal C-plays are winning. Finally, C is winning if there is a C-strategy that is winning. Note that, given an initial configuration C, we deal with an acyclic finite reachability game, so that, if there is a winning C-strategy, then there is a positional one, which only depends on the last configuration.

For k ∈ ℕ^T, let C_k denote the configuration that maps ℓ_0 to k and all other locations to (0, 0, 0).
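The applicability test (out_τ(ℓ) ≤ C(ℓ)) and the update C − out_τ + in_τ can be sketched as follows (data representation is ours: locations are arbitrary hashable keys, token counts are (n_s, n_e, n_se) lists, and a transition maps (source, destination) pairs to moved-token triples):

```python
def apply_transition(C, tau):
    """Return tau(C), or None if tau is not applicable at C.
    C: location -> [n_s, n_e, n_se]; tau: (src, dst) -> (n_s, n_e, n_se)."""
    out = {}  # out_tau: tokens leaving each source location
    for (src, _dst), moved in tau.items():
        acc = out.setdefault(src, [0, 0, 0])
        for i in range(3):
            acc[i] += moved[i]
    for src, need in out.items():  # applicability: out_tau(src) <= C(src)
        have = C.get(src, [0, 0, 0])
        if any(need[i] > have[i] for i in range(3)):
            return None
    C2 = {loc: list(v) for loc, v in C.items()}
    for (src, dst), moved in tau.items():  # C - out_tau + in_tau
        for i in range(3):
            C2.setdefault(src, [0, 0, 0])[i] -= moved[i]
            C2.setdefault(dst, [0, 0, 0])[i] += moved[i]
    return C2
```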
We set Win(G) = {k ∈ ℕ^T | C_k is winning for System}.

Definition 10 (game problem). For sets N_s, N_e, N_se ⊆ ℕ, the game problem is given as follows:

Game(N_s, N_e, N_se)
Input: Parameterized vector game G
Question: Win(G) ∩ (N_s × N_e × N_se) ≠ ∅ ?

One can show that parameterized vector games are equivalent to the synthesis problem in the following sense:

Lemma 11. For every sentence ϕ ∈ FO[∼], there is a parameterized vector game G = (A, B, F) such that Win(ϕ) = Win(G). Conversely, for every parameterized vector game G = (A, B, F), there is a sentence ϕ ∈ FO[∼] such that Win(G) = Win(ϕ). Both directions are effective.

Example 12. To illustrate parameterized vector games and the reduction from the synthesis problem, consider the formula ϕ'_s = ⋀_{θ ∈ T, ℓ ∈ Z} ∃^{=0} y. (θ(y) ∧ ψ_{3,ℓ}(y)) in normal form from Example 7. For simplicity, we assume that A_s = {a} and A_e = {d}. That is, Z is the set of vectors ⟪a^i d^j⟫ ∈ L = {0,…,3}^{a,d} such that i = 2 ≠ j or j = 2 ≠ i. Figure 2 illustrates a couple of configurations C_0,…,C_5 : L → ℕ^T. The leftmost location in a configuration is ℓ_0, the rightmost location ⟪a^3 d^3⟫, the topmost one ⟪a^3⟫, and the one at the bottom ⟪d^3⟫. Self-loops have been omitted, and locations from Z have a gray background and a dashed border.

[Figure 2. A play of a parameterized vector game]

Towards an equivalent game G = (A, 3, F), it remains to determine the acceptance condition F. Recall that ϕ'_s says that every class contains two occurrences of a iff it contains two occurrences of d. This is reflected by the acceptance condition F = {κ} where κ(ℓ) = (=0, =0, =0) for all ℓ ∈ Z and κ(ℓ) = (≥0, ≥0, ≥0) for all ℓ ∈ L \ Z. With this, a configuration is accepting iff no token is on a location from Z (a gray location).
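Since every play is finite and the game graph is acyclic, whether a configuration is winning can in principle be decided by backward induction over the alternation pattern: System must always move into an accepting configuration, Environment out of one, and a stuck player ends the play. A toy sketch of that induction (all names ours; successor moves are supplied explicitly as functions, and acyclicity is assumed):

```python
def system_wins(C, sys_succs, env_succs, accepts):
    """System (to move) wins from C iff she has an accepting successor D
    from which Environment loses."""
    return any(env_loses(D, sys_succs, env_succs, accepts)
               for D in sys_succs(C) if accepts(D))

def env_loses(D, sys_succs, env_succs, accepts):
    """At an accepting D, Environment must move to a non-accepting successor;
    if she cannot, the play is maximal and winning for System."""
    succs = [E for E in env_succs(D) if not accepts(E)]
    return all(system_wins(E, sys_succs, env_succs, accepts) for E in succs)
```

Note that `all` over an empty list is `True`: an Environment that cannot leave the accepting set loses, exactly as in the play semantics above.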
We can verify that Win(G) = Win(ϕ_s) = ℕ × {0} × ℕ. In G, a uniform winning strategy f for System that works for all P with P_e = ∅ proceeds as follows: System first awaits an Environment move and then moves each token upwards as many locations as Environment has moved it downwards. Figure 2 illustrates an f-maximal C_{(6,0,0)}-play that is winning for System. We note that f is a "compressed" version of the winning strategy presented in Example 4, as System makes her moves only when really needed.

5 Results for FO[∼] via Parameterized Vector Games

In this section, we present our results for the synthesis problem for FO[∼], which we obtain by showing corresponding results for parameterized vector games. In particular, we show that (FO[∼], 0, 0, ℕ) and (FO[∼], ℕ, ℕ, 0) do not have a cutoff, whereas (FO[∼], ℕ, {k_e}, {k_se}) has a cutoff for all k_e, k_se ∈ ℕ. Finally, we prove that Synth(FO[∼], 0, 0, ℕ) is, in fact, undecidable.

Lemma 13. There is a game G = (A, B, F) such that Win(G) does not have a cutoff wrt. (0, 0, ℕ).

Proof. We let A_s = {a} and A_e = {b}, as well as B = 2. For k ∈ {0, 1, 2}, define the local acceptance conditions k^= = (=0, =0, =k) and k^≥ = (=0, =0, ≥k). Set ℓ_1 = ⟪a⟫, ℓ_2 = ⟪ab⟫, ℓ_3 = ⟪a^2 b⟫, and ℓ_4 = ⟪a^2 b^2⟫. For k_0,…,k_4 ∈ {0, 1, 2} and ⋈_0,…,⋈_4 ∈ {=, ≥}, let [k_0^{⋈_0}, k_1^{⋈_1}, k_2^{⋈_2}, k_3^{⋈_3}, k_4^{⋈_4}] denote κ ∈ C^L where κ(ℓ_i) = k_i^{⋈_i} for all i ∈ {0,…,4} and κ(ℓ) = 0^= for ℓ ∉ {ℓ_0,…,ℓ_4}. Finally,

  F = { [0^≥, 2^=, 0^=, 0^=, 0^≥], [0^≥, 0^=, 0^=, 2^=, 0^≥], [0^=, 0^=, 0^=, 0^=, 2^≥], [0^≥, 1^=, 1^=, 0^=, 0^≥], [0^≥, 0^=, 0^=, 1^=, 1^≥] } ∪ K

where K = {κ_ℓ | ℓ ∈ L such that ℓ(b) > ℓ(a)} with κ_ℓ(ℓ') = 1^≥ if ℓ' = ℓ, and κ_ℓ(ℓ') = 0^≥ otherwise. This is illustrated in Figure 3.

[Figure 3. Acceptance conditions for a game with no cutoff wrt. (0, 0, ℕ)]
There is a winning strategy for System from any initial configuration of size 2n: move two tokens from ℓ_0 to ℓ_1, wait until Environment sends them both to ℓ_2, then move them to ℓ_3, wait until they are moved to ℓ_4, then repeat with two new tokens from ℓ_0 until all the tokens are removed from ℓ_0, and Environment cannot escape F anymore. However, one can check that there is no winning strategy for initial configurations of odd size.

Lemma 14. There is a game G = (A, B, F) such that Win(G) does not have a cutoff wrt. (ℕ, ℕ, 0).

Proof. We define G such that System wins only if she has at least as many processes as Environment. Let A_s = {a}, A_e = {b}, and B = 2. As there are no shared processes, we can safely ignore locations with a letter from both System and Environment. We set F = {κ_1, κ_2, κ_3, κ_4} where

  κ_1(⟪a⟫) = (=1, =0, =0)   κ_2(⟪a⟫) = (=1, =0, =0)   κ_3(⟪a⟫) = (=0, =0, =0)
  κ_1(⟪b⟫) = (=0, =0, =0)   κ_2(⟪b⟫) = (=0, ≥2, =0)   κ_3(⟪b⟫) = (=0, ≥1, =0),

κ_4(ℓ_0) = (=0, =0, =0), and κ_i(ℓ) = (≥0, ≥0, =0) for all other ℓ ∈ L and i ∈ {1, 2, 3, 4}.

We now turn to the case where the number of processes that can be triggered by Environment is bounded. Note that similar restrictions are imposed in other settings to get decidability, such as limiting the environment to a finite (Boolean) domain [16] or restricting to one environment process [3,18]. We obtain decidability of the synthesis problem via a cutoff construction:

Theorem 15. Given k_e, k_se ∈ ℕ, every game G = (A, B, F) has a cutoff wrt. (ℕ, {k_e}, {k_se}). More precisely: let K be the largest constant that occurs in F. Moreover, let Max = (k_e + k_se) · |A_e| · B and N̂ = |L|^{Max+1} · K. Then, (N̂, k_e, k_se) is a cutoff of Win(G) wrt. (ℕ, {k_e}, {k_se}).

Proof. We will show that, for all N ≥ N̂,

  (N, k_e, k_se) ∈ Win(G) ⟺ (N + 1, k_e, k_se) ∈ Win(G).
The main observation is that, when C contains more than K tokens in a given ℓ ∈ L, adding more tokens to ℓ does not change whether C |= F. Given C, C' ∈ Conf, we write C <_e C' if C ≠ C' and there is τ ∈ T_e such that τ(C) = C'. Note that the length d of a chain C_0 <_e C_1 <_e … <_e C_d is bounded by Max. In other words, Max is the maximal number of transitions that Environment can perform in a play. For all d ∈ {0,…,Max}, let Conf_d be the set of configurations C ∈ Conf such that the longest chain in (Conf, <_e) starting from C has length d.

Claim. Suppose that C ∈ Conf_d and ℓ ∈ L such that C(ℓ) = (N, n_e, n_se) with N ≥ |L|^{d+1} · K and n_e, n_se ∈ ℕ. Set D = C[ℓ ↦ (N + 1, n_e, n_se)]. Then, C is winning for System ⟺ D is winning for System.

To show the claim, we proceed by induction on d ∈ ℕ; the induction step is illustrated in Figure 4. In each implication, we distinguish the cases d = 0 and d ≥ 1. For the latter, we assume that the equivalence holds for all values strictly smaller than d. For τ ∈ T_s and ℓ, ℓ' ∈ L, we let τ[(ℓ, ℓ', s)++] denote the transition η ∈ T_s given by η(ℓ_1, ℓ_2, e) = τ(ℓ_1, ℓ_2, e) = 0, η(ℓ_1, ℓ_2, se) = τ(ℓ_1, ℓ_2, se), η(ℓ_1, ℓ_2, s) = τ(ℓ_1, ℓ_2, s) + 1 if (ℓ_1, ℓ_2) = (ℓ, ℓ'), and η(ℓ_1, ℓ_2, s) = τ(ℓ_1, ℓ_2, s) if (ℓ_1, ℓ_2) ≠ (ℓ, ℓ'). We define τ[(ℓ, ℓ', s)––] similarly (provided τ(ℓ, ℓ', s) ≥ 1).

⟹: Let f be a winning strategy for System from C ∈ Conf_d. Let τ = f(C) and C' = τ(C). Note that C' |= F. Since C(ℓ, s) = N ≥ |L|^{d+1} · K, there is ℓ' ∈ L such that ℓ + w = ℓ' for some w ∈ A_s^* and C'(ℓ', s) = N' ≥ |L|^d · K. We show that D = C[ℓ ↦ (N + 1, n_e, n_se)] is winning for System by exhibiting a corresponding winning strategy g from D that carefully controls the position of the additional token. First, set g(D) = η where η = τ[(ℓ, ℓ', s)++]. Let D' = η(D). We obtain D'(ℓ', s) = N' + 1. Note that, since N' ≥ K, the acceptance condition F cannot distinguish between C' and D'. Thus, we have D' |= F.

Case d = 0: As, for all transitions η' ∈ T_e, we have η'(D') = D' |= F, we have reached a maximal play that is winning for System.
We deduce that D is winning for System.

Case d ≥ 1: Take any η' ∈ T_e and D'' such that D'' = η'(D') ⊭ F. Let τ' = η' and C'' = τ'(C'). Note that D'' = C''[(ℓ', s) ↦ N' + 1], C'' = D''[(ℓ', s) ↦ N'], and C'', D'' ∈ Conf_{d'} for some d' < d. As f is a winning strategy for System from C, we have that C'' is winning for System. By induction hypothesis, D'' is winning for System, say by winning strategy g''. We let g(D η D' η' π) = g''(π) for all D''-plays π. For all unspecified plays, let g return any applicable system transition. Altogether, for any choice of η', we have that g is winning from D''. Thus, g is a winning strategy from D.

[Figure 4. Induction step in the cutoff construction]

⟸: Suppose g is a winning strategy for System from D. Thus, for η = g(D) and D' = η(D), we have D' |= F. Recall that D(ℓ, s) ≥ (|L|^{d+1} · K) + 1. We distinguish two cases:
1. Suppose there is ℓ' ∈ L such that ℓ' ≠ ℓ, D'(ℓ', s) = N' + 1 for some N' ≥ |L|^d · K, and η(ℓ, ℓ', s) ≥ 1. Then, we set τ = η[(ℓ, ℓ', s)––].
2. Otherwise, we have D'(ℓ, s) ≥ (|L|^d · K) + 1, and we set τ = η (as well as ℓ' = ℓ and N' = N).

Let C' = τ(C). Since D' |= F, one obtains C' |= F.

Case d = 0: For all transitions τ' ∈ T_e, we have τ'(C') = C' |= F. Thus, we have reached a maximal play that is winning for System. We deduce that C is winning for System.

Case d ≥ 1: Take any τ' ∈ T_e such that C'' = τ'(C') ⊭ F. Let η' = τ' and D'' = η'(D'). We have C'' = D''[(ℓ', s) ↦ N'], D'' = C''[(ℓ', s) ↦ N' + 1], and C'', D'' ∈ Conf_{d'} for some d' < d. As D'' is winning for System, by induction hypothesis, C'' is winning for System, say by winning strategy f''. We let f(C τ C' τ' π) = f''(π) for all C''-plays π. For all unspecified plays, let f return an arbitrary applicable system transition. Again, for any choice of τ', f is winning from C''. Thus, f is a winning strategy from C.

This concludes the proof of the claim and, therefore, of Theorem 15.

Corollary 16.
Let k_e, k_se ∈ ℕ be the number of environment and the number of mixed processes, respectively. The problems Game(ℕ, {k_e}, {k_se}) and Synth(FO[∼], ℕ, {k_e}, {k_se}) are decidable.

In particular, by Theorem 15, the game problem can be reduced to an exponential number of acyclic finite-state games whose size (and hence the time complexity of determining the winner) is exponential in the cutoff and, therefore, doubly exponential in the size of the alphabet, the bound B, and the fixed number of processes that are controllable by the environment.

Theorem 17. Game(0, 0, ℕ) and Synth(FO[∼], 0, 0, ℕ) are undecidable.

Proof. We provide a reduction from the halting problem for 2-counter machines (2CM) to Game(0, 0, ℕ). A 2CM M = (Q, Δ, c_1, c_2, q_0, q_h) has two counters, c_1 and c_2, a finite set of states Q, and a set of transitions Δ ⊆ Q × Op × Q where Op = {c_i++, c_i––, c_i==0 | i ∈ {1, 2}}. Moreover, we have an initial state q_0 ∈ Q and a halting state q_h ∈ Q. A configuration of M is a triple γ = (q, ν_1, ν_2) ∈ Q × ℕ × ℕ giving the current state and the current respective counter values. The initial configuration is γ_0 = (q_0, 0, 0) and the set of halting configurations is F_M = {q_h} × ℕ × ℕ (written F_M to distinguish it from the acceptance condition F). For t ∈ Δ, configuration (q', ν'_1, ν'_2) is a (t-)successor of (q, ν_1, ν_2), written (q, ν_1, ν_2) ⇒_t (q', ν'_1, ν'_2), if there is i ∈ {1, 2} such that ν'_{3−i} = ν_{3−i} and one of the following holds: (i) t = (q, c_i++, q') and ν'_i = ν_i + 1, or (ii) t = (q, c_i––, q') and ν'_i = ν_i − 1, or (iii) t = (q, c_i==0, q') and ν'_i = ν_i = 0. A run of M is a (finite or infinite) sequence γ_0 ⇒_{t_1} γ_1 ⇒_{t_2} … . The 2CM halting problem asks whether there is a run reaching a configuration in F_M. It is known to be undecidable [34].

We fix a 2CM M = (Q, Δ, c_1, c_2, q_0, q_h). Let A_s = Q ∪ Δ ∪ {a_1, a_2} and A_e = {b} with a_1, a_2, and b three fresh symbols.
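The 2CM semantics just recalled is easy to make executable; the following sketch (our encoding, with operations written as strings) applies one transition t to a configuration (q, ν1, ν2) and returns None when t is not enabled:

```python
def step(conf, t):
    """One 2CM step; t = (src, op, dst) with op in
    {'c1++', 'c1--', 'c1==0', 'c2++', 'c2--', 'c2==0'}."""
    q, v1, v2 = conf
    src, op, dst = t
    if src != q:
        return None
    v = [v1, v2]
    i = int(op[1]) - 1          # which counter the operation touches
    if op.endswith('++'):
        v[i] += 1
    elif op.endswith('--'):
        if v[i] == 0:
            return None         # decrement of an empty counter is blocked
        v[i] -= 1
    elif v[i] != 0:             # op is 'ci==0': a zero test
        return None
    return (dst, v[0], v[1])
```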
We consider the game G = (A, B, F) with A = A_s ⊎ A_e, B = 4, and F defined below. Let L = {0,…,B}^A. Since there are only processes shared by System and Environment, we alleviate notation and consider that a configuration is simply a mapping C : L → ℕ. From now on, to avoid confusion, we refer to configurations of the 2CM M as M-configurations, and to configurations of G as G-configurations.

Intuitively, every valid run of M will be encoded as a play in G, and the acceptance condition will enforce that, if a player in G deviates from a valid play, then she loses immediately. At any point in the play, there will be at most one process with only a letter from Q played, which represents the current state in the simulated 2CM run. Similarly, there will be at most one process with only a letter from Δ, representing the transition to be taken next. Finally, the value of counter c_i will be encoded by the number of processes with exactly two occurrences of a_i and two occurrences of b (i.e., C(⟪a_i^2 b^2⟫)). To increase counter c_i, the players will move a new token to ⟪a_i^2 b^2⟫, and to decrease it, they will move, together, a token from ⟪a_i^2 b^2⟫ to ⟪a_i^4 b^4⟫. Observe that, if c_i has value 0, then C(⟪a_i^2 b^2⟫) = 0 in the corresponding configuration of the game. As expected, it is then impossible to simulate the decrement of c_i. Environment's only role is to acknowledge System's actions by playing its (only) letter when System simulates a valid run. If System tries to cheat, she loses immediately.

Encoding an M-configuration. Let us be more formal. Suppose γ = (q, ν_1, ν_2) is an M-configuration and C a G-configuration. We say that C encodes γ if

– C(⟪q⟫) = 1, C(⟪a_1^2 b^2⟫) = ν_1, C(⟪a_2^2 b^2⟫) = ν_2,
– C(ℓ) ≥ 0 for all ℓ ∈ {ℓ_0} ∪ {⟪q̂^2 b^2⟫, ⟪t^2 b^2⟫, ⟪a_i^4 b^4⟫ | q̂ ∈ Q, t ∈ Δ, i ∈ {1, 2}},
– C(ℓ) = 0 for all other ℓ ∈ L.

We then write γ = m(C). Let C(γ) be the set of G-configurations C that encode γ.
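This encoding can be checked mechanically. In the sketch below (location names are our own strings, e.g. 'a1^2 b^2' standing for ⟪a_1^2 b^2⟫; transitions in Δ are named by strings as well), a G-configuration is a dict from location names to token counts, with absent keys meaning 0:

```python
def encodes(C, gamma, Q, Delta):
    """Does G-configuration C encode M-configuration gamma = (q, v1, v2)?"""
    q, v1, v2 = gamma
    if C.get(q, 0) != 1:                      # exactly one current-state token
        return False
    if C.get('a1^2 b^2', 0) != v1 or C.get('a2^2 b^2', 0) != v2:
        return False                          # counter values
    # locations whose token count is unconstrained (>= 0):
    free = {'l0', 'a1^4 b^4', 'a2^4 b^4'}
    free |= {f'{p}^2 b^2' for p in Q} | {f'{t}^2 b^2' for t in Delta}
    pinned = {q, 'a1^2 b^2', 'a2^2 b^2'}
    # every remaining location must be empty:
    return all(n == 0 for loc, n in C.items() if loc not in free | pinned)
```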
We say that a G-configuration C is valid if C ∈ C(γ) for some γ.

Simulating a transition of M. Let us explain how we go from a G-configuration encoding γ to a G-configuration encoding a successor M-configuration γ'. Observe that System cannot change by herself the M-configuration encoded. If, for instance, she tries to change the current state q, she might move one process from ℓ_0 to ⟪q'⟫, but then the G-configuration is no longer valid. We need to move the process in ⟪q⟫ to ⟪q^2 b^2⟫, and this requires the cooperation of Environment.

Assume that the game is in configuration C encoding γ = (q, ν_1, ν_2). System picks a transition t starting in state q, say t = (q, c_1++, q'). From configuration C, System goes to the configuration C_1 defined by C_1(⟪t⟫) = 1, C_1(⟪a_1⟫) = 1, and C_1(ℓ) = C(ℓ) for all other ℓ ∈ L.

If the transition t is correctly chosen, Environment goes to a configuration C_2 defined by C_2(⟪q⟫) = 0, C_2(⟪qb⟫) = 1, C_2(⟪t⟫) = 0, C_2(⟪tb⟫) = 1, C_2(⟪a_1⟫) = 0, C_2(⟪a_1 b⟫) = 1 and, for all other ℓ ∈ L, C_2(ℓ) = C_1(ℓ). This means that Environment moves the processes in locations ⟪t⟫, ⟪q⟫, ⟪a_1⟫ to locations ⟪tb⟫, ⟪qb⟫, ⟪a_1 b⟫, respectively.

To finish the transition, System now moves a process to the destination state q' of t, going to the configuration C_3 defined by C_3(⟪q'⟫) = 1, C_3(⟪tb⟫) = 0, C_3(⟪t^2 b⟫) = 1, C_3(⟪qb⟫) = 0, C_3(⟪q^2 b⟫) = 1, C_3(⟪a_1 b⟫) = 0, C_3(⟪a_1^2 b⟫) = 1, and C_3(ℓ) = C_2(ℓ) for all other ℓ ∈ L.

Finally, Environment moves to the configuration C_4 given by C_4(⟪t^2 b⟫) = 0, C_4(⟪t^2 b^2⟫) = C_3(⟪t^2 b^2⟫) + 1, C_4(⟪q^2 b⟫) = 0, C_4(⟪q^2 b^2⟫) = C_3(⟪q^2 b^2⟫) + 1, C_4(⟪a_1^2 b⟫) = 0, C_4(⟪a_1^2 b^2⟫) = C_3(⟪a_1^2 b^2⟫) + 1, and C_4(ℓ) = C_3(ℓ) for all other ℓ ∈ L. Observe that C_4 ∈ C((q', ν_1 + 1, ν_2)).

Other types of transitions are simulated similarly. To force System to start the simulation in γ_0, and not in an arbitrary M-configuration, the configurations C such that C(⟪q_0^2 b^2⟫) = 0 and C(⟪q⟫) = 1 for q ≠ q_0 are not valid and will be losing for System.
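The four-step simulation of an increment t = (q, c1++, q') described above can be traced in code (same hypothetical string naming of locations as our earlier sketches; each move transfers one token between the named locations, so the net effect is one new token on 'a1^2 b^2'):

```python
def move(C, src, dst):
    """Transfer one token from location src to dst (returns a new dict)."""
    C2 = dict(C)
    assert C2.get(src, 0) >= 1, f'no token to move from {src}'
    C2[src] = C2.get(src, 0) - 1
    C2[dst] = C2.get(dst, 0) + 1
    return C2

def simulate_increment(C, q, t, q_next):
    # System introduces the transition token and a fresh a1 token:
    C1 = move(move(C, 'l0', t), 'l0', 'a1')
    # Environment acknowledges with one b on each of q, t, a1:
    C2 = move(move(move(C1, q, f'{q} b'), t, f'{t} b'), 'a1', 'a1 b')
    # System doubles those letters and places the new state q_next:
    C3 = move(move(move(move(C2, f'{q} b', f'{q}^2 b'),
                        f'{t} b', f'{t}^2 b'),
                   'a1 b', 'a1^2 b'), 'l0', q_next)
    # Environment completes the second b's; one new token reaches a1^2 b^2,
    # i.e. counter c1 has been incremented:
    return move(move(move(C3, f'{q}^2 b', f'{q}^2 b^2'),
                     f'{t}^2 b', f'{t}^2 b^2'), 'a1^2 b', 'a1^2 b^2')
```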
Acceptance condition. It remains to define F in a way that enforces the above sequence of G-configurations. Let L_∗ = {ℓ_0} ∪ {⟪a_i^2 b^2⟫, ⟪a_i^4 b^4⟫ | i ∈ {1, 2}} ∪ {⟪q^2 b^2⟫ | q ∈ Q} ∪ {⟪t^2 b^2⟫ | t ∈ Δ} be the set of locations whose values do not affect the acceptance of a configuration. By [ℓ_1 ⋈_1 n_1, …, ℓ_k ⋈_k n_k], we denote κ ∈ C^L such that κ(ℓ_i) = (⋈_i n_i) for i ∈ {1,…,k} and κ(ℓ) = (=0) for all ℓ ∈ L \ {ℓ_1,…,ℓ_k}. Moreover, for a set of locations L̂ ⊆ L, we let L̂ ≥ 0 stand for "(ℓ ≥ 0) for all ℓ ∈ L̂".

First, we force Environment to play only in response to System by making System win as soon as there is a process where Environment has played more letters than System (see Condition (d) in Table 2). If γ is not halting, the configurations in C(γ) will not be winning for System. Hence, System will have to move to win (Condition (a)).

Table 2. Acceptance conditions for the game simulating a 2CM

Requirements for System

(a) For all t = (q, op, q') ∈ Δ:
  F_{(q,t)} = ⋃_{q̂ ∈ Q} [⟪q⟫ = 1, ⟪t⟫ = 1, ⟪a_i⟫ = 1, ⟪q̂^2 b^2⟫ ≥ 1, L_∗ \ {⟪q̂^2 b^2⟫} ≥ 0] if op = c_i++
  F_{(q,t)} = ⋃_{q̂ ∈ Q} [⟪q⟫ = 1, ⟪t⟫ = 1, ⟪a_i^3 b^2⟫ = 1, ⟪q̂^2 b^2⟫ ≥ 1, L_∗ \ {⟪q̂^2 b^2⟫} ≥ 0] if op = c_i––
  F_{(q,t)} = ⋃_{q̂ ∈ Q} [⟪q⟫ = 1, ⟪t⟫ = 1, ⟪a_i^2 b^2⟫ = 0, ⟪q̂^2 b^2⟫ ≥ 1, L_∗ \ {⟪q̂^2 b^2⟫, ⟪a_i^2 b^2⟫} ≥ 0] if op = c_i==0

(b) For all t = (q_0, op, q') ∈ Δ such that op ∈ {c_i++, c_i==0}:
  F_t = [⟪q_0⟫ = 1, ⟪t⟫ = 1, ⟪a_i⟫ = 1, ℓ_0 ≥ 0] if op = c_i++
  F_t = [⟪q_0⟫ = 1, ⟪t⟫ = 1, ℓ_0 ≥ 0] if op = c_i==0

(c) For all t = (q, op, q') ∈ Δ:
  F_{(q,t,q')} = [⟪q^2 b⟫ = 1, ⟪t^2 b⟫ = 1, ⟪a_i^2 b⟫ = 1, ⟪q'⟫ = 1, L_∗ ≥ 0] if op = c_i++
  F_{(q,t,q')} = [⟪q^2 b⟫ = 1, ⟪t^2 b⟫ = 1, ⟪a_i^4 b^3⟫ = 1, ⟪q'⟫ = 1, L_∗ ≥ 0] if op = c_i––
  F_{(q,t,q')} = [⟪q^2 b⟫ = 1, ⟪t^2 b⟫ = 1, ⟪q'⟫ = 1, L_∗ ≥ 0] if op = c_i==0

Requirements for Environment

(d) Let L_{s<e} = {ℓ ∈ L | Σ_{α ∈ A_s} ℓ(α) < ℓ(b)}.
For all ℓ ∈ L_{s<e}: F_ℓ = [ℓ ≥ 1, (L \ {ℓ}) ≥ 0]

(e) For all t = (q, op, q') ∈ Δ:
  if op = c_i++: F^e_{(q,t)} = { [⟪qb⟫ = 1, ⟪t⟫ = 1, ⟪a_i⟫ = 1, L_∗ ≥ 0], [⟪q⟫ = 1, ⟪tb⟫ = 1, ⟪a_i⟫ = 1, L_∗ ≥ 0], [⟪q⟫ = 1, ⟪t⟫ = 1, ⟪a_i b⟫ = 1, L_∗ ≥ 0], [⟪qb⟫ = 1, ⟪tb⟫ = 1, ⟪a_i⟫ = 1, L_∗ ≥ 0], [⟪qb⟫ = 1, ⟪t⟫ = 1, ⟪a_i b⟫ = 1, L_∗ ≥ 0], [⟪q⟫ = 1, ⟪tb⟫ = 1, ⟪a_i b⟫ = 1, L_∗ ≥ 0] }
  if op = c_i––: F^e_{(q,t)} = { [⟪qb⟫ = 1, ⟪t⟫ = 1, ⟪a_i^3 b^2⟫ = 1, L_∗ ≥ 0], [⟪q⟫ = 1, ⟪tb⟫ = 1, ⟪a_i^3 b^2⟫ = 1, L_∗ ≥ 0], [⟪q⟫ = 1, ⟪t⟫ = 1, ⟪a_i^3 b^3⟫ = 1, L_∗ ≥ 0], [⟪qb⟫ = 1, ⟪tb⟫ = 1, ⟪a_i^3 b^2⟫ = 1, L_∗ ≥ 0], [⟪qb⟫ = 1, ⟪t⟫ = 1, ⟪a_i^3 b^3⟫ = 1, L_∗ ≥ 0], [⟪q⟫ = 1, ⟪tb⟫ = 1, ⟪a_i^3 b^3⟫ = 1, L_∗ ≥ 0] }
  if op = c_i==0: F^e_{(q,t)} = { [⟪qb⟫ = 1, ⟪t⟫ = 1, L_∗ ≥ 0], [⟪q⟫ = 1, ⟪tb⟫ = 1, L_∗ ≥ 0] }

(f) For all t = (q, op, q') ∈ Δ:
  if op = c_i++: F^e_{(q,t,q')} = { [⟪q'⟫ = 1, ⟪q^2 b⟫ = 1, ⟪t^2 b⟫ ≥ 0, ⟪a_i^2 b⟫ ≥ 0, L_∗ ≥ 0], [⟪q'⟫ = 1, ⟪q^2 b⟫ ≥ 0, ⟪t^2 b⟫ = 1, ⟪a_i^2 b⟫ ≥ 0, L_∗ ≥ 0], [⟪q'⟫ = 1, ⟪q^2 b⟫ ≥ 0, ⟪t^2 b⟫ ≥ 0, ⟪a_i^2 b⟫ = 1, L_∗ ≥ 0], [⟪q'b⟫ = 1, ⟪q^2 b⟫ ≥ 0, ⟪t^2 b⟫ ≥ 0, ⟪a_i^2 b⟫ ≥ 0, L_∗ ≥ 0] }
  if op = c_i––: F^e_{(q,t,q')} = { [⟪q'⟫ = 1, ⟪q^2 b⟫ = 1, ⟪t^2 b⟫ ≥ 0, ⟪a_i^4 b^3⟫ ≥ 0, L_∗ ≥ 0], [⟪q'⟫ = 1, ⟪q^2 b⟫ ≥ 0, ⟪t^2 b⟫ = 1, ⟪a_i^4 b^3⟫ ≥ 0, L_∗ ≥ 0], [⟪q'⟫ = 1, ⟪q^2 b⟫ ≥ 0, ⟪t^2 b⟫ ≥ 0, ⟪a_i^4 b^3⟫ = 1, L_∗ ≥ 0], [⟪q'b⟫ = 1, ⟪q^2 b⟫ ≥ 0, ⟪t^2 b⟫ ≥ 0, ⟪a_i^4 b^3⟫ ≥ 0, L_∗ ≥ 0] }
  if op = c_i==0: F^e_{(q,t,q')} = { [⟪q'⟫ = 1, ⟪q^2 b⟫ = 1, ⟪t^2 b⟫ ≥ 0, L_∗ ≥ 0], [⟪q'⟫ = 1, ⟪q^2 b⟫ ≥ 0, ⟪t^2 b⟫ = 1, L_∗ ≥ 0], [⟪q'b⟫ = 1, ⟪q^2 b⟫ ≥ 0, ⟪t^2 b⟫ ≥ 0, ⟪a_i^4 b^3⟫ ≥ 0, L_∗ ≥ 0] }

The first transition chosen by System must start from the initial state of M. This is enforced by Condition (b). Once System has moved, Environment will move other processes to leave accepting configurations.
The only possible move for her is to add b on a process in locations ⟪q⟫, ⟪t⟫, and ⟪a_i⟫ if t is a transition incrementing counter c_i (respectively ⟪a_i^3 b^2⟫ if t is a transition decrementing counter c_i). All other G-configurations accessible by Environment from the accepting configurations already defined are winning for System, as established in Condition (e). System can now encode the successor configuration of M, according to the chosen transition, by moving a process to the destination state of the transition (see Condition (c)). Finally, Environment makes the necessary transitions for the configuration to be a valid G-configuration. If she deviates, System wins (see Condition (f)).

If Environment reaches a configuration in C(γ) for γ ∈ F_M (the set of halting M-configurations), System can win by moving the process in ⟪q_h⟫ to ⟪q_h^2⟫. From there, all the configurations reachable by Environment are also winning for System:

  F_h = { [⟪q_h^2⟫ = 1, L_∗ ≥ 0], [⟪q_h^2 b⟫ = 1, L_∗ ≥ 0], [⟪q_h^2 b^2⟫ = 1, L_∗ ≥ 0] }.

Finally, the acceptance condition is given by

  F = ⋃_{ℓ ∈ L_{s<e}} F_ℓ ∪ ⋃_{t=(q_0,op,q') ∈ Δ} F_t ∪ ⋃_{t=(q,op,q') ∈ Δ} (F_{(q,t)} ∪ F_{(q,t,q')} ∪ F^e_{(q,t)} ∪ F^e_{(q,t,q')}) ∪ F_h.

Note that a correct play can end in three different ways: either there is a process in ⟪q_h⟫ and System moves it to ⟪q_h^2⟫, or System has no transition to pick, or there are not enough processes in ℓ_0 for System to simulate a new transition. Only the first kind is winning for System. We can show that there is an accepting run in M iff there is some k such that System has a winning C_{(0,0,k)}-strategy for G.

6 Conclusion

There are several questions that we left open and that are interesting in their own right due to their fundamental character. Moreover, in the decidable cases, it will be worthwhile to provide tight bounds on cutoffs and on the algorithmic complexity of the decision problems. As in [7,15,16,30,31], our strategies allow the system to have a global view of the whole program run executed so far.
However, it is also perfectly natural to consider uniform local strategies where each process only sees its own actions and possibly those that are revealed according to some causal dependencies. This is, e.g., the setting considered in [3,18] for a fixed number of processes and in [25] for parameterized systems over ring architectures. Moreover, we would like to study a parameterized version of the control problem [35] where, in addition to a specification, a program in terms of an arena is already given but has to be controlled in a way such that the specification is satisfied. Finally, our synthesis results crucially rely on the fact that the number of processes in each execution is finite. It would be interesting to consider the case with potentially infinitely many processes.

References

1. P. A. Abdulla, R. Mayr, A. Sangnier, and J. Sproston. Solving parity games on integer vectors. In P. R. D'Argenio and H. C. Melgratti, editors, CONCUR 2013 - Concurrency Theory - 24th International Conference, CONCUR 2013, Buenos Aires, Argentina, August 27-30, 2013. Proceedings, volume 8052 of Lecture Notes in Computer Science, pages 106–120. Springer, 2013.
2. B. Bérard, B. Bollig, M. Lehaut, and N. Sznajder. Parameterized synthesis for fragments of first-order logic over data words. CoRR, abs/1910.14294, 2019.
3. R. Beutner, B. Finkbeiner, and J. Hecking-Harbusch. Translating Asynchronous Games for Distributed Synthesis. In W. Fokkink and R. van Glabbeek, editors, 30th International Conference on Concurrency Theory (CONCUR 2019), volume 140 of Leibniz International Proceedings in Informatics (LIPIcs), pages 26:1–26:16, Dagstuhl, Germany, 2019. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
4. R. Bloem, S. Jacobs, A. Khalimov, I. Konnov, S. Rubin, H. Veith, and J. Widder. Decidability of Parameterized Verification. Morgan & Claypool Publishers, 2015.
5. M. Bojanczyk, C. David, A. Muscholl, T. Schwentick, and L.
Segoufin. Two-variable logic on data words. ACM Trans. Comput. Log., 12(4):27, 2011.
6. T. Brázdil, P. Jancar, and A. Kučera. Reachability games on extended vector addition systems with states. In ICALP'10, Part II, volume 6199 of LNCS, pages 478–489. Springer, 2010.
7. B. Brütsch and W. Thomas. Playing games in the Baire space. In Proc. Cassting Workshop on Games for the Synthesis of Complex Systems and 3rd Int. Workshop on Synthesis of Complex Parameters, volume 220 of EPTCS, pages 13–25, 2016.
8. J. R. Büchi and L. H. Landweber. Solving sequential conditions by finite-state strategies. Transactions of the American Mathematical Society, 138:295–311, Apr. 1969.
9. A. Church. Applications of recursive arithmetic to the problem of circuit synthesis. In Summaries of the Summer Institute of Symbolic Logic – Volume 1, pages 3–50. Institute for Defense Analyses, 1957.
10. J. Courtois and S. Schmitz. Alternating vector addition systems with states. In E. Csuhaj-Varjú, M. Dietzfelbinger, and Z. Ésik, editors, Mathematical Foundations of Computer Science 2014 - 39th International Symposium, MFCS 2014, Budapest, Hungary, August 25-29, 2014. Proceedings, Part I, volume 8634 of Lecture Notes in Computer Science, pages 220–231. Springer, 2014.
11. S. Demri, D. D'Souza, and R. Gascon. Temporal logics of repeating values. J. Log. Comput., 22(5):1059–1096, 2012.
12. S. Demri and R. Lazić. LTL with the freeze quantifier and register automata. ACM Transactions on Computational Logic, 10(3), 2009.
13. J. Esparza. Keeping a crowd safe: On the complexity of parameterized verification. In STACS'14, volume 25 of Leibniz International Proceedings in Informatics, pages 1–10. Leibniz-Zentrum für Informatik, 2014.
14. K. Etessami, M. Y. Vardi, and T. Wilke. First-order logic with two variables and unary temporal logic. Inf. Comput., 179(2):279–295, 2002.
15. L. Exibard, E. Filiot, and P.-A. Reynier. Synthesis of Data Word Transducers. In W. Fokkink and R.
van Glabbeek, editors, 30th International Conference on Concurrency Theory (CONCUR 2019), volume 140 of Leibniz International Proceedings in Informatics (LIPIcs), pages 24:1–24:15, Dagstuhl, Germany, 2019. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
16. D. Figueira and M. Praveen. Playing with repetitions in data words using energy games. In A. Dawar and E. Grädel, editors, Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2018, Oxford, UK, July 09-12, 2018, pages 404–413. ACM, 2018.
17. D. Figueira and M. Praveen. Playing with repetitions in data words using energy games. arXiv preprint arXiv:1802.07435, 2018.
18. B. Finkbeiner and E. Olderog. Petri games: Synthesis of distributed systems with causal memory. Inf. Comput., 253:181–203, 2017.
19. H. Frenkel, O. Grumberg, and S. Sheinvald. An automata-theoretic approach to model-checking systems and specifications over infinite data domains. J. Autom. Reasoning, 63(4):1077–1101, 2019.
20. M. Fürer. The computational complexity of the unconstrained limited domino problem (with implications for logical decision problems). In E. Börger, G. Hasenjaeger, and D. Rödding, editors, Logic and Machines: Decision Problems and Complexity, Proceedings of the Symposium "Rekursive Kombinatorik" held from May 23-28, 1983 at the Institut für Mathematische Logik und Grundlagenforschung der Universität Münster/Westfalen, volume 171 of Lecture Notes in Computer Science, pages 312–319. Springer, 1983.
21. P. Gastin and N. Sznajder. Fair synthesis for asynchronous distributed systems. ACM Transactions on Computational Logic, 14(2:9), 2013.
22. E. Grädel, P. G. Kolaitis, and M. Y. Vardi. On the decision problem for two-variable first-order logic. Bulletin of Symbolic Logic, 3(1):53–69, 1997.
23. W. Hanf. Model-theoretic methods in the study of elementary logic. In J. W. Addison, L. Henkin, and A. Tarski, editors, The Theory of Models.
North-Holland, Amsterdam, 1965.
24. F. Horn, W. Thomas, N. Wallmeier, and M. Zimmermann. Optimal strategy synthesis for request-response games. RAIRO - Theor. Inf. and Applic., 49(3):179–203, 2015.
25. S. Jacobs and R. Bloem. Parameterized synthesis. Logical Methods in Computer Science, 10(1), 2014.
26. S. Jacobs, L. Tentrup, and M. Zimmermann. Distributed synthesis for parameterized temporal logics. Inf. Comput., 262(Part):311–328, 2018.
27. P. Jancar. On reachability-related games on vector addition systems with states. In RP'15, volume 9328 of LNCS, pages 50–62. Springer, 2015.
28. M. Jenkins, J. Ouaknine, A. Rabinovich, and J. Worrell. The church synthesis problem with metric. In M. Bezem, editor, Computer Science Logic, 25th International Workshop / 20th Annual Conference of the EACSL, CSL 2011, September 12-15, 2011, Bergen, Norway, Proceedings, volume 12 of LIPIcs, pages 307–321. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2011.
29. M. Kaminski and N. Francez. Finite-memory automata. Theoretical Computer Science, 134(2):329–363, 1994.
30. A. Khalimov and O. Kupferman. Register-Bounded Synthesis. In W. Fokkink and R. van Glabbeek, editors, 30th International Conference on Concurrency Theory (CONCUR 2019), volume 140 of Leibniz International Proceedings in Informatics (LIPIcs), pages 25:1–25:16, Dagstuhl, Germany, 2019. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
31. A. Khalimov, B. Maderbacher, and R. Bloem. Bounded synthesis of register transducers. In S. K. Lahiri and C. Wang, editors, Automated Technology for Verification and Analysis - 16th International Symposium, ATVA 2018, Los Angeles, CA, USA, October 7-10, 2018, Proceedings, volume 11138 of Lecture Notes in Computer Science, pages 494–510. Springer, 2018.
32. E. Kieroński and M. Otto. Small substructures and decidability issues for first-order logic with two variables. J. Symb. Log., 77(3):729–765, 2012.
33. L.
Libkin, T. Tan, and D. Vrgoc. Regular expressions for data words. J. Comput. Syst. Sci., 81(7):1278–1297, 2015. 34. M. L. Minsky. Computation: Finite and Inﬁnite Machines. Prentice Hall, Upper Saddle River, NJ, USA, 1967. 35. A. Muscholl. Automated synthesis of distributed controllers. In M. M. Halld´ orsson, K. Iwama, N. Kobayashi, and B. Speckmann, editors, Automata, Languages, and Programming - 42nd International Colloquium, ICALP 2015, Kyoto, Japan, July 6- 10, 2015, Proceedings, Part II, volume 9135 of Lecture Notes in Computer Science, pages 11–27. Springer, 2015. 36. A. Pnueli and R. Rosner. Distributed reactive systems are hard to synthesize. In 31st Annual Symposium on Foundations of Computer Science, St. Louis, Missouri, USA, October 22-24, 1990, Volume II, pages 746–757. IEEE Computer Society, 37. M. O. Rabin. Automata on inﬁnite objects and Church’s problem.Number 13 in Regional Conference Series in Mathematics. American Mathematical Soc., 1972. 38. J. Raskin, M. Samuelides, and L. V. Begin. Games for counting abstractions. Electr. Notes Theor. Comput. Sci., 128(6):69–85, 2005. 39. A. Sangnier and O. Stietel. Private communication, 2020. 40. L. Schr¨ oder, D. Kozen, S. Milius, and T. Wißmann. Nominal automata with name binding. In J. Esparza and A. S. Murawski, editors, Foundations of Software Science and Computation Structures - 20th International Conference, FOSSACS 2017, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2017, Uppsala, Sweden, April 22-29, 2017, Proceedings, volume 10203 of Lecture Notes in Computer Science, pages 124–142, 2017. 41. T. Schwentick and K. Barthelmann. Local normal forms for ﬁrst-order logic with applications to games and automata. In Annual Symposium on Theoretical Aspects of Computer Science, pages 444–454. Springer, 1998. 42. W. Thomas. Church’s problem and a tour through automata theory. 
In Pillars of Computer Science, Essays Dedicated to Boris (Boaz) Trakhtenbrot on the Occasion of His 85th Birthday, volume 4800 of Lecture Notes in Computer Science, pages 635–655. Springer, 2008. 43. Y. Velner and A. Rabinovich. Church synthesis problem for noisy input. In M. Hof- mann, editor, Foundations of Software Science and Computational Structures - 14th International Conference, FOSSACS 2011, Held as Part of the Joint Euro- pean Conferences on Theory and Practice of Software, ETAPS 2011, Saarbrucken, ¨ Germany, March 26-April 3, 2011. Proceedings, volume 6604 of Lecture Notes in Computer Science, pages 275–289. Springer, 2011. 118 B. B´ erard et al. Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/ 4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. Controlling a random population 1 2,3 1 Thomas Colcombet , Nathana¨el Fijalkow (), and Pierre Ohlmann Universit´e de Paris, IRIF, CNRS, Paris, France {thomas.colcombet,pierre.ohlmann}@irif.fr CNRS, LaBRI, Bordeaux, France nathanael.fijalkow@labri.fr The Alan Turing Institute of data science, London, United Kingdom Abstract. Bertrand et al. 
introduced a model of parameterised systems, where each agent is represented by a finite state system, and studied the following control problem: for any number of agents, does there exist a controller able to bring all agents to a target state? They showed that the problem is decidable and EXPTIME-complete in the adversarial setting, and posed as an open problem the stochastic setting, where the agent is represented by a Markov decision process. In this paper, we show that the stochastic control problem is decidable. Our solution makes significant use of well quasi orders, of the max-flow min-cut theorem, and of the theory of regular cost functions.

1 Introduction

The control problem for populations of identical agents. The model we study was introduced in [3] (see also the journal version [4]): a population of agents is controlled uniformly, meaning that the controller applies the same action to every agent. The agents are represented by a finite state system, the same for every agent. The key difficulty is that there is an arbitrarily large number of agents: the control problem asks whether for every n ∈ N, there exists a controller able to bring all n agents synchronously to a target state. The technical contribution of [3,4] is to prove that in the adversarial setting, where an opponent chooses the evolution of the agents, the (adversarial) control problem is EXPTIME-complete.

In this paper, we study the stochastic setting, where each agent evolves independently according to a probabilistic distribution, i.e. the finite state system modelling an agent is a Markov decision process. The control problem becomes whether for every n ∈ N, there exists a controller able to bring all n agents synchronously to a target state with probability one.

The authors are committed to making professional choices acknowledging the climate emergency. We submitted this work to FoSSaCS for its excellence and because its location induces for us a low carbon footprint.
This work was supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 670624), and by the DeLTA ANR project (ANR-16-CE40-0007).

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 119–135, 2020. https://doi.org/10.1007/978-3-030-45231-5_7

Our main technical result is that the stochastic control problem is decidable. In the next paragraphs we discuss four motivations for studying this problem: control of biological systems, parameterised verification and control, distributed computing, and automata theory.

Modelling biological systems. The original motivation for studying this model was the control of populations of yeasts ([21]). In this application, the concentration of some molecule is monitored through fluorescence level. Controlling the frequency and duration of injections of a sorbitol solution influences the concentration of the target molecule, triggering different chemical reactions which can be modelled by a finite state system. The objective is to control the population to reach a predetermined fluorescence state. As discussed in the conclusions of [3,4], the stochastic semantics is more satisfactory than the adversarial one for representing the behaviours of the chemical reactions, so our decidability result is a step towards a better understanding of the modelling of biological systems as populations of arbitrarily many agents represented by finite state systems.

From parameterised verification to parameterised control. Parameterised verification was introduced in [12]: it is the verification of a system composed of an arbitrary number of identical components. The control problem we study here, introduced in [3,4], is the first step towards parameterised control: the goal is to control a system composed of many identical components in order to ensure a given property.
To the best of our knowledge, the contributions of [3,4] are the first results on parameterised control; by extension, we present the first results on parameterised control in a stochastic setting.

Distributed computing. Our model resembles two models introduced for the study of distributed computing. The first and most widely studied is population protocols, introduced in [2]: the agents are modelled by finite state systems and interact in pairs drawn at random. The mode of interaction is the key difference with the model we study here: in a time step, all of our agents perform simultaneously and independently the same action. This brings us closer to broadcast protocols as studied for instance in [8], in which one action involves an arbitrary number of agents. As explained in [3,4], our model can be seen as a subclass of (stochastic) broadcast protocols, but key differences exist in the semantics, making the two bodies of work technically independent. The focus of the distributed computing community when studying population or broadcast protocols is to construct the most efficient protocols for a given task, such as (prominently) electing a leader. A growing literature from the verification community focusses on checking the correctness of a given protocol against a given specification; we refer to the recent survey [7] for an overview. We concentrate on the control problem, which can then be seen as a first result in the control of distributed systems in a stochastic setting.

Alternative semantics for probabilistic automata. It is very tempting to consider the limit case of infinitely many agents: the parameterised control question becomes the value 1 problem for probabilistic automata, which was proved undecidable in [13], and even in very restricted cases ([10]). Hence abstracting continuous distributions by a discrete population of arbitrary size can be seen as an approximation technique for probabilistic automata.
Using n agents corresponds to using numerical approximation up to 2^{-n} with random rounding; in this sense the control problem considers arbitrarily fine approximations. The plague of undecidability results on probabilistic automata (see e.g. [9]) is nicely contrasted by our positive result, which is one of the few decidability results on probabilistic automata not making structural assumptions on the underlying graph.

Our results. We prove decidability of the stochastic control problem. The first insight is given by the theory of well quasi orders, which motivates the introduction of a new problem called the sequential flow problem. The first step of our solution is to reduce the stochastic control problem to (many instances of) the sequential flow problem. The second insight comes from the theory of regular cost functions, providing us with a set of tools for addressing the key difficulty of the problem, namely the fact that there are arbitrarily many agents. Our key technical contribution is to show the computability of the sequential flow problem by reducing it to a boundedness question expressed in cost monadic second order logic, using the max-flow min-cut theorem.

Related work. The notion of decisive Markov chains was introduced in [1] as a unifying property for studying infinite-state Markov chains with finite-like properties. A typical example of decisive Markov chains is lossy channel systems, where tokens can be lost at any time, inducing monotonicity properties. Our situation is the exact opposite, as we are considering (using the Petri net terminology) safe Petri nets, where the number of tokens along a run is constant. So it is not clear whether the underlying arguments in the two settings can be unified using decisiveness.

Organisation of the paper. We define the stochastic control problem in Section 2, and the sequential flow problem in Section 3.
We construct a reduction from the former to (many instances of) the latter in Section 4, and show the decidability of the sequential flow problem in Section 5.

2 The stochastic control problem

Definition 1. A Markov decision process (MDP for short) consists of
– a finite set of states Q,
– a finite set of actions A,
– a stochastic transition table ρ : Q × A → D(Q).

The interpretation of the transition table is that from the state p under action a, the probability to transition to q is ρ(p, a)(q). The transition relation Δ is defined by Δ = {(p, a, q) ∈ Q × A × Q : ρ(p, a)(q) > 0}. We also use Δ_a given by {(p, q) ∈ Q × Q : (p, a, q) ∈ Δ}.

We refer to [17] for the usual notions related to MDPs; it turns out that very little probability theory will be needed in this paper, so we restrict ourselves to mentioning only the relevant objects. In an MDP M, a strategy is a function σ : Q → A; note that we consider only pure and positional strategies, as they will be sufficient for our purposes. Given a source s ∈ Q and a target t ∈ Q, we say that the strategy σ almost surely reaches t if the probability that a path starting from s and consistent with σ eventually leads to t is 1. As we shall recall in Section 4, whether there exists a strategy ensuring to reach t almost surely from s, called the almost sure reachability problem for MDPs, can be reduced to solving a two player Büchi game, and in particular does not depend upon the exact probabilities. In other words, the only relevant information for each (p, a, q) ∈ Q × A × Q is whether ρ(p, a)(q) > 0 or not. Since the same will be true for the stochastic control problem we study in this paper, in our examples we do not specify the exact probabilities, and an edge from p to q labelled a means that ρ(p, a)(q) > 0.

Let us now fix an MDP M and consider a population of n tokens (we use tokens to represent the agents). Each token evolves in an independent copy of the MDP M.
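Definition 1 translates directly into a small data structure. The sketch below is a hypothetical illustration (state names and probabilities are invented, and only the support of ρ matters for the problems studied here):

```python
# Minimal sketch of Definition 1. Only the support of rho matters for the
# decision problems studied in the paper, so the probabilities are arbitrary.
class MDP:
    def __init__(self, states, actions, rho):
        # rho maps (state, action) to a distribution over states,
        # encoded as a dict {successor: probability}.
        self.states, self.actions, self.rho = states, actions, rho

    def delta(self):
        # Transition relation: triples (p, a, q) with rho(p, a)(q) > 0.
        return {(p, a, q)
                for (p, a), dist in self.rho.items()
                for q, pr in dist.items() if pr > 0}

    def delta_a(self, a):
        # Projection for a fixed action a: pairs (p, q) with (p, a, q) in Delta.
        return {(p, q) for (p, b, q) in self.delta() if b == a}

# Hypothetical 3-state example in the spirit of Figure 1.
m = MDP({"s", "q", "t"}, {"a", "b"},
        {("s", "a"): {"s": 0.5, "q": 0.5},
         ("q", "a"): {"q": 1.0},
         ("q", "b"): {"t": 0.5, "s": 0.5},
         ("t", "a"): {"t": 1.0}, ("t", "b"): {"t": 1.0}})
```

Since only supports matter, the dictionaries could equally well store booleans; probabilities are kept here to mirror the definition.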
The controller acts through a strategy σ : Q^n → A, meaning that given the state each of the n tokens is in, the controller chooses one action to be performed by all tokens independently. Formally, we are considering the product MDP M^n whose set of states is Q^n, set of actions is A, and transition table is ρ^n(u, a)(v) = ∏_{i=1}^{n} ρ(u_i, a)(v_i), where u, v ∈ Q^n and u_i, v_i are the i-th components of u and v.

Let s, t ∈ Q be the source and target states; we write s^n and t^n for the constant n-tuples where all components are s and t. For a fixed value of n, whether there exists a strategy ensuring to reach t^n almost surely from s^n can be reduced to solving a two player Büchi game in the same way as above for a single MDP, replacing M by M^n. The stochastic control problem asks whether this is true for arbitrary values of n:

Problem 1 (Stochastic control problem). The inputs are an MDP M, a source state s ∈ Q and a target state t ∈ Q. The question is whether for all n ∈ N, there exists a strategy ensuring to reach t^n almost surely from s^n.

Our main result is the following.

Theorem 1. The stochastic control problem is decidable.

The fact that the problem is co-recursively enumerable is easy to see: if the answer is "no", there exists n ∈ N such that there exists no strategy ensuring to reach t^n almost surely from s^n. Enumerating the values of n and solving the almost sure reachability problem for M^n eventually finds this out. However, it is not clear whether one can place an upper bound on such a witness n, which would yield a simple (yet inefficient!) algorithm. As a corollary of our analysis we can indeed derive such an upper bound, but it is non elementary in the size of the MDP.

In the remainder of this section we present a few interesting examples.

Example 1 Let us consider the MDP represented in Figure 1. We show that for this MDP, for any n ∈ N, the controller has an almost sure strategy to reach t^n from s^n.
Starting with n tokens on s, we iterate the following strategy:
– Repeatedly play action a until all tokens are in q;
– Play action b.

The first step is eventually successful with probability one, since at each iteration there is a positive probability that the number of tokens in state q increases. In the second step, with non zero probability at least one token goes to t, while the rest go back to s. It follows that each iteration of this strategy increases with non zero probability the number of tokens in t. Hence, all tokens are eventually transferred to t almost surely.

Fig. 1. The controller can almost surely reach t^n from s^n, for any n ∈ N.

Example 2 We now consider the MDP represented in Figure 2. By convention, if from a state some action does not have any outgoing transition (for instance the action u from s), then it goes to the sink state ⊥. We show that there exists a controller ensuring to transfer seven tokens from s to t, but that the same does not hold for eight tokens. For the first assertion, we present the following strategy:
– Play a. One of the states q_u^1, q_d^1 receives at least 4 tokens.
– Play the corresponding i ∈ {u, d}. At least 4 tokens go to t while at most 3 go to q_1.
– Play a. One of the states q_u^2, q_d^2 receives at least 2 tokens.
– Play the corresponding i ∈ {u, d}. At least 2 tokens go to t while at most 1 token goes to q_2.
– Play a. The token (if any) goes to q_u^3 or q_d^3.
– Play the corresponding i ∈ {u, d}. The remaining token (if any) goes to t.

Now assume that there are 8 tokens or more on s. The only choices for a strategy are to play u or d on the second, fourth, and sixth move. First, with non zero probability at least 4 tokens are in each of q_u^1 and q_d^1. Then, whatever the choice of action i ∈ {u, d}, there are at least 4 tokens in q_1 after the next step. Proceeding likewise, there are at least 2 tokens in q_2 with non zero probability two steps later.
Then again two steps later, at least 1 token falls in the sink with non zero probability.

Fig. 2. The controller can synchronise up to 7 tokens on the target state t almost surely, but not more.

Generalising this example shows that if the answer to the stochastic control problem is "no", the smallest number of tokens n for which there exists no almost sure strategy for reaching t^n from s^n may be exponential in |Q|. This can be further extended to show a doubly exponential (in |Q|) lower bound, as done in [3,4]; the example produced there holds for both the adversarial and the stochastic setting. Interestingly, for the adversarial setting this doubly exponential lower bound is tight. Our proof for the stochastic setting yields a non-elementary bound, leaving a very large gap.

Example 3 We consider the MDP represented in Figure 3. For any n ∈ N, there exists a strategy almost surely reaching t^n from s^n. However, this strategy has to pass tokens one by one through q. We iterate the following strategy:
– Repeatedly play action a until exactly 1 token is in q.
– Play action b. The token goes to q_i for some i ∈ {l, r}.
– Play action i ∈ {l, r}, which moves the token to t.

Note that the first step may take a very long time (the expectation of the number of a's to be played until this happens is exponential in the number of tokens), but it is eventually successful with probability one. This very slow strategy is necessary: if q contains at least two tokens, then action b should not be played: with non zero probability, at least one token ends up in each of q_l and q_r, so at the next step some token ends up in ⊥. It follows that any strategy almost surely reaching t has to be able to detect the presence of at most 1 token in q. This is a key example for understanding the difficulty of the stochastic control problem.

Fig. 3. The controller can synchronise any number of tokens almost surely on the target state t, but they have to go one by one.
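As a sanity check, the strategy of Example 1 can be simulated. Figure 1 fixes only the supports of the distributions, so the uniform probabilities used in this sketch are an assumption:

```python
import random

# Monte Carlo check of the strategy from Example 1. The figure only fixes
# which transitions have positive probability, so uniform choices below
# are an assumption made for the simulation.
def step(state, action):
    if state == "t":                                   # t is absorbing
        return "t"
    if action == "a":
        return random.choice(["s", "q"]) if state == "s" else "q"
    return random.choice(["t", "s"]) if state == "q" else state  # action b

def run(n, max_rounds=10_000):
    tokens = ["s"] * n
    for _ in range(max_rounds):
        if all(tok == "t" for tok in tokens):
            return True
        # play a while some token is still in s, otherwise play b
        action = "a" if any(tok == "s" for tok in tokens) else "b"
        tokens = [step(tok, action) for tok in tokens]
    return False

random.seed(0)
assert run(5)   # all 5 tokens eventually synchronise on t
```

Each round makes progress with positive probability, so the loop terminates almost surely, matching the argument in Example 1.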
3 The sequential flow problem

We let Q be a finite set of states. We call configuration an element of N^Q, and flow an element f ∈ N^{Q×Q}. A flow f induces two configurations pre(f) and post(f) defined by pre(f)(p) = Σ_{q∈Q} f(p, q) and post(f)(q) = Σ_{p∈Q} f(p, q). Given c, c' two configurations and f a flow, we say that c goes to c' using f, and write c →_f c', if c = pre(f) and c' = post(f).

A flow word is f = f_1 ... f_l where each f_i is a flow. We write c ⇝_f c' if there exists a sequence of configurations c = c_0, c_1, ..., c_l = c' such that c_{i-1} →_{f_i} c_i for all i ∈ {1, ..., l}. In this case, we say that c goes to c' using the flow word f.

We now recall some classical definitions related to well quasi orders ([15,16], see [19] for an exposition of recent results). Let (E, ≤) be a quasi ordered set (i.e. ≤ is reflexive and transitive); it is a well quasi ordered set (WQO) if any infinite sequence contains an increasing pair. We say that S ⊆ E is downward closed if for any x ∈ S, if y ≤ x then y ∈ S. An ideal is a non-empty downward closed set I ⊆ E such that for all x, y ∈ I, there exists some z ∈ I satisfying both x ≤ z and y ≤ z.

Lemma 1.
– Any infinite sequence of decreasing downward closed sets in a WQO is eventually constant.
– A subset is downward closed if and only if it is a finite union of incomparable ideals. We call it its decomposition into ideals (or simply, its decomposition), which is unique (up to permutation).
– An ideal is included in a downward closed set if and only if it is included in one of the ideals of its decomposition.

We equip the set of configurations N^Q and the set of flows N^{Q×Q} with the quasi order ≤ defined componentwise, yielding, thanks to Dickson's Lemma [6], two WQOs.

Lemma 2. Let X be a finite set. A subset of N^X is an ideal if and only if it is of the form a↓ = {c ∈ N^X | c ≤ a}, for some a ∈ (N ∪ {ω})^X (in which ω is larger than all integers).
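Lemma 2 suggests a concrete machine representation: an ideal of N^X is just a vector over N ∪ {ω}, and inclusion tests reduce to componentwise comparison. A minimal sketch (encoding ω as float('inf') is an implementation choice, and the component names are invented):

```python
OMEGA = float("inf")  # stands for omega: larger than every integer

def leq(c, a):
    # componentwise order on N^X, extended so that n <= omega for all n
    return all(c[x] <= a[x] for x in a)

def ideal_contains(a, c):
    # c belongs to the ideal a↓ = {c | c <= a}
    return leq(c, a)

def ideal_included(a, b):
    # a↓ ⊆ b↓ holds iff a <= b componentwise
    return leq(a, b)

def in_downward_closed(decomposition, c):
    # a downward closed set is a finite union of ideals (Lemma 1)
    return any(ideal_contains(a, c) for a in decomposition)

a = {"p": OMEGA, "q": 2}
assert ideal_contains(a, {"p": 100, "q": 1})
assert not ideal_contains(a, {"p": 0, "q": 3})
```

The same representation works for flows by indexing vectors with pairs of states instead of states.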
We represent downward closed sets of configurations and flows by their decomposition into finitely many ideals of the form a↓ for a ∈ (N ∪ {ω})^Q or a ∈ (N ∪ {ω})^{Q×Q}.

Problem 2 (Sequential flow problem). Let Q be a finite set of states. Given a downward closed set of flows Flows ⊆ N^{Q×Q} and a downward closed set of final configurations F ⊆ N^Q, compute the downward closed set

Pre*(Flows, F) = {c ∈ N^Q | c ⇝_f c' ∈ F, f ∈ Flows*},

i.e. the configurations from which one may reach F using only flows from Flows.

4 Reduction of the stochastic control problem to the sequential flow problem

Let us consider an MDP M and a target t ∈ Q. We first recall a folklore result reducing the almost sure reachability question for MDPs to solving a two player Büchi game (we refer to [14] for the definitions and notations of Büchi games). The Büchi game is played between Eve and Adam as follows. From a state p:
1. Eve chooses an action a and a transition (p, q) ∈ Δ_a;
2. Adam can either choose to agree, and the game continues from q, or interrupt and choose another transition (p, q') ∈ Δ_a, and the game continues from q'.

The Büchi objective is satisfied (meaning Eve wins) if either the target state t is reached or Adam interrupts infinitely many times.

Lemma 3. There exists a strategy ensuring almost surely to reach t from s if and only if Eve has a winning strategy from s in the above Büchi game.

We now explain how this reduction can be extended to the stochastic control problem. Let us consider an MDP M and a target t ∈ Q. We now define an infinite Büchi game G_M. The set of vertices is the set of configurations N^Q. For a flow f, we write supp(f) = {(p, q) ∈ Q × Q : f(p, q) > 0}. The game is played as follows from a configuration c:
1. Eve chooses an action a and a flow f such that pre(f) = c and supp(f) ⊆ Δ_a;
2.
Adam can either choose to agree, and the game continues from c' = post(f), or interrupt and choose a flow f' such that pre(f') = c and supp(f') ⊆ Δ_a, and the game continues from c' = post(f').

Note that Eve choosing a flow f is equivalent to choosing for each token a transition (p, q) ∈ Δ_a, inducing the configuration c', and similarly for Adam should he decide to interrupt. Eve wins if either all tokens are in the target state, or Adam interrupts infinitely many times. Note that although the game is infinite, it is actually a disjoint union of finite games. Indeed, along a play the number of tokens is fixed, so each play is included in Q^n for some n ∈ N.

Lemma 4. Let c be a configuration with n tokens in total; the following are equivalent:
– There exists a strategy almost surely reaching t^n from c,
– Eve has a winning strategy in the Büchi game G_M starting from c.

Lemma 4 follows from applying Lemma 3 to the product MDP M^n.

We also consider the game G_M^(i) for i ∈ N, which is defined just as G_M except for the winning objective: Eve wins in G_M^(i) if either all tokens are in the target state, or Adam interrupts more than i times. It is clear that if Eve has a winning strategy in G_M^(i) then she has a winning strategy in G_M. Conversely, the following result states that G_M is equivalent to G_M^(i) for some i.

Lemma 5. There exists i ∈ N such that from any configuration c ∈ N^Q, Eve has a winning strategy in G_M if and only if Eve has a winning strategy in G_M^(i).

Proof: Let X^(i) ⊆ N^Q be the winning region for Eve in G_M^(i). We first argue that X = ∩_i X^(i) is the winning region in G_M. It is clear that X is contained in the winning region: if Eve has a strategy to ensure that either all tokens are in the target state or that Adam interrupts infinitely many times, then in particular this is true for Adam interrupting more than i times, for any i. The converse inclusion holds because G_M is a disjoint union of finite Büchi games.
Indeed, in a finite Büchi game, since Adam can restrict himself to playing a memoryless winning strategy, if Eve can ensure that he interrupts a certain number of times (larger than the size of the game), then by a simple pumping argument this implies that Adam will interrupt infinitely many times.

To conclude, we note that each X^(i) is downward closed: indeed, a winning strategy from a configuration c can be used from a configuration c' where there are fewer tokens in each state. It follows that (X^(i))_{i≥0} is a decreasing sequence of downward closed sets in N^Q, hence it stabilises thanks to Lemma 1, i.e. there exists i_0 ∈ N such that X^(i_0) = X^(i) for all i ≥ i_0, which concludes.

Note that Lemma 4 and Lemma 5 substantiate the claims made in Section 2: pure positional strategies are enough, and the answer to the stochastic control problem does not depend upon the exact probabilities in the MDP. Indeed, the constructions of the Büchi games do not depend on them, and the answer to the former is equivalent to determining whether Eve has a winning strategy in each of them.

We are now fully equipped to show that a solution to the sequential flow problem yields the decidability of the stochastic control problem. Let F be the set of configurations for which all tokens are in state t. We let X^(i) ⊆ N^Q denote the winning region for Eve in the game G_M^(i). Note first that X^(0) = Pre*(Flows^0, F) where

Flows^0 = {f ∈ N^{Q×Q} : ∃a ∈ A, supp(f) ⊆ Δ_a}.

Indeed, in the game G_M^(0) Adam cannot interrupt, as this would make him lose immediately. Hence, the winning region for Eve in G_M^(0) is Pre*(Flows^0, F). We generalise this by setting Flows^i, for all i > 0, to be the set of flows f ∈ N^{Q×Q} such that for some action a ∈ A,
– supp(f) ⊆ Δ_a, and
– for all f' with pre(f') = pre(f) and supp(f') ⊆ Δ_a, we have post(f') ∈ X^(i−1).

Equivalently, this is the set of flows for which, when played in the game G_M by Eve, Adam cannot use an interrupt move to force the configuration outside of X^(i−1).
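Although the sequential flow problem concerns arbitrarily many tokens, for a fixed population size n the reachable configurations form a finite space, and Pre*(Flows, F) restricted to n-token configurations can be computed by a naive backward fixpoint. The brute-force sketch below (practical only for tiny Q and n; the toy capacities at the end are invented) illustrates the definition:

```python
from itertools import combinations_with_replacement

OMEGA = float("inf")

def pre_post(flow):
    # pre(f)(p) = sum_q f(p, q) and post(f)(q) = sum_p f(p, q)
    pre, post = {}, {}
    for (p, q), v in flow.items():
        pre[p] = pre.get(p, 0) + v
        post[q] = post.get(q, 0) + v
    return pre, post

def flows_below(cap, n):
    # all flows with exactly n tokens respecting the capacity `cap`
    edges = list(cap)
    for combo in combinations_with_replacement(edges, n):
        f = {}
        for e in combo:
            f[e] = f.get(e, 0) + 1
        if all(v <= cap[e] for e, v in f.items()):
            yield f

def pre_star(capacities, final, states, n):
    # backward fixpoint over n-token configurations, encoded as sorted
    # tuples of (state, positive count)
    key = lambda c: tuple(sorted((s, c[s]) for s in states if c.get(s, 0)))
    reach = {key(c) for c in final}
    changed = True
    while changed:
        changed = False
        for cap in capacities:
            for f in flows_below(cap, n):
                pre, post = pre_post(f)
                if key(post) in reach and key(pre) not in reach:
                    reach.add(key(pre))
                    changed = True
    return reach

# Toy instance: tokens may move from p to q, or stay in q.
caps = [{("p", "q"): OMEGA, ("q", "q"): OMEGA}]
reachable = pre_star(caps, [{"q": 2}], ["p", "q"], 2)
assert (("p", 2),) in reachable   # from two tokens on p, F is reachable
```

Flow enumeration is exponential in n, which is exactly why the paper needs the cost-function machinery rather than this finite-case computation.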
We now claim that X^(i) = Pre*(Flows^i, F) for all i ≥ 0.

We note that this means that computing each X^(i) reduces to solving one instance of the sequential flow problem. This induces an algorithm for solving the stochastic control problem: compute the sequence (X^(i))_{i≥0} until it stabilises, which is ensured by Lemma 5 and yields the winning region of G_M. The answer to the stochastic control problem is then whether the initial configuration where all tokens are in s belongs to the winning region of G_M.

Let us prove the claim by induction on i. Let c be a configuration in Pre*(Flows^i, F). This means that there exists a flow word f = f_1 ... f_l such that f_k ∈ Flows^i for all k, and c ⇝_f c' ∈ F. Expanding the definition, there exist c_0 = c, ..., c_l = c' such that c_{k−1} →_{f_k} c_k for all k.

Let us now describe a strategy for Eve in G_M^(i) starting from c. As long as Adam agrees, Eve successively chooses the sequence of flows f_1, f_2, ... and the corresponding configurations c_1, c_2, .... If Adam never interrupts, then the game reaches the configuration c' ∈ F, and Eve wins. Otherwise, as soon as Adam interrupts, by definition of Flows^i, we reach a configuration d ∈ X^(i−1). By induction hypothesis, Eve has a strategy which ensures from d to either reach F or that Adam interrupts at least i − 1 times. In the latter case, adding the interrupt move leading to d yields i interrupts, so this is a winning strategy for Eve in G_M^(i), witnessing that c ∈ X^(i).

Conversely, assume that there is a winning strategy σ of Eve in G_M^(i) from a configuration c. Consider a play consistent with σ; it either reaches F or Adam interrupts. Let us denote by f = f_1, f_2, ..., f_l the sequence of flows until then. We argue that f_k ∈ Flows^i for k ∈ {1, ..., l}. Let f = f_k for some k; by definition of the game, supp(f) ⊆ Δ_a for some action a. Let f' be such that pre(f') = pre(f) and supp(f') ⊆ Δ_a.
In the game G_M^(i), after Eve played f_k, Adam has the possibility to interrupt and choose f'. From this configuration onward the strategy σ is winning in G_M^(i−1), implying that f ∈ Flows^i. Thus the flow word f = f_1 f_2 ... f_l is a witness that c ∈ X^(i).

5 Computability of the sequential flow problem

Let Q be a finite set of states, Flows ⊆ N^{Q×Q} a downward closed set of flows and F ⊆ N^Q a downward closed set of configurations; the sequential flow problem is to compute the downward closed set Pre* defined by

Pre*(Flows, F) = {c ∈ N^Q | c ⇝_f c' ∈ F, f ∈ Flows*},

i.e. the configurations from which one may reach F using only flows from Flows. The following classical result of [22] allows us to further reduce our problem.

Lemma 6. The task of computing a downward closed set can be reduced to the task of deciding whether a given ideal is included in a downward closed set.

Thanks to Lemma 6, it is sufficient for solving the sequential flow problem to establish the following result.

Lemma 7. Let I be an ideal of the form a↓ for a ∈ (N ∪ {ω})^Q, and Flows ⊆ N^{Q×Q} be a downward closed set of flows. It is decidable whether F can be reached from all configurations of I using only flows from Flows.

We call a vector a ∈ (N ∪ {ω})^{Q×Q} a capacity. A capacity word is a finite sequence of capacities. For two capacity words w, w' of the same length, we write w ≤ w' to mean that w_i ≤ w'_i for each i. Since flows are particular cases of capacities, we can compare flows with capacities in the same way.

Before proving Lemma 7, let us give an example and some notations. Given a state q, we write q ∈ N^Q for the vector which has value 1 on the q component and 0 elsewhere. More generally, we let αq for α ∈ N ∪ {ω} denote the vector with value α on the q component and 0 elsewhere. We use similar notations for flows. For instance, ωq_1 + q_2 has value ω in the q_1 component, 1 in the q_2 component, and 0 elsewhere.
In the instance of the sequential flow problem represented in Figure 4, we ask the following question: can F be reached from any configuration of I = (ωq_2)↓? The answer is yes: the capacity word w = (ac^{n−1}b)^n is such that nq_2 ⇝_f nq_4 ∈ F for a flow word f ≤ w, the beginning of which is described in Figure 5.

Fig. 4. An instance of the sequential flow problem. We let Flows = a↓ ∪ b↓ ∪ c↓ where a = ω(q_2, q_2) + (q_2, q_3) + ω(q_4, q_4), b = ω(q_1, q_2) + (q_3, q_4) + ω(q_4, q_4), and c = ω(q_1, q_1) + (q_2, q_1) + ω(q_2, q_2) + ω(q_3, q_3) + ω(q_4, q_4). Set also F = (ωq_4)↓.

Fig. 5. A flow word f = f_1 f_2 ... f_{n+1} ≤ ac^{n−1}b such that nq_2 goes to (n − 1)q_2 + q_4 using f. This construction can be extended to f ≤ w such that nq_2 goes to nq_4 using f.

We write a[ω ← n] for the configuration obtained from a by replacing all ωs by n.
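The example can be replayed mechanically. The sketch below pushes n tokens from q_2 to q_4 through the capacity word (ac^{n−1}b)^n, with the capacities transcribed from Figure 4 (the transcription, and hence the exact entries, are reconstructed assumptions):

```python
OMEGA = float("inf")

# Capacities from Figure 4, as dicts over edges (q_i, q_j); entries are
# a reconstruction of the figure.
A = {(2, 2): OMEGA, (2, 3): 1, (4, 4): OMEGA}
B = {(1, 2): OMEGA, (3, 4): 1, (4, 4): OMEGA}
C = {(1, 1): OMEGA, (2, 1): 1, (2, 2): OMEGA, (3, 3): OMEGA, (4, 4): OMEGA}

def apply_flow(config, flow, cap):
    # check the flow respects the capacity and moves exactly the tokens present
    assert all(v <= cap.get(e, 0) for e, v in flow.items()), "flow exceeds capacity"
    pre, post = {}, {}
    for (p, q), v in flow.items():
        if v:
            pre[p] = pre.get(p, 0) + v
            post[q] = post.get(q, 0) + v
    assert pre == {s: n for s, n in config.items() if n}, "pre(f) != configuration"
    return post

def round_of_word(n, config):
    # one factor a c^(n-1) b of the capacity word w = (a c^(n-1) b)^n
    k = config.get(2, 0)          # tokens currently in q2
    done = config.get(4, 0)       # tokens already parked in q4
    config = apply_flow(config, {(2, 2): k - 1, (2, 3): 1, (4, 4): done}, A)
    for i in range(n - 1):        # shift the q2 tokens to q1 one at a time
        config = apply_flow(config,
                            {(2, 1): 1 if i < k - 1 else 0,
                             (1, 1): min(i, k - 1),
                             (2, 2): max(k - 2 - i, 0),
                             (3, 3): 1, (4, 4): done}, C)
    return apply_flow(config, {(1, 2): k - 1, (3, 4): 1, (4, 4): done}, B)

n = 4
cfg = {2: n}
for _ in range(n):
    cfg = round_of_word(n, cfg)
assert cfg == {4: n}   # n tokens on q2 reach n tokens on q4
```

Each round transfers exactly one token to q_4, mirroring the one-by-one behaviour of Figure 5.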
This ensures that the larger N is, the more true the formula: if w, n |= ϕ, then w, n' |= ϕ for all n' ≥ n. The semantics of a formula ϕ of cost MSO induces a function A* → N ∪ {∞} defined by ϕ(w) = inf { n ∈ N | w, n |= ϕ }.

The boundedness problem for cost monadic logic is the following problem: given a cost MSO formula ϕ over A*, is it true that the induced function A* → N ∪ {∞} is bounded, i.e. ∃n ∈ N, ∀w ∈ A*, w, n |= ϕ? The decidability of the boundedness problem is a central result in the theory of regular cost functions [5].

Since in the theory of regular cost functions we are only interested in whether functions are bounded or not, we consider functions "up to boundedness properties". Concretely, this means that a cost function is an equivalence class of functions A* → N ∪ {∞}, with the equivalence being f ≈ g if there exists α : N → N such that f(w) is finite if and only if g(w) is finite, and in this case f(w) ≤ α(g(w)) and g(w) ≤ α(f(w)). This is equivalent to stating that for all X ⊆ A*, f is bounded over X if and only if g is bounded over X.

Let us now establish Lemma 7.

Proof: Let T = { q ∈ Q | a(q) = ω }. Note that for n sufficiently large, we have a[ω ← n]↓ = I ∩ {0, 1, …, n}^Q. We let C ⊆ (N ∪ {ω})^(Q×Q) be the decomposition of Flows into ideals, that is, C is the minimal finite set such that Flows = ⋃_{b ∈ C} b↓. We let k denote the largest finite value that appears in the definition of C, that is, k = max { b(q, q') : b ∈ C, q, q' ∈ Q, b(q, q') ≠ ω }.

Let us define the function

Φ : C* → N ∪ {ω},  w ↦ sup { n ∈ N : ∃f ≤ w, a[ω ← n] →^f F }.

By definition, Φ is unbounded if and only if F can be reached from all configurations of I. Since boundedness of cost MSO is decidable, it suffices to construct a formula in cost monadic logic for Φ to obtain the decidability of our problem.
Our approach will be to additively decompose the capacity word w into a finitary part w^(fin) (which is handled using a regular language) and several unbounded parts w^(s), one for each s ∈ T. The unbounded parts require a more careful analysis, which notably goes through the use of the max-flow min-cut theorem.

Note that a[ω ← n] decomposes as the sum of its finite part a_fin = a[ω ← 0] and Σ_{s∈T} ns. Since flows are additive, f ≤ w = w_1 … w_l is a flow from c to F if and only if the capacity word w may be decomposed into (w^(s))_{s∈T} = (w_1^(s) … w_l^(s))_{s∈T} and w^(fin) = w_1^(fin) … w_l^(fin) such that

– all the numbers appearing in the w^(s) capacities are bounded by k,
– for all i ∈ {1, …, l}, w_i = Σ_{s∈T∪{fin}} w_i^(s),
– for all s ∈ T, ns →^f F for some flow word f ≤ w^(s),
– and a_fin →^f F for some flow word f ≤ w^(fin).

In order to encode such capacity words in cost MSO we use monadic variables W^(s)_{q,q',p} where q, q' ∈ Q, p ∈ {0, …, k, ω} and s ∈ T ∪ {fin}. They are meant to satisfy that i ∈ W^(s)_{q,q',p} if and only if w_i^(s)(q, q') = p. We use bold W to denote the tuple (W^(s)_{q,q',p})_{q,q',p,s}, and W^(s) for (W^(s)_{q,q',p})_{q,q',p} when s ∈ T ∪ {fin} is fixed. The MSO formula IsDecomp(W, w) states that a decomposition (w^(s))_{s∈T∪{fin}} is semantically valid and sums to w: for every position i and every pair (q, q'), each s ∈ T ∪ {fin} has exactly one p with i ∈ W^(s)_{q,q',p}, and w_i(q, q') is the sum over s ∈ T ∪ {fin} of these values p.

For s ∈ T, we now consider the function

Ψ^(s) : ({0, 1, …, k, ω}^(Q×Q))* → N ∪ {ω},  w^(s) ↦ sup { n ∈ N | ∃f ≤ w^(s), ns →^f F }.

We also define Ψ^(fin) ⊆ ({0, …, k, ω}^(Q×Q))* to be the language of capacity words w^(fin) such that there exists a flow f ≤ w^(fin) with a_fin →^f F. Note that Ψ^(fin) is a regular language, since it is recognized by a finite automaton over {0, 1, …, k|Q|} that may update the current bounded configuration only with flows smaller than the current letter of w^(fin). We have

Φ(w) = sup { n | ∃W, IsDecomp(W, w) ∧ ⋀_{s∈T} Ψ^(s)(W^(s)) ≥ n ∧ W^(fin) ∈ Ψ^(fin) }.
Hence, it is sufficient to prove that for each s ∈ T, Ψ^(s) is definable in cost MSO.

Let us fix s and a capacity word w ∈ ({0, …, k, ω}^(Q×Q))* of length |w| = ℓ. Consider the finite graph G with vertex set Q × {0, 1, …, ℓ} and, for all i ≥ 1, an edge from (q, i−1) to (q', i) labelled by w_i(q, q'). Then Ψ^(s)(w) is the maximal flow from (s, 0) to (t, ℓ) in G.

We recall that a cut in a graph with distinguished source s and target t is a set of edges such that removing them disconnects s and t. The cost of a cut is the sum of the weights of its edges. The max-flow min-cut theorem states that the maximal flow in a graph is exactly the minimal cost of a cut [11].

We now define a cost MSO formula Ψ̃^(s) which is equivalent (in terms of cost functions) to the minimal cost of a cut in the previous graph G, and thus to Ψ^(s). In the following formula, X = (X_{q,q'})_{q,q'∈Q} represents a cut in the graph: i ∈ X_{q,q'} means that edge ((q, i−1), (q', i)) belongs to the cut. Likewise, P = (P_{q,q'})_{q,q'∈Q} represents paths in the graph. Let Ψ̃^(s)(w) be defined by

inf { n | ∃X, ⋀_{q,q'} ( n ≥ |X_{q,q'}| ∧ ∀i, i ∈ X_{q,q'} ⟹ w_i(q, q') < ω ) ∧ Disc_{s,t}(X, w) },

where Disc_{s,t}(X, w) expresses that X disconnects (s, 0) and (t, ℓ) in G: every path P that follows only edges of positive capacity, starts in (s, 0), ends in (t, ℓ), and is connected from one layer to the next, must contain an edge of the cut X.

Now Ψ̃^(s)(w) does not exactly define the minimal total weight Φ^(s)(w) of a cut, but rather the minimal value over all cuts of the maximum over (q, q') ∈ Q² of how many edges are of the form ((q, i−1), (q', i)). This is good enough for our purposes, since these two values are related by

Ψ̃^(s)(w) ≤ Φ^(s)(w) ≤ k |Q|² Ψ̃^(s)(w),

implying that the functions Ψ̃^(s) and Φ^(s) define the same cost function. In particular, Φ^(s) is definable in cost MSO.

6 Conclusions

We showed the decidability of the stochastic control problem.
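The reduction of Ψ^(s) to a maximal-flow computation can be made concrete. The sketch below (encoding and names ours; ω entries may be approximated by the large constant INF) builds the layered graph of a capacity word and computes its maximal flow with a standard BFS augmenting-path algorithm in the spirit of [11].

```python
from collections import deque

INF = 10**9  # stands in for an ω capacity

def max_flow_of_word(word, source, target):
    """Maximal flow from (source, 0) to (target, len(word)) in the layered
    graph with an edge (q, i-1) -> (q', i) of capacity word[i-1][(q, q')].
    word is a list of dicts (q, q') -> capacity."""
    n = len(word)
    cap = {}
    for i, letter in enumerate(word, start=1):
        for (q, r), c in letter.items():
            cap[((q, i - 1), (r, i))] = c

    def bfs():
        parent = {(source, 0): None}
        queue = deque([(source, 0)])
        while queue:
            u = queue.popleft()
            if u == (target, n):
                return parent
            for (a, b), c in list(cap.items()):
                if a == u and c > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        return None

    flow = 0
    while (parent := bfs()) is not None:
        path, v = [], (target, n)
        while parent[v] is not None:        # reconstruct augmenting path
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[e] for e in path)     # bottleneck capacity
        for (a, b) in path:
            cap[(a, b)] -= aug
            cap[(b, a)] = cap.get((b, a), 0) + aug  # residual edge
        flow += aug
    return flow

word = [{("q1", "q1"): 2, ("q1", "q2"): 1},
        {("q1", "q2"): 2, ("q2", "q2"): 1}]
print(max_flow_of_word(word, "q1", "q2"))  # 3
```

By max-flow min-cut, the returned value also equals the minimal cost of a cut, which is the quantity the formula Ψ̃^(s) approximates.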
Our approach uses well quasi orders and the sequential flow problem, which is then solved using the theory of regular cost functions. Together with the original result of [3,4] in the adversarial setting, our result contributes to the theoretical foundations of parameterised control.

We return to the first application of this model, the control of biological systems. As we discussed, the stochastic setting is perhaps more satisfactory than the adversarial one, although, as we saw, very complicated behaviours involving single agents emerge in the stochastic setting, which are arguably not pertinent for modelling biological systems.

We thus pose two open questions. The first is to settle the complexity status of the stochastic control problem. Very recently, [18] proved the EXPTIME-hardness of the problem, which is interesting because the underlying phenomena involved in this hardness result are specific to the stochastic setting (and do not apply to the adversarial setting). Our algorithm does not yield even elementary upper bounds, leaving a very large complexity gap. The second question goes towards more accurately modelling biological systems: can we refine the stochastic control problem by taking into account the synchronising time of the controller, and restrict it to reasonable bounds?

Acknowledgements

We thank Nathalie Bertrand and Blaise Genest for introducing us to this fascinating problem, and for the preliminary discussions at the Simons Institute for the Theory of Computing in Fall 2015.

References

1. Abdulla, P.A., Henda, N.B., Mayr, R.: Decisive Markov chains. Logical Methods in Computer Science 3(4) (2007). https://doi.org/10.2168/LMCS-3(4:7)2007
2. Angluin, D., Aspnes, J., Diamadi, Z., Fischer, M.J., Peralta, R.: Computation in networks of passively mobile finite-state sensors. Distributed Computing 18(4), 235–253 (2006). https://doi.org/10.1007/s00446-005-0138-3
3.
Bertrand, N., Dewaskar, M., Genest, B., Gimbert, H.: Controlling a population. In: CONCUR. pp. 12:1–12:16 (2017). https://doi.org/10.4230/LIPIcs.CONCUR.2017.12
4. Bertrand, N., Dewaskar, M., Genest, B., Gimbert, H., Godbole, A.A.: Controlling a population. Logical Methods in Computer Science 15(3) (2019), https://lmcs.episciences.org/5647
5. Colcombet, T.: Regular cost functions, part I: logic and algebra over words. Logical Methods in Computer Science 9(3) (2013). https://doi.org/10.2168/LMCS-9(3:3)2013
6. Dickson, L.E.: Finiteness of the odd perfect and primitive abundant numbers with n distinct prime factors. American Journal of Mathematics 35(4), 413–422 (1913), http://www.jstor.org/stable/2370405
7. Esparza, J.: Parameterized verification of crowds of anonymous processes. In: Dependable Software Systems Engineering, pp. 59–71. IOS Press (2016). https://doi.org/10.3233/978-1-61499-627-9-59
8. Esparza, J., Finkel, A., Mayr, R.: On the verification of broadcast protocols. In: LICS. pp. 352–359 (1999). https://doi.org/10.1109/LICS.1999.782630
9. Fijalkow, N.: Undecidability results for probabilistic automata. SIGLOG News 4(4), 10–17 (2017), https://dl.acm.org/citation.cfm?id=3157833
10. Fijalkow, N., Gimbert, H., Horn, F., Oualhadj, Y.: Two recursively inseparable problems for probabilistic automata. In: MFCS. pp. 267–278 (2014). https://doi.org/10.1007/978-3-662-44522-8_23
11. Ford, L.R., Fulkerson, D.R.: Maximal flow through a network. Canadian Journal of Mathematics 8, 399–404 (1956). https://doi.org/10.4153/CJM-1956-045-5
12. German, S.M., Sistla, A.P.: Reasoning about systems with many processes. Journal of the ACM 39(3), 675–735 (1992)
13. Gimbert, H., Oualhadj, Y.: Probabilistic automata on finite words: Decidable and undecidable problems. In: ICALP. pp. 527–538 (2010). https://doi.org/10.1007/978-3-642-14162-1_44
14. Grädel, E., Thomas, W., Wilke, T. (eds.): Automata, Logics, and Infinite Games, LNCS, vol. 2500.
Springer (2002)
15. Higman, G.: Ordering by divisibility in abstract algebras. Proceedings of the London Mathematical Society s3-2(1), 326–336 (1952). https://doi.org/10.1112/plms/s3-2.1.326
16. Kruskal, J.B.: The theory of well-quasi-ordering: A frequently discovered concept. J. Comb. Theory, Ser. A 13(3), 297–305 (1972). https://doi.org/10.1016/0097-3165(72)90063-5
17. Kučera, A.: Turn-Based Stochastic Games. Lectures in Game Theory for Computer Scientists, Cambridge University Press (2011)
18. Mascle, C., Shirmohammadi, M., Totzke, P.: Controlling a random population is EXPTIME-hard. CoRR (2019), http://arxiv.org/abs/1909.06420
19. Schmitz, S.: Algorithmic Complexity of Well-Quasi-Orders. Habilitation à diriger des recherches, École normale supérieure Paris-Saclay (Nov 2017), https://tel.archives-ouvertes.fr/tel-01663266
20. Thomas, W.: Languages, automata, and logic. In: Handbook of Formal Language Theory, vol. III, pp. 389–455. Springer (1997)
21. Uhlendorf, J., Miermont, A., Delaveau, T., Charvin, G., Fages, F., Bottani, S., Hersen, P., Batt, G.: In silico control of biomolecular processes. Computational Methods in Synthetic Biology 13, 277–285 (2015)
22. Valk, R., Jantzen, M.: The residue of vector sets with applications to decidability problems in Petri nets. Acta Informatica 21, 643–674 (1985). https://doi.org/10.1007/BF00289715

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material.
If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Decomposing Probabilistic Lambda-Calculi

Ugo Dal Lago¹, Giulio Guerrieri², and Willem Heijltjes²

¹ Dipartimento di Informatica – Scienza e Ingegneria, Università di Bologna, Bologna, Italy, ugo.dallago@unibo.it
² Department of Computer Science, University of Bath, Bath, UK, {w.b.heijltjes,g.guerrieri}@bath.ac.uk

Abstract. A notion of probabilistic lambda-calculus usually comes with a prescribed reduction strategy, typically call-by-name or call-by-value, as the calculus is non-confluent and these strategies yield different results. This is a break with one of the main advantages of lambda-calculus: confluence, which means that results are independent from the choice of strategy. We present a probabilistic lambda-calculus where the probabilistic operator is decomposed into two syntactic constructs: a generator, which represents a probabilistic event; and a consumer, which acts on the term depending on a given event. The resulting calculus, the Probabilistic Event Lambda-Calculus, is confluent, and interprets the call-by-name and call-by-value strategies through different interpretations of the probabilistic operator into our generator and consumer constructs. We present two notions of reduction, one via fine-grained local rewrite steps, and one by generation and consumption of probabilistic events. Simple types for the calculus are essentially standard, and they convey strong normalization. We demonstrate how we can encode call-by-name and call-by-value probabilistic evaluation.
1 Introduction

Probabilistic lambda-calculi [24,22,17,11,18,9,15] extend the standard lambda-calculus with a probabilistic choice operator N ⊕_p M, which chooses N with probability p and M with probability 1−p (throughout this paper, we let p be 1/2 and will omit it). Duplication of N ⊕ M, as is wont to happen in lambda-calculus, raises a fundamental question about its semantics: do the duplicate occurrences represent the same probabilistic event, or different ones with the same probability?

For example, take the term ⊤ ⊕ ⊥ that represents a coin flip between the boolean values true ⊤ and false ⊥. If we duplicate this term, do the copies represent two distinct coin flips with possibly distinct outcomes, or do they represent a single coin flip that determines the outcome for both copies? Put differently again, when we duplicate ⊤ ⊕ ⊥, do we duplicate the event, or only its outcome?

In probabilistic lambda-calculus, these two interpretations are captured by the evaluation strategies of call-by-name (→_cbn), which duplicates events, and call-by-value (→_cbv), which evaluates any probabilistic choice before it is duplicated, and thus only duplicates outcomes. Consider the following example, where = tests equality of boolean values:

(λx. x = x)(⊤ ⊕ ⊥) ↠_cbv ⊤        (λx. x = x)(⊤ ⊕ ⊥) ↠_cbn ⊤ ⊕ ⊥

This situation is not ideal, for several, related reasons. Firstly, it demonstrates how probabilistic lambda-calculus is non-confluent, negating one of the central properties of the lambda-calculus, and one of the main reasons why it is the prominent model of computation that it is. Secondly, it means that a probabilistic lambda-calculus must derive its semantics from a prescribed reduction strategy, and its terms only have meaning in the context of that strategy.

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 136–156, 2020. https://doi.org/10.1007/978-3-030-45231-5_8
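The difference between the two strategies can be made concrete by enumerating coin outcomes instead of sampling. The sketch below (an illustration of ours, not the paper's code) computes the distribution of x = x under each reading of duplication.

```python
from itertools import product
from collections import Counter

# Two readings of (λx. x = x)(⊤ ⊕ ⊥): call-by-name duplicates the
# probabilistic *event* (two independent flips), call-by-value duplicates
# only its *outcome* (one flip, reused). We enumerate all outcomes of
# fair coins rather than sampling.

def cbn_distribution():
    # x is substituted unevaluated: each occurrence flips its own coin
    results = Counter(a == b for a, b in product([True, False], repeat=2))
    return {k: v / 4 for k, v in results.items()}

def cbv_distribution():
    # the argument is evaluated once, then the outcome is duplicated
    results = Counter(a == a for a in [True, False])
    return {k: v / 2 for k, v in results.items()}

print(cbn_distribution())  # {True: 0.5, False: 0.5}
print(cbv_distribution())  # {True: 1.0}
```

The two distributions differ, which is exactly the non-confluence phenomenon discussed above.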
Thirdly, combining different kinds of probabilities becomes highly involved [15], as it would require specialized reduction strategies. These issues present themselves even in a more general setting, namely that of commutative (algebraic) effects, which in general do not commute with copying.

We address these issues by a decomposition of the probabilistic operator into a generator νa and a labeled choice ⊕_a, as follows:

N ⊕ M ≜ νa. (N ⊕_a M)

Semantically, νa represents a probabilistic event that generates a boolean value recorded as a. The choice N ⊕_a M is simply a conditional on a, choosing N if a is false and M if a is true. Syntactically, a is a boolean variable with an occurrence in ⊕_a, and νa acts as a probabilistic quantifier, binding all occurrences of a in its scope. (To capture a non-equal chance, one would attach a probability p to a generator, as ν_p a, though we will not do so in this paper.)

The resulting probabilistic event lambda-calculus Λ_PE, which we present in this paper, is confluent. Our decomposition allows us to separate duplicating an event, represented by the generator νa, from duplicating only its outcome a, through having multiple choice operators ⊕_a. In this way our calculus may interpret both original strategies, call-by-name and call-by-value, by different translations of standard probabilistic terms into Λ_PE: call-by-name by the above decomposition (see also Section 2), and call-by-value by a different one (see Section 7). For our initial example, we get the following translations and reductions:

cbn: (λx. x = x)(νa. ⊤ ⊕_a ⊥) → (νa. ⊤ ⊕_a ⊥) = (νb. ⊤ ⊕_b ⊥) ↠ ⊤ ⊕ ⊥   (1)
cbv: νa. (λx. x = x)(⊤ ⊕_a ⊥) ↠ νa. (⊤ ⊕_a ⊥) = (⊤ ⊕_a ⊥) ↠ ⊤            (2)

We present two reduction relations for our probabilistic constructs, both independent of beta-reduction. Our main focus will be on permutative reduction (Sections 2, 3), a small-step local rewrite relation which is computationally inefficient but gives a natural and very fine-grained operational semantics.
Projective reduction (Section 6) is a more standard reduction, following the intuition that νa generates a coin flip to evaluate ⊕_a; it is coarser but more efficient. We further prove confluence (Section 4), and we give a system of simple types and prove strong normalization for typed terms by reducibility (Section 5). Omitted proofs can be found in [7], the long version of this paper.

1.1 Related Work

Probabilistic λ-calculi have been a topic of study since the pioneering work by Saheb-Djaromi [24], the first to give the syntax and operational semantics of a λ-calculus with binary probabilistic choice. Giving well-behaved denotational models for probabilistic λ-calculi has proved challenging, as witnessed by the many contributions spanning the last thirty years: from Jones and Plotkin's early study of the probabilistic powerdomain [17], to Jung and Tix's remarkable (and mostly negative) observations [18], to the very recent encouraging results by Goubault-Larrecq [16]. A particularly well-behaved model for probabilistic λ-calculus can be obtained by taking a probabilistic variation of Girard's coherent spaces [10], this way getting full abstraction [13].

On the operational side, one could mention a study of the various ways the operational semantics of a calculus with binary probabilistic choice can be specified, namely by small-step or big-step semantics, or by inductively or coinductively defined sets of rules [9]. Termination and complexity analysis of higher-order probabilistic programs seen as λ-terms have been studied by way of type systems in a series of recent results about size [6], intersection [4], and refinement type disciplines [1]. Contextual equivalence on probabilistic λ-calculi has been studied, and compared with equational theories induced by Böhm Trees [19], applicative bisimilarity [8], or environmental bisimilarity [25].
In all the aforementioned works, probabilistic λ-calculi have been taken as implicitly endowed with either call-by-name or call-by-value strategies, for the reasons outlined above. There are only a few exceptions, namely some works on Geometry of Interaction [5], Probabilistic Coherent Spaces [14], and Standardization [15], which achieve, in different contexts, a certain degree of independence from the underlying strategy, thus accommodating both call-by-name and call-by-value evaluation. The way this is achieved, however, invariably relies on Linear Logic or related concepts. This is deeply different from what we do here.

Some words of comparison with Faggian and Ronchi Della Rocca's work on confluence and standardization [15] are also in order. The main difference between their approach and the one we pursue here is that the operator ! in their calculus plays both the role of a marker for duplicability and that of a checkpoint for any probabilistic choice "flowing out" of the term (i.e. being fired). In our calculus, we do not control duplication, but we definitely make use of checkpoints. Said another way, Faggian and Ronchi Della Rocca's work is inspired by linear logic, while our approach is inspired by deep inference, even though this is, on purpose, not evident in the design of our calculus.

Probabilistic λ-calculi can also be seen as vehicles for expressing probabilistic models in the sense of Bayesian programming [23,3]. This, however, requires an operator for modelling conditioning, which complicates the metatheory considerably, and which we do not consider here.

Our permutative reduction is a refinement of that for the call-by-name probabilistic λ-calculus [20], and is an implementation of the equational theory of (ordered) binary decision trees via rewriting [27].
Probabilistic decision trees have been proposed with a primitive binary probabilistic operator [22], but not with a decomposition as we explore here.

2 The Probabilistic Event λ-Calculus Λ_PE

Definition 1. The probabilistic event λ-calculus (Λ_PE) is given by the following grammar, with, from left to right: a variable (denoted by x, y, z, …), an abstraction, an application, a (labeled) choice, and a (probabilistic) generator:

M, N ::= x | λx.N | NM | N ⊕_a M | νa.N

In a term λx.M the abstraction λx binds the free occurrences of the variable x in its scope M, and in νa.N the generator νa binds the label a in N. The calculus features a decomposition of the usual probabilistic sum ⊕, as follows:

N ⊕ M ≜ νa. (N ⊕_a M)   (3)

The generator νa represents a probabilistic event, whose outcome, a binary value in {0, 1} represented by the label a, is used by the choice operator ⊕_a. That is, νa flips a coin setting a to 0 (resp. 1), and depending on this N ⊕_a M reduces to N (resp. M). We will use the unlabeled choice ⊕ as in (3). This convention also gives the translation from a call-by-name probabilistic λ-calculus into Λ_PE (the interpretation of a call-by-value probabilistic λ-calculus is in Section 7).

Reduction. Reduction in Λ_PE will consist of standard β-reduction →_β plus an evaluation mechanism for generators and choice operators, which implements probabilistic choice. We will present two such mechanisms: projective reduction →_π and permutative reduction →_p. While projective reduction implements the given intuition for the generator and choice operator, we relegate it to Section 6 and make permutative reduction our main evaluation mechanism, for the reason that it is more fine-grained, and thus more general.

Permutative reduction is based on the idea that any operator distributes over the labeled choice operator (see the reduction steps in Figure 1), even other choice operators, as below.
(N ⊕_a M) ⊕_b P ∼ (N ⊕_b P) ⊕_a (M ⊕_b P)

To orient this as a rewrite rule, we need to give priority to one label over another. Fortunately, the relative position of the associated generators νa and νb provides just that. Then, to define →_p, we will want every choice to belong to some generator, and make the order of generators explicit.

Definition 2. The set fl(N) of free labels of a term N is defined inductively by:

fl(x) = ∅        fl(MN) = fl(M) ∪ fl(N)        fl(λx.M) = fl(M)
fl(νa.M) = fl(M) \ {a}        fl(M ⊕_a N) = fl(M) ∪ fl(N) ∪ {a}

A term M is label-closed if fl(M) = ∅.

Fig. 1. Reduction rules for β-reduction and p-reduction:

(β)     (λx.N)M → N[M/x]
(i)     N ⊕_a N → N
(c1)    (N ⊕_a M) ⊕_a P → N ⊕_a P
(c2)    N ⊕_a (M ⊕_a P) → N ⊕_a P
(⊕λ)    λx.(N ⊕_a M) → (λx.N) ⊕_a (λx.M)
(⊕f)    (N ⊕_a M)P → (NP) ⊕_a (MP)
(⊕a)    N(M ⊕_a P) → (NM) ⊕_a (NP)
(⊕⊕1)   (N ⊕_a M) ⊕_b P → (N ⊕_b P) ⊕_a (M ⊕_b P)   (if a < b)
(⊕⊕2)   N ⊕_b (M ⊕_a P) → (N ⊕_b M) ⊕_a (N ⊕_b P)   (if a < b)
(ν⊕)    νb.(N ⊕_a M) → (νb.N) ⊕_a (νb.M)            (if a ≠ b)
(ν)     νa.N → N                                     (if a ∉ fl(N))
(νλ)    λx.νa.N → νa.λx.N
(νf)    (νa.N)M → νa.(NM)                            (if a ∉ fl(M))

From here on, we consider only label-closed terms (we implicitly assume this, unless otherwise stated). All terms are identified up to renaming of their bound variables and labels. Given terms M and N and a variable x, M[N/x] is the capture-avoiding (for both variables and labels) substitution of N for the free occurrences of x in M. We speak of a representative M of a term when M is not considered up to such a renaming. A representative M of a term is well-labeled if for every occurrence of νa in M there is no further νa occurring in its scope.

Definition 3 (Order for labels). Let M be a well-labeled representative of a term. We define an order <_M for the labels occurring in M as follows: a <_M b if and only if νb occurs in the scope of νa. For a well-labeled and label-closed representative M, <_M is a finite tree order.

Definition 4.
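Definition 2 translates directly into code. In the sketch below (tuple encoding ours: "gen" for the generator binding a label and "choice" for the labeled choice ⊕_a) we compute the free labels of a term.

```python
# Definition 2 as code: free labels of a term. Terms are tuples
# ("lam", x, M), ("app", M, N), ("choice", a, M, N) for M ⊕_a N, and
# ("gen", a, M) for the generator binding label a; strings are variables.

def fl(t):
    if not isinstance(t, tuple):
        return set()                      # a variable has no labels
    if t[0] == "lam":
        return fl(t[2])
    if t[0] == "app":
        return fl(t[1]) | fl(t[2])
    if t[0] == "choice":
        return fl(t[2]) | fl(t[3]) | {t[1]}
    if t[0] == "gen":
        return fl(t[2]) - {t[1]}

term = ("gen", "a", ("choice", "a", "x", ("choice", "b", "y", "z")))
print(fl(term))  # {'b'}
```

Wrapping the term in one more generator for b makes it label-closed: fl(("gen", "b", term)) is empty.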
Reduction → = →_β ∪ →_p in Λ_PE consists of β-reduction →_β and permutative or p-reduction →_p, both defined as the contextual closure of the rules given in Figure 1. We write ↠ for the reflexive–transitive closure of →, and ⇓ for reduction to normal form; similarly for →_β and →_p. We write =_p for the symmetric and reflexive–transitive closure of →_p.

Fig. 2. Example reduction of the cbv-translation of the term on p. 137:

νa. (λx. x = x)(⊤ ⊕_a ⊥)
  →_p νa. ((λx. x = x)⊤) ⊕_a ((λx. x = x)⊥)   (⊕a)
  ↠_β νa. (⊤ = ⊤) ⊕_a (⊥ = ⊥) = νa. ⊤ ⊕_a ⊤
  →_p νa. ⊤ →_p ⊤                             (i, ν)

Two example reductions are (1)–(2) on p. 137; a third, complete reduction is in Figure 2. The crucial feature of p-reduction is that a choice ⊕_a does permute out of the argument position of an application, but a generator νa does not, as below. Since the argument of a redex may be duplicated, this is how we characterize the difference between the outcome of a probabilistic event, whose duplicates may be identified, and the event itself, whose duplicates may yield different outcomes.

N (M ⊕_a P) →_p (NM) ⊕_a (NP)        N (νa.M) ↛_p νa.(NM)

By inspection of the rewrite rules in Figure 1, we can then characterize the normal forms of → and →_p as follows.

Proposition 5 (Normal forms). The normal forms P_0 of →_p, respectively N_0 of →, are characterized by the following grammars:

N_0 ::= N_1 | N_0 ⊕ N_0        P_0 ::= P_1 | P_0 ⊕ P_0
N_1 ::= N_2 | λx.N_1           P_1 ::= x | λx.P_1 | P_1 P_0
N_2 ::= x | N_2 N_0

3 Properties of Permutative Reduction

We will prove strong normalization and confluence of →_p. For strong normalization, the obstacle is the interaction between different choice operators, which may duplicate each other, creating super-exponential growth. Fortunately, Dershowitz's recursive path orders [12] seem tailor-made for our situation.
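A small subset of the rules of Figure 1 suffices to see permutative reduction at work. The sketch below (encoding ours) implements only (i), (⊕f) and (⊕a) on the tuple representation and rewrites to a normal form; it shows a choice permuting out of argument position.

```python
# Sketch (ours): permutative reduction for a fragment of Figure 1.
# Terms: ("app", M, N) for application, ("choice", a, M, N) for M ⊕_a N.
# Rules implemented: (i) idempotence, (⊕f) and (⊕a) distribution of
# application over choice. step() performs one rewrite, or returns None.

def step(t):
    if not isinstance(t, tuple):
        return None
    if t[0] == "choice":
        _, a, m, n = t
        if m == n:
            return m                                    # (i)  N ⊕_a N -> N
    if t[0] == "app":
        _, m, n = t
        if isinstance(m, tuple) and m[0] == "choice":   # (⊕f)
            _, a, m1, m2 = m
            return ("choice", a, ("app", m1, n), ("app", m2, n))
        if isinstance(n, tuple) and n[0] == "choice":   # (⊕a)
            _, a, n1, n2 = n
            return ("choice", a, ("app", m, n1), ("app", m, n2))
    # otherwise try to reduce in a subterm, left to right
    for i in range(1, len(t)):
        r = step(t[i])
        if r is not None:
            return t[:i] + (r,) + t[i + 1:]
    return None

def normalize(t):
    while (r := step(t)) is not None:
        t = r
    return t

term = ("app", "f", ("choice", "a", "x", "y"))
print(normalize(term))  # ('choice', 'a', ('app', 'f', 'x'), ('app', 'f', 'y'))
```

Adding the remaining rules of Figure 1 (with the side conditions on labels) would yield the full relation →_p.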
Observe that the set Λ_PE endowed with →_p is a first-order term rewriting system over a countably infinite set of variables and the signature Σ given by:

• the binary function symbol ⊕_a, for any label a;
• the unary function symbol νa, for any label a;
• the unary function symbol λx, for any variable x;
• the binary function symbol @, letting @(M, N) stand for MN.

(Footnote: this was inferred only from a simple simulation; we would be interested to know a rigorous complexity result.)

Definition 6. Let M be a well-labeled representative of a label-closed term, and let Σ_M be the set of signature symbols occurring in M. We define ≺_M as the (strict) partial order on Σ_M generated by the following rules:

⊕_a ≺_M ⊕_b   if a <_M b
⊕_a ≺_M νb    for any labels a, b
νb ≺_M @, λx  for any label b

Lemma 7. The reduction →_p is strongly normalizing.

Proof. For the first-order term rewriting system (Λ_PE, →_p) we derive a well-founded recursive path ordering < from ≺_M following [12, p. 289]. Let f and g range over function symbols, let [N_1, …, N_n] denote a multiset and extend < to multisets by the standard multiset ordering, and let N = f(N_1, …, N_n) and M = g(M_1, …, M_m); then

N < M  ⟺  [N_1, …, N_n] < [M_1, …, M_m]   if f = g
           [N_1, …, N_n] < [M]             if f ≺_M g
           [N] ≤ [M_1, …, M_m]             otherwise.

While ≺_M is defined only relative to Σ_M, reduction may only reduce the signature. Inspection of Figure 1 then shows that M →_p N implies N < M. □

Confluence of Permutative Reduction. With strong normalization, confluence of →_p requires only local confluence. We reduce the number of cases to consider by casting the permutations of ⊕_a as instances of a common shape.

Definition 8. We define a context C[ ] (with exactly one hole [ ]) as follows, and let C[N] represent C[ ] with the hole [ ] replaced by N:

C[ ] ::= [ ] | λx.C[ ] | C[ ]M | N C[ ] | C[ ] ⊕_a M | N ⊕_a C[ ] | νa.C[ ]

Observe that the six reduction rules (⊕λ) through (ν⊕) in Figure 1 are all of the following form. We refer to these collectively as (⊕).
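The recursive path ordering in the proof of Lemma 7 can be prototyped. The sketch below (encoding, toy precedence, and rule variant — Dershowitz's multiset path ordering — are ours) checks that a rewrite step of Figure 1 is decreasing.

```python
# Sketch (ours): a recursive (multiset) path ordering for a toy signature.
# Terms are tuples ("f", arg1, ..., argn); strings are variables. The
# precedence mirrors Definition 6: choices below generators below
# application/abstraction, and choice_a below choice_b as a < b.

RANK = {"choice_a": 0, "choice_b": 1, "gen": 2, "app": 3, "lam": 3}

def prec(f, g):
    """f is strictly smaller than g in the precedence."""
    return RANK.get(f, -1) < RANK.get(g, -1)

def occurs(x, s):
    if s == x:
        return True
    return isinstance(s, tuple) and any(occurs(x, a) for a in s[1:])

def multiset_gt(ms, ns):
    ms, ns = list(ms), list(ns)
    for x in list(ns):                    # cancel common elements
        if x in ms:
            ms.remove(x)
            ns.remove(x)
    return bool(ms) and all(any(gt(m, n) for m in ms) for n in ns)

def gt(s, t):
    """s > t in the recursive path ordering."""
    if s == t or not isinstance(s, tuple):
        return False
    if not isinstance(t, tuple):          # t is a variable
        return occurs(t, s)
    f, ss, g, ts = s[0], s[1:], t[0], t[1:]
    if any(si == t or gt(si, t) for si in ss):
        return True                       # an argument of s dominates t
    if prec(g, f):                        # head of s is strictly larger
        return all(gt(s, tj) for tj in ts)
    if f == g:
        return multiset_gt(ss, ts)
    return False

# rule (⊕f): (N ⊕_a M) P -> (N P) ⊕_a (M P) is strictly decreasing:
lhs = ("app", ("choice_a", "N", "M"), "P")
rhs = ("choice_a", ("app", "N", "P"), ("app", "M", "P"))
print(gt(lhs, rhs), gt(rhs, lhs))  # True False
```

Since the ordering is well-founded and every rule decreases it, no infinite →_p-sequence exists — the shape of the strong-normalization argument above.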
C[N ⊕_a M] →_p C[N] ⊕_a C[M]   (⊕)

Lemma 9 (Confluence of →_p). Reduction →_p is confluent.

Proof. By Newman's lemma and strong normalization of →_p (Lemma 7), confluence follows from local confluence. The proof of local confluence consists of joining all critical pairs given by (⊕). Details are in the Appendix of [7]. □

Definition 10. We denote the unique p-normal form of a term N by N⇓p.

4 Confluence

We aim to prove that → = →_β ∪ →_p is confluent. We will use the standard technique of parallel β-reduction [26], a simultaneous reduction step on a number of β-redexes, which we define via a labeling of the redexes to be reduced. The central point is to find a notion of reduction that is diamond, i.e. every critical pair can be closed in one (or zero) steps. This will be our complete reduction, which consists of parallel β-reduction followed by p-reduction to normal form.

Definition 11. A labeled term P^• is a term P with chosen β-redexes annotated as (λx.N)^• M. The unique labeled β-step P^• →_β P_• from P^• to the labeled reduct P_• reduces every labeled redex, and is defined inductively as follows:

(λx.N^•)^• M^• →_β N_•[M_•/x]        x →_β x
λx.N^• →_β λx.N_•                    N^• M^• →_β N_• M_•
N^• ⊕_a M^• →_β N_• ⊕_a M_•          νa.N^• →_β νa.N_•

A parallel β-step P ⇒_β P_• is a labeled step P^• →_β P_• for some labeling P^•. Note that P_• is an unlabeled term, since all labels are removed in the reduction. For the empty labeling, P^• = P_• = P, so parallel reduction is reflexive: P ⇒_β P.

Lemma 12. A parallel β-step P ⇒_β P_• is a β-reduction P ↠_β P_•.

Proof. By induction on the labeled term P^• generating P ⇒_β P_•. □

Lemma 13. Parallel β-reduction is diamond.

Proof. Let P ⇒_β P_• and P ⇒_β P_◦ be two labeled reduction steps on a term P. We annotate each step with the labels of the other, preserved by reduction, to give a span from the doubly labeled term P^{•◦} = P^{◦•}; reducing the remaining labels then closes the diagram: P_• ⇒_β P_{•◦} = P_{◦•} and P_◦ ⇒_β P_{◦•}.

This is proved by induction on P, where only two cases are not immediate: those where a redex carries one but not the other label. In one case, reducing • first yields N_•^◦[M_•^◦/x], while reducing ◦ first yields the still-labeled redex (λx. N_◦^•)^• M_◦^•; one further labeled step on each side closes the square, where induction on N shows that N_•^◦[M_•^◦/x] ⇒_β N_{•◦}[M_{•◦}/x]. The other case is symmetric. □
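Definition 11 can be prototyped directly. In the sketch below (encoding ours) labeled redexes are an explicit constructor, and the parallel step fires all of them at once while leaving unlabeled redexes — including the ones it creates — untouched.

```python
# Sketch (ours) of the labeled parallel β-step: ("redex", x, body, arg)
# stands for the labeled redex (λx. body)• arg; parallel_step reduces
# every labeled redex simultaneously, bottom-up. Substitution is naive
# (no capture-avoidance), which suffices for the closed examples below.

def subst(t, x, v):
    if t == x:
        return v
    if not isinstance(t, tuple):
        return t
    if t[0] == "lam":
        _, y, b = t
        return t if y == x else ("lam", y, subst(b, x, v))
    if t[0] == "redex":
        _, y, b, a = t
        b2 = b if y == x else subst(b, x, v)
        return ("redex", y, b2, subst(a, x, v))
    return (t[0],) + tuple(subst(a, x, v) for a in t[1:])

def parallel_step(t):
    if not isinstance(t, tuple):
        return t
    if t[0] == "redex":                   # fire every labeled redex
        _, x, body, arg = t
        return subst(parallel_step(body), x, parallel_step(arg))
    if t[0] == "lam":
        return ("lam", t[1], parallel_step(t[2]))
    return (t[0],) + tuple(parallel_step(a) for a in t[1:])

# ((λx. x x)• (λy. y)): the labeled redex fires, but the redex it
# creates is unlabeled and survives the parallel step.
identity = ("lam", "y", "y")
t = ("redex", "x", ("app", "x", "x"), identity)
print(parallel_step(t))  # ('app', ('lam', 'y', 'y'), ('lam', 'y', 'y'))
```

The empty labeling has no "redex" nodes, so the step is the identity — the reflexivity of ⇒_β noted above.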
◦ •◦ ◦• • ◦ • P P = P P P P = P P β β β •◦ ◦• β • ◦ • ◦ •◦ This is proved by induction on P , where only two cases are not immediate: those where a redex carries one but not the other label. One case follows by the below diagram; the other case is symmetric. Below, for the step top right, • • • induction on N shows that N [M /x] N [M /x]. β • • ◦• ◦ ◦• • • (λx. N ) M N [M /x] N [M /x] β β ◦• ◦• ◦ ◦ == •◦ ◦ •◦ ◦ ◦ ◦ (λx. N ) M (λx. N ) M N [M /x] β β •◦ •◦ • • 144 U. Dal Lago et al. 4.1 Parallel Reduction and Permutative Reduction For the commutation of (parallel) β-reduction with p-reduction, we run into the minor issue that a permuting generator or choice operator may block a redex: in both cases below, before the term has a redex, but after it is blocked. p p a a ⊕ ⊕ (λx. N M) P ((λx. N) (λx. M)) P (λx. a .N) M ( a . λx. N) M p p We address this by an adaptation of p-reduction on labeled terms, which is a strategy in that permutes past a labeled redex in one step. • • Deﬁnition 14. A labeled p-reduction N M on labeled terms is a p- reduction of one of the forms a a • • • • • • • • • • (λx. N ⊕M ) P (λx. N ) P ⊕(λx. M ) P • • • • • • (λx. a .N ) M a . (λx. N ) M or a single p-step on unlabeled constructors in N . Lemma 15. Reduction to normal form in is equal to (on labeled terms). p p Proof. Observe that and have the same normal forms. Then in one p p direction, since ⊆ we have ⊆ . Conversely, let N M. On this p p p p p reduction, let P Q be the ﬁrst step such that P Q. Then there is an R p p such that P R and Q R. Note that we have N R. By conﬂuence, p p p R M, and by induction on the sum length of paths in from R (smaller p p than from N)wehave R M, and hence N M. p p The following lemmata then give the required commutation properties of the relations , , and . Figure 3 illustrates these by commuting diagrams. p p β • • Lemma 16. If N M then N = M . p • p • Proof. By induction on the rewrite step . The two interesting cases are: a a • • • • • • • • • • (λx. 
M ) (N ⊕P)((λx. M ) N ) ⊕((λx. M ) P ) β β (x ∈ fv(M)) a a M [(N ⊕P )/x] M [N /x] ⊕M [P /x] • • • • • • • a a • • • • • • • • • • ⊕ ⊕ (λx. M ) (N P)((λx. M ) N ) ((λx. M ) P ) β β (x/ ∈ fv(M)) M M M • • • p Decomposing Probabilistic Lambda-Calculi 145 How the critical pairs in the above diagrams are joined shows that we cannot use the Hindley-Rosen Lemma [2, Prop. 3.3.5] to prove conﬂuence of ∪ . β p Lemma 17. N = N . • p p• • • Proof. Using Lemma 15 we decompose N N as • • • • • N = N N ··· N = N p p p 1 2 n p where (N ) = (N ) by Lemma 16. i • p i+1 • 4.2 Complete Reduction To obtain a reduction strategy with the diamond property for , we combine parallel reduction with permutative reduction to normal form into a no- β p tion of complete reduction . We will show that it is diamond (Lemma 19), and that any step in maps onto a complete step of p-normal forms (Lemma 20). Conﬂuence of (Theorem 21) then follows: any two paths map onto complete paths on p-normal forms, which then converge by the diamond property. Deﬁnition 18. A complete reduction step N N is a parallel β-step fol- •p lowed by p-reduction to normal form: N N = N N N . •p · β • p •p Lemma 19 (Complete reduction is diamond). If P N M then for some Q, P Q M. Proof. By the following diagram, where M = N and P = N , and Q = N . ◦p •p ◦•p The square top left is by Lemma 13, top right and bottom left are by Lemma 17, and bottom right is by conﬂuence and strong normalization of p-reduction. β p ◦• • • N N N ◦ ◦p β β N N N ◦• ◦p• p p β p N N N •p◦ ◦•p •p Lemma 20 (p-Normalization maps reduction to complete reduction). If N M then N M . p p Proof. For a p-step N M we have N = M while is reﬂexive. For a p p p β β-step N M we label the reduced redex in N to get N N = M. Then β β • Lemma 17 gives N = M, and hence N N M . p• p p β p• p p 146 U. Dal Lago et al. p p NM NM NM NM β β β β p p P = Q P = Q PQ PQ p p Lemma 16 Lemma 17 Lemma 19 Lemma 20 Fig. 3. Diagrams for the Lemmata Leading up to Conﬂuence Theorem 21. 
Reduction is conﬂuent. Proof. By the following diagram. For the top and left areas, by Lemma 20 any reduction path N M maps onto one N M . The main square follows by p p the diamond property of complete reduction, Lemma 19. NM N M p p P Q 5 Strong Normalization for Simply-Typed Terms In this section, we prove that the relation enjoys strong normalization in simply typed terms. Our proof of strong normalization is based on the classic reducibility technique, and inherently has to deal with label-open terms. It thus make great sense to turn the order < from Deﬁnition 3 into something more formal, at the same time allowing terms to be label-open. This is in Figure 4. It is easy to realize that, of course modulo label α-equivalence, for every term M there is at least one θ such that θ M. An easy fact to check is that if θ M and M N, then θ N. It thus makes sense to parametrize on L L a sequence of labels θ, i.e., one can deﬁne a family of reduction relations on pairs in the form (M, θ). The set of strongly normalizable terms, and the number of steps to normal forms become themselves parametric: • The set SN of those terms M such that θ M and (M, θ) is strongly normalizing modulo ; θ θ θ • The function sn assigning to any term in SN the maximal number of steps to normal form. Decomposing Probabilistic Lambda-Calculi 147 ·· Label Sequences: θ = ε | a · θ ·· ·· Label Judgments: ξ = θ M ·· L θ M a · θ M L L θ L x θ L λx.M θ L a .M Label Rules: θ Mθ N θ Mθ Na ∈ θ L L L L θ MN θ M N L L Fig. 4. Labeling Terms ·· Types: τ = α | τ ⇒ ρ ·· ·· Environments: Γ = x : τ ,...,x : τ ·· 1 1 n n ·· Judgments: π = Γ M : τ ·· Γ, x : τ M : ρ Γ M : τ Γ, x : τ x : τ Γ λx.M : τ ⇒ ρ Γ a .M : τ Typing Rules: Γ M : τ ⇒ρΓ N : τ Γ M :τΓ N : τ Γ MN : ρ Γ M N : τ Fig. 5. Types, Environments, Judgments, and Rules θ θ θ θ ML ...L ∈ SN NL ...L ∈ SN a ∈ θ L ∈ SN ··· L ∈ SN 1 m 1 m 1 m θ θ xL ...L ∈ SN M ⊕NL ...L ∈ SN 1 m 1 m θ θ a·θ M[L /x]L ...L ∈ SN L ∈ SN ML ...L ∈ SN ∀i.a ∈ L 0 1 m 0 1 m i θ θ (λx. 
M)L ...L ∈ SN ( a .M)L ...L ∈ SN 0 m 1 m Fig. 6. Closure Rules for Sets SN We can now deﬁne types, environments, judgments, and typing rules in Figure 5. Please notice that the type structure is precisely the one of the usual, vanilla, simply-typed λ-calculus (although terms are of course diﬀerent), and we can thus reuse most of the usual proof of strong normalization, for example in the version given by Ralph Loader’s notes [21], page 17. Lemma 22. The closure rules in Figure 6 are all sound. 148 U. Dal Lago et al. Since the structure of the type system is the one of plain, simple types, the deﬁnition of reducibility sets is the classic one: Red = {(Γ, θ, M) | M ∈ SN ∧ Γ M : α}; Red = {(Γ, θ, M) | (Γ M : τ ⇒ ρ) ∧ (θ M) ∧ τ⇒ρ L ∀(ΓΔ,θ,N) ∈ Red .(ΓΔ,θ,MN) ∈ Red }. τ ρ Before proving that all terms are reducible, we need some auxiliary results. Lemma 23. 1. If (Γ, θ, M) ∈ Red , then M ∈ SN . 2. If Γ xL ...L : τ and L ,...,L ∈ SN , then (Γ, θ, xL ...L ) ∈ Red . 1 m 1 m 1 m τ 3. If (Γ, θ, M[L /x]L ...L ) ∈ Red with Γ L : ρ and L ∈ SN , then 0 1 m τ 0 0 (Γ, θ, (λx. M)L ...L ) ∈ Red . 0 m τ 4. If (Γ, θ, ML ...L ) ∈ Red with (Γ, θ, NL ...L ) ∈ Red and a ∈ θ, then 1 m τ 1 m τ (Γ, θ, (M ⊕N)L ...L ) ∈ Red . 1 m τ 5. If (Γ, a · θ, ML ...L ) ∈ Red and a ∈ L for all i, 1 m τ i then (Γ, θ, ( a .M)L ...L ) ∈ Red . 1 m τ Proof. The proof is an induction on τ:If τ is an atom α, then Point 1 follows by deﬁnition, while points 2 to 5 come from Lemma 22.If τ is ρ ⇒ μ, Points 2 to 5 come directly from the induction hypothesis, while Point 1 can be proved θ θ by observing that M is in SN if Mx is itself SN , where x is a fresh variable. By induction hypothesis (on Point 2), we can say that (Γ (x : ρ),θ,x) ∈ Red , and conclude that (Γ (x : ρ),θ,Mx) ∈ Red . The following is the so-called Main Lemma: Proposition 24. Suppose y : τ ,...,y : τ M : ρ and θ M, with 1 1 n n L (Γ, θ, N ) ∈ Red for all 1 ≤ j ≤ n. Then (Γ, θ, M[N /y ,...,N /y ]) ∈ Red . j τ 1 1 n n ρ Proof. 
This is an induction on the structure of the term M: • If M is a variable, necessarily one among y ,...,y , then the result is trivial. 1 n • If M is an application LP , then there exists a type ξ such that y : τ ,...,y : 1 1 n τ L : ξ ⇒ ρ and y : τ ,...,y : τ P : ξ. Moreover, θ L and θ P n 1 1 n n L L we can then safely apply the induction hypothesis and conclude that (Γ, θ, L[N/y]) ∈ Red (Γ, θ, P [N/y]) ∈ Red . ξ⇒ρ ξ By deﬁnition, we get (Γ, θ, (LP )[N/y]) ∈ Red . • If M is an abstraction λx. L, then ρ is an arrow type ξ ⇒ μ and y : τ ,...,y : τ ,x : ξ L : μ. Now, consider any (ΓΔ,θ,P ) ∈ Red . Our 1 n n ξ objective is to prove with this hypothesis that (ΓΔ, θ, (λx.L[N/y])P ) ∈ Red . By induction hypothesis, since (ΓΔ, N ) ∈ Red , we get that μ i τ (ΓΔ,θ,L[N/y, P/x]) ∈ Red . The thesis follows from Lemma 23. μ Decomposing Probabilistic Lambda-Calculi 149 • If M is a sum L ⊕P , we can make use of Lemma 23 and the induction hypothesis, and conclude. • If M is a generator a .P , we can make use of Lemma 23 and the induction hypothesis. We should however observe that a · θ P , since θ M. L L We now have all the ingredients for our proof of strong normalization: Theorem 25. If Γ M : τ and θ M, then M ∈ SN . Proof. Suppose that x : ρ ,...,x : ρ M : τ. Since x : ρ ,...,x : ρ x : 1 1 n n 1 1 n n i ρ for all i, and clearly θ x for every i, we can apply Lemma 24 and obtain i L i that (Γ, θ, M[x/x]) ∈ Red from which, via Lemma 23, one gets the thesis. 6 Projective Reduction Permutative reduction evaluates probabilistic sums purely by rewriting. Here we look at a more standard projective notion of reduction, which conforms more closely to the intuition that a generates a probabilistic event to determine the choice . Using + for an external probabilistic sum, we expect to reduce a .N to N + N where each N is obtained from N by projecting every subterm M M 0 1 i 0 1 to M . The question is, in what context should we admit this reduction? 
We first limit ourselves to reducing in head position.

Definition 26. The a-projections π_0^a(N) and π_1^a(N) are defined as follows:

  π_0^a(N ⊕_a M) = π_0^a(N)          π_i^a(λx. N) = λx. π_i^a(N)
  π_1^a(N ⊕_a M) = π_1^a(M)          π_i^a(N M) = π_i^a(N) π_i^a(M)
  π_i^a(a. N) = a. N                 π_i^a(N ⊕_b M) = π_i^a(N) ⊕_b π_i^a(M)   if a ≠ b
  π_i^a(x) = x                       π_i^a(b. N) = b. π_i^a(N)                if a ≠ b.

Definition 27. A head context H[ ] is given by the following grammar.

  H[ ] ::= [ ] | λx. H[ ] | H[ ] N

Definition 28. Projective head reduction →πh is given by

  H[a. N] →πh H[π_0^a(N)] + H[π_1^a(N)].

We can simulate →πh by permutative reduction if we interpret the external sum + by an outermost ⊕_a (taking special care if the label does not occur).

Proposition 29. Permutative reduction simulates projective head reduction:

  H[a. N] →p* H[N]                                      if a ∉ fl(N)
  H[a. N] →p* a. (H[π_0^a(N)] ⊕_a H[π_1^a(N)])          otherwise.

Proof. The case a ∉ fl(N) is immediate by a step of the rule a. N →p N (a ∉ fl(N)). For the other case, observe that H[a. N] →p* a. H[N] by λ and f steps, and since a does not occur in H[ ], that H[π_i^a(N)] = π_i^a(H[N]). By induction on N, if a is minimal in N (i.e. a ∈ fl(N) and a ≤ b for all b ∈ fl(N)) then N →p* π_0^a(N) ⊕_a π_1^a(N). As required,

  H[a. N] →p* a. (H[π_0^a(N)] ⊕_a H[π_1^a(N)])   if a ∈ fl(N).

A gap remains between which generators will not be duplicated, which we should be able to reduce, and which generators projective head reduction does reduce. In particular, to interpret call-by-value probabilistic reduction in Section 7, we would like to reduce under other generators. However, permutative reduction does not permit exchanging generators, and so only simulates reducing in head position. While (independent) probabilistic events are generally considered interchangeable, it is a question whether the below equivalence is desirable.

  a. b. N ∼ b. a. N    (4)

We elide the issue by externalizing probabilistic events, and reducing with reference to a predetermined binary stream s ∈ {0, 1}^ω representing their outcomes.
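The a-projections of Definition 26 are easy to implement directly. The sketch below (our own representation, not the paper's) encodes terms as tagged tuples: `('var', x)`, `('lam', x, N)`, `('app', N, M)`, `('choice', a, N, M)` for N ⊕_a M, and `('gen', a, N)` for a generator a.N; `head_step` performs projective head reduction for the trivial head context.

```python
def proj(i, a, t):
    """a-projection pi_i^a from Definition 26, on tagged-tuple terms."""
    tag = t[0]
    if tag == 'var':
        return t
    if tag == 'lam':
        return ('lam', t[1], proj(i, a, t[2]))
    if tag == 'app':
        return ('app', proj(i, a, t[1]), proj(i, a, t[2]))
    if tag == 'choice':                      # ('choice', label, N, M) is N (+)_label M
        b, n, m = t[1], t[2], t[3]
        if b == a:                           # choose branch i and keep projecting
            return proj(i, a, n) if i == 0 else proj(i, a, m)
        return ('choice', b, proj(i, a, n), proj(i, a, m))
    if tag == 'gen':                         # ('gen', label, N) is the generator label.N
        b, n = t[1], t[2]
        return t if b == a else ('gen', b, proj(i, a, n))

def head_step(t):
    """Projective head reduction a.N -> (pi_0^a N, pi_1^a N), head context = hole."""
    assert t[0] == 'gen'
    a, n = t[1], t[2]
    return proj(0, a, n), proj(1, a, n)
```

Note how an inner generator with the same label stops the projection (`π_i^a(a.N) = a.N`), since it rebinds a.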
In this way, we will preserve the intuitions of both permutative and projective reduction: we obtain a qualiﬁed version of the equivalence (4) (see (5) below), and will be able to reduce any generator on the spine of a term: under (other) generators and choices as well as under abstractions and in function position. Deﬁnition 30. The set of streams is S = {0, 1} , ranged over by r, s, t, and i · s denotes a stream with i ∈{0, 1} as ﬁrst element and s as the remainder. Deﬁnition 31. The stream labeling N of a term N with a stream s ∈ S, which i s annotates generators as a with i ∈{0, 1} and variables as x with a stream s, is given inductively below. We lift β-reduction to stream-labeled terms by s s introducing a substitution case for stream-labeled variables: x [M/x]= M . s s i·s i s (λx. N) = λx. N ( a .N) = a .N a a s s s s s (NM) = N M (N ⊕M) = N ⊕M Deﬁnition 32. Projective reduction on stream-labeled terms is the rewrite relation given by i a a .N π (N) . Observe that in N a generator that occurs under n other generators on the spine of N, is labeled with the element of s at position n + 1. Generators in argument position remain unlabeled, until a β-step places them on the spine, in which case they become labeled by the new substitution case. We allow to annotate a term with a ﬁnite preﬁx of a stream, e.g. N with a singleton i, so that only part of the spine is labeled. Subsequent labeling of a partly labeled term is r s r·s then by (N ) = N (abusing notation). To introduce streams via the external Decomposing Probabilistic Lambda-Calculi 151 probabilistic sum, and to ignore an unused remaining stream after completing a probabilistic computation, we adopt the following equation. 0 1 N = N + N Proposition 33. Projective reduction generalizes projective head reduction: 0 1 a a H[ a .N]= H[ a .N]+ H[ a .N] H[π (N)] + H[π (N)] . 
0 1 Returning to the interchangeability of probabilistic events, we reﬁne (4)by exchanging the corresponding elements of the annotating streams: i·j·s i j s a b s ( a . b .N) = a . b .N π (π (N )) i j ∼ = (5) j·i·s j i s b a s ( b . a .N) = b . a .N π (π (N )) j i Stream-labeling externalizes all probabilities, making reduction determinis- tic. This is expressed by the following proposition, that stream-labeling com- mutes with reduction: if a generator remains unlabeled in M and becomes la- beled after a reduction step M N, what label it receives is predetermined. The deep reason is that stream labeling assigns an outcome to each generator in a way that corresponds to a call-by-name strategy for probabilistic reduction. s s Proposition 34. If M N by a step other than then M N . Remark 35. The statement is false for the rule a .N N (a/ ∈ ﬂ(N)), as it removes a generator but not an element from the stream. Arguably, for this reason the rule should be excluded from the calculus. On the other hand, the ⊕ ⊕ rule is necessary to implement idempotence of , rather than just , as follows. N N = a .N ⊕N a .N N where a/ ∈ ﬂ(N) p p The below proposition then expresses that projective reduction is an invari- ant for permutative reduction. If N M by a step (that is not ) on a labeled generator a or a corresponding choice ⊕, then N and M reduce to a common term, N P M, by the projective steps evaluating a . π π Proposition 36. Projective reduction is an invariant for permutative reduction, as follows (with a case for c symmetric to c , and where D[] is a context). 2 1 p p a a a a i i i i ⊕ a .C[N] ⊕ ⊕ ⊕ a .C[N N] a .C[(N M) N ] a .C[N N ] 0 1 0 1 π π π π a a π (C[N]) π (C[N ]) i i a a i i a .C[D[N ⊕N ]] a .C[D[N ] ⊕D[N ]] 0 1 0 1 π π π (C[D[N ]]) i 152 U. Dal Lago et al. p p i i i i λx. a .N a . λx. N ( a .N)M a .NM π π π π λ f a a a a = = λx. π (N) π (λx. 
N) π (N)Mπ (NM) i i i i 7 Call-by-value Interpretation We consider the interpretation of a call-by-value probabilistic λ-calculus. For simplicity we will allow duplicating (or deleting) β-redexes, and only restrict duplicating probabilities; our values V are then just deterministic—i.e. without choices—terms, possibly applications and not necessarily β-normal (so that our is actually β-reduction on deterministic terms, unlike [9]). We evaluate the βv internal probabilistic choice ⊕ to an external probabilistic choice +. ·· ⊕ N = x | λx.N | MN | M N (λx.N)V N[V/x] ·· v βv ·· ⊕ V, W = x | λx.V | VW M N M + N ·· v v The interpretation N of a call-by-value term N into Λ is given as follows. v PE First, we translate N to a label-open term N = θ P by replacing each open L ⊕ ⊕ choice with one with a unique label, where the label-context θ collects the labels used. Then N is the label closure N = θ P , which preﬁxes P v v L with a generator a for every a in θ. Deﬁnition 37. (Call-by-value interpretation) The open interpretation N open of a call-by-value term N is as follows, where all labels are fresh, and inductively N = θ P for i ∈{1, 2}. i open i L i x = x N N = θ · θ P P open L 1 2 open 2 1 L 1 2 λx.N = θ λx.P N N = θ · θ · a P ⊕P 1 open 1 L 1 1 v 2 open 2 1 L 1 2 The label closure θ P is given inductively as follows. P = P a · θ P = θ a .P L L L The call-by-value interpretation of N is N = N . v open Our call-by-value reduction may choose an arbitrary order in which to evalu- ate the choices in a term N, but the order of generators in the interpretation N is necessarily ﬁxed. Then to simulate a call-by-value reduction, we cannot choose a ﬁxed context stream a priori; all we can say is that for every reduction, there is some stream that allows us to simulate it. Speciﬁcally, a reduction step C[N N ] C[N ] where C[ ] is a call-by-value term context is simulated by 0 v 1 v j the following projective step. i j k i k ... a . b . c ...D[P ⊕P ] ... a . 
c ...D[P ] 0 1 π j Decomposing Probabilistic Lambda-Calculi 153 ⊕ ⊕ Here, C[N N ] = θ D[P P ] with D[] a Λ -context, and θ giving 0 v 1 open L 0 1 PE rise to the sequence of generators ... a . b . c ... in the call-by-value transla- tion. To simulate the reduction step, if b occupies the n-th position in θ, then the n-th position in the context stream s must be the element j. Since β-reduction survives the translation and labeling process intact, we may simulate call-by- value probabilistic reduction by projective and β-reduction. Theorem 38. If N V then N V for some stream s ∈ S. v,βv π,β v 8 Conclusions and Future Work We believe our decomposition of probabilistic choice in λ-calculus to be an ele- gant and compelling way of restoring conﬂuence, one of the core properties of the λ-calculus. Our probabilistic event λ-calculus captures traditional call-by-name and call-by-value probabilistic reduction, and oﬀers ﬁner control beyond those strategies. Permutative reduction implements a natural and ﬁne-grained equiv- alence on probabilistic terms as internal rewriting, while projective reduction provides a complementary and more traditional external perspective. There are a few immediate areas for future work. Firstly, within probabilistic λ-calculus, it is worth exploring if our decomposition opens up new avenues in semantics. Secondly, our approach might apply to probabilistic reasoning more widely, outside the λ-calculus. Most importantly, we will explore if our approach can be extended to other computational eﬀects. Our use of streams interprets probabilistic choice as a read operation from an external source, which means other read operations can be treated similarly. A complementary treatment of write operations would allow us to express a considerable range of eﬀects, in- cluding input/output and state. Acknowledgments This work was supported by EPSRC Project EP/R029121/1 Typed Lambda- Calculi with Sharing and Unsharing. 
The first author is partially supported by the ANR project 19CE480014 PPS, the ERC Consolidator Grant 818616 DIAPASoN, and the MIUR PRIN 201784YSZ5 ASPRA. We thank the referees for their diligence and their helpful comments. We are grateful to Chris Barrett and, indirectly, Anupam Das for pointing us to Zantema and Van de Pol's work [27].

References

1. Avanzini, M., Dal Lago, U., Ghyselen, A.: Type-based complexity analysis of probabilistic functional programs. In: 34th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2019. pp. 1–13. IEEE Computer Society (2019). https://doi.org/10.1109/LICS.2019.8785725
2. Barendregt, H.P.: The Lambda Calculus – Its Syntax and Semantics, Studies in Logic and the Foundations of Mathematics, vol. 103. North-Holland (1984)
3. Borgström, J., Dal Lago, U., Gordon, A.D., Szymczak, M.: A lambda-calculus foundation for universal probabilistic programming. In: 21st ACM SIGPLAN International Conference on Functional Programming, ICFP 2016. pp. 33–46. ACM (2016). https://doi.org/10.1145/2951913.2951942
4. Breuvart, F., Dal Lago, U.: On intersection types and probabilistic lambda calculi. In: Proceedings of the 20th International Symposium on Principles and Practice of Declarative Programming, PPDP 2018. pp. 8:1–8:13. ACM (2018). https://doi.org/10.1145/3236950.3236968
5. Dal Lago, U., Faggian, C., Valiron, B., Yoshimizu, A.: The geometry of parallelism: classical, probabilistic, and quantum effects. In: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017. pp. 833–845. ACM (2017). https://doi.org/10.1145/3009837
6. Dal Lago, U., Grellois, C.: Probabilistic termination by monadic affine sized typing. ACM Transactions on Programming Languages and Systems 41(2), 10:1–10:65 (2019). https://doi.org/10.1145/3293605
7. Dal Lago, U., Guerrieri, G., Heijltjes, W.: Decomposing probabilistic lambda-calculi (long version) (2020), https://arxiv.org/abs/2002.08392
8.
Dal Lago, U., Sangiorgi, D., Alberti, M.: On coinductive equivalences for higher-order probabilistic functional programs. In: The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '14. pp. 297–308. ACM (2014). https://doi.org/10.1145/2535838.2535872
9. Dal Lago, U., Zorzi, M.: Probabilistic operational semantics for the lambda calculus. RAIRO - Theoretical Informatics and Applications 46(3), 413–450 (2012). https://doi.org/10.1051/ita/2012012
10. Danos, V., Ehrhard, T.: Probabilistic coherence spaces as a model of higher-order probabilistic computation. Information and Computation 209(6), 966–991 (2011). https://doi.org/10.1016/j.ic.2011.02.001
11. de'Liguoro, U., Piperno, A.: Non deterministic extensions of untyped lambda-calculus. Information and Computation 122(2), 149–177 (1995). https://doi.org/10.1006/inco.1995.1145
12. Dershowitz, N.: Orderings for term-rewriting systems. Theoretical Computer Science 17, 279–301 (1982). https://doi.org/10.1016/0304-3975(82)90026-3
13. Ehrhard, T., Pagani, M., Tasson, C.: Full abstraction for probabilistic PCF. Journal of the ACM 65(4), 23:1–23:44 (2018). https://doi.org/10.1145/3164540
14. Ehrhard, T., Tasson, C.: Probabilistic call by push value. Logical Methods in Computer Science 15(1) (2019). https://doi.org/10.23638/LMCS-15(1:3)2019
15. Faggian, C., Ronchi Della Rocca, S.: Lambda calculus and probabilistic computation. In: 34th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2019. pp. 1–13. IEEE Computer Society (2019). https://doi.org/10.1109/LICS.2019.8785699
16. Goubault-Larrecq, J.: A probabilistic and non-deterministic call-by-push-value language. In: 34th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2019. pp. 1–13. IEEE Computer Society (2019). https://doi.org/10.1109/LICS.2019.8785809
17. Jones, C., Plotkin, G.D.: A probabilistic powerdomain of evaluations.
In: Proceedings of the Fourth Annual Symposium on Logic in Computer Science (LICS '89). pp. 186–195. IEEE Computer Society (1989). https://doi.org/10.1109/LICS.1989.39173
18. Jung, A., Tix, R.: The troublesome probabilistic powerdomain. Electronic Notes in Theoretical Computer Science 13, 70–91 (1998). https://doi.org/10.1016/S1571-0661(05)80216-6
19. Leventis, T.: Probabilistic Böhm trees and probabilistic separation. In: Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2018. pp. 649–658. IEEE Computer Society (2018). https://doi.org/10.1145/3209108.3209126
20. Leventis, T.: A deterministic rewrite system for the probabilistic λ-calculus. Mathematical Structures in Computer Science 29(10), 1479–1512 (2019). https://doi.org/10.1017/S0960129519000045
21. Loader, R.: Notes on simply typed lambda calculus. Reports of the Laboratory for Foundations of Computer Science ECS-LFCS-98-381, University of Edinburgh, Edinburgh (1998), http://www.lfcs.inf.ed.ac.uk/reports/98/ECS-LFCS-98-381/
22. Manber, U., Tompa, M.: Probabilistic, nondeterministic, and alternating decision trees. In: 14th Annual ACM Symposium on Theory of Computing. pp. 234–244 (1982). https://doi.org/10.1145/800070.802197
23. Ramsey, N., Pfeffer, A.: Stochastic lambda calculus and monads of probability distributions. In: Conference Record of POPL 2002: The 29th SIGPLAN-SIGACT Symposium on Principles of Programming Languages. pp. 154–165. POPL '02 (2002). https://doi.org/10.1145/503272.503288
24. Saheb-Djahromi, N.: Probabilistic LCF. In: Mathematical Foundations of Computer Science 1978, Proceedings, 7th Symposium. Lecture Notes in Computer Science, vol. 64, pp. 442–451. Springer (1978). https://doi.org/10.1007/3-540-08921-7_92
25. Sangiorgi, D., Vignudelli, V.: Environmental bisimulations for probabilistic higher-order languages.
In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016. pp. 595–607 (2016). https://doi.org/10.1145/2837614.2837651
26. Takahashi, M.: Parallel reductions in lambda-calculus. Information and Computation 118(1), 120–127 (1995). https://doi.org/10.1006/inco.1995.1057
27. Zantema, H., van de Pol, J.: A rewriting approach to binary decision diagrams. The Journal of Logic and Algebraic Programming 49(1-2), 61–86 (2001). https://doi.org/10.1016/S1567-8326(01)00013-3

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

On the k-synchronizability of Systems

Cinzia Di Giusto, Laetitia Laversa, and Etienne Lozes
Université Côte d'Azur, CNRS, I3S, Sophia Antipolis, France
{cinzia.di-giusto,laetitia.laversa,etienne.lozes}@univ-cotedazur.fr

Abstract. We study k-synchronizability: a system is k-synchronizable if any of its executions, up to reordering causally independent actions, can be divided into a succession of k-bounded interaction phases.
We show two results (both for mailbox and peer-to-peer automata): first, the reachability problem is decidable for k-synchronizable systems; second, the membership problem (whether a given system is k-synchronizable) is decidable as well. Our proofs fix several important issues in previous attempts to prove these two results for mailbox automata.

Keywords: Verification · Communicating Automata · A/Synchronous communication.

1 Introduction

Asynchronous message-passing is ubiquitous in communication-centric systems; these include high-performance computing, distributed memory management, event-driven programming, or web services orchestration. One of the parameters that play an important role in these systems is whether the number of pending sent messages can be bounded in a predictable fashion, or whether the buffering capacity offered by the communication layer should be unlimited. Clearly, when considering implementation, testing, or verification, bounded asynchrony is preferred over unbounded asynchrony. Indeed, for bounded systems, reachability analysis and invariant inference can be solved by regular model-checking [5]. Unfortunately, and even if designing a new system in this setting is easier, this is not the case when considering that the buffering capacity is unbounded, or that the bound is not known a priori. Thus, a question that arises naturally is: how can we bound the “behaviour” of a system so that it operates as one with unbounded buffers? In a recent work [4], Bouajjani et al. introduced the notion of k-synchronizable systems of finite state machines communicating through mailboxes, and showed that the reachability problem is decidable for such systems. Intuitively, a system is k-synchronizable if any of its executions, up to reordering causally independent actions, can be chopped into a succession of k-bounded interaction phases. Each of these phases starts with at most k send actions that are followed by at most k receptions.
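The phase shape just described can be sketched as a greedy counting check. This is only the counting skeleton of k-synchronizability, written by us for illustration: it works on a flat sequence of `'S'`/`'R'` tags and ignores message matching and the reordering of causally independent actions that the actual definition allows.

```python
def is_k_phased(actions, k):
    """Check whether a sequence of 'S'/'R' tags splits into phases of
    at most k sends followed by at most k receives (greedy split).
    A sketch of the phase *shape* only, not of k-synchronizability."""
    i, n = 0, len(actions)
    while i < n:
        s = 0
        while i < n and actions[i] == 'S' and s < k:  # up to k sends
            i += 1
            s += 1
        if s == 0:
            return False  # a phase must start with a send
        r = 0
        while i < n and actions[i] == 'R' and r < k:  # up to k receives
            i += 1
            r += 1
    return True
```

For this simplified shape the greedy split is exact: a run of more than k consecutive receives can never be divided, since all of them would have to belong to the phase of the preceding send.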
Notice that a system may be k-synchronizable even if some of its executions require buffers of unbounded capacity. As explained in the present paper, this result, although valid, is surprisingly non-trivial, mostly due to complications introduced by the mailbox semantics of communications.

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 157–176, 2020. https://doi.org/10.1007/978-3-030-45231-5_9

Some of these complications were missed by Bouajjani et al., and the algorithm for the reachability problem in [4] suffers from false positives. Another problem is the membership problem for the subclass of k-synchronizable systems: for a given k and a given system of communicating finite state machines, is this system k-synchronizable? The main result in [4] is that this problem is decidable. However, again, the proof of this result contains an important flaw at the very first step that breaks all subsequent developments; as a consequence, the algorithm given in [4] produces both false positives and false negatives.

In this work, we present a new proof of the decidability of the reachability problem together with a new proof of the decidability of the membership problem. Quite surprisingly, the reachability problem is more demanding in terms of causality analysis, whereas the membership problem, although rather intricate, builds on a simpler dependency analysis. We also extend both decidability results to the case of peer-to-peer communication.

Outline. The next section recalls the definition of communicating systems and related notions. In Section 3 we introduce k-synchronizability and give a graphical characterisation of this property. This characterisation corrects Theorem 1 in [4] and highlights the flaw in the proof of the membership problem. Next, in Section 4, we establish the decidability of the reachability problem, which is the core of our contribution and departs considerably from [4].
In Section 5, we show the decidability of the membership problem. Section 6 extends the previous results to the peer-to-peer setting. Finally, Section 7 concludes the paper, discussing other related works. Proofs and some additional material are available at https://hal.archives-ouvertes.fr/hal-02272347.

2 Preliminaries

A communicating system is a set of finite state machines that exchange messages: automata have transitions labelled with either send or receive actions. The paper mainly considers mailboxes as communication architecture: messages await reception in FIFO buffers, one per automaton, that store all messages sent to that automaton, regardless of their senders. Section 6, instead, treats peer-to-peer systems; their introduction is therefore delayed to that point.

Let V be a finite set of messages and P a finite set of processes. A send action, denoted send(p, q, v), designates the sending of message v from process p to process q. Similarly, a receive action rec(p, q, v) expresses that process q is receiving message v from p. We write a to denote a send or receive action. Let S = {send(p, q, v) | p, q ∈ P, v ∈ V} be the set of send actions and R = {rec(p, q, v) | p, q ∈ P, v ∈ V} the set of receive actions. S_p and R_p stand for the sets of send and receive actions of process p, respectively. Each process is encoded by an automaton, and by abuse of notation we say that a system is the parallel composition of processes.

Definition 1 (System). A system is a tuple S = ((L_p, δ_p, l_p^0) | p ∈ P) where, for each process p, L_p is a finite set of local control states, δ_p ⊆ (L_p × (S_p ∪ R_p) × L_p) is the transition relation (also denoted l_p --a--> l'_p), and l_p^0 is the initial state.

Definition 2 (Configuration).
Let S = ((L_p, δ_p, l_p^0) | p ∈ P); a configuration is a pair (l, Buf) where l = (l_p)_{p ∈ P} ∈ Π_{p ∈ P} L_p is a global control state of S (a local control state for each automaton), and Buf = (b_p)_{p ∈ P} ∈ (V*)^P is a vector of buffers, each b_p being a word over V. We write l_0 to denote the vector of initial states of all processes p ∈ P, and Buf_0 stands for the vector of empty buffers. The semantics of a system is defined by the two rules below.

  [SEND]     l_p --send(p,q,v)--> l'_p        b'_q = b_q · v
             (l, Buf) --send(p,q,v)--> (l[l'_p/l_p], Buf[b'_q/b_q])

  [RECEIVE]  l_q --rec(p,q,v)--> l'_q         b_q = v · b'_q
             (l, Buf) --rec(p,q,v)--> (l[l'_q/l_q], Buf[b'_q/b_q])

A send action adds a message to the buffer b_q of the receiver, and a receive action pops the message from this buffer. An execution e = a_1 ··· a_n is a sequence of actions in S ∪ R such that (l_0, Buf_0) --a_1--> ··· --a_n--> (l, Buf) for some l and Buf. As usual, =e=> stands for --a_1--> ··· --a_n-->. We write asEx(S) to denote the set of asynchronous executions of a system S. In a sequence of actions e = a_1 ··· a_n, a send action a_i = send(p, q, v) is matched by a reception a_j = rec(p', q', v') (denoted a_i ⊢ a_j) if i < j, p = p', q = q', v = v', and there is ℓ ≥ 1 such that a_i and a_j are the ℓ-th actions of e with these properties, respectively. A send action a_i is unmatched if there is no matching reception in e. A message exchange of a sequence of actions e is a set either of the form v = {a_i, a_j} with a_i ⊢ a_j, or of the form v = {a_i} with a_i unmatched. For a message v_i, we denote by v̄_i the corresponding message exchange. When v is either an unmatched send(p, q, v) or a pair of matched actions {send(p, q, v), rec(p, q, v)}, we write proc_S(v) for p and proc_R(v) for q. Note that proc_R(v) is defined even if v is unmatched. Finally, we write procs(v) for {p} in the case of an unmatched send and {p, q} in the case of a matched send.

An execution imposes a total order on the actions.
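The [SEND]/[RECEIVE] rules can be replayed concretely. The sketch below is ours and deliberately minimal: it keeps one FIFO buffer per process and checks that each receive finds the expected message at the head of the receiver's mailbox, omitting local control states and transition relations entirely.

```python
from collections import deque

def replay(execution, procs):
    """Replay a sequence of actions under the mailbox semantics:
    send(p,q,v) appends (p,v) to q's single FIFO buffer; rec(p,q,v)
    must find (p,v) at the head of q's buffer. Control states omitted."""
    buf = {p: deque() for p in procs}
    for kind, p, q, v in execution:
        if kind == 'send':
            buf[q].append((p, v))          # [SEND]: b'_q = b_q . v
        else:                              # [RECEIVE]: b_q = v . b'_q
            head = buf[q].popleft()
            if head != (p, v):
                return None                # not a valid asynchronous execution
    return {p: list(b) for p, b in buf.items()}  # remaining (pending) messages
```

A `None` result means the receive order violated the mailbox FIFO discipline; otherwise the leftover buffer contents are the unmatched sends.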
We are interested in stressing the causal dependencies between messages. We thus make use of message sequence charts (MSCs) that only impose an order between matched pairs of actions and between the actions of a same process. Informally, an MSC is depicted with vertical timelines (one for each process) where time goes from top to bottom, carrying events (points) that represent the send and receive actions of this process (see Fig. 1). An arc is drawn between two matched events. We will also draw a dashed arc to depict an unmatched send event. An MSC is, thus, a partially ordered set of events, each corresponding to a send or receive action.

Fig. 1: (a) and (b): two MSCs that violate causal delivery. (c) and (d): an MSC and its conflict graph

160 C. Di Giusto et al.

Definition 3 (MSC). A message sequence chart is a tuple (Ev, λ, ≺), where
– Ev is a finite set of events,
– λ : Ev → S ∪ R tags each event with an action,
– ≺ = (≺_po ∪ ≺_src)^+ is the transitive closure of ≺_po and ≺_src, where:
  • ≺_po is a partial order on Ev such that, for all processes p, ≺_po induces a total order on the set of events of process p, i.e., on λ^{-1}(S_p ∪ R_p);
  • ≺_src is a binary relation that relates each receive event to its preceding send event:
    ∗ for all events r ∈ λ^{-1}(R), there is exactly one event s such that s ≺_src r;
    ∗ for all events s ∈ λ^{-1}(S), there is at most one event r such that s ≺_src r;
    ∗ for any two events s, r such that s ≺_src r, there are p, q, v such that λ(s) = send(p, q, v) and λ(r) = rec(p, q, v).

We identify MSCs up to graph isomorphism (i.e., we view an MSC as a labeled graph).
For a given well-formed (i.e., each reception is matched) sequence of actions e = a_1 ... a_n, we let msc(e) be the MSC where Ev = [1..n], ≺_po is the set of pairs of indices (i, j) such that i < j and {a_i, a_j} ⊆ S_p ∪ R_p for some p ∈ P (i.e., a_i and a_j are actions of a same process), and ≺_src is the set of pairs of indices (i, j) such that a_i ⊢ a_j. We say that e = a_1 ... a_n is a linearisation of msc(e), and we write asTr(S) to denote {msc(e) | e ∈ asEx(S)}, the set of MSCs of system S.

Mailbox communication imposes a number of constraints on what and when messages can be read. The precise definition is given below; we first discuss some of the possible scenarios. For instance, if two messages are sent to a same process, they will be received in the same order as they have been sent. Unmatched messages also impose some constraints: if a process p sends an unmatched message to r, it will not be able to send matched messages to r afterwards (Fig. 1a); similarly, if a process p sends an unmatched message to r, any process q that receives subsequent messages from p will not be able to send matched messages to r afterwards (Fig. 1b). When an MSC satisfies the constraints imposed by mailbox communication, we say that it satisfies causal delivery. Notice that, by construction, all executions satisfy causal delivery.

Definition 4 (Causal Delivery). Let (Ev, λ, ≺) be an MSC. We say that it satisfies causal delivery if the MSC has a linearisation e = a_1 ... a_n such that for any two events i ≺ j with a_i = send(p, q, v) and a_j = send(p′, q, v′), either a_j is unmatched, or there are i′, j′ such that a_i ⊢ a_{i′}, a_j ⊢ a_{j′}, and i′ < j′.

Our definition enforces the following intuitive property.

Proposition 1. An MSC msc satisfies causal delivery if and only if there is a system S and an execution e ∈ asEx(S) such that msc = msc(e).
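The construction of msc(e), together with the matching relation ⊢, can be sketched as follows (our code, with our own naming; the ℓ-th receive carrying a given (p, q, v) is paired with the ℓ-th such send, as in the definition of ⊢ above):

```python
from collections import defaultdict

def msc(actions):
    """actions: list of ('send'|'rec', p, q, v).
    Returns (po, src): the same-process pairs and the matching pairs,
    as sets of 0-based index pairs (i, j)."""
    pending = defaultdict(list)        # (p, q, v) -> unmatched send indices
    src = set()
    for j, (kind, p, q, v) in enumerate(actions):
        if kind == 'send':
            pending[(p, q, v)].append(j)
        else:                          # pair with the earliest pending send
            i = pending[(p, q, v)].pop(0)
            src.add((i, j))

    def proc(a):                       # the process performing the action
        kind, p, q, v = a
        return p if kind == 'send' else q

    po = {(i, j)
          for i in range(len(actions)) for j in range(i + 1, len(actions))
          if proc(actions[i]) == proc(actions[j])}
    return po, src
```

The MSC itself is then the transitive closure of po ∪ src, per Definition 3.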
We now recall from [4] the definition of the conflict graph, which depicts the causal dependencies between message exchanges. Intuitively, we have a dependency whenever two messages have a process in common. For instance, an SS dependency v −SS→ v′ between message exchanges v and v′ expresses the fact that v′ has been sent after v, by the same process.

Definition 5 (Conflict Graph). The conflict graph CG(e) of a sequence of actions e = a_1 ··· a_n is the labeled graph (V, {−XY→}_{X,Y ∈ {S,R}}) where V is the set of message exchanges of e, and for all X, Y ∈ {S, R}, for all v, v′ ∈ V, there is an XY dependency edge v −XY→ v′ between v and v′ if there are i < j such that {a_i} = v ∩ X, {a_j} = v′ ∩ Y, and proc_X(v) = proc_Y(v′).

Notice that each linearisation e of an MSC has the same conflict graph. We can thus talk about an MSC and its associated conflict graph. (As an example see Figs. 1c and 1d.) We write v → v′ if v −XY→ v′ for some X, Y ∈ {S, R}, and v →* v′ if there is a (possibly empty) path from v to v′.

3 k-synchronizable Systems

In this section, we define k-synchronizable systems. The main contribution of this part is a new characterisation of k-synchronizable executions that corrects the one given in [4]. In the rest of the paper, k denotes a given integer k ≥ 1. A k-exchange denotes a sequence of actions starting with at most k sends, followed by at most k receives matching some of the sends. An MSC is k-synchronous if there exists a linearisation that is breakable into a sequence of k-exchanges, such that a message sent during a k-exchange cannot be received during a subsequent one: either it is received during the same k-exchange, or it remains orphan forever.

Definition 6 (k-synchronous). An MSC msc is k-synchronous if:
1. there exists a linearisation of msc, e = e_1 · e_2 ··· e_n, where for all i ∈ [1..n], e_i ∈ S^{≤k} · R^{≤k},
2. msc satisfies causal delivery,
3. for all j, j′ such that a_j ⊢ a_{j′} holds in e, a_j ⊢ a_{j′} holds in some e_i.
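The conflict graph of Definition 5 can be computed by a small procedure; the sketch below is ours (helper names are not the paper's). Each message exchange is represented by the positions and processes of its send (S) and, if matched, receive (R) events.

```python
from collections import defaultdict

def conflict_graph(actions):
    """actions: list of ('send'|'rec', p, q, v).
    Returns (exchanges, edges) with edges a set of (u, X, Y, w) tuples,
    one per dependency u -XY-> w of Definition 5."""
    pending = defaultdict(list)
    exchanges = []                        # one entry per message exchange
    for idx, (kind, p, q, v) in enumerate(actions):
        if kind == 'send':
            pending[(p, q, v)].append(len(exchanges))
            exchanges.append({'S': (idx, p), 'R': None})
        else:
            k = pending[(p, q, v)].pop(0)
            exchanges[k]['R'] = (idx, q)  # match with the earliest send

    edges = set()
    for u, eu in enumerate(exchanges):
        for w, ew in enumerate(exchanges):
            for X in 'SR':
                for Y in 'SR':
                    if eu[X] and ew[Y]:   # both events exist
                        i, pi = eu[X]
                        j, pj = ew[Y]
                        if i < j and pi == pj:
                            edges.add((u, X, Y, w))
    return exchanges, edges
```

Since every linearisation of an MSC yields the same graph, this can be run on any linearisation.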
An execution e is k-synchronizable if msc(e) is k-synchronous. We write sTr_k(S) to denote the set {msc(e) | e ∈ asEx(S) and msc(e) is k-synchronous}.

Example 1 (k-synchronous MSCs and k-synchronizable Executions).

Fig. 2: (a) the MSC of Example 1.1. (b) the MSC of Example 1.2. (c) the MSC of Example 2 and (d) its conflict graph.

1. There is no k such that the MSC in Fig. 2a is k-synchronous. All messages must be grouped in the same k-exchange, but it is not possible to schedule all the sends first, because the reception of one of the messages happens before the sending of another. Still, this MSC satisfies causal delivery.
2. Let e_1 = send(r, q, v_3) · send(q, p, v_2) · send(p, q, v_1) · rec(q, p, v_2) · rec(r, q, v_3) be an execution. Its MSC, msc(e_1), depicted in Fig. 2b, satisfies causal delivery. Notice that e_1 cannot be divided into 1-exchanges. However, if we consider the alternative linearisation of msc(e_1), e_2 = send(p, q, v_1) · send(q, p, v_2) · rec(q, p, v_2) · send(r, q, v_3) · rec(r, q, v_3), we have that e_2 is breakable into 1-exchanges in which each matched send is in a 1-exchange with its reception. Therefore, msc(e_1) is 1-synchronous and e_1 is 1-synchronizable. Remark that e_2 is not an execution, and there exists no execution that can be divided into 1-exchanges. A k-synchronous MSC highlights dependencies between messages but does not impose an order for the execution.

Comparison with [4]. In [4], the authors define the set sEx_k(S) as the set of k-synchronous executions of system S in the k-synchronous semantics. Nonetheless, as remarked in Example 1.2, not all executions of a system can be divided into k-exchanges even if they are k-synchronizable. Thus, in order not to lose any executions, we have decided to reason only on MSCs (called traces in [4]).
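Conditions 1 and 3 of Definition 6 amount to a simple check on a candidate decomposition into k-exchanges. The sketch below is ours, and assumes the matching relation has already been computed (e.g. by pairing the ℓ-th send and the ℓ-th receive with identical (p, q, v), as in Section 2):

```python
def is_k_exchange_decomposition(actions, chunks, matches, k):
    """chunks: list of (start, end) half-open index intervals partitioning
    actions; matches: set of matched index pairs (i, j). Checks that every
    chunk is in S^{<=k} R^{<=k} and that no match crosses a chunk boundary."""
    covered = []
    for start, end in chunks:
        covered.extend(range(start, end))
        kinds = [actions[i][0] for i in range(start, end)]
        n_send = kinds.count('send')
        n_rec = len(kinds) - n_send
        # shape S^{<=k} R^{<=k}: all sends first, then all receives
        if kinds != ['send'] * n_send + ['rec'] * n_rec:
            return False
        if n_send > k or n_rec > k:
            return False
    if covered != list(range(len(actions))):
        return False                    # chunks must partition the sequence
    span = {i: c for c, (s, e) in enumerate(chunks) for i in range(s, e)}
    return all(span[i] == span[j] for i, j in matches)
```

On the linearisation e_2 of Example 1.2 this accepts the decomposition into three 1-exchanges, while e_1 admits no such decomposition.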
Following standard terminology, we say that a set U ⊆ V of vertices is a strongly connected component (SCC) of a given graph (V, →) if between any two vertices v, v′ ∈ U there exist two oriented paths v →* v′ and v′ →* v. The statement below fixes some issues with Theorem 1 in [4].

Theorem 1 (Graph Characterisation of k-synchronous MSCs). Let msc be a causal delivery MSC. msc is k-synchronous iff every SCC in its conflict graph is of size at most k and no RS edge occurs on any cyclic path.

Example 2 (A 5-synchronous MSC). Fig. 2c depicts a 5-synchronous MSC that is not 4-synchronous. Indeed, its conflict graph (Fig. 2d) contains an SCC of size 5 (all vertices are on the same SCC).

Comparison with [4]. Bouajjani et al. give a characterisation of k-synchronous executions similar to ours, but they use the word cycle instead of SCC, and the subsequent developments of the paper suggest that they intended to say Hamiltonian cycle (i.e., a cyclic path that does not go twice through the same vertex). It is not the case that an MSC is k-synchronous if and only if every Hamiltonian cycle in its conflict graph is of size at most k and no RS edge occurs on any cyclic path. Indeed, consider again Example 2. This graph is not Hamiltonian, and the largest Hamiltonian cycle is of size 4 only. But as we already discussed in Example 2, the corresponding MSC is not 4-synchronous. As a consequence, the algorithm presented in [4] for deciding whether a system is k-synchronizable is not correct either: the MSC of Fig. 2c would be considered 4-synchronous according to this algorithm, but it is not.

4 Decidability of Reachability for k-synchronizable Systems

We show that the reachability problem is decidable for k-synchronizable systems. While proving this result, we have to face several non-trivial aspects of causal delivery that were missed in [4] and that require a completely new approach.
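The test of Theorem 1 can be sketched directly on the conflict graph. The code below is ours (a naive quadratic SCC computation is enough for illustration); it uses the fact that an edge lies on a cyclic path exactly when its two endpoints belong to the same SCC. Edges are (u, X, Y, w) tuples as in Definition 5.

```python
def is_k_synchronous_cg(n_vertices, edges, k):
    succ = {u: set() for u in range(n_vertices)}
    pred = {u: set() for u in range(n_vertices)}
    for u, X, Y, w in edges:
        succ[u].add(w)
        pred[w].add(u)

    def reach(start, adj):                # vertices reachable from start
        seen, todo = {start}, [start]
        while todo:
            x = todo.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    todo.append(y)
        return seen

    scc_of = {}
    for v in range(n_vertices):           # SCC of v = forward ∩ backward
        if v not in scc_of:
            comp = reach(v, succ) & reach(v, pred)
            if len(comp) > k:             # an SCC larger than k
                return False
            for w in comp:
                scc_of[w] = v
    # no RS edge may lie on a cyclic path, i.e. inside an SCC
    return all(not (X == 'R' and Y == 'S' and scc_of[u] == scc_of[w])
               for u, X, Y, w in edges)
```

On the conflict graph of Fig. 2d (one SCC of size 5, no RS edge on a cycle) this returns True for k = 5 and False for k = 4, in line with Example 2.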
Definition 7 (k-synchronizable System). A system S is k-synchronizable if all its executions are k-synchronizable, i.e., sTr_k(S) = asTr(S).

In other words, a system S is k-synchronizable if for every execution e of S, msc(e) may be divided into k-exchanges.

Remark 1. In particular, a system may be k-synchronizable even if some of its executions fill the buffers with more than k messages. For instance, the only linearisation of the 1-synchronous MSC of Fig. 2b that is an execution of the system needs buffers of size 2.

For a k-synchronizable system, the reachability problem reduces to reachability through a k-synchronizable execution. To show that k-synchronous reachability is decidable, we establish that the set of k-synchronous MSCs is regular. More precisely, we want to define a finite state automaton that accepts a sequence e_1 · e_2 ··· e_n of k-exchanges if and only if they satisfy causal delivery.

We start by giving a graph-theoretic characterisation of causal delivery. For this, we define the extended edges v ⇝XY v′ of a given conflict graph. The relation ⇝XY is defined in Fig. 3 with X, Y ∈ {S, R}. Intuitively, v ⇝XY v′ expresses that event X of v must happen before event Y of v′, due to either their order on the same machine (Rule 1), or the fact that a send happens before its matching receive (Rule 2), or the mailbox semantics (Rules 3 and 4), or a chain of such dependencies (Rule 5). We observe that in the extended conflict graph, obtained by applying these rules, a cyclic dependency appears whenever causal delivery is not satisfied.

Fig. 3: Deduction rules for extended dependency edges of the conflict graph:
– (Rule 1) if v_1 −XY→ v_2 then v_1 ⇝XY v_2;
– (Rule 2) if v ∩ R ≠ ∅ then v ⇝SR v;
– (Rule 3) if v_1 −RR→ v_2 then v_1 ⇝SS v_2;
– (Rule 4) if v_1 ∩ R ≠ ∅, v_2 ∩ R = ∅ and proc_R(v_1) = proc_R(v_2), then v_1 ⇝SS v_2;
– (Rule 5) if v_1 ⇝XY v and v ⇝YZ v_2 then v_1 ⇝XZ v_2.

Example 3. Fig. 5a and 5b depict an MSC and its associated conflict graph with some extended edges.
This MSC violates causal delivery, and there is a cyclic dependency v_1 ⇝SS v_1.

Theorem 2 (Graph-theoretic Characterisation of Causal Delivery). An MSC satisfies causal delivery iff there is no cyclic causal dependency of the form v ⇝SS v for some vertex v of its extended conflict graph.

Let us now come back to our initial problem: we want to recognise with finite memory the sequences e_1, e_2, ..., e_n of k-exchanges that, composed, give an MSC that satisfies causal delivery. We proceed by reading each k-exchange one by one in sequence. This entails that, at each step, we have only a partial view of the global conflict graph. Still, we want to determine whether the acyclicity condition of Theorem 2 is satisfied in the global conflict graph. The crucial observation is that only the edges generated by Rule 4 may "go back in time". This means that we have to remember enough information from the previously examined k-exchanges to determine whether the current k-exchange contains a vertex v that shares an edge with some unmatched vertex v′ seen in a previous k-exchange, and whether this could participate in a cycle. This is achieved by computing two sets of processes C_S,p and C_R,p that collect the following information: a process q is in C_S,p if it performs a send action causally after an unmatched send to p, or it is the sender of the unmatched send; a process q belongs to C_R,p if it receives a message that was sent after some unmatched message directed to p. More precisely, we have:

C_S,p = {proc_S(v) | (v = v′ or v′ ⇝SS v) & v′ is unmatched & proc_R(v′) = p}
C_R,p = {proc_R(v) | v′ ⇝SS v & v′ is unmatched & proc_R(v′) = p & v ∩ R
≠ ∅}

These sets abstract and carry from one k-exchange to another the necessary information to detect violations of causal delivery. We compute them in any local conflict graph of a k-exchange incrementally, i.e., knowing what they were at the end of the previous k-exchange, we compute them at the end of the current one. More precisely, let e = s_1 ··· s_m · r_1 ··· r_{m′} be a k-exchange, CG(e) = (V, E) its conflict graph, and B : P → (2^P × 2^P) the function that associates to each p ∈ P the two sets B(p) = (C_S,p, C_R,p). Then, the conflict graph CG(e, B) is the graph (V′, E′) with V′ = V ∪ {ψ_p | p ∈ P} and E′ ⊇ E as defined below. For each process p ∈ P, the "summary node" ψ_p accounts for all past unmatched messages sent to p that occurred in some k-exchange before e. E′ is the set E of edges −XY→ among message exchanges of e, as in Definition 5, augmented with the following set of extra edges that takes summary nodes into account:

  {ψ_p −SX→ v | proc_X(v) ∈ C_S,p & v ∩ X ≠ ∅ for some X ∈ {S, R}}              (1)
∪ {ψ_p −SS→ v | proc_X(v) ∈ C_R,p & v ∩ R ≠ ∅ for some X ∈ {S, R}}              (2)
∪ {ψ_p −SS→ v | proc_R(v) ∈ C_R,p & v is unmatched}                              (3)
∪ {v −SS→ ψ_p | proc_R(v) = p & v ∩ R ≠ ∅} ∪ {ψ_q −SS→ ψ_p | p ∈ C_R,q}          (4)

These extra edges summarise/abstract the connections to and from previous k-exchanges. Equation (1) considers connections −SS→ and −SR→ that are due to two sent messages or, respectively, a send and a receive on the same process. Equations (2) and (3) consider connections −RR→ and −RS→ that are due to two received messages or, respectively, a receive and a subsequent send on the same process. Notice how the rules in Fig. 3 would then imply the existence of a connection ⇝SS; in particular, Equation (3) abstracts the existence of an edge built because of Rule 4. The equations in (4) abstract edges that would connect the current k-exchange to previous ones. As before, those edges in the global conflict graph would correspond to extended edges added because of Rule 4 in Fig. 3. Once we have this enriched local view of the conflict graph, we take its extended version: let ⇝XY denote the edges of the extended conflict graph as defined from the rules in Fig. 3, taking into account the new vertices ψ_p and their edges.

Finally, let S be a system and =e,k⇒_cd be the transition relation given in Fig. 4 among abstract configurations of the form (l, B): l is a global control state of S and B : P → 2^P × 2^P is the function defined above that associates to each process p a pair of sets of processes B(p) = (C_S,p, C_R,p). The transition =e,k⇒_cd updates these sets with respect to the current k-exchange e:

  e = s_1 ··· s_m · r_1 ··· r_{m′}    s_1 ··· s_m ∈ S*    r_1 ··· r_{m′} ∈ R*    0 ≤ m′ ≤ m ≤ k
  (l, Buf_0) =e⇒ (l′, Buf) for some Buf
  for all p ∈ P, B(p) = (C_S,p, C_R,p) and B′(p) = (C′_S,p, C′_R,p), where
    Unm_p = {ψ_p} ∪ {v | v is unmatched, proc_R(v) = p}
    C′_X,p = C_X,p ∪ {p′ | p′ ∈ C_X,q, v ⇝SS ψ_q, (proc_R(v) = p or v = ψ_p)}
             ∪ {proc_X(v) | v ∈ Unm_p ∩ V, X = S} ∪ {proc_X(v′) | v ⇝SS v′, v ∈ Unm_p, v′ ∩ X ≠ ∅}
  for all p ∈ P, p ∉ C′_R,p
  ─────────────────────────────
  (l, B) =e,k⇒_cd (l′, B′)

Fig. 4: Definition of the relation =e,k⇒_cd

Causal delivery is verified by checking that for all p ∈ P, p
∉ C′_R,p, meaning that there is no cyclic dependency as stated in Theorem 2. The initial state is (l_0, B_0), where B_0 : P → (2^P × 2^P) denotes the function such that B_0(p) = (∅, ∅) for all p ∈ P.

Fig. 5: (a) an MSC, (b) its associated global conflict graph, (c) the conflict graphs of its k-exchanges

Example 4 (An Invalid Execution). Let e = e_1 · e_2, with e_1 and e_2 the two 2-exchanges of this execution, such that e_1 = send(q, r, v_1) · send(q, s, v_2) · rec(q, s, v_2) and e_2 = send(p, s, v_3) · rec(p, s, v_3) · send(p, r, v_4) · rec(p, r, v_4). Fig. 5a and 5c show the MSC and the corresponding conflict graph of each of the 2-exchanges. Note that two edges of the global graph (in blue) "go across" k-exchanges. These edges do not belong to the local conflict graphs and are mimicked by the incoming and outgoing edges of summary nodes. The values of the sets C_S,r and C_R,r at the beginning and at the end of each k-exchange are given on the right: initially C_S,r = C_R,r = ∅; after e_1, C_S,r = {q} and C_R,r = {s}; after e_2, C_S,r = {p, q} and C_R,r = {s, r}. All other sets C_S,p and C_R,p for p
≠ r are empty, since there is only one unmatched message, directed to process r. Notice how, at the end of the second k-exchange, r ∈ C_R,r, signalling that message v_4 violates causal delivery.

Comparison with [4]. In [4] the authors define =e,k⇒_cd in a rather different way: they do not explicitly give a graph-theoretic characterisation of causal delivery; instead they compute, for every process p, the set B(p) of processes that either sent an unmatched message to p or received a message from a process in B(p). They then make sure that any message sent to p by a process q ∈ B(p) is unmatched. According to that definition, the MSC of Fig. 5b would satisfy causal delivery and would be 1-synchronous. However, this is not the case (this MSC does not satisfy causal delivery), as we have shown in Example 3. Due to the above errors, we had to propose a considerably different approach. The extended edges of the conflict graph, the graph-theoretic characterisation of causal delivery, as well as summary nodes, have no equivalent in [4].

The next lemma proves that Fig. 4 properly characterises causal delivery.

Lemma 1. An MSC msc is k-synchronous iff there is a linearisation e = e_1 ··· e_n such that (l_0, B_0) =e_1,k⇒_cd ··· =e_n,k⇒_cd (l, B) for some global state l and some B : P → (2^P × 2^P).

Note that there are only finitely many abstract configurations of the form (l, B) with l a tuple of control states and B : P → (2^P × 2^P). Moreover, since V is finite, the alphabet of possible k-exchanges for a given k is also finite. Therefore =e,k⇒_cd is a relation on a finite set, and the set sTr_k(S) of k-synchronous MSCs of a system S forms a regular language. It follows that it is decidable whether a given abstract configuration of the form (l, B) is reachable from the initial configuration following a k-synchronizable execution.

Theorem 3. Let S be a k-synchronizable system and l a global control state of S.
The problem whether there exists e ∈ asEx(S) and Buf such that (l_0, Buf_0) =e⇒ (l, Buf) is decidable.

Remark 2. Deadlock-freedom, unspecified receptions, and the absence of orphan messages are other properties that become decidable for a k-synchronizable system because of the regularity of the set of k-synchronous MSCs.

5 Decidability of k-synchronizability for Mailbox Systems

We establish the decidability of k-synchronizability; our approach is similar to the one of [4] based on the notion of borderline violation, but we adjust it to the new characterisation of k-synchronizable executions (Theorem 1).

Definition 8 (Borderline Violation). A non k-synchronizable execution e is a borderline violation if e = e′ · r, r is a reception, and e′ is k-synchronizable.

Note that a system S that is not k-synchronizable always admits at least one borderline violation e′ · r ∈ asEx(S) with r ∈ R: indeed, there is at least one execution e ∈ asEx(S) which contains a unique minimal prefix of the form e′ · r that is not k-synchronizable; moreover, since e′ is k-synchronizable, r cannot be a k-exchange of just one send action, therefore it must be a receive action.

In order to find such a borderline violation, Bouajjani et al. introduced an instrumented system S′ that behaves like S, except that it contains an extra process π, and such that a non-deterministically chosen message that should have been sent from a process p to a process q may now be sent from p to π, and later forwarded by π to q. In S′, each process p has the possibility, instead of sending a message v to q, to deviate this message to π; if it does so, p continues its execution as if it really had sent it to q. Note also that the message sent to π gets tagged with the original destination process q. Similarly, for each possible reception, a process has the possibility to receive a given message not from the initial sender but from π. The process π has an initial state from which it can receive any message from the system.
Each reception makes it go into a different state. From this state, it is able to send the message back to the original recipient. Once a message is forwarded, π reaches its final state and remains idle. The following example illustrates how the instrumented system works.

Example 5 (A Deviated Message). Let e_1, e_2 be two executions of a system S with MSCs msc(e_1) and msc(e_2), respectively. e_1 is not 1-synchronizable. It is borderline in S: if we delete the last reception, it becomes indeed 1-synchronizable. msc(e_2) is the MSC obtained from the instrumented system S′ where the message v_2 is first deviated to π and then sent back to q from π. Note that msc(e_2) is 1-synchronous. In this case, the instrumented system S′ in the 1-synchronous semantics "reveals" the existence of a borderline violation of S.

For each execution e · r ∈ asEx(S) that ends with a reception, there exists an execution deviate(e · r) ∈ asEx(S′) where the message exchange associated with the reception r has been deviated to π; formally, if e · r = e_1 · s · e_2 · r with r = rec(p, q, v) and s ⊢ r, then

deviate(e · r) = e_1 · send(p, π, (q, v)) · rec(p, π, (q, v)) · e_2 · send(π, q, v) · rec(π, q, v).

Definition 9 (Feasible Execution, Bad Execution). A k-synchronizable execution e′ of S′ is feasible if there is an execution e · r ∈ asEx(S) such that deviate(e · r) = e′. A feasible execution e′ = deviate(e · r) of S′ is bad if the execution e · r is not k-synchronizable in S.

Example 6 (A Non-feasible Execution). Let e′ be an execution such that msc(e′) is as depicted on the right. Clearly, this MSC satisfies causal delivery and could be the execution of some instrumented system S′. However, the sequence e · r such that deviate(e · r) = e′ does not satisfy causal delivery, therefore it cannot be an execution of the original system S. In other words, the execution e′ is not feasible.

Lemma 2.
A system S is not k-synchronizable iff there is a k-synchronizable execution e′ of S′ that is feasible and bad.

As we have already noted, the set of k-synchronous MSCs of S′ is regular. The decision procedure for k-synchronizability follows from the fact that the set of MSCs that have as linearisation a feasible bad execution is, as we will see, regular as well, and that it can be recognised by an (effectively computable) non-deterministic finite state automaton. The decidability of k-synchronizability then follows from Lemma 2 and the decidability of the emptiness problem for non-deterministic finite state automata.

Recognition of Feasible Executions. We start with the automaton that recognises feasible executions; for this, we revisit the construction we just used for recognising sequences of k-exchanges that satisfy causal delivery. In the remainder, we assume an execution e′ ∈ asEx(S′) that contains exactly one send of the form send(p, π, (q, v)) and one reception of the form rec(π, q, v), this reception being the last action of e′. Let (V, {−XY→}_{X,Y ∈ {S,R}}) be the conflict graph of e′. There are two uniquely determined vertices υ_start, υ_stop ∈ V such that proc_R(υ_start) = π and proc_S(υ_stop) = π, corresponding, respectively, to the first and last message exchanges of the deviation. The conflict graph of e · r is then obtained by merging these two nodes.

Lemma 3. The execution e′ is not feasible iff there is a vertex v in the conflict graph of e′ such that υ_start ⇝SS v −RR→ υ_stop.

In order to decide whether an execution e′ is feasible, we want to forbid that a send action send(p′, q, v′) that happens causally after υ_start is matched by a receive rec(p′, q, v′) that happens causally before the reception υ_stop. As a matter of fact, this boils down to dealing with the deviated send action as an unmatched send.
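The deviate transformation defined above can be sketched as follows. This code is ours: 'pi' stands for the fresh process π, and s_idx is assumed to point at the send matched by the final reception (s ⊢ r).

```python
def deviate(actions, s_idx):
    """actions = e·r with r = rec(p, q, v) last; s_idx is the index of the
    send matched by r. Returns the deviated sequence of the instrumented
    system: the send is redirected to 'pi', tagged with the original
    destination q, and 'pi' forwards the message at the very end."""
    _, p, q, v = actions[-1]
    e1, e2 = actions[:s_idx], actions[s_idx + 1:-1]
    return (e1
            + [('send', p, 'pi', (q, v)), ('rec', p, 'pi', (q, v))]
            + e2
            + [('send', 'pi', q, v), ('rec', 'pi', q, v)])
```

Note how p's deviated send is immediately received by π, while the forward to the original destination q is postponed to the last two actions, matching the definition of deviate(e · r).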
So we will consider sets of processes C^π_S and C^π_R similar to the ones used for =e,k⇒_cd, but with the goal of computing which actions happen causally after the send to π. We also introduce a summary node ψ_start and the extra edges following the same principles as in the previous section. Formally, let B : P → (2^P × 2^P), C^π_S, C^π_R ⊆ P and e ∈ S^{≤k} R^{≤k} be fixed, and let CG(e, B) = (V′, E′) be the conflict graph with summary nodes for unmatched sent messages as defined in the previous section. The local conflict graph CG(e, B, C^π_S, C^π_R) is defined as the graph (V″, E″) where V″ = V′ ∪ {ψ_start} and E″ is E′ augmented with

  {ψ_start −SX→ v | proc_X(v) ∈ C^π_S & v ∩ X ≠ ∅ for some X ∈ {S, R}}
∪ {ψ_start −SS→ v | proc_X(v) ∈ C^π_R & v ∩ R ≠ ∅ for some X ∈ {S, R}}
∪ {ψ_start −SS→ v | proc_R(v) ∈ C^π_R & v is unmatched} ∪ {ψ_start −SS→ ψ_p | p ∈ C^π_R}

As before, we consider the "closure" of these edges by the rules of Fig. 3. The transition relation =e,k⇒_feas is defined in Fig. 6. It relates abstract configurations of the form (l, B, C, dest_π) with C = (C^π_S, C^π_R) and dest_π ∈ P ∪ {⊥} storing to whom the message deviated to π was supposed to be delivered. Thus, the initial abstract configuration is (l_0, B_0, (∅, ∅), ⊥), where ⊥ means that the process dest_π has not been determined yet. It will be set as soon as the send to process π is encountered.

Lemma 4. Let e′ be an execution of S′. Then e′ is a k-synchronizable feasible execution iff there are e″ = e_1 ··· e_n · send(π, q, v) · rec(π, q, v) with e_1, ..., e_n ∈ S^{≤k} R^{≤k}, B′ : P → 2^P × 2^P, C′ ∈ (2^P)^2, and a tuple of control states l′ such that msc(e′) = msc(e″), π
∉ C′_R,q (with B′(q) = (C′_S,q, C′_R,q)), and

(l_0, B_0, (∅, ∅), ⊥) =e_1,k⇒_feas ··· =e_n,k⇒_feas (l′, B′, C′, q).

The rule of Fig. 6 has the following premises: (l, B) =e,k⇒_cd (l′, B′), with e = a_1 ··· a_n; (∀v) proc_S(v) ≠ π; (∀v, v′) proc_R(v) = proc_R(v′) = π =⇒ v = v′ ∧ dest_π = ⊥; and (∀v) v
= send(p, π, (q, v)) =⇒ dest′_π = q, with dest′_π = dest_π when e contains no such send. Moreover,

  C′^π_X = C^π_X ∪ {proc_X(v′) | v ⇝SS v′ & v′ ∩ X ≠ ∅ & (proc_R(v) = π or v = ψ_start)}
           ∪ {proc_S(v) | proc_R(v) = π & X = S}
           ∪ {p | p ∈ C_X,q & v ⇝SS ψ_q & (proc_R(v) = π or v = ψ_start)}
  dest′_π ∉ C′^π_R
  ─────────────────────────────
  (l, B, C^π_S, C^π_R, dest_π) =e,k⇒_feas (l′, B′, C′^π_S, C′^π_R, dest′_π)

Fig. 6: Definition of the relation =e,k⇒_feas

Comparison with [4]. In [4] the authors verify that an execution is feasible with a monitor which reviews the actions of the execution and adds processes that are no longer allowed to send a message to the receiver of π. Unfortunately, we have here a problem similar to the one mentioned in the previous comparison paragraph. According to their monitor, the following execution e′ = deviate(e · r) is feasible, i.e., e′ is runnable in S′ and e · r is runnable in S:

e′ = send(q, π, (r, v_1)) · rec(q, π, (r, v_1)) · send(q, s, v_2) · rec(q, s, v_2) · send(p, s, v_3) · rec(p, s, v_3) · send(p, r, v_4) · rec(p, r, v_4) · send(π, r, v_1) · rec(π, r, v_1)

However, this execution is not feasible, because there is a causal dependency between v_1 and v_3. In [4] this execution would then be considered as feasible and would therefore belong to the set sTr_k(S′). Yet there is no corresponding execution in asTr(S); the comparison, and therefore the k-synchronizability check, could be distorted and yield a false negative.

Recognition of Bad Executions. Finally, we define a non-deterministic finite state automaton that recognises MSCs of bad executions, i.e., feasible executions e′ = deviate(e · r) such that e · r is not k-synchronizable. We come back to the "non-extended" conflict graph, without edges of the form ⇝XY. Let Post*(v) = {v′ ∈ V | v →* v′} be the set of vertices reachable from v, and let Pre*(v) = {v′ ∈ V | v′ →* v} be the set of vertices co-reachable from v. For a set of vertices U ⊆ V, let Post*(U) = ∪{Post*(v) | v ∈ U}, and Pre*(U) = ∪{Pre*(v) | v ∈ U}.

Lemma 5.
The feasible execution e′ is bad iff one of the two holds:
1. υ_start →* −RS→ →* υ_stop, i.e., some path from υ_start to υ_stop goes through an RS edge, or
2. the size of the set Post*(υ_start) ∩ Pre*(υ_stop) is greater than or equal to k + 2.

In order to determine whether a given message exchange v of CG(e′) should be counted as reachable (resp. co-reachable), we compute, at the entry and exit of every k-exchange of e′, which processes are "reachable" or "co-reachable".

Example 7 (Reachable and Co-reachable Processes). Consider the MSC on the right, made of five 1-exchanges over processes p, q, r, s and π. While sending message (s, v_0), which corresponds to υ_start, process r becomes "reachable": any subsequent message exchange that involves r corresponds to a vertex of the conflict graph that is reachable from υ_start. While sending v_2, process s becomes "reachable", because process r will be reachable when it receives message v_2. Similarly, q becomes reachable after receiving v_3 because r was reachable when it sent v_3, and p becomes reachable after receiving v_4 because q was reachable when it sent v_4. Co-reachability works similarly, but reasoning backwards on the timelines. For instance, process s stops being "co-reachable" when it receives v_0, process r stops being co-reachable after it receives v_2, and process p stops being co-reachable by sending v_1. The only message that is sent by a process being both reachable and co-reachable at the instant of the sending is v_2; therefore it is the only message that will be counted as contributing to the SCC.

More formally, let e be a sequence of actions, CG(e) its conflict graph, and P, Q two sets of processes. Post_e(P) = Post*({v | procs(v) ∩ P
≠ ∅}) and Pre_e(Q) = Pre*({v | procs(v) ∩ Q
= ∅} are introduced to represent the local ∗ ∗ view through k-exchanges of Post (υ ) and Pre (υ ). For instance, for e start stop as in Example 7,weget Post ({π})= {(s, v ), v , v , v , v } and Pre ({π})= e 0 2 3 4 0 e {v , v , v , (s, v )}. In each k-exchange e the size of the intersection between 0 2 1 0 i Post (P ) and Pre (Q) will give the local contribution of the current k-exchange e e i i e,k to the calculation of the size of the global SCC. In the transition relation = ==⇒ bad this value is stored in variable cnt. The last ingredient to consider is to recognise if an edge RS belongs to the SCC. To this aim, we use a function lastisRec : P →{True, False} that for each process stores the information whether the last action in the previous k-exchange was a reception or not. Then depending on the value of this variable and if a node is in the current SCC or not the value of sawRS is set accordingly. e,k The transition relation = ==⇒ deﬁned in Fig. 7 deals with abstract conﬁ- bad gurations of the form (P, Q, cnt, sawRS, lastisRec ) where P, Q ⊆ P, sawRS is a boolean value, and cnt is a counter bounded by k +2. We denote by lastisRec the function where all lastisRec(p)= False for all p ∈ P. Lemma 6. Let e be a feasible k-synchronizable execution of S . Then e is a bad execution iﬀ there are e = e ··· e · send(π, q, v) · rec(π, q, v) with e ,...,e ∈ 1 n 1 n ≤k ≤k S R and msc(e )= msc(e ), P ,Q ⊆ P, sawRS ∈{True, False}, cnt ∈ {0,...,k +2}, such that e ,k e ,k 1 n ({π},Q, 0, False, lastisRec ) = ==⇒ ... = ==⇒ (P , {π}, cnt, sawRS, lastisRec) bad bad 172 C. Di Giusto et al. P = procs(Post (P )) Q = procs(Pre (Q )) e e SCC = Post (P ) ∩ Pre (Q ) e e e cnt = min(k +2, cnt + n) where n = |SCC | lastisRec (q) ⇔ (∃v ∈ SCC .proc (v)= q ∧ v ∩ R = ∅)∨ (lastisRec(q)∧ ∃v ∈ V.proc (v)= q) sawRS = sawRS∨ (∃v ∈ SCC )(∃p ∈ P \{π}) proc (v)= p ∧ lastisRec(p) ∧ p ∈ P ∩ Q e,k (P, Q, cnt, sawRS, lastisRec) = == ⇒ (P ,Q , cnt , sawRS , lastisRec ) bad e,k Fig. 
7: Deﬁnition of the relation = ==⇒ bad and at least one of the two holds: either sawRS = True,or cnt = k +2. Comparison with [4]. As for the notion of feasibility, to determine if an execution is bad, in [4] the authors use a monitor that builds a path between the send to process π and the send from π. In addition to the problems related to the wrong characterisation of k-synchronizability, this monitor not only can detect an RS edge when there should be none, but also it can miss them when they exist. In general, the problem arises because the path is constructed by considering only an endpoint at the time. We can ﬁnally conclude that: Theorem 4. The k-synchronizability of a system S is decidable for k ≥ 1. 6 k-synchronizability for Peer-to-Peer Systems In this section, we will apply k-synchronizability to peer-to-peer systems. A peer- to-peer system is a composition of communicating automata where each pair of machines exchange messages via two private FIFO buﬀers, one per direction of communication. Here we only give an insight on what changes with respect to the mailbox setting. Causal delivery reveals the order imposed by FIFO buﬀers. Deﬁnition 4 must then be adapted to account for peer-to-peer communication. For instance, two messages that are sent to a same process p by two diﬀerent processes can be received by p in any order, regardless of any causal dependency between the two sends. Thus, checking causal delivery in peer-to-peer systems is easier than in the mailbox setting, as we do not have to carry information on causal dependencies. Within a peer-to-peer architecture, MSCs and conﬂict graphs are deﬁned as within a mailbox communication. Indeed, they represents dependencies over machines, i.e., the order in which the actions can be done on a given machine, and over the send and the reception of a same message, and they do not depend on the type of communication. The notion of k-exchange remains also unchanged. 
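The buffer architecture just described can be sketched in a few lines. This is an illustrative model of ours, not a definition from the paper: each ordered pair of distinct processes owns a private FIFO queue.

```python
from collections import deque

class P2PNetwork:
    """One private FIFO buffer per ordered pair (sender, receiver)."""
    def __init__(self, processes):
        self.buffers = {(p, q): deque()
                        for p in processes for q in processes if p != q}

    def send(self, sender, receiver, msg):
        self.buffers[(sender, receiver)].append(msg)

    def receive(self, sender, receiver):
        # FIFO order holds per pair only: messages from different senders
        # to the same receiver may be consumed in any order.
        return self.buffers[(sender, receiver)].popleft()

net = P2PNetwork({"p", "q", "r"})
net.send("p", "r", "m1")
net.send("q", "r", "m2")
# r may consume q's message first, since each pair has its own queue:
assert net.receive("q", "r") == "m2"
assert net.receive("p", "r") == "m1"
```

The two assertions illustrate the point made above about causal delivery: FIFO order is guaranteed per pair only, so a receiver may consume messages from different senders in either order.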
Decidability of Reachability for k-synchronizable Peer-to-Peer Systems. To establish the decidability of reachability for k-synchronizable peer-to-peer systems, we define a transition relation =⇒_cd^{p2p,e,k} for a sequence of actions e describing a k-exchange. As for mailbox systems, if a send action is unmatched in the current k-exchange, it will stay orphan forever. Moreover, after a process p has sent an orphan message to a process q, p is forbidden to send any matched message to q. Nonetheless, as a consequence of the simpler definition of causal delivery, we no longer need to work on the conflict graph. Summary nodes and extended edges are not needed, and all the necessary information is in the function B that, for each process p, simply contains all the forbidden senders for p.

The characterisation of a k-synchronizable execution is the same as for mailbox systems, as the type of communication is not relevant. We can thus conclude, as for mailbox communication, that reachability is decidable.

Theorem 5. Let S be a k-synchronizable system and l a global control state of S. The problem whether there exist e ∈ asEx(S) and Buf such that (l_0, Buf_0) =⇒ (l, Buf) is decidable.

Decidability of k-synchronizability for Peer-to-Peer Systems. As in mailbox systems, the detection of a borderline execution determines whether a system is k-synchronizable. The transition relation =⇒_feas^{p2p,e,k} allows us to obtain feasible executions. Differently from the mailbox setting, we need to save not only the recipient dest but also the sender of the delayed message (information stored in the variable exp). The transition rule then checks that there is no message violating causal delivery, i.e., no message sent by exp to dest after the deviation. Finally, the recognition of bad executions works in the same way as for mailbox systems. The characterisation of a bad execution and the definition of =⇒_bad^{p2p,e,k} are, therefore, the same.
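The function B mentioned above can be illustrated as follows; the encoding (a map from each process to its set of forbidden senders, updated per k-exchange) and all names are our own sketch, not the paper's formal definitions.

```python
def check_exchange(actions, forbidden=None):
    """actions: list of (sender, receiver, matched) for one k-exchange.
    forbidden maps each process p to the set of processes that may no
    longer send matched messages to p (they already sent p an orphan).
    Returns the updated map, or None if causal delivery is violated."""
    forbidden = {p: set(s) for p, s in (forbidden or {}).items()}
    for sender, receiver, matched in actions:
        if matched and sender in forbidden.get(receiver, set()):
            return None  # matched send after an orphan send to the same peer
        if not matched:
            forbidden.setdefault(receiver, set()).add(sender)
    return forbidden

B = check_exchange([("p", "q", False)])     # p's message to q stays orphan
assert B == {"q": {"p"}}
assert check_exchange([("p", "q", True)], B) is None      # now forbidden
assert check_exchange([("r", "q", True)], B) is not None  # other senders fine
```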
As for mailbox systems, we can thus conclude that, for a given k, k-synchronizability is decidable.

Theorem 6. The k-synchronizability of a system S is decidable for k ≥ 1.

7 Concluding Remarks and Related Works

In this paper we have studied k-synchronizability for mailbox and peer-to-peer systems. We have corrected the reachability and decidability proofs given in [4]. The flaws in [4] concern fundamental points, and we had to propose a considerably different approach. The extended edges of the conflict graph, the graph-theoretic characterisation of causal delivery, as well as summary nodes, have no equivalent in [4]. The transition relations =⇒_feas^{e,k} and =⇒_bad^{e,k}, building on the graph-theoretic characterisations of causal delivery and k-synchronizability, depart considerably from the proposal in [4].

We conclude by commenting on some other related works. The idea of "communication layers" is present in the early works of Elrad and Francez [8] and of Chou and Gafni [7]. More recently, Chaouch-Saad et al. [6] verified some consensus algorithms using the Heard-Of Model, which proceeds by "communication-closed rounds". The concept that an asynchronous system may have an "equivalent" synchronous counterpart has also been widely studied. Lipton's reduction [14] reschedules an execution so as to move the receive actions as close as possible to their corresponding sends. Reduction has recently received increasing interest for verification purposes, e.g. by Kragl et al. [12] and von Gleissenthall et al. [11].

Existentially bounded communication systems have been studied by Genest et al. [10,15]: a system is existentially k-bounded if any execution can be rescheduled in order to become k-bounded. This approach targets a broader class of systems than k-synchronizability, because it does not require that the execution can be chopped into communication-closed rounds.
From the perspective of the current work, an interesting result is the decidability of existential k-boundedness for deadlock-free systems of communicating machines with peer-to-peer channels. Despite the more general definition, these older results are incomparable with the present ones, which deal with systems communicating with mailboxes and not peer-to-peer channels.

Basu and Bultan studied a notion they also called synchronizability, but it differs from the notion studied in the present work; synchronizability and k-synchronizability define incomparable classes of communicating systems. The proofs of the decidability of synchronizability [3,2] were shown to have flaws by Finkel and Lozes [9]. A question left open in their paper is whether synchronizability is decidable for mailbox communications, as originally claimed by Basu and Bultan. Akroun and Salaün also defined a property, called stability [1], that shares many similarities with the synchronizability notion of [2].

Context-bounded model checking is yet another approach to the automatic verification of concurrent systems. La Torre et al. studied systems of communicating machines extended with a calling stack, and showed that under some conditions on the interplay between stack actions and communications, context-bounded reachability is decidable [13]. A context switch occurs in an execution each time two consecutive actions are performed by different participants. Thus, while k-synchronizability limits the number of consecutive send actions, bounded context-switch analysis limits the number of times two consecutive actions are performed by two different processes.

As for future work, it would be interesting to explore how context-boundedness and communication-closed rounds could be composed. Moreover, refinements of the definition of k-synchronizability can also be considered.
For instance, we conjecture that the current development can be greatly simplified if we forbid linearisations that do not correspond to actual executions.

References

1. Akroun, L., Salaün, G.: Automated verification of automata communicating via FIFO and bag buffers. Formal Methods in System Design 52(3), 260–276 (2018). https://doi.org/10.1007/s10703-017-0285-8
2. Basu, S., Bultan, T.: On deciding synchronizability for asynchronously communicating systems. Theor. Comput. Sci. 656, 60–75 (2016). https://doi.org/10.1016/j.tcs.2016.09.023
3. Basu, S., Bultan, T., Ouederni, M.: Synchronizability for verification of asynchronously communicating systems. In: Kuncak, V., Rybalchenko, A. (eds.) Verification, Model Checking, and Abstract Interpretation - 13th International Conference, VMCAI 2012, Philadelphia, PA, USA, January 22-24, 2012, Proceedings. Lecture Notes in Computer Science, vol. 7148, pp. 56–71. Springer (2012). https://doi.org/10.1007/978-3-642-27940-9_5
4. Bouajjani, A., Enea, C., Ji, K., Qadeer, S.: On the completeness of verifying message passing programs under bounded asynchrony. In: Chockler, H., Weissenbacher, G. (eds.) Computer Aided Verification - 30th International Conference, CAV 2018, Held as Part of the Federated Logic Conference, FloC 2018, Oxford, UK, July 14-17, 2018, Proceedings, Part II. Lecture Notes in Computer Science, vol. 10982, pp. 372–391. Springer (2018). https://doi.org/10.1007/978-3-319-96142-2_23
5. Bouajjani, A., Habermehl, P., Vojnar, T.: Abstract regular model checking. In: Alur, R., Peled, D.A. (eds.) Computer Aided Verification, 16th International Conference, CAV 2004, Boston, MA, USA, July 13-17, 2004, Proceedings. Lecture Notes in Computer Science, vol. 3114, pp. 372–386. Springer (2004). https://doi.org/10.1007/978-3-540-27813-9_29
6. Chaouch-Saad, M., Charron-Bost, B., Merz, S.: A reduction theorem for the verification of round-based distributed algorithms.
In: Bournez, O., Potapov, I. (eds.) Reachability Problems, 3rd International Workshop, RP 2009, Palaiseau, France, September 23-25, 2009, Proceedings. Lecture Notes in Computer Science, vol. 5797, pp. 93–106. Springer (2009). https://doi.org/10.1007/978-3-642-04420-5_10
7. Chou, C., Gafni, E.: Understanding and verifying distributed algorithms using stratified decomposition. In: Dolev, D. (ed.) Proceedings of the Seventh Annual ACM Symposium on Principles of Distributed Computing, Toronto, Ontario, Canada, August 15-17, 1988, pp. 44–65. ACM (1988). https://doi.org/10.1145/62546.62556
8. Elrad, T., Francez, N.: Decomposition of distributed programs into communication-closed layers. Sci. Comput. Program. 2(3), 155–173 (1982). https://doi.org/10.1016/0167-6423(83)90013-8
9. Finkel, A., Lozes, E.: Synchronizability of communicating finite state machines is not decidable. In: Chatzigiannakis, I., Indyk, P., Kuhn, F., Muscholl, A. (eds.) 44th International Colloquium on Automata, Languages, and Programming, ICALP 2017, July 10-14, 2017, Warsaw, Poland. LIPIcs, vol. 80, pp. 122:1–122:14. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2017). https://doi.org/10.4230/LIPIcs.ICALP.2017.122, http://www.dagstuhl.de/dagpub/978-3-95977-041-5
10. Genest, B., Kuske, D., Muscholl, A.: On communicating automata with bounded channels. Fundam. Inform. 80(1-3), 147–167 (2007). http://content.iospress.com/articles/fundamenta-informaticae/fi80-1-3-09
11. von Gleissenthall, K., Kici, R.G., Bakst, A., Stefan, D., Jhala, R.: Pretend synchrony: synchronous verification of asynchronous distributed programs. PACMPL 3(POPL), 59:1–59:30 (2019). https://doi.org/10.1145/3290372
12. Kragl, B., Qadeer, S., Henzinger, T.A.: Synchronizing the asynchronous. In: Schewe, S., Zhang, L. (eds.) 29th International Conference on Concurrency Theory, CONCUR 2018, September 4-7, 2018, Beijing, China. LIPIcs, vol. 118, pp. 21:1–21:17.
Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2018). https://doi.org/10.4230/LIPIcs.CONCUR.2018.21
13. La Torre, S., Madhusudan, P., Parlato, G.: Context-bounded analysis of concurrent queue systems. In: Ramakrishnan, C.R., Rehof, J. (eds.) Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, TACAS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008, Proceedings. Lecture Notes in Computer Science, vol. 4963, pp. 299–314. Springer (2008). https://doi.org/10.1007/978-3-540-78800-3_21
14. Lipton, R.J.: Reduction: A method of proving properties of parallel programs. Commun. ACM 18(12), 717–721 (1975). https://doi.org/10.1145/361227.361234
15. Muscholl, A.: Analysis of communicating automata. In: Dediu, A., Fernau, H., Martín-Vide, C. (eds.) Language and Automata Theory and Applications, 4th International Conference, LATA 2010, Trier, Germany, May 24-28, 2010, Proceedings. Lecture Notes in Computer Science, vol. 6031, pp. 50–57. Springer (2010). https://doi.org/10.1007/978-3-642-13089-2_4

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
General Supervised Learning as Change Propagation with Delta Lenses

Zinovy Diskin
McMaster University, Hamilton, Canada
diskinz@mcmaster.ca

Abstract. Delta lenses are an established mathematical framework for modelling and designing bidirectional model transformations (Bx). Following recent observations by Fong et al., the paper extends the delta lens framework with a new ingredient: learning over a parameterized space of model transformations seen as functors. We will define a notion of an asymmetric learning delta lens with amendment (ala-lens), and show how ala-lenses can be organized into a symmetric monoidal (sm) category. We also show that sequential and parallel compositions of well-behaved (wb) ala-lenses are also wb, so that wb ala-lenses constitute a full sm-subcategory of ala-lenses.

1 Introduction

The goal of the paper is to develop a formal model of supervised learning in the very general context of bidirectional model transformation, or Bx, i.e., synchronization of two arbitrarily complex structures (called models) related by a transformation. Rather than learning parameterized functions between Euclidean spaces, as is typical for machine learning (ML), we will consider learning mappings between model spaces and formalize them as parameterized functors between categories, f: P × A → B, with P being a parameter space. The basic ML notion of a training pair (A, B') ∈ A_0 × B_0 will be considered as an inconsistency between models caused by a change (delta) v: B → B' of the target model B = f(p, A), p ∈ P, that was first consistent with A w.r.t. the transformation (functor) f(p, _). An inconsistency is repaired by an appropriate change of the source structure, u: A → A', a change of the parameter p to p', and an amendment of the target structure v@: B' → B@, so that f(p', A') = B@ is a consistent state of the parameterized two-model system.
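In programming terms, the repair step just described consumes a target delta and produces a parameter delta, a source delta, and an amendment. The following sketch is only a type-level illustration under drastic simplifications (objects are strings, deltas are pairs of objects, and all names such as ALaLens are ours, not the paper's):

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A delta is just a (source object, target object) pair in this toy setting,
# a drastic simplification of the categorical machinery in the paper.
Delta = Tuple[str, str]

@dataclass
class ALaLens:
    get: Callable[[str, str], str]                 # get(p, A) = B
    put: Callable[[str, str, Delta],               # given p, A and v: B -> B'
                  Tuple[Delta, Delta, Delta]]      # returns (e, u, v_amend)

def get(p, A):
    return f"view[{p}]({A})"

def put(p, A, v):
    _, B_new = v
    e = (p, p + "'")                     # parameter update e: p -> p'
    u = (A, A + "'")                     # source update u: A -> A'
    v_amend = (B_new, get(e[1], u[1]))   # amendment restoring consistency
    return e, u, v_amend

lens = ALaLens(get, put)
e, u, v_amend = lens.put("p", "A", ("view[p](A)", "B_new"))
# After propagation, the amended target is consistent: f(p', A') = B@
assert v_amend[1] == lens.get(e[1], u[1])
```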
The setting above without parameterization and learning (i.e., p = p' always holds), and without amendment (v@ = id always holds), is well known in the Bx literature under the name of delta lenses: mathematical structures in which consistency restoration via change propagation is modelled by functorial-like algebraic operations over categories [12,6]. (The term Bx refers to a wide area including file synchronization, data exchange in databases, and model synchronization in Model-Driven software Engineering (MDE); see [7] for a survey. In the present paper, Bx will mainly refer to Bx in the MDE context.)

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 177–197, 2020. https://doi.org/10.1007/978-3-030-45231-5_10

There are several types of delta lenses tailored for modelling different synchronization tasks and scenarios, particularly symmetric and asymmetric ones. In this paper, we only consider asymmetric delta lenses and will often omit explicitly mentioning these attributes. Despite their extra generality, (delta) lenses have proved useful in the design and implementation of practical model synchronization systems with triple graph grammars (TGG) [5,2]; enriching lenses with amendment is a recent extension of the framework motivated and formalized in [11]. A major advantage of the lens framework for synchronization is its compositionality: a lens satisfying several equational laws specifying basic synchronization requirements is called well-behaved (wb), and basic lens theorems state that sequential and parallel compositions of wb lenses are again wb. In practical applications, this allows the designer of a complex synchronizer to avoid integration testing: if elementary synchronizers are tested and proved to be wb, their composition is automatically wb too.

The present paper makes the following contributions to the delta lens framework for Bx.
a) We motivate model synchronization enriched with learning and, moreover, with categorical learning, in which the parameter space is a category, and introduce the notion of a wb asymmetric learning (delta) lens with amendment (a wb ala-lens); this is the content of Sect. 3. b) We prove compositionality of wb ala-lenses and show how their universe can be organized into a symmetric monoidal (sm) category (Theorems 1-3 in Sect. 4). All proofs (rather straightforward but notationally laborious) can be found in the long version of the paper [9]. One more compositional result is c) a definition of a compositional bidirectional transformation language (Def. 6) that formalizes an important requirement on model synchronization tools, which (surprisingly) is missing from the Bx literature. Background Sect. 2 provides a simple example demonstrating the main concepts of Bx and delta lenses in the MDE context. Section 5 briefly surveys related work, and Sect. 6 concludes.

Notation. Given a category A, its objects are denoted by capital letters A, A', etc. to recall that in MDE applications objects are complex structures, which themselves have elements a, a', etc.; the collection of all objects of category A is denoted by A_0. An arrow with domain A ∈ A_0 is written as u: A → _ or u ∈ A(A, _); we also write dom(u) = A (and sometimes u^dom = A to shorten formulas). Similarly, the formula u: _ → A' denotes an arrow with codomain u.cod = A'. Given a functor f: A → B, its object function is denoted by f_0: A_0 → B_0. A subcategory B ⊂ A is called wide if it has the same objects. All categories we consider in the paper are small.
2 Background: Update propagation and delta lenses

Although Bx ideas work well only in domains conforming to the slogan "any implementation satisfying the specification is good enough", such as code generation (see [10] for discussion), and have limited applications in databases (only so-called updatable views can be treated in the Bx way), we will employ a simple database example: it allows demonstrating the core ideas without any special domain knowledge required by typical Bx-amenable areas. The presentation will be semi-formal, as our goal is to motivate the delta lens formalism that abstracts the details away rather than formalize the example as such.

2.1 Why deltas. Bx lenses first appeared in the work on file synchronization, and if we have two sets of strings, say, B = {John, Mary} and B' = {Jon, Mary}, we can readily see the difference: John ≠ Jon but Mary = Mary. We thus have a structure in-between B and B' (which may be rather complex if B and B' are big files), but this structure can be recovered by string matching, and thus updates can be identified with pairs. The situation dramatically changes if B and B' are object structures, e.g., B = {o_1, o_2} with Name(o_1) = John, Name(o_2) = Mary, and similarly B' = {o'_1, o'_2} with Name(o'_1) = Jon, Name(o'_2) = Mary. Now string matching does not say too much: it may happen that o_1 and o'_1 are the same object (think of a typo in the dataset), while o_2 and o'_2 are different (although equally named) objects. Of course, for better matching we could use full names or ID numbers or something similar (called, in the database parlance, primary keys), but absolutely reliable keys are rare, and typos and bugs can compromise them anyway. Thus, for object structures that Bx needs to keep in sync, deltas between models need to be independently specified, e.g., by specifying a sameness relation u ⊂ B×B' between models.
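The contrast between recoverable and explicitly given deltas can be made concrete with a toy sketch (ours, not the paper's):

```python
# For flat sets of strings, the delta can be recovered by matching.
B, B_prime = {"John", "Mary"}, {"Jon", "Mary"}
assert B & B_prime == {"Mary"}   # unchanged strings
assert B - B_prime == {"John"}   # removed / renamed strings

# For object structures, equal names do not imply sameness: the delta
# is an explicitly supplied relation between OIDs.
names  = {"o1": "John", "o2": "Mary"}
names_ = {"o1'": "Jon", "o2'": "Mary"}
u = {("o1", "o1'")}   # o1 and o1' are the same object despite the typo;
                      # o2 and o2' are distinct although equally named.
assert ("o2", "o2'") not in u
```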
For example, u = {(o_1, o'_1)} says that John@B and Jon@B' are the same person, while Mary@B and Mary@B' are not. Hence, model spaces in Bx are categories (objects are models and arrows are update/delta specifications) rather than sets (codiscrete categories).

2.2 Consistency restoration via update propagation: An Example

Figure 1 presents a simple example of delta propagation for consistency restoration. Models consist of objects (in the sense of OO programming) with attributes (a.k.a. labelled records); e.g., the source model A consists of three objects identified by their oids (object identifiers) #A, #J, #M (think about employees of some company) with attribute values as shown in the table: attribute Expr. refers to Experience measured by a number of years, and Depart. is the column of department names. The schema of the table, i.e., the triple S_A of attributes (Name, Expr., Depart.) with their domains of values String, Integer, String resp., determines a model space A. A model X ∈ A is given by its set of objects OID^X together with three functions Name^X, Expr.^X, Depart.^X from the same domain OID^X to targets String, Integer, String resp., which are compactly specified by tables as shown for model A. The target model space B is given by a similar schema S_B consisting of two attributes. The B-view get(X) of an A-model X is computed by selecting those oids #O ∈ OID^X for which Depart.^X(#O) is an IT-department, i.e., an element of the set IT =def {ML, DB}. For example, the upper part of the figure shows the IT-view B of model A.

We assume that all column names in schemas S_A and S_B are qualified by schema names, e.g., OID@S_A, OID@S_B etc., so that schemas are disjoint except for elementary domains like String etc.
Also disjoint are OID-values; e.g., #J@A and #J@B are different elements, but constants like John and Mary are elements of the set String shared by both schemas. To shorten long expressions in the diagrams, we will often omit qualifiers and write #J = #J', meaning #J@A = #J@B or #J@B = #J@B' depending on the context given by the diagram; often we will also write #J and #J' for such OIDs. Also, when we write #J = #J' inside block arrows denoting updates, we actually mean a pair, e.g., (#J@B, #J@B').

Given two models over the same schema, say, B and B' over S_B, an update v: B → B' is a relation v ⊂ OID^B × OID^{B'}; if a schema contains several nodes, an update should provide a relation v_N for each node N in the schema.

Note an essential difference between the two parallel updates v_1, v_2: B → B' specified in the figure. Update v_1 says that John's name was changed to Jon (think of fixing a typo), and the experience data for Mary were also corrected (either because of a typo or, e.g., because the department started to use a new ML method for which Mary has a longer experience). Update v_2 specifies the same story for John but a new story for Mary: it says that Mary #M left the IT-view and Mary #M' is a new employee in one of the IT-departments.

The models shown in Fig. 1 are as follows (attribute order: Name, Expr., Depart. for source models; Name, Expr. for views):

  Source model A: #A (Ann, 10, Sales), #J (John, 10, DB), #M (Mary, 5, ML)
  IT-view B = get(A): #J (John, 10), #M (Mary, 5)
  Updated view B': #J' (Jon, 10), #M' (Mary, 7)
  Updated source A'_1 (update u_1): #A (Ann, 10, Sales), #J (Jon, 10, DB), #M (Mary, 7, ML)
  Updated source A_2^qt (policy qt): #A (Ann, 10, Sales), #J (Jon, 10, DB), #M' (Mary, 7, ?(in IT))
  Updated source A_2^tr (policy tr): #A (Ann, 10, Sales), #J (Jon, 10, DB), #M (Mary, ?, ?(not IT)), #M' (Mary, 7, ?(in IT))
  Updated source A_2^par (policy par): #A (Ann, 10, Sales), #J (Jon, 10, DB), #M (Mary, 5, ML), #M' (Mary, 7, ?(in IT))

Fig.
1: Example of update propagation

2.3 Update propagation and update policies

The updated view B' is inconsistent with the source A, and the latter is to be updated accordingly: we say that update v_1 is to be propagated back to A. Propagation of v_1 is easy: we just update the values of the attributes accordingly, as shown in the figure in the block arrow u_1: A → A'_1 (in black). Importantly, propagation needs two pieces of data: the view update v_1 and the original state A of the source, as shown in the figure by two data-flow lines into the chevron 1:put; the latter denotes an invocation of the backward propagation operation put (read "put the view update back to the source"). The quadruple 1 = (v_1, A, u_1, A'_1) can be seen as an instance of operation put, hence the notation 1:put (borrowed from the UML).

Propagation of update v_2 is more challenging: Mary can disappear from the IT-view because a) she quit the company, b) she transitioned to a non-IT department, or c) the view definition has changed, e.g., the new view must only show employees with experience of more than 5 years. Choosing between these possibilities is often called choosing an (update) policy. We will consider the case of changing the view in Sect. 3, and in the current section discuss policies a) and b) (ignore for a while the propagation scenario shown in blue in the lower right corner of the figure, which shows policy c)).

For policy a), further referred to as quitting and briefly denoted by qt, the result of update propagation is shown in the figure in green: notice the update (block) arrow u_2^qt and its result, model A_2^qt, produced by invoking operation put^qt. Note that while we know that the new employee Mary works in one of the IT-departments, we do not know in which one. This is specified with a special value '?' (a.k.a. labelled null in the database parlance).
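Policies a) and b) can be mimicked in code. The sketch below is our own simplified encoding of the example (sources are dicts from oids to attribute tuples, and the string '?' plays the role of the labelled null):

```python
IT = {"ML", "DB"}

def get_view(source):
    """IT-view: keep oid -> (name, expr) for employees of IT departments."""
    return {oid: (n, e) for oid, (n, e, d) in source.items() if d in IT}

def put_disappeared(source, oid, policy):
    """Propagate the disappearance of `oid` from the view back to the source."""
    name, expr, _ = source[oid]
    if policy == "qt":   # a) the employee quit the company
        return {k: v for k, v in source.items() if k != oid}
    if policy == "tr":   # b) transitioned to an unknown non-IT department
        return {**source, oid: (name, "?", "?(not IT)")}
    raise ValueError(policy)

A = {"#A": ("Ann", 10, "Sales"), "#J": ("John", 10, "DB"), "#M": ("Mary", 5, "ML")}
# Under either policy, Mary #M no longer appears in the view,
# so consistency with the updated view is restored:
assert "#M" not in get_view(put_disappeared(A, "#M", "qt"))
assert "#M" not in get_view(put_disappeared(A, "#M", "tr"))
```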
For policy b), further referred to as transition and denoted by tr, the result of update propagation is shown in the figure in orange: notice the update arrow u_2^tr and its result, model A_2^tr, produced by put^tr. Mary #M is the old employee who transitioned to a new non-IT department, for which her expertise is unknown. Mary #M' is a new employee in one of the IT-departments (we assume that the set of departments is not exhausted by those appearing in a particular state A ∈ A_0). There are also updates whose backward propagation is uniquely defined and does not need a policy; e.g., update v_1 is such.

An important property of the update propagations we have considered is that they restore consistency: the view of the updated source equals the updated view that initiated the propagation, get(A'_i) = B'; moreover, this equality extends to update arrows: get(u_i) = v_i, i = 1, 2. Such extensions can be derived from view definitions if the latter are determined by so-called monotonic queries (which encompass a wide class of practically useful queries including the Select-Project-Join class). For views defined by non-monotonic queries, in order to obtain get's action on source updates u: A → A', a suitable policy is to be added to the view definition (see [1,14,12] for details and discussion). Moreover, normally get preserves identity updates, get(id_A) = id_{get(A)}, and update composition: for any u: A → A' and u': A' → A'', the equality get(u; u') = get(u); get(u') holds.

2.4 Delta lenses

Our discussion of the example can be summarized in the following algebraic terms. We have two categories of models and updates, A and B, and a functor get: A → B incrementally computing B-views of A-models (we will often write A.get for get(A)). We also suppose that, for a chosen update policy, we have worked out precise procedures for how to propagate any view update backwards.
This gives us a family of operations put_A: A(A, _) ← B(A.get, _) indexed by A-objects A ∈ A_0, for which we write put_A.v or put_A(v) interchangeably.

Definition 1 (Delta Lenses ([12])). Let A, B be two categories. An (asymmetric delta) lens from A (the source of the lens) to B (the target) is a pair ℓ = (get, put), where get: A → B is a functor and put is a family of operations put_A: A(A, _) ← B(A.get, _) indexed by objects of A, A ∈ A_0. Given A, operation put_A maps any arrow v: A.get → B' to an arrow u: A → A' such that A'.get_0 = B'. The last condition is called the (co)discrete Putget law:

(Putget_0)  (put_A.v).cod.get_0 = v.cod  for all A ∈ A_0 and v ∈ B(A.get, _)

where get_0 denotes the object function of functor get. We will write a lens as an arrow ℓ: A → B going in the direction of get.

Note that the family put corresponds to a chosen update policy; e.g., in terms of the example above, for the same view functor get, we have two families of put-operations, put^qt and put^tr, corresponding to the two update policies we discussed. These two policies determine two lenses ℓ^qt = (get, put^qt) and ℓ^tr = (get, put^tr) sharing the same get.

Definition 2 (Well-behavedness). A (lens) equational law is an equation to hold for all values of two variables: A ∈ A_0 and v: A.get → T. A lens is called well-behaved (wb) if the following two laws hold:

(Stability)  id_A = put_A.id_{A.get}  for all A ∈ A_0
(Putget)  (put_A.v).get = v  for all A ∈ A_0 and all v ∈ B(A.get, _)

Remark 1. The Stability law says that a wb lens does nothing if nothing happens on the target side (no actions without triggers). Putget requires consistency after the backward propagation is finished. Note the distinction between the Putget_0 condition included in the very definition of a lens, and the full Putget law required for wb lenses.
The former is needed to ensure smooth tiling of put-squares (i.e., arrow squares describing the application of put to a view update and its result) both horizontally (for sequential composition) and vertically (not considered in the paper). The full Putget assures true consistency, as considering a state B' alone does not say much about the real update, and the elements of B' cannot be properly interpreted. The real story is specified by the delta v: B → B', and consistency restoration needs the full Putget law as above. A more detailed trailer of lenses can be found in the long version [9]. As shown in [6], the Putget_0 condition is needed if we want to define the operations put separately from the functor get: then we still need a function get_0: A_0 → B_0 and the codiscrete Putget law to ensure a reasonable behaviour of put.

3 Asymmetric Learning Lenses with Amendments

We will begin with a brief motivating discussion, and then proceed with formal definitions.

3.1 Does Bx need categorical learning?

Enriching delta lenses with learning capabilities has a clear practical sense for Bx. Having a lens (get, put): A → B and an inconsistency A.get ≠ B', the idea of learning extends the notion of the search space and allows us to update the transformation itself, so that the final consistency is achieved for a new transformation get': A.get' = B'. For example, in the case shown in Fig. 1, the disappearance of Mary #M in the updated view B' can be caused by a change of the view definition, which now requires showing only those employees whose experience is more than 5 years; hence Mary #M is to be removed from the view, while Mary #M' is a new IT-employee whose experience satisfies the new definition. Then the update v_2 can be propagated as shown in the bottom right corner of Fig. 1, where the index par indicates a new update policy allowing for view definition (parameter) change.
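The experience-based parameterization suggested above can be sketched directly (our illustration; make_get and the threshold encoding are assumptions, not the paper's definitions):

```python
def make_get(threshold):
    """get_p: show IT employees with experience strictly above `threshold`
    (threshold 0 recovers the unparameterized IT-view of Fig. 1)."""
    IT = {"ML", "DB"}
    def get(source):
        return {oid for oid, (name, expr, dept) in source.items()
                if dept in IT and expr > threshold}
    return get

A = {"#J": ("John", 10, "DB"), "#M": ("Mary", 5, "ML")}
assert make_get(0)(A) == {"#J", "#M"}
# Mary's disappearance from the view is explained by a parameter update
# p = 0 -> p' = 5 rather than by a change of the source model:
assert make_get(5)(A) == {"#J"}
```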
To manage the extended search possibilities, we parameterize the space of transformations as a family of mappings get_p : A → B indexed over some parameter space, p ∈ P. For example, we may define the IT-view to be parameterized by the experience of employees shown in the view (including any experience as a special parameter value). Then we have two interrelated propagation operations that map an update B → B' to a parameter update p → p' and a source update A → A'. Thus, the extended search space allows for new update policies that look for updating the parameter as an update propagation possibility. The possibility to update the transformation appears very natural in at least two important Bx scenarios: a) model transformation design and b) model transformation evolution (cf. [21]), which necessitates the enrichment of the delta lens framework with parameterization and learning. Note that all transformations get_p, p ∈ P, are to be elements of the same lens, and operations put are not indexed by p; hence, formalizing learning by considering a family of ordinary lenses would not do the job.

Categorical vs. codiscrete learning. Suppose that the parameter p is itself a set, e.g., the set of departments forming a view can vary depending on some context. Then an update from p to p' has a relational structure as discussed above, i.e., e: p → p' is a relation e ⊂ p×p' specifying which departments disappeared from the view and which are freshly added. This is a general phenomenon: as soon as parameters are structures (sets of objects, or graphs of objects and attributes), a parameter change becomes a structured delta, and the space of parameters gives rise to a category P. The search/propagation procedure returns an arrow e: p → p' in this category, which updates the parameter value from p to p'. Hence, a general model of supervised learning should assume P to be a category (and we say that learning is categorical). The case of the parameter
space being a set is captured by considering a codiscrete category P whose only arrows are pairs of its objects; we call such learning codiscrete.

3.2 Ala-lenses

The notion of a parameterized functor (p-functor) is fundamental for ala-lenses, but it is not a lens notion per se and is thus placed into Appendix Sect. A.1. We will work with its exponential (rather than the equivalent product-based) formulation, but will do uncurrying and currying back if necessary, often using the same symbol for an arrow f and its uncurried version.

Definition 3 (ala-lenses). Let A and B be categories. An ala-lens from A (the source of the lens) to B (the target) is a pair ℓ = (get, put) whose first component is a p-functor get: A ⇸ B and whose second component is a triple of (families of) operations put = (put^upd_{p,A}, put^req_{p,A}, put^self_{p,A}) indexed by pairs p ∈ P₀, A ∈ A₀; arities of the operations are specified below after we introduce some notation. Names req (for 'request') and upd (for 'update') are chosen to match the terminology in [17]. Categories A, B are called model spaces, their objects are models, and their arrows are (model) updates or deltas. Objects of P are called parameters and are denoted by small letters p, p', ... rather than capital ones to avoid confusion with [17], in which capital P is used for the entire parameter set. Arrows of P are called parameter deltas. For a parameter p ∈ P₀, we write get_p for the functor get(p): A → B (read "get B-views of A"), and if A ∈ A₀ is a source model, its get_p-view is denoted by get_p(A) or A.get_p or even A_p (so that _p becomes yet another notation for functor get_p). Given a parameter delta e: p → p' and a source model A ∈ A₀, the model delta get(e)_A : get_p(A) → get_{p'}(A) will be denoted by get_e(A) or e_A (rather than A_e, as we would like to keep capital letters for objects only).
In the uncurried version, get_e(A) is nothing but get(e, id_A). Since get(e) is a natural transformation, for any delta u: A → A' we have a commutative square e_A ; u_{p'} = u_p ; e_{A'} (whose diagonal is get(e, u)). We will denote the diagonal of this square by u.get_e or u_e : A_p → A'_{p'}. Thus, we use the notation

A_p ≝ A.get_p ≝ get_p(A)                                                        (1)
u_e ≝ u.get_e ≝ get(e)(u) ≝ e_A ; u_{p'} = u_p ; e_{A'} : A_p → A'_{p'}  (by naturality)

Now we describe operations put. They all have the same indexing set P₀ × A₀ and the same domain: for any index (p, A) and any model delta v: A_p → B' in B, the value put^x_{p,A}(v), x ∈ {req, upd, self}, is defined and unique:

put^upd_{p,A}(v) = e: p → p'     a parameter delta from p,
put^req_{p,A}(v) = u: A → A'     a model delta from A,                          (2)
put^self_{p,A}(v) = v^@: B' → A'_{p'}   a model delta from B',

the last called the amendment and denoted by v^@.

Note that the definition of put^self involves an equational dependency between all three operations: for all A ∈ A₀ and v ∈ B(A.get_p,_), we require

(Putget₀)  (put^req_{p,A}.v).cod.get_{p'} = (v; put^self_{p,A}.v).cod  where p' = (put^upd_{p,A}.v).cod

We will write an ala-lens as an arrow ℓ = (get, put): A ⇸ B. A lens is called (twice) codiscrete if categories A, B, P are codiscrete and thus get: A ⇸ B is a parameterized function. If only P is codiscrete, we call ℓ a codiscretely learning delta lens, while if only the model spaces are codiscrete, we call ℓ a categorically learning codiscrete lens.

The diagram in Fig. 2 shows how a lens' operations are interrelated. The upper part shows an arrow e: p → p' in category P and two corresponding functors from A to B. The lower part is to be seen as a 3D-prism with visible front and upper faces; the bottom and the two back faces are invisible, and the corresponding arrows are dashed.
The prism denotes an algebraic term: given elements are shown with black fill and white font, while derived elements are blue (recalling that they are mechanically computed) and blank (double-body arrows are considered "blank"). The two pairs of arrows originating from A and A' are not blank because they denote pairs of nodes (the UML says links) rather than mappings/deltas between nodes. Equational definitions of the deltas e = put^upd_{p,A}(v), u = put^req_{p,A}(v) and v^@ = put^self_{p,A}(v) are written up in the three callouts near them (Fig. 2: Ala-lens operations). The right back face of the prism is formed by the two vertical derived deltas u_p = u.get_p and u_{p'} = u.get_{p'}, and the two matching horizontal derived deltas e_A = get_e(A) and e_{A'} = get_e(A'); together they form a commutative square due to the naturality of get(e) as explained earlier.

Definition 4 (Well-behavedness). An ala-lens is called well-behaved (wb) if the following two laws hold for all p ∈ P₀, A ∈ A₀ and v: A_p → B':

(Stability)  if v = id_{A_p} then all three propagated updates e, u, v^@ are identities:
put^upd_{p,A}(id_{A_p}) = id_p,  put^req_{p,A}(id_{A_p}) = id_A,  put^self_{p,A}(id_{A_p}) = id_{A_p}

(Putget)  (put^req_{p,A}.v).get_{p'} = v; v^@  where e = put^upd_{p,A}(v) and v^@ = put^self_{p,A}(v)

Remark 2. Note that Remark 1 about the Putget law is again applicable.

Example 1 (Identity lenses). Any category A gives rise to an ala-lens id_A with the following components. The source and target spaces are equal to A, and the parameter space is 1. Functor get is the identity functor and all puts are identities. Obviously, this lens is wb.

Example 2 (Iso-lenses). Let ι: A → B be an isomorphism between model spaces. It gives rise to a wb ala-lens ℓ(ι): A → B with P = 1 = {∗} as follows. Given any A in A and v: ι(A) → B' in B, we define put^{ℓ(ι).req}_{∗,A}(v) = ι⁻¹(v), while the two other put operations map v to identities.

Example 3 (Bx lenses).
Examples of wb aa-lenses modelling a Bx can be found in [11]: they all can be considered as ala-lenses with a trivial parameter space 1.

Example 4 (Learners). Learners defined in [17] are codiscretely learning codiscrete lenses with amendment, and as such satisfy (the amended) Putget (Remark 1). Looking in the opposite direction, ala-lenses are a categorification of learners, as detailed in Fig. 8 on p. 194.

4 Compositionality of ala-lenses

This section explores the compositional structure of the universe of ala-lenses; especially interesting is their sequential composition. We will begin with a small example demonstrating sequential composition of ordinary lenses and showing that the notion of update policy transcends individual lenses. Then we define sequential and parallel composition of ala-lenses (the former is much more involved than for ordinary lenses) and show that wb ala-lenses can be organized into an sm-category. Finally, we formalize the notion of a compositional update policy via the notion of a compositional bidirectional language.

4.1 Compositionality of update policies: An example

Fig. 3 extends the example in Fig. 1 with a new model space C, whose schema consists of the only attribute Name, and a view of the IT-view, in which only employees of the ML department are to be shown. Thus, we now have two functors, get1: A → B and get2: B → C, and their composition Get: A → C (referred to as the long get). The top part of Fig. 3 shows how it works for the model A considered above. Each of the two policies, policy qt (green) and policy tr (orange), in which a person's disappearance from the view is interpreted, resp., as quitting the company and transitioning to a department not included into the view, is applicable to the new view mappings get2 and Get, thus giving us six lenses shown in Fig. 4 with solid arrows; amongst them, lenses L^qt and L^tr are obtained by applying policy pol to the (long) functor Get, and we will refer to them as long lenses.
In addition, we can compose lenses of the same colour as shown in Fig. 4 by dashed arrows (and we can also compose lenses of different colours, ℓ^qt_1 with ℓ^tr_2 and ℓ^tr_1 with ℓ^qt_2, but we do not need them). Now an important question is how the long and composed lenses are related: whether L^pol and ℓ^pol_1 ; ℓ^pol_2, for pol ∈ {qt, tr}, are equal (perhaps up to some equivalence) or different?

[Fig. 3 (Example cont'd: functoriality of update policies): source model A, its views B (IT departments) and C (ML dep.), and the updated models produced by policies qt and tr.]

Fig. 3 demonstrates how the mechanisms work with a simple example. We begin with an update w of the view C that says that Mary #M left the ML department, and a new Mary #M' was hired for ML. Policy qt interprets Mary's disappearance as quitting the company, and hence this Mary appears neither in view B'^qt produced by put2^qt nor in view A'^qt produced from B'^qt by put1^qt, and updates v^qt and u^qt are written accordingly. [Fig. 4: Lens combination schemas for Fig. 3.] Obviously, Mary also does not appear in view A'^qt produced by the long lens's Put^qt.
Thus, put1^qt_A(put2^qt_A(w)) = Put^qt_A(w), and it is easy to understand that such equality will hold for any source model A and any update w: C → C' due to the nature of our two views get1 and get2. Hence, L^qt = ℓ^qt_1 ; ℓ^qt_2, where L^qt = (Get, Put^qt) and ℓ^qt_i = (geti, puti^qt).

The situation with policy tr is more interesting. Model A'^tr_12 produced by the composed lens ℓ^tr_1 ; ℓ^tr_2, and model A'^tr produced by the long lens L^tr = (Get, Put^tr), are different as shown in the figure (notice the two different values for Mary's department framed with red ovals in the models). Indeed, the composed lens has more information about the old employee Mary: it knows that Mary was in the IT view, and hence can propagate the update more accurately. The comparison update δ^tr_{A,w}: A'^tr → A'^tr_12 adds this missing information so that the equality u^tr ; δ^tr_{A,w} = u^tr_12 holds. This is a general phenomenon: functor composition loses information and, in general, functor Get = get1; get2 knows less than the pair (get1, get2). Hence, operation Put back-propagating updates over Get (we will also say inverting Get) will, in general, result in less certain models than the composition put1 ◦ put2 that inverts the composition get1; get2 (a discussion and examples of this phenomenon in the context of vertical composition of updates can be found in [8]). Hence, comparison updates such as δ^tr_{A,w} should exist for any A and any w: A.Get → C', and together they should give rise to something like a natural transformation between lenses, δ^tr: L^tr ⇒ ℓ^tr_1 ; ℓ^tr_2. To make this notion precise, we need a notion of natural transformation between "functors" put, which we leave for future work. In the present paper, we will consider policies like qt, for which strict equality holds.
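In the codiscrete toy setting, composing two ordinary lenses amounts to running the downstream put first and feeding its result, as a request, to the upstream put. A minimal sketch (with illustrative names and a constant-complement encoding, not the paper's constructions):

```python
# Sequential composition of two toy codiscrete lenses: a lens is a pair of
# functions (get, put), where put(A, view_new) returns the updated source.

def compose(get1, put1, get2, put2):
    """The composed lens over the long Get = get1;get2."""
    def Get(A):
        return get2(get1(A))
    def Put(A, C_new):
        B = get1(A)                # intermediate view
        B_new = put2(B, C_new)     # downstream propagation (lens 2)
        return put1(A, B_new)      # upstream propagation (lens 1)
    return Get, Put

# Constant-complement toy lenses: a source is a pair (hidden, view);
# get projects out the view, put replaces it.
get_pair = lambda A: A[1]
put_pair = lambda A, v_new: (A[0], v_new)

Get, Put = compose(get_pair, put_pair, get_pair, put_pair)
A = ("hidden1", ("hidden2", "view"))
assert Get(A) == "view"
assert Get(Put(A, "view2")) == "view2"   # Putget holds for the composed lens
assert Put(A, Get(A)) == A               # Stability holds as well
```

Both component lenses here are wb, and the assertions illustrate the general fact that well-behavedness is preserved by sequential composition.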
4.2 Sequential composition of ala-lenses

Let k: A ⇸ B and ℓ: B ⇸ C be two ala-lenses with parameterized functors get^k: P → [A, B] and get^ℓ: Q → [B, C] resp. Their composition is the following ala-lens k;ℓ. Its parameter space is the product P × Q, and the get-family is defined as follows. For any pair of parameters (p, q) (we will write pq), get^{k;ℓ}_{pq} = get^k_p ; get^ℓ_q : A → C. Given a pair of parameter deltas, e: p → p' in P and h: q → q' in Q, their get^{k;ℓ}-image is the Godement product ∗ of natural transformations: get^{k;ℓ}(eh) = get^k(e) ∗ get^ℓ(h) (we will also write get^k || get^ℓ).

[Fig. 5: Sequential composition of ala-lenses.]

Now we define k;ℓ's propagation operations put. Let (A, pq, A_pq) with A ∈ A₀, pq ∈ (P×Q)₀, and A.get^k_p.get^ℓ_q = A_pq ∈ C₀ be a state of lens k;ℓ, and let w: A_pq → C' be a target update as shown in Fig. 5. For the first propagation step, we run lens ℓ as shown in Fig. 5 with the blue colour for derived elements: this is just an instantiation of the pattern of Fig. 2 with the source object being A_p = A.get^k_p and parameter q. The results are the deltas

h = put^{ℓ.upd}_{q,A_p}(w): q → q',  v = put^{ℓ.req}_{q,A_p}(w): A_p → B',  w^@ = put^{ℓ.self}_{q,A_p}(w): C' → B'_{q'}.   (3)

Next we run lens k at state (p, A) with the target update v produced by lens ℓ; it is yet another instantiation of the pattern in Fig. 2 (this time with the green colour for derived elements), which produces three deltas

e = put^{k.upd}_{p,A}(v): p → p',  u = put^{k.req}_{p,A}(v): A → A',  v^@ = put^{k.self}_{p,A}(v): B' → A'_{p'}.   (4)

These data specify the green prism adjoint to the blue prism: the edge v of the latter is the "first half" of the right back face diagonal of the former. In order to make an instance of the pattern in Fig.
2 for lens k;ℓ, we need to extend the blue-green diagram to a triangle prism by filling in the corresponding "empty space". These filling-in arrows are provided by functors get^k and get^ℓ and shown in orange (where we have chosen one of the two equivalent ways of forming the Godement product; note the two curved brown arrows). In this way we obtain yet another instantiation of the pattern in Fig. 2, denoted by k;ℓ:

put^{(k;ℓ).upd}_{A,pq}(w) = (e, h),  put^{(k;ℓ).req}_{A,pq}(w) = u,  put^{(k;ℓ).self}_{A,pq}(w) = w^@ ; (v^@.get^ℓ_{q'})   (5)

Thus, we have built an ala-lens k;ℓ, which satisfies equation Putget₀ by construction.

Theorem 1 (Sequential composition and lens laws). Given ala-lenses k: A ⇸ B and ℓ: B ⇸ C, let lens k;ℓ: A ⇸ C be their sequential composition as defined above. Then the lens k;ℓ is wb as soon as lenses k and ℓ are such. See [9, Appendix A.3] for a proof.

4.3 Parallel composition of ala-lenses

Let ℓ_i: A_i ⇸ B_i, i = 1, 2, be two ala-lenses with parameter spaces P_i. The lens ℓ_1||ℓ_2: A_1×A_2 ⇸ B_1×B_2 is defined as follows. Its parameter space is (ℓ_1||ℓ_2).P = P_1 × P_2. For any pair p_1||p_2 ∈ (P_1×P_2)₀, define get^{ℓ_1||ℓ_2}_{p_1||p_2} = get^{ℓ_1}_{p_1} × get^{ℓ_2}_{p_2} (we denote pairs of parameters by p_1||p_2 rather than p_1 ⊗ p_2 to shorten long formulas going beyond the page width). Further, for any pair of models A_1||A_2 ∈ (A_1×A_2)₀ and deltas v_1||v_2: (A_1||A_2).get^{ℓ_1||ℓ_2}_{p_1||p_2} → B'_1||B'_2, we define componentwise

put^{(ℓ_1||ℓ_2).upd}_{p_1||p_2, A_1||A_2}(v_1||v_2) = e_1||e_2 : p_1||p_2 → p'_1||p'_2

by setting e_i = put^{ℓ_i.upd}_{p_i,A_i}(v_i), i = 1, 2, and similarly for put^{(ℓ_1||ℓ_2).req} and put^{(ℓ_1||ℓ_2).self}. The following result is obvious.

Theorem 2 (Parallel composition and lens laws). Lens ℓ_1||ℓ_2 is wb as soon as lenses ℓ_1 and ℓ_2 are such.

4.4 Symmetric monoidal structure over ala-lenses

Our goal is to organize ala-lenses into an sm-category.
To make sequential composition of ala-lenses associative, we need to consider them up to some equivalence (indeed, Cartesian product is not strictly associative).

Definition 5 (Ala-lens Equivalence). Two parallel ala-lenses ℓ, ℓ': A ⇸ B are called equivalent if their parameter spaces are isomorphic via a functor ι: P → P' such that for any A ∈ A₀, e: p → p' ∈ P and v: (A.get_p) → T the following holds (for x ∈ {req, self}):

A.get_e = A.get'_{ι(e)},  ι(put^upd_{p,A}(v)) = put'^upd_{ι(p),A}(v),  and  put^x_{p,A}(v) = put'^x_{ι(p),A}(v)

Remark 3. It would be more categorical to require delta isomorphisms (i.e., commutative squares whose horizontal edges are isomorphisms) rather than equalities as above. However, model spaces appearing in Bx practice are skeletal categories (and even stronger than skeletal in the sense that all isos, including iso loops, are identities), for which isos become equalities, so that the generality would degenerate into equality anyway.

It is easy to see that the operations of sequential and parallel lens composition are compatible with lens equivalence and hence are well-defined for equivalence classes. Below we identify lenses with their equivalence classes by default.

Theorem 3 (Ala-lenses form an sm-category). The operations of sequential and parallel composition of ala-lenses defined above give rise to an sm-category aLaLens, whose objects are model spaces (= categories) and whose arrows are (equivalence classes of) ala-lenses. See [9, p.17 and Appendix A.4] for a proof.

4.5 Functoriality of learning in the delta lens setting

As the example in Sect. 4.1 shows, the notion of update policy transcends individual lenses. Hence, its proper formalization needs considering the entire category of ala-lenses and the functoriality of a suitable mapping.
Definition 6 (Bx-transformation language). A compositional bidirectional model transformation language L_bx is given by (i) an sm-category pGet(L_bx), whose objects are (L_bx-)model spaces and whose arrows are (L_bx-)transformations, supplied with a forgetful functor into pCat, and (ii) an sm-functor ℓ_{L_bx}: pGet(L_bx) → aLaLens such that the lower triangle in the inset diagram commutes. (Forgetful functors in this diagram are named "−X", with X referring to the structure to be forgotten.) An L_bx-language is well-behaved (wb) if functor ℓ_{L_bx} factorizes through aLaLens_wb as shown by the upper triangle of the diagram.

Example. A major compositionality result of Fong et al [17] states the existence of an sm-functor from the category Para of Euclidean spaces and parameterized differentiable functions (pd-functions) into the category Learn of learning algorithms (learners), as shown by the inset commutative diagram. (The functor L_{ε,err} is itself parameterized by a step size 0 < ε ∈ R and an error function err: R×R → R needed to specify the gradient descent procedure.) However, learners are nothing but codiscrete ala-lenses (see Sect. A.2), and thus the inset diagram is a codiscrete specialization of the diagram in Def. 6 above. That is, the category of Euclidean spaces and pd-functions, and the gradient descent method for back propagation, give rise to a (codiscrete) compositional bx-transformation language (over pSet rather than pCat). Finding a specifically Bx instance of Def. 6 (e.g., checking whether it holds for concrete languages and tools such as eMoflon [23] or groundTram [22]) is laborious and left for future work.

5 Related work

Figure 6 on the right is a simplified version of Fig.
8 on p. 194, convenient for our discussion here: immediate related work should be found in the areas located at points (0,1) (codiscrete learning lenses) and (1,0) (delta lenses) of the plane. [Fig. 6: the plane of lens species, with axes Model Spaces (codiscrete vs. delta lenses) and Parameter Space (codiscrete vs. categorical learning).] For the point (0,1), the paper [17] by Fong, Spivak and Tuyéras is fundamental: they defined the notion of a codiscrete learning lens (called a learner), proved a fundamental result about sm-functoriality of the gradient descent approach to ML, and thus laid a foundation for the compositional approach to change propagation with learning. One follow-up of that work is paper [16] by Fong and Johnson, in which they build an sm-functor Learn → sLens which maps learners to so-called symmetric lenses. That paper is probably the first one where the terms 'lens' and 'learner' appear together, but the initial observation that a learner whose parameter set is a singleton is actually a lens is due to Jules Hedges, see [16]. There are conceptual and technical distinctions between [16] and the present paper. On the conceptual level, by encoding learners as symmetric lenses, they "hide" learning inside the lens framework and make it a technical rather than conceptual idea. In contrast, we consider parameterization and supervised learning as a fundamental idea and a first-class citizen for the lens framework, which grants the creation of a new species of lenses. Moreover, while an ordinary lens is a way to invert a functor, a learning lens is a way to invert a parameterized functor, so that learning lenses appear as an extension of the parameterization idea from functors to lenses. (This approach can probably be specified formally by treating parameterization as a suitably defined functorial construction.) Besides
technical advantages (working with asymmetric lenses is simpler), our asymmetric model seems more adequate to the problem of learning functions rather than relations. On the technical level, the lens framework we develop in this paper is much more general than that of [16]: we categorified both the parameter space and the model spaces, and we work with lenses with amendment (which allows us to relax the Putget law if needed). As for the delta lens roots (the point (1,0) in the figure), delta lenses were motivated and formally defined in [12] (the asymmetric case) and [13] (the symmetric one). Categorical foundations for the delta lens theory were developed by Johnson and Rosebrugh in a series of papers (see [20] for references); this line is continued in Clarke's work [6]. The notion of a delta lens with amendments (in both asymmetric and symmetric variants) was defined in [11], and several composition results were proved. Another extensive body of work within the delta-based area is modelling and implementing model transformations with triple graph grammars (TGG) [4,23]. TGGs provide an implementation framework for delta lenses, as is shown and discussed in [5,19,2], and thus inevitably consider change propagation on a much more concrete level than lenses. The author is not aware of any work considering functoriality of update policies developed within the TGG framework. The present paper is probably the first one at the intersection (1,1) of the plane. The preliminary results have recently been reported at ACT'19 in Oxford to a representative lens community, and no references besides [17] and [16] mentioned above were provided.

6 Conclusion

The perspective on Bx presented in the paper is an example of a fruitful interaction between two domains: ML and Bx. In order to be ported to Bx, the compositional approach to ML developed in [17] is to be categorified as shown in Fig. 8 on p. 194.
This opens a whole new program for Bx: checking that currently existing Bx languages and tools are compositional (and well-behaved) in the sense of Def. 6, p. 190. The wb compositionality is an important practical requirement as it allows for modular design and testing of bidirectional transformations. Surprisingly, this important requirement has been missing from the agenda of the Bx community; e.g., the recent endeavour of developing an effective benchmark for Bx-tools [3] does not discuss it. In a wider context, the main message of the paper is that the learning idea transcends its applications in ML: it is applicable and usable in many domains in which lenses are applicable, such as model transformations, data migration, and open games [18]. Moreover, categorified learning may perhaps find useful applications in ML itself. In the current ML setting, the object to be learnt is a function f: R^m → R^n that, in the OO class modelling perspective, is a very simple structure: it can be seen as one object with a (huge) amount of attributes, or, perhaps, a predefined set of objects, which is not allowed to be changed during the search (only attribute values may be changed). In the delta lens view, such changes constitute a rather narrow class of updates and thus unjustifiably narrow the search space. Learning with the possibility to change dimensions m, n may be an appropriate option in several contexts. On the other hand, while categorification of model spaces extends the search space, categorification of the parameter space would narrow the search space, as we are allowed to replace a parameter p by a parameter p' only if there is a suitable arrow e: p → p' in category P. This narrowing may, perhaps, improve performance. All in all, the interaction between ML and Bx could be bidirectional!

A Appendices

A.1 Category of parameterized functors pCat

Category pCat has all small categories as objects.
pCat-arrows A → B are parameterized functors (p-functors), i.e., functors f: P → [A, B] with P a small category of parameters and [A, B] the category of functors from A to B and their natural transformations. For an object p and an arrow e: p → p' in P, we write f_p for the functor f(p): A → B and f_e for the natural transformation f(e): f_p ⇒ f_{p'}. We will write p-functors as labelled arrows f: A ⇸ B. As Cat is Cartesian closed, we have a natural isomorphism between Cat(P, [A, B]) and Cat(P×A, B) and can reformulate the above definition in an equivalent way with functors P×A → B. We prefer the former formulation as it corresponds to the notation f: A ⇸ B visualizing P as a hidden state of the transformation, which seems adequate to the intuition of parameterized in our context. (If some technicalities are easier to see with the product formulation, we will switch to the product view, thus doing currying and uncurrying without special mentioning.) Sequential composition of f: A ⇸ B (with parameter space P) and g: B ⇸ C (with parameter space Q) is f.g: A ⇸ C (with parameter space P×Q) given by (f.g)_{pq} = f_p.g_q for objects, i.e., pairs p ∈ P, q ∈ Q, and by the Godement product of natural transformations for arrows in P×Q. That is, given a pair e: p → p' in P and h: q → q' in Q, we define the transformation (f.g)_{eh}: f_p.g_q ⇒ f_{p'}.g_{q'} to be the Godement product f_e ∗ g_h.

Any category A gives rise to a p-functor Id_A: A ⇸ A, whose parameter space is the singleton category 1 with the only object ∗, Id_A(∗) = id_A, and Id_A(id_∗): id_A ⇒ id_A is the identity transformation. It is easy to see that p-functors Id_A are units of the sequential composition. To ensure associativity we need to consider p-functors up to an equivalence of their parameter spaces.
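The codiscrete shadow of this composition (in pSet rather than pCat) can be sketched as follows; the encoding and names are illustrative assumptions, with a parameterized function represented simply as a function of a parameter and an argument.

```python
# Codiscrete sketch of p-functor composition: a parameterized function
# A ~> B with parameter space P is encoded as f(p, a); composition pairs
# the parameter spaces, (f.g)_{(p,q)} = f_p ; g_q.

def p_compose(f, g):
    """Sequential composition of parameterized functions: apply f with the
    first parameter component, then g with the second."""
    return lambda pq, a: g(pq[1], f(pq[0], a))

f = lambda p, a: a * p          # f_p: scaling by p
g = lambda q, b: b + q          # g_q: shifting by q
h = p_compose(f, g)
assert h((3, 1), 2) == 7        # (2 * 3) + 1
```

The unit is the parameterized function with singleton parameter space that ignores its parameter and returns its argument; associativity holds up to the evident regrouping of parameter tuples, mirroring the equivalence classes used above.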
Two parallel p-functors f: A ⇸ B and f': A ⇸ B (with parameter spaces P and P') are equivalent if there is an isomorphism α: P → P' such that the two parallel functors f: P → [A, B] and α; f': P → [A, B] are naturally isomorphic; then we write f ≈_α f'. It is easy to see that if f ≈_α f': A ⇸ B and g ≈_β g': B ⇸ C, then f;g ≈_{α×β} f';g': A ⇸ C, i.e., sequential composition is stable under equivalence. Below we will identify p-functors and their equivalence classes. Using a natural isomorphism (P×Q)×R ≅ P×(Q×R), strict associativity of functor composition and strict associativity of the Godement product, we conclude that sequential composition of (equivalence classes of) p-functors is strictly associative. Hence, pCat is a category.

Our next goal is to supply it with a monoidal structure. We borrow the latter from the sm-category (Cat, ×), whose tensor is given by the product. There is an identity-on-objects embedding (Cat, ×) → pCat that maps a functor f: A → B to a p-functor f: A ⇸ B whose parameter space is the singleton category 1. Moreover, as this embedding is a functor, the coherence equations for the associators and unitors that hold in (Cat, ×) hold in pCat as well (this proof idea is borrowed from [17]). In this way, pCat becomes an sm-category. In a similar way, we define the sm-category pSet of small sets and parametrized functions between them, the codiscrete version of pCat. The diagram in Fig. 7 shows how the categories (Cat, ×), (Set, ×), pCat and pSet are related.

A.2 Ala-lenses as categorification of ML-learners

Figure 8 shows a discrete two-dimensional plane with each axis having three points: a space is a singleton, a set, or a category, encoded by coordinates 0, 1, 2 resp. Each of the points x_ij is then the location of a corresponding sm-category of
(asymmetric) learning (delta) lenses. [Fig. 8: The universe of categories of learning delta lenses: a 3×3 grid with horizontal axis Model spaces (0: A, B = 1; 1: A, B ∈ Set; 2: A, B ∈ Cat) and vertical axis Parameter space (0: P = 1; 1: P ∈ Set; 2: P ∈ Cat).] Category {1} is a terminal category whose only arrow is the identity lens 1 = (id_1, id_1): 1 → 1 propagating from the terminal category 1 to itself. A label decorated with ∗ refers to the codiscrete specialization of the construct being labelled: L^∗ means codiscrete learning (i.e., the parameter space P is a set considered as a codiscrete category) and aLens^∗ refers to codiscrete model spaces. The category of learners defined in [17] is located at point (1,1), and the category of learning delta lenses with amendments defined in the present paper is located at (2,2). There are also two semi-categorified species of learning lenses: categorical learners at point (1,2) and codiscretely learning delta lenses at (2,1), which are special cases of ala-lenses.

References

1. Abiteboul, S., McHugh, J., Rys, M., Vassalos, V., Wiener, J.: Incremental Maintenance for Materialized Views over Semistructured Data. In: Gupta, A., Shmueli, O., Widom, J. (eds.) VLDB. Morgan Kaufmann (1998)
2. Anjorin, A.: An introduction to triple graph grammars as an implementation of the delta-lens framework. In: Gibbons, J., Stevens, P. (eds.) Bidirectional Transformations - International Summer School, Oxford, UK, July 25-29, 2016, Tutorial Lectures. Lecture Notes in Computer Science, vol. 9715, pp. 29–72. Springer (2016). https://doi.org/10.1007/978-3-319-79108-1
3. Anjorin, A., Diskin, Z., Jouault, F., Ko, H., Leblebici, E., Westfechtel, B.: Benchmarx reloaded: A practical benchmark framework for bidirectional transformations. In: Eramo and Johnson [15], pp.
15–30, http://ceur-ws.org/Vol-1827/paper6.pdf
4. Anjorin, A., Leblebici, E., Schürr, A.: 20 years of triple graph grammars: A roadmap for future research. ECEASST 73 (2015). https://doi.org/10.14279/tuj.eceasst.73.1031
5. Anjorin, A., Rose, S., Deckwerth, F., Schürr, A.: Efficient model synchronization with view triple graph grammars. In: Modelling Foundations and Applications - 10th European Conference, ECMFA 2014, York, UK, July 21-25, 2014. Proceedings. Lecture Notes in Computer Science, vol. 8569, pp. 1–17. Springer (2014). https://doi.org/10.1007/978-3-319-09195-2_1
6. Clarke, B.: Internal lenses as functors and cofunctors. In: Pre-proceedings of ACT'19, Oxford, 2019. http://www.cs.ox.ac.uk/ACT2019/preproceedings/BryceClarke.pdf
7. Czarnecki, K., Foster, J.N., Hu, Z., Lämmel, R., Schürr, A., Terwilliger, J.F.: Bidirectional transformations: A cross-discipline perspective. In: Theory and Practice of Model Transformations, pp. 260–283. Springer (2009)
8. Diskin, Z.: Compositionality of update propagation: Lax putput. In: Eramo and Johnson [15], pp. 74–89, http://ceur-ws.org/Vol-1827/paper12.pdf
9. Diskin, Z.: General supervised learning as change propagation with delta lenses. CoRR abs/1911.12904 (2019), http://arxiv.org/abs/1911.12904
10. Diskin, Z., Gholizadeh, H., Wider, A., Czarnecki, K.: A three-dimensional taxonomy for bidirectional model synchronization. Journal of Systems and Software 111, 298–322 (2016). https://doi.org/10.1016/j.jss.2015.06.003
11. Diskin, Z., König, H., Lawford, M.: Multiple model synchronization with multiary delta lenses with amendment and K-Putput. Formal Asp. Comput. 31(5), 611–640 (2019). https://doi.org/10.1007/s00165-019-00493-0 (Sect. 7.1 is unreadable in the published version; a readable version can be found at http://arxiv.org/abs/1911.11302)
12. Diskin, Z., Xiong, Y., Czarnecki, K.: From State- to Delta-Based Bidirectional Model Transformations: the Asymmetric Case. Journal of Object Technology 10, 6: 1–25 (2011)
13.
Diskin, Z., Xiong, Y., Czarnecki, K., Ehrig, H., Hermann, F., Orejas, F.: From state- to delta-based bidirectional model transformations: the symmetric case. In: MODELS, pp. 304–318. Springer (2011)
14. El-Sayed, M., Rundensteiner, E.A., Mani, M.: Incremental Maintenance of Materialized XQuery Views. In: Liu, L., Reuter, A., Whang, K.Y., Zhang, J. (eds.) ICDE. p. 129. IEEE Computer Society (2006). https://doi.org/10.1109/ICDE.2006.80
15. Eramo, R., Johnson, M. (eds.): Proceedings of the 6th International Workshop on Bidirectional Transformations co-located with The European Joint Conferences on Theory and Practice of Software, Bx@ETAPS 2017, Uppsala, Sweden, April 29, 2017. CEUR Workshop Proceedings, vol. 1827. CEUR-WS.org (2017), http://ceur-ws.org/Vol-1827
16. Fong, B., Johnson, M.: Lenses and learners. In: Cheney, J., Ko, H. (eds.) Proceedings of the 8th International Workshop on Bidirectional Transformations co-located with the Philadelphia Logic Week, Bx@PLW 2019, Philadelphia, PA, USA, June 4, 2019. CEUR Workshop Proceedings, vol. 2355, pp. 16–29. CEUR-WS.org (2019), http://ceur-ws.org/Vol-2355/paper2.pdf
17. Fong, B., Spivak, D.I., Tuyéras, R.: Backprop as functor: A compositional perspective on supervised learning. In: The 34th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2019, Vancouver, BC, Canada, June 24-27, 2019. pp. 1–13. IEEE (2019). https://doi.org/10.1109/LICS.2019.8785665
18. Hedges, J.: From open learners to open games. CoRR abs/1902.08666 (2019), http://arxiv.org/abs/1902.08666
19. Hermann, F., Ehrig, H., Orejas, F., Czarnecki, K., Diskin, Z., Xiong, Y., Gottmann, S., Engel, T.: Model synchronization based on triple graph grammars: correctness, completeness and invertibility. Software and System Modeling 14(1), 241–269 (2015). https://doi.org/10.1007/s10270-012-0309-1
20. Johnson, M., Rosebrugh, R.D.: Unifying set-based, delta-based and edit-based lenses.
In: The 5th International Workshop on Bidirectional Transformations, Bx 2016. pp. 1–13 (2016), http://ceur-ws.org/Vol-1571/paper_13.pdf
21. Kappel, G., Langer, P., Retschitzegger, W., Schwinger, W., Wimmer, M.: Model transformation by-example: A survey of the first wave. In: Conceptual Modelling and Its Theoretical Foundations - Essays Dedicated to Bernhard Thalheim on the Occasion of His 60th Birthday. pp. 197–215 (2012). https://doi.org/10.1007/978-3-642-28279-9_15
22. Sasano, I., Hu, Z., Hidaka, S., Inaba, K., Kato, H., Nakano, K.: Toward bidirectionalization of ATL with GRoundTram. In: Theory and Practice of Model Transformations - 4th International Conference, ICMT 2011, Zurich, Switzerland, June 27-28, 2011. Proceedings. Lecture Notes in Computer Science, vol. 6707, pp. 138–151. Springer (2011). https://doi.org/10.1007/978-3-642-21732-6_10
23. Weidmann, N., Anjorin, A., Fritsche, L., Varró, G., Schürr, A., Leblebici, E.: Incremental bidirectional model transformation with eMoflon::IBeX. In: The 8th International Workshop on Bidirectional Transformations co-located with the Philadelphia Logic Week, Bx@PLW 2019, Philadelphia, PA, USA, June 4, 2019. CEUR Workshop Proceedings, vol. 2355, pp. 45–55. CEUR-WS.org (2019), http://ceur-ws.org/Vol-2355/paper4.pdf

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material.
If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Non-idempotent intersection types in logical form

Thomas Ehrhard
Université de Paris, IRIF, CNRS, F-75013 Paris, France
ehrhard@irif.fr
https://www.irif.fr/~ehrhard/

Abstract. Intersection types are an essential tool in the analysis of operational and denotational properties of lambda-terms and functional programs. Among them, non-idempotent intersection types provide precise quantitative information about the evaluation of terms and programs. However, unlike simple or second-order types, intersection types cannot be considered as a logical system, because the application rule (or the intersection rule, depending on the presentation of the system) involves a condition stipulating that the proofs of the premises must have the same structure. Using earlier work introducing an indexed version of Linear Logic, we show that non-idempotent typing can be given a logical form in a system where formulas represent hereditarily indexed families of intersection types.

Keywords: Lambda Calculus · Denotational Semantics · Intersection Types · Linear Logic

Introduction

Intersection types, introduced in the work of Coppo and Dezani [4,5] and developed since then by many authors, are still a very active research topic. As quite clearly explained in [13], the Coppo and Dezani intersection type system DΩ can be understood as a syntactic presentation of the denotational interpretation of λ-terms in Engeler's model, which is a model of the pure λ-calculus in the cartesian closed category of prime-algebraic complete lattices and Scott-continuous functions. Intersection types can be considered as formulas of the propositional calculus with implication ⇒ and conjunction ∧ as connectives.
However, as pointed out by Hindley [12], the deduction rules for intersection types depart drastically from the standard logical rules of intuitionistic logic (and of any standard logical system) in that, in the ∧-introduction rule, it is assumed that the proofs of the two premises are typings of the same λ-term — which means that, in some sense made precise by the typing system itself, they have the same structure. Such requirements on the proofs of premises, and not only on the formulas proven in the premises, are absent from standard (intuitionistic or classical) logical systems, where the proofs of the premises are completely independent from each other. Many authors have addressed this issue; we refer to [14] for a discussion of several solutions, which mainly focus on the design of à la Church presentations of intersection typing systems, thus enriching λ-terms with additional structure. Among the most recent and convincing contributions to this line of research we should certainly mention [15].

In our "new" approach to this problem — not so new actually, since it dates back to [3] — we change formulas instead of changing terms. It is based on a specific model of Linear Logic (and thus of the λ-calculus): the relational model. It is fair to credit Girard with the introduction of this model, since it appears at least implicitly in [11]. It was probably known by many people in the Linear Logic community as a piece of folklore since the early 1990s, and it is presented formally in [3]. In this quite simple and canonical denotational model, types are interpreted as sets (without any additional structure) and a closed term of type σ is interpreted as a subset of the interpretation of σ.

Partially supported by the project ANR-19-CE48-0014 PPS.
© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 198–216, 2020. https://doi.org/10.1007/978-3-030-45231-5_11
It is quite easy to define, in this semantic framework, analogues of the usual models of the pure λ-calculus such as Scott's D_∞ or Engeler's model, which in some sense are simpler than the original ones since the sets interpreting types need not be pre-ordered. As explained in the work of De Carvalho [6,7], the intersection type counterpart of this semantics is a typing system where "intersection" is non-idempotent (in sharp contrast with the original systems introduced by Coppo and Dezani), sometimes called System R. Notice that the precise connection between the idempotent and non-idempotent approaches is analyzed in [8], in a quite general Linear Logic setting, by means of an extensional collapse.

In order to explain our approach, we first restrict to simple types, interpreted as follows in the relational model: a basic type α is interpreted as a given set ⟦α⟧, and the type σ ⇒ τ is interpreted as the set ⟦σ ⇒ τ⟧ = M_fin(⟦σ⟧) × ⟦τ⟧ (where M_fin(E) is the set of finite multisets of elements of E). Remember indeed that intersection types can be considered as a syntactic presentation of denotational semantics, so it makes sense to define intersection types relative to simple types (in the spirit of [10]), as we do in Section 3: an intersection type relative to the base type α is an element of ⟦α⟧, and an intersection type relative to σ ⇒ τ is a pair ([a_1, ..., a_n], b) where the a_i's are intersection types relative to σ and b is an intersection type relative to τ; with more usual notations, ([a_1, ..., a_n], b) would be written (a_1 ∧ ··· ∧ a_n) → b. Then, given a type σ, the main idea consists in representing an indexed family of elements of ⟦σ⟧ as a formula of a new logical system.
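Intersection types relative to a simple type can be made concrete as data: an element of ⟦σ ⇒ τ⟧ is a pair of a finite multiset over ⟦σ⟧ and an element of ⟦τ⟧. The following Python sketch is our own illustration (encodings and names are ours): multisets are sorted tuples, and a bound on multiset cardinality keeps the enumeration finite, since the real interpretation is infinite.

```python
from itertools import combinations_with_replacement

# Simple types as nested tuples: an atom is ('atom', name),
# an arrow type is ('arrow', sigma, tau).  Multisets are sorted tuples.
ATOMS = {'alpha': ['*', '**']}   # chosen interpretation of the base type alpha

def elems(ty, max_mult):
    """Elements of the relational interpretation of ty, with multiset
    cardinalities bounded by max_mult (the real set is infinite)."""
    if ty[0] == 'atom':
        return list(ATOMS[ty[1]])
    _, sigma, tau = ty
    src = elems(sigma, max_mult)
    out = []
    for k in range(max_mult + 1):
        for m in combinations_with_replacement(src, k):
            for b in elems(tau, max_mult):
                out.append((m, b))
    return out

alpha = ('atom', 'alpha')
arrow = ('arrow', alpha, alpha)
ts = elems(arrow, 2)
assert ((), '*') in ts             # the type ([], *): no demand on the argument
assert (('*', '**'), '**') in ts   # (* ∧ **) -> ** in the usual notation
assert len(ts) == 12               # 6 multisets of size <= 2, times 2 targets
```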
If σ = (ϕ ⇒ ψ), then the family can be written (([a_k | k ∈ K and u(k) = j], b_j))_{j∈J}, where J and K are indexing sets, u: K → J is a function such that u⁻¹({j}) is finite for all j ∈ J, (b_j)_{j∈J} is a family of elements of ⟦ψ⟧ (represented by a formula B) and (a_k)_{k∈K} is a family of elements of ⟦ϕ⟧ (represented by a formula A): in that case we introduce the implicative formula (A ⇒_u B) to represent the family ([a_k | k ∈ K and u(k) = j], b_j)_{j∈J}. (We prefer not to use the ∧-notation here, to avoid confusion between these two levels of typing. We use [···] for denoting multisets much as one uses {···} for denoting sets; the only difference is that multiplicities are taken into account.)

It is clear that a family of simple types has in general infinitely many representations as such formulas; this huge redundancy makes it possible to establish a tight link between inhabitation of intersection types and provability of the formulas representing them (in an indexed version LJ(I) of intuitionistic logic). Such a correspondence is exhibited in Section 3 in the simply typed setting, and the idea is quite simple: given a type σ, a family (a_j)_{j∈J} of elements of ⟦σ⟧, and a closed λ-term M of type σ, it is equivalent to say that ⊢ M : a_j holds for all j and to say that some (and actually any) formula A representing (a_j)_{j∈J} has an LJ(I) proof whose underlying λ-term is M. In Section 4 we extend this approach to the untyped λ-calculus, taking as underlying model of the pure λ-calculus our relational version R_∞ of Scott's D_∞. We define an adapted version of LJ(I) and establish a similar correspondence, with some slight modifications due to the specificities of R_∞.

1 Notations and preliminary definitions

If E is a set, a finite multiset of elements of E is a function m: E → N such that the set {a ∈ E | m(a) ≠ 0} (called the domain of m) is finite.
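This notion of finite multiset can be sketched with Python's `collections.Counter` (our own illustration; the helper names are ours, and they mirror the notations #m, m + m′ and [a_i | i ∈ J] defined in the surrounding text):

```python
from collections import Counter

def card(m):
    """#m: the total number of occurrences in the multiset m."""
    return sum(m.values())

def multiset(*elements):
    """[a1, ..., an], multiplicities taken into account."""
    return Counter(elements)

def from_family(family, J):
    """[a_i | i in J] for a family (a_i) and a finite index set J."""
    return Counter(family[i] for i in J)

m = multiset(0, 1, 0, 2, 1)
assert m[0] == 2 and m[1] == 2 and m[2] == 1 and m[3] == 0
assert card(m) == 5
# Addition + of multisets adds multiplicities.
assert m + multiset(0) == multiset(0, 0, 0, 1, 1, 2)
# [a_i | i in J] with a_0 = a_1 = 5, a_2 = 7 and J = {0, 1, 2}.
assert from_family({0: 5, 1: 5, 2: 7}, {0, 1, 2}) == multiset(5, 5, 7)
```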
The cardinality of such a multiset m is #m = Σ_{a∈E} m(a). We use + for the obvious addition operation on multisets, and if a_1, ..., a_n are elements of E, we use [a_1, ..., a_n] for the corresponding multiset (taking multiplicities into account); for instance [0, 1, 0, 2, 1] is the multiset m of elements of N such that m(0) = 2, m(1) = 2, m(2) = 1 and m(i) = 0 for i > 2. If (a_i)_{i∈I} is a family of elements of E and if J is a finite subset of I, we use [a_i | i ∈ J] for the multiset of elements of E which maps a ∈ E to the number of elements i ∈ J such that a_i = a (which is finite since J is). We use M_fin(E) for the set of finite multisets of elements of E.

We use + to denote set union when we want to stress the fact that the involved sets are disjoint. A function u: J → K is almost injective if #u⁻¹{k} is finite for each k ∈ K (equivalently, the inverse image of any finite subset of K under u is finite). If s = (a_1, ..., a_n) is a sequence of elements of E and i ∈ {1, ..., n}, we use (s)\i for the sequence (a_1, ..., a_{i−1}, a_{i+1}, ..., a_n). Given sets E and F, we use F^E for the set of functions from E to F. The elements of F^E are sometimes considered as functions u (with a functional notation u(e) for application) and sometimes as indexed families (a_e)_{e∈E} (with index notation a_e for application), especially when E is countable. If i ∈ {1, ..., n} and j ∈ {1, ..., n−1}, we define s(j, i) ∈ {1, ..., n} as follows: s(j, i) = j if j < i and s(j, i) = j + 1 if j ≥ i.

(Any such proof can be stripped of its indexing data, giving rise to a proof of σ in intuitionistic logic.)

2 The relational model of the λ-calculus

Let Rel_! be the category whose objects are sets and where Rel_!(X, Y) = P(M_fin(X) × Y), with Id_X = {([a], a) | a ∈ X} and composition of s ∈ Rel_!(X, Y) and t ∈ Rel_!(Y, Z) given by

t ◦ s = {(m_1 + ··· + m_k, c) | ∃b_1, ..., b_k ∈ Y: ([b_1, ..., b_k], c) ∈ t and ∀j (m_j, b_j) ∈ s}.
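The composition law of Rel_! can be executed directly on small finite morphisms. Below is a sketch of ours (multisets encoded as sorted tuples; names are illustrative) that also checks neutrality of the identity:

```python
from itertools import product

# Morphisms in Rel_!: finite sets of (multiset, element) pairs,
# with multisets encoded as sorted tuples.
def msum(*ms):
    """Sum of multisets: concatenate and re-sort."""
    return tuple(sorted(x for m in ms for x in m))

def identity(X):
    return {((a,), a) for a in X}

def compose(s, t):
    """t o s, for s in Rel_!(X, Y) and t in Rel_!(Y, Z)."""
    out = set()
    for (bs, c) in t:
        # For each b_j in the multiset bs, choose a pair (m_j, b_j) in s.
        choices = [[(m, b2) for (m, b2) in s if b2 == b] for b in bs]
        for pick in product(*choices):
            out.add((msum(*[m for (m, _) in pick]), c))
    return out

X, Y = {'a'}, {'b'}
s = {((), 'b'), (('a', 'a'), 'b')}   # a morphism in Rel_!(X, Y)
t = {(('b', 'b'), 'c')}              # a morphism in Rel_!(Y, {'c'})
assert compose(s, t) == {((), 'c'), (('a', 'a'), 'c'),
                         (('a', 'a', 'a', 'a'), 'c')}
assert compose(identity(X), s) == s and compose(s, identity(Y)) == s
```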
It is easily checked that this composition law is associative and that Id is neutral for it. This category has all countable products: let (X_j)_{j∈J} be a countable family of sets; their product is X = &_{j∈J} X_j = ∪_{j∈J} {j} × X_j, with projections (pr_j)_{j∈J} given by pr_j = {([(j, a)], a) | a ∈ X_j} ∈ Rel_!(X, X_j), and if (s_j)_{j∈J} is a family of morphisms s_j ∈ Rel_!(Y, X_j), then their tupling is ⟨s_j⟩_{j∈J}
= j j∈J j ! j j j∈J {([ a ], (j, b))) | j ∈ J and ([ a ],b) ∈ s }∈ Rel (Y, X). j ! The category Rel is cartesian closed with object of morphisms from X to Y the set (X ⇒ Y )= M (X)×Y and evaluation morphism Ev ∈ Rel ((X ⇒ Y )& ﬁn ! X, Y ) is given by Ev = {([ (1, [ a ,...,a ],b), (2,a ),..., (2,a )],b) | a ,...,a ∈ 1 k 1 k 1 k X and b ∈ Y }. The transpose (or curryﬁcation) of s ∈ Rel (Z & X, Y ) is Cur(s) ∈ Rel (Z, X ⇒ Y ) given by Cur(s)= {([ c ,...,c ], ([ a ,...,a ],b)) | ! 1 n 1 k ([ (1,c ),..., (1,c ), (2,a ),..., (2,a )],c) ∈ s}. 1 n 1 k Relational D . Let R be the least set such that (m ,m ,... ) ∈ R as soon ∞ ∞ 0 1 ∞ as m ,m ... are ﬁnite multisets of elements of R which are almost all equal 0 1 ∞ to []. Notice in particular that e =([ ],[],... ) ∈ R and satisﬁes e =([ ], e). By construction we have R = M (R ) × R ,thatis R =(R ⇒ R ) ∞ ﬁn ∞ ∞ ∞ ∞ ∞ and hence R is a model of the pure λ-calculus in Rel which also satisﬁes the ∞ ! η-rule. See [1] for general facts on this kind of model. 3Thesimplytypedcase We assume to be given a set of type atoms α, β, . . . and of variables x, y, . . . ; types and terms are given as usual by σ, τ, . . . := α | σ ⇒ τ and M, N, . . . := x | (M) N | λx N . With any type atom we associate a set α. This interpretation is extended to all types by σ ⇒ τ = σ ⇒ τ = M (σ)× τ . The relational semantics of ﬁn this λ-calculus can be described as a non-idempotent intersection type system, with judgments of shape x : m : σ ,...,x : m : σ M : a : σ where the x ’s 1 1 1 n n n i are pairwise distinct variables, M is a term, a ∈ σ and m ∈M (σ ) for i ﬁn i each i. Here are the typing rules: j = i ⇒ m =[ ] and m =[ a ] Φ, x : m : σ M : b : τ j i Φ λx M :(m, b): σ ⇒ τ (x : m : σ ) x : a : σ i i i i i=1 We can restrict to countable sets. This results from the fact that Rel arises as the Kleisli category of the LL model of sets and relations, see [3] for instance. 202 T. 
The application rule is

  Φ ⊢ M : ([a_1, ..., a_k], b) : σ ⇒ τ    (Φ_l ⊢ N : a_l : σ)_{l=1}^k
  ──────────────────────────────────────────────────────────────────
  Ψ ⊢ (M) N : b : τ

where Φ = (x_i : m_i : σ_i)_{i=1}^n, Φ_l = (x_i : m_i^l : σ_i)_{i=1}^n for l = 1, ..., k, and Ψ = (x_i : m_i + Σ_{l=1}^k m_i^l : σ_i)_{i=1}^n.

3.1 Why do we need another system?

The trouble with this deduction system is that it cannot be considered as the term-decorated version of an underlying "logical system for intersection types" allowing one to prove sequents of shape m_1 : σ_1, ..., m_n : σ_n ⊢ a : σ (where the non-idempotent intersection types m_i and a are considered as logical formulas, the ordinary types σ playing the role of "kinds"), because, in the application rule above, it is required that all the proofs of the k right-hand premises have the same shape, given by the λ-term N. We propose now a "logical system" derived from [3] which, in some sense, solves this issue. The main idea is quite simple and relies on three principles: (1) hereditarily replace multisets with indexed families in intersection types, (2) instead of proving single types, prove indexed families of hereditarily indexed types, and (3) represent syntactically such families (of hereditarily indexed types) as formulas of a new system of indexed logic.

3.2 Minimal LJ(I)

We define now the syntax of indexed formulas. Assume to be given an infinite countable set I of indices. We define indexed types A; with each such type we associate an underlying type A̲, a set d(A) and a family ⟨A⟩ ∈ ⟦A̲⟧^{d(A)}. These formulas are given by the following inductive definition:

– if J ⊆ I and f: J → ⟦α⟧ is a function, then α[f] is a formula with α[f]̲ = α, d(α[f]) = J and ⟨α[f]⟩ = f;
– if A and B are formulas and u: d(A) → d(B) is almost injective, then A ⇒_u B is a formula with (A ⇒_u B)̲ = A̲ ⇒ B̲, d(A ⇒_u B) = d(B) and, for k ∈ d(B), ⟨A ⇒_u B⟩_k = ([⟨A⟩_j | j ∈ d(A) and u(j) = k], ⟨B⟩_k).

Proposition 1. Let σ be a type, J be a subset of I and f ∈ ⟦σ⟧^J. There is a formula A such that A̲ = σ, d(A) = J and ⟨A⟩ = f (actually, there are infinitely many such A's as soon as σ is not an atom and J ≠ ∅).

Proof. The proof is by induction on σ. If σ is an atom α then we take A = α[f]. Assume that σ = (ρ ⇒ τ), so that f(j) = (m_j, b_j) with m_j ∈ M_fin(⟦ρ⟧) and b_j ∈ ⟦τ⟧. Since each m_j is finite and I is infinite, we can find a family (K_j)_{j∈J} of pairwise disjoint finite subsets of I such that #K_j = #m_j. Let K = ∪_{j∈J} K_j; there is a function g: K → ⟦ρ⟧ such that m_j = [g(k) | k ∈ K_j] for each j ∈ J (choose first an enumeration g_j: K_j → ⟦ρ⟧ of m_j for each j, and then define g(k) = g_j(k) where j is the unique element of J such that k ∈ K_j). Let u: K → J be the unique function such that k ∈ K_{u(k)} for all k ∈ K; since each K_j is finite, this function u is almost injective. By inductive hypothesis there is a formula A such that A̲ = ρ, d(A) = K and ⟨A⟩ = g, and there is a formula B such that B̲ = τ, d(B) = J and ⟨B⟩ = (b_j)_{j∈J}. Then the formula A ⇒_u B is well formed (since u is an almost injective function d(A) = K → d(B) = J) and satisfies (A ⇒_u B)̲ = σ, d(A ⇒_u B) = J and ⟨A ⇒_u B⟩ = f, as contended.

As a consequence, for any type σ and any element a of ⟦σ⟧ (so a is a non-idempotent intersection type of kind σ), one can find a formula A such that A̲ = σ, d(A) = {j} (where j is an arbitrary element of I) and ⟨A⟩_j = a. In other words, any intersection type can be represented as a formula (in infinitely many different ways in general, of course; but up to renaming of indices, that is, up to "hereditary α-equivalence", this representation is unique).

For any formula A and J ⊆ I, we define a formula A|_J such that (A|_J)̲ = A̲, d(A|_J) = d(A) ∩ J and ⟨A|_J⟩ = ⟨A⟩|_J. The definition is by induction on A:

– α[f]|_J = α[f|_J];
– (A ⇒_u B)|_J = (A|_K ⇒_v B|_J) where K = u⁻¹(d(B) ∩ J) and v = u|_K.

Let u: d(A) → J be a bijection (so that u(d(A)) = J); we define a formula u_∗(A) such that u_∗(A)̲ = A̲, d(u_∗(A)) = u(d(A)) and ⟨u_∗(A)⟩_j = ⟨A⟩_{u⁻¹(j)}. The definition is by induction on A:

– u_∗(α[f]) = α[f ∘ u⁻¹];
– u_∗(A ⇒_v B) = (A ⇒_{u∘v} u_∗(B)).

Using these two auxiliary notions, we can give a set of three deduction rules for a minimal natural deduction allowing one to prove formulas in this indexed intuitionistic logic. This logical system derives sequents of shape

  A_1^{u_1}, ..., A_n^{u_n} ⊢ B    (1)

where, for each i = 1, ..., n, the function u_i: d(A_i) → d(B) is almost injective (it is not required that d(B) = ∪_{i=1}^n u_i(d(A_i))). Notice that the expressions A_i^{u_i} are not formulas; the construction A^u is part of the syntax of sequents, just as the "," separating these pseudo-formulas. Given a formula A and u: d(A) → J almost injective, it is nevertheless convenient to define ⟨A^u⟩ ∈ M_fin(⟦A̲⟧)^J by ⟨A^u⟩_j = [⟨A⟩_k | u(k) = j]. In particular, when u is a bijection, ⟨A^u⟩_j = [⟨A⟩_{u⁻¹(j)}].

The crucial point here is that such a sequent (1) involves no λ-term. The main difference between the original system LL(I) of [3] and the present system is the way axioms are dealt with. In LL(I) there is no explicit identity axiom, and only "atomic axioms" restricted to the basic constants of LL; indeed, it is well known that in LL all identity axioms can be η-expanded, leading to proofs using only such atomic axioms. In the λ-calculus, and especially in the untyped λ-calculus that we deal with in the next sections, such η-expansions are hard to handle, so we prefer to use explicit identity axioms. The axiom is

  j ≠ i ⇒ d(A_j) = ∅   and   u_i is a bijection
  ─────────────────────────────────────────────
  A_1^{u_1}, ..., A_n^{u_n} ⊢ u_{i∗}(A_i)

so that, for j ≠ i, the function u_j is empty. A special case is

  j ≠ i ⇒ d(A_j) = ∅   and   u_i is the identity function
  ───────────────────────────────────────────────────────
  A_1^{u_1}, ..., A_n^{u_n} ⊢ A_i

which may look more familiar, but the general axiom rule, allowing one to "delocalize" the proven formula A_i by an arbitrary bijection u_i, is required, as we shall see. The ⇒-introduction rule is quite simple:

  A_1^{u_1}, ..., A_n^{u_n}, A^u ⊢ B
  ──────────────────────────────────
  A_1^{u_1}, ..., A_n^{u_n} ⊢ A ⇒_u B

Last, the ⇒-elimination rule is more complicated (from a Linear Logic point of view, this is due to the fact that it combines three LL logical rules: ⊸-elimination, contraction and promotion). We have the deduction

  C_1^{u_1}, ..., C_n^{u_n} ⊢ A ⇒_u B    D_1^{v_1}, ..., D_n^{v_n} ⊢ A
  ────────────────────────────────────────────────────────────────────
  E_1^{w_1}, ..., E_n^{w_n} ⊢ B

under the following conditions, to be satisfied by the involved formulas and functions: for each i = 1, ..., n one has d(C_i) ∩ d(D_i) = ∅, d(E_i) = d(C_i) + d(D_i), C_i = E_i|_{d(C_i)}, D_i = E_i|_{d(D_i)}, w_i|_{d(C_i)} = u_i, and w_i|_{d(D_i)} = u ∘ v_i.

Let π be a deduction tree of the sequent A_1^{u_1}, ..., A_n^{u_n} ⊢ B in this system. By dropping all index information we obtain a derivation tree of A̲_1, ..., A̲_n ⊢ B̲, and, upon choosing a sequence x⃗ of n pairwise distinct variables, we can associate with this derivation tree a simply typed λ-term π_{x⃗} which satisfies x_1 : A̲_1, ..., x_n : A̲_n ⊢ π_{x⃗} : B̲.
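Looking back at Proposition 1, its construction — choosing pairwise disjoint finite index sets K_j and the almost injective function u — can be sketched concretely. The following Python sketch is ours (formulas are nested tuples, indices are fresh integers, multisets are tuples; all names are illustrative):

```python
import itertools

# Build a "formula" representing a family f : J -> [[sigma]], following
# the proof of Proposition 1.  Intersection types relative to an arrow
# type are (tuple-of-types, type) pairs; atom elements are strings.
fresh = itertools.count()   # supply of fresh, pairwise distinct indices

def represent(family):
    """family: dict j -> intersection type.  Returns either
    ('atom', f) or ('arrow', A, u, B), with u almost injective."""
    J = list(family)
    if not J or not isinstance(family[J[0]], tuple):
        return ('atom', dict(family))        # base-type case: alpha[f]
    g, u, bs = {}, {}, {}
    for j in J:
        m, b = family[j]
        bs[j] = b
        for a in m:                          # fresh disjoint K_j for each j
            k = next(fresh)
            g[k], u[k] = a, j                # u maps K_j onto j
    return ('arrow', represent(g), u, represent(bs))

# A family of two elements of [[alpha => beta]].
fam = {0: (('p', 'p'), 'q'), 1: ((), 'q')}
tag, A, u, B = represent(fam)
assert tag == 'arrow'
assert sorted(u.values()) == [0, 0]          # u^-1(0) has 2 elements, u^-1(1) none
assert B == ('atom', {0: 'q', 1: 'q'})
```

Because the fresh indices are drawn arbitrarily, running the construction twice yields different (but "hereditarily α-equivalent") formulas, which is exactly the redundancy the text exploits.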
3.3 Basic properties of LJ(I)

We prove some basic properties of this logical system. This is also an opportunity to get acquainted with it. Notice that in many places we drop the type annotations of variables in λ-terms, first because they are easy to recover, and second because the very same results and proofs are also valid in the untyped setting of Section 4.

Lemma 1 (Weakening). Assume that Φ ⊢ A is provable by a proof π, and let B be a formula such that d(B) = ∅. Then Φ′ ⊢ A is provable by a proof π′, where Φ′ is obtained by inserting B^{0_{d(A)}} at any place in Φ. Moreover π′_{x⃗′} = π_{x⃗} (where x⃗′ is obtained from x⃗ by inserting a dummy variable at the same place).

The proof is an easy induction on the proof of Φ ⊢ A.

Lemma 2 (Relocation). Let π be a proof of (A_i^{u_i})_{i=1}^n ⊢ A and let u: d(A) → J be a bijection. Then there is a proof π′ of (A_i^{u∘u_i})_{i=1}^n ⊢ u_∗(A) such that π′_{x⃗} = π_{x⃗}.

The proof is a straightforward induction on π.

Lemma 3 (Restriction). Let π be a proof of (A_i^{u_i})_{i=1}^n ⊢ A and let J ⊆ d(A). For i = 1, ..., n, let K_i = u_i⁻¹(J) ⊆ d(A_i) and u_i′ = u_i|_{K_i}: K_i → J. Then the sequent ((A_i|_{K_i})^{u_i′})_{i=1}^n ⊢ A|_J has a proof π′ such that π′_{x⃗} = π_{x⃗}.

Proof. By induction on π. Assume that π consists of an axiom (A_j^{u_j})_{j=1}^n ⊢ u_{i∗}(A_i), with d(A_j) = ∅ if j ≠ i and u_i a bijection. With the notations of the lemma, K_j = ∅ for j ≠ i and u_i′ is a bijection K_i → J. Moreover u_i′_∗(A_i|_{K_i}) = u_{i∗}(A_i)|_J, so that ((A_j|_{K_j})^{u_j′})_{j=1}^n ⊢ u_{i∗}(A_i)|_J is obtained by an axiom π′ with π′_{x⃗} = x_i = π_{x⃗}.

Assume that π ends with a ⇒-introduction rule

  (A_i^{u_i})_{i=1}^{n+1} ⊢ B
  ───────────────────────────────────────
  (A_i^{u_i})_{i=1}^n ⊢ A_{n+1} ⇒_{u_{n+1}} B

with A = (A_{n+1} ⇒_{u_{n+1}} B) and π_{x⃗} = λx_{n+1} ρ_{x⃗,x_{n+1}}, where ρ proves the premise. With the notations of the lemma, we have A|_J = (A_{n+1}|_{K_{n+1}} ⇒_{u_{n+1}′} B|_J). By inductive hypothesis there is a proof ρ′ of ((A_i|_{K_i})^{u_i′})_{i=1}^{n+1} ⊢ B|_J such that ρ′_{x⃗,x_{n+1}} = ρ_{x⃗,x_{n+1}}, and hence we have a proof π′ of ((A_i|_{K_i})^{u_i′})_{i=1}^n ⊢ A|_J with π′_{x⃗} = λx_{n+1} ρ_{x⃗,x_{n+1}} = π_{x⃗}, as contended.

Assume last that π ends with a ⇒-elimination rule

  μ: (B_i^{v_i})_{i=1}^n ⊢ B ⇒_v A    ρ: (C_i^{w_i})_{i=1}^n ⊢ B
  ──────────────────────────────────────────────────────────────
  (A_i^{u_i})_{i=1}^n ⊢ A

with d(A_i) = d(B_i) + d(C_i), B_i = A_i|_{d(B_i)}, C_i = A_i|_{d(C_i)}, u_i|_{d(B_i)} = v_i and u_i|_{d(C_i)} = v ∘ w_i for i = 1, ..., n; of course π_{x⃗} = (μ_{x⃗}) ρ_{x⃗}. Let L = v⁻¹(J) ⊆ d(B), and let L_i = v_i⁻¹(J) and R_i = w_i⁻¹(L) for i = 1, ..., n (we also set v_i′ = v_i|_{L_i}, w_i′ = w_i|_{R_i} and v′ = v|_L). By inductive hypothesis, we have a proof μ′ of ((B_i|_{L_i})^{v_i′})_{i=1}^n ⊢ B|_L ⇒_{v′} A|_J such that μ′_{x⃗} = μ_{x⃗}, and a proof ρ′ of ((C_i|_{R_i})^{w_i′})_{i=1}^n ⊢ B|_L such that ρ′_{x⃗} = ρ_{x⃗}. Now, setting K_i = u_i⁻¹(J), observe that

– d(B_i) ∩ K_i = L_i and u_i|_{L_i} = v_i′, since u_i|_{d(B_i)} = v_i;
– d(C_i) ∩ K_i = R_i (since u_i|_{d(C_i)} = v ∘ w_i and L = v⁻¹(J)), and also u_i|_{R_i} = v′ ∘ w_i′.

It follows that K_i = L_i + R_i and, setting u_i′ = u_i|_{K_i}, we have u_i′|_{L_i} = v_i′ and u_i′|_{R_i} = v′ ∘ w_i′. Hence, by a ⇒-elimination rule, we have a proof π′ of ((A_i|_{K_i})^{u_i′})_{i=1}^n ⊢ A|_J such that π′_{x⃗} = (μ′_{x⃗}) ρ′_{x⃗} = (μ_{x⃗}) ρ_{x⃗} = π_{x⃗}, as contended.

Though substitution lemmas are usually trivial, the LJ(I) substitution lemma requires some care in its statement and proof. (We use notations introduced in Section 1, especially s(j, i).)

Lemma 4 (Substitution). Assume that (A_j^{u_j})_{j=1}^n ⊢ A with a proof μ and that, for some i ∈ {1, ..., n}, (B_j^{v_j})_{j=1}^{n−1} ⊢ A_i with a proof ρ. Then there is a proof π of (C_j^{w_j})_{j=1}^{n−1} ⊢ A such that π_{(x⃗)\i} = μ_{x⃗}[ρ_{(x⃗)\i}/x_i], as soon as, for each j = 1, ..., n−1, d(C_j) = d(A_{s(j,i)}) + d(B_j) (remember that this also requires d(A_{s(j,i)}) ∩ d(B_j) = ∅), with:

– C_j|_{d(A_{s(j,i)})} = A_{s(j,i)} and w_j|_{d(A_{s(j,i)})} = u_{s(j,i)};
– C_j|_{d(B_j)} = B_j and w_j|_{d(B_j)} = u_i ∘ v_j.

Proof. By induction on the proof μ. Assume that μ is an axiom, so that there is a k ∈ {1, ..., n} such that A = u_{k∗}(A_k), u_k is a bijection and d(A_j) = ∅ for all j ≠ k. In that case we have μ_{x⃗} = x_k. There are two subcases to consider. Assume first that k = i. By Lemma 2 there is a proof ρ′ of (B_j^{u_i∘v_j})_{j=1}^{n−1} ⊢ u_{i∗}(A_i) such that ρ′_{(x⃗)\i} = ρ_{(x⃗)\i}. We have C_j = B_j and w_j = u_i ∘ v_j for j = 1, ..., n−1, so that ρ′ is a proof of (C_j^{w_j})_{j=1}^{n−1} ⊢ A; so we take π = ρ′, and the equation π_{(x⃗)\i} = μ_{x⃗}[ρ_{(x⃗)\i}/x_i] holds since μ_{x⃗} = x_i. Assume next that k ≠ i; then d(A_i) = ∅ and hence d(B_j) = ∅ (and v_j = 0_∅) for j = 1, ..., n−1. Therefore C_j = A_{s(j,i)} and w_j = u_{s(j,i)} for j = 1, ..., n−1. So our target sequent (C_j^{w_j})_{j=1}^{n−1} ⊢ A can also be written (A_{s(j,i)}^{u_{s(j,i)}})_{j=1}^{n−1} ⊢ u_{k∗}(A_k) and is provable by an axiom π such that π_{(x⃗)\i} = x_k, as contended.

Assume now that μ ends with a ⇒-introduction, that is, A = (A_{n+1} ⇒_{u_{n+1}} B) and μ is

  θ: (A_j^{u_j})_{j=1}^{n+1} ⊢ B
  ──────────────────────────────
  (A_j^{u_j})_{j=1}^n ⊢ A

We set B_n = A_{n+1}|_∅ (so that d(B_n) = ∅) and of course v_n = 0_∅. Then, by Lemma 1, we have a proof ρ′ of (B_j^{v_j})_{j=1}^n ⊢ A_i such that ρ′_{(x⃗)\i,x_{n+1}} = ρ_{(x⃗)\i}. We set C_n = A_{n+1} and w_n = u_{n+1}. Then, by the inductive hypothesis applied to θ, we have a proof π⁰ of (C_j^{w_j})_{j=1}^n ⊢ B which satisfies π⁰_{(x⃗)\i,x_{n+1}} = θ_{x⃗,x_{n+1}}[ρ_{(x⃗)\i}/x_i], and applying a ⇒-introduction rule we get a proof π of (C_j^{w_j})_{j=1}^{n−1} ⊢ A such that π_{(x⃗)\i} = λx_{n+1} (θ_{x⃗,x_{n+1}}[ρ_{(x⃗)\i}/x_i]) = μ_{x⃗}[ρ_{(x⃗)\i}/x_i], as expected.

Assume last that the proof μ ends with

  ϕ: (E_j^{s_j})_{j=1}^n ⊢ E ⇒_s A    ψ: (F_j^{t_j})_{j=1}^n ⊢ E
  ──────────────────────────────────────────────────────────────
  (A_j^{u_j})_{j=1}^n ⊢ A

with d(A_j) = d(E_j) + d(F_j), E_j = A_j|_{d(E_j)}, F_j = A_j|_{d(F_j)}, u_j|_{d(E_j)} = s_j and u_j|_{d(F_j)} = s ∘ t_j for j = 1, ..., n; and we have μ_{x⃗} = (ϕ_{x⃗}) ψ_{x⃗}. The idea is to "share" the substituting proof ρ of (B_j^{v_j})_{j=1}^{n−1} ⊢ A_i between ϕ and ψ, according to what they need, as specified by the formulas E_i and F_i. So we write d(B_j) = L_j + R_j, where L_j = v_j⁻¹(d(E_i)) and R_j = v_j⁻¹(d(F_i)), and by Lemma 3 we have two proofs ρ^L of ((B_j|_{L_j})^{v_j^L})_{j=1}^{n−1} ⊢ E_i and ρ^R of ((B_j|_{R_j})^{v_j^R})_{j=1}^{n−1} ⊢ F_i, where we set v_j^L = v_j|_{L_j} and v_j^R = v_j|_{R_j}, obtained from ρ by restriction. These proofs satisfy ρ^L_{(x⃗)\i} = ρ^R_{(x⃗)\i} = ρ_{(x⃗)\i}.

Now we want to apply the inductive hypothesis to ϕ and ρ^L, in order to get a proof of the sequent (G_j^{w_j′})_{j=1}^{n−1} ⊢ E ⇒_s A, where G_j = C_j|_{d(E_{s(j,i)})+L_j} (observe indeed that d(E_{s(j,i)}) ⊆ d(A_{s(j,i)}) and L_j ⊆ d(B_j), and hence they are disjoint by our assumption that d(C_j) = d(A_{s(j,i)}) + d(B_j)) and w_j′ = w_j|_{d(E_{s(j,i)})+L_j}. With these definitions, and by our assumptions about C_j and w_j, we have for all j = 1, ..., n−1:

  G_j|_{d(E_{s(j,i)})} = C_j|_{d(E_{s(j,i)})} = A_{s(j,i)}|_{d(E_{s(j,i)})} = E_{s(j,i)}
  w_j′|_{d(E_{s(j,i)})} = w_j|_{d(E_{s(j,i)})} = u_{s(j,i)}|_{d(E_{s(j,i)})} = s_{s(j,i)}
  G_j|_{L_j} = C_j|_{L_j} = B_j|_{L_j}
  w_j′|_{L_j} = (u_i ∘ v_j)|_{L_j} = u_i|_{d(E_i)} ∘ v_j^L = s_i ∘ v_j^L.

Therefore the inductive hypothesis applies, yielding a proof ϕ′ of (G_j^{w_j′})_{j=1}^{n−1} ⊢ E ⇒_s A such that ϕ′_{(x⃗)\i} = ϕ_{x⃗}[ρ^L_{(x⃗)\i}/x_i] = ϕ_{x⃗}[ρ_{(x⃗)\i}/x_i].

Next we want to apply the inductive hypothesis to ψ and ρ^R, in order to get a proof of the sequent (H_j^{r_j})_{j=1}^{n−1} ⊢ E, where, for j = 1, ..., n−1, H_j = C_j|_{d(F_{s(j,i)})+R_j} (again d(F_{s(j,i)}) ⊆ d(A_{s(j,i)}) and R_j ⊆ d(B_j) are disjoint by our assumption that d(C_j) = d(A_{s(j,i)}) + d(B_j)) and r_j is defined by r_j|_{d(F_{s(j,i)})} = t_{s(j,i)} and r_j|_{R_j} = t_i ∘ v_j^R. Remember indeed that v_j^R: R_j → d(F_i) and t_i: d(F_i) → d(E). We have

  H_j|_{d(F_{s(j,i)})} = C_j|_{d(F_{s(j,i)})} = A_{s(j,i)}|_{d(F_{s(j,i)})} = F_{s(j,i)}
  H_j|_{R_j} = C_j|_{R_j} = B_j|_{R_j}

and hence, by inductive hypothesis, there is a proof ψ′ of (H_j^{r_j})_{j=1}^{n−1} ⊢ E such that ψ′_{(x⃗)\i} = ψ_{x⃗}[ρ^R_{(x⃗)\i}/x_i] = ψ_{x⃗}[ρ_{(x⃗)\i}/x_i].

To end the proof of the lemma, it will be sufficient to prove that we can apply a ⇒-elimination rule to the sequents (G_j^{w_j′})_{j=1}^{n−1} ⊢ E ⇒_s A and (H_j^{r_j})_{j=1}^{n−1} ⊢ E in order to get a proof π of the sequent (C_j^{w_j})_{j=1}^{n−1} ⊢ A. Indeed, the proof π obtained in that way will satisfy π_{(x⃗)\i} = (ϕ′_{(x⃗)\i}) ψ′_{(x⃗)\i} = μ_{x⃗}[ρ_{(x⃗)\i}/x_i].

Let j ∈ {1, ..., n−1}. We have C_j|_{d(G_j)} = G_j and C_j|_{d(H_j)} = H_j, simply because G_j and H_j are defined by restricting C_j. Moreover d(G_j) = d(E_{s(j,i)}) + L_j and d(H_j) = d(F_{s(j,i)}) + R_j. Therefore d(G_j) ∩ d(H_j) = ∅ and

  d(C_j) = d(A_{s(j,i)}) + d(B_j) = d(E_{s(j,i)}) + d(F_{s(j,i)}) + L_j + R_j = d(G_j) + d(H_j).

We have w_j|_{d(G_j)} = w_j′ by definition of w_j′ as w_j|_{d(E_{s(j,i)})+L_j}. We have

  w_j|_{d(F_{s(j,i)})} = u_{s(j,i)}|_{d(F_{s(j,i)})} = s ∘ t_{s(j,i)} = (s ∘ r_j)|_{d(F_{s(j,i)})}
  w_j|_{R_j} = (u_i ∘ v_j)|_{R_j} = u_i|_{d(F_i)} ∘ v_j^R = s ∘ t_i ∘ v_j^R = (s ∘ r_j)|_{R_j}

and therefore w_j|_{d(H_j)} = s ∘ r_j, as required.

We shall often use the two following consequences of the Substitution Lemma.

Lemma 5. Given a proof μ of (A_j^{u_j})_{j=1}^n ⊢ A and a proof ρ of B^v ⊢ A_i (for some i ∈ {1, ..., n}), there is a proof π of (A_j^{u_j})_{j=1}^{i−1}, B^{u_i∘v}, (A_j^{u_j})_{j=i+1}^n ⊢ A such that π_{x⃗} = μ_{x⃗}[ρ_{x_i}/x_i].

Proof.
By weakening we have a proof $\mu^0$ of $(A_j^{u_j})_{j=1}^{i}, B|_\emptyset^{\emptyset}, (A_j^{u_j})_{j=i+1}^{n} \vdash A$ such that $\mu^0_{\vec{x}} = \mu_{(\vec{x})\setminus i+1}$ (where $\vec{x}$ is a list of pairwise distinct variables of length $n+1$), as well as a proof $\rho^0$ of $(A_j|_\emptyset^{\emptyset})_{j=1}^{i-1}, B^{v}, (A_j|_\emptyset^{\emptyset})_{j=i+1}^{n} \vdash A_i$ such that $\rho^0_{(\vec{x})\setminus i} = \rho_{x_{i+1}}$. By Lemma 4, we have a proof $\pi$ of $(A_j^{u_j})_{j=1}^{i-1}, B^{u_i \circ v}, (A_j^{u_j})_{j=i+1}^{n} \vdash A$ which satisfies $\pi_{(\vec{x})\setminus i} = \mu^0_{\vec{x}}[\rho^0_{(\vec{x})\setminus i}/x_i] = \mu_{(\vec{x})\setminus i+1}[\rho_{x_{i+1}}/x_i]$.

Lemma 6. Given a proof $\mu$ of $A^{v} \vdash B$ and a proof $\rho$ of $(A_j^{u_j})_{j=1}^{n} \vdash A$, there is a proof $\pi$ of $(A_j^{v \circ u_j})_{j=1}^{n} \vdash B$ such that $\pi_{\vec{x}} = \mu_{x}[\rho_{\vec{x}}/x]$.

The proof is similar to the previous one.

If $A$ and $B$ are formulas such that $\underline{A} = \underline{B}$, $d(A) = d(B)$ and $A\langle j\rangle = B\langle j\rangle$ for all $j \in d(A)$, we say that $A$ and $B$ are similar and we write $A \sim B$. One fundamental property of our deduction system is that two formulas which represent the same family of intersection types are logically equivalent.

Theorem 1. If $A \sim B$ then $A^{\mathrm{Id}} \vdash B$ with a proof $\pi$ such that $\pi_x \sim_\eta x$.

Proof. Assume that $A = \alpha[f]$; then we have $B = A$ and $A^{\mathrm{Id}} \vdash B$ is an axiom. Assume that $A = (C \Rightarrow_u D)$ and $B = (E \Rightarrow_v F)$. We have $D \sim F$ and hence $D^{\mathrm{Id}} \vdash F$ with a proof $\rho$ such that $\rho_x \sim_\eta x$. And there is a bijection $w : d(E) \to d(C)$ such that $w_*(E) \sim C$ and $u \circ w = v$. By inductive hypothesis we have a proof $\mu$ of $w_*(E)^{\mathrm{Id}} \vdash C$ such that $\mu_y \sim_\eta y$, and hence using the axiom $E^{w} \vdash w_*(E)$ and Lemma 5 we have a proof $\mu'$ of $E^{w} \vdash C$ such that $\mu'_x = \mu_x$.

There is a proof $\pi^1$ of $(C \Rightarrow_u D)^{\mathrm{Id}}, C^{u} \vdash D$ such that $\pi^1_{x,y} = (x)\,y$ (consider the two axioms $(C \Rightarrow_u D)^{\mathrm{Id}}, C|_\emptyset^{\emptyset} \vdash C \Rightarrow_u D$ and $(C \Rightarrow_u D)|_\emptyset^{\emptyset}, C^{\mathrm{Id}} \vdash C$ and use a $\Rightarrow$-elimination rule). So by Lemma 5 there is a proof $\pi^2$ of $(C \Rightarrow_u D)^{\mathrm{Id}}, E^{u \circ w} \vdash D$, that is, of $(C \Rightarrow_u D)^{\mathrm{Id}}, E^{v} \vdash D$, such that $\pi^2_{x,y} = (x)\,\mu'_y$. Applying Lemma 6 we get a proof $\pi^3$ of $(C \Rightarrow_u D)^{\mathrm{Id}}, E^{v} \vdash F$ such that $\pi^3_{x,y} = \rho_z[(x)\,\mu'_y/z]$. We get the expected proof $\pi$ by a $\Rightarrow$-introduction rule, so that $\pi_x = \lambda y\,\rho_z[(x)\,\mu'_y/z]$. By inductive hypothesis $\pi_x \sim_\eta x$.

3.4 Relation between intersection types and LJ(I)

Now we explain the precise connection between non-idempotent intersection types and our logical system LJ(I). This connection consists of two statements:
– the first one means that any proof of LJ(I) can be seen as a typing derivation in non-idempotent intersection types (soundness),
– and the second one means that any non-idempotent intersection typing can be seen as a derivation in LJ(I) (completeness).

Theorem 2 (Soundness). Let $\pi$ be a deduction tree of the sequent $(A_i^{u_i})_{i=1}^{n} \vdash B$ and $\vec{x}$ a sequence of $n$ pairwise distinct variables. Then the $\lambda$-term $\pi_{\vec{x}}$ satisfies $(x_i : A_i\langle u_i\rangle_j : \underline{A_i})_{i=1}^{n} \vdash \pi_{\vec{x}} : B\langle j\rangle : \underline{B}$ in the intersection type system, for each $j \in d(B)$.

Proof. We prove the first part by induction on $\pi$ (in the course of this induction, we recall the precise definition of $\pi_{\vec{x}}$). If $\pi$ is the proof

$$\dfrac{q \ne i \Rightarrow d(A_q) = \emptyset \quad\text{and}\quad u_i \text{ is a bijection}}{(A_q^{u_q})_{q=1}^{n} \vdash u_{i*}(A_i)}$$

(so that $B = u_{i*}(A_i)$) then $\pi_{\vec{x}} = x_i$. We have $A_q\langle u_q\rangle_j = [\,]$ if $q \ne i$, $A_i\langle u_i\rangle_j = [\,A_i\langle u_i^{-1}(j)\rangle\,]$ and $u_{i*}(A_i)\langle j\rangle = A_i\langle u_i^{-1}(j)\rangle$. It follows that $(x_q : A_q\langle u_q\rangle_j : \underline{A_q})_{q=1}^{n} \vdash x_i : B\langle j\rangle : \underline{B}$ is a valid axiom in the intersection type system.

Assume that $\pi$ is the proof

$$\dfrac{A_1^{u_1},\dots,A_n^{u_n}, A^{u} \vdash B}{A_1^{u_1},\dots,A_n^{u_n} \vdash A \Rightarrow_u B}$$

where $\pi^0$ is the proof of the premise of the last rule of $\pi$. By inductive hypothesis the $\lambda$-term $\pi^0_{\vec{x},x}$ satisfies $(x_i : A_i\langle u_i\rangle_j : \underline{A_i})_{i=1}^{n}, x : A\langle u\rangle_j : \underline{A} \vdash \pi^0_{\vec{x},x} : B\langle j\rangle : \underline{B}$, from which we deduce $(x_i : A_i\langle u_i\rangle_j : \underline{A_i})_{i=1}^{n} \vdash \lambda x^{\underline{A}}\,\pi^0_{\vec{x},x} : (A\langle u\rangle_j, B\langle j\rangle) : \underline{A} \Rightarrow \underline{B}$, which is the required judgment since $\pi_{\vec{x}} = \lambda x^{\underline{A}}\,\pi^0_{\vec{x},x}$ and $(A\langle u\rangle_j, B\langle j\rangle) = (A \Rightarrow_u B)\langle j\rangle$ as easily checked.

Assume last that $\pi$ ends with

$$\dfrac{\pi^1 \colon C_1^{u_1},\dots,C_n^{u_n} \vdash A \Rightarrow_u B \qquad \pi^2 \colon D_1^{v_1},\dots,D_n^{v_n} \vdash A}{E_1^{w_1},\dots,E_n^{w_n} \vdash B}$$

with: for each $i = 1,\dots,n$ there are two disjoint sets $L_i$ and $R_i$ such that $d(E_i) = L_i + R_i$, $C_i = E_i|_{L_i}$, $D_i = E_i|_{R_i}$, $w_i|_{L_i} = u_i$, and $w_i|_{R_i} = u \circ v_i$.

Let $j \in d(B)$. By inductive hypothesis, the judgment $(x_i : C_i\langle u_i\rangle_j : \underline{C_i})_{i=1}^{n} \vdash \pi^1_{\vec{x}} : (A \Rightarrow_u B)\langle j\rangle : \underline{A} \Rightarrow \underline{B}$ is derivable in the intersection type system. Let $K_j = u^{-1}(\{j\})$, which is a finite subset of $d(A)$. By inductive hypothesis again, for each $k \in K_j$ we have $(x_i : D_i\langle v_i\rangle_k : \underline{D_i})_{i=1}^{n} \vdash \pi^2_{\vec{x}} : A\langle k\rangle : \underline{A}$. Now observe that $(A \Rightarrow_u B)\langle j\rangle = ([\,A\langle k\rangle \mid k \in K_j\,], B\langle j\rangle)$ so that

$$(x_i : C_i\langle u_i\rangle_j + \textstyle\sum_{k \in K_j} D_i\langle v_i\rangle_k : \underline{E_i})_{i=1}^{n} \vdash (\pi^1_{\vec{x}})\,\pi^2_{\vec{x}} : B\langle j\rangle : \underline{B}$$

is derivable in intersection types (remember that $\underline{C_i} = \underline{D_i} = \underline{E_i}$). Since $\pi_{\vec{x}} = (\pi^1_{\vec{x}})\,\pi^2_{\vec{x}}$, it will be sufficient to prove that

$$E_i\langle w_i\rangle_j = C_i\langle u_i\rangle_j + \textstyle\sum_{k \in K_j} D_i\langle v_i\rangle_k\,. \qquad (2)$$

For this, since $E_i\langle w_i\rangle_j = [\,E_i\langle l\rangle \mid w_i(l) = j\,]$, consider an element $l$ of $d(E_i)$ such that $w_i(l) = j$. There are two possibilities: (1) either $l \in L_i$, and in that case we know that $E_i\langle l\rangle = C_i\langle l\rangle$ since $E_i|_{L_i} = C_i$, and moreover we have $u_i(l) = w_i(l) = j$; (2) or $l \in R_i$. In that case we have $E_i\langle l\rangle = D_i\langle l\rangle$ since $E_i|_{R_i} = D_i$. Moreover $u(v_i(l)) = w_i(l) = j$ and hence $v_i(l) \in K_j$. Therefore

$$[\,E_i\langle l\rangle \mid l \in L_i \text{ and } w_i(l) = j\,] = [\,C_i\langle l\rangle \mid u_i(l) = j\,] = C_i\langle u_i\rangle_j$$
$$[\,E_i\langle l\rangle \mid l \in R_i \text{ and } w_i(l) = j\,] = [\,D_i\langle l\rangle \mid v_i(l) \in K_j\,] = \textstyle\sum_{k \in K_j} D_i\langle v_i\rangle_k$$

and (2) follows.

Theorem 3 (Completeness). Let $J \subseteq I$. Let $M$ be a $\lambda$-term and $x_1,\dots,x_n$ be pairwise distinct variables, such that $(x_i : m_i^j : \sigma_i)_{i=1}^{n} \vdash M : b_j : \tau$ in the intersection type system for all $j \in J$. Let $A_1,\dots,A_n$ and $B$ be formulas and let $u_1,\dots,u_n$ be almost injective functions such that $u_i : d(A_i) \to J = d(B)$. Assume also that $\underline{A_i} = \sigma_i$ for each $i = 1,\dots,n$ and that $\underline{B} = \tau$. Last assume that, for all $j \in J$, one has $B\langle j\rangle = b_j$ and $A_i\langle u_i\rangle_j = m_i^j$ for $i = 1,\dots,n$. Then the judgment $(A_i^{u_i})_{i=1}^{n} \vdash B$ has a proof $\pi$ such that $\pi_{\vec{x}} \sim_\eta M$.

Proof. By induction on $M$. Assume first that $M = x_i$ for some $i \in \{1,\dots,n\}$. Then we must have $\tau = \sigma_i$, $m_q^j = [\,]$ for $q \ne i$ and $m_i^j = [\,b_j\,]$ for all $j \in J$. Therefore $d(A_q) = \emptyset$ and $u_q$ is the empty function for $q \ne i$, $u_i$ is a bijection $d(A_i) \to J$ and $\forall k \in d(A_i)$ $A_i\langle k\rangle = b_{u_i(k)}$, in other words $u_{i*}(A_i) \sim B$. By Theorem 1 we know that the judgment $(u_{i*}(A_i))^{\mathrm{Id}} \vdash B$ is provable in LJ(I) with a proof $\rho$ such that $\rho_x \sim_\eta x$. We have a proof $\theta$ of $(A_q^{u_q})_{q=1}^{n} \vdash u_{i*}(A_i)$ which consists of an axiom, so that $\theta_{\vec{x}} = x_i$, and hence by Lemma 6 we have a proof $\pi$ of $(A_i^{u_i})_{i=1}^{n} \vdash B$ such that $\pi_{\vec{x}} = \rho_x[\theta_{\vec{x}}/x] \sim_\eta x_i$.

Assume that $M = \lambda x\,N$, that $\tau = (\sigma \Rightarrow \varphi)$ and that we have a family of deductions (for $j \in J$) of $(x_i : m_i^j : \sigma_i)_{i=1}^{n} \vdash M : (m^j, c_j) : \sigma \Rightarrow \varphi$ with $b_j = (m^j, c_j)$, the premise of this conclusion in each of these deductions being $(x_i : m_i^j : \sigma_i)_{i=1}^{n}, x : m^j : \sigma \vdash N : c_j : \varphi$. We must have $B = (C \Rightarrow_u D)$ with $\underline{D} = \varphi$, $\underline{C} = \sigma$, $d(D) = J$, $u : d(C) \to d(D)$ almost injective, $D\langle j\rangle = c_j$ and $[\,C\langle k\rangle \mid k \in d(C) \text{ and } u(k) = j\,] = m^j$, that is $C\langle u\rangle_j = m^j$, for each $j \in J$. By inductive hypothesis we have a proof $\rho$ of $(A_i^{u_i})_{i=1}^{n}, C^{u} \vdash D$ such that $\rho_{\vec{x},x} \sim_\eta N$, from which we obtain a proof $\pi$ of $(A_i^{u_i})_{i=1}^{n} \vdash C \Rightarrow_u D$ such that $\pi_{\vec{x}} = \lambda x\,\rho_{\vec{x},x} \sim_\eta M$ as expected.

Assume last that $M = (N)\,P$ and that we have a $J$-indexed family of deductions $(x_i : m_i^j : \sigma_i)_{i=1}^{n} \vdash M : b_j : \tau$. Let $A_1,\dots,A_n$, $u_1,\dots,u_n$ and $B$ be LJ(I) formulas and almost injective functions as in the statement of the theorem. Let $j \in J$. There is a finite set $L_j \subseteq I$ and multisets $m_i^{j,0}$, $(m_i^{j,l})_{l \in L_j}$ such that we have deductions of $(x_i : m_i^{j,0} : \sigma_i)_{i=1}^{n} \vdash N : ([\,a_l \mid l \in L_j\,], b_j) : \sigma \Rightarrow \tau$ (notice that our $\lambda$-calculus is in Church style and hence the type $\sigma$ is uniquely determined by the sub-term $N$ of $M$) and, for each $l \in L_j$, of $(x_i : m_i^{j,l} : \sigma_i)_{i=1}^{n} \vdash P : a_l : \sigma$ with

$$m_i^j = m_i^{j,0} + \textstyle\sum_{l \in L_j} m_i^{j,l}\,. \qquad (3)$$

We assume the finite sets $L_j$ to be pairwise disjoint (this is possible because $I$ is infinite) and we use $L$ for their union. Let $u : L \to J$ be the function which maps $l \in L$ to the unique $j$ such that $l \in L_j$; this function is almost injective. Let $A$ be an LJ(I) formula such that $\underline{A} = \sigma$, $d(A) = L$ and $A\langle l\rangle = a_l$; such a formula exists by Proposition 1. Let $i \in \{1,\dots,n\}$. For each $j \in J$ we know that

$$[\,A_i\langle r\rangle \mid r \in d(A_i) \text{ and } u_i(r) = j\,] = m_i^j = m_i^{j,0} + \textstyle\sum_{l \in L_j} m_i^{j,l}$$

and hence we can split the set $d(A_i) \cap u_i^{-1}(\{j\})$ into disjoint subsets $R_i^{j,0}$ and $(R_i^{j,l})_{l \in L_j}$ in such a way that

$$[\,A_i\langle r\rangle \mid r \in R_i^{j,0}\,] = m_i^{j,0} \quad\text{and}\quad \forall l \in L_j\ [\,A_i\langle r\rangle \mid r \in R_i^{j,l}\,] = m_i^{j,l}\,.$$

We set $R_i^0 = \sum_{j \in J} R_i^{j,0}$; observe that this is a disjoint union because $R_i^{j,0} \subseteq u_i^{-1}(\{j\})$. Similarly we define $R_i^1 = \sum_{l \in L} R_i^{u(l),l}$, which is a disjoint union for the following reason: if $l, l' \in L_j$ satisfy $u(l) = u(l') = j$ then $R_i^{j,l}$ and $R_i^{j,l'}$ have been chosen disjoint, and if $u(l) = j$ and $u(l') = j'$ with $j \ne j'$ we have $R_i^{j,l} \subseteq u_i^{-1}(\{j\})$ and $R_i^{j',l'} \subseteq u_i^{-1}(\{j'\})$. Let $v_i : R_i^1 \to L$ be defined by: $v_i(r)$ is the unique $l \in L$ such that $r \in R_i^{u(l),l}$. Since each $R_i^{j,l}$ is finite the function $v_i$ is almost injective. Moreover $u \circ v_i = u_i|_{R_i^1}$.

We use $u_i^0$ for the restriction of $u_i$ to $R_i^0$, so that $u_i^0 : R_i^0 \to J$. By inductive hypothesis we have $((A_i|_{R_i^0})^{u_i^0})_{i=1}^{n} \vdash A \Rightarrow_u B$ with a proof $\mu$ such that $\mu_{\vec{x}} \sim_\eta N$. Indeed $[\,A_i|_{R_i^0}\langle r\rangle \mid r \in R_i^0 \text{ and } u_i(r) = j\,] = m_i^{j,0}$ and $(A \Rightarrow_u B)\langle j\rangle = ([\,a_l \mid u(l) = j\,], b_j)$ for each $j \in J$. For the same reason we have $((A_i|_{R_i^1})^{v_i})_{i=1}^{n} \vdash A$ with a proof $\rho$ such that $\rho_{\vec{x}} \sim_\eta P$. Indeed for each $l \in L = d(A)$ we have $[\,A_i|_{R_i^1}\langle r\rangle \mid v_i(r) = l\,] = m_i^{j,l}$ and $A\langle l\rangle = a_l$, where $j = u(l)$. By an application rule we get a proof $\pi$ of $(A_i^{u_i})_{i=1}^{n} \vdash B$ such that $\pi_{\vec{x}} = (\mu_{\vec{x}})\,\rho_{\vec{x}} \sim_\eta (N)\,P = M$ as contended.

4 The untyped Scott case

Since intersection types usually apply to the pure $\lambda$-calculus, we move now to this setting by choosing in $\mathbf{Rel}_!$ the set $R_\infty$ as model of the pure $\lambda$-calculus. The $R_\infty$ intersection typing system has the elements of $R_\infty$ as types, and the typing rules involve sequents of shape $(x_i : m_i)_{i=1}^{n} \vdash M : a$ where $m_i \in \mathcal{M}_{\mathrm{fin}}(R_\infty)$ and $a \in R_\infty$.

We use $\Lambda$ for the set of terms of the pure $\lambda$-calculus, and $\Lambda_\Omega$ for the pure $\lambda$-calculus extended with a constant $\Omega$ subject to the two following reduction rules: $\lambda x\,\Omega \to_\omega \Omega$ and $(\Omega)\,M \to_\omega \Omega$. We use $\sim_{\eta\omega}$ for the least congruence on $\Lambda_\Omega$ which contains $\to_\eta$ and $\to_\omega$, and similarly for $\sim_{\beta\eta\omega}$. We define a family $(\mathcal{H}(x))_{x \in \mathcal{V}}$ of subsets of $\Lambda_\Omega$, minimal such that, for any sequences $\vec{x} = (x_1,\dots,x_n)$ and $\vec{y} = (y_1,\dots,y_k)$ such that $x, \vec{x}, \vec{y}$ is repetition-free, and for any terms $M_i \in \mathcal{H}(x_i)$ (for $i = 1,\dots,n$), one has $\lambda\vec{x}\,\lambda\vec{y}\,(x)\,M_1 \cdots M_n\,O_1 \cdots O_l \in \mathcal{H}(x)$, where $O_j \sim_\omega \Omega$ for $j = 1,\dots,l$. Notice that $x \in \mathcal{H}(x)$. The typing rules of $R_\infty$ are

$$\dfrac{}{x_1 : [\,],\dots,x_i : [\,a\,],\dots,x_n : [\,] \vdash x_i : a} \qquad \dfrac{\Phi, x : m \vdash M : a}{\Phi \vdash \lambda x\,M : (m, a)}$$
$$\dfrac{\Phi \vdash M : ([\,a_1,\dots,a_k\,], b) \qquad (\Phi_j \vdash N : a_j)_{j=1}^{k}}{\Phi + \sum_{j=1}^{k} \Phi_j \vdash (M)\,N : b}$$

where we use the following convention: when we write $\Phi + \Psi$ it is assumed that $\Phi$ is of shape $(x_i : m_i)_{i=1}^{n}$ and $\Psi$ is of shape $(x_i : p_i)_{i=1}^{n}$, and then $\Phi + \Psi$ is $(x_i : m_i + p_i)_{i=1}^{n}$. This typing system is just a "proof-theoretic" rephrasing of the denotational semantics of the terms of $\Lambda_\Omega$ in $R_\infty$.

Proposition 2. Let $M, M' \in \Lambda_\Omega$ and $\vec{x} = (x_1,\dots,x_n)$ be a list of pairwise distinct variables containing all the free variables of $M$ and $M'$. Let $m_i \in \mathcal{M}_{\mathrm{fin}}(R_\infty)$ for $i = 1,\dots,n$ and $b \in R_\infty$. If $M \sim_{\beta\eta\omega} M'$ then $(x_i : m_i)_{i=1}^{n} \vdash M : b$ iff $(x_i : m_i)_{i=1}^{n} \vdash M' : b$.

4.1 Formulas

We define the associated formulas as follows, each formula $A$ being given together with $d(A) \subseteq I$ and $A\langle\cdot\rangle \in R_\infty^{d(A)}$.
– If $J \subseteq I$ then $\varepsilon_J$ is a formula with $d(\varepsilon_J) = J$ and $\varepsilon_J\langle j\rangle = e$ for $j \in J$;
– and if $A$ and $B$ are formulas and $u : d(A) \to d(B)$ is almost injective then $A \Rightarrow_u B$ is a formula with $d(A \Rightarrow_u B) = d(B)$ and $(A \Rightarrow_u B)\langle j\rangle = ([\,A\langle k\rangle \mid u(k) = j\,], B\langle j\rangle) \in R_\infty$.

We can consider that there is a type $o$ of pure $\lambda$-terms interpreted as $R_\infty$ in $\mathbf{Rel}_!$, such that $(o \Rightarrow o) = o$, and then for any formula $A$ we have $\underline{A} = o$. Operations of restriction and relocation of formulas are the same as in Section 3 (setting $\varepsilon_J|_K = \varepsilon_{J \cap K}$) and satisfy the same properties, for instance $A|_K\langle\cdot\rangle = A\langle\cdot\rangle|_K$, and one sets $u_*(\varepsilon_J) = \varepsilon_K$ if $u : J \to K$ is a bijection.

The deduction rules are exactly the same as those of Section 3, plus the axiom $\vdash \varepsilon_\emptyset$. With any deduction $\pi$ of $(A_i^{u_i})_{i=1}^{n} \vdash B$ and sequence $\vec{x} = (x_1,\dots,x_n)$ of pairwise distinct variables, we can associate a pure $\lambda$-term $\pi_{\vec{x}} \in \Lambda_\Omega$ defined exactly as in Section 3 (just drop the types associated with variables in abstractions). If $\pi$ consists of an instance of the additional axiom, we set $\pi_{\vec{x}} = \Omega$.

Lemma 7. Let $A, A_1,\dots,A_n$ be formulas such that $d(A) = d(A_i) = \emptyset$. Then $(A_i^{\emptyset})_{i=1}^{n} \vdash A$ is provable by a proof $\pi$ which satisfies $\pi_{x_1,\dots,x_n} \sim_\omega \Omega$.

The proof is a straightforward induction on $A$, using the additional axiom, Lemma 1 and the observation that if $d(B \Rightarrow_u C) = \emptyset$ then $u$ is the empty function.

One can easily define a size function $sz : R_\infty \to \mathbb{N}$ such that $sz(e) = 0$ and $sz(([\,a_1,\dots,a_k\,], a)) = sz(a) + \sum_{i=1}^{k}(1 + sz(a_i))$. First we have to prove an adapted version of Proposition 1; here it will be restricted to finite sets.

Proposition 3. Let $J$ be a finite subset of $I$ and $f \in R_\infty^J$. There is a formula $A$ such that $d(A) = J$ and $A\langle\cdot\rangle = f$.

Proof. Observe that, since $J$ is finite, there is an $N \in \mathbb{N}$ such that $\forall j \in J\ \forall q \in \mathbb{N}$, $q \ge N \Rightarrow f(j)_q = [\,]$ (remember that $f(j)_q \in \mathcal{M}_{\mathrm{fin}}(R_\infty)$). Let $N(f)$ be the least such $N$. We set $sz(f) = \sum_{j \in J} sz(f(j))$ and the proof is by induction on $(sz(f), N(f))$, lexicographically.

If $sz(f) = 0$ this means that $f(j) = e$ for all $j \in J$ and hence we can take $A = \varepsilon_J$. Assume that $sz(f) > 0$; one can write $f(j) = (m_j, a_j)$ with $m_j \in \mathcal{M}_{\mathrm{fin}}(R_\infty)$ and $a_j \in R_\infty$ for each $j \in J$ (this is actually also possible if $sz(f) = 0$). Just as in the proof of Proposition 1, we choose a set $K$, a function $g : K \to R_\infty$ and an almost injective function $u : K \to J$ such that $m_j = [\,g(k) \mid u(k) = j\,]$. The set $K$ is finite since $J$ is, and we have $sz(g) < sz(f)$ because $sz(f) > 0$. Therefore by inductive hypothesis there is a formula $B$ such that $d(B) = K$ and $B\langle\cdot\rangle = g$. Let $f' : J \to R_\infty$ be defined by $f'(j) = a_j$; we have $sz(f') \le sz(f)$ and $N(f') < N(f)$, and hence by inductive hypothesis there is a formula $C$ such that $C\langle\cdot\rangle = f'$. We set $A = (B \Rightarrow_u C)$, which satisfies $A\langle\cdot\rangle = f$ as required.

Theorem 1 still holds up to some mild adaptation. First notice that $A \sim B$ simply means now that $d(A) = d(B)$ and $A\langle\cdot\rangle = B\langle\cdot\rangle$.

Theorem 4. If $A$ and $B$ are such that $A \sim B$ then $A^{\mathrm{Id}} \vdash B$ with a proof $\pi$ which satisfies $\pi_x \in \mathcal{H}(x)$.

Proof. By induction on the sum of the sizes of $A$ and $B$. Assume that $A = \varepsilon_J$, so that $d(B) = J$ and $\forall j \in J$ $B\langle j\rangle = e$. There are two cases as to $B$. In the first case $B$ is of shape $\varepsilon_K$, but then we must have $K = J$ and we can take for $\pi$ an axiom, so that $\pi_x = x \in \mathcal{H}(x)$. Otherwise we have $B = (C \Rightarrow_u D)$ with $d(D) = J$, $\forall j \in J$ $D\langle j\rangle = e$ and $d(C) = \emptyset$, so that $u = 0_J$. We have $A \sim D$ and hence by inductive hypothesis we have a proof $\rho$ of $A^{\mathrm{Id}} \vdash D$ such that $\rho_x \in \mathcal{H}(x)$. By weakening and $\Rightarrow$-introduction we get a proof $\pi$ of $A^{\mathrm{Id}} \vdash B$ which satisfies $\pi_x = \lambda y\,\rho_x \in \mathcal{H}(x)$.

Assume that $A = (C \Rightarrow_u D)$. If $B = \varepsilon_J$ then we must have $d(C) = \emptyset$, $u = 0_J$ and $D \sim B$, and hence by inductive hypothesis we have a proof $\rho$ of $D^{\mathrm{Id}} \vdash B$ such that $\rho_x \in \mathcal{H}(x)$. By Lemma 7 there is a proof $\theta$ of $A|_\emptyset^{\emptyset} \vdash C$ such that $\theta_x \sim_\omega \Omega$. Hence there is a proof $\pi$ of $A^{\mathrm{Id}} \vdash B$ such that $\pi_x = \rho_y[(x)\,\theta_x/y] \in \mathcal{H}(x)$.

Assume last that $B = (E \Rightarrow_v F)$; then we must have $D \sim F$ and there must be a bijection $w : d(E) \to d(C)$ such that $u \circ w = v$ and $w_*(E) \sim C$. We reason as in the proof of Theorem 1: by inductive hypothesis we have a proof $\rho$ of $D^{\mathrm{Id}} \vdash F$ and a proof $\mu$ of $w_*(E)^{\mathrm{Id}} \vdash C$, from which we build a proof $\pi$ of $A^{\mathrm{Id}} \vdash B$ such that $\pi_x = \lambda y\,\rho_z[(x)\,\mu_y/z] \in \mathcal{H}(x)$ by inductive hypothesis.

Theorem 5 (Soundness). Let $\pi$ be a deduction tree of $A_1^{u_1},\dots,A_n^{u_n} \vdash B$ and $\vec{x}$ a sequence of $n$ pairwise distinct variables. Then the $\lambda$-term $\pi_{\vec{x}} \in \Lambda_\Omega$ satisfies $(x_i : A_i\langle u_i\rangle_j)_{i=1}^{n} \vdash \pi_{\vec{x}} : B\langle j\rangle$ in the $R_\infty$ intersection type system, for each $j \in d(B)$.

The proof is exactly the same as that of Theorem 2, dropping all simple types.

For every $\lambda$-term $M \in \Lambda$, we define $\mathcal{H}_\Omega(M)$ as the least subset of $\Lambda_\Omega$ such that:
– if $O \in \Lambda_\Omega$ and $O \sim_\omega \Omega$ then $O \in \mathcal{H}_\Omega(M)$ for all $M \in \Lambda$;
– if $M = x$ then $\mathcal{H}(x) \subseteq \mathcal{H}_\Omega(M)$;
– if $M = \lambda y\,N$ and $N' \in \mathcal{H}_\Omega(N)$ then $\lambda y\,N' \in \mathcal{H}_\Omega(M)$;
– if $M = (N)\,P$, $N' \in \mathcal{H}_\Omega(N)$ and $P' \in \mathcal{H}_\Omega(P)$ then $(N')\,P' \in \mathcal{H}_\Omega(M)$.

The elements of $\mathcal{H}_\Omega(M)$ can probably be seen as approximates of $M$.

Theorem 6 (Completeness). Let $J \subseteq I$ be finite. Let $M \in \Lambda_\Omega$ and $x_1,\dots,x_n$ be pairwise distinct variables, such that $(x_i : m_i^j)_{i=1}^{n} \vdash M : b_j$ in the $R_\infty$ intersection type system for all $j \in J$. Let $A_1,\dots,A_n$ and $B$ be formulas and let $u_1,\dots,u_n$ be almost injective functions such that $u_i : d(A_i) \to J = d(B)$. Assume also that, for all $j \in J$, one has $B\langle j\rangle = b_j$ and $A_i\langle u_i\rangle_j = m_i^j$ for $i = 1,\dots,n$. Then the judgment $A_1^{u_1},\dots,A_n^{u_n} \vdash B$ has a proof $\pi$ such that $\pi_{\vec{x}} \in \mathcal{H}_\Omega(M)$.

The proof is very similar to that of Theorem 3.

5 Concluding remarks and acknowledgments

The results presented in this paper show that, at least in non-idempotent intersection types, the problem of knowing whether all elements of a given family of intersection types $(a_j)_{j \in J}$ are inhabited by a common $\lambda$-term can be reformulated logically: is it true that one (or equivalently, any) of the indexed formulas $A$ such that $d(A) = J$ and $\forall j \in J$ $A\langle j\rangle = a_j$ is provable in LJ(I)? Such a strong connection between intersection types and Indexed Linear Logic was already mentioned in the introduction of [2], but we never made it more explicit until now.

To conclude, we propose a typed $\lambda$-calculus à la Church to denote proofs of the LJ(I) system of Section 4. The syntax of pre-terms is given by $s, t, \dots ::= \Omega \mid x[J] \mid \lambda x^u{:}A\ s \mid (s)\,t$ where, in $x[J]$, $x$ is a variable and $J \subseteq I$ and, in $\lambda x^u{:}A\ s$, $u$ is an almost injective function from $d(A)$ to a set $J \subseteq I$. Given a pre-term $s$ and a variable $x$, the domain of $x$ in $s$ is the subset $\mathrm{dom}(x, s)$ of $I$ given by $\mathrm{dom}(x, x[J]) = J$, $\mathrm{dom}(x, y[J]) = \emptyset$ if $y \ne x$, $\mathrm{dom}(x, \lambda y^u{:}A\ s) = \mathrm{dom}(x, s)$ (assuming of course $y \ne x$) and $\mathrm{dom}(x, (s)\,t) = \mathrm{dom}(x, s) \cup \mathrm{dom}(x, t)$. Then a pre-term $s$ is a term if any subterm of $s$ which is of shape $(s_1)\,s_2$ satisfies $\mathrm{dom}(x, s_1) \cap \mathrm{dom}(x, s_2) = \emptyset$ for all variables $x$. A typing judgment is an expression $(x_i : A_i^{u_i})_{i=1}^{n} \vdash s : B$ where the $x_i$'s are pairwise distinct variables, $s$ is a term and each $u_i$ is an almost injective function $d(A_i) \to d(B)$. The following typing rules exactly mimic the logical rules of LJ(I):

$$\dfrac{d(A) = \emptyset}{(x_i : A_i^{\emptyset})_{i=1}^{n} \vdash \Omega : A} \qquad \dfrac{q \ne i \Rightarrow d(A_q) = \emptyset \quad\text{and}\quad u_i \text{ bijection}}{(x_q : A_q^{u_q})_{q=1}^{n} \vdash x_i[d(A_i)] : u_{i*}(A_i)}$$
$$\dfrac{(x_i : A_i^{u_i})_{i=1}^{n}, x : A^{u} \vdash s : B}{(x_i : A_i^{u_i})_{i=1}^{n} \vdash \lambda x^u{:}A\ s : A \Rightarrow_u B}$$
$$\dfrac{(x_i : A_i|_{\mathrm{dom}(x_i,s)}^{v_i})_{i=1}^{n} \vdash s : A \Rightarrow_u B \qquad (x_i : A_i|_{\mathrm{dom}(x_i,t)}^{w_i})_{i=1}^{n} \vdash t : A}{(x_i : A_i^{v_i + (u \circ w_i)})_{i=1}^{n} \vdash (s)\,t : B}$$

The properties of this calculus, and more specifically of its $\beta$-reduction, and its connections with the resource calculus of [9], will be explored in further work. Another major objective will be to better understand the meaning of LJ(I) formulas, using ideas developed in [3], where a phase semantics is introduced and related to (non-uniform) coherence space semantics.
In the present intuitionistic setting, it is tempting to look for Kripke-like interpretations, with the hope of generalizing indexed logic beyond the (perhaps too) specific relational setting we started from. Last, we would like to thank Luigi Liquori and Claude Stolze for many helpful discussions on intersection types, and the referees for their careful reading and insightful comments and suggestions.

References

1. F. Breuvart, G. Manzonetto, and D. Ruoppolo. Relational graph models at work. Logical Methods in Computer Science, 14(3), 2018.
2. A. Bucciarelli and T. Ehrhard. On phase semantics and denotational semantics in multiplicative-additive linear logic. Annals of Pure and Applied Logic, 102(3):247–282, 2000.
3. A. Bucciarelli and T. Ehrhard. On phase semantics and denotational semantics: the exponentials. Annals of Pure and Applied Logic, 109(3):205–241, 2001.
4. M. Coppo and M. Dezani-Ciancaglini. An extension of the basic functionality theory for the λ-calculus. Notre Dame Journal of Formal Logic, 21(4):685–693, 1980.
5. M. Coppo, M. Dezani-Ciancaglini, and B. Venneri. Functional characters of solvable terms. Mathematical Logic Quarterly, 27(2-6):45–58, 1981.
6. D. de Carvalho. Execution time of lambda-terms via denotational semantics and intersection types. CoRR, abs/0905.4251, 2009.
7. D. de Carvalho. Execution time of λ-terms via denotational semantics and intersection types. MSCS, 28(7):1169–1203, 2018.
8. T. Ehrhard. The Scott model of linear logic is the extensional collapse of its relational model. Theoretical Computer Science, 424:20–45, 2012.
9. T. Ehrhard and L. Regnier. Uniformity and the Taylor expansion of ordinary lambda-terms. Theoretical Computer Science, 403(2-3):347–372, 2008.
10. T. S. Freeman and F. Pfenning. Refinement Types for ML. In D. S. Wise, editor, Proceedings of the ACM SIGPLAN'91 Conference on Programming Language Design and Implementation (PLDI), Toronto, Ontario, Canada, June 26-28, 1991, pages 268–277.
ACM, 1991.
11. J.-Y. Girard. Normal functors, power series and the λ-calculus. Annals of Pure and Applied Logic, 37:129–177, 1988.
12. J. R. Hindley. Coppo-Dezani types do not correspond to propositional logic. Theoretical Computer Science, 28:235–236, 1984.
13. J.-L. Krivine. Lambda-Calculus, Types and Models. Ellis Horwood Series in Computers and Their Applications. Ellis Horwood, 1993. Translation by René Cori from French 1990 edition (Masson).
14. L. Liquori and S. R. D. Rocca. Intersection-types à la Church. Information and Computation, 205(9):1371–1386, 2007.
15. L. Liquori and C. Stolze. The Delta-calculus: Syntax and Types. In H. Geuvers, editor, 4th International Conference on Formal Structures for Computation and Deduction, FSCD 2019, June 24-30, 2019, Dortmund, Germany, volume 131 of LIPIcs, pages 28:1–28:20. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2019.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

On Computability of Data Word Functions Defined by Transducers

Léo Exibard^{1,2}, Emmanuel Filiot^{1}, and Pierre-Alain Reynier^{2}

^{1} Université Libre de Bruxelles, Brussels, Belgium — leo.exibard@ulb.ac.be
^{2} Aix Marseille Univ, Université de Toulon, CNRS, LIS, Marseille, France

Abstract.
In this paper, we investigate the problem of synthesizing computable functions of infinite words over an infinite alphabet (data ω-words). The notion of computability is defined through Turing machines with infinite inputs which can produce the corresponding infinite outputs in the limit. We use non-deterministic transducers equipped with registers, an extension of register automata with outputs, to specify functions. Such transducers may not define functions but more generally relations of data ω-words, and we show that it is PSpace-complete to test whether a given transducer defines a function. Then, given a function defined by some register transducer, we show that it is decidable (and again, PSpace-c) whether such a function is computable. As for the known finite alphabet case, we show that computability and continuity coincide for functions defined by register transducers, and show how to decide continuity. We also define a subclass for which those problems are PTime.

Keywords: Data Words · Register Automata · Register Transducers · Functionality · Continuity · Computability.

1 Introduction

Context. Program synthesis aims at deriving, in an automatic way, a program that fulfils a given specification. Such a setting is very appealing when, for instance, the specification describes, in some abstract formalism (an automaton or ideally a logic), important properties that the program must satisfy. The synthesised program is then correct-by-construction with regard to those properties. It is particularly important and desirable for the design of safety-critical systems with hard dependability constraints, which are notoriously hard to design correctly. Program synthesis is hard to realise for general-purpose programming languages, but important progress has been made recently in the automatic synthesis

(A version with full proofs can be found at https://arxiv.org/abs/2002.08203.)
(Funded by a FRIA fellowship from the F.R.S.-FNRS.)
(Research associate of F.R.S.-FNRS.)
(Supported by the ARC Project Transform Fédération Wallonie-Bruxelles and the FNRS CDR J013116F; MIS F451019F projects.)
(Partly funded by the ANR projects DeLTA (ANR-16-CE40-0007) and Ticktac (ANR-18-CE40-0015).)

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 217–236, 2020. https://doi.org/10.1007/978-3-030-45231-5_12

of reactive systems. In this context, the system continuously receives input signals to which it must react by producing output signals. Such systems are not assumed to terminate and their executions are usually modelled as infinite words over the alphabets of input and output signals. A specification is thus a set of pairs (in, out), where in and out are infinite words, such that out is a legitimate output for in. Most methods for reactive system synthesis only work for synchronous systems over finite sets of input and output signals Σ and Γ. In this synchronous setting, input and output signals alternate, and thus implementations of such a specification are defined by means of synchronous transducers, which are Büchi automata with transitions of the form (q, σ, γ, q′), expressing that in state q, when getting input σ ∈ Σ, output γ ∈ Γ is produced and the machine moves to state q′. We aim at building deterministic implementations, in the sense that the output γ and state q′ uniquely depend on q and σ. The realisability problem of specifications given as synchronous non-deterministic transducers, by implementations defined by synchronous deterministic transducers, is known to be decidable [14,20]. In this paper, we are interested in the asynchronous setting, in which transducers can produce none or several outputs at once every time some input is read, i.e., transitions are of the form (q, σ, w, q′) where w ∈ Γ*. However, such a generalisation makes the realisability problem undecidable [2,9].
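The synchronous transducers described above are easy to picture concretely. The following minimal sketch (with made-up states and signals, not an example from the paper) renders transitions (q, σ, γ, q′) as a Python dictionary and runs a deterministic transducer on a finite prefix of its input stream:

```python
# A deterministic synchronous transducer: transitions of the form
# (q, sigma) -> (gamma, q'), i.e. in state q, on input sigma,
# emit gamma and move to q'.  States and signals here are invented.
DELTA = {
    ("q0", "req"): ("grant", "q1"),
    ("q0", "idle"): ("idle", "q0"),
    ("q1", "req"): ("wait", "q1"),
    ("q1", "idle"): ("idle", "q0"),
}

def run(delta, q0, inputs):
    """Run the transducer on a finite prefix of the input stream,
    returning the produced output prefix (one output per input)."""
    q, out = q0, []
    for sigma in inputs:
        gamma, q = delta[(q, sigma)]
        out.append(gamma)
    return out
```

Determinism is exactly the property that `delta` is a function of `(q, sigma)`; on an ω-word, `run` would simply never terminate and the output is produced in lock-step with the input.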
Synthesis of Transducers with Registers. In the setting we just described, the set of signals is considered to be finite. This assumption is not realistic in general, as signals may come with unbounded information (e.g. process ids) that we call here data. To address this limitation, recent works have considered the synthesis of reactive systems processing data words [17,6,16,7]. Data words are infinite words over an alphabet Σ × D, where Σ is a finite set and D is a possibly infinite countable set. To handle data words, just as automata have been extended to register automata, transducers have been extended to register transducers. Such transducers are equipped with a finite set of registers in which they can store data and with which they can compare data for equality or inequality. While the realisability problem of specifications given as synchronous non-deterministic register transducers (NRT_syn) by implementations defined by synchronous deterministic register transducers (DRT_syn) is undecidable, decidability is recovered for specifications defined by universal register transducers and by giving as input the number of registers the implementation must have [7,17].

Computable Implementations. In the previously mentioned works, both for finite or infinite alphabets, implementations are considered to be deterministic transducers. Such an implementation is guaranteed to use only a constant amount of memory (assuming data have size O(1)). While it makes sense with regard to memory-efficiency, some problems turn out to be undecidable, as already mentioned: realisability of NRT_syn specifications by DRT_syn, or, in the finite alphabet setting, when both the specification and implementation are asynchronous.
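The register mechanics (store a data value, then compare or reuse it), and the constant-memory character of deterministic implementations, can be made concrete with a small illustrative sketch, not a model from the paper: a transformation computable by a single-register deterministic transducer, written directly as a Python generator.

```python
def stamp_with_first(stream):
    """A 1-register transducer written as a generator: store the
    first data value in register r, then output each (label, data)
    pair with its data replaced by the content of r.
    Illustrative only: memory use is one register, regardless of
    how long the input prefix is."""
    r = None
    for label, data in stream:
        if r is None:      # assignment: store the current data in r
            r = data
        yield (label, r)   # output uses the register content
```

A nondeterministic register transducer would instead allow several transitions per (state, label, test) and accept with a Büchi condition, as formalised later in Section 2.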
In this paper, we propose to study computable implementations, in the sense of (partial) functions f of data ω-words computable by some Turing machine M that has an infinite input x ∈ dom(f), and produces longer and longer prefixes of the output f(x) as it reads longer and longer prefixes of the input x. Therefore, such a machine produces the output f(x) in the limit. We denote by TM the class of Turing machines computing functions in this sense. As an example, consider the function f that takes as input any data ω-word u = (σ₁,d₁)(σ₂,d₂)… and outputs (σ₁,d₁) if d₁ occurs at least twice in u, and otherwise outputs u. This function is not computable, as a hypothetical machine could not output anything as long as d₁ is not met a second time. However, the following function g is computable. It is defined only on words (σ₁,d₁)(σ₂,d₂)… such that σ₁σ₂⋯ ∈ ((a + b)c*)^ω, and transforms any (σᵢ,dᵢ) into (σᵢ,d₁) if the next symbol in {a, b} is an a; otherwise it keeps (σᵢ,dᵢ) unchanged. To compute it, a TM would need to store d₁, and then wait until the next symbol in {a, b} is met before outputting something. Since the input labels are necessarily in ((a + b)c*)^ω, this machine will produce the whole output in the limit. Note that g cannot be defined by any deterministic register transducer, as it needs unbounded memory to be implemented. However, already in the finite alphabet setting, the problem of deciding if a specification given as some non-deterministic synchronous transducer is realisable by some computable function is open. The particular case of realisability by computable functions of universal domain (the set of all ω-words) is known to be decidable [12].
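The computable function g discussed above can be implemented in exactly the incremental style sketched there: store d₁, buffer pairs while reading c's, and resolve each buffered pair when the next {a, b} label arrives. The following Python sketch works on a finite input prefix (the real object is an ω-word read in the limit), under the reading that the "next symbol in {a, b}" of position i is the first such label strictly after i:

```python
def compute_g(stream):
    """Incremental computation of the function g from the text:
    labels follow ((a+b)c*)^omega; a pair (sigma_i, d_i) becomes
    (sigma_i, d_1) if the next label in {a, b} after position i
    is an 'a', and stays unchanged otherwise.  Like the Turing
    machine sketched in the text, we store d_1 and buffer pairs
    until the next {a, b} label resolves their output."""
    d1, buffer = None, []
    for sigma, d in stream:
        if d1 is None:
            d1 = d
        if sigma in ("a", "b"):
            # for every buffered pair, this sigma is its "next {a,b} label"
            for s, e in buffer:
                yield (s, d1) if sigma == "a" else (s, e)
            buffer = []
        buffer.append((sigma, d))
```

Only the pairs whose fate is not yet determined are retained, which is why the machine produces the whole output in the limit on its domain.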
In the asynchronous setting, the undecidability proof of [2] can be easily adapted to show the undecidability of realisability of specifications given by non-deterministic (asynchronous) transducers by computable functions.

Functional Specifications. As said before, a specification is in general a relation from inputs to outputs. If this relation is a function, we call it functional. Due to the negative results just mentioned about the synthesis of computable functions from non-functional specifications, we instead focus here on the case of functional specifications and address the following general question: given the specification of a function of data ω-words, is this function "implementable", where we define "implementable" as "being computable by some Turing machine"? Moreover, if it is implementable, then we want a procedure to automatically generate an algorithm that computes it. This raises another important question: how to decide whether a specification is functional? We investigate these questions for asynchronous register transducers, here simply called register transducers. This asynchrony allows for much more expressive power, but is a source of technical challenge.

Contributions. In this paper, we solve the questions mentioned before for the class of (asynchronous) non-deterministic register transducers (NRT). We also give fundamental results on this class. In particular, we prove that:
1. deciding whether an NRT defines a function is PSpace-complete,
2. deciding whether two functions defined by NRT are equal on the intersection of their domains is PSpace-complete,
3. the class of functions defined by NRT is effectively closed under composition,
4. computability and continuity are equivalent notions for functions defined by NRT, where continuity is defined using the classical Cantor distance,
5. deciding whether a function given as an NRT is computable is PSpace-complete,
6. those problems are in PTime for a subclass of NRT, called test-free NRT.
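Contribution 4 relies on the classical Cantor distance, under which two ω-words are close when they share a long common prefix. A small sketch of this distance on finite approximations (illustrative only; the paper works with genuine ω-words):

```python
def cantor_distance(u, v):
    """Standard Cantor distance, computed on finite prefixes:
    2^(-n) where n is the length of the longest common prefix
    of u and v, and 0 when the two given words coincide."""
    n = 0
    while n < len(u) and n < len(v) and u[n] == v[n]:
        n += 1
    if n == len(u) == len(v):
        return 0.0
    return 2.0 ** (-n)
```

Continuity of a function then means, as usual, that output prefixes are determined by sufficiently long input prefixes, which is the topological counterpart of the limit-style computability above.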
Finally, let us also mention that taking the class of deterministic register transducers (DRT for short), instead of computable functions, as a yardstick for the notion of being "implementable" for a function would yield undecidability. Indeed, given a function defined by some NRT, it is in general undecidable to check whether this function is realisable by some DRT, by a simple reduction from the universality problem of non-deterministic register automata [19].

Related Work The notion of continuity with regard to the Cantor distance is not new, and for rational functions over finite alphabets, it was already known to be decidable [21]. Its connection with computability for functions of ω-words over a finite alphabet has recently been investigated in [3] for one-way and two-way transducers. Our results lift some of theirs to the setting of data words. The model of test-free NRT can be seen as a one-way non-deterministic version of a model of two-way transducers considered in [5].

2 Data Words and Register Transducers

For a (possibly infinite) set S, we denote by S* (resp. S^ω) the set of finite (resp. infinite) words over this alphabet, and we let S^∞ = S* ∪ S^ω. For a word u = u_1...u_n, we denote by |u| = n its length and, by convention, |u| = ∞ for u ∈ S^ω. The empty word is denoted ε. For 1 ≤ i ≤ j ≤ |u|, we let u[i:j] = u_i u_{i+1}...u_j and u[i] = u[i:i] the ith letter of u. For u, v ∈ S^∞, we say that u is a prefix of v, written u ⊑ v, if there exists w ∈ S^∞ such that v = uw. In this case, we define u^{-1}v = w. For u, v ∈ S^∞, we say that u and v mismatch, written mismatch(u, v), when there exists a position i such that 1 ≤ i ≤ |u|, 1 ≤ i ≤ |v| and u[i] ≠ v[i]. Finally, for u, v ∈ S^∞, we denote by u ∧ v their longest common prefix, i.e. the longest word w ∈ S^∞ such that w ⊑ u and w ⊑ v.

Data Words In this paper, Σ and Γ are two finite alphabets and D is a countably infinite set of data. We use the letter σ (resp. γ, d) to denote elements of Σ (resp. Γ, D).
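The word operations just introduced (prefix ⊑, mismatch, longest common prefix ∧) are straightforward to implement on finite words; a minimal Python sketch over sequences, given purely for illustration:

```python
def is_prefix(u, v):
    """u ⊑ v: u is a prefix of v."""
    return len(u) <= len(v) and v[:len(u)] == u

def mismatch(u, v):
    """mismatch(u, v): some position within both words carries different letters."""
    return any(a != b for a, b in zip(u, v))

def lcp(u, v):
    """u ∧ v: the longest common prefix of u and v."""
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return u[:i]
```

Note that mismatch is not simply the negation of prefix: "ab" and "abc" do not mismatch even though they differ, which is exactly the distinction the definitions above make.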
We also distinguish an arbitrary data value d_0 ∈ D. Given a set R of registers, let τ_0^R be the constant function defined by τ_0^R(r) = d_0 for all r ∈ R. Given a finite alphabet A, a labelled data is a pair x = (a, d) ∈ A × D, where a is the label and d the data. We define the projections lab(x) = a and dt(x) = d. A data word over A and D is an infinite sequence of labelled data, i.e. a word w ∈ (A × D)^ω. We extend the projections lab and dt to data words naturally, i.e. lab(w) ∈ A^ω and dt(w) ∈ D^ω. A data word language is a subset L ⊆ (A × D)^ω. Note that here, data words are infinite; otherwise they are called finite data words.

2.1 Register Transducers

Register transducers are transducers recognising data word relations. They are an extension of finite transducers to data word relations, in the same way register automata [15] are an extension of finite automata to data word languages. Here, we define them over infinite data words with a Büchi acceptance condition, and allow multiple registers to contain the same data, with a syntax close to [18]. The current data can be compared for equality with the register contents via tests, which are symbolic and defined via Boolean formulas of the following form. Given a set R of registers, a test is a formula φ satisfying the following syntax:

φ ::= ⊤ | ⊥ | r^= | r^≠ | φ ∧ φ | φ ∨ φ | ¬φ    where r ∈ R

Given a valuation τ: R → D, a test φ and a data d, we denote by τ, d ⊨ φ the satisfaction of φ by d in valuation τ, defined as τ, d ⊨ r^= if τ(r) = d and τ, d ⊨ r^≠ if τ(r) ≠ d. The Boolean combinators behave as usual. We denote by Tst_R the set of (symbolic) tests over R.

Definition 1. A non-deterministic register transducer (NRT) is a tuple T = (Q, R, i_0, F, Δ), where Q is a finite set of states, i_0 ∈ Q is the initial state, F ⊆ Q is the set of accepting states, R is a finite set of registers and Δ ⊆ Q × Σ × Tst_R × 2^R × (Γ × R)* × Q is a finite set of transitions.
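The satisfaction relation τ, d ⊨ φ can be sketched as a small recursive evaluator. The tuple encoding of tests below is our own choice, made only for illustration:

```python
def sat(tau, d, phi):
    """tau, d |= phi: does data value d satisfy test phi under valuation tau?
    Tests are encoded as tuples: ('true',), ('false',), ('eq', r) for r=,
    ('neq', r) for r≠, and ('and'|'or'|'not', ...) for Boolean combinators."""
    op = phi[0]
    if op == 'true':
        return True
    if op == 'false':
        return False
    if op == 'eq':                       # r=: the current data equals tau(r)
        return tau[phi[1]] == d
    if op == 'neq':                      # r≠: the current data differs from tau(r)
        return tau[phi[1]] != d
    if op == 'and':
        return sat(tau, d, phi[1]) and sat(tau, d, phi[2])
    if op == 'or':
        return sat(tau, d, phi[1]) or sat(tau, d, phi[2])
    if op == 'not':
        return not sat(tau, d, phi[1])
    raise ValueError(f"unknown test constructor: {op}")
```

For instance, with τ = {r_1 ↦ 5, r_2 ↦ 7}, the data 5 satisfies r_1^= ∧ r_2^≠, while 7 falsifies r_2^≠.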
We write q --σ,φ|asgn,o--> q' for (q, σ, φ, asgn, o, q') ∈ Δ (the subscript T is sometimes omitted). The semantics of a register transducer is given by a labelled transition system: we define L_T = (C, Λ, →), where C = Q × (R → D) is the set of configurations, Λ = (Σ × D) × (Γ × D)* is the set of labels, and we have, for all (q, τ), (q', τ') ∈ C and for all (l, w) ∈ Λ, that (q, τ) --(l,w)--> (q', τ') in L_T whenever there exists a transition q --σ',φ|asgn,o--> q' such that, writing l = (σ, d) and w = (γ_1, d_1)...(γ_n, d_n):

– (Matching labels) σ' = σ
– (Compatibility) d satisfies the test φ ∈ Tst_R, i.e. τ, d ⊨ φ.
– (Update) τ' is the successor register configuration of τ with regard to d and asgn: τ'(r) = d if r ∈ asgn, and τ'(r) = τ(r) otherwise.
– (Output) Writing o = (γ'_1, r_1)...(γ'_m, r_m), we have m = n and, for all 1 ≤ i ≤ n, γ_i = γ'_i and d_i = τ'(r_i).

Then, a run of T is an infinite sequence of configurations and transitions ρ = (q_0, τ_0) --(u_1,v_1)--> (q_1, τ_1) --(u_2,v_2)--> ··· in L_T. Its input is in(ρ) = u_1 u_2..., its output is out(ρ) = v_1 · v_2 ···. We also define its sequence of states st(ρ) = q_0 q_1... and its trace tr(ρ) = u_1 · v_1 · u_2 · v_2 ···. Such a run is initial if (q_0, τ_0) = (i_0, τ_0^R). It is final if it satisfies the Büchi condition, i.e. inf(st(ρ)) ∩ F ≠ ∅, where inf(st(ρ)) = {q ∈ Q | q = q_i for infinitely many i}. Finally, it is accepting if it is both initial and final. We then write (q_0, τ_0) --u|v--> to express that there is a final run ρ of T starting from (q_0, τ_0) such that in(ρ) = u and out(ρ) = v. In the whole paper, and unless stated otherwise, we always assume that the output of an accepting run is infinite (v ∈ (Γ × D)^ω), which can be ensured by a Büchi condition. A partial run is a finite prefix of a run. The notions of input, output and states are extended by taking the corresponding prefixes. We then write (q_0, τ_0) --u|v-->_T
(q_n, τ_n) to express that there is a partial run ρ of T starting from configuration (q_0, τ_0) and ending in configuration (q_n, τ_n) such that in(ρ) = u and out(ρ) = v. Finally, the relation represented by a transducer T is:

⟦T⟧ = {(u, v) ∈ (Σ × D)^ω × (Γ × D)^ω | there exists an accepting run ρ of T such that in(ρ) = u and out(ρ) = v}

Example 2. As an example, consider the register transducer T_rename depicted in Figure 1. It realises the following transformation: consider a setting in which we deal with logs of communications between a set of clients. Such a log is an infinite sequence of pairs consisting of a tag, chosen in some finite alphabet Σ, and the identifier of the client delivering this tag, chosen in some infinite set of data values. The transformation should modify the log as follows: for a given client that needs to be modified, each of its messages should now be associated with some new identifier. The transformation has to verify that this new identifier is indeed free, i.e. never used in the log. Before treating the log, the transformation receives as input the id of the client that needs to be modified (associated with the tag del), and then a sequence of identifiers (associated with the tag ch), ending with #. The transducer is non-deterministic as it has to guess which of these identifiers it can choose to replace the one of the client. In particular, observe that it may associate multiple output words with a same input if two such free identifiers exist.

[Figure 1 omitted.] Fig. 1. A register transducer T_rename. It has three registers r_1, r_2 and r_0 and four states. σ denotes any letter in Σ; r_1 stores the id of del and r_2 the chosen id of ch, while r_0 is used to output the last data value read as input. As we only assign data to single registers, we write r_i for the singleton assignment set {r_i}.
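The one-step semantics above (matching labels, compatibility, update, output) can be sketched directly. The encoding of transitions as tuples and of tests as Python predicates is an assumption made for illustration, not the paper's notation:

```python
def step(config, letter, transition):
    """One step of the labelled transition system of an NRT (sketch).
    config = (q, tau) with tau a dict from registers to data;
    letter = (sigma, d) is the current input labelled data;
    transition = (q_src, sigma_t, phi, asgn, o, q_dst), where phi is a
    predicate phi(tau, d) standing in for a symbolic test, asgn is a set of
    registers, and o is a list of (gamma, register) pairs.
    Returns ((q_dst, tau'), concrete_output) or None if the step is disabled."""
    (q, tau), (sigma, d) = config, letter
    q_src, sigma_t, phi, asgn, o, q_dst = transition
    if q_src != q or sigma_t != sigma or not phi(tau, d):
        return None                                   # labels or test fail
    tau2 = {r: (d if r in asgn else v) for r, v in tau.items()}  # update
    out = [(gamma, tau2[r]) for (gamma, r) in o]      # output uses tau', not tau
    return (q_dst, tau2), out
```

Note that the output is concretised with the updated valuation τ', mirroring the (Output) clause d_i = τ'(r_i): a register assigned on the current transition already outputs the freshly stored data.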
Finite Transducers Since we reduce the decision of continuity and functionality of NRT to that of finite transducers, let us introduce them: a finite transducer (NFT for short) is an NRT with 0 registers (i.e. R = ∅). Thus, its transition relation can be represented as Δ ⊆ Q × Σ × Γ* × Q. A direct extension of the construction of [15, Proposition 1] allows us to show that:

Proposition 3. Let T be an NRT with k registers, and let X ⊂ D be a finite subset of data. Then, ⟦T⟧ ∩ ((Σ × X)^ω × (Γ × X)^ω) is recognised by an NFT of exponential size, more precisely with O(|Q| × |X|^|R|) states.

2.2 Technical Properties of Register Automata

Although automata are simpler machines than transducers, we only use them as tools in our proofs, which is why we define them from transducers, and not the other way around. A non-deterministic register automaton, denoted NRA, is a transducer without outputs: its transition relation is Δ ⊆ Q × Σ × Tst_R × 2^R × {ε} × Q (simply represented as Δ ⊆ Q × Σ × Tst_R × 2^R × Q). The semantics are the same, except that we now lift the condition that the output v is infinite: necessarily, the output of an accepting run is ε. For A an NRA, we denote L(A) = {u ∈ (Σ × D)^ω | there exists an accepting run ρ of A over u}. In this section, we establish technical properties of NRA. Proposition 4, the so-called "indistinguishability property", was shown in the seminal paper by Kaminski and Francez [15, Proposition 1]. Their model differs in that they do not allow distinct registers to contain the same data, and in the corresponding test syntax, but their result easily carries over to our setting. It states that if an NRA accepts a data word, then this data word can be relabelled with data from any set containing d_0 and with at least k + 1 elements.
Indeed, at any point in time, the automaton can only store at most k data in its registers, so its notion of "freshness" is a local one, and forgotten data can thus be reused as fresh ones. Moreover, as the automaton only tests data for equality, their actual value does not matter, except for d_0, which is initially contained in the registers. Such a "small-witness" property is fundamental to NRA, and will be paramount in establishing decidability of functionality (Section 3) and computability (Section 4). We use it jointly with Lemma 5, which states that the interleaving of the traces of runs of an NRT can be recognised by an NRA, and Lemma 6, which expresses that an NRA can check whether interleaved words coincide on some bounded prefix, and/or mismatch before some given position.

Proposition 4 ([15]). Let A be an NRA with k registers. If L(A) ≠ ∅, then, for any X ⊆ D of size |X| ≥ k + 1 such that d_0 ∈ X, L(A) ∩ (Σ × X)^ω ≠ ∅.

The runs of a register transducer T can be flattened to their traces, so as to be recognised by an NRA. Those traces can then be interleaved, in order to be compared. The proofs of the following properties are straightforward. Let ρ_1 = (q_0, τ_0) --(u_1,u'_1)--> (q_1, τ_1) ... and ρ_2 = (p_0, μ_0) --(v_1,v'_1)--> (p_1, μ_1) ... be two runs of a transducer T. Then, we define their interleaving ρ_1 ⊗ ρ_2 = u_1 · u'_1 · v_1 · v'_1 · u_2 · u'_2 · v_2 · v'_2 ··· and L_⊗(T) = {ρ_1 ⊗ ρ_2 | ρ_1 and ρ_2 are accepting runs of T}.

Lemma 5. If T has k registers, then L_⊗(T) is recognised by an NRA with 2k registers.

Lemma 6. Let i, j ∈ N ∪ {∞}. We define M_i^j = {u_1 u'_1 v_1 v'_1 ··· | ∀k ≥ 1, u_k, v_k ∈ (Σ × D), u'_k, v'_k ∈ (Γ × D)*, ∀1 ≤ k ≤ j, v_k = u_k and |u'_1 · u'_2 ··· ∧ v'_1 · v'_2 ···| ≤ i}. Then, M_i^j is recognisable by an NRA with 2 registers, and with 1 register if i = ∞.

3 Functionality, Equivalence and Composition of NRT

In general, since they are non-deterministic, NRT may not define functions but relations, as illustrated by Example 2.
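The trace interleaving ρ_1 ⊗ ρ_2 used in Lemma 5, and throughout the proofs of this section, can be sketched on finite trace prefixes as follows (a traces-as-lists encoding chosen for illustration):

```python
def interleave(trace1, trace2):
    """rho1 ⊗ rho2: interleave two run traces, each given as a sequence of
    (input_letter, output_chunk) pairs, producing the flattened word
    u1 · u'1 · v1 · v'1 · u2 · u'2 · v2 · v'2 ···"""
    result = []
    for (u, u_out), (v, v_out) in zip(trace1, trace2):
        result.append(u)        # one input letter of rho1
        result.extend(u_out)    # its (possibly empty) output chunk
        result.append(v)        # one input letter of rho2
        result.extend(v_out)    # its output chunk
    return result
```

The point of the interleaving is that a single NRA, reading this flattened word, can simulate both runs and compare their inputs and outputs position by position, which is what Lemmas 5 and 6 exploit.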
In this section, we first show that deciding whether a given NRT defines a function is PSpace-complete, in which case we call it functional. We show, as a consequence, that testing whether two functional NRT define functions which coincide on their common domain is PSpace-complete. Finally, we show that functions defined by NRT are closed under composition. This is an appealing property in transducer theory, as it allows one to define complex functions by composing simple ones.

Example 7. As explained before, the transducer T_rename described in Example 2 is not functional. To gain functionality, one can reinforce the specification by considering that one gets at the beginning a list of k possible identifiers, and that one has to select the first one which is free, for some fixed k. This transformation is realised by the register transducer T_rename2 depicted in Figure 2 (for k = 2).

[Figure 2 omitted.] Fig. 2. An NRT T_rename2, with four registers r_1, r_2, r_3 and r_0 (the latter being used, as in Figure 1, to output the last read data). After reading the # symbol, it guesses whether the value of register r_2 appears in the suffix of the input word. If not, it goes to state 5 and replaces occurrences of r_1 by r_2. Otherwise, it moves to state 6, waiting for an occurrence of r_2, and replaces occurrences of r_1 by r_3.

Let us start with the functionality problem in the data-free case. It is already known that checking whether an NFT over ω-words is functional is decidable [13,11]. By relying on the pattern logic of [10], designed for transducers of finite words, it can be shown that it is decidable in NLogSpace.

Proposition 8. Deciding whether an NFT is functional is in NLogSpace.
The following theorem shows that a relation between data words defined by an NRT with k registers is a function iff its restriction to a set of at most 2k + 3 data is a function. As a consequence, functionality is decidable, as it reduces to the functionality problem of transducers over a finite alphabet.

Theorem 9. Let T be an NRT with k registers. Then, for all X ⊆ D of size |X| ≥ 2k + 3 such that d_0 ∈ X, we have that T is functional if and only if ⟦T⟧ ∩ ((Σ × X)^ω × (Γ × X)^ω) is functional.

Proof. The left-to-right direction is trivial. Now, assume T is not functional. Let x ∈ (Σ × D)^ω be such that there exist y, z ∈ (Γ × D)^ω such that y ≠ z and (x, y), (x, z) ∈ ⟦T⟧. Let i = |y ∧ z|. Then, consider the language L = {ρ_1 ⊗ ρ_2 | ρ_1 and ρ_2 are accepting runs of T, in(ρ_1) = in(ρ_2) and |out(ρ_1) ∧ out(ρ_2)| ≤ i}. Since, by Lemma 5, L_⊗(T) is recognised by an NRA with 2k registers and, by Lemma 6, M_i^∞ is recognised by an NRA with 2 registers, we get that L = L_⊗(T) ∩ M_i^∞ is recognised by an NRA with 2k + 2 registers. Now, L ≠ ∅ since, by letting ρ_1 and ρ_2 be the runs of T both with input x and with respective outputs y and z, we have that w = ρ_1 ⊗ ρ_2 ∈ L. Let X ⊆ D be such that |X| ≥ 2k + 3 and d_0 ∈ X. By Proposition 4, we get that L ∩ (Σ × X)^ω ≠ ∅. By letting w' = ρ'_1 ⊗ ρ'_2 ∈ L ∩ (Σ × X)^ω, and x' = in(ρ'_1) = in(ρ'_2), y' = out(ρ'_1) and z' = out(ρ'_2), we have that (x', y'), (x', z') ∈ ⟦T⟧ ∩ ((Σ × X)^ω × (Γ × X)^ω) and |y' ∧ z'| ≤ i, so, in particular, y' ≠ z' (since both are infinite words). Thus, ⟦T⟧ ∩ ((Σ × X)^ω × (Γ × X)^ω) is not functional.

As a consequence of Proposition 8 and Theorem 9, we obtain the following result. The lower bound is obtained by encoding non-emptiness of register automata, which is PSpace-complete [4].

Corollary 10. Deciding whether an NRT T is functional is PSpace-complete.

Hence, the following problem on the equivalence of NRT is decidable:

Theorem 11.
The problem of deciding, given two functions f, g defined by NRT, whether for all x ∈ dom(f) ∩ dom(g), f(x) = g(x), is PSpace-complete.

Proof. The formula ∀x ∈ dom(f) ∩ dom(g) · f(x) = g(x) is true iff the relation f ∪ g = {(x, y) | y = f(x) ∨ y = g(x)} is a function. The latter can be decided by testing whether the disjoint union of the transducers defining f and g defines a function, which is in PSpace by Corollary 10. To show hardness, we similarly reduce from the emptiness problem of an NRA A over finite words, just as in the proof of Corollary 10. In particular, the functions f_1 and f_2 defined in this proof (which have the same domain) are equal iff L(A) = ∅.

Note that under the promise that f and g have the same domain, the latter theorem implies that it is decidable to check whether the two functions are equal. However, checking dom(f) = dom(g) is undecidable, as the language-equivalence problem for non-deterministic register automata is undecidable, since, in particular, universality is undecidable [19].

Closure under composition is a desirable property for transducers, which holds in the data-free setting [1]. We show that it also holds for functional NRT.

Theorem 12. Let f, g be two functions defined by NRT. Then, their composition f ◦ g is (effectively) definable by some NRT.

Proof (Sketch). By f ◦ g we mean f ◦ g: x ↦ f(g(x)). Assume f and g are defined by T_f = (Q_f, R_f, q_0, F_f, Δ_f) and T_g = (Q_g, R_g, p_0, F_g, Δ_g) respectively. Wlog we assume that the input and output finite alphabets of T_f and T_g are all equal to Σ, and that R_f and R_g are disjoint. We construct T such that ⟦T⟧ = f ◦ g. The proof is similar to the data-free case, where the composition is shown via a product construction which simulates both transducers in parallel, executing the second on the output of the first. Assume T_g has some transition p --σ,φ|{r},o--> q where o ∈ (Σ × R_g)*.
Then T has to be able to execute transitions of T_f while processing o, even though o does not contain any concrete data values (this is the main difference with the data-free setting). However, if T knows the equality types between R_f and R_g, then it is able to trigger the transitions of T_f. For example, assume that o = (a, r_g) and that the content of r_g is equal to the content of r_f, r_f being a register of T_f; then if T_f has some transition of the form p' --a,r_f^=|{r'_f},o'--> q', then T can trigger the transition (p, p') --σ,φ|{r}∪{r'_f:=r},o'--> (q, q'), where the operation r'_f := r is syntactic sugar on top of NRT that intuitively means "put the content of r into r'_f".

Remark 13. The proof of Theorem 12 does not use the hypothesis that f and g are functions, and actually shows a stronger result, namely that relations defined by NRT are closed under composition.

4 Computability and Continuity

We equip the set of (finite or infinite) data words with the usual distance: for u, v ∈ (Σ × D)^∞, d(u, v) = 0 if u = v and d(u, v) = 2^{-|u∧v|} otherwise. A sequence of (finite or infinite) data words (x_n)_{n∈N} converges to some infinite data word x if for all
ε > 0, there exists N ≥ 0 such that for all n ≥ N, d(x_n, x) ≤ ε.
In order to reason about computability, we assume in the sequel that the infinite set of data values D we are dealing with has an effective representation; for instance, this is the case when D = N. We now define how a Turing machine can compute a function of data words. We consider deterministic Turing machines with three tapes: a read-only one-way input tape (containing the infinite input data word), a two-way working tape, and a write-only one-way output tape (on which it writes the infinite output data word). Consider some input data word x ∈ (Σ × D)^ω. For any integer k ∈ N, we let M(x, k) denote the output written by M on its output tape after having read the k first cells of the input tape. Observe that, as the output tape is write-only, the sequence of data words (M(x, k))_{k≥0} is non-decreasing.

Definition 14 (Computability). A function f: (Σ × D)^ω → (Γ × D)^ω is computable if there exists a deterministic multi-tape machine M such that for all x ∈ dom(f), the sequence (M(x, k))_{k≥0} converges to f(x).

Definition 15 (Continuity). A function f: (Σ × D)^ω → (Γ × D)^ω is continuous at x ∈ dom(f) if (equivalently):

(a) for all sequences of data words (x_n)_{n∈N} converging towards x, where x_i ∈ dom(f) for all i ∈ N, we have that (f(x_n))_{n∈N} converges to f(x);
(b) ∀i ≥ 0, ∃j ≥ 0, ∀y ∈ dom(f), |x ∧ y| ≥ j ⇒ |f(x) ∧ f(y)| ≥ i.

Then, f is continuous if and only if it is continuous at each x ∈ dom(f). Finally, a functional NRT T is continuous when ⟦T⟧ is continuous.

Example 16. We give an example of a non-continuous function f. The finite input and output alphabets are unary, and are therefore ignored in the description of f. This function associates with every sequence s = d_1 d_2 ··· ∈ D^ω the word f(s) = d_1^ω if d_1 occurs infinitely many times in s; otherwise f(s) = s itself. The function f is not continuous.
Indeed, taking d ≠ d', the sequence of data words (d(d')^n d^ω)_n converges to d(d')^ω, while f(d(d')^n d^ω) = d^ω converges to d^ω ≠ f(d(d')^ω) = d(d')^ω. Moreover, f is realisable by some NRT which non-deterministically guesses whether d_1 repeats infinitely many times or not. It needs only one register r in which to store d_1. In the first case, it checks that the current data d equals the content of r infinitely often, and in the second case, it checks that this test succeeds finitely many times, using Büchi conditions.

One can show that the register transducer T_rename2 considered in Example 7 also realises a function which is not continuous, as the value stored in register r_2 may appear arbitrarily far in the input word. One could modify the specification to obtain a continuous function as follows. Instead of considering an infinite log, one considers now an infinite sequence of finite logs, separated by $ symbols. The register transducer T_rename3, depicted in Figure 3, defines such a function.

[Figure 3 omitted.] Fig. 3. A register transducer T_rename3. This transducer is non-deterministic, yet it defines a continuous function.

We now prove the equivalence between continuity and computability for functions defined by NRT. One direction, namely that computability implies continuity, is easy, almost by definition. For the other direction, we rely on the following lemma, which states that it is decidable whether a word v can be safely output, knowing only a prefix u of the input. In particular, given a function f, we let f̂ be the function defined over all finite prefixes u of words in dom(f) by f̂(u) = ⋀{f(uy) | uy ∈ dom(f)}, the longest common prefix of all outputs of continuations of u by f.
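The Cantor distance introduced at the start of this section, and the longest-common-prefix function f̂ (here necessarily approximated on a finite sample of continuations, since the true ⋀ ranges over the whole domain), can be sketched as:

```python
from functools import reduce

def lcp(u, v):
    """u ∧ v: longest common prefix of two finite words."""
    i = 0
    while i < min(len(u), len(v)) and u[i] == v[i]:
        i += 1
    return u[:i]

def cantor_distance(u, v):
    """d(u, v) = 0 if u = v, and 2^(-|u ∧ v|) otherwise."""
    return 0.0 if u == v else 2.0 ** (-len(lcp(u, v)))

def f_hat(outputs):
    """Finite-sample stand-in for f̂(u): the longest common prefix of the
    outputs f(uy) over a sampled set of continuations uy of u."""
    return reduce(lcp, outputs)
```

Under this distance, two infinite words are close exactly when they agree on a long prefix, which is what makes convergence of (M(x, k))_k to f(x) the right notion of "producing the output in the limit".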
Then, we have the following decidability result:

Lemma 17. The following problem is decidable: given an NRT T defining a function f and two finite data words u ∈ (Σ × D)* and v ∈ (Γ × D)*, decide whether v ⊑ f̂(u).

Theorem 18. Let f be a function defined by some NRT T. Then f is continuous iff f is computable.

Proof. ⇐ Assuming f = ⟦T⟧ is computable by some Turing machine M, we show that f is continuous. Indeed, consider some x ∈ dom(f) and some i ≥ 0. As the sequence of finite words (M(x, k))_{k∈N} converges to f(x) and these words have non-decreasing lengths, there exists j ≥ 0 such that |M(x, j)| ≥ i. Hence, for any data word y ∈ dom(f) such that |x ∧ y| ≥ j, the behaviour of M on y is the same during the first j steps, as M is deterministic, and thus |f(x) ∧ f(y)| ≥ i, showing that f is continuous at x.

⇒ Assume that f is continuous. We describe a Turing machine computing f; the corresponding algorithm is formalised as Algorithm 1. When reading a finite prefix x[:j] of its input x ∈ dom(f), it computes the set P of all configurations (q, τ) reached by T on x[:j]. This set is updated along increasing values of j. It also keeps in memory the finite output word o that has been output so far. For any j, if dt(x[:j]) denotes the data that appear in x[:j], the algorithm then decides, for each pair (σ, d) ∈ Σ × (dt(x[:j]) ∪ {d_0}), whether (σ, d) can safely be output, i.e., whether all accepting runs on words of the form x[:j]y, for an infinite word y, output at least o_j · (σ, d). The latter can be decided, given T, o_j and x[:j], by Lemma 17. Note that it suffices to look at data in dt(x[:j]) ∪ {d_0} only since, by definition of NRT, any data that is output is necessarily stored in some register, and therefore appears in x[:j] or is equal to d_0. Let us show that

Algorithm 1: Algorithm describing the machine M computing f.
  Data: x ∈ dom(f)
  1  o := ε;
  2  for j = 0 to ∞ do
  3      for (σ, d) ∈ Σ × (dt(x[:j]) ∪ {d_0}) do
  4          if o·(σ, d) ⊑ f̂(x[:j]) then      // such a test is decidable by Lemma 17
  5              o := o·(σ, d);
  6              output (σ, d);
  7          end
  8      end
  9  end

M actually computes f. Let x ∈ dom(f). We have to show that the sequence (M_f(x, j))_j converges to f(x). Let o_j be the content of variable o of M_f when exiting the inner loop at line 8, once the outer loop (line 2) has been executed j times (hence j input symbols have been read). Note that o_j = M_f(x, j). We have o_1 ⊑ o_2 ⊑ ··· and o_j ⊑ f̂(x[:j]) for all j ≥ 0. Hence, o_j ⊑ f(x) for all j ≥ 0. To show that (o_j)_j converges to f(x), it remains to show that (o_j)_j is non-stabilising, i.e. o_{i_1} ≺ o_{i_2} ≺ ··· for some infinite subsequence i_1 < i_2 < ···. First, note that f being continuous at x is equivalent to the sequence (f̂(x[:k]))_k converging to f(x). Therefore, f(x) ∧ f̂(x[:k]) can be arbitrarily long for sufficiently large k. Let j ≥ 0 and (σ, d) = f(x)[|o_j| + 1]. By the latter property and the fact that o_j·(σ, d) ⊑ f(x), necessarily, there exists some k > j such that o_j·(σ, d) ⊑ f̂(x[:k]). Moreover, by definition of NRT, d is necessarily a data value that appears in some prefix of x, therefore there exists k' ≥ k such that d appears in x[:k'] and o_j·(σ, d) ⊑ f̂(x[:k]) ⊑ f̂(x[:k']). This entails that o_j·(σ, d) ⊑ o_{k'}. So, we have shown that for all j, there exists k' > j such that o_j ≺ o_{k'}, which concludes the proof.

Now that we have shown that computability is equivalent to continuity for functions defined by NRT, we exhibit a pattern which allows us to decide continuity. This pattern generalises the one of [3] to the setting of data words, the difficulty lying in showing that our pattern can be restricted to a finite number of data.

Theorem 19. Let T be a functional NRT with k registers.
Then, for all X ⊆ D such that |X| ≥ 2k + 3 and d_0 ∈ X, T is not continuous at some x ∈ (Σ × D)^ω if and only if T is not continuous at some z ∈ (Σ × X)^ω.

Proof. The right-to-left direction is trivial. Now, let T be a functional NRT with k registers which is not continuous at some x ∈ (Σ × D)^ω. Let f: dom(T) → (Γ × D)^ω be the function defined by T, i.e. for all u ∈ dom(T), f(u) = v, where v ∈ (Γ × D)^ω is the unique data word such that (u, v) ∈ ⟦T⟧. Now, let X ⊆ D be such that |X| ≥ 2k + 3 and d_0 ∈ X. We need to build two words u and v labelled over X which coincide on a sufficiently long prefix to allow for pumping, hence yielding a converging sequence of input data words whose images do not converge, witnessing non-continuity. To that end, we use a proof technique similar to that of Theorem 9: we show that the language of interleaved runs whose inputs coincide on a sufficiently long prefix while their respective outputs mismatch before a given position is recognisable by an NRA, allowing us to use the indistinguishability property. We also ask that one run present sufficiently many occurrences of a final state q_f, so that we can ensure that there exists a pair of configurations containing q_f which repeats in both runs. On reading such u and v, the automaton behaves as a finite automaton, since the number of data is finite ([15, Proposition 1]). By analysing the respective runs, we can, using pumping arguments, bound the position at which the mismatch appears, then show the existence of a synchronised loop over u and v after this position, allowing us to build the sought witness of non-continuity.

Relabel over X. Thus, assume T is not continuous at some point x ∈ (Σ × D)^ω. Let ρ be an accepting run of T over x, and let q_f ∈ inf(st(ρ)) ∩ F be an accepting state repeating infinitely often in ρ. Then, let i ≥ 0 be such that for all j ≥ 0, there exists y ∈ dom(f) such that |x ∧ y| ≥ j but |f(x) ∧ f(y)| ≤ i. Now, define K = |Q| × (2k + 3)^{2k} and let m = (2i + 3) × (K + 1).
Finally, pick j such that ρ[1:j] contains at least m occurrences of q_f. Consider the language:

L = {ρ_1 ⊗ ρ_2 | |in(ρ_1) ∧ in(ρ_2)| ≥ j, |out(ρ_1) ∧ out(ρ_2)| ≤ i and there are at least m occurrences of q_f in ρ_1[1:j]}

By Lemma 5, L_⊗(T) is recognised by an NRA with 2k registers. Additionally, by Lemma 6, M_i^j is recognised by an NRA with 2 registers. Thus, L = L_⊗(T) ∩ O_{m,j} ∩ M_i^j, where O_{m,j} checks that there are at least m occurrences of q_f in ρ_1[1:j] (this is easily doable from the automaton recognising L_⊗(T) by adding an m-bounded counter), is recognisable by an NRA with 2k + 2 registers. Choose y ∈ dom(f) such that |x ∧ y| ≥ j but |f(x) ∧ f(y)| ≤ i. By letting ρ_1 (resp. ρ_2) be an accepting run of T over x (resp. y), we have ρ_1 ⊗ ρ_2 ∈ L, so L ≠ ∅. By Proposition 4, L ∩ ((Σ × X)^ω × (Γ × X)^ω) ≠ ∅. Let w = ρ_1 ⊗ ρ_2 ∈ L ∩ ((Σ × X)^ω × (Γ × X)^ω), u = in(ρ_1) and v = in(ρ_2). Then, |u ∧ v| ≥ j, |f(u) ∧ f(v)| ≤ i and there are at least m occurrences of q_f in ρ_1[1:j].

Now, we depict ρ_1 and ρ_2 in Figure 4, where we decompose u as u = u_1...u_m · s and v as v = u_1...u_m · t; their corresponding images being respectively u'_1...u'_m · s' and u''_1...u''_m · t'. We also let l = (i + 1)(K + 1) and l' = 2(i + 1)(K + 1). Since the data of u, v and w belong to X, we know that τ_i, μ_i: R → X.

[Figure 4, omitted: it depicts ρ_1 and ρ_2 reading the common prefix u_1...u_m, with (i + 1)(K + 1) occurrences of q_f up to position l, (i + 1)(K + 1) further occurrences up to l', and K + 1 further occurrences up to m.] Fig. 4. Runs of f over u = u_1...u_m · s and v = u_1...u_m · t.

Repeating configurations. First, let us observe that in a partial run of ρ_1 containing more than |Q| × |X|^k occurrences of q_f, there is at least one productive transition, i.e. a transition whose output is o ≠ ε.
Otherwise, by the pigeonhole principle, there would exist a configuration μ: R → X such that (q_f, μ) occurs at least twice in the partial run. Since all transitions are improductive, this would mean that, writing w the corresponding part of the input, we have (q_f, μ) --w|ε--> (q_f, μ). This partial run is part of ρ_1, so, in particular, (q_f, μ) is accessible; hence, by taking w_0 such that (i_0, τ_0) --w_0|w'_0--> (q_f, μ), we have that f(w_0 w^ω) = w'_0, which is a finite word, contradicting our assumption that all accepting runs produce an infinite output. Since ρ_1[1:l] contains (i + 1)(K + 1) occurrences of q_f and K + 1 > |Q| × |X|^k, this implies that |u'_1...u'_l| ≥ i + 1.

Locate the mismatch. Again, upon reading u_{l+1}...u_{l'}, there are (i + 1)(K + 1) occurrences of q_f. There are two cases:

(a) There are at least i + 1 productive transitions in ρ_2. Then, we obtain that |u''_1...u''_{l'}| > i, so mismatch(u'_1...u'_{l'}, u''_1...u''_{l'}), since we know |f(u) ∧ f(v)| ≤ i and they are respectively prefixes of f(u) and f(v), both of length at least i + 1. Afterwards, upon reading u_{l'+1}...u_m, there are K + 1 > |Q| × |X|^{2k} occurrences of q_f, so, by the pigeonhole principle, there is a repeating pair: there exist indices p and p' such that l' ≤ p < p' ≤ m and (q_f, μ_p) = (q_f, μ_{p'}), (q_p, τ_p) = (q_{p'}, τ_{p'}). Thus, let z_P = u_1...u_p, z_R = u_{p+1}...u_{p'} and z_C = u_{p'+1}...u_m · t (P stands for prefix, R for repeat and C for continuation; we use capital letters to avoid confusion with indices). Denoting z'_P = u'_1...u'_p and z'_R = u'_{p+1}...u'_{p'}, as well as z''_P = u''_1...u''_p, z''_R = u''_{p+1}...u''_{p'} and z''_C = u''_{p'+1}...u''_m · t', the word z = z_P · z_R^ω is a point of discontinuity. Indeed, define (z_n)_{n∈N} by z_n = z_P · z_R^n · z_C for all n ∈ N. Then, (z_n)_{n∈N} converges towards z but, since f(z_n) = z''_P · (z''_R)^n · z''_C for all n ∈ N, the sequence (f(z_n))_n does not converge to f(z) = z'_P · (z'_R)^ω, since mismatch(z'_P, z''_P).
(b) Otherwise, by the same reasoning as above, there exists a repeating pair with only improductive transitions in between: there exist indices p and p' such that l ≤ p < p' ≤ l', (q_f, μ_p) = (q_f, μ_{p'}), (q_p, τ_p) = (q_{p'}, τ_{p'}), and (q_f, μ_p) −u_{p+1}…u_{p'}|ε→ (q_f, μ_{p'}), (q_p, τ_p) −u_{p+1}…u_{p'}|ε→ (q_{p'}, τ_{p'}). Then, taking z_P = u_1 … u_p, z_R = u_{p+1} … u_{p'} and z_C = u_{p'+1} … u_m · t, we have, by letting z'_P = u'_1 … u'_p, z'_R = u'_{p+1} … u'_{p'}, z''_P = u''_1 … u''_p, z''_R = ε and z''_C = u''_{p'+1} … u''_m · t'', that z = z_P · z_R^ω is a point of discontinuity. Indeed, define (z_n)_{n∈N} by, for all n ∈ N, z_n = z_P · z_R^n · z_C. Then (z_n)_{n∈N} indeed converges towards z but, since for all n ∈ N, f(z_n) = z''_P · z''_C, f(z_n) does not converge towards f(z) = z'_P · z'_R^ω, since mismatch(z'_P, z''_P · z''_C) (the mismatch necessarily lies in z'_P, since |z'_P| ≥ i + 1).

Corollary 20. Deciding whether an NRT defines a continuous function is PSpace-complete.

Proof. Let X ⊆ D be a set of size 2k + 3 containing d_0. By Theorem 19, T is not continuous iff it is not continuous at some z ∈ (Σ × X)^ω, iff T ∩ ((Σ × X)^ω × (Γ × X)^ω) is not continuous. By Proposition 3, such a relation is recognisable by a finite transducer T' with O(|Q| × |X|^{|R|}) states, which can be built on-the-fly. By [3], the continuity of functions defined by NFT is decidable in NLogSpace, which yields a PSpace procedure.

For the hardness part, we reduce again from the emptiness problem of register automata, which is PSpace-complete [4]. Let A be a register automaton over some alphabet Σ × D. We construct a transducer T which defines a continuous function iff L(A) = ∅ iff the domain of T is empty. Let f be a non-continuous function realised by some NRT H (it exists by Example 16). Then, let # ∉ Σ be a fresh symbol, and define g as the function mapping any data word of the form w(#,d)w' to w(#,d)f(w') if w ∈ L(A).
The function g is realised by an NRT which simulates A and copies its input to the output to implement the identity, until it sees #. If it was in some accepting state of A before seeing #, it branches to some initial state of H and proceeds by executing H. If there is some w_0 ∈ L(A), then the subfunction g_0 mapping words of the form w_0(#,d)w' to w_0(#,d)f(w') is not continuous, since f is not; hence g is not continuous. Conversely, if L(A) = ∅, then dom(g) = ∅, so g is continuous.

In [3], non-continuity is characterised by a specific pattern (Lemma 21, Figure 1), i.e. the existence of some particular sequence of transitions. By applying this characterisation to the finite transducer recognising T ∩ ((Σ × X)^ω × (Γ × X)^ω), as constructed in Proposition 3, we can characterise non-continuity by a similar pattern, which will prove useful to decide (non-)continuity of test-free NRT in NLogSpace (cf. Section 5):

Corollary 21 ([3]). Let T be an NRT with k registers. Then, for all X ⊆ D such that |X| ≥ 2k + 3 and d_0 ∈ X, T is not continuous at some x ∈ (Σ × D)^ω if and only if it has the pattern of Figure 5.

Fig. 5. A pattern characterising non-continuity of functions definable by an NRT: we ask that there exist configurations (q_f, μ) and (q, τ), where q_f is accepting, as well as finite input data words u, v, finite output data words u', v', u'', v'', and an infinite input data word w admitting an accepting run from configuration (q, τ) producing output w', such that mismatch(u', u'') ∨ (v'' = ε ∧ mismatch(u', u''w')).

5 Test-free Register Transducers

In [7], we introduced a restriction which allows one to recover decidability of the bounded synthesis problem for specifications expressed as non-deterministic register automata. Applied to transducers, this restriction also yields polynomial complexities for the functionality and computability problems.
An NRT T is test-free when its transition function does not depend on the tests conducted over the input data. Formally, we say that T is test-free if for all transitions q −σ,φ|asgn,o→ q' we have φ = ⊤. Thus, we can omit the tests altogether, and its transition relation can be represented as Δ ⊆ Q × Σ × 2^R × (Γ × R)^* × Q.

Example 22. Consider the function f : (Σ × D)^ω → (Γ × D)^ω associating, to x = (σ_1,d_1)(σ_2,d_2)…, the value (σ_1,d_1)(σ_2,d_1)(σ_3,d_1)… if there are infinitely many a in x, and (σ_1,d_2)(σ_2,d_2)(σ_3,d_2)… otherwise. f can be implemented using a test-free NRT with one register: it initially guesses whether there are infinitely many a in x; if it is the case, it stores d_1 in the single register r, otherwise it waits for the next input to get d_2 and stores it in r. Then, it outputs the content of r along with each σ_i. f is not continuous, as even outputting the first data value requires reading an infinite prefix when d_1 ≠ d_2.

Note that when a transducer is test-free, the existence of an accepting run over a given input x only depends on its finite labels. Hence, the existence of two outputs y and z which mismatch over data can be characterised by a simple pattern (Figure 6), which allows one to decide functionality in polynomial time:

Theorem 23. Deciding whether a test-free NRT is functional is in PTime.

Proof. Let T be a test-free NRT such that T is not functional. Then there exist x ∈ (Σ × D)^ω and y, z ∈ (Γ × D)^ω such that (x, y), (x, z) ∈ T and y ≠ z. Let i be such that y[i] ≠ z[i]. There are two cases. Either lab(y[i]) ≠ lab(z[i]), which means that the finite transducer obtained by ignoring the registers of T is not functional; by Proposition 8, such a property can be decided in NLogSpace. So let us focus on the second case: dt(y[i]) ≠ dt(z[i]).

Fig. 6.
A situation characterising the existence of a mismatch over data (the diagram depicts two runs in which r ∈ asgn at position j and r' ∈ asgn at position j', r ∈ o at position l and r' ∈ o at position l', r is not reassigned in between, and the outputs satisfy y[i] = x[j] and z[i] = x[j']).

Since acceptance does not depend on data, we can always choose x such that dt(x[j]) ≠ dt(x[j']). Here, we assume that the labels of x, y and z range over a unary alphabet; in particular y[i] = x[j] iff dt(y[i]) = dt(x[j]). Finally, for readability, we did not write that r should not be reassigned between j and l'. Note that the position of i with regard to j, j', l and l' does not matter, nor does the position of l w.r.t. l'.

We here give a sketch of the proof: observe that an input x admits two outputs which mismatch over data if and only if it admits two runs which respectively store x[j] and x[j'], with x[j] ≠ x[j'], and output them later at the same output position i; the outputs y and z are then such that dt(y[i]) ≠ dt(z[i]). Since T is test-free, the existence of two runs over the same input x only depends on its finite labels. Then, the registers containing respectively x[j] and x[j'] should not be reassigned before being output, and should indeed output their content at the same position i (cf. Figure 6). Besides, again because of test-freeness, we can always assume that x is such that x[j] ≠ x[j']. Overall, such a pattern can be checked by a 2-counter Parikh automaton, whose emptiness is decidable in PTime [8] (under conditions that are satisfied here).

Now, let us move to the case of continuity. Here again, the fact that a test-free NRT conducts no test over the input data allows us to focus on the only two registers that are responsible for the mismatch, the existence of an accepting run being determined only by finite labels.

Theorem 24. Deciding whether a test-free NRT defines a continuous function is in PTime.

Proof. Let T be a test-free NRT. First, it can be shown that T is not continuous if and only if it has the pattern of Figure 7, where r is coaccessible (since acceptance only depends on finite labels, T can be trimmed in polynomial time).
Fig. 7. A pattern characterising non-continuity of functions defined by NRT, where we ask that there exist states q_f, q and r, where q_f is accepting, as well as finite input data words u, v, z and finite output data words u', v', u'', v'', z'' such that mismatch(u', u'') ∨ (v'' = ε ∧ mismatch(u', u''z'')). Register assignments are not depicted, as there are no conditions on them. We unrolled the loops to highlight the fact that they do not necessarily loop back to the same configuration.

Now, it remains to show that such a simpler pattern can be checked in PTime. We treat each part of the disjunction separately:

(a) there exist u, u', u'', v, v', v'' s.t. i_0 −u|u'→ q_f −v|v'→ q_f and i_0 −u|u''→ q −v|v''→ q, where q_f ∈ F and mismatch(u', u''). Then, as shown in the proof of Theorem 23, there exists a mismatch between some u' and u'' produced by the same input u if and only if there exist two runs and two registers r and r' assigned at two distinct positions and later on output at the same position. Such a pattern can similarly be checked by a 2-counter Parikh automaton; the only difference is that here, instead of checking that the two end states are coaccessible with a common ω-word, we only need to check that q_f ∈ F and that there is a synchronised loop over q_f and q, which are regular properties that can be checked by the Parikh automaton with only a polynomial increase.

(b) there exist u, u', u'', v, v', z, z'' s.t. i_0 −u|u'→ q_f −v|v'→ q_f and i_0 −u|u''→ q −v|ε→ q −z|z''→ r, where q_f ∈ F and mismatch(u', u''z''). By examining again the proof of Theorem 23, it can be shown that to obtain a mismatch, it suffices that the input is the same for both runs only up to position max(j, j').
More precisely, there is a mismatch between u' and u''z'' if and only if there exist two registers r and r' and two positions j, j' ∈ {1, …, |u|} such that j ≠ j', r is assigned at position j, r' is assigned at position j', r and r' are respectively output at input positions l ∈ {1, …, |u|} and l' ∈ {1, …, |uz|}, and they are not reassigned in the meantime. Again, such a property, along with the fact that q_f ∈ F and the existence of a synchronised loop, can be checked by a 2-counter Parikh automaton of polynomial size.

Overall, deciding whether a test-free NRT is continuous is in PTime.

We say that T is trim when all its states are both accessible and coaccessible.

References

1. Berstel, J.: Transductions and Context-Free Languages. Teubner Verlag (1979), http://www-igm.univ-mlv.fr/~berstel/LivreTransductions/LivreTransductions.html
2. Carayol, A., Löding, C.: Uniformization in automata theory. In: Proceedings of the 14th Congress of Logic, Methodology and Philosophy of Science, Nancy, July 19-26, 2011. pp. 153-178. London: College Publications (2014), https://hal.archives-ouvertes.fr/hal-01806575
3. Dave, V., Filiot, E., Krishna, S.N., Lhote, N.: Deciding the computability of regular functions over infinite words. CoRR abs/1906.04199 (2019), http://arxiv.org/abs/1906.04199
4. Demri, S., Lazic, R.: LTL with the freeze quantifier and register automata. ACM Trans. Comput. Log. 10(3), 16:1-16:30 (2009). https://doi.org/10.1145/1507244.1507246
5. Durand-Gasselin, A., Habermehl, P.: Regular transformations of data words through origin information. In: Foundations of Software Science and Computation Structures - 19th International Conference, FOSSACS 2016, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2016, Eindhoven, The Netherlands, April 2-8, Proceedings. pp. 285-300 (2016). https://doi.org/10.1007/978-3-662-49630-5_17
6.
Ehlers, R., Seshia, S.A., Kress-Gazit, H.: Synthesis with identifiers. In: Proceedings of the 15th International Conference on Verification, Model Checking, and Abstract Interpretation - Volume 8318. pp. 415-433. VMCAI 2014 (2014). https://doi.org/10.1007/978-3-642-54013-4_23
7. Exibard, L., Filiot, E., Reynier, P.: Synthesis of data word transducers. In: 30th International Conference on Concurrency Theory, CONCUR 2019, August 27-30, Amsterdam, the Netherlands. pp. 24:1-24:15 (2019). https://doi.org/10.4230/LIPIcs.CONCUR.2019.24
8. Figueira, D., Libkin, L.: Path logics for querying graphs: Combining expressiveness and efficiency. In: 30th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2015, Kyoto, Japan, July 6-10. pp. 329-340 (2015). https://doi.org/10.1109/LICS.2015.39
9. Filiot, E., Jecker, I., Löding, C., Winter, S.: On equivalence and uniformisation problems for finite transducers. In: 43rd International Colloquium on Automata, Languages, and Programming, ICALP 2016, July 11-15, Rome, Italy. pp. 125:1-125:14 (2016). https://doi.org/10.4230/LIPIcs.ICALP.2016.125
10. Filiot, E., Mazzocchi, N., Raskin, J.: A pattern logic for automata with outputs. In: Developments in Language Theory - 22nd International Conference, DLT 2018, Tokyo, Japan, September 10-14, Proceedings. pp. 304-317 (2018). https://doi.org/10.1007/978-3-319-98654-8_25
11. Gire, F.: Two decidability problems for infinite words. Inf. Process. Lett. 22(3), 135-140 (1986). https://doi.org/10.1016/0020-0190(86)90058-X
12. Holtmann, M., Kaiser, L., Thomas, W.: Degrees of lookahead in regular infinite games. Logical Methods in Computer Science 8(3) (2012). https://doi.org/10.2168/LMCS-8(3:24)2012
13. Culik II, K., Pachl, J.K.: Equivalence problems for mappings on infinite strings. Information and Control 49(1), 52-63 (1981). https://doi.org/10.1016/S0019-9958(81)90444-7
14. Büchi, J.R., Landweber, L.H.: Solving sequential conditions by finite-state strategies.
Transactions of the American Mathematical Society 138, 295-311 (1969). https://doi.org/10.2307/1994916
15. Kaminski, M., Francez, N.: Finite-memory automata. Theor. Comput. Sci. 134(2), 329-363 (Nov 1994). https://doi.org/10.1016/0304-3975(94)90242-9
16. Khalimov, A., Kupferman, O.: Register-bounded synthesis. In: 30th International Conference on Concurrency Theory, CONCUR 2019, August 27-30, Amsterdam, the Netherlands. pp. 25:1-25:16 (2019). https://doi.org/10.4230/LIPIcs.CONCUR.2019.25
17. Khalimov, A., Maderbacher, B., Bloem, R.: Bounded synthesis of register transducers. In: Automated Technology for Verification and Analysis, 16th International Symposium, ATVA 2018, Los Angeles, October 7-10. Proceedings (2018). https://doi.org/10.1007/978-3-030-01090-4
18. Libkin, L., Tan, T., Vrgoc, D.: Regular expressions for data words. J. Comput. Syst. Sci. 81(7), 1278-1297 (2015). https://doi.org/10.1016/j.jcss.2015.03.005
19. Neven, F., Schwentick, T., Vianu, V.: Finite state machines for strings over infinite alphabets. ACM Trans. Comput. Logic 5(3), 403-435 (Jul 2004). https://doi.org/10.1145/1013560.1013562
20. Pnueli, A., Rosner, R.: On the synthesis of a reactive module. In: ACM Symposium on Principles of Programming Languages, POPL. ACM (1989). https://doi.org/10.1145/75277.75293
21. Prieur, C.: How to decide continuity of rational functions on infinite words. Theor. Comput. Sci. 276(1-2), 445-447 (2002). https://doi.org/10.1016/S0304-3975(01)00307-3

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Minimal Coverability Tree Construction Made Complete and Efficient

Alain Finkel (1,3), Serge Haddad (1,2), and Igor Khmelnitsky (1,2)
1 LSV, ENS Paris-Saclay, CNRS, Université Paris-Saclay, Cachan, France ({finkel,haddad,khmelnitsky}@lsv.fr)
2 Inria, France
3 Institut Universitaire de France, France

Abstract. Downward closures of Petri net reachability sets can be finitely represented by their set of maximal elements, called the minimal coverability set or Clover. Many properties (coverability, boundedness, ...) can be decided using Clover, in a time proportional to the size of Clover, so it is crucial to design algorithms that compute it efficiently. We present a simple modification of the original but incomplete Minimal Coverability Tree algorithm (MCT) for computing Clover, which makes it complete: it memorizes accelerations and fires them as ordinary transitions. Contrary to the other alternative algorithms, for which no bound on the size of the required additional memory is known, we establish that the additional space of our algorithm is at most doubly exponential. Furthermore, we have implemented a prototype, MinCov, which is already very competitive: on benchmarks it uses less space than all the other tools, and its execution time is close to that of the fastest tool.

Keywords: Petri nets · Karp-Miller tree algorithm · Coverability · Minimal coverability set · Clover · Minimal coverability tree

1 Introduction

Coverability and coverability set in Petri nets. Petri nets are an iconic infinite-state model used for verifying concurrent systems.
Coverability is the most studied property of Petri nets, for several reasons: (1) many properties like mutual exclusion, safety and control-state reachability reduce to coverability; (2) the coverability problem is EXPSPACE-complete (while reachability is non-elementary); and (3) there exist efficient prototypes and numerous case studies. To solve the coverability problem, there are backward and forward algorithms. But these algorithms do not address relevant problems like the repeated coverability problem, LTL model checking, the boundedness problem and regularity of the traces. However, these problems are EXPSPACE-complete [4, 1] and are also decidable using the Karp-Miller tree algorithm (KMT) [11], which computes a finite tree labeled by a set of ω-markings C ⊆ N_ω^P (where N_ω is the set of naturals enlarged with an upper bound ω and P is the set of places) such that the reachability set and the finite set C have the same downward closure in N^P. Thus a marking m is coverable if there exists some m' ≥ m with m' ∈ C. Hence, C can be seen as one among all the possible finite representations of the infinite downward closure of the reachability set. This set C allows, for instance, to solve multiple instances of coverability in time linear w.r.t. the size of C, avoiding many calls to a costly algorithm. Informally, the KMT algorithm builds a reachability tree but, in order to ensure termination, substitutes ω for some finite components of the marking of a vertex when some marking of an ancestor is smaller. Unfortunately, C may contain comparable markings while only the maximal elements are important.

The work was carried out in the framework of ReLaX, UMI2000 and also supported by ANR-17-CE40-0028 project BRAVAS.
© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 237-256, 2020. https://doi.org/10.1007/978-3-030-45231-5_13

238 A. Finkel et al.
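The ω-substitution at the heart of the KMT algorithm can be sketched as follows. This is an illustrative, simplified rendition (not the algorithm of this paper, which is introduced in Section 3): markings are dictionaries, the toy net with one transition is hypothetical, and the classical KMT acceleration replaces by ω every component that strictly grew w.r.t. some smaller ancestor marking on the same branch.

```python
from collections import deque

OMEGA = float("inf")  # stands for ω

# A tiny hypothetical unbounded net: t1 consumes nothing from p (Pre = 1,
# forward = 1) and produces one token in q, so q grows without bound.
TRANS = {"t1": ({"p": 1, "q": 0}, {"p": 0, "q": 1})}  # t -> (Pre(t), C(t))
PLACES = ["p", "q"]

def karp_miller(m0):
    """KMT sketch: explore the reachability tree; when a successor m2
    dominates an ancestor, substitute ω for every strictly grown place."""
    result = []
    queue = deque([(m0, [])])                 # (marking, ancestors on branch)
    while queue:
        m, ancestors = queue.popleft()
        if m in ancestors:                    # already seen on this branch
            continue
        result.append(m)
        for pre, c in TRANS.values():
            if all(m[p] >= pre[p] for p in PLACES):
                m2 = {p: m[p] + c[p] for p in PLACES}
                for anc in ancestors + [m]:
                    if anc != m2 and all(anc[p] <= m2[p] for p in PLACES):
                        for p in PLACES:
                            if anc[p] < m2[p]:
                                m2[p] = OMEGA   # the KMT ω-substitution
                queue.append((m2, ancestors + [m]))
    return result

C = karp_miller({"p": 1, "q": 0})
# C contains {"p": 1, "q": 0} and the accelerated ω-marking {"p": 1, "q": ω}
```

The ω-substitution is what guarantees termination here: without it, the branch would enumerate q = 0, 1, 2, … forever.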
The set of maximal elements of C can be defined independently of the KMT algorithm; it was called the minimal coverability set (MCS) in [6] and abbreviated as the Clover in the more general framework of Well Structured Transition Systems (WSTS) [7].

The minimal coverability tree algorithm. In [5, 6] the author computes the minimal coverability set by modifying the KMT algorithm in such a way that at each step of the algorithm, the set of ω-markings labelling vertices is an antichain. But this aggressive strategy, implemented by the so-called Minimal Coverability Tree algorithm (MCT), contains a subtle bug, and it may compute a strict under-approximation of Clover, as shown in [8, 10].

Alternative minimal coverability set algorithms. Since the discovery of this bug, three algorithms (with variants) [10, 14, 13] have been designed for computing the minimal coverability set without building the full Karp-Miller tree. In [10] the authors proposed a minimal coverability set algorithm (called CovProc) that is not based on the Karp-Miller tree algorithm but uses a similar, though restricted, introduction of ω's. In [14], Reynier and Servais proposed a modification of the MCT, called the Monotone-Pruning algorithm (MP), that keeps but "deactivates" vertices labeled with smaller ω-markings where MCT would have deleted them. Recently, in [15], the authors simplified their original proof of correctness. In [16], Valmari and Hansen proposed another algorithm (denoted below as VH) for constructing the minimal coverability set without deleting vertices. Their algorithm builds a graph rather than a tree as usual. In [13], Piipponen and Valmari improved this algorithm by designing appropriate data structures and heuristics for the exploration strategy that may significantly decrease the size of the graph.

Our contributions.

1. We introduce the concept of abstraction as an ω-transition that mimics the effect of an infinite family of firing sequences of markings w.r.t. coverability. As a consequence, adding abstractions to the net does not modify its coverability set. Moreover, the classical Karp-Miller acceleration can be formalized as an abstraction whose incidence on places is either ω or null. The set of accelerations of a net is upward closed and well ordered. Hence there exists a finite subset of minimal accelerations, and we show that the size of every minimal acceleration is bounded by a double exponential.

2. Despite the current opinion that "The flaw is intricate and we do not see an easy way to get rid of it. ... Thus, from our point of view, fixing the bug of the MCT algorithm seems to be a difficult task" [10], we have found a simple modification of MCT which makes it correct. It mainly consists in memorizing discovered accelerations and using them as ordinary transitions.

3. Contrary to all existing minimal coverability set algorithms, which use an unknown amount of additional memory that could be non-primitive recursive, we show, by applying a recent result of Leroux [12], that the additional memory required for accelerations is at most doubly exponential.

4. We have developed a prototype in order to also empirically evaluate the efficiency of our algorithm, and the benchmarks (either from the literature or random ones) have confirmed that our algorithm requires significantly less memory than the other algorithms and is close to the fastest tool w.r.t. execution time.

Organization. Section 2 introduces abstractions and accelerations and studies their properties. Section 3 presents our algorithm and establishes its correctness. Section 4 describes our tool and discusses the results of the benchmarks. We conclude and give some perspectives on this work in Section 5. One can find all the missing proofs and an illustration of the behavior of the algorithm in [9].
2 Covering abstractions

2.1 Petri nets: reachability and covering

Here we define Petri nets differently from the usual way, but in an equivalent manner, i.e. based on the backward incidence matrix Pre and the incidence matrix C. The forward incidence matrix is implicitly defined by C + Pre. This choice is motivated by the introduction of abstractions in Section 2.2.

Definition 1. A Petri net (PN) is a tuple N = (P, T, Pre, C) where:
- P is a finite set of places;
- T is a finite set of transitions, with P ∩ T = ∅;
- Pre ∈ N^{P×T} is the backward incidence matrix;
- C ∈ Z^{P×T} is the incidence matrix, which fulfills: for all p ∈ P and t ∈ T, C(p,t) + Pre(p,t) ≥ 0.

A marked Petri net (N, m_0) is a Petri net N equipped with an initial marking m_0 ∈ N^P. The column vector of matrix Pre (resp. C) indexed by t ∈ T is denoted Pre(t) (resp. C(t)). A transition t ∈ T is fireable from a marking m ∈ N^P if m ≥ Pre(t). When t is fireable from m, its firing leads to the marking m' := m + C(t), denoted by m −t→ m'. One extends fireability and firing to a sequence σ ∈ T^* by recurrence on its length. The empty sequence ε is always fireable and leaves the marking unchanged. Let σ = tσ' be a sequence with t ∈ T and σ' ∈ T^*. Then σ is fireable from m if m −t→ m' and σ' is fireable from m'; the firing of σ from m leads to the marking m'' reached by σ' from m'. One also denotes this firing by m −σ→ m''.

Definition 2. Let (N, m_0) be a marked net. The reachability set Reach(N, m_0) is defined by: Reach(N, m_0) = { m | ∃σ ∈ T^*, m_0 −σ→ m }.

In order to introduce the coverability set of a Petri net, let us recall some definitions and results related to ordered sets. Let (X, ≤) be an ordered set. The downward (resp. upward) closure of a subset E ⊆ X is denoted by ↓E (resp. ↑E) and defined by: ↓E = { x ∈ X | ∃y ∈ E, y ≥ x } (resp. ↑E = { x ∈ X | ∃y ∈ E, y ≤ x }). A subset E ⊆ X is downward (resp. upward) closed if E = ↓E (resp. E = ↑E). An antichain E is a set which fulfills: ∀x ≠ y ∈ E, ¬(x ≤ y ∨ y ≤ x).
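Definitions 1 and 2 translate directly into code. The following sketch (hypothetical one-transition net and illustrative names, not from the paper) represents Pre and C as dictionaries indexed by (place, transition) and implements fireability and firing:

```python
# Hypothetical net: one transition t that consumes a token from p1
# (Pre(p1,t) = 1) and produces two tokens in p2 (C(p2,t) = 2).
PRE = {("p1", "t"): 1, ("p2", "t"): 0}
C   = {("p1", "t"): -1, ("p2", "t"): 2}   # forward incidence = C + Pre >= 0
PLACES = ["p1", "p2"]

def fireable(m, t, pre=PRE):
    # t is fireable from m iff m >= Pre(t), componentwise
    return all(m[p] >= pre.get((p, t), 0) for p in PLACES)

def fire(m, t, c=C):
    # the firing leads to m' = m + C(t)
    return {p: m[p] + c.get((p, t), 0) for p in PLACES}

m0 = {"p1": 1, "p2": 0}
assert fireable(m0, "t")
m1 = fire(m0, "t")                        # {"p1": 0, "p2": 2}
assert not fireable(m1, "t")              # p1 is now empty
```

Firing a sequence σ ∈ T^* is then just a left fold of `fire` over its letters, checking `fireable` at each step.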
X is said to be FAC (for Finite AntiChains) if all its antichains are finite. A non-empty set E ⊆ X is directed if for all x, y ∈ E there exists z ∈ E such that x ≤ z and y ≤ z. An ideal is a set which is downward closed and directed. There exists an equivalent characterization of FAC sets which provides a finite description of any downward closed set: a set is FAC if and only if every downward closed set admits a finite decomposition in ideals (a proof of this well-known result can be found in [3]). X is well founded if all its (strictly) decreasing sequences are finite. X is well ordered if it is FAC and well founded. There are many equivalent characterizations of well order. For instance, a set X is well ordered if and only if every sequence (x_n)_{n∈N} in X has a non-decreasing infinite subsequence. This characterization allows one to design algorithms that compute trees whose finiteness is ensured by well order. Let us recall that (N, ≤) and (N^P, ≤) are well ordered sets. We are now ready to introduce the cover (also called the coverability set) of a net and to state some of its properties.

Definition 3. Let (N, m_0) be a marked Petri net. Cover(N, m_0), its coverability set, is defined by: Cover(N, m_0) = ↓Reach(N, m_0).

Since the coverability set is downward closed and N^P is FAC, it admits a finite decomposition in ideals. The ideals of N^P can be defined in an elegant way as follows. One first extends the sets of naturals and integers: N_ω = N ∪ {ω} and Z_ω = Z ∪ {ω}. Then one extends the order relation and the addition to Z_ω: for all n ∈ Z, ω > n, and for all n ∈ Z_ω, n + ω = ω + n = ω. N_ω^P is also a well ordered set and its members are called ω-markings. There is a one-to-one mapping between ideals of N^P and ω-markings. Let m ∈ N_ω^P. Define ⟦m⟧ by: ⟦m⟧ = { m' ∈ N^P | m' ≤ m }.
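The ideal denoted by an ω-marking, and the extraction of the maximal elements of a finite set of ω-markings (the antichain underlying the Clover), can be sketched as follows. All names are illustrative, and ω is modelled by floating-point infinity, for which the componentwise comparisons below behave as expected:

```python
OMEGA = float("inf")  # stands for ω

def leq(m1, m2):
    """Componentwise order on ω-markings (dicts over a common set of places)."""
    return all(m1[p] <= m2[p] for p in m1)

def in_ideal(m, omega_marking):
    """Membership of a plain marking m in the ideal denoted by an ω-marking."""
    return leq(m, omega_marking)

def maximal(ms):
    """Keep only maximal elements, turning a finite duplicate-free list
    of ω-markings into an antichain."""
    return [m for m in ms if not any(m is not n and leq(m, n) for n in ms)]

C = [{"p": 1, "q": 0}, {"p": 2, "q": 0}, {"p": 0, "q": OMEGA}]
clover = maximal(C)   # {"p": 1, "q": 0} is covered by {"p": 2, "q": 0}
assert {"p": 1, "q": 0} not in clover
assert in_ideal({"p": 0, "q": 7}, {"p": 0, "q": OMEGA})
```

This is exactly the reduction step that distinguishes the Clover from an arbitrary finite representation C of the cover: comparable elements are redundant.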
Due to the above properties, m∈Ω there exists a unique ﬁnite set with minimal size Clover(N , m ) ⊆ N such that: Cover(N , m )= Clover(N , m ) 0 0 A more general result can be found in [3] for well structured transition systems. Example 1. The marked net of Figure 1 is unbounded. Its Clover is the following set: {p ,p + p ,p + p + ωp ,p + p + ωp + ωp } i bk m l m ba l bk ba c For instance, the marking p +p +αp +βp is reached thus covered by sequence l bk ba c α+β β t t t . 5 6 t p t 1 l 5 ba 2 bk i t t t 3 4 Fig. 1. An unbounded Petri net 2.2 Abstraction and acceleration In order to introduce abstractions and accelerations, we generalize the transitions to allow the capability to mark a place with ω tokens. Deﬁnition 4. Let P be a set of places. An ω-transition a is deﬁned by: – Pre(a) ∈ N its backward incidence; –C(a) ∈ Z its incidence with Pre(a)+ C(a) ≥ 0. For sake of homogeneity, one denotes Pre(a)(p) (resp. C(a)(p)) by Pre(p, a) (resp. C(p, a)). An ω-transition a is ﬁreable from an ω-marking m ∈ N if def m ≥ Pre(a). When a is ﬁreable from m, its ﬁring leads to the ω-marking m = m + C(a), denoted as previously m m . One observes that if Pre(p, a)= ω −→ then for all values of C(p, a), m (a)= ω. So without loss of generality, one assumes that for all ω-transition a, Pre(p, a)= ω implies C(p, a)= ω. In order to deﬁne abstractions, we ﬁrst deﬁne the incidences of a sequence σ of def ω-transitions by recurrence on its length. As previously, we denote Pre(p, σ) = 242 A. Finkel et al. def Pre(σ)(p) and C(p, σ) = C(σ)(p). The base case corresponds to the deﬁnition of an ω-transition. Let σ = tσ , with t an ω-transition and σ a sequence of ω-transitions, then: –C(σ)= C(t)+ C(σ ); – for all p ∈ P • if C(p, t)= ω then Pre(p, σ)= Pre(p, t); • else Pre(p, σ) = max(Pre(p, t), Pre(p, σ ) − C(p, t)). One checks by recurrence that σ is ﬁrable from m if and only if m ≥ Pre(σ) and in this case, m m + C(σ). 
An abstraction of a net is an ω-transition which concisely expresses the behaviour of the net w.r.t. covering (see Proposition 1). One observes that a transition t of the net is by construction an abstraction (with σ_n = t for all n).

Definition 5. Let N = (P, T, Pre, C) be a Petri net and a be an ω-transition. a is an abstraction if for all n ≥ 0 there exists σ_n ∈ T^* such that for all p ∈ P with Pre(p,a) ∈ N:
1. Pre(p,σ_n) ≤ Pre(p,a);
2. if C(p,a) ∈ Z then C(p,σ_n) ≥ C(p,a);
3. if C(p,a) = ω then C(p,σ_n) ≥ n.

The following proposition justifies the interest of abstractions.

Proposition 1. Let (N, m_0) be a marked Petri net, a be an abstraction and m be an ω-marking such that ⟦m⟧ ⊆ Cover(N, m_0) and m −a→ m'. Then ⟦m'⟧ ⊆ Cover(N, m_0).

Proof. Pick some m* ∈ ⟦m'⟧. Denote n = max(m*(p) | m'(p) = ω) and ℓ_n = max(Pre(p,σ_n), n − C(p,σ_n) | m(p) = ω). Let us define m'' ∈ ⟦m⟧ by:
- if m(p) < ω then m''(p) = m(p);
- else m''(p) = ℓ_n.

Let us check that σ_n is fireable from m''. Let p ∈ P:
- if m(p) < ω then m''(p) = m(p) ≥ Pre(p,a) ≥ Pre(p,σ_n);
- else m''(p) = ℓ_n ≥ Pre(p,σ_n).

Let us show that m'' + C(σ_n) ≥ m*. Let p ∈ P:
- if m(p) < ω and C(p,a) < ω then m''(p) + C(p,σ_n) ≥ m(p) + C(p,a) = m'(p) ≥ m*(p);
- if m(p) < ω and C(p,a) = ω then m''(p) + C(p,σ_n) ≥ C(p,σ_n) ≥ n ≥ m*(p);
- if m(p) = ω then m''(p) + C(p,σ_n) ≥ n − C(p,σ_n) + C(p,σ_n) = n ≥ m*(p).

An easy way to build new abstractions consists in concatenating them.

Proposition 2. Let N = (P, T, Pre, C) be a Petri net and σ be a sequence of abstractions. Then the ω-transition a defined by Pre(a) = Pre(σ) and C(a) = C(σ) is an abstraction.

We now introduce the underlying concept of the Karp and Miller construction.

Definition 6. Let N = (P, T, Pre, C) be a Petri net. One says that a is an acceleration if a is an abstraction such that C(a) ∈ {0, ω}^P.

The following proposition provides a way to get an acceleration from an arbitrary abstraction.

Proposition 3.
Let N = (P, T, Pre, C) be a Petri net and a be an abstraction. Define an ω-transition a' as follows. For all p ∈ P:
- if C(p,a) < 0 then Pre(p,a') = C(p,a') = ω;
- if C(p,a) = 0 then Pre(p,a') = Pre(p,a) and C(p,a') = 0;
- if C(p,a) > 0 then Pre(p,a') = Pre(p,a) and C(p,a') = ω.
Then a' is an acceleration.

Let us study more deeply the set of accelerations. First we equip the set of ω-transitions with a "natural" order w.r.t. covering.

Definition 7. Let P be a set of places and a, a' be two ω-transitions. a ≤ a' if and only if Pre(a) ≤ Pre(a') ∧ C(a) ≥ C(a').

In other words, a ≤ a' if, given any ω-marking m, whenever a' is fireable from m then a is also fireable from m and its firing leads to a marking greater than or equal to the one reached by the firing of a'.

Proposition 4. Let N be a Petri net. Then the set of abstractions of N is upward closed. Similarly, the set of accelerations is upward closed in the set of ω-transitions whose incidence belongs to {0, ω}^P.

Proposition 5. The set of accelerations of a Petri net is well ordered.

Proof. The set of accelerations is a subset of N_ω^P × {0, ω}^P (where P is the set of places), with the order obtained by iterating cartesian products of the sets (N_ω, ≤) and ({0, ω}, ≥). These sets are well ordered and the cartesian product preserves this property, so we are done.

Since the set of accelerations is well ordered and upward closed, it is equal to the upward closure of the finite set of minimal accelerations. Let us study the size of a minimal acceleration. Given some Petri net, one denotes d = |P| and e = max_{p,t} max(Pre(p,t), Pre(p,t) + C(p,t)).

We are going to use the following result of Jérôme Leroux (published on HAL in June 2019), which provides a bound on the lengths of shortest sequences between two mutually reachable markings m_1 and m_2.

Theorem 1 (Theorem 2, [12]). Let N be a Petri net, m_1, m_2 be markings, and σ_1, σ_2 be sequences of transitions such that m_1 −σ_1→ m_2 −σ_2→ m_1.
Then there exist σ′₁, σ′₂ such that m₁ −σ′₁→ m₂ −σ′₂→ m₁ fulfilling:

|σ′₁σ′₂| ≤ ||m₁ − m₂||∞ (3de)^((2d+4)(d+1))

One deduces an upper bound on the size of minimal accelerations. Let v ∈ ℕ_ω^P. One denotes ||v||∞ = max(v(p) | v(p) ∈ ℕ).

Proposition 6. Let N be a Petri net and a be a minimal acceleration. Then ||Pre(a)||∞ ≤ e(3de)^((2d+4)(d+1)).

Proof. Let us consider the net N′ = ⟨P′, T′, Pre′, C′⟩ obtained from N by deleting the set of places {p | Pre(p, a) = ω} and adding the set of transitions T₁ = {t_p | p ∈ P′} with Pre(t_p) = p and C(t_p) = −p. Observe that d′ ≤ d and e′ = e. One denotes P₁ = {p | Pre(p, a) < ω = C(p, a)}. One introduces the marking m₁ obtained by restricting Pre(a) to P′, and m₂ = m₁ + Σ_{p∈P₁} p.

Let {σₙ}_{n∈ℕ} be a family of sequences associated with a. Let n = ||Pre(a)||∞ + 1. Then σₙ is fireable in N′ from m₁ and its firing leads to a marking that covers m₂. By concatenating some occurrences of transitions of T₁, one gets a firing sequence m₁ −→ m₂ in N′. Using the same process, one gets a firing sequence m₂ −→ m₁.

Let us apply Theorem 1. There exists a sequence σ′₁ with m₁ −σ′₁→ m₂ and |σ′₁| ≤ (3de)^((2d+4)(d+1)) since ||m₁ − m₂||∞ = 1. By deleting the transitions of T₁ occurring in σ′₁, one gets a sequence σ₁ ∈ T* such that m₁ −σ₁→ m′₂ ≥ m₂ with |σ₁| ≤ (3de)^((2d+4)(d+1)). The ω-transition a′, defined by Pre(p, a′) = Pre(p, σ₁) for all p ∈ P′, Pre(p, a′) = ω for all p ∈ P \ P′ and C(a′) = C(a), is an acceleration whose associated family is {σ₁ⁿ}_{n∈ℕ}. By definition of m₂, a′ ≤ a. Since a is minimal, a′ = a. Observing that |σ₁| ≤ (3de)^((2d+4)(d+1)), one gets ||Pre(a)||∞ = ||Pre(a′)||∞ ≤ e(3de)^((2d+4)(d+1)).

Thus given any acceleration, one can easily obtain a smaller acceleration whose (representation) size is exponential.

Proposition 7. Let N be a Petri net and a be an acceleration.
Then the ω-transition trunc(a) defined by:
– C(trunc(a)) = C(a);
– for all p such that Pre(p, a) ≠ ω, Pre(p, trunc(a)) = min(Pre(p, a), e(3de)^((2d+4)(d+1)));
– for all p such that Pre(p, a) = ω, Pre(p, trunc(a)) = ω;
is an acceleration.

Proof. Let a′ ≤ a be a minimal acceleration. For all p such that Pre(p, a) ≠ ω, Pre(p, a′) ≤ e(3de)^((2d+4)(d+1)). So a′ ≤ trunc(a). Since the set of accelerations is upward closed, one gets that trunc(a) is an acceleration.

3 A coverability tree algorithm

3.1 Specification and illustration

As discussed in the introduction, to compute the clover of a Petri net, most algorithms build coverability trees (or graphs), which are variants of the Karp and Miller tree, with the aim of reducing the peak memory during the execution. The seminal algorithm [6] differs from the KMT construction in one main respect: when it finds that the marking associated with the current vertex strictly covers the marking of another vertex, it deletes the subtree rooted at this vertex, and when the current vertex belonged to the removed subtree it substitutes it for the root of the deleted subtree. This operation drastically reduces the peak memory but, as shown in [8], entails incompleteness of the algorithm.

Like the previous algorithms that ensure completeness with deletions, our algorithm also needs additional memory. However, unlike the other algorithms, it memorizes accelerations instead of ω-markings. This approach has two advantages. First, we are able to exhibit a theoretical upper bound on the additional memory which is doubly exponential, while the other algorithms have no such bound. Furthermore, accelerations are reused in the construction and thus may even shorten the execution time and peak space w.r.t. the algorithm in [6].
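Operationally, the truncation of Proposition 7 (used again at line 16 of Algorithm 1) is a pointwise cap. The following Python sketch is our own illustration, not the authors' tool: ω is modelled by math.inf, an ω-transition by two dicts mapping places to their Pre and C entries, and the parameter `bound` stands for e(3de)^((2d+4)(d+1)).

```python
import math

OMEGA = math.inf  # models the symbol ω

def trunc(pre, c, bound):
    """Proposition 7: cap every finite entry of Pre(a) at `bound`;
    the ω entries of Pre(a) and the whole incidence C(a) are unchanged."""
    capped = {p: v if v == OMEGA else min(v, bound) for p, v in pre.items()}
    return capped, dict(c)
```

Since trunc(a) only lowers Pre while keeping C, it still lies above a minimal acceleration in the order of Definition 7, and the upward closure of the set of accelerations makes it an acceleration again.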
Before we delve into a high-level description of this algorithm, let us present some of the variables, functions, and definitions it uses. Algorithm 1, denoted from now on as MinCov, takes as input a marked net (N, m₀) and constructs a directed labeled tree CT = (V, E, λ, δ) and a set Acc of ω-transitions (which by Lemma 2 are accelerations). Each v ∈ V is labeled by an ω-marking λ(v) ∈ ℕ_ω^P. Since CT is a directed tree, every vertex v ∈ V has a predecessor (except the root r), denoted by prd(v), and a set of descendants, denoted by Des(v). By convention, prd(r) = r. Each edge e ∈ E is labeled by a firing sequence δ(e) ∈ T·Acc*, consisting of an ordinary transition followed by a sequence of accelerations (which by Lemma 1 fulfills λ(prd(v)) −δ(prd(v),v)→ λ(v)). In addition, again by Lemma 1, m₀ −δ(r,r)→ λ(r). Let γ = e₁e₂...e_k ∈ E* be a path in the tree; we denote δ(γ) := δ(e₁)δ(e₂)...δ(e_k) ∈ (T ∪ Acc)*. The subset Front ⊆ V is the set of vertices 'to be processed'. MinCov may call the function Delete(v), which removes from V a leaf v of CT, and the function Prune(v), which removes from V all descendants of v ∈ V except v itself, as illustrated in the following figure:

[Figure: Delete(u) removes the leaf u from the tree; Prune(v) removes all proper descendants of v]

First MinCov performs some initializations, setting the tree CT to a single vertex r with marking λ(r) = m₀ and Front = {r}. Afterwards the main loop builds the tree, where each iteration consists in processing some vertex of Front as follows. MinCov picks a vertex u ∈ Front (line 3). From λ(u), MinCov fires a sequence σ ∈ Acc* reaching some m′ that maximizes the number of ω produced, i.e. |{p ∈ P | λ(u)(p) ≠ ω ∧ m′(p) = ω}|. Thus in σ no acceleration occurs twice, and its length is bounded by |P|. Then MinCov updates λ(u) with m′ (line 5) and the label of the edge incoming to u by concatenating σ (line 6).
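The acceleration step of lines 4-5 can be sketched as follows. This is our own greedy illustration in Python, not the authors' implementation, and the greedy strategy need not realize the exact maximum required at line 4; it only shows the mechanism. As before, ω is modelled by math.inf and an ω-transition by a pair of dicts (Pre, C).

```python
import math

OMEGA = math.inf  # models ω

def fire(m, pre, c):
    """Fire an ω-transition (Pre, C) from ω-marking m, or return None
    if it is not fireable; ω absorbs any finite increment or decrement."""
    if any(m[p] < pre.get(p, 0) for p in m):
        return None
    return {p: m[p] + c.get(p, 0) for p in m}

def omegas(m):
    """Number of ω components of an ω-marking."""
    return sum(v == OMEGA for v in m.values())

def saturate(m, accs):
    """Greedy sketch of lines 4-5 of MinCov: fire accelerations from Acc
    while some firing creates a new ω.  Each useful firing adds at least
    one ω, so no acceleration is needed twice and at most |P| apply."""
    applied = []
    changed = True
    while changed:
        changed = False
        for a in accs:
            m2 = fire(m, *a)
            if m2 is not None and omegas(m2) > omegas(m):
                m, changed = m2, True
                applied.append(a)
                break
    return m, applied
```

Note how an acceleration fired early can enable further ones: producing an ω on a place may satisfy the Pre of another acceleration.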
Afterwards it performs one of the following actions according to the marking λ(u):
– Cleaning (line 7): if there exists u′ ∈ V \ Front with λ(u′) ≥ λ(u), the vertex u is redundant and MinCov calls Delete(u).
– Accelerating (lines 8-16): if there exists an ancestor u′ of u with λ(u′) < λ(u), then an acceleration can be computed. The acceleration a is deduced from the firing sequence labeling the path from u′ to u. MinCov inserts a into Acc, calls Prune(u′) and pushes u′ back into Front.
– Exploring (lines 18-25): otherwise MinCov calls Prune(u′) followed by Delete(u′) for all u′ ∈ V with λ(u′) < λ(u), since they are redundant. Afterwards, it removes u from Front and, for every transition t ∈ T fireable from λ(u), it creates a new child for u in CT and inserts it into Front.

For a detailed example of a run of the algorithm see Example 2 in [9].

3.2 Correctness Proof

We now establish the correctness of Algorithm 1 by proving the following properties (where for all W ⊆ V, λ(W) denotes {λ(v) | v ∈ W}):
– its termination;
– the incomparability of the ω-markings associated with the vertices of V: λ(V) is an antichain;
– its consistency: λ(V) ⊆ Cover(N, m₀);
– its completeness: Cover(N, m₀) ⊆ λ(V).

We get termination by using the well order of ℕ_ω^P and Koenig's Lemma.

Proposition 8. MinCov terminates.

Proof. Consider the following variation of the algorithm. Instead of deleting the current vertex when its marking is smaller than or equal to the marking of another vertex, one marks it as 'cut' and extracts it from Front. Instead of cutting a subtree when the marking of the current vertex v is greater than the marking of a vertex which is not an ancestor of v, one marks its vertices as 'cut' and extracts from Front those that are inside.
Instead of cutting a subtree when the marking of the current vertex v is greater than the marking of a vertex which is an ancestor of v, say v′, one marks the vertices on the path from v′ to v (except v) as 'accelerated', one marks the other vertices of the subtree as 'cut', and one inserts v again into Front with the marking of v′. All the markings of the subtree that are in Front are extracted from it.

Algorithm 1: Computing the minimal coverability set MinCov(N, m₀)

Input: A marked Petri net (N, m₀)
Data: V set of vertices; E ⊆ V × V; Front ⊆ V; λ : V → ℕ_ω^P; δ : E → T·Acc*; CT = (V, E, λ, δ) a labeled tree; Acc a set of ω-transitions
Output: A labeled tree CT = (V, E, λ, δ)
 1  V ← {r}; E ← ∅; Front ← {r}; λ(r) ← m₀; Acc ← ∅; δ(r, r) ← ε
 2  while Front ≠ ∅ do
 3      Select u ∈ Front
 4      Let σ ∈ Acc* be a maximal fireable sequence of accelerations from λ(u)   // maximal w.r.t. the number of ω's produced
 5      λ(u) ← λ(u) + C(σ)
 6      δ((prd(u), u)) ← δ((prd(u), u)) · σ
 7      if ∃u′ ∈ V \ Front s.t. λ(u′) ≥ λ(u) then Delete(u)   // λ(u) is covered
 8      else if ∃u′ ∈ Anc(u) s.t. λ(u) > λ(u′) then   // an acceleration was found between u and one of u's ancestors
 9          Let γ ∈ E* be the path from u′ to u in CT
10          a ← NewAcceleration()
11          foreach p ∈ P do
12              if C(p, δ(γ)) < 0 then Pre(p, a) ← ω; C(p, a) ← ω
13              if C(p, δ(γ)) = 0 then Pre(p, a) ← Pre(p, δ(γ)); C(p, a) ← 0
14              if C(p, δ(γ)) > 0 then Pre(p, a) ← Pre(p, δ(γ)); C(p, a) ← ω
15          end
16          a ← trunc(a); Acc ← Acc ∪ {a}; Prune(u′); Front ← Front ∪ {u′}
17      else
18          for u′ ∈ V do   // remove vertices labeled by markings covered by λ(u)
19              if λ(u′) < λ(u) then Prune(u′); Delete(u′)
20          end
21          Front ← Front \ {u}
22          foreach t ∈ T s.t. λ(u) ≥ Pre(t) do   // add the children of u
23              u′ ← NewNode(); V ← V ∪ {u′}; Front ← Front ∪ {u′}; E ← E ∪ {(u, u′)}
24              λ(u′) ← λ(u) + C(t); δ((u, u′)) ← t
25          end
26      end
27  end
28  return CT
All the vertices marked as 'cut' or 'accelerated' are ignored for comparisons and for discovering accelerations. This alternative algorithm behaves as the original one except that the size of the tree never decreases, and so if the algorithm does not terminate the tree is infinite. Since this tree is finitely branching, due to Koenig's Lemma it contains an infinite path. On this infinite path, no vertex can be marked as 'cut' since it would belong to a finite subtree. Observe that the marking labelling the vertex following an accelerated subpath has at least one more ω than the marking of the first vertex of this subpath. So there is an infinite subpath whose vertices are unmarked. But ℕ_ω^P is well ordered, so there must be two vertices v and v′, where v′ is a descendant of v with λ(v′) ≥ λ(v), which contradicts the behaviour of the algorithm.

Since we are going to use recurrence on the number of iterations of the main loop of Algorithm 1, we introduce the following notations: CTₙ = (Vₙ, Eₙ, λₙ, δₙ), Frontₙ, and Accₙ are the values of the variables CT, Front, and Acc at line 2 when n iterations have been executed.

Proposition 9. For all n ∈ ℕ, λ(Vₙ \ Frontₙ) is an antichain. Thus on termination, λ(V) is an antichain.

Proof. Let us introduce V̄ := V \ Front and V̄ₙ := Vₙ \ Frontₙ. We are going to prove by induction on the number n of iterations of the while-loop that λ(V̄ₙ) is an antichain. MinCov initializes the variables V and Front at line 1. So V₀ = {r} and Front₀ = {r}, therefore V̄₀ = V₀ \ Front₀ = ∅ and λ(V̄₀) is an antichain.

Assume that λ(V̄ₙ) is an antichain. Modifying V̄ can be done by adding or removing vertices from V and by removing vertices from Front while keeping them in V. The actions that MinCov may perform in order to modify the sets V and Front are: Delete (lines 7 and 19), Prune (lines 16 and 19), adding vertices to V (line 23), adding vertices to Front (lines 16 and 23), and removing vertices from Front (line 21).
• Both Delete and Prune do not add new vertices to V̄. Thus the antichain property is preserved.
• MinCov may add vertices to V only at line 23, where it simultaneously adds them to Front, and therefore does not add new vertices to V̄. Thus the antichain property is preserved.
• Adding vertices to Front may only remove vertices from V̄. Thus the antichain property is preserved.
• MinCov can only add a vertex to V̄ when it removes it from Front while keeping it in V. This is done only at line 21, and there the only vertex MinCov may remove is the working vertex u. However, if (in this iteration) MinCov reaches line 21 then it did not stop at line 7; hence (1) all markings of λ(V̄ₙ) are either smaller than or incomparable to λₙ₊₁(u). Moreover, MinCov has also executed lines 18-20, where (2) it performs Delete on all vertices u′ ∈ Vₙ with λₙ(u′) < λₙ₊₁(u). Let us denote by V̄′ ⊆ V̄ₙ the set V̄ at the end of line 20. Due to (1) and (2), the marking λₙ₊₁(u) is incomparable to any marking in λ(V̄′). Since V̄′ ⊆ V̄ₙ, λ(V̄′) is an antichain. Combining this fact with the incomparability between λₙ₊₁(u) and any marking in λ(V̄′), we conclude that the set λ(V̄ₙ₊₁) = λ(V̄′) ∪ {λₙ₊₁(u)} is an antichain.

In order to establish consistency, we prove that the labelling of vertices and edges is compatible with the firing rule and that Acc is a set of accelerations.

Lemma 1. For all n ∈ ℕ and all u ∈ Vₙ \ {r}, λₙ(prd(u)) −δ(prd(u),u)→ λₙ(u), and m₀ −δ(r,r)→ λₙ(r).

Proof. Let us prove by induction on the number n of iterations of the main loop that the assertions of the lemma hold for all v ∈ Vₙ. Initially, V₀ = {r} and λ₀(r) = m₀. Since m₀ −ε→ m₀ = λ₀(r), the base case is established.

Assume that the assertions hold for CTₙ. Observe that MinCov may change the labeling function λ and/or add new vertices in exactly two places: at lines 4-6 and at lines 22-25.
Therefore, in order to prove the assertion, we show that it still holds after each group of lines.
• After lines 4-6: MinCov computes (1) a maximal fireable sequence σ ∈ Acc* from λₙ(u) (line 4), and updates u's marking to m_u = λₙ(u) + C(σ) (line 5). Since the assertions hold for CTₙ, (2) if u ≠ r then λₙ(prd(u)) −δ(prd(u),u)→ λₙ(u), else m₀ −δ(r,r)→ λₙ(r). By concatenation, we get λₙ(prd(u)) −δ(prd(u),u)σ→ m_u if u ≠ r, and otherwise m₀ −δ(r,r)σ→ m_u, which establishes that the assertions hold after line 6.
• After lines 22-25: the vertices for which λ is updated at these lines are the children of u that are added to the tree. For every transition t ∈ T fireable from λ(u), MinCov creates a child v_t for u (lines 22-23). The marking of any child v_t is set to λₙ₊₁(v_t) := λₙ₊₁(u) + C(t) (line 24). Therefore, since λₙ₊₁(u) −t→ λₙ₊₁(v_t), the assertions hold.

Lemma 2. At any execution point of MinCov, Acc is a set of accelerations.

Proof. At most one acceleration is added per iteration. Let us prove by induction on the number n of iterations of the main loop that Accₙ is a set of accelerations. Since Acc₀ = ∅, the base case is straightforward.

Assume that Accₙ is a set of accelerations and consider Accₙ₊₁. In an iteration, MinCov may add an ω-transition a to Acc. Due to the inductive hypothesis, δ(γ) is a sequence of abstractions, where γ is defined at line 9. Consider b, the ω-transition defined by Pre(b) = Pre(δ(γ)) and C(b) = C(δ(γ)). Due to Proposition 2, b is an abstraction. Due to Proposition 3, the loop of lines 11-15 transforms b into an acceleration a. Due to Proposition 7, after the truncation at line 16, a is still an acceleration.

Proposition 10. λ(V) ⊆ Cover(N, m₀).

Proof. Let v ∈ V. Consider the path u₀, ..., u_k of CT from the root r = u₀ to u_k = v. Let σ ∈ (T ∪ Acc)* denote δ(prd(u₀), u₀) ··· δ(prd(u_k), u_k). Due to Lemma 1, m₀ −σ→ λ(v). Due to Lemma 2, σ is a sequence of abstractions.
Due to Proposition 2, the ω-transition a defined by Pre(a) = Pre(σ) and C(a) = C(σ) is an abstraction. Due to Proposition 1, λ(v) ⊆ Cover(N, m₀).

The following definitions are related to an arbitrary execution point of MinCov and are introduced to establish its completeness.

Definition 8. Let σ = σ₀t₁σ₁...t_kσ_k with, for all i, tᵢ ∈ T and σᵢ ∈ Acc*. Then the firing sequence m₀ −σ→ m is an exploring sequence if:
– there exists v ∈ Front with λ(v) = m;
– for all 0 ≤ i ≤ k, there does not exist v′ ∈ V \ Front with m₀ + C(σ₀t₁σ₁...tᵢσᵢ) ≤ λ(v′).

Definition 9. Let m be a marking. Then m is quasi-covered if:
– either there exists v ∈ V \ Front with λ(v) ≥ m;
– or there exists an exploring sequence m₀ −σ→ m′ ≥ m.

In order to prove completeness of the algorithm, we want to prove that at the beginning of every iteration, any m ∈ Cover(N, m₀) is quasi-covered. To establish this assertion, we introduce several lemmas showing that it is preserved by some actions of the algorithm under some prerequisites. More precisely, Lemma 3 corresponds to the deletion of the current vertex, Lemma 4 to the discovery of an acceleration, Lemma 5 to the deletion of a subtree whose root marking is smaller than the marking of the current vertex, and Lemma 6 to the creation of the children of the current vertex.

Lemma 3. Let CT, Front and Acc be the values of the corresponding variables at some execution point of MinCov and let u ∈ V be a leaf of CT such that the following items hold:
1. all m ∈ Cover(N, m₀) are quasi-covered;
2. λ(V \ Front) is an antichain;
3. for all a ∈ Acc fireable from λ(u), λ(u) = λ(u) + C(a);
4. there exists v ∈ V \ {u} such that λ(v) ≥ λ(u).
Then all m ∈ Cover(N, m₀) are quasi-covered after performing Delete(u).

Lemma 4. Let CT, Front and Acc be the values of the corresponding variables at some execution point of MinCov and let u ∈ V be such that the following items hold:
1. all m ∈ Cover(N, m₀) are quasi-covered;
2.
λ(V \ Front) is an antichain;
3. for all v ∈ V \ {r}, λ(prd(v)) −δ(prd(v),v)→ λ(v).
Then all m ∈ Cover(N, m₀) are quasi-covered after performing Prune(u) and then adding u to Front.

Lemma 5. Let CT, Front and Acc be the values of the corresponding variables at some execution point of MinCov, and let u ∈ Front and u′ ∈ V be such that the following items hold:
1. all m ∈ Cover(N, m₀) are quasi-covered;
2. λ(V \ Front) is an antichain;
3. for all v ∈ V \ {r}, λ(prd(v)) −δ(prd(v),v)→ λ(v);
4. λ(u′) < λ(u) and u is not a descendant of u′.
Then after performing Prune(u′); Delete(u′):
1. all m ∈ Cover(N, m₀) are quasi-covered;
2. λ(V \ Front) is an antichain;
3. for all v ∈ V \ {r}, λ(prd(v)) −δ(prd(v),v)→ λ(v).

Lemma 6. Let CT, Front and Acc be the values of the corresponding variables at some execution point of MinCov, and let u ∈ Front be such that the following items hold:
1. all m ∈ Cover(N, m₀) are quasi-covered;
2. λ(V \ Front) ∪ {λ(u)} is an antichain;
3. for all a ∈ Acc fireable from λ(u), λ(u) = λ(u) + C(a).
Then after removing u from Front and, for every t ∈ T fireable from λ(u), adding to u a child v_t in Front with marking λ(v_t) = λ(u) + C(t), all m ∈ Cover(N, m₀) are quasi-covered.

Proposition 11. At the beginning of every iteration, all m ∈ Cover(N, m₀) are quasi-covered.

Proof. Let us prove by induction on the number of iterations that all m ∈ Cover(N, m₀) are quasi-covered.

Let us consider the base case. MinCov initializes V and Front to {r} and λ(r) to m₀. By definition, for all m ∈ Cover(N, m₀) there exists σ = t₁t₂···t_k ∈ T* such that m₀ −σ→ m′ ≥ m. Since V \ Front = ∅, this firing sequence is an exploring sequence.

Assume that all m ∈ Cover(N, m₀) are quasi-covered at the beginning of some iteration. Let us examine what may happen during the iteration.
In lines 4-6, MinCov computes a maximal fireable sequence σ ∈ Acc* from λₙ(u) (line 4) and sets u's marking to m_u := λₙ(u) + C(σ) (line 5). Afterwards, there are three possible cases: (1) either m_u is covered by some marking associated with a vertex out of Front, (2) or an acceleration is found, (3) or MinCov computes the successors of u and removes u from Front.

Line 7. MinCov calls Delete(u), so CTₙ₊₁ is obtained by deleting u. Moreover, λ(u′) ≥ m_u. Let us check the hypotheses of Lemma 3. Assertion 1 follows from the induction hypothesis since (1) the only change in the data is the increase of λ(u) obtained by firing some accelerations and (2) u belongs to Front, so it cannot cover intermediate markings of exploring sequences. Assertion 2 follows from Proposition 9 since V \ Front is unchanged. Assertion 3 follows immediately from lines 4-6. Assertion 4 follows with v = u′. Thus, using this lemma, the induction step is proved in this case.

Lines 8-16. Let us check the hypotheses of Lemma 4. Assertions 1 and 2 are established as in the previous case. Assertion 3 holds due to Lemma 1 and the fact that no edge has been added since the beginning of the iteration. Thus, using this lemma, the induction step is proved in this case.

Lines 18-25. We first show that the hypotheses of Lemma 6 hold before line 21. Let us denote the values of CT and Front after line 20 by CT′ₙ and Front′ₙ. Observe that for every iteration of line 19 in the inner loop, the hypotheses of Lemma 5 are satisfied. Therefore, in order to apply Lemma 6, it remains only to check assertions 2 and 3 of this lemma. Assertion 2 holds since (1) λ(V \ Front) is an antichain, (2) due to line 7 there is no w ∈ V \ Front such that λ(w) ≥ λ(u), and (3) by the iterations of line 19 all w ∈ V \ Front such that λ(w) < λ(u) have been deleted. Assertion 3 holds due to line 5 (all useful enabled accelerations have been fired) and line 8 (no acceleration has been added). Lines 21-25 correspond to the operations related to Lemma 6.
Thus, using this lemma, the induction is proved in this case.

The completeness of MinCov is an immediate consequence of the previous proposition.

Corollary 1. When MinCov terminates, Cover(N, m₀) ⊆ λ(V).

Proof. By Proposition 11, all m ∈ Cover(N, m₀) are quasi-covered. Since on termination Front is empty, for all m ∈ Cover(N, m₀) there exists v ∈ V such that m ≤ λ(v).

4 Tool and benchmarks

In order to empirically evaluate our algorithm, we have implemented a prototype tool which computes the clover and solves the coverability problem. This tool is developed in the programming language Python, using the NumPy library. It can be found on GitHub: https://github.com/IgorKhm/MinCov. All benchmarks were performed on a computer equipped with an Intel i5-8250U CPU with 4 cores, 16GB of memory and Ubuntu Linux 18.03.

Minimal coverability set. We compare MinCov with the tool MP [14], the tool VH [16], and the tool CovProc [10]. We have also implemented the (incomplete) minimal coverability tree algorithm, denoted by AF, in order to measure the additional memory needed by the (complete) tools. Both the MP and VH tools were sent to us by courtesy of their authors. The tool MP has an implementation in Python and another in C++. For the comparison we selected the Python one, to avoid biases due to the programming language.

We ran two kinds of benchmarks (summarized in Table 1): (1) 123 standard benchmarks from the literature, taken from [2]; and (2) 100 randomly generated Petri nets, since the benchmarks from the literature do not present all the features that lead to infinite state systems. These random Petri nets have the following properties: (1) 50 < |P|, |T| < 100, (2) the number of places connected to each transition is bounded by 10, and (3) they are not structurally bounded. The execution time of the tools was limited to 900 seconds. Table 1 contains a summary of all the instances of the benchmarks.
The first column shows the number of instances on which the tool timed out. The Time column gives the total time over the instances that did not time out, plus 900 seconds for every instance that led to a time out. The #Nodes column gives the peak number of nodes over the instances that did not time out on any of the tools (except CovProc, which does not provide this number). For MinCov we take the peak number of nodes plus accelerations.

Table 1. Benchmarks for clover

123 benchmarks from the literature        100 random benchmarks
          T/O   Time   #Nodes                       T/O   Time   #Nodes
MinCov     16  18127    48218             MinCov     14  13989    61164
VH         15  14873    75225             VH         15  13692   208134
MP         24  23904   478681             MP         21  21726   755129
CovProc    49  47081      N/A             CovProc    80  74767      N/A
AF         19  19223    45660             AF         16  15888    63275

In the benchmarks from the literature we observed that the instances that timed out for MinCov are included in those of AF and MP. However, there were instances that timed out on VH but not on MinCov, and vice versa. MinCov is the second fastest tool; compared to VH it is 1.2 times slower. A possible explanation is that VH is implemented in C++. As could be expected, w.r.t. memory requirements MinCov has the smallest number of nodes. In the benchmarks from the literature MinCov has approximately 10 times fewer nodes than MP and 1.6 times fewer than VH. In the random benchmarks these ratios are significantly higher.

Coverability. We compare MinCov to the tool qCover [2] on the set of benchmarks from the literature in Table 2. In [2], qCover is compared to the most competitive tools for coverability and achieves a score of 142 solved instances, while the second best tool achieves a score of 122. We split the results into safe instances (not coverable) and unsafe ones (coverable). In both categories we counted the number of instances on which the tools failed (columns T/O) and the total time (columns Time), computed as in Table 1. We observed that the tools are complementary, i.e.
qCover is faster at proving that an instance is safe and MinCov is faster at proving that an instance is unsafe.

Table 2. Benchmarks for the coverability problem (60 unsafe and 115 safe)

                  Time unsafe   T/O unsafe   Time safe   T/O safe   T/O    Time
MinCov                   1754            1       51323         53    54   53077
qCover                  26467           26       11865         11    37   38332
MinCov ‖ qCover          1841            2       13493         11    13   15334

Therefore, by splitting the processing time between them, we get better results. The third row of Table 2 represents a parallel execution of the two tools, where the time for each instance is computed as follows:

Time(MinCov ‖ qCover) = 2 · min(Time(MinCov), Time(qCover)).

Combining both tools is 2.5 times faster than qCover and 3.5 times faster than MinCov, which confirms the above statement. We could get still better results by dynamically deciding which ratio of CPU to share between the tools, depending on some predicted status of the instance.

5 Conclusion

We have proposed a simple and efficient modification of the incomplete minimal coverability tree algorithm for building the clover of a net. Our algorithm is based on the introduction of the concepts of covering abstractions and accelerations. Compared to the alternative algorithms previously designed, we have theoretically bounded the size of the additional space. Furthermore, we have implemented a prototype which is already very competitive.

From a theoretical point of view, we plan to study how abstractions and accelerations could be defined in the more general context of well structured transition systems. From an experimental point of view, we will follow three directions in order to increase the performance of our tool. First, as in [13], we have to select appropriate data structures to minimize the number of comparisons between ω-markings. Then we want to precompute a set of accelerations using linear programming, as the correctness of the algorithm is preserved and the efficiency could be significantly improved.
Last, we want to take advantage of parallelism in a more general way than simultaneously running several tools.

References

1. Blockelet, M., Schmitz, S.: Model checking coverability graphs of vector addition systems. In: Proceedings of MFCS 2011. LNCS, vol. 6907, pp. 108–119 (2011)
2. Blondin, M., Finkel, A., Haase, C., Haddad, S.: Approaching the coverability problem continuously. In: Proceedings of TACAS 2016. LNCS, vol. 9636, pp. 480–496. Springer (2016)
3. Blondin, M., Finkel, A., McKenzie, P.: Well behaved transition systems. Logical Methods in Computer Science 13(3), 1–19 (2017)
4. Demri, S.: On selective unboundedness of VASS. J. Comput. Syst. Sci. 79(5), 689–713 (2013)
5. Finkel, A.: Reduction and covering of infinite reachability trees. Information and Computation 89(2), 144–179 (1990)
6. Finkel, A.: The minimal coverability graph for Petri nets. In: Advances in Petri Nets. LNCS, vol. 674, pp. 210–243 (1993)
7. Finkel, A., Goubault-Larrecq, J.: Forward analysis for WSTS, part II: Complete WSTS. Logical Methods in Computer Science 8(4), 1–35 (2012)
8. Finkel, A., Geeraerts, G., Raskin, J.F., Van Begin, L.: A counter-example to the minimal coverability tree algorithm. Tech. rep., Université Libre de Bruxelles, Belgium (2005), http://www.lsv.fr/Publis/PAPERS/PDF/FGRV-ulb05.pdf
9. Finkel, A., Haddad, S., Khmelnitsky, I.: Minimal coverability tree construction made complete and efficient (2020), https://hal.inria.fr/hal-02479879
10. Geeraerts, G., Raskin, J.F., Van Begin, L.: On the efficient computation of the minimal coverability set of Petri nets. International Journal of Fundamental Computer Science 21(2), 135–165 (2010)
11. Karp, R.M., Miller, R.E.: Parallel program schemata. J. Comput. Syst. Sci. 3(2), 147–195 (1969)
12. Leroux, J.: Distance between mutually reachable Petri net configurations (Jun 2019), https://hal.archives-ouvertes.fr/hal-02156549, preprint
13.
Piipponen, A., Valmari, A.: Constructing minimal coverability sets. Fundamenta Informaticae 143(3–4), 393–414 (2016)
14. Reynier, P.A., Servais, F.: Minimal coverability set for Petri nets: Karp and Miller algorithm with pruning. Fundamenta Informaticae 122(1–2), 1–30 (2013)
15. Reynier, P.A., Servais, F.: On the computation of the minimal coverability set of Petri nets. In: Proceedings of Reachability Problems 2019. LNCS, vol. 11674, pp. 164–177 (2019)
16. Valmari, A., Hansen, H.: Old and new algorithms for minimal coverability sets. Fundamenta Informaticae 131(1), 1–25 (2014)

Open Access. This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Constructing Infinitary Quotient-Inductive Types

Marcelo P. Fiore, Andrew M. Pitts, and S. C. Steenkamp
Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, UK
s.c.steenkamp@cl.cam.ac.uk

Abstract. This paper introduces an expressive class of quotient-inductive types, called QW-types.
We show that in dependent type theory with uniqueness of identity proofs, even the infinitary case of QW-types can be encoded using the combination of inductive-inductive definitions involving strictly positive occurrences of Hofmann-style quotient types, and Abel's sized types. The latter, which provide a convenient constructive abstraction of what classically would be accomplished with transfinite ordinals, are used to prove termination of the recursive definitions of the elimination and computation properties of our encoding of QW-types. The development is formalized using the Agda theorem prover.

Keywords: dependent type theory · higher inductive types · inductive-inductive definitions · quotient types · sized types · category theory

1 Introduction

One of the key features of proof assistants based on dependent type theory, such as Agda, Coq and Lean, is their support for inductive definitions of families of types. Homotopy Type Theory [29] introduces a potentially very useful extension of the notion of inductive definition, the higher inductive types (HITs). To define an ordinary inductive type one declares how its elements are constructed. To define a HIT one not only declares element constructors, but also declares equality constructors in identity types (possibly iterated ones), specifying how the constructed elements and identities are to be equated. In this paper we work in a dependent type theory satisfying uniqueness of identity proofs (UIP), so that identity types are trivial in dimensions higher than one. Nevertheless, as Altenkirch and Kaposi [5] point out, HITs are still useful in such a one-dimensional setting. They introduce the term quotient inductive type (QIT) for this truncated form of HIT.

Figure 1 gives two examples of QITs, using Agda-style notation for dependent type theory; in particular, Set denotes a universe of types and ≡ denotes the identity type.
The first example specifies the element and equality constructors for the type Bag X of finite multisets of elements from a type X. The second example, adapted from [5], specifies the element and equality constructors for the type ωTree X of trees whose nodes are labelled with elements of X and that have unordered countably infinite branching.

```agda
-- Finite multisets:
data Bag (X : Set) : Set where
  []   : Bag X
  _::_ : X → Bag X → Bag X
  swap : (x y : X)(ys : Bag X) → x :: y :: ys ≡ y :: x :: ys

-- Unordered countably branching trees
-- (elements of isIso f witness that f is a bijection):
data ωTree (X : Set) : Set where
  leaf : ωTree X
  node : X → (N → ωTree X) → ωTree X
  perm : (x : X)(f : N → N)(_ : isIso f)(g : N → ωTree X) →
         node x g ≡ node x (g ∘ f)
```

Figure 1. Two examples of QITs

Both examples illustrate the nice feature of QITs that users only have to specify the particular identifications between data needed for their applications. Thus the standard property of equality, that it is an equivalence relation respecting the constructors, is inherited by construction from the usual properties of identity types, without the need to say so in the declaration of the QIT. The second example also illustrates a more technical aspect of QITs: they enable constructive versions of structures that classically use non-constructive choice principles. The first example in Figure 1 only involves element constructors of finite arity ([] is nullary and x :: _ is unary) and consequently Bag X is isomorphic to the type obtained from the ordinary inductive type of finite lists over X by quotienting by the congruence generated by swap. Of course, this assumes, as we do in this paper, that the type theory comes with Hofmann-style quotient types [18, Section 3.2.6.1].

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 257–276, 2020. https://doi.org/10.1007/978-3-030-45231-5_14
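Because both element constructors here are finitary, this quotient can be computed without any choice principle. As a toy illustration (a Python sketch of ours, not from the paper), one can realise each swap-equivalence class of lists by a sorted canonical representative, on which the constructors are well defined:

```python
# Hypothetical illustration: Bag X as lists modulo swap, with each
# equivalence class represented by its sorted normal form.

def nil():
    # The empty multiset.
    return ()

def cons(x, ys):
    # Insert x while keeping the representative sorted, so that the
    # identification  x :: y :: ys == y :: x :: ys  holds on the nose.
    return tuple(sorted((x,) + ys))
```

With this representation the swap equation is literally an equality of representatives, which is exactly what quotienting by the congruence generated by swap achieves in the finitary case.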
By contrast, the second example in the figure involves an element constructor with countably infinite arity. So if one first forms the ordinary inductive type of ordered countably branching trees (by dropping the equality constructor perm from the declaration) and then quotients by a suitable relation to get the equalities specified by perm, one needs the axiom of countable choice to be able to lift the node element constructor to the quotient; see [5, Section 2.2] for a detailed discussion. The construction of the Cauchy reals as a higher inductive-inductive type [29, Section 11.3] provides a similar, but more complicated, example where use of countable choice is avoided. Such examples have led to the folklore that, as far as constructive type theories go, infinitary QITs are more expressive than the combination of ordinary inductive (or inductive-recursive, or inductive-inductive) types with quotient types. In this paper we use Abel's sized types [2] to show that, for a wide class of QITs, this view is not justified. Thus we make two main contributions:

First, we define a family of QITs called QW-types and give elimination and computation rules for them (Section 2). The usual W-types of Martin-Löf [22] are inductive types giving the algebraic terms over a possibly infinitary signature. One specifies a QW-type by giving a family of equations between such terms. So such QITs give initial algebras for possibly infinitary algebraic theories. As we indicate in Section 3, they can encode a very wide range of examples of possibly infinitary quotient-inductive types, namely those that do not involve constructors taking previously constructed equalities as arguments (so they do not cover the infinitary extension of the very general scheme considered by Dybjer and Moeneclaey [12]). In set theory with the Axiom of Choice (AC), QW-types can be constructed simply as Quotients of the underlying W-type, hence the name.
Secondly, we prove that, contrary to expectation, without AC it is still possible to construct QW-types using quotients, but not simply by quotienting a W-type. Instead, the type to be quotiented and the relation by which to quotient are given simultaneously, by definitions that refer to each other. Thus our construction (in Section 4) involves inductive-inductive definitions [15]. The elimination and computation functions which witness that the quotiented type correctly represents the required QW-type are defined recursively. In order to prove that our recursive definitions terminate, we combine the use of inductive definitions involving strictly positive occurrences of quotient types with sized types (currently, we do not know whether it is possible to avoid sizing in favour of, say, a suitable well-founded termination ordering). Sized types provide a convenient constructive abstraction of what classically would be accomplished with sequences of transfinite ordinal length.

The type theory in which we work. To present our results we need a version of Martin-Löf Type Theory with (1) uniqueness of identity proofs, (2) quotient types and hence also function extensionality, (3) inductive-inductive datatypes (with strictly positive occurrences of quotient types), and (4) sized types. Lean 3 provides (1) and (2) out of the box, but unfortunately also the Axiom of Choice. Neither it nor Coq provides (3) and (4). Agda provides (1) via unrestricted dependent pattern-matching, (2) via a combination of postulates and the rewriting mechanism of Cockx and Abel [8], (3) via its very liberal mechanism for mutual definitions, and (4) thanks to the work of Abel [2]. Therefore we make use of the type theory implemented by Agda (version 2.6.0.1) to give formal proofs of our results. The Agda code can be found at doi: 10.17863/CAM.48187. In this paper we describe the results informally, using Agda-style notation for dependent type theory.
In particular, we use Set to denote the universe at the lowest level of a countable hierarchy of (Russell-style) universes. We also use Agda's convention that an implicit argument of an operation can be made explicit by enclosing it in {braces}.

Acknowledgement. We would like to acknowledge the contribution Ian Orton made to the initial development of the work described here. He and the first author supervised the third author's Master's dissertation, Quotient Inductive Types: A Schema, Encoding and Interpretation, in which the notion of QW-type (there called a W-type) was introduced.

2 QW-types

We begin by recalling some facts about types of well-founded trees, the W-types of Martin-Löf [22]. We take signatures to be elements of the dependent product

    Sig = Σ A : Set, (A → Set)    (1)

So a signature is given by a pair Σ = (A, B) consisting of a type A : Set and a family of types B : A → Set. Each such signature determines a polynomial endofunctor [1, 16] S{Σ} : Set → Set whose value at X : Set is the following dependent product

    S{Σ} X = Σ a : A, (B a → X)    (2)

An S-algebra is by definition an element of the dependent product

    Alg{Σ} = Σ X : Set, (S X → X)    (3)

S-algebra morphisms (X, s) → (X′, s′) are given by functions h : X → X′ together with an element of the type

    isHom h = (a : A)(b : B a → X) → s′(a, h ∘ b) ≡ h(s(a, b))    (4)

Then the W-type W{Σ} determined by Σ is the underlying type of an initial S-algebra. More generally, Dybjer [11] shows that the initial algebra of any non-nested, strictly positive endofunctor on Set is given by a W-type; and Abbott, Altenkirch, and Ghani [1] extend this to the case with nested uses of W-types as part of their work on containers. (These proofs take place in extensional type theory [22], but work just as well in the intensional type theory with uniqueness of identity proofs and function extensionality that we are using here.)
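To make the initial-algebra reading concrete, the following Python sketch (ours, restricted to finitary arities) renders closed terms over a signature and the unique algebra morphism out of them; the particular signature and algebra shown are illustrative assumptions:

```python
# Hypothetical finitary rendering of W{Sigma}: a signature assigns each
# operation name an arity; closed terms are (op, [subterms]) pairs.

ARITY = {"zero": 0, "succ": 1, "add": 2}   # an example signature (A, B)

def w(op, *args):
    # Smart constructor for well-founded trees over the signature.
    assert len(args) == ARITY[op], "wrong arity"
    return (op, list(args))

def fold(alg, t):
    """The unique S-algebra morphism out of the initial algebra:
    replace each operation by its interpretation, bottom-up."""
    op, args = t
    return alg(op, [fold(alg, a) for a in args])

def int_alg(op, xs):
    # An example S-algebra on the integers.
    if op == "zero":
        return 0
    if op == "succ":
        return xs[0] + 1
    return xs[0] + xs[1]          # "add"
```

Initiality corresponds to the fact that fold is the only function defined by this structural recursion that commutes with the constructors.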
More concretely, given a signature Σ = (A, B), if one thinks of elements a : A as names of operation symbols whose (not necessarily finite) arity is given by the type B a : Set, then the elements of W{Σ} represent the closed algebraic terms (i.e. well-founded trees) over the signature. From this point of view it is natural to consider not only closed terms solely built up from operations, but also open terms additionally built up with variables drawn from some type X. As well as allowing operators of possibly infinite arity, we also allow terms involving possibly infinitely many variables (the second example in Figure 1 involves such terms). Categorically, the type T{Σ} X of such open terms is the free S-algebra on X and is another W-type, for the signature obtained from Σ by adding the elements of X as nullary operations. Nevertheless, it is convenient to give a direct inductive definition:

    data T {Σ : Sig} (X : Set) : Set where
      η : X → T X                    (5)
      σ : S(T X) → T X

Given an S-algebra (Y, s) : Alg{Σ} and a function f : X → Y, the unique morphism of S-algebras from the free S-algebra (T X, σ) on X to (Y, s) has underlying function T X → Y mapping each t : T X to the element t >>= f in Y defined by recursion on the structure of t:

    η x >>= f = f x    (6)
    σ(a, b) >>= f = s(a, λ x → b x >>= f)

As the notation suggests, >>= is the Kleisli lifting operation ("bind") for a monad structure on T; indeed, T is the free monad on the endofunctor S. The notion of "QW-type" that we introduce in this section is obtained from that of W-type by considering not only the algebraic terms over a given signature, but also equations between terms. To code equations we use a type-theoretic rendering of a categorical notion of equational system introduced by Fiore and Hur, referred to as term equational system [14, Section 2] and as monadic equational system [13, Section 5], here instantiated to free monads on signatures.

Definition 1.
A system of equations over a signature Σ : Sig is specified by

– a type E : Set (whose elements e : E name the equations);
– a family of types V : E → Set (V e : Set contains the variables used in the equation named e : E);
– for each e : E, elements l e and r e of type T(V e), the free S-algebra on V e (the terms with variables from V e that are equated by the equation named e).

Thus a system of equations over Σ is an element of the dependent product

    Syseq{Σ} = Σ E : Set, Σ V : (E → Set), ((e : E) → T(V e)) × ((e : E) → T(V e))    (7)

An S{Σ}-algebra S X → X satisfies the system of equations ε = (E, V, l, r) : Syseq{Σ} if there is an element of type

    Sat{ε} X = (e : E)(ρ : V e → X) → ((l e) >>= ρ) ≡ ((r e) >>= ρ)    (8)

The category-theoretic view of QW-types is that they are simply S-algebras that are initial among those satisfying a given system of equations:

Definition 2. A QW-type for a signature Σ = (A, B) : Sig and system of equations ε = (E, V, l, r) : Syseq{Σ} is given by a type QW{Σ}{ε} : Set equipped with an S-algebra structure and a proof that it satisfies the equations

    qwintro : S(QW) → QW    (9)
    qwequ : Sat{ε}(QW)    (10)

together with functions that witness that it is the initial such algebra:

    qwrec : (X : Set)(s : S X → X) → Sat X → QW → X    (11)
    qwrechom : (X : Set)(s : S X → X)(p : Sat X) → isHom (qwrec X s p)    (12)
    qwuniq : (X : Set)(s : S X → X)(p : Sat X)(f : QW → X) → isHom f → qwrec X s p ≡ f    (13)

(Note that the definition of >>= depends on the S-algebra structure s; in Agda we use instance arguments to hide this dependence.)

Given the definitions of S{Σ} in (2) and Sat{ε} in (8), properties (9) and (10) suggest that a QW-type is an instance of the notion of quotient-inductive type [5], with element constructor qwintro and equality constructor qwequ. For this to be so, QW{Σ}{ε} needs to have the requisite dependently-typed elimination and computation properties for these element and equality constructors.
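The open-term monad (5)–(6) and satisfaction (8) can be rendered in the same toy style. Since Sat quantifies over all valuations ρ, the sketch below (ours; finitary, with illustrative names) can only check an equation at finitely many sampled valuations:

```python
# Hypothetical sketch of T{Sigma} X, the Kleisli lifting (6), and a
# sampled version of Sat (8). Open terms are tagged tuples:
# ("eta", x) for a variable, ("sig", op, [subterms]) for an operation.

def eta(x):
    return ("eta", x)

def sig(op, args):
    return ("sig", op, list(args))

def bind(t, f):
    """(6): substitute the term f(x) for each variable x occurring in t."""
    if t[0] == "eta":
        return f(t[1])
    _, op, args = t
    return sig(op, [bind(a, f) for a in args])

def evaluate(alg, env, t):
    # Interpret a term in an S-algebra alg under a valuation env.
    if t[0] == "eta":
        return env[t[1]]
    _, op, args = t
    return alg(op, [evaluate(alg, env, a) for a in args])

def sat_on_samples(alg, eqns, envs):
    """Sampled Sat{eps}: check that both sides of each (l, r) in eqns
    evaluate equally under each of the given finitely many valuations."""
    return all(evaluate(alg, env, l) == evaluate(alg, env, r)
               for (l, r) in eqns for env in envs)
```

For instance, a commutativity equation holds in the addition algebra but fails in the subtraction algebra, matching the intended reading of (8).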
As Proposition 1 below shows, these follow from (11)–(13), because we are working in a type theory with function extensionality (by virtue of assuming quotient types). To state the proposition we need a dependent version of (6). For each

    P : QW → Set    (14)
    p : (a : A)(b : B a → QW) → ((x : B a) → P (b x)) → P (qwintro(a, b))

type X : Set, function f : X → Σ x : QW, P x and term t : T(X), we get an element lift P p f t : P (t >>= (fst ∘ f)) defined by recursion on the structure of t:

    lift P p f (η x) = snd (f x)    (15)
    lift P p f (σ(a, b)) = p a (λ x → b x >>= (fst ∘ f)) (lift P p f ∘ b)

Proposition 1. For a QW-type as in the above definition, given P and p as in (14) and a term of type

    (e : E)(f : V e → Σ x : QW, P x) → lift P p f (l e) ≡≡ lift P p f (r e)    (16)

there are elimination and computation terms:

    qwelim : (x : QW) → P x
    qwcomp : (a : A)(b : B a → QW) → qwelim(qwintro(a, b)) ≡ p a b (qwelim ∘ b)

(Note that (16) uses McBride's heterogeneous equality type [23], which we denote by ≡≡, because lift P p f (l e) and lift P p f (r e) inhabit different types, namely P (l e >>= (fst ∘ f)) and P (r e >>= (fst ∘ f)) respectively.) The proof of the proposition can be found in the accompanying Agda code (doi: 10.17863/CAM.48187).

So QW-types are in particular quotient-inductive types (QITs). Conversely, in the next section we show that a wide range of QITs can be encoded as QW-types. Then in Section 4 we prove:

Theorem 1. In constructive dependent type theory with uniqueness of identity proofs (or equivalently the Axiom K of Streicher [27]) and universes with inductive-inductive datatypes [15] permitting strictly positive occurrences of quotient types [18] and sized types [2], for every signature and system of equations (Definition 1) there is a QW-type as in Definition 2.

(We only establish the computation property up to propositional rather than definitional equality; so, using the terminology of Shulman [25], these are typal quotient-inductive types.)

Remark 1 (Free algebras).
Definition 2 defines QW-types as initial algebras. A corollary of Theorem 1 is that free algebras also exist. In other words, given a signature Σ, a system of equations ε over it and a type X : Set, there is an S-algebra F{Σ}{ε} X, with structure map S{Σ}(F{Σ}{ε} X) → F{Σ}{ε} X, satisfying ε and equipped with a function X → F{Σ}{ε} X, and which is universal among such S-algebras. Thus QW{Σ}{ε} is isomorphic to F{Σ}{ε} ∅, where ∅ is the empty datatype. To see that such free algebras can be constructed as QW-types, given a signature Σ = (A, B), let Σ_X be the signature (X ⊎ A, B_X), where X ⊎ A is the coproduct datatype (with constructors inl : X → X ⊎ A and inr : A → X ⊎ A) and where B_X : X ⊎ A → Set maps each inl x to ∅ and each inr a to B a. Given a system of equations ε = (E, V, l, r), let ε_X be the system (E, V, l_X, r_X) where for each e : E, l_X e = l e >>= η and r_X e = r e >>= η (using η : V e → T{Σ_X}(V e) as in (5) and the S{Σ}-algebra structure s on T{Σ_X}(V e) given by s(a, b) = σ(inr a, b)). Then one can show that the QW-type QW{Σ_X}{ε_X} is the free algebra F{Σ}{ε} X, with the function X → F{Σ}{ε} X sending each x : X to qwintro(inl x, _) : QW{Σ_X}{ε_X}, and the S{Σ}-algebra structure on F{Σ}{ε} X being given by the function sending (a, b) : S(QW{Σ_X}{ε_X}) to qwintro(inr a, b).

Remark 2 (Strictly positive equational systems). A very general, categorical notion of equational system was introduced by Fiore and Hur [14, Section 3]. They regard any endofunctor S : Set → Set as a functorial signature. A functorial term over such a signature is specified by another functorial signature G : Set → Set (the term's context) together with a functor L from S-algebras to G-algebras that commutes with the forgetful functors to Set. Then an equational system is given by a pair of such terms, L and R say, in the same context. An S-algebra s : S X → X satisfies the equational system if L(X, s) and R(X, s) are equal G-algebras.
Taking the strictly positive endofunctors Set → Set to be the smallest collection containing the identity and constant endofunctors and closed under forming dependent products and dependent functions over fixed types, then, as in [11] (and also in the type theory in which we work), up to isomorphism every such endofunctor is of the form S{Σ} for some signature Σ : Sig. If we restrict attention to equational systems L, R in a context G with S and G strictly positive, then it turns out that such equational systems are in bijection with the systems of equations from Definition 1, and the two notions of satisfaction for an algebra coincide in that case. (See our Agda development for a proof of this.) So Dybjer's characterisation of W-types as initial algebras for strictly positive endofunctors generalises to the fact that QW-types are initial among the algebras satisfying strictly positive equational systems in the sense of Fiore and Hur.

3 Quotient-inductive types

Higher inductive types (HITs) are originally motivated by their use in homotopy type theory to construct homotopical cell complexes, such as spheres, tori, and so on [29]. Intuitively, a higher inductive type is an inductive type with point constructors, also allowing for path constructors, surface constructors, etc., which are represented as elements of (iterated) identity types. For example, the sphere is given by the HIT:

    data S² : Set where    (17)
      base : S²
      surf : refl ≡ refl    -- an equality in base ≡ base

In the presence of the UIP axiom we will refer to HITs as quotient inductive types (QITs) [5], since all paths beyond the first level are trivial and any HIT is truncated to an h-set. We use the terms element constructor and equality constructor to refer to the point constructors and the only non-trivial level of path constructors. We believe that QW-types can be used to encode a wide range of QITs: see Conjecture 1 below.
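As an aside, the correspondence noted above between strictly positive endofunctors and signatures can be made concrete in a toy setting. The following Python sketch (ours, finitary only, with illustrative names) witnesses the isomorphism between the functor F X = 1 + X × X and the container S{Σ} X for the signature with a nullary operation "leaf" and a binary operation "node":

```python
# Toy sketch (ours): values of F X = 1 + X * X on one side; container
# pairs (a, b) on the other, with a an operation name and b the tuple of
# arguments (the function B a -> X, which here has finite domain).

def to_container(v):
    # None encodes the left summand of 1 + X * X; a pair (x, y) the right.
    if v is None:
        return ("leaf", ())
    x, y = v
    return ("node", (x, y))

def from_container(c):
    a, b = c
    if a == "leaf":
        return None
    return (b[0], b[1])
```

The two functions are mutually inverse, which is the shape of the bijection between strictly positive functors and signatures in the finitary case.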
As evidence, we give several examples of QITs encoded as QW-types, beginning with the two examples of QITs in Figure 1, giving the corresponding signature (A, B) and system of equations (E, V, l, r) as in Definition 2.

Example 1 (Finite multisets). The element constructors for finite multisets are encoded exactly as with a W-type: the constructors are [] and x :: _ for each x : X. So we take A to be 1 ⊎ X, the coproduct of the unit type 1 (whose single constructor is denoted tt) with X. The arity of [] is zero, and the arity of each x :: _ is one, represented by the empty type ∅ and the unit type 1 respectively; so we take B : A → Set to be the function [λ_ → ∅ | λ_ → 1] : 1 ⊎ X → Set mapping inl tt to ∅ and each inr x to 1. The swap equality constructor is parameterised by elements of E = X × X. For each (x, y) : E, swap x y yields an equation involving a single free variable (called ys : Bag X in Figure 1); so we take V : E → Set to be λ_ → 1. Each side of the equation named by swap x y is coded by an element of T{Σ}(V(x, y)) = T{Σ}(1). Recalling the definition of T from (5), the single free variable corresponds to η tt : T{Σ}(1), and then the left-hand side of the equation is σ(inr x, (λ_ → σ(inr y, (λ_ → η tt)))) and the right-hand side is σ(inr y, (λ_ → σ(inr x, (λ_ → η tt)))). So, altogether, the signature and system of equations for the QW-type corresponding to the first example in Figure 1 is:

    A = 1 ⊎ X                 E = X × X
    B = [λ_ → ∅ | λ_ → 1]     V = λ_ → 1
    l = λ(x, y) → σ(inr x, (λ_ → σ(inr y, (λ_ → η tt))))
    r = λ(x, y) → σ(inr y, (λ_ → σ(inr x, (λ_ → η tt))))

(The subscript on ≡ will be treated as an implicit argument and omitted when clear.)

Example 2 (Unordered countably-branching trees). Here the element constructors are leaf of arity zero and, for each x : X, node x of arity N. So we use the signature with A = 1 ⊎ X and B = [λ_ → ∅ | λ_ → N].
The perm equality constructor is parameterised by elements of E = X × (Σ f : (N → N), isIso f). For each element (x, f, i) of that type, perm x f i yields an equation involving an N-indexed family of variables (called g : N → ωTree X in Figure 1); so we take V : E → Set to be λ_ → N. Each side of the equation named by perm x f i is coded by an element of T{Σ}(V(x, f, i)) = T{Σ}(N). The N-indexed family of variables is represented by the function η : N → T{Σ}(N) and its permuted version by η ∘ f. Thus the left- and right-hand sides of the equation named by perm x f i are coded respectively by the elements σ(inr x, η) and σ(inr x, η ∘ f) of T{Σ}(N). So, altogether, the signature and system of equations for the QW-type corresponding to the second example in Figure 1 is:

    A = 1 ⊎ X                 E = X × (Σ f : (N → N), isIso f)
    B = [λ_ → ∅ | λ_ → N]     V = λ_ → N
    l = λ(x, _, _) → σ(inr x, η)
    r = λ(x, f, _) → σ(inr x, η ∘ f)

That unordered countably-branching trees are a QW-type is significant, since no previous work on various subclasses of QITs (or indeed QIITs [19, 10]) supports infinitary QITs [6, 26, 28, 12, 19, 10]. See Example 5 for another, more substantial infinitary QW-type. So this extension represents one of our main contributions. QW-types generalise prior developments; the internal encodings for particular subclasses of 1-HITs given by Sojakova [26] and Swan [28] are direct instances of QW-types, as the next two examples show.

Example 3. W-suspensions [26] are an instance of QW-types. The data for a W-suspension is: A′, C′ : Set, a type family B′ : A′ → Set and functions l′, r′ : C′ → A′. The equivalent QW-type is:

    A = A′    E = C′                               l = λ c → σ(l′ c, η)
    B = B′    V = λ c → (B′(l′ c)) × (B′(r′ c))    r = λ c → σ(r′ c, η)

Example 4. The non-indexed case of W-types with reductions [28] are QW-types. The data of such a type is: Y : Set, X : Y → Set and a reindexing map R : (y : Y) → X y. The reindexing map identifies a term σ(y, α) with the subterm α(R y) used to construct it.
The equivalent QW-type is given by:

    A = Y    E = Y    l = λ y → σ(y, η)
    B = X    V = X    r = λ y → η(R y)

Example 5. Lumsdaine and Shulman [21, Section 9] give an example of a HIT not constructible in type theory from only pushouts and N. Their HIT F can be thought of as a set of notations for countable ordinals. It consists of three point constructors, 0 : F, S : F → F, and sup : (N → F) → F, and five path constructors, which are omitted here for brevity. It is inspired by the infinitary algebraic theory of Blass [7, Section 9] and hence it is not surprising that it can be encoded by a QW-type; the details can be found in our Agda code.

3.1 General QIT schemas

Basold, Geuvers, and van der Weide [6] present a schema (though not a model) for infinitary QITs that do not support conditional path equations. Constructors are defined by arbitrary polynomial endofunctors built up using (non-dependent) products and sums, which means in particular that parameters and arguments can occur in any order. They require constructors to be in uncurried form. Dybjer and Moeneclaey [12, Sections 3.1 and 3.2] present a schema for finitary QITs that supports conditional path equations, where constructors are allowed to take inductive arguments not just of the datatype being declared, but also of its identity type. This schema can be generalised to infinitary QITs with conditional path equations. We believe this extension of their schema to be the most general schema for QITs. The schema requires all parameters to appear before all arguments, whereas the schema for regular inductive types in Agda is more flexible, allowing parameters and arguments in any order. We wish to combine the schema for infinitary QITs of Basold, Geuvers, and van der Weide [6] with the schema for QITs with conditional path equations of Dybjer and Moeneclaey [12] to provide a general schema.
Moreover, we would like to combine the arbitrarily ordered parameters and arguments of the former with the curried constructors of the latter in order to support flexible pattern matching. For consistency with the definition of inductive types in Agda [9, equation (25) and figure 1] we will define strictly positive (i.e. polynomial) endofunctors in terms of strictly positive telescopes. A telescope is given by the grammar:

    Δ ::= ()            empty telescope    (18)
        | (x : A)Δ      non-empty telescope, where x ∉ dom(Δ)

A telescope extension (x : A)Δ binds (free) occurrences of x inside the tail Δ. The type A may contain free variables that are later bound by further telescope extensions on the left. A telescope can also exist in a context which binds any free variables not already bound in the telescope. Such a context is implicit in the following definitions. A function type Δ → C from a telescope Δ to a type C is defined as an iterated dependent function type by:

    () → C ≝ C    (19)
    (x : A)Δ → C ≝ (x : A) → (Δ → C)

A strictly positive endofunctor on a variable Y is presented by a strictly positive telescope

    Δ = (x₁ : Φ₁(Y))(x₂ : Φ₂(Y)) ⋯ (xₙ : Φₙ(Y))    (20)

where each type scheme Φ is described by an expression on Y made up of Π-types, Σ-types, and any (previously defined "constant") types A not containing Y, according to the grammar:

    Φ(Y), Ψ(Y) ::= (y : A) → Φ(Y) | Σ p : Φ(Y), Ψ(Y) | A | Y    (21)

For example, Δ ≝ (x : X)(f : N → Y) is the strictly positive telescope for the node constructor in Figure 1. In this instance, reordering x and f is permitted by exchange. Note that the variable Y can never appear in the argument position of a Π-type. Now it is possible to define the form of the endpoints of an equality (within the context of a strictly positive telescope), corresponding to the notion of an abstract syntax tree with free variables.
With this intuition in mind, we can take the definition in Dybjer and Moeneclaey's presentation [12] of endpoints given by point constructor patterns:

    l, r, p ::= c k | y    (22)

where y : Y is in the context of the telescope for the equality constructor, and k is a term built without any rule for Y, but which may use other point constructor patterns p : Y. (That is, any sub-term of type Y must either be a variable y : Y found in the telescope, or a constructor for Y applied to further point constructor patterns and earlier defined constants. It could not, for instance, use the function application rule for Y with some function g : M → Y, not least since such functions cannot be defined before defining Y.) Note that this exactly matches the type T in (5). Basold, Geuvers, and van der Weide's presentation has a slightly more general notion of constructor term [6, Definition 6] (Dybjer and Moeneclaey's presentation [12] has more restricted telescopes). It is defined by rules which operate in the context of a strictly positive (polynomial) telescope and permit use of its bound variables, and the use of constructors c, but not any other rules for Y. We take the dependent form of their rules for products and functions. Note that these rules do not allow the use of terms of type ≡ in the endpoints.

As with inductive types, the element constructors of QITs are specified by strictly positive telescopes. The equality constructors also permit conditions to appear in strictly positive positions, where l and r are constructor terms according to grammar (22):

    Φ(Y), Ψ(Y) ::= (same grammar as in (21)) | l ≡_Y r    (23)

Definition 3.
A QIT is defined by a list of named element constructors and equality constructors:

    data Y : Set where
      c₁ : Δ₁ → Y
      ⋮
      cₙ : Δₙ → Y
      p₁ : Θ₁ → l₁ ≡_Y r₁
      ⋮
      pₘ : Θₘ → lₘ ≡_Y rₘ

where the Δᵢ are strictly positive telescopes on Y according to (21), and the Θⱼ are strictly positive telescopes on Y and ≡ in which conditions may also occur in strictly positive positions according to (23).

QITs without equality constructors are inductive types. If none of the equality constructors contain Y in an argument position then the QIT is called non-recursive; otherwise it is called recursive [6]. If none of the equality constructors contain an equality in Y then we call it a non-conditional, or equational, QIT; otherwise it is called a conditional [12], or quasi-equational, QIT. If all of the constant types A in any of the constructors are finite (isomorphic to Fin n for some n : N) then it is called a finitary QIT [12]; otherwise, it is called a generalised [12], or infinitary, QIT.

We are not aware of any existing examples in the literature of HITs which allow the point constructors to be conditional (though it is not difficult to imagine them), nor of any schemes for HITs that allow such definitions. However, we do believe this is worth investigating further.

Conjecture 1. Any equational QIT can be encoded as a QW-type.

We believe this can be proved analogously to the approach of Dybjer [11] for inductive types, though the endpoints still need to be considered and we have not yet translated the schema in Definition 3 into Agda.

Remark 3. Assuming Conjecture 1, Basold, Geuvers, and van der Weide's schema [6], being an equational (non-conditional) instance of Definition 3, can be encoded as a QW-type.

4 Construction of QW-types

In Section 2 we defined a QW-type to be initial among algebras over a given (possibly infinitary) signature satisfying a given system of equations (Definition 2).
If one interprets these notions in classical Zermelo-Fraenkel set theory with the Axiom of Choice (ZFC), one regains the usual notion from universal algebra of initial algebras for infinitary equational theories. Since in the set-theoretic interpretation there is an upper bound on the cardinality of the arities of operators in a given signature Σ, the ordinal-indexed sequence Sᵅ(∅) of iterations of the functor in (2), starting from the empty set, eventually becomes stationary; and so the sequence has a small colimit, namely the set W{Σ} of well-founded trees over Σ. A system of equations ε (Definition 1) over Σ generates a Σ-congruence relation ∼ on W{Σ}. The quotient set W{Σ}/∼ yields the desired initial algebra for (Σ, ε) provided the S-algebra structure on W{Σ} induces one on the quotient set. It does so because, for each operator, using AC one can pick representatives of the (possibly infinitely many) equivalence classes that are the arguments of the operator, apply the interpretation of the operator in W{Σ}, and then take the equivalence class of the result. So the set-theoretic model of type theory in ZFC models QW-types.

Is this use of choice really necessary? Blass [7, Section 9] shows that if one drops AC and just works in ZF then, provided a certain large cardinal axiom is consistent with ZFC, it is consistent with ZF that there is an infinitary equational theory with no initial algebra. He shows this by first exhibiting a countably presented equational theory whose initial algebra has to have the size of an uncountable regular cardinal; and secondly by appealing to the construction by Gitik [17] of a model of ZF with no uncountable regular cardinals (assuming a certain large cardinal axiom). Lumsdaine and Shulman [21] turn the infinitary equational theory of Blass into a higher-inductive type that cannot be proved to exist in ZF (and hence cannot be constructed in type theory just using pushouts and the natural numbers).
We noted in Example 5 that this higher inductive type can be presented as a QW-type. So one cannot hope to construct QW-types using a type theory which is interpretable in just ZF. However, the type theory in which we work, with its universes closed under inductive-inductive definitions, already requires going beyond ZF to be able to give it a naive, classical set-theoretic interpretation (by assuming the existence of enough strongly inaccessible cardinals, for example). So the above considerations about initial algebras for infinitary equational theories in classical set theory do not rule out the construction of QW-types in the type theory in which we work. However, something more than just quotienting a W-type is needed in order to prove Theorem 1.

Figure 2 gives a first attempt to do this (which later we will modify using sized types to get around a termination problem). The definition is relative to a given signature Σ : Sig and system of equations ε = (E, V, l, r) : Syseq Σ. It makes use of quotient types, which we add to Agda via postulates, as shown in Figure 3. (The actual implementation is polymorphic in universe levels, but for simplicity here we just give the level-zero version.) The REWRITE pragma makes elim R B f e (mk R x) definitionally equal to f x and is not merely a computational convenience: this is what allows function extensionality to be proved from these postulated quotient types. The POLARITY pragmas enable the postulated quotients to be used in datatype declarations at positions that Agda deems to be strictly positive, a case in point being the definitions of Q₀ and Q₁ in Figure 2. Agda's test for strict positivity is sound with respect to a set-theoretic semantics of inductively defined datatypes that are built up using strictly positive uses of dependent functions; the semantics of such datatypes uses initial algebras for endofunctors possessing a rank.
mutual
  data Q₀ : Set where
    sq : T Q → Q₀
  data Q₁ : Q₀ → Q₀ → Set where
    sqeq : (e : E)(ρ : V e → Q) → Q₁ (sq (T' ρ (l e))) (sq (T' ρ (r e)))
    sqη  : (x : Q₀) → Q₁ (sq (η (qu x))) x
    sqσ  : (s : S (T Q)) → Q₁ (sq (σ s)) (sq (ι (S' (qu ◦ sq) s)))
  Q : Set
  Q = Q₀ / Q₁
  qu : Q₀ → Q
  qu = quot.mk Q₁
QW{Σ}{ε} = Q

Figure 2. First attempt at constructing QW-types

are allowing the inductively defined datatypes to be built up using quotients as well, but this is semantically unproblematic, since quotienting does not increase rank. (Later we need to combine the use of POLARITY with sized types; the semantics of this has been studied for System F [3], but needs to be explored further for Agda.)

We build up the underlying inductive type Q₀ to be quotiented using a constructor sq that takes well-founded trees T(Q₀/Q₁) of whole equivalence classes with respect to a relation Q₁ that is mutually inductively defined with Q₀, an instance of an inductive-inductive definition [15]. The definition of Q₁ makes use of the actions on functions of the signature endofunctor S and its associated free monad T (Section 2); those actions are defined as follows:

  S' : {X Y : Set} → (X → Y) → S X → S Y          (24)
  S' f (a, b) = (a, f ◦ b)

  T' : {X Y : Set} → (X → Y) → T X → T Y          (25)
  T' f t = t >>= (η ◦ f)

The definition of Q₁ also uses the natural transformation ι : {X : Set} → S X → T X defined by ι = σ ◦ S' η.

Turning to the proof of Theorem 1 using the definitions in Figure 2, the S-algebra structure (9) is easy to define without using any form of choice, because of the type of Q₀'s constructor sq. Indeed, we can just take qwintro to be qu ◦ sq ◦ ι : S(QW) → QW. (The use of the free monad T{Σ} in the domain of sq, rather than just S{Σ}, seems necessary in order to define Q₁ with the properties needed for (10)–(13).) The first constructor sqeq of the data type Q₁ ensures that the quotient Q₀/Q₁ satisfies the equations in ε, so that we get qwequ as in (10); and the other two constructors, sqη and sqσ, make identifications that
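For intuition, the functorial actions S' and T' and the natural transformation ι have straightforward finitary analogues. The following Python sketch uses illustrative encodings, not the paper's Agda development: S X is an operator name paired with a list of arguments, and T X is the type of well-founded trees over the signature, with η as 'var' nodes and σ as 'node' constructors.

```python
# S X ~ pairs (a, args): an operator a applied to a list of arguments in X.
# T X ~ well-founded trees: ('var', x) is the unit η, ('node', a, ts) is σ.

def S_map(f, s):                       # the action S' of S on functions
    a, args = s
    return (a, [f(b) for b in args])

def eta(x):                            # unit of the free monad T
    return ('var', x)

def bind(t, f):                        # t >>= f : substitute trees for variables
    if t[0] == 'var':
        return f(t[1])
    _, a, subs = t
    return ('node', a, [bind(u, f) for u in subs])

def T_map(f, t):                       # T' f t = t >>= (eta . f)
    return bind(t, lambda x: eta(f(x)))

def iota(s):                           # iota = sigma . S' eta : S X -> T X
    a, args = s
    return ('node', a, [eta(x) for x in args])
```

For instance, `iota(('m', [1, 2]))` builds the depth-one tree with operator `'m'` applied to the variables 1 and 2, and `T_map` relabels the variables at the leaves.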
module quot where
  postulate
    ty   : {A : Set}(R : A → A → Set) → Set
    mk   : {A : Set}(R : A → A → Set) → A → ty R
    eq   : {A : Set}(R : A → A → Set){x y : A} → R x y → mk R x ≡ mk R y
    elim : {A : Set}(R : A → A → Set)(B : ty R → Set)(f : (x : A) → B (mk R x))
           (e : {x y : A} → R x y → f x ≡≡ f y)(z : ty R) → B z
    comp : {A : Set}(R : A → A → Set)(B : ty R → Set)(f : (x : A) → B (mk R x))
           (e : {x y : A} → R x y → f x ≡≡ f y)(x : A) → elim R B f e (mk R x) ≡ f x
  {-# REWRITE comp #-}
  {-# POLARITY ty ++ ++ #-}
  {-# POLARITY mk _ _ * #-}

_/_ : (A : Set)(R : A → A → Set) → Set
A / R = quot.ty R

Figure 3. Quotient types

enable the construction of functions qwrec, qwrechom and qwuniq as in (11)–(13). However, there is a problem. Given X : Set, s : S X → X and e : Sat X, for qwrec X s e we have to construct a function r : Q → X. Since Q = Q₀/Q₁ is a quotient, we will have to use the eliminator quot.elim from Figure 3 to define r. The following is an obvious candidate definition:

mutual                                                  (26)
  r : Q → X
  r = quot.elim Q₁ (λ _ → X) r₀ r₁
  r₀ : Q₀ → X
  r₀ (sq t) = t >>= r
  r₁ : {x y : Q₀} → Q₁ x y → r₀ x ≡ r₀ y
  r₁ = ⋯

(where we have elided the details of the invariance proof r₁). The problem with this mutually recursive definition is that it is not clear to us (and certainly not to Agda) whether it gives totally defined functions: although the value of r₀ at a typical element sq t is explained in terms of the structurally smaller element t, the explanation involves r, whose definition uses the whole function r₀ rather than some application of it at a structurally smaller argument. Agda's termination checker rejects the definition. We get around this problem by using a type-based termination method, namely Agda's implementation of sized types [2].
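The behaviour postulated in Figure 3 can be emulated for finite carriers in Python. This is an illustrative model, not Agda: the quotient type is the set of equivalence classes, `mk` sends an element to its class, and `elim` applies the given function to any member after checking that it respects the relation (so the computation rule `comp` holds on the nose).

```python
def quot_ty(A, R):
    # the carrier of A/R: equivalence classes as frozensets
    return {frozenset(b for b in A if R(a, b)) for a in A}

def quot_mk(A, R, a):
    # mk R a : the class of a
    return frozenset(b for b in A if R(a, b))

def quot_elim(R, f, c):
    # elim R B f e c: in this model the "proof" e that f respects R
    # is replaced by a runtime check over the (finite) class c
    assert all(f(x) == f(y) for x in c for y in c if R(x, y))
    return f(next(iter(c)))
```

For example, with `A = range(6)` and `R` relating numbers equal mod 3, `quot_ty` yields three classes, and `quot_elim` of a mod-3-invariant function on any class returns its common value.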
Intuitively, this provides a type Size of "sizes" which give a constructive abstraction of features of ordinals in ZF when they are used to index sequences of sets that eventually become stationary, such as in various transfinite constructions of free algebras [20, 14]. In Agda, the type Size comes equipped with various relations and functions: given sizes i, j : Size, there is a relation i : Size< j to indicate strictly increasing size (so the type Size< j is treated as a subtype of Size); there is a successor operation ↑ : Size → Size (and also a join operation _⊔_ : Size → Size → Size, but we do not need it here); and a size ∞ : Size to indicate where a sequence becomes stationary. Thus we construct the QW-type QW{Σ}{ε} as Q ∞ for a suitable size-indexed sequence of types Q : Size → Set, shown in Figure 4.

mutual
  data Q₀ (i : Size) : Set where
    sq : {j : Size< i} → T (Q j) → Q₀ i
  data Q₁ (i : Size) : Q₀ i → Q₀ i → Set where
    sqeq : {j : Size< i}(e : E)(ρ : V e → Q j)
           → Q₁ i (sq (T' ρ (l e))) (sq (T' ρ (r e)))
    sqη  : {j : Size< i}(x : Q₀ j) → Q₁ i (sq (η (qu j x))) (φ i x)
    sqσ  : {j : Size< i}{k : Size< j}(s : S (T (Q k)))
           → Q₁ i (sq (σ s)) (sq (ι (S' (qu j ◦ sq) s)))
  Q : Size → Set
  Q i = (Q₀ i) / (Q₁ i)
  qu : (i : Size) → Q₀ i → Q i
  qu i = quot.mk (Q₁ i)
  φ : (i : Size){j : Size< i} → Q₀ j → Q₀ i
  φ i (sq z) = sq z
QW{Σ}{ε} = Q ∞

Figure 4. Construction of QW-types using sized types

For each size i : Size, the type Q i is a quotient Q₀ i / Q₁ i, where the constructors of the data types Q₀ i and Q₁ i take arguments of smaller sizes j : Size< i. Consequently, in the following sized version of (26)

mutual                                                  (27)
  r : {i : Size} → Q i → X
  r {i} = quot.elim (Q₁ i) (λ _ → X) (r₀ {i}) (r₁ {i})
  r₀ : {i : Size} → Q₀ i → X
  r₀ {i} (sq {j} t) = t >>= r {j}
  r₁ : {i : Size}{x y : Q₀ i} → Q₁ i x y → r₀ x ≡ r₀ y
  r₁ = ⋯

the definition of r₀ {i} involves a recursive call via r to the whole function r₀, but at a size j which is smaller than i.
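The role of the size index can be mimicked with an explicit numeric bound. The toy Python analogue below is hypothetical and elides the quotient structure; its point is that every recursive call happens at a strictly smaller index, so the mutual definition is evidently total even though `r0` calls the whole function `r` rather than applying it to a structurally smaller argument only.

```python
# Depth-bounded trees: ('leaf', x) or ('node', children). A tree admitted at
# index i only contains subtrees at indices j < i, mirroring
#   sq : {j : Size< i} → T (Q j) → Q₀ i

def r(i, t, f, combine):
    # 'r {i}' delegates to 'r0 {i}'; in the Agda development this step
    # additionally factors through the quotient eliminator.
    return r0(i, t, f, combine)

def r0(i, t, f, combine):
    if t[0] == 'leaf':
        return f(t[1])
    # each child lives at some index j < i, so the recursive call through
    # the whole function r is at a strictly smaller index
    j = i - 1
    return combine([r(j, u, f, combine) for u in t[1]])
```

With a bound of 3, summing the tree `('node', [('leaf', 1), ('node', [('leaf', 2)])])` proceeds through indices 3, 2, 1 and terminates.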
So now Agda accepts that the definition of qwrec X s e as r {∞}, with r as in (27), is terminating. Thus we get a function qwrec for (11). We still have (9), but now with qwintro = qu ∞ ◦ sq {∞} ◦ ι; and as before, the constructor sqeq of Q₁ in Figure 4 ensures that QW = (Q₀ ∞)/(Q₁ ∞) satisfies the equations ε. With these definitions it turns out that each qwrec X s e is an S-algebra morphism up to definitional equality, so that the function qwrechom needed for (12) is straightforward to define. Finally, the function qwuniq needed for (13) can be constructed via a sequence of lemmas making use of the other two constructors of the data type Q₁, namely sqη, which makes use of an auxiliary function for coercing between different size instances of Q₀, and sqσ. We refer the reader to the accompanying Agda code (doi: 10.17863/CAM.48187) for the details of the construction of qwuniq. Altogether, the sized definitions in Figure 4 allow us to complete a proof of Theorem 1.

5 Conclusion

QW-types are a general form of QIT that capture many examples, including simple 1-cell complexes and non-recursive QITs [6], non-structural QITs [26], W-types with reductions [28], and also infinitary QITs (e.g. unordered infinitely branching trees [5], and ordinals [21]). They also capture the notion of initial (and free) algebras for strictly positive equational systems [14], analogously to how W-types capture the notion of initial (and free) algebras for strictly positive endofunctors (see Remark 2). Using Agda to formalise our results, we have shown that it is possible to construct any QW-type, even infinitary ones, in intensional type theory satisfying UIP, using inductive-inductive definitions permitting strictly positive occurrences of quotients and sized types (see Theorem 1 and Section 4). We conclude by mentioning related work and some possible directions for future work.

Quotients of monads.
In view of Remark 2, Section 4 gives a construction of initial algebras for equational systems [14] on the free monad T{Σ} generated by a signature Σ. By a suitable change of signature (see Remark 1) this extends to a construction of free algebras, rather than just initial ones. We can show that the construction works for an arbitrary strictly positive monad and not just for free ones. Given such a construction, one gets a quotient monad morphism from the base monad to the quotient monad. This contravariantly induces a forgetful functor from the algebras of the latter to those of the former. Using the adjoint triangle theorem, one should be able to construct a left adjoint. This would then cover examples such as the free group over a monoid, the free ring over a group, etc.

Quotient inductive-inductive types. The notion of QW-type generalises to indexed QW-types, analogously to the generalisation of W-types to Petersson-Synek trees for inductively defined indexed families of types [24, Chapter 16], and we will consider it in subsequent work. More generally, we wonder whether our analysis of QITs using quotients, inductive-inductive and sized types can be extended to cover the notion of quotient inductive-inductive type (QIIT) [4, 19]. Dijkstra [10] studies such types in depth and in Chapter 6 of his thesis gives a construction for finitary ones in terms of countable colimits, and hence in terms of countable coproducts and quotients. One could hope to pass to the infinitary case by using sized types as we have done, provided an analogue for QIITs can be found of the monadic construction in Section 4 for our class of QITs, the QW-types. Kaposi, Kovács, and Altenkirch [19] give a specification of finitary QIITs using a domain-specific type theory called the theory of signatures and prove existence of QIITs matching this specification.
It might be possible to encode their theory of signatures using QW-types (it can already be encoded as a QIIT), or to extend QW-types to make this possible. This would allow infinitary QIITs.

Schemas for QITs. We have shown by example that QW-types can encode a wide range of QITs. However, we have yet to extend this to a proof of Conjecture 1, that every instance of the schema for QITs considered in Section 3 can be so encoded.

Conditional path equations. In Section 3 we mentioned the fact that Dybjer and Moeneclaey [12] give a model for finitary 1-HITs and 2-HITs in which constructors are allowed to take arguments involving the identity type of the datatype being declared. On the face of it, QW-types are not able to encode such conditional QITs. We plan to consider whether it is possible to extend the notion of QW-type to allow encoding of infinitary QITs with such conditional equations.

Homotopy Type Theory (HoTT). Our development makes use of UIP (and heterogeneous equality), which is well known to be incompatible with the Univalence Axiom [29, Example 3.1.9]. Given the interest in HoTT, it is certainly worth investigating whether a result like Theorem 1 holds in univalent foundations for a suitably coherent version of QW-types. We are currently investigating this using set-truncation.

Pattern matching for QITs and HITs. Our reduction of QITs to induction-induction, strictly positive quotients and sized types is of theoretical interest, but in practice one could wish for more direct support in systems like Agda, Lean and Coq for the very useful notion of quotient inductive types (or, more generally, for higher inductive types). Even having better support for the special case of quotient types would be welcome. It is not hard to envisage the addition of a general schema for declaring QITs; but when it comes to defining functions on them, having to do that with eliminator forms rapidly becomes cumbersome (for example, for functions of several QIT arguments).
Some extension of dependently typed pattern matching to cover equality constructors as well as element constructors is needed, and the third author has begun work on that based on the approach of Cockx and Abel [9]. In this context it is worth mentioning that the cubical features of recent versions of Agda give access to cubical type theory [30]. This allows for easy declaration of HITs and hence in particular QITs (and quotients, avoiding the need for POLARITY pragmas) and a certain amount of pattern matching when it comes to defining functions on them: the value of a function on a path constructor can be specified by using generic elements of the interval type in point-level patterns; but currently the user is given little mechanised assistance to solve the definitional equality constraints on end-points of paths that are generated by this method.

References

1. Abbott, M., Altenkirch, T., Ghani, N.: Containers: Constructing strictly positive types. Theoretical Computer Science vol. 342 (1), 3–27 (2005). doi: 10.1016/j.tcs.2005.06.002.
2. Abel, A.: Type-Based Termination, Inflationary Fixed-Points, and Mixed Inductive-Coinductive Types. Electronic Proceedings in Theoretical Computer Science vol. 77, 1–11 (2012). doi: 10.4204/EPTCS.77.1.
3. Abel, A., Pientka, B.: Well-Founded Recursion with Copatterns and Sized Types. J. Funct. Prog. vol. 26, e2 (2016). doi: 10.1017/S0956796816000022.
4. Altenkirch, T., Capriotti, P., Dijkstra, G., Kraus, N., Nordvall Forsberg, F.: Quotient Inductive-Inductive Types. In: Baier, C., Dal Lago, U. (eds.) Foundations of Software Science and Computation Structures, FoSSaCS 2018, LNCS, vol. 10803, pp. 293–310. Springer, Heidelberg (2018).
5. Altenkirch, T., Kaposi, A.: Type Theory in Type Theory Using Quotient Inductive Types. In: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages - POPL 2016, pp. 18–29. ACM Press, St.
Petersburg, FL, USA (2016). doi: 10.1145/2837614.2837638.
6. Basold, H., Geuvers, H., van der Weide, N.: Higher Inductive Types in Programming. Journal of Universal Computer Science vol. 23 (1), 27 (2017). doi: 10.3217/jucs-023-01-0063.
7. Blass, A.: Words, Free Algebras, and Coequalizers. Fundamenta Mathematicae vol. 117 (2), 117–160 (1983).
8. Cockx, J., Abel, A.: "Sprinkles of Extensionality for Your Vanilla Type Theory". Abstract for the 22nd International Conference on Types for Proofs and Programs (TYPES 2016), Novi Sad, Serbia.
9. Cockx, J., Abel, A.: Elaborating Dependent (Co)Pattern Matching. Proceedings of the ACM on Programming Languages vol. 2, 1–30 (2018). doi: 10.1145/3236770.
10. Dijkstra, G.: Quotient Inductive-Inductive Definitions. PhD thesis, University of Nottingham (2017). url: http://eprints.nottingham.ac.uk/42317/1/thesis.pdf.
11. Dybjer, P.: Representing Inductively Defined Sets by Wellorderings in Martin-Löf's Type Theory. Theoretical Computer Science vol. 176 (1-2), 329–335 (1997). doi: 10.1016/S0304-3975(96)00145-4.
12. Dybjer, P., Moeneclaey, H.: Finitary Higher Inductive Types in the Groupoid Model. Electronic Notes in Theoretical Computer Science vol. 336, 119–134 (2018). doi: 10.1016/j.entcs.2018.03.019.
13. Fiore, M.: An Equational Metalogic for Monadic Equational Systems. Theory and Applications of Categories vol. 27 (18), 464–492 (2013). url: https://emis.de/journals/TAC/volumes/27/18/27-18.pdf.
14. Fiore, M., Hur, C.-K.: On the Construction of Free Algebras for Equational Systems. Theoretical Computer Science vol. 410 (18), 1704–1729 (2009). doi: 10.1016/j.tcs.2008.12.052.
15. Forsberg, F.N., Setzer, A.: A Finite Axiomatisation of Inductive-Inductive Definitions. In: Berger, U., Diener, H., Schuster, P., Seisenberger, M. (eds.) Logic, Construction, Computation, Ontos mathematical logic, pp. 259–287. De Gruyter (2012). doi: 10.1515/9783110324921.259.
16. Gambino, N., Kock, J.: Polynomial Functors and Polynomial Monads.
Math. Proc. Camb. Phil. Soc. vol. 154 (1), 153–192 (2013). doi: 10.1017/S0305004112000394.
17. Gitik, M.: All Uncountable Cardinals Can Be Singular. Israel J. Math. vol. 35 (1–2), 61–88 (1980).
18. Hofmann, M.: Extensional Concepts in Intensional Type Theory. PhD thesis, University of Edinburgh (1995).
19. Kaposi, A., Kovács, A., Altenkirch, T.: Constructing Quotient Inductive-Inductive Types. Proc. ACM Program. Lang. vol. 3, 1–24 (2019). doi: 10.1145/3290315.
20. Kelly, M.: A Unified Treatment of Transfinite Constructions for Free Algebras, Free Monoids, Colimits, Associated Sheaves, and so on. Bull. Austral. Math. Soc. vol. 22, 1–83 (1980).
21. Lumsdaine, P.L., Shulman, M.: Semantics of Higher Inductive Types. Math. Proc. Camb. Phil. Soc. (2019). doi: 10.1017/S030500411900015X.
22. Martin-Löf, P.: Constructive Mathematics and Computer Programming. In: Cohen, L.J., Łoś, J., Pfeiffer, H., Podewski, K.-P. (eds.) Studies in Logic and the Foundations of Mathematics, pp. 153–175. Elsevier (1982). doi: 10.1016/S0049-237X(09)70189-2.
23. McBride, C.: Dependently Typed Functional Programs and their Proofs. PhD thesis, University of Edinburgh (1999).
24. Nordström, B., Petersson, K., Smith, J.M.: Programming in Martin-Löf's Type Theory. Oxford University Press (1990).
25. Shulman, M.: Brouwer's Fixed-Point Theorem in Real-Cohesive Homotopy Type Theory. Mathematical Structures in Computer Science vol. 28, 856–941 (2018).
26. Sojakova, K.: Higher Inductive Types as Homotopy-Initial Algebras. In: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages - POPL '15, pp. 31–42. ACM Press, Mumbai, India (2015). doi: 10.1145/2676726.2676983.
27. Streicher, T.: Investigations into Intensional Type Theory. Habilitation Thesis, Ludwig Maximilian University (1993).
28. Swan, A.: W-Types with Reductions and the Small Object Argument. (2018). arXiv:1802.07588 [math].
29.
The Univalent Foundations Program: Homotopy Type Theory: Univalent Foundations for Mathematics. http://homotopytypetheory.org/book, Institute for Advanced Study (2013).
30. Vezzosi, A., Mörtberg, A., Abel, A.: Cubical Agda: A Dependently Typed Programming Language with Univalence and Higher Inductive Types. Proc. ACM Program. Lang. vol. 3 (ICFP), 87:1–87:29 (2019). doi: 10.1145/3341691.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Relative full completeness for bicategorical cartesian closed structure

Marcelo Fiore and Philip Saville

Department of Computer Science and Technology, University of Cambridge, UK
marcelo.fiore@cl.cam.ac.uk
School of Informatics, University of Edinburgh, UK
philip.saville@ed.ac.uk

Abstract. The glueing construction, defined as a certain comma category, is an important tool for reasoning about type theories, logics, and programming languages. Here we extend the construction to accommodate '2-dimensional theories' of types, terms between types, and rewrites between terms.
Taking bicategories as the semantic framework for such systems, we define the glueing bicategory and establish a bicategorical version of the well-known construction of cartesian closed structure on a glueing category. As an application, we show that free finite-product bicategories are fully complete relative to free cartesian closed bicategories, thereby establishing that the higher-order equational theory of rewriting in the simply-typed lambda calculus is a conservative extension of the algebraic equational theory of rewriting in the fragment with finite products only.

Keywords: glueing, bicategories, cartesian closure, relative full completeness, rewriting, type theory, conservative extension

1 Introduction

Relative full completeness for cartesian closed structure. Every small category C can be viewed as an algebraic theory. This has sorts the objects of C, with unary operators for each morphism of C and equations determined by the equalities in C. Suppose one freely extends C with finite products. Categorically, one obtains the free cartesian category F^×[C] on C. From the well-known construction of F^×[C] (see e.g. [12] and [46, §8]) it is direct that the universal functor C → F^×[C] is fully-faithful, a property we will refer to as the relative full completeness (c.f. [2,16]) of C in F^×[C]. Type-theoretically, F^×[C] corresponds to the Simply-Typed Product Calculus (STPC) over the algebraic theory of C, given by taking the fragment of the Simply-Typed Lambda Calculus (STLC) consisting of just the types, rules, and equational theory for products. Relative full completeness corresponds to the STPC being a conservative extension.

Consider now the free cartesian closed category F^{×,→}[C] on C, type-theoretically corresponding to the STLC over the algebraic theory of C. Does the relative full completeness property, and hence conservativity, still hold for either C in F^{×,→}[C] The Author(s) 2020. J. Goubault-Larrecq and B.
König (Eds.): FOSSACS 2020, LNCS 12077, pp. 277–298, 2020. https://doi.org/10.1007/978-3-030-45231-5_15

or for F^×[C] in F^{×,→}[C]? Precisely, is either the universal functor C → F^{×,→}[C] or its universal cartesian extension F^×[C] → F^{×,→}[C] full and faithful? The answer is affirmative, but the proof is non-trivial. One must either reason proof-theoretically (e.g. in the style of [63, Chapter 8]) or employ semantic techniques such as glueing [39, Annexe C].

In this paper we consider the question of relative full completeness in the bicategorical setting. This corresponds to the question of conservativity for 2-dimensional theories of types, terms between types, and rewrites between terms (see [32,20]). We focus on the particular case of the STLC with invertible rewrites given by β-reductions and η-expansions, and its STPC fragment. By identifying these two systems with cartesian closed, resp. finite product, structure 'up to isomorphism' one recovers a conservative extension result for rewrites akin to that for terms.

2-dimensional categories and rewriting. It has been known since the 1980s that one may consider 2-dimensional categories as abstract reduction systems (e.g. [54,51]): if sorts are 0-cells (objects) and terms are 1-cells (morphisms), then rewrites between terms ought to be 2-cells. Indeed, every sesquicategory (of which 2-categories are a special class) generates a rewriting relation ⟿ on its 1-cells, defined by f ⟿ g if and only if there exists a 2-cell f ⇒ g (e.g. [60,58]). Invertible 2-cells may then be thought of as equality witnesses. The rewriting rules of the STLC arise naturally in this framework: Seely [56] observed that β-reduction and η-expansion may be respectively interpreted as the counit and unit of the adjunctions corresponding to lax (directed) products and exponentials in a 2-category (c.f. also [34,27]).
This approach was taken up by Hilken [32], who developed a '2-dimensional λ-calculus' with strict products and lax exponentials to study the proof theory of rewriting in the STLC (c.f. also [33]). Our concern here is with equational theories of rewriting, and we follow Seely in viewing weak categorical structure as a semantic model of rewriting modulo an equational theory. We are not aware of non-syntactic examples of 2-dimensional cartesian closed structure that are lax but not pseudo (i.e. up to isomorphism) and so adopt cartesian closed bicategories as our semantic framework.

From the perspective of rewriting, a sesquicategory embodies the rewriting of terms modulo the monoid laws for identities and composition, while a bicategory embodies the rewriting of terms modulo the equational theory on rewrites given by the triangle and pentagon laws of a monoidal category. Cartesian closed bicategories further embody the usual β-reductions and η-expansions of the STLC modulo an equational theory on rewrites; for instance, this identifies the composite rewrite ⟨t₁, t₂⟩ ⇒ ⟨π₁⟨t₁, t₂⟩, π₂⟨t₁, t₂⟩⟩ ⇒ ⟨t₁, t₂⟩ with the identity rewrite. Indeed, in the free cartesian closed bicategory over a signature of base types and constant terms, the quotient of 1-cells by the isomorphism relation provided by 2-cells is in bijection with αβη-equivalence classes of STLC-terms (c.f. [55, Chapter 5]).

Bicategorical relative full completeness. The bicategorical notion of relative full completeness arises by generalising from functors that are fully-faithful to pseudofunctors F : B → C that are locally an equivalence, that is, for which every hom-functor F_{X,Y} : B(X, Y) → C(FX, FY) is an equivalence of categories. Interpreted in the context of rewriting, this amounts to the conservativity of rewriting theories.
First, the equational theory of rewriting in C is conservative over that in B: the hom-functors do not identify distinct rewrites. Second, the reduction relation in C(FX, FY) is conservative over that in B(X, Y): whenever Ff ⟿ Fg in C then already f ⟿ g in B. Third, the term structure in B gets copied by F in C: modulo the equational theory of rewrites, there are no new terms between types in the image of F.

Contributions. This paper makes two main contributions. Our first contribution, in Section 3, is to introduce the bicategorical glueing construction and, in Section 4, to initiate the development of its theory. As well as providing an assurance that our notion is the right one, this establishes the basic framework for applications. Importantly, we bicategorify the fundamental folklore result (e.g. [40,12,62]) establishing mild conditions under which a glued bicategory is cartesian closed. Our second contribution, in Section 5, is to employ bicategorical glueing to show that for a bicategory B with finite-product completion F^×[B] and cartesian-closed completion F^{×,→}[B], the universal pseudofunctor B → F^{×,→}[B] and its universal finite-product-preserving extension F^×[B] → F^{×,→}[B] are both locally an equivalence. Since one may directly observe that the universal pseudofunctor B → F^×[B] is locally an equivalence, we obtain relative full completeness results for bicategorical cartesian closed structure mirroring those of the categorical setting. Establishing this proof-theoretically would require the development of a 2-dimensional proof theory; given the complexities already present at the categorical level, this seems a serious and interesting undertaking. Here, once the basic bicategorical theory has been established, the proof is relatively compact. This highlights the effectiveness of our approach for the application. The result may also be expressed type-theoretically.
For instance, in terms of the type theories of [20], the type theory Λ^{×,→}_ps for cartesian closed bicategories is a conservative extension of the type theory Λ^×_ps for finite-product bicategories. It follows that, modulo the equational theory of bicategorical products and exponentials, any rewrite between STPC-terms constructed using the βη-rewrites for both products and exponentials may be equally presented as constructed from just the βη-rewrites for products (see [21,55]).

Further work. We view the foundational theory presented here as the starting point for future work. For instance, we plan to incorporate further type structure into the development, such as coproducts (c.f. [22,16,4]) and monoidal structure (c.f. [31]). On the other hand, the importance of glueing in the categorical setting suggests that its bicategorical counterpart will find a range of applications. A case in point, which has already been developed, is the proof of a 2-dimensional normalisation property for the type theory Λ^{×,→}_ps for cartesian closed bicategories of [20] that entails a corresponding bicategorical coherence theorem [21,55]. There are also a variety of syntactic constructions in programming languages and type theory that naturally come with a 2-dimensional semantics (see e.g. the use of 2-categorical constructions in [23,14,6,61,35]). In such scenarios, bicategorical glueing may prove useful for establishing properties corresponding to the notions of adequacy and/or canonicity, or for proving further conservativity properties.

2 Cartesian closed bicategories

We begin by briefly recapitulating the basic theory of bicategories, including the definition of cartesian closure. A summary of the key definitions is in [41]; for a more extensive introduction see e.g. [5,7].
2.1 Bicategories

Bicategories axiomatise structures in which the associativity and unit laws of composition only hold up to coherent isomorphism, for instance when composition is defined by a universal property. They are rife in mathematics and theoretical computer science, appearing in the semantics of computation [29,11,49], datatype models [1,13], categorical logic [26], and categorical algebra [19,25,18].

Definition 1 ([5]). A bicategory B consists of
1. A class of objects ob(B),
2. For every X, Y ∈ ob(B) a hom-category (B(X, Y), •, id) with objects 1-cells f : X → Y and morphisms 2-cells α : f ⇒ f' : X → Y; composition of 2-cells is called vertical composition,
3. For every X, Y, Z ∈ ob(B) an identity functor Id_X : 1 → B(X, X) (for 1 the terminal category) and a horizontal composition functor ∘_{X,Y,Z} : B(Y, Z) × B(X, Y) → B(X, Z),
4. Invertible 2-cells
   a_{h,g,f} : (h ∘ g) ∘ f ⇒ h ∘ (g ∘ f) : W → Z
   l_f : Id_X ∘ f ⇒ f : W → X
   r_g : g ∘ Id_X ⇒ g : X → Y
   for every f : W → X, g : X → Y and h : Y → Z, natural in each of their parameters and satisfying a triangle law and a pentagon law analogous to those for monoidal categories.

A bicategory is said to be locally small if every hom-category is small.

Example 1.
1. Every 2-category is a bicategory in which the structural isomorphisms are all the identity.
2. For any category C with pullbacks there exists a bicategory of spans over C [5]. The objects are those of C, 1-cells A ⇸ B are spans (A ← X → B), and 2-cells (A ← X → B) → (A ← X' → B) are morphisms X → X' making the expected diagram commute. Composition is defined using chosen pullbacks.

A bicategory has three notions of 'opposite', depending on whether one reverses 1-cells, 2-cells, or both (see e.g. [37, §1.6]). We shall only require the following.

Definition 2. The opposite of a bicategory B, denoted B^op, is obtained by setting B^op(X, Y) := B(Y, X) for all X, Y ∈ B.
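The span example can be made concrete over finite sets, where composition by chosen pullback is just a filtered product. The following Python sketch is illustrative (names are not from the paper): a span is a carrier together with its two leg functions, and the pullback is computed as the set of matching pairs.

```python
from itertools import product

# A span A <- X -> B over finite sets: (X, left, right),
# with left : X -> A and right : X -> B.

def compose(span2, span1):
    # span1 : A <- X -> B and span2 : B <- Y -> C compose via the chosen
    # pullback of span1's right leg against span2's left leg:
    #   P = {(x, y) | r1(x) = l2(y)}
    X, l1, r1 = span1
    Y, l2, r2 = span2
    P = [(x, y) for x, y in product(X, Y) if r1(x) == l2(y)]
    return (P, lambda p: l1(p[0]), lambda p: r2(p[1]))
```

Spans of this kind behave like relations, and composition by pullback specialises to relational composition.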
A morphism of bicategories is called a pseudofunctor (or homomorphism) [5]. It is a mapping on objects, 1-cells and 2-cells that preserves horizontal composition up to isomorphism. Vertical composition is preserved strictly.

Definition 3. A pseudofunctor (F, φ, ψ) : B → C between bicategories B and C consists of
1. A mapping F : ob(B) → ob(C),
2. A functor F_{X,Y} : B(X, Y) → C(FX, FY) for every X, Y ∈ ob(B),
3. An invertible 2-cell ψ_X : Id_{FX} ⇒ F(Id_X) for every X ∈ ob(B),
4. An invertible 2-cell φ_{f,g} : F(f) ∘ F(g) ⇒ F(f ∘ g) for every g : X → Y and f : Y → Z, natural in f and g,
subject to two unit laws and an associativity law. A pseudofunctor for which φ and ψ are both the identity is called strict. A pseudofunctor is called locally P if every functor F_{X,Y} satisfies the property P.

Example 2. A monoidal category is equivalently a one-object bicategory; a monoidal functor is equivalently a pseudofunctor between one-object bicategories.

Pseudofunctors F, G : B → C are related by pseudonatural transformations. A pseudonatural transformation (k, k̄) : F ⇒ G consists of a family of 1-cells (k_X : FX → GX)_{X ∈ B} and, for every f : X → Y, an invertible 2-cell k̄_f : k_Y ∘ Ff ⇒ Gf ∘ k_X witnessing naturality. The 2-cells k̄_f are required to be natural in f and to satisfy two coherence axioms. A morphism of pseudonatural transformations is called a modification, and may be thought of as a coherent family of 2-cells.

Notation 1. For bicategories B and C we write Bicat(B, C) for the (possibly large) bicategory of pseudofunctors, pseudonatural transformations, and modifications (see e.g. [41]). If C is a 2-category, then so is Bicat(B, C). We write Cat for the 2-category of small categories and think of the 2-category Bicat(B^op, Cat) as a bicategorical version of the presheaf category Set^{B^op}. As for presheaf categories, one must take care to avoid size issues.
We therefore adopt the convention that, when considering $\mathrm{Bicat}(\mathcal{B}^{\mathrm{op}}, \mathrm{Cat})$, the bicategory $\mathcal{B}$ is small or locally small as appropriate.

Example 3. For every bicategory $\mathcal{B}$ and $X \in \mathcal{B}$ there exists the representable pseudofunctor $\mathrm{Y}X : \mathcal{B}^{\mathrm{op}} \to \mathrm{Cat}$, defined by $\mathrm{Y}X := \mathcal{B}(-, X)$. The 2-cells $\phi$ and $\psi$ are structural isomorphisms.

282 M. Fiore and P. Saville

The notion of equivalence between bicategories is called biequivalence. A biequivalence $\mathcal{B} \simeq \mathcal{C}$ consists of a pair of pseudofunctors $F : \mathcal{B} \rightleftarrows \mathcal{C} : G$ together with equivalences $FG \simeq \mathrm{id}_{\mathcal{C}}$ and $GF \simeq \mathrm{id}_{\mathcal{B}}$ in $\mathrm{Bicat}(\mathcal{C}, \mathcal{C})$ and $\mathrm{Bicat}(\mathcal{B}, \mathcal{B})$ respectively. Equivalences in an arbitrary bicategory are defined by analogy with equivalences of categories, see e.g. [42, pp. 28].

Remark 1. The coherence theorem for monoidal categories [44, Chapter VII] generalises to bicategories: any bicategory is biequivalent to a 2-category [45] (see [42] for a readable summary of the argument). We are therefore justified in writing simply $\cong$ for composites of $a$, $l$ and $r$.

As a rule of thumb, a category-theoretic proposition lifts to a bicategorical proposition so long as one takes care to weaken isomorphisms to equivalences and sprinkle the prefixes 'pseudo' and 'bi' in appropriate places. For instance, bicategorical adjoints are called biadjoints and bicategorical limits are called bilimits [59]. The latter may be thought of as limits in which every cone is filled by a coherent choice of invertible 2-cell. Bilimits are preserved by representable pseudofunctors and by right biadjoints. The bicategorical Yoneda lemma [59, §1.9] says that for any pseudofunctor $P : \mathcal{B}^{\mathrm{op}} \to \mathrm{Cat}$, evaluation at the identity determines a pseudonatural family of equivalences $\mathrm{Bicat}(\mathcal{B}^{\mathrm{op}}, \mathrm{Cat})(\mathrm{Y}X, P) \simeq PX$. One may then deduce that the Yoneda pseudofunctor $\mathrm{Y} : \mathcal{B} \to \mathrm{Bicat}(\mathcal{B}^{\mathrm{op}}, \mathrm{Cat}) : X \mapsto \mathrm{Y}X$ is locally an equivalence. Another 'bicategorified' lemma is the following, which we shall employ in Section 5.

Lemma 1. 1. For pseudofunctors $F, G : \mathcal{B} \to \mathcal{C}$, if $F \simeq G$ and $G$ is locally an equivalence, then so is $F$. 2.
For pseudofunctors $F : \mathcal{A} \to \mathcal{B}$, $G : \mathcal{B} \to \mathcal{C}$, $H : \mathcal{C} \to \mathcal{D}$, if $G \circ F$ and $H \circ G$ are local equivalences, then so is $F$.

2.2 fp-Bicategories

It is convenient to directly consider all finite products, as this reduces the need to deal with the equivalent objects given by re-bracketing binary products. To avoid confusion with the 'cartesian bicategories' of Carboni and Walters [10,8], we call a bicategory with all finite products an fp-bicategory.

Definition 4. An fp-bicategory $(\mathcal{B}, \Pi_n(-))$ is a bicategory $\mathcal{B}$ equipped with the following data for every $A_1, \dots, A_n \in \mathcal{B}$ ($n \in \mathbb{N}$):
1. a chosen object $\Pi(A_1, \dots, A_n)$;
2. chosen arrows $\pi_k : \Pi(A_1, \dots, A_n) \to A_k$ ($k = 1, \dots, n$), called projections;
3. for every $X \in \mathcal{B}$ an adjoint equivalence

$(\pi_1 \circ -, \dots, \pi_n \circ -) : \mathcal{B}(X, \Pi(A_1, \dots, A_n)) \rightleftarrows \prod_{i=1}^{n} \mathcal{B}(X, A_i) : \langle -, \dots, = \rangle \qquad (1)$

specified by choosing a family of universal arrows (see e.g. [44, Theorem IV.2]) with components $\varpi^{(i)}_{f_1, \dots, f_n} : \pi_i \circ \langle f_1, \dots, f_n \rangle \Rightarrow f_i$ for $i = 1, \dots, n$.

We call the right adjoint $\langle -, \dots, = \rangle$ the n-ary tupling. Explicitly, the universal property of $\varpi = (\varpi^{(1)}, \dots, \varpi^{(n)})$ is the following. For any finite family of 2-cells $(\alpha_i : \pi_i \circ g \Rightarrow f_i : X \to A_i)_{i=1,\dots,n}$, there exists a 2-cell $p^\dagger(\alpha_1, \dots, \alpha_n) : g \Rightarrow \langle f_1, \dots, f_n \rangle : X \to \Pi(A_1, \dots, A_n)$, unique such that

$\varpi^{(k)}_{f_1, \dots, f_n} \bullet \big( \pi_k \circ p^\dagger(\alpha_1, \dots, \alpha_n) \big) = \alpha_k : \pi_k \circ g \Rightarrow f_k$

for $k = 1, \dots, n$. One thereby obtains a functor $\langle -, \dots, = \rangle$ and an adjunction as in (1) with counit $\varpi = (\varpi^{(1)}, \dots, \varpi^{(n)})$ and unit $\varsigma_g := p^\dagger(\mathrm{id}_{\pi_1 \circ g}, \dots, \mathrm{id}_{\pi_n \circ g}) : g \Rightarrow \langle \pi_1 \circ g, \dots, \pi_n \circ g \rangle$. This defines a lax n-ary product structure: one merely obtains an adjunction in (1). One turns it into a bicategorical (pseudo) product by further requiring the unit and counit to be invertible. The terminal object $1$ arises as $\Pi()$.

We adopt the same notation as for categorical products, for example by writing $\prod_{i=1}^n A_i$ for $\Pi(A_1, \dots, A_n)$ and $\prod_{i=1}^n f_i$ for $\langle f_1 \circ \pi_1, \dots, f_n \circ \pi_n \rangle$.

Example 4.
The bicategory of spans over a lextensive category [9] has finite products; such a bicategory is biequivalent to its opposite, so these are in fact biproducts [38, Theorem 6.2]. Biproduct structure arises using the coproduct structure of the underlying category (c.f. the biproduct structure of the category of relations).

Remark 2 (c.f. Remark 1). fp-Bicategories satisfy the following coherence theorem: every fp-bicategory is biequivalent to a 2-category with 2-categorical products [52, Theorem 4.1]. Thus, we shall sometimes simply write $\cong$ in diagrams for composites of 2-cells arising from either the bicategorical or product structure. In pasting diagrams we shall omit such 2-cells completely (c.f. [30, Remark 3.1.16]; for a detailed exposition, see [64, Appendix A]).

One may think of bicategorical product structure as an intensional version of the familiar categorical structure, except the usual equations (e.g. [28]) are now witnessed by natural families of invertible 2-cells. It is useful to introduce explicit names for these 2-cells.

Notation 2. In the following, and throughout, we write $A_\bullet$ for a finite sequence $A_1, \dots, A_n$.

Lemma 2. For any fp-bicategory $(\mathcal{B}, \Pi_n(-))$ there exist canonical choices for the following natural families of invertible 2-cells:
1. for every $(h_i : Y \to A_i)_{i=1,\dots,n}$ and $g : X \to Y$, a 2-cell $\mathrm{post}(h_\bullet; g) : \langle h_1, \dots, h_n \rangle \circ g \Rightarrow \langle h_1 \circ g, \dots, h_n \circ g \rangle$;
2. for every $(h_i : A_i \to B_i)_{i=1,\dots,n}$ and $(g_i : X \to A_i)_{i=1,\dots,n}$, a 2-cell $\mathrm{fuse}(h_\bullet; g_\bullet) : \big(\prod_{i=1}^n h_i\big) \circ \langle g_1, \dots, g_n \rangle \Rightarrow \langle h_1 \circ g_1, \dots, h_n \circ g_n \rangle$.

In particular, it follows from Lemma 2(2) that there exists a canonical natural family of invertible 2-cells $\Phi_{h_\bullet, g_\bullet} : \big(\prod_{i=1}^n h_i\big) \circ \big(\prod_{i=1}^n g_i\big) \Rightarrow \prod_{i=1}^n (h_i \circ g_i)$ for any $(h_i : A_i \to B_i)_{i=1,\dots,n}$ and $(g_j : X_j \to A_j)_{j=1,\dots,n}$.

In the categorical setting, a cartesian functor preserves products up to isomorphism. An fp-pseudofunctor preserves bicategorical products up to equivalence.

Definition 5.
An fp-pseudofunctor $(F, q^\times)$ between fp-bicategories $(\mathcal{B}, \Pi_n(-))$ and $(\mathcal{C}, \Pi_n(-))$ is a pseudofunctor $F : \mathcal{B} \to \mathcal{C}$ equipped with specified equivalences

$\langle F\pi_1, \dots, F\pi_n \rangle : F\big(\prod_{i=1}^n A_i\big) \rightleftarrows \prod_{i=1}^n (FA_i) : q^\times_{A_\bullet}$

for every $A_1, \dots, A_n \in \mathcal{B}$ ($n \in \mathbb{N}$). We denote the 2-cells witnessing these equivalences by $u^\times_{A_\bullet} : \mathrm{Id} \Rightarrow \langle F\pi_1, \dots, F\pi_n \rangle \circ q^\times_{A_\bullet}$ and $c^\times_{A_\bullet} : q^\times_{A_\bullet} \circ \langle F\pi_1, \dots, F\pi_n \rangle \Rightarrow \mathrm{Id}_{F(\Pi_i A_i)}$. We call $(F, q^\times)$ strict if $F$ is strict and satisfies

$F(\Pi(A_1, \dots, A_n)) = \Pi(FA_1, \dots, FA_n)$, $\quad F(\pi_i) = \pi_i$, $\quad F(\varpi^{(i)}_{t_1, \dots, t_n}) = \varpi^{(i)}_{Ft_1, \dots, Ft_n}$, $\quad F\langle t_1, \dots, t_n \rangle = \langle Ft_1, \dots, Ft_n \rangle$, $\quad q^\times_{A_1, \dots, A_n} = \mathrm{Id}_{\Pi_n(FA_1, \dots, FA_n)}$,

with equivalences given by the 2-cells $p^\dagger(r_{\pi_1}, \dots, r_{\pi_n}) : \mathrm{Id} \Rightarrow \langle \pi_1, \dots, \pi_n \rangle$.

Notation 3. For fp-bicategories $\mathcal{B}$ and $\mathcal{C}$ we write fp-$\mathrm{Bicat}(\mathcal{B}, \mathcal{C})$ for the bicategory of fp-pseudofunctors, pseudonatural transformations and modifications.

We define two further families of 2-cells to witness standard properties of cartesian functors. The first witnesses the fact that any fp-pseudofunctor commutes with the $\langle -, \dots, = \rangle$ operation. The second witnesses the equality $\langle F\pi_1, \dots, F\pi_n \rangle \circ F\langle f_1, \dots, f_n \rangle = \langle Ff_1, \dots, Ff_n \rangle$ 'unpacking' an n-ary tupling from inside $F$.

Lemma 3. Let $(F, q^\times) : (\mathcal{B}, \Pi_n(-)) \to (\mathcal{C}, \Pi_n(-))$ be an fp-pseudofunctor.
1. For any finite family of 1-cells $(f_i : A_i \to A_i')_{i=1,\dots,n}$ in $\mathcal{B}$, there exists an invertible 2-cell $\mathrm{nat}_{f_\bullet} : q^\times_{A'_\bullet} \circ \prod_{i=1}^n Ff_i \Rightarrow F\big(\prod_{i=1}^n f_i\big) \circ q^\times_{A_\bullet}$ such that the pair $(q^\times, \mathrm{nat})$ forms a pseudonatural transformation $\prod_{i=1}^n (F(-), \dots, F(=)) \Rightarrow \big(F \circ \prod_{i=1}^n\big)(-, \dots, =)$.
2. For any finite family of 1-cells $(f_i : X \to B_i)_{i=1,\dots,n}$ in $\mathcal{B}$, there exists a canonical choice of naturally invertible 2-cell $\mathrm{unpack}_{f_\bullet} : \langle F\pi_1, \dots, F\pi_n \rangle \circ F\langle f_1, \dots, f_n \rangle \Rightarrow \langle Ff_1, \dots, Ff_n \rangle : FX \to \prod_{i=1}^n FB_i$.

2.3 Cartesian closed bicategories

A cartesian closed bicategory is an fp-bicategory $(\mathcal{B}, \Pi_n(-))$ equipped with a biadjunction $(-) \times A \dashv (A \Rightarrow -)$ for every $A \in \mathcal{B}$.
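In the degenerate case where the bicategory is Cat with only identity 2-cells (or, even more simply, Set), this biadjunction is ordinary currying, and its counit is the familiar beta-law. A minimal set-level sketch (our own illustration, with an arbitrary sample function, not from the paper):

```python
# Currying in Set: the 1-categorical shadow of the biadjunction (-) x A -| (A => -).

def curry(f):
    """lambda(f) : X -> (A => B) for f : X x A -> B."""
    return lambda x: (lambda a: f((x, a)))

def ev(pair):
    """eval : (A => B) x A -> B."""
    g, a = pair
    return g(a)

f = lambda xa: xa[0] + 10 * xa[1]   # a sample map f : X x A -> B on integers

# The beta-law: eval o (curry(f) x A) = f.  In Set this is an equality;
# bicategorically it is only an invertible 2-cell (the counit epsilon_f).
for x in range(3):
    for a in range(3):
        assert ev((curry(f)(x), a)) == f((x, a))
```

Definition 6 below weakens exactly this situation: the equality is replaced by the universal invertible 2-cell $\varepsilon_f$, and uniqueness of the transpose becomes the universal property of $e^\dagger$.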
Examples include the bicategory of generalised species [17], bicategories of concurrent games [49], and bicategories of operads [26]. In the categorical setting, every natural transformation between cartesian functors is monoidal with respect to the cartesian structure, and a similar fact is true bicategorically: every pseudonatural transformation is canonically compatible with the product structure, see [55, §4.1.1].

Definition 6. A cartesian closed bicategory or cc-bicategory is an fp-bicategory $(\mathcal{B}, \Pi_n(-))$ equipped with the following data for every $A, B \in \mathcal{B}$:
1. a chosen object $(A \Rightarrow B)$;
2. a specified 1-cell $\mathrm{eval}_{A,B} : (A \Rightarrow B) \times A \to B$;
3. for every $X \in \mathcal{B}$, an adjoint equivalence

$\mathrm{eval}_{A,B} \circ (- \times A) : \mathcal{B}(X, A \Rightarrow B) \rightleftarrows \mathcal{B}(X \times A, B) : \lambda(-)$

specified by a choice of universal arrow $\varepsilon_f : \mathrm{eval}_{A,B} \circ (\lambda f \times A) \Rightarrow f$.

We call the functor $\lambda(-)$ currying and refer to $\lambda f$ as the currying of $f$. Explicitly, the counit $\varepsilon$ satisfies the following universal property. For every 1-cell $g : X \to (A \Rightarrow B)$ and 2-cell $\alpha : \mathrm{eval}_{A,B} \circ (g \times A) \Rightarrow f$ there exists a unique 2-cell $e^\dagger(\alpha) : g \Rightarrow \lambda f$ such that $\varepsilon_f \bullet \big( \mathrm{eval}_{A,B} \circ (e^\dagger(\alpha) \times A) \big) = \alpha$. This defines a lax exponential structure. One obtains a pseudo (bicategorical) exponential structure by further requiring that $\varepsilon$ and the unit $\eta_t := e^\dagger(\mathrm{id}_{\mathrm{eval}_{A,B} \circ (t \times A)})$ are invertible.

Example 5. Every 'presheaf' 2-category $\mathrm{Bicat}(\mathcal{B}^{\mathrm{op}}, \mathrm{Cat})$ has all bicategorical limits [52, Proposition 3.6], given pointwise, and is cartesian closed with $(P \Rightarrow Q)X := \mathrm{Bicat}(\mathcal{B}^{\mathrm{op}}, \mathrm{Cat})(\mathrm{Y}X \times P, Q)$ [55, Chapter 6].

As for products, we adopt the notational conventions that are standard in the categorical setting, for example by writing $(f \Rightarrow g) : (A \Rightarrow B) \to (A' \Rightarrow B')$ for the currying of $(g \circ \mathrm{eval}_{A,B}) \circ (\mathrm{Id}_{A \Rightarrow B} \times f)$.

Just as fp-pseudofunctors preserve products up to equivalence, cartesian closed pseudofunctors preserve products and exponentials up to equivalence.

Definition 7.
A cartesian closed pseudofunctor or cc-pseudofunctor between cc-bicategories $(\mathcal{B}, \Pi_n(-), \Rightarrow)$ and $(\mathcal{C}, \Pi_n(-), \Rightarrow)$ is an fp-pseudofunctor $(F, q^\times)$ equipped with specified equivalences

$m_{A,B} : F(A \Rightarrow B) \rightleftarrows (FA \Rightarrow FB) : q^{\Rightarrow}_{A,B}$

for every $A, B \in \mathcal{B}$, where $m_{A,B} : F(A \Rightarrow B) \to (FA \Rightarrow FB)$ is the currying of $F(\mathrm{eval}_{A,B}) \circ q^\times_{A \Rightarrow B, A}$. A cc-pseudofunctor $(F, q^\times, q^{\Rightarrow})$ is strict if $(F, q^\times)$ is a strict fp-pseudofunctor such that

$F(A \Rightarrow B) = (FA \Rightarrow FB)$, $\quad F(\mathrm{eval}_{A,B}) = \mathrm{eval}_{FA,FB}$, $\quad F(\varepsilon_t) = \varepsilon_{Ft}$, $\quad F(\lambda t) = \lambda(Ft)$, $\quad q^{\Rightarrow}_{A,B} = \mathrm{Id}_{FA \Rightarrow FB}$,

with equivalences given by the 2-cells $e^\dagger(\mathrm{eval}_{FA,FB} \circ \kappa) : \mathrm{Id}_{(FA \Rightarrow FB)} \Rightarrow \lambda(\mathrm{eval}_{FA,FB} \circ \mathrm{Id}_{(FA \Rightarrow FB) \times FA})$, where $\kappa$ is the canonical isomorphism $\mathrm{Id}_{FA \Rightarrow FB} \times FA \cong \mathrm{Id}_{(FA \Rightarrow FB) \times FA}$.

Remark 3. As is well-known in the case of Cat (e.g. [44, IV.2]), every equivalence $X \simeq Y$ in a bicategory gives rise to an adjoint equivalence between $X$ and $Y$ with the same 1-cells (see e.g. [42, pp. 28–29]). Thus, one may assume without loss of generality that all the equivalences in the preceding definition are adjoint equivalences. The same observation applies to the definition of fp-pseudofunctors.

Notation 4. For cc-bicategories $\mathcal{B}$ and $\mathcal{C}$ we write cc-$\mathrm{Bicat}(\mathcal{B}, \mathcal{C})$ for the bicategory of cc-pseudofunctors, pseudonatural transformations and modifications (c.f. Notation 3).

3 Bicategorical glueing

The glueing construction has been discovered in various forms, with correspondingly various names: the notions of logical relation [50,57], sconing [24], Freyd covers, and glueing (e.g. [40]) are all closely related (see e.g. [47] for an overview of the connections). Originally presented set-theoretically, the technique was quickly given categorical expression [43,47] and is now a standard component of the armoury for studying type theories (e.g. [40,12]). The glueing $\mathrm{gl}(F)$ of categories $\mathsf{C}$ and $\mathsf{D}$ along a functor $F : \mathsf{C} \to \mathsf{D}$ may be defined as the comma category $(\mathrm{id}_{\mathsf{D}} \downarrow F)$. We define bicategorical glueing analogously.

Definition 8. 1.
Let $F : \mathcal{A} \to \mathcal{C}$ and $G : \mathcal{B} \to \mathcal{C}$ be pseudofunctors of bicategories. The comma bicategory $(F \downarrow G)$ has objects triples $(A \in \mathcal{A},\ f : FA \to GB,\ B \in \mathcal{B})$. The 1-cells $(A, f, B) \to (A', f', B')$ are triples $(p, \alpha, q)$, where $p : A \to A'$ and $q : B \to B'$ are 1-cells and $\alpha$ is an invertible 2-cell $\alpha : f' \circ Fp \Rightarrow Gq \circ f$. The 2-cells $(p, \alpha, q) \Rightarrow (p', \alpha', q')$ are pairs of 2-cells $(\sigma : p \Rightarrow p', \tau : q \Rightarrow q')$ such that the following square of 2-cells commutes:

$f' \circ F(p) \xRightarrow{f' \circ F(\sigma)} f' \circ F(p')$, $\quad \alpha \Downarrow \quad \Downarrow \alpha'$, $\quad G(q) \circ f \xRightarrow{G(\tau) \circ f} G(q') \circ f \qquad (2)$

Identities and horizontal composition are given by pasting diagrams: the identity on $(A, f, B)$ pastes $F\mathrm{Id}_A$ and $G\mathrm{Id}_B$ against $f$, and the composite of $(p, \alpha, q) : (A, f, B) \to (A', f', B')$ with $(r, \beta, s) : (A', f', B') \to (A'', f'', B'')$ pastes $\alpha$ and $\beta$ along $F(r \circ p)$ and $G(s \circ q)$. Vertical composition, the identity 2-cell, and the structural isomorphisms are given component-wise.

2. The glueing bicategory $\mathrm{gl}(J)$ of bicategories $\mathcal{B}$ and $\mathcal{C}$ along a pseudofunctor $J : \mathcal{B} \to \mathcal{C}$ is the comma bicategory $(\mathrm{id} \downarrow J)$.

We call axiom (2) the cylinder condition due to its shape when viewed as a (3-dimensional) pasting diagram. Note that one directly obtains projection pseudofunctors $\mathcal{B} \xleftarrow{\pi_{\mathrm{dom}}} \mathrm{gl}(J) \xrightarrow{\pi_{\mathrm{cod}}} \mathcal{C}$.

We develop some basic theory of glueing bicategories, which we shall put to use in Section 5. We follow the terminology of [15].

Definition 9. Let $J : \mathcal{B} \to \mathcal{X}$ be a pseudofunctor. The relative hom-pseudofunctor $\overline{J} : \mathcal{X} \to \mathrm{Bicat}(\mathcal{B}^{\mathrm{op}}, \mathrm{Cat})$ is defined by $\overline{J}X := \mathcal{X}(J(-), X)$.

Following [15], one might call the glueing bicategory $\mathrm{gl}(\overline{J})$ associated to a relative hom-pseudofunctor the bicategory of $\mathcal{B}$-intensional Kripke relations of arity $J$, and view it as an intensional, bicategorical, version of the category of Kripke relations. The relative hom-pseudofunctor preserves all bilimits that exist in its domain. For products, this may be described explicitly.

Lemma 4. For any fp-bicategory $(\mathcal{X}, \Pi_n(-))$ and pseudofunctor $J : \mathcal{B} \to \mathcal{X}$, the relative hom-pseudofunctor $\overline{J}$ extends canonically to an fp-pseudofunctor.

Proof. Take $q^\times_{X_\bullet}$ to be the n-ary tupling $\prod_{i=1}^n \mathcal{X}(J(-), X_i) \to \mathcal{X}\big(J(-), \prod_{i=1}^n X_i\big)$.
This forms a pseudonatural transformation with naturality witnessed by post.

For any pseudofunctor $J : \mathcal{B} \to \mathcal{X}$ there exists a pseudonatural transformation $(l, \overline{l}) : \mathrm{Y} \Rightarrow \overline{J} \circ J : \mathcal{B} \to \mathrm{Bicat}(\mathcal{B}^{\mathrm{op}}, \mathrm{Cat})$ given by the functorial action of $J$ on hom-categories. One may therefore define the following.

Definition 10. For any pseudofunctor $J : \mathcal{B} \to \mathcal{X}$, define the extended Yoneda pseudofunctor $\overline{\mathrm{Y}} : \mathcal{B} \to \mathrm{gl}(\overline{J})$ by setting $\overline{\mathrm{Y}}B := \big(\mathrm{Y}B, (l, \overline{l})_{(-,B)}, JB\big)$, $\overline{\mathrm{Y}}f := \big(\mathrm{Y}f, (\phi^J_{-,f})^{-1}, Jf\big)$, and $\overline{\mathrm{Y}}(\tau : f \Rightarrow f' : B \to B') := (\mathrm{Y}\tau, J\tau)$. The cylinder condition holds by the naturality of $\phi^J$, and the 2-cells $\phi^{\overline{\mathrm{Y}}}$ and $\psi^{\overline{\mathrm{Y}}}$ are $(\phi^{\mathrm{Y}}, \phi^J)$ and $(\psi^{\mathrm{Y}}, \psi^J)$, respectively.

The extended Yoneda pseudofunctor satisfies a corresponding 'extended Yoneda lemma' (c.f. [15, pp. 33]).

Lemma 5. For any pseudofunctor $J : \mathcal{B} \to \mathcal{X}$ and $P = (P, (k, \overline{k}), X) \in \mathrm{gl}(\overline{J})$ there exists an equivalence of pseudofunctors $\mathrm{gl}(\overline{J})(\overline{\mathrm{Y}}(-), P) \simeq P$ and an invertible modification as in the diagram below. Hence $\overline{\mathrm{Y}}$ is locally an equivalence.

[diagram: the equivalence $\mathrm{gl}(\overline{J})(\overline{\mathrm{Y}}(-), P) \simeq P$ sits over $\mathcal{X}(J(-), X)$, via a projection on one side and $(k, \overline{k})$ on the other, with an invertible modification filling the triangle.]

288 M. Fiore and P. Saville

Proof. The arrow marked $\simeq$ is the composite of a projection and the equivalence arising from the Yoneda lemma. Its pseudo-inverse is the composite

$P \to \mathrm{Bicat}(\mathcal{B}^{\mathrm{op}}, \mathrm{Cat})(\mathrm{Y}(-), P) \to \mathrm{gl}(\overline{J})(\overline{\mathrm{Y}}(-), P) \qquad (3)$

in which the equivalence arises from the Yoneda lemma and the unlabelled pseudofunctor takes a pseudonatural transformation $(j, \overline{j}) : \mathrm{Y}B \Rightarrow P$ to the triple with first component $(j, \overline{j})$, third component $k_B(j_B(\mathrm{Id}_B)) : JB \to X$ and second component defined using $\overline{k}$ and $\overline{j}$. Chasing the definitions through and evaluating at $A, B \in \mathcal{B}$, one sees that when $P := \overline{\mathrm{Y}}B$ the composite (3) is equivalent to $\overline{\mathrm{Y}}_{A,B}$. Since (3) is locally an equivalence, Lemma 1(1) completes the proof.

4 Cartesian closed structure on the glueing bicategory

It is well-known that, if $\mathsf{C}$ and $\mathsf{D}$ are cartesian closed categories, $\mathsf{D}$ has pullbacks, and $F : \mathsf{C} \to \mathsf{D}$ is cartesian, then $\mathrm{gl}(F)$ is cartesian closed (e.g. [40,12]). In this section we prove a corresponding result for the glueing bicategory.
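As a concrete warm-up for the construction ahead, here is the 1-categorical situation that Definition 8 weakens, checked on toy data. In the 1-categorical comma category $(\mathrm{id} \downarrow F)$ the invertible 2-cell $\alpha$ collapses to an equality: a 1-cell is a pair $(p, q)$ making the glueing square commute on the nose. The functor used below (a "squaring" functor $F(X) = X \times X$) is an arbitrary choice for illustration, not taken from the paper.

```python
# Definition 8 specialised to 1-categories: gl(F) = (id | F) for F : B -> C.
# Finite sets as lists, functions as dicts; F(X) = X x X, F(q) = q x q.

def F_obj(X):
    return [(x, y) for x in X for y in X]

def F_map(q):
    return {(x, y): (q[x], q[y]) for x in q for y in q}

def comp(g, f):
    """Composition of functions-as-dicts: (g o f)(x) = g(f(x))."""
    return {x: g[f[x]] for x in f}

def is_1cell(src, tgt, p, q):
    """(p, q) : (C, f, B) -> (C', f', B') iff the glueing square commutes:
    f' o p = F(q) o f  (the 1-categorical cylinder condition)."""
    C, f, B = src
    C2, f2, B2 = tgt
    return comp(f2, p) == comp(F_map(q), f)

# A small object of gl(F) and its identity 1-cell:
B = [0, 1]
C = ['u']
f = {'u': (0, 1)}                    # f : C -> F(B)
obj = (C, f, B)
assert is_1cell(obj, obj, {'u': 'u'}, {0: 0, 1: 1})

# A non-example: swapping B breaks the square for this particular f.
assert not is_1cell(obj, obj, {'u': 'u'}, {0: 1, 1: 0})
```

Bicategorically, the equality in `is_1cell` is replaced by a chosen invertible 2-cell $\alpha$, and the 2-cells of $\mathrm{gl}(J)$ must respect these witnesses via the cylinder condition (2).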
We shall be guided by the categorical proof, for which see e.g. [43, Proposition 2].

4.1 Finite products in gl(J)

Proposition 1. Let $(\mathcal{B}, \Pi_n(-))$ and $(\mathcal{C}, \Pi_n(-))$ be fp-bicategories and $(J, q^\times) : \mathcal{B} \to \mathcal{C}$ be an fp-pseudofunctor. Then $\mathrm{gl}(J)$ is an fp-bicategory with both projection pseudofunctors $\pi_{\mathrm{dom}}$ and $\pi_{\mathrm{cod}}$ strictly preserving products.

For a family of objects $(C_i, c_i, B_i)_{i=1,\dots,n}$, the n-ary product $\prod_{i=1}^n (C_i, c_i, B_i)$ is defined to be the tuple $\big(\prod_{i=1}^n C_i,\ q^\times_{B_\bullet} \circ \prod_{i=1}^n c_i,\ \prod_{i=1}^n B_i\big)$. The kth projection $\pi_k$ is $(\pi_k, \mu_k, \pi_k)$, where $\mu_k$ is defined by commutativity of a diagram connecting $c_k \circ \pi_k$ to $J(\pi_k) \circ \big(q^\times_{B_\bullet} \circ \prod_i c_i\big)$:

[diagram: $\mu_k$ is the composite of $\varpi^{(k)} \circ \prod_i c_i$, $(\pi_k \circ u^\times_{B_\bullet}) \circ \prod_i c_i$, $(\varpi^{(k)} \circ q^\times_{B_\bullet}) \circ \prod_i c_i$ and structural isomorphisms.]

For an n-ary family of 1-cells $(g_i, \alpha_i, f_i) : (Y, y, X) \to (C_i, c_i, B_i)$ ($i = 1, \dots, n$), the n-ary tupling is $\big(\langle g_1, \dots, g_n \rangle, \{\alpha_1, \dots, \alpha_n\}, \langle f_1, \dots, f_n \rangle\big)$, where $\{\alpha_1, \dots, \alpha_n\}$ is a composite

$\big(q^\times_{B_\bullet} \circ \prod_i c_i\big) \circ \langle g_1, \dots, g_n \rangle \Rightarrow J\langle f_1, \dots, f_n \rangle \circ y$

[diagram: built from $q^\times_{B_\bullet} \circ \mathrm{fuse}(c_\bullet; g_\bullet)$, $q^\times_{B_\bullet} \circ \langle \alpha_1, \dots, \alpha_n \rangle$, $q^\times_{B_\bullet} \circ (\mathrm{unpack}^{-1}_{f_\bullet} \circ y)$, $q^\times_{B_\bullet} \circ \mathrm{post}^{-1}$, the 2-cells witnessing the equivalence $q^\times_{B_\bullet}$, and structural isomorphisms.]

Finally, for every family of 1-cells $(g_i, \alpha_i, f_i) : (Y, y, X) \to (C_i, c_i, B_i)$ ($i = 1, \dots, n$) we require a glued 2-cell $\pi_k \circ \big(\langle g_1, \dots, g_n \rangle, \{\alpha_1, \dots, \alpha_n\}, \langle f_1, \dots, f_n \rangle\big) \Rightarrow (g_k, \alpha_k, f_k)$ to act as the counit. We take simply $\big(\varpi^{(k)}_{g_\bullet}, \varpi^{(k)}_{f_\bullet}\big)$. This pair forms a 2-cell in $\mathrm{gl}(J)$, and the required universal property holds pointwise.

Remark 4. If $(J, q^\times) : \mathcal{B} \to \mathcal{X}$ is an fp-pseudofunctor, then $\overline{\mathrm{Y}} : \mathcal{B} \to \mathrm{gl}(\overline{J})$ canonically extends to an fp-pseudofunctor.
The pseudoinverse to $\langle \overline{\mathrm{Y}}\pi_1, \dots, \overline{\mathrm{Y}}\pi_n \rangle$ is $\big(\langle -, \dots, = \rangle, \cong, q^\times_{B_\bullet}\big)$, where the component of the isomorphism at $(f_i : X \to B_i)_{i=1,\dots,n}$ is the composite

$F\langle f_\bullet \rangle \Rightarrow \mathrm{Id}_{F(\Pi_i B_i)} \circ F\langle f_\bullet \rangle \xRightarrow{(c^\times_{B_\bullet})^{-1} \circ F\langle f_\bullet \rangle} \big(q^\times_{B_\bullet} \circ \langle F\pi_1, \dots, F\pi_n \rangle\big) \circ F\langle f_\bullet \rangle \xRightarrow{q^\times_{B_\bullet} \circ \mathrm{unpack}_{f_\bullet}} q^\times_{B_\bullet} \circ \langle Ff_1, \dots, Ff_n \rangle.$

4.2 Exponentials in gl(J)

As in the 1-categorical case, the definition of currying in $\mathrm{gl}(J)$ employs pullbacks. A pullback of a cospan $(X_1 \to X_0 \leftarrow X_2)$ in a bicategory $\mathcal{B}$ is a bilimit for the strict pseudofunctor $X : (1 \to 0 \leftarrow 2) \to \mathcal{B}$ determined by the cospan. We state the universal property in the form that will be most useful for our applications.

Lemma 6. The pullback of a cospan $(X_1 \xrightarrow{f_1} X_0 \xleftarrow{f_2} X_2)$ in a bicategory $\mathcal{B}$ is determined, up to equivalence, by the following data and properties: a span $(X_1 \xleftarrow{\gamma_1} P \xrightarrow{\gamma_2} X_2)$ in $\mathcal{B}$ and an invertible 2-cell filling the square $f_1 \circ \gamma_1 \cong f_2 \circ \gamma_2$, such that
1. for any other span $(X_1 \xleftarrow{\mu_1} Q \xrightarrow{\mu_2} X_2)$ with an invertible 2-cell $f_1 \circ \mu_1 \cong f_2 \circ \mu_2$, there exists a fill-in $(u, \Xi_1, \Xi_2)$, namely a 1-cell $u : Q \to P$ and invertible 2-cells $\Xi_i : \gamma_i \circ u \Rightarrow \mu_i$ ($i = 1, 2$) such that pasting the filling 2-cell for $P$ with $f_1 \circ \Xi_1$ and $f_2 \circ \Xi_2$ recovers the filling 2-cell for $Q$;
2. for any 1-cells $v, w : Q \to P$ and 2-cells $\Psi_i : \gamma_i \circ v \Rightarrow \gamma_i \circ w$ ($i = 1, 2$) compatible with the filling 2-cell (pasting $f_1 \circ \Psi_1$ against the square for $v$ agrees with pasting $f_2 \circ \Psi_2$ against the square for $w$), there exists a unique 2-cell $\Psi : v \Rightarrow w$ such that $\Psi_i = \gamma_i \circ \Psi$ ($i = 1, 2$).

Example 6. 1. In Cat, the pullback of a cospan $(\mathsf{B} \xrightarrow{F} \mathsf{X} \xleftarrow{G} \mathsf{C})$ is the full subcategory of the comma category $(F \downarrow G)$ consisting of objects of the form $(B, f, C)$ for which $f : FB \to GC$ is an isomorphism. Note that this differs from the strict (2-)categorical pullback in Cat, in which every $f$ is required to be an identity (c.f. [65, Example 2.1]). 2.
Like any bilimit, pullbacks in the bicategory $\mathrm{Bicat}(\mathcal{B}^{\mathrm{op}}, \mathrm{Cat})$ are computed pointwise (see [53, Proposition 3.6]).

We now define exponentials in the glueing bicategory. Precisely, we extend Proposition 1 to the following.

Theorem 5. Let $(\mathcal{B}, \Pi_n(-), \Rightarrow)$ and $(\mathcal{C}, \Pi_n(-), \Rightarrow)$ be cc-bicategories such that $\mathcal{C}$ has pullbacks. For any fp-pseudofunctor $(J, q^\times) : (\mathcal{B}, \Pi_n(-)) \to (\mathcal{C}, \Pi_n(-))$, the glueing bicategory $\mathrm{gl}(J)$ has a cartesian closed structure with forgetful pseudofunctor $\pi_{\mathrm{dom}} : \mathrm{gl}(J) \to \mathcal{B}$ strictly preserving products and exponentials.

The evaluation map. We begin by defining the mapping $(-) \Rightarrow (=)$ and the evaluation 1-cell eval. For $C := (C, c, B)$, $C' := (C', c', B') \in \mathrm{gl}(J)$ we set $C \Rightarrow C'$ to be the left-hand vertical leg of the following pullback diagram, in which we write $m_{B,B'} := \lambda\big(J(\mathrm{eval}_{B,B'}) \circ q^\times_{B \Rightarrow B', B}\big)$:

[pullback (4): the object $(C \supset C')_{c,c'}$, with legs $p_{c,c'} : (C \supset C')_{c,c'} \to J(B \Rightarrow B')$ and $q_{c,c'} : (C \supset C')_{c,c'} \to (C \Rightarrow C')$, over the cospan $\lambda\big(\mathrm{eval}_{JB,JB'} \circ ((JB \Rightarrow JB') \times c)\big) \circ m_{B,B'} : J(B \Rightarrow B') \to (C \Rightarrow JB')$ and $\lambda(c' \circ \mathrm{eval}_{C,C'}) : (C \Rightarrow C') \to (C \Rightarrow JB')$.]

Example 7. The pullback (4) generalises the well-known definition of a logical relation of varying arity [36]. Indeed, where $J := \overline{K}$ is the relative hom-pseudofunctor for an fp-pseudofunctor $(K, q^\times) : \mathcal{B} \to \mathcal{X}$ between cc-bicategories, $A \in \mathcal{B}$ and $X, X' \in \mathcal{X}$, the functor $m_{X,X'}(A)$ takes a 1-cell $f : KA \to (X \Rightarrow X')$ in $\mathcal{X}$ to the pseudonatural transformation $\mathrm{Y}A \times \mathcal{X}(K(-), X) \Rightarrow \mathcal{X}(K(-), X')$ with components $\lambda B \,.\, \lambda(\rho : B \to A,\ u : KB \to X) \,.\, \mathrm{eval}_{X,X'} \circ \langle f \circ K(\rho), u \rangle$. Intuitively, therefore, the pullback enforces the usual closure condition defining a logical relation at exponential type, while also tracking the isomorphism witnessing that this condition holds (c.f. [36,3,15]).

Notation 6. For reasons of space (particularly in pasting diagrams) we will sometimes write $\widetilde{c} := \mathrm{eval}_{JB,JB'} \circ ((JB \Rightarrow JB') \times c) : (JB \Rightarrow JB') \times C \to JB'$ when $c : C \to JB$ in $\mathcal{C}$.
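In the 1-categorical, set-level reading of Example 7, the pullback (4) computes $J(B \Rightarrow B') \times_{(C \Rightarrow JB')} (C \Rightarrow C')$, and its closure condition is the familiar exponential clause of a logical relation: a function is related at $A \Rightarrow B$ precisely when it sends related arguments to related results. A minimal sketch of that clause (our own toy data, not from the paper):

```python
# The set-level exponential clause of a logical relation, which the
# pullback (4) refines: R_{A => B}(f) iff for all a with R_A(a), R_B(f(a)).

def rel_exp(R_A, R_B):
    """Lift unary predicates R_A, R_B to the function type A => B."""
    return lambda f: all(R_B(f(a)) for a in dom if R_A(a))

dom = range(-5, 6)
R_A = lambda a: a >= 0           # related arguments: non-negatives
R_B = lambda b: b >= 0           # related results: non-negatives

assert rel_exp(R_A, R_B)(lambda a: a + 1)      # preserves non-negativity
assert not rel_exp(R_A, R_B)(lambda a: a - 1)  # 0 |-> -1 escapes R_B
```

The bicategorical pullback does not merely impose this condition as a property: it records, as part of the data, the invertible 2-cell witnessing that the condition holds.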
The evaluation map $\mathrm{eval}_{C,C'}$ is defined to be $\big(\mathrm{eval}_{C,C'} \circ (q_{c,c'} \times C),\ E_{C,C'},\ \mathrm{eval}_{B,B'}\big)$, where the witnessing 2-cell $E_{C,C'}$ is given by a pasting diagram in which the unlabelled arrow is $q^\times_{(B \Rightarrow B', B)} \circ (p_{c,c'} \times c)$:

[pasting diagram for $E_{C,C'}$: it pastes $\varepsilon$, the 2-cell filling the pullback square (4), $m_{B,B'}$ and $J(\mathrm{eval}_{B,B'})$ around the composites through $(C \Rightarrow C') \times C$, $(C \Rightarrow JB') \times C$, $(JB \Rightarrow JB') \times JB$ and $J((B \Rightarrow B') \times B)$. The bottom $\cong$ denotes a composite of $\Phi$, structural isomorphisms and $\Phi^{-1}$, and the top $\cong$ denotes a composite of $\omega_{c,c'} \times C$ (writing $\omega_{c,c'}$ for the 2-cell filling (4)) with instances of $\Phi$, $\Phi^{-1}$, and the structural isomorphisms.]

The currying operation. Let $R := (R, r, Q)$, $C := (C, c, B)$ and $C' := (C', c', B')$ and suppose given a 1-cell $(t, \alpha, s) : R \times C \to C'$. We construct $\lambda(t, \alpha, s)$ using the universal property (4) of the pullback. To this end, we define invertible composites $U_\alpha$ and $T_\alpha$ as in the following two diagrams and set

$L_\alpha := \eta^{-1} \bullet e^\dagger\big(U_\alpha^{-1} \bullet \alpha \bullet T_\alpha\big) : \lambda(c' \circ \mathrm{eval}_{C,C'}) \circ \lambda t \Rightarrow (\lambda\widetilde{c} \circ m_{B,B'}) \circ (J(\lambda s) \circ r).$

[diagram defining $U_\alpha$: the invertible composite $\mathrm{eval}_{C,JB'} \circ \big(((\lambda\widetilde{c} \circ m_{B,B'}) \circ (J(\lambda s) \circ r)) \times C\big) \Rightarrow Js \circ q^\times_{Q,B} \circ (r \times c)$, built by successively applying $\varepsilon_{\widetilde{c}} \circ \big((m_{B,B'} \circ (J(\lambda s) \circ r)) \times C\big)$, then $\varepsilon \circ \big((J(\lambda s) \times JB) \circ (r \times c)\big)$ to reach $J(\mathrm{eval}_{B,B'}) \circ q^\times_{(B \Rightarrow B', B)} \circ \big((J(\lambda s) \times J\mathrm{Id}_B) \circ (r \times c)\big)$, then the unlabelled arrow to reach $J(\mathrm{eval}_{B,B'} \circ (\lambda s \times B)) \circ q^\times_{Q,B} \circ (r \times c)$, and finally $J\varepsilon_s \circ \big(q^\times_{Q,B} \circ (r \times c)\big)$.]

The unlabelled arrow is the canonical composite of $\mathrm{nat}_{\lambda s, \mathrm{id}}$ with $\phi_{\mathrm{eval}, \lambda(s) \times B}$ and structural isomorphisms.
$T_\alpha$ is then defined as the invertible composite

$\mathrm{eval}_{C,JB'} \circ \big((\lambda(c' \circ \mathrm{eval}_{C,C'}) \circ \lambda t) \times C\big) \cong \big(\mathrm{eval}_{C,JB'} \circ (\lambda(c' \circ \mathrm{eval}_{C,C'}) \times C)\big) \circ (\lambda(t) \times C) \xRightarrow{\varepsilon_{(c' \circ \mathrm{eval})} \circ (\lambda(t) \times C)} (c' \circ \mathrm{eval}_{C,C'}) \circ (\lambda(t) \times C) \cong c' \circ \big(\mathrm{eval}_{C,C'} \circ (\lambda(t) \times C)\big) \xRightarrow{c' \circ \varepsilon_t} c' \circ t.$

Applying the universal property of the pullback (4) to $L_\alpha$, one obtains a 1-cell $\mathrm{lam}(t)$ and a pair of invertible 2-cells $\Gamma_{c,c'}$ and $\Delta_{c,c'}$ filling the diagram

[diagram: $\mathrm{lam}(t) : R \to (C \supset C')_{c,c'}$, with $\Delta_{c,c'}$ relating $q_{c,c'} \circ \mathrm{lam}(t)$ to $\lambda(t) : R \to (C \Rightarrow C')$ and $\Gamma_{c,c'}$ relating $p_{c,c'} \circ \mathrm{lam}(t)$ to $J(\lambda s) \circ r : R \to J(B \Rightarrow B')$, over the cospan of $\lambda(c' \circ \mathrm{eval}_{C,C'})$ and $\lambda\widetilde{c} \circ m_{B,B'}$ into $(C \Rightarrow JB')$.]

We define $\lambda(t, \alpha, s) := \big(\mathrm{lam}(t), \Gamma_{c,c'}, \lambda s\big)$.

The counit 2-cell. Finally we come to the counit. For a 1-cell $t := (t, \alpha, s) : (R, r, Q) \times (C, c, B) \to (C', c', B')$ the 1-cell $\mathrm{eval} \circ \big(\lambda(t, \alpha, s) \times (C, c, B)\big)$ unwinds to a pasting diagram in which the unlabelled arrow is $q^\times_{Q,B} \circ (r \times c)$ [diagram: it pastes $\Gamma_{c,c'} \times c$, $J(\lambda s) \times \psi$, $\mathrm{nat}$ and $J(\mathrm{eval}_{B,B'} \circ (\lambda s \times B))$ around $(C \supset C')_{c,c'} \times C$, $J(B \Rightarrow B') \times JB$, $JQ \times JB$ and $J(Q \times B)$]. For the counit $\varepsilon_t$ we take the 2-cell with first component $\mathsf{e}_t$ defined by

$\big(\mathrm{eval}_{C,C'} \circ (q_{c,c'} \times C)\big) \circ (\mathrm{lam}(t) \times C) \cong \mathrm{eval}_{C,C'} \circ \big((q_{c,c'} \circ \mathrm{lam}(t)) \times C\big) \xRightarrow{\mathrm{eval}_{C,C'} \circ (\Delta_{c,c'} \times C)} \mathrm{eval}_{C,C'} \circ (\lambda(t) \times C) \xRightarrow{\varepsilon_t} t$

and second component simply $\varepsilon_s : \mathrm{eval}_{B,B'} \circ (\lambda(s) \times B) \Rightarrow s$. This pair forms an invertible 2-cell in $\mathrm{gl}(J)$. One checks this satisfies the required universal property in a manner analogous to the 1-categorical case (see [55] for the full details). This completes the proof of Theorem 5.

5 Relative full completeness

We apply the theory developed in the preceding two sections to prove the relative full completeness result. As outlined in the introduction, this corresponds to a proof that the higher-order equational theory of rewriting in the STLC is conservative over the algebraic equational theory of rewriting in the STPC.
We adapt 'Lafont's argument' [39, Annexe C] from the form presented in [16], for which we require bicategorical versions of the free cartesian category $\mathcal{F}^{\times}[\mathbb{C}]$ and free cartesian closed category $\mathcal{F}^{\times,\to}[\mathbb{C}]$ over a category $\mathbb{C}$. In line with the strategy for the STLC (c.f. [12, pp. 173–4]), we deal with the contravariance of the pseudofunctor $(- \Rightarrow =)$ by restricting to a bicategory of cc-pseudofunctors, pseudonatural equivalences (that is, pseudonatural transformations for which each component is a given equivalence), and invertible modifications. We denote this with the subscript $\simeq, \cong$.

Lemma 7. For any bicategory $\mathcal{B}$, fp-bicategory $(\mathcal{C}, \Pi_n(-))$ and cc-bicategory $(\mathcal{D}, \Pi_n(-), \Rightarrow)$:
1. There exists an fp-bicategory $\mathcal{F}^{\times}[\mathcal{B}]$ and a pseudofunctor $\eta^{\times} : \mathcal{B} \to \mathcal{F}^{\times}[\mathcal{B}]$ such that composition with $\eta^{\times}$ induces a biequivalence $\text{fp-}\mathrm{Bicat}(\mathcal{F}^{\times}[\mathcal{B}], \mathcal{C}) \to \mathrm{Bicat}(\mathcal{B}, \mathcal{C})$.
2. There exists a cc-bicategory $\mathcal{F}^{\times,\to}[\mathcal{B}]$ and a pseudofunctor $\eta^{\Rightarrow} : \mathcal{B} \to \mathcal{F}^{\times,\to}[\mathcal{B}]$ such that composition with $\eta^{\Rightarrow}$ induces a biequivalence $\text{cc-}\mathrm{Bicat}_{\simeq,\cong}(\mathcal{F}^{\times,\to}[\mathcal{B}], \mathcal{D}) \to \mathrm{Bicat}(\mathcal{B}, \mathcal{D})$.

Proof (sketch). A syntactic construction suffices: one defines formal products and exponentials and then quotients by the axioms (see [48, p. 79] or [55]).

Thus, for any bicategory $\mathcal{B}$, fp-bicategory $(\mathcal{C}, \Pi_n(-))$, and pseudofunctor $F : \mathcal{B} \to \mathcal{C}$ there exists an fp-pseudofunctor $F^{\#} : \mathcal{F}^{\times}[\mathcal{B}] \to \mathcal{C}$ and an equivalence $F^{\#} \circ \eta^{\times} \simeq F$. Moreover, for any fp-pseudofunctor $G : \mathcal{F}^{\times}[\mathcal{B}] \to \mathcal{C}$ such that $G \circ \eta^{\times} \simeq F$ one has $G \simeq F^{\#}$. A corresponding result holds for cc-bicategories and cc-pseudofunctors.

Theorem 7. For any bicategory $\mathcal{B}$ the universal fp-pseudofunctor $\iota : \mathcal{F}^{\times}[\mathcal{B}] \to \mathcal{F}^{\times,\to}[\mathcal{B}]$ extending $\eta^{\Rightarrow}$ is locally an equivalence. Hence $\eta^{\Rightarrow} : \mathcal{B} \to \mathcal{F}^{\times,\to}[\mathcal{B}]$ is locally an equivalence.

Proof. Since $\iota$ preserves finite products, the bicategory $\mathrm{gl}(\overline{\iota})$ is cartesian closed (Theorem 5). The composite $\mathsf{K} := \overline{\mathrm{Y}} \circ \eta^{\times} : \mathcal{B} \to \mathrm{gl}(\overline{\iota})$ therefore induces a cc-pseudofunctor $\mathsf{K}^{\#} : \mathcal{F}^{\times,\to}[\mathcal{B}] \to \mathrm{gl}(\overline{\iota})$.

First observe that $(\mathsf{K}^{\#} \circ \iota) \circ \eta^{\times} \simeq \mathsf{K}^{\#} \circ \eta^{\Rightarrow} \simeq \mathsf{K} = \overline{\mathrm{Y}} \circ \eta^{\times}$.
Since $\overline{\mathrm{Y}}$ is canonically an fp-pseudofunctor (Remark 4), it follows that $\mathsf{K}^{\#} \circ \iota \simeq \overline{\mathrm{Y}}$. Since $\overline{\mathrm{Y}}$ is locally an equivalence (Lemma 5), Lemma 1(1) entails that $\mathsf{K}^{\#} \circ \iota$ is locally an equivalence.

Next, examining the definition of $\overline{\mathrm{Y}}$ one sees that $\pi_{\mathrm{dom}} \circ \overline{\mathrm{Y}} = \iota$, and so

$(\pi_{\mathrm{dom}} \circ \mathsf{K}^{\#}) \circ \eta^{\Rightarrow} \simeq (\pi_{\mathrm{dom}} \circ \overline{\mathrm{Y}}) \circ \eta^{\times} \simeq \iota \circ \eta^{\times} \simeq \eta^{\Rightarrow}.$

It follows that $\pi_{\mathrm{dom}} \circ \mathsf{K}^{\#} \simeq \mathrm{id}_{\mathcal{F}^{\times,\to}[\mathcal{B}]}$, and hence that $\pi_{\mathrm{dom}} \circ \mathsf{K}^{\#}$ is also locally an equivalence.

Now consider the composite $\mathcal{F}^{\times}[\mathcal{B}] \xrightarrow{\iota} \mathcal{F}^{\times,\to}[\mathcal{B}] \xrightarrow{\mathsf{K}^{\#}} \mathrm{gl}(\overline{\iota}) \xrightarrow{\pi_{\mathrm{dom}}} \mathcal{F}^{\times,\to}[\mathcal{B}]$. By Lemma 1(2) and the preceding, $\iota$ is locally an equivalence. Finally, it is direct from the construction of $\mathcal{F}^{\times}[\mathcal{B}]$ that $\eta^{\times}$ is locally an equivalence; thus, so are $\iota \circ \eta^{\times} \simeq \eta^{\Rightarrow}$.

Acknowledgements. We thank all the anonymous reviewers for their comments: these improved the paper substantially. We are especially grateful to the reviewer who pointed out an oversight in the original formulation of Lemma 1(2), which consequently affected the argument in Theorem 7, and provided the elegant fix therein. The second author was supported by a Royal Society University Research Fellow Enhancement Award.

References

1. Abbott, M.G.: Categories of containers. Ph.D. thesis, University of Leicester (2003)
2. Abramsky, S., Jagadeesan, R.: Games and full completeness for multiplicative linear logic. Journal of Symbolic Logic 59(2), 543–574 (1994). https://doi.org/10.2307/2275407
3. Alimohamed, M.: A characterization of lambda definability in categorical models of implicit polymorphism. Theoretical Computer Science 146(1-2), 5–23 (1995). https://doi.org/10.1016/0304-3975(94)00283-O
4. Balat, V., Di Cosmo, R., Fiore, M.: Extensional normalisation and type-directed partial evaluation for typed lambda calculus with sums. In: Proceedings of the 31st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. pp. 64–76 (2004)
5. Bénabou, J.: Introduction to bicategories. In: Reports of the Midwest Category Seminar.
pp. 1–77. Springer Berlin Heidelberg, Berlin, Heidelberg (1967)
6. Bloom, S.L., Ésik, Z., Labella, A., Manes, E.G.: Iteration 2-theories. Applied Categorical Structures 9(2), 173–216 (2001). https://doi.org/10.1023/a:1008708924144
7. Borceux, F.: Bicategories and distributors, Encyclopedia of Mathematics and its Applications, vol. 1, pp. 281–324. Cambridge University Press (1994). https://doi.org/10.1017/CBO9780511525858.009
8. Carboni, A., Kelly, G.M., Walters, R.F.C., Wood, R.J.: Cartesian bicategories II. Theory and Applications of Categories 19(6), 93–124 (2008), http://www.tac.mta.ca/tac/volumes/19/6/19-06abs.html
9. Carboni, A., Lack, S., Walters, R.F.C.: Introduction to extensive and distributive categories. Journal of Pure and Applied Algebra 84(2), 145–158 (1993). https://doi.org/10.1016/0022-4049(93)90035-r
10. Carboni, A., Walters, R.F.C.: Cartesian bicategories I. Journal of Pure and Applied Algebra 49(1), 11–32 (1987). https://doi.org/10.1016/0022-4049(87)90121-6
11. Castellan, S., Clairambault, P., Rideau, S., Winskel, G.: Games and strategies as event structures. Logical Methods in Computer Science 13 (2017)
12. Crole, R.L.: Categories for Types. Cambridge University Press (1994). https://doi.org/10.1017/CBO9781139172707
13. Dagand, P.E., McBride, C.: A categorical treatment of ornaments. In: Proceedings of the 28th Annual ACM/IEEE Symposium on Logic in Computer Science. pp. 530–539. IEEE Computer Society, Washington, DC, USA (2013). https://doi.org/10.1109/LICS.2013.60
14. Fiore, M.: Axiomatic Domain Theory in Categories of Partial Maps. Distinguished Dissertations in Computer Science, Cambridge University Press (1996)
15. Fiore, M.: Semantic analysis of normalisation by evaluation for typed lambda calculus. In: Proceedings of the 4th ACM SIGPLAN International Conference on Principles and Practice of Declarative Programming. pp. 26–37. ACM, New York, NY, USA (2002). https://doi.org/10.1145/571157.571161
16.
Fiore, M., Di Cosmo, R., Balat, V.: Remarks on isomorphisms in typed lambda calculi with empty and sum types. In: Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science. pp. 147–156. IEEE Computer Society Press (2002). https://doi.org/10.1109/LICS.2002.1029824
17. Fiore, M., Gambino, N., Hyland, M., Winskel, G.: The cartesian closed bicategory of generalised species of structures. Journal of the London Mathematical Society 77(1), 203–220 (2007). https://doi.org/10.1112/jlms/jdm096
18. Fiore, M., Gambino, N., Hyland, M., Winskel, G.: Relative pseudomonads, Kleisli bicategories, and substitution monoidal structures. Selecta Mathematica New Series (2017)
19. Fiore, M., Joyal, A.: Theory of para-toposes. Talk at the Category Theory 2015 Conference. Departamento de Matemática, Universidade de Aveiro (Portugal)
20. Fiore, M., Saville, P.: A type theory for cartesian closed bicategories. In: Proceedings of the 34th Annual ACM/IEEE Symposium on Logic in Computer Science (2019). https://doi.org/10.1109/LICS.2019.8785708
21. Fiore, M., Saville, P.: Coherence and normalisation-by-evaluation for bicategorical cartesian closed structure. Preprint (2020)
22. Fiore, M., Simpson, A.: Lambda definability with sums via Grothendieck logical relations. In: Girard, J.Y. (ed.) Typed lambda calculi and applications: 4th international conference. pp. 147–161. Springer Berlin Heidelberg, Berlin, Heidelberg (1999)
23. Freyd, P.: Algebraically complete categories. In: Lecture Notes in Mathematics, pp. 95–104. Springer Berlin Heidelberg (1991). https://doi.org/10.1007/bfb0084215
24. Freyd, P.J., Scedrov, A.: Categories, Allegories. Elsevier North Holland (1990)
25. Gambino, N., Joyal, A.: On operads, bimodules and analytic functors. Memoirs of the American Mathematical Society 249(1184), 153–192 (2017)
26. Gambino, N., Kock, J.: Polynomial functors and polynomial monads.
Mathemati- cal Proceedings of the Cambridge Philosophical Society 154(1), 153–192 (2013). https://doi.org/10.1017/S0305004112000394 27. Ghani, N.: Adjoint rewriting. Ph.D. thesis, University of Edinburgh (1995) 28. Gibbons, J.: Conditionals in distributive categories. Tech. rep., University of Oxford (1997) 29. G.L. Cattani, Fiore, M., Winskel, G.: A theory of recursive domains with applications to concurrency. In: Proceedings of the 13th Annual IEEE Symposium on Logic in Computer Science. pp. 214–225. IEEE Computer Society (1998) 30. Gurski, N.: An Algebraic Theory of Tricategories. University of Chicago, Department of Mathematics (2006) 31. Hasegawa, M.: Logical predicates for intuitionistic linear type theories. In: Girard, J.Y. (ed.) Typed lambda calculi and applications: 4th international conference. pp. 198–213. Springer Berlin Heidelberg, Berlin, Heidelberg (1999) 32. Hilken, B.: Towards a proof theory of rewriting: the simply typed 2λ-calculus. The- oretical Computer Science 170(1), 407–444 (1996). https://doi.org/10.1016/S0304- 3975(96)80713-4 33. Hirschowitz, T.: Cartesian closed 2-categories and permutation equivalence in higher-order rewriting. Logical Methods in Computer Science 9, 1–22 (2013) 34. Jay, C.B., Ghani, N.: The virtues of eta-expansion. Journal of Functional Program- ming 5(2), 135–154 (1995). https://doi.org/10.1017/S0956796800001301 35. Johann, P., Polonsky, P.: Higher-kinded data types: Syntax and semantics. In: 34th Annual ACM/IEEE Symposium on Logic in Computer Science. IEEE (2019). https://doi.org/10.1109/lics.2019.8785657 36. Jung, A., Tiuryn, J.: A new characterization of lambda deﬁnability. In: Bezem, M., Groote, J.F. (eds.) Typed Lambda Calculi and Applications. pp. 245–257. Springer Berlin Heidelberg, Berlin, Heidelberg (1993) 37. Lack, S.: A 2-Categories Companion, pp. 105–191. Springer New York, New York, NY (2010) 38. Lack, S., Walters, R.F.C., Wood, R.J.: Bicategories of spans as cartesian bicategories. 
Theory and Applications of Categories 24(1), 1–24 (2010) Relative full completeness for bicategorical cartesian closed structure 297 39. Lafont, Y.: Logiques, cat´egories et machines. Ph.D. thesis, Universit´e Paris VII (1987) 40. Lambek, J., Scott, P.J.: Introduction to Higher Order Categorical Logic. Cambridge University Press, New York, NY, USA (1986) 41. Leinster, T.: Basic bicategories (May 1998), https://arxiv.org/pdf/math/9810017. pdf 42. Leinster, T.: Higher operads, higher categories. No. 298 in London Mathematical Society Lecture Note Series, Cambridge University Press (2004) 43. Ma, Q.M., Reynolds, J.C.: Types, abstraction, and parametric polymorphism, part 2. In: Brookes, S., Main, M., Melton, A., Mislove, M., Schmidt, D. (eds.) Mathematical Foundations of Programming Semantics. pp. 1–40. Springer Berlin Heidelberg, Berlin, Heidelberg (1992) 44. Mac Lane, S.: Categories for the Working Mathematician, Graduate Texts in Mathematics, vol. 5. Springer-Verlag New York, second edn. (1998). https://doi.org/10.1007/978-1-4757-4721-8 45. Mac Lane, S., Par´e, R.: Coherence for bicategories and indexed categories. Journal of Pure and Applied Algebra 37, 59–80 (1985). https://doi.org/10.1016/0022- 4049(85)90087-8 46. Marmolejo, F., Wood, R.J.: Kan extensions and lax idempotent pseudomonads. Theory and Applications of Categories 26(1), 1–29 (2012) 47. Mitchell, J.C., Scedrov, A.: Notes on sconing and relators. In: B¨ orger, E., J., G., Kleine Buning, ¨ H., Martini, S., Richter, M.M. (eds.) Computer Science Logic. pp. 352–378. Springer Berlin Heidelberg, Berlin, Heidelberg (1993) 48. Ouaknine, J.: A two-dimensional extension of Lambek’s categorical proof theory. Master’s thesis, McGill University (1997) 49. Paquet, H.: Probabilistic concurrent game semantics. Ph.D. thesis, University of Cambridge (2020) 50. Plotkin, G.D.: Lambda-deﬁnability and logical relations. Tech. rep., University of Edinburgh School of Artiﬁcial Intelligence (1973), memorandum SAI-RM-4 51. 
Power, A.J.: An abstract formulation for rewrite systems. In: Pitt, D.H., Rydeheard, D.E., Dybjer, P., Pitts, A.M., Poign´e, A. (eds.) Category Theory and Computer Science. pp. 300–312. Springer Berlin Heidelberg, Berlin, Heidelberg (1989) 52. Power, A.J.: Coherence for bicategories with ﬁnite bilimits I. In: Gray, J.W., Scedrov, A. (eds.) Categories in Computer Science and Logic: Proceedings of the AMS-IMS-SIAM Joint Summer Research Conference, vol. 92, pp. 341–349. AMS (1989) 53. Power, A.J.: A general coherence result. Journal of Pure and Applied Algebra 57(2), 165–173 (1989). https://doi.org/https://doi.org/10.1016/0022-4049(89)90113-8 54. Rydeheard, D.E., Stell, J.G.: Foundations of equational deduction: A categorical treatment of equational proofs and uniﬁcation algorithms. In: Pitt, D.H., Poign´e, A., Rydeheard, D.E. (eds.) Category Theory and Computer Science. pp. 114–139. Springer Berlin Heidelberg, Berlin, Heidelberg (1987) 55. Saville, P.: Cartesian closed bicategories: type theory and coherence. Ph.D. thesis, University of Cambridge (Submitted) 56. Seely, R.A.G.: Modelling computations: A 2-categorical framework. In: Gries, D. (ed.) Proceedings of the 2nd Annual IEEE Symposium on Logic in Computer Science. pp. 65–71. IEEE Computer Society Press (June 1987) 57. Statman, R.: Logical relations and the typed λ-calculus. Information and Control 65, 85–97 (1985) 58. Stell, J.: Modelling term rewriting systems by sesqui-categories. In: Proc. Cat´ egories, Alg`ebres, Esquisses et N´eo-Esquisses (1994) 298 M. Fiore and P. Saville 59. Street, R.: Fibrations in bicategories. Cahiers de Topologie et G´eom´etrie Diﬀ´erentielle Cat´egoriques 21(2), 111–160 (1980), https://eudml.org/doc/91227 60. Street, R.: Categorical structures. In: Hazewinkel, M. (ed.) Handbook of Algebra, vol. 1, chap. 15, pp. 529–577. Elsevier (1995) 61. Tabareau, N.: Aspect oriented programming: A language for 2-categories. 
In: Proceedings of the 10th International Workshop on Foundations of Aspect-oriented Languages. pp. 13–17. ACM, New York, NY, USA (2011). https://doi.org/10.1145/1960510.1960514 62. Taylor, P.: Practical Foundations of Mathematics, Cambridge Studies in Advanced Mathematics, vol. 59. Cambridge University Press (1999) 63. Troelstra, A.S., Schwichtenberg, H.: Basic proof theory. No. 43 in Cambridge Tracts in Theoretical Computer Science, Cambridge University Press, second edn. (2000) 64. Verity, D.: Enriched categories, internal categories and change of base. Ph.D. thesis, University of Cambridge (1992), TAC reprint available at http://www.tac.mta.ca/ tac/reprints/articles/20/tr20abs.html 65. Weber, M.: Yoneda structures from 2-toposes. Applied Categorical Structures 15(3), 259–323 (2007). https://doi.org/10.1007/s10485-007-9079-2 Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. 
A duality theoretic view on limits of finite structures

Mai Gehrke¹, Tomáš Jakl¹, and Luca Reggio²

¹ CNRS and Université Côte d'Azur, Nice, France {mgehrke,tomas.jakl}@unice.fr
² Institute of Computer Science of the Czech Academy of Sciences, Prague, Czech Republic and Mathematical Institute, University of Bern, Switzerland luca.reggio@math.unibe.ch

Abstract. A systematic theory of structural limits for finite models has been developed by Nešetřil and Ossona de Mendez. It is based on the insight that the collection of finite structures can be embedded, via a map they call the Stone pairing, in a space of measures, where the desired limits can be computed. We show that a closely related but finer grained space of measures arises — via Stone-Priestley duality and the notion of types from model theory — by enriching the expressive power of first-order logic with certain "probabilistic operators". We provide a sound and complete calculus for this extended logic and expose the functorial nature of this construction. The consequences are two-fold. On the one hand, we identify the logical gist of the theory of structural limits. On the other hand, our construction shows that the duality-theoretic variant of the Stone pairing captures the adding of a layer of quantifiers, thus making a strong link to recent work on semiring quantifiers in logic on words. In the process, we identify the model theoretic notion of types as the unifying concept behind this link. These results contribute to bridging the strands of logic in computer science which focus on semantics and on more algorithmic and complexity related areas, respectively.
Keywords: Stone duality · finitely additive measures · structural limits · finite model theory · formal languages · logic on words

This project has been supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No. 670624). Luca Reggio has received individual support under the grants GA17-04630S of the Czech Science Foundation and No. 184693 of the Swiss National Science Foundation.

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 299–318, 2020. https://doi.org/10.1007/978-3-030-45231-5_16

1 Introduction

While topology plays an important role, via Stone duality, in many parts of semantics, topological methods in more algorithmic and complexity oriented areas of theoretical computer science are not so common. One of the few examples, the one we want to consider here, is the study of limits of finite relational structures. We will focus on the structural limits introduced by Nešetřil and Ossona de Mendez [15,17]. These provide a common generalisation of various notions of limits of finite structures studied in probability theory, random graphs, structural graph theory, and finite model theory.

The basic construction in this work is the so-called Stone pairing. Given a relational signature σ and a first-order formula ϕ in the signature σ with free variables v_1, ..., v_n, define

  ⟨ϕ, A⟩ = |{ā ∈ A^n | A ⊨ ϕ(ā)}| / |A|^n    (1)

(the probability that a random assignment in A satisfies ϕ). Nešetřil and Ossona de Mendez view the map A ↦ ⟨-, A⟩ as an embedding of the finite σ-structures into the space of probability measures over the Stone space dual to the Lindenbaum-Tarski algebra of all first-order formulas in the signature σ. This space is complete and thus provides the desired limit objects for all sequences of finite structures which embed as Cauchy sequences.
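Equation (1) is a finite computation; the following sketch (the structure, the relation, and the helper name are ours, chosen for illustration) evaluates the Stone pairing on a toy structure:

```python
from itertools import product
from fractions import Fraction

def stone_pairing(phi, universe, n):
    """<phi, A>: the fraction of the |A|^n assignments to the free
    variables v1, ..., vn that satisfy phi in the structure A."""
    assignments = list(product(universe, repeat=n))
    satisfied = sum(1 for a in assignments if phi(*a))
    return Fraction(satisfied, len(assignments))

# Toy structure A: universe {0,1,2,3} with binary relation E(x,y) iff |x - y| = 1.
A = [0, 1, 2, 3]
edge = lambda x, y: abs(x - y) == 1

# phi(v1, v2) := E(v1, v2): 6 of the 4^2 = 16 assignments satisfy it.
print(stone_pairing(edge, A, 2))   # 3/8
```

Note that the value lands in the finite chain {0, 1/16, ..., 1}; this observation is what drives the construction of the space Γ in Section 3.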
Another example of topological methods in an algorithmically oriented area of computer science is the use of profinite monoids in automata theory. In this setting, profinite monoids are the subject of an extensive theory, based on theorems by Eilenberg and Reiterman, and used, among others, to settle decidability questions [18]. In [4], it was shown that this theory may be understood as an application of Stone duality, thus making a bridge between semantics and more algorithmically oriented work. Bridging this semantics-versus-algorithmics gap in theoretical computer science has since gained quite some momentum, notably with the recent strand of research by Abramsky, Dawar and co-workers [2,3]. In this spirit, a natural question is whether the structural limits of Nešetřil and Ossona de Mendez can also be understood semantically, and in particular whether the topological component may be seen as an application of Stone duality.

More precisely, recent work on understanding quantifiers in the setting of logic on finite words [5] has shown that adding a layer of certain quantifiers (such as classical and modular quantifiers) corresponds dually to measure space constructions. The measures involved are not classical but only finitely additive, and they take values in finite semirings rather than in the unit interval. Nevertheless, this appearance of measures as duals of quantifiers begs the further question whether the measure spaces in the theory of structural limits may be obtained via Stone duality from a semantic addition of certain quantifiers to classical first-order logic. The purpose of this paper is to address this question.
Our main result is that the Stone pairing of Nešetřil and Ossona de Mendez is related by a retraction to a Stone space of measures, which is dual to the Lindenbaum-Tarski algebra of a logic fragment obtained from first-order logic by adding one layer of probabilistic quantifiers, and which arises in exactly the same way as the spaces of semiring-valued measures in logic on words. That is, the Stone pairing, although originating from other considerations, may be seen as arising by duality from a semantic construction.

A foreseeable hurdle is that spaces of classical measures are valued in the unit interval [0, 1], which is not zero-dimensional and hence outside the scope of Stone duality. This is well-known to cause problems e.g. in attempts to combine non-determinism and probability in domain theory [12]. However, in the structural limits of Nešetřil and Ossona de Mendez, at the base, one only needs to talk about finite models equipped with normal distributions and thus only the finite intervals I_n = {0, 1/n, 2/n, ..., 1} are involved. A careful duality-theoretic analysis identifies a codirected diagram (i.e. an inverse limit system) based on these intervals compatible with the Stone pairing. The resulting inverse limit, which we denote Γ, is a Priestley space. It comes equipped with an algebra-like structure, which allows us to reformulate many aspects of the theory of structural limits in terms of Γ-valued measures as opposed to [0, 1]-valued measures. The analysis justifying the structure of Γ is based on duality theory for double quasi-operator algebras [7,8]. In the presentation, we have tried to compromise between giving interesting topo-relational insights into why Γ is as it is, and not overburdening the reader with technical details.
Some interesting features of Γ, dictated by the nature of the Stone pairing and the ensuing codirected diagram, are that

• Γ is based on a version of [0, 1] in which the rationals are doubled;
• Γ comes with section-retraction maps ι: [0, 1] → Γ and γ: Γ → [0, 1];
• the map ι is lower semicontinuous while the map γ is continuous.

These features are a consequence of general theory and precisely allow us to witness continuous phenomena relative to [0, 1] in the setting of Γ.

Our contribution. We show that the ambient measure space for the structural limits of Nešetřil and Ossona de Mendez can be obtained via "adding a layer of quantifiers" in a suitable enrichment of first-order logic. The conceptual framework for seeing this is that of types from classical model theory. More precisely, we will see that a variant of the Stone pairing is a map into a space of measures with values in a Priestley space Γ. Further, we show that this map is in fact the embedding of the finite structures into the space of (0-)types of an extension of first-order logic, which we axiomatise. On the other hand, Γ-valued measures and [0, 1]-valued measures are tightly related by a retraction-section pair which allows the transfer of properties. These results identify the logical gist of the theory of structural limits and provide a new interesting connection between logic on words and the theory of structural limits in finite model theory.

Outline of the paper. In Section 2 we briefly recall Stone-Priestley duality, its application in logic via spaces of types, and the particular instance of logic on words (needed only to show the similarity of the constructions). In Section 3 we introduce the Priestley space Γ with its additional operations, and show that it admits [0, 1] as a retract. The spaces of Γ-valued measures are introduced in Section 4, and the retraction of Γ onto [0, 1] is lifted to the appropriate spaces of measures.
In Section 5 we introduce the Γ-valued Stone pairing and make the link with logic on words. Further, we compare convergence in the space of Γ-valued measures with the one considered by Nešetřil and Ossona de Mendez. Finally, in Section 6 we show that constructing the space of Γ-valued measures dually corresponds to enriching the logic with probabilistic operators.

2 Preliminaries

Notation. Throughout this paper, if f: X → Y and g: Y → Z are functions, their composition is denoted g · f. For a subset S ⊆ X, f|_S: S → Y is the obvious restriction. Given any set T, P(T) denotes its power-set. Further, for a poset P, P^∂ is the poset obtained by turning the order of P upside down.

2.1 Stone-Priestley duality

In this paper, we will need Stone duality for bounded distributive lattices in the order topological form due to Priestley [19]. It is a powerful and well established tool in the study of propositional logic and semantics of programming languages, see e.g. [9,1] for major landmarks. We briefly recall how this duality works.

A compact ordered space is a pair (X, ≤) where X is a compact space and ≤ is a partial order on X which is closed in the product topology of X × X. (Note that such a space is automatically Hausdorff.) A compact ordered space is a Priestley space provided it is totally order-disconnected. That is, for all x, y ∈ X such that x ≰ y, there is a clopen (i.e. simultaneously closed and open) C ⊆ X which is an up-set for ≤, and satisfies x ∈ C but y ∉ C.

We recall the construction of the Priestley space of a distributive lattice D. A non-empty proper subset F ⊂ D is a prime filter if it is (i) upward closed (in the natural order of D), (ii) closed under finite meets, and (iii) if a ∨ b ∈ F, either a ∈ F or b ∈ F. Denote by X_D the set of all prime filters of D. By Stone's Prime Filter Theorem, the map

  D → P(X_D), a ↦ â = {F ∈ X_D | a ∈ F}

is an embedding.
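The prime filters of a small distributive lattice can be enumerated straight from conditions (i)-(iii); a sketch (the divisor lattice of 12, with meet = gcd and join = lcm, is our choice of example):

```python
from itertools import combinations
from math import gcd

D = [1, 2, 3, 4, 6, 12]            # divisors of 12: a distributive lattice
meet = gcd
join = lambda a, b: a * b // gcd(a, b)
leq = lambda a, b: b % a == 0      # the divisibility order

def is_prime_filter(F):
    F = set(F)
    if not F or F == set(D):
        return False                # non-empty and proper
    # (i) upward closed, (ii) closed under finite meets
    if any(leq(a, b) and b not in F for a in F for b in D):
        return False
    if any(meet(a, b) not in F for a in F for b in F):
        return False
    # (iii) prime: a ∨ b ∈ F implies a ∈ F or b ∈ F
    return all(join(a, b) not in F or a in F or b in F
               for a in D for b in D)

filters = [S for r in range(1, len(D)) for S in combinations(D, r)
           if is_prime_filter(S)]
print(filters)   # [(4, 12), (3, 6, 12), (2, 4, 6, 12)]
```

The three prime filters are exactly the principal up-sets of the join-irreducible elements 2, 3 and 4, as Birkhoff's representation of finite distributive lattices predicts.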
Priestley's insight was that D can be recovered from X_D if the latter is equipped with the inclusion order and the topology generated by the sets of the form â and their complements. This makes X_D into a Priestley space — the dual space of D — and the map a ↦ â is an isomorphism between D and the lattice of clopen up-sets of X_D. Conversely, any Priestley space X is the dual space of the lattice of its clopen up-sets. We call the latter the dual lattice of X. This correspondence extends to morphisms. In fact, Priestley duality states that the category of distributive lattices with homomorphisms is dually equivalent to the category of Priestley spaces and continuous monotone maps. We assume all distributive lattices are bounded, with the bottom and top denoted by 0 and 1, respectively. The bounds need to be preserved by homomorphisms.

When restricting to Boolean algebras, we recover the celebrated Stone duality between Boolean algebras and Boolean spaces, i.e. compact Hausdorff spaces in which the clopen subsets form a basis.

2.2 Stone duality and logic: type spaces

The theory of types is an important tool for first-order logic. We briefly recall the concept, as it is closely related to, and provides the link between, two otherwise unrelated occurrences of topological methods in theoretical computer science.

Consider a signature σ and a first-order theory T in this signature. For each n ∈ N, let Fm_n denote the set of first-order formulas whose free variables are among v̄ = {v_1, ..., v_n}, and let Mod_n(T) denote the class of all pairs (A, α) where A is a model of T and α is an interpretation of v̄ in A. Then the satisfaction relation, (A, α) ⊨ ϕ, is a binary relation from Mod_n(T) to Fm_n. It induces the equivalence relations of elementary equivalence ≡ and logical equivalence ≈ on these sets, respectively.
The quotient FO_n(T) = Fm_n/≈ carries a natural Boolean algebra structure and is known as the n-th Lindenbaum-Tarski algebra of T. Its dual space is Typ_n(T), the space of n-types of T, whose points can be identified with elements of Mod_n(T)/≡. The Boolean algebra FO(T) of all first-order formulas modulo logical equivalence over T is the directed colimit of the FO_n(T) for n ∈ N, while its dual space, Typ(T), is the codirected limit of the Typ_n(T) for n ∈ N and consists of models equipped with interpretations of the full set of variables.

If we want to study finite models, there are two equivalent approaches: e.g. at the level of sentences, we can either consider the theory T_fin of finite T-models, or the closure of the collection of all finite T-models in the space Typ(T). This closure yields a space, which should tell us about finite T-structures. Indeed, it is equal to Typ(T_fin), the space of pseudofinite T-structures. For an application of this, see [10]. Below, we will see an application in finite model theory of the case T = ∅ (in this case we write FO(σ) and Typ(σ) instead of FO(∅) and Typ(∅)).

In light of the theory of types as exposed above, the Stone pairing of Nešetřil and Ossona de Mendez (see equation (1)) can be regarded as an embedding of finite structures into the space of probability measures on Typ(σ), which set-theoretically are finitely additive functions FO(σ) → [0, 1].

2.3 Duality and logic on words

As mentioned in the introduction, spaces of measures arise via duality in logic on words [5]. Logic on words, as introduced by Büchi, see e.g. [14] for a recent survey, is a variation and specialisation of finite model theory where only models based on words are considered. I.e., a word w ∈ A* is seen as a relational structure on {1, ..., |w|}, where |w| is the length of w, equipped with a unary relation P_a, for each a ∈ A, singling out the positions in the word where the letter a appears.
Each sentence ϕ in a language interpretable over these structures yields a language L_ϕ ⊆ A* consisting of the words satisfying ϕ. Thus, logic fragments are considered modulo the theory of finite words, and the Lindenbaum-Tarski algebras are subalgebras of P(A*) consisting of the appropriate L_ϕ's, cf. [10] for a treatment of first-order logic on words.

For lack of logical completeness, the duals of the Lindenbaum-Tarski algebras have more points than those given by models. Nevertheless, the dual spaces of types, which act as compactifications and completions of the collections of models, provide a powerful tool for studying logic fragments by topological means. The central notion is that of recognition, in which a Boolean subalgebra B ⊆ P(A*) is studied by means of the dual map η: β(A*) → X_B. Here β(A*) is the Stone dual of P(A*), also known in topology as the Čech-Stone compactification of the discrete space A*, and X_B is the Stone dual of B. The set A* embeds in β(A*), and η is uniquely determined by its restriction η_0: A* → X_B. Now, Stone duality implies that L ⊆ A* is in B iff there is a clopen subset V ⊆ X_B so that η_0^{-1}(V) = L. Anytime the latter is true for a map η_0 and a language L as above, one says that η_0 recognises L.

When studying logic fragments via recognition, the following inductive step is central: given a notion of quantifier and a recogniser for a Boolean algebra of formulas with a free variable, construct a recogniser for the Boolean algebra generated by the formulas obtained by applying the quantifier. This problem was solved in [5], using duality theory, in a general setting of semiring quantifiers. The latter are defined as follows: let (S, +, ·, 0_S, 1_S) be a semiring, and k ∈ S. Given a formula ψ(v), the formula ∃_{S,k} v.ψ(v) is true of a word w ∈ A* iff k = 1_S + ··· + 1_S, m times, where m is the number of assignments of the variable v in w satisfying ψ(v).
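The clause defining ∃_{S,k} can be run directly on words; a sketch (helper names and example words are ours), instantiating S with Z/2Z to get a modular quantifier and with the two-element lattice to recover ∃:

```python
def semiring_exists(word, psi, zero, one, add, k):
    """∃_{S,k} v. psi(v) holds of `word` iff 1_S + ... + 1_S (m summands)
    equals k, where m is the number of positions v satisfying psi."""
    m = sum(1 for v in range(len(word)) if psi(word, v))
    acc = zero
    for _ in range(m):
        acc = add(acc, one)
    return acc == k

psi_a = lambda w, v: w[v] == 'a'   # psi(v) := P_a(v), "letter a at position v"

# S = Z/2Z, k = 1: "the number of a's is odd" (a modular quantifier).
odd_a = lambda w: semiring_exists(w, psi_a, 0, 1, lambda x, y: (x + y) % 2, 1)

# S = the two-element lattice, k = 1: the classical existential quantifier.
some_a = lambda w: semiring_exists(w, psi_a, 0, 1, lambda x, y: max(x, y), 1)

print(odd_a("banana"), some_a("banana"), some_a("xyz"))   # True True False
```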
If S = Z/qZ, we obtain the so-called modular quantifiers, and for S the two-element lattice we recover the existential quantifier ∃. To deal with formulas with a free variable, one considers maps of the form f: β((A × 2)*) → X (the extra bit in A × 2 is used to mark the interpretation of the free variable). In [5] (see also [6]), it was shown that L_{ψ(v)} is recognised by f iff for every k ∈ S the language L_{∃_{S,k} v.ψ(v)} is recognised by the composite

  ξ = S(f) · R: A* → S(β((A × 2)*)) → S(X),    (2)

where S(X) is the space of finitely additive S-valued measures on X, and R maps w ∈ A* to the measure μ_w: P((A × 2)*) → S sending K ⊆ (A × 2)* to the sum 1_S + ··· + 1_S, n_{w,K} times. Here, n_{w,K} is the number of interpretations α of the free variable v in w such that the pair (w, α), seen as an element of (A × 2)*, belongs to K. Finally, S(f) sends a measure to its pushforward along f. (Here, being beyond the scope of this paper, we are ignoring the important role of the monoid structure available on the spaces, in the form of profinite monoids or BiMs, cf. [10,5].)

3 The space Γ

Central to our results is a Priestley space Γ closely related to [0, 1], in which our measures will take values. Its construction comes from the insight that the range of the Stone pairing ⟨-, A⟩, for a finite structure A and formulas restricted to a fixed number of free variables, can be confined to a chain I_n = {0, 1/n, 2/n, ..., 1}. Moreover, the floor functions f_{mn,n}: I_mn → I_n are monotone surjections. The ensuing system {f_{mn,n}: I_mn → I_n | m, n ∈ N} can thus be seen as a codirected diagram of finite discrete posets and monotone maps. Let us define Γ to be the limit of this diagram. Then, Γ is naturally equipped with a structure of Priestley space, see e.g. [11, Corollary VI.3.3], and can be represented as based on the set

  {r⁻ | r ∈ (0, 1]} ∪ {q° | q ∈ Q ∩ [0, 1]}.
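The chains I_n and the floor surjections between them are easy to compute with; a sketch (function names are ours) checking that the maps are monotone surjections and compose compatibly, as a codirected diagram must:

```python
import math
from fractions import Fraction

def I(n):
    """The chain I_n = {0, 1/n, 2/n, ..., 1}."""
    return [Fraction(k, n) for k in range(n + 1)]

def f(x, n):
    """The floor map f_{mn,n}: I_{mn} -> I_n, x |-> floor(n*x)/n."""
    return Fraction(math.floor(n * x), n)

# f_{6,3}: I_6 -> I_3 is a monotone surjection:
image = [f(x, 3) for x in I(6)]
assert image == sorted(image) and set(image) == set(I(3))

# Compatibility: going I_12 -> I_6 -> I_3 agrees with going I_12 -> I_3 directly.
assert all(f(f(x, 6), 3) == f(x, 3) for x in I(12))
```

A point of Γ is then a choice of one element in each I_n, compatible with all these floor maps; the doubled rationals q⁻, q° record whether a value is approached strictly from below or attained.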
The order of Γ is the unique total order which has 0° as bottom element, satisfies r^∗ < s^⋆ whenever r < s (for ∗, ⋆ ∈ {−, ◦}), and such that q° is a cover of q⁻ for every rational q ∈ (0, 1] (i.e. q⁻ < q°, and there is no element strictly in between). In a sense, the values q⁻ represent approximations of the values of the form q°. Cf. Figure 1. The topology of Γ is generated by the sets of the form ↑p° = {x ∈ Γ | p° ≤ x} and ↓q⁻ = {x ∈ Γ | x ≤ q⁻} for p, q ∈ Q ∩ [0, 1] such that q ≠ 0. The distributive lattice dual to Γ, denoted by L, is given by L = {⊥} ∪ (Q ∩ [0, 1])^∂, with ⊥ < q and q ≤_L p for every p ≤ q in Q ∩ [0, 1].

Fig. 1. The Priestley space Γ and its dual lattice L

3.1 The algebraic structure on Γ

When defining measures we need an algebraic structure available on the space of values. The space Γ fulfils this requirement as it comes equipped with a partial operation −: dom(−) → Γ, where dom(−) = {(x, y) ∈ Γ × Γ | y ≤ x} and

  r° − s° = (r − s)°    r° − s⁻ = (r − s)°    if r − s ∈ Q
  r⁻ − s° = (r − s)⁻    r⁻ − s⁻ = (r − s)⁻    otherwise.

In fact, this (partial) operation is dual to the truncated addition on the lattice L. However, explaining this would require us to delve into extended Priestley duality for lattices with operations, which is beyond the scope of this paper. See [9] and also [7,8] for details. It also follows from the general theory that there exists another partial operation definable from −, namely:

  ∼: dom(−) → Γ,  x ∼ y = ⋁{x − q° | y < q° ≤ x}.

Next, we collect some basic properties of − and ∼, needed in Section 4, which follow from the general theory of [7,8]. First, recall that a map into an ordered topological space is lower (resp. upper) semicontinuous provided the preimage of any open down-set (resp. open up-set) is open.

Lemma 1. If dom(−) is seen as a subspace of Γ × Γ^∂, the following hold:
1. dom(−) is a closed up-set in Γ × Γ^∂;
2.
both −: dom(−) → Γ and ∼: dom(−) → Γ are monotone in the first coordinate, and antitone in the second;
3. −: dom(−) → Γ is lower semicontinuous;
4. ∼: dom(−) → Γ is upper semicontinuous.

3.2 The retraction Γ ⇄ [0, 1]

In this section we show that, with respect to appropriate topologies, the unit interval [0, 1] can be obtained as a topological retract of Γ, in a way which is compatible with the operation −. This will be important in Sections 4 and 5, where we need to move between [0, 1]-valued and Γ-valued measures. Let us define the monotone surjection given by collapsing the doubled elements:

  γ: Γ → [0, 1],  r⁻, r° ↦ r.    (3)

The map γ has a right adjoint, given by

  ι: [0, 1] → Γ,  r ↦ r° if r ∈ Q, and r ↦ r⁻ otherwise.    (4)

Indeed, it is readily seen that γ(y) ≤ x iff y ≤ ι(x), for all y ∈ Γ and x ∈ [0, 1]. The composition γ · ι coincides with the identity on [0, 1], i.e. ι is a section of γ. Moreover, this retraction lifts to a topological retract provided we equip Γ and [0, 1] with the topologies consisting of the open down-sets:

Lemma 2. The map γ: Γ → [0, 1] is continuous and the map ι: [0, 1] → Γ is lower semicontinuous.

Proof. To check continuity of γ observe that, for a rational q ∈ (0, 1), γ⁻¹(q, 1] and γ⁻¹[0, q) coincide, respectively, with the open sets

  ⋃{↑p° | p ∈ Q ∩ [0, 1] and q < p}  and  ⋃{↓p⁻ | p ∈ Q ∩ (0, 1] and p < q}.

Also, ι is lower semicontinuous, for ι⁻¹(↓q⁻) = [0, q) whenever q ∈ Q ∩ (0, 1].

It is easy to see that both γ and ι preserve the minus structure available on Γ and [0, 1] (the unit interval is equipped with the usual minus operation x − y defined whenever y ≤ x), that is,

• γ(x − y) = γ(x ∼ y) = γ(x) − γ(y) whenever y ≤ x in Γ, and
• ι(x − y) = ι(x) − ι(y) whenever y ≤ x in [0, 1].

Remark. ι: [0, 1] → Γ is not upper semicontinuous because, for every q ∈ Q ∩ [0, 1], ι⁻¹(↑q°) = {x ∈ [0, 1] | q° ≤ ι(x)} = {x ∈ [0, 1] | γ(q°) ≤ x} = [q, 1].
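Restricted to rational points, the order on Γ, the maps γ and ι, and the adjunction γ(y) ≤ x iff y ≤ ι(x) can be verified mechanically; a sketch (the pair encoding of Γ-elements is ours; irrational points, which ι sends to r⁻, are left out):

```python
from fractions import Fraction

# Encode elements of Γ as pairs (r, tag): tag 'o' for r°, '-' for r⁻.
def leq(x, y):
    """The total order on Γ: compare values, with q⁻ sitting just below q°."""
    (r, s), (t, u) = x, y
    return r < t or (r == t and (s == u or (s == '-' and u == 'o')))

gamma = lambda x: x[0]        # γ: Γ -> [0,1], collapse r⁻ and r° to r
iota = lambda r: (r, 'o')     # ι: [0,1] -> Γ restricted to rationals: r -> r°

qs = [Fraction(k, 4) for k in range(5)]
pts = [(q, t) for q in qs for t in ('-', 'o') if not (q == 0 and t == '-')]

assert all(gamma(iota(q)) == q for q in qs)          # γ · ι = id on [0, 1]
# The adjunction: γ(y) <= x  iff  y <= ι(x).
assert all((gamma(y) <= x) == leq(y, iota(x)) for y in pts for x in qs)
# q° covers q⁻: q⁻ < q° but not conversely.
assert leq((Fraction(1, 2), '-'), (Fraction(1, 2), 'o'))
assert not leq((Fraction(1, 2), 'o'), (Fraction(1, 2), '-'))
```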
4 Spaces of measures valued in Γ and in [0, 1]

The aim of this section is to replace [0, 1]-valued measures by Γ-valued measures. The reason for doing this is two-fold. First, the space of Γ-valued measures is Priestley (Proposition 4), and thus amenable to a duality theoretic treatment and a dual logic interpretation (cf. Section 6). Second, it retains more topological information than the space of [0, 1]-valued measures. Indeed, the former retracts onto the latter (Theorem 10).

Let D be a distributive lattice. Recall that, classically, a monotone function m: D → [0, 1] is a (finitely additive, probability) measure provided m(0) = 0, m(1) = 1, and m(a) + m(b) = m(a ∨ b) + m(a ∧ b) for every a, b ∈ D. The latter property is equivalently expressed as

  ∀a, b ∈ D,  m(a) − m(a ∧ b) = m(a ∨ b) − m(b).    (5)

We write M_I(D) for the set of all measures D → [0, 1], and regard it as an ordered topological space, with the structure induced by the product order and product topology of [0, 1]^D. The notion of (finitely additive, probability) Γ-valued measure is analogous to the classical one, except that the finite additivity property (5) splits into two conditions, involving − and ∼.

Definition 3. Let D be a distributive lattice. A Γ-valued measure (or simply a measure) on D is a function μ: D → Γ such that
1. μ(0) = 0° and μ(1) = 1°,
2. μ is monotone, and
3. for all a, b ∈ D, μ(a) ∼ μ(a ∧ b) ≤ μ(a ∨ b) − μ(b) and μ(a) − μ(a ∧ b) ≥ μ(a ∨ b) ∼ μ(b).

We denote by M_Γ(D) the subspace of Γ^D consisting of the measures μ: D → Γ. Since Γ is a Priestley space, so is Γ^D equipped with the product order and topology. Hence, we regard M_Γ(D) as an ordered topological space, whose topology and order are induced by those of Γ^D. In fact M_Γ(D) is a Priestley space:

Proposition 4. For any distributive lattice D, M_Γ(D) is a Priestley space.

Proof. It suffices to show that M_Γ(D) is a closed subspace of Γ^D.
Let

  C_{1,2} = {f ∈ Γ^D | f(0) = 0°} ∩ {f ∈ Γ^D | f(1) = 1°} ∩ ⋂_{a≤b} {f ∈ Γ^D | f(a) ≤ f(b)}.

Note that the evaluation maps ev_a: Γ^D → Γ, f ↦ f(a), are continuous for every a ∈ D. Thus, the first set in the intersection defining C_{1,2} is closed because it is the equaliser of the evaluation map ev_0 and the constant map of value 0°. Similarly for the set {f ∈ Γ^D | f(1) = 1°}. The last one is the intersection of the sets of the form ⟨ev_a, ev_b⟩⁻¹(≤), which are closed because ≤ is closed in Γ × Γ. Whence, C_{1,2} is a closed subset of Γ^D. Moreover,

  M_Γ(D) = ⋂_{a,b∈D} {f ∈ C_{1,2} | f(a) ∼ f(a ∧ b) ≤ f(a ∨ b) − f(b)}
         ∩ ⋂_{a,b∈D} {f ∈ C_{1,2} | f(a) − f(a ∧ b) ≥ f(a ∨ b) ∼ f(b)}.

From semicontinuity of − and ∼ (Lemma 1) and the following well-known fact in order-topology we conclude that M_Γ(D) is closed in Γ^D.

Fact. Let X, Y be compact ordered spaces, f: X → Y a lower semicontinuous function and g: X → Y an upper semicontinuous function. If X' is a closed subset of X, then so is E = {x ∈ X' | g(x) ≤ f(x)}.

Next, we prove a property which is very useful when approximating a fragment of a logic by smaller fragments (see, e.g., Section 5.1). Let us denote by DLat the category of distributive lattices and homomorphisms, and by Pries the category of Priestley spaces and continuous monotone maps.

Proposition 5. The assignment D ↦ M_Γ(D) yields a contravariant functor M_Γ: DLat → Pries which sends directed colimits to codirected limits.

Proof. If h: D → E is a lattice homomorphism and μ: E → Γ is a measure, it is not difficult to see that M_Γ(h)(μ) = μ · h: D → Γ is a measure. The mapping M_Γ(h): M_Γ(E) → M_Γ(D) is clearly monotone. For continuity, recall that the topology of M_Γ(D) is generated by the sets ⟨a < q°⟩ = {ν: D → Γ | ν(a) < q°} and ⟨a ≥ q°⟩ = {ν: D → Γ | ν(a) ≥ q°}, with a ∈ D and q ∈ Q ∩ [0, 1]. We have

  M_Γ(h)⁻¹⟨a < q°⟩ = {μ: E → Γ | μ(h(a)) < q°} = ⟨h(a) < q°⟩,

which is open in M_Γ(E). Similarly, M_Γ(h)⁻¹⟨a ≥ q°⟩ = ⟨h(a) ≥ q°⟩, showing that M_Γ(h) is continuous.
Thus, M_Γ is a contravariant functor. The rest of the proof is a routine verification.

Remark 6. We work with the contravariant functor M_Γ: DLat → Pries because M_Γ is concretely defined on the lattice side. However, by Priestley duality, DLat is dually equivalent to Pries, so we can think of M_Γ as a covariant functor Pries → Pries (this is the perspective traditionally adopted in analysis, and also in the works of Nešetřil and Ossona de Mendez). From this viewpoint, Section 6 provides a description of the endofunctor on DLat dual to M_Γ: Pries → Pries.

Recall the maps γ: Γ → [0, 1] and ι: [0, 1] → Γ from equations (3)–(4). In Section 3.2 we showed that this is a retraction-section pair. In Theorem 10 this retraction is lifted to the spaces of measures. We start with an easy observation:

Lemma 7. Let D be a distributive lattice. The following statements hold:
1. for every μ ∈ M_Γ(D), γ · μ ∈ M_I(D),
2. for every m ∈ M_I(D), ι · m ∈ M_Γ(D).

Proof. 1. The only non-trivial condition to verify is finite additivity. In view of the discussion after Lemma 2, the map γ preserves both minus operations on Γ. Hence, for every a, b ∈ D, the inequalities μ(a) ∼ μ(a ∧ b) ≤ μ(a ∨ b) − μ(b) and μ(a) − μ(a ∧ b) ≥ μ(a ∨ b) ∼ μ(b) imply that γ·μ(a) − γ·μ(a ∧ b) = γ·μ(a ∨ b) − γ·μ(b).

2. The first two conditions in Definition 3 are immediate. The third condition follows from the fact that ι(r − s) = ι(r) − ι(s) whenever s ≤ r in [0, 1], and x ∼ y ≤ x − y for every (x, y) ∈ dom(−).

In view of the previous lemma, there are well-defined functions

  γ#: M_Γ(D) → M_I(D), μ ↦ γ · μ   and   ι#: M_I(D) → M_Γ(D), m ↦ ι · m.

Lemma 8. γ#: M_Γ(D) → M_I(D) is a continuous and monotone map.

Proof. The topology of M_I(D) is generated by the sets of the form {m ∈ M_I(D) | m(a) ∈ O}, for a ∈ D and O an open subset of [0, 1]. In turn,

  (γ#)^{−1}({m ∈ M_I(D) | m(a) ∈ O}) = {μ ∈ M_Γ(D) | μ(a) ∈ γ^{−1}(O)}

is open in M_Γ(D) because γ: Γ → [0, 1] is continuous by Lemma 2.
This shows that γ#: M_Γ(D) → M_I(D) is continuous. Monotonicity is immediate.

Note that γ#: M_Γ(D) → M_I(D) is surjective, since it admits ι# as a (set-theoretic) section. It follows that M_I(D) is a compact ordered space:

Corollary 9. For each distributive lattice D, M_I(D) is a compact ordered space.

Proof. The surjection γ#: M_Γ(D) → M_I(D) is continuous (Lemma 8). Since M_Γ(D) is compact by Proposition 4, so is M_I(D). The order of M_I(D) is clearly closed in the product topology, thus M_I(D) is a compact ordered space.

Finally, we see that the set-theoretic retraction of M_Γ(D) onto M_I(D) lifts to the topological setting, provided we restrict to the down-set topologies. If (X, ≤) is a partially ordered topological space, write X↓ for the space with the same underlying set as X and whose topology consists of the open down-sets of X.

Theorem 10. The maps γ#: M_Γ(D)↓ → M_I(D)↓ and ι#: M_I(D)↓ → M_Γ(D)↓ are a retraction-section pair of topological spaces.

Proof. It suffices to show that γ# and ι# are continuous. It is not difficult to see, using Lemma 8, that γ#: M_Γ(D)↓ → M_I(D)↓ is continuous. For the continuity of ι#, note that the topology of M_Γ(D)↓ is generated by the sets of the form {μ ∈ M_Γ(D) | μ(a) ≤ q⁻}, for a ∈ D and q ∈ Q ∩ (0, 1]. We have

  (ι#)^{−1}({μ ∈ M_Γ(D) | μ(a) ≤ q⁻}) = {m ∈ M_I(D) | m(a) ∈ ι^{−1}(↓q⁻)}
                                       = {m ∈ M_I(D) | m(a) < q},

which is an open set in M_I(D)↓. This concludes the proof.

5 The Γ-valued Stone pairing and limits of finite structures

In the work of Nešetřil and Ossona de Mendez, the Stone pairing ⟨-, A⟩ is [0, 1]-valued, i.e. an element of M_I(FO(σ)). In this section, we show that basically the same construction for the recognisers arising from the application of a layer of semiring quantifiers in logic on words (cf. Section 2.3) provides an embedding of finite σ-structures into the space of Γ-valued measures. It turns out that this embedding is a Γ-valued version of the Stone pairing.
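Two classical ingredients used above admit small executable prototypes: the finite additivity law in the form (5), and the functorial action M(h)(μ) = μ · h of Proposition 5, which for power-set lattices and preimage homomorphisms is the familiar pushforward of measures. The sketch below is our own [0, 1]-valued illustration on finite power sets; it does not implement the Γ arithmetic.

```python
from itertools import chain, combinations

def powerset(xs):
    # All subsets of xs: the elements of the distributive lattice P(xs).
    xs = sorted(xs)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

X = {0, 1, 2, 3}
m = lambda a: len(a) / len(X)   # normalised counting measure on P(X)

# Finite additivity in the form (5): m(a) - m(a ∧ b) = m(a ∨ b) - m(b).
for a in powerset(X):
    for b in powerset(X):
        assert abs((m(a) - m(a & b)) - (m(a | b) - m(b))) < 1e-9

# Functorial action: a map f: X -> Y induces the lattice homomorphism
# h = f^{-1}: P(Y) -> P(X), and m . h is again a measure on P(Y)
# (the pushforward of m along f).
Y = {0, 1}
f = lambda x: x % 2
h = lambda b: frozenset(x for x in X if f(x) in b)
n = lambda b: m(h(b))
for a in powerset(Y):
    for b in powerset(Y):
        assert abs((n(a) - n(a & b)) - (n(a | b) - n(b))) < 1e-9
assert n(frozenset({0})) == 0.5   # the even elements of X carry mass 1/2
```

Running the checks exhaustively over P(X) is feasible here because the lattice is finite; the same modular law is what Definition 3 splits into two Γ-valued inequalities.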
Hereafter we make a notational difference, writing ⟨-, -⟩_I for the (classical) [0, 1]-valued Stone pairing. The main ingredient of the construction are the Γ-valued finitely supported functions. To start with, we point out that the partial operation − on Γ uniquely determines a partial "plus" operation on Γ. Define +: dom(+) → Γ, where dom(+) = {(x, y) | x ≤ 1° − y}, by the following rules (whenever the expressions make sense):

  r° + s° = (r+s)°,  r⁻ + s° = (r+s)⁻,  r° + s⁻ = (r+s)⁻,  and  r⁻ + s⁻ = (r+s)⁻.

Then, for every y ∈ Γ, the function (-) + y sending x to x + y is left adjoint to the function (-) − y sending x to x − y.

Definition 11. For any set X, F(X) is the set of all functions f: X → Γ s.t.
1. the set supp(f) = {x ∈ X | f(x) ≠ 0°} is finite, and
2. f(x_1) + ··· + f(x_n) is defined and equal to 1°, where supp(f) = {x_1, ..., x_n}.

To improve readability, if the sum y_1 + ··· + y_m exists in Γ, we denote it Σ_{i=1}^m y_i. Finitely supported functions in the above sense always determine measures over the power-set algebra (the proof is an easy verification and is omitted):

Lemma 12. Let X be any set. There is a well-defined mapping ∫: F(X) → M_Γ(P(X)), assigning to every f ∈ F(X) the measure

  ∫f: M ↦ Σ {f(x) | x ∈ M ∩ supp(f)}.

5.1 The Γ-valued Stone pairing and logic on words

Fix a countably infinite set of variables {v_1, v_2, ...}. Recall that FO_n(σ) is the Lindenbaum-Tarski algebra of first-order formulas with free variables among {v_1, ..., v_n}. The dual space of FO_n(σ) is the space of n-types Typ_n(σ). Its points are the equivalence classes of pairs (A, α), where A is a σ-structure and α: {v_1, ..., v_n} → A is an interpretation of the variables. Write Fin(σ) for the set of all finite σ-structures and define a map Fin(σ) → F(Typ_n(σ)) as A ↦ f_n^A, where f_n^A is the function which sends an equivalence class E ∈ Typ_n(σ) to

  f_n^A(E) = Σ_{(A,α)∈E} (1/|A|^n)°

(add (1/|A|^n)° for every interpretation α of the free variables s.t. (A, α) is in the equivalence class).
By Lemma 12, we get a measure ∫f_n^A: P(Typ_n(σ)) → Γ. Now, for each ϕ ∈ FO_n(σ), let ⟦ϕ⟧ ⊆ Typ_n(σ) be the set of (equivalence classes of) σ-structures with interpretations satisfying ϕ. By Stone duality we obtain an embedding ⟦-⟧: FO_n(σ) → P(Typ_n(σ)). Restricting ∫f_n^A to FO_n(σ), we get a measure

  μ_n^A: FO_n(σ) → Γ,  ϕ ↦ ∫f_n^A(⟦ϕ⟧).

Summing up, we have the composite map

  Fin(σ) → M_Γ(P(Typ_n(σ))) → M_Γ(FO_n(σ)),  A ↦ ∫f_n^A ↦ μ_n^A.  (6)

Essentially the same construction is featured in logic on words, cf. equation (2):
• The set of finite σ-structures Fin(σ) corresponds to the set of finite words A*.
• The collection Typ_n(σ) of (equivalence classes of) σ-structures with interpretations corresponds to (A × 2)* or, interchangeably, β((A × 2)*) (in the case of one free variable).
• The fragment FO_n(σ) of first-order logic corresponds to the Boolean algebra of languages, defined by formulas with a free variable, dual to the Boolean space X appearing in (2).
• The first map in the composite (6) sends a finite structure A to the measure ∫f_n^A which, evaluated on K ⊆ Typ_n(σ), counts the (proportion of) interpretations α: {v_1, ..., v_n} → A such that (A, α) ∈ K, similarly to R from (2).
• Finally, the second map in (6) sends a measure in M_Γ(P(Typ_n(σ))) to its pushforward along ⟦-⟧: FO_n(σ) → P(Typ_n(σ)). This is the second map in the composition (2).

On the other hand, the assignment A ↦ μ_n^A defined in (6) is also closely related to the classical Stone pairing. Indeed, for every formula ϕ in FO_n(σ),

  μ_n^A(ϕ) = Σ_{E∈⟦ϕ⟧} f_n^A(E) = Σ_{E∈⟦ϕ⟧} Σ_{(A,α)∈E} (1/|A|^n)°
           = (|{ā ∈ A^n | A ⊨ ϕ(ā)}| / |A|^n)° = (⟨ϕ, A⟩_I)°.  (7)

In this sense, μ_n^A can be regarded as a Γ-valued Stone pairing, relative to the fragment FO_n(σ). Next, we show how to extend this to the full first-order logic FO(σ). First, we observe that the construction is invariant under extensions of the set of free variables (the proof is the same as in the classical case).

Lemma 13.
Given m, n ∈ N and A ∈ Fin(σ), if m ≥ n then (μ_m^A)↾FO_n(σ) = μ_n^A.

The Lindenbaum-Tarski algebra of all first-order formulas FO(σ) is the directed colimit of the Boolean subalgebras FO_n(σ), for n ∈ N. Since the functor M_Γ turns directed colimits into codirected limits (Proposition 5), the Priestley space M_Γ(FO(σ)) is the limit of the diagram

  ( q_{n,m}: M_Γ(FO_m(σ)) → M_Γ(FO_n(σ)) | m, n ∈ N, m ≥ n )

where, for any μ: FO_m(σ) → Γ in M_Γ(FO_m(σ)), the measure q_{n,m}(μ) is the restriction of μ to FO_n(σ). In view of Lemma 13, for every A ∈ Fin(σ), the tuple (μ_n^A)_{n∈N} is compatible with the restriction maps. Thus, recalling that limits in the category of Priestley spaces are computed as in sets, by universality of the limit construction, this tuple yields a measure

  ⟨-, A⟩_Γ: FO(σ) → Γ

in the space M_Γ(FO(σ)). This we call the Γ-valued Stone pairing associated with A. As in the classical case, it is not difficult to see that the mapping A ↦ ⟨-, A⟩_Γ gives an embedding ⟨-, -⟩_Γ: Fin(σ) → M_Γ(FO(σ)). The following theorem illustrates the relation between the classical Stone pairing ⟨-, -⟩_I: Fin(σ) → M_I(FO(σ)), and the Γ-valued one.

Theorem 14. The following diagram commutes:

              ⟨-,-⟩_Γ
    Fin(σ) -----------> M_Γ(FO(σ))
        \                   |
         \  ⟨-,-⟩_I         |  γ#
          \                 v
           -----------> M_I(FO(σ))

Proof. Fix an arbitrary finite structure A ∈ Fin(σ). Let ϕ be a formula in FO(σ) with free variables among {v_1, ..., v_n}, for some n ∈ N. By construction, ⟨ϕ, A⟩_Γ = μ_n^A(ϕ). Therefore, by equation (7), ⟨ϕ, A⟩_Γ = (⟨ϕ, A⟩_I)°. The statement then follows at once.

Remark. The construction in this section works also for proper fragments, i.e. for sublattices D ⊆ FO(σ). This corresponds to composing the embedding Fin(σ) → M_Γ(FO(σ)) with the restriction map M_Γ(FO(σ)) → M_Γ(D) sending μ: FO(σ) → Γ to μ↾D: D → Γ. The only difference is that the ensuing map Fin(σ) → M_Γ(D) need not be injective, in general.

5.2 Limits in the spaces of measures

By Theorem 14 the Γ-valued Stone pairing ⟨-, -⟩_Γ and the classical Stone pairing ⟨-, -⟩_I determine each other.
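On a fixed finite structure both pairings are finitary objects: by (7), ⟨ϕ, A⟩_I is just the fraction of variable assignments satisfying ϕ, and ⟨ϕ, A⟩_Γ is that fraction tagged with °. The classical value can be computed by brute force, as in the following Python sketch (our own illustrative example, with a formula ϕ(v1, v2) = "v1 < v2" over a 4-element chain; not part of the paper's formalism):

```python
from itertools import product

def stone_pairing(phi, A, n):
    # <phi, A>_I = |{a in A^n : A |= phi(a)}| / |A|^n, cf. equation (7).
    sat = sum(1 for a in product(A, repeat=n) if phi(*a))
    return sat / len(A) ** n

A = range(4)                      # the 4-element chain 0 < 1 < 2 < 3
phi = lambda v1, v2: v1 < v2      # phi(v1, v2) = "v1 < v2"
print(stone_pairing(phi, A, 2))   # 6 satisfying pairs out of 16: 0.375
```

For a sequence of structures (A_n), convergence of these numbers for every formula is exactly convergence of the pairings in M_I(FO(σ)); the Γ-valued pairing additionally remembers whether the limiting value is attained.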
However, the notions of convergence associated with the spaces M_Γ(FO(σ)) and M_I(FO(σ)) are different: since the topology of M_Γ(FO(σ)) is richer, there are "fewer" convergent sequences. Recall from Lemma 8 that γ#: M_Γ(FO(σ)) → M_I(FO(σ)) is continuous. Also, γ#(⟨-, A⟩_Γ) = ⟨-, A⟩_I by Theorem 14. Thus, for any sequence of finite structures (A_n)_{n∈N}, if

  ⟨-, A_n⟩_Γ converges to a measure μ in M_Γ(FO(σ))

then

  ⟨-, A_n⟩_I converges to the measure γ#(μ) in M_I(FO(σ)).

The converse is not true. For example, consider the signature σ = {<} consisting of a single binary relation symbol, and let (A_n)_{n∈N} be the sequence of finite posets displayed in the picture below.

(Figure: the finite posets A_1, A_2, A_3, A_4, A_5, A_6, ...)

Let ψ(x) ≈ ∀y ¬(x < y) ∧ ∃z (¬(z < x) ∧ ¬(z = x)) be the formula stating that x is maximal but not the maximum in the order given by <. Then, for the sublattice D = {f, ψ, t} of FO(σ), the sequences ⟨-, A_n⟩_Γ and ⟨-, A_n⟩_I converge in M_Γ(D) and M_I(D), respectively. However, if we consider the Boolean algebra B = {f, ψ, ¬ψ, t}, then the ⟨-, A_n⟩_I's still converge whereas the ⟨-, A_n⟩_Γ's do not. Indeed, the following sequence does not converge in Γ:

  (⟨¬ψ, A_n⟩_Γ)_n = (1°, (1/3)°, 1°, (2/4)°, 1°, (3/5)°, ...),

because the odd terms converge to 1°, while the even terms converge to 1⁻. However, there is a sequence ⟨-, B_n⟩_Γ whose image under γ# coincides with the limit of the ⟨-, A_n⟩_I's (e.g., take the subsequence of even terms of (A_n)_{n∈N}). In the next theorem, we will see that this is a general fact.

Identify Fin(σ) with a subset of M_Γ(FO(σ)) (resp. M_I(FO(σ))) through ⟨-, -⟩_Γ (resp. ⟨-, -⟩_I). A central question in the theory of structural limits, cf. [16], is to determine the closure of Fin(σ) in M_I(FO(σ)), which consists precisely of the limits of sequences of finite structures. The following theorem gives an answer to this question in terms of the corresponding question for M_Γ(FO(σ)).

Theorem 15.
Let cl(Fin(σ)) denote the closure of Fin(σ) in M_Γ(FO(σ)). Then the set γ#(cl(Fin(σ))) coincides with the closure of Fin(σ) in M_I(FO(σ)).

Proof. Write U for the image of ⟨-, -⟩_Γ: Fin(σ) → M_Γ(FO(σ)), and V for the image of ⟨-, -⟩_I: Fin(σ) → M_I(FO(σ)). We must prove that γ#(cl(U)) = cl(V). By Theorem 14, γ#(U) = V. The map γ#: M_Γ(FO(σ)) → M_I(FO(σ)) is continuous (Lemma 8), and the spaces M_Γ(FO(σ)) and M_I(FO(σ)) are compact Hausdorff (Proposition 4 and Corollary 9). Since continuous maps between compact Hausdorff spaces are closed, γ#(cl(U)) = cl(γ#(U)) = cl(V).

6 The logic of measures

Let D be a distributive lattice. We know from Proposition 4 that the space M_Γ(D) of Γ-valued measures on D is a Priestley space, whence it has a dual distributive lattice P(D). In this section we show that P(D) can be represented as the Lindenbaum-Tarski algebra for a propositional logic PL_D obtained from D by adding probabilistic quantifiers. Since we adopt a logical perspective, we write f and t for the bottom and top elements of D, respectively.

The set of propositional variables of PL_D consists of the symbols P_{≥p}a, for every a ∈ D and p ∈ Q ∩ [0, 1]. For every measure μ ∈ M_Γ(D), we set

  μ ⊨ P_{≥p}a  ⇔  μ(a) ≥ p°.  (8)

This satisfaction relation extends in the obvious way to the closure under finite conjunctions and finite disjunctions of the set of propositional variables. Define ϕ ⊨ ψ if, ∀μ ∈ M_Γ(D), μ ⊨ ϕ implies μ ⊨ ψ. Also, write ⊨ ϕ if μ ⊨ ϕ for every μ ∈ M_Γ(D), and ϕ ⊨ if there is no μ ∈ M_Γ(D) with μ ⊨ ϕ.

Consider the following conditions, for any p, q, r ∈ Q ∩ [0, 1] and a, b ∈ D.

(L1) P_{≥q}a ⊨ P_{≥p}a whenever p ≤ q
(L2) P_{≥p}f ⊨ whenever p > 0, ⊨ P_{≥0}f and ⊨ P_{≥q}t
(L3) P_{≥q}a ⊨ P_{≥q}b whenever a ≤ b
(L4) P_{≥p}a ∧ P_{≥q}b ⊨ P_{≥p+q−r}(a ∨ b) ∨ P_{≥r}(a ∧ b) whenever 0 ≤ p+q−r ≤ 1
(L5) P_{≥p+q−r}(a ∨ b) ∧ P_{≥r}(a ∧ b) ⊨ P_{≥p}a ∨ P_{≥q}b whenever 0 ≤ p+q−r ≤ 1

It is not hard to see that the interpretation in (8) validates these conditions:

Lemma 16.
The conditions (L1)–(L5) are satisfied in M_Γ(D).

Write P(D) for the quotient of the free distributive lattice on the set

  {P_{≥p}a | p ∈ Q ∩ [0, 1], a ∈ D}

with respect to the congruence generated by the conditions (L1)–(L5).

Proposition 17. Let F ⊆ P(D) be a prime filter. The assignment a ↦ ⋁{q° | P_{≥q}a ∈ F} defines a measure μ_F: D → Γ.

Proof. Items (L2) and (L3) take care of the first two conditions defining Γ-valued measures (cf. Definition 3). We prove the first half of the third condition, as the other half is proved in a similar fashion. We must show that, for every a, b ∈ D,

  μ_F(a) ∼ μ_F(a ∧ b) ≤ μ_F(a ∨ b) − μ_F(b).  (9)

It is not hard to show that μ_F(a) − r° = ⋁{p° − r° | r° ≤ p° ≤ μ_F(a)}, and x − (-) transforms non-empty joins into meets (this follows by Scott continuity of x − (-) seen as a map [0°, x] → Γ^∂). Hence, equation (9) is equivalent to

  ⋁{p° − r° | μ_F(a ∧ b) < r° ≤ p° ≤ μ_F(a)} ≤ ⋀{μ_F(a ∨ b) − q° | q° ≤ μ_F(b)}.

To settle this inequality it is enough to show that, provided μ_F(a ∧ b) < r° ≤ p° ≤ μ_F(a) and q° ≤ μ_F(b), we have p° − r° ≤ μ_F(a ∨ b) − q°. The latter inequality is equivalent to (p + q − r)° ≤ μ_F(a ∨ b). In turn, using (L4) and the fact that F is a prime filter, P_{≥p}a, P_{≥q}b ∈ F and P_{≥r}(a ∧ b) ∉ F entail P_{≥p+q−r}(a ∨ b) ∈ F. Whence,

  μ_F(a ∨ b) = ⋁{s° | P_{≥s}(a ∨ b) ∈ F} ≥ (p + q − r)°.

We can now describe the dual lattice of M_Γ(D) as the Lindenbaum-Tarski algebra for the logic PL_D, built from the propositional variables P_{≥p}a by imposing the laws (L1)–(L5).

Theorem 18. Let D be a distributive lattice. Then the lattice P(D) is isomorphic to the distributive lattice dual to the Priestley space M_Γ(D).

Proof. Let X_{P(D)} be the space dual to P(D). By Proposition 17 there is a map ϑ: X_{P(D)} → M_Γ(D), F ↦ μ_F. We claim that ϑ is an isomorphism of Priestley spaces. Clearly, ϑ is monotone. If μ_{F_1}(a) ≰ μ_{F_2}(a) for some a ∈ D, we have

  ⋁{q° | P_{≥q}a ∈ F_1} = μ_{F_1}(a) ≰ μ_{F_2}(a) = ⋀{p⁻ | P_{≥p}a ∉ F_2}.
(10)

Equation (10) implies the existence of p, q satisfying P_{≥q}a ∈ F_1, P_{≥p}a ∉ F_2 and q ≥ p. It follows by (L1) that P_{≥p}a ∈ F_1. We conclude that P_{≥p}a ∈ F_1 \ F_2, whence F_1 ⊈ F_2. This shows that ϑ is an order embedding, whence injective.

We prove that ϑ is surjective, thus a bijection. Fix a measure μ ∈ M_Γ(D). It is not hard to see, using Lemma 16, that the filter F_μ ⊆ P(D) generated by

  {P_{≥q}a | a ∈ D, q ∈ Q ∩ [0, 1], μ(a) ≥ q°}

is prime. Further, ϑ(F_μ)(a) = ⋁{q° | P_{≥q}a ∈ F_μ} = ⋁{q° | μ(a) ≥ q°} = μ(a) for every a ∈ D. Hence, ϑ(F_μ) = μ and ϑ is surjective.

To settle the theorem it remains to show that ϑ is continuous. Note that for a basic clopen of the form C = {μ ∈ M_Γ(D) | μ(a) ≥ p°} where a ∈ D and p ∈ Q ∩ [0, 1], the preimage ϑ^{−1}(C) = {F ⊆ P(D) | μ_F(a) ≥ p°} is equal to

  {F ∈ X_{P(D)} | ⋁{q° | P_{≥q}a ∈ F} ≥ p°} = {F ∈ X_{P(D)} | P_{≥p}a ∈ F},

which is a clopen of X_{P(D)}. Similarly, if C = {μ ∈ M_Γ(D) | μ(a) ≤ q⁻} for some a ∈ D and q ∈ Q ∩ (0, 1], by the claim above ϑ^{−1}(C) = {F ∈ X_{P(D)} | P_{≥q}a ∉ F}, which is again a clopen of X_{P(D)}.

By Theorem 18, for any distributive lattice D, the lattice of clopen up-sets of M_Γ(D) is isomorphic to the Lindenbaum-Tarski algebra P(D) of our positive propositional logic PL_D. Moving from the lattice of clopen up-sets to the Boolean algebra of all clopens logically corresponds to adding negation to the logic. The logic obtained this way can be presented as follows. Introduce a new propositional variable P_{<q}a, for each a ∈ D and q ∈ Q ∩ [0, 1]. For a measure μ ∈ M_Γ(D), set

  μ ⊨ P_{<q}a  ⇔  μ(a) < q°.

We also add a new rule, stating that P_{<q}a is the negation of P_{≥q}a:

(L6) P_{<q}a ∧ P_{≥q}a ⊨  and  ⊨ P_{<q}a ∨ P_{≥q}a

Clearly, (L6) is satisfied in M_Γ(D).
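Condition (L4) is the logical shadow of finite additivity: if μ(a) ≥ p and μ(b) ≥ q, then either μ(a ∨ b) ≥ p + q − r or μ(a ∧ b) ≥ r. For a finite lattice this can be confirmed by exhaustive search; the following Python sketch (our own illustration, for the normalised counting measure on a three-element power set and a grid of rational thresholds) does exactly that.

```python
from itertools import chain, combinations
from fractions import Fraction

X = {0, 1, 2}
subsets = [frozenset(c) for c in chain.from_iterable(
    combinations(sorted(X), r) for r in range(len(X) + 1))]
m = lambda a: Fraction(len(a), len(X))      # counting measure on P(X)
qs = [Fraction(k, 4) for k in range(5)]     # thresholds 0, 1/4, ..., 1

# (L4): m(a) >= p and m(b) >= q imply
#       m(a ∨ b) >= p+q-r or m(a ∧ b) >= r, whenever 0 <= p+q-r <= 1.
for a in subsets:
    for b in subsets:
        for p in qs:
            for q in qs:
                for r in qs:
                    if 0 <= p + q - r <= 1 and m(a) >= p and m(b) >= q:
                        assert m(a | b) >= p + q - r or m(a & b) >= r
print("(L4) verified")
```

The check succeeds for any finitely additive measure, since m(a ∨ b) = m(a) + m(b) − m(a ∧ b) ≥ p + q − m(a ∧ b), so m(a ∧ b) < r forces m(a ∨ b) > p + q − r.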
Moreover, the Boolean algebra of all clopens of M_Γ(D) is isomorphic to the quotient of the free distributive lattice on

  {P_{≥p}a | p ∈ Q ∩ [0, 1], a ∈ D} ∪ {P_{<q}b | q ∈ Q ∩ [0, 1], b ∈ D}

with respect to the congruence generated by the conditions (L1)–(L6).

Specialising to FO(σ). Let us briefly discuss what happens when we instantiate D with the full first-order logic FO(σ). For a formula ϕ ∈ FO(σ) with free variables v_1, ..., v_n and a q ∈ Q ∩ [0, 1], we have two new sentences P_{≥q}ϕ and P_{<q}ϕ. For a finite σ-structure A identified with its Γ-valued Stone pairing ⟨-, A⟩_Γ,

  A ⊨ P_{≥q}ϕ (resp. A ⊨ P_{<q}ϕ)  iff  ⟨ϕ, A⟩_Γ ≥ q° (resp. ⟨ϕ, A⟩_Γ < q°).

That is, P_{≥q}ϕ is true in A if a random assignment of the variables v_1, ..., v_n in A satisfies ϕ with probability at least q. Similarly for P_{<q}ϕ. If we regard P_{≥q} and P_{<q} as probabilistic quantifiers that bind all free variables of a given formula, the Stone pairing ⟨-, -⟩_Γ: Fin(σ) → M_Γ(FO(σ)) can be seen as the embedding of finite structures into the space of types for the logic PL_{FO(σ)}.

Conclusion

Types are points of the dual space of a logic (viewed as a Boolean algebra). In classical first-order logic, 0-types are just the models modulo elementary equivalence. But when there are not 'enough' models, as in finite model theory, the spaces of types provide completions of the sets of models. In [5], it was shown that for logic on words and various quantifiers we have that, given a Boolean algebra of formulas with a free variable, the space of types of the Boolean algebra generated by the formulas obtained by quantification is given by a measure space construction.

Here we have shown that a suitable enrichment of first-order logic gives rise to a space of measures M_Γ(FO(σ)) closely related to the space M_I(FO(σ)) used in the theory of structural limits. Indeed, Theorem 14 tells us that the ensuing Stone pairings interdetermine each other.
Further, the Stone pairing for M_Γ(FO(σ)) is just the embedding of the finite models in the completion/compactification provided by the space of types of the enriched logic. These results identify the logical gist of the theory of structural limits, and provide a new and interesting connection between logic on words and the theory of structural limits in finite model theory.

But we also expect that it may prove a useful tool in its own right. Thus, for structural limits, it is an open problem to characterise the closure of the image of the [0, 1]-valued Stone pairing [16]. Reasoning in the Γ-valued setting, native to logic and where we can use duality, one would expect that this is the subspace M_Γ(Th(Fin)) of M_Γ(FO(σ)) given by the quotient FO(σ) ↠ Th(Fin) onto the theory of pseudofinite structures. The purpose of such a characterisation would be to understand the points of the closure as "generalised models".

Another subject that we would like to investigate is that of zero-one laws. The zero-one law for first-order logic states that the sequence of measures for which the nth measure, on a sentence ψ, yields the proportion of n-element structures satisfying ψ, converges to a {0, 1}-valued measure. Over Γ this will no longer be true as 1 is split into its 'limiting' and 'achieved' personae. Yet, we expect the above sequence to converge also in this setting and, by Theorem 14, it will converge to a {0°, 1⁻, 1°}-valued measure. Understanding this more fine-grained measure may yield useful information about the zero-one law.

Further, it would be interesting to investigate whether the limits for schema mappings introduced by Kolaitis et al. [13] may be seen also as a type-theoretic construction. Finally, we would want to explore the connections with other semantically inspired approaches to finite model theory, such as those recently put forward by Abramsky, Dawar et al. [2,3].

References

1.
Abramsky, S.: Domain theory in logical form. Ann. Pure Appl. Logic 51, 1–77 (1991)
2. Abramsky, S., Dawar, A., Wang, P.: The pebbling comonad in finite model theory. In: 32nd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS. pp. 1–12 (2017)
3. Abramsky, S., Shah, N.: Relating Structure and Power: Comonadic semantics for computational resources. In: 27th EACSL Annual Conference on Computer Science Logic, CSL. pp. 2:1–2:17 (2018)
4. Gehrke, M., Grigorieff, S., Pin, J.-E.: Duality and equational theory of regular languages. In: Automata, Languages and Programming, Part II, LNCS, vol. 5126, pp. 246–257. Springer, Berlin (2008)
5. Gehrke, M., Petrişan, D., Reggio, L.: Quantifiers on languages and codensity monads. In: 32nd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS. pp. 1–12 (2017)
6. Gehrke, M., Petrişan, D., Reggio, L.: Quantifiers on languages and codensity monads (2019), extended version. Submitted. Preprint available at https://arxiv.org/abs/1702.08841
7. Gehrke, M., Priestley, H.A.: Canonical extensions of double quasioperator algebras: an algebraic perspective on duality for certain algebras with binary operations. J. Pure Appl. Algebra 209(1), 269–290 (2007)
8. Gehrke, M., Priestley, H.A.: Duality for double quasioperator algebras via their canonical extensions. Studia Logica 86(1), 31–68 (2007)
9. Goldblatt, R.: Varieties of complex algebras. Ann. Pure Appl. Logic 44(3), 173–242 (1989)
10. van Gool, S.J., Steinberg, B.: Pro-aperiodic monoids via saturated models. In: 34th Symposium on Theoretical Aspects of Computer Science, STACS. pp. 39:1–39:14 (2017)
11. Johnstone, P.T.: Stone Spaces, Cambridge Studies in Advanced Mathematics, vol. 3. Cambridge University Press (1986), reprint of the 1982 edition
12. Jung, A.: Continuous domain theory in logical form. In: Coecke, B., Ong, L., Panangaden, P. (eds.) Computation, Logic, Games, and Quantum Foundations, Lecture Notes in Computer Science, vol. 7860, pp. 166–177.
Springer Verlag (2013)
13. Kolaitis, P.G., Pichler, R., Sallinger, E., Savenkov, V.: Limits of schema mappings. Theory of Computing Systems 62(4), 899–940 (2018)
14. Matz, O., Schweikardt, N.: Expressive power of monadic logics on words, trees, pictures, and graphs. In: Logic and Automata: History and Perspectives. pp. 531–552 (2008)
15. Nešetřil, J., Ossona de Mendez, P.: A model theory approach to structural limits. Commentationes Mathematicae Universitatis Carolinae 53(4), 581–603 (2012)
16. Nešetřil, J., Ossona de Mendez, P.: First-order limits, an analytical perspective. European Journal of Combinatorics 52, 368–388 (2016)
17. Nešetřil, J., Ossona de Mendez, P.: A unified approach to structural limits and limits of graphs with bounded tree-depth (2020), to appear in Memoirs of the American Mathematical Society
18. Pin, J.-E.: Profinite methods in automata theory. In: 26th Symposium on Theoretical Aspects of Computer Science, STACS. pp. 31–50 (2009)
19. Priestley, H.A.: Representation of distributive lattices by means of ordered Stone spaces. Bull. London Math. Soc. 2, 186–190 (1970)

Open Access. This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Correctness of Automatic Differentiation via Diffeologies and Categorical Gluing

Mathieu Huot¹*, Sam Staton¹*, and Matthijs Vákár²*

¹ University of Oxford, UK
² Utrecht University, The Netherlands
* Equal contribution
mathieu.huot@stx.ox.ac.uk

Abstract. We present semantic correctness proofs of Automatic Differentiation (AD). We consider a forward-mode AD method on a higher order language with algebraic data types, and we characterise it as the unique structure preserving macro given a choice of derivatives for basic operations. We describe a rich semantics for differentiable programming, based on diffeological spaces. We show that it interprets our language, and we phrase what it means for the AD method to be correct with respect to this semantics. We show that our characterisation of AD gives rise to an elegant semantic proof of its correctness based on a gluing construction on diffeological spaces. We explain how this is, in essence, a logical relations argument. Finally, we sketch how the analysis extends to other AD methods by considering a continuation-based method.

1 Introduction

Automatic differentiation (AD), loosely speaking, is the process of taking a program describing a function, and building the derivative of that function by applying the chain rule across the program code. As gradients play a central role in many aspects of machine learning, so too do automatic differentiation systems such as TensorFlow [1] or Stan [6].

(Fig. 1. Overview of semantics/correctness of AD: automatic differentiation sends programs to programs; denotational semantics sends both to differential geometry, where mathematical differentiation acts.)

Differentiation has a well developed mathematical theory in terms of differential geometry. The aim of this paper is to formalize this connection between differential geometry and the syntactic operations of AD. In this way we
achieve two things: (1) a compositional, denotational understanding of differentiable programming and AD; (2) an explanation of the correctness of AD.

This intuitive correspondence (summarized in Fig. 1) is in fact rather complicated. In this paper we focus on resolving the following problem: higher order functions play a key role in programming, and yet they have no counterpart in traditional differential geometry. Moreover, we resolve this problem while retaining the compositionality of denotational semantics.

© The Author(s) 2020
J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 319–338, 2020. https://doi.org/10.1007/978-3-030-45231-5_17

320 M. Huot et al.

Higher order functions and differentiation. A major application of higher order functions is to support disciplined code reuse. Code reuse is particularly acute in machine learning. For example, a multi-layer neural network might be built of millions of near-identical neurons, as follows.

  neuron : (real^n ∗ (real^n ∗ real)) → real
  neuron ≝ λ⟨x, ⟨w, b⟩⟩. ς(w · x + b)

  layer_n : ((τ_1 ∗ P) → τ_2) → (τ_1 ∗ P^n) → τ_2^n
  layer_n ≝ λf. λ⟨x, ⟨p_1, ..., p_n⟩⟩. ⟨f⟨x, p_1⟩, ..., f⟨x, p_n⟩⟩

  comp : (((τ_1 ∗ P) → τ_2) ∗ ((τ_2 ∗ Q) → τ_3)) → (τ_1 ∗ (P ∗ Q)) → τ_3
  comp ≝ λ⟨f, g⟩. λ⟨x, ⟨p, q⟩⟩. g⟨f⟨x, p⟩, q⟩

(Here ς(x) = 1/(1 + e^{−x}) is the sigmoid function, as illustrated.) We can use these functions to build a network as follows (see also Fig. 2):

  comp⟨layer_m(neuron_k), comp⟨layer_n(neuron_m), neuron_n⟩⟩ : (real^k ∗ P) → real  (1)

Here P ≅ real^p with p = m(k+1) + n(m+1) + n + 1.

This program (1) describes a smooth (infinitely differentiable) function. The goal of automatic differentiation is to find its derivative. If we β-reduce all the λ's, we end up with a very long function expression just built from the sigmoid function and linear algebra. We can then find a program for calculating its derivative by applying the chain rule. However, automatic differentiation can
also be expressed without first β-reducing, in a compositional way, by explaining how higher order functions like (layer) and (comp) propagate derivatives. This paper is a semantic analysis of this compositional approach.

(Fig. 2. The network in (1), with k inputs and two hidden layers.)

The general idea of denotational semantics is to interpret types as spaces and programs as functions between the spaces. In this paper, we propose to use diffeological spaces and smooth functions [32, 16] to this end. These satisfy the following three desiderata:
– R is a space, and the smooth functions R → R are exactly the functions that are infinitely differentiable;
– The set of smooth functions X → Y between spaces again forms a space, so we can interpret function types.
– The disjoint union of a sequence of spaces again forms a space, so we can interpret variant types and inductive types.

We emphasise that the most standard formulation of differential geometry, using manifolds, does not support spaces of functions. Diffeological spaces seem to us the simplest notion of space that satisfies these conditions, but there are other
Here there is one subtle point that is central to our development. Although diﬀerential geometry provides established derivatives for ﬁrst order functions (such as neuron above), there is no canonical notion of derivative for higher order functions (such as layer and comp) in the theory of diﬀeological spaces (e.g. [7]). We propose a new way to resolve this, by interpreting types as triples (X, X ,S) where, intuitively, X is a space of inhabitants of the type, X is a space serving R R as a chosen bundle of tangents over X, and S ⊆ X × X is a binary relation between curves, informally relating curves in X with their tangent curves in X . This new model gives a denotational semantics for automatic diﬀerentiation. In §3 we boil this new approach down to a straightforward and elementary logical relations argument for the correctness of automatic diﬀerentiation. The approach is explained in detail in §5. Related work and context. AD has a long history and has many implemen- tations. AD was perhaps ﬁrst phrased in a functional setting in [26], and there are now a number of teams working on AD in the functional setting (e.g. [34, 31, 12]), some providing eﬃcient implementations. Although that work does not involve formal semantics, it is inspired by intuitions from diﬀerential geometry and category theory. This paper adds to a very recent body of work on veriﬁed automatic diﬀeren- tiation. Much of this is concurrent with and independent from the work in this article. In the ﬁrst order setting, there are recent accounts based on denotational semantics in manifolds [13] and based on synthetic diﬀerential geometry [9], as well as work making a categorical abstraction [8] and work connecting oper- ational semantics with denotational semantics [2, 28]. Recently there has also been signiﬁcant progress at higher types. The work of Brunel et al. gives formal correctness proofs for reverse-mode derivatives on computation graphs [5]. The work of Barthe et al. 
[4] provides a general discussion of some new syntactic logical relations arguments, including one very similar to our syntactic proof of Theorem 1. We understand that the authors of [9] are working on higher types. The differential λ-calculus [11] is related to AD, and explicit connections are made in [22, 23]. One difference is that the differential λ-calculus allows addition of terms at all types, and hence vector space models are suitable; this sits awkwardly with the variant and inductive types that we consider here.

Finally we emphasise that we have chosen the neural network (1) as our running example mainly for its simplicity. There are many other examples of AD outside the neural networks literature: AD is useful whenever derivatives need to be calculated on high dimensional spaces. This includes optimization problems more generally, where the derivative is passed to a gradient descent method (e.g. [30, 18, 29, 19, 10, 21]). Other applications of AD are in advanced integration methods, since derivatives play a role in Hamiltonian Monte Carlo [25, 14] and variational inference [20].

322 M. Huot et al.

Summary of contributions. We have provided a semantic analysis of automatic differentiation. Our syntactic starting point is a well-known forward-mode AD macro on a typed higher order language (e.g. [31, 34]). We recall this in §2 for function types, and in §4 we extend it to inductive types and variants. The main contributions of this paper are as follows.
– We give a denotational semantics for the language in diffeological spaces, showing that every definable expression is smooth (§3).
– We show correctness of the AD macro by a logical relations argument (Th. 1).
– We give a categorical analysis of this correctness argument with two parts: canonicity of the macro in terms of syntactic categories, and a new notion of glued space that abstracts the logical relation (§5).
– We then use this analysis to state and prove a correctness argument at all first order types (Th. 2).
– We show that our method is not specific to one particular AD macro, by also considering a continuation-based AD method (§6).

2 A simple forward-mode AD translation

Rudiments of differentiation and dual numbers. Recall that the derivative of a function f : R → R, if it exists, is a function ∇f : R → R such that ∇f(x_0) = (df(x)/dx)(x_0) is the gradient of f at x_0. To find ∇f in a compositional way, two generalizations are reasonable:
– We need both f and ∇f when calculating ∇(f; g) of a composition f; g, using the chain rule, so we are really interested in the pair (f, ∇f) : R → R × R;
– In building f we will need to consider functions of multiple arguments, such as + : R² → R, and these functions should propagate derivatives.

Thus we are more generally interested in transforming a function g : Rⁿ → R into a function h : (R × R)ⁿ → R × R in such a way that for any f_1 ... f_n : R → R,

  (f_1, ∇f_1, ..., f_n, ∇f_n); h = ((f_1, ..., f_n); g, ∇((f_1, ..., f_n); g)).   (2)

An intuition for h is often given in terms of dual numbers. The transformed function operates on pairs of numbers, (x, x′), and it is common to think of such a pair as x + x′ε for an 'infinitesimal' ε. But while this is a helpful intuition, the formalization of infinitesimals can be intricate, and the development in this paper is focussed on the elementary formulation in (2).

The reader may also notice that h encodes all the partial derivatives of g. For example, if g : R² → R, then with f_1(x) = x and f_2(x) = x_2, by applying (2) to x_1 we obtain h(x_1, 1, x_2, 0) = (g(x_1, x_2), (∂g(x, x_2)/∂x)(x_1)) and similarly h(x_1, 0, x_2, 1) = (g(x_1, x_2), (∂g(x_1, x)/∂x)(x_2)). And conversely, if g is differentiable in each argument, then a unique h satisfying (2) can be found by taking linear combinations of partial derivatives:

  h(x_1, x′_1, x_2, x′_2) = (g(x_1, x_2), x′_1 · (∂g(x, x_2)/∂x)(x_1) + x′_2 · (∂g(x_1, x)/∂x)(x_2)).

In summary, the idea of differentiation with dual numbers is to transform a differentiable function g : Rⁿ → R to a function h : R²ⁿ → R² which captures g and all its partial derivatives. We packaged this up in (2) as a sort-of invariant which is useful for building derivatives of compound functions R → R in a compositional way. The idea of forward mode automatic differentiation is to perform this transformation at the source code level.

A simple language of smooth functions. We consider a standard higher order typed language with a first order type real of real numbers. The types (τ, σ) and terms (t, s) are as follows.

  τ, σ, ρ ::= real                                    real numbers
            | (τ_1 ∗ ... ∗ τ_n)                       finite product
            | τ → σ                                   function

  t, s, r ::= x                                       variable
            | c | t + s | t ∗ s | ς(t)                operations/constants
            | ⟨t_1, ..., t_n⟩ | case t of ⟨x_1, ..., x_n⟩ → s    tuples/pattern matching
            | λx.t | t s                              function abstraction/application

The typing rules are in Figure 3. We have included a minimal set of operations for the sake of illustration, but it is not difficult to add further operations. We add some simple syntactic sugar t − u ≝ t + (−1) ∗ u. We intend ς to stand for the sigmoid function, ς(x) ≝ 1/(1 + e^{−x}). We further include syntactic sugar let x = t in s for (λx.s) t, and λ⟨x_1, ..., x_n⟩.t for λx.case x of ⟨x_1, ..., x_n⟩ → t.

Syntactic automatic differentiation: a functorial macro. The aim of forward mode AD is to find the dual numbers representation of a function by syntactic manipulations. For our simple language, we implement this as the following inductively defined macro →D on both types and terms (see also [34, 31]):

  →D(real) ≝ (real ∗ real)    →D(τ → σ) ≝ →D(τ) → →D(σ)
  →D((τ_1 ∗ ··· ∗ τ_n)) ≝ (→D(τ_1) ∗ ··· ∗ →D(τ_n))

Fig. 3. Typing rules for the simple language. (The rules are the standard ones: constants c ∈ R and the operations +, ∗, ς are typed at real; tuples, pattern matching, abstraction and application are typed as usual.)

  →D(x) ≝ x    →D(c) ≝ ⟨c, 0⟩
  →D(t + s) ≝ case →D(t) of ⟨x, x′⟩ → case →D(s) of ⟨y, y′⟩ → ⟨x + y, x′ + y′⟩
  →D(t ∗ s) ≝ case →D(t) of ⟨x, x′⟩ → case →D(s) of ⟨y, y′⟩ → ⟨x ∗ y, x ∗ y′ + x′ ∗ y⟩
  →D(ς(t)) ≝ case →D(t) of ⟨x, x′⟩ → let y = ς(x) in ⟨y, x′ ∗ y ∗ (1 − y)⟩
  →D(λx.t) ≝ λx.→D(t)    →D(t s) ≝ →D(t) →D(s)    →D(⟨t_1, ..., t_n⟩) ≝ ⟨→D(t_1), ..., →D(t_n)⟩
  →D(case t of ⟨x_1, ..., x_n⟩ → s) ≝ case →D(t) of ⟨x_1, ..., x_n⟩ → →D(s)

We extend →D to contexts: →D({x_1 : τ_1, ..., x_n : τ_n}) ≝ {x_1 : →D(τ_1), ..., x_n : →D(τ_n)}. This turns →D into a well-typed, functorial macro in the following sense.

Lemma 1 (Functorial macro). If Γ ⊢ t : τ then →D(Γ) ⊢ →D(t) : →D(τ). If Γ, x : σ ⊢ t : τ and Γ ⊢ s : σ then →D(Γ) ⊢ →D(t[s/x]) = →D(t)[→D(s)/x].

Example 1 (Inner products). Let us write τⁿ for the n-fold product (τ ∗ ... ∗ τ).
Then, given Γ ⊢ t, s : realⁿ we can define their inner product

  t ·_n s ≝ case t of ⟨z_1, ..., z_n⟩ → case s of ⟨y_1, ..., y_n⟩ → z_1 ∗ y_1 + ··· + z_n ∗ y_n : real

To illustrate the calculation of →D, let us expand (and β-reduce) →D(t ·_2 s):

  case →D(t) of ⟨z_1, z_2⟩ → case →D(s) of ⟨y_1, y_2⟩ →
  case z_1 of ⟨z_{1,1}, z_{1,2}⟩ → case y_1 of ⟨y_{1,1}, y_{1,2}⟩ →
  case z_2 of ⟨z_{2,1}, z_{2,2}⟩ → case y_2 of ⟨y_{2,1}, y_{2,2}⟩ →
  ⟨z_{1,1} ∗ y_{1,1} + z_{2,1} ∗ y_{2,1}, z_{1,1} ∗ y_{1,2} + z_{1,2} ∗ y_{1,1} + z_{2,1} ∗ y_{2,2} + z_{2,2} ∗ y_{2,1}⟩

Example 2 (Neural networks). In our introduction (1), we provided a program in our language to build a neural network out of expressions neuron, layer, comp; this program makes use of the inner product of Ex. 1. We can similarly calculate →D of such deep neural nets by mechanically applying the macro.

3 Semantics of differentiation

Consider for a moment the first order fragment of the language in §2, with only one type, real, and no λ's or pairs. This has a simple semantics in the category of cartesian spaces and smooth maps. Indeed, a term x_1 ... x_n : real ⊢ t : real has a natural reading as a function ⟦t⟧ : Rⁿ → R, by interpreting our operation symbols by the well-known operations on R with the corresponding name. In fact, the functions that are definable in this first order fragment are smooth, which means that they are continuous, differentiable, and their derivatives are continuous, differentiable, and so on. Let us write CartSp for this category of cartesian spaces (Rⁿ for some n) and smooth functions.

The category CartSp has cartesian products, and so we can also interpret product types, tupling and pattern matching, giving us a useful syntax for constructing functions into and out of products of R. For example, the interpretation of (neuron) in (1) becomes

  Rⁿ × Rⁿ × R −(·_n × id_R)→ R × R −+→ R −ς→ R,

where ·_n, + and ς are the usual inner product, addition and the sigmoid function on R, respectively.
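Concretely, the clauses of →D for +, ∗ and ς can be mirrored at the value level on pairs (value, tangent). The sketch below is our illustration, not part of the paper's formal development (the names d_add, d_mul, d_sig and d_neuron are ours); it computes a partial derivative of the neuron interpretation above by seeding one weight's tangent with 1:

```python
import math

def d_add(a, b):   # mirrors the clause for D(t + s)
    (x, dx), (y, dy) = a, b
    return (x + y, dx + dy)

def d_mul(a, b):   # mirrors the clause for D(t * s)
    (x, dx), (y, dy) = a, b
    return (x * y, x * dy + dx * y)

def d_sig(a):      # mirrors the clause for D(sigma(t)), with sigma'(x) = sigma(x)(1 - sigma(x))
    (x, dx) = a
    y = 1.0 / (1.0 + math.exp(-x))
    return (y, dx * y * (1.0 - y))

def d_neuron(xs, ws, b):
    """Dual-number image of sigma(x . w + b), on pairs (value, tangent)."""
    acc = (0.0, 0.0)
    for xi, wi in zip(xs, ws):
        acc = d_add(acc, d_mul(xi, wi))
    return d_sig(d_add(acc, b))

# partial derivative w.r.t. the first weight: seed its tangent with 1
x = [(0.5, 0.0), (-1.0, 0.0)]
w = [(0.2, 1.0), (0.4, 0.0)]
val, dw1 = d_neuron(x, w, (0.1, 0.0))
```

Here dw1 equals x_1 · ς(z)(1 − ς(z)) for z = x·w + b, as predicted by the chain rule.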
Inside this category, we can straightforwardly study the first order language without λ's, and automatic differentiation. In fact, we can prove the following by plain induction on the syntax:

  The interpretation of the (syntactic) forward AD →D(t) of a first-order term t equals the usual (semantic) derivative of the interpretation of t as a smooth function.

However, as is well known, the category CartSp does not support function spaces. To see this, notice that we have polynomial terms x_1, ..., x_d : real ⊢ λy. Σ_{n=1}^d x_n ∗ yⁿ : real → real for each d, and so if we could interpret (real → real) as a Euclidean space Rᵖ then, by interpreting these polynomial expressions, we would be able to find continuous injections Rᵈ → Rᵖ for every d, which is topologically impossible for any p, for example as a consequence of the Borsuk-Ulam theorem (see [15], Appx. A).

This means that we cannot interpret the functions (layer) and (comp) from (1) in CartSp, as they are higher order functions, even though they are very useful and innocent building blocks for differential programming! Clearly, we could define neural nets such as (1) directly as smooth functions without any higher order subcomponents, though that would quickly become cumbersome for deep networks. A problematic consequence of the lack of a semantics for higher order differential programs is that we have no obvious way of establishing compositional semantic correctness of →D for the given implementation of (1).

Diffeological spaces. This motivates us to turn to a more general notion of differential geometry for our semantics, based on diffeological spaces [16]. The key idea will be that a higher order function is called smooth if it sends smooth functions to smooth functions, meaning that we can never use it to build first order functions that are not smooth. For example, (comp) in (1) has this property.

Definition 1.
A diffeological space (X, P_X) consists of a set X together with, for each n and each open subset U of Rⁿ, a set P_X^U ⊆ [U → X] of functions, called plots, such that
– all constant functions are plots;
– if f : V → U is a smooth function and p ∈ P_X^U, then f; p ∈ P_X^V;
– if (p_i ∈ P_X^{U_i})_{i∈I} is a compatible family of plots (x ∈ U_i ∩ U_j ⇒ p_i(x) = p_j(x)) and (U_i)_{i∈I} covers U, then the gluing p : U → X, defined by p(x) = p_i(x) for x ∈ U_i, is a plot.

We call a function f : X → Y between diffeological spaces smooth if, for all plots p ∈ P_X^U, we have that p; f ∈ P_Y^U. We write Diff(X, Y) for the set of smooth maps from X to Y. Smooth functions compose, and so we have a category Diff of diffeological spaces and smooth functions.

A diffeological space is thus a set equipped with structure. Many constructions of sets carry over straightforwardly to diffeological spaces.

Example 3 (Cartesian diffeologies). Each open subset U of Rⁿ can be given the structure of a diffeological space by taking all the smooth functions V → U as P_U^V. It is easily seen that smooth functions from V → U in the traditional sense coincide with smooth functions in the sense of diffeological spaces. Thus diffeological spaces have a profound relationship with ordinary calculus. In categorical terms, this gives a full embedding of CartSp in Diff.

Example 4 (Product diffeologies). Given a family (X_i)_{i∈I} of diffeological spaces, we can equip the product Π_{i∈I} X_i of sets with the product diffeology, in which U-plots are precisely the functions of the form (p_i)_{i∈I} for p_i ∈ P_{X_i}^U. This gives us the categorical product in Diff.

Example 5 (Functional diffeology). We can equip the set Diff(X, Y) of smooth functions between diffeological spaces with the functional diffeology, in which U-plots consist of functions f : U → Diff(X, Y) such that (u, x)
↦ f(u)(x) is an element of Diff(U × X, Y). This specifies the categorical function object in Diff.

Semantics and correctness of AD. We can now give a denotational semantics to our language from §2. We interpret each type τ as a set ⟦τ⟧ equipped with the relevant diffeology, by induction on the structure of types:

  ⟦real⟧ ≝ R    ⟦(τ_1 ∗ ... ∗ τ_n)⟧ ≝ Π_{i=1}^n ⟦τ_i⟧    ⟦τ → σ⟧ ≝ Diff(⟦τ⟧, ⟦σ⟧)

A context Γ = (x_1 : τ_1 ... x_n : τ_n) is interpreted as a diffeological space ⟦Γ⟧ = Π_{i=1}^n ⟦τ_i⟧. Now well typed terms Γ ⊢ t : τ are interpreted as smooth functions ⟦t⟧ : ⟦Γ⟧ → ⟦τ⟧, giving a meaning for t for every valuation of the context. This is routinely defined by induction on the structure of typing derivations. Constants c : real are interpreted as constant functions, and the first order operations (+, ∗, ς) are interpreted by composing with the corresponding functions, which are smooth. For example, ⟦ς(t)⟧(ρ) ≝ ς(⟦t⟧(ρ)), where ρ ∈ ⟦Γ⟧. Variables are interpreted as ⟦x_i⟧(ρ) ≝ ρ_i. The remaining constructs are interpreted as follows, and it is straightforward to show that smoothness is preserved.

  ⟦⟨t_1, ..., t_n⟩⟧(ρ) ≝ (⟦t_1⟧(ρ), ..., ⟦t_n⟧(ρ))    ⟦λx:τ.t⟧(ρ)(a) ≝ ⟦t⟧(ρ, a)  (a ∈ ⟦τ⟧)
  ⟦case t of ... → s⟧(ρ) ≝ ⟦s⟧(ρ, ⟦t⟧(ρ))    ⟦t s⟧(ρ) ≝ ⟦t⟧(ρ)(⟦s⟧(ρ))

Notice that a term x_1 : real, ..., x_n : real ⊢ t : real is interpreted as a smooth function ⟦t⟧ : Rⁿ → R, even if t involves higher order functions (like (1)). Moreover the macro differentiation →D(t) is interpreted as a function ⟦→D(t)⟧ : (R × R)ⁿ → (R × R). This enables us to state a limited version of our main correctness theorem:

Theorem 1 (Semantic correctness of →D (limited)). For any term x_1 : real, ..., x_n : real ⊢ t : real, the function ⟦→D(t)⟧ is the dual numbers representation (2) of ⟦t⟧. In detail: for any smooth functions f_1 ... f_n : R → R,

  (f_1, ∇f_1, ..., f_n, ∇f_n); ⟦→D(t)⟧ = ((f_1 ... f_n); ⟦t⟧, ∇((f_1 ... f_n); ⟦t⟧)).

(For instance, if n = 2, then ⟦→D(t)⟧(x_1, 1, x_2, 0) = (⟦t⟧(x_1, x_2), (∂⟦t⟧(x, x_2)/∂x)(x_1)).)

Proof. We prove this by logical relations. Although the following proof is elementary, we found it by using the categorical methods in §5. For each type τ, we define a binary relation S_τ between curves in ⟦τ⟧ and curves in ⟦→D(τ)⟧, i.e. S_τ ⊆ P_{⟦τ⟧}^R × P_{⟦→D(τ)⟧}^R, by induction on τ:
– S_real ≝ {(f, (f, ∇f)) | f : R → R smooth};
– S_{(τ∗σ)} ≝ {((f_1, g_1), (f_2, g_2)) | (f_1, f_2) ∈ S_τ, (g_1, g_2) ∈ S_σ};
– S_{τ→σ} ≝ {(f_1, f_2) | ∀(g_1, g_2) ∈ S_τ. (x ↦ f_1(x)(g_1(x)), x ↦ f_2(x)(g_2(x))) ∈ S_σ}.

Then, we establish the following 'fundamental lemma':

  If x_1 : τ_1, ..., x_n : τ_n ⊢ t : σ and, for all 1 ≤ i ≤ n, y_1 ... y_m : real ⊢ s_i : τ_i is such that ((f_1, ..., f_m); ⟦s_i⟧, (f_1, ∇f_1, ..., f_m, ∇f_m); ⟦→D(s_i)⟧) ∈ S_{τ_i} for all smooth f_j : R → R, then ((f_1, ..., f_m); ⟦t[s_1/x_1, ..., s_n/x_n]⟧, (f_1, ∇f_1, ..., f_m, ∇f_m); ⟦→D(t[s_1/x_1, ..., s_n/x_n])⟧) is in S_σ for all smooth f_j : R → R.

This is proved routinely by induction on the typing derivation of t. The case for ∗ relies on the precise definition of →D(t ∗ s), and similarly for +, ς. We conclude the theorem from the fundamental lemma by considering the case where τ_i = σ = real, m = n and s_i = y_i.

4 Extending the language: variant and inductive types

In this section, we show that the definition of forward AD and the semantics generalize if we extend the language of §2 with variants and inductive types. As an example of inductive types, we consider lists. This specific choice is only for expository purposes, and the whole development works at the level of generality of arbitrary algebraic data types generated as initial algebras of (polynomial) type constructors formed by finite products and variants.

Similarly, our choice of operations is for expository purposes. More generally, assume given a family of operations (Op_n)_{n∈N} indexed by their arity n. Further assume that each op ∈ Op_n has type realⁿ → real. We then ask for a certain closure of these operations under differentiation; that is, we define

  →D(op(t_1, ..., t_n)) ≝ case →D(t_1) of ⟨x_1, x′_1⟩ → ... → case →D(t_n) of ⟨x_n, x′_n⟩ → ⟨op(x_1, ..., x_n), Σ_{i=1}^n x′_i ∗ ∂_i op(x_1, ..., x_n)⟩

where ∂_i op(x_1, ..., x_n) is some chosen term in the language, involving free variables from x_1, ..., x_n, which we think of as implementing the partial derivative of op with respect to its i-th argument.
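The schema for an arbitrary n-ary op can likewise be mirrored at the value level. In the sketch below, lift_op is a hypothetical helper of our own (not part of the paper's development) that builds the dual-number version of an operation from functions implementing its partial derivatives, following the clause for →D(op(t_1, ..., t_n)):

```python
import math

def lift_op(op, partials):
    """Given op : R^n -> R and functions for its n partial derivatives,
    build the dual-number version following the schema for D(op(t1,...,tn)):
    on pairs (x_i, dx_i), return (op(xs), sum_i dx_i * d_i op(xs))."""
    def d_op(*duals):
        xs = [x for x, _ in duals]
        dxs = [dx for _, dx in duals]
        tangent = sum(dx * p(*xs) for dx, p in zip(dxs, partials))
        return (op(*xs), tangent)
    return d_op

# example: extend the operation family with exp and binary multiplication
d_exp = lift_op(math.exp, [math.exp])
d_mul = lift_op(lambda x, y: x * y,
                [lambda x, y: y, lambda x, y: x])

# d/dx exp(x*y) at (x, y) = (2, 3): seed x's tangent with 1
val, tan = d_exp(d_mul((2.0, 1.0), (3.0, 0.0)))
```

By the chain rule, tan here is y · e^{xy} = 3e⁶, matching the compositional reading of the schema.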
For constructing the semantics, every op must be interpreted by some smooth function, and, to establish correctness, the semantics of ∂_i op(x_1, ..., x_n) must be the semantic i-th partial derivative of the semantics of op(x_1, ..., x_n).

Language. We additionally consider the following types and terms:

  τ, σ, ρ ::= ... | list(τ)                           list
            | {ℓ_1 τ_1 | ... | ℓ_n τ_n}               variant

  t, s, r ::= ... | ℓ_i τ. t                          variant constructor
            | [] | t :: s                             empty list and cons
            | case t of {ℓ_1 x_1 → s_1 | ··· | ℓ_n x_n → s_n}    pattern matching: variants
            | fold (x_1, x_2).t over s from r         list fold

We extend the type system according to:
– if Γ ⊢ t : τ_i and (ℓ_i τ_i) ∈ τ, then Γ ⊢ ℓ_i τ. t : τ;
– Γ ⊢ [] : list(τ); and if Γ ⊢ t : τ and Γ ⊢ s : list(τ), then Γ ⊢ t :: s : list(τ);
– if Γ ⊢ t : {ℓ_1 τ_1 | ... | ℓ_n τ_n} and, for each 1 ≤ i ≤ n, Γ, x_i : τ_i ⊢ s_i : τ, then Γ ⊢ case t of {ℓ_1 x_1 → s_1 | ··· | ℓ_n x_n → s_n} : τ;
– if Γ ⊢ s : list(τ), Γ ⊢ r : σ and Γ, x_1 : τ, x_2 : σ ⊢ t : σ, then Γ ⊢ fold (x_1, x_2).t over s from r : σ.

We can then extend →D to our new types and terms by

  →D({ℓ_1 τ_1 | ... | ℓ_n τ_n}) ≝ {ℓ_1 →D(τ_1) | ... | ℓ_n →D(τ_n)}    →D(list(τ)) ≝ list(→D(τ))
  →D(ℓ τ. t) ≝ ℓ →D(τ). →D(t)    →D([]) ≝ []    →D(t :: s) ≝ →D(t) :: →D(s)
  →D(case t of {ℓ_1 x_1 → s_1 | ··· | ℓ_n x_n → s_n}) ≝ case →D(t) of {ℓ_1 x_1 → →D(s_1) | ··· | ℓ_n x_n → →D(s_n)}
  →D(fold (x_1, x_2).t over s from r) ≝ fold (x_1, x_2).→D(t) over →D(s) from →D(r)

To demonstrate the practical use of expressive type systems for differential programming, we consider the following two examples.

Example 6 (Lists of inputs for neural nets). Usually, we run a neural network on a large data set, the size of which might be determined at runtime. To evaluate a neural network on multiple inputs, in practice, one often sums the outcomes. This can be coded in our extended language as follows. Suppose that we have a network f : (realⁿ ∗ P) → real that operates on single input vectors.
We can construct one that operates on lists of inputs as follows:

  g ≝ λ⟨l, w⟩. fold (x_1, x_2). f⟨x_1, w⟩ + x_2 over l from 0 : (list(realⁿ) ∗ P) → real

Example 7 (Missing data). In practically every application of statistics and machine learning, we face the problem of missing data: for some observations, only partial information is available. In an expressive typed programming language like we consider, we can model missing data conveniently using the data type maybe(τ) = {Nothing () | Just τ}. In the context of a neural network, one might use it as follows. First, define some helper functions

  fromMaybe ≝ λx.λm. case m of {Nothing _ → x | Just x′ → x′}
  fromMaybe_n ≝ λ⟨x_1, ..., x_n⟩.λ⟨m_1, ..., m_n⟩. ⟨fromMaybe x_1 m_1, ..., fromMaybe x_n m_n⟩
      : realⁿ → (maybe(real))ⁿ → realⁿ
  map ≝ λf.λl. fold (x_1, x_2). f x_1 :: x_2 over l from [] : (τ → σ) → list(τ) → list(σ)

Given a neural network f : (list(realⁿ) ∗ P) → real, we can build a new one that operates on a data set for which some covariates (features) are missing, by passing in default values to replace the missing covariates:

  λ⟨l, m, w⟩. f⟨map (fromMaybe_n m) l, w⟩ : (list((maybe(real))ⁿ) ∗ (realⁿ ∗ P)) → real

Then, given a data set l with missing covariates, we can perform automatic differentiation on this network to optimize, simultaneously, the ordinary network parameters w and the default values for missing covariates m.

Semantics. In §3 we gave a denotational semantics for the simple language in diffeological spaces. This extends to the language in this section, as follows. As before, each type τ is interpreted as a diffeological space, which is a set equipped with a family of plots:
– A variant type {ℓ_1 τ_1 | ... | ℓ_n τ_n} is inductively interpreted as the disjoint union ⟦{ℓ_1 τ_1 | ... | ℓ_n τ_n}⟧ ≝ ∐_{i=1}^n ⟦τ_i⟧, whose U-plots are the functions p : U → ∐_{i=1}^n ⟦τ_i⟧ for which U decomposes as a disjoint union of opens U = ∐_{j=1}^n U_j such that p restricted to U_j is a plot f_j ∈ P_{⟦τ_j⟧}^{U_j} followed by the j-th injection.
– A list type list(τ) is interpreted as the set of lists, ⟦list(τ)⟧ ≝ ∐_{i∈N} ⟦τ⟧^i, whose U-plots are defined analogously: U decomposes as a disjoint union of opens U = ∐_{j∈N} U_j such that p restricted to U_j is a plot f_j ∈ P_{⟦τ⟧^j}^{U_j} followed by the j-th injection.

The constructors and destructors for variants and lists are interpreted as in the usual set theoretic semantics. It is routine to show inductively that these interpretations are smooth. Thus every term Γ ⊢ t : τ in the extended language is interpreted as a smooth function ⟦t⟧ : ⟦Γ⟧ → ⟦τ⟧ between diffeological spaces. (In this section we focused on a language with lists, but other inductive types are easily interpreted in the category of diffeological spaces in much the same way; the categorically minded reader may regard this as a consequence of Diff being a concrete Grothendieck quasitopos, e.g. [3].)

5 Categorical analysis of forward AD and its correctness

This section has three parts. First, we give a categorical account of the functoriality of AD (Ex. 8). Then we introduce our gluing construction, and relate it to the correctness of AD (dgm. (3)). Finally, we state and prove a correctness theorem for all first order types by considering a category of manifolds (Th. 2).

Syntactic categories. Our language induces a syntactic category as follows.

Definition 2. Let Syn be the category whose objects are types, and where a morphism τ → σ is a term in context x : τ ⊢ t : σ modulo the βη-laws (Fig. 4). Composition is by substitution.

For simplicity, we do not impose arithmetic identities such as x + y = y + x in Syn. As is standard, this category has the following universal property.

Lemma 2 (e.g. [27]). For every bicartesian closed category C with list objects, and every object F(real) ∈ C and morphisms F(c) ∈ C(1, F(real)), F(+), F(∗) ∈ C(F(real) × F(real), F(real)), F(ς) ∈ C(F(real), F(real)), there is a unique functor F : Syn → C respecting the interpretation and preserving the bicartesian closed structure as well as list objects.

Proof (notes).
The functor F : Syn → C is a canonical denotational semantics for the language, interpreting types as objects of C and terms as morphisms. For instance, F(τ → σ) ≝ (Fτ → Fσ), the function space in the category C, and F(t s) is the composite (Ft, Fs); eval. When C = Diff, the denotational semantics of the language in diffeological spaces (§3, 4) can be understood as the unique structure preserving functor ⟦−⟧ : Syn → Diff satisfying ⟦real⟧ = R, ⟦ς⟧ = ς, and so on.

Example 8 (Canonical definition of forward AD). The forward AD macro →D (§2, 4) arises as a canonical cartesian closed functor on Syn. Consider the unique cartesian closed functor F : Syn → Syn such that F(real) = real ∗ real, F(c) = →D(c), F(ς) = →D(ς(x)), and

  F(+) = z : F(real) ∗ F(real) ⊢ case z of ⟨x, y⟩ → →D(x + y) : F(real)
  F(∗) = z : F(real) ∗ F(real) ⊢ case z of ⟨x, y⟩ → →D(x ∗ y) : F(real)

Then for any type τ, F(τ) = →D(τ), and for any term x : τ ⊢ t : σ, F(t) = →D(t) as morphisms F(τ) → F(σ) in the syntactic category.

Categorical gluing and logical relations. Gluing is a method for building new categorical models which has been used for many purposes, including logical relations and realizability [24]. Our logical relations argument in the proof of Th. 1 can be understood in this setting. In this subsection, for the categorically minded, we explain this, and in doing so we quickly recover a correctness result for the more general language in §4 and for arbitrary first order types.

We define a category Gl whose objects are triples (X, X′, S) where X and X′ are diffeological spaces and S ⊆ P_X^U × P_{X′}^U is a relation between their U-plots.
A morphism (X, X′, S) → (Y, Y′, T) is a pair of smooth functions f : X → Y, f′ : X′ → Y′, such that if (g, g′) ∈ S then (g; f, g′; f′) ∈ T. The idea is that this is a semantic domain where we can simultaneously interpret the language and its automatic derivatives.

Fig. 4. Standard βη-laws (e.g. [27]) for products, functions, variants and lists. (The annotation =_{#x_1,...,x_n} indicates that the variables x_1, ..., x_n are free in the left hand side.)

Proposition 1. The category Gl is bicartesian closed, has list objects, and the projection functor proj : Gl → Diff × Diff preserves this structure.

Proof (notes). The category Gl is a full subcategory of the comma category id_Set ↓ Diff(U, −) × Diff(U, −). The result thus follows by the general theory of categorical gluing (e.g. [17, Lemma 15]).

We give a semantics ⟦−⟧ = (⟦−⟧_0, ⟦−⟧_1, S_{⟦−⟧}) for the language in Gl, interpreting types τ as objects (⟦τ⟧_0, ⟦τ⟧_1, S_τ), and terms as morphisms. We let ⟦real⟧_0 ≝ R and ⟦real⟧_1 ≝ R², with the relation S_real ≝ {(f, (f, ∇f)) | f : R → R smooth}. We interpret the constants c as pairs ⟦c⟧_0 ≝ c and ⟦c⟧_1 ≝ (c, 0), and we interpret +, ∗, ς in the standard way (meaning, like ⟦−⟧) in ⟦−⟧_0, but according to the derivatives in ⟦−⟧_1; for instance, ⟦∗⟧_1 : R² × R² → R² is

  ⟦∗⟧_1((x, x′), (y, y′)) ≝ (xy, x y′ + x′ y).

At this point one checks that these interpretations are indeed morphisms in Gl.
This amounts to checking that these interpretations are dual numbers representations in the sense of (2). The remaining constructions of the language are interpreted using the categorical structure of Gl, following Lem. 2. Notice that the diagram below commutes. One can check this by hand or note that it follows from the initiality of Syn (Lem. 2): all the functors preserve all the structure.

  Syn −−(id, →D(−))−→ Syn × Syn
   ⟦−⟧ ↓                 ↓ ⟦−⟧ × ⟦−⟧        (3)
  Gl −−−−proj−−−−→ Diff × Diff

We thus arrive at a restatement of the correctness theorem (Th. 1), which holds even for the extended language with variants and lists, because for any x_1 ... x_n : real ⊢ t : real, the interpretations (⟦t⟧, ⟦→D(t)⟧) are in the image of the projection Gl → Diff × Diff, and hence ⟦→D(t)⟧ is a dual numbers encoding of ⟦t⟧.

Correctness at all first order types, via manifolds. We now generalize Theorem 1 to hold at all first order types, not just the reals. To do this, we need to define the derivative of a smooth map between the interpretations of first order types. We do this by recalling the well known theory of manifolds and tangent bundles.

For our purposes, a smooth manifold M is a second-countable Hausdorff topological space together with a smooth atlas: an open cover 𝒰 together with homeomorphisms (φ_U : U → R^{n(U)})_{U∈𝒰} (called charts) such that φ_U^{−1}; φ_V is smooth on its domain of definition for all U, V ∈ 𝒰. A function f : M → N between manifolds is smooth if φ_U^{−1}; f; ψ_V is smooth for all charts φ_U and ψ_V of M and N, respectively. Let us write Man for this category. Our manifolds are slightly unusual because different charts in an atlas may have different finite dimension n(U). Thus we consider manifolds with dimensions that are potentially unbounded, albeit locally finite. This does not affect the theory of differential geometry as far as we need it here. Each open subset of Rⁿ can be regarded as a manifold.
This lets us regard the category of manifolds Man as a full subcategory of the category of diffeological spaces. We consider a manifold (X, {φ_U}_U) as a diffeological space with the same carrier set X and where the plots P^U are the smooth functions in Man(U, X). A function X → Y is smooth in the sense of manifolds if and only if it is smooth in the sense of diffeological spaces [16]. For the categorically minded reader, this means that we have a full embedding of Man into Diff. Moreover, the natural interpretation of the first order fragment of our language in Man coincides with that in Diff. That is, the embedding of Man into Diff preserves finite products and countable coproducts (hence initial algebras of polynomial endofunctors).

Proposition 2. Suppose that a type τ is first order, i.e. it is built just from reals, products, variants, and lists (or, again, arbitrary inductive types), and not function types. Then the diffeological space ⟦τ⟧ is a manifold.

Proof (notes). This is proved by induction on the structure of types. In fact, one may show that every such ⟦τ⟧ is isomorphic to a manifold of the form ∐_{i=1}^n R^{k_i}, where the bound n is either finite or ∞, but this isomorphism is typically not an identity function. The constraint to first order types is necessary because, e.g., the space ⟦real → real⟧ is not a manifold, because of a Borsuk-Ulam argument (see [15], Appx. A).

We recall that the derivative of any morphism f : M → N of manifolds is given as follows. For each point x in a manifold M, define the tangent space T_x M to be the set {γ ∈ Man(R, M) | γ(0) = x}/∼ of equivalence classes [γ] of smooth curves γ in M based at x, where we identify γ_1 ∼ γ_2 iff ∇(γ_1; f)(0) = ∇(γ_2; f)(0) for all smooth f : M → R. The tangent bundle of M is the set T(M) ≝ ∐_{x∈M} T_x(M). The charts of M equip T(M) with a canonical manifold structure. Then for smooth f : M → N, the derivative T(f) : T(M) → T(N) is defined as T(f)(x, [γ]) ≝ (f(x), [γ; f]).
All told, the derivative is a functor T : Man → Man. As is standard, we can understand the tangent bundle of a composite space in terms of that of its parts.

Lemma 3. There are canonical isomorphisms T(∐_{i=1}^∞ M_i) ≅ ∐_{i=1}^∞ T(M_i) and T(M_1 × ... × M_n) ≅ T(M_1) × ... × T(M_n).

We define a canonical isomorphism φ^{→D,T}_τ : ⟦→D(τ)⟧ → T(⟦τ⟧) for every type τ, by induction on the structure of types. We let φ^{→D,T}_real : ⟦→D(real)⟧ → T(⟦real⟧) be given by φ^{→D,T}_real(x, x′) ≝ (x, [t ↦ x + x′t]). For the other types, we use Lemma 3. We can now phrase correctness at all first order types.

Theorem 2 (Semantic correctness of →D (full)). For any ground τ, any first order context Γ and any term Γ ⊢ t : τ, the syntactic translation →D coincides with the tangent bundle functor, modulo these canonical isomorphisms: the square

  ⟦→D(Γ)⟧ −−⟦→D(t)⟧−→ ⟦→D(τ)⟧
   φ^{→D,T} ↓≅              ≅↓ φ^{→D,T}
  T(⟦Γ⟧) −−−T(⟦t⟧)−−→ T(⟦τ⟧)

commutes.

Proof (notes). For any curve γ ∈ Man(R, M), let γ̄ ∈ Man(R, T(M)) be the tangent curve, given by γ̄(x) = (γ(x), [t
↦ γ(x + t)]). First, we note that a smooth map h : T(M) → T(N) is of the form T(g) for some g : M → N if for all smooth curves γ : R → M we have γ̄; h = (γ; g)‾ : R → T(N). This generalizes (2). Second, for any first order type τ, S_τ = {(f, f̃) | f̃; φ^{→D,T} = f̄}. This is shown by induction on the structure of types. We conclude the theorem from diagram (3), by putting these two observations together.

6 A continuation-based AD algorithm

We now illustrate the flexibility of our framework by briefly describing an alternative syntactic translation ←D. This alternative translation uses aspects of continuation passing style, inspired by recent developments in reverse mode AD [34, 5]. In brief, ←D_ρ works by ←D_ρ(real) = (real ∗ (real → ρ)). Thus instead of using a pair of a number and its tangent, we use a pair of a number and a continuation. The answer type ρ = real^k needs to have the structure of a vector space, and the continuations that we consider will turn out to be linear maps. Because we work in continuation passing style, the chain rule is applied contravariantly. If the reader is familiar with reverse-mode AD algorithms, they may think of the dimension k as the number of memory cells used to store the result. Computing the whole gradient of a term x_1 : real, ..., x_k : real ⊢ t : real at once is then achieved by running ←D_k(t) on a k-tuple of basis vectors for real^k.

We define the continuation-based AD macro ←D_k on types and terms as the unique structure preserving functor Syn → Syn with ←D_k(real) = (real ∗ (real → real^k)) and

  ←D_k(c) ≝ ⟨c, λz.⟨0, ..., 0⟩⟩
  ←D_k(t + s) ≝ case ←D_k(t) of ⟨x, x′⟩ → case ←D_k(s) of ⟨y, y′⟩ → ⟨x + y, λz. x′ z + y′ z⟩
  ←D_k(t ∗ s) ≝ case ←D_k(t) of ⟨x, x′⟩ → case ←D_k(s) of ⟨y, y′⟩ → ⟨x ∗ y, λz. x′ (y ∗ z) + y′ (x ∗ z)⟩
  ←D_k(ς(t)) ≝ case ←D_k(t) of ⟨x, x′⟩ → let y = ς(x) in ⟨y, λz. x′ (y ∗ (1 − y) ∗ z)⟩.

Here, we use sugar x : real^k, y : real^k ⊢ x + y = case x of ⟨x_1, ..., x_k⟩ →
case y of ⟨y_1, ..., y_k⟩ → ⟨x_1 + y_1, ..., x_k + y_k⟩. (We could easily expand this definition by making D⃖_k preserve all other term and type formers, as we did for D⃗.) Note that the corresponding scheme for an arbitrary n-ary operation op would be (cf. the scheme for forward AD in §4)

  D⃖_k(op(t_1, ..., t_n)) := case D⃖_k(t_1) of ⟨x_1, x′_1⟩ → ... → case D⃖_k(t_n) of ⟨x_n, x′_n⟩ →
                              ⟨op(x_1, ..., x_n), λz. Σ_{i=1}^{n} x′_i (∂_i op(x_1, ..., x_n) ∗ z)⟩.

The idea is that D⃖_k(t) is a higher order function that simultaneously computes t (the forward pass) and defines as a continuation the reverse pass which computes the gradient. In order to actually run the algorithm, we need two auxiliary definitions

  lamR^k_real := λz. case z of ⟨x, x′⟩ → case x′ of ⟨x′_1, ..., x′_k⟩ → ⟨x, λy. ⟨x′_1 ∗ y, ..., x′_k ∗ y⟩⟩ : D⃗_k(real) → D⃖_k(real)
  evR^k_real  := λz. case z of ⟨x, x′⟩ → ⟨x, x′ 1⟩ : D⃖_k(real) → D⃗_k(real).

Here, D⃗_k is a macro on types (and terms) with exactly the same inductive definition as D⃗ except for the base case D⃗_k(real) = (real ∗ real^k). By noting that both D⃗_k and D⃖_k preserve all type formers, we can extend these definitions to all first order types τ: z : D⃗_k(τ) ⊢ lamR^k_τ(z) : D⃖_k(τ) and z : D⃖_k(τ) ⊢ evR^k_τ(z) : D⃗_k(τ). We can think of lamR^k_τ(z) as encoding k tangent vectors z : D⃗_k(τ) as a closure, so it is suitable for running D⃖_k(t) on, and of evR^k_τ(z) as actually evaluating the reverse pass defined by z : D⃖_k(τ) and returning the result as k tangent vectors. The idea is that given some x : τ ⊢ t : σ between first order types τ, σ, we run our continuation-based AD by running evR^k(D⃖_k(t)[lamR^k(z)/x]).

The correctness proof closely follows that for forward AD. In particular, one defines a binary logical relation real^{r,k} = (R, R × (R^k)^R, S^{r,k}_real), where

  S^{r,k}_real = {(f, x ↦ (f(x), y ↦ (∂_1 f(x) ∗ y, ..., ∂_k f(x) ∗ y))) | f ∈ P^R},

on the plots P^R × P^{R×((R^k)^R)}, and verifies that ⟦c⟧ × ⟦D⃖_k(c)⟧, ⟦x + y⟧ × ⟦D⃖_k(x + y)⟧, ⟦x ∗ y⟧ × ⟦D⃖_k(x ∗ y)⟧ and ⟦ς(x)⟧ × ⟦D⃖_k(ς(x))⟧ respect this logical relation. It follows that this relation extends to a functor ⟦−⟧^{r,k} : Syn → Gl_k such that ⟦−⟧ × ⟦D⃖_k(−)⟧ factors over ⟦−⟧^{r,k}, implying the correctness of the continuation-based AD by the following lemma.

Lemma 4. For all first order types τ (i.e. types not involving function types), we have that evR^k_τ(lamR^k_τ(t)) = t.

Proof (notes). This follows by an induction on the structure of τ. The idea is that lamR embeds reals into function spaces as linear maps, which is undone by evR by evaluating the linear maps at 1. □

To phrase correctness in this setting, however, we need a few definitions. Keeping in mind the canonical projection T(M) → M, we define T_k(M) as the k-fold categorical pullback (fibre product) T(M) ×_M ... ×_M T(M). To be explicit, T_k(M) consists of k-tuples of tangent vectors at the same base point x. Again, T_k extends to a functor Man → Man by defining T_k(f)(x, (v_1, ..., v_k)) := (f(x), (T_x(f)(v_1), ..., T_x(f)(v_k))).
Correctness of AD via Diffeologies and Categorical Gluing 335
As T_k preserves countable coproducts and finite products (like T), it follows that the isomorphisms φ^{D⃗T} generalize to canonical isomorphisms φ^{D⃗T}_{τ,k} : ⟦D⃗_k(τ)⟧ → T_k(⟦τ⟧) for first order types τ. This leads to the following correctness statement for continuation-based AD.

Theorem 3 (Semantic correctness of D⃖_k). For any ground type τ, any first order context Γ and any term Γ ⊢ t : τ, the syntactic translation t ↦ evR^k_τ(D⃖_k(t)[lamR^k_Γ(z)/…]) coincides with the tangent bundle functor, modulo these canonical isomorphisms: the square

  ⟦D⃗_k(Γ)⟧ --⟦lamR^k_Γ ; D⃖_k(t) ; evR^k_τ⟧--> ⟦D⃗_k(τ)⟧
   φ^{D⃗T}_{Γ,k} ≅ ↓                              ↓ ≅ φ^{D⃗T}_{τ,k}
  T_k(⟦Γ⟧) -------------T_k(⟦t⟧)-------------> T_k(⟦τ⟧)

commutes.

For example, when τ = real and Γ = x, y : real, we can run our continuation-based AD to compute the gradient of a program x, y : real ⊢ t : real at values x = V, y = W by evaluating

  evR^2_real(D⃖_2(t)[(lamR^2 v)/x, (lamR^2 w)/y])[⟨V, 1, 0⟩/v, ⟨W, 0, 1⟩/w].

Indeed,

  evR^2_real(D⃖_2(t)[(lamR^2 v)/x, (lamR^2 w)/y])[⟨V, 1, 0⟩/v, ⟨W, 0, 1⟩/w] = ⟨t(V, W), ∂_1 t(V, W), ∂_2 t(V, W)⟩.

7 Discussion and future work

Summary. We have shown that diffeological spaces provide a denotational semantics for a higher order language with variants and inductive types (§3, 4). We have used this to show correctness of a simple AD translation (Thm. 1, Thm. 2). But the method is not tied to this specific translation, as we illustrated in Section 6. The structure of our elementary correctness argument for Theorem 1 is a typical logical relations proof. As explained in Section 5, this can equivalently be understood as a denotational semantics in a new kind of space obtained by categorical gluing. Overall, then, there are two logical relations at play. One is in diffeological spaces, which ensures that all definable functions are smooth. The other is in the correctness proof (equivalently in the categorical gluing), which explicitly tracks the derivative of each function, and tracks the syntactic AD even at higher types.

Connection to the state of the art in AD implementation. As is common in denotational semantics research, we have here focused on an idealized language and simple translations to illustrate the main aspects of the method. There are a number of points where our approach is simplistic compared to the advanced current practice, as we now explain.

Representation of vectors.
In our examples we have treated n-vectors as tuples of length n. This style of programming does not scale to large n. A better solution would be to use array types, following [31]. Our categorical semantics and correctness proofs straightforwardly extend to cover them, in a similar way to our treatment of lists.

Efficient forward-mode AD. For AD to be useful, it must be fast. The syntactic translation D⃗ that we use is the basis of an efficient AD library [31]. However, numerous optimizations are needed, ranging from algebraic manipulations, to partial evaluations, to the use of an optimizing C compiler. The resulting implementation is performant in experiments [31]. A topic for future work would be to validate some of these manipulations using our semantics.

Efficient reverse-mode AD. Our sketch of continuation-based AD is primarily intended to emphasise that our denotational approach is not tied to any specific translation D⃗. Nonetheless, it is worth noting that this algorithm shares similarities with advanced reverse-mode implementations: (1) it calculates derivatives in a (contravariant) "reverse pass", in which derivatives of operations are evaluated in the reverse order compared to their order in calculating the function value; (2) it can be used to calculate the full gradient of a function R^n → R in a single reverse pass (while n passes of forward AD would be necessary). However, it lacks important optimizations, and the continuation scales with the size of the input n where it should scale with the size of the output. This adds an important overhead, as pointed out in [26]. Speed being the main attraction of reverse-mode AD, its implementations tend to rely on mutable state, control operators and/or staging [26, 6, 34, 5], which we have not considered here.

Other language features. The idealized languages that we considered so far do not touch on several useful language constructs.
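As an aside, the continuation-based translation of Section 6 can be rendered concretely in a few lines of untyped Python. This is our own illustrative sketch, not the paper's formal macro: a "real" becomes a pair of a value and a linear continuation into R^k (here a tuple), and the gradient is read off by running the continuation on 1, playing the role of evR after seeding the inputs with basis continuations, the role of lamR.

```python
import math

# A "real" is a pair (value, kont), where kont is a linear map real -> real^k,
# represented as a Python function returning a k-tuple.

def add_vec(u, v):
    return tuple(a + b for a, b in zip(u, v))

def const(c, k):
    return (c, lambda z: (0.0,) * k)           # D(c) = <c, \z.<0,...,0>>

def plus(t, s):
    (x, xk), (y, yk) = t, s
    return (x + y, lambda z: add_vec(xk(z), yk(z)))

def times(t, s):
    (x, xk), (y, yk) = t, s
    return (x * y, lambda z: add_vec(xk(y * z), yk(x * z)))

def sigmoid(t):
    (x, xk) = t
    y = 1.0 / (1.0 + math.exp(-x))
    return (y, lambda z: xk(y * (1.0 - y) * z))

def grad2(f, V, W):
    # Seed the two inputs with basis continuations (the role of lamR),
    # then evaluate the reverse pass at 1 (the role of evR).
    x = (V, lambda z: (z, 0.0))
    y = (W, lambda z: (0.0, z))
    val, kont = f(x, y)
    return val, kont(1.0)

# For t = x*y + x: d/dx t = y + 1 and d/dy t = x.
val, g = grad2(lambda x, y: plus(times(x, y), x), 3.0, 4.0)
# val == 15.0, g == (5.0, 3.0)
```

Note how the whole gradient is obtained in one pass: the single continuation built during evaluation accumulates both partial derivatives.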
For example: the use of functions that are partial (such as division) or partly smooth (such as ReLU); phenomena such as iteration, recursion, and probabilities. There are suggestions that the denotational approach using diffeological spaces can be adapted to these features using standard categorical methods. We leave this for future work.

Acknowledgements. We have benefited from discussing this work with many people, including B. Pearlmutter, O. Kammar, C. Mak, L. Ong, G. Plotkin, A. Shaikhha, J. Sigal, and others. Our work is supported by the Royal Society and by a Facebook Research Award. In the course of this work, MV has also been employed at Oxford (EPSRC Project EP/M023974/1) and at Columbia in the Stan development team. This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 895827.

References

1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: TensorFlow: A system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). pp. 265–283 (2016)
2. Abadi, M., Plotkin, G.D.: A simple differentiable programming language. In: Proc. POPL 2020. ACM (2020)
3. Baez, J., Hoffnung, A.: Convenient categories of smooth spaces. Transactions of the American Mathematical Society 363(11), 5789–5825 (2011)
4. Barthe, G., Crubillé, R., Lago, U.D., Gavazzo, F.: On the versatility of open logical relations: Continuity, automatic differentiation, and a containment theorem. In: Proc. ESOP 2020. Springer (2020), to appear
5. Brunel, A., Mazza, D., Pagani, M.: Backpropagation in the simply typed lambda-calculus with linear negation. In: Proc. POPL 2020 (2020)
6.
Carpenter, B., Hoffman, M.D., Brubaker, M., Lee, D., Li, P., Betancourt, M.: The Stan math library: Reverse-mode automatic differentiation in C++. arXiv preprint arXiv:1509.07164 (2015)
7. Christensen, J.D., Wu, E.: Tangent spaces and tangent bundles for diffeological spaces. arXiv preprint arXiv:1411.5425 (2014)
8. Cockett, J.R.B., Cruttwell, G.S.H., Gallagher, J., Lemay, J.S.P., MacAdam, B., Plotkin, G.D., Pronk, D.: Reverse derivative categories. In: Proc. CSL 2020 (2020)
9. Cruttwell, G., Gallagher, J., MacAdam, B.: Towards formalizing and extending differential programming using tangent categories. In: Proc. ACT 2019 (2019)
10. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12(Jul), 2121–2159 (2011)
11. Ehrhard, T., Regnier, L.: The differential lambda-calculus. Theoretical Computer Science 309(1-3), 1–41 (2003)
12. Elliott, C.: The simple essence of automatic differentiation. Proceedings of the ACM on Programming Languages 2(ICFP), 70 (2018)
13. Fong, B., Spivak, D., Tuyéras, R.: Backprop as functor: A compositional perspective on supervised learning. In: 2019 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS). pp. 1–13. IEEE (2019)
14. Hoffman, M.D., Gelman, A.: The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research 15(1), 1593–1623 (2014)
15. Huot, M., Staton, S., Vákár, M.: Correctness of automatic differentiation via diffeologies and categorical gluing. Full version (2020), arxiv:2001.02209
16. Iglesias-Zemmour, P.: Diffeology. American Mathematical Soc. (2013)
17. Johnstone, P.T., Lack, S., Sobocinski, P.: Quasitoposes, quasiadhesive categories and Artin glueing. In: Proc. CALCO 2007 (2007)
18. Kiefer, J., Wolfowitz, J., et al.: Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics 23(3), 462–466 (1952)
19.
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
20. Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., Blei, D.M.: Automatic differentiation variational inference. The Journal of Machine Learning Research 18(1), 430–474 (2017)
21. Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Mathematical Programming 45(1-3), 503–528 (1989)
22. Mak, C., Ong, L.: A differential-form pullback programming language for higher-order reverse-mode automatic differentiation (2020), arxiv:2002.08241
23. Manzyuk, O.: A simply typed λ-calculus of forward automatic differentiation. In: Proc. MFPS 2012 (2012)
24. Mitchell, J.C., Scedrov, A.: Notes on sconing and relators. In: International Workshop on Computer Science Logic. pp. 352–378. Springer (1992)
25. Neal, R.M., et al.: MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo 2(11), 2 (2011)
26. Pearlmutter, B.A., Siskind, J.M.: Reverse-mode AD in a functional framework: Lambda the ultimate backpropagator. ACM Transactions on Programming Languages and Systems (TOPLAS) 30(2), 7 (2008)
27. Pitts, A.M.: Categorical logic. Tech. rep., University of Cambridge, Computer Laboratory (1995)
28. Plotkin, G.D.: Some principles of differential programming languages (2018), invited talk, POPL 2018
29. Qian, N.: On the momentum term in gradient descent learning algorithms. Neural Networks 12(1), 145–151 (1999)
30. Robbins, H., Monro, S.: A stochastic approximation method. The Annals of Mathematical Statistics pp. 400–407 (1951)
31. Shaikhha, A., Fitzgibbon, A., Vytiniotis, D., Peyton Jones, S.: Efficient differentiable programming in a functional array-processing language. Proceedings of the ACM on Programming Languages 3(ICFP), 97 (2019)
32. Souriau, J.M.: Groupes différentiels. In: Differential geometrical methods in mathematical physics, pp. 91–128. Springer (1980)
33. Stacey, A.: Comparative smootheology.
Theory Appl. Categ. 25(4), 64–117 (2011)
34. Wang, F., Wu, X., Essertel, G., Decker, J., Rompf, T.: Demystifying differentiable programming: Shift/reset the penultimate backpropagator. Proceedings of the ACM on Programming Languages 3(ICFP) (2019)

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Deep Induction: Induction Rules for (Truly) Nested Types

Patricia Johann and Andrew Polonsky
Appalachian State University, Boone, NC, USA
johannp@appstate.edu, polonskya@appstate.edu

Abstract. This paper introduces deep induction, and shows that it is the notion of induction most appropriate to nested types and other data types defined over, or mutually recursively with, (other) such types. Standard induction rules induct over only the top-level structure of data, leaving any data internal to the top-level structure untouched. By contrast, deep induction rules induct over all of the structured data present. We give a grammar generating a robust class of nested types (and thus ADTs), and develop a fundamental theory of deep induction for them using their recently defined semantics as fixed points of accessible functors on locally presentable categories.
We then use our theory to derive deep induction rules for some common ADTs and nested types, and show how these rules specialize to give the standard structural induction rules for these types. We also show how deep induction specializes to solve the long-standing problem of deriving principled and practically useful structural induction rules for bushes and other truly nested types. Overall, deep induction opens the way to making induction principles appropriate to richly structured data types available in programming languages and proof assistants. Agda implementations of our development and examples, including two extended case studies, are available.

1 Introduction

This paper is concerned with the problem of inductive reasoning about inductive data types that are defined over, or are defined mutually recursively with, (other) such data types. Examples of such deep data types include, trivially, ordinary algebraic data types (ADTs), such as list and tree types; data types, such as the forest type, whose recursive occurrences appear below other type constructors; simple nested types, such as the type of perfect trees, whose recursive occurrences never appear below their own type constructors; truly nested types, such as the type of bushes (also called bootstrapped heaps by Okasaki [16]), whose recursive occurrences do appear below their own type constructors; and GADTs. Proof assistants, including Coq and Agda, currently provide insufficient support for performing induction over deep data types. The induction rules, if any, they generate for such types induct over only their top-level structures, leaving any data internal to the top-level structure untouched. This paper develops a principle that, by contrast, inducts over all of the structured data present. We call this principle deep induction.
Deep induction not only provides general support for solving problems that previously had only (usually quite painful and) ad hoc solutions, but also opens the way for incorporating automatic generation of useful induction rules for deep data types into proof assistants. (Nested types that are defined over themselves are known as truly nested types.)

© The Author(s) 2020. J. Goubault-Larrecq and B. König (Eds.): FOSSACS 2020, LNCS 12077, pp. 339–358, 2020. https://doi.org/10.1007/978-3-030-45231-5_18
340 P. Johann and A. Polonsky

To illustrate the difference between structural induction and deep induction, note that the data inside a structure of type List a = Nil | Cons a (List a) is treated monolithically (i.e., ignored) by the structural induction rule for lists:

  ∀ (a : Set) (P : List a → Set) →
    P Nil →
    (∀ (x : a) (xs : List a) → P xs → P (Cons x xs)) →
    ∀ (xs : List a) → P xs

By contrast, the deep induction rule for lists traverses not just the outer list structure with a predicate P, but also each data element of that list with a custom predicate Q:

  ∀ (a : Set) (P : List a → Set) (Q : a → Set) →
    P Nil →
    (∀ (x : a) (xs : List a) → Q x → P xs → P (Cons x xs)) →
    ∀ (xs : List a) → List Q xs → P xs

Here, List lifts its argument predicate Q on data of type a to a predicate on data of type List a asserting that Q holds for every element of its argument list. The structural induction rule for lists is, like that for any ADT, recovered by taking the custom predicate in the corresponding deep rule to be λx. True.

A particular advantage of deep induction is that it obviates the need to reflect properties as data types. For example, although the set of primes cannot be defined by an ADT, the primeness predicate Prime on the ADT of natural numbers can be lifted to a predicate List Prime characterizing lists of primes.
Properties can then be proved for lists of primes using the following deep induction rule:

  ∀ (P : List Nat → Set) →
    P Nil →
    (∀ (x : Nat) (xs : List Nat) → Prime x → P xs → P (Cons x xs)) →
    ∀ (xs : List Nat) → List Prime xs → P xs

As we'll see in Sections 3, 4, and 5, the extra flexibility afforded by lifting predicates like Q and Prime on data internal to a structure makes it possible to derive useful induction principles for more complex types, such as truly nested ones.

In each of the above examples, a predicate on the data is lifted to a predicate on the list. This is an example of lifting a predicate on a type in a non-recursive position of an ADT's definition to the entire ADT. However, the predicate to be lifted can also be on the type in a recursive position of a definition — i.e., on the ADT being defined itself — and this ADT can appear below another type constructor in the definition. This is exactly the situation for the ADT Forest a, which appears below the type constructor List in the definition

  Forest a = FEmpty | FNode a (List (Forest a))

The induction rule Coq generates for forests is

  ∀ (a : Set) (P : Forest a → Set) →
    P FEmpty →
    (∀ (x : a) (ts : List (Forest a)) → P (FNode x ts)) →
    ∀ (x : Forest a) → P x

However, this is neither the induction rule we intuitively expect, nor is it expressive enough to prove even basic properties of forests that ought to be amenable to inductive proof. The approach of [11,12] does give the expected rule (equivalent to the rule as classically stated in Coq/Isabelle/HOL):

Deep induction 341

  ∀ (a : Set) (P : Forest a → Set) →
    P FEmpty →
    (∀ (x : a) (ts : List (Forest a)) → (∀ (k < length ts) → P (ts !! k)) → P (FNode x ts)) →
    ∀ (x : Forest a) → P x

But to derive it, a technique based on list positions is used to propagate the predicate to be proved over the list of forests that is the second argument to the data constructor FNode.
Unfortunately, this technique does not obviously extend to other deep data types, including the type of "generalized forests" introduced in Section 4.4 below, which combines smaller generalized forests into larger ones using a type constructor f potentially different from List. Nevertheless, replacing ∀ (k < length ts) → P (ts !! k) in the expected rule with List P ts, which is equivalent, reveals that it is nothing more than the special case for Q = λx. True of the following deep induction rule for Forest a:

  ∀ (a : Set) (P : Forest a → Set) (Q : a → Set) →
    P FEmpty →
    (∀ (x : a) (ts : List (Forest a)) → Q x → List P ts → P (FNode x ts)) →
    ∀ (x : Forest a) → Forest Q x → P x

When types, like Forest a and List (Forest a) above, are defined by mutual recursion, their (deep) induction rules are defined by mutual recursion as well. For example, the induction rules for the ADTs

  data Expr  = Lit Nat | Add Expr Expr | If BExpr Expr Expr
  data BExpr = BLit Bool | And BExpr BExpr | Not BExpr | Equal Expr Expr

of integer and boolean expressions are defined by mutual recursion as

  ∀ (P : Expr → Set) (Q : BExpr → Set) →
    (∀ (n : Nat) → P (Lit n)) →
    (∀ (e1 : Expr) (e2 : Expr) → P e1 → P e2 → P (Add e1 e2)) →
    (∀ (b : BExpr) (e1 : Expr) (e2 : Expr) → Q b → P e1 → P e2 → P (If b e1 e2)) →
    (∀ (b : Bool) → Q (BLit b)) →
    (∀ (b1 : BExpr) (b2 : BExpr) → Q b1 → Q b2 → Q (And b1 b2)) →
    (∀ (b : BExpr) → Q b → Q (Not b)) →
    (∀ (e1 : Expr) (e2 : Expr) → P e1 → P e2 → Q (Equal e1 e2)) →
    (∀ (e : Expr) → P e) × (∀ (b : BExpr) → Q b)

2 The Key Idea

As the examples of the previous section suggest, the key to deriving deep induction rules from (deep) data type declarations is to parameterize the induction rules not just over a predicate over the top-level data type being defined, but over predicates on the types of primitive data they contain as well.
These additional predicates are then lifted to predicates on any internal structures containing these data, and the resulting predicates on these internal structures are lifted to predicates on any internal structures containing structures at the previous level, and so on, until the internal structures at all levels of the data type definition, including the top level, have been so processed. Satisfaction of a predicate by the data at one level of a structure is then conditioned upon satisfaction of the appropriate predicates by all of the data at the preceding level. The above deep induction rules were all obtained using this technique.

For example, the deep induction rule for lists is derived by first noting that structures of type List a contain only data of type a, so that only one additional predicate parameter, which we called Q above, is needed. Then, since the only data structure internal to the type List a is List a itself, Q need only be lifted to lists containing data of type a. This is exactly what List Q does. Finally, the deep induction rule for lists is obtained by parameterizing the standard one over not just P but also Q, adding the additional hypothesis Q x to its second antecedent, and adding the additional hypothesis List Q xs to its conclusion.

The deep induction rule for forests is similarly obtained from the Coq-generated rule by first parameterizing over an additional predicate Q on the type a of data stored in the forest, then lifting P to a predicate on lists containing data of type Forest a and Q to forests containing data of type a, and, finally, adding the additional hypotheses Q x and List P ts to its second antecedent and the additional hypothesis Forest Q x to its conclusion.

Predicate liftings such as List and Forest may either be supplied as primitives, or be generated automatically from the definitions of the types themselves, as described in Section 4.
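Operationally, these liftings are small recursive functions. The following untyped Python sketch (our own encoding, not the paper's Agda development: FEmpty is modeled as None and FNode x ts as the pair (x, ts)) shows the lifted predicates List Q and Forest Q used above:

```python
# List Q: Q holds of every element of the list.
def lift_list(Q):
    return lambda xs: all(Q(x) for x in xs)

# Forest a = FEmpty | FNode a (List (Forest a)),
# encoded as None (FEmpty) or a pair (x, ts) (FNode x ts).
def lift_forest(Q):
    def FQ(f):
        if f is None:                      # FEmpty: vacuously satisfied
            return True
        x, ts = f                          # FNode x ts
        return Q(x) and lift_list(FQ)(ts)  # Q on the label, List FQ on the subforests
    return FQ

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, n))

# A forest whose labels are all prime.
f = (2, [(3, []), (5, [(7, [])])])
all_primes = lift_forest(is_prime)(f)
# all_primes is True; replacing any label by 4 makes it False.
```

Note that FQ is defined mutually with its own lifting over lists, exactly mirroring the mutual recursion between Forest a and List (Forest a) in the type definition.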
For container types, lifting a predicate amounts to traversing the container and applying the argument predicate pointwise.

Our technique for deriving deep induction rules for ADTs, as well as its generalization to nested types given in Section 3, is both made precise and rigorously justified in Section 4 using the results of [13]. This paper can thus be seen as a concrete application, in the specific category Fam, of the very general semantics developed in [13]; indeed, our induction rules are computed as the interpretations of the syntax for nested types in Fam. A general methodology is extracted in Section 5. The rest of this paper can be read either as "just" describing how to generate deep induction rules in practice, or as also proving that our technique for doing so is provably correct and general. Our Agda code is at [14].

3 Extending to Nested Types

Appropriately generalizing the basic technique of Section 2 derives deep induction rules, and therefore structural induction rules, for nested types, including truly nested types and other deep nested types. Nested types generalize ADTs by allowing elements at one instance of a data type to depend on data at other instances of the same type so that, in effect, the entire family of instances is constructed simultaneously. That is, rather than defining standalone families of inductive types, one for each choice of types to which type constructors like List and Tree are applied, the type constructors for nested types define inductive families of types. The structural induction rule for a nested type must therefore account for its changing type parameters by parameterizing over an appropriately polymorphic predicate, and appropriately instantiating that predicate's type argument at each application site.
For example, the structural induction rule for the nested type

  PTree a = PLeaf a | PNode (PTree (a × a))

of perfect trees is

  ∀ (P : ∀ (a : Set) → PTree a → Set) →
    (∀ (a : Set) (x : a) → P a (PLeaf x)) →
    (∀ (a : Set) (x : PTree (a × a)) → P (a × a) x → P a (PNode x)) →
    ∀ (a : Set) (x : PTree a) → P a x

and the structural induction rule for the nested type

  data Lam a where
    Var :: a → Lam a
    App :: Lam a → Lam a → Lam a
    Abs :: Lam (Maybe a) → Lam a

of de Bruijn encoded lambda terms [9] with variables of type a is

  ∀ (P : ∀ (a : Set) → Lam a → Set) →
    (∀ (a : Set) (x : a) → P a (Var x)) →
    (∀ (a : Set) (x : Lam a) (y : Lam a) → P a x → P a y → P a (App x y)) →
    (∀ (a : Set) (x : Lam (Maybe a)) → P (Maybe a) x → P a (Abs x)) →
    ∀ (a : Set) (x : Lam a) → P a x

Deep induction rules for nested types must similarly account for their type constructors' changing type parameters while also parameterizing over the additional predicate on the type of data they contain. Letting Pair Q be the lifting of a predicate Q on a to pairs of type a × a, so that Pair Q (x, y) = Q x × Q y, this gives the deep induction rule

  ∀ (P : ∀ (a : Set) → (a → Set) → PTree a → Set) →
    (∀ (a : Set) (Q : a → Set) (x : a) → Q x → P a Q (PLeaf x)) →
    (∀ (a : Set) (Q : a → Set) (x : PTree (a × a)) → P (a × a) (Pair Q) x → P a Q (PNode x)) →
    ∀ (a : Set) (Q : a → Set) (x : PTree a) → PTree Q x → P a Q x

for perfect trees, and the deep induction rule

  ∀ (P : ∀ (a : Set) → (a → Set) → Lam a → Set) →
    (∀ (a : Set) (Q : a → Set) (x : a) → Q x → P a Q (Var x)) →
    (∀ (a : Set) (Q : a → Set) (x : Lam a) (y : Lam a) → P a Q x → P a Q y → P a Q (App x y)) →
    (∀ (a : Set) (Q : a → Set) (x : Lam (Maybe a)) → P (Maybe a) (Maybe Q) x → P a Q (Abs x)) →
    ∀ (a : Set) (Q : a → Set) (x : Lam a) → Lam Q x → P a Q x

for lambda terms. As usual, the structural induction rules for these types can be recovered by setting Q = λx. True in their deep induction rules. Moreover, the basic technique described in Section 2 can be recovered from the more general one described in this section by noting that the type arguments to ADT data type constructors don't change, and that the internal predicate parameter to P can therefore be lifted to the outermost level of ADT induction rules. We conclude this section by giving both structural and deep induction rules
Moreover, the basic technique described in Section 2 can be recovered from the more general one described in this section by noting that the type arguments to ADT data type constructors don’t change, and that the internal predicate parameter to P can therefore be lifted to the outermost level of ADT induction rules. We conclude this section by giving both structural and deep induction rules 344 P. Johann and A. Polonsky for the following truly nested type of bushes [8]: Bush a = BNil | BCons a (Bush (Bush a)) (Note that this type is not even deﬁnable in Agda.) Correct and useful structural induction rules for bushes and other truly nested types have long been elusive. One recent eﬀort to derive such rules has been recorded in [10], but the approach taken there is more ad hoc than not, and generates induction rules for data types related to the nested types of interest rather than for the original nested types themselves. To treat bushes, for example, Fu and Selinger rewrite the type Bush a as NBush (Succ Zero) a,where NBush = NTimes Bush and NTimes :: (Set → Set) → Nat → Set → Set NTimes p Zero s = s NTimes p (Succ n) s = p (NTimespns) Their induction rule for bushes is then given in terms of these rewritten ones as ∀ (a : Set)(P : ∀ (n : Nat) → NBush n a→ Set) → (∀ (x : a) → P Zero x)→ (∀ (n : Nat) → P (Succ n) BNil)→ (∀ (n : Nat)(x : NBush n a)(xs : NBush (Succ (Succ n)) a)→ Pnx → P (Succ (Succ n)) xs→ P (Succ n)(BCons x xs))→ ∀ (n : Nat)(xs : NBush n a)→ Pnxs This approach appears promising, but is not yet fully mature.