Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Measuring the Readability of Geometric Proofs: The Area Method Case

Measuring the Readability of Geometric Proofs: The Area Method Case Using an approach, inspired by our modernisation of Lemoine’s Geometrography, this paper proposes a new readability criterion for formal proofs produced by automated theorem provers for geometry. We analyse two criteria to measure the readability of a proof: the criterion given by Chou et al. and the one given by Wiedijk. After discussing the limitations of these two criteria, we introduce a novel approach, which provides a new criterion. We conclude discussing some future work. Keywords Readability · Formal proof · Automated deduction in geometry · Geometrography Mathematics Subject Classification 51A05 · 68T15 1 Introduction An important feature of a text is its readability. Readability is the ease with which a reader can understand a written text. The readability of a text is determined by many factors and plays an important role in many areas of interest. For example, it might depend both on the content of the text, i.e., the complexity of the vocabulary and syntax, and on the layout of the The first author was partially support by FCT - Foundation for Science and Technology, I.P., within the scope of the project CISUC - UID/CEC/00326/2020 and by European Social Fund, through the Regional Operational Program Centro 2020. The second author was partially support by Italian Ministry of Education, University and Research through the PRIN 2017 project “The Manifest Image and the Scientific Image” prot. 2017ZNWW7F_004. Pedro Quaresma pedro@mat.uc.pt Pierluigi Graziani pierluigi.graziani@uniurb.it Departamento de Matemática da FCTUC, Largo D. Dinis, 3000−143 Coimbra, Portugal Department of Pure and Applied Sciences, University of Urbino, Via Sant’Andrea, 34, Urbino, PU 61029, Italy 0123456789().: V,-vol 123 5 Page 2 of 21 P. Quaresma, P. Graziani text, e.g. its typographical aspects. Readability has to be distinguished from legibility that is the ease with which a reader can recognise individual characters in a text. In order to quantify the readability of a text, various formulas have been defined [5]. In this paper we will deal with the readability of geometric proofs produced by automatic theorem provers. A potential approach that has been followed in the past is to take formulas that were developed for applications to non-scientific texts and apply them to mathematical texts [13]. However, this sort of approach has not been extended in a proper way to measure the readability of proofs produced by automatic theorem provers and much more work needs to be done [6, 12, 19, 22, 26]. A mathematical text is composed of many elements: descriptions in natural language, formulas and diagrams; thus, it is much more difficult to quantify its readability through formulas, then in the case of regular text. Even more complex is the problem of the readability of mathematical proofs produced by automatic provers that are often presented in a form that can only be read by experts. In this paper, we will introduce both a language to formulate readability criteria for formal proofs produced by automated theorem provers for geometry, based on the area method [2, 10] (see Appendix 1), and a novel criterion based on our modernisation of Lemoine’s Geometrography [14, 22, 24]. We will show how this new criterion is consis- tent with the results of the other already existing criteria, but that it is also more general and expressive compared to the others. The proposed language allows for an easy formulation of new readability criteria as well as for an easy implementation of those criteria in repositories such as the Thousand of Geometric problems for geometric Theorem Provers (TGTP), thereby collecting data relevant to the further development of the area of automated theorem proving in geometry. The data will help strengthen the use of automatic tools not only in research but also in applications like in mathematics education, where the use of automated deduction is already making its way [7, 23]. Therefore, as in the Automath project, the formulation of a readability criterion will allow the definition of a threshold below which “people will start using them (the proofs produced by automated theorem provers ) for serious work” (see §2.22.2). Overview of the paper. The paper is organised as follows: first, in §2, the known readability criteria will be discussed. In §3, Lemoine’s Geometrography, its modernisation and a formal language employed to study readability of formal proofs produced by automated theorem provers for geometry, based on the area method, will be analysed. In §4, a new readability criterion that uses Geometrography will be presented, providing also some examples of its application. In §5 conclusions are drawn and future work will be discussed. 2 Criteria of Readability (by Experts) To the best of our knowledge there are two precise proposals to measure the readability of a proof. The first one is that proposed by Chou et al. [1, p. 452], while the second is that proposed by Freek Wiedijk and is known as the de Bruijn factor [4, 27]. We will not consider here the problem of the readability of geometric proofs from the Mathematics Educa- tions point of view. We will address some issues related to that context in the conclusions. http://hilbert.mat.uc.pt/TGTP/index.php. In the original sentence, “Automath like Systems”. 123 Measuring the Readability of... Page 3 of 21 5 Fig. 1 Geometric Construction, Ceva’s Theorem 2.1 Maxt-Lems Criterion Chou et al. [1, p. 452] proposed a way to measure how difficult it is to read a formal proof, obtained by using an automated theorem prover for geometry (GATP) implementing the area method. The Maxt-Lems (ML) criterion considered the following pair (maxt, lems), where: – maxt is the number of terms of the maximal polynomial occurring in the machine’s proof. Thus, maxt measures the number of computations needed in the proof; – lems is the number of elimination lemmas used to eliminate points from geometric quan- tities. In other words, lems indicates the number of deduction steps in the proof. Using those two elements and analysing all the proofs done by their GATP, they managed to determine an indicative threshold for readability. According to [1, p. 452] a formal proof, which employs the area method, is considered readable if one of the following conditions holds: – the maximal term in the proof is less than or equal to 5; – the number of deduction steps of the proof is less than or equal to 10; – the maximal term in the proof is less than or equal to 10 and the deduction steps are less than or equal to 20. It is interesting that according to their corpus: 66.9% of the proofs have maxt ≤ 5, 42.6% have lems ≤ 10 and 73.2% have lems ≤ 20. Let us consider, for example, the Thousand of Geometric problems for geometric Theorem Provers (TGTP) repository, specifically, problem GEO0001,the Ceva’s Theorem. Theorem 1 (Ceva’s Theorem) Let  A BC be a triangle and P be any point in the plane. Let AF BD CE D = AP ∩ CB, E = BP ∩ AC , and F = CP ∩ A B . Show that: × × = 1. FB DC EA P should not be in the lines parallel to AC, AB and BC and passing through B, C and A respectively [10]. With respect to the ML criterion, considering the proof made by the Geometry Construc- tions L T X Converter (GCLC)[11] GATP (see Appendix 1), the values are: max t er m = 1, and lems = 3. Therefore, this would be considered a readable proof. They considered 478 machine solved geometry problems. 123 5 Page 4 of 21 P. Quaresma, P. Graziani Table 1 Ceva’s Theorem Proof Informal Formal de Bruijn factor by GCLC Area Method GATP, de Bruijn factor Uncompressed 125KB 137KB 1.09 Compressed 124KB 136KB 1.09 2.2 The de Bruijn Factor The Automath project had the goal of developing a system that would allow to write entire mathematical theories in such a precise fashion that verification of the correctness of theo- rems in such theories could have been carried out by formal (mechanical) operations applied directly to the text [4]. This was a first effort in the direction of the Formalisation of Mathe- matics that is now pursued by researchers working in systems like Coq, Isabelle and Mizar. In “A Survey of Project Automath”, de Bruijn introduced the loss factor between the size of an ordinary mathematical exposition and its full formal translation inside a computer. The loss factor expresses what someone loses, in terms of shortness, when translating informal mathematics into Automath. Wiedjk developed the concept and called it the de Bruijn factor. The de Bruijn factor was developed for a situation where a proof is entered in a computer in full detail in such a way that the computer can check its correctness, e.g, when an existing informal mathematical text is taken and it is translated into a computer representation (using a system like Automath). So the de Bruijn factor measures how efficient a system is [27]. Wiedijk noted that non-meaningful questions about formatting could affect the calculus of the loss factor, for example: if indentation is performed employing the tab key, then such indentation can be eight times smaller compared to situations in which the indentation is done using the space key; also the T X macro name for the ’⇔’ symbol uses 15 characters, while an encoding like “<=>” uses only 3. To further smooth formatting choices, Wiedjk proposed to compress the files before calculating the ratios of their sizes. Wiedijk calls the ratio of the uncompressed file sizes the apparent de Bruijn factor, and the ratio of the compressed file sizes the intrinsic de Bruijn factor [27]. We claim that the de Bruijn factor can be used, in a broader sense, to measure the effi- ciency of an automated theorem prover and a given axiomatisation. Whenever a informal proof is known for a given theorem, it can be compared with the formal proof produced by the automated theorem prover, using a specified axiomatisation. This is particularly true in geometry where a given informal geometric proof can be compared with an, also geometric, formal proof produced by a geometric automated theorem prover. Using again the Ceva’s Theorem as an example, the readability of its formal proof, with respect to the de Bruijn factor can be calculated (see Table 1). Wiedijk also introduced the de Bruijn threshold, i.e., a limit below which “the people will start using them (Automath like system) for serious work”. We will consider the value of 2 as a readability threshold. Further studies are needed in order to establish a readability threshold for automated proofs, using the de Bruijn factor. Moreover, a broader comparison between formal proofs and informal proofs is needed. Considering the quotient of the size of the compressed formal proof (area method) and the size of the informal proof, the de Bruijn factor of Ceva’s Theorem is 1.09. It would therefore be sensible to consider the GCLC area method proof, readable. https://coq.inria.fr/, https://isabelle.in.tum.de/, http://mizar.org/. We used the proof found in https://artofproblemsolving.com/wiki/index.php/Ceva’s_theorem as source for the informal proof. 123 Measuring the Readability of... Page 5 of 21 5 2.3 ML and de Bruijn Factor’s Limits Analysing the previous criteria, we can note a first limit for both the ML criterion and the de Bruijn factor: they assume that readability by expert is being considered, i.e., a geometer expert in the language of the prover that produces the proof. A second limit emerges when the following [22] classification of formal geometric proofs produced by GATPs is taken into consideration: 1. no readable proof, only a proved/not proved output; 2. non-synthetic proof (i.e., a proof without a corresponding geometric description, e.g. algebraic methods); 3. semi-synthetic proof with a corresponding prover’s language rendering; 4. (semi-)synthetic proof with a corresponding natural language rendering; 5. (semi-)synthetic proof with a corresponding natural language and visual rendering; Relating the ML criterion with this classification, we can note that such criterion only allows the definition of a threshold for semi-synthetic proofs that employ the area method (level 3). The direct applicability of the ML criterion to other synthetic methods, e.g. full- angle methods or the deductive database method [3, 28], would be possible, considering the number of deduction steps of the proofs and adapting the condition regarding the maximal term in the proofs. The de Bruijn factor can be used directly in all levels above 1, although it is more mean- ingful on levels greater or equal than 2.3. Considering the (GCLC) and its integrated GATPs based on the area method, Wu’s method and Gröbner Basis method [9], it is possible to cal- culate the readability of the proofs developed using the different GATPs. It is indeed possible to imagine, extrapolating from the results with the area method, that all those proofs would be readable, and this would hold even though the de Bruijn factor requires informal proofs to be provided. The two criteria analysed are very different, the first is very specific while the second is very generic, although both criteria require readability by experts. We can therefore ask ourselves if it is possible to define a new criterion which does not require readability by experts, which is also more natural and expressive than the previous ones, and which can be generalised to various proof methods. 3 Looking for a More Natural Readability Criterion The new criterion that we want to propose is based on our modernisation of Lemoine’s Geometrography [14, 22, 24]. We will begin by explaining what Geometrography is and what its modernisation consists of. Geometrography, “alias the art of geometric constructions”, aims at providing a tool: (i) to designate every geometric construction by a symbol that manifests its simplicity and GATPs can be of two major types: algebraic, the proof, if it exist, is done recurring to an algebraic reasoning (e.g. Gröbner basis); geometric (synthetic), the proof, if it exist, is done recurring a set of axioms and inference rules of geometry, without the use of coordinates. Semi-synthetic methods, e.g. the area method, use also the axioms of a field of characteristic different from 2. The Wu’s method and the Gröbner basis method are both algebraic methods, from the geometric point of view their proofs are unreadable (level 2). 123 5 Page 6 of 21 P. Quaresma, P. Graziani exactitude; (ii) to teach the simplest way to execute an assigned construction; (iii) to discuss a known solution to a problem and eventually replacing it with a better solution; (iv) to compare different solutions for a problem, by deciding which is the most exact and the simplest solution from the point of view of Geometrography [14–17, 20, 22, 24]. 3.1 Classical Geometrography In Lemoine’s Geometrography two coefficients are defined to measure the relative difficulty to perform some geometric constructions. The approach is applied to ruler and compass geometry, i.e., geometric constructions made only with the help of a ruler and a compass. Considering the modifications proposed by Mackay [16], the following Ruler and Compass constructions and the corresponding coefficients can be analysed. To place the edge of the ruler in coincidence with one point . ............. R To place the edge of the ruler in coincidence with two points . . . .........2 R To draw astraight line.................................................. R To put one point of thecompassesonadeterminate point ............... C To put thepointsofthe compassesontwo determinate points..........2C To describeacircle...................................................... C Then a given construction is measured against the number of uses of those elementary steps. For a given construction expressed by the equation: l R + l R + m C + m C 1 1 2 2 1 1 2 2 where l and m are coefficients denoting the number of times any particular operation is i j performed. The number (l + l + m + m ) is called the coefficient of simplicity (cs) of the 1 2 1 2 construction, and it denotes the total number of operations performed. The number (l + m ) 1 1 is called the coefficient of exactitude (ce) of the construction, and it denotes the number of preparatory operations on which the exactitude of the construction (made with the help of physical, inaccurate, tools) depends [16, 17]. 3.2 Geometrography in Dynamic Geometry Classical Geometrography applies to geometric constructions made with the help of a ruler and a compass. Its modernisation, proposed in [22, 24] uses the tools of the dynamic geometry systems (DGS). In [22] it was shown how to modernise Geometrography using GCLC,in [24] the generality of the approach is shown, using GeoGebra [8]. Considering the operations: define a point, anywhere in the plane, D and define a given object, using other objects, C, the following values for the GCLC basic constructions are obtained: point –fixapoint in theplane........................................... D line –usestwo points. ..................................................2C circle –usestwo points................................................2C intersec –usestwo lines ................................................2C intersec –usesfour points..............................................4C intersec2 –usesacircle andacircle or line.............................2C Exactitude, or the lack of it, in this context refer to the possible inaccuracy introduced by physical tools such as ruler and compass. 123 Measuring the Readability of... Page 7 of 21 5 Fig. 2 Geometric Construction, TGTP problem GEO0369 midpoint –usestwo points..............................................2C med –usestwo points....................................................2C bis –usesthree points..................................................3C perp –usesapoint andaline...........................................2C foot –usesapoint andaline...........................................2C parallel –usesapoint andaline.......................................2C onsegment –usestwo points.............................................2C online –usestwo points................................................2C oncircle –usestwo points..............................................2C In the modernisation (extrapolation) of the Geometrography, considering the “tools” of dynamic geometry systems, the coefficient of exactitude loses its meaning, the constructions will be executed by the DGS, so they are accurate (exact). However, the coefficient of simplic- ity of the constructions can still be useful, it can be used to classify constructions by levels of simplicity. A new dimension can also be added, the coefficient of freedom (cf), given by the degree of freedom a given geometric object has, e.g. “a point in a line” has one degree of freedom, a point in the plane has two degrees of freedom, etc. This new coefficient will give a value to the dynamism of the geometric construction. The degrees of freedom are mea- sured against the point definitions. The point definition, defines a point with two degrees of freedom, the onsegment, online and oncircle constructions, define points with one degree of freedom. For the GCLC constructions contained in TGTP an average value of simplicity (CS )of20.8 was obtained. Using the k-means clustering function implemented gcl in the statistics package of Octave, three classes of geometric constructions describing an increasing level of complexity were defined: simple constructions, 1 ≤ CS ≤ 18; average gcl complexity constructions, 18 < CS ≤ 28; complex constructions, CS > 28. gcl gcl TGTP contain 71 simple constructions; 81 average complexity constructions; 28 complex constructions. For example (TGTP problem’s GEO0369): “In triangle  ABC,let F be the midpoint of the side BC,and D and E the feet of the altitudes on AB and AC, respectively. FG is perpendicular to DE at G. Show that G is the midpoint of DE”, has a geometric construction with coefficient of simplicity 19 (see Fig. 2), so an average complexity construction. The value GNU Octave, version 6.1.1, package octave-statistics, function kmeans https://octave.sourceforge.io/ statistics/function/kmeans.html. 123 5 Page 8 of 21 P. Quaresma, P. Graziani of 6 for its coefficient of freedom is given by the fact that only the three points A, B,and C are free in the plane, while all the other points are completely bind, by construction. 3.3 Geometrography in Automatic Theorem Proving The same approach can be (again) extrapolated to take into consideration synthetic geometric proofs, i.e., proofs based on a geometric axiomatic theory, using geometric inference rules. Considering the proofs produced by the GATP GCLC, implementing the area method [9, 10], the coefficient of simplicity for all the axioms and lemmas of the theory can be calculated. Apart from the geometric constructions in which the proof is based (with coefficient of simplicity nCnst), there are other steps to be considered. (Elementary) Algebraic Simplification..................................(AS) (Elementary) GeometricSimplification. ................................(GS) Application of the Area Method Lemma n...........................(AML ) A given proof can thus be measured against the number of those steps. For a given proof expressed by the equation: n Cnst + n × AS + n × GS + AML 1 2 3 j j =l where n is the coefficient of simplicity of the geometric construction, n is the number of 1 2 algebraic simplifications and n is the number of geometric simplifications. The coefficient of simplicity for the proof would be: CS = n + n + n + CS (AML ) proof 1 2 3 proof j j =l The coefficient of freedom has no meaning in this setting. Each lemma of the area method, AML , has a corresponding simplicity coefficient, the term, CS (AML ), is the sum of all those values, for all the lemmas used in proof j j =l the proof. In order to achieve this for each lemma of the area method the corresponding coefficients of simplicity were calculated [21]. For example, the proof of Lemma 9 will have the following coefficient of simplicity, CS (AML ) = 74. proof 9 Lemma 1 (AML ) Let R be a point on the line P Q. Then for any two points A and B it RQ PR holds that S = S + S . RAB QAB PAB PQ PQ The proofs developed by GATPs based on the Area Method are formal proofs. The method itself was formalised, and proved sound, using the Coq proof assistant. The GATP developed by J. Narboux, as a Coq tactic, can have the proofs verified by Coq.The GCLC area method, do not have, explicitly, that possibility, but, it would be a matter of developing a filter from the GCLC language to the Coq language (see Appendix 1). By elementary algebraic simplification it is understood the basic algebraic operations: addition, subtraction, multiplication, division, and their properties of commutativity, associativity and distributivity. By elementary geometric simplification it is understood the direct application of the definition of the area method quantities. We call them trivial steps. 123 Measuring the Readability of... Page 9 of 21 5 The following is a shorter version of its proof with the elementary algebraic and geometric simplifications condensed (the expanded version can be see in [21]). Geometrography of Lemma 9 (AML ) A B CS = 22 = 4D + 18C gcl CF = 8 gcl – s = S , initial construction; AB P Q –1 × GS, areas of triangles with the same orientation, S = s − S − S ; RAB AR Q BP R PR –1 × AML , lemma 14, = r (AML = 8); 14 14 PQ AR Q RQ –1 × AML , lemma 5, = (AML = 18); 5 5 AP Q PQ RQ PQ−PR –1 × GS, segments with the same orientation, = ; PQ PQ PQ−PR –2 × AS, algebraic simplifications, = (1 − r ) and S = (1 − r )S ; AR Q AP Q PQ S PR BP R –1 × AML , lemma 5, = (AML = 11); 5 5 BPQ PQ –2 × AS, algebraic simplifications, S = r S and S = s − (1 − r )S − BP R BPQ RAB AP Q r S ; BPQ –2 × GS, areas of triangles with the same orientation, S = s − (1 − r )(s − S ) − RAB PAB r (s − S ); QAB –7 × AS, algebraic simplification, S = s − s + rs + S − r S − rs + r S , RAB PAB PAB QAB PR RQ S = r S + (1 − r )S and S = S + S . RAB QAB PAB RAB QAB PAB PQ PQ Geometrography for the demonstration: 4D + 18C + 4GS + 11AS + 1AML + 2AML 14 5 CS = 74 = 22 + 4 + 11 + 8 + (18 + 11) proof AML CS = 22 gcl with AML = 8and AML = 18 (first application) and AML = 11 (second application). 14 5 5 It is considered that, from the second application of a lemma onward, its proof is accepted, so, only its adaptation to the new configuration is needed, i.e., the pattern matching of the lemma configuration to a new setting. For that reason, in any second, third, etc. application of a lemma, only the CS coefficient values are considered. gcl Given that a mathematical proof is a sequence of steps, in addition to the coefficient of simplicity, it would be useful to have other coefficients: e.g., the total number of steps in the proof; the value of the most difficult step in the proof; the number of different steps of high difficulty in the proof; the number of different types of steps (lemmas) in the proof; a proof script; a numerical description of the proof; and a corresponding line chart or proof trace. Therefore, to fully characterise a formal synthetic proof produced by a GATP, we can define and consider the following coefficients: –CS , the simplicity coefficient (as above), it gives the simplicity coefficient for the proof overall proof; 123 5 Page 10 of 21 P. Quaresma, P. Graziani –CT , the total number of steps in the proof; proof –CS , the highest simplicity coefficient of the lemmas/definitions applications, it proofmax gives the simplicity coefficient for the most difficult step of the proof; –CD , the number of different types of lemmas used in the proof; typeproof –CD , the number of different steps of high difficulty in the proof; highproof – The proof script, as defined above; – The corresponding line chart or proof trace in tikz format. It is important to note that to obtain the coefficient CD (hp) the area method lemmas highproof implemented in the GATP GCLC were analysed, and, using the k-means clustering function implemented in the statistics package of Octave, divided into three categories: low difficulty (hp < 284), medium difficulty (284 ≤ hp < 1848) and high difficulty (hp ≥ 1848). Using the defined coefficients above, we have the following values for the proof of AML : ⎪ CS = 74 = 22 + 4 + 11 + 8 + (18 + 11) proof CS = 22 gcl CT = 19 proof AML CS = 18 ⎪ proofmax CD = 2 typeproof CD = 0 highproof The GATP GCLC implementation of the area method [9, 10] is able to produce proof scripts. Using the command prooflevel it is possible to have control over the level of detail of the proof script. Two programs were implemented to calculate the Geometrogra- phy of the proofs. The Geometrography of the construction is calculated by a bash script, gclcGeometrography.bash, that analyses the GCLC geometric construction (not con- sidering all the rendering commands). The Geometrography of the proof script (minus the geometric construction) is calculated by, csproof, a parser that analyse the proof script counting the algebraic steps and the geometric steps in sequence and also the lemmas and definitions of the area method with the respective coefficient of simplicity. Using the program csproof on an arbitrary geometric proof, it can be obtained: a CSV file with the values regarding the Geometrographic Readability Coefficient of Proofs (see Sect. 4); a file with the coefficient of simplicity of the geometric construction; a file with a line chart, a graphical representation of the proof done by the GATP GCLC. To better understand some details, let’s consider again the Ceva’s theorem (see Theorem 1). Using the GATP GCLC, with the full level of detail, the proof script of Ceva’s theorem has all the details explained and it fills two pages, almost three pages, if the notes about the non-degeneracy conditions and about the proof itself are taken into consideration (see Appendix 1). The line chart is shown in Fig. 3. In it, the sequences of algebraic, or geometric, simplifications are condensed in only one step (for a more condensed view of the graph). https://ftp.eq.uc.pt/software/TeX/graphics/pgf/base/doc/pgfmanual.pdf. The open source codes are available in the GitHub project https://github.com/GeoTiles/Geometrography/ tree/master/GeometrographyProofs. Comma Separated Values format. 123 Measuring the Readability of... Page 11 of 21 5 Fig. 3 Ceva’s Theorem, Geometrography Proof Trace Therefore, the Geometrography of Ceva’s Theorem Proof is the following: 4D + 18C + 23AS + 3AML + 3AML + 3AML . 1 8 10 ⎪ CS = 220 proof CS = 22 ⎪ gcl CT = 32 proof Ceva sTheorem ⎪ CS = 84 proofmax CD = 3 typeproof CD = 0 highproof 4 A Geometrographic Criterion It is interesting to note how the Geometrographic coefficients highlight many salient aspects of the proof, aspects that could be used to analyse the readability of such proofs. Furthermore, it is interesting to stress how the proof trace constitutes a sort of electroencephalogram of the machine while proving the theorem. Just as an electroencephalogram can be useful for measuring a brain’s electrical activity, the line chart helps to understand some features of the proof by looking at its trace. Applying the Geometrography to the area method proofs contained in the repository TGTP,using theGATP GCLC with the full level of detail, and using the geometrographic coefficients we can argue in favour of the following new readability coefficient: Geometrographic Readability Coefficient of Proofs (GRCP) GRC P = ((CS − CT ) × (CD + CD )) proof proof highproof typeproof This coefficient relates four quantities: the simplicity coefficient of the proof, the total number of steps in the proof, the number of different steps with high-difficulty in the proof, the number of different lemmas used in the proof. The first factor, (CS − CT ), gives an approximation to the overall coefficient proof proof of simplicity of the non-trivial steps in the proof. Note that CT count the number of proof 123 5 Page 12 of 21 P. Quaresma, P. Graziani steps rather than the coefficient of simplicity of each step. By contrast, in CS ,itisthe proof coefficient of simplicity that counts. Each trivial step has a coefficient of simplicity equal to one, and the coefficients of simplicity for non-trivial steps, such as the construction and the lemmas, are much greater than one. In the light of this, it can be concluded that the difference between CS and CT emphasises the complexity of the proof, disregarding its length. proof proof The second factor, (CD + CD ), gives an account of the difficult steps. highproof typeproof Steps that, potentiality, make the proof much harder to follow, steps where the normal flow of the proof would be interrupted to jump to the proof of the lemma, resuming after completing the lemma’s proof. The addition of the number of high-difficulty steps with the number of different lemmas used in the proof, gave a multiplying factor for the overall complexity of the proof. A final note about this second factor: a high-difficulty step is, for sure, a lemma application, nevertheless we felt that the high-difficulty nature of the lemma is a sufficient reason for this double counting. Multiplying these factors, the approximation for the overall simplicity coefficient and the number difficult steps—both elements that we believe characterise the readability of a proof—we obtain a readability coefficient of a proof. Therefore, considering 71 theorems and their area method proofs, from the TGTP repository and using, again, the k-means clustering function from Octave, the proofs can be divided into the following classes of Geometrographic readability: readable (hi gh − r eadabi li t y), GRCP ≤ 48000; medium-readability 48000 < GRCP ≤ 135000; low-readability, GRCP > 135000. The GRCP for GEO0001, Ceva’s proofs is: GRCP = (220 − 32) × (0 + 3) = GEO0001 564 ≤ 48000, so a readable (high-readability) proof. GRCP Medium-readability Example. TGTP problem’s GEO0021: Theorem 2 (Circumcenter of a Triangle) The circumcenter of a triangle can be found as the intersection of the three perpendicular bisectors has the following values for the different coefficients. ⎪ CS = 8554 proof CS = 11 gcl CT = 591 proof GEO0021 CS = 2807 ⎪ proofmax ⎪ CD = 13 typeproof CD = 3 highproof 48000 < GRC P = 127408 ≤ 135000 By the GRCP criterion, this is a medium-readability problem. It can be seen that it has 13 different lemmas, 3 high-difficulty step, a long proof with a significant difference between the CS and the number of steps of the proof (see Figs. 4 and 6). proof 4.1 GRCP low-readability Example. TGTP problem’s GEO0020: Theorem 3 (Distance of a line containing the centroid to the vertices) Given a triangle A BC and a point X , the sum of the distances of the line X G , where G is the centroid of A BC , to The actual values were rounded for better readability. 123 Measuring the Readability of... Page 13 of 21 5 Fig. 4 TGTP, GEO0021, Circumcenter of a Triangle the two vertices of the triangle situated on the same side of the line is equal to the distance of the line from the third vertex. has the following values for the different coefficients: CS = 19989 proof ⎪ CS = 26 gcl CT = 4119 proof GEO0020 ⎪ CS = 2807 proofmax ⎪ CD = 13 typeproof CD = 4 highproof GRC P = 269790 > 135000, so a low-readability problem. It can be seen that it has 13 different lemmas, 4 high-difficulty step, a long proof, with a very high value of overall complexity (see Figs. 5 and 7). 4.2 Comparing the Different Criteria The Geometrography Readability Coefficient of Proofs criterion takes into consideration all the significant aspects of a formal proof, its overall difficulty, its number of steps, the number of difficult steps and the number of different lemmas that must be applied. The other criteria consider fewer aspects. The de Bruijn criterion, given its different goal, takes only in consideration the size of the proof and it needs to have an informal proof to compare with. The ML criterion considers the number of different lemmas applied and uses the number of 123 5 Page 14 of 21 P. Quaresma, P. Graziani Fig. 5 TGTP, GEO0020, Distance of a line containing the centroid to the vertices terms of the maximal polynomial as a way to have an approximation to the complexity of the proof. Alongside the ML criterion, in the GRCP criterion, the number of lemmas in the proof is considered: in the GRCP criterion as a multiplicative factor, in the ML criterion as one of the conditions for readability. In the ML criterion the number of terms in the maximal polynomial are considered, but, as its authors remarked, this measures the number of computations needed in the proof, not its readability. This is weakly related to the number of steps in the proof. It approaches the number of steps needed to decompose those long polynomials occurring in the proof to a simple expression. Regardless of this criteria comparison, we want to emphasise that the Geometrographic view proposed in this paper has a more general scope. Although the GRCP criterion is a reasonable proposal, the elementary quality of the Geometrographic approach, through the analysis of various coefficients of the proofs, the proof scripts and the proof traces, makes it possible to have a language or a tool that can be used by non-experts to formulate other criteria weaker or stronger than the one we propose. The contribution of this paper is therefore not only that of a Geometrographic criterion, but of a Geometrographic approach to the problem of measuring the readability of formal proofs in automated deduction in geometry, an approach that offer an environment in which to analyse the proofs in detail by proposing and test readability criteria. To the best of our knowledge, it is the first time that the community has access to such a general tool to formulate and to study the readability of formal proofs in automated deduction in geometry. It is also interesting to note that our criterion offers a classification of proofs that is in line, when the fundamental points are considered, with the classifications given by the other two criteria. i.e., proofs that are classified as difficult to read according to the new criterion are also classified as difficult to read for the others, and the same applies to proofs that are easy to read (Table 2). 123 Measuring the Readability of... Page 15 of 21 5 Table 2 Comparison of the three criteria TGTP ML de Bruijn GRCP GEO0001 3 < 5, deduction steps easy 1.6 < 2 easy 564 ≤ 48000 easy(high) GEO0021 13 > 5 deduction steps & 37.63 > 2 difficult 127408 ≤ 135000 number of terms > 5 difficult(medium) difficult GEO0020 13 > 5 deduction steps & 47.31 > 2 difficult 269790 > 135000 number of terms > 5 difficult(low) difficult Finally, we have to say that all the criteria proposed here have no empirical validation through the submission of tests to students, experts, etc. Nevertheless, the great advantage that our approach offers is that it allows to formulate criteria that can be implemented in repositories such as TGTP and can be evaluated experimentally in a very simple way. 5 Conclusions In this paper we have analysed the problem of measuring the readability of formal proofs in automated deduction in geometry. We have introduced two known criteria and highlighted some of their limitations. We have then introduced a third criterion that seems to overcome the problem of readability by expert, therefore being more natural than the previous ones, and seems to be easily generalised. One possible generalisation is given by the possibility to formulate weaker, or stronger, criteria, using the proposed language. Another possible avenue is given by the generalisation of our approach to other GATPs (e.g. the JGEx inte- grated GATPs, area method, full-angle method and deductive databases method [3, 28, 29], ArgoCLP, coherent logic prover [25]) and any other ATP that has a proof script based on axioms, lemmas applications and, eventually, elementary steps (algebraic, geometric, etc.). It is a matter of calculation of the coefficients of simplicity for the axioms and lemmas of the base theory in consideration. As we pointed out, the great advantage that our approach offers is that it allows to formulate criteria that can be implemented in repositories such as TGTP and evaluated experimentally. For this reason, an important work that we are planning is an experiment to be submitted to mathematicians, computer scientists, educationalists and students providing an adequate empirical test for our Geometrographic criterion. Acknowledgements The authors wish to thank Salvatore Florio, Victor Pambuccian, and Mirko Tagliaferri for their comments on an earlier draft of this work. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. 123 5 Page 16 of 21 P. Quaresma, P. Graziani Appendix A Ceva’s Theorem, GCLC Area Method Proof The area method for Euclidean constructive geometry was proposed by Chou, Gao and Zhang in the early 1990’s [2]. The method can efficiently prove many non-trivial geometry theorems and is one of the most interesting and most successful methods for automated theorem proving in geometry. In [10] a variant of the original axiom system was presented, based on that axiomatisation all the lemmas needed by the method were formally proved and the soundness of the method was established, using the Coq proof assistant [18]. The GCLC implementation of the area method is able to produce formal proofs. If the highest level of details is chosen, prooflevel 7, it would be possible to (an appropriated filter has to be built) formally verify those proofs using a proof assistant, e.g. Coq.The L T X proof scripts that GCLC produces (by default, at prooflevel 2) are a natural language rendering, to be read by mathematicians. The area method axiomatic system for Euclidean plane geometry (within first order logic with equality), has just one primitive type of geometrical objects: points. Variables can also range over a field ( F , +, ·, 0, 1),where F is any field of characteristic different from 2. The axioms of the theory of fields used in GCLC area method proofs, are standard. The Ceva’s proof presented below is a L T X proof script produced by GCLC, at prooflevel 7, edited to include the GRCP values. GCLC Prover Output for conjecture “cevaGEO0001”, Area method used − → −→ − → AF BD CE by the statement · · = 1 (0) − → −→ − → 22 = 4 × D + 18 × C FB DC EA − → −→ − → AF BD CE by Lemma 10 −1 · · · = 1 (1) − → −→ − → AML = 13 BF DC EA − → −→ − → AF BD CE by right association −1 · · · = 1 (2) − → −→ − → 1 × AS BF DC EA − → −→ − → AF BD CE by right association −1 · · · = 1 (3) − → −→ − → 1 × AS BF DC EA −→ − → by Lemma 8 (point S BD CE APC −1 · · · = 1 F eliminated) (4) −→ − → BPC DC EA AML = 84 −→ − → S BD CE by Lemma 10 APC −1 · · · −1 · = 1 (5) −→ − → AML = 13 S 10 BPC DC AE ⎛ ⎞ −→ − → BD CE S · · −1 · APC −→ − → ⎜ ⎟ DC AE by multiplication of frac- ⎜ ⎟ −1 · = 1 (6) ⎝ ⎠ tions 1 × AS BPC Narboux, J.: Formalization of the area method. Coq user contribution (2009). http://dpt-info.u-strasbg.fr/ ~narboux/area_method.html. 123 Measuring the Readability of... Page 17 of 21 5 −→ − → BD CE −1 · S · · −1 · APC −→ − → DC AE by multiplication of frac- = 1 (7) tions 1 × AS BPC −→ − → BD CE −1 · S · −1 · · APC −→ − → DC AE by associativity and com- = 1 (8) mutativity 1 × AS BPC −→ − → BD CE −1 · −1 · S · · APC −→ − → DC AE by associativity and com- = 1 (9) mutativity 1 × AS BPC −→ − → BD CE 1 · S · · APC −→ − → DC AE by multiplication of con- = 1 (10) stants 1 × AS BPC −→ − → BD CE S · · APC −→ − → DC AE by multiplication by 1 = 1 (11) 1 × AS BPC −→ BD CP B S · · APC −→ by Lemma 8 (point AP B DC = 1 E eliminated) (12) BPC AML = 14 −→ BD CP B S · · APC −→ AP B DC by Lemma 1 = 1 (13) AML = 10 (−1 · S ) 1 CP B −→ BD CP B S · −1 · · APC −→ AP B CD by Lemma 10 = 1 (14) AML = 11 (−1 · S ) 10 CP B −→ BD CP B S · 1 · · APC −→ AP B CD by ratio cancellation = 1 (15) 1 × AS (1 · S ) CP B −→ BD CP B S · · APC −→ AP B CD by multiplication by 1 = 1 (16) 1 × AS (1 · S ) CP B −→ BD CP B S · · APC −→ AP B CD by multiplication by 1 = 1 (17) 1 × AS CP B ⎛   ⎞ −→ BD ·S −→ CP B ⎜ ⎟ CD ⎝ S · ⎠ APC AP B by multiplication of frac- = 1 (18) S tions 1 × AS CP B −→ BD S · ·S −→ APC CP B CD by multiplication of frac- AP B = 1 (19) tions 1 × AS CP B −→ BD S · · S −→ APC CP B CD by multiple fraction sim- = 1 (20) plification 1 × AS (S · S ) AP B CP B 123 5 Page 18 of 21 P. Quaresma, P. Graziani −→ BD S · · 1 −→ APC CD by ratio cancellation = 1 (21) 1 × AS (S · 1) AP B −→ BD S · −→ APC CD by multiplication by 1 = 1 (22) (S · 1) 1 × AS AP B −→ BD S · APC −→ CD by multiplication by 1 = 1 (23) 1 × AS AP B BP A by Lemma 8 (point S · APC CP A D eliminated) = 1 (24) AP B AML = 14 BP A S · APC −1·S ( ) by Lemma 1 APC = 1 (25) AML = 9 AP B BP A S · APC (−1·S ) by Lemma 1 APC = 1 (26) AML = 9 (−1 · S ) BP A S ·S ( ) APC BP A −1·S ( ) by multiplication of frac- APC = 1 (27) tions 1 × AS (−1 · S ) BP A (1·S ) BP A (−1·1) by ratio cancellation = 1 (28) 1 × AS (−1 · S ) BP A BP A (−1·1) by multiplication by 1 = 1 (29) 1 × AS (−1 · S ) BP A BP A −1 by multiplication by 1 = 1 (30) 1 × AS (−1 · S ) BP A (−1 · S ) BP A by fraction with number = 1 (31) denominator 1 × AS (−1 · S ) BP A by ratio cancellation 1 = 1 (32) 1 × AS Q.E.D NDG conditions are: S = S i.e.,linesBCand PA are not parallel (construction based assumption) BP A CP A S = S i.e., lines ACand PB are not parallel (construction based assumption) AP B CP B S = S i.e., lines ABand PC are not parallel (construction based assumption) APC BPC P = 0i.e., points Fand Bare not identical (conjecture based assumption) FBF 123 Measuring the Readability of... Page 19 of 21 5 P = 0i.e., points DandCare not identical (conjecture based assumption) DC D P = 0i.e., points EandA are not identical (conjecture based assumption) EAE Number of elimination proof steps: 3 Number of geometric proof steps: 6 Number of algebraic proof steps: 23 Total number of proof steps: 32 Time spent by the prover: 0.001 seconds Enlarged Proof Traces TGTP: GEO0021 See Fig. 6. Fig. 6 Medium-readability—TGTP, GEO0021, the first 146, out of 203, steps of the proof TGTP: GEO0020 See Fig. 7. 123 5 Page 20 of 21 P. Quaresma, P. Graziani Fig. 7 Low-readability—TGTP, GEO0020. The first 350, out of 1200, steps of the proof References 1. Chou, S.C., Gao, X.S., Zhang, J.Z.: Machine Proofs in Geometry. World Scientific, Singapore (1994) 2. Chou, S.C., Gao, X.S., Zhang, J.Z.: Automated generation of readable proofs with geometric invariants, I. Multiple and shortest proof generation. J. Automat. Reason. 17, 325–347 (1996). https://doi.org/10. 1007/BF00283134 3. Chou, S.C., Gao, X.S., Zhang, J.Z.: Automated generation of readable proofs with geometric invariants, II. Theorem proving with full-angles. J. Automat. Reason. 17(3), 349–370 (1996). https://doi.org/10. 1007/BF00283134 4. de Bruijn, N.: Selected Papers on Automath, Studies in Logic and the Foundations of Mathematics, vol. 133, chap. A survey of the project Automath, pp. 141–161. Elsevier, Amsterdam (1994). https://doi.org/ 10.1016/S0049-237X(08)70203-9 5. DuBay, W.H. (ed.): The Classic Readability Studies. Impact Information (2006) 6. Gao, H., Li, J., Cheng, J.: Measuring interestingness of theorems in automated theorem finding by forward reasoning based on strong relevant logic. In: 2019 IEEE International Conference on Energy Internet (ICEI), pp. 356–361 (2019). https://doi.org/10.1109/ICEI.2019.00069 7. Hanna, G., Reid, D., de Villiers, M. (eds.): Proof Technology in Mathematics Research and Teaching. Springer, Berlin (2019) 8. Hohenwarter, M.: Geogebra—a software system for dynamic geometry and algebra in the plane. Master’s thesis, University of Salzburg, Austria (2002) 9. Janici ˇ c, ´ P.: GCLC - A tool for constructive Euclidean geometry and more than that. In: A. Iglesias, N. Takayama (eds.) Mathematical Software - ICMS 2006, Lecture Notes in Computer Science, vol. 4151, pp. 58–73. Springer (2006). https://doi.org/10.1007/11832225_6 10. Janici ˇ c, ´ P., Narboux, J., Quaresma, P.: The area method: a recapitulation. J. Automat. Reason. 48(4), 489–532 (2012). https://doi.org/10.1007/s10817-010-9209-7 11. Janici ˇ c, ´ P., Quaresma, P.: System description: GCLCprover + GeoThms. In: U. Furbach, N. Shankar (eds.) Automated Reasoning, Lecture Notes in Computer Science, vol. 4130, pp. 145–150. Springer (2006). https://doi.org/10.1007/11814771_13 12. Jiang, J., Zhang, J.: A review and prospect of readable machine proofs for geometry theorems. J. Syst. Sci. Complexity 25(4), 802–820 (2012). https://doi.org/10.1007/s11424-012-2048-3 13. Johnson, D.A.: The readability of mathematics books. Math. Teach. 50(2), 105–110 (1957). https://doi. org/10.5951/MT.50.2.0105 14. Lemoine, É.: Géométrographie ou Art des constructions géométriques. No. 18 in Scientia, Série Physico- Mathématique. C. Naud, Éditeur, Paris (1902). http://catalogue.bnf.fr/ark:/12148/cb36049032t 15. Loria, G.: La geometrografia e le sue trasfigurazioni. Period. Mat. 3(6), 114–122 (1908) 16. Mackay, J.S.: The geometrography of Euclid’s problems. Proc. Edinb. Math. Soc. 12, 2–16 (1893). https:// doi.org/10.1017/S0013091500001565 123 Measuring the Readability of... Page 21 of 21 5 17. Merikoski, J.K., Tossavainen, T.: Two approaches to geometrography. J. Geom. Graph. 13(1), 15–28 (2010) 18. Narboux, J.: A decision procedure for geometry in Coq. Lect. Notes Comput. Sci. 3223, 225–240 (2004). https://doi.org/10.1007/b100400 19. Pak, K., Schubert, A.: The impact of proof steps sequence on proof readability—experimental setting. In: Workshop and Work in Progress Papers at CICM 2016 (2016) 20. Pinheiro, V.A.: Geometrografia 1. Gráfica Editora Bahiense (1974) 21. Quaresma, P., Graziani, P.: The geometrography’s simplicity coefficient for the axioms and lemma of the area method. Technical Report TR 2021-001, Center for Informatics and Systems of the University of Coimbra (2021) 22. Quaresma, P., Santos, V., Graziani, P., Baeta, N.: Taxonomy of geometric problems. J. Symbol. Comput. 97, 31–55 (2020). https://doi.org/10.1016/j.jsc.2018.12.004 23. Richard, P., Vélez, M., Vaerenbergh, S.V. (eds.): Mathematics Education in the Age of Artificial Intelli- gence, Mathematics Education in the Digital Era, vol. 17. Springer (2022). https://doi.org/10.1007/978- 3-030-86909-0 24. Santos, V., Baeta, N., Quaresma, P.: Geometrography in dynamic geometry. Int. J. Technol. Math. Educ. 26(2), 89–96 (2019). https://doi.org/10.1564/tme_v26.2.06 25. Stojanovic, ´ S., Pavlovic, ´ V., Janici ˇ c, ´ P.: A coherent logic based geometry theorem prover capable of producing formal and readable proofs. In: P. Schreck, J. Narboux, J. Richter-Gebert (eds.) Automated Deduction in Geometry, Lecture Notes in Computer Science, vol. 6877, pp. 201–220. Springer, Berlin (2011). https://doi.org/10.1007/978-3-642-25070-5_12 26. Wang, K., Su, Z.: Automated geometry theorem proving for human-readable proofs. In: Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, pp. 1193–1199. AAAI (2015). http://dl.acm.org/citation.cfm?id=2832249.2832414 27. Wiedijk, F.: The de Bruijn factor. Poster at International Conference on Theorem Proving in Higher Order Logics (TPHOL2000) (2000). Portland, Oregon, USA, 14–18 August 2000 28. Ye, Z., Chou, S.C., Gao, X.S.: Visually dynamic presentation of proofs in plane geometry: Part 2. Auto- mated generation of visually dynamic presentations with the full-angle method and the deductive database method. J. Automat. Reason. 45(3), 243–266 (2010). https://doi.org/10.1007/s10817-009-9163-4 29. Ye, Z., Chou, S.C., Gao, X.S.: An introduction to Java geometry expert. In: T. Sturm, C. Zengler (eds.) Automated Deduction in Geometry, Lecture Notes in Computer Science, vol. 6301, pp. 189–195. Springer, Berlin (2011). https://doi.org/10.1007/978-3-642-21046-4_10 Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Journal of Automated Reasoning Springer Journals

Measuring the Readability of Geometric Proofs: The Area Method Case

Loading next page...
 
/lp/springer-journals/measuring-the-readability-of-geometric-proofs-the-area-method-case-qlhf8P8RWU
Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2023
ISSN
0168-7433
eISSN
1573-0670
DOI
10.1007/s10817-022-09652-0
Publisher site
See Article on Publisher Site

Abstract

Using an approach, inspired by our modernisation of Lemoine’s Geometrography, this paper proposes a new readability criterion for formal proofs produced by automated theorem provers for geometry. We analyse two criteria to measure the readability of a proof: the criterion given by Chou et al. and the one given by Wiedijk. After discussing the limitations of these two criteria, we introduce a novel approach, which provides a new criterion. We conclude discussing some future work. Keywords Readability · Formal proof · Automated deduction in geometry · Geometrography Mathematics Subject Classification 51A05 · 68T15 1 Introduction An important feature of a text is its readability. Readability is the ease with which a reader can understand a written text. The readability of a text is determined by many factors and plays an important role in many areas of interest. For example, it might depend both on the content of the text, i.e., the complexity of the vocabulary and syntax, and on the layout of the The first author was partially support by FCT - Foundation for Science and Technology, I.P., within the scope of the project CISUC - UID/CEC/00326/2020 and by European Social Fund, through the Regional Operational Program Centro 2020. The second author was partially support by Italian Ministry of Education, University and Research through the PRIN 2017 project “The Manifest Image and the Scientific Image” prot. 2017ZNWW7F_004. Pedro Quaresma pedro@mat.uc.pt Pierluigi Graziani pierluigi.graziani@uniurb.it Departamento de Matemática da FCTUC, Largo D. Dinis, 3000−143 Coimbra, Portugal Department of Pure and Applied Sciences, University of Urbino, Via Sant’Andrea, 34, Urbino, PU 61029, Italy 0123456789().: V,-vol 123 5 Page 2 of 21 P. Quaresma, P. Graziani text, e.g. its typographical aspects. Readability has to be distinguished from legibility that is the ease with which a reader can recognise individual characters in a text. In order to quantify the readability of a text, various formulas have been defined [5]. In this paper we will deal with the readability of geometric proofs produced by automatic theorem provers. A potential approach that has been followed in the past is to take formulas that were developed for applications to non-scientific texts and apply them to mathematical texts [13]. However, this sort of approach has not been extended in a proper way to measure the readability of proofs produced by automatic theorem provers and much more work needs to be done [6, 12, 19, 22, 26]. A mathematical text is composed of many elements: descriptions in natural language, formulas and diagrams; thus, it is much more difficult to quantify its readability through formulas, then in the case of regular text. Even more complex is the problem of the readability of mathematical proofs produced by automatic provers that are often presented in a form that can only be read by experts. In this paper, we will introduce both a language to formulate readability criteria for formal proofs produced by automated theorem provers for geometry, based on the area method [2, 10] (see Appendix 1), and a novel criterion based on our modernisation of Lemoine’s Geometrography [14, 22, 24]. We will show how this new criterion is consis- tent with the results of the other already existing criteria, but that it is also more general and expressive compared to the others. The proposed language allows for an easy formulation of new readability criteria as well as for an easy implementation of those criteria in repositories such as the Thousand of Geometric problems for geometric Theorem Provers (TGTP), thereby collecting data relevant to the further development of the area of automated theorem proving in geometry. The data will help strengthen the use of automatic tools not only in research but also in applications like in mathematics education, where the use of automated deduction is already making its way [7, 23]. Therefore, as in the Automath project, the formulation of a readability criterion will allow the definition of a threshold below which “people will start using them (the proofs produced by automated theorem provers ) for serious work” (see §2.22.2). Overview of the paper. The paper is organised as follows: first, in §2, the known readability criteria will be discussed. In §3, Lemoine’s Geometrography, its modernisation and a formal language employed to study readability of formal proofs produced by automated theorem provers for geometry, based on the area method, will be analysed. In §4, a new readability criterion that uses Geometrography will be presented, providing also some examples of its application. In §5 conclusions are drawn and future work will be discussed. 2 Criteria of Readability (by Experts) To the best of our knowledge there are two precise proposals to measure the readability of a proof. The first one is that proposed by Chou et al. [1, p. 452], while the second is that proposed by Freek Wiedijk and is known as the de Bruijn factor [4, 27]. We will not consider here the problem of the readability of geometric proofs from the Mathematics Educa- tions point of view. We will address some issues related to that context in the conclusions. http://hilbert.mat.uc.pt/TGTP/index.php. In the original sentence, “Automath like Systems”. 123 Measuring the Readability of... Page 3 of 21 5 Fig. 1 Geometric Construction, Ceva’s Theorem 2.1 Maxt-Lems Criterion Chou et al. [1, p. 452] proposed a way to measure how difficult it is to read a formal proof, obtained by using an automated theorem prover for geometry (GATP) implementing the area method. The Maxt-Lems (ML) criterion considered the following pair (maxt, lems), where: – maxt is the number of terms of the maximal polynomial occurring in the machine’s proof. Thus, maxt measures the number of computations needed in the proof; – lems is the number of elimination lemmas used to eliminate points from geometric quan- tities. In other words, lems indicates the number of deduction steps in the proof. Using those two elements and analysing all the proofs done by their GATP, they managed to determine an indicative threshold for readability. According to [1, p. 452] a formal proof, which employs the area method, is considered readable if one of the following conditions holds: – the maximal term in the proof is less than or equal to 5; – the number of deduction steps of the proof is less than or equal to 10; – the maximal term in the proof is less than or equal to 10 and the deduction steps are less than or equal to 20. It is interesting that according to their corpus: 66.9% of the proofs have maxt ≤ 5, 42.6% have lems ≤ 10 and 73.2% have lems ≤ 20. Let us consider, for example, the Thousand of Geometric problems for geometric Theorem Provers (TGTP) repository, specifically, problem GEO0001,the Ceva’s Theorem. Theorem 1 (Ceva’s Theorem) Let  A BC be a triangle and P be any point in the plane. Let AF BD CE D = AP ∩ CB, E = BP ∩ AC , and F = CP ∩ A B . Show that: × × = 1. FB DC EA P should not be in the lines parallel to AC, AB and BC and passing through B, C and A respectively [10]. With respect to the ML criterion, considering the proof made by the Geometry Construc- tions L T X Converter (GCLC)[11] GATP (see Appendix 1), the values are: max t er m = 1, and lems = 3. Therefore, this would be considered a readable proof. They considered 478 machine solved geometry problems. 123 5 Page 4 of 21 P. Quaresma, P. Graziani Table 1 Ceva’s Theorem Proof Informal Formal de Bruijn factor by GCLC Area Method GATP, de Bruijn factor Uncompressed 125KB 137KB 1.09 Compressed 124KB 136KB 1.09 2.2 The de Bruijn Factor The Automath project had the goal of developing a system that would allow to write entire mathematical theories in such a precise fashion that verification of the correctness of theo- rems in such theories could have been carried out by formal (mechanical) operations applied directly to the text [4]. This was a first effort in the direction of the Formalisation of Mathe- matics that is now pursued by researchers working in systems like Coq, Isabelle and Mizar. In “A Survey of Project Automath”, de Bruijn introduced the loss factor between the size of an ordinary mathematical exposition and its full formal translation inside a computer. The loss factor expresses what someone loses, in terms of shortness, when translating informal mathematics into Automath. Wiedjk developed the concept and called it the de Bruijn factor. The de Bruijn factor was developed for a situation where a proof is entered in a computer in full detail in such a way that the computer can check its correctness, e.g, when an existing informal mathematical text is taken and it is translated into a computer representation (using a system like Automath). So the de Bruijn factor measures how efficient a system is [27]. Wiedijk noted that non-meaningful questions about formatting could affect the calculus of the loss factor, for example: if indentation is performed employing the tab key, then such indentation can be eight times smaller compared to situations in which the indentation is done using the space key; also the T X macro name for the ’⇔’ symbol uses 15 characters, while an encoding like “<=>” uses only 3. To further smooth formatting choices, Wiedjk proposed to compress the files before calculating the ratios of their sizes. Wiedijk calls the ratio of the uncompressed file sizes the apparent de Bruijn factor, and the ratio of the compressed file sizes the intrinsic de Bruijn factor [27]. We claim that the de Bruijn factor can be used, in a broader sense, to measure the effi- ciency of an automated theorem prover and a given axiomatisation. Whenever a informal proof is known for a given theorem, it can be compared with the formal proof produced by the automated theorem prover, using a specified axiomatisation. This is particularly true in geometry where a given informal geometric proof can be compared with an, also geometric, formal proof produced by a geometric automated theorem prover. Using again the Ceva’s Theorem as an example, the readability of its formal proof, with respect to the de Bruijn factor can be calculated (see Table 1). Wiedijk also introduced the de Bruijn threshold, i.e., a limit below which “the people will start using them (Automath like system) for serious work”. We will consider the value of 2 as a readability threshold. Further studies are needed in order to establish a readability threshold for automated proofs, using the de Bruijn factor. Moreover, a broader comparison between formal proofs and informal proofs is needed. Considering the quotient of the size of the compressed formal proof (area method) and the size of the informal proof, the de Bruijn factor of Ceva’s Theorem is 1.09. It would therefore be sensible to consider the GCLC area method proof, readable. https://coq.inria.fr/, https://isabelle.in.tum.de/, http://mizar.org/. We used the proof found in https://artofproblemsolving.com/wiki/index.php/Ceva’s_theorem as source for the informal proof. 123 Measuring the Readability of... Page 5 of 21 5 2.3 ML and de Bruijn Factor’s Limits Analysing the previous criteria, we can note a first limit for both the ML criterion and the de Bruijn factor: they assume that readability by expert is being considered, i.e., a geometer expert in the language of the prover that produces the proof. A second limit emerges when the following [22] classification of formal geometric proofs produced by GATPs is taken into consideration: 1. no readable proof, only a proved/not proved output; 2. non-synthetic proof (i.e., a proof without a corresponding geometric description, e.g. algebraic methods); 3. semi-synthetic proof with a corresponding prover’s language rendering; 4. (semi-)synthetic proof with a corresponding natural language rendering; 5. (semi-)synthetic proof with a corresponding natural language and visual rendering; Relating the ML criterion with this classification, we can note that such criterion only allows the definition of a threshold for semi-synthetic proofs that employ the area method (level 3). The direct applicability of the ML criterion to other synthetic methods, e.g. full- angle methods or the deductive database method [3, 28], would be possible, considering the number of deduction steps of the proofs and adapting the condition regarding the maximal term in the proofs. The de Bruijn factor can be used directly in all levels above 1, although it is more mean- ingful on levels greater or equal than 2.3. Considering the (GCLC) and its integrated GATPs based on the area method, Wu’s method and Gröbner Basis method [9], it is possible to cal- culate the readability of the proofs developed using the different GATPs. It is indeed possible to imagine, extrapolating from the results with the area method, that all those proofs would be readable, and this would hold even though the de Bruijn factor requires informal proofs to be provided. The two criteria analysed are very different, the first is very specific while the second is very generic, although both criteria require readability by experts. We can therefore ask ourselves if it is possible to define a new criterion which does not require readability by experts, which is also more natural and expressive than the previous ones, and which can be generalised to various proof methods. 3 Looking for a More Natural Readability Criterion The new criterion that we want to propose is based on our modernisation of Lemoine’s Geometrography [14, 22, 24]. We will begin by explaining what Geometrography is and what its modernisation consists of. Geometrography, “alias the art of geometric constructions”, aims at providing a tool: (i) to designate every geometric construction by a symbol that manifests its simplicity and GATPs can be of two major types: algebraic, the proof, if it exist, is done recurring to an algebraic reasoning (e.g. Gröbner basis); geometric (synthetic), the proof, if it exist, is done recurring a set of axioms and inference rules of geometry, without the use of coordinates. Semi-synthetic methods, e.g. the area method, use also the axioms of a field of characteristic different from 2. The Wu’s method and the Gröbner basis method are both algebraic methods, from the geometric point of view their proofs are unreadable (level 2). 123 5 Page 6 of 21 P. Quaresma, P. Graziani exactitude; (ii) to teach the simplest way to execute an assigned construction; (iii) to discuss a known solution to a problem and eventually replacing it with a better solution; (iv) to compare different solutions for a problem, by deciding which is the most exact and the simplest solution from the point of view of Geometrography [14–17, 20, 22, 24]. 3.1 Classical Geometrography In Lemoine’s Geometrography two coefficients are defined to measure the relative difficulty to perform some geometric constructions. The approach is applied to ruler and compass geometry, i.e., geometric constructions made only with the help of a ruler and a compass. Considering the modifications proposed by Mackay [16], the following Ruler and Compass constructions and the corresponding coefficients can be analysed. To place the edge of the ruler in coincidence with one point . ............. R To place the edge of the ruler in coincidence with two points . . . .........2 R To draw astraight line.................................................. R To put one point of thecompassesonadeterminate point ............... C To put thepointsofthe compassesontwo determinate points..........2C To describeacircle...................................................... C Then a given construction is measured against the number of uses of those elementary steps. For a given construction expressed by the equation: l R + l R + m C + m C 1 1 2 2 1 1 2 2 where l and m are coefficients denoting the number of times any particular operation is i j performed. The number (l + l + m + m ) is called the coefficient of simplicity (cs) of the 1 2 1 2 construction, and it denotes the total number of operations performed. The number (l + m ) 1 1 is called the coefficient of exactitude (ce) of the construction, and it denotes the number of preparatory operations on which the exactitude of the construction (made with the help of physical, inaccurate, tools) depends [16, 17]. 3.2 Geometrography in Dynamic Geometry Classical Geometrography applies to geometric constructions made with the help of a ruler and a compass. Its modernisation, proposed in [22, 24] uses the tools of the dynamic geometry systems (DGS). In [22] it was shown how to modernise Geometrography using GCLC,in [24] the generality of the approach is shown, using GeoGebra [8]. Considering the operations: define a point, anywhere in the plane, D and define a given object, using other objects, C, the following values for the GCLC basic constructions are obtained: point –fixapoint in theplane........................................... D line –usestwo points. ..................................................2C circle –usestwo points................................................2C intersec –usestwo lines ................................................2C intersec –usesfour points..............................................4C intersec2 –usesacircle andacircle or line.............................2C Exactitude, or the lack of it, in this context refer to the possible inaccuracy introduced by physical tools such as ruler and compass. 123 Measuring the Readability of... Page 7 of 21 5 Fig. 2 Geometric Construction, TGTP problem GEO0369 midpoint –usestwo points..............................................2C med –usestwo points....................................................2C bis –usesthree points..................................................3C perp –usesapoint andaline...........................................2C foot –usesapoint andaline...........................................2C parallel –usesapoint andaline.......................................2C onsegment –usestwo points.............................................2C online –usestwo points................................................2C oncircle –usestwo points..............................................2C In the modernisation (extrapolation) of the Geometrography, considering the “tools” of dynamic geometry systems, the coefficient of exactitude loses its meaning, the constructions will be executed by the DGS, so they are accurate (exact). However, the coefficient of simplic- ity of the constructions can still be useful, it can be used to classify constructions by levels of simplicity. A new dimension can also be added, the coefficient of freedom (cf), given by the degree of freedom a given geometric object has, e.g. “a point in a line” has one degree of freedom, a point in the plane has two degrees of freedom, etc. This new coefficient will give a value to the dynamism of the geometric construction. The degrees of freedom are mea- sured against the point definitions. The point definition, defines a point with two degrees of freedom, the onsegment, online and oncircle constructions, define points with one degree of freedom. For the GCLC constructions contained in TGTP an average value of simplicity (CS )of20.8 was obtained. Using the k-means clustering function implemented gcl in the statistics package of Octave, three classes of geometric constructions describing an increasing level of complexity were defined: simple constructions, 1 ≤ CS ≤ 18; average gcl complexity constructions, 18 < CS ≤ 28; complex constructions, CS > 28. gcl gcl TGTP contain 71 simple constructions; 81 average complexity constructions; 28 complex constructions. For example (TGTP problem’s GEO0369): “In triangle  ABC,let F be the midpoint of the side BC,and D and E the feet of the altitudes on AB and AC, respectively. FG is perpendicular to DE at G. Show that G is the midpoint of DE”, has a geometric construction with coefficient of simplicity 19 (see Fig. 2), so an average complexity construction. The value GNU Octave, version 6.1.1, package octave-statistics, function kmeans https://octave.sourceforge.io/ statistics/function/kmeans.html. 123 5 Page 8 of 21 P. Quaresma, P. Graziani of 6 for its coefficient of freedom is given by the fact that only the three points A, B,and C are free in the plane, while all the other points are completely bind, by construction. 3.3 Geometrography in Automatic Theorem Proving The same approach can be (again) extrapolated to take into consideration synthetic geometric proofs, i.e., proofs based on a geometric axiomatic theory, using geometric inference rules. Considering the proofs produced by the GATP GCLC, implementing the area method [9, 10], the coefficient of simplicity for all the axioms and lemmas of the theory can be calculated. Apart from the geometric constructions in which the proof is based (with coefficient of simplicity nCnst), there are other steps to be considered. (Elementary) Algebraic Simplification..................................(AS) (Elementary) GeometricSimplification. ................................(GS) Application of the Area Method Lemma n...........................(AML ) A given proof can thus be measured against the number of those steps. For a given proof expressed by the equation: n Cnst + n × AS + n × GS + AML 1 2 3 j j =l where n is the coefficient of simplicity of the geometric construction, n is the number of 1 2 algebraic simplifications and n is the number of geometric simplifications. The coefficient of simplicity for the proof would be: CS = n + n + n + CS (AML ) proof 1 2 3 proof j j =l The coefficient of freedom has no meaning in this setting. Each lemma of the area method, AML , has a corresponding simplicity coefficient, the term, CS (AML ), is the sum of all those values, for all the lemmas used in proof j j =l the proof. In order to achieve this for each lemma of the area method the corresponding coefficients of simplicity were calculated [21]. For example, the proof of Lemma 9 will have the following coefficient of simplicity, CS (AML ) = 74. proof 9 Lemma 1 (AML ) Let R be a point on the line P Q. Then for any two points A and B it RQ PR holds that S = S + S . RAB QAB PAB PQ PQ The proofs developed by GATPs based on the Area Method are formal proofs. The method itself was formalised, and proved sound, using the Coq proof assistant. The GATP developed by J. Narboux, as a Coq tactic, can have the proofs verified by Coq.The GCLC area method, do not have, explicitly, that possibility, but, it would be a matter of developing a filter from the GCLC language to the Coq language (see Appendix 1). By elementary algebraic simplification it is understood the basic algebraic operations: addition, subtraction, multiplication, division, and their properties of commutativity, associativity and distributivity. By elementary geometric simplification it is understood the direct application of the definition of the area method quantities. We call them trivial steps. 123 Measuring the Readability of... Page 9 of 21 5 The following is a shorter version of its proof with the elementary algebraic and geometric simplifications condensed (the expanded version can be see in [21]). Geometrography of Lemma 9 (AML ) A B CS = 22 = 4D + 18C gcl CF = 8 gcl – s = S , initial construction; AB P Q –1 × GS, areas of triangles with the same orientation, S = s − S − S ; RAB AR Q BP R PR –1 × AML , lemma 14, = r (AML = 8); 14 14 PQ AR Q RQ –1 × AML , lemma 5, = (AML = 18); 5 5 AP Q PQ RQ PQ−PR –1 × GS, segments with the same orientation, = ; PQ PQ PQ−PR –2 × AS, algebraic simplifications, = (1 − r ) and S = (1 − r )S ; AR Q AP Q PQ S PR BP R –1 × AML , lemma 5, = (AML = 11); 5 5 BPQ PQ –2 × AS, algebraic simplifications, S = r S and S = s − (1 − r )S − BP R BPQ RAB AP Q r S ; BPQ –2 × GS, areas of triangles with the same orientation, S = s − (1 − r )(s − S ) − RAB PAB r (s − S ); QAB –7 × AS, algebraic simplification, S = s − s + rs + S − r S − rs + r S , RAB PAB PAB QAB PR RQ S = r S + (1 − r )S and S = S + S . RAB QAB PAB RAB QAB PAB PQ PQ Geometrography for the demonstration: 4D + 18C + 4GS + 11AS + 1AML + 2AML 14 5 CS = 74 = 22 + 4 + 11 + 8 + (18 + 11) proof AML CS = 22 gcl with AML = 8and AML = 18 (first application) and AML = 11 (second application). 14 5 5 It is considered that, from the second application of a lemma onward, its proof is accepted, so, only its adaptation to the new configuration is needed, i.e., the pattern matching of the lemma configuration to a new setting. For that reason, in any second, third, etc. application of a lemma, only the CS coefficient values are considered. gcl Given that a mathematical proof is a sequence of steps, in addition to the coefficient of simplicity, it would be useful to have other coefficients: e.g., the total number of steps in the proof; the value of the most difficult step in the proof; the number of different steps of high difficulty in the proof; the number of different types of steps (lemmas) in the proof; a proof script; a numerical description of the proof; and a corresponding line chart or proof trace. Therefore, to fully characterise a formal synthetic proof produced by a GATP, we can define and consider the following coefficients: –CS , the simplicity coefficient (as above), it gives the simplicity coefficient for the proof overall proof; 123 5 Page 10 of 21 P. Quaresma, P. Graziani –CT , the total number of steps in the proof; proof –CS , the highest simplicity coefficient of the lemmas/definitions applications, it proofmax gives the simplicity coefficient for the most difficult step of the proof; –CD , the number of different types of lemmas used in the proof; typeproof –CD , the number of different steps of high difficulty in the proof; highproof – The proof script, as defined above; – The corresponding line chart or proof trace in tikz format. It is important to note that to obtain the coefficient CD (hp) the area method lemmas highproof implemented in the GATP GCLC were analysed, and, using the k-means clustering function implemented in the statistics package of Octave, divided into three categories: low difficulty (hp < 284), medium difficulty (284 ≤ hp < 1848) and high difficulty (hp ≥ 1848). Using the defined coefficients above, we have the following values for the proof of AML : ⎪ CS = 74 = 22 + 4 + 11 + 8 + (18 + 11) proof CS = 22 gcl CT = 19 proof AML CS = 18 ⎪ proofmax CD = 2 typeproof CD = 0 highproof The GATP GCLC implementation of the area method [9, 10] is able to produce proof scripts. Using the command prooflevel it is possible to have control over the level of detail of the proof script. Two programs were implemented to calculate the Geometrogra- phy of the proofs. The Geometrography of the construction is calculated by a bash script, gclcGeometrography.bash, that analyses the GCLC geometric construction (not con- sidering all the rendering commands). The Geometrography of the proof script (minus the geometric construction) is calculated by, csproof, a parser that analyse the proof script counting the algebraic steps and the geometric steps in sequence and also the lemmas and definitions of the area method with the respective coefficient of simplicity. Using the program csproof on an arbitrary geometric proof, it can be obtained: a CSV file with the values regarding the Geometrographic Readability Coefficient of Proofs (see Sect. 4); a file with the coefficient of simplicity of the geometric construction; a file with a line chart, a graphical representation of the proof done by the GATP GCLC. To better understand some details, let’s consider again the Ceva’s theorem (see Theorem 1). Using the GATP GCLC, with the full level of detail, the proof script of Ceva’s theorem has all the details explained and it fills two pages, almost three pages, if the notes about the non-degeneracy conditions and about the proof itself are taken into consideration (see Appendix 1). The line chart is shown in Fig. 3. In it, the sequences of algebraic, or geometric, simplifications are condensed in only one step (for a more condensed view of the graph). https://ftp.eq.uc.pt/software/TeX/graphics/pgf/base/doc/pgfmanual.pdf. The open source codes are available in the GitHub project https://github.com/GeoTiles/Geometrography/ tree/master/GeometrographyProofs. Comma Separated Values format. 123 Measuring the Readability of... Page 11 of 21 5 Fig. 3 Ceva’s Theorem, Geometrography Proof Trace Therefore, the Geometrography of Ceva’s Theorem Proof is the following: 4D + 18C + 23AS + 3AML + 3AML + 3AML . 1 8 10 ⎪ CS = 220 proof CS = 22 ⎪ gcl CT = 32 proof Ceva sTheorem ⎪ CS = 84 proofmax CD = 3 typeproof CD = 0 highproof 4 A Geometrographic Criterion It is interesting to note how the Geometrographic coefficients highlight many salient aspects of the proof, aspects that could be used to analyse the readability of such proofs. Furthermore, it is interesting to stress how the proof trace constitutes a sort of electroencephalogram of the machine while proving the theorem. Just as an electroencephalogram can be useful for measuring a brain’s electrical activity, the line chart helps to understand some features of the proof by looking at its trace. Applying the Geometrography to the area method proofs contained in the repository TGTP,using theGATP GCLC with the full level of detail, and using the geometrographic coefficients we can argue in favour of the following new readability coefficient: Geometrographic Readability Coefficient of Proofs (GRCP) GRC P = ((CS − CT ) × (CD + CD )) proof proof highproof typeproof This coefficient relates four quantities: the simplicity coefficient of the proof, the total number of steps in the proof, the number of different steps with high-difficulty in the proof, the number of different lemmas used in the proof. The first factor, (CS − CT ), gives an approximation to the overall coefficient proof proof of simplicity of the non-trivial steps in the proof. Note that CT count the number of proof 123 5 Page 12 of 21 P. Quaresma, P. Graziani steps rather than the coefficient of simplicity of each step. By contrast, in CS ,itisthe proof coefficient of simplicity that counts. Each trivial step has a coefficient of simplicity equal to one, and the coefficients of simplicity for non-trivial steps, such as the construction and the lemmas, are much greater than one. In the light of this, it can be concluded that the difference between CS and CT emphasises the complexity of the proof, disregarding its length. proof proof The second factor, (CD + CD ), gives an account of the difficult steps. highproof typeproof Steps that, potentiality, make the proof much harder to follow, steps where the normal flow of the proof would be interrupted to jump to the proof of the lemma, resuming after completing the lemma’s proof. The addition of the number of high-difficulty steps with the number of different lemmas used in the proof, gave a multiplying factor for the overall complexity of the proof. A final note about this second factor: a high-difficulty step is, for sure, a lemma application, nevertheless we felt that the high-difficulty nature of the lemma is a sufficient reason for this double counting. Multiplying these factors, the approximation for the overall simplicity coefficient and the number difficult steps—both elements that we believe characterise the readability of a proof—we obtain a readability coefficient of a proof. Therefore, considering 71 theorems and their area method proofs, from the TGTP repository and using, again, the k-means clustering function from Octave, the proofs can be divided into the following classes of Geometrographic readability: readable (hi gh − r eadabi li t y), GRCP ≤ 48000; medium-readability 48000 < GRCP ≤ 135000; low-readability, GRCP > 135000. The GRCP for GEO0001, Ceva’s proofs is: GRCP = (220 − 32) × (0 + 3) = GEO0001 564 ≤ 48000, so a readable (high-readability) proof. GRCP Medium-readability Example. TGTP problem’s GEO0021: Theorem 2 (Circumcenter of a Triangle) The circumcenter of a triangle can be found as the intersection of the three perpendicular bisectors has the following values for the different coefficients. ⎪ CS = 8554 proof CS = 11 gcl CT = 591 proof GEO0021 CS = 2807 ⎪ proofmax ⎪ CD = 13 typeproof CD = 3 highproof 48000 < GRC P = 127408 ≤ 135000 By the GRCP criterion, this is a medium-readability problem. It can be seen that it has 13 different lemmas, 3 high-difficulty step, a long proof with a significant difference between the CS and the number of steps of the proof (see Figs. 4 and 6). proof 4.1 GRCP low-readability Example. TGTP problem’s GEO0020: Theorem 3 (Distance of a line containing the centroid to the vertices) Given a triangle A BC and a point X , the sum of the distances of the line X G , where G is the centroid of A BC , to The actual values were rounded for better readability. 123 Measuring the Readability of... Page 13 of 21 5 Fig. 4 TGTP, GEO0021, Circumcenter of a Triangle the two vertices of the triangle situated on the same side of the line is equal to the distance of the line from the third vertex. has the following values for the different coefficients: CS = 19989 proof ⎪ CS = 26 gcl CT = 4119 proof GEO0020 ⎪ CS = 2807 proofmax ⎪ CD = 13 typeproof CD = 4 highproof GRC P = 269790 > 135000, so a low-readability problem. It can be seen that it has 13 different lemmas, 4 high-difficulty step, a long proof, with a very high value of overall complexity (see Figs. 5 and 7). 4.2 Comparing the Different Criteria The Geometrography Readability Coefficient of Proofs criterion takes into consideration all the significant aspects of a formal proof, its overall difficulty, its number of steps, the number of difficult steps and the number of different lemmas that must be applied. The other criteria consider fewer aspects. The de Bruijn criterion, given its different goal, takes only in consideration the size of the proof and it needs to have an informal proof to compare with. The ML criterion considers the number of different lemmas applied and uses the number of 123 5 Page 14 of 21 P. Quaresma, P. Graziani Fig. 5 TGTP, GEO0020, Distance of a line containing the centroid to the vertices terms of the maximal polynomial as a way to have an approximation to the complexity of the proof. Alongside the ML criterion, in the GRCP criterion, the number of lemmas in the proof is considered: in the GRCP criterion as a multiplicative factor, in the ML criterion as one of the conditions for readability. In the ML criterion the number of terms in the maximal polynomial are considered, but, as its authors remarked, this measures the number of computations needed in the proof, not its readability. This is weakly related to the number of steps in the proof. It approaches the number of steps needed to decompose those long polynomials occurring in the proof to a simple expression. Regardless of this criteria comparison, we want to emphasise that the Geometrographic view proposed in this paper has a more general scope. Although the GRCP criterion is a reasonable proposal, the elementary quality of the Geometrographic approach, through the analysis of various coefficients of the proofs, the proof scripts and the proof traces, makes it possible to have a language or a tool that can be used by non-experts to formulate other criteria weaker or stronger than the one we propose. The contribution of this paper is therefore not only that of a Geometrographic criterion, but of a Geometrographic approach to the problem of measuring the readability of formal proofs in automated deduction in geometry, an approach that offer an environment in which to analyse the proofs in detail by proposing and test readability criteria. To the best of our knowledge, it is the first time that the community has access to such a general tool to formulate and to study the readability of formal proofs in automated deduction in geometry. It is also interesting to note that our criterion offers a classification of proofs that is in line, when the fundamental points are considered, with the classifications given by the other two criteria. i.e., proofs that are classified as difficult to read according to the new criterion are also classified as difficult to read for the others, and the same applies to proofs that are easy to read (Table 2). 123 Measuring the Readability of... Page 15 of 21 5 Table 2 Comparison of the three criteria TGTP ML de Bruijn GRCP GEO0001 3 < 5, deduction steps easy 1.6 < 2 easy 564 ≤ 48000 easy(high) GEO0021 13 > 5 deduction steps & 37.63 > 2 difficult 127408 ≤ 135000 number of terms > 5 difficult(medium) difficult GEO0020 13 > 5 deduction steps & 47.31 > 2 difficult 269790 > 135000 number of terms > 5 difficult(low) difficult Finally, we have to say that all the criteria proposed here have no empirical validation through the submission of tests to students, experts, etc. Nevertheless, the great advantage that our approach offers is that it allows to formulate criteria that can be implemented in repositories such as TGTP and can be evaluated experimentally in a very simple way. 5 Conclusions In this paper we have analysed the problem of measuring the readability of formal proofs in automated deduction in geometry. We have introduced two known criteria and highlighted some of their limitations. We have then introduced a third criterion that seems to overcome the problem of readability by expert, therefore being more natural than the previous ones, and seems to be easily generalised. One possible generalisation is given by the possibility to formulate weaker, or stronger, criteria, using the proposed language. Another possible avenue is given by the generalisation of our approach to other GATPs (e.g. the JGEx inte- grated GATPs, area method, full-angle method and deductive databases method [3, 28, 29], ArgoCLP, coherent logic prover [25]) and any other ATP that has a proof script based on axioms, lemmas applications and, eventually, elementary steps (algebraic, geometric, etc.). It is a matter of calculation of the coefficients of simplicity for the axioms and lemmas of the base theory in consideration. As we pointed out, the great advantage that our approach offers is that it allows to formulate criteria that can be implemented in repositories such as TGTP and evaluated experimentally. For this reason, an important work that we are planning is an experiment to be submitted to mathematicians, computer scientists, educationalists and students providing an adequate empirical test for our Geometrographic criterion. Acknowledgements The authors wish to thank Salvatore Florio, Victor Pambuccian, and Mirko Tagliaferri for their comments on an earlier draft of this work. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. 123 5 Page 16 of 21 P. Quaresma, P. Graziani Appendix A Ceva’s Theorem, GCLC Area Method Proof The area method for Euclidean constructive geometry was proposed by Chou, Gao and Zhang in the early 1990’s [2]. The method can efficiently prove many non-trivial geometry theorems and is one of the most interesting and most successful methods for automated theorem proving in geometry. In [10] a variant of the original axiom system was presented, based on that axiomatisation all the lemmas needed by the method were formally proved and the soundness of the method was established, using the Coq proof assistant [18]. The GCLC implementation of the area method is able to produce formal proofs. If the highest level of details is chosen, prooflevel 7, it would be possible to (an appropriated filter has to be built) formally verify those proofs using a proof assistant, e.g. Coq.The L T X proof scripts that GCLC produces (by default, at prooflevel 2) are a natural language rendering, to be read by mathematicians. The area method axiomatic system for Euclidean plane geometry (within first order logic with equality), has just one primitive type of geometrical objects: points. Variables can also range over a field ( F , +, ·, 0, 1),where F is any field of characteristic different from 2. The axioms of the theory of fields used in GCLC area method proofs, are standard. The Ceva’s proof presented below is a L T X proof script produced by GCLC, at prooflevel 7, edited to include the GRCP values. GCLC Prover Output for conjecture “cevaGEO0001”, Area method used − → −→ − → AF BD CE by the statement · · = 1 (0) − → −→ − → 22 = 4 × D + 18 × C FB DC EA − → −→ − → AF BD CE by Lemma 10 −1 · · · = 1 (1) − → −→ − → AML = 13 BF DC EA − → −→ − → AF BD CE by right association −1 · · · = 1 (2) − → −→ − → 1 × AS BF DC EA − → −→ − → AF BD CE by right association −1 · · · = 1 (3) − → −→ − → 1 × AS BF DC EA −→ − → by Lemma 8 (point S BD CE APC −1 · · · = 1 F eliminated) (4) −→ − → BPC DC EA AML = 84 −→ − → S BD CE by Lemma 10 APC −1 · · · −1 · = 1 (5) −→ − → AML = 13 S 10 BPC DC AE ⎛ ⎞ −→ − → BD CE S · · −1 · APC −→ − → ⎜ ⎟ DC AE by multiplication of frac- ⎜ ⎟ −1 · = 1 (6) ⎝ ⎠ tions 1 × AS BPC Narboux, J.: Formalization of the area method. Coq user contribution (2009). http://dpt-info.u-strasbg.fr/ ~narboux/area_method.html. 123 Measuring the Readability of... Page 17 of 21 5 −→ − → BD CE −1 · S · · −1 · APC −→ − → DC AE by multiplication of frac- = 1 (7) tions 1 × AS BPC −→ − → BD CE −1 · S · −1 · · APC −→ − → DC AE by associativity and com- = 1 (8) mutativity 1 × AS BPC −→ − → BD CE −1 · −1 · S · · APC −→ − → DC AE by associativity and com- = 1 (9) mutativity 1 × AS BPC −→ − → BD CE 1 · S · · APC −→ − → DC AE by multiplication of con- = 1 (10) stants 1 × AS BPC −→ − → BD CE S · · APC −→ − → DC AE by multiplication by 1 = 1 (11) 1 × AS BPC −→ BD CP B S · · APC −→ by Lemma 8 (point AP B DC = 1 E eliminated) (12) BPC AML = 14 −→ BD CP B S · · APC −→ AP B DC by Lemma 1 = 1 (13) AML = 10 (−1 · S ) 1 CP B −→ BD CP B S · −1 · · APC −→ AP B CD by Lemma 10 = 1 (14) AML = 11 (−1 · S ) 10 CP B −→ BD CP B S · 1 · · APC −→ AP B CD by ratio cancellation = 1 (15) 1 × AS (1 · S ) CP B −→ BD CP B S · · APC −→ AP B CD by multiplication by 1 = 1 (16) 1 × AS (1 · S ) CP B −→ BD CP B S · · APC −→ AP B CD by multiplication by 1 = 1 (17) 1 × AS CP B ⎛   ⎞ −→ BD ·S −→ CP B ⎜ ⎟ CD ⎝ S · ⎠ APC AP B by multiplication of frac- = 1 (18) S tions 1 × AS CP B −→ BD S · ·S −→ APC CP B CD by multiplication of frac- AP B = 1 (19) tions 1 × AS CP B −→ BD S · · S −→ APC CP B CD by multiple fraction sim- = 1 (20) plification 1 × AS (S · S ) AP B CP B 123 5 Page 18 of 21 P. Quaresma, P. Graziani −→ BD S · · 1 −→ APC CD by ratio cancellation = 1 (21) 1 × AS (S · 1) AP B −→ BD S · −→ APC CD by multiplication by 1 = 1 (22) (S · 1) 1 × AS AP B −→ BD S · APC −→ CD by multiplication by 1 = 1 (23) 1 × AS AP B BP A by Lemma 8 (point S · APC CP A D eliminated) = 1 (24) AP B AML = 14 BP A S · APC −1·S ( ) by Lemma 1 APC = 1 (25) AML = 9 AP B BP A S · APC (−1·S ) by Lemma 1 APC = 1 (26) AML = 9 (−1 · S ) BP A S ·S ( ) APC BP A −1·S ( ) by multiplication of frac- APC = 1 (27) tions 1 × AS (−1 · S ) BP A (1·S ) BP A (−1·1) by ratio cancellation = 1 (28) 1 × AS (−1 · S ) BP A BP A (−1·1) by multiplication by 1 = 1 (29) 1 × AS (−1 · S ) BP A BP A −1 by multiplication by 1 = 1 (30) 1 × AS (−1 · S ) BP A (−1 · S ) BP A by fraction with number = 1 (31) denominator 1 × AS (−1 · S ) BP A by ratio cancellation 1 = 1 (32) 1 × AS Q.E.D NDG conditions are: S = S i.e.,linesBCand PA are not parallel (construction based assumption) BP A CP A S = S i.e., lines ACand PB are not parallel (construction based assumption) AP B CP B S = S i.e., lines ABand PC are not parallel (construction based assumption) APC BPC P = 0i.e., points Fand Bare not identical (conjecture based assumption) FBF 123 Measuring the Readability of... Page 19 of 21 5 P = 0i.e., points DandCare not identical (conjecture based assumption) DC D P = 0i.e., points EandA are not identical (conjecture based assumption) EAE Number of elimination proof steps: 3 Number of geometric proof steps: 6 Number of algebraic proof steps: 23 Total number of proof steps: 32 Time spent by the prover: 0.001 seconds Enlarged Proof Traces TGTP: GEO0021 See Fig. 6. Fig. 6 Medium-readability—TGTP, GEO0021, the first 146, out of 203, steps of the proof TGTP: GEO0020 See Fig. 7. 123 5 Page 20 of 21 P. Quaresma, P. Graziani Fig. 7 Low-readability—TGTP, GEO0020. The first 350, out of 1200, steps of the proof References 1. Chou, S.C., Gao, X.S., Zhang, J.Z.: Machine Proofs in Geometry. World Scientific, Singapore (1994) 2. Chou, S.C., Gao, X.S., Zhang, J.Z.: Automated generation of readable proofs with geometric invariants, I. Multiple and shortest proof generation. J. Automat. Reason. 17, 325–347 (1996). https://doi.org/10. 1007/BF00283134 3. Chou, S.C., Gao, X.S., Zhang, J.Z.: Automated generation of readable proofs with geometric invariants, II. Theorem proving with full-angles. J. Automat. Reason. 17(3), 349–370 (1996). https://doi.org/10. 1007/BF00283134 4. de Bruijn, N.: Selected Papers on Automath, Studies in Logic and the Foundations of Mathematics, vol. 133, chap. A survey of the project Automath, pp. 141–161. Elsevier, Amsterdam (1994). https://doi.org/ 10.1016/S0049-237X(08)70203-9 5. DuBay, W.H. (ed.): The Classic Readability Studies. Impact Information (2006) 6. Gao, H., Li, J., Cheng, J.: Measuring interestingness of theorems in automated theorem finding by forward reasoning based on strong relevant logic. In: 2019 IEEE International Conference on Energy Internet (ICEI), pp. 356–361 (2019). https://doi.org/10.1109/ICEI.2019.00069 7. Hanna, G., Reid, D., de Villiers, M. (eds.): Proof Technology in Mathematics Research and Teaching. Springer, Berlin (2019) 8. Hohenwarter, M.: Geogebra—a software system for dynamic geometry and algebra in the plane. Master’s thesis, University of Salzburg, Austria (2002) 9. Janici ˇ c, ´ P.: GCLC - A tool for constructive Euclidean geometry and more than that. In: A. Iglesias, N. Takayama (eds.) Mathematical Software - ICMS 2006, Lecture Notes in Computer Science, vol. 4151, pp. 58–73. Springer (2006). https://doi.org/10.1007/11832225_6 10. Janici ˇ c, ´ P., Narboux, J., Quaresma, P.: The area method: a recapitulation. J. Automat. Reason. 48(4), 489–532 (2012). https://doi.org/10.1007/s10817-010-9209-7 11. Janici ˇ c, ´ P., Quaresma, P.: System description: GCLCprover + GeoThms. In: U. Furbach, N. Shankar (eds.) Automated Reasoning, Lecture Notes in Computer Science, vol. 4130, pp. 145–150. Springer (2006). https://doi.org/10.1007/11814771_13 12. Jiang, J., Zhang, J.: A review and prospect of readable machine proofs for geometry theorems. J. Syst. Sci. Complexity 25(4), 802–820 (2012). https://doi.org/10.1007/s11424-012-2048-3 13. Johnson, D.A.: The readability of mathematics books. Math. Teach. 50(2), 105–110 (1957). https://doi. org/10.5951/MT.50.2.0105 14. Lemoine, É.: Géométrographie ou Art des constructions géométriques. No. 18 in Scientia, Série Physico- Mathématique. C. Naud, Éditeur, Paris (1902). http://catalogue.bnf.fr/ark:/12148/cb36049032t 15. Loria, G.: La geometrografia e le sue trasfigurazioni. Period. Mat. 3(6), 114–122 (1908) 16. Mackay, J.S.: The geometrography of Euclid’s problems. Proc. Edinb. Math. Soc. 12, 2–16 (1893). https:// doi.org/10.1017/S0013091500001565 123 Measuring the Readability of... Page 21 of 21 5 17. Merikoski, J.K., Tossavainen, T.: Two approaches to geometrography. J. Geom. Graph. 13(1), 15–28 (2010) 18. Narboux, J.: A decision procedure for geometry in Coq. Lect. Notes Comput. Sci. 3223, 225–240 (2004). https://doi.org/10.1007/b100400 19. Pak, K., Schubert, A.: The impact of proof steps sequence on proof readability—experimental setting. In: Workshop and Work in Progress Papers at CICM 2016 (2016) 20. Pinheiro, V.A.: Geometrografia 1. Gráfica Editora Bahiense (1974) 21. Quaresma, P., Graziani, P.: The geometrography’s simplicity coefficient for the axioms and lemma of the area method. Technical Report TR 2021-001, Center for Informatics and Systems of the University of Coimbra (2021) 22. Quaresma, P., Santos, V., Graziani, P., Baeta, N.: Taxonomy of geometric problems. J. Symbol. Comput. 97, 31–55 (2020). https://doi.org/10.1016/j.jsc.2018.12.004 23. Richard, P., Vélez, M., Vaerenbergh, S.V. (eds.): Mathematics Education in the Age of Artificial Intelli- gence, Mathematics Education in the Digital Era, vol. 17. Springer (2022). https://doi.org/10.1007/978- 3-030-86909-0 24. Santos, V., Baeta, N., Quaresma, P.: Geometrography in dynamic geometry. Int. J. Technol. Math. Educ. 26(2), 89–96 (2019). https://doi.org/10.1564/tme_v26.2.06 25. Stojanovic, ´ S., Pavlovic, ´ V., Janici ˇ c, ´ P.: A coherent logic based geometry theorem prover capable of producing formal and readable proofs. In: P. Schreck, J. Narboux, J. Richter-Gebert (eds.) Automated Deduction in Geometry, Lecture Notes in Computer Science, vol. 6877, pp. 201–220. Springer, Berlin (2011). https://doi.org/10.1007/978-3-642-25070-5_12 26. Wang, K., Su, Z.: Automated geometry theorem proving for human-readable proofs. In: Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, pp. 1193–1199. AAAI (2015). http://dl.acm.org/citation.cfm?id=2832249.2832414 27. Wiedijk, F.: The de Bruijn factor. Poster at International Conference on Theorem Proving in Higher Order Logics (TPHOL2000) (2000). Portland, Oregon, USA, 14–18 August 2000 28. Ye, Z., Chou, S.C., Gao, X.S.: Visually dynamic presentation of proofs in plane geometry: Part 2. Auto- mated generation of visually dynamic presentations with the full-angle method and the deductive database method. J. Automat. Reason. 45(3), 243–266 (2010). https://doi.org/10.1007/s10817-009-9163-4 29. Ye, Z., Chou, S.C., Gao, X.S.: An introduction to Java geometry expert. In: T. Sturm, C. Zengler (eds.) Automated Deduction in Geometry, Lecture Notes in Computer Science, vol. 6301, pp. 189–195. Springer, Berlin (2011). https://doi.org/10.1007/978-3-642-21046-4_10 Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Journal

Journal of Automated ReasoningSpringer Journals

Published: Mar 1, 2023

Keywords: Readability; Formal proof; Automated deduction in geometry; Geometrography; 51A05; 68T15

References