User:GrafZahl/Draft of set-theoretical description of JHilbert


 * Work in progress

Introduction
JHilbert is a Java application based on Ghilbert, which in turn is based on Metamath. It allows the collaborative formal verification of proofs. JHilbert has a command line mode and a server mode (which drives this wiki), making it an application of medium complexity already. For all that, the small kernel of core concepts, the logic that drives the proof machinery, is of crucial importance. It is only these core concepts, notably without the collaboration features (modularisation) and the I/O system, which shall be formally described in this essay in informal, set-theoretical language. (By "formally" I mean that I use actual mathematics for the description; by "informal" I mean that I don't intend to verify the description with a proof verifier, interesting a project as that would be.)

There has been an earlier attempt to describe these core concepts and prove their soundness. Aside from containing some errors, it was at times unnecessarily complicated. I shall attempt to do away with such cruft this time.

Acknowledgements
There are lots of people to acknowledge, most notably Norman Megill for creating Metamath, Raph Levien for creating Ghilbert and for discussing the definition mechanism with me, Mel L. O'Cat for discussions during the time JHilbert was made, Carl Witty for the new definition mechanism, and Kingdon for prolonged interest in JHilbert. Bug me if I forgot someone.

Notation
As the title suggests, we shall use set theory to describe JHilbert concepts. You should have elementary familiarity with set theory before reading this document. The notation used by us is explained in the following table.

Formal systems
The four basic concepts of JHilbert are kinds, variables, functors and statements. When JHilbert reads source code, it keeps track of these in the form of a formal system $$\mathfrak{F}$$, whose set-theoretical representation will be given here by $$\mathcal{F}=(KIND,VAR,FUNC,STAT)$$. Here, the sets $$KIND$$, $$VAR$$, $$FUNC$$, and $$STAT$$ stand for the currently defined kinds, variables, functors, and statements, respectively. Each time the user defines a new one of these objects, the respective set is increased by one new member in the set-theoretical representation. In the following sections, we shall define the set-theoretical nature of the objects that go into these sets. Initially, all four sets start out empty, that is $$\mathcal{F}=(\emptyset,\emptyset,\emptyset,\emptyset)$$.

Objects and names
In the following sections, we shall define the set-theoretical objects that may go into the sets $$KIND$$, $$VAR$$, $$FUNC$$, and $$STAT$$ as certain tuples. The first element of such a tuple will always be a name. This document does not elaborate on the precise nature a name might have. In a JHilbert implementation they would be Unicode identifiers. We merely remark that we assume all of our objects to have mutually distinct names. Hence, names serve as sure distinctors where two objects are structurally identical but must be different nevertheless. A typical example would be two different variables of the same kind.

Kinds
Kinds, the elements of $$KIND$$, are just names. They are assigned additional data by the fact that we assume the set $$KIND$$ to be a disjoint union $$KIND=SKIND\cup VKIND$$ ("disjoint union" means $$SKIND\cap VKIND=\emptyset$$), where $$SKIND$$ is the set of substitutable kinds, while $$VKIND$$ is the set of pure variable kinds. The difference between those two kinds of kinds will become apparent in the sections on functors and expressions.

In JHilbert, the  command allows equivalence classes among kinds to be defined. As this is a feature of the modularity system, we shall omit its set-theoretical representation here and merely remark that a substitutable kind and a pure variable kind can never be equivalent.

Variables
Variables, the elements of $$VAR$$, are defined as tuples $$(N,K)$$ where $$N$$ is a name and $$K\in KIND$$. The decomposition of kinds into substitutable kinds and pure variable kinds induces a disjoint decomposition $$VAR=SVAR\cup PVAR$$, where $$SVAR$$ is the set of variables of substitutable kind, and $$PVAR$$ is the set of variables of pure variable kind.
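
The decomposition of $$KIND$$ and the induced decomposition of $$VAR$$ can be illustrated with a minimal Python sketch. The kind names `wff`, `set` and `var`, the variable names, and the helper `split_variables` are assumptions made for this illustration only; they are not taken from JHilbert.

```python
from dataclasses import dataclass

# Hypothetical example kinds; only the split into SKIND and VKIND follows the text.
SKIND = {"wff", "set"}   # substitutable kinds
VKIND = {"var"}          # pure variable kinds
KIND = SKIND | VKIND     # disjoint union: SKIND and VKIND share no element


@dataclass(frozen=True)
class Variable:
    name: str   # N, assumed unique among all objects
    kind: str   # K, expected to be an element of KIND


def split_variables(variables):
    """Return (SVAR, PVAR), the decomposition induced by the kinds."""
    svar = {v for v in variables if v.kind in SKIND}
    pvar = {v for v in variables if v.kind in VKIND}
    return svar, pvar


# Two structurally identical variables are told apart by their names.
VAR = {Variable("ph", "wff"), Variable("ps", "wff"), Variable("x", "var")}
SVAR, PVAR = split_variables(VAR)
```

Since every kind lies in exactly one of the two kind sets, every variable lands in exactly one of `SVAR` and `PVAR`.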

Functors
Functors are the elements of $$FUNC$$. Together with the variables, they will be used to build complex expressions in the next section. We assume $$FUNC$$ to be totally ordered. The order of functors corresponds to the order in which they are defined in JHilbert. This total order is later used to prevent cyclic definitions and abbreviations. In this context, we speak of a functor coming before or after another. The functors themselves are abstractly characterised by the following properties:
 * A result kind $$R\in SKIND$$. Note that $$R\in VKIND$$ is not allowed.
 * A place count $$n\in\mathbb{N}_0$$. It describes the number of "parameters" a functor takes.
 * $$n$$ input kinds $$K_1,\ldots,K_n\in KIND$$. Note that here, both substitutable and pure variable kinds are permitted. If $$n=0$$, the functor has no input kinds at all and is constant.

The set $$FUNC$$ decomposes into three mutually disjoint subsets $$FUNC=TERM\cup ABBREV\cup DEF$$, the elements of each of which will have a different set-theoretical representation, but all elements will have the above abstract properties. In this section, we will only explain the set-theoretical representation of the elements of $$TERM$$, leaving $$ABBREV$$ and $$DEF$$ for the sections on abbreviations and definitions, respectively.

The elements of $$TERM$$ are called the term functors. They are triples $$(N,R,K)$$ where $$N$$ is a name, $$R\in SKIND$$ is the result kind and $$K=(K_1,\ldots,K_n)$$ is a finite, possibly empty, sequence of kinds, that is $$K_i\in KIND$$, $$i=1,\ldots,n$$. The place count is then given by $$n=|K|$$, the length of the sequence $$K$$.

Expressions
Expressions are a derived concept. Set-theoretically, every expression is a finite sequence of elements from $$VAR\cup FUNC$$. However, the converse is not true. We shall now describe recursively which of these finite sequences are expressions. Alongside, we shall define the kind of each expression.
 * 1) Each sequence which consists of only a single variable is an expression. The kind of the expression equals the kind of the variable.
 * 2) Each sequence which is a concatenation $$(f)\sqcup e_1\sqcup\ldots\sqcup e_n$$ where $$f$$ is a functor, $$n$$ is the place count of $$f$$ and $$e_1,\ldots,e_n$$ are expressions such that the kind of the $$k$$-th expression matches the $$k$$-th input kind of $$f$$, for all $$k=1,\ldots,n$$, is an expression. The kind of the expression is the result kind of $$f$$.
 * 3) Nothing else is an expression, or the kind of an expression.

We denote the set of all expressions by $$EXP$$. Note that $$EXP$$ depends on $$\mathcal{F}$$. Unlike the previously encountered sets, $$EXP$$ may be (and usually is) an infinite set. Also note that the only way the kind of an expression can be a pure variable kind is the expression consisting of a single variable with pure variable kind.

If $$e\in EXP$$, then we call a variable $$v$$ appearing in $$e$$ an apparent variable of $$e$$.

Proposition. The set $$EXP$$ is prefix-free, that is, if $$e\in EXP$$ and $$s$$ is any finite non-empty sequence of elements from $$VAR\cup FUNC$$, then $$e\sqcup s\notin EXP$$. Furthermore, the decomposition of an expression according to rule 2 of the definition is unique.

Proof. By definition, there are no empty expressions, and any expression of length greater than one starts with a functor. Hence, the claims are true for expressions of length exactly one, as such expressions are either variables or constant functors with place count zero. Let $$e$$ be an expression of length greater than one. Assume the claims to be true for shorter expressions. Let $$s$$ be a finite sequence of elements from $$VAR\cup FUNC$$ such that $$\tilde{e}:=e\sqcup s$$ is an expression. Let $$(f)\sqcup e_1\sqcup\ldots\sqcup e_n$$ be a decomposition of $$e$$, and let $$(\tilde{f})\sqcup\tilde{e}_1\sqcup\ldots\sqcup\tilde{e}_{\tilde{n}}$$ be a decomposition of $$\tilde{e}$$, both as suggested by the definition. Clearly, $$f=\tilde{f}$$. Therefore, $$n=\tilde{n}$$ to match the place count. Now, $$\tilde{e}_1$$ cannot be shorter than $$e_1$$, or else it would be a proper prefix of $$e_1$$, in violation of the induction hypothesis. Likewise, $$e_1$$ cannot be shorter than $$\tilde{e}_1$$. Therefore, $$e_1=\tilde{e}_1$$. Now, $$e_2$$ and $$\tilde{e}_2$$, if they exist, have the same starting point. Hence, by the same argument, $$e_2=\tilde{e}_2$$, and so on, $$e_k=\tilde{e}_k$$ for all $$k=1,\ldots,n$$. Therefore, $$e=\tilde{e}$$ and thus $$s=()$$, the empty sequence. Hence, by induction, $$e$$ is not a proper prefix of any expression, and its decomposition is unique. Q.E.D.
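
The recursive definition of expressions, together with the unique decomposition just proved, suggests a simple left-to-right reader. The following Python sketch is an illustration under assumptions, not JHilbert's actual implementation: the classes `Var` and `Term`, the functions `read_expression` and `kind_of`, and all example symbols are hypothetical, and only term functors are modelled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Var:
    name: str
    kind: str


@dataclass(frozen=True)
class Term:
    name: str
    result: str               # result kind R
    inputs: Tuple[str, ...]   # input kinds K_1, ..., K_n


def read_expression(seq, pos=0):
    """Read one expression starting at seq[pos].

    Returns (kind, next_pos), or None if no expression starts there.
    """
    if pos >= len(seq):
        return None
    head = seq[pos]
    if isinstance(head, Var):       # rule 1: a single variable
        return head.kind, pos + 1
    nxt = pos + 1                   # rule 2: functor followed by its arguments
    for wanted in head.inputs:
        sub = read_expression(seq, nxt)
        if sub is None or sub[0] != wanted:
            return None
        nxt = sub[1]
    return head.result, nxt


def kind_of(seq):
    """Kind of seq if the whole sequence is an expression, else None (rule 3)."""
    res = read_expression(seq)
    if res is None or res[1] != len(seq):
        return None
    return res[0]


# Hypothetical example symbols.
ph, ps = Var("ph", "wff"), Var("ps", "wff")
x = Var("x", "var")
imp = Term("->", "wff", ("wff", "wff"))
al = Term("A.", "wff", ("var", "wff"))

well_formed = kind_of((imp, ph, ps))       # functor with two matching arguments
too_long = kind_of((imp, ph, ps, ps))      # proper extension of an expression
```

Because the decomposition is unique, the reader never needs to backtrack: each argument ends at exactly one position. In this sketch, a sequence that extends an expression by further symbols is rejected by `kind_of`, in line with prefix-freeness.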

A map $$s\colon VAR\to EXP$$ is called a proper substitution map if the kinds of $$v$$ and $$s(v)$$ are equal for all $$v\in VAR$$. Note that this implies $$s(PVAR)\subseteq PVAR^1$$, that is, pure variable kind variables are mapped to expressions consisting only of a pure variable kind variable. Also note that we can extend any partial map from $$VAR$$ to $$EXP$$ to a total map by setting $$s(v):=(v)$$ for all $$v\in VAR$$ outside the domain of $$s$$.

Let $$e\in EXP$$ and let $$s\colon VAR\to EXP$$ be a proper substitution map. We can then recursively define the proper substitution $$s(e)$$ as follows:
 * 1) $$s(e)=s(v)$$ if $$e=(v)$$.
 * 2) $$s(e)=(f)\sqcup s(e_1)\sqcup\ldots\sqcup s(e_n)$$ if $$e=(f)\sqcup e_1\sqcup\ldots\sqcup e_n$$.

Due to the above proposition, $$s(e)$$ is well-defined. It is easy to see that $$s(e)\in EXP$$.
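
On the flat sequence representation, the recursion above amounts to replacing every variable symbol by its image and keeping every functor. The following Python sketch shows this; the names `Var`, `Term`, `head_kind` and `substitute` and the example symbols are assumptions for illustration. The sketch relies on the observation that the kind of an expression can be read off its first symbol, and it takes for granted that all inputs are expressions already.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Var:
    name: str
    kind: str


@dataclass(frozen=True)
class Term:
    name: str
    result: str
    inputs: Tuple[str, ...]


def head_kind(expr):
    """Kind of expr, read off its first symbol (expr is assumed well-formed)."""
    head = expr[0]
    return head.kind if isinstance(head, Var) else head.result


def substitute(expr, s):
    """Apply the partial map s (a dict from Var to expression) to expr.

    Variables outside the domain of s are mapped to themselves.
    """
    for v, image in s.items():
        if head_kind(image) != v.kind:
            raise ValueError(f"improper substitution for {v.name}")
    out = []
    for sym in expr:
        if isinstance(sym, Var):
            out.extend(s.get(sym, (sym,)))   # rule 1
        else:
            out.append(sym)                  # rule 2: functors are kept
    return tuple(out)


# Hypothetical example symbols.
ph, ps = Var("ph", "wff"), Var("ps", "wff")
x = Var("x", "var")
imp = Term("->", "wff", ("wff", "wff"))

result = substitute((imp, ph, ps), {ph: (imp, ps, ps)})
swapped = substitute((imp, ph, ps), {ph: (ps,), ps: (ph,)})
```

The substitution is simultaneous: in `swapped`, the image of `ph` is not substituted again, so the two variables trade places.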

Abbreviations
Abbreviation functors, the members of the set $$ABBREV$$, are special functors which can be used in an expression to abbreviate or otherwise re-express another expression. Set-theoretically, they take the form $$(N,l,e)$$ where $$N$$ is a name, $$l$$ is a finite (possibly empty) sequence of variables and $$e$$ is an expression, called the definiens, which starts with a functor and whose apparent variables are precisely the elements of $$l$$. Furthermore, all functors occurring in $$e$$ must come before $$(N,l,e)$$. The place count of an abbreviation $$(N,l,e)$$ is given by $$|l|$$, and its input kinds are the kinds of the variables $$l_1,\ldots,l_{|l|}$$. Its result kind is the kind of $$e$$. Note that since $$e$$ starts with a functor, this is always a substitutable kind.

We now recursively define the depth of a functor as a non-negative integer:
 * 1) A functor has depth $$0$$ if it is not an abbreviation functor.
 * 2) An abbreviation functor has depth $$n$$ if the starting functor of its definiens has depth $$n-1$$.

These definitions establish a relation between an abbreviation functor and the starting functor of its definiens. This gives rise to chains of functors. The elements of these chains are enumerated by their depth. Chains may overlap, but they form a forest (union of trees); in particular, the maximum depth element of a chain never overlaps with anything else. The union of these chains forms a partial order which is compatible with (that is, a subset of) the total order on the functors.
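
The depth of a functor can be computed by following the chain of starting functors until a non-abbreviation is reached. The Python sketch below is illustrative only: the classes `Var`, `Term` and `Abbrev`, the function `depth`, and the example functors are assumptions, and kinds are left out because they play no role here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Term:
    name: str


@dataclass(frozen=True)
class Abbrev:
    name: str
    params: Tuple[Var, ...]   # the sequence l
    definiens: tuple          # the expression e, which starts with a functor


def depth(f):
    """Number of steps from f along starting functors to a non-abbreviation."""
    d = 0
    while isinstance(f, Abbrev):
        f = f.definiens[0]    # starting functor of the definiens
        d += 1
    return d


# Hypothetical example functors.
ph, ps, ch = Var("ph"), Var("ps"), Var("ch")
imp, neg = Term("->"), Term("-.")
# "ph or ps" abbreviates "(not ph) implies ps".
or_ = Abbrev("\\/", (ph, ps), (imp, neg, ph, ps))
# A three-place "or" whose definiens starts with the abbreviation or_.
or3 = Abbrev("3or", (ph, ps, ch), (or_, or_, ph, ps, ch))
```

The loop terminates because every functor in a definiens comes before the abbreviation it belongs to, so the chain cannot return to a functor it has already visited.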

Let $$f:=(N,l,e)\in ABBREV$$ have place count $$n$$, and let $$e_1,\ldots,e_n\in EXP$$ such that $$\tilde{e}:=(f)\sqcup e_1\sqcup\ldots\sqcup e_n\in EXP$$. Define $$s\colon VAR\to EXP$$ by setting $$s(l_1):=e_1,\ldots,s(l_n):=e_n$$, and $$s(v):=(v)$$ otherwise. It is easy to see that this is a proper substitution map. We now define recursively when an expression is the abbreviation of another.
 * 1) For all $$e,s,\tilde{e}$$ as above, $$\tilde{e}$$ is an abbreviation of $$s(e)$$.
 * 2) Let $$e,s,\tilde{e}$$ be as above. Assume $$e'$$ contains $$\tilde{e}$$ as a subexpression. Let $$e''$$ be $$e'$$, with one or more occurrences of $$\tilde{e}$$ replaced by $$s(e)$$. Then $$e'$$ is an abbreviation of $$e''$$.
 * 3) If $$e$$ is an abbreviation of $$e'$$ and $$e'$$ is an abbreviation of $$e''$$, then $$e$$ is an abbreviation of $$e''$$.

This relation defines a partial order on $$EXP$$. Furthermore, each element of $$EXP$$ either does not contain any abbreviation functors, or is the abbreviation of a unique expression which contains none. For assume $$e$$ contains an abbreviation functor. Pick all such functors with maximal depth in $$e$$ and replace each of the corresponding subexpressions by the expression it abbreviates according to rule 1. Every functor introduced by such a replacement comes before the functor it replaces. Hence, after a finite number of steps, the functors in the expression will all have depth zero. Uniqueness follows since the chains form a forest. We call this unique expression the total unfolding of the original expression.
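
The total unfolding can be computed mechanically. The Python sketch below uses a different evaluation order from the argument in the text: instead of picking abbreviation functors of maximal depth first, it unfolds the arguments of each functor from left to right and then expands the abbreviation itself. All names (`Var`, `Term`, `Abbrev`, `substitute`, `total_unfolding`) and the example functors are assumptions for illustration; kinds are omitted, and the input is assumed to be a well-formed expression.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Term:
    name: str
    arity: int                # place count


@dataclass(frozen=True)
class Abbrev:
    name: str
    params: Tuple[Var, ...]   # the sequence l
    definiens: tuple          # the expression e


def substitute(expr, s):
    """Replace each variable by its image under s, keep functors."""
    out = []
    for sym in expr:
        out.extend(s.get(sym, (sym,)) if isinstance(sym, Var) else (sym,))
    return tuple(out)


def _unfold(seq, pos):
    """Unfold the expression starting at seq[pos]; return (result, next_pos)."""
    head = seq[pos]
    if isinstance(head, Var):
        return (head,), pos + 1
    n = head.arity if isinstance(head, Term) else len(head.params)
    args, nxt = [], pos + 1
    for _ in range(n):
        arg, nxt = _unfold(seq, nxt)
        args.append(arg)
    if isinstance(head, Term):
        return (head,) + tuple(sym for a in args for sym in a), nxt
    # Abbreviation: put the unfolded arguments into the definiens, then
    # unfold the abbreviations that the definiens itself may contain.
    body = substitute(head.definiens, dict(zip(head.params, args)))
    unfolded, _ = _unfold(body, 0)
    return unfolded, nxt


def total_unfolding(expr):
    result, end = _unfold(expr, 0)
    if end != len(expr):
        raise ValueError("trailing symbols: not a single expression")
    return result


# Hypothetical example functors.
ph, ps, ch = Var("ph"), Var("ps"), Var("ch")
imp, neg = Term("->", 2), Term("-.", 1)
or_ = Abbrev("\\/", (ph, ps), (imp, neg, ph, ps))
or3 = Abbrev("3or", (ph, ps, ch), (or_, or_, ph, ps, ch))

unfolded = total_unfolding((or3, ph, ps, ch))
```

Here `unfolded` contains only the term functors `imp` and `neg`. The recursion terminates for the same reason as in the text: a definiens only mentions functors that come before its abbreviation.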

Let $$e_1,e_2\in EXP$$ and let $$e_1',e_2'$$ be their total unfoldings. We call $$e_2$$ a specialisation of $$e_1$$ if there is a proper substitution map $$s$$ such that $$s(e_1')=e_2'$$.
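
Whether one expression is a specialisation of another can be decided by matching the two total unfoldings symbol by symbol. The following Python sketch assumes that both arguments are already total unfoldings, so only variables and term functors occur; the names `Var`, `Term`, `end_of`, `kind_at` and `match` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Var:
    name: str
    kind: str


@dataclass(frozen=True)
class Term:
    name: str
    result: str
    inputs: Tuple[str, ...]


def end_of(expr, pos):
    """Index just past the subexpression starting at expr[pos]."""
    head = expr[pos]
    nxt = pos + 1
    if isinstance(head, Term):
        for _ in head.inputs:
            nxt = end_of(expr, nxt)
    return nxt


def kind_at(expr, pos):
    """Kind of the subexpression starting at expr[pos]."""
    head = expr[pos]
    return head.kind if isinstance(head, Var) else head.result


def match(pattern, target):
    """Return a dict s with s(pattern) == target, or None if there is none.

    Both arguments are assumed to be total unfoldings of expressions.
    """
    s, j = {}, 0
    for sym in pattern:
        if j >= len(target):
            return None
        if isinstance(sym, Term):          # functors must agree exactly
            if target[j] != sym:
                return None
            j += 1
            continue
        if kind_at(target, j) != sym.kind:  # properness: kinds must agree
            return None
        end = end_of(target, j)
        image = target[j:end]
        if s.setdefault(sym, image) != image:  # same variable, same image
            return None
        j = end
    return s if j == len(target) else None


# Hypothetical example symbols.
ph, ps = Var("ph", "wff"), Var("ps", "wff")
x = Var("x", "var")
imp = Term("->", "wff", ("wff", "wff"))

found = match((imp, ph, ph), (imp, imp, ps, ps, imp, ps, ps))
missing = match((imp, ph, ph), (imp, ps, imp, ps, ps))
```

In this sketch, `found` maps `ph` to the expression `(imp, ps, ps)`, while `missing` is `None` because the two occurrences of `ph` would need different images.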

Disjoint variable constraints
A disjoint variable constraint is a tuple $$(v_1,v_2)$$ with $$v_1\in PVAR$$, $$v_2\in VAR$$ and $$v_1\neq v_2$$.

Note that since $$VAR$$ is a finite set, any set of disjoint variable constraints is necessarily also finite.

Statements
Statements, the elements of $$STAT$$, are defined as quadruples $$(N,D,H,e)$$, where $$N$$ is a name, $$H$$ is a finite (possibly empty) sequence of expressions (the hypotheses), $$e$$ is an expression (the consequent) and $$D$$ is a set of disjoint variable constraints with the extra restriction that all variables appearing in the tuples of $$D$$ are apparent variables of the consequent or of one of the hypotheses.

We call a quadruple $$(N,D,H,e)$$ a pre-statement if it fulfils the above definition except possibly the apparent variable restriction on $$D$$.
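
The difference between a pre-statement and a statement can be made concrete with a small check. In the Python sketch below, the names `Var`, `Quadruple`, `is_pre_statement` and `is_statement`, the kind name `var`, and the example data are assumptions; functors are represented by plain strings for brevity, and the hypotheses and the consequent are assumed to be expressions already.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

VKIND = {"var"}   # hypothetical set of pure variable kinds


@dataclass(frozen=True)
class Var:
    name: str
    kind: str


@dataclass(frozen=True)
class Quadruple:
    name: str                        # N
    dv: FrozenSet[Tuple[Var, Var]]   # D, the disjoint variable constraints
    hyps: Tuple[tuple, ...]          # H, the hypotheses
    consequent: tuple                # e


def apparent_variables(expr):
    """All variables occurring in the flat sequence expr."""
    return {sym for sym in expr if isinstance(sym, Var)}


def is_pre_statement(q):
    """Each constraint pairs a pure variable with a different variable."""
    return all(v1.kind in VKIND and v1 != v2 for v1, v2 in q.dv)


def is_statement(q):
    """A pre-statement whose constraints only mention apparent variables."""
    visible = apparent_variables(q.consequent)
    for h in q.hyps:
        visible |= apparent_variables(h)
    return is_pre_statement(q) and all(
        v1 in visible and v2 in visible for v1, v2 in q.dv
    )


# Hypothetical example data.
ph = Var("ph", "wff")
x, y = Var("x", "var"), Var("y", "var")
consequent = ("->", ph, "A.", x, ph)

good = Quadruple("example1", frozenset({(x, ph)}), (), consequent)
stray = Quadruple("example2", frozenset({(y, ph)}), (), consequent)
```

In this sketch, `good` passes both checks, whereas `stray` is only a pre-statement: its constraint mentions `y`, which is not an apparent variable of the consequent or of any hypothesis.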