Perga
perga is a basic proof assistant based on a dependently typed lambda calculus (the calculus of constructions), augmented with a simple universe hierarchy (the Extended Calculus of Constructions, but without Σ-types, though I intend to add them). This implementation is based on the exposition in Nederpelt and Geuvers' Type Theory and Formal Proof. Right now it is a perfectly capable higher-order logic proof checker, though there is lots of room for improved ergonomics and usability, which I intend to work on. At the moment, perga is comparable to Automath in terms of power and ease of use: slightly more powerful, and a touch less ergonomic.
Syntax
The syntax is fairly flexible and should work as you expect. Identifiers can be Unicode as long as megaparsec considers them alphanumeric. λ and Π abstractions can be written in the usual ways, which should be clear from the examples below. Additionally, as usual, an arrow can be used as an abbreviation for a Π type whose parameter doesn't appear in the body.
All of the following example terms correctly parse, and should look familiar if you are used to standard lambda calculus notation or Coq syntax.
λ (α : *) ⇒ λ (β : *) ⇒ λ (x : α) ⇒ λ (y : β) ⇒ x
fun (A B C : *) (g : B → C) (f : A → B) (x : A) : C ⇒ g (f x)
fun (S : *) (P Q : S -> *) (H : Π (x : S) , P x -> Q x) (HP : forall (x : S), P x) => fun (x : S) => H x (HP x)
To be perfectly clear, λ abstractions can be written with either "λ" or "fun", and are separated from their bodies by either "=>" or "⇒". Binders with the same type can be grouped together, and multiple binders can occur between the "λ" and the "⇒". You can also optionally add the return type after the binders and before the "⇒", though this can always be inferred and so isn't necessary.
Π types can be written with either "Π", "∀", or "forall", and are separated from their bodies with a ",". Arrow types can be written "->" or "→". As with λ abstractions, binders with the same type can be grouped, and multiple binders can occur between the "Π" and the ",". Also as with λ abstractions, the "return" type can optionally be added after the binders and before the ",", though this is even less useful, as it is nearly always *, the type of types.
The universe hierarchy is very similar to Coq, with * : □ : □₁ : □₂ : ..., where * is impredicative and the □ᵢ are predicative. There is no universe polymorphism, making this rather limited. A lack of inductive types (or even just built-in Σ-types and sum types) makes doing logic at any universe level other than * extremely limited. For ease of typing, []1, □1, []₁, and □₁ are all the same.
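As a quick sketch of how the hierarchy behaves (these definitions are purely illustrative):

```
-- * inhabits □, □ inhabits □₁, and so on
def star_sort : □ := *;
def box_sort : []1 := □;
-- * is impredicative: a Π quantifying over * can itself land in *
def poly_id_ty : * := ∀ (A : *), A → A;
```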
Let expressions have the syntax shown below.
let ( (<ident> (: <type>)? := <expr>) )+ in <expr> end
Below is a more concrete example.
let (a : A := (and_elim_l A (and B C) h))
(bc := (and_elim_r A (and B C) h))
(b := (and_elim_l B C bc))
(c := (and_elim_r B C bc))
in
and_intro (and A B) C (and_intro A B a b) c
end
You can also directly bind functions. Here's an example.
let (f (A : *) (x : A) : A := x) in
f x
end
The syntax for binding functions is just like with definitions.
Definitions and axioms have abstract syntax as shown below.
def <ident> (<ident> : <type>)* (: <type>)? := <term>;
axiom <ident> (<ident> : <type>)* : <type>;
(The distinction between <type> and <term> is purely for emphasis; they are the exact same syntactic category.) Here are a couple of definitions of the const function from above showing the syntactic options, and a more complex example declaring functional extensionality as an axiom (assuming equality has been previously defined with type eq : Π (A : *), A → A → *). Duplicate definitions are not normally allowed and will result in an error.
def const := λ (α : *) ⇒ λ (β : *) ⇒ λ (x : α) => λ (y : β) => x;
def const : ∀ (α β : *), α → β → α := fun (α β : *) (x : α) (y : β) ⇒ x;
def const (α β : *) (x : α) (y : β) : α := x;
axiom funext (A B : *) (f g : A → B) : (∀ (x : A), eq B (f x) (g x)) → eq (A → B) f g;
Type ascriptions are optional in both definitions and let bindings. If included, perga will check that your definition matches the ascription and, if so, will remember the way you wrote the type when printing inferred types, which is particularly handy when using abbreviations for complex types. perga has no problem inferring the types of top-level definitions, as they are completely determined by the term, but I recommend including ascriptions most of the time: they serve as a nice piece of documentation, help guide the implementation process, and make sure you are implementing the type you think you are.
If the RHS of a definition is axiom, then perga will assume that the identifier is an inhabitant of the ascribed type (as such, when using axiom as a RHS, a type ascription is required). This gives a second way to introduce axioms.
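For instance, assuming or and not have been defined earlier, the following asserts the law of the excluded middle (similar to what examples/classical.pg does), and is equivalent to declaring it with the axiom keyword:

```
def lem : ∀ (A : *), or A (not A) := axiom;
```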
Line comments are -- like in Haskell, and block comments are [* *] somewhat like ML (and nest properly). There is no significant whitespace, so you are free to format code as you wish.
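A quick sketch of the comment forms:

```
-- a line comment, as in Haskell
[* a block comment [* which nests *] properly *]
def id (A : *) (x : A) : A := x; -- a trailing comment
```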
There isn't a proper module system (yet), but you can include other files in a dumb, C preprocessor way by using @include <filepath> (NOTE: this unfortunately messes up line numbers in error messages). Filepaths are relative to the current file. Additionally, @include automatically keeps track of what has been included, so duplicate inclusions are skipped, meaning no include guards are necessary.
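A sketch with hypothetical file names:

```
-- file: main.pg
@include logic.pg
@include classical.pg
-- if classical.pg itself includes logic.pg, the second inclusion
-- is skipped automatically, so no include guards are needed
```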
Usage
Running perga without any arguments drops you into a basic repl. From here, you can type in definitions which perga will typecheck. Previous definitions are accessible in future definitions. The usual readline keybindings are available, including navigating history, which is saved between sessions (in ~/.cache/perga/history). In the repl, you can enter ":q", press C-c, or press C-d to quit. Entering ":e" shows everything that has been defined along with their types. If you want to see the value of an identifier defined in the environment, you can enter ":v <ident>". Entering ":t <expr>" prints the type of an expression. Entering ":n <expr>" will fully normalize (including unfolding definitions) an expression, while ":w <expr>" will reduce it to weak head normal form. Finally ":l <filepath>" loads a file.
Here's an example session showing the capabilities of the repl.
> :l examples/computation.pg
loading: examples/computation.pg
> :e
eight : nat
eq : ∏ (A : *) . A -> A -> *
eq_cong : ∏ (A B : *) (x y : A) (f : A -> B) . eq A x y -> eq B (f x) (f y)
eq_refl : ∏ (A : *) (x : A) . eq A x x
eq_sym : ∏ (A : *) (x y : A) . eq A x y -> eq A y x
eq_trans : ∏ (A : *) (x y z : A) . eq A x y -> eq A y z -> eq A x z
five : nat
four : nat
nat : *
nine : nat
one : nat
one_plus_one_is_two : eq nat (plus one one) two
plus : nat -> nat -> nat
seven : nat
six : nat
suc : nat -> nat
ten : nat
three : nat
times : nat -> nat -> nat
two : nat
two_plus_two_is_four : eq nat (plus two two) four
two_times_five_is_ten : eq nat (times two five) ten
zero : nat
> :n plus one one
λ (A : *) (f : A -> A) (x : A) . f (f x)
> :n two
λ (A : *) (f : A -> A) (x : A) . f (f x)
> :w plus one one
λ (A : *) (f : A -> A) (x : A) . one A f (one A f x)
> :w two
λ (A : *) (f : A -> A) (x : A) . f (one A f x)
You can also give perga a filename as an argument, in which case it will typecheck every definition in the file. If you give perga multiple filenames, it will process each one in turn, sharing an environment between them. Upon finishing, which should be nearly instantaneous, it will print out all the files it processed, followed by "success!" if everything typechecked, or the first error it encountered otherwise.
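A sketch of a batch run over two of the example files (exact output may differ):

```
$ perga examples/logic.pg examples/classical.pg
loading: examples/logic.pg
loading: examples/classical.pg
success!
```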
Simple Example
There are many very well commented examples in the /wball/perga/src/commit/84e44b0e33565fac172967521e57f67f6fe5ef64/examples folder. These include
- /wball/perga/src/commit/84e44b0e33565fac172967521e57f67f6fe5ef64/examples/logic.pg, which defines the standard logical operators and proves standard results about them,
- /wball/perga/src/commit/84e44b0e33565fac172967521e57f67f6fe5ef64/examples/classical.pg, which asserts the law of the excluded middle as an axiom and proves several results that require it,
- /wball/perga/src/commit/84e44b0e33565fac172967521e57f67f6fe5ef64/examples/computation.pg, which demonstrates using perga for computational purposes,
- /wball/perga/src/commit/84e44b0e33565fac172967521e57f67f6fe5ef64/examples/algebra.pg, which defines standard algebraic structures and proves results for them, and
- /wball/perga/src/commit/84e44b0e33565fac172967521e57f67f6fe5ef64/examples/peano.pg, which proves standard arithmetic results from the Peano axioms.
I intend to extend these examples further.
Here is an example file defining Leibniz equality and proving that it is reflexive, symmetric, and transitive.
-- file: equality.pg
-- Defining Leibniz equality
-- Note that we can leave the ascription off
def eq (A : *) (x y : A) := forall (P : A -> *), P x -> P y;
-- Equality is reflexive, which is easy to prove
-- Here we give an ascription so that when `perga` reports the type,
-- it prints the type using `eq` rather than the inferred, unfolded form.
def eq_refl (A : *) (x : A) : eq A x x := fun (P : A -> *) (Hx : P x) => Hx;
-- Equality is symmetric. This one's a little harder to prove.
def eq_sym (A : *) (x y : A) (Hxy : eq A x y) : eq A y x :=
fun (P : A -> *) (Hy : P y) =>
Hxy (fun (z : A) => P z -> P x) (fun (Hx : P x) => Hx) Hy;
-- Equality is transitive.
def eq_trans (A : *) (x y z : A) (Hxy : eq A x y) (Hyz : eq A y z) : eq A x z :=
fun (P : A -> *) (Hx : P x) => Hyz P (Hxy P Hx);
Running perga equality.pg yields the following output.
loading: equality.pg
success!
This means our proofs were accepted.
If we had an error in the proofs, we would get a somewhat useful error message. For example, if we had defined eq_trans as shown below, it would be incorrect (missing the P after Hxy).
def eq_trans (A : *) (x y z : A) (Hxy : eq A x y) (Hyz : eq A y z) : eq A x z :=
fun (P : A -> *) (Hx : P x) => Hyz P (Hxy Hx);
Then running perga equality.pg yields the following output.
loading: equality.pg
19:50:
|
19 | fun (P : A -> *) (Hx : P x) => Hyz P (Hxy Hx);
| ^
Cannot unify 'A -> *' with 'P x' when evaluating 'Hxy Hx'
This indicates that, when evaluating Hxy Hx, perga was expecting something of type A -> *, but instead found something of type P x. Since P has type A -> *, we can then realize that we forgot the P.
Future Goals
Substantive
TODO Sections
Coq-style sections would be very handy, and probably relatively easy to implement (compared to everything else on this todo list), especially now that we have an intermediate representation.
TODO Inference
Not decidable in general, but I might be able to implement a basic unification algorithm, or switch to bidirectional type checking. This isn't super necessary, though: I generally find leaving off the types of arguments to be a bad idea, but in some cases it can be handy, especially below the top level.
TODO Implicits
Much, much more useful than inference, implicit arguments would be amazing. It also seems a lot more complicated, but any system for dealing with implicit arguments is far better than none.
TODO Module System
A proper module system would be wonderful. To me, ML-style modules with structures, signatures, and functors seem like the right way to handle algebraic structures in a relatively simple language, rather than records (or, worse, a bunch of and's like I currently have, which is especially painful without implicits) or type classes (probably much harder, but could be nicer), but any way of managing scope, importing files, etc. is a necessity. The F-ing modules paper is probably a good reference. Now that I have an intermediate representation, following in F-ing modules' footsteps and implementing modules purely through elaboration should be possible.
DONE Universes?
Not super necessary, but occasionally extremely useful. Could be fun, idk.
I was looking into bidirectional typing and came across a description of universes. It turned out to be much easier to implement than I was expecting, so I figured why not and added universes. So now * : *1 : *2 : *3 : .... However, they're basically useless without universe polymorphism of some kind.
Also, everything ends up impredicative (no * : *, but quantifying over *i still leaves you in *i), and my implementation of impredicativity feels a little sketchy. There might be paradoxes lurking. It would be easy to switch it over to being predicative, but, without inductive types or at least more built-in types, logical connectives can only be defined impredicatively, so that will have to wait until we have inductive definitions.
I have since followed in Coq's footsteps and switched universe hierarchies to * : □ : □₁ : □₂ : □₃ : ..., where all the □ᵢ are predicative and * is impredicative (analogous to Prop and Type). For now at least, we definitely need at least the lowest sort to be impredicative to allow for impredicative definitions of connectives.
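For reference, here is a sketch of the usual impredicative (Church-style) encoding of conjunction that makes the impredicativity of * necessary: the definition only lives in * because a Π over * is allowed to stay in *. (Names are illustrative; examples/logic.pg presumably does something similar.)

```
def and (A B : *) : * := Π (C : *), (A → B → C) → C;
def and_intro (A B : *) (a : A) (b : B) : and A B :=
  fun (C : *) (f : A → B → C) ⇒ f a b;
def and_elim_l (A B : *) (h : and A B) : A :=
  h A (fun (a : A) (b : B) ⇒ a);
```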
TODO Universe polymorphism
I have universes, but without universe polymorphism, they're basically useless, or at least I am unable to find a practical use for them. (When would you want to quantify over e.g. kinds specifically?)
TODO Sigma and sum types
While not full inductive definitions, builtin sigma and sum types (and probably a unit type to complete the algebra) would make predicative universes actually possible to work in, and generally make working with conjunctions, disjunctions, and existentials much easier (especially with pattern matching). Record types could then likely follow as syntax sugar for a bunch of dependent pairs dealt with by the elaborator, making for easier definitions of e.g. algebraic structures.
TODO Inductive Definitions
This is definitely a stretch goal. It would be cool though, and would turn this proof checker into a much more competent programming language. It's not necessary for the math, but inductive definitions let you leverage computation in proofs, which is amazing. They also make certain definitions way easier, by avoiding needing to manually stipulate elimination rules, including induction principles, and let you keep more math constructive and understandable to the computer.
Cosmetic/usage/technical
TODO Prettier pretty printing
Right now, everything defaults to one line, which can be a problem with how large the proof terms get. Probably want to use prettyprinter to be able to nicely handle indentation and line breaks.
TODO Better repl
The repl is decent, probably the most fully-featured repl I've ever made, but implementing something like this would be awesome.
TODO Improve error messages
Error messages are decent, but a little buggy. Syntax error messages are pretty OK, but could have better labeling. The type checking error messages are decent, but could do with better location information. Right now, the location defaults to the end of the current definition, which is often good enough, but more detail can't hurt. The errors are generally pretty janky and hard to read; now that I've had quite a bit of practice reading them, I find they actually provide very useful information, but they could be made a lot more readable.
Since adding an intermediate AST, the error messages have gotten much worse. This is pretty urgent now.
TODO Document library code
Low priority, as I'm the only one working on this, I'm working on it very actively, and things will continue rapidly changing, but I'll want to get around to it once things stabilize, before I forget how everything works.
TODO Add versions to perga.cabal and/or nixify
Probably a smart idea.
TODO More incremental parsing/typechecking
Right now, if there's a failure, everything just stops immediately. More incremental parsing/typechecking could pave the way for more interactivity, e.g. development with holes, an LSP server, etc., not to mention better error messages.
DONE Multiple levels of AST
Added a couple types representing an intermediate representation, as well as a module responsible for elaboration (basically a function from these intermediate types to Expr). This drastically simplified the parser, now that it is not responsible for converting to de Bruijn indices, handling the environment, and type checking all in addition to parsing. However, now that type checking is out of the parser, we lost location information for errors, making better error messages much more important now. I have some ideas for getting location information (and more accurate location information, instead of always pointing to the end of the most relevant definition), which should drastically improve the error messages.
TODO Improve type checking algorithm
I'm proud that I just kinda made up a working type checking algorithm, but it is clearly quite flawed. Even assuming no need to check beta equivalence, I'm pretty sure that this algorithm is approximately exponential in the length of a term, which is pretty terrible. It hasn't been a huge problem, but type checking just the term R2_sub_R in /wball/perga/src/commit/84e44b0e33565fac172967521e57f67f6fe5ef64/examples/peano.pg takes about 1.5 seconds. Performance could easily be drastically improved with some memoization, but upgrading to an off-the-shelf bidirectional type checking algorithm seems like a good idea in general. Another problem with my current type checking algorithm is that it is very inflexible (e.g. adding optional return type ascriptions in functions, or type ascriptions in let bindings is currently impossible, while trivial to add with bidirectional type checking). I also have no idea how to add type inference or implicits with how things are currently structured. A more flexible type checking algorithm, likely together with multiple levels of AST, makes it seem more possible.
TODO Alternate syntax
I've had a bunch of ideas for a more mathematician-friendly syntax bouncing around my head for a while. Implementing one of them would be awesome, but probably quite tricky.
Something like
Theorem basic (S : *) (P Q : S → *) :
(∀ (x : S), P x → Q x) → (∀ (x : S), P x) → ∀ (x : S), Q x.
Proof
1. Suppose ∀ (x : S), P x → Q x
2. Suppose ∀ (x : S), P x
3. Let x : S
4. P x by [2 x]
5. Q x by [1 x 4]
Qed
I think could be reliably translated into
def basic (S : *) (P Q : S → *) : (Π (x : S), P x → Q x) → (Π (x : S), P x) → Π (x : S), Q x :=
fun (a1 : Π (x : S), P x → Q x) ⇒
fun (a2 : Π (x : S), P x) ⇒
fun (x : S) ⇒
a1 x (a2 x);
and is more intuitively understandable to a mathematician not familiar with type theory, while the latter would be utter nonsense.
I'm imagining the parser could be chosen based on the file extension or something. Some way to mix the syntaxes could be nice too.
TODO Infix/misfix operators
Infix/misfix operators would be very nice and make perga look more normal. It's funny, at the moment it looks a lot like a lisp, even though it's totally not. Here's an excerpt from the proof that addition is commutative that looks particularly lispy.
(eq_trans nat (plus n (suc m)) (suc (plus n m)) (plus (suc m) n)
(plus_s_r n m)
(eq_trans nat (suc (plus n m)) (suc (plus m n)) (plus (suc m) n)
(eq_cong nat nat (plus n m) (plus m n) suc IH)
(eq_sym nat (plus (suc m) n) (suc (plus m n)) (plus_s_l m n))))
DONE treesitter parser and/or emacs mode
There's a tree-sitter parser and Neovim plugin available now, but no Emacs mode.
TODO TUI
This is definitely a stretch goal, and I'm not sure how good of an idea it would be, but I'm imagining a TUI split into two panels. On the left you can see the term you are building with holes in it. On the right you have the focused hole's type as well as the types of everything in scope (like Coq and Lean show while you're in the middle of a proof). Then you can interact with the system by entering commands (e.g. intros, apply, etc.) which changes the proof term on the left. You'd also just be able to type in the left window as well, and edit the proof term directly. This way you'd get the benefits of working with tactics, making it way faster to construct proof terms, and the benefits of working with proof terms directly, namely transparency and simplicity. I'll probably want to look into brick if I want to make this happen.