added let expressions

2024-11-23 09:16:32 -08:00 · 2024-11-23 09:16:32 -08:00 · 780ab52809
commit 780ab52809
parent 5234f43194
7 changed files with 151 additions and 25 deletions
--- a/README.md
+++ b/README.md
@ -15,6 +15,20 @@ To be perfectly clear, `λ` abstractions can be written with either &ldquo;λ&rd

 `Π` types can be written with either &ldquo;Π&rdquo;, &ldquo;∀&rdquo;, or &ldquo;forall&rdquo;, and are separated from their bodies with a &ldquo;,&rdquo;. Arrow types can be written &ldquo;->&rdquo; or &ldquo;&rarr;&rdquo;. Like with `λ` abstractions, binders with the same type can be grouped, and multiple binders can occur between the &ldquo;Π&rdquo; and the &ldquo;,&rdquo;.

+Finally, `let` expressions have syntax shown below.
+
+    let ( (<ident> := <expr>) )+ in <expr> end
+
+Below is a more concrete example.
+
+    let (a := (and_elim_l A (and B C) h))
+        (bc := (and_elim_r A (and B C) h))
+        (b := (and_elim_l B C bc))
+        (c := (and_elim_r B C bc))
+    in
+        and_intro (and A B) C (and_intro A B a b) c
+    end
+
 Definitions work similarly, having abstract syntax as shown below.

    <ident> (<ident> : <type>)* : <type>? := <term> | axiom;
@ -129,14 +143,26 @@ This means our proofs were accepted. Furthermore, as a sanity check, we can see
 ## Substantive


-### TODO Let-expressions
+### DONE Let-expressions

-I vaguely remember from reading Coq&rsquo;art that Coq does something special with `let` expressions, so I&rsquo;ll maybe want to check that out. I tried implementing `let` as syntax sugar for an immediately called function, but that proved to be a massive mess with how I&rsquo;m handling things. `let` expressions would definitely be handy for factoring out complex sub expressions.
+I decided to bake `let` expressions into the formal language, rather than being a layer on top of the syntax. This means we can treat typing `let` expressions different from functions. The only substantial difference between
+
+    let (x := value) in body end
+
+and
+
+    (fun (x : type) => body) value
+
+is that the latter must have a proper function type while the former doesn&rsquo;t need to. So, for instance,
+
+    let (x := *) in ...
+
+is possible, while you could not write a function whose argument is type `□`. We are justified in doing this because we always know what the argument is, and can just immediately substitute it, both into the value of the body, as well as the type of the body.


 ### TODO Sections

-Coq-style sections would be very handy, and probably relatively easy to implement. Upon parsing a definition inside a section, will somehow need to look ahead to see what variables are used to see how I need to modify `binders`, or just make every definition require every section variable as an argument.
+Coq-style sections would be very handy, and probably *relatively* easy to implement (compared to everything else on this todo list). Upon parsing a definition inside a section, will somehow need to look ahead to see what variables are used to see how I need to modify `binders`, or just make every definition require every section variable as an argument.


 ### TODO Inference
@ -146,12 +172,12 @@ Not decidable, but I might be able to implement some basic unification algorithm

 ### TODO Implicits

-Much, much more useful than [inference](#orgc6eb0b1), implicit arguments would be amazing. It also seems a lot more complicated, but any system for dealing with implicit arguments is far better than none.
+Much, much more useful than [inference](#org31a76ab), implicit arguments would be amazing. It also seems a lot more complicated, but any system for dealing with implicit arguments is far better than none.


 ### TODO Module System

-A proper module system would be wonderful. To me, ML style modules with structures, signatures, and functors seems like the right way to handle algebraic structures for a relatively simple language, rather than records (or, worse, a bunch of `and`&rsquo;s like I currently have; especially painful without [implicits](#orged32318)) or type classes (probably much harder, but could be nicer), but any way of managing scope, importing files, etc. is a necessity.
+A proper module system would be wonderful. To me, ML style modules with structures, signatures, and functors seems like the right way to handle algebraic structures for a relatively simple language, rather than records (or, worse, a bunch of `and`&rsquo;s like I currently have; especially painful without [implicits](#orge9a1bd4)) or type classes (probably much harder, but could be nicer), but any way of managing scope, importing files, etc. is a necessity. The [F-ing modules](https://www.cambridge.org/core/services/aop-cambridge-core/content/view/B573FA00832D55D4878863DE1725D90B/S0956796814000264a.pdf/f-ing-modules.pdf) paper is probably a good reference.


 ### TODO Universes?
@ -197,6 +223,11 @@ Probably a smart idea.
 Right now, if there&rsquo;s a failure, everything just stops immediately. More incremental parsing/typechecking could pave the way for more interactivity, e.g. development with holes, an LSP server, etc., not to mention better error messages.


+### TODO Multiple levels of AST
+
+Right now, the parsing and typechecking kind of happens all at once on a single syntax representation. As I add fancier and fancier elaboration, it might be a good idea to have multiple syntax representations. So we&rsquo;d have one level of syntax representing what is actually in the file (modulo some easy elaboration like with function definitions), and through a series of transformations transform it into something like the current `Expr` type with all the information needed for typechecking and all the complex language features removed.
+
+
 ### TODO Alternate syntax

 I&rsquo;ve had a bunch of ideas for a more mathematician-friendly syntax bouncing around my head for a while. Implementing one of them would be awesome, but probably quite tricky.
--- a/README.org
+++ b/README.org
@ -16,6 +16,21 @@ To be perfectly clear, =λ= abstractions can be written with either "λ" or "fun

 =Π= types can be written with either "Π", "∀", or "forall", and are separated from their bodies with a ",". Arrow types can be written "->" or "\to". Like with =λ= abstractions, binders with the same type can be grouped, and multiple binders can occur between the "Π" and the ",".

+Finally, =let= expressions have syntax shown below.
+#+begin_src
+  let ( (<ident> := <expr>) )+ in <expr> end
+#+end_src
+Below is a more concrete example.
+#+begin_src
+  let (a := (and_elim_l A (and B C) h))
+      (bc := (and_elim_r A (and B C) h))
+      (b := (and_elim_l B C bc))
+      (c := (and_elim_r B C bc))
+  in
+      and_intro (and A B) C (and_intro A B a b) c
+  end
+#+end_src
+
 Definitions work similarly, having abstract syntax as shown below.
 #+begin_src
  <ident> (<ident> : <type>)* : <type>? := <term> | axiom;
@ -122,11 +137,23 @@ This means our proofs were accepted. Furthermore, as a sanity check, we can see

 ** Substantive

-*** TODO Let-expressions
-I vaguely remember from reading Coq'art that Coq does something special with =let= expressions, so I'll maybe want to check that out. I tried implementing =let= as syntax sugar for an immediately called function, but that proved to be a massive mess with how I'm handling things. =let= expressions would definitely be handy for factoring out complex sub expressions.
+*** DONE Let-expressions
+I decided to bake =let= expressions into the formal language, rather than being a layer on top of the syntax. This means we can treat typing =let= expressions different from functions. The only substantial difference between
+#+begin_src
+  let (x := value) in body end
+#+end_src
+and
+#+begin_src
+  (fun (x : type) => body) value
+#+end_src
+is that the latter must have a proper function type while the former doesn't need to. So, for instance,
+#+begin_src
+  let (x := *) in ...
+#+end_src
+is possible, while you could not write a function whose argument is type =□=. We are justified in doing this because we always know what the argument is, and can just immediately substitute it, both into the value of the body, as well as the type of the body.

 *** TODO Sections
-Coq-style sections would be very handy, and probably relatively easy to implement. Upon parsing a definition inside a section, will somehow need to look ahead to see what variables are used to see how I need to modify =binders=, or just make every definition require every section variable as an argument.
+Coq-style sections would be very handy, and probably /relatively/ easy to implement (compared to everything else on this todo list). Upon parsing a definition inside a section, will somehow need to look ahead to see what variables are used to see how I need to modify =binders=, or just make every definition require every section variable as an argument.

 *** TODO Inference
 Not decidable, but I might be able to implement some basic unification algorithm, or switch to bidirectional type checking. This isn't super necessary though, I find leaving off the types of arguments to generally be a bad idea, but in some cases it can be handy, especially not at the top level.
@ -135,7 +162,7 @@ Not decidable, but I might be able to implement some basic unification algorithm
 Much, much more useful than [[Inference][inference]], implicit arguments would be amazing. It also seems a lot more complicated, but any system for dealing with implicit arguments is far better than none.

 *** TODO Module System
-A proper module system would be wonderful. To me, ML style modules with structures, signatures, and functors seems like the right way to handle algebraic structures for a relatively simple language, rather than records (or, worse, a bunch of =and='s like I currently have; especially painful without [[Implicits][implicits]]) or type classes (probably much harder, but could be nicer), but any way of managing scope, importing files, etc. is a necessity.
+A proper module system would be wonderful. To me, ML style modules with structures, signatures, and functors seems like the right way to handle algebraic structures for a relatively simple language, rather than records (or, worse, a bunch of =and='s like I currently have; especially painful without [[Implicits][implicits]]) or type classes (probably much harder, but could be nicer), but any way of managing scope, importing files, etc. is a necessity. The [[https://www.cambridge.org/core/services/aop-cambridge-core/content/view/B573FA00832D55D4878863DE1725D90B/S0956796814000264a.pdf/f-ing-modules.pdf][F-ing modules]] paper is probably a good reference.

 *** TODO Universes?
 Not super necessary, but occasionally extremely useful. Could be fun, idk.
@ -163,6 +190,9 @@ Probably a smart idea.
 *** TODO More incremental parsing/typechecking
 Right now, if there's a failure, everything just stops immediately. More incremental parsing/typechecking could pave the way for more interactivity, e.g. development with holes, an LSP server, etc., not to mention better error messages.

+*** TODO Multiple levels of AST
+Right now, the parsing and typechecking kind of happens all at once on a single syntax representation. As I add fancier and fancier elaboration, it might be a good idea to have multiple syntax representations. So we'd have one level of syntax representing what is actually in the file (modulo some easy elaboration like with function definitions), and through a series of transformations transform it into something like the current =Expr= type with all the information needed for typechecking and all the complex language features removed.
+
 *** TODO Alternate syntax
 I've had a bunch of ideas for a more mathematician-friendly syntax bouncing around my head for a while. Implementing one of them would be awesome, but probably quite tricky.

--- a/examples/logic.pg
+++ b/examples/logic.pg
@ -151,11 +151,13 @@ or_comm (A B : *) (h : or A B) : or B A :=

 -- A ∧ (B ∧ C) => (A ∧ B) ∧ C
 and_assoc_l (A B C : *) (h : and A (and B C)) : and (and A B) C :=
-    and_intro (and A B) C
-        (and_intro A B
-            (and_elim_l A (and B C) h)
-            (and_elim_l B C (and_elim_r A (and B C) h)))
-        (and_elim_r B C (and_elim_r A (and B C) h));
+    let (a := (and_elim_l A (and B C) h))
+        (bc := (and_elim_r A (and B C) h))
+        (b := (and_elim_l B C bc))
+        (c := (and_elim_r B C bc))
+    in
+        and_intro (and A B) C (and_intro A B a b) c
+    end;

 -- (A ∧ B) ∧ C => A ∧ (B ∧ C)
 and_assoc_r (A B C : *) (h : and (and A B) C) : and A (and B C) :=
--- a/lib/Check.hs
+++ b/lib/Check.hs
@ -49,6 +49,16 @@ findType g (Abs x a m) = do
 findType g (Pi _ a b) = do
    validateType_ g a
    validateType (incIndices a : map incIndices g) b
+findType g (Let _ v b) = do
+    a <- findType g v
+    validateType_ g a
+    res <- findType (incIndices a : map incIndices g) b
+    pure $ subst 0 a res
+
+-- this is kinda goofy, it's just like a function, except the resulting type
+-- of the body doesn't need to result in a valid function type
+-- this means things like `let x := * in ...` would be allowed, even though
+-- you couldn't write a function that takes something of type `□` as an argument

 checkType :: Env -> Expr -> Result Expr
 checkType env t = runReaderT (findType [] t) env
--- a/lib/Eval.hs
+++ b/lib/Eval.hs
@ -47,6 +47,7 @@ subst _ _ Square = Square
 subst k s (App m n) = App (subst k s m) (subst k s n)
 subst k s (Abs x m n) = Abs x (subst k s m) (subst (k + 1) (incIndices s) n)
 subst k s (Pi x m n) = Pi x (subst k s m) (subst (k + 1) (incIndices s) n)
+subst k s (Let x v b) = Let x (subst k s v) (subst (k + 1) (incIndices s) b)

 envLookupVal :: Text -> ReaderT Env Result Expr
 envLookupVal n = asks ((_val <$>) . M.lookup n) >>= maybe (throwError $ UnboundVariable n) pure
@ -54,7 +55,7 @@ envLookupVal n = asks ((_val <$>) . M.lookup n) >>= maybe (throwError $ UnboundV
 envLookupTy :: Text -> ReaderT Env Result Expr
 envLookupTy n = asks ((_ty <$>) . M.lookup n) >>= maybe (throwError $ UnboundVariable n) pure

-- reduce until β reducts are impossible in head position
+-- reduce until β reducts or let simplifications are impossible in head position
 whnf :: Expr -> ReaderT Env Result Expr
 whnf (App (Abs _ _ v) n) = whnf $ subst 0 n v
 whnf (App m n) = do
@ -63,6 +64,7 @@ whnf (App m n) = do
        then pure $ App m n
        else whnf $ App m' n
 whnf (Free n) = envLookupVal n >>= whnf
+whnf (Let _ v b) = whnf $ subst 0 v b
 whnf e = pure e

 reduce :: Expr -> ReaderT Env Result Expr
@ -71,6 +73,7 @@ reduce (App m n) = App <$> reduce m <*> reduce n
 reduce (Abs x t v) = Abs x <$> reduce t <*> reduce v
 reduce (Pi x t v) = Pi x <$> reduce t <*> reduce v
 reduce (Free n) = envLookupVal n
+reduce (Let _ v b) = pure $ subst 0 v b
 reduce e = pure e

 normalize :: Expr -> ReaderT Env Result Expr
--- a/lib/Expr.hs
+++ b/lib/Expr.hs
@ -9,6 +9,7 @@ data Expr where
    App :: Expr -> Expr -> Expr
    Abs :: Text -> Expr -> Expr -> Expr
    Pi :: Text -> Expr -> Expr -> Expr
+    Let :: Text -> Expr -> Expr -> Expr
    deriving (Show, Ord)

 instance Eq Expr where
@ -20,6 +21,7 @@ instance Eq Expr where
    (App e1 e2) == (App f1 f2) = e1 == f1 && e2 == f2
    (Abs _ t1 b1) == (Abs _ t2 b2) = t1 == t2 && b1 == b2
    (Pi _ t1 b1) == (Pi _ t2 b2) = t1 == t2 && b1 == b2
+    (Let _ v1 b1) == (Let _ v2 b2) = v1 == v2 && b1 == b2
    _ == _ = False

 occursFree :: Integer -> Expr -> Bool
@ -31,6 +33,7 @@ occursFree _ Square = False
 occursFree n (App a b) = on (||) (occursFree n) a b
 occursFree n (Abs _ a b) = occursFree n a || occursFree (n + 1) b
 occursFree n (Pi _ a b) = occursFree n a || occursFree (n + 1) b
+occursFree n (Let _ v b) = occursFree n v || occursFree (n + 1) b

 shiftIndices :: Integer -> Integer -> Expr -> Expr
 shiftIndices d c (Var x k)
@ -43,6 +46,7 @@ shiftIndices _ _ Square = Square
 shiftIndices d c (App m n) = App (shiftIndices d c m) (shiftIndices d c n)
 shiftIndices d c (Abs x m n) = Abs x (shiftIndices d c m) (shiftIndices d (c + 1) n)
 shiftIndices d c (Pi x m n) = Pi x (shiftIndices d c m) (shiftIndices d (c + 1) n)
+shiftIndices d c (Let x v b) = Let x (shiftIndices d c v) (shiftIndices d (c + 1) b)

 incIndices :: Expr -> Expr
 incIndices = shiftIndices 1 0
@ -58,13 +62,11 @@ collectLambdas (Abs x ty body) = ((x, ty) : params, final)
    (params, final) = collectLambdas body
 collectLambdas e = ([], e)

-cleanNames :: Expr -> Expr
-cleanNames (App m n) = App (cleanNames m) (cleanNames n)
-cleanNames (Abs x ty body) = Abs x (cleanNames ty) (cleanNames body)
-cleanNames (Pi x ty body)
-    | occursFree 0 body = Pi x (cleanNames ty) (cleanNames body)
-    | otherwise = Pi "" ty (cleanNames body)
-cleanNames e = e
+collectLets :: Expr -> ([(Text, Expr)], Expr)
+collectLets (Let x val body) = ((x, val) : params, final)
+  where
+    (params, final) = collectLets body
+collectLets e = ([], e)

 collectPis :: Expr -> ([(Text, Expr)], Expr)
 collectPis p@(Pi "" _ _) = ([], p)
@ -73,6 +75,14 @@ collectPis (Pi x ty body) = ((x, ty) : params, final)
    (params, final) = collectPis body
 collectPis e = ([], e)

+cleanNames :: Expr -> Expr
+cleanNames (App m n) = App (cleanNames m) (cleanNames n)
+cleanNames (Abs x ty body) = Abs x (cleanNames ty) (cleanNames body)
+cleanNames (Pi x ty body)
+    | occursFree 0 body = Pi x (cleanNames ty) (cleanNames body)
+    | otherwise = Pi "" ty (cleanNames body)
+cleanNames e = e
+
 groupParams :: [(Text, Expr)] -> [([Text], Expr)]
 groupParams = foldr addParam []
  where
@ -85,6 +95,9 @@ groupParams = foldr addParam []
 showParamGroup :: ([Text], Expr) -> Text
 showParamGroup (ids, ty) = parenthesize $ unwords ids <> " : " <> pretty ty

+showBinding :: (Text, Expr) -> Text
+showBinding (ident, val) = parenthesize $ ident <> " := " <> pretty val
+
 helper :: Integer -> Expr -> Text
 helper _ (Var s _) = s
 helper _ (Free s) = s
@ -107,6 +120,10 @@ helper k e@(Abs{}) = if k >= 1 then parenthesize res else res
    (params, body) = collectLambdas e
    grouped = showParamGroup <$> groupParams params
    res = "λ " <> unwords grouped <> " . " <> pretty body
+helper _ e@(Let{}) = res
+  where
+    (binds, body) = collectLets e
+    res = "let " <> unwords (map showBinding binds) <> " in " <> pretty body <> " end"

 pretty :: Expr -> Text
 pretty = helper 0 . cleanNames
--- a/lib/Parser.hs
+++ b/lib/Parser.hs
@ -49,11 +49,18 @@ lexeme = L.lexeme skipSpace
 eat :: Text -> Parser ()
 eat = void . lexeme . chunk

+keywords :: [Text]
+keywords = ["forall", "let", "in", "end", "fun"]
+
 pIdentifier :: Parser Text
-pIdentifier = label "identifier" $ lexeme $ do
+pIdentifier = try $ label "identifier" $ lexeme $ do
    firstChar <- letterChar <|> char '_'
    rest <- many $ alphaNumChar <|> char '_'
-    return $ T.pack (firstChar : rest)
+    let ident = T.pack (firstChar : rest)
+    when (ident `elem` keywords) $
+        fail $
+            "Reserved word: " ++ T.unpack ident
+    pure ident

 pVar :: Parser Expr
 pVar = label "variable" $ lexeme $ do
@ -101,6 +108,25 @@ pPAbs = lexeme $ label "Π-abstraction" $ withBinders $ do
    body <- pExpr
    pure $ foldr (uncurry Pi) body params

+pBinding :: Parser (Text, Expr)
+pBinding = lexeme $ label "binding" $ do
+    eat "("
+    ident <- pIdentifier
+    eat ":="
+    value <- pExpr
+    eat ")"
+    modify $ over bindsL (ident :)
+    pure (ident, value)
+
+pLet :: Parser Expr
+pLet = lexeme $ label "let expression" $ withBinders $ do
+    eat "let"
+    bindings <- some pBinding
+    eat "in"
+    body <- try pExpr
+    eat "end"
+    pure $ foldr (uncurry Let) body bindings
+
 pArrow :: Parser Expr
 pArrow = lexeme $ label "->" $ do
    a <- pAppTerm
@ -167,7 +193,14 @@ pTerm =
                ]

 pAppTerm :: Parser Expr
-pAppTerm = lexeme $ pLAbs <|> pPAbs <|> pApp
+pAppTerm =
+    lexeme $
+        choice
+            [ pLAbs
+            , pPAbs
+            , pLet
+            , pApp
+            ]

 pExpr :: Parser Expr
 pExpr = lexeme $ try pArrow <|> pAppTerm