Metascope - My personal playground for the LLVM Compiler Framework
To mark the launch of my new blog, I want to announce a short article series about my new personal research project, Metascope. I have long dreamed of building my own functional programming language, with a compiler that uses the popular LLVM framework for code generation. Unfortunately, it is easy to get lost in the many topics involved (parsing principles, type systems, type inference, optimization, intermediate representations), and the learning curve can be very steep. This is especially true if you follow the highly intellectual, scientific discussions about type system theory by famous compiler architects such as Edwin Brady (Idris), Simon Peyton Jones (Haskell), or Andreas Rossberg and Bob Harper (Standard ML).
Luckily, the LLVM team provides a minimal user guide built around the educational language Kaleidoscope [1], which walks through basic compiler techniques using the LLVM framework. The main purpose of this series is to replay the implementation steps of the Kaleidoscope tutorial, with some variations in favour of the functional programming paradigm. I am a complete beginner in compiler technology, and I would like to use my blog to share my experiences in this complex research area.
Do not read too much into my technology choices. I intend to use Standard ML for my experiments, for purely personal reasons (I really like its simple and consistent language design). OCaml, with its first-class LLVM bindings, or Haskell, with its highly expressive type system, would probably be more productive choices.
We start the series by introducing our minimal frontend language, which is conceptually no different from Kaleidoscope. We call this language Metascope; it adopts the typical functional syntax of the ML family.
Language specification
The only available data type is a 64-bit double-precision floating-point number. The following literals should be valid lexemes:
42 1.0 10_000 2e10 1.0e-3
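As a rough illustration of what accepting these lexemes involves, here is a minimal sketch in Python (used here only for illustration; the series itself targets Standard ML). The pattern `NUMBER` and the helper `parse_number` are hypothetical names, and the exact rules (underscores only between digits, optional fraction, optional exponent) are an assumption inferred from the five examples above, not a fixed part of the specification.

```python
import re

# Assumed grammar for a Metascope number literal (inferred from the examples):
# digits with optional underscore separators, optional fraction, optional exponent.
NUMBER = re.compile(r"\d+(?:_\d+)*(?:\.\d+)?(?:[eE][+-]?\d+)?")


def parse_number(lexeme: str) -> float:
    """Convert a literal to a 64-bit float, or raise ValueError."""
    if NUMBER.fullmatch(lexeme) is None:
        raise ValueError(f"not a Metascope number literal: {lexeme!r}")
    # Drop the digit separators before handing the text to float().
    return float(lexeme.replace("_", ""))


if __name__ == "__main__":
    for lexeme in ["42", "1.0", "10_000", "2e10", "1.0e-3"]:
        print(lexeme, "->", parse_number(lexeme))
```

Python's `float` is itself a 64-bit double, so the converted values have the same precision as the Metascope data type described above.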
Functions, also known in the functional community as abstractions, are defined the same way as in Standard ML, with the keyword fun:
fun fibo x = if x < 3 then 1 else fibo (x - 1) + fibo (x - 2)
Applying a function to an argument requires no parentheses around the argument:
fibo 40
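To make the meaning of definition and application concrete, the following sketch builds a hand-written syntax tree for fibo and evaluates it with a tiny interpreter. It is written in Python purely for illustration. All node names (`Num`, `Var`, `BinOp`, `If`, `App`, `Fun`) and the `evaluate` function are hypothetical, and representing the result of `<` as 1.0 or 0.0 is an assumption borrowed from Kaleidoscope, where every value is a double.

```python
from dataclasses import dataclass


# Hypothetical AST node types; the names are illustrative only.
@dataclass
class Num:
    value: float


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: object
    right: object


@dataclass
class If:
    cond: object
    then: object
    orelse: object


@dataclass
class App:
    func: str
    arg: object


@dataclass
class Fun:
    name: str
    param: str
    body: object


def evaluate(expr, env, funs):
    """Evaluate an expression; env maps variables, funs maps function names."""
    if isinstance(expr, Num):
        return expr.value
    if isinstance(expr, Var):
        return env[expr.name]
    if isinstance(expr, BinOp):
        left = evaluate(expr.left, env, funs)
        right = evaluate(expr.right, env, funs)
        if expr.op == "+":
            return left + right
        if expr.op == "-":
            return left - right
        if expr.op == "<":
            # Assumption: comparisons yield 1.0 (true) or 0.0 (false).
            return 1.0 if left < right else 0.0
        raise ValueError(f"unknown operator {expr.op!r}")
    if isinstance(expr, If):
        taken = expr.then if evaluate(expr.cond, env, funs) != 0.0 else expr.orelse
        return evaluate(taken, env, funs)
    if isinstance(expr, App):
        fun = funs[expr.func]
        # The callee sees only its own parameter, bound to the argument value.
        return evaluate(fun.body, {fun.param: evaluate(expr.arg, env, funs)}, funs)
    raise TypeError(f"unknown node {expr!r}")


# fun fibo x = if x < 3 then 1 else fibo (x - 1) + fibo (x - 2)
fibo = Fun("fibo", "x", If(
    BinOp("<", Var("x"), Num(3.0)),
    Num(1.0),
    BinOp("+",
          App("fibo", BinOp("-", Var("x"), Num(1.0))),
          App("fibo", BinOp("-", Var("x"), Num(2.0))))))

if __name__ == "__main__":
    # Evaluates the application "fibo 10" (a smaller argument than 40, to keep it fast).
    print(evaluate(App("fibo", Num(10.0)), {}, {"fibo": fibo}))
```

Note that the tree groups `fibo (x - 1) + fibo (x - 2)` as the sum of two applications, which assumes that application binds tighter than `+`, as it does in Standard ML.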
Calls to external functions from a standard library or runtime implementation must be declared with the keyword foreign:
foreign sin : float -> float
foreign atan2 : float * float -> float
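A foreign declaration essentially tells the compiler a name and how many float arguments it takes. The sketch below, in Python for illustration only, extracts both from a declaration and looks the name up in Python's `math` module as a stand-in for the runtime library. The pattern `FOREIGN` and the helpers `parse_foreign` and `resolve` are hypothetical, and the assumed declaration shape (one or more `float` parameters separated by `*`, returning `float`) is inferred from the two examples.

```python
import math
import re

# Assumed shape: foreign <name> : float (* float)* -> float
FOREIGN = re.compile(
    r"foreign\s+(\w+)\s*:\s*(float(?:\s*\*\s*float)*)\s*->\s*float"
)


def parse_foreign(decl: str):
    """Return (name, arity) for a single foreign declaration."""
    match = FOREIGN.fullmatch(decl.strip())
    if match is None:
        raise ValueError(f"malformed foreign declaration: {decl!r}")
    name, params = match.groups()
    # Each parameter is spelled "float", so counting them gives the arity.
    return name, params.count("float")


def resolve(decl: str):
    """Bind a declaration to a callable; Python's math module stands in for the runtime."""
    name, arity = parse_foreign(decl)
    return getattr(math, name), arity


if __name__ == "__main__":
    print(parse_foreign("foreign sin : float -> float"))
    print(parse_foreign("foreign atan2 : float * float -> float"))
    atan2, arity = resolve("foreign atan2 : float * float -> float")
    print(arity, atan2(0.0, 1.0))
```

In the real compiler the lookup would presumably happen at link time against the C runtime rather than through a dictionary-style lookup like this one.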
The syntax of the language is whitespace-insensitive: spaces, newlines and tabs only separate tokens, and the lexer should otherwise ignore them.
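One way to picture this rule is a tokenizer that recognizes whitespace like any other token class and then simply throws it away. The following Python sketch is only an illustration under assumed token classes (numbers, identifiers, a handful of symbols); the names `TOKEN` and `tokenize` are hypothetical and say nothing about how the actual lexer will be built.

```python
import re

# Assumed token classes for illustration; whitespace is matched but never emitted.
TOKEN = re.compile(r"""
    (?P<number>\d+(?:_\d+)*(?:\.\d+)?(?:[eE][+-]?\d+)?)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<symbol>->|[-+*/<>=():])
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(source: str) -> list:
    """Split source text into lexemes, discarding spaces, tabs and newlines."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at offset {pos}"
            )
        if match.lastgroup != "space":
            tokens.append(match.group())
        pos = match.end()
    return tokens


if __name__ == "__main__":
    one_line = "fun fibo x = if x < 3 then 1 else fibo (x - 1) + fibo (x - 2)"
    reformatted = "fun fibo x =\n\tif x<3 then 1\n  else fibo(x-1)+fibo(x-2)"
    # Both layouts produce the same token stream.
    print(tokenize(one_line) == tokenize(reformatted))
```

Whitespace still matters as a separator: without the space, `fun fibo` would be read as the single identifier `funfibo`.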
Next steps
In the next article we will start writing a small frontend that transforms a Metascope program into an abstract syntax tree. Following traditional compiler principles, we have to implement a scanner for tokenizing and a parser. In a real programming language this step is often time-consuming. Thanks to the simplicity of the language specification, however, we can get through it quickly and dive into the LLVM framework as fast as possible (which is the main purpose of Kaleidoscope’s language design).
We will follow the same process by which old-school compilers were written in the 80s: we will use generators in the style of flex/yacc to implement our grammar, which gives us more flexibility if we want to extend the language later. The popular Standard ML compiler MLton already ships the appropriate tools, mllex and mlyacc, which are well documented in [2] and will be very helpful. We will also use MLton itself, a very efficient Standard ML compiler with a simple build toolchain based on the ML Basis (MLB) system.
References
[1] LLVM Compiler Infrastructure (2019): Kaleidoscope: Tutorial Introduction and the Lexer. URL: https://llvm.org/docs/tutorial/OCamlLangImpl1.html
[2] Andrew W. Appel, David R. Tarditi and James S. Mattson (2009): User’s Guide to ML-Lex and ML-Yacc. URL: http://www.rogerprice.org/