Metascope - My personal playground for the LLVM compiler framework

To mark the launch of my new blog, I want to announce a small article series about my new personal research project, Metascope. I have always dreamed of building my own functional programming language, with a compiler that uses the popular LLVM framework for code generation. Unfortunately, it is quite easy to get lost in the many topics involved (parsing principles, type systems, type inference, optimization, intermediate representations), and the learning curve can be very steep. This is especially true if you follow the highly intellectual, scientific discussions about type system theory by famous compiler architects such as Edwin Brady (Idris), Simon Peyton Jones (Haskell), or Andreas Rossberg and Bob Harper (Standard ML).

Luckily, the LLVM team provides a minimal tutorial based on the educational language Kaleidoscope [1], which walks through basic compiler techniques using the LLVM framework. The main purpose of this series is to replay the implementation steps of the Kaleidoscope tutorial, with some variations in favour of the functional programming paradigm. I am a complete beginner in compiler technology, and I would like to use my blog to share my experiences in this complex research area.

Please do not read too much into my technology choices. I intend to use Standard ML for my experiments, purely for personal reasons (I really like its simple and consistent language design). OCaml, for its first-class LLVM bindings, or Haskell, for its highly expressive type system, would probably be a more productive choice.

We will now start this series with an introduction to our minimal frontend language, which is conceptually no different from Kaleidoscope. We call this language Metascope; it embraces the typical functional syntax of the ML family.

Language specification

Next steps

In the next article we will start writing a small frontend that transforms a Metascope program into an abstract syntax tree. Following traditional compiler principles, we have to implement a scanner for tokenizing and a parser. For a real programming language this step is often time-consuming. Thanks to the simplicity of the language specification, however, we can get through this task quickly and dive into the LLVM framework as fast as possible (which is the main purpose of Kaleidoscope’s language design).
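To make the scanner/parser split a bit more concrete before the next article, here is a minimal hand-written sketch of both stages. Please treat everything in it as an assumption: the `fun NAME PARAM* = EXPR` syntax, the token kinds, and the AST node names (`Num`, `Var`, `BinOp`, `Fun`) are hypothetical and are not the Metascope specification. It is written in Python purely for brevity; the series itself will use Standard ML with generated scanners and parsers.

```python
import re
from dataclasses import dataclass

# Hypothetical token set for a tiny ML-style language (NOT the Metascope spec).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_']*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))
KEYWORDS = {"fun"}

def tokenize(source):
    """Scanner: turn source text into a list of (kind, text) tokens."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace only separates tokens
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens

# Abstract syntax tree nodes (assumed shapes, for illustration only).
@dataclass
class Num:
    value: float
@dataclass
class Var:
    name: str
@dataclass
class BinOp:
    op: str
    left: object
    right: object
@dataclass
class Fun:
    name: str
    params: list
    body: object

class Parser:
    """Recursive-descent parser for: fun NAME PARAM* = EXPR"""
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def take(self, kind, text=None):
        tok = self.peek()
        if tok[0] != kind or (text is not None and tok[1] != text):
            raise SyntaxError(f"expected {text or kind}, got {tok[1]!r}")
        self.pos += 1
        return tok[1]

    def definition(self):
        self.take("KEYWORD", "fun")
        name = self.take("IDENT")
        params = []
        while self.peek()[0] == "IDENT":
            params.append(self.take("IDENT"))
        self.take("OP", "=")
        return Fun(name, params, self.expr())

    def expr(self):  # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            node = BinOp(self.take("OP"), node, self.term())
        return node

    def term(self):  # term := atom (('*' | '/') atom)*
        node = self.atom()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            node = BinOp(self.take("OP"), node, self.atom())
        return node

    def atom(self):  # atom := NUMBER | IDENT
        kind, text = self.peek()
        if kind == "NUMBER":
            return Num(float(self.take("NUMBER")))
        if kind == "IDENT":
            return Var(self.take("IDENT"))
        raise SyntaxError(f"unexpected token {text!r}")

def parse(source):
    parser = Parser(tokenize(source))
    tree = parser.definition()
    parser.take("EOF")  # reject trailing input
    return tree
```

Calling `parse("fun add x y = x + y * 2")` yields a `Fun` node whose body nests the multiplication under the addition, because `term` binds tighter than `expr`. A generated parser encodes the same precedence rules declaratively in the grammar file instead of in hand-written functions.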

We will follow the same process by which old-school compilers were written in the 80s: we will use generators in the style of flex/yacc to implement our grammar, which gives us more flexibility if we want to extend the language later. The popular Standard ML compiler MLton already ships the appropriate tools, mllex and mlyacc, which are well documented in [2] and will be very helpful. We will also use MLton itself, a very efficient Standard ML compiler with a simple build toolchain based on the ML Basis (MLB) system.

References

  1. LLVM Compiler Infrastructure (2019): Kaleidoscope: Tutorial Introduction and the Lexer. URL: https://llvm.org/docs/tutorial/OCamlLangImpl1.html
  2. Andrew W. Appel, David R. Tarditi and James S. Mattson (2009): User’s Guide to ML-Lex and ML-Yacc. URL: http://www.rogerprice.org/