I haven't done that LISP project, but I recommend Crafting Interpreters.
I know that you say you want to build a compiler. And this tripped me up when I first started. But an interpreter and a compiler are very, very similar--to the point of being nearly identical. You can build a compiler by building an interpreter, then swapping out the immediate-calculation functions for functions that generate machine code (or LLVM IR or whatever). You will often want at least a basic interpreter for your language anyway, if for no other purpose than simplifying expressions.
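To make that concrete, here's a tiny sketch (Python, with invented node shapes) of the swap being described: the same post-order walk over an expression tree either computes a value or emits code, depending on which set of handlers you plug in.

```python
# Sketch: the same AST walk can interpret or compile, depending on
# which handlers you plug in. Node shapes are invented for illustration.

def walk(node, handlers):
    if isinstance(node, (int, float)):
        return handlers["num"](node)
    op, left, right = node                      # e.g. ("+", 1, ("*", 2, 3))
    return handlers[op](walk(left, handlers), walk(right, handlers))

# Immediate-calculation handlers: an interpreter.
interp = {"num": lambda n: n,
          "+": lambda a, b: a + b,
          "*": lambda a, b: a * b}

# Code-generating handlers: a toy "compiler" emitting expression text
# (imagine LLVM IR or machine code here instead of strings).
codegen = {"num": lambda n: str(n),
           "+": lambda a, b: f"add({a}, {b})",
           "*": lambda a, b: f"mul({a}, {b})"}

ast = ("+", 1, ("*", 2, 3))
print(walk(ast, interp))    # 7
print(walk(ast, codegen))   # add(1, mul(2, 3))
```

The point is that the traversal logic is identical; only the leaf-level actions change.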
I also recommend the "Dragon Book", which is pretty much the textbook on compiler design. Reading this book was absolutely eye-opening, because it presents a bunch of very basic and general information that everybody in language design just assumes you know.
I really struggled when I started before I read this book because so much stuff is very much simpler than I imagined. For instance, for my first language project, I got stuck on codegen and never finished. I was stuck on the basic question: okay, but how do I know what order to generate the code in? It seemed like a very hard question, requiring DAGs and dependency analysis and all sorts of shit like that, and I got discouraged and quit. Especially because so many tutorials are like "now that you've parsed your language, interpreting it is easy and left as an exercise for the reader". Guess what? It is easy. It's just a brain-dead post-order traversal. But literally the only place that ever said it that simply and directly was the Dragon Book.
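For anyone stuck on the same "what order?" question: here's a minimal sketch (Python, invented tree shape) of codegen order for a stack machine. Visit children first, then emit the operation; that post-order visit *is* the answer.

```python
# Sketch of "codegen order is just post-order": emit stack-machine
# instructions for an expression tree by visiting children first.
# Tree shape is invented for illustration.

def emit(node, out):
    if isinstance(node, int):
        out.append(("PUSH", node))
    else:
        op, left, right = node
        emit(left, out)        # code for the left operand first...
        emit(right, out)       # ...then the right operand...
        out.append((op,))      # ...then the operation itself

code = []
emit(("ADD", 2, ("MUL", 3, 4)), code)
# code == [("PUSH", 2), ("PUSH", 3), ("PUSH", 4), ("MUL",), ("ADD",)]
```

No DAGs or dependency analysis needed for straight-line expression code: the tree's own structure fixes the order.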
Lua is a dynamic programming language that, besides numbers and strings, has only one data structure: tables. They work like a hashtable and can be indexed by a number or string.
You can download the lua interpreter and look into ltable.c: https://github.com/lua/lua/blob/master/ltable.c
And here's a paper with more information: https://www.lua.org/doc/jucs05.pdf
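The paper linked above describes the trick ltable.c uses: each table has an array part for dense integer keys and a hash part for everything else. A very rough Python sketch of the idea (much simplified, no rehashing or resizing logic):

```python
# Much-simplified sketch of Lua's hybrid table from ltable.c:
# dense integer keys 1..n live in an array part, everything else
# falls back to a hash part.

class LuaTable:
    def __init__(self):
        self.array = []   # values for keys 1..len(self.array)
        self.hash = {}    # all other keys

    def set(self, key, value):
        if isinstance(key, int) and 1 <= key <= len(self.array) + 1:
            if key == len(self.array) + 1:
                self.array.append(value)   # grow the array part by one
            else:
                self.array[key - 1] = value
        else:
            self.hash[key] = value

    def get(self, key):
        if isinstance(key, int) and 1 <= key <= len(self.array):
            return self.array[key - 1]
        return self.hash.get(key)          # missing key -> None (nil)

t = LuaTable()
t.set(1, "a"); t.set(2, "b"); t.set("name", "lua")
```

The real implementation migrates keys between the two parts on rehash; this sketch only shows why sequential integer keys get array-like performance.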
If you don't already know: there is already an llvm package for Haskell, so you don't have to use the C API directly. There are even examples and some other resources.
Can't comment on what's best for your language without specific details, but you should definitely take a look at the annotated source of the Lua parser: https://www.lua.org/source/5.3/lparser.c.html - it is hand-written and provides excellent performance.
The Lua language provides a good example of how to design a minimal and fast, but powerful grammar.
Look for the "Turbo Assembler Quick Reference". Example: TASM 5.0 Quick Reference
It's 32-bit only and predates the various 586/686 extensions, but it's very useful anyway. It's not too hard to go from the quick reference plus a disassembly to seeing how you can generate opcodes.
Haxe can cross-compile to quite a few different languages, including C++. I've used Haxe with OpenFL to make games and cross-compile them into native Windows, OSX, Linux, iOS and Android programs.
Haxe can interface with native code by writing some interfaces/externs, so you can use native code from within Haxe. Usually you'd do it this way rather than editing the cross-compiled output, which may not always be human-readable.
The Haxe compiler can also perform a number of optimizations, like DCE and inlining, and you can write macros too.
I've had the same problem. llvm-c/Core.h was very helpful. I just used the C++ documentation and, with the help of that header, translated it to C.
I've also found this link. It should help you get started.
Check out "Crafting Interpreters" - it's a huge book, but great for starters.
In lieu of a textbook (if no fitting textbook stands out), it could be worth gathering a pile of worthwhile excerpts from published papers, photocopying them, and handing those packets out to students as if they were a textbook for your class. Many materials are old and probably wouldn't require you to ask permission. Lisp was invented in 1958 and it had a runtime system. A lot of the things worth studying about C# and Java could be studied just as well in older languages, for free.
Alternately, if not given to students verbatim, the articles can serve as inspiration and guidance in making your own materials, avoiding references to older languages and focusing on newer languages. The format of this article below (especially its list of bullet points) seems very well suited for turning into a class syllabus.
A Runtime System by Andrew W. Appel, Princeton University, May 1989
When you Google the term runtime system, many of the articles that show up require you to pay in order to download them. With Google Scholar, that doesn't seem to be the case: in my limited experience, Google Scholar gives you two columns, and the link in the right column is a free downloadable version.
AFAIK, you don't get reference counting in C++ unless you explicitly use shared_ptr and friends.
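For anyone new to the idea, here's a toy sketch (Python, purely illustrative, not real C++ semantics) of what shared_ptr-style reference counting does under the hood: copies bump a shared count, and the payload is destroyed when the last owner releases it.

```python
# Toy illustration of shared_ptr-style reference counting: copies bump
# a shared count, releases decrement it, and the payload is "freed"
# when the count reaches zero.

class SharedPtr:
    def __init__(self, value, on_free=None):
        self._box = [value, 1, on_free]   # payload, refcount, "destructor"

    def copy(self):
        p = SharedPtr.__new__(SharedPtr)  # share the same box
        p._box = self._box
        self._box[1] += 1                 # one more owner
        return p

    def release(self):
        self._box[1] -= 1
        if self._box[1] == 0 and self._box[2]:
            self._box[2](self._box[0])    # run the "destructor"

    @property
    def count(self):
        return self._box[1]

freed = []
a = SharedPtr("data", on_free=freed.append)
b = a.copy()          # count == 2
a.release()           # count == 1, not freed yet
b.release()           # count == 0, "destructor" runs
```

The control block here is just a Python list; the real shared_ptr also handles weak references and thread-safe counting, which this sketch ignores.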
Didn't know Rust recognised Rc as a special function, thanks! But yeah, as you said, I'm more interested in learning about the value of a low-level language that's pure ARC. Rust doesn't even need reference counting; its ownership-and-borrowing system ensures that 😉
Nim is really looking like a perfect language on the surface, tbh. Too bad I'm in the D language camp right now; maybe things will change in the future🙂
In D, C++ interop is achieved via matching C++ name mangling conventions, function calling conventions, and matching the vtable layout for single inheritance.
Please see this for more:
Go’s Assembler is an interesting example as the Go compiler produces an architecture-agnostic assembly language which is then compiled to architecture-specific code by the assembler.
> The most important thing to know about Go's assembler is that it is not a direct representation of the underlying machine. Some of the details map precisely to the machine, but some do not. This is because the compiler suite (see this description) needs no assembler pass in the usual pipeline. Instead, the compiler operates on a kind of semi-abstract instruction set, and instruction selection occurs partly after code generation. The assembler works on the semi-abstract form, so when you see an instruction like MOV what the toolchain actually generates for that operation might not be a move instruction at all, perhaps a clear or load. Or it might correspond exactly to the machine instruction with that name. In general, machine-specific operations tend to appear as themselves, while more general concepts like memory move and subroutine call and return are more abstract. The details vary with architecture, and we apologize for the imprecision; the situation is not well-defined.
C++ (Lots of experience)
Haskell (Classes taught with haskell in school)
LISP (Built a lisp interpreter when I started learning to code)
SML (Learnt from https://www.coursera.org/course/proglang )
Python (Had to do a lot of scripting. My favorite feature is yield!)
I really like the idea of functional programming, but I don't understand it in much depth. Stuff like monads makes my head spin, for example, but recursion I can comprehend rather well.
I'm also doing something similar with F# (i.e. ML-like, working towards business apps/arrays/relational). This is a small test (I have something larger):
https://gist.github.com/mamcx/fe4cf6c5a4452a341983
It's an interpreter, and it helps to get the GC "for free":
If you're interested and use F#, maybe we can join forces (even if each of us sticks with his own codebase). A lot of the fundamentals in each language are the same (you need functions, ifs, loops, pattern matching, etc.).
I wrote my own word processor, thank you very much... http://cowlark.com/wordgrinder/
The actual answer is: because I am writing a compiler which I want to be able to have compile itself, and using massive tools like Java for core pieces of infrastructure renders that completely impossible. I want tools which are small and easy to adapt.
Given that neither of us backed up our argument, there's not much to discuss.
But if you have an example of a compiler that parses source with a compiled grammar and has great parse-error reporting, I'd love to see it! My example of a hand-rolled solution with what I'd call best-in-class errors is Rust. Most of its famed compiler warnings come after the parse step (typecheck, borrowcheck), but the parse errors are just as friendly and helpful, making actionable suggestions and attempting to continue in a meaningful way.
The closest competitor from a generated stack would probably be JetBrains' GrammarKit/JFlex/PSI stack, which still is closer to writing parsing logic than declaring your grammar imho.
I do exactly this kind of analysis (on an AST) in my pet language, http://strlen.com/lobster/, which has nullable types. It propagates the "promoted" non-null types across and/or operations, and also to the branches of an if it sits in, and a few other contexts. More details under "the trouble with nil" here: http://aardappel.github.io/lobster/type_checker.html
I can point to where in the implementation this happens, if you feel like seeing an example.
Edit: the language has no goto. Type-checking happens in call-graph order, which makes a lot of these things simpler.
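A sketch of that propagation (Python, with an invented mini-AST; this is not Lobster's actual implementation): inside a branch guarded by a null test, the checker treats the tested variable as non-null, and `and` carries promotions from its left side into its right side.

```python
# Sketch of flow-sensitive "promotion" of nullable types: inside the
# branch guarded by a nil test, the checker treats the variable as
# non-null. AST shapes here are invented for illustration.

def check(node, nonnull):
    """Return the set of vars known non-null after checking `node`."""
    kind = node[0]
    if kind == "isnotnil":                 # ("isnotnil", "x")
        return nonnull | {node[1]}
    if kind == "and":                      # ("and", lhs, rhs)
        # rhs is checked with lhs's promotions already in scope
        return check(node[2], check(node[1], nonnull))
    if kind == "if":                       # ("if", cond, then_uses)
        facts = check(node[1], nonnull)
        for var in node[2]:                # vars dereferenced in the branch
            assert var in facts, f"{var} may be nil here"
        return nonnull                     # promotions end with the branch
    return nonnull

# `if x != nil and y != nil: use(x); use(y)` type-checks:
check(("if", ("and", ("isnotnil", "x"), ("isnotnil", "y")), ["x", "y"]), set())
```

A real checker would also handle `or` (promotions hold only in the *else* branch), reassignment, and so on, but the core is just threading a set of known facts through the tree.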
LLVM is capable of TCO provided its constraints are met. The implementation is in lib/Transforms/Scalar/TailRecursionElimination.cpp.
See http://llvm.org/doxygen/TailRecursionElimination_8cpp_source.html - the top of the file describes the current implementation.
If you're generating LLVM-IR, you can safely attach 'tail' in front of the call instruction and LLVM will figure out if it can eliminate the tail call or perform a tail call separately.
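As a rough illustration of what that pass does (not LLVM's actual code), here's the classic self-tail-call-to-loop rewrite sketched in Python, which doesn't perform TCO natively:

```python
# Sketch of what tail-recursion elimination does: a self-tail-call
# rewritten as a loop, reusing the current frame instead of growing
# the stack.

def fact_rec(n, acc=1):
    if n <= 1:
        return acc
    return fact_rec(n - 1, acc * n)   # tail call: nothing runs after it

def fact_tre(n, acc=1):
    while True:                       # the "eliminated" version
        if n <= 1:
            return acc
        n, acc = n - 1, acc * n       # parameters updated in place

print(fact_tre(10))   # 3628800
```

fact_rec would hit Python's recursion limit for large n, while fact_tre runs in constant stack space; that's exactly the benefit the LLVM pass provides for eligible calls.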
The linked tutorial describes how to implement a Kaleidoscope-LLVM compiler using Haskell. It's based on the standard LLVM tutorials, including one for OCaml. It may be worth trying both and using whichever you like better. Also, F# derives from OCaml, so porting from the OCaml version shouldn't be very difficult.
To be honest, I haven't worked with GCC as a backend, so I can't do a direct comparison. However, LLVM was designed to be more modular and easy to use as a backend by language designers. There's even a nice official tutorial for implementing a language using LLVM. Another advantage of LLVM is that it has built-in support for JIT compilation, which makes creating a REPL or something similar for your language simpler.
Awesome, thanks! I think I even found that talk some time ago, but couldn't sit through the terrible recording quality ^^
Luckily I just found the slides!
You realize that even gcc doesn't fully support C99? What features of C99 do you need that aren't in C90/C89?
EDIT: I just realized that sounds harsh. If there is something basic you really need (inline functions, for instance), it isn't that hard to add it to the lexer. You still have to figure out how to turn it into code. :)
My 2c worth:
I found Modern Compiler Design extremely useful. It's one of the few technical books I have read cover to cover.
If you don't mind delving a bit into the past, Niklaus Wirth's 'Compiler Construction' is a joy to read. It's an extremely slim book and gives you all the essentials on building a compiler.
I like Ronald Mak's book Writing Compilers and Interpreters. He doesn't use lexer and parser generators, but instead demonstrates how to write a compiler from first principles. It serves as a good introduction. It's light on theory, but if you're looking to go in-depth into that, you won't find it in a beginner's book. The edition that I have (1991) uses plain C to write a Pascal compiler. Later editions use C++ and Java, so pick the version that suits your familiarity.
I suggest Compiler Construction: Principles and Practice by Kenneth C. Louden. This was the book used in my compiler design classes; it's written in a very clear and straightforward way and follows a very practical approach.
I believe <em>Virtual Machines: Versatile Platforms for Systems and Processes</em> is (or at least was) kind of a standard text for virtual machines (though I could be wrong).