No, no resource comes to mind at the moment.
But there are two very interesting problems in building a compiler.
Without using precise terms from any field: the analysis side (from source code to AST) and the synthesis side (from AST to binary).
Analysis is much more academic; synthesis is usually more practical.
To become capable at the level of C, I think understanding synthesis well is fundamental.
That said, anything you find that covers how to create a compiler will certainly explain the details.
The book, which I've never read but which by reputation dominates the field, is the Dragon Book: https://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811
The Dragon Book is often contrasted with SICP, another must-read. And in that case I can say from experience that it's worth reading!
What have you tried? How did you fail?
Have you read the VHDL LRM?
Do you know anything about compiler design? The syntax analysis is the same. Have you heard of the dragon book? That's pretty much the definitive book on compiler design.
Right now you should use the library the top commenter recommends, but once you get more familiar with C and CS you should read this and learn to make your own JSON parser.
I haven't done that LISP project, but I recommend Crafting Interpreters.
I know that you say you want to build a compiler. And this tripped me up when I first started. But an interpreter and a compiler are very, very similar--to the point of being nearly identical. You can build a compiler by building an interpreter, then swapping out the immediate-calculation functions for functions that generate machine code (or LLVM IR or whatever). You will often want at least a basic interpreter for your language anyway, if for no other purpose than simplifying expressions.
I also recommend the "Dragon Book", which is pretty much the textbook on compiler design. Reading this book was absolutely eye-opening, because it presents a bunch of very basic and general information that everybody in language design just assumes you know.
I really struggled when I started before I read this book because so much stuff is very much simpler than I imagined. For instance, for my first language project, I got stuck on codegen and never finished. I was stuck on the basic question: okay, but how do I know what order to generate the code in? It seemed like a very hard question, requiring DAGs and dependency analysis and all sorts of shit like that, and I got discouraged and quit. Especially because so many tutorials are like "now that you've parsed your language, interpreting it is easy and left as an exercise for the reader". Guess what? It is easy. It's just a brain-dead post-order traversal. But literally the only place that ever said it that simply and directly was the Dragon Book.
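The "brain-dead post-order traversal" point deserves a concrete sketch. Here's a minimal Python illustration (the names and tuple-based AST encoding are my own, not from any book): an evaluator, plus a sibling function that walks the tree in exactly the same order but emits stack-machine code instead of computing a value. Swapping one for the other is the interpreter-to-compiler move described above.

```python
# A tiny AST: tuples ("num", n) or (op, left, right).
def eval_node(node):
    """Interpret by post-order traversal: children first, then the op."""
    if node[0] == "num":
        return node[1]
    op, left, right = node
    l, r = eval_node(left), eval_node(right)
    return {"+": l + r, "-": l - r, "*": l * r}[op]

def emit_node(node, out):
    """Same traversal, but emit stack-machine code instead of computing."""
    if node[0] == "num":
        out.append(f"push {node[1]}")
        return
    op, left, right = node
    emit_node(left, out)
    emit_node(right, out)
    out.append({"+": "add", "-": "sub", "*": "mul"}[op])

# (2 + 3) * 4
ast = ("*", ("+", ("num", 2), ("num", 3)), ("num", 4))
print(eval_node(ast))  # 20
code = []
emit_node(ast, code)
print(code)  # ['push 2', 'push 3', 'add', 'push 4', 'mul']
```

The only difference between the two functions is what happens at each node after the children are visited; the traversal order is identical.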
Check out r/programminglanguages and see the kind of things people are talking about; that's a whole subreddit dedicated to writing programming languages. Check out the links in the sidebar for related communities and basic computer science topics you'll want to be familiar with before getting started.
The main topic you need to know about before writing a language is compilers. There's a lot of great resources out there, many of them free:
https://www.edx.org/course/compilers
https://www.cs.cornell.edu/courses/cs6120/2020fa/self-guided/
https://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811?pldnSite=1
Good question, and as u/xoner2 commented, you can use string.gmatch()'s %b specifier to match balanced pairs of characters. But in general it'd be difficult to write a reliable parser using a function like that, because substrings in a language often have different meanings in different contexts. For example, in HTML you can have angle brackets inside a quoted element attribute, and matching %b<> won't always give what you want:

> iter = string.gmatch('<hi rah=">"> <there>', '%b<>')
> iter()
<hi rah=">
Parsing can be surprisingly complicated, so I'd recommend reading about how parsers are conventionally written. Often they'll have a lexer/scanner component that strips unneeded whitespace and identifies meaningful "tokens" for the language being parsed, and then the parser proper defines the "grammar", or the valid relationships between the tokens. (Yes, it can be confusing that "parser" refers to the whole parsing system as well as a particular component within it.)
Hand-written parsers often use recursive descent, and otherwise special tools are often used to help:
- scanner generators like GNU Flex
- parser generators like GNU Bison
- Parsing Expression Grammars like Lua Parsing Expression Grammars or LPeg
- toolkits like ANTLR
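To make the lexer/parser split concrete, here's a tiny hand-rolled scanner sketch in Python (the token names are invented for illustration); a parser would then consume these tokens rather than raw characters:

```python
import re

# One regex per token kind; whitespace is matched but never emitted.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("LT",     r"<"),
    ("GT",     r">"),
    ("SLASH",  r"/"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Yield (kind, lexeme) pairs: the 'meaningful tokens' the parser sees."""
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("<hi 42>")))
# [('LT', '<'), ('IDENT', 'hi'), ('NUMBER', '42'), ('GT', '>')]
```

The grammar rules then live in a separate component that only ever looks at token kinds, which is what keeps each half simple.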
I started out learning about parsing with the "Dragon Book", Compilers: Principles, Techniques, and Tools, which was a common recommendation when I was in school. I enjoyed the parts of it I went through but it's fairly academic and starts doing everything from scratch in C, which won't be to everyone's tastes. A Guide to Parsing: Algorithms and Terminology looks like a good overview of the field.
If you don't want to do too much reading right now but still want a robust parser, maybe look into LPeg. I haven't used PEGs myself, but my impression is they'd be suitable for your purposes here.
I'm not a native speaker, but I might be able to help you on this one.
The Dragon Book, a very famous book in Computer Science, uses first person plural. Here's a quote:
> Up to this point we have treated a compiler as a single box that maps a source program into a semantically equivalent target program. If we open up this box a little, we see that there are two parts to this mapping: analysis and synthesis.
Recently I read a lot of papers on generative adversarial networks. All of them -- as far as I remember -- use first person plural too. Here's a quote from WarpGAN: Automatic Caricature Generation:
> We propose, WarpGAN, a fully automatic network that can generate caricatures given an input face photo [...] We introduce an identity-preserving adversarial loss that aids the discriminator to distinguish between different subjects.
Also, in my experience passive voice is common in academic writing. When I write academic papers or any piece of technical documentation I usually prefer passive voice.
I hope that helps!
Macroeconomists have the purple book on international macro.
Computer scientists have the purple dragon book on compilers.
They serve roughly the same purpose.
Get the “dragon book”. There is nothing else like it.
It’s harder than you think. And there's something viscerally satisfying about understanding how it all works.
https://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811
I don't think you're asking for advice; rather, you want a reading list of good resources. Here's what I can think of off the top of my head:
Of course, nobody should skip the "Dragon Book" if he/she is serious about compilers: https://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811
The Dragon book. Link below https://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811
I should probably read Compilers: Principles, Techniques and Tools (aka the Dragon book) now...
I can't explain it very well. If you want to know how programming languages work, a better source than me is craftinginterpreters.com/ or the dragon book.
C# for Systems Programming wasn't a proposal to start anything; Midori powered Bing's Asian cluster nodes for quite a while before Windows was brought back into the picture.
Here, you seem to be in deep need of reading these.
https://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811
https://www.cs.princeton.edu/~appel/modern/ml/
https://www.amazon.com/Project-Oberon-Design-Operating-Compiler/dp/0201544288
Then you might understand why those choices were made, and what the alternatives were.
If not, I couldn't care less; I've already wasted too much of my precious time on you.
Honestly, I haven't read it, but the classic answer to your question is "The dragon book"
https://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811
I had this exact edition of the book for my OS class. I also appreciated the compiler "dragon book."
https://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811
She's trying to hide her disappointment. I'm pretty sure this is the one she wanted.
OK, a few things:
It looks like you're trying to build a shift/reduce parser, which is a form of an LR parser, for your language. LR parsers try to reduce symbols into more abstract terms as soon as possible. To do this, an LR parser "remembers" all the possible reductions that it's pursuing, and as soon as it sees the input symbols that correspond to a specific reduction, it will perform that reduction. This is called "handle finding".
> If I am correct, my Automaton is a DFA?
When the parser is pursuing a reduction, it's looking for sequences of symbols that match the right-hand sides of the relevant (to our current parse state) productions in our grammar. Since the right-hand sides of all the productions in a grammar are simple sequences, all the handle finding work can be done by a DFA. Yes, the handle recognizer of your parser is a DFA. But keep in mind that it needs to be combined with other parts to make a full parser, and your actual grammar can't be recognized with just a DFA.
In particular, you've shown the ACTION table for a shift/reduce parser. It determines what to do when you encounter a symbol in the input stream. But a shift/reduce parser typically needs a second table as well, the GOTO table, which determines what to do after a reduction has taken place.
One other thing that's worth mentioning: you've expressed your ACTION table as a plain DFA transition table. That's not necessarily wrong, but it's not commonly done that way. Instead of reducing when you reach a certain state, it's common to instead attach an action, either 'shift', 'reduce', or 'accept', to each transition itself. So in a shift/reduce parser, your table might look more like this:
[ | ] | < | > | id | / | attr | |
---|---|---|---|---|---|---|---|
0 | S1 | S4 | |||||
1 | S2 | R3 : Reduce Tag -> [ id ] | |||||
2 | R3 | R7 : Reduce Tag -> < id ??? / > | |||||
4 | S5 | S10 | R9 : Reduce Tag -> < id ??? > | ||||
5 | R9 | S6 | S8 R12 : Reduce Tag -> < / id > | ||||
6 | R7 | ||||||
8 | R9 | S6 | S8 | ||||
10 | S11 | ||||||
11 | R12 |
Note that R7 and R9 aren't well-formed, since multiple sequences of input tokens might cause you to reach these actions. While it would be possible to construct a shift/reduce parser this way, it's not commonly done. Typically, the DFA to recognize handles is an acyclic graph, but you have a self-transition in state 8.
> What would be the best way of implementing this automaton in C++? Do I really have to make a huge array?
In general, yes, you need a big array (or, as suggested before, two big arrays). But you can use any space-saving technique you want. For example, since most entries in the ACTION table are invalid, you could represent that data with a sparse array data structure. Also, both the Dragon Book and Cooper and Torczon briefly cover parser-specific ways to compress those tables. For example, notice that rows 5 and 8 in your example have the same entries. Most real grammars have multiple instances of identical rows, so factoring out this commonality can save enough space that the extra complexity is worth it.
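To show how the ACTION and GOTO tables cooperate at run time, here's a toy table-driven shift/reduce loop in Python. The grammar and tables (for S -> ( S ) | x) are hand-built by me purely for illustration and have nothing to do with your tag grammar:

```python
# Toy table-driven shift/reduce parser for the grammar:
#   rule 1: S -> ( S )
#   rule 2: S -> x
# ACTION maps (state, lookahead) to shift/reduce/accept;
# GOTO maps (state, nonterminal) to the state entered after a reduction.
ACTION = {
    (0, "("): ("s", 2), (0, "x"): ("s", 3),
    (1, "$"): ("acc",),
    (2, "("): ("s", 2), (2, "x"): ("s", 3),
    (3, ")"): ("r", 2), (3, "$"): ("r", 2),
    (4, ")"): ("s", 5),
    (5, ")"): ("r", 1), (5, "$"): ("r", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}
RULES = {1: ("S", 3), 2: ("S", 1)}  # rule number -> (lhs, rhs length)

def parses(tokens):
    stack, toks, i = [0], list(tokens) + ["$"], 0
    while True:
        act = ACTION.get((stack[-1], toks[i]))
        if act is None:
            return False              # no legal move: syntax error
        if act[0] == "acc":
            return True
        if act[0] == "s":             # shift: push the next state, consume input
            stack.append(act[1]); i += 1
        else:                         # reduce: pop the handle, then consult GOTO
            lhs, n = RULES[act[1]]
            del stack[-n:]
            stack.append(GOTO[(stack[-1], lhs)])

print(parses("((x))"))  # True
print(parses("(x"))     # False
```

Note how every reduce immediately triggers a GOTO lookup; that's the second table in action, and it's why the ACTION table alone isn't a complete parser.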
I'm a little surprised that you're building a parser like this by hand, though. Typically people do one of two things:
- use a parser generator and let it construct the states and tables automatically, or
- hand-write a recursive descent parser and skip the tables entirely.
You're sort of doing a mix of the two, which means you have the downsides of both approaches. You need to track all the states and transitions by hand, instead of relying on tools to automate that process, yet you don't get the flexibility of a hand-coded recursive descent parser.
If you're doing this for education's sake, then by all means proceed. I'd highly encourage you to pick up a book on parsing; I think Cooper and Torczon is a great source. But if you just want a parser that works, I'd definitely recommend using a tool or using a more direct approach, like recursive-descent.
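If you do go the recursive-descent route, the shape is pleasantly simple: one function per nonterminal, each consuming tokens from a shared cursor. A minimal Python sketch for a toy expression grammar (made up here purely to show the shape, not your tag grammar):

```python
# Recursive descent for:  expr -> term (('+'|'-') term)*
#                         term -> NUMBER | '(' expr ')'
class Parser:
    def __init__(self, tokens):
        self.toks, self.pos = tokens, 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def eat(self, tok):
        """Consume the expected token or report a syntax error."""
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):      # left-associative chain
            op = self.peek()
            self.eat(op)
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        tok = self.peek()
        if tok == "(":                        # parenthesized subexpression
            self.eat("(")
            value = self.expr()
            self.eat(")")
            return value
        self.eat(tok)                         # otherwise a number literal
        return int(tok)

print(Parser(["1", "+", "(", "2", "-", "3", ")"]).expr())  # 0
```

The grammar lives directly in the call structure, so there are no explicit states or tables to maintain, which is exactly the trade-off against the table-driven approach.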
> What book to read to understand how to develop a general compiler?
A lot of people recommend Compilers: Principles, Techniques, and Tools (AKA the Dragon Book), but a Redditor recommended Engineering a Compiler, and I think it's the better book.
Well, the common source out there that I can't avoid recommending is the Dragon Book:
http://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811
Writing a lambda in, e.g., Lisp is indeed a few lines of code, but the same process of translation down to binary will still happen, with potentially complex intermediate forms. (Edit: Lisp is often interpreted, but that just means the translation to binary happens at runtime instead of compile time.) The resulting assembly is very likely more than a few lines, even with simple 1960s-era compilers and interpreters.
While older hardware and as-yet undiscovered compiler techniques may have limited exactly which features compilers could support in the past, the translation down to imperative assembly (and then binary) has always had to happen to run programs written in high-level languages.
In fact, the amount of computation it took to translate Lisp to assembly in days of yore is part of what gave rise to special Lisp machines, architected explicitly to make running Lisp efficient.
That said, compilers are certainly getting more complex as the field advances, and today's generated assembly is certainly different than it used to be. The number of people who must understand a particular compiler at a deep level is (ideally) a small fraction of the number of people who will write code in the compiler's source language, so compilers are the logical place to house arcane optimizations and complex feature implementations.
If you are interested in learning more about compiler guts, I'd look around for a compiler geared towards students rather than a compiler used to create production binaries. (Sorry, I don't know of any "teaching compilers" off the top of my head.) Or, grab a copy of the dragon book and try building your own! :D
Disclaimer: I am a book hoarder, I buy a lot of programming books that I haven't gotten the chance to read so take my advice with a grain of salt.
I just finished my masters and I have saved about 2 or 3 of the books that I needed for classes (I started renting at the beginning, so I couldn't keep the books even if I wanted to). On top of that, I've bought books that were textbooks at one time but are way out of date, for example the Dragon Book. I have The C++ Programming Language and Java Concurrency in Practice. Both of those books will eventually be obsolete, but they are super useful. If you don't need the money right away and are interested in the topic, then I recommend keeping it.
Was expecting art from this book, aka "The Dragon Book":
http://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811
and was left both let down and impressed.
No, I can't. Have you even peeked inside a book on compilers? RMS is not just a unique person with a set of original ideas on software, he is a fucking genius, having completed the most rigorous math course in the country. The fact that he casually admits not having installed GNU/Linux doesn't have any bearing on his abilities or accomplishments. Of course GCC took off, but compilers aren't trivial creations.
http://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811
The Dragon Book.
I have the first edition, which I bought back in the early 90s. http://www.amazon.com/Compilers-Principles-Techniques-Alfred-Aho/dp/0201100886
It's a great idea. There's actually a fair amount of work that goes into that (which I personally find very mind-numbing, but you might get a kick out of it). Also a lot of theory going on under the hood. A lot of good resources, though.
Abelson/Sussman's SICP and Knuth's Concrete Math are supposed to be very intro, but both of them are a pretty brutal read :)
Since I don't know any advanced books, I'll list what beginner CS people like to talk about:
Compilers: Principles, Techniques, and Tools by Aho et al.
Introduction to Computing Systems: From bits & gates to C & beyond by Patt & Patel.
Introduction to the Theory of Computation by Sipser.
Computer Systems: A Programmer's Perspective by Bryant & O'Hallaron.
Concepts, Techniques, and Models of Computer Programming by Van Roy.
Programming Language Pragmatics by Scott
Thinking Functionally with Haskell by Bird.
xv6.
Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau.
If you're looking for books I would suggest the Dragon book and Constructing Language Processors for Little Languages. Both of these were very helpful to me when I did something similar to what you're doing.
Shockingly enough, the experience I gained trying to write my own compiler has been extremely useful in other areas, so I personally think this is a great idea. You may want to check out some projects that already do this: Twig, for example, is a template processing language that compiles to native PHP.
I tried to come up with a simple explanation of how a compiler works, but I feel unable to provide a wording that is still somehow correct. I guess there is a reason why this book has 1000 pages: https://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811
The simplest explanation would be that someone wrote a program, which the computer understands, that is able to read your language and convert it into something your computer can also understand.
No, I'm not getting anything confused!
I started in the '70s with asm, worked my way through C, then C++, and lately Java. At some point during that time I implemented a Pascal compiler, and I've read the dragon book cover to cover and back again.
I think I've got a pretty firm grasp of what call by reference is, and let me tell you: Java DOES NOT have call by reference, ONLY call by value.
If there's anyone who's confused it's you. Sorry.
Things get a LOT easier mentally when you call it a pointer in Java. This is just terminology, even the Java designers knew it was a pointer (hence the name NullPointerException).
In Java, objects are only accessible via pointers. To be more consistent with C++, every variable in Java should actually be written as:
MyObject* obj = new MyObject();
But since there's no concept of a stack allocated object in Java, you would always have to put that * there, and so they left it out.
And indeed, for pass-by-reference, a reference to whatever you have (a pointer in this case) has to be passed. So that's indeed a double indirection. In C++ you can even express this explicitly, since there are not only call-by-reference semantics but also the address-of operator &. Applying the & operator to a pointer (e.g. foo*) yields a pointer to a pointer (foo**). This is what happens behind the scenes when you use call-by-reference in C++ and various other languages.
In Java, everything but primitives is a pointer, and that pointer is passed, BY VALUE. In C++ terminology, there's only foo*; foo** does not exist. Hence, you can follow that pointer from within a method and thus modify the object that is being pointed to, but you CAN NOT re-assign the pointer, as I showed above.
Pass-by-reference semantics DEMAND that a pointer can be re-assigned.
Modifying what is being pointed to does not count. You are confusing this with modifying a stack-allocated object. If you pass that into a method and can modify it, then it's pass by reference, since the object itself is passed (no pointer involved). But in Java there is no stack-allocated object, only a pointer to an object, and that pointer is passed by value.
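Python happens to have the same semantics (an object reference passed by value), so a quick sketch makes the distinction concrete; the two functions below translate one-to-one into Java methods taking an object parameter:

```python
def mutate(lst):
    # Follows the reference: the caller's object itself changes.
    lst.append(42)

def reassign(lst):
    # Rebinds only the local copy of the reference: caller unaffected.
    lst = [99]

data = []
mutate(data)
print(data)    # [42]  -- the pointed-to object was modified
reassign(data)
print(data)    # [42]  -- rebinding inside the callee didn't stick
```

If the language had true pass-by-reference, the second call would leave `data` as `[99]`; the fact that it can't is exactly the argument above.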
If you're still confused, please read the dragon book. Even if not for this argument, it will greatly enhance your understanding of how languages and computers work.
The famed Dragon Book has most of what you need. I am personally more fond of the Modern Compiler Implementation in {Java,C,ML} series, which I recommend you check out as well.
There are a lot of packages that make implementing languages easier, and you don't have to start from scratch. It's also pretty dumb to make an entirely new language when a simple embedded domain-specific language (EDSL) will suffice.
>We are talking about a high-level language compiler, remember?
I still consider C a *high-level language*. Some people don't, for various reasons.
>You were complaining that it compiles to C rather than emit instructions.
You simply read my post wrong. Nowhere am I complaining; I'm merely making an observation. It is not an unusual feat for a compiler to generate assembly instructions or machine code. Nor would I call writing a compiler super difficult, but rather straightforward.
>If you are going to emit instructions, it's up to you to write your own optimizer.
Or buy/obtain a compiler that already is capable of doing this step.