Basics of Compiler Design (2000) [pdf] (diku.dk)
207 points by _mtkq on Dec 13, 2019 | 34 comments


sigh

Yet another book which spends more than 1/3 of its pages on parsing.

Syntax doesn't matter (much), semantics matters. You need to know how to efficiently implement various PL features within your compiler: exceptions and algebraic effects, modules and parametric modules, parametric polymorphism and optimizations for it in the presence of modularity, fibers, method dispatch in object-oriented langs, type inference, etc.

These books are about parsing, not about compilers. They spend a lot of time explaining how to parse a simple featureless language, instead of just using a parser generator and focusing on actual programming language design and features.

Appel's Modern Compiler Implementation in ML/Java/C is way better. There is also a great course from the creator of Chez Scheme (I'm sure I've seen the whole course available on the internet, but I can't find it anymore).

https://www.cs.princeton.edu/~appel/modern/

http://composition.al/blog/2017/07/31/my-first-fifteen-compi...


I don't think it matters that a third of the pages are spent on parsing - that only covers 20% of the chapters. It's just that parsing takes somewhat more space to explain because the figures, tables, and examples take up relatively more space than when explaining e.g. type checking. I think it is a superficial knee-jerk reaction to classify this book as mostly being about parsing.

Also, while it is true that production compilers (especially optimising ones) have relatively little effort spent on their parsers compared to everything else, if you consider a bare minimum compiler, parsing is going to be a proportionally much larger part of it - especially if you do not use a prewritten parser generator (which this book assumes you do not, since it's partly about how you would create such a generator).

I definitely agree that too many compiler resources place too much emphasis on parsing, but I don't think this specific book is a particularly egregious case. I have also read Appel's books, and while they are good, I think you need to have read something else first to cover the basics.


> the figures, tables, and examples take up relatively more space than when explaining e.g. type checking.

When you explain instruction selection for instance, there are also lots of figures and tables and examples you could add. This book gets around that by not even mentioning instruction selection. Meh.

The register allocation chapter does look pretty decent; it manages to explain all of liveness analysis, graph coloring, and spilling in 15 pages, including examples, figures, and tables. It looks like the author is able to explain complex stuff succinctly.
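
For readers who haven't run into those terms: a rough sketch of what the liveness-analysis part computes, in Python over a made-up three-block mini-IR (none of this is from the book, it's just to show the shape of the algorithm):

    # Hypothetical mini-IR: each block records the variables it uses and defines,
    # plus its successor labels.
    blocks = {
        "entry": {"use": {"a"},      "def": {"b"}, "succ": ["loop"]},
        "loop":  {"use": {"a", "b"}, "def": {"c"}, "succ": ["loop", "exit"]},
        "exit":  {"use": {"c"},      "def": set(), "succ": []},
    }

    live_in  = {name: set() for name in blocks}
    live_out = {name: set() for name in blocks}

    # Classic backward dataflow, iterated to a fixed point:
    #   out[b] = union of in[s] over successors s;  in[b] = use[b] | (out[b] - def[b])
    changed = True
    while changed:
        changed = False
        for name, b in blocks.items():
            out_ = set().union(*[live_in[s] for s in b["succ"]])
            in_  = b["use"] | (out_ - b["def"])
            if out_ != live_out[name] or in_ != live_in[name]:
                live_out[name], live_in[name] = out_, in_
                changed = True

    print(live_in)  # which variables are live on entry to each block

The live ranges you get out of this are what the graph-coloring and spilling parts then operate on.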


> bare minimum compiler

But who needs books on that? Especially considering that if I wanted to write a simple toy compiler I would rather use a parser generator (handcrafting a parser only makes sense for a serious industrial-grade compiler).

These books are books on parsers, not books on compilers. IMHO, for learning the compiler craft it's much better to use a parser generator (or even Lisp) to skip the syntax part and move on to the things that constitute an actual compiler: runtime interaction, exception mechanisms, code generation, type inference, intermediate languages, effect tracking, etc.


This is a textbook for a university course. The intended audience for this book needs a bare minimum compiler, because that is all they will have time to write.

You can't just say "use a parser generator". Who is supposed to write the parser generator, then? This book is for people who want to learn how to do all that foundational stuff, not an industrial programmer who needs to build a good product quickly.


>You can't just say "use a parser generator"

You can just give students a grammar for a parser generator, and a parse tree/AST. Or use Lisp (Common Lisp or Scheme) and avoid syntax altogether. (A tiny sketch of the hand-them-a-grammar approach is below.)
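
For what it's worth, a minimal sketch of that approach, using Python's lark purely as a stand-in for whichever parser generator a course actually picks (the toy grammar and rule names are invented for illustration):

    from lark import Lark

    # Toy expression grammar; students get this plus the resulting tree,
    # and spend their time on everything after parsing.
    grammar = r"""
        ?expr: expr "+" term   -> add
             | term
        ?term: term "*" factor -> mul
             | factor
        ?factor: NUMBER        -> num
               | "(" expr ")"

        %import common.NUMBER
        %import common.WS
        %ignore WS
    """

    parser = Lark(grammar, start="expr")
    tree = parser.parse("1 + 2 * 3")
    print(tree.pretty())  # the parse tree the rest of the course works from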


Yeah, imagine going through all that, trying to build your first ever compiler, in an 8-week course as a CS freshman... (which is the intended audience)


> Yet another book which spends more than 1/3 of its pages on parsing.

Yup, lexing and parsing are, by far, the easiest part of a compiler. Like about 0.1% of the work.


yes.

I'm trying to build a relational lang (in Python, F#, Swift and now Rust) and have stayed at the AST level all this time.

I only left the lexer to be done by someone who wants to learn Rust.

----

It's incredible how hard it is to build more complex semantics than "1 + 1". I have even gotten some nice answers from the author of Crafting Interpreters, and even he doesn't know some of them!

Things you want to know, and maybe won't (easily) find out how to do:

* How to incorporate ADTs (algebraic data types) and pattern matching in a lang. It took me A LOT of time to figure out that you can encode them with arrays + labels (there are more examples of how to do it now... it's getting more popular with hobbyists). A tiny sketch of the idea appears after this list.

* How to implement type inference

* How to create generators, coroutines and the like, especially if your host language doesn't have them

* ok, add a debugger

* ok, add support for an IDE or at least a code editor

* How to design an FFI system?

* How to make a lang without GC and have it work fine?

* ... and with a GC, not the one in your host?

* Continuation passing style that performs well in a lang without call/cc

* How to test the lang, correctly! (I don't even have an idea for this yet)

* You read tons of papers with amazing ideas that even claim to have worked. Then you find you can't read most of the "math stuff" inside, and there's no code to look at. Then the PL guys get mad because we ignore them

And many many many more.
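
On the arrays-plus-labels encoding of ADTs mentioned above, here's a minimal Python sketch of the idea (the Shape type and all names are invented for illustration): every constructor becomes a small integer tag plus a flat payload, and pattern matching is just a dispatch on the tag.

    # A value of the ADT  Shape = Circle(r) | Rect(w, h)  is just [tag, fields...].
    CIRCLE, RECT = 0, 1

    def circle(r):  return [CIRCLE, r]
    def rect(w, h): return [RECT, w, h]

    def area(shape):
        # "Pattern matching" = dispatch on the tag + positional destructuring.
        tag = shape[0]
        if tag == CIRCLE:
            (r,) = shape[1:]
            return 3.14159 * r * r
        elif tag == RECT:
            (w, h) = shape[1:]
            return w * h
        raise ValueError("non-exhaustive match")

    print(area(circle(2)), area(rect(3, 4)))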


> You read tons of paper with amazing ideas, that even claim to have worked.

They often do work, on a toy compiler that has a carefully selected subset of features expected in a professional compiler. Things like "let's not allow casting! or exceptions! or inline assembler! or interfacing to C! or integer wraparound!"


I know you have decades of experience with compilers, but are you sure 0.1% isn't a bit low? I can imagine 0.1% being true if you have experience writing parsers, just make a shitty parser for a batch compiler, and spend a lot of time on the semantics of the language.

You can also emit some shitty stack machine bytecode with a single AST pass, and write a simple interpreter for that bytecode, and the effort would be about the same as writing the simple parser by hand.
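
To make that concrete, here's a toy Python sketch of the single-pass idea, with an invented three-opcode bytecode (not anyone's real compiler): one post-order walk over the AST emits stack-machine code, and a few lines interpret it.

    # Toy AST: ("num", 3) or ("add"/"mul", left, right).
    def compile_expr(node, code):
        # A single post-order walk emits stack-machine bytecode.
        if node[0] == "num":
            code.append(("PUSH", node[1]))
        else:
            compile_expr(node[1], code)
            compile_expr(node[2], code)
            code.append(("ADD",) if node[0] == "add" else ("MUL",))
        return code

    def run(code):
        stack = []
        for op, *args in code:
            if op == "PUSH":
                stack.append(args[0])
            elif op == "ADD":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "MUL":
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
        return stack.pop()

    ast = ("add", ("num", 1), ("mul", ("num", 2), ("num", 3)))
    print(run(compile_expr(ast, [])))  # 7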

And you can spend a lot of time making a parser actually robust: threading file positions through the pipeline, handling errors nicely, allowing for efficient incremental parsing, and so on. I could probably spend man-months and man-years on such a parser.


It’s probably true.

You write the parser once and then eventually forget about it except if there is a language change. Then you spend a small amount of time adding the syntax for that change to the parser and again forget about it.

On the other hand, _lots_ of folks work on the optimization passes, instruction selection, register allocation and the core of the IR. Since instruction selection and register allocation usually have CPU-specific parts, they carry logic whose complexity (in terms of code size) is proportional to the number of CPUs targeted.

So 0.1% is probably pretty close to the true number.


Conversely, with LLVM around (and the JVM, sort of), people now get to focus on building the frontend of the compiler and forget about all those optimisation passes.


Only suckers forget about optimization passes. ;-)

Don't forget that:

- Optimization work on LLVM and other compilers is still happening in anger.

- Frontend is a relative term. For example, the "frontend" to LLVM when WebKit used LLVM as a "backend" was a full-blown compiler with >100K lines of code. Since then that "frontend" grew its own backend and dropped LLVM, but the point is, just because you're a frontend doesn't mean you don't have optimizations. Frontend doesn't just mean parser.

- Rust's or Haskell's "frontends" to LLVM are seriously impressive compilers and the complexity is not to do with parsing AFAICT but all the other stuff (type checking comes to mind but there's also really impressive lowering and optimization).

- Backends aren't just complex because of optimization passes. The register allocator and instruction selector aren't so much optimization passes as necessary transformations for turning a compiler's IR into machine code. Those things just grow and grow.

- The whispers in the wind that I'm hearing from compiler folks looking to make bank is that there's shitloads of work to be done porting compilers to new targets. It's much more work to port a compiler to a new target than it is to write a parser. That said, writing a frontend for a language as complex as Rust/Haskell/etc is probably on the same order of complexity as porting a backend, or maybe even more complex - but that's not because of parsing; it's usually either because of System F type shit (if you have a statically typed language) or dynamic type inference (if you have a dynamically typed language).


>now get to focus on building the frontend of the compiler

Although LLVM is cool, it's just a backend, so besides the frontend you need some runtime, a type checker, type inference, and, if your lang is sufficiently complex, you still need a huge middle-end layer which knows something about the semantics of your language and how to optimize it properly.

So if your language is just a very simple featureless language with procedures and simple data types, like Pascal, you could just write a direct code generator. But when you have a language with GC, exceptions, modules and all that kind of stuff, you need a middle end, and a complex one.


No, not true at all. It depends on the language of course.


> I know you have decades of experience with compilers, but are you sure 0.1% isn't a bit low?

I've been working professionally on my compiler since 2013. That's about 2500 days.

0.1% of that is 2.5 days.

Yes, that's about how much of my time I've spent working on parsing issues.

Parsing, for compilers, is pretty much a solved problem. We know exactly how to do it. Parsing for editors and things might be more interesting still.


> but are you sure 0.1% isn't a bit low?

It's probably way too high. The reason is parsing is a small, simple, very well defined problem, and only rarely changes. This is quite unlike anything else in the compiler.

Unless, of course, you're talking about the C preprocessor. Prepare to spend an ungodly amount of time on that.


>You can also emit some shitty stack machine bytecode with a single AST pass

This is what the earliest C compilers did, right?


I once read an awesome paper that built a Lisp compiler as a series of Lisp macros that defined more and more of the language in terms of a smaller and smaller set of primitives. Then it started introducing new primitives that were Lisp-structured representations of assembly language, and rewrote the language in those terms. The final step of the compiler was to remove the parens, and you had assembly language. I can't remember the name of this paper, any thoughts?


Maybe this is the same effect as when Hollywood tries to convert a novel to the screen. Often they'll spend half of their time budget on a decent adaptation of the first couple of chapters, then have to rush through the rest.


Let's say I want to skip the development of a parser and use a preexisting library for that; allowing me to dive into the semantics. Do you suggest any libraries/tools to take care of the parsing?


Instaparse is excellent (https://github.com/engelberg/instaparse) if you are using Clojure


Most code converts an array of items into another array of items (Ex: Sorting converts an arbitrary array into a sorted array).

If you're a more complicated fellow, you'll convert a tree of items into another tree of items. For example, collision detection in video games is often done with B-Trees, Raytracing with BVH trees that represent all the vertices on the screen.

The final step... the end-all be-all of complexity... is the arbitrary graph. That is: code that walks arbitrary graphs and converts them into other arbitrary graphs. After all, most code has cycles, so you can't even make a DAG assumption like in maximum flow.

That's about it. We may call it "dead code elimination", or "common subexpression elimination". But at the end of the day, all you're doing is running graph-analysis, and then converting that graph into a "more efficient" form.
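
To make the graphs-in, graphs-out point concrete, here's a toy Python sketch of dead code elimination as plain reachability over a use/def dependency graph (the instruction format is invented for illustration):

    # Each "instruction" names the values it uses; anything not reachable
    # from the observable outputs (here, the return) is dead.
    instrs = {
        "t1":  {"op": "const", "uses": []},
        "t2":  {"op": "const", "uses": []},
        "t3":  {"op": "add",   "uses": ["t1", "t2"]},
        "t4":  {"op": "mul",   "uses": ["t1", "t1"]},   # never used: dead
        "ret": {"op": "ret",   "uses": ["t3"]},
    }

    # Mark: walk the use edges backward from the roots.
    live, stack = set(), ["ret"]
    while stack:
        n = stack.pop()
        if n not in live:
            live.add(n)
            stack.extend(instrs[n]["uses"])

    # Sweep: keep only the reachable subgraph.
    optimized = {n: i for n, i in instrs.items() if n in live}
    print(sorted(optimized))  # t4 is gone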


Interesting perspective! Even graph analysis could be reduced to manipulating infinite tape in the end.


I was taught from, and have since taught with, this book. I like that it's so concise, where I think e.g. the Dragon Book contains way too much irrelevant information (and also dubious and obsolete implementation advice, like global symbol tables). Of the books I have read, I think this one strikes the best balance between breadth and depth when it comes to the theory of compiler design. It gives you sufficient information about how to handle every part of a compiler, but doesn't necessarily show you a lot of options in each area.

The only weakness is that it takes a mature programmer to go from the high-level descriptions and pseudocode in the book, to a concrete implementation. It helps if you write in a functional language, since the pseudocode is essentially Standard ML. I think this book might be well served by a small companion guide on how to practically implement the designs and algorithms it covers.


The book's chapter 3 does not describe the current state of the art. I find it is useful only as a historic background up to a certain point in time (1969?).

However, from a practical standpoint, if one wants to implement a parser (for the purpose of a compiler), or attempts to use the book to save time deciding which parsing approach does not suffer from algorithmic shortcomings, one is advised to look at more modern algorithms. Notably, anything with LR in its name, and anything called PEG or packrat, can be outright avoided if one values one's time.


???

Then which one? Pratt and top down by hand?


As someone currently building a language, books like this have been critical. Basics of Compiler Design covers a lot of the common ground of compiler construction from a more theoretical standpoint.

Other recent books I've found really helpful include:

Crafting Interpreters by Bob Nystrom (@munificent on HN)

https://www.craftinginterpreters.com

Language Implementation Patterns by Terence Parr (creator of ANTLR)

https://pragprog.com/book/tpdsl/language-implementation-patt...



Folks interested in compiler design: here's a list of resources I put together for building compilers - http://hexopress.com/@joe/blog/2019/04/14/awesome-compiler-r...


Shouldn't the title be 2010 since the edition linked was published in 2010?


What's the go-to book on compilers in this day and age?


Saved! Nice book.



