I thought I was done with this topic for while, happily planning a new blog post on generalised parsing, but here we are again. This post will be a kind of remix of the two postson optimising recursive ascent parsing. Why? Well, I wrote those posts directly based on the papers I refer to in there, and they present their optimisations based on code. At the time that felt quite natural, and I manually applied their ideas and some of my own to code examples. But when I introduced the topic of LR...| Whatever
Welcome back! Previously, on Optimising Recursive Ascent Parsing, we explored the ideas from a 1990 paper called Optimizing Directly Executable LR Parsers by Peter Pfahler. With the paper’s example grammar, and the described optimisations, we managed to optimise away 6 out of 15 states in the parser. But that’s peanuts compared to what we’ll do in this post! We’ll be taking inspiration from another 1990 paper, this one is called Even Faster LR Parsing by Nigel Horspool and Michael Whi...| Whatever
This is post picks up where we left off with parsing: Recursive Ascent. In the previous post I highlighted how parsing is all about grammars and (push-down) automata (PDA). And that if you follow the logic of how LL parsing has recursive descent, then LR parsing should have recursive ascent. Which it does! In this post we’ll explore a couple more techniques for making the recursive ascent parser a bit smaller and faster. We’ll explore the ideas from a 1990 paper called Optimizing Directly...| Whatever
Sorry if I broke the feed in your RSS reader. I’ve switched away from Jekyll to Zola for generating this blog. And I preserved only the blog post links (through HTML based redirects). I didn’t preserve the exact format of the RSS feed file, or really any of the design of the blog as you can see. I just picked a theme from among the ones on Zola’s list on the site, one that looked simple and didn’t contain any JavaScript. Then I forked Zola, because it has no plugin system and I needed...| Whatever
This is part 2 of old-school linear time parsing algorithms, which only need to go over the input once, without backtracking or caching. In part 1 we learnt about LL parsing, how to construct the parse tables for it, and how those relate to direct execution of the parser with recursive descent. Since part 1 and part 2 were originally one blog post that I simply cut in half after feedback that it was too long, this post assumes you’ve read part 1. It’s probably still pretty readable withou...| Whatever
Hello again! I’m picking up my series on Automata, with this post that goes into what I had always meant to get to: parsers. We’ll check out the old-school linear time parsing algorithms, which only need to go over the input once, without backtracking or caching. Originally this was one big post, but given the feedback I’ve gotten from (non-)readers, I’ve now split it up into two. In this first post we’ll check out LL, parse tables, and recursive descent. This post is meant to be re...| Whatever
Once upon a time, I wrote an interpreter for Stratego Core in Rust, which I named strs. Stratego Core is the core language that Stratego is compiled to before the compiler goes further (to Java, or previously to C). A core language is an intermediate representation that is a subset of the surface language. While I optimised that interpreter quite a bit, I noticed that the CTree (Stratego Core Abstract Syntax Tree) that the compiler spit out for me to interpret was very unoptimised. Therefore ...| Whatever
More than a year ago a friend of mine wanted to learn a bit more about Rust by trying out a project. He had a nice project in mind which suits Rust quite well I think. For fun I joined his effort and created an implementation at the same time as he did, discussing and comparing along the way. In this post I’ll tell you about the project specifics, but the point of the post is more an encouragement. If you’ve read about Rust before but haven’t tried it yet, find a small project like the ...| Whatever
At the end of my last post, three months ago by now, I promised a blog post about the Stratego interpreter that I am writing. In fact I promised it soon, which sadly became “soon”. Life happened, deadlines on top of deadlines with major stress. I made it through in one piece though, so here’s the long promised blog post. I assume you’re already a bit familiar with Rust, most of my blog posts use it. I’m using it for side-projects, one of which is writing a performant Stratego interp...| Whatever
I published my first crate to crates.io! It’s called aterm, and it’s a library that implements the Annotated Term format. Currently it can only parse and print the normal textual format, but I’m planning to add the other three formats too at some point. There are also a number of other improvements that I have planned. But I’m going to try to not make this post a brain-dump of meandering thoughts… What are ATerms The Annotated Term format[1] originates from the Centre for Mathematic...| Whatever
In this post I’d like to shortly discuss an idea I’ve had a long time ago about type systems and units of measure. The usual pitch about having units in the type system of a programming language starts with a sad story about some space craft crash because different teams used different measures of distance. The competing systems are usually Imperial vs RebelsMetric. Then units in the type system are introduced, which is a way to check that only numbers of the exact same unit are used in a...| Whatever
I’ve purchased a domain name: jeffsmits.net. This weblog is now on blog.jeffsmits.net. I’ve made sure this blog stays available from the old URLs, but I can’t guarantee that new content will be available there. One of the perks of having a domain name is that you can switch around the back-end that hosts the content. For now the hosting is still GitHub Pages, but that may change at some point in the future (I don’t have concrete plans yet). To stay up to date I do advise RSS subscribe...| Whatever
This is post number four in a series on Automata (in the formal languages / regex / parsing sense). It’s also part two of the “implementation-heavy” stuff, where we go into implementing automata for real and useful things. This one is more of a mix of theory and code, which I hope is more appealing than the previous post which were either one or the other. In part one I naively claimed that this would be two posts of implementation, but I’ve since found more to write about. Plus it al...| Whatever
This is post number three in a series on Automata (in the formal languages / regex / parsing sense). This is the promised “implementation-heavy” post, where we go into implementing automata for real and useful things. As always the programming language is Rust. By now I’ve actually had a bit of practice with the language, so hopefully the code will be less naive. Where in the previous post on Finite Automata we went through examples of direct encodings of specific automata, in this post...| Whatever
TL;DR: I’ve ported the tool cargo-benchcmp from Python to Rust and added some functionality. There is more to come which is mostly waiting for review in pull requests. I’ve been messing around with Rust for a while now, and I found a little utility called cargo-benchcmp by the famous BurntSushi (Andrew Gallant). You may have seen the benchmark comparisons in one of his blogposts already. I found out that the utility was a nice single file Python script. Not a quick and dirty hack, but cla...| Whatever
This is just a short post about my new terminal setup. I think it’s both pretty and useful. For example, it gives me the current time and info about version control when I’m in a directory with vcs: Skip to the end for terse instructions/commands for mac and linux. Terminal First off, you need a good terminal emulator that can handle 256 colours. Try echo $TERM. If it returns xterm-256color you’re good. You can use a script like this to get an overview of the kinds of colours that are i...| Whatever
Welcome back! This is my second post in a series on Automata. I decided to do another theory post first on context-free languages, and only afterwards start on a more implementation-heavy post about implementing this kind of theory in Rust for practically useful stuff. There is of course still code in this post as well :) I’ll start with a quick refresher, but for more details read the first post. Finite Automata refresher Last time, we looked at deterministic and non-deterministic finite a...| Whatever
What do Turing machines and regular expressions have in common? One is a theoretical model of a computer, and can be used to prove that some things cannot be computed. The other is a practical tool for matching strings. And yet they are both based on a simple computational model: a (very constrained) finite state machine (FSM). In this blog post we’ll go over the basics of this type of FSM and instead of going over proofs, we’ll go over examples and little implementations in Rust. For mor...| Whatever
A web log. Mostly about computer science-y stuff.| blog.jeffsmits.net
A web log. Mostly about computer science-y stuff.| blog.jeffsmits.net