More from Blog - Bitfield Consulting
The night is dark and full of errors—and durable Rust software is not only ready for them, but handles them sensibly. Let’s see how, by returning to our line-counter project.
Meditation is easy when you know what to do: absolutely nothing! It's hard at first, like trying to look at the back of your own head, but there's a knack to it.
The secret of being a great coder is to write terrible code. Wait, wait. Hear me out: I’m going somewhere with this.
Thou shalt not suffer a flaky test to live, because it’s annoying, counterproductive, and dangerous: one day it might fail for real, and you won’t notice. Here’s what to do.
Leaving a job is never easy, and it’s a consequential decision. But when it’s time, it’s time. Here’s how to escape the comfort trap, and take the next step in your career.
More in programming
There's this thing in Python that always trips me up. It's not that tricky, once you know what you're looking for, but it's not intuitive for me, so I do forget. It's that shadowing a variable can sometimes give you an UnboundLocalError! It happened to me last week while working on a workflow engine with a coworker. We were refactoring some of the code. I can't share that code (yet?) so let's use a small example that illustrates the same problem. Let's start with some working code, which we had before our refactoring caused a problem. Here's some code that defines a decorator for a function, which will trigger some other functions after it runs. def trigger(*fns): """After the decorated function runs, it will trigger the provided functions to run sequentially. You can provide multiple functions and they run in the provided order. This function *returns* a decorator, which is then applied to the function we want to use to trigger other functions. """ def decorator(fn): """This is the decorator, which takes in a function and returns a new, wrapped, function """ fn._next = fns def _wrapper(): """This is the function we will now invoke when we call the wrapped function. """ fn() for f in fn._next: f() return _wrapper return decorator The outermost function has one job: it creates a closure for the decorator, capturing the passed in functions. Then the decorator itself will create another closure, which captures the original wrapped function. Here's an example of how it would be used[1]. def step_2(): print("step 2") def step_3(): print("step 3") @trigger(step_2, step_3) def step_1(): print("step 1") step_1() This prints out step 1 step 2 step 3 Here's the code of the wrapper after I made a small change (omitting docstrings here for brevity, too). I changed the for loop to name the loop variable fn instead of f, to shadow it and reuse that name. def decorator(fn): fn._next = fns def _wrapper(): fn() for fn in fn._next: fn() And then when we ran it, we got an error! UnboundLocalError: cannot access local variable 'fn' where it is not associated with a value But why? You look at the code and it's defined. Right out there, it is bound. If you print out the locals, trying to chase that down, you'll see that there does not, in fact, exist fn yet. The key lies in Python's scoping rules. Variables are defined for their entire scope, which is a module, class body, or function body. If you define a variable within a scope, anywhere inside a function, then that variable has that name as its own for the entire scope. The docs make this quite clear: If a name binding operation occurs anywhere within a code block, all uses of the name within the block are treated as references to the current block. This can lead to errors when a name is used within a block before it is bound. This rule is subtle. Python lacks declarations and allows name binding operations to occur anywhere within a code block. The local variables of a code block can be determined by scanning the entire text of the block for name binding operations. See the FAQ entry on UnboundLocalError for examples. This comes up in a few other places, too. You can use a loop variable anywhere inside the enclosing scope, for example. def my_func(): for x in [1,2,3]: print(x) # this will still work! # x is still defined! print(x) So once I saw an UnboundLocalError after I'd shadowed it, I knew what was going on. The name was used by the local for the entire function, not just after it was initialized! I'm used to shadowing being the idiomatic thing in Rust, then had to recalibrate for writing Python again. It made sense once I remembered what was going on, but I think it's one of Python's little rough edges. This is not how you'd want to do it in production usage, probably. It's a somewhat contrived example for this blog post. ↩
Picasso got it right: Great artists steal. Even if he didn’t actually say it, and we all just repeat the quote because Steve Jobs used it. Because it strikes at the heart of creativity: None of it happens in a vacuum. Everything is inspired by something. The best ideas, angles, techniques, and tones are stolen to build everything that comes after the original. Furthermore, the way to learn originality is to set it aside while you learn to perfect a copy. You learn to draw by imitating the masters. I learned photography by attempting to recreate great compositions. I learned to program by aping the Ruby standard library. Stealing good ideas isn’t a detour on the way to becoming a master — it’s the straight route. And it’s nothing to be ashamed of. This, by the way, doesn’t just apply to art but to the economy as well. Japan became an economic superpower in the 80s by first poorly copying Western electronics in the decades prior. China is now following exactly the same playbook to even greater effect. You start with a cheap copy, then you learn how to make a good copy, and then you don’t need to copy at all. AI has sped through the phase of cheap copies. It’s now firmly established in the realm of good copies. You’re a fool if you don’t believe originality is a likely next step. In all likelihood, it’s a matter of when, not if. (And we already have plenty of early indications that it’s actually already here, on the edges.) Now, whether that’s good is a different question. Whether we want AI to become truly creative is a fair question — albeit a theoretical or, at best, moral one. Because it’s going to happen if it can happen, and it almost certainly can (or even has). Ironically, I think the peanut gallery disparaging recent advances — like the Ghibli fever — over minor details in the copying effort will only accelerate the quest toward true creativity. AI builders, like the Japanese and Chinese economies before them, eager to demonstrate an ability to exceed. All that is to say that AI is in the "Good Copy" phase of its creative evolution. Expect "The Great Artist" to emerge at any moment.
I have added syntax highlighting to my blog using tree-sitter. Here are some notes about what I learned, with some complaining. static site generator markdown ingestion highlighting incompatible?! highlight names class names styling code results future work frontmatter templates feed style highlight quality static site generator I moved my blog to my own web site a few years ago. It is produced using a scruffy Rust program that converts a bunch of Markdown files to HTML using pulldown-cmark, and produces complete pages from Handlebars templates. Why did I write another static site generator? Well, partly as an exercise when learning Rust. Partly, since I wrote my own page templates, I’m not going to benefit from a library of existing templates. On the contrary, it’s harder to create new templates that work with a general-purpose SSG than write my own simpler site-specific SSG. It’s miserable to write programs in template languages. My SSG can keep the logic in the templates to a minimum, and do all the fiddly stuff in Rust. (Which is not very fiddly, because my site doesn’t have complicated navigation – compared to the multilevel menus on www.dns.cam.ac.uk for instance.) markdown ingestion There are a few things to do to each Markdown file: split off and deserialize the YAML frontmatter find the <cut> or <toc> marker that indicates the end of the teaser / where the table of contents should be inserted augment headings with self-linking anchors (which are also used by the ToC) Before this work I was using regexes to do all these jobs, because that allowed me to treat pulldown-cmark as a black box: Markdown in, HTML out. But for syntax highlighting I had to be able to find fenced code blocks. It was time to put some code into the pipeline between pulldown-cmark’s parser and renderer. And if I’m using a proper parser I can get rid of a few regexes: after some hacking, now only the YAML frontmatter is handled with a regex. Sub-heading linkification and ToC construction are fiddly and more complicated than they were before. But they are also less buggy: markup in headings actually works now! Compared to the ToC, it’s fairly simple to detect code blocks and pass them through a highlighter. You can look at my Markdown munger here. (I am not very happy with the way it uses state, but it works.) highlighting As well as the tree-sitter-highlight documentation I used femark as an example implementation. I encountered a few problems. incompatible?! I could not get the latest tree-sitter-highlight to work as described in its documentation. I thought the current tree-sitter crates were incompatible with each other! For a while I downgraded to an earlier version, but eventually I solved the problem. Where the docs say, let javascript_language = tree_sitter_javascript::language(); They should say: let javascript_language = tree_sitter::Language::new( tree_sitter_javascript::LANGUAGE ); highlight names I was offended that tree-sitter-highlight seems to expect me to hardcode a list of highlight names, without explaining where they come from or what they mean. I was doubly offended that there’s an array of STANDARD_CAPTURE_NAMES but it isn’t exported, and doesn’t match the list in the docs. You mean I have to copy and paste it? Which one?! There’s some discussion of highlight names in the tree-sitter manual’s “syntax highlighting” chapter, but that is aimed at people who are writing a tree-sitter grammar, not people who are using one. Eventually I worked out that tree_sitter_javascript::HIGHLIGHT_QUERY in the tree-sitter-highlight example corresponds to the contents of a highlights.scm file. Each @name in highlights.scm is a highlight name that I might be interested in. In principle I guess different tree-sitter grammars should use similar highlight names in their highlights.scm files? (Only to a limited extent, it turns out.) I decided the obviously correct list of highlight names is the list of every name defined in the HIGHLIGHT_QUERY. The query is just a string so I can throw a regex at it and build an array of the matches. This should make the highlighter produce <span> wrappers for as many tokens as possible in my code, which might be more than necessary but I don’t have to style them all. class names The tree-sitter-highlight crate comes with a lightly-documented HtmlRenderer, which does much of the job fairly straightforwardly. The fun part is the attribute_callback. When the HtmlRenderer is wrapping a token, it emits the start of a <span then expects the callback to append whatever HTML attributes it thinks might be appropriate. Uh, I guess I want a class="..." here? Well, the highlight names work a little bit like class names: they have dot-separated parts which tree-sitter-highlight can match more or less specifically. (However I am telling it to match all of them.) So I decided to turn each dot-separated highlight name into a space-separated class attribute. The nice thing about this is that my Rust code doesn’t need to know anything about a language’s tree-sitter grammar or its highlight query. The grammar’s highlight names become CSS class names automatically. styling code Now I can write some simple CSS to add some colours to my code. I can make type names green, code span.hilite.type { color: #aca; } If I decide builtin types should be cyan like keywords I can write, code span.hilite.type.builtin, code span.hilite.keyword { color: #9cc; } results You can look at my tree-sitter-highlight wrapper here. Getting it to work required a bit more creativity than I would have preferred, but it turned out OK. I can add support for a new language by adding a crate to Cargo.toml and a couple of lines to hilite.rs – and maybe some CSS if I have not yet covered its highlight names. (Like I just did to highlight the CSS above!) future work While writing this blog post I found myself complaining about things that I really ought to fix instead. frontmatter I might simplify the per-page source format knob so that I can use pulldown-cmark’s support for YAML frontmatter instead of a separate regex pass. This change will be easier if I can treat the html pages as Markdown without mangling them too much (is Markdown even supposed to be idempotent?). More tricky are a couple of special case pages whose source is Handlebars instead of Markdown. templates I’m not entirely happy with Handlebars. It’s a more powerful language than I need – I chose Handlebars instead of Mustache because Handlebars works neatly with serde. But it has a dynamic type system that makes the templates more error-prone than I would like. Perhaps I can find a more static Rust template system that takes advantage of the close coupling between my templates and the data structure that describes the web site. However, I like my templates to be primarily HTML with a sprinkling of insertions, not something weird that’s neither HTML nor Rust. feed style There’s no CSS in my Atom feed, so code blocks there will remain unstyled. I don’t know if feed readers accept <style> tags or if it has to be inline styles. (That would make a mess of my neat setup!) highlight quality I’m not entirely satisfied with the level of detail and consistency provided by the tree-sitter language grammars and highlight queries. For instance, in the CSS above the class names and property names have the same colour because the CSS highlights.scm gives them the same highlight name. The C grammar is good at identifying variables, but the Rust grammar is not. Oh well, I guess it’s good enough for now. At least it doesn’t involve Javascript.
Simplify complex decisions by separating upsides from downsides, investing in upsides, vetoing with downsides, and using an appropriate decision framework.