More from wingolog
Earlier this weekGuileWhippet But now I do! Today’s note is about how we can support untagged allocations of a few different kinds in Whippet’s .mostly-marking collector Why bother supporting untagged allocations at all? Well, if I had my way, I wouldn’t; I would just slog through Guile and fix all uses to be tagged. There are only a finite number of use sites and I could get to them all in a month or so. The problem comes for uses of from outside itself, in C extensions and embedding programs. These users are loathe to adapt to any kind of change, and garbage-collection-related changes are the worst. So, somehow, we need to support these users if we are not to break the Guile community.scm_gc_malloclibguile The problem with , though, is that it is missing an expression of intent, notably as regards tagging. You can use it to allocate an object that has a tag and thus can be traced precisely, or you can use it to allocate, well, anything else. I think we will have to add an API for the tagged case and assume that anything that goes through is requesting an untagged, conservatively-scanned block of memory. Similarly for : you could be allocating a tagged object that happens to not contain pointers, or you could be allocating an untagged array of whatever. A new API is needed there too for pointerless untagged allocations.scm_gc_mallocscm_gc_mallocscm_gc_malloc_pointerless Recall that the mostly-marking collector can be built in a number of different ways: it can support conservative and/or precise roots, it can trace the heap precisely or conservatively, it can be generational or not, and the collector can use multiple threads during pauses or not. Consider a basic configuration with precise roots. You can make tagged pointerless allocations just fine: the trace function for that tag is just trivial. You would like to extend the collector with the ability to make pointerless allocations, for raw data. How to do this?untagged Consider first that when the collector goes to trace an object, it can’t use bits inside the object to discriminate between the tagged and untagged cases. Fortunately though . Of those 8 bits, 3 are used for the mark (five different states, allowing for future concurrent tracing), two for the , one to indicate whether the object is pinned or not, and one to indicate the end of the object, so that we can determine object bounds just by scanning the metadata byte array. That leaves 1 bit, and we can use it to indicate untagged pointerless allocations. Hooray!the main space of the mostly-marking collector has one metadata byte for each 16 bytes of payloadprecise field-logging write barrier However there is a wrinkle: when Whippet decides the it should evacuate an object, it tracks the evacuation state in the object itself; the embedder has to provide an implementation of a , allowing the collector to detect whether an object is forwarded or not, to claim an object for forwarding, to commit a forwarding pointer, and so on. We can’t do that for raw data, because all bit states belong to the object, not the collector or the embedder. So, we have to set the “pinned” bit on the object, indicating that these objects can’t move.little state machine We could in theory manage the forwarding state in the metadata byte, but we don’t have the bits to do that currently; maybe some day. For now, untagged pointerless allocations are pinned. You might also want to support untagged allocations that contain pointers to other GC-managed objects. In this case you would want these untagged allocations to be scanned conservatively. We can do this, but if we do, it will pin all objects. Thing is, conservative stack roots is a kind of a sweet spot in language run-time design. You get to avoid constraining your compiler, you avoid a class of bugs related to rooting, but you can still support compaction of the heap. How is this, you ask? Well, consider that you can move any object for which we can precisely enumerate the incoming references. This is trivially the case for precise roots and precise tracing. For conservative roots, we don’t know whether a given edge is really an object reference or not, so we have to conservatively avoid moving those objects. But once you are done tracing conservative edges, any live object that hasn’t yet been traced is fair game for evacuation, because none of its predecessors have yet been visited. But once you add conservatively-traced objects back into the mix, you don’t know when you are done tracing conservative edges; you could always discover another conservatively-traced object later in the trace, so you have to pin everything. The good news, though, is that we have gained an easier migration path. I can now shove Whippet into Guile and get it running even before I have removed untagged allocations. Once I have done so, I will be able to allow for compaction / evacuation; things only get better from here. Also as a side benefit, the mostly-marking collector’s heap-conservative configurations are now faster, because we have metadata attached to objects which allows tracing to skip known-pointerless objects. This regains an optimization that BDW has long had via its , used in Guile since time out of mind.GC_malloc_atomic With support for untagged allocations, I think I am finally ready to start getting Whippet into Guile itself. Happy hacking, and see you on the other side! inside and outside on intent on data on slop fin
Salutations, populations. Today’s note is more of a work-in-progress than usual; I have been finally starting to look at getting into , and there are some open questions.WhippetGuile I started by taking a look at how Guile uses the ‘s API, to make sure I had all my bases covered for an eventual switch to something that was not BDW. I think I have a good overview now, and have divided the parts of BDW-GC used by Guile into seven categories.Boehm-Demers-Weiser collector Firstly there are the ways in which Guile’s run-time and compiler depend on BDW-GC’s behavior, without actually using BDW-GC’s API. By this I mean principally that we assume that any reference to a GC-managed object from any thread’s stack will keep that object alive. The same goes for references originating in global variables, or static data segments more generally. Additionally, we rely on GC objects not to move: references to GC-managed objects in registers or stacks are valid across a GC boundary, even if those references are outside the GC-traced graph: all objects are pinned. Some of these “uses” are internal to Guile’s implementation itself, and thus amenable to being changed, albeit with some effort. However some escape into the wild via Guile’s API, or, as in this case, as implicit behaviors; these are hard to change or evolve, which is why I am putting my hopes on Whippet’s , which allows for conservative roots.mostly-marking collector Then there are the uses of BDW-GC’s API, not to accomplish a task, but to protect the mutator from the collector: , explicitly enabling or disabling GC, calls to that take BDW-GC’s use of POSIX signals into account, and so on. BDW-GC can stop any thread at any time, between any two instructions; for most users is anodyne, but if ever you use weak references, things start to get really gnarly.GC_call_with_alloc_locksigmask Of course a new collector would have its own constraints, but switching to cooperative instead of pre-emptive safepoints would be a welcome relief from this mess. On the other hand, we will require client code to explicitly mark their threads as inactive during calls in more cases, to ensure that all threads can promptly reach safepoints at all times. Swings and roundabouts? Did you know that the Boehm collector allows for precise tracing? It does! It’s slow and truly gnarly, but when you need precision, precise tracing nice to have. (This is the interface.) Guile uses it to mark Scheme stacks, allowing it to avoid treating unboxed locals as roots. When it loads compiled files, Guile also adds some sliced of the mapped files to the root set. These interfaces will need to change a bit in a switch to Whippet but are ultimately internal, so that’s fine.GC_new_kind What is not fine is that Guile allows C users to hook into precise tracing, notably via . This is not only the wrong interface, not allowing for copying collection, but these functions are just truly gnarly. I don’t know know what to do with them yet; are our external users ready to forgo this interface entirely? We have been working on them over time, but I am not sure.scm_smob_set_mark Weak references, weak maps of various kinds: the implementation of these in terms of BDW’s API is incredibly gnarly and ultimately unsatisfying. We will be able to replace all of these with ephemerons and tables of ephemerons, which are natively supported by Whippet. The same goes with finalizers. The same goes for constructs built on top of finalizers, such as ; we’ll get to reimplement these on top of nice Whippet-supplied primitives. Whippet allows for resuscitation of finalized objects, so all is good here.guardians There is a long list of miscellanea: the interfaces to explicitly trigger GC, to get statistics, to control the number of marker threads, to initialize the GC; these will change, but all uses are internal, making it not a terribly big deal. I should mention one API concern, which is that BDW’s state is all implicit. For example, when you go to allocate, you don’t pass the API a handle which you have obtained for your thread, and which might hold some thread-local freelists; BDW will instead load thread-local variables in its API. That’s not as efficient as it could be and Whippet goes the explicit route, so there is some additional plumbing to do. Finally I should mention the true miscellaneous BDW-GC function: . Guile exposes it via an API, . It was already vestigial and we should just remove it, as it has no sensible semantics or implementation.GC_freescm_gc_free That brings me to what I wanted to write about today, but am going to have to finish tomorrow: the actual allocation routines. BDW-GC provides two, essentially: and . The difference is that “atomic” allocations don’t refer to other GC-managed objects, and as such are well-suited to raw data. Otherwise you can think of atomic allocations as a pure optimization, given that BDW-GC mostly traces conservatively anyway.GC_mallocGC_malloc_atomic From the perspective of a user of BDW-GC looking to switch away, there are two broad categories of allocations, tagged and untagged. Tagged objects have attached metadata bits allowing their type to be inspected by the user later on. This is the happy path! We’ll be able to write a function that takes any object, does a switch on, say, some bits in the first word, dispatching to type-specific tracing code. As long as the object is sufficiently initialized by the time the next safepoint comes around, we’re good, and given cooperative safepoints, the compiler should be able to ensure this invariant.gc_trace_object Then there are untagged allocations. Generally speaking, these are of two kinds: temporary and auxiliary. An example of a temporary allocation would be growable storage used by a C run-time routine, perhaps as an unbounded-sized alternative to . Guile uses these a fair amount, as they compose well with non-local control flow as occurring for example in exception handling.alloca An auxiliary allocation on the other hand might be a data structure only referred to by the internals of a tagged object, but which itself never escapes to Scheme, so you never need to inquire about its type; it’s convenient to have the lifetimes of these values managed by the GC, and when desired to have the GC automatically trace their contents. Some of these should just be folded into the allocations of the tagged objects themselves, to avoid pointer-chasing. Others are harder to change, notably for mutable objects. And the trouble is that for external users of , I fear that we won’t be able to migrate them over, as we don’t know whether they are making tagged mallocs or not.scm_gc_malloc One conventional way to handle untagged allocations is to manage to fit your data into other tagged data structures; V8 does this in many places with instances of FixedArray, for example, and Guile should do more of this. Otherwise, you make new tagged data types. In either case, all auxiliary data should be tagged. I think there may be an alternative, which would be just to support the equivalent of untagged and ; but for that, I am out of time today, so type at y’all tomorrow. Happy hacking!GC_mallocGC_malloc_atomic inventory what is to be done? implicit uses defensive uses precise tracing reachability misc allocation
Hey all, quick post today to mention that I added tracing support to the . If the support library for is available when Whippet is compiled, Whippet embedders can visualize the GC process. Like this!Whippet GC libraryLTTng Click above for a full-scale screenshot of the trace explorer processing the with the on a 2.5x heap. Of course no image will have all the information; the nice thing about trace visualizers like is that you can zoom in to sub-microsecond spans to see exactly what is happening, have nice mouseovers and clicky-clickies. Fun times!Perfetto microbenchmarknboyerparallel copying collector Adding tracepoints to a library is not too hard in the end. You need to , which has a file. You need to . Then you have a that includes the header, to generate the code needed to emit tracepoints.pull in the librarylttng-ustdeclare your tracepoints in one of your header filesminimal C filepkg-config Annoyingly, this header file you write needs to be in one of the directories; it can’t be just in the the source directory, because includes it seven times (!!) using (!!!) and because the LTTng file header that does all the computed including isn’t in your directory, GCC won’t find it. It’s pretty ugly. Ugliest part, I would say. But, grit your teeth, because it’s worth it.-Ilttngcomputed includes Finally you pepper your source with tracepoints, which probably you so that you don’t have to require LTTng, and so you can switch to other tracepoint libraries, and so on.wrap in some macro I wrote up a little . It’s not as easy as , which I think is an error. Another ugly point. Buck up, though, you are so close to graphs!guide for Whippet users about how to actually get tracesperf record By which I mean, so close to having to write a Python script to make graphs! Because LTTng writes its logs in so-called Common Trace Format, which as you might guess is not very common. I have a colleague who swears by it, that for him it is the lowest-overhead system, and indeed in my case it has no measurable overhead when trace data is not being collected, but his group uses custom scripts to convert the CTF data that he collects to... (?!?!?!!).GTKWave In my case I wanted to use Perfetto’s UI, so I found a to convert from CTF to the . But, it uses an old version of Babeltrace that wasn’t available on my system, so I had to write a (!!?!?!?!!), probably the most Python I have written in the last 20 years.scriptJSON-based tracing format that Chrome profiling used to usenew script Yes. God I love blinkenlights. As long as it’s low-maintenance going forward, I am satisfied with the tradeoffs. Even the fact that I had to write a script to process the logs isn’t so bad, because it let me get nice nested events, which most stock tracing tools don’t allow you to do. I fixed a small performance bug because of it – a . A win, and one that never would have shown up on a sampling profiler too. I suspect that as I add more tracepoints, more bugs will be found and fixed.worker thread was spinning waiting for a pool to terminate instead of helping out I think the only thing that would be better is if tracepoints were a part of Linux system ABIs – that there would be header files to emit tracepoint metadata in all binaries, that you wouldn’t have to link to any library, and the actual tracing tools would be intermediated by that ABI in such a way that you wouldn’t depend on those tools at build-time or distribution-time. But until then, I will take what I can get. Happy tracing! on adding tracepoints using the thing is it worth it? fin
Usually in this space I like to share interesting things that I find out; you might call it a research-epistle-publish loop. Today, though, I come not with answers, but with questions, or rather one question, but with fractal surface area: what is the value proposition of generational garbage collection? The conventional wisdom is encapsulated in a 2004 Blackburn, Cheng, and McKinley paper, , which compares whole-heap mark-sweep and copying collectors to their generational counterparts, using the Jikes RVM as a test harness. (It also examines a generational reference-counting collector, which is an interesting predecessor to the 2022 work by Zhao, Blackburn, and McKinley.)“Myths and Realities: The Performance Impact of Garbage Collection”LXR The paper finds that generational collectors spend less time than their whole-heap counterparts for a given task. This is mainly due to less time spent collecting, because generational collectors avoid tracing/copying work for older objects that mostly stay in the same place in the live object graph. The paper also notes an improvement for mutator time under generational GC, but only for the generational mark-sweep collector, which it attributes to the locality and allocation speed benefit of bump-pointer allocation in the nursery. However for copying collectors, generational GC tends to slow down the mutator, probably because of the write barrier, but in the end lower collector times still led to lower total times. So, I expected generational collectors to always exhibit lower wall-clock times than whole-heap collectors. In , I have a garbage collector with an abstract API that specializes at compile-time to the mutator’s object and root-set representation and to the collector’s allocator, write barrier, and other interfaces. I embed it in , a simple Scheme-to-C compiler that can run some small multi-threaded benchmarks, for example the classic Gabriel benchmarks. We can then test those benchmarks against different collectors, mutator (thread) counts, and heap sizes. I expect that the generational parallel copying collector takes less time than the whole-heap parallel copying collector.whippetwhiffle So, I ran some benchmarks. Take the splay-tree benchmark, derived from Octane’s . I have a port to Scheme, and the results are... not good!splay.js In this graph the “pcc” series is the whole-heap copying collector, and “generational-pcc” is the generational counterpart, with a nursery sized such that after each collection, its size is 2 MB times the number of active mutator threads in the last collector. So, for this test with eight threads, on my 8-core Ryzen 7 7840U laptop, the nursery is 16MB including the copy reserve, which happens to be the same size as the L3 on this CPU. New objects are kept in the nursery one cycle before being promoted to the old generation. There are also results for , which use an Immix-derived algorithm that allows for bump-pointer allocation but which doesn’t require a copy reserve. There, the generational collectors use a , which has very different performance characteristics as promotion is in-place, and the nursery is as large as the available heap size.“mmc” and “generational-mmc” collectorssticky mark-bit algorithm The salient point is that at all heap sizes, and for these two very different configurations (mmc and pcc), generational collection takes more time than whole-heap collection. It’s not just the splay benchmark either; I see the same thing for the very different . What is the deal?nboyer benchmark I am honestly quite perplexed by this state of affairs. I wish I had a narrative to tie this together, but in lieu of that, voici some propositions and observations. Sometimes people say that the reason generational collection is good is because you get bump-pointer allocation, which has better locality and allocation speed. This is misattribution: it’s bump-pointer allocators that have these benefits. You can have them in whole-heap copying collectors, or you can have them in whole-heap mark-compact or immix collectors that bump-pointer allocate into the holes. Or, true, you can have them in generational collectors with a copying nursery but a freelist-based mark-sweep allocator. But also you can have generational collectors without bump-pointer allocation, for free-list sticky-mark-bit collectors. To simplify this panorama to “generational collectors have good allocators” is incorrect. It’s true, generational GC does lower median pause times: But because a major collection is usually slightly more work under generational GC than in a whole-heap system, because of e.g. the need to reset remembered sets, the maximum pauses are just as big and even a little bigger: I am not even sure that it is meaningful to compare median pause times between generational and non-generational collectors, given that the former perform possibly orders of magnitude more collections than the latter. Doing fewer whole-heap traces is good, though, and in the ideal case, the less frequent major traces under generational collectors allows time for concurrent tracing, which is the true mitigation for long pause times. Could it be that the test harness I am using is in some way unrepresentative? I don’t have more than one test harness for Whippet yet. I will start work on a second Whippet embedder within the next few weeks, so perhaps we will have an answer there. Still, there is ample time spent in GC pauses in these benchmarks, so surely as a GC workload Whiffle has some utility. One reasons that Whiffle might be unrepresentative is that it is an ahead-of-time compiler, whereas nursery addresses are assigned at run-time. Whippet exposes the necessary information to allow a just-in-time compiler to specialize write barriers, for example the inline check that the field being mutated is not in the nursery, and an AOT compiler can’t encode this as an immediate. But it seems a small detail. Also, Whiffle doesn’t do much compiler-side work to elide write barriers. Could the cost of write barriers be over-represented in Whiffle, relative to a production language run-time? Relatedly, Whiffle is just a baseline compiler. It does some partial evaluation but no CFG-level optimization, no contification, no nice closure conversion, no specialization, and so on: is it not representative because it is not an optimizing compiler? How big should the nursery be? I have no idea. As a thought experiment, consider the case of a 1 kilobyte nursery. It is probably too small to allow the time for objects to die young, so the survival rate at each minor collection would be high. Above a certain survival rate, generational GC is probably a lose, because your program violates the weak generational hypothesis: it introduces a needless copy for all survivors, and a synchronization for each minor GC. On the other hand, a 1 GB nursery is probably not great either. It is plenty large enough to allow objects to die young, but the number of survivor objects in a space that large is such that pause times would not be very low, which is one of the things you would like in generational GC. Also, you lose out on locality: a significant fraction of the objects you traverse are probably out of cache and might even incur TLB misses. So there is probably a happy medium somewhere. My instinct is that for a copying nursery, you want to make it about as big as L3 cache, which on my 8-core laptop is 16 megabytes. Systems are different sizes though; in Whippet my current heuristic is to reserve 2 MB of nursery per core that was active in the previous cycle, so if only 4 threads are allocating, you would have a 8 MB nursery. Is this good? I don’t know. I don’t have a very large set of benchmarks that run on Whiffle, and they might not be representative. I mean, they are microbenchmarks. One question I had was about heap sizes. If a benchmark’s maximum heap size fits in L3, which is the case for some of them, then probably generational GC is a wash, because whole-heap collection stays in cache. When I am looking at benchmarks that evaluate generational GC, I make sure to choose those that exceed L3 size by a good factor, for example the 8-mutator splay benchmark in which minimum heap size peaks at 300 MB, or the 8-mutator nboyer-5 which peaks at 1.6 GB. But then, should nursery size scale with total heap size? I don’t know! Incidentally, the way that I scale these benchmarks to multiple mutators is a bit odd: they are serial benchmarks, and I just run some number of threads at a time, and scale the heap size accordingly, assuming that the minimum size when there are 4 threads is four times the minimum size when there is just one thread. However, , in the sense that there is no heap size under which they fail and above which they succeed; I quote:multithreaded programs are unreliable A generational collector partitions objects into old and new sets, and a minor collection starts by visiting all old-to-new edges, called the “remembered set”. As the program runs, mutations to old objects might introduce new old-to-new edges. To maintain the remembered set in a generational collector, the mutator invokes : little bits of code that run when you mutate a field in an object. This is overhead relative to non-generational configurations, where the mutator doesn’t have to invoke collector code when it sets fields.write barriers So, could it be that Whippet’s write barriers or remembered set are somehow so inefficient that my tests are unrepresentative of the state of the art? I used to use card-marking barriers, but I started to suspect they cause too much overhead during minor GC and introduced too much cache contention. I switched to some months back for Whippet’s Immix-derived space, and we use the same kind of barrier in the generational copying (pcc) collector. I think this is state of the art. I need to see if I can find a configuration that allows me to measure the overhead of these barriers, independently of other components of a generational collector.precise field-logging barriers A few months ago, my only generational collector used the algorithm, which is an unconventional configuration: its nursery is not contiguous, non-moving, and can be as large as the heap. This is part of the reason that I implemented generational support for the parallel copying collector, to have a different and more conventional collector to compare against. But generational collection loses on some of these benchmarks in both places!sticky mark-bit On one benchmark which repeatedly constructs some trees and then verifies them, I was seeing terrible results for generational GC, which I realized were because of cooperative safepoints: generational GC collects more often, so it requires that all threads reach safepoints more often, and the non-allocating verification phase wasn’t emitting any safepoints. I had to change the compiler to emit safepoints at regular intervals (in my case, on function entry), and it sped up the generational collector by a significant amount. This is one instance of a general observation, which is that any work that doesn’t depend on survivor size in a GC pause is more expensive with a generational collector, which runs more collections. Synchronization can be a cost. I had one bug in which tracing ephemerons did work proportional to the size of the whole heap, instead of the nursery; I had to specifically add generational support for the way Whippet deals with ephemerons during a collection to reduce this cost. Looking deeper at the data, I have partial answers for the splay benchmark, and they are annoying :) Splay doesn’t actually allocate all that much garbage. At a 2.5x heap, the stock parallel MMC collector (in-place, sticky mark bit) collects... one time. That’s all. Same for the generational MMC collector, because the first collection is always major. So at 2.5x we would expect the generational collector to be slightly slower. The benchmark is simply not very good – or perhaps the most generous interpretation is that it represents tasks that allocate 40 MB or so of long-lived data and not much garbage on top. Also at 2.5x heap, the whole-heap copying collector runs 9 times, and the generational copying collector does 293 minor collections and... 9 major collections. We are not reducing the number of major GCs. It means either the nursery is too small, so objects aren’t dying young when they could, or the benchmark itself doesn’t conform to the weak generational hypothesis. At a 1.5x heap, the copying collector doesn’t have enough space to run. For MMC, the non-generational variant collects 7 times, and generational MMC times out. Timing out indicates a bug, I think. Annoying! I tend to think that if I get results and there were fewer than, like, 5 major collections for a whole-heap collector, that indicates that the benchmark is probably inapplicable at that heap size, and I should somehow surface these anomalies in my analysis scripts. Doing a similar exercise for nboyer at 2.5x heap with 8 threads (4GB for 1.6GB live data), I see that pcc did 20 major collections, whereas generational pcc lowered that to 8 major collections and 3471 minor collections. Could it be that there are still too many fixed costs associated with synchronizing for global stop-the-world minor collections? I am going to have to add some fine-grained tracing to find out. I just don’t know! I want to believe that generational collection was an out-and-out win, but I haven’t yet been able to prove it is true. I do have some homework to do. I need to find a way to test the overhead of my write barrier – probably using the MMC collector and making it only do major collections. I need to fix generational-mmc for splay and a 1.5x heap. And I need to do some fine-grained performance analysis for minor collections in large heaps. Enough for today. Feedback / reactions very welcome. Thanks for reading and happy hacking! hypothesis test workbench results? “generational collection is good because bump-pointer allocation” “generational collection lowers pause times” is it whiffle? is it something about the nursery size? is it something about the benchmarks? is it the write barrier? is it something about the generational mechanism? is it something about collecting more often? is it something about collection frequency? collecting more often redux conclusion?
More in programming
I have added syntax highlighting to my blog using tree-sitter. Here are some notes about what I learned, with some complaining. static site generator markdown ingestion highlighting incompatible?! highlight names class names styling code results future work frontmatter templates feed style highlight quality static site generator I moved my blog to my own web site a few years ago. It is produced using a scruffy Rust program that converts a bunch of Markdown files to HTML using pulldown-cmark, and produces complete pages from Handlebars templates. Why did I write another static site generator? Well, partly as an exercise when learning Rust. Partly, since I wrote my own page templates, I’m not going to benefit from a library of existing templates. On the contrary, it’s harder to create new templates that work with a general-purpose SSG than write my own simpler site-specific SSG. It’s miserable to write programs in template languages. My SSG can keep the logic in the templates to a minimum, and do all the fiddly stuff in Rust. (Which is not very fiddly, because my site doesn’t have complicated navigation – compared to the multilevel menus on www.dns.cam.ac.uk for instance.) markdown ingestion There are a few things to do to each Markdown file: split off and deserialize the YAML frontmatter find the <cut> or <toc> marker that indicates the end of the teaser / where the table of contents should be inserted augment headings with self-linking anchors (which are also used by the ToC) Before this work I was using regexes to do all these jobs, because that allowed me to treat pulldown-cmark as a black box: Markdown in, HTML out. But for syntax highlighting I had to be able to find fenced code blocks. It was time to put some code into the pipeline between pulldown-cmark’s parser and renderer. And if I’m using a proper parser I can get rid of a few regexes: after some hacking, now only the YAML frontmatter is handled with a regex. Sub-heading linkification and ToC construction are fiddly and more complicated than they were before. But they are also less buggy: markup in headings actually works now! Compared to the ToC, it’s fairly simple to detect code blocks and pass them through a highlighter. You can look at my Markdown munger here. (I am not very happy with the way it uses state, but it works.) highlighting As well as the tree-sitter-highlight documentation I used femark as an example implementation. I encountered a few problems. incompatible?! I could not get the latest tree-sitter-highlight to work as described in its documentation. I thought the current tree-sitter crates were incompatible with each other! For a while I downgraded to an earlier version, but eventually I solved the problem. Where the docs say, let javascript_language = tree_sitter_javascript::language(); They should say: let javascript_language = tree_sitter::Language::new( tree_sitter_javascript::LANGUAGE ); highlight names I was offended that tree-sitter-highlight seems to expect me to hardcode a list of highlight names, without explaining where they come from or what they mean. I was doubly offended that there’s an array of STANDARD_CAPTURE_NAMES but it isn’t exported, and doesn’t match the list in the docs. You mean I have to copy and paste it? Which one?! There’s some discussion of highlight names in the tree-sitter manual’s “syntax highlighting” chapter, but that is aimed at people who are writing a tree-sitter grammar, not people who are using one. Eventually I worked out that tree_sitter_javascript::HIGHLIGHT_QUERY in the tree-sitter-highlight example corresponds to the contents of a highlights.scm file. Each @name in highlights.scm is a highlight name that I might be interested in. In principle I guess different tree-sitter grammars should use similar highlight names in their highlights.scm files? (Only to a limited extent, it turns out.) I decided the obviously correct list of highlight names is the list of every name defined in the HIGHLIGHT_QUERY. The query is just a string so I can throw a regex at it and build an array of the matches. This should make the highlighter produce <span> wrappers for as many tokens as possible in my code, which might be more than necessary but I don’t have to style them all. class names The tree-sitter-highlight crate comes with a lightly-documented HtmlRenderer, which does much of the job fairly straightforwardly. The fun part is the attribute_callback. When the HtmlRenderer is wrapping a token, it emits the start of a <span then expects the callback to append whatever HTML attributes it thinks might be appropriate. Uh, I guess I want a class="..." here? Well, the highlight names work a little bit like class names: they have dot-separated parts which tree-sitter-highlight can match more or less specifically. (However I am telling it to match all of them.) So I decided to turn each dot-separated highlight name into a space-separated class attribute. The nice thing about this is that my Rust code doesn’t need to know anything about a language’s tree-sitter grammar or its highlight query. The grammar’s highlight names become CSS class names automatically. styling code Now I can write some simple CSS to add some colours to my code. I can make type names green, code span.hilite.type { color: #aca; } If I decide builtin types should be cyan like keywords I can write, code span.hilite.type.builtin, code span.hilite.keyword { color: #9cc; } results You can look at my tree-sitter-highlight wrapper here. Getting it to work required a bit more creativity than I would have preferred, but it turned out OK. I can add support for a new language by adding a crate to Cargo.toml and a couple of lines to hilite.rs – and maybe some CSS if I have not yet covered its highlight names. (Like I just did to highlight the CSS above!) future work While writing this blog post I found myself complaining about things that I really ought to fix instead. frontmatter I might simplify the per-page source format knob so that I can use pulldown-cmark’s support for YAML frontmatter instead of a separate regex pass. This change will be easier if I can treat the html pages as Markdown without mangling them too much (is Markdown even supposed to be idempotent?). More tricky are a couple of special case pages whose source is Handlebars instead of Markdown. templates I’m not entirely happy with Handlebars. It’s a more powerful language than I need – I chose Handlebars instead of Mustache because Handlebars works neatly with serde. But it has a dynamic type system that makes the templates more error-prone than I would like. Perhaps I can find a more static Rust template system that takes advantage of the close coupling between my templates and the data structure that describes the web site. However, I like my templates to be primarily HTML with a sprinkling of insertions, not something weird that’s neither HTML nor Rust. feed style There’s no CSS in my Atom feed, so code blocks there will remain unstyled. I don’t know if feed readers accept <style> tags or if it has to be inline styles. (That would make a mess of my neat setup!) highlight quality I’m not entirely satisfied with the level of detail and consistency provided by the tree-sitter language grammars and highlight queries. For instance, in the CSS above the class names and property names have the same colour because the CSS highlights.scm gives them the same highlight name. The C grammar is good at identifying variables, but the Rust grammar is not. Oh well, I guess it’s good enough for now. At least it doesn’t involve Javascript.
Simplify complex decisions by separating upsides from downsides, investing in upsides, vetoing with downsides, and using an appropriate decision framework.
I've been running Linux, Neovim, and Framework for a year now, but it easily feels like a decade or more. That's the funny thing about habits: They can be so hard to break, but once you do, they're also easily forgotten. That's how it feels having left the Apple realm after two decades inside the walled garden. It was hard for the first couple of weeks, but since then, it’s rarely crossed my mind. Humans are rigid in the short term, but flexible in the long term. Blessed are the few who can retain the grit to push through that early mental resistance and reach new maxima. That is something that gets harder with age. I can feel it. It takes more of me now to wipe a mental slate clean and start over. To go back to being a beginner. But the reward for learning something new is as satisfying as ever. But it's also why I've tried to be modest with the advocacy. I don't know if most developers are better off on Linux. I mean, I believe they are, at some utopian level, especially if they work for the web, using open source tooling. But I don't know if they are as humans with limited will or capacity for change. Of course, it's fair to say that one doesn't want to. Either because one remain a fan of Apple, in dire need of the remaining edge MacBooks retain on efficiency/battery, or simply content inside the ecosystem. There are plenty of reasons why someone might not want to change. It's not just about rigidity. Besides, it's a dead end trying to convince anyone of an alternative with the sharp end of a religious argument. That kind of crusading just seeds resentment and stubbornness. I know that all too well. What I've found to work much better is planting seeds and showing off your plowshare. Let whatever curiosity that blooms find its own way towards your blue sky. The mimetic engine of persuasion runs much cleaner anyway. And for me, it's primarily about my personal computing workbench regardless of what the world does or doesn't. It was the same with finding Ruby. It's great when others come along for the ride, but I'd also be happy taking the trip solo too. So consider this a postcard from a year into the Linux, Neovim, and Framework journey. The sun is still shining, the wind is in my hair, and the smile on my lips hasn't been this big since the earliest days of OS X.
Yesterday I gave a talk at Monki Gras 2025. This year, the theme is Sustaining Software Development Craft, and here’s the description from the conference website: The big question we want to explore is – how can we keep doing the work we do, when it sustains us, provides meaning and purpose, and sometimes pays the bills? We’re in a period of profound change, technically, politically, socially, economically, which has huge implications for us as practitioners, the makers and doers, but also for the culture at large. I did a talk about the first decade of my career, which I’ve spent working on projects that are designed to last. I’m pleased with my talk, and I got a lot of nice comments. Monki Gras is always a pleasure to attend and speak at – it’s such a lovely, friendly vibe, and the organisers James Governor and Jessica West do a great job of making it a nice day. When I left yesterday, I felt warm and fuzzy and appreciated. I also have a front-row photo of me speaking, courtesy of my dear friend Eriol Fox. Naturally, I chose my outfit to match my slides (and this blog post!). Key points How do you create something that lasts? You can’t predict the future, but there are patterns in what lasts People skills sustain a career more than technical skills Long-lasting systems cannot grow without bound; they need weeding Links/recommended reading Sibyl Schaefer presented a paper Energy, Digital Preservation, and the Climate at iPres 2024, which is about how digital preservation needs to change in anticipation of the climate crisis. This was a major inspiration for this talk. Simon Willison gave a talk Coping strategies for the serial project hoarder at DjangoCon US in 2022, which is another inspiration for me. I’m not as prolific as Simon, but I do see parallels between his approach and what I remember of Metaswitch. Most of the photos in the talk come from the Flickr Commons, a collection of historical photographs from over 100 international cultural heritage organisations. You can learn more about the Commons, browse the photos, and see who’s involved using the Commons Explorer https://commons.flickr.org/. (Which I helped to build!) Slides and notes Photo: dry stone wall building in South Wales. Taken by Wikimedia Commons user TR001, used under CC BY‑SA 3.0. [Make introductory remarks; name and pronouns; mention slides on my website] I’ve been a software developer for ten years, and I’ve spent my career working on projects that are designed to last – first telecoms and networking, now cultural heritage – so when I heard this year’s theme “sustaining craft”, I thought about creating things that last a long time. The key question I want to address in this talk is how do you create something that lasts? I want to share a few thoughts I’ve had from working on decade- and century-scale projects. Part of this is about how we sustain ourselves as software developers, as the individuals who create software, especially with the skill threat of AI and the shifting landscape of funding software. I also want to go broader, and talk about how we sustain the craft, the skill, the projects. Let’s go through my career, and see what we can learn. Photo: women working at a Bell System telephone switchboard. From the U.S. National Archives, no known copyright restrictions. My first software developer job was at a company called Metaswitch. Not a household name, they made telecoms equipment, and you’d probably have heard of their customers. They sold equipment to carriers like AT&T, Vodafone, and O2, who’d use that equipment to sell you telephone service. Telecoms infrastructure is designed to last a long time. I spent most of my time at Metaswitch working with BGP, a routing protocol designed on a pair of napkins in 1989. BGP is sometimes known as the "two-napkin protocol", because of the two napkins on which Kirk Lougheed and Yakov Rekhter wrote the original design. From the Computer History Museum. These are those napkins. This design is basically still the backbone of the Internet. A lot of the building blocks of the telephone network and the Internet are fundamentally the same today as when they were created. I was working in a codebase that had been actively developed for most of my life, and was expected to outlast me. This was my first job so I didn’t really appreciate it at the time, but Metaswitch did a lot of stuff designed to keep that codebase going, to sustain it into the future. Let’s talk about a few of them. Photo: a programmer testing electronic equipment. From the San Diego Air & Space Museum Archives, no known copyright restrictions. Metaswitch was very careful about adopting new technologies. Most of their code was written in C, a little C++, and Rust was being adopted very slowly. They didn’t add new technology quickly. Anything they add, they have to support for a long time – so they wanted to pick technologies that weren’t a flash in the pan. I learnt about something called “the Lindy effect” – this is the idea that any technology is about halfway through its expected life. An open-source library that’s been developed for decades? That’ll probably be around a while longer. A brand new JavaScript framework? That’s a riskier long-term bet. The Lindy effect is about how software that’s been around a long time has already proven its staying power. And talking of AI specifically – I’ve been waiting for things to settle. There’s so much churn and change in this space, if I’d learnt a tool six months ago, most of that would be obsolete today. I don’t hate AI, I love that people are trying all these new tools – but I’m tired and I learning new things is exhausting. I’m waiting for things to calm down before really diving deep on these tools. Metaswitch was very cautious about third-party code, and they didn’t have much of it. Again, anything they use will have to be supported for a long time – is that third-party code, that open-source project stick around? They preferred to take the short-term hit of writing their own code, but then having complete control over it. To give you some idea of how seriously they took this: every third-party dependency had to be reviewed and vetted by lawyers before it could be added to the codebase. Imagine doing that for a modern Node.js project! They had a lot of safety nets. Manual and automated testing, a dedicated QA team, lots of checks and reviews. These were large codebases which had to be reliable. Long-lived systems can’t afford to “move fast and break things”. This was a lot of extra work, but it meant more stability, less churn, and not much risk of outside influences breaking things. This isn’t the only way to build software – Metaswitch is at one extreme of a spectrum – but it did seem to work. I think this is a lesson for building software, but also in what we choose to learn as individuals. Focusing on software that’s likely to last means less churn in our careers. If you learn the fundamentals of the web today, that knowledge will still be useful in five years. If you learn the JavaScript framework du jour? Maybe less so. How do you know what’s going to last? That’s the key question! It’s difficult, but it’s not impossible. This is my first thought for you all: you can’t predict the future, but there are patterns in what lasts. I’ve given you some examples of coding practices that can help the longevity of a codebase, these are just a few. Maybe I have rose-tinted spectacles, but I’ve taken the lessons from Metaswitch and brought them into my current work, and I do like them. I’m careful about external dependencies, I write a lot of my own code, and I create lots of safety nets, and stuff doesn’t tend to churn so much. My code lasts because it isn’t constantly being broken by external forces. Photo: a child in nursery school cutting a plank of wood with a saw. From the Community Archives of Belleville and Hastings County, no known copyright restrictions. So that’s what the smart people were doing at Metaswitch. What was I doing? I joined Metaswitch when I was a young and twenty-something graduate, so I knew everything. I knew software development was easy, these old fuddy-duddies were making it all far too complicated, and I was gonna waltz in and show them how it was done. And obviously, that happened. (Please imagine me reading that paragraph in a very sarcastic voice.) I started doing the work, and it was a lot harder than I expected – who knew that software development was difficult? But I was coming from a background as a solo dev who’d only done hobby projects. I’d never worked in a team before. I didn’t know how to say that I was struggling, to ask for help. I kept making bold promises about what I could do, based on how quickly I thought I should be able to do the work – but I was making promises my skills couldn’t match. I kept missing self-imposed deadlines. You can do that once, but you can’t make it a habit. About six months before I left, my manager said to me “Alex, you have a reputation for being unreliable”. Photo: a boy with a pudding bowl haircut, photographed by Elinor Wiltshire, 1964. From the National Library of Ireland, no known copyright restrictions. He was right! I had such a history of making promises that I couldn’t keep, people stopped trusting me. I didn’t get to work on interesting features or the exciting projects, because nobody trusted me to deliver. That was part of why I left that job – I’d ploughed my reputation into the ground, and I needed to reset. Photo: the library stores at Wellcome Collection. Taken by Thomas SG Farnetti used under CC BY‑NC 4.0. I got that reset at Wellcome Collection, a London museum and library that some of you might know. I was working a lot with their collections, a lot of data and metadata. Wellcome Collection is building on long tradition of libraries and archives, which go back thousands of years. Long-term thinking is in their DNA. To give you one example: there’s stuff in the archive that won’t be made public until the turn of the century. Everybody who works there today will be long gone, but they assume that those records will exist in some shape or form form when that time comes, and they’re planning for those files to eventually be opened. This is century-scale thinking. Photo: Bob Hoover. From the San Diego Air & Space Museum Archives, no known copyright restrictions. When I started, I sat next to a guy called Chris. (I couldn’t find a good picture of him, but I feel like this photo captures his energy.) Chris was a senior archivist. He’d been at Wellcome Collection about twenty-five years, and there were very few people – if anyone – who knew more about the archive than he did. He absolutely knew his stuff, and he could have swaggered around like he owned the place. But he didn’t. Something I was struck by, from my very first day, was how curious and humble he was. A bit of a rarity, if you work in software. He was the experienced veteran of the organisation, but he cared about what other people had to say and wanted to learn from them. Twenty-five years in, and he still wanted to learn. He was a nice guy. He was a pleasure to work with, and I think that’s a big part of why he was able to stay in that job as long as he did. We were all quite disappointed when he left for another job! This is my second thought for you: people skills sustain a career more than technical ones. Being a pleasure to work with opens so many doors and opportunities than technical skill alone cannot. We could do another conference just on what those people skills are, but for now I just want to give you a few examples to think about. Photo: Lt.(jg.) Harriet Ida Pickens and Ens. Frances Wills, first Negro Waves to be commissioned in the US Navy. From the U.S. National Archives, no known copyright restrictions. Be a respectful and reliable teammate. You want to be seen as a safe pair of hands. Reliability isn’t about avoiding mistakes, it’s about managing expectations. If you’re consistently overpromising and underdelivering, people stop trusting you (which I learnt the hard way). If you want people to trust you, you have to keep your promises. Good teammates communicate early when things aren’t going to plan, they ask for help and offer it in return. Good teammates respect the work that went before. It’s tempting to dismiss it as “legacy”, but somebody worked hard on it, and it was the best they knew how to do – recognise that effort and skill, don’t dismiss it. Listen with curiosity and intent. My colleague Chris had decades of experience, but he never acted like he knew everything. He asked thoughtful questions and genuinely wanted to learn from everyone. So many of us aren’t really listening when we’re “listening” – we’re just waiting for the next silence, where we can interject with the next thing we’ve already thought of. We aren’t responding to what other people are saying. When we listen, we get to learn, and other people feel heard – and that makes collaboration much smoother and more enjoyable. Finally, and this is a big one: don’t give people unsolicited advice. We are very bad at this as an industry. We all have so many opinions and ideas, but sometimes, sharing isn’t caring. Feedback is only useful when somebody wants to hear it – otherwise, it feels like criticism, it feels like an attack. Saying “um, actually” when nobody asked for feedback isn’t helpful, it just puts people on the defensive. Asking whether somebody wants feedback, and what sort of feedback they want, will go a long way towards it being useful. So again: people skills sustain a career more than technical skills. There aren’t many truly solo careers in software development – we all have to work with other people – for many of us, that’s the joy of it! If you’re a nice person to work with, other people will want to work with you, to collaborate on projects, they’ll offer you opportunities, it opens doors. Your technical skills won’t sustain your career if you can’t work with other people. Photo: "The Keeper", an exhibition at the New Museum in New York. Taken by Daniel Doubrovkine, used under CC BY‑NC‑SA 4.0. When I went to Wellcome Collection, it was my first time getting up-close and personal with a library and archive, and I didn’t really know how they worked. If you’d asked me, I’d have guessed they just keep … everything? And it was gently explained to me that “No Alex, that’s hoarding.” “Your overflowing yarn stash does not count as an archive.” Big collecting institutions are actually super picky – they have guidelines about what sort of material they collect, what’s in scope, what isn’t, and they’ll aggressively reject anything that isn’t a good match. At Wellcome Collection, their remit was “the history of health and human experience”. You have medical papers? Definitely interesting! Your dad’s old pile of car magazines? Less so. Photo: a dumpster full of books that have been discarded. From brewbooks on Flickr, used under CC BY‑SA 2.0. Collecting institutions also engage in the practice of “weeding” or “deaccessioning”, which is removing material, pruning the collection. For example, in lending libraries, books will be removed from the shelves if they’ve become old, damaged, or unpopular. They may be donated, or sold, or just thrown away – but whatever happens, they’re gotten rid of. That space is reclaimed for other books. Getting rid of material is a fundamental part of professional collecting, because professionals know that storing something has an ongoing cost. They know they can’t keep everything. Photo: a box full of printed photos. From Miray Bostancı on Pexels, used under the Pexels license. This is something I think about in my current job as well. I currently work at the Flickr Foundation, where we’re thinking about how to keep Flickr’s pictures visible for 100 years. How do we preserve social media, how do we maintain our digital legacy? When we talk to people, one thing that comes up regularly is that almost everybody has too many photos. Modern smartphones have made it so easy to snap, snap, snap, and we end up with enormous libraries with thousands of images, but we can’t find the photos we care about. We can’t find the meaningful memories. We’re collecting too much stuff. Digital photos aren’t expensive to store, but we feel the cost in other ways – the cognitive load of having to deal with so many images, of having to sift through a disorganised collection. Photo: a wheelbarrow in a garden. From Hans Middendorp on Pexels, used under the Pexels license. I think there’s a lesson here for the software industry. What’s the cost of all the code that we’re keeping? We construct these enormous edifices of code, but when do we turn things off? When do we delete code? We’re more focused on new code, new ideas, new features. I’m personally quite concerned by how much generative AI has focused on writing more code, and not on dealing with the code we already have. Code is text, so it’s cheap to store, but it still has a cost – it’s more cognitive load, more maintenance, more room for bugs and vulnerabilities. We can keep all our software forever, but we shouldn’t. Photo: Open Garbage Dump on Highway 112, North of San Sebastian. Taken by John Vachon, 1973. From the U.S. National Archives no known copyright restrictions. I think this is going to become a bigger issue for us. We live in an era of abundance, where we can get more computing resources at the push of a button. But that can’t last forever. What happens when our current assumptions about endless compute no longer hold? The climate crisis – where’s all our electricity and hardware coming from? The economics of AI – who’s paying for all these GPU-intensive workloads? And politics – how many of us are dependent on cloud computing based in the US? How many of us feel as good about that as we did three months ago? Libraries are good at making a little go a long way, about eking out their resources, about deciding what’s a good use of resources and what’s waste. Often the people who are good with money are the people who don’t have much of it, and we have a lot of money. It’s easier to make decisions about what to prune and what to keep when things are going well – it’s harder to make decisions in an emergency. This is my third thought for you: long-lasting systems cannot grow without bound; they need weeding. It isn’t sustainable to grow forever, because eventually you get overwhelmed by the weight of everything that came before. We need to get better at writing software efficiently, at turning things off that we don’t need. It’s a skill we’ve neglected. We used to be really good at it – when computers were the size of the room, programmers could eke out every last bit of performance. We can’t do that any more, but it’s so important when building something to last, and I think it’s a skill we’ll have to re-learn soon. Photo: Val Weaver and Vera Askew running in a relay race, Brisbane, 1939. From the State Library of Queensland no known copyright restrictions. Weeding is a term that comes from the preservation world, so let’s stay there. When you talk to people who work in digital preservation, we often describe it as a relay race. There is no permanent digital media, there’s no digital parchment or stone tablets – everything we have today will be unreadable in a few decades. We’re constantly migrating from one format to another, trying to stay ahead of obsolete technology. Software is also a bit of a relay race – there is no “write it once and you’re done”. We’re constantly upgrading, editing, improving. And that can be frustrating, but it also means have regular opportunities to learn and improve. We have that chance to reflect, to do things better. Photo: Broken computer monitor found in the woods. By Jeff Myers on Flickr, used under CC BY‑NC 2.0. I think we do our best reflections when computers go bust. When something goes wrong, we spring into action – we do retrospectives, root cause analysis, we work out what went wrong and how to stop it happening again. This is a great way to build software that lasts, to make it more resilient. It’s a period of intense reflection – what went wrong, how do we stop it happening again? What I’ve noticed is that the best systems are doing this sort of reflection all the time – they aren’t waiting for something to go wrong. They know that prevention is better than cure, and they embody it. They give themselves regular time to reflect, to think about what’s working and what’s not – and when we do, great stuff can happen. Photo: Statue of Astrid Lindgren. By Tobias Barz on Flickr, used under CC BY‑ND 2.0. I want to give you one more example. As a sidebar to my day job, I’ve been writing a blog for thirteen years. It’s the longest job – asterisk – I’ve ever had. The indie web is still cool! A lot of what I write, especially when I was starting, was sharing bits of code. “Here’s something I wrote, here’s what it does, here’s how it works and why it’s cool.” Writing about my code has been an incredible learning experience. You might know have heard the saying “ask a developer to review 5 lines of code, she’ll find 5 issues, ask her to review 500 lines and she’ll say it looks good”. When I sit back and deeply read and explain short snippets of my code, I see how to do things better. I get better at programming. Writing this blog has single-handedly had the biggest impact on my skill as a programmer. Photo: Midnight sun in Advent Bay, Spitzbergen, Norway. From the Library of Congress, no known copyright restrictions. There are so many ways to reflect on our work, opportunities to look back and ask how we can do better – but we have to make the most of them. I think we are, in some ways, very lucky that our work isn’t set in stone, that we do keep doing the same thing, that we have the opportunity to do better. Writing this talk has been, in some sense, a reflection on the first decade of my career, and it’s made me think about what I want the next decade to look like. In this talk, I’ve tried to distill some of those things, tried to give you some of the ideas that I want to keep, that I think will help my career and my software to last. Be careful about what you create, what you keep, and how you interact with other people. That care, that process of reflection – that is what creates things that last. [If the formatting of this post looks odd in your feed reader, visit the original article]