More from alexwlchan
I’ve had a couple of projects recently where I needed to work with a list that involved images. For example, choosing a series of photos to print, or making an inventory of Lego parts. I could write a simple text list, but it’s really helpful to be able to see the images as part of the list, especially when I’m working with other people.

The best tool I’ve found is Google Sheets – not something I usually associate with pictures! I use the IMAGE function, which inserts an image into a cell. For example:

=IMAGE("https://www.google.com/images/srpr/logo3w.png")

There’s a similar function in Microsoft Excel, but not in Apple Numbers.

This function can reference values in other cells, so I’ll often prepare my spreadsheet in another tool – say, a Python script – and include an image URL in one of the columns. When I import the spreadsheet into Google Sheets, I use IMAGE() to reference that column, and then I see inline images. After that, I tend to hide the column with the image URL, and resize the rows/columns containing images to make them bigger and easier to look at.

I often pair this with the HYPERLINK function, which can add a clickable link to a cell. This is useful to link to the source of the image, or to more detail I can’t fit in the spreadsheet.

I don’t know how far this approach can scale – I’ve never tried more than a thousand or so images in a single spreadsheet – but it’s pretty cool that it works at all! Using a spreadsheet gives me a simple, lightweight interface that most people are already familiar with. It doesn’t take much work on my part, and I get useful features like sorting and filtering for “free”. Previously I’d only thought of spreadsheets as a tool for textual data, and being able to include images has made them even more powerful.
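To make that workflow concrete, here’s a hedged sketch of what the formulas can look like, assuming a hypothetical layout where column B holds the image URL and column C holds a link to the image’s source page (the column letters are my own illustration, not from the original post):

=IMAGE(B2)
=HYPERLINK(C2, "source")

The first formula renders the image from the URL in B2 inline; the second turns a cell into a clickable “source” link. Both IMAGE and HYPERLINK are standard Google Sheets functions, so once the URL columns are imported, filling these formulas down the sheet is all it takes.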
I’ve posted another command-line tool on GitHub: randline, which gives you a random selection of lines in a file:

$ randline < /usr/share/dict/words
ultraluxurious

$ randline 3 < /usr/share/dict/words
unexceptionably
baselessness
salinity

There are lots of tools that solve this problem; I wrote my own as a way to get some more Rust practice and try a new-to-me technique called reservoir sampling.

Existing approaches

There’s a shuf command in coreutils which is designed to do this exact thing:

$ shuf -n 3 /usr/share/dict/words
brimstone
melody's
reimbursed

But I don’t have coreutils on my Mac, so I can’t use shuf. You can do this in lots of other ways using tools like awk, sort and perl. If you’re interested, check out these Stack Overflow and Unix & Linux Stack Exchange threads for examples.

For my needs, I wrote a tiny Python script called randline which I saved in my PATH years ago, and haven’t thought much about since:

import random
import sys

if __name__ == "__main__":
    lines = sys.stdin.read().splitlines()

    try:
        k = int(sys.argv[1])
    except IndexError:
        k = 1

    random.shuffle(lines)
    print("\n".join(lines[:k]))

(I’m not sure why my past self decided not to use random.sample. I suspect I’d forgotten about it.)

This script has worked fine, but I stumbled across it recently and it got me thinking. This approach isn’t very efficient – it has to load the whole file into memory. Can we do better?

Reservoir sampling

In other Python scripts, I process files as a stream – look at one line at a time, rather than loading the whole file at once. This doesn’t make much difference for small files, but it pays off when you have really big files. I couldn’t think of a good way to take a random sample of a file using streaming, and still get a uniform distribution – but smart people have already thought about this. I did some reading and I found a technique called reservoir sampling. The introduction in the Wikipedia article makes it clear this is exactly what I want:

Reservoir sampling is a family of randomized algorithms for choosing a simple random sample, without replacement, of k items from a population of unknown size n in a single pass over the items. The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory. The population is revealed to the algorithm over time, and the algorithm cannot look back at previous items.

The basic idea is that rather than holding the whole file in memory at once, I can keep a fixed-size buffer – or “reservoir” – of the items I’ve selected. As I go line-by-line through the file, I can add or remove items in this reservoir, and it will always use about the same amount of memory. I’m only holding a line in memory if it’s in the reservoir, not every line in the file.

Algorithm L

The Wikipedia article describes several algorithms, including a simple Algorithm R and an optimal Algorithm L. The underlying principle of Algorithm L is pretty concise:

If we generate $n$ random numbers $u_1, \ldots, u_n \sim U[0,1]$ independently, then the indices of the smallest $k$ of them are a uniform sample of the $k$-subsets of $\{1, \ldots, n\}$.

There’s no proof in the Wikipedia article, but I wanted to satisfy myself that this is true. If you’re happy to take it as given, you can skip the maths and go to the next section.
Here’s my attempt at a justification:

What we really care about is the relative ranking of the $u_1, \ldots, u_n$, not their actual values – we care whether, for example, $u_1 < u_2$, but not the exact difference between them.

Because the variables are independent and they have the same distribution, every possible ranking is equally likely. Every variable is the same, so none of them can be “special” or favoured above the others. This means that each permutation of the indices $\{1, \ldots, n\}$ is equally likely. There are $n!$ such permutations, so each occurs with probability $1/n!$.

For a given $k$-subset, we’re interested in permutations where this subset is the first $k$ items. This means the probability that a particular $k$-subset will be selected is a simple fraction:

$$
\begin{equation*}
\text{probability of selecting this }k\text{-subset} = \frac{\text{# of permutations where this subset is the first }k\text{ items}}{\text{# of permutations}}
\end{equation*}
$$

How many permutations are there where this $k$-subset is the first $k$ items? There are $k!$ ways to arrange this $k$-subset in the first $k$ positions, and $(n-k)!$ ways to arrange the remaining items. This means there are $k!\left(n-k\right)!$ permutations that match, and so:

$$
\begin{equation*}
\text{probability of selecting this }k\text{-subset} = \frac{k!\left(n-k\right)!}{n!}
\end{equation*}
$$

This probability is the same for every $k$-subset, so each one is equally likely – which is the thing we care about. This was enough to give me the confidence to try implementing Algorithm L.

Implementing Algorithm L in an efficient way

If we don’t know $n$ upfront, we could save all the items and only then generate the random variables $u_1, \ldots, u_n \sim U[0,1]$ – but that’s precisely the sort of inefficiency I’m trying to avoid! Fortunately, we don’t need to: the nice thing about this algorithm is that we only need to track the $k$ smallest values of $u_1, \ldots, u_i$ we’ve seen so far. Once a value is larger than the $k$ smallest, we can safely discard it because we know it’ll never be used.

Here’s the approach I took:

1. Create an empty “reservoir” of $k$ items.
2. As you get items, assign each one a “weight” and start filling the reservoir. (These weights are the random variables $u_1, \ldots, u_n$.) If you run out of items before you fill the reservoir, go to step 4. If you fill the reservoir and there are more items, calculate the largest weight of the items in the reservoir, and go to step 3.
3. Once the reservoir is full, go through the remaining items one-by-one. For each item, assign it a weight. If the weight of this new item is larger than the largest weight already in the reservoir, discard the item. This weight isn’t in the $k$ smallest, so we don’t care about it. If the weight of this new item is smaller than the largest weight in the reservoir, then add the item to the reservoir and remove the item with the previously-largest weight. Recalculate the largest weight of the items in the reservoir. When you run out of items, go to step 4.
4. Return the items in the reservoir. This is your random sample.

This approach means we only have to hold a fixed number of items/weights in memory at a time – much more efficient, and it should scale to an arbitrarily large number of inputs. It’s a bit too much code to include here, but you can read my Rust implementation on GitHub. I wrote some tests, which include a statistical test – I run the sampling code 10,000 times, and check the results are the uniform distribution I want.
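To make those steps concrete, here’s a deliberately simplified sketch of the same idea – not the author’s actual implementation (which is on GitHub and uses a BinaryHeap), and it assumes the rand crate with its 0.8-style API. It keeps the reservoir in a plain Vec and rescans for the largest weight on each replacement:

use rand::Rng;

// Keep the k items with the smallest random weights, seen in a single pass.
fn reservoir_sample<T>(items: impl Iterator<Item = T>, k: usize) -> Vec<T> {
    if k == 0 {
        return Vec::new();
    }

    let mut rng = rand::thread_rng();
    let mut reservoir: Vec<(f64, T)> = Vec::with_capacity(k);

    for item in items {
        let weight: f64 = rng.gen(); // u_i ~ U[0, 1)

        if reservoir.len() < k {
            // Step 2: still filling the reservoir.
            reservoir.push((weight, item));
        } else {
            // Step 3: find the entry with the largest weight...
            let (max_idx, _) = reservoir
                .iter()
                .enumerate()
                .max_by(|(_, a), (_, b)| a.0.partial_cmp(&b.0).unwrap())
                .unwrap();

            // ...and replace it only if the new weight is smaller.
            if weight < reservoir[max_idx].0 {
                reservoir[max_idx] = (weight, item);
            }
        }
    }

    // Step 4: whatever is left in the reservoir is the sample.
    reservoir.into_iter().map(|(_, item)| item).collect()
}

The rescan makes each replacement O(k); that’s the fiddly bookkeeping a priority queue avoids, which is where the BinaryHeap in the real implementation comes in.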
What did I learn about Rust?

This is only about 250 lines of Rust, but it was still good practice, and I learnt a few new things.

Working with generics

I’ve used generics in other languages, and I’d read about them in the Rust Book, but I’d never written my own code using generics in Rust. I used a generic to write my sampling function:

fn reservoir_sample<T>(
    mut items: impl Iterator<Item = T>,
    k: usize,
) -> Vec<T> {
    …
}

It was straightforward, and there were no big surprises.

The difference between .iter() and .into_iter()

I’ve used both of these methods before, but I only understood part of the difference. When you call .iter(), you’re borrowing the vector, which means it can be used later. When you call .into_iter(), you’re consuming the vector, which means it can’t be used later. I hadn’t thought about how this affects the types. When you call .iter(), you get an iterator of references. When you call .into_iter(), you get an iterator of values.

This caused me some confusion when I was writing a test. Consider the following code:

fn reservoir_sample<T>(
    mut items: impl Iterator<Item = T>,
    k: usize,
) -> Vec<T> {
    …
}

let letters = vec!["A", "A", "A"];
let items = letters.iter();

assert_eq!(reservoir_sample(items, 1), vec!["A"]);

I was trying to write a test that reservoir_sample would only return the number of items I asked for, and no more. This was my first attempt, and it doesn’t compile.

When I call letters.iter(), I’m getting an iterator of string references, that is Iterator<&&str>. Then I’m comparing it to a Vec<&str>, but Rust doesn’t know how to check equality of &str and &&str, so it refuses to compile this code.

There are two ways I could fix this:

Use .into_iter(), so I get an iterator of string values, i.e. Iterator<&str>:

let letters = vec!["A", "A", "A"];
let items = letters.into_iter();

assert_eq!(reservoir_sample(items, 1), vec!["A"]);

Change the expected result so it’s a Vec<&&str>:

let letters = vec!["A", "A", "A"];
let items = letters.iter();

assert_eq!(reservoir_sample(items, 1), vec![&"A"]);

I used .into_iter() in my tests. This sort of distinction is probably obvious to more experienced Rust programmers, but it was new to me. I’ve read about these methods, but I only understand them by writing code.

Arrays are indexed with usize

I wasn’t sure what type I should use for k, the size of the random sample. It’s a positive integer, but should I use u32 or usize? I read the descriptions of both, and it wasn’t immediately obvious which was preferable.

I looked to Vec::with_capacity for inspiration, because it’s one of the methods I was using and it feels similar. It takes a single argument capacity: usize. That gave me an example to follow, but I still wanted to understand why. I did some more reading, and I learned that Rust arrays are indexed with usize. It makes sense that a pointer-sized type is used for array indexing, but it’s been a while since I used a language with pointers, and so it didn’t occur to me.

There’s a lot of cool stuff in std::collections

At the core of this tool, I have a reservoir of weighted items, and I want to be able to find the item with the largest weight when it gets replaced. This sounds like a priority queue, and there’s an implementation of one in the Rust standard library. I was able to use BinaryHeap from the std::collections module, which saved me from writing a bunch of fiddly code myself.
Here’s the broad shape of it:

struct WeightedItem<T> {
    item: T,
    weight: f64,
}

let mut reservoir: BinaryHeap<WeightedItem<T>> = BinaryHeap::with_capacity(k);

There’s a bit more code to implement Eq and Ord for WeightedItem, but that wasn’t difficult. I didn’t even need to read the documentation – the compiler error messages were so helpful, I could just follow their suggestions to get a working solution. (A rough sketch of what those implementations can look like is at the end of this post.)

In this sense, Rust feels very like Python – both languages have a built-in collections module with some common data structures. I need to spend more time exploring the Rust variant, and there’s a “When should you use which collection?” guide to help me find the useful parts.

This whole project is less than 250 lines of Rust, including tests. There are plenty of other tools that do the same thing, so I doubt anybody else will want to use it. Most people should use shuf – to which Assaf Gordon added reservoir sampling nearly twelve years ago. But in case anybody is interested, I’ve put all the code on GitHub.

I’ve learnt every programming language in tiny steps – a little at a time, growing slowly until I have something approximating skill. This project is the latest tiny step towards learning Rust, and now I know a little bit more than I did before. It’s over eight years since I wrote my first Rust, and I’m still a beginner, but I’m having fun learning, and I’m having fun writing it down as I go.
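As promised above, here’s a rough sketch of what the Eq and Ord implementations for WeightedItem might look like. This is my illustration rather than the author’s actual code (which is on GitHub); it assumes the weights are never NaN, which holds because they come from a random number generator:

use std::cmp::Ordering;
use std::collections::BinaryHeap;

struct WeightedItem<T> {
    item: T,
    weight: f64,
}

impl<T> PartialEq for WeightedItem<T> {
    fn eq(&self, other: &Self) -> bool {
        self.weight == other.weight
    }
}

impl<T> Eq for WeightedItem<T> {}

impl<T> PartialOrd for WeightedItem<T> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl<T> Ord for WeightedItem<T> {
    fn cmp(&self, other: &Self) -> Ordering {
        // total_cmp gives a total order over f64, so this is safe
        // as long as the weights are never NaN.
        self.weight.total_cmp(&other.weight)
    }
}

fn main() {
    // BinaryHeap is a max-heap, so with this ordering .peek() returns the
    // entry with the largest weight -- the one that gets evicted when a
    // smaller weight comes along.
    let mut reservoir: BinaryHeap<WeightedItem<&str>> = BinaryHeap::with_capacity(3);
    reservoir.push(WeightedItem { item: "hello", weight: 0.42 });

    if let Some(largest) = reservoir.peek() {
        println!("largest weight so far: {}", largest.weight);
    }
}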
Rust has become my go-to language for my personal toolbox – small, standalone utilities like create_thumbnail, emptydir, and dominant_colours. There’s no place for Rust in my day job, so having some self-contained hobby projects means I can still have fun playing with it.

I’ve been using the assert_cmd crate to test my command line tools, but I wanted to review my testing approach before I write my next utility. My old code was fine and it worked, but that’s about all you could say about it – it wasn’t clean or idiomatic Rust, and it wasn’t especially readable.

My big mistake was trying to write Rust like Python. I’d written wrapper functions that would call assert_cmd and return values, then I wrote my own assertions a bit like I’d write a Python test. I missed out on the nice assertion helpers in the crate. I’d skimmed just enough of the assert_cmd documentation to get something working, but I hadn’t read it properly. As I was writing this blog post, I went back and read the documentation in more detail, to understand the right way to use the crate.

Here are some examples of how I’m using it in my refreshed test suites:

Testing a basic command

This test calls dominant_colours with a single argument, then checks it succeeds and that a single line is printed to stdout:

use assert_cmd::Command;

/// If every pixel in an image is the same colour, then the image
/// has a single dominant colour.
#[test]
fn it_prints_the_colour() {
    Command::cargo_bin("dominant_colours")
        .unwrap()
        .arg("./src/tests/red.png")
        .assert()
        .success()
        .stdout("#fe0000\n")
        .stderr("");
}

If I have more than one argument or flag, I can replace .arg with .args to pass a list:

use assert_cmd::Command;

/// It picks the best colour from an image to go with a background --
/// the colour with sufficient contrast and the most saturation.
#[test]
fn it_chooses_the_right_colour_for_a_light_background() {
    Command::cargo_bin("dominant_colours")
        .unwrap()
        .args(&[
            "src/tests/stripes.png",
            "--max-colours=5",
            "--best-against-bg=#fff",
        ])
        .assert()
        .success()
        .stdout("#693900\n")
        .stderr("");
}

Alternatively, I can omit .arg and .args if I don’t need to pass any arguments.

Testing error cases

Most of my tests are around error handling – call the tool with bad input, and check it returns a useful error message. I can check that the command failed, the exit code, and the error message printed to stderr:

use assert_cmd::Command;

/// Getting the dominant colour of a file that doesn't exist is an error.
#[test]
fn it_fails_if_you_pass_an_nonexistent_file() {
    Command::cargo_bin("dominant_colours")
        .unwrap()
        .arg("doesnotexist.jpg")
        .assert()
        .failure()
        .code(1)
        .stdout("")
        .stderr("No such file or directory (os error 2)\n");
}

Comparing output to a regular expression

All the examples so far are doing an exact match for the stdout/stderr, but sometimes I need something more flexible. Maybe I only know what part of the output will look like, or I only care about checking how it starts. If so, I can use the predicate::str::is_match predicate from the predicates crate and define a regular expression I want to match against.

Here’s an example where I’m checking the output contains a version number, but not what the version number is:

use assert_cmd::Command;
use predicates::prelude::*;

/// If I run `dominant_colours --version`, it prints the version number.
#[test]
fn it_prints_the_version() {
    // Match strings like `dominant_colours 1.2.3`
    let is_version_string = predicate::str::is_match(
        r"^dominant_colours [0-9]+\.[0-9]+\.[0-9]+\n$"
    ).unwrap();

    Command::cargo_bin("dominant_colours")
        .unwrap()
        .arg("--version")
        .assert()
        .success()
        .stdout(is_version_string)
        .stderr("");
}

Creating focused helper functions

I have a couple of helper functions for specific test scenarios. I try to group these by common purpose – they should be testing similar behaviour. I’m trying to avoid creating helpers for the sake of reducing repetitive code.

For example, I have a helper function that passes a single invalid file to dominant_colours and checks the error message is what I expect:

use assert_cmd::Command;
use predicates::prelude::*;

/// Getting the dominant colour of a file that doesn't exist is an error.
#[test]
fn it_fails_if_you_pass_an_nonexistent_file() {
    assert_file_fails_with_error(
        "./doesnotexist.jpg",
        "No such file or directory (os error 2)\n",
    );
}

/// Try to get the dominant colours for a file, and check it fails
/// with the given error message.
fn assert_file_fails_with_error(
    path: &str,
    expected_stderr: &str,
) -> assert_cmd::assert::Assert {
    Command::cargo_bin("dominant_colours")
        .unwrap()
        .arg(path)
        .assert()
        .failure()
        .code(1)
        .stdout("")
        .stderr(predicate::eq(expected_stderr))
}

Initially I wrote this helper just calling .stderr(expected_stderr) to do an exact match, like in previous tests, but I got an error “expected_stderr escapes the function body here”. I’m not sure what that means – it’s something to do with borrowing – but wrapping it in a predicate seems to fix the error, so I’m happy.

My test suite is a safety net, not a playground

Writing this blog post has helped me refactor my tests into something that’s actually good. I’m sure there’s still room for improvement, but this is the first iteration that I feel happy with. It’s no coincidence that it looks very similar to other test suites using assert_cmd.

My earlier approaches were far too clever. I was over-abstracting to hide a few lines of boilerplate, which made the tests harder to follow. I even wrote a macro with a variadic interface because of a minor annoyance, which is stretching the limits of my Rust knowledge. It was fun to write, but it would have been a pain to debug or edit later.

It’s okay to have a bit of repetition in a test suite, if it makes the tests easier to read. I keep having to remind myself of this – I’m often tempted to create helper functions whose sole purpose is to remove boilerplate, or create some clever parametrisation which only made sense as I was writing it. I need to resist the urge to compress my test code.

My new tests are simpler and more readable. There’s a time and a place for clever code, but my test suite isn’t it.
I use 1Password to store the passwords for my online accounts, and I’ve been reviewing it as a new year cleanup task. I’ve been deleting unused accounts, changing old passwords which were weak, and making sure I’ve enabled multi-factor authentication for key accounts.

Each 1Password item has a notes field, and I use it to record extra information about each account. I’ve never seen anybody else talk about these notes, or how they use them, but I find them invaluable, so I thought it’d be worth explaining what I do. (This is different from the Secure Notes feature – I’m talking about the notes attached to other 1Password items, not standalone notes.) Lots of password managers have a notes field, so you can keep notes even if you don’t use 1Password.

I use the notes field as a mini-changelog, where I write dated entries to track the history of each account. Here’s some of the stuff I write down:

Why did I create this account?

If the purpose of an account isn’t obvious, I write a note that explains why I created it. This happens more often than you might think. For example, there are lots of ticketing websites that don’t allow a guest checkout – you have to make an account. If I’m only booking a single event, I’ll save the account in 1Password, and without a note it would be easy to forget why the account exists.

Why did I make significant changes?

I write down the date and details of anything important I change in an account, like:

- Updating the email address
- Changing the password
- Adding or removing authentication methods like passkeys
- Enabling multi-factor authentication

If it’s useful, I include an explanation of why I made a change, not just what it was. For example, when I change a password: was it because the old password was weak, because the site forced a reset, or because I thought a password might be compromised? Somebody recently tried to hack into my broadband account, so I reset the password as a precaution. I wrote a note about it, so if I see signs of another hacking attempt, I’ll remember what happened the first time.

Why is it set up in an unusual way?

There are a small number of accounts that I set up in a different way to the rest of my accounts. I write down the reason, so my future self knows why and doesn’t try to “fix” the account later.

For example, most of my accounts are linked to my @alexwlchan.net email address – but a small number of them are tied to other email addresses. When I do this, I write a note explaining why I deliberately linked that account to another email. The most common reason I do this is because the account is particularly important, and if I lost access to my @alexwlchan.net email, I wouldn’t want to lose access to that account at the same time.

What are the password rules?

I write down any frustrating password rules I discover. This is particularly valuable if those rules aren’t explicitly documented, and you can only discover them by trial and error. I include a date with each of these rules, in case they change later.

These notes reduce confusion and annoyance if I ever have to change the password. It also means that when I’m reviewing my passwords later, I know that there’s a reason I picked a fairly weak password – I’d done the best I could given the site’s requirements. Here’s a real example: “the password reset UI won’t tell you this, but passwords longer than 16 characters are silently truncated, and must be alphanumeric only”.

Do I have multi-factor authentication?
I enable multi-factor authentication (MFA) for important accounts, but I don’t put the MFA codes in 1Password. Keeping the password and MFA code in the same app is collapsing multiple authentication factors back into one.

If I do enable MFA, I write a note in 1Password that says when I enabled it, where to find my MFA codes, and where to find my account recovery codes. For example, when I used hardware security keys at work, I wrote notes about where the keys were stored (“in the fire safe, ask Jane Smith in IT to unlock”) and how to identify different keys (“pink ribbon = workflow account”). These details weren’t sensitive security information, but they were easy to forget.

Sometimes I choose not to enable MFA even though it’s available, and I write a note about that as well. For example, Wikipedia supports MFA but it’s described as “experimental and optional”, so I’ve decided not to enable it on my account yet.

Why doesn’t this account exist any more?

When I deactivate an account, I don’t delete it from 1Password. Instead, I write a final note explaining why and how I deactivated it, and then I move it to the Archive. I prefer this to deleting the entry – it means I still have some record that the account existed, and I can see how long it existed for. I’ve only had to retrieve something from the archive a handful of times, but I was glad I could do so, and I don’t see any downside to having a large archive.

I also use the archive for accounts that I can’t delete, but are probably gone. For example, I have old accounts with utility companies that have been acquired or gone bust, and their website no longer exists. My account is probably gone, but I have no way of verifying that. Moving it to the archive gets it out of the way, and I still have the password if it ever comes back.

What am I going to forget?

I’m not trying to create a comprehensive audit trail of my online accounts – I’m just writing down stuff I think will be helpful later, and which I know I’m likely to forget. It only takes a few seconds to write each note.

Writing notes is always a tricky balance. You want to capture the useful information, but you don’t want the note-taking to become a chore, or for the finished notes to be overwhelming. I’ve only been writing notes in my password manager for a few years, so I might not have the right balance yet – but I’m almost certainly better off than I was before, when I wasn’t writing any.

I’m really glad I started keeping notes in my password manager, and if you’ve never done it, I’d encourage you to try it.
I read 58 books this year – slightly down on last year, but I’m still happy with that number. I spent a lot of time this year on my own writing and crafting, and I had less time for books.

I returned to a couple of favourite authors and their latest releases – including Toshikazu Kawaguchi, Darcie Little Badger, and Maureen Johnson – but I also discovered some new-to-me authors who are on my “must read” list for anything they release in future. As well as the authors mentioned below, the books I read by Ravena Guron, Ashley Herring Blake and the late Jean-Paul Didierlaurent were highlights this year. Ashley Herring Blake is also the winner of this year’s serendipity award: I was reading Astrid Parker Doesn’t Fall, and it mentions Written in the Stars – my copy of which was literally within reach at the time. (That was one of my favourites in 2021.)

I track the books I read at books.alexwlchan.net, and I also write an annual round-up post of my favourites. This is the fourth year of that tradition. Previously: 2023, 2022, 2021.

Below are the best books I read in 2024, in the order I read them.

Show Me The Bodies
13 January 2024

This is a thorough and damning description of the systemic failures that led to the Grenfell Tower fire, a high-rise fire in London in 2017 that killed 72 people. Major themes include repeated warnings about unsafe cladding, government and companies looking the other way because it was cheaper than doing the right thing, and multiple missed opportunities to avoid the tragedy. The chapters alternate between giving a timeline of the fire on the night (12.54 a.m., 1.20 a.m., 1.30 a.m., …) and describing the longer-term events that led to the fire.

It’s a powerful if unsettling read. I didn’t enjoy the events it describes, but the writing is clear and thorough. The author has done plenty of research and spoken to a lot of Grenfell survivors, which creates a good mix of detail and human stories. There’s so much good stuff in here (“good” with an asterisk), and if you like reading post-event analyses of disasters then you should read this.

The Checklist Manifesto
10 April 2024

A fascinating book about how checklists are used for quality assurance by industries like finance, construction, and aviation – and the author is a surgeon trying to introduce them in medicine. It’s a mix of stories about people who are already using checklists, and discussion of the ideas and theory behind them. It has a lot of practical information, and delivers it in a concise and easily readable package. It’s also upfront about limitations and risks; it doesn’t present checklists as some sort of panacea, but instead discusses the challenges of doing them properly. I’ve heard about the power of checklists and this book in particular for years, and I was glad to finally read it.

Herc
13 May 2024

This is a fun retelling of the myth of Hercules (or “Herc”). It’s told entirely in the first person, by the people who met him – not by Hercules himself. The perspective shifts from chapter to chapter, as we hear the havoc he caused in their lives, before the story moves on to somebody else. Herc is traced entirely by the outline he leaves in the lives of people around him, and not as the hero of his own story. There’s some beautiful writing in here, especially around grief and trauma – and where Hercules goes, both are bound to follow.
I’m biased because the author is a dear friend, and I feel guilty that it took me so long to read it – but I’m glad I finally did, and other friends who’ve read it have similarly enjoyed it.

Wasteland
24 May 2024

This is a book about waste – where it goes, how it’s processed, and what really happens when you put something in the recycling bin. The author visited waste management sites all over the world, including tip sites in India, toxic mine spoil in the US, and nuclear waste storage in the UK. The conversations they retell present a more nuanced view than “waste bad, recycling good”, and explore the non-obvious side effects of certain attempts to make things better – for example, how EU laws prohibiting the export of broken electronics have affected the second-hand and repair market in Ghana. The book doesn’t try to evangelise too hard, and at multiple points the author freely admits they don’t know the answer.

It’s obviously not a pleasant read, but I’m glad I read it and it’s given me a lot to think about. I want to think more about sustainability and my environmental impact in 2025, and this book set me on that path.

Silo 49: Flying Season for the Mis-Recorded
17 July 2024

This is a darker entry: it’s about a girl in a dystopian society who commits suicide. It’s set in Hugh Howey’s Silo universe, now a television series on Apple TV+. The Earth has been destroyed by an apocalypse, the outside world is toxic, and humanity survives in underground silos – giant self-sustaining cities. Most of the Silo residents know nothing about human history or how they came to live underground.

This is the final novella in the Silo 49 tetralogy, set in a supposedly enlightened society where more people know the truth of Earth’s history, and the dark forces that drove them underground. After three books showing how a society can improve, we see how the darker shades of humanity still survive – judgement, prejudice, and discrimination. We see the final hours of Lizbet’s life, the escape of dancing at the club, and the arrest of her father and her social ostracisation. I felt instant sympathy for her, and I cried when she went over the rails. The writing is as lyrical as the subject matter is grim.

Lover Birds
31 August 2024

To end the list on a happier note, this is my favourite romance of the year. It’s a feel-good YA romcom that was exactly the sort of light-hearted, fun read I needed to round out my summer. It hits all of my favourite tropes: enemies-to-lovers, oblivious lesbians, girls standing up for each other, and showing somebody the place where you live.

I love romance stories that have a strong sense of place and location. I completely missed the pun in the title until it was pointed out to me, but it tickled me when I saw it. I can’t remember the last time I went to Liverpool, but I got a feel for it as Lou shows Isabel her home.

One of the main characters has ADHD, and it’s a big part of the story. This meant a lot to me, as it’s the first book I can recall reading with an explicitly ADHD character.
More in programming
The first time we had to evacuate Malibu this season was during the Franklin fire in early December. We went to bed with our bags packed, thinking they'd probably get it under control. But by 2am, the roaring blades of fire choppers shaking the house got us up. As we sped down the canyon towards Pacific Coast Highway (PCH), the fire had reached the ridge across from ours, and flames were blazing large out the car windows. It felt like we had left the evacuation a little too late, but they eventually did get Franklin under control before it reached us.

Humans have a strange relationship with risk and disasters. We're so prone to wishful thinking and bad pattern matching. I remember people being shocked when the flames jumped the PCH during the Woolsey fire in 2018. IT HAD NEVER DONE THAT! So several friends of ours had to suddenly escape a nightmare scenario, driving through burning streets, in heavy smoke, with literally their lives on the line. Because the past had failed to predict the future.

I fell into that same trap for a moment with the dramatic proclamations of wind and fire weather in the days leading up to January 7. Warning after warning of "extremely dangerous, life-threatening wind" coming from the City of Malibu, and that overly-bureaucratic-but-still-ominous "Particularly Dangerous Situation" designation. Because, really, how much worse could it be? Turns out, a lot.

It was a little before noon on the 7th when we first saw the big plumes of smoke rise from the Palisades fire. And immediately the pattern matching ran astray. Oh, it's probably just like Franklin. It's not big yet, they'll get it out. They usually do. Well, they didn't.

By the late afternoon, we had once more packed our bags, and by then it was also clear that things actually were different this time. Different worse. Different enough that even Santa Monica didn't feel like it was assured to be safe. So we headed far North, to be sure that we wouldn't have to evacuate again. Turned out to be a good move. Because by now, into the evening, few people in the connected world hadn't started to see the catastrophic images emerging from the Palisades and Eaton fires. Well over 10,000 houses would ultimately burn. Entire neighborhoods leveled. Pictures that could be mistaken for World War II. Utter and complete destruction.

By the night of the 7th, the fire reached our canyon, and it tore through the chaparral and brush that'd been building since the last big fire that area saw in 1993. Out of some 150 houses in our immediate vicinity, nearly a hundred burned to the ground. Including the first house we moved to in Malibu back in 2009. But thankfully not ours. That's of course a huge relief. This was and is our Malibu Dream House. The site of that gorgeous home office I'm so fond to share views from. Our home.

But a house left standing in a disaster zone is still a disaster. The flames reached all the way up to the base of our construction, incinerated much of our landscaping, and devoured the power poles around it to dysfunction. We have burnt-out buildings every which way the eye looks. The national guard is still stationed at road blocks on the access roads. Utility workers are tearing down the entire power grid to rebuild it from scratch. It's going to be a long time before this is comfortably habitable again. So we left.

That in itself feels like defeat. There's an urge to stay put, and to help, in whatever helpless ways you can.
But with three school-age children who've already missed over a month's worth of learning from power outages, fire threats, actual fires, and now mudslide dangers, it was time to go. None of this came as a surprise, mind you. After Woolsey in 2018, Malibu life always felt like living on borrowed time to us. We knew it, even accepted it. Beautiful enough to be worth the risk, we said. But even if it wasn't a surprise, it's still a shock. The sheer devastation, especially in the Palisades, went far beyond our normal range of comprehension. Bounded, as it always is, by past experiences.

Thus, we find ourselves back in Copenhagen. A safe haven for calamities of all sorts. We lived here for three years during the pandemic, so it just made sense to use it for refuge once more. The kids' old international school accepted them right back in, and past friendships were quickly rebooted.

I don't know how long it's going to be this time. And that's an odd feeling to have, just as America has been turning a corner, and just as the optimism is back in so many areas. Of the twenty years I've spent in America, this feels like the most exciting time to be part of the exceptionalism that the US of A offers. And of course we still are. I'll still be in the US all the time on both business, racing, and family trips. But it won't be exclusively so for a while, and it won't be from our Malibu Dream House.

And that burns.
I’ve been doing Dry January this year. One thing I missed was something for apéro hour, a beverage to mark the start of the evening. Something complex and maybe bitter, not like a drink you’d have with lunch. I found some good options.

Ghia sodas are my favorite. Ghia is an NA apéritif based on grape juice but with enough bitterness (gentian) and sourness (yuzu) to be interesting. You can buy a bottle and mix it with soda yourself but I like the little cans with extra flavoring. The Ginger and the Sumac & Chili are both great.

Another thing I like is low-sugar fancy soda pops. Not diet drinks, they still have a little sugar, but typically 50 calories a can. De La Calle Tepache is my favorite. Fermented pineapple is delicious and they have some fun flavors. Culture Pop is also good.

A friend gave me the Zero book, a drinks cookbook from the fancy restaurant Alinea. This book is a little aspirational but the recipes are doable, it’s just a lot of labor. Very fancy high end drink mixing, really beautiful flavor ideas. The only thing I made was their gin substitute (mostly junipers extracted in glycerin) and it was too sweet for me. Need to find the right use for it, a martini definitely ain’t it.

An easier homemade drink is this Nonalcoholic Dirty Lemon Tonic. It’s basically a lemonade heavily flavored with salted preserved lemons, then mixed with tonic. I love the complexity and freshness of this drink and enjoy it on its own merits.

Finally, non-alcoholic beer has gotten a lot better in the last few years thanks to manufacturing innovations. I’ve been enjoying NA Black Butte Porter, Stella Artois 0.0, Heineken 0.0. They basically all taste just like their alcoholic uncles, no compromise.

One thing to note about non-alcoholic substitutes is they are not cheap. They’ve become a big high end business. Expect to pay the same for an NA drink as one with alcohol even though they aren’t taxed nearly as much.
Thou shalt not suffer a flaky test to live, because it’s annoying, counterproductive, and dangerous: one day it might fail for real, and you won’t notice. Here’s what to do.
The ware for January 2025 is shown below. Thanks to brimdavis for contributing this ware! …back in the day when you would get wares that had “blue wires” in them… One thing I wonder about this ware is…where are the ROMs? Perhaps I’ll find out soon! Happy year of the snake!
While I frequently hear engineers bemoan a missing strategy, they rarely complete the thought by articulating why the missing strategy matters. Instead, it serves as more of a truism: the economy used to be better, children used to respect their parents, and engineering organizations used to have an engineering strategy.

This chapter starts by exploring something I believe quite strongly: there’s always an engineering strategy, even if there’s nothing written down. From there, we’ll discuss why strategy, especially written strategy, is such a valuable opportunity for organizations that take it seriously. We’ll dig into:

- Why there’s always a strategy, even when people say there isn’t
- How strategies have been impactful across my career
- How inappropriate strategies create significant organizational pain without much compensating impact
- How written strategy drives organizational learning
- The costs of not writing strategy down
- How strategy supports personal learning and development, even in cases where you’re not empowered to “do strategy” yourself

By this chapter’s end, hopefully you will agree with me that strategy is an undertaking worth investing your – and your organization’s – time in.

This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts.

There’s always a strategy

I’ve never worked somewhere where people didn’t claim there was no strategy. In many of those companies, they’d say there was no engineering strategy. Once I became an executive and was able to document and distribute an engineering strategy, accusations of missing strategy didn’t go away; they just shifted to focus on a missing product or company strategy.

This even happened at companies that definitively had engineering strategies, like Stripe in 2016, which had numerous pillars to a clear engineering strategy, such as:

- Maintain backwards API compatibility, at almost any cost (e.g. force an upgrade from TLS 1.2 to TLS 1.3 to retain PCI compliance, but don’t force upgrades from the /v1/charges endpoint to the /v1/payment_intents endpoint)
- Work in Ruby in a monorepo, unless it’s the PCI environment, data processing, or data science work
- Engineers are fully responsible for the usability of their work, even when there are product or engineering managers involved

Working there it was generally clear what the company’s engineering strategy was on any given topic. That said, it sometimes required asking around, and over time certain decisions became sufficiently contentious that it became hard to definitively answer what the strategy was. For example, the adoption of Ruby versus Java became contentious enough that I distributed a strategy attempting to mediate the disagreement, Magnitudes of exploration, although it wasn’t a particularly successful effort (for reasons that are obvious in hindsight, particularly the lack of any enforcement mechanism).

In the same sense that William Gibson said “The future is already here – it’s just not very evenly distributed,” there is always a strategy embedded into an organization’s decisions, although in many organizations that strategy is only visible to a small group, and may be quickly forgotten. If you ever find yourself thinking that a strategy doesn’t exist, I’d encourage you to instead ask yourself where the strategy lives if you can’t find it.
Once you do find it, you may also find that the strategy is quite ineffective, but I’ve simply never found that it doesn’t exist.

Strategy is impactful

In “We are a product engineering company!”, we discuss Calm’s engineering strategy to address pervasive friction within the engineering team. The core of that strategy is clarifying how Calm makes major technology decisions, along with documenting the motivating goal steering those decisions: maximizing time and energy spent on creating their product.

That strategy reduced friction by eliminating the cause of ongoing debate. It was successful in resetting the team’s focus. It also caused several engineers to leave the company, because it was incompatible with their priorities. It’s easy to view that as a downside, but I don’t think it was. A clear, documented strategy made it clear to everyone involved what sort of game we were playing, the rules for that game, and for the first time let them accurately decide if they wanted to be part of that game with the wider team.

Creating alignment is one of the ways that strategy makes an impact, but it’s certainly not the only way. Some of the ways that strategies support the organizations that create them are:

- Concentrating company investment into a smaller space. For example, deciding not to decompose a monolith allows you to invest the majority of your tooling efforts on one language, one test suite, and one deployment mechanism.
- Unlocking interesting properties that are only available through universal adoption. For example, moving to an “N-1 policy” on backfilled roles is a significant opportunity for managing costs, but only works if consistently adopted. As another example, many strategies for disaster recovery or multi-region are only viable if all infrastructure has a common configuration mechanism.
- Focusing execution on what truly matters. For example, Uber’s service migration strategy allowed a four-engineer team to migrate a thousand services operated by two thousand engineers to a new provisioning and orchestration platform in less than a year. This was an extraordinarily difficult project, and was only possible because of clear thinking.
- Creating a knowledge repository of how your organization thinks. Onboarding new hires, particularly senior new hires, is much more effective with documented strategy. For example, most industry professionals today have a strongly held opinion on how to adopt large language models. New hires will have a strong opinion as well, but they’re unlikely to share your organization’s opinion unless there’s a clear document they can read to understand it.

There are some things that a strategy, even a cleverly written one, cannot do. However, it’s always been my experience that developing a strategy creates progress, even if the progress is understanding the inherent disagreement preventing agreement.

Inappropriate strategy is especially impactful

While good strategy can accomplish many things, it sometimes feels that inappropriate strategy is far more impactful. Of course, impactful in all the wrong ways. Digg V4 remains the worst-considered strategy I’ve personally participated in. It was a complete rewrite of the Digg V3.5 codebase from a PHP monolith to a PHP frontend and backend of a dozen Python services. It also moved the database from sharded MySQL to an early version of Cassandra. Perhaps worst, it replaced the nuanced algorithms developed over a decade with a hack implemented a few days before launch.
Although it’s likely Digg would have struggled to become profitable due to its reliance on search engine optimization for traffic, and Google’s frequently changing search algorithm of that era, the engineering strategy ensured we died fast rather than having an opportunity to dig our way out.

Importantly, it’s not just Digg. Almost every engineering organization you drill into will have its share of unused platform projects that captured decades of engineering years to the detriment of an important opportunity. A shocking number of senior leaders join new companies and initiate a grand migration that attempts to entirely rewrite the architecture, switch programming languages, or otherwise shift their new organization to resemble a prior organization where they understood things better.

Inappropriate versus bad

When I first wrote this section, I just labeled this sort of strategy as “bad.” The challenge with that term is that the same strategy might well be very effective in a different set of circumstances. For example, if Digg had been a three person company with no revenue, rewriting from scratch could have been the right decision! As a result, I’ve tried to prefer the term “inappropriate” rather than “bad” to avoid getting caught up on whether a given approach might work in other circumstances. Every approach undoubtedly works in some organization.

Written strategy drives organizational learning

When I joined Carta, I noticed we had an inconsistent approach to a number of important problems. Teams had distinct standard kits for how they approached new projects. Adoption of existing internal platforms was inconsistent, as was decision making around funding new internal platforms. There was widespread agreement that we were decomposing our monolith, but no agreement on how we were doing it.

Coming into such a permissive strategy environment, with strong, differing perspectives on the ideal path forward, one of my first projects was writing down an explicit engineering strategy along with our newly formed Navigators team, itself a part of our new engineering strategy.

Navigators at Carta

As discussed in Navigators, we developed a program at Carta to have explicitly named individual contributor technical leaders to represent key parts of the engineering organization. This representative leadership group made it possible to iterate on strategy with a small team of about ten engineers that represented the entire organization, rather than take on the impossible task of negotiating with 400 engineers directly.

This written strategy made it possible to explicitly describe the problems we saw, and how we wanted to navigate those problems. Further, it was an artifact that we were able to iterate on in a small group, but then share widely for feedback from teams we might have missed. After initial publishing, we shared it widely and talked about it frequently in engineering all-hands meetings. Then we came back to it each year, or when things stopped making much sense, and revised it.

As an example, our initial strategy didn’t talk about artificial intelligence at all. A few months later, we extended it to mention a very conservative approach to using Large Language Models. Most recently, we’ve revised the artificial intelligence portion again, as we dive deeply into agentic workflows. A lot of people have disagreed with parts of the strategy, which is great: that’s one of the key benefits of a written strategy: it’s possible to precisely disagree.
From that disagreement, we’ve been able to evolve our strategy. Sometimes because there’s new information, like the current rapid evolution of artificial intelligence practices, and other times because our initial approach could be improved, like how we gated membership of the initial Navigators team.

New hires are able to disagree too, and do it from an informed place rather than coming across as attached to their prior company’s practices. In particular, they’re able to understand the historical thinking that motivated our decisions, even when that context is no longer obvious. At the time we paused decomposition of our monolith, there was significant friction in service provisioning, but that’s far less true today, which makes the decision seem a bit arbitrary. Only the written document can consistently communicate that context across a growing, shifting, and changing organization. With oral history, what you believe is highly dependent on who you talk with, which shapes your view of history and the present. With written history, it’s far more possible to agree at scale, which is the prerequisite to growing at scale rather than isolating growth to small pockets of senior leadership.

The cost of implicit strategy

We just finished talking about written strategy, and this book spends a lot of time on this topic, including a chapter on how to structure strategies to maximize readability. It’s not just because of the positives created by written strategy, but also because of the damage unwritten strategy creates.

Vulnerable to misinterpretation. Information flow in verbal organizations depends on an individual being in a given room for a decision, and then accurately repeating that information to the others who need it. However, it’s common to see those individuals fail to repeat that information elsewhere. Sometimes their interpretation is also faulty to some degree. Both of these create significant problems in operating strategy.

Two-headed organizations

Some years ago, I started moving towards a model where most engineering organizations I worked with have two leaders: one who’s a manager, and another who is a senior engineer. This was partially to ensure engineering context was included in senior decision making, but it was also to reduce communication errors. Errors in point-to-point communication are so prevalent when done one-to-one, that the only solution I could find for folks who weren’t reading-oriented communicators was ensuring I had communicated strategy (and other updates) to at least two people.

Inconsistency across teams. At one company I worked in, promotions to Staff-plus roles happened at a much higher rate in the infrastructure engineering organization than the product engineering team. This created a constant drain out of product engineering to work on infrastructure-shaped problems, even if those problems weren’t particularly valuable to the business. New leaders had no idea this informal policy existed, and they would routinely run into trouble in calibration discussions. They also weren’t aware they needed to go argue for a better policy. Worse, no one was sure if this was a real policy or not, so it was ultimately random whether this perspective was represented for any given promotion: sometimes good promotions would be blocked, sometimes borderline cases would be approved.

Inconsistency over time. Implementing a new policy tends to be a mix of persistent and one-time actions.
For example, let’s say you wanted to standardize all HTTP operations to use the same library across your codebase. You might add a linter check to reject known alternatives, and you’ll probably do a one-time pass across your codebase standardizing on that library. However, two years later there are another three random HTTP libraries in your codebase, creeping into the cracks surrounding your linting. If the policy is written down, and a few people read it, then there are a number of ways this could nonetheless be prevented. If it’s not written down, it’s much less likely someone will remember, and much more likely they won’t remember the rationale well enough to argue about it.

Hazard to new leadership. When a new Staff-plus engineer or executive joins a company, it’s common to blame them for failing to understand the existing context behind decisions. That’s fair: a big part of senior leadership is uncovering and understanding context. It’s also unfair: explicit documentation of prior thinking would have made this much easier for them. Every particularly bad new-leader onboarding that I’ve seen has involved a new leader coming into an unfilled role, that the new leader’s manager didn’t know how to do. In those cases, success is entirely dependent on that new leader’s ability and interest in learning.

In most ways, the practice of documenting strategy has a lot in common with succession planning, where the full benefits accrue to the organization rather than to the individual doing it. It’s possible to maintain things when the original authors are present, but appreciating the value requires stepping outside yourself for a moment to value things that will matter most to the organization when you’re no longer a member.

Information herd immunity

A frequent objection to written strategy is that no one reads anything. There’s some truth to this: it’s extremely hard to get everyone in an organization to know something. However, I’ve never found that goal to be particularly important. My view of information dispersal in an organization is the same as herd immunity: you don’t need everyone to know something, just to have enough people who know something that confusion doesn’t propagate too far. So, it may be impossible for all engineers to know strategy details, but you certainly can have every Staff-plus engineer and engineering manager know those details.

Strategy supports personal learning

While I believe that the largest benefits of strategy accrue to the organization, rather than the individual creating it, I also believe that strategy is an underrated avenue for self-development. The ways that I’ve seen strategy support personal development are:

Creating strategy builds self-awareness. Starting with a concrete example, I’ve worked with several engineers who viewed themselves as extremely senior, but frequently demanded that projects were implemented using new programming languages or technologies because they personally wanted to learn about the technology. Their internal strategy was clear – they wanted to work on something fun – but following the steps to build an engineering strategy would have created a strategy that even they agreed didn’t make sense.

Strategy supports situational awareness in new environments. Wardley mapping talks a lot about situational awareness as a prerequisite to good strategy. This means ensuring you understand the realities of your circumstances, and failing to do so is the most destructive failure of new senior engineering leaders.
Explicitly stating the diagnosis where the strategy applied makes it easier for you to debug why reusing a prior strategy in a new team or company might not work.

Strategy as your personal archive. Just as documented strategy is institutional memory, it also serves as personal memory to understand the impact of your prior approaches. Each of us is an archivist of our prior work, pulling out the most valuable pieces to address the problem at hand. Over a long career, memory fades – and motivated reasoning creeps in – but explicit documentation doesn’t. Indeed, part of the reason I started working on this book now rather than later is that I realized I was starting to forget the details of the strategy work I did earlier in my career. If I wanted to preserve the wisdom of that era, and ensure I didn’t have to relearn the same lessons in the future, I had to write it now.

Summary

We’ve covered why strategy can be a valuable learning mechanism for both your engineering organization and for you. We’ve shown how strategies have helped organizations deal with service migrations, monolith decomposition, and right-sizing backfilling. We’ve also discussed how inappropriate strategy contributed to Digg’s demise. However, if I had to pick two things to emphasize as this chapter ends, it wouldn’t be any of those things. Rather, it would be two themes that I find are the most frequently ignored:

- There’s always a strategy, even if it isn’t written down.
- The single biggest act you can take to further strategy in your organization is to write down strategy so it can be debated, agreed upon, and explicitly evolved.

Discussions around topics like strategy often get caught up in high-prestige activities like making controversial decisions, but the most effective strategists I’ve seen make more progress by actually performing the basics: writing things down, exploring widely to see how other companies solve the same problem, accepting feedback into their draft from folks who disagree with them. Strategy is useful, and doing strategy can be simple, too.