More from alexwlchan
I’m trying something a bit different today – fiction. I had an idea for a short story the other evening, and I fleshed it out into a proper piece. I want to get better at writing fiction, and the only way to do that is with practice. I hope you like what I’ve written! When the fire starts, I am already running for the exit. When the fire starts, the world is thrown into sharp relief. I have worked in this theatre since it opened its doors. When the fire starts, my work begins – and in a way, it also ends. When the fire starts, they run beneath me. When the fire starts, they leave their bags behind. Their coats. Their tickets. They hear me, though I have no voice. When the fire starts, I know I will never leave. When the fire starts, I will keep running. I will always be running for the exit, because somebody must. A “running man” exit sign. Photo by Mateusz Dach on Pexels, used under the Pexels license. Hopefully it’s clear that this isn’t a story about a person, but about the “running man” who appears on emergency exit signs around the world. It’s an icon that was first devised by Japanese graphic designer Yukio Ota in 1970 and adopted as an international symbol in 1985. I was sitting in the theatre on Friday evening, waiting for the second half to start, and my eye was drawn to the emergency exit signs. It struck me that there’s a certain sort of tragedy to the running man – although he guides people to the exit, in a real fire his sign will be burnt to a crisp. I wrote the first draft on the train home, and I finished it today. I found the “when the fire starts” line almost immediately, but the early drafts were more vague about the protagonist. I thought it would be fun to be quite mysterious, to make it a shocking realisation that they’re actually a pictogram. I realised it was too subtle – I don’t think you’d necessarily work out who I was talking about. I rewrote it so you get the “twist” much earlier, and I think the concept still works. Another change in the second draft was the line breaks. I use semantic linebreaks in my source code, but they get removed in the rendered site. A paragraph gets compressed into a single line. That’s fine for most prose, but I realised I was losing something in this short story. Leaning into the line breaks highlights the repetition and the structure of the words, so I put them back. It gives the story an almost poetic quality. I’ve always been able to find stories in the everyday and the mundane – a pencil is a rocket ship, a plate is a building, a sock is a cave. The only surprising thing about this idea is that it’s taken me this long to turn the running man into a character in one of my stories. I really enjoyed writing this, so maybe you’ll see more short stories in the future. I have a lot of ideas, but not much experience turning them into written prose. Watch this space! [If the formatting of this post looks odd in your feed reader, visit the original article]
Yesterday I gave a talk at Monki Gras 2025. This year, the theme is Sustaining Software Development Craft, and here’s the description from the conference website: The big question we want to explore is – how can we keep doing the work we do, when it sustains us, provides meaning and purpose, and sometimes pays the bills? We’re in a period of profound change, technically, politically, socially, economically, which has huge implications for us as practitioners, the makers and doers, but also for the culture at large. I did a talk about the first decade of my career, which I’ve spent working on projects that are designed to last. I’m pleased with my talk, and I got a lot of nice comments. Monki Gras is always a pleasure to attend and speak at – it’s such a lovely, friendly vibe, and the organisers James Governor and Jessica West do a great job of making it a nice day. When I left yesterday, I felt warm and fuzzy and appreciated. I also have a front-row photo of me speaking, courtesy of my dear friend Eriol Fox. Naturally, I chose my outfit to match my slides (and this blog post!). Key points How do you create something that lasts? You can’t predict the future, but there are patterns in what lasts People skills sustain a career more than technical skills Long-lasting systems cannot grow without bound; they need weeding Links/recommended reading Sibyl Schaefer presented a paper Energy, Digital Preservation, and the Climate at iPres 2024, which is about how digital preservation needs to change in anticipation of the climate crisis. This was a major inspiration for this talk. Simon Willison gave a talk Coping strategies for the serial project hoarder at DjangoCon US in 2022, which is another inspiration for me. I’m not as prolific as Simon, but I do see parallels between his approach and what I remember of Metaswitch. Most of the photos in the talk come from the Flickr Commons, a collection of historical photographs from over 100 international cultural heritage organisations. You can learn more about the Commons, browse the photos, and see who’s involved using the Commons Explorer https://commons.flickr.org/. (Which I helped to build!) Slides and notes Photo: dry stone wall building in South Wales. Taken by Wikimedia Commons user TR001, used under CC BY‑SA 3.0. [Make introductory remarks; name and pronouns; mention slides on my website] I’ve been a software developer for ten years, and I’ve spent my career working on projects that are designed to last – first telecoms and networking, now cultural heritage – so when I heard this year’s theme “sustaining craft”, I thought about creating things that last a long time. The key question I want to address in this talk is how do you create something that lasts? I want to share a few thoughts I’ve had from working on decade- and century-scale projects. Part of this is about how we sustain ourselves as software developers, as the individuals who create software, especially with the skill threat of AI and the shifting landscape of funding software. I also want to go broader, and talk about how we sustain the craft, the skill, the projects. Let’s go through my career, and see what we can learn. Photo: women working at a Bell System telephone switchboard. From the U.S. National Archives, no known copyright restrictions. My first software developer job was at a company called Metaswitch. Not a household name, they made telecoms equipment, and you’d probably have heard of their customers. They sold equipment to carriers like AT&T, Vodafone, and O2, who’d use that equipment to sell you telephone service. Telecoms infrastructure is designed to last a long time. I spent most of my time at Metaswitch working with BGP, a routing protocol designed on a pair of napkins in 1989. BGP is sometimes known as the "two-napkin protocol", because of the two napkins on which Kirk Lougheed and Yakov Rekhter wrote the original design. From the Computer History Museum. These are those napkins. This design is basically still the backbone of the Internet. A lot of the building blocks of the telephone network and the Internet are fundamentally the same today as when they were created. I was working in a codebase that had been actively developed for most of my life, and was expected to outlast me. This was my first job so I didn’t really appreciate it at the time, but Metaswitch did a lot of stuff designed to keep that codebase going, to sustain it into the future. Let’s talk about a few of them. Photo: a programmer testing electronic equipment. From the San Diego Air & Space Museum Archives, no known copyright restrictions. Metaswitch was very careful about adopting new technologies. Most of their code was written in C, a little C++, and Rust was being adopted very slowly. They didn’t add new technology quickly. Anything they add, they have to support for a long time – so they wanted to pick technologies that weren’t a flash in the pan. I learnt about something called “the Lindy effect” – this is the idea that any technology is about halfway through its expected life. An open-source library that’s been developed for decades? That’ll probably be around a while longer. A brand new JavaScript framework? That’s a riskier long-term bet. The Lindy effect is about how software that’s been around a long time has already proven its staying power. And talking of AI specifically – I’ve been waiting for things to settle. There’s so much churn and change in this space, if I’d learnt a tool six months ago, most of that would be obsolete today. I don’t hate AI, I love that people are trying all these new tools – but I’m tired and I learning new things is exhausting. I’m waiting for things to calm down before really diving deep on these tools. Metaswitch was very cautious about third-party code, and they didn’t have much of it. Again, anything they use will have to be supported for a long time – is that third-party code, that open-source project stick around? They preferred to take the short-term hit of writing their own code, but then having complete control over it. To give you some idea of how seriously they took this: every third-party dependency had to be reviewed and vetted by lawyers before it could be added to the codebase. Imagine doing that for a modern Node.js project! They had a lot of safety nets. Manual and automated testing, a dedicated QA team, lots of checks and reviews. These were large codebases which had to be reliable. Long-lived systems can’t afford to “move fast and break things”. This was a lot of extra work, but it meant more stability, less churn, and not much risk of outside influences breaking things. This isn’t the only way to build software – Metaswitch is at one extreme of a spectrum – but it did seem to work. I think this is a lesson for building software, but also in what we choose to learn as individuals. Focusing on software that’s likely to last means less churn in our careers. If you learn the fundamentals of the web today, that knowledge will still be useful in five years. If you learn the JavaScript framework du jour? Maybe less so. How do you know what’s going to last? That’s the key question! It’s difficult, but it’s not impossible. This is my first thought for you all: you can’t predict the future, but there are patterns in what lasts. I’ve given you some examples of coding practices that can help the longevity of a codebase, these are just a few. Maybe I have rose-tinted spectacles, but I’ve taken the lessons from Metaswitch and brought them into my current work, and I do like them. I’m careful about external dependencies, I write a lot of my own code, and I create lots of safety nets, and stuff doesn’t tend to churn so much. My code lasts because it isn’t constantly being broken by external forces. Photo: a child in nursery school cutting a plank of wood with a saw. From the Community Archives of Belleville and Hastings County, no known copyright restrictions. So that’s what the smart people were doing at Metaswitch. What was I doing? I joined Metaswitch when I was a young and twenty-something graduate, so I knew everything. I knew software development was easy, these old fuddy-duddies were making it all far too complicated, and I was gonna waltz in and show them how it was done. And obviously, that happened. (Please imagine me reading that paragraph in a very sarcastic voice.) I started doing the work, and it was a lot harder than I expected – who knew that software development was difficult? But I was coming from a background as a solo dev who’d only done hobby projects. I’d never worked in a team before. I didn’t know how to say that I was struggling, to ask for help. I kept making bold promises about what I could do, based on how quickly I thought I should be able to do the work – but I was making promises my skills couldn’t match. I kept missing self-imposed deadlines. You can do that once, but you can’t make it a habit. About six months before I left, my manager said to me “Alex, you have a reputation for being unreliable”. Photo: a boy with a pudding bowl haircut, photographed by Elinor Wiltshire, 1964. From the National Library of Ireland, no known copyright restrictions. He was right! I had such a history of making promises that I couldn’t keep, people stopped trusting me. I didn’t get to work on interesting features or the exciting projects, because nobody trusted me to deliver. That was part of why I left that job – I’d ploughed my reputation into the ground, and I needed to reset. Photo: the library stores at Wellcome Collection. Taken by Thomas SG Farnetti used under CC BY‑NC 4.0. I got that reset at Wellcome Collection, a London museum and library that some of you might know. I was working a lot with their collections, a lot of data and metadata. Wellcome Collection is building on long tradition of libraries and archives, which go back thousands of years. Long-term thinking is in their DNA. To give you one example: there’s stuff in the archive that won’t be made public until the turn of the century. Everybody who works there today will be long gone, but they assume that those records will exist in some shape or form form when that time comes, and they’re planning for those files to eventually be opened. This is century-scale thinking. Photo: Bob Hoover. From the San Diego Air & Space Museum Archives, no known copyright restrictions. When I started, I sat next to a guy called Chris. (I couldn’t find a good picture of him, but I feel like this photo captures his energy.) Chris was a senior archivist. He’d been at Wellcome Collection about twenty-five years, and there were very few people – if anyone – who knew more about the archive than he did. He absolutely knew his stuff, and he could have swaggered around like he owned the place. But he didn’t. Something I was struck by, from my very first day, was how curious and humble he was. A bit of a rarity, if you work in software. He was the experienced veteran of the organisation, but he cared about what other people had to say and wanted to learn from them. Twenty-five years in, and he still wanted to learn. He was a nice guy. He was a pleasure to work with, and I think that’s a big part of why he was able to stay in that job as long as he did. We were all quite disappointed when he left for another job! This is my second thought for you: people skills sustain a career more than technical ones. Being a pleasure to work with opens so many doors and opportunities than technical skill alone cannot. We could do another conference just on what those people skills are, but for now I just want to give you a few examples to think about. Photo: Lt.(jg.) Harriet Ida Pickens and Ens. Frances Wills, first Negro Waves to be commissioned in the US Navy. From the U.S. National Archives, no known copyright restrictions. Be a respectful and reliable teammate. You want to be seen as a safe pair of hands. Reliability isn’t about avoiding mistakes, it’s about managing expectations. If you’re consistently overpromising and underdelivering, people stop trusting you (which I learnt the hard way). If you want people to trust you, you have to keep your promises. Good teammates communicate early when things aren’t going to plan, they ask for help and offer it in return. Good teammates respect the work that went before. It’s tempting to dismiss it as “legacy”, but somebody worked hard on it, and it was the best they knew how to do – recognise that effort and skill, don’t dismiss it. Listen with curiosity and intent. My colleague Chris had decades of experience, but he never acted like he knew everything. He asked thoughtful questions and genuinely wanted to learn from everyone. So many of us aren’t really listening when we’re “listening” – we’re just waiting for the next silence, where we can interject with the next thing we’ve already thought of. We aren’t responding to what other people are saying. When we listen, we get to learn, and other people feel heard – and that makes collaboration much smoother and more enjoyable. Finally, and this is a big one: don’t give people unsolicited advice. We are very bad at this as an industry. We all have so many opinions and ideas, but sometimes, sharing isn’t caring. Feedback is only useful when somebody wants to hear it – otherwise, it feels like criticism, it feels like an attack. Saying “um, actually” when nobody asked for feedback isn’t helpful, it just puts people on the defensive. Asking whether somebody wants feedback, and what sort of feedback they want, will go a long way towards it being useful. So again: people skills sustain a career more than technical skills. There aren’t many truly solo careers in software development – we all have to work with other people – for many of us, that’s the joy of it! If you’re a nice person to work with, other people will want to work with you, to collaborate on projects, they’ll offer you opportunities, it opens doors. Your technical skills won’t sustain your career if you can’t work with other people. Photo: "The Keeper", an exhibition at the New Museum in New York. Taken by Daniel Doubrovkine, used under CC BY‑NC‑SA 4.0. When I went to Wellcome Collection, it was my first time getting up-close and personal with a library and archive, and I didn’t really know how they worked. If you’d asked me, I’d have guessed they just keep … everything? And it was gently explained to me that “No Alex, that’s hoarding.” “Your overflowing yarn stash does not count as an archive.” Big collecting institutions are actually super picky – they have guidelines about what sort of material they collect, what’s in scope, what isn’t, and they’ll aggressively reject anything that isn’t a good match. At Wellcome Collection, their remit was “the history of health and human experience”. You have medical papers? Definitely interesting! Your dad’s old pile of car magazines? Less so. Photo: a dumpster full of books that have been discarded. From brewbooks on Flickr, used under CC BY‑SA 2.0. Collecting institutions also engage in the practice of “weeding” or “deaccessioning”, which is removing material, pruning the collection. For example, in lending libraries, books will be removed from the shelves if they’ve become old, damaged, or unpopular. They may be donated, or sold, or just thrown away – but whatever happens, they’re gotten rid of. That space is reclaimed for other books. Getting rid of material is a fundamental part of professional collecting, because professionals know that storing something has an ongoing cost. They know they can’t keep everything. Photo: a box full of printed photos. From Miray Bostancı on Pexels, used under the Pexels license. This is something I think about in my current job as well. I currently work at the Flickr Foundation, where we’re thinking about how to keep Flickr’s pictures visible for 100 years. How do we preserve social media, how do we maintain our digital legacy? When we talk to people, one thing that comes up regularly is that almost everybody has too many photos. Modern smartphones have made it so easy to snap, snap, snap, and we end up with enormous libraries with thousands of images, but we can’t find the photos we care about. We can’t find the meaningful memories. We’re collecting too much stuff. Digital photos aren’t expensive to store, but we feel the cost in other ways – the cognitive load of having to deal with so many images, of having to sift through a disorganised collection. Photo: a wheelbarrow in a garden. From Hans Middendorp on Pexels, used under the Pexels license. I think there’s a lesson here for the software industry. What’s the cost of all the code that we’re keeping? We construct these enormous edifices of code, but when do we turn things off? When do we delete code? We’re more focused on new code, new ideas, new features. I’m personally quite concerned by how much generative AI has focused on writing more code, and not on dealing with the code we already have. Code is text, so it’s cheap to store, but it still has a cost – it’s more cognitive load, more maintenance, more room for bugs and vulnerabilities. We can keep all our software forever, but we shouldn’t. Photo: Open Garbage Dump on Highway 112, North of San Sebastian. Taken by John Vachon, 1973. From the U.S. National Archives no known copyright restrictions. I think this is going to become a bigger issue for us. We live in an era of abundance, where we can get more computing resources at the push of a button. But that can’t last forever. What happens when our current assumptions about endless compute no longer hold? The climate crisis – where’s all our electricity and hardware coming from? The economics of AI – who’s paying for all these GPU-intensive workloads? And politics – how many of us are dependent on cloud computing based in the US? How many of us feel as good about that as we did three months ago? Libraries are good at making a little go a long way, about eking out their resources, about deciding what’s a good use of resources and what’s waste. Often the people who are good with money are the people who don’t have much of it, and we have a lot of money. It’s easier to make decisions about what to prune and what to keep when things are going well – it’s harder to make decisions in an emergency. This is my third thought for you: long-lasting systems cannot grow without bound; they need weeding. It isn’t sustainable to grow forever, because eventually you get overwhelmed by the weight of everything that came before. We need to get better at writing software efficiently, at turning things off that we don’t need. It’s a skill we’ve neglected. We used to be really good at it – when computers were the size of the room, programmers could eke out every last bit of performance. We can’t do that any more, but it’s so important when building something to last, and I think it’s a skill we’ll have to re-learn soon. Photo: Val Weaver and Vera Askew running in a relay race, Brisbane, 1939. From the State Library of Queensland no known copyright restrictions. Weeding is a term that comes from the preservation world, so let’s stay there. When you talk to people who work in digital preservation, we often describe it as a relay race. There is no permanent digital media, there’s no digital parchment or stone tablets – everything we have today will be unreadable in a few decades. We’re constantly migrating from one format to another, trying to stay ahead of obsolete technology. Software is also a bit of a relay race – there is no “write it once and you’re done”. We’re constantly upgrading, editing, improving. And that can be frustrating, but it also means have regular opportunities to learn and improve. We have that chance to reflect, to do things better. Photo: Broken computer monitor found in the woods. By Jeff Myers on Flickr, used under CC BY‑NC 2.0. I think we do our best reflections when computers go bust. When something goes wrong, we spring into action – we do retrospectives, root cause analysis, we work out what went wrong and how to stop it happening again. This is a great way to build software that lasts, to make it more resilient. It’s a period of intense reflection – what went wrong, how do we stop it happening again? What I’ve noticed is that the best systems are doing this sort of reflection all the time – they aren’t waiting for something to go wrong. They know that prevention is better than cure, and they embody it. They give themselves regular time to reflect, to think about what’s working and what’s not – and when we do, great stuff can happen. Photo: Statue of Astrid Lindgren. By Tobias Barz on Flickr, used under CC BY‑ND 2.0. I want to give you one more example. As a sidebar to my day job, I’ve been writing a blog for thirteen years. It’s the longest job – asterisk – I’ve ever had. The indie web is still cool! A lot of what I write, especially when I was starting, was sharing bits of code. “Here’s something I wrote, here’s what it does, here’s how it works and why it’s cool.” Writing about my code has been an incredible learning experience. You might know have heard the saying “ask a developer to review 5 lines of code, she’ll find 5 issues, ask her to review 500 lines and she’ll say it looks good”. When I sit back and deeply read and explain short snippets of my code, I see how to do things better. I get better at programming. Writing this blog has single-handedly had the biggest impact on my skill as a programmer. Photo: Midnight sun in Advent Bay, Spitzbergen, Norway. From the Library of Congress, no known copyright restrictions. There are so many ways to reflect on our work, opportunities to look back and ask how we can do better – but we have to make the most of them. I think we are, in some ways, very lucky that our work isn’t set in stone, that we do keep doing the same thing, that we have the opportunity to do better. Writing this talk has been, in some sense, a reflection on the first decade of my career, and it’s made me think about what I want the next decade to look like. In this talk, I’ve tried to distill some of those things, tried to give you some of the ideas that I want to keep, that I think will help my career and my software to last. Be careful about what you create, what you keep, and how you interact with other people. That care, that process of reflection – that is what creates things that last. [If the formatting of this post looks odd in your feed reader, visit the original article]
A week ago, somebody added malicious code to the tj-actions/changed-files GitHub Action. If you used the compromised action, it would leak secrets to your build log. Those build logs are public for public repositories, so anybody could see your secrets. Scary! Mutable vs immutable references This attack was possible because it’s common practice to refer to tags in a GitHub Actions workflow, for example: jobs: changed_files: ... steps: - name: Get changed files id: changed-files uses: tj-actions/changed-files@v2 ... At a glance, this looks like an immutable reference to an already-released “version 2” of this action, but actually this is a mutable Git tag. If somebody changes the v2 tag in the tj-actions/changed-files repo to point to a different commit, this action will run different code the next time it runs. If you specify a Git commit ID instead (e.g. a5b3abf), that’s an immutable reference that will run the same code every time. Tags vs commit IDs is a tradeoff between convenience and security. Specifying an exact commit ID means the code won’t change unexpectedly, but tags are easier to read and compare. Do I have any mutable references? I wasn’t worried about this particular attack because I don’t use tj-actions, but I was curious about what other GitHub Actions I’m using. I ran a short shell script in the folder where I have local clones of all my repos: find . -path '*/.github/workflows/*' -type f -name '*.yml' -print0 \ | xargs -0 grep --no-filename "uses:" \ | sed 's/\- uses:/uses:/g' \ | tr '"' ' ' \ | awk '{print $2}' \ | sed 's/\r//g' \ | sort \ | uniq --count \ | sort --numeric-sort This prints a tally of all the actions I’m using. Here’s a snippet of the output: 1 hashicorp/setup-terraform@v3 2 dtolnay/rust-toolchain@v1 2 taiki-e/create-gh-release-action@v1 2 taiki-e/upload-rust-binary-action@v1 4 actions/setup-python@v4 6 actions/cache@v4 9 ruby/setup-ruby@v1 31 actions/setup-python@v5 58 actions/checkout@v4 I went through the entire list and thought about how much I trust each action and its author. Is it from a large organisation like actions or ruby? They’re not perfect, but they’re likely to have good security procedures in place to protect against malicious changes. Is it from an individual developer or small organisation? Here I tend to be more wary, especially if I don’t know the author personally. That’s not to say that individuals can’t have good security, but there’s more variance in the security setup of random developers on the Internet than among big organisations. Do I need to use somebody else’s action, or could I write my own script to replace it? This is what I generally prefer, especially if I’m only using a small subset of the functionality offered by the action. It’s a bit more work upfront, but then I know exactly what it’s doing and there’s less churn and risk from upstream changes. I feel pretty good about my list. Most of my actions are from large organisations, and the rest are a few actions specific to my Rust command-line tools which are non-critical toys, where the impact of a compromised GitHub repo would be relatively slight. How this script works This is a classic use of Unix pipelines, where I’m chaining together a bunch of built-in text processing tools. Let’s step through how it works. find . -path '*/.github/workflows/*' -type f -name '*.yml' -print0 .yml in a folder like .github/workflows/. It prints a list of filenames, like: ./alexwlchan.net/.github/workflows/build_site.yml \0) between them, which makes it possible to split the filenames in the next step. By default it uses a newline, but a null byte is a bit safer, in case you have filenames which include newline characters. .yml as a file extension, but if you sometimes use .yaml, you can replace -name '*.yml' with \( -name '*.yml' -o -name '*.yaml' \) -path rules, like -not -path './cpython/*'. xargs -0 grep --no-filename "uses:" xargs to go through the filenames one-by-one. The `-0` flag tells it to split on the null byte, and then it runs grep to look for lines that include "uses:" – this is how you use an action in your workflow file. --no-filename option means this just prints the matching line, and not the name of the file it comes from. Not all of my files are formatted or indented consistently, so the output is quite messy: - uses: actions/checkout@v4 sed 's/\- uses:/uses:/g' \ uses: is the first key in the YAML dictionary. This sed command replaces "- uses:" with "uses:" to start tidying up the data. uses: actions/checkout@v4 sed is a pretty powerful tool for making changes to text, but I only know a couple of simple commands, like this pattern for replacing text: sed 's/old/new/g'. tr '"' ' ' uses: actions/checkout@v4 sed to make this substitution as well. I reached for tr because I've been using it for longer, and the syntax is simpler for doing single character substitutions: tr '<oldchar>' '<newchar>' awk '{print $2}' actions/checkout@v4 awk is another powerful text utility that I’ve never learnt properly – I only know how to print the nth word in a string. It has a lot of pattern-matching features I’ve never tried. sed 's/\r//g' \r), and those were included in the awk output. This command gets rid of them, which makes the data more consistent for the final step. sort | uniq --count | sort --numeric-sort tally. 6 actions/cache@v4 This step-by-step approach is how I build Unix text pipelines: I can write a step at a time, and gradually refine and tweak the output until I get the result I want. There are lots of ways to do it, and because this is a script I’ll use once and then discard, I don’t have to worry too much about doing it in the “purest” way – as long as it gets the right result, that’s good enough. If you use GitHub Actions, you might want to use this script to check your own actions, and see what you’re using. But more than that, I recommend becoming familiar with the Unix text processing tools and pipelines – even in the age of AI, they’re still a powerful and flexible way to cobble together one-off scripts for processing data. [If the formatting of this post looks odd in your feed reader, visit the original article]
I was building a small feature for the Flickr Commons Explorer today: show a random selection of photos from the entire collection. I wanted a fast and varied set of photos. This meant getting a random sample of rows from a SQLite table (because the Explorer stores all its data in SQLite). I’m happy with the code I settled on, but it took several attempts to get right. Approach #1: ORDER BY RANDOM() My first attempt was pretty naïve – I used an ORDER BY RANDOM() clause to sort the table, then limit the results: SELECT * FROM photos ORDER BY random() LIMIT 10 This query works, but it was slow – about half a second to sample a table with 2 million photos (which is very small by SQLite standards). This query would run on every request for the homepage, so that latency is unacceptable. It’s slow because it forces SQLite to generate a value for every row, then sort all the rows, and only then does it apply the limit. SQLite is fast, but there’s only so fast you can sort millions of values. I found a suggestion from Stack Overflow user Ali to do a random sort on the id column first, pick my IDs from that, and only fetch the whole row for the photos I’m selecting: SELECT * FROM photos WHERE id IN ( SELECT id FROM photos ORDER BY RANDOM() LIMIT 10 ) This means SQLite only has to load the rows it’s returning, not every row in the database. This query was over three times faster – about 0.15s – but that’s still slower than I wanted. Approach #2: WHERE rowid > (…) Scrolling down the Stack Overflow page, I found an answer by Max Shenfield with a different approach: SELECT * FROM photos WHERE rowid > ( ABS(RANDOM()) % (SELECT max(rowid) FROM photos) ) LIMIT 10 The rowid is a unique identifier that’s used as a primary key in most SQLite tables, and it can be looked up very quickly. SQLite automatically assigns a unique rowid unless you explicitly tell it not to, or create your own integer primary key. This query works by picking a point between the biggest and smallest rowid values used in the table, then getting the rows with rowids which are higher than that point. If you want to know more, Max’s answer has a more detailed explanation. This query is much faster – around 0.0008s – but I didn’t go this route. The result is more like a random slice than a random sample. In my testing, it always returned contiguous rows – 101, 102, 103, … – which isn’t what I want. The photos in the Commons Explorer database were inserted in upload order, so photos with adjacent row IDs were uploaded at around the same time and are probably quite similar. I’d get one photo of an old plane, then nine more photos of other planes. I want more variety! (This behaviour isn’t guaranteed – if you don’t add an ORDER BY clause to a SELECT query, then the order of results is undefined. SQLite is returning rows in rowid order in my table, and a quick Google suggests that’s pretty common, but that may not be true in all cases. It doesn’t affect whether I want to use this approach, but I mention it here because I was confused about the ordering when I read this code.) Approach #3: Select random rowid values outside SQLite Max’s answer was the first time I’d heard of rowid, and it gave me an idea – what if I chose random rowid values outside SQLite? This is a less “pure” approach because I’m not doing everything in the database, but I’m happy with that if it gets the result I want. Here’s the procedure I came up with: Create an empty list to store our sample. Find the highest rowid that’s currently in use: sqlite> SELECT MAX(rowid) FROM photos; 1913389 Use a random number generator to pick a rowid between 1 and the highest rowid: >>> import random >>> random.randint(1, max_rowid) 196476 If we’ve already got this rowid, discard it and generate a new one. (The rowid is a signed, 64-bit integer, so the minimum possible value is always 1.) Look for a row with that rowid: SELECT * FROM photos WHERE rowid = 196476 If such a row exists, add it to our sample. If we have enough items in our sample, we’re done. Otherwise, return to step 3 and generate another rowid. If such a row doesn’t exist, return to step 3 and generate another rowid. This requires a bit more code, but it returns a diverse sample of photos, which is what I really care about. It’s a bit slower, but still plenty fast enough (about 0.001s). This approach is best for tables where the rowid values are mostly contiguous – it would be slower if there are lots of rowids between 1 and the max that don’t exist. If there are large gaps in rowid values, you might try multiple missing entries before finding a valid row, slowing down the query. You might want to try something different, like tracking valid rowid values separately. This is a good fit for my use case, because photos don’t get removed from Flickr Commons very often. Once a row is written, it sticks around, and over 97% of the possible rowid values do exist. Summary Here are the four approaches I tried: Approach Performance (for 2M rows) Notes ORDER BY RANDOM() ~0.5s Slowest, easiest to read WHERE id IN (SELECT id …) ~0.15s Faster, still fairly easy to understand WHERE rowid > ... ~0.0008s Returns clustered results Random rowid in Python ~0.001s Fast and returns varied results, requires code outside SQL I’m using the random rowid in Python in the Commons Explorer, trading code complexity for speed. I’m using this random sample to render a web page, so it’s important that it returns quickly – when I was testing ORDER BY RANDOM(), I could feel myself waiting for the page to load. But I’ve used ORDER BY RANDOM() in the past, especially for asynchronous data pipelines where I don’t care about absolute performance. It’s simpler to read and easier to see what’s going on. Now it’s your turn – visit the Commons Explorer and see what random gems you can find. Let me know if you spot anything cool! [If the formatting of this post looks odd in your feed reader, visit the original article]
One rabbit hole I can never resist going down is finding the original creator of a piece of art. This sounds simple, but it’s often quite difficult. The Internet is a maze of social media accounts that only exist to repost other people’s art, usually with minimal or non-existent attribution. A popular image spawns a thousand copies, each a little further from the original. Signatures get cropped, creators’ names vanish, and we’re left with meaningless phrases like “no copyright intended”, as if that magically absolves someone of artistic theft. Why do I do this? I’ve always been a bit obsessive, a bit completionist. I’ve worked in cultural heritage for eight years, which has made me more aware of copyright and more curious about provenance. And it’s satisfying to know I’ve found the original source, that I can’t dig any further. This takes time. It’s digital detective work, using tools like Google Lens and TinEye, and it’s not always easy or possible. Sometimes the original pops straight to the top, but other times it takes a lot of digging to find the source of an image. So many of us have become accustomed to art as an endless, anonymous stream of “content”. A beautiful image appears in our feed, we give it a quick heart, and scroll on, with no thought for the human who sweated blood and tears to create it. That original artist feels distant, disconected. Whatever benefit they might get from the “exposure” of your work going viral, they don’t get any if their name has been removed first. I came across two examples recently that remind me it’s not just artists who miss out – it’s everyone who enjoys art. I saw a photo of some traffic lights on Tumblr. I love their misty, nighttime aesthetic, the way the bright colours of the lights cut through the fog, the totality of the surrounding darkness. But there was no name – somebody had just uploaded the image to their Tumblr page, it was reblogged a bunch of times, and then it appeared on my dashboard. Who took it? I used Google Lens to find the original photographer: Lucas Zimmerman. Then I discovered it was part of a series. And there was a sequel. I found interviews. Context. Related work. I found all this cool stuff, but only because I knew Lucas’s name. Traffic Lights, by Lucas Zimmerman. Published on Behance.net under a CC BY‑NC 4.0 license, and reposted here in accordance with that license. The second example was a silent video of somebody making tiny chess pieces, just captioned “wow”. It was clearly an edit of another video, with fast-paced cuts to make it accommodate a short attention span – and again with no attribution. This was a little harder to find – I had to search several frames in Google Lens before I found a summary on a Russian website, which had a link to a YouTube video by metalworker and woodworker Левша (Levsha). This video is four times longer than the cut-up version I found, in higher resolution, and with commentary from the original creator. I don’t speak Russian, but YouTube has auto-translated subtitles. Now I know how this amazing set was made, and I have a much better understanding of the materials and techniques involved. (This includes the delightful name Wenge wood, which I’d never heard before.) https://youtube.com/watch?v=QoKdDK3y-mQ A piece of art is more than just a single image or video. It’s a process, a human story. When art is detached from its context and creator, we lose something fundamental. Creators lose the chance to benefit from their work, and we lose the opportunity to engage with it in a deeper way. We can’t learn how it was made, find their other work, or discover how to make similar art for ourselves. The Internet has done many wonderful things for art, but it’s also a machine for endless copyright infringement. It’s not just about generative AI and content scraping – those are serious issues, but this problem existed long before any of us had heard of ChatGPT. It’s a thousand tiny paper cuts. How many of us have used an image from the Internet because it showed up in a search, without a second thought for its creator? When Google Images says “images may be subject to copyright”, how many of us have really thought about what that means? Next time you want to use an image from the web, look to see if it’s shared under a license that allows reuse, and make sure you include the appropriate attribution – and if not, look for a different image. Finding the original creator is hard, sometimes impossible. The Internet is full of shadows: copies of things that went offline years ago. But when I succeed, it feels worth the effort – both for the original artist and myself. When I read a book or watch a TV show, the credits guide me to the artists, and I can appreciate both them and the rest of their work. I wish the Internet was more like that. I wish the platforms we rely on put more emphasis on credit and attribution, and the people behind art. The next time an image catches your eye, take a moment. Who made this? What does it mean? What’s their story? [If the formatting of this post looks odd in your feed reader, visit the original article]
More in programming
Understand how computers represent numbers and perform operations at the bit level before diving into assembly
To be a successful founder, you have to believe that what you're working on is going to work — despite knowing it probably won't! That sounds like an oxymoron, but it's really not. Believing that what you're building is going to work is an essential component of coming to work with the energy, fortitude, and determination it's going to require to even have a shot. Knowing it probably won't is accepting the odds of that shot. It's simply the reality that most things in business don't work out. At least not in the long run. Most businesses fail. If not right away, then eventually. Yet the world economy is full of entrepreneurs who try anyway. Not because they don't know the odds, but because they've chosen to believe they're special. The best way to balance these opposing points — the conviction that you'll make it work, the knowledge that it probably won't — is to do all your work in a manner that'll make you proud either way. If it doesn't work, you still made something you wouldn't be ashamed to put your name on. And if it does work, you'll beam with pride from making it on the basis of something solid. The deep regret from trying and failing only truly hits when you look in the mirror and see Dostoevsky staring back at you with this punch to the gut: "Your worst sin is that you have destroyed and betrayed yourself for nothing." Oof. Believe it's going to work. Build it in a way that makes you proud to sign it. Base your worth on a human on something greater than a business outcome.
I recently went into a deep dive on “UART” and will publish a much longer article on the topic. This is just a recap of the basics to help put things in context. Many tutorials focus on using UART over USB, which adds many layers of abstraction, hiding what it actually is. Here, I deliberately … Continue reading How to use “real” UART → The post How to use “real” UART appeared first on Quentin Santos.
You know about Critical Race Theory, right? It says that if there’s an imbalance in, say, income between races, it must be due to discrimination. This is what wokism seems to be, and it’s moronic and false. The right wing has invented something equally stupid. Introducing Critical Trade Theory, stolen from this tweet. If there’s an imbalance in trade between countries, it must be due to unfair practices. (not due to the obvious, like one country is 10x richer than the other) There’s really only one way the trade deficits will go away, and that’s if trade goes to zero (or maybe if all these countries become richer than America). Same thing with the race deficits, no amount of “leg up” bullshit will change them. Why are all the politicians in America anti-growth anti-reality idiots who want to drive us into the poor house? The way this tariff shit is being done is another stupid form of anti-merit benefits to chosen groups of people, with a whole lot of grift to go along with it. Makes me just not want to play.
One of the most memorable quotes in Arthur Miller’s The Death of a Salesman comes from Uncle Ben, who describes his path to becoming wealthy as, “When I was seventeen, I walked into the jungle, and when I was twenty-one I walked out. And by God I was rich.” I wish I could describe the path to learning engineering strategy in similar terms, but by all accounts it’s a much slower path. Two decades in, I am still learning more from each project I work on. This book has aimed to accelerate your learning path, but my experience is that there’s still a great deal left to learn, despite what this book has hoped to accomplish. This final chapter is focused on the remaining advice I have to give on how you can continue to improve at strategy long after reading this book’s final page. Inescapably, this chapter has become advice on writing your own strategy for improving at strategy. You are already familiar with my general suggestions on creating strategy, so this chapter provides focused advice on creating your own plan to get better at strategy. It covers: Exploring strategy creation to find strategies you can learn from via public and private resources, and through creating learning communities How to diagnose the strategies you’ve found, to ensure you learn the right lessons from each one Policies that will help you find ways to perform and practice strategy within your organization, whether or not you have organizational authority Operational mechanisms to hold yourself accountable to developing a strategy practice My final benediction to you as a strategy practitioner who has finished reading this book With that preamble, let’s write this book’s final strategy: your personal strategy for developing your strategy practice. This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts. Exploring strategy creation Ideally, we’d start our exploration of how to improve at engineering strategy by reading broadly from the many publicly available examples. Unfortunately, there simply aren’t many easily available works to learn from others’ experience. Nonetheless, resources do exist, and we’ll discuss the three categories that I’ve found most useful: Public resources on engineering strategy, such as companies’ engineering blogs Private and undocumented strategies available through your professional network Learning communities that you build together, including ongoing learning circles Each of these is explored in its own section below. Public resources While there aren’t as many public engineering strategy resources as I’d like, I’ve found that there are still a reasonable number available. This book collects a number of such resources in the appendix of engineering strategy resources. That appendix also includes some individuals’ blog posts that are adjacent to this topic. You can go a long way by searching and prompting your way into these resources. As you read them, it’s important to recognize that public strategies are often misleading, as discussed previously in evaluating strategies. Everyone writing in public has an agenda, and that agenda often means that they’ll omit important details to make themselves, or their company, come off well. Make sure you read through the lines rather than taking things too literally. Private resources Ironically, where public resources are hard to find, I’ve found it much easier to find privately held strategy resources. While private recollections are still prone to inaccuracies, the incentives to massage the truth are less pronounced. The most useful sources I’ve found are: peers’ stories – strategies are often oral histories, and they are shared freely among peers within and across companies. As you build out your professional network, you can usually get access to any company’s engineering strategy on any topic by just asking. There are brief exceptions. Even a close peer won’t share a sensitive strategy before its existence becomes obvious externally, but they’ll be glad to after it does. People tend to over-estimate how much information companies can keep private anyway: even reading recent job postings can usually expose a surprising amount about a company. internal strategy archaeologists – while surprisingly few companies formally collect their strategies into a repository, the stories are informally collected by the tenured members of the organization. These folks are the company’s strategy archaeologists, and you can learn a great deal by explicitly consulting them becoming a strategy archaeologist yourself – whether or not you’re a tenured member of your company, you can learn a tremendous amount by starting to build your own strategy repository. As you start collecting them, you’ll interest others in contributing their strategies as well. As discussed in Staff Engineer’s section on the Write five then synthesize approach to strategy, over time you can foster a culture of documentation where one didn’t exist before. Even better, building that culture doesn’t require any explicit authority, just an ongoing show of excitement. There are other sources as well, ranging from attending the hallway track in conferences to organizing dinners where stories are shared with a commitment to privacy. Working in community My final suggestion for seeing how others work on strategy is to form a learning circle. I formed a learning circle when I first moved into an executive role, and at this point have been running it for more than five years. What’s surprised me the most is how much I’ve learned from it. There are a few reasons why ongoing learning circles are exceptional for sharing strategy: Bi-directional discussion allows so much more learning and understanding than mono-directional communication like conference talks or documents. Groups allow you to learn from others’ experiences and others’ questions, rather than having to guide the entire learning yourself. Continuity allows you to see the strategy at inception, during the rollout, and after it’s been in practice for some time. Trust is built slowly, and you only get the full details about a problem when you’ve already successfully held trust about smaller things. An ongoing group makes this sort of sharing feasible where a transient group does not. Although putting one of these communities together requires a commitment, they are the best mechanism I’ve found. As a final secret, many people get stuck on how they can get invited to an existing learning circle, but that’s almost always the wrong question to be asking. If you want to join a learning circle, make one. That’s how I got invited to mine. Diagnosing your prior and current strategy work Collecting strategies to learn from is a valuable part of learning. You also have to determine what lessons to learn from each strategy. For example, you have to determine whether Calm’s approach to resourcing Engineering-driven projects is something to copy or something to avoid. What I’ve found effective is to apply the strategy rubric we developed in the “Is this strategy any good?” chapter to each of the strategies you’ve collected. Even by splitting a strategy into its various phases, you’ll learn a lot. Applying the rubric to each phase will teach you more. Each time you do this to another strategy, you’ll get a bit faster at applying the rubric, and you’ll start to see interesting, recurring patterns. As you dig into a strategy that you’ve split into phases and applied the evaluation rubric to, here are a handful of questions that I’ve found interesting to ask myself: How long did it take to determine a strategy’s initial phase could be improved? How high was the cost to fund that initial phase’s discovery? Why did the strategy reach its final stage and get repealed or replaced? How long did that take to get there? If you had to pick only one, did this strategy fail in its approach to exploration, diagnosis, policy or operations? To what extent did the strategy outlive the tenure of its primary author? Did it get repealed quickly after their departure, did it endure, or was it perhaps replaced during their tenure? Would you generally repeat this strategy, or would you strive to avoid repeating it? If you did repeat it, what conditions seem necessary to make it a success? How might you apply this strategy to your current opportunities and challenges? It’s not necessary to work through all of these questions for every strategy you’re learning from. I often try to pick the two that I think might be most interesting for a given strategy. Policy for improving at strategy At a high level, there are just a few key policies to consider for improving your strategic abilities. The first is implementing strategy, and the second is practicing implementing strategy. While those are indeed the starting points, there are a few more detailed options worth consideration: If your company has existing strategies that are not working, debug one and work to fix it. If you lack the authority to work at the company scope, then decrease altitude until you find an altitude you can work at. Perhaps setting Engineering organizational strategies is beyond your circumstances, but strategy for your team is entirely accessible. If your company has no documented strategies, document one to make it debuggable. Again, if operating at a high altitude isn’t attainable for some reason, operate at a lower altitude that is within reach. If your company’s or team’s strategies are effective but have low adoption, see if you can iterate on operational mechanisms to increase adoption. Many such mechanisms require no authority at all, such as low-noise nudges or the model-document-share approach. If existing strategies are effective and have high adoption, see if you can build excitement for a new strategy. Start by mining for which problems Staff-plus engineers and senior managers believe are important. Once you find one, you have a valuable strategy vein to start mining. If you don’t feel comfortable sharing your work internally, then try writing proposals while only sharing them to a few trusted peers. You can even go further to only share proposals with trusted external peers, perhaps within a learning circle that you create or join. Trying all of these at once would be overwhelming, so I recommend picking one in any given phase. If you aren’t able to make traction, then try another until something works. It’s particularly important to recognize in your diagnosis where things are not working–perhaps you simply don’t have the sponsorship you need to enforce strategy so you need to switch towards suggesting strategies instead–and you’ll find something that works. What if you’re not allowed to do strategy? If you’re looking to find one, you’ll always unearth a reason why it’s not possible to do strategy in your current environment. If you’ve convinced yourself that there’s simply no policy that would allow you to do strategy in your current role, then the two most useful levers I’ve found are: Lower your altitude – there’s always a scale where you can perform strategy, even if it’s just your team or even just yourself. Only you can forbid yourself from developing personal strategies. Practice rather than perform – organizations can only absorb so much strategy development at a given time, so sometimes they won’t be open to you doing more strategy. In that case, you should focus on practicing strategy work rather than directly performing it. Only you can stop yourself from practice. Don’t believe the hype: you can always do strategy work. Operating your strategy improvement policies As the refrain goes, even the best policies don’t accomplish much if they aren’t paired with operational mechanisms to ensure the policies actually happen, and debug why they aren’t happening. Although it’s tempting to ignore operations when it comes to our personal habits, I think that would be a mistake: our personal habits have the most significant long-term impact on ourselves, and are the easiest habits to ignore since others generally won’t ask about them. The mechanisms I’d recommend: Explicitly track the strategies that you’ve implemented, refined, documented, or read. This should be in a document, spreadsheet or folder where you can explicitly see if you have or haven’t done the work. Review your tracked strategies every quarter: are you working on the expected number and in the expected way? If not, why not? Ideally, your review should be done in community with a peer or a learning circle. It’s too easy to deceive yourself, it’s much harder to trick someone else. If your periodic review ever discovers that you’re simply not doing the work you expected, sit down for an hour with someone that you trust–ideally someone equally or more experienced than you–and debug what’s going wrong. Commit to doing this before your next periodic review. Tracking your personal habits can feel a bit odd, but it’s something I highly recommend. I’ve been setting and tracking personal goals for some time now—for example, in my 2024 year in review—and have benefited greatly from it. Too busy for strategy Many companies convince themselves that they’re too much in a rush to make good decisions. I’ve certainly gotten stuck in this view at times myself, although at this point in my career I find it increasingly difficult to not recognize that I have a number of tools to create time for strategy, and an obligation to do strategy rather than inflict poor decisions on the organizations I work in. Here’s my advice for creating time: If you’re not tracking how often you’re creating strategies, then start there. If you’ve not worked on a single strategy in the past six months, then start with one. If implementing a strategy has been prohibitively time consuming, then focus on practicing a strategy instead. If you do try all those things and still aren’t making progress, then accept your reality: you don’t view doing strategy as particularly important. Spend some time thinking about why that is, and if you’re comfortable with your answer, then maybe this is a practice you should come back to later. Final words At this point, you’ve read everything I have to offer on drafting engineering strategy. I hope this has refined your view on what strategy can be in your organization, and has given you the tools to draft a more thoughtful future for your corner of the software engineering industry. What I’d never ask is for you to wholly agree with my ideas here. They are my best thinking on this topic, but strategy is a topic where I’m certain Hegel’s world view is the correct one: even the best ideas here are wrong in interesting ways, and will be surpassed by better ones.