Back in January of 2012, Russ Cox posted an excellent blog post detailing how Google Code Search had worked, using a trigram index. By that point, I’d already implemented early versions of my own livegrep source-code search engine, using a different indexing approach that I developed independently, with input from a few friends. This post is my long-overdue writeup of how it works.

Suffix Arrays

A suffix array is a data structure used for full-text search and other applications, primarily these days in the field of bioinformatics.
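
To make the idea concrete: a suffix array for a text is the list of starting offsets of all of its suffixes, sorted by the suffixes they point at. Every occurrence of a substring is a prefix of some suffix, so all matches sit in one contiguous run of the array and can be found with binary search. The sketch below is a minimal illustration in Python under those definitions, not livegrep's actual implementation; the function names `build_suffix_array` and `find_occurrences` are hypothetical, and the construction is deliberately naive.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start offsets of all suffixes of `text`, in sorted order."""
    # Naive construction: sort offsets by the suffix that starts there.
    # This is quadratic or worse in time and memory, so it only suits a demo.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start offsets at which `pattern` occurs in `text`."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Everything in sa[start:lo] has `pattern` as a prefix.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

Each lookup costs O(m log n) character comparisons for a pattern of length m over a text of length n, independent of how many times the pattern occurs; reporting the matches is then a single slice of the array.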
over a year ago


More from Posts on Made of Bugs

Performance of the Python 3.14 tail-call interpreter

About a month ago, the CPython project merged a new implementation strategy for their bytecode interpreter. The initial headline results were very impressive, showing a 10-15% performance improvement on average across a wide range of benchmarks on a variety of platforms. Unfortunately, as I will document in this post, these impressive performance gains turned out to be primarily due to inadvertently working around a regression in LLVM 19. When benchmarked against a better baseline (such as GCC, clang-18, or LLVM 19 with certain tuning flags), the performance gain drops to 1-5% or so depending on the exact setup.

a month ago 21 votes
Building personal software with Claude

Earlier this month, I used Claude to port (parts of) an Emacs package into Rust, shrinking the execution time by a factor of 1000 or more (in one concrete case: from 90s to about 15ms). This is a variety of yak-shave that I do somewhat routinely, both professionally and in service of my personal computing environment. However, this time, Claude was able to execute substantially the entire project under my supervision without my writing almost any lines of code, speeding up the project substantially compared to doing it by hand.

2 months ago 32 votes
Finding near-duplicates with Jaccard similarity and MinHash

Suppose we have a large collection of documents, and we wish to identify which documents are approximately the same as each other. For instance, we may have crawled the web over some period of time, and expect to have fetched the “same page” several times, but to see slight differences in metadata, or that we have several revisions of a page following small edits. In this post I want to explore the method of approximate deduplication via Jaccard similarity and the MinHash approximation trick.

9 months ago 27 votes
Stripe's monorepo developer environment

I worked at Stripe for about seven years, from 2012 to 2019. Over that time, I used and contributed to many generations of Stripe’s developer environment – the tools that engineers used daily to write and test code. I think Stripe did a pretty good job designing and building that developer experience, and since leaving, I’ve found myself repeatedly describing features of that environment to friends and colleagues. This post is an attempt to record the salient features of that environment as I remember it.

10 months ago 20 votes
Performance engineering, profilers, and seeing the invisible

I was recently introduced to the paper “Seeing the Invisible: Perceptual-Cognitive Aspects of Expertise” by Gary Klein and Robert Hoffman. It’s excellent and I recommend you read it when you have a chance. Klein and Hoffman discuss the ability of experts to “see what is not there”: in addition to observing data and cues that are present in the environment, experts perceive implications of these cues, such as the absence of expected or “typical” information, the typicality or atypicality of observed data, and likely/possible past and future time trajectories of a system based on a point-in-time snapshot or limited duration of observation.

a year ago 24 votes
