More from Made of Bugs
About a month ago, the CPython project merged a new implementation strategy for their bytecode interpreter. The initial headline results were very impressive, showing a 10-15% performance improvement on average across a wide range of benchmarks across a variety of platforms. Unfortunately, as I will document in this post, these impressive performance gains turned out to be primarily due to inadvertently working around a regression in LLVM 19. When benchmarked against a better baseline (such as GCC, clang-18, or LLVM 19 with certain tuning flags), the performance gain drops to 1-5% or so depending on the exact setup.
Suppose we have a large collection of documents, and we wish to identify which documents are approximately the same as each other. For instance, we may have crawled the web over some period of time, and expect to have fetched the “same page” several times, but to see slight differences in metadata, or that we have several revisions of a page following small edits. In this post I want to explore the method of approximate deduplication via Jaccard similarity and the MinHash approximation trick; a small illustrative sketch follows this list of posts.
I worked at Stripe for about seven years, from 2012 to 2019. Over that time, I used and contributed to many generations of Stripe’s developer environment – the tools that engineers used daily to write and test code. I think Stripe did a pretty good job designing and building that developer experience, and since leaving, I’ve found myself repeatedly describing features of that environment to friends and colleagues. This post is an attempt to record the salient features of that environment as I remember it.
I was recently introduced to the paper “Seeing the Invisible: Perceptual-Cognitive Aspects of Expertise” by Gary Klein and Robert Hoffman. It’s excellent and I recommend you read it when you have a chance. Klein and Hoffman discuss the ability of experts to “see what is not there”: in addition to observing data and cues that are present in the environment, experts perceive implications of these cues, such as the absence of expected or “typical” information, the typicality or atypicality of observed data, and likely/possible past and future time trajectories of a system based on a point-in-time snapshot or limited duration of observation.
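As a rough illustration of the Jaccard/MinHash idea mentioned above (this sketch is not taken from the post itself), here is a minimal Python example. It assumes word 3-shingles as the document features and salted blake2b hashes as the hash family; the shingle size, hash count, and hashing scheme are all illustrative choices rather than anything the post prescribes.

```python
# A minimal sketch of Jaccard similarity and its MinHash approximation.
# Assumptions (not from the post): word 3-shingles as features, and
# blake2b with per-permutation salts standing in for the hash family.
import hashlib


def shingles(text: str, k: int = 3) -> set[str]:
    """Split a document into overlapping k-word shingles."""
    words = text.split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a | b else 1.0


def minhash_signature(s: set[str], num_hashes: int = 128) -> list[int]:
    """Keep the minimum hash value of the set under each salted hash;
    two sets share a given minimum with probability equal to their
    Jaccard similarity."""
    sig = []
    for salt in range(num_hashes):
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(x.encode(), salt=salt.to_bytes(8, "little"))
                .digest()[:8],
                "little",
            )
            for x in s
        ))
    return sig


def minhash_estimate(sig_a: list[int], sig_b: list[int]) -> float:
    """Estimated Jaccard similarity: fraction of matching signature slots."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Example: two near-duplicate documents.
a = shingles("the quick brown fox jumps over the lazy dog")
b = shingles("the quick brown fox jumped over the lazy dog")
print(jaccard(a, b), minhash_estimate(minhash_signature(a), minhash_signature(b)))
```

The estimate tightens as the number of hash functions grows, at the cost of more hashing work per document; the point of the trick is that the fixed-size signatures, not the full shingle sets, are what get compared at scale.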