More from samwho.dev
header h1 { padding: 0; margin-top: 0.2rem; margin-bottom: 1rem; } button { margin: 0.5rem; padding: 0.5rem 1rem; background-color: #444444; color: white; border: none; border-radius: 8px; cursor: pointer; touch-action: manipulation; user-select: none; } button:hover:not(:disabled) { background-color: #555555; touch-action: manipulation; user-select: none; } button:disabled { filter: opacity(0.5); cursor: not-allowed; touch-action: manipulation; user-select: none; } input[type="range"] { width: 100%; margin: 1rem 0; } .control { display: flex; justify-content: center; align-items: center; margin: 1rem 0; } .control label { margin-right: 1rem; white-space: nowrap; } .control input[type="range"] { flex-grow: 1; } .odds { font-size: 1.2rem; font-family: var(--code-font); font-weight: bold; text-align: center; display: flex; justify-content: center; gap: 2rem; margin: 1rem 0; } .odds .hold { display: flex; flex-direction: column; align-items: center; } .odds .replace { display: flex; flex-direction: column; align-items: center; } .odds .hold .value { color: var(--palette-dark-blue); white-space: nowrap; } .odds .replace .value { color: var(--palette-red); white-space: nowrap; } article input { --c: var(--palette-orange); --l: 2px; --h: 30px; --w: 30px; width: 100%; height: var(--h); -webkit-appearance :none; -moz-appearance :none; appearance :none; background: none; cursor: pointer; overflow: hidden; } article input:focus-visible, article input:hover{ --p: 25%; } article input[type="range" i]::-webkit-slider-thumb{ height: var(--h); width: var(--w); aspect-ratio: 1; border-radius: 50%; background: var(--c); border-image: linear-gradient(90deg,var(--c) 50%,#ababab 0) 0 1/calc(50% - var(--l)/2) 100vw/0 100vw; -webkit-appearance: none; appearance: none; box-shadow: none; transition: .3s; } article input[type="range"]::-moz-range-thumb { --h: 25px; --w: 25px; height: var(--h); width: var(--w); aspect-ratio: 1; border-radius: 50%; background: var(--c); border-image: linear-gradient(90deg,var(--c) 50%,#ababab 0) 0 1/calc(50% - var(--l)/2) 100vw/0 100vw; -webkit-appearance: none; appearance: none; box-shadow: none; transition: .3s; } img.hero { --width: 200px; width: var(--width); max-width: var(--width); margin-top: calc(var(--width) * -0.5); margin-bottom: 1rem; } Reservoir Sampling Reservoir sampling is a technique for selecting a fair random sample when you don't know the size of the set you're sampling from. By the end of this essay you will know: When you would need reservoir sampling. The mathematics behind how it works, using only basic operations: subtraction, multiplication, and division. No math notation, I promise. A simple way to implement reservoir sampling if you want to use it. ittybit, and their API for working with videos, images, and audio. If you need to store, encode, or get intelligence from the media files in your app, check them out! # Sampling when you know the size In front of you are 10 playing cards and I ask you to pick 3 at random. How do you do it? The first technique that might come to mind from your childhood is to mix them all up in the middle. Then you can straighten them out and pick the first 3. You can see this happen below by clicking "Shuffle." Every time you click "Shuffle," the chart below tracks what the first 3 cards were. At first you'll notice some cards are selected more than others, but if you keep going it will even out. All cards have an equal chance of being selected. This makes it "fair." Click "Shuffle 100 times" until the chart evens out. You can reset the chart if you'd like to start over. This method works fine with 10 cards, but what if you had 1 million cards? Mixing those up won't be easy. Instead, we could use a random number generator to pick 3 indices. These would be our 3 chosen cards. We no longer have to move all of the cards, and if we click the "Select" button enough times we'll see that this method is just as fair as the mix-up method. I'm stretching the analogy a little here. It would take a long time to count through the deck to get to, say, index 436,234. But when it's an array in memory, computers have no trouble finding an element by its index. Now let me throw you a curveball: what if I were to show you 1 card at a time, and you had to pick 1 at random? That's the curveball: you don't know. No, you can only hold on to 1 card at a time. You're free to swap your card with the newest one each time I show you a card, but you can only hold one and you can't go back to a card you've already seen. Believe it or not, this is a real problem and it has a real and elegant solution. For example, let's say you're building a log collection service. Text logs, not wooden ones. This service receives log messages from other services and stores them so that it's easy to search them in one place. One of the things you need to think about when building a service like this is what do you do when another service starts sending you way too many logs. Maybe it's a bad release, maybe one of your videos goes viral. Whatever the reason, it threatens to overwhelm your log collection service. Let's simulate this. Below you can see a stream of logs that experiences periodic spikes. A horizontal line indicates the threshold of logs per second that the log collection service can handle, which in this example is 5 logs per second. You can see that every so often, logs per second spikes above the threshold . One way to deal with this is "sampling." Deciding to send only a fraction of the logs to the log collection service. Let's send 10% of the logs. Below we will see the same simulation again, but this time logs that don't get sent to our log collection service will be greyed out. The graph has 2 lines: a black line tracks sent logs , the logs that are sent to our log collection service, and a grey line tracks total logs . The rate of sent logs never exceeds the threshold , so we never overwhelm our log collection service. However, in the quieter periods we're throwing away 90% of the logs when we don't need to! What we really want is to send at most 5 logs per second. This would mean that during quiet periods you get all the logs, but during spikes you discard logs to protect the log collection service. The simple way to achieve this would be to send the first 5 logs you see each second, but this isn't fair. You aren't giving all logs an equal chance of being selected. # Sampling when you don't know the size We instead want to pick a fair sample of all the logs we see each second. The problem is that we don't know how many we will see. Reservoir sampling is an algorithm that solves this exact problem. You could, but why live with that uncertainty? You'd be holding on to an unknown number of logs in memory. A sufficiently big spike could cause you problems. Reservoir sampling solves this problem, and does so without ever using more memory than you ask it to. Let's go back to our curveball of me showing you 1 card at a time. Here's a recap of the rules: I'll draw cards one at a time from a deck. Each time I show you a card, you have to choose to hold it or discard it. If you were already holding a card, you discard your held card before replacing it with the new card. At any point I can stop drawing cards and whatever card you're holding is the one you've chosen. How would you play this game in a way that ensures all cards have been given an equal chance to be selected when I decide to stop? You're on the right track. Let's have a look at how the coin flip idea plays out in practice. Below you see a deck of cards. Clicking "Deal" will draw a card and 50% of the time it will go to the discard pile on the right, and 50% of the time it will become your held card in the center, with any previously held card moving to the discard pile. The problem is that while the hold vs discard counts are roughly equal, which feels fair, later cards are much more likely to be held when I stop than earlier cards. The first card drawn has to win 10 coin flips to still be in your hand after the 10th card is drawn. The last card only has to win 1. Scrub the slider below to see how the chances change as we draw more cards. Each bar represents a card in the deck, and the height of the bar is the chance we're holding that card when I stop. Below the slider are the chances we're holding the first card drawn vs. the last card drawn. Anything older than 15 cards ago is has a less than 0.01% chance of being held when I stop. Because believe it or not, we only have to make one small change to this idea to make it fair. Instead of flipping a coin to decide if we'll hold the card or not, instead we give each new card a 1/n chance of being held, where n is the number of cards we've seen so far. Yep! In order to be fair, every card must have an equal chance of being selected. So for the 2nd card, we want both cards to have a 1/2 chance. For the 3rd card, we want all 3 cards to have a 1/3 chance. For the 4th card, we want all 4 cards to have a 1/4 chance, and so on. So if we use 1/n for the new card, we can at least say that the new card has had a fair shot. Let's have a look at the chances as you draw more cards with this new method. new card has the right chance of being selected, but how does that make the older cards fair? So far we've focused on the chance of the new card being selected, but we also need to consider the chance of the card you're holding staying in your hand. Let's walk through the numbers. # Card 1 The first card is easy: we're not holding anything, so we always choose to hold the first card. The chance we're holding this card is 1/1, or 100%. # Card 2 This time we have a real choice. We can keep hold of the card we have, or replace it with the new one. We've said that we're going to do this with a 1/n chance, where n is the number of cards we've seen so far. So our chance of replacing the first card is 1/2, or 50%, and our chance of keeping hold of the first card is its chance of being chosen last time multiplied by its chance of being replaced, so 100% * 1/2, which is again 50%. # Card 3 The card we're holding has a 50% chance of being there. This is true regardless what happened up to this point. No matter whether we're holding card 1 or card 2, it's 50%. The new card has a 1/3 chance of being selected, so the card we're holding has a 1/3 chance of being replaced. This means that our held card has a 2/3 chance of remaining held. So its chances of "surviving" this round are 50% * 2/3. # Card N This pattern continues for as many cards as you want to draw. We can express both options as formulas. Drag the slider to substitute n with real numbers and see that the two formulas are always equal. If 1/n is the chance of choosing the new card, 1/(n-1) is the chance of choosing the new card from the previous draw. The chance of not choosing the new card is the complement of 1/n, which is 1-(1/n). Below are the cards again except this time set up to use 1/n instead of a coin flip. Click to the end of the deck. Does it feel fair to you? There's a good chance that through the 2nd half of the deck, you never swap your chosen card. This feels wrong, at least to me, but as we saw above the numbers say it is completely fair. # Choosing multiple cards Now that we know how to select a single card, we can extend this to selecting multiple cards. There are 2 changes we need to make: Rather than new cards having a 1/n chance of being selected, they now have a k/n chance, where k is the number of cards we want to choose. When we decide to replace a held card, we choose one of the k cards we're holding at random. So our new previous card selection formula becomes k/(n-1) because we're now holding k cards. Then the chance that any of the cards we're holding get replaced is 1-(1/n). Let's see how this plays out with real numbers. The fairness still holds, and will hold for any k and n pair. This is because all held cards have an equal chance of being replaced, which keeps them at an equal likelihood of still being in your hand every draw. A nice way to implement this is to use an array of size k. For each new card, generate a random number between 0 and n. If the random number is less than k, replace the card at that index with the new card. Otherwise, discard the new card. And that's how reservoir sampling works! # Applying this to log collection Let's take what we now know about reservoir sampling and apply it to our log collection service. We'll set k=5, so we're "holding" at most 5 log messages at a time, and every second we will send the selected logs to the log collection service. After we've done that, we empty our array of size 5 and start again. This creates a "lumpy" pattern in the graph below, and highlights a trade-off when using reservoir sampling. It's no longer a real-time stream of logs, but chunks of logs sent at an interval. However, sent logs never exceeds the threshold , and during quiet periods the two lines track each other almost perfectly. No logs lost during quiet periods, and never more than threshold logs per second sent during spikes. The best of both worlds. It also doesn't store more than k=5 logs, so it will have predictable memory usage. # Further reading Something you may have thought while reading this post is that some logs are more valuable than others. You almost certainly want to keep all error logs, for example. For that use-case there is a weighted variant of reservoir sampling. I wasn't able to find a simpler explanation of it, so that link is to Wikipedia which I personally find a bit hard to follow. But the key point is that it exists and if you need it you can use it. # Conclusion Reservoir sampling is one of my favourite algorithms, and I've been wanting to write about it for years now. It allows you to solve a problem that at first seems impossible, in a way that is both elegant and efficient. Thank you again to ittybit for sponsoring this post. I really couldn't have hoped for a more supportive first sponsor. Thank you for believing in and understanding what I'm doing here. Thank you to everyone who read this post and gave their feedback. You made this post much better than I could have done on my own, and steered me away from several paths that just weren't working. If you want to tell me what you thought of this post by sending me an anonymous message that goes directly to my phone, go to https://samwho.dev/ping.
body { text-wrap: pretty; } @media (prefers-reduced-motion: reduce) { * { transition: none; animation: none; } } turing-machine { width: 100%; display: block; position: relative; padding-bottom: 1em; } turing-machine .program-container { position: relative; display: flex; justify-content: center; } turing-machine table { border: none; font-family: Fira Code; border-collapse: collapse; border-spacing: 0; margin: 1px; margin-top: 0.5em; width: auto; } turing-machine thead td { text-align: center; } turing-machine td { text-align: left; padding-left: 3vw; padding-right: 3vw; padding-top: 0.2em; padding-bottom: 0.2em; border: 1px dashed #bbbbbb; } turing-machine thead td { border: 0; } turing-machine .container { z-index: 1; background-color: white; } turing-machine .svg-container { padding-bottom: 1px; padding-top: 0.5em; line-height: 0; overflow-x: scroll; overflow-y: hidden; position: relative; scrollbar-width: none; cursor: grab; cursor: -webkit-grab; } turing-machine .svg-container::-webkit-scrollbar { display: none; } turing-machine .controls-container { line-height: 1.5; display: flex; align-items: center; justify-content: center; gap: 0.3em; margin-top: 0.2em; padding-bottom: 0.2em; } turing-machine .controls-container button { color: black; background-color: #aaaaaa; border-radius: 30%/50%; padding: 0.3em; height: 2em; min-width: 3em; text-align: center; border: none; cursor: pointer; } turing-machine .controls-container select { color: black; border: 2px solid #aaaaaa; background-color: #ffffff; border-radius: 30%/50%; padding: 0.15em; height: 2em; min-width: 3em; text-align: center; cursor: pointer; font-family: "Fira Code"; } turing-machine .controls-container button:disabled { background-color: #eeeeee; color: #aaaaaa; cursor: auto; } turing-machine .controls-container button:disabled:hover { background-color: #eeeeee; } turing-machine .controls-container button:hover { background-color: #999999; } turing-machine .controls-container button:active { background-color: #888888; } turing-machine .error { width: 80%; margin-top: 0.5em; margin-bottom: 0.5em; border-radius: 10px; background-color: #ffdddd; color: red; font-family: "Fira Code"; font-size: 1.5em; font-weight: bold; text-align: center; margin: auto; } .dragging { cursor: grabbing !important; cursor: -webkit-grabbing !important; } .sticky { position: sticky !important; top: 0; } turing-machine .gradient { content: ""; z-index: 2; position: sticky; top: 0; left: 0; pointer-events: none; background-image: linear-gradient(90deg, rgba(255,255,255,1) 0%, rgba(255,255,255,0) 20%, rgba(255,255,255,0) 80%, rgba(255,255,255,1) 100%); width: 100%; margin-top: -62px; height: 64px; } turing-machine svg { user-select: none; /*touch-action: none;*/ } turing-machine svg text { font-family: "Fira Code"; } turing-machine tr { padding-top: 0.5em; padding-bottom: 0.5em; } turing-machine .instruction { color: black; display: inline-block; border-radius: 0.5em; padding: 0.1em; min-width: 4.5ch; height: 100%; text-align: center; } turing-machine .pointer { display: inline-block; border-radius: 0.5em; position: absolute; z-index: -1; } turing-machine program { display: none; } turing-machine controls { display: none; } turing-machine.heading svg text { font-family: "Lora"; font-weight: bold; } figure.hero { font-family: "Lora"; text-align: center; padding-top: 1em; padding-bottom: 2em; margin-top: 1em; width: 100%; max-width: 100%; } figure.hero turing-machine { padding: 0; margin: 0; } figure.hero turing-machine .svg-container { margin: 0; padding-top: 0; } figure.hero turing-machine .gradient { top: 0; } figure.hero .signature { margin-top: 0.5rem; max-width: 200px; } figure.hero .portrait { margin-top: 1em; width: 250px; max-width: 250px; } figure.hero p { padding: 0; font-size: 0.9em; } figure.hero figcaption { font-family: "Lora"; font-style: small-caps; font-size: 0.8em; } @keyframes pulse { 0% { transform: scale(1); } 50% { transform: scale(1.1); } 100% { transform: scale(1); } } .symbol { display: inline-block; border: 1px solid black; width: 1.5em; height: 1.5em; font-family: "Fira Code"; text-align: center; } .inline-instruction { display: inline-block; font-family: "Fira Code"; background-color: rgba(0, 158, 115, 0.7); color: black; border-radius: 0.5em; padding: 0.1em; min-width: 3ch; padding-left: 0.2em; padding-right: 0.2em; text-align: center; } li:has(.inline-instruction) { margin-top: 0.35em; margin-bottom: 0.35em; } .inline-state { display: inline-block; font-family: "Fira Code"; background-color: rgb(86, 180, 233); color: black; border-radius: 0.5em; padding: 0.1em; min-width: 3ch; padding-left: 0.2em; padding-right: 0.2em; text-align: center; } li:has(.inline-state) { margin-top: 0.35em; margin-bottom: 0.35em; } .inline-value { display: inline-block; font-family: "Fira Code"; background-color: rgb(230, 159, 0); color: black; border-radius: 0.5em; padding: 0.1em; min-width: 3ch; padding-left: 0.2em; padding-right: 0.2em; text-align: center; } li:has(.inline-value) { margin-top: 0.35em; margin-bottom: 0.35em; } figure { margin: auto; margin-top: 2em; margin-bottom: 2em; } figure img { max-width: 400px; } figure figcaption { max-width: 500px; color: #666; font-size: 0.8em; font-style: italic; text-align: center; font-family: "Lora"; text-wrap: pretty; margin: auto; } ALAN M. TURING 23 June 1912 – 7 June 1954 B B | | L P( ) L P( ) L P( ) L P( ) L P( ) L P( ) L P( ) L P( ) L P( ) L P( ) L P( ) L P( ) L P( ) L P( ) L P( ) -> F In 1928, David Hilbert, one of the most influential mathematicians of his time, asked whether it is possible to create an algorithm that could determine the correctness of a mathematical statement. This was called the "decision problem," or "Entscheidungsproblem" in Hilbert's native German. In 1936 both Alan Turing and Alonzo Church independently reached the conclusion, using different methods, that the answer is "no." The way Turing did it was to imagine a "universal machine", a machine that could compute anything that could be computed. This idea, the "Turing machine" as Alonzo Church christened it in 1937, laid the foundations for the device you are using to read this post. If we look hard enough we can see Turing's legacy in today's CPUs. By the end of this post, you will know: What a Turing machine is. What can and cannot be computed. What it means to be Turing complete. How modern computers relate to Turing machines. How to write and run your own programs for a Turing machine. # What is a Turing machine? You might expect a universal machine, capable of computing anything that can be computed, to be a complex device. Nothing could be further from the truth. The machine has just 4 parts, and the language used to program it has just 5 instructions. theoretical machine. It was created as a thought experiment to explore the limits of what can be computed. Some have of course been built, but in 1936 they existed only in the heads of Turing and those who read his paper. The parts are: a !tape, a !head, a !program, and a !state. When you're ready, go ahead and press !play. start What you're seeing here is a program that executes P(0) to print 0 to the tape, moves the head right with the R instruction, then !jumps back to the start. It will go on printing 0s forever. At any point, feel free to !pause, step the machine !forwards or !backwards one instruction at a time, or !restart the program from the beginning. There is also a speed selector on the far right of the controls if you want to speed up the machine. Notice that !state and !value never change. Every time the machine performs a !jump, the current state and value are used to pick the correct next row of instructions to execute. This program only has a single state, start, and every time it jumps, the symbol under the !head is !blank. Let's take a look at a program with multiple states. one one | | P(1) R -> zero This program prints alternating 0s and 1s to the tape. It has 2 states, zero and one, to illustrate what happens when you !jump to a different state. Things moving too fast? The slider below can be used to adjust the speed of all the Turing machines on this page. 100% You can also achieve this same result by using a single !state and alternating the !value. Here's an example of that: start start | 1 | R P(0) -> start start | 0 | R P(1) -> start The !value column always stays up to date with what the current symbol is under the !head, then when we !jump that value is used to know which row of instructions to execute. Combining state and value gives us a surprising amount of control over what our !program does. We've so far seen 3 instructions: P prints a given symbol to the tape. R moves the tape head right. ↪︎ jumps to a given state. There are 2 more: L moves the tape head left. H halts the machine. 1 1 | | P(a) L -> 2 2 | | P(l) L -> 3 3 | | P(A) H This program prints the word "Alan" from right to left then halts. If you can't see the full word, you can drag the !tape left and right. If the machine has halted, you can use !restart to start it again from the beginning. All of it! You're probably not going to get Crysis running at 60fps on a simulated Turing machine, but all of the calculations required to render each frame can be done with just these 5 instructions. Everything you have ever seen a computer do can be done with a Turing machine. We'll see a glimpse of how that can work in practice a little later. The last example I want to show you before we move on is the very first program Alan Turing showed the world. It's the first program featured in his 1936 paper: "On Computable Numbers, with an application to the Entsheidungsproblem." c c | | R -> e e | | P(1) R -> k k | | R -> b Turing liked to leave spaces between symbols, going as far as to even define them as "F-squares" and "E-squares". F for figure, and E for erasable. His algorithms would often make use of E-squares to help the machine remember the location of specific !tape squares. # What does it mean to compute? Something is said to be "computable" if there exists an algorithm that can get from the given input to the expected output. For example, adding together 2 integers is computable. Here I'm giving the machine a !tape that starts out with the values 2 and 6 in binary separated by a !blank. b1 b1 | 1 | R -> b1 b1 | | R -> b2 b2 | 0 | R -> b2 b2 | 1 | R -> b2 b2 | | L -> dec dec | 0 | P(1) L -> dec dec | 1 | P(0) L -> b3 dec | | H b3 | 0 | L -> b3 b3 | 1 | L -> b3 b3 | | L -> inc inc | 0 | P(1) R -> b1 inc | 1 | P(0) L -> inc inc | | P(1) R -> b1 This program adds the two numbers together, arriving at the answer 8. It does this by decrementing from the right number and incrementing the left number until the right number is 0. Here's the purpose of each state: b1 Move right until the first !blank square is found. This is to navigate past the first number. b2 Move right until the second !blank square is found. This is to navigate past the second number. dec Decrement the current number by 1. In practice this will always be the right number. It decrements by flipping bits until it either reaches a 1, where it will navigate back to the left number, or a !blank, which it will interpret as the number having hit 0. b3 Move left until the first !blank square again. This is only ever used after we've decremented the rightmost number. We can't reuse the b1 state here because when we reach the !blank, we want to jump to inc. inc Increment the current number by 1. Similar to decrementing, this will only ever happen on the leftmost number in practice. This is done by flipping bits until we reach a 0, at which point we navigate back to the right number. That we can write this program at all is proof that addition is computable, but it also implies that all integers are computable. If we can add any 2 integers, we can compute any other integer. 1 is 0+1, 2 is 1+1, 3 is 2+1, and so on. # Binary vs Decimal You may have wondered why I'm choosing to work with binary numbers rather than decimal. It's not just because that's how modern computers work. I'm going to show you 2 examples, and from those examples you'll be able to see why modern computers choose to work in binary. The first example is a program that increments a binary number in an endless loop. move move | 0 | R -> move move | | L -> flip flip | 0 | P(1) R -> move flip | 1 | P(0) L -> flip flip | | P(1) R -> move The second example is a program that increments a decimal number in an endless loop. back inc | 1 | P(2) -> back inc | 2 | P(3) -> back inc | 3 | P(4) -> back inc | 4 | P(5) -> back inc | 5 | P(6) -> back inc | 6 | P(7) -> back inc | 7 | P(8) -> back inc | 8 | P(9) -> back inc | 9 | P(0) L -> inc inc | | P(1) -> back back | | L -> inc back | * | R -> back These two programs are doing the same thing, but the program for manipulating decimal numbers is much longer. We've even introduced some new syntax, the * symbol, to handle a !value under the !head that does not match any of the other values for that !state. It's for this reason when programming Turing machines we prefer binary numbers: the programs end up being shorter and easier to reason about. This benefit also translates to the physical world. Components that switch between 2 states are cheaper, smaller, and more reliable than components that switch between 10. It was more practical to build computers that worked in binary than ones that work in decimal, though attempts to build decimal computers were made. # What can't be computed? To approach this question we need to explain the "Halting problem." It goes like this: The answer is no, and this is what Turing essentially proved. The proof is complicated and I'm not ashamed to admit I don't understand it, but there is an example I can give you that can be intuitively understood to be "undecidable." Imagine you write a program that takes as its input the program being used to decide whether it will halt or not. What it then does is run the decider program on itself, and then do the opposite of what the decider program says. function undecidable(willHalt) { if (willHalt(undecidable)) { while (true); } else { return true; } } This program intentionally enters an infinite loop if it is told it will halt, and halts if it is told it will run forever. It seems like a silly example, the kind of answer a cheeky high school student might try to get away with, but it is a legitimate counterexample to the idea that the halting problem can be solved. If you were to imagine encoding the program and input into something that could be represented on the !tape, there would be no !program that could determine whether the program would halt or not. Imagining this encoding becomes quite natural when you realise that modern programs are encoded as binary data to be saved to disk. # What does it mean to be Turing complete? If you've been involved in the world of programming for more than a few years, there's a good chance you've come across the term "Turing complete." Most likely in the context of things that really ought not to be Turing complete, like C++ templates, TypeScript's type system or Microsoft Excel. But what does it mean? Like the Halting problem, the proof is complicated but there's a straightforward test you can apply to something to judge it Turing complete: I've written this post, with the Turing machine simulations, in JavaScript. Therefore JavaScript is Turing complete. The C++ template example given above simulates a Turing machine in C++'s template system. The TypeScript example takes the route of writing an interpreter for a different Turing complete language. You're right, and everyone tends to cheat a bit with the definition. When someone says something is Turing complete, what they mean is it would be Turing complete if it had an infinite amount of memory. The infinite tape limitation means no Turing machine could ever exist in our physical reality, so that requirement tends to get waived. # How does this all relate to modern computers? If you read around the topic of Turing machines outside of this post, you might see it said that modern computers are effectively Turing machines. You would be forgiven for finding it difficult to imagine how you go from adding 2 integers in binary on a !tape to running a web browser, but the line is there. A key difference between our Turing machine and the device you're reading this on is that your device's CPU has "registers." These are small pieces of memory that live directly on the CPU and are used to store values temporarily while they're being operated on. Values are being constantly loaded from memory into registers and saved back again. You can think of registers as variables for your CPU, but they can only store fixed-size numbers. We can create registers in our Turing machine. We can do this by creating a "format" for our tape. Here we define 3 registers: A, B, and C. Each register contains a 3 bits and can store numbers between 0 and 7. Then at the far left we have an H, which stands for "home", which will help us navigate. To increment register C, we can write a program like this: goto_c goto_c | C | R R R -> inc inc | 0 | P(1) -> goto_h goto_h | * | L -> goto_h goto_h | H | H We're making a lot more liberal use of the * symbol here to help us navigate to specific parts of the !tape without having to enumerate all possible values that could be under the !head on the way there. This program is effectively equivalent to the following x86 assembly code, if x86 had a register named c: mov c, 0 ; Load 0 into c inc c ; Increment c by 1 If we wanted to add values in A and B, storing the result in C, we need to do more work. Here's the assembly code we're trying to replicate: mov a, 2 ; Load 2 into a mov b, 3 ; Load 3 into b add c, a ; Add a to c add c, b ; Add b to c Before you scroll down I will warn you that the program is long and complex. It is the last program we will see in this post, and I don't expect you to understand it in full to continue to the end. Its main purpose is to show you that we can implement operations seen in modern assembly code on a Turing machine. initb initb | * | R P(0) R P(1) R P(1) R -> start start | * | L -> start start | H | R -> go_a go_a | * | R -> go_a go_a | A | R R R -> dec_a dec_a | 0 | P(1) L -> cry_a dec_a | 1 | P(0) -> go_c1 cry_a | 0 | P(1) L -> cry_a cry_a | 1 | P(0) -> go_c1 cry_a | * | R P(0) R P(0) R P(0) -> goto_b go_c1 | * | R -> go_c1 go_c1 | C | R R R -> inc_c1 inc_c1 | 0 | P(1) -> h2a inc_c1 | 1 | P(0) L -> cry_ca cry_ca | 0 | P(1) -> h2a cry_ca | 1 | P(0) L -> cry_ca cry_ca | * | R -> h2a h2a | * | L -> h2a h2a | H | R -> go_a goto_b | * | R -> goto_b goto_b | B | R R R -> dec_b dec_b | 0 | P(1) L -> dec_bc dec_b | 1 | P(0) -> go_c2 dec_bc | 0 | P(1) L -> dec_bc dec_bc | 1 | P(0) -> go_c2 dec_bc | * | R P(0) R P(0) R P(0) -> end go_c2 | * | R -> go_c2 go_c2 | C | R R R -> inc_c2 inc_c2 | 0 | P(1) -> go_hb inc_c2 | 1 | P(0) L -> cry_cb cry_cb | 0 | P(1) -> go_hb cry_cb | 1 | P(0) L -> cry_cb cry_cb | * | R -> go_hb go_hb | * | L -> go_hb go_hb | H | R -> goto_b end | * | L -> end end | H | H This is painfully laborious, and it doesn't even precisely match the assembly code. It destroys the values in A and B as it adds them together, and it doesn't handle overflow. But it's a start, and I hope it gives you a glimpse of how this theoretical machine can be built up to operate like a modern CPU. If you watch the program run to completion, something that might strike you is just how much work is required to do something as simple as adding 2 numbers. Turing machines were not designed to be practical, Turing never intended anyone to go out and build one of these machines in the hope it will be useful. Modern machines have circuits within them that can add 2 numbers together by passing 2 electrical signals in and getting the sum as a single signal out. This happens in less than a nanosecond. Modern machines have memory where any byte can be accessed at any time, no tape manipulation required. This memory access takes a few dozen nanoseconds. # Writing and running your own programs I've built a web-based development environment for writing programs that will run on the Turing machine visualisations you've seen throughout the post. You can access the editor here. I encourage you to play around with it. Set a simple goal, like adding together 2 numbers without going back to look at the way I did it in the post. It's a great way to get a feel for how the machine works. # Conclusion To recap, we've covered: What a Turing machine is. What can and cannot be computed. What it means to be Turing complete. How modern computers relate to Turing machines. And you now have access to an environment for writing and running your own Turing machine programs. If you use it to make something neat, please do reach out to me and show me! My email address is hello@samwho.dev. # Further reading Turing's original paper on computable numbers. The Annotated Turing I referenced this throughout the making of this post. It is a fabulous read, strongly recommend. Alan Turing: The Enigma by Andrew Hodges. An excellent biography of Turing, I read this during the writing of this post. Calculating a Mandelbrot Set using a Turing Machine. This was exceptionally useful for me to understand how to get from Turing machines to modern computers. # Acknowledgements These posts are never a solo effort, and this one is no exception. Sincere thanks go to the following people: To my wife, Sophie, who drew the biographical sketches you're seeing at the end here, and for putting up with my incessant talking about this post the last 2 weeks. Everyone who let me watch them read this post in real-time over a video call and gave me feedback: Jaga Santagostino, Robert Aboukhalil, Tarun Verghis, Tyler Sparks. Everyone who came to hang out and help out in the Twitch streams when I was building out the early versions of the Turing machine visualisations. Everyone who supports my work on Patreon. Everyone who works on the tools used to build this post: TypeScript, Bun, Two.js, Tween.js, Monaco, Peggy, Zed, and many other indirect dependencies. We really do stand upon the shoulders of giants. Hut 8, Bletchley Park, where Turing worked during World War II. It was in this hut that Alan worked with his team to break the German Naval Enigma code.
.dog-line { display: flex; flex-wrap: nowrap; flex-direction: row; width: 100%; height: 10rem; margin-top: 2rem; margin-bottom: 2rem; } .dog-line img { flex-grow: 1; height: auto; margin: 0; padding: 0; object-fit: contain; } .dog-grid { display: grid; grid-template-columns: repeat(4, 1fr); grid-gap: 1rem; margin-top: 2rem; margin-bottom: 2rem; } Back in Memory Allocation, I introduced Haskie. The idea behind Haskie was to create a character that could ask questions the reader might have, and to "soften" the posts to make them feel less intimidating. I got some feedback from people that Haskie was a bit too childish, and didn't feel like he belonged in posts about serious topics. This feedback was in the minority, though, and most people liked him. So I kept him and used him again in Hashing. Having a proxy to the reader was useful. I could anticipate areas of confusion and clear them up without creating enormous walls of text. I don't like it when the entire screen is filled with text, I like to break it up with images and interactive elements. And now dogs. Then in Bloom Filters, I found myself needing a character to represent the "adult in the room." If Haskie was my proxy to the reader, this new character would serve as a proxy to all of the material I learned from in the writing of the post. This is Sage. I liked the idea of having a cast of characters, each with their own personality and purpose. But I had a few problems. # Problems Both Haskie and Sage, because I have no artistic ability, were generated by AI. Back when I made them I was making no money from this blog, and I had no idea if I was going to keep them around. I didn't want to invest money in an idea that could flop, so I didn't feel bad about using AI to try it out. Since then, however, I have been paid twice to write posts for companies, and I know that I'm keeping the dogs. It wasn't ethical to continue piggybacking on AI. While ethics were the primary motivation, there were some other smaller problems with the dogs: The visual style of them, while I did like it, never felt like it fit with the rest of my personal brand. It was difficult to get AI to generate consistent dogs. You'll notice differences in coat colouration and features between variants of the same dog. The AI generated images look bad at small sizes. So I worked with the wonderful Andy Carolan to create a new design for my dogs. A design that would be consistent, fit with my brand, and look good at any size. # Haskie, Sage, and Doe The redesigned dogs are consistent, use simple colours and shapes, and use the SVGs file format to look good at any size. Each variant clocks in at around 20kb, which is slightly larger than the small AI-generated images, but I'll be able to use them at any size. Together the dogs represent a family unit: Sage as the dad, Haskie as the youngest child, and Doe as his older sister. They also come in a variety of poses, so I can use them to represent different emotions or actions. We were careful to make the dogs recognisable apart. They differ in colour, ear shape, tail shape, and collar tag. Sage and Doe have further distinguishing features: Sage with his glasses, and Doe with her bandana. Doe's bandana uses the same colours as the transgender flag, to show my support for the trans community and as a nod to her identity. # Going forward I'm so happy with the new dogs, and plan to use them in my posts going forward. I suspect I will, at some point, replace the dogs in my old posts as well. I don't plan to add any more characters, and I want to be careful to avoid overusing them. I don't want them to become a crutch, or to distract from the content of the posts. I also haven't forgotten the many people that pointed out to me that you can't pet the dogs. I'm working on it.
.bf { width: 100%; height: 150px; } @media only screen and (min-width: 320px) and (max-width: 479px) { .bf { height: 200px; } } @media only screen and (min-width: 480px) and (max-width: 676px) { .bf { height: 200px; } } @media only screen and (min-width: 677px) and (max-width: 991px) { .bf { height: 150px; } } form { display: flex; flex-direction: column; align-items: center; justify-content: stretch; } input { border: 1px solid rgb(119, 119, 119); padding: 0.25rem; border-radius: 0.25rem; height: 2em; line-height: 2em; } .aside { padding: 2rem; width: 100vw; position: relative; margin-left: -50vw; left: 50%; background-color: #eeeeee; display: flex; align-items: center; flex-direction: column; } .aside > * { flex-grow: 1; } .aside p { padding-left: 1rem; padding-right: 1rem; max-width: 780px; font-style: italic; font-family: Lora, serif; text-align: center; } Everyone has a set of tools they use to solve problems. Growing this set helps you to solve ever more difficult problems. In this post, I'm going to teach you about a tool you may not have heard of before. It's a niche tool that won't apply to many problems, but when it does you'll find it invaluable. It's called a "bloom filter." Before you continue! This post assumes you know what a hash function is, and if you don't it's going to be tricky to understand. Sam has written a post about hash functions, and recommendeds that you read this first. # What bloom filters can do Bloom filters are similar to the Set data structure. You can add items to them, and check if an item is present. Here's what it might look like to use a bloom filter in JavaScript, using a made-up BloomFilter class: let bf = new BloomFilter(); bf.add("Ant"); bf.add("Rhino"); bf.contains("Ant"); // true bf.contains("Rhino"); // true While this looks almost identical to a Set, there are some key differences. Bloom filters are what's called a probabalistic data structure. Where a Set can give you a concrete "yes" or "no" answer when you call contains, a bloom filter can't. Bloom filters can give definite "no"s, but they can't be certain about "yes." In the example above, when we ask bf if it contains "Ant" and "Rhino", the true that it returns isn't a guarantee that they're present. We know that they're present because we added them just a couple of lines before, but it would be possible for this to happen: let bf = new BloomFilter(); bf.add("Ant"); bf.add("Rhino"); bf.contains("Fox"); // true We'll demonstrate why over the course of this post. For now, we'll say that when bloom filters return true it doesn't mean "yes", it means "maybe". When this happens and the item has never been added before, it's called a false-positive. The opposite, claiming "no" when the answer is "yes," is called a false-negative. A bloom filter will never give a false-negative, and this is what makes them useful. It's not strictly lying, it's just not giving you a definite answer. Let's look at an example where we can use this property to our advantage. # When bloom filters are useful Imagine you're building a web browser, and you want to protect users from malicious links. You could build and maintain a list of all known malicious links and check the list every time a user navigates the browser. If the link they're trying to visit is in the list, you warn the user that they might be about to visit a malicious website. If we assume there are, say, 1,000,000 malicious links on the Internet, and each link is 20 characters long, then the list of malicious links would be 20MB in size. This isn't a huge amount of data, but it's not small either. If you have lots of users and want to keep this list up to date, the bandwidth could add up. However, if you're happy to accept being wrong 0.0001% of the time (1 in a million), you could use a bloom filter which can store the same data in 3.59MB. That's an 82% reduction in size, and all it costs you is showing the user an incorrect warning 1 in every million links visited. If you wanted to take it even further, and you were happy to accept being wrong 0.1% of the time (1 in 1000), the bloom filter would only be 1.8MB. This use-case isn't hypothetical, either. Google Chrome used a bloom filter for this exact purpose until 2012. If you were worried about showing a warning when it wasn't needed, you could always make an API that has the full list of malicious links in a database. When the bloom filter says "maybe," you would then make an API call to check the full list to be sure. No more spurious warnings, and the bloom filter would save you from having to call the API for every link visited. # How bloom filters work At its core, a bloom filter is an array of bits. When it is created, all of the bits are set to 0. We're going to represent this as a grid of circles, with each circle representing 1 bit. Our bloom filters in this post are all going to have 32 bits in total. this one and let me know what you think. Click here to go back to normal. To add an item to the bloom filter, we're going to hash it with 3 different hash functions, then use the 3 resulting values to set 3 bits. If you're not familiar with hashing, I recommend reading my post about it before continuing. For this post I'm choosing to use 3 of the SHA family of hash functions: sha1, sha256, and sha512. Here's what our bloom filter looks like if we add the value "foo" to it: The bits in positions 15, 16 and 27 have been set. Other bits, e.g. 1 have not been set. You can hover or tap the bits in this paragraph to highlight them in the visualisation. We get to this state by taking the hash value of "foo" for each of our 3 hash functions and modulo it by the number of bits in our bloom filter. Modulo gets us the remainder when dividing by 32, so we get 27 with sha1, 15 with sha256 and 16 with sha512. The table below shows what's happening, and you can try inputting your own values to see what bits they would set if added. Go ahead and add a few of your own values to our bloom filter below and see what happens. There's also a check button that will tell you if a value is present within the bloom filter. A value is only considered present if all of the bits checked are set. You can start again by hitting the clear button. You might occasionally notice that only 2, or even 1, bits get set. This happens when 2 or more of our hash functions produce the same value, or we attempt to set a bit that has already been set. Taking that a bit further, have a think about the implications of a bloom filter that has every bit set. bit is set, then won't the bloom filter claim it contains every item you check? That's a false-positive every time! Exactly right. A bloom filter with every bit set is equivalent to a Set that always returns true for contains. It will claim to contain everything you ask it about, even if that thing was never added. # False-positive rates The rate of false-positives in our bloom filter will grow as the percentage of set bits increases. Drag the slider below the graph to see how the false-positive rate changes as the number of set bits increases. It grows slowly at first, but as we get closer to having all bits set the rate increases. This is because we calculate the false-positive rate as x^3, where x is the percentage of set bits and 3 is the number of hash functions used. To give an example of why we calculate it with this formula, imagine we have a bloom filter with half of its bits set, x = 0.5. If we assume that our hash function has an equal chance of setting any of the bits, then the chance that all 3 hash functions set a bit that is already set is 0.5 * 0.5 * 0.5, or x^3. Let's have a look at the false-positive rate of bloom filters that use different numbers of hash functions. The problem that using lots of hash functions introduces is that it makes the bloom filter fill up faster. The more hash functions you use, the more bits get set for each item you add. There's also the cost of hashing itself. Hash functions aren't free, and while the hash functions you'd use in a bloom filter try to be as fast as possible, it's still more expensive to run 100 of them than it is to run 3. It's possible to calculate how full a bloom filter will be after inserting a number of items, based on the number of hash functions used. The graph below assumes a bloom filter with 1000 bits. The more hash functions we use, the faster we set all of the bits. You'll notice that the curve tails off as more items are added. This is because the more bits that are set, the more likely it is that we'all attempt to set a bit that has already been set. In practice, 1000 bits is a very small bloom filter, occupying only 125 bytes of memory. Modern computers have a lot of memory, so let's crank this up to 100,000 bits (12.5kB) and see what happens. The lines barely leave the bottom of the graph, meaning the bloom filter will be very empty and the false-positive rate will be low. All this cost us was 12.5kB of memory, which is still a very small amount by 2024 standards. # Tuning a bloom filter Picking the correct number of hash functions and bits for a bloom filter is a fine balance. Fortunately for us, if we know up-front how many unique items we want to store, and what our desired false-positive rate is, we can calculate the optimal number of hash functions, and the required number of bits. The bloom filter page on Wikipedia covers the mathematics involved, which I'm going to translate into JavaScript functions for us to use. I want to stress that you don't need to understand the maths to use a bloom filter or read this post. I'm including the link to it only for completeness. # Optimal number of bits The following JavaScript function, which might look a bit scary but bear with me, takes the number of items you want to store (items) and the desired false-positive rate (fpr, where 1% == 0.01), and returns how many bits you will need to achieve that false-positive rate. function bits(items, fpr) { const n = -items * Math.log(fpr); const d = Math.log(2) ** 2; return Math.ceil(n / d); } We can see how this grows for a variety of fpr values in the graph below. # Optimal number of hash functions After we've used the JavaScript above to calculate how many bits we need, we can use the following function to calculate the optimal number of hash functions to use: function hashFunctions(bits, items) { return Math.ceil((bits / items) * Math.log(2)); } Pause for a second here and have a think about how the number of hash functions might grow based on the size of the bloom filter and the number of items you expect to add. Do you think you'll use more hash functions, or fewer, as the bloom filter gets larger? What about as the number of items increases? The more items you plan to add, the fewer hash functions you should use. Yet, a larger bloom filter means you can use more hash functions. More hash functions keep the false-positive rate lower for longer, but more items fills up the bloom filter faster. It's a complex balancing act, and I am thankful that mathematicians have done the hard work of figuring it out for us. # Caution While we can stand on the shoulders of giants and pick the optimal number of bits and hash functions for our bloom filter, it's important to remember that these rely on you giving good estimates of the number of items you expect to add, and choosing a false-positive rate that's acceptable for your use-case. These numbers might be difficult to come up with, and I recommend erring on the side of caution. If you're not sure, it's likely better to use a larger bloom filter than you think you need. # Removing items from a bloom filter We've spent the whole post talking about adding things to a bloom filter, and the optimal parameters to use. We haven't spoken at all about removing items. And that's because you can't! In a bloom filter, we're using bits, individual 1s and 0s, to track the presence of items. If we were to remove an item by setting its bits to 0, we might also be removing other items by accident. There's no way of knowing. Click the buttons of the bloom filter below to see this in action. First we will add "foo", then "baz", and then we will remove "baz". Hit "clear" if you want to start again. The end result of this sequence is a bloom filter that doesn't contain "baz", but doesn't contain "foo" either. Because both "foo" and "baz" set bit 27, we accidentally clobber the presence of "foo" while removing "baz". Something else you might have noticed playing with the above example is that if you add "foo" and then attempt to remove "baz" before having added it, nothing happens. Even though 27 is set, bits 18 and 23 are not, so the bloom filter cannot contain "baz". Because of this, it won't unset 27. # Counting bloom filters While you can't remove items from a standard bloom filter, there are variants that allow you to do so. One of these variants is called a "counting bloom filter," which uses an array of counters instead of bits to keep track of items. Now when you go through the sequence, the end result is that the bloom filter still contains "foo." It solves the problem. The trade-off, though, is that counters are bigger than bits. With 4 bits per counter you can increment up to 15. With 8 bits per counter you can increment up to 255. You'll need to pick a counter size sufficient to never reach the maximum value, otherwise you risk corrupting the bloom filter. Using 8x more memory than a standard bloom filter could be a big deal, especially if you're using a bloom filter to save memory in the first place. Think hard about whether you really need to be able to remove items from your bloom filter. Counting bloom filters also introduce the possibility of false-negatives, which are impossible in standard bloom filters. Consider the following example. Because "loved" and "response" both hash to the bits 5, 22, and 26, when we remove "response" we also remove "loved". If we write this as JavaScript the problem becomes more clear: let bf = new CountingBloomFilter(); bf.add("loved"); bf.add("your"); bf.remove("response"); bf.contains("loved"); // false Even though we know for sure we've added "loved" in this snippet, the call to contains will return false. This sort of false-negative can't happen in a standard bloom filter, and it removes one of the key benefits of using a bloom filter in the first place: the guarantee of no false-negatives. # Bloom filters in the real-world Real-world users of bloom filters include Akamai, who use them to avoid caching web pages that are accessed once and never again. They do this by storing all page accesses in a bloom filter, and only writing them into cache if the bloom filter says they've been seen before. This does result in some pages being cached on the first access, but that's fine because it's still an improvement. It would be impractical for them to store all page accesses in a Set, so they accept the small false-positive rate in favour of the significantly smaller bloom filter. Akamai released a paper about this that goes into the full details if you're interested. Google's BigTable is a distributed key-value store, and uses bloom filters internally to know what keys are stored within. When a read request for a key comes in, a bloom filter in memory is first checked to see if the key is in the database. If not, BigTable can respond with "not found" without ever needing to read from disk. Sometimes the bloom filter will claim a key is in the database when it isn't, but this is fine because when that happens a disk access will confirm the key in fact isn't in the database. # Conclusion Bloom filters, while niche, can be a huge optimisation in the right situation. They're a wonderful application of hash functions, and a great example of making a deliberate trade-off to achieve a specific goal. Trade-offs, and combining simpler building blocks to create more complex, purpose-built data structures, are present everywhere in software engineering. Being able to spot where a data structure could net a big win can separate you from the pack, and take your career to the next level. I hope you've enjoyed this post, and that you find a way to apply bloom filters to a problem you're working on. # Acknowledgements Enormous thank you to my reviewers, without whom this post would be a shadow of what you read today. In no particular order: rylon, Indy, Aaron, Sophie, Davis, ed, Michael Drury, Anton Zhiyanov, Christoph Berger.
More in programming
Having spent four decades as a programmer in various industries and situations, I know that modern software development processes are far more stressful than when I started. It's not simply that developing software today is more complex than it was back in 1981. In that early decade, none
In previous articles, we saw how to use “real” UART, and looked into the trick used by Arduino to automatically reset boards when uploading firmware. Today, we’ll look into how Espressif does something similar, using even more tricks. “Real” UART on the Saola As usual, let’s first simply connect the UART adapter. Again, we connect … Continue reading Espressif’s Automatic Reset → The post Espressif’s Automatic Reset appeared first on Quentin Santos.
Hey peoples! Tonight, some meta-words. As you know I am fascinated by compilers and language implementations, and I just want to know all the things and implement all the fun stuff: intermediate representations, flow-sensitive source-to-source optimization passes, register allocation, instruction selection, garbage collection, all of that. It started long ago with a combination of curiosity and a hubris to satisfy that curiosity. The usual way to slake such a thirst is structured higher education followed by industry apprenticeship, but for whatever reason my path sent me through a nuclear engineering bachelor’s program instead of computer science, and continuing that path was so distasteful that I noped out all the way to rural Namibia for a couple years. Fast-forward, after 20 years in the programming industry, and having picked up some language implementation experience, a few years ago I returned to garbage collection. I have a good level of language implementation chops but never wrote a memory manager, and Guile’s performance was limited by its use of the Boehm collector. I had been on the lookout for something that could help, and when I learned of it seemed to me that the only thing missing was an appropriate implementation for Guile, and hey I could do that!Immix I started with the idea of an -style interface to a memory manager that was abstract enough to be implemented by a variety of different collection algorithms. This kind of abstraction is important, because in this domain it’s easy to convince oneself that a given algorithm is amazing, just based on vibes; to stay grounded, I find I always need to compare what I am doing to some fixed point of reference. This GC implementation effort grew into , but as it did so a funny thing happened: the as a direct replacement for the Boehm collector maintained mark bits in a side table, which I realized was a suitable substrate for Immix-inspired bump-pointer allocation into holes. I ended up building on that to develop an Immix collector, but without lines: instead each granule of allocation (16 bytes for a 64-bit system) is its own line.MMTkWhippetmark-sweep collector that I prototyped The is funny, because it defines itself as a new class of collector, fundamentally different from the three other fundamental algorithms (mark-sweep, mark-compact, and evacuation). Immix’s are blocks (64kB coarse-grained heap divisions) and lines (128B “fine-grained” divisions); the innovation (for me) is the discipline by which one can potentially defragment a block without a second pass over the heap, while also allowing for bump-pointer allocation. See the papers for the deets!Immix papermark-regionregionsoptimistic evacuation However what, really, are the regions referred to by ? If they are blocks, then the concept is trivial: everyone has a block-structured heap these days. If they are spans of lines, well, how does one choose a line size? As I understand it, Immix’s choice of 128 bytes was to be fine-grained enough to not lose too much space to fragmentation, while also being coarse enough to be eagerly swept during the GC pause.mark-region This constraint was odd, to me; all of the mark-sweep systems I have ever dealt with have had lazy or concurrent sweeping, so the lower bound on the line size to me had little meaning. Indeed, as one reads papers in this domain, it is hard to know the real from the rhetorical; the review process prizes novelty over nuance. Anyway. What if we cranked the precision dial to 16 instead, and had a line per granule? That was the process that led me to Nofl. It is a space in a collector that came from mark-sweep with a side table, but instead uses the side table for bump-pointer allocation. Or you could see it as an Immix whose line size is 16 bytes; it’s certainly easier to explain it that way, and that’s the tack I took in a .recent paper submission to ISMM’25 Wait what! I have a fine job in industry and a blog, why write a paper? Gosh I have meditated on this for a long time and the answers are very silly. Firstly, one of my language communities is Scheme, which was a research hotbed some 20-25 years ago, which means many practitioners—people I would be pleased to call peers—came up through the PhD factories and published many interesting results in academic venues. These are the folks I like to hang out with! This is also what academic conferences are, chances to shoot the shit with far-flung fellows. In Scheme this is fine, my work on Guile is enough to pay the intellectual cover charge, but I need more, and in the field of GC I am not a proven player. So I did an atypical thing, which is to cosplay at being an independent researcher without having first been a dependent researcher, and just solo-submit a paper. Kids: if you see yourself here, just go get a doctorate. It is not easy but I can only think it is a much more direct path to goal. And the result? Well, friends, it is this blog post :) I got the usual assortment of review feedback, from the very sympathetic to the less so, but ultimately people were confused by leading with a comparison to Immix but ending without an evaluation against Immix. This is fair and the paper does not mention that, you know, I don’t have an Immix lying around. To my eyes it was a good paper, an , but, you know, just a try. I’ll try again sometime.80% paper In the meantime, I am driving towards getting Whippet into Guile. I am hoping that sometime next week I will have excised all the uses of the BDW (Boehm GC) API in Guile, which will finally allow for testing Nofl in more than a laboratory environment. Onwards and upwards! whippet regions? paper??!?
I started writing this early last week but Real Life Stuff happened and now you're getting the first-draft late this week. Warning, unedited thoughts ahead! New Logic for Programmers release! v0.9 is out! This is a big release, with a new cover design, several rewritten chapters, online code samples and much more. See the full release notes at the changelog page, and get the book here! Write the cleverest code you possibly can There are millions of articles online about how programmers should not write "clever" code, and instead write simple, maintainable code that everybody understands. Sometimes the example of "clever" code looks like this (src): # Python p=n=1 exec("p*=n*n;n+=1;"*~-int(input())) print(p%n) This is code-golfing, the sport of writing the most concise code possible. Obviously you shouldn't run this in production for the same reason you shouldn't eat dinner off a Rembrandt. Other times the example looks like this: def is_prime(x): if x == 1: return True return all([x%n != 0 for n in range(2, x)] This is "clever" because it uses a single list comprehension, as opposed to a "simple" for loop. Yes, "list comprehensions are too clever" is something I've read in one of these articles. I've also talked to people who think that datatypes besides lists and hashmaps are too clever to use, that most optimizations are too clever to bother with, and even that functions and classes are too clever and code should be a linear script.1. Clever code is anything using features or domain concepts we don't understand. Something that seems unbearably clever to me might be utterly mundane for you, and vice versa. How do we make something utterly mundane? By using it and working at the boundaries of our skills. Almost everything I'm "good at" comes from banging my head against it more than is healthy. That suggests a really good reason to write clever code: it's an excellent form of purposeful practice. Writing clever code forces us to code outside of our comfort zone, developing our skills as software engineers. Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you [will get excellent debugging practice at exactly the right level required to push your skills as a software engineer] — Brian Kernighan, probably There are other benefits, too, but first let's kill the elephant in the room:2 Don't commit clever code I am proposing writing clever code as a means of practice. Being at work is a job with coworkers who will not appreciate if your code is too clever. Similarly, don't use too many innovative technologies. Don't put anything in production you are uncomfortable with. We can still responsibly write clever code at work, though: Solve a problem in both a simple and a clever way, and then only commit the simple way. This works well for small scale problems where trying the "clever way" only takes a few minutes. Write our personal tools cleverly. I'm a big believer of the idea that most programmers would benefit from writing more scripts and support code customized to their particular work environment. This is a great place to practice new techniques, languages, etc. If clever code is absolutely the best way to solve a problem, then commit it with extensive documentation explaining how it works and why it's preferable to simpler solutions. Bonus: this potentially helps the whole team upskill. Writing clever code... ...teaches simple solutions Usually, code that's called too clever composes several powerful features together — the "not a single list comprehension or function" people are the exception. Josh Comeau's "don't write clever code" article gives this example of "too clever": const extractDataFromResponse = (response) => { const [Component, props] = response; const resultsEntries = Object.entries({ Component, props }); const assignIfValueTruthy = (o, [k, v]) => (v ? { ...o, [k]: v } : o ); return resultsEntries.reduce(assignIfValueTruthy, {}); } What makes this "clever"? I count eight language features composed together: entries, argument unpacking, implicit objects, splats, ternaries, higher-order functions, and reductions. Would code that used only one or two of these features still be "clever"? I don't think so. These features exist for a reason, and oftentimes they make code simpler than not using them. We can, of course, learn these features one at a time. Writing the clever version (but not committing it) gives us practice with all eight at once and also with how they compose together. That knowledge comes in handy when we want to apply a single one of the ideas. I've recently had to do a bit of pandas for a project. Whenever I have to do a new analysis, I try to write it as a single chain of transformations, and then as a more balanced set of updates. ...helps us master concepts Even if the composite parts of a "clever" solution aren't by themselves useful, it still makes us better at the overall language, and that's inherently valuable. A few years ago I wrote Crimes with Python's Pattern Matching. It involves writing horrible code like this: from abc import ABC class NotIterable(ABC): @classmethod def __subclasshook__(cls, C): return not hasattr(C, "__iter__") def f(x): match x: case NotIterable(): print(f"{x} is not iterable") case _: print(f"{x} is iterable") if __name__ == "__main__": f(10) f("string") f([1, 2, 3]) This composes Python match statements, which are broadly useful, and abstract base classes, which are incredibly niche. But even if I never use ABCs in real production code, it helped me understand Python's match semantics and Method Resolution Order better. ...prepares us for necessity Sometimes the clever way is the only way. Maybe we need something faster than the simplest solution. Maybe we are working with constrained tools or frameworks that demand cleverness. Peter Norvig argued that design patterns compensate for missing language features. I'd argue that cleverness is another means of compensating: if our tools don't have an easy way to do something, we need to find a clever way. You see this a lot in formal methods like TLA+. Need to check a hyperproperty? Cast your state space to a directed graph. Need to compose ten specifications together? Combine refinements with state machines. Most difficult problems have a "clever" solution. The real problem is that clever solutions have a skill floor. If normal use of the tool is at difficult 3 out of 10, then basic clever solutions are at 5 out of 10, and it's hard to jump those two steps in the moment you need the cleverness. But if you've practiced with writing overly clever code, you're used to working at a 7 out of 10 level in short bursts, and then you can "drop down" to 5/10. I don't know if that makes too much sense, but I see it happen a lot in practice. ...builds comradery On a few occasions, after getting a pull request merged, I pulled the reviewer over and said "check out this horrible way of doing the same thing". I find that as long as people know they're not going to be subjected to a clever solution in production, they enjoy seeing it! Next week's newsletter will probably also be late, after that we should be back to a regular schedule for the rest of the summer. Mostly grad students outside of CS who have to write scripts to do research. And in more than one data scientist. I think it's correlated with using Jupyter. ↩ If I don't put this at the beginning, I'll get a bajillion responses like "your team will hate you" ↩