More from ntietz.com blog - technically a blog
There's a pizza shop near me that serves a normal pizza. I mean, they distribute the toppings in a normal way. They're not uniform at all. The toppings are random, but not the way I want.

The colloquial understanding of "random" is kind of the Platonic ideal of a pizza: slightly chaotic, but things are more or less spread out over the whole pie in a regular way. If you take a slice, you'll get more or less the same amount of pepperoni as any other slice. And every bite will have roughly the same amount of pepperoni as every other bite. I think it would look something like this.

Regenerate this pie!

This pizza to me is pretty much the canonical mental pizza. It looks pretty random, but you know what you're gonna get. And it is random! Here's how we made it, with the visualization part glossed over.

First, we make a helper function, since Math.random() gives us values from 0 to 1, but we want values from -1 to 1.

```javascript
// return a uniform random value in [-1, 1]
function randUniform() {
  return 2*Math.random() - 1;
}
```

Then, we make a simple function that gives us the coordinates of where to put a pepperoni piece, drawn from the uniform distribution.

```javascript
function uniformPepperoniPosition() {
  var [centerX, centerY, radius] = pepperoniBounds();
  // start outside the circle so the loop runs at least once
  let x = radius*2;
  let y = radius*2;
  // rejection sampling: re-roll any point that lands outside the pizza
  while (x**2 + y**2 >= radius**2) {
    x = randUniform() * radius;
    y = randUniform() * radius;
  }
  return [x+centerX, y+centerY];
}
```

And we cap it off with placing 300 fresh pieces of pepperoni on this pie, before we send it into the oven. (It's an outrageous amount of very small pepperoni, chosen in both axes for ease of visualizing the distribution rather than realism.)

```javascript
function drawUniformPizza() {
  drawBackground();
  drawPizzaCrust();
  drawCheese();
  for (let p = 0; p < 300; p++) {
    let [x,y] = uniformPepperoniPosition();
    drawPepperoni(x, y);
  }
}
```

But it's not what my local pizza shop's pizzas look like. That's because they're not using the same probability distribution. This pizza is using a uniform distribution. That means that for any given pepperoni, every single position on the pizza is equally likely for it to land on.

These are normal pizzas

We are using a uniform distribution here, but there are plenty of other distributions we could use as well. One of the other most familiar distributions is the normal distribution. This is the distribution with the normal "bell curve" that we are used to seeing. And this is probably what people are talking about most of the time when they talk about how many standard deviations something is away from something else.

So what would it look like if we did a normal distribution on a pizza? The very first thing we need to answer that is a way of getting values from the normal distribution. This isn't included with JavaScript by default, but we can implement it pretty simply using the Box-Muller transform. This might be a scary name, but it's really easy to use. It's a way of generating numbers from the normal distribution using numbers sampled from the uniform distribution. We can implement it like this:

```javascript
// Box-Muller transform: turns two uniform samples into two
// independent standard normal samples
function randNormal() {
  let theta = 2*Math.PI*Math.random();
  let r = Math.sqrt(-2*Math.log(Math.random()));
  let x = r * Math.cos(theta);
  let y = r * Math.sin(theta);
  return [x,y];
}
```
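As a quick sanity check (my addition, not from the original post), we can pull a pile of samples and confirm that the mean and standard deviation look like a standard normal's:

```javascript
// Rough check that randNormal() behaves: mean ≈ 0, stdev ≈ 1.
let samples = [];
for (let i = 0; i < 100000; i++) {
  samples.push(...randNormal()); // each call yields two samples
}
let mean = samples.reduce((a, b) => a + b, 0) / samples.length;
let sd = Math.sqrt(
  samples.reduce((a, b) => a + (b - mean) ** 2, 0) / samples.length
);
console.log(mean.toFixed(2), sd.toFixed(2)); // expect about 0.00 and 1.00
```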
Then we can make a pretty simple function again which gives us coordinates for where to place pepperoni in this distribution. The only little weird thing here is that I scale the radius down by a factor of 3. Without this, the pizza ends up nearly indistinguishable from the uniform distribution, but the scaling is arbitrary and you can do whatever you want.

```javascript
function normalPepperoniPosition() {
  var [centerX, centerY, radius] = pepperoniBounds();
  // start outside the circle so the loop runs at least once
  let x = radius*2;
  let y = radius*2;
  while (x**2 + y**2 >= radius**2) {
    [x,y] = randNormal();
    x = x * radius/3;
    y = y * radius/3;
  }
  return [x + centerX, y + centerY];
}
```

And then once again we cap it off with a 300-piece pepperoni pizza.

```javascript
function drawNormalPizza() {
  drawBackground();
  drawPizzaCrust();
  drawCheese();
  for (let p = 0; p < 300; p++) {
    let [x,y] = normalPepperoniPosition();
    drawPepperoni(x, y);
  }
}
```

Regenerate this pie!

Ouch. It's not my Platonic ideal of a pizza, that's for sure. It does look closer to the pizzas my local shop serves, but it's missing something... See, this one is centered around, you know, the center. Theirs are not. They're more chaotic, with a few handfuls of toppings. What if we did the normal distribution multiple times, with different centers?

First we have to update our position-picking function to accept a center for the cluster. We'll do this by passing in the center and generating coordinates around it, while still checking that we're within the bounds of the circle formed by the crust of the pizza.

```javascript
function normal(cx, cy) {
  var [centerX, centerY, radius] = pepperoniBounds();
  // start outside the circle so the loop runs at least once
  let x = centerX + radius*2;
  let y = centerY + radius*2;
  while ((x-centerX)**2 + (y-centerY)**2 >= radius**2) {
    [x,y] = randNormal();
    x = cx + x * radius/3;
    y = cy + y * radius/3;
  }
  return [x, y];
}
```

And then instead of one single loop for all 300 pieces, we can do 3 loops of 100 pieces each, with a different (randomly chosen) center for each.

```javascript
function drawClusterPizza() {
  drawBackground();
  drawPizzaCrust();
  drawCheese();
  for (let c = 0; c < 3; c++) {
    // pick a uniformly random center for this cluster
    let [cx, cy] = uniformPepperoniPosition();
    for (let p = 0; p < 100; p++) {
      let [x, y] = normal(cx, cy);
      drawPepperoni(x, y);
    }
  }
}
```

Regenerate this pie!

That looks more like it. Well, probably. This one is more chaotic, and sometimes things work out okay, but other times they're weird. Just like the real pizzas. Click that "regenerate" button a few times to see a few examples!

Okay, but when do you want one?

So, this is all great. But when would we want this? I mean, first of all: boring question. We don't need a reason except that it's fun! But there's one valid use case that a medical professional and I came up with[1]: hot honey[2]. The ideal pepperoni pizza just might be one that has uniformly distributed pepperoni with normally distributed hot honey or hot sauce. You'd start with more intense heat, then it would taper off as you go toward the crust, so you maintain the heat without getting overwhelmed by it.

The room to play here is endless! We can come up with a lot of other fun distributions and map them in similar ways. Unfortunately, we probably can't make a Poisson pizza, since that's a distribution for discrete variables.

[1] I really do talk about weird things with all my medical providers. And everyone else I meet. I don't know, life's too short to go "hey, this is a professional interaction, let's not chatter on and on about whatever irrelevant topic is on our mind." ↩

[2] The pizza topping, not my pet name. ↩
One of the first types we learn about is the boolean. It's pretty natural to use, because boolean logic underpins much of modern computing. And yet, it's one of the types we should probably be using a lot less of. In almost every single instance when you use a boolean, it should be something else. The trick is figuring out what "something else" is.

Doing this is worth the effort. It tells you a lot about your system, and it will improve your design (even if you end up using a boolean). There are a few possible types that come up often, hiding as booleans. Let's take a look at each of these, as well as the case where using a boolean does make sense. This isn't exhaustive[1]—there are surely other types that can make sense, too.

Datetimes

A lot of boolean data represents a temporal event having happened. For example, websites often have you confirm your email. This may be stored as a boolean column, is_confirmed, in the database. It makes a lot of sense. But you're throwing away data: when the confirmation happened.

You can instead store when the user confirmed their email in a nullable column. You can still get the same information by checking whether the column is null. But you also get richer data for other purposes. Maybe you find out down the road that there was a bug in your confirmation process. You can use these timestamps to check which users would be affected by that, based on when their confirmation was stored.

This is the one I've seen discussed the most of all of these. We run into it with almost every database we design, after all. You can detect it by asking if an action has to occur for the boolean to change values, and if values can only change one time. If you have both of these, then it really looks like a datetime being transformed into a boolean. Store the datetime!
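Here's a minimal sketch of what that can look like in application code (my illustration, not from the original post), using the chrono crate and made-up field names:

```rust
use chrono::{DateTime, Utc};

struct User {
    email: String,
    // instead of is_confirmed: bool, store *when* it happened
    email_confirmed_at: Option<DateTime<Utc>>,
}

impl User {
    // the boolean is still one expression away...
    fn is_confirmed(&self) -> bool {
        self.email_confirmed_at.is_some()
    }

    // ...but the richer data is kept when confirmation happens
    fn confirm(&mut self) {
        self.email_confirmed_at = Some(Utc::now());
    }
}
```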
Enums

Much of the remaining boolean data indicates either what type something is, or its status. Is a user an admin or not? Check the is_admin column! Did that job fail? Check the failed column! Is the user allowed to take this action? Return a boolean for that, yes or no!

These usually make more sense as an enum. Consider the admin case: this is really a user role, and you should have an enum for it. If it's a boolean, you're eventually going to need more columns, and you'll keep adding on other statuses. Oh, we had users and admins, but now we also need guest users and we need super-admins. With an enum, you can add those easily.

```rust
enum UserRole {
    User,
    Admin,
    Guest,
    SuperAdmin,
}
```

And then you can usually use your tooling to make sure that all the new cases are covered in your code. With a boolean, you have to add more booleans, and then you have to find all the places where the old booleans were used and make sure they handle these new cases, too. Enums help you avoid these bugs.

Job status is one that's pretty clearly an enum as well. If you use booleans, you'll have is_failed, is_started, is_queued, and on and on. Or you could just have one single field, status, which is an enum with the various statuses. (Note, though, that you probably do want timestamp fields for each of these events—but you're still best off having the status stored explicitly as well.) Once you store the status, this begins to resemble a state machine, and it means that you can write much cleaner code and analyze things along state-transition lines.

And it's not just for storing in a database, either. If you're checking a user's permissions, you often return a boolean for that.

```rust
fn check_permissions(user: User) -> bool {
    false // no one is allowed to do anything i guess
}
```

In this case, true means the user can do it and false means they can't. Usually. I think. But you can really start to have doubts here, and with any boolean, because the application-logic meaning of the value cannot be inferred from the type. Instead, this can be represented as an enum, even when there are just two choices.

```rust
enum PermissionCheck {
    Allowed,
    NotPermitted { reason: String },
}
```

As a bonus, if you use an enum you can end up with richer information, like returning a reason for a permission check failing. And you are safe for future expansions of the enum, just like with roles.

You can detect when something should be an enum by a proliferation of booleans which are mutually exclusive or depend on one another. You'll see multiple columns which are all changed at the same time. Or you'll see a boolean which is returned and used for a long time. It's important to use enums here to keep your program maintainable and understandable.

Conditionals

But when should we use a boolean? I've mainly run into one case where it makes sense: when you're (temporarily) storing the result of a conditional expression for evaluation. This is in some ways an optimization, either for the computer (reuse a variable[2]) or for the programmer (make a big conditional more comprehensible by giving it a name), by storing an intermediate value. Here's a contrived example of using a boolean as an intermediate value.

```rust
fn calculate_user_data(user: User, records: RecordStore) {
    // this would be some nice long conditional,
    // but I don't have one. So variables it is!
    let user_can_do_this: bool = (a && b) && (c || !d);

    if user_can_do_this && records.ready() {
        // do the thing
    } else if user_can_do_this && records.in_progress() {
        // do another thing
    } else {
        // and something else!
    }
}
```

But even here in this contrived example, some enums would make more sense. I'd keep the boolean, probably, simply to give a name to what we're calculating. But the rest of it should be a match on an enum!

* * *

Sure, not every boolean should go away. There's probably no single rule in software design that is always true. But we should be paying a lot more attention to booleans. They're sneaky. They feel like they make sense for our data, but they really make sense for our logic. The data is usually something different underneath. By storing a boolean as our data, we're coupling that data tightly to our application logic. Instead, we should remain critical and ask what data the boolean depends on, and whether we should store that instead.

It comes easier with practice. Really, all good design does. A little thinking up front saves you a lot of time in the long run.

[1] I know that using an em-dash is treated as a sign of using LLMs. LLMs are never used for my writing. I just really like em-dashes and have a dedicated key for them on one of my keyboard layers. ↩

[2] This one is probably best left to the compiler. ↩
One of the best known hard problems in computer science is the halting problem. In fact, it's widely thought[1] that you cannot write a program that will, for any arbitrary program as input, tell you correctly whether or not it will terminate. This is written from the framing of computers, though: can we do better with a human in the loop? It turns out, we can.

And we can use a method that's generalizable, which many people can follow for many problems. Not everyone can use the method; you'll see why in a bit. But lots of people can apply this proof technique. Let's get started.

* * *

We'll start by formalizing what we're talking about, just a little bit. I'm not going to give the full formal proof—that will be reserved for when this is submitted to a prestigious conference next year.

We will call the set of all programs P. We want to answer, for any p in P, whether or not p will eventually halt. We will call this h(p), and h(p) = true if p eventually finishes and false otherwise. Actually, scratch that. Let's simplify it and just say that yes, every program does halt eventually, so h(p) = true for all p. That makes our lives easier.

Now we need to get from our starting assumptions, the world of logic we live in, to the truth of our statement. We'll call our goal, that h(p) = true for all p, the statement H. Now let's start with some facts.

Fact one: I think it's always an appropriate time to play the saxophone. *honk*!

Fact two: My wife thinks that it's sometimes inappropriate to play the saxophone, such as when it's "time for bed" or "I was in the middle of a sentence!"[2]

We'll give the statement "It's always an appropriate time to play the saxophone" the name A. We know that I believe A is true. And my wife believes that A is false. So now we run into the snag:

Fact three: The wife is always right. This is a truism in American culture, useful for settling debates. It's also useful here for solving major problems in computer science because, babe, we're both the wife. We're both right!

So now that we're both right, we know that A and !A are both true. And we're in luck: we can apply a whole lot of fancy classical logic here. Since A and !A holds, we know that A is true and we also know that !A is true. From A being true, we can conclude that A or H is true. And then we can apply disjunctive syllogism[3], which says that if A or H is true and !A is true, then H must be true. This makes sense, because if you've excluded one possibility then the other must be true. And we do have !A, so that means: H is true!

There we have it. We've proved our proposition, H, which says that for any program p, p will eventually halt. The previous logic is, mostly, sound. It uses the principle of explosion, though I prefer to call it "proof by married lesbian."
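If you'd like to see the argument written out formally, here's a quick sketch in Lean 4 (my addition; the proof checker is perfectly happy as long as you hand it the contradictory premises as assumptions):

```lean
-- From A, conclude A ∨ H; then disjunctive syllogism with ¬A gives H.
example (A H : Prop) (ha : A) (hna : ¬A) : H :=
  Or.resolve_left (Or.inl ha : A ∨ H) hna

-- Or skip straight to the principle of explosion:
example (A H : Prop) (ha : A) (hna : ¬A) : H :=
  absurd ha hna
```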
* * *

Of course, we know that this is wrong. It falls apart with our assumptions. We built the system on contradictory assumptions to begin with, and this is something we avoid in logic[4]. If we allow contradictions, then we can prove truly anything. I could have also proved (by married lesbian) that no program will terminate.

This has been a silly traipse through logic. If you want a good journey through logic, I'd recommend Hillel Wayne's Logic for Programmers. I'm sure that, after reading it, you'll find absolutely no flaws in my logic here. After all, I'm the wife, so I'm always right.

[1] It's widely thought because it's true, but we don't have to let that keep us from a good time. ↩

[2] I fact-checked this with her, and she does indeed hold this belief. ↩

[3] I had to look this up; my uni logic class was a long time ago. ↩

[4] The real conclusion to draw is that, because of proof by contradiction, it's certainly not true that the wife is always right. Proved that one via married lesbians having arguments. Or maybe gay relationships are always magical and happy and everyone lives happily ever after, who knows. ↩
I've been publishing at least one blog post every week on this blog for about 2.5 years. I kept it up even when I was very sick last year with Lyme disease. It's time for me to take a break and reset. This is the right time, because the world is very difficult for me to move through right now and I'm just burnt out. I need to focus my energy on things that give me energy and right now, that's not writing and that's not tech. I'll come back to this, and it might look a little different. This is my last post for at least a month. It might be longer, if I still need more time, but I won't return before the end of May. I know I need at least that long to heal, and I also need that time to focus on music. I plan to play a set at West Philly Porchfest, so this whole month I'll be prepping that set. If you want to follow along with my music, you can find it on my bandcamp (only one track, but I'll post demos of the others that I prepare for Porchfest as they come together). And if you want to reach out, my inbox is open. Be kind to yourself. Stay well, drink some water. See you in a while.
More in programming
The first in a series of posts about doing things the right way
A deep dive into the Action Cache, the CAS, and the security issues that arise from using Bazel with a remote cache but without remote execution
(I present to you my stream of consciousness on the topic of casing as it applies to the web platform.)

I'm reading about the new command and commandfor attributes — which I'm super excited about, declarative behavior invocation in HTML? YES PLEASE!! — and one thing that strikes me is the casing in these APIs. For example, the command attribute has a variety of values in HTML which correspond to APIs in JavaScript. The show-popover attribute value maps to .showPopover() in JavaScript, hide-popover maps to .hidePopover(), etc. So what we have is:

- lowercase in attribute names, e.g. commandfor="..."
- kebab-case in attribute values, e.g. show-popover
- camelCase for JS counterparts, e.g. showPopover()

After thinking about this a little more, I remember that HTML attribute names are case-insensitive, so the browser will normalize them to lowercase during parsing. Given that, I suppose you could write commandFor="..." but it's effectively the same. Ok, lowercase attribute names in HTML makes sense. The related popover attributes follow the same convention:

- popovertarget
- popovertargetaction

And there are many other attribute names in HTML that are lowercase, e.g.:

- maxlength
- novalidate
- contenteditable
- autocomplete
- formenctype

So that all makes sense. But wait, there are some attribute names with hyphens in them, like aria-label="..." and data-value="...". So why isn't it command-for="..."? Well, upon further reflection, I suppose those attributes were named that way for extensibility's sake: they are essentially wildcard attributes that represent a family of attributes all under the same namespace: aria-* and data-*. But wait, isn't that an argument for doing popover-target and popover-target-action? Or command and command-for? But wait (I keep saying that) there are kebab-case attribute names in HTML — like http-equiv on the <meta> tag, or accept-charset on the form tag — but those seem more like legacy exceptions.

It seems like the only answer here is: there is no rule. Naming is driven by convention, and decisions are made on a case-by-case basis. But if I had to summarize, the default casing for new APIs tends to follow the rules I outlined at the start (and what's reflected in the new command APIs):

- lowercase for HTML attribute names
- kebab-case for HTML attribute values
- camelCase for JS counterparts

Let's not even get into SVG attribute names. We need one of those "bless this mess" signs that we can hang over the World Wide Web.
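To see all three conventions in one place, here's a tiny sketch based on my reading of the command/commandfor proposal (the API is new, so treat the specifics as tentative):

```html
<!-- lowercase attribute names; kebab-case attribute value -->
<button commandfor="greeting" command="show-popover">Say hi</button>
<div popover id="greeting">Hi!</div>

<script>
  // camelCase JS counterpart of the kebab-case attribute value
  document.getElementById("greeting").showPopover();
</script>
```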
Greetings everyone! You might have noticed that it's September and I don't have the next version of Logic for Programmers ready. As penance, here's ten free copies of the book.

So a few months ago I wrote a newsletter about how we use nondeterminism in formal methods. The overarching idea: nondeterminism is when multiple paths are possible from a starting state. A system preserves a property if it holds on all possible paths. If even one path violates the property, then we have a bug. An intuitive model of this is that, when faced with a nondeterministic choice, the system always makes the worst possible choice. This is sometimes called demonic nondeterminism and is favored in formal methods because we are paranoid to a fault.

The opposite would be angelic nondeterminism, where the system always makes the best possible choice. A property then holds if any possible path satisfies that property.[1] This is not as common in FM, but it still has its uses! "Players can access the secret level" or "We can always shut down the computer" are reachability properties: something is possible even if not actually done. In broader computer science research, I'd say that angelic nondeterminism is more popular, due to its widespread use in complexity analysis and programming languages.

Complexity Analysis

P is the set of all "decision problems" (basically, boolean functions) that can be solved in polynomial time: there's an algorithm that's worst-case in O(n), O(n²), O(n³), etc.[2] NP is the set of all problems that can be solved in polynomial time by an algorithm with angelic nondeterminism.[3] For example, the question "does list l contain x" can be solved in O(1) time by a nondeterministic algorithm:

```
fun is_member(l: List[T], x: T): bool {
  if l == [] {return false};
  guess i in 0..<len(l);
  return l[i] == x;
}
```

Say we call is_member([a, b, c, d], c). The best possible choice would be to guess i = 2, which would correctly return true. Now call is_member([a, b], d). No matter what we guess, the algorithm correctly returns false. Ergo, O(1). NP stands for "Nondeterministic Polynomial". (And I just now realized something pretty cool: you can say that P is the set of all problems solvable in polynomial time under demonic nondeterminism, which is a nice parallel between the two classes.)

Computer scientists have proven that angelic nondeterminism doesn't give us any more "power": there are no problems solvable with AN that aren't also solvable deterministically. The big question is whether AN is more efficient: it is widely believed, but not proven, that there are problems in NP but not in P. Most famously, "Is there any variable assignment that makes this boolean formula true?" A polynomial AN algorithm is again easy:

```
fun SAT(f(x1, x2, …: bool): bool): bool {
  N = num_params(f)
  for i in 1..=N {
    guess x_i in {true, false}
  }
  return f(x_1, x_2, …)
}
```

The best deterministic algorithms we have to solve the same problem are worst-case exponential in the number of boolean parameters. This is a real frustrating problem, because real computers don't have angelic nondeterminism, so problems like SAT remain hard. We can solve most "well-behaved" instances of the problem in reasonable time, but the worst-case instances get intractable real fast.

Means of Abstraction

We can directly turn an AN algorithm into a (possibly much slower) deterministic algorithm, such as by backtracking. This makes AN a pretty good abstraction over what an algorithm is doing.
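As an illustration (mine, not from the original newsletter), here's what mechanically removing the angelic choice from the SAT algorithm above can look like in JavaScript: every guess becomes a backtracking branch that tries true, then falls back to false.

```javascript
// Deterministic SAT by backtracking: worst-case exponential in the
// number of variables, which is exactly the price of losing the guess.
function satBacktrack(f, numParams, assignment = []) {
  if (assignment.length === numParams) {
    return f(...assignment);
  }
  // "guess x_i in {true, false}" becomes: try true, then try false.
  return (
    satBacktrack(f, numParams, [...assignment, true]) ||
    satBacktrack(f, numParams, [...assignment, false])
  );
}

// Example: is (x1 || x2) && !x1 satisfiable? Yes: x1 = false, x2 = true.
console.log(satBacktrack((x1, x2) => (x1 || x2) && !x1, 2)); // true
```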
Does the regex (a+b)\1+ match "abaabaabaab"? Yes, if the regex engine nondeterministically guesses that it needs to start at the third letter and make the group aab. How does my PL's regex implementation find that match? I dunno, backtracking or NFA construction or something; I don't need to know the deterministic specifics in order to use the nondeterministic abstraction.

Neel Krishnaswami has a great definition of 'declarative language': "any language with a semantics has some nontrivial existential quantifiers in it". I'm not sure if this is identical to saying "a language with an angelic nondeterministic abstraction", but they must be pretty close, and all of his examples match:

- SQL's selects and joins
- Parsing DSLs
- Logic programming's unification
- Constraint solving

On top of that I'd add CSS selectors and a planner's actions: all nondeterministic abstractions over a deterministic implementation. He also says that the things programmers hate most in declarative languages are features "that expose the operational model": constraint solver search strategies, Prolog cuts, regex backreferences, etc. Which again matches my experiences with angelic nondeterminism: I dread features that force me to understand the deterministic implementation. But they're necessary, since P probably != NP and so we need to worry about operational optimizations.

Eldritch Nondeterminism

If you need to know the ratio of good/bad paths, the number of good paths, or a probability, or anything more than "there is a good path" or "there is a bad path", you are beyond the reach of heaven or hell.

[1] Angelic and demonic nondeterminism are duals: angelic returns "yes" if some choice: correct, and demonic returns "no" if !all choice: correct, which is the same as some choice: !correct. ↩

[2] Pet peeve about Big-O notation: O(n²) is the set of all algorithms that, for sufficiently large problem sizes, grow no faster than quadratically. "Bubblesort has O(n²) complexity" should be written Bubblesort ∈ O(n²), not Bubblesort = O(n²). ↩

[3] To be precise, solvable in polynomial time by a Nondeterministic Turing Machine, a very particular model of computation. We can broadly talk about P and NP without framing everything in terms of Turing machines, but some details of complexity classes (like the existence of "weak NP-hardness") kinda need Turing machines to make sense. ↩
The 2025 edition of the TokyoDev Developer Survey is now live! If you're a software developer living in Japan, please take a few minutes to participate. All questions are optional, and it should take less than 10 minutes to complete. The survey will remain open until September 30th.

Last year, we received over 800 responses. Highlights included:

- Median compensation remained stable.
- The pay gap between international and Japanese companies narrowed to 47%.
- Fewer respondents had the option to work fully remotely.

For 2025, we've added several new questions, including a dedicated section on one of the most talked-about topics in development today: AI. The survey is completely anonymous, and only aggregated results will be shared—never personally identifiable information. The more responses we get, the deeper and more meaningful our insights will be. Please help by taking the survey and sharing it with your peers!