More from ntietz.com blog - technically a blog
When you're just getting started with music, you have so many skills to learn. You have to be able to play your instrument and express yourself through it. You need to know the style you're playing, and its idioms and conventions. You may want to record your music, and need all the skills that come along with it. Music is, mostly, subjective: there's not an objective right or wrong way to do things. And that can make it really hard! Each of these skills is then couched in this subjectivity of trying to see if it's good enough. Playing someone else's music, making a cover, is great because it can make it objective. It gives you something to check against. When you're playing your own music, you're in charge of the entire thing. You didn't play a wrong note, because, well, you've just changed the piece! But when you play someone else's music, now there's an original and you can try to get as close to it as possible. Recreating it gives you a lot of practice in figuring out what someone did and how they did it. It also lets you peek into why they did it. Maybe a particular chord voicing is hard for you to play. Okay, let's simplify it and play an easier voicing. How does it sound now? How does it sound with the harder one? Play around with those differences and you start to see the why behind it all. * * * The same thing holds true for programming. One of my friends is a C++ programmer[1] and he was telling me about how he learned C++ and data structures really well early on: He reimplemented parts of the Boost library. This code makes heavy use of templates, a hard thing in C++. And it provides fundamental data structures with robust implementations and good performance[2]. What he would do is look at the library and pick a slice of it to implement. He'd look at what the API for it is, how it was implemented, what it was doing under the hood. Then he'd go ahead and try to do it himself, without any copy-pasting and without real-time copying from the other screen. Sometimes, he'd run into things which didn't make sense. Why is this a doubly-linked list here, when it seems a singly-linked list would do just fine? And in those moments, if you can't find a reason? You get to go down that path, make it the singly-linked version, and then find out later: oh, ohhh. Ohhhh, they did that for a reason. It lets you run into some of the hard problems, grapple with them, and understand why the original was written how it was. You get to study with some really strong programmers, by proxy via their codebase. Their code is your tutor and your guide for understanding how to write similar things in the future. * * * There's a lot of judgment out there about doing original works. This kind of judgment of covers and of reimplementing things that already exist, just to learn. So many people have internalized this, and I've heard countless times "I want to make a new project, but everything I think of, someone else has already done!" And to that, I say: do it anyway[3]. If someone else has done it, that's great. That means that you had an idea so good that someone else thought it was a good idea, too. And that means that, because someone else has done it, you have a reference now. You can compare notes, and you can see how they did it, and you can learn. I'm a recovering C++ programmer myself, and had some unpleasant experiences associated with the language. This friend is a game developer, and his industry is one where C++ makes a lot of sense to use because of the built-up code around it. ↩ He said they're not perfect, but that they're really good and solid and you know a lot of people thought for a long time about how to do them. You get to follow in their footsteps and benefit from all that hard thinking time. ↩ But: you must always give credit when you are using someone else's work. If you're reimplementing someone else's library, or covering someone's song, don't claim it's your own original invention. ↩
One of the first types we learn about is the boolean. It's pretty natural to use, because boolean logic underpins much of modern computing. And yet, it's one of the types we should probably be using a lot less of. In almost every single instance when you use a boolean, it should be something else. The trick is figuring out what "something else" is. Doing this is worth the effort. It tells you a lot about your system, and it will improve your design (even if you end up using a boolean). There are a few possible types that come up often, hiding as booleans. Let's take a look at each of these, as well as the case where using a boolean does make sense. This isn't exhaustive—[1]there are surely other types that can make sense, too. Datetimes A lot of boolean data is representing a temporal event having happened. For example, websites often have you confirm your email. This may be stored as a boolean column, is_confirmed, in the database. It makes a lot of sense. But, you're throwing away data: when the confirmation happened. You can instead store when the user confirmed their email in a nullable column. You can still get the same information by checking whether the column is null. But you also get richer data for other purposes. Maybe you find out down the road that there was a bug in your confirmation process. You can use these timestamps to check which users would be affected by that, based on when their confirmation was stored. This is the one I've seen discussed the most of all these. We run into it with almost every database we design, after all. You can detect it by asking if an action has to occur for the boolean to change values, and if values can only change one time. If you have both of these, then it really looks like it is a datetime being transformed into a boolean. Store the datetime! Enums Much of the remaining boolean data indicates either what type something is, or its status. Is a user an admin or not? Check the is_admin column! Did that job fail? Check the failed column! Is the user allowed to take this action? Return a boolean for that, yes or no! These usually make more sense as an enum. Consider the admin case: this is really a user role, and you should have an enum for it. If it's a boolean, you're going to eventually need more columns, and you'll keep adding on other statuses. Oh, we had users and admins, but now we also need guest users and we need super-admins. With an enum, you can add those easily. enum UserRole { User, Admin, Guest, SuperAdmin, } And then you can usually use your tooling to make sure that all the new cases are covered in your code. With a boolean, you have to add more booleans, and then you have to make sure you find all the places where the old booleans were used and make sure they handle these new cases, too. Enums help you avoid these bugs. Job status is one that's pretty clearly an enum as well. If you use booleans, you'll have is_failed, is_started, is_queued, and on and on. Or you could just have one single field, status, which is an enum with the various statuses. (Note, though, that you probably do want timestamp fields for each of these events—but you're still best having the status stored explicitly as well.) This begins to resemble a state machine once you store the status, and it means that you can make much cleaner code and analyze things along state transition lines. And it's not just for storing in a database, either. If you're checking a user's permissions, you often return a boolean for that. fn check_permissions(user: User) -> bool { false // no one is allowed to do anything i guess } In this case, true means the user can do it and false means they can't. Usually. I think. But you can really start to have doubts here, and with any boolean, because the application logic meaning of the value cannot be inferred from the type. Instead, this can be represented as an enum, even when there are just two choices. enum PermissionCheck { Allowed, NotPermitted(reason: String), } As a bonus, though, if you use an enum? You can end up with richer information, like returning a reason for a permission check failing. And you are safe for future expansions of the enum, just like with roles. You can detect when something should be an enum a proliferation of booleans which are mutually exclusive or depend on one another. You'll see multiple columns which are all changed at the same time. Or you'll see a boolean which is returned and used for a long time. It's important to use enums here to keep your program maintainable and understandable. Conditionals But when should we use a boolean? I've mainly run into one case where it makes sense: when you're (temporarily) storing the result of a conditional expression for evaluation. This is in some ways an optimization, either for the computer (reuse a variable[2]) or for the programmer (make it more comprehensible by giving a name to a big conditional) by storing an intermediate value. Here's a contrived example where using a boolean as an intermediate value. fn calculate_user_data(user: User, records: RecordStore) { // this would be some nice long conditional, // but I don't have one. So variables it is! let user_can_do_this: bool = (a && b) && (c || !d); if user_can_do_this && records.ready() { // do the thing } else if user_can_do_this && records.in_progress() { // do another thing } else { // and something else! } } But even here in this contrived example, some enums would make more sense. I'd keep the boolean, probably, simply to give a name to what we're calculating. But the rest of it should be a match on an enum! * * * Sure, not every boolean should go away. There's probably no single rule in software design that is always true. But, we should be paying a lot more attention to booleans. They're sneaky. They feel like they make sense for our data, but they make sense for our logic. The data is usually something different underneath. By storing a boolean as our data, we're coupling that data tightly to our application logic. Instead, we should remain critical and ask what data the boolean depends on, and should we maybe store that instead? It comes easier with practice. Really, all good design does. A little thinking up front saves you a lot of time in the long run. I know that using an em-dash is treated as a sign of using LLMs. LLMs are never used for my writing. I just really like em-dashes and have a dedicated key for them on one of my keyboard layers. ↩ This one is probably best left to the compiler. ↩
One of the best known hard problems in computer science is the halting problem. In fact, it's widely thought[1] that you cannot write a program that will, for any arbitrary program as input, tell you correctly whether or not it will terminate. This is written from the framing of computers, though: can we do better with a human in the loop? It turns out, we can. And we can use a method that's generalizable, which many people can follow for many problems. Not everyone can use the method, which you'll see why in a bit. But lots of people can apply this proof technique. Let's get started. * * * We'll start by formalizing what we're talking about, just a little bit. I'm not going to give the full formal proof—that will be reserved for when this is submitted to a prestigious conference next year. We will call the set of all programs P. We want to answer, for any p in P, whether or not p will eventually halt. We will call this h(p) and h(p) = true if p eventually finished and false otherwise. Actually, scratch that. Let's simplify it and just say that yes, every program does halt eventually, so h(p) = true for all p. That makes our lives easier. Now we need to get from our starting assumptions, the world of logic we live in, to the truth of our statement. We'll call our goal, that h(p) = true for all p, the statement H. Now let's start with some facts. Fact one: I think it's always an appropriate time to play the saxophone. *honk*! Fact two: My wife thinks that it's sometimes inappropriate to play the saxophone, such as when it's "time for bed" or "I was in the middle of a sentence![2] We'll give the statement "It's always an appropriate time to play the saxophone" the name A. We know that I believe A is true. And my wife believes that A is false. So now we run into the snag: Fact three: The wife is always right. This is a truism in American culture, useful for settling debates. It's also useful here for solving major problems in computer science because, babe, we're both the wife. We're both right! So now that we're both right, we know that A and !A are both true. And we're in luck, we can apply a whole lot of fancy classical logic here. Since A and !A we know that A is true and we also know that !A is true. From A being true, we can conclude that A or H is true. And then we can apply disjunctive syllogism[3] which says that if A or H is true and !A is true, then H must be true. This makes sense, because if you've excluded one possibility then the other must be true. And we do have !A, so that means: H is true! There we have it. We've proved our proposition, H, which says that for any program p, p will eventually halt. The previous logic is, mostly, sound. It uses the principle of explosion, though I prefer to call it "proof by married lesbian." * * * Of course, we know that this is wrong. It falls apart with our assumptions. We built the system on contradictory assumptions to begin with, and this is something we avoid in logic[4]. If we allow contradictions, then we can prove truly anything. I could have also proved (by married lesbian) that no program will terminate. This has been a silly traipse through logic. If you want a good journey through logic, I'd recommend Hillel Wayne's Logic for Programmers. I'm sure that, after reading it, you'll find absolutely no flaws in my logic here. After all, I'm the wife, so I'm always right. It's widely thought because it's true, but we don't have to let that keep us from a good time. ↩ I fact checked this with her, and she does indeed hold this belief. ↩ I had to look this up, my uni logic class was a long time ago. ↩ The real conclusion to draw is that, because of proof by contradiction, it's certainly not true that the wife is always right. Proved that one via married lesbians having arguments. Or maybe gay relationships are always magical and happy and everyone lives happily ever after, who knows. ↩
I've been publishing at least one blog post every week on this blog for about 2.5 years. I kept it up even when I was very sick last year with Lyme disease. It's time for me to take a break and reset. This is the right time, because the world is very difficult for me to move through right now and I'm just burnt out. I need to focus my energy on things that give me energy and right now, that's not writing and that's not tech. I'll come back to this, and it might look a little different. This is my last post for at least a month. It might be longer, if I still need more time, but I won't return before the end of May. I know I need at least that long to heal, and I also need that time to focus on music. I plan to play a set at West Philly Porchfest, so this whole month I'll be prepping that set. If you want to follow along with my music, you can find it on my bandcamp (only one track, but I'll post demos of the others that I prepare for Porchfest as they come together). And if you want to reach out, my inbox is open. Be kind to yourself. Stay well, drink some water. See you in a while.
More in programming
I’ve long been interested in new and different platforms. I ran Debian on an Alpha back in the late 1990s and was part of the Alpha port team; then I helped bootstrap Debian on amd64. I’ve got somewhere around 8 Raspberry Pi devices in active use right now, and the free NNCPNET Internet email service … Continue reading ARM is great, ARM is terrible (and so is RISC-V) →
In my first interview out of college I was asked the change counter problem: Given a set of coin denominations, find the minimum number of coins required to make change for a given number. IE for USA coinage and 37 cents, the minimum number is four (quarter, dime, 2 pennies). I implemented the simple greedy algorithm and immediately fell into the trap of the question: the greedy algorithm only works for "well-behaved" denominations. If the coin values were [10, 9, 1], then making 37 cents would take 10 coins in the greedy algorithm but only 4 coins optimally (10+9+9+9). The "smart" answer is to use a dynamic programming algorithm, which I didn't know how to do. So I failed the interview. But you only need dynamic programming if you're writing your own algorithm. It's really easy if you throw it into a constraint solver like MiniZinc and call it a day. int: total; array[int] of int: values = [10, 9, 1]; array[index_set(values)] of var 0..: coins; constraint sum (c in index_set(coins)) (coins[c] * values[c]) == total; solve minimize sum(coins); You can try this online here. It'll give you a prompt to put in total and then give you successively-better solutions: coins = [0, 0, 37]; ---------- coins = [0, 1, 28]; ---------- coins = [0, 2, 19]; ---------- coins = [0, 3, 10]; ---------- coins = [0, 4, 1]; ---------- coins = [1, 3, 0]; ---------- Lots of similar interview questions are this kind of mathematical optimization problem, where we have to find the maximum or minimum of a function corresponding to constraints. They're hard in programming languages because programming languages are too low-level. They are also exactly the problems that constraint solvers were designed to solve. Hard leetcode problems are easy constraint problems.1 Here I'm using MiniZinc, but you could just as easily use Z3 or OR-Tools or whatever your favorite generalized solver is. More examples This was a question in a different interview (which I thankfully passed): Given a list of stock prices through the day, find maximum profit you can get by buying one stock and selling one stock later. It's easy to do in O(n^2) time, or if you are clever, you can do it in O(n). Or you could be not clever at all and just write it as a constraint problem: array[int] of int: prices = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]; var int: buy; var int: sell; var int: profit = prices[sell] - prices[buy]; constraint sell > buy; constraint profit > 0; solve maximize profit; Reminder, link to trying it online here. While working at that job, one interview question we tested out was: Given a list, determine if three numbers in that list can be added or subtracted to give 0? This is a satisfaction problem, not a constraint problem: we don't need the "best answer", any answer will do. We eventually decided against it for being too tricky for the engineers we were targeting. But it's not tricky in a solver; include "globals.mzn"; array[int] of int: numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]; array[index_set(numbers)] of var {0, -1, 1}: choices; constraint sum(n in index_set(numbers)) (numbers[n] * choices[n]) = 0; constraint count(choices, -1) + count(choices, 1) = 3; solve satisfy; Okay, one last one, a problem I saw last year at Chipy AlgoSIG. Basically they pick some leetcode problems and we all do them. I failed to solve this one: Given an array of integers heights representing the histogram's bar height where the width of each bar is 1, return the area of the largest rectangle in the histogram. The "proper" solution is a tricky thing involving tracking lots of bookkeeping states, which you can completely bypass by expressing it as constraints: array[int] of int: numbers = [2,1,5,6,2,3]; var 1..length(numbers): x; var 1..length(numbers): dx; var 1..: y; constraint x + dx <= length(numbers); constraint forall (i in x..(x+dx)) (y <= numbers[i]); var int: area = (dx+1)*y; solve maximize area; output ["(\(x)->\(x+dx))*\(y) = \(area)"] There's even a way to automatically visualize the solution (using vis_geost_2d), but I didn't feel like figuring it out in time for the newsletter. Is this better? Now if I actually brought these questions to an interview the interviewee could ruin my day by asking "what's the runtime complexity?" Constraint solvers runtimes are unpredictable and almost always than an ideal bespoke algorithm because they are more expressive, in what I refer to as the capability/tractability tradeoff. But even so, they'll do way better than a bad bespoke algorithm, and I'm not experienced enough in handwriting algorithms to consistently beat a solver. The real advantage of solvers, though, is how well they handle new constraints. Take the stock picking problem above. I can write an O(n²) algorithm in a few minutes and the O(n) algorithm if you give me some time to think. Now change the problem to Maximize the profit by buying and selling up to max_sales stocks, but you can only buy or sell one stock at a given time and you can only hold up to max_hold stocks at a time? That's a way harder problem to write even an inefficient algorithm for! While the constraint problem is only a tiny bit more complicated: include "globals.mzn"; int: max_sales = 3; int: max_hold = 2; array[int] of int: prices = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]; array [1..max_sales] of var int: buy; array [1..max_sales] of var int: sell; array [index_set(prices)] of var 0..max_hold: stocks_held; var int: profit = sum(s in 1..max_sales) (prices[sell[s]] - prices[buy[s]]); constraint forall (s in 1..max_sales) (sell[s] > buy[s]); constraint profit > 0; constraint forall(i in index_set(prices)) (stocks_held[i] = (count(s in 1..max_sales) (buy[s] <= i) - count(s in 1..max_sales) (sell[s] <= i))); constraint alldifferent(buy ++ sell); solve maximize profit; output ["buy at \(buy)\n", "sell at \(sell)\n", "for \(profit)"]; Most constraint solving examples online are puzzles, like Sudoku or "SEND + MORE = MONEY". Solving leetcode problems would be a more interesting demonstration. And you get more interesting opportunities to teach optimizations, like symmetry breaking. Because my dad will email me if I don't explain this: "leetcode" is slang for "tricky algorithmic interview questions that have little-to-no relevance in the actual job you're interviewing for." It's from leetcode.com. ↩
I’m something of a filesystem geek, I guess. I first wrote about ZFS on Linux 14 years ago, and even before I used ZFS, I had used ext2/3/4, jfs, reiserfs, xfs, and no doubt some others. I’ve also used btrfs. I last posted about it in 2014, when I noted it has some advantages over … Continue reading btrfs on a Raspberry Pi →
Something like a channel changer, for the web. That's what the idea was at first. But it led to a whole new path of discovery that even the site's creators couldn't have predicted. The post Stumbling upon appeared first on The History of the Web.