Full Width [alt+shift+f] Shortcuts [alt+shift+k]
Sign Up [alt+shift+s] Log In [alt+shift+l]
23
No newsletter next week, I'm teaching a TLA+ workshop. Speaking of which: I spend a lot of time thinking about formal methods (and TLA+ specifically) because it's where the source of almost all my revenue. But I don't share most of the details because 90% of my readers don't use FM and never will. I think it's more interesting to talk about ideas from FM that would be useful to people outside that field. For example, the idea of "property strength" translates to the idea that some tests are stronger than others. Another possible export is how FM approaches nondeterminism. A nondeterministic algorithm is one that, from the same starting conditions, has multiple possible outputs. This is nondeterministic: # Pseudocode def f() { return rand()+1; } When specifying systems, I may not encounter nondeterminism more often than in real systems, but I am definitely more aware of its presence. Modeling nondeterminism is a core part of formal specification. I mentally categorize...
a month ago

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from Computer Things

[April Cools] Gaming Games for Non-Gamers

My April Cools is out! Gaming Games for Non-Gamers is a 3,000 word essay on video games worth playing if you've never enjoyed a video game before. Patreon notes here. (April Cools is a project where we write genuine content on non-normal topics. You can see all the other April Cools posted so far here. There's still time to submit your own!) April Cools' Club

yesterday 1 votes
Betteridge's Law of Software Engineering Specialness

Logic for Programmers v0.8 now out! The new release has minor changes: new formatting for notes and a better introduction to predicates. I would have rolled it all into v0.9 next month but I like the monthly cadence. Get it here! Betteridge's Law of Software Engineering Specialness In There is No Automatic Reset in Engineering, Tim Ottinger asks: Do the other people have to live with January 2013 for the rest of their lives? Or is it only engineering that has to deal with every dirty hack since the beginning of the organization? Betteridge's Law of Headlines says that if a journalism headline ends with a question mark, the answer is probably "no". I propose a similar law relating to software engineering specialness:1 If someone asks if some aspect of software development is truly unique to just software development, the answer is probably "no". Take the idea that "in software, hacks are forever." My favorite example of this comes from a different profession. The Dewey Decimal System hierarchically categorizes books by discipline. For example, Covered Bridges of Pennsylvania has Dewey number 624.37. 6-- is the technology discipline, 62- is engineering, 624 is civil engineering, and 624.3 is "special types of bridges". I have no idea what the last 0.07 means, but you get the picture. Now if you look at the 6-- "technology" breakdown, you'll see that there's no "software" subdiscipline. This is because when Dewey preallocated the whole technology block in 1876. New topics were instead to be added to the 00- "general-knowledge" catch-all. Eventually 005 was assigned to "software development", meaning The C Programming Language lives at 005.133. Incidentally, another late addition to the general knowledge block is 001.9: "controversial knowledge". And that's why my hometown library shelved the C++ books right next to The Mothman Prophecies. How's that for technical debt? If anything, fixing hacks in software is significantly easier than in other fields. This came up when I was interviewing classic engineers. Kludges happened all the time, but "refactoring" them out is expensive. Need to house a machine that's just two inches taller than the room? Guess what, you're cutting a hole in the ceiling. (Even if we restrict the question to other departments in a software company, we can find kludges that are horrible to undo. I once worked for a company which landed an early contract by adding a bespoke support agreement for that one customer. That plagued them for years afterward.) That's not to say that there aren't things that are different about software vs other fields!2 But I think that most of the time, when we say "software development is the only profession that deals with XYZ", it's only because we're ignorant of how those other professions work. Short newsletter because I'm way behind on writing my April Cools. If you're interested in April Cools, you should try it out! I make it way harder on myself than it actually needs to be— everybody else who participates finds it pretty chill. Ottinger caveats it with "engineering, software or otherwise", so I think he knows that other branches of engineering, at least, have kludges. ↩ The "software is different" idea that I'm most sympathetic to is that in software, the tools we use and the products we create are made from the same material. That's unusual at least in classic engineering. Then again, plenty of machinists have made their own lathes and mills! ↩

a week ago 9 votes
New Blog Post: "A Perplexing Javascript Parsing Puzzle"

I know I said we'd be back to normal newsletters this week and in fact had 80% of one already written. Then I unearthed something that was better left buried. Blog post here, Patreon notes here (Mostly an explanation of how I found this horror in the first place). Next week I'll send what was supposed to be this week's piece. (PS: April Cools in three weeks!)

3 weeks ago 13 votes
Are Efficiency and Horizontal Scalability at odds?

Sorry for missing the newsletter last week! I started writing on Monday as normal, and by Wednesday the piece (about the hierarchy of controls ) was 2000 words and not close to done. So now it'll be a blog post sometime later this month. I also just released a new version of Logic for Programmers! 0.7 adds a bunch of new content (type invariants, modeling access policies, rewrites of the first chapters) but more importantly has new fonts that are more legible than the old ones. Go check it out! For this week's newsletter I want to brainstorm an idea I've been noodling over for a while. Say we have a computational task, like running a simulation or searching a very large graph, and it's taking too long to complete on a computer. There's generally three things that we can do to make it faster: Buy a faster computer ("vertical scaling") Modify the software to use the computer's resources better ("efficiency") Modify the software to use multiple computers ("horizontal scaling") (Splitting single-threaded software across multiple threads/processes is sort of a blend of (2) and (3).) The big benefit of (1) is that we (usually) don't have to make any changes to the software to get a speedup. The downside is that for the past couple of decades computers haven't gotten much faster, except in ways that require recoding (like GPUs and multicore). This means we rely on (2) and (3), and we can do both to a point. I've noticed, though, that horizontal scaling seems to conflict with efficiency. Software optimized to scale well tends to be worse or the N=1 case than software optimized to, um, be optimized. Are there reasons to expect this? It seems reasonable that design goals of software are generally in conflict, purely because exclusively optimizing for one property means making decisions that impede other properties. But is there something in the nature of "efficiency" and "horizontal scalability" that make them especially disjoint? This isn't me trying to explain a fully coherent idea, more me trying to figure this all out to myself. Also I'm probably getting some hardware stuff wrong Amdahl's Law According to Amdahl's Law, the maximum speedup by parallelization is constrained by the proportion of the work that can be parallelized. If 80% of algorithm X is parallelizable, the maximum speedup from horizontal scaling is 5x. If algorithm Y is 25% parallelizable, the maximum speedup is only 1.3x. If you need horizontal scalability, you want to use algorithm X, even if Y is naturally 3x faster. But if Y was 4x faster, you'd prefer it to X. Maximal scalability means finding the optimal balance between baseline speed and parallelizability. Maximal efficiency means just optimizing baseline speed. Coordination Overhead Distributed algorithms require more coordination. To add a list of numbers in parallel via fork-join, we'd do something like this: Split the list into N sublists Fork a new thread/process for sublist Wait for each thread/process to finish Add the sums together. (1), (2), and (3) all add overhead to the algorithm. At the very least, it's extra lines of code to execute, but it can also mean inter-process communication or network hops. Distribution also means you have fewer natural correctness guarantees, so you need more administrative overhead to avoid race conditions. Real world example: Historically CPython has a "global interpreter lock" (GIL). In multithreaded code, only one thread could execute Python code at a time (others could execute C code). The newest version supports disabling the GIL, which comes at a 40% overhead for single-threaded programs. Supposedly the difference is because the specializing adaptor optimization isn't thread-safe yet. The Python team is hoping on getting it down to "only" 10%. Scaling loses shared resources I'd say that intra-machine scaling (multiple threads/processes) feels qualitatively different than inter-machine scaling. Part of that is that intra-machine scaling is "capped" while inter-machine is not. But there's also a difference in what assumptions you can make about shared resources. Starting from the baseline of single-threaded program: Threads have a much harder time sharing CPU caches (you have to manually mess with affinities) Processes have a much harder time sharing RAM (I think you have to use mmap?) Machines can't share cache, RAM, or disk, period. It's a lot easier to solve a problem when the whole thing fits in RAM. But if you split a 50 gb problem across three machines, it doesn't fit in ram by default, even if the machines have 64 gb each. Scaling also means that separate machines can't reuse resources like database connections. Efficiency comes from limits I think the two previous points tie together in the idea that maximal efficiency comes from being able to make assumptions about the system. If we know the exact sequence of computations, we can aim to minimize cache misses. If we don't have to worry about thread-safety, tracking references is dramatically simpler. If we have all of the data in a single database, our query planner has more room to work with. At various tiers of scaling these assumptions are no longer guaranteed and we lose the corresponding optimizations. Sometimes these assumptions are implicit and crop up in odd places. Like if you're working at a scale where you need multiple synced databases, you might want to use UUIDs instead of numbers for keys. But then you lose the assumption "recently inserted rows are close together in the index", which I've read can lead to significant slowdowns. This suggests that if you can find a limit somewhere else, you can get both high horizontal scaling and high efficiency. Supposedly the Tigerbeetle database has both, but that could be because they limit all records to accounts and transfers. This means every record fits in exactly 128 bytes. Does this mean that "assumptions" could be both "assumptions about the computing environment" and "assumptions about the problem"? In the famous essay Scalability! But at what COST, Frank McSherry shows that his single-threaded laptop could outperform 128-node "big data systems" on PageRank and graph connectivity (via label propagation). Afterwards, he discusses how a different algorithm solves graph connectivity even faster: [Union find] is more line of code than label propagation, but it is 10x faster and 100x less embarassing. … The union-find algorithm is fundamentally incompatible with the graph computation approaches Giraph, GraphLab, and GraphX put forward (the so-called “think like a vertex” model). The interesting thing to me is that his alternate makes more "assumptions" than what he's comparing to. He can "assume" a fixed goal and optimize the code for that goal. The "big data systems" are trying to be general purpose compute platforms and have to pick a model that supports the widest range of possible problems. A few years back I wrote clever vs insightful code, I think what I'm trying to say here is that efficiency comes from having insight into your problem and environment. (Last thought to shove in here: to exploit assumptions, you need control. Carefully arranging your data to fit in L1 doesn't matter if your programming language doesn't let you control where things are stored!) Is there a cultural aspect? Maybe there's also a cultural element to this conflict. What if the engineers interested in "efficiency" are different from the engineers interested in "horizontal scaling"? At my first job the data scientists set up a Hadoop cluster for their relatively small dataset, only a few dozen gigabytes or so. One of the senior software engineers saw this and said "big data is stupid." To prove it, he took one of their example queries, wrote a script in Go to compute the same thing, and optimized it to run faster on his machine. At the time I was like "yeah, you're right, big data IS stupid!" But I think now that we both missed something obvious: with the "scalable" solution, the data scientists didn't have to write an optimized script for every single query. Optimizing code is hard, adding more machines is easy! The highest-tier of horizontal scaling is usually something large businesses want, and large businesses like problems that can be solved purely with money. Maximizing efficiency requires a lot of knowledge-intensive human labour, so is less appealing as an investment. Then again, I've seen a lot of work on making the scalable systems more efficient, such as evenly balancing heterogeneous workloads. Maybe in the largest systems intra-machine efficiency is just too small-scale a problem. I'm not sure where this fits in but scaling a volume of tasks conflicts less than scaling individual tasks If you have 1,000 machines and need to crunch one big graph, you probably want the most scalable algorithm. If you instead have 50,000 small graphs, you probably want the most efficient algorithm, which you then run on all 1,000 machines. When we call a problem embarrassingly parallel, we usually mean it's easy to horizontally scale. But it's also one that's easy to make more efficient, because local optimizations don't affect the scaling! Okay that's enough brainstorming for one week. Blog Rec Whenever I think about optimization as a skill, the first article that comes to mind is Mat Klad's Push Ifs Up And Fors Down. I'd never have considered on my own that inlining loops into functions could be such a huge performance win. The blog has a lot of other posts on the nuts-and-bolts of systems languages, optimization, and concurrency.

a month ago 24 votes

More in programming

Thomas Aquinas — The world is divine!

A large part of our civilisation rests on the shoulders of one medieval monk: Thomas Aquinas. Amid the turmoil of life, riddled with wickedness and pain, he would insist that our world is good.  And all our success is built on this belief. Note: Before we start, let’s get one thing out of the way: Thomas Aquinas is clearly a Christian thinker, a Saint even. Yet he was also a brilliant philosopher. So even if you consider yourself agnostic or an atheist, stay with me, you will still enjoy his ideas. What is good? Thomas’ argument is rooted in Aristotle’s concept of goodness: Something is good if it fulfills its function. Aristotle had illustrated this idea with a knife. A knife is good to the extent that it cuts well. He made a distinction between an actual knife and its ideal function. That actual thing in your drawer is the existence of a knife. And its ideal function is its essence—what it means to be a knife: to cut well.  So everything is separated into its existence and its ideal essence. And this is also true for humans: We have an ideal conception of what the essence of a human […] The post Thomas Aquinas — The world is divine! appeared first on Ralph Ammer.

12 hours ago 2 votes
[April Cools] Gaming Games for Non-Gamers

My April Cools is out! Gaming Games for Non-Gamers is a 3,000 word essay on video games worth playing if you've never enjoyed a video game before. Patreon notes here. (April Cools is a project where we write genuine content on non-normal topics. You can see all the other April Cools posted so far here. There's still time to submit your own!) April Cools' Club

yesterday 1 votes
What Is Software Quality?

Everyone wants the software they work on to produce quality products, but what does that mean? In addition, how do you know when you have it? This is the longest single blog post I have ever written. I spent four decades writing software used by people (most of the server

2 days ago 5 votes
Name that Ware, March 2025

The Ware for March 2025 is shown below. I was just taking this thing apart to see what went wrong, and thought it had some merit as a name that ware. But perhaps more interestingly, I was also experimenting with my cross-polarized imaging setup. This is a technique a friend of mine told me about […]

2 days ago 3 votes