Full Width [alt+shift+f] Shortcuts [alt+shift+k]
Sign Up [alt+shift+s] Log In [alt+shift+l]
25
You've got a wall in your apartment and a couch. You measure the wall with a ruler and get 7 feet, then you measure the couch and get 7 feet. Can you fit the couch against that wall? Maybe. If the two measure is exactly 7 feet than sure, 7 ≤ 7. But you probably didn't line your measuring stick up perfectly with the couch, and also the stick itself might be a tiny bit long. There's some uncertainty around the edges, say 1/10th of a foot. Instead of treating each as exactly 7 feet, we can instead say that each is somewhere between a minimum of 6.9 feet and a maximum of 7.1. We can write this as an interval (6.9, 7.1). From there, we can say interval (a, b) is "definitely less than" another interval (c, d) if b < c, ie the first interval ends before the second starts. Since 7.1 !< 6.9, we can't say for certain that the couch will fit against the wall. Your couch could actually be 7.1 feet and the wall 6.9 feet, or 7.05 and 7.01 feet, anything like that. More arithmetic When...
8 months ago

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from Computer Things

New Blog Post: "A Perplexing Javascript Parsing Puzzle"

I know I said we'd be back to normal newsletters this week and in fact had 80% of one already written. Then I unearthed something that was better left buried. Blog post here, Patreon notes here (Mostly an explanation of how I found this horror in the first place). Next week I'll send what was supposed to be this week's piece. (PS: April Cools in three weeks!)

2 days ago 4 votes
Five Kinds of Nondeterminism

No newsletter next week, I'm teaching a TLA+ workshop. Speaking of which: I spend a lot of time thinking about formal methods (and TLA+ specifically) because it's where the source of almost all my revenue. But I don't share most of the details because 90% of my readers don't use FM and never will. I think it's more interesting to talk about ideas from FM that would be useful to people outside that field. For example, the idea of "property strength" translates to the idea that some tests are stronger than others. Another possible export is how FM approaches nondeterminism. A nondeterministic algorithm is one that, from the same starting conditions, has multiple possible outputs. This is nondeterministic: # Pseudocode def f() { return rand()+1; } When specifying systems, I may not encounter nondeterminism more often than in real systems, but I am definitely more aware of its presence. Modeling nondeterminism is a core part of formal specification. I mentally categorize nondeterminism into five buckets. Caveat, this is specifically about nondeterminism from the perspective of system modeling, not computer science as a whole. If I tried to include stuff on NFAs and amb operations this would be twice as long.1 1. True Randomness Programs that literally make calls to a random function and then use the results. This the simplest type of nondeterminism and one of the most ubiquitous. Most of the time, random isn't truly nondeterministic. Most of the time computer randomness is actually pseudorandom, meaning we seed a deterministic algorithm that behaves "randomly-enough" for some use. You could "lift" a nondeterministic random function into a deterministic one by adding a fixed seed to the starting state. # Python from random import random, seed def f(x): seed(x) return random() >>> f(3) 0.23796462709189137 >>> f(3) 0.23796462709189137 Often we don't do this because the point of randomness is to provide nondeterminism! We deliberately abstract out the starting state of the seed from our program, because it's easier to think about it as locally nondeterministic. (There's also "true" randomness, like using thermal noise as an entropy source, which I think are mainly used for cryptography and seeding PRNGs.) Most formal specification languages don't deal with randomness (though some deal with probability more broadly). Instead, we treat it as a nondeterministic choice: # software if rand > 0.001 then return a else crash # specification either return a or crash This is because we're looking at worst-case scenarios, so it doesn't matter if crash happens 50% of the time or 0.0001% of the time, it's still possible. 2. Concurrency # Pseudocode global x = 1, y = 0; def thread1() { x++; x++; x++; } def thread2() { y := x; } If thread1() and thread2() run sequentially, then (assuming the sequence is fixed) the final value of y is deterministic. If the two functions are started and run simultaneously, then depending on when thread2 executes y can be 1, 2, 3, or 4. Both functions are locally sequential, but running them concurrently leads to global nondeterminism. Concurrency is arguably the most dramatic source of nondeterminism. Small amounts of concurrency lead to huge explosions in the state space. We have words for the specific kinds of nondeterminism caused by concurrency, like "race condition" and "dirty write". Often we think about it as a separate topic from nondeterminism. To some extent it "overshadows" the other kinds: I have a much easier time teaching students about concurrency in models than nondeterminism in models. Many formal specification languages have special syntax/machinery for the concurrent aspects of a system, and generic syntax for other kinds of nondeterminism. In P that's choose. Others don't special-case concurrency, instead representing as it as nondeterministic choices by a global coordinator. This more flexible but also more inconvenient, as you have to implement process-local sequencing code yourself. 3. User Input One of the most famous and influential programming books is The C Programming Language by Kernighan and Ritchie. The first example of a nondeterministic program appears on page 14: For the newsletter readers who get text only emails,2 here's the program: #include /* copy input to output; 1st version */ main() { int c; c = getchar(); while (c != EOF) { putchar(c); c = getchar(); } } Yup, that's nondeterministic. Because the user can enter any string, any call of main() could have any output, meaning the number of possible outcomes is infinity. Okay that seems a little cheap, and I think it's because we tend to think of determinism in terms of how the user experiences the program. Yes, main() has an infinite number of user inputs, but for each input the user will experience only one possible output. It starts to feel more nondeterministic when modeling a long-standing system that's reacting to user input, for example a server that runs a script whenever the user uploads a file. This can be modeled with nondeterminism and concurrency: We have one execution that's the system, and one nondeterministic execution that represents the effects of our user. (One intrusive thought I sometimes have: any "yes/no" dialogue actually has three outcomes: yes, no, or the user getting up and walking away without picking a choice, permanently stalling the execution.) 4. External forces The more general version of "user input": anything where either 1) some part of the execution outcome depends on retrieving external information, or 2) the external world can change some state outside of your system. I call the distinction between internal and external components of the system the world and the machine. Simple examples: code that at some point reads an external temperature sensor. Unrelated code running on a system which quits programs if it gets too hot. API requests to a third party vendor. Code processing files but users can delete files before the script gets to them. Like with PRNGs, some of these cases don't have to be nondeterministic; we can argue that "the temperature" should be a virtual input into the function. Like with PRNGs, we treat it as nondeterministic because it's useful to think in that way. Also, what if the temperature changes between starting a function and reading it? External forces are also a source of nondeterminism as uncertainty. Measurements in the real world often comes with errors, so repeating a measurement twice can give two different answers. Sometimes operations fail for no discernable reason, or for a non-programmatic reason (like something physically blocks the sensor). All of these situations can be modeled in the same way as user input: a concurrent execution making nondeterministic choices. 5. Abstraction This is where nondeterminism in system models and in "real software" differ the most. I said earlier that pseudorandomness is arguably deterministic, but we abstract it into nondeterminism. More generally, nondeterminism hides implementation details of deterministic processes. In one consulting project, we had a machine that received a message, parsed a lot of data from the message, went into a complicated workflow, and then entered one of three states. The final state was totally deterministic on the content of the message, but the actual process of determining that final state took tons and tons of code. None of that mattered at the scope we were modeling, so we abstracted it all away: "on receiving message, nondeterministically enter state A, B, or C." Doing this makes the system easier to model. It also makes the model more sensitive to possible errors. What if the workflow is bugged and sends us to the wrong state? That's already covered by the nondeterministic choice! Nondeterministic abstraction gives us the potential to pick the worst-case scenario for our system, so we can prove it's robust even under those conditions. I know I beat the "nondeterminism as abstraction" drum a whole lot but that's because it's the insight from formal methods I personally value the most, that nondeterminism is a powerful tool to simplify reasoning about things. You can see the same approach in how I approach modeling users and external forces: complex realities black-boxed and simplified into nondeterministic forces on the system. Anyway, I hope this collection of ideas I got from formal methods are useful to my broader readership. Lemme know if it somehow helps you out! I realized after writing this that I already talked wrote an essay about nondeterminism in formal specification just under a year ago. I hope this one covers enough new ground to be interesting! ↩ There is a surprising number of you. ↩

3 weeks ago 16 votes
Are Efficiency and Horizontal Scalability at odds?

Sorry for missing the newsletter last week! I started writing on Monday as normal, and by Wednesday the piece (about the hierarchy of controls ) was 2000 words and not close to done. So now it'll be a blog post sometime later this month. I also just released a new version of Logic for Programmers! 0.7 adds a bunch of new content (type invariants, modeling access policies, rewrites of the first chapters) but more importantly has new fonts that are more legible than the old ones. Go check it out! For this week's newsletter I want to brainstorm an idea I've been noodling over for a while. Say we have a computational task, like running a simulation or searching a very large graph, and it's taking too long to complete on a computer. There's generally three things that we can do to make it faster: Buy a faster computer ("vertical scaling") Modify the software to use the computer's resources better ("efficiency") Modify the software to use multiple computers ("horizontal scaling") (Splitting single-threaded software across multiple threads/processes is sort of a blend of (2) and (3).) The big benefit of (1) is that we (usually) don't have to make any changes to the software to get a speedup. The downside is that for the past couple of decades computers haven't gotten much faster, except in ways that require recoding (like GPUs and multicore). This means we rely on (2) and (3), and we can do both to a point. I've noticed, though, that horizontal scaling seems to conflict with efficiency. Software optimized to scale well tends to be worse or the N=1 case than software optimized to, um, be optimized. Are there reasons to expect this? It seems reasonable that design goals of software are generally in conflict, purely because exclusively optimizing for one property means making decisions that impede other properties. But is there something in the nature of "efficiency" and "horizontal scalability" that make them especially disjoint? This isn't me trying to explain a fully coherent idea, more me trying to figure this all out to myself. Also I'm probably getting some hardware stuff wrong Amdahl's Law According to Amdahl's Law, the maximum speedup by parallelization is constrained by the proportion of the work that can be parallelized. If 80% of algorithm X is parallelizable, the maximum speedup from horizontal scaling is 5x. If algorithm Y is 25% parallelizable, the maximum speedup is only 1.3x. If you need horizontal scalability, you want to use algorithm X, even if Y is naturally 3x faster. But if Y was 4x faster, you'd prefer it to X. Maximal scalability means finding the optimal balance between baseline speed and parallelizability. Maximal efficiency means just optimizing baseline speed. Coordination Overhead Distributed algorithms require more coordination. To add a list of numbers in parallel via fork-join, we'd do something like this: Split the list into N sublists Fork a new thread/process for sublist Wait for each thread/process to finish Add the sums together. (1), (2), and (3) all add overhead to the algorithm. At the very least, it's extra lines of code to execute, but it can also mean inter-process communication or network hops. Distribution also means you have fewer natural correctness guarantees, so you need more administrative overhead to avoid race conditions. Real world example: Historically CPython has a "global interpreter lock" (GIL). In multithreaded code, only one thread could execute Python code at a time (others could execute C code). The newest version supports disabling the GIL, which comes at a 40% overhead for single-threaded programs. Supposedly the difference is because the specializing adaptor optimization isn't thread-safe yet. The Python team is hoping on getting it down to "only" 10%. Scaling loses shared resources I'd say that intra-machine scaling (multiple threads/processes) feels qualitatively different than inter-machine scaling. Part of that is that intra-machine scaling is "capped" while inter-machine is not. But there's also a difference in what assumptions you can make about shared resources. Starting from the baseline of single-threaded program: Threads have a much harder time sharing CPU caches (you have to manually mess with affinities) Processes have a much harder time sharing RAM (I think you have to use mmap?) Machines can't share cache, RAM, or disk, period. It's a lot easier to solve a problem when the whole thing fits in RAM. But if you split a 50 gb problem across three machines, it doesn't fit in ram by default, even if the machines have 64 gb each. Scaling also means that separate machines can't reuse resources like database connections. Efficiency comes from limits I think the two previous points tie together in the idea that maximal efficiency comes from being able to make assumptions about the system. If we know the exact sequence of computations, we can aim to minimize cache misses. If we don't have to worry about thread-safety, tracking references is dramatically simpler. If we have all of the data in a single database, our query planner has more room to work with. At various tiers of scaling these assumptions are no longer guaranteed and we lose the corresponding optimizations. Sometimes these assumptions are implicit and crop up in odd places. Like if you're working at a scale where you need multiple synced databases, you might want to use UUIDs instead of numbers for keys. But then you lose the assumption "recently inserted rows are close together in the index", which I've read can lead to significant slowdowns. This suggests that if you can find a limit somewhere else, you can get both high horizontal scaling and high efficiency. Supposedly the Tigerbeetle database has both, but that could be because they limit all records to accounts and transfers. This means every record fits in exactly 128 bytes. Does this mean that "assumptions" could be both "assumptions about the computing environment" and "assumptions about the problem"? In the famous essay Scalability! But at what COST, Frank McSherry shows that his single-threaded laptop could outperform 128-node "big data systems" on PageRank and graph connectivity (via label propagation). Afterwards, he discusses how a different algorithm solves graph connectivity even faster: [Union find] is more line of code than label propagation, but it is 10x faster and 100x less embarassing. … The union-find algorithm is fundamentally incompatible with the graph computation approaches Giraph, GraphLab, and GraphX put forward (the so-called “think like a vertex” model). The interesting thing to me is that his alternate makes more "assumptions" than what he's comparing to. He can "assume" a fixed goal and optimize the code for that goal. The "big data systems" are trying to be general purpose compute platforms and have to pick a model that supports the widest range of possible problems. A few years back I wrote clever vs insightful code, I think what I'm trying to say here is that efficiency comes from having insight into your problem and environment. (Last thought to shove in here: to exploit assumptions, you need control. Carefully arranging your data to fit in L1 doesn't matter if your programming language doesn't let you control where things are stored!) Is there a cultural aspect? Maybe there's also a cultural element to this conflict. What if the engineers interested in "efficiency" are different from the engineers interested in "horizontal scaling"? At my first job the data scientists set up a Hadoop cluster for their relatively small dataset, only a few dozen gigabytes or so. One of the senior software engineers saw this and said "big data is stupid." To prove it, he took one of their example queries, wrote a script in Go to compute the same thing, and optimized it to run faster on his machine. At the time I was like "yeah, you're right, big data IS stupid!" But I think now that we both missed something obvious: with the "scalable" solution, the data scientists didn't have to write an optimized script for every single query. Optimizing code is hard, adding more machines is easy! The highest-tier of horizontal scaling is usually something large businesses want, and large businesses like problems that can be solved purely with money. Maximizing efficiency requires a lot of knowledge-intensive human labour, so is less appealing as an investment. Then again, I've seen a lot of work on making the scalable systems more efficient, such as evenly balancing heterogeneous workloads. Maybe in the largest systems intra-machine efficiency is just too small-scale a problem. I'm not sure where this fits in but scaling a volume of tasks conflicts less than scaling individual tasks If you have 1,000 machines and need to crunch one big graph, you probably want the most scalable algorithm. If you instead have 50,000 small graphs, you probably want the most efficient algorithm, which you then run on all 1,000 machines. When we call a problem embarrassingly parallel, we usually mean it's easy to horizontally scale. But it's also one that's easy to make more efficient, because local optimizations don't affect the scaling! Okay that's enough brainstorming for one week. Blog Rec Whenever I think about optimization as a skill, the first article that comes to mind is Mat Klad's Push Ifs Up And Fors Down. I'd never have considered on my own that inlining loops into functions could be such a huge performance win. The blog has a lot of other posts on the nuts-and-bolts of systems languages, optimization, and concurrency.

4 weeks ago 18 votes
What hard thing does your tech make easy?

I occasionally receive emails asking me to look at the writer's new language/library/tool. Sometimes it's in an area I know well, like formal methods. Other times, I'm a complete stranger to the field. Regardless, I'm generally happy to check it out. When starting out, this is the biggest question I'm looking to answer: What does this technology make easy that's normally hard? What justifies me learning and migrating to a new thing as opposed to fighting through my problems with the tools I already know? The new thing has to have some sort of value proposition, which could be something like "better performance" or "more secure". The most universal value and the most direct to show is "takes less time and mental effort to do something". I can't accurately judge two benchmarks, but I can see two demos or code samples and compare which one feels easier to me. Examples Functional programming What drew me originally to functional programming was higher order functions. # Without HOFs out = [] for x in input { if test(x) { out.append(x) } } # With HOFs filter(test, input) We can also compare the easiness of various tasks between examples within the same paradigm. If I know FP via Clojure, what could be appealing about Haskell or F#? For one, null safety is a lot easier when I've got option types. Array Programming Array programming languages like APL or J make certain classes of computation easier. For example, finding all of the indices where two arrays differ. Here it is in Python: x = [1, 4, 2, 3, 4, 1, 0, 0, 0, 4] y = [2, 3, 1, 1, 2, 3, 2, 0, 2, 4] >>> [i for i, (a, b) in enumerate(zip(x, y)) if a == b] [7, 9] And here it is in J: x =: 1 4 2 3 4 1 0 0 0 4 y =: 2 3 1 1 2 3 2 0 2 4 I. x = y 7 9 Not every tool is meant for every programmer, because you might not have any of the problems a tool makes easier. What comes up more often for you: filtering a list or finding all the indices where two lists differ? Statistically speaking, functional programming is more useful to you than array programming. But I have this problem enough to justify learning array programming. LLMs I think a lot of the appeal of LLMs is they make a lot of specialist tasks easy for nonspecialists. One thing I recently did was convert some rst list tables to csv tables. Normally I'd have to do write some tricky parsing and serialization code to automatically convert between the two. With LLMs, it's just Convert the following rst list-table into a csv-table: [table] "Easy" can trump "correct" as a value. The LLM might get some translations wrong, but it's so convenient I'd rather manually review all the translations for errors than write specialized script that is correct 100% of the time. Let's not take this too far A college friend once claimed that he cracked the secret of human behavior: humans do whatever makes them happiest. "What about the martyr who dies for their beliefs?" "Well, in their last second of life they get REALLY happy." We can do the same here, fitting every value proposition into the frame of "easy". CUDA makes it easier to do matrix multiplication. Rust makes it easier to write low-level code without memory bugs. TLA+ makes it easier to find errors in your design. Monads make it easier to sequence computations in a lazy environment. Making everything about "easy" obscures other reason for adopting new things. That whole "simple vs easy" thing Sometimes people think that "simple" is better than "easy", because "simple" is objective and "easy" is subjective. This comes from the famous talk Simple Made Easy. I'm not sure I agree that simple is better or more objective: the speaker claims that polymorphism and typeclasses are "simpler" than conditionals, and I doubt everybody would agree with that. The problem is that "simple" is used to mean both "not complicated" and "not complex". And everybody agrees that "complicated" and "complex" are different, even if they can't agree what the difference is. This idea should probably expanded be expanded into its own newsletter. It's also a lot harder to pitch a technology on being "simpler". Simplicity by itself doesn't make a tool better equipped to solve problems. Simplicity can unlock other benefits, like compositionality or tractability, that provide the actual value. And often that value is in the form of "makes some tasks easier".

a month ago 24 votes
The Juggler's Curse

I'm making a more focused effort to juggle this year. Mostly boxes, but also classic balls too.1 I've gotten to the point where I can almost consistently do a five-ball cascade, which I thought was the cutoff to being a "good juggler". "Thought" because I now know a "good juggler" is one who can do the five-ball cascade with outside throws. I know this because I can't do the outside five-ball cascade... yet. But it's something I can see myself eventually mastering, unlike the slightly more difficult trick of the five-ball mess, which is impossible for mere mortals like me. In theory there is a spectrum of trick difficulties and skill levels. I could place myself on the axis like this: In practice, there are three tiers: Toddlers Good jugglers who practice hard Genetic freaks and actual wizards And the graph always, always looks like this: This is the jugglers curse, and it's a three-parter: The threshold between you and "good" is the next trick you cannot do. Everything below that level is trivial. Once you've gotten a trick down, you can never go back to not knowing it, to appreciating how difficult it was to learn in the first place.2 Everything above that level is just "impossible". You don't have the knowledge needed to recognize the different tiers.3 So as you get better, the stuff that was impossible becomes differentiable, and you can see that some of it is possible. And everything you learned becomes trivial. So you're never a good juggler until you learn "just one more hard trick". The more you know, the more you know you don't know and the less you know you know. This is supposed to be a software newsletter A monad is a monoid in the category of endofunctors, what's the problem? (src) I think this applies to any difficult topic? Most fields don't have the same stark spectral lines as juggling, but there's still tiers of difficulty to techniques, which get compressed the further in either direction they are from your current level. Like, I'm not good at formal methods. I've written two books on it but I've never mastered a dependently-typed language or a theorem prover. Those are equally hard. And I'm not good at modeling concurrent systems because I don't understand the formal definition of bisimulation and haven't implemented a Raft. Those are also equally hard, in fact exactly as hard as mastering a theorem prover. At the same time, the skills I've already developed are easy: properly using refinement is exactly as easy as writing a wrapped counter. Then I get surprised when I try to explain strong fairness to someone and they just don't get how □◇(ENABLED〈A〉ᵥ) is obviously different from ◇□(ENABLED 〈A〉ᵥ). Juggler's curse! Now I don't actually know if this is actually how everybody experiences expertise or if it's just my particular personality— I was a juggler long before I was a software developer. Then again, I'd argue that lots of people talk about one consequence of the juggler's curse: imposter syndrome. If you constantly think what you know is "trivial" and what you don't know is "impossible", then yeah, you'd start feeling like an imposter at work real quick. I wonder if part of the cause is that a lot of skills you have to learn are invisible. One of my favorite blog posts ever is In Defense of Blub Studies, which argues that software expertise comes through understanding "boring" topics like "what all of the error messages mean" and "how to use a debugger well". Blub is a critical part of expertise and takes a lot of hard work to learn, but it feels like trivia. So looking back on a skill I mastered, I might think it was "easy" because I'm not including all of the blub that I had to learn, too. The takeaway, of course, is that the outside five-ball cascade is objectively the cutoff between good jugglers and toddlers. Rant time: I love cigar box juggling. It's fun, it's creative, it's totally unlike any other kind of juggling. And it's so niche I straight up cannot find anybody in Chicago to practice with. I once went to a juggling convention and was the only person with a cigar box set there. ↩ This particular part of the juggler's curse is also called the curse of knowledge or "expert blindness". ↩ This isn't Dunning-Kruger, because DK says that people think they are better than they actually are, and also may not actually be real. ↩

a month ago 39 votes

More in programming

ChatGPT Would be a Decent Policy Advisor

Revealed: How the UK tech secretary uses ChatGPT for policy advice by Chris Stokel-Walker for the New Scientist

11 hours ago 3 votes
Setting policy for strategy.

This book’s introduction started by defining strategy as “making decisions.” Then we dug into exploration, diagnosis, and refinement: three chapters where you could argue that we didn’t decide anything at all. Clarifying the problem to be solved is the prerequisite of effective decision making, but eventually decisions do have to be made. Here in this chapter on policy, and the following chapter on operations, we finally start to actually make some decisions. In this chapter, we’ll dig into: How we define policy, and how setting policy differs from operating policy as discussed in the next chapter The structured steps for setting policy How many policies should you set? Is it preferable to have one policy, many policies, or does it not matter much either way? Recurring kinds of policies that appear frequently in strategies Why it’s valuable to be intentional about your strategy’s altitude, and how engineers and executives generally maintain different altitudes in their strategies Criteria to use for evaluating whether your policies are likely to be impactful How to develop novel policies, and why it’s rare Why having multiple bundles of alternative policies is generally a phase in strategy development that indicates a gap in your diagnosis How policies that ignore constraints sound inspirational, but accomplish little Dealing with ambiguity and uncertainty created by missing strategies from cross-functional stakeholders By the end, you’ll be ready to evaluate why an existing strategy’s policies are struggling to make an impact, and to start iterating on policies for strategy of your own. This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts. What is policy? Policy is interpreting your diagnosis into a concrete plan. That plan will be a collection of decisions, tradeoffs, and approaches. They’ll range from coding practices, to hiring mandates, to architectural decisions, to guidance about how choices are made within your organization. An effective policy solves the entirety of the strategy’s diagnosis, although the diagnosis itself is encouraged to specify which aspects can be ignored. For example, the strategy for working with private equity ownership acknowledges in its diagnosis that they don’t have clear guidance on what kind of reduction to expect: Based on general practice, it seems likely that our new Private Equity ownership will expect us to reduce R&D headcount costs through a reduction. However, we don’t have any concrete details to make a structured decision on this, and our approach would vary significantly depending on the size of the reduction. Faced with that uncertainty, the policy simply acknowledges the ambiguity and commits to reconsider when more information becomes available: We believe our new ownership will provide a specific target for Research and Development (R&D) operating expenses during the upcoming financial year planning. We will revise these policies again once we have explicit targets, and will delay planning around reductions until we have those numbers to avoid running two overlapping processes. There are two frequent points of confusion when creating policies that are worth addressing directly: Policy is a subset of strategy, rather than the entirety of strategy, because policy is only meaningful in the context of the strategy’s diagnosis. For example, the “N-1 backfill policy” makes sense in the context of new, private equity ownership. The policy wouldn’t work well in a rapidly expanding organization. Any strategy without a policy is useless, but you’ll also find policies without context aren’t worth much either. This is particularly unfortunate, because so often strategies are communicated without those critical sections. Policy describes how tradeoffs should be made, but it doesn’t verify how the tradeoffs are actually being made in practice. The next chapter on operations covers how to inspect an organization’s behavior to ensure policies are followed. When reworking a strategy to be more readable, it often makes sense to merge policy and operation sections together. However, when drafting strategy it’s valuable to keep them separate. Yes, you might use a weekly meeting to review whether the policy is being followed, but whether it’s an effective policy is independent of having such a meeting, and what operational mechanisms you use will vary depending on the number of policies you intend to implement. With this definition in mind, now we can move onto the more interesting discussion of how to set policy. How to set policy Every part of writing a strategy feels hard when you’re doing it, but I personally find that writing policy either feels uncomfortably easy or painfully challenging. It’s never a happy medium. Fortunately, the exploration and diagnosis usually come together to make writing your policy simple: although sometimes that simple conclusion may be a difficult one to swallow. The steps I follow to write a strategy’s policy are: Review diagnosis to ensure it captures the most important themes. It doesn’t need to be perfect, but it shouldn’t have omissions so obvious that you can immediately identify them. Select policies that address the diagnosis. Explicitly match each policy to one or more diagnoses that it addresses. Continue adding policies until every diagnosis is covered. This is a broad instruction, but it’s simpler than it sounds because you’ll typically select from policies identified during your exploration phase. However, there certainly is space to tweak those policies, and to reapply familiar policies to new circumstances. If you do find yourself developing a novel policy, there’s a later section in this chapter, Developing novel policies, that addresses that topic in more detail. Consolidate policies in cases where they overlap or adjoin. For example, two policies about specific teams might be generalized into a policy about all teams in the engineering organization. Backtest policy against recent decisions you’ve made. This is particularly effective if you maintain a decision log in your organization. Mine for conflict once again, much as you did in developing your diagnosis. Emphasize feedback from teams and individuals with a different perspective than your own, but don’t wholly eliminate those that you agree with. Just as it’s easy to crowd out opposing views in diagnosis if you don’t solicit their input, it’s possible to accidentally crowd out your own perspective if you anchor too much on others’ perspectives. Consider refinement if you finish writing, and you just aren’t sure your approach works – that’s fine! Return to the refinement phase by deploying one of the refinement techniques to increase your conviction. Remember that we talk about strategy like it’s done in one pass, but almost all real strategy takes many refinement passes. The steps of writing policy are relatively pedestrian, largely because you’ve done so much of the work already in the exploration, diagnosis, and refinement steps. If you skip those phases, you’d likely follow the above steps for writing policy, but the expected quality of the policy itself would be far lower. How many policies? Addressing the entirety of the diagnosis is often complex, which is why most strategies feature a set of policies rather than just one. The strategy for decomposing a monolithic application is not one policy deciding not to decompose, but a series of four policies: Business units should always operate in their own code repository and monolith. New integrations across business unit monoliths should be done using gRPC. Except for new business unit monoliths, we don’t allow new services. Merge existing services into business-unit monoliths where you can. Four isn’t universally the right number either. It’s simply the number that was required to solve that strategy’s diagnosis. With an excellent diagnosis, your policies will often feel inevitable, and perhaps even boring. That’s great: what makes a policy good is that it’s effective, not that it’s novel or inspiring. Kinds of policies While there are so many policies you can write, I’ve found they generally fall into one of four major categories: approvals, allocations, direction, and guidance. This section introduces those categories. Approvals define the process for making a recurring decision. This might require invoking an architecture advice process, or it might require involving an authority figure like an executive. In the Index post-acquisition integration strategy, there were a number of complex decisions to be made, and the approval mechanism was: Escalations come to paired leads: given our limited shared context across teams, all escalations must come to both Stripe’s Head of Traffic Engineering and Index’s Head of Engineering. This allowed the acquired and acquiring teams to start building trust between each other by ensuring both were consulted before any decision was finalized. On the other hand, the user data access strategy’s approval strategy was more focused on managing corporate risk: Exceptions must be granted in writing by CISO. While our overarching Engineering Strategy states that we follow an advisory architecture process as described in Facilitating Software Architecture, the customer data access policy is an exception and must be explicitly approved, with documentation, by the CISO. Start that process in the #ciso channel. These two different approval processes had different goals, so they made tradeoffs differently. There are so many ways to tweak approval, allowing for many different tradeoffs between safety, productivity, and trust. Allocations describe how resources are split across multiple potential investments. Allocations are the most concrete statement of organizational priority, and also articulate the organization’s belief about how productivity happens in teams. Some companies believe you go fast by swarming more people onto critical problems. Other companies believe you go fast by forcing teams to solve problems without additional headcount. Both can work, and teach you something important about the company’s beliefs. The strategy on Uber’s service migration has two concrete examples of allocation policies. The first describes the Infrastructure engineering team’s allocation between manual provision tasks and investing into creating a self-service provisioning platform: Constrain manual provisioning allocation to maximize investment in self-service provisioning. The service provisioning team will maintain a fixed allocation of one full time engineer on manual service provisioning tasks. We will move the remaining engineers to work on automation to speed up future service provisioning. This will degrade manual provisioning in the short term, but the alternative is permanently degrading provisioning by the influx of new service requests from newly hired product engineers. The second allocation policy is implicitly noted in this strategy’s diagnosis, where it describes the allocation policy in the Engineering organization’s higher altitude strategy: Within infrastructure engineering, there is a team of four engineers responsible for service provisioning today. While our organization is growing at a similar rate as product engineering, none of that additional headcount is being allocated directly to the team working on service provisioning. We do not anticipate this changing. Allocation policies often create a surprising amount of clarity for the team, and I include them in almost every policy I write either explicitly, or implicitly in a higher altitude strategy. Direction provides explicit instruction on how a decision must be made. This is the right tool when you know where you want to go, and exactly the way that you want to get there. Direction is appropriate for problems you understand clearly, and you value consistency more than empowering individual judgment. Direction works well when you need an unambiguous policy that doesn’t leave room for interpretation. For example, Calm’s policy for working in the monolith: We write all code in the monolith. It has been ambiguous if new code (especially new application code) should be written in our JavaScript monolith, or if all new code must be written in a new service outside of the monolith. This is no longer ambiguous: all new code must be written in the monolith. In the rare case that there is a functional requirement that makes writing in the monolith implausible, then you should seek an exception as described below. In that case, the team couldn’t agree on what should go into the monolith. Individuals would often make incompatible decisions, so creating consistency required removing personal judgment from the equation. Sometimes judgment is the issue, and sometimes consistency is difficult due to misaligned incentives. A good example of this comes in strategy on working with new Private Equity ownership: We will move to an “N-1” backfill policy, where departures are backfilled with a less senior level. We will also institute a strict maximum of one Principal Engineer per business unit. It’s likely that hiring managers would simply ignore this backfill policy if it was stated more softly, although sometimes less forceful policies are useful. Guidance provides a recommendation about how a decision should be made. Guidance is useful when there’s enough nuance, ambiguity, or complexity that you can explain the desired destination, but you can’t mandate the path to reaching it. One example of guidance comes from the Index acquisition integration strategy: Minimize changes to tokenization environment: because point-of-sale devices directly work with customer payment details, the API that directly supports the point-of-sale device must live within our secured environment where payment details are stored. However, any other functionality must not be added to our tokenization environment. This might read like direction, but it’s clarifying the desired outcome of avoiding unnecessary complexity in the tokenization environment. However, it’s not able to articulate what complexity is necessary, so ultimately it’s guidance because it requires significant judgment to interpret. A second example of guidance comes in the strategy on decomposing a monolithic codebase: Merge existing services into business-unit monoliths where you can. We believe that each choice to move existing services back into a monolith should be made “in the details” rather than from a top-down strategy perspective. Consequently, we generally encourage teams to wind down their existing services outside of their business unit’s monolith, but defer to teams to make the right decision for their local context. This is another case of knowing the desired outcome, but encountering too much uncertainty to direct the team on how to get there. If you ask five engineers about whether it’s possible to merge a given service back into a monolithic codebase, they’ll probably disagree. That’s fine, and highlights the value of guidance: it makes it possible to make incremental progress in areas where more concrete direction would cause confusion. When you’re working on a strategy’s policy section, it’s important to consider all of these categories. Which feel most natural to use will vary depending on your team and role, but they’re all usable: If you’re a developer productivity team, you might have to lean heavily on guidance in your policies and increased support for that guidance within the details of your platform. If you’re an executive, you might lean heavily on direction. Indeed, you might lean too heavily on direction, where guidance often works better for areas where you understand the direction but not the path. If you’re a product engineering organization, you might have to narrow the scope of your direction to the engineers within that organization to deal with the realities of complex cross-organization dynamics. Finally, if you have a clear approach you want to take that doesn’t fit cleanly into any of these categories, then don’t let this framework dissuade you. Give it a try, and adapt if it doesn’t initially work out. Maintaining strategy altitude The chapter on when to write engineering strategy introduced the concept of strategy altitude, which is being deliberate about where certain kinds of policies are created within your organization. Without repeating that section in its entirety, it’s particularly relevant when you set policy to consider how your new policies eliminate flexibility within your organization. Consider these two somewhat opposing strategies: Stripe’s Sorbet strategy only worked in an organization that enforced the use of a single programming language across (essentially) all teams Uber’s service migration strategy worked well in an organization that was unwilling to enforce consistent programming language adoption across teams Stripe’s organization-altitude policy took away the freedom of individual teams to select their preferred technology stack. In return, they unlocked the ability to centralize investment in a powerful way. Uber went the opposite way, unlocking the ability of teams to pick their preferred technology stack, while significantly reducing their centralized teams’ leverage. Both altitudes make sense. Both have consequences. Criteria for effective policies In The Engineering Executive’s Primer’s chapter on engineering strategy, I introduced three criteria for evaluating policies. They ought to be applicable, enforced, and create leverage. Defining those a bit: Applicable: it can be used to navigate complex, real scenarios, particularly when making tradeoffs. Enforced: teams will be held accountable for following the guiding policy. Create Leverage: create compounding or multiplicative impact. The last of these three, create leverage, made sense in the context of a book about engineering executives, but probably doesn’t make as much sense here. Some policies certainly should create leverage (e.g. empower developer experience team by restricting new services), but others might not (e.g. moving to an N-1 backfill policy). Outside the executive context, what’s important isn’t necessarily creating leverage, but that a policy solves for part of the diagnosis. That leaves the other two–being applicable and enforced–both of which are necessary for a policy to actually address the diagnosis. Any policy which you can’t determine how to apply, or aren’t willing to enforce, simply won’t be useful. Let’s apply these criteria to a handful of potential policies. First let’s think about policies we might write to improve the talent density of our engineering team: “We only hire world-class engineers.” This isn’t applicable, because it’s unclear what a world-class engineer means. Because there’s no mutually agreeable definition in this policy, it’s also not consistently enforceable. “We only hire engineers that get at least one ‘strong yes’ in scorecards.” This is applicable, because there’s a clear definition. This is enforceable, depending on the willingness of the organization to reject seemingly good candidates who don’t happen to get a strong yes. Next, let’s think about a policy regarding code reuse within a codebase: “We follow a strict Don’t Repeat Yourself policy in our codebase.” There’s room for debate within a team about whether two pieces of code are truly duplicative, but this is generally applicable. Because there’s room for debate, it’s a very context specific determination to decide how to enforce a decision. “Code authors are responsible for determining if their contributions violate Don’t Repeat Yourself, and rewriting them if they do.” This is much more applicable, because now there’s only a single person’s judgment to assess the potential repetition. In some ways, this policy is also more enforceable, because there’s no longer any ambiguity around who is deciding whether a piece of code is a repetition. The challenge is that enforceability now depends on one individual, and making this policy effective will require holding individuals accountable for the quality of their judgement. An organization that’s unwilling to distinguish between good and bad judgment won’t get any value out of the policy. This is a good example of how a good policy in one organization might become a poor policy in another. If you ever find yourself wanting to include a policy that for some reason either can’t be applied or can’t be enforced, stop to ask yourself what you’re trying to accomplish and ponder if there’s a different policy that might be better suited to that goal. Developing novel policies My experience is that there are vanishingly few truly novel policies to write. There’s almost always someone else has already done something similar to your intended approach. Calm’s engineering strategy is such a case: the details are particular to the company, but the general approach is common across the industry. The most likely place to find truly novel policies is during the adoption phase of a new widespread technology, such as the rise of ubiquitous mobile phones, cloud computing, or large language models. Even then, as explored in the strategy for adopting large-language models, the new technology can be engaged with as a generic technology: Develop an LLM-backed process for reactivating departed and suspended drivers in mature markets. Through modeling our driver lifecycle, we determined that improving onboarding time will have little impact on the total number of active drivers. Instead, we are focusing on mechanisms to reactivate departed and suspended drivers, which is the only opportunity to meaningfully impact active drivers. You could simply replace “LLM” with “data-driven” and it would be equally readable. In this way, policy can generally sidestep areas of uncertainty by being a bit abstract. This avoids being overly specific about topics you simply don’t know much about. However, even if your policy isn’t novel to the industry, it might still be novel to you or your organization. The steps that I’ve found useful to debug novel policies are the same steps as running a condensed version of the strategy process, with a focus on exploration and refinement: Collect a number of similar policies, with a focus on how those policies differ from the policy you are creating Create a systems model to articulate how this policy will work, and also how it will differ from the similar policies you’re considering Run a strategy testing cycle for your proto-policy to discover any unknown-unknowns about how it works in practice Whether you run into this scenario is largely a function of the extent of your, and your organization’s, experience. Early in my career, I found myself doing novel (for me) strategy work very frequently, and these days I rarely find myself doing novel work, instead focusing on adaptation of well-known policies to new circumstances. Are competing policy proposals an anti-pattern? When creating policy, you’ll often have to engage with the question of whether you should develop one preferred policy or a series of potential strategies to pick from. Developing these is a useful stage of setting policy, but rather than helping you refine your policy, I’d encourage you to think of this as exposing gaps in your diagnosis. For example, when Stripe developed the Sorbet ruby-typing tooling, there was debate between two policies: Should we build a ruby-typing tool to allow a centralized team to gradually migrate the company to a typed codebase? Should we migrate the codebase to a preexisting strongly typed language like Golang or Java? These were, initially, equally valid hypotheses. It was only by clarifying our diagnosis around resourcing that it became clear that incurring the bulk of costs in a centralized team was clearly preferable to spreading the costs across many teams. Specifically, recognizing that we wanted to prioritize short-term product engineering velocity, even if it led to a longer migration overall. If you do develop multiple policy options, I encourage you to move the alternatives into an appendix rather than including them in the core of your strategy document. This will make it easier for readers of your final version to understand how to follow your policies, and they are the most important long-term user of your written strategy. Recognizing constraints A similar problem to competing solutions is developing a policy that you cannot possibly fund. It’s easy to get enamored with policies that you can’t meaningfully enforce, but that’s bad policy, even if it would work in an alternate universe where it was possible to enforce or resource it. To consider a few examples: The strategy for controlling access to user data might have proposed requiring manual approval by a second party of every access to customer data. However, that would have gone nowhere. Our approach to Uber’s service migration might have required more staffing for the infrastructure engineering team, but we knew that wasn’t going to happen, so it was a meaningless policy proposal to make. The strategy for navigating private equity ownership might have argued that new ownership should not hold engineering accountable to a new standard on spending. But they would have just invalidated that strategy in the next financial planning period. If you find a policy that contemplates an impractical approach, it doesn’t only indicate that the policy is a poor one, it also suggests your policy is missing an important pillar. Rather than debating the policy options, the fastest path to resolution is to align on the diagnosis that would invalidate potential paths forward. In cases where aligning on the diagnosis isn’t possible, for example because you simply don’t understand the possibilities of a new technology as encountered in the strategy for adopting LLMs, then you’ve typically found a valuable opportunity to use strategy refinement to build alignment. Dealing with missing strategies At a recent company offsite, we were debating which policies we might adopt to deal with annual plans that kept getting derailed after less than a month. Someone remarked that this would be much easier if we could get the executive team to commit to a clearer, written strategy about which business units we were prioritizing. They were, of course, right. It would be much easier. Unfortunately, it goes back to the problem we discussed in the diagnosis chapter about reframing blockers into diagnosis. If a strategy from the company or a peer function is missing, the empowering thing to do is to include the absence in your diagnosis and move forward. Sometimes, even when you do this, it’s easy to fall back into the belief that you cannot set a policy because a peer function might set a conflicting policy in the future. Whether you’re an executive or an engineer, you’ll never have the details you want to make the ideal policy. Meaningful leadership requires taking meaningful risks, which is never something that gets comfortable. Summary After working through this chapter, you know how to develop policy, how to assemble policies to solve your diagnosis, and how to avoid a number of the frequent challenges that policy writers encounter. At this point, there’s only one phase of strategy left to dig into, operating the policies you’ve created.

16 hours ago 3 votes
Fast and random sampling in SQLite

I was building a small feature for the Flickr Commons Explorer today: show a random selection of photos from the entire collection. I wanted a fast and varied set of photos. This meant getting a random sample of rows from a SQLite table (because the Explorer stores all its data in SQLite). I’m happy with the code I settled on, but it took several attempts to get right. Approach #1: ORDER BY RANDOM() My first attempt was pretty naïve – I used an ORDER BY RANDOM() clause to sort the table, then limit the results: SELECT * FROM photos ORDER BY random() LIMIT 10 This query works, but it was slow – about half a second to sample a table with 2 million photos (which is very small by SQLite standards). This query would run on every request for the homepage, so that latency is unacceptable. It’s slow because it forces SQLite to generate a value for every row, then sort all the rows, and only then does it apply the limit. SQLite is fast, but there’s only so fast you can sort millions of values. I found a suggestion from Stack Overflow user Ali to do a random sort on the id column first, pick my IDs from that, and only fetch the whole row for the photos I’m selecting: SELECT * FROM photos WHERE id IN ( SELECT id FROM photos ORDER BY RANDOM() LIMIT 10 ) This means SQLite only has to load the rows it’s returning, not every row in the database. This query was over three times faster – about 0.15s – but that’s still slower than I wanted. Approach #2: WHERE rowid > (…) Scrolling down the Stack Overflow page, I found an answer by Max Shenfield with a different approach: SELECT * FROM photos WHERE rowid > ( ABS(RANDOM()) % (SELECT max(rowid) FROM photos) ) LIMIT 10 The rowid is a unique identifier that’s used as a primary key in most SQLite tables, and it can be looked up very quickly. SQLite automatically assigns a unique rowid unless you explicitly tell it not to, or create your own integer primary key. This query works by picking a point between the biggest and smallest rowid values used in the table, then getting the rows with rowids which are higher than that point. If you want to know more, Max’s answer has a more detailed explanation. This query is much faster – around 0.0008s – but I didn’t go this route. The result is more like a random slice than a random sample. In my testing, it always returned contiguous rows – 101, 102, 103, … – which isn’t what I want. The photos in the Commons Explorer database were inserted in upload order, so photos with adjacent row IDs were uploaded at around the same time and are probably quite similar. I’d get one photo of an old plane, then nine more photos of other planes. I want more variety! (This behaviour isn’t guaranteed – if you don’t add an ORDER BY clause to a SELECT query, then the order of results is undefined. SQLite is returning rows in rowid order in my table, and a quick Google suggests that’s pretty common, but that may not be true in all cases. It doesn’t affect whether I want to use this approach, but I mention it here because I was confused about the ordering when I read this code.) Approach #3: Select random rowid values outside SQLite Max’s answer was the first time I’d heard of rowid, and it gave me an idea – what if I chose random rowid values outside SQLite? This is a less “pure” approach because I’m not doing everything in the database, but I’m happy with that if it gets the result I want. Here’s the procedure I came up with: Create an empty list to store our sample. Find the highest rowid that’s currently in use: sqlite> SELECT MAX(rowid) FROM photos; 1913389 Use a random number generator to pick a rowid between 1 and the highest rowid: >>> import random >>> random.randint(1, max_rowid) 196476 If we’ve already got this rowid, discard it and generate a new one. (The rowid is a signed, 64-bit integer, so the minimum possible value is always 1.) Look for a row with that rowid: SELECT * FROM photos WHERE rowid = 196476 If such a row exists, add it to our sample. If we have enough items in our sample, we’re done. Otherwise, return to step 3 and generate another rowid. If such a row doesn’t exist, return to step 3 and generate another rowid. This requires a bit more code, but it returns a diverse sample of photos, which is what I really care about. It’s a bit slower, but still plenty fast enough (about 0.001s). This approach is best for tables where the rowid values are mostly contiguous – it would be slower if there are lots of rowids between 1 and the max that don’t exist. If there are large gaps in rowid values, you might try multiple missing entries before finding a valid row, slowing down the query. You might want to try something different, like tracking valid rowid values separately. This is a good fit for my use case, because photos don’t get removed from Flickr Commons very often. Once a row is written, it sticks around, and over 97% of the possible rowid values do exist. Summary Here are the four approaches I tried: Approach Performance (for 2M rows) Notes ORDER BY RANDOM() ~0.5s Slowest, easiest to read WHERE id IN (SELECT id …) ~0.15s Faster, still fairly easy to understand WHERE rowid > ... ~0.0008s Returns clustered results Random rowid in Python ~0.001s Fast and returns varied results, requires code outside SQL I’m using the random rowid in Python in the Commons Explorer, trading code complexity for speed. I’m using this random sample to render a web page, so it’s important that it returns quickly – when I was testing ORDER BY RANDOM(), I could feel myself waiting for the page to load. But I’ve used ORDER BY RANDOM() in the past, especially for asynchronous data pipelines where I don’t care about absolute performance. It’s simpler to read and easier to see what’s going on. Now it’s your turn – visit the Commons Explorer and see what random gems you can find. Let me know if you spot anything cool! [If the formatting of this post looks odd in your feed reader, visit the original article]

7 hours ago 1 votes
Choosing Languages
yesterday 2 votes
05 · Syncing Keyhive

How we sync Keyhive and Automerge

yesterday 1 votes