Betteridge's Law of Software Engineering Specialness

from Computer Things [alt+shift+b] in programming

Logic for Programmers v0.8 now out! The new release has minor changes: new formatting for notes and a better introduction to predicates. I would have rolled it all into v0.9 next month but I like the monthly cadence. Get it here! Betteridge's Law of Software Engineering Specialness In There is No Automatic Reset in Engineering, Tim Ottinger asks: Do the other people have to live with January 2013 for the rest of their lives? Or is it only engineering that has to deal with every dirty hack since the beginning of the organization? Betteridge's Law of Headlines says that if a journalism headline ends with a question mark, the answer is probably "no". I propose a similar law relating to software engineering specialness:1 If someone asks if some aspect of software development is truly unique to just software development, the answer is probably "no". Take the idea that "in software, hacks are forever." My favorite example of this comes from a different profession. The Dewey Decimal System...

5 months ago

Remove from reading list Add to reading list [alt+a] Read now [→]

Comments

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from Computer Things

The Angels and Demons of Nondeterminism

Greetings everyone! You might have noticed that it's September and I don't have the next version of Logic for Programmers ready. As penance, here's ten free copies of the book. So a few months ago I wrote a newsletter about how we use nondeterminism in formal methods. The overarching idea: Nondeterminism is when multiple paths are possible from a starting state. A system preserves a property if it holds on all possible paths. If even one path violates the property, then we have a bug. An intuitive model of this is that for this is that when faced with a nondeterministic choice, the system always makes the worst possible choice. This is sometimes called demonic nondeterminism and is favored in formal methods because we are paranoid to a fault. The opposite would be angelic nondeterminism, where the system always makes the best possible choice. A property then holds if any possible path satisfies that property.1 This is not as common in FM, but it still has its uses! "Players can access the secret level" or "We can always shut down the computer" are reachability properties, that something is possible even if not actually done. In broader computer science research, I'd say that angelic nondeterminism is more popular, due to its widespread use in complexity analysis and programming languages. Complexity Analysis P is the set of all "decision problems" (basically, boolean functions) can be solved in polynomial time: there's an algorithm that's worst-case in O(n), O(n²), O(n³), etc.2 NP is the set of all problems that can be solved in polynomial time by an algorithm with angelic nondeterminism.3 For example, the question "does list l contain x" can be solved in O(1) time by a nondeterministic algorithm: fun is_member(l: List[T], x: T): bool { if l == [] {return false}; guess i in 0..<(len(l)-1); return l[i] == x; } Say call is_member([a, b, c, d], c). The best possible choice would be to guess i = 2, which would correctly return true. Now call is_member([a, b], d). No matter what we guess, the algorithm correctly returns false. and just return false. Ergo, O(1). NP stands for "Nondeterministic Polynomial". (And I just now realized something pretty cool: you can say that P is the set of all problems solvable in polynomial time under demonic nondeterminism, which is a nice parallel between the two classes.) Computer scientists have proven that angelic nondeterminism doesn't give us any more "power": there are no problems solvable with AN that aren't also solvable deterministically. The big question is whether AN is more efficient: it is widely believed, but not proven, that there are problems in NP but not in P. Most famously, "Is there any variable assignment that makes this boolean formula true?" A polynomial AN algorithm is again easy: fun SAT(f(x1, x2, …: bool): bool): bool { N = num_params(f) for i in 1..=num_params(f) { guess x_i in {true, false} } return f(x_1, x_2, …) } The best deterministic algorithms we have to solve the same problem are worst-case exponential with the number of boolean parameters. This a real frustrating problem because real computers don't have angelic nondeterminism, so problems like SAT remain hard. We can solve most "well-behaved" instances of the problem in reasonable time, but the worst-case instances get intractable real fast. Means of Abstraction We can directly turn an AN algorithm into a (possibly much slower) deterministic algorithm, such as by backtracking. This makes AN a pretty good abstraction over what an algorithm is doing. Does the regex (a+b)\1+ match "abaabaabaab"? Yes, if the regex engine nondeterministically guesses that it needs to start at the third letter and make the group aab. How does my PL's regex implementation find that match? I dunno, backtracking or NFA construction or something, I don't need to know the deterministic specifics in order to use the nondeterministic abstraction. Neel Krishnaswami has a great definition of 'declarative language': "any language with a semantics has some nontrivial existential quantifiers in it". I'm not sure if this is identical to saying "a language with an angelic nondeterministic abstraction", but they must be pretty close, and all of his examples match: SQL's selects and joins Parsing DSLs Logic programming's unification Constraint solving On top of that I'd add CSS selectors and planner's actions; all nondeterministic abstractions over a deterministic implementation. He also says that the things programmers hate most in declarative languages are features that "that expose the operational model": constraint solver search strategies, Prolog cuts, regex backreferences, etc. Which again matches my experiences with angelic nondeterminism: I dread features that force me to understand the deterministic implementation. But they're necessary, since P probably != NP and so we need to worry about operational optimizations. Eldritch Nondeterminism If you need to know the ratio of good/bad paths, the number of good paths, or probability, or anything more than "there is a good path" or "there is a bad path", you are beyond the reach of heaven or hell. Angelic and demonic nondeterminism are duals: angelic returns "yes" if some choice: correct and demonic returns "no" if !all choice: correct, which is the same as some choice: !correct. ↩ Pet peeve about Big-O notation: O(n²) is the set of all algorithms that, for sufficiently large problem sizes, grow no faster that quadratically. "Bubblesort has O(n²) complexity" should be written Bubblesort in O(n²), not Bubblesort = O(n²). ↩ To be precise, solvable in polynomial time by a Nondeterministic Turing Machine, a very particular model of computation. We can broadly talk about P and NP without framing everything in terms of Turing machines, but some details of complexity classes (like the existence "weak NP-hardness") kinda need Turing machines to make sense. ↩

4 days ago • 11 votes

Logical Duals in Software Engineering

(Last week's newsletter took too long and I'm way behind on Logic for Programmers revisions so short one this time.1) In classical logic, two operators F/G are duals if F(x) = !G(!x). Three examples: x || y is the same as !(!x && !y). <>P ("P is possibly true") is the same as ![]!P ("not P isn't definitely true"). some x in set: P(x) is the same as !(all x in set: !P(x)). (1) is just a version of De Morgan's Law, which we regularly use to simplify boolean expressions. (2) is important in modal logic but has niche applications in software engineering, mostly in how it powers various formal methods.2 The real interesting one is (3), the "quantifier duals". We use lots of software tools to either find a value satisfying P or check that all values satisfy P. And by duality, any tool that does one can do the other, by seeing if it fails to find/check !P. Some examples in the wild: Z3 is used to solve mathematical constraints, like "find x, where f(x) >= 0. If I want to prove a property like "f is always positive", I ask z3 to solve "find x, where !(f(x) >= 0), and see if that is unsatisfiable. This use case powers a LOT of theorem provers and formal verification tooling. Property testing checks that all inputs to a code block satisfy a property. I've used it to generate complex inputs with certain properties by checking that all inputs don't satisfy the property and reading out the test failure. Model checkers check that all behaviors of a specification satisfy a property, so we can find a behavior that reaches a goal state G by checking that all states are !G. Here's TLA+ solving a puzzle this way.3 Planners find behaviors that reach a goal state, so we can check if all behaviors satisfy a property P by asking it to reach goal state !P. The problem "find the shortest traveling salesman route" can be broken into some route: distance(route) = n and all route: !(distance(route) < n). Then a route finder can find the first, and then convert the second into a some and fail to find it, proving n is optimal. Even cooler to me is when a tool does both finding and checking, but gives them different "meanings". In SQL, some x: P(x) is true if we can query for P(x) and get a nonempty response, while all x: P(x) is true if all records satisfy the P(x) constraint. Most SQL databases allow for complex queries but not complex constraints! You got UNIQUE, NOT NULL, REFERENCES, which are fixed predicates, and CHECK, which is one-record only.4 Oh, and you got database triggers, which can run arbitrary queries and throw exceptions. So if you really need to enforce a complex constraint P(x, y, z), you put in a database trigger that queries some x, y, z: !P(x, y, z) and throws an exception if it finds any results. That all works because of quantifier duality! See here for an example of this in practice. Duals more broadly "Dual" doesn't have a strict meaning in math, it's more of a vibe thing where all of the "duals" are kinda similar in meaning but don't strictly follow all of the same rules. Usually things X and Y are duals if there is some transform F where X = F(Y) and Y = F(X), but not always. Maybe the category theorists have a formal definition that covers all of the different uses. Usually duals switch properties of things, too: an example showing some x: P(x) becomes a counterexample of all x: !P(x). Under this definition, I think the dual of a list l could be reverse(l). The first element of l becomes the last element of reverse(l), the last becomes the first, etc. A more interesting case is the dual of a K -> set(V) map is the V -> set(K) map. IE the dual of lived_in_city = {alice: {paris}, bob: {detroit}, charlie: {detroit, paris}} is city_lived_in_by = {paris: {alice, charlie}, detroit: {bob, charlie}}. This preserves the property that x in map[y] <=> y in dual[x]. And after writing this I just realized this is partial retread of a newsletter I wrote a couple months ago. But only a partial retread! ↩ Specifically "linear temporal logics" are modal logics, so "eventually P ("P is true in at least one state of each behavior") is the same as saying !always !P ("not P isn't true in all states of all behaviors"). This is the basis of liveness checking. ↩ I don't know for sure, but my best guess is that Antithesis does something similar when their fuzzer beats videogames. They're doing fuzzing, not model checking, but they have the same purpose check that complex state spaces don't have bugs. Making the bug "we can't reach the end screen" can make a fuzzer output a complete end-to-end run of the game. Obvs a lot more complicated than that but that's the general idea at least. ↩ For CHECK to constraint multiple records you would need to use a subquery. Core SQL does not support subqueries in check. It is an optional database "feature outside of core SQL" (F671), which Postgres does not support. ↩

a week ago • 18 votes

Sapir-Whorf does not apply to Programming Languages

This one is a hot mess but it's too late in the week to start over. Oh well! Someone recognized me at last week's Chipy and asked for my opinion on Sapir-Whorf hypothesis in programming languages. I thought this was interesting enough to make a newsletter. First what it is, then why it looks like it applies, and then why it doesn't apply after all. The Sapir-Whorf Hypothesis We dissect nature along lines laid down by our native language. — Whorf To quote from a Linguistics book I've read, the hypothesis is that "an individual's fundamental perception of reality is moulded by the language they speak." As a massive oversimplification, if English did not have a word for "rebellion", we would not be able to conceive of rebellion. This view, now called Linguistic Determinism, is mostly rejected by modern linguists. The "weak" form of SWH is that the language we speak influences, but does not decide our cognition. For example, Russian has distinct words for "light blue" and "dark blue", so can discriminate between "light blue" and "dark blue" shades faster than they can discriminate two "light blue" shades. English does not have distinct words, so we discriminate those at the same speed. This linguistic relativism seems to have lots of empirical support in studies, but mostly with "small indicators". I don't think there's anything that convincingly shows linguistic relativism having effects on a societal level.1 The weak form of SWH for software would then be the "the programming languages you know affects how you think about programs." SWH in software This seems like a natural fit, as different paradigms solve problems in different ways. Consider the hardest interview question ever, "given a list of integers, sum the even numbers". Here it is in four paradigms: Procedural: total = 0; foreach x in list {if IsEven(x) total += x}. You iterate over data with an algorithm. Functional: reduce(+, filter(IsEven, list), 0). You apply transformations to data to get a result. Array: + fold L * iseven L.2 In English: replace every element in L with 0 if odd and 1 if even, multiple the new array elementwise against L, and then sum the resulting array. It's like functional except everything is in terms of whole-array transformations. Logical: Somethingish like sumeven(0, []). sumeven(X, [Y|L]) :- iseven(Y) -> sumeven(Z, L), X is Y + Z ; sumeven(X, L). You write a set of equations that express what it means for X to be the sum of events of L. There's some similarities between how these paradigms approach the problem, but each is also unique, too. It's plausible that where a procedural programmer "sees" a for loop, a functional programmer "sees" a map and an array programmer "sees" a singular operator. I also have a personal experience with how a language changed the way I think. I use TLA+ to detect concurrency bugs in software designs. After doing this for several years, I've gotten much better at intuitively seeing race conditions in things even without writing a TLA+ spec. It's even leaked out into my day-to-day life. I see concurrency bugs everywhere. Phone tag is a race condition. But I still don't think SWH is the right mental model to use, for one big reason: language is special. We think in language, we dream in language, there are huge parts of our brain dedicated to processing language. We don't use those parts of our brain to read code. SWH is so intriguing because it seems so unnatural, that the way we express thoughts changes the way we think thoughts. That I would be a different person if I was bilingual in Spanish, not because the life experiences it would open up but because grammatical gender would change my brain. Compared to that, the idea that programming languages affect our brain is more natural and has a simpler explanation: It's the goddamned Tetris Effect. The Goddamned Tetris Effect The Tetris effect occurs when someone dedicates vast amounts of time, effort and concentration on an activity which thereby alters their thoughts, dreams, and other experiences not directly linked to said activity. — Wikipedia Every skill does this. I'm a juggler, so every item I can see right now has a tiny metadata field of "how would this tumble if I threw it up". I teach professionally, so I'm always noticing good teaching examples everywhere. I spent years writing specs in TLA+ and watching the model checker throw concurrency errors in my face, so now race conditions have visceral presence. Every skill does this. And to really develop a skill, you gotta practice. This is where I think programming paradigms do something especially interesting that make them feel more like Sapir-Whorfy than, like, juggling. Some languages mix lots of different paradigms, like Javascript or Rust. Others like Haskell really focus on excluding paradigms. If something is easy for you in procedural and hard in FP, in JS you could just lean on the procedural bits. In Haskell, too bad, you're learning how to do it the functional way.3 And that forces you to practice, which makes you see functional patterns everywhere. Tetris effect! Anyway this may all seem like quibbling— why does it matter whether we call it "Tetris effect" or "Sapir-Whorf", if our brains is get rewired either way? For me, personally, it's because SWH sounds really special and unique, while Tetris effect sounds mundane and commonplace. Which it is. But also because TE suggests it's not just programming languages that affect how we think about software, it's everything. Spending lots of time debugging, profiling, writing exploits, whatever will change what you notice, what you think a program "is". And that's a way useful idea that shouldn't be restricted to just PLs. (Then again, the Tetris Effect might also be a bad analogy to what's going on here, because I think part of it is that it wears off after a while. Maybe it's just "building a mental model is good".) I just realized all of this might have missed the point Wait are people actually using SWH to mean the weak form or the strong form? Like that if a language doesn't make something possible, its users can't conceive of it being possible. I've been arguing against the weaker form in software but I think I've seen strong form often too. Dammit. Well, it's already Thursday and far too late to rewrite the whole newsletter, so I'll just outline the problem with the strong form: we describe the capabilities of our programming languages with human language. In college I wrote a lot of crappy physics lab C++ and one of my projects was filled with comments like "man I hate copying this triply-nested loop in 10 places with one-line changes, I wish I could put it in one function and just take the changing line as a parameter". Even if I hadn't encountered higher-order functions, I was still perfectly capable of expressing the idea. So if the strong SWH isn't true for human language, it's not true for programming languages either. Systems Distributed talk now up! Link here! Original abstract: Building correct distributed systems takes thinking outside the box, and the fastest way to do that is to think inside a different box. One different box is "formal methods", the discipline of mathematically verifying software and systems. Formal methods encourages unusual perspectives on systems, models that are also broadly useful to all software developers. In this talk we will learn two of the most important FM perspectives: the abstract specifications behind software systems, and the property they are and aren't supposed to have. The talk ended up evolving away from that abstract but I like how it turned out! There is one paper arguing that people who speak a language that doesn't have a "future tense" are more likely to save and eat healthy, but it is... extremely questionable. ↩ The original J is +/ (* (0 = 2&|)). Obligatory Notation as a Tool of Thought reference ↩ Though if it's too hard for you, that's why languages have escape hatches ↩

2 weeks ago • 16 votes

Software books I wish I could read

New Logic for Programmers Release! v0.11 is now available! This is over 20% longer than v0.10, with a new chapter on code proofs, three chapter overhauls, and more! Full release notes here. Software books I wish I could read I'm writing Logic for Programmers because it's a book I wanted to have ten years ago. I had to learn everything in it the hard way, which is why I'm ensuring that everybody else can learn it the easy way. Books occupy a sort of weird niche in software. We're great at sharing information via blogs and git repos and entire websites. These have many benefits over books: they're free, they're easily accessible, they can be updated quickly, they can even be interactive. But no blog post has influenced me as profoundly as Data and Reality or Making Software. There is no blog or talk about debugging as good as the Debugging book. It might not be anything deeper than "people spend more time per word on writing books than blog posts". I dunno. So here are some other books I wish I could read. I don't think any of them exist yet but it's a big world out there. Also while they're probably best as books, a website or a series of blog posts would be ok too. Everything about Configurations The whole topic of how we configure software, whether by CLI flags, environmental vars, or JSON/YAML/XML/Dhall files. What causes the configuration complexity clock? How do we distinguish between basic, advanced, and developer-only configuration options? When should we disallow configuration? How do we test all possible configurations for correctness? Why do so many widespread outages trace back to misconfiguration, and how do we prevent them? I also want the same for plugin systems. Manifests, permissions, common APIs and architectures, etc. Configuration management is more universal, though, since everybody either uses software with configuration or has made software with configuration. The Big Book of Complicated Data Schemas I guess this would kind of be like Schema.org, except with a lot more on the "why" and not the what. Why is important for the Volcano model to have a "smokingAllowed" field?1 I'd see this less as "here's your guide to putting Volcanos in your database" and more "here's recurring motifs in modeling interesting domains", to help a person see sources of complexity in their own domain. Does something crop up if the references can form a cycle? If a relationship needs to be strictly temporary, or a reference can change type? Bonus: path dependence in data models, where an additional requirement leads to a vastly different ideal data model that a company couldn't do because they made the old model. (This has got to exist, right? Business modeling is a big enough domain that this must exist. Maybe The Essence of Software touches on this? Man I feel bad I haven't read that yet.) Computer Science for Software Engineers Yes, I checked, this book does not exist (though maybe this is the same thing). I don't have any formal software education; everything I know was either self-taught or learned on the job. But it's way easier to learn software engineering that way than computer science. And I bet there's a lot of other engineers in the same boat. This book wouldn't have to be comprehensive or instructive: just enough about each topic to understand why it's an area of study and appreciate how research in it eventually finds its way into practice. MISU Patterns MISU, or "Make Illegal States Unrepresentable", is the idea of designing system invariants in the structure of your data. For example, if a Contact needs at least one of email or phone to be non-null, make it a sum type over EmailContact, PhoneContact, EmailPhoneContact (from this post). MISU is great. Most MISU in the wild look very different than that, though, because the concept of MISU is so broad there's lots of different ways to achieve it. And that means there are "patterns": smart constructors, product types, properly using sets, newtypes to some degree, etc. Some of them are specific to typed FP, while others can be used in even untyped languages. Someone oughta make a pattern book. My one request would be to not give them cutesy names. Do something like the Aarne–Thompson–Uther Index, where items are given names like "Recognition by manner of throwing cakes of different weights into faces of old uncles". Names can come later. The Tools of '25 Not something I'd read, but something to recommend to junior engineers. Starting out it's easy to think the only bit that matters is the language or framework and not realize the enormous amount of surrounding tooling you'll have to learn. This book would cover the basics of tools that enough developers will probably use at some point: git, VSCode, very basic Unix and bash, curl. Maybe the general concepts of tools that appear in every ecosystem, like package managers, build tools, task runners. That might be easier if we specialize this to one particular domain, like webdev or data science. Ideally the book would only have to be updated every five years or so. No LLM stuff because I don't expect the tooling will be stable through 2026, to say nothing of 2030. A History of Obsolete Optimizations Probably better as a really long blog series. Each chapter would be broken up into two parts: A deep dive into a brilliant, elegant, insightful historical optimization designed to work within the constraints of that era's computing technology What we started doing instead, once we had more compute/network/storage available. c.f. A Spellchecker Used to Be a Major Feat of Software Engineering. Bonus topics would be brilliance obsoleted by standardization (like what people did before git and json were universal), optimizations we do today that may not stand the test of time, and optimizations from the past that did. Sphinx Internals I need this. I've spent so much goddamn time digging around in Sphinx and docutils source code I'm gonna throw up. Systems Distributed Talk Today! Online premier's at noon central / 5 PM UTC, here! I'll be hanging out to answer questions and be awkward. You ever watch a recording of your own talk? It's real uncomfortable! In this case because it's a field on one of Volcano's supertypes. I guess schemas gotta follow LSP too ↩

a month ago • 30 votes

2000 words about arrays and tables

I'm way too discombobulated from getting next month's release of Logic for Programmers ready, so I'm pulling a idea from the slush pile. Basically I wanted to come up with a mental model of arrays as a concept that explained APL-style multidimensional arrays and tables but also why there weren't multitables. So, arrays. In all languages they are basically the same: they map a sequence of numbers (I'll use 1..N)1 to homogeneous values (values of a single type). This is in contrast to the other two foundational types, associative arrays (which map an arbitrary type to homogeneous values) and structs (which map a fixed set of keys to heterogeneous values). Arrays appear in PLs earlier than the other two, possibly because they have the simplest implementation and the most obvious application to scientific computing. The OG FORTRAN had arrays. I'm interested in two structural extensions to arrays. The first, found in languages like nushell and frameworks like Pandas, is the table. Tables have string keys like a struct and indexes like an array. Each row is a struct, so you can get "all values in this column" or "all values for this row". They're heavily used in databases and data science. The other extension is the N-dimensional array, mostly seen in APLs like Dyalog and J. Think of this like arrays-of-arrays(-of-arrays), except all arrays at the same depth have the same length. So [[1,2,3],[4]] is not a 2D array, but [[1,2,3],[4,5,6]] is. This means that N-arrays can be queried on any axis. ]x =: i. 3 3 0 1 2 3 4 5 6 7 8 0 { x NB. first row 0 1 2 0 {"1 x NB. first column 0 3 6 So, I've had some ideas on a conceptual model of arrays that explains all of these variations and possibly predicts new variations. I wrote up my notes and did the bare minimum of editing and polishing. Somehow it ended up being 2000 words. 1-dimensional arrays A one-dimensional array is a function over 1..N for some N. To be clear this is math functions, not programming functions. Programming functions take values of a type and perform computations on them. Math functions take values of a fixed set and return values of another set. So the array [a, b, c, d] can be represented by the function (1 -> a ++ 2 -> b ++ 3 -> c ++ 4 -> d). Let's write the set of all four element character arrays as 1..4 -> char. 1..4 is the function's domain. The set of all character arrays is the empty array + the functions with domain 1..1 + the functions with domain 1..2 + ... Let's call this set Array[Char]. Our compilers can enforce that a type belongs to Array[Char], but some operations care about the more specific type, like matrix multiplication. This is either checked with the runtime type or, in exotic enough languages, with static dependent types. (This is actually how TLA+ does things: the basic collection types are functions and sets, and a function with domain 1..N is a sequence.) 2-dimensional arrays Now take the 3x4 matrix i. 3 4 0 1 2 3 4 5 6 7 8 9 10 11 There are two equally valid ways to represent the array function: A function that takes a row and a column and returns the value at that index, so it would look like f(r: 1..3, c: 1..4) -> Int. A function that takes a row and returns that column as an array, aka another function: f(r: 1..3) -> g(c: 1..4) -> Int.2 Man, (2) looks a lot like currying! In Haskell, functions can only have one parameter. If you write (+) 6 10, (+) 6 first returns a new function f y = y + 6, and then applies f 10 to get 16. So (+) has the type signature Int -> Int -> Int: it's a function that takes an Int and returns a function of type Int -> Int.3 Similarly, our 2D array can be represented as an array function that returns array functions: it has type 1..3 -> 1..4 -> Int, meaning it takes a row index and returns 1..4 -> Int, aka a single array. (This differs from conventional array-of-arrays because it forces all of the subarrays to have the same domain, aka the same length. If we wanted to permit ragged arrays, we would instead have the type 1..3 -> Array[Int].) Why is this useful? A couple of reasons. First of all, we can apply function transformations to arrays, like "combinators". For example, we can flip any function of type a -> b -> c into a function of type b -> a -> c. So given a function that takes rows and returns columns, we can produce one that takes columns and returns rows. That's just a matrix transposition! Second, we can extend this to any number of dimensions: a three-dimensional array is one with type 1..M -> 1..N -> 1..O -> V. We can still use function transformations to rearrange the array along any ordering of axes. Speaking of dimensions: What are dimensions, anyway Okay, so now imagine we have a Row × Col grid of pixels, where each pixel is a struct of type Pixel(R: int, G: int, B: int). So the array is Row -> Col -> Pixel But we can also represent the Pixel struct with a function: Pixel(R: 0, G: 0, B: 255) is the function where f(R) = 0, f(G) = 0, f(B) = 255, making it a function of type {R, G, B} -> Int. So the array is actually the function Row -> Col -> {R, G, B} -> Int And then we can rearrange the parameters of the function like this: {R, G, B} -> Row -> Col -> Int Even though the set {R, G, B} is not of form 1..N, this clearly has a real meaning: f[R] is the function mapping each coordinate to that coordinate's red value. What about Row -> {R, G, B} -> Col -> Int? That's for each row, the 3 × Col array mapping each color to that row's intensities. Really any finite set can be a "dimension". Recording the monitor over a span of time? Frame -> Row -> Col -> Color -> Int. Recording a bunch of computers over some time? Computer -> Frame -> Row …. This is pretty common in constraint satisfaction! Like if you're conference trying to assign talks to talk slots, your array might be type (Day, Time, Room) -> Talk, where Day/Time/Room are enumerations. An implementation constraint is that most programming languages only allow integer indexes, so we have to replace Rooms and Colors with numerical enumerations over the set. As long as the set is finite, this is always possible, and for struct-functions, we can always choose the indexing on the lexicographic ordering of the keys. But we lose type safety. Why tables are different One more example: Day -> Hour -> Airport(name: str, flights: int, revenue: USD). Can we turn the struct into a dimension like before? In this case, no. We were able to make Color an axis because we could turn Pixel into a Color -> Int function, and we could only do that because all of the fields of the struct had the same type. This time, the fields are different types. So we can't convert {name, flights, revenue} into an axis. 4 One thing we can do is convert it to three separate functions: airport: Day -> Hour -> Str flights: Day -> Hour -> Int revenue: Day -> Hour -> USD But we want to keep all of the data in one place. That's where tables come in: an array-of-structs is isomorphic to a struct-of-arrays: AirportColumns( airport: Day -> Hour -> Str, flights: Day -> Hour -> Int, revenue: Day -> Hour -> USD, ) The table is a sort of both representations simultaneously. If this was a pandas dataframe, df["airport"] would get the airport column, while df.loc[day1] would get the first day's data. I don't think many table implementations support more than one axis dimension but there's no reason they couldn't. These are also possible transforms: Hour -> NamesAreHard( airport: Day -> Str, flights: Day -> Int, revenue: Day -> USD, ) Day -> Whatever( airport: Hour -> Str, flights: Hour -> Int, revenue: Hour -> USD, ) In my mental model, the heterogeneous struct acts as a "block" in the array. We can't remove it, we can only push an index into the fields or pull a shared column out. But there's no way to convert a heterogeneous table into an array. Actually there is a terrible way Most languages have unions or product types that let us say "this is a string OR integer". So we can make our airport data Day -> Hour -> AirportKey -> Int | Str | USD. Heck, might as well just say it's Day -> Hour -> AirportKey -> Any. But would anybody really be mad enough to use that in practice? Oh wait J does exactly that. J has an opaque datatype called a "box". A "table" is a function Dim1 -> Dim2 -> Box. You can see some examples of what that looks like here Misc Thoughts and Questions The heterogeneity barrier seems like it explains why we don't see multiple axes of table columns, while we do see multiple axes of array dimensions. But is that actually why? Is there a system out there that does have multiple columnar axes? The array x = [[a, b, a], [b, b, b]] has type 1..2 -> 1..3 -> {a, b}. Can we rearrange it to 1..2 -> {a, b} -> 1..3? No. But we can rearrange it to 1..2 -> {a, b} -> PowerSet(1..3), which maps rows and characters to columns with that character. [(a -> {1, 3} ++ b -> {2}), (a -> {} ++ b -> {1, 2, 3}]. We can also transform Row -> PowerSet(Col) into Row -> Col -> Bool, aka a boolean matrix. This makes sense to me as both forms are means of representing directed graphs. Are other function combinators useful for thinking about arrays? Does this model cover pivot tables? Can we extend it to relational data with multiple tables? Systems Distributed Talk (will be) Online The premier will be August 6 at 12 CST, here! I'll be there to answer questions / mock my own performance / generally make a fool of myself. Sacrilege! But it turns out in this context, it's easier to use 1-indexing than 0-indexing. In the years since I wrote that article I've settled on "each indexing choice matches different kinds of mathematical work", so mathematicians and computer scientists are best served by being able to choose their index. But software engineers need consistency, and 0-indexing is overall a net better consistency pick. ↩ This is right-associative: a -> b -> c means a -> (b -> c), not (a -> b) -> c. (1..3 -> 1..4) -> Int would be the associative array that maps length-3 arrays to integers. ↩ Technically it has type Num a => a -> a -> a, since (+) works on floats too. ↩ Notice that if each Airport had a unique name, we could pull it out into AirportName -> Airport(flights, revenue), but we still are stuck with two different values. ↩

a month ago • 23 votes

More in programming

chapter two

I had watched enough true crime to know that you should never talk to the police. And I wasn’t arrogant enough to believe that I was different. While I felt like I knew the interrogation tactics in and out, they were repeat customers of that interaction. I wasn’t going to call. I was going to ignore it. I’m not getting Reid techniqued. Why did they ask for me? This house was owned by my mother, how do they even know I live here? Wait who am I kidding, of course they know. I went to high school here, governments have records of that kind of thing. But still, why ask for me? Another thing was odd. We lived in Brooklyn, aka Kings County. Not Nassau County. These guys must have driven all the way here on a Saturday night. I felt like I was being watched. They wouldn’t drive all the way here to just leave a business card. I felt trapped in the house. Like they were a mountain lion on a rock perch and I was the prey in the valley below. They had the high ground and I didn’t know what they could see. But this was crazy, I didn’t do anything! Should I call them? Figure out what they want? No! That’s exactly what they want. They know I feel like this. This is exactly what they are going for. Another system carefully crafted based on years and years of “user feedback” designed to manipulate you into doing what it wants. But what if I’m doing what they want right now? Maybe they don’t want me to call. Maybe the real goal is to figure out what I do next. Watching and hoping I’ll go check on the body or something. But there wasn’t a body! If I did commit a crime this would all be a lot easier, I’d know why they were here and what they wanted and could plan my next move accordingly. I opened another Bud Light, took my clothes off, and got into bed. Even though there was nobody else home, I kept the sound off on the porn. Just in case they were listening. After I finished, I felt a bit more calm. Dude get a grip, all they did was leave a business card. Coming out of the paranoid spiral a bit, I realized what it must be about. It must have had to do with my Dad’s meeting. That was in Long Island, aka Nassau County. Probably some dumb financial crap. My mother was out with her friends in Manhattan, but she’d be home tonight and maybe she knew what the meeting was. It was now twenty to nine and I texted Brian. He’s like yea bro Dave just got here come through. And you still have that case of Bud Light? I put the beers in a backpack. Is this what the detective planned? Maybe I was playing right into the plot; arrest me for underage possession of alcohol and then get me to talk about what I knew. But I didn’t even know anything! This whole thing was stupid. I thought about how I got the beers, wondering if the whole thing was somehow a set-up. Totally nonsense thought. Kids buy beer with fake IDs all the time. When I got to Brian’s everything was normal. I walked around the back of his house and opened the screen door to his basement. There were three leather couches in a U-shape, two of which were sparsely occupied by Brian and Dave. I took my place on the third empty one and put my backpack on the center ottoman. “Pretty cool, right? Yea I found it in my Dad’s old stuff.” said Brian, referring to the inflated bag atop a device labeled Volcano sharing the ottoman with my backpack. “What is it?” “Bro it’s like an old vape. You put the weed in and plug it in to the wall.” He detached the cloudy bag from the device and demonstrated. If you pushed on the mouthpiece, it let air through and you could breathe in the vaporized drug. “It’s like a bong but chill.” I inhaled. This probably wasn’t smart with how paranoid I was from the interaction earlier, but I felt safe in the basement. It was a summer night, I was with friends, I had drank beer. Life was good. Dave showed us this reel. It was a mouse in a maze, and it started from the mouse’s perspective. Kind of like a skater cam, wow these things could scurry. Then it zoomed out so you could see the maze from the perspective of the experimenter. Then seeing the back of his head looking down at the maze, cutting to sped up dashcam video of him driving home from work. Zooming out again with a sparkling line showing his route through the grid of city streets. AI has done wonders for these video transitions. Maybe this whole video was AI. “What if we’re the mouse,” said Dave in the most stereotypical stoner voice. He’d always find shit like this, in that way that when you are high the thought seems really deep. But if you think about it more it’s nonsense, like that mouse is in a maze constructed by humans, and even if it doesn’t always feel like it, the society we live in is jointly constructed by all of us. Brian showed a video of two girls at some Mardi Gras bead type event licking one ice cream cone. He told us he wasn’t a virgin but I didn’t really believe him. It was a bit after midnight and it was time to go home. I hadn’t really thought about the interaction from earlier, but I started to again when I got outside. It was a half mile walk back home; I was grateful to hear all the noises of the city. Even though I couldn’t see it, it reminded me that there was a society out there. My mom’s car wasn’t in the driveway. Maybe she met a guy. Nothing too out of the ordinary. I unlocked the door, closed it behind me, locked both the knob and the deadbolt, went upstairs into my room, locked that door, and with the blanket of those three locks, a bunch of beers, and a couple hits of the Volcano, drifted off to sleep.

17 hours ago • 2 votes

Dreams of Late Summer

Here on a summer night in the grass and lilac smell Drunk on the crickets and the starry sky, Oh what fine stories we could tell With this moonlight to tell them by. A summer night, and you, and paradise, So lovely and so filled with grace, Above your head, the universe has hung its … Continue reading Dreams of Late Summer →

2 days ago • 7 votes

Thrice charmed at Rails World

The first Rails World in Amsterdam was a roaring success back in 2023. Tickets sold out in 45 minutes, the atmosphere was electric, and The Rails Foundation set a new standard for conference execution in the Ruby community. So when we decided to return to the Dutch Capital for the third edition of the conference this year, the expectations were towering. And yet, Amanda Perino, our executive director and event organizer extraordinaire, managed to outdo herself, and produced an even better show this year. The venue we returned to was already at capacity the first time around, but Amanda managed to fit a third more attendees by literally using slimmer chairs! And I didn't hear any complaints the folks who had to sit a little closer together in order for more people to enjoy the gathering. The increased capacity didn't come close to satisfy the increased demand, though. This year, tickets sold out in less than two minutes. Crazy. But for the 800+ people who managed to secure a pass, I'm sure it felt worth the refresh-the-website scramble to buy a ticket. And, as in years past, Amanda's recording crew managed to turn around post-production on my keynote in less than 24 hours, so anyone disappointed with missing out on a ticket could at least be in the loop on all the awesome new Rails stuff we were releasing up to and during the conference. Every other session was recorded too, and will soon be on the Rails YouTube channel. You can't stream the atmosphere, the enthusiasm, and the genuine love of Ruby on Rails, though. I was once again blown away by just how many incredible people and stories we have in this ecosystem. From entrepreneurs who've built million (or billion!) dollar businesses on Rails, to programmers who've been around the framework for decades, to people who just picked it up this year. It was a thrill to meet all of them, to take hundreds of selfies, and to talk about Ruby, Rails, and the Omarchy expansion pack for hours on the hallway track! I've basically stopped doing prepared presentations at conferences, but Rails World is the one exception. I really try my best to put on a good show, present the highlights of what we've been working on in the past year at 37signals, and transfer the never-ending enthusiasm I continue to feel for this framework, this programming language, and this ecosystem. True, I may occasionally curse that commitment in the weeks leading up to the conference, but the responsibility is always rewarded during and after the execution with a deep sense of satisfaction. Not everyone is so lucky as I've been to find their life's work early in their career, and see it continue to blossom over the decades. I'm eternally grateful that I have. Of course, there's been ups and downs over the years — nothing is ever just a straight line of excitement up and to the right! — but we're oh-so-clearly on the up-up-up part of that curve at the moment. I don't know whether it's just the wind or the whims, but Rails is enjoying an influx of a new generation of programmers at the moment. No doubt it helps when I get to wax poetically about Ruby for an hour with Lex Fridman in front of an audience of millions. No doubt Shopify's continued success eating the world of ecommerce helps. No doubt the stability, professionalism, and execution from The Rails Foundation is an aid. There are many auxiliary reasons why we're riding a wave at the moment, but key to it all is also that Ruby on Rails is simply really, really good! Next year, with RailsConf finished, it's time to return to the US. Amanda has picked a great spot in Austin, we're planning to dramatically expand the capacity, but I also fully expect that demand will continue to rise, especially in the most prosperous and successful market for Rails. Thanks again to all The Rails Foundation members who believed in the vision for a new institution back in 2022. It looks like a no-brainer to join such a venture now, given the success of Rails World and everything else, but it actually took guts to sign on back then. I approached quite a few companies at that time who could see the value, but couldn't find the courage to support our work, as our industry was still held hostage to a band of bad ideas and terrible ideologies. All that nonsense is thankfully now long gone in the Rails world. We're enjoying a period of peak unity, excitement, progress, and determination to continue to push for end-to-end problem solving, open source, and freedom. I can't tell you how happy it makes me feel when I hear from yet another programmer who credits Ruby on Rails with finding joy and beauty in the writing web applications because of what I started over 22 years ago. It may sound trite, but it's true: It's an honor and a privilege. I hope to carry this meaningful burden for as long as my intellectual legs still let me stand. See you next year in Austin? I hope so!

2 days ago • 6 votes

chapter one

I hadn’t lost my virginity yet. And it wasn’t for lack of trying; it seemed like the rest of my generation was no longer interested in sex. On some level, I understood where they were coming from, the whole act did seem kind of pointless. But after a few beers, that wasn’t how my mind was working. I turned 19 last week. Dad flew in from Idaho, and it was the first time he was in the house I shared with my mother. He left when I was 12, and it was always apparent that parenting wasn’t the top thing on his mind. There was some meeting on Long Island. That’s probably why he was there, in addition to the fact he knew mom wouldn’t make him sleep on the couch. He had many reasons to be in New York that weren’t me. My birthday was just a flimsy pretense. He’d worked on Wall Street the whole time he was around, a quant. He wrote programs that made other people rich. But something happened to him right before he left. A crisis of conscience perhaps; he was spiraling for weeks, cursing the capitalist system, calling my mother a gold-digging whore (which was mostly true), and saying things needed to change. Then he packed a single backpack and left for Idaho. I visited him out there once my sophomore year. He had a camouflaged one room cabin in the middle of a spruce forest, but instead of the hunting or fishing stuff you might expect, the walls were adorned with electrical test equipment and various things that looked like they were out of a biology or chemistry lab. I didn’t know much about this stuff and that wasn’t what he wanted to talk about anyway. He wanted to talk about “man shit” like nature and women and not being life’s bitch. I tried to act like I did, but I didn’t really listen. All I remember is how eerily quiet the night was, I could hear every animal movement outside. My dad said you get used to it. Brian was having a party tonight. Well okay, party is a lofty way to describe it. He’d replaced the fluorescent lights in his mom’s basement with blacklights, and we’d go over there to drink beer and smoke weed and sit around on our phones and scroll. And sometimes someone would laugh at something and share with the group. I had a case of Bud Light left over from the last party and drank two of them today. Hence the thinking about sex and not thinking that thinking about sex was stupid. People wouldn’t be going over there for a few more hours, so I laid in my bed, drank, and loosely beat off to YouTube. Celebrity gossip, internet gossip, speedrun videos, nothing even arousing. I liked the true crime videos about the hot female teachers who slept with their students. Yea yea yea terrible crime and they all act holier than thou about what if the genders were reversed, but the genders weren’t reversed. Maybe they just don’t want to get demonetized. There were never women at these parties. Okay maybe one or two. But nobody ever slept with them or much thought about them that way. They were the agendered mass like the rest of us. Fellow consumers, not providers. Fuck I should just go visit a hooker. I didn’t know much about that, were hookers real? I’d never met one, and there wasn’t a good way to find out about stuff like this anymore. The Internet was pretty much all “advertiser friendly” now, declawed, sanitized. Once the algorithms got good enough and it was technically easy to censor, there was nothing holding them back. It wasn’t actually censored, it would just redirect you elsewhere. And if you didn’t pay careful attention, you wouldn’t even notice it happening. I tried asking ChatGPT about hookers and it told me to call them sex workers. And this was kind of triggering. Who the fuck does this machine think it is? But then I was lost on this tangent, the algorithms got a rise out of me and I went back to comfort food YouTube. Look this guy beat Minecraft starting with only one block. The doorbell rang. This always gives me anxiety. And it was particularly anxiety inducing since I was the only one home. Normally I could just know that the door of my room was locked and someone else would get it and this would be a downstairs issue. But it was just me at home. My heart rate jumped. I waited for it to ring again, but prayed that it wouldn’t. Please just go away. But sure enough, it rang again. I went to my window, my room was on the second floor. There was a black Escalade in the driveway that I hadn’t seen before, and I could see two men at the door. They were wearing suits. I ducked as to make sure they wouldn’t look up at me, making as little noise as possible. Peering over the window sill I could see one opening the screen door, and it looked like he stuck something to the main door. My heart was beating even faster now. It was Saturday night, why were there two men in suits? And why were they here? It felt longer, but 3 minutes later they drove off. I waited another 3 for good measure, just watching the clock on my computer until it hit 6:57. I doubled checked out the window to make sure they were actually gone, and crept down the stairs to retrieve whatever they left on the door. It was a business card, belonging to a “Detective James Reese” of the Nassau County Police. And on the back of the card, there was handwriting. “John – call me” John was my name.

2 days ago • 3 votes

Apologies and forgiveness

The first in a series of posts about doing things the right way

3 days ago • 11 votes

New here?