Keeping up appearances in tech

I saw a remarkable pair of tweets the other day. In the wake of the outage, the CEO of CrowdStrike sent out a public announcement. It's purely factual. The scope of the problem is identified, the known facts are stated, and the logistics of disaster relief are set in motion. Millions of computers were affected. This is the equivalent of a frazzled official giving a brief statement in the aftermath of an earthquake, directing people to the Red Cross. Everything is basically on fire for everyone involved. Systems are failing everywhere, some critical, and quite likely people are panicking. The important thing is to give the technicians the information and tools to fix it, and for everyone else to do what they can, and stay out of the way. In response, a communication professional posted an 'improved' version: Credit where credit is due, she nailed the style. 10/10. It seems unobjectionable, at first. Let's go through, shall...
7 months ago


More from Acko.net

I is for Intent

Why your app turned into spaghetti

There's a certain kind of programmer. Let's call him Stanley.

Stanley has been around for a while, and has his fair share of war stories. The common thread is that poorly conceived and specced solutions lead to disaster, or at least, ongoing misery. As a result, he has adopted a firm belief: it should be impossible for his program to reach an invalid state.

Stanley loves strong and static typing. He's a big fan of pattern matching, and enums, and discriminated unions, which allow correctness to be verified at compile time. He also has strong opinions on errors, which must be caught, logged and prevented. He uses only ACID-compliant databases, wants foreign keys and triggers to be enforced, and wraps everything in atomic transactions. He hates any source of uncertainty or ambiguity, like untyped JSON or plain-text markup. His APIs will accept data only in normalized and validated form. When you use a Stanley lib, and it doesn't work, the answer will probably be: "you're using it wrong."

Stanley is most likely a back-end or systems-level developer. Because nirvana in front-end development is reached when you understand that this view of software is not just wrong, but fundamentally incompatible with the real world. I will prove it.

State Your Intent

Take a text editor. What happens if you press the up and down arrows? The keyboard cursor (aka caret) moves up and down. Duh. Except it also moves left and right.

The editor state at the start has the caret on line 1 column 6. Pressing down will move it to line 2 column 6. But line 2 is too short, so the caret is forcibly moved left to column 1. Then, pressing down again will move it back to column 6.

It should be obvious that any editor that didn't remember which column you were actually on would be a nightmare to use. You know it in your bones. Yet this only works because the editor allows the caret to be placed on a position that "does not exist." What is the caret state in the middle? It is both column 1 and column 6.

To accommodate this, you need more than just a View that is a pure function of a State, as is now commonly taught. Rather, you need an Intent, which is the source of truth that you mutate... and which is then parsed and validated into a State. Only then can it be used by the View to render the caret in the right place.

To edit the intent, aka what a classic Controller does, is a bit tricky. When you press left/right, it should determine the new Intent.column based on the validated State.column +/- 1. But when you press up/down, it should keep the Intent.column you had before and instead change only Intent.line. New intent is a mixed function of both previous intent and previous state.

The general pattern is that you reuse Intent if it doesn't change, but that new computed Intent should be derived from State. Note that you should still enforce normal validation of Intent.column when editing too: you don't allow a user to go past the end of a line. Any new intent should be as valid as possible, but old intent should be preserved as is, even if non-sensical or inapplicable.

Functionally, for most of the code, it really does look and feel as if the state is just State, which is valid. It's just that when you make 1 state change, the app may decide to jump into a different State than one would think. When this happens, it means some old intent first became invalid, but then became valid again due to a subsequent intent/state change. This is how applications actually work IRL. FYI.
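Here is a minimal sketch of that caret logic. The type and function names are illustrative, not from the original post; it just spells out the rule that up/down preserves Intent.column while left/right derives it from the validated State.

```ts
type Intent = { line: number; column: number }; // what the user asked for
type State  = { line: number; column: number }; // what is actually shown

// Clamp intent onto the document; the "impossible" column is preserved in Intent.
const validate = (intent: Intent, lines: string[]): State => {
  const line = Math.max(0, Math.min(intent.line, lines.length - 1));
  const text = lines[line] ?? '';
  const column = Math.max(0, Math.min(intent.column, text.length));
  return { line, column };
};

// Up/down: keep the remembered column, only change the line.
const moveVertical = (intent: Intent, delta: number): Intent =>
  ({ ...intent, line: intent.line + delta });

// Left/right: derive the new column from the *validated* State, not from old intent.
const moveHorizontal = (intent: Intent, state: State, delta: number): Intent =>
  ({ ...intent, column: state.column + delta });
```

Pressing down twice across a short line then lands back on the original column, because Intent.column never changed; only the validated State did.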
Knives and Forks

I chose a text editor as an example because Stanley can't dismiss this as just frivolous UI polish for limp wristed homosexuals. It's essential that editors work like this. The pattern is far more common than most devs realize:

- A tree view remembers the expand/collapse state for rows that are hidden.
- Inspector tabs remember the tab you were on, even if currently disabled or inapplicable.
- Toggling a widget between type A/B/C should remember all the A, B and C options, even if they are mutually exclusive.

All of these involve storing and preserving something unknown, invalid or unused, and bringing it back into play later. More so, if software matches your expected intent, it's a complete non-event. What looks like a "surprise hidden state transition" to a programmer is actually the exact opposite. It would be an unpleasant surprise if that extra state transition didn't occur. It would only annoy users: they already told the software what they wanted, but it keeps forgetting.

The ur-example is how nested popup menus should work: good implementations track the motion of the cursor so you can move diagonally from parent to child, without falsely losing focus. This is an instance of the goalkeeper's curse: people rarely compliment or notice the goalkeeper if they do their job, only if they mess up. Successful applications of this principle are doomed to remain unnoticed and unstudied.

Validation is not something you do once, discarding the messy input and only preserving the normalized output. It's something you do continuously and non-destructively, preserving the mess as much as possible. It's UI etiquette: the unspoken rules that everyone expects but which are mostly undocumented folklore.

This poses a problem for most SaaS in the wild, both architectural and existential. Most APIs will only accept mutations that are valid. The goal is for the database to be a sequence of fully valid states: the smallest possible operation in the system is a fully consistent transaction. This flattens any prior intent.

In practice, a lot of software deviates from this ad hoc. For example, spreadsheets let you create cyclic references, which is by definition invalid. The reason it must let you do this is because fixing one side of a cyclic reference also fixes the other side. A user wants and needs to do these operations in any order. So you must allow a state transition through an invalid state. This requires an effective Intent/State split, whether formal or informal.

Because cyclic references can go several levels deep, identifying one cyclic reference may require you to spider out the entire dependency graph. This is functionally equivalent to identifying all cyclic references—dixit Dijkstra. Plus, you need to produce sensible, specific error messages. Many "clever" algorithmic tricks fail this test.

Now imagine a spreadsheet API that doesn't allow for any cyclic references ever. This still requires you to validate the entire resulting model, just to determine if 1 change is allowed. It still requires a general validate(Intent). In short, it means your POST and PUT request handlers need to potentially call all your business logic. That seems overkill, so the usual solution is bespoke validators for every single op. If the business logic changes, there is a risk your API will now accept invalid intent. And the app was not built for that. If you flip it around and assume intent will go out-of-bounds as a normal matter, then you never have this risk.
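A sketch of what "flipping it around" might look like, using a hypothetical minimal spreadsheet intent (only dependency references, no formulas): one general validate(Intent) finds cycle members, and the write handler stores the intent as-is and surfaces the problem instead of rejecting the request.

```ts
// Hypothetical minimal spreadsheet intent: each cell lists the cells it references.
type SheetIntent = Record<string, { formulaRefs: string[] }>;
type SheetState  = { cells: string[]; cyclic: Set<string> };

// One general validate(Intent): spider the dependency graph, flag cycle members.
const validate = (intent: SheetIntent): SheetState => {
  const cyclic = new Set<string>();
  const visiting = new Set<string>();
  const done = new Set<string>();
  const visit = (id: string, trail: string[]): void => {
    if (done.has(id)) return;
    if (visiting.has(id)) {
      // id is its own ancestor: everything from its first occurrence is a cycle.
      trail.slice(trail.indexOf(id)).forEach((c) => cyclic.add(c));
      return;
    }
    visiting.add(id);
    for (const ref of intent[id]?.formulaRefs ?? []) visit(ref, [...trail, id]);
    visiting.delete(id);
    done.add(id);
  };
  for (const id of Object.keys(intent)) visit(id, []);
  return { cells: Object.keys(intent), cyclic };
};

// The write path stores intent as-is and reports warnings, instead of rejecting it.
const putCell = (stored: SheetIntent, id: string, formulaRefs: string[]) => {
  const intent = { ...stored, [id]: { formulaRefs } };
  const { cyclic } = validate(intent);
  return { intent, warnings: [...cyclic].map((c) => `${c} is part of a cycle`) };
};
```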
You can write the validation in one place, and you reuse it for every change as a normal matter of data flow.

Note that this is not cowboy coding. Records and state should not get irreversibly corrupted, because you only ever use valid inputs in computations. If the system is multiplayer, distributed changes should still be well-ordered and/or convergent. But the data structures you're encoding should be, essentially, entirely liberal to your user's needs.

Consider git. Here, a "unit of intent" is just a diff applied to a known revision ID. When something's wrong with a merge, it doesn't crash, or panic, or refuse to work. It just enters a conflict state. This state is computed by merging two incompatible intents. It's a dirty state that can't be turned into a clean commit without human intervention. This means git must continue to work, because you need to use git to clean it up. So git is fully aware when a conflict is being resolved.

As a general rule, the cases where you actually need to forbid a mutation which satisfies all the type and access constraints are small. A good example is trying to move a folder inside itself: the file system has to remain a sensibly connected tree. Enforcing the uniqueness of names is similar, but also comes with a caution: falsehoods programmers believe about names. Adding (Copy) to a duplicate name is usually better than refusing to accept it, and most names in real life aren't unique at all. Having user-facing names actually requires creating tools and affordances for search, renaming references, resolving duplicates, and so on.

Even among front-end developers, few people actually grok this mental model of a user. It's why most React(-like) apps in the wild are spaghetti, and why most blog posts about React gripes continue to miss the bigger picture. Doing React (and UI) well requires you to unlearn old habits and actually design your types and data flow so it uses potentially invalid input as its single source of truth. That way, a one-way data flow can enforce the necessary constraints on the fly.

The way Stanley likes to encode and mutate his data is how programmers think about their own program: it should be bug-free and not crash. The mistake is to think that this should also apply to any sort of creative process that program is meant to enable. It would be like making an IDE that only allows you to save a file if the code compiles and passes all the tests.

Trigger vs Memo

Coding around intent is a very hard thing to teach, because it can seem overwhelming. But what's overwhelming is not doing this. It leads to codebases where every new feature makes ongoing development harder, because no part of the codebase is ever finished. You will sprinkle copies of your business logic all over the place, in the form of request validation, optimistic local updaters, and guess-based cache invalidation. If this is your baseline experience, your estimate of what is needed to pull this off will simply be wrong.

In the traditional MVC model, intent is only handled at the individual input widget or form level. e.g. While typing a number, the intermediate representation is a string. This may be empty, incomplete or not a number, but you temporarily allow that.

I've never seen people formally separate Intent from State in an entire front-end. Often their state is just an ad-hoc mix of both, where validation constraints are relaxed in the places where it was most needed. They might just duplicate certain fields to keep a validated and unvalidated variant side by side.
There is one common exception. In a React-like, when you do a useMemo with a derived computation of some state, this is actually a perfect fit. The eponymous useState actually describes Intent, not State, because the derived state is ephemeral. This is why so many devs get lost here.

```js
const state = useMemo(() => validate(intent), [intent]);
```

Their usual instinct is that every action that has knock-on effects should be immediately and fully realized, as part of one transaction. Only, they discover some of those knock-on effects need to be re-evaluated if certain circumstances change. Often to do so, they need to undo and remember what it was before. This is then triggered anew via a bespoke effect, which requires a custom trigger and mutation. If they'd instead deferred the computation, it could have auto-updated itself, and they would've still had the original data to work with.

e.g. In a WYSIWYG scenario, you often want to preview an operation as part of mouse hovering or dragging. It should look like the final result. You don't need to implement custom previewing and rewinding code for this. You just need the ability to layer on some additional ephemeral intent on top of the intent that is currently committed. Rewinding just means resetting that extra intent back to empty.

You can make this easy to use by treating previews as a special kind of transaction: now you can make preview states with the same code you use to apply the final change. You can also auto-tag the created objects as being preview-only, which is very useful. That is: you can auto-translate editing intent into preview intent automatically, by messing with the contents of a transaction. Sounds bad, is actually good.

The same applies to any other temporary state, for example, highlighting of elements. Instead of manually changing colors, and creating/deleting labels to pull this off, derive the resolved style just-in-time. This is vastly simpler than doing it all on 1 classic retained model. There, you run the risk of highlights incorrectly becoming sticky, or shapes reverting to the wrong style when un-highlighted. You can architect it so this is simply impossible.

The trigger vs memo problem also happens on the back-end, when you have derived collections. Each object of type A must have an associated type B, created on-demand for each A. What happens if you delete an A? Do you delete the B? Do you turn the B into a tombstone? What if the relationship is 1-to-N, do you need to garbage collect? If you create invisible objects behind the scenes as a user edits, and you never tell them, expect to see a giant mess as a result.

It's crazy how often I've heard engineers suggest a user should only be allowed to create something, but then never delete it, as a "solution" to this problem. Everyday undo/redo precludes it. Don't be ridiculous.

The problem is having an additional layer of bookkeeping you didn't need. The source of truth was collection A, but you created a permanent derived collection B. If you instead make B ephemeral, derived via a stateless computation, then the problem goes away. You can still associate data with B records, but you don't treat B as the authoritative source for itself. This is basically what a WeakMap is. In database land this can be realized with a materialized view, which can be incremental and subscribed to.

Taken to its extreme, this turns into event-based sourcing, which might seem like a panacea for this mindset. But in most cases, the latter is still a system by and for Stanley.
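Backing up a step, here is a minimal sketch of the ephemeral derived collection idea above: B is never stored, only derived from A, and any extra data hung off a B is keyed weakly so it disappears along with its A. All names here are illustrative.

```ts
type A = { id: string; value: number };

// B is a stateless derivation of A, recomputed on demand: there is nothing to migrate,
// tombstone or garbage collect.
const deriveB = (a: A) => ({ label: `#${a.id}`, doubled: a.value * 2 });

// Data associated with a B is keyed weakly off its A,
// so dropping an A lets everything downstream of it be collected.
const bAnnotations = new WeakMap<A, { note: string }>();

const annotate = (a: A, note: string) => bAnnotations.set(a, { note });
const render = (a: A) => ({ ...deriveB(a), ...(bAnnotations.get(a) ?? {}) });
```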
The event-based nature of those systems exists to support housekeeping tasks like migration, backup and recovery. Users are not supposed to be aware that this is happening. They do not have any view into the event log, and cannot branch and merge it. The exceptions are extremely rare. It's not a system for working with user intent, only for flattening it, because it's append-only. It has a lot of the necessary basic building blocks, but substitutes programmer intent for user intent.

What's most nefarious is that the resulting tech stacks are often quite big and intricate, involving job queues, multi-layered caches, distribution networks, and more. It's a bunch of stuff that Stanley can take joy and pride in, far away from users, with "hard" engineering challenges. Unlike all this *ugh* JavaScript, which is always broken and unreliable and uninteresting. Except it's only needed because Stanley only solved half the problem, badly.

Patch or Bust

When factored in from the start, it's actually quite practical to split Intent from State, and it has lots of benefits. Especially if State is just a more constrained version of the same data structures as Intent. This doesn't need to be fully global either, but it needs to encompass a meaningful document or workspace to be useful.

It does create an additional problem: you now have two kinds of data in circulation. If reading or writing requires you to be aware of both Intent and State, you've made your code more complicated and harder to reason about. More so, making a new Intent requires a copy of the old Intent, which you mutate or clone. But you want to avoid passing Intent around in general, because it's fishy data. It may have the right types, but the constraints and referential integrity aren't guaranteed. It's a magnet for the kind of bugs a type-checker won't catch.

I've published my common solution before: turn changes into first-class values, and make a generic update of type Update<T> be the basic unit of change. As a first approximation, consider a shallow merge {...value, ...update}. This allows you to make an updateIntent(update) function where update only specifies the fields that are changing. In other words, Update<Intent> looks just like Update<State> and can be derived 100% from State, without Intent. Only one place needs to have access to the old Intent, all other code can just call that. You can make an app intent-aware without complicating all the code.

If your state is cleaved along orthogonal lines, then this is all you need. i.e. If column and line are two separate fields, then you can selectively change only one of them. If they are stored as an XY tuple or vector, now you need to be able to describe a change that only affects either the X or Y component.

```js
const value = {
  hello: 'text',
  foo: { bar: 2, baz: 4 },
};

const update = {
  hello: 'world',
  foo: { baz: 50 },
};

expect(
  patch(value, update)
).toEqual({
  hello: 'world',
  foo: { bar: 2, baz: 50 },
});
```

So in practice I have a function patch(value, update) which implements a comprehensive superset of a deep recursive merge, with full immutability. It doesn't try to do anything fancy with arrays or strings, they're just treated as atomic values. But it allows for precise overriding of merging behavior at every level, as well as custom lambda-based updates. You can patch tuples by index, but this is risky for dynamic lists. So instead you can express e.g. "append item to list" without the entire list, as a lambda.

I've been using patch for years now, and the uses are myriad.
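The post doesn't include the implementation here, so this is only a minimal sketch in that spirit: plain objects merge recursively and immutably, arrays and scalars are replaced atomically, and a function value acts as a lambda update. The real patch described above supports more (per-level merge overrides, tuple patching), which is omitted.

```ts
type Update<T> = any; // simplified for the sketch

const isPlainObject = (x: any) =>
  x != null && typeof x === 'object' && !Array.isArray(x);

const patch = <T>(value: T, update: Update<T>): T => {
  // Lambda update: compute the new value from the old one.
  if (typeof update === 'function') return update(value);
  // Plain objects merge key by key, without mutating the original.
  if (isPlainObject(value) && isPlainObject(update)) {
    const out: any = { ...value };
    for (const k of Object.keys(update)) out[k] = patch((value as any)[k], update[k]);
    return out;
  }
  // Everything else (arrays, strings, numbers) is treated as an atomic value.
  return update as T;
};

// "Append item to list" expressed without shipping the entire list:
const addItem = { items: (xs: string[] = []) => [...xs, 'new'] };
```

The WYSIWYG preview from earlier then really is just patch(committed, ephemeral): the preview is one extra update layered on top, and rewinding is resetting that layer to empty.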
To overlay a set of overrides onto a base template, patch(base, overrides) is all you need. It's the most effective way I know to erase a metric ton of {...splats} and ?? defaultValues and != null from entire swathes of code. This is a real problem.

You could also view this as a "poor man's OT", with the main distinction being that a patch update only describes the new state, not the old state. Such updates are not reversible on their own. But they are far simpler to make and apply. It can still power a global undo/redo system, in combination with its complement diff(A, B): you can reverse an update by diffing in the opposite direction. This is an operation which is formalized and streamlined into revise(…), so that it retains the exact shape of the original update, and doesn't require B at all. The structure of the update is sufficient information: it too encodes some intent behind the change.

With patch you also have a natural way to work with changes and conflicts as values. The earlier WYSIWYG scenario is just patch(committed, ephemeral) with bells on.

The net result is that mutating my intent or state is as easy as doing a {...value, ...update} splat, but I'm not falsely incentivized to flatten my data structures. Instead it frees you up to think about what the most practical schema actually is from the data's point of view. This is driven by how the user wishes to edit it, because that's what you will connect it to. It makes you think about what a user's workspace actually is, and lets you align boundaries in UX and process with boundaries in data structure.

Remember: most classic "data structures" are not about the structure of data at all. They serve as acceleration tools to speed up specific operations you need on that data. Having the reads and writes drive the data design was always part of the job. What's weird is that people don't apply that idea end-to-end, from database to UI and back.

SQL tables are shaped the way they are because it enables complex filters and joins. However, I find this pretty counterproductive: it produces derived query results that are difficult to keep up to date on a client. They also don't look like any of the data structures I actually want to use in my code.

A Bike Shed of Schemas

This points to a very under-appreciated problem: it is completely pointless to argue about schemas and data types without citing specific domain logic and code that will be used to produce, edit and consume it. Because that code determines which structures you are incentivized to use, and which structures will require bespoke extra work.

From afar, column and line are just XY coordinates. Just use a 2-vector. But once you factor in the domain logic and etiquette, you realize that the horizontal and vertical directions have vastly different rules applied to them, and splitting might be better. Which one do you pick?

This applies to all data. Whether you should put items in a List<T> or a Map<K, V> largely depends on whether the consuming code will loop over it, or need random access. If an API only provides one, consumers will just build the missing Map or List as a first step. This is O(n log n) either way, because of sorting. The method you use to read or write your data shouldn't limit use of everyday structure. Not unless you have a very good reason. But this is exactly what happens. A lot of bad choices in data design come down to picking the "wrong" data type simply because the most appropriate one is inconvenient in some cases.
This then leads to Conway's law, where one team picks the types that are most convenient only for them. The other teams are stuck with it, and end up writing bidirectional conversion code around their part, which will never be removed. The software will now always have this shape, reflecting which concerns were considered essential. What color are your types?

```js
{
  order: [4, 11, 9, 5, 15, 43],
  values: {
    4: {...},
    5: {...},
    9: {...},
    11: {...},
    15: {...},
    43: {...},
  },
}
```

For List vs Map, you can have this particular cake and eat it too. Just provide a List<Id> for the order and a Map<Id, T> for the values. If you structure a list or tree this way, then you can do both iteration and ID-based traversal in the most natural and efficient way. Don't underestimate how convenient this can be.

This also has the benefit that "re-ordering items" and "editing items" are fully orthogonal operations. It decomposes the problem of "patching a list of objects" into "patching a list of IDs" and "patching N separate objects". It makes code for manipulating lists and trees universal (sketched below). It lets you decide on a case-by-case basis whether you need to garbage collect the map, or whether preserving unused records is actually desirable.

Limiting it to ordinary JSON or JS types, rather than going full-blown OT or CRDT, is a useful baseline. With sensible schema design, at ordinary editing rates, CRDTs are overkill compared to the ability to just replay edits, or notify conflicts. This only requires version numbers and retries. Users need those things anyway: just because a CRDT converges when two people edit, doesn't mean the result is what either person wants.

The only case where OTs/CRDTs are absolutely necessary is rich-text editing, and you need bespoke UI solutions for that anyway. For simple text fields, last-write-wins is perfectly fine, and also far superior to what 99% of RESTy APIs do.

A CRDT is just a mechanism that translates partially ordered intents into a single state. Like, it's cool that you can make CRDT counters and CRDT lists and whatnot... but each CRDT implements only one particular resolution strategy. If it doesn't produce the desired result, you've created invalid intent no user expected. With last-write-wins, you at least have something 1 user did intend. Whether this is actually destructive or corrective is mostly a matter of schema design and minimal surface area, not math.

The main thing that OTs and CRDTs do well is resolve edits on ordered sequences, like strings. If two users are typing text in the same doc, edits higher-up will shift edits down below, which means the indices change when rebased. But if you are editing structured data, you can avoid referring to indices entirely, and just use IDs instead. This sidesteps the issue, like splitting order from values. For the order, there is a simple solution: a map with a fractional index, effectively a dead-simple list CRDT. It just comes with some overhead.

Using a CRDT for string editing might not even be enough. Consider Google Docs-style comments anchored to that text: their indices also need to shift on every edit. Now you need a bespoke domain-aware CRDT. Or you work around it by injecting magic markers into the text. Either way, it seems non-trivial to decouple a CRDT from the specific target domain of the data inside. The constraints get mixed in.

If you ask me, this is why the field of real-time web apps is still in somewhat of a rut.
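Going back to the List<Id> + Map<Id, T> layout above: reordering and editing really do become independent operations. A minimal sketch, with a hypothetical Item shape:

```ts
type Id = number;
type Item = { label: string; done?: boolean };
type Collection = { order: Id[]; values: Record<Id, Item> };

// Reordering only touches the list of IDs...
const reorder = (c: Collection, order: Id[]): Collection => ({ ...c, order });

// ...while editing only touches one record in the map.
const editItem = (c: Collection, id: Id, update: Partial<Item>): Collection => ({
  ...c,
  values: { ...c.values, [id]: { ...c.values[id], ...update } },
});

// Iteration stays natural: map the order onto the values.
const toList = (c: Collection): Item[] => c.order.map((id) => c.values[id]);
```

Garbage collecting values no longer referenced by order is then a separate, optional pass, which is exactly the case-by-case choice mentioned above.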
It's mainly viewed as a high-end technical problem: how do we synchronize data structures over a P2P network without any data conflicts? What they should be asking is: what is the minimal amount of structure we need to reliably synchronize, so that users can have a shared workspace where intent is preserved, and conflicts are clearly signposted? And how should we design our schemas, so that our code can manipulate the data in a straightforward and reliable way? Fixing non-trivial user conflicts is simply not your job. Most SaaS out there doesn't need any of this technical complexity.

Consider that a good multiplayer app requires user presence and broadcast anyway. The simplest solution is just a persistent process on a single server coordinating this, one per live workspace. It's what most MMOs do. In fast-paced video games, this even involves lag compensation. Reliable ordering is not the big problem. The situations where this doesn't scale, or where you absolutely must be P2P, are a minority. If you run into them, you must be doing very well.

The solution that I've sketched out here is explicitly designed so it can comfortably be done by small teams, or even just 1 person. The (private) CAD app I showed glimpses of above is entirely built this way. It's patch all the way down and it's had undo/redo from day 1. It also has a developer mode where you can just edit the user-space part of the data model, and save/load it. When the in-house designers come to me with new UX requests, they often ask: "Is it possible to do ____?" The answer is never a laborious sigh from a front-end dev with too much on their plate. It's "sure, and we can do more."

If you're not actively aware that the design of schemas and code is tightly coupled, your codebase will explode, and the bulk of it will be glue. Much of it just serves to translate generalized intent into concrete state or commands. Arguments about schemas are usually just hidden debates about whose job it is to translate, split or join something. This isn't just an irrelevant matter of "wire formats" because changing the structure and format of data also changes how you address specific parts of it.

In an interactive UI, you also need a reverse path, to apply edits. What I hope you are starting to realize is that this is really just the forward path in reverse, on so many levels. The result of a basic query is just the ordered IDs of the records that it matched. A join returns a tuple of record IDs per row. If you pre-assemble the associated record data for me, you actually make my job as a front-end dev harder, because there are multiple forward paths for the exact same data, in subtly different forms. What I want is to query and mutate the same damn store you do, and be told when what changes. It's table-stakes now. With well-architected data, this can be wired up mostly automatically, without any scaffolding.

The implementations you encounter in the wild just obfuscate this, because they don't distinguish between the data store and the model it holds. The fact that the data store should not be corruptible, and should enforce permissions and quotas, is incorrectly extended to the entire model stored inside. But that model doesn't belong to Stanley, it belongs to the user. This is why desktop applications didn't have a "Data Export". It was just called Load and Save, and what you saved was the intent, in a file.
Having a universal query or update mechanism doesn't absolve you from thinking about this either, which is why I think the patch approach is so rare: it looks like cowboy coding if you don't have the right boundaries in place. Patch is mainly for user-space mutations, not kernel-space, a concept that applies to more than just OS kernels. User-space must be very forgiving.

If you avoid it, you end up with something like GraphQL, a good example of solving only half the problem badly. Its getter assembles data for consumption by laboriously repeating it in dozens of partial variations. And it turns the setter part into an unsavory mix of lasagna and spaghetti. No wonder, it was designed for a platform that owns and hoards all your data.

* * *

Viewed narrowly, Intent is just a useful concept to rethink how you enforce validation and constraints in a front-end app. Viewed broadly, it completely changes how you build back-ends and data flows to support that. It will also teach you how adding new aspects to your software can reduce complexity, not increase it, if done right.

A good metric is to judge implementation choices by how many other places of the code need to care about them. If a proposed change requires adjustments literally everywhere else, it's probably a bad idea, unless the net effect is to remove code rather than add it.

I believe reconcilers like React or tree-sitter are a major guide stone here. What they do is apply structure-preserving transforms on data structures, incrementally. They actually do the annoying part for you. I based Use.GPU on the same principles, and use it to drive CPU canvases too. The tree-based structure reflects that one function's state just might be the next function's intent, all the way down. This is a compelling argument that the data and the code should have roughly the same shape.

You will also conclude there is nothing more nefarious than a hard split between back-end and front-end. You know, coded by different people, where each side is only half-aware of the other's needs, but one sits squarely in front of the other. Well-intentioned guesses about what the other end needs will often be wrong. You will end up with data types and query models that cannot answer questions concisely and efficiently, and which must be babysat to not go stale.

In the last 20 years, little has changed here in the wild. On the back-end, it still looks mostly the same. Even when modern storage solutions are deployed, people end up putting SQL- and ORM-like layers on top, because that's what's familiar. The split between back-end and database has the exact same malaise. None of this work actually helps make the app more reliable, it's the opposite: every new feature makes ongoing development harder. Many "solutions" in this space are not solutions, they are copes.

Maybe we're overdue for a NoSQL revival, this time with a focus on practical schema design and mutation? SQL was designed to model administrative business processes, not live interaction. I happen to believe a front-end should sit next to the back-end, not in front of it, with only a thin proxy as a broker.

What I can tell you for sure is: it's so much better when intent is a first-class concept. You don't need nor want to treat user data as something to pussy-foot around, or handle like it's radioactive. You can manipulate and transport it without a care. You can build rich, comfy functionality on top. Once implemented, you may find yourself not touching your network code for a very long time.
It's the opposite of overwhelming, it's lovely. You can focus on building the tools your users need. This can pave the way for more advanced concepts like OT and CRDT, but will show you that neither of them is a substitute for getting your application fundamentals right.

In doing so, you reach a synthesis of Dijkstra and anti-Dijkstra: your program should be provably correct in its data flow, which means it can safely break in completely arbitrary ways. Because the I in UI meant "intent" all along.

More:
On Variance and Extensibility
Climbing Mount Effect
APIs are About Policy

a year ago 21 votes
Stable Fiddusion

Frequency-domain blue noise generator

In computer graphics, stochastic methods are so hot right now. All rendering turns into calculus, except you solve the integrals by numerically sampling them. As I showed with Teardown, this is all based on random noise, hidden with a ton of spatial and temporal smoothing. For this, you need a good source of high quality noise.

There have been a few interesting developments in this area, such as Alan Wolfe et al.'s Spatio-Temporal Blue Noise. This post is about how I designed noise in frequency space. I will cover:

- What is blue noise?
- Designing indigo noise
- How swap works in the frequency domain
- Heuristics and analysis to speed up search
- Implementing it in WebGPU

Along the way I will also show you some "street" DSP math. This illustrates how getting comfy in this requires you to develop deep intuition about complex numbers. But complex doesn't mean complicated. It can all be done on a paper napkin.

The WebGPU interface I built

What I'm going to make is this: If properly displayed, this image should look eerily even. But if your browser is rescaling it incorrectly, it may not be exactly right.

Colorless Blue Ideas

I will start by just recapping the essentials. If you're familiar, skip to the next section.

Ordinary random generators produce uniform white noise: every value is equally likely, and the average frequency spectrum is flat.

Time domain / Frequency domain

To a person, this doesn't actually seem fully 'random', because it has clusters and voids. Similarly, a uniformly random list of coin flips will still have long runs of heads or tails in it occasionally.

What a person would consider evenly random is usually blue noise: it prefers to alternate between heads and tails, and avoids long runs entirely. It is 'more random than random', biased towards the upper frequencies, i.e. the blue part of the spectrum.

Time domain / Frequency domain

Blue noise is great for e.g. dithering, because when viewed from afar, or blurred, it tends to disappear. With white noise, clumps remain after blurring:

Blurred white noise / Blurred blue noise

Blueness is a delicate property. If you have e.g. 3D blue noise in a volume XYZ, then a single 2D XY slice is not blue at all:

XYZ spectrum / XY slice / XY spectrum

The samples are only evenly distributed in 3D, i.e. when you consider each slice in front and behind it too. Blue noise being delicate means that nobody really knows of a way to generate it statelessly, i.e. as a pure function f(x,y,z). Algorithms to generate it must factor in the whole, as noise is only blue if every single sample is evenly spaced. You can make blue noise images that tile, and sample those, but the resulting repetition may be noticeable.

Because blue noise is constructed, you can make special variants:

- Uniform Blue Noise has a uniform distribution of values, with each value equally likely. An 8-bit 256x256 UBN image will have each unique byte appear exactly 256 times.
- Projective Blue Noise can be projected down, so that a 3D volume XYZ flattened into either XY, YZ or ZX is still blue in 2D, and same for X, Y and Z in 1D.
- Spatio-Temporal Blue Noise (STBN) is 3D blue noise created specifically for use in real-time rendering: every 2D slice XY is 2D blue noise, and every Z row is 1D blue noise. This means XZ or YZ slices of STBN are not blue.
Instead, it's designed so that when you average out all the XY slices over Z, the result is uniform gray, again without clusters or voids. This requires the noise in all the slices to perfectly complement each other, a bit like overlapping slices of translucent swiss cheese.

This is the sort of noise I want to generate.

Indigo STBN 64x64x16 / XYZ spectrum

Sleep Furiously

A blur filter's spectrum is the opposite of blue noise: it's concentrated in the lowest frequencies, with a bump in the middle. If you blur the noise, you multiply the two spectra. Very little is left: only the ring-shaped overlap, creating a band-pass area. This is why blue noise looks good when smoothed, and is used in rendering, with both spatial (2D) and temporal smoothing (1D) applied.

Blur filters can be designed. If a blur filter is perfectly low-pass, i.e. ~zero amplitude for all frequencies > $ f_{\rm{lowpass}} $, then nothing is left of the upper frequencies past a point. If the noise is shaped to minimize any overlap, then the result is actually noise free. The dark part of the noise spectrum should be large and pitch black. The spectrum shouldn't just be blue, it should be indigo.

When people say you can't design noise in frequency space, what they mean is that you can't merely apply an inverse FFT to a given target spectrum. The resulting noise is gaussian, not uniform. The missing ingredient is the phase: all the frequencies need to be precisely aligned to have the right value distribution. This is why you need a specialized algorithm. The STBN paper describes two: void-and-cluster, and swap. Both of these are driven by an energy function. It works in the spatial/time domain, based on the distances between pairs of samples. It uses a "fall-off parameter" sigma to control the effective radius of each sample, with a gaussian kernel.

STBN (Wolfe et al.)

The swap algorithm is trivially simple. It starts from white noise and shapes it (sketched below):

- Start with e.g. 256x256 pixels initialized with the bytes 0-255 repeated 256 times in order
- Permute all the pixels into ~white noise using a random order
- Now iterate: randomly try swapping two pixels, check if the result is "more blue"

This is guaranteed to preserve the uniform input distribution perfectly. The resulting noise patterns are blue, but they still have some noise in all the lower frequencies. The only blur filter that could get rid of it all, is one that blurs away all the signal too. My 'simple' fix is just to score swaps in the frequency domain instead.

If this seems too good to be true, you should know that a permutation search space is catastrophically huge. If any pixel can be swapped with any other pixel, the number of possible swaps at any given step is O(N²). In a 256x256 image, it's ~2 billion. The goal is to find a sequence of thousands, millions of random swaps, to turn the white noise into blue noise.

This is basically stochastic bootstrapping. It's the bulk of good old fashioned AI, using simple heuristics, queues and other tools to dig around large search spaces. If there are local minima in the way, you usually need more noise and simulated annealing to tunnel over those. Usually.

This setup is somewhat simplified by the fact that swaps are symmetric (i.e. (A,B) = (B,A)), but also that applying swaps S1 and S2 is the same as applying swaps S2 and S1 as long as they don't overlap.

Good Energy

Let's take it one hurdle at a time.
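Before getting to the energy itself, here is a minimal sketch of the swap skeleton from the three steps listed above, with the scoring function left abstract (every name here is illustrative; the score is the interesting part, and it's what the rest of the article is about):

```ts
type Score = (pixels: Uint8Array, width: number) => number;

// Values 0-255 repeated in order (a perfectly uniform histogram),
// then Fisher-Yates shuffled into ~white noise.
const makeWhiteNoise = (width: number): Uint8Array => {
  const pixels = new Uint8Array(width * width).map((_, i) => i % 256);
  for (let i = pixels.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [pixels[i], pixels[j]] = [pixels[j], pixels[i]];
  }
  return pixels;
};

// Try random swaps, keep only the ones that improve the score (lower = "more blue").
const optimizeSwaps = (pixels: Uint8Array, width: number, score: Score, steps: number) => {
  let best = score(pixels, width);
  for (let step = 0; step < steps; step++) {
    const a = Math.floor(Math.random() * pixels.length);
    const b = Math.floor(Math.random() * pixels.length);
    if (a === b) continue;
    [pixels[a], pixels[b]] = [pixels[b], pixels[a]];
    const s = score(pixels, width);
    if (s < best) best = s;
    else [pixels[a], pixels[b]] = [pixels[b], pixels[a]]; // revert bad swaps
  }
  return pixels;
};
```

Because only swaps are ever applied, the uniform histogram of the input is preserved exactly, no matter what the score does.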
It's not obvious that you can change a signal's spectrum just by re-ordering its values over space/time, but this is easy to illustrate.

Take any finite 1D signal, and order its values from lowest to highest. You will get some kind of ramp, approximating a sawtooth wave. This concentrates most of the energy in the first non-DC frequency. Now split the odd values from the even values, and concatenate them. You will now have two ramps, with twice the frequency. You can repeat this to double the frequency all the way up to Nyquist. So you have a lot of design freedom to transfer energy from one frequency to another.

In fact the Fourier transform has the property that energy in the time and frequency domain is conserved: this means the sum of $ |\mathrm{spectrum}_k|^2 $ remains constant over pixel swaps.

We then design a target curve, e.g. a high-pass cosine. This can be fit and compared to the current noise spectrum to get the error to minimize. However, I don't measure the error in energy $ |\mathrm{spectrum}_k|^2 $ but in amplitude $ |\mathrm{spectrum}_k| $. I normalize the spectrum and the target into distributions, and take the L2 norm of the difference, i.e. a sqrt of the sum of squared errors. This keeps the math simple, but also helps target the noise in the ~zero part of the spectrum. Otherwise, deviations near zero would count for less than deviations around one.

Go Banana

So I tried it. With a lot of patience, you can make 2D blue noise images up to 256x256 on a single thread. A naive random search with an FFT for every iteration is not fast, but computers are. Making a 64x64x16 with this is possible, but it's certainly like watching paint dry. It's the same number of pixels as 256x256, but with an extra dimension worth of FFTs that need to be churned. Still, it works and you can also make 3D STBN with the spatial and temporal curves controlled independently:

Converged spectra

I built command-line scripts for this, with a bunch of quality of life things. If you're going to sit around waiting for numbers to go by, you have a lot of time for this...

- Save and load byte/float-exact state to a .png, save parameters to .json
- Save a bunch of debug viz as extra .pngs with every snapshot
- Auto-save state periodically during runs
- Measure and show rate of convergence every N seconds, with smoothing
- Validate the histogram before saving to detect bugs and avoid corrupting expensive runs

I could fire up a couple of workers to start churning, while continuing to develop the code liberally with new variations. I could also stop and restart workers with new heuristics, continuing where it left off.

Protip: you can write C in JavaScript

Drunk with power, I tried various sizes and curves, which created... okay noise. Each has the exact same uniform distribution so it's difficult to judge other than comparing to other output, or earlier versions of itself. To address this, I visualized the blurred result, using a [1 4 6 4 1] kernel as my base line. After adjusting levels, structure was visible:

Semi-converged / Blurred

The resulting spectra show what's actually going on:

Semi-converged / Blurred

The main component is the expected ring of bandpass noise, the 2D equivalent of ringing. But in between there is also a ton of redder noise, in the lower frequencies, which all remains after a blur. This noise is as strong as the ring. So while it's easy to make a blue-ish noise pattern that looks okay at first glance, there is a vast gap between having a noise floor and not having one.
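A sketch of that loss, under the stated approach: take the FFT magnitudes, normalize both spectrum and target (here by their L2 norm; the post just says "into distributions"), and take the L2 norm of their difference. The fftMagnitudes2D helper is assumed, not a real library call; this could serve as the score passed into the swap skeleton sketched earlier.

```ts
// Assumed helper: per-bin magnitudes |spectrum_k| of a width×width image.
declare function fftMagnitudes2D(pixels: Uint8Array, width: number): Float64Array;

// Normalize a vector of magnitudes to unit L2 length.
const normalize = (v: Float64Array): Float64Array => {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0)) || 1;
  return v.map((x) => x / norm);
};

// L2 distance between the normalized spectrum and the normalized target curve.
const spectralLoss = (pixels: Uint8Array, width: number, target: Float64Array) => {
  const spectrum = normalize(fftMagnitudes2D(pixels, width));
  const goal = normalize(target);
  let sum = 0;
  for (let k = 0; k < spectrum.length; k++) {
    const d = spectrum[k] - goal[k];
    sum += d * d;
  }
  return Math.sqrt(sum);
};
```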
So I kept iterating: It takes a very long time, but if you wait, all those little specks will slowly twinkle out, until quantization starts to get in the way, with a loss of about 1/255 per pixel (0.0039).

Semi converged / Fully converged

The effect on the blurred output is remarkable. All the large scale structure disappears, as you'd expect from spectra, leaving only the bandpass ringing. That goes away with a strong enough blur, or a large enough dark zone. The visual difference between the two is slight, but nevertheless, the difference is significant and pervasive when amplified:

Semi converged / Fully converged / Difference / Final spectrum

I tried a few indigo noise curves, with different % of the curve zero. The resulting noise is all extremely equal, even after a blur and amplify. The only visible noise left is bandpass, and the noise floor is so low it may as well not be there.

As you make the black exclusion zone bigger, the noise gets concentrated in the edges and corners. It becomes a bit more linear and squarish, a contender for violet noise. This is basically a smooth evolution towards a pure pixel checkerboard in the limit. Using more than 50% zero seems inadvisable for this reason:

Time domain / Frequency domain

At this point the idea was validated, but it was dog slow. Can it be done faster?

Spatially Sparse

An FFT scales like O(N log N). When you are dealing with images and volumes, that N is actually an N² or N³ in practice. The early phase of the search is the easiest to speed up, because you can find a good swap for any pixel with barely any tries. There is no point in being clever. Each sub-region is very non-uniform, and its spectrum nearly white. Placing pixels roughly by the right location is plenty good enough.

You might try splitting a large volume into separate blocks, and optimize each block separately. That wouldn't work, because all the boundaries remain fully white. Overlapping doesn't fix this, because they will actively create new seams. I tried it.

What does work is a windowed scoring strategy. It avoids a full FFT for the entire volume, and only scores each NxN or NxNxN region around each swapped point, with N-sized FFTs in each dimension. This is enormously faster and can rapidly dissolve larger volumes of white noise into approximate blue even with e.g. N = 8 or N = 16. Eventually it stops improving and you need to bump the range or switch to a global optimizer. Here's the progression from white noise, to when sparse 16x16 gives up, followed by some additional 64x64:

Time domain / Frequency domain
Time domain / Frequency domain
Time domain / Frequency domain

A naive solution does not work well however. This is because the spectrum of a subregion does not match the spectrum of the whole. The Fourier transform assumes each signal is periodic. If you take a random subregion and forcibly repeat it, its new spectrum will have aliasing artifacts. This would cause you to consistently misjudge swaps. To fix this, you need to window the signal in the space/time-domain. This forces it to start and end at 0, and eliminates the effect of non-matching boundaries on the scoring. I used a smoothStep window because it's cheap, and haven't needed to try anything else:

16x16 windowed data

This still alters the spectrum, but in a predictable way. A time-domain window is a convolution in the frequency domain. You don't actually have a choice here: not using a window is mathematically equivalent to using a very bad window.
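Here is one plausible way to build such a smoothstep-shaped window for a local N×N tile, applied separably in x and y before the local FFT. The exact construction in the post isn't specified, so treat this as an assumption; the tile extraction and FFT are outside the snippet.

```ts
// smoothstep(0..1): 3t² - 2t³, rising smoothly from 0 to 1.
const smoothstep = (t: number) => t * t * (3 - 2 * t);

// A window that is 0 at both edges and 1 in the middle of an n-wide region.
const windowWeight = (i: number, n: number) => {
  const t = (i + 0.5) / n;                      // 0..1 across the region
  return smoothstep(1 - Math.abs(2 * t - 1));   // ramp up, then back down
};

// Apply the window separably to an n×n tile of samples.
const windowTile = (tile: Float64Array, n: number): Float64Array => {
  const out = new Float64Array(n * n);
  for (let y = 0; y < n; y++)
    for (let x = 0; x < n; x++)
      out[y * n + x] = tile[y * n + x] * windowWeight(x, n) * windowWeight(y, n);
  return out;
};
```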
It's effectively a box filter covering the cut-out area inside the larger volume, which causes spectral ringing. The effect of the chosen window on the target spectrum can be modeled via convolution of their spectral magnitudes:

$$ \mathbf{target}' = |\mathbf{target}| \circledast |\mathcal{F}(\mathbf{window})| $$

This can be done via the time domain as:

$$ \mathbf{target}' = \mathcal{F}(\mathcal{F}^{-1}(|\mathbf{target}|) \cdot \mathcal{F}^{-1}(|\mathcal{F}(\mathbf{window})|)) $$

Note that the forward/inverse Fourier pairs are not redundant, as there is an absolute value operator in between. This discards the phase component of the window, which is irrelevant.

Curiously, while it is important to window the noise data, it isn't very important to window the target. The effect of the spectral convolution is small, amounting to a small blur, and the extra error is random and absorbed by the smooth scoring function. The resulting local loss tracks the global loss function pretty closely.

It massively speeds up the search in larger volumes, because the large FFT is the bottleneck. But it stops working well before anything resembling convergence in the frequency-domain. It does not make true blue noise, only a lookalike. The overall problem is still that we can't tell good swaps from bad swaps without trying them and verifying.

Sleight of Frequency

So, let's characterize the effect of a pixel swap. Given a signal [A B C D E F G H], let's swap C and F. Swapping the two values is the same as adding F - C = Δ to C, and subtracting that same delta from F. That is, you add the vector:

V = [0 0 Δ 0 0 -Δ 0 0]

This remains true if you apply a Fourier transform and do it in the frequency domain. To best understand this, you need to develop some intuition around FFTs of Dirac deltas.

Consider the short filter kernel [1 4 6 4 1]. It's a little known fact, but you can actually sight-read its frequency spectrum directly off the coefficients, because the filter is symmetrical. I will teach you. The extremes are easy:

- The DC amplitude is the sum 1 + 4 + 6 + 4 + 1 = 16
- The Nyquist amplitude is the modulated sum 1 - 4 + 6 - 4 + 1 = 0

So we already know it's an 'ideal' lowpass filter, which reduces the Nyquist signal +1, -1, +1, -1, ... to exactly zero. It also has 16x DC gain.

Now all the other frequencies. First, remember the Fourier transform works in symmetric ways. Every statement "____ in the time domain = ____ in the frequency domain" is still true if you swap the words time and frequency. This has led to the grotesquely named sub-field of cepstral processing where you have quefrencies and vawes, and it kinda feels like you're having a stroke. The cepstral convolution filter from earlier is called a lifter. Usually cepstral processing is applied to the real magnitude of the spectrum, i.e. $ |\mathrm{spectrum}| $, instead of its true complex value. This is a coward move.

So, decompose the kernel into symmetric pairs:

[· · 6 · ·]
[· 4 · 4 ·]
[1 · · · 1]

All but the first row is a pair of real Dirac deltas in the time domain. Such a row is normally what you get when you Fourier transform a cosine, i.e.:

$$ \cos \omega = \frac{\mathrm{e}^{i\omega} + \mathrm{e}^{-i\omega}}{2} $$

A cosine in time is a pair of Dirac deltas in the frequency domain. The phase of a (real) cosine is zero, so both its deltas are real. Now flip it around. The Fourier transform of a pair [x 0 0 ... 0 0 x] is a real cosine in frequency space. Must be true. Each new pair adds a new higher cosine on top of the existing spectrum.
For the central [... 0 0 x 0 0 ...] we add a DC term. It's just a Fourier transform in the other direction:

|FFT([1 4 6 4 1])| =
  [· · 6 · ·]  =>  6
  [· 4 · 4 ·]  =>  8 cos(ɷ)
  [1 · · · 1]  =>  2 cos(2ɷ)
               =   6 + 8 cos(ɷ) + 2 cos(2ɷ)

Normally you have to use the z-transform to analyze a digital filter. But the above is a shortcut. FFTs and inverse FFTs do have opposite phase, but that doesn't matter here because cos(ɷ) = cos(-ɷ).

This works for the symmetric-even case too: you offset the frequencies by half a band, ɷ/2, and there is no DC term in the middle:

|FFT([1 3 3 1])| =
  [· 3 3 ·]  =>  6 cos(ɷ/2)
  [1 · · 1]  =>  2 cos(3ɷ/2)
             =   6 cos(ɷ/2) + 2 cos(3ɷ/2)

So, symmetric filters have spectra that are made up of regular cosines. Now you know.

What about the delta vector [0 0 Δ 0 0 -Δ 0 0]? It's not symmetric, so we have to decompose it:

V1 = [· · · · · Δ · ·]
V2 = [· · Δ · · · · ·]
V  = V2 - V1

Each is now an unpaired Dirac delta. Each vector's Fourier transform is a complex wave $ Δ \cdot \mathrm{e}^{-i \omega k} $ in the frequency domain (the k'th quefrency). It lacks the usual complementary oppositely twisting wave $ Δ \cdot \mathrm{e}^{i \omega k} $, so it's not real-valued. It has constant magnitude Δ and varying phase:

FFT(V1) = [ Δ Δ Δ Δ Δ Δ Δ Δ ]
FFT(V2) = [ Δ Δ Δ Δ Δ Δ Δ Δ ]

These are vawes.

The effect of a swap is still just to add FFT(V), aka FFT(V2) - FFT(V1), to the (complex) spectrum. The effect is to transfer energy between all the bands simultaneously. Hence, FFT(V1) and FFT(V2) function as a source and destination mask for the transfer.

However, 'mask' is the wrong word, because the magnitude of $ \mathrm{e}^{i \omega k} $ is always 1. It doesn't have varying amplitude, only varying phase. -FFT(V1) and FFT(V2) define the complex direction in which to add/subtract energy. When added together their phases interfere constructively or destructively, resulting in an amplitude that varies between 0 and 2Δ: an actual mask. The resulting phase will be halfway between the two, as it's the sum of two equal-length complex numbers.

FFT(V) = [ · Δ Δ Δ Δ Δ Δ Δ ]

For any given pixel A and its delta FFT(V1), it can pair up with other pixels B to form N-1 different interference masks FFT(V2) - FFT(V1). There are N(N-1)/2 unique interference masks, if you account for (A,B) (B,A) symmetry.

Worth pointing out, the FFT of the first index:

FFT([Δ 0 0 0 0 0 0 0]) = [Δ Δ Δ Δ Δ Δ Δ Δ]

This is the DC quefrency, and the Fourier symmetry continues to work. Moving values in time causes the vawe's quefrency to change in the frequency domain. This is the upside-down version of how moving energy to another frequency band causes the wave's frequency to change in the time domain.

What's the Gradient, Kenneth?

Using vectors as masks... shifting energy in directions... this means gradient descent, no?

Well. It's indeed possible to calculate the derivative of your loss function as a function of input pixel brightness, with the usual bag of automatic differentiation/backprop tricks. You can also do it numerically. But, this doesn't help you directly because the only way you can act on that per-pixel gradient is by swapping a pair of pixels. You need to find two quefrencies FFT(V1) and FFT(V2) which interfere in exactly the right way to decrease the loss function across all bad frequencies simultaneously, while leaving the good ones untouched. Even if the gradient were to help you pick a good starting pixel, that still leaves the problem of finding a good partner.
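To make the interference mask concrete, here is a tiny 1D sketch of the statement above: swapping x[k1] and x[k2] changes each DFT bin by Δ·(e^{-iωk1} − e^{-iωk2}), with Δ = x[k2] − x[k1], and the magnitude of that change sweeps between 0 (at DC) and 2Δ depending on how the two phases line up. This is purely illustrative, not the search code.

```ts
type Complex = { re: number; im: number };

// DFT contribution of a single impulse of height `amp` at position `pos`, at bin `u`.
const impulseBin = (amp: number, pos: number, u: number, n: number): Complex => {
  const angle = (-2 * Math.PI * u * pos) / n;
  return { re: amp * Math.cos(angle), im: amp * Math.sin(angle) };
};

// Per-bin change to the DFT when x[k1] and x[k2] are swapped:
// add Δ at k1 and subtract it at k2, i.e. two opposite "vawes" interfering.
const swapDelta = (x: number[], k1: number, k2: number, u: number): Complex => {
  const n = x.length;
  const delta = x[k2] - x[k1];
  const a = impulseBin(delta, k1, u, n);
  const b = impulseBin(delta, k2, u, n);
  return { re: a.re - b.re, im: a.im - b.im };
};

// The interference mask: |swapDelta| is 0 at u = 0 and at most 2·|Δ| elsewhere.
const maskMagnitude = (x: number[], k1: number, k2: number, u: number) => {
  const { re, im } = swapDelta(x, k1, k2, u);
  return Math.hypot(re, im);
};
```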
There are still O(N²) possible pairs to choose from, and the entire spectrum changes a little bit on every swap. Which means new FFTs to analyze it.

Random greedy search is actually tricky to beat in practice. Whatever extra work you spend on getting better samples translates into fewer samples tried per second. e.g. Taking a best-of-3 approach is worse than just trying 3 swaps in a row. Swaps are almost always orthogonal.

But random() still samples unevenly because it's white noise. If only we had.... oh wait. Indeed if you already have blue noise of the right size, you can use that to mildly speed up the search for more. Use it as a random permutation to drive sampling, with some inevitable whitening over time to keep it fresh. You can't however use the noise you're generating to accelerate its own search, because the two are highly correlated.

What's really going on is all a consequence of the loss function. Given any particular frequency band, the loss function is only affected when its magnitude changes. Its phase can change arbitrarily, rolling around without friction. The complex gradient must point in the radial direction. In the tangential direction, the partial derivative is zero.

The value of a given interference mask FFT(V1) - FFT(V2) for a given frequency is also complex-valued. It can be projected onto the current phase, and split into its radial and tangential component with a dot product.

The interference mask has a dual action. As we saw, its magnitude varies between 0 and 2Δ, as a function of the two indices k1 and k2. This creates a window that is independent of the specific state or data. It defines a smooth 'hash' from the interference of two quefrency bands. But its phase adds an additional selection effect: whether the interference in the mask is aligned with the current band's phase: this determines the split between radial and tangential. This defines a smooth phase 'hash' on top. It cycles at the average of the two quefrencies, i.e. a different, third one.

Energy is only added/subtracted if both hashes are non-zero. If the phase hash is zero, the frequency band only turns. This does not affect loss, but changes how each mask will affect it in the future. This then determines how it is coupled to other bands when you perform a particular swap. Note that this is only true differentially: for a finite swap, the curvature of the complex domain comes into play. The loss function is actually a hyper-cylindrical skate bowl you can ride around. Just the movement of all the bands is tied together.

Frequency bands with significant error may 'random walk' freely clockwise or counterclockwise when subjected to swaps. A band can therefore drift until it gets a turn where its phase is in alignment with enough similar bands, where the swap makes them all descend along the local gradient, enough to counter any negative effects elsewhere.

In the time domain, each frequency band is a wave that oscillates between -1...1: it 'owns' some of the value of each pixel, but there are places where its weight is ~zero (the knots). So when a band shifts phase, it changes how much of the energy of each pixel it 'owns'. This allows each band to 'scan' different parts of the noise in the time domain. In order to fix a particular peak or dent in the frequency spectrum, the search must rotate that band's phase so it strongly owns any defect in the noise, and then perform a swap to fix that defect.

Thus, my mental model of this is not actually disconnected pixel swapping.
It's more like one of those Myst puzzles where flipping a switch flips some of the neighbors too. You press one pair of buttons at a time. It's a giant haunted dimmer switch. We're dealing with complex amplitudes, not real values, so the light also has a color. Mechanically it's like a slot machine, with dials that can rotate to display different sides. The cherries and bells are the color: they determine how the light gets brighter or darker. If a dial is set just right, you can use it as a /dev/null to 'dump' changes. That's what theory predicts, but does it work? Well, here is a (blurred noise) spectrum being late-optimized, semi-converged. The search is trying to eliminate the left-over lower frequency noise in the middle. Here's the phase difference from the late stages of search, each a good swap. Left to right shows 4 different value scales: x2, x16, x256, x4096. At first it looks like just a few phases are changing, but amplification reveals it's the opposite. There are several plateaus. Strongest are the bands being actively modified. Then there's the circular error area around it, where other bands are still swirling into phase. Then there's a sharp drop-off to a much weaker noise floor, present everywhere. These are the bands that are already converged. Compare to a random bad swap, at the same value scales: now there is strong noise all over the center, and the loss immediately gets worse, as a bunch of amplitudes start shifting in the wrong direction randomly. So it's true. Applying the swap algorithm with a spectral target naturally cycles through focusing on different parts of the target spectrum as it makes progress. This information is positionally encoded in the phases of the bands and can be 'queried' by attempting a swap. This means the constraint of a fixed target spectrum is actually a constantly moving target in the complex domain. Frequency bands that reach the target are locked in. Neither their magnitude nor phase changes in aggregate. The random walks of such bands must have no DC component... they must be complex-valued blue noise with a tiny amplitude. Knowing this doesn't help directly, but it does explain why the search is so hard. Because the interference masks function like hashes, there is no simple pattern to how positions map to errors in the spectrum. And once you get close to the target, finding new good swaps is equivalent to digging out information encoded deep in the phase domain, with O(N²) interference masks to choose from. Gradient Sampling As I was trying to optimize for evenness after blur, it occurred to me to simply try selecting bright or dark spots in the blurred after-image. This is the situation where frequency bands are in coupled alignment: the error in the spectrum has a relatively concentrated footprint in the time domain. But, this heuristic merely picks out good swaps that are already 'lined up' so to speak. It only works as a low-hanging fruit sampler, with rapidly diminishing returns. Next I used the gradient in the frequency domain. The gradient points towards increasing loss, which is the sum of squared distance $ (…)^2 $.
So the slope is $ 2(…) $, proportional to distance to the goal: $$ |\mathrm{gradient}_k| = 2 \cdot \left( \frac{|\mathrm{spectrum}_k|}{||\mathbf{spectrum}||} - \frac{\mathrm{target}_k}{||\mathbf{target}||} \right) $$ It's radial, so its phase matches the spectrum itself: $$ \mathrm{gradient}_k = \mathrm{|gradient_k|} \cdot \left(1 ∠ \mathrm{arg}(\mathrm{spectrum}_k) \right) $$ Eagle-eyed readers may notice the sqrt part of the L2 norm is missing here. It's only there for normalization, and in fact, you generally want a gradient that decreases the closer you get to the target. It acts as a natural stabilizer, forming a convex optimization problem. You can transport this gradient backwards by applying an inverse FFT. Usually derivatives and FFTs don't commute, but that's only when you are deriving in the same dimension as the FFT. The partial derivative here is neither over time nor frequency, but by signal value. The resulting time-domain gradient tells you how fast the (squared) loss would change if a given pixel changed. The sign tells you whether it needs to become lighter or darker. In theory, a pixel with a large gradient can enable larger score improvements per step. It says little about what's a suitable pixel to pair with though. You can infer that a pixel needs to be paired with one that is brighter or darker, but not how much. The gradient only applies differentially. It involves two pixels, so it will cause interference between the two deltas, and also with the signal's own phase. The time-domain gradient does change slowly after every swap—mainly the swapping pixels—so this only needs to add an extra IFFT every N swap attempts, reusing it in between. I tried this in two ways. One was to bias random sampling towards points with the largest gradients. This barely did anything, when applied to one or both pixels. Then I tried going down the list in order, and this worked better. I tried a bunch of heuristics here, like adding a retry until paired, and a 'dud' tracker to reject known unpairable pixels. It did lead to some minor gains in successful sample selection. But beating random was still not a sure bet in all cases, because it comes at the cost of ordering and tracking all pixels to sample them. All in all, it was quite mystifying. Pair Analysis Hence I analyzed all possible swaps (A,B) inside one 64x64 image at different stages of convergence, for 1024 pixels A (25% of total). The result was quite illuminating. There are 2 indicators of a pixel's suitability for swapping: % of all possible swaps (A,_) that are good score improvement of best possible swap (A,B) They are highly correlated, and you can take the geometric average to get a single quality score to order by: The curve shows that the best possible candidates are rare, with a sharp drop-off at the start. Here the average candidate is ~1/3rd as good as the best, though every pixel is pairable. This represents the typical situation when you have unconverged blue-ish noise. Order all pixels by their (signed) gradient, and plot the quality: The distribution seems biased towards the ends. A larger absolute gradient at A can indeed lead to both better scores and higher % of good swaps. Notice that it's also noisier at the ends, where it dips below the middle. If you order pixels by their quality, and then plot the gradient, you see: Selecting for large gradient at A will select both the best and the worst possible pixels A. 
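As a rough sketch of that transport step (my own illustration, not the article's code), you can build the radial gradient per band and bring it back to the time domain with a naive inverse DFT. The normalization follows the formula above; the function names are invented.

type Complex = { re: number; im: number };

// Radial gradient in the frequency domain: magnitude from the formula above,
// phase copied from the spectrum itself.
const spectralGradient = (spectrum: Complex[], target: number[]): Complex[] => {
  const mags = spectrum.map(({ re, im }) => Math.hypot(re, im));
  const normS = Math.hypot(...mags);
  const normT = Math.hypot(...target);
  return spectrum.map(({ re, im }, k) => {
    const g = 2 * (mags[k] / normS - target[k] / normT);
    const scale = mags[k] > 0 ? g / mags[k] : 0; // keep the band's phase
    return { re: re * scale, im: im * scale };
  });
};

// Real part of a naive inverse DFT: the per-pixel gradient of the loss.
// Its sign says whether a pixel should get lighter or darker.
const timeGradient = (grad: Complex[]): number[] => {
  const N = grad.length;
  return Array.from({ length: N }, (_, n) => {
    let sum = 0;
    for (let k = 0; k < N; k++) {
      const phase = (2 * Math.PI * k * n) / N;
      sum += grad[k].re * Math.cos(phase) - grad[k].im * Math.sin(phase);
    }
    return sum / N;
  });
};

Sorting pixels by the size of this per-pixel gradient surfaces both the very best and the very worst candidates, which is exactly the effect described above.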
This implies that there are pixels in the noise that are very significant, but are nevertheless currently 'unfixable'. This corresponds to the 'focus' described earlier. By drawing from the 'top', I was mining the imbalance between the good/left and bad/right distribution. Selecting for a vanishing gradient would instead select the average-to-bad pixels A. I investigated one instance of each: very good, average or very bad pixel A. I tried every possible swap (A, B). You can plot a curve of the potential score improvement: The three scenarios have similar curves, with the bulk of swaps being negative. Only a tiny bit of the curve is sticking out positive, even in the best case. The potential benefit of a good swap is dwarfed by the potential harm of bad swaps. The main difference is just how many positive swaps there are, if any. So let's focus on the positive case, where you can see best. You can order by score, and plot the gradient of all the pixels B, to see another correlation. It looks kinda promising. Here the sign matters, with left and right being different. If the gradient of pixel A is the opposite sign, then this graph is mirrored. But if you order by (signed) gradient and plot the score, you see the real problem, caused by the noise: The good samples are mixed freely among the bad ones, with only a very weak trend downward. This explains why sampling improvements based purely on gradient for pixel B are impossible. You can see what's going on if you plot Δv, the difference in value between A and B: For a given pixel A, all the good swaps have a similar value for B, which is not unexpected. Its mean is the ideal value for A, but there is a lot of variance. In this case pixel A is nearly white, so it is brighter than almost every other pixel B. If you now plot Δv * -gradient, you see a clue on the left: Almost all of the successful swaps have a small but positive value. This represents what we already knew: the gradient's sign tells you if a pixel should be brighter or darker. If Δv has the opposite sign, the chances of a successful swap are slim. Ideally both pixels 'face' the right way, so the swap is beneficial on both ends. But only the combined effect on the loss matters: i.e. Δv * Δgradient < 0. It's only true differentially so it can misfire. But compared to blind sampling of pairs, it's easily 5-10x better and faster, racing towards the tougher parts of the search. What's more... while this test is just binary, I found that any effort spent on trying to further prioritize swaps by the magnitude of the gradient is entirely wasted. Maximizing Δv * Δgradient by repeated sampling is counterproductive, because it selects more bad candidates on the right. Minimizing Δv * Δgradient creates more successful swaps on the left, but lowers the average improvement per step so the convergence is net slower. Anything more sophisticated incurs too much computation to be worth it. It does have a limit. This is what it looks like when an image is practically fully converged: Eventually you reach the point where there are only a handful of swaps with any real benefit, while the rest is just shaving off a few bits of loss at a time. It devolves back to pure random selection, only skipping the coin flip for the gradient. It is likely that more targeted heuristics can still work here. The gradient also works in the early stages. As it barely changes over successive swaps, this leads to a different kind of sparse mode. 
Instead of scoring only a subset of pixels, simply score multiple swaps as a group over time, without re-scoring intermediate states. This lowers the success rate roughly by a power (e.g. 0.8 -> 0.64), but cuts the number of FFTs by a constant factor (e.g. 1/2). Early on this trade-off can be worth it. Even faster: don't score steps at all. In the very early stage, you can easily get up to 80-90% successful swaps just by filtering on values and gradients. If you just swap a bunch in a row, there is a very good chance you will still end up better than before. It works better than sparse scoring: using the gradient of your true objective approximately works better than using an approximate objective exactly. The latter will miss the true goal by design, while the former continually re-aims itself to the destination despite inaccuracy. Obviously you can mix and match techniques, and gradient + sparse is actually a winning combo. I've only scratched the surface here. Warp Speed Time to address the elephant in the room. If the main bottleneck is an FFT, wouldn't this work better on a GPU? The answer to that is an unsurprising yes, at least for large sizes where the overhead of async dispatch is negligible. However, it would have been endlessly more cumbersome to discover all of the above based on a GPU implementation, where I can't just log intermediate values to a console. After checking everything, I pulled out my bag of tricks and ported it to Use.GPU. As a result, the algorithm runs entirely on the GPU, and provides live visualization of the entire process. It requires a WebGPU-enabled browser, which in practice means Chrome on Windows or Mac, or a dev build elsewhere. I haven't particularly optimized this—the FFT is vanilla textbook—but it works. It provides an easy ~8x speed up on an M1 Mac on beefier target sizes. With a desktop GPU, 128x128x32 and larger become very feasible. It lacks a few goodies from the scripts, and only does gradient + optional sparse. You can however freely exchange PNGs between the CPU and GPU version via drag and drop, as long as the settings match. Layout components Compute components Worth pointing out: this visualization is built using Use.GPU's HTML-like layout system. I can put div-like blocks inside a flex box wrapper, and put text beside it... while at the same time using raw WGSL shaders as the contents of those divs. These visualization shaders sample and colorize the algorithm state on the fly, with no CPU-side involvement other than a static dispatch. The only GPU -> CPU readback is for the stats in the corner, which are classic React and real HTML, along with the rest of the controls. I can then build an <FFT> component and drop it inside an async <ComputeLoop>, and it does exactly what it should. The rest is just a handful of <Dispatch> elements and the ordinary headache of writing compute shaders. <Suspense> ensures all the shaders are compiled before dispatching. While running, the bulk of the tree is inert, with only a handful of reducers triggering on a loop, causing a mere 7 live components to update per frame. The compute dispatch fights with the normal rendering for GPU resources, so there is an auto-batching mechanism that aims for approximately 30-60 FPS. The display is fully anti-aliased, including the pixelized data. I'm using the usual per-pixel SDF trickery to do this... it's applied as a generic wrapper shader for any UV-based sampler. 
It's a good showcase that Use.GPU really is React-for-GPUs with less hassle, but still with all the goodies. It bypasses most of the browser once the canvas gets going, and it isn't just for UI: you can express async compute just fine with the right component design. The robust layout and vector plotting capabilities are just extra on top. I won't claim it's the world's most elegant abstraction, because it's far too pragmatic for that. But I simply don't know any other programming environment where I could even try something like this and not get bogged down in GPU binding hell, or have to round-trip everything back to the CPU. * * * So there you have it: blue and indigo noise à la carte. What I find most interesting is that the problem of generating noise in the time domain has been recast into shaping and denoising a spectrum in the frequency domain. It starts as white noise, and gets turned into a pre-designed picture. You do so by swapping pixels in the other domain. The state for this process is kept in the phase channel, which is not directly relevant to the problem, but drifts into alignment over time. Hence I called it Stable Fiddusion. If you swap the two domains, you're turning noise into a picture by swapping frequency bands without changing their values. It would result in a complex-valued picture, whose magnitude is the target, and whose phase encodes the progress of the convergence process. This is approximately what you get when you add a hidden layer to a diffusion model. What I also find interesting is that the notion of swaps naturally creates a space that is O(N²) big with only N samples of actual data. Viewed from the perspective of a single step, every pair (A,B) corresponds to a unique information mask in the frequency domain that extracts a unique delta from the same data. There is redundancy, of course, but the nature of the Fourier transform smears it out into one big superposition. When you do multiple swaps, the space grows, but not quite that fast: any permutation of the same non-overlapping swaps is equivalent. There is also a notion of entanglement: frequency bands / pixels are linked together to move as a whole by default, but parts will diffuse into being locked in place. Phase is kind of the bugbear of the DSP world. Everyone knows it's there, but they prefer not to talk about it unless its content is neat and simple. Hopefully by now you have a better appreciation of the true nature of a Fourier transform. Not just as a spectrum for a real-valued signal, but as a complex-valued transform of a complex-valued input. During a swap run, the phase channel continuously looks like noise, but is actually highly structured when queried with the right quefrency hashes. I wonder what other things look like that, when you flip them around. More: Stable Fiddusion app CPU-side source code WebGPU source code

Sub-pixel Distance Transform

High quality font rendering for WebGPU This page includes diagrams in WebGPU, which has limited browser support. For the full experience, use Chrome on Windows or Mac, or a developer build on other platforms. In this post I will describe Use.GPU's text rendering, which uses a bespoke approach to Signed Distance Fields (SDFs). This was borne out of necessity: while SDF text is pretty common on GPUs, some of the established practice on generating SDFs from masks is incorrect, and some libraries get it right only by accident. So this will be a deep dive from first principles, about the nuances of subpixels. SDFs The idea behind SDFs is quite simple. To draw a crisp, anti-aliased shape at any size, you start from a field or image that records the distance to the shape's edge at every point, as a gradient. Lighter grays are inside, darker grays are outside. This can be a lower resolution than the target. Then you increase the contrast until the gradient is exactly 1 pixel wide at the target size. You can sample it to get a perfectly anti-aliased opacity mask: This works fine for text at typical sizes, and handles fractional shifts and scales perfectly with zero shimmering. It's also reasonably correct from a signal processing math point-of-view: it closely approximates averaging over a pixel-sized circular window, i.e. a low-pass convolution. Crucially, it takes a rendered glyph as input, which means I can remain blissfully unaware of TrueType font specifics, and bezier rasterization, and just offload that to an existing library. To generate an SDF, I started with MapBox's TinySDF library. Except, what comes out of it is wrong: The contours are noticeably wobbly and pixelated. The only reason the glyph itself looks okay is because the errors around the zero-level are symmetrical and cancel out. If you try to dilate or contract the outline, which is supposed to be one of SDF's killer features, you get ugly gunk. Compare to: The original Valve paper glosses over this aspect and uses high resolution inputs (4k) for a highly downscaled result (64). That is not an option for me because it's too slow. But I did get it to work. As a result Use.GPU has a novel subpixel-accurate distance transform (ESDT), which even does emoji. It's a combination CPU/GPU approach, with the CPU generating SDFs and the GPU rendering them, including all the debug viz. The Classic EDT The common solution is a Euclidean Distance Transform. Given a binary mask, it will produce an unsigned distance field. This holds the squared distance d² for either the inside or outside area, which you can sqrt. Like a Fourier Transform, you can apply it to 2D images by applying it horizontally on each row X, then vertically on each column Y (or vice versa). To make a signed distance field, you do this for both the inside and outside separately, and then combine the two as inside – outside or vice versa. The algorithm is one of those clever bits of 80s-style C code which is O(N), has lots of 1-letter variable names, and is very CPU cache friendly. Often copy/pasted, but rarely understood. In TypeScript it looks like this, where array is modified in-place and f, v and z are temporary buffers up to 1 row/column long. The arguments offset and stride allow the code to be used in either the X or Y direction in a flattened 2D array. 
for (let q = 1, k = 0, s = 0; q < length; q++) { f[q] = array[offset + q * stride]; do { let r = v[k]; s = (f[q] - f[r] + q * q - r * r) / (q - r) / 2; } while (s <= z[k] && --k > -1); k++; v[k] = q; z[k] = s; z[k + 1] = INF; } for (let q = 0, k = 0; q < length; q++) { while (z[k + 1] < q) k++; let r = v[k]; let d = q - r; array[offset + q * stride] = f[r] + d * d; } To explain what this code does, let's start with a naive version instead. Given a 1D input array of zeroes (filled), with an area masked out with infinity (empty): O = [·, ·, ·, 0, 0, 0, 0, 0, ·, 0, 0, 0, ·, ·, ·] Make a matching sequence … 3 2 1 0 1 2 3 … for each element, centering the 0 at each index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14] + ∞ [1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13] + ∞ [2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12] + ∞ [3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11] + 0 [4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10] + 0 [5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9] + 0 [6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8] + 0 [7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7] + 0 [8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6] + ∞ [9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5] + 0 [10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4] + 0 [11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3] + 0 [12,11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2] + ∞ [13,12,11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1] + ∞ [14,13,12,11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0] + ∞ You then add the value from the array to each element in the row: [∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞] [∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞] [∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞] [3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11] [4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10] [5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8] [7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7] [∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞] [9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5] [10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4] [11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3] [∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞] [∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞] [∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞] And then take the minimum of each column: P = [3, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 3] This sequence counts up inside the masked out area, away from the zeroes. This is the positive distance field P. You can do the same for the inverted mask: I = [0, 0, 0, ·, ·, ·, ·, ·, 0, ·, ·, ·, 0, 0, 0] to get the complementary area, i.e. the negative distance field N: N = [0, 0, 0, 1, 2, 3, 2, 1, 0, 1, 2, 1, 0, 0, 0] That's what the EDT does, except it uses square distance … 9 4 1 0 1 4 9 …: When you apply it a second time in the second dimension, these outputs are the new input, i.e. values other than 0 or ∞. It still works because of Pythagoras' rule: d² = x² + y². This wouldn't be true if it used linear distance instead. The net effect is that you end up intersecting a series of parabolas, somewhat like a 1D slice of a Voronoi diagram: I' = [0, 0, 1, 4, 9, 4, 4, 4, 1, 1, 4, 9, 4, 9, 9] Each parabola sitting above zero is the 'shadow' of a zero-level paraboloid located some distance in a perpendicular dimension: The code is just a more clever way to do that, without generating the entire N² grid per row/column. It instead scans through the array left to right, building up a list v[k] of significant minima, with thresholds s[k] where two parabolas intersect. It adds them as candidates (k++) and discards them (--k) if they are eclipsed by a newer value. 
This is the first for/while loop: for (let q = 1, k = 0, s = 0; q < length; q++) { f[q] = array[offset + q * stride]; do { let r = v[k]; s = (f[q] - f[r] + q * q - r * r) / (q - r) / 2; } while (s <= z[k] && --k > -1); k++; v[k] = q; z[k] = s; z[k + 1] = INF; } Then it goes left to right again (for), and fills out the values, skipping ahead to the right minimum (k++). This is the squared distance from the current index q to the nearest minimum r, plus the minimum's value f[r] itself. The paper has more details: for (let q = 0, k = 0; q < length; q++) { while (z[k + 1] < q) k++; let r = v[k]; let d = q - r; array[offset + q * stride] = f[r] + d * d; } The Broken EDT So what's the catch? The above assumes a binary mask. As it happens, if you try to subtract a binary N from P, you have an off-by-one error: O = [·, ·, ·, 0, 0, 0, 0, 0, ·, 0, 0, 0, ·, ·, ·] I = [0, 0, 0, ·, ·, ·, ·, ·, 0, ·, ·, ·, 0, 0, 0] P = [3, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 3] N = [0, 0, 0, 1, 2, 3, 2, 1, 0, 1, 2, 1, 0, 0, 0] P - N = [3, 2, 1,-1,-2,-3,-2,-1, 1,-1,-2,-1, 1, 2, 3] It goes directly from 1 to -1 and back. You could add +/- 0.5 to fix that. But if there is a gray pixel in between each white and black, which we classify as both inside (0) and outside (0), it seems to work out just fine: O = [·, ·, ·, 0, 0, 0, 0, 0, ·, 0, 0, 0, ·, ·, ·] I = [0, 0, 0, 0, ·, ·, ·, 0, 0, 0, ·, 0, 0, 0, 0] P = [3, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 3] N = [0, 0, 0, 0, 1, 2, 1, 0, 0, 0, 1, 0, 0, 0, 0] P - N = [3, 2, 1, 0,-1,-2,-1, 0, 1, 0,-1, 0, 1, 2, 3] This is a realization that somebody must've had, and they reasoned on: "The above is correct for a 50% opaque pixel, where the edge between inside and outside falls exactly in the middle of a pixel." "Lighter grays are more inside, and darker grays are more outside. So all we need to do is treat l = level - 0.5 as a signed distance, and use l² for the initial inside or outside value for gray pixels. This will cause either the positive or negative distance field to shift by a subpixel amount l. And then the EDT will propagate this in both X and Y directions." The initial idea is correct, because this is just running SDF rendering in reverse. A gray pixel in an opacity mask is what you get when you contrast adjust an SDF and do not blow it out into pure black or white. The information inside the gray pixels is "correct", up to rounding. But there are two mistakes here. The first is that even in an anti-aliased image, you can have white pixels right next to black ones. Especially with fonts, which are pixel-hinted. So the SDF is wrong there, because it goes directly from -1 to 1. This causes the contours to double up, e.g. around this bottom edge: To solve this, you can eliminate the crisp case by deliberately making those edges very dark or light gray. But the second mistake is more subtle. The EDT works in 2D because you can feed the output of X in as the input of Y. But that means that any non-zero input to X represents another dimension Z, separate from X and Y. The resulting squared distance will be x² + y² + z². This is a 3D distance, not 2D. If an edge is shifted by 0.5 pixels in X, you would expect a 1D SDF like: […, 0.5, 1.5, 2.5, 3.5, …] = […, 0.5, 1 + 0.5, 2 + 0.5, 3 + 0.5, …] Instead, because of the squaring + square root, you will get: […, 0.5, 1.12, 2.06, 3.04, …] = […, sqrt(0.25), sqrt(1 + 0.25), sqrt(4 + 0.25), sqrt(9 + 0.25), …] The effect of l² = 0.25 rapidly diminishes as you get away from the edge, and is significantly wrong even just one pixel over. 
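A quick numeric check (illustration only) shows how badly the l² seed misbehaves: the propagated value acts like a 3D distance sqrt(d² + l²) when a shifted edge should produce d + l.

const l = 0.5; // subpixel shift of the edge

for (let d = 0; d <= 3; d++) {
  const broken = Math.sqrt(d * d + l * l); // what the l² seed propagates
  const wanted = d + l;                    // what a shifted edge should give
  console.log(d, broken.toFixed(2), wanted.toFixed(2));
}
// d=0: 0.50 vs 0.50, d=1: 1.12 vs 1.50, d=2: 2.06 vs 2.50, d=3: 3.04 vs 3.50

One pixel away from the edge the result is already off by almost 0.4 pixels, and it never recovers.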
The correct shift would need to be folded into (x + …)² + (y + …)² and depends on the direction. e.g. If an edge is shifted horizontally, it ought to be (x + l)² + y², which means there is a term of 2*x*l missing. If the shift is vertical, it's 2*y*l instead. This is also a signed value, not positive/unsigned. Given all this, it's a miracle this worked at all. The only reason this isn't more visible in the final glyph is because the positive and negative fields contain equal but opposite errors around their respective gray pixels. The Not-Subpixel EDT As mentioned before, the EDT algorithm is essentially making a 1D Voronoi diagram every time. It finds the distance to the nearest minimum for every array index. But there is no reason for those minima themselves to lie at integer offsets, because the second for loop effectively resamples the data. So you can take an input mask, and tag each index with a horizontal offset Δ: O = [·, ·, ·, 0, 0, 0, 0, 0, ·, ·, ·] Δ = [A, B, C, D, E, F, G, H, I, J, K] As long as the offsets are small, no two indices will swap order, and the code still works. You then build the Voronoi diagram out of the shifted parabolas, but sample the result at unshifted indices. Problem 1 - Opposing Shifts This led me down the first rabbit hole, which was an attempt to make the EDT subpixel capable without losing its appealing simplicity. I started by investigating the nuances of subpixel EDT in 1D. This was a bit of a mirage, because most real problems only pop up in 2D. Though there was one important insight here. O = [·, ·, ·, 0, 0, 0, 0, 0, ·, ·, ·] Δ = [·, ·, ·, A, ·, ·, ·, B, ·, ·, ·] Given a mask of zeroes and infinities, you can only shift the first and last point of each segment. Infinities don't do anything, while middle zeroes should remain zero. Using an offset A works sort of as expected: this will increase or decrease the values filled in by a fractional pixel, calculating a squared distance (d + A)² where A can be positive or negative. But the value at the shifted index itself is always (0 + A)² (positive). This means it is always outside, regardless of whether it is moving left or right. If A is moving left (–), the point is inside, and the (unsigned) distance should be 0. At B the situation is reversed: the distance should be 0 if B is moving right (+). It might seem like this is annoying but fixable, because the zeroes can be filled in by the opposite signed field. But this is only looking at the binary 1D case, where there are only zeroes and infinities. In 2D, a second pass has non-zero distances, so every index can be shifted: O = [a, b, c, d, e, f, g, h, i, j, k] Δ = [A, B, C, D, E, F, G, H, I, J, K] Now, resolving every subpixel unambiguously is harder than you might think: It's important to notice that the function being sampled by an EDT is not actually smooth: it is the minimum of a discrete set of parabolas, which cross at an angle. The square root of the output only produces a smooth linear gradient because it samples each parabola at integer offsets. Each center only shifts upward by the square of an integer in every pass, so the crossings are predictable. You never sample the 'wrong' side of (d + ...)². A subpixel EDT does not have this luxury. Subpixel EDTs are not irreparably broken though. Rather, they are only valid if they cause the unsigned distance field to increase, i.e. if they dilate the empty space. This is a problem: any shift that dilates the positive field contracts the negative, and vice versa.
To fix this, you need to get out of the handwaving stage and actually understand P and N as continuous 2D fields. Problem 2 - Diagonals Consider an aliased, sloped edge. To understand how the classic EDT resolves it, we can turn it into a voronoi diagram for all the white pixel centers: Near the bottom, the field is dominated by the white pixels on the corners: they form diagonal sections downwards. Near the edge itself, the field runs perfectly vertical inside a roughly triangular section. In both cases, an arrow pointing back towards the cell center is only vaguely perpendicular to the true diagonal edge. Near perfect diagonals, the edge distances are just wrong. The distance of edge pixels goes up or right (1), rather than the more logical diagonal 0.707…. The true closest point on the edge is not part of the grid. These fields don't really resolve properly until 6-7 pixels out. You could hide these flaws with e.g. an 8x downscale, but that's 64x more pixels. Either way, you shouldn't expect perfect numerical accuracy from an EDT. Just because it's mathematically separable doesn't mean it's particularly good. In fact, it's only separable because it isn't very good at all. Problem 3 - Gradients In 2D, there is also only one correct answer to the gray case. Consider a diagonal edge, anti-aliased: Thresholding it into black, grey or white, you get: If you now classify the grays as both inside and outside, then the highlighted pixels will be part of both masks. Both the positive and negative field will be exactly zero there, and so will the SDF (P - N): This creates a phantom vertical edge that pushes apart P and N, and causes the average slope to be less than 45º. The field simply has the wrong shape, because gray pixels can be surrounded by other gray pixels. This also explains why TinySDF magically seemed to work despite being so pixelized. The l² gray correction fills in exactly the gaps in the bad (P - N) field where it is zero, and it interpolates towards a symmetrically wrong P and N field on each side. If we instead classify grays as neither inside nor outside, then P and N overlap in the boundary, and it is possible to resolve them into a coherent SDF with a clean 45 degree slope, if you do it right: What seemed like an off-by-one error is actually the right approach in 2D or higher. The subpixel SDF will then be a modified version of this field, where the P and N sides are changed in lock-step to remain mutually consistent. Though we will get there in a roundabout way. Problem 4 - Commuting It's worth pointing out: a subpixel EDT simply cannot commute in 2D. First, consider the data flow of an ordinary EDT: Information from a corner pixel can flow through empty space both when doing X-then-Y and Y-then-X. But information from the horizontal edge pixels can only flow vertically then horizontally. This is okay because the separating lines between adjacent pixels are purely vertical too: the red arrows never 'win'. But if you introduce subpixel shifts, the separating lines can turn: The data flow is still limited to the original EDT pattern, so the edge pixels at the top can only propagate by starting downward. They can only influence adjacent columns if the order is Y-then-X. For vertical edges it's the opposite. That said, this is only a problem on shallow concave curves, where there aren't any corner pixels nearby. The error is that it 'snaps' to the wrong edge point, but only when it is already several pixels away from the edge. 
In that case, the smaller x² term is dwarfed by the much larger y² term, so the absolute error is small after sqrt. The ESDT Knowing all this, here's how I assembled a "true" Euclidean Subpixel Distance Transform. Subpixel offsets To start we need to determine the subpixel offsets. We can still treat level - 0.5 as the signed distance for any gray pixel, and ignore all white and black for now. The tricky part is determining the exact direction of that distance. As an approximation, we can examine the 3x3 neighborhood around each gray pixel and do a least-squares fit of a plane. As long as there is at least one white and one black pixel in this neighborhood, we get a vector pointing towards where the actual edge is. In practice I apply some horizontal/vertical smoothing here using a simple [1 2 1] kernel. The result is numerically very stable, because the originally rasterized image is visually consistent. This logic is disabled for thin creases and spikes, where it doesn't work. Such points are treated as fully masked out, so that neighboring distances propagate there instead. This is needed e.g. for the pointy negative space of a W to come out right. I also implemented a relaxation step that will smooth neighboring vectors if they point in similar directions. However, the effect is quite minimal, and it rounds very sharp corners, so I ended up disabling it by default. The goal is then to do an ESDT that uses these shifted positions for the minima, to get a subpixel accurate distance field. P and N junction We saw earlier that only non-masked pixels can have offsets that influence the output (#1). We only have offsets for gray pixels, yet we concluded that gray pixels should be masked out, to form a connected SDF with the right shape (#3). This can't work. SDFs are both the problem and the solution here. Dilating and contracting SDFs is easy: add or subtract a constant. So you can expand both P and N fields ahead of time geometrically, and then undo it numerically. This is done by pushing their respective gray pixel centers in opposite directions, by half a pixel, on top of the originally calculated offset: This way, they can remain masked in in both fields, but are always pushed between 0 and 1 pixel inwards. The distance between the P and N gray pixel offsets is always exactly 1, so the non-zero overlap between P and N is guaranteed to be exactly 1 pixel wide everywhere. It's a perfect join anywhere we sample it, because the line between the two ends crosses through a pixel center. When we then calculate the final SDF, we do the opposite, shifting each by half a pixel and trimming it off with a max: SDF = max(0, P - 0.5) - max(0, N - 0.5) Only one of P or N will be > 0.5 at a time, so this is exact. To deal with pure black/white edges, I treat any black neighbor of a white pixel (horizontal or vertical only) as gray with a 0.5 pixel offset (before P/N dilation). No actual blurring needs to happen, and the result is numerically exact minus epsilon, which is nice. ESDT state The state for the ESDT then consists of remembering a signed X and Y offset for every pixel, rather than the squared distance. These are factored into the distance and threshold calculations, separated into its proper parallel and orthogonal components, i.e. X/Y or Y/X. Unlike an EDT, each X or Y pass has to be aware of both axes. But the algorithm is mostly unchanged otherwise, here X-then-Y. The X pass: At the start, only gray pixels have offsets and they are all in the range -1…1 (exclusive). 
With each pass of ESDT, a winning minima's offsets propagate to its affecting range, tracking the total distance (Δx, Δy) (> 1). At the end, each pixel's offset points to the nearest edge, so the squared distance can be derived as Δx² + Δy². The Y pass: You can see that the vertical distances in the top-left are practically vertical, and not oriented perpendicular to the contour on average: they have not had a chance to propagate horizontally. But they do factor in the vertical subpixel offset, and this is the dominant component. So even without correction it still creates a smooth SDF with a surprisingly small error. Fix ups The commutativity errors are all biased positively, meaning we get an upper bound of the true distance field. You could take the min of X then Y and Y then X. This would re-use all the same prep and would restore rotation-independence at the cost of 2x as many ESDTs. You could try X then Y then X at 1.5x cost with some hacks. But neither would improve diagonal areas, which were still busted in the original EDT. Instead I implemented an additional relaxation pass. It visits every pixel's target, and double checks whether one of the 4 immediate neighbors (with subpixel offset) isn't a better solution: It's a good heuristic because if the target is >1px off there is either a viable commutative propagation path, or you're so far away the error is negligible. It fixes up the diagonals, creating tidy lines when the resolution allows for it: You could take this even further, given that you know the offsets are supposed to be perpendicular to the glyph contour. You could add reprojection with a few dot products here, but getting it to not misfire on edge cases would be tricky. While you can tell the unrelaxed offsets are wrong when visualized, and the fixed ones are better, the visual difference in the output glyphs is tiny. You need to blow glyphs up to enormous size to see the difference side by side. So it too is disabled by default. The diagonals in the original EDT were wrong too and you could barely tell. Emoji An emoji is generally stored as a full color transparent PNG or SVG. The ESDT can be applied directly to its opacity mask to get an SDF, so no problem there. There are an extremely rare handful of emoji with semi-transparent areas, but you can get away with making those solid. For this I just use a filter that detects '+' shaped arrangements of pixels that have (almost) the same transparency level. Then I dilate those by 3x3 to get the average transparency level in each area. Then I divide by it to only keep the anti-aliased edges transparent. The real issue is blending the colors at the edges, when the emoji is being rendered and scaled. The RGB color of transparent pixels is undefined, so whatever values are there will blend into the surrounding pixels, e.g. creating a subtle black halo: Not Premultiplied Premultiplied A common solution is premultiplied alpha. The opacity is baked into the RGB channels as (R * A, G * A, B * A, A), and transparent areas must be fully transparent black. This allows you to use a premultiplied blend mode where the RGB channels are added directly without further scaling, to cancel out the error. But the alpha channel of an SDF glyph is dynamic, and is independent of the colors, so it cannot be premultiplied. We need valid color values even for the fully transparent areas, so that up- or downscaling is still clean. Luckily the ESDT calculates X and Y offsets which point from each pixel directly to the nearest edge. 
We can use them to propagate the colors outward in a single pass, filling in the entire background. It doesn't need to be very accurate, so no filtering is required. RGB channel Output The result looks pretty great. At normal sizes, the crisp edge hides the fact that the interior is somewhat blurry. Emoji fonts are supported via the underlying ab_glyph library, but are too big for the web (10MB+). So you can just load .PNGs on demand instead, at whatever resolution you need. Hooking it up to the 2D canvas to render native system emoji is left as an exercise for the reader. Use.GPU does not support complex Unicode scripts or RTL text yet—both are a can of worms I wish to offload too—but it does support composite emoji like "pirate flag" (white flag + skull and crossbones) or "male astronaut" (astronaut + man) when formatted using the usual Zero-Width Joiners (U+200D) or modifiers. Shading Finally, a note on how to actually render with SDFs, which is more nuanced than you might think. I pack all the SDF glyphs into an atlas on-demand, the same I use elsewhere in Use.GPU. This has a custom layout algorithm that doesn't backtrack, optimized for filling out a layout at run-time with pieces of a similar size. Glyphs are rasterized at 1.5x their normal font size, after rounding up to the nearest power of two. The extra 50% ensures small fonts on low-DPI displays still use a higher quality SDF, while high-DPI displays just upscale that SDF without noticeable quality loss. The rounding ensures similar font sizes reuse the same SDFs. You can also override the detail independent of font size. To determine the contrast factor to draw an SDF, you generally use screen-space derivatives. There are good and bad ways of doing this. Your goal is to get a ratio of SDF pixels to screen pixels, so the best thing to do is give the GPU the coordinates of the SDF texture pixels, and ask it to calculate the difference for that between neighboring screen pixels. This works for surfaces in 3D at an angle too. Bad ways of doing this will instead work off relative texture coordinates, and introduce additional scaling factors based on the view or atlas size, when they are all just supposed to cancel out. As you then adjust the contrast of an SDF to render it, it's important to do so around the zero-level. The glyph's ideal vector shape should not expand or contract as you scale it. Like TinySDF, I use 75% gray as the zero level, so that more SDF range is allocated to the outside than the inside, as dilating glyphs is much more common than contraction. At the same time, a pixel whose center sits exactly on the zero level edge is actually half inside, half outside, i.e. 50% opaque. So, after scaling the SDF, you need to add 0.5 to the value to get the correct opacity for a blend. This gives you a mathematically accurate font rendering that approximates convolution with a pixel-sized circle or box. But I go further. Fonts were not invented for screens, they were designed for paper, with ink that bleeds. Certain renderers, e.g. MacOS, replicate this effect. The physical bleed distance is relatively constant, so the larger the font, the smaller the effect of the bleed proportionally. I got the best results with a 0.25 pixel bleed at 32px or more. For smaller sizes, it tapers off linearly. 
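Putting those pieces together, the per-pixel shading math looks roughly like the sketch below. It's written as plain TypeScript rather than WGSL for readability, the function and parameter names are mine, and the distance is assumed to be decoded from the stored SDF value around its 75%-gray zero level.

// distanceTexels: signed distance from the glyph edge, in SDF texels,
// positive inside (decoded from the stored value around the zero level).
// texelsPerPixel: how many SDF texels cover one screen pixel, derived from
// screen-space derivatives of the SDF texture coordinates.
const sdfOpacity = (distanceTexels: number, texelsPerPixel: number, fontSize: number): number => {
  const distancePixels = distanceTexels / texelsPerPixel;
  // Ink bleed: 0.25 px at 32px and up, tapering off linearly below that.
  const bleed = 0.25 * Math.min(1, fontSize / 32);
  // A pixel centered exactly on the edge should be 50% opaque.
  const value = distancePixels + bleed + 0.5;
  return Math.max(0, Math.min(1, value));
};

The bleed is defined in screen pixels, so its relative effect grows as the rendered size shrinks.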
When you zoom out blocks of text, they get subtly fatter instead of thinning out, and this is actually a great effect when viewing document thumbnails, where lines of text become a solid mass at the point where the SDF resolution fails anyway. In Use.GPU I prefer to use gamma correct, linear RGB color, even for 2D. What surprised me the most is just how unquestionably superior this looks. Text looks rock solid and readable even at small sizes on low-DPI. Because the SDF scales, there is no true font hinting, but it really doesn't need it, it would just be a nice extra. Presumably you could track hinted points or edges inside SDF glyphs and then do a dynamic distortion somehow, but this is an order of magnitude more complex than what it is doing now, which is splat a contrasted texture on screen. It does have snapping you can turn on, which avoids jiggling of individual letters. But if you turn it off, you get smooth subpixel everything: I was always a big fan of the 3x1 subpixel rendering used on color LCDs (i.e. ClearType and the like), and I was sad when it was phased out due to the popularity of high-DPI displays. But it turns out the 3x res only offered marginal benefits... the real improvement was always that it had a custom gamma correct blend mode, which is a thing a lot of people still get wrong. Even without RGB subpixels, gamma correct AA looks great. Converting the entire desktop to Linear RGB is also not going to happen in our lifetime, but I really want it more now. The "blurry text" that some people associate with anti-aliasing is usually just text blended with the wrong gamma curve, and without an appropriate bleed for the font in question. * * * If you want to make SDFs from existing input data, subpixel accuracy is crucial. Without it, fully crisp strokes actually become uneven, diagonals can look bumpy, and you cannot make clean dilated outlines or shadows. If you use an EDT, you have to start from a high resolution source and then downscale away all the errors near the edges. But if you use an ESDT, you can upscale even emoji PNGs with decent results. It might seem pretty obvious in hindsight, but there is a massive difference between getting it sort of working, and actually getting all the details right. There were many false starts and dead ends, because subpixel accuracy also means one bad pixel ruins it. In some circles, SDF text is an old party trick by now... but a solid and reliable implementation is still a fair amount of work, with very little to go off for the harder parts. By the way, I did see if I could use voronoi techniques directly, but in terms of computation it is much more involved. Pretty tho: The ESDT is fast enough to use at run-time, and the implementation is available as a stand-alone import for drop-in use. This post started as a single live WebGPU diagram, which you can play around with. The source code for all the diagrams is available too.

Fuck It, We'll Do It Live

How the Live effect run-time is implemented In this post I describe how the Live run-time internals are implemented, which drive Use.GPU. Some pre-existing React and FP effect knowledge is useful. I have written about Live before, but in general terms. You may therefor have a wrong impression of this endeavor. When a junior engineer sees an application doing complex things, they're often intimidated by the prospect of working on it. They assume that complex functionality must be the result of complexity in code. The fancier the app, the less understandable it must be. This is what their experience has been so far: seniority correlates to more and hairier lines of code. After 30 years of coding though, I know it's actually the inverse. You cannot get complex functionality working if you've wasted all your complexity points on the code itself. This is the main thing I want to show here, because this post mainly describes 1 data structure and a handful of methods. Live has a real-time inspector, so a lot of this can be demonstrated live. Reading this on a phone is not recommended, the inspector is made for grown-ups. The story so far: The main mechanism of Live is to allow a tree to expand recursively like in React, doing breadth-first expansion. This happens incrementally, and in a reactive, rewindable way. You use this to let interactive programs knit themselves together at run-time, based on the input data. Like a simple CLI program with a main() function, the code runs top to bottom, and then stops. It produces a finite execution trace that you can inspect. To become interactive and to animate, the run-time will selectively rewind, and re-run only parts, in response to external events. It's a fusion of immediate and retained mode UI, offering the benefits of both and the downsides of neither, not limited to UI. This relies heavily on FP principles such as pure functions, immutable data structures and so on. But the run-time itself is very mutable: the whole idea is to centralize all the difficult parts of tracking changes in one place, and then forget about them. Live has no dependencies other than a JavaScript engine and these days consists of ~3600 lines. If you're still not quite sure what the Live component tree actually is, it's 3 things at once: a data dependency graph an execution trace a tree-shaped cache The properties of the software emerge because these aspects are fully aligned inside a LiveComponent. Functionally Mayonnaise You can approach this from two sides, either from the UI side, or from the side of functional Effects. Live Components A LiveComponent (LC) is a React UI function component (FC) with 1 letter changed, at first: const MyComponent: LC<MyProps> = (props: MyProps) => { const {wat} = props; // A memo hook // Takes dependencies as array const slow = useMemo(() => expensiveComputation(wat), [wat]); // Some local state const [state, setState] = useState(1); // JSX expressions with props and children // These are all names of LC functions to call + their props/arguments return ( <OtherComponent> <Foo value={slow} /> <Bar count={state} setCount={setState} /> </OtherComponent> ); }; The data is immutable, and the rendering appears stateless: it returns a pure data structure for given input props and current state. The component uses hooks to access and manipulate its own state. The run-time will unwrap the outer layer of the <JSX> onion, mount and reconcile it, and then recurse. 
let _ = await (
  <OtherComponent>
    <Foo foo={foo} />
    <Bar />
  </OtherComponent>
);
return null;
The code is actually misleading though. Both in Live and React, the return keyword here is technically wrong. Return implies passing a value back to a parent, but this is not happening at all. A parent component decided to render <MyComponent>, yes. But the function itself is being called by Live/React: it's yielding JSX to the Live/React run-time to make a call to OtherComponent(...). There is no actual return value. Because a <Component> can't return a value to its parent, the received _ will always be null too. The data flow is one-way, from parent to child. Effects An Effect is basically just a Promise/Future as a pure value. To first approximation, it's a () => Promise: a promise that doesn't actually start unless you call it a second time. Just like a JSX tag is like a React/Live component waiting to be called. An Effect resolves asynchronously to a new Effect, just like <JSX> will render more <JSX>. Unlike a Promise, an Effect is re-usable: you can fire it as many times as you like. Just like you can keep rendering the same <JSX>.
let value = yield (
  OtherEffect([
    Foo(foo),
    Bar(),
  ])
);
// ...
return value;
So React is like an incomplete functional Effect system. Just replace the word Component with Effect. OtherEffect is then some kind of decorator which describes a parallel dispatch to Effects Foo and Bar. A real Effect system will fork, but then join back, gathering the returned values, like a real return statement. Unlike React components, Effects are ephemeral: no state is retained after they finish. The purity is actually what makes them appealing in production, to manage complex async flows. They're also not incremental/rewindable: they always run from start to finish.
          Pure    Returns   State   Incremental
React     ✅       ❌        ✅       ✅
Effects   ✅       ✅        ❌       ❌
You either take an effect system and make it incremental and stateful, or you take React and add the missing return data path. I chose the latter option. First, because hooks are an excellent emulsifier. Second, because the big lesson from React is that plain, old, indexed arrays are kryptonite for incremental code. Unless you've deliberately learned how to avoid them, you won't get far, so it's better to start from that side. This breakdown is divided into three main parts: the rendering of 1 component, the environment around a component, and the overall tree update loop. Components The component model revolves around a few core concepts: Fibers; Hooks and State; Mounts; Memoization; Inlining. Components form the "user land" of Live. You can do everything you need there without ever calling directly into the run-time's "kernel". Live however does not shield its internals. This is fine, because I don't employ hundreds of junior engineers who would gleefully turn that privilege into a cluster bomb of spaghetti. The run-time is not extensible anyhow: what you see is what you get. The escape hatch is there to support testing and debugging. Shielding this would be a game of hide-the-reference, creating a shadow-API for privileged friend packages, and so on. Ain't nobody got time for that. React has an export called DONT_USE_THIS_OR_YOU_WILL_BE_FIRED, Live has THIS_WAY___IF_YOU_DARE and it's called useFiber. Fibers Borrowing React terminology, a mounted <Component> function is called a fiber, despite this being single-threaded. Each persists for the component lifetime. To start, you call render(<App />). This creates and renders the first fiber.
type LiveFiber = {
  // Fiber ID
  id: number,

  // Component function
  f: Function,

  // Arguments (props, etc.)
  args: any[],

  // ...
}
Fibers are numbered with increasing IDs. In JS this means you can create 2⁵³ fibers before it crashes, which ought to be enough for anybody. It holds the component function f and its latest arguments args. Unlike React, Live functions aren't limited to only a single props argument. Each fiber is rendered from a <JSX> tag, which is a plain data structure. The Live version is very simple.
type Key = number | string;

type JSX.Element = {
  // Same as fiber
  f: Function,
  args: any[],

  // Element key={...}
  key?: string | number,

  // Rendered by fiber ID
  by: number,
}
Another name for this type is a DeferredCall. This is much leaner than React's JSX type, although Live will gracefully accept either. In Live, JSX syntax is also optional, as you can write use(Component, …) instead of <Component … />. Calls and fibers track the ID by of the fiber that rendered them. This is always an ancestor, but not necessarily the direct parent.
fiber.bound = () => {
  enterFiber(fiber);

  const {f, args} = fiber;
  const jsx = f(...args);

  exitFiber(fiber);
  return jsx;
};
The fiber holds a function bound. This binds f to the fiber itself, always using the current fiber.args as arguments. It wraps the call in an enter and exit function for state housekeeping. This can then be called via renderFiber(fiber) to get jsx. This is only done during an ongoing render cycle. Hooks and State Each fiber holds a local state array and a temporary pointer:
{
  // ...
  state: any[],
  pointer: number,
}
Calling a hook like useState taps into this state without an explicit reference to it. In Live, this is implemented as a global currentFiber variable, combined with a local fiber.pointer starting at 0. Both are initialized by enterFiber(fiber). The state array holds flattened triplets, one per hook. They're arranged as [hookType, A, B]. Values A and B are hook-specific, but usually hold a value and a dependencies array. In the case of useState, it's just the [value, setValue] pair. The fiber.pointer advances by 3 slots every time a hook is called. Tracking the hookType allows the run-time to warn you if you call hooks in a different order than before. The basic React hooks don't need any more state than this and can be implemented in ~20 lines of code each. This is useMemo:
export const useMemo = <T>(
  callback: () => T,
  dependencies: any[] = NO_DEPS,
): T => {
  const fiber = useFiber();

  const i = pushState(fiber, Hook.MEMO);
  let {state} = fiber;

  let value = state![i];
  const deps = state![i + 1];

  if (!isSameDependencies(deps, dependencies)) {
    value = callback();

    state![i] = value;
    state![i + 1] = dependencies;
  }

  return value as unknown as T;
}
useFiber just returns currentFiber and doesn't count as a real hook (it has no state). It only ensures you cannot call a hook outside of a component render.
export const useNoHook = (hookType: Hook) => () => {
  const fiber = useFiber();

  const i = pushState(fiber, hookType);
  const {state} = fiber;
  state![i] = undefined;
  state![i + 1] = undefined;
};
No-hooks like useNoMemo are also implemented, which allow for conditional hooks: write a matching else branch for any if. To ensure consistent rendering, a useNoHook will dispose of any state the useHook had, rather than just being a no-op. The above is just the basic version for simple hooks without cleanup.
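As an illustration of what a cleanup-aware variant might look like (a sketch with an assumed slot layout, not the actual Live source), a no-hook for a resource could run the stored disposal callback before clearing its slots:

export const useNoResource = () => {
  const fiber = useFiber();

  const i = pushState(fiber, Hook.RESOURCE); // Hook.RESOURCE is assumed here

  const {state} = fiber;

  // Assumption: slot B holds the pending disposal callback for the resource.
  const dispose = state![i + 1];
  if (typeof dispose === 'function') dispose();

  state![i] = undefined;
  state![i + 1] = undefined;
};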
This also lets the run-time support early return cleanly in Components: when exitFiber(fiber) is called, all remaining unconsumed state is disposed of with the right no-hook. If someone calls a setState, this is added to a dispatch queue, so changes can be batched together. If f calls setState during its own render, this is caught and resolved within the same render cycle, by calling f again. A setState which is a no-op is dropped (pointer equality). You can see however that Live hooks are not pure: when a useMemo is tripped, it will immediately overwrite the previous state during the render, not after. This means renders in Live are not stateless, only idempotent. This is very deliberate. Live doesn't have a useEffect hook, it has a useResource hook that is like a useMemo with a useEffect-like disposal callback. While it seems to throw React's orchestration properties out the window, this is not actually so. What you get in return is an enormous increase in developer ergonomics, offering features React users are still dreaming of, running off 1 state array and 1 pointer. Live is React with the training wheels off, not with holodeck security protocols disabled, but this takes a while to grok. Mounts After rendering, the returned/yielded <JSX> value is reconciled with the previous rendered result. This is done by updateFiber(fiber, value). New children are mounted, while old children are unmounted or have their args replaced. Only children with the same f as before can be updated in place. { // ... // Static mount mount?: LiveFiber, // Dynamic mounts mounts?: Map<Key, LiveFiber>, lookup?: Map<Key, number>, order?: Key[], // Continuation next?: LiveFiber, // Fiber type type?: LiveComponent, // ... } Mounts are tracked inside the fiber, either as a single mount, or a map mounts, pointing to other fiber objects. The key for mounts is either an array index 0..N or a user-defined key. Keys must be unique. The order of the keys is kept in a list. A reverse lookup map is created if they're not anonymous indices. The mount is only used when a component renders 1 other statically. This excludes arrays of length 1. If a component switches between mount and mounts, all existing mounts are discarded. Continuations are implemented as a special next mount. This is mounted by one of the built-in fenced operators such as <Capture> or <Gather>. In the code, mounting is done via: mountFiberCall(fiber, call) (static) reconcileFiberCalls(fiber, calls) (dynamic) mountFiberContinuation(fiber, call) (next). Each will call updateMount(fiber, mount, jsx, key?, hasKeys?). If an existing mount (with the same key) is compatible it's updated, otherwise a replacement fiber is made with makeSubFiber(…). It doesn't update the parent fiber, rather it just returns the new state of the mount (LiveFiber | null), so it can work for all 3 mounting types. Once a fiber mount has been updated, it's queued to be rendered with flushMount. If updateMount returns false, the update was a no-op because fiber arguments were identical (pointer equality). The update will be skipped and the mount not flushed. This follows the same implicit memoization rule that React has. It tends to trigger when a stateful component re-renders an old props.children. A subtle point here is that fibers have no links/pointers pointing back to their parent. This is part practical, part ideological. It's practical because it cuts down on cyclic references to complicate garbage collection. It's ideological because it helps ensures one-way data flow. 
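Condensed into code, the decision updateMount makes looks roughly like this, as a sketch of the rules just described rather than the actual signature:

// Sketch: given a mount slot and incoming JSX, return the slot's new state.
// false = no-op (identical args), null = unmount, otherwise the fiber to keep.
const updateMount = (
  parent: LiveFiber,
  mount: LiveFiber | null,
  jsx: JSX.Element | null,
  key?: Key,
): LiveFiber | null | false => {
  if (!jsx) return mount ? null : false;        // nothing to render: unmount if present

  if (mount && mount.f === jsx.f) {
    if (mount.args === jsx.args) return false;  // pointer-equal args: skip the flush
    mount.args = jsx.args;                      // compatible: update in place
    return mount;
  }

  return makeSubFiber(parent, jsx, key);        // incompatible: make a replacement fiber
};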
There is also no global collection of fibers, except in the inspector. Like in an effect system, the job of determining what happens is entirely the result of an ongoing computation on JSX, i.e. something passed around like pure, immutable data. The tree determines its own shape as it's evaluated. Queue and Path Live needs to process fibers in tree order, i.e. as in a typical tree list view. To do so, fibers are compared as values with compareFibers(a, b). This is based on references that are assigned only at fiber creation. It has a path from the root of the tree to the fiber (at depth depth), containing the indices or keys. { // ... depth: number, path: Key[], keys: ( number | Map<Key, number> )[], } A continuation next is ordered after the mount or mounts. This allows data fences to work naturally: the run-time only ensures all preceding fibers have been run first. For this, I insert an extra index into the path, 0 or 1, to distinguish the two sub-trees. If many fibers have a static mount (i.e. always 1 child), this would create paths with lots of useless zeroes. To avoid this, a single mount has the same path as its parent, only its depth is increased. Paths can still be compared element-wise, with depth as the tie breaker. This easily reduces typical path length by 70%. This is enough for children without keys, which are spawned statically. Their order in the tree never changes after creation, they can only be updated in-place or unmounted. But for children with a key, the expectation is that they persist even if their order changes. Their keys are just unsorted ids, and their order is stored in the fiber.order and fiber.lookup of the parent in question. This is referenced in the fiber.keys array. It's a flattened list of pairs [i, fiber.lookup], meaning the key at index i in the path should be compared using fiber.lookup. To keep these keys references intact, fiber.lookup is mutable and always modified in-place when reconciling. Memoization If a Component function is wrapped in memo(...), it won't be re-rendered if its individual props haven't changed (pointer equality). This goes deeper than the run-time's own oldArgs !== newArgs check. { // ... version?: number, memo?: number, runs?: number, } For this, memoized fibers keep a version around. They also store a memo which holds the last rendered version, and a run count runs for debugging: The version is used as one of the memo dependencies, along with the names and values of the props. Hence a memo(...) cache can be busted just by incrementing fiber.version, even if the props didn't change. Versions roll over at 32-bit. To actually do the memoization, it would be nice if you could just wrap the whole component in useMemo. It doesn't work in the React model because you can't call other hooks inside hooks. So I've brought back the mythical useYolo... An earlier incarnation of this allowed fiber.state scopes to be nested, but lacked a good purpose. The new useYolo is instead a useMemo you can nest. It effectively hot swaps the entire fiber.state array with a new one kept in one of the slots: This is then the first hook inside fiber.state. If the memo succeeds, the yolo'd state is preserved without treating it as an early return. Otherwise the component runs normally. Yolo'ing as the first hook has a dedicated fast path but is otherwise a perfectly normal hook. The purpose of fiber.memo is so the run-time can tell whether it rendered the same thing as before, and stop. 
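In code form, the run-time's half of that check is tiny; when fiber.memo gets written is left entirely to the component's own memoization (a sketch):

// Sketch: a memoized fiber produced the same output as before
// if its last rendered version still matches its current version.
const renderedSameAsBefore = (fiber: LiveFiber): boolean =>
  fiber.memo !== undefined && fiber.memo === fiber.version;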
It can just compare the two versions, leaving the specifics of memoization entirely up to the fiber component itself. For example, to handle a custom arePropsEqual function in memo(…). I always use version numbers as opposed to isDirty flags, because it leaves a paper trail. This provides the same ergonomics for mutable data as for immutable data: you can store a reference to a previous value, and do an O(1) equality check to know whether it changed since you last accessed it. Whenever you have a handle which you can't look inside, such as a pointer to GPU memory, it's especially useful to keep a version number on it, which you bump every time you write to it. It makes debugging so much easier. Inlining Built-in operators are resolved with a hand-coded routine post-render, rather than being "normal" components. Their component functions are just empty and there is a big dispatch with if statements. Each is tagged with a isLiveBuiltin: true. If a built-in operator is an only child, it's usually resolved inline. No new mount is created, it's immediately applied as part of updating the parent fiber. The glue in between tends to be "kernel land"-style code anyway, it doesn't need a whole new fiber, and it's not implemented in terms of hooks. The only fiber state it has is the type (i.e. function) of the last rendered JSX. There are several cases where it cannot inline, such as rendering one built-in inside another built-in, or rendering a built-in as part of an array. So each built-in can always be mounted independently if needed. From an architectural point of view, inlining is just incidental complexity, but this significantly reduces fiber overhead and keeps the user-facing component tree much tidier. It introduces a few minor problems around cleanup, but those are caught and handled. Live also has a morph operator. This lets you replace a mount with another component, without discarding any matching children or their state. The mount's own state is still discarded, but its f, args, bound and type are modified in-place. A normal render follows, which will reconcile the children. This is implemented in morphFiberCall. It only works for plain vanilla components, not other built-ins. The reason to re-use the fiber rather than transplant the children is so that references in children remain unchanged, without having to rekey them. In Live, I never do a full recursive traversal of any sub-tree, unless that traversal is incremental and memoized. This is a core property of the system. Deep recursion should happen in user-land. Environment Fibers have access to a shared environment, provided by their parent. This is created in user-land through built-in ops and accessed via hooks. Context and captures Gathers and yeets Fences and suspend Quotes and reconcilers Unquote + quote Context and captures Live extends the classic React context: { // ... context: { values: Map<LiveContext | LiveCapture, Ref<any>>, roots: Map<LiveContext | LiveCapture, number | LiveFiber>, }, } A LiveContext provides 1 value to N fibers. A LiveCapture collects N values into 1 fiber. Each is just an object created in user-land with makeContext / makeCapture, acting as a unique key. It can also hold a default value for a context. The values map holds the current value of each context/capture. This is boxed inside a Ref as {current: value} so that nested sub-environments share values for inherited contexts. The roots map points to the root fibers providing or capturing. 
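For orientation, the user-land side of a context looks like this, as a usage sketch in which the default-value argument to makeContext, the prop names on <Provide>, and the Label component are all assumptions:

// Create a context key in user-land.
const ThemeContext = makeContext<string>('light');

const Child = () => {
  const theme = useContext(ThemeContext); // reads the boxed value for this sub-tree
  return <Label text={theme} />;          // Label is a made-up component
};

const App = () => (
  <Provide context={ThemeContext} value="dark">
    <Child />
  </Provide>
);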
The roots map is used to allow useContext and useCapture to set up the right data dependency just-in-time. For a context, this points upstream in the tree, so to avoid a reverse reference, it's a number. For a capture, this points to a downstream continuation, i.e. the next of an ancestor, and can be a LiveFiber.

Normally children just share their parent's context. It's only when you <Provide> or <Capture> that Live builds a new, immutable copy of values and roots with a new context/capture added. Each context and capture persists for the lifetime of its sub-tree. Captures build up a map incrementally inside the Ref while children are rendered, keyed by fiber. This is received in tree order after sorting:

<Capture
  context={...}
  children={...}
  then={(values: T[]) => {
    ...
  }}
/>

You can also just write capture(context, children, then), FYI. This is an await or yield in disguise, where the then closure is spiritually part of the originating component function. Therefore it doesn't need to be memoized. The state of the next fiber is preserved even if you pass a new function instance every time.

Unlike React-style render props, then props can use hooks, because they run on an independent next fiber called Resume(…). This fiber will be re-run when the captured values change, and can do so without re-running Capture itself. A then prop can render new elements, building a chain of next fibers. This acts like a rewindable generator, where each Resume provides a place where the code can be re-entered, without having to explicitly rewind any state. This requires the data passed into each closure to be immutable.

The logic for providing or capturing is in provideFiber(fiber, ...) and captureFiber(fiber, ...). Unlike other built-ins, these are always mounted separately and are called at the start of a new fiber, not the end of the previous one. Their children are then immediately reconciled by inlineFiberCall(fiber, calls).

Gathers and yeets

Live offers a true return, in the form of yeet(value) (aka <Yeet>{value}</Yeet>). This passes a value back to a parent. These values are gathered in an incremental map-reduce along the tree, to a root that mounted a gathering operation. It's similar to a Capture, except it visits every parent along the way. It's the complement to tree expansion during rendering.

This works for any mapper and reducer function via <MapReduce>. There is also an optimized code path for a simple array flatMap <Gather>, as well as struct-of-arrays flatMap <MultiGather>. It works just like a capture:

<Gather
  children={...}
  then={(
    value: T[]
  ) => {
    ...
  }}
/>

<MultiGather
  children={...}
  then={(
    value: Record<string, T[]>
  ) => {
    ...
  }}
/>

Each fiber in a reduction has a fiber.yeeted structure, created at mount time. Like a context, this relation never changes for the lifetime of the component. It acts as a persistent cache for a yeeted value of type A and its map-reduction reduced of type B:

{
  yeeted: {
    // Same as fiber (for inspecting)
    id: number,

    // Reduction cache at this fiber
    value?: A,
    reduced?: B,

    // Parent yeet cache
    parent?: FiberYeet<A, B>,

    // Reduction root
    root: LiveFiber,

    // ...
  },
}

The last value yeeted by the fiber is kept so that all yeets are auto-memoized. Each yeeted points to a parent. This is not the parent fiber but its fiber.yeeted. This is the parent reduction, which is downstream in terms of data dependency, not upstream.
This forms a mirrored copy of the fiber tree and respects one-way data flow: Again the linked root fiber (sink) is not an ancestor, but the next of an ancestor, created to receive the final reduced value. If the reduced value is undefined, this signifies an empty cache. When a value is yeeted, parent caches are busted recursively towards the root, until an undefined is encountered. If a fiber mounts or unmounts children, it busts its reduction as well. Fibers that yeet a value cannot also have children. This isn't a limitation because you can render a yeet beside other children, as just another mount, without changing the semantics. You can also render multiple yeets, but it's faster to just yeet a single list. If you yeet undefined, this acts as a zero-cost signal: it does not affect the reduced values, but it will cause the reducing root fiber to be re-invoked. This is a tiny concession to imperative semantics, wildly useful. This may seem very impure, but actually it's the opposite. With clean, functional data types, there is usually a "no-op" value that you could yeet: an empty array or dictionary, an empty function, and so on. You can always force-refresh a reduction without meaningfully changing the output, but it causes a lot of pointless cache invalidation in the process. Zero-cost signals are just an optimization. When reducing a fiber that has a gathering next, it takes precedence over the fiber's own reduction: this is so that you can gather and reyeet in series, with the final reduction returned. Fences and suspend The specifics of a gathering operation are hidden behind a persistent emit and gather callback, derived from a classic map and reduce: { yeeted: { // ... // Emit a value A yeeted from fiber emit: (fiber: LiveFiber, value: A) => void, // Gather a reduction B from fiber gather: (fiber: LiveFiber, self: boolean) => B, // Enclosing yeet scope scope?: FiberYeet<any, any>, }, } Gathering is done by the root reduction fiber, so gather is not strictly needed here. It's only exposed so you can mount a <Fence> inside an existing reduction, without knowing its specifics. A fence will grab the intermediate reduction value at that point in the tree and pass it to user-land. It can then be reyeeted. One such use is to mimic React Suspense using a special toxic SUSPEND symbol. It acts like a NaN, poisoning any reduction it's a part of. You can then fence off a sub-tree to contain the spill and substitute it with its previous value or a fallback. In practice, gather will delegate to one of gatherFiberValues, multiGatherFiberValues or mapReduceFiberValues. Each will traverse the sub-tree, reuse any existing reduced values (stopping the recursion early), and fill in any undefineds via recursion. Their code is kinda gnarly, given that it's just map-reduce, but that's because they're hand-rolled to avoid useless allocations. The self argument to gather is such an optimization, only true for the final user-visible reduction. This lets intermediate reductions be type unsafe, e.g. to avoid creating pointless 1 element arrays. At a gathering root, the enclosing yeet scope is also kept. This is to cleanly unmount an inlined gather, by restoring the parent's yeeted. Quotes and reconcilers Live has a reconciler in reconcileFiberCalls, but it can also mount <Reconciler> as an effect via mountFiberReconciler. This is best understood by pretending this is React DOM. 
When you render a React tree which mixes <Components> with <html>, React reconciles it, and extracts the HTML parts into a new tree:

<App>                        <div>
  <Layout>          =>         <div>
    <div>                        <span>
      <div>                      <img>
        <Header>
          <span>
        <Logo>
          <img>

Each HTML element is implicitly quoted inside React. They're only "activated" when they become real on the right. The ones on the left are only stand-ins.

That's also what a Live <Reconcile> does. It mounts a normal tree of children, but it simultaneously mounts an independent second tree, under its next mount. If you render this:

<App>
  <Reconcile>
    <Layout>
      <Quote>
        <Div>
          <Div>
            <Unquote>
              <Header>
                <Quote>
                  <Span>
                    <Unquote>
                      <Logo>
                        <Quote>
                          <Img />
                            ...

You will get two trees: <Layout>, <Header> and <Logo> mounted as a normal tree of components, and the quoted <Div>, <Div>, <Span> and <Img /> mounted as a second tree under the reconciler's next mount.

It adds a quote environment to the fiber:

{
  // ...
  quote: {
    // Reconciler fiber
    root: number,

    // Fiber in current sub-tree
    from: number,

    // Fiber in other sub-tree
    to: LiveFiber,

    // Enclosing reconciler scope
    scope?: FiberQuote,
  }
}

When you render a <Quote>...</Quote>, whatever's inside ends up mounted on the to fiber. Quoted fibers will have a similar fiber.unquote environment. If they render an <Unquote>...</Unquote>, the children are mounted back on the quoting fiber. Each time, the quoting or unquoting fiber becomes the new to fiber on the other side. The idea is that you can use this to embed one set of components inside another as a DSL, and have the run-time sort them out.

This all happens in mountFiberQuote(…) and mountFiberUnquote(…). It uses reconcileFiberCall(…) (singular). This is an incremental version of reconcileFiberCalls(…) (plural) which only does one mount/unmount at a time. The fiber id of the quote or unquote is used as the key of the quoted or unquoted fiber.

const Queue = ({children}) => (
  reconcile(
    quote(
      gather(
        unquote(children),
        (things: any[]) =>
          <Render things={things} />
      ))));

The quote and unquote environments are separate so that reconcilers can be nested: at any given place, you can unquote 'up' or quote 'down'. Because you can put multiple <Unquote>s inside one <Quote>, it can also fork. The internal non-JSX dialect is very Lisp-esque, you can rap together some pretty neat structures with this.

Because quotes are mounted and unmounted incrementally, there is a data fence Reconcile(…) after each (un)quote. This is where the final set is re-ordered if needed.

The data structure actually violates my own rule about no-reverse links. After you <Quote>, the fibers in the second tree have a link to the quoting fiber which spawned them. And the same applies in the other direction after you <Unquote>. The excuse is ergonomics. I could break the dependency by creating a separate sub-fiber of <Quote> to serve as the unquoting point, and vice versa. But this would bloat both trees with extra fibers, just for purity's sake. It already has unavoidable extra data fences, so this matters.

At a reconciling root, the enclosing quote scope is added to fiber.quote, just like in yeeted, again for clean unmounting of inlined reconcilers.

Unquote-quote

There is an important caveat here. There are two ways you could implement this. One way is that <Quote>...</Quote> is a Very Special built-in, which does something unusual: it would traverse the children tree it was given, and go look for <Unquote>...</Unquote>s inside. It would have to do so recursively, to partition the quoted and unquoted JSX. Then it would have to graft the quoted JSX to a previous quote, while grafting the unquoted parts to itself as mounts. This is the React DOM mechanism, obfuscated.
This is also how quoting works in Lisp: it switches between evaluation mode and AST mode. I have two objections. The first is that this goes against the whole idea of evaluating one component incrementally at a time. It wouldn't be working with one set of mounts on a local fiber: it would be building up args inside one big nested JSX expression. JSX is not supposed to be a mutable data structure, you're supposed to construct it immutably from the inside out, not from the outside in. The second is that this would only work for 'balanced' <Quote>...<Unquote>... pairs appearing in the same JSX expression. If you render: <Present> <Slide /> </Present> ...then you couldn't have <Present> render a <Quote> and <Slide> render an <Unquote> and have it work. It wouldn't be composable as two separate portals. The only way for the quotes/unquotes to be revealed in such a scenario is to actually render the components. This means you have to actively run the second tree as it's being reconciled, same as the first. There is no separate update + commit like in React DOM. This might seem pointless, because all this does is thread the data flow into a zigzag between the two trees, knitting the quote/unquote points together. The render order is the same as if <Quote> and <Unquote> weren't there. The path and depth of quoted fibers reveals this, which is needed to re-render them in the right order later. The key difference is that for all other purposes, those fibers do live in that spot. Each tree has its own stack of nested contexts. Reductions operate on the two separate trees, producing two different, independent values. This is just "hygienic macros" in disguise, I think. Use.GPU's presentation system uses a reconciler to wrap the layout system, adding slide transforms and a custom compositor. This is sandwiched in-between it and the normal renderer. A plain declarative tree of markup can be expanded into: <Device> <View> <Present> <Slide> <Object /> <Object /> </Slide> <Slide> <Object /> <Object /> </Slide> </Present> </View> </Device> I also use a reconciler to produce the WebGPU command queue. This is shared for an entire app and sits at the top. The second tree just contains quoted yeets. I use zero-cost signals here too, to let data sources signal that their contents have changed. There is a short-hand <Signal /> for <Quote><Yeet /></Quote>. Note that you cannot connect the reduction of tree 1 to the root of tree 2: <Reconcile> does not have a then prop. It doesn't make sense because the next fiber gets its children from elsewhere, and it would create a rendering cycle if you tried anyway. If you need to spawn a whole second tree based on a first, that's what a normal gather already does. You can use it to e.g. gather lambdas that return memoized JSX. This effectively acts as a two-phase commit. The Use.GPU layout system does this repeatedly, with several trees + gathers in a row. It involves constraints both from the inside out and the outside in, so you need both tree directions. The output is UI shapes, which need to be batched together for efficiency and turned into a data-driven draw call. The Run-Time With all the pieces laid out, I can now connect it all together. Before render(<App />) can render the first fiber, it initializes a very minimal run-time. So this section will be kinda dry. 
This is accessed through fiber.host and exposes a handful of APIs: a queue of pending state changes a priority queue for traversal a fiber-to-fiber dependency tracker a resource disposal tracker a stack slicer for reasons State changes When a setState is called, the state change is added to a simple queue as a lambda. This allows simultaneous state changes to be batched together. For this, the host exposes a schedule and a flush method. { // ... host: { schedule: (fiber: LiveFiber, task?: () => boolean | void) => void, flush: () => void, // ... } This comes from makeActionScheduler(…). It wraps a native scheduling function (e.g. queueMicrotask) and an onFlush callback: const makeActionScheduler = ( schedule: (flush: ArrowFunction) => void, onFlush: (fibers: LiveFiber[]) => void, ) => { // ... return {schedule, flush}; } The callback is set up by render(…). It will take the affected fibers and call renderFibers(…) (plural) on them. The returned schedule(…) will trigger a flush, so flush() is only called directly for sync execution, to stay within the same render cycle. Traversal The host keeps a priority queue (makePriorityQueue) of pending fibers to render, in tree order: { // ... host: { // ... visit: (fiber: LiveFiber) => void, unvisit: (fiber: LiveFiber) => void, pop: () => LiveFiber | null, peek: () => LiveFiber | null, } } renderFibers(…) first adds the fibers to the queue by calling host.visit(fiber). A loop in renderFibers(…) will then call host.peek() and host.pop() until the queue is empty. It will call renderFiber(…) and updateFiber(…) on each, which will call host.unvisit(fiber) in the process. This may also cause other fibers to be added to the queue. The priority queue is a singly linked list of fibers. It allows fast appends at the start or end. To speed up insertions in the middle, it remembers the last inserted fiber. This massively speeds up the very common case where multiple fibers are inserted into an existing queue in tree order. Otherwise it just does a linear scan. It also has a set of all the fibers in the queue, so it can quickly do presence checks. This means visit and unvisit can safely be called blindly, which happens a lot. // Re-insert all fibers that descend from fiber const reorder = (fiber: LiveFiber) => { const {path} = fiber; const list: LiveFiber[] = []; let q = queue; let qp = null; while (q) { if (compareFibers(fiber, q.fiber) >= 0) { hint = qp = q; q = q.next; continue; } if (isSubNode(fiber, q.fiber)) { list.push(q.fiber); if (qp) { qp.next = q.next; q = q.next; } else { pop(); q = q.next; } } break; } if (list.length) { list.sort(compareFibers); list.forEach(insert); } }; There is an edge case here though. If a fiber re-orders its keyed children, the compareFibers fiber order of those children changes. But, because of long-range dependencies, it's possible for those children to already be queued. This might mean a later cousin node could render before an earlier one, though never a child before a parent or ancestor. In principle this is not an issue because the output—the reductions being gathered—will be re-reduced in new order at a fence. From a pure data-flow perspective, this is fine: it would even be inevitable in a multi-threaded version. In practice, it feels off if code runs out of order for no reason, especially in a dev environment. So I added optional queue re-ordering, on by default. 
This can be done pretty easily because the affected fibers can be found by comparing paths, and still form a single group inside the otherwise ordered queue: scan until you find a fiber underneath the parent, then pop off fibers until you exit the subtree. Then just reinsert them. This really reminds me of shader warp reordering in raytracing GPUs btw. Dependencies To support contexts and captures, the host has a long-range dependency tracker (makeDependencyTracker): { host: { // ... depend: (fiber: LiveFiber, root: number) => boolean, undepend: (fiber: LiveFiber, root: number) => void, traceDown: (fiber: LiveFiber) => LiveFiber[], traceUp: (fiber: LiveFiber) => number[], } }; It holds two maps internally, each mapping fibers to fibers, for precedents and descendants respectively. These are mapped as LiveFiber -> id and id -> LiveFiber, once again following the one-way rule. i.e. It gives you real fibers if you traceDown, but only fiber IDs if you traceUp. The latter is only used for highlighting in the inspector. The depend and undepend methods are called by useContext and useCapture to set up a dependency this way. When a fiber is rendered (and did not memoize), bustFiberDeps(…) is called. This will invoke traceDown(…) and call host.visit(…) on each dependent fiber. It will also call bustFiberMemo(…) to bump their fiber.version (if present). Yeets could be tracked the same way, but this is unnecessary because yeeted already references the root statically. It's a different kind of cache being busted too (yeeted.reduced) and you need to bust all intermediate reductions along the way. So there is a dedicated visitYeetRoot(…) and bustFiberYeet(…) instead. Yeets are actually quite tricky to manage because there are two directions of traversal here. A yeet must bust all the caches towards the root. Once those caches are busted, another yeet shouldn't traverse them again until filled back in. It stops when it encounters undefined. Second, when the root gathers up the reduced values from the other end, it should be able to safely accept any defined yeeted.reduced as being correctly cached, and stop as well. The invariant to be maintained is that a trail of yeeted.reduced === undefined should always lead all the way back to the root. New fibers have an undefined reduction, and old fibers may be unmounted, so these operations also bust caches. But if there is no change in yeets, you don't need to reduce again. So visitYeetRoot is not actually called until and unless a new yeet is rendered or an old yeet is removed. Managing the lifecycle of this is simple, because there is only one place that triggers a re-reduction to fill it back in: the yeet root. Which is behind a data fence. It will always be called after the last cache has been busted, but before any other code that might need it. It's impossible to squeeze anything in between. It took a while to learn to lean into this style of thinking. Cache invalidation becomes a lot easier when you can partition your program into "before cache" and "after cache". Compared to the earliest versions of Live, the how and why of busting caches is now all very sensible. You use immutable data, or you pass a mutable ref and a signal. It always works. Resources The useResource hook lets a user register a disposal function for later. useContext and useCapture also need to dispose of their dependency when unmounted. For this, there is a disposal tracker (makeDisposalTracker) which effectively acts as an onFiberDispose event listener: { host: { // ... 
// Add/remove listener track: (fiber: LiveFiber, task: Task) => void, untrack: (fiber: LiveFiber, task: Task) => void, // Trigger listener dispose: (fiber: LiveFiber) => void, } } Disposal tasks are triggered by host.dispose(fiber), which is called by disposeFiber(fiber). The latter will also set fiber.bound to undefined so the fiber can no longer be called. A useResource may change during a fiber's lifetime. Rather than repeatedly untrack/track a new disposal function each time, I store a persistent resource tag in the hook state. This holds a reference to the latest disposal function. Old resources are explicitly disposed of before new ones are created, ensuring there is no overlap. Stack Slicing A React-like is a recursive tree evaluator. A naive implementation would use function recursion directly, using the native CPU stack. This is what Live 0.0.1 did. But the run-time has overhead, with its own function calls sandwiched in between (e.g. updateFiber, reconcileFiberCalls, flushMount). This creates towering stacks. It also cannot be time-sliced, because all the state is on the stack. In React this is instead implemented with a flat work queue, so it only calls into one component at a time. A profiler shows it repeatedly calling performUnitOfWork, beginWork, completeWork in a clean, shallow trace. Live could do the same with its fiber priority queue. But the rendering order is always just tree order. It's only interrupted and truncated by memoization. So the vast majority of the time you are adding a fiber to the front of the queue only to immediately pop it off again. The queue is a linked list so it creates allocation overhead. This massively complicates what should just be a native function call. Live says "¿Por qué no los dos?" and instead has a stack slicing mechanism (makeStackSlicer). It will use the stack, but stop recursion after N levels, where N is a global knob that current sits at 20. The left-overs are enqueued. This way, mainly fibers pinged by state changes and long-range dependency end up in the queue. This includes fenced continuations, which must always be called indirectly. If a fiber is in the queue, but ends up being rendered in a parent's recursion, it's immediately removed. { host: { // ... depth: (depth: number) => void, slice: (depth: number) => boolean, }, } When renderFibers gets a fiber from the queue, it calls host.depth(fiber.depth) to calibrate the slicer. Every time a mount is flushed, it will then call host.slice(mount.depth) to check if it should be sliced off. If so, it calls host.visit(…) to add it to the queue, but otherwise it just calls renderFiber / updateFiber directly. The exception is when there is a data fence, when the queue is always used. Here too there is a strict mode, on by default, which ensures that once the stack has been sliced, no further sync evaluation can take place higher up the stack. One-phase commit Time to rewind. A Live app consists of a tree of such fiber objects, all exactly the same shape, just with different state and environments inside. It's rendered in a purely one-way data flow, with only a minor asterisk on that statement. The host is the only thing coordinating, because it's the only thing that closes the cycle when state changes. This triggers an ongoing traversal, during which it only tells fibers which dependencies to ping when they render. Everything else emerges from the composition of components. Hopefully you can appreciate that Live is not actually Cowboy React, but something else and deliberate. 
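Pulled together, the host surface described in this section looks roughly like this (a consolidated sketch of only the methods named above; Task is the disposal callback type from the resource tracker):

type LiveHost = {
  // State changes
  schedule: (fiber: LiveFiber, task?: () => boolean | void) => void,
  flush: () => void,

  // Traversal queue
  visit: (fiber: LiveFiber) => void,
  unvisit: (fiber: LiveFiber) => void,
  pop: () => LiveFiber | null,
  peek: () => LiveFiber | null,

  // Long-range dependencies
  depend: (fiber: LiveFiber, root: number) => boolean,
  undepend: (fiber: LiveFiber, root: number) => void,
  traceDown: (fiber: LiveFiber) => LiveFiber[],
  traceUp: (fiber: LiveFiber) => number[],

  // Resource disposal
  track: (fiber: LiveFiber, task: Task) => void,
  untrack: (fiber: LiveFiber, task: Task) => void,
  dispose: (fiber: LiveFiber) => void,

  // Stack slicing
  depth: (depth: number) => void,
  slice: (depth: number) => boolean,
};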
Live has its own invariants it's enforcing, and its own guarantees you can rely on. Like React, it has a strict and a non-strict mode that are meaningfully different, though the strictness is not about it nagging you, but about how anally the run-time will reproduce your exact declared intent. It does not offer any way to roll back partial state changes once made, unlike React. This idempotency model of rendering is good when you need to accommodate mutable references in a reactive system. Immediate mode APIs tend to use these, and Live is designed to be plugged in to those.

The nice thing about Live is that it's often meaningful to suspend a partially rendered sub-tree without rewinding it back to the old state, because its state doesn't represent anything directly, like HTML does. It's merely reduced into a value, and you can simply re-use the old value until it has unsuspended. There is no need to hold on to all the old state of the components that produced it. If the value being gathered is made of lambdas, you have your two phases: the commit consists of calling them once you have a full set.

In Use.GPU, you work with memory on the GPU, which you allocate once and then reuse by reference. The entire idea is that the view can re-render without re-rendering all components that produced it, the same way that a browser can re-render a page by animating only the CSS transforms. So I have to be all-in on mutability there, because updated transforms have to travel through the layout system without re-triggering it. I also use immediate mode for the CPU-side interactions, because I've found it makes UI controllers 2-3x less complicated. One interesting aspect here is that the difference between capturing and bubbling events, i.e. outside-in or inside-out, is just before-fence and after-fence.

Live is also not a React alternative: it plays very nicely with React. You can nest one inside the other and barely notice. The Live inspector is written in React, because I needed it to work even if Live was broken. It can memoize effectively in React because Live is memoized. Therefore everything it shows you is live, including any state you open up. The inspector is functionality-first so I throw purity out the window and just focus on UX and performance.

It installs a host.__ping callback so it can receive fiber pings from the run-time whenever they re-render. The run-time calls this via pingFiber in the right spots. Individual fibers can make themselves inspectable by adding undocumented/private props to fiber.__inspect. There are some helper hooks to make this prettier but that's all. You can make any component inspector-highlightable by having it re-render itself when highlighted.

* * *

Writing this post was a fun endeavour, prompting me to reconsider some assumptions from early on. I also fixed a few things that just sounded bad when said out loud. You know how it is. I removed some lingering unnecessary reverse fiber references. I was aware they weren't load bearing, but that's still different from not having them at all. The only one I haven't mentioned is the capture keys, which are a fiber so that they can be directly compared. In theory it only needs the id, path, depth, keys, and I could package those up separately, but it would just create extra objects, so the jury allows it.

Live can model programs shaped like a one-way data flow, and generates one-way data itself. There are some interesting correspondences here.
Live keep state entirely in fiber objects, while fibers run entirely on fiber.state. A fiber object is just a fixed dictionary of properties, always the same shape, just like fiber.state is for a component's lifetime. Children arrays without keys must be fixed-length and fixed-order (a fragment), but may have nulls. This is very similar to how no-hooks will skip over a missing spot in the fiber.state array and zero out the hook, so as to preserve hook order. Live hot-swaps a global currentFiber pointer to switch fibers, and useYolo hot-swaps a fiber's own local state to switch hook scopes. Memoizing a component can be implemented as a nested useMemo. Bumping the fiber version is really a bespoke setState which is resolved during next render. The lines between fiber, fiber.state and fiber.mounts are actually pretty damn blurry. A lot of mechanisms appear twice, once in a non-incremental form and once in an incremental form. Iteration turns into mounting, sequences turn into fences, and objects get chopped up into fine bits of cached state, either counted or with keys. The difference between hooks and a gather of unkeyed components start to blur. If Live is react-react, then a self-hosted live-live is hiding in there somewhere. Create a root fiber, give it empty state, off you go. Inlining would be a lot harder though, and you wouldn't be able to hand-roll fast paths as easily, which is always the problem in FP. For a JS implementation it would be very dumb, especially when you know that the JS VM already manages object prototypes incrementally, mounting one prop at a time. I do like the sound of an incremental Lisp where everything is made out of flat state lists instead of endless pointer chasing. If it had the same structure as Live, it might only have one genuine linked list driving it all: the priority queue, which holds elements pointing to elements. The rest would be elements pointing to linear arrays, a data structure that silicon caches love. A data-oriented Lisp maybe? You could even call it an incremental GPU. Worth pondering. What Live could really use is a WASM pony with better stack access and threading. But in the mean time, it already works. The source code for the embedded examples can be found on GitLab. If your browser can do WebGPU (desktop only for now), you can load up any of the Use.GPU examples and inspect them.


More in programming

Diagnosis in engineering strategy.

Once you’ve written your strategy’s exploration, the next step is working on its diagnosis. Diagnosis is understanding the constraints and challenges your strategy needs to address. In particular, it’s about doing that understanding while slowing yourself down from deciding how to solve the problem at hand before you know the problem’s nuances and constraints. If you ever find yourself wanting to skip the diagnosis phase–let’s get to the solution already!–then maybe it’s worth acknowledging that every strategy that I’ve seen fail, did so due to a lazy or inaccurate diagnosis. It’s very challenging to fail with a proper diagnosis, and almost impossible to succeed without one. The topics this chapter will cover are: Why diagnosis is the foundation of effective strategy, on which effective policy depends. Conversely, how skipping the diagnosis phase consistently ruins strategies A step-by-step approach to diagnosing your strategy’s circumstances How to incorporate data into your diagnosis effectively, and where to focus on adding data Dealing with controversial elements of your diagnosis, such as pointing out that your own executive is one of the challenges to solve Why it’s more effective to view difficulties as part of the problem to be solved, rather than a blocking issue that prevents making forward progress The near impossibility of an effective diagnosis if you don’t bring humility and self-awareness to the process Into the details we go! This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts. Diagnosis is strategy’s foundation One of the challenges in evaluating strategy is that, after the fact, many effective strategies are so obvious that they’re pretty boring. Similarly, most ineffective strategies are so clearly flawed that their authors look lazy. That’s because, as a strategy is operated, the reality around it becomes clear. When you’re writing your strategy, you don’t know if you can convince your colleagues to adopt a new approach to specifying APIs, but a year later you know very definitively whether it’s possible. Building your strategy’s diagnosis is your attempt to correctly recognize the context that the strategy needs to solve before deciding on the policies to address that context. Done well, the subsequent steps of writing strategy often feel like an afterthought, which is why I think of diagnosis as strategy’s foundation. Where exploration was an evaluation-free activity, diagnosis is all about evaluation. How do teams feel today? Why did that project fail? Why did the last strategy go poorly? What will be the distractions to overcome to make this new strategy successful? That said, not all evaluation is equal. If you state your judgment directly, it’s easy to dispute. An effective diagnosis is hard to argue against, because it’s a web of interconnected observations, facts, and data. Even for folks who dislike your conclusions, the weight of evidence should be hard to shift. Strategy testing, explored in the Refinement section, takes advantage of the reality that it’s easier to diagnose by doing than by speculating. It proposes a recursive diagnosis process until you have real-world evidence that the strategy is working. How to develop your diagnosis Your strategy is almost certain to fail unless you start from an effective diagnosis, but how to build a diagnosis is often left unspecified. 
That’s because, for most folks, building the diagnosis is indeed a dark art: unspecified, undiscussion, and uncontrollable. I’ve been guilty of this as well, with The Engineering Executive’s Primer’s chapter on strategy staying silent on the details of how to diagnose for your strategy. So, yes, there is some truth to the idea that forming your diagnosis is an emergent, organic process rather than a structured, mechanical one. However, over time I’ve come to adopt a fairly structured approach: Braindump, starting from a blank sheet of paper, write down your best understanding of the circumstances that inform your current strategy. Then set that piece of paper aside for the moment. Summarize exploration on a new piece of paper, review the contents of your exploration. Pull in every piece of diagnosis from similar situations that resonates with you. This is true for both internal and external works! For each diagnosis, tag whether it fits perfectly, or needs to be adjusted for your current circumstances. Then, once again, set the piece of paper aside. Mine for distinct perspectives on yet another blank page, talking to different stakeholders and colleagues who you know are likely to disagree with your early thinking. Your goal is not to agree with this feedback. Instead, it’s to understand their view. The Crux by Richard Rumelt anchors diagnosis in this approach, emphasizing the importance of “testing, adjusting, and changing the frame, or point of view.” Synthesize views into one internally consistent perspective. Sometimes the different perspectives you’ve gathered don’t mesh well. They might well explicitly differ in what they believe the underlying problem is, as is typical in tension between platform and product engineering teams. The goal is to competently represent each of these perspectives in the diagnosis, even the ones you disagree with, so that later on you can evaluate your proposed approach against each of them. When synthesizing feedback goes poorly, it tends to fail in one of two ways. First, the author’s opinion shines through so strongly that it renders the author suspect. Your goal is never to agree with every team’s perspective, just as your diagnosis should typically avoid crowning any perspective as correct: a reader should generally be appraised of the details and unaware of the author. The second common issue is when a group tries to jointly own the synthesis, but create a fractured perspective rather than a unified one. I generally find that having one author who is accountable for representing all views works best to address both of these issues. Test drafts across perspectives. Once you’ve written your initial diagnosis, you want to sit down with the people who you expect to disagree most fervently. Iterate with them until they agree that you’ve accurately captured their perspective. It might be that they disagree with some other view points, but they should be able to agree that others hold those views. They might argue that the data you’ve included doesn’t capture their full reality, in which case you can caveat the data by saying that their team disagrees that it’s a comprehensive lens. Don’t worry about getting the details perfectly right in your initial diagnosis. You’re trying to get the right crumbs to feed into the next phase, strategy refinement. Allowing yourself to be directionally correct, rather than perfectly correct, makes it possible to cover a broad territory quickly. 
Getting caught up in perfecting details is an easy way to anchor yourself into one perspective prematurely. At this point, I hope you’re starting to predict how I’ll conclude any recipe for strategy creation: if these steps feel overly mechanical to you, adjust them to something that feels more natural and authentic. There’s no perfect way to understand complex problems. That said, if you feel uncertain, or are skeptical of your own track record, I do encourage you to start with the above approach as a launching point. Incorporating data into your diagnosis The strategy for Navigating Private Equity ownership’s diagnosis includes a number of details to help readers understand the status quo. For example the section on headcount growth explains headcount growth, how it compares to the prior year, and providing a mental model for readers to translate engineering headcount into engineering headcount costs: Our Engineering headcount costs have grown by 15% YoY this year, and 18% YoY the prior year. Headcount grew 7% and 9% respectively, with the difference between headcount and headcount costs explained by salary band adjustments (4%), a focus on hiring senior roles (3%), and increased hiring in higher cost geographic regions (1%). If everyone evaluating a strategy shares the same foundational data, then evaluating the strategy becomes vastly simpler. Data is also your mechanism for supporting or critiquing the various views that you’ve gathered when drafting your diagnosis; to an impartial reader, data will speak louder than passion. If you’re confident that a perspective is true, then include a data narrative that supports it. If you believe another perspective is overstated, then include data that the reader will require to come to the same conclusion. Do your best to include data analysis with a link out to the full data, rather than requiring readers to interpret the data themselves while they are reading. As your strategy document travels further, there will be inevitable requests for different cuts of data to help readers understand your thinking, and this is somewhat preventable by linking to your original sources. If much of the data you want doesn’t exist today, that’s a fairly common scenario for strategy work: if the data to make the decision easy already existed, you probably would have already made a decision rather than needing to run a structured thinking process. The next chapter on refining strategy covers a number of tools that are useful for building confidence in low-data environments. Whisper the controversial parts At one time, the company I worked at rolled out a bar raiser program styled after Amazon’s, where there was an interviewer from outside the team that had to approve every hire. I spent some time arguing against adding this additional step as I didn’t understand what we were solving for, and I was surprised at how disinterested management was about knowing if the new process actually improved outcomes. What I didn’t realize until much later was that most of the senior leadership distrusted one of their peers, and had rolled out the bar raiser program solely to create a mechanism to control that manager’s hiring bar when the CTO was disinterested holding that leader accountable. (I also learned that these leaders didn’t care much about implementing this policy, resulting in bar raiser rejections being frequently ignored, but that’s a discussion for the Operations for strategy chapter.) 
This is a good example of a strategy that does make sense with the full diagnosis, but makes little sense without it, and where stating part of the diagnosis out loud is nearly impossible. Even senior leaders are not generally allowed to write a document that says, “The Director of Product Engineering is a bad hiring manager.” When you’re writing a strategy, you’ll often find yourself trying to choose between two awkward options: Say something awkward or uncomfortable about your company or someone working within it Omit a critical piece of your diagnosis that’s necessary to understand the wider thinking Whenever you encounter this sort of debate, my advice is to find a way to include the diagnosis, but to reframe it into a palatable statement that avoids casting blame too narrowly. I think it’s helpful to discuss a few concrete examples of this, starting with the strategy for navigating private equity, whose diagnosis includes: Based on general practice, it seems likely that our new Private Equity ownership will expect us to reduce R&D headcount costs through a reduction. However, we don’t have any concrete details to make a structured decision on this, and our approach would vary significantly depending on the size of the reduction. There are many things the authors of this strategy likely feel about their state of reality. First, they are probably upset about the fact that their new private equity ownership is likely to eliminate colleagues. Second, they are likely upset that there is no clear plan around what they need to do, so they are stuck preparing for a wide range of potential outcomes. However they feel, they don’t say any of that, they stick to precise, factual statements. For a second example, we can look to the Uber service migration strategy: Within infrastructure engineering, there is a team of four engineers responsible for service provisioning today. While our organization is growing at a similar rate as product engineering, none of that additional headcount is being allocated directly to the team working on service provisioning. We do not anticipate this changing. The team didn’t agree that their headcount should not be growing, but it was the reality they were operating in. They acknowledged their reality as a factual statement, without any additional commentary about that statement. In both of these examples, they found a professional, non-judgmental way to acknowledge the circumstances they were solving. The authors would have preferred that the leaders behind those decisions take explicit accountability for them, but it would have undermined the strategy work had they attempted to do it within their strategy writeup. Excluding critical parts of your diagnosis makes your strategies particularly hard to evaluate, copy or recreate. Find a way to say things politely to make the strategy effective. As always, strategies are much more about realities than ideals. Reframe blockers as part of diagnosis When I work on strategy with early-career leaders, an idea that comes up a lot is that an identified problem means that strategy is not possible. For example, they might argue that doing strategy work is impossible at their current company because the executive team changes their mind too often. That core insight is almost certainly true, but it’s much more powerful to reframe that as a diagnosis: if we don’t find a way to show concrete progress quickly, and use that to excite the executive team, our strategy is likely to fail. 
This transforms the thing preventing your strategy into a condition your strategy needs to address. Whenever you run into a reason why your strategy seems unlikely to work, or why strategy overall seems difficult, you’ve found an important piece of your diagnosis to include. There are never reasons why strategy simply cannot succeed, only diagnoses you’ve failed to recognize. For example, we knew in our work on Uber’s service provisioning strategy that we weren’t getting more headcount for the team, the product engineering team was going to continue growing rapidly, and that engineering leadership was unwilling to constrain how product engineering worked. Rather than preventing us from implementing a strategy, those components clarified what sort of approach could actually succeed. The role of self-awareness Every problem of today is partially rooted in the decisions of yesterday. If you’ve been with your organization for any duration at all, this means that you are directly or indirectly responsible for a portion of the problems that your diagnosis ought to recognize. This means that recognizing the impact of your prior actions in your diagnosis is a powerful demonstration of self-awareness. It also suggests that your next strategy’s success is rooted in your self-awareness about your prior choices. Don’t be afraid to recognize the failures in your past work. While changing your mind without new data is a sign of chaotic leadership, changing your mind with new data is a sign of thoughtful leadership. Summary Because diagnosis is the foundation of effective strategy, I’ve always found it the most intimidating phase of strategy work. While I think that’s a somewhat unavoidable reality, my hope is that this chapter has somewhat prepared you for that challenge. The four most important things to remember are simply: form your diagnosis before deciding how to solve it, try especially hard to capture perspectives you initially disagree with, supplement intuition with data where you can, and accept that sometimes you’re missing the data you need to fully understand. The last piece in particular, is why many good strategies never get shared, and the topic we’ll address in the next chapter on strategy refinement.

My friend, JT

I’ve had a cat for almost a third of my life.

[Course Launch] Hands-on Introduction to X86 Assembly

A Live, Interactive Course for Systems Engineers

It’s cool to care

I’m sitting in a small coffee shop in Brooklyn. I have a warm drink, and it’s just started to snow outside. I’m visiting New York to see Operation Mincemeat on Broadway – I was at the dress rehearsal yesterday, and I’ll be at the opening preview tonight. I’ve seen this show more times than I care to count, and I hope US theater-goers love it as much as Brits.

The people who make the show will tell you that it’s about a bunch of misfits who thought they could do something ridiculous, who had the audacity to believe in something unlikely. That’s certainly one way to see it. The musical tells the true story of a group of British spies who tried to fool Hitler with a dead body, fake papers, and an outrageous plan that could easily have failed. Decades later, the show’s creators would mirror that same spirit of unlikely ambition. Four friends, armed with their creativity, determination, and a wardrobe full of hats, created a new musical in a small London theatre. And after a series of transfers, they’re about to open the show under the bright lights of Broadway.

But when I watch the show, I see a story about friendship. It’s about how we need our friends to help us, to inspire us, to push us to be the best versions of ourselves. I see the swaggering leader who needs a team to help him truly achieve. The nervous scientist who stands up for himself with the support of his friends. The enthusiastic secretary who learns wisdom and resilience from her elder.

And so, I suppose, it’s fitting that I’m not in New York on my own. I’m here with friends – dozens of wonderful people who I met through this ridiculous show.

At first, I was just an audience member. I sat in my seat, I watched the show, and I laughed and cried in equal measure. After the show, I waited at stage door to thank the cast. Then I came to see the show a second time. And a third. And a fourth. After a few trips, I started to see familiar faces waiting with me at stage door. So before the cast came out, we started chatting. Those conversations became a Twitter community, then a Discord, then a WhatsApp. We swapped fan art, merch, and stories of our favourite moments. We went to other shows together, and we hung out outside the theatre. I spent New Year’s Eve with a few of these friends, sitting on somebody’s floor and laughing about a bowl of limes like it was the funniest thing in the world.

And now we’re together in New York. Meeting this kind, funny, and creative group of people might seem as unlikely as the premise of Mincemeat itself. But I believed it was possible, and here we are. I feel so lucky to have met these people, to take this ridiculous trip, to share these precious days with them. I know what a privilege this is – the time, the money, the ability to say let’s do this and make it happen. How many people can gather a dozen friends for even a single evening, let alone a trip halfway round the world?

You might think it’s silly to travel this far for a theatre show, especially one we’ve seen plenty of times in London. Some people would never see the same show twice, and most of us are comfortably into double or triple figures. Whenever somebody asks why, I don’t have a good answer. Because it’s fun? Because it’s moving? Because I enjoy it? I feel the need to justify it, as if there’s some logical reason that will make all of this okay. But maybe I don’t have to. Maybe joy doesn’t need justification.

A theatre show doesn’t happen without people who care. Neither does a friendship.
So much of our culture tells us that it’s not cool to care. It’s better to be detached, dismissive, disinterested. Enthusiasm is cringe. Sincerity is weakness. I’ve certainly felt that pressure – the urge to play it cool, to pretend I’m above it all. To act as if I only enjoy something a “normal” amount.

Well, fuck that.

I don’t know where the drive to be detached comes from. Maybe it’s to protect ourselves, a way to guard against disappointment. Maybe it’s to seem sophisticated, as if having passions makes us childish or less mature. Or perhaps it’s about control – if we stay detached, we never have to depend on others, we never have to trust in something bigger than ourselves. Being detached means you can’t get hurt – but you’ll also miss out on so much joy.

I’m a big fan of being a big fan of things. So many of the best things in my life have come from caring, from letting myself be involved, from finding people who are a big fan of the same things as me. If I pretended not to care, I wouldn’t have any of that. Caring – deeply, foolishly, vulnerably – is how I connect with people.

My friends and I care about this show, we care about each other, and we care about our joy. That care and love for each other is what brought us together, and without it we wouldn’t be here in this city. I know this is a once-in-a-lifetime trip. So many stars had to align – for us to meet, for the show we love to be successful, for us to be able to travel together. But if we didn’t care, none of those stars would have aligned.

I know so many other friends who would have loved to be here but can’t be, for all kinds of reasons. Their absence isn’t for lack of caring, and they want the show to do well whether or not they’re here. I know they care, and that’s the important thing.

To butcher Tennyson: I think it’s better to care about something you cannot affect, than to care about nothing at all. In a world that’s full of cynicism and spite and hatred, I feel that now more than ever.

I’d recommend you go to the show if you haven’t already, but that’s not really the point of this post. Maybe you’ve already seen Operation Mincemeat, and it wasn’t for you. Maybe you’re not a theatre kid. Maybe you aren’t into musicals, or history, or war stories. That’s okay. I don’t mind if you care about different things to me. (Imagine how boring the world would be if we all cared about the same things!) But I want you to care about something. I want you to find it, find people who care about it too, and hold on to them.

Because right now, in this city, with these people, at this show? I’m so glad I did. And I hope you find that sort of happiness too.

Some of the people who made this trip special. Photo by Chloe, and taken from her Twitter.

Timing note: I wrote this on February 15th, but I delayed posting it because I didn’t want to highlight the fact I was away from home.

Stick with the customer

One of the biggest mistakes that new startup founders make is trying to get away from customer-facing roles too early. Whether it's customer support or sales, it's an incredible advantage to have the founders doing that work directly, and for much longer than they find comfortable. The absolute worst thing you can do is hire a salesperson or a customer service agent too early. You'll miss all the golden nuggets that customers throw at you for free when they're rejecting your pitch or complaining about the product. Seeing these reasons paraphrased or summarized destroys all the nutrients in their insights. You want that whole-grain feedback straight from the customer's mouth!

When we launched Basecamp in 2004, Jason was doing all the customer service himself. And he kept doing it like that for three years!! By the time we hired our first customer service agent, Jason was doing 150 emails/day. The business was doing millions of dollars in ARR. And Basecamp got infinitely better, both as a market proposition and as a product, because Jason could funnel all that feedback into decisions and positioning.

For a long time after that, we did "Everyone on Support", frequently rotating programmers, designers, and founders through a day of answering emails directly to customers. The dividends of doing this were almost as high as having Jason run it all in the early years. We fixed an incredible number of minor niggles and annoying bugs because programmers found it easier to solve the problem than to apologize for why it was there.

It's not easy doing this! Customers often offer their valuable insights wrapped in rude language, unreasonable demands, and bad suggestions. That's why many founders quit the business of dealing with them at the first opportunity. That's why few companies ever do "Everyone on Support". That's why there's such eagerness to reduce support to an AI-only interaction.

But quitting dealing with customers early, not just in support but also in sales, is an incredible handicap for any startup. You don't have to do everything that every customer demands of you, but you should certainly listen to them. And you can't listen well if the sound is being muffled by early layers of indirection.
