Frequency-domain blue noise generator

In computer graphics, stochastic methods are so hot right now. All rendering turns into calculus, except you solve the integrals by numerically sampling them. As I showed with Teardown, this is all based on random noise, hidden with a ton of spatial and temporal smoothing. For this, you need a good source of high quality noise. There have been a few interesting developments in this area, such as Alan Wolfe et al.'s Spatio-Temporal Blue Noise. This post is about how I designed noise in frequency space. I will cover:

- What is blue noise?
- Designing indigo noise
- How swap works in the frequency domain
- Heuristics and analysis to speed up search
- Implementing it in WebGPU

Along the way I will also show you some "street" DSP math. This illustrates how...
a year ago

More from Acko.net

The Bouquet Residence

Keeping up appearances in tech

I saw a remarkable pair of tweets the other day. In the wake of the outage, the CEO of CrowdStrike sent out a public announcement. It's purely factual. The scope of the problem is identified, the known facts are stated, and the logistics of disaster relief are set in motion. Millions of computers were affected.

This is the equivalent of a frazzled official giving a brief statement in the aftermath of an earthquake, directing people to the Red Cross. Everything is basically on fire for everyone involved. Systems are failing everywhere, some critical, and quite likely people are panicking. The important thing is to give the technicians the information and tools to fix it, and for everyone else to do what they can, and stay out of the way.

In response, a communication professional posted an 'improved' version. Credit where credit is due, she nailed the style. 10/10. It seems unobjectionable, at first. Let's go through, shall we?

Opposite Day

First is that the CEO is "devastated." A feeling. And they are personally going to ensure it's fixed for every single user. This focuses on the individual who is inconvenienced. Not the disaster. They take a moment out of their time to say they are so, so sorry a mistake was made. They have let you and everyone else down, and that shouldn't happen. That's their responsibility.

By this point, the original statement had already told everyone the relevant facts. Here the technical details are left to the imagination. The writer's self-assigned job is to wrap the message in a more palatable envelope. Everyone will be working "all day, all night, all weekends," indeed, "however long it takes," to avoid it happening again.

I imagine this is meant to be inspiring and reassuring. But if I were a CrowdStrike technician or engineer, I would find it demoralizing: the boss, who will actually be personally fixing diddly-squat, is saying that the long hours of others are a sacrifice they're willing to make. Plus, CrowdStrike's customers are in the same boat: their technicians get volunteered too. They can't magically unbrick PCs from a distance, so "until it's fully fixed for every single user" would be a promise outsiders will have to keep. Lovely.

There's even a punch line: an invitation to go contact them, the quickest way linked directly. It thanks people for reaching out. If everything is on fire, that includes the phone lines, the inboxes, and so on. The most stupid thing you could do in such a situation is to tell more people to contact you, right away. Don't encourage it! That's why the original statement refers to pre-existing lines of communication, internal representatives, and so on. The Support department would hate the CEO too.

Root Cause

If you're wondering about the pictures, it's Hyacinth Bucket, from 90s UK sitcom Keeping Up Appearances, who would always insist "it's pronounced Bouquet." Hyacinth's ambitions always landed her out of her depth, surrounded by upper-class people she's trying to impress, in the midst of an embarrassing disaster. Her increasingly desperate attempts to save face, which invariably made things worse, are the main source of comedy.

Try reading that second statement in her voice.

"I'm devastated to see the scale of today's outage and will be personally working on it together with our team until it's fully fixed for every single user. But I wanted to take a moment to come here and tell you that I am sorry. People around the world rely on us, and incidents like this can't happen.
This came from an error that ultimately is my responsibility."

I can hear it perfectly, telegraphing Britishness to restore dignity for all. If she were in tech she would give that statement. It's about reputation management first, projecting the image of competence and accountability. But she's giving the speech in front of a burning building, not realizing the entire exercise is futile. Worse, she thinks she's nailing it.

If CrowdStrike had sent this out, some would've applauded and called it an admirable example of wise and empathetic communication. Real leadership qualities. But it's the exact opposite. It focuses on the wrong things, it alienates the staff, and it definitely amplifies the chaos. It's Monty Python-esque.

Apologizing is pointless here, the damage is already done. What matters is how severe it is and whether it could've been avoided. This requires a detailed root-cause analysis and remedy. Otherwise you only have their word. Why would that reassure you? The original restated the company's mission: security and stability. Those are the stakes to regain a modicum of confidence.

You may think that I'm reading too much into this. But I know the exact vibe on an engineering floor when the shit hits the fan. I also know how executives and staff without that experience end up missing the point entirely. I once worked for a Hyacinth Bucket. It's not an anecdote, it's allegory. They simply don't get the engineering mindset, and confuse authority with ownership. They step on everyone's toes without realizing, because they're constantly wearing clown shoes. Nobody tells them.

Softness as a Service

The change in style between #1 and #2 is really a microcosm of the conflict that has been broiling in tech for ~15 years now. I don't mean the politics, but the shifting of norms, of language and behavior. It's framed as a matter of interpersonal style, which needs to be welcoming and inclusive. In practice this means they assert or demand that style #2 be the norm, even when #1 is advisable or required. Factuality is seen as deficient, improper and primitive. It's a form of doublethink: everyone's preference is equally valid, except yours, specifically.

But the difference is not a preference. It's about what actually works and what doesn't. Style #1 is aimed at the people who have to fix it. Style #2 is aimed at the people who can't do anything until it's fixed. Who should they be reaching out to? In #2, communication becomes an end in itself, not a means of conveying information. It's about being seen saying the words, not living them. Poking at the statement makes it fall apart.

When this becomes the norm in a technical field, it has deep consequences:

- Critique must be gift-wrapped in flattery, and is not allowed to actually land.
- Mistakes are not corrected, and sentiment takes precedence over effectiveness.
- Leaders speak lofty words far from the trenches to save face.
- The people they thank the loudest are the ones they pay the least.

Inevitably, quiet competence is replaced with gaudy chaos. Everyone says they're sorry and responsible, but nobody actually is. Nobody wants to resign either. Sound familiar?

Cope and Soothe

The elephant in the room is that #1 is very masculine, while #2 is more feminine. When you hear "women are more empathetic communicators", this is what they mean. They tend to focus on the individual and their relation to them, not the team as a whole and its mission. Complaints that tech is too "male dominated" and "notoriously hostile to women" are often just this.
Tech was always full of types who won't preface their proposals and criticisms with fluff, and instead lean into autism. When you're used to being pandered to, neutrality feels like vulgarity. The notable exceptions are rare and usually have an exasperating lead-up. Tech is actually one of the most accepting and egalitarian fields around. The maintainers do a mostly thankless job.

"Oh so you're saying there's no misogyny in tech?" No, I'm just saying misogyny doesn't mean "something 1 woman hates". The tone is really a distraction. If someone drops an analysis, saying shit or get off the pot, even very kindly and patiently, some will still run away screaming. Like an octopus spraying ink, they'll deploy a nasty form of #2 as a distraction. That's the real issue.

Many techies, in their naiveté, believed the cultural reformers when they showed up to gentrify them. They obediently branded heretics like James Damore, and burned witches like Richard Stallman. Thanks to racism, words like 'master' and 'slave' are now off-limits as technical terms. Ironic, because millions of computers just crashed because they worked exactly like that.

The cope is to pretend that nothing has truly changed yet, and more reform is needed. In fact, everything has already changed. Tech forums used to be crucibles for distilling insight, but now they are guarded jealously by people more likely to flag and ban than strongly disagree.

I once got flagged on HN because I pointed out Twitter's mass lay-offs were a response to overhiring, and that people were rooting for the site to fail after Musk bought it. It suggested what we all know now: that the company would not implode after trimming the dead weight, and that they'd never forgive him for it.

Diversity is now associated with incompetence, because incompetent people have spent over a decade reaching for it as an excuse. In their attempts to fight stereotypes, they ensured the stereotypes came true.

Bait and Snitch

The outcry tends to be: "We do all the same things you do, but still we get treated differently!" But they start from the conclusion and work their way backwards. This is what the rewritten statement does: it tries to fix the relationship before fixing the problem. The average woman and man actually do things very differently in the first place. Individual men and women choose. And others respond accordingly.

The people who build and maintain the world's infrastructure prefer the masculine style for a reason: it keeps civilization running, and helps restore it when it breaks. A disaster announcement does not need to be relatable, it needs to be effective.

Furthermore, if the job of shoveling shit falls on you, no amount of flattery or oversight will make that more pleasant. It really won't. Such commentary is purely for the benefit of the ones watching and trying to look busy. It makes it worse, stop pretending otherwise.

There's little loyalty in tech companies nowadays, and it's no surprise. Project and product managers are acting more like demanding clients to their own team, than leaders. "As a user, I want..." Yes, but what are you going to do about it? Do you even know where to start?

What's perceived as a lack of sensitivity is actually the presence of sensibility. It's what connects the words to the reality on the ground. It does not need to be improved or corrected, it just needs to be respected. And yes it's a matter of gender, because bashing men and masculine norms has become a jolly recreational sport in the overculture. Mature women know it.
It seems impossible to admit. The entire edifice of gender equality depends on there not being a single thing men are actually better at, even just on average. Where men and women's instincts differ, women must be right. It's childish, and not harmless either. It dares you to call it out, so they can then play the wounded victim, and paint you as the unreasonable asshole who is mean. This is supposed to invalidate the argument.

* * *

This post is of course a giant cannon pointing in the opposite direction, sitting on top of a wall. Its message will likely fly over the reformers' heads. If they read it at all, they'll selectively quote or paraphrase, call me a tech-bro, and spool off some sentences they overheard, like an LLM. It's why they adore AI, and want it to be exactly as sycophantic as them. They don't care that it makes stuff up wholesale, because it makes them look and feel competent. It will never tell them to just fuck off already.

Think less about what is said, more about what is being done. Otherwise the next CrowdStrike will probably be worse.

6 months ago
I is for Intent

Why your app turned into spaghetti

There's a certain kind of programmer. Let's call him Stanley. Stanley has been around for a while, and has his fair share of war stories. The common thread is that poorly conceived and specced solutions lead to disaster, or at least, ongoing misery. As a result, he has adopted a firm belief: it should be impossible for his program to reach an invalid state.

Stanley loves strong and static typing. He's a big fan of pattern matching, and enums, and discriminated unions, which allow correctness to be verified at compile time. He also has strong opinions on errors, which must be caught, logged and prevented. He uses only ACID-compliant databases, wants foreign keys and triggers to be enforced, and wraps everything in atomic transactions. He hates any source of uncertainty or ambiguity, like untyped JSON or plain-text markup. His APIs will accept data only in normalized and validated form. When you use a Stanley lib, and it doesn't work, the answer will probably be: "you're using it wrong."

Stanley is most likely a back-end or systems-level developer. Because nirvana in front-end development is reached when you understand that this view of software is not just wrong, but fundamentally incompatible with the real world. I will prove it.

State Your Intent

Take a text editor. What happens if you press the up and down arrows? The keyboard cursor (aka caret) moves up and down. Duh. Except it also moves left and right.

The editor state at the start has the caret on line 1 column 6. Pressing down will move it to line 2 column 6. But line 2 is too short, so the caret is forcibly moved left to column 1. Then, pressing down again will move it back to column 6. It should be obvious that any editor that didn't remember which column you were actually on would be a nightmare to use. You know it in your bones. Yet this only works because the editor allows the caret to be placed on a position that "does not exist." What is the caret state in the middle? It is both column 1 and column 6.

To accommodate this, you need more than just a View that is a pure function of a State, as is now commonly taught. Rather, you need an Intent, which is the source of truth that you mutate... and which is then parsed and validated into a State. Only then can it be used by the View to render the caret in the right place.

To edit the intent, aka what a classic Controller does, is a bit tricky. When you press left/right, it should determine the new Intent.column based on the validated State.column +/- 1. But when you press up/down, it should keep the Intent.column you had before and instead change only Intent.line. New intent is a mixed function of both previous intent and previous state. The general pattern is that you reuse Intent if it doesn't change, but that new computed Intent should be derived from State.

Note that you should still enforce normal validation of Intent.column when editing too: you don't allow a user to go past the end of a line. Any new intent should be as valid as possible, but old intent should be preserved as is, even if nonsensical or inapplicable. Functionally, for most of the code, it really does look and feel as if the state is just State, which is valid. It's just that when you make 1 state change, the app may decide to jump into a different State than one would think. When this happens, it means some old intent first became invalid, but then became valid again due to a subsequent intent/state change. This is how applications actually work IRL. FYI.
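As a sketch, the caret logic above might look like this (all names hypothetical; the validated State clamps the caret, while the Intent is preserved as-is):

type Caret = { line: number; column: number };

// Validation clamps the intent to a position that actually exists.
const validate = (intent: Caret, lines: string[]): Caret => {
  const line = Math.max(0, Math.min(intent.line, lines.length - 1));
  const column = Math.max(0, Math.min(intent.column, lines[line].length));
  return { line, column };
};

// New intent is a mixed function of previous intent and previous state:
// the changed field is derived from validated State, the unchanged
// field reuses the old Intent.
const onKey = (key: string, intent: Caret, state: Caret): Caret => {
  switch (key) {
    case 'ArrowLeft':  return { line: intent.line, column: state.column - 1 };
    case 'ArrowRight': return { line: intent.line, column: state.column + 1 };
    case 'ArrowUp':    return { line: state.line - 1, column: intent.column };
    case 'ArrowDown':  return { line: state.line + 1, column: intent.column };
    default: return intent;
  }
};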
Knives and Forks

I chose a text editor as an example because Stanley can't dismiss this as just frivolous UI polish for limp wristed homosexuals. It's essential that editors work like this. The pattern is far more common than most devs realize:

- A tree view remembers the expand/collapse state for rows that are hidden.
- Inspector tabs remember the tab you were on, even if currently disabled or inapplicable.
- Toggling a widget between type A/B/C should remember all the A, B and C options, even if they are mutually exclusive.

All of these involve storing and preserving something unknown, invalid or unused, and bringing it back into play later. More so, if software matches your expected intent, it's a complete non-event. What looks like a "surprise hidden state transition" to a programmer is actually the exact opposite. It would be an unpleasant surprise if that extra state transition didn't occur. It would only annoy users: they already told the software what they wanted, but it keeps forgetting.

The ur-example is how nested popup menus should work: good implementations track the motion of the cursor so you can move diagonally from parent to child, without falsely losing focus. This is an instance of the goalkeeper's curse: people rarely compliment or notice the goalkeeper if they do their job, only if they mess up. Successful applications of this principle are doomed to remain unnoticed and unstudied.

Validation is not something you do once, discarding the messy input and only preserving the normalized output. It's something you do continuously and non-destructively, preserving the mess as much as possible. It's UI etiquette: the unspoken rules that everyone expects but which are mostly undocumented folklore.

This poses a problem for most SaaS in the wild, both architectural and existential. Most APIs will only accept mutations that are valid. The goal is for the database to be a sequence of fully valid states. The smallest possible operation in the system is a fully consistent transaction. This flattens any prior intent.

In practice, a lot of software deviates from this ad-hoc. For example, spreadsheets let you create cyclic references, which is by definition invalid. The reason it must let you do this is because fixing one side of a cyclic reference also fixes the other side. A user wants and needs to do these operations in any order. So you must allow a state transition through an invalid state. This requires an effective Intent/State split, whether formal or informal.

Because cyclic references can go several levels deep, identifying one cyclic reference may require you to spider out the entire dependency graph. This is functionally equivalent to identifying all cyclic references—dixit Dijkstra. Plus, you need to produce sensible, specific error messages. Many "clever" algorithmic tricks fail this test.

Now imagine a spreadsheet API that doesn't allow for any cyclic references ever. This still requires you to validate the entire resulting model, just to determine if 1 change is allowed. It still requires a general validate(Intent). In short, it means your POST and PUT request handlers need to potentially call all your business logic. That seems overkill, so the usual solution is bespoke validators for every single op. If the business logic changes, there is a risk your API will now accept invalid intent. And the app was not built for that. If you flip it around and assume intent will go out-of-bounds as a normal matter, then you never have this risk.
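For instance, the cyclic reference check can be ordinary validation over the whole model, returning the specific cells to flag rather than rejecting the mutation (a sketch, with hypothetical names):

type CellId = string;
type Sheet = Map<CellId, { formula: string; refs: CellId[] }>;

// Accept the intent as-is; report which cells sit on a cycle, so the
// UI can render specific per-cell error messages.
const findCycles = (sheet: Sheet): Set<CellId> => {
  const inCycle = new Set<CellId>();
  const visiting = new Set<CellId>();
  const done = new Set<CellId>();

  const visit = (id: CellId, path: CellId[]) => {
    if (visiting.has(id)) {
      // A back-edge: everything from the first occurrence of id is cyclic.
      for (const cell of path.slice(path.indexOf(id))) inCycle.add(cell);
      return;
    }
    if (done.has(id)) return;
    visiting.add(id);
    for (const ref of sheet.get(id)?.refs ?? []) visit(ref, [...path, id]);
    visiting.delete(id);
    done.add(id);
  };

  for (const id of sheet.keys()) visit(id, []);
  return inCycle;
};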
You can write the validation in one place, and you reuse it for every change as a normal matter of data flow.

Note that this is not cowboy coding. Records and state should not get irreversibly corrupted, because you only ever use valid inputs in computations. If the system is multiplayer, distributed changes should still be well-ordered and/or convergent. But the data structures you're encoding should be, essentially, entirely liberal to your user's needs.

Consider git. Here, a "unit of intent" is just a diff applied to a known revision ID. When something's wrong with a merge, it doesn't crash, or panic, or refuse to work. It just enters a conflict state. This state is computed by merging two incompatible intents. It's a dirty state that can't be turned into a clean commit without human intervention. This means git must continue to work, because you need to use git to clean it up. So git is fully aware when a conflict is being resolved.

As a general rule, the cases where you actually need to forbid a mutation which satisfies all the type and access constraints are small. A good example is trying to move a folder inside itself: the file system has to remain a sensibly connected tree. Enforcing the uniqueness of names is similar, but also comes with a caution: falsehoods programmers believe about names. Adding (Copy) to a duplicate name is usually better than refusing to accept it, and most names in real life aren't unique at all. Having user-facing names actually requires creating tools and affordances for search, renaming references, resolving duplicates, and so on.

Even among front-end developers, few people actually grok this mental model of a user. It's why most React(-like) apps in the wild are spaghetti, and why most blog posts about React gripes continue to miss the bigger picture. Doing React (and UI) well requires you to unlearn old habits and actually design your types and data flow so it uses potentially invalid input as its single source of truth. That way, a one-way data flow can enforce the necessary constraints on the fly.

The way Stanley likes to encode and mutate his data is how programmers think about their own program: it should be bug-free and not crash. The mistake is to think that this should also apply to any sort of creative process that program is meant to enable. It would be like making an IDE that only allows you to save a file if the code compiles and passes all the tests.

Trigger vs Memo

Coding around intent is a very hard thing to teach, because it can seem overwhelming. But what's overwhelming is not doing this. It leads to codebases where every new feature makes ongoing development harder, because no part of the codebase is ever finished. You will sprinkle copies of your business logic all over the place, in the form of request validation, optimistic local updaters, and guess-based cache invalidation. If this is your baseline experience, your estimate of what is needed to pull this off will simply be wrong.

In the traditional MVC model, intent is only handled at the individual input widget or form level. e.g. While typing a number, the intermediate representation is a string. This may be empty, incomplete or not a number, but you temporarily allow that. I've never seen people formally separate Intent from State in an entire front-end. Often their state is just an ad-hoc mix of both, where validation constraints are relaxed in the places where it was most needed. They might just duplicate certain fields to keep a validated and unvalidated variant side by side.
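e.g. the widget-level version, sketched with hypothetical names: the raw string is the intent, and the parsed number is the validated state derived from it.

type NumberField = {
  text: string;          // what the user typed; may be '', '-' or '1e'
  value: number | null;  // validated state, derived from text
};

const setText = (field: NumberField, text: string): NumberField => {
  const parsed = parseFloat(text);
  // Keep the last valid value while the text is mid-edit.
  return { text, value: Number.isFinite(parsed) ? parsed : field.value };
};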
There is one common exception. In a React-like, when you do a useMemo with a derived computation of some state, this is actually a perfect fit. The eponymous useState actually describes Intent, not State, because the derived state is ephemeral. This is why so many devs get lost here.

const state = useMemo(
  () => validate(intent),
  [intent]
);

Their usual instinct is that every action that has knock-on effects should be immediately and fully realized, as part of one transaction. Only, they discover some of those knock-on effects need to be re-evaluated if certain circumstances change. Often to do so, they need to undo and remember what it was before. This is then triggered anew via a bespoke effect, which requires a custom trigger and mutation. If they'd instead deferred the computation, it could have auto-updated itself, and they would've still had the original data to work with.

e.g. In a WYSIWYG scenario, you often want to preview an operation as part of mouse hovering or dragging. It should look like the final result. You don't need to implement custom previewing and rewinding code for this. You just need the ability to layer on some additional ephemeral intent on top of the intent that is currently committed. Rewinding just means resetting that extra intent back to empty. You can make this easy to use by treating previews as a special kind of transaction: now you can make preview states with the same code you use to apply the final change. You can also auto-tag the created objects as being preview-only, which is very useful. That is: you can auto-translate editing intent into preview intent automatically, by messing with the contents of a transaction. Sounds bad, is actually good.

The same applies to any other temporary state, for example, highlighting of elements. Instead of manually changing colors, and creating/deleting labels to pull this off, derive the resolved style just-in-time. This is vastly simpler than doing it all on 1 classic retained model. There, you run the risk of highlights incorrectly becoming sticky, or shapes reverting to the wrong style when un-highlighted. You can architect it so this is simply impossible.

The trigger vs memo problem also happens on the back-end, when you have derived collections. Each object of type A must have an associated type B, created on-demand for each A. What happens if you delete an A? Do you delete the B? Do you turn the B into a tombstone? What if the relationship is 1-to-N, do you need to garbage collect? If you create invisible objects behind the scenes as a user edits, and you never tell them, expect to see a giant mess as a result. It's crazy how often I've heard engineers suggest a user should only be allowed to create something, but then never delete it, as a "solution" to this problem. Everyday undo/redo precludes it. Don't be ridiculous.

The problem is having an additional layer of bookkeeping you didn't need. The source of truth was collection A, but you created a permanent derived collection B. If you instead make B ephemeral, derived via a stateless computation, then the problem goes away. You can still associate data with B records, but you don't treat B as the authoritative source for itself. This is basically what a WeakMap is. In database land this can be realized with a materialized view, which can be incremental and subscribed to. Taken to its extreme, this turns into event-based sourcing, which might seem like a panacea for this mindset. But in most cases, the latter is still a system by and for Stanley.
The event-based nature of those systems exists to support housekeeping tasks like migration, backup and recovery. Users are not supposed to be aware that this is happening. They do not have any view into the event log, and cannot branch and merge it. The exceptions are extremely rare. It's not a system for working with user intent, only for flattening it, because it's append-only. It has a lot of the necessary basic building blocks, but substitutes programmer intent for user intent.

What's most nefarious is that the resulting tech stacks are often quite big and intricate, involving job queues, multi-layered caches, distribution networks, and more. It's a bunch of stuff that Stanley can take joy and pride in, far away from users, with "hard" engineering challenges. Unlike all this *ugh* JavaScript, which is always broken and unreliable and uninteresting. Except it's only needed because Stanley only solved half the problem, badly.

Patch or Bust

When factored in from the start, it's actually quite practical to split Intent from State, and it has lots of benefits. Especially if State is just a more constrained version of the same data structures as Intent. This doesn't need to be fully global either, but it needs to encompass a meaningful document or workspace to be useful.

It does create an additional problem: you now have two kinds of data in circulation. If reading or writing requires you to be aware of both Intent and State, you've made your code more complicated and harder to reason about. More so, making a new Intent requires a copy of the old Intent, which you mutate or clone. But you want to avoid passing Intent around in general, because it's fishy data. It may have the right types, but the constraints and referential integrity aren't guaranteed. It's a magnet for the kind of bugs a type-checker won't catch.

I've published my common solution before: turn changes into first-class values, and make a generic update of type Update<T> be the basic unit of change. As a first approximation, consider a shallow merge {...value, ...update}. This allows you to make an updateIntent(update) function where update only specifies the fields that are changing. In other words, Update<Intent> looks just like Update<State> and can be derived 100% from State, without Intent. Only one place needs to have access to the old Intent, all other code can just call that. You can make an app intent-aware without complicating all the code.

If your state is cleaved along orthogonal lines, then this is all you need. i.e. If column and line are two separate fields, then you can selectively change only one of them. If they are stored as an XY tuple or vector, now you need to be able to describe a change that only affects either the X or Y component.

const value = {
  hello: 'text',
  foo: { bar: 2, baz: 4 },
};

const update = {
  hello: 'world',
  foo: { baz: 50 },
};

expect(
  patch(value, update)
).toEqual({
  hello: 'world',
  foo: { bar: 2, baz: 50 },
});

So in practice I have a function patch(value, update) which implements a comprehensive superset of a deep recursive merge, with full immutability. It doesn't try to do anything fancy with arrays or strings, they're just treated as atomic values. But it allows for precise overriding of merging behavior at every level, as well as custom lambda-based updates. You can patch tuples by index, but this is risky for dynamic lists. So instead you can express e.g. "append item to list" without the entire list, as a lambda. I've been using patch for years now, and the uses are myriad.
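The core of such a function is small. This is a hypothetical minimal version, without the per-level merge overrides of the real thing: plain objects recurse, everything else is replaced atomically, and a lambda computes the new value from the old one.

const isObject = (x: unknown): x is Record<string, any> =>
  x != null && typeof x === 'object' && !Array.isArray(x);

const patch = (value: any, update: any): any => {
  if (typeof update === 'function') return update(value);      // lambda update
  if (!isObject(value) || !isObject(update)) return update;    // atomic replace
  const out: Record<string, any> = { ...value };               // immutable merge
  for (const key of Object.keys(update)) {
    out[key] = patch(value[key], update[key]);
  }
  return out;
};

// "Append item to list" without the entire list, as a lambda:
// patch({ list: [1, 2] }, { list: (l: number[]) => [...l, 3] })
// => { list: [1, 2, 3] }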
To overlay a set of overrides onto a base template, patch(base, overrides) is all you need. It's the most effective way I know to erase a metric ton of {...splats} and ?? defaultValues and != null from entire swathes of code. This is a real problem.

You could also view this as a "poor man's OT", with the main distinction being that a patch update only describes the new state, not the old state. Such updates are not reversible on their own. But they are far simpler to make and apply. It can still power a global undo/redo system, in combination with its complement diff(A, B): you can reverse an update by diffing in the opposite direction. This is an operation which is formalized and streamlined into revise(…), so that it retains the exact shape of the original update, and doesn't require B at all. The structure of the update is sufficient information: it too encodes some intent behind the change.

With patch you also have a natural way to work with changes and conflicts as values. The earlier WYSIWYG scenario is just patch(committed, ephemeral) with bells on.

The net result is that mutating my intent or state is as easy as doing a {...value, ...update} splat, but I'm not falsely incentivized to flatten my data structures. Instead it frees you up to think about what the most practical schema actually is from the data's point of view. This is driven by how the user wishes to edit it, because that's what you will connect it to. It makes you think about what a user's workspace actually is, and lets you align boundaries in UX and process with boundaries in data structure.

Remember: most classic "data structures" are not about the structure of data at all. They serve as acceleration tools to speed up specific operations you need on that data. Having the reads and writes drive the data design was always part of the job. What's weird is that people don't apply that idea end-to-end, from database to UI and back. SQL tables are shaped the way they are because it enables complex filters and joins. However, I find this pretty counterproductive: it produces derived query results that are difficult to keep up to date on a client. They also don't look like any of the data structures I actually want to use in my code.

A Bike Shed of Schemas

This points to a very under-appreciated problem: it is completely pointless to argue about schemas and data types without citing specific domain logic and code that will be used to produce, edit and consume it. Because that code determines which structures you are incentivized to use, and which structures will require bespoke extra work.

From afar, column and line are just XY coordinates. Just use a 2-vector. But once you factor in the domain logic and etiquette, you realize that the horizontal and vertical directions have vastly different rules applied to them, and splitting might be better. Which one do you pick?

This applies to all data. Whether you should put items in a List<T> or a Map<K, V> largely depends on whether the consuming code will loop over it, or need random access. If an API only provides one, consumers will just build the missing Map or List as a first step. This is O(n log n) either way, because of sorting. The method you use to read or write your data shouldn't limit use of everyday structure. Not unless you have a very good reason. But this is exactly what happens. A lot of bad choices in data design come down to picking the "wrong" data type simply because the most appropriate one is inconvenient in some cases.
This then leads to Conway's law, where one team picks the types that are most convenient only for them. The other teams are stuck with it, and end up writing bidirectional conversion code around their part, which will never be removed. The software will now always have this shape, reflecting which concerns were considered essential. What color are your types?

{
  order: [4, 11, 9, 5, 15, 43],
  values: {
    4: {...},
    5: {...},
    9: {...},
    11: {...},
    15: {...},
    43: {...},
  },
}

For List vs Map, you can have this particular cake and eat it too. Just provide a List<Id> for the order and a Map<Id, T> for the values. If you structure a list or tree this way, then you can do both iteration and ID-based traversal in the most natural and efficient way. Don't underestimate how convenient this can be.

This also has the benefit that "re-ordering items" and "editing items" are fully orthogonal operations. It decomposes the problem of "patching a list of objects" into "patching a list of IDs" and "patching N separate objects". It makes code for manipulating lists and trees universal. It lets you decide on a case by case basis whether you need to garbage collect the map, or whether preserving unused records is actually desirable.

Limiting it to ordinary JSON or JS types, rather than going full-blown OT or CRDT, is a useful baseline. With sensible schema design, at ordinary editing rates, CRDTs are overkill compared to the ability to just replay edits, or notify conflicts. This only requires version numbers and retries. Users need those things anyway: just because a CRDT converges when two people edit, doesn't mean the result is what either person wants. The only case where OTs/CRDTs are absolutely necessary is rich-text editing, and you need bespoke UI solutions for that anyway. For simple text fields, last-write-wins is perfectly fine, and also far superior to what 99% of RESTy APIs do.

A CRDT is just a mechanism that translates partially ordered intents into a single state. Like, it's cool that you can make CRDT counters and CRDT lists and whatnot... but each CRDT implements only one particular resolution strategy. If it doesn't produce the desired result, you've created invalid intent no user expected. With last-write-wins, you at least have something 1 user did intend. Whether this is actually destructive or corrective is mostly a matter of schema design and minimal surface area, not math.

The main thing that OTs and CRDTs do well is resolve edits on ordered sequences, like strings. If two users are typing text in the same doc, edits higher-up will shift edits down below, which means the indices change when rebased. But if you are editing structured data, you can avoid referring to indices entirely, and just use IDs instead. This sidesteps the issue, like splitting order from values. For the order, there is a simple solution: a map with a fractional index, effectively a dead-simple list CRDT. It just comes with some overhead.

Using a CRDT for string editing might not even be enough. Consider Google Docs-style comments anchored to that text: their indices also need to shift on every edit. Now you need a bespoke domain-aware CRDT. Or you work around it by injecting magic markers into the text. Either way, it seems non-trivial to decouple a CRDT from the specific target domain of the data inside. The constraints get mixed in. If you ask me, this is why the field of real-time web apps is still in somewhat of a rut.
It's mainly viewed as a high-end technical problem: how do we synchronize data structures over a P2P network without any data conflicts? What they should be asking is: what is the minimal amount of structure we need to reliably synchronize, so that users can have a shared workspace where intent is preserved, and conflicts are clearly signposted? And how should we design our schemas, so that our code can manipulate the data in a straightforward and reliable way? Fixing non-trivial user conflicts is simply not your job. Most SaaS out there doesn't need any of this technical complexity.

Consider that a good multiplayer app requires user presence and broadcast anyway. The simplest solution is just a persistent process on a single server coordinating this, one per live workspace. It's what most MMOs do. In fast-paced video games, this even involves lag compensation. Reliable ordering is not the big problem. The situations where this doesn't scale, or where you absolutely must be P2P, are a minority. If you run into them, you must be doing very well.

The solution that I've sketched out here is explicitly designed so it can comfortably be done by small teams, or even just 1 person. The (private) CAD app I showed glimpses of above is entirely built this way. It's patch all the way down and it's had undo/redo from day 1. It also has a developer mode where you can just edit the user-space part of the data model, and save/load it. When the in-house designers come to me with new UX requests, they often ask: "Is it possible to do ____?" The answer is never a laborious sigh from a front-end dev with too much on their plate. It's "sure, and we can do more."

If you're not actively aware the design of schemas and code is tightly coupled, your codebase will explode, and the bulk of it will be glue. Much of it just serves to translate generalized intent into concrete state or commands. Arguments about schemas are usually just hidden debates about whose job it is to translate, split or join something. This isn't just an irrelevant matter of "wire formats" because changing the structure and format of data also changes how you address specific parts of it.

In an interactive UI, you also need a reverse path, to apply edits. What I hope you are starting to realize is that this is really just the forward path in reverse, on so many levels. The result of a basic query is just the ordered IDs of the records that it matched. A join returns a tuple of record IDs per row. If you pre-assemble the associated record data for me, you actually make my job as a front-end dev harder, because there are multiple forward paths for the exact same data, in subtly different forms. What I want is to query and mutate the same damn store you do, and be told when what changes. It's table-stakes now. With well-architected data, this can be wired up mostly automatically, without any scaffolding.

The implementations you encounter in the wild just obfuscate this, because they don't distinguish between the data store and the model it holds. The fact that the data store should not be corruptible, and should enforce permissions and quotas, is incorrectly extended to the entire model stored inside. But that model doesn't belong to Stanley, it belongs to the user. This is why desktop applications didn't have a "Data Export". It was just called Load and Save, and what you saved was the intent, in a file.
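As a sketch, the contract implied here could be as small as this (all names hypothetical): queries return ordered IDs, records are resolved and mutated through the same store, and subscribers are told when they change.

type Id = string;

interface Store<T> {
  query(filter: (record: T) => boolean): Id[];  // ordered IDs only
  get(id: Id): T | undefined;                   // resolve on demand
  update(id: Id, update: Partial<T>): void;     // e.g. patch-style
  subscribe(ids: Id[], onChange: (id: Id) => void): () => void;
}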
Having a universal query or update mechanism doesn't absolve you from thinking about this either, which is why I think the patch approach is so rare: it looks like cowboy coding if you don't have the right boundaries in place. Patch is mainly for user-space mutations, not kernel-space, a concept that applies to more than just OS kernels. User-space must be very forgiving.

If you avoid it, you end up with something like GraphQL, a good example of solving only half the problem badly. Its getter assembles data for consumption by laboriously repeating it in dozens of partial variations. And it turns the setter part into an unsavory mix of lasagna and spaghetti. No wonder, it was designed for a platform that owns and hoards all your data.

* * *

Viewed narrowly, Intent is just a useful concept to rethink how you enforce validation and constraints in a front-end app. Viewed broadly, it completely changes how you build back-ends and data flows to support that. It will also teach you how adding new aspects to your software can reduce complexity, not increase it, if done right.

A good metric is to judge implementation choices by how many other places of the code need to care about them. If a proposed change requires adjustments literally everywhere else, it's probably a bad idea, unless the net effect is to remove code rather than add.

I believe reconcilers like React or tree-sitter are a major guide stone here. What they do is apply structure-preserving transforms on data structures, and incrementally. They actually do the annoying part for you. I based Use.GPU on the same principles, and use it to drive CPU canvases too. The tree-based structure reflects that one function's state just might be the next function's intent, all the way down. This is a compelling argument that the data and the code should have roughly the same shape.

You will also conclude there is nothing more nefarious than a hard split between back-end and front-end. You know, coded by different people, where each side is only half-aware of the other's needs, but one sits squarely in front of the other. Well-intentioned guesses about what the other end needs will often be wrong. You will end up with data types and query models that cannot answer questions concisely and efficiently, and which must be babysat to not go stale.

In the last 20 years, little has changed here in the wild. On the back-end, it still looks mostly the same. Even when modern storage solutions are deployed, people end up putting SQL- and ORM-like layers on top, because that's what's familiar. The split between back-end and database has the exact same malaise. None of this work actually helps make the app more reliable, it's the opposite: every new feature makes on-going development harder. Many "solutions" in this space are not solutions, they are copes. Maybe we're overdue for a NoSQL-revival, this time with a focus on practical schema design and mutation? SQL was designed to model administrative business processes, not live interaction. I happen to believe a front-end should sit next to the back-end, not in front of it, with only a thin proxy as a broker.

What I can tell you for sure is: it's so much better when intent is a first-class concept. You don't need nor want to treat user data as something to pussy-foot around, or handle like it's radioactive. You can manipulate and transport it without a care. You can build rich, comfy functionality on top. Once implemented, you may find yourself not touching your network code for a very long time.
It's the opposite of overwhelming, it's lovely. You can focus on building the tools your users need. This can pave the way for more advanced concepts like OT and CRDT, but will show you that neither of them is a substitute for getting your application fundamentals right.

In doing so, you reach a synthesis of Dijkstra and anti-Dijkstra: your program should be provably correct in its data flow, which means it can safely break in completely arbitrary ways. Because the I in UI meant "intent" all along.

More:
- On Variance and Extensibility
- Climbing Mount Effect
- APIs are About Policy

12 months ago
Sub-pixel Distance Transform

High quality font rendering for WebGPU

This page includes diagrams in WebGPU, which has limited browser support. For the full experience, use Chrome on Windows or Mac, or a developer build on other platforms.

In this post I will describe Use.GPU's text rendering, which uses a bespoke approach to Signed Distance Fields (SDFs). This was borne out of necessity: while SDF text is pretty common on GPUs, some of the established practice on generating SDFs from masks is incorrect, and some libraries get it right only by accident. So this will be a deep dive from first principles, about the nuances of subpixels.

SDFs

The idea behind SDFs is quite simple. To draw a crisp, anti-aliased shape at any size, you start from a field or image that records the distance to the shape's edge at every point, as a gradient. Lighter grays are inside, darker grays are outside. This can be a lower resolution than the target. Then you increase the contrast until the gradient is exactly 1 pixel wide at the target size. You can sample it to get a perfectly anti-aliased opacity mask.

This works fine for text at typical sizes, and handles fractional shifts and scales perfectly with zero shimmering. It's also reasonably correct from a signal processing math point-of-view: it closely approximates averaging over a pixel-sized circular window, i.e. a low-pass convolution. Crucially, it takes a rendered glyph as input, which means I can remain blissfully unaware of TrueType font specifics, and bezier rasterization, and just offload that to an existing library.

To generate an SDF, I started with MapBox's TinySDF library. Except, what comes out of it is wrong: the contours are noticeably wobbly and pixelated. The only reason the glyph itself looks okay is because the errors around the zero-level are symmetrical and cancel out. If you try to dilate or contract the outline, which is supposed to be one of SDF's killer features, you get ugly gunk.

The original Valve paper glosses over this aspect and uses high resolution inputs (4k) for a highly downscaled result (64). That is not an option for me because it's too slow. But I did get it to work. As a result Use.GPU has a novel subpixel-accurate distance transform (ESDT), which even does emoji. It's a combination CPU/GPU approach, with the CPU generating SDFs and the GPU rendering them, including all the debug viz.

The Classic EDT

The common solution is a Euclidean Distance Transform. Given a binary mask, it will produce an unsigned distance field. This holds the squared distance d² for either the inside or outside area, which you can sqrt.

Like a Fourier Transform, you can apply it to 2D images by applying it horizontally on each row X, then vertically on each column Y (or vice versa). To make a signed distance field, you do this for both the inside and outside separately, and then combine the two as inside – outside or vice versa.

The algorithm is one of those clever bits of 80s-style C code which is O(N), has lots of 1-letter variable names, and is very CPU cache friendly. Often copy/pasted, but rarely understood. In TypeScript it looks like this, where array is modified in-place and f, v and z are temporary buffers up to 1 row/column long. The arguments offset and stride allow the code to be used in either the X or Y direction in a flattened 2D array.
for (let q = 1, k = 0, s = 0; q < length; q++) {
  f[q] = array[offset + q * stride];

  do {
    let r = v[k];
    s = (f[q] - f[r] + q * q - r * r) / (q - r) / 2;
  } while (s <= z[k] && --k > -1);

  k++;
  v[k] = q;
  z[k] = s;
  z[k + 1] = INF;
}

for (let q = 0, k = 0; q < length; q++) {
  while (z[k + 1] < q) k++;
  let r = v[k];
  let d = q - r;
  array[offset + q * stride] = f[r] + d * d;
}

To explain what this code does, let's start with a naive version instead. Given a 1D input array of zeroes (filled), with an area masked out with infinity (empty):

O = [·, ·, ·, 0, 0, 0, 0, 0, ·, 0, 0, 0, ·, ·, ·]

Make a matching sequence … 3 2 1 0 1 2 3 … for each element, centering the 0 at each index:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14] + ∞
[1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13] + ∞
[2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12] + ∞
[3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11] + 0
[4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10] + 0
[5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9] + 0
[6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8] + 0
[7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7] + 0
[8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6] + ∞
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5] + 0
[10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4] + 0
[11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3] + 0
[12,11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2] + ∞
[13,12,11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1] + ∞
[14,13,12,11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0] + ∞

You then add the value from the array to each element in the row:

[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11]
[4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10]
[5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8]
[7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5]
[10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4]
[11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]

And then take the minimum of each column:

P = [3, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 3]

This sequence counts up inside the masked out area, away from the zeroes. This is the positive distance field P. You can do the same for the inverted mask:

I = [0, 0, 0, ·, ·, ·, ·, ·, 0, ·, ·, ·, 0, 0, 0]

to get the complementary area, i.e. the negative distance field N:

N = [0, 0, 0, 1, 2, 3, 2, 1, 0, 1, 2, 1, 0, 0, 0]

That's what the EDT does, except it uses square distance … 9 4 1 0 1 4 9 …. When you apply it a second time in the second dimension, these outputs are the new input, i.e. values other than 0 or ∞. It still works because of Pythagoras' rule: d² = x² + y². This wouldn't be true if it used linear distance instead. The net effect is that you end up intersecting a series of parabolas, somewhat like a 1D slice of a Voronoi diagram:

I' = [0, 0, 1, 4, 9, 4, 4, 4, 1, 1, 4, 9, 4, 9, 9]

Each parabola sitting above zero is the 'shadow' of a zero-level paraboloid located some distance in a perpendicular dimension. The code is just a more clever way to do that, without generating the entire N² grid per row/column. It instead scans through the array left to right, building up a list v[k] of significant minima, with thresholds z[k] where two parabolas intersect. It adds them as candidates (k++) and discards them (--k) if they are eclipsed by a newer value.
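For comparison, a direct O(N²) translation of the naive construction above (a hypothetical reference version): each output is the minimum over all parabolas f[r] + (q - r)².

const naiveEDT1D = (f: number[]): number[] =>
  f.map((_, q) =>
    f.reduce((min, fr, r) => Math.min(min, fr + (q - r) * (q - r)), Infinity)
  );

// naiveEDT1D([∞, ∞, ∞, 0, 0, 0, 0, 0, ∞, 0, 0, 0, ∞, ∞, ∞]), with ∞ = Infinity,
// => [9, 4, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 4, 9] (squared distances)

The scanning version computes the same minima in a single pass per row.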
This is the first for/while loop:

for (let q = 1, k = 0, s = 0; q < length; q++) {
  f[q] = array[offset + q * stride];

  do {
    let r = v[k];
    s = (f[q] - f[r] + q * q - r * r) / (q - r) / 2;
  } while (s <= z[k] && --k > -1);

  k++;
  v[k] = q;
  z[k] = s;
  z[k + 1] = INF;
}

Then it goes left to right again (for), and fills out the values, skipping ahead to the right minimum (k++). This is the squared distance from the current index q to the nearest minimum r, plus the minimum's value f[r] itself. The paper has more details:

for (let q = 0, k = 0; q < length; q++) {
  while (z[k + 1] < q) k++;
  let r = v[k];
  let d = q - r;
  array[offset + q * stride] = f[r] + d * d;
}

The Broken EDT

So what's the catch? The above assumes a binary mask. As it happens, if you try to subtract a binary N from P, you have an off-by-one error:

O     = [·, ·, ·, 0, 0, 0, 0, 0, ·, 0, 0, 0, ·, ·, ·]
I     = [0, 0, 0, ·, ·, ·, ·, ·, 0, ·, ·, ·, 0, 0, 0]
P     = [3, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 3]
N     = [0, 0, 0, 1, 2, 3, 2, 1, 0, 1, 2, 1, 0, 0, 0]
P - N = [3, 2, 1,-1,-2,-3,-2,-1, 1,-1,-2,-1, 1, 2, 3]

It goes directly from 1 to -1 and back. You could add +/- 0.5 to fix that. But if there is a gray pixel in between each white and black, which we classify as both inside (0) and outside (0), it seems to work out just fine:

O     = [·, ·, ·, 0, 0, 0, 0, 0, ·, 0, 0, 0, ·, ·, ·]
I     = [0, 0, 0, 0, ·, ·, ·, 0, 0, 0, ·, 0, 0, 0, 0]
P     = [3, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 3]
N     = [0, 0, 0, 0, 1, 2, 1, 0, 0, 0, 1, 0, 0, 0, 0]
P - N = [3, 2, 1, 0,-1,-2,-1, 0, 1, 0,-1, 0, 1, 2, 3]

This is a realization that somebody must've had, and they reasoned on:

"The above is correct for a 50% opaque pixel, where the edge between inside and outside falls exactly in the middle of a pixel."

"Lighter grays are more inside, and darker grays are more outside. So all we need to do is treat l = level - 0.5 as a signed distance, and use l² for the initial inside or outside value for gray pixels. This will cause either the positive or negative distance field to shift by a subpixel amount l. And then the EDT will propagate this in both X and Y directions."

The initial idea is correct, because this is just running SDF rendering in reverse. A gray pixel in an opacity mask is what you get when you contrast adjust an SDF and do not blow it out into pure black or white. The information inside the gray pixels is "correct", up to rounding. But there are two mistakes here.

The first is that even in an anti-aliased image, you can have white pixels right next to black ones. Especially with fonts, which are pixel-hinted. So the SDF is wrong there, because it goes directly from -1 to 1. This causes the contours to double up, e.g. around a crisp bottom edge. To solve this, you can eliminate the crisp case by deliberately making those edges very dark or light gray.

But the second mistake is more subtle. The EDT works in 2D because you can feed the output of X in as the input of Y. But that means that any non-zero input to X represents another dimension Z, separate from X and Y. The resulting squared distance will be x² + y² + z². This is a 3D distance, not 2D.

If an edge is shifted by 0.5 pixels in X, you would expect a 1D SDF like:

[…, 0.5, 1.5, 2.5, 3.5, …] = […, 0.5, 1 + 0.5, 2 + 0.5, 3 + 0.5, …]

Instead, because of the squaring + square root, you will get:

[…, 0.5, 1.12, 2.06, 3.04, …] = […, sqrt(0.25), sqrt(1 + 0.25), sqrt(4 + 0.25), sqrt(9 + 0.25), …]

The effect of l² = 0.25 rapidly diminishes as you get away from the edge, and is significantly wrong even just one pixel over.
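A quick numeric check of that claim, for an edge shifted by l = 0.5 pixels:

// Seeding the EDT with l² = 0.25 yields sqrt(x² + l²), a 3D distance,
// instead of the true shifted distance x + 0.5.
const l = 0.5;
for (let x = 0; x <= 3; x++) {
  console.log(x, Math.sqrt(x * x + l * l).toFixed(2), (x + l).toFixed(2));
}
// x = 0: 0.50 vs 0.50
// x = 1: 1.12 vs 1.50
// x = 2: 2.06 vs 2.50
// x = 3: 3.04 vs 3.50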
The correct shift would need to be folded into (x + …)² + (y + …)² and depends on the direction. e.g. If an edge is shifted horizontally, it ought to be (x + l)² + y², which means there is a term of 2*x*l missing. If the shift is vertical, it's 2*y*l instead. This is also a signed value, not positive/unsigned.

Given all this, it's a miracle this worked at all. The only reason this isn't more visible in the final glyph is because the positive and negative fields contain the same but opposite errors around their respective gray pixels.

The Not-Subpixel EDT

As mentioned before, the EDT algorithm is essentially making a 1D Voronoi diagram every time. It finds the distance to the nearest minimum for every array index. But there is no reason for those minima themselves to lie at integer offsets, because the second for loop effectively resamples the data. So you can take an input mask, and tag each index with a horizontal offset Δ:

O = [·, ·, ·, 0, 0, 0, 0, 0, ·, ·, ·]
Δ = [A, B, C, D, E, F, G, H, I, J, K]

As long as the offsets are small, no two indices will swap order, and the code still works. You then build the Voronoi diagram out of the shifted parabolas, but sample the result at unshifted indices.

Problem 1 - Opposing Shifts

This led me down the first rabbit hole, which was an attempt to make the EDT subpixel capable without losing its appealing simplicity. I started by investigating the nuances of subpixel EDT in 1D. This was a bit of a mirage, because most real problems only pop up in 2D. Though there was one important insight here.

O = [·, ·, ·, 0, 0, 0, 0, 0, ·, ·, ·]
Δ = [·, ·, ·, A, ·, ·, ·, B, ·, ·, ·]

Given a mask of zeroes and infinities, you can only shift the first and last point of each segment. Infinities don't do anything, while middle zeroes should remain zero.

Using an offset A works sort of as expected: this will increase or decrease the values filled in by a fractional pixel, calculating a squared distance (d + A)² where A can be positive or negative. But the value at the shifted index itself is always (0 + A)² (positive). This means it is always outside, regardless of whether it is moving left or right. If A is moving left (–), the point is inside, and the (unsigned) distance should be 0. At B the situation is reversed: the distance should be 0 if B is moving right (+).

It might seem like this is annoying but fixable, because the zeroes can be filled in by the opposite signed field. But this is only looking at the binary 1D case, where there are only zeroes and infinities. In 2D, a second pass has non-zero distances, so every index can be shifted:

O = [a, b, c, d, e, f, g, h, i, j, k]
Δ = [A, B, C, D, E, F, G, H, I, J, K]

Now, resolving every subpixel unambiguously is harder than you might think. It's important to notice that the function being sampled by an EDT is not actually smooth: it is the minimum of a discrete set of parabolas, which cross at an angle. The square root of the output only produces a smooth linear gradient because it samples each parabola at integer offsets. Each center only shifts upward by the square of an integer in every pass, so the crossings are predictable. You never sample the 'wrong' side of (d + ...)². A subpixel EDT does not have this luxury.

Subpixel EDTs are not irreparably broken though. Rather, they are only valid if they cause the unsigned distance field to increase, i.e. if they dilate the empty space. This is a problem: any shift that dilates the positive field contracts the negative, and vice versa.
To fix this, you need to get out of the handwaving stage and actually understand P and N as continuous 2D fields.

Problem 2 - Diagonals

Consider an aliased, sloped edge. To understand how the classic EDT resolves it, we can turn it into a Voronoi diagram for all the white pixel centers:

Near the bottom, the field is dominated by the white pixels on the corners: they form diagonal sections downwards. Near the edge itself, the field runs perfectly vertical inside a roughly triangular section. In both cases, an arrow pointing back towards the cell center is only vaguely perpendicular to the true diagonal edge.

Near perfect diagonals, the edge distances are just wrong. The distance of edge pixels goes up or right (1), rather than the more logical diagonal 0.707…. The true closest point on the edge is not part of the grid. These fields don't really resolve properly until 6-7 pixels out. You could hide these flaws with e.g. an 8x downscale, but that's 64x more pixels.

Either way, you shouldn't expect perfect numerical accuracy from an EDT. Just because it's mathematically separable doesn't mean it's particularly good. In fact, it's only separable because it isn't very good at all.

Problem 3 - Gradients

In 2D, there is also only one correct answer to the gray case. Consider a diagonal edge, anti-aliased:

Thresholding it into black, gray or white, you get:

If you now classify the grays as both inside and outside, then the highlighted pixels will be part of both masks. Both the positive and negative field will be exactly zero there, and so will the SDF (P - N):

This creates a phantom vertical edge that pushes apart P and N, and causes the average slope to be less than 45°. The field simply has the wrong shape, because gray pixels can be surrounded by other gray pixels.

This also explains why TinySDF magically seemed to work despite being so pixelized. The l² gray correction fills in exactly the gaps in the bad (P - N) field where it is zero, and it interpolates towards a symmetrically wrong P and N field on each side.

If we instead classify grays as neither inside nor outside, then P and N overlap in the boundary, and it is possible to resolve them into a coherent SDF with a clean 45° slope, if you do it right:

What seemed like an off-by-one error is actually the right approach in 2D or higher. The subpixel SDF will then be a modified version of this field, where the P and N sides are changed in lock-step to remain mutually consistent. Though we will get there in a roundabout way.

Problem 4 - Commuting

It's worth pointing out: a subpixel EDT simply cannot commute in 2D. First, consider the data flow of an ordinary EDT:

Information from a corner pixel can flow through empty space both when doing X-then-Y and Y-then-X. But information from the horizontal edge pixels can only flow vertically then horizontally. This is okay because the separating lines between adjacent pixels are purely vertical too: the red arrows never 'win'.

But if you introduce subpixel shifts, the separating lines can turn:

The data flow is still limited to the original EDT pattern, so the edge pixels at the top can only propagate by starting downward. They can only influence adjacent columns if the order is Y-then-X. For vertical edges it's the opposite.

That said, this is only a problem on shallow concave curves, where there aren't any corner pixels nearby. The error is that it 'snaps' to the wrong edge point, but only when it is already several pixels away from the edge.
In that case, the smaller x² term is dwarfed by the much larger y² term, so the absolute error is small after sqrt.

The ESDT

Knowing all this, here's how I assembled a "true" Euclidean Subpixel Distance Transform.

Subpixel offsets

To start we need to determine the subpixel offsets. We can still treat level - 0.5 as the signed distance for any gray pixel, and ignore all white and black for now. The tricky part is determining the exact direction of that distance.

As an approximation, we can examine the 3x3 neighborhood around each gray pixel and do a least-squares fit of a plane. As long as there is at least one white and one black pixel in this neighborhood, we get a vector pointing towards where the actual edge is. In practice I apply some horizontal/vertical smoothing here using a simple [1 2 1] kernel. The result is numerically very stable, because the originally rasterized image is visually consistent.

This logic is disabled for thin creases and spikes, where it doesn't work. Such points are treated as fully masked out, so that neighboring distances propagate there instead. This is needed e.g. for the pointy negative space of a W to come out right.

I also implemented a relaxation step that will smooth neighboring vectors if they point in similar directions. However, the effect is quite minimal, and it rounds very sharp corners, so I ended up disabling it by default.

The goal is then to do an ESDT that uses these shifted positions for the minima, to get a subpixel accurate distance field.

P and N junction

We saw earlier that only non-masked pixels can have offsets that influence the output (#1). We only have offsets for gray pixels, yet we concluded that gray pixels should be masked out, to form a connected SDF with the right shape (#3). This can't work.

SDFs are both the problem and the solution here. Dilating and contracting SDFs is easy: add or subtract a constant. So you can expand both P and N fields ahead of time geometrically, and then undo it numerically. This is done by pushing their respective gray pixel centers in opposite directions, by half a pixel, on top of the originally calculated offset:

This way, they can remain masked-in in both fields, but are always pushed between 0 and 1 pixel inwards. The distance between the P and N gray pixel offsets is always exactly 1, so the non-zero overlap between P and N is guaranteed to be exactly 1 pixel wide everywhere. It's a perfect join anywhere we sample it, because the line between the two ends crosses through a pixel center.

When we then calculate the final SDF, we do the opposite, shifting each by half a pixel and trimming it off with a max:

SDF = max(0, P - 0.5) - max(0, N - 0.5)

Only one of P or N will be > 0.5 at a time, so this is exact.

To deal with pure black/white edges, I treat any black neighbor of a white pixel (horizontal or vertical only) as gray with a 0.5 pixel offset (before P/N dilation). No actual blurring needs to happen, and the result is numerically exact minus epsilon, which is nice.

ESDT state

The state for the ESDT then consists of remembering a signed X and Y offset for every pixel, rather than the squared distance. These are factored into the distance and threshold calculations, separated into their proper parallel and orthogonal components, i.e. X/Y or Y/X. Unlike an EDT, each X or Y pass has to be aware of both axes. But the algorithm is mostly unchanged otherwise, here X-then-Y.

The X pass:

At the start, only gray pixels have offsets and they are all in the range -1…1 (exclusive).
With each pass of the ESDT, a winning minimum's offsets propagate to its affected range, tracking the total distance (Δx, Δy) (> 1). At the end, each pixel's offset points to the nearest edge, so the squared distance can be derived as Δx² + Δy².

The Y pass:

You can see that the vertical distances in the top-left are practically vertical, and not oriented perpendicular to the contour on average: they have not had a chance to propagate horizontally. But they do factor in the vertical subpixel offset, and this is the dominant component. So even without correction it still creates a smooth SDF with a surprisingly small error.

Fix ups

The commutativity errors are all biased positively, meaning we get an upper bound of the true distance field. You could take the min of X-then-Y and Y-then-X. This would re-use all the same prep and would restore rotation-independence at the cost of 2x as many ESDTs. You could try X then Y then X at 1.5x cost with some hacks. But neither would improve diagonal areas, which were still busted in the original EDT.

Instead I implemented an additional relaxation pass. It visits every pixel's target, and double checks whether one of the 4 immediate neighbors (with subpixel offset) isn't a better solution:

It's a good heuristic because if the target is >1px off there is either a viable commutative propagation path, or you're so far away the error is negligible. It fixes up the diagonals, creating tidy lines when the resolution allows for it:

You could take this even further, given that you know the offsets are supposed to be perpendicular to the glyph contour. You could add reprojection with a few dot products here, but getting it to not misfire on edge cases would be tricky.

While you can tell the unrelaxed offsets are wrong when visualized, and the fixed ones are better, the visual difference in the output glyphs is tiny. You need to blow glyphs up to enormous size to see the difference side by side. So it too is disabled by default. The diagonals in the original EDT were wrong too and you could barely tell.

Emoji

An emoji is generally stored as a full color transparent PNG or SVG. The ESDT can be applied directly to its opacity mask to get an SDF, so no problem there. There are an extremely rare handful of emoji with semi-transparent areas, but you can get away with making those solid.

For this I just use a filter that detects '+' shaped arrangements of pixels that have (almost) the same transparency level. Then I dilate those by 3x3 to get the average transparency level in each area. Then I divide by it to only keep the anti-aliased edges transparent.

The real issue is blending the colors at the edges, when the emoji is being rendered and scaled. The RGB color of transparent pixels is undefined, so whatever values are there will blend into the surrounding pixels, e.g. creating a subtle black halo:

[Images: Not Premultiplied | Premultiplied]

A common solution is premultiplied alpha. The opacity is baked into the RGB channels as (R * A, G * A, B * A, A), and transparent areas must be fully transparent black. This allows you to use a premultiplied blend mode where the RGB channels are added directly without further scaling, to cancel out the error.

But the alpha channel of an SDF glyph is dynamic, and is independent of the colors, so it cannot be premultiplied. We need valid color values even for the fully transparent areas, so that up- or downscaling is still clean. Luckily the ESDT calculates X and Y offsets which point from each pixel directly to the nearest edge.
We can use them to propagate the colors outward in a single pass, filling in the entire background. It doesn't need to be very accurate, so no filtering is required.

[Images: RGB channel | Output]

The result looks pretty great. At normal sizes, the crisp edge hides the fact that the interior is somewhat blurry.

Emoji fonts are supported via the underlying ab_glyph library, but are too big for the web (10MB+). So you can just load .PNGs on demand instead, at whatever resolution you need. Hooking it up to the 2D canvas to render native system emoji is left as an exercise for the reader.

Use.GPU does not support complex Unicode scripts or RTL text yet—both are a can of worms I wish to offload too—but it does support composite emoji like "pirate flag" (white flag + skull and crossbones) or "male astronaut" (astronaut + man) when formatted using the usual Zero-Width Joiners (U+200D) or modifiers.

Shading

Finally, a note on how to actually render with SDFs, which is more nuanced than you might think. I pack all the SDF glyphs into an atlas on-demand, the same one I use elsewhere in Use.GPU. This has a custom layout algorithm that doesn't backtrack, optimized for filling out a layout at run-time with pieces of a similar size.

Glyphs are rasterized at 1.5x their normal font size, after rounding up to the nearest power of two. The extra 50% ensures small fonts on low-DPI displays still use a higher quality SDF, while high-DPI displays just upscale that SDF without noticeable quality loss. The rounding ensures similar font sizes reuse the same SDFs. You can also override the detail independent of font size.

To determine the contrast factor to draw an SDF, you generally use screen-space derivatives. There are good and bad ways of doing this. Your goal is to get a ratio of SDF pixels to screen pixels, so the best thing to do is give the GPU the coordinates of the SDF texture pixels, and ask it to calculate the difference for that between neighboring screen pixels. This works for surfaces in 3D at an angle too. Bad ways of doing this will instead work off relative texture coordinates, and introduce additional scaling factors based on the view or atlas size, when they are all just supposed to cancel out.

As you then adjust the contrast of an SDF to render it, it's important to do so around the zero-level. The glyph's ideal vector shape should not expand or contract as you scale it. Like TinySDF, I use 75% gray as the zero level, so that more SDF range is allocated to the outside than the inside, as dilating glyphs is much more common than contracting them.

At the same time, a pixel whose center sits exactly on the zero level edge is actually half inside, half outside, i.e. 50% opaque. So, after scaling the SDF, you need to add 0.5 to the value to get the correct opacity for a blend. This gives you a mathematically accurate font rendering that approximates convolution with a pixel-sized circle or box.

But I go further. Fonts were not invented for screens, they were designed for paper, with ink that bleeds. Certain renderers, e.g. MacOS, replicate this effect. The physical bleed distance is relatively constant, so the larger the font, the smaller the effect of the bleed proportionally. I got the best results with a 0.25 pixel bleed at 32px or more. For smaller sizes, it tapers off linearly.
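Putting those pieces together, the per-pixel shading math comes down to something like this. This is a plain TypeScript sketch of the logic with names of my own choosing; the real version lives in a shader:

// Sketch of the SDF shading math described above (not the actual shader).
// sdf: decoded signed distance in SDF texels, 0 at the glyph edge.
// texelsPerPixel: ratio of SDF texels to screen pixels, obtained from
// screen-space derivatives of the SDF texel coordinates (fwidth-style).
const shadeSDF = (
  sdf: number,
  texelsPerPixel: number,
  fontSize: number,
): number => {
  // Ink bleed: 0.25px at 32px and up, tapering off linearly below
  const bleed = 0.25 * Math.min(1, fontSize / 32);

  // Contrast around the zero level, then add 0.5 so a pixel centered
  // exactly on the edge comes out 50% opaque
  const alpha = sdf / texelsPerPixel + bleed + 0.5;
  return Math.max(0, Math.min(1, alpha));
};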
When you zoom out blocks of text, they get subtly fatter instead of thinning out, and this is actually a great effect when viewing document thumbnails, where lines of text become a solid mass at the point where the SDF resolution fails anyway.

In Use.GPU I prefer to use gamma correct, linear RGB color, even for 2D. What surprised me the most is just how unquestionably superior this looks. Text looks rock solid and readable even at small sizes on low-DPI.

Because the SDF scales, there is no true font hinting, but it really doesn't need it; it would just be a nice extra. Presumably you could track hinted points or edges inside SDF glyphs and then do a dynamic distortion somehow, but this is an order of magnitude more complex than what it is doing now, which is to splat a contrasted texture on screen. It does have snapping you can turn on, which avoids jiggling of individual letters. But if you turn it off, you get smooth subpixel everything:

I was always a big fan of the 3x1 subpixel rendering used on color LCDs (e.g. ClearType and the like), and I was sad when it was phased out due to the popularity of high-DPI displays. But it turns out the 3x res only offered marginal benefits... the real improvement was always that it had a custom gamma correct blend mode, which is a thing a lot of people still get wrong. Even without RGB subpixels, gamma correct AA looks great.

Converting the entire desktop to linear RGB is also not going to happen in our lifetime, but I really want it more now. The "blurry text" that some people associate with anti-aliasing is usually just text blended with the wrong gamma curve, and without an appropriate bleed for the font in question.

* * *

If you want to make SDFs from existing input data, subpixel accuracy is crucial. Without it, fully crisp strokes become uneven, diagonals can look bumpy, and you cannot make clean dilated outlines or shadows. If you use an EDT, you have to start from a high resolution source and then downscale away all the errors near the edges. But if you use an ESDT, you can upscale even emoji PNGs with decent results.

It might seem pretty obvious in hindsight, but there is a massive difference between getting it sort of working, and actually getting all the details right. There were many false starts and dead ends, because subpixel accuracy also means one bad pixel ruins it. In some circles, SDF text is an old party trick by now... but a solid and reliable implementation is still a fair amount of work, with very little to go off for the harder parts.

By the way, I did check whether I could use Voronoi techniques directly, but in terms of computation it is much more involved. Pretty tho:

The ESDT is fast enough to use at run-time, and the implementation is available as a stand-alone import for drop-in use. This post started as a single live WebGPU diagram, which you can play around with. The source code for all the diagrams is available too.

Fuck It, We'll Do It Live

How the Live effect run-time is implemented

In this post I describe how the Live run-time internals are implemented, which drive Use.GPU. Some pre-existing React and FP effect knowledge is useful.

I have written about Live before, but in general terms. You may therefore have a wrong impression of this endeavor.

When a junior engineer sees an application doing complex things, they're often intimidated by the prospect of working on it. They assume that complex functionality must be the result of complexity in code. The fancier the app, the less understandable it must be. This is what their experience has been so far: seniority correlates to more and hairier lines of code.

After 30 years of coding though, I know it's actually the inverse. You cannot get complex functionality working if you've wasted all your complexity points on the code itself. This is the main thing I want to show here, because this post mainly describes 1 data structure and a handful of methods. Live has a real-time inspector, so a lot of this can be demonstrated live. Reading this on a phone is not recommended, the inspector is made for grown-ups.

The story so far:

The main mechanism of Live is to allow a tree to expand recursively like in React, doing breadth-first expansion. This happens incrementally, and in a reactive, rewindable way. You use this to let interactive programs knit themselves together at run-time, based on the input data.

Like a simple CLI program with a main() function, the code runs top to bottom, and then stops. It produces a finite execution trace that you can inspect. To become interactive and to animate, the run-time will selectively rewind, and re-run only parts, in response to external events. It's a fusion of immediate and retained mode UI, offering the benefits of both and the downsides of neither, not limited to UI.

This relies heavily on FP principles such as pure functions, immutable data structures and so on. But the run-time itself is very mutable: the whole idea is to centralize all the difficult parts of tracking changes in one place, and then forget about them. Live has no dependencies other than a JavaScript engine and these days consists of ~3600 lines.

If you're still not quite sure what the Live component tree actually is, it's 3 things at once:

a data dependency graph
an execution trace
a tree-shaped cache

The properties of the software emerge because these aspects are fully aligned inside a LiveComponent.

Functionally Mayonnaise

You can approach this from two sides, either from the UI side, or from the side of functional Effects.

Live Components

A LiveComponent (LC) is a React UI function component (FC) with 1 letter changed, at first:

const MyComponent: LC<MyProps> = (props: MyProps) => {
  const {wat} = props;

  // A memo hook
  // Takes dependencies as array
  const slow = useMemo(() => expensiveComputation(wat), [wat]);

  // Some local state
  const [state, setState] = useState(1);

  // JSX expressions with props and children
  // These are all names of LC functions to call + their props/arguments
  return (
    <OtherComponent>
      <Foo value={slow} />
      <Bar count={state} setCount={setState} />
    </OtherComponent>
  );
};

The data is immutable, and the rendering appears stateless: it returns a pure data structure for given input props and current state. The component uses hooks to access and manipulate its own state. The run-time will unwrap the outer layer of the <JSX> onion, mount and reconcile it, and then recurse.
let _ = await (
  <OtherComponent>
    <Foo foo={foo} />
    <Bar />
  </OtherComponent>
);
return null;

The code is actually misleading though. Both in Live and React, the return keyword here is technically wrong. Return implies passing a value back to a parent, but this is not happening at all.

A parent component decided to render <MyComponent>, yes. But the function itself is being called by Live/React. It's yielding JSX to the Live/React run-time to make a call to OtherComponent(...). There is no actual return value.

Because a <Component> can't return a value to its parent, the received _ will always be null too. The data flow is one-way, from parent to child.

Effects

An Effect is basically just a Promise/Future as a pure value. To first approximation, it's a () => Promise: a promise that doesn't actually start unless you call it a second time. Just like a JSX tag is like a React/Live component waiting to be called.

An Effect resolves asynchronously to a new Effect, just like <JSX> will render more <JSX>. Unlike a Promise, an Effect is re-usable: you can fire it as many times as you like. Just like you can keep rendering the same <JSX>.

let value = yield (
  OtherEffect([
    Foo(foo),
    Bar(),
  ])
);
// ...
return value;

So React is like an incomplete functional Effect system. Just replace the word Component with Effect. OtherEffect is then some kind of decorator which describes a parallel dispatch to Effects Foo and Bar. A real Effect system will fork, but then join back, gathering the returned values, like a real return statement.

Unlike React components, Effects are ephemeral: no state is retained after they finish. The purity is actually what makes them appealing in production, to manage complex async flows. They're also not incremental/rewindable: they always run from start to finish.

          Pure   Returns   State   Incremental
React     ✅      ❌        ✅       ✅
Effects   ✅      ✅        ❌       ❌

You either take an effect system and make it incremental and stateful, or you take React and add the missing return data path. I chose the latter option. First, because hooks are an excellent emulsifier. Second, because the big lesson from React is that plain, old, indexed arrays are kryptonite for incremental code. Unless you've deliberately learned how to avoid them, you won't get far, so it's better to start from that side.

This breakdown is divided into three main parts:

the rendering of 1 component
the environment around a component
the overall tree update loop

Components

The component model revolves around a few core concepts:

Fibers
Hooks and State
Mounts
Memoization
Inlining

Components form the "user land" of Live. You can do everything you need there without ever calling directly into the run-time's "kernel". Live however does not shield its internals. This is fine, because I don't employ hundreds of junior engineers who would gleefully turn that privilege into a cluster bomb of spaghetti. The run-time is not extensible anyhow: what you see is what you get.

The escape hatch is there to support testing and debugging. Shielding this would be a game of hide-the-reference, creating a shadow-API for privileged friend packages, and so on. Ain't nobody got time for that. React has an export called DONT_USE_THIS_OR_YOU_WILL_BE_FIRED; Live has THIS_WAY___IF_YOU_DARE and it's called useFiber.

Fibers

Borrowing React terminology, a mounted <Component> function is called a fiber, despite this being single threaded. Each persists for the component lifetime. To start, you call render(<App />). This creates and renders the first fiber.
type LiveFiber = {
  // Fiber ID
  id: number,

  // Component function
  f: Function,

  // Arguments (props, etc.)
  args: any[],

  // ...
}

Fibers are numbered with increasing IDs. In JS this means you can create 2⁵³ fibers before it crashes, which ought to be enough for anybody. It holds the component function f and its latest arguments args. Unlike React, Live functions aren't limited to only a single props argument.

Each fiber is rendered from a <JSX> tag, which is a plain data structure. The Live version is very simple.

type Key = number | string;

type JSX.Element = {
  // Same as fiber
  f: Function,
  args: any[],

  // Element key={...}
  key?: string | number,

  // Rendered by fiber ID
  by: number,
}

Another name for this type is a DeferredCall. This is much leaner than React's JSX type, although Live will gracefully accept either. In Live, JSX syntax is also optional, as you can write use(Component, …) instead of <Component … />.

Calls and fibers track the ID by of the fiber that rendered them. This is always an ancestor, but not necessarily the direct parent.

fiber.bound = () => {
  enterFiber(fiber);

  const {f, args} = fiber;
  const jsx = f(...args);

  exitFiber(fiber);
  return jsx;
};

The fiber holds a function bound. This binds f to the fiber itself, always using the current fiber.args as arguments. It wraps the call in an enter and exit function for state housekeeping. This can then be called via renderFiber(fiber) to get jsx. This is only done during an ongoing render cycle.

Hooks and State

Each fiber holds a local state array and a temporary pointer:

{
  // ...
  state: any[],
  pointer: number,
}

Calling a hook like useState taps into this state without an explicit reference to it. In Live, this is implemented as a global currentFiber variable, combined with a local fiber.pointer starting at 0. Both are initialized by enterFiber(fiber).

The state array holds flattened triplets, one per hook. They're arranged as [hookType, A, B]. Values A and B are hook-specific, but usually hold a value and a dependencies array. In the case of useState, it's just the [value, setValue] pair. The fiber.pointer advances by 3 slots every time a hook is called. Tracking the hookType allows the run-time to warn you if you call hooks in a different order than before.

The basic React hooks don't need any more state than this and can be implemented in ~20 lines of code each. This is useMemo:

export const useMemo = <T>(
  callback: () => T,
  dependencies: any[] = NO_DEPS,
): T => {
  const fiber = useFiber();

  const i = pushState(fiber, Hook.MEMO);
  let {state} = fiber;

  let value = state![i];
  const deps = state![i + 1];

  if (!isSameDependencies(deps, dependencies)) {
    value = callback();

    state![i] = value;
    state![i + 1] = dependencies;
  }

  return value as unknown as T;
}

useFiber just returns currentFiber and doesn't count as a real hook (it has no state). It only ensures you cannot call a hook outside of a component render.

export const useNoHook = (hookType: Hook) => () => {
  const fiber = useFiber();

  const i = pushState(fiber, hookType);
  const {state} = fiber;

  state![i] = undefined;
  state![i + 1] = undefined;
};

No-hooks like useNoMemo are also implemented, which allow for conditional hooks: write a matching else branch for any if. To ensure consistent rendering, a useNoHook will dispose of any state the useHook had, rather than just being a no-op. The above is just the basic version for simple hooks without cleanup.
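For comparison, a useState along the same lines might look like this. This is a sketch only: Hook.STATE and makeSetState are assumed names, and the real dispatch mechanics are described below.

// Sketch only — not the actual Live source. makeSetState is hypothetical:
// it would enqueue the change on the host's dispatch queue and schedule a
// re-render of this fiber.
export const useState = <T>(initialState: T): [T, (t: T) => void] => {
  const fiber = useFiber();

  const i = pushState(fiber, Hook.STATE);
  const {state} = fiber;

  // First render: initialize the [value, setValue] pair.
  // (The real thing would track first-render explicitly,
  // rather than checking for undefined.)
  if (state![i + 1] === undefined) {
    state![i] = initialState;
    state![i + 1] = makeSetState(fiber, i);
  }

  return [state![i] as T, state![i + 1] as (t: T) => void];
};

The matching useNoState would clear both slots, like useNoMemo does.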
This also lets the run-time support early return cleanly in Components: when exitFiber(fiber) is called, all remaining unconsumed state is disposed of with the right no-hook.

If someone calls a setState, this is added to a dispatch queue, so changes can be batched together. If f calls setState during its own render, this is caught and resolved within the same render cycle, by calling f again. A setState which is a no-op is dropped (pointer equality).

You can see however that Live hooks are not pure: when a useMemo is tripped, it will immediately overwrite the previous state during the render, not after. This means renders in Live are not stateless, only idempotent. This is very deliberate. Live doesn't have a useEffect hook, it has a useResource hook that is like a useMemo with a useEffect-like disposal callback. While it seems to throw React's orchestration properties out the window, this is not actually so. What you get in return is an enormous increase in developer ergonomics, offering features React users are still dreaming of, running off 1 state array and 1 pointer.

Live is React with the training wheels off, not with holodeck security protocols disabled, but this takes a while to grok.

Mounts

After rendering, the returned/yielded <JSX> value is reconciled with the previous rendered result. This is done by updateFiber(fiber, value). New children are mounted, while old children are unmounted or have their args replaced. Only children with the same f as before can be updated in place.

{
  // ...

  // Static mount
  mount?: LiveFiber,

  // Dynamic mounts
  mounts?: Map<Key, LiveFiber>,
  lookup?: Map<Key, number>,
  order?: Key[],

  // Continuation
  next?: LiveFiber,

  // Fiber type
  type?: LiveComponent,

  // ...
}

Mounts are tracked inside the fiber, either as a single mount, or a map mounts, pointing to other fiber objects. The key for mounts is either an array index 0..N or a user-defined key. Keys must be unique. The order of the keys is kept in a list. A reverse lookup map is created if they're not anonymous indices.

The mount is only used when a component renders 1 other statically. This excludes arrays of length 1. If a component switches between mount and mounts, all existing mounts are discarded.

Continuations are implemented as a special next mount. This is mounted by one of the built-in fenced operators such as <Capture> or <Gather>.

In the code, mounting is done via:

mountFiberCall(fiber, call) (static)
reconcileFiberCalls(fiber, calls) (dynamic)
mountFiberContinuation(fiber, call) (next)

Each will call updateMount(fiber, mount, jsx, key?, hasKeys?). If an existing mount (with the same key) is compatible it's updated, otherwise a replacement fiber is made with makeSubFiber(…). It doesn't update the parent fiber, rather it just returns the new state of the mount (LiveFiber | null), so it can work for all 3 mounting types. Once a fiber mount has been updated, it's queued to be rendered with flushMount.

If updateMount returns false, the update was a no-op because fiber arguments were identical (pointer equality). The update will be skipped and the mount not flushed. This follows the same implicit memoization rule that React has. It tends to trigger when a stateful component re-renders an old props.children.

A subtle point here is that fibers have no links/pointers pointing back to their parent. This is part practical, part ideological. It's practical because it cuts down on cyclic references that would complicate garbage collection. It's ideological because it helps ensure one-way data flow.
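Condensed into pseudo-TypeScript, the updateMount logic from above comes down to something like this (a paraphrase, not the actual source):

// Condensed sketch of updateMount's cases (not the actual source).
// Returns the new mount, null to unmount, or false if nothing changed.
const updateMount = (
  parent: LiveFiber,
  mount: LiveFiber | null,
  jsx: DeferredCall | null,
): LiveFiber | null | false => {
  // Unmount: there was a fiber, now there is nothing
  if (mount && !jsx) return null;

  // Mount or replace: a different component function means a fresh fiber
  if (jsx && (!mount || mount.f !== jsx.f)) return makeSubFiber(parent, jsx);

  // Update in place: same f, new args
  if (mount && jsx) {
    if (mount.args === jsx.args) return false; // pointer-equal: no-op
    mount.args = jsx.args;
    return mount;
  }

  return false;
};

Note that the mount never stores a link back to its parent, in line with the one-way rule.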
There is also no global collection of fibers, except in the inspector. Like in an effect system, the job of determining what happens is entirely the result of an ongoing computation on JSX, i.e. something passed around like pure, immutable data. The tree determines its own shape as it's evaluated.

Queue and Path

Live needs to process fibers in tree order, i.e. as in a typical tree list view. To do so, fibers are compared as values with compareFibers(a, b). This is based on references that are assigned only at fiber creation. It has a path from the root of the tree to the fiber (at depth depth), containing the indices or keys.

{
  // ...
  depth: number,
  path: Key[],
  keys: (
    number |
    Map<Key, number>
  )[],
}

A continuation next is ordered after the mount or mounts. This allows data fences to work naturally: the run-time only ensures all preceding fibers have been run first. For this, I insert an extra index into the path, 0 or 1, to distinguish the two sub-trees.

If many fibers have a static mount (i.e. always 1 child), this would create paths with lots of useless zeroes. To avoid this, a single mount has the same path as its parent, only its depth is increased. Paths can still be compared element-wise, with depth as the tie breaker. This easily reduces typical path length by 70%.

This is enough for children without keys, which are spawned statically. Their order in the tree never changes after creation, they can only be updated in-place or unmounted. But for children with a key, the expectation is that they persist even if their order changes. Their keys are just unsorted ids, and their order is stored in the fiber.order and fiber.lookup of the parent in question.

This is referenced in the fiber.keys array. It's a flattened list of pairs [i, fiber.lookup], meaning the key at index i in the path should be compared using fiber.lookup. To keep these key references intact, fiber.lookup is mutable and always modified in-place when reconciling.

Memoization

If a Component function is wrapped in memo(...), it won't be re-rendered if its individual props haven't changed (pointer equality). This goes deeper than the run-time's own oldArgs !== newArgs check.

For this, memoized fibers keep a version around. They also store a memo which holds the last rendered version, and a run count runs for debugging:

{
  // ...
  version?: number,
  memo?: number,
  runs?: number,
}

The version is used as one of the memo dependencies, along with the names and values of the props. Hence a memo(...) cache can be busted just by incrementing fiber.version, even if the props didn't change. Versions roll over at 32-bit.

To actually do the memoization, it would be nice if you could just wrap the whole component in useMemo. It doesn't work in the React model because you can't call other hooks inside hooks. So I've brought back the mythical useYolo... An earlier incarnation of this allowed fiber.state scopes to be nested, but lacked a good purpose. The new useYolo is instead a useMemo you can nest. It effectively hot swaps the entire fiber.state array with a new one kept in one of the slots:

This is then the first hook inside fiber.state. If the memo succeeds, the yolo'd state is preserved without treating it as an early return. Otherwise the component runs normally. Yolo'ing as the first hook has a dedicated fast path but is otherwise a perfectly normal hook.

The purpose of fiber.memo is so the run-time can tell whether it rendered the same thing as before, and stop.
It can just compare the two versions, leaving the specifics of memoization entirely up to the fiber component itself. For example, to handle a custom arePropsEqual function in memo(…).

I always use version numbers as opposed to isDirty flags, because it leaves a paper trail. This provides the same ergonomics for mutable data as for immutable data: you can store a reference to a previous value, and do an O(1) equality check to know whether it changed since you last accessed it.

Whenever you have a handle which you can't look inside, such as a pointer to GPU memory, it's especially useful to keep a version number on it, which you bump every time you write to it. It makes debugging so much easier.

Inlining

Built-in operators are resolved with a hand-coded routine post-render, rather than being "normal" components. Their component functions are just empty and there is a big dispatch with if statements. Each is tagged with an isLiveBuiltin: true.

If a built-in operator is an only child, it's usually resolved inline. No new mount is created, it's immediately applied as part of updating the parent fiber. The glue in between tends to be "kernel land"-style code anyway, it doesn't need a whole new fiber, and it's not implemented in terms of hooks. The only fiber state it has is the type (i.e. function) of the last rendered JSX.

There are several cases where it cannot inline, such as rendering one built-in inside another built-in, or rendering a built-in as part of an array. So each built-in can always be mounted independently if needed. From an architectural point of view, inlining is just incidental complexity, but this significantly reduces fiber overhead and keeps the user-facing component tree much tidier. It introduces a few minor problems around cleanup, but those are caught and handled.

Live also has a morph operator. This lets you replace a mount with another component, without discarding any matching children or their state. The mount's own state is still discarded, but its f, args, bound and type are modified in-place. A normal render follows, which will reconcile the children.

This is implemented in morphFiberCall. It only works for plain vanilla components, not other built-ins. The reason to re-use the fiber rather than transplant the children is so that references in children remain unchanged, without having to rekey them.

In Live, I never do a full recursive traversal of any sub-tree, unless that traversal is incremental and memoized. This is a core property of the system. Deep recursion should happen in user-land.

Environment

Fibers have access to a shared environment, provided by their parent. This is created in user-land through built-in ops and accessed via hooks:

Context and captures
Gathers and yeets
Fences and suspend
Quotes and reconcilers
Unquote + quote

Context and captures

Live extends the classic React context:

{
  // ...
  context: {
    values: Map<LiveContext | LiveCapture, Ref<any>>,
    roots: Map<LiveContext | LiveCapture, number | LiveFiber>,
  },
}

A LiveContext provides 1 value to N fibers. A LiveCapture collects N values into 1 fiber. Each is just an object created in user-land with makeContext / makeCapture, acting as a unique key. It can also hold a default value for a context.

The values map holds the current value of each context/capture. This is boxed inside a Ref as {current: value} so that nested sub-environments share values for inherited contexts. The roots map points to the root fibers providing or capturing.
This is used to allow useContext and useCapture to set up the right data dependency just-in-time. For a context, this points upstream in the tree, so to avoid a reverse reference, it's a number. For a capture, this points to a downstream continuation, i.e. the next of an ancestor, and can be a LiveFiber.

Normally children just share their parent's context. It's only when you <Provide> or <Capture> that Live builds a new, immutable copy of values and roots with a new context/capture added. Each context and capture persists for the lifetime of its sub-tree.

Captures build up a map incrementally inside the Ref while children are rendered, keyed by fiber. This is received in tree order after sorting:

<Capture
  context={...}
  children={...}
  then={(values: T[]) => {
    ...
  }}
/>

You can also just write capture(context, children, then), FYI.

This is an await or yield in disguise, where the then closure is spiritually part of the originating component function. Therefore it doesn't need to be memoized. The state of the next fiber is preserved even if you pass a new function instance every time.

Unlike React-style render props, then props can use hooks, because they run on an independent next fiber called Resume(…). This fiber will be re-run when values changes, and can do so without re-running Capture itself.

A then prop can render new elements, building a chain of next fibers. This acts like a rewindable generator, where each Resume provides a place where the code can be re-entered, without having to explicitly rewind any state. This requires the data passed into each closure to be immutable.

The logic for providing or capturing is in provideFiber(fiber, ...) and captureFiber(fiber, ...). Unlike other built-ins, these are always mounted separately and are called at the start of a new fiber, not the end of the previous one. Their children are then immediately reconciled by inlineFiberCall(fiber, calls).

Gathers and yeets

Live offers a true return, in the form of yeet(value) (aka <Yeet>{value}</Yeet>). This passes a value back to a parent. These values are gathered in an incremental map-reduce along the tree, to a root that mounted a gathering operation. It's similar to a Capture, except it visits every parent along the way. It's the complement to tree expansion during rendering.

This works for any mapper and reducer function via <MapReduce>. There is also an optimized code path for a simple array flatMap <Gather>, as well as struct-of-arrays flatMap <MultiGather>. It works just like a capture:

<Gather
  children={...}
  then={(
    value: T[]
  ) => {
    ...
  }}
/>

<MultiGather
  children={...}
  then={(
    value: Record<string, T[]>
  ) => {
    ...
  }}
/>

Each fiber in a reduction has a fiber.yeeted structure, created at mount time. Like a context, this relation never changes for the lifetime of the component. It acts as a persistent cache for a yeeted value of type A and its map-reduction reduced of type B:

{
  yeeted: {
    // Same as fiber (for inspecting)
    id: number,

    // Reduction cache at this fiber
    value?: A,
    reduced?: B,

    // Parent yeet cache
    parent?: FiberYeet<A, B>,

    // Reduction root
    root: LiveFiber,

    // ...
  },
}

The last value yeeted by the fiber is kept so that all yeets are auto-memoized. Each yeeted points to a parent. This is not the parent fiber but its fiber.yeeted. This is the parent reduction, which is downstream in terms of data dependency, not upstream.
This forms a mirrored copy of the fiber tree and respects one-way data flow:

Again the linked root fiber (sink) is not an ancestor, but the next of an ancestor, created to receive the final reduced value. If the reduced value is undefined, this signifies an empty cache. When a value is yeeted, parent caches are busted recursively towards the root, until an undefined is encountered. If a fiber mounts or unmounts children, it busts its reduction as well.

Fibers that yeet a value cannot also have children. This isn't a limitation because you can render a yeet beside other children, as just another mount, without changing the semantics. You can also render multiple yeets, but it's faster to just yeet a single list.

If you yeet undefined, this acts as a zero-cost signal: it does not affect the reduced values, but it will cause the reducing root fiber to be re-invoked. This is a tiny concession to imperative semantics, wildly useful. This may seem very impure, but actually it's the opposite. With clean, functional data types, there is usually a "no-op" value that you could yeet: an empty array or dictionary, an empty function, and so on. You can always force-refresh a reduction without meaningfully changing the output, but it causes a lot of pointless cache invalidation in the process. Zero-cost signals are just an optimization.

When reducing a fiber that has a gathering next, it takes precedence over the fiber's own reduction: this is so that you can gather and reyeet in series, with the final reduction returned.

Fences and suspend

The specifics of a gathering operation are hidden behind a persistent emit and gather callback, derived from a classic map and reduce:

{
  yeeted: {
    // ...

    // Emit a value A yeeted from fiber
    emit: (fiber: LiveFiber, value: A) => void,

    // Gather a reduction B from fiber
    gather: (fiber: LiveFiber, self: boolean) => B,

    // Enclosing yeet scope
    scope?: FiberYeet<any, any>,
  },
}

Gathering is done by the root reduction fiber, so gather is not strictly needed here. It's only exposed so you can mount a <Fence> inside an existing reduction, without knowing its specifics. A fence will grab the intermediate reduction value at that point in the tree and pass it to user-land. It can then be reyeeted.

One such use is to mimic React Suspense using a special toxic SUSPEND symbol. It acts like a NaN, poisoning any reduction it's a part of. You can then fence off a sub-tree to contain the spill and substitute it with its previous value or a fallback.

In practice, gather will delegate to one of gatherFiberValues, multiGatherFiberValues or mapReduceFiberValues. Each will traverse the sub-tree, reuse any existing reduced values (stopping the recursion early), and fill in any undefineds via recursion. Their code is kinda gnarly, given that it's just map-reduce, but that's because they're hand-rolled to avoid useless allocations. The self argument to gather is such an optimization, only true for the final user-visible reduction. This lets intermediate reductions be type unsafe, e.g. to avoid creating pointless 1-element arrays.

At a gathering root, the enclosing yeet scope is also kept. This is to cleanly unmount an inlined gather, by restoring the parent's yeeted.

Quotes and reconcilers

Live has a reconciler in reconcileFiberCalls, but it can also mount <Reconciler> as an effect via mountFiberReconciler. This is best understood by pretending this is React DOM.
When you render a React tree which mixes <Components> with <html>, React reconciles it, and extracts the HTML parts into a new tree:

<App>                    <div>
  <Layout>          =>     <div>
    <div>                    <span>
      <div>                  <img>
        <Header>
          <span>
        <Logo>
          <img>

Each HTML element is implicitly quoted inside React. They're only "activated" when they become real on the right. The ones on the left are only stand-ins.

That's also what a Live <Reconcile> does. It mounts a normal tree of children, but it simultaneously mounts an independent second tree, under its next mount. If you render this:

<App>
  <Reconcile>
    <Layout>
      <Quote>
        <Div>
          <Div>
            <Unquote>
              <Header>
                <Quote>
                  <Span>
                    <Unquote>
                      <Logo>
                        <Quote>
                          <Img />
                          ...

You will get:

It adds a quote environment to the fiber:

{
  // ...
  quote: {
    // Reconciler fiber
    root: number,

    // Fiber in current sub-tree
    from: number,

    // Fiber in other sub-tree
    to: LiveFiber,

    // Enclosing reconciler scope
    scope?: FiberQuote,
  },
}

When you render a <Quote>...</Quote>, whatever's inside ends up mounted on the to fiber. Quoted fibers will have a similar fiber.unquote environment. If they render an <Unquote>...</Unquote>, the children are mounted back on the quoting fiber. Each time, the quoting or unquoting fiber becomes the new to fiber on the other side.

The idea is that you can use this to embed one set of components inside another as a DSL, and have the run-time sort them out.

This all happens in mountFiberQuote(…) and mountFiberUnquote(…). It uses reconcileFiberCall(…) (singular). This is an incremental version of reconcileFiberCalls(…) (plural) which only does one mount/unmount at a time. The fiber id of the quote or unquote is used as the key of the quoted or unquoted fiber.

const Queue = ({children}) => (
  reconcile(
    quote(
      gather(
        unquote(children),
        (things: any[]) =>
          <Render things={things} />
      ))));

The quote and unquote environments are separate so that reconcilers can be nested: at any given place, you can unquote 'up' or quote 'down'. Because you can put multiple <Unquote>s inside one <Quote>, it can also fork. The internal non-JSX dialect is very Lisp-esque, you can rap together some pretty neat structures with this.

Because quotes are mounted and unmounted incrementally, there is a data fence Reconcile(…) after each (un)quote. This is where the final set is re-ordered if needed.

The data structure actually violates my own rule about no-reverse links. After you <Quote>, the fibers in the second tree have a link to the quoting fiber which spawned them. And the same applies in the other direction after you <Unquote>. The excuse is ergonomics. I could break the dependency by creating a separate sub-fiber of <Quote> to serve as the unquoting point, and vice versa. But this would bloat both trees with extra fibers, just for purity's sake. It already has unavoidable extra data fences, so this matters.

At a reconciling root, the enclosing quote scope is added to fiber.quote, just like in yeeted, again for clean unmounting of inlined reconcilers.

Unquote-quote

There is an important caveat here. There are two ways you could implement this. One way is that <Quote>...</Quote> is a Very Special built-in, which does something unusual: it would traverse the children tree it was given, and go look for <Unquote>...</Unquote>s inside. It would have to do so recursively, to partition the quoted and unquoted JSX. Then it would have to graft the quoted JSX to a previous quote, while grafting the unquoted parts to itself as mounts. This is the React DOM mechanism, obfuscated.
This is also how quoting works in Lisp: it switches between evaluation mode and AST mode. I have two objections.

The first is that this goes against the whole idea of evaluating one component incrementally at a time. It wouldn't be working with one set of mounts on a local fiber: it would be building up args inside one big nested JSX expression. JSX is not supposed to be a mutable data structure; you're supposed to construct it immutably from the inside out, not from the outside in.

The second is that this would only work for 'balanced' <Quote>...<Unquote>... pairs appearing in the same JSX expression. If you render:

<Present>
  <Slide />
</Present>

...then you couldn't have <Present> render a <Quote> and <Slide> render an <Unquote> and have it work. It wouldn't be composable as two separate portals. The only way for the quotes/unquotes to be revealed in such a scenario is to actually render the components.

This means you have to actively run the second tree as it's being reconciled, same as the first. There is no separate update + commit like in React DOM.

This might seem pointless, because all this does is thread the data flow into a zigzag between the two trees, knitting the quote/unquote points together. The render order is the same as if <Quote> and <Unquote> weren't there. The path and depth of quoted fibers reveals this, which is needed to re-render them in the right order later.

The key difference is that for all other purposes, those fibers do live in that spot. Each tree has its own stack of nested contexts. Reductions operate on the two separate trees, producing two different, independent values. This is just "hygienic macros" in disguise, I think.

Use.GPU's presentation system uses a reconciler to wrap the layout system, adding slide transforms and a custom compositor. This is sandwiched in-between it and the normal renderer. A plain declarative tree of markup can be expanded into:

<Device>
  <View>
    <Present>
      <Slide>
        <Object />
        <Object />
      </Slide>
      <Slide>
        <Object />
        <Object />
      </Slide>
    </Present>
  </View>
</Device>

I also use a reconciler to produce the WebGPU command queue. This is shared for an entire app and sits at the top. The second tree just contains quoted yeets. I use zero-cost signals here too, to let data sources signal that their contents have changed. There is a short-hand <Signal /> for <Quote><Yeet /></Quote>.

Note that you cannot connect the reduction of tree 1 to the root of tree 2: <Reconcile> does not have a then prop. It doesn't make sense because the next fiber gets its children from elsewhere, and it would create a rendering cycle if you tried anyway.

If you need to spawn a whole second tree based on a first, that's what a normal gather already does. You can use it to e.g. gather lambdas that return memoized JSX. This effectively acts as a two-phase commit. The Use.GPU layout system does this repeatedly, with several trees + gathers in a row. It involves constraints both from the inside out and the outside in, so you need both tree directions. The output is UI shapes, which need to be batched together for efficiency and turned into a data-driven draw call.

The Run-Time

With all the pieces laid out, I can now connect it all together. Before render(<App />) can render the first fiber, it initializes a very minimal run-time. So this section will be kinda dry.
This is accessed through fiber.host and exposes a handful of APIs:

a queue of pending state changes
a priority queue for traversal
a fiber-to-fiber dependency tracker
a resource disposal tracker
a stack slicer for reasons

State changes

When a setState is called, the state change is added to a simple queue as a lambda. This allows simultaneous state changes to be batched together. For this, the host exposes a schedule and a flush method.

{
  // ...
  host: {
    schedule: (fiber: LiveFiber, task?: () => boolean | void) => void,
    flush: () => void,
    // ...
  }
}

This comes from makeActionScheduler(…). It wraps a native scheduling function (e.g. queueMicrotask) and an onFlush callback:

const makeActionScheduler = (
  schedule: (flush: ArrowFunction) => void,
  onFlush: (fibers: LiveFiber[]) => void,
) => {
  // ...
  return {schedule, flush};
}

The callback is set up by render(…). It will take the affected fibers and call renderFibers(…) (plural) on them. The returned schedule(…) will trigger a flush, so flush() is only called directly for sync execution, to stay within the same render cycle.

Traversal

The host keeps a priority queue (makePriorityQueue) of pending fibers to render, in tree order:

{
  // ...
  host: {
    // ...
    visit: (fiber: LiveFiber) => void,
    unvisit: (fiber: LiveFiber) => void,
    pop: () => LiveFiber | null,
    peek: () => LiveFiber | null,
  }
}

renderFibers(…) first adds the fibers to the queue by calling host.visit(fiber). A loop in renderFibers(…) will then call host.peek() and host.pop() until the queue is empty. It will call renderFiber(…) and updateFiber(…) on each, which will call host.unvisit(fiber) in the process. This may also cause other fibers to be added to the queue.

The priority queue is a singly linked list of fibers. It allows fast appends at the start or end. To speed up insertions in the middle, it remembers the last inserted fiber. This massively speeds up the very common case where multiple fibers are inserted into an existing queue in tree order. Otherwise it just does a linear scan. It also has a set of all the fibers in the queue, so it can quickly do presence checks. This means visit and unvisit can safely be called blindly, which happens a lot.

// Re-insert all fibers that descend from fiber
const reorder = (fiber: LiveFiber) => {
  const {path} = fiber;
  const list: LiveFiber[] = [];

  let q = queue;
  let qp = null;

  while (q) {
    // Skip past fibers that precede the re-ordered fiber
    // (hint is the queue's last-insertion cursor)
    if (compareFibers(fiber, q.fiber) >= 0) {
      hint = qp = q;
      q = q.next;
      continue;
    }
    // Collect and unlink any queued descendants
    if (isSubNode(fiber, q.fiber)) {
      list.push(q.fiber);
      if (qp) {
        qp.next = q.next;
        q = q.next;
      }
      else {
        pop();
        q = q.next;
      }
      continue; // keep scanning until we exit the subtree
    }
    break;
  }

  if (list.length) {
    list.sort(compareFibers);
    list.forEach(insert);
  }
};

There is an edge case here though. If a fiber re-orders its keyed children, the compareFibers fiber order of those children changes. But, because of long-range dependencies, it's possible for those children to already be queued. This might mean a later cousin node could render before an earlier one, though never a child before a parent or ancestor.

In principle this is not an issue because the output—the reductions being gathered—will be re-reduced in new order at a fence. From a pure data-flow perspective, this is fine: it would even be inevitable in a multi-threaded version. In practice, it feels off if code runs out of order for no reason, especially in a dev environment. So I added optional queue re-ordering, on by default.
This can be done pretty easily because the affected fibers can be found by comparing paths, and still form a single group inside the otherwise ordered queue: scan until you find a fiber underneath the parent, then pop off fibers until you exit the subtree. Then just reinsert them. This really reminds me of shader warp reordering in raytracing GPUs, btw.

Dependencies

To support contexts and captures, the host has a long-range dependency tracker (makeDependencyTracker):

{
  host: {
    // ...
    depend: (fiber: LiveFiber, root: number) => boolean,
    undepend: (fiber: LiveFiber, root: number) => void,
    traceDown: (fiber: LiveFiber) => LiveFiber[],
    traceUp: (fiber: LiveFiber) => number[],
  }
};

It holds two maps internally, each mapping fibers to fibers, for precedents and descendants respectively. These are mapped as LiveFiber -> id and id -> LiveFiber, once again following the one-way rule. I.e. it gives you real fibers if you traceDown, but only fiber IDs if you traceUp. The latter is only used for highlighting in the inspector.

The depend and undepend methods are called by useContext and useCapture to set up a dependency this way. When a fiber is rendered (and did not memoize), bustFiberDeps(…) is called. This will invoke traceDown(…) and call host.visit(…) on each dependent fiber. It will also call bustFiberMemo(…) to bump their fiber.version (if present).

Yeets could be tracked the same way, but this is unnecessary because yeeted already references the root statically. It's a different kind of cache being busted too (yeeted.reduced) and you need to bust all intermediate reductions along the way. So there is a dedicated visitYeetRoot(…) and bustFiberYeet(…) instead.

Yeets are actually quite tricky to manage because there are two directions of traversal here. A yeet must bust all the caches towards the root. Once those caches are busted, another yeet shouldn't traverse them again until filled back in. It stops when it encounters undefined. Second, when the root gathers up the reduced values from the other end, it should be able to safely accept any defined yeeted.reduced as being correctly cached, and stop as well.

The invariant to be maintained is that a trail of yeeted.reduced === undefined should always lead all the way back to the root. New fibers have an undefined reduction, and old fibers may be unmounted, so these operations also bust caches. But if there is no change in yeets, you don't need to reduce again. So visitYeetRoot is not actually called until and unless a new yeet is rendered or an old yeet is removed.

Managing the lifecycle of this is simple, because there is only one place that triggers a re-reduction to fill it back in: the yeet root. Which is behind a data fence. It will always be called after the last cache has been busted, but before any other code that might need it. It's impossible to squeeze anything in between.

It took a while to learn to lean into this style of thinking. Cache invalidation becomes a lot easier when you can partition your program into "before cache" and "after cache". Compared to the earliest versions of Live, the how and why of busting caches is now all very sensible. You use immutable data, or you pass a mutable ref and a signal. It always works.

Resources

The useResource hook lets a user register a disposal function for later. useContext and useCapture also need to dispose of their dependency when unmounted. For this, there is a disposal tracker (makeDisposalTracker) which effectively acts as an onFiberDispose event listener:
Disposal tasks are triggered by host.dispose(fiber), which is called by disposeFiber(fiber). The latter will also set fiber.bound to undefined so the fiber can no longer be called.

A useResource may change during a fiber's lifetime. Rather than repeatedly untrack/track a new disposal function each time, I store a persistent resource tag in the hook state. This holds a reference to the latest disposal function. Old resources are explicitly disposed of before new ones are created, ensuring there is no overlap.

Stack Slicing

A React-like is a recursive tree evaluator. A naive implementation would use function recursion directly, on the native CPU stack. This is what Live 0.0.1 did. But the run-time has overhead, with its own function calls sandwiched in between (e.g. updateFiber, reconcileFiberCalls, flushMount). This creates towering stacks. It also cannot be time-sliced, because all the state is on the stack.

In React this is instead implemented with a flat work queue, so it only calls into one component at a time. A profiler shows it repeatedly calling performUnitOfWork, beginWork, completeWork in a clean, shallow trace.

Live could do the same with its fiber priority queue. But the rendering order is always just tree order, only interrupted and truncated by memoization. So the vast majority of the time, you would be adding a fiber to the front of the queue only to immediately pop it off again. The queue is a linked list, so this creates allocation overhead. It massively complicates what should just be a native function call.

Live says "¿Por qué no los dos?" and instead has a stack slicing mechanism (makeStackSlicer). It will use the stack, but stop recursion after N levels, where N is a global knob that currently sits at 20. The left-overs are enqueued. This way, it's mainly fibers pinged by state changes and long-range dependencies that end up in the queue. This includes fenced continuations, which must always be called indirectly. If a fiber is in the queue, but ends up being rendered in a parent's recursion, it's immediately removed.

{
  host: {
    // ...
    depth: (depth: number) => void,
    slice: (depth: number) => boolean,
  },
}

When renderFibers gets a fiber from the queue, it calls host.depth(fiber.depth) to calibrate the slicer. Every time a mount is flushed, it will then call host.slice(mount.depth) to check if it should be sliced off. If so, it calls host.visit(…) to add it to the queue; otherwise it just calls renderFiber / updateFiber directly. The exception is when there is a data fence, when the queue is always used.

Here too there is a strict mode, on by default, which ensures that once the stack has been sliced, no further sync evaluation can take place higher up the stack.
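As a sketch of the mechanism (my reconstruction from the description above, not the actual source): the slicer only needs to track a base depth, and cut off native recursion a fixed number of levels past it.

// Hypothetical sketch of a stack slicer: recurse natively up to
// SLICE_DEPTH levels past the calibrated base depth, enqueue the rest.
const SLICE_DEPTH = 20;

const makeStackSlicer = () => {
  let base = 0;

  // Calibrate to the depth of the fiber just popped off the queue.
  const depth = (d: number) => { base = d; };

  // Should a mount at depth d be deferred to the queue?
  const slice = (d: number) => d - base >= SLICE_DEPTH;

  return {depth, slice};
};

Flushing a mount then becomes: if host.slice(mount.depth), call host.visit(mount) to defer it; otherwise recurse into renderFiber / updateFiber directly.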
One-phase commit

Time to rewind. A Live app consists of a tree of such fiber objects, all exactly the same shape, just with different state and environments inside. It's rendered in a purely one-way data flow, with only a minor asterisk on that statement. The host is the only thing coordinating, because it's the only thing that closes the cycle when state changes. This triggers an ongoing traversal, during which it only tells fibers which dependencies to ping when they render. Everything else emerges from the composition of components.

Hopefully you can appreciate that Live is not actually Cowboy React, but something else and deliberate. It has its own invariants it's enforcing, and its own guarantees you can rely on. Like React, it has a strict and a non-strict mode that is meaningfully different, though the strictness is not about it nagging you, but about how anally the run-time will reproduce your exact declared intent. Unlike React, it does not offer any way to roll back partial state changes once made. This idempotent model of rendering is good when you need to accommodate mutable references in a reactive system. Immediate mode APIs tend to use these, and Live is designed to be plugged in to those.

The nice thing about Live is that it's often meaningful to suspend a partially rendered sub-tree without rewinding it back to the old state, because its state doesn't represent anything directly, like HTML does. It's merely reduced into a value, and you can simply re-use the old value until it has unsuspended. There is no need to hold on to all the old state of the components that produced it. If the value being gathered is made of lambdas, you have your two phases: the commit consists of calling them once you have a full set.

In Use.GPU, you work with memory on the GPU, which you allocate once and then reuse by reference. The entire idea is that the view can re-render without re-rendering all the components that produced it, the same way that a browser can re-render a page by animating only the CSS transforms. So I have to be all-in on mutability there, because updated transforms have to travel through the layout system without re-triggering it. I also use immediate mode for the CPU-side interactions, because I've found it makes UI controllers 2-3x less complicated. One interesting aspect here is that the difference between capturing and bubbling events, i.e. outside-in or inside-out, is just before-fence and after-fence.

Live is also not a React alternative: it plays very nicely with React. You can nest one inside the other and barely notice. The Live inspector is written in React, because I needed it to work even if Live was broken. It can memoize effectively in React because Live is memoized. Therefore everything it shows you is live, including any state you open up.

The inspector is functionality-first, so I throw purity out the window and just focus on UX and performance. It installs a host.__ping callback so it can receive fiber pings from the run-time whenever they re-render. The run-time calls this via pingFiber in the right spots. Individual fibers can make themselves inspectable by adding undocumented/private props to fiber.__inspect. There are some helper hooks to make this prettier, but that's all. You can make any component inspector-highlightable by having it re-render itself when highlighted.
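For instance, a component could expose internal values along these lines (a hypothetical illustration; the actual __inspect props and helper hooks are undocumented, and I'm assuming a useFiber-style hook to reach the current fiber):

// Hypothetical: expose internal state for the inspector to display.
const MyComponent: LiveComponent<MyProps> = (props) => {
  const fiber = useFiber();
  const [count, setCount] = useState(0);

  // Private, undocumented-style props; only the inspector reads these.
  fiber.__inspect = {...fiber.__inspect, count};

  // ... render children, handle events, etc.
  return null;
};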
* * *

Writing this post was a fun endeavour, prompting me to reconsider some assumptions from early on. I also fixed a few things that just sounded bad when said out loud. You know how it is.

I removed some lingering unnecessary reverse fiber references. I was aware they weren't load bearing, but that's still different from not having them at all. The only one I haven't mentioned is the capture keys, which are a fiber, so that they can be directly compared. In theory it only needs the id, path, depth and keys, and I could package those up separately, but it would just create extra objects, so the jury allows it.

Live can model programs shaped like a one-way data flow, and generates one-way data itself. There are some interesting correspondences here:

- Live keeps state entirely in fiber objects, while fibers run entirely on fiber.state. A fiber object is just a fixed dictionary of properties, always the same shape, just like fiber.state is for a component's lifetime.
- Children arrays without keys must be fixed-length and fixed-order (a fragment), but may have nulls. This is very similar to how no-hooks will skip over a missing spot in the fiber.state array and zero out the hook, so as to preserve hook order.
- Live hot-swaps a global currentFiber pointer to switch fibers, and useYolo hot-swaps a fiber's own local state to switch hook scopes.
- Memoizing a component can be implemented as a nested useMemo.
- Bumping the fiber version is really a bespoke setState which is resolved during the next render.

The lines between fiber, fiber.state and fiber.mounts are actually pretty damn blurry. A lot of mechanisms appear twice, once in a non-incremental form and once in an incremental form. Iteration turns into mounting, sequences turn into fences, and objects get chopped up into fine bits of cached state, either counted or keyed. The difference between hooks and a gather of unkeyed components starts to blur.

If Live is react-react, then a self-hosted live-live is hiding in there somewhere. Create a root fiber, give it empty state, off you go. Inlining would be a lot harder though, and you wouldn't be able to hand-roll fast paths as easily, which is always the problem in FP. For a JS implementation it would be very dumb, especially when you know that the JS VM already manages object prototypes incrementally, mounting one prop at a time.

I do like the sound of an incremental Lisp where everything is made out of flat state lists instead of endless pointer chasing. If it had the same structure as Live, it might only have one genuine linked list driving it all: the priority queue, which holds elements pointing to elements. The rest would be elements pointing to linear arrays, a data structure that silicon caches love. A data-oriented Lisp, maybe? You could even call it an incremental GPU. Worth pondering.

What Live could really use is a WASM pony with better stack access and threading. But in the meantime, it already works.

The source code for the embedded examples can be found on GitLab. If your browser can do WebGPU (desktop only for now), you can load up any of the Use.GPU examples and inspect them.
