Why your app turned into spaghetti There's a certain kind of programmer. Let's call him Stanley. Stanley has been around for a while, and has his fair share of war stories. The common thread is that poorly conceived and specced solutions lead to disaster, or at least, ongoing misery. As a result, he has adopted a firm belief: it should be impossible for his program to reach an invalid state. Stanley loves strong and static typing. He's a big fan of pattern matching, and enums, and discriminated unions, which allow correctness to be verified at compile time. He also has strong opinions on errors, which must be caught, logged and prevented. He uses only ACID-compliant databases, wants foreign keys and triggers to be enforced, and wraps everything in atomic transactions. He hates any source of uncertainty or ambiguity, like untyped JSON or plain-text markup. His APIs will accept data only in normalized and validated form. When you use a Stanley lib, and it doesn't work, the answer...

More from Acko.net

The Bouquet Residence

Keeping up appearances in tech

I saw a remarkable pair of tweets the other day. In the wake of the outage, the CEO of CrowdStrike sent out a public announcement. It's purely factual. The scope of the problem is identified, the known facts are stated, and the logistics of disaster relief are set in motion. Millions of computers were affected. This is the equivalent of a frazzled official giving a brief statement in the aftermath of an earthquake, directing people to the Red Cross. Everything is basically on fire for everyone involved. Systems are failing everywhere, some critical, and quite likely people are panicking. The important thing is to give the technicians the information and tools to fix it, and for everyone else to do what they can, and stay out of the way. In response, a communication professional posted an 'improved' version: Credit where credit is due, she nailed the style. 10/10. It seems unobjectionable, at first. Let's go through, shall we?

Opposite Day

First is that the CEO is "devastated." A feeling. And they are personally going to ensure it's fixed for every single user. This focuses on the individual who is inconvenienced. Not the disaster. They take a moment out of their time to say they are so, so sorry a mistake was made. They have let you and everyone else down, and that shouldn't happen. That's their responsibility. By this point, the original statement had already told everyone the relevant facts. Here the technical details are left to the imagination. The writer's self-assigned job is to wrap the message in a more palatable envelope. Everyone will be working "all day, all night, all weekends," indeed, "however long it takes," to avoid it happening again. I imagine this is meant to be inspiring and reassuring. But if I were a CrowdStrike technician or engineer, I would find it demoralizing: the boss, who will actually be personally fixing diddly-squat, is saying that the long hours of others are a sacrifice they're willing to make. Plus, CrowdStrike's customers are in the same boat: their technicians get volunteered too. They can't magically unbrick PCs from a distance, so "until it's fully fixed for every single user" would be a promise outsiders will have to keep. Lovely. There's even a punch line: an invitation to go contact them, the quickest way linked directly. It thanks people for reaching out. If everything is on fire, that includes the phone lines, the inboxes, and so on. The most stupid thing you could do in such a situation is to tell more people to contact you, right away. Don't encourage it! That's why the original statement refers to pre-existing lines of communication, internal representatives, and so on. The Support department would hate the CEO too.

Root Cause

If you're wondering about the pictures, it's Hyacinth Bucket, from 90s UK sitcom Keeping Up Appearances, who would always insist "it's pronounced Bouquet." Hyacinth's ambitions always landed her out of her depth, surrounded by upper-class people she's trying to impress, in the midst of an embarrassing disaster. Her increasingly desperate attempts to save face, which invariably made things worse, are the main source of comedy. Try reading that second statement in her voice.

I’m devastated to see the scale of today’s outage and will be personally working on it together with our team until it’s fully fixed for every single user. But I wanted to take a moment to come here and tell you that I am sorry. People around the world rely on us, and incidents like this can’t happen.
This came from an error that ultimately is my responsibility. I can hear it perfectly, telegraphing Britishness to restore dignity for all. If she were in tech she would give that statement. It's about reputation management first, projecting the image of competence and accountability. But she's giving the speech in front of a burning building, not realizing the entire exercise is futile. Worse, she thinks she's nailing it. If CrowdStrike had sent this out, some would've applauded and called it an admirable example of wise and empathetic communication. Real leadership qualities. But it's the exact opposite. It focuses on the wrong things, it alienates the staff, and it definitely amplifies the chaos. It's Monty Python-esque. Apologizing is pointless here, the damage is already done. What matters is how severe it is and whether it could've been avoided. This requires a detailed root-cause analysis and remedy. Otherwise you only have their word. Why would that re-assure you? The original restated the company's mission: security and stability. Those are the stakes to regain a modicum of confidence. You may think that I'm reading too much into this. But I know the exact vibe on an engineering floor when the shit hits the fan. I also know how executives and staff without that experience end up missing the point entirely. I once worked for a Hyacinth Bucket. It's not an anecdote, it's allegory. They simply don't get the engineering mindset, and confuse authority with ownership. They step on everyone's toes without realizing, because they're constantly wearing clown shoes. Nobody tells them. Softness as a Service The change in style between #1 and #2 is really a microcosm of the conflict that has been broiling in tech for ~15 years now. I don't mean the politics, but the shifting of norms, of language and behavior. It's framed as a matter of interpersonal style, which needs to be welcoming and inclusive. In practice this means they assert or demand that style #2 be the norm, even when #1 is advisable or required. Factuality is seen as deficient, improper and primitive. It's a form of doublethink: everyone's preference is equally valid, except yours, specifically. But the difference is not a preference. It's about what actually works and what doesn't. Style #1 is aimed at the people who have to fix it. Style #2 is aimed at the people who can't do anything until it's fixed. Who should they be reaching out to? In #2, communication becomes an end in itself, not a means of conveying information. It's about being seen saying the words, not living them. Poking at the statement makes it fall apart. When this becomes the norm in a technical field, it has deep consequences: Critique must be gift-wrapped in flattery, and is not allowed to actually land. Mistakes are not corrected, and sentiment takes precedence over effectiveness. Leaders speak lofty words far from the trenches to save face. The people they thank the loudest are the ones they pay the least. Inevitably, quiet competence is replaced with gaudy chaos. Everyone says they're sorry and responsible, but nobody actually is. Nobody wants to resign either. Sound familiar? Cope and Soothe The elephant in the room is that #1 is very masculine, while #2 is more feminine. When you hear "women are more empathetic communicators", this is what they mean. They tend to focus on the individual and their relation to them, not the team as a whole and its mission. Complaints that tech is too "male dominated" and "notoriously hostile to women" are often just this. 
Tech was always full of types who won't preface their proposals and criticisms with fluff, and instead lean into autism. When you're used to being pandered to, neutrality feels like vulgarity. The notable exceptions are rare and usually have an exasperating lead up. Tech is actually one of the most accepting and egalitarian fields around. The maintainers do a mostly thankless job. "Oh so you're saying there's no misogyny in tech?" No I'm just saying misogyny doesn't mean "something 1 woman hates". The tone is really a distraction. If someone drops an analysis, saying shit or get off the pot, even very kindly and patiently, some will still run away screaming. Like an octopus spraying ink, they'll deploy a nasty form of #2 as a distraction. That's the real issue. Many techies, in their naiveté, believed the cultural reformers when they showed up to gentrify them. They obediently branded heretics like James Damore, and burned witches like Richard Stallman. Thanks to racism, words like 'master' and 'slave' are now off-limits as technical terms. Ironic, because millions of computers just crashed because they worked exactly like that. The cope is to pretend that nothing has truly changed yet, and more reform is needed. In fact, everything has already changed. Tech forums used to be crucibles for distilling insight, but now they are guarded jealously by people more likely to flag and ban than strongly disagree. I once got flagged on HN because I pointed out Twitter's mass lay-offs were a response to overhiring, and that people were rooting for the site to fail after Musk bought it. It suggested what we all know now: that the company would not implode after trimming the dead weight, and that they'd never forgive him for it. Diversity is now associated with incompetence, because incompetent people have spent over a decade reaching for it as an excuse. In their attempts to fight stereotypes, they ensured the stereotypes came true. Bait and Snitch The outcry tends to be: "We do all the same things you do, but still we get treated differently!" But they start from the conclusion and work their way backwards. This is what the rewritten statement does: it tries to fix the relationship before fixing the problem. The average woman and man actually do things very differently in the first place. Individual men and women choose. And others respond accordingly. The people who build and maintain the world's infrastructure prefer the masculine style for a reason: it keeps civilization running, and helps restore it when it breaks. A disaster announcement does not need to be relatable, it needs to be effective. Furthermore, if the job of shoveling shit falls on you, no amount of flattery or oversight will make that more pleasant. It really won't. Such commentary is purely for the benefit of the ones watching and trying to look busy. It makes it worse, stop pretending otherwise. There's little loyalty in tech companies nowadays, and it's no surprise. Project and product managers are acting more like demanding clients to their own team, than leaders. "As a user, I want..." Yes, but what are you going to do about it? Do you even know where to start? What's perceived as a lack of sensitivity is actually the presence of sensibility. It's what connects the words to the reality on the ground. It does not need to be improved or corrected, it just needs to be respected. And yes it's a matter of gender, because bashing men and masculine norms has become a jolly recreational sport in the overculture. Mature women know it. 
It seems impossible to admit. The entire edifice of gender equality depends on there not being a single thing men are actually better at, even just on average. Where men and women's instincts differ, women must be right. It's childish, and not harmless either. It dares you to call it out, so they can then play the wounded victim, and paint you as the unreasonable asshole who is mean. This is supposed to invalidate the argument. * * * This post is of course a giant cannon pointing in the opposite direction, sitting on top of a wall. Its message will likely fly over the reformers' heads. If they read it at all, they'll selectively quote or paraphrase, call me a tech-bro, and spool off some sentences they overheard, like an LLM. It's why they adore AI, and want it to be exactly as sycophantic as them. They don't care that it makes stuff up wholesale, because it makes them look and feel competent. It will never tell them to just fuck off already. Think less about what is said, more about what is being done. Otherwise the next CrowdStrike will probably be worse.

Stable Fiddusion

Frequency-domain blue noise generator

In computer graphics, stochastic methods are so hot right now. All rendering turns into calculus, except you solve the integrals by numerically sampling them. As I showed with Teardown, this is all based on random noise, hidden with a ton of spatial and temporal smoothing. For this, you need a good source of high quality noise. There have been a few interesting developments in this area, such as Alan Wolfe et al.'s Spatio-Temporal Blue Noise. This post is about how I designed noise in frequency space. I will cover:
What is blue noise?
Designing indigo noise
How swap works in the frequency domain
Heuristics and analysis to speed up search
Implementing it in WebGPU
Along the way I will also show you some "street" DSP math. This illustrates how getting comfy in this requires you to develop deep intuition about complex numbers. But complex doesn't mean complicated. It can all be done on a paper napkin. The WebGPU interface I built What I'm going to make is this: If properly displayed, this image should look eerily even. But if your browser is rescaling it incorrectly, it may not be exactly right.

Colorless Blue Ideas

I will start by just recapping the essentials. If you're familiar, skip to the next section. Ordinary random generators produce uniform white noise: every value is equally likely, and the average frequency spectrum is flat. Time domain Frequency domain To a person, this doesn't actually seem fully 'random', because it has clusters and voids. Similarly, a uniformly random list of coin flips will still have long runs of heads or tails in it occasionally. What a person would consider evenly random is usually blue noise: it prefers to alternate between heads and tails, and avoids long runs entirely. It is 'more random than random', biased towards the upper frequencies, i.e. the blue part of the spectrum. Time domain Frequency domain Blue noise is great for e.g. dithering, because when viewed from afar, or blurred, it tends to disappear. With white noise, clumps remain after blurring: Blurred white noise Blurred blue noise Blueness is a delicate property. If you have e.g. 3D blue noise in a volume XYZ, then a single 2D XY slice is not blue at all: XYZ spectrum XY slice XY spectrum The samples are only evenly distributed in 3D, i.e. when you consider each slice in front and behind it too. Blue noise being delicate means that nobody really knows of a way to generate it statelessly, i.e. as a pure function f(x,y,z). Algorithms to generate it must factor in the whole, as noise is only blue if every single sample is evenly spaced. You can make blue noise images that tile, and sample those, but the resulting repetition may be noticeable. Because blue noise is constructed, you can make special variants. Uniform Blue Noise has a uniform distribution of values, with each value equally likely. An 8-bit 256x256 UBN image will have each unique byte appear exactly 256 times. Projective Blue Noise can be projected down, so that a 3D volume XYZ flattened into either XY, YZ or ZX is still blue in 2D, and same for X, Y and Z in 1D. Spatio-Temporal Blue Noise (STBN) is 3D blue noise created specifically for use in real-time rendering:
Every 2D slice XY is 2D blue noise
Every Z row is 1D blue noise
This means XZ or YZ slices of STBN are not blue.
Instead, it's designed so that when you average out all the XY slices over Z, the result is uniform gray, again without clusters or voids. This requires the noise in all the slices to perfectly complement each other, a bit like overlapping slices of translucent Swiss cheese. This is the sort of noise I want to generate. Indigo STBN 64x64x16 XYZ spectrum

Sleep Furiously

A blur filter's spectrum is the opposite of blue noise: it's concentrated in the lowest frequencies, with a bump in the middle. If you blur the noise, you multiply the two spectra. Very little is left: only the ring-shaped overlap, creating a band-pass area. This is why blue noise looks good when smoothed, and is used in rendering, with both spatial (2D) and temporal smoothing (1D) applied. Blur filters can be designed. If a blur filter is perfectly low-pass, i.e. ~zero amplitude for all frequencies > $ f_{\rm{lowpass}} $, then nothing is left of the upper frequencies past a point. If the noise is shaped to minimize any overlap, then the result is actually noise free. The dark part of the noise spectrum should be large and pitch black. The spectrum shouldn't just be blue, it should be indigo. When people say you can't design noise in frequency space, what they mean is that you can't merely apply an inverse FFT to a given target spectrum. The resulting noise is gaussian, not uniform. The missing ingredient is the phase: all the frequencies need to be precisely aligned to have the right value distribution. This is why you need a specialized algorithm. The STBN paper describes two: void-and-cluster, and swap. Both of these are driven by an energy function. It works in the spatial/time domain, based on the distances between pairs of samples. It uses a "fall-off parameter" sigma to control the effective radius of each sample, with a gaussian kernel. STBN (Wolfe et al.)
The swap algorithm is trivially simple. It starts from white noise and shapes it:
Start with e.g. 256x256 pixels initialized with the bytes 0-255 repeated 256 times in order
Permute all the pixels into ~white noise using a random order
Now iterate: randomly try swapping two pixels, check if the result is "more blue"
This is guaranteed to preserve the uniform input distribution perfectly. The resulting noise patterns are blue, but they still have some noise in all the lower frequencies. The only blur filter that could get rid of it all is one that blurs away all the signal too. My 'simple' fix is just to score swaps in the frequency domain instead. If this seems too good to be true, you should know that a permutation search space is catastrophically huge. If any pixel can be swapped with any other pixel, the number of possible swaps at any given step is O(N²). In a 256x256 image, it's ~2 billion. The goal is to find a sequence of thousands, millions of random swaps, to turn the white noise into blue noise. This is basically stochastic bootstrapping. It's the bulk of good old-fashioned AI, using simple heuristics, queues and other tools to dig around large search spaces. If there are local minima in the way, you usually need more noise and simulated annealing to tunnel over those. Usually. This setup is somewhat simplified by the fact that swaps are symmetric (i.e. (A,B) = (B,A)), but also that applying swaps S1 and S2 is the same as applying swaps S2 and S1 as long as they don't overlap.

Good Energy

Let's take it one hurdle at a time.
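Before taking the hurdles, here is the swap loop above in sketch form. This is plain TypeScript with made-up names, not the actual code; score stands in for whichever scoring function you plug in (the spatial energy from the STBN paper, or the spectral loss described below), with lower meaning better:

// Minimal sketch of the swap search (1D buffer for brevity).
const makeUniformNoise = (size: number): Uint8Array => {
  const data = new Uint8Array(size);
  // Bytes 0-255 repeated in order, so the histogram is exactly uniform.
  for (let i = 0; i < size; i++) data[i] = i % 256;
  // Fisher-Yates shuffle to permute it into ~white noise.
  for (let i = size - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [data[i], data[j]] = [data[j], data[i]];
  }
  return data;
};

const swapSearch = (
  data: Uint8Array,
  score: (data: Uint8Array) => number,
  steps: number,
): Uint8Array => {
  let best = score(data);
  for (let step = 0; step < steps; step++) {
    const a = Math.floor(Math.random() * data.length);
    const b = Math.floor(Math.random() * data.length);
    if (a === b) continue;
    // Try the swap, keep it only if the result scores better ("more blue").
    [data[a], data[b]] = [data[b], data[a]];
    const next = score(data);
    if (next < best) best = next;
    else [data[a], data[b]] = [data[b], data[a]]; // revert
  }
  return data;
};

Because the only operation ever applied is a swap, the uniform histogram of the starting state is preserved exactly, no matter what the scoring does.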
It's not obvious that you can change a signal's spectrum just by re-ordering its values over space/time, but this is easy to illustrate. Take any finite 1D signal, and order its values from lowest to highest. You will get some kind of ramp, approximating a sawtooth wave. This concentrates most of the energy in the first non-DC frequency: Now split the odd values from the even values, and concatenate them. You will now have two ramps, with twice the frequency: You can repeat this to double the frequency all the way up to Nyquist. So you have a lot of design freedom to transfer energy from one frequency to another. In fact the Fourier transform has the property that energy in the time and frequency domain is conserved: This means the sum of $ |\mathrm{spectrum}_k|^2 $ remains constant over pixel swaps. We then design a target curve, e.g. a high-pass cosine: This can be fit and compared to the current noise spectrum to get the error to minimize. However, I don't measure the error in energy $ |\mathrm{spectrum}_k|^2 $ but in amplitude $ |\mathrm{spectrum}_k| $. I normalize the spectrum and the target into distributions, and take the L2 norm of the difference, i.e. a sqrt of the sum of squared errors: This keeps the math simple, but also helps target the noise in the ~zero part of the spectrum. Otherwise, deviations near zero would count for less than deviations around one.
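In sketch form, that scoring might look like the following. It's plain TypeScript, assumes you already have the per-bin magnitudes $ |\mathrm{spectrum}_k| $ from an FFT of your choice, and normalizes by the overall L2 norm, which matches the gradient formula later in the post. The target helper is only a guess at "high-pass cosine with a zero region", since the exact curve isn't spelled out here:

// Amplitude-based loss: normalize both magnitude spectra into distributions
// and take the L2 norm of their difference.
const l2 = (v: Float32Array): number => {
  let sum = 0;
  for (let i = 0; i < v.length; i++) sum += v[i] * v[i];
  return Math.sqrt(sum);
};

const spectralLoss = (spectrum: Float32Array, target: Float32Array): number => {
  const ns = l2(spectrum);
  const nt = l2(target);
  let sum = 0;
  for (let k = 0; k < spectrum.length; k++) {
    const d = spectrum[k] / ns - target[k] / nt;
    sum += d * d;
  }
  return Math.sqrt(sum);
};

// A guessed target shape: hard zero below a cutoff, cosine ramp up to 1 above it.
const makeHighPassCosineTarget = (bins: number, zeroFraction: number): Float32Array => {
  const target = new Float32Array(bins);
  for (let k = 0; k < bins; k++) {
    const t = (k / (bins - 1) - zeroFraction) / (1 - zeroFraction);
    target[k] = t <= 0 ? 0 : 0.5 - 0.5 * Math.cos(Math.PI * Math.min(t, 1));
  }
  return target;
};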
Go Banana

So I tried it. With a lot of patience, you can make 2D blue noise images up to 256x256 on a single thread. A naive random search with an FFT for every iteration is not fast, but computers are. Making a 64x64x16 with this is possible, but it's certainly like watching paint dry. It's the same number of pixels as 256x256, but with an extra dimension worth of FFTs that need to be churned. Still, it works and you can also make 3D STBN with the spatial and temporal curves controlled independently: Converged spectra
I built command-line scripts for this, with a bunch of quality-of-life things. If you're going to sit around waiting for numbers to go by, you have a lot of time for this...
Save and load byte/float-exact state to a .png, save parameters to .json
Save a bunch of debug viz as extra .pngs with every snapshot
Auto-save state periodically during runs
Measure and show rate of convergence every N seconds, with smoothing
Validate the histogram before saving to detect bugs and avoid corrupting expensive runs
I could fire up a couple of workers to start churning, while continuing to develop the code liberally with new variations. I could also stop and restart workers with new heuristics, continuing where it left off. Protip: you can write C in JavaScript
Drunk with power, I tried various sizes and curves, which created... okay noise. Each has the exact same uniform distribution so it's difficult to judge other than comparing to other output, or earlier versions of itself. To address this, I visualized the blurred result, using a [1 4 6 4 1] kernel as my baseline. After adjusting levels, structure was visible: Semi-converged Blurred The resulting spectra show what's actually going on: Semi-converged Blurred The main component is the expected ring of bandpass noise, the 2D equivalent of ringing. But in between there is also a ton of redder noise, in the lower frequencies, which all remains after a blur. This noise is as strong as the ring. So while it's easy to make a blue-ish noise pattern that looks okay at first glance, there is a vast gap between having a noise floor and not having one. So I kept iterating: It takes a very long time, but if you wait, all those little specks will slowly twinkle out, until quantization starts to get in the way, with a loss of about 1/255 per pixel (0.0039). Semi converged Fully converged The effect on the blurred output is remarkable. All the large scale structure disappears, as you'd expect from spectra, leaving only the bandpass ringing. That goes away with a strong enough blur, or a large enough dark zone. The visual difference between the two is slight, but nevertheless, the difference is significant and pervasive when amplified: Semi converged Fully converged Difference Final spectrum I tried a few indigo noise curves, with different % of the curve zero. The resulting noise is all extremely equal, even after a blur and amplify. The only visible noise left is bandpass, and the noise floor is so low it may as well not be there. As you make the black exclusion zone bigger, the noise gets concentrated in the edges and corners. It becomes a bit more linear and squarish, a contender for violet noise. This is basically a smooth evolution towards a pure pixel checkerboard in the limit. Using more than 50% zero seems inadvisable for this reason: Time domain Frequency domain At this point the idea was validated, but it was dog slow. Can it be done faster?

Spatially Sparse

An FFT scales like O(N log N). When you are dealing with images and volumes, that N is actually an N² or N³ in practice. The early phase of the search is the easiest to speed up, because you can find a good swap for any pixel with barely any tries. There is no point in being clever. Each sub-region is very non-uniform, and its spectrum nearly white. Placing pixels roughly by the right location is plenty good enough. You might try splitting a large volume into separate blocks, and optimize each block separately. That wouldn't work, because all the boundaries remain fully white. Overlapping doesn't fix this, because the overlaps will actively create new seams. I tried it. What does work is a windowed scoring strategy. It avoids a full FFT for the entire volume, and only scores each NxN or NxNxN region around each swapped point, with N-sized FFTs in each dimension. This is enormously faster and can rapidly dissolve larger volumes of white noise into approximate blue even with e.g. N = 8 or N = 16. Eventually it stops improving and you need to bump the range or switch to a global optimizer. Here's the progression from white noise, to when sparse 16x16 gives up, followed by some additional 64x64: Time domain Frequency domain Time domain Frequency domain Time domain Frequency domain A naive solution does not work well, however. This is because the spectrum of a subregion does not match the spectrum of the whole. The Fourier transform assumes each signal is periodic. If you take a random subregion and forcibly repeat it, its new spectrum will have aliasing artifacts. This would cause you to consistently misjudge swaps. To fix this, you need to window the signal in the space/time-domain. This forces it to start and end at 0, and eliminates the effect of non-matching boundaries on the scoring. I used a smoothStep window because it's cheap, and haven't needed to try anything else: 16x16 windowed data This still alters the spectrum, but in a predictable way. A time-domain window is a convolution in the frequency domain. You don't actually have a choice here: not using a window is mathematically equivalent to using a very bad window. It's effectively a box filter covering the cut-out area inside the larger volume, which causes spectral ringing.
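As far as the description above goes, the windowed local scoring amounts to something like this sketch: cut an N-sized patch out around the swapped pixel, taper it with a separable smoothstep window so it starts and ends at zero, and only then hand it to an N×N FFT. The names, the wrap-around addressing and the exact window sampling are my own guesses, not the actual implementation:

// Separable smoothstep window for an N-sized patch (0 at both edges, 1 in the middle).
const smoothstep = (t: number): number => t * t * (3 - 2 * t);

const makeWindow = (n: number): Float32Array => {
  const w = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    const t = (i + 0.5) / n;                      // sample at texel centers
    w[i] = smoothstep(1 - Math.abs(2 * t - 1));   // ramps 0 -> 1 -> 0
  }
  return w;
};

// Cut an n x n patch around (cx, cy) out of a width x height image and window it.
// The patch wraps around the edges, matching the periodicity the FFT assumes.
const windowedPatch = (
  data: Float32Array, width: number, height: number,
  cx: number, cy: number, n: number,
): Float32Array => {
  const w = makeWindow(n);
  const patch = new Float32Array(n * n);
  const half = n >> 1;
  for (let y = 0; y < n; y++) {
    const sy = (cy + y - half + height) % height;
    for (let x = 0; x < n; x++) {
      const sx = (cx + x - half + width) % width;
      patch[y * n + x] = data[sy * width + sx] * w[x] * w[y];
    }
  }
  return patch; // score this with an n x n FFT against the (convolved) target
};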
The effect of the chosen window on the target spectrum can be modeled via convolution of their spectral magnitudes: $$ \mathbf{target}' = |\mathbf{target}| \circledast |\mathcal{F}(\mathbf{window})| $$ This can be done via the time domain as: $$ \mathbf{target}' = \mathcal{F}(\mathcal{F}^{-1}(|\mathbf{target}|) \cdot \mathcal{F}^{-1}(|\mathcal{F}(\mathbf{window})|)) $$ Note that the forward/inverse Fourier pairs are not redundant, as there is an absolute value operator in between. This discards the phase component of the window, which is irrelevant. Curiously, while it is important to window the noise data, it isn't very important to window the target. The effect of the spectral convolution is small, amounting to a small blur, and the extra error is random and absorbed by the smooth scoring function. The resulting local loss tracks the global loss function pretty closely. It massively speeds up the search in larger volumes, because the large FFT is the bottleneck. But it stops working well before anything resembling convergence in the frequency domain. It does not make true blue noise, only a lookalike. The overall problem is still that we can't tell good swaps from bad swaps without trying them and verifying.

Sleight of Frequency

So, let's characterize the effect of a pixel swap. Given a signal [A B C D E F G H], let's swap C and F. Swapping the two values is the same as adding F - C = Δ to C, and subtracting that same delta from F. That is, you add the vector:
V = [0 0 Δ 0 0 -Δ 0 0]
This remains true if you apply a Fourier transform and do it in the frequency domain. To best understand this, you need to develop some intuition around FFTs of Dirac deltas. Consider the short filter kernel [1 4 6 4 1]. It's a little-known fact, but you can actually sight-read its frequency spectrum directly off the coefficients, because the filter is symmetrical. I will teach you. The extremes are easy:
The DC amplitude is the sum 1 + 4 + 6 + 4 + 1 = 16
The Nyquist amplitude is the modulated sum 1 - 4 + 6 - 4 + 1 = 0
So we already know it's an 'ideal' lowpass filter, which reduces the Nyquist signal +1, -1, +1, -1, ... to exactly zero. It also has 16x DC gain. Now all the other frequencies. First, remember the Fourier transform works in symmetric ways. Every statement "____ in the time domain = ____ in the frequency domain" is still true if you swap the words time and frequency. This has led to the grotesquely named sub-field of cepstral processing where you have quefrencies and vawes, and it kinda feels like you're having a stroke. The cepstral convolution filter from earlier is called a lifter. Usually cepstral processing is applied to the real magnitude of the spectrum, i.e. $ |\mathrm{spectrum}| $, instead of its true complex value. This is a coward move. So, decompose the kernel into symmetric pairs:
[· · 6 · ·]
[· 4 · 4 ·]
[1 · · · 1]
All but the first row is a pair of real Dirac deltas in the time domain. Such a row is normally what you get when you Fourier transform a cosine, i.e.: $$ \cos \omega = \frac{\mathrm{e}^{i\omega} + \mathrm{e}^{-i\omega}}{2} $$ A cosine in time is a pair of Dirac deltas in the frequency domain. The phase of a (real) cosine is zero, so both its deltas are real. Now flip it around. The Fourier transform of a pair [x 0 0 ... 0 0 x] is a real cosine in frequency space. Must be true. Each new pair adds a new higher cosine on top of the existing spectrum. For the central [... 0 0 x 0 0 ...] we add a DC term. It's just a Fourier transform in the other direction:
|FFT([1 4 6 4 1])| = [· · 6 · ·] => 6
                     [· 4 · 4 ·] => 8 cos(ɷ)
                     [1 · · · 1] => 2 cos(2ɷ)
                   = 6 + 8 cos(ɷ) + 2 cos(2ɷ)
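If you want to convince yourself, here is a throwaway check in plain TypeScript (nothing from the actual code): take a naive DFT of the kernel, zero-padded to N bins, and compare its magnitudes against the read-off formula.

// Verify |FFT([1 4 6 4 1])| = 6 + 8 cos(ω) + 2 cos(2ω), with ω = 2πk/N.
const kernel = [1, 4, 6, 4, 1];
const N = 16; // zero-padded length; any N >= 5 works

const dftMagnitude = (signal: number[], k: number, size: number): number => {
  let re = 0, im = 0;
  for (let n = 0; n < signal.length; n++) {
    const phase = (-2 * Math.PI * k * n) / size;
    re += signal[n] * Math.cos(phase);
    im += signal[n] * Math.sin(phase);
  }
  return Math.hypot(re, im);
};

for (let k = 0; k < N; k++) {
  const w = (2 * Math.PI * k) / N;
  const direct = dftMagnitude(kernel, k, N);
  const readOff = 6 + 8 * Math.cos(w) + 2 * Math.cos(2 * w);
  console.log(k, direct.toFixed(6), readOff.toFixed(6));
}

The kernel's centering shift only contributes a linear phase, which the magnitude ignores, so the two columns match up to rounding.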
Normally you have to use the z-transform to analyze a digital filter. But the above is a shortcut. FFTs and inverse FFTs do have opposite phase, but that doesn't matter here because cos(ɷ) = cos(-ɷ). This works for the symmetric-even case too: you offset the frequencies by half a band, ɷ/2, and there is no DC term in the middle:
|FFT([1 3 3 1])| = [· 3 3 ·] => 6 cos(ɷ/2)
                   [1 · · 1] => 2 cos(3ɷ/2)
                 = 6 cos(ɷ/2) + 2 cos(3ɷ/2)
So, symmetric filters have spectra that are made up of regular cosines. Now you know. What about the delta vector [0 0 Δ 0 0 -Δ 0 0]? It's not symmetric, so we have to decompose it:
V1 = [· · · · · Δ · ·]
V2 = [· · Δ · · · · ·]
V = V2 - V1
Each is now an unpaired Dirac delta. Each vector's Fourier transform is a complex wave $ Δ \cdot \mathrm{e}^{-i \omega k} $ in the frequency domain (the k'th quefrency). It lacks the usual complementary oppositely twisting wave $ Δ \cdot \mathrm{e}^{i \omega k} $, so it's not real-valued. It has constant magnitude Δ and varying phase:
FFT(V1) = [ Δ Δ Δ Δ Δ Δ Δ Δ ]
FFT(V2) = [ Δ Δ Δ Δ Δ Δ Δ Δ ]
These are vawes. The effect of a swap is still just to add FFT(V), aka FFT(V2) - FFT(V1) to the (complex) spectrum. The effect is to transfer energy between all the bands simultaneously. Hence, FFT(V1) and FFT(V2) function as a source and destination mask for the transfer. However, 'mask' is the wrong word, because the magnitude of $ \mathrm{e}^{i \omega k} $ is always 1. It doesn't have varying amplitude, only varying phase. -FFT(V1) and FFT(V2) define the complex direction in which to add/subtract energy. When added together their phases interfere constructively or destructively, resulting in an amplitude that varies between 0 and 2Δ: an actual mask. The resulting phase will be halfway between the two, as it's the sum of two equal-length complex numbers.
FFT(V) = [ · Δ Δ Δ Δ Δ Δ Δ ]
For any given pixel A and its delta FFT(V1), it can pair up with other pixels B to form N-1 different interference masks FFT(V2) - FFT(V1). There are N(N-1)/2 unique interference masks, if you account for (A,B) (B,A) symmetry. Worth pointing out, the FFT of the first index:
FFT([Δ 0 0 0 0 0 0 0]) = [Δ Δ Δ Δ Δ Δ Δ Δ]
This is the DC quefrency, and the Fourier symmetry continues to work. Moving values in time causes the vawe's quefrency to change in the frequency domain. This is the upside-down version of how moving energy to another frequency band causes the wave's frequency to change in the time domain.
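To pin down the claim that the mask's amplitude varies between 0 and 2Δ: with the DFT convention $ X_k = \sum_n x_n \, \mathrm{e}^{-2 \pi i k n / N} $, a swap between indices $ i_1 $ and $ i_2 $ gives (derived from the definitions above, not quoted from the post):

$$ |\mathcal{F}(\mathbf{V})_k| = \left| Δ \, \mathrm{e}^{-2 \pi i k i_2 / N} - Δ \, \mathrm{e}^{-2 \pi i k i_1 / N} \right| = 2 |Δ| \left| \sin \left( \frac{\pi k (i_2 - i_1)}{N} \right) \right| $$

So the magnitude of the mask cycles with the distance between the two swapped pixels, while its phase rotates at the average of their two quefrencies, which is the "halfway between the two" phase mentioned above.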
What's the Gradient, Kenneth?

Using vectors as masks... shifting energy in directions... this means gradient descent, no? Well. It's indeed possible to calculate the derivative of your loss function as a function of input pixel brightness, with the usual bag of automatic differentiation/backprop tricks. You can also do it numerically. But, this doesn't help you directly because the only way you can act on that per-pixel gradient is by swapping a pair of pixels. You need to find two quefrencies FFT(V1) and FFT(V2) which interfere in exactly the right way to decrease the loss function across all bad frequencies simultaneously, while leaving the good ones untouched. Even if the gradient were to help you pick a good starting pixel, that still leaves the problem of finding a good partner. There are still O(N²) possible pairs to choose from, and the entire spectrum changes a little bit on every swap. Which means new FFTs to analyze it. Random greedy search is actually tricky to beat in practice. Whatever extra work you spend on getting better samples translates into fewer samples tried per second. E.g. taking a best-of-3 approach is worse than just trying 3 swaps in a row. Swaps are almost always orthogonal. But random() still samples unevenly because it's white noise. If only we had.... oh wait. Indeed if you already have blue noise of the right size, you can use that to mildly speed up the search for more. Use it as a random permutation to drive sampling, with some inevitable whitening over time to keep it fresh. You can't however use the noise you're generating to accelerate its own search, because the two are highly correlated. What's really going on is all a consequence of the loss function. Given any particular frequency band, the loss function is only affected when its magnitude changes. Its phase can change arbitrarily, rolling around without friction. The complex gradient must point in the radial direction. In the tangential direction, the partial derivative is zero. The value of a given interference mask FFT(V2) - FFT(V1) for a given frequency is also complex-valued. It can be projected onto the current phase, and split into its radial and tangential component with a dot product. The interference mask has a dual action. As we saw, its magnitude varies between 0 and 2Δ, as a function of the two indices k1 and k2. This creates a window that is independent of the specific state or data. It defines a smooth 'hash' from the interference of two quefrency bands. But its phase adds an additional selection effect: whether the interference in the mask is aligned with the current band's phase: this determines the split between radial and tangential. This defines a smooth phase 'hash' on top. It cycles at the average of the two quefrencies, i.e. a different, third one. Energy is only added/subtracted if both hashes are non-zero. If the phase hash is zero, the frequency band only turns. This does not affect loss, but changes how each mask will affect it in the future. This then determines how it is coupled to other bands when you perform a particular swap. Note that this is only true differentially: for a finite swap, the curvature of the complex domain comes into play. The loss function is actually a hyper-cylindrical skate bowl you can ride around. Just the movement of all the bands is tied together. Frequency bands with significant error may 'random walk' freely clockwise or counterclockwise when subjected to swaps. A band can therefore drift until it gets a turn where its phase is in alignment with enough similar bands, where the swap makes them all descend along the local gradient, enough to counter any negative effects elsewhere. In the time domain, each frequency band is a wave that oscillates between -1...1: it 'owns' some of the value of each pixel, but there are places where its weight is ~zero (the knots). So when a band shifts phase, it changes how much of the energy of each pixel it 'owns'. This allows each band to 'scan' different parts of the noise in the time domain. In order to fix a particular peak or dent in the frequency spectrum, the search must rotate that band's phase so it strongly owns any defect in the noise, and then perform a swap to fix that defect. Thus, my mental model of this is not actually disconnected pixel swapping.
It's more like one of those Myst puzzles where flipping a switch flips some of the neighbors too. You press one pair of buttons at a time. It's a giant haunted dimmer switch. We're dealing with complex amplitudes, not real values, so the light also has a color. Mechanically it's like a slot machine, with dials that can rotate to display different sides. The cherries and bells are the color: they determine how the light gets brighter or darker. If a dial is set just right, you can use it as a /dev/null to 'dump' changes. That's what theory predicts, but does it work? Well, here is a (blurred noise) spectrum being late-optimized. The search is trying to eliminate the left-over lower frequency noise in the middle: Semi converged Here's the phase difference from the late stages of search, each a good swap. Left to right shows 4 different value scales: x2 x16 x256 x4096 x2 x16 x256 x4096 At first it looks like just a few phases are changing, but amplification reveals it's the opposite. There are several plateaus. Strongest are the bands being actively modified. Then there's the circular error area around it, where other bands are still swirling into phase. Then there's a sharp drop-off to a much weaker noise floor, present everywhere. These are the bands that are already converged. Compare to a random bad swap: x2 x16 x256 x4096 Now there is strong noise all over the center, and the loss immediately gets worse, as a bunch of amplitudes start shifting in the wrong direction randomly. So it's true. Applying the swap algorithm with a spectral target naturally cycles through focusing on different parts of the target spectrum as it makes progress. This information is positionally encoded in the phases of the bands and can be 'queried' by attempting a swap. This means the constraint of a fixed target spectrum is actually a constantly moving target in the complex domain. Frequency bands that reach the target are locked in. Neither their magnitude nor phase changes in aggregate. The random walks of such bands must have no DC component... they must be complex-valued blue noise with a tiny amplitude. Knowing this doesn't help directly, but it does explain why the search is so hard. Because the interference masks function like hashes, there is no simple pattern to how positions map to errors in the spectrum. And once you get close to the target, finding new good swaps is equivalent to digging out information encoded deep in the phase domain, with O(N²) interference masks to choose from. Gradient Sampling As I was trying to optimize for evenness after blur, it occurred to me to simply try selecting bright or dark spots in the blurred after-image. This is the situation where frequency bands are in coupled alignment: the error in the spectrum has a relatively concentrated footprint in the time domain. But, this heuristic merely picks out good swaps that are already 'lined up' so to speak. It only works as a low-hanging fruit sampler, with rapidly diminishing returns. Next I used the gradient in the frequency domain. The gradient points towards increasing loss, which is the sum of squared distance $ (…)^2 $. 
So the slope is $ 2(…) $, proportional to distance to the goal: $$ |\mathrm{gradient}_k| = 2 \cdot \left( \frac{|\mathrm{spectrum}_k|}{||\mathbf{spectrum}||} - \frac{\mathrm{target}_k}{||\mathbf{target}||} \right) $$ It's radial, so its phase matches the spectrum itself: $$ \mathrm{gradient}_k = \mathrm{|gradient_k|} \cdot \left(1 ∠ \mathrm{arg}(\mathrm{spectrum}_k) \right) $$ Eagle-eyed readers may notice the sqrt part of the L2 norm is missing here. It's only there for normalization, and in fact, you generally want a gradient that decreases the closer you get to the target. It acts as a natural stabilizer, forming a convex optimization problem. You can transport this gradient backwards by applying an inverse FFT. Usually derivatives and FFTs don't commute, but that's only when you are deriving in the same dimension as the FFT. The partial derivative here is neither over time nor frequency, but by signal value. The resulting time-domain gradient tells you how fast the (squared) loss would change if a given pixel changed. The sign tells you whether it needs to become lighter or darker. In theory, a pixel with a large gradient can enable larger score improvements per step. It says little about what's a suitable pixel to pair with though. You can infer that a pixel needs to be paired with one that is brighter or darker, but not how much. The gradient only applies differentially. It involves two pixels, so it will cause interference between the two deltas, and also with the signal's own phase. The time-domain gradient does change slowly after every swap—mainly the swapping pixels—so this only needs to add an extra IFFT every N swap attempts, reusing it in between. I tried this in two ways. One was to bias random sampling towards points with the largest gradients. This barely did anything, when applied to one or both pixels. Then I tried going down the list in order, and this worked better. I tried a bunch of heuristics here, like adding a retry until paired, and a 'dud' tracker to reject known unpairable pixels. It did lead to some minor gains in successful sample selection. But beating random was still not a sure bet in all cases, because it comes at the cost of ordering and tracking all pixels to sample them. All in all, it was quite mystifying. Pair Analysis Hence I analyzed all possible swaps (A,B) inside one 64x64 image at different stages of convergence, for 1024 pixels A (25% of total). The result was quite illuminating. There are 2 indicators of a pixel's suitability for swapping: % of all possible swaps (A,_) that are good score improvement of best possible swap (A,B) They are highly correlated, and you can take the geometric average to get a single quality score to order by: The curve shows that the best possible candidates are rare, with a sharp drop-off at the start. Here the average candidate is ~1/3rd as good as the best, though every pixel is pairable. This represents the typical situation when you have unconverged blue-ish noise. Order all pixels by their (signed) gradient, and plot the quality: The distribution seems biased towards the ends. A larger absolute gradient at A can indeed lead to both better scores and higher % of good swaps. Notice that it's also noisier at the ends, where it dips below the middle. If you order pixels by their quality, and then plot the gradient, you see: Selecting for large gradient at A will select both the best and the worst possible pixels A. 
This implies that there are pixels in the noise that are very significant, but are nevertheless currently 'unfixable'. This corresponds to the 'focus' described earlier. By drawing from the 'top', I was mining the imbalance between the good/left and bad/right distribution. Selecting for a vanishing gradient would instead select the average-to-bad pixels A. I investigated one instance of each: very good, average or very bad pixel A. I tried every possible swap (A, B). You can plot a curve of the potential score improvement: The three scenarios have similar curves, with the bulk of swaps being negative. Only a tiny bit of the curve is sticking out positive, even in the best case. The potential benefit of a good swap is dwarfed by the potential harm of bad swaps. The main difference is just how many positive swaps there are, if any. So let's focus on the positive case, where you can see best. You can order by score, and plot the gradient of all the pixels B, to see another correlation. It looks kinda promising. Here the sign matters, with left and right being different. If the gradient of pixel A is the opposite sign, then this graph is mirrored. But if you order by (signed) gradient and plot the score, you see the real problem, caused by the noise: The good samples are mixed freely among the bad ones, with only a very weak trend downward. This explains why sampling improvements based purely on gradient for pixel B are impossible. You can see what's going on if you plot Δv, the difference in value between A and B: For a given pixel A, all the good swaps have a similar value for B, which is not unexpected. Its mean is the ideal value for A, but there is a lot of variance. In this case pixel A is nearly white, so it is brighter than almost every other pixel B. If you now plot Δv * -gradient, you see a clue on the left: Almost all of the successful swaps have a small but positive value. This represents what we already knew: the gradient's sign tells you if a pixel should be brighter or darker. If Δv has the opposite sign, the chances of a successful swap are slim. Ideally both pixels 'face' the right way, so the swap is beneficial on both ends. But only the combined effect on the loss matters: i.e. Δv * Δgradient < 0. It's only true differentially so it can misfire. But compared to blind sampling of pairs, it's easily 5-10x better and faster, racing towards the tougher parts of the search. What's more... while this test is just binary, I found that any effort spent on trying to further prioritize swaps by the magnitude of the gradient is entirely wasted. Maximizing Δv * Δgradient by repeated sampling is counterproductive, because it selects more bad candidates on the right. Minimizing Δv * Δgradient creates more successful swaps on the left, but lowers the average improvement per step so the convergence is net slower. Anything more sophisticated incurs too much computation to be worth it. It does have a limit. This is what it looks like when an image is practically fully converged: Eventually you reach the point where there are only a handful of swaps with any real benefit, while the rest is just shaving off a few bits of loss at a time. It devolves back to pure random selection, only skipping the coin flip for the gradient. It is likely that more targeted heuristics can still work here. The gradient also works in the early stages. As it barely changes over successive swaps, this leads to a different kind of sparse mode. 
Instead of scoring only a subset of pixels, simply score multiple swaps as a group over time, without re-scoring intermediate states. This lowers the success rate roughly by a power (e.g. 0.8 -> 0.64), but cuts the number of FFTs by a constant factor (e.g. 1/2). Early on this trade-off can be worth it. Even faster: don't score steps at all. In the very early stage, you can easily get up to 80-90% successful swaps just by filtering on values and gradients. If you just swap a bunch in a row, there is a very good chance you will still end up better than before. It works better than sparse scoring: using the gradient of your true objective approximately works better than using an approximate objective exactly. The latter will miss the true goal by design, while the former continually re-aims itself to the destination despite inaccuracy. Obviously you can mix and match techniques, and gradient + sparse is actually a winning combo. I've only scratched the surface here. Warp Speed Time to address the elephant in the room. If the main bottleneck is an FFT, wouldn't this work better on a GPU? The answer to that is an unsurprising yes, at least for large sizes where the overhead of async dispatch is negligible. However, it would have been endlessly more cumbersome to discover all of the above based on a GPU implementation, where I can't just log intermediate values to a console. After checking everything, I pulled out my bag of tricks and ported it to Use.GPU. As a result, the algorithm runs entirely on the GPU, and provides live visualization of the entire process. It requires a WebGPU-enabled browser, which in practice means Chrome on Windows or Mac, or a dev build elsewhere. I haven't particularly optimized this—the FFT is vanilla textbook—but it works. It provides an easy ~8x speed up on an M1 Mac on beefier target sizes. With a desktop GPU, 128x128x32 and larger become very feasible. It lacks a few goodies from the scripts, and only does gradient + optional sparse. You can however freely exchange PNGs between the CPU and GPU version via drag and drop, as long as the settings match. Layout components Compute components Worth pointing out: this visualization is built using Use.GPU's HTML-like layout system. I can put div-like blocks inside a flex box wrapper, and put text beside it... while at the same time using raw WGSL shaders as the contents of those divs. These visualization shaders sample and colorize the algorithm state on the fly, with no CPU-side involvement other than a static dispatch. The only GPU -> CPU readback is for the stats in the corner, which are classic React and real HTML, along with the rest of the controls. I can then build an <FFT> component and drop it inside an async <ComputeLoop>, and it does exactly what it should. The rest is just a handful of <Dispatch> elements and the ordinary headache of writing compute shaders. <Suspense> ensures all the shaders are compiled before dispatching. While running, the bulk of the tree is inert, with only a handful of reducers triggering on a loop, causing a mere 7 live components to update per frame. The compute dispatch fights with the normal rendering for GPU resources, so there is an auto-batching mechanism that aims for approximately 30-60 FPS. The display is fully anti-aliased, including the pixelized data. I'm using the usual per-pixel SDF trickery to do this... it's applied as a generic wrapper shader for any UV-based sampler. 
It's a good showcase that Use.GPU really is React-for-GPUs with less hassle, but still with all the goodies. It bypasses most of the browser once the canvas gets going, and it isn't just for UI: you can express async compute just fine with the right component design. The robust layout and vector plotting capabilities are just extra on top. I won't claim it's the world's most elegant abstraction, because it's far too pragmatic for that. But I simply don't know any other programming environment where I could even try something like this and not get bogged down in GPU binding hell, or have to round-trip everything back to the CPU. * * * So there you have it: blue and indigo noise à la carte. What I find most interesting is that the problem of generating noise in the time domain has been recast into shaping and denoising a spectrum in the frequency domain. It starts as white noise, and gets turned into a pre-designed picture. You do so by swapping pixels in the other domain. The state for this process is kept in the phase channel, which is not directly relevant to the problem, but drifts into alignment over time. Hence I called it Stable Fiddusion. If you swap the two domains, you're turning noise into a picture by swapping frequency bands without changing their values. It would result in a complex-valued picture, whose magnitude is the target, and whose phase encodes the progress of the convergence process. This is approximately what you get when you add a hidden layer to a diffusion model. What I also find interesting is that the notion of swaps naturally creates a space that is O(N²) big with only N samples of actual data. Viewed from the perspective of a single step, every pair (A,B) corresponds to a unique information mask in the frequency domain that extracts a unique delta from the same data. There is redundancy, of course, but the nature of the Fourier transform smears it out into one big superposition. When you do multiple swaps, the space grows, but not quite that fast: any permutation of the same non-overlapping swaps is equivalent. There is also a notion of entanglement: frequency bands / pixels are linked together to move as a whole by default, but parts will diffuse into being locked in place. Phase is kind of the bugbear of the DSP world. Everyone knows it's there, but they prefer not to talk about it unless its content is neat and simple. Hopefully by now you have a better appreciation of the true nature of a Fourier transform. Not just as a spectrum for a real-valued signal, but as a complex-valued transform of a complex-valued input. During a swap run, the phase channel continuously looks like noise, but is actually highly structured when queried with the right quefrency hashes. I wonder what other things look like that, when you flip them around. More: Stable Fiddusion app CPU-side source code WebGPU source code

Sub-pixel Distance Transform

High quality font rendering for WebGPU This page includes diagrams in WebGPU, which has limited browser support. For the full experience, use Chrome on Windows or Mac, or a developer build on other platforms. In this post I will describe Use.GPU's text rendering, which uses a bespoke approach to Signed Distance Fields (SDFs). This was borne out of necessity: while SDF text is pretty common on GPUs, some of the established practice on generating SDFs from masks is incorrect, and some libraries get it right only by accident. So this will be a deep dive from first principles, about the nuances of subpixels. SDFs The idea behind SDFs is quite simple. To draw a crisp, anti-aliased shape at any size, you start from a field or image that records the distance to the shape's edge at every point, as a gradient. Lighter grays are inside, darker grays are outside. This can be a lower resolution than the target. Then you increase the contrast until the gradient is exactly 1 pixel wide at the target size. You can sample it to get a perfectly anti-aliased opacity mask: This works fine for text at typical sizes, and handles fractional shifts and scales perfectly with zero shimmering. It's also reasonably correct from a signal processing math point-of-view: it closely approximates averaging over a pixel-sized circular window, i.e. a low-pass convolution. Crucially, it takes a rendered glyph as input, which means I can remain blissfully unaware of TrueType font specifics, and bezier rasterization, and just offload that to an existing library. To generate an SDF, I started with MapBox's TinySDF library. Except, what comes out of it is wrong: The contours are noticeably wobbly and pixelated. The only reason the glyph itself looks okay is because the errors around the zero-level are symmetrical and cancel out. If you try to dilate or contract the outline, which is supposed to be one of SDF's killer features, you get ugly gunk. Compare to: The original Valve paper glosses over this aspect and uses high resolution inputs (4k) for a highly downscaled result (64). That is not an option for me because it's too slow. But I did get it to work. As a result Use.GPU has a novel subpixel-accurate distance transform (ESDT), which even does emoji. It's a combination CPU/GPU approach, with the CPU generating SDFs and the GPU rendering them, including all the debug viz. The Classic EDT The common solution is a Euclidean Distance Transform. Given a binary mask, it will produce an unsigned distance field. This holds the squared distance d² for either the inside or outside area, which you can sqrt. Like a Fourier Transform, you can apply it to 2D images by applying it horizontally on each row X, then vertically on each column Y (or vice versa). To make a signed distance field, you do this for both the inside and outside separately, and then combine the two as inside – outside or vice versa. The algorithm is one of those clever bits of 80s-style C code which is O(N), has lots of 1-letter variable names, and is very CPU cache friendly. Often copy/pasted, but rarely understood. In TypeScript it looks like this, where array is modified in-place and f, v and z are temporary buffers up to 1 row/column long. The arguments offset and stride allow the code to be used in either the X or Y direction in a flattened 2D array. 
for (let q = 1, k = 0, s = 0; q < length; q++) {
  f[q] = array[offset + q * stride];

  do {
    let r = v[k];
    s = (f[q] - f[r] + q * q - r * r) / (q - r) / 2;
  } while (s <= z[k] && --k > -1);

  k++;
  v[k] = q;
  z[k] = s;
  z[k + 1] = INF;
}

for (let q = 0, k = 0; q < length; q++) {
  while (z[k + 1] < q) k++;

  let r = v[k];
  let d = q - r;

  array[offset + q * stride] = f[r] + d * d;
}

To explain what this code does, let's start with a naive version instead. Given a 1D input array of zeroes (filled), with an area masked out with infinity (empty):

O = [·, ·, ·, 0, 0, 0, 0, 0, ·, 0, 0, 0, ·, ·, ·]

Make a matching sequence … 3 2 1 0 1 2 3 … for each element, centering the 0 at each index:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14] + ∞
[1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13] + ∞
[2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12] + ∞
[3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11] + 0
[4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10] + 0
[5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9] + 0
[6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8] + 0
[7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7] + 0
[8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6] + ∞
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5] + 0
[10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4] + 0
[11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3] + 0
[12,11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2] + ∞
[13,12,11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1] + ∞
[14,13,12,11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0] + ∞

You then add the value from the array to each element in the row:

[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11]
[4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10]
[5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8]
[7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5]
[10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4]
[11,10,9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]
[∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞, ∞]

And then take the minimum of each column:

P = [3, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 3]

This sequence counts up inside the masked out area, away from the zeroes. This is the positive distance field P. You can do the same for the inverted mask:

I = [0, 0, 0, ·, ·, ·, ·, ·, 0, ·, ·, ·, 0, 0, 0]

to get the complementary area, i.e. the negative distance field N:

N = [0, 0, 0, 1, 2, 3, 2, 1, 0, 1, 2, 1, 0, 0, 0]

That's what the EDT does, except it uses square distance … 9 4 1 0 1 4 9 …:

When you apply it a second time in the second dimension, these outputs are the new input, i.e. values other than 0 or ∞. It still works because of Pythagoras' rule: d² = x² + y². This wouldn't be true if it used linear distance instead. The net effect is that you end up intersecting a series of parabolas, somewhat like a 1D slice of a Voronoi diagram:

I' = [0, 0, 1, 4, 9, 4, 4, 4, 1, 1, 4, 9, 4, 9, 9]

Each parabola sitting above zero is the 'shadow' of a zero-level paraboloid located some distance in a perpendicular dimension:

The code is just a more clever way to do that, without generating the entire N² grid per row/column. It instead scans through the array left to right, building up a list v[k] of significant minima, with thresholds z[k] where two parabolas intersect. It adds them as candidates (k++) and discards them (--k) if they are eclipsed by a newer value.
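To make the naive version concrete, here is a brute-force sketch of a single 1D pass in TypeScript. This is my own illustrative code, not the library's, and the function name is made up; it simply takes the minimum of f[r] + (q - r)² over every source index r:

// Naive O(N²) version of one 1D pass: for every index q, try every
// source index r and keep the lowest parabola f[r] + (q - r)².
const naiveSquaredDistance1D = (f: number[]): number[] => {
  const n = f.length;
  const out = new Array<number>(n).fill(Infinity);
  for (let q = 0; q < n; q++) {
    for (let r = 0; r < n; r++) {
      out[q] = Math.min(out[q], f[r] + (q - r) * (q - r));
    }
  }
  return out;
};

// Example, with Infinity as the masked out area (· above):
// naiveSquaredDistance1D([∞, ∞, ∞, 0, 0, 0, 0, 0, ∞, 0, 0, 0, ∞, ∞, ∞])
// = [9, 4, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 4, 9]   (squared distances)

The scanning version computes the same minima in a single pass, by tracking which parabola is lowest over which interval.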
This is the first for/while loop:

for (let q = 1, k = 0, s = 0; q < length; q++) {
  f[q] = array[offset + q * stride];

  do {
    let r = v[k];
    s = (f[q] - f[r] + q * q - r * r) / (q - r) / 2;
  } while (s <= z[k] && --k > -1);

  k++;
  v[k] = q;
  z[k] = s;
  z[k + 1] = INF;
}

Then it goes left to right again (for), and fills out the values, skipping ahead to the right minimum (k++). This is the squared distance from the current index q to the nearest minimum r, plus the minimum's value f[r] itself. The paper has more details:

for (let q = 0, k = 0; q < length; q++) {
  while (z[k + 1] < q) k++;

  let r = v[k];
  let d = q - r;

  array[offset + q * stride] = f[r] + d * d;
}

The Broken EDT

So what's the catch? The above assumes a binary mask. As it happens, if you try to subtract a binary N from P, you have an off-by-one error:

O = [·, ·, ·, 0, 0, 0, 0, 0, ·, 0, 0, 0, ·, ·, ·]
I = [0, 0, 0, ·, ·, ·, ·, ·, 0, ·, ·, ·, 0, 0, 0]
P = [3, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 3]
N = [0, 0, 0, 1, 2, 3, 2, 1, 0, 1, 2, 1, 0, 0, 0]
P - N = [3, 2, 1,-1,-2,-3,-2,-1, 1,-1,-2,-1, 1, 2, 3]

It goes directly from 1 to -1 and back. You could add +/- 0.5 to fix that. But if there is a gray pixel in between each white and black, which we classify as both inside (0) and outside (0), it seems to work out just fine:

O = [·, ·, ·, 0, 0, 0, 0, 0, ·, 0, 0, 0, ·, ·, ·]
I = [0, 0, 0, 0, ·, ·, ·, 0, 0, 0, ·, 0, 0, 0, 0]
P = [3, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 2, 3]
N = [0, 0, 0, 0, 1, 2, 1, 0, 0, 0, 1, 0, 0, 0, 0]
P - N = [3, 2, 1, 0,-1,-2,-1, 0, 1, 0,-1, 0, 1, 2, 3]

This is a realization that somebody must've had, and they reasoned on:

"The above is correct for a 50% opaque pixel, where the edge between inside and outside falls exactly in the middle of a pixel."

"Lighter grays are more inside, and darker grays are more outside. So all we need to do is treat l = level - 0.5 as a signed distance, and use l² for the initial inside or outside value for gray pixels. This will cause either the positive or negative distance field to shift by a subpixel amount l. And then the EDT will propagate this in both X and Y directions."

The initial idea is correct, because this is just running SDF rendering in reverse. A gray pixel in an opacity mask is what you get when you contrast adjust an SDF and do not blow it out into pure black or white. The information inside the gray pixels is "correct", up to rounding. But there are two mistakes here.

The first is that even in an anti-aliased image, you can have white pixels right next to black ones. Especially with fonts, which are pixel-hinted. So the SDF is wrong there, because it goes directly from -1 to 1. This causes the contours to double up, e.g. around this bottom edge:

To solve this, you can eliminate the crisp case by deliberately making those edges very dark or light gray.

But the second mistake is more subtle. The EDT works in 2D because you can feed the output of X in as the input of Y. But that means that any non-zero input to X represents another dimension Z, separate from X and Y. The resulting squared distance will be x² + y² + z². This is a 3D distance, not 2D.

If an edge is shifted by 0.5 pixels in X, you would expect a 1D SDF like:

[…, 0.5, 1.5, 2.5, 3.5, …] =
[…, 0.5, 1 + 0.5, 2 + 0.5, 3 + 0.5, …]

Instead, because of the squaring + square root, you will get:

[…, 0.5, 1.12, 2.06, 3.04, …] =
[…, sqrt(0.25), sqrt(1 + 0.25), sqrt(4 + 0.25), sqrt(9 + 0.25), …]

The effect of l² = 0.25 rapidly diminishes as you get away from the edge, and is significantly wrong even just one pixel over.
The correct shift would need to be folded into (x + …)² + (y + …)² and depends on the direction. e.g. If an edge is shifted horizontally, it ought to be (x + l)² + y², which means there is a term of 2*x*l missing. If the shift is vertical, it's 2*y*l instead. This is also a signed value, not positive/unsigned.

Given all this, it's a miracle this worked at all. The only reason this isn't more visible in the final glyph is because the positive and negative fields contain the same but opposite errors around their respective gray pixels.

The Not-Subpixel EDT

As mentioned before, the EDT algorithm is essentially making a 1D Voronoi diagram every time. It finds the distance to the nearest minimum for every array index. But there is no reason for those minima themselves to lie at integer offsets, because the second for loop effectively resamples the data. So you can take an input mask, and tag each index with a horizontal offset Δ:

O = [·, ·, ·, 0, 0, 0, 0, 0, ·, ·, ·]
Δ = [A, B, C, D, E, F, G, H, I, J, K]

As long as the offsets are small, no two indices will swap order, and the code still works. You then build the Voronoi diagram out of the shifted parabolas, but sample the result at unshifted indices.

Problem 1 - Opposing Shifts

This led me down the first rabbit hole, which was an attempt to make the EDT subpixel capable without losing its appealing simplicity. I started by investigating the nuances of subpixel EDT in 1D. This was a bit of a mirage, because most real problems only pop up in 2D. Though there was one important insight here.

O = [·, ·, ·, 0, 0, 0, 0, 0, ·, ·, ·]
Δ = [·, ·, ·, A, ·, ·, ·, B, ·, ·, ·]

Given a mask of zeroes and infinities, you can only shift the first and last point of each segment. Infinities don't do anything, while middle zeroes should remain zero.

Using an offset A works sort of as expected: this will increase or decrease the values filled in by a fractional pixel, calculating a squared distance (d + A)² where A can be positive or negative. But the value at the shifted index itself is always (0 + A)² (positive). This means it is always outside, regardless of whether it is moving left or right. If A is moving left (–), the point is inside, and the (unsigned) distance should be 0. At B the situation is reversed: the distance should be 0 if B is moving right (+).

It might seem like this is annoying but fixable, because the zeroes can be filled in by the opposite signed field. But this is only looking at the binary 1D case, where there are only zeroes and infinities. In 2D, a second pass has non-zero distances, so every index can be shifted:

O = [a, b, c, d, e, f, g, h, i, j, k]
Δ = [A, B, C, D, E, F, G, H, I, J, K]

Now, resolving every subpixel unambiguously is harder than you might think:

It's important to notice that the function being sampled by an EDT is not actually smooth: it is the minimum of a discrete set of parabolas, which cross at an angle. The square root of the output only produces a smooth linear gradient because it samples each parabola at integer offsets. Each center only shifts upward by the square of an integer in every pass, so the crossings are predictable. You never sample the 'wrong' side of (d + ...)². A subpixel EDT does not have this luxury.

Subpixel EDTs are not irreparably broken though. Rather, they are only valid if they cause the unsigned distance field to increase, i.e. if they dilate the empty space. This is a problem: any shift that dilates the positive field contracts the negative, and vice versa.
To fix this, you need to get out of the handwaving stage and actually understand P and N as continuous 2D fields. Problem 2 - Diagonals Consider an aliased, sloped edge. To understand how the classic EDT resolves it, we can turn it into a voronoi diagram for all the white pixel centers: Near the bottom, the field is dominated by the white pixels on the corners: they form diagonal sections downwards. Near the edge itself, the field runs perfectly vertical inside a roughly triangular section. In both cases, an arrow pointing back towards the cell center is only vaguely perpendicular to the true diagonal edge. Near perfect diagonals, the edge distances are just wrong. The distance of edge pixels goes up or right (1), rather than the more logical diagonal 0.707…. The true closest point on the edge is not part of the grid. These fields don't really resolve properly until 6-7 pixels out. You could hide these flaws with e.g. an 8x downscale, but that's 64x more pixels. Either way, you shouldn't expect perfect numerical accuracy from an EDT. Just because it's mathematically separable doesn't mean it's particularly good. In fact, it's only separable because it isn't very good at all. Problem 3 - Gradients In 2D, there is also only one correct answer to the gray case. Consider a diagonal edge, anti-aliased: Thresholding it into black, grey or white, you get: If you now classify the grays as both inside and outside, then the highlighted pixels will be part of both masks. Both the positive and negative field will be exactly zero there, and so will the SDF (P - N): This creates a phantom vertical edge that pushes apart P and N, and causes the average slope to be less than 45º. The field simply has the wrong shape, because gray pixels can be surrounded by other gray pixels. This also explains why TinySDF magically seemed to work despite being so pixelized. The l² gray correction fills in exactly the gaps in the bad (P - N) field where it is zero, and it interpolates towards a symmetrically wrong P and N field on each side. If we instead classify grays as neither inside nor outside, then P and N overlap in the boundary, and it is possible to resolve them into a coherent SDF with a clean 45 degree slope, if you do it right: What seemed like an off-by-one error is actually the right approach in 2D or higher. The subpixel SDF will then be a modified version of this field, where the P and N sides are changed in lock-step to remain mutually consistent. Though we will get there in a roundabout way. Problem 4 - Commuting It's worth pointing out: a subpixel EDT simply cannot commute in 2D. First, consider the data flow of an ordinary EDT: Information from a corner pixel can flow through empty space both when doing X-then-Y and Y-then-X. But information from the horizontal edge pixels can only flow vertically then horizontally. This is okay because the separating lines between adjacent pixels are purely vertical too: the red arrows never 'win'. But if you introduce subpixel shifts, the separating lines can turn: The data flow is still limited to the original EDT pattern, so the edge pixels at the top can only propagate by starting downward. They can only influence adjacent columns if the order is Y-then-X. For vertical edges it's the opposite. That said, this is only a problem on shallow concave curves, where there aren't any corner pixels nearby. The error is that it 'snaps' to the wrong edge point, but only when it is already several pixels away from the edge. 
In that case, the smaller x² term is dwarfed by the much larger y² term, so the absolute error is small after sqrt.

The ESDT

Knowing all this, here's how I assembled a "true" Euclidean Subpixel Distance Transform.

Subpixel offsets

To start we need to determine the subpixel offsets. We can still treat level - 0.5 as the signed distance for any gray pixel, and ignore all white and black for now. The tricky part is determining the exact direction of that distance.

As an approximation, we can examine the 3x3 neighborhood around each gray pixel and do a least-squares fit of a plane. As long as there is at least one white and one black pixel in this neighborhood, we get a vector pointing towards where the actual edge is. In practice I apply some horizontal/vertical smoothing here using a simple [1 2 1] kernel. The result is numerically very stable, because the originally rasterized image is visually consistent.

This logic is disabled for thin creases and spikes, where it doesn't work. Such points are treated as fully masked out, so that neighboring distances propagate there instead. This is needed e.g. for the pointy negative space of a W to come out right.

I also implemented a relaxation step that will smooth neighboring vectors if they point in similar directions. However, the effect is quite minimal, and it rounds very sharp corners, so I ended up disabling it by default.

The goal is then to do an ESDT that uses these shifted positions for the minima, to get a subpixel accurate distance field.

P and N junction

We saw earlier that only non-masked pixels can have offsets that influence the output (#1). We only have offsets for gray pixels, yet we concluded that gray pixels should be masked out, to form a connected SDF with the right shape (#3). This can't work.

SDFs are both the problem and the solution here. Dilating and contracting SDFs is easy: add or subtract a constant. So you can expand both P and N fields ahead of time geometrically, and then undo it numerically. This is done by pushing their respective gray pixel centers in opposite directions, by half a pixel, on top of the originally calculated offset:

This way, they can remain masked in, in both fields, but are always pushed between 0 and 1 pixel inwards. The distance between the P and N gray pixel offsets is always exactly 1, so the non-zero overlap between P and N is guaranteed to be exactly 1 pixel wide everywhere. It's a perfect join anywhere we sample it, because the line between the two ends crosses through a pixel center.

When we then calculate the final SDF, we do the opposite, shifting each by half a pixel and trimming it off with a max:

SDF = max(0, P - 0.5) - max(0, N - 0.5)

Only one of P or N will be > 0.5 at a time, so this is exact.

To deal with pure black/white edges, I treat any black neighbor of a white pixel (horizontal or vertical only) as gray with a 0.5 pixel offset (before P/N dilation). No actual blurring needs to happen, and the result is numerically exact minus epsilon, which is nice.

ESDT state

The state for the ESDT then consists of remembering a signed X and Y offset for every pixel, rather than the squared distance. These are factored into the distance and threshold calculations, separated into their proper parallel and orthogonal components, i.e. X/Y or Y/X. Unlike an EDT, each X or Y pass has to be aware of both axes. But the algorithm is mostly unchanged otherwise, here X-then-Y.

The X pass:

At the start, only gray pixels have offsets and they are all in the range -1…1 (exclusive).
With each pass of the ESDT, a winning minimum's offsets propagate to the range it affects, tracking the total distance (Δx, Δy) (> 1). At the end, each pixel's offset points to the nearest edge, so the squared distance can be derived as Δx² + Δy².

The Y pass:

You can see that the vertical distances in the top-left are practically vertical, and not oriented perpendicular to the contour on average: they have not had a chance to propagate horizontally. But they do factor in the vertical subpixel offset, and this is the dominant component. So even without correction it still creates a smooth SDF with a surprisingly small error.

Fix ups

The commutativity errors are all biased positively, meaning we get an upper bound of the true distance field. You could take the min of X then Y and Y then X. This would re-use all the same prep and would restore rotation-independence at the cost of 2x as many ESDTs. You could try X then Y then X at 1.5x cost with some hacks. But neither would improve diagonal areas, which were still busted in the original EDT.

Instead I implemented an additional relaxation pass. It visits every pixel's target, and double checks whether one of the 4 immediate neighbors (with subpixel offset) isn't a better solution:

It's a good heuristic because if the target is >1px off there is either a viable commutative propagation path, or you're so far away the error is negligible. It fixes up the diagonals, creating tidy lines when the resolution allows for it:

You could take this even further, given that you know the offsets are supposed to be perpendicular to the glyph contour. You could add reprojection with a few dot products here, but getting it to not misfire on edge cases would be tricky. While you can tell the unrelaxed offsets are wrong when visualized, and the fixed ones are better, the visual difference in the output glyphs is tiny. You need to blow glyphs up to enormous size to see the difference side by side. So it too is disabled by default. The diagonals in the original EDT were wrong too and you could barely tell.

Emoji

An emoji is generally stored as a full color transparent PNG or SVG. The ESDT can be applied directly to its opacity mask to get an SDF, so no problem there. There are an extremely rare handful of emoji with semi-transparent areas, but you can get away with making those solid. For this I just use a filter that detects '+' shaped arrangements of pixels that have (almost) the same transparency level. Then I dilate those by 3x3 to get the average transparency level in each area. Then I divide by it to only keep the anti-aliased edges transparent.

The real issue is blending the colors at the edges, when the emoji is being rendered and scaled. The RGB color of transparent pixels is undefined, so whatever values are there will blend into the surrounding pixels, e.g. creating a subtle black halo:

Not Premultiplied / Premultiplied

A common solution is premultiplied alpha. The opacity is baked into the RGB channels as (R * A, G * A, B * A, A), and transparent areas must be fully transparent black. This allows you to use a premultiplied blend mode where the RGB channels are added directly without further scaling, to cancel out the error.

But the alpha channel of an SDF glyph is dynamic, and is independent of the colors, so it cannot be premultiplied. We need valid color values even for the fully transparent areas, so that up- or downscaling is still clean. Luckily the ESDT calculates X and Y offsets which point from each pixel directly to the nearest edge.
We can use them to propagate the colors outward in a single pass, filling in the entire background. It doesn't need to be very accurate, so no filtering is required. RGB channel Output The result looks pretty great. At normal sizes, the crisp edge hides the fact that the interior is somewhat blurry. Emoji fonts are supported via the underlying ab_glyph library, but are too big for the web (10MB+). So you can just load .PNGs on demand instead, at whatever resolution you need. Hooking it up to the 2D canvas to render native system emoji is left as an exercise for the reader. Use.GPU does not support complex Unicode scripts or RTL text yet—both are a can of worms I wish to offload too—but it does support composite emoji like "pirate flag" (white flag + skull and crossbones) or "male astronaut" (astronaut + man) when formatted using the usual Zero-Width Joiners (U+200D) or modifiers. Shading Finally, a note on how to actually render with SDFs, which is more nuanced than you might think. I pack all the SDF glyphs into an atlas on-demand, the same I use elsewhere in Use.GPU. This has a custom layout algorithm that doesn't backtrack, optimized for filling out a layout at run-time with pieces of a similar size. Glyphs are rasterized at 1.5x their normal font size, after rounding up to the nearest power of two. The extra 50% ensures small fonts on low-DPI displays still use a higher quality SDF, while high-DPI displays just upscale that SDF without noticeable quality loss. The rounding ensures similar font sizes reuse the same SDFs. You can also override the detail independent of font size. To determine the contrast factor to draw an SDF, you generally use screen-space derivatives. There are good and bad ways of doing this. Your goal is to get a ratio of SDF pixels to screen pixels, so the best thing to do is give the GPU the coordinates of the SDF texture pixels, and ask it to calculate the difference for that between neighboring screen pixels. This works for surfaces in 3D at an angle too. Bad ways of doing this will instead work off relative texture coordinates, and introduce additional scaling factors based on the view or atlas size, when they are all just supposed to cancel out. As you then adjust the contrast of an SDF to render it, it's important to do so around the zero-level. The glyph's ideal vector shape should not expand or contract as you scale it. Like TinySDF, I use 75% gray as the zero level, so that more SDF range is allocated to the outside than the inside, as dilating glyphs is much more common than contraction. At the same time, a pixel whose center sits exactly on the zero level edge is actually half inside, half outside, i.e. 50% opaque. So, after scaling the SDF, you need to add 0.5 to the value to get the correct opacity for a blend. This gives you a mathematically accurate font rendering that approximates convolution with a pixel-sized circle or box. But I go further. Fonts were not invented for screens, they were designed for paper, with ink that bleeds. Certain renderers, e.g. MacOS, replicate this effect. The physical bleed distance is relatively constant, so the larger the font, the smaller the effect of the bleed proportionally. I got the best results with a 0.25 pixel bleed at 32px or more. For smaller sizes, it tapers off linearly. 
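To tie these shading steps together, here is a rough CPU-side sketch of the contrast math in TypeScript. This is my own illustrative code with made-up names (range, texelsPerPixel); the real version runs in a shader and gets the texel-to-pixel ratio from screen-space derivatives:

// sdf:            sampled SDF value in 0..1, with 0.75 as the zero level
// texelsPerPixel: SDF texels per screen pixel (from screen-space derivatives)
// fontSize:       nominal font size in screen pixels
const sdfOpacity = (sdf: number, texelsPerPixel: number, fontSize: number) => {
  const zeroLevel = 0.75;   // more range allocated outside than inside
  const range = 8;          // assumed SDF range in texels (hypothetical)

  // Contrast adjust so the gradient is 1 screen pixel wide, centered on the edge.
  let distance = (sdf - zeroLevel) * range / texelsPerPixel;

  // Paper-like ink bleed: ~0.25 px at 32px and up, tapering off linearly below.
  distance += 0.25 * Math.min(1, fontSize / 32);

  // A pixel centered exactly on the edge is half inside, half outside.
  return Math.max(0, Math.min(1, distance + 0.5));
};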
When you zoom out blocks of text, they get subtly fatter instead of thinning out, and this is actually a great effect when viewing document thumbnails, where lines of text become a solid mass at the point where the SDF resolution fails anyway. In Use.GPU I prefer to use gamma correct, linear RGB color, even for 2D. What surprised me the most is just how unquestionably superior this looks. Text looks rock solid and readable even at small sizes on low-DPI. Because the SDF scales, there is no true font hinting, but it really doesn't need it, it would just be a nice extra. Presumably you could track hinted points or edges inside SDF glyphs and then do a dynamic distortion somehow, but this is an order of magnitude more complex than what it is doing now, which is splat a contrasted texture on screen. It does have snapping you can turn on, which avoids jiggling of individual letters. But if you turn it off, you get smooth subpixel everything: I was always a big fan of the 3x1 subpixel rendering used on color LCDs (i.e. ClearType and the like), and I was sad when it was phased out due to the popularity of high-DPI displays. But it turns out the 3x res only offered marginal benefits... the real improvement was always that it had a custom gamma correct blend mode, which is a thing a lot of people still get wrong. Even without RGB subpixels, gamma correct AA looks great. Converting the entire desktop to Linear RGB is also not going to happen in our lifetime, but I really want it more now. The "blurry text" that some people associate with anti-aliasing is usually just text blended with the wrong gamma curve, and without an appropriate bleed for the font in question. * * * If you want to make SDFs from existing input data, subpixel accuracy is crucial. Without it, fully crisp strokes actually become uneven, diagonals can look bumpy, and you cannot make clean dilated outlines or shadows. If you use an EDT, you have to start from a high resolution source and then downscale away all the errors near the edges. But if you use an ESDT, you can upscale even emoji PNGs with decent results. It might seem pretty obvious in hindsight, but there is a massive difference between getting it sort of working, and actually getting all the details right. There were many false starts and dead ends, because subpixel accuracy also means one bad pixel ruins it. In some circles, SDF text is an old party trick by now... but a solid and reliable implementation is still a fair amount of work, with very little to go off for the harder parts. By the way, I did see if I could use voronoi techniques directly, but in terms of computation it is much more involved. Pretty tho: The ESDT is fast enough to use at run-time, and the implementation is available as a stand-alone import for drop-in use. This post started as a single live WebGPU diagram, which you can play around with. The source code for all the diagrams is available too.

Fuck It, We'll Do It Live

How the Live effect run-time is implemented

In this post I describe how the Live run-time internals are implemented, which drive Use.GPU. Some pre-existing React and FP effect knowledge is useful.

I have written about Live before, but in general terms. You may therefore have a wrong impression of this endeavor.

When a junior engineer sees an application doing complex things, they're often intimidated by the prospect of working on it. They assume that complex functionality must be the result of complexity in code. The fancier the app, the less understandable it must be. This is what their experience has been so far: seniority correlates to more and hairier lines of code.

After 30 years of coding though, I know it's actually the inverse. You cannot get complex functionality working if you've wasted all your complexity points on the code itself. This is the main thing I want to show here, because this post mainly describes 1 data structure and a handful of methods. Live has a real-time inspector, so a lot of this can be demonstrated live. Reading this on a phone is not recommended; the inspector is made for grown-ups.

The story so far:

The main mechanism of Live is to allow a tree to expand recursively like in React, doing breadth-first expansion. This happens incrementally, and in a reactive, rewindable way. You use this to let interactive programs knit themselves together at run-time, based on the input data.

Like a simple CLI program with a main() function, the code runs top to bottom, and then stops. It produces a finite execution trace that you can inspect. To become interactive and to animate, the run-time will selectively rewind, and re-run only parts, in response to external events. It's a fusion of immediate and retained mode UI, offering the benefits of both and the downsides of neither, not limited to UI.

This relies heavily on FP principles such as pure functions, immutable data structures and so on. But the run-time itself is very mutable: the whole idea is to centralize all the difficult parts of tracking changes in one place, and then forget about them. Live has no dependencies other than a JavaScript engine and these days consists of ~3600 lines.

If you're still not quite sure what the Live component tree actually is, it's 3 things at once:

a data dependency graph
an execution trace
a tree-shaped cache

The properties of the software emerge because these aspects are fully aligned inside a LiveComponent.

Functionally Mayonnaise

You can approach this from two sides, either from the UI side, or from the side of functional Effects.

Live Components

A LiveComponent (LC) is a React UI function component (FC) with 1 letter changed, at first:

const MyComponent: LC<MyProps> = (props: MyProps) => {
  const {wat} = props;

  // A memo hook
  // Takes dependencies as array
  const slow = useMemo(() => expensiveComputation(wat), [wat]);

  // Some local state
  const [state, setState] = useState(1);

  // JSX expressions with props and children
  // These are all names of LC functions to call + their props/arguments
  return (
    <OtherComponent>
      <Foo value={slow} />
      <Bar count={state} setCount={setState} />
    </OtherComponent>
  );
};

The data is immutable, and the rendering appears stateless: it returns a pure data structure for given input props and current state. The component uses hooks to access and manipulate its own state. The run-time will unwrap the outer layer of the <JSX> onion, mount and reconcile it, and then recurse.
let _ = await (
  <OtherComponent>
    <Foo foo={foo} />
    <Bar />
  </OtherComponent>
);

return null;

The code is actually misleading though. Both in Live and React, the return keyword here is technically wrong. Return implies passing a value back to a parent, but this is not happening at all. A parent component decided to render <MyComponent>, yes. But the function itself is being called by Live/React. It's yielding JSX to the Live/React run-time to make a call to OtherComponent(...). There is no actual return value. Because a <Component> can't return a value to its parent, the received _ will always be null too. The data flow is one-way, from parent to child.

Effects

An Effect is basically just a Promise/Future as a pure value. To first approximation, it's a () => Promise: a promise that doesn't actually start unless you call it a second time. Just like a JSX tag is like a React/Live component waiting to be called. An Effect resolves asynchronously to a new Effect, just like <JSX> will render more <JSX>. Unlike a Promise, an Effect is re-usable: you can fire it as many times as you like. Just like you can keep rendering the same <JSX>.

let value = yield (
  OtherEffect([
    Foo(foo),
    Bar(),
  ])
);
// ...
return value;

So React is like an incomplete functional Effect system. Just replace the word Component with Effect. OtherEffect is then some kind of decorator which describes a parallel dispatch to Effects Foo and Bar. A real Effect system will fork, but then join back, gathering the returned values, like a real return statement.

Unlike React components, Effects are ephemeral: no state is retained after they finish. The purity is actually what makes them appealing in production, to manage complex async flows. They're also not incremental/rewindable: they always run from start to finish.

          Pure   Returns   State   Incremental
React     ✅      ❌        ✅       ✅
Effects   ✅      ✅        ❌       ❌

You either take an effect system and make it incremental and stateful, or you take React and add the missing return data path. I chose the latter option. First, because hooks are an excellent emulsifier. Second, because the big lesson from React is that plain, old, indexed arrays are kryptonite for incremental code. Unless you've deliberately learned how to avoid them, you won't get far, so it's better to start from that side.

This breakdown is divided into three main parts:

the rendering of 1 component
the environment around a component
the overall tree update loop

Components

The component model revolves around a few core concepts:

Fibers
Hooks and State
Mounts
Memoization
Inlining

Components form the "user land" of Live. You can do everything you need there without ever calling directly into the run-time's "kernel". Live however does not shield its internals. This is fine, because I don't employ hundreds of junior engineers who would gleefully turn that privilege into a cluster bomb of spaghetti. The run-time is not extensible anyhow: what you see is what you get.

The escape hatch is there to support testing and debugging. Shielding this would be a game of hide-the-reference, creating a shadow-API for privileged friend packages, and so on. Ain't nobody got time for that. React has an export called DONT_USE_THIS_OR_YOU_WILL_BE_FIRED, Live has THIS_WAY___IF_YOU_DARE and it's called useFiber.

Fibers

Borrowing React terminology, a mounted <Component> function is called a fiber, despite this being single threaded. Each persists for the component lifetime. To start, you call render(<App />). This creates and renders the first fiber.
type LiveFiber = {
  // Fiber ID
  id: number,

  // Component function
  f: Function,

  // Arguments (props, etc.)
  args: any[],

  // ...
}

Fibers are numbered with increasing IDs. In JS this means you can create 2⁵³ fibers before it crashes, which ought to be enough for anybody. It holds the component function f and its latest arguments args. Unlike React, Live functions aren't limited to only a single props argument.

Each fiber is rendered from a <JSX> tag, which is a plain data structure. The Live version is very simple.

type Key = number | string;

type JSX.Element = {
  // Same as fiber
  f: Function,
  args: any[],

  // Element key={...}
  key?: string | number,

  // Rendered by fiber ID
  by: number,
}

Another name for this type is a DeferredCall. This is much leaner than React's JSX type, although Live will gracefully accept either. In Live, JSX syntax is also optional, as you can write use(Component, …) instead of <Component … />. Calls and fibers track the ID by of the fiber that rendered them. This is always an ancestor, but not necessarily the direct parent.

fiber.bound = () => {
  enterFiber(fiber);

  const {f, args} = fiber;
  const jsx = f(...args);

  exitFiber(fiber);
  return jsx;
};

The fiber holds a function bound. This binds f to the fiber itself, always using the current fiber.args as arguments. It wraps the call in an enter and exit function for state housekeeping. This can then be called via renderFiber(fiber) to get jsx. This is only done during an ongoing render cycle.

Hooks and State

Each fiber holds a local state array and a temporary pointer:

{
  // ...
  state: any[],
  pointer: number,
}

Calling a hook like useState taps into this state without an explicit reference to it. In Live, this is implemented as a global currentFiber variable, combined with a local fiber.pointer starting at 0. Both are initialized by enterFiber(fiber).

The state array holds flattened triplets, one per hook. They're arranged as [hookType, A, B]. Values A and B are hook-specific, but usually hold a value and a dependencies array. In the case of useState, it's just the [value, setValue] pair. The fiber.pointer advances by 3 slots every time a hook is called. Tracking the hookType allows the run-time to warn you if you call hooks in a different order than before.

The basic React hooks don't need any more state than this and can be implemented in ~20 lines of code each. This is useMemo:

export const useMemo = <T>(
  callback: () => T,
  dependencies: any[] = NO_DEPS,
): T => {
  const fiber = useFiber();

  const i = pushState(fiber, Hook.MEMO);
  let {state} = fiber;

  let value = state![i];
  const deps = state![i + 1];

  if (!isSameDependencies(deps, dependencies)) {
    value = callback();

    state![i] = value;
    state![i + 1] = dependencies;
  }

  return value as unknown as T;
}

useFiber just returns currentFiber and doesn't count as a real hook (it has no state). It only ensures you cannot call a hook outside of a component render.

export const useNoHook = (hookType: Hook) => () => {
  const fiber = useFiber();

  const i = pushState(fiber, hookType);
  const {state} = fiber;

  state![i] = undefined;
  state![i + 1] = undefined;
};

No-hooks like useNoMemo are also implemented, which allow for conditional hooks: write a matching else branch for any if. To ensure consistent rendering, a useNoHook will dispose of any state the useHook had, rather than just being a no-op. The above is just the basic version for simple hooks without cleanup.
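For comparison, a useState along the same lines might look roughly like this. This is a hedged sketch based on the state layout just described, reusing the internal helpers from the useMemo listing above; Hook.STATE, the first-render check and the dispatch details are my own guesses, not the actual Live source:

export const useState = <T>(initialState: T): [T, (t: T) => void] => {
  const fiber = useFiber();

  const i = pushState(fiber, Hook.STATE);   // Hook.STATE is assumed to exist
  const {state, host} = fiber;

  let value = state![i];
  let setValue = state![i + 1];

  // First render for this slot: fill in the [value, setValue] pair.
  if (setValue === undefined) {
    value = initialState;
    setValue = (next: T) => {
      // Queue the change so simultaneous updates can be batched.
      // The real run-time also drops no-ops by pointer equality.
      host.schedule(fiber, () => {
        state![i] = next;
      });
    };

    state![i] = value;
    state![i + 1] = setValue;
  }

  return [value as T, setValue];
};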
The no-hook mechanism also lets the run-time support early return cleanly in Components: when exitFiber(fiber) is called, all remaining unconsumed state is disposed of with the right no-hook.

If someone calls a setState, this is added to a dispatch queue, so changes can be batched together. If f calls setState during its own render, this is caught and resolved within the same render cycle, by calling f again. A setState which is a no-op is dropped (pointer equality).

You can see however that Live hooks are not pure: when a useMemo is tripped, it will immediately overwrite the previous state during the render, not after. This means renders in Live are not stateless, only idempotent. This is very deliberate. Live doesn't have a useEffect hook, it has a useResource hook that is like a useMemo with a useEffect-like disposal callback. While it seems to throw React's orchestration properties out the window, this is not actually so. What you get in return is an enormous increase in developer ergonomics, offering features React users are still dreaming of, running off 1 state array and 1 pointer.

Live is React with the training wheels off, not with holodeck security protocols disabled, but this takes a while to grok.

Mounts

After rendering, the returned/yielded <JSX> value is reconciled with the previous rendered result. This is done by updateFiber(fiber, value). New children are mounted, while old children are unmounted or have their args replaced. Only children with the same f as before can be updated in place.

{
  // ...

  // Static mount
  mount?: LiveFiber,

  // Dynamic mounts
  mounts?: Map<Key, LiveFiber>,
  lookup?: Map<Key, number>,
  order?: Key[],

  // Continuation
  next?: LiveFiber,

  // Fiber type
  type?: LiveComponent,

  // ...
}

Mounts are tracked inside the fiber, either as a single mount, or a map mounts, pointing to other fiber objects. The key for mounts is either an array index 0..N or a user-defined key. Keys must be unique. The order of the keys is kept in a list. A reverse lookup map is created if they're not anonymous indices.

The mount is only used when a component renders 1 other statically. This excludes arrays of length 1. If a component switches between mount and mounts, all existing mounts are discarded. Continuations are implemented as a special next mount. This is mounted by one of the built-in fenced operators such as <Capture> or <Gather>.

In the code, mounting is done via:

mountFiberCall(fiber, call) (static)
reconcileFiberCalls(fiber, calls) (dynamic)
mountFiberContinuation(fiber, call) (next)

Each will call updateMount(fiber, mount, jsx, key?, hasKeys?). If an existing mount (with the same key) is compatible it's updated, otherwise a replacement fiber is made with makeSubFiber(…). It doesn't update the parent fiber, rather it just returns the new state of the mount (LiveFiber | null), so it can work for all 3 mounting types. Once a fiber mount has been updated, it's queued to be rendered with flushMount.

If updateMount returns false, the update was a no-op because fiber arguments were identical (pointer equality). The update will be skipped and the mount not flushed. This follows the same implicit memoization rule that React has. It tends to trigger when a stateful component re-renders an old props.children.

A subtle point here is that fibers have no links/pointers pointing back to their parent. This is part practical, part ideological. It's practical because it cuts down on the cyclic references that complicate garbage collection. It's ideological because it helps ensure one-way data flow.
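Stepping back to the mounting logic for a moment, the decision it makes can be sketched roughly like this. This is a simplification of my own, not the real updateMount, which also deals with keys, built-ins and continuations; the makeSubFiber signature shown is assumed:

// Sketch: decide what a mount should become for a newly rendered JSX child.
// Returns the new mount, null to unmount, or false if nothing changed.
const updateMountSketch = (
  fiber: LiveFiber,
  mount: LiveFiber | null,
  jsx: JSX.Element | null,
): LiveFiber | null | false => {
  // Nothing to render here anymore: unmount if something was mounted.
  if (!jsx) return mount ? null : false;

  // Same component function: update the existing fiber in place.
  if (mount && mount.f === jsx.f) {
    // Identical args (pointer equality): no-op, skip the flush.
    if (mount.args === jsx.args) return false;
    mount.args = jsx.args;
    return mount;
  }

  // Different component, or nothing mounted yet: make a replacement fiber.
  return makeSubFiber(fiber, jsx);
};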
There is also no global collection of fibers, except in the inspector. Like in an effect system, the job of determining what happens is entirely the result of an ongoing computation on JSX, i.e. something passed around like pure, immutable data. The tree determines its own shape as it's evaluated. Queue and Path Live needs to process fibers in tree order, i.e. as in a typical tree list view. To do so, fibers are compared as values with compareFibers(a, b). This is based on references that are assigned only at fiber creation. It has a path from the root of the tree to the fiber (at depth depth), containing the indices or keys. { // ... depth: number, path: Key[], keys: ( number | Map<Key, number> )[], } A continuation next is ordered after the mount or mounts. This allows data fences to work naturally: the run-time only ensures all preceding fibers have been run first. For this, I insert an extra index into the path, 0 or 1, to distinguish the two sub-trees. If many fibers have a static mount (i.e. always 1 child), this would create paths with lots of useless zeroes. To avoid this, a single mount has the same path as its parent, only its depth is increased. Paths can still be compared element-wise, with depth as the tie breaker. This easily reduces typical path length by 70%. This is enough for children without keys, which are spawned statically. Their order in the tree never changes after creation, they can only be updated in-place or unmounted. But for children with a key, the expectation is that they persist even if their order changes. Their keys are just unsorted ids, and their order is stored in the fiber.order and fiber.lookup of the parent in question. This is referenced in the fiber.keys array. It's a flattened list of pairs [i, fiber.lookup], meaning the key at index i in the path should be compared using fiber.lookup. To keep these keys references intact, fiber.lookup is mutable and always modified in-place when reconciling. Memoization If a Component function is wrapped in memo(...), it won't be re-rendered if its individual props haven't changed (pointer equality). This goes deeper than the run-time's own oldArgs !== newArgs check. { // ... version?: number, memo?: number, runs?: number, } For this, memoized fibers keep a version around. They also store a memo which holds the last rendered version, and a run count runs for debugging: The version is used as one of the memo dependencies, along with the names and values of the props. Hence a memo(...) cache can be busted just by incrementing fiber.version, even if the props didn't change. Versions roll over at 32-bit. To actually do the memoization, it would be nice if you could just wrap the whole component in useMemo. It doesn't work in the React model because you can't call other hooks inside hooks. So I've brought back the mythical useYolo... An earlier incarnation of this allowed fiber.state scopes to be nested, but lacked a good purpose. The new useYolo is instead a useMemo you can nest. It effectively hot swaps the entire fiber.state array with a new one kept in one of the slots: This is then the first hook inside fiber.state. If the memo succeeds, the yolo'd state is preserved without treating it as an early return. Otherwise the component runs normally. Yolo'ing as the first hook has a dedicated fast path but is otherwise a perfectly normal hook. The purpose of fiber.memo is so the run-time can tell whether it rendered the same thing as before, and stop. 
It can just compare the two versions, leaving the specifics of memoization entirely up to the fiber component itself. For example, to handle a custom arePropsEqual function in memo(…). I always use version numbers as opposed to isDirty flags, because it leaves a paper trail. This provides the same ergonomics for mutable data as for immutable data: you can store a reference to a previous value, and do an O(1) equality check to know whether it changed since you last accessed it. Whenever you have a handle which you can't look inside, such as a pointer to GPU memory, it's especially useful to keep a version number on it, which you bump every time you write to it. It makes debugging so much easier. Inlining Built-in operators are resolved with a hand-coded routine post-render, rather than being "normal" components. Their component functions are just empty and there is a big dispatch with if statements. Each is tagged with a isLiveBuiltin: true. If a built-in operator is an only child, it's usually resolved inline. No new mount is created, it's immediately applied as part of updating the parent fiber. The glue in between tends to be "kernel land"-style code anyway, it doesn't need a whole new fiber, and it's not implemented in terms of hooks. The only fiber state it has is the type (i.e. function) of the last rendered JSX. There are several cases where it cannot inline, such as rendering one built-in inside another built-in, or rendering a built-in as part of an array. So each built-in can always be mounted independently if needed. From an architectural point of view, inlining is just incidental complexity, but this significantly reduces fiber overhead and keeps the user-facing component tree much tidier. It introduces a few minor problems around cleanup, but those are caught and handled. Live also has a morph operator. This lets you replace a mount with another component, without discarding any matching children or their state. The mount's own state is still discarded, but its f, args, bound and type are modified in-place. A normal render follows, which will reconcile the children. This is implemented in morphFiberCall. It only works for plain vanilla components, not other built-ins. The reason to re-use the fiber rather than transplant the children is so that references in children remain unchanged, without having to rekey them. In Live, I never do a full recursive traversal of any sub-tree, unless that traversal is incremental and memoized. This is a core property of the system. Deep recursion should happen in user-land. Environment Fibers have access to a shared environment, provided by their parent. This is created in user-land through built-in ops and accessed via hooks. Context and captures Gathers and yeets Fences and suspend Quotes and reconcilers Unquote + quote Context and captures Live extends the classic React context: { // ... context: { values: Map<LiveContext | LiveCapture, Ref<any>>, roots: Map<LiveContext | LiveCapture, number | LiveFiber>, }, } A LiveContext provides 1 value to N fibers. A LiveCapture collects N values into 1 fiber. Each is just an object created in user-land with makeContext / makeCapture, acting as a unique key. It can also hold a default value for a context. The values map holds the current value of each context/capture. This is boxed inside a Ref as {current: value} so that nested sub-environments share values for inherited contexts. The roots map points to the root fibers providing or capturing. 
This is used to allow useContext and useCapture to set up the right data dependency just-in-time. For a context, this points upstream in the tree, so to avoid a reverse reference, it's a number. For a capture, this points to a downstream continuation, i.e. the next of an ancestor, and can be a LiveFiber.

Normally children just share their parent's context. It's only when you <Provide> or <Capture> that Live builds a new, immutable copy of values and roots with a new context/capture added. Each context and capture persists for the lifetime of its sub-tree.

Captures build up a map incrementally inside the Ref while children are rendered, keyed by fiber. This is received in tree order after sorting:

<Capture
  context={...}
  children={...}
  then={(values: T[]) => {
    ...
  }}
/>

You can also just write capture(context, children, then), FYI.

This is an await or yield in disguise, where the then closure is spiritually part of the originating component function. Therefore it doesn't need to be memoized. The state of the next fiber is preserved even if you pass a new function instance every time. Unlike React-style render props, then props can use hooks, because they run on an independent next fiber called Resume(…). This fiber will be re-run when values changes, and can do so without re-running Capture itself.

A then prop can render new elements, building a chain of next fibers. This acts like a rewindable generator, where each Resume provides a place where the code can be re-entered, without having to explicitly rewind any state. This requires the data passed into each closure to be immutable.

The logic for providing or capturing is in provideFiber(fiber, ...) and captureFiber(fiber, ...). Unlike other built-ins, these are always mounted separately and are called at the start of a new fiber, not the end of the previous one. Their children are then immediately reconciled by inlineFiberCall(fiber, calls).

Gathers and yeets

Live offers a true return, in the form of yeet(value) (aka <Yeet>{value}</Yeet>). This passes a value back to a parent. These values are gathered in an incremental map-reduce along the tree, to a root that mounted a gathering operation. It's similar to a Capture, except it visits every parent along the way. It's the complement to tree expansion during rendering.

This works for any mapper and reducer function via <MapReduce>. There is also an optimized code path for a simple array flatMap <Gather>, as well as struct-of-arrays flatMap <MultiGather>. It works just like a capture:

<Gather
  children={...}
  then={(value: T[]) => {
    ...
  }}
/>

<MultiGather
  children={...}
  then={(value: Record<string, T[]>) => {
    ...
  }}
/>

Each fiber in a reduction has a fiber.yeeted structure, created at mount time. Like a context, this relation never changes for the lifetime of the component. It acts as a persistent cache for a yeeted value of type A and its map-reduction reduced of type B:

{
  yeeted: {
    // Same as fiber (for inspecting)
    id: number,

    // Reduction cache at this fiber
    value?: A,
    reduced?: B,

    // Parent yeet cache
    parent?: FiberYeet<A, B>,

    // Reduction root
    root: LiveFiber,

    // ...
  },
}

The last value yeeted by the fiber is kept so that all yeets are auto-memoized. Each yeeted points to a parent. This is not the parent fiber but its fiber.yeeted. This is the parent reduction, which is downstream in terms of data dependency, not upstream.
This forms a mirrored copy of the fiber tree and respects one-way data flow: Again the linked root fiber (sink) is not an ancestor, but the next of an ancestor, created to receive the final reduced value. If the reduced value is undefined, this signifies an empty cache. When a value is yeeted, parent caches are busted recursively towards the root, until an undefined is encountered. If a fiber mounts or unmounts children, it busts its reduction as well. Fibers that yeet a value cannot also have children. This isn't a limitation because you can render a yeet beside other children, as just another mount, without changing the semantics. You can also render multiple yeets, but it's faster to just yeet a single list. If you yeet undefined, this acts as a zero-cost signal: it does not affect the reduced values, but it will cause the reducing root fiber to be re-invoked. This is a tiny concession to imperative semantics, wildly useful. This may seem very impure, but actually it's the opposite. With clean, functional data types, there is usually a "no-op" value that you could yeet: an empty array or dictionary, an empty function, and so on. You can always force-refresh a reduction without meaningfully changing the output, but it causes a lot of pointless cache invalidation in the process. Zero-cost signals are just an optimization. When reducing a fiber that has a gathering next, it takes precedence over the fiber's own reduction: this is so that you can gather and reyeet in series, with the final reduction returned. Fences and suspend The specifics of a gathering operation are hidden behind a persistent emit and gather callback, derived from a classic map and reduce: { yeeted: { // ... // Emit a value A yeeted from fiber emit: (fiber: LiveFiber, value: A) => void, // Gather a reduction B from fiber gather: (fiber: LiveFiber, self: boolean) => B, // Enclosing yeet scope scope?: FiberYeet<any, any>, }, } Gathering is done by the root reduction fiber, so gather is not strictly needed here. It's only exposed so you can mount a <Fence> inside an existing reduction, without knowing its specifics. A fence will grab the intermediate reduction value at that point in the tree and pass it to user-land. It can then be reyeeted. One such use is to mimic React Suspense using a special toxic SUSPEND symbol. It acts like a NaN, poisoning any reduction it's a part of. You can then fence off a sub-tree to contain the spill and substitute it with its previous value or a fallback. In practice, gather will delegate to one of gatherFiberValues, multiGatherFiberValues or mapReduceFiberValues. Each will traverse the sub-tree, reuse any existing reduced values (stopping the recursion early), and fill in any undefineds via recursion. Their code is kinda gnarly, given that it's just map-reduce, but that's because they're hand-rolled to avoid useless allocations. The self argument to gather is such an optimization, only true for the final user-visible reduction. This lets intermediate reductions be type unsafe, e.g. to avoid creating pointless 1 element arrays. At a gathering root, the enclosing yeet scope is also kept. This is to cleanly unmount an inlined gather, by restoring the parent's yeeted. Quotes and reconcilers Live has a reconciler in reconcileFiberCalls, but it can also mount <Reconciler> as an effect via mountFiberReconciler. This is best understood by pretending this is React DOM. 
When you render a React tree which mixes <Components> with <html>, React reconciles it, and extracts the HTML parts into a new tree: <App> <div> <Layout> => <div> <div> <span> <div> <img> <Header> <span> <Logo> <img> Each HTML element is implicitly quoted inside React. They're only "activated" when they become real on the right. The ones on the left are only stand-ins. That's also what a Live <Reconcile> does. It mounts a normal tree of children, but it simultaneously mounts an independent second tree, under its next mount. If you render this: <App> <Reconcile> <Layout> <Quote> <Div> <Div> <Unquote> <Header> <Quote> <Span> <Unquote> <Logo> <Quote> <Img /> ... You will get: It adds a quote environment to the fiber: { // ... quote: { // Reconciler fiber root: number, // Fiber in current sub-tree from: number, // Fiber in other sub-tree to: LiveFiber, // Enclosing reconciler scope scope?: FiberQuote, } } When you render a <Quote>...</Quote>, whatever's inside ends up mounted on the to fiber. Quoted fibers will have a similar fiber.unquote environment. If they render an <Unquote>...</Unquote>, the children are mounted back on the quoting fiber. Each time, the quoting or unquoting fiber becomes the new to fiber on the other side. The idea is that you can use this to embed one set of components inside another as a DSL, and have the run-time sort them out. This all happens in mountFiberQuote(…) and mountFiberUnquote(…). It uses reconcileFiberCall(…) (singular). This is an incremental version of reconcileFiberCalls(…) (plural) which only does one mount/unmount at a time. The fiber id of the quote or unquote is used as the key of the quoted or unquoted fiber. const Queue = ({children}) => ( reconcile( quote( gather( unquote(children), (things: any[]) => <Render things={things} /> )))); The quote and unquote environments are separate so that reconcilers can be nested: at any given place, you can unquote 'up' or quote 'down'. Because you can put multiple <Unquote>s inside one <Quote>, it can also fork. The internal non-JSX dialect is very Lisp-esque, you can rap together some pretty neat structures with this. Because quote are mounted and unmounted incrementally, there is a data fence Reconcile(…) after each (un)quote. This is where the final set is re-ordered if needed. The data structure actually violates my own rule about no-reverse links. After you <Quote>, the fibers in the second tree have a link to the quoting fiber which spawned them. And the same applies in the other direction after you <Unquote>. The excuse is ergonomics. I could break the dependency by creating a separate sub-fiber of <Quote> to serve as the unquoting point, and vice versa. But this would bloat both trees with extra fibers, just for purity's sake. It already has unavoidable extra data fences, so this matters. At a reconciling root, the enclosing quote scope is added to fiber.quote, just like in yeeted, again for clean unmounting of inlined reconcilers. Unquote-quote There is an important caveat here. There are two ways you could implement this. One way is that <Quote>...</Quote> is a Very Special built-in, which does something unusual: it would traverse the children tree it was given, and go look for <Unquote>...</Unquote>s inside. It would have to do so recursively, to partition the quoted and unquoted JSX. Then it would have to graft the quoted JSX to a previous quote, while grafting the unquoted parts to itself as mounts. This is the React DOM mechanism, obfuscated. 
This is also how quoting works in Lisp: it switches between evaluation mode and AST mode. I have two objections. The first is that this goes against the whole idea of evaluating one component incrementally at a time. It wouldn't be working with one set of mounts on a local fiber: it would be building up args inside one big nested JSX expression. JSX is not supposed to be a mutable data structure, you're supposed to construct it immutably from the inside out, not from the outside in. The second is that this would only work for 'balanced' <Quote>...<Unquote>... pairs appearing in the same JSX expression. If you render: <Present> <Slide /> </Present> ...then you couldn't have <Present> render a <Quote> and <Slide> render an <Unquote> and have it work. It wouldn't be composable as two separate portals. The only way for the quotes/unquotes to be revealed in such a scenario is to actually render the components. This means you have to actively run the second tree as it's being reconciled, same as the first. There is no separate update + commit like in React DOM. This might seem pointless, because all this does is thread the data flow into a zigzag between the two trees, knitting the quote/unquote points together. The render order is the same as if <Quote> and <Unquote> weren't there. The path and depth of quoted fibers reveals this, which is needed to re-render them in the right order later. The key difference is that for all other purposes, those fibers do live in that spot. Each tree has its own stack of nested contexts. Reductions operate on the two separate trees, producing two different, independent values. This is just "hygienic macros" in disguise, I think. Use.GPU's presentation system uses a reconciler to wrap the layout system, adding slide transforms and a custom compositor. This is sandwiched in-between it and the normal renderer. A plain declarative tree of markup can be expanded into: <Device> <View> <Present> <Slide> <Object /> <Object /> </Slide> <Slide> <Object /> <Object /> </Slide> </Present> </View> </Device> I also use a reconciler to produce the WebGPU command queue. This is shared for an entire app and sits at the top. The second tree just contains quoted yeets. I use zero-cost signals here too, to let data sources signal that their contents have changed. There is a short-hand <Signal /> for <Quote><Yeet /></Quote>. Note that you cannot connect the reduction of tree 1 to the root of tree 2: <Reconcile> does not have a then prop. It doesn't make sense because the next fiber gets its children from elsewhere, and it would create a rendering cycle if you tried anyway. If you need to spawn a whole second tree based on a first, that's what a normal gather already does. You can use it to e.g. gather lambdas that return memoized JSX. This effectively acts as a two-phase commit. The Use.GPU layout system does this repeatedly, with several trees + gathers in a row. It involves constraints both from the inside out and the outside in, so you need both tree directions. The output is UI shapes, which need to be batched together for efficiency and turned into a data-driven draw call. The Run-Time With all the pieces laid out, I can now connect it all together. Before render(<App />) can render the first fiber, it initializes a very minimal run-time. So this section will be kinda dry. 
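Before diving in, here is a tiny standalone sketch of the gathered-lambdas pattern from above, since it is the shape the rest of the machinery serves. The names are invented and this is not Live's API.

type Commit = () => void;

// Components yeet a lambda instead of a concrete value (phase 1)...
const makeDrawCall = (name: string): Commit =>
  () => console.log(`draw ${name}`);

const gathered: Commit[] = [
  makeDrawCall('background'),
  makeDrawCall('ui'),
  makeDrawCall('cursor'),
];

// ...and the reduction root calls the full set in order (phase 2).
const commitAll = (commits: Commit[]) => {
  for (const commit of commits) commit();
};

commitAll(gathered); // draw background, draw ui, draw cursor

Because the root only runs once the full set has been gathered, the calls behave like a commit phase without any dedicated machinery for it. In Live itself, the bookkeeping that drives all of this, the scheduling, traversal and dependency tracking, lives in a small run-time attached to every fiber.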
This is accessed through fiber.host and exposes a handful of APIs: a queue of pending state changes a priority queue for traversal a fiber-to-fiber dependency tracker a resource disposal tracker a stack slicer for reasons State changes When a setState is called, the state change is added to a simple queue as a lambda. This allows simultaneous state changes to be batched together. For this, the host exposes a schedule and a flush method. { // ... host: { schedule: (fiber: LiveFiber, task?: () => boolean | void) => void, flush: () => void, // ... } This comes from makeActionScheduler(…). It wraps a native scheduling function (e.g. queueMicrotask) and an onFlush callback: const makeActionScheduler = ( schedule: (flush: ArrowFunction) => void, onFlush: (fibers: LiveFiber[]) => void, ) => { // ... return {schedule, flush}; } The callback is set up by render(…). It will take the affected fibers and call renderFibers(…) (plural) on them. The returned schedule(…) will trigger a flush, so flush() is only called directly for sync execution, to stay within the same render cycle. Traversal The host keeps a priority queue (makePriorityQueue) of pending fibers to render, in tree order: { // ... host: { // ... visit: (fiber: LiveFiber) => void, unvisit: (fiber: LiveFiber) => void, pop: () => LiveFiber | null, peek: () => LiveFiber | null, } } renderFibers(…) first adds the fibers to the queue by calling host.visit(fiber). A loop in renderFibers(…) will then call host.peek() and host.pop() until the queue is empty. It will call renderFiber(…) and updateFiber(…) on each, which will call host.unvisit(fiber) in the process. This may also cause other fibers to be added to the queue. The priority queue is a singly linked list of fibers. It allows fast appends at the start or end. To speed up insertions in the middle, it remembers the last inserted fiber. This massively speeds up the very common case where multiple fibers are inserted into an existing queue in tree order. Otherwise it just does a linear scan. It also has a set of all the fibers in the queue, so it can quickly do presence checks. This means visit and unvisit can safely be called blindly, which happens a lot. // Re-insert all fibers that descend from fiber const reorder = (fiber: LiveFiber) => { const {path} = fiber; const list: LiveFiber[] = []; let q = queue; let qp = null; while (q) { if (compareFibers(fiber, q.fiber) >= 0) { hint = qp = q; q = q.next; continue; } if (isSubNode(fiber, q.fiber)) { list.push(q.fiber); if (qp) { qp.next = q.next; q = q.next; } else { pop(); q = q.next; } } break; } if (list.length) { list.sort(compareFibers); list.forEach(insert); } }; There is an edge case here though. If a fiber re-orders its keyed children, the compareFibers fiber order of those children changes. But, because of long-range dependencies, it's possible for those children to already be queued. This might mean a later cousin node could render before an earlier one, though never a child before a parent or ancestor. In principle this is not an issue because the output—the reductions being gathered—will be re-reduced in new order at a fence. From a pure data-flow perspective, this is fine: it would even be inevitable in a multi-threaded version. In practice, it feels off if code runs out of order for no reason, especially in a dev environment. So I added optional queue re-ordering, on by default. 
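As a rough illustration, here is a toy queue with the same visit/unvisit/pop surface. It is a sketch under simplifying assumptions: the real queue is a hinted linked list, while this one just keeps a sorted array, and FiberLike stands in for the real fiber type.

type FiberLike = { id: number; path: number[] };

// Compare two fibers by their path in the tree (lexicographic order).
const compareFibers = (a: FiberLike, b: FiberLike): number => {
  const n = Math.min(a.path.length, b.path.length);
  for (let i = 0; i < n; i++) {
    if (a.path[i] !== b.path[i]) return a.path[i] - b.path[i];
  }
  return a.path.length - b.path.length;
};

const makeQueue = () => {
  const queue: FiberLike[] = [];
  const present = new Set<number>();

  const visit = (fiber: FiberLike) => {
    if (present.has(fiber.id)) return;        // safe to call blindly
    present.add(fiber.id);
    queue.push(fiber);
    queue.sort(compareFibers);                // real impl: hinted insert
  };

  const unvisit = (fiber: FiberLike) => {
    if (!present.has(fiber.id)) return;
    present.delete(fiber.id);
    queue.splice(queue.findIndex(f => f.id === fiber.id), 1);
  };

  const pop = (): FiberLike | null => {
    const fiber = queue.shift() ?? null;
    if (fiber) present.delete(fiber.id);
    return fiber;
  };

  return { visit, unvisit, pop, peek: () => queue[0] ?? null };
};

// Fibers get queued in arbitrary order but come out in tree order.
const q = makeQueue();
q.visit({ id: 3, path: [0, 2] });
q.visit({ id: 2, path: [0, 1, 0] });
console.log(q.pop()?.id); // 2 (the earlier fiber in tree order)

The only subtle case is the re-ordering one described above, where already-queued descendants of a re-keyed fiber have to be pulled out and re-inserted in their new order.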
This can be done pretty easily because the affected fibers can be found by comparing paths, and still form a single group inside the otherwise ordered queue: scan until you find a fiber underneath the parent, then pop off fibers until you exit the subtree. Then just reinsert them. This really reminds me of shader warp reordering in raytracing GPUs btw. Dependencies To support contexts and captures, the host has a long-range dependency tracker (makeDependencyTracker): { host: { // ... depend: (fiber: LiveFiber, root: number) => boolean, undepend: (fiber: LiveFiber, root: number) => void, traceDown: (fiber: LiveFiber) => LiveFiber[], traceUp: (fiber: LiveFiber) => number[], } }; It holds two maps internally, each mapping fibers to fibers, for precedents and descendants respectively. These are mapped as LiveFiber -> id and id -> LiveFiber, once again following the one-way rule. i.e. It gives you real fibers if you traceDown, but only fiber IDs if you traceUp. The latter is only used for highlighting in the inspector. The depend and undepend methods are called by useContext and useCapture to set up a dependency this way. When a fiber is rendered (and did not memoize), bustFiberDeps(…) is called. This will invoke traceDown(…) and call host.visit(…) on each dependent fiber. It will also call bustFiberMemo(…) to bump their fiber.version (if present). Yeets could be tracked the same way, but this is unnecessary because yeeted already references the root statically. It's a different kind of cache being busted too (yeeted.reduced) and you need to bust all intermediate reductions along the way. So there is a dedicated visitYeetRoot(…) and bustFiberYeet(…) instead. Yeets are actually quite tricky to manage because there are two directions of traversal here. A yeet must bust all the caches towards the root. Once those caches are busted, another yeet shouldn't traverse them again until filled back in. It stops when it encounters undefined. Second, when the root gathers up the reduced values from the other end, it should be able to safely accept any defined yeeted.reduced as being correctly cached, and stop as well. The invariant to be maintained is that a trail of yeeted.reduced === undefined should always lead all the way back to the root. New fibers have an undefined reduction, and old fibers may be unmounted, so these operations also bust caches. But if there is no change in yeets, you don't need to reduce again. So visitYeetRoot is not actually called until and unless a new yeet is rendered or an old yeet is removed. Managing the lifecycle of this is simple, because there is only one place that triggers a re-reduction to fill it back in: the yeet root. Which is behind a data fence. It will always be called after the last cache has been busted, but before any other code that might need it. It's impossible to squeeze anything in between. It took a while to learn to lean into this style of thinking. Cache invalidation becomes a lot easier when you can partition your program into "before cache" and "after cache". Compared to the earliest versions of Live, the how and why of busting caches is now all very sensible. You use immutable data, or you pass a mutable ref and a signal. It always works. Resources The useResource hook lets a user register a disposal function for later. useContext and useCapture also need to dispose of their dependency when unmounted. For this, there is a disposal tracker (makeDisposalTracker) which effectively acts as an onFiberDispose event listener: { host: { // ... 
// Add/remove listener track: (fiber: LiveFiber, task: Task) => void, untrack: (fiber: LiveFiber, task: Task) => void, // Trigger listener dispose: (fiber: LiveFiber) => void, } } Disposal tasks are triggered by host.dispose(fiber), which is called by disposeFiber(fiber). The latter will also set fiber.bound to undefined so the fiber can no longer be called. A useResource may change during a fiber's lifetime. Rather than repeatedly untrack/track a new disposal function each time, I store a persistent resource tag in the hook state. This holds a reference to the latest disposal function. Old resources are explicitly disposed of before new ones are created, ensuring there is no overlap. Stack Slicing A React-like is a recursive tree evaluator. A naive implementation would use function recursion directly, using the native CPU stack. This is what Live 0.0.1 did. But the run-time has overhead, with its own function calls sandwiched in between (e.g. updateFiber, reconcileFiberCalls, flushMount). This creates towering stacks. It also cannot be time-sliced, because all the state is on the stack. In React this is instead implemented with a flat work queue, so it only calls into one component at a time. A profiler shows it repeatedly calling performUnitOfWork, beginWork, completeWork in a clean, shallow trace. Live could do the same with its fiber priority queue. But the rendering order is always just tree order. It's only interrupted and truncated by memoization. So the vast majority of the time you are adding a fiber to the front of the queue only to immediately pop it off again. The queue is a linked list so it creates allocation overhead. This massively complicates what should just be a native function call. Live says "¿Por qué no los dos?" and instead has a stack slicing mechanism (makeStackSlicer). It will use the stack, but stop recursion after N levels, where N is a global knob that current sits at 20. The left-overs are enqueued. This way, mainly fibers pinged by state changes and long-range dependency end up in the queue. This includes fenced continuations, which must always be called indirectly. If a fiber is in the queue, but ends up being rendered in a parent's recursion, it's immediately removed. { host: { // ... depth: (depth: number) => void, slice: (depth: number) => boolean, }, } When renderFibers gets a fiber from the queue, it calls host.depth(fiber.depth) to calibrate the slicer. Every time a mount is flushed, it will then call host.slice(mount.depth) to check if it should be sliced off. If so, it calls host.visit(…) to add it to the queue, but otherwise it just calls renderFiber / updateFiber directly. The exception is when there is a data fence, when the queue is always used. Here too there is a strict mode, on by default, which ensures that once the stack has been sliced, no further sync evaluation can take place higher up the stack. One-phase commit Time to rewind. A Live app consists of a tree of such fiber objects, all exactly the same shape, just with different state and environments inside. It's rendered in a purely one-way data flow, with only a minor asterisk on that statement. The host is the only thing coordinating, because it's the only thing that closes the cycle when state changes. This triggers an ongoing traversal, during which it only tells fibers which dependencies to ping when they render. Everything else emerges from the composition of components. Hopefully you can appreciate that Live is not actually Cowboy React, but something else and deliberate. 
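As an aside, here is a toy version of the stack slicing described above. It is a sketch only: the depth budget matches the knob mentioned earlier, but Live's real slicer also special-cases data fences and strict mode, and the tree type here is invented.

const SLICE_DEPTH = 20;                    // same default knob as described

const makeStackSlicer = () => {
  let base = 0;
  return {
    // Calibrate against the depth of the fiber popped off the queue.
    depth: (d: number) => { base = d; },
    // Should a mount at depth d be deferred instead of called directly?
    slice: (d: number) => d - base >= SLICE_DEPTH,
  };
};

type TreeNode = { depth: number; children: TreeNode[] };

// Render directly while within budget, enqueue the left-overs otherwise.
const renderTree = (root: TreeNode): TreeNode[] => {
  const slicer = makeStackSlicer();
  const deferred: TreeNode[] = [];

  const renderNode = (node: TreeNode) => {
    for (const child of node.children) {
      if (slicer.slice(child.depth)) deferred.push(child);  // to the queue
      else renderNode(child);                               // native call
    }
  };

  slicer.depth(root.depth);
  renderNode(root);
  return deferred;   // picked up later by the priority queue
};

Small, deliberate mechanisms like this are what Live is built out of.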
It has its own invariants it's enforcing, and its own guarantees you can rely on. Like React, it has a strict and a non-strict mode that is meaningfully different, though the strictness is not about it nagging you, but about how anally the run-time will reproduce your exact declared intent. It does not offer any way to roll back partial state changes once made, unlike React. This idempotency model of rendering is good when you need to accommodate mutable references in a reactive system. Immediate mode APIs tend to use these, and Live is designed to be plugged in to those. The nice thing about Live is that it's often meaningful to suspend a partially rendered sub-tree without rewinding it back to the old state, because its state doesn't represent anything directly, like HTML does. It's merely reduced into a value, and you can simply re-use the old value until it has unsuspended. There is no need to hold on to all the old state of the components that produced it. If the value being gathered is made of lambdas, you have your two phases: the commit consists of calling them once you have a full set. In Use.GPU, you work with memory on the GPU, which you allocate once and then reuse by reference. The entire idea is that the view can re-render without re-rendering all components that produced it, the same way that a browser can re-render a page by animating only the CSS transforms. So I have to be all-in on mutability there, because updated transforms have to travel through the layout system without re-triggering it. I also use immediate mode for the CPU-side interactions, because I've found it makes UI controllers 2-3x less complicated. One interesting aspect here is that the difference between capturing and bubbling events, i.e. outside-in or inside-out, is just before-fence and after-fence. Live is also not a React alternative: it plays very nicely with React. You can nest one inside the other and barely notice. The Live inspector is written in React, because I needed it to work even if Live was broken. It can memoize effectively in React because Live is memoized. Therefor everything it shows you is live, including any state you open up. The inspector is functionality-first so I throw purity out the window and just focus on UX and performance. It installs a host.__ping callback so it can receive fiber pings from the run-time whenever they re-render. The run-time calls this via pingFiber in the right spots. Individual fibers can make themselves inspectable by adding undocumented/private props to fiber.__inspect. There are some helper hooks to make this prettier but that's all. You can make any component inspector-highlightable by having it re-render itself when highlighted. * * * Writing this post was a fun endeavour, prompting me to reconsider some assumptions from early on. I also fixed a few things that just sounded bad when said out loud. You know how it is. I removed some lingering unnecessary reverse fiber references. I was aware they weren't load bearing, but that's still different from not having them at all. The only one I haven't mentioned is the capture keys, which are a fiber so that they can be directly compared. In theory it only needs the id, path, depth, keys, and I could package those up separately, but it would just create extra objects, so the jury allows it. Live can model programs shaped like a one-way data flow, and generates one-way data itself. There are some interesting correspondences here. 
Live keeps state entirely in fiber objects, while fibers run entirely on fiber.state. A fiber object is just a fixed dictionary of properties, always the same shape, just like fiber.state is for a component's lifetime.

Children arrays without keys must be fixed-length and fixed-order (a fragment), but may have nulls. This is very similar to how no-hooks will skip over a missing spot in the fiber.state array and zero out the hook, so as to preserve hook order.

Live hot-swaps a global currentFiber pointer to switch fibers, and useYolo hot-swaps a fiber's own local state to switch hook scopes. Memoizing a component can be implemented as a nested useMemo. Bumping the fiber version is really a bespoke setState which is resolved during the next render.

The lines between fiber, fiber.state and fiber.mounts are actually pretty damn blurry. A lot of mechanisms appear twice, once in a non-incremental form and once in an incremental form. Iteration turns into mounting, sequences turn into fences, and objects get chopped up into fine bits of cached state, either counted or with keys. The difference between hooks and a gather of unkeyed components starts to blur.

If Live is react-react, then a self-hosted live-live is hiding in there somewhere. Create a root fiber, give it empty state, off you go. Inlining would be a lot harder though, and you wouldn't be able to hand-roll fast paths as easily, which is always the problem in FP. For a JS implementation it would be very dumb, especially when you know that the JS VM already manages object prototypes incrementally, mounting one prop at a time.

I do like the sound of an incremental Lisp where everything is made out of flat state lists instead of endless pointer chasing. If it had the same structure as Live, it might only have one genuine linked list driving it all: the priority queue, which holds elements pointing to elements. The rest would be elements pointing to linear arrays, a data structure that silicon caches love. A data-oriented Lisp maybe? You could even call it an incremental GPU. Worth pondering.

What Live could really use is a WASM pony with better stack access and threading. But in the meantime, it already works.

The source code for the embedded examples can be found on GitLab. If your browser can do WebGPU (desktop only for now), you can load up any of the Use.GPU examples and inspect them.


More in programming

Adding auto-generated cover images to EPUBs downloaded from AO3

I was chatting with a friend recently, and she mentioned an annoyance when reading fanfiction on her iPad. She downloads fic from AO3 as EPUB files, and reads it in the Kindle app – but the files don’t have a cover image, and so the preview thumbnails aren’t very readable: She’s downloaded several hundred stories, and these thumbnails make it difficult to find things in the app’s “collections” view. This felt like a solvable problem. There are tools to add cover images to EPUB files, if you already have the image. The EPUB file embeds some key metadata, like the title and author. What if you had a tool that could extract that metadata, auto-generate an image, and use it as the cover? So I built that. It’s a small site where you upload EPUB files you’ve downloaded from AO3, the site generates a cover image based on the metadata, and it gives you an updated EPUB to download. The new covers show the title and author in large text on a coloured background, so they’re much easier to browse in the Kindle app: If you’d find this helpful, you can use it at alexwlchan.net/my-tools/add-cover-to-ao3-epubs/ Otherwise, I’m going to explain how it works, and what I learnt from building it. There are three steps to this process: Open the existing EPUB to get the title and author Generate an image based on that metadata Modify the EPUB to insert the new cover image Let’s go through them in turn. Open the existing EPUB I’ve not worked with EPUB before, and I don’t know much about it. My first instinct was to look for Python EPUB libraries on PyPI, but there was nothing appealing. The results were either very specific tools (convert EPUB to/from format X) or very unmaintained (the top result was last updated in April 2014). I decied to try writing my own code to manipulate EPUBs, rather than using somebody else’s library. I had a vague memory that EPUB files are zips, so I changed the extension from .epub to .zip and tried unzipping one – and it turns out that yes, it is a zip file, and the internal structure is fairly simple. I found a file called content.opf which contains metadata as XML, including the title and author I’m looking for: <?xml version='1.0' encoding='utf-8'?> <package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uuid_id"> <metadata xmlns:opf="http://www.idpf.org/2007/opf" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:calibre="http://calibre.kovidgoyal.net/2009/metadata"> <dc:title>Operation Cameo</dc:title> <meta name="calibre:timestamp" content="2025-01-25T18:01:43.253715+00:00"/> <dc:language>en</dc:language> <dc:creator opf:file-as="alexwlchan" opf:role="aut">alexwlchan</dc:creator> <dc:identifier id="uuid_id" opf:scheme="uuid">13385d97-35a1-4e72-830b-9757916d38a7</dc:identifier> <meta name="calibre:title_sort" content="operation cameo"/> <dc:description><p>Some unusual orders arrive at Operation Mincemeat HQ.</p></dc:description> <dc:publisher>Archive of Our Own</dc:publisher> <dc:subject>Fanworks</dc:subject> <dc:subject>General Audiences</dc:subject> <dc:subject>Operation Mincemeat: A New Musical - SpitLip</dc:subject> <dc:subject>No Archive Warnings Apply</dc:subject> <dc:date>2023-12-14T00:00:00+00:00</dc:date> </metadata> … That dc: prefix was instantly familiar from my time working at Wellcome Collection – this is Dublin Core, a standard set of metadata fields used to describe books and other objects. 
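As a rough sketch of what reading those fields can look like in a browser, here is an illustrative TypeScript snippet using DOMParser. It is not necessarily the tool's actual code; getMetadata is an invented name.

// Sketch: pull the title and author out of a content.opf string.
const getMetadata = (opfXml: string): { title: string; author: string } => {
  const doc = new DOMParser().parseFromString(opfXml, "application/xml");
  const dc = "http://purl.org/dc/elements/1.1/";   // Dublin Core namespace
  const text = (tag: string) =>
    doc.getElementsByTagNameNS(dc, tag)[0]?.textContent ?? "";
  return { title: text("title"), author: text("creator") };
};

// For the file shown above this returns:
// { title: "Operation Cameo", author: "alexwlchan" }

Everything the cover needs is sitting right there in those Dublin Core elements.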
I’m unsurprised to see it in an EPUB; this is exactly how I’d expect it to be used. I found an article that explains the structure of an EPUB file, which told me that I can find the content.opf file by looking at the root-path element inside the mandatory META-INF/container.xml file which is every EPUB. I wrote some code to find the content.opf file, then a few XPath expressions to extract the key fields, and I had the metadata I needed. Generate a cover image I sketched a simple cover design which shows the title and author. I wrote the first version of the drawing code in Pillow, because that’s what I’m familiar with. It was fine, but the code was quite flimsy – it didn’t wrap properly for long titles, and I couldn’t get custom fonts to work. Later I rewrote the app in JavaScript, so I had access to the HTML canvas element. This is another tool that I haven’t worked with before, so a fun chance to learn something new. The API felt fairly familiar, similar to other APIs I’ve used to build HTML elements. This time I did implement some line wrapping – there’s a measureText() API for canvas, so you can see how much space text will take up before you draw it. I break the text into words, and keeping adding words to a line until measureText tells me the line is going to overflow the page. I have lots of ideas for how I could improve the line wrapping, but it’s good enough for now. I was also able to get fonts working, so I picked Georgia to match the font used for titles on AO3. Here are some examples: I had several ideas for choosing the background colour. I’m trying to help my friend browse her collection of fic, and colour would be a useful way to distinguish things – so how do I use it? I realised I could get the fandom from the EPUB file, so I decided to use that. I use the fandom name as a seed to a random number generator, then I pick a random colour. This means that all the fics in the same fandom will get the same colour – for example, all the Star Wars stories are a shade of red, while Star Trek are a bluey-green. This was a bit harder than I expected, because it turns out that JavaScript doesn’t have a built-in seeded random number generator – I ended up using some snippets from a Stack Overflow answer, where bryc has written several pseudorandom number generators in plain JavaScript. I didn’t realise until later, but I designed something similar to the placeholder book covers in the Apple Books app. I don’t use Apple Books that often so it wasn’t a deliberate choice to mimic this style, but clearly it was somewhere in my subconscious. One difference is that Apple’s app seems to be picking from a small selection of background colours, whereas my code can pick a much nicer variety of colours. Apple’s choices will have been pre-approved by a designer to look good, but I think mine is more fun. Add the cover image to the EPUB My first attempt to add a cover image used pandoc: pandoc input.epub --output output.epub --epub-cover-image cover.jpeg This approach was no good: although it added the cover image, it destroyed the formatting in the rest of the EPUB. This made it easier to find the fic, but harder to read once you’d found it. An EPUB file I downloaded from AO3, before/after it was processed by pandoc. So I tried to do it myself, and it turned out to be quite easy! I unzipped another EPUB which already had a cover image. I found the cover image in OPS/images/cover.jpg, and then I looked for references to it in content.opf. 
I found two elements that referred to cover images: <?xml version="1.0" encoding="UTF-8"?> <package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="PrimaryID"> <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf"> <meta name="cover" content="cover-image"/> … </metadata> <manifest> <item id="cover-image" href="images/cover.jpg" media-type="image/jpeg" properties="cover-image"/> … </manifest> </package> This gave me the steps for adding a cover image to an EPUB file: add the image file to the zipped bundle, then add these two elements to the content.opf. Where am I going to deploy this? I wrote the initial prototype of this in Python, because that’s the language I’m most familiar with. Python has all the libraries I need: The zipfile module can unpack and modify the EPUB/ZIP The xml.etree or lxml modules can manipulate XML The Pillow library can generate images I built a small Flask web app: you upload the EPUB to my server, my server does some processing, and sends the EPUB back to you. But for such a simple app, do I need a server? I tried rebuilding it as a static web page, doing all the processing in client-side JavaScript. That’s simpler for me to host, and it doesn’t involve a round-trip to my server. That has lots of other benefits – it’s faster, less of a privacy risk, and doesn’t require a persistent connection. I love static websites, so can they do this? Yes! I just had to find a different set of libraries: The JSZip library can unpack and modify the EPUB/ZIP, and is the only third-party code I’m using in the tool Browsers include DOMParser for manipulating XML I’ve already mentioned the HTML <canvas> element for rendering the image This took a bit longer because I’m not as familiar with JavaScript, but I got it working. As a bonus, this makes the tool very portable. Everything is bundled into a single HTML file, so if you download that file, you have the whole tool. If my friend finds this tool useful, she can save the file and keep a local copy of it – she doesn’t have to rely on my website to keep using it. What should it look like? My first design was very “engineer brain” – I just put the basic controls on the page. It was fine, but it wasn’t good. That might be okay, because the only person I need to be able to use this app is my friend – but wouldn’t it be nice if other people were able to use it? If they’re going to do that, they need to know what it is – most people aren’t going to read a 2,500 word blog post to understand a tool they’ve never heard of. (Although if you have read this far, I appreciate you!) I started designing a proper page, including some explanations and descriptions of what the tool is doing. I got something that felt pretty good, including FAQs and acknowledgements, and I added a grey area for the part where you actually upload and download your EPUBs, to draw the user’s eye and make it clear this is the important stuff. But even with that design, something was missing. I realised I was telling you I’d create covers, but not showing you what they’d look like. Aha! I sat down and made up a bunch of amusing titles for fanfic and fanfic authors, so now you see a sample of the covers before you upload your first EPUB: This makes it clearer what the app will do, and was a fun way to wrap up the project. What did I learn from this project? Don’t be scared of new file formats My first instinct was to look for a third-party library that could handle the “complexity” of EPUB files. 
In hindsight, I’m glad I didn’t find one – it forced me to learn more about how EPUBs work, and I realised I could write my own code using built-in libraries. EPUB files are essentially ZIP files, and I only had basic needs. I was able to write my own code. Because I didn’t rely on a library, now I know more about EPUBs, I have code that’s simpler and easier for me to understand, and I don’t have a dependency that may cause problems later. There are definitely some file formats where I need existing libraries (I’m not going to write my own JPEG parser, for example) – but I should be more open to writing my own code, and not jumping to add a dependency. Static websites can handle complex file manipulations I love static websites and I’ve used them for a lot of tasks, but mostly read-only display of information – not anything more complex or interactive. But modern JavaScript is very capable, and you can do a lot of things with it. Static pages aren’t just for static data. One of the first things I made that got popular was find untagged Tumblr posts, which was built as a static website because that’s all I knew how to build at the time. Somewhere in the intervening years, I forgot just how powerful static sites can be. I want to build more tools this way. Async JavaScript calls require careful handling The JSZip library I’m using has a lot of async functions, and this is my first time using async JavaScript. I got caught out several times, because I forgot to wait for async calls to finish properly. For example, I’m using canvas.toBlob to render the image, which is an async function. I wasn’t waiting for it to finish, and so the zip would be repackaged before the cover image was ready to add, and I got an EPUB with a missing image. Oops. I think I’ll always prefer the simplicity of synchronous code, but I’m sure I’ll get better at async JavaScript with practice. Final thoughts I know my friend will find this helpful, and that feels great. Writing software that’s designed for one person is my favourite software to write. It’s not hyper-scale, it won’t launch the next big startup, and it’s usually not breaking new technical ground – but it is useful. I can see how I’m making somebody’s life better, and isn’t that what computers are for? If other people like it, that’s a nice bonus, but I’m really thinking about that one person. Normally the one person I’m writing software for is me, so it’s extra nice when I can do it for somebody else. If you want to try this tool yourself, go to alexwlchan.net/my-tools/add-cover-to-ao3-epubs/ If you want to read the code, it’s all on GitHub. [If the formatting of this post looks odd in your feed reader, visit the original article]

Non-alcoholic apéritifs

I’ve been doing Dry January this year. One thing I missed was something for apéro hour, a beverage to mark the start of the evening. Something complex and maybe bitter, not like a drink you’d have with lunch. I found some good options. Ghia sodas are my favorite. Ghia is an NA apéritif based on grape juice but with enough bitterness (gentian) and sourness (yuzu) to be interesting. You can buy a bottle and mix it with soda yourself but I like the little cans with extra flavoring. The Ginger and the Sumac & Chili are both great. Another thing I like are low-sugar fancy soda pops. Not diet drinks, they still have a little sugar, but typically 50 calories a can. De La Calle Tepache is my favorite. Fermented pineapple is delicious and they have some fun flavors. Culture Pop is also good. A friend gave me the Zero book, a drinks cookbook from the fancy restaurant Alinea. This book is a little aspirational but the recipes are doable, it’s just a lot of labor. Very fancy high end drink mixing, really beautiful flavor ideas. The only thing I made was their gin substitute (mostly junipers extracted in glycerin) and it was too sweet for me. Need to find the right use for it, a martini definitely ain’t it. An easier homemade drink is this Nonalcoholic Dirty Lemon Tonic. It’s basically a lemonade heavily flavored with salted preserved lemons, then mixed with tonic. I love the complexity and freshness of this drink and enjoy it on its own merits. Finally, non-alcoholic beer has gotten a lot better in the last few years thanks to manufacturing innovations. I’ve been enjoying NA Black Butte Porter, Stella Artois 0.0, Heineken 0.0. They basically all taste just like their alcoholic uncles, no compromise. One thing to note about non-alcoholic substitutes is they are not cheap. They’ve become a big high end business. Expect to pay the same for an NA drink as one with alcohol even though they aren’t taxed nearly as much.

It burns

The first time we had to evacuate Malibu this season was during the Franklin fire in early December. We went to bed with our bags packed, thinking they'd probably get it under control. But by 2am, the roaring blades of fire choppers shaking the house got us up. As we sped down the canyon towards Pacific Coast Highway (PCH), the fire had reached the ridge across from ours, and flames were blazing large out the car windows. It felt like we had left the evacuation a little too late, but they eventually did get Franklin under control before it reached us. Humans have a strange relationship with risk and disasters. We're so prone to wishful thinking and bad pattern matching. I remember people being shocked when the flames jumped the PCH during the Woolsey fire in 2017. IT HAD NEVER DONE THAT! So several friends of ours had to suddenly escape a nightmare scenario, driving through burning streets, in heavy smoke, with literally their lives on the line. Because the past had failed to predict the future. I feel into that same trap for a moment with the dramatic proclamations of wind and fire weather in the days leading up to January 7. Warning after warning of "extremely dangerous, life-threatening wind" coming from the City of Malibu, and that overly-bureaucratic-but-still-ominous "Particularly Dangerous Situation" designation. Because, really, how much worse could it be? Turns out, a lot. It was a little before noon on the 7th when we first saw the big plumes of smoke rise from the Palisades fire. And immediately the pattern matching ran astray. Oh, it's probably just like Franklin. It's not big yet, they'll get it out. They usually do. Well, they didn't. By the late afternoon, we had once more packed our bags, and by then it was also clear that things actually were different this time. Different worse. Different enough that even Santa Monica didn't feel like it was assured to be safe. So we headed far North, to be sure that we wouldn't have to evacuate again. Turned out to be a good move. Because by now, into the evening, few people in the connected world hadn't started to see the catastrophic images emerging from the Palisades and Eaton fires. Well over 10,000 houses would ultimately burn. Entire neighborhoods leveled. Pictures that could be mistaken for World War II. Utter and complete destruction. By the night of the 7th, the fire reached our canyon, and it tore through the chaparral and brush that'd been building since the last big fire that area saw in 1993. Out of some 150 houses in our immediate vicinity, nearly a hundred burned to the ground. Including the first house we moved to in Malibu back in 2009. But thankfully not ours. That's of course a huge relief. This was and is our Malibu Dream House. The site of that gorgeous home office I'm so fond to share views from. Our home. But a house left standing in a disaster zone is still a disaster. The flames reached all the way up to the base of our construction, incinerated much of our landscaping, and devoured the power poles around it to dysfunction. We have burnt-out buildings every which way the eye looks. The national guard is still stationed at road blocks on the access roads. Utility workers are tearing down the entire power grid to rebuild it from scratch. It's going to be a long time before this is comfortably habitable again. So we left. That in itself feels like defeat. There's an urge to stay put, and to help, in whatever helpless ways you can. 
But with three school-age children who've already missed over a months worth of learning from power outages, fire threats, actual fires, and now mudslide dangers, it was time to go. None of this came as a surprise, mind you. After Woolsey in 2017, Malibu life always felt like living on borrowed time to us. We knew it, even accepted it. Beautiful enough to be worth the risk, we said.  But even if it wasn't a surprise, it's still a shock. The sheer devastation, especially in the Palisades, went far beyond our normal range of comprehension. Bounded, as it always is, by past experiences. Thus, we find ourselves back in Copenhagen. A safe haven for calamities of all sorts. We lived here for three years during the pandemic, so it just made sense to use it for refuge once more. The kids' old international school accepted them right back in, and past friendships were quickly rebooted. I don't know how long it's going to be this time. And that's an odd feeling to have, just as America has been turning a corner, and just as the optimism is back in so many areas. Of the twenty years I've spent in America, this feels like the most exciting time to be part of the exceptionalism that the US of A offers. And of course we still are. I'll still be in the US all the time on both business, racing, and family trips. But it won't be exclusively so for a while, and it won't be from our Malibu Dream House. And that burns.

Slow, flaky, and failing

Thou shalt not suffer a flaky test to live, because it’s annoying, counterproductive, and dangerous: one day it might fail for real, and you won’t notice. Here’s what to do.

Name that Ware, January 2025

The ware for January 2025 is shown below. Thanks to brimdavis for contributing this ware! …back in the day when you would get wares that had “blue wires” in them… One thing I wonder about this ware is…where are the ROMs? Perhaps I’ll find out soon! Happy year of the snake!
