Question the rules for fun and profit

One of the nice things about having your own lean copy of a popular library's patterns is that you can experiment with all sorts of changes. In my case, I have a React-clone, Live, which includes all the familiar basics: props, state and hooks. The semantics are all the same.

The premise is simple: after noticing that React is shaped like an incremental, resumable effect system, I wanted to see if I could use the same patterns for non-UI application code too. Thus, my version leaves out the most essential part of React entirely: actually rendering to HTML. There is no React-DOM as such, and nothing external is produced by the run-time. Live Components mainly serve to expand into either other Live Components, or nothing. This might sound useless, but it turns out it's not.

I should emphasize though, I am not talking about the React useEffect hook. The Effect-analog in React is the Components themselves. Along the way, I've come up with some bespoke...
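To make that premise concrete, here is a minimal sketch of components that expand into other components, or into nothing, with no rendering target at all. This is an illustration only, with hypothetical names (use, Poll, Task); it is not the actual Live API.

// Sketch only, not the real Live API: a "component" is just a function that
// expands into child components (or nothing). No HTML is ever produced.
type LiveElement = { f: Component<any>; props: any } | LiveElement[] | null;
type Component<P> = (props: P) => LiveElement;

const use = <P,>(f: Component<P>, props: P): LiveElement => ({ f, props });

// A non-UI "render": polling a job queue expands into per-item tasks,
// and into nothing once there is no work left.
const Poll: Component<{ jobs: string[] }> = ({ jobs }) =>
  jobs.length ? jobs.map((id) => use(Task, { id })) : null;

const Task: Component<{ id: string }> = ({ id }) => {
  // A real run-time would re-run this function when its inputs change.
  console.log(`running task ${id}`);
  return null;
};

// A run-time would mount this and keep it alive, re-expanding parts as state changes:
// use(Poll, { jobs: ['a', 'b'] })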


More from Acko.net

HTML is Dead, Long Live HTML

Rethinking DOM from first principles

Browsers are in a very weird place. While WebAssembly has succeeded, even on the server, the client still feels largely the same as it did 10 years ago. Enthusiasts will tell you that accessing native web APIs via WASM is a solved problem, with some minimal JS glue. But the question not asked is why you would want to access the DOM. It's just the only option.

So I'd like to explain why it really is time to send the DOM and its assorted APIs off to a farm somewhere, with some ideas on how. I won't pretend to know everything about browsers. Nobody knows everything anymore, and that's the problem.

The 'Document' Model

Few know how bad the DOM really is. In Chrome, document.body now has 350+ keys, grouped roughly like this:

This doesn't include the CSS properties in document.body.style of which there are... 660. The boundary between properties and methods is very vague. Many are just facades with an invisible setter behind them. Some getters may trigger a just-in-time re-layout. There's ancient legacy stuff, like all the onevent properties nobody uses anymore. The DOM is not lean and continues to get fatter.

Whether you notice this largely depends on whether you are making web pages or web applications. Most devs now avoid working with the DOM directly, though occasionally some purist will praise pure DOM as being superior to the various JS component/templating frameworks. What little declarative facilities the DOM has, like innerHTML, do not resemble modern UI patterns at all. The DOM has too many ways to do the same thing, none of them nice.

connectedCallback() {
  const shadow = this.attachShadow({ mode: 'closed' }),
        template = document.getElementById('hello-world')
          .content.cloneNode(true),
        hwMsg = `Hello ${ this.name }`;

  Array.from(template.querySelectorAll('.hw-text'))
    .forEach(n => n.textContent = hwMsg);

  shadow.append(template);
}

Web Components deserve a mention, being the web-native equivalent of JS component libraries. But they came too late and are unpopular. The API seems clunky, with its Shadow DOM introducing new nesting and scoping layers. Proponents kinda read like apologetics. The achilles heel is the DOM's SGML/XML heritage, making everything stringly typed. React-likes do not have this problem, their syntax only looks like XML. Devs have learned not to keep state in the document, because it's inadequate for it.

For HTML itself, there isn't much to critique because nothing has changed in 10-15 years. Only ARIA (accessibility) is notable, and only because this was what Semantic HTML was supposed to do and didn't. Semantic HTML never quite reached its goal. Despite dating from around 2011, there is e.g. no <thread> or <comment> tag, when those were well-established idioms. Instead, an article inside an article is probably a comment. The guidelines are... weird. There's this feeling that HTML always had paper-envy, and couldn't quite embrace or fully define its hypertext nature, and did not trust its users to follow clear rules.

Stewardship of HTML has since firmly passed to WHATWG, really the browser vendors, who have not been able to define anything more concrete as a vision, and have instead just added epicycles at the margins. Along the way even CSS has grown expressions, because every templating language wants to become a programming language.

Editability of HTML remains a sad footnote. While technically supported via contentEditable, actually wrangling this feature into something usable for applications is a dark art.
I'm sure the Google Docs and Notion people have horror stories. Nobody really believes in the old gods of progressive enhancement and separating markup from style anymore, not if they make apps. Most of the applications you see nowadays will kitbash HTML/CSS/SVG into a pretty enough shape. But this comes with immense overhead, and is looking more and more like the opposite of a decent UI toolkit.

The Slack input box
Off-screen clipboard hacks

Lists and tables must be virtualized by hand, taking over for layout, resizing, dragging, and so on. Making a chat window's scrollbar stick to the bottom is somebody's TODO, every single time. And the more you virtualize, the more you have to reinvent find-in-page, right-click menus, etc.

The web blurred the distinction between UI and fluid content, which was novel at the time. But it makes less and less sense, because the UI part is a decade obsolete, and the content has largely homogenized.

CSS is inside-out

CSS doesn't have a stellar reputation either, but few can put their finger on exactly why. Where most people go wrong is to start with the wrong mental model, approaching it like a constraint solver. This is easy to show with e.g.:

<div>
  <div style="height: 50%">...</div>
  <div style="height: 50%">...</div>
</div>

<div>
  <div style="height: 100%">...</div>
  <div style="height: 100%">...</div>
</div>

The first might seem reasonable: divide the parent into two halves vertically. But what about the second? Viewed as a set of constraints, it's contradictory, because the parent div is twice as tall as... itself. What will happen instead in both cases is the height is ignored. The parent height is unknown and CSS doesn't backtrack or iterate here. It just shrink-wraps the contents. If you set e.g. height: 300px on the parent, then it works, but the latter case will still just spill out.

Outside-in and inside-out layout modes

Instead, your mental model of CSS should be applying two passes of constraints, first going outside-in, and then inside-out. When you make an application frame, this is outside-in: the available space is divided, and the content inside does not affect sizing of panels. When paragraphs stack on a page, this is inside-out: the text stretches out its containing parent. This is what HTML wants to do naturally.

By being structured this way, CSS layouts are computationally pretty simple. You can propagate the parent constraints down to the children, and then gather up the children's sizes in the other direction. This is attractive and allows webpages to scale well in terms of elements and text content.

CSS is always inside-out by default, reflecting its document-oriented nature. The outside-in is not obvious, because it's up to you to pass all the constraints down, starting with body { height: 100%; }. This is why they always say vertical alignment in CSS is hard.

Use flex grow and shrink for spill-free auto-layouts with completely reasonable gaps

The scenario above is better handled with a CSS3 flex box (display: flex), which provides explicit control over how space is divided. Unfortunately flexing muddles the simple CSS model. To auto-flex, the layout algorithm must measure the "natural size" of every child. This means laying it out twice: first speculatively, as if floating in aether, and then again after growing or shrinking to fit. This sounds reasonable but can come with hidden surprises, because it's recursive. Doing speculative layout of a parent often requires full layout of unsized children, e.g. to know how text will wrap.
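To see where that recursion comes from, here is a deliberately naive layout sketch in TypeScript. It is not how any real engine is written; it only shows that measuring a parent's natural size lays out every unsized child, and that the final pass then lays each child out again.

// Naive flex-like layout: measure() does a speculative layout of the whole
// subtree to get a "natural size"; layout() measures every child and then
// recurses, so deeply nested unsized boxes get re-measured at every level.
interface Box {
  basis?: number;    // explicit main-axis size, if any
  grow: number;      // flex-grow factor
  children: Box[];
  content?: number;  // intrinsic size of leaf content (e.g. text)
}

function measure(box: Box): number {
  if (box.basis != null) return box.basis;
  if (!box.children.length) return box.content ?? 0;
  // Speculative pass: lay out every child as if space were unconstrained.
  return box.children.reduce((sum, child) => sum + measure(child), 0);
}

function layout(box: Box, available: number): void {
  const natural = box.children.map(measure);
  const used = natural.reduce((a, b) => a + b, 0);
  const growTotal = box.children.reduce((a, c) => a + c.grow, 0) || 1;
  const leftover = Math.max(available - used, 0);
  box.children.forEach((child, i) => {
    // Final pass: each child is laid out again at its resolved size,
    // which re-measures everything below it.
    layout(child, natural[i] + (leftover * child.grow) / growTotal);
  });
}

Giving a box an explicit basis short-circuits measure() for its whole subtree, which is essentially the isolation role that flex-basis and contain: size play, as described next.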
If you nest it right, it could in theory cause an exponential blow up, though I've never heard of it being an issue. Instead you will only discover this when someone drops some large content in somewhere, and suddenly everything gets stretched out of whack. It's the opposite of the problem on the mug. To avoid the recursive dependency, you need to isolate the children's contents from the outside, thus making speculative layout trivial. This can be done with contain: size, or by manually setting the flex-basis size. CSS has gained a few constructs like contain or will-transform, which work directly with the layout system, and drop the pretense of one big happy layout. It reveals some of the layer-oriented nature underneath, and is a substitute for e.g. using position: absolute wrappers to do the same. What these do is strip off some of the semantics, and break the flow of DOM-wide constraints. These are overly broad by default and too document-oriented for the simpler cases. This is really a metaphor for all DOM APIs. The Good Parts? That said, flex box is pretty decent if you understand these caveats. Building layouts out of nested rows and columns with gaps is intuitive, and adapts well to varying sizes. There is a "CSS: The Good Parts" here, which you can make ergonomic with sufficient love. CSS grids also work similarly, they're just very painfully... CSSy in their syntax. But if you designed CSS layout from scratch, you wouldn't do it this way. You wouldn't have a subtractive API, with additional extra containment barrier hints. You would instead break the behavior down into its component facets, and use them à la carte. Outside-in and inside-out would both be legible as different kinds of containers and placement models. The inline-block and inline-flex display models illustrate this: it's a block or flex on the inside, but an inline element on the outside. These are two (mostly) orthogonal aspects of a box in a box model. Text and font styles are in fact the odd ones out, in hypertext. Properties like font size inherit from parent to child, so that formatting tags like <b> can work. But most of those 660 CSS properties do not do that. Setting a border on an element does not apply the same border to all its children recursively, that would be silly. It shows that CSS is at least two different things mashed together: a system for styling rich text based on inheritance... and a layout system for block and inline elements, nested recursively but without inheritance, only containment. They use the same syntax and APIs, but don't really cascade the same way. Combining this under one style-umbrella was a mistake. Worth pointing out: early ideas of relative em scaling have largely become irrelevant. We now think of logical vs device pixels instead, which is a far more sane solution, and closer to what users actually expect. SVG is natively integrated as well. Having SVGs in the DOM instead of just as <img> tags is useful to dynamically generate shapes and adjust icon styles. But while SVG is powerful, it's neither a subset nor superset of CSS. Even when it overlaps, there are subtle differences, like the affine transform. It has its own warts, like serializing all coordinates to strings. CSS has also gained the ability to round corners, draw gradients, and apply arbitrary clipping masks: it clearly has SVG-envy, but falls very short. SVG can e.g. do polygonal hit-testing for mouse events, which CSS cannot, and SVG has its own set of graphical layer effects. 
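As a small concrete case of that overlap-with-differences, consider the affine transform. The snippet below only illustrates the syntax split; the element ids are made up, and the full story also involves differing transform-origin defaults.

// Same affine transform, two dialects. CSS transforms require units and rotate
// around the element's transform-origin (center by default for HTML elements);
// the SVG transform attribute is unitless and rotates around (0,0) unless you
// pass an explicit pivot.
const el = document.querySelector<HTMLElement>('#box')!;
el.style.transform = 'translate(10px, 20px) rotate(45deg)';

const shape = document.querySelector<SVGGElement>('#shape')!;
shape.setAttribute('transform', 'translate(10 20) rotate(45)');
// or with an explicit pivot: 'rotate(45 50 50)'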
Whether you use HTML/CSS or SVG to render any particular element is based on specific annoying trade-offs, even if they're all scalable vectors on the back-end. In either case, there are also some roadblocks. I'll just mention three: text-ellipsis can only be used to truncate unwrapped text, not entire paragraphs. Detecting truncated text is even harder, as is just measuring text: the APIs are inadequate. Everyone just counts letters instead. position: sticky lets elements stay in place while scrolling with zero jank. While tailor-made for this purpose, it's subtly broken. Having elements remain unconditionally sticky requires an absurd nesting hack, when it should be trivial. The z-index property determines layering by absolute index. This inevitably leads to a z-index-war.css where everyone is putting in a new number +1 or -1 to make things layer correctly. There is no concept of relative Z positioning. For each of these features, we got stuck with v1 of whatever they could get working, instead of providing the right primitives. Getting this right isn't easy, it's the hard part of API design. You can only iterate on it, by building real stuff with it before finalizing it, and looking for the holes. Oil on Canvas So, DOM is bad, CSS is single-digit X% good, and SVG is ugly but necessary... and nobody is in a position to fix it? Well no. The diagnosis is that the middle layers don't suit anyone particularly well anymore. Just an HTML6 that finally removes things could be a good start. But most of what needs to happen is to liberate the functionality that is there already. This can be done in good or bad ways. Ideally you design your system so the "escape hatch" for custom use is the same API you built the user-space stuff with. That's what dogfooding is, and also how you get good kernels. A recent proposal here is HTML in Canvas, to draw HTML content into a <canvas>, with full control over the visual output. It's not very good. While it might seem useful, the only reason the API has the shape that it does is because it's shoehorned into the DOM: elements must be descendants of <canvas> to fully participate in layout and styling, and to make accessibility work. There are also "technical concerns" with using it off-screen. One example is this spinny cube: To make it interactive, you attach hit-testing rectangles and respond to paint events. This is a new kind of hit-testing API. But it only works in 2D... so it seems 3D-use is only cosmetic? I have many questions. Again, if you designed it from scratch, you wouldn't do it this way! In particular, it's absurd that you'd have to take over all interaction responsibilities for an element and its descendants just to be able to customize how it looks i.e. renders. Especially in a browser that has projective CSS 3D transforms. The use cases not covered by that, e.g. curved re-projection, will also need more complicated hit-testing than rectangles. Did they think this through? What happens when you put a dropdown in there? To me it seems like they couldn't really figure out how to unify CSS and SVG filters, or how to add shaders to CSS. Passing it thru canvas is the only viable option left. "At least it's programmable." Is it really? Screenshotting DOM content is 1 good use-case, but not what this is sold as at all. The whole reason to do "complex UIs on canvas" is to do all the things the DOM doesn't do, like virtualizing content, just-in-time layout and styling, visual effects, custom gestures and hit-testing, and so on. 
It's all nuts and bolts stuff. Having to pre-stage all the DOM content you want to draw sounds... very counterproductive. From a reactivity point-of-view it's also a bad idea to route this stuff back through the same document tree, because it sets up potential cycles with observers. A canvas that's rendering DOM content isn't really a document element anymore, it's doing something else entirely. Canvas-based spreadsheet that skips the DOM entirely The actual achilles heel of canvas is that you don't have any real access to system fonts, text layout APIs, or UI utilities. It's quite absurd how basic it is. You have to implement everything from scratch, including Unicode word splitting, just to get wrapped text. The proposal is "just use the DOM as a black box for content." But we already know that you can't do anything except more CSS/SVG kitbashing this way. text-ellipsis and friends will still be broken, and you will still need to implement UIs circa 1990 from scratch to fix it. It's all-or-nothing when you actually want something right in the middle. That's why the lower level needs to be opened up. Where To Go From Here The goals of "HTML in Canvas" do strike a chord, with chunks of HTML used as free-floating fragments, a notion that has always existed under the hood. It's a composite value type you can handle. But it should not drag 20 years of useless baggage along, while not enabling anything truly novel. The kitbashing of the web has also resulted in enormous stagnation, and a loss of general UI finesse. When UI behaviors have to be mined out of divs, it limits the kinds of solutions you can even consider. Fixing this within DOM/HTML seems unwise, because there's just too much mess inside. Instead, new surfaces should be opened up outside of it. WebGPU-based box model My schtick here has become to point awkwardly at Use.GPU's HTML-like renderer, which does a full X/Y flex model in a fraction of the complexity or code. I don't mean my stuff is super great, no, it's pretty bare-bones and kinda niche... and yet definitely nicer. Vertical centering is easy. Positioning makes sense. There is no semantic HTML or CSS cascade, just first-class layout. You don't need 61 different accessors for border* either. You can just attach shaders to divs. Like, that's what people wanted right? Here's a blueprint, it's mostly just SDFs. Font and markup concerns only appear at the leaves of the tree, where the text sits. It's striking how you can do like 90% of what the DOM does here, with a fraction of the complexity of HTML/CSS/SVG, if you just reinvent that wheel. Done by 1 guy. And yes, I know about the second 90% too. The classic data model here is of a view tree and a render tree. What should the view tree actually look like? And what can it be lowered into? What is it being lowered into right now, by a giant pile of legacy crud? Alt-browser projects like Servo or Ladybird are in a position to make good proposals here. They have the freshest implementations, and are targeting the most essential features first. The big browser vendors could also do it, but well, taste matters. Good big systems grow from good small ones, not bad big ones. Maybe if Mozilla hadn't imploded... but alas. Platform-native UI toolkits are still playing catch up with declarative and reactive UI, so that's that. Native Electron-alternatives like Tauri could be helpful, but they don't treat origin isolation as a design constraint, which makes security teams antsy. 
There's a feasible carrot to dangle for them though, namely in the form of better process isolation. Because of CPU exploits like Spectre, multi-threading via SharedArrayBuffer and Web Workers is kinda dead on arrival anyway, and that affects all WASM. The details are boring but right now it's an impossible sell when websites have to have things like OAuth and Zendesk integrated into them. Reinventing the DOM to ditch all legacy baggage could coincide with redesigning it for a more multi-threaded, multi-origin, and async web. The browser engines are already multi-process... what did they learn? A lot has happened since Netscape, with advances in structured concurrency, ownership semantics, FP effects... all could come in handy here. * * * Step 1 should just be a data model that doesn't have 350+ properties per node tho. Don't be under the mistaken impression that this isn't entirely fixable.
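As a thought experiment, here is what a lean view-tree node could look like if the behavior were broken into à la carte facets, per the point above about a data model without 350+ properties. This is a hypothetical sketch of mine, not the article's proposal and not the Use.GPU API.

// Hypothetical lean node: a handful of orthogonal facets instead of hundreds
// of properties. All names here are illustrative only.
type Size = number | 'auto' | `${number}%`;

interface LayoutFacet {
  direction: 'x' | 'y';          // outside-in: how this box divides its space
  sizing: 'contain' | 'shrink';  // inside-out: whether content stretches the box
  basis?: Size;
  grow?: number;
  gap?: number;
}

interface StyleFacet {
  fill?: string;
  border?: { width: number; color: string; radius?: number };
  // "attach shaders to divs": a per-pixel hook instead of a fixed style list
  shader?: (uv: [number, number]) => [number, number, number, number];
}

interface TextFacet {
  content: string;               // font and markup concerns live only at the leaves
  font?: { family: string; size: number };
}

interface ViewNode {
  layout?: LayoutFacet;
  style?: StyleFacet;
  text?: TextFacet;
  children: ViewNode[];
}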

Occlusion with Bells On

Modern SSAO in a modern run-time

Use.GPU 0.14 is out, so here's an update on my declarative/reactive rendering efforts. The highlights in this release are:

dramatic inspector viewing upgrades
a modern ambient-occlusion (SSAO/GTAO) implementation
newly revised render pass infrastructure
expanded shader generation for bind groups
more use of generated WGSL struct types

SSAO with Image-Based Lighting

The main effect is that out-of-the-box, without any textures, Use.GPU no longer looks like early 2000s OpenGL. This is a problem every home-grown 3D effort runs into: how to make things look good without premium, high-quality models and pre-baking all the lights.

Use.GPU's reactive run-time continues to purr along well. Its main role is to enable doing at run-time what normally only happens at build time: dealing with shader permutations, assigning bindings, and so on. I'm quite proud of the line up of demos Use.GPU has now, for the sheer diversity of rendering techniques on display, including an example path tracer. The new inspector is the cherry on top.

A lot of the effort continues to revolve around mitigating flaws in GPU API design, and offering something simpler. As such, the challenge here wasn't just implementing SSAO: the basic effect is pretty easy. Rather, it brings with it a few new requirements, such as temporal accumulation and reprojection, that put new demands on the rendering pipeline, which I still want to expose in a modular and flexible way. This refines the efforts I detailed previously for 0.8.

Good SSAO also requires deep integration in the lighting pipeline. Here there is tension between modularizing and ease-of-use. If there is only one way to assemble a particular set of components, then it should probably be provided as a prefab. As such, occlusion has to remain a first class concept, tho it can be provided in several ways. It's a good case study of pragmatism over purity.

In case you're wondering: WebGPU is still not readily available on every device, so Use.GPU remains niche, tho it already excels at in-house use for adventurous clients. At this point you can imagine me and the browser GPU teams eyeing each other awkwardly from across the room: I certainly do.

Inspector Gadget

The first thing to mention is the upgraded Use.GPU inspector. It already had a lot of quality-of-life features like highlighting, but the main issue was finding your way around the giant trees that Use.GPU now expands into.

Old / New
Highlights show data dependencies

The fix was filtering by type. This is very simple as a component already advertises its inspectability in a few pragmatic ways. Additionally, it uses the data dependency graph between components to identify relevant parents. This shows a surprisingly tidy overview with no additional manual tagging. For each demo, it really does show you the major parts first now.

If you've checked it out before, give it another try. The layered structure is now clearly visible, and often fits in one screen. The main split is how Live is used to reconcile different levels of representation: from data, to geometry, to renders, to dispatches. These points appear as different reconciler nodes, and can be toggled as a filter.

It's still the best way to see Live and Use.GPU in action. It can be tricky to grok that each line in the tree is really a plain function, calling other functions, as it's an execution trace you can inspect. It will now point you more in the right way, and auto-select the most useful tabs by default.
The inspector is unfortunately far heavier than the GPU rendering itself, as it all relies on HTML and React to do its thing. At some point it's probably worth remaking it into a Live-native version, maybe as a 2D canvas with some virtualization. But in the meantime it's a dev tool, so the important thing is that it still works when nothing else does. Most of the images of buffers in this post can be viewed live in the inspector, if you have a WebGPU capable browser.

SSAO

Screen-space AO is common now: using the rendered depth buffer, you estimate occlusion in a hemisphere around every point. I opted for Ground Truth AO (GTAO) as it estimates the correct visibility integral, as opposed to a more empirical 'crease darkening' technique. It also allows me to estimate bent normals along the way, i.e. the average unoccluded direction, for better environment lighting.

Hemisphere sampling

This image shows the debug viz in the demo. Each frame will sample one green ring around a hemisphere, spinning rapidly, and you can hold ALT to capture the sampling process for the pixel you're pointing at. It was invaluable to find sampling issues, and also makes it trivial to verify alignment in 3D. The shader calls printPoint(…) and printLine(…) in WGSL, which are provided by a print helper, and linked in the same way it links any other shader functions.

Bent normal and occlusion samples

SSAO is expensive, and typically done at half-res, with heavy blurring to hide the sampling noise. Mine is no different, though I did take care to handle odd-sized framebuffers correctly, with no unexpected sample misalignments. It also has accumulation over time, as the shadows change slowly from frame to frame. This is done with temporal reprojection and motion vectors, at the cost of a little bit of ghosting. Moving the camera doesn't reset the ambient occlusion, as long as it's moving smoothly.

Motion vectors example
Accumulated samples

As Use.GPU doesn't render continuously, you can now use <Loop converge={N}> to decide how many extra frames you want to render after every visual change. Reprojection requires access to the last frame's depth, normal and samples, and this is trivial to provide. Use.GPU has built-in transparent history for render targets and buffers. This allows for a classic front/back buffer flipping arrangement with zero effort (also, n > 2).

Depth history

You bind this as virtual sources, each accessing a fixed slot history[i], which will transparently cycle whenever you render to its target. Any reimagined GPU API should seriously consider buffer history as a first-class concept. All the modern techniques require it.

IGN

Rather than use e.g. blue noise and hope the statistics work out, I chose a very precise sampling and blurring scheme. This uses interleaved gradient noise (IGN), and pre-filters samples in alternating 2x2 quads to help diffuse the speckles as quickly as possible. IGN is designed for 3x3 filters, so a more specifically tuned noise generator may work even better, but it's a decent V1.

Reprojection often doubles as a cheap blur filter, creating free anti-aliasing under motion or jitter. I avoided this however, as the data being sampled includes the bent normals, and this would cause all edges to become rounded. Instead I use a precise bilateral filter based on depth and normal, aided by 3D motion vectors. This means it knows exactly what depth to expect in the last frame, and the reprojected samples remain fully aliased, which is a good thing here.
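For reference, a typical bilateral rejection weight on depth and normal looks something like the sketch below, written out in TypeScript for readability. It is a generic formulation with made-up tuning constants, not Use.GPU's actual shader code.

// Generic bilateral weight for temporal reprojection: a history sample is only
// blended in if its depth matches the reprojected expectation and its normal
// agrees with the current surface.
type Vec3 = [number, number, number];

const dot = (a: Vec3, b: Vec3) => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];

function bilateralWeight(
  expectedDepth: number,   // depth predicted from the 3D motion vectors
  historyDepth: number,    // depth actually stored in the history buffer
  normal: Vec3,
  historyNormal: Vec3,
  depthSharpness = 32,     // made-up constants, for illustration
  normalPower = 8,
): number {
  const zWeight = Math.exp(-Math.abs(historyDepth - expectedDepth) * depthSharpness);
  const nWeight = Math.pow(Math.max(dot(normal, historyNormal), 0), normalPower);
  return zWeight * nWeight; // 0 rejects the history sample, 1 fully trusts it
}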
The choice of 3D motion vectors is mainly a fun experiment, it may be an unnecessary luxury.

Detail of accumulated samples

The motion vectors are based only on the camera motion for now, though there is already the option of implementing custom motion shaders similar to e.g. Unity. For live data viz and procedural geometry, motion vectors may not even be well-defined. Luckily it doesn't matter much: it converges fast enough that artifacts are hard to spot.

The final resolve can then do a bilateral upsample of these accumulated samples, using the original high-res normal and depth buffer:

Upscaled and resolved samples, with overscan trimmed off

Because it's screen-space, the shadows disappear at the screen edges. To remedy this, I implemented a very precise form of overscan. It expands the framebuffer by a constant amount of pixels, and expands the projectionMatrix to match. This border is then trimmed off when doing the final resolve. In principle this is pixel-exact, barring GPU quirks. These extra pixels don't go to waste either: they can get reprojected into the frame under motion, reducing visible noise significantly.

In theory this is very simple, as it's a direct scaling of [-1..1] XY clip space. In practice you have to make sure absolutely nothing visual depends on the exact X/Y range of your projectionMatrix, either its aspect ratio or in screen-space units. This required some cleanup on the inside, as Use.GPU has some pretty subtle scaling shaders for 2.5D and 3D points and lines. I imagine this is also why I haven't seen more people do this. But it's definitely worth it.

Overall I'm very satisfied with this. Improvements and tweaks can be made aplenty, some performance tuning needs to happen, but it looks great already. It also works in both forward and deferred mode. The shader source is here.

Render Buffers & Passes

The rendering API for passes reflects the way a user wants to think about it, as 1 logical step in producing a final image. Sub-passes such as shadows or SSAO aren't really separate here, as the correct render cannot be finished without it.

The main entry point here is the <Pass> component, representing such a logical render pass. It sits inside a view, like an <OrbitCamera>, and has some kind of pre-existing render context, like the visible canvas.

<Pass
  lights
  ssao={{ radius: 3, indirect: 0.5 }}
  overscan={0.05}
>
  ...
</Pass>

You can sequence multiple logical passes to add overlays with overlay: true, or even merge two scenes in 3D using the same Z-buffer. Inside it's a declarative recipe that turns a few flags and options into the necessary arrangement of buffers and passes required. This uses the alt-Live syntax use(…) but you can pretend that's JSX:

const resources = [
  use(ViewBuffer, options),
  lights ? use(LightBuffer, options) : null,
  shadows ? use(ShadowBuffer, options) : null,
  picking ? use(PickingBuffer, options) : null,
  overscan ? use(OverscanBuffer, options) : null,
  ...(ssao ? [
    use(NormalBuffer, options),
    use(MotionBuffer, options),
  ] : []),
  ssao ? use(SSAOBuffer, options) : null,
];

const resolved = passes ?? [
  normals ? use(NormalPass, options) : null,
  motion ? use(MotionPass, options) : null,
  ssao ? use(SSAOPass, options) : null,
  shadows ? use(ShadowPass, options) : null,
  use(DEFAULT_PASS[viewType], options),
  picking ? use(PickingPass, options) : null,
  debug ? use(DebugPass, options) : null,
];

e.g. The <SSAOBuffer> will spawn all the buffers necessary to do SSAO. Notice what is absent here: the inputs and outputs.
The render passes are wired up implicitly, because if you had to do it manually, there would only be one correct way. This is the purpose of separating the resources from the passes: it allows everything to be allocated once, up front, so that then the render passes can connect them into a suitable graph with a non-trivial but generally expected topology. They find each other using 'well-known names' like normal and motion, which is how it's done in practice anyway.

Render passes in the inspector

This reflects what I am starting to run into more and more: that decomposed systems have little value if everyone has to use it the same way. It can lead to a lot of code noise, and also tie users to unimportant details of the existing implementation. Hence the simple recipe.

But, if you want to sequence your own render exactly, nothing prevents you from using the render components à la carte: the main method of composition is mounting reactive components in Live, like everything else. Your passes work exactly the same as the built-in ones. I make use of the dynamicism of JS to e.g. not care what options are passed to the buffers and passes. The convention is that each should be namespaced so they don't collide. This provides real extensibility for custom use, while paving the cow paths that exist.

It's typical that buffers and passes come in matching pairs. However, one could swap out one variation of a <FooPass> for another, while reusing the same buffer type. Most <FooBuffer> implementations are themselves declarative recipes, with e.g. a <RenderTarget> or two, and perhaps an associated data binding. All the meat—i.e. the dispatches—is in the passes.

It's so declarative that there isn't much left inside <Renderer> itself. It maps logical calls into concrete ones by leveraging Live, and that's reflected entirely in what's there. It only gathers up some data it doesn't know details about, and helps ensure the sequence of compute before render before readback. This is a big clue that renderers really want to be reactive run-times instead.

Bind Group Soup

Use.GPU's initial design goal was "a unique shader for every draw call". This means its data binding fu has mostly been applied to local shader bindings. These apply only to one particular draw, and you bind the data to the shader at the same time as creating it. This is the useShader hook. There is no separation where you first prepare the binding layout, and as such, you use it like a deferred function call, just like JSX.

// Prepare to call surfaceShader(matrix, ray, normal, size, ...)
const getSurface = useShader(surfaceShader, [
  matrix, ray, normal, size,
  insideRef, originRef,
  sdf, palette, pbr,
  ...sources,
], defs);

Shader and pipeline reuse is handled via structural hashing behind the scenes: it's merely a happy benefit if two draw calls can reuse the same shader and pipeline, but absolutely not a problem if they don't. As batching is highly encouraged, and large data sets can be rendered as one, the number of draw calls tends to be low.

All local bindings are grouped in two bind groups, static and volatile. The latter allows for the transparent history feature, as well as just-in-time allocated atlases. Static bindings don't need to be 100% static, they just can't change during dispatch or rendering. WebGPU only has four bind groups total. I previously used the other two for respectively the global view, and the concrete render pass, using up all the bind groups.
This was wasteful but an unfortunate necessity, without an easy way to compose them at run-time.

Bind Group      #0     #1      #2        #3
Use.GPU 0.13    View   Pass    Static    Volatile
Use.GPU 0.14    Pass   Static  Volatile  Free

This has been fixed in 0.14, which frees up a bind group. It also means every render pass fully owns its own view. It can pick from a set of pre-provided ones (e.g. overscanned or not), or set a custom one, the same way it finds buffers and other bindings.

Having bind group 3 free also opens up the possibility of a more traditional sub-pipeline, as seen in a traditional scene graph renderer. These can handle larger amounts of individual draw calls, all sharing the same shader template, but with different textures and parameters. My goal however is to avoid monomorphizing to this degree, unless it's absolutely necessary (e.g. with the lighting).

This required upgrading the shader linker. Given e.g. a static binding snippet such as:

use '@use-gpu/wgsl/use/types'::{ Light };

@export struct LightUniforms {
  count: u32,
  lights: array<Light>,
};

@group(PASS) @binding(1) var<storage> lightUniforms: LightUniforms;

...you can import it in Typescript like any other shader module, with the @binding as an attribute to be linked. The shader linker will understand struct types like LightUniforms with array<Light> fully now, and is able to produce e.g. a correct minimum binding size for types that cross module boundaries.

The ergonomics of useShader have been replicated here, so that useBindGroupLayout takes a set of these and prepares them into a single static bind group, managing e.g. the shader stages for you. To bind data to the bind group, a render pass delegates via useApplyPassBindGroup: this allows the source of the data to be modularized, instead of requiring every pass to know about every possible binding (e.g. lighting, shadows, SSAO, etc.). That is, while there is a separation between bind group layout and data binding, it's lazy: both are still defined in the same place.

The binding system is flexible enough end-to-end that the SSAO can e.g. be applied to the voxel raytracer from @use-gpu/voxel with zero effort required, as it also uses the shaded technique (with per fragment depth). It has a getSurface(...) shader function that raytraces and returns a surface fragment. The SSAO sampler can just attach its occlusion information to it, by decorating it in WGSL.

WGSL Types

Worth noting, this all derives from previous work on auto-generated structs for data aggregation. It's cool tech, but it's hard to show off, because it's completely invisible on the outside, and the shader code is all ugly autogenerated glue. There's a presentation up on the site that details it at the lower level, if you're curious.

The main reason I had aggregation initially was to work around the 8 storage buffers limit in WebGPU. The Plot API needed to auto-aggregate all the different attributes of shapes, with their given spread policies, based on what the user supplied. This allows me to offer e.g. a bulk line drawing primitive where attributes don't waste precious bandwidth on repeated data.
Each ends up grouped in structs, taking up only 1 storage buffer, depending on whether it is constant or varying, per instance or per vertex:

<Line
  // Two lines
  positions={[
    [[300, 50], [350, 150], [400, 50], [450, 150]],
    [[300, 150], [350, 250], [400, 150], [450, 250]],
  ]}
  // Of the same color and width
  color={'#40c000'}
  width={5}
/>

<Line
  // Two lines
  positions={[
    [[300, 250], [350, 350], [400, 250], [450, 350]],
    [[300, 350], [350, 450], [400, 350], [450, 450]],
  ]}
  // With color per line
  color={['#ffa040', '#7f40a0']}
  // And width per vertex
  widths={[[1, 2, 2, 1], [1, 2, 2, 1]]}
/>

This involves a comprehensive buffer interleaving and copying mechanism, that has to satisfy all the alignment constraints. This then leverages @use-gpu/shader's structType(…) API to generate WGSL struct types at run-time. Given a list of attributes, it returns a virtual shader module with a real symbol table. This is materialized into shader code on demand, and can be exploded into individual accessor functions as well.

Hence data sources in Use.GPU can now have a format of T or array<T> with a WGSL shader module as the type parameter. I already had most of the pieces in place for this, but hadn't quite put it all together everywhere. Using shader modules as the representation of types is very natural, as they carry all the WGSL attributes and GPU-only concepts. It goes far beyond what I had initially scoped for the linker, as it's all source-code-level, but it was worth it. The main limitation is that type inference only happens at link time, as binding shader modules together has to remain a fast and lazy op.

Native WGSL types are somewhat poorly aligned with the WebGPU API on the CPU side. A good chunk of @use-gpu/core is lookup tables with info about formats and types, as well as alignment and size, so it can all be resolved at run-time. There's something similar for bind group creation, where it has to translate between a few different ways of saying the same thing.

The types I expose instead are simple: TextureSource, StorageSource and LambdaSource. Everything you bind to a shader is either one of these, or a constant (by reference). They carry all the necessary metadata to derive a suitable binding and accessor.

That said, I cannot shield you from the limitations underneath. Texture formats can e.g. be renderable or not, filterable or not, writeable or not, and the specific mechanisms available to you vary. If this involves native depth buffers, you may need to use a full-screen render pass to copy data, instead of just calling copyTextureToTexture. I run into this too, and can only provide a few more convenience hooks.

I did come up with a neat way to genericize these copy shaders, using the existing WGSL type inference I had, souped up a bit. This uses simple selector functions to serve the role of reassembling types. It's finally given me a concrete way to make 'root shaders' (i.e. the entry points) generic enough to support all use. I may end up using something similar to handle the ordinary vertex and fragment entry points, which still have to be provided in various permutations.

* * *

Phew. Use.GPU is always a lot to go over. But its à la carte nature remains and that's great. For in-house use it's already useful, especially if you need a decent GPU on a desktop anyway. I have been using it for some client work, and it seems to be making people happy. If you want to go off-road from there, you can.
It delivers on combining low-level shader code with its own stock components, without making you reinvent a lot of the wheels. Visit usegpu.live for more and to view demos in a WebGPU capable browser. PS: I upgraded the aging build of Jekyll that was driving this blog, so if you see anything out of the ordinary, please let me know.

The Bouquet Residence

Keeping up appearances in tech The word "rant" is used far too often, and in various ways. It's meant to imply aimless, angry venting. But often it means: Naming problems without proposing solutions, this makes me feel confused. Naming problems and assigning blame, this makes me feel bad. I saw a remarkable pair of tweets the other day. In the wake of the outage, the CEO of CrowdStrike sent out a public announcement. It's purely factual. The scope of the problem is identified, the known facts are stated, and the logistics of disaster relief are set in motion. Millions of computers were affected. This is the equivalent of a frazzled official giving a brief statement in the aftermath of an earthquake, directing people to the Red Cross. Everything is basically on fire for everyone involved. Systems are failing everywhere, some critical, and quite likely people are panicking. The important thing is to give the technicians the information and tools to fix it, and for everyone else to do what they can, and stay out of the way. In response, a communication professional posted an 'improved' version: Credit where credit is due, she nailed the style. 10/10. It seems unobjectionable, at first. Let's go through, shall we? Opposite Day First is that the CEO is "devastated." A feeling. And they are personally going to ensure it's fixed for every single user. This focuses on the individual who is inconvenienced. Not the disaster. They take a moment out of their time to say they are so, so sorry a mistake was made. They have let you and everyone else down, and that shouldn't happen. That's their responsibility. By this point, the original statement had already told everyone the relevant facts. Here the technical details are left to the imagination. The writer's self-assigned job is to wrap the message in a more palatable envelope. Everyone will be working "all day, all night, all weekends," indeed, "however long it takes," to avoid it happening again. I imagine this is meant to be inspiring and reassuring. But if I was a CrowdStrike technician or engineer, I would find it demoralizing: the boss, who will actually be personally fixing diddly-squat, is saying that the long hours of others are a sacrifice they're willing to make. Plus, CrowdStrike's customers are in the same boat: their technicians get volunteered too. They can't magically unbrick PCs from a distance, so "until it's fully fixed for every single user" would be a promise outsiders will have to keep. Lovely. There's even a punch line: an invitation to go contact them, the quickest way linked directly. It thanks people for reaching out. If everything is on fire, that includes the phone lines, the inboxes, and so on. The most stupid thing you could do in such a situation is to tell more people to contact you, right away. Don't encourage it! That's why the original statement refers to pre-existing lines of communication, internal representatives, and so on. The Support department would hate the CEO too. Root Cause If you're wondering about the pictures, it's Hyacinth Bucket, from 90s UK sitcom Keeping Up Appearances, who would always insist "it's pronounced Bouquet." Hyacinth's ambitions always landed her out of her depth, surrounded by upper-class people she's trying to impress, in the midst of an embarrassing disaster. Her increasingly desperate attempts to save face, which invariably made things worse, are the main source of comedy. Try reading that second statement in her voice. 
I’m devastated to see the scale of today’s outage and will be personally working on it together with our team until it’s fully fixed for every single user. But I wanted to take a moment to come here and tell you that I am sorry. People around the world rely on us, and incidents like this can’t happen. This came from an error that ultimately is my responsibility. I can hear it perfectly, telegraphing Britishness to restore dignity for all. If she were in tech she would give that statement. It's about reputation management first, projecting the image of competence and accountability. But she's giving the speech in front of a burning building, not realizing the entire exercise is futile. Worse, she thinks she's nailing it. If CrowdStrike had sent this out, some would've applauded and called it an admirable example of wise and empathetic communication. Real leadership qualities. But it's the exact opposite. It focuses on the wrong things, it alienates the staff, and it definitely amplifies the chaos. It's Monty Python-esque. Apologizing is pointless here, the damage is already done. What matters is how severe it is and whether it could've been avoided. This requires a detailed root-cause analysis and remedy. Otherwise you only have their word. Why would that re-assure you? The original restated the company's mission: security and stability. Those are the stakes to regain a modicum of confidence. You may think that I'm reading too much into this. But I know the exact vibe on an engineering floor when the shit hits the fan. I also know how executives and staff without that experience end up missing the point entirely. I once worked for a Hyacinth Bucket. It's not an anecdote, it's allegory. They simply don't get the engineering mindset, and confuse authority with ownership. They step on everyone's toes without realizing, because they're constantly wearing clown shoes. Nobody tells them. Softness as a Service The change in style between #1 and #2 is really a microcosm of the conflict that has been broiling in tech for ~15 years now. I don't mean the politics, but the shifting of norms, of language and behavior. It's framed as a matter of interpersonal style, which needs to be welcoming and inclusive. In practice this means they assert or demand that style #2 be the norm, even when #1 is advisable or required. Factuality is seen as deficient, improper and primitive. It's a form of doublethink: everyone's preference is equally valid, except yours, specifically. But the difference is not a preference. It's about what actually works and what doesn't. Style #1 is aimed at the people who have to fix it. Style #2 is aimed at the people who can't do anything until it's fixed. Who should they be reaching out to? In #2, communication becomes an end in itself, not a means of conveying information. It's about being seen saying the words, not living them. Poking at the statement makes it fall apart. When this becomes the norm in a technical field, it has deep consequences: Critique must be gift-wrapped in flattery, and is not allowed to actually land. Mistakes are not corrected, and sentiment takes precedence over effectiveness. Leaders speak lofty words far from the trenches to save face. The people they thank the loudest are the ones they pay the least. Inevitably, quiet competence is replaced with gaudy chaos. Everyone says they're sorry and responsible, but nobody actually is. Nobody wants to resign either. Sound familiar? 
Cope and Soothe The elephant in the room is that #1 is very masculine, while #2 is more feminine. When you hear "women are more empathetic communicators", this is what it means. They tend to focus on the individual and their relation to them, not the team as a whole and its mission. Complaints that tech is too "male dominated" and "notoriously hostile to women" are often just this. Tech was always full of types who won't preface their proposals and criticisms with fluff, and instead lean into autism. When you're used to being pandered to, neutrality feels like vulgarity. The notable exceptions are rare and usually have an exasperating lead up. Tech is actually one of the most accepting and egalitarian fields around. The maintainers do a mostly thankless job. "Oh so you're saying there's no misogyny in tech?" No I'm just saying misogyny doesn't mean "something 1 woman hates". The tone is really a distraction. If someone drops an analysis, saying shit or get off the pot, even very kindly and patiently, some will still run away screaming. Like an octopus spraying ink, they'll deploy a nasty form of #2 as a distraction. That's the real issue. Many techies, in their naiveté, believed the cultural reformers when they showed up to gentrify them. They obediently branded heretics like James Damore, and burned witches like Richard Stallman. Thanks to racism, words like 'master' and 'slave' are now off-limits as technical terms. Ironic, because millions of computers just crashed because they worked exactly like that. The cope is to pretend that nothing has truly changed yet, and more reform is needed. In fact, everything has already changed. Tech forums used to be crucibles for distilling insight, but now they are guarded jealously by people more likely to flag and ban than strongly disagree. I once got flagged on HN because I pointed out Twitter's mass lay-offs were a response to overhiring, and that people were rooting for the site to fail after Musk bought it. It suggested what we all know now: that the company would not implode after trimming the dead weight, and that they'd never forgive him for it. Diversity is now associated with incompetence, because incompetent people have spent over a decade reaching for it as an excuse. In their attempts to fight stereotypes, they ensured the stereotypes came true. Bait and Snitch The outcry tends to be: "We do all the same things you do, but still we get treated differently!" But they start from the conclusion and work their way backwards. This is what the rewritten statement does: it tries to fix the relationship before fixing the problem. The average woman and man actually do things very differently in the first place. Individual men and women choose. And others respond accordingly. The people who build and maintain the world's infrastructure prefer the masculine style for a reason: it keeps civilization running, and helps restore it when it breaks. A disaster announcement does not need to be relatable, it needs to be effective. Furthermore, if the job of shoveling shit falls on you, no amount of flattery or oversight will make that more pleasant. It really won't. Such commentary is purely for the benefit of the ones watching and trying to look busy. It makes it worse, stop pretending otherwise. There's little loyalty in tech companies nowadays, and it's no surprise. Project and product managers are acting more like demanding clients to their own team, than leaders. "As a user, I want..." Yes, but what are you going to do about it? 
Do you even know where to start? What's perceived as a lack of sensitivity is actually the presence of sensibility. It's what connects the words to the reality on the ground. It does not need to be improved or corrected, it just needs to be respected. And yes it's a matter of gender, because bashing men and masculine norms has become a jolly recreational sport in the overculture. Mature women know it. It seems impossible to admit. The entire edifice of gender equality depends on there not being a single thing men are actually better at, even just on average. Where men and women's instincts differ, women must be right. It's childish, and not harmless either. It dares you to call it out, so they can then play the wounded victim, and paint you as the unreasonable asshole who is mean. This is supposed to invalidate the argument. * * * This post is of course a giant cannon pointing in the opposite direction, sitting on top of a wall. Its message will likely fly over the reformers' heads. If they read it at all, they'll selectively quote or paraphrase, call me a tech-bro, and spool off some sentences they overheard, like an LLM. It's why they adore AI, and want it to be exactly as sycophantic as them. They don't care that it makes stuff up wholesale, because it makes them look and feel competent. It will never tell them to just fuck off already. Think less about what is said, more about what is being done. Otherwise the next CrowdStrike will probably be worse.

I is for Intent

Why your app turned into spaghetti "I do not like your software sir, There's a certain kind of programmer. Let's call him Stanley. Stanley has been around for a while, and has his fair share of war stories. The common thread is that poorly conceived and specced solutions lead to disaster, or at least, ongoing misery. As a result, he has adopted a firm belief: it should be impossible for his program to reach an invalid state. Stanley loves strong and static typing. He's a big fan of pattern matching, and enums, and discriminated unions, which allow correctness to be verified at compile time. He also has strong opinions on errors, which must be caught, logged and prevented. He uses only ACID-compliant databases, wants foreign keys and triggers to be enforced, and wraps everything in atomic transactions. He hates any source of uncertainty or ambiguity, like untyped JSON or plain-text markup. His APIs will accept data only in normalized and validated form. When you use a Stanley lib, and it doesn't work, the answer will probably be: "you're using it wrong." Stanley is most likely a back-end or systems-level developer. Because nirvana in front-end development is reached when you understand that this view of software is not just wrong, but fundamentally incompatible with the real world. I will prove it. State Your Intent Take a text editor. What happens if you press the up and down arrows? The keyboard cursor (aka caret) moves up and down. Duh. Except it also moves left and right. The editor state at the start has the caret on line 1 column 6. Pressing down will move it to line 2 column 6. But line 2 is too short, so the caret is forcibly moved left to column 1. Then, pressing down again will move it back to column 6. It should be obvious that any editor that didn't remember which column you were actually on would be a nightmare to use. You know it in your bones. Yet this only works because the editor allows the caret to be placed on a position that "does not exist." What is the caret state in the middle? It is both column 1 and column 6. To accommodate this, you need more than just a View that is a pure function of a State, as is now commonly taught. Rather, you need an Intent, which is the source of truth that you mutate... and which is then parsed and validated into a State. Only then can it be used by the View to render the caret in the right place. To edit the intent, aka what a classic Controller does, is a bit tricky. When you press left/right, it should determine the new Intent.column based on the validated State.column +/- 1. But when you press up/down, it should keep the Intent.column you had before and instead change only Intent.line. New intent is a mixed function of both previous intent and previous state. The general pattern is that you reuse Intent if it doesn't change, but that new computed Intent should be derived from State. Note that you should still enforce normal validation of Intent.column when editing too: you don't allow a user to go past the end of a line. Any new intent should be as valid as possible, but old intent should be preserved as is, even if non-sensical or inapplicable. Functionally, for most of the code, it really does look and feel as if the state is just State, which is valid. It's just that when you make 1 state change, the app may decide to jump into a different State than one would think. When this happens, it means some old intent first became invalid, but then became valid again due to a subsequent intent/state change. 
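Here is the caret example written out as a minimal Intent/State split in TypeScript. It is my own sketch of the pattern just described, not code from any particular editor.

// Intent is what the user asked for; State is the validated version used by
// the View. New intent is derived from previous intent *and* previous state.
type Caret = { line: number; column: number };

const lines = ['Hello!', 'Hi', 'Goodbye world'];

// Parse/validate intent into state: clamp to text that actually exists.
function toState(intent: Caret): Caret {
  const line = Math.min(Math.max(intent.line, 0), lines.length - 1);
  const column = Math.min(Math.max(intent.column, 0), lines[line].length);
  return { line, column };
}

function onKey(intent: Caret, key: 'up' | 'down' | 'left' | 'right'): Caret {
  const state = toState(intent);
  if (key === 'up' || key === 'down') {
    // Change only the line (derived from State); keep the remembered column,
    // even if it "does not exist" on the new line.
    return { line: state.line + (key === 'down' ? 1 : -1), column: intent.column };
  }
  // Horizontal moves derive a new, validated column from State.
  const column = state.column + (key === 'right' ? 1 : -1);
  return { line: state.line, column: Math.max(column, 0) };
}

let intent: Caret = { line: 0, column: 6 };
intent = onKey(intent, 'down'); // intent: line 1, column 6; toState() clamps the column to 2
intent = onKey(intent, 'down'); // intent: line 2, column 6; the remembered column comes back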
This is how applications actually work IRL. FYI. Knives and Forks I chose a text editor as an example because Stanley can't dismiss this as just frivolous UI polish for limp wristed homosexuals. It's essential that editors work like this. The pattern is far more common than most devs realize: A tree view remembers the expand/collapse state for rows that are hidden. Inspector tabs remember the tab you were on, even if currently disabled or inapplicable. Toggling a widget between type A/B/C should remember all the A, B and C options, even if mutually exclusive. All of these involve storing and preserving something unknown, invalid or unused, and bringing it back into play later. More so, if software matches your expected intent, it's a complete non-event. What looks like a "surprise hidden state transition" to a programmer is actually the exact opposite. It would be an unpleasant surprise if that extra state transition didn't occur. It would only annoy users: they already told the software what they wanted, but it keeps forgetting. The ur-example is how nested popup menus should work: good implementations track the motion of the cursor so you can move diagonally from parent to child, without falsely losing focus: This is an instance of the goalkeeper's curse: people rarely compliment or notice the goalkeeper if they do their job, only if they mess up. Successful applications of this principle are doomed to remain unnoticed and unstudied. Validation is not something you do once, discarding the messy input and only preserving the normalized output. It's something you do continuously and non-destructively, preserving the mess as much as possible. It's UI etiquette: the unspoken rules that everyone expects but which are mostly undocumented folklore. This poses a problem for most SaaS in the wild, both architectural and existential. Most APIs will only accept mutations that are valid. The goal is for the database to be a sequence of fully valid states: The smallest possible operation in the system is a fully consistent transaction. This flattens any prior intent. In practice, many software deviates from this ad-hoc. For example, spreadsheets let you create cyclic references, which is by definition invalid. The reason it must let you do this is because fixing one side of a cyclic reference also fixes the other side. A user wants and needs to do these operations in any order. So you must allow a state transition through an invalid state: This requires an effective Intent/State split, whether formal or informal. Because cyclic references can go several levels deep, identifying one cyclic reference may require you to spider out the entire dependency graph. This is functionally equivalent to identifying all cyclic references—dixit Dijkstra. Plus, you need to produce sensible, specific error messages. Many "clever" algorithmic tricks fail this test. Now imagine a spreadsheet API that doesn't allow for any cyclic references ever. This still requires you to validate the entire resulting model, just to determine if 1 change is allowed. It still requires a general validate(Intent). In short, it means your POST and PUT request handlers need to potentially call all your business logic. That seems overkill, so the usual solution is bespoke validators for every single op. If the business logic changes, there is a risk your API will now accept invalid intent. And the app was not built for that. If you flip it around and assume intent will go out-of-bounds as a normal matter, then you never have this risk. 
You can write the validation in one place, and you reuse it for every change as a normal matter of data flow. Note that this is not cowboy coding. Records and state should not get irreversibly corrupted, because you only ever use valid inputs in computations. If the system is multiplayer, distributed changes should still be well-ordered and/or convergent. But the data structures you're encoding should be, essentially, entirely liberal to your user's needs. Consider git. Here, a "unit of intent" is just a diff applied to a known revision ID. When something's wrong with a merge, it doesn't crash, or panic, or refuse to work. It just enters a conflict state. This state is computed by merging two incompatible intents. It's a dirty state that can't be turned into a clean commit without human intervention. This means git must continue to work, because you need to use git to clean it up. So git is fully aware when a conflict is being resolved. As a general rule, the cases where you actually need to forbid a mutation which satisfies all the type and access constraints are small. A good example is trying to move a folder inside itself: the file system has to remain a sensibly connected tree. Enforcing the uniqueness of names is similar, but also comes with a caution: falsehoods programmers believe about names. Adding (Copy) to a duplicate name is usually better than refusing to accept it, and most names in real life aren't unique at all. Having user-facing names actually requires creating tools and affordances for search, renaming references, resolving duplicates, and so on. Even among front-end developers, few people actually grok this mental model of a user. It's why most React(-like) apps in the wild are spaghetti, and why most blog posts about React gripes continue to miss the bigger picture. Doing React (and UI) well requires you to unlearn old habits and actually design your types and data flow so it uses potentially invalid input as its single source of truth. That way, a one-way data flow can enforce the necessary constraints on the fly. The way Stanley likes to encode and mutate his data is how programmers think about their own program: it should be bug-free and not crash. The mistake is to think that this should also apply to any sort of creative process that program is meant to enable. It would be like making an IDE that only allows you to save a file if the code compiles and passes all the tests. Trigger vs Memo Coding around intent is a very hard thing to teach, because it can seem overwhelming. But what's overwhelming is not doing this. It leads to codebases where every new feature makes ongoing development harder, because no part of the codebase is ever finished. You will sprinkle copies of your business logic all over the place, in the form of request validation, optimistic local updaters, and guess-based cache invalidation. If this is your baseline experience, your estimate of what is needed to pull this off will simply be wrong. In the traditional MVC model, intent is only handled at the individual input widget or form level. e.g. While typing a number, the intermediate representation is a string. This may be empty, incomplete or not a number, but you temporarily allow that. I've never seen people formally separate Intent from State in an entire front-end. Often their state is just an adhoc mix of both, where validation constraints are relaxed in the places where it was most needed. They might just duplicate certain fields to keep a validated and unvalidated variant side by side. 
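The widget-level version of that split is tiny, which is part of why it goes unexamined. A sketch, with a hypothetical bindNumberField helper:

// The raw string in the input is the intent; the parsed number is the state derived from it.
function bindNumberField(input: HTMLInputElement, onValid: (n: number) => void) {
  input.addEventListener('input', () => {
    const n = parseFloat(input.value);    // "" or "-" parse to NaN while typing
    if (Number.isFinite(n)) onValid(n);   // only a valid value propagates; the raw text is never clobbered
  });
}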
There is one common exception. In a React-like, when you do a useMemo with a derived computation of some state, this is actually a perfect fit. The eponymous useState actually describes Intent, not State, because the derived state is ephemeral. This is why so many devs get lost here. const state = useMemo( () => validate(intent), [intent] ); Their usual instinct is that every action that has knock-on effects should be immediately and fully realized, as part of one transaction. Only, they discover some of those knock-on effects need to be re-evaluated if certain circumstances change. Often to do so, they need to undo and remember what it was before. This is then triggered anew via a bespoke effect, which requires a custom trigger and mutation. If they'd instead deferred the computation, it could have auto-updated itself, and they would've still had the original data to work with. e.g. In a WYSIWYG scenario, you often want to preview an operation as part of mouse hovering or dragging. It should look like the final result. You don't need to implement custom previewing and rewinding code for this. You just need the ability to layer on some additional ephemeral intent on top of the intent that is currently committed. Rewinding just means resetting that extra intent back to empty. You can make this easy to use by treating previews as a special kind of transaction: now you can make preview states with the same code you use to apply the final change. You can also auto-tag the created objects as being preview-only, which is very useful. That is: you can auto-translate editing intent into preview intent, by messing with the contents of a transaction. Sounds bad, is actually good. The same applies to any other temporary state, for example, highlighting of elements. Instead of manually changing colors, and creating/deleting labels to pull this off, derive the resolved style just-in-time. This is vastly simpler than doing it all on 1 classic retained model. There, you run the risk of highlights incorrectly becoming sticky, or shapes reverting to the wrong style when un-highlighted. You can architect it so this is simply impossible. The trigger vs memo problem also happens on the back-end, when you have derived collections. Each object of type A must have an associated type B, created on-demand for each A. What happens if you delete an A? Do you delete the B? Do you turn the B into a tombstone? What if the relationship is 1-to-N, do you need to garbage collect? If you create invisible objects behind the scenes as a user edits, and you never tell them, expect to see a giant mess as a result. It's crazy how often I've heard engineers suggest a user should only be allowed to create something, but then never delete it, as a "solution" to this problem. Everyday undo/redo precludes it. Don't be ridiculous. The problem is having an additional layer of bookkeeping you didn't need. The source of truth was collection A, but you created a permanent derived collection B. If you instead make B ephemeral, derived via a stateless computation, then the problem goes away. You can still associate data with B records, but you don't treat B as the authoritative source for itself. This is basically what a WeakMap is. In database land this can be realized with a materialized view, which can be incremental and subscribed to. Taken to its extreme, this turns into event-based sourcing, which might seem like a panacea for this mindset. But in most cases, the latter is still a system by and for Stanley. 
The event-based nature of those systems exists to support housekeeping tasks like migration, backup and recovery. Users are not supposed to be aware that this is happening. They do not have any view into the event log, and cannot branch and merge it. The exceptions are extremely rare. It's not a system for working with user intent, only for flattening it, because it's append-only. It has a lot of the necessary basic building blocks, but substitutes programmer intent for user intent. What's most nefarious is that the resulting tech stacks are often quite big and intricate, involving job queues, multi-layered caches, distribution networks, and more. It's a bunch of stuff that Stanley can take joy and pride in, far away from users, with "hard" engineering challenges. Unlike all this *ugh* JavaScript, which is always broken and unreliable and uninteresting. Except it's only needed because Stanley only solved half the problem, badly. Patch or Bust When factored in from the start, it's actually quite practical to split Intent from State, and it has lots of benefits. Especially if State is just a more constrained version of the same data structures as Intent. This doesn't need to be fully global either, but it needs to encompass a meaningful document or workspace to be useful. It does create an additional problem: you now have two kinds of data in circulation. If reading or writing requires you to be aware of both Intent and State, you've made your code more complicated and harder to reason about. More so, making a new Intent requires a copy of the old Intent, which you mutate or clone. But you want to avoid passing Intent around in general, because it's fishy data. It may have the right types, but the constraints and referential integrity aren't guaranteed. It's a magnet for the kind of bugs a type-checker won't catch. I've published my common solution before: turn changes into first-class values, and make a generic update of type Update<T> be the basic unit of change. As a first approximation, consider a shallow merge {...value, ...update}. This allows you to make an updateIntent(update) function where update only specifies the fields that are changing. In other words, Update<Intent> looks just like Update<State> and can be derived 100% from State, without Intent. Only one place needs to have access to the old Intent, all other code can just call that. You can make an app intent-aware without complicating all the code. If your state is cleaved along orthogonal lines, then this is all you need. i.e. If column and line are two separate fields, then you can selectively change only one of them. If they are stored as an XY tuple or vector, now you need to be able to describe a change that only affects either the X or Y component. const value = { hello: 'text', foo: { bar: 2, baz: 4 }, }; const update = { hello: 'world', foo: { baz: 50 }, }; expect( patch(value, update) ).toEqual({ hello: 'world', foo: { bar: 2, baz: 50 }, }); So in practice I have a function patch(value, update) which implements a comprehensive superset of a deep recursive merge, with full immutability. It doesn't try to do anything fancy with arrays or strings, they're just treated as atomic values. But it allows for precise overriding of merging behavior at every level, as well as custom lambda-based updates. You can patch tuples by index, but this is risky for dynamic lists. So instead you can express e.g. "append item to list" without the entire list, as a lambda. I've been using patch for years now, and the uses are myriad. 
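As a point of reference, a minimal version of such a deep patch fits in a dozen lines. This is a sketch of the idea, not the actual library code, and it leaves out the lambda updates and per-level merge overrides:

// Plain objects merge key by key, immutably; arrays, strings and numbers are treated as atomic.
const isPlainObject = (x: unknown): x is Record<string, unknown> =>
  typeof x === 'object' && x !== null && !Array.isArray(x);

function patch(value: unknown, update: unknown): unknown {
  if (!isPlainObject(value) || !isPlainObject(update)) return update;
  const result: Record<string, unknown> = { ...value };
  for (const key of Object.keys(update)) {
    result[key] = patch(result[key], update[key]);
  }
  return result;
}

This reproduces the behavior of the example above; the version described here additionally carries an Update<T> type and supports lambda-based updates and overrides of the merge behavior.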
To overlay a set of overrides onto a base template, patch(base, overrides) is all you need. It's the most effective way I know to erase a metric ton of {...splats} and ?? defaultValues and != null from entire swathes of code. This is a real problem.

You could also view this as a "poor man's OT", with the main distinction being that a patch update only describes the new state, not the old state. Such updates are not reversible on their own. But they are far simpler to make and apply.

It can still power a global undo/redo system, in combination with its complement diff(A, B): you can reverse an update by diffing in the opposite direction. This is an operation which is formalized and streamlined into revise(…), so that it retains the exact shape of the original update, and doesn't require B at all. The structure of the update is sufficient information: it too encodes some intent behind the change.

With patch you also have a natural way to work with changes and conflicts as values. The earlier WYSIWYG scenario is just patch(committed, ephemeral) with bells on.

The net result is that mutating my intent or state is as easy as doing a {...value, ...update} splat, but I'm not falsely incentivized to flatten my data structures. Instead it frees you up to think about what the most practical schema actually is from the data's point of view. This is driven by how the user wishes to edit it, because that's what you will connect it to. It makes you think about what a user's workspace actually is, and lets you align boundaries in UX and process with boundaries in data structure.

Remember: most classic "data structures" are not about the structure of data at all. They serve as acceleration tools to speed up specific operations you need on that data. Having the reads and writes drive the data design was always part of the job. What's weird is that people don't apply that idea end-to-end, from database to UI and back.

SQL tables are shaped the way they are because that shape enables complex filters and joins. However, I find this pretty counterproductive: it produces derived query results that are difficult to keep up to date on a client. They also don't look like any of the data structures I actually want to use in my code.

A Bike Shed of Schemas

This points to a very under-appreciated problem: it is completely pointless to argue about schemas and data types without citing specific domain logic and code that will be used to produce, edit and consume it. Because that code determines which structures you are incentivized to use, and which structures will require bespoke extra work.

From afar, column and line are just XY coordinates. Just use a 2-vector. But once you factor in the domain logic and etiquette, you realize that the horizontal and vertical directions have vastly different rules applied to them, and splitting might be better. Which one do you pick?

This applies to all data. Whether you should put items in a List<T> or a Map<K, V> largely depends on whether the consuming code will loop over it, or need random access. If an API only provides one, consumers will just build the missing Map or List as a first step. This is O(n log n) either way, because of sorting. The method you use to read or write your data shouldn't limit use of everyday structure. Not unless you have a very good reason.

But this is exactly what happens. A lot of bad choices in data design come down to picking the "wrong" data type simply because the most appropriate one is inconvenient in some cases.
This then leads to Conway's law, where one team picks the types that are most convenient only for them. The other teams are stuck with it, and end up writing bidirectional conversion code around their part, which will never be removed. The software will now always have this shape, reflecting which concerns were considered essential. What color are your types?

{
  order: [4, 11, 9, 5, 15, 43],
  values: {
    4: {...},
    5: {...},
    9: {...},
    11: {...},
    15: {...},
    43: {...},
  },
}

For List vs Map, you can have this particular cake and eat it too. Just provide a List<Id> for the order and a Map<Id, T> for the values. If you structure a list or tree this way, then you can do both iteration and ID-based traversal in the most natural and efficient way. Don't underestimate how convenient this can be.

This also has the benefit that "re-ordering items" and "editing items" are fully orthogonal operations. It decomposes the problem of "patching a list of objects" into "patching a list of IDs" and "patching N separate objects". It makes code for manipulating lists and trees universal. It lets you decide on a case-by-case basis whether you need to garbage collect the map, or whether preserving unused records is actually desirable.

Limiting it to ordinary JSON or JS types, rather than going full-blown OT or CRDT, is a useful baseline. With sensible schema design, at ordinary editing rates, CRDTs are overkill compared to the ability to just replay edits, or notify conflicts. This only requires version numbers and retries. Users need those things anyway: just because a CRDT converges when two people edit, doesn't mean the result is what either person wants.

The only case where OTs/CRDTs are absolutely necessary is rich-text editing, and you need bespoke UI solutions for that anyway. For simple text fields, last-write-wins is perfectly fine, and also far superior to what 99% of RESTy APIs do.

A CRDT is just a mechanism that translates partially ordered intents into a single state. Like, it's cool that you can make CRDT counters and CRDT lists and whatnot... but each CRDT implements only one particular resolution strategy. If it doesn't produce the desired result, you've created invalid intent no user expected. With last-write-wins, you at least have something 1 user did intend. Whether this is actually destructive or corrective is mostly a matter of schema design and minimal surface area, not math.

The main thing that OTs and CRDTs do well is resolve edits on ordered sequences, like strings. If two users are typing text in the same doc, edits higher-up will shift edits down below, which means the indices change when rebased. But if you are editing structured data, you can avoid referring to indices entirely, and just use IDs instead. This sidesteps the issue, like splitting order from values. For the order, there is a simple solution: a map with a fractional index, effectively a dead-simple list CRDT. It just comes with some overhead.

Using a CRDT for string editing might not even be enough. Consider Google Docs-style comments anchored to that text: their indices also need to shift on every edit. Now you need a bespoke domain-aware CRDT. Or you work around it by injecting magic markers into the text. Either way, it seems non-trivial to decouple a CRDT from the specific target domain of the data inside. The constraints get mixed in.

If you ask me, this is why the field of real-time web apps is still in somewhat of a rut.
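The fractional index mentioned above is small enough to sketch. Real implementations tend to use arbitrary-precision digit strings rather than floats, but the idea is just "insert between two neighbors by picking a key in between":

// Order is a map from ID to a sortable key; reading the list means sorting by key.
type Ordered<T> = Map<string, { key: number; value: T }>;

const between = (before?: number, after?: number): number => {
  if (before === undefined && after === undefined) return 0.5;
  if (before === undefined) return after! - 1;   // insert at the start
  if (after === undefined) return before + 1;    // append at the end
  return (before + after) / 2;                   // insert in between
};

Two peers inserting concurrently just pick keys independently; the merged map still has a well-defined order, with ties broken deterministically (e.g. by ID).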
It's mainly viewed as a high-end technical problem: how do we synchronize data structures over a P2P network without any data conflicts? What they should be asking is: what is the minimal amount of structure we need to reliably synchronize, so that users can have a shared workspace where intent is preserved, and conflicts are clearly signposted. And how should we design our schemas, so that our code can manipulate the data in a straightforward and reliable way? Fixing non-trivial user conflicts is simply not your job. Most SaaS out there doesn't need any of this technical complexity. Consider that a good multiplayer app requires user presence and broadcast anyway. The simplest solution is just a persistent process on a single server coordinating this, one per live workspace. It's what most MMOs do. In fast-paced video games, this even involves lag compensation. Reliable ordering is not the big problem. The situations where this doesn't scale, or where you absolutely must be P2P, are a minority. If you run into them, you must be doing very well. The solution that I've sketched out here is explicitly designed so it can comfortably be done by small teams, or even just 1 person. The (private) CAD app I showed glimpses of above is entirely built this way. It's patch all the way down and it's had undo/redo from day 1. It also has a developer mode where you can just edit the user-space part of the data model, and save/load it. When the in-house designers come to me with new UX requests, they often ask: "Is it possible to do ____?" The answer is never a laborious sigh from a front-end dev with too much on their plate. It's "sure, and we can do more." If you're not actively aware the design of schemas and code is tightly coupled, your codebase will explode, and the bulk of it will be glue. Much of it just serves to translate generalized intent into concrete state or commands. Arguments about schemas are usually just hidden debates about whose job it is to translate, split or join something. This isn't just an irrelevant matter of "wire formats" because changing the structure and format of data also changes how you address specific parts of it. In an interactive UI, you also need a reverse path, to apply edits. What I hope you are starting to realize is that this is really just the forward path in reverse, on so many levels. The result of a basic query is just the ordered IDs of the records that it matched. A join returns a tuple of record IDs per row. If you pre-assemble the associated record data for me, you actually make my job as a front-end dev harder, because there are multiple forward paths for the exact same data, in subtly different forms. What I want is to query and mutate the same damn store you do, and be told when what changes. It's table-stakes now. With well-architected data, this can be wired up mostly automatically, without any scaffolding. The implementations you encounter in the wild just obfuscate this, because they don't distinguish between the data store and the model it holds. The fact that the data store should not be corruptible, and should enforce permissions and quotas, is incorrectly extended to the entire model stored inside. But that model doesn't belong to Stanley, it belongs to the user. This is why desktop applications didn't have a "Data Export". It was just called Load and Save, and what you saved was the intent, in a file. 
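The shape of what I'm asking for is not complicated. A sketch of such a shared store interface, with hypothetical names rather than any existing library:

type Id = string;

// Both front-end and back-end talk to the same store: ordered IDs out, patches in.
interface SharedStore<T, Filter> {
  query(filter: Filter): { order: Id[] };                        // forward path: which records matched, in order
  get(id: Id): T | undefined;                                    // record data, addressed by ID
  update(id: Id, change: Partial<T>): void;                      // reverse path: same addressing, same shapes
  subscribe(ids: Id[], onChange: (id: Id) => void): () => void;  // "be told when what changes"
}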
Having a universal query or update mechanism doesn't absolve you from thinking about this either, which is why I think the patch approach is so rare: it looks like cowboy coding if you don't have the right boundaries in place. Patch is mainly for user-space mutations, not kernel-space, a concept that applies to more than just OS kernels. User-space must be very forgiving. If you avoid it, you end up with something like GraphQL, a good example of solving only half the problem badly. Its getter assembles data for consumption by laboriously repeating it in dozens of partial variations. And it turns the setter part into an unsavory mix of lasagna and spaghetti. No wonder, it was designed for a platform that owns and hoards all your data. * * * Viewed narrowly, Intent is just a useful concept to rethink how you enforce validation and constraints in a front-end app. Viewed broadly, it completely changes how you build back-ends and data flows to support that. It will also teach you how adding new aspects to your software can reduce complexity, not increase it, if done right. A good metric is to judge implementation choices by how many other places of the code need to care about them. If a proposed change requires adjustments literally everywhere else, it's probably a bad idea, unless the net effect is to remove code rather than add. I believe reconcilers like React or tree-sitter are a major guide stone here. What they do is apply structure-preserving transforms on data structures, and incrementally. They actually do the annoying part for you. I based Use.GPU on the same principles, and use it to drive CPU canvases too. The tree-based structure reflects that one function's state just might be the next function's intent, all the way down. This is a compelling argument that the data and the code should have roughly the same shape. You will also conclude there is nothing more nefarious than a hard split between back-end and front-end. You know, coded by different people, where each side is only half-aware of the other's needs, but one sits squarely in front of the other. Well-intentioned guesses about what the other end needs will often be wrong. You will end up with data types and query models that cannot answer questions concisely and efficiently, and which must be babysat to not go stale. In the last 20 years, little has changed here in the wild. On the back-end, it still looks mostly the same. Even when modern storage solutions are deployed, people end up putting SQL- and ORM-like layers on top, because that's what's familiar. The split between back-end and database has the exact same malaise. None of this work actually helps make the app more reliable, it's the opposite: every new feature makes on-going development harder. Many "solutions" in this space are not solutions, they are copes. Maybe we're overdue for a NoSQL-revival, this time with a focus on practical schema design and mutation? SQL was designed to model administrative business processes, not live interaction. I happen to believe a front-end should sit next to the back-end, not in front of it, with only a thin proxy as a broker. What I can tell you for sure is: it's so much better when intent is a first-class concept. You don't need nor want to treat user data as something to pussy-foot around, or handle like it's radioactive. You can manipulate and transport it without a care. You can build rich, comfy functionality on top. Once implemented, you may find yourself not touching your network code for a very long time. 
It's the opposite of overwhelming, it's lovely. You can focus on building the tools your users need. This can pave the way for more advanced concepts like OT and CRDT, but will show you that neither of them is a substitute for getting your application fundamentals right. In doing so, you reach a synthesis of Dijkstra and anti-Dijkstra: your program should be provably correct in its data flow, which means it can safely break in completely arbitrary ways. Because the I in UI meant "intent" all along. More: On Variance and Extensibility Climbing Mount Effect APIs are About Policy

Stable Fiddusion

Frequency-domain blue noise generator

In computer graphics, stochastic methods are so hot right now. All rendering turns into calculus, except you solve the integrals by numerically sampling them. As I showed with Teardown, this is all based on random noise, hidden with a ton of spatial and temporal smoothing. For this, you need a good source of high-quality noise. There have been a few interesting developments in this area, such as Alan Wolfe et al.'s Spatio-Temporal Blue Noise.

This post is about how I designed noise in frequency space. I will cover:

- What is blue noise?
- Designing indigo noise
- How swap works in the frequency domain
- Heuristics and analysis to speed up search
- Implementing it in WebGPU

Along the way I will also show you some "street" DSP math. This illustrates how getting comfy in this requires you to develop deep intuition about complex numbers. But complex doesn't mean complicated. It can all be done on a paper napkin.

The WebGPU interface I built

What I'm going to make is this: If properly displayed, this image should look eerily even. But if your browser is rescaling it incorrectly, it may not be exactly right.

Colorless Blue Ideas

I will start by just recapping the essentials. If you're familiar, skip to the next section.

Ordinary random generators produce uniform white noise: every value is equally likely, and the average frequency spectrum is flat.

Time domain / Frequency domain

To a person, this doesn't actually seem fully 'random', because it has clusters and voids. Similarly, a uniformly random list of coin flips will still have long runs of heads or tails in it occasionally.

What a person would consider evenly random is usually blue noise: it prefers to alternate between heads and tails, and avoids long runs entirely. It is 'more random than random', biased towards the upper frequencies, i.e. the blue part of the spectrum.

Time domain / Frequency domain

Blue noise is great for e.g. dithering, because when viewed from afar, or blurred, it tends to disappear. With white noise, clumps remain after blurring:

Blurred white noise / Blurred blue noise

Blueness is a delicate property. If you have e.g. 3D blue noise in a volume XYZ, then a single 2D XY slice is not blue at all:

XYZ spectrum / XY slice / XY spectrum

The samples are only evenly distributed in 3D, i.e. when you consider each slice in front and behind it too.

Blue noise being delicate means that nobody really knows of a way to generate it statelessly, i.e. as a pure function f(x,y,z). Algorithms to generate it must factor in the whole, as noise is only blue if every single sample is evenly spaced. You can make blue noise images that tile, and sample those, but the resulting repetition may be noticeable.

Because blue noise is constructed, you can make special variants:

- Uniform Blue Noise has a uniform distribution of values, with each value equally likely. An 8-bit 256x256 UBN image will have each unique byte appear exactly 256 times.
- Projective Blue Noise can be projected down, so that a 3D volume XYZ flattened into either XY, YZ or ZX is still blue in 2D, and same for X, Y and Z in 1D.
- Spatio-Temporal Blue Noise (STBN) is 3D blue noise created specifically for use in real-time rendering: every 2D slice XY is 2D blue noise, and every Z row is 1D blue noise. This means XZ or YZ slices of STBN are not blue.
Instead, it's designed so that when you average out all the XY slices over Z, the result is uniform gray, again without clusters or voids. This requires the noise in all the slices to perfectly complement each other, a bit like overlapping slices of translucent swiss cheese. This is the sort of noise I want to generate. Indigo STBN 64x64x16 XYZ spectrum Sleep Furiously A blur filter's spectrum is the opposite of blue noise: it's concentrated in the lowest frequencies, with a bump in the middle. If you blur the noise, you multiply the two spectra. Very little is left: only the ring-shaped overlap, creating a band-pass area. This is why blue noise looks good when smoothed, and is used in rendering, with both spatial (2D) and temporal smoothing (1D) applied. Blur filters can be designed. If a blur filter is perfectly low-pass, i.e. ~zero amplitude for all frequencies > $ f_{\rm{lowpass}} $ , then nothing is left of the upper frequencies past a point. If the noise is shaped to minimize any overlap, then the result is actually noise free. The dark part of the noise spectrum should be large and pitch black. The spectrum shouldn't just be blue, it should be indigo. When people say you can't design noise in frequency space, what they mean is that you can't merely apply an inverse FFT to a given target spectrum. The resulting noise is gaussian, not uniform. The missing ingredient is the phase: all the frequencies need to be precisely aligned to have the right value distribution. This is why you need a specialized algorithm. The STBN paper describes two: void-and-cluster, and swap. Both of these are driven by an energy function. It works in the spatial/time domain, based on the distances between pairs of samples. It uses a "fall-off parameter" sigma to control the effective radius of each sample, with a gaussian kernel. STBN (Wolfe et al.) The swap algorithm is trivially simple. It starts from white noise and shapes it: Start with e.g. 256x256 pixels initialized with the bytes 0-255 repeated 256 times in order Permute all the pixels into ~white noise using a random order Now iterate: randomly try swapping two pixels, check if the result is "more blue" This is guaranteed to preserve the uniform input distribution perfectly. The resulting noise patterns are blue, but they still have some noise in all the lower frequencies. The only blur filter that could get rid of it all, is one that blurs away all the signal too. My 'simple' fix is just to score swaps in the frequency domain instead. If this seems too good to be true, you should know that a permutation search space is catastrophically huge. If any pixel can be swapped with any other pixel, the number of possible swaps at any given step is O(N²). In a 256x256 image, it's ~2 billion. The goal is to find a sequence of thousands, millions of random swaps, to turn the white noise into blue noise. This is basically stochastic bootstrapping. It's the bulk of good old fashioned AI, using simple heuristics, queues and other tools to dig around large search spaces. If there are local minima in the way, you usually need more noise and simulated annealing to tunnel over those. Usually. This set up is somewhat simplified by the fact that swaps are symmetric (i.e. (A,B) = (B,A)), but also that applying swaps S1 and S2 is the same as applying swaps S2 and S1 as long as they don't overlap. Good Energy Let's take it one hurdle at a time. 
It's not obvious that you can change a signal's spectrum just by re-ordering its values over space/time, but this is easy to illustrate. Take any finite 1D signal, and order its values from lowest to highest. You will get some kind of ramp, approximating a sawtooth wave. This concentrates most of the energy in the first non-DC frequency: Now split the odd values from the even values, and concatenate them. You will now have two ramps, with twice the frequency: You can repeat this to double the frequency all the way up to Nyquist. So you have a lot of design freedom to transfer energy from one frequency to another. In fact the Fourier transform has the property that energy in the time and frequency domain is conserved: This means the sum of $ |\mathrm{spectrum}_k|^2 $ remains constant over pixel swaps. We then design a target curve, e.g. a high-pass cosine: This can be fit and compared to the current noise spectrum to get the error to minimize. However, I don't measure the error in energy $ |\mathrm{spectrum}_k|^2 $ but in amplitude $ |\mathrm{spectrum}_k| $. I normalize the spectrum and the target into distributions, and take the L2 norm of the difference, i.e. a sqrt of the sum of squared errors: This keeps the math simple, but also helps target the noise in the ~zero part of the spectrum. Otherwise, deviations near zero would count for less than deviations around one. Go Banana So I tried it. With a lot of patience, you can make 2D blue noise images up to 256x256 on a single thread. A naive random search with an FFT for every iteration is not fast, but computers are. Making a 64x64x16 with this is possible, but it's certainly like watching paint dry. It's the same number of pixels as 256x256, but with an extra dimension worth of FFTs that need to be churned. Still, it works and you can also make 3D STBN with the spatial and temporal curves controlled independently: Converged spectra I built command-line scripts for this, with a bunch of quality of life things. If you're going to sit around waiting for numbers to go by, you have a lot of time for this... Save and load byte/float-exact state to a .png, save parameters to .json Save a bunch of debug viz as extra .pngs with every snapshot Auto-save state periodically during runs Measure and show rate of convergence every N seconds, with smoothing Validate the histogram before saving to detect bugs and avoid corrupting expensive runs I could fire up a couple of workers to start churning, while continuing to develop the code liberally with new variations. I could also stop and restart workers with new heuristics, continuing where it left off. Protip: you can write C in JavaScript Drunk with power, I tried various sizes and curves, which created... okay noise. Each has the exact same uniform distribution so it's difficult to judge other than comparing to other output, or earlier versions of itself. To address this, I visualized the blurred result, using a [1 4 6 4 1] kernel as my base line. After adjusting levels, structure was visible: Semi-converged Blurred The resulting spectra show what's actually going on: Semi-converged Blurred The main component is the expected ring of bandpass noise, the 2D equivalent of ringing. But in between there is also a ton of redder noise, in the lower frequencies, which all remains after a blur. This noise is as strong as the ring. So while it's easy to make a blue-ish noise pattern that looks okay at first glance, there is a vast gap between having a noise floor and not having one. 
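For reference, the basic search loop described so far fits in a short sketch. Here fft2D is a hypothetical helper returning complex bins, target is a per-bin amplitude curve of matching length, and none of this is the actual implementation:

declare function fft2D(pixels: Float32Array, w: number, h: number): { re: number; im: number }[];

// Loss: L2 distance between the normalized amplitude spectrum and the normalized target.
function score(pixels: Float32Array, target: Float32Array, w: number, h: number): number {
  const amp = fft2D(pixels, w, h).map(c => Math.hypot(c.re, c.im));   // |spectrum_k|
  let nA = 0, nT = 0;
  for (let k = 0; k < amp.length; k++) { nA += amp[k] ** 2; nT += target[k] ** 2; }
  nA = Math.sqrt(nA); nT = Math.sqrt(nT);
  let sum = 0;
  for (let k = 0; k < amp.length; k++) sum += (amp[k] / nA - target[k] / nT) ** 2;
  return Math.sqrt(sum);
}

// Greedy random swap search: keep a swap only if the spectrum moves closer to the target.
function swapSearch(pixels: Float32Array, target: Float32Array, w: number, h: number, steps: number) {
  let best = score(pixels, target, w, h);
  for (let i = 0; i < steps; i++) {
    const a = (Math.random() * pixels.length) | 0;
    const b = (Math.random() * pixels.length) | 0;
    if (a === b || pixels[a] === pixels[b]) continue;
    [pixels[a], pixels[b]] = [pixels[b], pixels[a]];
    const s = score(pixels, target, w, h);
    if (s < best) best = s;                                 // keep it
    else [pixels[a], pixels[b]] = [pixels[b], pixels[a]];   // undo it
  }
}

Swapping preserves the uniform value distribution by construction; all the cleverness discussed below goes into avoiding that full score() call on every attempt.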
So I kept iterating: It takes a very long time, but if you wait, all those little specks will slowly twinkle out, until quantization starts to get in the way, with a loss of about 1/255 per pixel (0.0039). Semi converged Fully converged The effect on the blurred output is remarkable. All the large scale structure disappears, as you'd expect from spectra, leaving only the bandpass ringing. That goes away with a strong enough blur, or a large enough dark zone. The visual difference between the two is slight, but nevertheless, the difference is significant and pervasive when amplified: Semi converged Fully converged Difference Final spectrum I tried a few indigo noise curves, with different % of the curve zero. The resulting noise is all extremely equal, even after a blur and amplify. The only visible noise left is bandpass, and the noise floor is so low it may as well not be there. As you make the black exclusion zone bigger, the noise gets concentrated in the edges and corners. It becomes a bit more linear and squarish, a contender for violet noise. This is basically a smooth evolution towards a pure pixel checkboard in the limit. Using more than 50% zero seems inadvisable for this reason: Time domain Frequency domain At this point the idea was validated, but it was dog slow. Can it be done faster? Spatially Sparse An FFT scales like O(N log N). When you are dealing with images and volumes, that N is actually an N² or N³ in practice. The early phase of the search is the easiest to speed up, because you can find a good swap for any pixel with barely any tries. There is no point in being clever. Each sub-region is very non-uniform, and its spectrum nearly white. Placing pixels roughly by the right location is plenty good enough. You might try splitting a large volume into separate blocks, and optimize each block separately. That wouldn't work, because all the boundaries remain fully white. Overlapping doesn't fix this, because they will actively create new seams. I tried it. What does work is a windowed scoring strategy. It avoids a full FFT for the entire volume, and only scores each NxN or NxNxN region around each swapped point, with N-sized FFTs in each dimension. This is enormously faster and can rapidly dissolve larger volumes of white noise into approximate blue even with e.g. N = 8 or N = 16. Eventually it stops improving and you need to bump the range or switch to a global optimizer. Here's the progression from white noise, to when sparse 16x16 gives up, followed by some additional 64x64: Time domain Frequency domain Time domain Frequency domain Time domain Frequency domain A naive solution does not work well however. This is because the spectrum of a subregion does not match the spectrum of the whole. The Fourier transform assumes each signal is periodic. If you take a random subregion and forcibly repeat it, its new spectrum will have aliasing artifacts. This would cause you to consistently misjudge swaps. To fix this, you need to window the signal in the space/time-domain. This forces it to start and end at 0, and eliminates the effect of non-matching boundaries on the scoring. I used a smoothStep window because it's cheap, and haven't needed to try anything else: 16x16 windowed data This still alters the spectrum, but in a predictable way. A time-domain window is a convolution in the frequency domain. You don't actually have a choice here: not using a window is mathematically equivalent to using a very bad window. 
It's effectively a box filter covering the cut-out area inside the larger volume, which causes spectral ringing. The effect of the chosen window on the target spectrum can be modeled via convolution of their spectral magnitudes: $$ \mathbf{target}' = |\mathbf{target}| \circledast |\mathcal{F}(\mathbf{window})| $$ This can be done via the time domain as: $$ \mathbf{target}' = \mathcal{F}(\mathcal{F}^{-1}(|\mathbf{target}|) \cdot \mathcal{F}^{-1}(|\mathcal{F}(\mathbf{window})|)) $$ Note that the forward/inverse Fourier pairs are not redundant, as there is an absolute value operator in between. This discards the phase component of the window, which is irrelevant. Curiously, while it is important to window the noise data, it isn't very important to window the target. The effect of the spectral convolution is small, amounting to a small blur, and the extra error is random and absorbed by the smooth scoring function. The resulting local loss tracks the global loss function pretty closely. It massively speeds up the search in larger volumes, because the large FFT is the bottleneck. But it stops working well before anything resembling convergence in the frequency-domain. It does not make true blue noise, only a lookalike. The overall problem is still that we can't tell good swaps from bad swaps without trying them and verifying. Sleight of Frequency So, let's characterize the effect of a pixel swap. Given a signal [A B C D E F G H], let's swap C and F. Swapping the two values is the same as adding F - C = Δ to C, and subtracting that same delta from F. That is, you add the vector: V = [0 0 Δ 0 0 -Δ 0 0] This remains true if you apply a Fourier transform and do it in the frequency domain. To best understand this, you need to develop some intuition around FFTs of Dirac deltas. Consider the short filter kernel [1 4 6 4 1]. It's a little known fact, but you can actually sight-read its frequency spectrum directly off the coefficients, because the filter is symmetrical. I will teach you. The extremes are easy: The DC amplitude is the sum 1 + 4 + 6 + 4 + 1 = 16 The Nyquist amplitude is the modulated sum 1 - 4 + 6 - 4 + 1 = 0 So we already know it's an 'ideal' lowpass filter, which reduces the Nyquist signal +1, -1, +1, -1, ... to exactly zero. It also has 16x DC gain. Now all the other frequencies. First, remember the Fourier transform works in symmetric ways. Every statement "____ in the time domain = ____ in the frequency domain" is still true if you swap the words time and frequency. This has lead to the grotesquely named sub-field of cepstral processing where you have quefrencies and vawes, and it kinda feels like you're having a stroke. The cepstral convolution filter from earlier is called a lifter. Usually cepstral processing is applied to the real magnitude of the spectrum, i.e. $ |\mathrm{spectrum}| $, instead of its true complex value. This is a coward move. So, decompose the kernel into symmetric pairs: [· · 6 · ·] [· 4 · 4 ·] [1 · · · 1] All but the first row is a pair of real Dirac deltas in the time domain. Such a row is normally what you get when you Fourier transform a cosine, i.e.: $$ \cos \omega = \frac{\mathrm{e}^{i\omega} + \mathrm{e}^{-i\omega}}{2} $$ A cosine in time is a pair of Dirac deltas in the frequency domain. The phase of a (real) cosine is zero, so both its deltas are real. Now flip it around. The Fourier transform of a pair [x 0 0 ... 0 0 x] is a real cosine in frequency space. Must be true. Each new pair adds a new higher cosine on top of the existing spectrum. 
For the central [... 0 0 x 0 0 ...] we add a DC term. It's just a Fourier transform in the other direction: |FFT([1 4 6 4 1])| = [· · 6 · ·] => 6 [· 4 · 4 ·] => 8 cos(ɷ) [1 · · · 1] => 2 cos(2ɷ) = |6 + 8 cos(ɷ) + 2 cos(2ɷ)| Normally you have to use the z-transform to analyze a digital filter. But the above is a shortcut. FFTs and inverse FFTs do have opposite phase, but that doesn't matter here because cos(ɷ) = cos(-ɷ). This works for the symmetric-even case too: you offset the frequencies by half a band, ɷ/2, and there is no DC term in the middle: |FFT([1 3 3 1])| = [· 3 3 ·] => 6 cos(ɷ/2) [1 · · 1] => 2 cos(3ɷ/2) = |6 cos(ɷ/2) + 2 cos(3ɷ/2)| So, symmetric filters have spectra that are made up of regular cosines. Now you know. For the purpose of this trick, we centered the filter around $ t = 0 $. FFTs are typically aligned to array index 0. The difference between the two is however just phase, so it can be disregarded. What about the delta vector [0 0 Δ 0 0 -Δ 0 0]? It's not symmetric, so we have to decompose it: V1 = [· · · · · Δ · ·] V2 = [· · Δ · · · · ·] V = V2 - V1 Each is now an unpaired Dirac delta. Each vector's Fourier transform is a complex wave $ Δ \cdot \mathrm{e}^{-i \omega k} $ in the frequency domain (the k'th quefrency). It lacks the usual complementary oppositely twisting wave $ Δ \cdot \mathrm{e}^{i \omega k} $, so it's not real-valued. It has constant magnitude Δ and varying phase: FFT(V1) = [ Δ Δ Δ Δ Δ Δ Δ Δ ] FFT(V2) = [ Δ Δ Δ Δ Δ Δ Δ Δ ] These are vawes. The effect of a swap is still just to add FFT(V), aka FFT(V2) - FFT(V1) to the (complex) spectrum. The effect is to transfer energy between all the bands simultaneously. Hence, FFT(V1) and FFT(V2) function as a source and destination mask for the transfer. However, 'mask' is the wrong word, because the magnitude of $ \mathrm{e}^{i \omega k} $ is always 1. It doesn't have varying amplitude, only varying phase. -FFT(V1) and FFT(V2) define the complex direction in which to add/subtract energy. When added together their phases interfere constructively or destructively, resulting in an amplitude that varies between 0 and 2Δ: an actual mask. The resulting phase will be halfway between the two, as it's the sum of two equal-length complex numbers. FFT(V) = [ · Δ Δ Δ Δ Δ Δ Δ ] For any given pixel A and its delta FFT(V1), it can pair up with other pixels B to form N-1 different interference masks FFT(V2) - FFT(V1). There are N(N-1)/2 unique interference masks, if you account for (A,B) (B,A) symmetry. Worth pointing out, the FFT of the first index: FFT([Δ 0 0 0 0 0 0 0]) = [Δ Δ Δ Δ Δ Δ Δ Δ] This is the DC quefrency, and the fourier symmetry continues to work. Moving values in time causes the vawe's quefrency to change in the frequency domain. This is the upside-down version of how moving energy to another frequency band causes the wave's frequency to change in the time domain. What's the Gradient, Kenneth? Using vectors as masks... shifting energy in directions... this means gradient descent, no? Well. It's indeed possible to calculate the derivative of your loss function as a function of input pixel brightness, with the usual bag of automatic differentiation/backprop tricks. You can also do it numerically. But, this doesn't help you directly because the only way you can act on that per-pixel gradient is by swapping a pair of pixels. 
You need to find two quefrencies FFT(V1) and FFT(V2) which interfere in exactly the right way to decrease the loss function across all bad frequencies simultaneously, while leaving the good ones untouched. Even if the gradient were to help you pick a good starting pixel, that still leaves the problem of finding a good partner. There are still O(N²) possible pairs to choose from, and the entire spectrum changes a little bit on every swap. Which means new FFTs to analyze it. Random greedy search is actually tricky to beat in practice. Whatever extra work you spend on getting better samples translates into less samples tried per second. e.g. Taking a best-of-3 approach is worse than just trying 3 swaps in a row. Swaps are almost always orthogonal. But random() still samples unevenly because it's white noise. If only we had.... oh wait. Indeed if you already have blue noise of the right size, you can use that to mildly speed up the search for more. Use it as a random permutation to drive sampling, with some inevitable whitening over time to keep it fresh. You can't however use the noise you're generating to accelerate its own search, because the two are highly correlated. What's really going on is all a consequence of the loss function. Given any particular frequency band, the loss function is only affected when its magnitude changes. Its phase can change arbitrarily, rolling around without friction. The complex gradient must point in the radial direction. In the tangential direction, the partial derivative is zero. The value of a given interference mask FFT(V1) - FFT(V2) for a given frequency is also complex-valued. It can be projected onto the current phase, and split into its radial and tangential component with a dot product. The interference mask has a dual action. As we saw, its magnitude varies between 0 and 2Δ, as a function of the two indices k1 and k2. This creates a window that is independent of the specific state or data. It defines a smooth 'hash' from the interference of two quefrency bands. But its phase adds an additional selection effect: whether the interference in the mask is aligned with the current band's phase: this determines the split between radial and tangential. This defines a smooth phase 'hash' on top. It cycles at the average of the two quefrencies, i.e. a different, third one. Energy is only added/subtracted if both hashes are non-zero. If the phase hash is zero, the frequency band only turns. This does not affect loss, but changes how each mask will affect it in the future. This then determines how it is coupled to other bands when you perform a particular swap. Note that this is only true differentially: for a finite swap, the curvature of the complex domain comes into play. The loss function is actually a hyper-cylindrical skate bowl you can ride around. Just the movement of all the bands is tied together. Frequency bands with significant error may 'random walk' freely clockwise or counterclockwise when subjected to swaps. A band can therefor drift until it gets a turn where its phase is in alignment with enough similar bands, where the swap makes them all descend along the local gradient, enough to counter any negative effects elsewhere. In the time domain, each frequency band is a wave that oscillates between -1...1: it 'owns' some of the value of each pixel, but there are places where its weight is ~zero (the knots). So when a band shifts phase, it changes how much of the energy of each pixel it 'owns'. 
This allows each band to 'scan' different parts of the noise in the time domain. In order to fix a particular peak or dent in the frequency spectrum, the search must rotate that band's phase so it strongly owns any defect in the noise, and then perform a swap to fix that defect. Thus, my mental model of this is not actually disconnected pixel swapping. It's more like one of those Myst puzzles where flipping a switch flips some of the neighbors too. You press one pair of buttons at a time. It's a giant haunted dimmer switch. We're dealing with complex amplitudes, not real values, so the light also has a color. Mechanically it's like a slot machine, with dials that can rotate to display different sides. The cherries and bells are the color: they determine how the light gets brighter or darker. If a dial is set just right, you can use it as a /dev/null to 'dump' changes. That's what theory predicts, but does it work? Well, here is a (blurred noise) spectrum being late-optimized. The search is trying to eliminate the left-over lower frequency noise in the middle: Semi converged Here's the phase difference from the late stages of search, each a good swap. Left to right shows 4 different value scales: x2 x16 x256 x4096 x2 x16 x256 x4096 At first it looks like just a few phases are changing, but amplification reveals it's the opposite. There are several plateaus. Strongest are the bands being actively modified. Then there's the circular error area around it, where other bands are still swirling into phase. Then there's a sharp drop-off to a much weaker noise floor, present everywhere. These are the bands that are already converged. Compare to a random bad swap: x2 x16 x256 x4096 Now there is strong noise all over the center, and the loss immediately gets worse, as a bunch of amplitudes start shifting in the wrong direction randomly. So it's true. Applying the swap algorithm with a spectral target naturally cycles through focusing on different parts of the target spectrum as it makes progress. This information is positionally encoded in the phases of the bands and can be 'queried' by attempting a swap. This means the constraint of a fixed target spectrum is actually a constantly moving target in the complex domain. Frequency bands that reach the target are locked in. Neither their magnitude nor phase changes in aggregate. The random walks of such bands must have no DC component... they must be complex-valued blue noise with a tiny amplitude. Knowing this doesn't help directly, but it does explain why the search is so hard. Because the interference masks function like hashes, there is no simple pattern to how positions map to errors in the spectrum. And once you get close to the target, finding new good swaps is equivalent to digging out information encoded deep in the phase domain, with O(N²) interference masks to choose from. Gradient Sampling As I was trying to optimize for evenness after blur, it occurred to me to simply try selecting bright or dark spots in the blurred after-image. This is the situation where frequency bands are in coupled alignment: the error in the spectrum has a relatively concentrated footprint in the time domain. But, this heuristic merely picks out good swaps that are already 'lined up' so to speak. It only works as a low-hanging fruit sampler, with rapidly diminishing returns. Next I used the gradient in the frequency domain. The gradient points towards increasing loss, which is the sum of squared distance $ (…)^2 $. 
So the slope is $ 2(…) $, proportional to distance to the goal: $$ |\mathrm{gradient}_k| = 2 \cdot \left( \frac{|\mathrm{spectrum}_k|}{||\mathbf{spectrum}||} - \frac{\mathrm{target}_k}{||\mathbf{target}||} \right) $$ It's radial, so its phase matches the spectrum itself: $$ \mathrm{gradient}_k = \mathrm{|gradient_k|} \cdot \left(1 ∠ \mathrm{arg}(\mathrm{spectrum}_k) \right) $$ Eagle-eyed readers may notice the sqrt part of the L2 norm is missing here. It's only there for normalization, and in fact, you generally want a gradient that decreases the closer you get to the target. It acts as a natural stabilizer, forming a convex optimization problem. You can transport this gradient backwards by applying an inverse FFT. Usually derivatives and FFTs don't commute, but that's only when you are deriving in the same dimension as the FFT. The partial derivative here is neither over time nor frequency, but by signal value. The resulting time-domain gradient tells you how fast the (squared) loss would change if a given pixel changed. The sign tells you whether it needs to become lighter or darker. In theory, a pixel with a large gradient can enable larger score improvements per step. It says little about what's a suitable pixel to pair with though. You can infer that a pixel needs to be paired with one that is brighter or darker, but not how much. The gradient only applies differentially. It involves two pixels, so it will cause interference between the two deltas, and also with the signal's own phase. The time-domain gradient does change slowly after every swap—mainly the swapping pixels—so this only needs to add an extra IFFT every N swap attempts, reusing it in between. I tried this in two ways. One was to bias random sampling towards points with the largest gradients. This barely did anything, when applied to one or both pixels. Then I tried going down the list in order, and this worked better. I tried a bunch of heuristics here, like adding a retry until paired, and a 'dud' tracker to reject known unpairable pixels. It did lead to some minor gains in successful sample selection. But beating random was still not a sure bet in all cases, because it comes at the cost of ordering and tracking all pixels to sample them. All in all, it was quite mystifying. Pair Analysis Hence I analyzed all possible swaps (A,B) inside one 64x64 image at different stages of convergence, for 1024 pixels A (25% of total). The result was quite illuminating. There are 2 indicators of a pixel's suitability for swapping: % of all possible swaps (A,_) that are good score improvement of best possible swap (A,B) They are highly correlated, and you can take the geometric average to get a single quality score to order by: The curve shows that the best possible candidates are rare, with a sharp drop-off at the start. Here the average candidate is ~1/3rd as good as the best, though every pixel is pairable. This represents the typical situation when you have unconverged blue-ish noise. Order all pixels by their (signed) gradient, and plot the quality: The distribution seems biased towards the ends. A larger absolute gradient at A can indeed lead to both better scores and higher % of good swaps. Notice that it's also noisier at the ends, where it dips below the middle. If you order pixels by their quality, and then plot the absolute gradient, you see: Selecting for large gradient at A will select both the best and the worst possible pixels A. 
This implies that there are pixels in the noise that are very significant, but are nevertheless currently 'unfixable'. This corresponds to the 'focus' described earlier. By drawing from the 'top', I was mining the imbalance between the good/left and bad/right distribution. Selecting for a vanishing gradient would instead select the average-to-bad pixels A. I investigated one instance of each: very good, average or very bad pixel A. I tried every possible swap (A, B) and plotted the curve again. Here the quality is just the actual score improvement: The three scenarios have similar curves, with the bulk of swaps being negative. Only a tiny bit of the curve is sticking out positive, even in the best case. The potential benefit of a good swap is dwarfed by the potential harm of bad swaps. The main difference is just how many positive swaps there are, if any. So let's focus on the positive case, where you can see best. You can order by score, and plot the gradient of all the pixels B, to see another correlation. It looks kinda promising. Here the sign matters, with left and right being different. If the gradient of pixel A is the opposite sign, then this graph is mirrored. But if you order by (signed) gradient and plot the score, you see the real problem, caused by the noise: The good samples are mixed freely among the bad ones, with only a very weak trend downward. This explains why sampling improvements based purely on gradient for pixel B are impossible. You can see what's going on if you plot Δv, the difference in value between A and B: For a given pixel A, all the good swaps have a similar value for B, which is not unexpected. Its mean is the ideal value for A, but there is a lot of variance. In this case pixel A is nearly white, so it is brighter than almost every other pixel B. If you now plot Δv * -gradient, you see a clue on the left: Almost all of the successful swaps have a small but positive value. This represents what we already knew: the gradient's sign tells you if a pixel should be brighter or darker. If Δv has the opposite sign, the chances of a successful swap are slim. Ideally both pixels 'face' the right way, so the swap is beneficial on both ends. But only the combined effect on the loss matters: i.e. Δv * Δgradient < 0. It's only true differentially so it can misfire. But compared to blind sampling of pairs, it's easily 5-10x better and faster, racing towards the tougher parts of the search. What's more... while this test is just binary, I found that any effort spent on trying to further prioritize swaps by the magnitude of the gradient is entirely wasted. Maximizing Δv * Δgradient by repeated sampling is counterproductive, because it selects more bad candidates on the right. Minimizing Δv * Δgradient creates more successful swaps on the left, but lowers the average improvement per step so the convergence is net slower. Anything more sophisticated incurs too much computation to be worth it. It does have a limit. This is what it looks like when an image is practically fully converged: Eventually you reach the point where there are only a handful of swaps with any real benefit, while the rest is just shaving off a few bits of loss at a time. It devolves back to pure random selection, only skipping the coin flip for the gradient. It is likely that more targeted heuristics can still work here. The gradient also works in the early stages. As it barely changes over successive swaps, this leads to a different kind of sparse mode. 
Instead of scoring only a subset of pixels, simply score multiple swaps as a group over time, without re-scoring intermediate states. This lowers the success rate roughly by a power (e.g. 0.8 -> 0.64), but cuts the number of FFTs by a constant factor (e.g. 1/2). Early on, this trade-off can be worth it.

Even faster: don't score steps at all. In the very early stage, you can easily get up to 80-90% successful swaps just by filtering on values and gradients. If you just swap a bunch in a row, there is a very good chance you will still end up better than before. It works better than sparse scoring: using the gradient of your true objective approximately works better than using an approximate objective exactly. The latter will miss the true goal by design, while the former continually re-aims itself at the destination despite inaccuracy.

Obviously you can mix and match techniques, and gradient + sparse is actually a winning combo. I've only scratched the surface here (there's a rough sketch of the grouped variant further below).

Warp Speed

Time to address the elephant in the room. If the main bottleneck is an FFT, wouldn't this work better on a GPU? The answer to that is an unsurprising yes, at least for large sizes where the overhead of async dispatch is negligible. However, it would have been endlessly more cumbersome to discover all of the above based on a GPU implementation, where I can't just log intermediate values to a console.

After checking everything, I pulled out my bag of tricks and ported it to Use.GPU. As a result, the algorithm runs entirely on the GPU, and provides live visualization of the entire process. It requires a WebGPU-enabled browser, which in practice means Chrome on Windows or Mac, or a dev build elsewhere.

I haven't particularly optimized this—the FFT is vanilla textbook—but it works. It provides an easy ~8x speed-up on an M1 Mac at beefier target sizes. With a desktop GPU, 128x128x32 and larger become very feasible. It lacks a few goodies from the scripts, and only does gradient + optional sparse. You can however freely exchange PNGs between the CPU and GPU version via drag and drop, as long as the settings match.

(Diagram: Layout components / Compute components)

Worth pointing out: this visualization is built using Use.GPU's HTML-like layout system. I can put div-like blocks inside a flex-box wrapper, and put text beside it... while at the same time using raw WGSL shaders as the contents of those divs. These visualization shaders sample and colorize the algorithm state on the fly, with no CPU-side involvement other than a static dispatch. The only GPU -> CPU readback is for the stats in the corner, which are classic React and real HTML, along with the rest of the controls.

I can then build an <FFT> component and drop it inside an async <ComputeLoop>, and it does exactly what it should. The rest is just a handful of <Dispatch> elements and the ordinary headache of writing compute shaders. <Suspense> ensures all the shaders are compiled before dispatching. While running, the bulk of the tree is inert, with only a handful of reducers triggering on a loop, causing a mere 7 live components to update per frame. The compute dispatch fights with the normal rendering for GPU resources, so there is an auto-batching mechanism that aims for approximately 30-60 FPS.

The display is fully anti-aliased, including the pixelized data. I'm using the usual per-pixel SDF trickery to do this... it's applied as a generic wrapper shader for any UV-based sampler.
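Stepping back from the GPU port for a moment, here's the rough sketch of that grouped variant: apply a handful of gradient-filtered swaps without re-scoring in between, score the group once, and undo the whole batch if it ended up worse. `scoreImage` stands in for the full FFT-based loss, `findFilteredSwap` is assumed to be the gradient-sign sampler from the previous sketch minus its per-swap rescore, and the group size and rollback policy are my assumptions rather than anything the article specifies:

```typescript
// Hypothetical stand-ins: the full FFT-based loss, and the gradient-sign sampler
// from the previous sketch (without its per-swap rescore).
declare function scoreImage(pixels: number[]): number;
declare function findFilteredSwap(
  pixels: number[],
  gradient: number[],
): [number, number] | null;

// Apply a group of filtered swaps in place, scoring once at the end (or not at all).
function applyGroupedSwaps(
  pixels: number[],
  gradient: number[],
  groupSize = 8,        // hypothetical; trades success rate against FFT count
  scoreGroup = true,    // false = the "don't score at all" early-stage mode
): void {
  const before = scoreGroup ? scoreImage(pixels) : 0;
  const applied: [number, number][] = [];

  // Several filtered swaps in a row, with no intermediate re-scoring.
  for (let i = 0; i < groupSize; i++) {
    const pair = findFilteredSwap(pixels, gradient);
    if (!pair) break;
    const [a, b] = pair;
    [pixels[a], pixels[b]] = [pixels[b], pixels[a]];
    applied.push(pair);
  }

  // One FFT-based score for the whole group; undo everything if it got worse.
  if (scoreGroup && applied.length > 0 && scoreImage(pixels) > before) {
    for (const [a, b] of applied.reverse()) {
      [pixels[a], pixels[b]] = [pixels[b], pixels[a]];
    }
  }
}
```

With `scoreGroup` turned off, this is the unscored early-stage mode, where filtering on values and gradients alone already yields 80-90% good swaps.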
The port is a good showcase that Use.GPU really is React-for-GPUs with less hassle, but still with all the goodies. It bypasses most of the browser once the canvas gets going, and it isn't just for UI: you can express async compute just fine with the right component design. The robust layout and vector plotting capabilities are just extra on top. I won't claim it's the world's most elegant abstraction, because it's far too pragmatic for that. But I simply don't know any other programming environment where I could even try something like this and not get bogged down in GPU binding hell, or have to round-trip everything back to the CPU.

* * *

So there you have it: blue and indigo noise à la carte.

What I find most interesting is that the problem of generating noise in the time domain has been recast into shaping and denoising a spectrum in the frequency domain. It starts as white noise, and gets turned into a pre-designed picture. You do so by swapping pixels in the other domain. The state for this process is kept in the phase channel, which is not directly relevant to the problem, but drifts into alignment over time. Hence I called it Stable Fiddusion.

If you swap the two domains, you're turning noise into a picture by swapping frequency bands without changing their values. It would result in a complex-valued picture, whose magnitude is the target, and whose phase encodes the progress of the convergence process. This is approximately what you get when you add a hidden layer to a diffusion model.

What I also find interesting is that the notion of swaps naturally creates a space that is O(N²) big with only N samples of actual data. Viewed from the perspective of a single step, every pair (A, B) corresponds to a unique information mask in the frequency domain that extracts a unique delta from the same data. There is redundancy, of course, but the nature of the Fourier transform smears it out into one big superposition. When you do multiple swaps, the space grows, but not quite that fast: any permutation of the same non-overlapping swaps is equivalent. There is also a notion of entanglement: frequency bands / pixels are linked together to move as a whole by default, but parts will diffuse into being locked in place.

Phase is kind of the bugbear of the DSP world. Everyone knows it's there, but they prefer not to talk about it unless its content is neat and simple. Hopefully by now you have a better appreciation of the true nature of a Fourier transform: not just as a spectrum for a real-valued signal, but as a complex-valued transform of a complex-valued input. During a swap run, the phase channel continuously looks like noise, but is actually highly structured when queried with the right quefrency hashes. I wonder what other things look like that, when you flip them around.

More:

- Stable Fiddusion app
- CPU-side source code
- WebGPU source code
