Newton's method is probably one of the most popular algorithms for finding the roots of a function through successive numerical approximations. In less cryptic words, if you have an opaque function f(x) and you need to solve f(x)=0 (finding where the function crosses the x-axis), the Newton-Raphson method gives you a dead simple cookbook to achieve that (a few conditions need to be met, though). I recently had to solve a similar problem where, instead of finding the roots, I had to invert the function. At first glance this may sound like two entirely different problems, but in practice it's almost the same thing. Since I barely avoided a mental breakdown in the process of figuring that out, I thought it would make sense to share the experience of walking the road to enlightenment.

A function and its inverse

We are given a funky function, let's say f(x)=2/3(x+1)²-sin(x)-1, and we want to figure out its inverse f⁻¹(): The diagonal is highlighted for the symmetry to be more obvious. One thing...
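As a minimal illustration of the cookbook mentioned above, here is a hedged C sketch of my own (not the article's code, and the starting guess and convergence threshold are arbitrary choices) that applies Newton-Raphson root finding to that same f(x); link with -lm:

    #include <math.h>
    #include <stdio.h>

    /* The function whose root we want, and its derivative */
    static double f(double x)  { return 2.0/3.0*(x+1)*(x+1) - sin(x) - 1.0; }
    static double fp(double x) { return 4.0/3.0*(x+1) - cos(x); }

    int main(void)
    {
        double x = 1.0; /* arbitrary starting guess */
        for (int i = 0; i < 20; i++) {
            const double step = f(x) / fp(x);
            x -= step;                  /* Newton iteration: x = x - f(x)/f'(x) */
            if (fabs(step) < 1e-12)     /* arbitrary convergence threshold */
                break;
        }
        printf("root = %.12f, f(root) = %g\n", x, f(x));
        return 0;
    }

Inverting the function, as the article goes on to do, boils down to applying the same iteration to f(x)-y=0 for a given y.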


More from A small freedom area

Fixing the iterative damping interpolation in video games

As I'm exploring the fantastic world of indie game development lately, I end up watching a large number of video tutorials on the subject. Even though the quality of the content is pretty variable, I'm very grateful to the creators for it. That being said, I couldn't help noticing this particular bit time and time again:

    a = lerp(a, B, delta * RATE)

Behind this apparently banal call hides a terrible curse, forever perpetuated by innocent souls on the Internet. In this article we will study what it's trying to achieve, how it works, why it's wrong, and then we'll come up with a good solution to the initial problem.

The usual warning: I don't have a mathematics or academic background, so the article is addressed at other neanderthals like myself, who managed to understand that pressing keys on a keyboard makes pixels turn on and off.

What is it?

Let's start from the beginning. We're in a game engine main loop callback called at a (roughly) regular interval, passing down the time difference from the last call. In the Godot engine, it looks like this:

    func _physics_process(delta: float):
        ...

If the game is configured to refresh at 60 FPS, we can expect this function to be called around 60 times per second with delta = 1/60 = 0.01666....

As a game developer, we want smooth animations for all kinds of transformations. For example, we may want the speed of the player to go down to zero as they release the movement key. We could do that linearly, but to make the stop less brutal and robotic we want to slow down the speed progressively.

Linear (top) versus smooth/exponential (bottom) animation

Virtually every tutorial will suggest updating some random variable with something like this:

    velocity = lerp(velocity, 0, delta * RATE)

At 60 FPS, a decay RATE defined to 3.5, and an initial velocity of 100, the velocity will go down to 0 following this curve:

Example curve of a decaying variable

Note: velocity is just an example of a variable name; the same pattern can be found in many other contexts.

If you're familiar with lerp() ("linear interpolation") you may be wondering why this is making a curve. Indeed, this lerp() function, also known as mix(), is a simple linear function defined as lerp(a,b,x) = x*(b-a) + a, or its alternative stable form lerp(a,b,x) = (1-x)a + xb. For more information, see a previous article about this particular function. But here we are re-using the previous value, so this essentially means nesting lerp() function calls, which expands into a power formula, forming a curve composed of a chain of small straight segments.

Why is it wrong?

The main issue is that the formula depends heavily on the refresh rate. If the game is supposed to work at 30, 60, or 144 FPS, then the physics engine is going to behave differently. Here is an illustration of the kind of instability we can expect:

Comparison of the curves at different frame rates with the problematic formula

Note that the inaccuracy when compared to an ideal curve is not the issue here. The problem is that the game mechanics are different depending on the hardware, the system, and the wind direction observed on a small island of Japan. Imagine being able to jump further if we replace our 60Hz monitor with a 144Hz one; that would be some nasty pay-to-win incentive.
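To make the frame-rate dependence tangible, here is a small hedged C sketch of my own (the constants are arbitrary and mirror the example above, not anything from the article): it runs the naive update for the same simulated second at three frame rates and prints three different results:

    #include <stdio.h>

    static float lerp(float a, float b, float x) { return a + x * (b - a); }

    /* Run the naive update for 1 simulated second at a given frame rate. */
    static float simulate(float fps, float rate)
    {
        const float delta = 1.0f / fps;
        float velocity = 100.0f;
        for (int i = 0; i < (int)fps; i++)
            velocity = lerp(velocity, 0.0f, delta * rate);
        return velocity;
    }

    int main(void)
    {
        const float rate = 3.5f;
        /* Same wall-clock duration, different frame rates, different results */
        printf("30 FPS:  %f\n", simulate(30.0f, rate));
        printf("60 FPS:  %f\n", simulate(60.0f, rate));
        printf("144 FPS: %f\n", simulate(144.0f, rate));
        return 0;
    }

After one simulated second the three runs end up at visibly different velocities, which is exactly the instability described here.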
We may be able to get away with this by forcing a constant refresh rate for the game and considering this a non-issue (I'm not convinced this is achievable on all engines and platforms), but then we meet another problem: the device may not be able to hold this requirement at all times because of potential lags (for reasons that may be outside our control). That's right, so far we assumed delta=1/FPS, but that's merely a target; it can fluctuate, causing mild to dramatic situations gameplay-wise.

One last issue with that formula is the situation of a huge delay spike, causing an overshooting of the target. For example, if we have RATE=3 and we end up with a frame that takes 500ms for whatever random reason, we're going to interpolate with a value of 1.5, which is way above 1. This is easily fixed by clamping the 3rd argument of lerp to 1, but we have to keep that issue in mind.

To summarize, the formula is:

- not frame rate agnostic ❌
- non deterministic ❌
- vulnerable to overshooting ❌

If you're not interested in the gory details of the how, you can now jump straight to the conclusion for a better alternative.

Study

We're going to switch to a more mathematical notation from now on. It's only going to be linear algebra, nothing particularly fancy, but we're going to make a mess of 1-letter symbols, so bear with me.

Let's name the exhaustive list of inputs of our problem:

- initial value: a_0=\Alpha (from where we start, only used once)
- target value: \Beta (where we are going, constant value)
- time delta: \Delta_n (time difference from last call)
- the rate of change: R (arbitrary scaling user constant)
- original sequence: a_{n+1} = \texttt{lerp}(a_n, \Beta, R\Delta_n) (the code in the main loop callback)
- frame rate: F (the target frame rate, for example 60 FPS)
- time: t (animation time elapsed)

What we are looking for is a new sequence formula u_n (u standing for purfect) that doesn't have the 3 previously mentioned pitfalls.

The first thing we can do is to transform this recursive sequence into the expected ideal continuous time-based function. The original sequence was designed for a given rate R and FPS F: this means that while \Delta_n changes in practice every frame, the ideal function we are looking for is constant: \Delta=1/F. So instead of starting from a_{n+1} = \texttt{lerp}(a_n, \Beta, R\Delta_n), we will look for u_n starting from u_{n+1} = \texttt{lerp}(u_n, \Beta, R\Delta) with u_0=a_0=\Alpha.

Since I'm lazy and incompetent, we are just going to ask WolframAlpha for help finding the solution to the recursive sequence. But to feed its input we need to simplify the terms a bit:

    u_{n+1} = P u_n + Q

...with P=(1-R\Delta) and Q=\Beta R\Delta. We do that so we have a familiar ax+b linear form. According to WolframAlpha this is equivalent to:

    u_n = (\Alpha-\Beta)P^n + \Beta

This is great because we now have the formula according to n, our frame number. We can also express that discrete sequence as a continuous function according to the time t:

    f(t) = (\Alpha-\Beta)P^{tF} + \Beta

Expanding our temporary P and Q placeholders with their values and unrolling, we get:

    f(t) = \texttt{lerp}(\Alpha, \Beta, 1-(1-R\Delta)^{tF})

This function perfectly matches the initial lerp() sequence in the hypothetical situation where the frame rate is honored. Basically, it's what the sequence a_{n+1} was meant to emulate at a given frame rate F. Note: we swapped the first 2 terms of lerp() at the last step because it makes more sense semantically to go from \Alpha to \Beta.
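As a quick hedged sanity check of that closed form (my own snippet with arbitrary constants, not from the article), we can compare the iterated lerp() sequence against f(t) when the frame rate is honored; the two columns should match up to float rounding:

    #include <math.h>
    #include <stdio.h>

    static float lerp(float a, float b, float x) { return a + x * (b - a); }

    int main(void)
    {
        const float A = 100.0f, B = 0.0f;   /* initial and target values */
        const float R = 3.5f, F = 60.0f;    /* rate and frame rate */
        const float delta = 1.0f / F;

        float a = A;
        for (int n = 1; n <= 10; n++) {
            a = lerp(a, B, R * delta);      /* iterated sequence a_{n+1} */
            const float t = n / F;          /* elapsed time */
            const float f = lerp(A, B, 1.0f - powf(1.0f - R*delta, t * F));
            printf("n=%2d  sequence=%f  closed form=%f\n", n, a, f);
        }
        return 0;
    }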
Let's again summarize what we have and what we want: we're in the game main loop and we want our running value to stick to that f(t) function.

We have:

- v=f(t): the value previously computed (t is the running duration so far, but we don't have it); in the original sequence this is known as a_n
- \Delta_n: the delta time for the current frame

We are looking for a function \Eta(v,\Delta_n) which defines the position of a new point on the curve, knowing only v and \Delta_n. It's a "time agnostic" version of f(t). Basically, it is defined as \Eta(v,\Delta_n)=f(t+\Delta_n), but since we don't have t it's not very helpful. That being said, while we don't have t, we do have f(t) (the previous value v). Looking at the curve, we know the y-value of the previous point, and we know the difference between the new point and the previous point on the x-axis:

Previous and current point in time

If we want t (the total time elapsed at the previous point), we need the inverse function f^{-1}. Indeed, t = f^{-1}(f(t)): taking the inverse of a function gives back the input. We know f so we can invert it, relying on WolframAlpha again (what a blessing this website is):

    f^{-1}(v) = \frac{\ln\frac{v-\Beta}{\Alpha-\Beta}}{F\ln(1-R/F)}

Note: \ln stands for the natural logarithm, sometimes also called \log. Careful though, on Desmos for example \log is in base 10, not base e (while its \exp is in base e for some reason).

This complex formula may feel a bit intimidating, but thanks to it we can now express \Eta using only its two parameters. The raw substitution is pretty messy, but we can simplify it down to something much simpler. An interesting property that is going to be helpful here is m^n = e^{n \ln m}. For my fellow programmers getting tense here: pow(m, n) == exp(n * log(m)). Similarly, e^{\ln x} = x (exp(log(x)) == x), which gives us an idea of where this is going:

    \Eta(v,\Delta_n) = \texttt{lerp}(v, \Beta, 1-e^{F\ln(1-R/F)\,\Delta_n})

Again we swapped the first 2 arguments of lerp at the last step, at the cost of an additional subtraction: this is more readable because \Beta is our destination point. Rewriting this in a sequence notation, we get:

    u_{n+1} = \texttt{lerp}(u_n, \Beta, 1-e^{F\ln(1-R/F)\,\Delta_n})

We still have this annoying F\ln(1-R/F) bit in the formula, but we can take it out and precompute it because it is constant: it's our rate conversion formula:

    R' = F\ln(1-R/F)

We're going to make one extra adjustment: R' is negative, which is not exactly intuitive to work with as a user (in case it is defined arbitrarily and not through the conversion formula), so we make a sign swap for convenience:

    R' = -F\ln(1-R/F)
    u_{n+1} = \texttt{lerp}(u_n, \Beta, 1-e^{-R'\Delta_n})

The conversion formula is optional; it's only needed to port previously broken code to the new formula. One interesting thing here is that R' is fairly close to R when R is small. For example, a rate factor R=5 at 60 FPS gives us R' \approx 5.22. This means that if the rate factors weren't closely tuned, it is probably acceptable to go with R'=R and not bother with any conversion. Still, having that formula can be useful to update all the decay constants and check that everything still works as expected.

Also, notice how if the delta gets very large, -R'\Delta_n goes toward -\infty, e^{-R'\Delta_n} toward 0, 1-e^{-R'\Delta_n} toward 1, and so the interpolation is going to reach our final target \Beta without overshooting. This means the formula doesn't need any extra care with regard to the 3rd issue we pointed out earlier.

Looking at the previous curves but now with the new formula and an adjusted rate:

Comparison of the curves at different frame rates with the new formula
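To tie the conversion formula and the new update rule together, here is a small hedged C sketch of my own (arbitrary constants, not from the article): it converts a legacy rate with R' = -F·ln(1-R/F) and checks that the corrected update lands on the same value regardless of the frame rate:

    #include <math.h>
    #include <stdio.h>

    static float lerp(float a, float b, float x) { return a + x * (b - a); }

    /* Rate conversion: R' = -F.ln(1 - R/F) */
    static float convert_rate(float rate, float fps)
    {
        return -fps * logf(1.0f - rate / fps);
    }

    /* Run the corrected update for 1 simulated second at a given frame rate. */
    static float simulate(float fps, float rate2)
    {
        const float delta = 1.0f / fps;
        float value = 100.0f; /* A = 100, B = 0 */
        for (int i = 0; i < (int)fps; i++)
            value = lerp(value, 0.0f, 1.0f - expf(-delta * rate2));
        return value;
    }

    int main(void)
    {
        const float rate2 = convert_rate(5.0f, 60.0f); /* ~5.22, as in the text */
        /* Same duration, different frame rates, (virtually) identical results */
        printf("R' = %f\n", rate2);
        printf("30 FPS:  %f\n", simulate(30.0f, rate2));
        printf("60 FPS:  %f\n", simulate(60.0f, rate2));
        printf("144 FPS: %f\n", simulate(144.0f, rate2));
        return 0;
    }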
Conclusion

So there we have it: the perfect formula, frame rate agnostic ✅, deterministic ✅ and resilient to overshooting ✅.

If you've quickly skimmed through the maths, here is what you need to know:

    a = lerp(a, B, delta * RATE)

Should be changed to:

    a = lerp(a, B, 1.0 - exp(-delta * RATE2))

With the precomputed RATE2 = -FPS * log(1 - RATE/FPS) (where log is the natural logarithm), or simply using RATE2 = RATE as a rough equivalent. Also, any existing overshooting clamping can safely be dropped.

Now please adjust your game to make the world a better and safer place for everyone ♥

Hacking window titles to help OBS

This write-up is meant to present the rationale and technical details behind a tiny project I wrote the other day, WTH, or WindowTitleHack, which is meant to force a constant window name for apps that keep changing it (I'm looking specifically at Firefox and Krita, but there are probably many others). Why tho? I've been streaming on Twitch from Linux (X11) with a barebone OBS Studio setup for a while now, and while most of the experience has been relatively smooth, one particularly striking frustration has been dealing with windows detection. If you don't want to capture the whole desktop for privacy reasons or simply to have control over the scene layout depending on the currently focused app, you need to rely on the Window Capture (XComposite) source. This works mostly fine, and it is actually able to track windows even when their title bar is renamed. But obviously, upon restart it can't find it again because both the window title and the window ID changed, meaning you have to redo your setup by reselecting the window again. It would have been acceptable if that was the only issue I had, but one of the more advanced feature I'm extensively using is the Advanced Scene Switcher (the builtin one, available through the Tools menu). This tool is a basic window title pattern matching system that allows automatic scene switches depending on the current window. Since it doesn't even seem to support regex, it's troublesome to have it reliably work with apps constantly changing their title (and even if it had, there is no guarantee that the app would leave a recognizable matchable pattern in its title). Hacking Windows One unreliable hack would be to spam xdotool commands to correct the window title. This could be a resource hog, and it would create quite a bunch of races. One slight improvement over this would be to use xprop -spy, but that wouldn't address the race conditions (since we would adjust the title after it's been already changed). So how do we deal with that properly? Well, on X11 with the reference library (Xlib) there are actually various (actually a lot of) ways of changing the title bar. It took me a while to identify which call(s) to target, but ended up with the following call graph, where each function is actually exposed publicly: From this we can easily see that we only need to hook the deepest function XChangeProperty, and check if the property is XA_WM_NAME (or its "modern" sibling, _NET_WM_NAME). How do we do that? With the help of the LD_PRELOAD environment variable and a dynamic library that implements a custom XChangeProperty. First, we grab the original function: #include <dlfcn.h> /* A type matching the prototype of the target function */ typedef int (*XChangeProperty_func_type)( Display *display, Window w, Atom property, Atom type, int format, int mode, const unsigned char *data, int nelements ); /* [...] */ XChangeProperty_func_type XChangeProperty_orig = dlsym(RTLD_NEXT, "XChangeProperty"); We also need to craft a custom _NET_WM_NAME atom: _NET_WM_NAME = XInternAtom(display, "_NET_WM_NAME", 0); With this we are now able to intercept all the WM_NAME events and override them with our own: if (property == XA_WM_NAME || property == _NET_WM_NAME) { data = (const unsigned char *)new_title; nelements = (int)strlen(new_title); } return XChangeProperty_orig(display, w, property, type, format, mode, data, nelements); We wrap all of this into our own redefinition of XChangeProperty and… that's pretty much it. 
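Putting the fragments above together, a minimal hedged sketch of the hook could look like the following. It is my own simplified assembly, not the actual WTH source: the caching and locking discussed next are omitted, and the original function and atom are (wastefully) looked up on every call:

    #define _GNU_SOURCE /* for RTLD_NEXT */
    #include <dlfcn.h>
    #include <stdlib.h>
    #include <string.h>
    #include <X11/Xlib.h>
    #include <X11/Xatom.h>

    typedef int (*XChangeProperty_func_type)(Display *display, Window w,
                                             Atom property, Atom type,
                                             int format, int mode,
                                             const unsigned char *data,
                                             int nelements);

    /* Our XChangeProperty: rewrite WM_NAME/_NET_WM_NAME, forward the rest */
    int XChangeProperty(Display *display, Window w, Atom property, Atom type,
                        int format, int mode, const unsigned char *data,
                        int nelements)
    {
        XChangeProperty_func_type orig = dlsym(RTLD_NEXT, "XChangeProperty");
        const Atom net_wm_name = XInternAtom(display, "_NET_WM_NAME", 0);
        const char *new_title = getenv("WTH_TITLE");

        if (new_title && (property == XA_WM_NAME || property == net_wm_name)) {
            data = (const unsigned char *)new_title;
            nelements = (int)strlen(new_title);
        }
        return orig(display, w, property, type, format, mode, data, nelements);
    }

Built as a shared library and injected with LD_PRELOAD, this is essentially the whole trick; the real project just adds the caching, the once-guard and the xcb variant described below.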
Now due to a long history of development, Xlib has been "deprecated" and superseded by libxcb. Both are widely used, but fortunately the APIs are more or less similar. The function to hook is xcb_change_property, and defining _NET_WM_NAME is slightly more cumbered but not exactly challenging: const xcb_intern_atom_cookie_t cookie = xcb_intern_atom(conn, 0, strlen("_NET_WM_NAME"), "_NET_WM_NAME"); xcb_intern_atom_reply_t *reply = xcb_intern_atom_reply(conn, cookie, NULL); if (reply) _NET_WM_NAME = reply->atom; free(reply); Aside from that, the code is pretty much the same. Configuration To pass down the custom title to override, I've been relying on an environment variable WTH_TITLE. From a user point of view, it looks like this: LD_PRELOAD="builddir/libwth.so" WTH_TITLE="Krita4ever" krita We could probably improve the usability by creating a wrapping tool (so that we could have something such as ./wth --title=Krita4ever krita). Unfortunately I wasn't yet able to make a self-referencing executable accepted by LD_PRELOAD, so for now the manual LD_PRELOAD and WTH_TITLE environment will do just fine. Thread safety To avoid a bunch of redundant function roundtrips we need to globally cache a few things: the new title (to avoid fetching it in the environment all the time), the original functions (to save the dlsym call), and _NET_WM_NAME. Those are loaded lazily at the first function call, but we have no guarantee with regards to concurrent calls on that hooked function so we must create our own lock. I initially though about using pthread_once but unfortunately the initialization callback mechanism doesn't allow any custom argument. Again, this is merely a slight annoyance since we can implement our own in a few lines of code: /* The "once" API is similar to pthread_once but allows a custom function argument */ struct wth_once { pthread_mutex_t lock; int initialized; }; #define WTH_ONCE_INITIALIZER {.lock=PTHREAD_MUTEX_INITIALIZER} typedef void (*init_func_type)(void *user_arg); void wth_init_once(struct wth_once *once, init_func_type init_func, void *user_arg) { pthread_mutex_lock(&once->lock); if (!once->initialized) { init_func(user_arg); once->initialized = 1; } pthread_mutex_unlock(&once->lock); } Which we use like this: static struct wth_once once = WTH_ONCE_INITIALIZER; static void init_once(void *user_arg) { Display *display = user_arg; /* [...] */ } /* [...] */ wth_init_once(&once, init_once, display); The End? I've been delaying doing this project for weeks because it felt complex at first glance, but it actually just took me a few hours. Probably the same amount of time it took me to write this article. While the project is admittedly really small, it still feel like a nice accomplishment. I hope it's useful to other people. Now, the Wayland support is probably the most obvious improvement the project can receive, but I don't have such a setup locally to test yet, so this is postponed for an undetermined amount of time. The code is released with a permissive license (MIT); if you want to contribute you can open a pull request but getting in touch with me first is appreciated to avoid unnecessary and overlapping efforts.
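For completeness, here is a hedged sketch of what the libxcb counterpart described earlier might look like. Again this is my own simplified assembly rather than the project's code: the _NET_WM_NAME lookup through xcb_intern_atom (shown above) and all caching/locking are elided, and only the classic WM_NAME atom is checked:

    #define _GNU_SOURCE /* for RTLD_NEXT */
    #include <dlfcn.h>
    #include <stdlib.h>
    #include <string.h>
    #include <xcb/xcb.h>
    #include <xcb/xproto.h>

    typedef xcb_void_cookie_t (*xcb_change_property_func_type)(
        xcb_connection_t *conn, uint8_t mode, xcb_window_t window,
        xcb_atom_t property, xcb_atom_t type, uint8_t format,
        uint32_t data_len, const void *data);

    /* Our xcb_change_property: rewrite the window title, forward the rest */
    xcb_void_cookie_t xcb_change_property(xcb_connection_t *conn, uint8_t mode,
                                          xcb_window_t window,
                                          xcb_atom_t property, xcb_atom_t type,
                                          uint8_t format, uint32_t data_len,
                                          const void *data)
    {
        xcb_change_property_func_type orig =
            dlsym(RTLD_NEXT, "xcb_change_property");
        const char *new_title = getenv("WTH_TITLE");

        /* _NET_WM_NAME handling via xcb_intern_atom is elided here */
        if (new_title && property == XCB_ATOM_WM_NAME) {
            data = new_title;
            data_len = (uint32_t)strlen(new_title);
        }
        return orig(conn, mode, window, property, type, format, data_len, data);
    }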

Improving color quantization heuristics

In 2015, I wrote an article about how the palette color quantization was improved in FFmpeg in order to make nice animated GIF files. For some reason, to this day this is one of my most popular article. As time passed, my experience with colors grew and I ended up being quite ashamed and frustrated with the state of these filters. A lot of the code was naive (when not terribly wrong), despite the apparent good results. One of the major change I wanted to do was to evaluate the color distances using a perceptually uniform colorspace, instead of using a naive euclidean distance of RGB triplets. As usual it felt like a week-end long project; after all, all I have to do is change the distance function to work in a different space, right? Well, if you're following my blog you might have noticed I've add numerous adventures that stacked up on each others: I had to work out the colorspace with integer arithmetic first ...which forced me to look into integer division more deeply ...which confronted me to all sort of undefined behaviours in the process And when I finally reached the point where I could make the switch to OkLab (the perceptual colorspace), a few experiments showed that the flavour of the core algorithm I was using might contain some fundamental flaws, or at least was not implementing optimal heuristics. So here we go again, quickly enough I find myself starting a new research study in the pursuit of understanding how to put pixels on the screen. This write-up is the story of yet another self-inflicted struggle. Palette quantization But what is palette quantization? It essentially refers to the process of reducing the number of available colors of an image down to a smaller subset. In sRGB, an image can have up to 16.7 million colors. In practice though it's generally much less, to the surprise of no one. Still, it's not rare to have a few hundreds of thousands different colors in a single picture. Our goal is to reduce that to something like 256 colors that represent them best, and use these colors to create a new picture. Why you may ask? There are multiple reasons, here are some: Improve size compression (this is a lossy operation of course, and using dithering on top might actually defeat the original purpose) Some codecs might not support anything else than limited palettes (GIF or subtitles codecs are examples) Various artistic purposes Following is an example of a picture quantized at different levels: Original (26125 colors) Quantized to 8bpp (256 colors) Quantized to 2bpp (4 colors) This color quantization process can be roughly summarized in a 4-steps based process: Sample the input image: we build an histogram of all the colors in the picture (basically a simple statistical analysis) Design a colormap: we build the palette through various means using the histograms Create a pixel mapping which associates a color (one that can be found in the input image) with another (one that can be found in the newly created palette) Image quantizing: we use the color mapping to build our new image. This step may also involve some dithering. The study here will focus on step 2 (which itself relies on step 1). Colormap design algorithms A palette is simply a set of colors. It can be represented in various ways, for example here in 2D and 3D: To generate such a palette, all sort of algorithms exists. 
They are usually classified into 2 large categories: Dividing/splitting algorithms (such as Median-Cut and its various flavors) Clustering algorithms (such as K-means, maximin distance, (E)LBG or pairwise clustering) The former are faster but non-optimal while the latter are slower but better. The problem is NP-complete, meaning it's possible to find the optimal solution but it can be extremely costly. On the other hand, it's possible to find "local optimums" at minimal cost. Since I'm working within FFmpeg, speed has always been a priority. This was the reason that motivated me to initially implement the Median-Cut over a more expensive algorithm. The rough picture of the algorithm is relatively easy to grasp. Assuming we want a palette of K colors: A set S of all the colors in the input picture is constructed, along with a respective set W of the weight of each color (how much they appear) Since the colors are expressed as RGB triplets, they can be encapsulated in one big cuboid, or box The box is cut in two along one of the axis (R, G or B) on the median (hence the name of the algorithm) If we don't have a total K boxes yet, pick one of them and go back to previous step All the colors in each of the K boxes are then averaged to form the color palette entries Here is how the process looks like visually: Median-Cut algorithm targeting 16 boxes You may have spotted in this video that the colors are not expressed in RGB but in Lab: this is because instead of representing the colors in a traditional RGB colorspace, we are instead using the OkLab colorspace which has the property of being perceptually uniform. It doesn't really change the Median Cut algorithm, but it definitely has an impact on the resulting palette. One striking limitation of this algorithm is that we are working exclusively with cuboids: the cuts are limited to an axis, we are not cutting along an arbitrary plane or a more complex shape. Think of it like working with voxels instead of more free-form geometries. The main benefit is that the algorithm is pretty simple to implement. Now the description provided earlier conveniently avoided describing two important aspects happening in step 3 and 4: How do we choose the next box to split? How do we choose along which axis of the box we make the cut? I pondered about that for a quite a long time. An overview of the possible heuristics In bulk, some of the heuristics I started thinking of: should we take the box that has the longest axis across all boxes? should we take the box that has the largest volume? should we take the box that has the biggest Mean Squared Error when compared to its average color? should we take the box that has the axis with the biggest MSE? assuming we choose to go with the MSE, should it be normalized across all boxes? should we even account for the weight of each color or consider them equal? what about the axis? Is it better to pick the longest or the one with the higher MSE? I tried to formalize these questions mathematically to the best of my limited abilities. So let's start by saying that all the colors c of given box are stored in a N×M 2D-array following the matrix notation: L₁L₂L₃…Lₘ a₁a₂a₃…aₘ b₁b₂b₃…bₘ N is the number of components (3 in our case, whether it's RGB or Lab), and M the number of colors in that box. You can visualize this as a list of vectors as well, where c_{i,j} is the color at row i and column j. 
With that in mind we can sketch the following diagram representing the tree of heuristic possibilities to implement: Mathematicians are going to kill me for doodling random notes all over this perfectly understandable symbols gibberish, but I believe this is required for the human beings reading this article. In summary, we end up with a total of 24 combinations to try out: 2 axis selection heuristics: cut the axis with the maximum error squared cut the axis with the maximum length 3 operators: maximum measurement out of all the channels product of the measurements of all the channels sum of the measurements of all the channels 4 measurements: error squared, honoring weights error squared, not honoring weights error squared, honoring weights, normalized length of the axis If we start to intuitively think about which ones are likely going to perform the best, we quickly realize that we haven't actually formalized what we are trying to achieve. Such a rookie mistake. Clarifying this will help us getting a better feeling about the likely outcome. I chose to target an output that minimizes the MSE against the reference image, in a perceptual way. Said differently, trying to make the perceptual distance between an input and output color pixel as minimal as possible. This is an arbitrary and debatable target, but it's relatively simple and objective to evaluate if we have faith in the selected perceptual model. Another appropriate metric could have been to find the ideal palette through another algorithm, and compare against that instead. Doing that unfortunately implied that I would trust that other algorithm, its implementation, and that I have enough computing power. So to summarize, we want to minimize the MSE between the input and output, evaluated in the OkLab colorspace. This can be expressed with the following formula: Where: P is a partition (which we constrain to a box in our implementation) C the set of colors in the partition P w the weight of a color c a single color µ the average color of the set C Special thanks to criver for helping me a ton on the math area, this last formula is from them. Looking at the formula, we can see how similar it is to certain branches of the heuristics tree, so we can start getting an intuition about the result of the experiment. Experiment language Short deviation from the main topic (feel free to skip to the next section): working in C within FFmpeg quickly became a hurdle more than anything. Aside from the lack of flexibility, the implicit casts destroying the precision deceitfully, and the undefined behaviours, all kind of C quirks went in the way several times, which made me question my sanity. This one typically severly messed me up while trying to average the colors: #include <stdio.h> #include <stdint.h> int main (void) { const int32_t x = -30; const uint32_t y = 10; const uint32_t a = 30; const int32_t b = -10; printf("%d×%u=%d\n", x, y, x * y); printf("%u×%d=%d\n", a, b, a * b); printf("%d/%u=%d\n", x, y, x / y); printf("%u/%d=%d\n", a, b, a / b); return 0; } % cc -Wall -Wextra -fsanitize=undefined test.c -o test && ./test -30×10=-300 30×-10=-300 -30/10=429496726 30/-10=0 Anyway, I know this is obvious but if you aren't already doing that I suggest you build your experiments in another language, Python or whatever, and rewrite them in C later when you figured out your expected output. Re-implementing what I needed in Python didn't take me long. It was, and still is obviously much slower at runtime, but that's fine. 
There is a lot of room for speed improvement, typically by relying on numpy (which I didn't bother with). Experiment results I created a research repository for the occasion. The code to reproduce and the results can be found in the color quantization README. In short, based on the results, we can conclude that: Overall, the box that has the axis with the largest non-normalized weighted sum of squared error is the best candidate in the box selection algorithm Overall, cutting the axis with the largest weighted sum of squared error is the best axis cut selection algorithm To my surprise, normalizing the weights per box is not a good idea. I initially observed that by trial and error, which was actually one of the main motivator for this research. I initially thought normalizing each box was necessary in order to compare them against each others (such that they are compared on a common ground). My loose explanation of the phenomenon was that not normalizing causes a bias towards boxes with many colors, but that's actually exactly what we want. I believe it can also be explained by our evaluation function: we want to minimize the error across the whole set of colors, so small partitions (in color counts) must not be made stronger. At least not in the context of the target we chose. It's also interesting to see how the max() seems to perform better than the sum() of the variance of each component most of the time. Admittedly my picture samples set is not that big, which may imply that more experiments to confirm that tendency are required. In retrospective, this might have been quickly predictable to someone with a mathematical background. But since I don't have that, nor do I trust my abstract thinking much, I'm kind of forced to try things out often. This is likely one of the many instances where I spent way too much energy on something obvious from the beginning, but I have the hope it will actually provide some useful information for other lost souls out there. Known limitations There are two main limitations I want to discuss before closing this article. The first one is related to minimizing the MSE even more. K-means refinement We know the Median-Cut actually provides a rough estimate of the optimal palette. One thing we could do is use it as a first step before refinement, for example by running a few K-means iterations as post-processing (how much refinement/iterations could be a user control). The general idea of K-means is to progressively move each colors individually to a more appropriate box, that is a box for which the color distance to the average color of that box is smaller. I started implementing that in a very naive way, so it's extremely slow, but that's something to investigate further because it definitely improves the results. Most of the academic literature seems to suggest the use of the K-means clustering, but all of them require some startup step. Some come up with various heuristics, some use PCA, but I've yet to see one that rely on Median-Cut as first pass; maybe that's not such a good idea, but who knows. Bias toward perceived lightness Another more annoying problem for which I have no solution for is with regards to the human perception being much more sensitive to light changes than hue. If you look at the first demo with the parrot, you may have observed the boxes are kind of thin. This is because the a and b components (respectively how green/red and blue/yellow the color is) have a much smaller amplitude compared to the L (perceived lightness). 
Here is a side by side comparison of the spread of colors between a stretched and normalized view: You may rightfully question whether this is a problem or not. In practice, this means that when K is low (let's say smaller than 8 or even 16), cuts along L will almost always be preferred, causing the picture to be heavily desaturated. This is because it tries to preserve the most significant attribute in human perception: the lightness. That particular picture is actually a pathological study case: 4 colors 8 colors 12 colors 16 colors We can see the hue timidly appearing around K=16 (specifically it starts being more strongly noticeable starting the cut K=13). Conclusion For now, I'm mostly done with this "week-end long project" into which I actually poured 2 or 3 months of lifetime. The FFmpeg patchset will likely be upstreamed soon so everyone should hopefully be able to benefit from it in the next release. It will also come with additional dithering methods, which implementation actually was a relaxing distraction from all this hardship. There are still many ways of improving this work, but it's the end of the line for me, so I'll trust the Internet with it.

Porting OkLab colorspace to integer arithmetic

For reasons I'll explain in a future write-up, I needed to make use of a perceptually uniform colorspace in some computer vision code. OkLab from Björn Ottosson was a great candidate given how simple the implementation is. But there is a plot twist: I needed the code to be deterministic for the tests to be portable across a large variety of architectures, systems and configurations.

Several solutions were offered to me, including reworking the test framework to support a difference mechanism with a threshold, but having done that in another project I can confidently say that it's not trivial (when not downright impossible in certain cases). Another approach would have been to hardcode the libc math functions, but even then I wasn't confident the determinism of the floating point arithmetic would be guaranteed in all cases. So I ended up choosing to port the code to integer arithmetic. I'm sure many people would disagree with that approach, but:

- code determinism is guaranteed
- not all FPUs are that efficient, typically on embedded
- it can now be used in the kernel; while this is far-fetched for OkLab (though maybe someone needs some color management in v4l2 or something), sRGB transforms might have their use cases
- it's a learning experience which can be re-used in other circumstances
- working on the integer arithmetic versions unlocked various optimizations for the normal case

Note: I'm following Björn Ottosson's will to have the OkLab code in the public domain as well as under MIT license, so this "dual licensing" applies to all the code presented in this article.

Warning: the integer arithmetic in this write-up can only work if your language behaves the same as C99 (or more recent) with regard to integer division. See this previous article on integer division for more information.

Quick summary of uniform colorspaces

For those unfamiliar with color management, one of the main benefits of a uniform colorspace like OkLab is that the euclidean distance between two colors is directly correlated with the human perception of these colors. More concretely, if we want to evaluate the distance between the RGB triplets (R₀,G₀,B₀) and (R₁,G₁,B₁), one may naively compute the euclidean distance √((R₀-R₁)²+(G₀-G₁)²+(B₀-B₁)²). Unfortunately, even if the RGB is gamma expanded into linear values, the computed distance will actually be pretty far from reflecting how the human eye perceives this difference. It typically isn't going to be consistent when applied to another pair of colors.

With OkLab (and many other uniform colorspaces), the colors are also identified with 3D coordinates, but instead of (R,G,B) we call them (L,a,b) (which is an entirely different 3D space). In that space, √((L₀-L₁)²+(a₀-a₁)²+(b₀-b₁)²) (called ΔE, or Delta-E) is expected to be aligned with human perception of color differences. Of course, this is just one model, and it doesn't take into account many parameters. For instance, the perception of a color depends a lot on the surrounding colors. Still, these models are much better than working with RGB triplets, which don't make much sense visually speaking.

Reference code / diagram

In this study case, we will be focusing on the transform that goes from sRGB to OkLab, and back again into sRGB. Only the first part is interesting if we want the color distance, but sometimes we also want to alter a color uniformly and thus we need the 2nd part as well to reconstruct an sRGB color from it.
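Since that color distance is the whole motivation, here is a tiny hedged sketch of my own of what the ΔE evaluation looks like in code (the struct Lab layout matches the reference code that follows):

    #include <math.h>

    struct Lab { float L, a, b; };

    /* Perceptual distance (ΔE) between two OkLab colors */
    static float delta_e(struct Lab c0, struct Lab c1)
    {
        const float dL = c0.L - c1.L;
        const float da = c0.a - c1.a;
        const float db = c0.b - c1.b;
        return sqrtf(dL*dL + da*da + db*db);
    }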
We are only considering the sRGB input and output for simplicity, which means we will be inlining the sRGB color transfer in the pipeline. If you're not familiar with gamma compression, there are many resources about it on the Internet which you may want to look into first. Here is a diagram of the complete pipeline: And the corresponding code (of the 4 circles in the diagram) we will be porting: struct Lab { float L, a, b; } uint8_t linear_f32_to_srgb_u8(float x) { if (x <= 0.0) { return 0; } else if (x >= 1.0) { return 0xff; } else { const float v = x < 0.0031308f ? x*12.92f : 1.055f*powf(x, 1.f/2.4f) - 0.055f; return lrintf(v * 255.f); } } float srgb_u8_to_linear_f32(uint8_t x) { const float v = x / 255.f; return v < 0.04045f ? v/12.92f : powf((v+0.055f)/1.055f, 2.4f); } struct Lab srgb_u8_to_oklab_f32(uint32_t srgb) { const float r = srgb_u8_to_linear_f32(srgb >> 16 & 0xff); const float g = srgb_u8_to_linear_f32(srgb >> 8 & 0xff); const float b = srgb_u8_to_linear_f32(srgb & 0xff); const float l = 0.4122214708f * r + 0.5363325363f * g + 0.0514459929f * b; const float m = 0.2119034982f * r + 0.6806995451f * g + 0.1073969566f * b; const float s = 0.0883024619f * r + 0.2817188376f * g + 0.6299787005f * b; const float l_ = cbrtf(l); const float m_ = cbrtf(m); const float s_ = cbrtf(s); const struct Lab ret = { .L = 0.2104542553f * l_ + 0.7936177850f * m_ - 0.0040720468f * s_, .a = 1.9779984951f * l_ - 2.4285922050f * m_ + 0.4505937099f * s_, .b = 0.0259040371f * l_ + 0.7827717662f * m_ - 0.8086757660f * s_, }; return ret; } uint32_t oklab_f32_to_srgb_u8(struct Lab c) { const float l_ = c.L + 0.3963377774f * c.a + 0.2158037573f * c.b; const float m_ = c.L - 0.1055613458f * c.a - 0.0638541728f * c.b; const float s_ = c.L - 0.0894841775f * c.a - 1.2914855480f * c.b; const float l = l_*l_*l_; const float m = m_*m_*m_; const float s = s_*s_*s_; const uint8_t r = linear_f32_to_srgb_u8(+4.0767416621f * l - 3.3077115913f * m + 0.2309699292f * s); const uint8_t g = linear_f32_to_srgb_u8(-1.2684380046f * l + 2.6097574011f * m - 0.3413193965f * s); const uint8_t b = linear_f32_to_srgb_u8(-0.0041960863f * l - 0.7034186147f * m + 1.7076147010f * s); return r<<16 | g<<8 | b; } sRGB to Linear The first step is converting the sRGB color to linear values. That sRGB transfer function can be intimidating, but it's pretty much a simple power function: The input is 8-bit ([0x00;0xff] for each of the 3 channels) which means we can use a simple 256 values lookup table containing the precomputed resulting linear values. Note that we can already do that with the reference code with a table remapping the 8-bit index into a float value. For our integer version we need to pick an arbitrary precision for the linear representation. 8-bit is not going to be enough precision, so we're going to pick the next power of two to be space efficient: 16-bit. We will be using the constant K=(1<<16)-1=0xffff to refer to this scale. Alternatively we could rely on a fixed point mapping (an integer for the decimal part and another integer for the fractional part), but in our case pretty much everything is normalized so the decimal part doesn't really matter. /** * Table mapping formula: * f(x) = x < 0.04045 ? x/12.92 : ((x+0.055)/1.055)^2.4 (sRGB EOTF) * Where x is the normalized index in the table and f(x) the value in the table. * f(x) is remapped to [0;K] and rounded. 
*/ static const uint16_t srgb2linear[256] = { 0x0000, 0x0014, 0x0028, 0x003c, 0x0050, 0x0063, 0x0077, 0x008b, 0x009f, 0x00b3, 0x00c7, 0x00db, 0x00f1, 0x0108, 0x0120, 0x0139, 0x0154, 0x016f, 0x018c, 0x01ab, 0x01ca, 0x01eb, 0x020e, 0x0232, 0x0257, 0x027d, 0x02a5, 0x02ce, 0x02f9, 0x0325, 0x0353, 0x0382, 0x03b3, 0x03e5, 0x0418, 0x044d, 0x0484, 0x04bc, 0x04f6, 0x0532, 0x056f, 0x05ad, 0x05ed, 0x062f, 0x0673, 0x06b8, 0x06fe, 0x0747, 0x0791, 0x07dd, 0x082a, 0x087a, 0x08ca, 0x091d, 0x0972, 0x09c8, 0x0a20, 0x0a79, 0x0ad5, 0x0b32, 0x0b91, 0x0bf2, 0x0c55, 0x0cba, 0x0d20, 0x0d88, 0x0df2, 0x0e5e, 0x0ecc, 0x0f3c, 0x0fae, 0x1021, 0x1097, 0x110e, 0x1188, 0x1203, 0x1280, 0x1300, 0x1381, 0x1404, 0x1489, 0x1510, 0x159a, 0x1625, 0x16b2, 0x1741, 0x17d3, 0x1866, 0x18fb, 0x1993, 0x1a2c, 0x1ac8, 0x1b66, 0x1c06, 0x1ca7, 0x1d4c, 0x1df2, 0x1e9a, 0x1f44, 0x1ff1, 0x20a0, 0x2150, 0x2204, 0x22b9, 0x2370, 0x242a, 0x24e5, 0x25a3, 0x2664, 0x2726, 0x27eb, 0x28b1, 0x297b, 0x2a46, 0x2b14, 0x2be3, 0x2cb6, 0x2d8a, 0x2e61, 0x2f3a, 0x3015, 0x30f2, 0x31d2, 0x32b4, 0x3399, 0x3480, 0x3569, 0x3655, 0x3742, 0x3833, 0x3925, 0x3a1a, 0x3b12, 0x3c0b, 0x3d07, 0x3e06, 0x3f07, 0x400a, 0x4110, 0x4218, 0x4323, 0x4430, 0x453f, 0x4651, 0x4765, 0x487c, 0x4995, 0x4ab1, 0x4bcf, 0x4cf0, 0x4e13, 0x4f39, 0x5061, 0x518c, 0x52b9, 0x53e9, 0x551b, 0x5650, 0x5787, 0x58c1, 0x59fe, 0x5b3d, 0x5c7e, 0x5dc2, 0x5f09, 0x6052, 0x619e, 0x62ed, 0x643e, 0x6591, 0x66e8, 0x6840, 0x699c, 0x6afa, 0x6c5b, 0x6dbe, 0x6f24, 0x708d, 0x71f8, 0x7366, 0x74d7, 0x764a, 0x77c0, 0x7939, 0x7ab4, 0x7c32, 0x7db3, 0x7f37, 0x80bd, 0x8246, 0x83d1, 0x855f, 0x86f0, 0x8884, 0x8a1b, 0x8bb4, 0x8d50, 0x8eef, 0x9090, 0x9235, 0x93dc, 0x9586, 0x9732, 0x98e2, 0x9a94, 0x9c49, 0x9e01, 0x9fbb, 0xa179, 0xa339, 0xa4fc, 0xa6c2, 0xa88b, 0xaa56, 0xac25, 0xadf6, 0xafca, 0xb1a1, 0xb37b, 0xb557, 0xb737, 0xb919, 0xbaff, 0xbce7, 0xbed2, 0xc0c0, 0xc2b1, 0xc4a5, 0xc69c, 0xc895, 0xca92, 0xcc91, 0xce94, 0xd099, 0xd2a1, 0xd4ad, 0xd6bb, 0xd8cc, 0xdae0, 0xdcf7, 0xdf11, 0xe12e, 0xe34e, 0xe571, 0xe797, 0xe9c0, 0xebec, 0xee1b, 0xf04d, 0xf282, 0xf4ba, 0xf6f5, 0xf933, 0xfb74, 0xfdb8, 0xffff, }; int32_t srgb_u8_to_linear_int(uint8_t x) { return (int32_t)srgb2linear[x]; } You may have noticed that we are returning the value in a i32: this is to ease arithmetic operations (preserving the 16-bit unsigned precision would have overflow warping implications when working with the value). Linear to OkLab The OkLab is expressed in a virtually continuous space (floats). If we feed all 16.7 millions sRGB colors to the OkLab transform we get the following ranges in output: min Lab: 0.000000 -0.233887 -0.311528 max Lab: 1.000000 0.276216 0.198570 We observe that L is always positive and neatly within [0;1] while a and b are in a more restricted and signed range. Multiple choices are offered to us with regard to the integer representation we pick. Since we chose 16-bit for the input linear value, it makes sense to preserve that precision for Lab. For the L component, this fits neatly ([0;1] in the ref maps to [0;0xffff] in the integer version), but for the a and b component, not so much. We could pick a signed 16-bit, but that would imply a 15-bit precision for the arithmetic and 1-bit for the sign, which is going to be troublesome: we want to preserve the same precision for L, a and b since the whole point of this operation is to have a uniform space. Instead, I decided to go with 16-bits of precision, with one extra bit for the sign (which will be used for a and b), and thus storing Lab in 3 signed i32. 
Alternatively, we could decide to have a 15-bit precision with an extra bit for the sign by using 3 i16. This should work mostly fine but having the values fit exactly the boundaries of the storage can be problematic in various situations, typically anything that involves boundary checks and overflows. Picking a larger storage simplifies a bunch of things. Looking at srgb_u8_to_oklab_f32 we quickly see that for most of the function it's simple arithmetic, but we have a cube root (cbrt()), so let's study that first. Cube root All the cbrt inputs are driven by this: const float l = 0.4122214708f * r + 0.5363325363f * g + 0.0514459929f * b; const float m = 0.2119034982f * r + 0.6806995451f * g + 0.1073969566f * b; const float s = 0.0883024619f * r + 0.2817188376f * g + 0.6299787005f * b; This might not be obvious at first glance but here l, m and s all are in [0;1] range (the sum of the coefficients of each row is 1), so we will only need to deal with this range in our cbrt implementation. This simplifies greatly the problem! Now, what does it look like? This function is simply the inverse of f(x)=x³, which is a more convenient function to work with. And I have some great news: not long ago, I wrote an article on how to inverse a function, so that's exactly what we are going to do here: inverse f(x)=x³. What we first need though is a good approximation of the curve. A straight line is probably fine but we could try to use some symbolic regression in order to get some sort of rough polynomial approximation. PySR can do that in a few lines of code: import numpy as np from pysr import PySRRegressor # 25 points of ³√x within [0;1] x = np.linspace(0, 1, 25).reshape(-1, 1) y = x ** (1/3) model = PySRRegressor(model_selection="accuracy", binary_operators=["+", "-", "*"], niterations=200) r = model.fit(x, y, variable_names=["x"]) print(r) The output is not deterministic for some reason (which is quite annoying) and the expressions provided usually follows a wonky form. Still, in my run it seemed to take a liking to the following polynomial: u₀ = x³ - 2.19893x² + 2.01593x + 0.219407 (reformatted in a sane polynomial form thanks to WolframAlpha). Note that increasing the number of data points is not really a good idea because we quickly start being confronted to Runge's phenomenon. No need to overthink it, 25 points is just fine. Now we can make a few Newton iterations. For that, we need the derivative of f(uₙ)=uₙ³-x, so f'(uₙ)=3uₙ² and thus the iteration expressions can be obtained easily: uₙ₊₁ = uₙ - (f(uₙ)-v)/f'(uₙ) = uₙ - (uₙ³-v)/(3uₙ²) = (2uₙ³+v)/(3uₙ²) If you don't understand what the hell is going on here, check the article referred to earlier, we're simply following the recipe here. Now I had a look into how most libc compute cbrt, and despite sometimes referring to Newton iterations, they were actually using Halley iterations. So we're going to do the same (not lying, just the Halley part). To get the Halley iteration instead of Newton, we need the first but also the second derivative of f(uₙ)=uₙ³-x (f'(uₙ)=3uₙ² and f"(uₙ)=6uₙ) from which we deduce a relatively simple expression: uₙ₊₁ = uₙ-2f(uₙ)f'(uₙ)/(2f'(uₙ)²-f(uₙ)f"(uₙ)) = uₙ(2x+uₙ³)/(x+2uₙ³) We have everything we need to approximate a cube root of a real between 0 and 1. 
In Python a complete implementation would be as simple as this snippet: b, c, d = -2.19893, 2.01593, 0.219407 def cbrt01(x): # We only support [0;1] if x <= 0: return 0 if x >= 1: return 1 # Initial approximation u = x**3 + b*x**2 + c*x + d # 2 Halley iterations u = u * (2*x+u**3) / (x+2*u**3) u = u * (2*x+u**3) / (x+2*u**3) return u But now we need to scale the floating values up into 16-bit integers. First of all, in the integer version our x is actually in K scale, which means we want to express u according to X=x·K. Similarly, we want to use B=b·K, C=c·K and D=d·K instead of b, c and d because we have no way of expressing the former as integer otherwise. Finally, we're not actually going to compute u₀ but u₀·K because we're preserving the scale through the function. We have: u₀·K = K·(x³ + bx² + cx + d) = K·((x·K)³/K³ + b(x·K)²/K² + c(x·K)/K + d) = K·(X³/K³ + bX²/K² + cX/K + d) = X³·K/K³ + bX²·K/K² + cX·K/K + d·K = X³/K² + BX²/K² + CX/K + D = X³/K² + BX²/K² + CX/K + D = (X³ + BX²)/K² + CX/K + D = ((X³ + BX²)/K + CX)/K + D = (X(X² + BX)/K + CX)/K + D U₀ = (X(X(X + B)/K + CX)/K + D With this we have a relatively cheap expression where the K divisions would still preserve enough precision even if evaluated as integer division. We can do the same for the Halley iteration. I spare you the algebra, the expression u(2x+u³) / (x+2u³) becomes (U(2X+U³/K²)) / (X+2U³/K²). Looking at this expression you may start to worry about overflows, and that would fair since even K² is getting dangerously close to the sun (it's actually already larger than INT32_MAX). For this reason we're going to cheat and simply use 64-bit arithmetic in this function. I believe we could reduce the risk of overflow, but I don't think there is a way to remain in 32-bit without nasty compromises anyway. This is also why in the code below you'll notice the constants are suffixed with LL (to force long-long/64-bit arithmetic). Beware that overflows are a terrible predicament to get into as they will lead to undefined behaviour. Do not underestimate this risk. You might not detect them early enough, and missing them may mislead you when interpreting the results. For this reason, I strongly suggest to always build with -fsanitize=undefined during test and development. I don't do that often, but for this kind of research, I also highly recommend to first write tests that cover all possible integers input (when applicable) so that overflows are detected as soon as possible. Before we write the integer version of our function, we need to address rounding. In the case of the initial approximation I don't think we need to bother, but for our Halley iteration we're going to need as much precision as we can get. Since we know U is positive (remember we're evaluating cbrt(x) where x is in [0;1]), we can use the (a+b/2)/b rounding formula. Our function finally just looks like: #define K2 ((int64_t)K*K) int32_t cbrt01_int(int32_t x) { int64_t u; /* Approximation curve is for the [0;1] range */ if (x <= 0) return 0; if (x >= K) return K; /* Initial approximation: x³ - 2.19893x² + 2.01593x + 0.219407 */ u = x*(x*(x - 144107LL) / K + 132114LL) / K + 14379LL; /* Refine with 2 Halley iterations. */ for (int i = 0; i < 2; i++) { const int64_t u3 = u*u*u; const int64_t den = x + (2*u3 + K2/2) / K2; u = (u * (2*x + (u3 + K2/2) / K2) + den/2) / den; } return u; } Cute, isn't it? If we test the accuracy of this function by calling it for all the possible values we actually get extremely good results. 
Here is a test code: int main(void) { float max_diff = 0; float total_diff = 0; for (int i = 0; i <= K; i++) { const float ref = cbrtf(i / (float)K); const float out = cbrt01_int(i) / (float)K; const float d = fabs(ref - out); if (d > max_diff) max_diff = d; total_diff += d; } printf("max_diff=%f total_diff=%f avg_diff=%f\n", max_diff, total_diff, total_diff / (K + 1)); return 0; } Output: max_diff=0.030831 total_diff=0.816078 avg_diff=0.000012 If we want to trade precision for speed, we could adjust the function to use Newton iterations, and maybe remove the rounding. Back to the core Going back to our sRGB-to-OkLab function, everything should look straightforward to implement now. There is one thing though, while lms computation (at the beginning of the function) is exclusively working with positive values, the output Lab value expression is signed. For this reason we will need a more involved rounded division, so referring again to my last article we will use: static int64_t div_round64(int64_t a, int64_t b) { return (a^b)<0 ? (a-b/2)/b : (a+b/2)/b; } And thus, we have: struct LabInt { int32_t L, a, b; }; struct LabInt srgb_u8_to_oklab_int(uint32_t srgb) { const int32_t r = (int32_t)srgb2linear[srgb >> 16 & 0xff]; const int32_t g = (int32_t)srgb2linear[srgb >> 8 & 0xff]; const int32_t b = (int32_t)srgb2linear[srgb & 0xff]; // Note: lms can actually be slightly over K due to rounded coefficients const int32_t l = (27015LL*r + 35149LL*g + 3372LL*b + K/2) / K; const int32_t m = (13887LL*r + 44610LL*g + 7038LL*b + K/2) / K; const int32_t s = (5787LL*r + 18462LL*g + 41286LL*b + K/2) / K; const int32_t l_ = cbrt01_int(l); const int32_t m_ = cbrt01_int(m); const int32_t s_ = cbrt01_int(s); const struct LabInt ret = { .L = div_round64( 13792LL*l_ + 52010LL*m_ - 267LL*s_, K), .a = div_round64(129628LL*l_ - 159158LL*m_ + 29530LL*s_, K), .b = div_round64( 1698LL*l_ + 51299LL*m_ - 52997LL*s_, K), }; return ret; } The note in this code is here to remind us that we have to saturate lms to a maximum of K (corresponding to 1.0 with floats), which is what we're doing in cbrt01_int(). At this point we can already work within the OkLab space but we're only half-way through the pain. Fortunately, things are going to be easier from now on. OkLab to sRGB Our OkLab-to-sRGB function relies on the Linear-to-sRGB function (at the end), so we're going to deal with it first. Linear to sRGB Contrary to sRGB-to-Linear it's going to be tricky to rely on a table because it would be way too large to hold all possible values (since it would require K entries). I initially considered computing powf(x, 1.f/2.4f) with integer arithmetic somehow, but this is much more involved than how we managed to implement cbrt. So instead I thought about approximating the curve with a bunch of points (stored in a table), and then approximate any intermediate value with a linear interpolation, that is as if the point were joined through small segments. We gave 256 16-bit entries to srgb2linear, so if we were to give as much storage to linear2srgb we could have a table of 512 8-bit entries (our output is 8-bit). Here it is: /** * Table mapping formula: * f(x) = x < 0.0031308 ? x*12.92 : (1.055)*x^(1/2.4)-0.055 (sRGB OETF) * Where x is the normalized index in the table and f(x) the value in the table. * f(x) is remapped to [0;0xff] and rounded. * * Since a 16-bit table is too large, we reduce its precision to 9-bit. 
*/ static const uint8_t linear2srgb[P + 1] = { 0x00, 0x06, 0x0d, 0x12, 0x16, 0x19, 0x1c, 0x1f, 0x22, 0x24, 0x26, 0x28, 0x2a, 0x2c, 0x2e, 0x30, 0x32, 0x33, 0x35, 0x36, 0x38, 0x39, 0x3b, 0x3c, 0x3d, 0x3e, 0x40, 0x41, 0x42, 0x43, 0x45, 0x46, 0x47, 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x56, 0x57, 0x58, 0x59, 0x5a, 0x5b, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, 0x5f, 0x60, 0x61, 0x62, 0x62, 0x63, 0x64, 0x65, 0x65, 0x66, 0x67, 0x67, 0x68, 0x69, 0x6a, 0x6a, 0x6b, 0x6c, 0x6c, 0x6d, 0x6e, 0x6e, 0x6f, 0x6f, 0x70, 0x71, 0x71, 0x72, 0x73, 0x73, 0x74, 0x74, 0x75, 0x76, 0x76, 0x77, 0x77, 0x78, 0x79, 0x79, 0x7a, 0x7a, 0x7b, 0x7b, 0x7c, 0x7d, 0x7d, 0x7e, 0x7e, 0x7f, 0x7f, 0x80, 0x80, 0x81, 0x81, 0x82, 0x82, 0x83, 0x84, 0x84, 0x85, 0x85, 0x86, 0x86, 0x87, 0x87, 0x88, 0x88, 0x89, 0x89, 0x8a, 0x8a, 0x8b, 0x8b, 0x8c, 0x8c, 0x8c, 0x8d, 0x8d, 0x8e, 0x8e, 0x8f, 0x8f, 0x90, 0x90, 0x91, 0x91, 0x92, 0x92, 0x93, 0x93, 0x93, 0x94, 0x94, 0x95, 0x95, 0x96, 0x96, 0x97, 0x97, 0x97, 0x98, 0x98, 0x99, 0x99, 0x9a, 0x9a, 0x9a, 0x9b, 0x9b, 0x9c, 0x9c, 0x9c, 0x9d, 0x9d, 0x9e, 0x9e, 0x9f, 0x9f, 0x9f, 0xa0, 0xa0, 0xa1, 0xa1, 0xa1, 0xa2, 0xa2, 0xa3, 0xa3, 0xa3, 0xa4, 0xa4, 0xa5, 0xa5, 0xa5, 0xa6, 0xa6, 0xa6, 0xa7, 0xa7, 0xa8, 0xa8, 0xa8, 0xa9, 0xa9, 0xa9, 0xaa, 0xaa, 0xab, 0xab, 0xab, 0xac, 0xac, 0xac, 0xad, 0xad, 0xae, 0xae, 0xae, 0xaf, 0xaf, 0xaf, 0xb0, 0xb0, 0xb0, 0xb1, 0xb1, 0xb1, 0xb2, 0xb2, 0xb3, 0xb3, 0xb3, 0xb4, 0xb4, 0xb4, 0xb5, 0xb5, 0xb5, 0xb6, 0xb6, 0xb6, 0xb7, 0xb7, 0xb7, 0xb8, 0xb8, 0xb8, 0xb9, 0xb9, 0xb9, 0xba, 0xba, 0xba, 0xbb, 0xbb, 0xbb, 0xbc, 0xbc, 0xbc, 0xbd, 0xbd, 0xbd, 0xbe, 0xbe, 0xbe, 0xbf, 0xbf, 0xbf, 0xc0, 0xc0, 0xc0, 0xc1, 0xc1, 0xc1, 0xc1, 0xc2, 0xc2, 0xc2, 0xc3, 0xc3, 0xc3, 0xc4, 0xc4, 0xc4, 0xc5, 0xc5, 0xc5, 0xc6, 0xc6, 0xc6, 0xc6, 0xc7, 0xc7, 0xc7, 0xc8, 0xc8, 0xc8, 0xc9, 0xc9, 0xc9, 0xc9, 0xca, 0xca, 0xca, 0xcb, 0xcb, 0xcb, 0xcc, 0xcc, 0xcc, 0xcc, 0xcd, 0xcd, 0xcd, 0xce, 0xce, 0xce, 0xce, 0xcf, 0xcf, 0xcf, 0xd0, 0xd0, 0xd0, 0xd0, 0xd1, 0xd1, 0xd1, 0xd2, 0xd2, 0xd2, 0xd2, 0xd3, 0xd3, 0xd3, 0xd4, 0xd4, 0xd4, 0xd4, 0xd5, 0xd5, 0xd5, 0xd6, 0xd6, 0xd6, 0xd6, 0xd7, 0xd7, 0xd7, 0xd7, 0xd8, 0xd8, 0xd8, 0xd9, 0xd9, 0xd9, 0xd9, 0xda, 0xda, 0xda, 0xda, 0xdb, 0xdb, 0xdb, 0xdc, 0xdc, 0xdc, 0xdc, 0xdd, 0xdd, 0xdd, 0xdd, 0xde, 0xde, 0xde, 0xde, 0xdf, 0xdf, 0xdf, 0xe0, 0xe0, 0xe0, 0xe0, 0xe1, 0xe1, 0xe1, 0xe1, 0xe2, 0xe2, 0xe2, 0xe2, 0xe3, 0xe3, 0xe3, 0xe3, 0xe4, 0xe4, 0xe4, 0xe4, 0xe5, 0xe5, 0xe5, 0xe5, 0xe6, 0xe6, 0xe6, 0xe6, 0xe7, 0xe7, 0xe7, 0xe7, 0xe8, 0xe8, 0xe8, 0xe8, 0xe9, 0xe9, 0xe9, 0xe9, 0xea, 0xea, 0xea, 0xea, 0xeb, 0xeb, 0xeb, 0xeb, 0xec, 0xec, 0xec, 0xec, 0xed, 0xed, 0xed, 0xed, 0xee, 0xee, 0xee, 0xee, 0xef, 0xef, 0xef, 0xef, 0xef, 0xf0, 0xf0, 0xf0, 0xf0, 0xf1, 0xf1, 0xf1, 0xf1, 0xf2, 0xf2, 0xf2, 0xf2, 0xf3, 0xf3, 0xf3, 0xf3, 0xf3, 0xf4, 0xf4, 0xf4, 0xf4, 0xf5, 0xf5, 0xf5, 0xf5, 0xf6, 0xf6, 0xf6, 0xf6, 0xf6, 0xf7, 0xf7, 0xf7, 0xf7, 0xf8, 0xf8, 0xf8, 0xf8, 0xf9, 0xf9, 0xf9, 0xf9, 0xf9, 0xfa, 0xfa, 0xfa, 0xfa, 0xfb, 0xfb, 0xfb, 0xfb, 0xfb, 0xfc, 0xfc, 0xfc, 0xfc, 0xfd, 0xfd, 0xfd, 0xfd, 0xfd, 0xfe, 0xfe, 0xfe, 0xfe, 0xff, 0xff, 0xff, }; Again we're going to start with the floating point version as it's easier to reason with. We have a precision P of 9-bits: P = (1<<9)-1 = 511 = 0x1ff. But for the sake of understanding the math, the following diagram will assume a P of 3 so that we can clearly see the segment divisions: The input of our table is an integer index which needs to be calculated according to our input x. 
But as stated earlier, we need not one but two indices in order to interpolate a point between 2 discrete values from our table. We will refer to these indices as iₚ and iₙ, which can be computed like this:

i  = x·P
iₚ = ⌊i⌋
iₙ = iₚ + 1

(⌊a⌋ means floor(a))

In order to get an approximation of y according to i, we simply need a linear remapping: the ratio of i between iₚ and iₙ is the same ratio as y between yₚ and yₙ. So yet again we're going to rely on the most useful maths formulas: remap(iₚ,iₙ,yₚ,yₙ,i) = mix(yₚ,yₙ,linear(iₚ,iₙ,i)).

The ratio r we're computing as an input to the y-mix can be simplified a bit:

r = linear(iₚ,iₙ,i)
  = (i-iₚ) / (iₙ-iₚ)
  = i-iₚ
  = x·P - ⌊x·P⌋
  = fract(x·P)

So in the end our formula is simply: y = mix(yₚ,yₙ,fract(x·P))

Translated into C we can write it like this:

uint8_t linear_f32_to_srgb_u8_fast(float x)
{
    if (x <= 0.f) {
        return 0;
    } else if (x >= 1.f) {
        return 0xff;
    } else {
        const float i = x * P;
        const int32_t idx = (int32_t)floorf(i);
        const float y0 = linear2srgb[idx];
        const float y1 = linear2srgb[idx + 1];
        const float r = i - idx;
        return lrintf(mix(y0, y1, r));
    }
}

Note: in case you are concerned about idx+1 overflowing, floorf((1.0-FLT_EPSILON)*P) is P-1, so this is safe.

Linear to sRGB, integer version

In the integer version, our function input x is within [0;K], so we need to make a few adjustments. The first issue we have is that with integer arithmetic our i and idx are the same. We have X=x·K as input, so i = idx = X·P/K because we are using an integer division, which in this case is equivalent to the floor() expression in the float version. So while it's a simple and fast way to get yₚ and yₙ, we have an issue figuring out the ratio r.

One tool we have is the modulo operator: the integer division is destructive of the fractional part, but fortunately the modulo (the remainder of the division) gives this information back. It can also be obtained for free most of the time because CPU division instructions tend to provide that modulo as well without extra computation. If we set m = (X·P) % K, we have the fractional part of the division expressed in the K scale, which means we can derive our ratio r from it: r = m / K.

Slipping the K division into our mix() expression, we end up with the following code:

uint8_t linear_int_to_srgb_u8(int32_t x)
{
    if (x <= 0) {
        return 0;
    } else if (x >= K) {
        return 0xff;
    } else {
        const int32_t xP = x * P;
        const int32_t i = xP / K;
        const int32_t m = xP % K;
        const int32_t y0 = linear2srgb[i];
        const int32_t y1 = linear2srgb[i + 1];
        return (m * (y1 - y0) + K/2) / K + y0;
    }
}

Testing this function for all the possible inputs of x, the biggest inaccuracy is an off-by-one, which concerns 6280 of the 65536 possible values (less than 10%): 2886 "off by -1" and 3394 "off by +1". It matches exactly the inaccuracy of the float version of this function, so I think we can be pretty happy with it. Given how good this approach is, we could also consider applying the same strategy for cbrt, so this is left as an exercise to the reader.

Back to the core

We're finally in our last function.
Using everything we've learned so far, it can be trivially converted to integer arithmetic:

uint32_t oklab_int_to_srgb_u8(struct LabInt c)
{
    const int64_t l_ = c.L + div_round64(25974LL * c.a, K) + div_round64( 14143LL * c.b, K);
    const int64_t m_ = c.L + div_round64(-6918LL * c.a, K) + div_round64( -4185LL * c.b, K);
    const int64_t s_ = c.L + div_round64(-5864LL * c.a, K) + div_round64(-84638LL * c.b, K);

    const int32_t l = l_*l_*l_ / K2;
    const int32_t m = m_*m_*m_ / K2;
    const int32_t s = s_*s_*s_ / K2;

    const uint8_t r = linear_int_to_srgb_u8(( 267169LL * l - 216771LL * m +  15137LL * s + K/2) / K);
    const uint8_t g = linear_int_to_srgb_u8(( -83127LL * l + 171030LL * m -  22368LL * s + K/2) / K);
    const uint8_t b = linear_int_to_srgb_u8((   -275LL * l -  46099LL * m + 111909LL * s + K/2) / K);

    return r<<16 | g<<8 | b;
}

Important things to notice:

- we're storing l_, m_ and s_ in 64-bit values so that the following cubes do not overflow
- we're using div_round64 for parts of the expressions of l_, m_ and s_ because they involve signed sub-expressions
- we're using a naive integer division for r, g and b because the values are expected to be positive

Evaluation

We're finally there. In the end the complete code is less than 200 lines of code, and even less for the optimized float version (assuming we don't implement our own cbrt). The complete code, test functions and benchmark tools can be found on GitHub.

Accuracy

Comparing the integer version to the reference float gives us the following results:

- sRGB to OkLab: max_diff=0.000883 total_diff=0.051189
- OkLab to sRGB: max_diff_r=2 max_diff_g=1 max_diff_b=1

I find these results pretty decent for an integer version, but you're free to disagree and improve them.

Speed

The benchmarks are also interesting: on my main workstation (Intel® Core™ i7-12700, glibc 2.36, GCC 12.2.0), the integer arithmetic is slightly slower than the optimized float version:

Command             | Mean [s]      | Min [s] | Max [s] | Relative
--------------------|---------------|---------|---------|------------
Reference           | 1.425 ± 0.008 | 1.414   | 1.439   | 1.59 ± 0.01
Fast float          | 0.897 ± 0.005 | 0.888   | 0.902   | 1.00
Integer arithmetic  | 0.937 ± 0.006 | 0.926   | 0.947   | 1.04 ± 0.01

Observations:

- The FPU is definitely fast in modern CPUs
- Both the integer and optimized float versions are destroying the reference code (note that this is only because of the transfer function optimizations, as the OkLab functions themselves are unchanged in the optimized float version)

On the other hand, on one of my random ARM boards (NanoPI NEO 2 with a Cortex A53, glibc 2.35, GCC 12.1.0), I get different results:

Command             | Mean [s]       | Min [s] | Max [s] | Relative
--------------------|----------------|---------|---------|------------
Reference           | 27.678 ± 0.009 | 27.673  | 27.703  | 2.04 ± 0.00
Fast float          | 15.769 ± 0.001 | 15.767  | 15.772  | 1.16 ± 0.00
Integer arithmetic  | 13.551 ± 0.001 | 13.550  | 13.553  | 1.00

Not that much faster proportionally speaking, but the integer version is still significantly faster overall on such a low-end device.

Conclusion

This took me ages to complete, way longer than I expected, but I'm pretty happy with the end result and with everything I learned in the process. Also, you may have noticed how much I referred to previous work; this has been particularly satisfying from my point of view (re-using previous toolboxes means they were actually useful). This write-up won't be an exception to the rule: in a later article, I will make use of OkLab for another project I've been working on for a while now. See you soon!
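As a quick sanity check of the two conversions, here is a minimal round-trip sketch (a hedged illustration, not code from the article's repository); it assumes the functions, tables and constants defined above are compiled in the same file, and the test color is an arbitrary pick:

#include <stdint.h>
#include <stdio.h>

/* Round-trip sketch: srgb_u8_to_oklab_int(), oklab_int_to_srgb_u8() and
 * their tables/constants are assumed to be defined above in this file. */
int main(void)
{
    const uint32_t srgb_in  = 0x6a9f2c;                   /* arbitrary color */
    const struct LabInt lab = srgb_u8_to_oklab_int(srgb_in);
    const uint32_t srgb_out = oklab_int_to_srgb_u8(lab);

    printf("in=#%06x -> L=%d a=%d b=%d -> out=#%06x\n",
           (unsigned)srgb_in, (int)lab.L, (int)lab.a, (int)lab.b,
           (unsigned)srgb_out);
    return 0;
}

Given the accuracy figures above, the output color should match the input within about ±2 per channel.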

GCC undefined behaviors are getting wild

Happy with my recent breakthrough in understanding C integer divisions after weeks of struggle, I was minding my own business having fun writing integer arithmetic code. Life was good, when suddenly… zsh: segmentation fault (core dumped).

That code wasn't messing with memory much, so it was more likely to be a side effect of an arithmetic overflow or something. Using -fsanitize=undefined quickly identified the issue, which confirmed the presence of an integer overflow. The fix was easy, but something felt off. I was under the impression my code was robust enough against that kind of honest mistake. Turns out, the protecting condition I had in place should indeed have been enough, so I tried to extract a minimal reproducible case:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

uint8_t tab[0x1ff + 1];

uint8_t f(int32_t x)
{
    if (x < 0)
        return 0;
    int32_t i = x * 0x1ff / 0xffff;
    if (i >= 0 && i < sizeof(tab)) {
        printf("tab[%d] looks safe because %d is between [0;%d[\n",
               i, i, (int)sizeof(tab));
        return tab[i];
    }
    return 0;
}

int main(int ac, char **av)
{
    return f(atoi(av[1]));
}

The overflow can happen on x * 0x1ff. Since an integer overflow is undefined, GCC makes the assumption that it cannot happen, ever. In practice in this case it does, but the i >= 0 && i < sizeof(tab) condition should be enough to take care of it, whatever crazy value it becomes, right? Well, I have bad news:

% cc -Wall -O2 overflow.c -o overflow && ./overflow 50000000
tab[62183] looks safe because 62183 is between [0;512[
zsh: segmentation fault (core dumped)  ./overflow 50000000

Note: this is GCC 12.2.0 on x86-64.

We have i=62183 as the result of the overflow, and nevertheless the execution violates the gate condition, spouts a nonsensical lie, goes straight into dereferencing tab, and dies miserably.

Let's study what GCC is doing here. Firing up Ghidra we observe the following decompiled code:

uint8_t f(int x)
{
    int tmp;
    if (-1 < x) {
        tmp = x * 0x1ff;
        if (tmp < 0x1fffe00) {
            printf("tab[%d] looks safe because %d is between [0;%d[\n",
                   (ulong)(uint)tmp / 0xffff, (ulong)(uint)tmp / 0xffff, 0x200);
            return tab[(int)((uint)tmp / 0xffff)];
        }
    }
    return '\0';
}

When I said GCC makes the assumption that it cannot happen, this is what I meant: tmp is not supposed to overflow, so part of the condition I had in place was simply removed. More specifically, since x cannot be less than 0, and since GCC assumes a multiplication cannot overflow into a random value (that could be negative) because it is undefined behaviour, it then decides to drop the "redundant" i >= 0 condition because "it cannot happen".

I reported that exact issue to GCC to make sure it wasn't a bug, and it was indeed confirmed to me that the undefined behaviour of an integer overflow is not limited in scope to whatever insane value it could take: it is apparently perfectly acceptable to mess up the code flow entirely.

While I understand how attractive it can be from an optimization point of view, the paranoid developer in me is straight up terrified by the prospect of a single integer overflow removing a security protection and causing such havoc. I've worked several years on a project where the integer overflows were (and probably still are) legion. Identifying and fixing all of them is likely a lifetime mission for several opinionated individuals.

I'm expecting this article to make the Rust crew go on a crusade again, and I think I might be with them this time.
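For reference, one common way to make this pattern robust (a sketch of a possible fix, not necessarily the one applied in the original code) is to perform the multiplication in a wider type so it cannot overflow in the first place; GCC and Clang also offer __builtin_mul_overflow() when an explicit check is preferred:

#include <stdint.h>

uint8_t tab[0x1ff + 1]; /* same table as in the reproducer above */

uint8_t f_wide(int32_t x)
{
    if (x < 0)
        return 0;
    /* The product is computed in 64-bit, where INT32_MAX * 0x1ff cannot
     * overflow, so the guard below is evaluated on the real value. */
    const int64_t i = (int64_t)x * 0x1ff / 0xffff;
    if (i >= 0 && i < (int64_t)sizeof(tab))
        return tab[i];
    return 0;
}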
Edit: it was made clear to me while reading Predrag's blog that the key to my misunderstanding boils down to this: "Undefined behavior is not the same as implementation-defined behavior". While I was indeed talking about undefined behaviour, subconsciously I was thinking that the behaviour of an overflow on a multiplication would be "implementation-defined behaviour". This is not the case: it is indeed undefined behaviour, and yes, the compiler is free to do whatever it wants because that is compliant with the specifications. It's my mistake of course, but in my defense, despite the arrogant comments I read, this confusion happens a lot. It happens, I believe, because it violates the principle of least astonishment. To illustrate this, I'll point to an interesting old OpenBSD developer blog post, which worries about the result of the multiplication rather than about the invalidation of any guarantee concerning what happens to the execution flow (before and after). This is not uncommon and, in my opinion, perfectly understandable.
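On a practical note, GCC's -fwrapv flag makes signed integer overflow wrap around instead of being undefined, which keeps guards like the i >= 0 test above meaningful; with it, the reproducer should exit cleanly rather than crash (at the cost of some optimization opportunities, and without fixing the underlying overflow):

% cc -Wall -O2 -fwrapv overflow.c -o overflow && ./overflow 50000000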


More in programming

Notes from Alexander Petros’ “Building the Hundred-Year Web Service”

I loved this talk from Alexander Petros titled “Building the Hundred-Year Web Service”. What follows is a summation of my note-taking from watching the talk on YouTube.

Is what you’re building for future generations: Useful for them? Maintainable by them? Adaptable by them? Actually, forget about future generations. Is what you’re building for future you, 6 months or 6 years from now, aligning with those goals?

While we’re building codebases which may not be useful, maintainable, or adaptable by someone two years from now, the Romans built a bridge thousands of years ago that is still being used today. It should be impossible to imagine building something in Roman times that’s still useful today. But if you look at [Trajan’s Bridge in Portugal, which is still used today] you can see there’s a little car on it and a couple of pedestrians. They couldn’t have anticipated the automobile, but nevertheless it is being used for that today.

That’s a conundrum. How do you build for something you can’t anticipate? You have to think resiliently. Ask yourself: what’s true today that was true for a software engineer in 1991? One simple answer is: sharing and accessing information with a uniform resource identifier. That was true 30+ years ago, and I would venture to bet it will be true in another 30 years, and more!

There [isn’t] a lot of source code that can run unmodified in software that is 30 years apart. And yet, the first web site ever made can do precisely that. The source code of the very first web page, which was written for a line mode browser, still runs today on a touchscreen smartphone, which is not a device that Tim Berners-Lee could have anticipated.

Alexander goes on to point out how interaction with web pages has changed over time:

- In the original line mode browser, links couldn’t be represented as blue underlined text. They were represented more like footnotes on screen, where you’d see something like this[1] and then this[2]. If you wanted to follow that link, there was no GUI to point and click. Instead, you would hit that number on your keyboard.
- In desktop browsers and GUI interfaces, we got blue underlines to represent something you could point and click on to follow a link.
- On touchscreen devices, we got “tap” with your finger to follow a link.

While these methods of interaction have changed over the years, the underlying medium remains unchanged: information via uniform resource identifiers. The core representation of a hypertext document is adaptable to things that were not at all anticipated in 1991. The durability guarantees of the web are absolutely astounding if you take a moment to think about it.

If you’re sprinting you might beat the browser, but it’s running a marathon and you’ll never beat it in the long run. If your page is fast enough, [refreshes] won’t even repaint the page. The experience of refreshing a page, or clicking on a “hard link”, is identical to the experience of partially updating the page. That is something that quietly happened in the last ten years with no fanfare. All the people who wrote basic HTML got a huge performance upgrade in their browser. And everybody who tried to beat the browser now has to reckon with all the JavaScript they wrote to emulate these basic features.

Modeling Awkward Social Situations with TLA+

You're walking down the street and need to pass someone going the opposite way. You take a step left, but they're thinking the same thing and take a step to their right, aka your left. You're still blocking each other. Then you take a step to the right, and they take a step to their left, and you're back to where you started. I've heard this called "walkwarding" Let's model this in TLA+. TLA+ is a formal methods tool for finding bugs in complex software designs, most often involving concurrency. Two people trying to get past each other just also happens to be a concurrent system. A gentler introduction to TLA+'s capabilities is here, an in-depth guide teaching the language is here. The spec ---- MODULE walkward ---- EXTENDS Integers VARIABLES pos vars == <<pos>> Double equals defines a new operator, single equals is an equality check. <<pos>> is a sequence, aka array. you == "you" me == "me" People == {you, me} MaxPlace == 4 left == 0 right == 1 I've gotten into the habit of assigning string "symbols" to operators so that the compiler complains if I misspelled something. left and right are numbers so we can shift position with right - pos. direction == [you |-> 1, me |-> -1] goal == [you |-> MaxPlace, me |-> 1] Init == \* left-right, forward-backward pos = [you |-> [lr |-> left, fb |-> 1], me |-> [lr |-> left, fb |-> MaxPlace]] direction, goal, and pos are "records", or hash tables with string keys. I can get my left-right position with pos.me.lr or pos["me"]["lr"] (or pos[me].lr, as me == "me"). Juke(person) == pos' = [pos EXCEPT ![person].lr = right - @] TLA+ breaks the world into a sequence of steps. In each step, pos is the value of pos in the current step and pos' is the value in the next step. The main outcome of this semantics is that we "assign" a new value to pos by declaring pos' equal to something. But the semantics also open up lots of cool tricks, like swapping two values with x' = y /\ y' = x. TLA+ is a little weird about updating functions. To set f[x] = 3, you gotta write f' = [f EXCEPT ![x] = 3]. To make things a little easier, the rhs of a function update can contain @ for the old value. ![me].lr = right - @ is the same as right - pos[me].lr, so it swaps left and right. ("Juke" comes from here) Move(person) == LET new_pos == [pos[person] EXCEPT !.fb = @ + direction[person]] IN /\ pos[person].fb # goal[person] /\ \A p \in People: pos[p] # new_pos /\ pos' = [pos EXCEPT ![person] = new_pos] The EXCEPT syntax can be used in regular definitions, too. This lets someone move one step in their goal direction unless they are at the goal or someone is already in that space. /\ means "and". Next == \E p \in People: \/ Move(p) \/ Juke(p) I really like how TLA+ represents concurrency: "In each step, there is a person who either moves or jukes." It can take a few uses to really wrap your head around but it can express extraordinarily complicated distributed systems. Spec == Init /\ [][Next]_vars Liveness == <>(pos[me].fb = goal[me]) ==== Spec is our specification: we start at Init and take a Next step every step. Liveness is the generic term for "something good is guaranteed to happen", see here for more. <> means "eventually", so Liveness means "eventually my forward-backward position will be my goal". I could extend it to "both of us eventually reach out goal" but I think this is good enough for a demo. Checking the spec Four years ago, everybody in TLA+ used the toolbox. 
Now the community has collectively shifted over to using the VSCode extension.¹ VSCode requires we write a configuration file, which I will call walkward.cfg:

SPECIFICATION Spec
PROPERTY Liveness

I then check the model with the VSCode command TLA+: Check model with TLC. Unsurprisingly, it finds an error. The reason it fails is "stuttering": I can get one step away from my goal and then just stop moving forever. We say the spec is unfair: it does not guarantee that if progress is always possible, progress will be made. If I want the spec to always make progress, I have to make some of the steps weakly fair.

+ Fairness == WF_vars(Next)
- Spec == Init /\ [][Next]_vars
+ Spec == Init /\ [][Next]_vars /\ Fairness

Now the spec is weakly fair, so someone will always do something. New error:

\* First six steps cut
7: <Move("me")> pos = [you |-> [lr |-> 0, fb |-> 4], me |-> [lr |-> 1, fb |-> 2]]
8: <Juke("me")> pos = [you |-> [lr |-> 0, fb |-> 4], me |-> [lr |-> 0, fb |-> 2]]
9: <Juke("me")> (back to state 7)

In this failure, I've successfully gotten past you, and then I spend the rest of my life endlessly juking back and forth. The Next step keeps happening, so weak fairness is satisfied. What I actually want is for my Move and my Juke to both be weakly fair independently of each other.

- Fairness == WF_vars(Next)
+ Fairness == WF_vars(Move(me)) /\ WF_vars(Juke(me))

If my liveness property also specified that you reached your goal, I could instead write \A p \in People: WF_vars(Move(p)) etc. I could also swap the \A with a \E to mean at least one of us is guaranteed to have fair actions, but not necessarily both of us. New error:

3: <Move("me")> pos = [you |-> [lr |-> 0, fb |-> 2], me |-> [lr |-> 0, fb |-> 3]]
4: <Juke("you")> pos = [you |-> [lr |-> 1, fb |-> 2], me |-> [lr |-> 0, fb |-> 3]]
5: <Juke("me")> pos = [you |-> [lr |-> 1, fb |-> 2], me |-> [lr |-> 1, fb |-> 3]]
6: <Juke("me")> pos = [you |-> [lr |-> 1, fb |-> 2], me |-> [lr |-> 0, fb |-> 3]]
7: <Juke("you")> (back to state 3)

Now we're getting somewhere! This is the original walkwarding situation we wanted to capture. We're in each other's way, then you juke, but before either of us can move you juke, then we both juke back. We can repeat this forever, trapped in a social hell.

Wait, but doesn't WF(Move(me)) guarantee I will eventually move? Yes, but only if a move is permanently available. In this case, it's not permanently available, because every couple of steps it's made temporarily unavailable. How do I fix this? I can't add a rule saying that we only juke if we're blocked, because the whole point of walkwarding is that we're not coordinated. In the real world, walkwarding can go on for agonizing seconds. What I can do instead is say that Liveness holds as long as Move is strongly fair. Unlike weak fairness, strong fairness guarantees something happens if it keeps becoming possible, even with interruptions.

  Liveness ==
+   SF_vars(Move(me)) =>
      <>(pos[me].fb = goal[me])

This makes the spec pass. Even if we weave back and forth for five minutes, as long as we eventually pass each other, I will reach my goal. Note we could also do this by making Move in Fairness strongly fair, which is preferable if we have a lot of different liveness properties to check.

A small exercise for the reader

There is a presumed invariant that is violated. Identify what it is, write it as a property in TLA+, and show the spec violates it. Then fix it. Answer (in rot13): Gur vainevnag vf "ab gjb crbcyr ner va gur rknpg fnzr ybpngvba".
Zbir thnenagrrf guvf ohg Whxr qbrf abg. More TLA+ Exercises I've started work on an exercises repo. There's only a handful of specific problems now but I'm planning on adding more over the summer. learntla is still on the toolbox, but I'm hoping to get it all moved over this summer. ↩
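If you try the exercise above, note that TLC only checks the new property once it is declared in walkward.cfg alongside the existing entries; the name Inv here is just a placeholder for whatever you call your invariant:

SPECIFICATION Spec
PROPERTY Liveness
INVARIANT Inv

The same TLA+: Check model with TLC command then reports any state that violates it.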

the penultimate conditional syntax

About half a year ago I encountered a paper bombastically titled “the ultimate conditional syntax”. It has the attractive goal of unifying pattern match with boolean if tests, and its solution is in some ways very nice. But it seems over-complicated to me, especially for something that’s a basic work-horse of programming. I couldn’t immediately see how to cut it down to manageable proportions, but recently I had an idea. I’ll outline it under the “penultimate conditionals” heading below, after reviewing the UCS and explaining my motivation. what the UCS? whence UCS out of scope penultimate conditionals dangling syntax examples antepenultimate breath what the UCS? The ultimate conditional syntax does several things which are somewhat intertwined and support each other. An “expression is pattern” operator allows you to do pattern matching inside boolean expressions. Like “match” but unlike most other expressions, “is” binds variables whose scope is the rest of the boolean expression that might be evaluated when the “is” is true, and the consequent “then” clause. You can “split” tests to avoid repeating parts that are the same in successive branches. For example, if num < 0 then -1 else if num > 0 then +1 else 0 can be written if num < 0 then -1 > 0 then +1 else 0 The example shows a split before an operator, where the left hand operand is the same and the rest of the expression varies. You can split after the operator when the operator is the same, which is common for “is” pattern match clauses. Indentation-based syntax (an offside rule) reduces the amount of punctuation that splits would otherwise need. An explicit version of the example above is if { x { { < { 0 then −1 } }; { > { 0 then +1 } }; else 0 } } (This example is written in the paper on one line. I’ve split it for narrow screens, which exposes what I think is a mistake in the nesting.) You can also intersperse let bindings between splits. I doubt the value of this feature, since “is” can also bind values, but interspersed let does have its uses. The paper has an example using let to avoid rightward drift: if let tp1_n = normalize(tp1) tp1_n is Bot then Bot let tp2_n = normalize(tp2) tp2_n is Bot then Bot let m = merge(tp1_n, tp2_n) m is Some(tp) then tp m is None then glb(tp1_n, tp2_n) It’s probably better to use early return to avoid rightward drift. The desugaring uses let bindings when lowering the UCS to simpler constructions. whence UCS Pattern matching in the tradition of functional programming languages supports nested patterns that are compiled in a way that eliminates redundant tests. For example, this example checks that e1 is Some(_) once, not twice as written. if e1 is Some(Left(lv)) then e2 Some(Right(rv)) then e3 None then e4 Being cheeky, I’d say UCS introduces more causes of redundant checks, then goes to great effort to to eliminate redundant checks again. Splits reduce redundant code at the source level; the bulk of the paper is about eliminating redundant checks in the lowering from source to core language. I think the primary cause of this extra complexity is treating the is operator as a two-way test rather than a multi-way match. Splits are introduced as a more general (more complicated) way to build multi-way conditions out of two-way tests. There’s a secondary cause: the tradition of expression-oriented functional languages doesn’t like early returns. 
A nice pattern in imperative code is to write a function as a series of preliminary calculations and guards with early returns that set things up for the main work of the function. Rust’s ? operator and let-else statement support this pattern directly. UCS addresses the same pattern by wedging calculate-check sequences into if statements, as in the normalize example above. out of scope I suspect UCS’s indentation-based syntax will make programmers more likely to make mistakes, and make compilers have more trouble producing nice error messages. (YAML has put me off syntax that doesn’t have enough redundancy to support good error recovery.) So I wondered if there’s a way to have something like an “is pattern” operator in a Rust-like language, without an offside rule, and without the excess of punctuation in the UCS desugaring. But I couldn’t work out how to make the scope of variable bindings in patterns cover all the code that might need to use them. The scope needs to extend into the consequent then clause, but also into any follow-up tests – and those tests can branch so the scope might need to reach into multiple then clauses. The problem was the way I was still thinking of the then and else clauses as part of the outer if. That implied the expression has to be closed off before the then, which troublesomely closes off the scope of any is-bound variables. The solution – part of it, at least – is actually in the paper, where then and else are nested inside the conditional expression. penultimate conditionals There are two ingredients: The then and else clauses become operators that cause early return from a conditional expression. They can be lowered to a vaguely Rust syntax with the following desugaring rules. The 'if label denotes the closest-enclosing if; you can’t use then or else inside the expr of a then or else unless there’s another intervening if. then expr ⟼ && break 'if expr else expr ⟼ || break 'if expr else expr ⟼ || _ && break 'if expr There are two desugarings for else depending on whether it appears in an expression or a pattern. If you prefer a less wordy syntax, you might spell then as => (like match in Rust) and else as || =>. (For symmetry we might allow && => for then as well.) An is operator for multi-way pattern-matching that binds variables whose scope covers the consequent part of the expression. The basic form is like the UCS, scrutinee is pattern which matches the scrutinee against the pattern returning a boolean result. For example, foo is None Guarded patterns are like, scrutinee is pattern && consequent where the scope of the variables bound by the pattern covers the consequent. The consequent might be a simple boolean guard, for example, foo is Some(n) && n < 0 or inside an if expression it might end with a then clause, if foo is Some(n) && n < 0 => -1 // ... Simple multi-way patterns are like, scrutinee is { pattern || pattern || … } If there is a consequent then the patterns must all bind the same set of variables (if any) with the same types. More typically, a multi-way match will have consequent clauses, like scrutinee is { pattern && consequent || pattern && consequent || => otherwise } When a consequent is false, we go on to try other alternatives of the match, like we would when the first operand of boolean || is false. To help with layout, you can include a redundant || before the first alternative. 
For example, if foo is { || Some(n) && n < 0 => -1 || Some(n) && n > 0 => +1 || Some(n) => 0 || None => 0 } Alternatively, if foo is { Some(n) && ( n < 0 => -1 || n > 0 => +1 || => 0 ) || None => 0 } (They should compile the same way.) The evaluation model is like familiar shortcutting && and || and the syntax is supposed to reinforce that intuition. The UCS paper spends a lot of time discussing backtracking and how to eliminate it, but penultimate conditionals evaluate straightforwardly from left to right. The paper briefly mentions as patterns, like Some(Pair(x, y) as p) which in Rust would be written Some(p @ Pair(x, y)) The is operator doesn’t need a separate syntax for this feature: Some(p is Pair(x, y)) For large examples, the penultimate conditional syntax is about as noisy as Rust’s match, but it scales down nicely to smaller matches. However, there are differences in how consequences and alternatives are punctuated which need a bit more discussion. dangling syntax The precedence and associativity of the is operator is tricky: it has two kinds of dangling-else problem. The first kind occurs with a surrounding boolean expression. For example, when b = false, what is the value of this? b is true || false It could bracket to the left, yielding false: (b is true) || false Or to the right, yielding true: b is { true || false } This could be disambiguated by using different spellings for boolean or and pattern alternatives. But that doesn’t help for the second kind which occurs with an inner match. foo is Some(_) && bar is Some(_) || None Does that check foo is Some(_) with an always-true look at bar ( foo is Some(_) ) && bar is { Some(_) || None } Or does it check bar is Some(_) and waste time with foo? foo is { Some(_) && ( bar is Some(_) ) || None } I have chosen to resolve the ambiguity by requiring curly braces {} around groups of alternative patterns. This allows me to use the same spelling || for all kinds of alternation. (Compare Rust, which uses || for boolean expressions, | in a pattern, and , between the arms of a match.) Curlies around multi-way matches can be nested, so the example in the previous section can also be written, if foo is { || Some(n) && n < 0 => -1 || Some(n) && n > 0 => +1 || { Some(0) || None } => 0 } The is operator binds tigher than && on its left, but looser than && on its right (so that a chain of && is gathered into a consequent) and tigher than || on its right so that outer || alternatives don’t need extra brackets. examples I’m going to finish these notes by going through the ultimate conditional syntax paper to translate most of its examples into the penultimate syntax, to give it some exercise. Here we use is to name a value n, as a replacement for the |> abs pipe operator, and we use range patterns instead of split relational operators: if foo(args) is { || 0 => "null" || n && abs(n) is { || 101.. => "large" || ..10 => "small" || => "medium" ) } In both the previous example and the next one, we have some extra brackets where UCS relies purely on an offside rule. if x is { || Right(None) => defaultValue || Right(Some(cached)) => f(cached) || Left(input) && compute(input) is { || None => defaultValue || Some(result) => f(result) } } This one is almost identical to UCS apart from the spellings of and, then, else. 
if name.startsWith("_") && name.tailOption is Some(namePostfix) && namePostfix.toIntOption is Some(index) && 0 <= index && index < arity && => Right([index, name]) || => Left("invalid identifier: " + name) Here are some nested multi-way matches with overlapping patterns and bound values: if e is { // ... || Lit(value) && Map.find_opt(value) is Some(result) => Some(result) // ... || { Lit(value) || Add(Lit(0), value) || Add(value, Lit(0)) } => { print_int(value); Some(value) } // ... } The next few examples show UCS splits without the is operator. In my syntax I need to press a few more buttons but I think that’s OK. if x == 0 => "zero" || x == 1 => "unit" || => "?" if x == 0 => "null" || x > 0 => "positive" || => "negative" if predicate(0, 1) => "A" || predicate(2, 3) => "B" || => "C" The first two can be written with is instead, but it’s not briefer: if x is { || 0 => "zero" || 1 => "unit" || => "?" } if x is { || 0 => "null" || 1.. => "positive" || => "negative" } There’s little need for a split-anything feature when we have multi-way matches. if foo(u, v, w) is { || Some(x) && x is { || Left(_) => "left-defined" || Right(_) => "right-defined" } || None => "undefined" } A more complete function: fn zip_with(f, xs, ys) { if [xs, ys] is { || [x :: xs, y :: ys] && zip_with(f, xs, ys) is Some(tail) => Some(f(x, y) :: tail) || [Nil, Nil] => Some(Nil) || => None } } Another fragment of the expression evaluator: if e is { // ... || Var(name) && Map.find_opt(env, name) is { || Some(Right(value)) => Some(value) || Some(Left(thunk)) => Some(thunk()) } || App(lhs, rhs) => // ... // ... } This expression is used in the paper to show how a UCS split is desugared: if Pair(x, y) is { || Pair(Some(xv), Some(yv)) => xv + yv || Pair(Some(xv), None) => xv || Pair(None, Some(yv)) => yv || Pair(None, None) => 0 } The desugaring in the paper introduces a lot of redundant tests. I would desugar straightforwardly, then rely on later optimizations to eliminate other redundancies such as the construction and immediate destruction of the pair: if Pair(x, y) is Pair(xx, yy) && xx is { || Some(xv) && yy is { || Some(yv) => xv + yv || None => xv } || None && yy is { || Some(yv) => yv || None => 0 } } Skipping ahead to the “non-trivial example” in the paper’s fig. 11: if e is { || Var(x) && context.get(x) is { || Some(IntVal(v)) => Left(v) || Some(BoolVal(v)) => Right(v) } || Lit(IntVal(v)) => Left(v) || Lit(BoolVal(v)) => Right(v) // ... } The next example in the paper compares C# relational patterns. Rust’s range patterns do a similar job, with the caveat that Rust’s ranges don’t have a syntax for exclusive lower bounds. fn classify(value) { if value is { || .. -4.0 => "too low" || 10.0 .. => "too high" || NaN => "unknown" || => "acceptable" } } I tend to think relational patterns are the better syntax than ranges. With relational patterns I can rewrite an earlier example like, if foo is { || Some(< 0) => -1 || Some(> 0) => +1 || { Some(0) || None } => 0 } I think with the UCS I would have to name the Some(_) value to be able to compare it, which suggests that relational patterns can be better than UCS split relational operators. Prefix-unary relational operators are also a nice way to write single-ended ranges in expressions. We could simply write both ends to get a complete range, like >= lo < hi or like if value is > -4.0 < 10.0 => "acceptable" || => "far out" Near the start I quoted a normalize example that illustrates left-aligned UCS expression. 
The penultimate version drifts right like the Scala version:

if normalize(tp1) is {
  || Bot => Bot
  || tp1_n && normalize(tp2) is {
    || Bot => Bot
    || tp2_n && merge(tp1_n, tp2_n) is {
      || Some(tp) => tp
      || None => glb(tp1_n, tp2_n)
    }
  }
}

But a more Rusty style shows the benefits of early returns (especially the terse ? operator) and monadic combinators.

let tp1 = normalize(tp1)?;
let tp2 = normalize(tp2)?;
merge(tp1, tp2)
    .unwrap_or_else(|| glb(tp1, tp2))

antepenultimate breath

When I started writing these notes, my penultimate conditional syntax was little more than a sketch of an idea. Having gone through the previous section’s exercise, I think it has turned out better than I thought it might. The extra nesting from multi-way match braces doesn’t seem to be unbearably heavyweight. However, none of the examples have bulky then or else blocks, which are where the extra nesting is more likely to be annoying. But then, as I said before, it’s comparable to a Rust match:

match scrutinee {
    pattern => { consequent }
}

if scrutinee is {
    || pattern => { consequent }
}

The || lines down the left margin are noisy, but hard to get rid of in the context of a curly-brace language. I can’t reduce them to | like OCaml because what would I use for bitwise OR? I don’t want presence or absence of flow control to depend on types or context. I kind of like Prolog / Erlang’s , for && and ; for ||, but that’s well outside what’s legible to mainstream programmers. So, dunno. Anyway, I think I’ve successfully found a syntax that does most of what UCS does, but in a much simpler fashion.

Coding should be a vibe!

The appeal of "vibe coding" — where programmers lean back and prompt their way through an entire project with AI — appears partly to be based on the fact that so many development environments are deeply unpleasant to work with. So it's no wonder that all these programmers stuck working with cumbersome languages and frameworks can't wait to give up on the coding part of software development. If I found writing code a chore, I'd be looking for retirement too. But I don't. I mean, I used to! When I started programming, it was purely because I wanted programs. Learning to code was a necessary but inconvenient step toward bringing systems to life. That all changed when I learned Ruby and built Rails. Ruby's entire premise is "programmer happiness": that writing code should be a joy. And historically, the language was willing to trade run-time performance, memory usage, and other machine sympathies against the pursuit of said programmer happiness. These days, it seems like you can eat your cake and have it too, though. Ruby, after thirty years of constant improvement, is now incredibly fast and efficient, yet remains a delight to work with. That ethos couldn't shine brighter now. Disgruntled programmers have finally realized that an escape from nasty syntax, boilerplate galore, and ecosystem hyper-churn is possible. That's the appeal of AI: having it hide away all that unpleasantness. Only it's like cleaning your room by stuffing the mess under the bed — it doesn't make it go away! But the instinct is correct: Programming should be a vibe! It should be fun! It should resemble English closely enough that line noise doesn't obscure the underlying ideas and decisions. It should allow a richness of expression that serves the human reader instead of favoring the strictness preferred by the computer. Ruby does. And given that, I have no interest in giving up writing code. That's not the unpleasant part that I want AI to take off my hands. Just so I can — what? — become a project manager for a murder of AI crows? I've had the option to retreat up the manager ladder for most of my career, but I've steadily refused, because I really like writing Ruby! It's the most enjoyable part of the job! That doesn't mean AI doesn't have a role to play when writing Ruby. I'm conversing and collaborating with LLMs all day long — looking up APIs, clarifying concepts, and asking stupid questions. AI is a superb pair programmer, but I'd retire before permanently handing it the keyboard to drive the code. Maybe one day, wanting to write code will be a quaint concept. Like tending to horses for transportation in the modern world — done as a hobby but devoid of any economic value. I don't think anyone knows just how far we can push the intelligence and creativity of these insatiable token munchers. And I wouldn't bet against their advance, but it's clear to me that a big part of their appeal to programmers is the wisdom that Ruby was founded on: Programming should favor and flatter the human.

Tempest Rising is a great game

I really like RTS games. I pretty much grew up on them, starting with Command&Conquer 3: Kane’s Wrath, moving on to StarCraft 2 trilogy and witnessing the downfall of Command&Conquer 4. I never had the disks for any other RTS games during my teenage years. Yes, the disks, the ones you go to the store to buy! I didn’t know Steam existed back then, so this was my only source of games. There is something magical in owning a physical copy of the game. I always liked the art on the front (a mandatory huge face for all RTS!), game description and screenshots on the back, even the smell of the plastic disk case.
