Porting OkLab colorspace to integer arithmetic

from A small freedom area RSS [alt+shift+b] in programming

For reasons I'll explain in a futur write-up, I needed to make use of a perceptually uniform colorspace in some computer vision code. OkLab from Björn Ottosson was a great candidate given how simple the implementation is. But there is a plot twist: I needed the code to be deterministic for the tests to be portable across a large variety of architecture, systems and configurations. Several solutions were offered to me, including reworking the test framework to support a difference mechanism with threshold, but having done that in another project I can confidently say that it's not trivial (when not downright impossible in certain cases). Another approach would have been to hardcode the libc math functions, but even then I wasn't confident the floating point arithmetic would determinism would be guaranteed in all cases. So I ended up choosing to port the code to integer arithmetic. I'm sure many people would disagree with that approach, but: code determinism is guaranteed not all FPU...

over a year ago

Remove from reading list Add to reading list [alt+a] Read now [→]

Comments

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from A small freedom area RSS

Fixing the iterative damping interpolation in video games

As I'm exploring the fantastic world of indie game development lately, I end up watching a large number of video tutorials on the subject. Even though the quality of the content is pretty variable, I'm very grateful to the creators for it. That being said, I couldn't help noticing this particular bit times and times again: a = lerp(a, B, delta * RATE) Behind this apparent banal call hides a terrible curse, forever perpetrated by innocent souls on the Internet. In this article we will study what it's trying to achieve, how it works, why it's wrong, and then we'll come up with a good solution to the initial problem. The usual warning: I don't have a mathematics or academic background, so the article is addressed at other neanderthals like myself, who managed to understand that pressing keys on a keyboard make pixels turn on and off. What is it? Let's start from the beginning. We're in a game engine main loop callback called at a regular interval (roughly), passing down the time difference from the last call. In Godot engine, it looks like this: func _physics_process(delta: float): ... If the game is configured to refresh at 60 FPS, we can expect this function to be called around 60 times per second with delta = 1/60 = 0.01666.... As a game developer, we want some smooth animations for all kind of transformations. For example, we may want the speed of the player to go down to zero as they release the moving key. We could do that linearly, but to make the stop less brutal and robotic we want to slow down the speed progressively. Linear (top) versus smooth/exponential (bottom) animation Virtually every tutorial will suggest updating some random variable with something like that: velocity = lerp(velocity, 0, delta * RATE) At 60 FPS, a decay RATE defined to 3.5, and an initial velocity of 100, the velocity will go down to 0 following this curve: Example curve of a decaying variable Note: velocity is just a variable name example, it can be found in many other contexts If you're familiar with lerp() ("linear interpolation") you may be wondering why this is making a curve. Indeed, this lerp() function, also known as mix(), is a simple linear function defined as lerp(a,b,x) = x*(b-a) + a or its alternative stable form lerp(a,b,x) = (1-a)x + bx. For more information, see a previous article about this particular function. But here we are re-using the previous value, so this essentially means nesting lerp() function calls, which expands into a power formula, forming a curve composed of a chain of small straight segments. Why is it wrong? The main issue is that the formula is heavily depending on the refresh rate. If the game is supposed to work at 30, 60, or 144 FPS, then it means the physics engine is going to behave differently. Here is an illustration of the kind of instability we can expect: Comparison of the curves at different frame rates with the problematic formula Note that the inaccuracy when compared to an ideal curve is not the issue here. The problem is that the game mechanics are different depending on the hardware, the system, and the wind direction observed in a small island of Japan. Imagine being able to jump further if we replace our 60Hz monitor with a 144Hz one, that would be some nasty pay to win incentive. We may be able to get away with this by forcing a constant refresh rate for the game and consider this a non-issue (I'm not convinced this is achievable on all engines and platforms), but then we meet another problem: the device may not be able to hold this requirement at all times because of potential lags (for reasons that may be outside our control). That's right, so far we assumed delta=1/FPS but that's merely a target, it could fluctuate, causing mild to dramatic situations gameplay wise. One last issue with that formula is the situation of a huge delay spike, causing an overshooting of the target. For example if we have RATE=3 and we end up with a frame that takes 500ms for whatever random reason, we're going to interpolate with a value of 1.5, which is way above 1. This is easily fixed by maxing out the 3rd argument of lerp to 1, but we have to keep that issue in mind. To summarize, the formula is: not frame rate agnostic ❌ non deterministic ❌ vulnerable to overshooting ❌ If you're not interested in the gory details on the how, you can now jump straight to the conclusion for a better alternative. Study We're going to switch to a more mathematical notation from now on. It's only going to be linear algebra, nothing particularly fancy, but we're going to make a mess of 1 letter symbols, bear with me. Let's name the exhaustive list of inputs of our problem: initial value: a_0=\Alpha (from where we start, only used once) target value: \Beta (where we are going, constant value) time delta: \Delta_n (time difference from last call) the rate of change: R (arbitrary scaling user constant) original sequence: a_{n+1} = \texttt{lerp}(a_n, \Beta, R\Delta_n) (the code in the main loop callback) frame rate: F (the target frame rate, for example 60 FPS) time: t (animation time elapsed) What we are looking for is a new sequence formula u_n (u standing for purfect) that doesn't have the 3 previously mentioned pitfalls. The first thing we can do is to transform this recursive sequence into the expected ideal contiguous time based function. The original sequence was designed for a given rate R and FPS F: this means that while \Delta_n changes in practice every frame, the ideal function we are looking for is constant: \Delta=1/F. So instead of starting from a_{n+1} = \texttt{lerp}(a_n, \Beta, R\Delta_n), we will look for u_n starting from u_{n+1} = \texttt{lerp}(u_n, \Beta, R\Delta) with u_0=a_0=\Alpha. Since I'm lazy and incompetent, we are just going to ask WolframAlpha for help finding the solution to the recursive sequence. But to feed its input we need to simplify the terms a bit: ...with P=(1-R\Delta) and Q=\Beta R\Delta. We do that so we have a familiar ax+b linear form. According to WolframAlpha this is equivalent to: This is great because we now have the formula according to n, our frame number. We can also express that discrete sequence into a contiguous function according to the time t: Expanding our temporary P and Q placeholders with their values and unrolling, we get: This function perfectly matches the initial lerp() sequence in the hypothetical situation where the frame rate is honored. Basically, it's what the sequence a_{n+1} was meant to emulate at a given frame rate F. Note: we swapped the first 2 terms of lerp() at the last step because it makes more sense semantically to go from \Alpha to \Beta. Let's again summarize what we have and what we want: we're into the game main loop and we want our running value to stick to that f(t) function. We have: v=f(t): the value previously computed (t is the running duration so far, but we don't have it); in the original sequence this is known as a_n \Delta_n: the delta time for the current frame We are looking for a function \Eta(v,\Delta_n) which defines the position of a new point on the curve, only knowing v and \Delta_n. It's a "time agnostic" version of f(t). Basically, it is defined as \Eta(v,\Delta_n)=f(t+\Delta_n), but since we don't have t it's not very helpful. That being said, while we don't have t, we do have f(t) (the previous value v). Looking at the curve, we know the y-value of the previous point, and we know the difference between the new point and the previous point on the x-axis: Previous and current point in time If we want t (the total time elapsed at the previous point), we need the inverse function f^{-1}. Indeed, t = f^{-1}(f(t)): taking the inverse of a function gives back the input. We know f so we can inverse it, relying on WolframAlpha again (what a blessing this website is): Note: \ln stands for natural logarithm, sometimes also called \log. Careful though, on Desmos for example \log is in base 10, not base e (while its \exp is in base e for some reason). This complex formula may feel a bit intimidating but we can now find \Eta only using its two parameters: It's pretty messy but we can simplify that down to something much simpler. An interesting property that is going to be helpful here is m^n = e^{n \ln m}. For my fellow programmers getting tensed here: pow(m, n) == exp(n * log(m)). Similarly, e^{\ln x} = x (exp(log(x)) == x), which give us an idea where this is going. Again we swapped the first 2 arguments of lerp at the last step at the cost of an additional subtraction: this is more readable because \Beta is our destination point. Rewriting this in a sequence notation, we get: We still have this annoying F\ln(1-R/F) bit in the formula, but we can take it out and precompute it because it is constant: it's our rate conversion formula: We're going to make one extra adjustment: R' is negative, which is not exactly intuitive to work with as a user (in case it is defined arbitrarily and not through the conversion formula), so we make a sign swap for convenience: The conversion formula is optional, it's only needed to port a previously broken code to the new formula. One interesting thing here is that R' is fairly close to R when R is small. For example, a rate factor R=5 at 60 FPS gives us R' \approx 5.22. This means that if the rate factors weren't closely tuned, it is probably acceptable to go with R'=R and not bother with any conversion. Still, having that formula can be useful to update all the decay constants and check that everything still works as expected. Also, notice how if the delta gets very large, -R'\Delta_n is going toward -\infty, e^{-R'\Delta_n} toward 0, 1-e^{-R'\Delta_n} toward 1, and so the interpolation is going to reach our final target \Beta without overshooting. This means the formula doesn't need any extra care with regard to the 3rd issue we pointed out earlier. Looking at the previous curves but now with the new formula and an adjusted rate: Comparison of the curves at different frame rates with the new formula Conclusion So there we have it, the perfect formula, frame rate agnostic ✅, deterministic ✅ and resilient to overshooting ✅. If you've quickly skimmed through the maths, here is what you need to know: a = lerp(a, B, delta * RATE) Should be changed to: a = lerp(a, B, 1.0 - exp(-delta * RATE2)) With the precomputed RATE2 = -FPS * log(1 - RATE/FPS) (where log is the natural logarithm), or simply using RATE2 = RATE as a rough equivalent. Also, any existing overshooting clamping can safely be dropped. Now please adjust your game to make the world a better and safer place for everyone ♥

a year ago • 118 votes

Hacking window titles to help OBS

This write-up is meant to present the rationale and technical details behind a tiny project I wrote the other day, WTH, or WindowTitleHack, which is meant to force a constant window name for apps that keep changing it (I'm looking specifically at Firefox and Krita, but there are probably many others). Why tho? I've been streaming on Twitch from Linux (X11) with a barebone OBS Studio setup for a while now, and while most of the experience has been relatively smooth, one particularly striking frustration has been dealing with windows detection. If you don't want to capture the whole desktop for privacy reasons or simply to have control over the scene layout depending on the currently focused app, you need to rely on the Window Capture (XComposite) source. This works mostly fine, and it is actually able to track windows even when their title bar is renamed. But obviously, upon restart it can't find it again because both the window title and the window ID changed, meaning you have to redo your setup by reselecting the window again. It would have been acceptable if that was the only issue I had, but one of the more advanced feature I'm extensively using is the Advanced Scene Switcher (the builtin one, available through the Tools menu). This tool is a basic window title pattern matching system that allows automatic scene switches depending on the current window. Since it doesn't even seem to support regex, it's troublesome to have it reliably work with apps constantly changing their title (and even if it had, there is no guarantee that the app would leave a recognizable matchable pattern in its title). Hacking Windows One unreliable hack would be to spam xdotool commands to correct the window title. This could be a resource hog, and it would create quite a bunch of races. One slight improvement over this would be to use xprop -spy, but that wouldn't address the race conditions (since we would adjust the title after it's been already changed). So how do we deal with that properly? Well, on X11 with the reference library (Xlib) there are actually various (actually a lot of) ways of changing the title bar. It took me a while to identify which call(s) to target, but ended up with the following call graph, where each function is actually exposed publicly: From this we can easily see that we only need to hook the deepest function XChangeProperty, and check if the property is XA_WM_NAME (or its "modern" sibling, _NET_WM_NAME). How do we do that? With the help of the LD_PRELOAD environment variable and a dynamic library that implements a custom XChangeProperty. First, we grab the original function: #include <dlfcn.h> /* A type matching the prototype of the target function */ typedef int (*XChangeProperty_func_type)( Display *display, Window w, Atom property, Atom type, int format, int mode, const unsigned char *data, int nelements ); /* [...] */ XChangeProperty_func_type XChangeProperty_orig = dlsym(RTLD_NEXT, "XChangeProperty"); We also need to craft a custom _NET_WM_NAME atom: _NET_WM_NAME = XInternAtom(display, "_NET_WM_NAME", 0); With this we are now able to intercept all the WM_NAME events and override them with our own: if (property == XA_WM_NAME || property == _NET_WM_NAME) { data = (const unsigned char *)new_title; nelements = (int)strlen(new_title); } return XChangeProperty_orig(display, w, property, type, format, mode, data, nelements); We wrap all of this into our own redefinition of XChangeProperty and… that's pretty much it. Now due to a long history of development, Xlib has been "deprecated" and superseded by libxcb. Both are widely used, but fortunately the APIs are more or less similar. The function to hook is xcb_change_property, and defining _NET_WM_NAME is slightly more cumbered but not exactly challenging: const xcb_intern_atom_cookie_t cookie = xcb_intern_atom(conn, 0, strlen("_NET_WM_NAME"), "_NET_WM_NAME"); xcb_intern_atom_reply_t *reply = xcb_intern_atom_reply(conn, cookie, NULL); if (reply) _NET_WM_NAME = reply->atom; free(reply); Aside from that, the code is pretty much the same. Configuration To pass down the custom title to override, I've been relying on an environment variable WTH_TITLE. From a user point of view, it looks like this: LD_PRELOAD="builddir/libwth.so" WTH_TITLE="Krita4ever" krita We could probably improve the usability by creating a wrapping tool (so that we could have something such as ./wth --title=Krita4ever krita). Unfortunately I wasn't yet able to make a self-referencing executable accepted by LD_PRELOAD, so for now the manual LD_PRELOAD and WTH_TITLE environment will do just fine. Thread safety To avoid a bunch of redundant function roundtrips we need to globally cache a few things: the new title (to avoid fetching it in the environment all the time), the original functions (to save the dlsym call), and _NET_WM_NAME. Those are loaded lazily at the first function call, but we have no guarantee with regards to concurrent calls on that hooked function so we must create our own lock. I initially though about using pthread_once but unfortunately the initialization callback mechanism doesn't allow any custom argument. Again, this is merely a slight annoyance since we can implement our own in a few lines of code: /* The "once" API is similar to pthread_once but allows a custom function argument */ struct wth_once { pthread_mutex_t lock; int initialized; }; #define WTH_ONCE_INITIALIZER {.lock=PTHREAD_MUTEX_INITIALIZER} typedef void (*init_func_type)(void *user_arg); void wth_init_once(struct wth_once *once, init_func_type init_func, void *user_arg) { pthread_mutex_lock(&once->lock); if (!once->initialized) { init_func(user_arg); once->initialized = 1; } pthread_mutex_unlock(&once->lock); } Which we use like this: static struct wth_once once = WTH_ONCE_INITIALIZER; static void init_once(void *user_arg) { Display *display = user_arg; /* [...] */ } /* [...] */ wth_init_once(&once, init_once, display); The End? I've been delaying doing this project for weeks because it felt complex at first glance, but it actually just took me a few hours. Probably the same amount of time it took me to write this article. While the project is admittedly really small, it still feel like a nice accomplishment. I hope it's useful to other people. Now, the Wayland support is probably the most obvious improvement the project can receive, but I don't have such a setup locally to test yet, so this is postponed for an undetermined amount of time. The code is released with a permissive license (MIT); if you want to contribute you can open a pull request but getting in touch with me first is appreciated to avoid unnecessary and overlapping efforts.

over a year ago • 106 votes

Improving color quantization heuristics

In 2015, I wrote an article about how the palette color quantization was improved in FFmpeg in order to make nice animated GIF files. For some reason, to this day this is one of my most popular article. As time passed, my experience with colors grew and I ended up being quite ashamed and frustrated with the state of these filters. A lot of the code was naive (when not terribly wrong), despite the apparent good results. One of the major change I wanted to do was to evaluate the color distances using a perceptually uniform colorspace, instead of using a naive euclidean distance of RGB triplets. As usual it felt like a week-end long project; after all, all I have to do is change the distance function to work in a different space, right? Well, if you're following my blog you might have noticed I've add numerous adventures that stacked up on each others: I had to work out the colorspace with integer arithmetic first ...which forced me to look into integer division more deeply ...which confronted me to all sort of undefined behaviours in the process And when I finally reached the point where I could make the switch to OkLab (the perceptual colorspace), a few experiments showed that the flavour of the core algorithm I was using might contain some fundamental flaws, or at least was not implementing optimal heuristics. So here we go again, quickly enough I find myself starting a new research study in the pursuit of understanding how to put pixels on the screen. This write-up is the story of yet another self-inflicted struggle. Palette quantization But what is palette quantization? It essentially refers to the process of reducing the number of available colors of an image down to a smaller subset. In sRGB, an image can have up to 16.7 million colors. In practice though it's generally much less, to the surprise of no one. Still, it's not rare to have a few hundreds of thousands different colors in a single picture. Our goal is to reduce that to something like 256 colors that represent them best, and use these colors to create a new picture. Why you may ask? There are multiple reasons, here are some: Improve size compression (this is a lossy operation of course, and using dithering on top might actually defeat the original purpose) Some codecs might not support anything else than limited palettes (GIF or subtitles codecs are examples) Various artistic purposes Following is an example of a picture quantized at different levels: Original (26125 colors) Quantized to 8bpp (256 colors) Quantized to 2bpp (4 colors) This color quantization process can be roughly summarized in a 4-steps based process: Sample the input image: we build an histogram of all the colors in the picture (basically a simple statistical analysis) Design a colormap: we build the palette through various means using the histograms Create a pixel mapping which associates a color (one that can be found in the input image) with another (one that can be found in the newly created palette) Image quantizing: we use the color mapping to build our new image. This step may also involve some dithering. The study here will focus on step 2 (which itself relies on step 1). Colormap design algorithms A palette is simply a set of colors. It can be represented in various ways, for example here in 2D and 3D: To generate such a palette, all sort of algorithms exists. They are usually classified into 2 large categories: Dividing/splitting algorithms (such as Median-Cut and its various flavors) Clustering algorithms (such as K-means, maximin distance, (E)LBG or pairwise clustering) The former are faster but non-optimal while the latter are slower but better. The problem is NP-complete, meaning it's possible to find the optimal solution but it can be extremely costly. On the other hand, it's possible to find "local optimums" at minimal cost. Since I'm working within FFmpeg, speed has always been a priority. This was the reason that motivated me to initially implement the Median-Cut over a more expensive algorithm. The rough picture of the algorithm is relatively easy to grasp. Assuming we want a palette of K colors: A set S of all the colors in the input picture is constructed, along with a respective set W of the weight of each color (how much they appear) Since the colors are expressed as RGB triplets, they can be encapsulated in one big cuboid, or box The box is cut in two along one of the axis (R, G or B) on the median (hence the name of the algorithm) If we don't have a total K boxes yet, pick one of them and go back to previous step All the colors in each of the K boxes are then averaged to form the color palette entries Here is how the process looks like visually: Median-Cut algorithm targeting 16 boxes You may have spotted in this video that the colors are not expressed in RGB but in Lab: this is because instead of representing the colors in a traditional RGB colorspace, we are instead using the OkLab colorspace which has the property of being perceptually uniform. It doesn't really change the Median Cut algorithm, but it definitely has an impact on the resulting palette. One striking limitation of this algorithm is that we are working exclusively with cuboids: the cuts are limited to an axis, we are not cutting along an arbitrary plane or a more complex shape. Think of it like working with voxels instead of more free-form geometries. The main benefit is that the algorithm is pretty simple to implement. Now the description provided earlier conveniently avoided describing two important aspects happening in step 3 and 4: How do we choose the next box to split? How do we choose along which axis of the box we make the cut? I pondered about that for a quite a long time. An overview of the possible heuristics In bulk, some of the heuristics I started thinking of: should we take the box that has the longest axis across all boxes? should we take the box that has the largest volume? should we take the box that has the biggest Mean Squared Error when compared to its average color? should we take the box that has the axis with the biggest MSE? assuming we choose to go with the MSE, should it be normalized across all boxes? should we even account for the weight of each color or consider them equal? what about the axis? Is it better to pick the longest or the one with the higher MSE? I tried to formalize these questions mathematically to the best of my limited abilities. So let's start by saying that all the colors c of given box are stored in a N×M 2D-array following the matrix notation: L₁L₂L₃…Lₘ a₁a₂a₃…aₘ b₁b₂b₃…bₘ N is the number of components (3 in our case, whether it's RGB or Lab), and M the number of colors in that box. You can visualize this as a list of vectors as well, where c_{i,j} is the color at row i and column j. With that in mind we can sketch the following diagram representing the tree of heuristic possibilities to implement: Mathematicians are going to kill me for doodling random notes all over this perfectly understandable symbols gibberish, but I believe this is required for the human beings reading this article. In summary, we end up with a total of 24 combinations to try out: 2 axis selection heuristics: cut the axis with the maximum error squared cut the axis with the maximum length 3 operators: maximum measurement out of all the channels product of the measurements of all the channels sum of the measurements of all the channels 4 measurements: error squared, honoring weights error squared, not honoring weights error squared, honoring weights, normalized length of the axis If we start to intuitively think about which ones are likely going to perform the best, we quickly realize that we haven't actually formalized what we are trying to achieve. Such a rookie mistake. Clarifying this will help us getting a better feeling about the likely outcome. I chose to target an output that minimizes the MSE against the reference image, in a perceptual way. Said differently, trying to make the perceptual distance between an input and output color pixel as minimal as possible. This is an arbitrary and debatable target, but it's relatively simple and objective to evaluate if we have faith in the selected perceptual model. Another appropriate metric could have been to find the ideal palette through another algorithm, and compare against that instead. Doing that unfortunately implied that I would trust that other algorithm, its implementation, and that I have enough computing power. So to summarize, we want to minimize the MSE between the input and output, evaluated in the OkLab colorspace. This can be expressed with the following formula: Where: P is a partition (which we constrain to a box in our implementation) C the set of colors in the partition P w the weight of a color c a single color µ the average color of the set C Special thanks to criver for helping me a ton on the math area, this last formula is from them. Looking at the formula, we can see how similar it is to certain branches of the heuristics tree, so we can start getting an intuition about the result of the experiment. Experiment language Short deviation from the main topic (feel free to skip to the next section): working in C within FFmpeg quickly became a hurdle more than anything. Aside from the lack of flexibility, the implicit casts destroying the precision deceitfully, and the undefined behaviours, all kind of C quirks went in the way several times, which made me question my sanity. This one typically severly messed me up while trying to average the colors: #include <stdio.h> #include <stdint.h> int main (void) { const int32_t x = -30; const uint32_t y = 10; const uint32_t a = 30; const int32_t b = -10; printf("%d×%u=%d\n", x, y, x * y); printf("%u×%d=%d\n", a, b, a * b); printf("%d/%u=%d\n", x, y, x / y); printf("%u/%d=%d\n", a, b, a / b); return 0; } % cc -Wall -Wextra -fsanitize=undefined test.c -o test && ./test -30×10=-300 30×-10=-300 -30/10=429496726 30/-10=0 Anyway, I know this is obvious but if you aren't already doing that I suggest you build your experiments in another language, Python or whatever, and rewrite them in C later when you figured out your expected output. Re-implementing what I needed in Python didn't take me long. It was, and still is obviously much slower at runtime, but that's fine. There is a lot of room for speed improvement, typically by relying on numpy (which I didn't bother with). Experiment results I created a research repository for the occasion. The code to reproduce and the results can be found in the color quantization README. In short, based on the results, we can conclude that: Overall, the box that has the axis with the largest non-normalized weighted sum of squared error is the best candidate in the box selection algorithm Overall, cutting the axis with the largest weighted sum of squared error is the best axis cut selection algorithm To my surprise, normalizing the weights per box is not a good idea. I initially observed that by trial and error, which was actually one of the main motivator for this research. I initially thought normalizing each box was necessary in order to compare them against each others (such that they are compared on a common ground). My loose explanation of the phenomenon was that not normalizing causes a bias towards boxes with many colors, but that's actually exactly what we want. I believe it can also be explained by our evaluation function: we want to minimize the error across the whole set of colors, so small partitions (in color counts) must not be made stronger. At least not in the context of the target we chose. It's also interesting to see how the max() seems to perform better than the sum() of the variance of each component most of the time. Admittedly my picture samples set is not that big, which may imply that more experiments to confirm that tendency are required. In retrospective, this might have been quickly predictable to someone with a mathematical background. But since I don't have that, nor do I trust my abstract thinking much, I'm kind of forced to try things out often. This is likely one of the many instances where I spent way too much energy on something obvious from the beginning, but I have the hope it will actually provide some useful information for other lost souls out there. Known limitations There are two main limitations I want to discuss before closing this article. The first one is related to minimizing the MSE even more. K-means refinement We know the Median-Cut actually provides a rough estimate of the optimal palette. One thing we could do is use it as a first step before refinement, for example by running a few K-means iterations as post-processing (how much refinement/iterations could be a user control). The general idea of K-means is to progressively move each colors individually to a more appropriate box, that is a box for which the color distance to the average color of that box is smaller. I started implementing that in a very naive way, so it's extremely slow, but that's something to investigate further because it definitely improves the results. Most of the academic literature seems to suggest the use of the K-means clustering, but all of them require some startup step. Some come up with various heuristics, some use PCA, but I've yet to see one that rely on Median-Cut as first pass; maybe that's not such a good idea, but who knows. Bias toward perceived lightness Another more annoying problem for which I have no solution for is with regards to the human perception being much more sensitive to light changes than hue. If you look at the first demo with the parrot, you may have observed the boxes are kind of thin. This is because the a and b components (respectively how green/red and blue/yellow the color is) have a much smaller amplitude compared to the L (perceived lightness). Here is a side by side comparison of the spread of colors between a stretched and normalized view: You may rightfully question whether this is a problem or not. In practice, this means that when K is low (let's say smaller than 8 or even 16), cuts along L will almost always be preferred, causing the picture to be heavily desaturated. This is because it tries to preserve the most significant attribute in human perception: the lightness. That particular picture is actually a pathological study case: 4 colors 8 colors 12 colors 16 colors We can see the hue timidly appearing around K=16 (specifically it starts being more strongly noticeable starting the cut K=13). Conclusion For now, I'm mostly done with this "week-end long project" into which I actually poured 2 or 3 months of lifetime. The FFmpeg patchset will likely be upstreamed soon so everyone should hopefully be able to benefit from it in the next release. It will also come with additional dithering methods, which implementation actually was a relaxing distraction from all this hardship. There are still many ways of improving this work, but it's the end of the line for me, so I'll trust the Internet with it.

over a year ago • 65 votes

GCC undefined behaviors are getting wild

Happy with my recent breakthrough in understanding C integer divisions after weeks of struggle, I was minding my own business having fun writing integer arithmetic code. Life was good, when suddenly… zsh: segmentation fault (core dumped). That code wasn't messing with memory much so it was more likely to be a side effect of an arithmetic overflow or something. Using -fsanitize=undefined quickly identified the issue, which confirmed the presence of an integer overflow. The fix was easy but something felt off. I was under the impression my code was robust enough against that kind of honest mistake. Turns out, the protecting condition I had in place should indeed have been enough, so I tried to extract a minimal reproducible case: #include <stdint.h> #include <stdio.h> #include <stdlib.h> uint8_t tab[0x1ff + 1]; uint8_t f(int32_t x) { if (x < 0) return 0; int32_t i = x * 0x1ff / 0xffff; if (i >= 0 && i < sizeof(tab)) { printf("tab[%d] looks safe because %d is between [0;%d[\n", i, i, (int)sizeof(tab)); return tab[i]; } return 0; } int main(int ac, char **av) { return f(atoi(av[1])); } The overflow can happen on x * 0x1ff. Since an integer overflow is undefined, GCC makes the assumption that it cannot happen, ever. In practice in this case it does, but the i >= 0 && i < sizeof(tab) condition should be enough to take care of it, whatever crazy value it becomes, right? Well, I have bad news: % cc -Wall -O2 overflow.c -o overflow && ./overflow 50000000 tab[62183] looks safe because 62183 is between [0;512[ zsh: segmentation fault (core dumped) ./overflow 50000000 Note: this is GCC 12.2.0 on x86-64. We have i=62183 as the result of the overflow, and nevertheless the execution violates the gate condition, spout a non-sense lie, go straight into dereferencing tab, and die miserably. Let's study what GCC is doing here. Firing up Ghidra we observe the following decompiled code: uint8_t f(int x) { int tmp; if (-1 < x) { tmp = x * 0x1ff; if (tmp < 0x1fffe00) { printf("tab[%d] looks safe because %d is between [0;%d[\n",(ulong)(uint)tmp / 0xffff, (ulong)(uint)tmp / 0xffff,0x200); return tab[(int)((uint)tmp / 0xffff)]; } } return '\0'; } When I said GCC makes the assumption that it cannot happen this is what I meant: tmp is not supposed to overflow so part of the condition I had in place was simply removed. More specifically since x can not be lesser than 0, and since GCC assumes a multiplication cannot overflow into a random value (that could be negative) because it is undefined behaviour, it then decides to drop the "redundant" i >= 0 condition because "it cannot happen". I reported that exact issue to GCC to make sure it wasn't a bug, and it was indeed confirmed to me that the undefined behaviour of an integer overflow is not limited in scope to whatever insane value it could take: it is apparently perfectly acceptable to mess up the code flow entirely. While I understand how attractive it can be from an optimization point of view, the paranoid developer in me is straight up terrified by the perspective of a single integer overflow removing security protection and causing such havoc. I've worked several years in a project where the integer overflows were (and probably still are) legion. Identifying and fixing of all them is likely a lifetime mission of several opinionated individuals. I'm expecting this article to make the rust crew go in a crusade again, and I think I might be with them this time. Edit: it was made clear to me while reading Predrag's blog that the key to my misunderstanding boils down to this: "Undefined behavior is not the same as implementation-defined behavior". While I was indeed talking about undefined behaviour, subconsciously I was thinking that the behaviour of an overflow on a multiplication would be "implementation-defined behaviour". This is not the case, it is indeed an undefined behaviour, and yes the compiler is free to do whatever it wants to because it is compliant with the specifications. It's my mistake of course, but to my defense, despite the arrogant comments I read, this confusion happens a lot. This happens I believe because it's violating the Principle of least astonishment. To illustrate this I'll take this interesting old OpenBSD developer blog post being concerned about the result of the multiplication rather than the invalidation of any guarantee with regard to what's going to happen to the execution flow (before and after). This is not uncommon and in my opinion perfectly understandable.

over a year ago • 60 votes

More in programming

Benjie's Humanoid Olympic Games

A gauntlet thrown

23 hours ago • 8 votes

Stumbling upon

Something like a channel changer, for the web. That's what the idea was at first. But it led to a whole new path of discovery that even the site's creators couldn't have predicted. The post Stumbling upon appeared first on The History of the Web.

3 hours ago • 3 votes

Some Love For Python

I really enjoyed watching Python: The Documentary (from CultRepo, formerly Honeypot, same makers as the TypeScript documentary). Personally, I don’t write much Python and am not involved in the broader Python community. That said, I love how this documentary covers a lot of the human problems in tech and not just the technical history of Python as language. For example: How do you handle succession from a pivotal creator? How do you deal with poor representation? How do you fund and steer open projects? How do you build community? How do you handle the fallout of major version changes? And honestly, all the stories around these topics as told from the perspective of Python feel like lessons to learn from. Here are a few things that stood out to me. Guido van Rossum, Creator of Python, Sounds Cool The film interviews Drew Houston, Founder/CEO at Dropbox, because he hired Python’s creator Guido van Rossum for a stint. This is what Drew had to say about his time working with Guido: It’s hard for me to think of someone who has had more impact with lower ego [than Guido] For tech, that’s saying something! Now that is a legacy if you ask me. The Python Community Sounds Cool Brett Cannon famously gave a talk at a Python conference where he said he “came for the language, but stayed for the community”. In the documentary they interview him and he adds: The community is the true strength of Pyhon. It’s not just the language, it’s the people. ❤️ This flies in the face of the current era we’re in, where it’s the technology that matters. How it disrupts or displaces people is insignificant next to the fantastic capabilities it purports to wield. But here’s this language surrounded by people who acknowledge that the community around the language is its true strength. People are the true strength. Let me call this out again, in case it’s not sinking in: Here’s a piece of technology where the people around it seem to acknowledge that the technology itself is only secondary to the people it was designed to serve. How incongruous is that belief with so many other pieces of technology we’ve seen through the years? What else do we have, if not each other? That’s something worth amplifying. Mariatta, Python Core Developer, Sounds Cool I absolutely loved the story of @mariatta@fosstodon.org. If you’re not gonna watch the documentary, at least watch the ~8 minutes of her story. Watched it? Ok, here’s my quick summary: She loves to program, but everywhere she looks it’s men. At work. At conferences. On core teams. She hears about pyladies and wants to go to Pycon where she can meet them. She goes to Pycon and sees Guido van Rossum stand up and say he wants 2 core contributors to Python that are female. She thinks, “Oh that’s cool! I’m not good enough for that, but I bet they’ll find someone awesome.” The next year she goes to the conference and finds out they’re still looking for those 2 core contributors. She thinks “Why not me?” and fires off an email to Guido. Here’s her recollection on composing that email: I felt really scared. I didn’t feel like I deserved mentorship from Guido van Rossum. I really hesitated to send this email to him, but in the end I realized I want to try. This was a great opportunity for me. I hit the send button. And later, her feelings on becoming the first female core contributor to Python: When you don’t have role models you can relate to, you don’t believe you can do it. ❤️ Mad respect. I love her story. As Jessica McKellar says in the film, Mariatta’s is an inspiring story and “a vision of what is possible in other communities”. Python Is Refreshing I’ve spent years in “webdev” circles — and there are some great ones — but this Python documentary was, to me, a tall, refreshing glass of humanity. Go Python! Email · Mastodon · Bluesky

20 hours ago • 2 votes

btrfs on a Raspberry Pi

I’m something of a filesystem geek, I guess. I first wrote about ZFS on Linux 14 years ago, and even before I used ZFS, I had used ext2/3/4, jfs, reiserfs, xfs, and no doubt some others. I’ve also used btrfs. I last posted about it in 2014, when I noted it has some advantages over … Continue reading btrfs on a Raspberry Pi →

an hour ago • 1 votes

chapter seven: nudge

“Why are you here on a Sunday?” “John’s in town,” I said. “And he knows I’m looking for him.” I’ve carried this case for five years. When Operant moved its compute out to Long Island—cheaper power, easier permits—it landed in my world by accident. Detective James Reese, Nassau County Police. Since then, every time I think I’ve got a straight line, the story bends. People call it “mind control.” That’s the wrong phrase. You hear that and you start hunting for sci-fi. What you should be hunting for is timing. There are the clean facts. It started with a private investigator caught at night inside Jane Street’s office. He was there to plant a device. Not a camera, not a mic. A flat plastic square the size of a drop ceiling tile, featureless, no lens, no obvious grill. If you tapped it with a knuckle it sounded dead, like dense foam. The FBI took the evidence, said as little as possible, and then Trump dissolved the Bureau and the chain of custody with it. The PI pled to B&E, did eighteen months, swore a friend offered him ten grand and a location. The friend never existed long enough for us to find. While the tile vanished, Operant didn’t. They grew. They put their name on the Ducks’ ballpark, donated to everyone they should, and pushed eight percent of Long Island’s power through their meter. Every time I asked questions, a lawyer answered them. I still have a job mostly because I don’t stick my questions in microphones. But the corporate espionage wasn’t the hook. The hook’s name was Tom Park. Young, gifted, on Operant’s “research” payroll. He died off the roof of their building. We asked for the CCTV. They delayed until the delay became its own story, and when the files came they were grainy enough you could convince yourself resolution had gone out of style. We couldn’t prove a cut. We couldn’t prove a lie. We could see a silhouette on the roof with a phone in his hand, see him put it away, and watch him walk forward like he’d decided to walk forward an hour ago. We pulled his phone records. The carrier said the device never left his parents’ house that night. The family’s router logs said the same thing—MAC associated all evening, steady signal, Netflix on the downstairs TV. At the time of death the rooftop access point didn’t record a roam. No one found a phone with the body. If you’re generous, you call that “inconsistency.” If you’ve been around long enough, you call it “choreography.” I didn’t see the tile again, but I kept a copy of the photos and I stared at the connector pads until I’d memorized the geometry. Four edge pads, power bus shape. Months later, a fire inspection at an Operant satellite site flagged “non-listed luminaires with integrated driver boards.” That’s code for “custom lights.” The brand on the sticker didn’t exist in any registry. It matched nothing you could buy. What does a ceiling tile do if it isn’t a ceiling tile? You can guess: a planar array under plastic, phaseable, a clock inside that doesn’t drift. You don’t need to read thoughts. You need to make the room keep time. We ran a small experiment in our squad room. Nothing that requires approval. We set up a tapping game on a laptop—left or right as quickly as you can when a cue appears. We added a desk lamp we could modulate in the last hundred milliseconds before the cue—no visible flicker, just PWM phase changes—and a piezo disc under the mouse pad that could make a vibration too soft to notice unless you were trying to notice. We told the script to wait until the model thought the subject was likely to pick left, then line up the lamp phase and the tick so “left” felt a hair earlier. The hit rate shifted eight points. The officers said it felt like the computer was “on it” that round. No one said they felt pushed. That’s the thing about timing: when it works, it feels like you were going to do it anyway. I went back to Tom. We subpoenaed what we could: badge swipes, elevator logs, building automation schedules for lights and HVAC. The elevator cabin he rode at 23:41 ran a “door nudge” cycle at floor 35—exact term in the manual. Not a stop, a shove. At the same minute the east conference rooms above ran a luminance ramp—35 to 50 percent and back down—logged as a “pattern test.” Two minutes later the air handlers kicked a “night purge,” unscheduled. The lobby mic’s spectrogram shows the change as a clean band sliding up. None of those facts make a person move. Together they draw a rhythm line through a building. We never found what Tom had in his hand, his “not a phone,” but a year after his death, one of their contractors quit and dumped an issue tracker on a public repo by accident. It was up for an hour before it vanished, but the internet is full of raccoons, and one of them sent me a ZIP. Half the issues were boring—install scripts, driver mismatches, bad GPIO pull-ups. The other half had words like “phase,” “latency,” “confidence gate,” “avoid visible artifacts,” “EEG-free,” and the tag “ROOM.” There was a set of comments on a bug titled “End Token Misfires.” The engineers were arguing about whether printing the full predicted sequence at the start of a session biased the subject into making it true. One person said that was the point. Another said if your only wins are the ones you can cause you aren’t measuring prediction anymore, you’re measuring control. The thread ends with a “resolved—won’t fix.” Mind control isn’t the right term. It makes people look for sci-fi. The right term is “nudge,” the one the elevator manuals use. You put your thumb on the timing. You don’t push the person; you lean on the moment. Tom stood on a roof with a clock in his pocket that belonged to the room, and a room that belonged to the company, and a company that had learned you can make a person look like a prediction if you take away all the moments where they would have surprised you. Sometimes I think the real trick isn’t the tile or the lights. It’s the bookkeeping. You arrange your systems so that there’s nothing to subpoena. The carrier shows a phone at home. The Wi-Fi shows a phone at home. The building shows a test pattern and a purge cycle and a polite door. Nothing is illegal in a log file. I keep a copy of that ZIP on a USB stick in my desk. There’s a folder called “SAFE_GATES” with a README someone wrote in plain English. “Do not schedule interventions if subject arousal > threshold. Do not schedule end token if subject mentions self-harm. Cooldown after consecutive errors.” Half the rules are commented out. The most recent commit message is just a shrug emoji. The worst part is how ordinary it all is. The elevator nudge. The lamp nudge. The HVAC tone. The not-a-phone. If you want to find the devil, you don’t go looking for horns. You go looking for clock edges. I told my sergeant I was taking the rest of the day. I stopped by a hardware store and bought a dimmer I knew I could open, and a roll of white tape. Back at the office I put a strip of tape on the lamp in interview room two, covering the LED that the supplier put there to indicate “smart mode.” We don’t use smart mode. We don’t use anything with a mode. When I left, the room looked like every other room. That’s the point. You only notice the timing when it slips. On the way home I drove past the Ducks ballpark. Operant’s name on the sign looked like every other naming deal. Families walking in, kids with foam fingers, warm light over the field. If you didn’t know to look, you’d think it was all just baseball.

yesterday • 2 votes

New here?