The International Obfuscated C Code Contest has a newly revamped web site, and the Judges have announced the 28th contest, to coincide with its 40th anniversary. (Or 41st?) The Judges have also updated the archive of past winners so that as many of them as possible work on modern systems. Accordingly, I took a look at my 1998 winner to see how much damage time hath wrought. When it is built, my program needs to go through the C preprocessor twice. There are a few reasons: It’s part of coercing the C compiler into compiling OFL, an obfuscated functional language. OFL has keywords l and b, short for let and be, so for example the function for constructing a pair is defined as l pair b (BB (B (B K)) C CI) In a less awful language that might be written let pair = λx λy λf λg (f x y) Anyway, the first pass of the C preprocessor turns a l (let) declaration into a macro #define pair b (BB (B (B K)) C CI) And the second pass expands the macros. (There’s a joke in the README that...
2 months ago


More from Tony Finch's blog

constantly divisionless random numbers

Last year I wrote about inlining just the fast path of Lemire’s algorithm for nearly-divisionless unbiased bounded random numbers. The idea was to reduce code bloat by eliminating lots of copies of the random number generator in the rarely-executed slow paths. However a simple split prevented the compiler from being able to optimize cases like pcg32_rand(1 << n), so a lot of the blog post was toying around with ways to mitigate this problem.

On Monday while procrastinating a different blog post, I realised that it’s possible to do better: there’s a more general optimization which gives us the 1 << n special case for free.

nearly divisionless

Lemire’s algorithm has about 4 neat tricks:

(1) use multiplication instead of division to reduce the output of a random number generator modulo some limit
(2) eliminate the bias in (1) by (counterintuitively) looking at the lower digits
(3) fun modular arithmetic to calculate the reject threshold for (2)
(4) arrange the reject tests to avoid the slow division in (3) in most cases

The nearly-divisionless logic in (4) leads to two copies of the random number generator, in the fast path and the slow path. Generally speaking, compilers don’t try to deduplicate code that was written by the programmer, so they can’t simplify the nearly-divisionless algorithm very much when the limit is constant.
constantly divisionless

Two points occurred to me:

- when the limit is constant, the reject threshold (3) can be calculated at compile time
- when the division is free, there’s no need to avoid it using (4)

These observations suggested that when the limit is constant, the function for random numbers less than a limit should be written:

    static inline uint32_t
    pcg32_rand_const(pcg32_t *rng, uint32_t limit) {
        uint32_t reject = -limit % limit;
        uint64_t sample;
        do sample = (uint64_t)pcg32_random(rng) * (uint64_t)limit;
        while ((uint32_t)(sample) < reject);
        return ((uint32_t)(sample >> 32));
    }

This has only one call to pcg32_random(), saving space as I wanted, and the compiler is able to eliminate the loop automatically when the limit is a power of two. The loop is smaller than a call to an out-of-line slow path function, so it’s better all round than the code I wrote last year.

algorithm selection

As before it’s possible to automatically choose the constantly-divisionless or nearly-divisionless algorithms depending on whether the limit is a compile-time constant or run-time variable, using arcane C tricks or GNU C __builtin_constant_p().

I have been idly wondering how to do something similar in other languages. Rust isn’t very keen on automatic specialization, but it has a reasonable alternative. The thing to avoid is passing a runtime variable to the constantly-divisionless algorithm, because then it becomes never-divisionless. Rust has a much richer notion of compile-time constants than C, so it’s possible to write a method like the following, which can’t be misused:

    pub fn upto<const LIMIT: u32>(&mut self) -> u32 {
        let reject = LIMIT.wrapping_neg().wrapping_rem(LIMIT);
        loop {
            let (lo, hi) = self.get_u32().embiggening_mul(LIMIT);
            if lo < reject {
                continue;
            } else {
                return hi;
            }
        }
    }

    assert!(rng.upto::<42>() < 42);

(embiggening_mul is my stable replacement for the unstable widening_mul API.)
This is a nugatory optimization, but there are more interesting cases where it makes sense to choose a different implementation for constant or variable arguments – that is, the constant case isn’t simply a constant-folded or partially-evaluated version of the variable case. Regular expressions might be lex-style or pcre-style, for example. It’s a curious question of language design whether it should be possible to write a library that provides a uniform API that automatically chooses constant or variable implementations, or whether the user of the library must make the choice explicit. Maybe I should learn some Zig to see how its comptime works.
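To try the constantly-divisionless function in isolation, here is a minimal sketch in C. The post does not show pcg32_t or pcg32_random(), so the versions below are stand-ins based on the widely published PCG32 (XSH RR 64/32) reference code, and the assert on a zero limit is an addition for illustration rather than part of the original function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a minimal PCG32 (XSH RR 64/32) generator. The post's own
 * pcg32_t and pcg32_random() are not shown, so these are stand-ins. */
typedef struct {
    uint64_t state;
    uint64_t inc; /* must be odd */
} pcg32_t;

static inline uint32_t pcg32_random(pcg32_t *rng) {
    uint64_t old = rng->state;
    /* LCG state update */
    rng->state = old * 6364136223846793005ULL + rng->inc;
    /* output permutation: xorshift, then a data-dependent rotate */
    uint32_t xorshifted = (uint32_t)(((old >> 18) ^ old) >> 27);
    uint32_t rot = (uint32_t)(old >> 59);
    return (xorshifted >> rot) | (xorshifted << ((32 - rot) & 31));
}

/* Unbiased random number in [0, limit), intended for constant limits. */
static inline uint32_t pcg32_rand_const(pcg32_t *rng, uint32_t limit) {
    assert(limit != 0); /* hypothetical guard: limit 0 would divide by zero */
    /* 2^32 mod limit, computed in 32-bit unsigned arithmetic */
    uint32_t reject = -limit % limit;
    uint64_t sample;
    do {
        /* high half is the candidate result, low half decides rejection */
        sample = (uint64_t)pcg32_random(rng) * (uint64_t)limit;
    } while ((uint32_t)sample < reject);
    return (uint32_t)(sample >> 32);
}
```

When limit is a power of two, reject evaluates to zero, so the loop condition can never be true; that is what lets a compiler drop the loop for constant arguments such as 1 << n.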

a week ago 11 votes
random numbers from pcg32 at 200 Gbit/s

One of the neat things about the PCG random number generator by Melissa O’Neill is its use of instruction-level parallelism: the PCG state update can run in parallel with its output permutation. However, PCG only has a limited amount of ILP, about 3 instructions. Its overall speed is limited by the rate at which a CPU can run a sequence where the output of one multiply-add feeds into the next multiply-add.

… Or is it?

With some linear algebra and some AVX512, I can generate random numbers from a single instance of pcg32 at 200 Gbit/s on a single core. This is the same sequence of random numbers generated in the same order as normal pcg32, but more than 4x faster. You can look at the benchmark in my pcg-dxsm repository.

skip ahead

One of the slightly weird features that PCG gets from its underlying linear congruential generator is “seekability”: you can skip ahead k steps in the stream of random numbers in log(k) time. The PCG paper (in section 4.3.1) cites Forrest Brown’s paper, Random Numbers with Arbitrary Strides, which explains that the skip-ahead feature is useful for reproducibility of Monte Carlo simulations.

But what caught my eye is the skip-ahead formula. Rephrased in programmer style,

    state[n+k] = state[n] * pow(MUL, k) + inc * (pow(MUL, k) - 1) / (MUL - 1)

the insight

The skip-ahead formula says that we can calculate a future state using a couple of multiplications. The skip-ahead multipliers depend only on the LCG multiplier, not on the variable state, nor on the configurable increment. That means that for a fixed skip ahead, we can precalculate the multipliers before compile time. The skip-ahead formula allows us to unroll the PCG data dependency chain.
Normally, four iterations of the PCG state update look like,

    state0 = rng->state
    state1 = state0 * MUL + rng->inc
    state2 = state1 * MUL + rng->inc
    state3 = state2 * MUL + rng->inc
    state4 = state3 * MUL + rng->inc
    rng->state = state4

With the skip-ahead multipliers it looks like,

    state0 = rng->state
    state1 = state0 * MULs1 + rng->inc * MULi1
    state2 = state0 * MULs2 + rng->inc * MULi2
    state3 = state0 * MULs3 + rng->inc * MULi3
    state4 = state0 * MULs4 + rng->inc * MULi4
    rng->state = state4

These state calculations can be done in parallel using NEON or AVX vector instructions. The disadvantage is that calculating future states in parallel requires more multiplications than doing so in series, but that’s OK because modern CPUs have lots of ALUs.

multipliers

The skip-ahead formula is useful for jumping ahead long distances, because (as Forrest Brown explained) you can do the exponentiation in log(k) time using repeated squaring. (The same technique is used for modexp in RSA.) But I’m only interested in the first few skip-ahead multipliers.

I’ll define the linear congruential generator as:

    lcg(s, inc) = s * MUL + inc

Which is used in PCG’s normal state update like:

    rng->state = lcg(rng->state, rng->inc)

To precalculate the first few skip-ahead multipliers, we iterate the LCG starting from zero and one, like this:

    MULs0 = 1
    MULs1 = lcg(MULs0, 0)
    MULs2 = lcg(MULs1, 0)

    MULi0 = 0
    MULi1 = lcg(MULi0, 1)
    MULi2 = lcg(MULi1, 1)

My benchmark code’s commentary includes a proof by induction, which I wrote to convince myself that these multipliers are correct.

trying it out

To explore how well this skip-ahead idea works, I have written a couple of variants of my pcg32_bytes() function, which simply iterates pcg32 and writes the results to a byte array. The variants have an adjustable amount of parallelism.

One variant is written as scalar code in a loop that has been unrolled by hand a few times. I wanted to see if standard C gets a decent speedup, perhaps from autovectorization.
The other variant uses the GNU C portable vector extensions to calculate pcg32 in an explicitly parallel manner.

The benchmark also ensures the output from every variant matches the baseline pcg32_bytes().

results

The output from the benchmark harness lists:

- the function variant: either the baseline version (shown as __), or uN for a scalar loop unrolled N times, or xN for vector code with N lanes
- its speed in bytes per nanosecond (aka gigabytes per second)
- its performance relative to the baseline

There are small differences in style between the baseline and u1 functions, but their performance ought to be basically the same.

Apple clang 16, MacBook Pro M1 Pro. This compiler is eager and fairly effective at autovectorizing. ARM NEON isn’t big enough to get a speedup from 8 lanes of parallelism.

    variant   bytes/ns   relative
    __          3.66     x 1.00
    u1          3.90     x 1.07
    u2          6.40     x 1.75
    u3          7.66     x 2.09
    u4          8.52     x 2.33
    x2          7.59     x 2.08
    x4         10.49     x 2.87
    x8         10.40     x 2.84

The following results were from my AMD Ryzen 9 7950X running Debian 12 “bookworm”, comparing gcc vs clang, and AVX2 vs AVX512. gcc is less keen to autovectorize so it doesn’t do very well with the unrolled loops. (Dunno why u1 is so much slower than the baseline.)
gcc 12.2 -march=x86-64-v3

    variant   bytes/ns   relative
    __          5.57     x 1.00
    u1          5.13     x 0.92
    u2          5.03     x 0.90
    u3          7.01     x 1.26
    u4          6.83     x 1.23
    x2          3.96     x 0.71
    x4          8.00     x 1.44
    x8         12.35     x 2.22

clang 16.0 -march=x86-64-v3

    variant   bytes/ns   relative
    __          4.89     x 1.00
    u1          4.08     x 0.83
    u2          8.76     x 1.79
    u3         10.43     x 2.13
    u4         10.81     x 2.21
    x2          6.67     x 1.36
    x4         12.67     x 2.59
    x8         15.27     x 3.12

gcc 12.2 -march=x86-64-v4

    variant   bytes/ns   relative
    __          5.53     x 1.00
    u1          5.53     x 1.00
    u2          5.55     x 1.00
    u3          6.99     x 1.26
    u4          6.79     x 1.23
    x2          4.75     x 0.86
    x4         17.14     x 3.10
    x8         20.90     x 3.78

clang 16.0 -march=x86-64-v4

    variant   bytes/ns   relative
    __          5.53     x 1.00
    u1          4.25     x 0.77
    u2          7.94     x 1.44
    u3          9.31     x 1.68
    u4         15.33     x 2.77
    x2          9.07     x 1.64
    x4         21.74     x 3.93
    x8         26.34     x 4.76

That last result is pcg32 generating random numbers at 200 Gbit/s.
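The multiplier precalculation described in the post can be checked with a few lines of scalar C. This is only a sketch, not the benchmark code from the pcg-dxsm repository: the names skip_multipliers, skip_state and check_skip are hypothetical, and PCG_MUL is assumed to be the usual 64-bit PCG multiplier, which the post does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the standard 64-bit PCG LCG multiplier. */
#define PCG_MUL 6364136223846793005ULL

/* One step of the linear congruential generator (wraps mod 2^64). */
static inline uint64_t lcg(uint64_t s, uint64_t inc) {
    return s * PCG_MUL + inc;
}

/* Fill muls[0..n] and muli[0..n] by iterating the LCG from one and zero:
 * muls[k] multiplies the starting state, muli[k] multiplies the increment. */
static void skip_multipliers(uint64_t muls[], uint64_t muli[], size_t n) {
    muls[0] = 1;
    muli[0] = 0;
    for (size_t k = 1; k <= n; k++) {
        muls[k] = lcg(muls[k - 1], 0);
        muli[k] = lcg(muli[k - 1], 1);
    }
}

/* State k steps ahead, computed directly from the starting state. */
static inline uint64_t skip_state(uint64_t state0, uint64_t inc,
                                  uint64_t muls_k, uint64_t muli_k) {
    return state0 * muls_k + inc * muli_k;
}

/* Compare the skip-ahead states with four serial LCG steps. */
static void check_skip(uint64_t state0, uint64_t inc) {
    enum { N = 4 };
    uint64_t muls[N + 1], muli[N + 1];
    skip_multipliers(muls, muli, N);
    uint64_t serial = state0;
    for (size_t k = 1; k <= N; k++) {
        serial = lcg(serial, inc);
        assert(skip_state(state0, inc, muls[k], muli[k]) == serial);
    }
}
```

Calling check_skip() with any starting state and increment exercises the same identity that the post's proof by induction establishes. Each skip_state() call depends only on state0, which is the property that lets the four states be computed in separate vector lanes.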

3 weeks ago 13 votes
nsnotifyd-2.3 released

D’oh, I lost track of a bug report that should have been fixed in nsnotifyd-2.2. Thus, hot on the heels of the previous release, here’s nsnotifyd-2.3. Sorry for causing extra work to my uncountably many users!

The nsnotifyd daemon monitors a set of DNS zones and runs a command when any of them change. It listens for DNS NOTIFY messages so it can respond to changes promptly. It also uses each zone’s SOA refresh and retry parameters to poll for updates if nsnotifyd does not receive NOTIFY messages more frequently. It comes with a client program nsnotify for sending notify messages.

This nsnotifyd-2.3 release includes some bug fixes:

- When nsnotifyd received a SIGINT or SIGTERM while running the command, it failed to handle it correctly. Now it exits promptly. Many thanks to Athanasius for reporting the bug!
- Miscellaneous minor code cleanup and compiler warning suppression.

Thanks also to Dan Langille who sent me a lovely appreciation:

“Now that I think of it, nsnotifyd is in my favorite group of software. That group is software I forget I’m running, because they just run and do the work. For years. I haven’t touched, modified, or configured nsnotifyd and it just keeps doing the job.”

3 months ago 53 votes
nsnotifyd-2.2 released

I have made a new release of nsnotifyd, a tiny DNS server that just listens for NOTIFY messages and runs a script when one of your zones changes.

This nsnotifyd-2.2 release includes a new feature: nsnotify can now send NOTIFY messages from a specific source address. Thanks to Adam Augustine for the suggestion.

I like receiving messages that say things like, “Thanks for making this useful tool available for free.”

3 months ago 58 votes

More in programming

Supa Pecha Kucha


17 hours ago 3 votes
The Power of Principles in Web Development Decision-Making (article)

Discover how The Epic Programming Principles can transform your web development decision-making, boost your career, and help you build better software.

9 hours ago 2 votes
Closing the borders alone won't fix the problems

Denmark has been reaping lots of delayed accolades from its relatively strict immigration policy lately. The Swedes and the Germans in particular are now eager to take inspiration from The Danish Model, given their predicaments. The very same countries that until recently condemned the lack of open-arms/open-border policies they would champion as Moral Superpowers.

But even in Denmark, thirty years after the public opposition to mass immigration started getting real political representation, the consequences of culturally-incompatible descendants from MENAPT continue to stress the high-trust societal model. Here are just three major cases that have been covered in the Danish media in 2025 alone:

- Danish public schools are increasingly struggling with violence and threats against students and teachers, primarily from descendants of MENAPT immigrants. In schools with 30% or more immigrants, violence is twice as prevalent. This is causing a flight to private schools from parents who can afford it (including some Syrians!). Some teachers are quitting the profession as a result, saying "the Quran run the class room".
- Danish women are increasingly feeling unsafe in the nightlife. The mayor of the country's third largest city, Odense, says he knows why: "It's groups of young men with an immigrant background that's causing it. We might as well be honest about that." But unfortunately, the only suggestion he had to deal with the problem was that "when [the women] meet these groups... they should take a big detour around them".
- A soccer club from the infamous ghetto area of Vollsmose got national attention because every other team in their league refused to play them, due to the team's long history of violent assaults and death threats against opposing teams and referees. Bizarrely, this led to a situation where the team got to the top of its division because they'd "win" every forfeited match.

Problems of this sort have existed in Denmark for well over thirty years.
So in a way, none of this should be surprising. But it actually is. Because it shows that long-term assimilation just isn't happening at a scale to tackle these problems. In fact, data shows the opposite: Descendants of MENAPT immigrants are more likely to be violent and troublesome than their parents. That's an explosive point because it blows up the thesis that time will solve these problems. Showing instead that it actually just makes it worse. And then what? This is particularly pertinent in the analysis of Sweden. After the "far right" party of the Swedish Democrats got into government, the new immigrant arrivals have plummeted. But unfortunately, the net share of immigrants is still increasing, in part because of family reunifications, and thus the problems continue. Meaning even if European countries "close the borders", they're still condemned to deal with the damning effects of maladjusted MENAPT immigrant descendants for decades to come. If the intervention stops there. There are no easy answers here. Obviously, if you're in a hole, you should stop digging. And Sweden has done just that. But just because you aren't compounding the problem doesn't mean you've found a way out. Denmark proves to be both a positive example of minimizing the digging while also a cautionary tale that the hole is still there.

19 hours ago 2 votes
We all lose when art is anonymised

One rabbit hole I can never resist going down is finding the original creator of a piece of art. This sounds simple, but it’s often quite difficult. The Internet is a maze of social media accounts that only exist to repost other people’s art, usually with minimal or non-existent attribution. A popular image spawns a thousand copies, each a little further from the original. Signatures get cropped, creators’ names vanish, and we’re left with meaningless phrases like “no copyright intended”, as if that magically absolves someone of artistic theft.

Why do I do this? I’ve always been a bit obsessive, a bit completionist. I’ve worked in cultural heritage for eight years, which has made me more aware of copyright and more curious about provenance. And it’s satisfying to know I’ve found the original source, that I can’t dig any further.

This takes time. It’s digital detective work, using tools like Google Lens and TinEye, and it’s not always easy or possible. Sometimes the original pops straight to the top, but other times it takes a lot of digging to find the source of an image.

So many of us have become accustomed to art as an endless, anonymous stream of “content”. A beautiful image appears in our feed, we give it a quick heart, and scroll on, with no thought for the human who sweated blood and tears to create it. That original artist feels distant, disconnected. Whatever benefit they might get from the “exposure” of their work going viral, they don’t get any if their name has been removed first.

I came across two examples recently that remind me it’s not just artists who miss out – it’s everyone who enjoys art.

I saw a photo of some traffic lights on Tumblr. I love their misty, nighttime aesthetic, the way the bright colours of the lights cut through the fog, the totality of the surrounding darkness. But there was no name – somebody had just uploaded the image to their Tumblr page, it was reblogged a bunch of times, and then it appeared on my dashboard. Who took it?
I used Google Lens to find the original photographer: Lucas Zimmerman. Then I discovered it was part of a series. And there was a sequel. I found interviews. Context. Related work. I found all this cool stuff, but only because I knew Lucas’s name. Traffic Lights, by Lucas Zimmerman. Published on Behance.net under a CC BY‑NC 4.0 license, and reposted here in accordance with that license. The second example was a silent video of somebody making tiny chess pieces, just captioned “wow”. It was clearly an edit of another video, with fast-paced cuts to make it accommodate a short attention span – and again with no attribution. This was a little harder to find – I had to search several frames in Google Lens before I found a summary on a Russian website, which had a link to a YouTube video by metalworker and woodworker Левша (Levsha). This video is four times longer than the cut-up version I found, in higher resolution, and with commentary from the original creator. I don’t speak Russian, but YouTube has auto-translated subtitles. Now I know how this amazing set was made, and I have a much better understanding of the materials and techniques involved. (This includes the delightful name Wenge wood, which I’d never heard before.) https://youtube.com/watch?v=QoKdDK3y-mQ A piece of art is more than just a single image or video. It’s a process, a human story. When art is detached from its context and creator, we lose something fundamental. Creators lose the chance to benefit from their work, and we lose the opportunity to engage with it in a deeper way. We can’t learn how it was made, find their other work, or discover how to make similar art for ourselves. The Internet has done many wonderful things for art, but it’s also a machine for endless copyright infringement. It’s not just about generative AI and content scraping – those are serious issues, but this problem existed long before any of us had heard of ChatGPT. It’s a thousand tiny paper cuts. 
How many of us have used an image from the Internet because it showed up in a search, without a second thought for its creator? When Google Images says “images may be subject to copyright”, how many of us have really thought about what that means? Next time you want to use an image from the web, look to see if it’s shared under a license that allows reuse, and make sure you include the appropriate attribution – and if not, look for a different image. Finding the original creator is hard, sometimes impossible. The Internet is full of shadows: copies of things that went offline years ago. But when I succeed, it feels worth the effort – both for the original artist and myself. When I read a book or watch a TV show, the credits guide me to the artists, and I can appreciate both them and the rest of their work. I wish the Internet was more like that. I wish the platforms we rely on put more emphasis on credit and attribution, and the people behind art. The next time an image catches your eye, take a moment. Who made this? What does it mean? What’s their story?

yesterday 1 votes
Apple does AI as Microsoft did mobile

When the iPhone first appeared in 2007, Microsoft was sitting pretty with their mobile strategy. They'd been early to the market with Windows CE, they were fast-following the iPod with their Zune. They also had the dominant operating system, the dominant office package, and control of the enterprise. The future on mobile must have looked so bright! But of course now, we know it wasn't. Steve Ballmer infamously dismissed the iPhone with a chuckle, as he believed all of Microsoft's past glory would guarantee them mobile victory. He wasn't worried at all. He clearly should have been! After reliving that Ballmer moment, it's uncanny to watch this CNBC interview from one year ago with Johny Srouji and John Ternus from Apple on their AI strategy. Ternus even repeats the chuckle!! Exuding the same delusional confidence that lost Ballmer's Microsoft any serious part in the mobile game.  But somehow, Apple's problems with AI seem even more dire. Because there's apparently no one steering the ship. Apple has been promising customers a bag of vaporware since last fall, and they're nowhere close to being able to deliver on the shiny concept demos. The ones that were going to make Apple Intelligence worthy of its name, and not just terrible image generation that is years behind the state of the art. Nobody at Apple seems able or courageous enough to face the music: Apple Intelligence sucks. Siri sucks. None of the vaporware is anywhere close to happening. Yet as late as last week, you have Cook promoting the new MacBook Air with "Apple Intelligence". Yikes. This is partly down to the org chart. John Giannandrea is Apple's VP of ML/AI, and he reports directly to Tim Cook. He's been in the seat since 2018. But Cook evidently does not have the product savvy to be able to tell bullshit from benefit, so he keeps giving Giannandrea more rope. 
Now the fella has hung Apple's reputation on vaporware, promised all iPhone 16 customers something magical that just won't happen, and even spec-bumped all their devices with more RAM for nothing but diminished margins. Ouch. This is what regression to the mean looks like. This is what fiefdom management looks like. This is what having a company run by a logistics guy looks like. Apple needs a leadership reboot, stat. That asterisk is a stain.

2 days ago 3 votes