Full Width [alt+shift+f] Shortcuts [alt+shift+k]
Sign Up [alt+shift+s] Log In [alt+shift+l]
22
About half a year ago I encountered a paper bombastically titled “the ultimate conditional syntax”. It has the attractive goal of unifying pattern match with boolean if tests, and its solution is in some ways very nice. But it seems over-complicated to me, especially for something that’s a basic work-horse of programming. I couldn’t immediately see how to cut it down to manageable proportions, but recently I had an idea. I’ll outline it under the “penultimate conditionals” heading below, after reviewing the UCS and explaining my motivation. what the UCS? whence UCS out of scope penultimate conditionals dangling syntax examples antepenultimate breath what the UCS? The ultimate conditional syntax does several things which are somewhat intertwined and support each other. An “expression is pattern” operator allows you to do pattern matching inside boolean expressions. Like “match” but unlike most other expressions, “is” binds variables whose scope is the rest of the boolean expression...
a month ago

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from Tony Finch's blog

performance of random floats

A couple of years ago I wrote about random floating point numbers. In that article I was mainly concerned about how neat the code is, and I didn’t pay attention to its performance. Recently, a comment from Oliver Hunt and a blog post from Alisa Sireneva prompted me to wonder if I made an unwarranted assumption. So I wrote a little benchmark, which you can find in pcg-dxsm.git. As a brief recap, there are two basic ways to convert a random integer to a floating point number between 0.0 and 1.0: Use bit fiddling to construct an integer whose format matches a float between 1.0 and 2.0; this is the same span as the result but with a simpler exponent. Bitcast the integer to a float and subtract 1.0 to get the result. Shift the integer down to the same range as the mantissa, convert to float, then multiply by a scaling factor that reduces it to the desired range. This produces one more bit of randomness than the bithacking conversion. (There are other less basic ways.) My benchmark has 2 x 2 x 2 tests: bithacking vs multiplying 32 bit vs 64 bit sequential integers vs random integers Each operation is isolated from the benchmark loop by putting it in a separate translation unit (to prevent the compiler from inlining) and there is a fence instruction (ISB SY on ARM, MFENCE on AMD) in the loop to stop the CPU from overlapping successive iterations. I ran the benchmark on my Apple M1 Pro and my AMD Ryzen 7950X. In the table below, the leftmost column is the number of random bits. The top half measures sequential numbers, the bottom half is random numbers. The times are nanoseconds per operation, which includes the overheads of the benchmark loop and function call. arm amd 23 12.15 11.22 24 13.37 11.21 52 12.11 11.02 53 13.38 11.20 23 14.75 12.62 24 15.85 12.81 52 16.78 14.23 53 18.02 14.41 The times vary a little from run to run but the difference in speed of the various loops is reasonably consistent. I think my conclusion is that the bithacking conversion is about 1ns faster than the multiply conversion on my ARM box. There’s a subnanosecond difference on my AMD box which might indicate that the conversion takes different amounts of time depending on the value? Dunno.

2 weeks ago 13 votes
moka pot notes

In hot weather I like to drink my coffee in an iced latte. To make it, I have a very large Bialetti Moka Express. Recently when I got it going again after a winter of disuse, it took me a couple of attempts to get the technique right, so here are some notes as a reminder to my future self next year. It’s worth noting that I’m not fussy about my coffee: I usually drink pre-ground beans from the supermarket, with cream (in winter hot coffee) or milk and ice. basic principle When I was getting the hang of my moka pot, I learned from YouTube coffee geeks such as James Hoffmann that the main aim is for the water to be pushed through the coffee smoothly and gently. Better to err on the side of too little flow than too much. I have not had much success trying to make fine temperature adjustments while the coffee is brewing, because the big moka pot has a lot of thermal inertia: it takes a long time for any change in gas level to have any effect on on the coffee flow. routine fill the kettle and turn it on put the moka pot’s basket in a mug to keep it stable fill it with coffee (mine needs about 4 Aeropress scoops) tamp it down firmly [1] when the kettle has boiled, fill the base of the pot to just below the pressure valve (which is also just below the filter screen in the basket) insert the coffee basket, making sure there are no stray grounds around the edge where the seal will mate screw on the upper chamber firmly put it on a small gas ring turned up to the max [2] leave the lid open and wait for the coffee to emerge immediately turn the gas down to the minimum [3] the coffee should now come out in a steady thin stream without spluttering or stalling when the upper chamber is filled near the mouths of the central spout, it’ll start fizzing or spluttering [4] turn off the gas and pour the coffee into a carafe notes If I don’t tamp the grounds, the pot tends to splutter. I guess tamping gives the puck better integrity to resist channelling, and to keep the water under even pressure. Might be an effect of the relatively coarse supermarket grind? It takes a long time to get the pot back up to boiling point and I’m not sure that heating it up slower helps. The main risk, I think, is overshooting the ideal steady brewing state too much, but: With my moka pot on my hob the lowest gas flow on the smallest rings is just enough to keep the coffee flowing without stalling. The flow when the coffee first emerges is relatively fast, and it slows to the steady state several seconds after I turn the heat down, so I think the overshoot isn’t too bad. This routine turns almost all of the water into coffee, which Hoffmann suggests is a good result, and a sign that the pressure and temperature aren’t getting too high.

2 weeks ago 10 votes
the algebra of dependent types

TIL (or this week-ish I learned) why big-sigma and big-pi turn up in the notation of dependent type theory. I’ve long been aware of the zoo of more obscure Greek letters that turn up in papers about type system features of functional programming languages, μ, Λ, Π, Σ. Their meaning is usually clear from context but the reason for the choice of notation is usually not explained. I recently stumbled on an explanation for Π (dependent functions) and Σ (dependent pairs) which turn out to be nicer than I expected, and closely related to every-day algebraic data types. sizes of types The easiest way to understand algebraic data types is by counting the inhabitants of a type. For example: the unit type () has one inhabitant, (), and the number 1 is why it’s called the unit type; the bool type hass two inhabitants, false and true. I have even seen these types called 1 and 2 (cruelly, without explanation) in occasional papers. product types Or pairs or (more generally) tuples or records. Usually written, (A, B) The pair contains an A and a B, so the number of possible values is the number of possible A values multiplied by the number of possible B values. So it is spelled in type theory (and in Standard ML) like, A * B sum types Or disjoint union, or variant record. Declared in Haskell like, data Either a b = Left a | Right b Or in Rust like, enum Either<A, B> { Left(A), Right(B), } A value of the type is either an A or a B, so the number of possible values is the number of A values plus the number of B values. So it is spelled in type theory like, A + B dependent pairs In a dependent pair, the type of the second element depends on the value of the first. The classic example is a slice, roughly, struct IntSlice { len: usize, elem: &[i64; len], } (This might look a bit circular, but the idea is that an array [i64; N] must be told how big it is – its size is an explicit part of its type – but an IntSlice knows its own size. The traditional dependent “vector” type is a sized linked list, more like my array type than my slice type.) The classic way to write a dependent pair in type theory is like,      Σ len: usize . Array(Int, len) The big sigma binds a variable that has a type annotation, with a scope covering the expression after the dot – similar syntax to a typed lambda expression. We can expand a simple example like this into a many-armed sum type: either an array of length zero, or an array of length 1, or an array of length 2, … but in a sigma type the discriminant is user-defined instead of hidden. The number of possible values of the type comes from adding up all the alternatives, a summation just like the big sigma summation we were taught in school. ∑ a ∈ A B a When the second element doesn’t depend on the first element, we can count the inhabitants like, ∑ A B = A*B And the sigma type simplifies to a product type. telescopes An aside from the main topic of these notes, I also recently encountered the name “telescope” for a multi-part dependent tuple or record. The name “telescope” comes from de Bruijn’s AUTOMATH, one of the first computerized proof assistants. (I first encountered de Bruijn as the inventor of numbered lambda bindings.) dependent functions The return type of a dependent function can vary according to the argument it is passed. For example, to construct an array we might write something like, fn repeat_zero(len: usize) -> [i64; len] { [0; len] } The classic way to write the type of repeat_zero() is very similar to the IntSlice dependent pair, but with a big pi instead of a big sigma:      Π len: usize . Array(Int, len) Mmm, pie. To count the number of possible (pure, total) functions A ➞ B, we can think of each function as a big lookup table with A entries each containing a B. That is, a big tuple (B, B, … B), that is, B * B * … * B, that is, BA. Functions are exponential types. We can count a dependent function, where the number of possible Bs depends on which A we are passed, ∏ a ∈ A B a danger I have avoided the terms “dependent sum” and “dependent product”, because they seem perfectly designed to cause confusion over whether I am talking about variants, records, or functions. It kind of makes me want to avoid algebraic data type jargon, except that there isn’t a good alternative for “sum type”. Hmf.

3 weeks ago 12 votes
testing data structures per element

Recently, Alex Kladov wrote on the TigerBeetle blog about swarm testing data structures. It’s a neat post about randomized testing with Zig. I wrote a comment with an idea that was new to Alex @matklad, so I’m reposing a longer version here. differential testing problems grow / shrink random elements element-wise testing test loop data structure size invariants performance conclusion differential testing A common approach to testing data structures is to write a second reference implementation that has the same API but simpler and/or more obviously correct, though it uses more memory or is slower or less concurrent or otherwise not up to production quality. Then, run the production implementation and the reference implementation on the same sequence of operations, and verify that they produce the same results. Any difference is either a bug in the production implementation (probably) or a bug in the reference implementation (unlucky) or a bug in the tests (unfortunate). This is a straightforward differential testing pattern. problems There are a couple of difficulties with this kind of basic differential testing. grow / shrink The TigerBeetle article talks about adjusting the probabilities of different operations on the data structure to try to explore more edge cases. To motivate the idea, the article talks about adjusting the probabilities of adding or deleting items: If adding and deleting have equal probability, then the test finds it hard to grow the data structure to interesting sizes that might expose bugs. Unfortunately, if the probability of add is greater than del, then the data structure tends to grow without bound. If the probability of del is greater than add, then it tries to shrink from nothing: worse than equal probabilities! They could preload the data structure to test how it behaves when it shrinks, but a fixed set of probabilities per run is not good at testing both growth and shrinkage on the same test run on the same data structure. One way to improve this kind of test is to adjust the probability of add and del dynamically: make add more likely when the data structure is small, and del more likely when it is big. And maybe make add more likely in the first half of a test run and del more likely in the second half. random elements The TigerBeetle article glosses over the question of where the tests get fresh elements to add to the data structure. And its example is chosen so it doesn’t have to think about which elements get deleted. In my experience writing data structures for non-garbage-collected languages, I had to be more deliberate about how to create and destroy elements. That led to a style of test that’s more element-centric, as Alex described it. element-wise testing Change the emphasis so that instead of testing that two implementations match, test that one implementation obeys the expected behaviour. No need to make a drop-in replacement reference implementation! What I typically do is pre-allocate an array of elements, with slots that I can set to keep track of how each element relates to the data structure under test. The most important property is whether the element has been added or deleted, but there might be others related to ordering of elements, or values associated with keys, and so on. test loop Each time round the loop, choose at random an element from the array, and an action such as add / del / get / … Then, if it makes sense, perform the operation on the data structure with the element. For example, you might skip an add action if the element is already in the data structure, unless you can try to add it and expect an error. data structure size This strategy tends to grow the data structure until about 50% of the pre-allocated elements are inserted, then it makes a random walk around this 50% point. Random walks can diverge widely from their central point both in theory and in practice, so this kind of testing is reasonably effective at both growing and (to a lesser extent) shrinking the data structure. invariants I usually check some preconditions before an action, to verify that the data structure matches the expected properties of the chosen element. This can help to detect earlier that an action on one element has corrupted another element. After performing the action and updating the element’s properties, I check the updated properties as a postcondition, to make sure the action had the expected effects. performance John Regehr’s great tutorial, how to fuzz an ADT implementation, recommends writing a checkRep() function that thoroughly verifies a data structure’s internal consistency. A checkRep() function is a solid gold testing tool, but it is O(n) at least and typically very slow. If you call checkRep() frequently during testing, your tests slow down dramatically as your data structure gets larger. I like my per-element invariants to be local and ideally O(1) or O(log n) at worst, so they don’t slow down the tests too much. conclusion Recently I’ve used this pattern to exhibit concurrency bugs in an API that’s hard to make thread-safe. Writing the tests has required some cunning to work out what invariants I can usefully maintain and test; what variety of actions I can use to stress those invariants; and what mix of elements + actions I need so that my tests know which properties of each element should be upheld and which can change. I’m testing multiple implementations of the same API, trying to demonstrate which is safest. Differential testing can tell me that implementations diverge, but not which is correct, whereas testing properties and invariants more directly tells me whether an implementation does what I expect. (Or gives me a useless answer when my tests are weak.) Which is to say that this kind of testing is a fun creative challenge. I find it a lot more rewarding than example-based testing.

a month ago 10 votes

More in programming

2025-06-22 Sun: Ban std::string

The use of std::string should be banned in C++ code bases. I’m sure this statement sounds like heresy and you want to burn me at stake. But is it really controversial? Java, C#, Go, JavaScript, Python, Ruby, PHP: they all have immutable strings that are basically 2 machine words: a pointer to string data and size of the string. If they have an equivalent of std:string it’s something like StringBuilder. C++ should also use immutable strings in 97% of situations. The problem is gravity: the existing code, the culture. They all pull you strongly towards std::string and going against the current is the hardest thing there is. There isn’t a standard type for that. You can use newish std::span<char*> but there really should be std::str (or some such). I did that in SumatraPDF where I mostly pass char* but I don’t expect many other C++ code bases to switch away from std::string.

3 hours ago 1 votes
In Praise of “Normal” Engineers

This article was originally commissioned by Luca Rossi (paywalled) for refactoring.fm, on February 11th, 2025. Luca edited a version of it that emphasized the importance of building “10x engineering teams” . It was later picked up by IEEE Spectrum (!!!), who scrapped most of the teams content and published a different, shorter piece on March […]

3 days ago 6 votes
Optimizing calling Windows DLL functions in Go

Go team wrote golang.org/x/sys/windows package to call functions in a Windows DLL. Their way is inefficient and this article describes a better way. The sys/windows way To call a function in a DLL, let’s say kernel32.dll, we must: load the dll into memory with LoadLibrary get the address of a function in the dll call the function at that address Here’s how it looks when you use sys/windows library: var ( libole32 *windows.LazyDLL coCreateInstance *windows.LazyProc ) func init() { libole32 = windows.NewLazySystemDLL("ole32.dll") coCreateInstance = libole32.NewProc("CoCreateInstance") } func CoCreateInstance(rclsid *GUID, pUnkOuter *IUnknown, dwClsContext uint32, riid *GUID, ppv *unsafe.Pointer) HRESULT { ret, _, _ := syscall.SyscallN(coCreateInstance.Addr(), 5, uintptr(unsafe.Pointer(rclsid)), uintptr(unsafe.Pointer(pUnkOuter)), uintptr(dwClsContext), uintptr(unsafe.Pointer(riid)), uintptr(unsafe.Pointer(ppv)), 0, ) return HRESULT(ret) } The problem The problem is that this is memory inefficient. For every function all we need is: name of the function to get its address in a dll. That is a string so its 8 bytes (address of the string) + 8 bytes (size of the string) + the content of the string. address of a function, which is 8 bytes on a 64-bit CPU Unfortunately in sys/windows each function requires this: type LazyProc struct { Name string mu sync.Mutex l *LazyDLL proc *Proc } type Proc struct { Dll *DLL Name string addr uintptr } // sync.Mutex type Mutex struct { _ noCopy mu isync.Mutex } // isync.Mutex type Mutex struct { state int32 sema uint32 } Let’s eyeball the size of all those structures: LazyProc : 16 + sizeof(Mutex) + 8 + 8 = 32 + sizeof(Mutex) Proc : 8 + 16 + 8 = 32 Mutex : 8 Total: 32 + 32 + 8 = 72 and that’s not counting possible memory padding for allocations. Windows has a lot of functions so this adds up. Additionally, at startup we call NewProcfor every function, even if they are not used by the program. This increases startup time. The better way What we ultimately need is uintptr for the address of the function. It’ll be lazily looked up. Let’s say we use 8 functions from ole32.dll. We can use a single array of uintptr values for storing function pointers: var oleFuncPtrs = [8]uintptr var oleFuncNames = []string{"CoCreateInstance", "CoGetClassObject", ... } const kCoCreateInstance = 0 const kCoGetClassObject = 1 // etc. const kFuncMissing = 1 func funcAddrInDLL(dll *windows.LazyDLL, funcPtrs []uintptr, funcIdx int, funcNames []string) uintptr { addr := funcPtrs[funcIdx]; if addr == kFuncMissing { // we already tried to look it up and didn't find it // this can happen becuse older version of Windows might not implement this function return 0 } if addr != 0 { return addr } // lookup the funcion by name in dll name := funcNames[funcIdx] /// ... return addr } In real life this would need multi-threading protection with e.g. a mutex. Saving on strings The following is not efficient: var oleFuncNames = []string{"CoCreateInstance", "CoGetClassObject", ... } In addition to the text of the string Go needs 16 bytes: 8 for a pointer to the string and 8 for the size of the string. We can be more efficient by storing all names as a single string: var oleFuncNames ` CoCreateInstance CoGetClassObject ` Only when we’re looking up the function by name we need to construct temporary string that is a slice of oleFuncNames. We need to know the offset and size inside oleFuncNames which we can cleverly encode as a single number: // Auto-generated shell procedure identifier: cache index | str start | str past-end. const ( _PROC_SHCreateItemFromIDList _PROC_SHELL = 0 | (9 << 16) | (31 << 32) _PROC_SHCreateItemFromParsingName _PROC_SHELL = 1 | (32 << 16) | (59 << 32) // ... ) We pack the info into a single number: bits 0-15 : index of function in array of function pointers bits 16-31: start of function name in multi-name string bits 32-47: end of function name in multi-name string This technique requires code generation. It would be too difficult to write those numbers manually. References This technique is used in https://github.com/rodrigocfd/windigo win32 bindings Go library. See e.g. https://github.com/rodrigocfd/windigo/blob/master/internal/dll/dll_gdi.go

4 days ago 5 votes
Lessons along the EndBOX journey

How a wild side-quest became the source of many of the articles you’ve read—and have come to expect—in this publication

5 days ago 6 votes
Making System Calls in x86-64 Assembly

Watch now | Privilege levels, syscall conventions, and how assembly code talks to the Linux kernel

6 days ago 7 votes