The Go team wrote the golang.org/x/sys/windows package to call functions in a Windows DLL. Their way is inefficient and this article describes a better way. The sys/windows way To call a function in a DLL, let’s say kernel32.dll, we must: load the DLL into memory with LoadLibrary, get the address of a function in the DLL, and call the function at that address. Here’s how it looks when you use the sys/windows library: var ( libole32 *windows.LazyDLL coCreateInstance *windows.LazyProc ) func init() { libole32 = windows.NewLazySystemDLL("ole32.dll") coCreateInstance = libole32.NewProc("CoCreateInstance") } func CoCreateInstance(rclsid *GUID, pUnkOuter *IUnknown, dwClsContext uint32, riid *GUID, ppv *unsafe.Pointer) HRESULT { /* SyscallN takes the function address followed by the arguments; no argument count or padding */ ret, _, _ := syscall.SyscallN(coCreateInstance.Addr(), uintptr(unsafe.Pointer(rclsid)), uintptr(unsafe.Pointer(pUnkOuter)), uintptr(dwClsContext), uintptr(unsafe.Pointer(riid)), uintptr(unsafe.Pointer(ppv)), ) return HRESULT(ret) } The...
yesterday


More from Krzysztof Kowalczyk blog

Simplest C++ callback, from SumatraPDF

SumatraPDF is a Windows GUI application for viewing PDF, ePub and comic books written in C++. A common need in GUI programs is a callback. E.g. when a button is clicked we need to call a function with some data identifying which button was clicked. Callback is therefore a combo of function and data and we need to call the function with data as an argument. In programming language lingo, code + data combo is called a closure. C++ has std::function<> and lambdas (i.e. closures). Lambdas convert to std::function<> and capture local variables. Lambdas can be used as callbacks so problems solved? Not for me. I’ve used std::function<> and I’ve used lambdas and what pushed me away from them were crash reports. I’ve implemented crash reporting and it’s been very useful. The problem with lambdas is that they are implemented as compiler-generated functions. They get non-descriptive, auto-generated names. When I look at call stack of a crash I can’t map the auto-generated closure name to a function in my code. It makes it harder to read crash reports. Simplest solution that could possibly work You should know up front that my solution is worse than std::function<> in most ways. It’s not as nice to type as a lambda, it supports a small subset of std::function<> functionality. On the other hand it’s small, fast and I can understand it. One thing you need to know about me is that despite working on SumatraPDF C++ code base for 16 years, I don’t know 80% of C++. I get by thanks to sticking to a small subset that I do understand. I don’t claim I’ve invented this particular method. It seems obvious in retrospect but it did take me 16 years to arrive at it. Implementation of a simple callback in C++ A closure is code + data A closure is conceptually simple. It combines code (function) and data: using func0Ptr = void (*)(void*); struct Func0 { func0Ptr fn; void* data; void Call() { fn(data); } }; There are 2 big problems with this. First is annoying casting. 
You have to do: struct MyFuncData { }; void MyFunc(void* voidData) { MyFuncData* data = (MyFuncData*)voidData; } auto data = new MyFuncData; auto fn = Func0{MyFunc, (void*)data}; Second is lack of type safety: struct MyFuncData {}; struct MyOtherFuncData {}; void MyOtherFunc(void* voidData) { MyOtherFuncData* data = (MyOtherFuncData*)voidData; } auto data = new MyFuncData; auto fn = Func0{ MyOtherFunc, (void*)data }; We will call MyOtherFunc with the data meant for MyFunc. This will likely crash. The good thing is that pointer types are compatible. The machine instructions to call void Foo(void*) are exactly the same as for calling void Foo(FooData*). We can solve the above annoyances with a bit of cleverness in the form of MkFunc0(): template <typename T> Func0 MkFunc0(void (*fn)(T*), T* d) { auto res = Func0{}; res.fn = (func0Ptr)fn; res.data = (void*)d; return res; } void MyFunc(MyFuncData* data) { } auto data = new MyFuncData; auto fn = MkFunc0(MyFunc, data); We no longer need to cast data from void* in MyFunc. Trying to create a mismatched auto fn = MkFunc0(MyFunc, new MyOtherFuncData) will error out. The compiler will notice that the fn and data arguments don’t match. We’ll make one improvement: the ability to also create a closure for functions without any arguments: void MyFuncNoData() { } Func0 fn = MkFuncVoid(MyFuncNoData); The implementation cleverness: use a special, impossible value of a pointer (-1) to indicate a function without arguments. 
The full implementation is: using func0Ptr = void (*)(void*); using funcVoidPtr = void (*)(); #define kVoidFunc0 (void*)-1 // the simplest possible function that ties a function and a single argument to it // we get type safety and convenience with MkFunc0() struct Func0 { void* fn = nullptr; void* userData = nullptr; Func0() = default; Func0(const Func0& that) { this->fn = that.fn; this->userData = that.userData; } ~Func0() = default; bool IsEmpty() const { return fn == nullptr; } void Call() const { if (!fn) { return; } if (userData == kVoidFunc0) { auto func = (funcVoidPtr)fn; func(); return; } auto func = (func0Ptr)fn; func(userData); } }; template <typename T> Func0 MkFunc0(void (*fn)(T*), T* d) { auto res = Func0{}; res.fn = (void*)fn; res.userData = (void*)d; return res; } Func0 MkFuncVoid(funcVoidPtr fn) { auto res = Func0{}; res.fn = (void*)fn; res.userData = kVoidFunc0; return res; } Closure with additional caller-provided argument Func0 only addresses the use case of packaging a function and its own data. Most use cases for callbacks require passing additional arguments. For example a list view control has an onItemSelected(int itemIndex) callback. 
For that we need Func1: template <typename T> struct Func1 { void (*fn)(void*, T) = nullptr; void* userData = nullptr; Func1() = default; ~Func1() = default; bool IsEmpty() const { return fn == nullptr; } void Call(T arg) const { if (fn) { fn(userData, arg); } } }; template <typename T1, typename T2> Func1<T2> MkFunc1(void (*fn)(T1*, T2), T1* d) { auto res = Func1<T2>{}; using fptr = void (*)(void*, T2); res.fn = (fptr)fn; res.userData = (void*)d; return res; } We can now do: struct OnListItemSelectedData { }; void OnListItemSelected(OnListItemSelectedData* d, int selectedIdx) { } struct ListView { Func1<int> onListItemSelected; void listItemSelected(int idx) { onListItemSelected.Call(idx); } }; auto lv = new ListView; auto data = new OnListItemSelectedData; lv->onListItemSelected = MkFunc1(OnListItemSelected, data); In Func0 the argument must be a pointer because the type is forgotten when we put it in a struct. We rely on the fact that void foo(void*) and void foo(Foo*) are compatible and we can cast the argument and function. But Func1 retains the type of the second argument so it can be any type and the right call will happen. We also don’t want to erase the second type, to avoid casts when calling it and to serve as documentation. We could write Func2 for 2 arguments, Func3 for 3 arguments etc. but I didn’t bother. If I need more than one argument, I can always use a struct to pack any number of arguments into a single one. Fringe benefits So is it worth it to use this over std::function<>? For me it is and I’ve refactored SumatraPDF to get rid of most std::function<> uses in favor of Func0 and Func1. Yes, std::function<> is better in many ways. It’s more flexible. My solution only supports void Foo(), void Foo(T*) and void Foo(T1*, T2). std::function<> supports an arbitrary number of arguments of any type. 
Compared to writing a lambda with variable capture, I need to write more code: define a struct for closure data, allocate and initialize the struct, construct Func0 or Func1, delete the data (typically at the end of the closure). I decided writing this boilerplate doesn’t bother me. There are fringe benefits of my approach. On MSVC 64-bit std::function<> is 64 bytes. Func0 and Func1 are 16 bytes. Templated code is a highway to bloat. For every unique type, the compiler generates a new class definition and a set of methods. Implementation of std::function<> is gigantic compared to Func0 and Func1. Templated code is also a highway to slow compilation. Again, std::function<> is at least an order of magnitude more complicated so it’ll likely take an order of magnitude longer to compile. Finally, I understand my implementation. I don’t understand std::function<> implementation. It’s scarier than Freddy Krueger. It’s scarier than Frankenstein’s monster. In fact, I don’t think anyone understands std::function<> including the 3 people who implemented it.

4 days ago 5 votes
Why Go iterators are ugly, clever and elegant

Go 1.23 adds iterators. An iterator is a way to provide values that can be used in for x := range iter loops. People are happy the iterators were added to the language. Not everyone is happy about HOW they were implemented. One person opined that they demonstrate “typical Go fashion of quite ugly syntax”. The ugly Are Go iterators ugly? Here’s the boilerplate of an iterator: func IterNumbers(n int) func(func(int) bool) { return func(yield func(int) bool) { // ... the code } } Ok, that is kind of ugly. I can’t imagine typing it from memory. The competition We do not live in a vacuum. How do other languages implement iterators? C++ I recently implemented a DirIter class with an iterator in C++, for SumatraPDF. I did it so that I can write code like for (DirEntry* e : DirIter("c:\\")) { ... } to read the list of files in directory c:\. Implementing it was no fun. I had to implement a class with the following methods: begin(), end(), DirEntry* operator*(), operator==(), operator!=(), operator++(), operator++(int). Oh my, that’s a lot of methods to implement. A bigger problem is that the logic is complicated. This is an example of a pull iterator where the caller “pulls” the next value out of the iterator. The caller needs at least two operations from an iterator: “give me next value” and “do you have more values?”. In C++ it’s more complicated than that because “Overcomplication” is C++’s middle name. A function that reads a list of entries in a directory is relatively simple. The difficulty of implementing a pull iterator comes from the need to track the current state of iteration to be able to provide the “give me next value” function. A simple directory traversal turned into complicated tracking of what I have read so far, whether the process finished and reading the next directory entry. C# C# also has pull iterators but they removed incidental complexity present in C++. 
It reduced the interface to just 2 essential methods: T Next() which returns the next element and bool HasMore() which tells if there are more values to read. Here’s an iterator that returns integers from 1 to n: class NumberIterator { private int _current; private int _end; public NumberIterator(int n) { _current = 0; _end = n; } public bool HasMore() { return _current < _end; } public int Next() { if (!HasMore()) { throw new InvalidOperationException("No more elements."); } return ++_current; } } Much better but still doesn’t solve the big problem: the logic is split across many calls to Next() so the code needs to track the state. C# push iterator with yield Later C# improved this by adding a way to implement a push iterator. An iterator is just a function that “pushes” values to the caller using a yield statement. Push iterator is much simpler: static IEnumerable<int> GetNumbers(int n) { for (int i = 1; i <= n; i++) { yield return i; } } Clever and elegant Here’s a Go version: func GetNumbers(n int) func(func(int) bool) { return func(yield func(int) bool) { for i := 1; i <= n; i++ { if !yield(i) { return } } } } The clever and elegant part is that Go designers figured out how to implement push iterators in a way very similar to C#’s yield without adding a new keyword. The hard part, the logic of the iterator, is as simple as with yield. The yield statement in C# is kind of magic. What actually happens is that the compiler rewrites the code inside-out and turns linear logic into a state machine. Go designers figured out how to implement it using just a function. It is true that there remains essential complexity: an iterator is a function that returns a function that takes a function as an argument. That is a mind bend, but it can be analyzed. Instead of a yield statement pushing values to the loop driver, we have a function. This function is synthesized by the compiler and provided to the iterator function. 
The argument to that function is the value we’re pushing to the loop. It returns a bool to indicate early exit. This is needed to implement early break out of for loop. An iterator function returns an iterator object. In Go case, the iterator object is a new function. This creates a closure. If function is an iterator object then local variables of the function are state of the iterator. I don’t know why Go designers chose this design over yield. I assume the implementation is simpler so maybe that was the reason. Or maybe they didn’t want to add new keyword and potentially break existing code.

a week ago 6 votes
Showing UI on mouse move, in Svelte 5

In my note taking application Edna I’ve implemented an unorthodox UI feature: in the editor a top left navigation element is only visible when you’re moving the mouse or when the mouse is over the element. [Screenshot: UI hidden] [Screenshot: UI visible] The thinking is: when writing, you want max window space dedicated to the editor. When you move the mouse, you’re not writing so I can show additional UI. In my case it’s a way to launch the note opener or open a starred or recently opened note. Implementation details Here’s how to implement this: The element we show/hide has CSS visibility set to hidden. That way the element is not shown but it takes part in layout so we can test if the mouse is over it even when it’s not visible. To make the element visible we change the visibility to visible. We can register multiple HTML elements for tracking if the mouse is over an element. In typical usage we would register only one. We install a mousemove handler. In the handler we set isMoving and clear it after a second of inactivity using setTimeout. For every registered HTML element we check if the mouse is over the element. Svelte 5 implementation details This can be implemented in any web framework. Here’s how to do it in Svelte 5. We want to use Svelte 5 reactivity so we have: class MouseOverElement { element; isMoving = $state(false); isOver = $state(false); } An element is shown if (isMoving || isOver) == true. To start tracking an element we use registerMuseOverElement(el: HTMLElement) : MouseOverElement function, typically in onMount. Here’s typical usage in a component: let element; let mouseOverElement; onMount(() => { mouseOverElement = registerMuseOverElement(element); }); $effect(() => { if (mouseOverElement) { let shouldShow = mouseOverElement.isMoving || mouseOverElement.isOver; let style = shouldShow ? "visible" : "hidden"; element.style.visibility = style; } }); <div bind:this={element}>...</div> Here’s a full implementation of mouse-track.svelte.js: import { len } from "./util"; class MouseOverElement { /** @type {HTMLElement} */ element; isMoving = $state(false); isOver = $state(false); /** * @param {HTMLElement} el */ constructor(el) { this.element = el; } } /** * @param {MouseEvent} e * @param {HTMLElement} el * @returns {boolean} */ function isMouseOverElement(e, el) { if (!el) { return false; } const rect = el.getBoundingClientRect(); let x = e.clientX; let y = e.clientY; return x >= rect.left && x <= rect.right && y >= rect.top && y <= rect.bottom; } /** @type {MouseOverElement[]} */ let registered = []; let timeoutId; /** * @param {MouseEvent} e */ function onMouseMove(e) { clearTimeout(timeoutId); timeoutId = setTimeout(() => { for (let moe of registered) { moe.isMoving = false; } }, 1000); for (let moe of registered) { let el = moe.element; moe.isMoving = true; moe.isOver = isMouseOverElement(e, el); } } let didRegister; /** * @param {HTMLElement} el * @returns {MouseOverElement} */ export function registerMuseOverElement(el) { if (!didRegister) { document.addEventListener("mousemove", onMouseMove); didRegister = true; } let res = new MouseOverElement(el); registered.push(res); return res; } /** * @param {HTMLElement} el */ export function unregisterMouseOverElement(el) { let n = registered.length; for (let i = 0; i < n; i++) { if (registered[i].element != el) { continue; } registered.splice(i, 1); if (len(registered) == 0) { document.removeEventListener("mousemove", onMouseMove); didRegister = null; } return; } }

a week ago 9 votes
Man vs. AI: optimizing JavaScript (Claude, Cursor)

How AI beat me at code optimization game. When I started writing this article I did not expect AI to beat me at optimizing JavaScript code. But it did. I’m really passionate about optimizing JavaScript. Some say it’s a mental illness but I like my code to go balls to the wall fast. I feel the need. The need for speed. Optimizing code often requires tedious refactoring. Can we delegate the tedious parts to AI? Can I just have ideas and get AI to be my programming slave? Let’s find out. Optimizing Unicode range lookup with AI In my experiment I used Cursor with Claude 3.5 Sonnet model. I assume it could be done with other tools / models. I was browsing pdf.js code and saw this function: const UnicodeRanges = [ [0x0000, 0x007f], // 0 - Basic Latin ... omited [0x0250, 0x02af, 0x1d00, 0x1d7f, 0x1d80, 0x1dbf], // 4 - IPA Extensions - Phonetic Extensions - Phonetic Extensions Supplement ... omited ]; function getUnicodeRangeFor(value, lastPosition = -1) { // TODO: create a map range => position, sort the ranges and cache it. // Then we can make a binary search for finding a range for a given unicode. if (lastPosition !== -1) { const range = UnicodeRanges[lastPosition]; for (let i = 0, ii = range.length; i < ii; i += 2) { if (value >= range[i] && value <= range[i + 1]) { return lastPosition; } } } for (let i = 0, ii = UnicodeRanges.length; i < ii; i++) { const range = UnicodeRanges[i]; for (let j = 0, jj = range.length; j < jj; j += 2) { if (value >= range[j] && value <= range[j + 1]) { return i; } } } return -1; } This is a simple function: given Unicode character it returns Unicode range (number 0 ... UnicodeRanges.length - 1). The number is cleverly encoded implicitly as array index. The code already has a TODO about a potential optimization: sort UnicodeRanges and do binary search. 
There’s a minor difficulty: some Unicode ranges have multiple intervals, so we have to change the UnicodeRanges to explicitly include the range as 3rd element: const UnicodeRanges = [ [0x0000, 0x007f, 0], // 0 - Basic Latin [0x0250, 0x02af, 4], // 4 - IPA Extensions - Phonetic Extensions - Phonetic Extensions Supplement [0x1d00, 0x1d7f, 4], // 4 - IPA Extensions - Phonetic Extensions - Phonetic Extensions Supplement ... omitted ]; Then sort it and change the implementation of getUnicodeRangeFor(). Re-writing UnicodeRanges is simple but tedious and I don’t feel like doing it. Can AI do such change for us? Changing representation I’ve selected the whole of UnicodeRanges, added to chat with Ctrl + Shift + L and commanded: Change representation of UnicodeRanges array where each entry is 3 elements: [start, end, rangeNo]; preserve comments; if rangeNo has multiple ranges, split into separate entries Cursor responded with: const UnicodeRanges = [ [0x0000, 0x007f, 0], // 0 - Basic Latin [0x0080, 0x00ff, 1], // 1 - Latin-1 Supplement ... omitted ]; There was an issue: it only did part of the array. So I commanded: don’t be lazy, finish the whole UnicodeRanges array It did but the result was too long so I had to click Continue button. It worked and the generated code seems to be correct so I applied it. For binary search, the array must be sorted, so I commanded: Sort the array by first element It did, the result looked correct so I accepted the change. Hmm, I can get used to having a programming slave. 
Updating the code to implement binary search I copied getUnicodeRangeFor to chat and commanded: change the implementation to use binary search over UnicodeRanges array It gave me: function getUnicodeRangeFor(value, lastPosition = -1) { // Check the last position first if (lastPosition !== -1) { const range = UnicodeRanges[lastPosition]; if (value >= range[0] && value <= range[1]) { return lastPosition; } } // Binary search let left = 0; let right = UnicodeRanges.length - 1; while (left <= right) { const mid = Math.floor((left + right) / 2); const range = UnicodeRanges[mid]; if (value < range[0]) { right = mid - 1; } else if (value > range[1]) { left = mid + 1; } else { return range[2]; // Return the range index } } return -1; // Not found } It looks correct so I accepted it. We achieved the goal: replaced linear search with binary search using cheap AI labor. But I want more. More speed. Changing array of arrays to flat array A more efficient representation of the ranges would be a flat array: const UnicodeRanges = [ 0x0000, 0x007f, 0, // 0 - Basic Latin 0x0080, 0x00ff, 1, // 1 - Latin-1 Supplement ]; Let’s see if our AI slave can do it: change the whole UnicodeRanges array to flat array, preserve comments, don’t be lazy and do all of it I pre-emptively command the AI to convert the whole array. It did it. Now is the time to update getUnicodeRangeFor(). It’s a simple change I could do myself: now we just multiply index * 3 to get range definition. 
But I kind of like commanding the AI so I did it again: update getUnicodeRangeFor for new flat array representations of UnicodeRanges It gave me: function getUnicodeRangeFor(value, lastPosition = -1) { // Check the last position first if (lastPosition !== -1) { const index = lastPosition * 3; if (value >= UnicodeRanges[index] && value <= UnicodeRanges[index + 1]) { return UnicodeRanges[index + 2]; } } // Binary search let left = 0; let right = UnicodeRanges.length / 3 - 1; while (left <= right) { const mid = Math.floor((left + right) / 2); const index = mid * 3; const rangeStart = UnicodeRanges[index]; const rangeEnd = UnicodeRanges[index + 1]; if (value < rangeStart) { right = mid - 1; } else if (value > rangeEnd) { left = mid + 1; } else { return UnicodeRanges[index + 2]; // Return the range index } } return -1; // Not found } It does seem correct, but let’s do one more tweak: move binary search code to its own function And it worked: function binarySearchUnicodeRange(value) { let left = 0; let right = UnicodeRanges.length / 3 - 1; while (left <= right) { const mid = Math.floor((left + right) / 2); const index = mid * 3; const rangeStart = UnicodeRanges[index]; const rangeEnd = UnicodeRanges[index + 1]; if (value < rangeStart) { right = mid - 1; } else if (value > rangeEnd) { left = mid + 1; } else { return UnicodeRanges[index + 2]; // Return the range index } } return -1; // Not found } function getUnicodeRangeFor(value, lastPosition = -1) { // Check the last position first if (lastPosition !== -1) { const index = lastPosition * 3; if (value >= UnicodeRanges[index] && value <= UnicodeRanges[index + 1]) { return UnicodeRanges[index + 2]; } } // Perform binary search return binarySearchUnicodeRange(value); } Inlining some of the binary search It’s good but we haven’t yet reached the levels of insanity I’ve hinted at. UnicodeRanges is a fixed array. We could “unroll” the binary search into code. It would be very painful to do manually. 
I would have to write custom code generator to ensure it’s correct but maybe AI can do it for us? unroll first 3 levels of binarySearchUnicodeRange To my surprise, it worked. AI understood what I wanted and gave me this: function getUnicodeRangeFor(value, lastPosition = -1) { ... omited const length = UnicodeRanges.length / 3; let mid = Math.floor(length / 2) * 3; if (value < UnicodeRanges[mid]) { mid = Math.floor(mid / 2); if (value < UnicodeRanges[mid]) { mid = Math.floor(mid / 2); } else if (value > UnicodeRanges[mid + 1]) { mid = mid + Math.floor(mid / 2); } else { return UnicodeRanges[mid + 2]; } } else if (value > UnicodeRanges[mid + 1]) { mid = mid + Math.floor(mid / 2); if (value < UnicodeRanges[mid]) { mid = mid - Math.floor(mid / 4); } else if (value > UnicodeRanges[mid + 1]) { mid = mid + Math.floor(mid / 4); } else { return UnicodeRanges[mid + 2]; } } else { return UnicodeRanges[mid + 2]; } ... omited } Can we do even better? unroll the values of UnicodeRanges[mid] etc. and replace with constants Holly cow, it worked: function getUnicodeRangeFor(value, lastPosition = -1) { // Check the last position first if (lastPosition !== -1) { const index = lastPosition * 3; if (value >= UnicodeRanges[index] && value <= UnicodeRanges[index + 1]) { return UnicodeRanges[index + 2]; } } // Unrolled binary search with constants if (value < 0x4E00) { // CJK Unified Ideographs start if (value < 0x0530) { // Armenian start if (value < 0x0250) { // IPA Extensions start if (value < 0x0100) { // Latin Extended-A start if (value < 0x0080) { // Latin-1 Supplement start return 0; // Basic Latin } return 1; // Latin-1 Supplement } if (value < 0x0180) { return 2; // Latin Extended-A } return 3; // Latin Extended-B } if (value < 0x0300) { // Combining Diacritical Marks start if (value < 0x02B0) { // Spacing Modifier Letters start return 4; // IPA Extensions } return 5; // Spacing Modifier Letters } if (value < 0x0370) { // Greek and Coptic start return 6; // Combining Diacritical 
Marks } if (value < 0x0400) { // Cyrillic start return 7; // Greek and Coptic } return 9; // Cyrillic } // Continue with binary search for remaining ranges let left = 10; // Index of Armenian in UnicodeRanges let right = 102; // Index of CJK Unified Ideographs in UnicodeRanges while (left <= right) { const mid = Math.floor((left + right) / 2) * 3; if (value < UnicodeRanges[mid]) { right = Math.floor(mid / 3) - 1; } else if (value > UnicodeRanges[mid + 1]) { left = Math.floor(mid / 3) + 1; } else { return UnicodeRanges[mid + 2]; } } } else { // Binary search for ranges after CJK Unified Ideographs let left = 103; // Index after CJK Unified Ideographs in UnicodeRanges let right = UnicodeRanges.length / 3 - 1; while (left <= right) { const mid = Math.floor((left + right) / 2) * 3; if (value < UnicodeRanges[mid]) { right = Math.floor(mid / 3) - 1; } else if (value > UnicodeRanges[mid + 1]) { left = Math.floor(mid / 3) + 1; } else { return UnicodeRanges[mid + 2]; } } } return -1; // Not found } Is it correct? I have no idea. Would be easy to write a test that uses UnicodeRanges as source of truth and tests getUnicodeRangeFor() for every possible value. Can I just ask AI to do it faster? So far I micro-managed the AI coder. I was telling it exactly what kind of change I wanted and commanded AI until it did what I wanted. A good boss delegates. Good boss tells the worker the high level goal and leaves implementation detail to the worker. Let’s start from scratch and command AI thusly: could you make a faster implementation of getUnicodeRangeFor()? You can change the representation of UnicodeRanges to achieve that goal Holly cow again. The AI figured out the binary search: const UnicodeRanges = [ { start: 0x0000, end: 0x007f, index: 0 }, // Basic Latin { start: 0x0080, end: 0x00ff, index: 1 }, // Latin-1 Supplement { start: 0x0100, end: 0x017f, index: 2 }, // Latin Extended-A // ... 
(other ranges) { start: 0x1f000, end: 0x1f09f, index: 122 }, // Domino Tiles - Mahjong Tiles ].sort((a, b) => a.start - b.start); function getUnicodeRangeFor(value, lastPosition = -1) { if (lastPosition !== -1) { const range = UnicodeRanges[lastPosition]; if (value >= range.start && value <= range.end) { return range.index; } } let left = 0; let right = UnicodeRanges.length - 1; while (left <= right) { const mid = Math.floor((left + right) / 2); const range = UnicodeRanges[mid]; if (value >= range.start && value <= range.end) { return range.index; } if (value < range.start) { right = mid - 1; } else { left = mid + 1; } } return -1; } The object {start, end, index} is less efficient than my flat array representation and it didn’t unroll the binary search but the core idea of binary search is there. But just for kicks, maybe there’s more: can you make it even more efficient? It did something slightly different, but really the same. I’m afraid no bonus this quarter. Ok, one last try: but could you go all out and use every possible implementation trick you know to optkmize getUnicodeRangeFor and UnicodeRanges representation? 
Fuck me, I have no idea if the generated code is correct but it does seem it would be faster than the non-unrolled binary search I came up with earlier: // Preprocess UnicodeRanges into a more efficient structure const BUCKET_SIZE = 256; const MAX_UNICODE = 0x10FFFF; const buckets = new Uint16Array(Math.ceil((MAX_UNICODE + 1) / BUCKET_SIZE)); const ranges = []; UnicodeRanges.forEach((range, index) => { for (let i = 0; i < range.length; i += 2) { const start = range[i]; const end = range[i + 1]; ranges.push({ start, end, index }); const startBucket = start >>> 8; const endBucket = end >>> 8; for (let j = startBucket; j <= endBucket; j++) { buckets[j] |= (1 << (index & 15)); } } }); ranges.sort((a, b) => a.start - b.start); // Create a compact representation of ranges const compactRanges = new Uint32Array(ranges.length * 3); ranges.forEach((range, i) => { compactRanges[i * 3] = range.start; compactRanges[i * 3 + 1] = range.end; compactRanges[i * 3 + 2] = range.index; }); function getUnicodeRangeFor(value, lastPosition = -1) { if (value > MAX_UNICODE) return -1; // Check last position first if (lastPosition !== -1) { const start = compactRanges[lastPosition * 3]; const end = compactRanges[lastPosition * 3 + 1]; if (value >= start && value <= end) { return compactRanges[lastPosition * 3 + 2]; } } // Quick check using buckets const bucket = buckets[value >>> 8]; if (bucket === 0) return -1; // Binary search within the relevant ranges let left = 0; let right = compactRanges.length / 3 - 1; while (left <= right) { const mid = (left + right) >>> 1; const start = compactRanges[mid * 3]; const end = compactRanges[mid * 3 + 1]; if (value < start) { right = mid - 1; } else if (value > end) { left = mid + 1; } else { return compactRanges[mid * 3 + 2]; } } return -1; } In retrospect, using Uint32Array for UnicodeRanges array was a no-brainer. It’s more compact in memory, the values will probably be compared as integers, not floats. Should be faster to scan than array of floats. 
The buckets are not necessary. They seem to be for a quick “doesn’t exist” check which is not important. I believe most lookups are for values that exist. I’m humbled that just asking for super duper optimization made AI produce something I didn’t think of. More optimization ideas I can’t help myself. Those are ideas I didn’t ask AI to implement. UnicodeRanges is small. A linear search of a compact Uint32Array representation where we just have (start, end) values for each range would likely be faster than binary search due to cache lines. We could start the search in the middle of the array and scan half the data going forward or backwards. We could also store ranges smaller than 0x10000 in Uint16Array and larger in Uint32Array. And do linear search starting in the middle. Since the range numbers are smaller than 256, we could encode the first 0x10000 code points in a 64 kB Uint8Array and the rest as Uint32Array. That would probably be the fastest on average, because I believe most lookups are for Unicode chars smaller than 0xffff. Finally, we could calculate the frequency of each range in a representative sample of PDF documents and check the ranges based on that frequency, fully unrolled into code, without any tables. Conclusions AI is a promising way to do tedious code refactoring. If I didn’t have the AI, I would have to write a program to e.g. convert UnicodeRanges to a flat representation. It’s simple and therefore doable but certainly would take longer than the few minutes it took me to command AI. The final unrolling of getUnicodeRangeFor() would probably never happen. It would require writing a sophisticated code generator which would be a big project by itself. AI can generate buggy code so it needs to be carefully reviewed. The unrolled binary search could not be verified by review, it would need a test. But hey, I could command my AI sidekick to write the test for me. There was this idea of organizing programming teams into master programmer and coding grunts. 
The job of the master programmer, the thinking was, is to generate high level ideas and have coding grunts implement them. Turns out that we can’t organize people that way but now we can use AI to be our coding grunt. Prompt engineering is a thing. I wasted a bunch of time doing incremental improvements. I should have started by asking for super-duper optimization. Productivity gains are real. The whole thing took me about an hour. For this particular task that is easily a 2x gain compared to not using cheap AI labor. Imagine you’re running a software business and instead of spending 2 months on a task, you only spend 1 month. I’ll be using more AI for coding in the future.

9 months ago 72 votes

More in programming

Lessons along the EndBOX journey

How a wild side-quest became the source of many of the articles you’ve read—and have come to expect—in this publication

2 days ago 3 votes
Making System Calls in x86-64 Assembly

Watch now | Privilege levels, syscall conventions, and how assembly code talks to the Linux kernel

3 days ago 5 votes
Better Test Setup with Disposable Objects (article)

Learn how disposable objects solve test cleanup problems in flat testing. Use TypeScript's using keyword to ensure reliable resource disposal in tests.

3 days ago 6 votes
Digital Ghosts, Wisdom, and Tennis Matchmaking

Digital Ghosts My mom recently had a free consultation from her electric company to assess replacing her propane water heater with an electric water pump heater.  She forwarded the assessment report to me, and I spent some time reviewing and researching the program. Despite living quite far away, I have been surprised by how much […]

3 days ago 6 votes