Consider the following JSON object: { "sides": 4, "colour": "red", "sides": 5, "colour": "blue" } Notice that sides and colour both appear twice. This looks invalid, but I learnt recently that this is actually legal JSON syntax! It’s unusual and discouraged, but it’s not completely forbidden. This was a big surprise to me. I think of JSON objects as key/value pairs, and I associate them with data structures like a dict in Python or a Hash in Ruby – both of which only allow unique keys. JSON has no such restriction, and I started thinking about how to handle it. What does the JSON spec say about duplicate names? JSON is described by several standards, which Wikipedia helpfully explains for us: After RFC 4627 had been available as its “informational” specification since 2006, JSON was first standardized in 2013, as ECMA‑404. RFC 8259, published in 2017, is the current version of the Internet Standard STD 90, and it remains consistent with ECMA‑404. That same year, JSON was...
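To make the duplicate-name behaviour concrete, here is a small sketch using Python's standard `json` module. The helper name `reject_duplicates` is made up for this illustration; `object_pairs_hook` is a real parameter of `json.loads`. By default Python keeps the last value for a repeated name, but the hook lets you see every pair or refuse duplicates outright.

```python
import json

DOC = '{ "sides": 4, "colour": "red", "sides": 5, "colour": "blue" }'

# Default behaviour: the document parses without error and the
# later value silently replaces the earlier one.
last_wins = json.loads(DOC)

# Passing `list` as the hook keeps every (name, value) pair in order,
# so nothing is lost.
all_pairs = json.loads(DOC, object_pairs_hook=list)


def reject_duplicates(pairs):
    """Hypothetical hook: build a dict, but fail on a repeated name."""
    seen = {}
    for name, value in pairs:
        if name in seen:
            raise ValueError(f"duplicate name in JSON object: {name!r}")
        seen[name] = value
    return seen


print(last_wins)
print(all_pairs)
```

Calling `json.loads(DOC, object_pairs_hook=reject_duplicates)` raises `ValueError` for this document, which is one way to be strict about input you do not control.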
a month ago

More from alexwlchan

A faster way to copy SQLite databases between computers

I store a lot of data in SQLite databases on remote servers, and I often want to copy them to my local machine for analysis or backup. When I’m starting a new project and the database is near-empty, this is a simple rsync operation: $ rsync --progress username@server:my_remote_database.db my_local_database.db As the project matures and the database grows, this gets slower and less reliable. Downloading a 250MB database from my web server takes about a minute over my home Internet connection, and that’s pretty small – most of my databases are multiple gigabytes in size. I’ve been trying to make these copies go faster, and I recently discovered a neat trick. What really slows me down is my indexes. I have a lot of indexes in my SQLite databases, which dramatically speed up my queries, but also make the database file larger and slower to copy. (In one database, there’s an index which single-handedly accounts for half the size on disk!) The indexes don’t store anything unique – they just duplicate data from other tables to make queries faster. Copying the indexes makes the transfer less efficient, because I’m copying the same data multiple times. I was thinking about ways to skip copying the indexes, and I realised that SQLite has built-in tools to make this easy. Dumping a database as a text file SQLite allows you to dump a database as a text file. If you use the .dump command, it prints the entire database as a series of SQL statements. This text file can often be significantly smaller than the original database. 
Here’s the command: $ sqlite3 my_database.db .dump > my_database.db.txt And here’s what the beginning of that file looks like: PRAGMA foreign_keys=OFF; BEGIN TRANSACTION; CREATE TABLE IF NOT EXISTS "tags" ( [name] TEXT PRIMARY KEY, [count_uses] INTEGER NOT NULL ); INSERT INTO tags VALUES('carving',260); INSERT INTO tags VALUES('grass',743); … Crucially, this reduces the large and disk-heavy indexes into a single line of text – it’s an instruction to create an index, not the index itself. CREATE INDEX [idx_photo_locations] ON [photos] ([longitude], [latitude]); This means that I’m only storing each value once, rather than the many times it may be stored across the original table and my indexes. This is how the text file can be smaller than the original database. If you want to reconstruct the database, you pipe this text file back to SQLite: $ cat my_database.db.txt | sqlite3 my_reconstructed_database.db Because the SQL statements are very repetitive, this text responds well to compression: $ sqlite3 explorer.db .dump | gzip -c > explorer.db.txt.gz To give you an idea of the potential savings, here’s the relative disk size for one of my databases. File Size on disk original SQLite database 3.4 GB text file (sqlite3 my_database.db .dump) 1.3 GB gzip-compressed text (sqlite3 my_database.db .dump | gzip -c) 240 MB The gzip-compressed text file is 14× smaller than the original SQLite database – that makes downloading the database much faster. My new ssh+rsync command Rather than copying the database directly, now I create a gzip-compressed text file on the server, copy that to my local machine, and reconstruct the database. 
Like so: # Create a gzip-compressed text file on the server ssh username@server "sqlite3 my_remote_database.db .dump | gzip -c > my_remote_database.db.txt.gz" # Copy the gzip-compressed text file to my local machine rsync --progress username@server:my_remote_database.db.txt.gz my_local_database.db.txt.gz # Remove the gzip-compressed text file from my server ssh username@server "rm my_remote_database.db.txt.gz" # Uncompress the text file gunzip my_local_database.db.txt.gz # Reconstruct the database from the text file cat my_local_database.db.txt | sqlite3 my_local_database.db # Remove the local text file rm my_local_database.db.txt A database dump is a stable copy source This approach fixes another issue I’ve had when copying SQLite databases. If it takes a long time to copy a database and it gets updated midway through, rsync may give me an invalid database file. The first half of the file is pre-update, the second half is post-update, and they don’t match. When I try to open the database locally, I get an error: database disk image is malformed By creating a text dump before I start the copy operation, I’m giving rsync a stable copy source. That text dump isn’t going to change midway through the copy, so I’ll always get a complete and consistent text file. This approach has saved me hours when working with large databases, and made my downloads both faster and more reliable. If you have to copy around large SQLite databases, give it a try. [If the formatting of this post looks odd in your feed reader, visit the original article]
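The dump, compress, and reconstruct round trip described above can also be sketched with Python's standard library instead of the `sqlite3` command-line tool. This is an illustration under assumptions, not the author's workflow: the table layout, the in-memory databases, and the helper names `dump_to_gzip` and `restore_from_gzip` are invented for the example. `Connection.iterdump()` yields the same kind of SQL text as `.dump`, including a `CREATE INDEX` statement rather than the index data.

```python
import gzip
import sqlite3


def dump_to_gzip(conn: sqlite3.Connection) -> bytes:
    """Serialise a database as gzip-compressed SQL text."""
    sql_text = "\n".join(conn.iterdump())
    return gzip.compress(sql_text.encode("utf-8"))


def restore_from_gzip(blob: bytes) -> sqlite3.Connection:
    """Rebuild a database (indexes included) from a compressed dump."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(gzip.decompress(blob).decode("utf-8"))
    return conn


# Build a small example database with one index (made-up schema).
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE photos (id INTEGER PRIMARY KEY, longitude REAL, latitude REAL)"
)
source.execute("CREATE INDEX idx_photo_locations ON photos (longitude, latitude)")
source.executemany(
    "INSERT INTO photos (longitude, latitude) VALUES (?, ?)",
    [(i * 0.5, i * 0.25) for i in range(1000)],
)
source.commit()

# The index appears in the dump as a single statement, not as data.
index_statements = [
    line for line in source.iterdump() if line.startswith("CREATE INDEX")
]

blob = dump_to_gzip(source)
restored = restore_from_gzip(blob)
print(index_statements, len(blob))
```

Replaying the dump recreates the index on the receiving side, so the index costs CPU time during reconstruction rather than bytes during transfer.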

a month ago 11 votes
A flash of light in the darkness

I support dark mode on this site, and as part of the dark theme, I have a colour-inverted copy of the default background texture. I like giving my website a subtle bit of texture, which I think makes it stand out from a web which is mostly solid-colour backgrounds. Both my textures are based on the “White Waves” pattern made by Stas Pimenov. I was setting these images as my background with two CSS rules, using the prefers-color-scheme: dark media feature to use the alternate image in dark mode: body { background: url('https://alexwlchan.net/theme/white-waves-transparent.png'); } @media (prefers-color-scheme: dark) { body { background: url('https://alexwlchan.net/theme/black-waves-transparent.png'); } } This works, mostly. But I prefer light mode, so while I wrote this CSS and I do some brief testing whenever I make changes, I’m not using the site in dark mode. I know how dark mode works in my local development environment, not how it feels as a day-to-day user. Late last night I was using my phone in dark mode to avoid waking the other people in the house, and I opened my site. I saw a brief flash of white, and then the dark background texture appeared. That flash of bright white is precisely what you don’t want when you’re using dark mode, but it happened anyway. I made a note to work it out in the morning, then I went to bed. Now I’m fully awake, it’s obvious what happened. Because my only background is the image URL, there’s a brief gap between the CSS being parsed and the background image being loaded. In that time, the browser doesn’t have anything to put in the background, so you just get pure white. This was briefly annoying in the moment, but it would be even worse if the background texture never loaded. 
I have light text on black in dark mode, but without the background image it’s just light text on white, which is barely readable: I never noticed this in local development, because I’m usually working in a well-lit room where that white flash would be far less obvious. I’m also using a local version of the site, which loads near-instantly and where the background image is almost certainly saved in my browser cache. I’ve made two changes to prevent this happening again. I’ve added a colour to use as a fallback until the image loads. The CSS background property supports adding a colour, which is used until the image loads, or as a fallback if it doesn’t. I already use this in a few places, and now I’ve added it to my body background. body { background: url('https://…/white-waves-transparent.png') #fafafa; } @media (prefers-color-scheme: dark) { body { background: url('https://…/black-waves-transparent.png') #0d0d0d; } } This avoids the flash of unstyled background before the image loads – the browser will use a solid dark background until it gets the texture. I’ve added rel="preload" elements to the head of the page, so the browser will start loading the background textures faster. These elements are a clue to the browser that these resources are going to be useful when it renders the page, so it should start loading them as soon as possible: <link rel="preload" href="https://alexwlchan.net/theme/white-waves-transparent.png" as="image" type="image/png" media="(prefers-color-scheme: light)" /> <link rel="preload" href="https://alexwlchan.net/theme/black-waves-transparent.png" as="image" type="image/png" media="(prefers-color-scheme: dark)" /> This means the browser is downloading the appropriate texture at the same time as it’s downloading the CSS file. Previously it had to download the CSS file, parse it, and only then would it know to start downloading the texture. With the preload, it’s a bit faster! 
The difference is probably imperceptible if you’re on a fast connection, but it’s a small win and I can’t see any downside (as long as I scope the preload correctly, and don’t preload resources I don’t end up using). I’ve seen a lot of sites using <link rel="preload"> and I’ve only half-understood what it is and why it’s useful – I’m glad to have a chance to use it myself, so I can understand it better. This bug reminds me of a phenomenon called flash of unstyled text. Back when custom fonts were fairly new, you’d often see web pages appear briefly with the default font before custom fonts finished loading. There are well-understood techniques for preventing this, so it’s unusual to see that brief unstyled text on modern web pages – but the same issue is affecting me in dark mode. I avoided using custom fonts on the web to avoid tackling this issue, but it got me anyway! In these dark times for the web, old bugs are new again.

2 months ago 26 votes
Beyond `None`: actionable error messages for `keyring.get_password()`

I’m a big fan of keyring, a Python module made by Jason R. Coombs for storing secrets in the system keyring. It works on multiple operating systems, and it knows what password store to use for each of them. For example, if you’re using macOS it puts secrets in the Keychain, but if you’re on Windows it uses Credential Locker. The keyring module is a safe and portable way to store passwords, more secure than using a plaintext config file or an environment variable. The same code will work on different platforms, because keyring handles the hard work of choosing which password store to use. It has a straightforward API: the keyring.set_password and keyring.get_password functions will handle a lot of use cases. >>> import keyring >>> keyring.set_password("xkcd", "alexwlchan", "correct-horse-battery-staple") >>> keyring.get_password("xkcd", "alexwlchan") "correct-horse-battery-staple" Although this API is simple, it’s not perfect – I have some frustrations with the get_password function. In a lot of my projects, I’m now using a small function that wraps get_password. What do I find frustrating about keyring.get_password? If you look up a password that isn’t in the system keyring, get_password returns None rather than throwing an exception: >>> print(keyring.get_password("xkcd", "the_invisible_man")) None I can see why this makes sense for the library overall – a non-existent password is very normal, and not exceptional behaviour – but in my projects, None is rarely a usable value. I normally use keyring to retrieve secrets that I need to access protected resources – for example, an API key to call an API that requires authentication. If I can’t get the right secrets, I know I can’t continue. Indeed, continuing often leads to more confusing errors when some other function unexpectedly gets None, rather than a string. 
For a while, I wrapped get_password in a function that would throw an exception if it couldn’t find the password: def get_required_password(service_name: str, username: str) -> str: """ Get password from the specified service. If a matching password is not found in the system keyring, this function will throw an exception. """ password = keyring.get_password(service_name, username) if password is None: raise RuntimeError(f"Could not retrieve password {(service_name, username)}") return password When I use this function, my code will fail as soon as it fails to retrieve a password, rather than when it tries to use None as the password. This worked well enough for my personal projects, but it wasn’t a great fit for shared projects. I could make sense of the error, but not everyone could do the same. What’s that password meant to be? A good error message explains what’s gone wrong, and gives the reader clear steps for fixing the issue. The error message above is only doing half the job. It tells you what’s gone wrong (it couldn’t get the password) but it doesn’t tell you how to fix it. As I started using this snippet in codebases that I work on with other developers, I got questions when other people hit this error. They could guess that they needed to set a password, but the error message doesn’t explain how, or what password they should be setting. For example, is this a secret they should pick themselves? Is it a password in our shared password vault? Or do they need an API key for a third-party service? If so, where do they find it? I still think my initial error was an improvement over letting None be used in the rest of the codebase, but I realised I could go further. This is my extended wrapper: def get_required_password(service_name: str, username: str, explanation: str) -> str: """ Get password from the specified service. 
If a matching password is not found in the system keyring, this function will throw an exception and explain to the user how to set the required password. """ password = keyring.get_password(service_name, username) if password is None: raise RuntimeError( "Unable to retrieve required password from the system keyring!\n" "\n" "You need to:\n" "\n" f"1/ Get the password. Here's how: {explanation}\n" "\n" "2/ Save the new password in the system keyring:\n" "\n" f" keyring set {service_name} {username}\n" ) return password The explanation argument allows me to explain what the password is for to a future reader, and what value it should have. That information can often be found in a code comment or in documentation, but putting it in an error message makes it more visible. Here’s one example: get_required_password( "flask_app", "secret_key", explanation=( "Pick a random value, e.g. with\n" "\n" " python3 -c 'import secrets; print(secrets.token_hex())'\n" "\n" "This password is used to securely sign the Flask session cookie. " "See https://flask.palletsprojects.com/en/stable/config/#SECRET_KEY" ), ) If you call this function and there’s no keyring entry for flask_app/secret_key, you get the following error: Unable to retrieve required password from the system keyring! You need to: 1/ Get the password. Here's how: Pick a random value, e.g. with python3 -c 'import secrets; print(secrets.token_hex())' This password is used to securely sign the Flask session cookie. See https://flask.palletsprojects.com/en/stable/config/#SECRET_KEY 2/ Save the new password in the system keyring: keyring set flask_app secret_key It’s longer, but this error message is far more informative. It tells you what’s wrong, how to save a password, and what the password should be. This is based on a real example where the previous error message led to a misunderstanding. 
A co-worker saw a missing password called “secret key” and thought it referred to a secret key for calling an API, and didn’t realise it was actually for signing Flask session cookies. Now I can write a more informative error message, I can prevent that misunderstanding happening again. (We also renamed the secret, for additional clarity.) It takes time to write this explanation, which will only ever be seen by a handful of people, but I think it’s important. If somebody sees it at all, it’ll be when they’re setting up the project for the first time. I want that setup process to be smooth and straightforward. I don’t use this wrapper in all my code, particularly small or throwaway toys that won’t last long enough for this to be an issue. But in larger codebases that will be used by other developers, and which I expect to last a long time, I use it extensively. Writing a good explanation now can avoid frustration later.

2 months ago 22 votes
Localising the `<time>` with JavaScript

I’ve been writing some internal dashboards recently, and one hard part is displaying timestamps. Our server does everything in UTC, but the team is split across four different timezones, so the server timestamps aren’t always easy to read. For most people, it’s harder to understand a UTC timestamp than a timestamp in your local timezone. Did that event happen just now, an hour ago, or much further back? Was that at the beginning of your working day? Or at the end? Then I remembered that I tried to solve this five years ago at a previous job. I wrote a JavaScript snippet that converts UTC timestamps into human-friendly text. It displays times in your local time zone, and adds a short suffix if the time happened recently. For example: today @ 12:00 BST (1 hour ago) In my old project, I was writing timestamps in a <div> and I had to opt into the human-readable text for every date on the page. It worked, but it was a bit fiddly. Doing it again, I thought of a more elegant solution. HTML has a <time> element for expressing datetimes, which is a more meaningful wrapper than a <div>. When I render the dashboard on the server, I don’t know the user’s timezone, so I include the UTC timestamp in the page like so: <time datetime="2025-04-15 19:45:00Z"> Tue, 15 Apr 2025 at 19:45 UTC </time> I put a machine-readable date and time string with a timezone offset string in the datetime attribute, and then a more human-readable string in the text of the element. Then I add this JavaScript snippet to the page: window.addEventListener("DOMContentLoaded", function() { document.querySelectorAll("time").forEach(function(timeElem) { // Set the `title` attribute to the original text, so a user // can hover over a timestamp to see the UTC time. timeElem.setAttribute("title", timeElem.innerText); // Replace the display text with a human-friendly date string // which is localised to the user's timezone. 
timeElem.innerText = getHumanFriendlyDateString( timeElem.getAttribute("datetime") ); }) }); This updates any <time> element on the page to use a human-friendly date string, which is localised to the user’s timezone. For example, I’m in the UK so that becomes: <time datetime="2025-04-15 19:45:00Z" title="Tue, 15 Apr 2025 at 19:45 UTC"> Tue, 15 Apr 2025 at 20:45 BST </time> In my experience, these timestamps are easier and more intuitive for people to read. I always include a timezone string (e.g. BST, EST, PDT) so it’s obvious that I’m showing a localised timestamp. If you really need the UTC timestamp, it’s in the title attribute, so you can see it by hovering over it. (Sorry, mouseless users, but I don’t think any of my team are browsing our dashboards from their phone or tablet.) If the JavaScript doesn’t load, you see the plain old UTC timestamp. It’s not ideal, but the page still loads and you can see all the information – this behaviour is an enhancement, not an essential. To me, this is the unfulfilled promise of the <time> element. In my fantasy world, web page authors would write the time in a machine-readable format, and browsers would show it in a way that makes sense for the reader. They’d take into account their language, locale, and time zone. I understand why that hasn’t happened – it’s much easier said than done. You need so much context to know what’s the “right” thing to do when dealing with datetimes, and guessing without that context is at the heart of many datetime bugs. These sorts of human-friendly, localised timestamps are very handy sometimes, and a complete mess at other times. In my staff-only dashboards, I have that context. I know what these timestamps mean, who’s going to be reading them, and I think they’re a helpful addition that makes the data easier to read.
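The excerpt never shows the body of getHumanFriendlyDateString, so the following is only a guess at the general idea, written in Python rather than the author's JavaScript. Everything here is an assumption made for illustration: the function name, the explicit `local_tz` and `now` parameters (a browser would supply both implicitly), the exact output format, and the fixed-offset stand-in for British Summer Time.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def human_friendly_date_string(utc_timestamp: str, local_tz: tzinfo, now: datetime) -> str:
    """Localise a UTC timestamp and add a suffix if it happened recently."""
    # Older Python versions reject a trailing "Z", so spell out the offset.
    utc_dt = datetime.fromisoformat(utc_timestamp.replace("Z", "+00:00"))
    local_dt = utc_dt.astimezone(local_tz)
    local_today = now.astimezone(local_tz).date()

    if local_dt.date() == local_today:
        day = "today"
    elif local_dt.date() == local_today - timedelta(days=1):
        day = "yesterday"
    else:
        day = local_dt.strftime("%a, %d %b %Y")

    text = f"{day} @ {local_dt:%H:%M} {local_dt.tzname()}"

    # Only recent events get a relative suffix.
    seconds = int((now - utc_dt).total_seconds())
    if 0 <= seconds < 60:
        text += " (just now)"
    elif 60 <= seconds < 3600:
        minutes = seconds // 60
        text += f" ({minutes} minute{'s' if minutes != 1 else ''} ago)"
    elif 3600 <= seconds < 86400:
        hours = seconds // 3600
        text += f" ({hours} hour{'s' if hours != 1 else ''} ago)"
    return text


# Fixed UTC+1 offset labelled "BST", used here so the example does not
# depend on a timezone database being installed.
BST = timezone(timedelta(hours=1), "BST")
now = datetime(2025, 4, 15, 20, 45, tzinfo=timezone.utc)
label = human_friendly_date_string("2025-04-15 19:45:00Z", BST, now)
print(label)
```

Passing `now` explicitly keeps the function deterministic and easy to test; a real dashboard would read the current time and the reader's zone from the environment.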

2 months ago 32 votes

More in programming

2025-06-22 Sun: Ban std::string

The use of std::string should be banned in C++ code bases. I’m sure this statement sounds like heresy and you want to burn me at the stake. But is it really controversial? Java, C#, Go, JavaScript, Python, Ruby, PHP: they all have immutable strings that are basically 2 machine words: a pointer to string data and size of the string. If they have an equivalent of std::string it’s something like StringBuilder. C++ should also use immutable strings in 97% of situations. The problem is gravity: the existing code, the culture. They all pull you strongly towards std::string and going against the current is the hardest thing there is. There isn’t a standard type for that. You can use newish std::span<char*> but there really should be std::str (or some such). I did that in SumatraPDF where I mostly pass char* but I don’t expect many other C++ code bases to switch away from std::string.

4 hours ago 1 votes
In Praise of “Normal” Engineers

This article was originally commissioned by Luca Rossi (paywalled) for refactoring.fm, on February 11th, 2025. Luca edited a version of it that emphasized the importance of building “10x engineering teams” . It was later picked up by IEEE Spectrum (!!!), who scrapped most of the teams content and published a different, shorter piece on March […]

3 days ago 6 votes
Optimizing calling Windows DLL functions in Go

The Go team wrote the golang.org/x/sys/windows package to call functions in a Windows DLL. Their way is inefficient and this article describes a better way. The sys/windows way To call a function in a DLL, let’s say kernel32.dll, we must: load the DLL into memory with LoadLibrary; get the address of a function in the DLL; call the function at that address. Here’s how it looks when you use the sys/windows library: var ( libole32 *windows.LazyDLL coCreateInstance *windows.LazyProc ) func init() { libole32 = windows.NewLazySystemDLL("ole32.dll") coCreateInstance = libole32.NewProc("CoCreateInstance") } func CoCreateInstance(rclsid *GUID, pUnkOuter *IUnknown, dwClsContext uint32, riid *GUID, ppv *unsafe.Pointer) HRESULT { ret, _, _ := syscall.SyscallN(coCreateInstance.Addr(), uintptr(unsafe.Pointer(rclsid)), uintptr(unsafe.Pointer(pUnkOuter)), uintptr(dwClsContext), uintptr(unsafe.Pointer(riid)), uintptr(unsafe.Pointer(ppv)), ) return HRESULT(ret) } The problem The problem is that this is memory inefficient. For every function all we need is: (1) the name of the function, to get its address in a DLL. That is a string, so it’s 8 bytes (address of the string) + 8 bytes (size of the string) + the content of the string; (2) the address of the function, which is 8 bytes on a 64-bit CPU. Unfortunately in sys/windows each function requires this: type LazyProc struct { Name string mu sync.Mutex l *LazyDLL proc *Proc } type Proc struct { Dll *DLL Name string addr uintptr } // sync.Mutex type Mutex struct { _ noCopy mu isync.Mutex } // isync.Mutex type Mutex struct { state int32 sema uint32 } Let’s eyeball the size of all those structures: LazyProc: 16 + sizeof(Mutex) + 8 + 8 = 32 + sizeof(Mutex); Proc: 8 + 16 + 8 = 32; Mutex: 8. Total: 32 + 32 + 8 = 72 bytes, and that’s not counting possible memory padding for allocations. Windows has a lot of functions so this adds up. Additionally, at startup we call NewProc for every function, even if they are not used by the program. This increases startup time. 
The better way What we ultimately need is a uintptr for the address of the function. It’ll be lazily looked up. Let’s say we use 8 functions from ole32.dll. We can use a single array of uintptr values for storing function pointers: var oleFuncPtrs [8]uintptr var oleFuncNames = []string{"CoCreateInstance", "CoGetClassObject", ... } const kCoCreateInstance = 0 const kCoGetClassObject = 1 // etc. const kFuncMissing = 1 func funcAddrInDLL(dll *windows.LazyDLL, funcPtrs []uintptr, funcIdx int, funcNames []string) uintptr { addr := funcPtrs[funcIdx] if addr == kFuncMissing { // we already tried to look it up and didn't find it // this can happen because an older version of Windows might not implement this function return 0 } if addr != 0 { return addr } // look up the function by name in the dll name := funcNames[funcIdx] // ... return addr } In real life this would need multi-threading protection with e.g. a mutex. Saving on strings The following is not efficient: var oleFuncNames = []string{"CoCreateInstance", "CoGetClassObject", ... } In addition to the text of the string, Go needs 16 bytes: 8 for a pointer to the string and 8 for the size of the string. We can be more efficient by storing all names as a single string: var oleFuncNames = ` CoCreateInstance CoGetClassObject ` Only when we’re looking up the function by name do we need to construct a temporary string that is a slice of oleFuncNames. We need to know the offset and size inside oleFuncNames, which we can cleverly encode as a single number: // Auto-generated shell procedure identifier: cache index | str start | str past-end. const ( _PROC_SHCreateItemFromIDList _PROC_SHELL = 0 | (9 << 16) | (31 << 32) _PROC_SHCreateItemFromParsingName _PROC_SHELL = 1 | (32 << 16) | (59 << 32) // ... 
) We pack the info into a single number: bits 0-15: index of the function in the array of function pointers; bits 16-31: start of the function name in the multi-name string; bits 32-47: end of the function name in the multi-name string. This technique requires code generation. It would be too difficult to write those numbers manually. References This technique is used in https://github.com/rodrigocfd/windigo, a Go library of win32 bindings. See e.g. https://github.com/rodrigocfd/windigo/blob/master/internal/dll/dll_gdi.go
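The bit layout described above (index in bits 0-15, name start in bits 16-31, name end in bits 32-47) is easy to check with a few lines of arithmetic. The sketch below uses Python purely to illustrate the packing; it is not the windigo code generator, and the helper names `pack`, `unpack`, `generate_ids`, and `name_for` are invented for this example.

```python
# All function names stored as one string, as in the article.
NAMES = "\nCoCreateInstance\nCoGetClassObject\n"


def pack(index: int, start: int, end: int) -> int:
    """Pack three 16-bit fields into one integer."""
    for field in (index, start, end):
        if not 0 <= field < (1 << 16):
            raise ValueError(f"field does not fit in 16 bits: {field}")
    return index | (start << 16) | (end << 32)


def unpack(packed: int) -> tuple[int, int, int]:
    """Recover (index, start, end) from a packed identifier."""
    return packed & 0xFFFF, (packed >> 16) & 0xFFFF, (packed >> 32) & 0xFFFF


def generate_ids(names: str) -> dict[str, int]:
    """Stand-in for the code-generation step: compute an id per name."""
    ids = {}
    offset = 0
    for index, name in enumerate(names.split()):
        start = names.index(name, offset)
        end = start + len(name)
        ids[name] = pack(index, start, end)
        offset = end
    return ids


def name_for(packed: int, names: str = NAMES) -> str:
    """Slice the function name back out of the shared string."""
    _, start, end = unpack(packed)
    return names[start:end]


PROC_IDS = generate_ids(NAMES)
for name, packed in PROC_IDS.items():
    print(name, hex(packed), unpack(packed))
```

Sixteen bits per field caps the shared name string at 65,535 bytes and the pointer table at 65,536 entries, which is why `pack` rejects larger values instead of silently overflowing into the next field.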

4 days ago 5 votes
Lessons along the EndBOX journey

How a wild side-quest became the source of many of the articles you’ve read—and have come to expect—in this publication

5 days ago 6 votes
Making System Calls in x86-64 Assembly

Privilege levels, syscall conventions, and how assembly code talks to the Linux kernel

6 days ago 7 votes