Full Width [alt+shift+f] Shortcuts [alt+shift+k]
Sign Up [alt+shift+s] Log In [alt+shift+l]
91
Obsidian still feels like my “new” app for managing my notes, but according to my daily journal I’ve been using it for nearly three years. Time flies when you’re organising information! I’ve grown to really like it, and I expect to keep using it for a while to come. Its approach to tagging and linked notes fits my mental model, and there’s a lot of flexibility in the plugin architecture. I can make it look nice, add a few basic features I want, and it syncs nicely across my Mac and my iPhone. Inspired by Steph Ango’s example, I thought it might be useful to write a little about how I structure my Obsidian vaults. My setup is somewhat fluid, so consider this as more of a point-in-time snapshot than a definitive approach – I keep tweaking it as I find better ways to organise my notes. My vaults I have two vaults: My work vault has everything that’s specific to my day job – meeting minutes, project plans, notes about ongoing work. This is scoped to my current employer and lives on my work...
a year ago

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from alexwlchan

Handling JSON objects with duplicate names in Python

Consider the following JSON object: { "sides": 4, "colour": "red", "sides": 5, "colour": "blue" } Notice that sides and colour both appear twice. This looks invalid, but I learnt recently that this is actually legal JSON syntax! It’s unusual and discouraged, but it’s not completely forbidden. This was a big surprise to me. I think of JSON objects as key/value pairs, and I associate them with data structures like a dict in Python or a Hash in Ruby – both of which only allow unique keys. JSON has no such restriction, and I started thinking about how to handle it. What does the JSON spec say about duplicate names? JSON is described by several standards, which Wikipedia helpfully explains for us: After RFC 4627 had been available as its “informational” specification since 2006, JSON was first standardized in 2013, as ECMA‑404. RFC 8259, published in 2017, is the current version of the Internet Standard STD 90, and it remains consistent with ECMA‑404. That same year, JSON was also standardized as ISO/IEC 21778:2017. The ECMA and ISO/IEC standards describe only the allowed syntax, whereas the RFC covers some security and interoperability considerations. All three of these standards explicitly allow the use of duplicate names in objects. ECMA‑404 and ISO/IEC 21778:2017 have identical text to describe the syntax of JSON objects, and they say (emphasis mine): An object structure is represented as a pair of curly bracket tokens surrounding zero or more name/value pairs. […] The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange. RFC 8259 goes further and strongly recommends against duplicate names, but the use of SHOULD means it isn’t completely forbidden: The names within an object SHOULD be unique. The same document warns about the consequences of ignoring this recommendation: An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates. So it’s technically valid, but it’s unusual and discouraged. I’ve never heard of a use case for JSON objects with duplicate names. I’m sure there was a good reason for it being allowed by the spec, but I can’t think of it. Most JSON parsers – including jq, JavaScript, and Python – will silently discard all but the last instance of a duplicate name. Here’s an example in Python: >>> import json >>> json.loads('{"sides": 4, "colour": "red", "sides": 5, "colour": "blue"}') {'colour': 'blue', 'sides': 5} What if I wanted to decode the whole object, or throw an exception if I see duplicate names? This happened to me recently. I was editing a JSON file by hand, and I’d copy/paste objects to update the data. I also had scripts which could update the file. I forgot to update the name on one of the JSON objects, so there were two name/value pairs with the same name. When I ran the script, it silently erased the first value. I was able to recover the deleted value from the Git history, but I wondered how I could prevent this happening again. How could I make the script fail, rather than silently delete data? Decoding duplicate names in Python When Python decodes a JSON object, it first parses the object as a list of name/value pairs, then it turns that list of name value pairs into a dictionary. We can see this by looking at the JSONObject function in the CPython source code: it builds a list pairs, and at the end of the function, it calls dict(pairs) to turn the list into a dictionary. This relies on the fact that dict() can take an iterable of key/value tuples and create a dictionary: >>> dict([('sides', 4), ('colour', 'red')]) {'colour': 'red', 'sides': 4} The docs for dict() tell us that it` will discard duplicate keys: “if a key occurs more than once, the last value for that key becomes the corresponding value in the new dictionary”. >>> dict([('sides', 4), ('colour', 'red'), ('sides', 5), ('colour', 'blue')]) {'colour': 'blue', 'sides': 5} We can customise what Python does with the list of name/value pairs. Rather than calling dict(), we can pass our own function to the object_pairs_hook parameter of json.loads(), and Python will call that function on the list of pairs. This allows us to parse objects in a different way. For example, we can just return the literal list of name/value pairs: >>> import json >>> json.loads( ... '{"sides": 4, "colour": "red", "sides": 5, "colour": "blue"}', ... object_pairs_hook=lambda pairs: pairs ... ) ... [('sides', 4), ('colour', 'red'), ('sides', 5), ('colour', 'blue')] We could also use the multidict library to get a dict-like data structure which supports multiple values per key. This is based on HTTP headers and URL query strings, two environments where it’s common to have multiple values for a single key: >>> from multidict import MultiDict >>> md = json.loads( ... '{"sides": 4, "colour": "red", "sides": 5, "colour": "blue"}', ... object_pairs_hook=lambda pairs: MultiDict(pairs) ... ) ... >>> md <MultiDict('sides': 4, 'colour': 'red', 'sides': 5, 'colour': 'blue')> >>> md['sides'] 4 >>> md.getall('sides') [4, 5] Preventing silent data loss If we want to throw an exception when we see duplicate names, we need a longer function. Here’s the code I wrote: import collections import typing def dict_with_unique_names(pairs: list[tuple[str, typing.Any]]) -> dict[str, typing.Any]: """ Convert a list of name/value pairs to a dict, but only if the names are unique. If there are non-unique names, this function throws a ValueError. """ # First try to parse the object as a dictionary; if it's the same # length as the pairs, then we know all the names were unique and # we can return immediately. pairs_as_dict = dict(pairs) if len(pairs_as_dict) == len(pairs): return pairs_as_dict # Otherwise, let's work out what the repeated name(s) were, so we # can throw an appropriate error message for the user. name_tally = collections.Counter(n for n, _ in pairs) repeated_names = [n for n, count in name_tally.items() if count > 1] assert len(repeated_names) > 0 if len(repeated_names) == 1: raise ValueError(f"Found repeated name in JSON object: {repeated_names[0]}") else: raise ValueError( f"Found repeated names in JSON object: {', '.join(repeated_names)}" ) If I use this as my object_pairs_hook when parsing an object which has all unique names, it returns the normal dict I’d expect: >>> json.loads( ... '{"sides": 4, "colour": "red"}', ... object_pairs_hook=dict_with_unique_names ... ) ... {'colour': 'red', 'sides': 4} But if I’m parsing an object with one or more repeated names, the parsing fails and throws a ValueError: >>> json.loads( ... '{"sides": 4, "colour": "red", "sides": 5}', ... object_pairs_hook=dict_with_unique_names ... ) Traceback (most recent call last): […] ValueError: Found repeated name in JSON object: sides >>> json.loads( ... '{"sides": 4, "colour": "red", "sides": 5, "colour": "blue"}', ... object_pairs_hook=dict_with_unique_names ... ) Traceback (most recent call last): […] ValueError: Found repeated names in JSON object: sides, colour This is precisely the behaviour I want – throwing an exception, not silently dropping data. Encoding non-unique names in Python It’s hard to think of a use case, but this post feels incomplete without at least a brief mention. If you want to encode custom data structures with Python’s JSON library, you can subclass JSONEncoder and define how those structures should be serialised. Here’s a rudimentary attempt at doing that for a MultiDict: class MultiDictEncoder(json.JSONEncoder): def encode(self, o: typing.Any) -> str: # If this is a MultiDict, we need to construct the JSON string # manually -- first encode each name/value pair, then construct # the JSON object literal. if isinstance(o, MultiDict): name_value_pairs = [ f'{super().encode(str(name))}: {self.encode(value)}' for name, value in o.items() ] return '{' + ', '.join(name_value_pairs) + '}' return super().encode(o) and here’s how you use it: >>> md = MultiDict([('sides', 4), ('colour', 'red'), ('sides', 5), ('colour', 'blue')]) >>> json.dumps(md, cls=MultiDictEncoder) {"sides": 4, "colour": "red", "sides": 5, "colour": "blue"} This is rough code, and you shouldn’t use it – it’s only an example. I’m constructing the JSON string manually, so it doesn’t handle edge cases like indentation or special characters. There are almost certainly bugs, and you’d need to be more careful if you wanted to use this for real. In practice, if I had to encode a multi-dict as JSON, I’d encode it as a list of objects which each have a key and a value field. For example: [ {"key": "sides", "value": 4 }, {"key": "colour", "value": "red" }, {"key": "sides", "value": 5 }, {"key": "colour", "value": "blue"}, ] This is a pretty standard pattern, and it won’t trip up JSON parsers which aren’t expecting duplicate names. Do you need to worry about this? This isn’t a big deal. JSON objects with duplicate names are pretty unusual – this is the first time I’ve ever encountered one, and it was a mistake. Trying to account for this edge case in every project that uses JSON would be overkill. It would add complexity to my code and probably never catch a single error. This started when I made a copy/paste error that introduced the initial duplication, and then a script modified the JSON file and caused some data loss. That’s a somewhat unusual workflow, because most JSON files are exclusively modified by computers, and this wouldn’t be an issue. I’ve added this error handling to my javascript-data-files library, but I don’t anticipate adding it to other projects. I use that library for my static website archives, which is where I had this issue. Although I won’t use this code exactly, it’s been good practice at writing custom encoders/decoders in Python. That is something I do all the time – I’m often encoding native Python types as JSON, and I want to get the same type back when I decode later. I’ve been writing my own subclasses of JSONEncoder and JSONDecoder for a while. Now I know a bit more about how Python decodes JSON, and object_pairs_hook is another tool I can consider using. This was a fun deep dive for me, and I hope you found it helpful too. [If the formatting of this post looks odd in your feed reader, visit the original article]

2 months ago 15 votes
A faster way to copy SQLite databases between computers

I store a lot of data in SQLite databases on remote servers, and I often want to copy them to my local machine for analysis or backup. When I’m starting a new project and the database is near-empty, this is a simple rsync operation: $ rsync --progress username@server:my_remote_database.db my_local_database.db As the project matures and the database grows, this gets slower and less reliable. Downloading a 250MB database from my web server takes about a minute over my home Internet connection, and that’s pretty small – most of my databases are multiple gigabytes in size. I’ve been trying to make these copies go faster, and I recently discovered a neat trick. What really slows me down is my indexes. I have a lot of indexes in my SQLite databases, which dramatically speed up my queries, but also make the database file larger and slower to copy. (In one database, there’s an index which single-handedly accounts for half the size on disk!) The indexes don’t store anything unique – they just duplicate data from other tables to make queries faster. Copying the indexes makes the transfer less efficient, because I’m copying the same data multiple times. I was thinking about ways to skip copying the indexes, and I realised that SQLite has built-in tools to make this easy. Dumping a database as a text file SQLite allows you to dump a database as a text file. If you use the .dump command, it prints the entire database as a series of SQL statements. This text file can often be significantly smaller than the original database. Here’s the command: $ sqlite3 my_database.db .dump > my_database.db.txt And here’s what the beginning of that file looks like: PRAGMA foreign_keys=OFF; BEGIN TRANSACTION; CREATE TABLE IF NOT EXISTS "tags" ( [name] TEXT PRIMARY KEY, [count_uses] INTEGER NOT NULL ); INSERT INTO tags VALUES('carving',260); INSERT INTO tags VALUES('grass',743); … Crucially, this reduces the large and disk-heavy indexes into a single line of text – it’s an instruction to create an index, not the index itself. CREATE INDEX [idx_photo_locations] ON [photos] ([longitude], [latitude]); This means that I’m only storing each value once, rather than the many times it may be stored across the original table and my indexes. This is how the text file can be smaller than the original database. If you want to reconstruct the database, you pipe this text file back to SQLite: $ cat my_database.db.txt | sqlite3 my_reconstructed_database.db Because the SQL statements are very repetitive, this text responds well to compression: $ sqlite3 explorer.db .dump | gzip -c > explorer.db.txt.gz To give you an idea of the potential savings, here’s the relative disk size for one of my databases. File Size on disk original SQLite database 3.4 GB text file (sqlite3 my_database.db .dump) 1.3 GB gzip-compressed text (sqlite3 my_database.db .dump | gzip -c) 240 MB The gzip-compressed text file is 14× smaller than the original SQLite database – that makes downloading the database much faster. My new ssh+rsync command Rather than copying the database directly, now I create a gzip-compressed text file on the server, copy that to my local machine, and reconstruct the database. Like so: # Create a gzip-compressed text file on the server ssh username@server "sqlite3 my_remote_database.db .dump | gzip -c > my_remote_database.db.txt.gz" # Copy the gzip-compressed text file to my local machine rsync --progress username@server:my_remote_database.db.txt.gz my_local_database.db.txt.gz # Remove the gzip-compressed text file from my server ssh username@server "rm my_remote_database.db.txt.gz" # Uncompress the text file gunzip my_local_database.db.txt.gz # Reconstruct the database from the text file cat my_local_database.db.txt | sqlite3 my_local_database.db # Remove the local text file rm my_local_database.db.txt A database dump is a stable copy source This approach fixes another issue I’ve had when copying SQLite databases. If it takes a long time to copy a database and it gets updated midway through, rsync may give me an invalid database file. The first half of the file is pre-update, the second half file is post-update, and they don’t match. When I try to open the database locally, I get an error: database disk image is malformed By creating a text dump before I start the copy operation, I’m giving rsync a stable copy source. That text dump isn’t going to change midway through the copy, so I’ll always get a complete and consistent text file. This approach has saved me hours when working with large databases, and made my downloads both faster and more reliable. If you have to copy around large SQLite databases, give it a try. [If the formatting of this post looks odd in your feed reader, visit the original article]

2 months ago 14 votes
A flash of light in the darkness

I support dark mode on this site, and as part of the dark theme, I have a colour-inverted copy of the default background texture. I like giving my website a subtle bit of texture, which I think makes it stand out from a web which is mostly solid-colour backgrounds. Both my textures are based on the “White Waves” pattern made by Stas Pimenov. I was setting these images as my background with two CSS rules, using the prefers-color-scheme: dark media feature to use the alternate image in dark mode: body { background: url('https://alexwlchan.net/theme/white-waves-transparent.png'); } @media (prefers-color-scheme: dark) { body { background: url('https://alexwlchan.net/theme/black-waves-transparent.png'); } } This works, mostly. But I prefer light mode, so while I wrote this CSS and I do some brief testing whenever I make changes, I’m not using the site in dark mode. I know how dark mode works in my local development environment, not how it feels as a day-to-day user. Late last night I was using my phone in dark mode to avoid waking the other people in the house, and I opened my site. I saw a brief flash of white, and then the dark background texture appeared. That flash of bright white is precisely what you don’t want when you’re using dark mode, but it happened anyway. I made a note to work it out in the morning, then I went to bed. Now I’m fully awake, it’s obvious what happened. Because my only background is the image URL, there’s a brief gap between the CSS being parsed and the background image being loaded. In that time, the browser doesn’t have anything to put in the background, so you just get pure white. This was briefly annoying in the moment, but it would be even more worse if the background texture never loaded. I have light text on black in dark mode, but without the background image it’s just light text on white, which is barely readable: I never noticed this in local development, because I’m usually working in a well-lit room where that white flash would be far less obvious. I’m also using a local version of the site, which loads near-instantly and where the background image is almost certainly saved in my browser cache. I’ve made two changes to prevent this happening again. I’ve added a colour to use as a fallback until the image loads. The CSS background property supports adding a colour, which is used until the image loads, or as a fallback if it doesn’t. I already use this in a few places, and now I’ve added it to my body background. body { background: url('https://…/white-waves-transparent.png') #fafafa; } @media (prefers-color-scheme: dark) { body { background: url('https://…/black-waves-transparent.png') #0d0d0d; } } This avoids the flash of unstyled background before the image loads – the browser will use a solid dark background until it gets the texture. I’ve added rel="preload" elements to the head of the page, so the browser will start loading the background textures faster. These elements are a clue to the browser that these resources are going to be useful when it renders the page, so it should start loading them as soon as possible: <link rel="preload" href="https://alexwlchan.net/theme/white-waves-transparent.png" as="image" type="image/png" media="(prefers-color-scheme: light)" /> <link rel="preload" href="https://alexwlchan.net/theme/black-waves-transparent.png" as="image" type="image/png" media="(prefers-color-scheme: dark)" /> This means the browser is downloading the appropriate texture at the same time as it’s downloading the CSS file. Previously it had to download the CSS file, parse it, and only then would it know to start downloading the texture. With the preload, it’s a bit faster! The difference is probably imperceptible if you’re on a fast connection, but it’s a small win and I can’t see any downside (as long as I scope the preload correctly, and don’t preload resources I don’t end up using). I’ve seen a lot of sites using <link rel="preload"> and I’ve only half-understood what it is and why it’s useful – I’m glad to have a chance to use it myself, so I can understand it better. This bug reminds me of a phenomenon called flash of unstyled text. Back when custom fonts were fairly new, you’d often see web pages appear briefly with the default font before custom fonts finished loading. There are well-understood techniques for preventing this, so it’s unusual to see that brief unstyled text on modern web pages – but the same issue is affecting me in dark mode I avoided using custom fonts on the web to avoid tackling this issue, but it got me anyway! In these dark times for the web, old bugs are new again. [If the formatting of this post looks odd in your feed reader, visit the original article]

2 months ago 32 votes
Beyond `None`: actionable error messages for `keyring.get_password()`

I’m a big fan of keyring, a Python module made by Jason R. Coombs for storing secrets in the system keyring. It works on multiple operating systems, and it knows what password store to use for each of them. For example, if you’re using macOS it puts secrets in the Keychain, but if you’re on Windows it uses Credential Locker. The keyring module is a safe and portable way to store passwords, more secure than using a plaintext config file or an environment variable. The same code will work on different platforms, because keyring handles the hard work of choosing which password store to use. It has a straightforward API: the keyring.set_password and keyring.get_password functions will handle a lot of use cases. >>> import keyring >>> keyring.set_password("xkcd", "alexwlchan", "correct-horse-battery-staple") >>> keyring.get_password("xkcd", "alexwlchan") "correct-horse-battery-staple" Although this API is simple, it’s not perfect – I have some frustrations with the get_password function. In a lot of my projects, I’m now using a small function that wraps get_password. What do I find frustrating about keyring.get_password? If you look up a password that isn’t in the system keyring, get_password returns None rather than throwing an exception: >>> print(keyring.get_password("xkcd", "the_invisible_man")) None I can see why this makes sense for the library overall – a non-existent password is very normal, and not exceptional behaviour – but in my projects, None is rarely a usable value. I normally use keyring to retrieve secrets that I need to access protected resources – for example, an API key to call an API that requires authentication. If I can’t get the right secrets, I know I can’t continue. Indeed, continuing often leads to more confusing errors when some other function unexpectedly gets None, rather than a string. For a while, I wrapped get_password in a function that would throw an exception if it couldn’t find the password: def get_required_password(service_name: str, username: str) -> str: """ Get password from the specified service. If a matching password is not found in the system keyring, this function will throw an exception. """ password = keyring.get_password(service_name, username) if password is None: raise RuntimeError(f"Could not retrieve password {(service_name, username)}") return password When I use this function, my code will fail as soon as it fails to retrieve a password, rather than when it tries to use None as the password. This worked well enough for my personal projects, but it wasn’t a great fit for shared projects. I could make sense of the error, but not everyone could do the same. What’s that password meant to be? A good error message explains what’s gone wrong, and gives the reader clear steps for fixing the issue. The error message above is only doing half the job. It tells you what’s gone wrong (it couldn’t get the password) but it doesn’t tell you how to fix it. As I started using this snippet in codebases that I work on with other developers, I got questions when other people hit this error. They could guess that they needed to set a password, but the error message doesn’t explain how, or what password they should be setting. For example, is this a secret they should pick themselves? Is it a password in our shared password vault? Or do they need an API key for a third-party service? If so, where do they find it? I still think my initial error was an improvement over letting None be used in the rest of the codebase, but I realised I could go further. This is my extended wrapper: def get_required_password(service_name: str, username: str, explanation: str) -> str: """ Get password from the specified service. If a matching password is not found in the system keyring, this function will throw an exception and explain to the user how to set the required password. """ password = keyring.get_password(service_name, username) if password is None: raise RuntimeError( "Unable to retrieve required password from the system keyring!\n" "\n" "You need to:\n" "\n" f"1/ Get the password. Here's how: {explanation}\n" "\n" "2/ Save the new password in the system keyring:\n" "\n" f" keyring set {service_name} {username}\n" ) return password The explanation argument allows me to explain what the password is for to a future reader, and what value it should have. That information can often be found in a code comment or in documentation, but putting it in an error message makes it more visible. Here’s one example: get_required_password( "flask_app", "secret_key", explanation=( "Pick a random value, e.g. with\n" "\n" " python3 -c 'import secrets; print(secrets.token_hex())'\n" "\n" "This password is used to securely sign the Flask session cookie. " "See https://flask.palletsprojects.com/en/stable/config/#SECRET_KEY" ), ) If you call this function and there’s no keyring entry for flask_app/secret_key, you get the following error: Unable to retrieve required password from the system keyring! You need to: 1/ Get the password. Here's how: Pick a random value, e.g. with python3 -c 'import secrets; print(secrets.token_hex())' This password is used to securely sign the Flask session cookie. See https://flask.palletsprojects.com/en/stable/config/#SECRET_KEY 2/ Save the new password in the system keyring: keyring set flask_app secret_key It’s longer, but this error message is far more informative. It tells you what’s wrong, how to save a password, and what the password should be. This is based on a real example where the previous error message led to a misunderstanding. A co-worker saw a missing password called “secret key” and thought it referred to a secret key for calling an API, and didn’t realise it was actually for signing Flask session cookies. Now I can write a more informative error message, I can prevent that misunderstanding happening again. (We also renamed the secret, for additional clarity.) It takes time to write this explanation, which will only ever be seen by a handful of people, but I think it’s important. If somebody sees it at all, it’ll be when they’re setting up the project for the first time. I want that setup process to be smooth and straightforward. I don’t use this wrapper in all my code, particularly small or throwaway toys that won’t last long enough for this to be an issue. But in larger codebases that will be used by other developers, and which I expect to last a long time, I use it extensively. Writing a good explanation now can avoid frustration later. [If the formatting of this post looks odd in your feed reader, visit the original article]

2 months ago 29 votes
Localising the `` with JavaScript

I’ve been writing some internal dashboards recently, and one hard part is displaying timestamps. Our server does everything in UTC, but the team is split across four different timezones, so the server timestamps aren’t always easy to read. For most people, it’s harder to understand a UTC timestamp than a timestamp in your local timezone. Did that event happen just now, an hour ago, or much further back? Was that at the beginning of your working day? Or at the end? Then I remembered that I tried to solve this five years ago at a previous job. I wrote a JavaScript snippet that converts UTC timestamps into human-friendly text. It displays times in your local time zone, and adds a short suffix if the time happened recently. For example: today @ 12:00 BST (1 hour ago) In my old project, I was using writing timestamps in a <div> and I had to opt into the human-readable text for every date on the page. It worked, but it was a bit fiddly. Doing it again, I thought of a more elegant solution. HTML has a <time> element for expressing datetimes, which is a more meaningful wrapper than a <div>. When I render the dashboard on the server, I don’t know the user’s timezone, so I include the UTC timestamp in the page like so: <time datetime="2025-04-15 19:45:00Z"> Tue, 15 Apr 2025 at 19:45 UTC </time> I put a machine-readable date and time string with a timezone offset string in the datetime attribute, and then a more human-readable string in the text of the element. Then I add this JavaScript snippet to the page: window.addEventListener("DOMContentLoaded", function() { document.querySelectorAll("time").forEach(function(timeElem) { // Set the `title` attribute to the original text, so a user // can hover over a timestamp to see the UTC time. timeElem.setAttribute("title", timeElem.innerText); // Replace the display text with a human-friendly date string // which is localised to the user's timezone. timeElem.innerText = getHumanFriendlyDateString( timeElem.getAttribute("datetime") ); }) }); This updates any <time> element on the page to use a human friendly date string, which is localised to the user’s timezone. For example, I’m in the UK so that becomes: <time datetime="2025-04-15 19:45:00Z" title="Tue, 15 Apr 2025 at 19:45 UTC"> Tue, 15 Apr 2025 at 20:45 BST </time> In my experience, these timestamps are easier and more intuitive for people to read. I always include a timezone string (e.g. BST, EST, PDT) so it’s obvious that I’m showing a localised timestamp. If you really need the UTC timestamp, it’s in the title attribute, so you can see it by hovering over it. (Sorry, mouseless users, but I don’t think any of my team are browsing our dashboards from their phone or tablet.) If the JavaScript doesn’t load, you see the plain old UTC timestamp. It’s not ideal, but the page still loads and you can see all the information – this behaviour is an enhancement, not an essential. To me, this is the unfulfilled promise of the <time> element. In my fantasy world, web page authors would write the time in a machine-readable format, and browsers would show it in a way that makes sense for the reader. They’d take into account their language, locale, and time zone. I understand why that hasn’t happened – it’s much easier said than done. You need so much context to know what’s the “right” thing to do when dealing with datetimes, and guessing without that context is at the heart of many datetime bugs. These sort of human-friendly, localised timestamps are very handy sometimes, and a complete mess at other times. In my staff-only dashboards, I have that context. I know what these timestamps mean, who’s going to be reading them, and I think they’re a helpful addition that makes the data easier to read. [If the formatting of this post looks odd in your feed reader, visit the original article]

2 months ago 39 votes

More in programming

Single-Use Disposable Applications

As search gets worse and “working code” gets cheaper, apps get easier to make from scratch than to find.

11 hours ago 3 votes
Thoughts on Motivation and My 40-Year Career

I’ve never published an essay quite like this. I’ve written about my life before, reams of stuff actually, because that’s how I process what I think, but never for public consumption. I’ve been pushing myself to write more lately because my co-authors and I have a whole fucking book to write between now and October. […]

6 hours ago 3 votes
Desktop UI frameworks written by a single person

Less known desktop UI frameworks Writing desktop software is hard. The UI technologies of Windows or MacOS are awful compared to web technology. What can trivially be done with HTML/CSS/JavaScript in few minutes can take hours using Windows’s win32 APIs or Mac’s Cocoa. That’s why the default technology for desktop apps, especially cross-platform, is Electron: a Chrome browser combined with Node runtime. The problem is that it’s bloaty: each app is a unique build of Chrome with a little bit of application code. Chrome is over 100MB so many apps ship less than 1MB of code in a 100M wrapper. People tried to address the problem of poor OS APIs by writing UI frameworks, often meant to be cross-platform. You’ve heard about QT, GTK, wxWindows. The problem with those is that they are also old, their APIs are not the greatest either and they are bloaty as well. There just doesn’t seem to be a good option. Writing your own framework seems impossible due to the size of task. But is it? I’ll show a couple of less-known UI frameworks written mostly be a single person, often done simply to enable writing an application. SWELL in WDL WDL is interesting. Justin Frankel, the guy who created Winamp, has a repository of C++ code he uses in different projects. After selling Winamp to AOL, a side quest of writing file sharing application, getting fired from AOL for writing file sharing application, he started a company building Reaper a digital audio workstation software for Windows. Winamp is a win32 API program and so is Reaper. At some point Justin decided to make a Mac version but by then he had a lot of code heavily using win32 APIs. So he did what anyone in his position would: he implemented win32 APIs for Mac OS and Linux and called it SWELL - Simple Windows Emulation Layer. Ok, actually no-one else would do it. It was an insane idea but it worked. It’s important to not over-state SWELL capabilities. It’s not Wine. You can’t take any win32 program and recompile for Mac with SWELL. Frankel is insanely pragmatic and so is his code. SWELL only implements the subset of APIs he uses in Reaper. At the same time Reaper is a big app so if SWELL works for Reaper, it could work for your app. WDL is open-source using permissive MIT license. Sublime Text For a few years Sublime Text was THE programmer’s editor. It was written by a single developer in C++ and he wrote a custom UI toolkit for it. Not open source but its existence shows it can be done. RAD Debugger RAD Debugger is an open-source Windows debugger for C/C++ apps written in C by mostly a single person. It implements a custom UI framework based on 3D renderer. The UI is integral part of the the app but the code is well structured so you probably can take just their UI / render code and use it in your own C / C++ app. Currently the app / UI is only for Windows but it’s designed to be cross-platform and they are working on porting the renderer to Mac OS / Linux. They use permissive MIT license and everything is written in C. Dear ImGUI Dear ImGui is a newer cross-platform, UI framework in C++. Open source, permissive MIT license. Written by mostly a single person. Ghostty Ghostty is a cross-platform terminal emulator and UI. It’s written in Zig by mostly a single person and uses it’s own low-level GPU renderer for the UI. You too can write your own UI framework At first the idea of writing your own UI framework seems impossibly daunting. What I’m hoping to show is that if you’re ambitious enough it’s possible to build cross platform desktop apps that are not just bloated 100MB Chrome wrappers around few kilobytes of custom code. I’m not saying it’s a simple thing, just that enough people did it that it’s possible. It shouldn’t be necessary but both Microsoft and Apple have tragically dropped the ball on providing decent, high-performance UI libraries for their OS. Microsoft even writes their own apps, like Teams, in web technologies. Thanks to open source you’re not at the staring line. You can just use Dear ImGUI or WDL’s SWELL. Or you can extract the UI code from RAD Debugger or Ghostty (if you write in Zig). Or you can look at how their implementation to speed up your own design and implementation.

yesterday 2 votes
Logic for Programmers Turns One

I released Logic for Programmers exactly one year ago today. It feels weird to celebrate the anniversary of something that isn't 1.0 yet, but software projects have a proud tradition of celebrating a dozen anniversaries before 1.0. I wanted to share about what's changed in the past year and the work for the next six+ months. The Road to 0.1 I had been noodling on the idea of a logic book since the pandemic. The first time I wrote about it on the newsletter was in 2021! Then I said that it would be done by June and would be "under 50 pages". The idea was to cover logic as a "soft skill" that helped you think about things like requirements and stuff. That version sucked. If you want to see how much it sucked, I put it up on Patreon. Then I slept on the next draft for three years. Then in 2024 a lot of business fell through and I had a lot of free time, so with the help of Saul Pwanson I rewrote the book. This time I emphasized breadth over depth, trying to cover a lot more techniques. I also decided to self-publish it instead of pitching it to a publisher. Not going the traditional route would mean I would be responsible for paying for editing, advertising, graphic design etc, but I hoped that would be compensated by much higher royalties. It also meant I could release the book in early access and use early sales to fund further improvements. So I wrote up a draft in Sphinx, compiled it to LaTeX, and uploaded the PDF to leanpub. That was in June 2024. Since then I kept to a monthly cadence of updates, missing once in November (short-notice contract) and once last month (Systems Distributed). The book's now on v0.10. What's changed? A LOT v0.1 was very obviously an alpha, and I have made a lot of improvements since then. For one, the book no longer looks like a Sphinx manual. Compare! Also, the content is very, very different. v0.1 was 19,000 words, v.10 is 31,000.1 This comes from new chapters on TLA+, constraint/SMT solving, logic programming, and major expansions to the existing chapters. Originally, "Simplifying Conditionals" was 600 words. Six hundred words! It almost fit in two pages! The chapter is now 2600 words, now covering condition lifting, quantifier manipulation, helper predicates, and set optimizations. All the other chapters have either gotten similar facelifts or are scheduled to get facelifts. The last big change is the addition of book assets. Originally you had to manually copy over all of the code to try it out, which is a problem when there are samples in eight distinct languages! Now there are ready-to-go examples for each chapter, with instructions on how to set up each programming environment. This is also nice because it gives me breaks from writing to code instead. How did the book do? Leanpub's all-time visualizations are terrible, so I'll just give the summary: 1180 copies sold, $18,241 in royalties. That's a lot of money for something that isn't fully out yet! By comparison, Practical TLA+ has made me less than half of that, despite selling over 5x as many books. Self-publishing was the right choice! In that time I've paid about $400 for the book cover (worth it) and maybe $800 in Leanpub's advertising service (probably not worth it). Right now that doesn't come close to making back the time investment, but I think it can get there post-release. I believe there's a lot more potential customers via marketing. I think post-release 10k copies sold is within reach. Where is the book going? The main content work is rewrites: many of the chapters have not meaningfully changed since 1.0, so I am going through and rewriting them from scratch. So far four of the ten chapters have been rewritten. My (admittedly ambitious) goal is to rewrite three of them by the end of this month and another three by the end of next. I also want to do final passes on the rewritten chapters; as most of them have a few TODOs left lying around. (Also somehow in starting this newsletter and publishing it I realized that one of the chapters might be better split into two chapters, so there could well-be a tenth technique in v0.11 or v0.12!) After that, I will pass it to a copy editor while I work on improving the layout, making images, and indexing. I want to have something worthy of printing on a dead tree by 1.0. In terms of timelines, I am very roughly estimating something like this: Summer: final big changes and rewrites Early Autumn: graphic design and copy editing Late Autumn: proofing, figuring out printing stuff Winter: final ebook and initial print releases of 1.0. (If you know a service that helps get self-published books "past the finish line", I'd love to hear about it! Preferably something that works for a fee, not part of royalties.) This timeline may be disrupted by official client work, like a new TLA+ contract or a conference invitation. Needless to say, I am incredibly excited to complete this book and share the final version with you all. This is a book I wished for years ago, a book I wrote because nobody else would. It fills a critical gap in software educational material, and someday soon I'll be able to put a copy on my bookshelf. It's exhilarating and terrifying and above all, satisfying. It's also 150 pages vs 50 pages, but admittedly this is partially because I made the book smaller with a larger font. ↩

2 days ago 4 votes
Implementing UI translation in SumatraPDF, a C++ Windows application

Translating user interface of SumatraPDF SumatraPDF is the best PDF/eBook/Comic Book viewer for Windows. It’s small, fast, full of features, free and open-source. It became popular enough that it made sense to translate the UI for non-English users. Currently we support 72 languages. This article describes how I designed and implemented a translation system in SumatraPDF, a native win32 C++ Windows application. Hard things about translating the UI There are 2 hard things about translating an application code for translation system (extracting strings to translate, translate strings from English to user’s language) translating them into many languages Extracting strings to translate from source code Currently there are 381 strings in SumatraPDF subject to translation. It’s important that the system requires the least amount of effort when adding new strings to translate. Every string that needs to be translated is marked in .cpp or .h file with one of two macros: _TRA("Rename") _TRN("Open") I have a script that extracts those strings from source files. Mine is written in Go but it could just as well be Python or JavaScript. It’s a simple regex job. _TR stands for “translation”. _TRA(s) expands into const char* trans::GetTranslation(const char* str) function which returns str translated to current UI language. We auto-detect language at startup based on Windows settings and allow the user to explicitly set UI language. For English we just return the original string. If a string to be translated is e.g. a part of const char* array[], we can’t use trans::GetTranslation(). For cases like that we have _TRN() which expands to English string. We have to write code to translate it at some point. Adding new strings is therefore as simple as wrapping them in _TRA() or _TRN() macros. Translating strings into many languages Now that we’ve extracted strings to be translated, we need to translate them into 72 languages. SumatraPDF is a free, open-source program. I don’t have a budget to hire translators. I don’t have a budget, period. The only option was to get help from SumatraPDF users. It was vital to make it very easy for users to send me translations. I didn’t want to ask them, for example, to download some translation software. Design and implementation of AppTranslator web app I couldn’t find a really simple software for crowd sourcing translations so I wrote my own: https://github.com/kjk/apptranslator You can see it in action: https://www.apptranslator.org/app/SumatraPDF I designed it to be generic but I don’t think anyone else is using it. AppTranslator is simple. Per https://tools.arslexis.io/wc/: 4k lines of Go server code 451 lines of html code a single dependency: bootstrap CSS framework (the project is old) It’s simple because I don’t want to spend a lot of time writing translation software. It’s just a side project in service of the goal of translating SumatraPDF. Login is exclusively via GitHub. It doesn’t even use a database. Like in Redis, changes are stored as a series of operations in an append-only log. We keep the whole state in memory and re-create it from the log at startup. Main operation is translate a string from English to language X represented as [kOpTranslation, english string, language, translation, user who provided translation]. When user provides a translation in the web UI, we send an API call to the server which appends the translation operation to the log. Simple and reliable. Because the code is written in Go, it’s very fast and memory efficient. When running it uses mere megabytes of RAM. It can comfortably run on the smallest 256 MB VPS server. I backup the log to S3 so if the server ever fails, I can re-install the program on a new server and re-download the translations from S3. I provide RSS feed for each language so that people who provide translations can monitor for new strings to be translated. Sending strings for translation and receiving translations So I have a web app for collecting translations and a script that extracts strings to be translated from source code. How do they connect? AppTranslator has an API for submitting the current set of strings to be translated in the simplest possible format: a line for each string (I ensure there are no newlines in the string itself by escaping them with \n) API is password protected because only I can submit the strings. The server compares the strings sent with the current set and records a difference in the log. It also sends a response with translations. Again the simplest possible format: AppTranslator: SumatraPDF 651b739d7fa110911f25563c933f42b1d37590f8 :%s annotation. Ctrl+click to edit. am:%s մեկնաբանություն: Ctrl+քլիք՝ խմբագրելու համար: ar:ملاحظة %s. اضغط Ctrl للتحرير. az:Qeyd %s. Düzəliş etmək üçün Ctrl+düyməyə basın. As you can see: a string to translate is on a line starting with : is followed by translations of that strings in the format: ${lang}: ${translation} An optimization: 651b739d7fa110911f25563c933f42b1d37590f8 is a hash of this response. If I submit this hash with my request and translations didn’t change on the server, the response is empty. Implementing C++ part of translation system So now I have a text file with translation downloaded from the server. How do I get a translation in my C++ code? As with everything in SumatraPDF, I try to do things in a simple and efficient way. The whole Translation.cpp is only 239 lines of code. The core of translation system is const char* trans::GetTranslation(const char* s); function. I embed the translations in exact the same format as received from AppTranslator in the executable as data file in resources. If the UI language is English, we do nothing. trans::GetTranslation() returns its argument. When we switch the language, we load the translations from resources and build an index: an array of English strings an array of corresponding translations Both arrays use my own StrVec class optimized for storing an array of strings. To find a translation we scan the first array to find an index of the string and return translation from the second array, at the same index. Linear scan seems like it would be slow but it isn’t. Resizing dialogs I have a few dialogs defined in SumatraPDF.rc file. The problem with dialogs is that position of UI elements is fixed. A translated string will almost certainly have a different size than the English string which will mess up fixed layout. Thankfully someone wrote DialogSizer that smartly resizes dialogs and solves this problem. The evolution of a solution No AppTranslator My initial implementation was simpler. I didn’t yet have AppTranslator so I stored the strings in a text file in repository in the same format as what I described above. People would download it, make changes using a text editor and send me the file via email which I would then checkin. It worked for a while but it became worse over time. More strings, more languages created more work for me to manually manage e-mail submissions. I decided to automate the process. Code generation My first implementation of C++ side used code generation instead of embedding the text file in resources. My Go script would generate C++ source code files with static const char* [] arrays. This worked well but I decided to improve it further by making the code use the text file with translations embedded in the app. The main motivation for the change was to open a possibility of downloading latest translations from the server to fix the problem of translations not being all ready when I build the release executable. I haven’t done that yet but it’s now easier to implement given that the format of strings embedded in the exe is the same as the one I can download from AppTranslator. Only utf-8 SumatraPDF started by using both WCHAR* Unicode strings and char* utf8 strings. For that reason the translation system had to support returning translation in both WCHAR* and char* version. Over time I refactored the code to use mostly utf8 and at some point I no longer needed to support WCHAR* version. That made the code even smaller and reduced memory usage. The experience I’m happy how things turned out. AppTranslator proved to be reliable and hassle free. It runs for many years now and collected 35440 string translations from users. I automated everything so that all I need to do is to periodically re-run the script that extracts strings from source code, uploads them to AppTranslator and downloads latest translations. One problem is that translations are not always ready in time for release so I make a release and then people start translating strings added since last release. I’ve considered downloading the latest translations from the server, in addition to embedding them in an executable at the time of building the app. Would I do the same today? While AppTranslator is reliable and doesn’t require on-going work, it would be better to not have to run a server at all. The world has changed since I started SumatraPDF. Namely: people are comfortable using GitHub and you can edit files directly in GitHub UI. It’s not a great experience but it works. One option would be to generate a translation text file for each language, in this format: :first untranslated string :second untranslated string :first translated string translation of first string :second translated string translation of second string Untranslated strings are listed at the top, to make it easier to find. A link would send a translator directly to edit this file in GitHub UI. When translator saves translations, it creates a PR for me to review and merge. The roads not taken But why did you re-invent everything? You should do X instead. All other X that I know about suck. Using per-language .rc resource files Traditional way of localizing / translating Window GUI apps is to store all strings and dialog definitions in an .rc file. Each language gets its own .rc file (or files) and the program picks the right resource based on a language. This doesn’t solve the 2 hard problems: having an easy way to add strings for translations having an easy way for users to provide translations XML horror show There was a dark time when the world was under the iron grip of XML fanaticism. Everything had to be an XML file even when it was the worst possible solution for the problem. XML doesn’t solve the 2 hard problems and a string storage format is an absolute nightmare for human editing. GNU gettext There’s a C library gettext that uses .po files. This is much saner solution than XML horror show. .po files are relatively simple text format. The code is already written. Warning: tooting my own horn. My format is better. It’s easier for people to edit, it’s easier to write code to parse it. This looks like many times more than 239 lines of code. Ok, gettext probably does a bit more than my code, but clearly nothing than I need. It also doesn’t solve the 2 hard problems. I would still have to write code to extract strings from source code and build a way to allow users to translate them easily.

2 days ago 3 votes