Full Width [alt+shift+f] Shortcuts [alt+shift+k]
Sign Up [alt+shift+s] Log In [alt+shift+l]
31
I left school in 2011, and I graduated from university in 2014. When I was done, I had six plastic crates full of paper – exercise books, worksheets, school newsletters – everything I’d accumulated over nearly two decades of education. It was a lot. 🤯 I’m trying to reduce this to a more manageable size. There’s some stuff I definitely want to keep – letters, handwritten notes, major bits of work – but there’s a lot of stuff I’m never going to look at again, which can be safely recycled. For example, I had over thirty maths exercise books, which are very repetitive and pretty meaningless without the accompanying textbook. I’ve kept one or two representative examples, but I definitely don’t need all of them. To be on the safe side, I’ve been scanning and digitising as I go, so I have digital copies of everything I’ve thrown away. It’s taken a while and there’s still more I could digitise and recycle, but I’ve substantially reduced the pile. To give you an idea of scale, I’ve...
over a year ago

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from alexwlchan

Handling JSON objects with duplicate names in Python

Consider the following JSON object: { "sides": 4, "colour": "red", "sides": 5, "colour": "blue" } Notice that sides and colour both appear twice. This looks invalid, but I learnt recently that this is actually legal JSON syntax! It’s unusual and discouraged, but it’s not completely forbidden. This was a big surprise to me. I think of JSON objects as key/value pairs, and I associate them with data structures like a dict in Python or a Hash in Ruby – both of which only allow unique keys. JSON has no such restriction, and I started thinking about how to handle it. What does the JSON spec say about duplicate names? JSON is described by several standards, which Wikipedia helpfully explains for us: After RFC 4627 had been available as its “informational” specification since 2006, JSON was first standardized in 2013, as ECMA‑404. RFC 8259, published in 2017, is the current version of the Internet Standard STD 90, and it remains consistent with ECMA‑404. That same year, JSON was also standardized as ISO/IEC 21778:2017. The ECMA and ISO/IEC standards describe only the allowed syntax, whereas the RFC covers some security and interoperability considerations. All three of these standards explicitly allow the use of duplicate names in objects. ECMA‑404 and ISO/IEC 21778:2017 have identical text to describe the syntax of JSON objects, and they say (emphasis mine): An object structure is represented as a pair of curly bracket tokens surrounding zero or more name/value pairs. […] The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange. RFC 8259 goes further and strongly recommends against duplicate names, but the use of SHOULD means it isn’t completely forbidden: The names within an object SHOULD be unique. The same document warns about the consequences of ignoring this recommendation: An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates. So it’s technically valid, but it’s unusual and discouraged. I’ve never heard of a use case for JSON objects with duplicate names. I’m sure there was a good reason for it being allowed by the spec, but I can’t think of it. Most JSON parsers – including jq, JavaScript, and Python – will silently discard all but the last instance of a duplicate name. Here’s an example in Python: >>> import json >>> json.loads('{"sides": 4, "colour": "red", "sides": 5, "colour": "blue"}') {'colour': 'blue', 'sides': 5} What if I wanted to decode the whole object, or throw an exception if I see duplicate names? This happened to me recently. I was editing a JSON file by hand, and I’d copy/paste objects to update the data. I also had scripts which could update the file. I forgot to update the name on one of the JSON objects, so there were two name/value pairs with the same name. When I ran the script, it silently erased the first value. I was able to recover the deleted value from the Git history, but I wondered how I could prevent this happening again. How could I make the script fail, rather than silently delete data? Decoding duplicate names in Python When Python decodes a JSON object, it first parses the object as a list of name/value pairs, then it turns that list of name value pairs into a dictionary. We can see this by looking at the JSONObject function in the CPython source code: it builds a list pairs, and at the end of the function, it calls dict(pairs) to turn the list into a dictionary. This relies on the fact that dict() can take an iterable of key/value tuples and create a dictionary: >>> dict([('sides', 4), ('colour', 'red')]) {'colour': 'red', 'sides': 4} The docs for dict() tell us that it` will discard duplicate keys: “if a key occurs more than once, the last value for that key becomes the corresponding value in the new dictionary”. >>> dict([('sides', 4), ('colour', 'red'), ('sides', 5), ('colour', 'blue')]) {'colour': 'blue', 'sides': 5} We can customise what Python does with the list of name/value pairs. Rather than calling dict(), we can pass our own function to the object_pairs_hook parameter of json.loads(), and Python will call that function on the list of pairs. This allows us to parse objects in a different way. For example, we can just return the literal list of name/value pairs: >>> import json >>> json.loads( ... '{"sides": 4, "colour": "red", "sides": 5, "colour": "blue"}', ... object_pairs_hook=lambda pairs: pairs ... ) ... [('sides', 4), ('colour', 'red'), ('sides', 5), ('colour', 'blue')] We could also use the multidict library to get a dict-like data structure which supports multiple values per key. This is based on HTTP headers and URL query strings, two environments where it’s common to have multiple values for a single key: >>> from multidict import MultiDict >>> md = json.loads( ... '{"sides": 4, "colour": "red", "sides": 5, "colour": "blue"}', ... object_pairs_hook=lambda pairs: MultiDict(pairs) ... ) ... >>> md <MultiDict('sides': 4, 'colour': 'red', 'sides': 5, 'colour': 'blue')> >>> md['sides'] 4 >>> md.getall('sides') [4, 5] Preventing silent data loss If we want to throw an exception when we see duplicate names, we need a longer function. Here’s the code I wrote: import collections import typing def dict_with_unique_names(pairs: list[tuple[str, typing.Any]]) -> dict[str, typing.Any]: """ Convert a list of name/value pairs to a dict, but only if the names are unique. If there are non-unique names, this function throws a ValueError. """ # First try to parse the object as a dictionary; if it's the same # length as the pairs, then we know all the names were unique and # we can return immediately. pairs_as_dict = dict(pairs) if len(pairs_as_dict) == len(pairs): return pairs_as_dict # Otherwise, let's work out what the repeated name(s) were, so we # can throw an appropriate error message for the user. name_tally = collections.Counter(n for n, _ in pairs) repeated_names = [n for n, count in name_tally.items() if count > 1] assert len(repeated_names) > 0 if len(repeated_names) == 1: raise ValueError(f"Found repeated name in JSON object: {repeated_names[0]}") else: raise ValueError( f"Found repeated names in JSON object: {', '.join(repeated_names)}" ) If I use this as my object_pairs_hook when parsing an object which has all unique names, it returns the normal dict I’d expect: >>> json.loads( ... '{"sides": 4, "colour": "red"}', ... object_pairs_hook=dict_with_unique_names ... ) ... {'colour': 'red', 'sides': 4} But if I’m parsing an object with one or more repeated names, the parsing fails and throws a ValueError: >>> json.loads( ... '{"sides": 4, "colour": "red", "sides": 5}', ... object_pairs_hook=dict_with_unique_names ... ) Traceback (most recent call last): […] ValueError: Found repeated name in JSON object: sides >>> json.loads( ... '{"sides": 4, "colour": "red", "sides": 5, "colour": "blue"}', ... object_pairs_hook=dict_with_unique_names ... ) Traceback (most recent call last): […] ValueError: Found repeated names in JSON object: sides, colour This is precisely the behaviour I want – throwing an exception, not silently dropping data. Encoding non-unique names in Python It’s hard to think of a use case, but this post feels incomplete without at least a brief mention. If you want to encode custom data structures with Python’s JSON library, you can subclass JSONEncoder and define how those structures should be serialised. Here’s a rudimentary attempt at doing that for a MultiDict: class MultiDictEncoder(json.JSONEncoder): def encode(self, o: typing.Any) -> str: # If this is a MultiDict, we need to construct the JSON string # manually -- first encode each name/value pair, then construct # the JSON object literal. if isinstance(o, MultiDict): name_value_pairs = [ f'{super().encode(str(name))}: {self.encode(value)}' for name, value in o.items() ] return '{' + ', '.join(name_value_pairs) + '}' return super().encode(o) and here’s how you use it: >>> md = MultiDict([('sides', 4), ('colour', 'red'), ('sides', 5), ('colour', 'blue')]) >>> json.dumps(md, cls=MultiDictEncoder) {"sides": 4, "colour": "red", "sides": 5, "colour": "blue"} This is rough code, and you shouldn’t use it – it’s only an example. I’m constructing the JSON string manually, so it doesn’t handle edge cases like indentation or special characters. There are almost certainly bugs, and you’d need to be more careful if you wanted to use this for real. In practice, if I had to encode a multi-dict as JSON, I’d encode it as a list of objects which each have a key and a value field. For example: [ {"key": "sides", "value": 4 }, {"key": "colour", "value": "red" }, {"key": "sides", "value": 5 }, {"key": "colour", "value": "blue"}, ] This is a pretty standard pattern, and it won’t trip up JSON parsers which aren’t expecting duplicate names. Do you need to worry about this? This isn’t a big deal. JSON objects with duplicate names are pretty unusual – this is the first time I’ve ever encountered one, and it was a mistake. Trying to account for this edge case in every project that uses JSON would be overkill. It would add complexity to my code and probably never catch a single error. This started when I made a copy/paste error that introduced the initial duplication, and then a script modified the JSON file and caused some data loss. That’s a somewhat unusual workflow, because most JSON files are exclusively modified by computers, and this wouldn’t be an issue. I’ve added this error handling to my javascript-data-files library, but I don’t anticipate adding it to other projects. I use that library for my static website archives, which is where I had this issue. Although I won’t use this code exactly, it’s been good practice at writing custom encoders/decoders in Python. That is something I do all the time – I’m often encoding native Python types as JSON, and I want to get the same type back when I decode later. I’ve been writing my own subclasses of JSONEncoder and JSONDecoder for a while. Now I know a bit more about how Python decodes JSON, and object_pairs_hook is another tool I can consider using. This was a fun deep dive for me, and I hope you found it helpful too. [If the formatting of this post looks odd in your feed reader, visit the original article]

3 weeks ago 7 votes
A faster way to copy SQLite databases between computers

I store a lot of data in SQLite databases on remote servers, and I often want to copy them to my local machine for analysis or backup. When I’m starting a new project and the database is near-empty, this is a simple rsync operation: $ rsync --progress username@server:my_remote_database.db my_local_database.db As the project matures and the database grows, this gets slower and less reliable. Downloading a 250MB database from my web server takes about a minute over my home Internet connection, and that’s pretty small – most of my databases are multiple gigabytes in size. I’ve been trying to make these copies go faster, and I recently discovered a neat trick. What really slows me down is my indexes. I have a lot of indexes in my SQLite databases, which dramatically speed up my queries, but also make the database file larger and slower to copy. (In one database, there’s an index which single-handedly accounts for half the size on disk!) The indexes don’t store anything unique – they just duplicate data from other tables to make queries faster. Copying the indexes makes the transfer less efficient, because I’m copying the same data multiple times. I was thinking about ways to skip copying the indexes, and I realised that SQLite has built-in tools to make this easy. Dumping a database as a text file SQLite allows you to dump a database as a text file. If you use the .dump command, it prints the entire database as a series of SQL statements. This text file can often be significantly smaller than the original database. Here’s the command: $ sqlite3 my_database.db .dump > my_database.db.txt And here’s what the beginning of that file looks like: PRAGMA foreign_keys=OFF; BEGIN TRANSACTION; CREATE TABLE IF NOT EXISTS "tags" ( [name] TEXT PRIMARY KEY, [count_uses] INTEGER NOT NULL ); INSERT INTO tags VALUES('carving',260); INSERT INTO tags VALUES('grass',743); … Crucially, this reduces the large and disk-heavy indexes into a single line of text – it’s an instruction to create an index, not the index itself. CREATE INDEX [idx_photo_locations] ON [photos] ([longitude], [latitude]); This means that I’m only storing each value once, rather than the many times it may be stored across the original table and my indexes. This is how the text file can be smaller than the original database. If you want to reconstruct the database, you pipe this text file back to SQLite: $ cat my_database.db.txt | sqlite3 my_reconstructed_database.db Because the SQL statements are very repetitive, this text responds well to compression: $ sqlite3 explorer.db .dump | gzip -c > explorer.db.txt.gz To give you an idea of the potential savings, here’s the relative disk size for one of my databases. File Size on disk original SQLite database 3.4 GB text file (sqlite3 my_database.db .dump) 1.3 GB gzip-compressed text (sqlite3 my_database.db .dump | gzip -c) 240 MB The gzip-compressed text file is 14× smaller than the original SQLite database – that makes downloading the database much faster. My new ssh+rsync command Rather than copying the database directly, now I create a gzip-compressed text file on the server, copy that to my local machine, and reconstruct the database. Like so: # Create a gzip-compressed text file on the server ssh username@server "sqlite3 my_remote_database.db .dump | gzip -c > my_remote_database.db.txt.gz" # Copy the gzip-compressed text file to my local machine rsync --progress username@server:my_remote_database.db.txt.gz my_local_database.db.txt.gz # Remove the gzip-compressed text file from my server ssh username@server "rm my_remote_database.db.txt.gz" # Uncompress the text file gunzip my_local_database.db.txt.gz # Reconstruct the database from the text file cat my_local_database.db.txt | sqlite3 my_local_database.db # Remove the local text file rm my_local_database.db.txt A database dump is a stable copy source This approach fixes another issue I’ve had when copying SQLite databases. If it takes a long time to copy a database and it gets updated midway through, rsync may give me an invalid database file. The first half of the file is pre-update, the second half file is post-update, and they don’t match. When I try to open the database locally, I get an error: database disk image is malformed By creating a text dump before I start the copy operation, I’m giving rsync a stable copy source. That text dump isn’t going to change midway through the copy, so I’ll always get a complete and consistent text file. This approach has saved me hours when working with large databases, and made my downloads both faster and more reliable. If you have to copy around large SQLite databases, give it a try. [If the formatting of this post looks odd in your feed reader, visit the original article]

a month ago 7 votes
A flash of light in the darkness

I support dark mode on this site, and as part of the dark theme, I have a colour-inverted copy of the default background texture. I like giving my website a subtle bit of texture, which I think makes it stand out from a web which is mostly solid-colour backgrounds. Both my textures are based on the “White Waves” pattern made by Stas Pimenov. I was setting these images as my background with two CSS rules, using the prefers-color-scheme: dark media feature to use the alternate image in dark mode: body { background: url('https://alexwlchan.net/theme/white-waves-transparent.png'); } @media (prefers-color-scheme: dark) { body { background: url('https://alexwlchan.net/theme/black-waves-transparent.png'); } } This works, mostly. But I prefer light mode, so while I wrote this CSS and I do some brief testing whenever I make changes, I’m not using the site in dark mode. I know how dark mode works in my local development environment, not how it feels as a day-to-day user. Late last night I was using my phone in dark mode to avoid waking the other people in the house, and I opened my site. I saw a brief flash of white, and then the dark background texture appeared. That flash of bright white is precisely what you don’t want when you’re using dark mode, but it happened anyway. I made a note to work it out in the morning, then I went to bed. Now I’m fully awake, it’s obvious what happened. Because my only background is the image URL, there’s a brief gap between the CSS being parsed and the background image being loaded. In that time, the browser doesn’t have anything to put in the background, so you just get pure white. This was briefly annoying in the moment, but it would be even more worse if the background texture never loaded. I have light text on black in dark mode, but without the background image it’s just light text on white, which is barely readable: I never noticed this in local development, because I’m usually working in a well-lit room where that white flash would be far less obvious. I’m also using a local version of the site, which loads near-instantly and where the background image is almost certainly saved in my browser cache. I’ve made two changes to prevent this happening again. I’ve added a colour to use as a fallback until the image loads. The CSS background property supports adding a colour, which is used until the image loads, or as a fallback if it doesn’t. I already use this in a few places, and now I’ve added it to my body background. body { background: url('https://…/white-waves-transparent.png') #fafafa; } @media (prefers-color-scheme: dark) { body { background: url('https://…/black-waves-transparent.png') #0d0d0d; } } This avoids the flash of unstyled background before the image loads – the browser will use a solid dark background until it gets the texture. I’ve added rel="preload" elements to the head of the page, so the browser will start loading the background textures faster. These elements are a clue to the browser that these resources are going to be useful when it renders the page, so it should start loading them as soon as possible: <link rel="preload" href="https://alexwlchan.net/theme/white-waves-transparent.png" as="image" type="image/png" media="(prefers-color-scheme: light)" /> <link rel="preload" href="https://alexwlchan.net/theme/black-waves-transparent.png" as="image" type="image/png" media="(prefers-color-scheme: dark)" /> This means the browser is downloading the appropriate texture at the same time as it’s downloading the CSS file. Previously it had to download the CSS file, parse it, and only then would it know to start downloading the texture. With the preload, it’s a bit faster! The difference is probably imperceptible if you’re on a fast connection, but it’s a small win and I can’t see any downside (as long as I scope the preload correctly, and don’t preload resources I don’t end up using). I’ve seen a lot of sites using <link rel="preload"> and I’ve only half-understood what it is and why it’s useful – I’m glad to have a chance to use it myself, so I can understand it better. This bug reminds me of a phenomenon called flash of unstyled text. Back when custom fonts were fairly new, you’d often see web pages appear briefly with the default font before custom fonts finished loading. There are well-understood techniques for preventing this, so it’s unusual to see that brief unstyled text on modern web pages – but the same issue is affecting me in dark mode I avoided using custom fonts on the web to avoid tackling this issue, but it got me anyway! In these dark times for the web, old bugs are new again. [If the formatting of this post looks odd in your feed reader, visit the original article]

a month ago 20 votes
Beyond `None`: actionable error messages for `keyring.get_password()`

I’m a big fan of keyring, a Python module made by Jason R. Coombs for storing secrets in the system keyring. It works on multiple operating systems, and it knows what password store to use for each of them. For example, if you’re using macOS it puts secrets in the Keychain, but if you’re on Windows it uses Credential Locker. The keyring module is a safe and portable way to store passwords, more secure than using a plaintext config file or an environment variable. The same code will work on different platforms, because keyring handles the hard work of choosing which password store to use. It has a straightforward API: the keyring.set_password and keyring.get_password functions will handle a lot of use cases. >>> import keyring >>> keyring.set_password("xkcd", "alexwlchan", "correct-horse-battery-staple") >>> keyring.get_password("xkcd", "alexwlchan") "correct-horse-battery-staple" Although this API is simple, it’s not perfect – I have some frustrations with the get_password function. In a lot of my projects, I’m now using a small function that wraps get_password. What do I find frustrating about keyring.get_password? If you look up a password that isn’t in the system keyring, get_password returns None rather than throwing an exception: >>> print(keyring.get_password("xkcd", "the_invisible_man")) None I can see why this makes sense for the library overall – a non-existent password is very normal, and not exceptional behaviour – but in my projects, None is rarely a usable value. I normally use keyring to retrieve secrets that I need to access protected resources – for example, an API key to call an API that requires authentication. If I can’t get the right secrets, I know I can’t continue. Indeed, continuing often leads to more confusing errors when some other function unexpectedly gets None, rather than a string. For a while, I wrapped get_password in a function that would throw an exception if it couldn’t find the password: def get_required_password(service_name: str, username: str) -> str: """ Get password from the specified service. If a matching password is not found in the system keyring, this function will throw an exception. """ password = keyring.get_password(service_name, username) if password is None: raise RuntimeError(f"Could not retrieve password {(service_name, username)}") return password When I use this function, my code will fail as soon as it fails to retrieve a password, rather than when it tries to use None as the password. This worked well enough for my personal projects, but it wasn’t a great fit for shared projects. I could make sense of the error, but not everyone could do the same. What’s that password meant to be? A good error message explains what’s gone wrong, and gives the reader clear steps for fixing the issue. The error message above is only doing half the job. It tells you what’s gone wrong (it couldn’t get the password) but it doesn’t tell you how to fix it. As I started using this snippet in codebases that I work on with other developers, I got questions when other people hit this error. They could guess that they needed to set a password, but the error message doesn’t explain how, or what password they should be setting. For example, is this a secret they should pick themselves? Is it a password in our shared password vault? Or do they need an API key for a third-party service? If so, where do they find it? I still think my initial error was an improvement over letting None be used in the rest of the codebase, but I realised I could go further. This is my extended wrapper: def get_required_password(service_name: str, username: str, explanation: str) -> str: """ Get password from the specified service. If a matching password is not found in the system keyring, this function will throw an exception and explain to the user how to set the required password. """ password = keyring.get_password(service_name, username) if password is None: raise RuntimeError( "Unable to retrieve required password from the system keyring!\n" "\n" "You need to:\n" "\n" f"1/ Get the password. Here's how: {explanation}\n" "\n" "2/ Save the new password in the system keyring:\n" "\n" f" keyring set {service_name} {username}\n" ) return password The explanation argument allows me to explain what the password is for to a future reader, and what value it should have. That information can often be found in a code comment or in documentation, but putting it in an error message makes it more visible. Here’s one example: get_required_password( "flask_app", "secret_key", explanation=( "Pick a random value, e.g. with\n" "\n" " python3 -c 'import secrets; print(secrets.token_hex())'\n" "\n" "This password is used to securely sign the Flask session cookie. " "See https://flask.palletsprojects.com/en/stable/config/#SECRET_KEY" ), ) If you call this function and there’s no keyring entry for flask_app/secret_key, you get the following error: Unable to retrieve required password from the system keyring! You need to: 1/ Get the password. Here's how: Pick a random value, e.g. with python3 -c 'import secrets; print(secrets.token_hex())' This password is used to securely sign the Flask session cookie. See https://flask.palletsprojects.com/en/stable/config/#SECRET_KEY 2/ Save the new password in the system keyring: keyring set flask_app secret_key It’s longer, but this error message is far more informative. It tells you what’s wrong, how to save a password, and what the password should be. This is based on a real example where the previous error message led to a misunderstanding. A co-worker saw a missing password called “secret key” and thought it referred to a secret key for calling an API, and didn’t realise it was actually for signing Flask session cookies. Now I can write a more informative error message, I can prevent that misunderstanding happening again. (We also renamed the secret, for additional clarity.) It takes time to write this explanation, which will only ever be seen by a handful of people, but I think it’s important. If somebody sees it at all, it’ll be when they’re setting up the project for the first time. I want that setup process to be smooth and straightforward. I don’t use this wrapper in all my code, particularly small or throwaway toys that won’t last long enough for this to be an issue. But in larger codebases that will be used by other developers, and which I expect to last a long time, I use it extensively. Writing a good explanation now can avoid frustration later. [If the formatting of this post looks odd in your feed reader, visit the original article]

a month ago 18 votes
Localising the `` with JavaScript

I’ve been writing some internal dashboards recently, and one hard part is displaying timestamps. Our server does everything in UTC, but the team is split across four different timezones, so the server timestamps aren’t always easy to read. For most people, it’s harder to understand a UTC timestamp than a timestamp in your local timezone. Did that event happen just now, an hour ago, or much further back? Was that at the beginning of your working day? Or at the end? Then I remembered that I tried to solve this five years ago at a previous job. I wrote a JavaScript snippet that converts UTC timestamps into human-friendly text. It displays times in your local time zone, and adds a short suffix if the time happened recently. For example: today @ 12:00 BST (1 hour ago) In my old project, I was using writing timestamps in a <div> and I had to opt into the human-readable text for every date on the page. It worked, but it was a bit fiddly. Doing it again, I thought of a more elegant solution. HTML has a <time> element for expressing datetimes, which is a more meaningful wrapper than a <div>. When I render the dashboard on the server, I don’t know the user’s timezone, so I include the UTC timestamp in the page like so: <time datetime="2025-04-15 19:45:00Z"> Tue, 15 Apr 2025 at 19:45 UTC </time> I put a machine-readable date and time string with a timezone offset string in the datetime attribute, and then a more human-readable string in the text of the element. Then I add this JavaScript snippet to the page: window.addEventListener("DOMContentLoaded", function() { document.querySelectorAll("time").forEach(function(timeElem) { // Set the `title` attribute to the original text, so a user // can hover over a timestamp to see the UTC time. timeElem.setAttribute("title", timeElem.innerText); // Replace the display text with a human-friendly date string // which is localised to the user's timezone. timeElem.innerText = getHumanFriendlyDateString( timeElem.getAttribute("datetime") ); }) }); This updates any <time> element on the page to use a human friendly date string, which is localised to the user’s timezone. For example, I’m in the UK so that becomes: <time datetime="2025-04-15 19:45:00Z" title="Tue, 15 Apr 2025 at 19:45 UTC"> Tue, 15 Apr 2025 at 20:45 BST </time> In my experience, these timestamps are easier and more intuitive for people to read. I always include a timezone string (e.g. BST, EST, PDT) so it’s obvious that I’m showing a localised timestamp. If you really need the UTC timestamp, it’s in the title attribute, so you can see it by hovering over it. (Sorry, mouseless users, but I don’t think any of my team are browsing our dashboards from their phone or tablet.) If the JavaScript doesn’t load, you see the plain old UTC timestamp. It’s not ideal, but the page still loads and you can see all the information – this behaviour is an enhancement, not an essential. To me, this is the unfulfilled promise of the <time> element. In my fantasy world, web page authors would write the time in a machine-readable format, and browsers would show it in a way that makes sense for the reader. They’d take into account their language, locale, and time zone. I understand why that hasn’t happened – it’s much easier said than done. You need so much context to know what’s the “right” thing to do when dealing with datetimes, and guessing without that context is at the heart of many datetime bugs. These sort of human-friendly, localised timestamps are very handy sometimes, and a complete mess at other times. In my staff-only dashboards, I have that context. I know what these timestamps mean, who’s going to be reading them, and I think they’re a helpful addition that makes the data easier to read. [If the formatting of this post looks odd in your feed reader, visit the original article]

a month ago 27 votes

More in programming

A Developer’s Crash Course in Coming to Japan

Thinking about moving to Japan? You’re not alone—Japan is a popular destination for those hoping to move abroad. What’s more, Japan actually needs more international developers. But how easy is it to immigrate to and work in Japan? Scores of videos on social media warn that living in Japan is quite different from holidaying here, and graphic descriptions of exploitative companies also create doubt. The truth is that Japan is not the easiest country to immigrate to, nor is it the hardest. Some Japanese tech companies and developer roles offer great work-life balance and good compensation; others do not. Based on other developers’ experiences, you’ll thrive here if you: Are an experienced developer Value safety, good food, and convenience over a high salary Are willing to invest time and effort into learning Japanese over the long term Read on to discover if Japan is a good fit for you, and the best ways to get a visa and begin your life here. What is it like working as a developer in Japan? TokyoDev conducts an annual survey of international developers living in Japan. Many of the questions in TokyoDev’s 2024 survey specifically addressed respondents’ work environments. Compensation When TokyoDev asked about “workplace difficulties” in the 2024 survey, 45% of respondents said that “compensation” was their number one problem at work. Overall, compensation for developers in Japan is far lower than the US developer median salary of 120,000 USD (currently 17.5 million yen), but higher than the Indian developer median salary of 640,000 rupees (currently around 1.1 million yen). Yet evaluating compensation for international developers in Japan, specifically, is trickier than you might expect. It’s hard to define an expected salary range because international developers tend to work in different companies and roles than the average Japanese developer. According to a 2024 survey conducted by the Japanese Ministry of Health, Labor and Welfare, the average annual salary of software engineers in Japan was 5.69 million yen. In a survey conducted that same year by TokyoDev, though, English-speaking international software developers in Japan enjoyed a median salary of 8.5 million yen. Of those international developers who responded, only 71% of them worked at a company headquartered in Japan, and almost 80% of them used English always or frequently, with 79% belonging to an engineering team with many other non-Japanese members. Wages, then, are heavily influenced by a range of factors, but particularly by whether you’re working for a Japanese or international company. In general, 75% of the international developers surveyed made 6 million yen or more. The real question is, is that enough for you to be comfortable in Japan? The answer is likely to be yes, if you don’t have overseas financial obligations or dependents. If you do, you’ll want to look carefully at rent, grocery, and education prices in your area of choice to guesstimate the expense of your Japanese lifestyle. Work-life balance Japan has a tradition of long hours and overtime. The Financial Times reports that the Japanese government has taken many measures to reduce the phenomenon of death from overwork (過労死, karoushi), from capping overtime to 100 hours a month, to setting up a national hotline for employees to report abusive companies. The results seem mixed. The Financial Times article adds that in 2024, employees at 26,000 organizations reported working illegal overtime at 44.5% of those businesses. On the other hand, average working hours for men fell to below 45 hours per week, and for women to below 35, which is similar to average working hours in the US. Still, 72% of the developers surveyed by TokyoDev worked for less than 40 hours a week. In addition, 70% of TokyoDev respondents cited work-life balance as their top workplace perk. The number of respondents happy with their working conditions came in just below that, at 69%. There was some correlation between hours worked and the type of employer, though. Employees at international subsidiaries were slightly more likely to enjoy shorter work weeks than those at Japanese companies. Remote work Remote work is still relatively new in Japan. Although more offices adopted the practice during Covid, many of them are now backtracking and requiring employees to return to the office, often with a hybrid schedule. While only 9% of TokyoDev respondents weren’t allowed any remote work, 76% of those required to work in-office were employed by Japan-headquartered companies. By contrast, of the 16% who worked fully remotely, only 57% worked for a Japanese company. Those with the option to work remotely really enjoy it. When asked what their most important workplace benefit was, 49% of respondents answered “remote work,” outstripping every other answer by far. Job security A major plus of working in Japan is job security—which, given the waves of layoffs at American tech companies, may now seem extra appealing. It’s overwhelmingly difficult to fire or lay off an employee with a permanent contract (正社員, seishain) in Japan, due to labor laws designed to protect the individual. This may be why 54% of TokyoDev survey respondents named “job security” as their most important workplace perk. Not every company will adhere to labor protection laws, and sometimes businesses pressure employees to “voluntarily” resign. Nonetheless, employees have significant legal recourse when companies attempt to fire them, change their contracts, or alter the current workplace conditions (sometimes, even if those conditions were never stated in writing). Developer stories TokyoDev regularly interviews developers working at our client companies, for information on both their specific positions and their general work environment. Our interviewees work with a variety of technology in many different roles, and at companies ranging from fintech enterprises like PayPay to game companies like Wizcorp. Why do developers choose Japan? In 2024 TokyoDev also asked developers, “What’s your favorite thing about Japan?” The results were: Safety: 21% Food: 13% Convenience: 11% Culture: 8% Peacefulness: 7% Nature: 5% Interestingly, there was a strong correlation between the amount of time someone had lived in Japan and their answer. Those who had been in Japan three years or less more frequently chose “food” or “culture.” Those who’d lived in Japan for four or more years were significantly more likely to answer “safety” or “peacefulness.” Safety It’s true that Japan enjoys a lower crime rate than many developed nations. The Security Journal UK ranked it ninth in a list of the world’s twenty safest countries. In 2024, World Population Review selected Tokyo as the safest city in the world. The homicide rate in 2023 was only 0.23 per 100,000 people, and has been steadily declining since the nineties. There are a few women-specific concerns, such as sexual violence. Nonetheless, the subjective experience of many women in the TokyoDev audience is that Japan feels safe; for example, they experience no trepidation walking around late at night. Of course, crime statistics don’t take into account natural disasters, of which Japan has more than its fair share. Thanks to being located on the Ring of Fire, Japan regularly copes with earthquakes and volcanic activity, and its location in the Pacific means that it is also affected by typhoons and tsunamis. To compensate, Japan also takes natural disaster countermeasures extremely seriously. It’s certainly the only country I’ve been to that posts large-scale evacuation maps on the side of the street, stores emergency supply stockpiles in public parks, and often requires schoolchildren to keep earthquake safety headgear at their desks. Food Food is another major draw. Many respondents simply wrote that “food” or “fresh, affordable food” was their favorite thing about Japan, but a few listed specific dishes. Favorite Japanese foods of the TokyoDev audience include: Yakiniku (self-grilled meat) Ramen Peaches Sushi Hiroshima-style okonomiyaki (savory pancake) Curry rice Onigiri (rice balls) Of those, sushi was mentioned most often. One respondent also answered the question with “drinking,” if you think that should count! Personal experiences Our contributors have also shared their personal experiences of moving to and working in Japan. We’ve got articles from Filipino, Indonesian, Australian, Vietnamese, and Mongolian developers, as well as others sharing what it’s like to work as a female software developer in Japan, or to live in Japan with a disability. Why shouldn’t you live in Japan? Safety, food, convenience, and culture are the most commonly-cited upsides of living in Japan. The downsides are the necessity of learning the language and some strict, yet often-unspoken, cultural expectations. Language Fluency in Japanese is not strictly necessary to live or work in Japan. Access to government services for you and your family, such as Japanese public school, is possible even if you speak little Japanese. (That doesn’t mean that most city hall clerks speak English; usually they’ll either locate a translator, or work with you via a translation app.) Nonetheless, TokyoDev’s 2024 survey showed that language ability was highly correlated to social success in Japan. In particular, 56% of all respondents were happy or very happy with their adjustment to Japanese culture. Breaking down that number, though, 76% of those with fluent or native Japanese ability reported being happy with their cultural adjustment, while only 34% of those with little or no Japanese ability were similarly happy. The same held true for social life satisfaction: 59% of those with fluent or native Japanese ability were happy or very happy with their social life, compared to 42% of those who don’t speak much Japanese. While English study is compulsory in Japan and starts in elementary school, as of 2025, only 28% of Japanese people speak English, and most of them can’t converse with high fluency. Living and working in Japan is possible without Japanese, but it’s hard to integrate, make friends, and participate in cultural activities if you can’t communicate with the locals. Cultural expectations As mentioned above, fluency in Japanese is closely allied to fluency in Japanese culture. At the same time, one does not necessarily imply the other. It’s possible to be fluent in Japanese, but still not grasp many of the unspoken rules your Japanese friends, neighbors, and coworkers operate by. Japan’s culture is both high-context and specifically averse to confrontation and outspokenness; if you get it “wrong,” people aren’t likely to tell you so. Japanese culture also values conformity: as the saying goes, “the nail that sticks up, gets hammered down.” While there are hints of things changing, with many Japanese companies saying support for greater diversity is necessary, minorities or those who are different may experience pressure to fit in. Introspection is required: are you the kind of person who’s adept at “reading the room,” a highly-valued quality in Japan? Conversely, are you self-confident enough to not sweat the small stuff? Either of these personality types may do well in Japan, but if social acceptance is very important to you, and you’re also uncomfortable with feeling occasionally awkward or uncertain, then you may struggle more to adjust. I want to go! How can I get there? If you’ve decided to immigrate to Japan, there are a number of ways to acquire a work visa. The simplest way is to get hired by a company operating in Japan. Alternatively, you can start your own business in Japan, come over on a Working Holiday, or even—if you’re very determined—arrive first as an English teacher. Let’s begin with the most straightforward route: getting hired as a developer. Getting a developer job in Japan As mentioned before, Japan needs more international developers. Some types of developers, though, will find it easier to get a job in Japan. In particular, companies in Japan are looking for the following: Senior developers. Companies are particularly interested in those with management experience and soft skills such as communication and leadership. Backend developers. This is one of the most widely-available roles for those who don’t speak Japanese. Developers who know Python. Python is one of Japan’s top in-demand languages. AI and Machine Learning Specialists. Japan is leaning hard on AI to help cope with demographic changes. Those who already know, or are willing to learn, Japanese. Combining those criteria, an experienced developer who speaks Japanese should have little difficulty finding a job! If you’re none of these things, you don’t need to give up—you just need to be patient, flexible, and willing to think outside the box. As Mercari Senior Technical Recruiter Clement Chidiac told me, “I know a bunch of people that managed to land a job because they’ve tried harder, going to meetups, reaching out to people, networking, that kind of thing.” Edmund Ho, Principal Consultant at Talisman Corporation, agreed that overseas candidates hoping to work in Japan for the first time face a tough road. He believes candidates should maintain a realistic, but optimistic, view of the process. “Keep a longer mindset,” he suggested. “Maybe you don’t get an offer the first year, but you do the second year.” “Stepping-stone” jobs Candidates from overseas do face a severe disadvantage: many companies, even those founded by non-Japanese people, are only open to developers who already live in Japan. Although getting a work visa for an overseas employee is cheaper and easier in Japan than in many countries, it still presents a barrier some organizations are reluctant to overcome. By contrast, once you’re already on the ground, more companies will be interested in your skills. This is why some developers settle on a “stepping-stone” position—in other words, a job that may not be all you hoped for, but that is willing to sponsor your visa and bring you into the country. Here’s where some important clarification on Japanese work visas is required. Work visas The most common visa for developers is the Engineer/Specialist in Humanities/International Services visa, a broad-category visa for foreign workers in those fields. To qualify, a developer must have a college degree, or have ten years of work experience, or have passed an approved IT exam. Another relatively common visa for high-level developers is the Highly-Skilled Professional (HSP) visa. To acquire it, applicants must score at least 70 points on an assessment scale that addresses age, education level, Japanese level, income, and more. The HSP visa has many advantages, but there is one important difference between it, and the more standard Engineer visa. The Engineer/Specialist in Humanities/International Services visa is not tied to a specific company. It grants you the legal right to work within those fields for a specific period of time in Japan. The Highly-Skilled Foreign Professional visa, on the other hand, is tied to a specific employer. If you want to change jobs, you’ll need to update your residency status with immigration. Some unscrupulous companies will try to claim that because they sponsored your Engineer/Specialist in Humanities/International Servicesvisa, you are obligated to remain with their company or risk being deported. This is not the case. If you do leave your job without another one lined up, you have three months to find another before you may be at risk for deportation. In addition, the fields of work covered by the Engineer/Specialist in Humanities/International Services visa are incredibly broad, and include everything from sales to product development to language instruction. As TokyoDev specifically confirmed with immigration, you can even come to Japan as an English instructor, then later work as a developer, without needing to alter your visa. Those with the HSP visa will need to go to immigration and alter their residency status each time they change roles. However, if you have the points and qualifications for an HSP visa, that means you’re also eligible for Permanent Residency within one to three years. Once you’ve obtained Permanent Residency, you’re free to pursue whatever sort of employment you like. International or Japanese company? As you begin your job hunt, you’ll hopefully receive responses from several sorts of companies: Japanese companies that also primarily hire Japanese people, Japanese companies with designated multinational developer teams, companies that were founded in Japan but nonetheless hire international developers for a variety of positions, and international subsidiaries. There are advantages and disadvantages to working with mostly-Japanese or mostly-international companies. Japanese companies The more Japanese a company is—both in philosophy and personnel—the more you’ll need Japanese language skills to thrive there. It’s true that a number of well-established Japanese tech companies are now creating developer teams designed to be multinational from the outset: typically, these are very English-language friendly. Some organizations, such as Money Forward, have even adopted English as the official company language. However, this often results in an institutional language barrier between development teams and the rest of the company, which is usually staffed by Japanese speakers. Developers are still encouraged to learn Japanese, particularly as they climb the promotional ladder, to help facilitate interdepartmental communication. Some companies, such as DeepX and Beatrust, either offer language classes themselves or provide a stipend for language learning. In addition to the language, you’ll also need to become “fluent” in Japanese business norms, which can be much more rigid and hierarchical than American or European company cultures. For example, at introductory drinking parties (themselves a potential surprise for many!), it is customary for new employees or women employees to go around with a bottle of beer and pour glasses for their managers and the company’s senior management. As mentioned in the cultural expectations section, most Japanese people won’t correct you even if you’re doing it all wrong, which leaves foreigners to discover their gaffes via trial-and-error. The advantage here is that you’ll be pressured, hopefully in a good way, to adapt swiftly to the Japanese language and business culture. There’s a sink-or-swim element to this approach, but if you’re serious about settling in Japan, then this “downside” could benefit you in the long run. Finally, there is the above-mentioned issue of compensation. On average, international companies pay more than Japanese ones; the median salary difference is around three million yen per year. Specific roles may be paid at higher rates, though, and most Japanese companies do offer bonuses. Many Japanese companies also offer other perks, such as housing stipends, spouse and child allowances, etc. If you receive an offer, it’s worth examining the whole compensation package before you make a decision. International companies The advantages of working either for an international company, or for a Japanese company that already employs many non-Japanese people, are straightforward: you can usually communicate in English, you already understand most of the business norms, and such companies typically pay developers more. You do run the risk of getting stuck in a rut, though. As mentioned earlier, TokyoDev found in its own survey that the correlation between Japanese language skills and social life satisfaction is high. You can of course study Japanese in your free time—and many do—but the more your work environment and social life revolve around English, the more difficult acquiring Japanese becomes. Want a job? Start here! If you’re ready to begin your job hunt, you can start with the TokyoDev job board. TokyoDev only works with companies we feel good about sending applicants to, and the job board includes positions that don’t require Japanese and that accept candidates from abroad. Other alternatives These visas don’t lead directly to working as a software developer in Japan, but can still help you get your foot in the door. DIY options If you prefer to be your own boss, there are several visas that allow you to set up a business in Japan. The Business Manager visa is typically good for one year, although repeated applicants may get longer terms. Applicants should have five million yen in a bank account when they apply, and there are some complicated requirements for getting and keeping the visa, such as maintaining an office, paying yourself a minimum salary, following proper accounting procedures, etc. The Startup visa is another option if the Business Manager visa appeals to you, but you don’t yet have the funds or connections to make it happen. You’ll be granted the equivalent of a Business Manager visa for up to one year so that you can launch your business in Japan. Working Holiday visa This is the path our own founder Paul McMahon took to get his first developer job in Japan. If you meet various qualifications, and you belong to a country that has a Working Holiday visa agreement with Japan, you can come to Japan for a period of one year and do work that is “incidental” to your holiday. In practice, this means you can work almost any job except for those that are considered “immoral” (bars, clubs, gambling, etc.). The Working Holiday visa is a great opportunity for those who have the option. It allows you to experience living and working in Japan without any long-term commitments, and also permits you to job-hunt freely without time or other visa constraints. J-Find visa The J-Find visa is a one-year visa, intended to let graduates of top universities job-hunt or prepare to found a start-up in Japan. To qualify, applicants should have: A degree from a university ranked in the top 100 by at least two world university rankings, or completed a graduate course there Graduated within five years of the application date At least 200,000 yen for initial living expenses TokyoDev contributor Oguzhan Karagözoglu received a J-Find visa, though he did run into some difficulties, particularly given immigration’s unfamiliarity with this relatively new type of visa. Digital Nomad visa This is another new visa category that allows foreigners from specific countries, who must make over 10 million yen or more a year, to work remotely from Japan for six months. Given that the application process alone can take months, the visa isn’t extendable or renewable, and you’re not granted residency, it’s questionable whether the pay-off is worth the effort. Still, if you have the option to work remotely and want to test out living in Japan before committing long-term, this is one way to do that. TokyoDev contributor Christian Mack was not only one of the first to acquire the Digital Nomad visa, but has since opened a consultancy to help others through the process. Conclusion If your takeaway from this article is, “Japan, here I come!” then there are more TokyoDev articles that can help you on your way. For example, if you want to bring your pets with you, you should know that you need to start preparing the import paperwork up to seven months in advance. If you’re ready now to start applying for jobs, check out the TokyoDev job board. You’ll also want to look at how to write a resume for a job in Japan, and our industry insider advice on passing the resume screening process. These tips for interviewing at Japanese tech companies would be useful, and when you’re ready for it, see this guide to salary negotiations. Once you’ve landed that job, we’ve got articles on everything from bringing your family with you, to getting your first bank account and apartment. In addition, the TokyoDev Discord hosts regular discussions on all these topics and more. It’s a great chance to make developer friends in Japan before you ever set foot in the country. Once you are here, you can join some of Japan’s top tech meetups, including many organized by TokyoDev itself. We look forward to seeing you soon!

15 hours ago 2 votes
Let's talk about the future of Remix and react-router (tip)

We go over the "Wake up, Remix!" article by the remix team and talk about their decisions moving forward and also speculate on what is coming next.

22 hours ago 2 votes
the algebra of dependent types

TIL (or this week-ish I learned) why big-sigma and big-pi turn up in the notation of dependent type theory. I’ve long been aware of the zoo of more obscure Greek letters that turn up in papers about type system features of functional programming languages, μ, Λ, Π, Σ. Their meaning is usually clear from context but the reason for the choice of notation is usually not explained. I recently stumbled on an explanation for Π (dependent functions) and Σ (dependent pairs) which turn out to be nicer than I expected, and closely related to every-day algebraic data types. sizes of types The easiest way to understand algebraic data types is by counting the inhabitants of a type. For example: the unit type () has one inhabitant, (), and the number 1 is why it’s called the unit type; the bool type hass two inhabitants, false and true. I have even seen these types called 1 and 2 (cruelly, without explanation) in occasional papers. product types Or pairs or (more generally) tuples or records. Usually written, (A, B) The pair contains an A and a B, so the number of possible values is the number of possible A values multiplied by the number of possible B values. So it is spelled in type theory (and in Standard ML) like, A * B sum types Or disjoint union, or variant record. Declared in Haskell like, data Either a b = Left a | Right b Or in Rust like, enum Either<A, B> { Left(A), Right(B), } A value of the type is either an A or a B, so the number of possible values is the number of A values plus the number of B values. So it is spelled in type theory like, A + B dependent pairs In a dependent pair, the type of the second element depends on the value of the first. The classic example is a slice, roughly, struct IntSlice { len: usize, elem: &[i64; len], } (This might look a bit circular, but the idea is that an array [i64; N] must be told how big it is – its size is an explicit part of its type – but an IntSlice knows its own size. The traditional dependent “vector” type is a sized linked list, more like my array type than my slice type.) The classic way to write a dependent pair in type theory is like,      Σ len: usize . Array(Int, len) The big sigma binds a variable that has a type annotation, with a scope covering the expression after the dot – similar syntax to a typed lambda expression. We can expand a simple example like this into a many-armed sum type: either an array of length zero, or an array of length 1, or an array of length 2, … but in a sigma type the discriminant is user-defined instead of hidden. The number of possible values of the type comes from adding up all the alternatives, a summation just like the big sigma summation we were taught in school. ∑ a ∈ A B a When the second element doesn’t depend on the first element, we can count the inhabitants like, ∑ A B = A*B And the sigma type simplifies to a product type. telescopes An aside from the main topic of these notes, I also recently encountered the name “telescope” for a multi-part dependent tuple or record. The name “telescope” comes from de Bruijn’s AUTOMATH, one of the first computerized proof assistants. (I first encountered de Bruijn as the inventor of numbered lambda bindings.) dependent functions The return type of a dependent function can vary according to the argument it is passed. For example, to construct an array we might write something like, fn repeat_zero(len: usize) -> [i64; len] { [0; len] } The classic way to write the type of repeat_zero() is very similar to the IntSlice dependent pair, but with a big pi instead of a big sigma:      Π len: usize . Array(Int, len) Mmm, pie. To count the number of possible (pure, total) functions A ➞ B, we can think of each function as a big lookup table with A entries each containing a B. That is, a big tuple (B, B, … B), that is, B * B * … * B, that is, BA. Functions are exponential types. We can count a dependent function, where the number of possible Bs depends on which A we are passed, ∏ a ∈ A B a danger I have avoided the terms “dependent sum” and “dependent product”, because they seem perfectly designed to cause confusion over whether I am talking about variants, records, or functions. It kind of makes me want to avoid algebraic data type jargon, except that there isn’t a good alternative for “sum type”. Hmf.

yesterday 3 votes
What does "Undecidable" mean, anyway

Systems Distributed I'll be speaking at Systems Distributed next month! The talk is brand new and will aim to showcase some of the formal methods mental models that would be useful in mainstream software development. It has added some extra stress on my schedule, though, so expect the next two monthly releases of Logic for Programmers to be mostly minor changes. What does "Undecidable" mean, anyway Last week I read Against Curry-Howard Mysticism, which is a solid article I recommend reading. But this newsletter is actually about one comment: I like to see posts like this because I often feel like I can’t tell the difference between BS and a point I’m missing. Can we get one for questions like “Isn’t XYZ (Undecidable|NP-Complete|PSPACE-Complete)?” I've already written one of these for NP-complete, so let's do one for "undecidable". Step one is to pull a technical definition from the book Automata and Computability: A property P of strings is said to be decidable if ... there is a total Turing machine that accepts input strings that have property P and rejects those that do not. (pg 220) Step two is to translate the technical computer science definition into more conventional programmer terms. Warning, because this is a newsletter and not a blog post, I might be a little sloppy with terms. Machines and Decision Problems In automata theory, all inputs to a "program" are strings of characters, and all outputs are "true" or "false". A program "accepts" a string if it outputs "true", and "rejects" if it outputs "false". You can think of this as automata studying all pure functions of type f :: string -> boolean. Problems solvable by finding such an f are called "decision problems". This covers more than you'd think, because we can bootstrap more powerful functions from these. First, as anyone who's programmed in bash knows, strings can represent any other data. Second, we can fake non-boolean outputs by instead checking if a certain computation gives a certain result. For example, I can reframe the function add(x, y) = x + y as a decision problem like this: IS_SUM(str) { x, y, z = split(str, "#") return x + y == z } Then because IS_SUM("2#3#5") returns true, we know 2 + 3 == 5, while IS_SUM("2#3#6") is false. Since we can bootstrap parameters out of strings, I'll just say it's IS_SUM(x, y, z) going forward. A big part of automata theory is studying different models of computation with different strengths. One of the weakest is called "DFA". I won't go into any details about what DFA actually can do, but the important thing is that it can't solve IS_SUM. That is, if you give me a DFA that takes inputs of form x#y#z, I can always find an input where the DFA returns true when x + y != z, or an input which returns false when x + y == z. It's really important to keep this model of "solve" in mind: a program solves a problem if it correctly returns true on all true inputs and correctly returns false on all false inputs. (total) Turing Machines A Turing Machine (TM) is a particular type of computation model. It's important for two reasons: By the Church-Turing thesis, a Turing Machine is the "upper bound" of how powerful (physically realizable) computational models can get. This means that if an actual real-world programming language can solve a particular decision problem, so can a TM. Conversely, if the TM can't solve it, neither can the programming language.1 It's possible to write a Turing machine that takes a textual representation of another Turing machine as input, and then simulates that Turing machine as part of its computations. Property (1) means that we can move between different computational models of equal strength, proving things about one to learn things about another. That's why I'm able to write IS_SUM in a pseudocode instead of writing it in terms of the TM computational model (and why I was able to use split for convenience). Property (2) does several interesting things. First of all, it makes it possible to compose Turing machines. Here's how I can roughly ask if a given number is the sum of two primes, with "just" addition and boolean functions: IS_SUM_TWO_PRIMES(z): x := 1 y := 1 loop { if x > z {return false} if IS_PRIME(x) { if IS_PRIME(y) { if IS_SUM(x, y, z) { return true; } } } y := y + 1 if y > x { x := x + 1 y := 0 } } Notice that without the if x > z {return false}, the program would loop forever on z=2. A TM that always halts for all inputs is called total. Property (2) also makes "Turing machines" a possible input to functions, meaning that we can now make decision problems about the behavior of Turing machines. For example, "does the TM M either accept or reject x within ten steps?"2 IS_DONE_IN_TEN_STEPS(M, x) { for (i = 0; i < 10; i++) { `simulate M(x) for one step` if(`M accepted or rejected`) { return true } } return false } Decidability and Undecidability Now we have all of the pieces to understand our original definition: A property P of strings is said to be decidable if ... there is a total Turing machine that accepts input strings that have property P and rejects those that do not. (220) Let IS_P be the decision problem "Does the input satisfy P"? Then IS_P is decidable if it can be solved by a Turing machine, ie, I can provide some IS_P(x) machine that always accepts if x has property P, and always rejects if x doesn't have property P. If I can't do that, then IS_P is undecidable. IS_SUM(x, y, z) and IS_DONE_IN_TEN_STEPS(M, x) are decidable properties. Is IS_SUM_TWO_PRIMES(z) decidable? Some analysis shows that our corresponding program will either find a solution, or have x>z and return false. So yes, it is decidable. Notice there's an asymmetry here. To prove some property is decidable, I need just to need to find one program that correctly solves it. To prove some property is undecidable, I need to show that any possible program, no matter what it is, doesn't solve it. So with that asymmetry in mind, do are there any undecidable problems? Yes, quite a lot. Recall that Turing machines can accept encodings of other TMs as input, meaning we can write a TM that checks properties of Turing machines. And, by Rice's Theorem, almost every nontrivial semantic3 property of Turing machines is undecidable. The conventional way to prove this is to first find a single undecidable property H, and then use that to bootstrap undecidability of other properties. The canonical and most famous example of an undecidable problem is the Halting problem: "does machine M halt on input i?" It's pretty easy to prove undecidable, and easy to use it to bootstrap other undecidability properties. But again, any nontrivial property is undecidable. Checking a TM is total is undecidable. Checking a TM accepts any inputs is undecidable. Checking a TM solves IS_SUM is undecidable. Etc etc etc. What this doesn't mean in practice I often see the halting problem misconstrued as "it's impossible to tell if a program will halt before running it." This is wrong. The halting problem says that we cannot create an algorithm that, when applied to an arbitrary program, tells us whether the program will halt or not. It is absolutely possible to tell if many programs will halt or not. It's possible to find entire subcategories of programs that are guaranteed to halt. It's possible to say "a program constructed following constraints XYZ is guaranteed to halt." The actual consequence of undecidability is more subtle. If we want to know if a program has property P, undecidability tells us We will have to spend time and mental effort to determine if it has P We may not be successful. This is subtle because we're so used to living in a world where everything's undecidable that we don't really consider what the counterfactual would be like. In such a world there might be no need for Rust, because "does this C program guarantee memory-safety" is a decidable property. The entire field of formal verification could be unnecessary, as we could just check properties of arbitrary programs directly. We could automatically check if a change in a program preserves all existing behavior. Lots of famous math problems could be solved overnight. (This to me is a strong "intuitive" argument for why the halting problem is undecidable: a halt detector can be trivially repurposed as a program optimizer / theorem-prover / bcrypt cracker / chess engine. It's too powerful, so we should expect it to be impossible.) But because we don't live in that world, all of those things are hard problems that take effort and ingenuity to solve, and even then we often fail. To be pendantic, a TM can't do things like "scrape a webpage" or "render a bitmap", but we're only talking about computational decision problems here. ↩ One notation I've adopted in Logic for Programmers is marking abstract sections of pseudocode with backticks. It's really handy! ↩ Nontrivial meaning "at least one TM has this property and at least one TM doesn't have this property". Semantic meaning "related to whether the TM accepts, rejects, or runs forever on a class of inputs". IS_DONE_IN_TEN_STEPS is not a semantic property, as it doesn't tell us anything about inputs that take longer than ten steps. ↩

2 days ago 4 votes
Tradeoffs to Continuous Software?

I came across this post from the tech collective crftd. about how software is in a process of “continuous disintegration”: One of the uncomfortable truths we sometimes have to break to people is that software isn't just never “done”. Worse even, it rots… The practices of continuous integration act as enablers for us to keep adding value and keeping development maintainable, but they cannot stop the inevitable: The system will eventually fail in unexpected ways, as is the nature of complex systems: That all resonates with me — software is rarely “done”, it generally has shelf life and starts rotting the moment you ship it — but what really made me pause was this line: The practices of continuous integration act as enablers for us I read “enabler” there in the negative context of the word, like in addiction when the word “enabler” refers to someone who exploits others by encouraging a pattern of self-destructive behavior. Is CI/CD an enabler? I’d only ever thought on moving towards CI/CD as a net positive thing. Is it possible that, like everything, CI/CD has its tradeoffs and isn’t always the Best Thing Ever™️? What are the trade-offs of CI/CD? The thought occurred to me that CI stands for “continuous investment” because that’s what it requires to keep it working — a continuous investment in the both the infrastructure that delivers the software and the software itself. Everybody complains now-a-days about how software requires a subscription. Why is that? Could it be, perhaps, because of CI/CD? If you want continuous updates to your software, you’re going to have to pay for it continuously. We’ve made delivering software continuously easy, which means we’ve made creating software that’s “done” hard — be careful of what you make easy. In some sense — at least on the web — I think you could argue that we don’t know how to make software that’s done (e.g. software that ships on a CD). We’re inundated with tools and practices and norms that enable the opposite of that. And, perhaps, we’ve trading something there? When something comes along and enables new capabilities, it often severs others. Email · Mastodon · Bluesky

2 days ago 3 votes