I store a lot of data in SQLite databases on remote servers, and I often want to copy them to my local machine for analysis or backup. When I’m starting a new project and the database is near-empty, this is a simple rsync operation:

    $ rsync --progress username@server:my_remote_database.db my_local_database.db

As the project matures and the database grows, this gets slower and less reliable. Downloading a 250MB database from my web server takes about a minute over my home Internet connection, and that’s pretty small – most of my databases are multiple gigabytes in size. I’ve been trying to make these copies go faster, and I recently discovered a neat trick.

What really slows me down is my indexes. I have a lot of indexes in my SQLite databases, which dramatically speed up my queries, but also make the database file larger and slower to copy. (In one database, there’s an index which single-handedly accounts for half the size on disk!) The indexes don’t store anything unique – they...
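To see the size cost of an index for yourself, here is a minimal sketch using Python's built-in sqlite3 module. It builds a throwaway database, measures the file before and after adding a single index, and lists the indexes recorded in sqlite_master. The table, column, and index names are hypothetical, and the sketch only illustrates the size cost, not any particular copying technique.

```python
import os
import sqlite3
import tempfile


def measure_index_overhead(row_count: int = 50_000) -> tuple[int, int, list[str]]:
    """
    Build a throwaway database with a hypothetical `photos` table, and
    return (size before index, size after index, names of all indexes).
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "example.db")
        conn = sqlite3.connect(path)
        try:
            conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, title TEXT)")
            conn.executemany(
                "INSERT INTO photos (title) VALUES (?)",
                ((f"photo number {i}",) for i in range(row_count)),
            )
            conn.commit()
            size_before = os.path.getsize(path)

            # An index stores a second, sorted copy of the indexed column
            # (plus the rowids), so the file on disk grows.
            conn.execute("CREATE INDEX idx_photos_title ON photos (title)")
            conn.commit()
            size_after = os.path.getsize(path)

            # sqlite_master records every index in the database.
            index_names = [
                name
                for (name,) in conn.execute(
                    "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name"
                )
            ]
        finally:
            conn.close()

    return size_before, size_after, index_names


before, after, indexes = measure_index_overhead()
print(f"without index: {before:,} bytes")
print(f"with index:    {after:,} bytes")
print(f"indexes: {indexes}")
```

The exact byte counts depend on the SQLite version and page size, but the indexed file is reliably the larger of the two.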
a week ago


More from alexwlchan

Handling JSON objects with duplicate names in Python

Consider the following JSON object:

    {
      "sides": 4,
      "colour": "red",
      "sides": 5,
      "colour": "blue"
    }

Notice that sides and colour both appear twice. This looks invalid, but I learnt recently that this is actually legal JSON syntax! It’s unusual and discouraged, but it’s not completely forbidden.

This was a big surprise to me. I think of JSON objects as key/value pairs, and I associate them with data structures like a dict in Python or a Hash in Ruby – both of which only allow unique keys. JSON has no such restriction, and I started thinking about how to handle it.

What does the JSON spec say about duplicate names?

JSON is described by several standards, which Wikipedia helpfully explains for us:

“After RFC 4627 had been available as its “informational” specification since 2006, JSON was first standardized in 2013, as ECMA‑404. RFC 8259, published in 2017, is the current version of the Internet Standard STD 90, and it remains consistent with ECMA‑404. That same year, JSON was also standardized as ISO/IEC 21778:2017. The ECMA and ISO/IEC standards describe only the allowed syntax, whereas the RFC covers some security and interoperability considerations.”

All three of these standards explicitly allow the use of duplicate names in objects. ECMA‑404 and ISO/IEC 21778:2017 have identical text to describe the syntax of JSON objects, and they say (emphasis mine):

“An object structure is represented as a pair of curly bracket tokens surrounding zero or more name/value pairs. […] The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs. These are all semantic considerations that may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.”
RFC 8259 goes further and strongly recommends against duplicate names, but the use of SHOULD means it isn’t completely forbidden:

“The names within an object SHOULD be unique.”

The same document warns about the consequences of ignoring this recommendation:

“An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates.”

So it’s technically valid, but it’s unusual and discouraged. I’ve never heard of a use case for JSON objects with duplicate names. I’m sure there was a good reason for it being allowed by the spec, but I can’t think of it.

Most JSON parsers – including jq, JavaScript, and Python – will silently discard all but the last instance of a duplicate name. Here’s an example in Python:

    >>> import json
    >>> json.loads('{"sides": 4, "colour": "red", "sides": 5, "colour": "blue"}')
    {'sides': 5, 'colour': 'blue'}

What if I wanted to decode the whole object, or throw an exception if I see duplicate names?

This happened to me recently. I was editing a JSON file by hand, and I’d copy/paste objects to update the data. I also had scripts which could update the file. I forgot to update the name on one of the JSON objects, so there were two name/value pairs with the same name. When I ran the script, it silently erased the first value.

I was able to recover the deleted value from the Git history, but I wondered how I could prevent this happening again. How could I make the script fail, rather than silently delete data?
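The data loss described above comes from a read-modify-write cycle: the earlier value is already gone once the file is loaded, so any script that loads the file and writes it back will drop it. Here's a minimal sketch with Python's standard json module; the "colour" update is hypothetical, standing in for whatever change a real script might make.

```python
import json

# A hand-edited file where a copy/paste left "sides" in twice.
hand_edited = '{"sides": 4, "colour": "red", "sides": 5}'

# Step 1: a script loads the file. json.loads keeps only the last value
# for each repeated name, so "sides": 4 is already gone at this point.
data = json.loads(hand_edited)

# Step 2: the script makes an unrelated change (hypothetical).
data["colour"] = "blue"

# Step 3: the script writes the data back out, and the earlier value
# is lost for good.
rewritten = json.dumps(data)
print(rewritten)  # {"sides": 5, "colour": "blue"}
```

Nothing in this sequence raises an error or a warning, which is why the loss is silent.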
Decoding duplicate names in Python

When Python decodes a JSON object, it first parses the object as a list of name/value pairs, then it turns that list of name/value pairs into a dictionary. We can see this by looking at the JSONObject function in the CPython source code: it builds a list pairs, and at the end of the function, it calls dict(pairs) to turn the list into a dictionary.

This relies on the fact that dict() can take an iterable of key/value tuples and create a dictionary:

    >>> dict([('sides', 4), ('colour', 'red')])
    {'sides': 4, 'colour': 'red'}

The docs for dict() tell us that it will discard duplicate keys: “if a key occurs more than once, the last value for that key becomes the corresponding value in the new dictionary”.

    >>> dict([('sides', 4), ('colour', 'red'), ('sides', 5), ('colour', 'blue')])
    {'sides': 5, 'colour': 'blue'}

We can customise what Python does with the list of name/value pairs. Rather than calling dict(), we can pass our own function to the object_pairs_hook parameter of json.loads(), and Python will call that function on the list of pairs. This allows us to parse objects in a different way.

For example, we can just return the literal list of name/value pairs:

    >>> import json
    >>> json.loads(
    ...     '{"sides": 4, "colour": "red", "sides": 5, "colour": "blue"}',
    ...     object_pairs_hook=lambda pairs: pairs
    ... )
    [('sides', 4), ('colour', 'red'), ('sides', 5), ('colour', 'blue')]

We could also use the multidict library to get a dict-like data structure which supports multiple values per key. This is based on HTTP headers and URL query strings, two environments where it’s common to have multiple values for a single key:

    >>> from multidict import MultiDict
    >>> md = json.loads(
    ...     '{"sides": 4, "colour": "red", "sides": 5, "colour": "blue"}',
    ...     object_pairs_hook=lambda pairs: MultiDict(pairs)
    ... )
    >>> md
    <MultiDict('sides': 4, 'colour': 'red', 'sides': 5, 'colour': 'blue')>
    >>> md['sides']
    4
    >>> md.getall('sides')
    [4, 5]

Preventing silent data loss

If we want to throw an exception when we see duplicate names, we need a longer function. Here’s the code I wrote:

    import collections
    import typing


    def dict_with_unique_names(pairs: list[tuple[str, typing.Any]]) -> dict[str, typing.Any]:
        """
        Convert a list of name/value pairs to a dict, but only if the
        names are unique.

        If there are non-unique names, this function throws a ValueError.
        """
        # First try to parse the object as a dictionary; if it's the same
        # length as the pairs, then we know all the names were unique and
        # we can return immediately.
        pairs_as_dict = dict(pairs)

        if len(pairs_as_dict) == len(pairs):
            return pairs_as_dict

        # Otherwise, let's work out what the repeated name(s) were, so we
        # can throw an appropriate error message for the user.
        name_tally = collections.Counter(n for n, _ in pairs)

        repeated_names = [n for n, count in name_tally.items() if count > 1]
        assert len(repeated_names) > 0

        if len(repeated_names) == 1:
            raise ValueError(f"Found repeated name in JSON object: {repeated_names[0]}")
        else:
            raise ValueError(
                f"Found repeated names in JSON object: {', '.join(repeated_names)}"
            )

If I use this as my object_pairs_hook when parsing an object which has all unique names, it returns the normal dict I’d expect:

    >>> json.loads(
    ...     '{"sides": 4, "colour": "red"}',
    ...     object_pairs_hook=dict_with_unique_names
    ... )
    {'sides': 4, 'colour': 'red'}

But if I’m parsing an object with one or more repeated names, the parsing fails and throws a ValueError:

    >>> json.loads(
    ...     '{"sides": 4, "colour": "red", "sides": 5}',
    ...     object_pairs_hook=dict_with_unique_names
    ... )
    Traceback (most recent call last):
    […]
    ValueError: Found repeated name in JSON object: sides

    >>> json.loads(
    ...     '{"sides": 4, "colour": "red", "sides": 5, "colour": "blue"}',
    ...     object_pairs_hook=dict_with_unique_names
    ... )
    Traceback (most recent call last):
    […]
    ValueError: Found repeated names in JSON object: sides, colour

This is precisely the behaviour I want – throwing an exception, not silently dropping data.

Encoding non-unique names in Python

It’s hard to think of a use case, but this post feels incomplete without at least a brief mention.

If you want to encode custom data structures with Python’s JSON library, you can subclass JSONEncoder and define how those structures should be serialised. Here’s a rudimentary attempt at doing that for a MultiDict:

    import json
    import typing

    from multidict import MultiDict


    class MultiDictEncoder(json.JSONEncoder):

        def encode(self, o: typing.Any) -> str:
            # If this is a MultiDict, we need to construct the JSON string
            # manually -- first encode each name/value pair, then construct
            # the JSON object literal.
            if isinstance(o, MultiDict):
                # Bind the parent encode() outside the comprehension: before
                # Python 3.12, zero-argument super() fails inside one.
                encode_name = super().encode
                name_value_pairs = [
                    f'{encode_name(str(name))}: {self.encode(value)}'
                    for name, value in o.items()
                ]

                return '{' + ', '.join(name_value_pairs) + '}'

            return super().encode(o)

and here’s how you use it:

    >>> md = MultiDict([('sides', 4), ('colour', 'red'), ('sides', 5), ('colour', 'blue')])
    >>> json.dumps(md, cls=MultiDictEncoder)
    '{"sides": 4, "colour": "red", "sides": 5, "colour": "blue"}'

This is rough code, and you shouldn’t use it – it’s only an example. I’m constructing the JSON string manually, so it doesn’t handle edge cases like indentation or special characters. There are almost certainly bugs, and you’d need to be more careful if you wanted to use this for real.

In practice, if I had to encode a multi-dict as JSON, I’d encode it as a list of objects which each have a key and a value field. For example:

    [
      {"key": "sides",  "value": 4     },
      {"key": "colour", "value": "red" },
      {"key": "sides",  "value": 5     },
      {"key": "colour", "value": "blue"}
    ]

This is a pretty standard pattern, and it won’t trip up JSON parsers which aren’t expecting duplicate names.

Do you need to worry about this?

This isn’t a big deal.
JSON objects with duplicate names are pretty unusual – this is the first time I’ve ever encountered one, and it was a mistake. Trying to account for this edge case in every project that uses JSON would be overkill. It would add complexity to my code and probably never catch a single error. This started when I made a copy/paste error that introduced the initial duplication, and then a script modified the JSON file and caused some data loss. That’s a somewhat unusual workflow, because most JSON files are exclusively modified by computers, and this wouldn’t be an issue. I’ve added this error handling to my javascript-data-files library, but I don’t anticipate adding it to other projects. I use that library for my static website archives, which is where I had this issue. Although I won’t use this code exactly, it’s been good practice at writing custom encoders/decoders in Python. That is something I do all the time – I’m often encoding native Python types as JSON, and I want to get the same type back when I decode later. I’ve been writing my own subclasses of JSONEncoder and JSONDecoder for a while. Now I know a bit more about how Python decodes JSON, and object_pairs_hook is another tool I can consider using. This was a fun deep dive for me, and I hope you found it helpful too. [If the formatting of this post looks odd in your feed reader, visit the original article]

a week ago 2 votes
A flash of light in the darkness

I support dark mode on this site, and as part of the dark theme, I have a colour-inverted copy of the default background texture. I like giving my website a subtle bit of texture, which I think makes it stand out from a web which is mostly solid-colour backgrounds. Both my textures are based on the “White Waves” pattern made by Stas Pimenov.

I was setting these images as my background with two CSS rules, using the prefers-color-scheme: dark media feature to use the alternate image in dark mode:

    body {
      background: url('https://alexwlchan.net/theme/white-waves-transparent.png');
    }

    @media (prefers-color-scheme: dark) {
      body {
        background: url('https://alexwlchan.net/theme/black-waves-transparent.png');
      }
    }

This works, mostly. But I prefer light mode, so while I wrote this CSS and I do some brief testing whenever I make changes, I’m not using the site in dark mode. I know how dark mode works in my local development environment, not how it feels as a day-to-day user.

Late last night I was using my phone in dark mode to avoid waking the other people in the house, and I opened my site. I saw a brief flash of white, and then the dark background texture appeared. That flash of bright white is precisely what you don’t want when you’re using dark mode, but it happened anyway. I made a note to work it out in the morning, then I went to bed.

Now that I’m fully awake, it’s obvious what happened. Because my only background is the image URL, there’s a brief gap between the CSS being parsed and the background image being loaded. In that time, the browser doesn’t have anything to put in the background, so you just get pure white.

This was briefly annoying in the moment, but it would be even worse if the background texture never loaded.
I have light text on black in dark mode, but without the background image it’s just light text on white, which is barely readable.

I never noticed this in local development, because I’m usually working in a well-lit room where that white flash would be far less obvious. I’m also using a local version of the site, which loads near-instantly and where the background image is almost certainly saved in my browser cache.

I’ve made two changes to prevent this happening again.

First, I’ve added a colour to use as a fallback until the image loads. The CSS background property supports adding a colour, which is used until the image loads, or as a fallback if it doesn’t. I already use this in a few places, and now I’ve added it to my body background.

    body {
      background: url('https://…/white-waves-transparent.png') #fafafa;
    }

    @media (prefers-color-scheme: dark) {
      body {
        background: url('https://…/black-waves-transparent.png') #0d0d0d;
      }
    }

This avoids the flash of unstyled background before the image loads – the browser will use a solid dark background until it gets the texture.

Second, I’ve added rel="preload" elements to the head of the page, so the browser will start loading the background textures faster. These elements are a clue to the browser that these resources are going to be useful when it renders the page, so it should start loading them as soon as possible:

    <link
      rel="preload"
      href="https://alexwlchan.net/theme/white-waves-transparent.png"
      as="image"
      type="image/png"
      media="(prefers-color-scheme: light)"
    />
    <link
      rel="preload"
      href="https://alexwlchan.net/theme/black-waves-transparent.png"
      as="image"
      type="image/png"
      media="(prefers-color-scheme: dark)"
    />

This means the browser is downloading the appropriate texture at the same time as it’s downloading the CSS file. Previously it had to download the CSS file, parse it, and only then would it know to start downloading the texture. With the preload, it’s a bit faster!
The difference is probably imperceptible if you’re on a fast connection, but it’s a small win and I can’t see any downside (as long as I scope the preload correctly, and don’t preload resources I don’t end up using). I’ve seen a lot of sites using <link rel="preload"> and I’ve only half-understood what it is and why it’s useful – I’m glad to have a chance to use it myself, so I can understand it better.

This bug reminds me of a phenomenon called flash of unstyled text. Back when custom fonts were fairly new, you’d often see web pages appear briefly with the default font before custom fonts finished loading. There are well-understood techniques for preventing this, so it’s unusual to see that brief unstyled text on modern web pages – but the same issue is affecting me in dark mode. I avoided using custom fonts on the web to avoid tackling this issue, but it got me anyway!

In these dark times for the web, old bugs are new again.
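To put a rough number on "barely readable", here's a sketch of the WCAG 2 contrast-ratio formula in Python. The light text colour #e0e0e0 is an assumption (the post doesn't say what colour the text is); #0d0d0d is the dark fallback from the CSS above, and #ffffff stands in for the browser's default white background.

```python
def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of an sRGB colour like '#fafafa', per WCAG 2."""
    hex_colour = hex_colour.lstrip("#")
    channels = [int(hex_colour[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve to get linear light values.
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(colour_a: str, colour_b: str) -> float:
    """WCAG 2 contrast ratio, from 1 (no contrast) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(colour_a), relative_luminance(colour_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical light text colour for the dark theme.
TEXT_COLOUR = "#e0e0e0"

on_white = contrast_ratio(TEXT_COLOUR, "#ffffff")
on_dark_fallback = contrast_ratio(TEXT_COLOUR, "#0d0d0d")

print(f"light text on default white:    {on_white:.2f}:1")
print(f"light text on #0d0d0d fallback: {on_dark_fallback:.2f}:1")
```

WCAG 2's AA level asks for at least 4.5:1 for normal body text. With these assumed colours, the white flash falls far below that threshold, while the dark fallback clears it comfortably.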

2 weeks ago 15 votes
Beyond `None`: actionable error messages for `keyring.get_password()`

I’m a big fan of keyring, a Python module made by Jason R. Coombs for storing secrets in the system keyring. It works on multiple operating systems, and it knows what password store to use for each of them. For example, if you’re using macOS it puts secrets in the Keychain, but if you’re on Windows it uses Credential Locker.

The keyring module is a safe and portable way to store passwords, more secure than using a plaintext config file or an environment variable. The same code will work on different platforms, because keyring handles the hard work of choosing which password store to use.

It has a straightforward API: the keyring.set_password and keyring.get_password functions will handle a lot of use cases.

    >>> import keyring
    >>> keyring.set_password("xkcd", "alexwlchan", "correct-horse-battery-staple")
    >>> keyring.get_password("xkcd", "alexwlchan")
    'correct-horse-battery-staple'

Although this API is simple, it’s not perfect – I have some frustrations with the get_password function. In a lot of my projects, I’m now using a small function that wraps get_password.

What do I find frustrating about keyring.get_password?

If you look up a password that isn’t in the system keyring, get_password returns None rather than throwing an exception:

    >>> print(keyring.get_password("xkcd", "the_invisible_man"))
    None

I can see why this makes sense for the library overall – a non-existent password is very normal, and not exceptional behaviour – but in my projects, None is rarely a usable value.

I normally use keyring to retrieve secrets that I need to access protected resources – for example, an API key to call an API that requires authentication. If I can’t get the right secrets, I know I can’t continue. Indeed, continuing often leads to more confusing errors when some other function unexpectedly gets None, rather than a string.
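To make those confusing errors concrete, here's a small sketch of two ways a None password can slip through. It uses a hypothetical fake_get_password stand-in (so it runs without keyring installed) and a hypothetical build_auth_header helper; neither is part of keyring.

```python
from typing import Optional


def fake_get_password(service_name: str, username: str) -> Optional[str]:
    """Hypothetical stand-in for keyring.get_password with an empty keyring."""
    return None


def build_auth_header(api_key: str) -> str:
    """Hypothetical helper that expects a real string."""
    return "Bearer " + api_key


api_key = fake_get_password("example_api", "alexwlchan")

# Failure mode 1: string formatting silently succeeds with a bogus value,
# and the problem only surfaces later, e.g. as an authentication error.
silent_header = f"Bearer {api_key}"
print(silent_header)  # Bearer None

# Failure mode 2: concatenation does fail, but the TypeError is raised
# here, far away from the lookup that actually went wrong.
try:
    build_auth_header(api_key)
except TypeError as err:
    print(f"TypeError: {err}")
```

Neither message mentions the keyring, which is what makes failing at the lookup itself more useful.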
For a while, I wrapped get_password in a function that would throw an exception if it couldn’t find the password:

    import keyring


    def get_required_password(service_name: str, username: str) -> str:
        """
        Get password from the specified service.

        If a matching password is not found in the system keyring,
        this function will throw an exception.
        """
        password = keyring.get_password(service_name, username)

        if password is None:
            raise RuntimeError(f"Could not retrieve password {(service_name, username)}")

        return password

When I use this function, my code will fail as soon as it fails to retrieve a password, rather than when it tries to use None as the password.

This worked well enough for my personal projects, but it wasn’t a great fit for shared projects. I could make sense of the error, but not everyone could do the same.

What’s that password meant to be?

A good error message explains what’s gone wrong, and gives the reader clear steps for fixing the issue. The error message above is only doing half the job. It tells you what’s gone wrong (it couldn’t get the password) but it doesn’t tell you how to fix it.

As I started using this snippet in codebases that I work on with other developers, I got questions when other people hit this error. They could guess that they needed to set a password, but the error message doesn’t explain how, or what password they should be setting. For example, is this a secret they should pick themselves? Is it a password in our shared password vault? Or do they need an API key for a third-party service? If so, where do they find it?

I still think my initial error was an improvement over letting None be used in the rest of the codebase, but I realised I could go further. This is my extended wrapper:

    import keyring


    def get_required_password(service_name: str, username: str, explanation: str) -> str:
        """
        Get password from the specified service.

        If a matching password is not found in the system keyring,
        this function will throw an exception and explain to the user
        how to set the required password.
        """
        password = keyring.get_password(service_name, username)

        if password is None:
            raise RuntimeError(
                "Unable to retrieve required password from the system keyring!\n"
                "\n"
                "You need to:\n"
                "\n"
                f"1/ Get the password. Here's how: {explanation}\n"
                "\n"
                "2/ Save the new password in the system keyring:\n"
                "\n"
                f"    keyring set {service_name} {username}\n"
            )

        return password

The explanation argument allows me to explain what the password is for to a future reader, and what value it should have. That information can often be found in a code comment or in documentation, but putting it in an error message makes it more visible.

Here’s one example:

    get_required_password(
        "flask_app",
        "secret_key",
        explanation=(
            "Pick a random value, e.g. with\n"
            "\n"
            "    python3 -c 'import secrets; print(secrets.token_hex())'\n"
            "\n"
            "This password is used to securely sign the Flask session cookie. "
            "See https://flask.palletsprojects.com/en/stable/config/#SECRET_KEY"
        ),
    )

If you call this function and there’s no keyring entry for flask_app/secret_key, you get the following error:

    Unable to retrieve required password from the system keyring!

    You need to:

    1/ Get the password. Here's how: Pick a random value, e.g. with

        python3 -c 'import secrets; print(secrets.token_hex())'

    This password is used to securely sign the Flask session cookie. See https://flask.palletsprojects.com/en/stable/config/#SECRET_KEY

    2/ Save the new password in the system keyring:

        keyring set flask_app secret_key

It’s longer, but this error message is far more informative. It tells you what’s wrong, how to save a password, and what the password should be.

This is based on a real example where the previous error message led to a misunderstanding.
A co-worker saw a missing password called “secret key” and thought it referred to a secret key for calling an API, and didn’t realise it was actually for signing Flask session cookies. Now that I can write a more informative error message, I can prevent that misunderstanding happening again. (We also renamed the secret, for additional clarity.)

It takes time to write this explanation, which will only ever be seen by a handful of people, but I think it’s important. If somebody sees it at all, it’ll be when they’re setting up the project for the first time. I want that setup process to be smooth and straightforward.

I don’t use this wrapper in all my code, particularly small or throwaway toys that won’t last long enough for this to be an issue. But in larger codebases that will be used by other developers, and which I expect to last a long time, I use it extensively. Writing a good explanation now can avoid frustration later.
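One practical question is how to test that error path without touching a real system keyring. Here's a minimal sketch, assuming a hypothetical extra `lookup` parameter (not part of the wrapper above) so a test can pass in a dict-backed fake. In real code you would pass `keyring.get_password`, which takes the same `(service_name, username)` arguments.

```python
from typing import Callable, Optional

# Anything with the same shape as keyring.get_password(service_name, username).
PasswordLookup = Callable[[str, str], Optional[str]]


def get_required_password(
    service_name: str,
    username: str,
    explanation: str,
    lookup: PasswordLookup,
) -> str:
    """
    Same error message as the extended wrapper, but the password lookup
    is injected through a hypothetical `lookup` parameter, so tests don't
    need a real system keyring.
    """
    password = lookup(service_name, username)

    if password is None:
        raise RuntimeError(
            "Unable to retrieve required password from the system keyring!\n"
            "\n"
            "You need to:\n"
            "\n"
            f"1/ Get the password. Here's how: {explanation}\n"
            "\n"
            "2/ Save the new password in the system keyring:\n"
            "\n"
            f"    keyring set {service_name} {username}\n"
        )

    return password


def lookup_from(store: dict[tuple[str, str], str]) -> PasswordLookup:
    """Hypothetical test helper: a fake keyring backed by a plain dict."""
    return lambda service_name, username: store.get((service_name, username))


# An empty fake keyring exercises the error path.
try:
    get_required_password(
        "flask_app", "secret_key", "Pick a random value.", lookup=lookup_from({})
    )
except RuntimeError as err:
    print(err)
```

Making `keyring.get_password` the default for `lookup` would keep existing call sites unchanged while still allowing tests to swap it out.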

3 weeks ago 14 votes
Localising the `<time>` with JavaScript

I’ve been writing some internal dashboards recently, and one hard part is displaying timestamps. Our server does everything in UTC, but the team is split across four different timezones, so the server timestamps aren’t always easy to read.

For most people, it’s harder to understand a UTC timestamp than a timestamp in your local timezone. Did that event happen just now, an hour ago, or much further back? Was that at the beginning of your working day? Or at the end?

Then I remembered that I tried to solve this five years ago at a previous job. I wrote a JavaScript snippet that converts UTC timestamps into human-friendly text. It displays times in your local time zone, and adds a short suffix if the time happened recently. For example:

    today @ 12:00 BST (1 hour ago)

In my old project, I was writing timestamps in a <div> and I had to opt into the human-readable text for every date on the page. It worked, but it was a bit fiddly.

Doing it again, I thought of a more elegant solution. HTML has a <time> element for expressing datetimes, which is a more meaningful wrapper than a <div>. When I render the dashboard on the server, I don’t know the user’s timezone, so I include the UTC timestamp in the page like so:

    <time datetime="2025-04-15 19:45:00Z">
      Tue, 15 Apr 2025 at 19:45 UTC
    </time>

I put a machine-readable date and time string with a timezone offset string in the datetime attribute, and then a more human-readable string in the text of the element.

Then I add this JavaScript snippet to the page:

    window.addEventListener("DOMContentLoaded", function() {
      document.querySelectorAll("time").forEach(function(timeElem) {

        // Set the `title` attribute to the original text, so a user
        // can hover over a timestamp to see the UTC time.
        timeElem.setAttribute("title", timeElem.innerText);

        // Replace the display text with a human-friendly date string
        // which is localised to the user's timezone.
        timeElem.innerText = getHumanFriendlyDateString(
          timeElem.getAttribute("datetime")
        );
      });
    });

This updates any <time> element on the page to use a human friendly date string, which is localised to the user’s timezone. For example, I’m in the UK so that becomes:

    <time datetime="2025-04-15 19:45:00Z" title="Tue, 15 Apr 2025 at 19:45 UTC">
      Tue, 15 Apr 2025 at 20:45 BST
    </time>

In my experience, these timestamps are easier and more intuitive for people to read. I always include a timezone string (e.g. BST, EST, PDT) so it’s obvious that I’m showing a localised timestamp.

If you really need the UTC timestamp, it’s in the title attribute, so you can see it by hovering over it. (Sorry, mouseless users, but I don’t think any of my team are browsing our dashboards from their phone or tablet.)

If the JavaScript doesn’t load, you see the plain old UTC timestamp. It’s not ideal, but the page still loads and you can see all the information – this behaviour is an enhancement, not an essential.

To me, this is the unfulfilled promise of the <time> element. In my fantasy world, web page authors would write the time in a machine-readable format, and browsers would show it in a way that makes sense for the reader. They’d take into account their language, locale, and time zone.

I understand why that hasn’t happened – it’s much easier said than done. You need so much context to know what’s the “right” thing to do when dealing with datetimes, and guessing without that context is at the heart of many datetime bugs. These sorts of human-friendly, localised timestamps are very handy sometimes, and a complete mess at other times.

In my staff-only dashboards, I have that context. I know what these timestamps mean, who’s going to be reading them, and I think they’re a helpful addition that makes the data easier to read.
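On the server side, the markup only needs the UTC timestamp in two forms. Here's a minimal Python sketch of that rendering step, assuming a timezone-aware datetime and the default C locale (strftime's %a and %b are locale-dependent). The function name is hypothetical; the post doesn't say how the dashboard server is written.

```python
from datetime import datetime, timezone
from html import escape


def render_time_element(dt: datetime) -> str:
    """
    Render an aware datetime as a <time> element: a machine-readable UTC
    value in the `datetime` attribute, and a human-readable UTC string as
    the text that readers see if the JavaScript doesn't run.
    """
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")

    utc = dt.astimezone(timezone.utc)
    machine_readable = utc.strftime("%Y-%m-%d %H:%M:%SZ")
    human_readable = utc.strftime("%a, %d %b %Y at %H:%M UTC")

    return (
        f'<time datetime="{escape(machine_readable)}">'
        f"{escape(human_readable)}</time>"
    )


print(render_time_element(datetime(2025, 4, 15, 19, 45, tzinfo=timezone.utc)))
```

Rejecting naive datetimes is a deliberate choice: `astimezone()` would otherwise treat them as server-local time, which is exactly the kind of guessing that causes datetime bugs.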

3 weeks ago 18 votes

More in programming

How to provide feedback on documents.

At Carta, we recently ran a reading group for Facilitating Software Architecture by Andrew Harmel-Law. We already loosely followed the ideas of an architectural advice process (from this 2021 article by the same Andrew Harmel-Law), but in practice we found that internal tech spec and architecture decision record (ADR) authors tended to exclusively share their documents locally within their team rather than more widely. As we asked authors why they preferred sharing locally, the most common answer was that they got enough feedback from their team that they didn’t want to pay the time overhead of sharing widely. The wider feedback wasn’t necessarily bad or combative. It just wasn’t good enough to compensate for the additional time it cost to process. This made sense from the authors’ perspectives, but didn’t work well for me from the executive perspective, as I was seeing teams make misaligned decisions due to lack of cross-team communication.

As one step in reducing the overhead of sharing documents widely, I wrote up and shared this recommended process for providing feedback on documents:

Before starting, remember that the goal of providing feedback on a document is to help the author. Optimizing for anything else, even if it’s a worthy cause, discourages authors from sharing their future writing.

Start by skimming the document to understand its structure and where various kinds of topics are addressed. Why? This helps avoid giving feedback on ways the document’s actual structure diverges from how you imagined it would be structured. It also reduces questions about topics that are answered later in the document. Both of these sorts of feedback are a distraction during a discussion on a tech spec. In general, it’s better to avoid them.
If you notice an author making the same significant structural mistake over several ADRs, it’s worth delivering that feedback separately.

After skimming, reread the document, leaving comments with concerns. Each comment should include three details: what your suggested change or concern is, why you believe it is meaningful to address, and how important it seems (from ignorable nitpick to critical).

If you find yourself leaving more than three or four issues, then you should either raise your threshold for commenting or you should schedule time with the individual to talk over the feedback. If the document is unreasonably weak, then it’s appropriate to nudge their leadership to dig into what’s happening on that team.

The most important idea behind these steps is that your goal as a feedback giver is to help the document’s author. It is not to protect your team’s strategy or platform. It is not to optimize for your goals. It’s to help the author. This might feel wrong, but ultimately optimizing for anything else will lead to an environment where sharing widely is an irrational behavior.

As a final aside, I think the user experience around commenting on documents is fundamentally wrong in most document editors. For example, Google Docs treats individual comments as first-order objects, similarly to how old version control systems like CVS tracked changes to individual files without tracking an overall state of the project. Ultimately, you want to collect all your comments into a bundle, then review that bundle for consistency and duplicates, and then submit that bundle as commentary, but editors don’t support that flow particularly well.

16 hours ago 1 votes
How (and Why) to Get a Bank Account in Japan

You can technically get by in Japan without a Japanese bank account. For those who are here on short-term visas, or who plan to move frequently from city to city, it’s perfectly possible to live and work in Japan without one. However, if you want to work a full-time job, rent an apartment, join social activities, or enroll your children in school, you’ll almost certainly need to make an account. The following is an overview of what you’ll need to open an account, some common problems foreigners encounter, and what banks will work best for your needs.

If you don’t have a bank account . . .

The “chicken-and-egg problem” is what many foreigners call it: that strange bureaucratic trap you encounter when moving to Japan. You need a local phone number to get an apartment, but you need a registered address to get a bank account, and you need a bank account to get a local phone number! Luckily, there is an order of operations that can get you all three as fast as possible.

But let’s say you haven’t decided where you want to live yet, or there’s some other reason for delay. Can you get by in Japan without an account? Strangely enough, it’s not that difficult, thanks to Japan’s cash-based society.

Getting paid

Direct deposit is more common now, and most companies will also ask you to make an account with a specific bank to receive your paycheck. Nonetheless, they cannot require you to make an account with that bank. You are within your rights to insist on being paid to the account of your choice.

Getting cash

Be aware that Japan has two methods of getting cash from a machine: ATMs, which function generally like ATMs around the world, and cash machines, which are usually located in banks and are only usable with that bank’s cash card. For example, if you go into Mitsui Sumitomo and have a cash card for some other bank, you will not be able to use it.

Many ATMs found at convenience stores, as well as Japan Post Bank ATMs, will allow you to withdraw yen from your foreign accounts.
Of the various convenience store options, 7-Eleven ATMs are your best bet. There are some limitations:

- Depending on the ATM, additional fees may be charged.
- Many ATMs can’t check your foreign account’s balance.
- The single-transaction withdrawal limit may be lower. At Japan Post Bank ATMs, you can’t withdraw more than 50,000 yen from a foreign account at one time.
- 7-Eleven ATMs do not let you freely select an amount to withdraw. Instead, you must pick from preset options starting at 10,000 yen.

Using your foreign card

In addition, most stores that accept credit or debit cards can also process foreign-issued cards. At least, I’ve never had mine rejected. If the store is small or not part of a national chain, however, the odds that it can’t process your card are higher. Some stores may also not support chips, so if your card does not have a magnetic stripe, you may be unable to use it there.

As a side note, one of the services that does not accept foreign credit cards is the one you’d least expect: Disneyland. If you want to buy park tickets online, the website theoretically accepts most foreign cards, yet very few seem to actually work. Personally, I got around this problem by using Klook, a third-party app that had no difficulty processing my credit card and delivered my digital tickets without issue.

Finding housing

Finally, share-houses and other short-term, foreigner-friendly rentals don’t require a Japanese bank account. These often come furnished, may include utilities, and can be rented without the hassle of a deposit or key money. Of course, they will cost more overall than long-term housing, but they’re good options for those without a Japanese account.

But you should make a bank account

As you can see, it’s possible to live in Japan without a Japanese account, at least for a while. But it’s not convenient, and the longer you live in Japan, the more inconvenient it becomes.
Renting

Renting your own apartment on a long-term lease will almost certainly require a Japanese bank account. A Japanese bank account and phone number are the bare minimum. You’ll also be asked for your residency status, your employment contract or income statement, and either guarantors or the endorsement of a guarantor company.

In addition, you can pay most utility bills with cash at a convenience store, but setting up automatic withdrawal is increasingly convenient. Some companies discourage convenience store payments by charging a service fee for the paper bill. Automatic withdrawals also mean you’re less likely to miss a payment and have your gas turned off without warning, as happened to me!

Employment

Your employer will also want you to make a bank account, as almost all big businesses prefer direct deposit.

Government benefits

At a certain point, the government requires you to have a local account. It’s how you can expect to receive your tax refund and any social benefits you may be entitled to, such as the child support allowance (jidou teate, 児童手当).

Japanese society

Beyond the basics of life, many social clubs, activities, and schools require participants to have bank accounts. This will depend somewhat on where you live. In Tokyo, my husband’s taiko club insisted that he set up monthly debits from a Japanese account in order to participate. My children’s public elementary school required us to make an entirely new account with their preferred bank so that they could withdraw lunch fees.

By contrast, in our new small town in Kansai, the children’s karate and ballet classes are cash-only. The school did ask us to make a new account at a regional bank for lunch fees. When we were unsuccessful (a point I’ll explore below), they were fine with collecting the payments in cash.

In short, it’s better to bite the bullet and make the account.
The actual difficulty of doing so will depend on which bank you choose.

The kinds of banks in Japan

There are, of course, all kinds of banks in Japan, from online banks to large national institutions. From an immigrant’s point of view, however, they fall into several distinct categories.

Japan Post Bank

The Japan Post Bank (Yuucho Ginkou, ゆうちょ銀行) deserves a category of its own. Unlike other banks in Japan, it does not require six months of residency or an employment contract in Japan to open an account. You must, however, have at least three months remaining on your residence card when you apply. In addition, if you have less than six months of residency and no employment contract, your account will be treated as a non-resident account with limited services. There are branches all over Japan, wherever there is a post office, and you can also open an account online.

Conventional foreigner-friendly banks

Several banks in Japan are well known for being foreigner-friendly and providing some English services. SMBC Trust Bank Prestia and SBI Shinsei Bank are the usual recommendations in this category. Both offer English-language online banking and English support via chat.

Online banks

You can also choose a bank that operates purely online (netto ginkou, ネット銀行). For simple banking tasks, such as getting a debit card and receiving your paycheck, these don’t operate much differently from conventional banks. Popular choices include:

- PayPay, which operates a thriving cashless payment service
- Sony Bank, which has 90,000 partner ATMs in Japan
- Seven Bank, the official online bank of 7-Eleven, which has ATMs in every 7-Eleven store

Japanese-speaking banks

Aside from convenience, there’s really nothing stopping you from banking with any bank in Japan. You should be able to make an appointment at any branch and ask for help opening an account. Granted, this approach requires time, patience, possibly multiple appointments, and, if you don’t speak Japanese, a lot of translation.
Nonetheless, it can be done, and will probably even be necessary at some point, since jobs, schools, and activities in Japan may ask you to work with their preferred bank.

What you’ll need to apply

Typically, this is what you’ll need to open an account with a bank:

- Your residence card. This is always required.
- A second form of ID. This could include your My Number card, your student ID, your Residence Certificate (住民票, juuminhyo), or a utility bill or other document with your full name in katakana. The exact requirements for a second form of ID differ from bank to bank, so check their instructions carefully.
- An employment contract and/or employee ID. At most banks, if you want to open an account before you’ve lived in the country for six months, you will need to provide proof of employment.
- A local phone number.

Do I need a hanko?

A hanko (判子, also called an inkan 印鑑) is a stamp which, on many Japanese documents, serves as your official signature. Do you absolutely need one to open an account? Not necessarily. Some banks, such as Japan Post Bank, will let you start banking with only your signature.

Should you buy and use one anyway? Yes, for several good reasons:

- You may need it later for more advanced procedures, such as renting an apartment, getting a loan, or registering a car.
- If your signature doesn’t match exactly when you submit paperwork in the future, your bank may reject it. A hanko stays the same as long as it is not damaged. (If you damage or lose your hanko, the bank will require you to re-register the imprint so that it has a current copy on file.)
- If you open a bank account with your signature but later acquire and use a hanko, this can confuse your bank. Again, a Japanese bank will reject paperwork with any inconsistencies. This may not seem hard to keep straight, but once you have multiple accounts in Japan, remembering which one requires your signature and which requires your hanko can be a hassle.
- Why not? Hanko are not that expensive, they make great souvenirs, and they’re an easy way to integrate. My own hanko has my surname in katakana, and its unusual appearance draws a lot of interest from Japanese people.

If you’re intimidated by the process of buying a hanko in person, you can order one online. I used Google Translate to buy mine at Shibuya Stamp Shop, but there are English websites available as well. Be careful not to buy a self-inking hanko (such as a Shachihata), as many banks will refuse to accept them.

You should also store the hanko you use for bank accounts carefully and use it only for bank accounts. It is fine to use one hanko for multiple bank accounts. People commonly have several hanko, each for a different level of task. You don’t want to be stamping delivery slips or kids’ homework with the same security device you use to control your finances!

U.S. citizen requirements

U.S. citizens and green card holders will need a few more documents, thanks to the Foreign Account Tax Compliance Act (FATCA). If you’re opening an account in person, bring your passport and Social Security card with you. If you’re opening the account online, expect to fill out additional forms to establish your TIN (Taxpayer Identification Number). These forms are usually requested by mail, which delays the so-called “online application” process considerably. For U.S. citizens and green card holders, it’s faster to apply for an account in person.

Should I apply online?

Quite a few banks now claim to offer online applications in English to ease account opening. But what counts as an “online application” can differ hugely.
By smartphone is best

First, if you want to apply online, it’s best to have a smartphone with a domestic SIM. Smartphones are the main way consumers access the internet in Japan, so many services are built smartphone-first, and you can often save several steps by using one. For example, if you apply via smartphone with SMBC Trust Bank Prestia, you can take a selfie as one form of ID, which means you only need your residence card. (Note that facial ID systems can be finicky and may reject your photo.) If you use a computer or tablet, however, the bank requires two forms of ID.

Seven Bank, as an online bank, also strongly prefers that customers use a smartphone. Those who don’t have one can use a debit card and conduct transactions at its ATMs, but won’t be able to use its Direct Banking Service.

Is it really online?

“Applying online” isn’t always as simple as it sounds. Japan Post Bank and Sony Bank both let users make an account via the bank’s app, a process they claim takes around 30 minutes.

But Shinsei’s online application barely qualifies as such. You do fill out the initial form on the website, but only so that you can receive a printed application form in the mail around a week later. You’ll then have to mail copies of your IDs back to the bank, for an additional 7-10 business days of processing. At that point, you might be better served by visiting a branch with Google Translate.

Online-only banks often have similar processing times for foreigners, but with an additional downside: since they’re online-only, there is no option to visit a local branch and hammer everything out in one go!

These estimated application times also assume that everything goes smoothly with the bank’s app or website, which is not guaranteed. Modern banks often rely on relatively new My Number card integrations or “AI” facial/document recognition, and bugs are unfortunately common.
Common problems

Forewarned is forearmed, and in that spirit, here are some of the most common issues foreigners run into when banking in Japan.

Technical difficulties

Personally, I bank with Japan Post Bank and am very happy with the service I receive, except when I need to set up a new direct withdrawal online. For whatever reason, accessing the forms in Chrome causes all sorts of problems. Switch to Safari, though, and suddenly everything works. VPNs, ad blockers, and other common security extensions also frequently cause issues with financial sites in Japan.

Name issues

If you take away one important thing from this article, let it be this: from the beginning, choose the Japanese version of your name and keep it consistent.

If you do not have a Japanese name, you will need to spell it out in katakana. However, many names have several accepted katakana variations. For example, I prefer to spell my surname Callahan as カラハン (Karahan), but it was spelled (without my input) as キャラハン (Kyarahan) on my health insurance card. Fortunately, I didn’t run into any issues and was able to change it later. However, the mismatch would have caused problems opening a bank account if I’d tried to use my health insurance card as a secondary form of ID.

Long names and middle names also cause problems, and unfortunately these are mostly unavoidable. There often isn’t enough space on a form to write your name properly, either in the Roman alphabet or in katakana. You might be tempted to leave out your middle name whenever possible, but then you risk your application being rejected because it doesn’t match your full legal name.

For me personally, a long legal name has been only a minor inconvenience. My Sri Lankan neighbor, however, had so many problems with her long name that she was unable to open an account at our local bank.
Although she is a permanent resident and speaks Japanese fluently, she was still unable to open the account even after three separate trips to the bank.

Banks also, unfortunately, give different instructions when your full name does not fit on their paper or electronic application. Some will ask you to fill in as much as possible and truncate the rest, while others will relent and accept only your first and last names. Still other banks may require your English name and not accept a katakana rendering as primary. These mismatches can cause issues when you try to link accounts in the future. Such problems can usually only be solved with human help, which is perhaps a reason to consider banking at an institution with physical branches.

Kanji difficulties

Twice I’ve been asked to make a new account with a regional bank that didn’t offer service in English. Both times, bank employees asked me to fill out several forms with my address written in kanji. Best practice, of course, would have been to memorize my own address in kanji already. In reality, I ended up copying it from the tiny writing on the back of my residence card.

At the first bank, the kind employees carefully showed me how to write some of the more complicated kanji. At the second, I was mostly left to my own devices, and the resulting scrawl got my application rejected. They asked me to come back with someone who spoke, and wrote, Japanese.

If you do need to open an account at a Japanese-speaking bank, try keeping a copy of your address on your phone, or even printing the kanji version in large characters that are easier to copy. Of course, if you have a friend who writes Japanese and is willing to accompany you that day, that will also speed things along.

I’ll add that the bank that rejected me was the same bank my neighbor applied to three times.
I wouldn’t describe my visit there as an ordinary banking experience in Japan. This particular branch is clearly unwilling to assist or accommodate foreign residents.

A cash card is not a debit card

Perhaps this isn’t a widespread misunderstanding, but it caught me by surprise: most banks provide only cash cards by default, and debit cards are opt-in. A cash card is not a debit card. It is good only for pulling cash out of a cash machine or ATM. Some banks, such as Prestia and Sony, do give you a debit card straight away. Others, such as Japan Post Bank, require a separate application for a debit card once the account is open. You can tell a cash card from a debit card by looking for a network logo such as Visa, Mastercard, or JCB. If it does not have one, it’s likely a cash card.

Holidays and ATM times

If you live or work near convenience stores, you shouldn’t have much trouble withdrawing cash whenever you want. However, you should still keep an eye on ATM operating hours and your bank’s maintenance hours. For example, many ATMs are unusable for part of the New Year holiday or Golden Week. Japan Post Bank shuts down completely for part of Golden Week. The shutdown covers ATMs, online services, the smartphone app, and even your debit card!

You should also watch for time-dependent withdrawal fees. Many ATMs display a screen showing one withdrawal fee for business hours and another for early-morning or late-night transactions. The difference is fairly small (a business-hours withdrawal may cost 110 yen, compared with 220 yen late at night), but if you’re cost-conscious, it’s worth noting.

Sending and receiving money internationally

The cost of sending and receiving money internationally adds up quickly. Not only do Japanese banks often charge steep fees for currency conversion and wiring, but there’s yet more paperwork involved.
If you have a premium bank account, such as a Platinum-level account at Sony Bank or Shinsei, one of the perks is reduced or waived fees for international transfers. If you don’t, an online transfer service like Wise is certainly faster and frequently cheaper. If you’re interested in moving large amounts of money and want to avoid fees as much as possible, here’s a detailed breakdown of the average transfer rates for various institutions and accounts.

Frequently-recommended banks

The following are some of the banks most often recommended by other immigrants, with a brief overview of their pros and cons.

Japan Post Bank

Japan Post Bank is one of the easiest banks to open an account with when you first arrive in Japan.

Pros
- Doesn’t require six months of residency or an employment contract to open an account
- Branches all over Japan in post offices
- Can open an account and check your virtual bankbook via apps
- No monthly maintenance fees

Cons
- Service is mostly in Japanese
- Services may be limited and fees may be high during the first six months if you do not have an employment contract
- Have to apply separately for a debit or bank card
- Access to ATMs on post office grounds is limited to that branch’s hours, which can be inconsistent
- High fees for international transfers

SBI Shinsei Bank

Shinsei is a good choice for those who want some service in English and who intend to send and receive money internationally.

Pros
- English internet banking and online service
- Foreign currency accounts with high interest rates
- Free ATM withdrawals up to five times a month
- With a higher-level account (Diamond, Platinum, Gold, or Silver), you can receive foreign currency remittances for free

Cons
- The “online” application procedure is really more by mail
- Initially you are only given a cash card
- Standard accounts are charged 2,200 yen per remittance

SMBC Trust Bank Prestia

Prestia is ideal for those who want a full-service bank that offers a travel-friendly debit card.
Pros
- English-language bank app, online service, and assistance with housing loans, investment, etc.
- If you apply for an account via the app, you only need your residence card as a form of ID (assuming you meet the six-month residency requirement)
- Opening an account automatically gets you both a yen account and a foreign currency account
- You immediately receive a GLOBAL PASS Visa debit card that can be used domestically and overseas

Cons
- Monthly maintenance fee of 2,200 yen unless you keep a minimum balance of 500,000 yen or meet other requirements
- Easily confused with SMBC Bank, but the services and branches are not interchangeable

Sony Bank

For those who’d prefer an online bank, Sony Bank offers another international-friendly debit card and a comprehensive rewards system.

Pros
- You automatically get the Sony Bank WALLET card, a combined cash card and Visa debit card that can be used internationally
- Club S, a three-tier rewards system based on the balance of your yen and foreign currency accounts. Platinum members can get perks such as 2% cashback, unlimited free cash withdrawals, waived transfer and remittance fees, etc.

Cons
- Only online banking is available in English (the app is in Japanese)
- As an online bank, it has no physical branch to visit

Special note: the Rakuten credit card

Rakuten also has an online bank. It is less often suggested as a bank for new immigrants, but it is one of the few places where foreigners can easily apply for a credit card.

Conclusion

Like most bureaucratic processes in Japan, opening an account can take quite a bit of time and paperwork, but it is ultimately doable, not to mention beneficial in the long run. To recap: if you intend to live and work in Japan for more than a few months, you should open a local bank account. Japan Post Bank doesn’t require six months of residency or employment to open an account, as other banks do.
Banks such as SBI Shinsei, SMBC Trust Bank Prestia, and Sony Bank have a reputation for being foreigner-friendly. With proper preparation, however, you can have an account at any bank in Japan. The greatest difficulties in banking tend to be name-related. You can avoid most of them by keeping your legal name and its katakana spelling consistent from the beginning, and by getting a hanko before opening the account. U.S. citizens and green card holders should expect more paperwork related to FATCA. Judging by these banks’ English-language sites, they’re pushing non-Japanese-speaking customers towards applying online or by mail rather than visiting a branch. However, if you’re a U.S. citizen, or just don’t want to download yet another app, don’t be afraid to go in person. With the exception of one local bank, I’ve consistently had positive experiences with bank staff, who have often gone above and beyond to help me despite the language barrier. So long as you’re patient with the process and research each bank’s requirements, opening an account will swiftly be one more item checked off your moving-to-Japan list.

3 hours ago 1 votes
systems-mcp: generate systems models via LLM

Back in 2018, I wrote lethain/systems as a domain-specific language for writing runnable systems models, and introduced it with this blog post modeling a hiring funnel. While it’s far from a perfect system, I’ve gotten a lot of value out of it over the last seven years, because it lets me maintain systems models in version control. As I’ve been playing with writing Model Context Protocol (MCP) servers, one I’ve frequently thought about is a server to help write systems syntax, and I finally put that together in the lethain/systems-mcp repository. More detailed installation and usage instructions are in the GitHub repository, so I’ll just share a couple of screenshots and comments here.

The first tool is load_systems_documentation, which loads a copy of lethain/systems/README.md and a file of example systems into the context window. The biggest challenge of properly writing DSLs with an LLM is providing enough in-context learning (ICL) examples, and I find the idea of tools designed specifically to provide that context very interesting. Eventually I imagine there will be generalized tools for this, e.g. a search index of the best ICL examples for a wide variety of DSLs. Until then, my guess is that this sort of tool is particularly valuable.

The second tool is run_systems_model, which takes the DSL (and an optional parameter for the number of rounds), runs the model, and returns the result. I experimented with the interface design here, initially trying to return a rendered chart of the results, but even multi-modal models are much better at working with text than with images. In the end, I had the best results returning the results as JSON and then having the LLM build a tool for interacting with them.

Altogether, a fun little experiment, and another confirmation in my mind that the most interesting part of designing MCPs today is deciding where to introduce and eliminate complexity from the LLM.
Introduce too little and the tool lacks power; eliminate too little and the combination rarely works.
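The sketch below makes the “return JSON, not a chart” choice concrete. It is hypothetical and is not the systems-mcp or lethain/systems code. `Flow` and `run_model` are invented names, and the model is a toy in which each flow moves a fixed fraction of its source stock every round. Nothing here parses the systems DSL. The only point is the output shape: per-round stock values as JSON text that an LLM can read, diff, or re-plot itself.

```python
import json
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical flow: moves `rate` (a fraction) of `source` into `dest` each round."""
    source: str
    dest: str
    rate: float


def run_model(stocks: dict[str, float], flows: list[Flow], rounds: int = 10) -> str:
    """Simulate a toy stock-and-flow model and return every round's stocks as JSON.

    Assumes the rates leaving any one stock sum to at most 1.0,
    so no stock goes negative.
    """
    state = dict(stocks)
    history = [dict(state)]  # round 0 is the initial state
    for _ in range(rounds):
        # Compute all transfers from the start-of-round state so flow order doesn't matter.
        transfers = [(f.source, f.dest, state[f.source] * f.rate) for f in flows]
        for source, dest, amount in transfers:
            state[source] -= amount
            state[dest] += amount
        history.append(dict(state))
    return json.dumps({"rounds": rounds, "history": history})


if __name__ == "__main__":
    # A two-step hiring-funnel-like example with made-up rates.
    funnel = [Flow("candidates", "screened", 0.5), Flow("screened", "hired", 0.5)]
    print(run_model({"candidates": 100.0, "screened": 0.0, "hired": 0.0}, funnel, rounds=2))
```

Running the example yields three snapshots, round 0 through round 2. By round 2 the 100 candidates are split 25/50/25 across the three stocks. Each snapshot is plain JSON, so an LLM can load it into whatever analysis helper it builds rather than squinting at a rendered chart.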

14 hours ago 1 votes
How Cursor Indexes Codebases Fast

Merkle Trees in the real world

2 days ago 5 votes
a whippet waypoint

Hey peoples! Tonight, some meta-words. As you know I am fascinated by compilers and language implementations, and I just want to know all the things and implement all the fun stuff: intermediate representations, flow-sensitive source-to-source optimization passes, register allocation, instruction selection, garbage collection, all of that.

It started long ago with a combination of curiosity and a hubris to satisfy that curiosity. The usual way to slake such a thirst is structured higher education followed by industry apprenticeship, but for whatever reason my path sent me through a nuclear engineering bachelor’s program instead of computer science, and continuing that path was so distasteful that I noped out all the way to rural Namibia for a couple years.

Fast-forward: after 20 years in the programming industry, and having picked up some language implementation experience, a few years ago I returned to garbage collection. I have a good level of language implementation chops but had never written a memory manager, and Guile’s performance was limited by its use of the Boehm collector. I had been on the lookout for something that could help, and when I learned of Immix it seemed to me that the only thing missing was an appropriate implementation for Guile, and hey, I could do that!

I started with the idea of an MMTk-style interface to a memory manager that was abstract enough to be implemented by a variety of different collection algorithms. This kind of abstraction is important, because in this domain it’s easy to convince oneself that a given algorithm is amazing, just based on vibes; to stay grounded, I find I always need to compare what I am doing to some fixed point of reference. This GC implementation effort grew into Whippet, but as it did so a funny thing happened. The mark-sweep collector that I prototyped as a direct replacement for the Boehm collector maintained mark bits in a side table, which I realized was a suitable substrate for Immix-inspired bump-pointer allocation into holes. I ended up building on that to develop an Immix collector, but without lines: instead, each granule of allocation (16 bytes on a 64-bit system) is its own line.

The Immix paper is funny, because it defines itself as a new class of mark-region collector, fundamentally different from the three other fundamental algorithms (mark-sweep, mark-compact, and evacuation). Immix’s regions are blocks (64 kB coarse-grained heap divisions) and lines (128 B “fine-grained” divisions). The innovation (for me) is the optimistic evacuation discipline, by which one can potentially defragment a block without a second pass over the heap, while also allowing for bump-pointer allocation. See the papers for the deets!

However, what really are the regions referred to by mark-region? If they are blocks, then the concept is trivial: everyone has a block-structured heap these days. If they are spans of lines, well, how does one choose a line size? As I understand it, Immix chose 128 bytes to be fine-grained enough not to lose too much space to fragmentation, while also being coarse enough to be eagerly swept during the GC pause.

This constraint was odd to me. All of the mark-sweep systems I have ever dealt with have had lazy or concurrent sweeping, so to me the lower bound on the line size had little meaning. Indeed, as one reads papers in this domain, it is hard to know the real from the rhetorical; the review process prizes novelty over nuance.

Anyway. What if we cranked the precision dial to 16 instead, and had a line per granule? That was the process that led me to Nofl. It is a space in a collector that came from mark-sweep with a side table, but instead uses the side table for bump-pointer allocation. Or you could see it as an Immix whose line size is 16 bytes; it’s certainly easier to explain it that way, and that’s the tack I took in a recent paper submission to ISMM’25.

Wait what!
I have a fine job in industry and a blog, why write a paper? Gosh, I have meditated on this for a long time and the answers are very silly. Firstly, one of my language communities is Scheme, which was a research hotbed some 20-25 years ago. That means many practitioners (people I would be pleased to call peers) came up through the PhD factories and published many interesting results in academic venues. These are the folks I like to hang out with! This is also what academic conferences are: chances to shoot the shit with far-flung fellows. In Scheme this is fine, my work on Guile is enough to pay the intellectual cover charge, but I need more, and in the field of GC I am not a proven player. So I did an atypical thing, which is to cosplay at being an independent researcher without having first been a dependent researcher, and just solo-submit a paper. Kids: if you see yourself here, just go get a doctorate. It is not easy, but I can only think it is a much more direct path to the goal.

And the result? Well, friends, it is this blog post :) I got the usual assortment of review feedback, from the very sympathetic to the less so. Ultimately, though, people were confused by a paper that led with a comparison to Immix but ended without an evaluation against Immix. This is fair, and the paper does not mention that, you know, I don’t have an Immix lying around. To my eyes it was a good paper, an 80% paper, but, you know, just a try. I’ll try again sometime.

In the meantime, I am driving towards getting Whippet into Guile. I am hoping that sometime next week I will have excised all uses of the BDW (Boehm GC) API in Guile, which will finally allow for testing Nofl in more than a laboratory environment. Onwards and upwards!
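The granule-per-line idea behind Nofl can be sketched in a few dozen lines of Python. This is a simplified illustration under stated assumptions, not Whippet’s code. `Block`, `next_hole`, and `allocate` are hypothetical names. The side table holds one mark byte per 16-byte granule, where a nonzero entry means the granule survived the last collection. The allocator bump-allocates through runs of unmarked granules (“holes”). The arithmetic shows why the precision dial matters. A 64 kB block holds 65536 / 128 = 512 Immix-style lines but 65536 / 16 = 4096 granules, so holes can be found at 8x finer granularity.

```python
from typing import Optional, Tuple

GRANULE = 16                            # bytes per granule on a 64-bit system, as in the post
BLOCK = 64 * 1024                       # 64 kB block, the block size quoted in the post
GRANULES_PER_BLOCK = BLOCK // GRANULE   # 4096 granules, i.e. 4096 one-granule "lines"


class Block:
    """A heap block plus a side table with one mark byte per granule (hypothetical layout)."""

    def __init__(self) -> None:
        # Nonzero entry = granule held a live object at the last collection.
        self.marks = bytearray(GRANULES_PER_BLOCK)
        self.cursor = 0  # bump pointer, in granules
        self.limit = 0   # end of the current hole, in granules

    def next_hole(self, start: int) -> Optional[Tuple[int, int]]:
        """Return [begin, end) of the next run of unmarked granules at or after start."""
        n = len(self.marks)
        begin = start
        while begin < n and self.marks[begin]:
            begin += 1  # skip live granules
        if begin == n:
            return None
        end = begin
        while end < n and not self.marks[end]:
            end += 1  # extend the hole over free granules
        return begin, end

    def allocate(self, size: int) -> Optional[int]:
        """Bump-allocate size bytes; return the byte offset in the block, or None if full.

        Allocation does not set marks: in this sketch the side table only
        records survivors of the previous collection.
        """
        granules = -(-size // GRANULE)  # round up to whole granules
        while self.limit - self.cursor < granules:
            # Current hole too small: abandon its tail and look for the next hole.
            hole = self.next_hole(self.limit)
            if hole is None:
                return None
            self.cursor, self.limit = hole
        offset = self.cursor * GRANULE
        self.cursor += granules  # the "bump": no per-object free list
        return offset
```

As a usage example, mark granules 2-3 and 10 as live, then allocate 32, 48, and 100 bytes. The calls return offsets 0, 64, and 176. The third request needs 7 granules, which do not fit in what remains of the hole spanning granules 4-9. The allocator therefore skips to the hole starting at granule 11, leaving granules 7-9 unused by this sketch. The finer the line size, the smaller such leftovers can be.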

3 days ago 6 votes