Emoji Griddle

10 months ago

Remove from reading list Add to reading list [alt+a] Read now [→]

Comments

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from exist

When Imperfect Systems are Good, Actually: Bluesky's Lossy Timelines

Often when designing systems, we aim for perfection in things like consistency of data, availability, latency, and more. The hardest part of system design is that it’s difficult (if not impossible) to design systems that have perfect consistency, perfect availability, incredibly low latency, and incredibly high throughput, all at the same time. Instead, when we approach system design, it’s best to treat each of these properties as points on different axes that we balance to find the “right fit” for the application we’re supporting. I recently made some major tradeoffs in the design of Bluesky’s Following Feed/Timeline to improve the performance of writes at the cost of consistency in a way that doesn’t negatively affect users but reduced P99s by over 96%. Timeline Fanout When you make a post on Bluesky, your post is indexed by our systems and persisted to a database where we can fetch it to hydrate and serve in API responses. Additionally, a reference to your post is “fanned out” to your followers so they can see it in their Timelines. This process involves looking up all of your followers, then inserting a new row into each of their Timeline tables in reverse chronological order with a reference to your post. When a user loads their Timeline, we fetch a page of post references and then hydrate the posts/actors concurrently to quickly build an API response and let them see the latest content from people they follow. The Timelines table is sharded by user. This means each user gets their own Timeline partition, randomly distributed among shards of our horizontally scalable database (ScyllaDB), replicated across multiple shards for high availability. Timelines are regularly trimmed when written to, keeping them near a target length and dropping older post references to conserve space. Hot Shards in Your Area Bluesky currently has around 32 Million Users and our Timelines database is broken into hundreds of shards. To support millions of partitions on such a small number of shards, each user’s Timeline partition is colocated with tens of thousands of other users’ Timelines. Under normal circumstances with all users behaving well, this doesn’t present a problem as the work of an individual Timeline is small enough that a shard can handle the work of tens of thousands of them without being heavily taxed. Unfortunately, with a large number of users, some of them will do abnormal things like… well… following hundreds of thousands of other users. Generally, this can be dealt with via policy and moderation to prevent abusive users from causing outsized load on systems, but these processes take time and can be imperfect. When a user follows hundreds of thousands of others, their Timeline becomes hyperactive with writes and trimming occurring at massively elevated rates. This load slows down the individual operations to the user’s Timeline, which is fine for the bad behaving user, but causes problems to the tens of thousands of other users sharing a shard with them. We typically call this situation a “Hot Shard”: where some resident of a shard has “hot” data that is being written to or read from at much higher rates than others. Since the data on the shard is only replicated a few times, we can’t effectively leverage the horizontal scale of our database to process all this additional work. Instead, the “Hot Shard” ends up spending so much time doing work for a single partition that operations to the colocated partitions slow down as well. Stacking Latencies Returning to our Fanout process, let’s consider the case of Fanout for a user followed by 2,000,000 other users. Under normal circumstances, writing to a single Timeline takes an average of ~600 microseconds. If we sequentially write to the Timelines of our user’s followers, we’ll be sitting around for 20 minutes at best to Fanout this post. If instead we concurrently Fanout to 1,000 Timelines at once, we can complete this Fanout job in ~1.2 seconds. That sounds great, except it oversimplifies an important property of systems: tail latencies. The average latency of a write is ~600 microseconds, but some writes take much less time and some take much more. In fact, the P99 latency of writes to the Timelines cluster can be as high as 15 milliseconds! What does this mean for our Fanout? Well, if we concurrently write to 1,000 Timelines at once, statistically we’ll see 10 writes as slow as or slower than 15 milliseconds. In the case of timelines, each “page” of followers is 10,000 users large and each “page” must be fanned out before we fetch the next page. This means that our slowest writes will hold up the fetching and Fanout of the next page. How does this affect our expected Fanout time? Each “page” will have ~100 writes as slow as or slower than the P99 latency. If we get unlucky, they could all stack up on a single routine and end up slowing down a single page of Fanout to 1.5 seconds. In the worst case, for our 2,000,000 Follower celebrity, their post Fanout could end up taking as long as 5 minutes! That’s not even considering P99.9 and P99.99 latencies which could end up being >1 second, which could leave us waiting tens of minutes for our Fanout job. Now imagine how bad this would be for a user with 20,000,000+ Followers! So, how do we fix the problem? By embracing imperfection, of course! Lossy Timelines Imagine a user who follows hundreds of thousands of others. Their Timeline is being written to hundreds of times a second, moving so fast it would be humanly impossible to keep up with the entirety of their Timeline even if it was their full-time job. For a given user, there’s a threshold beyond which it is unreasonable for them to be able to keep up with their Timeline. Beyond this point, they likely consume content through various other feeds and do not primarily use their Following Feed. Additionally, beyond this point, it is reasonable for us to not necessarily have a perfect chronology of everything posted by the many thousands of users they follow, but provide enough content that the Timeline always has something new. Note in this case I’m using the term “reasonable” to loosely convey that as a social media service, there must be a limit to the amount of work we are expected to do for a single user. What if we introduce a mechanism to reduce the correctness of a Timeline such that there is a limit to the amount of work a single Timeline can place on a DB shard. We can assert a reasonable limit for the number of follows a user should have to have a healthy and active Timeline, then increase the “lossiness” of their Timeline the further past that limit they go. A loss_factor can be defined as min(reasonable_limit/num_follows, 1) and can be used to probabilistically drop writes to a Timeline to prevent hot shards. Just before writing a page in Fanout, we can generate a random float between 0 and 1, then compare it to the loss_factor of each user in the page. If the user’s loss_factor is smaller than the generated float, we filter the user out of the page and don’t write to their Timeline. Now, users all have the same number of “follows worth” of Fanout. For example with a reasonable_limit of 2,000, a user who follows 4,000 others will have a loss_factor of 0.5 meaning half the writes to their Timeline will get dropped. For a user following 8,000 others, their loss factor of 0.25 will drop 75% of writes to their Timeline. Thus, each user has a effective ceiling on the amount of Fanout work done for their Timeline. By specifying the limits of reasonable user behavior and embracing imperfection for users who go beyond it, we can continue to provide service that meets the expectations of users without sacrificing scalability of the system. Aside on Caching We write to Timelines at a rate of more than one million times a second during the busy parts of the day. Looking up the number of follows of a given user before fanning out to them would require more than one million additional reads per second to our primary database cluster. This additional load would not be well received by our database and the additional cost wouldn’t be worth the payoff for faster Timeline Fanout. Instead, we implemented an approach that caches high-follow accounts in a Redis sorted set, then each instance of our Fanout service loads an updated version of the set into memory every 30 seconds. This allows us to perform lookups of follow counts for high-follow accounts millions of times per second per Fanount service instance. By caching values which don’t need to be perfect to function correctly in this case, we can once again embrace imperfection in the system to improve performance and scalability without compromising the function of the service. Results We implemented Lossy Timelines a few weeks ago on our production systems and saw a dramatic reduction in hot shards on the Timelines database clusters. In fact, there now appear to be no hot shards in the cluster at all, and the P99 of a page of Fanout work has been reduced by over 90%. Additionally, with the reduction in write P99s, the P99 duration for a full post Fanout has been reduced by over 96%. Jobs that used to take 5-10 minutes for large accounts now take <10 seconds. Knowing where it’s okay to be imperfect lets you trade consistency for other desirable aspects of your systems and scale ever higher. There are plenty of other places for improvement in our Timelines architecture, but this step was a big one towards improving throughput and scalability of Bluesky’s Timelines. If you’re interested in these sorts of problems and would like to help us build the core data services that power Bluesky, check out this job listing. If you’re interested in other open positions at Bluesky, you can find them here.

6 months ago • 52 votes

Jetstream: Shrinking the AT Proto Firehose by >99%

Bluesky recently saw a massive spike in activity in response to Brazil’s ban of Twitter. As a result, the AT Proto event firehose provided by Bluesky’s Relay at bsky.network has increased in volume by a huge amount. The average event rate during this surge increased by ~1,300%. Before this new surge in activity, the firehose would produce around 24 GB/day of traffic. After the surge, this volume jumped to over 232 GB/day! Keeping up with the full, verified firehose quickly became less practical on cheap cloud infrastructure with metered bandwidth. To help reduce the burden of operating bots, feed generators, labelers, and other non-verifying AT Proto services, I built Jetstream as an alternative, lightweight, filterable JSON firehose for AT Proto. How the Firehose Works The AT Proto firehose is a mechanism used to keep verified, fully synced copies of the repos of all users. Since repos are represented as Merkle Search Trees, each firehose event contains an update to the user’s MST which includes all the changed blocks (nodes in the path from the root to the modified leaf). The root of this path is signed by the repo owner, and a consumer can keep their copy of the repo’s MST up-to-date by applying the diff in the event. For a more in-depth explanation of how Merkle Trees are constructed, check out this explainer. Practically, this means that for every small JSON record added to a repo, we also send along some number of MST blocks (which are content-addressed hashes and thus very information-dense) that are mostly useful for consumers attempting to keep a fully synced, verified copy of the repo. You can think of this as the difference between cloning a git repo v.s. just grabbing the latest version of the files without the .git folder. In this case, the firehose effectively streams the diffs for the repository with commits, signatures, and metadata, which is inherently heavier than a point-in-time checkout of the repo. Because firehose events with repo updates are signed by the repo owner, they allow a consumer to process events from any operator without having to trust the messenger. This is the “Authenticated” part of the Authenticated Transfer (AT) Protocol and is crucial to the correct functioning of the network. That being said, of the hundreds of consumers of Bluesky’s production Relay, >90% of them are building feeds, bots, and other tools that don’t keep full copies of the entire network and don’t verify MST operations at all. For these consumers, all they actually process is the JSON records created, updated, and deleted in each event. If consumers already trust the provider to do validation on their end, they could get by with a much more lightweight data stream. How Jetstream Works Jetstream is a streaming service that consumes an AT Proto com.atproto.sync.subscribeRepos stream and converts it into lightweight, friendly JSON. If you want to try it out yourself, you can connect to my public Jetstream instance and view all posts on Bluesky in realtime: $ websocat "wss://jetstream2.us-east.bsky.network/subscribe?wantedCollections=app.bsky.feed.post" Note: the above instance is operated by Bluesky PBC and is free to use, more instances are listed in the official repo Readme Jetstream converts the CBOR-encoded MST blocks produced by the AT Proto firehose and translates them into JSON objects that are easier to interface with using standard tooling available in programming languages. Since Repo MSTs only contain records in their leaf nodes, this means Jetstream can drop all of the blocks in an event except for those of the leaf nodes, typically leaving only one block per event. In reality, this means that Jetstream’s JSON firehose is nearly 1/10 the size of the full protocol firehose for the same events, but lacks the verifiability and signatures included in the protocol-level firehose. Jetstream events end up looking something like: { "did": "did:plc:eygmaihciaxprqvxpfvl6flk", "time_us": 1725911162329308, "type": "com", "commit": { "rev": "3l3qo2vutsw2b", "type": "c", "collection": "app.bsky.feed.like", "rkey": "3l3qo2vuowo2b", "record": { "$type": "app.bsky.feed.like", "createdAt": "2024-09-09T19:46:02.102Z", "subject": { "cid": "bafyreidc6sydkkbchcyg62v77wbhzvb2mvytlmsychqgwf2xojjtirmzj4", "uri": "at://did:plc:wa7b35aakoll7hugkrjtf3xf/app.bsky.feed.post/3l3pte3p2e325" } }, "cid": "bafyreidwaivazkwu67xztlmuobx35hs2lnfh3kolmgfmucldvhd3sgzcqi" } } Each event lets you know the DID of the repo it applies to, when it was seen by Jetstream (a time-based cursor), and up to one updated repo record as serialized JSON. Check out this 10 second CPU profile of Jetstream serving 200k evt/sec to a local consumer: By dropping the MST and verification overhead by consuming from relay we trust, we’ve reduced the size of a firehose of all events on the network from 232 GB/day to ~41GB/day, but we can do better. Jetstream and zstd I recently read a great engineering blog from Discord about their use of zstd to compress websocket traffic to/from their Gateway service and client applications. Since Jetstream emits marshalled JSON through the websocket for developer-friendliness, I figured it might be a neat idea to see if we could get further bandwidth reduction by employing zstd to compress events we send to consumers. zstd has two basic operating modes, “simple” mode and “streaming” mode. Streaming Compression At first glance, streaming mode seems like it’d be a great fit. We’ve got a websocket connection with a consumer and streaming mode allows the compression to get more efficient over the lifetime of the connection. I went and implemented a streaming compression version of Jetstream where a consumer can request compression when connecting and will get zstd compressed JSON sent as binary messages over the socket instead of plaintext. Unfortunately, this had a massive impact on Jetstream’s server-side CPU utilization. We were effectively compressing every message once per consumer as part of their streaming session. This was not a scalable approach to offering compression on Jetstream. Additionally, Jetstream stores a buffer of the past 24 hours (configurable) of events on disk in PebbleDB to allow consumers to replay events before getting transitioned into live-tailing mode. Jetstream stores serialized JSON in the DB, so playback is just shuffling the bytes into the websocket without having to round-trip the data into a Go struct. When we layer in streaming compression, playback becomes significantly more expensive because we have to compress outgoing events on-the-fly for a consumer that’s catching up. In real numbers, this increased CPU usage of Jetstream by 23% while lowering the throughput of playback from ~200k evt/sec to ~28k evt/sec for a single local consumer. When in streaming mode, we can’t leverage the bytes we compress for one consumer and reuse them for another consumer because zstd’s streaming context window may not be in sync between the two consumers. They haven’t received exactly the same data in the session so the clients on the other end don’t have their state machines in the same state. Since streaming mode’s primary advantage is giving us eventually better efficiency as the encoder learns about the data, what if we just taught the encoder about the data at the start and compress each message statelessly? Dictionary Mode zstd offers a mechanism for initializing an encoder/decoder with pre-optimized settings by providing a dictionary trained on a sample of the data you’ll be encoding/decoding. Using this dictionary, zstd essentially uses it’s smallest encoded representations for the most frequently seen patterns in the sample data. In our case, where we’re compressing serialized JSON with a common event shape and lots of common property names, training a dictionary on a large number of real events should allow us to represent the common elements among messages in the smallest number of bytes. For take two of Jetstream with zstd, let’s to use a single encoder for the whole service that utilizes a custom dictionary trained on 100,000 real events. We can use this encoder to compress every event as we see it, before persisting and emitting it to consumers. Now we end up with two copies of every event, one that’s just serialized JSON, and one that’s statelessly compressed to zstd using our dictionary. Any consumers that want compression can have a copy of the dictionary on their end to initialize a decoder, then when we broadcast the shared compressed event, all consumers can read it without any state or context issues. This requires the consumers and server to have a pre-shared dictionary, which is a major drawback of this implementation but good enough for our purposes. That leaves the problem of event playback for compression-enabled clients. An easy solution here is to just store the compressed events as well! Since we’re only sticking the JSON records into our PebbleDB, the actual size of the 24 hour playback window is <8GB with sstable compression. If we store a copy of the JSON serialized event and a copy of the zstd compressed event, this will, at most, double our storage requirements. Then during playback, if the consumer requests compression, we can just shuffle bytes out of the compressed version of the DB into their socket instead of having to move it through a zstd encoder. Savings Running with a custom dictionary, I was able to get the average Jetstream event down from 482 bytes to just 211 bytes (~0.44 compression ratio). Jetstream allows us to live tail all posts on Bluesky as they’re posted for as little as ~850 MB/day, and we could keep up with all events moving through the firehose during the Brazil Twitter Exodus weekend for 18GB/day (down from 232GB/day). With this scheme, Jetstream is required to compress each event only once before persisting it to disk and emitting it to connected consumers. The CPU impact of these changes is significant in proportion to Jetstream’s incredibly light load but it’s a flat cost we pay once no matter how many consumers we have. (CPU profile from a 30 second pprof sample with 12 consumers live-tailing Jetstream) Additionally, with Jetstream’s shared buffer broadcast architecture, we keep memory allocations incredibly low and the cost per consumer on CPU and RAM is trivial. In the allocation profile below, more than 80% of the allocations are used to consume the full protocol firehose. The total resident memory of Jetstream sits below 16MB, 25% of which is actually consumed by the new zstd dictionary. To bring it all home, here’s a screenshot from the dashboard of my public Jetstream instance serving 12 consumers all with various filters and compression settings, running on a $5/mo OVH VPS. At our new baseline firehose activity, a consumer of the protocol-level firehose would require downloading ~3.16TB/mo to keep up. A Jetstream consumer getting all created, updated, and deleted records without compression enabled would require downloading ~400GB/mo to keep up. A Jetstream consumer that only cares about posts and has zstd compression enabled can get by on as little as ~25.5GB/mo, <99% of the full weight firehose. Feel free to join the conversation about Jetstream and zstd on Bluesky.

11 months ago • 31 votes

How HLS Works

Over the past few weeks, I’ve been building out server-side short video support for Bluesky. The major aim of this feature is to support short (90 second max) video streaming at a quality that doesn’t cost an arm and a leg for us to provide for free. In order to stay within these constraints, we’re considering making use of a video CDN that can bear the brunt of the bandwidth required to support Video-on-Demand streaming. While the CDN is a pretty fully-featured product, we want to avoid too much vendor lock-in and provide some enhancements to our streaming platform that requires extending their offering and getting creative with video streaming protocols. Some of the things we’d like to be able to do that don’t work out-of-the-box are: Track view counts, viewer sessions, and duration viewed to provide better feedback for video performance. Provide dynamic closed-caption support with the flexibility to automate them in the future. Store a transcoded version of source files somewhere durable to provide a “source of truth” for videos when needed. Append a “trailer” to the end of video streams for some branding in a TikTok-esque 3-second snippet. In this post I’ll be focusing on the HLS-related features above, namely view/duration accounting, closed captions, and trailers. HLS is Just a Bunch of Text files HTTP Live Streaming (HLS) is a standard established by Apple in 2009 that allows for adaptive-bitrate live and Video-on-Demand (VOD) streaming. For the purposes of this blog post, I’ll restrict my explanations to how HLS VOD streaming works. A player that implements the HLS protocol is capable of dynamically adjusting the quality of a streamed video based on network conditions. Additionally, a server that implements the HLS protocol should provide one or more variants of a media stream which accommodate varying network qualities to allow for graceful degradation of stream quality without stopping playback. HLS implements this by producing a series of plaintext (.m3u8) “playlist” files that tell the player what bitrates and resolutions the server provides so that the player can decide which variant it should stream. HLS differentiates between two kinds of “playlist” files: Master Playlists, and Media Playlists. Master Playlists A Master Playlist is the first file fetched by your video player. It contains a series of variants which point to child Media Playlists. It also describes the approximate bitrate of the variant sources and the codecs and resolutions used by those sources. $ curl https://my.video.host.com/video_15/playlist.m3u8 #EXTM3U #EXT-X-VERSION:3 #EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=688540,CODECS="avc1.64001e,mp4a.40.2",RESOLUTION=640x360 360p/video.m3u8 #EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=1921217,CODECS="avc1.64001f,mp4a.40.2",RESOLUTION=1280x720 720p/video.m3u8 In the above file, the key things to notice are the RESOLUTION parameters and the {res}/video.m3u8 links. Your media player will generally start with the lowest resolution version before jumping up to higher resolutions once the network speed between you and the server is dialed in. The links in this file are pointers to Media Playlists, generally as relative paths from the Master Playlist such that, if we wanted to grab the 720p Media Playlist, we’d navigate to: https://my.video.host.com/video_15/720p/video.m3u8. A Master Playlist can also contain multi-track audio directives and directives for closed-captions but for now let’s move onto the Media Playlist. Media Playlists A Media Playlist is yet another plaintext file that provides your video player with two key bits of data: a list of media Segments (encoded as .ts video files) and headers for each Segment that tell the player the runtime of the media. $ curl https://my.video.host.com/video_15/720p/video.m3u8 #EXTM3U #EXT-X-VERSION:3 #EXT-X-PLAYLIST-TYPE:VOD #EXT-X-MEDIA-SEQUENCE:0 #EXT-X-TARGETDURATION:4 #EXTINF:4.000, video0.ts #EXTINF:4.000, video1.ts #EXTINF:4.000, video2.ts #EXTINF:4.000, video3.ts #EXTINF:4.000, video4.ts #EXTINF:2.800, video5.ts This Media Playlist describes a video that’s 22.8 seconds long (5 x 4-second Segments + 1 x 2.8-second Segment). The playlist describes a VOD piece of media, meaning we know this playlist contains the entirety of the media the player needs. The TARGETDURATION tells us the maximum length of each Segment so the player knows how many Segments to buffer ahead of time. During live streaming, that also lets the player know how frequently to refresh the playlist file to discover new Segments. Finally the EXTINF headers for each Segment indicate the duration of the following .ts Segment file and the relative paths of the video#.ts tell the player where to load the actual media files from. Where’s the Actual Media? At this point, the video player has loaded two .m3u8 playlist files and got lots of metadata about how to play the video but it hasn’t actually loaded any media files. The .ts files referenced in the Media Playlist are where the real media is, so if we wanted to control the playlists but let the CDN handle serving actual media, we can just redirect those video#.ts requests to our CDN. .ts files are Transport Stream MPEG-2 encoded short media files that can contain video or audio and video. Tracking Views To track views of our HLS streams, we can leverage the fact that every video player must first load the Master Playlist. When a user requests the Master Playlist, we can modify the results dynamically to provide a SessionID to each response and allow us to track the user session without cookies or headers: #EXTM3U #EXT-X-VERSION:3 #EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=688540,CODECS="avc1.64001e,mp4a.40.2",RESOLUTION=640x360 360p/video.m3u8?session_id=12345 #EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=1921217,CODECS="avc1.64001f,mp4a.40.2",RESOLUTION=1280x720 720p/video.m3u8?session_id=12345 Now when their video player fetches the Media Playlists, it’ll include a query-string that we can use to identify the streaming session, ensuring we don’t double-count views on the video and can track which Segments of video were loaded in the session. #EXTM3U #EXT-X-VERSION:3 #EXT-X-PLAYLIST-TYPE:VOD #EXT-X-MEDIA-SEQUENCE:0 #EXT-X-TARGETDURATION:4 #EXTINF:4.000, video0.ts?session_id=12345&duration=4 #EXTINF:4.000, video1.ts?session_id=12345&duration=4 #EXTINF:4.000, video2.ts?session_id=12345&duration=4 #EXTINF:4.000, video3.ts?session_id=12345&duration=4 #EXTINF:4.000, video4.ts?session_id=12345&duration=4 #EXTINF:2.800, video5.ts?session_id=12345&duration=2.8 Finally when the video player fetches the media Segment files, we can measure the Segment view before we redirect to our CDN with a 302, allowing us to know the amount of video-seconds loaded in the session and which Segments were loaded. This method has limitations, namely that a media player loading a segment doesn’t necessarily mean it showed that segment to the viewer, but it’s the best we can do without an instrumented media player. Adding Subtitles Subtitles are included in the Master Playlist as a variant and then are referenced in each of the video variants to let the player know where to load subs from. #EXTM3U #EXT-X-VERSION:3 #EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="en_subtitle",DEFAULT=NO,AUTOSELECT=yes,LANGUAGE="en",FORCED="no",CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog",URI="subtitles/en.m3u8" #EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=688540,CODECS="avc1.64001e,mp4a.40.2",RESOLUTION=640x360,SUBTITLES="subs" 360p/video.m3u8 #EXT-X-STREAM-INF:PROGRAM-ID=0,BANDWIDTH=1921217,CODECS="avc1.64001f,mp4a.40.2",RESOLUTION=1280x720,SUBTITLES="subs" 720p/video.m3u8 Just like with the video Media Playlists, we need a Media Playlist file for the subtitle track as well so that the player knows where to load the source files from and what duration of the stream they cover. $ curl https://my.video.host.com/video_15/subtitles/en.m3u8 #EXTM3U #EXT-X-VERSION:3 #EXT-X-MEDIA-SEQUENCE:0 #EXT-X-TARGETDURATION:22.8 #EXTINF:22.800, en.vtt In this case, since we’re only serving a short video, we can just provide a single Segment that points at a WebVTT subtitle file encompassing the entire duration of the video. If you crack open the en.vtt file you’ll see something like: $ curl https://my.video.host.com/video_15/subtitles/en.vtt WEBVTT 00:00.000 --> 00:02.000 According to all known laws of aviation, 00:02.000 --> 00:04.000 there is no way a bee should be able to fly. 00:04.000 --> 00:06.000 Its wings are too small to get its fat little body off the ground. ... The media player is capable of reading WebVTT and presenting the subtitles at the right time to the viewer. For longer videos you may want to break up your VTT files into more Segments and update the subtitle Media Playlist accordingly. To provide multiple languages and versions of subtitles, just add more EXT-X-MEDIA:TYPE=SUBTITLES lines to the Master Playlist and tweak the NAME, LANGUAGE (if different), and URI of the additional subtitle variant definitions. #EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="en_subtitle",DEFAULT=NO,AUTOSELECT=yes,LANGUAGE="en",FORCED="no",CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog",URI="subtitles/en.m3u8" #EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="fr_subtitle",DEFAULT=NO,AUTOSELECT=yes,LANGUAGE="fr",FORCED="no",CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog",URI="subtitles/fr.m3u8" #EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="ja_subtitle",DEFAULT=NO,AUTOSELECT=yes,LANGUAGE="ja",FORCED="no",CHARACTERISTICS="public.accessibility.transcribes-spoken-dialog",URI="subtitles/ja.m3u8" Appending a Trailer For branding purposes (and in other applications, for advertising purposes), it can be helpful to insert Segments of video into a playlist to change the content of the video without requiring the content be appended to and re-encoded with the source file. Thankfully, HLS allows us to easily insert Segments into the Media Playlist using this one neat trick: #EXTM3U #EXT-X-VERSION:3 #EXT-X-PLAYLIST-TYPE:VOD #EXT-X-MEDIA-SEQUENCE:0 #EXT-X-TARGETDURATION:4 #EXTINF:4.000, video0.ts #EXTINF:4.000, video1.ts #EXTINF:4.000, video2.ts #EXTINF:4.000, video3.ts #EXTINF:4.000, video4.ts #EXTINF:2.800, video5.ts #EXT-X-DISCONTINUITY #EXTINF:3.337, trailer0.ts #EXTINF:1.201, trailer1.ts #EXTINF:1.301, trailer2.ts #EXT-X-ENDLIST In this Media Playlist we use HLS’s EXT-X-DISCONTINUITY header to let the video player know that the following Segments may be in a different bitrate, resolution, and aspect-ratio than the preceding content. Once we’ve provided the discontinuity header, we can add more Segments just like normal that point at a different media source broken up into .ts files. Remember, HLS allows us to use relative or absolute paths here, so we could provide a full URL for these trailer#.ts files, or virtually route them so they can retain the path context of the currently viewed video. Note that we don’t need to provide the discontinuity header here, and we could also name the trailer files something like video{6-8}.ts if we wanted to, but for clarity and proper player behavior, it’s best to use the discontinuity header if your trailer content doesn’t match the bitrate and resolution of the other video Segments. When the video player goes to play this media, it will continue from video5.ts to trailer0.ts without missing a beat, making it appear as if the trailer is part of the original video. This approach allows us to dynamically change the contents of the trailer for all videos, heavily cache the trailer .ts Segment files for performance, and avoid having to encode the trailer onto the end of every video source file. Conclusion At the end of the day, we’ve now got a video streaming service capable of tracking views and watch session durations, dynamic closed caption support, and branded trailers to help grow the platform. HLS is not a terribly complex protocol. The vast majority of it is human-readable plaintext files and is easy to inspect in the wild to how it’s used in production. When I started this project, I knew next to nothing about the protocol but was able to download some .m3u8 files and get digging to discover how the protocol worked, then build my own implementation of a HLS server to accommodate the video streaming needs of Bluesky. To learn more about HLS, you can check out the official RFC here which describes all the features discussed above and more. I hope this post encourages you to go explore other protocols you use every day by poking at them in the wild, downloading the files your browser interprets for you, and figuring out how simple some of these apparently “complex” systems are. If you’re interested in solving problems like these, take a look at our open Job Recs. If you have any questions about HLS, Bluesky, or other distributed, @scale social media infrastructure, you can find me on Bluesky here and you can discuss this post here.

a year ago • 28 votes

An entire Social Network in 1.6GB (GraphD Part 2)

In Part 1 of this series, we tried to answer the question “who do you follow who also follows user B” in Bluesky, a social network with millions of users and hundreds of millions of follow relationships. At the conclusion of the post, we’d developed an in-memory graph store for the network that uses HashMaps and HashSets to keep track of the followers of every user and the set of users they follow, allowing bidirectional lookups, intersections, unions, and other set operations for combining social graph data. I received some helpful feedback after that post where several people pointed me towards Roaring Bitmaps as a potential improvement on my implementation. They were right, Roaring Bitmaps would be an excellent fit for my Graph service, GraphD, and could also provide me with a much needed way to quickly persist and load the Graph data to and from disk on startup, hopefully reducing the startup time of the service. What are Bitmaps? If you just want to dive into the Roaring Bitmap spec, you can read the paper here, but it might be easier to first talk about bitmaps in general. You can think of a bitmap as a vector of one-bit values (like booleans) that let you encode a set of integer values. For instance, say we have 10,000 users on our website and want to keep track of which users have validated their email addresses. We could do this by creating a list of the uint32 user IDs of each user, in which case if all 10,000 users have validated their emails we’re storing 10k * 32 bits = 40KB. Or, we could create a vector of single-bit values that’s 10,000 bits long (10k / 8 = 1.25KB), then if a user has confirmed their email we can set the value at the index of their UID to 1. If we want to create a list of all the UIDs of validated accounts, we can walk the vector and record the index of each non-zero bit. If we want to check if user n has validated their email, we can do a O(1) lookup in the bitmap by loading the bit at index n and checking if it’s set. When Bitmaps get Big and Sparse Now when talking about our social network problem, we’re dealing with a few more than 10,000 UIDs. We need to keep track of 5.5M users and whether or not the user follows or is followed by any of the other 5.5M users in the network. To keep a bitmap of “People who follow User A”, we’re going to need 5.5M bits which would require (5.5M / 8) ~687KB of space. If we wanted to keep bitmaps of “People who follow User A” and “People who User A follows”, we’d need ~1.37MB of space per user using a simple bitmap, meaning we’d need 5,500,000 * 1.37MB = ~7.5 Terabytes of space! Clearly this isn’t an improvement of our strategy from Part 1, so how can we make this more efficient? One strategy for compressing the bitmap is to take consecutive runs of 0’s or 1’s (i.e. 00001110000001) in the bitmap and turn them into a number. For instance if we had an account that followed only the last 100 accounts in our social network, the first 5,499,900 indices in our bitmap would be 0’s and so we could represent the bitmap by saying: 5,499,900 0's, then 100 1's which you notice I’ve written here in a lot fewer than 687KB and a computer could encode using two uint32 values plus two bits (one indicator bit for the state of each run) for a total of 66 bits. This strategy is called Run Length Encoding (RLE) and works pretty well but has a few drawbacks: mainly if your data is randomly and heavily populated, you may not have many consecutive runs (imagine a bitset where every odd bit is set and every even bit is unset). Also lookups and evaluation of the bitset requires walking the whole bitset to figure out where the index you care about lives in the compressed format. Thankfully there’s a more clever way to compress bitmaps using a strategy called Roaring Bitmaps. A brief description of the storage strategy for Roaring Bitmaps from the official paper is as follows: We partition the range of 32-bit indexes ([0, n)) into chunks of 2^16 integers sharing the same 16 most significant digits. We use specialized containers to store their 16 least significant bits. When a chunk contains no more than 4096 integers, we use a sorted array of packed 16-bit integers. When there are more than 4096 integers, we use a 2^16-bit bitmap. Thus, we have two types of containers: an array container for sparse chunks and a bitmap container for dense chunks. The 4096 threshold insures that at the level of the containers, each integer uses no more than 16 bits. These bitmaps are designed to support both densely and sparsely distributed data and can provide high performance binary set operations (and/or/etc.) operating on the containers within two or more bitsets in parallel. For more info on how Roaring Bitmaps work and some neat diagrams, check out this excellent primer on Roaring Bitmaps by Vikram Oberoi. So, how does this help us build a better graph? GraphD, Revisited with Roaring Bitmaps Let’s get back to our GraphD Service, this time in Go instead of Rust. For each user we can keep track of a struct with two bitmaps: type FollowMap struct { followingBM *roaring.Bitmap followingLk sync.RWMutex followersBM *roaring.Bitmap followersLk sync.RWMutex } Our FollowMap gives us a Roaring Bitmap for both the set of users we follow, and the set of users who follow us. Adding a Follow to the graph just requires we set the right bits in both user’s respective maps: // Note I've removed locking code and error checks for brevity func (g *Graph) addFollow(actorUID, targetUID uint32) { actorMap, _ := g.g.Load(actorUID) actorMap.followingBM.Add(targetUID) targetMap, _ := g.g.Load(targetUID) targetMap.followersBM.Add(actorUID) } Even better if we want to compute the intersections of two sets (i.e. the people User A follows who also follow User B) we can do so in parallel: // Note I've removed locking code and error checks for brevity func (g *Graph) IntersectFollowingAndFollowers(actorUID, targetUID uint32) ([]uint32, error) { actorMap, ok := g.g.Load(actorUID) targetMap, ok := g.g.Load(targetUID) intersectMap := roaring.ParAnd(4, actorMap.followingBM, targetMap.followersBM) return intersectMap.ToArray(), nil } Storing the entire graph as Roaring Bitmaps in-memory costs us around 6.5GB of RAM and allows us to perform set intersections between moderately large sets (with hundreds of thousands of set bits) in under 500 microseconds while serving over 70k req/sec! And the best part of all? We can use Roaring’s serialization format to write these bitmaps to disk or transfer them over the network. Storing 164M Follows in 1.6GB In the original version of GraphD, on startup the service would read a CSV file with an adjacency list of the (ActorDID, TargetDID) pairs of all follows on the network. This required creating a CSV dump of the follows table, pausing writes to the follows table, then bringing up the service and waiting 5 minutes for it to read the CSV file, intern the DIDs as uint32 UIDs, and construct the in-memory graph. This process is slow, pauses writes for 5 minutes, and every time our service restarts we have to do it all over again! With Roaring Bitmaps, we’re now given an easy way to effectively serialize a version of the in-memory graph that is many times smaller than the adjacency list CSV and many times faster to load. We can serialize the entire graph into a SQLite DB on the local machine where each row in a table contains: (uid, DID, followers_bitmap, following_bitmap) Loading the entire graph from this SQLite DB can be done in around ~20 seconds: // Note I've removed locking code and error checks for brevity rows, err := g.db.Query(`SELECT uid, did, following, followers FROM actors;`) for rows.Next() { var uid uint32 var did string var followingBytes []byte var followersBytes []byte rows.Scan(&uid, &did, &followingBytes, &followersBytes) followingBM := roaring.NewBitmap() followingBM.FromBuffer(followingBytes) followersBM := roaring.NewBitmap() followersBM.FromBuffer(followersBytes) followMap := &FollowMap{ followingBM: followingBM, followersBM: followersBM, followingLk: sync.RWMutex{}, followersLk: sync.RWMutex{}, } g.g.Store(uid, followMap) g.setUID(did, uid) g.setDID(uid, did) } While the service is running, we can also keep track of the UIDs of actors who have added or removed a follow since the last time we saved the DB, allowing us to periodically flush changes to the on-disk SQLite only for bitmaps that have updated. Syncing our data every 5 seconds while tailing the production firehose takes 2ms and writes an average of only ~5MB to disk per flush. The crazy part of this is, the on-disk representation of our entire follow network is only ~1.6GB! Because we’re making use of Roaring’s compressed serialized format, we can turn the ~6.5GB of in-memory maps into 1.6GB of on-disk data. Our largest bitmap, the followers of the bsky.app account with over 876k members, becomes ~500KB as a blob stored in SQLite. So, to wrap up our exploration of Roaring Bitmaps for first-degree graph databases, we saw: A ~20% reduction in resident memory size compared to HashSets and HashMaps A ~84% reduction in the on-disk size of the graph compared to an adjacency list A ~93% reduction in startup time compared to loading from an adjacency list A ~66% increase in throughput of worst-case requests under load A ~59% reduction in p99 latency of worst-case requests under low My next iteration on this problem will likely be to make use of DGraph’s in-memory Serialized Roaring Bitmap library that allows you to operate on fully-compressed bitmaps so there’s no need to serialize and deserialize them when reading from or writing to disk. It also probably results in significant memory savings as well! If you’re interested in solving problems like these, take a look at our open Backend Developer Job Rec. You can find me on Bluesky here, you can chat about this post here.

a year ago • 31 votes

More in programming

strongly typed?

What does it mean when someone writes that a programming language is “strongly typed”? I’ve known for many years that “strongly typed” is a poorly-defined term. Recently I was prompted on Lobsters to explain why it’s hard to understand what someone means when they use the phrase. I came up with more than five meanings! how strong? The various meanings of “strongly typed” are not clearly yes-or-no. Some developers like to argue that these kinds of integrity checks must be completely perfect or else they are entirely worthless. Charitably (it took me a while to think of a polite way to phrase this), that betrays a lack of engineering maturity. Software engineers, like any engineers, have to create working systems from imperfect materials. To do so, we must understand what guarantees we can rely on, where our mistakes can be caught early, where we need to establish processes to catch mistakes, how we can control the consequences of our mistakes, and how to remediate when somethng breaks because of a mistake that wasn’t caught. strong how? So, what are the ways that a programming language can be strongly or weakly typed? In what ways are real programming languages “mid”? Statically typed as opposed to dynamically typed? Many languages have a mixture of the two, such as run time polymorphism in OO languages (e.g. Java), or gradual type systems for dynamic languages (e.g. TypeScript). Sound static type system? It’s common for static type systems to be deliberately unsound, such as covariant subtyping in arrays or functions (Java, again). Gradual type systems migh have gaping holes for usability reasons (TypeScript, again). And some type systems might be unsound due to bugs. (There are a few of these in Rust.) Unsoundness isn’t a disaster, if a programmer won’t cause it without being aware of the risk. For example: in Lean you can write “sorry” as a kind of “to do” annotation that deliberately breaks soundness; and Idris 2 has type-in-type so it accepts Girard’s paradox. Type safe at run time? Most languages have facilities for deliberately bypassing type safety, with an “unsafe” library module or “unsafe” language features, or things that are harder to spot. It can be more or less difficult to break type safety in ways that the programmer or language designer did not intend. JavaScript and Lua are very safe, treating type safety failures as security vulnerabilities. Java and Rust have controlled unsafety. In C everything is unsafe. Fewer weird implicit coercions? There isn’t a total order here: for instance, C has implicit bool/int coercions, Rust does not; Rust has implicit deref, C does not. There’s a huge range in how much coercions are a convenience or a source of bugs. For example, the PHP and JavaScript == operators are made entirely of WAT, but at least you can use === instead. How fancy is the type system? To what degree can you model properties of your program as types? Is it convenient to parse, not validate? Is the Curry-Howard correspondance something you can put into practice? Or is it only capable of describing the physical layout of data? There are probably other meanings, e.g. I have seen “strongly typed” used to mean that runtime representations are abstract (you can’t see the underlying bytes); or in the past it sometimes meant a language with a heavy type annotation burden (as a mischaracterization of static type checking). how to type So, when you write (with your keyboard) the phrase “strongly typed”, delete it, and come up with a more precise description of what you really mean. The desiderata above are partly overlapping, sometimes partly orthogonal. Some of them you might care about, some of them not. But please try to communicate where you draw the line and how fuzzy your line is.

yesterday • 8 votes

Logical Duals in Software Engineering

(Last week's newsletter took too long and I'm way behind on Logic for Programmers revisions so short one this time.1) In classical logic, two operators F/G are duals if F(x) = !G(!x). Three examples: x || y is the same as !(!x && !y). <>P ("P is possibly true") is the same as ![]!P ("not P isn't definitely true"). some x in set: P(x) is the same as !(all x in set: !P(x)). (1) is just a version of De Morgan's Law, which we regularly use to simplify boolean expressions. (2) is important in modal logic but has niche applications in software engineering, mostly in how it powers various formal methods.2 The real interesting one is (3), the "quantifier duals". We use lots of software tools to either find a value satisfying P or check that all values satisfy P. And by duality, any tool that does one can do the other, by seeing if it fails to find/check !P. Some examples in the wild: Z3 is used to solve mathematical constraints, like "find x, where f(x) >= 0. If I want to prove a property like "f is always positive", I ask z3 to solve "find x, where !(f(x) >= 0), and see if that is unsatisfiable. This use case powers a LOT of theorem provers and formal verification tooling. Property testing checks that all inputs to a code block satisfy a property. I've used it to generate complex inputs with certain properties by checking that all inputs don't satisfy the property and reading out the test failure. Model checkers check that all behaviors of a specification satisfy a property, so we can find a behavior that reaches a goal state G by checking that all states are !G. Here's TLA+ solving a puzzle this way.3 Planners find behaviors that reach a goal state, so we can check if all behaviors satisfy a property P by asking it to reach goal state !P. The problem "find the shortest traveling salesman route" can be broken into some route: distance(route) = n and all route: !(distance(route) < n). Then a route finder can find the first, and then convert the second into a some and fail to find it, proving n is optimal. Even cooler to me is when a tool does both finding and checking, but gives them different "meanings". In SQL, some x: P(x) is true if we can query for P(x) and get a nonempty response, while all x: P(x) is true if all records satisfy the P(x) constraint. Most SQL databases allow for complex queries but not complex constraints! You got UNIQUE, NOT NULL, REFERENCES, which are fixed predicates, and CHECK, which is one-record only.4 Oh, and you got database triggers, which can run arbitrary queries and throw exceptions. So if you really need to enforce a complex constraint P(x, y, z), you put in a database trigger that queries some x, y, z: !P(x, y, z) and throws an exception if it finds any results. That all works because of quantifier duality! See here for an example of this in practice. Duals more broadly "Dual" doesn't have a strict meaning in math, it's more of a vibe thing where all of the "duals" are kinda similar in meaning but don't strictly follow all of the same rules. Usually things X and Y are duals if there is some transform F where X = F(Y) and Y = F(X), but not always. Maybe the category theorists have a formal definition that covers all of the different uses. Usually duals switch properties of things, too: an example showing some x: P(x) becomes a counterexample of all x: !P(x). Under this definition, I think the dual of a list l could be reverse(l). The first element of l becomes the last element of reverse(l), the last becomes the first, etc. A more interesting case is the dual of a K -> set(V) map is the V -> set(K) map. IE the dual of lived_in_city = {alice: {paris}, bob: {detroit}, charlie: {detroit, paris}} is city_lived_in_by = {paris: {alice, charlie}, detroit: {bob, charlie}}. This preserves the property that x in map[y] <=> y in dual[x]. And after writing this I just realized this is partial retread of a newsletter I wrote a couple months ago. But only a partial retread! ↩ Specifically "linear temporal logics" are modal logics, so "eventually P ("P is true in at least one state of each behavior") is the same as saying !always !P ("not P isn't true in all states of all behaviors"). This is the basis of liveness checking. ↩ I don't know for sure, but my best guess is that Antithesis does something similar when their fuzzer beats videogames. They're doing fuzzing, not model checking, but they have the same purpose check that complex state spaces don't have bugs. Making the bug "we can't reach the end screen" can make a fuzzer output a complete end-to-end run of the game. Obvs a lot more complicated than that but that's the general idea at least. ↩ For CHECK to constraint multiple records you would need to use a subquery. Core SQL does not support subqueries in check. It is an optional database "feature outside of core SQL" (F671), which Postgres does not support. ↩

2 days ago • 8 votes

Omarchy 2.0

Omarchy 2.0 was released on Linux's 34th birthday as a gift to perhaps the greatest open-source project the world has ever known. Not only does Linux run 95% of all servers on the web, billions of devices as an embedded OS, but it also turns out to be an incredible desktop environment! It's crazy that it took me more than thirty years to realize this, but while I spent time in Apple's walled garden, the free software alternative simply grew better, stronger, and faster. The Linux of 2025 is not the Linux of the 90s or the 00s or even the 10s. It's shockingly more polished, capable, and beautiful. It's been an absolute honor to celebrate Linux with the making of Omarchy, the new Linux distribution that I've spent the last few months building on top of Arch and Hyprland. What began as a post-install script has turned into a full-blown ISO, dedicated package repository, and flourishing community of thousands of enthusiasts all collaborating on making it better. It's been improving rapidly with over twenty releases since the premiere in late June, but this Version 2.0 update is the biggest one yet. If you've been curious about giving Linux a try, you're not afraid of an operating system that asks you to level up and learn a little, and you want to see what a totally different computing experience can look and feel like, I invite you to give it a go. Here's a full tour of Omarchy 2.0.

3 days ago • 8 votes

Dissecting the Apple M1 GPU, the end

In 2020, Apple released the M1 with a custom GPU. We got to work reverse-engineering the hardware and porting Linux. Today, you can run Linux on a range of M1 and M2 Macs, with almost all hardware working: wireless, audio, and full graphics acceleration. Our story begins in December 2020, when Hector Martin kicked off Asahi Linux. I was working for Collabora working on Panfrost, the open source Mesa3D driver for Arm Mali GPUs. Hector put out a public call for guidance from upstream open source maintainers, and I bit. I just intended to give some quick pointers. Instead, I bought myself a Christmas present and got to work. In between my university coursework and Collabora work, I poked at the shader instruction set. One thing led to another. Within a few weeks, I drew a triangle. In 3D graphics, once you can draw a triangle, you can do anything. Pretty soon, I started work on a shader compiler. After my final exams that semester, I took a few days off from Collabora to bring up an OpenGL driver capable of spinning gears with my new compiler. Over the next year, I kept reverse-engineering and improving the driver until it could run 3D games on macOS. Meanwhile, Asahi Lina wrote a kernel driver for the Apple GPU. My userspace OpenGL driver ran on macOS, leaving her kernel driver as the missing piece for an open source graphics stack. In December 2022, we shipped graphics acceleration in Asahi Linux. In January 2023, I started my final semester in my Computer Science program at the University of Toronto. For years I juggled my courses with my part-time job and my hobby driver. I faced the same question as my peers: what will I do after graduation? Maybe Panfrost? I started reverse-engineering of the Mali Midgard GPU back in 2017, when I was still in high school. That led to an internship at Collabora in 2019 once I graduated, turning into my job throughout four years of university. During that time, Panfrost grew from a kid’s pet project based on blackbox reverse-engineering, to a professional driver engineered by a team with Arm’s backing and hardware documentation. I did what I set out to do, and the project succeeded beyond my dreams. It was time to move on. What did I want to do next? Finish what I started with the M1. Ship a great driver. Bring full, conformant OpenGL drivers to the M1. Apple’s drivers are not conformant, but we should strive for the industry standard. Bring full, conformant Vulkan to Apple platforms, disproving the myth that Vulkan isn’t suitable for Apple hardware. Bring Proton gaming to Asahi Linux. Thanks to Valve’s work for the Steam Deck, Windows games can run better on Linux than even on Windows. Why not reap those benefits on the M1? Panfrost was my challenge until we “won”. My next challenge? Gaming on Linux on M1. Once I finished my coursework, I started full-time on gaming on Linux. Within a month, we shipped OpenGL 3.1 on Asahi Linux. A few weeks later, we passed official conformance for OpenGL ES 3.1. That put us at feature parity with Panfrost. I wanted to go further. OpenGL (ES) 3.2 requires geometry shaders, a legacy feature not supported by either Arm or Apple hardware. The proprietary OpenGL drivers emulate geometry shaders with compute, but there was no open source prior art to borrow. Even though multiple Mesa drivers need geometry/tessellation emulation, nobody did the work to get there. My early progress on OpenGL was fast thanks to the mature common code in Mesa. It was time to pay it forward. Over the rest of the year, I implemented geometry/tessellation shader emulation. And also the rest of the owl. In January 2024, I passed conformance for the full OpenGL 4.6 specification, finishing up OpenGL. Vulkan wasn’t too bad, either. I polished the OpenGL driver for a few months, but once I started typing a Vulkan driver, I passed 1.3 conformance in a few weeks. What remained was wiring up the geometry/tessellation emulation to my shiny new Vulkan driver, since those are required for Direct3D. Et voilà, Proton games. Along the way, Karol Herbst passed OpenCL 3.0 conformance on the M1, running my compiler atop his “rusticl” frontend. Meanwhile, when the Vulkan 1.4 specification was published, we were ready and shipped a conformant implementation on the same day. After that, I implemented sparse texture support, unlocking Direct3D 12 via Proton. …Now what? Ship a great driver? Check. Conformant OpenGL 4.6, OpenGL ES 3.2, and OpenCL 3.0? Check. Conformant Vulkan 1.4? Check. Proton gaming? Check. That’s a wrap. We’ve succeeded beyond my dreams. The challenges I chased, I have tackled. The drivers are fully upstream in Mesa. Performance isn’t too bad. With the Vulkan on Apple myth busted, conformant Vulkan is now coming to macOS via LunarG’s KosmicKrisp project building on my work. Satisfied, I am now stepping away from the Apple ecosystem. My friends in the Asahi Linux orbit will carry the torch from here. As for me? Onto the next challenge!

3 days ago • 12 votes

Changing Careers to Software Development in Japan

TokyoDev has published a number of different guides on coming to Japan to work as a software developer. But what if you’re already employed in another industry in Japan, and are considering changing your career to software development? I interviewed four people who became developers after they moved to Japan, for their advice and personal experiences on: Why they chose development How they switched careers How they successfully found their first jobs What mistakes they made in the job hunt The most important advice they give to others Why switch to software development? A lifelong goal For Yuta Asakura, a career in software was the dream all along. “I’ve always wanted to work with computers,” he said, “but due to financial difficulties, I couldn’t pursue a degree in computer science. I had to start working early to support my single mother. As the eldest child, I focused on helping my younger brother complete his education.” To support his family, Asakura worked in construction for eight years, eventually becoming a foreman in Yokohama. Meanwhile, his brother graduated, and became a software engineer after joining the Le Wagon Tokyo bootcamp. About a year before his brother graduated, Asakura began to delve back into development. “I had already begun self-studying in my free time by taking online courses and building small projects,” he explained. “ I quickly became hooked by how fun and empowering it was to learn, apply, and build. It wasn’t always easy. There were moments I wanted to give up, but the more I learned, the more interesting things I could create. That feeling kept me going.” What truly inspired me was the idea of creating something from nothing. Coming from a construction background, I was used to building things physically. But I wanted to create things that were digital, scalable, borderless, and meaningful to others. An unexpected passion As Andrew Wilson put it, “Wee little Andrew had a very digital childhood,” full of games and computer time. Rather than pursuing tech, however, he majored in Japanese and moved to Japan in 2012, where he initially worked as a language teacher and recruiter before settling into sales. Wilson soon discovered that sales wasn’t really his strong suit. “At the time I was selling three different enterprise software solutions.” So I had to have a fairly deep understanding of that software from a user perspective, and in the course of learning about these products and giving technical demonstrations, I realized that I liked doing that bit of my job way more than I liked actually trying to sell these things. Around that time, he also realized he didn’t want to manually digitize the many business cards he always collected during sales meetings: “That’s boring, and I’m lazy.” So instead, he found a business card-scanning app, made a spreadsheet to contain the data, automated the whole process, and shared it internally within his company. His manager approached him soon afterwards, saying, “You built this? We were looking to hire someone to do this!” Encouraged, Wilson continued to develop it. “As soon as I was done with work,” he explained with a laugh, “I was like, ‘Oh boy, I can work on my spreadsheet!’” As a result, Wilson came to the conclusion that he really should switch careers and pursue his passion for programming. Similarly to Wilson, Malcolm Hendricks initially focused on Japanese. He came to Japan as an exchange student in 2002, and traveled to Japan several more times before finally relocating in 2011. Though his original role was as a language teacher, he soon found a job at a Japanese publishing company, where he worked as an editor and writer for seven years. However, he felt burned out on the work, and also that he was in danger of stagnating; since he isn’t Japanese, the road to promotion was a difficult one. He started following some YouTube tutorials on web development, and eventually began creating websites for his friends. Along the way, he fell in love with development, on both a practical and a philosophical level. “There’s another saying I’ve heard here and there—I don’t know exactly who to attribute it to—but the essence of it goes that ‘Computer science is just teaching rocks how to think,’” Hendricks said. “My mentor Bob has been guiding me through the very fundamentals of computer science, down to binary calculations, Boolean logic, gate theory, and von Neumann architecture. He explains the fine minutia and often concludes with, ‘That’s how it works. There’s no magic to it.’ “Meanwhile, in the back of my mind, I can’t help but be mystified at the things we are all now able to do, such as having video calls from completely different parts of the world, or even me here typing on squares of plastic to make letters appear on a screen that has its own source of light inside it. . . . [It] sounds like the highest of high-fantasy wizardry to me.” I’ve always had a love for technomancy, but I never figured I might one day get the chance to be a technomancer myself. And I love it! We have the ability to create nigh unto anything in the digital world. A practical solution When Paulo D’Alberti moved to Japan in 2019, he only spoke a little Japanese, which limited his employment prospects. With his prior business experience, he landed an online marketing role for a blockchain startup, but eventually exited the company to pursue a more stable work environment. “But when I decided to leave the company,” D’Alberti said, “my Japanese was still not good enough to do business. So I was at a crossroads.” Do I decide to join a full-time Japanese language course, aiming to get JLPT N2 or the equivalent, and find a job on the business side? . . . Or do I say screw it and go for a complete career change and get skills in something more technical, that would allow me to carry those skills [with me] even if I were to move again to another country?” The portability of a career in development was a major plus for D’Alberti. “That was one of the big reasons. Another consideration was that, looking at the boot camps that were available, the promise was ‘Yeah, we’ll teach you to be a software developer in nine weeks or two months.’ That was a much shorter lead time than getting from JLPT N4 to N2. I definitely wouldn’t be able to do that in two months.” Since D’Alberti had family obligations, the timeline for his career switch was crucial. “We still had family costs and rent and groceries and all of that. I needed to find a job as soon as possible. I actually already at that point had been unsuccessfully job hunting for two months. So that was like, ‘Okay, the savings are winding up, and we are running out of options. I need to make a decision and make it fast.’” How to switch careers Method 1: Software Development Bootcamp Under pressure to find new employment quickly, D’Alberti decided to enter the Le Wagon Coding Bootcamp in Tokyo. Originally, he wavered between Le Wagon and Code Chrysalis, which has since ended its bootcamp programs. “I went with Le Wagon for two reasons,” he explained. “There were some scheduling reasons. . . . But the main reason was that Code Chrysalis required you to pass a coding exam before being admitted to their bootcamp.” Since D’Alberti was struggling to learn development by himself, he knew his chances of passing any coding exam were slim. “I tried Code Academy, I tried Solo Learn, I tried a whole bunch of apps online, I would follow the examples, the exercises . . . nothing clicked. I wouldn’t understand what I was doing or why I was doing it.” At the time, Le Wagon only offered full-time web development courses, although they now also have part-time courses and a data science curriculum. Since D’Alberti was unemployed, a full-time program wasn’t a problem for him, “But it did mean that the people who were present were very particular [kinds] of people: students who could take some time off to add this to their [coursework], or foreigners who took three months off and were traveling and decide to come here and do studying plus sightseeing, and I think there were one or two who actually asked for time off from the job in order to participate.” It was a very intense course, and the experience itself gave me exactly what I needed. I had been trying to learn by myself. It did not work. I did not understand. [After joining], the first day or second day, suddenly everything clicked. D’Alberti appreciated how Le Wagon organized the curriculum to build continuously off previous lessons. By the time he graduated in June of 2019, he’d built three applications from scratch, and felt far more confident in his coding abilities. “It was great. [The curriculum] was amazing, and I really felt super confident in my abilities after the three months. Which, looking back,” he joked, “I still had a lot to learn.” D’Alberti did have some specific advice for those considering a bootcamp: “Especially in the last couple of weeks, it can get very dramatic. You are divided into teams and as a team, you’re supposed to develop an application that you will be demonstrating in front of other people.” Some of the students, D’Alberti explained, felt that pressure intensely; one of his classmates broke down in tears. “Of course,” he added, “one of the big difficulties of joining a bootcamp is economical. The bootcamp itself is quite expensive.” While between 700,000 and 800,000 yen when D’Alberti went through the bootcamp, Le Wagon’s tuition has now risen to 890,000 yen for Web Development and 950,000 for Data Science. At the time D’Alberti joined there was no financial assistance. Now, Le Wagon has an agreement with Hello Work, so that students who are enrolled in the Hello Work system can be reimbursed for up to 70 percent of the bootcamp’s tuition. Though already studying development by himself, Asakura also enrolled in Le Wagon Tokyo in 2024, “to gain structure and accountability,” he said. One lesson that really stayed with me came from Sylvain Pierre, our bootcamp director. He said, ‘You stop being a developer the moment you stop learning or coding.’ That mindset helped me stay on track. Method 2: Online computer science degree Wilson considered going the bootcamp route, but decided against it. He knew, from his experience in recruiting, that a degree would give him an edge—especially in Japan, where having the right degree can make a difference in visa eligibility “The quality of bootcamps is perfectly fine,” he explained. “If you go through a bootcamp and study hard, you can get a job and become a developer no problem. I wanted to differentiate myself on paper as much as I could . . . [because] there are a lot of smart, motivated people who go through a bootcamp.” Whether it’s true or not, whether it’s valid or not, if you take two candidates who are very similar on paper, and one has a coding bootcamp and one has a degree, from a typical Japanese HR perspective, they’re going to lean toward the person with the degree. “Whether that’s good or not, that’s sort of a separate situation,” Wilson added. “But the reality [is] I’m older and I’m trying to make a career change, so I want to make sure that I’m giving myself every advantage that I can.” For these reasons, Wilson opted to get his computer science degree online. “There’s a program out of the University of Oregon, for people who already had a Bachelor’s degree in a different subject to get a Bachelor’s degree in Computer Science. “Because it’s limited to people who already have a Bachelor’s degree, that means you don’t need to take any non-computer science classes. You don’t need any electives or prerequisites or anything like that.” As it happened, Wilson was on paternity leave when he started studying for his degree. “That was one of my motivations to finish quickly!” he said. In the end, with his employer’s cooperation, he extended his paternity leave to two years, and finished the degree in five quarters. Method 3: Self-taught Hendricks took a different route, combining online learning materials with direct experience. He primarily used YouTube tutorials, like this project from one of his favorite channels, to teach himself. Once he had the basics down, he started creating websites for friends, as well as for the publishing company he worked for at the time. With every site, he’d put his name at the bottom of the page, as a form of marketing. This worked well enough that Hendricks was able to quit his work at the translation company and transition to full-time freelancing. However, eventually the freelancing work dried up, and he decided he wanted to experience working at a tech company—and not just for job security reasons. Hendricks saw finding a full-time development role as the perfect opportunity to push himself and see just how far he could get in his new career. There’s a common trope, probably belonging more to the sports world at large, about the importance of shedding ‘blood, sweat, and tears’ in the pursuit of one’s passion . . . and that’s also how I wanted to cut my teeth in the software engineering world. The job hunt While all four are now successfully employed as developers, Asakura, D’Alberti, Wilson, and Hendricks approached and experienced the job hunt differently. Following is their hard-earned advice on best practices and common mistakes. DO network When Hendricks started his job hunt, he faced the disadvantages of not having any formal experience, and also being both physically and socially isolated from other developers. Since he and his family were living in Nagano, he wasn’t able to participate in most of the tech events and meet-ups available in Tokyo or other big cities. His initial job hunt took around a year, and at one point he was sending so many applications that he received a hundred rejections in a week. It wasn’t until he started connecting with the community that he was able to turn it around, eventually getting three good job offers in a single week. Networking, for me, is what made all the difference. It was through networking that I found my mentors, found community, and joined and even started a few great Discord servers. These all undeniably contributed to me ultimately landing my current job, but they also made me feel welcome in the industry. Hendricks particularly credits his mentors, Ean More and Bob Cousins, for giving him great advice. “My initial mentor [Ean More] I actually met through a mutual IT networking Facebook group. I noticed that he was one of the more active members, and that he was always ready to lend a hand to help others with their questions and spread a deeper understanding of programming and computer science. He also often posted snippets of his own code to share with the community and receive feedback, and I was interested in a lot of what he was posting. “I reached out to him and told him I thought it was amazing how selfless he was in the group, and that, while I’m still a junior, if there was ever any grunt work I could do under his guidance, I would be happy to do so. Since he had a history of mentoring others, he offered to do so for me, and we’ve been mentor/mentee and friends ever since.” “My other mentor [Bob Cousins],” Hendricks continued, “was a friend of my late uncle’s. My uncle had originally begun mentoring me shortly before his passing. We were connected through a mutual friend whom I lamented to about not having any clue how to continue following the path my uncle had originally laid before me. He mentioned that he knew just the right person and gave me an email address to contact. I sent an email to the address and was greeted warmly by the man who would become another mentor, and like an uncle to me.” Although Hendricks found him via a personal connection, Cousins runs a mentorship program that caters to a wide variety of industries. Wilson also believes in the power of networking—and not just for the job hunt. “One of the things I like about programming,” he said, “is that it’s a very collaborative community. Everybody wants to help everybody.” We remember that everyone had to start somewhere, and we’ll take time to help those starting out. It’s a very welcoming community. Just do it! We’re all here for you, and if you need help I’ll refer you. Asakura, by contrast, thinks that networking can help, but that it works a little differently in Japan than in other countries. “Don’t rely on it too much,” he said. “Unlike in Western countries, personal referrals don’t always lead directly to job opportunities in Japan. Your skills, effort, and consistency will matter more in the long run.” DO treat the job hunt like a job Once he’d graduated from Le Wagon, D’Alberti said, “I considered job-hunting my full-time job.” I checked all the possible networking events and meetup events that were going on in the city, and tried to attend all of them, every single day. I had a list of 10 different job boards that I would go and just refresh on a daily basis to see, ‘Okay, Is there anything new now?’ And, of course, I talked with recruiters. D’Alberti suggests beginning the search earlier than you think you need to. “I had started actively job hunting even before graduating [from Le Wagon],” he said. “That’s advice I give to everyone who joins the bootcamp. “Two weeks before graduation, you have one simple web application that you can show. You have a second one you’re working on in a team, and you have a third one that you know what it’s going to be about. So, already, there are three applications that you can showcase or you can use to explain your skills. I started going to meetups and to different events, talking with people, showing my CV.” The process wasn’t easy, as most companies and recruiters weren’t interested in hiring for junior roles. But his intensive strategy paid off within a month, as D’Albert landed three invitations to interview: one from a Japanese job board, one from a recruiter, and one from LinkedIn. For Asakura, treating job hunting like a job was as much for his mental health as for his career. “The biggest challenge was dealing with impostor syndrome and feeling like I didn’t belong because I didn’t have a computer science degree,” he explained. “I also experienced burnout from pushing myself too hard.” To cope, I stuck to a structured routine. I went to the gym daily to decompress, kept a consistent study schedule as if I were working full-time, and continued applying for jobs even when it felt hopeless. At first, Asakura tried to apply to jobs strategically by tracking each application, tailoring his resume, and researching every role. “But after dozens of rejections,” he said, “I eventually switched to applying more broadly and sent out over one hundred applications. I also reached out to friends who were already software engineers and asked for direct referrals, but unfortunately, nothing worked out.” Still, Asakura didn’t give up. He practiced interviews in both English and Japanese with his friends, and stayed in touch with recruiters. Most importantly, he kept developing and adding to his portfolio. DO make use of online resources “What ultimately helped me was staying active and visible,” Asakura said. I consistently updated my GitHub, LInkedIn, and Wantedly profiles. Eventually, I received a message on Wantedly from the CTO of a company who was impressed with my portfolio, and that led to my first developer job.” “If you have the time, certifications can also help validate your knowledge,” Asakura added, “especially in fields like cloud and AI. Some people may not realize this, but the rise of artificial intelligence is closely tied to the growth of cloud computing. Earning certifications such as AWS, Kubernetes, and others can give you a strong foundation and open new opportunities, especially as these technologies continue to evolve.” Hendricks also heavily utilized LinkedIn and similar sites, though in a slightly different way. “I would also emphasize the importance of knowing how to use job-hunting sites like Indeed and LinkedIn,” he said. “I had the best luck when I used them primarily to do initial research into companies, then applied directly through the companies’ own websites, rather than through job postings that filter applicants before their resumés ever make it to the actual people looking to hire.” In addition, Hendricks recommends studying coding interview prep tutorials from freeCodeCamp. Along with advice from his mentors and the online communities he joined, he credits those tutorials with helping him successfully receive offers after a long job hunt. DO highlight experience with Japanese culture and language Asakura felt that his experience in Japan, and knowledge of Japanese, gave him an edge. “I understand Japanese work culture [and] can speak the language,” Asakura said, “and as a Japanese national I didn’t require visa sponsorship. That made me a lower-risk hire for companies here.” Hendricks also felt that his excellent Japanese made him a more attractive hire. While applying, he emphasized to companies that he could be a bridge to the global market and business overseas. However, he also admitted this strategy steered him towards applying with more domestic Japanese companies, which were also less likely to hire someone without a computer science degree. “So,” he said, “it sort of washed out.” Wilson is another who put a lot of emphasis on his Japanese language skills, from a slightly different angle. A lot of interviewees typically don’t speak Japanese well . . . and a lot of companies here say that they’re very international, but if they want very good programmers, [those people] spend their lives programming, not studying English. So having somebody who can bridge the language gap on the IT side can be helpful. DO lean into your other experience Several career switchers discovered that their past experiences and skills, while not immediately relevant to their new career, still proved quite helpful in landing that first role—sometimes in very unexpected ways. When Wilson was pitching his language skills to companies, he wasn’t talking about just Japanese–English translation. He also highlighted his prior experience in sales to suggest that he could help communicate with and educate non-technical audiences. “Actually to be a software engineer, there’s a lot of technical communication you have to do.” I have worked with some incredible coders who are so good at the technical side and just don’t want to do the personal side. But for those of us who are not super-geniuses and can’t rely purely on our tech skills . . . there’s a lot of non-technical discussion that goes around building a product.” This strategy, while eventually fruitful, didn’t earn Wilson a job right away. Initially, he applied to more than sixty companies over the course of three to four months. “I didn’t have any professional [coding] experience, so it was actually quite a rough time,” he said. “I interviewed all over the place. I was getting rejected all over town.” The good news was, Wilson said, “I’m from Chicago. I don’t know what it is, but there are a lot of Chicagoans who work in Tokyo for whatever reason.” When he finally landed an interview, one of the three founders of the company was also from Chicago, giving them something in common. “We hit it off really well in the interview. I think that kind of gave me the edge to get the role, to be honest.” Like Wilson, D’Alberti found that his previous work as a marketer helped him secure his first developer role—which was ironic, he felt, given that he’d partially chosen to switch careers because he hadn’t been able to find an English-language marketing job in Japan. “I had my first interview with the CEO,” he told me, “and this was for a Japanese startup that was building chatbots, and they wanted to expand into the English market. So I talked with the CEO, and he was very excited to get to know me and sent me to talk with the CTO.” The CTO, unfortunately, wasn’t interested in hiring a junior developer with no professional experience. “And I thought that was the end of it. But then I got called again by the CEO. I wanted to join for the engineering position, and he wanted to have me for my marketing experience.” In the end we agreed that I would join in a 50-50 arrangement. I would do 50 percent of my job in marketing and going to conferences and talking to people, and 50 percent on the engineering side. I was like, ‘Okay, I’ll take that.’ This ended up working better than D’Alberti had expected, partially due to external circumstances. “When COVID came, we couldn’t travel abroad, so most of the job I was doing in my marketing role I couldn’t perform anymore. “So they sat me down and [said], ‘What are we going to do with you, since we cannot use you for marketing anymore?’ And I was like, ‘Well, I’m still a software developer. I could continue working in that role.’ And that actually allowed me to fully transition.” DON’T make these mistakes It was D’Alberti’s willingness to compromise on that first development role that led to his later success, so he would explicitly encourage other career-changers to avoid, in his own words, “being too picky.” This advice is based, not just on his own experience, but also on his time working as a teaching assistant at Le Wagon. “There were a couple of people who would be like, ‘Yeah, I’d really like to find a job and I’m not getting any interviews,’” he explained. “And then we’d go and ask, ‘Okay, how many companies are you applying to? What are you doing?’ But [they’d say] ‘No, see, [this company] doesn’t offer enough’ or ‘I don’t really like this company’ or ‘I’d like to do something else.’ Those who would be really picky or wouldn’t put in the effort, they wouldn’t land a job. Those who were deadly serious about ‘I need to get a job as a software developer,’ they’d find one. It might not be a great job, it might not be at a good company, but it would be a good first start from which to move on afterwards. Asakura also knew some other bootcamp graduates who struggled to find work. “A major reason was a lack of Japanese language skills,” he said. Even for junior roles, many companies in Japan require at least conversational Japanese, especially domestic ones. On the other hand, if you prioritize learning Japanese, that can give you an edge on entering the industry: “Many local companies are open to training junior developers, as long as they see your motivation and you can communicate effectively. International companies, on the other hand, often have stricter technical requirements and may pass on candidates without degrees or prior experience.” Finally, Hendricks said that during his own job hunt, “Not living in Tokyo was a problem.” It was something that he was able to overcome via diligent digital networking, but he’d encourage career-changers to think seriously about their future job prospects before settling outside a major metropolis in Japan. Their top advice I asked each developer to share their number one piece of advice for career-changers. D’Alberti wasn’t quite sure what to suggest, given recent changes in the tech market overall. “I don’t have clear advice to someone who’s trying to break into tech right now,” he said. “It might be good to wait and see what happens with the AI path. Might be good to actually learn how to code using AI, if that’s going to be the way to distinguish yourself from other junior developers. It might be to just abandon the idea of [being] a linear software developer in the traditional sense, and maybe look more into data science, if there are more opportunities.” But assuming they still decide ‘Yes, I want to join, I love the idea of being a software developer and I want to go forward’ . . . my main suggestion is patience. “It’s going to be tough,” he added. By contrast, Hendricks and Wilson had the same suggestion: if you want to change careers, then go for it, full speed ahead. “Do it now, or as soon as you possibly can,” Hendricks stated adamantly. His life has been so positively altered by discovering and pursuing his passion, that his only regret is he didn’t do it sooner. Wilson said something strikingly similar. “Do it. Just do it. I went back and forth a lot,” he explained. “‘Oh, should I do this, it’s so much money, I already have a job’ . . . just rip the bandaid off. Just do it. You probably have a good reason.” He pointed out that while starting over and looking for work is scary, it’s also possible that you’ll lose your current job anyway, at which point you’ll still be job hunting but in an industry you no longer even enjoy. “If you keep at it,” he said, “you can probably do it.” “Not to talk down to developers,” he added, “but it’s not the hardest job in the world. You have to study and learn and be the kind of person who wants to sit at the computer and write code, but if you’re thinking about it, you’re probably the kind of person who can do it, and that also means you can probably weather the awful six months of job hunting.” You only need to pass one job interview. You only need to get your foot in the door. Asakura agreed with “just do it,” but with a twist. “Build in public,” he suggested. “Share your progress. Post on GitHub. Keep your LinkedIn active.” Let people see your journey, because even small wins build momentum and credibility. “To anyone learning to code right now,” Asakura added, “don’t get discouraged by setbacks or rejections. Focus on building, learning, and showing up every day. Your portfolio speaks louder than your past, and consistency will eventually open the door.” If you want to read more how-tos and success stories around networking, working with recruitment agencies, writing your resume, etc., check out TokyoDev’s other articles. If you’d like to hear more about being a developer in Japan, we invite you to join the TokyoDev Discord, which has over 6,000 members as well as dedicated channels for resume review, job posts, life in Japan, and more.