Over the past few weeks, I've been building out server-side short video support for Bluesky. The major aim of this feature is to support short (90 second max) video streaming at a quality that doesn't cost an arm and a leg for us to provide for free. In order to stay within these constraints, we're considering making use of a video CDN that can bear the brunt of the bandwidth required to support Video-on-Demand streaming. While the CDN is a pretty fully-featured product, we want to avoid too much vendor lock-in and provide some enhancements to our streaming platform that require extending their offering and getting creative with video streaming protocols. Some of the things we'd like to be able to do that don't work out-of-the-box are:
Track view counts, viewer sessions, and duration viewed to provide better feedback for video performance.
Provide dynamic closed-caption support with the flexibility to automate them in the future.
Store a transcoded version of source files somewhere...
a year ago



More from exist

When Imperfect Systems are Good, Actually: Bluesky's Lossy Timelines

Often when designing systems, we aim for perfection in things like consistency of data, availability, latency, and more. The hardest part of system design is that it's difficult (if not impossible) to design systems that have perfect consistency, perfect availability, incredibly low latency, and incredibly high throughput, all at the same time. Instead, when we approach system design, it's best to treat each of these properties as points on different axes that we balance to find the "right fit" for the application we're supporting. I recently made some major tradeoffs in the design of Bluesky's Following Feed/Timeline to improve the performance of writes at the cost of consistency in a way that doesn't negatively affect users but reduced P99s by over 96%.

Timeline Fanout

When you make a post on Bluesky, your post is indexed by our systems and persisted to a database where we can fetch it to hydrate and serve in API responses. Additionally, a reference to your post is "fanned out" to your followers so they can see it in their Timelines. This process involves looking up all of your followers, then inserting a new row into each of their Timeline tables in reverse chronological order with a reference to your post. When a user loads their Timeline, we fetch a page of post references and then hydrate the posts/actors concurrently to quickly build an API response and let them see the latest content from people they follow. The Timelines table is sharded by user. This means each user gets their own Timeline partition, randomly distributed among shards of our horizontally scalable database (ScyllaDB), replicated across multiple shards for high availability. Timelines are regularly trimmed when written to, keeping them near a target length and dropping older post references to conserve space.

Hot Shards in Your Area

Bluesky currently has around 32 million users and our Timelines database is broken into hundreds of shards. To support millions of partitions on such a small number of shards, each user's Timeline partition is colocated with tens of thousands of other users' Timelines. Under normal circumstances with all users behaving well, this doesn't present a problem, as the work of an individual Timeline is small enough that a shard can handle the work of tens of thousands of them without being heavily taxed. Unfortunately, with a large number of users, some of them will do abnormal things like... well... following hundreds of thousands of other users. Generally, this can be dealt with via policy and moderation to prevent abusive users from causing outsized load on systems, but these processes take time and can be imperfect. When a user follows hundreds of thousands of others, their Timeline becomes hyperactive, with writes and trimming occurring at massively elevated rates. This load slows down the individual operations to the user's Timeline, which is fine for the badly behaved user, but causes problems for the tens of thousands of other users sharing a shard with them. We typically call this situation a "Hot Shard": a shard where some resident partition has "hot" data that is being written to or read from at much higher rates than the others. Since the data on the shard is only replicated a few times, we can't effectively leverage the horizontal scale of our database to process all this additional work. Instead, the "Hot Shard" ends up spending so much time doing work for a single partition that operations to the colocated partitions slow down as well.
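
To make the Fanout mechanics concrete before looking at how the latencies stack up, here's a rough, hedged Go sketch of paged fanout with bounded concurrency. The names fetchFollowerPage and writeTimelineEntry, and the exact constants, are hypothetical illustrations (not Bluesky's actual code), and the snippet assumes the standard context and sync packages.

// Hypothetical sketch of paged, concurrent Timeline fanout.
// fetchFollowerPage and writeTimelineEntry stand in for the real
// follower lookup and Timeline write; they are not Bluesky's APIs.
func fanoutPost(ctx context.Context, authorUID uint32, postRef string) error {
	const pageSize = 10_000   // followers fetched per page
	const concurrency = 1_000 // concurrent Timeline writes within a page

	var cursor string
	for {
		followers, nextCursor, err := fetchFollowerPage(ctx, authorUID, cursor, pageSize)
		if err != nil {
			return err
		}

		sem := make(chan struct{}, concurrency)
		var wg sync.WaitGroup
		for _, follower := range followers {
			wg.Add(1)
			sem <- struct{}{}
			go func(uid uint32) {
				defer wg.Done()
				defer func() { <-sem }()
				// A single slow write here holds up completion of the page.
				writeTimelineEntry(ctx, uid, postRef)
			}(follower)
		}
		// The whole page must finish before we fetch the next one,
		// so the slowest write in the page sets the pace.
		wg.Wait()

		if nextCursor == "" {
			return nil
		}
		cursor = nextCursor
	}
}
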
Stacking Latencies

Returning to our Fanout process, let's consider the case of Fanout for a user followed by 2,000,000 other users. Under normal circumstances, writing to a single Timeline takes an average of ~600 microseconds. If we sequentially write to the Timelines of our user's followers, we'll be sitting around for 20 minutes at best to Fanout this post. If instead we concurrently Fanout to 1,000 Timelines at once, we can complete this Fanout job in ~1.2 seconds.

That sounds great, except it oversimplifies an important property of systems: tail latencies. The average latency of a write is ~600 microseconds, but some writes take much less time and some take much more. In fact, the P99 latency of writes to the Timelines cluster can be as high as 15 milliseconds! What does this mean for our Fanout? Well, if we concurrently write to 1,000 Timelines at once, statistically we'll see 10 writes as slow as or slower than 15 milliseconds. In the case of Timelines, each "page" of followers is 10,000 users large and each "page" must be fanned out before we fetch the next page. This means that our slowest writes will hold up the fetching and Fanout of the next page.

How does this affect our expected Fanout time? Each "page" will have ~100 writes as slow as or slower than the P99 latency. If we get unlucky, they could all stack up on a single routine and end up slowing down a single page of Fanout to 1.5 seconds. In the worst case, for our 2,000,000-follower celebrity, their post Fanout could end up taking as long as 5 minutes! That's not even considering P99.9 and P99.99 latencies, which could end up being >1 second and could leave us waiting tens of minutes for our Fanout job. Now imagine how bad this would be for a user with 20,000,000+ followers! So, how do we fix the problem? By embracing imperfection, of course!

Lossy Timelines

Imagine a user who follows hundreds of thousands of others. Their Timeline is being written to hundreds of times a second, moving so fast it would be humanly impossible to keep up with the entirety of their Timeline even if it was their full-time job. For a given user, there's a threshold beyond which it is unreasonable for them to be able to keep up with their Timeline. Beyond this point, they likely consume content through various other feeds and do not primarily use their Following Feed. Additionally, beyond this point, it is reasonable for us to not necessarily have a perfect chronology of everything posted by the many thousands of users they follow, but provide enough content that the Timeline always has something new. Note that in this case I'm using the term "reasonable" to loosely convey that as a social media service, there must be a limit to the amount of work we are expected to do for a single user.

What if we introduce a mechanism to reduce the correctness of a Timeline such that there is a limit to the amount of work a single Timeline can place on a DB shard? We can assert a reasonable limit for the number of follows a user should have to have a healthy and active Timeline, then increase the "lossiness" of their Timeline the further past that limit they go. A loss_factor can be defined as min(reasonable_limit/num_follows, 1) and can be used to probabilistically drop writes to a Timeline to prevent hot shards. Just before writing a page in Fanout, we can generate a random float between 0 and 1, then compare it to the loss_factor of each user in the page. If the user's loss_factor is smaller than the generated float, we filter the user out of the page and don't write to their Timeline. Now, users all have the same number of "follows worth" of Fanout. For example, with a reasonable_limit of 2,000, a user who follows 4,000 others will have a loss_factor of 0.5, meaning half the writes to their Timeline will get dropped. For a user following 8,000 others, their loss_factor of 0.25 will drop 75% of writes to their Timeline. Thus, each user has an effective ceiling on the amount of Fanout work done for their Timeline.
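
As a rough illustration of the scheme just described (hypothetical names, not Bluesky's actual code; assumes the math/rand package), the per-page filter could look something like this in Go:

// lossFactor returns min(reasonableLimit/numFollows, 1) for a user.
func lossFactor(numFollows, reasonableLimit int) float64 {
	if numFollows <= reasonableLimit {
		return 1.0
	}
	return float64(reasonableLimit) / float64(numFollows)
}

// filterPage probabilistically drops users from a page of Fanout work.
// followCounts is a hypothetical lookup of each user's follow count
// (in practice served from the in-memory cache described below).
func filterPage(page []uint32, followCounts map[uint32]int, reasonableLimit int) []uint32 {
	// One random draw per page, compared against each user's loss_factor.
	draw := rand.Float64()
	kept := make([]uint32, 0, len(page))
	for _, uid := range page {
		// A user is dropped when their loss_factor is smaller than the draw.
		if lossFactor(followCounts[uid], reasonableLimit) >= draw {
			kept = append(kept, uid)
		}
	}
	return kept
}
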
By specifying the limits of reasonable user behavior and embracing imperfection for users who go beyond it, we can continue to provide service that meets the expectations of users without sacrificing the scalability of the system.

Aside on Caching

We write to Timelines at a rate of more than one million times a second during the busy parts of the day. Looking up the number of follows of a given user before fanning out to them would require more than one million additional reads per second to our primary database cluster. This additional load would not be well received by our database, and the additional cost wouldn't be worth the payoff for faster Timeline Fanout. Instead, we implemented an approach that caches high-follow accounts in a Redis sorted set, then each instance of our Fanout service loads an updated version of the set into memory every 30 seconds. This allows us to perform lookups of follow counts for high-follow accounts millions of times per second per Fanout service instance. By caching values which don't need to be perfect for this to function correctly, we can once again embrace imperfection in the system to improve performance and scalability without compromising the function of the service.

Results

We implemented Lossy Timelines a few weeks ago on our production systems and saw a dramatic reduction in hot shards on the Timelines database clusters. In fact, there now appear to be no hot shards in the cluster at all, and the P99 of a page of Fanout work has been reduced by over 90%. Additionally, with the reduction in write P99s, the P99 duration for a full post Fanout has been reduced by over 96%. Jobs that used to take 5-10 minutes for large accounts now take <10 seconds.

Knowing where it's okay to be imperfect lets you trade consistency for other desirable aspects of your systems and scale ever higher. There are plenty of other places for improvement in our Timelines architecture, but this step was a big one towards improving the throughput and scalability of Bluesky's Timelines. If you're interested in these sorts of problems and would like to help us build the core data services that power Bluesky, check out this job listing. If you're interested in other open positions at Bluesky, you can find them here.

6 months ago 56 votes
Emoji Griddle
10 months ago 29 votes
Jetstream: Shrinking the AT Proto Firehose by >99%

Bluesky recently saw a massive spike in activity in response to Brazil's ban of Twitter. As a result, the AT Proto event firehose provided by Bluesky's Relay at bsky.network has increased in volume by a huge amount. The average event rate during this surge increased by ~1,300%. Before this new surge in activity, the firehose would produce around 24 GB/day of traffic. After the surge, this volume jumped to over 232 GB/day! Keeping up with the full, verified firehose quickly became less practical on cheap cloud infrastructure with metered bandwidth. To help reduce the burden of operating bots, feed generators, labelers, and other non-verifying AT Proto services, I built Jetstream as an alternative, lightweight, filterable JSON firehose for AT Proto.

How the Firehose Works

The AT Proto firehose is a mechanism used to keep verified, fully synced copies of the repos of all users. Since repos are represented as Merkle Search Trees, each firehose event contains an update to the user's MST which includes all the changed blocks (nodes in the path from the root to the modified leaf). The root of this path is signed by the repo owner, and a consumer can keep their copy of the repo's MST up-to-date by applying the diff in the event. For a more in-depth explanation of how Merkle Trees are constructed, check out this explainer.

Practically, this means that for every small JSON record added to a repo, we also send along some number of MST blocks (which are content-addressed hashes and thus very information-dense) that are mostly useful for consumers attempting to keep a fully synced, verified copy of the repo. You can think of this as the difference between cloning a git repo vs. just grabbing the latest version of the files without the .git folder. In this case, the firehose effectively streams the diffs for the repository with commits, signatures, and metadata, which is inherently heavier than a point-in-time checkout of the repo.

Because firehose events with repo updates are signed by the repo owner, they allow a consumer to process events from any operator without having to trust the messenger. This is the "Authenticated" part of the Authenticated Transfer (AT) Protocol and is crucial to the correct functioning of the network. That being said, of the hundreds of consumers of Bluesky's production Relay, >90% of them are building feeds, bots, and other tools that don't keep full copies of the entire network and don't verify MST operations at all. For these consumers, all they actually process is the JSON records created, updated, and deleted in each event. If consumers already trust the provider to do validation on their end, they could get by with a much more lightweight data stream.

How Jetstream Works

Jetstream is a streaming service that consumes an AT Proto com.atproto.sync.subscribeRepos stream and converts it into lightweight, friendly JSON. If you want to try it out yourself, you can connect to my public Jetstream instance and view all posts on Bluesky in realtime:

$ websocat "wss://jetstream2.us-east.bsky.network/subscribe?wantedCollections=app.bsky.feed.post"

Note: the above instance is operated by Bluesky PBC and is free to use; more instances are listed in the official repo Readme.

Jetstream converts the CBOR-encoded MST blocks produced by the AT Proto firehose and translates them into JSON objects that are easier to interface with using standard tooling available in programming languages.
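
For example, here is a minimal, hedged Go sketch of a consumer that does the same thing as the websocat command above, assuming the github.com/gorilla/websocket package; it simply prints each JSON event and is not Jetstream's official client:

package main

import (
	"fmt"
	"log"

	"github.com/gorilla/websocket"
)

func main() {
	// Subscribe to post records only, like the websocat example above.
	url := "wss://jetstream2.us-east.bsky.network/subscribe?wantedCollections=app.bsky.feed.post"
	conn, _, err := websocket.DefaultDialer.Dial(url, nil)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	for {
		// Each websocket message is one JSON-serialized Jetstream event.
		_, msg, err := conn.ReadMessage()
		if err != nil {
			log.Fatalf("read: %v", err)
		}
		fmt.Println(string(msg))
	}
}
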
Since repo MSTs only contain records in their leaf nodes, Jetstream can drop all of the blocks in an event except for those of the leaf nodes, typically leaving only one block per event. In reality, this means that Jetstream's JSON firehose is nearly 1/10 the size of the full protocol firehose for the same events, but lacks the verifiability and signatures included in the protocol-level firehose. Jetstream events end up looking something like:

{
  "did": "did:plc:eygmaihciaxprqvxpfvl6flk",
  "time_us": 1725911162329308,
  "type": "com",
  "commit": {
    "rev": "3l3qo2vutsw2b",
    "type": "c",
    "collection": "app.bsky.feed.like",
    "rkey": "3l3qo2vuowo2b",
    "record": {
      "$type": "app.bsky.feed.like",
      "createdAt": "2024-09-09T19:46:02.102Z",
      "subject": {
        "cid": "bafyreidc6sydkkbchcyg62v77wbhzvb2mvytlmsychqgwf2xojjtirmzj4",
        "uri": "at://did:plc:wa7b35aakoll7hugkrjtf3xf/app.bsky.feed.post/3l3pte3p2e325"
      }
    },
    "cid": "bafyreidwaivazkwu67xztlmuobx35hs2lnfh3kolmgfmucldvhd3sgzcqi"
  }
}

Each event lets you know the DID of the repo it applies to, when it was seen by Jetstream (a time-based cursor), and up to one updated repo record as serialized JSON. Check out this 10 second CPU profile of Jetstream serving 200k evt/sec to a local consumer. By dropping the MST and verification overhead and consuming from a relay we trust, we've reduced the size of a firehose of all events on the network from 232 GB/day to ~41 GB/day, but we can do better.

Jetstream and zstd

I recently read a great engineering blog from Discord about their use of zstd to compress websocket traffic to/from their Gateway service and client applications. Since Jetstream emits marshalled JSON through the websocket for developer-friendliness, I figured it might be a neat idea to see if we could get further bandwidth reduction by employing zstd to compress events we send to consumers. zstd has two basic operating modes: "simple" mode and "streaming" mode.

Streaming Compression

At first glance, streaming mode seems like it'd be a great fit. We've got a websocket connection with a consumer, and streaming mode allows the compression to get more efficient over the lifetime of the connection. I went and implemented a streaming compression version of Jetstream where a consumer can request compression when connecting and will get zstd-compressed JSON sent as binary messages over the socket instead of plaintext.

Unfortunately, this had a massive impact on Jetstream's server-side CPU utilization. We were effectively compressing every message once per consumer as part of their streaming session. This was not a scalable approach to offering compression on Jetstream. Additionally, Jetstream stores a buffer of the past 24 hours (configurable) of events on disk in PebbleDB to allow consumers to replay events before getting transitioned into live-tailing mode. Jetstream stores serialized JSON in the DB, so playback is just shuffling the bytes into the websocket without having to round-trip the data into a Go struct. When we layer in streaming compression, playback becomes significantly more expensive because we have to compress outgoing events on-the-fly for a consumer that's catching up. In real numbers, this increased CPU usage of Jetstream by 23% while lowering the throughput of playback from ~200k evt/sec to ~28k evt/sec for a single local consumer. When in streaming mode, we can't leverage the bytes we compress for one consumer and reuse them for another consumer because zstd's streaming context window may not be in sync between the two consumers.
They haven't received exactly the same data in the session, so the clients on the other end don't have their state machines in the same state. Since streaming mode's primary advantage is giving us eventually better efficiency as the encoder learns about the data, what if we just taught the encoder about the data at the start and compressed each message statelessly?

Dictionary Mode

zstd offers a mechanism for initializing an encoder/decoder with pre-optimized settings by providing a dictionary trained on a sample of the data you'll be encoding/decoding. Using this dictionary, zstd essentially uses its smallest encoded representations for the most frequently seen patterns in the sample data. In our case, where we're compressing serialized JSON with a common event shape and lots of common property names, training a dictionary on a large number of real events should allow us to represent the common elements among messages in the smallest number of bytes.

For take two of Jetstream with zstd, let's use a single encoder for the whole service that utilizes a custom dictionary trained on 100,000 real events. We can use this encoder to compress every event as we see it, before persisting and emitting it to consumers. Now we end up with two copies of every event: one that's just serialized JSON, and one that's statelessly compressed with zstd using our dictionary. Any consumers that want compression can have a copy of the dictionary on their end to initialize a decoder, then when we broadcast the shared compressed event, all consumers can read it without any state or context issues. This requires the consumers and server to have a pre-shared dictionary, which is a major drawback of this implementation but good enough for our purposes.
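
As a rough sketch of the compress-once, broadcast-to-many idea (not Jetstream's actual code), this is roughly what a shared dictionary encoder and a consumer-side decoder could look like in Go, assuming the github.com/klauspost/compress/zstd package and its WithEncoderDict/WithDecoderDicts options, and a dictionary file produced separately (for example with the zstd CLI's --train mode over a corpus of sample events):

// dict is the pre-shared dictionary trained on sample events,
// assumed to be loaded from disk at startup.
var dict []byte

// newSharedEncoder builds the single encoder used for the whole service.
func newSharedEncoder(dict []byte) (*zstd.Encoder, error) {
	return zstd.NewWriter(nil, zstd.WithEncoderDict(dict))
}

// compressEvent statelessly compresses one serialized JSON event.
// The same output bytes can be broadcast to every compression-enabled
// consumer and persisted for playback.
func compressEvent(enc *zstd.Encoder, eventJSON []byte) []byte {
	return enc.EncodeAll(eventJSON, nil)
}

// On the consumer side, a decoder initialized with the same dictionary
// can decode each binary websocket message independently.
func newDictDecoder(dict []byte) (*zstd.Decoder, error) {
	return zstd.NewReader(nil, zstd.WithDecoderDicts(dict))
}

func decodeEvent(dec *zstd.Decoder, compressed []byte) ([]byte, error) {
	return dec.DecodeAll(compressed, nil)
}
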
That leaves the problem of event playback for compression-enabled clients. An easy solution here is to just store the compressed events as well! Since we're only sticking the JSON records into our PebbleDB, the actual size of the 24 hour playback window is <8GB with sstable compression. If we store a copy of the JSON serialized event and a copy of the zstd compressed event, this will, at most, double our storage requirements. Then during playback, if the consumer requests compression, we can just shuffle bytes out of the compressed version of the DB into their socket instead of having to move them through a zstd encoder.

Savings

Running with a custom dictionary, I was able to get the average Jetstream event down from 482 bytes to just 211 bytes (~0.44 compression ratio). Jetstream allows us to live tail all posts on Bluesky as they're posted for as little as ~850 MB/day, and we could keep up with all events moving through the firehose during the Brazil Twitter Exodus weekend for 18 GB/day (down from 232 GB/day). With this scheme, Jetstream is required to compress each event only once before persisting it to disk and emitting it to connected consumers. The CPU impact of these changes is significant in proportion to Jetstream's incredibly light load, but it's a flat cost we pay once no matter how many consumers we have. (CPU profile from a 30 second pprof sample with 12 consumers live-tailing Jetstream.)

Additionally, with Jetstream's shared buffer broadcast architecture, we keep memory allocations incredibly low and the cost per consumer on CPU and RAM is trivial. In the allocation profile, more than 80% of the allocations are used to consume the full protocol firehose. The total resident memory of Jetstream sits below 16MB, 25% of which is actually consumed by the new zstd dictionary. To bring it all home, the dashboard of my public Jetstream instance shows it serving 12 consumers, all with various filters and compression settings, running on a $5/mo OVH VPS.

At our new baseline firehose activity:
A consumer of the protocol-level firehose would require downloading ~3.16TB/mo to keep up.
A Jetstream consumer getting all created, updated, and deleted records without compression enabled would require downloading ~400GB/mo to keep up.
A Jetstream consumer that only cares about posts and has zstd compression enabled can get by on as little as ~25.5GB/mo, less than 1% of the full-weight protocol firehose.

Feel free to join the conversation about Jetstream and zstd on Bluesky.

11 months ago 34 votes
An entire Social Network in 1.6GB (GraphD Part 2)

In Part 1 of this series, we tried to answer the question "who do you follow who also follows user B" in Bluesky, a social network with millions of users and hundreds of millions of follow relationships. At the conclusion of the post, we'd developed an in-memory graph store for the network that uses HashMaps and HashSets to keep track of the followers of every user and the set of users they follow, allowing bidirectional lookups, intersections, unions, and other set operations for combining social graph data.

I received some helpful feedback after that post where several people pointed me towards Roaring Bitmaps as a potential improvement on my implementation. They were right: Roaring Bitmaps would be an excellent fit for my Graph service, GraphD, and could also provide me with a much needed way to quickly persist and load the Graph data to and from disk on startup, hopefully reducing the startup time of the service.

What are Bitmaps?

If you just want to dive into the Roaring Bitmap spec, you can read the paper here, but it might be easier to first talk about bitmaps in general. You can think of a bitmap as a vector of one-bit values (like booleans) that lets you encode a set of integer values. For instance, say we have 10,000 users on our website and want to keep track of which users have validated their email addresses. We could do this by creating a list of the uint32 user IDs of each user, in which case if all 10,000 users have validated their emails we're storing 10k * 32 bits = 40KB. Or, we could create a vector of single-bit values that's 10,000 bits long (10k / 8 = 1.25KB), then if a user has confirmed their email we can set the value at the index of their UID to 1. If we want to create a list of all the UIDs of validated accounts, we can walk the vector and record the index of each non-zero bit. If we want to check if user n has validated their email, we can do an O(1) lookup in the bitmap by loading the bit at index n and checking if it's set.
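
To make that concrete, here's a tiny, hedged sketch of a plain (non-Roaring) bitset in Go, purely to illustrate the email-validation example above; the type and method names are made up for this post:

// A plain bitset backed by a slice of 64-bit words.
type Bitset struct {
	words []uint64
}

func NewBitset(size int) *Bitset {
	return &Bitset{words: make([]uint64, (size+63)/64)}
}

// Set marks UID n as present (e.g. "user n has validated their email").
func (b *Bitset) Set(n uint32) {
	b.words[n/64] |= 1 << (n % 64)
}

// Contains is the O(1) lookup described above: load the word holding
// bit n and test whether that bit is set.
func (b *Bitset) Contains(n uint32) bool {
	return b.words[n/64]&(1<<(n%64)) != 0
}
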
When Bitmaps get Big and Sparse

Now, when talking about our social network problem, we're dealing with a few more than 10,000 UIDs. We need to keep track of 5.5M users and whether or not the user follows or is followed by any of the other 5.5M users in the network. To keep a bitmap of "People who follow User A", we're going to need 5.5M bits, which would require (5.5M / 8) ~687KB of space. If we wanted to keep bitmaps of "People who follow User A" and "People who User A follows", we'd need ~1.37MB of space per user using a simple bitmap, meaning we'd need 5,500,000 * 1.37MB = ~7.5 Terabytes of space! Clearly this isn't an improvement on our strategy from Part 1, so how can we make this more efficient?

One strategy for compressing the bitmap is to take consecutive runs of 0's or 1's (i.e. 00001110000001) in the bitmap and turn them into a number. For instance, if we had an account that followed only the last 100 accounts in our social network, the first 5,499,900 indices in our bitmap would be 0's, so we could represent the bitmap by saying: 5,499,900 0's, then 100 1's, which you'll notice I've written here in a lot fewer than 687KB and a computer could encode using two uint32 values plus two bits (one indicator bit for the state of each run) for a total of 66 bits. This strategy is called Run Length Encoding (RLE) and works pretty well, but has a few drawbacks: mainly, if your data is randomly and heavily populated, you may not have many consecutive runs (imagine a bitset where every odd bit is set and every even bit is unset). Also, lookups and evaluation of the bitset require walking the whole bitset to figure out where the index you care about lives in the compressed format.

Thankfully there's a more clever way to compress bitmaps, using a strategy called Roaring Bitmaps. A brief description of the storage strategy for Roaring Bitmaps from the official paper is as follows:

We partition the range of 32-bit indexes ([0, n)) into chunks of 2^16 integers sharing the same 16 most significant digits. We use specialized containers to store their 16 least significant bits. When a chunk contains no more than 4096 integers, we use a sorted array of packed 16-bit integers. When there are more than 4096 integers, we use a 2^16-bit bitmap. Thus, we have two types of containers: an array container for sparse chunks and a bitmap container for dense chunks. The 4096 threshold insures that at the level of the containers, each integer uses no more than 16 bits.

These bitmaps are designed to support both densely and sparsely distributed data and can provide high-performance binary set operations (and/or/etc.) operating on the containers within two or more bitsets in parallel. For more info on how Roaring Bitmaps work and some neat diagrams, check out this excellent primer on Roaring Bitmaps by Vikram Oberoi. So, how does this help us build a better graph?

GraphD, Revisited with Roaring Bitmaps

Let's get back to our GraphD Service, this time in Go instead of Rust. For each user we can keep track of a struct with two bitmaps:

type FollowMap struct {
	followingBM *roaring.Bitmap
	followingLk sync.RWMutex

	followersBM *roaring.Bitmap
	followersLk sync.RWMutex
}

Our FollowMap gives us a Roaring Bitmap for both the set of users we follow, and the set of users who follow us. Adding a Follow to the graph just requires we set the right bits in both users' respective maps:

// Note I've removed locking code and error checks for brevity
func (g *Graph) addFollow(actorUID, targetUID uint32) {
	actorMap, _ := g.g.Load(actorUID)
	actorMap.followingBM.Add(targetUID)

	targetMap, _ := g.g.Load(targetUID)
	targetMap.followersBM.Add(actorUID)
}

Even better, if we want to compute the intersection of two sets (i.e. the people User A follows who also follow User B) we can do so in parallel:

// Note I've removed locking code and error checks for brevity
func (g *Graph) IntersectFollowingAndFollowers(actorUID, targetUID uint32) ([]uint32, error) {
	actorMap, _ := g.g.Load(actorUID)
	targetMap, _ := g.g.Load(targetUID)

	intersectMap := roaring.ParAnd(4, actorMap.followingBM, targetMap.followersBM)

	return intersectMap.ToArray(), nil
}

Storing the entire graph as Roaring Bitmaps in-memory costs us around 6.5GB of RAM and allows us to perform set intersections between moderately large sets (with hundreds of thousands of set bits) in under 500 microseconds while serving over 70k req/sec! And the best part of all? We can use Roaring's serialization format to write these bitmaps to disk or transfer them over the network.
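
As a rough sketch of what persisting one user's bitmaps could look like (a hypothetical helper, not GraphD's exact code), assuming the github.com/RoaringBitmap/roaring package's ToBytes method, database/sql, and the (uid, did, following, followers) row layout described below:

// Hypothetical sketch: serialize both of a user's bitmaps and upsert
// them as blobs into the SQLite table used for startup loading.
func saveFollowMap(db *sql.DB, uid uint32, did string, fm *FollowMap) error {
	followingBytes, err := fm.followingBM.ToBytes()
	if err != nil {
		return err
	}
	followersBytes, err := fm.followersBM.ToBytes()
	if err != nil {
		return err
	}
	_, err = db.Exec(
		`INSERT OR REPLACE INTO actors (uid, did, following, followers) VALUES (?, ?, ?, ?)`,
		uid, did, followingBytes, followersBytes,
	)
	return err
}
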
Storing 164M Follows in 1.6GB

In the original version of GraphD, on startup the service would read a CSV file with an adjacency list of the (ActorDID, TargetDID) pairs of all follows on the network. This required creating a CSV dump of the follows table, pausing writes to the follows table, then bringing up the service and waiting 5 minutes for it to read the CSV file, intern the DIDs as uint32 UIDs, and construct the in-memory graph. This process is slow, pauses writes for 5 minutes, and every time our service restarts we have to do it all over again!

With Roaring Bitmaps, we're now given an easy way to effectively serialize a version of the in-memory graph that is many times smaller than the adjacency list CSV and many times faster to load. We can serialize the entire graph into a SQLite DB on the local machine where each row in a table contains: (uid, DID, followers_bitmap, following_bitmap). Loading the entire graph from this SQLite DB can be done in around ~20 seconds:

// Note I've removed locking code and error checks for brevity
rows, err := g.db.Query(`SELECT uid, did, following, followers FROM actors;`)
for rows.Next() {
	var uid uint32
	var did string
	var followingBytes []byte
	var followersBytes []byte

	rows.Scan(&uid, &did, &followingBytes, &followersBytes)

	followingBM := roaring.NewBitmap()
	followingBM.FromBuffer(followingBytes)

	followersBM := roaring.NewBitmap()
	followersBM.FromBuffer(followersBytes)

	followMap := &FollowMap{
		followingBM: followingBM,
		followersBM: followersBM,
		followingLk: sync.RWMutex{},
		followersLk: sync.RWMutex{},
	}

	g.g.Store(uid, followMap)
	g.setUID(did, uid)
	g.setDID(uid, did)
}

While the service is running, we can also keep track of the UIDs of actors who have added or removed a follow since the last time we saved the DB, allowing us to periodically flush changes to the on-disk SQLite only for bitmaps that have updated. Syncing our data every 5 seconds while tailing the production firehose takes 2ms and writes an average of only ~5MB to disk per flush.

The crazy part of this is, the on-disk representation of our entire follow network is only ~1.6GB! Because we're making use of Roaring's compressed serialized format, we can turn the ~6.5GB of in-memory maps into 1.6GB of on-disk data. Our largest bitmap, the followers of the bsky.app account with over 876k members, becomes ~500KB as a blob stored in SQLite.

So, to wrap up our exploration of Roaring Bitmaps for first-degree graph databases, we saw:
A ~20% reduction in resident memory size compared to HashSets and HashMaps
A ~84% reduction in the on-disk size of the graph compared to an adjacency list
A ~93% reduction in startup time compared to loading from an adjacency list
A ~66% increase in throughput of worst-case requests under load
A ~59% reduction in p99 latency of worst-case requests under load

My next iteration on this problem will likely be to make use of DGraph's in-memory Serialized Roaring Bitmap library that allows you to operate on fully-compressed bitmaps, so there's no need to serialize and deserialize them when reading from or writing to disk. It probably results in significant memory savings as well!

If you're interested in solving problems like these, take a look at our open Backend Developer Job Rec. You can find me on Bluesky here; you can chat about this post here.

a year ago 34 votes

More in AI

A guide to understanding AI as normal technology

And a big change for this newsletter

11 hours ago 5 votes
Pluralistic: Trump steals $400b from American workers (09 Sep 2025)

Today's links
Trump steals $400b from American workers: You get a noncompete, and you get a noncompete, and you get a noncompete!
Hey look at this: Delights to delectate.
Object permanence: Spying baby-monitors; FBI tests spy-gear at Burning Man; Little Brother optioned by Paramount; Best-paid CEOs have worst-paid workers.
Upcoming appearances: Where to find me.
Recent appearances: Where I've been.
Latest books: You keep readin' em, I'll keep writin' 'em.
Upcoming books: Like I said, I'll keep writin' 'em.
Colophon: All the rest.

Trump steals $400b from American workers (permalink)

Trump's stolen a lot of workers' wages over the years, but this week, he has become history's greatest thief of wages, having directed his FTC to stop enforcing its ban on noncompete "agreements," a move that will cost American workers $400 billion over the next ten years:

https://prospect.org/labor/2025-09-09-trump-lets-bosses-grab-400-billion-worker-pay-noncompete-agreements/

The argument for noncompetes is this: modern industry is IP-intensive, and IP-intensive businesses need noncompetes, otherwise workers will take proprietary information with them when they walk out the door and bring it to a competitor. Who would invest in an IP-intensive firm under those circumstances?

I'll tell you who would: Hollywood and Silicon Valley. These are the two most IP-intensive industries in human history, both of which were incubated in California, a state whose constitution prohibits noncompetes and has done so through the entire history of those two industries.

Indeed, we wouldn't have a Silicon Valley if California had noncompetes. Silicon Valley was founded by William Shockley, who won the Nobel Prize for his role in inventing the silicon transistor (hence Silicon Valley). Shockley was a paranoid, virulent racist who couldn't produce a working chip because he was consumed by eugenic fervor and spent all his time on the road offering shares of his Nobel prize money to Black women who would agree to have their tubes tied. Lucky for (literally) everyone (except William Shockley), California doesn't have noncompetes, so eight of his top engineers ("The Traitorous Eight") were able to quit Shockley Semiconductor and start the first successful chip business: Fairchild Semiconductor. And then two of Fairchild's top engineers quit to found Intel:

https://pluralistic.net/2021/10/24/the-traitorous-eight-and-the-battle-of-germanium-valley/

It's not just Silicon Valley that's rooted in wresting IP away from asshole control-freaks: that's Hollywood's story, too. Ever wonder how it was that movies were invented at Edison Labs in New Jersey, but the film industry was incubated in California, literally as far away from Edison as you could possibly get without ending up in Mexico? In short: California got the motion picture industry because Edison was an asshole who used his patents to control what kinds of movies could be made and to suck rents out of filmmakers to license those patents.
So the most ambitious filmmakers in America fled to California, where Edison couldn't easily enforce his patents, and founded Hollywood:

https://www.nytimes.com/2005/08/21/weekinreview/lala-land-the-origins.html?unlocked_article_code=1.kk8.5T1M.VSaEsN5Vn9tM&smid=url-share

And Hollywood stayed in California, a place where noncompetes couldn't be enforced, where "IP" could hop from one studio to another, smuggled out between the ears of writers, actors, directors, SFX wizards, prop makers, scene painters, makeup artists, costumers, and the most creative professionals in Hollywood: accountants.

Empirically speaking, the function of noncompetes is to trap good workers and good ideas in companies controlled by asshole bosses who can't get anything done. Any disinvestment that can be attributed to the absence of noncompetes is completely swamped by the dividends generated by good workers and good ideas escaping from control-freak asshole bosses and founding productive firms. As ever, money talks and bullshit walks.

Today, one in 18 US workers is trapped by a noncompete, and those aren't the knowledge workers of Silicon Valley or Hollywood. So who is captured by this form of contractual indenture? The median US worker under a noncompete is a fast-food worker stuck with the tipped minimum wage, or a pet groomer making the regular minimum wage. The function of the noncompete in America isn't to secure investment for knowledge-intensive industries – it's to stop the cashier at Wendy's from getting an extra $0.25/hour working the fry-trap at the McDonald's across the street.

Noncompetes are an integral part of the conservative project, which is the substitution of individual power for democratic choice. As Dan Savage puts it, the GOP agenda is "Husbands you can't leave [ed: ending no-fault divorce], pregnancies you can't prevent or terminate [ed: banning contraception and abortion], politicians you can't vote out of office [ed: gerrymandering and voter suppression]." Add to that: jobs you can't quit.

It's not just noncompetes that lock workers to shitty bosses. When Biden's FTC investigated the issue, they revealed a widespread practice called "training repayment agreement provisions" (TRAPs) that puts workers on the hook for thousands of dollars if they quit or get fired:

https://pluralistic.net/2022/08/04/its-a-trap/#a-little-on-the-nose

A TRAPped worker – often a pet-groomer at a private equity-owned giant like Petsmart – is charged $5,500 or more for three weeks of "training" that actually amounts to one or two weeks of sweeping up pet-hair. But if they leave or get fired in the next three years, they have to pay back that whole amount:

https://pluralistic.net/2022/08/04/its-a-trap/#a-little-on-the-nose

A closely related concept is "bondage fees," which have been imposed on whole classes of workers, like doormen in NYC apartment buildings:

https://pluralistic.net/2023/04/21/bondage-fees/#doorman-building

These fees trap workers in dead-end jobs by forcing anyone who hires them away to pay massive fees to their former employers. It's just another way to lock workers to businesses.

The irony here is that conservatives claim to worship "voluntarism" and "free choice," and insist that the virtue of markets is that they "aggregate price signals" so that companies can respond to these signals by efficiently matching demand to supply.
But though conservatives say they worship free choice as an engine of economic efficiency, they understand that their ideas are so unpopular that they can only succeed if people are coerced into adopting them, hence voter suppression, gerrymandering, noncompetes, and other heads-I-win/tails-you-lose propositions.

Noncompetes aren't about preventing the loss of IP – they're about preventing the loss of process knowledge, the know-how to turn ideas into products and services. Bosses love IP, because it can be alienated, hoarded and sold, while process knowledge is ineluctably vested in the bodies, minds and relations of workers. No IP law can keep employees from taking process knowledge with them on their way out the door, so bosses want to ban them from leaving:

https://pluralistic.net/2025/09/08/process-knowledge/#dance-monkey-dance

Biden's FTC banned noncompetes nationwide, for nearly every category of employment, deeming them an "unfair method of competition":

https://www.ftc.gov/news-events/news/press-releases/2023/03/ftc-extends-public-comment-period-its-proposed-rule-ban-noncompete-clauses-until-april-19

FTC economists estimated that killing noncompetes would result in $400b in wage gains for the American workforce over the next decade, as good workers migrated to good bosses. Of course this was challenged by the business lobby, which sued to get the rule overturned. Trump's FTC has not only declined to defend the rule in court, they've also decided to stop trying to enforce it.

Trump is now the king of wage-theft, and MAGA is a relentless engine of enshittification. After all, the thesis of enshittification is that companies make their products and practices worse for suppliers, users and business customers only when they calculate that they can do so without facing punishment – from regulators, competitors, or workers. Trump's regulators are all either comatose or so captured they wear gimpsuits and leashes in public. They're not keeping companies in line. And his antitrust shops have turned into pay-for-play operations, where a $1m payment to a MAGA influencer gets your case dropped:

https://www.thebignewsletter.com/p/an-attempted-coup-at-the-antitrust

Trump neutered the National Labor Relations Board and now he's revived indentured servitude nationwide, formalizing the idea of government-backed jobs you can't quit. If you can't quit your job or vote out your politicians, why wouldn't your boss or your elected representative just relentlessly fuck you over? Not merely for sadism's sake (though sadism undoubtedly plays a part here), but simply to make things better for themselves by making things worse for you? It's exactly the same logic as platform lock-in: once you can't leave, they don't have to keep you happy.

Formalizing the legality of noncompetes will only lead to their monotonic spread. When Antonin Scalia greenlit binding arbitration waivers in consumer contracts, only a tiny number of companies used them, forcing customers to sign away their right to sue them no matter how badly, negligently or criminally they behaved.
Today, binding arbitration has expanded into every kind of contract, even to the point where groovy, open source, decentralized, federated social media platforms are forcing it on their users:

https://pluralistic.net/2025/08/15/dogs-breakfast/#by-clicking-this-you-agree-on-behalf-of-your-employer-to-release-me-from-all-obligations-and-waivers-arising-from-any-and-all-NON-NEGOTIATED-agreements

Same for noncompetes: as private equity rolls up whole sectors – funeral homes, pet groomers, hospices – they will stuff noncompetes into the contracts of every employer in each industry, so no matter where a worker applies for a job, they'll have to sign a noncompete. Why wouldn't they? If workers can't leave, they'll accept worse working conditions and lower pay. The best workers will be stuck with the worst employers.

And despite owing their existence to bans on noncompetes, Silicon Valley and Hollywood will happily cram noncompetes down their workers' throats. If you doubt it, just read up on the "no poach" scandal, where the biggest tech and movie companies entered into a criminal conspiracy not to hire away each others' employees:

https://en.wikipedia.org/wiki/High-Tech_Employee_Antitrust_Litigation

The conservative future, folks: jobs you can't quit, politicians you can't vote out of office, husbands you can't divorce, and pregnancies you can't prevent or terminate.

Hey look at this (permalink)

Nate Silver's big list of grievances https://www.garbageday.email/p/nate-silver-s-big-list-of-grievances
Electronic Dance Music vs. Copyright: Law as Weaponized Culture https://drive.proton.me/urls/TVH0PW4TZ8#EM5VMl1BUlny
Google admits the open web is in 'rapid decline' https://www.theverge.com/news/773928/google-open-web-rapid-decline
Britain Owes Palestine https://www.britainowespalestine.org/
A Dramatic Reading of The Recent New York Times Dispatch from the Hamptons.
https://bsky.app/profile/zohrankmamdani.bsky.social/post/3lyech7chqs2q

Object permanence (permalink)

#20yrsago Crooks take anti-forensic countermeasures https://www.newscientist.com/article/mg18725163-800-television-shows-scramble-forensic-evidence/
#20yrsago Recording industry demands digital radio broadcast flag https://web.archive.org/web/20051018100306/https://www.godwinslaw.org/weblog/archive/2005/09/09/riaas-big-push-to-copy-protect-digital-radio
#20yrsago Unicef/Save the Children sell out to recording industry https://web.archive.org/web/20050914034709/http://www.promusicae.org/pdf/campana_jovenes_musica_e_internet.pdf
#15yrsago TSA forces pregnant traveller into full-body scanner https://web.archive.org/web/20100910235117/https://consumerist.com/2010/09/pregnant-traveler-tsa-screeners-bullied-me-into-full-body-scan.html
#10yrsago Help crowdfund a relentless tsunami of FOIA requests into America's private prisons https://www.muckrock.com/project/the-private-prison-project-8/
#10yrsago Your baby monitor is an Internet-connected spycam vulnerable to voyeurs and crooks https://web.archive.org/web/20210505050810/https://www.rapid7.com/blog/post/2015/09/02/iotsec-disclosure-10-new-vulns-for-several-video-baby-monitors/
#10yrsago Inept copyright bot sends 2600 a legal threat over ink blotches https://www.2600.com/content/2600-accused-using-unauthorized-ink-splotches
#10yrsago FBI used Burning Man to field-test new surveillance equipment https://www.muckrock.com/news/archives/2015/sep/01/burning-man-fbi-file/
#10yrsago Fury Road, hieroglyph edition https://imgur.com/gallery/you-will-ride-eternal-papyrus-chrome-you-will-ride-eternal-papyrus-chrome-BxdOcTr#/t/chrome
#10yrsago Little Brother optioned by Paramount https://www.tracking-board.com/tb-exclusive-paramount-pictures-picks-up-ny-times-bestselling-ya-novel-little-brother/
#10yrsago Record street-marches in Moldova against corrupt oligarchs https://www.euractiv.com/section/europe-s-east/news/moldova-banking-scandal-fuels-biggest-protest-ever/
#5yrsago Germany's amazing new competition proposal https://pluralistic.net/2020/09/09/free-sample/#wunderschoen
#5yrsago DRM versus human rights https://pluralistic.net/2020/09/09/free-sample/#que-viva
#1yrago America's best-paid CEOs have the worst-paid employees https://pluralistic.net/2024/09/09/low-wage-100/#executive-excess

Upcoming appearances (permalink)

Ithaca: Enshittification at Buffalo Street Books, Sept 11 https://buffalostreetbooks.com/event/2025-09-11/cory-doctorow-tcpl-librarian-judd-karlman
Ithaca: AD White keynote (Cornell), Sep 12 https://deanoffaculty.cornell.edu/events/keynote-cory-doctorow-professor-at-large/
Ithaca: Enshittification at Autumn Leaves Books, Sept 13 https://www.autumnleavesithaca.com/event-details/enshittification-why-everything-got-worse-and-what-to-do-about-it
Ithaca: Radicalized Q&A (Cornell), Sept 16 https://events.cornell.edu/event/radicalized-qa-with-author-cory-doctorow
Ithaca: The Counterfeiters (Dinner/Movie Night) (Cornell), Sept 17 https://adwhiteprofessors.cornell.edu/visits/cory-doctorow/
Ithaca: Communication Power, Policy, and Practice (Cornell), Sept 18 https://events.cornell.edu/event/policy-provocations-a-conversation-about-communication-power-policy-and-practice
Ithaca: A Reverse-Centaur's Guide to Being a Better AI Critic (Cornell), Sept 18 https://events.cornell.edu/event/2025-nordlander-lecture-in-science-public-policy
NYC: Enshittification and Renewal (Cornell Tech), Sept 19
https://www.eventbrite.com/e/enshittification-and-renewal-a-conversation-with-cory-doctorow-tickets-1563948454929
NYC: Brooklyn Book Fair, Sept 21 https://brooklynbookfestival.org/event/big-techs-big-heist-cory-doctorow-in-conversation-with-adam-becker/
DC: Enshittification with Rohit Chopra (Politics and Prose), Oct 8 https://politics-prose.com/cory-doctorow-10825
NYC: Enshittification with Lina Khan (Brooklyn Public Library), Oct 9 https://www.bklynlibrary.org/calendar/cory-doctorow-discusses-central-library-dweck-20251009-0700pm
New Orleans: DeepSouthCon63, Oct 10-12 http://www.contraflowscifi.org/
Chicago: Enshittification with Anand Giridharadas (Chicago Humanities), Oct 15 https://www.oldtownschool.org/concerts/2025/10-15-2025-kara-swisher-and-cory-doctorow-on-enshittification/
San Francisco: Enshittification at Public Works (The Booksmith), Oct 20 https://app.gopassage.com/events/doctorow25
Madrid: Conferencia EUROPEA 4D (Virtual), Oct 28 https://4d.cat/es/conferencia/
Miami: Enshittification at Books & Books, Nov 5 https://www.eventbrite.com/e/an-evening-with-cory-doctorow-tickets-1504647263469

Recent appearances (permalink)

Nerd Harder! (This Week in Tech) https://twit.tv/shows/this-week-in-tech/episodes/1047
Techtonic with Mark Hurst https://www.wfmu.org/playlists/shows/155658
Cory Doctorow DESTROYS Enshittification (QAA Podcast) https://soundcloud.com/qanonanonymous/cory-doctorow-destroys-enshitification-e338

Latest books (permalink)

"Picks and Shovels": a sequel to "Red Team Blues," about the heroic era of the PC, Tor Books (US), Head of Zeus (UK), February 2025 (https://us.macmillan.com/books/9781250865908/picksandshovels).
"The Bezzle": a sequel to "Red Team Blues," about prison-tech and other grifts, Tor Books (US), Head of Zeus (UK), February 2024 (the-bezzle.org).
"The Lost Cause:" a solarpunk novel of hope in the climate emergency, Tor Books (US), Head of Zeus (UK), November 2023 (http://lost-cause.org).
"The Internet Con": A nonfiction book about interoperability and Big Tech (Verso) September 2023 (http://seizethemeansofcomputation.org). Signed copies at Book Soup (https://www.booksoup.com/book/9781804291245).
"Red Team Blues": "A grabby, compulsive thriller that will leave you knowing more about how the world works than you did before." Tor Books http://redteamblues.com.
"Chokepoint Capitalism: How to Beat Big Tech, Tame Big Content, and Get Artists Paid," with Rebecca Giblin, on how to unrig the markets for creative labor, Beacon Press/Scribe 2022 https://chokepointcapitalism.com

Upcoming books (permalink)

"Canny Valley": A limited edition collection of the collages I create for Pluralistic, self-published, September 2025
"Enshittification: Why Everything Suddenly Got Worse and What to Do About It," Farrar, Straus, Giroux, October 7 2025 https://us.macmillan.com/books/9780374619329/enshittification/
"Unauthorized Bread": a middle-grades graphic novel adapted from my novella about refugees, toasters and DRM, FirstSecond, 2026
"Enshittification, Why Everything Suddenly Got Worse and What to Do About It" (the graphic novel), FirstSecond, 2026
"The Memex Method," Farrar, Straus, Giroux, 2026
"The Reverse-Centaur's Guide to AI," a short book about being a better AI critic, Farrar, Straus and Giroux, 2026

Colophon (permalink)

Today's top sources:

Currently writing: "The Reverse Centaur's Guide to AI," a short book for Farrar, Straus and Giroux about being an effective AI critic. FIRST DRAFT COMPLETE AND SUBMITTED.
A Little Brother short story about DIY insulin PLANNING

This work – excluding any serialized fiction – is licensed under a Creative Commons Attribution 4.0 license. That means you can use it any way you like, including commercially, provided that you attribute it to me, Cory Doctorow, and include a link to pluralistic.net.

https://creativecommons.org/licenses/by/4.0/

Quotations and images are not included in this license; they are included either under a limitation or exception to copyright, or on the basis of a separate license. Please exercise caution.

How to get Pluralistic:

Blog (no ads, tracking, or data-collection): Pluralistic.net
Newsletter (no ads, tracking, or data-collection): https://pluralistic.net/plura-list
Mastodon (no ads, tracking, or data-collection): https://mamot.fr/@pluralistic
Medium (no ads, paywalled): https://doctorow.medium.com/
Twitter (mass-scale, unrestricted, third-party surveillance and advertising): https://twitter.com/doctorow
Tumblr (mass-scale, unrestricted, third-party surveillance and advertising): https://mostlysignssomeportents.tumblr.com/tagged/pluralistic

"When life gives you SARS, you make sarsaparilla" -Joey "Accordion Guy" DeVilla

READ CAREFULLY: By reading this, you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.

ISSN: 3066-764X

6 hours ago 1 votes
AI Roundup 134: The young and the jobless

September 5, 2025.

5 days ago 7 votes
Pluralistic: Canny Valley (04 Sep 2025)

Today's links
Canny Valley: My little art-book is here!
Hey look at this: Delights to delectate.
Object permanence: Ballmer throws a chair; Bruce Sterling on Singapore; RIP David Graeber; Big Car warns of lethal Right to Repair.
Upcoming appearances: Where to find me.
Recent appearances: Where I've been.
Latest books: You keep readin' em, I'll keep writin' 'em.
Upcoming books: Like I said, I'll keep writin' 'em.
Colophon: All the rest.

Canny Valley (permalink)

I've spent every evening this week painstakingly unpacking, numbering and signing 500 copies of my very first art-book, a strange and sturdy little volume called Canny Valley. Canny Valley collects 80 of the best collages I've made for my Pluralistic newsletter, where I publish 5-6 essays every week, usually headed by a strange, humorous and/or grotesque image made up of public domain sources and Creative Commons works. These images are made from open access sources, and they are themselves open access, licensed Creative Commons Attribution Share-Alike, which means you can take them, remix them, even sell them, all without my permission.

I never thought I'd become a visual artist, but as I've grappled with the daily challenge of figuring out how to illustrate my furious editorials about contemporary techno-politics, especially "enshittification," I've discovered a deep satisfaction from my deep dives into historical archives of illustration, and, of course, the remixing that comes afterward.

Over the years, many readers have asked whether I would ever collect these in a book. Then I ran into Creative Commons CEO Anna Tumadóttir and we brainstormed ideas for donor gifts in honor of Creative Commons' 25th anniversary. My first novel was the first book ever released under a CC license, and while CC has gone on to bigger and better things (without CC there'd be no Wikipedia!), I never forget that my own artistic career and CC's trajectory are co-terminal:

https://craphound.com/down/download/

Talking with Anna, I hit on the idea of making a beautiful little book of my favorite illustrations from Pluralistic. Anna thought CC could use about 400 of these, and all the printers I talked to offered me a pretty great quantity break at 500, so I decided I'd do it, and offer the excess 100 copies as premiums in my next Kickstarter, for the enshittification book:

https://www.kickstarter.com/projects/doctorow/enshittification-the-drm-free-audiobook/

That Kickstarter is going really well – about to break $100,000! – and as I type these words, there are only five copies of Canny Valley up for grabs. I'm pretty sure they'll be gone long before the campaign closes in ten days. Of course, the fact that you can't get a physical copy of the book doesn't mean that you can't get access to all its media. Here's the full set of all 238 collages, in high-rez, for your plundering pleasure:

https://www.flickr.com/photos/doctorow/albums/72177720316719208

But there is one part of this book that's not online: my pal and mentor Bruce Sterling, a cyberpunk legend turned electronic art impresario turned assemblage sculptor, wrote me a brilliant foreword for Canny Valley. Bruce gave me the go-ahead to license this CC BY 4.0 as well, and so I'm reproducing it below. Having spent several days now handling hundreds of these books, I have to say, I am indecently pleased with how they turned out, which is all down to other people.
My friend John Berry, a legendary book designer and typographer, laid it out:

https://johndberry.com/

And the folks at LA's best comics shop, Secret Headquarters, hooked me up with an incredible printer, the 100+ year old Pasadena institution Typecraft:

https://www.typecraft.com/live2/who-we-are.html

Typecraft ran this on a gorgeous Indigo printer on 100lb Mohawk paper that just drank the ink. The PVA glue in the binding will last a century, and the matte coat cover doesn't pick up smudges or fingerprints. It's a stunning little artifact. This has been so much fun (and such a success) that I imagine I'll do future volumes in the years to come. In the meantime, enjoy Bruce's intro, and join me in basking in the fact that "enshittification" has made Webster's:

https://bsky.app/profile/merriam-webster.com/post/3lxxhhxo4nc2e

INTRODUCTION by Bruce Sterling

In 1970 a robotics professor named Masahiro Mori discovered a new problem in aesthetics. He called this "bukimi no tani genshō." The Japanese robots he built were functional, so the "bukimi no tani" situation was not an engineering problem. It was a deep and basic problem in the human perception of humanlike androids.

Humble assembly robots, with their claws and swivels, those looked okay to most people. Dolls, puppets and mannequins, those also looked okay. Living people had always aesthetically looked okay to people. Especially, the pretty ones. However, between these two realms that the late Dr Mori was gamely attempting to weld together — the world of living mankind and of the pseudo-man-like machine — there was an artistic crevasse. Anything in this "Uncanny Valley" looked, and felt, severely not-okay. These overdressed robots looked and felt so eerie that their creator's skills became actively disgusting. The robots got prettier, but only up to a steep verge. Then they slid down the precipice and became zombie doppelgangers.

That's also the issue with the aptly-titled "Canny Valley" art collection here. People already know how to react aesthetically to traditional graphic images. Diagrams are okay. Hand-drawn sketches and cartoons are also okay. Brush-made paintings are mostly fine. Photographs, those can get kind of dodgy. Digital collages that slice up and weld highly disparate elements like diagrams, cartoons, sketches and also photos and paintings, those trend toward the uncanny. The pixel-juggling means of digital image-manipulation are not art-traditional pencils or brushes. They do not involve the human hand, or maybe not even the human eye, or the human will. They're not fixed on paper or canvas; they're a Frankenstein mash-up landscape of tiny colored screen-dots where images can become so fried that they look and feel "cursed." They're conceptually gooey congelations, stuck in the valley mire of that which is and must be neither this-nor-that.

A modern digital artist has billions of jpegs in files, folders, clouds and buckets. He's never gonna run out of weightless grist from that mill. Why would Cory Doctorow — novelist, journalist, activist, opinion columnist and so on — want to lift his typing fingers from his lettered keyboard, so as to create graphics with cut-and-paste and "lasso tools"?
I think there are two basic reasons for this. The important motivation is his own need to express himself by some method other than words.

I'm reminded here of the example of H. G. Wells, another science fiction writer turned internationally famous political pundit. H. G. Wells was quite a tireless and ambitious writer — so much so that he almost matched the torrential output of Cory Doctorow. But H. G. Wells nevertheless felt a compelling need to hand-draw cartoons. He called them "picshuas."

These hundreds of "picshuas" were rarely made public. They were usually sketched in the margins of his hand-written letters. Commonly the picshuas were aimed at his second wife, the woman he had renamed "Jane." These picshuas were caricatures, or maybe rapid pen-and-ink conceptual outlines, of passing conflicts, events and situations in the life of Wells. They seemed to carry tender messages to Jane that the writer was unable or unwilling to speak aloud to her.

Wells being Wells, there were always issues in his private life that might well pose a challenge to bluntly state aloud: "Oh by the way, darling, I've built a second house in the South of France where I spend my summers with a comely KGB asset, the Baroness Budberg." Even a famously glib and charming writer might feel the need to finesse that.

Cory Doctorow also has some remarkably tangled, scandalous and precarious issues to contemplate, summarize and discuss. They're not his scandalous private intrigues, though. Instead, they're scandalous public intrigues. Or, at least Cory struggles to rouse some public indignation about these intrigues, because his core topics are the tangled penthouse/slash/underground machinations of billionaire web moguls. Cory really knows a deep dank lot about this uncanny nexus of arcane situations. He explains the shameful disasters there, but they're difficult to capture without torrents of unwieldy tech jargon.

So instead, he diligently clips, cuts, pastes, lassos, collages and pastiches. He might, plausibly, hire a professional artist to design his editorial cartoons for him. However, then Cory would have to verbally explain all his political analysis to this innocent graphics guy. Then Cory would also have to double-check the results of the artist and fix the inevitable newbie errors and grave misunderstandings. That effort would be three times the labor for a dogged crusader who is already working like sixty.

It's more practical for him to mash-up images that resemble editorial cartoons. He can't draw. Also, although he definitely has a pronounced sense of aesthetics, it's not an aesthetic most people would consider tasteful. Cory Doctorow, from his very youth, has always had a "craphound" aesthetic. As an aesthete, Cory is the kind of guy who would collect rain-drenched punk-band flyers that had fallen off telephone poles and store them inside a 1950s cardboard kid-cereal box. I am not scolding him for this. He's always been like that.

As Wells used to say about his unique "picshuas," they seemed like eccentric scribblings, but over the years, when massed-up as an oeuvre, they formed a comic burlesque of an actual life.
Similarly, one isolated Doctorow collage can seem rather what-the-hell. It's trying to be "canny." If you get it, you get it. If you don't get the first one, then you can page through all of these, and at the end you will probably get it. En masse, it forms the comic burlesque of a digital left-wing cyberspatial world-of-hell. A monster-teeming Silicon Uncanny Valley of extensively raked muck.

[Image: Sigmund Freud's study with his famous couch. Behind the couch stands an altered version of the classic Freud portrait in which he is smoking a cigar. Freud's clothes and cigar have all been tinted in bright neon colors. His head has been replaced with the glaring red eye of HAL9000 from Kubrick's '2001: A Space Odyssey.' His legs have been replaced with a tangle of tentacles. Credits: Cryteria (modified), https://commons.wikimedia.org/wiki/File:HAL9000.svg, CC BY 3.0 (https://creativecommons.org/licenses/by/3.0/deed.en); Ser Amantio di Nicolao (modified), https://commons.wikimedia.org/wiki/File:Study_with_the_couch,_Freud_Museum_London,_18M0143.jpg, CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/deed)]

There are a lot of web-comix people who like to make comic fun of the Internet, and to mock "the Industry." However, there's no other social and analytical record quite like this one. It has something of the dark affect of the hundred-year-old satirical Dada collages of Georg Schultz or Hannah Höch. Those Dada collages look dank and horrible because they're "Dada" and pulling a stunt. These images look dank and horrible because they're analytical, revelatory and make sense.

If you do not enjoy contemporary electronic politics, and instead you have somehow obtained an art degree, I might still be able to help you with my learned and well-meaning intro here. I can recommend a swell art-critical book titled "Memesthetics" by Valentina Tanni. I happen to know Dr. Tanni personally, and her book is the cat's pyjamas when it comes to semi-digital, semi-collage, appropriated, Situationiste-detournement, net.art "meme aesthetics." I promise that I could robotically mimic her, and write uncannily like her, if I somehow had to do that. I could even firmly link the graphic works of Cory Doctorow to the digital avant-garde and/or digital folk-art traditions that Valentina Tanni is eruditely and humanely discussing. Like with a lot of robots, the hard part would be getting me to stop.

Cory works with care on his political meme-cartoons — because he is using them to further his own personal analysis, and to personally convince himself. They're not merely sharp and partisan memes, there to rouse one distinct viewer-emotion and make one single point. They're like digital jigsaw-puzzle landscape-sketches — unstable, semi-stolen and digital, because the realm he portrays is itself also unstable, semi-stolen and digital.

The cartoons are dirty and messy because the situations he tackles are so dirty and messy. That's the grain of his lampoon material, like the damaged amps in a punk song. A punk song that was licensed by some billionaire and then used to spy on hapless fans with surveillance-capitalism. Since that's how it goes, that's also what you're in for. You have been warned, and these collages will warn you a whole lot more.

If you want to aesthetically experience some elegant, time-tested collage art that was created by a major world artist, then you should gaze in wonder at the Max Ernst masterpiece, "Une semaine de bonté" ("A Week of Kindness").
This indefinable "collage novel" aka "artist's book" was created in the troubled time of 1934. It's very uncanny rather than "canny," and it's also capital-A great Art. As an art critic, I could balloon this essay to dreadful robotic proportions while I explain to you in detail why this weirdo mess is a lasting monument to the expressive power of collage.

However, Cory Doctorow is not doing Max Ernst's dreamy, oneiric, enchanting Surrealist art. He would never do that and it wouldn't make any sense if he did. Cory did this instead. It is art, though. It is what it is, and there's nothing else like it. It's artistic expression as Cory Doctorow has a sincere need to perform it, and in twenty years it will be even more rare and interesting. It's journalism ahead of its time (a little), and with the passage of time, it will become testimonial.

Bruce Sterling — Ibiza MMXXV

Hey look at this (permalink)

Twitter users on Enshittification https://x.com/search?q=https%3A%2F%2Ftwitter.com%2FMerriamWebster%2Fstatus%2F1963336587712057346&src=typed_query&f=live
Introducing Structural Zero: a New Monthly Newsletter https://hrdag.org/introducing-structural-zero-a-new-monthly-newsletter/
70 leading Canadians, civil society groups ask Carney to protect Canada's 'digital sovereignty' https://www.cbc.ca/news/politics/open-letter-mark-carney-digital-sovereignty-1.7623128
AI Darwin Awards https://aidarwinawards.org/
Kraft Heinz went all-in on scale. Now it's banking on a breakup to save its business https://www.cnn.com/2025/09/03/business/kraft-heinz-nightcap

Object permanence (permalink)

#20yrsago Singapore's cool-ass hard-drive video-players https://memex.craphound.com/2005/09/03/singapores-cool-ass-hard-drive-video-players/
#20yrsago Being Poor — meditation by John Scalzi https://whatever.scalzi.com/2005/09/03/being-poor/
#20yrsago MSFT CEO: I will "fucking kill" Google — then he threw a chair https://battellemedia.com/archives/2005/09/ballmer_throws_a_chair_at_fing_google
#20yrsago Massachusetts to MSFT: switch to open formats or you're fired https://web.archive.org/web/20051001011728/http://www.boston.com/business/technology/articles/2005/09/02/state_may_drop_office_software/
#20yrsago Bruce Sterling's Singapore wrapup https://web.archive.org/web/20051217133502/https://wiredblogs.tripod.com/sterling/index.blog?entry_id=1211240
#20yrsago Apple //e mainboards networked and boxed: the Applecrate https://web.archive.org/web/20050407173742/http://members.aol.com/MJMahon/CratePaper.html
#15yrsago Jewelry made from laminated, polished cross-sections of books https://littlefly.co.uk/
#15yrsago Boneless, clubfooted French Connection model invades Melbourne https://www.flickr.com/photos/doctorow/4953586953/
#5yrsago Corporate spooks track you "to your door" https://pluralistic.net/2020/09/03/rip-david-graeber/#hyas
#5yrsago Hedge fund managers trouser 64% https://pluralistic.net/2020/09/03/rip-david-graeber/#2-and-20
#5yrsago Rest in Power, David Graeber https://pluralistic.net/2020/09/03/rip-david-graeber/#rip-david-graeber
#5yrsago Coronavirus is over (if we want it) https://pluralistic.net/2020/09/03/rip-david-graeber/#test-test-test
#5yrsago Snowden vindicated https://pluralistic.net/2020/09/03/rip-david-graeber/#criming-spooks
#5yrsago Algorithmic grading https://pluralistic.net/2020/09/03/rip-david-graeber/#computer-says-no
#5yrsago Big Car says Right to Repair will MURDER YOU https://pluralistic.net/2020/09/03/rip-david-graeber/#rolling-surveillance-platforms

Upcoming appearances (permalink)
Ithaca: AD White keynote (Cornell), Sep 12 https://deanoffaculty.cornell.edu/events/keynote-cory-doctorow-professor-at-large/
DC: Enshittification at Politics and Prose, Oct 8 https://politics-prose.com/cory-doctorow-10825
NYC: Enshittification with Lina Khan (Brooklyn Public Library), Oct 9 https://www.bklynlibrary.org/calendar/cory-doctorow-discusses-central-library-dweck-20251009-0700pm
New Orleans: DeepSouthCon63, Oct 10-12 http://www.contraflowscifi.org/
Chicago: Enshittification with Anand Giridharadas (Chicago Humanities), Oct 15 https://www.oldtownschool.org/concerts/2025/10-15-2025-kara-swisher-and-cory-doctorow-on-enshittification/
San Francisco: Enshittification at Public Works (The Booksmith), Oct 20 https://app.gopassage.com/events/doctorow25
Madrid: Conferencia EUROPEA 4D (Virtual), Oct 28 https://4d.cat/es/conferencia/
Miami: Enshittification at Books & Books, Nov 5 https://www.eventbrite.com/e/an-evening-with-cory-doctorow-tickets-1504647263469

Recent appearances (permalink)

Nerd Harder! (This Week in Tech) https://twit.tv/shows/this-week-in-tech/episodes/1047
Techtonic with Mark Hurst https://www.wfmu.org/playlists/shows/155658
Cory Doctorow DESTROYS Enshittification (QAA Podcast) https://soundcloud.com/qanonanonymous/cory-doctorow-destroys-enshitification-e338

Latest books (permalink)

"Picks and Shovels": a sequel to "Red Team Blues," about the heroic era of the PC, Tor Books (US), Head of Zeus (UK), February 2025 (https://us.macmillan.com/books/9781250865908/picksandshovels).
"The Bezzle": a sequel to "Red Team Blues," about prison-tech and other grifts, Tor Books (US), Head of Zeus (UK), February 2024 (the-bezzle.org).
"The Lost Cause": a solarpunk novel of hope in the climate emergency, Tor Books (US), Head of Zeus (UK), November 2023 (http://lost-cause.org).
"The Internet Con": a nonfiction book about interoperability and Big Tech (Verso), September 2023 (http://seizethemeansofcomputation.org). Signed copies at Book Soup (https://www.booksoup.com/book/9781804291245).
"Red Team Blues": "A grabby, compulsive thriller that will leave you knowing more about how the world works than you did before." Tor Books http://redteamblues.com.
"Chokepoint Capitalism: How to Beat Big Tech, Tame Big Content, and Get Artists Paid," with Rebecca Giblin, on how to unrig the markets for creative labor, Beacon Press/Scribe 2022 https://chokepointcapitalism.com

Upcoming books (permalink)

"Canny Valley": a limited edition collection of the collages I create for Pluralistic, self-published, September 2025
"Enshittification: Why Everything Suddenly Got Worse and What to Do About It," Farrar, Straus, Giroux, October 7 2025 https://us.macmillan.com/books/9780374619329/enshittification/
"Unauthorized Bread": a middle-grades graphic novel adapted from my novella about refugees, toasters and DRM, FirstSecond, 2026
"Enshittification: Why Everything Suddenly Got Worse and What to Do About It" (the graphic novel), FirstSecond, 2026
"The Memex Method," Farrar, Straus, Giroux, 2026
"The Reverse-Centaur's Guide to AI," a short book about being a better AI critic, Farrar, Straus and Giroux, 2026

Colophon (permalink)

Today's top sources:

Currently writing:
"The Reverse Centaur's Guide to AI," a short book for Farrar, Straus and Giroux about being an effective AI critic. FIRST DRAFT COMPLETE AND SUBMITTED.
A Little Brother short story about DIY insulin. PLANNING

This work – excluding any serialized fiction – is licensed under a Creative Commons Attribution 4.0 license.
That means you can use it any way you like, including commercially, provided that you attribute it to me, Cory Doctorow, and include a link to pluralistic.net.

https://creativecommons.org/licenses/by/4.0/

Quotations and images are not included in this license; they are included either under a limitation or exception to copyright, or on the basis of a separate license. Please exercise caution.

How to get Pluralistic:

Blog (no ads, tracking, or data-collection): Pluralistic.net
Newsletter (no ads, tracking, or data-collection): https://pluralistic.net/plura-list
Mastodon (no ads, tracking, or data-collection): https://mamot.fr/@pluralistic
Medium (no ads, paywalled): https://doctorow.medium.com/
Twitter (mass-scale, unrestricted, third-party surveillance and advertising): https://twitter.com/doctorow
Tumblr (mass-scale, unrestricted, third-party surveillance and advertising): https://mostlysignssomeportents.tumblr.com/tagged/pluralistic

"When life gives you SARS, you make sarsaparilla" -Joey "Accordion Guy" DeVilla

READ CAREFULLY: By reading this, you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.

ISSN: 3066-764X
