
More from Notes on software development

Burn your title

I've been a developer, a manager, a cofounder, and now I'm a developer again. I ran away from each position until being a founder because I felt like I was limited by what I was allowed to do. But I reached an enlightenment of sorts during my career progression: everyone around me was dying for someone to pick things up, for employees to show engagement and agency.

We think of our titles as our limits. We're quick to say and believe, "that isn't my job," while in reality titles reflect the minimum expected of us, not the maximum that is open to us.

Moving your career forward

Trying to figure out what (new minimum) you must do to get promoted seems kind of backwards to me, reinforcing our sense of our own limits. Instead, at every stage in your career, focus on doing the intersection of:

- what you see needs to be done (that isn't being done)
- what you are capable of doing
- what you have the desire/energy to do (or would find fulfilling)

This is the path to promotion and a successful and interesting career. Burn your title. Burn your job description. I mean, keep your boss happy, for sure. Keep your teammates happy by supporting them, building them up, and communicating well. But don't wait to be officially made a lead or given a new title to do what otherwise fits into that intersection above.

And if, after doing this for some time and demonstrating this level of agency, you are not promoted, it just means you're not at the right company or the right organization within your company, and you should look elsewhere. What's more, the work you did (at a company that doesn't appreciate your agency, if that happens to be the case) only makes the case stronger for your successful interview at the next company. There's no downside.

The cynical, and perhaps realistic, alternative to this is to play politics to get promoted. Or to avoid politics but do things that don't align with your long-term goals. I'm not personally interested in either path, so I'm not covering them here. I'm interested in the intersection of things that move me in the direction I want, things that are useful to the company, and things that I am capable of doing (in addition to whatever minimum work I must actually do).

Examples

Here's a peek at what this looks like for me as an individual contributor, a programmer, at EnterpriseDB.

- I started the EDB Engineering Newsletter because it seemed like we needed to do a better job telling the world the awesome things our engineering team is doing. (You know we're one of the biggest contributors to Postgres? Bruce Momjian, Robert Haas, Peter Eisentraut, etc. work here? The guy who implemented the WAL and MVCC in Postgres is my teammate?) Nobody asked me to do that.
- I started publishing blog views for the entire company once a month internally. Nobody asked me to do that.
- I wrote a number of internal docs and tutorials on the product because we were just obviously missing them. Nobody asked me to do that.
- I started a fortnightly incident review meeting for my team because it seemed like we were missing chances to update docs and teach each other. Nobody asked me to do that.
- I write a bunch of random posts for the company blog on what I've learned. Nobody asked me to do that.

These are just a few of the random things that seemed like a good idea for me to do on top of my Actual Work as a developer, which I think I do a decent job of on its own.

In closing

Don't burn out. Don't do things you aren't asked to do and don't find rewarding, or that won't pave the way toward the career you want. I'm trying to be very careful not to advocate anything along those lines. But also don't wait to be asked to do something. Do what is interesting and obvious and rewarding to you. Interesting opportunities seem to come most reliably when you make them for yourself.

Burn your title pic.twitter.com/4bQRPMX4EZ — Phil Eaton (@eatonphil) April 22, 2025

2 months ago 25 votes
Transactions are a protocol

Transactions are not an intrinsic part of a storage system. Any storage system can be made transactional: Redis, S3, the filesystem, etc. Delta Lake and Orleans demonstrated techniques to make S3 (or cloud storage in general) transactional. Epoxy demonstrated techniques to make Redis (and any other system) transactional. And of course there's always good old Two-Phase Commit.

If you don't want to read those papers, I wrote about a simplified implementation of Delta Lake and also wrote about a simplified MVCC implementation over a generic key-value storage layer.

It is both the beauty and the burden of transactions that they are not intrinsic to a storage system. Postgres and MySQL and SQLite have transactions. But you don't need to use them. It isn't possible to require you to use transactions. Many developers, myself a few years ago included, do not know why you should use them. (Hint: read Designing Data Intensive Applications.)

And you can take it even further by ignoring the transaction layer of an existing transactional database and implementing your own transaction layer, as Convex has done (the Epoxy paper above also does this). It isn't entirely clear that you have a lot to lose by implementing your own transaction layer, since the indexes you'd want on the version field of a value would only be as expensive or slow as any other secondary index in a transactional database. Though why you'd do this isn't entirely clear either (I would like to read about this from Convex some time).

It's useful to see transaction protocols as another tool in your system design tool chest when you care about consistency, atomicity, and isolation, especially as you build systems that span data systems. Maybe, as Ben Hindman hinted at the last NYC Systems, even proprietary APIs will eventually provide something like two-phase commit so physical systems outside our control can become transactional too.

Transactions are a protocol short new post pic.twitter.com/nTj5LZUpUr — Phil Eaton (@eatonphil) April 20, 2025
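To make the "transactions are a layer, not a storage feature" idea concrete, here is a minimal sketch in Go of a versioned transaction protocol over a generic key-value store. This is not the implementation from the posts or papers above: the names (kvStore, db, txn) are made up for illustration, everything lives in memory, only write-write conflicts are detected (roughly snapshot isolation with first-committer-wins), and durability and garbage collection of old versions are ignored.

```go
// A sketch of a transaction layer over a plain key-value store.
package main

import (
	"errors"
	"fmt"
	"sync"
)

// kvStore is the "dumb" storage layer: flat string keys to string values.
// Versioning is layered on top of it by the transaction protocol below.
type kvStore map[string]string

// db hands out snapshots and commit versions over the kvStore.
type db struct {
	mu       sync.Mutex
	store    kvStore
	next     int              // next commit version to hand out
	versions map[string][]int // committed versions per logical key, ascending
}

func newDB() *db {
	return &db{store: kvStore{}, next: 1, versions: map[string][]int{}}
}

// txn buffers writes and reads from a fixed snapshot version.
type txn struct {
	db      *db
	readVer int               // highest committed version visible to this txn
	writes  map[string]string // buffered writes, applied at commit
}

func (d *db) begin() *txn {
	d.mu.Lock()
	defer d.mu.Unlock()
	return &txn{db: d, readVer: d.next - 1, writes: map[string]string{}}
}

func (t *txn) set(key, value string) { t.writes[key] = value }

// get returns the value with the greatest version at or below our snapshot.
func (t *txn) get(key string) (string, bool) {
	if v, ok := t.writes[key]; ok {
		return v, true
	}
	t.db.mu.Lock()
	defer t.db.mu.Unlock()
	vers := t.db.versions[key]
	for i := len(vers) - 1; i >= 0; i-- {
		if vers[i] <= t.readVer {
			v, ok := t.db.store[fmt.Sprintf("%s@%d", key, vers[i])]
			return v, ok
		}
	}
	return "", false
}

// commit aborts if any key we wrote gained a newer committed version since
// our snapshot; otherwise it writes all buffered values at a fresh version.
func (t *txn) commit() error {
	t.db.mu.Lock()
	defer t.db.mu.Unlock()
	for key := range t.writes {
		if vers := t.db.versions[key]; len(vers) > 0 && vers[len(vers)-1] > t.readVer {
			return errors.New("conflict on key " + key)
		}
	}
	ver := t.db.next
	t.db.next++
	for key, value := range t.writes {
		t.db.store[fmt.Sprintf("%s@%d", key, ver)] = value
		t.db.versions[key] = append(t.db.versions[key], ver)
	}
	return nil
}

func main() {
	d := newDB()

	t1 := d.begin()
	t1.set("x", "1")
	if err := t1.commit(); err != nil {
		panic(err)
	}

	t2 := d.begin() // snapshot taken after t1 committed
	if v, ok := t2.get("x"); ok {
		fmt.Println("t2 sees x =", v)
	}
}
```

The storage layer stays a dumb key-value map; all of the transactional behavior lives in the protocol that decides which versioned keys a reader may see and when a commit is allowed.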

2 months ago 28 votes
Things that go wrong with disk IO

There are a few interesting scenarios to keep in mind when writing applications (not just databases!) that interact with reading and writing files, particularly in transactional contexts where you actually care about the integrity of the data and when you are editing data in place (versus copy-on-write, for example). If I don't say otherwise, I'm talking about behavior on Linux.

The research version of this blog post is Parity Lost and Parity Regained and Characteristics, Impact, and Tolerance of Partial Disk Failures. These two papers also go into the frequency of some of the issues discussed here. These behaviors actually happen in real life!

Thank you to Alex Miller and George Xanthakis for reviewing a draft of this post.

Terminology

Some of these terms are reused in different contexts, and sometimes they are reused because they effectively mean the same thing in a certain configuration. But I'll try to be explicit to avoid confusion.

Sector: The smallest amount of data that can be read and written atomically by hardware. It used to be 512 bytes, but on modern disks it is often 4KiB. There doesn't seem to be any safe assumption you can make about sector size, despite file system defaults (see below). You must check your disks to know.

Block (filesystem/kernel view): Typically set to the sector size, since only this block size is atomic. The default in ext4 is 4KiB.

Page (kernel view): A disk block that is in memory. Any read or write smaller than a block will pull the entire block into kernel memory, even if less than that amount is sent back to userland.

Page (database/application view): The smallest amount of data the system (database, application, etc.) chooses to act on when it's read, written, or held in memory. The page size is some multiple of the filesystem/kernel block size (including the multiple being 1). SQLite's default page size is 4KiB. MySQL's default page size is 16KiB. Postgres's default page size is 8KiB.

Things that go wrong

The data didn't reach disk

By default, file writes succeed when the data is copied into kernel memory (buffered IO). The man page for write(2) says:

"A successful return from write() does not make any guarantee that data has been committed to disk. On some filesystems, including NFS, it does not even guarantee that space has successfully been reserved for the data. In this case, some errors might be delayed until a future write(), fsync(2), or even close(2). The only way to be sure is to call fsync(2) after you are done writing all your data."

If you don't call fsync on Linux, the data isn't necessarily durably on disk, and if the system crashes or restarts before the disk writes the data to non-volatile storage, you may lose data.

With O_DIRECT, file writes succeed when the data is copied to at least the disk cache. Alternatively, you could open the file with O_DIRECT|O_SYNC (or O_DIRECT|O_DSYNC) and forgo fsync calls. fsync on macOS is a no-op. If you're confused, read Userland Disk I/O.

Postgres, SQLite, MongoDB, and MySQL fsync data before considering a transaction successful by default. RocksDB does not.
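Here is a minimal sketch in Go of the write-then-fsync pattern described above. The file name and the 4KiB page are made up for illustration; the point is only that a successful Write says nothing about durability until Sync (fsync) also succeeds.

```go
// Write a page and fsync it; the write alone only lands in kernel memory.
package main

import (
	"log"
	"os"
)

func main() {
	f, err := os.OpenFile("data.page", os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		log.Fatal(err)
	}

	page := make([]byte, 4096) // one 4KiB page of (zeroed) data
	if _, err := f.Write(page); err != nil {
		log.Fatal(err)
	}

	// Without this, a crash or power loss can lose the write even though
	// Write returned success (buffered IO).
	if err := f.Sync(); err != nil {
		// As discussed below, you also can't assume retrying fsync will
		// help: the kernel may have already dropped the dirty pages.
		log.Fatal(err)
	}

	if err := f.Close(); err != nil {
		log.Fatal(err)
	}
}
```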
The data was fsynced but fsync failed

fsync isn't guaranteed to succeed. And when it fails, you can't tell which write failed. It may not even be a failure of a write to a file that your process opened:

"Ideally, the kernel would report errors only on file descriptions on which writes were done that subsequently failed to be written back. The generic pagecache infrastructure does not track the file descriptions that have dirtied each individual page however, so determining which file descriptors should get back an error is not possible. Instead, the generic writeback error tracking infrastructure in the kernel settles for reporting errors to fsync on all file descriptions that were open at the time that the error occurred. In a situation with multiple writers, all of them will get back an error on a subsequent fsync, even if all of the writes done through that particular file descriptor succeeded (or even if there were no writes on that file descriptor at all)."

Don't be 2018-era Postgres. The only way to have known which exact write failed would be to open the file with O_DIRECT|O_SYNC (or O_DIRECT|O_DSYNC), though this is not the only way to handle fsync failures.

The data was corrupted

If you don't checksum your data on write and check the checksum on read (as well as doing periodic scrubbing a la ZFS), you will never be aware if and when the data gets corrupted, and you will have to restore (who knows how far back in time) from backups if and when you notice.

ZFS, MongoDB (WiredTiger), MySQL (InnoDB), and RocksDB checksum data by default. Postgres and SQLite do not (though databases created from Postgres 18+ will). You should probably turn on checksums on any system that supports it, regardless of the default.

The data was partially written

Only when the page size you write = block size of your filesystem = sector size of your disk is a write guaranteed to be atomic. If you need to write multiple sectors of data atomically, there is the risk that some sectors are written and then the system crashes or restarts. This is called torn writes or torn pages.

Postgres, SQLite, and MySQL (InnoDB) handle torn writes. Torn writes are by definition not relevant to immutable storage systems like RocksDB (and other LSM tree or copy-on-write systems like MongoDB (WiredTiger)) unless writes that update metadata span sectors.

If your database or file system duplicates all writes, as MySQL (InnoDB) does and as you can configure with data=journal in ext4, you may also not have to worry about torn writes. On the other hand, this amplifies writes 2x.

The data didn't reach disk, part 2

Sometimes fsync succeeds but the data isn't actually on disk because the disk is lying. These are called lost writes or phantom writes. You can be resilient to phantom writes by always reading back what you wrote (expensive) or by versioning what you wrote. Databases and file systems generally do not seem to handle this situation.

The data was written to the wrong place, read from the wrong place

If you aren't including where data is supposed to be on disk as part of the checksum or the page itself, you risk being unaware that you wrote data to the wrong place or that you read from the wrong place. This is called misdirected writes/reads. Databases and file systems generally do not seem to handle this situation.

Further reading

In increasing levels of paranoia (laudatory), follow ZFS, Andrea and Remzi Arpaci-Dusseau, and TigerBeetle.
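To illustrate the corruption and misdirected read/write failure modes above, here is a minimal sketch in Go of per-page checksums with the page's own location mixed in. The layout (4KiB pages with a CRC32 in the first four bytes) is made up for illustration and is not any particular database's on-disk format.

```go
// Per-page checksums that also cover the page's intended location.
package main

import (
	"encoding/binary"
	"errors"
	"hash/crc32"
	"os"
)

const pageSize = 4096

// writePage stores up to pageSize-4 bytes at the given page number, prefixed
// with a CRC32 computed over the page number and the payload.
func writePage(f *os.File, pageNo int64, data []byte) error {
	page := make([]byte, pageSize)
	copy(page[4:], data)

	h := crc32.NewIEEE()
	binary.Write(h, binary.LittleEndian, pageNo) // ties the checksum to the location
	h.Write(page[4:])
	binary.LittleEndian.PutUint32(page[:4], h.Sum32())

	_, err := f.WriteAt(page, pageNo*pageSize)
	return err
}

// readPage fails if the stored checksum does not match what we recompute,
// i.e. the page is corrupted or was read from (or written to) the wrong place.
func readPage(f *os.File, pageNo int64) ([]byte, error) {
	page := make([]byte, pageSize)
	if _, err := f.ReadAt(page, pageNo*pageSize); err != nil {
		return nil, err
	}

	h := crc32.NewIEEE()
	binary.Write(h, binary.LittleEndian, pageNo)
	h.Write(page[4:])
	if h.Sum32() != binary.LittleEndian.Uint32(page[:4]) {
		return nil, errors.New("checksum mismatch: corrupt or misdirected page")
	}
	return page[4:], nil
}

func main() {
	f, err := os.OpenFile("pages.db", os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if err := writePage(f, 7, []byte("hello")); err != nil {
		panic(err)
	}
	if err := f.Sync(); err != nil { // durability still needs fsync
		panic(err)
	}
	if _, err := readPage(f, 7); err != nil {
		panic(err)
	}
}
```

Because the page number is part of the checksum, a page that is intact but read from (or originally written to) the wrong offset fails verification just like a corrupted one.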

3 months ago 31 votes
Phil Eaton on Technical Blogging

This is an external post of mine.

3 months ago 40 votes
Minimal downtime Postgres major version upgrades with EDB Postgres Distributed

This is an external post of mine.

4 months ago 34 votes

More in technology

You should repaste your MacBook (but don't)

My favorite memory of my M1 Pro MacBook Pro was the whole sensation of “holy crap, you never hear the fans in this thing”, which was very novel in 2021. Four years later, this MacBook Pro is still a delight. It’s the longest I’ve ever owned a laptop, and while I’d love to pick up the new M4 goodness, this dang thing still seems to just shrug at basically anything I throw at it: video editing, code compiling, CAD models, the works. (My desire to upgrade is helped, though, by the fact that I got the 2TB SSD, 32GB RAM option, and upgrading to those on new MacBooks is still eye-wateringly expensive.)

But my MacBook is starting to show its age in one area: it’s not quiet anymore. If you’re doing anything too intensive, like compiling code for a while or converting something in Handbrake, the age of quiet fans is long past. The fans are properly loud. (And despite having two cats, it’s not them! I clean out the fans pretty regularly.)

Enter the thermal paste

Everyone online seems to point toward one thing: the thermal paste on computers tends to dry up over the years. What the heck is thermal paste? Components in your computer that generate a lot of heat are normally made to touch something like a copper heatsink that is really good at pulling that heat away. The issue is, when you press these two metal surfaces against each other, even the best machining isn’t perfect, so there are microscopic gaps between them. Those gaps are just air, and air is a terrible conductor of heat. The solution is to put a little bit of thermal paste (basically a special grey toothpaste-like gunk that is really good at transferring heat) between them to fill in those microscopic gaps. The problem is that after hundreds and hundreds of days of intense heat, the paste can dry up into something closer to a powder, and it’s not nearly as good at filling in those gaps.

Replacement time

MacBook thermal paste isn’t anything crazy (for the most part, see below); custom PC builders use thermal paste all the time, so incredibly performant options are available online. I grabbed a tube of Noctua NT-H2 for about $10 and set to taking apart my MacBook to swap out the aging thermal paste. Thankfully, iFixit has a tremendous, in-depth guide on the disassembly required, so I got to it.

Indeed, that grey thermal paste looked quite old, but above and below it (on the RAM chips) I noticed something that didn’t quite seem like thermal paste; it was far more… grainy, almost? (The spottiness was due to half of it being on the heatsink.) It turns out that, ending with my generation of MacBooks (lucky me!), Apple used a very special kind of thermal compound often called “Carbon Black”, which is designed to bridge an even thicker gap than traditional thermal paste. I thought about replacing it, but it seems really hard to come across that special thermal compound (and do not do it with normal thermal paste), and my RAM temperatures always seemed fine (65°C is fine… right?), so I just made sure not to touch that. For the regular grey thermal paste, I used some cotton swabs and isopropyl alcohol to remove the dried-up existing paste, then painted on a bit of the new stuff.

Disaster

To get to the underside of the CPU, you basically need to disassemble the entire MacBook. It’s honestly not that hard, but iFixit warned that the fan cables (which also need to be unclipped) are incredibly delicate. And they’re not wrong: seriously, they have the structural integrity of the half-ply toilet paper available at gas stations. So, wouldn’t you know it, I moved the left fan’s cable a bit too hard and it completely tore in half. Gah. I found a replacement fan online (yeah, you can’t just buy the cable; you need a whole new fan) and in the meantime I just kept an eye on my CPU thermals. As long as I wasn’t doing anything too intensive it honestly always stayed around 65°C, which was warm but not terrifying (MacBook Airs completely lack a fan, after all).

Take two

A few days later, the fan arrived, and I basically had to redo the entire disassembly process to get to the fans. At least I was a lot faster this time. The fan was incredibly easy to swap out (hats off there, Apple!), and I screwed everything back together and began reconnecting all the little connectors. Until I saw it: the tiny (made of the same half-ply material as the fan cable) Touch ID sensor cable was inexplicably torn in half, the top half just hanging out. I didn’t even have to touch this thing, really, and I hadn’t even gotten to the stage of reconnecting it (I was about to!). It comes from underneath the logic board, and I guess just the movement of sliding the logic board back in sheared it in half. Bah.

I looked up whether I could just grab another replacement cable, and sure enough you can… but the Touch ID chip is cryptographically paired to your MacBook, so you’d have to take it into an Apple Store. Estimates seemed to be in the hundreds of dollars, so if anyone has any experience there let me know, but for now I’m just going to live happily without a Touch ID sensor… or the button, because the button also does not work. RIP, little buddy. (And yeah, I’m 99.9% sure I can’t solder this back together; there are a bunch of tiny lanes that make up the cable, and reattaching them would take proper micro-soldering experience.)

Honestly, the disassembly process for my MacBook was surprisingly friendly and not very difficult. I just really wish they beefed up some of the cables even slightly so they weren’t so delicate.

The results

I was going to cackle if I went through all that just to have identical temperatures as before, but I’m very happy to say they actually improved a fair bit. I ran a Cinebench test before disassembling the MacBook the very first time to establish a baseline:

- Max CPU temperature: 102°C
- Max fan speed: 6,300 RPM
- Cinebench score: 12,252

After the new thermal paste (and the left fan being new):

- Max CPU temperature: 96°C
- Max fan speed: 4,700 RPM
- Cinebench score: 12,316

Now, just looking at those scores you might be like… so? But let me tell you, dropping 1,600 RPM on the fan is a noticeable change; it goes from “Oh my god this is annoyingly loud” to “Oh look, the fans kicked in”, and despite slower fan speeds there was still a decent drop in CPU temperature! And a 0.5% higher Cinebench score! But where I also really notice it is in idling: just writing this blog post, my CPU was right at 46°C the whole time, where previously my computer idled right around 60°C. The whole computer just feels a bit healthier.

So… should you do it? Honestly, unless you’re very used to working on small, delicate electronics, probably not. But if you do have that experience and are very careful, or have a local repair shop that can do it for a reasonable fee (and your MacBook is a few years old, so as to warrant it), it’s honestly a really nice tweak that I feel will hopefully at least get me to the M5 generation. I do miss Touch ID, though.

2 days ago 6 votes
Six Game Devs Speak to Computer Games Mag (1984)

Meet the Creators of Choplifter, Wizardry, Castle Wolfenstein, Zaxxon, Canyon Climber, and the Arcade Machine

2 days ago 4 votes
New AWS x Arduino Opta Workshop: Connect your PLC to the Cloud in just a few steps

We’re excited to invite you to a brand-new workshop created in collaboration with Amazon Web Services (AWS). Whether you’re modernizing factory operations or tinkering with your first industrial project, this hands-on workshop is your gateway to building cloud-connected PLCs that ship data – fast. At Arduino, we believe in making advanced technology more accessible. That’s […]

2 days ago 4 votes
The History of Acer

A Shy Kid Builds the Taiwanese Tech Industry

5 days ago 11 votes
Concept Bytes’ coffee table tracks people and walks itself across a room when called

The term “mmWave” refers to radio waves with wavelengths on the millimeter scale. When it comes to wireless communications technology, like 5G, mmWave allows for very fast data transfer — though that comes at the expense of range. But mmWave technology also has some very useful sensing and scanning applications, which you may have experienced […]

5 days ago 8 votes