The firehose of data is turned on In the beginning, the Internet was a small, cozy place. Most people weren’t online, and most businesses weren’t really online. The old Internet was for nerds willing to suffer through the less-than-straightforward technical setup, before the soul-scraping screech of a 28K baud modem resulted in a successful connection to the interwebs. Finally - we could now slowly download, bar by bar, images of Cindy Margolis. It was an innocent time, with tacky page view counters, guestbooks, “dancing baby” animated gifs, scrolling marquees, and just terrible background color choices. Back then we discovered things on the Web through an array of search engines - AltaVista, Excite, Lycos, Yahoo… None of them particularly stood out on their own. Yahoo was a more thorough, actual directory of websites maintained by fellow life...


More from Renegade Otter

A Lannister Always Pays His Technical Debts

A tale of two rewrites

Jamie Zawinski is kind of a tech legend. He came up with the name “Mozilla”, invented that whole thing where you can send HTML in emails, and more. In his harrowing work diary of how Mosaic/Netscape came to be, Jamie described the burnout rodeo that was the Mosaic development (the top disclaimer has its own history — ignore it): “I slept at work again last night; two and a half hours curled up in a quilt underneath my desk, from 11am to 1:30pm or so. That was when I woke up with a start, realizing that I was late for a meeting we were scheduled to have to argue about colormaps and dithering, and how we should deal with all the nefarious 8-bit color management issues. But it was no big deal, we just had the meeting later. It’s hard for someone to hold it against you when you miss a meeting because you’ve been at work so long that you’ve passed out from exhaustion.”

Netscape’s wild ride is well-depicted in the dramatized Discovery mini-series Valley of the Boom, and the company eventually collapsed with the death march rewrite of what seemed to be just seriously unmaintainable code. It was the subject of one of the more famous articles by ex-Microsoft engineer and then entrepreneur Joel Spolsky - Things You Should Never Do. While the infamous Netscape codebase is long gone, the people that it enriched are still shaping the world to this day.

There have been big, successful rewrites. Twitter moved away from Ruby-on-Rails to the JVM over a decade ago, but the first, year-long full rewrite effort completely failed. Following architecture by fiat from the top, the engineering team said nothing, speaking out only days before the launch. The whole thing would crash out of the gate, they claimed, so Twitter had to go back to the drawing board and rewrite again. What didn’t work for Netscape worked for Twitter. Why? Netscape had major heat coming from ruthless Microsoft competition, very little time for major moves, and a team already exhausted from “office heroics”. Twitter, however, is a unique product that is incredibly hard to dislodge, even with the almost purposefully incompetent and reckless management. It’s hard to abandon your social media account after accumulating algorithmic reputation and followers for years, and yet one can switch browsers faster than they can switch socks. Companies often do not survive this kind of adventure without having an almost unfair moat. Those that do survive probably caught some battle scars.

The road to hell is paved with TODO comments

All of this is to say that you should probably never let your system rot so badly that a code rewrite is even discussed. It never just happens. Your code doesn’t just become unmaintainable overnight. It gets there by the constant cutting of corners, hard-coding things, and crop-dusting your work with long-forgotten //FIXME comments. Fix who? We used to call it technical debt - a term that is now being frowned upon. The concept of “technical debt” got popular around the time when we were getting obsessed with “proh-cess” and Agile, as we got tired of death march projects, arbitrary deadlines, and general lack of structure and visibility into our work. Every software project felt like a tour — you came up for air and then went back into the 💩 for months.
Agile meant that the stakeholders could be present in our planning meetings. We had to explain to them - somehow - that it took time to upgrade the web framework from v1 to v5 because no one had been using v1 for years, and in general, it slowed everyone down. Since we didn’t know how to explain this to a non-coder, someone came up with the condescending “technical debt” — “those spreadsheet monkeys wouldn’t understand what we do here!” While “technical debt” has most likely run its course as a manipulative verbal device, it is absolutely the right term to use amongst ourselves to reason about risks and to properly triage them.

The three types of technical debt

The word “debt” has negative connotations for sure, but just like with actual monetary debt, it’s never great but not always horrible. To mutilate the famous saying - you have to spend code to make code. I would categorize technical debt into three types — Aesthetic, Deferrable, and Toxic. A mark of a good engineer is knowing when to create technical debt, what kind of debt, and when to repay it.

Aesthetic debt

This is the kind of stuff that triggers your OCD but does not really affect your users or your velocity in any way. Maybe the imports are not sorted the way you want, or maybe there is a naming convention that is grinding your gears. It’s something that can be addressed with relatively low effort when you are good and ready, in many cases with proper automated code analysis and tools.

Deferrable debt

Deferrable debt is what should be refactored at some point, but it’s fairly contained and will not be a problem in the immediate future. This is the kind of debt that you need to minimize by methodically striking it off your list, and as long as it seeps through into your sprint work, you can probably avoid a scenario where it all gets out of control. Sometimes this sort of thing is really contained - a lone hacky file, written in the Mesozoic Era by a sleep-deprived Jamie Zawinski because someone was breathing down his neck. No one really understands what the code does, but it’s been humming along for the last 7 years, so why take your chances by waking the sleeping dragons? Slap the Safety Pig on it, claim a victory, and go shake down a vending machine.

Toxic debt

This is the kind of debt that needs to be addressed before it’s too late. How do you identify “toxic” debt? It’s that thing that you did half-way and now it’s become a workaround magnet. “We have to do it like this now until we fix it - someday”. The workarounds then become the foundation of new features, creating new and exciting debugging side quests. The future work required grows bigger with every new feature and line of code. This is the toxic debt.

Lack of tests is toxic debt

Not having automated tests, or insufficient testing of critical paths, is tech debt in its own right. The more untested code you are adding, the more miserable your life is going to get over time. Tests are important to fight the debt itself. It’s much easier to take a sledgehammer to your codebase when a solid integration test suite’s got your back. We don’t like it, it’s upfront work that slows us down, but at some point after your Minimal Viable Prototype starts running away from you, you need to switch into Test Mode and tie it all down — before things get really nasty.

Lack of documentation is toxic debt

I am not talking about a War & Peace-sized manual or detailed and severely out-of-date architecture diagrams in your Google Docs.
Just a set of critical READMEs and runbooks on how to start the system locally and perform basic tasks. What variables and secrets do I need? What else do I need installed? If there is a bug report, how do I configure my local environment to reproduce it, and so on. The time taken to reverse-engineer a system every time has an actual dollar value attached to it, plus the opportunity cost of not doing useful work.

Put. It. In. A. Card.

I have been guilty of this myself. I love TODOs. They are easy to add without breaking the flow, and they are configured in my IDE to be bright and loud. It’s a TODO — I will do it someday. During the Annual TODO Week, obviously. Let’s be frank — marking items as “TODO” is saying to yourself that you should really do this thing, but probably never will. This is relevant because TODO items can represent any level of technical debt described above, and so you should really make them actual stories on your Kanban/Agile boards.

Mark technical debt as such

You should be able to easily scan your “debt stories” and figure out which ones have payment due. This can be either a tag in your issue-tracking system or a column in your Kanban-style board like Trello. An approach like this will let you better gauge the ratio of new feature stories vs. the growing technical debt. Your debt column will never be empty — that goal is as futile as Inbox Zero — but it should never grow out of control either.

// TODO: conclusion

Code Lab - Job queues in Postgres

Introduction

Friendly Fire needs to periodically execute scheduled jobs - to remind Slack users to review GitHub pull requests. Instead of bolting on a new system just for this, I decided to leverage Postgres instead. The must-have requirement was the ability to schedule a job to run in the future, with workers polling for “ripe” jobs, executing them and retrying on failure, with exponential backoff. With SKIP LOCKED, Postgres has the needed functionality, allowing a single worker to atomically pull a job from the job queue without another worker pulling the same one. This project is a demo of this system, slightly simplified. This example, available on GitHub, is a playground for the following:

- How to set up a base Quart web app with Postgres using Poetry
- How to process a queue of immediate and delayed jobs using only the database
- How to retry failed jobs with exponential backoff
- How to use custom decorators to ensure atomic HTTP requests (success - commit, failure - rollback)
- How to use Pydantic for stricter Python models
- How to use asyncpg and asynchronously query Postgres with connection pooling
- How to test asyncio code using pytest and unittest.IsolatedAsyncioTestCase
- How to manipulate the clock in tests using freezegun
- How to use mypy, flake8, isort, and black to format and lint the code
- How to use Make to simplify local commands

ALTER MODE SKIP COMPLEXITY

Postgres introduced SKIP LOCKED years ago, but recently there was a noticeable uptick in the interest around this feature, in particular regarding its obvious use for simpler queuing systems, allowing us to bypass libraries or maintenance-hungry third-party messaging systems. Why now? It’s hard to say, but my guess is that the tech sector is adjusting to the leaner times, looking for more efficient and cheaper ways of achieving the same goals at common-scale but with fewer resources. Or shall we say - reasonable resources.

What’s Quart?

Quart is the asynchronous version of Flask. If you know about the g - the global request context - you will be right at home. Multiple quality frameworks have entered Python-scape in recent years - FastAPI, Sanic, Falcon, Litestar. There are also Bottle and Carafe. Apparently naming Python frameworks after liquid containers is now a running joke. Seeing that both Flask and Quart are now part of the Pallets project, Quart has been curiously devoid of hype. These two are in the process of being merged and at some point will become one framework - classic synchronous Flask and asynchronous Quart in one.

How it works

Writing about SKIP LOCKED is going to be redundant as this has been covered plenty elsewhere. For example, in this article. Even more in-depth are these slides from 2016 PGCON. The central query looks like this:

DELETE FROM job
WHERE id = (
    SELECT id FROM job
    WHERE ripe_at IS NULL OR [current_time_argument] >= ripe_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING *, id::text

Each worker is added as a background task, periodically querying the database for “ripe” jobs (the ones ready to execute), and then running the code for that specific job type. A job that does not have the “ripe” time set will be executed whenever a worker is available. A job that fails will be retried with exponential backoff, up to Job.max_retries times:

next_retry_minutes = self.base_retry_minutes * pow(self.tries, 2)

Creating a job is simple:

job: Job = Job(
    job_type=JobType.MY_JOB_TYPE,
    arguments={"user_id": user_id},
).runs_in(hours=1)
await jobq.service.job_db.save(job)
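The post shows the central query and the retry formula but not the polling loop itself, so here is a rough, hedged sketch of what a worker built around that query could look like with asyncpg. The helper handle_job(), the retry comment, and the connection string are illustrative assumptions, not the project's actual code:

import asyncio
import datetime

import asyncpg

PULL_ONE_RIPE_JOB = """
DELETE FROM job
WHERE id = (
    SELECT id FROM job
    WHERE ripe_at IS NULL OR $1 >= ripe_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING *, id::text
"""

async def handle_job(row: asyncpg.Record) -> None:
    # The real project dispatches on job_type; here we just log it.
    print("running job", row["id"], row["job_type"])

async def worker_loop(pool: asyncpg.pool.Pool, poll_seconds: float = 1.0) -> None:
    while True:
        # Atomically claim (and delete) one ripe job; SKIP LOCKED guarantees
        # that no other worker can grab the same row.
        now = datetime.datetime.now(datetime.timezone.utc)
        row = await pool.fetchrow(PULL_ONE_RIPE_JOB, now)
        if row is None:
            await asyncio.sleep(poll_seconds)  # nothing is ripe yet
            continue
        try:
            await handle_job(row)
        except Exception:
            # On failure, the job would be re-saved with a later ripe_at,
            # e.g. base_retry_minutes * tries**2 minutes from now.
            pass

async def main() -> None:
    pool = await asyncpg.create_pool("postgresql://localhost/jobq")  # assumed DSN
    await asyncio.gather(*(worker_loop(pool) for _ in range(3)))

# asyncio.run(main())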
SKIP LOCKED and DELETE ... SELECT FOR UPDATE tango together to make sure that no worker gets the same job at the same time. To keep things interesting, at the Postgres level we have an MD5-based auto-generated column to make sure that no job of the same type and with the same arguments gets queued up more than once.

This project also demonstrates the usage of custom DB transaction decorators in order to have a cleaner transaction notation:

@write_transaction
@api.put("/user")
async def add_user():
    # DB write logic

@read_transaction
@api.get("/user")
async def get_user():
    # DB read logic

A request (or a function) annotated with one of these decorators will be in an atomic transaction until it exits, and rolled back if it fails. At shutdown, the “stop” flag in each worker is set, and the server waits until all the workers complete their sleep cycles, peacing out gracefully.

async def stop(self):
    for worker in self.workers:
        worker.request_stop()
    while not all([w.stopped for w in self.workers]):
        logger.info("Waiting for all workers to stop...")
        await asyncio.sleep(1)
    logger.info("All workers have stopped")

Testing

The test suite leverages unittest.IsolatedAsyncioTestCase (Python 3.8 and up) to grant us access to asyncSetUp() - this way we can call await in our test setup functions:

async def asyncSetUp(self) -> None:
    self.app: Quart = create_app()
    self.ctx: quart.ctx.AppContext = self.app.app_context()
    await self.ctx.push()
    self.conn = await asyncpg.connect(...)
    db.connection_manager.set_connection(self.conn)
    self.transaction = self.conn.transaction()
    await self.transaction.start()

async def asyncTearDown(self) -> None:
    await self.transaction.rollback()
    await self.conn.close()
    await self.ctx.pop()

Note that we set up the database only once for our test class. At the end of each test, the connection is rolled back, returning the database to its pristine state for the next test. This is a speed trick to make sure we don’t have to run database setup code every single time. In this case it doesn’t really matter, but in a test suite large enough, this is going to add up. For delayed jobs, we simulate the future by freezing the clock at a specific time (relative to now):

# jump to the FUTURE
with freeze_time(now + datetime.timedelta(hours=2)):
    ripe_job = await jobq.service.job_db.get_one_ripe_job()
    assert ripe_job

Improvements

Batching - pulling more than one job at once would add major dragonforce to this system. This is not part of the example so as not to overcomplicate it. You just need to be careful and return the failed jobs back to the queue while deleting the completed ones. With enough workers, a system like this could really be capable of handling serious common-scale workloads.

Server exit - there are less-than-trivial ways of interrupting worker sleep cycles. This could improve the experience of running the service locally. In its current form, you have to wait a few seconds until all worker loops get out of sleep() and read the STOP flag.
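Going back to the MD5-based de-duplication mentioned above: the post does not show the table itself, so here is a minimal sketch of what such a schema could look like. Column names and types are my assumptions, not the project's actual DDL - the idea is a Postgres generated column plus a unique constraint:

import asyncio

import asyncpg

# Hypothetical schema for illustration only.
CREATE_JOB_TABLE = """
CREATE TABLE IF NOT EXISTS job (
    id         bigserial PRIMARY KEY,
    job_type   text NOT NULL,
    arguments  jsonb NOT NULL DEFAULT '{}',
    ripe_at    timestamptz,           -- NULL means "run as soon as a worker is free"
    tries      int NOT NULL DEFAULT 0,
    -- De-duplication: a job of the same type with the same arguments
    -- can only be queued once.
    dedup_md5  text GENERATED ALWAYS AS (md5(job_type || ':' || arguments::text)) STORED,
    UNIQUE (dedup_md5)
);
"""

async def main() -> None:
    conn = await asyncpg.connect("postgresql://localhost/jobq")  # assumed DSN
    await conn.execute(CREATE_JOB_TABLE)
    await conn.close()

asyncio.run(main())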

Your database skills are not ‘good to have’

A MySQL war story It’s 2006, and the New York Magazine digital team set out to create a new search experience for its Fashion Week portal. It was one of those projects where technical feasibility was not even discussed with the tech team - a common occurrence back then. Agile was still new, let alone in publishing. It was just a vision, a real friggin’ moonshot, and 10 to 12 weeks to develop the wireframed version of the product. There would be almost no time left for proper QA. Fashion Week does not start slowly but rather goes from zero to sixty in a blink. The vision? Thousands of near-real-time fashion show images, each one with its sub-items categorized: “2006”, “bag”, “red”, “ leather”, and so on. A user will land on the search page and have the ability to “drill down” and narrow the results based on those properties. To make things much harder, all of these properties would come with exact counts. The workflow was going to be intense. Photographers will courier their digital cartridges from downtown NYC to our offices on Madison Avenue, where the images will be processed, tagged by interns, and then indexed every hour by our Perl script, reading the tags from the embedded EXIF information. Failure to build the search product on our side would have collapsed the entire ecosystem already in place, primed and ready to rumble. “Oh! Just use the facets in Solr, dude”. Yeah, not so fast - dude. In 2006 that kind of technology didn’t even exist yet. I sat through multiple enterprise search engine demos with our CTO, and none of the products (which cost a LOT of money) could do a deep faceted search. We already had an Autonomy license and my first try proved that… it just couldn’t do it. It was supposed to be able to, but the counts were all wrong. Endeca (now owned by Oracle), came out of stealth when the design part of the project was already underway. Too new, too raw, too risky. The idea was just a little too ambitious for its time, especially for a tiny team in a non-tech company. So here we were, a team of three, myself and two consultants, writing Perl for the indexing script, query-parsing logic, and modeling the data - in MySQL 4. It was one of those projects where one single insurmountable technical risk would have sunk the whole thing. I will cut the story short and spare you the excitement. We did it, and then we went out to celebrate at a karaoke bar (where I got my very first work-stress-related severe hangover) 🤮 For someone who was in charge of the SQL model and queries, it was days and days of tuning those, timing every query and studying the EXPLAIN output to see what else I could do to squeeze another 50ms out of the database. There were no free nights or weekends. In the end, it was a combination of trial and error, digging deep into MySQL server settings, and crafting GROUP BY queries that would make you nauseous. The MySQL query analyzer was fidgety back then, and sometimes re-arranging the fields in the SELECT clause could change a query’s performance. Imagine if SELECT field1, field2 FROM my_table was faster than SELECT field2, field1 FROM my_table. Why would it do that? I have no idea to this day, and I don’t even want to know. Unfortunately, I lost examples of this work, but the Way Back Machine has proof of our final product. 
The point here is - if you really know your database, you can do pretty crazy things with it, and with the modern generation of storage technologies and beefier hardware, you don’t even need to push the limits - it should easily handle what I refer to as “common-scale”.

The fading art of SQL

In the past few years I have been noticing an unsettling trend - software engineers are eager to use exotic “planet-scale” databases for pretty rudimentary problems, while at the same time not having a good grasp of the very powerful relational database engine they are likely already using, let alone understanding the technology’s more advanced and useful capabilities. The SQL layer is buried so deep beneath libraries and too-clever-by-half ORMs that it all just becomes high-level code. Why is it slow? No idea - let's add Cassandra to it! Modern hardware certainly allows us to go way up from the CPU into the higher abstraction layers, while it wasn’t that uncommon in the past to convert certain functions to assembly code in order to squeeze every bit of performance out of the processor. Now compute and storage are cheaper - it’s true - but abusing this abundance has trained us into laziness and complacency. Suddenly, that Cloud bill is a wee bit too high, and heaven knows how much energy the world is burning by just running billions of these inefficient ORM queries every second against mammoth database instances. The morning of my first job interview in 2004, I was on a subway train memorizing the nine levels of database normalization. Or is it five levels? I don’t remember, and it doesn’t even matter - no one will ever ask you this now in a software engineer interview. Just skimming through the table of contents of your database of choice, say the now freshly in vogue Postgres, you will find an absolute treasure trove of features fit to handle everything but the most gruesome planet-scale computer science problems. Petabyte-sized Postgres boxes, replicated, are effortlessly running now as you are reading this. The trick is to not expect your database or your ORM to read your mind. Speaking of…

ORMs are the frenemy

I was a new hire at an e-commerce outfit once, and right off the bat I was thrown into fixing serious performance issues with the company’s product catalog pages. Just a straight-forward, paginated grid of product images. How hard could it be? Believe it or not - it be. The pages took over 10 seconds to load, sometimes longer, the database was struggling, and the solution was to “just cache it”. One last datapoint - this was not a high-traffic site. The pages were dead-slow even if there was no traffic at all. That’s a rotten sign that something is seriously off. After looking a bit closer, I realized that I hit the motherlode - all top three major database and coding mistakes in one.

❌ Mistake #1: There is no index

The column that was hit in every single mission-critical query had no index. None. After adding the much-needed index in production, you could practically hear MySQL exhaling in relief. Still, the performance was not quite there yet, so I had to dig deeper, now in the code.
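To make the index fix concrete before moving on: the post doesn't name the actual table or column, so the names below ("product", "category_id") are made up for illustration. The shape of the fix is adding the index and then checking with EXPLAIN that the plan actually uses it:

import pymysql

conn = pymysql.connect(host="localhost", user="app", password="...", database="catalog")  # assumed

with conn.cursor() as cursor:
    # The missing index on the column hit by every mission-critical query:
    cursor.execute("CREATE INDEX idx_product_category ON product (category_id)")

    # Verify the plan: in MySQL's EXPLAIN output, the "key" column should now
    # show idx_product_category instead of NULL, and "rows" should drop sharply.
    cursor.execute("EXPLAIN SELECT id, name FROM product WHERE category_id = %s", (42,))
    for row in cursor.fetchall():
        print(row)

conn.close()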
❌ Mistake #2: Assuming each ORM call is free

Activating the query logs locally and reloading a product listing page, I see… 200, 300, 500 queries fired off just to load one single page. What the shit? Turns out, this was the result of a classic ORM abuse of going through every record in a loop, to the effect of:

for product_id in product_ids:
    product = amazing_orm.products.get(id=product_id)
    products.append(product)

The high number of queries was also due to the fact that some of this logic was nested. The obvious solution is to keep the number of queries in each request to a minimum, leveraging SQL to join and combine the data into one single blob. This is what relational databases do - it’s in the name. Each separate query needs to travel to the database, get parsed, transformed, analyzed, planned, executed, and then travel back to the caller. It is one of the most expensive operations you can do, and ORMs will happily do the worst possible thing for you in terms of performance. One wonders what those algorithm and data structure interview questions are good for, considering you are more likely to run into a sluggish database call than a B-tree implementation (a common structure used for database indexes).

❌ Mistake #3: Pulling in the world

To make matters worse, the amount of data here was relatively small, but there were dozens and dozens of columns. What do ORMs usually do by default in order to make your life “easier”? They send the whole thing, all the columns, clogging your network pipes with data that you don’t even need. It is a form of toxic technical debt, where the speed of development will eventually start eating into performance. I spent hours within the same project hacking the dark corners of the Django admin, overriding default ORM queries to be less “eager”. This led to a much better office-facing experience.
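A hedged sketch of the fix for the loop above: one round-trip instead of N, selecting only the columns the page actually renders. The table and column names ("product", "name", "price", "thumbnail_url") are illustrative, not from the post:

import pymysql
from pymysql.cursors import DictCursor

def load_products(conn, product_ids):
    if not product_ids:
        return []
    placeholders = ", ".join(["%s"] * len(product_ids))
    sql = (
        "SELECT id, name, price, thumbnail_url "
        f"FROM product WHERE id IN ({placeholders})"
    )
    with conn.cursor(DictCursor) as cursor:
        # One query, only the needed columns, instead of one query per product.
        cursor.execute(sql, product_ids)
        return cursor.fetchall()

conn = pymysql.connect(host="localhost", user="app", password="...", database="catalog")  # assumed
products = load_products(conn, [1, 2, 3])

Most ORMs can express the same shape (for example, a single filter over id__in with an explicit field list in Django) - the point is one round-trip and only the columns you need.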
Performance IS a feature

Serious, mission-critical systems have been running on classic and boring relational databases for decades, serving thousands of requests per second. These systems have become more advanced, more capable, and more relevant. They are wonders of computer science, one can claim. You would think that an ancient database like Postgres (in development since 1982) is in some kind of legacy maintenance mode at this point, but the opposite is true. In fact, the work has been only accelerating, with the scale and features becoming pretty impressive. What took multiple queries just a few years ago now takes a single one. Why is this significant? It has been known for a long time, as discovered by Amazon, that every additional 100ms of a user waiting for a page to load loses a business money. We also know now that from a user’s perspective, the maximum target response time for a web page is around 100 milliseconds: A delay of less than 100 milliseconds feels instant to a user, but a delay between 100 and 300 milliseconds is perceptible. A delay between 300 and 1,000 milliseconds makes the user feel like a machine is working, but if the delay is above 1,000 milliseconds, your user will likely start to mentally context-switch. The “just add more CPU and RAM if it’s slow” approach may have worked for a while, but many are finding out the hard way that this kind of laziness is not sustainable in a frugal business environment where costs matter.

Database anti-patterns

Knowing what not to do is as important as knowing what to do. Some of the below mistakes are all too common:

❌ Anti-pattern #1. Using exotic databases for the wrong reasons

Technologies like DynamoDB are designed to handle scale at which Postgres and MySQL begin to fail. This is achieved by denormalizing and duplicating the data aggressively, where the database is not doing much real-time data manipulation or joining. Your data is now modeled after how it is queried, not after how it is related. Regular relational concepts disintegrate at this insane level of scale. Needless to say, if you are resorting to this kind of storage for “common-scale” problems, you are already solving problems you don’t have.

❌ Anti-pattern #2. Caching things unnecessarily

Caching is a necessary evil - but it’s not always necessary. There is an entire class of bugs and on-call issues that stem from stale cached data. Read-only database replicas are a classic architecture pattern that is still very much not outdated, and it will buy you insane levels of performance before you have to worry about anything. It should not be a surprise that mature relational databases already have query caching in place - it just has to be tuned for your specific needs. Cache invalidation is hard. It adds more complexity and states of uncertainty to your system. It makes debugging more difficult. Throughout my career I have received more emails than I care to count from content teams wondering, “why is the data not there, I updated it 30 minutes ago?!” Caching should not act as a band-aid for bad architecture and non-performant code.

❌ Anti-pattern #3. Storing everything and a kitchen sink

As much punishment as an industry-standard database can take, it’s probably not a good idea to not care at all about what’s going into it, treating it like a data landfill of sorts. Management, querying, backups, migrations - all of it becomes painful once the DB grows substantially. Even if that is of no concern because you are using a managed cloud DB - the costs should be. An RDBMS is a sophisticated piece of technology, and storing data in it is expensive.

Figure out common-scale first

It is fairly easy to make a beefy Postgres or MySQL database grind to a halt if you expect it to do magic without any extra work. “It’s not web-scale, boss. Our 2 million records seem to be too much of a lift. We need DynamoDB, Kafka, and event sourcing!” A relational database is not some antiquated technology that only us tech fossils choose to be experts in, a thing that can be waved off like an annoying insect. “Here we React and GraphQL all the things, old man”. In legal speak, a modern RDBMS is innocent until proven guilty, and the burden of proof should be extremely high - and almost entirely on you. Finally, if I have to figure out “why it’s slow”, my approximate runbook is:

- Compile a list of unique queries, from logging, the slow query log, etc.
- Look at the most frequent queries first
- Use EXPLAIN to check slow query plans for index usage
- Select only the data that needs to travel across the wire
- If an ORM is doing something silly without a workaround, pop the hood and get dirty with the raw SQL plumbing

Most importantly, study your database (and SQL). Learn it, love it, use it, abuse it. Spending a couple of days just leafing through that Postgres manual to see what it can do will probably make you a better engineer than spending more time on the next flavor-of-the-month React hotness. Again.
Related posts

I am not your Cloud person

Death by a thousand microservices

The Church of Complexity

There is a pretty well-known sketch in which an engineer is explaining to the project manager how an overly complicated maze of microservices works in order to get a user’s birthday - and fails to do so anyway. The scene accurately describes the absurdity of the state of the current tech culture. We laugh, and yet bringing this up in a serious conversation is tantamount to professional heresy, rendering you borderline un-hirable. How did we get here? How did our aim become not addressing the task at hand but instead setting a pile of cash on fire by solving problems we don’t have? Trigger warning: Some people understandably got salty when I name-checked JavaScript and NodeJS as a source of the problem, but my point really was more about the dangers of hermetically sealed software ecosystems that seem hell-bent on re-learning the lessons that we had just finished learning. We ran into the complexity wall before and reset - otherwise we'd still be using CORBA and SOAP. These air-tight developer bubbles are a wrecking ball on the entire industry, and it takes about a full decade to swing.

The perfect storm

There are a few events in recent history that may have contributed to the current state of things. First, a whole army of developers writing JavaScript for the browser started self-identifying as “full-stack”, diving into server development and asynchronous code. JavaScript is JavaScript, right? What difference does it make what you create using it - user interfaces, servers, games, or embedded systems. Right? Node was still kind of a learning project of one person, and the early JavaScript was a deeply problematic choice for server development. Pointing this out to still-green server-side developers usually resulted in a lot of huffing and puffing. This is all they knew, after all. The world outside of Node effectively did not exist, the Node way was the only way, and so this was the genesis of the stubborn, dogmatic thinking that we are dealing with to this day. And then, a steady stream of FAANG veterans started merging into the river of startups, mentoring the newly-minted and highly impressionable young JavaScript server-side engineers. The apostles of the Church of Complexity would assertively claim that “how they did things over at Google” was unquestionable and correct - even if it made no sense with the given context and size. What do you mean you don’t have a separate User Preferences Service? That just will not scale, bro! But it’s easy to blame the veterans and the newcomers for all of this. What else was happening? Oh yeah - easy money. What do you do when you are flush with venture capital? You don’t go for revenue, surely! On more than one occasion I received an email from management asking everyone to be in the office, tidy up their desks, and look busy, as a clowder of Patagonia vests was about to be paraded through the space. Investors needed to see explosive growth, but not in profitability, no. They just needed to see how quickly the company could hire ultra-expensive software engineers to do … something. And now that you have these developers, what do you do with them? Well, they could build a simpler system that is easier to grow and maintain, or they could conjure up a monstrous constellation of “microservices” that no one really understands. Microservices - the new way of writing scalable software! Are we just going to pretend that the concept of “distributed systems” never existed?
(Let’s skip the whole parsing of nuances about microservices not being real distributed systems). Back in the days when the tech industry was not such a bloated farce, distributed systems were respected, feared, and generally avoided - reserved only as the weapon of last resort for particularly gnarly problems. Everything with a distributed system becomes more challenging and time-consuming - development, debugging, deployment, testing, resilience. But I don’t know - maybe it’s all super easy now because toooollling. There is no standard tooling for microservices-based development - there is no common framework. Working on distributed systems has gotten only marginally easier in the 2020s. The Dockers and the Kuberneteses of the world did not magically take away the inherent complexity of a distributed setup. I love referring to this summary of 5 years of startup audits, as it is packed with common-sense conclusions: … the startups we audited that are now doing the best usually had an almost brazenly ‘Keep It Simple’ approach to engineering. Cleverness for cleverness sake was abhorred. On the flip side, the companies where we were like ”woah, these folks are smart as hell” for the most part kind of faded. Generally, the major foot-gun that got a lot of places in trouble was the premature move to microservices, architectures that relied on distributed computing, and messaging-heavy designs. Literally - “complexity kills”. The audit revealed an interesting pattern, where many startups experienced a sort of collective imposter syndrome while building straight-forward, simple, performant systems. There is a dogma attached to not starting out with microservices on day one - no matter the problem. “Everyone is doing microservices, yet we have a single Django monolith maintained by just a few engineers, and a MySQL instance - what are we doing wrong?”. The answer is almost always “nothing”. Likewise, it’s very often that seasoned engineers experience hesitation and inadequacy in today’s tech world, and the good news is that, no - it’s probably not you. It’s common for teams to pretend like they are doing “web scale”, hiding behind libraries, ORMs, and cache - confident in their expertise (they crushed that Leetcode!), yet they may not even be aware of database indexing basics. You are operating in a sea of unjustified overconfidence, waste, and Dunning-Kruger, so who is really the imposter here?

There is nothing wrong with a monolith

The idea that you cannot grow without a system that looks like the infamous slide of the Afghanistan war strategy is a myth. Dropbox, Twitter, Netflix, Facebook, GitHub, Instagram, Shopify, StackOverflow - these companies and others started out as monolithic code bases. Many have a monolith at their core to this day. StackOverflow makes it a point of pride how little hardware they need to run the massive site. Shopify is still a Rails monolith, leveraging the tried and true Resque to process billions of tasks. WhatsApp went supernova with their Erlang monolith and a relatively small team. How? WhatsApp consciously keeps the engineering staff small to only about 50 engineers. Individual engineering teams are also small, consisting of 1 - 3 engineers and teams are each given a great deal of autonomy.
In terms of servers, WhatsApp prefers to use a smaller number of servers and vertically scale each server to the highest extent possible. Instagram was acquired for billions - with a crew of 12. And do you imagine Threads as an effort involving a whole Meta campus? Nope. They followed the Instagram model, and this is the entire Threads team: Perhaps claiming that your particular problem domain requires a massively complicated distributed system and an open office stuffed to the gills with turbo-geniuses is just crossing over into arrogance rather than brilliance? Don’t solve problems you don’t have It’s a simple question - what problem are you solving? Is it scale? How do you know how to break it all up for scale and performance? Do you have enough data to show what needs to be a separate service and why? Distributed systems are built for size and resilience. Can your system scale and be resilient at the same time? What happens if one of the services goes down or comes to a crawl? Just scale it up? What about the other services that are going to get hit with traffic? Did you war-game the endless permutations of things that can and will go wrong? Is there backpressure? Circuit breakers? Queues? Jitter? Sensible timeouts on every endpoint? Are there fool-proof guards to make sure a simple change does not bring everything down? The knobs you need to be aware of and tune are endless, and they are all specific to your system’s particular signature of usage and load. The truth is that most companies will never reach the massive size that will actually require building a true distributed system. Your cosplaying Amazon and Google - without their scale, expertise, and endless resources - is very likely just an egregious waste of money and time. Religiously following all the steps from an article called “Ten morning habits of very successful people” is not going to make you a billionaire. The only thing harder than a distributed system is a BAD distributed system. “But each team… but separate… but API” Trying to shove a distributed topology into your company’s structure is a noble effort, but it almost always backfires. It’s a common approach to break up a problem into smaller pieces and then solve those one by one. So, the thinking goes, if you break up one service into multiple ones, everything becomes easier. The theory is sweet and elegant - each microservice is being maintained rigorously by a dedicated team, walled off behind a beautiful, backward-compatible, versioned API. In fact, this is so solid that you rarely even have to communicate with that team - as if the microservice was maintained by a 3rd party vendor. It’s simple! If that doesn’t sound familiar, that’s because this rarely happens. In reality, our Slack channels are flooded with messages from teams communicating about releases, bugs, configuration updates, breaking changes, and PSAs. Everyone needs to be on top of everything, all the time. And if that wasn’t great, it’s normal for one already-slammed team to half-ass multiple microservices instead of doing a great job on a single one, often changing ownership as people come and go. In order to win the race, we don’t build one good race car - we build a fleet of shitty golf carts. What you lose There are multiple pitfalls to building with microservices, and often that minefield is either not fully appreciated or simply ignored. Teams spend months writing highly customized tooling and learning lessons not related at all to the core product. 
Here are just some often overlooked aspects…

Say goodbye to DRY

After decades of teaching developers to write Don’t Repeat Yourself code, it seems we just stopped talking about it altogether. Microservices by default are not DRY, with every service stuffed with redundant boilerplate. Very often the overhead of such “plumbing” is so heavy, and the size of the microservices is so small, that the average instance of a service has more “service” than “product”. So what about the common code that can be factored out? Have a common library? How does the common library get updated? Keep different versions everywhere? Force updates regularly, creating dozens of pull requests across all repositories? Keep it all in a monorepo? That comes with its own set of problems. Allow for some code duplication? Forget it, each team gets to reinvent the wheel every time. Each company going this route faces these choices, and there are no good “ergonomic” options - you have to choose your version of the pain.

Developer ergonomics will crater

“Developer ergonomics” is the friction, the amount of effort a developer must go through in order to get something done, be it working on a new feature or resolving a bug. With microservices, an engineer has to have a mental map of the entire system in order to know what services to bring up for any particular task, what teams to talk to, whom to talk to, and what about. The “you have to know everything before doing anything” principle. How do you keep on top of it? Spotify, a multi-billion dollar company, spent probably non-negligible internal resources to build Backstage, software for cataloging its endless systems and services. This should at least give you a clue that this game is not for everyone, and the price of the ride is high. So what about the tooooling? The Not-Spotifies of the world are left with MacGyvering their own solutions, the robustness and portability of which you can probably guess. And how many teams actually streamline the process of starting a YASS - “yet another stupid service”? This includes:

- Developer privileges in GitHub/GitLab
- Default environment variables and configuration
- CI/CD
- Code quality checkers
- Code review settings
- Branch rules and protections
- Monitoring and observability
- Test harness
- Infrastructure-as-code

And of course, multiply this list by the number of programming languages used throughout the company. Maybe you have a usable template or a runbook? Maybe a frictionless, one-click system to launch a new service from scratch? It takes months to iron out all the kinks with this kind of automation. So, you can either work on your product, or you can be working on toooooling.

Integration tests - LOL

As if the everyday microservices grind was not enough, you also forfeit the peace of mind offered by solid integration tests. Your single-service and unit tests are passing, but are your critical paths still intact after each commit? Who is in charge of the overall integration test suite, in Postman or wherever else? Is there one? Integration testing a distributed setup is a nearly-impossible problem, so we pretty much gave up on that and replaced it with another one - Observability. Just like “microservices” are the new “distributed systems”, “observability” is the new “debugging in production”. Surely, you are not writing real software if you are not doing…. observability! Observability has become its own sector, and you will pay both a pretty penny and developer time for it.
It doesn’t come as plug-and-play either - you need to understand and implement canary releases, feature flags, etc. Who is doing that? One already overwhelmed engineer? As you can see, breaking up your problem does not make solving it easier - all you get is another set of even harder problems.

What about just “services”?

Why do your services need to be “micro”? What’s wrong with just services? Some startups have gone as far as creating a service for each function, and yes, “isn’t that just like Lambda?” is a valid question. This gives you an idea of how far gone this unchecked cargo cult is. So what do we do? Starting with a monolith is one obvious choice. A pattern that could also work in many instances is “trunk & branches”, where the main “meat and potatoes” monolith is helped by “branch” services. A branch service can be one that takes care of a clearly-identifiable and separately-scalable load. A CPU-hungry Image-Resizing Service makes way more sense than a User Registration Service. Or do you get so many registrations per second that it requires independent horizontal scaling? Side note: In version control, back in the days of CVS and Subversion, we rarely used "master" branches. We had "trunk and branches" because, you know - *trees*. "Master" branches appeared somewhere along the way, and when GitHub decided to do away with the rather unfortunate naming convention, the average engineer was too young to remember about "trunk" - and so the generic "main" default came to be.

The pendulum is swinging back

The hype, however, seems to be dying down. The VC cash faucet is tightening, and so businesses have been market-corrected into exercising common-sense decisions, recognizing that perhaps splurging on web-scale architectures when they don’t have web-scale problems is not sustainable. Ultimately, when faced with the need to travel from New York to Philadelphia, you have two options. You can either attempt to construct a highly intricate spaceship for an orbital descent to your destination, or you can simply purchase an Amtrak train ticket for a 90-minute ride. That is the problem at hand.

Additional reading & listening

- How to recover from microservices
- You want modules, not microservices
- XML is the future
- Gasp! You might not need microservices
- Podcast: How we keep Stack Overflow’s codebase clean and modern
- Goodbye Microservices: From 100s of problem children to 1 superstar
- It’s the future


More in programming

Are we the baddies?

I signed up for Hinge. Holy shit with the boosts. How does someone who works on this wake up every morning and feel okay about themselves? Similarly with the tip screens, Uber algorithm, all the zero sum bullshit using all the tricks of psychology to extract a little bit more from every interaction in society. Nudge. Nudge. NUDGE. Want to partake in normal society like buying a coffee, going on a date, getting a ride, paying a friend. Oh, there’s a middle man now. An evil ominous middleman using state of the art AI algorithms to extract just a little bit more from you. But eventually the market will fix this, right? People will feel sick of being manipulated and move elsewhere? Ahhh, but they see that coming long before you do. They have dashboards. Quick, Jeeves, tune the AI to make people feel less manipulated. Give them a little bit more for now, we have to think about maximizing lifetime customer value here. Oh the AI already did this on its own? Jeeves you’ve been replaced! People perpetually on the edge. You want to opt out of this all you say? Good luck running a competitive business! Every metric is now a target. You better maximize engagement or you will lose engagement; this is a Red Queen’s race we can’t afford to lose! Burn all the social capital, burn all your values, FEED IT ALL TO MOLOCH! Someday, people will have to realize we live in a society. What will it take? A complete self cannibalization to the point you can’t eat your own mouth? It sure as hell isn’t going to be people opting out, that’s a collective action problem you can’t solve. Democracy, haha, you think the algorithms will let you vote to kill them? Your vote is as decoupled from action as the amount Uber pays the driver is decoupled from the fare that you pay. There’s no reform here, there’s only revolution. Will it simply be a huge financial collapse? Or do we need World War 3? And even World War 3 is on a spectrum. Will mass starvation fix this? Or will the attitude of thinking it’s okay to manipulate others at scale persist even past that? He’s got his, and I’ve got mine… If you open a government S&P 500 account for everyone with $1,000 at birth that’ll pay their social security cause it like…goes up…wait who’s creating this value again? It’s not okay. Advertising is not okay. Price discrimination is not okay. Using big data, machine learning, and psychology to manipulate others at scale is not okay. But you aren’t going to learn this lesson until you have fed a huge majority of your customers to Moloch. Modern capitalism is wireheading. Release the hypnodrones. How many cans of Pepsi did you want them to consume an hour again?

Get in losers, we're moving to Linux!

I've never seen so many developers curious about leaving the Mac and giving Linux a go. Something has really changed in the last few years. Maybe Linux just got better? Maybe powerful mini PCs made it easier? Maybe Apple just fumbled their relationship with developers one too many times? Maybe it's all of it. But whatever the reason, the vibe shift is noticeable. This is why the future is so hard to predict! People have been joking about "The Year of Linux on the Desktop" since the late 90s. Just like self-driving cars were supposed to be a thing back in 2017. And now, in the year of our Lord 2025, it seems like we're getting both! I also wouldn't underestimate the cultural influence of a few key people. PewDiePie sharing his journey into Arch and Hyprland with his 110 million followers is important. ThePrimeagen moving to Arch and Hyprland is important. Typecraft teaching beginners how to build an Arch and Hyprland setup from scratch is important (and who I just spoke to about Omarchy). Gabe Newell's Steam Deck being built on Arch and pushing Proton to over 20,000 compatible Linux games is important. You'll notice a trend here, which is that Arch Linux, a notoriously "difficult" distribution, is at the center of much of this new engagement. Despite the fact that it's been around since 2003! There's nothing new about Arch, but there's something new about the circles of people it's engaging. I've put Arch at the center of Omarchy too. Originally just because that was what Hyprland recommended. Then, after living with the wonders of 90,000+ packages on the community-driven AUR package repository, for its own sake. It's really good! But while Arch (and Hyprland) are having a moment amongst a new crowd, it's also "just" Linux at its core. And Linux really is the star of the show. The perfect, free, and open alternative that was just sitting around waiting for developers to finally have had enough of the commercial offerings from Apple and Microsoft. Now obviously there's a taste of "new vegan sees vegans everywhere" here. You start talking about Linux, and you'll hear from folks already in the community or those considering the move too. It's easy to confuse what you'd like to be true with what is actually true. And it's definitely true that Linux is still a niche operating system on the desktop. Even among developers. Apple and Microsoft sit on the lion's share of the market share. But the mind share? They've been losing that fast. The window is open for a major shift to happen. First gradually, then suddenly. It feels like morning in Linux land!

Another tip (tip)
All about Svelte 5 snippets

Snippets are a useful addition to Svelte 5. I use them in my Svelte 5 projects like Edna.

Snippet basics

A snippet is a function that renders html based on its arguments. Here’s how to define and use a snippet: {#snippet hello(name)} <div>Hello {name}!</div> {/snippet} {@render hello("Andrew")} {@render hello("Amy")} You can re-use snippets by exporting them: <script module> export { hello }; </script> {#snippet hello(name)}<div>Hello {name}!</div>{/snippet}

Snippets use cases

Snippets for less nesting

Deeply nested html is hard to read. You can use snippets to extract some parts to make the structure clearer. For example, you can transform: <div> <div class="flex justify-end mt-2"> <button onclick={onclose} class="mr-4 px-4 py-1 border border-black hover:bg-gray-100" >Cancel</button > <button onclick={() => emitRename()} disabled={!canRename} class="px-4 py-1 border border-black hover:bg-gray-50 disabled:text-gray-400 disabled:border-gray-400 disabled:bg-white default:bg-slate-700" >Rename</button > </div> into: {#snippet buttonCancel()} <button onclick={onclose} class="mr-4 px-4 py-1 border border-black hover:bg-gray-100" >Cancel</button > {/snippet} {#snippet buttonRename()}...{/snippet} To make this easier to read: <div> <div class="flex justify-end mt-2"> {@render buttonCancel()} {@render buttonRename()} </div> </div>

snippets replace default <slot/>

In Svelte 4, if you wanted to place some HTML inside the component, you used <slot />. Let’s say you have an Overlay.svelte component used like this: <Overlay> <MyDialog></MyDialog> </Overlay> In Svelte 4, you would use <slot /> to render children: <div class="overlay-wrapper"> <slot /> </div> <slot /> would be replaced with <MyDialog></MyDialog>. In Svelte 5, <MyDialog></MyDialog> is passed to Overlay.svelte as the children property, so you would change Overlay.svelte to: <script> let { children } = $props(); </script> <div class="overlay-wrapper"> {@render children()} </div> The children property is created by the Svelte compiler, so you should avoid naming your own props children.

snippets replace named slots

A component can have a default slot for rendering children and additional named slots. In Svelte 5, instead of named slots, you pass snippets as props. An example of Dialog.svelte: <script> let { title, children } = $props(); </script> <div class="dialog"> <div class="title"> {@render title()} </div> {@render children()} </div> And use: {#snippet title()} <div class="fancy-title">My fancy title</div> {/snippet} <Dialog title={title}> <div>Body of the dialog</div> </Dialog>

passing snippets as implicit props

You can pass the title snippet prop implicitly: <Dialog> {#snippet title()} <div class="fancy-title">My fancy title</div> {/snippet} <div>Body of the dialog</div> </Dialog> Because {#snippet title()} is a child of <Dialog>, we don’t have to pass it as an explicit title={title} prop. The compiler does it for us.

snippets to reduce repetition

Here’s part of how I render https://tools.arslexis.io/ {#snippet row(name, url, desc)} <tr> <td class="text-left align-top" ><a class="font-semibold whitespace-nowrap" href={url}>{name}</a> </td> <td class="pl-4 align-top">{@html desc}</td> </tr> {/snippet} {@render row("unzip", "/unzip/", "unzip a file in the browser")} {@render row("wc", "/wc/", "like <tt>wc</tt>, but in the browser")} It saves me copy & paste of the same HTML and makes the structure more readable.

snippets for recursive rendering

Sometimes you need to render a recursive structure, like nested menus or a file tree.
In Svelte 4 you could use <svelte:self>, but the downside of that is that you create multiple instances of the component. That means that the state is also split among multiple instances, which makes it harder to implement functionality that requires a global view of the structure, like keyboard navigation. With snippets you can render things recursively in a single instance of the component. I used it to implement nested context menus.

snippets to customize rendering

Let’s say you’re building a Menu component. Each menu item is a <div> with some non-trivial children. To allow the client of Menu to customize how items are rendered, you could provide props for things like colors, padding etc., or you could allow ultimate flexibility by accepting an optional menuitem prop that is a snippet that renders the item. You can think of it as a headless UI i.e. you provide the necessary structure and difficult logic like keyboard navigation etc. and allow the client lots of control over how things are rendered.

snippets for library of icons

Before snippets, every SVG icon I used was a Svelte component. Many icons means many files. Now I have a single Icons.svelte file, like:

<script module>
  export { IconMenu, IconSettings };
</script>

{#snippet IconMenu(arg1, arg2, ...)}
  <svg>... icon svg</svg>
{/snippet}

{#snippet IconSettings()}
  <svg>... icon svg</svg>
{/snippet}

Logical Quantifiers in Software

I realize that for all I've talked about Logic for Programmers in this newsletter, I never once explained basic logical quantifiers. They're both simple and incredibly useful, so let's do that this week!

Sets and quantifiers

A set is a collection of unordered, unique elements. {1, 2, 3, …} is a set, as are "every programming language", "every programming language's Wikipedia page", and "every function ever defined in any programming language's standard library". You can put whatever you want in a set, with some very specific limitations to avoid certain paradoxes. [2] Once we have a set, we can ask "is something true for all elements of the set?" and "is something true for at least one element of the set?" IE, is it true that every programming language has a set collection type in the core language? We would write it like this:

# all of them
all l in ProgrammingLanguages: HasSetType(l)

# at least one
some l in ProgrammingLanguages: HasSetType(l)

This is the notation I use in the book because it's easy to read, type, and search for. Mathematicians historically had a few different formats; the one I grew up with was ∀x ∈ set: P(x) to mean all x in set, and ∃ to mean some. I use these when writing for just myself, but find them confusing to programmers when communicating. "All" and "some" are respectively referred to as "universal" and "existential" quantifiers.

Some cool properties

We can simplify expressions with quantifiers, in the same way that we can simplify !(x && y) to !x || !y. First of all, quantifiers are commutative with themselves. some x: some y: P(x,y) is the same as some y: some x: P(x, y). For this reason we can write some x, y: P(x,y) as shorthand. We can even do this when quantifying over different sets, writing some x, x' in X, y in Y instead of some x, x' in X: some y in Y. We can not do this with "alternating quantifiers": all p in Person: some m in Person: Mother(m, p) says that every person has a mother. some m in Person: all p in Person: Mother(m, p) says that someone is every person's mother. Second, existentials distribute over || while universals distribute over &&. "There is some url which returns a 403 or 404" is the same as "there is some url which returns a 403 or some url that returns a 404", and "all PRs pass the linter and the test suites" is the same as "all PRs pass the linter and all PRs pass the test suites". Finally, some and all are duals: some x: P(x) == !(all x: !P(x)), and vice-versa. Intuitively: if some file is malicious, it's not true that all files are benign. All these rules together mean we can manipulate quantifiers almost as easily as we can manipulate regular booleans, putting them in whatever form is easiest to use in programming. Speaking of which, how do we use this in programming?

How we use this in programming

First of all, people clearly have a need for directly using quantifiers in code. If we have something of the form:

for x in list:
    if P(x):
        return true
return false

That's just some x in list: P(x). And this is a prevalent pattern, as you can see by using GitHub code search. It finds over 500k examples of this pattern in Python alone! That can be simplified via using the language's built-in quantifiers: the Python would be any(P(x) for x in list). (Note this is not quantifying over sets but iterables. But the idea translates cleanly enough.)
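To make the correspondence concrete, here is a small Python sketch of my own (not from the book) showing any() and all() as the existential and universal quantifiers, the distribution rule, and the duality rule described above. The url statuses and file names are made-up examples:

# any() is Python's existential quantifier, all() is the universal one.
url_statuses = {"/home": 200, "/admin": 403, "/old-page": 404}

# "there is some url which returns a 403 or 404" ...
some_bad = any(status in (403, 404) for status in url_statuses.values())
# ... is the same as "some url returns a 403 or some url returns a 404":
assert some_bad == (
    any(s == 403 for s in url_statuses.values())
    or any(s == 404 for s in url_statuses.values())
)

# Duality: some x: P(x) == !(all x: !P(x)).
# "Some file is malicious" == "it is not true that all files are benign."
files = ["a.txt", "b.txt", "virus.exe"]

def malicious(f: str) -> bool:
    return f.endswith(".exe")

assert any(malicious(f) for f in files) == (not all(not malicious(f) for f in files))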
More generally, quantifiers are a key way we express higher-level properties of software. What does it mean for a list to be sorted in ascending order? That all i, j in 0..<len(l): if i < j then l[i] <= l[j]. When should a ratchet test fail? When some f in functions - exceptions: Uses(f, bad_function). Should the image classifier work upside down? all i in images: classify(i) == classify(rotate(i, 180)). These are the properties we verify with tests and types and MISU and whatnot; [1] it helps to be able to make them explicit! One cool use case that'll be in the book's next version: database invariants are universal statements over the set of all records, like all a in accounts: a.balance > 0. That's enforceable with a CHECK constraint. But what about something like all i, i' in intervals: NoOverlap(i, i')? That isn't covered by CHECK, since it spans two rows. Quantifier duality to the rescue! The invariant is equivalent to !(some i, i' in intervals: Overlap(i, i')), so is preserved if the query SELECT COUNT(*) FROM intervals CROSS JOIN intervals … returns 0 rows. This means we can test it via a database trigger. [3] There are a lot more use cases for quantifiers, but this is enough to introduce the ideas! Next week's the one year anniversary of the book entering early access, so I'll be writing a bit about that experience and how the book changed. It's crazy how crude v0.1 was compared to the current version.

[1] MISU ("make illegal states unrepresentable") means using data representations that rule out invalid values. For example, if you have a location -> Optional(item) lookup and want to make sure that each item is in exactly one location, consider instead changing the map to item -> location. This is a means of implementing the property all i in item, l, l' in location: if ItemIn(i, l) && l != l' then !ItemIn(i, l').

[2] Specifically, a set can't be an element of itself, which rules out constructing things like "the set of all sets" or "the set of sets that don't contain themselves".

[3] Though note that when you're inserting or updating an interval, you already have that row's fields in the trigger's NEW keyword. So you can just query !(some i in intervals: Overlap(new, i)), which is more efficient.
