More from 37signals Dev
As the final part of our move out of the cloud, we are working on moving 10 petabytes of data out of AWS Simple Storage Service (S3). After exploring different alternatives, we decided to go with the Pure Storage FlashBlade solution. We store different kinds of information on S3, from the attachments customers upload to Basecamp to long-term Prometheus metrics. On top of that, Pure's system also provides filesystem-based capabilities, enabling other relevant uses, such as database backup storage. This makes the system a top priority for observability. Although the system has great reliability, out-of-the-box internal alerting, and autonomous ticket creation, it is still worth having our own metrics and alerts to facilitate problem-solving and to ensure any disruptions are prioritized and handled. For more context on our current Prometheus setup, see how we use Prometheus at 37signals.

Pure OpenMetrics exporter

Pure maintains two OpenMetrics exporters, pure-fb-openmetrics-exporter and pure-fa-openmetrics-exporter. Since we use Pure FlashBlade (fb), this post covers pure-fb-openmetrics-exporter, although overall usage should be similar. The setup is straightforward: it only requires installing the binary and configuring authentication. Here is a snippet of our Chef recipe that installs it:

```ruby
pure_api_token = { 'token' => 'token' } # In Chef, this should come from an encrypted data bag; hardcoded here for simplicity
PURE_EXPORTER_VERSION = "1.0.13".freeze # We normally manage versions via Chef node metadata; hardcoded here for simplicity

directory "/opt/pure_exporter/#{PURE_EXPORTER_VERSION}" do
  recursive true
  owner 'pure_exporter'
  group 'pure_exporter'
end

# Avoid recreating under /tmp after reboot if target_binary is already there
target_binary = "/opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-fb-openmetrics-exporter"

remote_file "/tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}-linux-amd64.tar.gz" do
  source "https://github.com/PureStorage-OpenConnect/pure-fb-openmetrics-exporter/releases/download/v#{PURE_EXPORTER_VERSION}/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}-linux-amd64.tar.gz"
  not_if { ::File.exist?(target_binary) }
end

archive_file "/tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}-linux-amd64.tar.gz" do
  destination "/tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}"
  action :extract
  not_if { ::File.exist?(target_binary) }
end

execute "copy binary" do
  command "sudo cp /tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}/pure-fb-openmetrics-exporter /opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-exporter"
  creates "/opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-exporter"
  not_if { ::File.exist?(target_binary) }
end

tokens = <<EOF
main:
  address: purestorage-mgmt.mydomain.com
  api_token: #{pure_api_token['token']}
EOF

file "/opt/pure_exporter/tokens.yml" do
  content tokens
  owner 'pure_exporter'
  group 'pure_exporter'
  sensitive true
end

systemd_unit 'pure-exporter.service' do
  content <<-EOU
  # Caution: Chef managed content.
  # This is a file resource from #{cookbook_name}::#{recipe_name}

  [Unit]
  Description=Pure Exporter
  After=network.target

  [Service]
  Restart=on-failure
  PIDFile=/var/run/pure-exporter.pid
  User=pure_exporter
  Group=pure_exporter
  ExecStart=/opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-exporter \
    --tokens=/opt/pure_exporter/tokens.yml
  ExecReload=/bin/kill -HUP $MAINPID
  SyslogIdentifier=pure-exporter

  [Install]
  WantedBy=multi-user.target
  EOU
  action [ :create, :enable, :start ]
  notifies :reload, "service[pure-exporter]"
end

service 'pure-exporter'
```

Prometheus Job Configuration

The simplest way of ingesting the metrics is to configure a basic job without any customization:

```yaml
- job_name: pure_exporter
  metrics_path: /metrics
  static_configs:
    - targets: ['<%= @hostname %>:9491']
      labels:
        environment: 'production'
        job: pure_exporter
  params:
    endpoint: [main]  # From the tokens configuration above
```

For a production-ready setup, we are using a slightly different approach. The exporter supports specific metric paths, which allows splitting the scraping into separate Prometheus jobs and reduces the overhead of pulling all the metrics at once:

```yaml
- job_name: pure_exporter_array
  metrics_path: /metrics/array
  static_configs:
    - targets: ['<%= @hostname %>:9491']
      labels:
        environment: 'production'
        job: pure_exporter
  metric_relabel_configs:
    - source_labels: [name]
      target_label: ch
      regex: "([^.]+).*"
      replacement: "$1"
      action: replace
    - source_labels: [name]
      target_label: fb
      regex: "[^.]+\\.([^.]+).*"
      replacement: "$1"
      action: replace
    - source_labels: [name]
      target_label: bay
      regex: "[^.]+\\.[^.]+\\.([^.]+)"
      replacement: "$1"
      action: replace
  params:
    endpoint: [main]  # From the tokens configuration above

- job_name: pure_exporter_clients
  metrics_path: /metrics/clients
  static_configs:
    - targets: ['<%= @hostname %>:9491']
      labels:
        environment: 'production'
        job: pure_exporter
  params:
    endpoint: [main]  # From the tokens configuration above

- job_name: pure_exporter_usage
  metrics_path: /metrics/usage
  static_configs:
    - targets: ['<%= @hostname %>:9491']
      labels:
        environment: 'production'
        job: pure_exporter
  params:
    endpoint: [main]

- job_name: pure_exporter_policies
  metrics_path: /metrics/policies
  static_configs:
    - targets: ['<%= @hostname %>:9491']
      labels:
        environment: 'production'
        job: pure_exporter
  params:
    endpoint: [main]  # From the tokens configuration above
```

We also configure some metric_relabel_configs to extract labels from the name label using regular expressions. Those labels help reduce the complexity of queries that aggregate metrics by different components (chassis, blade, and bay). Detailed documentation on the available metrics can be found here.
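Before wiring Prometheus up, it can be useful to pull one of the split metric paths by hand and confirm that the endpoint parameter resolves against tokens.yml. Here is a minimal Ruby sketch of such a smoke test; it assumes the exporter is running locally on port 9491, the same port our scrape targets above use:

```ruby
require "net/http"

# Fetch the array metrics through the same path and endpoint parameter
# the scrape jobs above use. "main" comes from tokens.yml.
uri = URI("http://localhost:9491/metrics/array")
uri.query = URI.encode_www_form(endpoint: "main")

response = Net::HTTP.get_response(uri)
abort "exporter returned #{response.code}" unless response.is_a?(Net::HTTPSuccess)

# Print a few hardware health samples to confirm metrics are flowing.
puts response.body.lines.grep(/^purefb_hardware_health/).first(5)
```

If that prints samples, the Prometheus jobs above should scrape cleanly.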
Alerts

Auto Generated Alerts

As I shared earlier, the system has an internal alerting module that automatically triggers alerts for critical situations and creates tickets. To cover those alerts on the Prometheus side, we added an alerting configuration of our own that relies on the incoming severities:

```yaml
- alert: PureAlert
  annotations:
    summary: '{{ $labels.summary }}'
    description: '{{ $labels.component_type }} - {{ $labels.component_name }} - {{ $labels.action }} - {{ $labels.kburl }}'
    dashboard: 'https://grafana/your-dashboard'
  expr: purefb_alerts_open{environment="production"} == 1
  for: 1m
```

We still need to evaluate how the Pure-generated alerts will interact with the custom alerts covered below, and we might decide to stick to one or the other depending on what we find out.

Hardware

Before I continue, the image below helps visualize how some of the Pure FlashBlade components are physically organized:

[Image: Pure FlashBlade physical layout: chassis (ch), blades (fb), and bays]

Because of Pure's reliability, most isolated hardware failures do not require the immediate attention of an Ops team member. To cover the most basic hardware failures, we configure an alert that sends a message to the Ops Basecamp 4 project chat:

```yaml
- alert: PureHardwareFailed
  annotations:
    summary: Hardware {{ $labels.name }} in chassis {{ $labels.ch }} has failed
    description: 'The Pure Storage hardware {{ $labels.name }} in chassis {{ $labels.ch }} has failed'
    dashboard: 'https://grafana/your-dashboard'
  expr: purefb_hardware_health == 0
  for: 1m
  labels:
    severity: chat-notification
```

We also configure alerts that check for multiple hardware failures of the same type. This doesn't mean two simultaneous failures will put the system in a critical state, but it is a fair guardrail for unexpected scenarios. We also expect those situations to be rare, keeping the risk of causing unnecessary noise low.

```yaml
- alert: PureMultipleHardwareFailed
  annotations:
    summary: Pure chassis {{ $labels.ch }} has {{ $value }} failed {{ $labels.type }}
    description: 'The Pure Storage chassis {{ $labels.ch }} has {{ $value }} failed {{ $labels.type }}, close to the healthy limit of two simultaneous failures. Ensure that the hardware failures are being worked on'
    dashboard: 'https://grafana/your-dashboard'
  expr: count(purefb_hardware_health{type!~"eth|mgmt_port|bay"} == 0) by (ch,type,environment) > 1
  for: 1m
  labels:
    severity: page

# We are looking for multiple failed bays in the same blade
- alert: PureMultipleBaysFailed
  annotations:
    summary: Pure chassis {{ $labels.ch }} has fb {{ $labels.fb }} with {{ $value }} failed bays
    description: 'The Pure Storage chassis {{ $labels.ch }} has fb {{ $labels.fb }} with {{ $value }} failed bays, close to the healthy limit of two simultaneous failures. Ensure that the hardware failures are being worked on'
    dashboard: 'https://grafana/your-dashboard'
  expr: count(purefb_hardware_health{type="bay"} == 0) by (ch,type,fb,environment) > 1
  for: 1m
  labels:
    severity: page
```

Finally, we configure high-level alerts for chassis and XFM failures:

```yaml
- alert: PureChassisFailed
  annotations:
    summary: Chassis {{ $labels.name }} has failed
    description: 'The Pure Storage hardware chassis {{ $labels.name }} has failed'
    dashboard: 'https://grafana/your-dashboard'
  expr: purefb_hardware_health{type="ch"} == 0
  for: 1m
  labels:
    severity: page

- alert: PureXFMFailed
  annotations:
    summary: External Fabric Module {{ $labels.name }} has failed
    description: 'The Pure Storage external fabric module (XFM) {{ $labels.name }} has failed'
    dashboard: 'https://grafana/your-dashboard'
  expr: purefb_hardware_health{type="xfm"} == 0
  for: 1m
  labels:
    severity: page
```

Latency

Using the metric purefb_array_performance_latency_usec, we can set a threshold for all the different protocols and dimensions (read, write, etc.), so we are alerted if any problem causes latency to rise above the expected level.
```yaml
- alert: PureLatencyHigh
  annotations:
    summary: Pure {{ $labels.dimension }} - {{ $labels.protocol }} latency high
    description: 'Pure {{ $labels.protocol }} latency for dimension {{ $labels.dimension }} is above 100ms'
    dashboard: 'https://grafana/your-dashboard'
  expr: (avg_over_time(purefb_array_performance_latency_usec{protocol="all"}[30m]) * 0.001) > 100  # usec -> ms; fire when the 30m average exceeds 100ms
  for: 1m
  labels:
    severity: chat-notification
```

Saturation

For saturation, we are primarily worried about something unexpected causing excessive use of array space, increasing the risk of hitting the cluster's capacity. With that in mind, it's good to have a simple alert in place, even if we don't expect it to fire anytime soon. The predict_linear call projects the free-space trend 730 hours (about one month) ahead and compares it against 10% of total capacity:

```yaml
- alert: PureArraySpace
  annotations:
    summary: Pure Cluster {{ $labels.instance }} available space is expected to be below 10%
    description: 'The array space for Pure cluster {{ $labels.instance }} is expected to be below 10% in a month, please investigate and ensure there is no risk of running out of capacity'
    dashboard: 'https://grafana/your-dashboard'
  expr: (predict_linear(purefb_array_space_bytes{space="empty",type="array"}[30d], 730 * 3600)) < (purefb_array_space_bytes{space="capacity",type="array"} * 0.10)
  for: 1m
  labels:
    severity: chat-notification
```

HTTP

We use BigIP load balancers to front the cluster, which means that all the alerts we already had in place for the BigIP HTTP profiles, virtual servers, and pools also cover access to Pure. The right solution here will differ for each organization, but it is good practice to keep an eye on HTTP status codes and throughput.

Grafana Dashboards

The project's GitHub repository includes JSON files for Grafana dashboards based on the metrics generated by the exporter. With simple adjustments to fit each setup, they can be imported quickly.

Wrapping up

On top of the system's built-in capabilities, Pure also provides options to integrate with well-known tools like Prometheus and Grafana, which lets us manage the cluster the same way we manage everything else. I hope this post helps any other team interested in working with them better understand the effort involved. Thanks for reading!
Today, we are releasing Hotwire Spark, a live-reloading system for Rails applications. Reloading the browser automatically on source changes is a problem that has been well-solved for a long time. Here, we wanted to put an accent on smoothness. If the reload operation is very noticeable, the feedback loop is similar to just reloading the page yourself. But if it's smooth enough, so that you only perceive the intended change, the feedback loop becomes terrific.

To use it, just install the gem in development:

```ruby
group :development do
  gem "hotwire-spark"
end
```

It will update the current page on three types of change: HTML content, CSS, and Stimulus controllers. How do we achieve that desired smoothness with each?

- For HTML content, it morphs the <body> of the page into the new <body>. It also disconnects and reconnects all the Stimulus controllers on the page.
- For CSS, it reloads the changed stylesheet.
- For Stimulus controllers, it fetches the changed controller, replaces its module in Stimulus, and reconnects all the controllers.

We designed Hotwire Spark to shine with the #nobuild approach we use and recommend. Serving CSS and JS assets as standalone files is ideal when you want to fetch and update only what has changed. There is no need for bundling or any other tooling. Hot Module Replacement for Stimulus controllers without any frontend build tool is pretty cool!

2024 has been a very special year for Rails. We're thrilled to share Hotwire Spark before the year wraps up. Wishing you all a joyful holiday season and a fantastic start to 2025.
If you have the luxury of starting a new Rails app today, here's our recommendation: go vanilla.

Fight hard before adding Ruby dependencies. Keep the Gemfile that Rails generates as close to the original as possible. Fight even harder before adding JavaScript dependencies. You don't need React or any other front-end framework, nor a JSON API to feed them. Hotwire is a fantastic, pragmatic, and ridiculously productive technology for the front end. Use it.

The same goes for mobile apps: use Hotwire Native. With a hybrid approach you can combine the very same web app you have built with a wonderful native experience right where you want it. The productivity compared to a purely native approach is night and day.

Embrace and celebrate rendering things on the server. It has become cool again. ERB templates and view helpers will take you as far as you need, and they are a fantastic common ground for designers to collaborate hands-on with the code.

#nobuild is the simplest way to go; don't close this door with your choices. Instead of bundling JavaScript, use import maps. Don't bundle CSS; just use modern standard CSS goodies and serve them all with Propshaft. If you have 100 JavaScript files and 100 stylesheets, serve 200 standalone requests multiplexed over HTTP/2. You will be delighted.

Don't add Redis to the mix. Use solid_cache for caching, solid_queue for jobs, and solid_cable for Action Cable. They will all work on your beloved relational database and are battle-tested.

Test your apps with Minitest. Use fixtures and build a realistic set of them as you cook your app.

Make your app a PWA, which is fully supported by Rails 8. This may be more than enough before caring about mobile apps at all.

Deploy your app with Kamal.

If you want heuristics: your importmap.rb should import Turbo, Stimulus, your app controllers, and little else (a sketch follows at the end of this post), and your Gemfile should be almost identical to the one that Rails generates.

I know it sounds radical, but going vanilla is a radical stance in this convoluted world of endless choices. This is the Rails 8 stack we have chosen for our new apps at 37signals. We are a tiny crew, so we care a lot about productivity. And we sell products, not stacks, so we care a lot about delighting our users. This is our Omakase stack because it offers the optimal balance for achieving both.

Vanilla means your app stays nimble. Fewer dependencies mean fewer future headaches. You get a tight integration out of the box, so you can focus on building things. It also maximizes the odds of smoother future upgrades.

Vanilla requires determination, though, because new dependencies always look shinier. It's always clear what you get when you add them, but never what you lose in the long term.

It is certainly up to you. Rails is a wonderful big tent. These are our opinions. If they resonate, choose vanilla!

Guess what our advice is for architecting your app internals?
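To make that importmap heuristic concrete, here is a sketch of what a vanilla importmap.rb can look like; the pins mirror the Rails 8 defaults, so your generated file may differ slightly:

```ruby
# config/importmap.rb -- a vanilla app needs little more than this
pin "application"
pin "@hotwired/turbo-rails", to: "turbo.min.js"
pin "@hotwired/stimulus", to: "stimulus.min.js"
pin "@hotwired/stimulus-loading", to: "stimulus-loading.js"
pin_all_from "app/javascript/controllers", under: "controllers"
```

Every pin beyond these is a dependency you chose to take on, which is exactly the point of the heuristic.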
We've just released Mission Control — Jobs v1.0.0, the dashboard and set of extensions for operating background jobs that we introduced earlier this year. This new version is the result of 92 pull requests, 67 issues, and the help of 35 different contributors. It includes many bugfixes and improvements, such as:

- Support for Solid Queue's recurring tasks, including running them on-demand.
- Support for API-only apps.
- Allowing immediate dispatching of scheduled and blocked jobs.
- Backtrace cleaning for failed jobs' backtraces.
- A safer default for authentication, with Basic HTTP authentication enabled and initially closed unless configured or explicitly disabled.

[Image: Recurring tasks in Mission Control — Jobs, with a subset of the tasks we run in production]

We use Mission Control — Jobs daily to manage jobs in HEY and Basecamp 4, with both Solid Queue and Resque, and it's the dashboard we recommend if you're using Solid Queue for your jobs. Our plan is to upstream some of the extensions we've made to Active Job and continue improving it until it's ready to be included by default in Rails together with Solid Queue. If you want to help us with that, are interested in learning more, or have any issues or questions, head over to the repo on GitHub. We hope you like it!
More in programming
I often give Google a lot of shit for shutting down services whenever they're bored, hire a new executive, or face a three-day weekend. The company seems institutionally incapable of standing behind the majority of the products they launch for longer than a KPI cycle. But when the company does decide that something is pivotal to the business, it's an entirely different story. And that's the tale of YouTube: The King of Internet Archives (Video Edition).

I've just revived my YouTube channel after realizing just how often video has become my go-to for learning. This entire Linux adventure I've gotten myself into started by watching YouTube creators like ThePrimeagen, Typecraft, and Bread on Penguins. I learned about mechanical keyboards from Hipyo Tech. Devoured endless mini PC reviews from Level1Techs and Robtech. Oh, and took a side quest into retro gaming handhelds with Retro Game Corps.

But it was when putting together the playlists for my own channel that YouTube's royal role in internet archival really stood out. Like with the original Rails Demo from 19 years ago(!), the infamous talk at Startup School from 2009, or my very first RailsConf keynote from 2006. You'd be hard-pressed to find any video content on the internet from those days anywhere else. I notice that with podcast appearances from even just a few years ago that have gone missing already. Decentralization is wonderful in many ways, but it's very much subject to link rot and disappearing content.

I love how you can pull in videos from other channels onto your own page as well. I've gathered up a bunch of the many podcast appearances I've done, and even dedicated an entire playlist to the 69(!!) clips from the Lex Fridman interview. The majority of the RailsConf and Rails World keynotes are on a list. So is the old On Writing Software Well series that I keep meaning to bring back.

When you're working in small tech, it's really easy to become so jaded with big tech that you become ideologically blind to the benefits they do bring. I find no inconsistency in cheering much of the antitrust agenda against Google while also celebrating their work on Chrome or their stewardship of YouTube. Any company as large as Google is bound to be full of contradictions, ambitions, and behaviors. We ought to have the capacity to cheer for the good parts and boo at the bad parts without feeling like frauds.

So today, I choose to cheer for YouTube. It's an international treasure of learning, enthusiasm, and discovery.
I was listening to a podcast interview with Jackson Browne (American singer/songwriter, political activist, and inductee into the Rock and Roll Hall of Fame), and the interviewer asks him how he approaches writing songs with social commentary and critique, something along the lines of: "How do you get from the New York Times headline on a social subject to the emotional heart of a song that matters to each individual?"

Browne discusses how, if you're too subtle, people won't know what you're talking about. And if you're too direct, you run the risk of making people feel like they're being scolded. Here's what he says about his songwriting:

"I want this to sound like you and I were drinking in a bar and we're just talking about what's going on in the world. Not as if you're at some elevated place and lecturing people about something they should know about but don't but [you think] they should care. You have to get to people where [they are, where] they do care and where they do know."

I think that's a great insight for anyone looking to have a connecting, effective voice. I know for me, it's really easy to slide into a lecturing voice: you "should" do this and you "shouldn't" do that. But I like Browne's framing of trying to have an informal, conversational tone that meets people where they are. Like you're discussing an issue in the bar, rather than listening to a sermon.

Chris Coyier is the canonical example of this that comes to mind. I still think of this post from CSS-Tricks where Chris talks about how to have submit buttons that go to different URLs:

"When you submit that form, it's going to go to the URL /submit. Say you need another submit button that submits to a different URL. It doesn't matter why. There is always a reason for things. The web is a big place and all that."

He doesn't conjure up some universally applicable, justified rationale for why he's sharing this method. Nor is there any pontificating on why this is "good" or "bad". Instead, like most of Chris' stuff, I read it as a humble acknowledgement of the practicalities at hand: "Hey, the world is a big place. People have to do crafty things to make their stuff work. And if you're in that situation, here's something that might help what ails ya."

I want to work on developing that kind of a voice because I love reading voices like that.
Previously, I wrote some sketchy ideas for what I call a p-fast trie, which is basically a wide fan-out variant of an x-fast trie. It allows you to find the longest matching prefix or nearest predecessor or successor of a query string in a set of names in O(log k) time, where k is the key length. My initial sketch was more complicated and greedy for space than necessary, so here's a simplified revision. ("p" now stands for prefix.)

layout

A p-fast trie stores a lexicographically ordered set of names. A name is a sequence of characters from some small-ish character set. For example, DNS names can be represented as a set of about 50 letters, digits, punctuation and escape characters, usually one per byte of name. Names that are arbitrary bit strings can be split into chunks of 6 bits to make a set of 64 characters.

Every unique prefix of every name is added to a hash table. An entry in the hash table contains:

- A shared reference to the closest name lexicographically greater than or equal to the prefix. Multiple hash table entries will refer to the same name. A reference to a name might instead be a reference to a leaf object containing the name.
- The length of the prefix. To save space, each prefix is not stored separately, but implied by the combination of the closest name and prefix length.
- A bitmap with one bit per possible character, corresponding to the next character after this prefix. For every other prefix that matches this prefix and is one character longer than this prefix, a bit is set in the bitmap corresponding to the last character of the longer prefix.

search

The basic algorithm is a longest-prefix match (a code sketch follows the predecessor section below):

- Look up the query string in the hash table. If there's a match, great, done.
- Otherwise proceed by binary chop on the length of the query string.
- If the prefix isn't in the hash table, reduce the prefix length and search again. (If the empty prefix isn't in the hash table then there are no names to find.)
- If the prefix is in the hash table, check the next character of the query string in the bitmap. If its bit is set, increase the prefix length and search again. Otherwise, this prefix is the answer.

predecessor

Instead of putting leaf objects in a linked list, we can use a more complicated search algorithm to find names lexicographically closest to the query string. It's tricky because a longest-prefix match can land in the wrong branch of the implicit trie. Here's an outline of a predecessor search; successor requires more thought.

During the binary chop, when we find a prefix in the hash table, compare the complete query string against the complete name that the hash table entry refers to (the closest name greater than or equal to the common prefix).

- If the name is greater than the query string, we're in the wrong branch of the trie, so reduce the length of the prefix and search again.
- Otherwise, search the set bits in the bitmap for one corresponding to the greatest character less than the query string's next character; if there is one, remember it and the prefix length. This will be the top of the sub-trie containing the predecessor, unless we find a longer match.
- If the next character's bit is set in the bitmap, continue searching with a longer prefix, else stop.

When the binary chop has finished, we need to walk down the predecessor sub-trie to find its greatest leaf. This must be done one character at a time – there's no shortcut.
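Here is a minimal Ruby sketch of the search, under stated assumptions: the per-entry bitmap is modeled as a Set of next characters, the hash table as a plain Hash, and the names and fields are hypothetical rather than the real layout:

```ruby
require "set"

# Toy model of a p-fast trie entry. next_chars stands in for the bitmap.
Entry = Struct.new(:closest_name, :next_chars)

# Every unique prefix of every name gets a hash table entry. Iterating in
# sorted order makes closest_name the smallest name >= each prefix.
def build_table(names)
  table = {}
  names.sort.each do |name|
    (0..name.length).each do |len|
      prefix = name[0, len]
      entry = (table[prefix] ||= Entry.new(name, Set.new))
      entry.next_chars << name[len] if len < name.length
    end
  end
  table
end

# Longest-prefix match by binary chop on the prefix length.
def longest_prefix_match(table, query)
  return query if table.key?(query) # the whole query is a stored prefix
  return nil unless table.key?("")  # empty table: no names to find

  lo = 0            # query[0, lo] is known to be in the table
  hi = query.length # query[0, hi] is known to be absent
  while hi - lo > 1
    mid = (lo + hi) / 2
    entry = table[query[0, mid]]
    if entry.nil?
      hi = mid                                  # too long: chop shorter
    elsif entry.next_chars.include?(query[mid])
      lo = mid                                  # extendable: chop longer
    else
      return query[0, mid]                      # present but not extendable
    end
  end
  query[0, lo]
end

table = build_table(["foo.example", "bar.example"])
longest_prefix_match(table, "food") # => "foo"
```

Each loop iteration costs one hash probe, which is where the O(log k) bound comes from.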
thoughts

In my previous note I wondered how the number of search steps in a p-fast trie compares to a qp-trie. I have some old numbers measuring the average depth of binary, 4-bit, 5-bit, 6-bit, and DNS qp-trie variants. A DNS-trie varies between 7 and 15 deep on average, depending on the data set. The number of steps for a search matches the depth for exact-match lookups, and is up to twice the depth for predecessor searches.

A p-fast trie needs at most 9 hash table probes for DNS names, and is unlikely to need more than 7. I didn't record the average length of names in my benchmark data sets, but I guess they would be 8–32 characters, meaning 3–5 probes (the binary chop costs about log₂ of the name length). That is far fewer than a qp-trie, though I suspect a hash table probe takes more time than chasing a qp-trie pointer. (But this kind of guesstimate is notoriously likely to be wrong!) However, a predecessor search might need 30 probes to walk down the p-fast trie, which I think suggests a linked list of leaf objects is a better option.
New Logic for Programmers Release!

v0.11 is now available! This is over 20% longer than v0.10, with a new chapter on code proofs, three chapter overhauls, and more! Full release notes here.

Software books I wish I could read

I'm writing Logic for Programmers because it's a book I wanted to have ten years ago. I had to learn everything in it the hard way, which is why I'm ensuring that everybody else can learn it the easy way. Books occupy a sort of weird niche in software. We're great at sharing information via blogs and git repos and entire websites. These have many benefits over books: they're free, they're easily accessible, they can be updated quickly, they can even be interactive. But no blog post has influenced me as profoundly as Data and Reality or Making Software. There is no blog or talk about debugging as good as the Debugging book. It might not be anything deeper than "people spend more time per word on writing books than blog posts". I dunno. So here are some other books I wish I could read. I don't think any of them exist yet, but it's a big world out there. Also, while they're probably best as books, a website or a series of blog posts would be OK too.

Everything about Configurations

The whole topic of how we configure software, whether by CLI flags, environment variables, or JSON/YAML/XML/Dhall files. What causes the configuration complexity clock? How do we distinguish between basic, advanced, and developer-only configuration options? When should we disallow configuration? How do we test all possible configurations for correctness? Why do so many widespread outages trace back to misconfiguration, and how do we prevent them? I also want the same for plugin systems: manifests, permissions, common APIs and architectures, etc. Configuration management is more universal, though, since everybody either uses software with configuration or has made software with configuration.

The Big Book of Complicated Data Schemas

I guess this would be kind of like Schema.org, except with a lot more on the "why" and not the what. Why is it important for the Volcano model to have a "smokingAllowed" field?[1] I'd see this less as "here's your guide to putting Volcanos in your database" and more "here are recurring motifs in modeling interesting domains", to help a person see sources of complexity in their own domain. Does something crop up if the references can form a cycle? If a relationship needs to be strictly temporary, or a reference can change type? Bonus: path dependence in data models, where an additional requirement leads to a vastly different ideal data model that a company couldn't adopt because they had already built on the old model. (This has got to exist, right? Business modeling is a big enough domain that this must exist. Maybe The Essence of Software touches on this? Man, I feel bad I haven't read that yet.)

Computer Science for Software Engineers

Yes, I checked, this book does not exist (though maybe this is the same thing). I don't have any formal software education; everything I know was either self-taught or learned on the job. But it's way easier to learn software engineering that way than computer science. And I bet there are a lot of other engineers in the same boat. This book wouldn't have to be comprehensive or instructive: just enough about each topic to understand why it's an area of study and appreciate how research in it eventually finds its way into practice.

MISU Patterns

MISU, or "Make Illegal States Unrepresentable", is the idea of designing system invariants into the structure of your data. For example, if a Contact needs at least one of email or phone to be non-null, make it a sum type over EmailContact, PhoneContact, EmailPhoneContact (from this post); a quick sketch follows below.
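As a rough illustration (not from the original post), here is what that Contact example can look like in Ruby, which has no sum types; the three class names mirror the example above, and everything else is hypothetical:

```ruby
# Each legal state gets its own class; the "no email and no phone" state
# simply cannot be constructed.
EmailContact      = Struct.new(:email)
PhoneContact      = Struct.new(:phone)
EmailPhoneContact = Struct.new(:email, :phone)

module Contact
  # Smart constructor: the illegal combination is rejected at the door.
  def self.build(email: nil, phone: nil)
    case [email.nil?, phone.nil?]
    in [false, true]  then EmailContact.new(email)
    in [true,  false] then PhoneContact.new(phone)
    in [false, false] then EmailPhoneContact.new(email, phone)
    in [true,  true]  then raise ArgumentError, "need an email or a phone"
    end
  end
end

Contact.build(email: "a@example.com") # => #<struct EmailContact ...>
Contact.build                         # raises ArgumentError
```

This combines two of the patterns mentioned just below: one type per legal state, plus a smart constructor guarding construction.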
MISU is great. Most MISU in the wild looks very different than that, though, because the concept of MISU is so broad that there are lots of different ways to achieve it. And that means there are "patterns": smart constructors, product types, properly using sets, newtypes to some degree, etc. Some of them are specific to typed FP, while others can be used in even untyped languages. Someone oughta make a pattern book. My one request would be to not give them cutesy names. Do something like the Aarne–Thompson–Uther Index, where items are given names like "Recognition by manner of throwing cakes of different weights into faces of old uncles". Names can come later.

The Tools of '25

Not something I'd read, but something to recommend to junior engineers. Starting out, it's easy to think the only bit that matters is the language or framework and not realize the enormous amount of surrounding tooling you'll have to learn. This book would cover the basics of tools that enough developers will probably use at some point: git, VSCode, very basic Unix and bash, curl. Maybe the general concepts of tools that appear in every ecosystem, like package managers, build tools, task runners. That might be easier if we specialized this to one particular domain, like webdev or data science. Ideally the book would only have to be updated every five years or so. No LLM stuff, because I don't expect the tooling will be stable through 2026, to say nothing of 2030.

A History of Obsolete Optimizations

Probably better as a really long blog series. Each chapter would be broken up into two parts:

1. A deep dive into a brilliant, elegant, insightful historical optimization designed to work within the constraints of that era's computing technology
2. What we started doing instead, once we had more compute/network/storage available

c.f. A Spellchecker Used to Be a Major Feat of Software Engineering. Bonus topics would be brilliance obsoleted by standardization (like what people did before git and json were universal), optimizations we do today that may not stand the test of time, and optimizations from the past that did.

Sphinx Internals

I need this. I've spent so much goddamn time digging around in Sphinx and docutils source code I'm gonna throw up.

Systems Distributed Talk Today!

The online premiere's at noon central / 5 PM UTC, here! I'll be hanging out to answer questions and be awkward. You ever watch a recording of your own talk? It's real uncomfortable!

[1] In this case because it's a field on one of Volcano's supertypes. I guess schemas gotta follow LSP too.