More from Oxide Computer Company Blog
Sometime in late 2007, we had the idea of a DTrace conference. Or really, more of a meetup; from the primordial e-mail I sent: The goal here, by the way, is not a DTrace user group, but more of a face-to-face meeting with people actively involved in DTrace — either by porting it to another system, by integrating probes into higher level environments, by building higher-level tools on top of DTrace or by using it heavily and/or in a critical role. That said, we also don’t want to be exclusionary, so our thinking is that the only true requirement for attending is that everyone must be prepared to speak informally for 15 mins or so on what they are doing with DTrace, any limitations that they have encountered, and some ideas for the future. We’re thinking that this is going to be on the order of 15-30 people (though more would be a good problem to have — we’ll track it if necessary), that it will be one full day (breakfast in the morning through drinks into the evening), and that we’re going to host it here at our offices in San Francisco sometime in March 2008. This same note also included some suggested names for the gathering, including what in hindsight seems a clear winner: DTrace Bi-Mon-Sci-Fi-Con. As if knowing that I should leave an explanatory note to my future self as to why this name was not selected, my past self fortunately clarified: "before everyone clamors for the obvious Bi-Mon-Sci-Fi-Con, you should know that most Millennials don’t (sadly) get the reference." (While I disagree with the judgement of my past self, it at least indicates that at some point I cared if anyone got the reference.) We settled on a much more obscure reference, and had the first dtrace.conf in March 2008. Befitting the style of the time, it was an unconference (a term that may well have hit its apogee in 2008) that you signed up to attend by editing a wiki. More surprising given the year (and thanks entirely to attendee Ben Rockwood), it was recorded — though this is so long ago that I referred to it as video taping (and with none of the participants mic’d, I’m afraid the quality isn’t very good). The conference, however, was terrific, viz. the reports of Adam, Keith and Stephen (all somehow still online nearly two decades later). If anything, it was a little too good: we realized that we couldn’t recreate the magic, and we demurred on making it an annual event. Years passed, and memories faded. By 2012, it felt like we wanted to get folks together again, now under a post-lawnmower corporate aegis in Joyent. The resulting dtrace.conf(12) was a success, and the Olympiad cadence felt like the right one; we did it again four years later at dtrace.conf(16). In 2020, we came back together for a new adventure — and the DTrace Olympiad was not lost on Adam. Alas, dtrace.conf(20) — like the Olympics themselves — was cancelled, if implicitly. Unlike the Olympics, however, it was not to be rescheduled. More years passed and DTrace continued to prove its utility at Oxide; last year when Adam and I did our "DTrace at 20" episode of Oxide and Friends, we vowed to hold dtrace.conf(24) — and a few months ago, we set our date to be December 11th. At first we assumed we would do something similar to our earlier conferences: a one-day participant-run conference, at the Oxide office in Emeryville. But times have changed: thanks to the rise of remote work, technologists are much more dispersed — and many more people would need to travel for dtrace.conf(24) than in previous DTrace Olympiads. Travel hasn’t become any cheaper since 2008, and the cost (and inconvenience) was clearly going to limit attendance. The dilemma for our small meetup highlights the changing dynamics in tech conferences in general: with talks all recorded and made publicly available after the conference, how does one justify attending a conference in person? There can be reasonable answers to that question, of course: it may be the hallway track, or the expo hall, or the after-hours socializing, or perhaps some other special conference experience. But it’s also not surprising that some conferences — especially ones really focused on technical content — have decided that they are better off doing as conference giant O’Reilly Media did, and going exclusively online. And without the need to feed and shelter participants, the logistics for running a conference become much more tenable — and the price point can be lowered to the point that even highly produced conferences like P99 CONF can be made freely available. This, in turn, leads to much greater attendance — and a network effect that can get back some of what one might lose going online. In particular, using chat as the hallway track can be more much effective (and is certainly more scalable!) than the actual physical hallways at a conference. For conferences in general, there is a conversation to be had here (and as a teaser, Adam and I are going to talk about it with Stephen O’Grady and Theo Schlossnagle on Oxide and Friends next week, but for our quirky, one-day, Olympiad-cadence dtrace.conf, the decision was pretty easy: there was much more to be gained than lost by going exclusively on-line. So dtrace.conf(24) is coming up next week, and it’s available to everyone. In terms of platform, we’re going to try to keep that pretty simple: we’re going to use Google Meet for the actual presenters, which we will stream in real-time to YouTube — and we’ll use the Oxide Discord for all chat. We’re hoping you’ll join us on December 11th — and if you want to talk about DTrace or a DTrace-adjacent topic, we’d love for you to present! Keeping to the unconference style, if you would like to present, please indicate your topic in the #session-topics Discord channel so we can get the agenda fleshed out. While we’re excited to be online, there are some historical accoutrements of conferences that we didn’t want to give up. First, we have a tradition of t-shirts with dtrace.conf. Thanks to our designer Ben Leonard, we have a banger of a t-shirt, capturing the spirit of our original dtrace.conf(08) shirt but with an Oxide twist. It’s (obviously) harder to make those free but we have tried to price them reasonably. You can get your t-shirt by adding it to your (free) dtrace.conf ticket. (And for those who present at dtrace.conf, your shirt is on us — we’ll send you a coupon code!) Second, for those who can make their way to the East Bay and want some hangout time, we are going to have an après conference social event at the Oxide office starting at 5p. We’re charging something nominal for that too (and like the t-shirt, you pay for that via your dtrace.conf ticket); we’ll have some food and drinks and an Oxide hardware tour for the curious — and (of course?) there will be Fishpong. Much has changed since I sent that e-mail 17 years ago — but the shared values and disposition that brought together our small community continue to endure; we look forward to seeing everyone (virtually) at dtrace.conf(24)!
Oxide Computer Company and Lawrence Livermore National Laboratory Work Together to Advance Cloud and HPC Convergence Oxide Computer Company and Lawrence Livermore National Laboratory (LLNL) today announced a plan to bring on-premises cloud computing capabilities to the Livermore Computing (LC) high-performance computing (HPC) center. The rack-scale Oxide Cloud Computer allows LLNL to improve the efficiency of operational workloads and will provide users in the National Nuclear Security Administration (NNSA) with new capabilities for provisioning secure, virtualized services alongside HPC workloads. HPC centers have traditionally run batch workloads for large-scale scientific simulations and other compute-heavy applications. HPC workloads do not exist in isolation—there are a multitude of persistent, operational services that keep the HPC center running. Meanwhile, HPC users also want to deploy cloud-like persistent services—databases, Jupyter notebooks, orchestration tools, Kubernetes clusters. Clouds have developed extensive APIs, security layers, and automation to enable these capabilities, but few options exist to deploy fully virtualized, automated cloud environments on-premises. The Oxide Cloud Computer allows organizations to deliver secure cloud computing capabilities within an on-premises environment. On-premises environments are the next frontier for cloud computing. LLNL is tackling some of the hardest and most important problems in science and technology, requiring advanced hardware, software, and cloud capabilities. We are thrilled to be working with their exceptional team to help advance those efforts, delivering an integrated system that meets their rigorous requirements for performance, efficiency, and security. — Steve TuckCEO at Oxide Computer Company Leveraging the new Oxide Cloud Computer, LLNL will enable staff to provision virtual machines (VMs) and services via self-service APIs, improving operations and modernizing aspects of system management. In addition, LLNL will use the Oxide rack as a proving ground for secure multi-tenancy and for smooth integration with the LLNL-developed Flux resource manager. LLNL plans to bring its users cloud-like Infrastructure-as-a-Service (IaaS) capabilities that work seamlessly with their HPC jobs, while maintaining security and isolation from other users. Beyond LLNL personnel, researchers at the Los Alamos National Laboratory and Sandia National Laboratories will also partner in many of the activities on the Oxide Cloud Computer. We look forward to working with Oxide to integrate this machine within our HPC center. Oxide’s Cloud Computer will allow us to securely support new types of workloads for users, and it will be a proving ground for introducing cloud-like features to operational processes and user workflows. We expect Oxide’s open-source software stack and their transparent and open approach to development to help us work closely together. — Todd GamblinDistinguished Member of Technical Staff at LLNL Sandia is excited to explore the Oxide platform as we work to integrate on-premise cloud technologies into our HPC environment. This advancement has the potential to enable new classes of interactive and on-demand modeling and simulation capabilities. — Kevin PedrettiDistinguished Member of Technical Staff at Sandia National Laboratories LLNL plans to work with Oxide on additional capabilities, including the deployment of additional Cloud Computers in its environment. Of particular interest are scale-out capabilities and disaster recovery. The latest installation underscores Oxide Computer’s momentum in the federal technology ecosystem, providing reliable, state-of-the-art Cloud Computers to support critical IT infrastructure. To learn more about Oxide Computer, visit https://oxide.computer. About Oxide Computer Oxide Computer Company is the creator of the world’s first commercial Cloud Computer, a true rack-scale system with fully unified hardware and software, purpose-built to deliver hyperscale cloud computing to on-premises data centers. With Oxide, organizations can fully realize the economic and operational benefits of cloud ownership, with access to the same self-service development experience of public cloud, without the public cloud cost. Oxide empowers developers to build, run, and operate any application with enhanced security, latency, and control, and frees organizations to elevate IT operations to accelerate strategic initiatives. To learn more about Oxide’s Cloud Computer, visit oxide.computer. About LLNL Founded in 1952, Lawrence Livermore National Laboratory provides solutions to our nation’s most important national security challenges through innovative science, engineering, and technology. Lawrence Livermore National Laboratory is managed by Lawrence Livermore National Security, LLC for the U.S. Department of Energy’s National Nuclear Security Administration. Media Contact LaunchSquad for Oxide Computer oxide@launchsquad.com
We are heartbroken to relay that Charles Beeler, a friend and early investor in Oxide, passed away in September after a battle with cancer. We lost Charles far too soon; he had a tremendous influence on the careers of us both. Our relationship with Charles dates back nearly two decades, to his involvement with the ACM Queue board where he met Bryan. It was unprecedented to have a venture capitalist serve in this capacity with ACM, and Charles brought an entirely different perspective on the practitioner content. A computer science pioneer who also served on the board took Bryan aside at one point: "Charles is one of the good ones, you know." When Bryan joined Joyent a few years later, Charles also got to know Steve well. Seeing the promise in both node.js and cloud computing, Charles became an investor in the company. When companies hit challenging times, some investors will hide — but Charles was the kind of investor to figure out how to fix what was broken. When Joyent needed a change in executive leadership, it was Charles who not only had the tough conversations, but led the search for the leader the company needed, ultimately positioning the company for success. Aside from his investment in Joyent, Charles was an outspoken proponent of node.js, becoming an organizer of the Node Summit conference. In 2017, he asked Bryan to deliver the conference’s keynote, but by then, the relationship between Joyent and node.js had become… complicated, and Bryan felt that it probably wouldn’t be a good idea. Any rational person would have dropped it, but Charles persisted, with characteristic zeal: if the Joyent relationship with node.js had become strained, so much more the reason to speak candidly about it! Charles prevailed, and the resulting talk, Platform as Reflection of Values, became one of Bryan’s most personally meaningful talks. Charles’s persistence was emblematic: he worked behind the scenes to encourage people to do their best work, always with an enthusiasm for the innovators and the creators. As we were contemplating Oxide, we told Charles what we wanted to do long before we had a company. Charles laughed with delight: "I hoped that you two would do something big, and I am just so happy for you that you’re doing something so ambitious!" As we raised seed capital, we knew that we were likely a poor fit for Charles and his fund. But we also knew that we deeply appreciated his wisdom and enthusiasm; we couldn’t resist pitching him on Oxide. Charles approached the investment in Oxide as he did with so many other aspects: with curiosity, diligence, empathy, and candor. He was direct with us that despite his enthusiasm for us personally, Oxide would be a challenging investment for his firm. But he also worked with us to address specific objections, and ultimately he won over his partnership. We were thrilled when he not only invested, but pulled together a syndicate of like-minded technologists and entrepreneurs to join him. Ever since, he has been a huge Oxide fan. Befitting his enthusiasm, one of his final posts expressed his enthusiasm and pride in what the Oxide team has built. Charles, thank you. You told us you were proud of us — and it meant the world. We are gutted to no longer have you with us; your influence lives on not just in Oxide, but also in the many people that you have inspired. You were the best of venture capital. Closer to the heart, you were a terrific friend to us both; thank you.
Here’s a sobering thought: today, data centers already consume 1-2% of the world’s power, and that percentage will likely rise to 3-4% by the end of the decade. According to Goldman Sachs research, that rise will include a doubling in data center carbon dioxide emissions. As the data and AI boom progresses, this thirst for power shows no signs of slowing down anytime soon. Two key challenges quickly become evident for the 85% of IT that currently lives on-premises. How can organizations reduce power consumption and corresponding carbon emissions? How can organizations keep pace with AI innovation as existing data centers run out of available power? Figure 1. Masanet et al. (2020), Cisco, IEA, Goldman Sachs Research Rack-scale design is critical to improved data center efficiency Traditional data center IT consumes so much power because the fundamental unit of compute is an individual server; like a house where rooms were built one at a time, with each room having its own central AC unit, gas furnace, and electrical panel. Individual rackmount servers are stacked together, each with their own AC power supplies, cooling fans, and power management. They are then paired with storage appliances and network switches that communicate at arm’s length, not designed as a cohesive whole. This approach fundamentally limits organizations' ability to maintain sustainable, high-efficiency computing systems. Of course, hyperscale public cloud providers did not design their data center systems this way. Instead, they operate like a carefully planned smart home where everything is designed to work together cohesively and is operated by software that understands the home’s systems end-to-end. High-efficiency, rack-scale computers are deployed at scale and operate as a single unit with integrated storage and networking to support elastic cloud computing services. This modern archietecture is made available to the market as public cloud, but that rental-only model is ill-fit for many business needs. Compared to a popular rackmount server vendor, Oxide is able to fill our specialized racks with 32 AMD Milan sleds and highly-available network switches using less than 15kW per rack, doubling the compute density in a typical data center. With just 16 of the alternative 1U servers and equivalent network switches, over 16kW of power is required per rack, leading to only 1,024 CPU cores vs Oxide’s 2,048. Extracting more useful compute from each kW of power and square foot of data center space is key to the future effectiveness of on-premises computing. At Oxide, we’ve taken this lesson in advancing rack-scale design, improved upon it in several ways, and made it available for every organization to purchase and operate anywhere in the world without a tether back to the public cloud. Our Cloud Computer treats the entire rack as a single, unified computer rather than a collection of independent parts, achieving unprecedented power efficiency. By designing the hardware and software together, we’ve eliminated unnecessary components and optimized every aspect of system operation through a control plane with visibility to end-to-end operations. When we started Oxide, the DC bus bar stood as one of the most glaring differences between the rack-scale machines at the hyperscalers and the rack-and-stack servers that the rest of the market was stuck with. That a relatively simple piece of copper was unavailable to commercial buyers — despite being unequivocally the right way to build it! — represented everything wrong with the legacy approach. The bus bar in the Oxide Cloud Computer is not merely more efficient, it is a concrete embodiment of the tremendous gains from designing at rack-scale, and by integrating hardware with software. — Bryan Cantrill The improvements we’re seeing are rooted in technical innovation Replacing low-efficiency AC power supplies with a high-efficiency DC Bus Bar This eliminates the 70 total AC power supplies found in an equivalent legacy server rack within 32 servers, two top-of-rack switches, and one out-of-band switch, each with two AC power supplies. This power shelf also ensures the load is balanced across phases, something that’s impossible with traditional power distribution units found in legacy server racks. Bigger fans = bigger efficiency gains using 12x less energy than legacy servers, which each contain as many as 7 fans, which must work much harder to move air over system components. Purpose-built for power efficiency less restrictive airflow than legacy servers by eliminating extraneous components like PCIe risers, storage backplanes, and more. Legacy servers need many optional components like these because they could be used for any number of tasks, such as point-of-sale systems, data center servers, or network-attached-storage (NAS) systems. Still, they were never designed optimally for any one of those tasks. The Oxide Cloud Computer was designed from the ground up to be a rack-scale cloud computing powerhouse, and so it’s optimized for exactly that task. Hardware + Software designed together By designing the hardware and software together, we can make hardware choices like more intelligent DC-DC power converters that can provide rich telemetry to our control plane, enabling future feature enhancements such as dynamic power capping and efficiency-based workload placement that are impossible with legacy servers and software systems. Learn more about Oxide’s intelligent Power Shelf Controller The Bottom Line: Customers and the Environment Both Benefit Reducing data center power demands and achieving more useful computing per kilowatt requires fundamentally rethinking traditional data center utilization and compute design. At Oxide, we’ve proven that dramatic efficiency gains are possible when you rethink the computer at rack-scale with hardware and software designed thoughtfully and rigorously together. Ready to learn how your organization can achieve these results? Schedule time with our team here. Together, we can reclaim on-premises computing efficiency to achieve both business and sustainability goals.
Paul Graham’s Founder Mode is an important piece, and you should read it if for no other reason that "founder mode" will surely enter the lexicon (and as Graham grimly predicts: "as soon as the concept of founder mode becomes established, people will start misusing it"). When building a company, founders are engaged in several different acts at once: raising capital; building a product; connecting that product to a market; building an organization to do all of these. Founders make lots of mistakes in all of these activities, and Graham’s essay highlights a particular kind of mistake in which founders are overly deferential to expertise or convention. Pejoratively referring to this as "Management Mode", Graham frames this in the Silicon Valley dramaturgical dyad of Steve Jobs and John Scully. While that’s a little too reductive (anyone seeking to understand Jobs needs to read Randall Stross’s superlative Steve Jobs and the NeXT Big Thing, highlighting Jobs’s many post-Scully failures at NeXT), Graham has identified a real issue here, albeit without much specificity. For a treatment of the same themes but with much more supporting detail, one should read the (decade-old) piece from Tim O’Reilly, How I failed. (Speaking personally, O’Reilly’s piece had a profound influence on me, as it encouraged me to stand my ground on an issue on which I had my own beliefs but was being told to defer to convention.) But as terrific as it is, O’Reilly’s piece also doesn’t answer the question that Graham poses: how do founders prevent their companies from losing their way? Graham says that founder mode is a complete mystery ("There are as far as I know no books specifically about founder mode"), and while there is a danger in being too pat or prescriptive, there does seem to be a clear component for keeping companies true to themselves: the written word. That is, a writing- (and reading-!) intensive company culture does, in fact, allow for scaling the kind of responsibility that Graham thinks of as founder mode. At Oxide, our writing-intensive culture has been absolutely essential: our RFD process is the backbone of Oxide, and has given us the structure to formalize, share, and refine our thinking. First among this formalized thinking — and captured in our first real RFD — is RFD 2 Mission, Principles, and Values. Immediately behind that (and frankly, the most important process for any company) is RFD 3 Oxide Hiring Process. These first three RFDs — on the process itself, on what we value, and on how we hire — were written in the earliest days of the company, and they have proven essential to scale the company: they are the foundation upon which we attract people who share our values. While the shared values have proven necessary, they haven’t been sufficient to eliminate the kind of quandaries that Graham and O’Reilly describe. For example, there have been some who have told us that we can’t possibly hire non-engineering roles using our hiring process — or told us that our approach to compensation can’t possibly work. To the degree that we have had a need for Graham’s founder mode, it has been in those moments: to stay true to the course we have set for the company. But because we have written down so much, there is less occasion for this than one might think. And when it does occur — when there is a need for further elucidation or clarification — the artifact is not infrequently a new RFD that formalizes our newly extended thinking. (RFD 68 is an early public and concrete example of this; RFD 508 is a much more recent one that garnered some attention.) Most importantly, because we have used our values as a clear lens for hiring, we are able to assure that everyone at Oxide is able to have the same disposition with respect to responsibility — and this (coupled with the transparency that the written word allows) permits us to trust one another. As I elucidated in Things I Learned The Hard Way, the most important quality in a leader is to bind a team with mutual trust: with it, all things are possible — and without it, even easy things can be debilitatingly difficult. Graham mentions trust, but he doesn’t give it its due. Too often, founders focus on the immediacy of a current challenge without realizing that they are, in fact, undermining trust with their approach. Bluntly, founders are at grave risk of misinterpreting Graham’s "Founders Mode" to be a license to micromanage their teams, descending into the kind of manic seagull management that inhibits a team rather than empowering it. Founders seeking to internalize Graham’s advice should recast it by asking themselves how they can foster mutual trust — and how they can build the systems that allow trust to be strengthened even as the team expands. For us at Oxide, writing is the foundation upon which we build that trust. Others may land on different mechanisms, but the goal of founders should be the same: build the trust that allows a team to kick a Jobsian dent in the universe!
More in programming
The first time we had to evacuate Malibu this season was during the Franklin fire in early December. We went to bed with our bags packed, thinking they'd probably get it under control. But by 2am, the roaring blades of fire choppers shaking the house got us up. As we sped down the canyon towards Pacific Coast Highway (PCH), the fire had reached the ridge across from ours, and flames were blazing large out the car windows. It felt like we had left the evacuation a little too late, but they eventually did get Franklin under control before it reached us. Humans have a strange relationship with risk and disasters. We're so prone to wishful thinking and bad pattern matching. I remember people being shocked when the flames jumped the PCH during the Woolsey fire in 2017. IT HAD NEVER DONE THAT! So several friends of ours had to suddenly escape a nightmare scenario, driving through burning streets, in heavy smoke, with literally their lives on the line. Because the past had failed to predict the future. I feel into that same trap for a moment with the dramatic proclamations of wind and fire weather in the days leading up to January 7. Warning after warning of "extremely dangerous, life-threatening wind" coming from the City of Malibu, and that overly-bureaucratic-but-still-ominous "Particularly Dangerous Situation" designation. Because, really, how much worse could it be? Turns out, a lot. It was a little before noon on the 7th when we first saw the big plumes of smoke rise from the Palisades fire. And immediately the pattern matching ran astray. Oh, it's probably just like Franklin. It's not big yet, they'll get it out. They usually do. Well, they didn't. By the late afternoon, we had once more packed our bags, and by then it was also clear that things actually were different this time. Different worse. Different enough that even Santa Monica didn't feel like it was assured to be safe. So we headed far North, to be sure that we wouldn't have to evacuate again. Turned out to be a good move. Because by now, into the evening, few people in the connected world hadn't started to see the catastrophic images emerging from the Palisades and Eaton fires. Well over 10,000 houses would ultimately burn. Entire neighborhoods leveled. Pictures that could be mistaken for World War II. Utter and complete destruction. By the night of the 7th, the fire reached our canyon, and it tore through the chaparral and brush that'd been building since the last big fire that area saw in 1993. Out of some 150 houses in our immediate vicinity, nearly a hundred burned to the ground. Including the first house we moved to in Malibu back in 2009. But thankfully not ours. That's of course a huge relief. This was and is our Malibu Dream House. The site of that gorgeous home office I'm so fond to share views from. Our home. But a house left standing in a disaster zone is still a disaster. The flames reached all the way up to the base of our construction, incinerated much of our landscaping, and devoured the power poles around it to dysfunction. We have burnt-out buildings every which way the eye looks. The national guard is still stationed at road blocks on the access roads. Utility workers are tearing down the entire power grid to rebuild it from scratch. It's going to be a long time before this is comfortably habitable again. So we left. That in itself feels like defeat. There's an urge to stay put, and to help, in whatever helpless ways you can. But with three school-age children who've already missed over a months worth of learning from power outages, fire threats, actual fires, and now mudslide dangers, it was time to go. None of this came as a surprise, mind you. After Woolsey in 2017, Malibu life always felt like living on borrowed time to us. We knew it, even accepted it. Beautiful enough to be worth the risk, we said. But even if it wasn't a surprise, it's still a shock. The sheer devastation, especially in the Palisades, went far beyond our normal range of comprehension. Bounded, as it always is, by past experiences. Thus, we find ourselves back in Copenhagen. A safe haven for calamities of all sorts. We lived here for three years during the pandemic, so it just made sense to use it for refuge once more. The kids' old international school accepted them right back in, and past friendships were quickly rebooted. I don't know how long it's going to be this time. And that's an odd feeling to have, just as America has been turning a corner, and just as the optimism is back in so many areas. Of the twenty years I've spent in America, this feels like the most exciting time to be part of the exceptionalism that the US of A offers. And of course we still are. I'll still be in the US all the time on both business, racing, and family trips. But it won't be exclusively so for a while, and it won't be from our Malibu Dream House. And that burns.
I’ve been doing Dry January this year. One thing I missed was something for apéro hour, a beverage to mark the start of the evening. Something complex and maybe bitter, not like a drink you’d have with lunch. I found some good options. Ghia sodas are my favorite. Ghia is an NA apéritif based on grape juice but with enough bitterness (gentian) and sourness (yuzu) to be interesting. You can buy a bottle and mix it with soda yourself but I like the little cans with extra flavoring. The Ginger and the Sumac & Chili are both great. Another thing I like are low-sugar fancy soda pops. Not diet drinks, they still have a little sugar, but typically 50 calories a can. De La Calle Tepache is my favorite. Fermented pineapple is delicious and they have some fun flavors. Culture Pop is also good. A friend gave me the Zero book, a drinks cookbook from the fancy restaurant Alinea. This book is a little aspirational but the recipes are doable, it’s just a lot of labor. Very fancy high end drink mixing, really beautiful flavor ideas. The only thing I made was their gin substitute (mostly junipers extracted in glycerin) and it was too sweet for me. Need to find the right use for it, a martini definitely ain’t it. An easier homemade drink is this Nonalcoholic Dirty Lemon Tonic. It’s basically a lemonade heavily flavored with salted preserved lemons, then mixed with tonic. I love the complexity and freshness of this drink and enjoy it on its own merits. Finally, non-alcoholic beer has gotten a lot better in the last few years thanks to manufacturing innovations. I’ve been enjoying NA Black Butte Porter, Stella Artois 0.0, Heineken 0.0. They basically all taste just like their alcoholic uncles, no compromise. One thing to note about non-alcoholic substitutes is they are not cheap. They’ve become a big high end business. Expect to pay the same for an NA drink as one with alcohol even though they aren’t taxed nearly as much.
Thou shalt not suffer a flaky test to live, because it’s annoying, counterproductive, and dangerous: one day it might fail for real, and you won’t notice. Here’s what to do.
The ware for January 2025 is shown below. Thanks to brimdavis for contributing this ware! …back in the day when you would get wares that had “blue wires” in them… One thing I wonder about this ware is…where are the ROMs? Perhaps I’ll find out soon! Happy year of the snake!
While I frequently hear engineers bemoan a missing strategy, they rarely complete the thought by articulating why the missing strategy matters. Instead, it serves as more of a truism: the economy used to be better, children used to respect their parents, and engineering organizations used to have an engineering strategy. This chapter starts by exploring something I believe quite strongly: there’s always an engineering strategy, even if there’s nothing written down. From there, we’ll discuss why strategy, especially written strategy, is such a valuable opportunity for organizations that take it seriously. We’ll dig into: Why there’s always a strategy, even when people say there isn’t How strategies have been impactful across my career How inappropriate strategies create significant organizational pain without much compensating impact How written strategy drives organizational learning The costs of not writing strategy down How strategy supports personal learning and developing, even in cases where you’re not empowered to “do strategy” yourself By this chapter’s end, hopefully you will agree with me that strategy is an undertaking worth investing your–and your organization’s–time in. This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts. There’s always a strategy I’ve never worked somewhere where people didn’t claim there as no strategy. In many of those companies, they’d say there was no engineering strategy. Once I became an executive and was able to document and distribute an engineering strategy, accusations of missing strategy didn’t go away, they just shfited to focus on a missing product or company strategy. This even happened at companies that definitively had engineering strategies like Stripe in 2016 which had numerous pillars to a clear engineering strategy such as: Maintain backwards API compatibilty, at almost any cost (e.g. force an upgrade from TLS 1.2 to TLS 1.3 to retain PCI compliance, but don’t force upgrades from the /v1/charges endpoint to the /v1/payment_intents endpoint) Work in Ruby in a monorepo, unless it’s the PCI environment, data processing, or data science work Engineers are fully responsible for the usability of their work, even when there are product or engineering managers involved Working there it was generally clear what the company’s engineering strategy was on any given topic. That said, it sometimes required asking around, and over time certain decisions became sufficiently contentious that it became hard to definitively answer what the strategy was. For example, the adoptino of Ruby versus Java became contentious enough that I distributed a strategy attempting to mediate the disagreement, Magnitudes of exploration, although it wasn’t a particularly successful effort (for reasons that are obvious in hindsight, particularly the lack of any enforcement mechanism). In the same sense that William Gibson said “The future is already here – it’s just not very evenly distributed,” there is always a strategy embedded into an organization’s decisions, although in many organizations that strategy is only visible to a small group, and may be quickly forgotten. If you ever find yourself thinking that a strategy doesn’t exist, I’d encourage you to instead ask yourself where the strategy lives if you can’t find it. Once you do find it, you may also find that the strategy is quite ineffective, but I’ve simply never found that it doesn’t exist. Strategy is impactful In “We are a product engineering company!”, we discuss Calm’s engineering strategy to address pervasive friction within the engineering team. The core of that strategy is clarifying how Calm makes major technology decisions, along with documenting the motivating goal steering those decisions: maximizing time and energy spent on creating their product. That strategy reduced friction by eliminating the cause of ongoing debate. It was successful in resetting the team’s focus. It also caused several engineers to leave the company, because it was incompatible with their priorities. It’s easy to view that as a downside, but I don’t think it was. A clear, documented strategy made it clear to everyone involved what sort of game we were playing, the rules for that game, and for the first time let them accurately decide if they wanted to be part of that game with the wider team. Creating alignment is one of the ways that strategy makes an impact, but it’s certainly not the only way. Some of the ways that strategies support the creating organization are: Concentrating company investment into a smaller space. For example, deciding not to decompose a monolith allows you to invest the majority of your tooling efforts on one language, one test suite, and one deployment mechanism. Many interesting properties only available through universal adoption. For example, moving to an “N-1 policy” on backfilled roles is a significant opportunity for managing costs, but only works if consistently adopted. As another example, many strategies for disaster recovery or multi-region are only viable if all infrastructure has a common configuration mechanism. Focus execution on what truly matters. For example, Uber’s service migration strategy allowed a four engineer team to migrate a thousand services operated by two thousand engineers to a new provisioning and orchestration platform in less than a year. This was an extraordinarily difficult project, and was only possible because of clear thinking. Creating a knowledge repository of how your organization thinks. Onboarding new hires, particularly senior new hires, is much more effective with documented strategy. For example, most industry professionals today have a strongly held opinion on how to adopt large language models. New hires will have a strong opinion as well, but they’re unlikely to share your organization’s opinion unless there’s a clear document they can read to understand it. There are some things that a strategy, even a cleverly written one, cannot do. However, it’s always been my experience that developing a strategy creates progress, even if the progress is understanding the inherent disagreement preventing agreement. Inappropriate strategy is especially impactful While good strategy can accomplish many things, it sometimes feels that inappropriate strategy is far more impactful. Of course, impactful in all the wrong ways. Digg V4 remains the worst considered strategy I’ve personally participated in. It was a complete rewrite of the Digg V3.5 codebase from a PHP monolith to a PHP frontend and backend of a dozen Python services. It also moved the database from sharded MySQL to an early version of Cassandra. Perhaps worst, it replaced the nuanced algorithms developed over a decade with a hack implemented a few days before launch. Although it’s likely Digg would have struggled to become profitable due to its reliance on search engine optimization for traffic, and Google’s frequently changing search algorithm of that era, the engineering strategy ensured we died fast rather than having an opportunity to dig our way out. Importantly, it’s not just Digg. Almost every engineering organization you drill into will have it’s share of unused platform projects that captured decades of engineering years to the detriment of an important opportunity. A shocking number of senior leaders join new companies and initiate a grand migration that attempts to entirely rewrite the architecture, switch programming languages, or otherwise shift their new organization to resemble a prior organization where they understood things better. Inappropriate versus bad When I first wrote this section, I just labeled this sort of strategy as “bad.” The challenge with that term is that the same strategy might well be very effective in a different set of circumstances. For example, if Digg had been a three person company with no revenue, rewriting from scratch could have the right decision! As a result, I’ve tried to prefer the term “inappropriate” rather than “bad” to avoid getting caught up on whether a given approach might work in other circumstances. Every approach undoubtedly works in some organization. Written strategy drives organizational learning When I joined Carta, I noticed we had an inconsistent approach to a number of important problems. Teams had distinct standard kits for how they approached new projects. Adoption of existing internal platforms was inconsistent, as was decision making around funding new internal platforms. There was widespread agreement that we were decomposing our monolith, but no agreement on how we were doing it. Coming into such a permissive strategy environment, with strong, differing perspectives on the ideal path forward, one of my first projects was writing down an explicit engineering strategy along with our newly formed Navigators team, itself a part of our new engineering strategy. Navigators at Carta As discussed in Navigators, we developed a program at Carta to have explicitly named individual contributor, technical leaders to represent key parts of the engineering organization. This representative leadership group made it possible to iterate on strategy with a small team of about ten engineers that represented the entire organization, rather than take on the impossible task of negotiating with 400 engineers directly. This written strategy made it possible to explicitly describe the problems we saw, and how we wanted to navigate those problems. Further, it was an artifact that we were able to iterate on in a small group, but then share widely for feedback from teams we might have missed. After initial publishing, we shared it widely and talked about it frequently in engineering all-hands meetings. Then we came back to it each year, or when things stopped making much sense, and revised it. As an example, our initial strategy didn’t talk about artificial intelligence at all. A few months later, we extended it to mention a very conservative approach to using Large Language Models. Most recently, we’ve revised the artificial intelligence portion again, as we dive deeply into agentic workflows. A lot of people have disagreed with parts of the strategy, which is great: that’s one of the key benefits of a written strategy, it’s possible to precisely disagree. From that disagreement, we’ve been able to evolve our strategy. Sometimes because there’s new information like the current rapidly evolution of artificial intelligence pratices, and other times because our initial approach could be improved like in how we gated membership of the initial Navigators team. New hires are able to disagree too, and do it from an informed place rather than coming across as attached to their prior company’s practices. In particular, they’re able to understand the historical thinking that motivated our decisions, even when that context is no longer obvious. At the time we paused decomposition of our monolith, there was significant friction in service provisioning, but that’s far less true today, which makes the decision seem a bit arbitrary. Only the written document can consistently communicate that context across a growing, shifting, and changing organization. With oral history, what you believe is highly dependent on who you talk with, which shapes your view of history and the present. With writen history, it’s far more possible to agree at scale, which is the prerequisite to growing at scale rather than isolating growth to small pockets of senior leadership. The cost of implicit strategy We just finished talking about written strategy, and this book spends a lot of time on this topic, including a chapter on how to structure strategies to maximize readability. It’s not just because of the positives created by written strategy, but also because of the damage unwritten strategy creates. Vulnerable to misinterpretation. Information flow in verbal organizations depends on an individual being in a given room for a decision, and then accurately repeating that information to the others who need it. However, it’s common to see those individuals fail to repeat that information elsewhere. Sometimes their interpretation is also faulty to some degree. Both of these create significant problems in operating strategy. Two-headed organizations Some years ago, I started moving towards a model where most engineering organizations I worked with have two leaders: one who’s a manager, and another who is a senior engineer. This was partially to ensure engineering context was included in senior decision making, but it was also to reduce communication errors. Errors in point-to-point communication are so prevalent when done one-to-one, that the only solution I could find for folks who weren’t reading-oriented communicators was ensuring I had communicated strategy (and other updates) to at least two people. Inconsistency across teams. At one company I worked in, promotions to Staff-plus role happened at a much higher rate in the infrastructure engineering organization than the product engineering team. This created a constant drain out of product engineering to work on infrastructure shaped problems, even if those problems weren’t particularly valuable to the business. New leaders had no idea this informal policy existed, and they would routinely run into trouble in calibration discussions. They also weren’t aware they needed to go argue for a better policy. Worse, no one was sure if this was a real policy or not, so it was ultimately random whether this perspective was represented for any given promotion: sometimes good promotions would be blocked, sometimes borderline cases would be approved. Inconsistency over time. Implementing a new policy tends to be a mix of persistent and one-time actions. For example, let’s say you wanted to standardize all HTTP operations to use the same library across your codebase. You might add a linter check to reject known alternatives, and you’ll probably do a one-time pass across your codebase standardizing on that library. However, two years later there are another three random HTTP libraries in your codebase, creeping into the cracks surrounding your linting. If the policy is written down, and a few people read it, then there’s a number of ways this could be nonetheless prevented. If it’s not written down, it’s much less likely someone will remember, and much more likely they won’t remember the rationale well enough to argue about it. Hazard to new leadership. When a new Staff-plus engineer or executive joins a company, it’s common to blame them for failing to understand the existing context behind decisions. That’s fair: a big part of senior leadership is uncovering and understanding context. It’s also unfair: explicit documentation of prior thinking would have made this much easier for them. Every particularly bad new-leader onboarding that I’ve seen has involved a new leader coming into an unfilled role, that the new leader’s manager didn’t know how to do. In those cases, success is entirely dependent on that new leader’s ability and interest in learning. In most ways, the practice of documenting strategy has a lot in common with succession planning, where the full benefits accrue to the organization rather than to the individual doing it. It’s possible to maintain things when the original authors are present, appreciating the value requires stepping outside yourself for a moment to value things that will matter most to the organization when you’re no longer a member. Information herd immunity A frequent objection to written strategy is that no one reads anything. There’s some truth to this: it’s extremely hard to get everyone in an organization to know something. However, I’ve never found that goal to be particularly important. My view of information dispersal in an organization is the same as Herd immunity: you don’t need everyone to know something, just to have enough people who know something that confusion doesn’t propagate too far. So, it may be impossible for all engineers to know strategy details, but you certainly can have every Staff-plus engineer and engineering manager know those details. Strategy supports personal learning While I believe that the largest benefits of strategy accrue to the organization, rather than the individual creating it, I also believe that strategy is an underrated avenue for self-development. The ways that I’ve seen strategy support personal development are: Creating strategy builds self-awareness. Starting with a concrete example, I’ve worked with several engineers who viewed themselves as extremely senior, but frequently demanded that projects were implemented using new programming languages or technologies because they personally wanted to learn about the technology. Their internal strategy was clear–they wanted to work on something fun–but following the steps to build an engineering strategy would have created a strategy that even they agreed didn’t make sense. Strategy supports situational awareness in new environments. Wardley mapping talks a lot about situational awareness as a prerequisite to good strategy. This is ensuring you understand the realities of your circumstances, which is the most destructive failure of new senior engineering leaders. By explicitly stating the diagnosis where the strategy applied, it makes it easier for you to debug why reusing a prior strategy in a new team or company might not work. Strategy as your personal archive. Just as documented strategy is institutional memory, it also serves as personal memory to understand the impact of your prior approaches. Each of us is an archivist of our prior work, pulling out the most valuable pieces to address the problem at hand. Over a long career, memory fades–and motivated reasoning creeps in–but explicit documentation doesn’t. Indeed, part of the reason I started working on this book now rather than later is that I realized I was starting to forget the details of the strategy work I did earlier in my career. If I wanted to preserve the wisdom of that era, and ensure I didn’t have to relearn the same lessons in the future, I had to write it now. Summary We’ve covered why strategy can be a valuable learning mechanism for both your engineering organization and for you. We’ve shown how strategies have helped organizations deal with service migrations, monolith decomposition, and right-sizing backfilling. We’ve also discussed how inappropriate strategy contributed to Digg’s demise. However, if I had to pick two things to emphasize as this chapter ends, it wouldn’t be any of those things. Rather, it would be two themes that I find are the most frequently ignored: There’s always a strategy, even if it isn’t written down. The single biggest act you can take to further strategy in your organization is to write down strategy so it can be debated, agreed upon, and explicitly evolved. Discussions around topics like strategy often get caught up in high prestige activities like making controversial decisions, but the most effective strategists I’ve seen make more progress by actually performing the basics: writing things down, exploring widely to see how other companies solve the same problem, accepting feedback into their draft from folks who disagree with them. Strategy is useful, and doing strategy can be simple, too.