Full Width [alt+shift+f] Shortcuts [alt+shift+k]
Sign Up [alt+shift+s] Log In [alt+shift+l]
12
You might have noticed that Mission Control — Jobs, isn’t simply titled “Mission Control”. That’s because it was developed alongside another useful tool, Mission Control — Web. Today I’m pleased to announce that we are open sourcing Mission Control — Web and you can find it on GitHub. Here’s how it looks: Admin dashboard of Mission Control — Web What’s it for? Mission Control — Web allows immediate control of web requests during incidents, by denying access to parts of the application, defined by specific paths. My colleague Jorge proposed Mission Control — Web in 2022. He identified the need for such a tool during previous incidents, where a whole application became unavailable due to specific features being unperformant. This is admittedly a simple tool, but it’s a nice lever to be able to pull in case of emergency. What does it do? It has two parts: an admin dashboard and a middleware. These can be configured in the same app, or ideally, two separate apps....
9 months ago

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from 37signals Dev

Monitoring 10 Petabytes of data in Pure Storage

As the final part of our move out of the cloud, we are working on moving 10 petabytes of data out of AWS Simple Storage Service (S3). After exploring different alternatives, we decided to go with Pure Storage FlashBlade solution. We store different kinds of information on S3, from the attachments customers upload to Basecamp to the Prometheus long-term metrics. On top of that, Pure’s system also provides filesystem-based capabilities, enabling other relevant usages, such as database backup storage. This makes the system a top priority for observability. Although the system has great reliability, out-of-the-box internal alerting, and autonomous ticket creation, it would also be good to have our metrics and alerts to facilitate problem-solving and ensure any disruptions are prioritized and handled. For more context on our current Prometheus setup, see how we use Prometheus at 37signals. Pure OpenMetrics exporter Pure maintains two OpenMetrics exporters, pure-fb-openmetrics-exporter and pure-fa-openmetrics-exporter. Since we use Pure Flashblade (fb), this post covers pure-fb-openmetrics-exporter, although overall usage should be similar. The setup is straightforward and requires only binary and basic authentication installation. Here is a snippet of our Chef recipe that installs it: pure_api_token = "token" # If you use Chef, your token should come from an ecrypted databag. Changed to hardcoded here to simplify PURE_EXPORTER_VERSION = "1.0.13".freeze # Generally, we use Chef node metadata for version management. Changed to hardcoded to simplify directory "/opt/pure_exporter/#{PURE_EXPORTER_VERSION}" do recursive true owner 'pure_exporter' group 'pure_exporter' end # Avoid recreating under /tmp after reboot if target_binary is already there target_binary = "/opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-fb-openmetrics-exporter" remote_file "/tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}-linux-amd64.tar.gz" do source "https://github.com/PureStorage-OpenConnect/pure-fb-openmetrics-exporter/releases/download/v#{PURE_EXPORTER_VERSION}/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}-linux-amd64.tar.gz" not_if { ::File.exist?(target_binary) } end archive_file "/tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}-linux-amd64.tar.gz" do destination "/tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}" action :extract not_if { ::File.exist?(target_binary) } end execute "copy binary" do command "sudo cp /tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}/pure-fb-openmetrics-exporter /opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-exporter" creates "/opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-exporter" not_if { ::File.exist?(target_binary) } end tokens = <<EOF main: address: purestorage-mgmt.mydomain.com api_token: #{pure_api_token['token']} EOF file "/opt/pure_exporter/tokens.yml" do content tokens owner 'pure_exporter' group 'pure_exporter' sensitive true end systemd_unit 'pure-exporter.service' do content <<-EOU # Caution: Chef managed content. This is a file resource from #{cookbook_name}::#{recipe_name} # [Unit] Description=Pure Exporter After=network.target [Service] Restart=on-failure PIDFile=/var/run/pure-exporter.pid User=pure_exporter Group=pure_exporter ExecStart=/opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-exporter \ --tokens=/opt/pure_exporter/tokens.yml ExecReload=/bin/kill -HUP $MAINPID SyslogIdentifier=pure-exporter [Install] WantedBy=multi-user.target EOU action [ :create, :enable, :start ] notifies :reload, "service[pure-exporter]" end service 'pure-exporter' Prometheus Job Configuration The simplest way of ingesting the metrics is to configure a basic Job without any customization: - job_name: pure_exporter metrics_path: /metrics static_configs: - targets: ['<%= @hostname %>:9491'] labels: environment: 'production' job: pure_exporter params: endpoint: [main] # From the tokens configuration above For a production-ready setup, we are using a slightly different approach. The exporter supports the usage of specific metric paths to allow for split Prometheus jobs configuration that reduces the overhead of pulling the metrics all at once: - job_name: pure_exporter_array metrics_path: /metrics/array static_configs: - targets: ['<%= @hostname %>:9491'] labels: environment: 'production' job: pure_exporter metric_relabel_configs: - source_labels: [name] target_label: ch regex: "([^.]+).*" replacement: "$1" action: replace - source_labels: [name] target_label: fb regex: "[^.]+\\.([^.]+).*" replacement: "$1" action: replace - source_labels: [name] target_label: bay regex: "[^.]+\\.[^.]+\\.([^.]+)" replacement: "$1" action: replace params: endpoint: [main] # From the tokens configuration above - job_name: pure_exporter_clients metrics_path: /metrics/clients static_configs: - targets: ['<%= @hostname %>:9491'] labels: environment: 'production' job: pure_exporter params: endpoint: [main] # From the tokens configuration above - job_name: pure_exporter_usage metrics_path: /metrics/usage static_configs: - targets: ['<%= @hostname %>:9491'] labels: environment: 'production' job: pure_exporter params: endpoint: [main] - job_name: pure_exporter_policies metrics_path: /metrics/policies static_configs: - targets: ['<%= @hostname %>:9491'] labels: environment: 'production' job: pure_exporter params: endpoint: [main] # From the tokens configuration above We also configure some metric_relabel_configs to extract labels from name using regex. Those labels help reduce the complexity of queries that aggregate metrics by different components. Detailed documentation on the available metrics can be found here. Alerts Auto Generated Alerts As I shared earlier, the system has an internal Alerting module that automatically triggers alerts for critical situations and creates tickets. To cover those alerts on the Prometheus side, we added an alerting configuration of our own that relies on the incoming severities: - alert: PureAlert annotations: summary: '{{ $labels.summary }}' description: '{{ $labels.component_type }} - {{ $labels.component_name }} - {{ $labels.action }} - {{ $labels.kburl }}' dashboard: 'https://grafana/your-dashboard' expr: purefb_alerts_open{environment="production"} == 1 for: 1m We still need to evaluate how the pure-generated alerts will interact with the custom alerts I will cover below, and we might decide to stick to one or the other depending on what we find out. Hardware Before I continue, the image below helps visualize how some of the Pure FlashBlade components are physically organized: Because of Pure’s reliability, most isolated hardware failures do not require the immediate attention of an Ops team member. To cover the most basic hardware failures, we configure an alert that sends a message to the Ops Basecamp 4 project chat: - alert: PureHardwareFailed annotations: summary: Hardware {{ $labels.name }} in chassis {{ $labels.ch }} is failed description: 'The Pure Storage hardware {{ $labels.name }} in chassis {{ $labels.ch }} is failed' dashboard: 'https://grafana/your-dashboard' expr: purefb_hardware_health == 0 for: 1m labels: severity: chat-notification We also configure alerts that check for multiple hardware failures of the same type. This doesn’t mean two simultaneous failures will result in a critical state, but it is a fair guardrail for unexpected scenarios. We also expect those situations to be rare, keeping the risk of causing unnecessary noise low. - alert: PureMultipleHardwareFailed annotations: summary: Pure chassis {{ $labels.ch }} has {{ $value }} failed {{ $labels.type }} description: 'The Pure Storage chassis {{ $labels.ch }} has {{ $value }} failed {{ $labels.type }}, close to the healthy limit of two simultaneous failures. Ensure that the hardware failures are being worked on' dashboard: 'https://grafana/your-dashboard' expr: count(purefb_hardware_health{type!~"eth|mgmt_port|bay"} == 0) by (ch,type,environment) > 1 for: 1m labels: severity: page # We are looking for multiple failed bays in the same blade - alert: PureMultipleBaysFailed annotations: summary: Pure chassis {{ $labels.ch }} has fb {{ $labels.fb }} with {{ $value }} failed bays description: 'The Pure Storage chassis {{ $labels.ch }} has fb {{ $labels.fb }} with {{ $value }} failed bays, close to the healthy limit of two simultaneous failures. Ensure that the hardware failures are being worked on' dashboard: 'https://grafana/your-dashboard' expr: count(purefb_hardware_health{type="bay"} == 0) by (ch,type,fb,environment) > 1 for: 1m labels: severity: page Finally, we configure high-level alerts for chassis and XFM failures: - alert: PureChassisFailed annotations: summary: Chassis {{ $labels.name }} is failed description: 'The Pure Storage hardware chassis {{ $labels.name }} is failed' dashboard: 'https://grafana/your-dashboard' expr: purefb_hardware_health{type="ch"} == 0 for: 1m labels: severity: page - alert: PureXFMFailed annotations: summary: Xternal Fabric Module {{ $labels.name }} is failed description: 'The Pure Storage hardware Xternal fabric module {{ $labels.name }} is failed' dashboard: 'https://grafana/your-dashboard' expr: purefb_hardware_health{type="xfm"} == 0 for: 1m labels: severity: page Latency Using the metric purefb_array_performance_latency_usec we can set a threshold for all the different protocols and dimensions (read, write, etc), so we are alerted if any problem causes the latency to go above an expected level. - alert: PureLatencyHigh annotations: summary: Pure {{ $labels.dimension }} - {{ $labels.protocol }} latency high description: 'Pure {{ $labels.protocol }} latency for dimension {{ $labels.dimension }} is above 100ms' dashboard: 'https://grafana/your-dashboard' expr: (avg_over_time(purefb_array_performance_latency_usec{protocol="all"}[30m]) * 0.001) for: 1m labels: severity: chat-notification Saturation For saturation, we are primarily worried about something unexpected causing excessive use of array space, increasing the risk of hitting the cluster capacity. With that in mind, it’s good to have a simple alert in place, even if we don’t expect it to fire anytime soon: - alert: PureArraySpace annotations: summary: Pure Cluster {{ $labels.instance }} available space is expected to be below 10% description: 'The array space for pure cluster {{ $labels.instance }} is expected to be below 10% in a month, please investigate and ensure there is no risk of running out of capacity' dashboard: 'https://grafana/your-dashboard' expr: (predict_linear(purefb_array_space_bytes{space="empty",type="array"}[30d], 730 * 3600)) < (purefb_array_space_bytes{space="capacity",type="array"} * 0.10) for: 1m labels: severity: chat-notification HTTP We use BigIp load balancers to front-end the cluster, which means that all the alerts we already had in place for the BigIp HTTP profiles, virtual servers, and pools also cover access to Pure. The solution for each organization on this topic will be different, but it is a good practice to keep an eye on HTTP status codes and throughput. Grafana Dashboards The project’s GitHub repository includes JSON files for Grafana dashboards that are based on the metrics generated by the exporter. With simple adjustments to fit each setup, it’s possible to import them quickly. Wrapping up On top of the system’s built-in capabilities, Pure also provides options to integrate their system into well-known tools like Prometheus and Grafana, facilitating the process of managing the cluster the same way we manage everything else. I hope this post helps any other team interested in working with them better understand the effort involved. Thanks for reading!

a month ago 48 votes
Announcing Hotwire Spark: live reloading for Rails applications

Today, we are releasing Hotwire Spark, a live-reloading system for Rails Applications. Reloading the browser automatically on source changes is a problem that has been well-solved for a long time. Here, we wanted to put an accent on smoothness. If the reload operation is very noticeable, the feedback loop is similar to just reloading the page yourself. But if it’s smooth enough—if you only perceive the intended change—the feedback loop becomes terrific. To use, just install the gem in development: group :development do gem "hotwire-spark" end It will update the current page on three types of change: HTML content, CSS, and Stimulus controllers. How do we achieve that desired smoothness with each? For HTML content, it morphs the <body> of the page into the new <body>. Also, it disconnects and reconnects all the Stimulus controllers on the page. For CSS, it reloads the changed stylesheet. For Stimulus controllers, it fetches the changed controller, replaces its module in Stimulus, and reconnects all the controllers. We designed Hotwire Spark to shine with the #nobuildapproach we use and recommend. Serving CSS and JS assets as standalone files is ideal when you want to fetch and update only what has changed. There is no need to use bundling or any tooling. Hot Module Replacement for Stimulus controllers without any frontend building tool is pretty cool! 2024 has been a very special year for Rails. We’re thrilled to share Hotwire Spark before the year wraps up. Wishing you all a joyful holiday season and a fantastic start to 2025.

2 months ago 66 votes
A vanilla Rails stack is plenty

If you have the luxury of starting a new Rails app today, here’s our recommendation: go vanilla. Fight hard before adding Ruby dependencies. Keep that Gemfile that Rails generates as close to the original one as possible. Fight even harder before adding Javascript dependencies. You don’t need React or any other front-end frameworks, nor a JSON API to feed those. Hotwire is a fantastic, pragmatic, and ridiculously productive technology for the front end. Use it. The same goes for mobile apps: use Hotwire Native. With a hybrid approach you can combine the very same web app you have built with a wonderful native experience right where you want it. The productivity compared to a purely native approach is night and day. Embrace and celebrate rendering things on the server. It has become cool again. ERB templates and view helpers will take you as long as you need, and they are a fantastic common ground for designers to collaborate hands-on with the code. #nobuild is the simplest way to go; don’t close this door with your choices. Instead of bundling Javascript, use import maps. Don’t bundle CSS, just use modern standard CSS goodies and serve them all with Propshaft. If you have 100 Javascript files and 100 stylesheets, serve 200 standalone requests multiplexed over HTTP2. You will be delighted. Don’t add Redis to the mix. Use solid_cache for caching, solid_queue for jobs, and solid_cable for Action Cable. They will all work on your beloved relational database and are battle-tested. Test your apps with Minitest. Use fixtures and build a realistic set of those as you cook your app. Make your app a PWA, which is fully supported by Rails 8. This may be more than enough before caring about mobile apps at all. Deploy your app with Kamal. If you want heuristics, your importmap.rb should import Turbo, Stimulus, your app controllers, and little else. Your Gemfile should be almost identical to the one that Rails generates. I know it sounds radical, but going vanilla is a radical stance in this convoluted world of endless choices. This is the Rails 8 stack we have chosen for our new apps at 37signals. We are a tiny crew, so we care a lot about productivity. And we sell products, not stacks, so we care a lot about delighting our users. This is our Omakase stack because it offers the optimal balance for achieving both. Vanilla means your app stays nimble. Fewer dependencies mean fewer future headaches. You get a tight integration out of the box, so you can focus on building things. It also maximizes the odds of having smoother future upgrades. Vanilla requires determination, though, because new dependencies always look shiny and shinier. It’s always clear what you get when you add them, but never what you lose in the long term. It is certainly up to you. Rails is a wonderful big tent. These are our opinions. If it resonates, choose vanilla! Guess what our advice is for architecting your app internals?

2 months ago 33 votes
Mission Control — Jobs 1.0 released

We’ve just released Mission Control — Jobs v1.0.0, the dashboard and set of extensions to operate background jobs that we introduced earlier this year. This new version is the result of 92 pull requests, 67 issues and the help of 35 different contributors. It includes many bugfixes and improvements, such as: Support for Solid Queue’s recurring tasks, including running them on-demand. Support for API-only apps. Allowing immediate dispatching of scheduled and blocked jobs. Backtrace cleaning for failed jobs’ backtraces. A safer default for authentication, with Basic HTTP authentication enabled and initially closed unless configured or explicitly disabled. Recurring tasks in Mission Control — Jobs, with a subset of the tasks we run in production We use Mission Control — Jobs daily to manage jobs HEY and Basecamp 4, with both Solid Queue and Resque, and it’s the dashboard we recommend if you’re using Solid Queue for your jobs. Our plan is to upstream some of the extensions we’ve made to Active Job and continue improving it until it’s ready to be included by default in Rails together with Solid Queue. If you want to help us with that, are interested in learning more or have any issues or questions, head over to the repo in GitHub. We hope you like it!

2 months ago 30 votes
All about QA

Quality Assurance (QA) is a team of two at 37signals: Michael, who created the department 12 years ago, and Gabriel, who joined the team in 2022. Together, we have a hand in projects across all of our products, from kickoff to release. Our goal is to help designers and programmers ship their best work. Our process revolves around manual testing and has been tuned to match the rhythm of Shape Up. Here, we’ll share the ins and outs of our methods and touch on a few of the tools we use along the way. Kicking things off At 37signals we run projects in six-week cycles informed by Shape Up. At the beginning of each cycle, Brian, our Head of Product, posts a kick-off message detailing what we plan to ship. This usually consists of new features and improvements for Basecamp, HEY, or a ONCE product. Each gets its own Basecamp project, and each project includes a pitch. The pitch lays out the problem or need, a proposed solution, and the “appetite” or time budget. The kick-off is also QA’s cue to dive in! We offer early feedback, ask questions or illuminate things that aren’t covered, and give extra consideration to flows and interactions that may require extra work on the accessibility front. We then step back and let the teams focus, design, and build things for a while. The right time to test We wait until the feature or product reaches a usable state to start testing in earnest. This helps us keep a fresh perspective, unencumbered by the knowledge of compromises made along the way. We use a Card Table within our QA Team project to track what’s ready for testing or in progress. Teams add a card to the Ready for QA (Triage) section when the time is right. The table is kept simple with just two columns, In Progress and Pending Input, for when we’ve completed our test run and the team is addressing the feedback. Depending on the breadth and complexity of the work being tested, this flow can take anywhere from a few hours to a few days. A holistic approach to QA Once we take on a request, we explore and scrutinize the feature much like an (extremely zealous!) customer would. We want to help teams ship the most polished features they can. We look out for bugs of all kinds: performance issues, visual glitches, unexpected changes, and so on, but perhaps most importantly, we offer feedback on the usability of the feature. We guide our feedback with questions like: Is this feature easy to discover and access? Is it in the right spot? Does it interact in an unexpected way with another part of the app? How does the change play with our mobile apps? Does this solve the problem in a way that customers will find obvious? Critically, what we raise with this type of QA testing are suggestions, not must-haves. The designer and programmer working on the feature make the call on what to address and what to shelve. We document this feedback in a dedicated Card Table within the feature’s Basecamp project. The designer and programmer will then review the cards we’ve added to Triage and direct them to the In Progress and Not Now columns as appropriate. From In Progress, cards are moved to a column called QA to confirm fixed, then finally to Done. More focus, less bloat Our overall approach to testing is guided exploration. We don’t maintain an exhaustive collection of test cases to dogmatically review each time we test a feature. We’ve tried using dedicated test plan tools and comprehensive spreadsheets of test cases upon test cases; the time spent certifying every little thing was considerable, yet it didn’t translate into finding more issues. Worse, it left us with less time to spend sitting with the feature in a more subjective way. We’ve landed on a more pragmatic approach. We’ve boiled down the test plan to a concise list of considerations that live in Basecamp as to-do list templates, one for each product. Instead of a multitude of test cases, each template contains around 100 items. These act as pointers, touching on overall concepts (like commenting, dark mode, email notifications), specific areas of the app, and platform-specific considerations. We reflect on the work presented and how it ties into these areas. Some examples from recent projects have been: Did we update exporting to consider this new addition of time tracking entries? Are email notifications properly reflecting the new Steps feature we added to Card Table? How about print styles, do they look good? QA Considerations for Basecamp 4 We create a to-do list via the template directly in the project we are working on, and use that as our reference for reviewing the work. We also ask the feature team if there are areas that deserve extra attention. Being flexible and discerning about how much time and coverage we use in our testing allows us to cover anywhere from 4 to 12+ projects in a very short span of time. We love working as a team of two and being able to riff on how to approach testing a feature. Sometimes, we divide and conquer; other times, both of us review the work. Fresh eyes provide a good chance of catching something new. Gabriel has a better knack for Android conventions and Michael for iOS, but we actively avoid over-specializing. Keeping up with multiple platforms requires extra effort, but it’s worth it when considering the consistency of the experience across all of them. Accessibility As part of our review, we test the accessibility of the changes. We use a combination of keyboard navigation and at least one screen reader on each platform to vet how well the feature will work for someone who relies on accessible technology. We also use browser extensions like axe and Accessibility Insights for Web to validate semantics of the code and Headings Map to make sure heading levels are sequential. At times, we bring in customers who use a screen reader full-time to help us validate whether everything makes sense and learn where things can improve. Our new colleague, Bruno, is a full-time user of the NVDA screen reader and can offer this sort of direct feedback on how a feature or flow works for him. Explorations in tooling A recent addition to our toolkit is a visual regression tool built on BackstopJS with the help of our colleague Lewis. Whenever we review work, we can run the suite of tests — mostly a list of URLs for various pages around the app — first pointed to production, then against a beta environment where the new feature is staged. Any visual differences will be flagged in a report we review, then write up bug report cards for the team if needed. Walking the walk Part of what enables us to keep our process minimal is that we use our products daily, both on the job and in our everyday lives. This affords us an intimate understanding of how they work and how they can be improved. We’re passionate about what we do. We find ourselves fortunate to work with each other and with so many talented colleagues. We hope this post has given you some helpful insight into the way we do things! If you have questions or if there are topics you’d like us to cover in future posts, drop us an email at qa@37signals.com.

4 months ago 23 votes

More in programming

Diagnosis in engineering strategy.

Once you’ve written your strategy’s exploration, the next step is working on its diagnosis. Diagnosis is understanding the constraints and challenges your strategy needs to address. In particular, it’s about doing that understanding while slowing yourself down from deciding how to solve the problem at hand before you know the problem’s nuances and constraints. If you ever find yourself wanting to skip the diagnosis phase–let’s get to the solution already!–then maybe it’s worth acknowledging that every strategy that I’ve seen fail, did so due to a lazy or inaccurate diagnosis. It’s very challenging to fail with a proper diagnosis, and almost impossible to succeed without one. The topics this chapter will cover are: Why diagnosis is the foundation of effective strategy, on which effective policy depends. Conversely, how skipping the diagnosis phase consistently ruins strategies A step-by-step approach to diagnosing your strategy’s circumstances How to incorporate data into your diagnosis effectively, and where to focus on adding data Dealing with controversial elements of your diagnosis, such as pointing out that your own executive is one of the challenges to solve Why it’s more effective to view difficulties as part of the problem to be solved, rather than a blocking issue that prevents making forward progress The near impossibility of an effective diagnosis if you don’t bring humility and self-awareness to the process Into the details we go! This is an exploratory, draft chapter for a book on engineering strategy that I’m brainstorming in #eng-strategy-book. As such, some of the links go to other draft chapters, both published drafts and very early, unpublished drafts. Diagnosis is strategy’s foundation One of the challenges in evaluating strategy is that, after the fact, many effective strategies are so obvious that they’re pretty boring. Similarly, most ineffective strategies are so clearly flawed that their authors look lazy. That’s because, as a strategy is operated, the reality around it becomes clear. When you’re writing your strategy, you don’t know if you can convince your colleagues to adopt a new approach to specifying APIs, but a year later you know very definitively whether it’s possible. Building your strategy’s diagnosis is your attempt to correctly recognize the context that the strategy needs to solve before deciding on the policies to address that context. Done well, the subsequent steps of writing strategy often feel like an afterthought, which is why I think of diagnosis as strategy’s foundation. Where exploration was an evaluation-free activity, diagnosis is all about evaluation. How do teams feel today? Why did that project fail? Why did the last strategy go poorly? What will be the distractions to overcome to make this new strategy successful? That said, not all evaluation is equal. If you state your judgment directly, it’s easy to dispute. An effective diagnosis is hard to argue against, because it’s a web of interconnected observations, facts, and data. Even for folks who dislike your conclusions, the weight of evidence should be hard to shift. Strategy testing, explored in the Refinement section, takes advantage of the reality that it’s easier to diagnose by doing than by speculating. It proposes a recursive diagnosis process until you have real-world evidence that the strategy is working. How to develop your diagnosis Your strategy is almost certain to fail unless you start from an effective diagnosis, but how to build a diagnosis is often left unspecified. That’s because, for most folks, building the diagnosis is indeed a dark art: unspecified, undiscussion, and uncontrollable. I’ve been guilty of this as well, with The Engineering Executive’s Primer’s chapter on strategy staying silent on the details of how to diagnose for your strategy. So, yes, there is some truth to the idea that forming your diagnosis is an emergent, organic process rather than a structured, mechanical one. However, over time I’ve come to adopt a fairly structured approach: Braindump, starting from a blank sheet of paper, write down your best understanding of the circumstances that inform your current strategy. Then set that piece of paper aside for the moment. Summarize exploration on a new piece of paper, review the contents of your exploration. Pull in every piece of diagnosis from similar situations that resonates with you. This is true for both internal and external works! For each diagnosis, tag whether it fits perfectly, or needs to be adjusted for your current circumstances. Then, once again, set the piece of paper aside. Mine for distinct perspectives on yet another blank page, talking to different stakeholders and colleagues who you know are likely to disagree with your early thinking. Your goal is not to agree with this feedback. Instead, it’s to understand their view. The Crux by Richard Rumelt anchors diagnosis in this approach, emphasizing the importance of “testing, adjusting, and changing the frame, or point of view.” Synthesize views into one internally consistent perspective. Sometimes the different perspectives you’ve gathered don’t mesh well. They might well explicitly differ in what they believe the underlying problem is, as is typical in tension between platform and product engineering teams. The goal is to competently represent each of these perspectives in the diagnosis, even the ones you disagree with, so that later on you can evaluate your proposed approach against each of them. When synthesizing feedback goes poorly, it tends to fail in one of two ways. First, the author’s opinion shines through so strongly that it renders the author suspect. Your goal is never to agree with every team’s perspective, just as your diagnosis should typically avoid crowning any perspective as correct: a reader should generally be appraised of the details and unaware of the author. The second common issue is when a group tries to jointly own the synthesis, but create a fractured perspective rather than a unified one. I generally find that having one author who is accountable for representing all views works best to address both of these issues. Test drafts across perspectives. Once you’ve written your initial diagnosis, you want to sit down with the people who you expect to disagree most fervently. Iterate with them until they agree that you’ve accurately captured their perspective. It might be that they disagree with some other view points, but they should be able to agree that others hold those views. They might argue that the data you’ve included doesn’t capture their full reality, in which case you can caveat the data by saying that their team disagrees that it’s a comprehensive lens. Don’t worry about getting the details perfectly right in your initial diagnosis. You’re trying to get the right crumbs to feed into the next phase, strategy refinement. Allowing yourself to be directionally correct, rather than perfectly correct, makes it possible to cover a broad territory quickly. Getting caught up in perfecting details is an easy way to anchor yourself into one perspective prematurely. At this point, I hope you’re starting to predict how I’ll conclude any recipe for strategy creation: if these steps feel overly mechanical to you, adjust them to something that feels more natural and authentic. There’s no perfect way to understand complex problems. That said, if you feel uncertain, or are skeptical of your own track record, I do encourage you to start with the above approach as a launching point. Incorporating data into your diagnosis The strategy for Navigating Private Equity ownership’s diagnosis includes a number of details to help readers understand the status quo. For example the section on headcount growth explains headcount growth, how it compares to the prior year, and providing a mental model for readers to translate engineering headcount into engineering headcount costs: Our Engineering headcount costs have grown by 15% YoY this year, and 18% YoY the prior year. Headcount grew 7% and 9% respectively, with the difference between headcount and headcount costs explained by salary band adjustments (4%), a focus on hiring senior roles (3%), and increased hiring in higher cost geographic regions (1%). If everyone evaluating a strategy shares the same foundational data, then evaluating the strategy becomes vastly simpler. Data is also your mechanism for supporting or critiquing the various views that you’ve gathered when drafting your diagnosis; to an impartial reader, data will speak louder than passion. If you’re confident that a perspective is true, then include a data narrative that supports it. If you believe another perspective is overstated, then include data that the reader will require to come to the same conclusion. Do your best to include data analysis with a link out to the full data, rather than requiring readers to interpret the data themselves while they are reading. As your strategy document travels further, there will be inevitable requests for different cuts of data to help readers understand your thinking, and this is somewhat preventable by linking to your original sources. If much of the data you want doesn’t exist today, that’s a fairly common scenario for strategy work: if the data to make the decision easy already existed, you probably would have already made a decision rather than needing to run a structured thinking process. The next chapter on refining strategy covers a number of tools that are useful for building confidence in low-data environments. Whisper the controversial parts At one time, the company I worked at rolled out a bar raiser program styled after Amazon’s, where there was an interviewer from outside the team that had to approve every hire. I spent some time arguing against adding this additional step as I didn’t understand what we were solving for, and I was surprised at how disinterested management was about knowing if the new process actually improved outcomes. What I didn’t realize until much later was that most of the senior leadership distrusted one of their peers, and had rolled out the bar raiser program solely to create a mechanism to control that manager’s hiring bar when the CTO was disinterested holding that leader accountable. (I also learned that these leaders didn’t care much about implementing this policy, resulting in bar raiser rejections being frequently ignored, but that’s a discussion for the Operations for strategy chapter.) This is a good example of a strategy that does make sense with the full diagnosis, but makes little sense without it, and where stating part of the diagnosis out loud is nearly impossible. Even senior leaders are not generally allowed to write a document that says, “The Director of Product Engineering is a bad hiring manager.” When you’re writing a strategy, you’ll often find yourself trying to choose between two awkward options: Say something awkward or uncomfortable about your company or someone working within it Omit a critical piece of your diagnosis that’s necessary to understand the wider thinking Whenever you encounter this sort of debate, my advice is to find a way to include the diagnosis, but to reframe it into a palatable statement that avoids casting blame too narrowly. I think it’s helpful to discuss a few concrete examples of this, starting with the strategy for navigating private equity, whose diagnosis includes: Based on general practice, it seems likely that our new Private Equity ownership will expect us to reduce R&D headcount costs through a reduction. However, we don’t have any concrete details to make a structured decision on this, and our approach would vary significantly depending on the size of the reduction. There are many things the authors of this strategy likely feel about their state of reality. First, they are probably upset about the fact that their new private equity ownership is likely to eliminate colleagues. Second, they are likely upset that there is no clear plan around what they need to do, so they are stuck preparing for a wide range of potential outcomes. However they feel, they don’t say any of that, they stick to precise, factual statements. For a second example, we can look to the Uber service migration strategy: Within infrastructure engineering, there is a team of four engineers responsible for service provisioning today. While our organization is growing at a similar rate as product engineering, none of that additional headcount is being allocated directly to the team working on service provisioning. We do not anticipate this changing. The team didn’t agree that their headcount should not be growing, but it was the reality they were operating in. They acknowledged their reality as a factual statement, without any additional commentary about that statement. In both of these examples, they found a professional, non-judgmental way to acknowledge the circumstances they were solving. The authors would have preferred that the leaders behind those decisions take explicit accountability for them, but it would have undermined the strategy work had they attempted to do it within their strategy writeup. Excluding critical parts of your diagnosis makes your strategies particularly hard to evaluate, copy or recreate. Find a way to say things politely to make the strategy effective. As always, strategies are much more about realities than ideals. Reframe blockers as part of diagnosis When I work on strategy with early-career leaders, an idea that comes up a lot is that an identified problem means that strategy is not possible. For example, they might argue that doing strategy work is impossible at their current company because the executive team changes their mind too often. That core insight is almost certainly true, but it’s much more powerful to reframe that as a diagnosis: if we don’t find a way to show concrete progress quickly, and use that to excite the executive team, our strategy is likely to fail. This transforms the thing preventing your strategy into a condition your strategy needs to address. Whenever you run into a reason why your strategy seems unlikely to work, or why strategy overall seems difficult, you’ve found an important piece of your diagnosis to include. There are never reasons why strategy simply cannot succeed, only diagnoses you’ve failed to recognize. For example, we knew in our work on Uber’s service provisioning strategy that we weren’t getting more headcount for the team, the product engineering team was going to continue growing rapidly, and that engineering leadership was unwilling to constrain how product engineering worked. Rather than preventing us from implementing a strategy, those components clarified what sort of approach could actually succeed. The role of self-awareness Every problem of today is partially rooted in the decisions of yesterday. If you’ve been with your organization for any duration at all, this means that you are directly or indirectly responsible for a portion of the problems that your diagnosis ought to recognize. This means that recognizing the impact of your prior actions in your diagnosis is a powerful demonstration of self-awareness. It also suggests that your next strategy’s success is rooted in your self-awareness about your prior choices. Don’t be afraid to recognize the failures in your past work. While changing your mind without new data is a sign of chaotic leadership, changing your mind with new data is a sign of thoughtful leadership. Summary Because diagnosis is the foundation of effective strategy, I’ve always found it the most intimidating phase of strategy work. While I think that’s a somewhat unavoidable reality, my hope is that this chapter has somewhat prepared you for that challenge. The four most important things to remember are simply: form your diagnosis before deciding how to solve it, try especially hard to capture perspectives you initially disagree with, supplement intuition with data where you can, and accept that sometimes you’re missing the data you need to fully understand. The last piece in particular, is why many good strategies never get shared, and the topic we’ll address in the next chapter on strategy refinement.

10 hours ago 3 votes
My friend, JT

I’ve had a cat for almost a third of my life.

2 hours ago 2 votes
[Course Launch] Hands-on Introduction to X86 Assembly

A Live, Interactive Course for Systems Engineers

4 hours ago 2 votes
It’s cool to care

I’m sitting in a small coffee shop in Brooklyn. I have a warm drink, and it’s just started to snow outside. I’m visiting New York to see Operation Mincemeat on Broadway – I was at the dress rehearsal yesterday, and I’ll be at the opening preview tonight. I’ve seen this show more times than I care to count, and I hope US theater-goers love it as much as Brits. The people who make the show will tell you that it’s about a bunch of misfits who thought they could do something ridiculous, who had the audacity to believe in something unlikely. That’s certainly one way to see it. The musical tells the true story of a group of British spies who tried to fool Hitler with a dead body, fake papers, and an outrageous plan that could easily have failed. Decades later, the show’s creators would mirror that same spirit of unlikely ambition. Four friends, armed with their creativity, determination, and a wardrobe full of hats, created a new musical in a small London theatre. And after a series of transfers, they’re about to open the show under the bright lights of Broadway. But when I watch the show, I see a story about friendship. It’s about how we need our friends to help us, to inspire us, to push us to be the best versions of ourselves. I see the swaggering leader who needs a team to help him truly achieve. The nervous scientist who stands up for himself with the support of his friends. The enthusiastic secretary who learns wisdom and resilience from her elder. And so, I suppose, it’s fitting that I’m not in New York on my own. I’m here with friends – dozens of wonderful people who I met through this ridiculous show. At first, I was just an audience member. I sat in my seat, I watched the show, and I laughed and cried with equal measure. After the show, I waited at stage door to thank the cast. Then I came to see the show a second time. And a third. And a fourth. After a few trips, I started to see familiar faces waiting with me at stage door. So before the cast came out, we started chatting. Those conversations became a Twitter community, then a Discord, then a WhatsApp. We swapped fan art, merch, and stories of our favourite moments. We went to other shows together, and we hung out outside the theatre. I spent New Year’s Eve with a few of these friends, sitting on somebody’s floor and laughing about a bowl of limes like it was the funniest thing in the world. And now we’re together in New York. Meeting this kind, funny, and creative group of people might seem as unlikely as the premise of Mincemeat itself. But I believed it was possible, and here we are. I feel so lucky to have met these people, to take this ridiculous trip, to share these precious days with them. I know what a privilege this is – the time, the money, the ability to say let’s do this and make it happen. How many people can gather a dozen friends for even a single evening, let alone a trip halfway round the world? You might think it’s silly to travel this far for a theatre show, especially one we’ve seen plenty of times in London. Some people would never see the same show twice, and most of us are comfortably into double or triple-figures. Whenever somebody asks why, I don’t have a good answer. Because it’s fun? Because it’s moving? Because I enjoy it? I feel the need to justify it, as if there’s some logical reason that will make all of this okay. But maybe I don’t have to. Maybe joy doesn’t need justification. A theatre show doesn’t happen without people who care. Neither does a friendship. So much of our culture tells us that it’s not cool to care. It’s better to be detached, dismissive, disinterested. Enthusiasm is cringe. Sincerity is weakness. I’ve certainly felt that pressure – the urge to play it cool, to pretend I’m above it all. To act as if I only enjoy something a “normal” amount. Well, fuck that. I don’t know where the drive to be detached comes from. Maybe it’s to protect ourselves, a way to guard against disappointment. Maybe it’s to seem sophisticated, as if having passions makes us childish or less mature. Or perhaps it’s about control – if we stay detached, we never have to depend on others, we never have to trust in something bigger than ourselves. Being detached means you can’t get hurt – but you’ll also miss out on so much joy. I’m a big fan of being a big fan of things. So many of the best things in my life have come from caring, from letting myself be involved, from finding people who are a big fan of the same things as me. If I pretended not to care, I wouldn’t have any of that. Caring – deeply, foolishly, vulnerably – is how I connect with people. My friends and I care about this show, we care about each other, and we care about our joy. That care and love for each other is what brought us together, and without it we wouldn’t be here in this city. I know this is a once-in-a-lifetime trip. So many stars had to align – for us to meet, for the show we love to be successful, for us to be able to travel together. But if we didn’t care, none of those stars would have aligned. I know so many other friends who would have loved to be here but can’t be, for all kinds of reasons. Their absence isn’t for lack of caring, and they want the show to do well whether or not they’re here. I know they care, and that’s the important thing. To butcher Tennyson: I think it’s better to care about something you cannot affect, than to care about nothing at all. In a world that’s full of cynicism and spite and hatred, I feel that now more than ever. I’d recommend you go to the show if you haven’t already, but that’s not really the point of this post. Maybe you’ve already seen Operation Mincemeat, and it wasn’t for you. Maybe you’re not a theatre kid. Maybe you aren’t into musicals, or history, or war stories. That’s okay. I don’t mind if you care about different things to me. (Imagine how boring the world would be if we all cared about the same things!) But I want you to care about something. I want you to find it, find people who care about it too, and hold on to them. Because right now, in this city, with these people, at this show? I’m so glad I did. And I hope you find that sort of happiness too. Some of the people who made this trip special. Photo by Chloe, and taken from her Twitter. Timing note: I wrote this on February 15th, but I delayed posting it because I didn’t want to highlight the fact I was away from home. [If the formatting of this post looks odd in your feed reader, visit the original article]

yesterday 3 votes
Stick with the customer

One of the biggest mistakes that new startup founders make is trying to get away from the customer-facing roles too early. Whether it's customer support or it's sales, it's an incredible advantage to have the founders doing that work directly, and for much longer than they find comfortable. The absolute worst thing you can do is hire a sales person or a customer service agent too early. You'll miss all the golden nuggets that customers throw at you for free when they're rejecting your pitch or complaining about the product. Seeing these reasons paraphrased or summarized destroy all the nutrients in their insights. You want that whole-grain feedback straight from the customers' mouth!  When we launched Basecamp in 2004, Jason was doing all the customer service himself. And he kept doing it like that for three years!! By the time we hired our first customer service agent, Jason was doing 150 emails/day. The business was doing millions of dollars in ARR. And Basecamp got infinitely, better both as a market proposition and as a product, because Jason could funnel all that feedback into decisions and positioning. For a long time after that, we did "Everyone on Support". Frequently rotating programmers, designers, and founders through a day of answering emails directly to customers. The dividends of doing this were almost as high as having Jason run it all in the early years. We fixed an incredible number of minor niggles and annoying bugs because programmers found it easier to solve the problem than to apologize for why it was there. It's not easy doing this! Customers often offer their valuable insights wrapped in rude language, unreasonable demands, and bad suggestions. That's why many founders quit the business of dealing with them at the first opportunity. That's why few companies ever do "Everyone On Support". That's why there's such eagerness to reduce support to an AI-only interaction. But quitting dealing with customers early, not just in support but also in sales, is an incredible handicap for any startup. You don't have to do everything that every customer demands of you, but you should certainly listen to them. And you can't listen well if the sound is being muffled by early layers of indirection.

yesterday 4 votes