More from 37signals Dev
As the final part of our move out of the cloud, we are working on moving 10 petabytes of data out of AWS Simple Storage Service (S3). After exploring different alternatives, we decided to go with Pure Storage FlashBlade solution. We store different kinds of information on S3, from the attachments customers upload to Basecamp to the Prometheus long-term metrics. On top of that, Pure’s system also provides filesystem-based capabilities, enabling other relevant usages, such as database backup storage. This makes the system a top priority for observability. Although the system has great reliability, out-of-the-box internal alerting, and autonomous ticket creation, it would also be good to have our metrics and alerts to facilitate problem-solving and ensure any disruptions are prioritized and handled. For more context on our current Prometheus setup, see how we use Prometheus at 37signals. Pure OpenMetrics exporter Pure maintains two OpenMetrics exporters, pure-fb-openmetrics-exporter and pure-fa-openmetrics-exporter. Since we use Pure Flashblade (fb), this post covers pure-fb-openmetrics-exporter, although overall usage should be similar. The setup is straightforward and requires only binary and basic authentication installation. Here is a snippet of our Chef recipe that installs it: pure_api_token = "token" # If you use Chef, your token should come from an ecrypted databag. Changed to hardcoded here to simplify PURE_EXPORTER_VERSION = "1.0.13".freeze # Generally, we use Chef node metadata for version management. Changed to hardcoded to simplify directory "/opt/pure_exporter/#{PURE_EXPORTER_VERSION}" do recursive true owner 'pure_exporter' group 'pure_exporter' end # Avoid recreating under /tmp after reboot if target_binary is already there target_binary = "/opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-fb-openmetrics-exporter" remote_file "/tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}-linux-amd64.tar.gz" do source "https://github.com/PureStorage-OpenConnect/pure-fb-openmetrics-exporter/releases/download/v#{PURE_EXPORTER_VERSION}/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}-linux-amd64.tar.gz" not_if { ::File.exist?(target_binary) } end archive_file "/tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}-linux-amd64.tar.gz" do destination "/tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}" action :extract not_if { ::File.exist?(target_binary) } end execute "copy binary" do command "sudo cp /tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}/pure-fb-openmetrics-exporter /opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-exporter" creates "/opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-exporter" not_if { ::File.exist?(target_binary) } end tokens = <<EOF main: address: purestorage-mgmt.mydomain.com api_token: #{pure_api_token['token']} EOF file "/opt/pure_exporter/tokens.yml" do content tokens owner 'pure_exporter' group 'pure_exporter' sensitive true end systemd_unit 'pure-exporter.service' do content <<-EOU # Caution: Chef managed content. This is a file resource from #{cookbook_name}::#{recipe_name} # [Unit] Description=Pure Exporter After=network.target [Service] Restart=on-failure PIDFile=/var/run/pure-exporter.pid User=pure_exporter Group=pure_exporter ExecStart=/opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-exporter \ --tokens=/opt/pure_exporter/tokens.yml ExecReload=/bin/kill -HUP $MAINPID SyslogIdentifier=pure-exporter [Install] WantedBy=multi-user.target EOU action [ :create, :enable, :start ] notifies :reload, "service[pure-exporter]" end service 'pure-exporter' Prometheus Job Configuration The simplest way of ingesting the metrics is to configure a basic Job without any customization: - job_name: pure_exporter metrics_path: /metrics static_configs: - targets: ['<%= @hostname %>:9491'] labels: environment: 'production' job: pure_exporter params: endpoint: [main] # From the tokens configuration above For a production-ready setup, we are using a slightly different approach. The exporter supports the usage of specific metric paths to allow for split Prometheus jobs configuration that reduces the overhead of pulling the metrics all at once: - job_name: pure_exporter_array metrics_path: /metrics/array static_configs: - targets: ['<%= @hostname %>:9491'] labels: environment: 'production' job: pure_exporter metric_relabel_configs: - source_labels: [name] target_label: ch regex: "([^.]+).*" replacement: "$1" action: replace - source_labels: [name] target_label: fb regex: "[^.]+\\.([^.]+).*" replacement: "$1" action: replace - source_labels: [name] target_label: bay regex: "[^.]+\\.[^.]+\\.([^.]+)" replacement: "$1" action: replace params: endpoint: [main] # From the tokens configuration above - job_name: pure_exporter_clients metrics_path: /metrics/clients static_configs: - targets: ['<%= @hostname %>:9491'] labels: environment: 'production' job: pure_exporter params: endpoint: [main] # From the tokens configuration above - job_name: pure_exporter_usage metrics_path: /metrics/usage static_configs: - targets: ['<%= @hostname %>:9491'] labels: environment: 'production' job: pure_exporter params: endpoint: [main] - job_name: pure_exporter_policies metrics_path: /metrics/policies static_configs: - targets: ['<%= @hostname %>:9491'] labels: environment: 'production' job: pure_exporter params: endpoint: [main] # From the tokens configuration above We also configure some metric_relabel_configs to extract labels from name using regex. Those labels help reduce the complexity of queries that aggregate metrics by different components. Detailed documentation on the available metrics can be found here. Alerts Auto Generated Alerts As I shared earlier, the system has an internal Alerting module that automatically triggers alerts for critical situations and creates tickets. To cover those alerts on the Prometheus side, we added an alerting configuration of our own that relies on the incoming severities: - alert: PureAlert annotations: summary: '{{ $labels.summary }}' description: '{{ $labels.component_type }} - {{ $labels.component_name }} - {{ $labels.action }} - {{ $labels.kburl }}' dashboard: 'https://grafana/your-dashboard' expr: purefb_alerts_open{environment="production"} == 1 for: 1m We still need to evaluate how the pure-generated alerts will interact with the custom alerts I will cover below, and we might decide to stick to one or the other depending on what we find out. Hardware Before I continue, the image below helps visualize how some of the Pure FlashBlade components are physically organized: Because of Pure’s reliability, most isolated hardware failures do not require the immediate attention of an Ops team member. To cover the most basic hardware failures, we configure an alert that sends a message to the Ops Basecamp 4 project chat: - alert: PureHardwareFailed annotations: summary: Hardware {{ $labels.name }} in chassis {{ $labels.ch }} is failed description: 'The Pure Storage hardware {{ $labels.name }} in chassis {{ $labels.ch }} is failed' dashboard: 'https://grafana/your-dashboard' expr: purefb_hardware_health == 0 for: 1m labels: severity: chat-notification We also configure alerts that check for multiple hardware failures of the same type. This doesn’t mean two simultaneous failures will result in a critical state, but it is a fair guardrail for unexpected scenarios. We also expect those situations to be rare, keeping the risk of causing unnecessary noise low. - alert: PureMultipleHardwareFailed annotations: summary: Pure chassis {{ $labels.ch }} has {{ $value }} failed {{ $labels.type }} description: 'The Pure Storage chassis {{ $labels.ch }} has {{ $value }} failed {{ $labels.type }}, close to the healthy limit of two simultaneous failures. Ensure that the hardware failures are being worked on' dashboard: 'https://grafana/your-dashboard' expr: count(purefb_hardware_health{type!~"eth|mgmt_port|bay"} == 0) by (ch,type,environment) > 1 for: 1m labels: severity: page # We are looking for multiple failed bays in the same blade - alert: PureMultipleBaysFailed annotations: summary: Pure chassis {{ $labels.ch }} has fb {{ $labels.fb }} with {{ $value }} failed bays description: 'The Pure Storage chassis {{ $labels.ch }} has fb {{ $labels.fb }} with {{ $value }} failed bays, close to the healthy limit of two simultaneous failures. Ensure that the hardware failures are being worked on' dashboard: 'https://grafana/your-dashboard' expr: count(purefb_hardware_health{type="bay"} == 0) by (ch,type,fb,environment) > 1 for: 1m labels: severity: page Finally, we configure high-level alerts for chassis and XFM failures: - alert: PureChassisFailed annotations: summary: Chassis {{ $labels.name }} is failed description: 'The Pure Storage hardware chassis {{ $labels.name }} is failed' dashboard: 'https://grafana/your-dashboard' expr: purefb_hardware_health{type="ch"} == 0 for: 1m labels: severity: page - alert: PureXFMFailed annotations: summary: Xternal Fabric Module {{ $labels.name }} is failed description: 'The Pure Storage hardware Xternal fabric module {{ $labels.name }} is failed' dashboard: 'https://grafana/your-dashboard' expr: purefb_hardware_health{type="xfm"} == 0 for: 1m labels: severity: page Latency Using the metric purefb_array_performance_latency_usec we can set a threshold for all the different protocols and dimensions (read, write, etc), so we are alerted if any problem causes the latency to go above an expected level. - alert: PureLatencyHigh annotations: summary: Pure {{ $labels.dimension }} - {{ $labels.protocol }} latency high description: 'Pure {{ $labels.protocol }} latency for dimension {{ $labels.dimension }} is above 100ms' dashboard: 'https://grafana/your-dashboard' expr: (avg_over_time(purefb_array_performance_latency_usec{protocol="all"}[30m]) * 0.001) for: 1m labels: severity: chat-notification Saturation For saturation, we are primarily worried about something unexpected causing excessive use of array space, increasing the risk of hitting the cluster capacity. With that in mind, it’s good to have a simple alert in place, even if we don’t expect it to fire anytime soon: - alert: PureArraySpace annotations: summary: Pure Cluster {{ $labels.instance }} available space is expected to be below 10% description: 'The array space for pure cluster {{ $labels.instance }} is expected to be below 10% in a month, please investigate and ensure there is no risk of running out of capacity' dashboard: 'https://grafana/your-dashboard' expr: (predict_linear(purefb_array_space_bytes{space="empty",type="array"}[30d], 730 * 3600)) < (purefb_array_space_bytes{space="capacity",type="array"} * 0.10) for: 1m labels: severity: chat-notification HTTP We use BigIp load balancers to front-end the cluster, which means that all the alerts we already had in place for the BigIp HTTP profiles, virtual servers, and pools also cover access to Pure. The solution for each organization on this topic will be different, but it is a good practice to keep an eye on HTTP status codes and throughput. Grafana Dashboards The project’s GitHub repository includes JSON files for Grafana dashboards that are based on the metrics generated by the exporter. With simple adjustments to fit each setup, it’s possible to import them quickly. Wrapping up On top of the system’s built-in capabilities, Pure also provides options to integrate their system into well-known tools like Prometheus and Grafana, facilitating the process of managing the cluster the same way we manage everything else. I hope this post helps any other team interested in working with them better understand the effort involved. Thanks for reading!
Today, we are releasing Hotwire Spark, a live-reloading system for Rails Applications. Reloading the browser automatically on source changes is a problem that has been well-solved for a long time. Here, we wanted to put an accent on smoothness. If the reload operation is very noticeable, the feedback loop is similar to just reloading the page yourself. But if it’s smooth enough—if you only perceive the intended change—the feedback loop becomes terrific. To use, just install the gem in development: group :development do gem "hotwire-spark" end It will update the current page on three types of change: HTML content, CSS, and Stimulus controllers. How do we achieve that desired smoothness with each? For HTML content, it morphs the <body> of the page into the new <body>. Also, it disconnects and reconnects all the Stimulus controllers on the page. For CSS, it reloads the changed stylesheet. For Stimulus controllers, it fetches the changed controller, replaces its module in Stimulus, and reconnects all the controllers. We designed Hotwire Spark to shine with the #nobuildapproach we use and recommend. Serving CSS and JS assets as standalone files is ideal when you want to fetch and update only what has changed. There is no need to use bundling or any tooling. Hot Module Replacement for Stimulus controllers without any frontend building tool is pretty cool! 2024 has been a very special year for Rails. We’re thrilled to share Hotwire Spark before the year wraps up. Wishing you all a joyful holiday season and a fantastic start to 2025.
If you have the luxury of starting a new Rails app today, here’s our recommendation: go vanilla. Fight hard before adding Ruby dependencies. Keep that Gemfile that Rails generates as close to the original one as possible. Fight even harder before adding Javascript dependencies. You don’t need React or any other front-end frameworks, nor a JSON API to feed those. Hotwire is a fantastic, pragmatic, and ridiculously productive technology for the front end. Use it. The same goes for mobile apps: use Hotwire Native. With a hybrid approach you can combine the very same web app you have built with a wonderful native experience right where you want it. The productivity compared to a purely native approach is night and day. Embrace and celebrate rendering things on the server. It has become cool again. ERB templates and view helpers will take you as long as you need, and they are a fantastic common ground for designers to collaborate hands-on with the code. #nobuild is the simplest way to go; don’t close this door with your choices. Instead of bundling Javascript, use import maps. Don’t bundle CSS, just use modern standard CSS goodies and serve them all with Propshaft. If you have 100 Javascript files and 100 stylesheets, serve 200 standalone requests multiplexed over HTTP2. You will be delighted. Don’t add Redis to the mix. Use solid_cache for caching, solid_queue for jobs, and solid_cable for Action Cable. They will all work on your beloved relational database and are battle-tested. Test your apps with Minitest. Use fixtures and build a realistic set of those as you cook your app. Make your app a PWA, which is fully supported by Rails 8. This may be more than enough before caring about mobile apps at all. Deploy your app with Kamal. If you want heuristics, your importmap.rb should import Turbo, Stimulus, your app controllers, and little else. Your Gemfile should be almost identical to the one that Rails generates. I know it sounds radical, but going vanilla is a radical stance in this convoluted world of endless choices. This is the Rails 8 stack we have chosen for our new apps at 37signals. We are a tiny crew, so we care a lot about productivity. And we sell products, not stacks, so we care a lot about delighting our users. This is our Omakase stack because it offers the optimal balance for achieving both. Vanilla means your app stays nimble. Fewer dependencies mean fewer future headaches. You get a tight integration out of the box, so you can focus on building things. It also maximizes the odds of having smoother future upgrades. Vanilla requires determination, though, because new dependencies always look shiny and shinier. It’s always clear what you get when you add them, but never what you lose in the long term. It is certainly up to you. Rails is a wonderful big tent. These are our opinions. If it resonates, choose vanilla! Guess what our advice is for architecting your app internals?
We’ve just released Mission Control — Jobs v1.0.0, the dashboard and set of extensions to operate background jobs that we introduced earlier this year. This new version is the result of 92 pull requests, 67 issues and the help of 35 different contributors. It includes many bugfixes and improvements, such as: Support for Solid Queue’s recurring tasks, including running them on-demand. Support for API-only apps. Allowing immediate dispatching of scheduled and blocked jobs. Backtrace cleaning for failed jobs’ backtraces. A safer default for authentication, with Basic HTTP authentication enabled and initially closed unless configured or explicitly disabled. Recurring tasks in Mission Control — Jobs, with a subset of the tasks we run in production We use Mission Control — Jobs daily to manage jobs HEY and Basecamp 4, with both Solid Queue and Resque, and it’s the dashboard we recommend if you’re using Solid Queue for your jobs. Our plan is to upstream some of the extensions we’ve made to Active Job and continue improving it until it’s ready to be included by default in Rails together with Solid Queue. If you want to help us with that, are interested in learning more or have any issues or questions, head over to the repo in GitHub. We hope you like it!
More in programming
I like the job title “Design Engineer”. When required to label myself, I feel partial to that term (I should, I’ve written about it enough). Lately I’ve felt like the term is becoming more mainstream which, don’t get me wrong, is a good thing. I appreciate the diversification of job titles, especially ones that look to stand in the middle between two binaries. But — and I admit this is a me issue — once a title starts becoming mainstream, I want to use it less and less. I was never totally sure why I felt this way. Shouldn’t I be happy a title I prefer is gaining acceptance and understanding? Do I just want to rebel against being labeled? Why do I feel this way? These were the thoughts simmering in the back of my head when I came across an interview with the comedian Brian Regan where he talks about his own penchant for not wanting to be easily defined: I’ve tried over the years to write away from how people are starting to define me. As soon as I start feeling like people are saying “this is what you do” then I would be like “Alright, I don't want to be just that. I want to be more interesting. I want to have more perspectives.” [For example] I used to crouch around on stage all the time and people would go “Oh, he’s the guy who crouches around back and forth.” And I’m like, “I’ll show them, I will stand erect! Now what are you going to say?” And then they would go “You’re the guy who always feels stupid.” So I started [doing other things]. He continues, wondering aloud whether this aversion to not being easily defined has actually hurt his career in terms of commercial growth: I never wanted to be something you could easily define. I think, in some ways, that it’s held me back. I have a nice following, but I’m not huge. There are people who are huge, who are great, and deserve to be huge. I’ve never had that and sometimes I wonder, ”Well maybe it’s because I purposely don’t want to be a particular thing you can advertise or push.” That struck a chord with me. It puts into words my current feelings towards the job title “Design Engineer” — or any job title for that matter. Seven or so years ago, I would’ve enthusiastically said, “I’m a Design Engineer!” To which many folks would’ve said, “What’s that?” But today I hesitate. If I say “I’m a Design Engineer” there are less follow up questions. Now-a-days that title elicits less questions and more (presumed) certainty. I think I enjoy a title that elicits a “What’s that?” response, which allows me to explain myself in more than two or three words, without being put in a box. But once a title becomes mainstream, once people begin to assume they know what it means, I don’t like it anymore (speaking for myself, personally). As Brian says, I like to be difficult to define. I want to have more perspectives. I like a title that befuddles, that doesn’t provide a presumed sense of certainty about who I am and what I do. And I get it, that runs counter to the very purpose of a job title which is why I don’t think it’s good for your career to have the attitude I do, lol. I think my own career evolution has gone something like what Brian describes: Them: “Oh you’re a Designer? So you make mock-ups in Photoshop and somebody else implements them.” Me: “I’ll show them, I’ll implement them myself! Now what are you gonna do?” Them: “Oh, so you’re a Design Engineer? You design and build user interfaces on the front-end.” Me: “I’ll show them, I’ll write a Node server and setup a database that powers my designs and interactions on the front-end. Now what are they gonna do?” Them: “Oh, well, we I’m not sure we have a term for that yet, maybe Full-stack Design Engineer?” Me: “Oh yeah? I’ll frame up a user problem, interface with stakeholders, explore the solution space with static designs and prototypes, implement a high-fidelity solution, and then be involved in testing, measuring, and refining said solution. What are you gonna call that?” [As you can see, I have some personal issues I need to work through…] As Brian says, I want to be more interesting. I want to have more perspectives. I want to be something that’s not so easily definable, something you can’t sum up in two or three words. I’ve felt this tension my whole career making stuff for the web. I think it has led me to work on smaller teams where boundaries are much more permeable and crossing them is encouraged rather than discouraged. All that said, I get it. I get why titles are useful in certain contexts (corporate hierarchies, recruiting, etc.) where you’re trying to take something as complicated and nuanced as an individual human beings and reduce them to labels that can be categorized in a database. I find myself avoiding those contexts where so much emphasis is placed in the usefulness of those labels. “I’ve never wanted to be something you could easily define” stands at odds with the corporate attitude of, “Here’s the job req. for the role (i.e. cog) we’re looking for.” Email · Mastodon · Bluesky
I've been biking in Brooklyn for a few years now! It's hard for me to believe it, but I'm now one of the people other bicyclists ask questions to now. I decided to make a zine that answers the most common of those questions: Bike Brooklyn! is a zine that touches on everything I wish I knew when I started biking in Brooklyn. A lot of this information can be found in other resources, but I wanted to collect it in one place. I hope to update this zine when we get significantly more safe bike infrastructure in Brooklyn and laws change to make streets safer for bicyclists (and everyone) over time, but it's still important to note that each release will reflect a specific snapshot in time of bicycling in Brooklyn. All text and illustrations in the zine are my own. Thank you to Matt Denys, Geoffrey Thomas, Alex Morano, Saskia Haegens, Vishnu Reddy, Ben Turndorf, Thomas Nayem-Huzij, and Ryan Christman for suggestions for content and help with proofreading. This zine is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, so you can copy and distribute this zine for noncommercial purposes in unadapted form as long as you give credit to me. Check out the Bike Brooklyn! zine on the web or download pdfs to read digitally or print here!
At work, we’ve been building agentic workflows to support our internal Delivery team on various accounting, cash reconciliation, and operational tasks. To better guide that project, I wrote my own simple workflow tool as a learning project in January. Since then, the Model Context Protocol (MCP) has become a prominent solution for writing tools for agents, and I decided to spend some time writing an MCP server over the weekend to build a better intuition. The output of that project is library-mcp, a simple MCP that you can use locally with tools like Claude Desktop to explore Markdown knowledge bases. I’m increasingly enamored with the idea of “datapacks” that I load into context windows with relevant work, and I am currently working to release my upcoming book in a “datapack” format that’s optimized for usage with LLMs. library-mcp allows any author to dynamically build datapacks relevant to their current question, as long as they have access to their content in Markdown files. A few screenshots tell the story. First, here’s a list of the tools provided by this server. These tools give a variety of ways to search through content and pull that content into your context window. Each time you access a tool for the first time in a chat, Claude Desktop prompts you to verify you want that tool to operate. This is a nice feature, and I think it’s particularly important that approval is done at the application layer, not at the agent layer. If agents approve their own usage, well, security is going to be quite a mess. Here’s an example of retrieving all the tags to figure out what I’ve written about. You could do a follow-up like, “Get me posts I’ve written about ‘python’” after seeing the tags. The interesting thing here is you can combine retrieval and intelligence. For example, you could ask “Get me all the tags I’ve written, and find those that seem related to software architecture” and it does a good job of filtering. Finally, here’s an example of actually using a datapack to answer a question. In this case, it’s evaluating how my writing has changed between 2018 and 2025. More practically, I’ve already experimented with friends writing their CTO onboarding plans with Your first 90 days as CTO as a datapack in the context window, and you can imagine the right datapacks allowing you to go much further. Writing a company policy with all the existing policies in a datapack, along with a document about how to write policies effectively, for example, would improve consistency and be likely to identify conflicting policies. Altogether, I am currently enamored with the vision of useful datapacks facilitating creation, and hope that library-mcp is a useful tool for folks as we experiment our way towards this idea.