More from 37signals Dev
As the final part of our move out of the cloud, we are working on moving 10 petabytes of data out of AWS Simple Storage Service (S3). After exploring different alternatives, we decided to go with Pure Storage FlashBlade solution. We store different kinds of information on S3, from the attachments customers upload to Basecamp to the Prometheus long-term metrics. On top of that, Pure’s system also provides filesystem-based capabilities, enabling other relevant usages, such as database backup storage. This makes the system a top priority for observability. Although the system has great reliability, out-of-the-box internal alerting, and autonomous ticket creation, it would also be good to have our metrics and alerts to facilitate problem-solving and ensure any disruptions are prioritized and handled. For more context on our current Prometheus setup, see how we use Prometheus at 37signals. Pure OpenMetrics exporter Pure maintains two OpenMetrics exporters, pure-fb-openmetrics-exporter and pure-fa-openmetrics-exporter. Since we use Pure Flashblade (fb), this post covers pure-fb-openmetrics-exporter, although overall usage should be similar. The setup is straightforward and requires only binary and basic authentication installation. Here is a snippet of our Chef recipe that installs it: pure_api_token = "token" # If you use Chef, your token should come from an ecrypted databag. Changed to hardcoded here to simplify PURE_EXPORTER_VERSION = "1.0.13".freeze # Generally, we use Chef node metadata for version management. Changed to hardcoded to simplify directory "/opt/pure_exporter/#{PURE_EXPORTER_VERSION}" do recursive true owner 'pure_exporter' group 'pure_exporter' end # Avoid recreating under /tmp after reboot if target_binary is already there target_binary = "/opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-fb-openmetrics-exporter" remote_file "/tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}-linux-amd64.tar.gz" do source "https://github.com/PureStorage-OpenConnect/pure-fb-openmetrics-exporter/releases/download/v#{PURE_EXPORTER_VERSION}/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}-linux-amd64.tar.gz" not_if { ::File.exist?(target_binary) } end archive_file "/tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}-linux-amd64.tar.gz" do destination "/tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}" action :extract not_if { ::File.exist?(target_binary) } end execute "copy binary" do command "sudo cp /tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}/pure-fb-openmetrics-exporter /opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-exporter" creates "/opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-exporter" not_if { ::File.exist?(target_binary) } end tokens = <<EOF main: address: purestorage-mgmt.mydomain.com api_token: #{pure_api_token['token']} EOF file "/opt/pure_exporter/tokens.yml" do content tokens owner 'pure_exporter' group 'pure_exporter' sensitive true end systemd_unit 'pure-exporter.service' do content <<-EOU # Caution: Chef managed content. This is a file resource from #{cookbook_name}::#{recipe_name} # [Unit] Description=Pure Exporter After=network.target [Service] Restart=on-failure PIDFile=/var/run/pure-exporter.pid User=pure_exporter Group=pure_exporter ExecStart=/opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-exporter \ --tokens=/opt/pure_exporter/tokens.yml ExecReload=/bin/kill -HUP $MAINPID SyslogIdentifier=pure-exporter [Install] WantedBy=multi-user.target EOU action [ :create, :enable, :start ] notifies :reload, "service[pure-exporter]" end service 'pure-exporter' Prometheus Job Configuration The simplest way of ingesting the metrics is to configure a basic Job without any customization: - job_name: pure_exporter metrics_path: /metrics static_configs: - targets: ['<%= @hostname %>:9491'] labels: environment: 'production' job: pure_exporter params: endpoint: [main] # From the tokens configuration above For a production-ready setup, we are using a slightly different approach. The exporter supports the usage of specific metric paths to allow for split Prometheus jobs configuration that reduces the overhead of pulling the metrics all at once: - job_name: pure_exporter_array metrics_path: /metrics/array static_configs: - targets: ['<%= @hostname %>:9491'] labels: environment: 'production' job: pure_exporter metric_relabel_configs: - source_labels: [name] target_label: ch regex: "([^.]+).*" replacement: "$1" action: replace - source_labels: [name] target_label: fb regex: "[^.]+\\.([^.]+).*" replacement: "$1" action: replace - source_labels: [name] target_label: bay regex: "[^.]+\\.[^.]+\\.([^.]+)" replacement: "$1" action: replace params: endpoint: [main] # From the tokens configuration above - job_name: pure_exporter_clients metrics_path: /metrics/clients static_configs: - targets: ['<%= @hostname %>:9491'] labels: environment: 'production' job: pure_exporter params: endpoint: [main] # From the tokens configuration above - job_name: pure_exporter_usage metrics_path: /metrics/usage static_configs: - targets: ['<%= @hostname %>:9491'] labels: environment: 'production' job: pure_exporter params: endpoint: [main] - job_name: pure_exporter_policies metrics_path: /metrics/policies static_configs: - targets: ['<%= @hostname %>:9491'] labels: environment: 'production' job: pure_exporter params: endpoint: [main] # From the tokens configuration above We also configure some metric_relabel_configs to extract labels from name using regex. Those labels help reduce the complexity of queries that aggregate metrics by different components. Detailed documentation on the available metrics can be found here. Alerts Auto Generated Alerts As I shared earlier, the system has an internal Alerting module that automatically triggers alerts for critical situations and creates tickets. To cover those alerts on the Prometheus side, we added an alerting configuration of our own that relies on the incoming severities: - alert: PureAlert annotations: summary: '{{ $labels.summary }}' description: '{{ $labels.component_type }} - {{ $labels.component_name }} - {{ $labels.action }} - {{ $labels.kburl }}' dashboard: 'https://grafana/your-dashboard' expr: purefb_alerts_open{environment="production"} == 1 for: 1m We still need to evaluate how the pure-generated alerts will interact with the custom alerts I will cover below, and we might decide to stick to one or the other depending on what we find out. Hardware Before I continue, the image below helps visualize how some of the Pure FlashBlade components are physically organized: Because of Pure’s reliability, most isolated hardware failures do not require the immediate attention of an Ops team member. To cover the most basic hardware failures, we configure an alert that sends a message to the Ops Basecamp 4 project chat: - alert: PureHardwareFailed annotations: summary: Hardware {{ $labels.name }} in chassis {{ $labels.ch }} is failed description: 'The Pure Storage hardware {{ $labels.name }} in chassis {{ $labels.ch }} is failed' dashboard: 'https://grafana/your-dashboard' expr: purefb_hardware_health == 0 for: 1m labels: severity: chat-notification We also configure alerts that check for multiple hardware failures of the same type. This doesn’t mean two simultaneous failures will result in a critical state, but it is a fair guardrail for unexpected scenarios. We also expect those situations to be rare, keeping the risk of causing unnecessary noise low. - alert: PureMultipleHardwareFailed annotations: summary: Pure chassis {{ $labels.ch }} has {{ $value }} failed {{ $labels.type }} description: 'The Pure Storage chassis {{ $labels.ch }} has {{ $value }} failed {{ $labels.type }}, close to the healthy limit of two simultaneous failures. Ensure that the hardware failures are being worked on' dashboard: 'https://grafana/your-dashboard' expr: count(purefb_hardware_health{type!~"eth|mgmt_port|bay"} == 0) by (ch,type,environment) > 1 for: 1m labels: severity: page # We are looking for multiple failed bays in the same blade - alert: PureMultipleBaysFailed annotations: summary: Pure chassis {{ $labels.ch }} has fb {{ $labels.fb }} with {{ $value }} failed bays description: 'The Pure Storage chassis {{ $labels.ch }} has fb {{ $labels.fb }} with {{ $value }} failed bays, close to the healthy limit of two simultaneous failures. Ensure that the hardware failures are being worked on' dashboard: 'https://grafana/your-dashboard' expr: count(purefb_hardware_health{type="bay"} == 0) by (ch,type,fb,environment) > 1 for: 1m labels: severity: page Finally, we configure high-level alerts for chassis and XFM failures: - alert: PureChassisFailed annotations: summary: Chassis {{ $labels.name }} is failed description: 'The Pure Storage hardware chassis {{ $labels.name }} is failed' dashboard: 'https://grafana/your-dashboard' expr: purefb_hardware_health{type="ch"} == 0 for: 1m labels: severity: page - alert: PureXFMFailed annotations: summary: Xternal Fabric Module {{ $labels.name }} is failed description: 'The Pure Storage hardware Xternal fabric module {{ $labels.name }} is failed' dashboard: 'https://grafana/your-dashboard' expr: purefb_hardware_health{type="xfm"} == 0 for: 1m labels: severity: page Latency Using the metric purefb_array_performance_latency_usec we can set a threshold for all the different protocols and dimensions (read, write, etc), so we are alerted if any problem causes the latency to go above an expected level. - alert: PureLatencyHigh annotations: summary: Pure {{ $labels.dimension }} - {{ $labels.protocol }} latency high description: 'Pure {{ $labels.protocol }} latency for dimension {{ $labels.dimension }} is above 100ms' dashboard: 'https://grafana/your-dashboard' expr: (avg_over_time(purefb_array_performance_latency_usec{protocol="all"}[30m]) * 0.001) for: 1m labels: severity: chat-notification Saturation For saturation, we are primarily worried about something unexpected causing excessive use of array space, increasing the risk of hitting the cluster capacity. With that in mind, it’s good to have a simple alert in place, even if we don’t expect it to fire anytime soon: - alert: PureArraySpace annotations: summary: Pure Cluster {{ $labels.instance }} available space is expected to be below 10% description: 'The array space for pure cluster {{ $labels.instance }} is expected to be below 10% in a month, please investigate and ensure there is no risk of running out of capacity' dashboard: 'https://grafana/your-dashboard' expr: (predict_linear(purefb_array_space_bytes{space="empty",type="array"}[30d], 730 * 3600)) < (purefb_array_space_bytes{space="capacity",type="array"} * 0.10) for: 1m labels: severity: chat-notification HTTP We use BigIp load balancers to front-end the cluster, which means that all the alerts we already had in place for the BigIp HTTP profiles, virtual servers, and pools also cover access to Pure. The solution for each organization on this topic will be different, but it is a good practice to keep an eye on HTTP status codes and throughput. Grafana Dashboards The project’s GitHub repository includes JSON files for Grafana dashboards that are based on the metrics generated by the exporter. With simple adjustments to fit each setup, it’s possible to import them quickly. Wrapping up On top of the system’s built-in capabilities, Pure also provides options to integrate their system into well-known tools like Prometheus and Grafana, facilitating the process of managing the cluster the same way we manage everything else. I hope this post helps any other team interested in working with them better understand the effort involved. Thanks for reading!
Today, we are releasing Hotwire Spark, a live-reloading system for Rails Applications. Reloading the browser automatically on source changes is a problem that has been well-solved for a long time. Here, we wanted to put an accent on smoothness. If the reload operation is very noticeable, the feedback loop is similar to just reloading the page yourself. But if it’s smooth enough—if you only perceive the intended change—the feedback loop becomes terrific. To use, just install the gem in development: group :development do gem "hotwire-spark" end It will update the current page on three types of change: HTML content, CSS, and Stimulus controllers. How do we achieve that desired smoothness with each? For HTML content, it morphs the <body> of the page into the new <body>. Also, it disconnects and reconnects all the Stimulus controllers on the page. For CSS, it reloads the changed stylesheet. For Stimulus controllers, it fetches the changed controller, replaces its module in Stimulus, and reconnects all the controllers. We designed Hotwire Spark to shine with the #nobuildapproach we use and recommend. Serving CSS and JS assets as standalone files is ideal when you want to fetch and update only what has changed. There is no need to use bundling or any tooling. Hot Module Replacement for Stimulus controllers without any frontend building tool is pretty cool! 2024 has been a very special year for Rails. We’re thrilled to share Hotwire Spark before the year wraps up. Wishing you all a joyful holiday season and a fantastic start to 2025.
If you have the luxury of starting a new Rails app today, here’s our recommendation: go vanilla. Fight hard before adding Ruby dependencies. Keep that Gemfile that Rails generates as close to the original one as possible. Fight even harder before adding Javascript dependencies. You don’t need React or any other front-end frameworks, nor a JSON API to feed those. Hotwire is a fantastic, pragmatic, and ridiculously productive technology for the front end. Use it. The same goes for mobile apps: use Hotwire Native. With a hybrid approach you can combine the very same web app you have built with a wonderful native experience right where you want it. The productivity compared to a purely native approach is night and day. Embrace and celebrate rendering things on the server. It has become cool again. ERB templates and view helpers will take you as long as you need, and they are a fantastic common ground for designers to collaborate hands-on with the code. #nobuild is the simplest way to go; don’t close this door with your choices. Instead of bundling Javascript, use import maps. Don’t bundle CSS, just use modern standard CSS goodies and serve them all with Propshaft. If you have 100 Javascript files and 100 stylesheets, serve 200 standalone requests multiplexed over HTTP2. You will be delighted. Don’t add Redis to the mix. Use solid_cache for caching, solid_queue for jobs, and solid_cable for Action Cable. They will all work on your beloved relational database and are battle-tested. Test your apps with Minitest. Use fixtures and build a realistic set of those as you cook your app. Make your app a PWA, which is fully supported by Rails 8. This may be more than enough before caring about mobile apps at all. Deploy your app with Kamal. If you want heuristics, your importmap.rb should import Turbo, Stimulus, your app controllers, and little else. Your Gemfile should be almost identical to the one that Rails generates. I know it sounds radical, but going vanilla is a radical stance in this convoluted world of endless choices. This is the Rails 8 stack we have chosen for our new apps at 37signals. We are a tiny crew, so we care a lot about productivity. And we sell products, not stacks, so we care a lot about delighting our users. This is our Omakase stack because it offers the optimal balance for achieving both. Vanilla means your app stays nimble. Fewer dependencies mean fewer future headaches. You get a tight integration out of the box, so you can focus on building things. It also maximizes the odds of having smoother future upgrades. Vanilla requires determination, though, because new dependencies always look shiny and shinier. It’s always clear what you get when you add them, but never what you lose in the long term. It is certainly up to you. Rails is a wonderful big tent. These are our opinions. If it resonates, choose vanilla! Guess what our advice is for architecting your app internals?
We’ve just released Mission Control — Jobs v1.0.0, the dashboard and set of extensions to operate background jobs that we introduced earlier this year. This new version is the result of 92 pull requests, 67 issues and the help of 35 different contributors. It includes many bugfixes and improvements, such as: Support for Solid Queue’s recurring tasks, including running them on-demand. Support for API-only apps. Allowing immediate dispatching of scheduled and blocked jobs. Backtrace cleaning for failed jobs’ backtraces. A safer default for authentication, with Basic HTTP authentication enabled and initially closed unless configured or explicitly disabled. Recurring tasks in Mission Control — Jobs, with a subset of the tasks we run in production We use Mission Control — Jobs daily to manage jobs HEY and Basecamp 4, with both Solid Queue and Resque, and it’s the dashboard we recommend if you’re using Solid Queue for your jobs. Our plan is to upstream some of the extensions we’ve made to Active Job and continue improving it until it’s ready to be included by default in Rails together with Solid Queue. If you want to help us with that, are interested in learning more or have any issues or questions, head over to the repo in GitHub. We hope you like it!
More in programming
Discover how The Epic Programming Principles can transform your web development decision-making, boost your career, and help you build better software.
In a fit of frustration, I wrote the first version of Kamal in six weeks at the start of 2023. Our plan to get out of the cloud was getting bogged down in enterprisey pricing and Kubernetes complexity. And I refused to accept that running our own hardware had to be that expensive or that convoluted. So I got busy building a cheap and simple alternative. Now, just two years later, Kamal is deploying every single application in our entire heritage fleet, and everything in active development. Finalizing a perfectly uniform mode of deployment for every web app we've built over the past two decades and still maintain. See, we have this obsession at 37signals: That the modern build-boost-discard cycle of internet applications is a scourge. That users ought to be able to trust that when they adopt a system like Basecamp or HEY, they don't have to fear eviction from the next executive re-org. We call this obsession Until The End Of The Internet. That obsession isn't free, but it's worth it. It means we're still operating the very first version of Basecamp for thousands of paying customers. That's the OG code base from 2003! Which hasn't seen any updates since 2010, beyond security patches, bug fixes, and performance improvements. But we're still operating it, and, along with every other app in our heritage collection, deploying it with Kamal. That just makes me smile, knowing that we have customers who adopted Basecamp in 2004, and are still able to use the same system some twenty years later. In the meantime, we've relaunched and dramatically improved Basecamp many times since. But for customers happy with what they have, there's no forced migration to the latest version. I very much had all of this in mind when designing Kamal. That's one of the reasons I really love Docker. It allows you to encapsulate an entire system, with all of its dependencies, and run it until the end of time. Kind of how modern gaming emulators can run the original ROM of Pac-Man or Pong to perfection and eternity. Kamal seeks to be but a simple wrapper and workflow around this wondrous simplicity. Complexity is but a bridge — and a fragile one at that. To build something durable, you have to make it simple.
Denmark has been reaping lots of delayed accolades from its relatively strict immigration policy lately. The Swedes and the Germans in particular are now eager to take inspiration from The Danish Model, given their predicaments. The very same countries that until recently condemned the lack of open-arms/open-border policies they would champion as Moral Superpowers. But even in Denmark, thirty years after the public opposition to mass immigration started getting real political representation, the consequences of culturally-incompatible descendants from MENAPT continue to stress the high-trust societal model. Here are just three major cases that's been covered in the Danish media in 2025 alone: Danish public schools are increasingly struggling with violence and threats against students and teachers, primarily from descendants of MENAPT immigrants. In schools with 30% or more immigrants, violence is twice as prevalent. This is causing a flight to private schools from parents who can afford it (including some Syrians!). Some teachers are quitting the profession as a result, saying "the Quran run the class room". Danish women are increasingly feeling unsafe in the nightlife. The mayor of the country's third largest city, Odense, says he knows why: "It's groups of young men with an immigrant background that's causing it. We might as well be honest about that." But unfortunately, the only suggestion he had to deal with the problem was that "when [the women] meet these groups... they should take a big detour around them". A soccer club from the infamous ghetto area of Vollsmose got national attention because every other team in their league refused to play them. Due to the team's long history of violent assaults and death threats against opposing teams and referees. Bizarrely leading to the situation were the team got to the top of its division because they'd "win" every forfeited match. Problems of this sort have existed in Denmark for well over thirty years. So in a way, none of this should be surprising. But it actually is. Because it shows that long-term assimilation just isn't happening at a scale to tackle these problems. In fact, data shows the opposite: Descendants of MENAPT immigrants are more likely to be violent and troublesome than their parents. That's an explosive point because it blows up the thesis that time will solve these problems. Showing instead that it actually just makes it worse. And then what? This is particularly pertinent in the analysis of Sweden. After the "far right" party of the Swedish Democrats got into government, the new immigrant arrivals have plummeted. But unfortunately, the net share of immigrants is still increasing, in part because of family reunifications, and thus the problems continue. Meaning even if European countries "close the borders", they're still condemned to deal with the damning effects of maladjusted MENAPT immigrant descendants for decades to come. If the intervention stops there. There are no easy answers here. Obviously, if you're in a hole, you should stop digging. And Sweden has done just that. But just because you aren't compounding the problem doesn't mean you've found a way out. Denmark proves to be both a positive example of minimizing the digging while also a cautionary tale that the hole is still there.
One rabbit hole I can never resist going down is finding the original creator of a piece of art. This sounds simple, but it’s often quite difficult. The Internet is a maze of social media accounts that only exist to repost other people’s art, usually with minimal or non-existent attribution. A popular image spawns a thousand copies, each a little further from the original. Signatures get cropped, creators’ names vanish, and we’re left with meaningless phrases like “no copyright intended”, as if that magically absolves someone of artistic theft. Why do I do this? I’ve always been a bit obsessive, a bit completionist. I’ve worked in cultural heritage for eight years, which has made me more aware of copyright and more curious about provenance. And it’s satisfying to know I’ve found the original source, that I can’t dig any further. This takes time. It’s digital detective work, using tools like Google Lens and TinEye, and it’s not always easy or possible. Sometimes the original pops straight to the top, but other times it takes a lot of digging to find the source of an image. So many of us have become accustomed to art as an endless, anonymous stream of “content”. A beautiful image appears in our feed, we give it a quick heart, and scroll on, with no thought for the human who sweated blood and tears to create it. That original artist feels distant, disconected. Whatever benefit they might get from the “exposure” of your work going viral, they don’t get any if their name has been removed first. I came across two examples recently that remind me it’s not just artists who miss out – it’s everyone who enjoys art. I saw a photo of some traffic lights on Tumblr. I love their misty, nighttime aesthetic, the way the bright colours of the lights cut through the fog, the totality of the surrounding darkness. But there was no name – somebody had just uploaded the image to their Tumblr page, it was reblogged a bunch of times, and then it appeared on my dashboard. Who took it? I used Google Lens to find the original photographer: Lucas Zimmerman. Then I discovered it was part of a series. And there was a sequel. I found interviews. Context. Related work. I found all this cool stuff, but only because I knew Lucas’s name. Traffic Lights, by Lucas Zimmerman. Published on Behance.net under a CC BY‑NC 4.0 license, and reposted here in accordance with that license. The second example was a silent video of somebody making tiny chess pieces, just captioned “wow”. It was clearly an edit of another video, with fast-paced cuts to make it accommodate a short attention span – and again with no attribution. This was a little harder to find – I had to search several frames in Google Lens before I found a summary on a Russian website, which had a link to a YouTube video by metalworker and woodworker Левша (Levsha). This video is four times longer than the cut-up version I found, in higher resolution, and with commentary from the original creator. I don’t speak Russian, but YouTube has auto-translated subtitles. Now I know how this amazing set was made, and I have a much better understanding of the materials and techniques involved. (This includes the delightful name Wenge wood, which I’d never heard before.) https://youtube.com/watch?v=QoKdDK3y-mQ A piece of art is more than just a single image or video. It’s a process, a human story. When art is detached from its context and creator, we lose something fundamental. Creators lose the chance to benefit from their work, and we lose the opportunity to engage with it in a deeper way. We can’t learn how it was made, find their other work, or discover how to make similar art for ourselves. The Internet has done many wonderful things for art, but it’s also a machine for endless copyright infringement. It’s not just about generative AI and content scraping – those are serious issues, but this problem existed long before any of us had heard of ChatGPT. It’s a thousand tiny paper cuts. How many of us have used an image from the Internet because it showed up in a search, without a second thought for its creator? When Google Images says “images may be subject to copyright”, how many of us have really thought about what that means? Next time you want to use an image from the web, look to see if it’s shared under a license that allows reuse, and make sure you include the appropriate attribution – and if not, look for a different image. Finding the original creator is hard, sometimes impossible. The Internet is full of shadows: copies of things that went offline years ago. But when I succeed, it feels worth the effort – both for the original artist and myself. When I read a book or watch a TV show, the credits guide me to the artists, and I can appreciate both them and the rest of their work. I wish the Internet was more like that. I wish the platforms we rely on put more emphasis on credit and attribution, and the people behind art. The next time an image catches your eye, take a moment. Who made this? What does it mean? What’s their story? [If the formatting of this post looks odd in your feed reader, visit the original article]