As the final part of our move out of the cloud, we are working on moving 10 petabytes of data out of AWS Simple Storage Service (S3). After exploring different alternatives, we decided to go with the Pure Storage FlashBlade solution.

We store different kinds of information on S3, from the attachments customers upload to Basecamp to the Prometheus long-term metrics. On top of that, Pure’s system also provides filesystem-based capabilities, enabling other relevant usages, such as database backup storage. This makes the system a top priority for observability.

Although the system has great reliability, out-of-the-box internal alerting, and autonomous ticket creation, it is also good to have our own metrics and alerts to facilitate problem-solving and ensure any disruptions are prioritized and handled. For more context on our current Prometheus setup, see how we use Prometheus at 37signals.

Pure OpenMetrics exporter

Pure maintains two OpenMetrics exporters, pure-fb-openmetrics-exporter and pure-fa-openmetrics-exporter. Since we use Pure FlashBlade (fb), this post covers pure-fb-openmetrics-exporter, although overall usage should be similar. The setup is straightforward: it only requires installing the binary and configuring basic authentication. Here is a snippet of our Chef recipe that installs it:

```ruby
pure_api_token = "token" # If you use Chef, your token should come from an encrypted data bag. Hardcoded here to simplify
PURE_EXPORTER_VERSION = "1.0.13".freeze # Generally, we use Chef node metadata for version management. Hardcoded here to simplify

directory "/opt/pure_exporter/#{PURE_EXPORTER_VERSION}" do
  recursive true
  owner 'pure_exporter'
  group 'pure_exporter'
end

# Avoid recreating under /tmp after reboot if target_binary is already there
target_binary = "/opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-fb-openmetrics-exporter"

remote_file "/tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}-linux-amd64.tar.gz" do
  source "https://github.com/PureStorage-OpenConnect/pure-fb-openmetrics-exporter/releases/download/v#{PURE_EXPORTER_VERSION}/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}-linux-amd64.tar.gz"
  not_if { ::File.exist?(target_binary) }
end

archive_file "/tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}-linux-amd64.tar.gz" do
  destination "/tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}"
  action :extract
  not_if { ::File.exist?(target_binary) }
end

execute "copy binary" do
  command "sudo cp /tmp/pure-fb-openmetrics-exporter-v#{PURE_EXPORTER_VERSION}/pure-fb-openmetrics-exporter /opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-exporter"
  creates "/opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-exporter"
  not_if { ::File.exist?(target_binary) }
end

tokens = <<EOF
main:
  address: purestorage-mgmt.mydomain.com
  api_token: #{pure_api_token['token']}
EOF

file "/opt/pure_exporter/tokens.yml" do
  content tokens
  owner 'pure_exporter'
  group 'pure_exporter'
  sensitive true
end

systemd_unit 'pure-exporter.service' do
  content <<-EOU
  # Caution: Chef managed content. This is a file resource from #{cookbook_name}::#{recipe_name}
  #
  [Unit]
  Description=Pure Exporter
  After=network.target

  [Service]
  Restart=on-failure
  PIDFile=/var/run/pure-exporter.pid
  User=pure_exporter
  Group=pure_exporter
  ExecStart=/opt/pure_exporter/#{PURE_EXPORTER_VERSION}/pure-exporter \
    --tokens=/opt/pure_exporter/tokens.yml
  ExecReload=/bin/kill -HUP $MAINPID
  SyslogIdentifier=pure-exporter

  [Install]
  WantedBy=multi-user.target
  EOU

  action [ :create, :enable, :start ]
  notifies :reload, "service[pure-exporter]"
end

service 'pure-exporter'
```

Prometheus Job Configuration

The simplest way of ingesting the metrics is to configure a basic job without any customization:

```yaml
- job_name: pure_exporter
  metrics_path: /metrics
  static_configs:
    - targets: ['<%= @hostname %>:9491']
      labels:
        environment: 'production'
        job: pure_exporter
  params:
    endpoint: [main]  # From the tokens configuration above
```

For a production-ready setup, we are using a slightly different approach. The exporter supports specific metric paths, which allows splitting the Prometheus jobs and reduces the overhead of pulling all the metrics at once:

```yaml
- job_name: pure_exporter_array
  metrics_path: /metrics/array
  static_configs:
    - targets: ['<%= @hostname %>:9491']
      labels:
        environment: 'production'
        job: pure_exporter
  metric_relabel_configs:
    - source_labels: [name]
      target_label: ch
      regex: "([^.]+).*"
      replacement: "$1"
      action: replace
    - source_labels: [name]
      target_label: fb
      regex: "[^.]+\\.([^.]+).*"
      replacement: "$1"
      action: replace
    - source_labels: [name]
      target_label: bay
      regex: "[^.]+\\.[^.]+\\.([^.]+)"
      replacement: "$1"
      action: replace
  params:
    endpoint: [main]  # From the tokens configuration above

- job_name: pure_exporter_clients
  metrics_path: /metrics/clients
  static_configs:
    - targets: ['<%= @hostname %>:9491']
      labels:
        environment: 'production'
        job: pure_exporter
  params:
    endpoint: [main]  # From the tokens configuration above

- job_name: pure_exporter_usage
  metrics_path: /metrics/usage
  static_configs:
    - targets: ['<%= @hostname %>:9491']
      labels:
        environment: 'production'
        job: pure_exporter
  params:
    endpoint: [main]

- job_name: pure_exporter_policies
  metrics_path: /metrics/policies
  static_configs:
    - targets: ['<%= @hostname %>:9491']
      labels:
        environment: 'production'
        job: pure_exporter
  params:
    endpoint: [main]  # From the tokens configuration above
```

We also configure some metric_relabel_configs to extract labels from name using regexes. Those labels help reduce the complexity of queries that aggregate metrics by different components (see the example query just before the Hardware section below). Detailed documentation on the available metrics can be found here.

Alerts

Auto Generated Alerts

As I shared earlier, the system has an internal alerting module that automatically triggers alerts for critical situations and creates tickets. To cover those alerts on the Prometheus side, we added an alerting configuration of our own that relies on the incoming severities:

```yaml
- alert: PureAlert
  annotations:
    summary: '{{ $labels.summary }}'
    description: '{{ $labels.component_type }} - {{ $labels.component_name }} - {{ $labels.action }} - {{ $labels.kburl }}'
    dashboard: 'https://grafana/your-dashboard'
  expr: purefb_alerts_open{environment="production"} == 1
  for: 1m
```

We still need to evaluate how the Pure-generated alerts will interact with the custom alerts I will cover below, and we might decide to stick to one or the other depending on what we find out.
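Before moving on to the custom alerts, here is a quick illustration of what the extracted ch, fb, and bay labels buy us. The query below is a hypothetical example rather than one taken from our dashboards: it counts unhealthy bays per chassis and blade, which is the same kind of aggregation the hardware alerts in the next section rely on.

```promql
# Hypothetical example: count unhealthy bays per chassis (ch) and blade (fb)
count by (ch, fb) (purefb_hardware_health{type="bay", environment="production"} == 0)
```

Without the relabeling, every expression like this would have to extract those components from the name label at query time instead.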
Hardware

Before I continue, the image below helps visualize how some of the Pure FlashBlade components are physically organized.

Because of Pure’s reliability, most isolated hardware failures do not require the immediate attention of an Ops team member. To cover the most basic hardware failures, we configure an alert that sends a message to the Ops Basecamp 4 project chat:

```yaml
- alert: PureHardwareFailed
  annotations:
    summary: Hardware {{ $labels.name }} in chassis {{ $labels.ch }} is failed
    description: 'The Pure Storage hardware {{ $labels.name }} in chassis {{ $labels.ch }} is failed'
    dashboard: 'https://grafana/your-dashboard'
  expr: purefb_hardware_health == 0
  for: 1m
  labels:
    severity: chat-notification
```

We also configure alerts that check for multiple hardware failures of the same type. This doesn’t mean two simultaneous failures will result in a critical state, but it is a fair guardrail for unexpected scenarios. We also expect those situations to be rare, keeping the risk of causing unnecessary noise low.

```yaml
- alert: PureMultipleHardwareFailed
  annotations:
    summary: Pure chassis {{ $labels.ch }} has {{ $value }} failed {{ $labels.type }}
    description: 'The Pure Storage chassis {{ $labels.ch }} has {{ $value }} failed {{ $labels.type }}, close to the healthy limit of two simultaneous failures. Ensure that the hardware failures are being worked on'
    dashboard: 'https://grafana/your-dashboard'
  expr: count(purefb_hardware_health{type!~"eth|mgmt_port|bay"} == 0) by (ch,type,environment) > 1
  for: 1m
  labels:
    severity: page

# We are looking for multiple failed bays in the same blade
- alert: PureMultipleBaysFailed
  annotations:
    summary: Pure chassis {{ $labels.ch }} has fb {{ $labels.fb }} with {{ $value }} failed bays
    description: 'The Pure Storage chassis {{ $labels.ch }} has fb {{ $labels.fb }} with {{ $value }} failed bays, close to the healthy limit of two simultaneous failures. Ensure that the hardware failures are being worked on'
    dashboard: 'https://grafana/your-dashboard'
  expr: count(purefb_hardware_health{type="bay"} == 0) by (ch,type,fb,environment) > 1
  for: 1m
  labels:
    severity: page
```

Finally, we configure high-level alerts for chassis and XFM failures:

```yaml
- alert: PureChassisFailed
  annotations:
    summary: Chassis {{ $labels.name }} is failed
    description: 'The Pure Storage hardware chassis {{ $labels.name }} is failed'
    dashboard: 'https://grafana/your-dashboard'
  expr: purefb_hardware_health{type="ch"} == 0
  for: 1m
  labels:
    severity: page

- alert: PureXFMFailed
  annotations:
    summary: External Fabric Module {{ $labels.name }} is failed
    description: 'The Pure Storage external fabric module {{ $labels.name }} is failed'
    dashboard: 'https://grafana/your-dashboard'
  expr: purefb_hardware_health{type="xfm"} == 0
  for: 1m
  labels:
    severity: page
```

Latency

Using the metric purefb_array_performance_latency_usec we can set a threshold for all the different protocols and dimensions (read, write, etc.), so we are alerted if any problem causes the latency to go above an expected level.
```yaml
- alert: PureLatencyHigh
  annotations:
    summary: Pure {{ $labels.dimension }} - {{ $labels.protocol }} latency high
    description: 'Pure {{ $labels.protocol }} latency for dimension {{ $labels.dimension }} is above 100ms'
    dashboard: 'https://grafana/your-dashboard'
  expr: (avg_over_time(purefb_array_performance_latency_usec{protocol="all"}[30m]) * 0.001) > 100
  for: 1m
  labels:
    severity: chat-notification
```

Saturation

For saturation, we are primarily worried about something unexpected causing excessive use of array space, increasing the risk of hitting the cluster capacity. With that in mind, it’s good to have a simple alert in place, even if we don’t expect it to fire anytime soon:

```yaml
- alert: PureArraySpace
  annotations:
    summary: Pure Cluster {{ $labels.instance }} available space is expected to be below 10%
    description: 'The array space for pure cluster {{ $labels.instance }} is expected to be below 10% in a month, please investigate and ensure there is no risk of running out of capacity'
    dashboard: 'https://grafana/your-dashboard'
  expr: (predict_linear(purefb_array_space_bytes{space="empty",type="array"}[30d], 730 * 3600)) < (purefb_array_space_bytes{space="capacity",type="array"} * 0.10)
  for: 1m
  labels:
    severity: chat-notification
```

HTTP

We use BigIP load balancers to front the cluster, which means that all the alerts we already had in place for the BigIP HTTP profiles, virtual servers, and pools also cover access to Pure. The solution will differ for each organization, but it is a good practice to keep an eye on HTTP status codes and throughput.

Grafana Dashboards

The project’s GitHub repository includes JSON files for Grafana dashboards based on the metrics generated by the exporter. With simple adjustments to fit each setup, they can be imported quickly.

Wrapping up

On top of the system’s built-in capabilities, Pure also provides options to integrate with well-known tools like Prometheus and Grafana, facilitating the process of managing the cluster the same way we manage everything else. I hope this post helps any other team interested in working with them better understand the effort involved. Thanks for reading!
Today, we are releasing Hotwire Spark, a live-reloading system for Rails applications.

Reloading the browser automatically on source changes is a problem that has been well solved for a long time. Here, we wanted to put an accent on smoothness. If the reload operation is very noticeable, the feedback loop is similar to just reloading the page yourself. But if it’s smooth enough—if you only perceive the intended change—the feedback loop becomes terrific.

To use it, just install the gem in development:

```ruby
group :development do
  gem "hotwire-spark"
end
```

It will update the current page on three types of change: HTML content, CSS, and Stimulus controllers. How do we achieve the desired smoothness with each?

For HTML content, it morphs the `<body>` of the page into the new `<body>`. It also disconnects and reconnects all the Stimulus controllers on the page. For CSS, it reloads the changed stylesheet. For Stimulus controllers, it fetches the changed controller, replaces its module in Stimulus, and reconnects all the controllers.

We designed Hotwire Spark to shine with the #nobuild approach we use and recommend. Serving CSS and JS assets as standalone files is ideal when you want to fetch and update only what has changed. There is no need to use bundling or any other tooling. Hot Module Replacement for Stimulus controllers without any frontend build tool is pretty cool!

2024 has been a very special year for Rails. We’re thrilled to share Hotwire Spark before the year wraps up. Wishing you all a joyful holiday season and a fantastic start to 2025.
If you have the luxury of starting a new Rails app today, here’s our recommendation: go vanilla.

Fight hard before adding Ruby dependencies. Keep the Gemfile that Rails generates as close to the original one as possible. Fight even harder before adding JavaScript dependencies. You don’t need React or any other front-end framework, nor a JSON API to feed those. Hotwire is a fantastic, pragmatic, and ridiculously productive technology for the front end. Use it.

The same goes for mobile apps: use Hotwire Native. With a hybrid approach you can combine the very same web app you have built with a wonderful native experience right where you want it. The productivity compared to a purely native approach is night and day.

Embrace and celebrate rendering things on the server. It has become cool again. ERB templates and view helpers will take you as far as you need, and they are a fantastic common ground for designers to collaborate hands-on with the code.

#nobuild is the simplest way to go; don’t close this door with your choices. Instead of bundling JavaScript, use import maps. Don’t bundle CSS; just use modern standard CSS goodies and serve them all with Propshaft. If you have 100 JavaScript files and 100 stylesheets, serve 200 standalone requests multiplexed over HTTP/2. You will be delighted.

Don’t add Redis to the mix. Use solid_cache for caching, solid_queue for jobs, and solid_cable for Action Cable. They will all work on your beloved relational database and are battle-tested.

Test your apps with Minitest. Use fixtures and build a realistic set of those as you cook your app.

Make your app a PWA, which is fully supported by Rails 8. This may be more than enough before caring about mobile apps at all.

Deploy your app with Kamal.

If you want heuristics, your importmap.rb should import Turbo, Stimulus, your app controllers, and little else (a minimal sketch is included at the end of this post). Your Gemfile should be almost identical to the one that Rails generates.

I know it sounds radical, but going vanilla is a radical stance in this convoluted world of endless choices. This is the Rails 8 stack we have chosen for our new apps at 37signals. We are a tiny crew, so we care a lot about productivity. And we sell products, not stacks, so we care a lot about delighting our users. This is our Omakase stack because it offers the optimal balance for achieving both.

Vanilla means your app stays nimble. Fewer dependencies mean fewer future headaches. You get a tight integration out of the box, so you can focus on building things. It also maximizes the odds of having smoother future upgrades.

Vanilla requires determination, though, because new dependencies always look shiny. It’s always clear what you get when you add them, but never what you lose in the long term.

It is certainly up to you. Rails is a wonderful big tent. These are our opinions. If it resonates, choose vanilla!

Guess what our advice is for architecting your app internals?
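As a concrete reference for the importmap heuristic mentioned above, here is a minimal sketch of what such an importmap.rb can look like. The pins mirror the defaults that importmap-rails generates for a fresh app; treat the exact names and paths as assumptions to check against your own setup.

```ruby
# config/importmap.rb — a minimal, vanilla-ish import map (sketch, not prescriptive)
pin "application"
pin "@hotwired/turbo-rails", to: "turbo.min.js"
pin "@hotwired/stimulus", to: "stimulus.min.js"
pin "@hotwired/stimulus-loading", to: "stimulus-loading.js"

# Your app's Stimulus controllers, served as standalone files with no bundling
pin_all_from "app/javascript/controllers", under: "controllers"
```

If your import map grows much beyond a handful of extra pins, that is a hint the app is drifting away from the vanilla stance.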
We’ve just released Mission Control — Jobs v1.0.0, the dashboard and set of extensions to operate background jobs that we introduced earlier this year. This new version is the result of 92 pull requests, 67 issues, and the help of 35 different contributors.

It includes many bugfixes and improvements, such as:

- Support for Solid Queue’s recurring tasks, including running them on-demand.
- Support for API-only apps.
- Allowing immediate dispatching of scheduled and blocked jobs.
- Backtrace cleaning for failed jobs’ backtraces.
- A safer default for authentication, with Basic HTTP authentication enabled and initially closed unless configured or explicitly disabled.

[Image: Recurring tasks in Mission Control — Jobs, with a subset of the tasks we run in production.]

We use Mission Control — Jobs daily to manage jobs in HEY and Basecamp 4, with both Solid Queue and Resque, and it’s the dashboard we recommend if you’re using Solid Queue for your jobs. Our plan is to upstream some of the extensions we’ve made to Active Job and continue improving it until it’s ready to be included by default in Rails together with Solid Queue.

If you want to help us with that, are interested in learning more, or have any issues or questions, head over to the repo on GitHub. We hope you like it!
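If you haven’t tried the dashboard yet, getting it into an app boils down to adding the gem and mounting the engine. A minimal sketch, assuming the mount point shown in the project’s README (the "/jobs" path is our assumption here; check the repo for current instructions and the new authentication defaults):

```ruby
# Gemfile
gem "mission_control-jobs"

# config/routes.rb
Rails.application.routes.draw do
  # Mounts the Mission Control — Jobs dashboard at /jobs
  mount MissionControl::Jobs::Engine, at: "/jobs"
end
```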
More in programming
In the past few years, social media use has gained a bad reputation. More or less everyone is now aware that TikTok is ruining your attention span and Twitter is radicalizing you into extreme ideologies. But despite its enormous popularity amongst technology enthusiasts, there’s not a lot of attention given to Discord.

I personally have been using Discord so much for so long that the majority of my social circle is made of people I met through the platform. I even spent two years of my life helping run the infrastructure behind the most popular bot available on Discord. In this article, I will try to give my perspective on Discord, why I think it is harmful, and what we can do about it.

A tale of two book clubs

To explain my point of view about Discord, I will compare the experience of joining a real-life book club with joining one that communicates exclusively through Discord. This example is about books, but the same issues would apply if it was a community talking about investing, knitting, or collecting stamps. As Marshall McLuhan showed last century, examining media should be done independently of their content.

In the first scenario, we have Bob. Bob enjoys reading books, which is generally a solitary hobby. To break this solitude, Bob decides to join a book club. This book club meets twice a month in a library, where they talk about a new book each time.

In the second scenario, we have Alice. Alice also likes books. Alice also wants to meet fellow book lovers. Being a nerd, Alice decides to join a Discord server. This server does not have fixed meeting times. Most users simply use the text channels to talk about what they are reading at any time during the day.

Crumbs of Belongingness

In Bob’s book club, a session typically lasts an hour. First, the librarian takes some time to welcome everyone and introduce newcomers. After that, each club member talks about the book they were expected to read. They can talk about what they liked and disliked, how the book made them feel, and the chapters they found particularly noteworthy. Once each member has had the time to talk about the book, they vote on the book they are going to read for the next date. After the session is concluded, some members move to the nearest coffeehouse to keep talking. During this session of one hour, Bob spent around one hour socializing. The need for belongingness that drove Bob to join this book club is fully met.

On Alice’s side, the server is running 24/7. When she opens the app, even if there are sometimes more than 4,000 members of her virtual book club online, most of the time nobody is talking. If she were to spend an entire hour staring at the server, she might witness a dozen or so messages. Those messages may be part of small conversations in which Alice can take part. Sadly, most of the time they will be simple uploads of memes, conversations about books she hasn’t read, or messages that do not convey enough meaning to start a conversation. In one hour of constant Discord use, Alice’s need for socializing has not been met.

The shop is closed

Even if Bob’s library is open every day, the book club is only open for a total of two hours a month. It is enough for Bob. Since the book club fulfills his need, he doesn’t want it to be around for longer. He has not even entertained the thought of joining a second book club, because too many meetings would be overwhelming.

For Alice, Discord is always available. No matter if she is at home or away, it is always somewhere in her phone or taskbar. At any moment of the day, she might notice a red circle above the icon. It tells her there are unread messages on Discord. When she notices it, she instinctively stops her current task and opens the app to spend a few minutes checking her messages. Most of the time those messages do not lead to a meaningful conversation. Reading a few messages isn’t enough to meet her need for socialization. So, after having scrolled through the messages, she goes back to waiting for the next notification.

Each time she interrupts her current task to check Discord, getting back into the flow can take several minutes or not happen at all. This can easily happen dozens of times a day and cost Alice hundreds of hours each month.

Book hopping

When Bob gets home, the club only requires him to read the next book. He may also choose to read two books at the same time, one for the book club and one from his personal backlog. But if he were to keep his efforts to a strict minimum, he would still have things to talk about in the next session.

Alice wants to be able to talk with other users about the books they are reading. So she starts reading the books that are trending and get mentioned often. The issue is that Discord’s conversations are instantaneous, and instantaneity compresses time. A book isn’t going to stay popular and relevant for two whole weeks; if it manages to be the thing people talk about for two whole days, that’s already great. Alice might try to purchase and read two to three books a week to keep up with the server’s rhythm. Even if books are not terribly expensive, this can turn a $20/month hobby into a $200/month hobby. In addition, if reading a book takes Alice on average 10 hours, reading three books a week would be like adding a part-time job to her schedule. All this while being constantly interrupted by the need to check if new conversations have been posted to the server.

Quitting Discord

If you are in Alice’s situation, the solution is quite simple: use Discord less, ideally not at all. On my side, I’ve left every server that is not relevant to my current work. I blocked discord.com from the DNS of my coding computer (using NextDNS) and uninstalled the app from my phone. This makes the platform only usable as a direct messaging app, exclusively from my gaming device, which I cannot carry with me.

I think many people realize the addictive nature of Discord, yet keep using the application all the time. One common objection to quitting the platform is that there’s a need for an alternative: maybe we should go back to forums, or IRC, or use Matrix, etc. I don’t think any alternative internet chat platform can solve the problem. The real problem is that we want to be able to talk to people without leaving home, at any time, without any inconvenience. But what we should do is exactly that: leave home and join a real book club, one that is not open 24/7, and one where the members take the time to listen to each other.

In the software community, we have also been convinced that every one of our projects needs to be on Discord. Every game needs a server, open-source projects offer support on Discord, and a bunch of AI startups even use it as their main user interface. I even made a server for Dice’n Goblins. I don’t think it’s really that useful. I’m not even sure it’s that convenient. Popular games are not popular because they have big servers; they have big servers because they are popular. Successful open-source projects often don’t even have a server.
I’ve been reading Apple in China by Patrick McGee. There’s this part in there where he’s talking about a guy who worked for Apple and was known for being ruthless, stopping at nothing to negotiate the best deal for Apple. He was so aggressive yet convincing that suppliers often found themselves faced with regret, wondering how they got talked into a deal that in hindsight was not in their best interest.[1]

One particular Apple executive sourced in the book noted how there are companies who don’t employ questionable tactics to gain an edge, but most of them don’t exist anymore. To paraphrase: “I worked with two kinds of suppliers at Apple: 1) complete assholes, and 2) those who are no longer in business.”

Taking advantage of people is normalized in business on account of it being existential, i.e. “If we don’t act like assholes — or have someone on our team who will on our behalf[1] — we will not survive!” In other words: All’s fair in self-defense.

But what’s the point of survival if you become an asshole in the process? What else is there in life if not what you become in the process? It’s almost comedically twisted how easy it is for us to become the very thing we abhor if it means our survival.

(Note to self: before you start anything, ask “What will this help me become, and is that who I want to be?”)

[1] It’s interesting how we can smile at stories like that and think, “Gosh they’re tenacious, glad they’re on my side!” Not stopping to think for a moment what it would feel like to be on the other side of that equation.