I’ve written about abusing USB storage devices in the past, with a passing mention that I’m too cheap to buy an IODD device. Then I bought one. I’ve always liked the promise of tools like Ventoy: you only need to carry the one storage device that boots anything you want. Unfortunately I still can’t trust Ventoy, so I’m forced to look elsewhere.

The hardware

I decided to get the IODD ST400 for 122 EUR (about 124 USD) off of Amazon Germany, since it was for some reason cheaper than getting it from iodd.shop directly. SATA SSDs are cheap and plentiful, so the ST400 made the most sense to me. The device came with one USB cable, with type A and type C ends. The device itself has a USB type C port, which I like a lot. The buttons are functional and clicky, but incredibly loud.

Setting it up

Before you get started with this device, I highly recommend glancing over the official documentation. The text is poorly translated in some parts, but overall it gets the job done. Inserting the SSD was...
6 days ago


More from ./techtipsy

Turns out that I'm a 'prolific open-source influencer' now

Yes, you read that right. I’m a prolific open-source influencer now. Some years ago I set up a Google Alert with my name, for fun. Who knows what it might show one day? On the 7th of February, it fired an alert. Turns out that my thoughts on Ubuntu were somewhat popular, and they ended up being ingested by an AI slop generator over at Fudzilla, with no links back to the source or anything.1 Not only that, but their spicy autocomplete confabulation bot (a large language model) completely butchered the article, leaving out critical information, which led to one reader gloating about Windows. Not linking back to the original source? Not a good start. Misrepresenting my work? Insulting. Giving a Windows user the opportunity to boast about how happy they are with using it? Absolutely unacceptable.

1. Here’s the full article in case they ever delete their poor excuse of a “news” “article”. Two can play at that game. ↩︎

3 days ago 3 votes
Feature toggles: just roll your own!

When you’re dealing with a particularly large service with a slow deployment pipeline (15-30 minutes) and a rollback delay of up to 10 minutes, you’re going to need feature toggles (some also call them feature flags) to turn those half-an-hour nerve-racking major incidents into a small whoopsie-daisy that you can fix in a few seconds. Make a change, gate it behind a feature toggle, release, enable the feature toggle and monitor the impact. If there is an issue, you can immediately roll it back with one HTTP request (or database query1). If everything looks good, you can remove the usage of the feature toggle from your code and move on with other work.

Need to roll out the new feature gradually? Implement the feature toggle as a percentage and increase it as you go.

It’s really that simple, and you don’t have to pay 500 USD a month to get similar functionality from a service provider and make critical paths in your application depend on them.2 As my teammate once said, our service is perfectly capable of breaking down on its own.

All you really need is one database table containing the keys and values for the feature toggles, and two HTTP endpoints: one to GET the current value of a feature toggle, and one to POST a new value for an existing one. New feature toggles will be introduced using tools like Flyway or Liquibase, and the same method can also be used for deleting them later on. You can also add convenience columns containing timestamps, such as created and modified, to track when these were introduced and when the last change was made.
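Here’s a minimal sketch of what that can look like, assuming Python with Flask and SQLite purely for illustration; the table layout, endpoint paths and the gradual rollout helper are my own placeholder names, not anything from a specific production system:

    # A minimal DIY feature toggle service: one table, two endpoints.
    # Sketch only; assumes Flask and SQLite, and all names are illustrative.
    import sqlite3
    import zlib

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    DB_PATH = "toggles.db"  # hypothetical; use your main database in practice

    def db() -> sqlite3.Connection:
        conn = sqlite3.connect(DB_PATH)
        conn.row_factory = sqlite3.Row
        return conn

    # In a real service this table would come from a Flyway/Liquibase migration.
    with db() as conn:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS feature_toggle (
                key      TEXT PRIMARY KEY,
                value    TEXT NOT NULL,
                created  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                modified TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
        """)

    @app.get("/toggles/<key>")
    def get_toggle(key):
        # GET the current value of a feature toggle.
        row = db().execute(
            "SELECT value FROM feature_toggle WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return jsonify(error="unknown toggle"), 404
        return jsonify(key=key, value=row["value"])

    @app.post("/toggles/<key>")
    def set_toggle(key):
        # POST a new value for an existing toggle; new toggles are introduced
        # via migrations, not via this endpoint.
        value = request.get_json()["value"]
        with db() as conn:
            updated = conn.execute(
                "UPDATE feature_toggle SET value = ?, modified = CURRENT_TIMESTAMP "
                "WHERE key = ?",
                (value, key),
            ).rowcount
        if updated == 0:
            return jsonify(error="unknown toggle"), 404
        return jsonify(key=key, value=value)

    def rollout_enabled(percentage_value: str, user_id: str) -> bool:
        # Gradual rollout: hash the user into a stable 0-99 bucket and compare
        # it to a toggle value like "25" (meaning 25% of users).
        bucket = zlib.crc32(user_id.encode()) % 100
        return bucket < int(percentage_value)

    if __name__ == "__main__":
        app.run()

In a real deployment the table would live in your main database and the POST endpoint would sit behind authentication, but the moving parts really are this few.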
However, there are a few considerations to take into account when setting up such a system. Feature toggles implemented as database table rows can work fantastically, but you should also monitor how often they get used. If you implement a feature toggle on a hot path in your service, then you can easily generate thousands of queries per second. A properly set up feature toggle system can sustain that without any issues on any competent database engine, but you should still monitor the impact and remove unused feature toggles as soon as possible.

For hot code paths (1000+ requests/second) you might be better off implementing feature toggles as application properties. There’s no call to the database and reading a static property is darn fast, but you lose out on the ability to update it while the application is running. Alternatively, you can rely on the same database-based feature toggle system and keep a cached copy in-memory, refreshing it from time to time. Toggling won’t be as responsive, as it will depend on the cache expiry time, but the reduced load on the database is often worth it.
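A sketch of that cached variant, again in Python with illustrative names; the refresh interval is the responsiveness trade-off mentioned above:

    # Read feature toggles through a small in-memory cache with a TTL, so hot
    # paths hit the database at most once per refresh window. Sketch only;
    # the table layout matches the earlier example and names are illustrative.
    import sqlite3
    import time

    class CachedToggles:
        def __init__(self, db_path: str, ttl_seconds: float = 30.0):
            self.db_path = db_path
            self.ttl = ttl_seconds
            self._cache: dict[str, str] = {}
            self._loaded_at = 0.0

        def _refresh_if_stale(self) -> None:
            if time.monotonic() - self._loaded_at < self.ttl:
                return
            conn = sqlite3.connect(self.db_path)
            try:
                rows = conn.execute(
                    "SELECT key, value FROM feature_toggle"
                ).fetchall()
            finally:
                conn.close()
            self._cache = dict(rows)
            self._loaded_at = time.monotonic()

        def is_enabled(self, key: str) -> bool:
            self._refresh_if_stale()
            return self._cache.get(key, "false") == "true"

    # Usage: toggles = CachedToggles("toggles.db"); toggles.is_enabled("new-checkout")

Toggling a value now takes effect within ttl_seconds rather than immediately, which is usually an acceptable price for dropping thousands of queries per second down to one query per refresh window.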
If your service receives contributions from multiple teams, or you have very anxious product managers who fill your backlog faster than you can say “story points”, then it’s a good idea to also introduce expiration dates for your feature toggles, with ample warning time to properly remove them. Using this method, you can make sure that old feature toggles get properly removed, as there is no better prioritization reason than a looming major incident. You don’t want them to stick around for years on end; that’s just wasteful and clutters up your codebase.

If your feature toggling needs are a bit more complicated, then you may need to invest more time in your DIY solution, or you can use one of the SaaS options if you really want to; just account for the added expense and the reliance on yet another third party service.

At work, I help manage a business-critical monolith that handles thousands of requests per second during peak hours, and the simple approach has served us very well. All it took was one motivated developer and about a day to implement, document and communicate the solution to our stakeholders. Skip the latter two steps, and you can be done within two hours, tops.

1. Letting inexperienced developers touch the production database is a fantastic way to take down your service, and a very expensive way to learn about database locks. ↩︎

2. I hate to refer to specific Hacker News comments like this, but there’s just something about paying 6000 USD a year for such a service that I just can’t understand. Has the Silicon Valley mindset gone too far? Or are US-based developers just way too expensive, resulting in these types of services sounding reasonable? You can hire a senior developer in Estonia for that amount of money for 2-3 weeks (including all taxes), and they can pop in and implement a feature toggle system in a few hours at most. The response comment with the status page link highlighting multiple outages for LaunchDarkly is the cherry on top. ↩︎

a week ago 10 votes
I'm done with Ubuntu

I liked Ubuntu. For a very long time, it was the sensible default option. Around 2016, I used the Ubuntu GNOME flavor, and after they ditched the Unity desktop environment, GNOME became the default option. I was really happy with it, both for work and personal computing needs. Estonian ID card software was also officially supported on Ubuntu, which made Ubuntu a good choice for family members. But then something changed.

Upgrades suck

Like many Ubuntu users, I stuck to the long-term support releases and upgraded every two years to the next major version. There was just one tiny little issue: every upgrade broke something. Usually it was a relatively minor issue, with some icons, fonts or themes being a bit funny. Sometimes things went completely wrong.

The worst upgrade was the one I did on my mother’s laptop. During the upgrade process from Ubuntu 20.04 to 22.04, everything blew up spectacularly. The UI froze and the machine was completely unresponsive. After a 30-minute wait and a forced restart, the installation was absolutely fucked. In frustration, I ended up installing Windows so that I wouldn’t have to support Ubuntu.

Another family member, another upgrade. This one they did themselves, from Lubuntu 18.04 to the latest version. The result: Firefox shortcuts stopped working, the status bar contained duplicate icons, and random errors popped up after logging in. After making sure that ID card software works on Fedora 40, I installed that instead. All they need is a working browser, and that’s too difficult for Ubuntu to handle.

Snaps ruined Ubuntu

Snaps. I hate them. They sound great in theory, but the poor implementation and heavy-handed push by Canonical have been a mess. Snaps auto-update by default. Great for security1, but horrible for users who want to control what their personal computer is doing. Snaps get forced upon users as more and more system components are forcibly switched from Debian-based packages to Snaps, which breaks compatibility and functionality and introduces a lot of new issues. You can upgrade your Ubuntu installation and then discover that your browser is now contained within a Snap, the desktop shortcut for it doesn’t work, and your government ID card no longer works for logging in to your bank.

Snaps also destroy productivity. A colleague was struggling to get any work done because the desktop environment on their Ubuntu installation was flashing certain UI elements, being unresponsive and blocking them from doing any work. Apparently the whole GNOME desktop environment is a Snap now, and that led to issues. The fix was super easy, barely an inconvenience:

- roll back to the previous version of the GNOME Snap
- restart
- still broken
- update to the latest version again
- restart
- still broken
- restart again
- it is fixed now

What was the issue? Absolutely no clue, but a day’s worth of developer productivity was completely wasted. Some of these issues have probably been fixed by now, but if I executed migration projects at my day job with a similar track record, I would be fired.2

Snaps done right: Flatpak

Snaps can be implemented in a way that doesn’t suck for end users. It’s called Flatpak. Flatpaks work reasonably well, you can update them whenever you want, and they are optional. Your Firefox installation won’t suddenly turn into a Flatpak overnight. On the Steam Deck, Flatpaks are the main distribution method for user-installed apps and I don’t mind it at all.
The only issue is the software selection: not every app is available as a Flatpak just yet.

Consider Fedora

Fedora works fine. It’s not perfect, but I like it. At this point I’ve used it for longer than Ubuntu, and unless IBM ruins it for all of us, I think it will be a perfectly cromulent distro to get work done on. Hopefully it’s not too late for Canonical to reconsider their approach to building a Linux distro.

1. The xz backdoor demonstrated that getting the latest versions of all software can also be problematic from the security angle. ↩︎

2. Technical failures themselves are not the issue, but not responding to users’ feedback and not testing things certainly is, especially if you keep repeatedly making the same mistake. ↩︎

2 weeks ago 16 votes
Why my blog was down for over 24 hours in November 2024

In November 2024, my blog was down for over 24 hours. Here’s what I learned from this absolute clusterfuck of an incident.

Lead-up to the incident

I was browsing through photos on my Nextcloud instance. Everything was fine, until Nextcloud started generating preview images for older photos. This process is quite resource intensive, but generally manageable. However, this time the images were high quality photos in the 10-20 MB size range. Nextcloud crunched through those, but spawned so many processes that it ended up using all the available memory on my home server. And thus, the server was down.

This could have been solved by a forced reboot. Things were complicated by the simple fact that I was 120 kilometers away from my server, and I had no IPMI-like device set up. So I waited. 50 minutes later, I successfully logged in to my server over SSH again! The load averages were in the three-digit realm, but the system was mostly operational. I thought that it would be a good idea to restart the server, since who knows what might’ve gone wrong while the server was handling the out-of-memory situation. I reboot. The server doesn’t seem to come back up. Fuck.

The downtime

The worst part of the downtime was that I was simply unable to immediately fix it due to being 120 kilometers away from the server. My VPN connection back home was also hosted right there on the server, using this Docker image. I eventually got around to fixing this issue the next day when I could finally get hands-on with the server, my trusty ThinkPad T430.

I open the lid and am greeted with the console login screen. This means that the machine did boot. I log in to the server over SSH and quickly open htop. My htop configuration shows metrics like systemd state, and it was showing 20+ failed services. This is very unusual. lsblk and mount show that the storage is there. What was the issue? Well, apparently the Docker daemon was not starting. I searched for the error messages and ended up on this GitHub issue. I tried the fix, which involved deleting the Docker folder with all the containers and configuration, and restarted the daemon and containers. Everything is operational once again. I then rebooted the server. Everything is down again, with the same issue.

And thus began an 8+ hour troubleshooting session that ran late into the night. 04:00-ish late, on a Monday. I tried everything that I could come up with:

- used the btrfs Docker storage driver instead of the default overlay one (Docker was still broken after a reboot)
- replaced everything with podman (I could not get podman to play well with my containers and IPv6 networking)
- considered switching careers (tractors are surprisingly expensive!)

I’m unable to put into words how frustrating this troubleshooting session was. The sleep deprivation, the lack of helpful information, the failed attempts at finding solutions. I’m usually quite calm and very rarely feel anger, but during these hours I felt enraged.

The root cause

The root cause will make more sense after you understand the storage setup I had at the time. The storage on my server consisted of four 4 TB SSDs: two were mounted inside the laptop, and the remaining two were connected via USB-SATA adapters. The filesystem in use was btrfs, both on the OS drive and the 4x 4 TB storage pool. To avoid hitting the OS boot drive with unnecessary writes, I moved the Docker data root to a separate btrfs subvolume on the main storage pool. What was the issue?
Apparently the Docker daemon on Fedora Server is able to start up before every filesystem is mounted. In this case, the Docker daemon started up before the subvolume containing all the Docker images, containers and networks was mounted. I tested this theory by moving the Docker storage back to /var/lib/docker, which lives on the root filesystem, and after a reboot everything remained functional.

In the past, I ran a similar setup, but with the Docker storage on the SATA SSDs that are mounted inside the laptop over a native SATA connection. With the addition of the two USB-connected SSDs, the mounting process took longer for the whole pool, which resulted in a race condition between the Docker daemon startup and the storage being mounted.

Fixing the root cause

The fix for Docker starting up before all of your storage is mounted is actually quite elegant. The Docker service definition is contained in /etc/systemd/system/docker.service. You can override this configuration by creating a new directory at /etc/systemd/system/docker.service.d and dropping a file with the name override.conf in there with the following contents:

    [Unit]
    RequiresMountsFor=/containerstorage

The rest of the service definition remains the same, and your customized configuration won’t be overwritten by a Docker version update. The RequiresMountsFor setting prevents the Docker service from starting up before that particular mount exists. You can specify multiple mount points on the same line, separated by spaces.

    [Unit]
    RequiresMountsFor=/containerstorage /otherstorage /some/other/mountpoint

You can also specify the mount points over multiple lines if you prefer.

    [Unit]
    RequiresMountsFor=/containerstorage
    RequiresMountsFor=/otherstorage
    RequiresMountsFor=/some/other/mountpoint

If you’re using systemd unit files for controlling containers, then you can use the same systemd setting to prevent your containers from starting up before the storage that the container depends on is mounted, as shown in the sketch below.
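For example, a hypothetical unit file at /etc/systemd/system/examplecontainer.service could look like the following; the container name and mount point are placeholders rather than anything from my actual setup:

    [Unit]
    Description=Example container that depends on mounted storage
    RequiresMountsFor=/containerstorage
    Requires=docker.service
    After=docker.service

    [Service]
    ExecStart=/usr/bin/docker start -a examplecontainer
    ExecStop=/usr/bin/docker stop examplecontainer
    Restart=always

    [Install]
    WantedBy=multi-user.target

With RequiresMountsFor in place, systemd will hold the container back until /containerstorage is actually mounted, just like it does for the Docker daemon itself.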
Avoiding the out of memory incident

Nextcloud taking down my home server for 50 minutes was not the root cause; it only highlighted an issue that had been there for days at that point. That doesn’t mean that this area can’t be improved. After this incident, every Docker Compose file that I use includes resource limits on all containers. When defining the limits, I started with very conservative limits based on the average resource usage as observed from docker stats output. Over the past few months I’ve had to continuously tweak the limits, especially the memory ones, due to the containers themselves running out of memory when the limits were set too low. Apparently software is getting increasingly resource hungry. An example Docker Compose file with resource limits looks like this:

    name: nextcloud
    services:
      nextcloud:
        container_name: nextcloud
        volumes:
          - /path/to/nextcloud/stuff:/data
        deploy:
          resources:
            limits:
              cpus: "4"
              memory: 2gb
        image: docker.io/nextcloud:latest
        restart: always
      nextcloud-db:
        container_name: nextcloud-db
        volumes:
          - /path/to/database:/var/lib/postgresql/data
        deploy:
          resources:
            limits:
              cpus: "4"
              memory: 2gb
        image: docker.io/postgres:16
        restart: always

In this example, each container is able to use up to 4 CPU cores and a maximum of 2 GB of memory. And just like that, Nextcloud is unable to take down my server by eating up all the available memory.

Yes, I’m aware of the Preview Generator Nextcloud app. I have it, but over multiple years of running Nextcloud, I have not found it to be very effective against the resource-hungry preview image generation that happens during user interactions.

Decoupling my VPN solution from Docker

With this incident, it also became clear that running the gateway to my home network inside a container was a really stupid idea. I’ve mitigated this issue by taking the WireGuard configuration generated by the container and moving it to the host. I also used this as an opportunity to get a to-do list item done and used this guide to add IPv6 support inside the virtual WireGuard network. I can now access IPv6 networks everywhere I go! I briefly considered setting WireGuard up on my OpenWrt-powered router, but I decided against it as I’d like to own one computer that I don’t screw up with my configuration changes.

Closing thoughts

I have not yet faced an incident this severe, even at work. The impact wasn’t that big, I guess a hundred people were not able to read my blog, but the stress levels were off the charts for me during the troubleshooting process. I’ve long advocated for self-hosting and running basic and boring solutions, with the main benefits being ease of maintenance, troubleshooting and low cost. This incident is a good reminder that even the most basic setups can have complicated issues associated with them. At least I got it fixed and learned about a new systemd unit setting, which is nice. Still better than handling Kubernetes issues.

4 weeks ago 28 votes

More in technology

The 2024 Arduino Open Source Report is here!

Every year, we take a moment to reflect on the contributions we made to the open source movement, and the many ways our community has made a huge difference. As we publish the latest Open Source Report, we are proud to say 2024 was another year of remarkable progress and achievements. A year of growth and […] The post The 2024 Arduino Open Source Report is here! appeared first on Arduino Blog.

13 hours ago 2 votes
On the new iPhone 16e

Today Apple unveiled the iPhone 16e, which they're calling a part of the iPhone 16 lineup now, rather than this odd duck the SE had been since launch. Honestly, I think this move in particular is interesting and maybe makes a ton of sense, but let's

8 hours ago 1 vote
DIY micro lab analyzes ammonia levels in blood and urine

Cirrhosis of the liver is an extremely serious condition that requires extensive medical monitoring and often intervention. Progression of the condition can be fatal, so even if caught early it must be monitored closely. But, like most things in medicine, that gets expensive. That’s why Marb built his own DIY “micro lab” to analyze ammonia […] The post DIY micro lab analyzes ammonia levels in blood and urine appeared first on Arduino Blog.

9 hours ago 1 vote
Trump’s latest executive order plainly states no one has authority over him

The White House: Ensuring Accountability for All Agencies Sec. 7. Rules of Conduct Guiding Federal Employees’ Interpretation of the Law. The President and the Attorney General, subject to the President’s supervision and control, shall provide authoritative interpretations of law for the executive branch. The President and the

13 hours ago 1 vote
This mod simplifies single-point threading on mini lathes

“Single-point threading” on a lathe is the process of cutting threads, such as for a bolt, into the material through turning. The spindle/workpiece spin and the carriage moves linearly at a precise amount per turn of the spindle. That linear movement is the thread pitch. But this process usually requires several passes to reach the […] The post This mod simplifies single-point threading on mini lathes appeared first on Arduino Blog.

yesterday 2 votes