Full Width [alt+shift+f] Shortcuts [alt+shift+k]
Sign Up [alt+shift+s] Log In [alt+shift+l]
25
Poor Man’s CSS Full-Bleed Layout 2020-10-07 I recently came across the very well written and interesting article, Full-Bleed Layout Using CSS Grid, while browsing my daily designer feeds. I won’t go into the post’s specifics here (I recommend you read the article for yourself) but it details how to render full-bleed element effects utilizing the CSS grid property. While the approach in the article works perfectly fine, I thought to myself, “Is there not a simpler, more backwards compatible way to do this?”. Indeed there is. Make the Web Backwards Compatible I try my best when creating specific element designs or layouts to have everything render consistently across almost all browsers. This tends to include the obvious front-runners: Chrome, Firefox, Safari - but I also try my best not to ignore the oldies: IE11, Edge and older versions of Opera. I believe if most web designers even loosely followed this concept we wouldn’t be stringing together barely implemented CSS properties and...
over a year ago

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from bt RSS Feed

Installing OpenBSD on Linveo KVM VPS

Installing OpenBSD on Linveo KVM VPS 2024-10-21 I recently came across an amazing deal for a VPS on Linveo. For just $15 a year they provide: AMD KVM 1GB 1024 MB RAM 1 CPU Core 25 GB NVMe SSD 2000 GB Bandwidth It’s a pretty great deal and I suggest you look more into it if you’re interested! But this post is more focused on setting up OpenBSD via the custom ISO option in the KVM dashboard. Linveo already provides several Linux OS options, along with FreeBSD by default (which is great!). Since there is no OpenBSD template we need to do things manually. Getting Started Once you have your initial VPS up and running, login to the main dashboard and navigate to the Media tab. Under CD/DVD-ROM you’ll want to click “Custom CD/DVD” and enter the direct link to the install76.iso: https://cdn.openbsd.org/pub/OpenBSD/7.6/amd64/install76.iso The "Media" tab of the Linveo Dashboard. Use the official ISO link and set the Boot Order to CD/DVD. Select “Insert”, then set your Boot Order to CD/DVD and click “Apply”. Once complete, Restart your server. Installing via VNC With the server rebooting, jump over to Options and click on “Browser VNC” to launch the web-based VNC client. From here we will boot into the OpenBSD installer and get things going! Follow the installer as you normally would when installing OpenBSD (if you’re unsure, I have a step-by-step walkthrough) until you reach the IPv4 selection. At this point you will want to input your servers IPv4 and IPv6 IPs found under your Network section of your dashboard. Next you will want to set the IPv6 route to first default listed option (not “none”). After that is complete, choose cd0 for your install media (don’t worry about http yet). Continue with the rest of the install (make users if desired, etc) until it tells you to reboot the machine. Go back to the Linveo Dashboard, switch your Boot Order back to “Harddrive” and reboot the machine directly. Booting into OpenBSD Load into the VNC client again. If you did everything correctly you should be greeted with the OpenBSD login prompt. There are a few tweaks we still need to make, so login as the root user. Remember how we installed our sets directly from the cd0? We’ll want to change that. Since we are running OpenBSD “virtually” through KVM, our target network interface will be vio0. Edit the /etc/hostname.vio0 file and add the following: dhcp !route add default <your_gateway_ip> The <your_gateway_ip> can be found under the Network tab of your dashboard. The next file we need to tweak is /etc/resolv.conf. Add the following to it: nameserver 8.8.8.8 nameserver 1.1.1.1 These nameservers are based on your selected IPs under the Resolvers section of Network in the Linveo dashboard. Change these as you see fit, so long as they match what you place in the resolve.conf file. Finally, the last file we need to edit is /etc/pf.conf. Like the others, add the following: pass out proto { tcp, udp } from any to any port 53 Final Stretch Now just reboot the server. Log back in as your desired user and everything should be working as expected! You can perform a simple test to check: ping openbsd.org This should work - meaning your network is up and running! Now you’re free to enjoy the beauty that is OpenBSD.

8 months ago 77 votes
Vertical Tabs in Safari

Vertical Tabs in Safari 2024-09-26 I use Firefox as my main browser (specifically the Nightly build) which has vertical tabs built-in. There are instances where I need to use Safari, such as debugging or testing iOS devices, and in those instances I prefer to have a similar experience to that of Firefox. Luckily, Apple has finally made it fairly straight forward to do so. Click the Sidebar icon in the top left of the Safari browser Right click and group your current tab(s) (I normally name mine something uninspired like “My Tabs” or simply “Tabs”) For an extra “clean look”, remove the horizontal tabs by right clicking the top bar, selected Customize Toolbar and dragging the tabs out When everything is set properly, you’ll have something that looks like this: One minor drawback is not having access to a direct URL input, since we have removed the horizontal tab bar altogether. Using a set of curated bookmarks could help avoid the need for direct input, along with setting our new tab page to DuckDuckGo or any other search engine.

9 months ago 79 votes
Build and Deploy Websites Automatically with Git

Build and Deploy Websites Automatically with Git 2024-09-20 I recently began the process of setting up my self-hosted1 cgit server as my main code forge. Updating repos via cgit on NearlyFreeSpeech on its own has been simple enough, but it lacked the “wow-factor” of having some sort of automated build process. I looked into a bunch of different tools that I could add to my workflow and automate deploying changes. The problem was they all seemed to be fairly bloated or overly complex for my needs. Then I realized I could simply use post-receive hooks which were already built-in to git! You can’t get more simple than that… So I thought it would be best to document my full process. These notes are more for my future self when I inevitably forget this, but hopefully others can benefit from it! Before We Begin This “tutorial” assumes that you already have a git server setup. It shouldn’t matter what kind of forge you’re using, so long as you have access to the hooks/ directory and have the ability to write a custom post-receive script. For my purposes I will be running standard git via the web through cgit, hosted on NearlyFreeSpeech (FreeBSD based). Overview Here is a quick rundown of what we plan to do: Write a custom post-receive script in the repo of our choice Build and deploy our project when a remote push to master is made Nothing crazy. Once you get the hang of things it’s really simple. Prepping Our Servers Before we get into the nitty-gritty, there are a few items we need to take care of first: Your main git repo needs ssh access to your web hosting (deploy) server. Make sure to add your public key and run a connection test first (before running the post-receive hook) in order to approve the “fingerprinting”. You will need to git clone your main git repo in a private/admin area of your deploy server. In the examples below, mine is cloned under /home/private/_deploys Once you do both of those tasks, continue with the rest of the article! The post-receive Script I will be using my own personal website as the main project for this example. My site is built with wruby, so the build instructions are specific to that generator. If you use Jekyll or something similar, you will need to tweak those commands for your own purposes. Head into your main git repo (not the cloned one on your deploy server), navigate under the hooks/ directory and create a new file named post-receive containing the following: #!/bin/bash # Get the branch that was pushed while read oldrev newrev ref do branch=$(echo $ref | cut -d/ -f3) if [ "$branch" == "master" ]; then echo "Deploying..." # Build on the remote server ssh user@deployserver.net << EOF set -e # Stop on any error cd /home/private/_deploys/btxx.org git pull origin master gem install 'kramdown:2.4.0' 'rss:0.3.0' make build rsync -a build/* ~/public/btxx.org/ EOF echo "Build synced to the deployment server." echo "Deployment complete." fi done Let’s break everything down. First we check if the branch being pushed to the remote server is master. Only if this is true do we proceed. (Feel free to change this if you prefer something like production or deploy) if [ "$branch" == "master" ]; then Then we ssh into the server (ie. deployserver.net) which will perform the build commands and also host these built files. ssh user@deployserver.net << EOF Setting set -e ensures that the script stops if any errors are triggered. set -e # Stop on any error Next, we navigate into the previously mentioned “private” directory, pull the latest changes from master, and run the required build commands (in this case installing gems and running make build) cd /home/private/_deploys/btxx.org git pull origin master gem install 'kramdown:2.4.0' 'rss:0.3.0' make build Finally, rsync is run to copy just the build directory to our public-facing site directory. rsync -a build/* ~/public/btxx.org/ With that saved and finished, be sure to give this file proper permissions: chmod +x post-receive That’s all there is to it! Time to Test! Now make changes to your main git project and push those up into master. You should see the post-receive commands printing out into your terminal successfully. Now check out your website to see the changes. Good stuff. Still Using sourcehut My go-to code forge was previously handled through sourcehut, which will now be used for mirroring my repos and handling mailing lists (since I don’t feel like hosting something like that myself - yet!). This switch over was nothing against sourcehut itself but more of a “I want to control all aspects of my projects” mentality. I hope this was helpful and please feel free to reach out with suggestions or improvements! By self-hosted I mean a NearlyFreeSpeech instance ↩

9 months ago 92 votes
Burning & Playing PS2 Games without a Modded Console

Burning & Playing PS2 Games without a Modded Console 2024-09-02 Important: I do not support pirating or obtaining illegal copies of video games. This process should only be used to copy your existing PS2 games for backup, in case of accidental damage to the original disc. Requirements Note: This tutorial is tailored towards macOS users, but most things should work similar on Windows or Linux. You will need: An official PS2 game disc (the one you wish to copy) A PS2 Slim console An Apple device with a optical DVD drive (or a portable USB DVD drive) Some time and a coffee! (or tea) Create an ISO Image of Your PS2 Disc: Insert your PS2 disc into your optical drive. Open Disk Utility (Applications > Utilities) In Disk Utility, select your PS2 disc from the sidebar Click on the File menu, then select New Image > Image from [Disc Name] Choose a destination to save the ISO file and select the format as DVD/CD Master Name your file and click Save. Disk Utility will create a .cdr file, which is essentially an ISO file Before we move on, we will need to convert that newly created cdr file into ISO. Navigate to the directory where the .cdr file is located and use the hdiutil command to convert the .cdr file to an ISO file: hdiutil convert yourfile.cdr -format UDTO -o yourfile.iso You’ll end up with a file named yourfile.iso.cdr. Rename it by removing the .cdr extension to make it an .iso file: mv yourfile.iso.cdr yourfile.iso Done and done. Getting Started For Mac and Linux users, you will need to install Wine in order to run the patcher: # macOS brew install wine-stable # Linux (Debian) apt install wine Clone & Run the Patcher Clone the FreeDVDBoot ESR Patcher: git clone https://git.sr.ht/~bt/fdvdb-esr Navigate to the cloned project folder: cd /path/to/fdvdb-esr The run the executable: wine FDVDB_ESR_Patcher.exe Now you need to select your previously cloned ISO file, use the default Payload setting and then click Patch!. After a few seconds your file should be patched. Burning Our ISO to DVD It’s time for the main event! Insert a blank DVD-R into your disc drive and mount it. Then right click on your patched ISO file and run “Burn Disk Image to Disc...". From here, you want to make sure you select the slowest write speed and enable verification. Once the file is written to the disc and verified (verification might fail - it is safe to ignore) you can remove the disc from the drive. Before Playing the Game Make sure you change the PS2 disc speed from Standard to Fast in the main “Browser” setting before you put the game into your console. After that, enjoy playing your cloned PS2 game!

10 months ago 68 votes
"This Key is Useless Now. Discard?"

“This Key is Useless Now. Discard?” 2024-08-28 The title of this article probably triggers nostalgic memories for old school Resident Evil veterans like myself. My personal favourite in the series (not that anyone asked) was the original, 1998 version of Resident Evil 2 (RE2). I believe that game stands the test of time and is very close to a masterpiece. The recent remake lost a lot of the charm and nuance that made the original so great, which is why I consistently fire up the PS1 version on my PS2 Slim. Resident Evil 2 (PS1) running on my PS2, hooked up to my Toshiba CRT TV. But the point of this post isn’t to gush over RE2. Instead I would like to discuss how well RE2 handled its interface and user experience across multiple in-game systems. HUD? What HUD? Just like the first Resident Evil that came before it, RE2 has no in-game HUD (heads-up display) to speak of. It’s just your playable character and the environment. No ammo-counters. No health bars. No “quest” markers. Nothing. This is how the game looks while you play. Zero HUD elements. The game does provide you with an inventory system, which holds your core items, weapons and notes you find along your journey. Opening up this sub-menu allows you to heal, reload weapons, combine objects or puzzle items, or read through previously collected documents. Not only is this more immersive (HUDs don’t exist for us in the real world, we need to look through our packs as well…) it also gets out of the way. The main inventory screen. Shows everything you need to know, only when you need it. (I can hear this screenshot...) I don’t need a visual element in the bottom corner showing me a list of “items” I can cycle through. I don’t want an ammo counter cluttering up my screen with information I only need to see in combat or while manually reloading. If those are pieces of information I need, I’ll explicitly open and look for it. Don’t make assumptions about what is important to me on screen. Capcom took this concept of less visual clutter even further in regards to maps and the character health status. Where We’re Going, We Don’t Need Roads Mini-Maps A great deal of newer games come pre-packaged with a mini-map on the main interface. In certain instances this works just fine, but most are 100% UI clutter. Something to add to the screen. I can only assume some devs believe it is “helpful”. Most times it’s simply a distraction. Thank goodness most games allow you to disable them. As for RE2, you collect maps throughout your adventure and, just like most other systems in the game, you need to consciously open the map menu to view them. You know, just like in real life. This creates a higher tension as well, since you need to constantly reference your map (on initial playthroughs) to figure out where the heck to go. You feel the pressure of someone frantically pulling out a physical map and scanning their surroundings. It also helps the player build a mental model in their head, thus providing even more of that sweet, sweet immersion. The map of the Raccoon City Police Station. No Pain, No Gain The game doesn’t display any health bar or player status information. In order to view your current status (symbolized by “Fine”, “Caution” or “Danger”) you need to open your inventory screen. From here you can heal yourself (if needed) and see the status type change in real-time. The "condition" health status. This is fine. But that isn’t the only way to visually see your current status. Here’s a scenario: you’re traveling down a hallway, turn a corner and run right into the arms of a zombie. She takes a couple good bites out of your neck before you push her aside. You unload some handgun rounds into her and down she goes. As you run over her body she reaches out and chomps on your leg as a final “goodbye”. You break free and move along but notice something different in your character’s movement - they’re holding their stomach and limping. Here we can see the character "Hunk" holding his stomach and limping, indicating an injury without the need for a custom HUD element. If this was your first time playing, most players would instinctively open the inventory menu, where their characters health is displayed, and (in this instance) be greeted with a “Caution” status. This is another example of subtle UX design. I don’t need to know the health status of my character until an action is required (in this example: healing). The health system is out of the way but not hidden. This keeps the focus on immersion, not baby-sitting the game’s interface. A Key to Every Lock Hey! This section is in reference to the title of the article. We made it! …But yes, discarding keys in RE2 is a subtle example of fantastic user experience. As a player, I know for certain this key is no longer needed. I can safely discard it and free up precious space from my inventory. There is also a sense of accomplishment, a feeling of “I’ve completed a task” or an internal checkbox being ticked. Progress has been made! Don’t overlook how powerful of a interaction this little text prompt is. Ask any veteran of the series and they will tell you this prompt is almost euphoric. The game's prompt asking if you'd like to discard a useless key. Perfection. Inspiring Greatness RE2 is certainly not the first or last game to implement these “minimal” game systems. A more “modern” example is Dead Space (DS), along with its very faithful remake. In DS the character’s health is displayed directly on the character model itself, and a similar inventory screen is used to manage items. An ammo-counter is visible but only when the player aims their weapon. Pretty great stuff and another masterpiece of survival horror. In Dead Space, the character's health bar is set as part of their spacesuit. The Point I guess my main takeaway is that designers and developers should try their best to keep user experience intuitive. I know that sounds extremely generic but it is a lot more complex than one might think. Try to be as direct as possible while remaining subtle. It’s a delicate balance but experiences like RE2 show us it is attainable. I’d love to talk more, but I have another play-through of RE2 to complete…

10 months ago 70 votes

More in programming

My first year since coming back to Linux

<![CDATA[It has been a year since I set up my System76 Merkaat with Linux Mint. In July of 2024 I migrated from ChromeOS and the Merkaat has been my daily driver on the desktop. A year later I have nothing major to report, which is the point. Despite the occasional unplanned reinstallation I have been enjoying the stability of Linux and just using the PC. This stability finally enabled me to burn bridges with mainstream operating systems and fully embrace Linux and open systems. I'm ready to handle the worst and get back to work. Just a few years ago the frustration of troubleshooting a broken system would have made me seriously consider the switch to a proprietary solution. But a year of regular use, with an ordinary mix of quiet moments and glitches, gave me the confidence to stop worrying and learn to love Linux. linux a href="https://remark.as/p/journal.paoloamoroso.com/my-first-year-since-coming-back-to-linux"Discuss.../a Email | Reply @amoroso@oldbytes.space !--emailsub--]]>

18 hours ago 3 votes
Overanalyzing a minor quirk of Espressif’s reset circuit

The mystery In the previous article, I briefly mentioned a slight difference between the ESP-Prog and the reproduced circuit, when it comes to EN: Focusing on EN, it looks like the voltage level goes back to 3.3V much faster on the ESP-Prog than on the breadboard circuit. The grid is horizontally spaced at 2ms, so … Continue reading Overanalyzing a minor quirk of Espressif’s reset circuit → The post Overanalyzing a minor quirk of Espressif’s reset circuit appeared first on Quentin Santos.

19 hours ago 2 votes
What can agents actually do?

There’s a lot of excitement about what AI (specifically the latest wave of LLM-anchored AI) can do, and how AI-first companies are different from the prior generations of companies. There are a lot of important and real opportunities at hand, but I find that many of these conversations occur at such an abstract altitude that they’re a bit too abstract. Sort of like saying that your company could be much better if you merely adopted software. That’s certainly true, but it’s not a particularly helpful claim. This post is an attempt to concisely summarize how AI agents work, apply that summary to a handful of real-world use cases for AI, and make the case that the potential of AI agents is equivalent to the potential of this generation of AI. By the end of this writeup, my hope is that you’ll be well-armed to have a concrete discussion about how LLMs and agents could change the shape of your company. How do agents work? At its core, using an LLM is an API call that includes a prompt. For example, you might call Anthropic’s /v1/message with a prompt: How should I adopt LLMs in my company? That prompt is used to fill the LLM’s context window, which conditions the model to generate certain kinds of responses. This is the first important thing that agents can do: use an LLM to evaluate a context window and get a result. Prompt engineering, or context engineering as it’s being called now, is deciding what to put into the context window to best generate the responses you’re looking for. For example, In-Context Learning (ICL) is one form of context engineering, where you supply a bunch of similar examples before asking a question. If I want to determine if a transaction is fraudulent, then I might supply a bunch of prior transactions and whether they were, or were not, fraudulent as ICL examples. Those examples make generating the correct answer more likely. However, composing the perfect context window is very time intensive, benefiting from techniques like metaprompting to improve your context. Indeed, the human (or automation) creating the initial context might not know enough to do a good job of providing relevant context. For example, if you prompt, Who is going to become the next mayor of New York City?, then you are unsuited to include the answer to that question in your prompt. To do that, you would need to already know the answer, which is why you’re asking the question to begin with! This is where we see model chat experiences from OpenAI and Anthropic use web search to pull in context that you likely don’t have. If you ask a question about the new mayor of New York, they use a tool to retrieve web search results, then add the content of those searches to your context window. This is the second important thing that agents can do: use an LLM to suggest tools relevant to the context window, then enrich the context window with the tool’s response. However, it’s important to clarify how “tool usage” actually works. An LLM does not actually call a tool. (You can skim OpenAI’s function calling documentation if you want to see a specific real-world example of this.) Instead there is a five-step process to calling tools that can be a bit counter-intuitive: The program designer that calls the LLM API must also define a set of tools that the LLM is allowed to suggest using. Every API call to the LLM includes that defined set of tools as options that the LLM is allowed to recommend The response from the API call with defined functions is either: Generated text as any other call to an LLM might provide A recommendation to call a specific tool with a specific set of parameters, e.g. an LLM that knows about a get_weather tool, when prompted about the weather in Paris, might return this response: [{ "type": "function_call", "name": "get_weather", "arguments": "{\"location\":\"Paris, France\"}" }] The program that calls the LLM API then decides whether and how to honor that requested tool use. The program might decide to reject the requested tool because it’s been used too frequently recently (e.g. rate limiting), it might check if the associated user has permission to use the tool (e.g. maybe it’s a premium only tool), it might check if the parameters match the user’s role-based permissions as well (e.g. the user can check weather, but only admin users are allowed to check weather in France). If the program does decide to call the tool, it invokes the tool, then calls the LLM API with the output of the tool appended to the prior call’s context window. The important thing about this loop is that the LLM itself can still only do one interesting thing: taking a context window and returning generated text. It is the broader program, which we can start to call an agent at this point, that calls tools and sends the tools’ output to the LLM to generate more context. What’s magical is that LLMs plus tools start to really improve how you can generate context windows. Instead of having to have a very well-defined initial context window, you can use tools to inject relevant context to improve the initial context. This brings us to the third important thing that agents can do: they manage flow control for tool usage. Let’s think about three different scenarios: Flow control via rules has concrete rules about how tools can be used. Some examples: it might only allow a given tool to be used once in a given workflow (or a usage limit of a tool for each user, etc) it might require that a human-in-the-loop approves parameters over a certain value (e.g. refunds more than $100 require human approval) it might run a generated Python program and return the output to analyze a dataset (or provide error messages if it fails) apply a permission system to tool use, restricting who can use which tools and which parameters a given user is able to use (e.g. you can only retrieve your own personal data) a tool to escalate to a human representative can only be called after five back and forths with the LLM agent Flow control via statistics can use statistics to identify and act on abnormal behavior: if the size of a refund is higher than 99% of other refunds for the order size, you might want to escalate to a human if a user has used a tool more than 99% of other users, then you might want to reject usage for the rest of the day it might escalate to a human representative if tool parameters are more similar to prior parameters that required escalation to a human agent LLMs themselves absolutely cannot be trusted. Anytime you rely on an LLM to enforce something important, you will fail. Using agents to manage flow control is the mechanism that makes it possible to build safe, reliable systems with LLMs. Whenever you find yourself dealing with an unreliable LLM-based system, you can always find a way to shift the complexity to a tool to avoid that issue. As an example, if you want to do algebra with an LLM, the solution is not asking the LLM to directly perform algebra, but instead providing a tool capable of algebra to the LLM, and then relying on the LLM to call that tool with the proper parameters. At this point, there is one final important thing that agents do: they are software programs. This means they can do anything software can do to build better context windows to pass on to LLMs for generation. This is an infinite category of tasks, but generally these include: Building general context to add to context window, sometimes thought of as maintaining memory Initiating a workflow based on an incoming ticket in a ticket tracker, customer support system, etc Periodically initiating workflows at a certain time, such as hourly review of incoming tickets Alright, we’ve now summarized what AI agents can do down to four general capabilities. Recapping a bit, those capabilities are: Use an LLM to evaluate a context window and get a result Use an LLM to suggest tools relevant to the context window, then enrich the context window with the tool’s response Manage flow control for tool usage via rules or statistical analysis Agents are software programs, and can do anything other software programs do Armed with these four capabilities, we’ll be able to think about the ways we can, and cannot, apply AI agents to a number of opportunities. Use Case 1: Customer Support Agent One of the first scenarios that people often talk about deploying AI agents is customer support, so let’s start there. A typical customer support process will have multiple tiers of agents who handle increasingly complex customer problems. So let’s set a goal of taking over the easiest tier first, with the goal of moving up tiers over time as we show impact. Our approach might be: Allow tickets (or support chats) to flow into an AI agent Provide a variety of tools to the agent to support: Retrieving information about the user: recent customer support tickets, account history, account state, and so on Escalating to next tier of customer support Refund a purchase (almost certainly implemented as “refund purchase” referencing a specific purchase by the user, rather than “refund amount” to prevent scenarios where the agent can be fooled into refunding too much) Closing the user account on request Include customer support guidelines in the context window, describe customer problems, map those problems to specific tools that should be used to solve the problems Flow control rules that ensure all calls escalate to a human if not resolved within a certain time period, number of back-and-forth exchanges, if they run into an error in the agent, and so on. These rules should be both rules-based and statistics-based, ensuring that gaps in your rules are neither exploitable nor create a terrible customer experience Review agent-customer interactions for quality control, making improvements to the support guidelines provided to AI agents. Initially you would want to review every interaction, then move to interactions that lead to unusual outcomes (e.g. escalations to human) and some degree of random sampling Review hourly, then daily, and then weekly metrics of agent performance Based on your learnings from the metric reviews, you should set baselines for alerts which require more immediate response. For example, if a new topic comes up frequently, it probably means a serious regression in your product or process, and it requires immediate review rather than periodical review. Note that even when you’ve moved “Customer Support to AI agents”, you still have: a tier of human agents dealing with the most complex calls humans reviewing the periodic performance statistics humans performing quality control on AI agent-customer interactions You absolutely can replace each of those downstream steps (reviewing performance statistics, etc) with its own AI agent, but doing that requires going through the development of an AI product for each of those flows. There is a recursive process here, where over time you can eliminate many human components of your business, in exchange for increased fragility as you have more tiers of complexity. The most interesting part of complex systems isn’t how they work, it’s how they fail, and agent-driven systems will fail occasionally, as all systems do, very much including human-driven ones. Applied with care, the above series of actions will work successfully. However, it’s important to recognize that this is building an entire software pipeline, and then learning to operate that software pipeline in production. These are both very doable things, but they are meaningful work, turning customer support leadership into product managers and requiring an engineering team building and operating the customer support agent. Use Case 2: Triaging incoming bug reports When an incident is raised within your company, or when you receive a bug report, the first problem of the day is determining how severe the issue might be. If it’s potentially quite severe, then you want on-call engineers immediately investigating; if it’s certainly not severe, then you want to triage it in a less urgent process of some sort. It’s interesting to think about how an AI agent might support this triaging workflow. The process might work as follows: Pipe all created incidents and all created tickets to this agent for review. Expose these tools to the agent: Open an incident Retrieve current incidents Retrieve recently created tickets Retrieve production metrics Retrieve deployment logs Retrieve feature flag change logs Toggle known-safe feature flags Propose merging an incident with another for human approval Propose merging a ticket with another ticket for human approval Redundant LLM providers for critical workflows. If the LLM provider’s API is unavailable, retry three times over ten seconds, then resort to using a second model provider (e.g. Anthropic first, if unavailable try OpenAI), and then finally create an incident that the triaging mechanism is unavailable. For critical workflows, we can’t simply assume the APIs will be available, because in practice all major providers seem to have monthly availability issues. Merge duplicates. When a ticket comes in, first check ongoing incidents and recently created tickets for potential duplicates. If there is a probable duplicate, suggest merging the ticket or incident with the existing issue and exit the workflow. Assess impact. If production statistics are severely impacted, or if there is a new kind of error in production, then this is likely an issue that merits quick human review. If it’s high priority, open an incident. If it’s low priority, create a ticket. Propose cause. Now that the incident has been sized, switch to analyzing the potential causes of the incident. Look at the code commits in recent deploys and suggest potential issues that might have caused the current error. In some cases this will be obvious (e.g. spiking errors with a traceback of a line of code that changed recently), and in other cases it will only be proximity in time. Apply known-safe feature flags. Establish an allow list of known safe feature flags that the system is allowed to activate itself. For example, if there are expensive features that are safe to disable, it could be allowed to disable them, e.g. restricting paginating through deeper search results when under load might be a reasonable tradeoff between stability and user experience. Defer to humans. At this point, rely on humans to drive incident, or ticket, remediation to completion. Draft initial incident report. If an incident was opened, the agent should draft an initial incident report including the timeline, related changes, and the human activities taken over the course of the incident. This report should then be finalized by the human involved in the incident. Run incident review. Your existing incident review process should take the incident review and determine how to modify your systems, including the triaging agent, to increase reliability over time. Safeguard to reenable feature flags. Since we now have an agent disabling feature flags, we also need to add a periodic check (agent-driven or otherwise) to reenable the “known safe” feature flags if there isn’t an ongoing incident to avoid accidentally disabling them for long periods of time. This is another AI agent that will absolutely work as long as you treat it as a software product. In this case, engineering is likely the product owner, but it will still require thoughtful iteration to improve its behavior over time. Some of the ongoing validation to make this flow work includes: The role of humans in incident response and review will remain significant, merely aided by this agent. This is especially true in the review process, where an agent cannot solve the review process because it’s about actively learning what to change based on the incident. You can make a reasonable argument that an agent could decide what to change and then hand that specification off to another agent to implement it. Even today, you can easily imagine low risk changes (e.g. a copy change) being automatically added to a ticket for human approval. Doing this for more complex, or riskier changes, is possible but requires an extraordinary degree of care and nuance: it is the polar opposite of the idea of “just add agents and things get easy.” Instead, enabling that sort of automation will require immense care in constraining changes to systems that cannot expose unsafe behavior. For example, one startup I know has represented their domain logic in a domain-specific language (DSL) that can be safely generated by an LLM, and are able to represent many customer-specific features solely through that DSL. Expanding the list of known-safe feature flags to make incidents remediable. To do this widely will require enforcing very specific requirements for how software is developed. Even doing this narrowly will require changes to ensure the known-safe feature flags remain safe as software is developed. Periodically reviewing incident statistics over time to ensure mean-time-to-resolution (MTTR) is decreasing. If the agent is truly working, this should decrease. If the agent isn’t driving a reduction in MTTR, then something is rotten in the details of the implementation. Even a very effective agent doesn’t relieve the responsibility of careful system design. Rather, agents are a multiplier on the quality of your system design: done well, agents can make you significantly more effective. Done poorly, they’ll only amplify your problems even more widely. Do AI Agents Represent Entirety of this Generation of AI? If you accept my definition that AI agents are any combination of LLMs and software, then I think it’s true that there’s not much this generation of AI can express that doesn’t fit this definition. I’d readily accept the argument that LLM is too narrow a term, and that perhaps foundational model would be a better term. My sense is that this is a place where frontier definitions and colloquial usage have deviated a bit. Closing thoughts LLMs and agents are powerful mechanisms. I think they will truly change how products are designed and how products work. An entire generation of software makers, and company executives, are in the midst of learning how these tools work. Software isn’t magic, it’s very logical, but what it can accomplish is magical. The same goes for agents and LLMs. The more we can accelerate that learning curve, the better for our industry.

16 hours ago 2 votes
Can tinygrad win?

This is not going to be a cakewalk like self driving cars. Most of comma’s competition is now out of business, taking billions and billions of dollars with it. Re: Tesla and FSD, we always expected Tesla to have the lead, but it’s not a winner take all market, it will look more like iOS vs Android. comma has been around for 10 years, is profitable, and is now growing rapidly. In self driving, most of the competition wasn’t even playing the right game. This isn’t how it is for ML frameworks. tinygrad’s competition is playing the right game, open source, and run by some quite smart people. But this is my second startup, so hopefully taking a bit more risk is appropriate. For comma to win, all it would take is people in 2016 being wrong about LIDAR, mapping, end to end, and hand coding, which hopefully we all agree now that they were. For tinygrad to win, it requires something much deeper to be wrong about software development in general. As it stands now, tinygrad is 14556 lines. Line count is not a perfect proxy for complexity, but when you have differences of multiple orders of magnitude, it might mean something. I asked ChatGPT to estimate the lines of code in PyTorch, JAX, and MLIR. JAX = 400k MLIR = 950k PyTorch = 3300k They range from one to two orders of magnitude off. And this isn’t even including all the libraries and drivers the other frameworks rely on, CUDA, cuBLAS, Triton, nccl, LLVM, etc…. tinygrad includes every single piece of code needed to drive an AMD RDNA3 GPU except for LLVM, and we plan to remove LLVM in a year or two as well. But so what? What does line count matter? One hypothesis is that tinygrad is only smaller because it’s not speed or feature competitive, and that if and when it becomes competitive, it will also be that many lines. But I just don’t think that’s true. tinygrad is already feature competitive, and for speed, I think the bitter lesson also applies to software. When you look at the machine learning ecosystem, you realize it’s just the same problems over and over again. The problem of multi machine, multi GPU, multi SM, multi ALU, cross machine memory scheduling, DRAM scheduling, SRAM scheduling, register scheduling, it’s all the same underlying problem at different scales. And yet, in all the current ecosystems, there are completely different codebases and libraries at each scale. I don’t think this stands. I suspect there is a simple formulation of the problem underlying all of the scheduling. Of course, this problem will be in NP and hard to optimize, but I’m betting the bitter lesson wins here. The goal of the tinygrad project is to abstract away everything except the absolute core problem in the cleanest way possible. This is why we need to replace everything. A model for the hardware is simple compared to a model for CUDA. If we succeed, tinygrad will not only be the fastest NN framework, but it will be under 25k lines all in, GPT-5 scale training job to MMIO on the PCIe bus! Here are the steps to get there: Expose the underlying search problem spanning several orders of magnitude. Due to the execution of neural networks not being data dependent, this problem is very amenable to search. Make sure your formulation is simple and complete. Fully capture all dimensions of the search space. The optimization goal is simple, run faster. Apply the state of the art in search. Burn compute. Use LLMs to guide. Use SAT solvers. Reinforcement learning. It doesn’t matter, there’s no way to cheat this goal. Just see if it runs faster. If this works, not only do we win with tinygrad, but hopefully people begin to rethink software in general. Of course, it’s a big if, this isn’t like comma where it was hard to lose. But if it wins… The main thing to watch is development speed. Our bet has to be that tinygrad’s development speed is outpacing the others. We have the AMD contract to train LLaMA 405B as fast as NVIDIA due in a year, let’s see if we succeed.

20 hours ago 1 votes
Do You Even Personalize, Bro?

There’s a video on YouTube from “Technology Connections” — who I’ve never heard of or watched until now — called Algorithms are breaking how we think. I learned of this video from Gedeon Maheux of The Iconfactory fame. Speaking in the context of why they made Tapestry, he said the ideas in this video would be their manifesto. So I gave it a watch. Generally speaking, the video asks: Does anyone care to have a self-directed experience online, or with a computer more generally? I'm not sure how infrequently we’re actually deciding for ourselves these days [how we decide what we want to see, watch, and do on the internet] Ironically we spend more time than ever on computing devices, but less time than ever curating our own experiences with them. Which — again ironically — is the inverse of many things in our lives. Generally speaking, the more time we spend with something, the more we invest in making it our own — customizing it to our own idiosyncrasies. But how much time do you spend curating, customizing, and personalizing your digital experience? (If you’re reading this in an RSS reader, high five!) I’m not talking about “I liked that post, or saved that video, so the algorithm is personalizing things for me”. Do you know what to get yourself more of? Do you know where to find it? Do you even ask yourself these questions? “That sounds like too much work” you might say. And you’re right, it is work. As the guy in the video says: I'm one of those weirdos who think the most rewarding things in life take effort Me too. Email · Mastodon · Bluesky

8 hours ago 1 votes