More from HTMHell
by Alexis Degryse

I think we all know the <datalist> element (and if you don't, it's ok). It holds a list of <option> elements, offering suggested choices for its associated input field. It's not an alternative to the <select> element: a field associated with a <datalist> can still accept any value that is not listed in the <option> elements.

Here is a basic example, using a plain text input with a few placeholder suggestions:

<label for="browser-choice">What is your favourite browser?</label> <input type="text" list="browsers" id="browser-choice"> <datalist id="browsers"> <option>Firefox</option> <option>Chrome</option> <option>Safari</option> </datalist>

Pretty cool, isn't it? But what happens if we combine <datalist> with less common field types, like color and date?

<label for="favorite-color">What is your favorite color?</label> <input type="color" list="colors-list" id="favorite-color"> <datalist id="colors-list"> <option>#FF0000</option> <option>#FFA500</option> <option>#FFFF00</option> <option>#008000</option> <option>#0000FF</option> <option>#800080</option> <option>#FFC0CB</option> <option>#FFFFFF</option> <option>#000000</option> </datalist>

Colors listed in the <datalist> are pre-selectable, but the color picker is still usable if users need to choose a more specific one.

<label for="event-choice" class="form-label col-form-label-lg">Choose a historical date</label> <input type="date" list="events" id="event-choice"> <datalist id="events"> <option label="Fall of the Berlin wall">1989-11-09</option> <option label="Maastricht Treaty">1992-02-07</option> <option label="Brexit Referendum">2016-06-23</option> </datalist>

Same here: some dates are pre-selectable and the datepicker is still available. Depending on the context, having pre-defined values can speed up form filling for users.

Please note that <datalist> should be treated as a progressive enhancement, for a few reasons. In Firefox (tested on 133), the <datalist> element is compatible only with textual field types (think text, url, tel, email, number); there is no support for color, date and time. Safari (tested on 15.6) supports color, but not date and time. And some screen reader/browser combinations have issues: for example, suggestions are not announced in Safari, and it's not possible to navigate to the datalist with the down arrow key until you type something that matches a suggestion. Refer to a11ysupport.io for more.

Find out more: datalist experiments by Eiji Kitamura, and the <datalist> documentation on MDN.
by Schepp

Everybody loves fast websites, and everyone despises slow ones even more. Site speed significantly contributes to the overall user experience (UX), determining whether it feels positive or negative. To ensure the fastest possible page load times, it's crucial to design with performance in mind. However, performance optimization is an art form in itself. While implementing straightforward techniques like file compression or proper cache headers is relatively easy, achieving deeper optimizations can quickly become complex.

But what if, instead of solely trying to accelerate the loading process, we triggered it earlier—without the user noticing?

One way to achieve this is by prefetching pages the user might navigate to next using <link rel="prefetch"> tags. These tags are typically embedded in your HTML, but they can also be generated dynamically via JavaScript, based on a heuristic of your choice. Alternatively, you can send them as an HTTP Link header (for example, Link: </next.html>; rel=prefetch) if you lack access to the HTML code but can modify the server configuration. Browsers will take note of the prefetch directives and fetch the referenced pages ahead of the actual navigation.

⚠︎ Caveat: To benefit from this prefetching technique, you must allow the browser to cache pages temporarily using the Cache-Control HTTP header. For example, Cache-Control: max-age=300 would tell the browser to cache a page for five minutes. Without such a header, the browser will discard the prefetched resource and fetch it again upon navigation, rendering the prefetch ineffective.

In addition to <link rel="prefetch">, Chromium-based browsers support <link rel="prerender">. This tag is essentially a supercharged version of <link rel="prefetch">. Known as "NoState Prefetch," it not only prefetches an HTML page but also scans it for subresources—stylesheets, JavaScript files, images, and fonts referenced via a <link rel="preload" as="font" crossorigin>—loading them as well.

The Speculation Rules API

A relatively new addition to Chromium browsers is the Speculation Rules API, which offers enhanced prefetching and enables actual prerendering of webpages. It introduces a JSON-based syntax for precisely defining the conditions under which preprocessing should occur. Here's a simple example of how to use it:

<script type="speculationrules"> { "prerender": [{ "urls": ["next.html", "next2.html"] }] } </script>

Alternatively, you can place the JSON file on your server and reference it using an HTTP header: Speculation-Rules: "/speculationrules.json".

The above list rule specifies that the browser should prerender the URLs next.html and next2.html so they are ready for instant navigation. The keyword prerender means more than fetching the HTML and subresources—it instructs the browser to fully render the pages in hidden tabs, ready to replace the current page instantly when needed. This makes navigation to these pages feel seamless. Prerendered pages also typically score excellent Core Web Vitals metrics: layout shifts and image loading occur during the hidden prerendering phase, and JavaScript execution happens upfront, ensuring a smooth experience when the user first sees the page.
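Returning to the HTTP header alternative mentioned a moment ago, a minimal sketch of that setup could look like the following. The file name mirrors the article's example, and the JSON simply repeats the list rule shown above; the rules file is expected to be served with the application/speculationrules+json content type:

Speculation-Rules: "/speculationrules.json"

/speculationrules.json:
{
  "prerender": [{
    "urls": ["next.html", "next2.html"]
  }]
}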
Instead of listing specific URLs, the API also allows for pattern matching using where and href_matches keys:

<script type="speculationrules"> { "prerender": [{ "where": { "href_matches": "/*" } }] } </script>

For more precise targeting, CSS selectors can be used with the selector_matches key:

<script type="speculationrules"> { "prerender": [{ "where": { "selector_matches": ".navigation__link" } }] } </script>

These rules, called document-rules, act on link elements as soon as the user triggers a pointerdown or touchstart event, giving the referenced pages a few milliseconds' head start before the actual navigation. If you want the preprocessing to begin even earlier, you can adjust the eagerness setting:

<script type="speculationrules"> { "prerender": [{ "where": { "href_matches": "/*" }, "eagerness": "moderate" }] } </script>

Eagerness values:
immediate: Executes immediately.
eager: Currently behaves like immediate but may be refined to sit between immediate and moderate.
moderate: Executes after a 200ms hover or on pointerdown for mobile devices.
conservative (default): Speculates based on pointer or touch interaction.

For even greater flexibility, you can combine prerender and prefetch rules with different eagerness settings:

<script type="speculationrules"> { "prerender": [{ "where": { "href_matches": "/*" }, "eagerness": "conservative" }], "prefetch": [{ "where": { "href_matches": "/*" }, "eagerness": "moderate" }] } </script>

Limitations and Challenges

While the Speculation Rules API is powerful, it comes with some limitations:
Browser support: Only Chromium-based browsers support it. Other browsers lack this capability, so treat it as a progressive enhancement.
Bandwidth concerns: Over-aggressive settings could waste user bandwidth. Chromium imposes limits to mitigate this: a maximum of 10 prerendered and 50 prefetched pages with immediate or eager eagerness.
Server strain: Poorly optimized servers (e.g., no caching, heavy database dependencies) may experience significant load increases due to excessive speculative requests.
Compatibility: Prefetching won't work if a Service Worker is active, though prerendering remains unaffected. Cross-origin prerendering requires explicit opt-in by the target page.

Despite these caveats, the Speculation Rules API offers a powerful toolset to significantly enhance perceived performance and improve UX. So go ahead and try them out!

I would like to express a big thank you to the Webperf community for always being ready to help with great tips and expertise. For this article, I would like to thank Barry Pollard, Andy Davies, and Noam Rosenthal in particular for providing very valuable background information. ❤️
by Alexander Muzenhardt

Introduction

Imagine you're tasked with building a cool new feature for a product. You dive into the work with full energy, and just before the deadline, you manage to finish it. Everyone loves your work, and the feature is set to go live the next day.

<button> <i class="icon">📆</i> </button>

The Problem

You find some good resources explaining that there are people with disabilities who need to be considered when building features like this. This is known as accessibility. For example, some individuals have motor impairments and cannot use a mouse. In this particular case, the user is visually impaired and relies on assistive technology like a screen reader, which reads aloud the content of the website or software. The button you implemented doesn't have any descriptive text, so only the icon is read aloud. In your case, the screen reader says, "Tear-Off Calendar button". While it describes the appearance of the icon, it doesn't convey the purpose of the button. This information is meaningless to the user. A button should always describe what action it will trigger when activated. That's why we need additional descriptive text.

The Challenge

Okay, you understand the problem now and agree that it should be fixed. However, you don't want to add visible text to the button. For design and aesthetic reasons, sighted users should only see the icon. Is there a way to keep the button "icon-only" while still providing meaningful, descriptive text for users who rely on assistive technologies like screen readers?

The Solution

First, you need to give the button a descriptive name so that a screen reader can announce it.

<button> <span>Open Calendar</span> <i class="icon">📆</i> </button>

The problem now is that the button's name becomes visible, which goes against your design guidelines. To prevent this, additional CSS is required.

.sr-only { position: absolute; width: 1px; height: 1px; padding: 0; margin: -1px; overflow: hidden; clip: rect(0, 0, 0, 0); white-space: nowrap; border-width: 0; }

<button> <span class="sr-only">Open Calendar</span> <i class="icon">📆</i> </button>

The CSS ensures that the text inside the span element is hidden from sighted users but remains readable for screen readers. This approach is so common that well-known CSS libraries like Tailwind CSS, Bootstrap, and Material UI include such a class by default.

Although the text of the button is no longer visible, the entire content of the button will still be read aloud, including the icon — something you want to avoid. In HTML you are allowed to use specific attributes for accessibility, and in this case, the attribute aria-hidden is what you need. ARIA stands for "Accessible Rich Internet Applications" and is an initiative to make websites and software more accessible to people with disabilities. The attribute aria-hidden hides elements from screen readers so that their content isn't read. All you need to do is add the attribute aria-hidden with the value "true" to the icon element, which in this case is the <i> element.

<button> <span class="sr-only">Open Calendar</span> <i class="icon" aria-hidden="true">📆</i> </button>

Alternative

An alternative is the attribute aria-label, with which you can assign descriptive, accessible text to a button without it being visible to sighted users. The purpose of aria-label is to provide a description for interactive elements that lack a visible label or descriptive text. All you need to do is add the attribute aria-label to the button.
The attribute aria-hidden and the span element can be deleted.

<button aria-label="Open Calendar"> <i class="icon">📆</i> </button>

With this adjustment, the screen reader will now announce "Open Calendar," completely ignoring the icon. This clearly communicates to the user what the button will do when clicked.

Which Option Should You Use?

At first glance, the aria-label approach might seem like the smarter choice. It requires less code, reducing the likelihood of errors, and looks cleaner overall, potentially improving code readability. However, the first option is actually the better choice. There are several reasons for this that may not be immediately obvious: some browsers do not translate aria-label; it is difficult to copy aria-label content or otherwise manipulate it as text; and aria-label content will not show up if styles fail to load. These are just a few of the many reasons why you should be cautious when using the aria-label attribute. These points, along with others, are discussed in detail in the excellent article "aria-label is a Code Smell" by Eric Bailey.

The First Rule of ARIA Use

The "First Rule of ARIA Use" states: If you can use a native HTML element or attribute with the semantics and behavior you require already built in, instead of re-purposing an element and adding an ARIA role, state or property to make it accessible, then do so.

Even though the first approach also uses an ARIA attribute, it is more acceptable because aria-hidden only hides an element from screen readers. In contrast, aria-label overrides the standard HTML behavior for handling descriptive names. For this reason, following this principle, aria-hidden is preferable to aria-label in this case.

Browser compatibility

Both aria-label and aria-hidden are supported by all modern browsers and can be used without concern.

Conclusion

Ensuring accessibility in web design is more than just a nice-to-have—it's a necessity. By implementing simple solutions like combining CSS with aria-hidden, you can create a user experience that is both aesthetically pleasing and accessible for everyone, including those who rely on screen readers. While there may be different approaches to solving accessibility challenges, the key is to be mindful of all users' needs. A few small adjustments can make a world of difference, ensuring that your features are truly usable by everyone.

Cheers

Resources / Links: Unicode Character "Tear-Off Calendar" (Compart), Unicode Website, MDN Web Docs: aria-label, MDN Web Docs: aria-hidden, WAI-ARIA Standard Guidelines, Tailwind CSS Screen Readers (sr-only), aria-label is a Code Smell, First Rule of ARIA Use
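A quick footnote on the framework classes mentioned above: the ready-made equivalents of the hand-rolled .sr-only rule go by slightly different names, so check the version you are using. As an illustrative sketch (class names from Tailwind CSS and Bootstrap 5):

<!-- Tailwind CSS ships a utility class named sr-only -->
<button> <span class="sr-only">Open Calendar</span> <i class="icon" aria-hidden="true">📆</i> </button>

<!-- Bootstrap 5 calls the equivalent class visually-hidden -->
<button> <span class="visually-hidden">Open Calendar</span> <i class="icon" aria-hidden="true">📆</i> </button>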
by David Luhr

The Description List (<dl>) element is useful for many common visual design patterns, but is unfortunately underutilized. It was originally intended to group terms with their definitions, but it's also a great fit for other content that has a key/value structure, such as product attributes or cards that have several supporting details. Developers often mark up these patterns with overused heading or table semantics, or neglect semantics entirely. With the Description List (<dl>) element and its dedicated Description Term (<dt>) and Description Definition (<dd>) elements, we can improve the semantics and accessibility of these design patterns.

The <dl> has a unique content model: a parent <dl> containing one or more groups of <dt> and <dd> elements; each term/definition group can have multiple <dt> (Description Term) elements per <dd> (Description Definition) element, or multiple definitions per term; and the <dl> can optionally accept a single layer of <div> to wrap the <dt> and <dd> elements, which can be useful for styling.

Examples

An initial example would be a simple list of terms and definitions:

<dl> <dt>Compression damping</dt> <dd>Controls the rate a spring compresses when it experiences a force</dd> <dt>Rebound damping</dt> <dd>Controls the rate a spring returns to its extended length after compressing</dd> </dl>

A common design pattern is "stat callouts", which feature mini cards of small label text above large numeric values. The <dl> is a great fit for this content:

<dl> <div> <dt>Founded</dt> <dd>1988</dd> </div> <div> <dt>Frames built</dt> <dd>8,678</dd> </div> <div> <dt>Race podiums</dt> <dd>212</dd> </div> </dl>

And, a final example of a product listing, which has a list of technical specs:

<h2>Downhill MTB</h2> <dl> <div> <dt>Front travel:</dt> <dd>160mm</dd> </div> <div> <dt>Wheel size:</dt> <dd>27.5"</dd> </div> <div> <dt>Weight:</dt> <dd>15.2 kg</dd> </div> </dl>

Accessibility

With this markup in place, common screen readers will convey important semantic and navigational information. In my testing, NVDA on Windows and VoiceOver on macOS conveyed a list role, the count of list items, your position in the list, and the boundaries of the list. TalkBack on Android only conveyed the term and definition roles of the <dt> and <dd> elements, respectively. If the design doesn't include visible labels, you can at least include them as visually hidden text for assistive technology users. But, I always advocate to visually display them if possible.

Wrapping up

The <dl> is a versatile element that unfortunately doesn't get much use. In over a decade of coding, I've almost never encountered it in existing codebases. It also doesn't appear anywhere in the top HTML elements lists in the Web Almanac 2024 or an Advanced Web Ranking study of over 11.3 million pages. The next time you're building out a design, look for opportunities where the underrated Description List is a good fit. To go deeper, be sure to check out this article by Ben Myers on the <dl> element.
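To illustrate that note about visually hidden labels, here is a rough sketch (not from the original article) of a stat callout whose label is only exposed to assistive technology, reusing the .sr-only class from the icon-button article earlier in this collection:

<dl>
  <div>
    <!-- The label is visually hidden but still announced as the term -->
    <dt class="sr-only">Race podiums</dt>
    <dd>212</dd>
  </div>
</dl>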
by Alistair Shepherd

Web performance is incredibly important. If you were here for the advent calendar last year you may have already read many of my thoughts on the subject. If not, read Getting started with Web Performance when you're done here!

This year I'm back for more web performance, this time focusing on my favourite HTML snippet for improving the loading performance of web fonts using preloads. This short HTML snippet, added to the head of your page, can make a substantial improvement to both perceived and measured performance.

<link rel="preload" href="/nova-sans.woff2" as="font" type="font/woff2" crossorigin="anonymous" >

Above we have a link element that instructs the browser to preload the /nova-sans.woff2 font. By preloading your critical above-the-fold font we can make a huge impact by reducing potential flashes of unstyled or invisible text and layout shifts caused by font loading, like here in the following video:

Recording of a page load illustrating how a font loading late can result in a jarring layout shift

How web fonts are loaded

To explain how preloading fonts can make such an impact, let's go through the process of how web fonts are loaded. Font files are downloaded later than you may think, due to a combination of network requests and conservative browser behaviour. In a standard web page, there will be the main HTML document which will include a CSS file using a link element in the head. If you're using self-hosted custom fonts you'll have a @font-face rule within your CSS that specifies the font name, the src, and possibly some other font-related properties. In other CSS rules you specify a font-family so elements use your custom font.

Once our browser encounters our page, it:
1. Starts streaming the HTML document, parsing it as it goes
2. Encounters the link element pointing to our CSS file
3. Starts downloading that CSS file, blocking the render of the page until it's complete
4. Parses and applies the contents of that file
5. Finds the @font-face rule with our font URL

Okay let's pause here for a moment — it may make sense for step 6 to be "Starts downloading our font file", however that's not the case. You see, if a browser downloaded every font within a CSS file when it first encountered them, we could end up loading much more than is needed. We could be specifying fonts for multiple different weights, italics, other character sets/languages, or even multiple different fonts. If we don't need all these fonts immediately it would be wasteful to download them all, and doing so may slow down higher-priority CSS or JS. Instead, the browser is more conservative and simply takes note of the font declaration until it's explicitly needed. The browser next:

6. Takes a note of our @font-face declarations and their URLs for later
7. Finishes processing CSS and starts rendering the page
8. Discovers a piece of text on the page that needs our font
9. Finally starts downloading our font now it knows it's needed!

So as we can see there's actually a lot that happens between our HTML file arriving in the browser and our font file being downloaded. This is ideal for lower-priority fonts, but for the main or headline font this process can make our custom font appear surprisingly late in the page load. This is what causes the behaviour we saw in the video above, where the page starts rendering but it takes some time before our custom font appears.
A waterfall graph showing how our custom 'lobster.woff2' font doesn't start being downloaded until 2 seconds into the page load and isn't available until after 3 seconds

This is an intentional decision by browser makers and spec writers to ensure that pages with lots of fonts aren't badly impacted by having to load many font files ahead of time. But that doesn't mean it can't be optimised!

Preloading our font with a link

<link rel="preload" href="/nova-sans.woff2" as="font" type="font/woff2" crossorigin="anonymous" >

The purpose of my favourite HTML snippet is to inform the browser that this font file will be needed with high priority, before it even has knowledge of it. We're building our page and know more about how our fonts are used — so we can provide hints to be less conservative! If we start downloading the font as soon as possible then it can be ready ahead of when the browser 'realises' it's needed. Looking back at our list above, by adding a preload we move the start of the font download from step 9 to step 2!

1. Starts streaming the HTML document, parsing it as it goes
2. Encounters our preload and starts downloading our font file in the background
3. Encounters the link element pointing to our CSS file
4. Continues as above

Taking a closer look at the snippet, we're using a link element and rel="preload" to ask the browser to preload a file with the intention of using it early in the page load. Like a CSS file, we provide the URL with the href attribute. We use as="font" and type="font/woff2" to indicate this is a font file in woff2. For modern browsers woff2 is the only format you need as it's universally supported.

Finally there's crossorigin="anonymous". This comes from the wonderfully transparent and clear world of Cross Origin Resource Sharing. I jest of course, CORS is anything but transparent and clear! For fonts you almost always want crossorigin="anonymous" on your link element, even when the request isn't cross-origin. Omitting this attribute would mean our preload wouldn't be used and the file would be requested again. But why?

Browser requests can be sent either with or without credentials (cookies, etc), and requests to the same URL with and without credentials are fundamentally different. For a preload to be used by the browser, it needs to match the type of request that the browser would have made normally. By default fonts are always requested without credentials, so we need to add crossorigin="anonymous" to ensure our preload matches a normal font request. By omitting this attribute our preload would not be used and the browser would request the font again.

If you're ever unsure of how your preloads are working, check your browsers' devtools. In Chrome the Network pane will show a duplicate request, and the Console will log a warning telling you a preload wasn't used.

Screenshot showing the Chrome devtools Console pane, with warnings for an incorrect font preload

Result and conclusion

By preloading our critical fonts we ensure our browser has the most important fonts available earlier in the page loading process. We can see this by comparing our recording and waterfall charts from earlier:

Side-by-side recording of the same page loading in different ways. 'no-preload' shows a large layout shift caused by the font switching and finishes loading at 4.4s. 'preload' doesn't have a shift and finishes loading at 3.1s.

Side-by-side comparison of two waterfall charts of the same site with font file `lobster.woff2`.
For the 'no-preload' document the font loads after all other assets and finishes at 3s. The 'preload' document shows the font loading much earlier, in parallel with other files and finishing at 2s.

As I mentioned in Getting started with Web Performance, it's best to use preloads sparingly, so limit this to your most important font or two. Remember that it's a balance. By preloading too many resources you run the risk of other high-priority resources such as CSS being slowed down and arriving late. I would recommend preloading just the heading font to start with. With some testing you can see if preloading your main body font is worth it also!

With care, font preloads can be a simple and impactful optimisation opportunity, and this is why it's my favourite HTML snippet! This is a great step to improving font loading, and there are plenty of other web font optimisations to try also!
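For completeness, the self-hosted setup the walkthrough above describes might look roughly like this. The font name and descriptors below are illustrative assumptions built around the article's /nova-sans.woff2 example, not code from the original post:

<style>
  /* Illustrative @font-face declaration for the article's example font file */
  @font-face {
    font-family: "Nova Sans";
    src: url("/nova-sans.woff2") format("woff2");
  }

  /* Elsewhere in the CSS, the custom font is applied via font-family */
  body {
    font-family: "Nova Sans", sans-serif;
  }
</style>

Paired with the preload snippet above, the browser already has the font file in hand by the time this rule first asks for it.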
More in programming
This is a re-publishing of a blog post I originally wrote for work, but wanted on my own blog as well. AI is everywhere, and its impressive claims are leading to rapid adoption. At this stage, I’d qualify it as charismatic technology—something that under-delivers on what it promises, but promises so much that the industry still leverages it because we believe it will eventually deliver on these claims. This is a known pattern. In this post, I’ll use the example of automation deployments to go over known patterns and risks in order to provide you with a list of questions to ask about potential AI solutions. I’ll first cover a short list of base assumptions, and then borrow from scholars of cognitive systems engineering and resilience engineering to list said criteria. At the core of it is the idea that when we say we want humans in the loop, it really matters where in the loop they are. My base assumptions The first thing I’m going to say is that we currently do not have Artificial General Intelligence (AGI). I don’t care whether we have it in 2 years or 40 years or never; if I’m looking to deploy a tool (or an agent) that is supposed to do stuff to my production environments, it has to be able to do it now. I am not looking to be impressed, I am looking to make my life and the system better. Another mechanism I want you to keep in mind is something called the context gap. In a nutshell, any model or automation is constructed from a narrow definition of a controlled environment, which can expand as it gains autonomy, but remains limited. By comparison, people in a system start from a broad situation and narrow definitions down and add constraints to make problem-solving tractable. One side starts from a narrow context, and one starts from a wide one—so in practice, with humans and machines, you end up seeing a type of teamwork where one constantly updates the other: The optimal solution of a model is not an optimal solution of a problem unless the model is a perfect representation of the problem, which it never is. — Ackoff (1979, p. 97) Because of that mindset, I will disregard all arguments of “it’s coming soon” and “it’s getting better real fast” and instead frame what current LLM solutions are shaped like: tools and automation. As it turns out, there are lots of studies about ergonomics, tool design, collaborative design, where semi-autonomous components fit into sociotechnical systems, and how they tend to fail. Additionally, I’ll borrow from the framing used by people who study joint cognitive systems: rather than looking only at the abilities of what a single person or tool can do, we’re going to look at the overall performance of the joint system. This is important because if you have a tool that is built to be operated like an autonomous agent, you can get weird results in your integration. You’re essentially building an interface for the wrong kind of component—like using a joystick to ride a bicycle. This lens will assist us in establishing general criteria about where the problems will likely be without having to test for every single one and evaluate them on benchmarks against each other. Questions you'll want to ask The following list of questions is meant to act as reminders—abstracting away all the theory from research papers you’d need to read—to let you think through some of the important stuff your teams should track, whether they are engineers using code generation, SREs using AIOps, or managers and execs making the call to adopt new tooling. 
Are you better even after the tool is taken away? An interesting warning comes from studying how LLMs function as learning aides. The researchers found that people who trained using LLMs tended to fail tests more when the LLMs were taken away compared to people who never studied with them, except if the prompts were specifically (and successfully) designed to help people learn. Likewise, it’s been known for decades that when automation handles standard challenges, the operators expected to take over when they reach their limits end up worse off and generally require more training to keep the overall system performant. While people can feel like they’re getting better and more productive with tool assistance, it doesn’t necessarily follow that they are learning or improving. Over time, there’s a serious risk that your overall system’s performance will be limited to what the automation can do—because without proper design, people keeping the automation in check will gradually lose the skills they had developed prior. Are you augmenting the person or the computer? Traditionally successful tools tend to work on the principle that they improve the physical or mental abilities of their operator: search tools let you go through more data than you could on your own and shift demands to external memory, a bicycle more effectively transmits force for locomotion, a blind spot alert on your car can extend your ability to pay attention to your surroundings, and so on. Automation that augments users therefore tends to be easier to direct, and sort of extends the person’s abilities, rather than acting based on preset goals and framing. Automation that augments a machine tends to broaden the device’s scope and control by leveraging some known effects of their environment and successfully hiding them away. For software folks, an autoscaling controller is a good example of the latter. Neither is fundamentally better nor worse than the other—but you should figure out what kind of automation you’re getting, because they fail differently. Augmenting the user implies that they can tackle a broader variety of challenges effectively. Augmenting the computers tends to mean that when the component reaches its limits, the challenges are worse for the operator. Is it turning you into a monitor rather than helping build an understanding? If your job is to look at the tool go and then say whether it was doing a good or bad job (and maybe take over if it does a bad job), you’re going to have problems. It has long been known that people adapt to their tools, and automation can create complacency. Self-driving cars that generally self-drive themselves well but still require a monitor are not effectively monitored. Instead, having AI that supports people or adds perspectives to the work an operator is already doing tends to yield better long-term results than patterns where the human learns to mostly delegate and focus elsewhere. (As a side note, this is why I tend to dislike incident summarizers. Don’t make it so people stop trying to piece together what happened! Instead, I prefer seeing tools that look at your summaries to remind you of items you may have forgotten, or that look for linguistic cues that point to biases or reductive points of view.) Does it pigeonhole what you can look at? When evaluating a tool, you should ask questions about where the automation lands: Does it let you look at the world more effectively? Does it tell you where to look in the world? Does it force you to look somewhere specific? 
Does it tell you to do something specific? Does it force you to do something? This is a bit of a hybrid between “Does it extend you?” and “Is it turning you into a monitor?” The five questions above let you figure that out. As the tool becomes a source of assertions or constraints (rather than a source of information and options), the operator becomes someone who interacts with the world from inside the tool rather than someone who interacts with the world with the tool’s help. The tool stops being a tool and becomes a representation of the whole system, which means whatever limitations and internal constraints it has are then transmitted to your users. Is it a built-in distraction? People tend to do multiple tasks over many contexts. Some automated systems are built with alarms or alerts that require stealing someone’s focus, and unless they truly are the most critical thing their users could give attention to, they are going to be an annoyance that can lower the effectiveness of the overall system. What perspectives does it bake in? Tools tend to embody a given perspective. For example, AIOps tools that are built to find a root cause will likely carry the conceptual framework behind root causes in their design. More subtly, these perspectives are sometimes hidden in the type of data you get: if your AIOps agent can only see alerts, your telemetry data, and maybe your code, it will rarely be a source of suggestions on how to improve your workflows because that isn’t part of its world. In roles that are inherently about pulling context from many disconnected sources, how on earth is automation going to make the right decisions? And moreover, who’s accountable for when it makes a poor decision on incomplete data? Surely not the buyer who installed it! This is also one of the many ways in which automation can reinforce biases—not just based on what is in its training data, but also based on its own structure and what inputs were considered most important at design time. The tool can itself become a keyhole through which your conclusions are guided. Is it going to become a hero? A common trope in incident response is heroes—the few people who know everything inside and out, and who end up being necessary bottlenecks to all emergencies. They can’t go away for vacation, they’re too busy to train others, they develop blind spots that nobody can fix, and they can’t be replaced. To avoid this, you have to maintain a continuous awareness of who knows what, and crosstrain each other to always have enough redundancy. If you have a team of multiple engineers and you add AI to it, having it do all of the tasks of a specific kind means it becomes a de facto hero to your team. If that’s okay, be aware that any outages or dysfunction in the AI agent would likely have no practical workaround. You will essentially have offshored part of your ops. Do you need it to be perfect? What a thing promises to be is never what it is—otherwise AWS would be enough, and Kubernetes would be enough, and JIRA would be enough, and the software would work fine with no one needing to fix things. That just doesn’t happen. Ever. Even if it’s really, really good, it’s gonna have outages and surprises, and it’ll mess up here and there, no matter what it is. We aren’t building an omnipotent computer god, we’re building imperfect software. You’ll want to seriously consider whether the tradeoffs you’d make in terms of quality and cost are worth it, and this is going to be a case-by-case basis. 
Just be careful not to fix the problem by adding a human in the loop that acts as a monitor! Is it doing the whole job or a fraction of it? We don’t notice major parts of our own jobs because they feel natural. A classic pattern here is one of AIs getting better at diagnosing patients, except the benchmarks are usually run on a patient chart where most of the relevant observations have already been made by someone else. Similarly, we often see AI pass a test with flying colors while it still can’t be productive at the job the test represents. People in general have adopted a model of cognition based on information processing that’s very similar to how computers work (get data in, think, output stuff, rinse and repeat), but for decades, there have been multiple disciplines that looked harder at situated work and cognition, moving past that model. Key patterns of cognition are not just in the mind, but are also embedded in the environment and in the interactions we have with each other. Be wary of acquiring a solution that solves what you think the problem is rather than what it actually is. We routinely show we don’t accurately know the latter. What if we have more than one? You probably know how straightforward it can be to write a toy project on your own, with full control of every refactor. You probably also know how this stops being true as your team grows. As it stands today, a lot of AI agents are built within a snapshot of the current world: one or few AI tools added to teams that are mostly made up of people. By analogy, this would be like everyone selling you a computer assuming it were the first and only electronic device inside your household. Problems arise when you go beyond these assumptions: maybe AI that writes code has to go through a code review process, but what if that code review is done by another unrelated AI agent? What happens when you get to operations and common mode failures impact components from various teams that all have agents empowered to go fix things to the best of their ability with the available data? Are they going to clash with people, or even with each other? Humans also have that ability and tend to solve it via processes and procedures, explicit coordination, announcing what they’ll do before they do it, and calling upon each other when they need help. Will multiple agents require something equivalent, and if so, do you have it in place? How do they cope with limited context? Some changes that cause issues might be safe to roll back, some not (maybe they include database migrations, maybe it is better to be down than corrupting data), and some may contain changes that rolling back wouldn’t fix (maybe the workload is controlled by one or more feature flags). Knowing what to do in these situations can sometimes be understood from code or release notes, but some situations can require different workflows involving broader parts of the organization. A risk of automation without context is that if you have situations where waiting or doing little is the best option, then you’ll need to either have automation that requires input to act, or a set of actions to quickly disable multiple types of automation as fast as possible. Many of these may exist at the same time, and it becomes the operators’ jobs to not only maintain their own context, but also maintain a mental model of the context each of these pieces of automation has access to. The fancier your agents, the fancier your operators’ understanding and abilities must be to properly orchestrate them. 
The more surprising your landscape is, the harder it can become to manage with semi-autonomous elements roaming around. After an outage or incident, who does the learning and who does the fixing? One way to track accountability in a system is to figure out who ends up having to learn lessons and change how things are done. It’s not always the same people or teams, and generally, learning will happen whether you want it or not. This is more of a rhetorical question right now, because I expect that in most cases, when things go wrong, whoever is expected to monitor the AI tool is going to have to steer it in a better direction and fix it (if they can); if it can’t be fixed, then the expectation will be that the automation, as a tool, will be used more judiciously in the future. In a nutshell, if the expectation is that your engineers are going to be doing the learning and tweaking, your AI isn’t an independent agent—it’s a tool that cosplays as an independent agent. Do what you will—just be mindful All in all, none of the above questions flat out say you should not use AI, nor where exactly in the loop you should put people. The key point is that you should ask that question and be aware that just adding whatever to your system is not going to substitute workers away. It will, instead, transform work and create new patterns and weaknesses. Some of these patterns are known and well-studied. We don’t have to go rushing to rediscover them all through failures as if we were the first to ever automate something. If AI ever gets so good and so smart that it’s better than all your engineers, it won’t make a difference whether you adopt it only once it’s good. In the meanwhile, these things do matter and have real impacts, so please design your systems responsibly. If you’re interested to know more about the theoretical elements underpinning this post, the following references—on top of whatever was already linked in the text—might be of interest: Books: Joint Cognitive Systems: Foundations of Cognitive Systems Engineering by Erik Hollnagel Joint Cognitive Systems: Patterns in Cognitive Systems Engineering by David D. Woods Cognition in the Wild by Edwin Hutchins Behind Human Error by David D. Woods, Sydney Dekker, Richard Cook, Leila Johannesen, Nadine Sarter Papers: Ironies of Automation by Lisanne Bainbridge The French-Speaking Ergonomists’ Approach to Work Activity by Daniellou How in the World Did We Ever Get into That Mode? Mode Error and Awareness in Supervisory Control by Nadine Sarter Can We Ever Escape from Data Overload? A Cognitive Systems Diagnosis by David D. Woods Ten Challenges for Making Automation a “Team Player” in Joint Human-Agent Activity by Gary Klein and David D. Woods MABA-MABA or Abracadabra? Progress on Human–Automation Co-ordination by Sidney Dekker Managing the Hidden Costs of Coordination by Laura Maguire Designing for Expertise by David D. Woods The Impact of Generative AI on Critical Thinking by Lee et al.
AMD is sending us the two MI300X boxes we asked for. They are in the mail. It took a bit, but AMD passed my cultural test. I now believe they aren’t going to shoot themselves in the foot on software, and if that’s true, there’s absolutely no reason they should be worth 1/16th of NVIDIA. CUDA isn’t really the moat people think it is, it was just an early ecosystem. tiny corp has a fully sovereign AMD stack, and soon we’ll port it to the MI300X. You won’t even have to use tinygrad proper, tinygrad has a torch frontend now. Either NVIDIA is super overvalued or AMD is undervalued. If the petaflop gets commoditized (tiny corp’s mission), the current situation doesn’t make any sense. The hardware is similar, AMD even got the double throughput Tensor Cores on RDNA4 (NVIDIA artificially halves this on their cards, soon they won’t be able to). I’m betting on AMD being undervalued, and that the demand for AI has barely started. With good software, the MI300X should outperform the H100. In for a quarter million. Long term. It can always dip short term, but check back in 5 years.
Earlier this week I didn't yet know how we would handle untagged allocations when getting Guile onto Whippet. But now I do! Today's note is about how we can support untagged allocations of a few different kinds in Whippet's mostly-marking collector.

Why bother supporting untagged allocations at all? Well, if I had my way, I wouldn't; I would just slog through Guile and fix all uses to be tagged. There are only a finite number of use sites and I could get to them all in a month or so. The problem comes for uses of scm_gc_malloc from outside libguile itself, in C extensions and embedding programs. These users are loath to adapt to any kind of change, and garbage-collection-related changes are the worst. So, somehow, we need to support these users if we are not to break the Guile community.

The problem with scm_gc_malloc, though, is that it is missing an expression of intent, notably as regards tagging. You can use it to allocate an object that has a tag and thus can be traced precisely, or you can use it to allocate, well, anything else. I think we will have to add an API for the tagged case and assume that anything that goes through scm_gc_malloc is requesting an untagged, conservatively-scanned block of memory. Similarly for scm_gc_malloc_pointerless: you could be allocating a tagged object that happens to not contain pointers, or you could be allocating an untagged array of whatever. A new API is needed there too for pointerless untagged allocations.

Recall that the mostly-marking collector can be built in a number of different ways: it can support conservative and/or precise roots, it can trace the heap precisely or conservatively, it can be generational or not, and the collector can use multiple threads during pauses or not. Consider a basic configuration with precise roots. You can make tagged pointerless allocations just fine: the trace function for that tag is just trivial. You would like to extend the collector with the ability to make untagged pointerless allocations, for raw data. How to do this?

Consider first that when the collector goes to trace an object, it can't use bits inside the object to discriminate between the tagged and untagged cases. Fortunately though, the main space of the mostly-marking collector has one metadata byte for each 16 bytes of payload. Of those 8 bits, 3 are used for the mark (five different states, allowing for future concurrent tracing), two for the precise field-logging write barrier, one to indicate whether the object is pinned or not, and one to indicate the end of the object, so that we can determine object bounds just by scanning the metadata byte array. That leaves 1 bit, and we can use it to indicate untagged pointerless allocations. Hooray!

However there is a wrinkle: when Whippet decides that it should evacuate an object, it tracks the evacuation state in the object itself; the embedder has to provide an implementation of a little state machine, allowing the collector to detect whether an object is forwarded or not, to claim an object for forwarding, to commit a forwarding pointer, and so on. We can't do that for raw data, because all bit states belong to the object, not the collector or the embedder. So, we have to set the "pinned" bit on the object, indicating that these objects can't move. We could in theory manage the forwarding state in the metadata byte, but we don't have the bits to do that currently; maybe some day. For now, untagged pointerless allocations are pinned.

You might also want to support untagged allocations that contain pointers to other GC-managed objects.
In this case you would want these untagged allocations to be scanned conservatively. We can do this, but if we do, it will pin all objects.

Thing is, conservative stack roots is a kind of a sweet spot in language run-time design. You get to avoid constraining your compiler, you avoid a class of bugs related to rooting, but you can still support compaction of the heap. How is this, you ask? Well, consider that you can move any object for which we can precisely enumerate the incoming references. This is trivially the case for precise roots and precise tracing. For conservative roots, we don't know whether a given edge is really an object reference or not, so we have to conservatively avoid moving those objects. But once you are done tracing conservative edges, any live object that hasn't yet been traced is fair game for evacuation, because none of its predecessors have yet been visited.

But once you add conservatively-traced objects back into the mix, you don't know when you are done tracing conservative edges; you could always discover another conservatively-traced object later in the trace, so you have to pin everything.

The good news, though, is that we have gained an easier migration path. I can now shove Whippet into Guile and get it running even before I have removed untagged allocations. Once I have done so, I will be able to allow for compaction / evacuation; things only get better from here. Also as a side benefit, the mostly-marking collector's heap-conservative configurations are now faster, because we have metadata attached to objects which allows tracing to skip known-pointerless objects. This regains an optimization that BDW has long had via its GC_malloc_atomic, used in Guile since time out of mind.

With support for untagged allocations, I think I am finally ready to start getting Whippet into Guile itself. Happy hacking, and see you on the other side!
I've been working on a project where I need to plot points on a map. I don't need an interactive or dynamic visualisation – just a static map with coloured dots for each coordinate. I've created maps on the web using Leaflet.js, which loads map data from OpenStreetMap (OSM) and supports zooming and panning – but for this project, I want a standalone image rather than something I embed in a web page. I want to put in coordinates, and get a PNG image back.

This feels like it should be straightforward. There are lots of Python libraries for data visualisation, but it's not an area I've ever explored in detail. I don't know how to use these libraries, and despite trying I couldn't work out how to accomplish this seemingly simple task. I made several attempts with libraries like matplotlib and plotly, but I felt like I was fighting the tools. Rather than persist, I wrote my own solution with "lower level" tools.

The key was a page on the OpenStreetMap wiki explaining how to convert lat/lon coordinates into the pixel system used by OSM tiles. In particular, it allowed me to break the process into two steps:

1. Get a "base map" image that covers the entire world
2. Convert lat/lon coordinates into xy coordinates that can be overlaid on this image

Let's go through those steps.

Get a "base map" image that covers the entire world

Let's talk about how OpenStreetMap works, and in particular their image tiles. If you start at the most zoomed-out level, OSM represents the entire world with a single 256×256 pixel square. This is the Web Mercator projection, and you don't get much detail – just a rough outline of the world. We can zoom in, and this tile splits into four new tiles of the same size. There are twice as many pixels along each edge, and each tile has more detail. Notice that country boundaries are visible now, but we can't see any names yet. We can zoom in even further, and each of these tiles splits again. There still aren't any text labels, but the map is getting more detailed and we can see small features that weren't visible before.

You get the idea – we could keep zooming, and we'd get more and more tiles, each with more detail. This tile system means you can get detailed information for a specific area, without loading the entire world. For example, if I'm looking at street information in Britain, I only need the detailed tiles for that part of the world. I don't need the detailed tiles for Bolivia at the same time.

OpenStreetMap will only give you 256×256 pixels at a time, but we can download every tile and stitch them together, one-by-one. Here's a Python script that enumerates all the tiles at a particular zoom level, downloads them, and uses the Pillow library to combine them into a single large image:

#!/usr/bin/env python3
"""
Download all the map tiles for a particular zoom level from OpenStreetMap,
and stitch them into a single image.
"""

import io
import itertools

import httpx
from PIL import Image

zoom_level = 2

width = 256 * 2**zoom_level
height = 256 * (2**zoom_level)

im = Image.new("RGB", (width, height))

for x, y in itertools.product(range(2**zoom_level), range(2**zoom_level)):
    resp = httpx.get(f"https://tile.openstreetmap.org/{zoom_level}/{x}/{y}.png", timeout=50)
    resp.raise_for_status()

    im_buffer = Image.open(io.BytesIO(resp.content))
    im.paste(im_buffer, (x * 256, y * 256))

out_path = f"map_{zoom_level}.png"
im.save(out_path)
print(out_path)

The higher the zoom level, the more tiles you need to download, and the larger the final image will be.
I ran this script up to zoom level 6, and this is the data involved:

Zoom level | Number of tiles | Pixels      | File size
0          | 1               | 256×256     | 17.1 kB
1          | 4               | 512×512     | 56.3 kB
2          | 16              | 1024×1024   | 155.2 kB
3          | 64              | 2048×2048   | 506.4 kB
4          | 256             | 4096×4096   | 2.7 MB
5          | 1,024           | 8192×8192   | 13.9 MB
6          | 4,096           | 16384×16384 | 46.1 MB

I can just about open that zoom level 6 image on my computer, but it's struggling. I didn't try opening zoom level 7 – that includes 16,384 tiles, and I'd probably run out of memory. For most static images, zoom level 3 or 4 should be sufficient – I ended up using a base map from zoom level 4 for my project. It takes a minute or so to download all the tiles from OpenStreetMap, but you only need to request it once, and then you have a static image you can use again and again.

This is a particularly good approach if you want to draw a lot of maps. OpenStreetMap is provided for free, and we want to be a respectful user of the service. Downloading all the map tiles once is more efficient than making repeated requests for the same data.

Overlay lat/lon coordinates on this base map

Now we have an image with a map of the whole world, we need to overlay our lat/lon coordinates as points on this map. I found instructions on the OpenStreetMap wiki which explain how to convert GPS coordinates into a position on the unit square, which we can in turn add to our map. They outline a straightforward algorithm, which I implemented in Python:

import math


def convert_gps_coordinates_to_unit_xy(
    *, latitude: float, longitude: float
) -> tuple[float, float]:
    """
    Convert GPS coordinates to positions on the unit square, which
    can be plotted on a Web Mercator projection of the world.

    This expects the coordinates to be specified in **degrees**.

    The result will be (x, y) coordinates:

    -   x will fall in the range (0, 1).
        x=0 is the left (180° west) edge of the map.
        x=1 is the right (180° east) edge of the map.
        x=0.5 is the middle, the prime meridian.

    -   y will fall in the range (0, 1).
        y=0 is the top (north) edge of the map, at 85.0511 °N.
        y=1 is the bottom (south) edge of the map, at 85.0511 °S.
        y=0.5 is the middle, the equator.
    """
    # This is based on instructions from the OpenStreetMap Wiki:
    # https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames#Example:_Convert_a_GPS_coordinate_to_a_pixel_position_in_a_Web_Mercator_tile
    # (Retrieved 16 January 2025)

    # Convert the coordinate to the Web Mercator projection
    # (https://epsg.io/3857)
    #
    #     x = longitude
    #     y = arsinh(tan(latitude))
    #
    x_webm = longitude
    y_webm = math.asinh(math.tan(math.radians(latitude)))

    # Transform the projected point onto the unit square
    #
    #     x = 0.5 + x / 360
    #     y = 0.5 - y / 2π
    #
    x_unit = 0.5 + x_webm / 360
    y_unit = 0.5 - y_webm / (2 * math.pi)

    return x_unit, y_unit

Their documentation includes a worked example using the coordinates of the Hachiko Statue. We can run our code, and check we get the same results:

>>> convert_gps_coordinates_to_unit_xy(latitude=35.6590699, longitude=139.7006793)
(0.8880574425, 0.39385379958274735)

Most users of OpenStreetMap tiles will use these unit positions to select the tiles they need, and then download those images – but we can also position these points directly on the global map. I wrote some more Pillow code that converts GPS coordinates to these unit positions, scales those unit positions to the size of the entire map, then draws a coloured circle at each point on the map.
Here's the code:

from PIL import Image, ImageDraw

gps_coordinates = [
    # Hachiko Memorial Statue in Tokyo
    {"latitude": 35.6590699, "longitude": 139.7006793},
    # Greyfriars Bobby in Edinburgh
    {"latitude": 55.9469224, "longitude": -3.1913043},
    # Fido Statue in Tuscany
    {"latitude": 43.955101, "longitude": 11.388186},
]

im = Image.open("base_map.png")
draw = ImageDraw.Draw(im)

for coord in gps_coordinates:
    x, y = convert_gps_coordinates_to_unit_xy(**coord)

    radius = 32

    draw.ellipse(
        [
            x * im.width - radius,
            y * im.height - radius,
            x * im.width + radius,
            y * im.height + radius,
        ],
        fill="red",
    )

im.save("map_with_dots.png")

and here's the map it produces:

The nice thing about writing this code in Pillow is that it's a library I already know how to use, and so I can customise it if I need to. I can change the shape and colour of the points, or crop to specific regions, or add text to the image. I'm sure more sophisticated data visualisation libraries can do all this, and more – but I wouldn't know how. The downside is that if I need more advanced features, I'll have to write them myself. I'm okay with that – trading sophistication for simplicity. I didn't need to learn a complex visualization library – I was able to write code I can read and understand. In a world full of AI-generated code, writing something I know I understand feels more important than ever.
This website has a new section: blogroll.opml! A blogroll is a list of blogs - a lightweight way of people recommending other people’s writing on the indieweb. What it includes The blogs that I included are just sampled from my many RSS subscriptions that I keep in my Feedbin reader. I’m subscribed to about 200 RSS feeds, the majority of which are dead or only publish once a year. I like that about blogs, that there’s no expectation of getting a post out every single day, like there is in more algorithmically-driven media. If someone who I interacted with on the internet years ago decides to restart their writing, that’s great! There’s no reason to prune all the quiet feeds. The picks are oriented toward what I’m into: niches, blogs that have a loose topic but don’t try to be general-interest, people with distinctive writing. If you import all of the feeds into your RSS reader, you’ll probably end up unsubscribing from some of them because some of the experimental electric guitar design or bonsai news is not what you’re into. Seems fine, or you’ll discover a new interest! How it works Ruben Schade figured out a brilliant way to show blogrolls and I copied him. Check out his post on styling OPML and RSS with XSLT to XHTML for how it works. My only additions to that scheme were making the blogroll page blend into the rest of the website by using an include tag with Jekyll to add the basic site skeleton, and adding a link with the download attribute to provide a simple way to download the OPML file. Oddly, if you try to save the OPML page using Save as… in Firefox, Firefox will save the transformed output via the XSLT, rather than the raw source code. XSLT is such an odd and rare part of the web ecosystem, I had to use it.
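For reference, the download link described above only needs the standard download attribute on an ordinary anchor, something like this (path and link text assumed):

<a href="/blogroll.opml" download>Download the OPML file</a> <!-- illustrative; point href at wherever the OPML file lives -->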