Full Width [alt+shift+f] Shortcuts [alt+shift+k]
Sign Up [alt+shift+s] Log In [alt+shift+l]
6
After I wrote YARR (Yet Another Rust Resource, with requisite pirate mentions), one of my friends tried it out. He gave me some really useful insights as he went through it, letting me see what was hard about learning Rust from a newcomer's perspective. Unsurprisingly, lifetimes are a challenge—and seeing him go through it helped me understand why they're hard to learn. Here are a few of the challenges he ran into. I don't think that these are necessarily problems, but they're perhaps opportunities to improve educational materials. They don't map 100% to how long a variable is in memory My friend gave me an example he's seen a few times when people explain lifetimes. fn longest<'a>(x: &'a str, y: &'a str) -> &'a str { if x.len() > y.len() { x } else { y } } And for many newcomers, you see this and you expect it is saying that x and y both have the lifetime 'a, so they live the same amount of time. But the following is valid: fn print_longest(x: &'static...
6 days ago

Improve your reading experience

Logged in users get linked directly to articles resulting in a better reading experience. Please login for free, it takes less than 1 minute.

More from ntietz.com blog - technically a blog

Your product shouldn't require showing my legal name

Last week, I finally got verified on LinkedIn. Now there's a little badge next to my name that says "yes, she's a human who is legally named Nicole." Their marketing for verification says that I should now expect 60% more profile views and 50% more comments and reactions. For a writer like me, that seems great. More people viewing my content means more people can learn from me, or be entertained by me. And all that for free? There's a problem, of course. Nicole is my legal name, so I was able to get verified as a result. But many people don't go by their legal name. Other names are common So who doesn't go by their legal name? I didn't, for years after my wife and I got married. I went by an alias—our hyphenated last name—without legally changing it. I wasn't eligible to get verified then, since that name was not on my ID. I didn't, when I came out as transgender. It takes time to change your name and update your documents. Until that was complete, I would have had to go by my deadname or lose verification. And what about women who change their name when they get married, but go by their previous name professionally? This is an alias, and it is their name even though it's not what the government knows them as. But they would lose verification for doing this. Anyone who goes by a nickname or alias is ineligible. I have many friends who go by a different name than what their ID shows. This isn't fraudulent—it just reflects who they are. Penalized for being yourself And yet, if you fall into any case where you cannot get verified, then you can't get the benefits. You can't get your extra profile views and extra comments. Or put another way: you're penalized for not being verified. You'll get about 40% fewer profile views and 33% fewer comments/reactions than people who are verified. You get marginalized, unable to reap the full benefits of the platform, if you don't conform to a very particular outlook on what a name is (the official sequence of letters on your ID). To be clear, the problem isn't the verification process itself1. That process (and its associated benefits) may be in place to deal with bot traffic. I can sympathize with this, and I do want lower bot traffic—it makes platforms much more pleasant to use. Let us use our names The problem is that you have to show your legal name to everyone. There should be a process for being verified without it being your legal name on display. This process doesn't have to be scalable if the group that would utilize it is small—since surely they'd only forget about small populations2. It can be as simple as filing a help ticket and allowing a human to approve it based on some evidence. Is your name consistent across your public profiles? And you're a human being? Cool, verified. I believe that LinkedIn can, and should, do better. This feature as implemented is harming marginalized folks who are not able to get the same visibility when they cannot get verified. It reduces the exposure that marginalized creators can get. You should keep this in mind in the products you make, too. Don't require people to display their legal names. And before you even collect that data, think about what problem you're trying to solve. Do you need to collect legal names to solve that? (Probably not.) If so, do you need to store them after processing once? (Probably not.) And if so, do you need to display them publicly? (Probably not.) Names are so much more than what the government knows us by. Let us be our true selves and verify us with our true names. 1 It's not free of problems, though: I'd like to have a way to achieve the same result ("she's a human! she's generally the internet person she claims she is!") without showing my government identity documents to a third party. 2 Though, companies have been known to marginalize large groups of people. This is a rhetorical point that they are either harming a lot of people or they could solve the problem for a low cost.

a week ago 14 votes
Can I ethically use LLMs?

The title is not a rhetorical question, and I'm not going to bury an answer. I don't have an answer. This post is my exploration of the question, and why I think it is a question1. Important things up front: what's my relationship with LLMs today? I don't use any LLMs regularly. I do have access to GitHub Copilot through my employer. I have it available on a hotkey, I think, and I cannot remember how to trigger it since I do not use it. I've explored using LLMs in the past. I used to be a regular Copilot user, and I explored ChatGPT, Claude, etc. to see what their capabilities were. I have done trainings for my coworkers on how to use them effectively, though I would not feel comfortable doing so now. My employer's product uses LLMs. I don't want to link to my employer, but yeah, I guess my paycheck depends on being okay with integrating them in? It's complicated (a refrain). (This post is, obviously, my opinions and does not reflect my employer.) I don't think using or not using them is a moral failing. There is a lot of moralizing around LLM usage. I'm not doing any of that here. I have my own beliefs (or my own questions), but I don't think people using LLMs are immoral (or vice versa). So, you can see I have used them and I'm not absolutist, but I don't use them today. Why not? Why did I stop using them in spite of the advances, where they're more capable than ever? It's because of these questions and issues. Where I have undecided ethical questions, I lean toward the more conservative2 choice of not using them until I have clarity on the ethics. (Note: I am not inviting folks to email me with answers to this question.) Energy usage Another technology that uses a lot of energy is blockchains. I think using public blockchains is almost universally unethical since there are other, better, less harmful options. Part of the harm from blockchains is an absurd amount of energy usage. LLMs also use a lot of energy. This can be split into training and inference energy usage. These vary based on the model. Some models can run locally on Apple silicon, and those are lower energy usage—their upper bound is running your computer full tilt, and an M4 Mac mini's max power consumption is 65 watts. This is roughly equivalent to one incandescent light bulb, or 8 LED light bulbs. It's good to turn off unnecessary lights, but doing so isn't going to solve the climate crisis; we need bigger, more sweeping reforms. I don't think that local models are going to significantly alter the climate crisis in either direction. Other models are massive and run in data centers on lots of power-hungry GPUs3. These data centers also require construction, and that comes with its own environmental impact. An article from Tom's Guide last year showed that "a single query on ChatGPT-4 can use up to 3 bottles of water, that a year of queries uses enough electricity to power over nine houses". A lot of the cost comes from new data centers being built. The demand for LLMs has led to more demand for power generation and more demand for data centers. And this new power generation is coming from gas-fired plants instead of sustainable, clean energy sources, because that's all we can build fast enough. A lot of attention is given to the training side. The numbers for training are large and shocking: Llama 3 used 500 mWh and GPT-3 training used 1,287 mWh—even more if you include the cost of training failed models which preceded these, the experiments that made the models possible. The listed figures are high, and 500 mWh is about the cost of a large jet flying for 7 hours. But we do it once per foundational model, and then the cost is spread across all the remaining usage. I don't think that the training side is significantly shifting the equation on climate change. We'd have a much larger impact on improving the climate crisis by advocating for remote work—reducing vehicles on the road, making many flights unnecessary—than by not training models. Overall it feels to me like local models have a clearly acceptable impact, and data center models have higher energy usage but still probably do not change the situation very much. Training data The training data for LLMs has largely been lifted without the consent of the people who generated that data. This is a lot of writing, music, videos, visual art, all of it. There are some attempts at there at using licensed data only, but the majority of models, and the most popular ones, are unlicensed data. Now the question is: is this an ethical problem? I know there are opinions on both sides of this. Some say that this data is publicly visible on the internet (though some of the data was not on the internet), and so it's fair game. Others say that this use isn't one that people consented to, and it should require that consent. My thought experiment is this: If we made search engines today, would people have this same objection to a search engine using their data without their express consent? I think most people would ultimately support search engines. They are different than LLMs, because they (mostly) serve results that point you to the original source, rather than create new content for you to replace the original sources with. But maybe people would reject search engines. And maybe consistency between these two isn't necessary, or maybe it's a false consistency—there could be other differentiating details that lead to different answers. Where I come down is that I think we need a robust mechanism to opt out of use of data for training, but that it's probably fine to train with publicly available data on the internet. What you do with the trained model is another question entirely. When you try to replace people making original works instead of creating an entirely new function, that's where you get questions. "Replacing" people There's a lot of LLM usage that is just trying to replace people in entire jobs. I mean, it feels like all of it. We see LLMs that are meant to replace writers and editors, artists and illustrators, musicians and songwriters. It doesn't say that directly—it says you're empowered to create things yourself. But what it means is that people should be able to press a few buttons and, with no artist involved, get out a beautiful artwork. Sounds like replacing to me. This is something we've long done with technology. We make technology that puts people out of jobs, and that was the whole industrial revolution. That's what happened with shipping from the containerization of ports, putting many dockworkers out of jobs. Replacing people in the abstract is not unethical. The problem is if we fail to deal with the harm created from replacing people. And people will be harmed, because losing your job or the value of it going down has a serious impact on quality of life. When we put people out of work, we—both society and technologists—have an ethical responsibility to ensure there's a plan to mitigate the harm from that. Maybe that means grants for living expenses while people switch fields, if put out of work in an LLM-heavy industry. Maybe that means a universal basic income. But it certainly doesn't mean doing nothing. Incorrect information and bias One of the major well-known problems of LLMs is a tendency to "hallucinate," or to confidently state facts that are made up from whole cloth. They also have an unknown amount of bias, with unknown mitigations in place, due to being closed systems. This is a big problem! We're not good at seeing what information is incorrect in something that's generated to look like it's the most likely string of tokens. If there is incorrect information in there, we'll just miss it. This means that people can make poor decisions on the basis of what an LLM tells them. They can have lost income due to its mistakes. And the bias? That's a huge problem, because it means that we don't know if this system will reinforce existing problematic norms4. We don't know what it will reinforce, because the training data is closed and there's not a lot of public evaluation on bias. So ultimately, we're left with an unknown harm of unknown magnitude to an unknown population5. Concentrating power (with the wealthy elite) A big ethical concern for me is also what this will do to our entire society. Many technologies are heralded as "democratizing" things. Spotify "democratized" music by making it so that anyone can get listens—but, y'all, it ended up flattening the tail and making the popular artists more money while making small artists less money. Will LLMs do the same? We know that the big models need large data centers to run their training and inference. And even small models need beefy hardware to run inference, let alone training! We have some access to models which can run locally, which is a good step. But the problem is that we can only run other people's models. They'll have those people's decisions baked in, decisions on which data to include or to exclude, decisions on how to approach questions of bias and abuse. And when the hardware to run the biggest models is only accessible to a few companies, that means that those companies really get a lot more powerful. OpenAI and Anthropic and Google and Meta all have the ability to run really large models. I certainly don't, though. This means that a technology that many are heralding as making things more accessible to everyone is controlled by a small handful of people. A small handful of people can decide how the models are trained, and set policies on how they're used. In a time when the US government is trying to get any paper retracted that mentioned queer people, and erasing trans people from the Stonewall monument, it feels self-evident that letting a small group of people control this technology imperils the future of many people. * * * Ultimately, I want robots to do the things I don't want to do. I want them to do my dishes and my laundry. I don't want them to play music instead of me, write code instead of me, write words instead of me. I am not sure whether or not using LLMs is unethical. There are certainly ways of using them which are unquestionably unethical—as is true with every technology. And there are ways of developing LLMs which are unethical—as is true with every technology. But the problems with them are large. I think it is unethical to use them without addressing the ethical questions above. If you're not working on mitigating the harms from LLMs (which do exist), then you might be doing something unethical. 1 I've been in the interesting situation of having anti-LLM people think I'm pro-LLM and vice versa. It's a very weird feeling, and makes me a little nervous of posting this! But this is an important question and an important conversation. 2 Footnoting out of an abundance of caution: I don't meant conservative-like-Republicans, because, no, I'd like to keep my rights thank you very much. I just mean in terms of minimizing risk. Let's please stop attacking trans rights, immigrants, Palestinians, and you know, everyone else that I've forgotten because the whole world seems to be on fire. 3 If we ever achieve artificial sentience, this sentence may acquire a second, more sinister, meaning. 4 It will. 5 My guess is a large harm to all underrepresented groups, but who asked the trans woman?

2 weeks ago 18 votes
What's in a ring buffer? And using them in Rust

Working on my cursed MIDI project, I needed a way to store the most recent messages without risking an unbounded amount of memory usage. I turned to the trusty ring buffer for this! I started by writing a very simple one myself in Rust, then I looked and what do you know, of course the standard library has an implementation. While I was working on this project, I was pairing with a friend. She asked me about ring buffers, and that's where this post came from! What's a ring buffer? Ring buffers are known by a few names. Circular queues and circular buffers are the other two I hear the most. But this name just has a nice ring to it. It's an array, basically: a buffer, or a queue, or a list of items stacked up against each other. You can put things into it and take things out of it the same as any buffer. But the front of it connects to the back, like any good ring1. Instead of adding on the end and popping from the end it, like a stack, you can add to one end and remove from the start, like a queue. And as you add or remove things, where the start and end of the list move around. Your data eats its own tail. This lets us keep a fixed number of elements in the buffer without running into reallocation. In a regular old buffer, if you use it as a queue—add to the end, remove from the front—then you'll eventually need to either reallocate the entire thing or shift all the elements over. Instead, a ring buffer lets you just keep adding from either end and removing from either end and you never have to reallocate! Uses for a ring buffer My MIDI program is one example of where you'd want a ring buffer, to keep the most recent items. There are some general situations where you'll run into this: You want a fixed number of the most recent things, like the last 50 items seen You want a queue, especially with a fixed maximum number of queued elements2 You want a buffer for data coming in with an upper bound, like with streaming audio, and you want to overwrite old data if the consumer can't keep up for a bit A lot of it comes down to performance, and streaming data. Something is producing data, something else is consuming it, and you want to make sure insertions and removals are fast. That was exactly my use: a MIDI device produces messages, my UI consumes them, but I don't want to fill up all my memory, forever, with more of them. How ring buffers work So how do they work? This is a normal, non-circular buffer: When you add something, you put it on the end. If you're using it as a stack, then you can pop things off the end. But if you're using it as a queue, you pop things off the front... And then you have to shuffle everything to the left. That's why we have ring buffers! I'll draw it as a straight line here, to show how it's laid out in memory. But the end loops back to the front logically, and we'll see how it wraps around. We start with an empty buffer, and start and end both point at the first element. When start == end, we know the buffer is empty. If we insert an element, we move end forward. And it gets repeated as we insert multiple times. If we remove an element, we move start forward. We can also start the buffer at any point, and it crosses over the end gracefully. Ring buffers are pretty simple when you get into their details, with just a few things to wrap your head around. It's an incredibly useful data structure! Rust options If you want to use one in Rust, what are your options? There's the standard library, which includes VecDeque. This implements a ring buffer, and the name comes from "Vec" (Rust's growable array type) combined with "Deque" (a double-ended queue). As this is in the standard library, this is quite accessible from most code, and it's a growable ring buffer. This means that the pop/push operations will always be efficient, but if your buffer is full it will result in a resize operation. The amortized running time will still be O(1) for insertion, but you incur the cost all at once when a resize happens, rather than at each insertion. You can enforce size limits in your code, if you want to avoid resize operations. Here's an example of how you could do that. You check if it's full first, and if so, you remove something to make space for the new element. let buffer: VecDeque<u32> = VecDeque::with_capacity(10); for thing in 0..15 { // if the buffer is already full, remove the first element to make space if buffer.len() == 10 { buffer.pop_front(); } buffer.push_back(thing); } There are also libraries that can enforce this for you! For example, circular-buffer implements a ring buffer. It has a fixed max capacity and won't resize on you, instead overwriting elements when you run out of space. The size is set at compile time, though, which can be great or can be a constraint that's hard to meet. There is also ringbuffer, which gives you a fixed size buffer that's heap allocated at runtime. That buys you some flexibility, with the drawback of being heap-based instead of stack-based. I'm definitely reaching for a library when I need a non-growable ring buffer in Rust now. There are some good choices, and it saves me the trouble of having to manually enforce size limits. 1 One of my favorite rings represents the sun and the moon, with the sun nestling inside this crescent moon shape. Unfortunately, the ring is open there. It makes for a very nice visual effect, but it kept getting snagged on things and bending. So it's not a very good ring for living an active hands-on life. 2 These can also be resizable! The Rust standard library one is. This comes with reallocation cost, which amortizes to a low cost but can be expensive in individual moments.

3 weeks ago 16 votes
Summarizing our recommendations for software documentation

Last year, my coworker Suzanne and I got a case study accepted into an ethnography conference! She's an anthropologist by training, and I'm a software engineer. How'd we wind up there? The short version is I asked her what I thought was an interesting question, and she thought it was an interesting question, and we wound up doing research. We have a lot more to go on the original question—we're studying the relationship between the values we hold as engineers and our documentation practices—and this case study is the first step in that direction. The title of the case study is a mouthful: "Foundations of Practice: An Ethnographic Exploration of Software Documentation and its Cultural Implications for an Agile Startup". I mean, it's intended for a more academic audience, and in a field that's not software engineering. But what we studied is relevant for software engineers, so let's talk about it here in more familiar language! If you want to read the full case study, it's open access. I'll start with the recommendations we provided, share a little bit about our methodology, and finish us out with a little about where we can go from here. Our recommendations The case study concluded with recommendations. This is an area I'd like us to do a lot more work in, because recommendations based on a case study are based on one example, and they'd obviously be much stronger based on more data! But for the conference submission, they were a necessary component. I think that the specific recommendations we gave hold up based on my experience (I'm biased, so take this as it is). These were our core recommendations. Start with high-level documentation. If you don't have any docs today, or all your docs are just in a bad way, the biggest bang-for-buck is with your high-level systems docs and process documentation. Don't have a system architecture diagram? Make it! Don't have a CI process diagram? Make it! These help you get your bearings on everything quickly, and they're also usually much faster to make than detailed documentation of more focused pieces. Use design reviews! Here, "design review" means putting your architecture changes—or any proposal for a major code change—through a review process analogous to a code review. This is one of the things we implemented at my company over four years ago, and it's incredibly useful. It makes us think more deeply on our designs, so we make fewer errors in the design phase of any given feature. And it makes the design visible to the whole team, so we get much better knowledge transfer. If I could drop these into every software company, I would. They're that useful. Think about your audience. This is just plain good writing advice, and it bears repeating. Not every kind of documentation has the same audience, and if you don't take that into account you'll write bad docs that your end user won't be able to use effectively. You may use too much jargon for the lay user, or too little jargon for efficient communication with the engineers working on a feature. Maintain the docs, updating them as part of your workflow. I think we all know that docs should be updated. The recommendation here is that updating docs should be part of your workflow. If you run into something that's out of date, go ahead and fix it. If someone asks you a question because it wasn't in a given doc, go ahead and add it there. Document test plans. QA processes are woefully underappreciated in software engineering. I don't quite understand why QA engineering is so looked down upon, because y'all, that stuff is hard. At any rate, testing? That's something you should document! Assuming you run tests before you release (you do, don't you?) then you need to have some organization and documentation around what those tests are, why they're there, etc. This needs to be shared between both QA and product engineers since if the test fails, it's probably, you know, the product engineer's change that broke it—so the better those docs are, the faster they can find the issue. Make onboarding docs. We know that good onboarding is important for bringing new team members up to productive contribution quickly. And we know that documentation is a big part of that. So focus on your documentation as part of your onboarding strategy. Run post-mortems and retrospectives. Mistakes will happen, and when they do? It's imperative that we go back through why they happened and what changes (if any) are merited to our processes as a result. Maybe there's work to be done to prevent similar incidents, or there's something we could do to detect this quicker in the future. Whatever you find, also update your docs: point to the post-mortems in the design docs that led to these decisions so that if someone reads those, they'll also see the longer-term impacts of those decisions. Encourage collaboration on documentation. You can do this by involving multiple team members in creating and reviewing design docs. This is a tool for mentoring and professional development. It also improves all the documentation quality. If you involve team members at all levels (not just your senior engineers!), you'll be able to ensure the docs and processes meet everyone's actual needs. Prioritize critical, complex areas for docs. You can't document everything. We mentioned earlier that you want to start with a high-level documentation overview, but you also want to focus your efforts on things which are critical or particularly complex. This will pay off by reducing the difficulty of understanding these areas, reducing rates of defects and speeding up development. Eliminate ineffective docs, focus on friction points. If your team says that documentation isn't serving a useful purpose, trust them and get rid of it. Change what you do to do something more effective. Focus your documentation efforts on resolving friction points between teams or in the development process. The methodology Our research comprised two main pieces: a literature review and interviews with our engineering team. We did the literature review first, to get our bearings. And, honestly, I was hoping to find answers to what we ultimately were setting out to answer. I'd rather not have to do this research to get our answers! The existing papers gave us a good grounding. I won't repeat too much here—you can read the section in the case study if you're interested. The bulk of the new research we did was through interviews, and analysis of those interviews1. We interviewed nine of our software engineers who were present at a critical transition in the company's history, when we implemented a major new documentation process as a result of transitioning to remote work. Each interview was recorded with the consent of the participants. We then transcribed them and did qualitative coding to extract themes. We were looking for some pre-defined themes (collaboration, learning, and knowledge sharing) and we were also interested in emergent themes. From there, we were able to extract out five values that are represented by documentation practices at the company. These values (collaboration, learning, continuous improvement, accountability, and technical excellence) are exhibited through our documentation practices. They are also reinforced by the documentation practices, so it's not wholly a causal relationship one way or the other. At this point we were able to take the results of the interviews and combine that with the literature review to figure out recommendations based on our case study. These are, of course, not universal. And they're to some extent conjecture: by the nature of a case study, we're not able to ground this in data across multiple companies. But they're a launching off point for teams in need and for future research. What's next? Ultimately, we want to drive toward answering the bigger questions we have around software engineering and documentation. One of those is: a lot of software engineers say their documentation is bad, and we also say that we value good documentation, so why do we not make the documentation better? Why are we not able to live our what we value there? Or do we not value documentation? To answer these questions to a satisfying end, we'll need to do a larger study. That means we'll need to recruit participants across many companies, and we'll need to design a study that can cut to the heart of these issues. I'm not sure when we'll have the time or energy to approach that, but I know we've both been completely sucked in by these questions. So we'll get there eventually! 1 One of the most interesting questions Suzanne got when presenting our research was: how did you get permission to do this? The question surprised me, because I hadn't thought that other companies would make this really hard, but her peers at the conference certainly had a hard time doing similar projects! The answer to this is a few things: we didn't ask for budget (nothing needs approval there), our engineers volunteered their time for it (they were interested, too!), and I'm on the leadership team and was able to just talk to the other company leaders to clear it.

a month ago 21 votes

More in programming

The wrongness of relativism

Comparing yourself to other startups? Focus on yourself instead.

22 hours ago 2 votes
AI: Where in the Loop Should Humans Go?

This is a re-publishing of a blog post I originally wrote for work, but wanted on my own blog as well. AI is everywhere, and its impressive claims are leading to rapid adoption. At this stage, I’d qualify it as charismatic technology—something that under-delivers on what it promises, but promises so much that the industry still leverages it because we believe it will eventually deliver on these claims. This is a known pattern. In this post, I’ll use the example of automation deployments to go over known patterns and risks in order to provide you with a list of questions to ask about potential AI solutions. I’ll first cover a short list of base assumptions, and then borrow from scholars of cognitive systems engineering and resilience engineering to list said criteria. At the core of it is the idea that when we say we want humans in the loop, it really matters where in the loop they are. My base assumptions The first thing I’m going to say is that we currently do not have Artificial General Intelligence (AGI). I don’t care whether we have it in 2 years or 40 years or never; if I’m looking to deploy a tool (or an agent) that is supposed to do stuff to my production environments, it has to be able to do it now. I am not looking to be impressed, I am looking to make my life and the system better. Another mechanism I want you to keep in mind is something called the context gap. In a nutshell, any model or automation is constructed from a narrow definition of a controlled environment, which can expand as it gains autonomy, but remains limited. By comparison, people in a system start from a broad situation and narrow definitions down and add constraints to make problem-solving tractable. One side starts from a narrow context, and one starts from a wide one—so in practice, with humans and machines, you end up seeing a type of teamwork where one constantly updates the other: The optimal solution of a model is not an optimal solution of a problem unless the model is a perfect representation of the problem, which it never is.  — Ackoff (1979, p. 97) Because of that mindset, I will disregard all arguments of “it’s coming soon” and “it’s getting better real fast” and instead frame what current LLM solutions are shaped like: tools and automation. As it turns out, there are lots of studies about ergonomics, tool design, collaborative design, where semi-autonomous components fit into sociotechnical systems, and how they tend to fail. Additionally, I’ll borrow from the framing used by people who study joint cognitive systems: rather than looking only at the abilities of what a single person or tool can do, we’re going to look at the overall performance of the joint system. This is important because if you have a tool that is built to be operated like an autonomous agent, you can get weird results in your integration. You’re essentially building an interface for the wrong kind of component—like using a joystick to ride a bicycle. This lens will assist us in establishing general criteria about where the problems will likely be without having to test for every single one and evaluate them on benchmarks against each other. Questions you'll want to ask The following list of questions is meant to act as reminders—abstracting away all the theory from research papers you’d need to read—to let you think through some of the important stuff your teams should track, whether they are engineers using code generation, SREs using AIOps, or managers and execs making the call to adopt new tooling. Are you better even after the tool is taken away? An interesting warning comes from studying how LLMs function as learning aides. The researchers found that people who trained using LLMs tended to fail tests more when the LLMs were taken away compared to people who never studied with them, except if the prompts were specifically (and successfully) designed to help people learn. Likewise, it’s been known for decades that when automation handles standard challenges, the operators expected to take over when they reach their limits end up worse off and generally require more training to keep the overall system performant. While people can feel like they’re getting better and more productive with tool assistance, it doesn’t necessarily follow that they are learning or improving. Over time, there’s a serious risk that your overall system’s performance will be limited to what the automation can do—because without proper design, people keeping the automation in check will gradually lose the skills they had developed prior. Are you augmenting the person or the computer? Traditionally successful tools tend to work on the principle that they improve the physical or mental abilities of their operator: search tools let you go through more data than you could on your own and shift demands to external memory, a bicycle more effectively transmits force for locomotion, a blind spot alert on your car can extend your ability to pay attention to your surroundings, and so on. Automation that augments users therefore tends to be easier to direct, and sort of extends the person’s abilities, rather than acting based on preset goals and framing. Automation that augments a machine tends to broaden the device’s scope and control by leveraging some known effects of their environment and successfully hiding them away. For software folks, an autoscaling controller is a good example of the latter. Neither is fundamentally better nor worse than the other—but you should figure out what kind of automation you’re getting, because they fail differently. Augmenting the user implies that they can tackle a broader variety of challenges effectively. Augmenting the computers tends to mean that when the component reaches its limits, the challenges are worse for the operator. Is it turning you into a monitor rather than helping build an understanding? If your job is to look at the tool go and then say whether it was doing a good or bad job (and maybe take over if it does a bad job), you’re going to have problems. It has long been known that people adapt to their tools, and automation can create complacency. Self-driving cars that generally self-drive themselves well but still require a monitor are not effectively monitored. Instead, having AI that supports people or adds perspectives to the work an operator is already doing tends to yield better long-term results than patterns where the human learns to mostly delegate and focus elsewhere. (As a side note, this is why I tend to dislike incident summarizers. Don’t make it so people stop trying to piece together what happened! Instead, I prefer seeing tools that look at your summaries to remind you of items you may have forgotten, or that look for linguistic cues that point to biases or reductive points of view.) Does it pigeonhole what you can look at? When evaluating a tool, you should ask questions about where the automation lands: Does it let you look at the world more effectively? Does it tell you where to look in the world? Does it force you to look somewhere specific? Does it tell you to do something specific? Does it force you to do something? This is a bit of a hybrid between “Does it extend you?” and “Is it turning you into a monitor?” The five questions above let you figure that out. As the tool becomes a source of assertions or constraints (rather than a source of information and options), the operator becomes someone who interacts with the world from inside the tool rather than someone who interacts with the world with the tool’s help. The tool stops being a tool and becomes a representation of the whole system, which means whatever limitations and internal constraints it has are then transmitted to your users. Is it a built-in distraction? People tend to do multiple tasks over many contexts. Some automated systems are built with alarms or alerts that require stealing someone’s focus, and unless they truly are the most critical thing their users could give attention to, they are going to be an annoyance that can lower the effectiveness of the overall system. What perspectives does it bake in? Tools tend to embody a given perspective. For example, AIOps tools that are built to find a root cause will likely carry the conceptual framework behind root causes in their design. More subtly, these perspectives are sometimes hidden in the type of data you get: if your AIOps agent can only see alerts, your telemetry data, and maybe your code, it will rarely be a source of suggestions on how to improve your workflows because that isn’t part of its world. In roles that are inherently about pulling context from many disconnected sources, how on earth is automation going to make the right decisions? And moreover, who’s accountable for when it makes a poor decision on incomplete data? Surely not the buyer who installed it! This is also one of the many ways in which automation can reinforce biases—not just based on what is in its training data, but also based on its own structure and what inputs were considered most important at design time. The tool can itself become a keyhole through which your conclusions are guided. Is it going to become a hero? A common trope in incident response is heroes—the few people who know everything inside and out, and who end up being necessary bottlenecks to all emergencies. They can’t go away for vacation, they’re too busy to train others, they develop blind spots that nobody can fix, and they can’t be replaced. To avoid this, you have to maintain a continuous awareness of who knows what, and crosstrain each other to always have enough redundancy. If you have a team of multiple engineers and you add AI to it, having it do all of the tasks of a specific kind means it becomes a de facto hero to your team. If that’s okay, be aware that any outages or dysfunction in the AI agent would likely have no practical workaround. You will essentially have offshored part of your ops. Do you need it to be perfect? What a thing promises to be is never what it is—otherwise AWS would be enough, and Kubernetes would be enough, and JIRA would be enough, and the software would work fine with no one needing to fix things. That just doesn’t happen. Ever. Even if it’s really, really good, it’s gonna have outages and surprises, and it’ll mess up here and there, no matter what it is. We aren’t building an omnipotent computer god, we’re building imperfect software. You’ll want to seriously consider whether the tradeoffs you’d make in terms of quality and cost are worth it, and this is going to be a case-by-case basis. Just be careful not to fix the problem by adding a human in the loop that acts as a monitor! Is it doing the whole job or a fraction of it? We don’t notice major parts of our own jobs because they feel natural. A classic pattern here is one of AIs getting better at diagnosing patients, except the benchmarks are usually run on a patient chart where most of the relevant observations have already been made by someone else. Similarly, we often see AI pass a test with flying colors while it still can’t be productive at the job the test represents. People in general have adopted a model of cognition based on information processing that’s very similar to how computers work (get data in, think, output stuff, rinse and repeat), but for decades, there have been multiple disciplines that looked harder at situated work and cognition, moving past that model. Key patterns of cognition are not just in the mind, but are also embedded in the environment and in the interactions we have with each other. Be wary of acquiring a solution that solves what you think the problem is rather than what it actually is. We routinely show we don’t accurately know the latter. What if we have more than one? You probably know how straightforward it can be to write a toy project on your own, with full control of every refactor. You probably also know how this stops being true as your team grows. As it stands today, a lot of AI agents are built within a snapshot of the current world: one or few AI tools added to teams that are mostly made up of people. By analogy, this would be like everyone selling you a computer assuming it were the first and only electronic device inside your household. Problems arise when you go beyond these assumptions: maybe AI that writes code has to go through a code review process, but what if that code review is done by another unrelated AI agent? What happens when you get to operations and common mode failures impact components from various teams that all have agents empowered to go fix things to the best of their ability with the available data? Are they going to clash with people, or even with each other? Humans also have that ability and tend to solve it via processes and procedures, explicit coordination, announcing what they’ll do before they do it, and calling upon each other when they need help. Will multiple agents require something equivalent, and if so, do you have it in place? How do they cope with limited context? Some changes that cause issues might be safe to roll back, some not (maybe they include database migrations, maybe it is better to be down than corrupting data), and some may contain changes that rolling back wouldn’t fix (maybe the workload is controlled by one or more feature flags). Knowing what to do in these situations can sometimes be understood from code or release notes, but some situations can require different workflows involving broader parts of the organization. A risk of automation without context is that if you have situations where waiting or doing little is the best option, then you’ll need to either have automation that requires input to act, or a set of actions to quickly disable multiple types of automation as fast as possible. Many of these may exist at the same time, and it becomes the operators’ jobs to not only maintain their own context, but also maintain a mental model of the context each of these pieces of automation has access to. The fancier your agents, the fancier your operators’ understanding and abilities must be to properly orchestrate them. The more surprising your landscape is, the harder it can become to manage with semi-autonomous elements roaming around. After an outage or incident, who does the learning and who does the fixing? One way to track accountability in a system is to figure out who ends up having to learn lessons and change how things are done. It’s not always the same people or teams, and generally, learning will happen whether you want it or not. This is more of a rhetorical question right now, because I expect that in most cases, when things go wrong, whoever is expected to monitor the AI tool is going to have to steer it in a better direction and fix it (if they can); if it can’t be fixed, then the expectation will be that the automation, as a tool, will be used more judiciously in the future. In a nutshell, if the expectation is that your engineers are going to be doing the learning and tweaking, your AI isn’t an independent agent—it’s a tool that cosplays as an independent agent. Do what you will—just be mindful All in all, none of the above questions flat out say you should not use AI, nor where exactly in the loop you should put people. The key point is that you should ask that question and be aware that just adding whatever to your system is not going to substitute workers away. It will, instead, transform work and create new patterns and weaknesses. Some of these patterns are known and well-studied. We don’t have to go rushing to rediscover them all through failures as if we were the first to ever automate something. If AI ever gets so good and so smart that it’s better than all your engineers, it won’t make a difference whether you adopt it only once it’s good. In the meanwhile, these things do matter and have real impacts, so please design your systems responsibly. If you’re interested to know more about the theoretical elements underpinning this post, the following references—on top of whatever was already linked in the text—might be of interest: Books: Joint Cognitive Systems: Foundations of Cognitive Systems Engineering by Erik Hollnagel Joint Cognitive Systems: Patterns in Cognitive Systems Engineering by David D. Woods Cognition in the Wild by Edwin Hutchins Behind Human Error by David D. Woods, Sydney Dekker, Richard Cook, Leila Johannesen, Nadine Sarter Papers: Ironies of Automation by Lisanne Bainbridge The French-Speaking Ergonomists’ Approach to Work Activity by Daniellou How in the World Did We Ever Get into That Mode? Mode Error and Awareness in Supervisory Control by Nadine Sarter Can We Ever Escape from Data Overload? A Cognitive Systems Diagnosis by David D. Woods Ten Challenges for Making Automation a “Team Player” in Joint Human-Agent Activity by Gary Klein and David D. Woods MABA-MABA or Abracadabra? Progress on Human–Automation Co-ordination by Sidney Dekker Managing the Hidden Costs of Coordination by Laura Maguire Designing for Expertise by David D. Woods The Impact of Generative AI on Critical Thinking by Lee et al.

2 days ago 5 votes
AMD YOLO

AMD is sending us the two MI300X boxes we asked for. They are in the mail. It took a bit, but AMD passed my cultural test. I now believe they aren’t going to shoot themselves in the foot on software, and if that’s true, there’s absolutely no reason they should be worth 1/16th of NVIDIA. CUDA isn’t really the moat people think it is, it was just an early ecosystem. tiny corp has a fully sovereign AMD stack, and soon we’ll port it to the MI300X. You won’t even have to use tinygrad proper, tinygrad has a torch frontend now. Either NVIDIA is super overvalued or AMD is undervalued. If the petaflop gets commoditized (tiny corp’s mission), the current situation doesn’t make any sense. The hardware is similar, AMD even got the double throughput Tensor Cores on RDNA4 (NVIDIA artificially halves this on their cards, soon they won’t be able to). I’m betting on AMD being undervalued, and that the demand for AI has barely started. With good software, the MI300X should outperform the H100. In for a quarter million. Long term. It can always dip short term, but check back in 5 years.

2 days ago 4 votes
whippet lab notebook: untagged mallocs, bis

Earlier this weekGuileWhippet But now I do! Today’s note is about how we can support untagged allocations of a few different kinds in Whippet’s .mostly-marking collector Why bother supporting untagged allocations at all? Well, if I had my way, I wouldn’t; I would just slog through Guile and fix all uses to be tagged. There are only a finite number of use sites and I could get to them all in a month or so. The problem comes for uses of from outside itself, in C extensions and embedding programs. These users are loathe to adapt to any kind of change, and garbage-collection-related changes are the worst. So, somehow, we need to support these users if we are not to break the Guile community.scm_gc_malloclibguile The problem with , though, is that it is missing an expression of intent, notably as regards tagging. You can use it to allocate an object that has a tag and thus can be traced precisely, or you can use it to allocate, well, anything else. I think we will have to add an API for the tagged case and assume that anything that goes through is requesting an untagged, conservatively-scanned block of memory. Similarly for : you could be allocating a tagged object that happens to not contain pointers, or you could be allocating an untagged array of whatever. A new API is needed there too for pointerless untagged allocations.scm_gc_mallocscm_gc_mallocscm_gc_malloc_pointerless Recall that the mostly-marking collector can be built in a number of different ways: it can support conservative and/or precise roots, it can trace the heap precisely or conservatively, it can be generational or not, and the collector can use multiple threads during pauses or not. Consider a basic configuration with precise roots. You can make tagged pointerless allocations just fine: the trace function for that tag is just trivial. You would like to extend the collector with the ability to make pointerless allocations, for raw data. How to do this?untagged Consider first that when the collector goes to trace an object, it can’t use bits inside the object to discriminate between the tagged and untagged cases. Fortunately though . Of those 8 bits, 3 are used for the mark (five different states, allowing for future concurrent tracing), two for the , one to indicate whether the object is pinned or not, and one to indicate the end of the object, so that we can determine object bounds just by scanning the metadata byte array. That leaves 1 bit, and we can use it to indicate untagged pointerless allocations. Hooray!the main space of the mostly-marking collector has one metadata byte for each 16 bytes of payloadprecise field-logging write barrier However there is a wrinkle: when Whippet decides the it should evacuate an object, it tracks the evacuation state in the object itself; the embedder has to provide an implementation of a , allowing the collector to detect whether an object is forwarded or not, to claim an object for forwarding, to commit a forwarding pointer, and so on. We can’t do that for raw data, because all bit states belong to the object, not the collector or the embedder. So, we have to set the “pinned” bit on the object, indicating that these objects can’t move.little state machine We could in theory manage the forwarding state in the metadata byte, but we don’t have the bits to do that currently; maybe some day. For now, untagged pointerless allocations are pinned. You might also want to support untagged allocations that contain pointers to other GC-managed objects. In this case you would want these untagged allocations to be scanned conservatively. We can do this, but if we do, it will pin all objects. Thing is, conservative stack roots is a kind of a sweet spot in language run-time design. You get to avoid constraining your compiler, you avoid a class of bugs related to rooting, but you can still support compaction of the heap. How is this, you ask? Well, consider that you can move any object for which we can precisely enumerate the incoming references. This is trivially the case for precise roots and precise tracing. For conservative roots, we don’t know whether a given edge is really an object reference or not, so we have to conservatively avoid moving those objects. But once you are done tracing conservative edges, any live object that hasn’t yet been traced is fair game for evacuation, because none of its predecessors have yet been visited. But once you add conservatively-traced objects back into the mix, you don’t know when you are done tracing conservative edges; you could always discover another conservatively-traced object later in the trace, so you have to pin everything. The good news, though, is that we have gained an easier migration path. I can now shove Whippet into Guile and get it running even before I have removed untagged allocations. Once I have done so, I will be able to allow for compaction / evacuation; things only get better from here. Also as a side benefit, the mostly-marking collector’s heap-conservative configurations are now faster, because we have metadata attached to objects which allows tracing to skip known-pointerless objects. This regains an optimization that BDW has long had via its , used in Guile since time out of mind.GC_malloc_atomic With support for untagged allocations, I think I am finally ready to start getting Whippet into Guile itself. Happy hacking, and see you on the other side! inside and outside on intent on data on slop fin

2 days ago 3 votes
Creating static map images with OpenStreetMap, Web Mercator, and Pillow

I’ve been working on a project where I need to plot points on a map. I don’t need an interactive or dynamic visualisation – just a static map with coloured dots for each coordinate. I’ve created maps on the web using Leaflet.js, which load map data from OpenStreetMap (OSM) and support zooming and panning – but for this project, I want a standalone image rather than something I embed in a web page. I want to put in coordinates, and get a PNG image back. This feels like it should be straightforward. There are lots of Python libraries for data visualisation, but it’s not an area I’ve ever explored in detail. I don’t know how to use these libraries, and despite trying I couldn’t work out how to accomplish this seemingly simple task. I made several attempts with libraries like matplotlib and plotly, but I felt like I was fighting the tools. Rather than persist, I wrote my own solution with “lower level” tools. The key was a page on the OpenStreetMap wiki explaining how to convert lat/lon coordinates into the pixel system used by OSM tiles. In particular, it allowed me to break the process into two steps: Get a “base map” image that covers the entire world Convert lat/lon coordinates into xy coordinates that can be overlaid on this image Let’s go through those steps. Get a “base map” image that covers the entire world Let’s talk about how OpenStreetMap works, and in particular their image tiles. If you start at the most zoomed-out level, OSM represents the entire world with a single 256×256 pixel square. This is the Web Mercator projection, and you don’t get much detail – just a rough outline of the world. We can zoom in, and this tile splits into four new tiles of the same size. There are twice as many pixels along each edge, and each tile has more detail. Notice that country boundaries are visible now, but we can’t see any names yet. We can zoom in even further, and each of these tiles split again. There still aren’t any text labels, but the map is getting more detailed and we can see small features that weren’t visible before. You get the idea – we could keep zooming, and we’d get more and more tiles, each with more detail. This tile system means you can get detailed information for a specific area, without loading the entire world. For example, if I’m looking at street information in Britain, I only need the detailed tiles for that part of the world. I don’t need the detailed tiles for Bolivia at the same time. OpenStreetMap will only give you 256×256 pixels at a time, but we can download every tile and stitch them together, one-by-one. Here’s a Python script that enumerates all the tiles at a particular zoom level, downloads them, and uses the Pillow library to combine them into a single large image: #!/usr/bin/env python3 """ Download all the map tiles for a particular zoom level from OpenStreetMap, and stitch them into a single image. """ import io import itertools import httpx from PIL import Image zoom_level = 2 width = 256 * 2**zoom_level height = 256 * (2**zoom_level) im = Image.new("RGB", (width, height)) for x, y in itertools.product(range(2**zoom_level), range(2**zoom_level)): resp = httpx.get(f"https://tile.openstreetmap.org/{zoom_level}/{x}/{y}.png", timeout=50) resp.raise_for_status() im_buffer = Image.open(io.BytesIO(resp.content)) im.paste(im_buffer, (x * 256, y * 256)) out_path = f"map_{zoom_level}.png" im.save(out_path) print(out_path) The higher the zoom level, the more tiles you need to download, and the larger the final image will be. I ran this script up to zoom level 6, and this is the data involved: Zoom level Number of tiles Pixels File size 0 1 256×256 17.1 kB 1 4 512×512 56.3 kB 2 16 1024×1024 155.2 kB 3 64 2048×2048 506.4 kB 4 256 4096×4096 2.7 MB 5 1,024 8192×8192 13.9 MB 6 4,096 16384×16384 46.1 MB I can just about open that zoom level 6 image on my computer, but it’s struggling. I didn’t try opening zoom level 7 – that includes 16,384 tiles, and I’d probably run out of memory. For most static images, zoom level 3 or 4 should be sufficient – I ended up a base map from zoom level 4 for my project. It takes a minute or so to download all the tiles from OpenStreetMap, but you only need to request it once, and then you have a static image you can use again and again. This is a particularly good approach if you want to draw a lot of maps. OpenStreetMap is provided for free, and we want to be a respectful user of the service. Downloading all the map tiles once is more efficient than making repeated requests for the same data. Overlay lat/lon coordinates on this base map Now we have an image with a map of the whole world, we need to overlay our lat/lon coordinates as points on this map. I found instructions on the OpenStreetMap wiki which explain how to convert GPS coordinates into a position on the unit square, which we can in turn add to our map. They outline a straightforward algorithm, which I implemented in Python: import math def convert_gps_coordinates_to_unit_xy( *, latitude: float, longitude: float ) -> tuple[float, float]: """ Convert GPS coordinates to positions on the unit square, which can be plotted on a Web Mercator projection of the world. This expects the coordinates to be specified in **degrees**. The result will be (x, y) coordinates: - x will fall in the range (0, 1). x=0 is the left (180° west) edge of the map. x=1 is the right (180° east) edge of the map. x=0.5 is the middle, the prime meridian. - y will fall in the range (0, 1). y=0 is the top (north) edge of the map, at 85.0511 °N. y=1 is the bottom (south) edge of the map, at 85.0511 °S. y=0.5 is the middle, the equator. """ # This is based on instructions from the OpenStreetMap Wiki: # https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames#Example:_Convert_a_GPS_coordinate_to_a_pixel_position_in_a_Web_Mercator_tile # (Retrieved 16 January 2025) # Convert the coordinate to the Web Mercator projection # (https://epsg.io/3857) # # x = longitude # y = arsinh(tan(latitude)) # x_webm = longitude y_webm = math.asinh(math.tan(math.radians(latitude))) # Transform the projected point onto the unit square # # x = 0.5 + x / 360 # y = 0.5 - y / 2π # x_unit = 0.5 + x_webm / 360 y_unit = 0.5 - y_webm / (2 * math.pi) return x_unit, y_unit Their documentation includes a worked example using the coordinates of the Hachiko Statue. We can run our code, and check we get the same results: >>> convert_gps_coordinates_to_unit_xy(latitude=35.6590699, longitude=139.7006793) (0.8880574425, 0.39385379958274735) Most users of OpenStreetMap tiles will use these unit positions to select the tiles they need, and then dowload those images – but we can also position these points directly on the global map. I wrote some more Pillow code that converts GPS coordinates to these unit positions, scales those unit positions to the size of the entire map, then draws a coloured circle at each point on the map. Here’s the code: from PIL import Image, ImageDraw gps_coordinates = [ # Hachiko Memorial Statue in Tokyo {"latitude": 35.6590699, "longitude": 139.7006793}, # Greyfriars Bobby in Edinburgh {"latitude": 55.9469224, "longitude": -3.1913043}, # Fido Statue in Tuscany {"latitude": 43.955101, "longitude": 11.388186}, ] im = Image.open("base_map.png") draw = ImageDraw.Draw(im) for coord in gps_coordinates: x, y = convert_gps_coordinates_to_unit_xy(**coord) radius = 32 draw.ellipse( [ x * im.width - radius, y * im.height - radius, x * im.width + radius, y * im.height + radius, ], fill="red", ) im.save("map_with_dots.png") and here’s the map it produces: The nice thing about writing this code in Pillow is that it’s a library I already know how to use, and so I can customise it if I need to. I can change the shape and colour of the points, or crop to specific regions, or add text to the image. I’m sure more sophisticated data visualisation libraries can do all this, and more – but I wouldn’t know how. The downside is that if I need more advanced features, I’ll have to write them myself. I’m okay with that – trading sophistication for simplicity. I didn’t need to learn a complex visualization library – I was able to write code I can read and understand. In a world full of AI-generating code, writing something I know I understand feels more important than ever. [If the formatting of this post looks odd in your feed reader, visit the original article]

2 days ago 5 votes